I'm curious to know how big sites spread the load between the different DB server in the case where users write as much as they read, ie. when the standard solution of having one master to accept write, and several slaves that only let users read data doesn't work because it simply turns the master server into the bottleneck.
For those of you who manage a big site with a load balancer -> multiple web servers -> multiple DB servers, how do you spread the load evenly between the DB servers so that users (at best) don't have to wait for the master to update the slaves, or (at worst) users end up reading dirty data from slaves that haven't been updated yet?
Thank you.
Check out http://highscalability.com/
You can use more complicated methods of storing the data basically to denormalise and segment it into chunks that you can load balance across servers. Look for shards.
The general answer seems to be to make the single writing DB machine more and more powerful for as long as possible before you move to those other methods though.
In most cases the best way to solve the problem is to rethink how your site works to cut down the number of writes/make them batchable.
What you need is a proper multi-master database. And as far as I know the only DB engine that has so far implemented this in a reliable way is Oracle. Which goes some way to explain why all the big boys use Oracle.
Having said that, MySql does support multi-master replication, although (AFAIK) not in a full production release. See http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-replication-multi-master.html for more detail.
This answer does not answer the title of the question because it makes no attempt to keep the DBs in sync but it does answer the body of the question to do with distributing requests for high scale websites.
You can use Sharding to divide your data so for example you have 26 database servers one for each letter of the alphabet. All the users with name beginning with A go to one server. You can use various algorithms to divide up your requests evenly. It's a complex solution that shouldn't really be used until other options have been exhausted.
https://en.wikipedia.org/wiki/Shard_(database_architecture)
I presume you are talking about MySQL, based on your terms. Unfortunately this DBMS has lack of support for the distributed writes, only the NDB supports that.
http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-overview.html
http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-nodes-groups.html
Another solution can be: use DNS level partition based on your client GEO location resolv different IP addresses where to connect to and basically separate the data by this info. There is a problem with this sort of solution, if you have a query for example you want to know how many items do you have globally then this won't work very well.
It depends on the site and the part of the site.
Some pieces will have a single write server, which will then replicate to a bunch of read servers.
Other pieces of the site will have lots of servers each holding a small part of the data in them. For example a couple of million customer accounts per database server with logic in the application so that it knows which server you are on based on your UserId.
A solution is to rethink your application so that you can split data between multiple database servers. Sometimes it's easy... sometimes not.