the mysql database has arround 2TB of data.
i have a master-slave-slave replication running. the application that uses the database does read (SELECT) queries just on one of the 2 slaves and write (DELETE/INSERT/UPDATE) queries on the master. the application does way more reads, than writes.
if we have a problem with the read (SELECT) queries, we can just add another slave database and tell the application, that there is another salve. so it scales well...
Currently, the master is running arround 40% disk io due to the writes.
So i'm thinking about how to scale the the database in the future. Because one day the master will be overloaded.
What could be a solution there?
maybe mysql cluster? if so, are there any pitfalls or limitations in switching the database to ndb?
thanks a lot in advance... :)
There is no one-size-fits-all answer to scaling MySQL. A few general tips:
Scale "diagonally" as long as you can, ie. keep things on a single MySQL server as long as you're still able to run on commodity hardware. That probably means 2 x quad-core CPUs, 64+ GB RAM, 8 disk RAID 10 -- or higher. The upper end of what is "commodity hardware" is getting faster each year.
Have a look at Brad Fitzpatrick's presentations about scaling LiveJournal. They're pretty much classics as far as scaling LAMP goes. On page 25 - 26 of this presentation you see the problem you will eventually face with MySQL replication: The writes consume all available disk I/O.
Read "High Performance MySQL". It's a really good book by authors who have seen many high-load MySQL installations.
Avoid sharding (spreading data over many MySQL servers) as long as possible. When you start sharding, you give up most benefits of relational databases, and you slow down development. If you have to do sharding, consider using a NoSQL datastore with a built-in multi server model instead -- fx Riak, Cassandra, HBase, MongoDB. Ideally, do "functional partitioning" between MySQL and NoSQL, so that you keep using MySQL for lesser hot data that fits well into an RDBMS, and you use the NoSQL engine for 'hot' data you don't need to join with the MySQL data.
In "Web Operations" there is a chapter on MySQL by Baron Schwartz. He pretty much justs says "No!" to using MySQL Cluster / NDB in a website environment. Quote: ".. it doesn't perform well for joins and GROUP BY queries, and web applications need those.".
MySQL clustering will gain you scalability of writes by breaking your database into fragments that are spread over multiple machines. But it will drastically slow down complex queries that pull data from multiple fragments. Only you can determine the effect of this on your application's performance.
You might want to look at sharding the data manually rather than letting the clustering engine do it for you. It will take more configuration, but if you understand how your application uses the database, you may be able to come up with a sharding scheme that allows most queries to only access one shard.
Remember, that MySQL replication is single threaded, so probably your replication will be not limited by master capacity, but by slaves, which can't keep behind the master and will be out of sync. From this article:
You can think about clustering the information itself.If it is possible - to separate write consuming tables between different servers.