I am wondering where to begin with database scaling/optimization strategies. After reading articles like highscalability.com's facebook architecture article, and this twitter architecture article, I am not sure if by RAM they mean only memcached, or something else.
My questions are:
- Can entire SQL databases be stored in RAM? I'm thinking sharding and all that here... Just learning (not from a db background).
- Are SQL database indexes stored in RAM?
- Are NOSQL databases stored in RAM? Can they be?
- Or is Memcached the standard for storing things in RAM? So 99% of requests read from RAM (Memcached), and the database (disk) is a backup?
Just looking for a point in the right direction.
The trouble with storing a database in RAM is that RAM has a nasty habit of forgetting everything when the power goes away i.e. it's not persistent. That said, making proper use of memory for high-traffic sites is absolutely essential to getting decent performance, because you get extremely good IO rates from it and that's a very useful thing if you have a high load.
From memory, MySQL had a MEMORY table type that stored data in RAM rather than on disk (as InnoDB and MyISAM would). Creative use of a RAMdisk would also allow any database to use RAM as a disk backing, but as above, this isn't probably what you'd want to do. As you've hit upon, a more useful application would be the use of RAM as a high-performance cache, using something like Memcached. As I'm sure you know, this gives a fast key/value store, but requires the application to know to look there first and then fall back to the persistent database if nothing's found. Sites which require a high IO rate across their entire relational DB have the option of dumping the entire DB onto something like a Fusion IO drive. This isn't going to be as quick as RAM, but has the option of being persistent so can be a useful middle ground. I believe SO runs it's database on a Fusion IO drive (see this blog post about their findings.
So, in summary, a high-volume site will have its data stored on persistent storage (spinning disk, SSD etc.) and then set up a series of layers of higher-performing caches in order to reduce (usually the reads) load on the database. Writes typically go straight into the db, but you can use a localized write-cache if you've got a lot of writes.
In answer to your specific questions:
I dont know how twitter does it concrete, but to answer your general questions:
Can X stored in RAM questions: Yes if the structures are not already cached in RAM by the database system itself, imagine just a RAM-disk as filesystem. Everything there is in memory. You got enormous bandwith from those systems. Drawback with those: Guess what happens when someone draws the plug... all your memory is lost. There are of course solutions to this, like doing regular snapshots/writing the stuff to real hard disks, or you could use persistent memory (not flash, that is far too slow and limited, but there are (really expensive) solutions like MRAM.
Yes, although I think twitter use a variety of technologies and not just RDBMS. There are engines for MySQL that ONLY run in memory for example (the cluster NDB if memory serves).
Frequently, yes.
Not by definition, but yes some can be. It's often best for any database to maximise use of RAM and minimise slow disk access.
Memcached is certainly 1 common front end cache to a lot of database back ends. I gave a presentation on the use of memcached with Amazon simpleDB a couple of years ago which may or may not be useful.
Strategy wise memcached in front of a database can be highly beneficial, but you can use clustering and protocol compatible solutions like membase too.
Twitter uses Redis for their DB operations. It is a form of NoSql DB. It is also in-memory , so is blazing fast in read/write operations. Twitter has adopted it to store all of their data in key-value form for all of their user data. Though it requires you to implement your own algos on how to manage the Redis Servers, how to implement sharding and also configure your own Master-Slave configurations. You can have a deeper look here https://www.youtube.com/watch?v=rP9EKvWt0zo
Yes, all the SQL databases can be stored in RAM, and it is a pretty standard method on high performance sites like this.
Yes, the SQL indexes are most likely stored in RAM as well.
You can store anything in RAM, it's simply a storage area. What you have to take into effect is storage size, and what else needs access to RAM to make sure you have enough.