I am considering whether we should get one big server (48 GB RAM) or buy four smaller servers (12 GB RAM each) to run memcached. In either case I would also have standby spares for redundancy, and I can add more servers later if needed, so this question is purely about performance, not fault tolerance or scalability.
I am leaning towards one big server simply because it's easier to manage, "greener", and takes less space, but I don't know whether it can perform on par with a set of smaller servers.
Any ideas would be greatly appreciated!
Andrey,
To answer your question, I would refer you to the snippet below. Performance-wise, the memcache client library still does a two-layer lookup (pick a server, then look up the key on that server) regardless of whether you have one instance or several.
Having only one instance might hurt, though, because it would take some time for the standby to warm up when it comes online.
http://www.linuxjournal.com/article/7451?page=0,1

A request to get/set a key with a value requires that the key be run through a hash function. A hash function is a one-way function mapping a key (be it numeric or string) to some number that is going to be the bucket number. Once the bucket number has been calculated, the list of nodes for that bucket is searched, looking for the node with the given key. If it's not found, a new one can be added to the list.
So how does this relate to Memcached? Memcached presents to the user a dictionary interface (key -> value), but it's implemented internally as a two-layer hash. The first layer is implemented in the client library; it decides which Memcached server to send the request to by hashing the key onto a list of virtual buckets, each one representing a Memcached server. Once there, the selected Memcached server uses a typical hash table.
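To make the two layers concrete, here is a minimal Python sketch that simulates them in a single process. It is a toy, not how memcached or any particular client library is actually implemented: the class names `ServerHashTable` and `Client` are hypothetical, the client uses naive CRC32-modulo placement (an assumption; many clients use consistent hashing instead), and each "server" is just an in-memory chained hash table like the one the quoted article describes.

```python
import zlib
from typing import Optional


class ServerHashTable:
    """Layer 2 (hypothetical): a simulated memcached server's chained hash table."""

    def __init__(self, num_buckets: int = 16) -> None:
        self.buckets = [[] for _ in range(num_buckets)]  # each bucket is a list of (key, value) nodes

    def _bucket(self, key: str) -> list:
        # Hash the key to a bucket number. Python's built-in hash() is used here so the
        # server-side hash differs from the client-side one (stable within one process).
        return self.buckets[hash(key) % len(self.buckets)]

    def get(self, key: str) -> Optional[bytes]:
        # Search that bucket's node list for the given key.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def set(self, key: str, value: bytes) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key exists: overwrite its node
                return
        bucket.append((key, value))  # key not found: add a new node to the list


class Client:
    """Layer 1 (hypothetical): the client library picks a server by hashing the key."""

    def __init__(self, server_names: list) -> None:
        self.server_names = list(server_names)
        self.servers = {name: ServerHashTable() for name in self.server_names}

    def pick_server(self, key: str) -> str:
        # Naive CRC32-modulo placement over the server list (an assumption, see above).
        index = zlib.crc32(key.encode("utf-8")) % len(self.server_names)
        return self.server_names[index]

    def set(self, key: str, value: bytes) -> None:
        self.servers[self.pick_server(key)].set(key, value)

    def get(self, key: str) -> Optional[bytes]:
        return self.servers[self.pick_server(key)].get(key)


one_big = Client(["big"])
four_small = Client(["a", "b", "c", "d"])
four_small.set("user:42", b"Andrey")
print(four_small.get("user:42"))
```

Whether the `Client` is given one server name or four, each `get` does exactly one client-side hash plus one bucket search on the chosen server, which is why the number of instances does not change the per-request lookup cost.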
Remember that the more memcached servers your clients talk to, the more connections each client has to keep open. If it's a web service running under Apache pre-fork and you want to be able to handle 10,000 simultaneous connections, the difference between one server and four is 30,000 TCP connections (10,000 versus 40,000).
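The arithmetic behind that figure, as a tiny Python sketch. It assumes, as the Apache pre-fork scenario implies, that every client process keeps one persistent TCP connection to each memcached server; `total_connections` is a hypothetical helper, not part of any library.

```python
def total_connections(client_processes: int, memcached_servers: int) -> int:
    # Assumption: each client process holds one TCP connection to every memcached server.
    return client_processes * memcached_servers


one = total_connections(10_000, 1)
four = total_connections(10_000, 4)
print(four - one)  # 30000
```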
As far as performance goes, though, it really depends on your application. Having 4 servers may give you 4x as many CPUs and 4x as many network interfaces, unless you really beef up that central server. Above a certain point, the cost of those sorts of upgrades becomes non-linear (a 32-core system is going to cost far more than 4x a dual-socket quad-core, for example). That said, I haven't typically seen memcached be CPU-bound, so it's quite plausible that the single server would not need to be a high-end machine.
But, as with many things, it's probably something you have to test yourself with your specific application. It seems likely to me that a single 48 GB server could handle the load, so try getting a modest server with 48 GB of RAM, put Munin on it, and run some stress tests. If you find bottlenecks, you'll have more information about whether you need to get three more servers and spread the RAM across them, or take some other action.
Well, are you sure it is greener? I'd rather have several, just for reliability and to keep scaling in your head.