We run a newspaper-style site and are consolidating our architecture from one that grew amorphously into a more scalable, resilient solution.
I was thinking of the following:
internet
    |
h/w firewall
    |
h/w load balancer ---- control server (nagios, mail server & misc)
    |
pair of nginx load-balancing reverse caching proxies
    |                            |
pair of apache app servers     pair of mogilefs storage nodes
    |                          and mogilefs trackers
    |
pair of mysql dbs (master/slave)
and mogilefs db
All machines will run 64-bit CentOS.
We need to be able to service 7 simultaneous users on the app servers and serve 840 static files per second, so I was thinking of speccing things out as below:
- mogilefs storage nodes - 2GB RAM, Intel Atom (1.6GHz)
- app servers - 8GB RAM, AMD Athlon II X2 (2.8GHz)
- reverse proxies & control server - 4GB RAM, AMD Athlon II X2 (2.8GHz)
- dbs - 8GB RAM, AMD Phenom II X6 (2.8GHz)
All would have 7.2krpm disks. There's not a huge amount of data in the database, so it can basically all be cached in buffers. Plus we only have around a 15% memcached miss rate, so there's not a huge load on the db.
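For illustration only, on the 8GB DB boxes that might translate into my.cnf settings along these lines (the figures are placeholders, assuming InnoDB and a working set of a few GB):

    [mysqld]
    # let InnoDB hold essentially the whole data set in memory
    innodb_buffer_pool_size = 5G
    # modest log files; tune to the actual write volume
    innodb_log_file_size    = 256M
    # result caching is already handled by memcached
    query_cache_size        = 0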
A future stage would be round-robin DNS with everything mirrored to a different data centre.
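For what it's worth, the round-robin stage is just multiple A records for the same name, something like this in the zone file (names and addresses are placeholders):

    www    300    IN    A    203.0.113.10     ; data centre 1
    www    300    IN    A    198.51.100.10    ; data centre 2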
Is there anything missing from this topology? Has anyone done anything similar with any of the components? Do the machines seem like they're under-/over-specced?
Thanks
EDIT
A bit more info:
7 simultaneous page views per second to be served by Apache - a lot of the CMS content is cached anyway, on disk and in memcached where possible. 840 static files need serving per second - though this may be an overestimate, since with far-future expiry dates only a fraction of page views will hit the servers with cold client caches.
Only admins will upload static content to the MogileFS storage nodes, at perhaps ~100 files per day. I'm new to MogileFS - the storage nodes will just use commodity disks (7.2krpm).
This content will then be accessed via http://static*.ourdomain... Nginx will proxy requests for this content and cache it locally, so while the first retrieval may be a little slow, subsequent retrievals will come from the nginx cache.
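To make the nginx caching side concrete, here's a rough sketch of what we had in mind; the upstream addresses, hostnames and cache sizes are placeholders, and it glosses over how nginx actually locates a file inside MogileFS (tracker lookup or a dedicated module), so treat it as a shape rather than a working config:

    # local disk cache for the static content fetched from the storage nodes
    proxy_cache_path /var/cache/nginx/static levels=1:2 keys_zone=static:64m
                     max_size=10g inactive=7d;

    upstream storage_nodes {
        server 10.0.0.21;    # mogilefs storage node 1 (placeholder IPs)
        server 10.0.0.22;    # mogilefs storage node 2
    }

    server {
        listen      80;
        server_name static1.ourdomain.example;

        location / {
            proxy_pass        http://storage_nodes;
            proxy_cache       static;
            proxy_cache_valid 200 7d;    # keep hits in the local cache for a week
            expires           max;       # far-future expiry for client caches
        }
    }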
You're doing ~7 page req/s from the (dynamic) web servers, and ~840 req/s for (small-file) static content, and for this you need a multi-layered architecture with ~10 servers?
Just off the top of my head, that sounds way too slow. Either you're overbuilding, or your site has some very slow code, or something else is going on.
I would suggest benchmarking your application thoroughly and using those numbers to estimate the hardware you need for your load.
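Even something as basic as ApacheBench against one representative dynamic URL and one representative static URL will give you first-order numbers (the URLs and concurrency levels below are just examples):

    # 10,000 requests, 50 concurrent, against a typical article page
    ab -n 10000 -c 50 http://www.example.com/some-article/

    # the same against a typical static asset served via the proxies
    ab -n 10000 -c 100 http://static1.example.com/images/logo.png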
A few thoughts:
Having two load-balancing layers is additional complexity; is that needed? How about just one HW load balancer and a single cache server (Squid or Varnish)? See the minimal Varnish sketch after these points.
Never use Atom CPUs for real servers; they are way underpowered.
I don't see why you want to use old desktop-class CPUs like dual-core Athlons. Modern quad-core server CPUs are at least 2x faster in real use. More powerful modern hardware would allow you to consolidate layers and simplify your architecture.
MogileFS is probably great; I don't know much about it beyond its origin and that it has been in heavy use for years with great success. But why set up a technology you're not familiar with just to scale to two servers? If you just need the performance level of two servers with Intel Atom CPUs, ditch that config and get a single modern quad-core server with a fast disk subsystem (4- or 8-disk RAID 10, or SSDs) instead.
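To illustrate the single-cache-server idea above, a bare-bones Varnish config is only a handful of lines; the backend address, URL pattern and TTL are placeholders, and the exact VCL syntax varies a little between Varnish versions:

    backend app {
        .host = "10.0.0.11";    # one of the apache app servers (placeholder)
        .port = "80";
    }

    sub vcl_fetch {
        # cache anything under /static/ aggressively; dynamic pages keep
        # whatever caching headers the CMS sends
        if (req.url ~ "^/static/") {
            set beresp.ttl = 7d;
        }
    }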
Recommendations:
Your architecture above is sound and well considered. But get some numbers for the real-life performance of the individual parts. :-)
This is a bit too general to answer as asked. You will need to provide a lot more input on your proposed solution and on the load:
Also, I don't see any memcached in there; depending on the setup that could be useful.
7 simultaneous users does not sound like a lot; how many page views per second is that in your view?
Edit to reflect the new info:
There are a lot of details to flesh out, but this appears to be reasonable. A lot will depend on how you configure the nginx caching and the CMS. Keep the network in mind as well; I'd suggest at least gigabit Ethernet.
I'm a bit concerned about the MogileFS performance. If you are still in the design phase, I would suggest looking at alternatives (maybe direct filesystem replication) or future migration scenarios, depending on your requirements.
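If ~100 uploads a day is the entire write load, plain rsync on a short cron interval may well be enough replication; the paths and hostnames here are made up:

    # on the primary static server: every 5 minutes, push new files to the mirror
    */5 * * * *  rsync -a --delete /srv/static/ mirror1:/srv/static/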
Also, your load balancer is presently a very high-level element in the design. Until you are very sure of the requirements in terms of performance and features, I'd leave all the options on the table there.