We run a newspaper-style site and are consolidating our architecture, which grew amorphously over time, into a more scalable, resilient solution.
I was thinking of the following:
internet
    |
h/w firewall
    |
h/w load balancer ---- control server (nagios, mail server & misc)
    |
pair of nginx load-balancing reverse caching proxies
    |---- pair of mogilefs storage nodes and mogilefs trackers
    |
pair of apache app servers
    |
pair of mysql dbs (master/slave) and mogilefs db
All machines will run 64-bit CentOS.
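For the master/slave pair, the replication-relevant my.cnf lines would look roughly like this (server ids and log names are placeholders, not tuned values):

```ini
# master (placeholder values)
[mysqld]
server-id = 1
log-bin   = mysql-bin

# slave (placeholder values)
[mysqld]
server-id = 2
relay-log = mysql-relay-bin
read_only = 1
```

Setting read_only on the slave guards against accidental writes that would break replication.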
We need to be able to service 7 simultaneous page views per second on the app servers, and serve 840 static files per second. So I was thinking of speccing things out like below:
- mogilefs storage nodes - 2GB RAM, Intel Atom (1.6GHz)
- app servers - 8GB RAM, AMD Athlon II X2 (2.8GHz)
- reverse proxies & control server - 4GB RAM, AMD Athlon II X2 (2.8GHz)
- dbs - 8GB RAM, AMD Phenom II X6 (2.8GHz)
All would have 7.2krpm disks. There's not a huge amount of data in the database, so it can basically all be held in the buffer pool. Plus we only see around a 15% memcached miss rate, so there's not a huge load on the db.
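To put a rough number on "not a huge load", here's a back-of-envelope calculation, assuming around 20 memcached lookups per page view (my guess - substitute your real figure):

```shell
page_views_per_sec=7
lookups_per_view=20   # assumption: average memcached lookups per page view
miss_rate_pct=15      # observed memcached miss rate

# Queries that fall through to MySQL per second
db_qps=$(( page_views_per_sec * lookups_per_view * miss_rate_pct / 100 ))
echo "${db_qps} db queries/sec"   # prints "21 db queries/sec"
```

A few dozen queries per second is trivial for a db whose working set fits in the buffer pool, which supports the spec above.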
A future stage would be round-robin DNS with everything mirrored to a different data centre.
Is there anything missing from this topology? Has anyone done anything similar with any of the components? Do the machines seem like they're under-/over-specced?
Thanks
EDIT
A bit more info:
7 simultaneous page views per second to be served by apache - a lot of the cms content is cached anyway, on disk and in memcached where possible. 840 static files need serving per second - but this may be an overestimate, since with far-future expiry dates only the fraction of page views arriving with cold client caches will actually hit the server.
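As a sanity check on the 840 files/sec target, here's the implied bandwidth, assuming an average static file of ~30 KB (an assumption - measure your own mix):

```shell
files_per_sec=840
avg_file_kb=30   # assumption: average static file size in KB

# Sustained outbound bandwidth in Mbit/s (KB/s * 8 bits / 1000)
mbit=$(( files_per_sec * avg_file_kb * 8 / 1000 ))
echo "~${mbit} Mbit/s sustained"   # prints "~201 Mbit/s sustained"
```

Worth checking that figure against the uplink and the NICs as well as the disk/cache speccing.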
Only admins will upload static content to the mogilefs storage nodes - they might upload ~100 files per day. I'm new to mogilefs; the nodes will just use commodity disks (7.2krpm).
This content will then be accessed via http://static*.ourdomain... Nginx will proxy requests for this content and cache it locally, so while the first retrieval may be a little slow, subsequent retrievals will come from the nginx cache.
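The nginx side of that could be sketched roughly as below (the cache zone name, cache path, upstream name and sizes are all placeholders, not tuned values):

```nginx
# "static_cache" and "mogile_backend" are placeholder names
proxy_cache_path /var/cache/nginx/static levels=1:2 keys_zone=static_cache:50m
                 max_size=10g inactive=7d;

server {
    listen 80;
    server_name static1.ourdomain static2.ourdomain;

    location / {
        # Fetch from the mogilefs-backed upstream on a cache miss
        proxy_pass http://mogile_backend;
        proxy_cache static_cache;
        proxy_cache_valid 200 7d;

        # Far-future expiry so clients rarely re-request the same file
        expires max;
    }
}
```

With expires max set here, only cold-client-cache page views generate static requests at all, which is why the 840/sec figure may be pessimistic.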