I currently use Amazon S3 for much of my static file serving needs, but my monthly bill is getting very expensive. I did some rough calculations using the logs, and at peak times my most expensive Amazon bucket is handling about 100-180 Mbps of traffic, mostly images under 50 KB.
S3 is hugely helpful when it comes to storage and redundancy but I don't really need to be paying for bandwidth and GET requests if I can help it. I have plenty of inexpensive bandwidth at my own datacenter, so I configured an nginx server as a caching proxy and then primed the cache with the bulk of my files (about 240 GB) so that my disk wouldn't be writing like crazy on an empty cache.
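For context, the proxy setup is nothing exotic - essentially the standard nginx proxy_cache pattern in front of the bucket. A trimmed-down sketch of that kind of config (the hostname, bucket name and paths here are placeholders, with the cache sized to hold the full ~240 GB set):

    # sketch only - hostname, bucket and paths are placeholders
    proxy_cache_path /var/cache/nginx/s3 levels=1:2 keys_zone=s3cache:1g
                     max_size=300g inactive=30d;

    server {
        listen 80;
        server_name static.example.com;

        location / {
            proxy_pass        http://mybucket.s3.amazonaws.com;
            proxy_set_header  Host mybucket.s3.amazonaws.com;
            proxy_cache       s3cache;
            proxy_cache_key   $uri;
            proxy_cache_valid 200 30d;
            add_header        X-Cache-Status $upstream_cache_status;
        }
    }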
I tried cutting over and my server choked.
It looks like my disks were the problem - this machine has 4 x 1 TB SATA disks (Barracuda XT) set up in RAID 10. It's the only thing that I had on hand with enough storage space to be used for this. I'm pretty sure nginx was set up properly as I had already been using it as a caching proxy for another, smaller Amazon bucket. Assuming that this is a reasonable amount of traffic for a single machine, maybe an SSD would be worth a try.
If you handle large amounts of static file serving, what hardware do you use?
Additional information:
Filesystem: ext4, mounted with noatime,barrier=0,data=writeback,nobh (I have battery backup on the controller)
Nginx: worker_connections 4096, worker_rlimit_nofile 16384, worker_processes 8, open_file_cache max=100000 inactive=60m
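For completeness, those nginx settings sit in the config roughly like this (only the relevant lines shown; the server block is the caching proxy above):

    worker_processes      8;
    worker_rlimit_nofile  16384;

    events {
        worker_connections  4096;
    }

    http {
        open_file_cache  max=100000 inactive=60m;
        # ...caching-proxy server block goes here...
    }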
Your. Discs. Suck. Period.
Try getting a lot more, and a lot faster, discs. SAS works nicely here, as do Velociraptors.
That said, the best would be getting... an SSD.
Your discs probably do around 200 IOPS each. With SAS you can get that up to around 450, with Velociraptors to about 300. A high-end SSD can get you... 50,000 (no joke - I really mean fifty thousand) IOPS.
Do the math ;) Four discs in RAID 10 give you roughly 4 x 200 = 800 IOPS, so a single SSD, no RAID, would be about 62 times as fast as your RAID 10 ;)
I don't think your disks are the issue. First, nginx's cache uses a disk store, so disk speed is going to be one potential cause of issues depending on how hot or cold your dataset is. However, I see no reason you couldn't serve 100 Mbps with the hardware you've mentioned, especially since you're using nginx.
The first thing I would guess is that your number of worker processes was low, your worker_connections were probably way too low, and you probably didn't have open_file_cache set high enough. However, none of those settings would cause high I/O wait or a spike like that. You say you're serving images under 50 KB, and it looks like a quarter of your set could easily be buffered by the OS. Nginx is surely not configured optimally.
Varnish handles the problem in a slightly different way using RAM rather than disk for its cache.
Much depends on your dataset, but based on the data you've given, I don't see any reason for disk I/O to have spiked like that. Did you check dmesg and the logs to see if one of your drives encountered I/O errors at the time? The only other thing I can think of that might have caused that spike is exceeding nginx's file cache, which would have forced it into a FIFO mode, constantly opening new files.
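As a rough sanity check: 240 GB of sub-50 KB images is on the order of five million files or more, so a 100,000-entry open_file_cache can only ever hold a small slice of them. If that churn turns out to be the culprit, tuning the descriptor cache so that only genuinely hot files occupy it is worth experimenting with - something along these lines (values are illustrative, not a recommendation):

    http {
        # keep descriptors for files requested more than once; let one-offs pass through
        open_file_cache           max=100000 inactive=30m;
        open_file_cache_valid     60s;
        open_file_cache_min_uses  2;
        open_file_cache_errors    on;
    }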
Make sure your filesystem is mounted with noatime, which should cut a considerable number of write ops off your workload.
As an example of a machine that regularly handles 800 Mbps:
MRTG:
http://imgur.com/KYGp6.png
Dataset:
We're serving about 600 Mbps off a server with SSDs on the backend and nginx+varnish on the front. The actual processor is a little Intel Atom; we've got four of them behind an LB doing 600 Mbit/s each (using DSR). Perhaps not appropriate for every situation, but it's been perfect for our use case.
Does the machine you're using have enough RAM to keep the working set of files cached?
Also - have you looked at something like Varnish? Nginx is great at handling tons of connections, but it's not the last word in caching and raw system performance.
Add many more disks. You can trade single-disk speed for number of disks (up to a certain point): maybe you can get the same performance from X expensive 15k RPM SAS disks or from (guessing, not meaningful values) 2X cheap 7.2k RPM SATA disks. You have to do the math and see what's better for you - and that also depends on how much you pay for rack space and power at your datacenter.
SSDs will give you all the IOPS you'll need, but they're not cheap for bulk storage (that's why their primary use case is database-like workloads).