I created a sub-domain for static content so that it can be served more efficiently by several load-balanced web servers. This content is updated automatically, at a rate of ~1k files/day.
Right now I use rsync to update the servers in a master/slave fashion, but since the content has grown past 100k files and keeps growing, each sync takes more and more time and puts an increasing I/O load on both the master and the slaves.
I cannot use the solution I proposed on the Improve rsync performance question, since I cannot know which files were modified without stat-ing them all, and that wouldn't solve the increasing I/O cost anyway. I also have to handle file deletion.
I thought about using something like a read-only NFS export on the slaves, but that would somewhat defeat the load balancing and introduce a gratuitous SPOF.
By the way, the servers run AIX, but I'm also interested in solutions for a more generic context.
You could consider DRBD with OCFS2 so that you can have master/master nodes. There is no SPOF because each node holds a local copy of the data.
You can also set up two NFS server nodes (DRBD master/standby, or DRBD master/master with load balancing). If you have many nodes, this is the best option.
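As a sketch only, a dual-primary DRBD resource (DRBD 8.x syntax) might look like this; the resource name, hostnames, addresses and block devices are placeholders:

```
resource static {
  protocol C;                 # synchronous replication, required for dual-primary
  net {
    allow-two-primaries;      # lets both nodes go primary and mount the OCFS2 volume
  }
  device    /dev/drbd0;
  disk      /dev/sdb1;        # backing block device (placeholder)
  meta-disk internal;
  on web1 { address 10.0.0.1:7788; }
  on web2 { address 10.0.0.2:7788; }
}
```

With `allow-two-primaries` you must use a cluster-aware filesystem such as OCFS2 on top of `/dev/drbd0`; a plain filesystem mounted on both nodes would corrupt itself.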
Why don't you just use a reverse proxy, such as Squid?
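A minimal accelerator (reverse-proxy) sketch for squid.conf, assuming the public name is `static.example.com` and the master web server is the origin (all names are placeholders):

```
# Listen as an accelerator for the static site
http_port 80 accel defaultsite=static.example.com

# Fetch cache misses from the master web server
cache_peer master.internal parent 80 0 no-query originserver name=master

acl static_site dstdomain static.example.com
http_access allow static_site
cache_peer_access master allow static_site
```

Each load-balanced node runs its own Squid, so the slaves never need a full copy of the content; they only cache what they actually serve.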
If you stick with rsync, make sure the file contents are not read on every sync (i.e., avoid --checksum). That way only the file lists and timestamps are compared, which should stay reasonably fast even with millions of files.
rsync -t preserves the modification timestamps on the destination, so subsequent runs can rely on the default size+mtime quick check. If that doesn't work, use --size-only (but be careful: a file that changes without changing its size will be missed).