In the last few months, I've been trying to find the best way to share the content of multiple websites across multiple web servers (12+), all running Apache+PHP. One of the biggest challenges I face is that we need to be able to read and write to the file system at all times, for all kinds of good and bad reasons (e.g. apps not in our control, WordPress sites managed via the web interface, etc.).
Here are some of the things I've tried and how they turned out:
- Rsync/duplicity/csync2: Only runs once per minute which means certain changes won't be shared across the cluster fast enough to prevent huge issues.
- inotify/incron: Too complicated considering the huge amount of files and directories to monitor. Also, it wasn't working very well with new files.
- GlusterFS: We had a 4-server Gluster backend and the performance, while definitely slow, was tolerable. Unfortunately, the Gluster client running on each web server crashed constantly, which then froze one of the 4 file servers for anywhere between 2 and 15 minutes. We reached out to Gluster Inc. to get some of their engineers to help us, but they were unable to figure out the problem. We had to give up after 3 months of usage.
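To give a rough idea of why the inotify approach gets unwieldy: inotify watches are not recursive, so every directory needs its own watch. Below is a minimal Python sketch that counts the directories under a docroot and reads the kernel's per-user watch limit for comparison. The function names are mine, just for illustration, and the limit is read from the standard Linux location `/proc/sys/fs/inotify/max_user_watches`; this is not the tooling we actually ran.

```python
import os

def count_directories(root):
    """Count root plus every readable directory below it.

    os.walk yields one tuple per directory it can list and does not
    follow symlinks by default, so this approximates the number of
    inotify watches a recursive watcher would have to register.
    """
    return sum(1 for _dirpath, _dirnames, _filenames in os.walk(root))

def read_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

Calling `count_directories()` on the shared docroot and comparing the result with `read_watch_limit()` shows how close a recursive watcher would come to the limit. New directories also need a watch added after they are created, which is one plausible reason new files can be missed.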
General information about our setup:
- Hosted on Amazon EC2
- Running Ubuntu Maverick
- Running Nginx (2) -> Varnish (2) -> Apache (12+)
- .htaccess is disabled for best performance. We add the directives directly to the sites' config files
- Most of the websites/apps we run are not ours and unfortunately, a read only environment is not possible
- High-availability with automatic fail-over is very important for us considering the task that these web servers are handling
So I think that covers everything :). Thanks in advance for your time and responses.
The "Least Worst" (TM) option here is NFS. I know that's tough to swallow. I've tried to avoid it with rsync, I've tried to avoid it with GFS, and I've tried to avoid it with incron/inotify. I've tried to avoid it by pushing developers to stop using the filesystem as a datastore. In the end, we really don't have a better option than NFS. Not because NFS is good (as you said, the HA part of it is rough), but because there's really no better option.
It's still technically 'beta', but I think BitTorrent Sync would be perfect for you. I don't have such an environment (multiple web servers), so I haven't tried it myself, but I have heard of others doing exactly this and being very happy with it. I do use it for distributed server backups, which is a similar use case from a technical perspective: http://wandin.net/dotclear/index.php?post/2013/07/17/Distributed-server-backups-with-btsync
http://labs.bittorrent.com/experiments/sync.html
What's wrong with good old NFS+DRBD (assuming this is on Linux)?