I am trying to setup a fileserver in different geographical locations on a centos based linux server. Currently I am planning to have two such servers now and will be extending that to other areas in the near future. The file servers should be mirroring itself when a file is added in any of the location(I am yet to arrive at a deletion strategy, but just thinking should allow deletion of files when it is deleted from the primary server). Currently I am thinking I will have apache "directory listing" and rsync to do this work. I just want to know if there are any better tools to do the above. Also I would like to hear some suggestions on a better directory listing script(php/python based). It would be good if this tools has some search capabilities, options to upload files, etc(Am I asking too much ? ;) ).
Note: The current server also hosts a subversion replication. I also thought about committing all files into subversion and get it checked out in the secondary location. But I feel space would be a constraint as I would have keep removing some unwanted files so that I will have diskspace under control, this will not be possible as svn's version history will hold the data
Thanks in advance.
This is an incredibly hard problem to solve in the general case. Geographically-distributed, multi-master replicated filesystems are a topic you can get a PhD in even when you don't solve the whole problem, so a small snippet of PHP or Python is unlikely to get very far.
If you're only handling file adds (no modifications) and there's no possibility of filename collision, the problem becomes much easier and you can get away with a small shell script. Be warned, though, that this isn't a common situation -- you might think it is now, but I'll bet the users' ideas are different.
My advice: find someone who knows about this sort of thing and give them some money to perform a thorough requirements analysis and come up with a solution.
If "fileserver" means users map drives to this server through something like Samba or NFS, that's the very hard problem Womble described so well. I've seen some systems get kind of close to this, but they don't involve mounted volumes; they use a specific client on every directory-tree involved in the replication scheme and use some complex collision-detection algorithms to make sure things don't get trod on. And multiple-open files like Access databases simply don't work well in this circumstance.
If "fileserver" means a static-file server for a dynamic website, that's much easier. DRBD and Rsync were designed for that kind of workload. That you're having to do a lot of hand-holding suggests something else might be going on, though.
GlusterFS is not a good solution for WAN.
I can only recommend DRBD (Will need to purchase DRBD Proxy), or look at csync2.
I believe you can use something like inotify to trigger csync2 or use lsyncd.
HTH
Brent
Finally I had done this. Have setup a rsync. Along with it I extplorer which provides webbased filemanager capabilities. With these I was able to solve the above mentioned problem, I am yet to move it to production but it is running successfully for the last 4 days.
PS: As per advise, let me go try my PHD :)