I'm building an application that needs to distribute a standard file server across a few sites over a WAN. Basically, each site needs to write a lot of miscellaneous files of varying size (some in the hundreds of MB range, but most small), and the application is written such that collisions aren't a problem. I'd like to have a system set up that meets the following qualifications:
- Each site can store files in a shared "namespace". That is, all the files would show up in the same filesystem.
- Each site would not send data over the WAN unless necessary. I.e., there would be local storage on each side of the WAN that would be "merged" into the same logical filesystem.
- Linux & free ($$$) is a plus
Basically, something like a central NFS share would meet most of the requirements; however, it would not allow locally written data to stay local. All data from the remote sides of the WAN would be copied over the link all the time.
I have looked into Lustre and have run some successful tests with it; however, it appears to distribute files fairly uniformly across the distributed storage. I have dug through the documentation and have not found anything that will automatically "prefer" local storage over remote storage. Even something that went with the lowest-latency storage would be fine; it would work most of the time, which would meet this application's requirements.
Some answers to some questions asked below:
- Server nodes: 2 or 3 to start. Each server would have dozens of clients connecting for simultaneous reads/writes.
- WAN Topology is full mesh and reliable. (large corporation, cost isn't as limiting as red tape)
- Client failover: I actually hadn't thought about having the clients fail over (mostly because our current application doesn't do this even at a single site). I suppose the practical answer is that the servers at each geographically distributed site are expected to be single points of failure for the clients they are serving. Though, if you are thinking about something specific here, I think it would be quite germane to the discussion.
- Roll-my-own: I have thought about rsync/unison; however, I would need quite a bit of fancy logic to make the "dynamic" part of this work seamlessly. I.e., a file appears to be local but is only retrieved on demand.
- MS-DFS: It certainly appears to be something I should look into. My main concern is uncertainty about NFS server configuration/reliability/performance on Windows, as many of the connecting clients are NFS clients.
Shame about the Linux requirement. This is exactly what Windows DFS does. Since 2003 R2, it does it on a block-level basis, too.
Some questions:
How many "server" nodes are you thinking about having participate in this thing?
What's the WAN connectivity topology like-- hub and spoke, full mesh? How reliable is it?
Do you expect clients to failover to a geographically non-local server in the event the local server fails?
Windows DFS-R would certainly do what you're looking for, albeit at some potentially hefty licensing costs.
You say that collisions aren't a problem and you don't need a distributed lock manager, so you could do this with userland tools like rsync or Unison and just export the resulting corpus of files with NFS to the local clients. It's ugly, and you'd have to knock together some kind of system to generate a replication topology and actually run the userland tools, but it would certainly be cheap as far as licensing costs go.
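To make that concrete, here's a minimal sketch of the kind of thing I mean, assuming two sites and made-up hostnames/paths: each site writes into its own subdirectory of the shared tree, a cron job pulls the other site's subdirectory across, and the whole tree is exported over NFS to the local clients.

    # Rough sketch only -- hostnames and paths are hypothetical.
    # /etc/exports on each site's server, exporting the merged tree locally:
    #   /srv/shared  10.0.0.0/8(rw,sync,no_subtree_check)

    # Cron job at site A, pulling site B's subdirectory
    # (run the mirror image of this at site B):
    rsync -az --partial --delete \
        siteb.example.com:/srv/shared/site-b/ /srv/shared/site-b/

Since collisions aren't a problem, giving each site its own subdirectory keeps the replication strictly one-way per directory; Unison would be the tool to reach for if you really needed bidirectional sync of the same directory.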
Have you considered AFS?
As I understand it, most of the recent development has been behind the OpenAFS project.
I can't pretend to be familiar enough with the project to know if the "preferred locality" feature is available, but otherwise it sounds like a good fit.
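For what it's worth, the OpenAFS cache manager does let you rank file servers with fs setserverprefs (lower rank = more preferred), which is the closest thing I know of to a "prefer the local server" knob; as far as I know it only helps when reading replicated read-only volumes, though. A sketch with made-up server names:

    # Bias the AFS cache manager toward the local file server (hypothetical names).
    fs setserverprefs -servers afs1.sitea.example.com 5000 afs1.siteb.example.com 40000

    # Show the current ranks.
    fs getserverprefs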
Have you looked at OST pools in Lustre?
It won't be automatic, but with OST pools you can assign directories/files to specific OSTs/OSSes, giving you basically policy-based storage allocation rather than the default round-robin striping across OSTs.
So you could set up a directory per site and assign that directory to the local OSTs for that site, which will direct all I/O to the local OSTs. It will still be a global namespace.
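Roughly, the setup might look like this (the filesystem name "lustre" and the OST indices are made up for the example):

    # On the MGS: define a pool per site and add that site's OSTs to it.
    lctl pool_new lustre.site_a
    lctl pool_add lustre.site_a lustre-OST[0-1]
    lctl pool_new lustre.site_b
    lctl pool_add lustre.site_b lustre-OST[2-3]

    # On a client: bind each site's directory to its local pool, so files
    # created under it are allocated on that site's OSTs.
    lfs setstripe --pool site_a /mnt/lustre/site_a
    lfs setstripe --pool site_b /mnt/lustre/site_b

Bear in mind that reads of files created at the other site still cross the WAN, and the MDS is shared by everyone, so metadata traffic goes over the WAN for whichever sites aren't local to it.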
There's a lot of work going into improving Lustre over WAN connections (local caching servers and things like that) but it's all still under heavy development AFAIK.
Maybe NFS with CacheFS on the application servers would accomplish part of your goal. As I understand it, everything written will still go to the central server, but at least reads could end up being cached locally. This could potentially take a lot of delay off of reads, depending on your usage patterns.
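For example, assuming the Linux FS-Cache implementation (cachefilesd running on each application server), the fsc mount option turns on local read caching for an NFS mount; the server name and paths here are made up:

    # On the application server: enable local read caching for the NFS mount.
    # Requires the cachefilesd daemon to be running.
    mount -t nfs -o fsc,vers=3 central.example.com:/export/shared /mnt/shared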
Also, maybe UnionFS is worth looking into. With this, I think each location would be an NFS export, and then you could use UnionFS at each location to make the local storage and all the other locations' NFS mounts appear as one filesystem. I don't have experience with this, though.
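If you did try it, a very rough sketch of one site might look like this (mount points are made up, and this uses the out-of-tree unionfs dirs= branch syntax; unionfs-fuse and aufs differ slightly):

    # At site A: local storage read-write, the other site's NFS export read-only,
    # merged into a single view.
    mount -t nfs siteb.example.com:/export/shared /mnt/site-b
    mount -t unionfs -o dirs=/srv/local=rw:/mnt/site-b=ro unionfs /mnt/merged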
You could look into DRBD to replicate the disks: http://www.drbd.org/. This is a Linux high-availability solution that has just now made it into the kernel.
However, this has some limitations:
- Classic DRBD replicates between only two nodes.
- Only one node can mount the replicated device read-write unless you layer a cluster filesystem on top of it.
- It replicates at the block level, so every write still crosses the WAN; nothing stays local.
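For reference, a minimal two-node resource definition looks something like the following; the hostnames, devices, and addresses are made up, and protocol A is the asynchronous mode you would probably want over a WAN:

    # /etc/drbd.d/r0.res -- sketch only; hosts, devices, and IPs are hypothetical.
    resource r0 {
        protocol A;            # asynchronous replication, better suited to WAN latency
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;
        on sitea-server {
            address 10.1.0.10:7789;
        }
        on siteb-server {
            address 10.2.0.10:7789;
        }
    }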
If you want to keep it simple, then have a look at rsync; it solves a lot of problems and can be scripted.
Check out ChironFS.
Maybe it can do what you want, on a filesystem basis.
Btsync is another solution that I've had good experience with. It uses the BitTorrent protocol to transfer the files, so the more servers you have, the faster it is at synchronizing new files.
Unlike rsync-based solutions, it detects when you rename files/folders and renames them on all the nodes instead of doing a delete/copy.
Your btsync clients can then share the folders on a local network.
The only downside I found (compared to MS DFS) is that it will not detect a local file copy. Instead, it will interpret it as a new file and upload it to all the peers.
So far btsync seems to be the best synchronization solution, and it can be installed on Windows, Linux, Android, and ARM devices (e.g. NAS).
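If it helps, the way I set it up on Linux is roughly this (paths are placeholders, and the exact config keys can differ between btsync versions, so treat it as a sketch):

    # Generate a sample config, then point a shared_folders entry's "dir" at the
    # directory to sync and use the same folder "secret" on every server.
    btsync --dump-sample-config > /etc/btsync/sync.conf
    # ...edit /etc/btsync/sync.conf...
    btsync --config /etc/btsync/sync.conf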