I'm looking into setting up a shared filesystem/file server on AWS (EC2) infrastructure that offers replication and fairly painless failover. This filesystem would host potentially millions of files that are a few megs in size. Those files would be accessed (read/write) from several client VMs. If the primary file server fails I'd want the clients to be able to failover to the replica file server without losing any files (i.e. I want replication to be real-time). I've looked at a few options:
- Use S3 with s3fs. I'm concerned that the latency of each request will be problematic when performing operations on thousands of files (e.g. when copying/moving files around). I've also seen reports that make me question s3fs's stability, though I'm not sure whether that's still the case.
- Set up an NFS server on an EC2 instance, using DRBD to replicate blocks between two instances. Downsides:
- I've had reliability issues with DRBD in the past, especially over high-latency links.
- If the primary NFS server goes down it will take down the clients with it, requiring sysadmin intervention and/or reboots to get them to reconnect to the secondary server. There's no auto-failover.
Are there any better solutions?
It is possible, though not trivial, to set up NFS clusters on Amazon EC2 using DRBD for synchronous replication and Pacemaker + Corosync to automate failover of the NFS service and its exports between nodes (without interrupting client access).
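To make that concrete, here's a minimal sketch of such a cluster in Pacemaker's crm shell syntax; the resource names, devices, mount point, and floating IP are all hypothetical:

```
# Sketch only: DRBD master/slave set, filesystem, NFS server, and floating IP,
# grouped so they always fail over together onto whichever node is DRBD master.
primitive p_drbd ocf:linbit:drbd \
    params drbd_resource=r0 \
    op monitor interval=15s
ms ms_drbd p_drbd \
    meta master-max=1 clone-max=2 notify=true
primitive p_fs ocf:heartbeat:Filesystem \
    params device=/dev/drbd0 directory=/srv/nfs fstype=ext4
primitive p_nfs ocf:heartbeat:nfsserver \
    params nfs_shared_infodir=/srv/nfs/nfsinfo
primitive p_ip ocf:heartbeat:IPaddr2 \
    params ip=10.0.0.100 cidr_netmask=24
group g_nfs p_fs p_nfs p_ip
colocation c_nfs_on_master inf: g_nfs ms_drbd:Master
order o_drbd_first inf: ms_drbd:promote g_nfs:start
```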
If you're planning on replicating synchronously ("real-time"), you'll need both EC2 instances in the same availability zone to limit the latency between them; otherwise, that network latency translates directly into disk latency.
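In DRBD terms, synchronous replication is "protocol C" (a local write completes only once the peer has acknowledged it), which is exactly why the inter-node latency matters. A minimal resource definition, with hypothetical hostnames, device, and addresses:

```
resource r0 {
    protocol C;          # synchronous: writes wait for the peer's acknowledgement
    device    /dev/drbd0;
    disk      /dev/xvdf; # hypothetical EBS-backed block device
    meta-disk internal;
    on nfs-a {
        address 10.0.0.10:7788;
    }
    on nfs-b {
        address 10.0.0.11:7788;
    }
}
```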
Also, it's not possible to simply assign/unassign an IP address on an EC2 instance from the OS; you need to go through their API (or the web console) to reassign it. A movable ("floating") IP address is what clients will use to connect to the active node, so some modification of the 'IPaddr2' Pacemaker resource agent (a bash script) will be required to make it call the API instead of configuring the interface directly.
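For example, in a VPC setup the modified agent could move a secondary private IP between the nodes' network interfaces with a call along these lines (the ENI ID and address are placeholders):

```
# Claim the floating IP for this node's interface; --allow-reassignment
# lets it be taken over even if the failed peer still holds it.
aws ec2 assign-private-ip-addresses \
    --network-interface-id eni-0123456789abcdef0 \
    --private-ip-addresses 10.0.0.100 \
    --allow-reassignment
```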
Given the complexity of setting up a replicated NFS server, we're opting to go with S3. The performance of s3fs-fuse was terrible: an 'ls' on a directory with over 1,000 files took close to a minute, because it queries metadata for each file, and caching didn't seem to help. I then tried RioFS, which gave instant responses to directory operations and felt very fast overall. I'm still planning to investigate a few more options (S3QL and YAS3FS in particular), but so far things look promising.
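For anyone wanting to compare the two, the mounts looked roughly like this; the bucket name and mount point are placeholders, and the RioFS credential variable names are to the best of my recollection:

```
# s3fs-fuse: reads credentials from ~/.passwd-s3fs by default
s3fs my-bucket /mnt/s3 -o use_cache=/tmp/s3fs-cache

# RioFS: reads credentials from environment variables
export AWSACCESSKEYID=AKIA...          # placeholder
export AWSSECRETACCESSKEY=secret...    # placeholder
riofs my-bucket /mnt/s3
```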
Just some updated information. If you are like me and have wanted this functionality for a VERY, VERY long time, use Amazon Elastic File System (EFS). It's a managed NFS share whose data is stored redundantly across multiple availability zones.
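Mounting it is a plain NFSv4.1 mount against the filesystem's DNS name; the filesystem ID and region below are placeholders, and the options are the ones AWS recommends:

```
# Mount an EFS filesystem from an EC2 instance in the same VPC
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
```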
(Sorry to bump an old question, but its Google rank is high enough that people are probably still searching for this solution.)