Currently we use an iSCSI SAN as storage for several VMware ESXi servers. I am investigating the use of an NFS target on a Linux server for additional virtual machines. I am also open to the idea of using an alternative operating system (like OpenSolaris) if it will provide significant advantages.
What Linux-based filesystem favours very large contiguous files (like VMware's disk images)? Alternatively, how have people found ZFS on OpenSolaris for this kind of workload?
(This question was originally asked on SuperUser; feel free to migrate answers here if you know how).
I'd really recommend you take a look at ZFS, but to get decent performance you're going to need to pick up a dedicated device for the ZFS Intent Log (ZIL). Basically this is a small device (a few GB) that can write extremely fast (20-100K IOPS), which lets ZFS immediately confirm that writes have been synced to storage but wait up to 30 seconds to actually commit the writes to the hard disks in your pool. In the event of a crash or outage, any uncommitted transactions in the ZIL are replayed upon mount. As a result, in addition to a UPS you may want a drive with an internal power supply/super-capacitor so that any pending IOs make it to permanent storage in the event of a power loss. If you opt against a dedicated ZIL device, writes can have high latency, leading to all sorts of problems. Assuming you're not interested in Sun's 18GB write-optimized SSD "Logzilla" at ~$8200, some cheaper alternatives exist:
Once you've got OpenSolaris/Nexenta + ZFS set up, there are quite a few ways to move blocks between your OpenSolaris and ESX boxen; what's right for you depends heavily on your existing infrastructure (L3 switches, Fibre cards) and your priorities (redundancy, latency, speed, cost). But since you don't need specialized licenses to unlock iSCSI/FC/NFS functionality, you can evaluate anything you've got hardware for and pick your favorite:
If you can't spend $500 for evaluation, test with the ZIL enabled and then disabled to see if the ZIL is a bottleneck. (It probably is.) Don't disable it in production. Don't mess with ZFS deduplication just yet unless you also have lots of RAM and an SSD for L2ARC. It's definitely nice once you get it set up, but you should definitely do some NFS tuning before playing with dedup. Once you have it saturating 1-2 Gb links, there are growth opportunities in 8 Gb FC, 10 GigE and InfiniBand, but each requires a significant investment even for evaluation.
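To put a number on the "is the ZIL a bottleneck" test, here is a minimal sketch that times small writes with and without an `fsync()` after each one. It is only an illustration: `time_writes` and `mean_ms` are names I made up, the block size and count are arbitrary, and it assumes you run it from a Linux client against a directory on the NFS export (it defaults to the local temp directory). Forced-to-disk writes are the ones the intent log has to absorb, so a large gap between the two figures suggests a faster log device would help.

```python
import os
import sys
import tempfile
import time


def time_writes(directory, count=200, block_size=4096, sync=True):
    """Write `count` blocks to a scratch file in `directory`.

    Returns a list of per-write latencies in seconds. With sync=True each
    write is followed by os.fsync(), so the call does not return until the
    server reports the data as being on stable storage.
    (Hypothetical helper, not part of any ZFS or VMware tooling.)
    """
    payload = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="syncbench-")
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            if sync:
                os.fsync(fd)
            latencies.append(time.perf_counter() - start)
    finally:
        # Always clean up the scratch file, even if a write fails.
        os.close(fd)
        os.remove(path)
    return latencies


def mean_ms(latencies):
    """Average latency in milliseconds."""
    return 1000.0 * sum(latencies) / len(latencies)


if __name__ == "__main__":
    # Pass the NFS mount point as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    for sync in (False, True):
        lat = time_writes(target, sync=sync)
        print(f"sync={sync}: mean {mean_ms(lat):.3f} ms per write")
```

Run it once against the pool as configured and once with the change you are evaluating, and compare the `sync=True` figures. It is a crude single-threaded test, not a substitute for measuring from inside a real VM.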
I wouldn't do exactly this. In my experience, Linux (specifically CentOS 3/4/5) is a generally poor choice for an NFS server. I have had several and found that under load, latency tends to climb and throughput to drop for reasons we could never quite get our heads around.
In our case, we were comparing Linux's performance back-to-back against Solaris (on UltraSPARC) and NetApp, both of which returned better results, both in apples-to-apples performance and in the more nebulous terms of "engineers not complaining nearly as much about latency when the server was under load". There were multiple attempts to tune the Linux NFS server; both the NetApp and Solaris systems ran as-is, out of the box. And since both the Solaris and NetApp systems involved were older, the Linux servers could be argued to have had every advantage and still failed to be convincing.
If you have the time, it would be a worthwhile experiment to set up the same hardware with OpenSolaris (now that Solaris is effectively too expensive to use), Linux, and perhaps a BSD variant or two, and race them. If you can come up with some performance metrics (disk I/O counts in a VM hosted off the store, for example), it might make for an interesting white paper or internet article. (If you have the time.)
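If you do race them, it helps to reduce each run to the same few numbers. The following is a rough sketch of one way to do that, assuming you have collected per-operation latencies (in seconds) from an identical test inside a VM on each candidate store. The function names, and the choice to rank by 99th-percentile latency, are my own assumptions rather than any standard methodology.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list, for pct in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_s):
    """Reduce one run to IOPS plus median and tail latency.

    IOPS here is operations divided by summed latency, which is only
    meaningful when the operations were issued one at a time.
    """
    return {
        "iops": len(latencies_s) / sum(latencies_s),
        "p50_ms": 1000.0 * percentile(latencies_s, 50),
        "p99_ms": 1000.0 * percentile(latencies_s, 99),
    }


def rank_servers(results):
    """Map {server_name: [latency_s, ...]} to (name, summary) pairs.

    Sorted by p99 latency, lowest first, since tail latency under load
    is the thing people tend to complain about.
    """
    summaries = {name: summarize(lat) for name, lat in results.items()}
    return sorted(summaries.items(), key=lambda item: item[1]["p99_ms"])
```

Feed it something like `rank_servers({"opensolaris": [...], "linux": [...], "freebsd": [...]})` with the samples from each run. Keep the hardware, network path and VM identical between runs, or the comparison tells you very little.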
Regarding NFS in general, the NetApp people told me several times that their benchmarks showed NFS cost only 5 to 10% in performance for VMs, and that if your application was sensitive enough for this to be a problem, you shouldn't be virtualizing it in the first place.
But I should confess that after all that time and tears, our non-local production VM stores are all fed by iSCSI, mostly from NetApp.
We're using OpenSolaris 2009.06 with a RAID 10 ZFS config to provide NFS to our VMware ESXi server. It works fairly well for our needs so far. We are using RAID-class SATA drives (Seagate ES.2 1TB). We still have some tuning to do, however.
I am a big fan of NFS datastores for VMware; NetApp has an excellent implementation.
TR-3808, which compares the scaling of NetApp FC, iSCSI, and NFS connected shared datastores, is an excellent read.
You might want to consider the 3+ year-old ZFS ARC bug that still persists before jumping in too deep with ZFS...
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6522017
(This one is nasty as it will also go out-of-bounds from the VM limits of a hypervisor!)