I'm considering migrating from ext3 to ZFS for data storage on my Debian Linux host, using ZFS on Linux. One killer feature of ZFS that I really want is its data integrity guarantees. The ability to trivially grow storage as my storage needs increase is also something I'd look forward to.
However, I also run a few VMs on the same host. (Though normally, in my case only one VM is running on the host at any one time.)
Considering ZFS's data checksumming and copy-on-write behavior, together with the fact that the VM disk images are comparatively huge files (my main VM's disk image file currently sits at 31 GB), what are the performance implications inside the VM guest of such a migration? What steps can I take to reduce the possible negative performance impact?
I can live with weaker data integrity guarantees on the VM disk images if necessary (I don't do anything really critical inside any of the VMs) and can easily separate them from the rest of the filesystem, but it would be nice not to have to turn off (even selectively) the very feature that most makes me want to migrate to a different file system.
The hardware is pretty beefy for a workstation-class system, but won't hold much of a candle to a high-end server (32 GB RAM with rarely >10 GB in use, 6-core 3.3 GHz CPU, currently 2.6 TB usable disk space according to df
and a total of about 1.1 TB free; migrating to ZFS will likely add some more free space) and I'm not planning on running data deduplication (as turning on dedup just wouldn't add much in my situation). The plan is to start with a JBOD configuration (obviously with good backups) but I may move to a two-way mirror setup eventually if conditions warrant.
Since ZFS works at the block level, the size of the files makes no difference. ZFS requires more memory and CPU, but it is not inherently significantly slower as a filesystem. You do need to be aware that RAIDZ is not equivalent in speed to RAID5; a striped mirror (RAID10-style) layout is fine where speed is a priority.
ZFS on decent (i.e. beefy) hardware will likely be faster than other file systems. You will likely want to put the ZIL on a fast (i.e. SSD) device. The ZIL is essentially a place to log writes before they hit the main pool (more like a journal in ext3/4 than a cache). This lets the box acknowledge writes as committed before the actual spindles have the data.
You can also create an L2ARC on SSD as a read cache. This is fantastic in a VM environment, where you can bring physical disks to their knees by booting several VMs at the same time.
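For example (the pool and device names here are hypothetical), adding a separate log device and a cache device to an existing pool looks roughly like this:

```
# "tank" and the device names are placeholders; adjust for your system.
zpool add tank log /dev/sdb     # SSD as a separate intent log (SLOG) for synchronous writes
zpool add tank cache /dev/sdc   # another SSD as an L2ARC read cache
```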
Drives go into VDEVs, and VDEVs go into zpools (please use whole disks rather than partitions). If this is a smaller system you may want to have a single zpool and (if you are not too concerned about data loss) a single VDEV. The VDEV is where you select the redundancy level (and you can use mirrored VDEVs if you've got enough disks). The slowest disk in a VDEV determines how fast the entire VDEV is.
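A minimal sketch of that, with hypothetical disk names: a pool built from one two-way mirror VDEV, grown later by striping in a second mirror:

```
# Create a pool named "tank" (placeholder) from a single two-way mirror VDEV:
zpool create tank mirror /dev/sdb /dev/sdc

# Grow it later by adding a second mirror VDEV (the pool stripes across VDEVs):
zpool add tank mirror /dev/sdd /dev/sde

zpool status tank
```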
ZFS is all about data integrity - the reason a lot of the traditional file system maintenance tools (like fsck) don't exist is that the problems they solve can't occur on a ZFS file system.
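The closest equivalent to a periodic consistency check is a scrub, which reads every block and verifies it against its checksum (pool name hypothetical):

```
zpool scrub tank        # walk all data and verify/repair against checksums
zpool status -v tank    # show scrub progress and any files with unrecoverable errors
```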
IMO the biggest drawback of ZFS is that if your pool approaches full (say 75%+ used) it gets VERY slow. Just don't go there.
31 GB really isn't big at all...
Anyway, depending on the file system you are currently using, you may find ZFS is slightly slower but given your hardware specs it may be negligible.
Obviously ZFS will use a good chunk of RAM for caching, which may make your VMs seem 'snappier' in general use (when not doing heavy reading or writing). I'm not sure how ZFS is tuned on Linux, but you may need to limit its ARC, if possible, to stop it running away with all your RAM (seeing as you'll want a decent chunk left over for your host system and VMs).
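On ZFS on Linux the ARC cap is the zfs_arc_max module parameter; as a sketch (the 8 GiB figure is just an example for your 32 GB box):

```
# Persistent: cap the ARC at 8 GiB (value in bytes), applied at module load.
echo "options zfs zfs_arc_max=8589934592" >> /etc/modprobe.d/zfs.conf

# Immediate, on a running system:
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
```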
I would enable compression (the advice these days is to turn it on unless you have a good reason not to). Remember it only applies to data written after it is enabled, so turn it on before putting data on the file system. Most people are surprised to find it's actually quicker with it on, as the compression algorithms generally run faster than disk IO. I doubt it will cause much of a performance issue with your 6-core processor. I wasn't expecting VMs to compress much, but I managed to turn ~470 GB of VM data into 304 GB just with the default compression setting.
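Enabling it is a single property; for example (dataset name hypothetical, lz4 being the usual recommendation on recent ZFS on Linux):

```
zfs set compression=lz4 tank/vms   # affects newly written data only
zfs get compressratio tank/vms     # see how well the data actually compresses
```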
Don't bother with dedupe, it will just come back to haunt you later on and you'll spend weeks shuffling data around trying to get rid of it.
If you do encounter performance problems then the obvious answer is to add an SSD as ZIL or L2ARC, or even both. It's not ideal to use one device for both, but it will most likely still improve performance on a pool containing a small number of disks/vdevs.
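If you only have one SSD, a common compromise (device and partition names hypothetical) is a small log partition plus a larger cache partition:

```
# A few GB is usually plenty for the log; give the rest of the SSD to L2ARC.
zpool add tank log /dev/disk/by-id/ata-SSD-part1
zpool add tank cache /dev/disk/by-id/ata-SSD-part2
```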
To add: I would really try to start with a redundant configuration if possible (ideally mirrors), or convert to mirrors from a stripe as soon as possible. While ZFS will checksum all data and detect errors on the fly (or during a scrub), it won't be able to do anything about them (short of using copies=2, which doubles disk usage). You'll just be left with it telling you there are errors in files (probably your VM disk images) which you won't be able to do a lot about without deleting and re-creating those files.
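Converting a single-disk stripe into a mirror later is a one-liner with zpool attach (device names hypothetical):

```
# Attach a new disk to the existing single-disk VDEV, turning it into a two-way mirror:
zpool attach tank /dev/sdb /dev/sdd
zpool status tank   # watch the resilver complete
```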
Depending on your use cases and VMs, I would consider the following: let the host operating system take care of the files you are storing on the ZFS volumes.
If possible, create just a LUN for every VM, containing only the operating system and the necessary binary files, and present the storage space for individual data as shares via NFS, Samba, or iSCSI (or as zvols, as mentioned in the comments). That way ZFS is able to keep track of every file, with checksums, access times etc. Of course, if speed is not so important you could also enable compression on some datastores. The benefit is that you avoid an extra filesystem layer. If you create a LUN for a second virtual hard drive and put an NTFS filesystem on top of that, ZFS has to handle a big binary blob and does not know anything about the contents or files, and therefore can't take advantage of the ZIL or ARC cache in the same way plain files could.
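As a sketch of that layout (pool and dataset names hypothetical): a zvol per VM for the OS disk, plus a plain dataset exported over NFS for the data:

```
# A 40 GB zvol to present to the VM as its OS disk (e.g. via virtio or iSCSI):
zfs create -V 40G tank/vms/vm1-os

# A normal dataset for the VM's data, shared over NFS so ZFS sees individual files:
zfs create tank/shares/vm1-data
zfs set sharenfs=on tank/shares/vm1-data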
As for ACLs, ZFS can use ACLs via NFSv4 or Samba (if enabled). I have to admit that I use ZFS on FreeBSD and cannot say for sure how to enable Samba ACLs mapped onto ZFS volumes on Linux, but I am sure this should not be a big deal.
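On ZFS on Linux (as opposed to FreeBSD's native NFSv4 ACLs), the usual knobs for Samba are POSIX ACLs stored as extended attributes, roughly (dataset name hypothetical):

```
zfs set acltype=posixacl tank/shares/vm1-data
zfs set xattr=sa tank/shares/vm1-data   # store ACLs/xattrs efficiently in the dnode
```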
Deduplication in combination with a read cache is a big advantage when it comes to saving some space and improving massive reads (a boot storm), as all the VMs begin to read the same blocks.
The same goes for ZFS snapshots of the VMs and the datastores. You can create a simple shell script to freeze the VM, take a snapshot of the VM and the datastore, and continue working; or take a snapshot of the datastore alone, clone it, present the clone as a copy of the original VM, and test some stuff.
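A minimal sketch of such a script, assuming libvirt/virsh manages the VM (the VM and dataset names are placeholders):

```
#!/bin/sh
# Pause the VM, snapshot its datastore, resume, and optionally clone for testing.
VM=vm1                      # placeholder VM name
DS=tank/vms/vm1             # placeholder dataset holding its disk image
SNAP="$DS@$(date +%Y%m%d-%H%M%S)"

virsh suspend "$VM"         # freeze the guest so the image is consistent
zfs snapshot "$SNAP"
virsh resume "$VM"

# Later: clone the snapshot into a writable copy and boot it as a test VM.
zfs clone "$SNAP" tank/vms/vm1-test
```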
The possibilities are endless with ZFS ;)
EDIT: Hopefully I have explained it a bit better now.
EDIT2: Personal opinion: consider using RAIDZ2 (comparable to RAID6), as you can then withstand a double disk failure! Keeping a single spare disk around never hurts, but tolerating two disk failures should leave enough time for a quick reaction. I just posted my script for monitoring the disk status here.
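For reference, a RAIDZ2 pool and a quick health check look roughly like this (disk names hypothetical):

```
# Any two of these six disks can fail without data loss:
zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg

# Handy for a cron-based monitoring script:
zpool status -x   # prints "all pools are healthy" when everything is fine
```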