I'm planning on setting up a XenServer machine that uses a NexentaOS/ZFS-based SAN for storage of the Virtual Disk Images (VDIs) through iSCSI. I know I could just set up a big Storage Repository (SR) on the SAN and let XenServer take care of snapshotting and cloning disk images. However, I'd love to tap more into the power of ZFS and use that for snapshotting/cloning, for a few reasons:
- I'm not sure how XenServer's snapshotting/cloning works, but if it's based on LVM, I'm concerned I'd run into issues when dealing with multiple snapshots. A while ago I did some experiments with multiple LVM snapshots of the same data, and performance seemed poor and the snapshots wasted a lot of space. ZFS snapshots appear to be far superior to LVM snapshots in both respects.
- The SAN would be taking automatic (and efficient) periodic ZFS snapshots that could go back in time a while, and I'd love to be able to revert a VM to such ZFS snapshot.
Would letting ZFS handle snapshotting/cloning instead of doing it through XenServer be advisable, and if so, what's the best way to go about it? If I put all VDIs inside a single large SR and take ZFS snapshots of the entire SR, I would not be able to roll back individual VMs. I could create one SR per VDI; then, if I had to roll back a VDI, I'd carefully detach the SR, roll it back on the SAN, and re-attach it. However, I'm guessing I'd run into problems when attaching a cloned SR if XenServer detects duplicate SR UUIDs. Are there better ways to handle cloning or rolling back to previous snapshots from the SAN?
As other answers alluded to, the ideal approach is LUN-per-VDI. At first it didn't look like it was possible to do this, but there is an undocumented "iscsi" SR driver that will create a LUN-per-VDI SR (I found this when looking through the /opt/xensource/sm directory - see the ISCSISR.py file). You essentially set up one SR for each iSCSI target, and XenServer creates a VDI for each LUN on that target. You can only set this up through the command line, including creating the VBDs and attaching them to VMs. The VBDs and VDIs don't even show up in XenCenter.
Here's a sample command to set it up:
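A command along these lines should do it (the target address and IQN below are placeholder values for your SAN; substitute your own):

```shell
# Create an SR using the undocumented LUN-per-VDI "iscsi" driver.
# 192.168.1.100 and the IQN are example values - use your SAN's.
xe sr-create name-label="ZFS SAN (LUN-per-VDI)" type=iscsi \
    device-config:target=192.168.1.100 \
    device-config:targetIQN=iqn.2010-01.com.example:storage.zfs-san
```

The command prints the UUID of the new SR, which you'll need for the scan and VBD commands later.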
This will automatically create a VDI for each iSCSI LUN. If you end up adding a new iSCSI LUN on the SAN, XenServer will add a new VDI for it after executing the following command:
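For example (substitute the UUID printed by sr-create, or look it up with sr-list):

```shell
# Rescan the SR so XenServer discovers newly added LUNs as VDIs.
xe sr-scan uuid=<sr-uuid>
```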
This also works when adding VDIs for cloned LUNs - the new VDI gets assigned a new UUID.
Also, if you end up resizing a LUN, XenServer does not automatically pick up on that, so you'd have to execute the following:
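One sequence that should work (this exact sequence is an assumption: forget the stale VDI record so the rescan rediscovers the LUN at its new size):

```shell
# Drop XenServer's cached record of the VDI (does not touch the LUN's data),
# then rescan so the VDI reappears with the updated size.
xe vdi-forget uuid=<vdi-uuid>
xe sr-scan uuid=<sr-uuid>
```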
And to create a VBD and attach it to a VM:
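Roughly like this (device=1 is just an example slot; pick one not already in use on the VM):

```shell
# Create a VBD linking the VDI to the VM, then plug it in.
xe vbd-create vm-uuid=<vm-uuid> vdi-uuid=<vdi-uuid> \
    device=1 bootable=false mode=RW type=Disk
# vbd-create prints the new VBD's UUID; use it here:
xe vbd-plug uuid=<vbd-uuid>
```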
I've done several configurations with XenServer similar to this setup, and I've used one of two methods:
If I don't have many VMs, I create the system disk as a VDI per VM and the data disk as a directly attached iSCSI LUN.
If I have lots of VMs (20+), I create one big SR, and when I need to roll back I can:
A. rename the VG before connecting it to XenServer (vgrename on a different machine, even a virtual one), or
B. attach the big SR snapshot to a virtual machine and export it again over iSCSI.
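Option A can be sketched like this (this assumes an LVM-based SR, where XenServer names the volume group VG_XenStorage-&lt;sr-uuid&gt;; the UUIDs are placeholders):

```shell
# On a separate machine (even a VM) with the cloned LUN attached:
vgscan                # discover the cloned volume group
# Rename the VG so it won't collide with the original SR's VG
# when presented back to XenServer:
vgrename VG_XenStorage-<old-sr-uuid> VG_XenStorage-<new-uuid>
```

You would then introduce the renamed volume to XenServer as a new SR rather than re-attaching it under the old UUID.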
I hope it's not too complicated :)
Interesting configuration, especially for ZFS.
In my experience with XenServer, the best you can do is let the storage system manage the disks (and their snapshots and other administrative tasks).
I think you should let ZFS handle the snapshots and cloning, but I'm not sure the best option is giving XenServer one big SR (I agree with what you said about rollback). It's a complex setup: if you have a lot of VMs you'll have a lot of VDIs, and the administration could be a mess, but you gain the ability to roll back individual machines. I'd use one VDI per VM.
Answering your question: if you double-check that you detach the SR before rolling back, you won't have any problems attaching a cloned SR. I've done that before, not in XenServer but in XCP (http://goo.gl/4wfE).
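The detach / roll back / re-attach cycle looks roughly like this (the zvol name is a made-up example; run the zfs command on the SAN, the xe commands on the XenServer host):

```shell
# 1. On the XenServer host: unplug the SR's PBD so nothing holds the LUN open.
xe pbd-unplug uuid=<pbd-uuid>

# 2. On the NexentaStor SAN: roll the backing zvol back to a snapshot.
zfs rollback tank/vdi-vm01@before-upgrade

# 3. Back on the XenServer host: re-plug the PBD.
xe pbd-plug uuid=<pbd-uuid>
```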
Our customers (Nexenta customers) who do not have very many VMs, fewer than 1000 mostly, choose to do a LUN per VM at times. This allows you to snapshot each individual VM easily from NexentaStor. What's cool is that you can create one golden image, aka a clone-master, then use that clone-master to spin up new VMs by simply cloning it. The added benefit is that a clone initially uses no additional disk space.
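At the ZFS level the clone-master workflow boils down to a snapshot plus clones (the pool and dataset names here are illustrative):

```shell
# One-time: snapshot the golden-image zvol.
zfs snapshot tank/golden-vm@v1

# Per new VM: clone the snapshot - instant, and initially consumes no
# extra space because blocks are shared with the golden image.
zfs clone tank/golden-vm@v1 tank/vm-web01
zfs clone tank/golden-vm@v1 tank/vm-web02
```

Each clone is then exported as its own iSCSI LUN (a NexentaStor-side step not shown here) and attached to its VM.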
This seems relevant. The bash script below has functions for attaching and detaching a SR as well as renaming all the UUIDs in an existing (for example SAN cloned) SR. It could be used to rename the UUIDs of an old clone and attach it without conflicting with the latest version of the volume being attached.
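A sketch of what such a script can look like (the PBD/SR handling uses standard xe commands; the re-UUID step assumes an LVM-based SR whose volume group is named VG_XenStorage-&lt;sr-uuid&gt;, and the function names are my own):

```shell
#!/bin/bash
# Sketch: detach an SR, and give a SAN-cloned LVM SR a fresh UUID so it
# can be attached alongside the original without a UUID conflict.

detach_sr() {    # usage: detach_sr <sr-uuid>
    local sr=$1 pbd
    # Unplug every PBD backing the SR, then drop the SR record.
    for pbd in $(xe pbd-list sr-uuid="$sr" --minimal | tr ',' ' '); do
        xe pbd-unplug uuid="$pbd"
    done
    xe sr-forget uuid="$sr"
}

rename_cloned_sr() {    # usage: rename_cloned_sr <old-sr-uuid>
    local old=$1 new
    new=$(uuidgen)
    # Rename the clone's VG to match the new SR UUID.
    vgrename "VG_XenStorage-$old" "VG_XenStorage-$new"
    echo "$new"
    # The clone can then be introduced as a new SR, e.g.:
    #   xe sr-introduce uuid=$new type=lvmoiscsi name-label="restored clone"
    # followed by pbd-create/pbd-plug for the clone's iSCSI target.
}
```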