I have an iSCSI failover setup so that if one of my storage units fails, the other takes over immediately (it also serves the NFS shares). When failover occurs, the volumes are exported, the IP is switched to the other machine, and the targets are reconfigured. The failover of the storage system itself works just fine. I use NexentaStor for my filer.
When I do a manual test failover of my storage, the following occurs:
Note: I run the admin VMs on NFS and the customer VMs on iSCSI.
- All NFS-based VMs remain up and working perfectly during and after the failover
- All VMs running on iSCSI eventually report the following:
  - An error about not being able to write to a particular block
  - An error about journaling not working
  - Then the file system goes read-only
To get the VMs working again I have to do the following:
1. Force shutdown of the "broken" VMs.
2. Detach the iSCSI SR.
3. Re-attach the iSCSI SR.
4. Boot the VM on a different server (there are 5 in my pool). If I boot it on the same server, I get this error:
"Internal error: Failure("The VDI <uuid> is already attached in RW mode; it can't be attached in RO mode!")"
The only way I have found to clear that error is to reboot the entire server the VM was previously running on, which is obviously a huge pain.
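For reference, the manual recovery steps above can be sketched as xe commands. This is only a rough sketch under some assumptions: that detaching and re-attaching the SR corresponds to unplugging and plugging the SR's PBDs, and that the UUIDs and host name are supplied by hand. The helper name `print_recovery_commands` is hypothetical, and it only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Dry-run sketch: prints the xe commands for the manual recovery steps.
# print_recovery_commands is a hypothetical helper, not part of XenServer.
# Arguments: VM uuid, name of the host to boot on, then one or more PBD uuids
# (the SR's PBDs, e.g. from: xe pbd-list sr-uuid=<sr-uuid> params=uuid).
print_recovery_commands() {
    vm_uuid=$1
    target_host=$2
    shift 2

    # 1. Force shutdown of the broken VM
    echo "xe vm-shutdown uuid=$vm_uuid force=true"

    # 2. Detach the iSCSI SR (assumed here to mean unplugging its PBDs)
    for pbd in "$@"; do
        echo "xe pbd-unplug uuid=$pbd"
    done

    # 3. Re-attach the SR by plugging the same PBDs again
    for pbd in "$@"; do
        echo "xe pbd-plug uuid=$pbd"
    done

    # 4. Boot the VM on a different server in the pool
    echo "xe vm-start uuid=$vm_uuid on=$target_host"
}
```

Called as `print_recovery_commands <vm-uuid> <host-name> <pbd-uuid>...`, it prints one command per line in the order of the steps above.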
Multipathing is currently NOT enabled (it can be, but the same thing still occurs). I have edited many of the timeout settings in /etc/iscsid.conf, but to no avail.
In short, my storage fails over properly, but XenServer does not keep the iSCSI connection alive. As a thought, might the error in step 4 above be the ultimate cause, so that fixing it would fix everything?
Any help would be appreciated more than you know.
I had a very similar problem with iSCSI failover. It's addressed in this question. My accepted answer there, which I worked out on my own, describes how I solved it.
Basically it involved setting
so that the iSCSI session has enough time to recover before it reports errors up the chain to the kernel.
Running xe-toolstack-restart fixed it for me.