My Xen servers run openSUSE 11.1 and use open-iscsi to connect to our iSCSI SAN cluster. The SAN modules are in an IP failover group behind a virtual IP that the initiators connect to.
In the event that the primary SAN server goes down, the secondary picks up the role of serving as the target. This is all handled by the LeftHand SAN/iQ software and works well in most situations.
The problem I have is that occasionally some of my Xen DomUs will have their root filesystem go read-only after an IP failover. It's not consistent, and happens to a different subset each time a failover occurs. They're all running the same openSUSE 11.1 software image.
The root filesystem for each DomU is attached by open-iscsi in the Dom0, and Xen then uses the standard block device driver to expose it to the DomU.
The exact symptom is that running touch /test as root returns the error "read-only filesystem". However, the output of mount shows the filesystem as mounted read-write. Of course, all other I/O on the DomU is also failing at this time, so the machine comes down hard. Simply restarting it with xm from the Dom0, without even reconnecting the iSCSI session, makes everything work again.
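A possible explanation for mount still reporting read-write: on distributions of this era mount usually prints /etc/mtab, which is not rewritten when the kernel itself remounts a filesystem read-only after an I/O error, whereas /proc/mounts shows the kernel's current view. The following is a minimal Python sketch for checking that view from inside the DomU. The helper names mount_options and is_read_only are made up for illustration.

```python
def mount_options(mounts_text, mount_point="/"):
    """Return the option list of the last entry for mount_point, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries shadow earlier ones (e.g. rootfs, then the real root).
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text, mount_point="/"):
    """True if the kernel currently lists mount_point with the 'ro' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options
```

On a live system it would be called as is_read_only(open("/proc/mounts").read()). If that returns True while mount still says rw, the filesystem was remounted read-only by the kernel rather than by an administrator.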
On the Dom0 side the syslog messages during the fail-over are something like the following:
kernel: connection1:0: iscsi: detected conn error (1011)
iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
iscsid: connection1:0 is operational after recovery (1 attempts)
I'm having a hard time figuring out at which layer to debug this problem. Is it something in the DomU kernel, or at the Dom0 or Xen level? I think there's likely a timeout parameter somewhere that needs to be increased, but I'm not sure where to look.
I don't really think it is an issue with open-iscsi simply because the connected block device is still readable and writeable from the Dom0.
I eventually solved this by using the following advice and settings from the open-iscsi documentation:
After setting up the connection to each LUN as described above, the failover works like a charm, even if it takes several minutes to happen.
This sounds like a problem with the iSCSI initiator running on the dom0. The initiator should not be sending SCSI failures up the stack that quickly. You'll probably want to set ConnFailTimeout in iscsi.conf; this setting determines how long the initiator waits before it treats a connection failure as an error and sends that error up the SCSI stack.
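One caveat, stated as an assumption rather than something from the question: ConnFailTimeout is the name used by the older linux-iscsi initiator. With open-iscsi, the closest equivalent as far as I know is node.session.timeo.replacement_timeout, set per node record with iscsiadm. A hedged Python sketch that builds that command follows; the function names, target IQN, portal and timeout value are placeholders, not values from the poster's setup.

```python
import subprocess


def replacement_timeout_cmd(target, portal, seconds):
    """Build the iscsiadm call that updates one node record's timeout."""
    return [
        "iscsiadm", "-m", "node",
        "-T", target,          # target IQN (placeholder in the example below)
        "-p", portal,          # portal as ip:port
        "-o", "update",
        "-n", "node.session.timeo.replacement_timeout",
        "-v", str(seconds),
    ]


def apply_replacement_timeout(target, portal, seconds):
    # Needs root on the Dom0 and an existing node record for the target.
    subprocess.check_call(replacement_timeout_cmd(target, portal, seconds))
```

For example, replacement_timeout_cmd("iqn.2003-10.com.example:vol1", "10.0.0.10:3260", 600) yields the argument list for a 600-second timeout. A changed node record typically only applies to sessions logged in afterwards, so an existing session would need to be logged out and back in.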
I'd also look into how long that failover is actually taking; it may be taking longer than you expect. If so, the VIP failover may be slow because of ARP-related issues.
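One rough way to measure the failover time is to compare the syslog timestamps of the "detected conn error" and "is operational after recovery" messages on the dom0. The Python sketch below assumes traditional syslog lines that begin with a 15-character "Mon dd HH:MM:SS" stamp (which carries no year, so one has to be supplied) and it ignores connection IDs, so it only suits a log with one failing connection at a time. The function names are illustrative.

```python
from datetime import datetime

ERROR_MARK = "detected conn error"
RECOVERED_MARK = "is operational after recovery"


def parse_stamp(line, year):
    # Traditional syslog lines start with a 15-character "Mon dd HH:MM:SS" stamp.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def failover_durations(lines, year=2009):
    """Return seconds between each conn error and the following recovery."""
    durations = []
    error_time = None
    for line in lines:
        if ERROR_MARK in line and error_time is None:
            error_time = parse_stamp(line, year)
        elif RECOVERED_MARK in line and error_time is not None:
            delta = parse_stamp(line, year) - error_time
            durations.append(delta.total_seconds())
            error_time = None
    return durations
```

Feeding it the lines of /var/log/messages gives one duration per failover. If those numbers are longer than the initiator's timeout, I/O errors reaching the domU would be the expected result.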
Are there any messages in dom0 indicating any sort of read/write errors or SCSI errors at the time of the failover? If so, it looks like the write error is being passed up to the domU. The domU doesn't "know" that it's an iSCSI device, so it behaves as though the underlying disk had gone away and remounts the filesystem read-only (see the errors=continue / errors=remount-ro / errors=panic options in the mount(1) manpage). From the dom0's perspective, the device won't change to read-only: this read-only behaviour is a filesystem semantic, not a block device semantic.
You mention that "all other I/O is failing" at this time - do you mean the domU or dom0?
Usually when setting up an HA iSCSI solution I use multipathing rather than virtual IP takeover. It gives the host greater visibility, and you don't have an iSCSI session suddenly disappear and then need restarting: it's always there, there are just two of them. Is this an option in this environment?
Part of the problem is also that you aren't running / read-only. Security best practice says you should have "/" mounted ro, with any filesystems that need rw mounted separately (e.g., /var and /tmp). If there are directories under /etc that need writing to, they should be moved to /var/etc/path and symlinked into /etc.
"/" should only be mounted RW in single user mode.
Setting things up this way, combined with the other suggestions, could prevent the failure described above.