I'm evaluating the possibility of using two off-the-shelf servers to build a cheap, redundant iSCSI SAN. The idea is to run Linux, Pacemaker, and an iSCSI target, something like the SAN Active-Passive setup on linux-ha-examples.
The same page scares me a little when I read:
During the switchover of the iscsi-target one can detect a gap in the protocol of write-test.log. In our setup we observed a delay of 30s. There are problems reported in connection of ext3 and an iscsi failover. This configuration has been tested with ext2 and ext3 and worked with both filesystems.
Has anyone put a redundant iSCSI SAN made out of Linux boxes into production? Is a failover event really that bad? A 30-second freeze in I/O sounds like a disaster to me. Is it?
SCSI connections time out after roughly 15 seconds by default. If your home-built solution can't complete a takeover within that time, you'll need to adjust that value. Also worth considering: normal SANs mirror their cache, so after a takeover, writes that were acknowledged but not yet committed to disk are not lost. If you can't arrange for that, you either risk data corruption or have to avoid caching writes.
We have set up two Linux boxes as an iSCSI target cluster. We use DRBD and the SCST target, and it works fine. (The SCST target is better than the old iscsitarget: VMware ESXi can kill that one, but not SCST.)
The timeout is a client-side setting, so you can set it lower if you wish.
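To see what the client is currently using, here is a minimal sketch that assumes a Linux initiator, where each SCSI disk exposes its command timeout (in seconds) in /sys/block/<dev>/device/timeout. The function names and the 30-second failover figure (taken from the question) are only illustrative; other initiators such as ESXi keep this setting elsewhere. With open-iscsi there is also a separate session-level setting, node.session.timeo.replacement_timeout in iscsid.conf, which this sketch does not read.

```python
from pathlib import Path


def read_scsi_timeouts(sys_block="/sys/block"):
    """Return {device_name: timeout_seconds} for block devices that expose a SCSI timeout."""
    root = Path(sys_block)
    if not root.is_dir():
        # Not a Linux host, or sysfs is not mounted: nothing to report.
        return {}
    timeouts = {}
    for dev in sorted(root.iterdir()):
        timeout_file = dev / "device" / "timeout"
        # Non-SCSI devices (loop, ram, ...) have no such file and are skipped.
        if timeout_file.is_file():
            timeouts[dev.name] = int(timeout_file.read_text().strip())
    return timeouts


def devices_at_risk(timeouts, failover_seconds):
    """Names of devices whose command timeout is shorter than the expected failover gap."""
    return [name for name, t in timeouts.items() if t < failover_seconds]


if __name__ == "__main__":
    # 30 s is the switchover delay quoted in the question; measure your own cluster.
    current = read_scsi_timeouts()
    print(current)
    print("at risk:", devices_at_risk(current, failover_seconds=30))
```

Any device listed as at risk would see I/O errors before the passive node finishes taking over, so either the timeout has to go up or the failover has to get faster.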