Greetings Team,
I’d like to brainstorm with the experts about how to build a redundant VMware infrastructure with 100% uptime.
What I am currently working with is 2 VMware controller servers presented with 2 iSCSI targets. The targets are actually Linux systems configured with DRBD to act as SANs, with real-time replication of the data between them.
If a controller fails, things are fine. If an iSCSI target fails, that is OK too. But what if, in a disaster where something went terribly wrong, both iSCSI targets failed?
I am not against ditching DRBD altogether for a specific instance where 100% uptime really has to mean 100%. But given that it is working well, what would you recommend as a tertiary form of redundancy, one that provides an instant (or, if not instant, as fast as possible) recovery path for the iSCSI targets so the virtual machines can get back online?
I look forward to hearing responses, have a great day all.
Best, Nick
It may be easier to achieve this objective with NFS instead of iSCSI.
First of all, NFS clients don't just explode if they can't access an NFS server -- they hang around and wait for it to come back.
Also, it isn't super-exotic to build an NFS server cluster. One example uses DRBD, and another uses shared storage (FC).
I'll freely admit that this isn't the answer you're looking for, but it does solve the more global issue of providing reliable storage to ESX servers.
IIRC, DRBD can stream asynchronously to a third box, allowing a final level of data protection. Then you can bring up iSCSI on that box, or migrate data elsewhere, or whatever.
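
For what it's worth, here is a rough sketch of what that third-box setup might look like using DRBD's stacked resources, written in DRBD 8.3-style syntax (newer releases move some options around, so check the guide for your version). The hostnames (san1, san2, backup1), IP addresses, and device paths are all made up for illustration and are not from the setup described above. The lower resource replicates synchronously between the two existing targets, and the stacked resource on top of it streams asynchronously (protocol A) to the third box.

```
# Hypothetical example: all hostnames, addresses and devices are placeholders.

# Existing pair: synchronous replication between the two iSCSI targets.
resource r0 {
  protocol C;

  on san1 {
    device    /dev/drbd0;
    disk      /dev/sda6;
    address   10.0.0.1:7788;
    meta-disk internal;
  }

  on san2 {
    device    /dev/drbd0;
    disk      /dev/sda6;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}

# Stacked resource: asynchronous stream from the pair to a third box.
resource r0-U {
  protocol A;

  stacked-on-top-of r0 {
    device    /dev/drbd10;
    address   192.168.42.1:7788;   # address that follows the active node
  }

  on backup1 {
    device    /dev/drbd10;
    disk      /dev/sda6;
    address   192.168.42.2:7788;
    meta-disk internal;
  }
}
```

With a layout like this, the iSCSI target would export the stacked device (/dev/drbd10 here) rather than /dev/drbd0, so that writes pass through both layers. Because protocol A is asynchronous, the third box can lag slightly behind, which means a failover to it may lose the most recent writes.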