First, please excuse my ignorance on this topic. Until recently I was blissfully ignorant of this field, and I've only just begun to suspect that I'll need to become directly involved. In any case, I find myself feeling my way around and trying to get a survey of potential issues. This is the first in what may well be many questions; let's hope I don't screw up too badly.
At work I've been told that we have three ESXi 4.x systems in a cluster. Being the certified hardware tech in addition to my other duties, I was checking these servers' warranty status and associated information, and I noted that the three servers have only two physical NICs each. Then it dawned on me that the primary admin has been complaining about these three machines and about not being able to do some specific network configuration he has in mind.
So I start getting a little itch that something isn't right. In a past life we had a quartet of clustered servers with six physical NICs each, giving them redundant paths to their private cluster-only network, their iSCSI interface to the storage array, and the 'real' network.
All of which leaves me with the following questions:
- Does that logic apply to an ESXi cluster?
- Do they need a private network for health and system state communications?
- Shouldn't there be a 'shared' disk or volume for the cluster to be able to fail over? If my understanding of the configuration is correct, each member of the cluster holds the images for the systems it's running locally, which doesn't seem right. I can't imagine how ESXi could be fault tolerant if the disk array spins down on a member.
- Would a vSwitch figure in here, and how? One of the other administrator's complaints is that he's not certain the cluster can span switches or share network space (i.e. that a VM on Host A can reach a VM on Host B via a shared internal network).
- Given the nature of my questions, is there a recommendation for some form of deep-dive immersion to get up to speed? Training? Books? Website? Hypnotherapy? Or should I just drink the Kool-Aid?
I'm not certain that this shouldn't be configured as a community wiki, as several of the questions 'feel' opinion-based. Here's hoping there is a single opinion for at least some of them.
Thanks for the help and any amount of clue is appreciated.
Certainly that cabling model is very common and does work well, and if you use vMotion or Fault Tolerance your management NICs will periodically be very busy. If that traffic shares links with VM traffic, the VMs' performance will suffer, so it makes sense to put them on separate links.
We actually make do with just two 10Gbps NICs per server, using HP's FlexFabric to carve up the bandwidth (1Gbps for management, 4Gbps for FCoE and 5Gbps for VM traffic). This works well but doesn't suit everyone. That said, you could use two regular 10Gbps NICs and they'd manage just fine.
Oh, and yes, you need shared disks for vMotion/DRS or HA, and yes, you'd need a number of vSwitches; if the switch config is correct there's no problem with VMs on different hosts talking to each other. I couldn't recommend training more: there's a course called 'Install and Configure' that's good, but make sure you know a bit about switching, routing and shared storage first.
Please take a look at: VMware vSphere Essentials Plus network cabling
Specific recommendations here: http://www.networkworld.com/community/taxonomy/term/17790
A 1. & 2.
In VMware you have three types of network: Management, VMkernel (IP storage, i.e. NFS and/or iSCSI, plus vMotion) and the guest network. In an ideal world you keep them separate, each with at least two physical interfaces to avoid a SPOF:
- Management doesn't need much bandwidth, but you don't want VMs to be able to mess with packets there.
- If you keep datastores on NFS/iSCSI, the VMkernel network will eat bandwidth, and so will vMotion. Ideally you separate the two so that vMotion can't affect a host's access to its datastores; if your datastores aren't on NFS (or are rarely used, e.g. they only hold templates and you don't deploy servers by the dozen), one network will do.
- The VMs' network is kept separate for security reasons. It can have multiple VLANs defined, trunked over the virtual switch's physical ports.
If you cannot separate the networks physically (because you don't have enough NICs), it's good practice to have them on different VLANs and IP subnets.
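To make that concrete, here is a minimal sketch of how the VLAN-tagged port groups described above could be scripted with pyVmomi (VMware's Python SDK, which neither answer actually mentions). The host address, credentials, vSwitch name, port group name and VLAN ID are all hypothetical placeholders, and certificate handling is environment-specific.

```python
# Hedged sketch: create a VLAN-tagged port group on a standard vSwitch
# using pyVmomi. All names/credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only; validate certs in production
si = SmartConnect(host="esxi01.example.local", user="root", pwd="secret", sslContext=ctx)
content = si.RetrieveContent()

# Grab the first ESXi host in the inventory (adjust selection for a real cluster).
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
host = view.view[0]

# Port group "VM-Network-20" tagged with VLAN 20 on the default vSwitch0.
spec = vim.host.PortGroup.Specification(
    name="VM-Network-20",             # hypothetical port group name
    vlanId=20,                        # hypothetical VLAN for VM traffic
    vswitchName="vSwitch0",           # default standard vSwitch name
    policy=vim.host.NetworkPolicy(),  # inherit the vSwitch's policies
)
host.configManager.networkSystem.AddPortGroup(portgrp=spec)

Disconnect(si)
```

Repeating the same call with different vlanId values gives you the separate VLANs and subnets suggested above when physical separation isn't possible.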
A 3.
For an ESXi cluster to make any sense, the datastores (i.e. the disk space where your VMs live) should be kept on shared storage. The datastore file system (VMFS) is parallel and cluster-aware, so this isn't just safe, it's recommended. If one of your physical machines dies, the surviving hosts will restart its VMs; they can't do that if the VM disk images sit on the dead host.
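If you want to sanity-check the shared-storage point, here is a small sketch in the same vein (pyVmomi again; the vCenter address and credentials are placeholders) that lists each host's datastores and whether vCenter reports them as mounted by more than one host, which is what HA relies on to restart VMs elsewhere.

```python
# Hedged sketch: report which datastores each host sees and whether they
# are shared. Connection details are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only
si = SmartConnect(host="vcenter.example.local", user="admin", pwd="secret", sslContext=ctx)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
for host in view.view:
    for ds in host.datastore:
        shared = ds.summary.multipleHostAccess  # True when more than one host mounts it
        print(f"{host.name}: {ds.name} shared={shared}")

Disconnect(si)
```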
A 4.
You can define a VLAN that runs from the vSwitch in host A, through the physical interfaces and physical switch(es), to host B. If it doesn't include anything else in the physical world, you now have a "private" network connecting VMs on different hosts. In fact, every VLAN used by any VM should be defined on all physical hosts; that's what makes both vMotion and HA work between hosts.
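To verify the "defined on all physical hosts" part, a sketch along the same hypothetical lines can compare the (port group name, VLAN ID) pairs on each host's standard vSwitches; any mismatch means a VM could lose its network after vMotion or an HA restart onto that host.

```python
# Hedged sketch: compare port group / VLAN definitions across hosts.
# Connection details are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only
si = SmartConnect(host="vcenter.example.local", user="admin", pwd="secret", sslContext=ctx)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
portgroups = {
    host.name: {(pg.spec.name, pg.spec.vlanId) for pg in host.config.network.portgroup}
    for host in view.view
}

reference = next(iter(portgroups.values()))
for name, pgs in portgroups.items():
    missing = reference - pgs
    if missing:
        print(f"{name} is missing: {sorted(missing)}")

Disconnect(si)
```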
A 5.
Reading is good. I'd recommend the official VMware documentation -- you won't get any misinformation there. I went to some VMware trainings, but they're only as good as the trainer: some just run you through the script, others know a lot or at least know where to find the answers to your questions. Besides that, caffeine, chocolate and pizza help ;)