My servers all have two NICs and I have two Ethernet switches (A and B) dedicated to the servers, in a redundant configuration. I have configured the NICs with bonding (Linux) or teaming (ESXi), and this seems to work fine -- I can turn off a switch or pull out a cable, and everything carries on. I have a cable from A to B so that the single-interface routers are accessible on both.
I need to connect my servers to the desktops, via a third switch (X). To take full advantage of the redundancy, I have configured this switch in the same way as the servers -- with link aggregation, and one cable to each of switches A and B. This results in random packets being dropped. It works correctly if X only connects to A or only to B, but of course this means that a switch failure takes down my network.
Where am I going wrong? Is this expected to work (meaning switch X is faulty), or should I not be using link aggregation on X and instead rely on STP?
Yes, the 'problem' is Spanning Tree Protocol blocking one leg of your bonded connection from X to A+B.
If you need a bonded connection for extra bandwidth, run two cables from X to each of A and B, for a total of four cables. Bond the pair going to A, and likewise the pair going to B. Then let STP do its job (it will block one of the two bonded pairs).
You need to configure your hosts differently. At the moment the team seems to be set up for load sharing, which also provides redundancy, but only when both NICs connect to a single switch. Instead, the team needs to be configured for redundancy only. That way the host decides which switch to use and moves its MAC address from one NIC to the other depending on which one is active, which keeps the switches happy.
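On the Linux servers, the redundancy-only setup described above corresponds to the bonding driver's active-backup mode. The sketch below reads the kernel's /proc/net/bonding status file and reports the bonding mode, which NIC is currently carrying traffic, and the link state of each member. The bond name bond0 is an assumption (pass a different name as the first argument). The script only inspects state and does not change any configuration.

```python
import sys
from pathlib import Path


def parse_bonding_status(text):
    """Parse the 'Key: value' layout of /proc/net/bonding/<bond>."""
    info = {"mode": None, "active": None, "slaves": {}}
    current_slave = None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip blank lines
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Currently Active Slave":
            info["active"] = value
        elif key == "Slave Interface":
            # Every following 'MII Status' line belongs to this member NIC.
            current_slave = value
            info["slaves"][current_slave] = None
        elif key == "MII Status" and current_slave is not None:
            info["slaves"][current_slave] = value
    return info


def is_failover_only(info):
    """True if the bond uses active-backup (redundancy only, no load sharing)."""
    return info["mode"] is not None and "active-backup" in info["mode"]


if __name__ == "__main__":
    bond = sys.argv[1] if len(sys.argv) > 1 else "bond0"  # assumed bond name
    path = Path("/proc/net/bonding") / bond
    if not path.exists():
        print(f"{path} not found; is the bonding driver loaded?")
    else:
        info = parse_bonding_status(path.read_text())
        print(f"{bond}: mode = {info['mode']}")
        print(f"active NIC = {info['active']}")
        for nic, state in info["slaves"].items():
            print(f"  {nic}: link {state}")
        if not is_failover_only(info):
            print("warning: not active-backup; traffic may be shared across both switches")
```

In active-backup mode you should see exactly one active NIC; pulling its cable (or turning off its switch) should make the other NIC take over, with the bond's MAC address following it.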
There is a way to configure EtherChannel (another name for load sharing over two or more links) across more than one switch, but I do not think it is widely supported, and it has not been reliable in my experience.