We are in the midst of planning a re-cabling of the building and installing new switches. A few different designs have been proposed, and one such design puts our core switches at the heart of our network in such a way that even if one goes down completely, the other can carry on.
The idea is that there are 2 core switches and 6 edge switches (to connect all of our endpoints). This proposal uses fixed-port switches. All of our servers would connect to both cores, and each of our 6 edge switches would connect to both cores. In this way, if core A goes down, core B is still physically connected to all servers and edge switches, keeping data flowing without interruption.
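To make the redundancy claim concrete, here is a minimal sketch that models the proposed topology as a graph and checks which edge switches lose their path to the servers when one device fails. The device names and the two servers are hypothetical placeholders, and the model only covers physical connectivity, not spanning tree, link aggregation, or failover timing.

```python
from collections import deque

CORES = ["coreA", "coreB"]
EDGES = [f"edge{i}" for i in range(1, 7)]
SERVERS = ["server1", "server2"]  # hypothetical; stands in for "all of our servers"

# Every edge switch and every server has one link to each core.
LINKS = [(dev, core) for dev in EDGES + SERVERS for core in CORES]


def reachable(links, start, failed):
    """Return the set of nodes reachable from start, skipping failed nodes."""
    if start in failed:
        return set()
    adj = {}
    for a, b in links:
        if a in failed or b in failed:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def cut_off_by(failed_node):
    """Edge switches that cannot reach any server while failed_node is down."""
    failed = {failed_node}
    lost = []
    for edge in EDGES:
        seen = reachable(LINKS, edge, failed)
        if not any(server in seen for server in SERVERS):
            lost.append(edge)
    return lost


if __name__ == "__main__":
    for device in CORES + EDGES:
        print(device, "down ->", cut_off_by(device) or "nothing cut off")
```

In this model, losing either core cuts off nothing, and losing an edge switch cuts off only that switch, which matches what the proposal claims at the cabling level.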
The proposed design links the switches by trunking several RJ45 ports rather than stacking them, citing the stack backplane as a single point of failure.
My assumption was that we were going to stack the 6 edge devices, at the very least, and run a few lines of fiber to the cores using mini-GBIC (SFP) modules... but I'm being told that if we stack all 6 edge switches and there is a problem with the backplane, then all 6 switches are going to go "down". Under the proposed design, if an edge switch has a problem, only devices connected to that switch will be affected.
What is the probability of a switch backplane having an issue as compared to a standard Ethernet port?
Does this proposal make sense and REALLY provide the redundancy it assumes?
Can't we just stack those 6 edge switches, run two fiber lines via mini-GBIC modules on one of them, and call it a day? I thought that if a switch in a stack has a problem, the other switches could still "work around" it.
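One way to reason about the stacked alternative is to sketch it the same way. The model below assumes the six switches are cabled as a ring stack and that the surviving members keep forwarding when one member dies; both are assumptions that depend on the vendor and the stacking technology. It only covers physical paths, not stack master election or a fault that takes down the whole stack. The function name `stranded` and the device names are hypothetical.

```python
def stranded(members, uplinks, failed):
    """Return surviving stack members with no path to a surviving uplink member."""
    n = len(members)
    # Ring stacking: each member is cabled to its two neighbours,
    # and the last member is cabled back to the first.
    ring = {
        m: {members[(i - 1) % n], members[(i + 1) % n]}
        for i, m in enumerate(members)
    }
    # Flood outward from each surviving uplink member, never crossing the failed one.
    seen = set()
    todo = [m for m in uplinks if m != failed]
    while todo:
        node = todo.pop()
        if node in seen:
            continue
        seen.add(node)
        todo.extend(p for p in ring[node] if p != failed and p not in seen)
    return [m for m in members if m != failed and m not in seen]


STACK = [f"edge{i}" for i in range(1, 7)]

if __name__ == "__main__":
    # Both fiber uplinks on edge1 only, as described in the question.
    print(stranded(STACK, ["edge1"], "edge3"))
    print(stranded(STACK, ["edge1"], "edge1"))
    # Uplinks split across two stack members.
    print(stranded(STACK, ["edge1", "edge4"], "edge1"))
```

Under these assumptions, the ring does let the stack work around a failed member, but if both uplinks sit on one member, losing that member strands the other five. Splitting the uplinks across two members removes that particular single point of failure in this model.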
It's a trade-off between ease of management and a more complex configuration. Stacking reduces the management overhead, but it is more complex, because the switches need to coordinate the stack configuration among their stack peers.
There is a chance that the switches run into stacking difficulties, where several switches do not join the stack properly. In those situations the stack takes at least two hits from a single incident: first when not all switches join the stack correctly, and again when the stack may need to be brought down for troubleshooting and resolution.
I've certainly had switches die on me. I don't know whether the cause was the backplane, the PSU, or some other component. Those switches were not stacked, so I don't know whether such a failure would actually take a stack down.
When you stack devices, you need to make sure that no single device is a single point of failure (SPOF). There are ways of doing that, but the big question is: "Is it simpler to replace a single free-standing switch than an element in a switch stack?"
I'd say "yes, marginally". You can load a current configuration (from your daily configuration backups) onto the replacement switch in a comfortable environment, then get the switch installed with roughly two fewer cables to worry about (the stacking cables).