I'm having a networking issue that I can't wrap my head around, since I'm not a strong networking guy. Our provider gives us two drops (via HSRP) that go into our stacked Cisco 2960 switches, one drop per switch. Behind the switches we have two Astaro devices that handle all the firewall and VLAN routing. These connect back into the 2960s, and all the VM hosts are on the same 2960s as well. It looks something like this:
          +----------------+          +-----------------+
    +-----| Cisco 2960 #1  |<-------->| Astaro #1 / VMs |
    |     +----------------+          +-----------------+
+--------+
| Uplink |
+--------+
    |     +----------------+          +-----------------+
    +-----| Cisco 2960 #2  |<-------->| Astaro #2 / VMs |
          +----------------+          +-----------------+
So at any given time, one Cisco is the master of the stack and one Astaro is the master of the pair.
Say I have the following scenario:
The master Astaro is #1, and the master switch in the stack is #2.
If I reload switch #2, I get around 2 minutes of downtime while switch #1 takes over and things renegotiate.
Some of my Cisco config looks like this:
spanning-tree mode rapid-pvst
spanning-tree extend system-id
no spanning-tree vlan 1,100
interface GigabitEthernet1/0/1
switchport access vlan 100
switchport mode access
switchport nonegotiate
duplex full
!
interface GigabitEthernet1/0/2
switchport mode trunk
switchport nonegotiate
!
interface GigabitEthernet1/0/3
switchport mode access
switchport nonegotiate
!
interface GigabitEthernet1/0/4
switchport access vlan 100
switchport mode access
switchport nonegotiate
!
Port 1 goes to my provider, and ports 2-4 run from the switch to the Astaro for its WAN, management, and VLAN ports.
I'm at a loss as to why I can't get better than a 2-minute failover when I reboot a switch.
Edit: below is the show switch output for our stack:
sw1a>show switch
Switch/Stack Mac Address : 64d8.1431.6a80
                                           H/W      Current
Switch#  Role    Mac Address      Priority Version  State
----------------------------------------------------------
 1       Member  0cd9.960b.5b00   15       1        Ready
*2       Master  64d8.1431.6a80   10       1        Ready
- Port 1 on the switch is our uplink.
- Port 2 is the WAN port, which goes back to the Astaro.
- Port 3 is the management VLAN port back to the Astaro.
- Port 4 is the VLAN port that goes back to the Astaro.
The Astaro is pretty much just a Linux appliance that puts a GUI on iptables and the other networking tools Linux offers.
Based on your edits and comments, I don't think what you're seeing is a spanning-tree delay. The downtime you describe (2 minutes) is too long to be explained by STP, and I doubt the Linux boxes are running STP with the switches. You're also effectively running single-switch spanning tree, since a switch stack acts as one logical switch.
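As a rough sanity check on that, assuming the default 802.1D timers (max age 20 s, forward delay 15 s) have not been changed, even classic (non-rapid) spanning tree's worst-case reconvergence comes in well under two minutes, and rapid-pvst normally converges considerably faster than this:

$$
t_{\text{worst}} = t_{\text{max age}} + 2\,t_{\text{forward delay}} = 20\,\text{s} + 2 \times 15\,\text{s} = 50\,\text{s} < 120\,\text{s}
$$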
There are some STP tweaks that are probably a good idea in your situation, though. First, you can re-enable spanning tree on your VLANs; there's no reason to have it turned off. Mode rapid-pvst is a good idea unless you're trying to run spanning tree with the Linux boxes. You can also tell the switch that the trunk toward your Linux device (Gi1/0/2) does not lead to another switch (PortFast), so it goes straight to forwarding instead of waiting on STP.
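A sketch of those tweaks as IOS config, assuming the port numbering from the question. The PortFast lines on the access ports Gi1/0/3 and Gi1/0/4 are an optional extra along the same lines, and switch 2's matching ports (presumably Gi2/0/x, an assumption) would need the same treatment. On newer IOS releases the trunk command may be spelled `spanning-tree portfast edge trunk` instead:

```
! Keep rapid-pvst and re-enable STP on the VLANs it was disabled for
spanning-tree mode rapid-pvst
spanning-tree vlan 1,100
!
! Trunk toward the Astaro/Linux box: an edge port, not another switch
interface GigabitEthernet1/0/2
 spanning-tree portfast trunk
!
! Optional (assumption): access ports toward the Astaro as edge ports too
interface GigabitEthernet1/0/3
 spanning-tree portfast
!
interface GigabitEthernet1/0/4
 spanning-tree portfast
```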
That leaves the other redundancy features you've got here, which are the switch stack itself, HSRP, and anything on the Astaros.
My bet is on the failure recovery mechanism on the Astaros. Since you mentioned that one is "master", that implies only one is active at any one time. What kind of failover timers are set up on the Astaro devices? Do you have any logs that show how long it takes the standby device to go active after the switch fails?
Spanning tree doesn't seem likely, both because all the STP is being done on one logical switch and because of how long the downtime is. Switch stack failover (at least on 3750 stacks) should also be faster than that, although you might hook up a console to the secondary switch to see whether it's taking a long time to take over as master. HSRP (assuming it's running at the provider and not on your switches) will also fail over a good bit faster than that, and shouldn't be affecting you.
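If you do put a console on the secondary switch, millisecond log timestamps make it easier to see how long the master election actually takes. A minimal sketch, assuming your 2960 image supports these standard commands; the buffer size is an arbitrary example value. If the switches and the Astaros sync to the same NTP source, you can also line these timestamps up against the Astaro failover logs:

```
! Global config: timestamp log messages to the millisecond
service timestamps log datetime msec
! Keep a larger local log buffer (size in bytes, example value)
logging buffered 64000
!
! Exec mode, on the surviving switch after reloading the other member:
!   show switch    (confirm which member became master)
!   show logging   (compare timestamps of the stack events)
```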
TL;DR: I think it's the failover timers on your Linux boxes that are causing the delay. Second place goes to the stack taking a long time to promote the secondary switch to master.