I'm managing an office network which is spread across 2 buildings (see network setup below).
In building 2, we are getting a problem where the network slows down and then completely cuts off. The only way to bring it back is to power cycle the switch, then before it slows down again, go into the GUI dashboard and power cycle it from the dashboard.
The issue has happened 4 times (and is becoming more frequent). It first happened 2 weeks ago, then again yesterday, and twice today.
I'm trying to work out what could be causing it. I believe I've narrowed it down to the switch, as the only network devices in building 2 are the switch and the WiFi access point. When the switch is down, the WiFi access point is still running.
When building 2 is down, if I go to building 1, I can still connect to the network fine.
Any idea what might be causing the above? My 2 guesses are:
- Network storm. This would be odd, as we haven't recently added any notable new devices to the network. So why has it come about all of a sudden, and why is it getting worse on each reset?
- Faulty switch (needs replacing under warranty).
Network is configured as follows:
Building 1
- Draytek 2860 router (connected to fibre modem)
- Netgear 24 gigabit port smart switch
- Ubiquiti UAP WiFi access point (mostly used by mobiles / tablets, low traffic)
- 4 wired LAN computers (heavier use, as we all use Dropbox in LAN sync mode for heavy files)
- printers and other devices
- all cabling cat6
Building 2 (connected to building 1 by cat6 c. 30m cable length)
- Netgear 48 gigabit port smart switch (Netgear GS748T)
- Ubiquiti UAP WiFi access point (used mostly by 2 laptops, heavier usage than building 1; again, most traffic comes from Dropbox LAN sync)
First off: I believe Netgear ProSafe switches, like HP ProCurves, come with a limited lifetime, advanced replacement warranty. It may be worth logging a call with them for intermittent switch responsiveness (especially if you can't get to the management page when plugged in directly, see below), getting a replacement in, and swapping it over. That may be the quickest way to get things running.
The Web GUI: when it's going wrong, can you log into the GUI if plugged directly into the switch? It sounds from the post like you can't, but you didn't specify whether you were plugged in directly or somewhere else.
When it's happening, is the light pattern on the front of the switch "normal"? That will be a quick clue as to whether there's a broadcast storm. You can then log into the router and check things out. The best place to look may be the packet statistics page, which will show you the number of both broadcast and error packets. Get a feel for what it looks like when things are normal, so that when things are wrong it stands out more.
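To make the "normal versus wrong" comparison concrete, here is a minimal sketch that works on counter readings copied by hand from a statistics page. The `PortCounters` fields, the sample numbers, and the thresholds in `looks_like_storm` are all assumptions for illustration, not values taken from the Netgear or Draytek firmware.

```python
from dataclasses import dataclass


@dataclass
class PortCounters:
    """One reading of a port's counters, typed in from the statistics page."""
    total_packets: int
    broadcast_packets: int
    error_packets: int


def broadcast_share(before: PortCounters, after: PortCounters) -> float:
    """Fraction of the packets seen between two readings that were broadcasts."""
    total = after.total_packets - before.total_packets
    if total <= 0:
        return 0.0
    return (after.broadcast_packets - before.broadcast_packets) / total


def looks_like_storm(baseline: float, current: float,
                     factor: float = 10.0, floor: float = 0.2) -> bool:
    """Flag a reading whose broadcast share is far above the baseline.

    factor and floor are arbitrary illustrative thresholds, not vendor guidance.
    """
    return current > baseline * factor and current > floor


# Made-up readings: two taken on a normal day, two taken during a slowdown.
normal = broadcast_share(PortCounters(1_000_000, 10_000, 3),
                         PortCounters(1_100_000, 11_000, 3))
slow = broadcast_share(PortCounters(2_000_000, 20_000, 5),
                       PortCounters(2_500_000, 420_000, 9))

print(f"normal share: {normal:.2%}, during slowdown: {slow:.2%}")
print("storm suspected:", looks_like_storm(normal, slow))
```

Taking two readings a fixed interval apart matters because the counters are cumulative; only the difference between readings says anything about what is happening right now.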
If you think it's a broadcast storm, use the GUI to set up a mirror port (this replicates all traffic on the selected ports to the mirror port, making it work like an old-school hub), plug a laptop in, and run Wireshark (https://www.wireshark.org) on it. You can then look at all the raw traffic going through the switch.
Not gonna lie: working through a Wireshark log is not fun, but if it is a storm, you should have one MAC address screaming at you in the logs. You can then check the ARP table (MAC address table) in the switch GUI to see which MAC address is plugged into which port, then go and unplug and test whatever's on the end of that port.
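If scrolling through the capture by eye gets tedious, a rough sketch like the one below can tally which source MAC is sending the most broadcast frames. It assumes the capture has been exported to tab-separated text with `tshark -r capture.pcapng -T fields -e eth.src -e eth.dst` (the file name is a placeholder), and the MAC addresses in the sample are invented.

```python
from collections import Counter

BROADCAST = "ff:ff:ff:ff:ff:ff"


def top_broadcast_sources(lines, limit=5):
    """Count broadcast frames per source MAC.

    Each line is expected to be "<eth.src>\t<eth.dst>"; lines that do not
    have exactly two fields are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue
        src, dst = (p.lower() for p in parts)
        if dst == BROADCAST:
            counts[src] += 1
    return counts.most_common(limit)


# Invented sample data standing in for the exported capture.
sample = [
    "00:11:22:33:44:55\tff:ff:ff:ff:ff:ff",
    "00:11:22:33:44:55\tFF:FF:FF:FF:FF:FF",
    "00:11:22:33:44:55\tff:ff:ff:ff:ff:ff",
    "66:77:88:99:aa:bb\tff:ff:ff:ff:ff:ff",
    "66:77:88:99:aa:bb\t00:11:22:33:44:55",
    "",
]

for mac, frames in top_broadcast_sources(sample):
    print(f"{mac}  {frames} broadcast frames")
```

For a real capture, pass the lines of the exported file instead of `sample`, for example `top_broadcast_sources(open("export.txt"))`, where `export.txt` is whatever file the tshark output was redirected to.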
Hope some of that helps. Good Luck
Have you checked the firmware version on all of your equipment? There are a few hardware versions of the Netgear GS748T, and you did not say in the OP which one you have; Netgear has a support page for the GS748T v5 that is worth checking. You also didn't list the model of the other switch. If it is unmanaged, you might not be able to do much with it.
Look in the web interface of the switch for the firmware version, and then search Netgear's site for the current release. I would check the router too, as routers are a more common cause of problems, but I have seen switches do exactly what you are describing with old firmware.
UPDATE: As an added thought, you may also want to check the driver versions of the NICs in all of your LAN PCs; sometimes they flake out as well and can cause issues like this. Worst case, shut down all of the PCs and non-essential hardware (network printers etc.). Then, while monitoring the traffic, check that the issue is gone, and bring the devices back up one at a time. If you have any networked non-PC devices, like printers or web cams, start with those, as they are more likely to be the cause.
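The bring-up order can be written down as a simple checklist before you start. This is only a sketch: the device names and the `NON_PC_KINDS` categories are hypothetical and should be replaced with your own inventory.

```python
# Hypothetical device categories; adjust to match your own inventory.
NON_PC_KINDS = {"printer", "webcam", "other"}


def bring_up_order(devices):
    """Return devices with non-PC hardware first.

    devices is a list of (name, kind) tuples. sorted() is stable, so the
    original order is kept within the non-PC group and within the PC group.
    """
    return sorted(devices, key=lambda device: device[1] not in NON_PC_KINDS)


# Invented inventory for illustration.
inventory = [
    ("laptop-1", "pc"),
    ("office-printer", "printer"),
    ("laptop-2", "pc"),
    ("door-cam", "webcam"),
]

for step, (name, kind) in enumerate(bring_up_order(inventory), start=1):
    print(f"{step}. Power on {name} ({kind}), then watch the traffic before continuing")
```

Leaving a few minutes between steps gives the problem time to reappear, so the last device powered on is the obvious suspect.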