I recently switched jobs. By the time I left my last job our network was three years old and had been planned very well (in my opinion). Our address range was split down into a bunch of VLANs with the largest subnet a /22 range. It was textbook.
The company I now work for has built up their network over about 20 years. It's quite large, reaches multiple sites, and has an eclectic mix of devices. This organisation only uses VLANs for very specific things. I only know of one usage of VLANs so far and that is the SAN which also crosses a site boundary.
I'm not a network engineer, I'm a support technician. But occasionally I have to do some network traces for debugging problems and I'm astounded by the quantity of broadcast traffic I see. The largest network is a straight Class B network, so it uses a /16 mask. Of course if that were filled with devices the network would likely grind to a halt. I think there are probably 2000+ physical and virtual devices currently using that subnet, but it (mostly) seems to work. This practice seems to go against everything I've been taught.
My question is:
In your opinion and from my perspective: what measurement of which metric would tell me that there is too much broadcast traffic bouncing about the network? And what are the tell-tale signs that you are perhaps treading on thin ice?
The way I see it, there are more and more devices being added and that can only mean more broadcast traffic, so there must be a threshold. Would things just get slower and slower, or would the effects be more subtle than that?
There's nothing -inherently- wrong with a large broadcast domain if it's appropriately (and safely) configured. Employing PVLANs, for example, can allow for very large networks without too much drama, as isolated hosts don't see traffic from one another. Similarly, if the network is relatively static, the links are very stable, and controls are in place to block broadcast/multicast/unicast flooding, then it can be made to work.
That said, more often than not the sort of networks you're describing (2000+ hosts) are basically a crisis waiting to happen. Some of the issues/warning signs might include:
Excessive broadcast traffic - Either app traffic being blasted everywhere (e.g. old school Windows), excessive ARP traffic, etc. Think of this in terms of packets per second more so than absolute bandwidth - hundreds of packets per second of background traffic is getting up there. Bear in mind that certain network events (switches coming up or down) may exacerbate this terribly.
Network diameter / topological stability - Do transitory spanning tree loops occur under certain conditions (e.g. device reboot)? What volume of TCNs and so forth are you seeing? Is the root bridge moving around at all? Physically how many switches are cascaded together?
How do link failures work? If a link drops, what happens? I've seen situations where things were badly broken to the point where the network topology would literally never stabilize when a redundant link came down. It required mass reboots - well, more properly, it required a complete redesign, but that's a separate issue.
Interface drops on routers and switches? Buffer issues? These can also be hints.
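As a rough illustration of the packets-per-second view mentioned above, here is a minimal sketch that counts broadcast frames per second. It assumes you have already exported a capture to (timestamp, destination MAC) pairs; the function names and the sample data are hypothetical, and the "hundreds per second" figure is only the rule of thumb from this answer, not a hard limit.

```python
from collections import Counter

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def broadcast_pps(frames):
    """Count broadcast frames in each whole second of a capture.

    frames: iterable of (timestamp_seconds, destination_mac) pairs,
    e.g. exported from a packet capture (export step not shown).
    Returns a dict mapping second -> number of broadcast frames.
    """
    counts = Counter()
    for timestamp, dst_mac in frames:
        if dst_mac.lower() == BROADCAST_MAC:
            counts[int(timestamp)] += 1
    return dict(counts)


def peak_and_mean(per_second):
    """Return (peak, mean) broadcast frames per second.

    The mean is taken over the whole span from the first to the last
    second that contained a broadcast, so quiet seconds count as zero.
    """
    if not per_second:
        return 0, 0.0
    span = max(per_second) - min(per_second) + 1
    return max(per_second.values()), sum(per_second.values()) / span


# Hypothetical sample: four broadcasts and one unicast frame over four seconds.
sample = [
    (0.1, "ff:ff:ff:ff:ff:ff"),
    (0.5, "00:11:22:33:44:55"),
    (0.9, "FF:FF:FF:FF:FF:FF"),
    (1.2, "ff:ff:ff:ff:ff:ff"),
    (3.7, "ff:ff:ff:ff:ff:ff"),
]
peak, mean = peak_and_mean(broadcast_pps(sample))
print(f"peak={peak} pps, mean={mean:.1f} pps")
```

The peak is probably the more interesting number, since the bursts around events such as a switch coming up are what tend to hurt.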
In general, bridges that cross physical sites cause a disproportionate amount of trouble. Is there a compelling reason why your sites (or floors) couldn't be broken up into routed subnets? Best practice is certainly to route where possible and bridge where not...
1)
"In your opinion and from my perspective" ... opinion and facts (which this site is based on) mix quite badly.
2)
Yes, with more broadcasts things get slower. But with modern networks (switches rather than hubs and gigabit speeds) using more than a full /24 should not be a problem. Even /22 should be fine.
A lot depends on your applications and protocols. E.g. I would not fear an average office with network drives and up to 2000 computers and printers.
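For a sense of scale between the prefix lengths mentioned here, this is a small sketch using Python's standard `ipaddress` module. The 172.16.0.0 addresses are placeholders chosen for illustration, not the poster's actual ranges, and `usable_hosts` is a hypothetical helper name.

```python
import ipaddress


def usable_hosts(cidr):
    """Return the number of assignable host addresses in an IPv4 subnet.

    Subtracts the network and broadcast addresses, except for /31 and
    /32 prefixes, where that convention does not apply.
    """
    net = ipaddress.ip_network(cidr)
    if net.prefixlen >= 31:
        return net.num_addresses
    return net.num_addresses - 2


# Placeholder ranges, one per prefix length discussed in this thread.
for cidr in ("172.16.0.0/24", "172.16.0.0/22", "172.16.0.0/16"):
    print(cidr, usable_hosts(cidr))
```

A /22 holds 1022 hosts and a /16 holds 65534, so the 2000+ devices in the question use only a small fraction of the /16 but would already need two /22s.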
2b) Cisco seems to disagree with this. :) But then again, Cisco teaches how class A, B and C networks work, which should only be taught in history lessons, not in current network management lessons. :)
I have recently joined another company where I see a similar configuration: VLANs with large subnets such as /16 and/or /22, partly assigned by DHCP and partly static. They also use VLAN 1 and 192.168.x.x subnets in their environment. We are getting slow responses from a manufacturing unit that uses third-party applications. I suspect large broadcast domains could be the cause of the delays.