I'm stress testing a site that we're building, and we're seeing a result that really surprised me:
Our site starts to load very slowly with a few hundred simultaneous users, even though CPU and memory usage are fine. Looking at the Networking tab in Task Manager, I see that my 100 Mbps network card is maxed out at 98%.
For some reason, this sounds extremely ridiculous to me...
Everything I read on scalability is about CPU, memory, caching, etc., and here I'm hitting the bottleneck on the network card itself.
We serve all our content gzipped, and our home page is fairly heavy, but not THAT heavy. I would never have expected the network card to be the bottleneck.
Is this normal?
Does everyone with a public-facing website use a 1 Gbps network card?
I thought 100 Mbps was the standard.
Am I looking at something wrong? Am I interpreting the graph in the Networking tab incorrectly?
NOTE: I can think of a number of ways to fix this, starting with getting a 1 Gbps card, and moving static files to their own server(s). My question is mostly around whether everyone is simply using 1 Gbps connections, which would surprise me enormously.
Bandwidth as the first bottleneck is not something I'm too surprised about. CPU, RAM, HD and all the other components have come on in leaps and bounds over the years, whereas 100 Mbps has been around for over a decade now. So you're in a situation where you've got a good box that's more than capable of handling a typical load, but it's connected using technology that's more than a decade old.
Even so, are you quite certain that your 100-simultaneous-user simulation accurately reflects what real-world traffic would be like? With 100 truly simultaneous hits, the server only needs to send 1 megabit per second to each user (about 125 KB/s) for you to hit peak traffic. That's a very low ceiling, and my feeling is that, unless you're certain you're going to see that kind of usage, you might need to revise your load testing.
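To make that arithmetic concrete, here is a small Python sketch of the link-sharing calculation. The 500 KB page size is a hypothetical figure, not something from the question, and the calculation ignores TCP/IP and HTTP overhead, so real-world capacity will be somewhat lower than these ceilings.

```python
def per_user_share_kbytes(link_mbps: float, users: int) -> float:
    """Kilobytes per second each user gets if the link is split evenly.

    Uses decimal units (1 Mbit = 1,000,000 bits, 1 KB = 1,000 bytes),
    which is how network link speeds are quoted.
    """
    bits_per_second_each = link_mbps * 1_000_000 / users
    return bits_per_second_each / 8 / 1_000


def max_page_loads_per_second(link_mbps: float, page_kbytes: float) -> float:
    """Upper bound on full page loads per second the link can carry.

    Ignores protocol overhead, so treat the result as a ceiling.
    """
    link_kbytes_per_second = link_mbps * 1_000_000 / 8 / 1_000
    return link_kbytes_per_second / page_kbytes


if __name__ == "__main__":
    # 100 users sharing a 100 Mbps link evenly
    print(per_user_share_kbytes(100, 100))  # 125.0 (KB/s each)
    # Hypothetical 500 KB gzipped home page, on 100 Mbps vs 1 Gbps
    for mbps in (100, 1_000):
        print(mbps, "Mbps ->", max_page_loads_per_second(mbps, 500), "page loads/s")
```

Under that assumed page size, a 100 Mbps link tops out at roughly 25 full page loads per second (250 on gigabit), which makes it plausible that a few hundred simultaneous users could saturate it.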
It sounds exactly like you are saturating your available bandwidth. You either need to cut down on your bandwidth usage or switch to a 1 Gbps card, which is what I'd normally expect to find in a public-facing web server. That has certainly been the case with every server-class machine I've touched in the last 10 years. Where did you find a server with a cheapo 100 Mbps card anyway? Is it really a repurposed desktop?
Some things to check or consider:
For years the simpler, less capable web servers have been touting their speed, and for years Apache aficionados have been noting that Apache is fast enough to easily saturate the network interface. It sounds like you have an efficient site. Are you really pushing 100 megabits per second, or is the network stack just taking up a lot of CPU?
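One way to check whether you're really pushing 100 megabits per second is to compute the actual throughput from two readings of the interface's byte counter taken a known interval apart (for example, the sent-bytes figure reported by `netstat -e` on Windows). The sketch below assumes you've already collected those readings; the function names and sample numbers are hypothetical.

```python
def throughput_mbps(bytes_start: int, bytes_end: int, seconds: float) -> float:
    """Average throughput in Mbit/s between two readings of a byte counter."""
    if seconds <= 0:
        raise ValueError("sample interval must be positive")
    if bytes_end < bytes_start:
        # The counter was reset or wrapped between readings; resample.
        raise ValueError("byte counter went backwards")
    return (bytes_end - bytes_start) * 8 / seconds / 1_000_000


def link_utilization(throughput: float, link_mbps: float) -> float:
    """Fraction of the link's nominal speed in use (1.0 = saturated)."""
    return throughput / link_mbps


if __name__ == "__main__":
    # Hypothetical readings: 122,500,000 bytes sent over a 10-second window
    mbps = throughput_mbps(0, 122_500_000, 10)
    util = link_utilization(mbps, 100)
    print(f"{mbps:.1f} Mbps, {util:.0%} of a 100 Mbps link")  # 98.0 Mbps, 98% of a 100 Mbps link
```

If the measured figure is close to the link speed, the card really is saturated; if it's well below that while pages are still slow, the bottleneck is probably somewhere else.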
I'd recommend checking the network side of your environment:
If there are no problems, upgrade to gigabit (which should be the standard for servers anyway).