I manage a campground wifi network with between 10 and 60 active users. I have run into issues where the router starts acting flaky (failing to hand out DHCP addresses or failing to pass traffic) without any clear warning (CPU utilization stays low, etc.). I have upgraded the router a couple of times and ended up with a Netgear ProSafe VPN router that seems to be handling the traffic. Interestingly, the Netgear has lower specs than the Buffalo router it replaced, which suggests the issue lies with the DD-WRT firmware. While I'll be pursuing this on the DD-WRT forums, I need a way to test routers.
My vision is to have 1-2 computers connected on the LAN side and 1-2 computers connected on the WAN side. I want the LAN computers to generate various types of traffic and connections, as well as request DHCP addresses.
A few notes:
- The wireless aspect should be a non-issue. Most clients connect to a wireless bridge and come into the router through a network cable.
- I had a monitoring server with Nagios running check_dhcp against the router. This server was connected directly by a network cable, eliminating wifi bridges and other devices from the equation.
- This question is somewhat related, but not exactly the same: "Load testing wireless LANs". I am going to look at IxChariot.
- While I'd ideally like to use one computer on each side running Linux, preferably with free software, I can entertain running Windows, multiple computers, or non-free software.
- Total bandwidth doesn't seem to be the issue. I can transfer large files all day. Even on the busiest days, the users only seemed to pull about 5 Mbps. There is very little LAN-to-LAN traffic, and most of it may never have reached the main router.
- The issue I need to test for seems to be tied to active users, or more appropriately, active sessions.
- I know "active users" and "active clients" are meaningless terms from a router's standpoint, and I'd welcome more appropriate ones.
Summary: I need a way to test a router's ability to handle traffic from a large number of clients. My current strategy is to buy a router, deploy it, and see how it fails in the live environment.
My suggestion would be to forget about using the cheap consumer or even the more expensive "small business" router/NAT devices. Every one of them I've gotten my hands on has been very disappointing as far as performance, reliability, functionality, and ease of use. These devices frequently have very small amounts of RAM and, when loaded with more than a very modest amount of traffic, will frequently run out of RAM (causing DHCP issues like the ones you described) or exhaust the space available in their state tracking tables (causing the router to refuse any new network flows until old state entries time out).
You ought to consider building a PFSense-based router. PFSense is a FreeBSD-based routing distribution. It can run either on a spare PC with two network cards or, preferably, on a small low-power embedded computer.
My preferred PFSense board is the ALIX 2d3. This board can be purchased as a kit along with a case, power supply, and CF card for a bit over $190. The installation process is very simple, and is well-documented on the PFSense website.
Once it's installed, you'll use a web GUI for the rest of your configuration and maintenance. You'll find that PFSense not only performs better, but is also more reliable and far more feature-rich than the commercially available offerings.
To directly answer your question about load testing: it's probably not worth your time. Load testing this sort of thing in a meaningful way is far from trivial. Before you could load test properly, you'd need to analyze traffic patterns during the times when you're experiencing the issues and then use something to generate those types of traffic patterns. It won't be enough just to schlep a bunch of data through the router, or even to, say, run an HTTP load tester against the device. You'll need to test it with traffic that mimics your real load, which is why I said it's probably not worth your time.
Either upgrade to a high-quality (read: expensive) router or do something like I outlined above using PFSense.
Sixty active users may be a little high for the default DHCP settings on most routers. For a setup like that, make sure your DHCP pool is much larger than 60. You may want to set your DHCP range to start at 100 and allow 149 addresses. Setting the lease time to less than a day may help recapture addresses from computers that aren't switched on.
I would expect 60 active users to generate significantly more than 60 active network connections (sessions) at any one time. It also takes a couple of minutes for a connection to be shut down after it is no longer being used.
You should be able to test connection capacity with most load testing software. This should give you a sense of how many connections are possible over a period of minutes. You can exercise the state tracking tables by creating lots of new connections over a relatively short period of time. Once new connections start failing, you have an idea of what the limit is.
Another area that may cause problems is the DNS cache on the router. Load testing with lookups for lots of different domains can give you an idea of whether this is the problem.
I don't know of any software for testing DHCP capacity. I would watch the lease count (available from the leases database) over time. As long as it stays below 80% of the pool's capacity, you should be fine in that respect.
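A rough way to see why sessions outnumber users is Little's law: the average number of live entries in a table equals the rate at which new entries arrive multiplied by how long each one lives. A minimal sketch; the per-user connection rate and the two-minute lifetime below are hypothetical inputs you would replace with your own measurements:

```python
def estimated_state_entries(users, new_conns_per_user_per_min, entry_lifetime_min):
    """Little's law estimate: concurrent entries = arrival rate * mean lifetime.

    All three inputs are assumptions to be replaced with measured values.
    """
    arrival_rate = users * new_conns_per_user_per_min  # new connections per minute
    return arrival_rate * entry_lifetime_min


# Hypothetical figures: 60 users, 10 new connections each per minute,
# each state entry lingering about 2 minutes.
print(estimated_state_entries(60, 10, 2))  # prints 1200
```

Even modest per-user rates put the router's state table in the thousands, which is why the user count alone says little about load.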
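A minimal sketch of such a test, assuming you run a TCP listener on a WAN-side machine and run the script from a LAN-side machine. Each connection is opened and closed immediately, so every iteration should leave a new entry in the router's state table; the function name and its parameters are illustrative, not from an existing tool:

```python
import socket
import time


def connection_churn(host, port, count, timeout=2.0):
    """Open and immediately close `count` TCP connections to host:port.

    Returns (successes, first_failure_index, elapsed_seconds); the index is
    None if every attempt succeeded.
    """
    successes = 0
    first_failure = None
    start = time.monotonic()
    for i in range(count):
        try:
            # Each new connection should create a new NAT/state entry on the router.
            with socket.create_connection((host, port), timeout=timeout):
                successes += 1
        except OSError:
            # Refused, timed out, or unreachable: record where failures begin.
            if first_failure is None:
                first_failure = i
    return successes, first_failure, time.monotonic() - start
```

For example, call `connection_churn("203.0.113.10", 8080, 5000)` with your listener's real address in place of that documentation address, and raise `count` until the first-failure index stops being `None`. The test machine's own ephemeral ports (held in TIME_WAIT after each close) can run out before the router does, so spreading the load over several LAN clients gives a more honest result.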
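A minimal sketch of a DNS lookup test, assuming the LAN-side test machine uses the router as its DNS server (the usual DHCP default). It resolves a list of names through the system resolver and records failures and per-lookup latency; the list of names is yours to supply, ideally many distinct domains so the router's cache keeps missing:

```python
import socket
import time


def resolve_batch(names):
    """Resolve each name with the system resolver.

    Returns (resolved, failed, latencies), where latencies holds seconds per lookup.
    """
    resolved, failed, latencies = 0, 0, []
    for name in names:
        start = time.monotonic()
        try:
            socket.getaddrinfo(name, None)
            resolved += 1
        except (OSError, UnicodeError):
            # socket.gaierror (a subclass of OSError) signals a failed lookup;
            # UnicodeError covers names that cannot be IDNA-encoded.
            failed += 1
        latencies.append(time.monotonic() - start)
    return resolved, failed, latencies
```

Rising latencies or a growing failure count as the number of distinct names increases would point at the router's DNS handling rather than raw bandwidth.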
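If the router uses dnsmasq for DHCP (DD-WRT usually does), the leases database is a plain-text file whose location varies by firmware. A minimal sketch that counts unexpired leases against the pool size and flags the 80% threshold, assuming dnsmasq's usual line layout `<expiry-epoch> <mac> <ip> <hostname> <client-id>`; the sample data is made up:

```python
import time


def lease_utilization(lease_text, pool_size, now=None):
    """Return the fraction of the DHCP pool held by unexpired leases.

    Assumes dnsmasq's leases-file layout, one lease per line:
    '<expiry-epoch> <mac> <ip> <hostname> <client-id>', where expiry 0 means infinite.
    """
    if now is None:
        now = time.time()
    active = set()
    for line in lease_text.splitlines():
        fields = line.split()
        # Skip blank lines and non-lease records such as dnsmasq's 'duid' line.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        expiry, ip = int(fields[0]), fields[2]
        if expiry == 0 or expiry > now:
            active.add(ip)
    return len(active) / pool_size


# Made-up sample: two live leases and one expired lease.
SAMPLE = """\
1700003600 00:11:22:33:44:55 192.168.1.100 laptop *
1700000100 66:77:88:99:aa:bb 192.168.1.101 phone *
1600000000 cc:dd:ee:ff:00:11 192.168.1.102 old-tablet *
"""

usage = lease_utilization(SAMPLE, pool_size=149, now=1700000000)
print(f"{usage:.1%} of the pool in use" + (" (above 80%!)" if usage > 0.8 else ""))
```

Sampling the file every few minutes on a busy day and logging the result gives the lease count over time.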
When increasing the size of the DHCP pool, make sure the pool starts above 1 (which is normally the router's address). Also keep the sum of the starting point and the pool size below 255, so the pool never reaches the broadcast address. You may get better results using a prime number for the size of the pool.
There are plenty of open source solutions you could implement. Configuring a DNS cache server and a web proxy may help with the load.
EDIT: One thing you may be running into is network saturation. If you have access to the error counters for the router's interfaces, you may see error rates climb as the network approaches saturation.
I have seen reports lately that newer commercial routers cause problems by queueing large volumes of packets, which can hurt performance when traffic is bursty. Shaping your traffic to stay below the bandwidth available from your ISP is reported to help. Your lower-spec router may be doing some traffic shaping inadvertently.
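A minimal sketch that checks a proposed pool stays clear of the router's address and of the network and broadcast addresses, using Python's standard `ipaddress` module. The 192.168.1.0/24 network and the router sitting at host .1 are assumed defaults, not values from your setup:

```python
import ipaddress


def dhcp_pool(network, start, size, router_host=1):
    """Return (first, last) addresses of a pool of `size` hosts beginning at host `start`.

    Raises ValueError if the pool includes the router's address or runs into
    the network or broadcast address.
    """
    net = ipaddress.ip_network(network)
    if size < 1:
        raise ValueError("pool size must be at least 1")
    first = net.network_address + start
    last = first + (size - 1)
    router = net.network_address + router_host
    if first <= router <= last:
        raise ValueError(f"pool {first}-{last} includes the router address {router}")
    if first <= net.network_address or last >= net.broadcast_address:
        raise ValueError(f"pool {first}-{last} does not fit inside {net}")
    return first, last


# The pool suggested earlier: start at host 100 with 149 addresses.
first, last = dhcp_pool("192.168.1.0/24", 100, 149)
print(f"{first} - {last}")  # prints 192.168.1.100 - 192.168.1.248
```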
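On Linux-based firmware such as DD-WRT, per-interface error counters appear in `/proc/net/dev`; a commercial router may expose them only through its web UI or SNMP, if at all. A minimal sketch, assuming shell access to save two snapshots of that file a known interval apart, that reports receive and transmit errors per second (it ignores counter wraparound):

```python
def parse_error_counters(proc_net_dev):
    """Map interface name -> (rx_errors, tx_errors) from /proc/net/dev text."""
    counters = {}
    for line in proc_net_dev.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # 8 receive columns come first (errs is the 3rd), then transmit (errs is the 11th overall).
        counters[name.strip()] = (int(fields[2]), int(fields[10]))
    return counters


def error_rates(before, after, interval_s):
    """Per-interface (rx, tx) errors per second between two snapshots."""
    old, new = parse_error_counters(before), parse_error_counters(after)
    return {
        iface: ((new[iface][0] - old[iface][0]) / interval_s,
                (new[iface][1] - old[iface][1]) / interval_s)
        for iface in new
        if iface in old
    }
```

For example, capture `cat /proc/net/dev` twice, 60 seconds apart during a busy period, and pass both strings with `interval_s=60`.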
I saw a problem very much like this in a fitness club about two weeks ago. The solution was to make the DHCP range as broad as possible and to change the DHCP lease time to 1 hour.
My opinion is that you should stop "hacking" your routers with unsupported third party firmware and load test them with the firmware that comes from the manufacturer. You can't possibly know whether the problem is the router or the firmware unless you test them with the firmware that came with them out of the box.