I'm trying to set up IIS as a software load balancer on a Windows Server 2008 R2 box for back ends serving long-lived HTTP streaming requests. Individual curl requests work fine, but when I use httperf to make 10K connections to the IIS box, the current requests seem to max out at 2500 for each of the 2 back ends, rather than the 5000 I would expect. Are the other requests being queued? If so, is there some way around it? Is there something else I need to change? Should I use a different load balancer?
It sounds like you're hitting ARR's concurrent request limit. You can test this by changing the load balancing weights so the two nodes are weighted unequally, then checking whether the requests on the two nodes still add up to 5000. If they do, that confirms that ARR is the bottleneck.
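To make that test concrete, here is a toy model of what you would expect to see if a single proxy-side cap is the bottleneck. It is only a sketch of the arithmetic, not ARR's actual algorithm; the function name `distribute` and the 3:1 weighting are made up for illustration.

```python
def distribute(total_requests, cap, weights):
    """Toy model: the proxy admits at most `cap` concurrent requests,
    then splits the admitted requests across back ends by weight."""
    admitted = min(total_requests, cap)
    total_weight = sum(weights)
    return [admitted * w // total_weight for w in weights]


# Even weights: each back end sees half of the capped total.
even = distribute(10_000, 5_000, [1, 1])
print(even, sum(even))      # [2500, 2500] 5000

# Uneven weights (hypothetical 3:1): the split changes, the sum does not.
skewed = distribute(10_000, 5_000, [3, 1])
print(skewed, sum(skewed))  # [3750, 1250] 5000
```

If the per-node numbers shift with the weights but the total stays pinned at 5000, the cap is on the ARR side rather than on the back ends.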
How are resources on the ARR server? I bet they are fine, in which case it's just a matter of changing the settings. Otherwise it sounds like ARR is doing a good job for you.
The setting that is limiting you is likely appConcurrentRequestLimit under system.webServer/serverRuntime, which has a default value of 5000. Since you have a legit need to raise that, you can set it to something much higher on the ARR server(s).
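As a rough sketch, the change would look something like this in applicationHost.config on the ARR server. The value 100000 is just an illustrative number, not a recommendation; pick one that fits your expected connection count and the server's resources.

```xml
<!-- applicationHost.config on the ARR server (assumed location for this section) -->
<system.webServer>
  <!-- Default is 5000; 100000 is an example value only -->
  <serverRuntime appConcurrentRequestLimit="100000" />
</system.webServer>
```

After changing it, rerun the httperf test and check whether the combined total across the two back ends goes past 5000.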