Does anyone have some data or basic calculations that can answer when frame coalescing (NAPI) is required and when a single interrupt per frame is sufficient?
My hardware: IBM BladeServer HS22, Broadcom 5709 Gigabit NIC hardware (MSI-X), with dual Xeon E5530 quad-core processors. Main purpose is Squid proxy server. Switch is a nice Cisco 6500 series.
Our basic problem is that during peak times (100 Mbps of traffic, only 10,000 pps) latency and packet loss increase. I have done a lot of tuning and upgraded the kernel to 2.6.38, which has improved the packet loss, but latency is still poor. Pings are sporadic, jumping to as much as 200ms on the local Gbps LAN. Squid's average response time jumps from 30ms to 500+ms even though CPU and memory load are fine.
The interrupts climb to about 15,000/second during the peak. Ksoftirqd isn't using much CPU; I have installed irqbalance to balance the IRQs (8 each for eth0 and eth1) across all the cores but that hasn't helped much.
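For reference, here is a rough back-of-the-envelope calculation from the figures above. It is only a sketch: the function name is made up for illustration, and it assumes the 10,000 pps and 15,000 interrupts/second figures cover the same traffic (in practice the interrupt count probably includes both RX and TX on eth0 and eth1).

```python
def load_summary(bits_per_sec, packets_per_sec, irqs_per_sec):
    """Derive simple per-packet figures from the observed peak load."""
    # Average packet size implied by the throughput and packet rate.
    avg_packet_bytes = bits_per_sec / 8 / packets_per_sec
    # Mean spacing between packets if arrivals were perfectly even.
    usec_between_packets = 1e6 / packets_per_sec
    # How many interrupts are raised per packet on average.
    irqs_per_packet = irqs_per_sec / packets_per_sec
    return avg_packet_bytes, usec_between_packets, irqs_per_packet


# Peak figures quoted above: 100 Mbps, 10,000 pps, 15,000 interrupts/s.
print(load_summary(100e6, 10_000, 15_000))
```

So the average packet is about 1250 bytes, packets arrive roughly every 100 microseconds on average, and there are about 1.5 interrupts per packet, which suggests little or no coalescing is actually happening at this rate.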
Intel NICs never seem to have these kinds of problems, but due to the blade system and its fixed hardware configuration, we are more or less stuck with the Broadcoms.
Everything points to the NIC as the main culprit. The best idea I have right now is to try to decrease the number of interrupts while keeping latency low and throughput high.
Unfortunately, the bnx2 driver doesn't support adaptive-rx or adaptive-tx.
The answer in the NAPI vs Adaptive Interrupts thread provides a great overview of interrupt moderation, but no concrete information on how to calculate optimal ethtool coalesce settings for a given workload. Is there a better approach than just trial and error?
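To make the question concrete, this is the kind of calculation I have in mind. It is a simplified model and not a description of how the bnx2 driver really behaves: it assumes evenly spaced packets, that an interrupt fires as soon as either `rx-usecs` has elapsed since the first pending frame or `rx-frames` frames are pending, and that a value of 0 disables that threshold. It ignores NAPI polling, the `-irq` variants of the settings, and bursty traffic. The function name is hypothetical.

```python
import math


def estimate_rx_coalescing(pps, rx_usecs, rx_frames):
    """Estimate (interrupts_per_sec, worst_case_added_latency_usec).

    Simplified model: evenly spaced arrivals; an interrupt fires when
    either threshold is reached; 0 means that threshold is disabled.
    """
    if pps <= 0:
        raise ValueError("pps must be positive")
    if rx_usecs <= 0 and rx_frames <= 0:
        raise ValueError("at least one threshold must be enabled")

    gap_usec = 1e6 / pps  # mean time between packets

    # Frames that arrive before the timer expires (first frame starts it).
    frames_by_timer = math.floor(rx_usecs / gap_usec) + 1 if rx_usecs > 0 else math.inf
    frames_by_count = rx_frames if rx_frames > 0 else math.inf
    frames_per_irq = min(frames_by_timer, frames_by_count)

    # Worst case is the first frame of a batch, which waits the longest.
    timer_delay = rx_usecs if rx_usecs > 0 else math.inf
    count_delay = (rx_frames - 1) * gap_usec if rx_frames > 0 else math.inf
    added_latency_usec = min(timer_delay, count_delay)

    return pps / frames_per_irq, added_latency_usec


# Example settings at 10,000 pps (the values are illustrative only).
for usecs, frames in [(18, 6), (250, 0), (0, 4)]:
    print(usecs, frames, estimate_rx_coalescing(10_000, usecs, frames))
```

Under this model, at 10,000 pps any `rx-usecs` below about 100 gives one interrupt per packet, because the timer expires before the next frame arrives. What I don't know is how well a model like this matches real hardware.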
Does the above-mentioned workload and hardware configuration even need NAPI? Or should it be able to get by on a single interrupt per packet?