We have a Linux firewall with two outward-facing 10GbE adapters (Intel 82599EB) and one inward-facing 10GbE adapter (Intel 82598EB).
The problem I'm experiencing is that the firewall forwards inbound traffic only at a very low rate: roughly 2 Mbps or less. However, a direct connection from the firewall to an "inside" machine gets ~6 Gbps, while a direct connection to the firewall from an outside machine gets ~1 Gbps. There is clearly some tuning still to be done, but those tests at least demonstrate Gbps-class speeds.
We recently updated the Intel ixgbe driver from version 2.1.4 to 3.7.14 due to stability concerns with the 2.1.4 driver (lock-ups), and this seems to be when the throughput problems began.
I also tried the 3.7.17 release, but it gave similar performance to 3.7.14. On reverting to the 2.1.4 driver (re-compiled for an updated kernel, with IXGBE_NO_LRO and IXGBE_NO_NAPI) I was able to get ~Gbps throughput (well, ~900 Mbps with iperf over TCP with 3 threads).
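For reference, the numbers above came from test runs along these lines; the addresses are placeholders, and the make invocation for the out-of-tree driver build follows my reading of Intel's README, so treat the exact syntax as an assumption:

    # Throughput test: 3 parallel TCP streams for 30 s (iperf2 syntax)
    iperf -s                          # on the receiving machine
    iperf -c 192.0.2.10 -P 3 -t 30    # on the sending machine

    # Rebuild of the 2.1.4 out-of-tree driver without LRO/NAPI
    # (flag names and make syntax per Intel's README; may differ between versions)
    cd ixgbe-2.1.4/src
    make CFLAGS_EXTRA="-DIXGBE_NO_LRO -DIXGBE_NO_NAPI" install
    rmmod ixgbe && modprobe ixgbe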
This works around the immediate problem, but I would prefer to use the current version of the driver so I can keep up with bug fixes, etc. So, my question is:
- How can I troubleshoot Linux router/firewall forwarding performance?
Specifically, how can I find out where the kernel / iptables / network driver, etc. are spending their time when forwarding packets?
Any relevant advice would be appreciated.
Really strange that you only get ~1 Gbps of routing performance (even though filtering usually means 2 copies in kernel space for the same device, and probably 4x for routing). There was an LKML post a year ago about getting 120 Gbps of routing performance on the 2.6.3X series with ixgbe devices. I mostly use Intel 10GbE NICs and usually get 1000 MByte/s+ with iperf over a switched infrastructure.

First you need to check how the system performs for plain TCP with something like iperf between your endpoints. This should give you a baseline. Remember that a lot of things come into play if you need 10 Gbps wire speed; on pre-Nehalem platforms this is impossible to achieve. Also, the system load should match the NUMA layout, and the NICs have to be attached to the same PCI complex (this is important if you're stuck at < 8 Gbps). The ixgbe source distribution has an IRQ pinning script (which also disables things like power saving and irqbalance, which will only mess up the caches and is not topology-aware) that should lay out the RX/TX queues evenly across all cores (I haven't checked it in a while).
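A minimal sketch of that baseline and topology check; the interface names are placeholders, and the pinning script's name/location is whatever ships in your ixgbe tarball (it has varied between versions):

    # 1. Baseline: plain TCP between the two endpoints, no forwarding involved yet
    iperf -s                               # on one endpoint
    iperf -c <other-endpoint> -P 4 -t 30   # on the other

    # 2. Check the NUMA layout and which PCI complex each NIC sits on
    numactl --hardware
    lspci -tv | less

    # 3. See how the RX/TX queue interrupts are currently spread across CPUs
    grep eth /proc/interrupts

    # 4. Stop irqbalance and pin the queue IRQs with the script from the driver tarball
    service irqbalance stop
    ./scripts/set_irq_affinity eth2 eth3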
Regarding your question about timings: you need a kernel compiled with profiling support and a system-level profiler like oprofile.

Get your endpoint-to-endpoint performance ironed out before you enable packet filtering or routing, and post that.
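A sketch of what that profiling looks like, with perf as an alternative to oprofile; the vmlinux path and sampling durations are placeholders, and the opcontrol syntax varies between oprofile versions:

    # perf: sample all CPUs while the forwarding test runs, then break down time per kernel function
    perf top                       # live view of the hottest functions
    perf record -a -g sleep 30     # record system-wide for 30 s with call graphs
    perf report

    # oprofile (older opcontrol-based interface)
    opcontrol --init
    opcontrol --vmlinux=/path/to/vmlinux   # uncompressed kernel image with symbols
    opcontrol --start
    sleep 30                               # keep the forwarding traffic running meanwhile
    opcontrol --stop
    opreport --symbols | head -30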
Several months ago I put a bunch of effort into optimizing Linux for wirespeed Gigabit routing with lots of small packets. This was for a load balancer (IPVS) and not a NAT firewall. Here are some tips based on that.
I have not yet seen any breakdown of time spent per kernel networking function (switching vs. routing vs. firewalling, and so on).
iptables is really an efficient firewall for Linux systems. It can handle a huge amount of traffic without being the bottleneck, provided that you have written a good ruleset.
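One way to sanity-check the ruleset is to zero the counters, push test traffic through, and see which rules are actually matching; a small sketch (the chain and rule layout depend on your configuration):

    # Reset the packet/byte counters, run the forwarding test, then inspect them
    iptables -Z
    iptables -L FORWARD -v -n --line-numbers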
One thing you can do is to effectively disable iptables by flushing all rules and setting the default FORWARD policy to ACCEPT. This way you eliminate any concern about your iptables configuration. After that, you can look at the network driver and try to debug the problem if it persists.

As a word of advice, be careful not to disable iptables on a publicly accessible machine unless you know what you are doing.
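A sketch of that test, assuming you save the current ruleset first so it can be restored afterwards:

    # Save the existing firewall so it can be put back later
    iptables-save > /root/iptables.backup

    # Flush everything and open up forwarding
    iptables -F
    iptables -t nat -F
    iptables -t mangle -F
    iptables -X
    iptables -P FORWARD ACCEPT

    # ...rerun the forwarding throughput test here...

    # Restore the original ruleset when done
    iptables-restore < /root/iptables.backup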
Poor one-way performance may be caused by issues with TCP segmentation offload and other offload settings on the NIC. It shows up in many cases, e.g. with VM or VPN traffic going through a physical NIC. It's easy to disable offloads using ethtool and check performance, so it's worth trying (make sure you disable them on both endpoints for the test).
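A sketch of checking and toggling the offloads with ethtool; eth0 is a placeholder, and the same should be done on both test endpoints:

    # Show the current offload settings
    ethtool -k eth0

    # Disable the segmentation/receive offloads one at a time and retest after each
    ethtool -K eth0 tso off
    ethtool -K eth0 gso off
    ethtool -K eth0 gro off
    ethtool -K eth0 lro off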
Here is a little more background:
http://www.peerwisdom.org/2013/04/03/large-send-offload-and-network-performance/
https://social.technet.microsoft.com/Forums/windowsserver/en-US/bdc40358-45c8-4c4b-883b-a695f382e01a/very-slow-network-performance-with-intel-nic-when-tcp-large-send-offload-is-enabled?forum=winserverhyperv