Background:
I've inherited a high volume caching nameserver environment (Redhat Enterprise Linux 5.8, IBM System x3550) that has inconsistent ring buffer settings: 1020 for eth0 and 255 for eth1. eth0 is connected to switch 1 of its local datacenter, eth1 is connected to switch 2 of the same. Every server in the cluster alternates between whether eth0 or eth1 is the active interface, and every cluster is located in a different region. The ring buffers obviously need to be made consistent.
Here's where things get trickier: I discovered the problem above when researching why a number of the nameservers are frequently logging "error sending response: unset" errors, which the ISC knowledgebase suggests is related to outbound congestion. Servers with the higher ring buffer setting (1020) drop fewer packets on ifconfig (as one would expect), but tend to log the above error with great frequency, ~20k times a day in one of my highest load groups. We'll call this ''Group 1''. The servers with the lower ring buffer (255) setting drop significantly more inbound packets per day (again, expected), but have far fewer instances of the BIND error, typically 0-150 in that same load group.
Not a huge mystery here either. Caching DNS is a recursive service: if something isn't cached, the server has to make multiple queries on behalf of that one question until it can finally return an answer. It's a (one in)->(many out) query relationship. Fixing the RX ring buffers should cause this number to equalize to a new value across the board, and from there it would probably be a good idea to tune the kernel's outbound network queue in proc (wmem_max/wmem_default).
I like being able to gauge the influence of configuration changes on a performance problem, so I wrote a report to gather some data before I started making production changes. Here's an example of the output for the first two servers in Group 1:
group1-01
RX: 7166.27/sec av.
TX: 7432.57/sec av.
RXDROP: 7.43/sec av.
unset_err: 27633
group1-02
RX: 7137.37/sec av.
TX: 7398.50/sec av.
RXDROP: 9.94/sec av.
unset_err: 107
These are the formulas. Note that this is a local script, and there are no reliance on shell scripts that have to be maintained per-server.
RXPACK=$(ssh $server "sar -n DEV -f /var/log/sa/sa$(date --date=yesterday '+%d') | grep \"Average: .*\$(awk '{if (\$2 == "00000000") { print \$1 }}' /proc/net/route)\" | awk '{print \$3}'" 2>/dev/null)
TXPACK=$(ssh $server "sar -n DEV -f /var/log/sa/sa$(date --date=yesterday '+%d') | grep \"Average: .*\$(awk '{if (\$2 == "00000000") { print \$1 }}' /proc/net/route)\" | awk '{print \$4}'" 2>/dev/null)
RXDROP=$(ssh $server "sar -n EDEV -f /var/log/sa/sa$(date --date=yesterday '+%d') | grep \"Average: .*\$(awk '{if (\$2 == "00000000") { print \$1 }}' /proc/net/route)\" | awk '{print \$6}'" 2>/dev/null)
TXDROP=$(ssh $server "sudo grep 'error sending response: unset' /var/log/dns_named.1" 2>/dev/null | wc -l)
Once I start running this report across all of my caching DNS environments, I notice that another group with a near identical packet load, which we'll call Group 2, has no problems at all:
group2-01
RX: 7066.44/sec av.
TX: 7345.95/sec av.
RXDROP: 0.00/sec av.
unset_err: 0
group2-02
RX: 7019.18/sec av.
TX: 7312.47/sec av.
RXDROP: 0.00/sec av.
unset_err: 0
The question:
Why does group2 behave this way without requiring further tuning of RX ring buffers or net.core.wmem_default
/net.core.wmem_max
? I'm going to need to normalize the ring buffers no matter what, but I would like to understand what else is going on here before I start playing with wmem values in /proc.
The only thing I can think of is that the queue is getting emptied faster by the application, but network stack tuning is not something I have a great deal of hands-on experience with and I'd like to get second opinions. (my eyes glaze over at some of the ethtool counter names, I won't deny it)
I have eliminated the following as possibilities. Proofs follow after the divider.
- The ring buffer layout is the same. (first server of group1 and group2 configured the same, second server of group1 and group2 configured the same)
- The default gateway layout is the same.
- The network cards are the same. (Broadcom BCM5708)
- The firmware version reported by ethtool is the same. (bc 4.0.3 ipms 1.6.0)
sysctl -a
output matches between the first servers of both groups and the second servers of both groups. (excluding kernel and fs sections)- The total number of servers in Group 1 and Group 2 are the same. (10)
For confidentiality reasons I cannot show the raw named.conf, or the grep filter I'm using to exclude information. You will have to take my word for it that the following configuration parameters are constant between all four servers:
notify no;
allow-transfer { none; };
allow-recursion { any; };
allow-query { any; };
allow-query-cache { any; };
recursive-clients 100000;
max-cache-size 2G;
max-ncache-ttl 900;
Below is a great deal of system information. The "hosthash" is just to demonstrate that each iteration of the loop is in fact hitting a different server without revealing the actual hostname.
Host hashes:
group1-1: dc78abcb154b74c87feecb3f35222263d40c028c
group1-2: 9fe491d58fd1e7d4e21e5bf10c164e4cf66e884b
group2-1: fc76bb3ee1ff580c6aba0d685713bb4145bd5fe3
group2-2: b7550c65d37622a131b1e47f066773defbb4d817
for server in $group1_1 $group1_2 $group2_1 $group2_2
do
echo ____________________
ssh $server "echo -en hosthash: \$(echo \$HOSTNAME | sha1sum)\\\n\\\n &&
SARFILE=/var/log/sa/sa\$(date --date=yesterday '+%d') &&
uname -srvmpio &&
sudo /usr/sbin/dmidecode -s system-product-name
dmesg | grep Broadcom &&
head /proc/cpuinfo &&
GWIF=\$(awk '{if (\$2 == 00000000) { print \$1 }}' /proc/net/route) &&
sar -n DEV -f \$SARFILE | egrep '(IFACE|Average)' &&
sar -n EDEV -f \$SARFILE | egrep '(IFACE|Average)' &&
sudo /sbin/ethtool \$GWIF &&
sudo /sbin/ethtool -i \$GWIF &&
sudo /sbin/ethtool -g \$GWIF &&
sudo /sbin/ethtool -c \$GWIF &&
sudo /sbin/ethtool -S \$GWIF &&
echo sysctl linecount: \$(sudo /sbin/sysctl -a | egrep -v '^(fs|kernel)' | wc -l) &&
echo sysctl hash: \$(sudo /sbin/sysctl -a | egrep -v '^(fs|kernel)' | sha1sum)"
done
Output:
____________________
hosthash: dc78abcb154b74c87feecb3f35222263d40c028c -
Linux 2.6.18-308.16.1.el5 #1 SMP Tue Sep 18 07:21:07 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
IBM System x3550 -[7978AC1]-
bnx2: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.1.11 (July 20, 2011)
eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem c8000000, IRQ 90, node addr 001a649db00e
eth1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem ce000000, IRQ 177, node addr 001a649db010
cnic: Broadcom NetXtreme II CNIC Driver cnic v2.5.7 (July 20, 2011)
Broadcom NetXtreme II iSCSI Driver bnx2i v2.7.0.3 (Aug 04, 2011)
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
stepping : 6
cpu MHz : 2493.750
cache size : 6144 KB
physical id : 0
siblings : 4
12:00:01 AM IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s
Average: lo 1269.15 1269.15 206600.39 206600.39 0.00 0.00 0.00
Average: eth0 7166.27 7432.57 704051.80 2419779.42 0.00 0.00 0.94
Average: eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:00:01 AM IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s
Average: lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: eth0 0.00 0.00 0.00 7.43 0.00 0.00 0.00 0.00 0.00
Average: eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
driver: bnx2
version: 2.1.11
firmware-version: bc 4.0.3 ipms 1.6.0
bus-info: 0000:04:00.0
Ring parameters for eth0:
Pre-set maximums:
RX: 2040
RX Mini: 0
RX Jumbo: 8160
TX: 255
Current hardware settings:
RX: 1020
RX Mini: 0
RX Jumbo: 0
TX: 255
Coalesce parameters for eth0:
Adaptive RX: off TX: off
stats-block-usecs: 999936
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 18
rx-frames: 12
rx-usecs-irq: 18
rx-frames-irq: 2
tx-usecs: 80
tx-frames: 20
tx-usecs-irq: 18
tx-frames-irq: 2
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
NIC statistics:
rx_bytes: 1505439501410
rx_error_bytes: 0
tx_bytes: 4672574845104
tx_error_bytes: 0
rx_ucast_packets: 15315548049
rx_mcast_packets: 2035415
rx_bcast_packets: 1101989
tx_ucast_packets: 15505474251
tx_mcast_packets: 40018
tx_bcast_packets: 36019
tx_mac_errors: 0
tx_carrier_errors: 0
rx_crc_errors: 0
rx_align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
tx_deferred: 0
tx_excess_collisions: 0
tx_late_collisions: 0
tx_total_collisions: 0
rx_fragments: 0
rx_jabbers: 0
rx_undersize_packets: 0
rx_oversize_packets: 0
rx_64_byte_packets: 92309552
rx_65_to_127_byte_packets: 1243637891
rx_128_to_255_byte_packets: 790117566
rx_256_to_511_byte_packets: 127197337
rx_512_to_1023_byte_packets: 168929387
rx_1024_to_1522_byte_packets: 11591832
rx_1523_to_9022_byte_packets: 0
tx_64_byte_packets: 60586118
tx_65_to_127_byte_packets: 1976738758
tx_128_to_255_byte_packets: 2830395753
tx_256_to_511_byte_packets: 157607989
tx_512_to_1023_byte_packets: 1483716940
tx_1024_to_1522_byte_packets: 406821340
tx_1523_to_9022_byte_packets: 0
rx_xon_frames: 0
rx_xoff_frames: 0
tx_xon_frames: 116422
tx_xoff_frames: 134780
rx_mac_ctrl_frames: 0
rx_filtered_packets: 0
rx_ftq_discards: 0
rx_discards: 0
rx_fw_discards: 14015105
sysctl linecount: 504
sysctl hash: dd6aab90d0fd9ae90742c5f812a78734e2f2ff1c -
____________________
hosthash: 9fe491d58fd1e7d4e21e5bf10c164e4cf66e884b -
Linux 2.6.18-308.16.1.el5 #1 SMP Tue Sep 18 07:21:07 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
IBM System x3550 -[7978EHU]-
bnx2: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.1.11 (July 20, 2011)
eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem c8000000, IRQ 90, node addr 001a6479655c
eth1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem ce000000, IRQ 177, node addr 001a6479655e
cnic: Broadcom NetXtreme II CNIC Driver cnic v2.5.7 (July 20, 2011)
Broadcom NetXtreme II iSCSI Driver bnx2i v2.7.0.3 (Aug 04, 2011)
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
stepping : 6
cpu MHz : 2493.746
cache size : 6144 KB
physical id : 0
siblings : 4
12:00:01 AM IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s
Average: lo 1261.04 1261.04 205548.08 205548.08 0.00 0.00 0.00
Average: eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: eth1 7137.37 7398.50 702340.35 2409580.71 0.00 0.00 0.97
Average: sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:00:01 AM IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s
Average: lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: eth1 0.00 0.00 0.00 9.94 0.00 0.00 0.00 0.00 0.00
Average: sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
driver: bnx2
version: 2.1.11
firmware-version: bc 4.0.3 ipms 1.6.0
bus-info: 0000:06:00.0
Ring parameters for eth1:
Pre-set maximums:
RX: 2040
RX Mini: 0
RX Jumbo: 8160
TX: 255
Current hardware settings:
RX: 255
RX Mini: 0
RX Jumbo: 0
TX: 255
Coalesce parameters for eth1:
Adaptive RX: off TX: off
stats-block-usecs: 999936
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 18
rx-frames: 12
rx-usecs-irq: 18
rx-frames-irq: 2
tx-usecs: 80
tx-frames: 20
tx-usecs-irq: 18
tx-frames-irq: 2
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
NIC statistics:
rx_bytes: 1501719289640
rx_error_bytes: 0
tx_bytes: 4654179094291
tx_error_bytes: 0
rx_ucast_packets: 15253610508
rx_mcast_packets: 2108112
rx_bcast_packets: 1136240
tx_ucast_packets: 15438361249
tx_mcast_packets: 40135
tx_bcast_packets: 1721
tx_mac_errors: 0
tx_carrier_errors: 0
rx_crc_errors: 0
rx_align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
tx_deferred: 0
tx_excess_collisions: 0
tx_late_collisions: 0
tx_total_collisions: 0
rx_fragments: 0
rx_jabbers: 0
rx_undersize_packets: 0
rx_oversize_packets: 0
rx_64_byte_packets: 92376678
rx_65_to_127_byte_packets: 1183040190
rx_128_to_255_byte_packets: 788176623
rx_256_to_511_byte_packets: 126838328
rx_512_to_1023_byte_packets: 168170816
rx_1024_to_1522_byte_packets: 13350337
rx_1523_to_9022_byte_packets: 0
tx_64_byte_packets: 60806588
tx_65_to_127_byte_packets: 1955234150
tx_128_to_255_byte_packets: 2806601346
tx_256_to_511_byte_packets: 154015585
tx_512_to_1023_byte_packets: 1466206531
tx_1024_to_1522_byte_packets: 405928513
tx_1523_to_9022_byte_packets: 0
rx_xon_frames: 0
rx_xoff_frames: 0
tx_xon_frames: 150648
tx_xoff_frames: 173552
rx_mac_ctrl_frames: 0
rx_filtered_packets: 1
rx_ftq_discards: 0
rx_discards: 0
rx_fw_discards: 19605427
sysctl linecount: 504
sysctl hash: 4626e3788c72e091487afe1e3a7cfd32278ab07d -
____________________
hosthash: fc76bb3ee1ff580c6aba0d685713bb4145bd5fe3 -
Linux 2.6.18-308.16.1.el5 #1 SMP Tue Sep 18 07:21:07 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
IBM System x3550 -[7978AC1]-
bnx2: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.1.11 (July 20, 2011)
eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem c8000000, IRQ 90, node addr 001a649dc68a
eth1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem ce000000, IRQ 177, node addr 001a649dc68c
cnic: Broadcom NetXtreme II CNIC Driver cnic v2.5.7 (July 20, 2011)
Broadcom NetXtreme II iSCSI Driver bnx2i v2.7.0.3 (Aug 04, 2011)
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
stepping : 6
cpu MHz : 2493.750
cache size : 6144 KB
physical id : 0
siblings : 4
12:00:01 AM IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s
Average: lo 1891.67 1891.67 266593.77 266593.77 0.00 0.00 0.00
Average: eth0 7066.44 7345.95 730519.41 2215508.99 0.00 0.00 4.37
Average: eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:00:01 AM IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s
Average: lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
driver: bnx2
version: 2.1.11
firmware-version: bc 4.0.3 ipms 1.6.0
bus-info: 0000:04:00.0
Ring parameters for eth0:
Pre-set maximums:
RX: 2040
RX Mini: 0
RX Jumbo: 8160
TX: 255
Current hardware settings:
RX: 1020
RX Mini: 0
RX Jumbo: 0
TX: 255
Coalesce parameters for eth0:
Adaptive RX: off TX: off
stats-block-usecs: 999936
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 18
rx-frames: 12
rx-usecs-irq: 18
rx-frames-irq: 2
tx-usecs: 80
tx-frames: 20
tx-usecs-irq: 18
tx-frames-irq: 2
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
NIC statistics:
rx_bytes: 4640887074833
rx_error_bytes: 0
tx_bytes: 12640942400790
tx_error_bytes: 0
rx_ucast_packets: 46405845860
rx_mcast_packets: 14487857
rx_bcast_packets: 3476467
tx_ucast_packets: 47159091638
tx_mcast_packets: 118147
tx_bcast_packets: 5504
tx_mac_errors: 0
tx_carrier_errors: 0
rx_crc_errors: 0
rx_align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
tx_deferred: 0
tx_excess_collisions: 0
tx_late_collisions: 0
tx_total_collisions: 0
rx_fragments: 0
rx_jabbers: 0
rx_undersize_packets: 0
rx_oversize_packets: 0
rx_64_byte_packets: 136463411
rx_65_to_127_byte_packets: 4245502343
rx_128_to_255_byte_packets: 2357984838
rx_256_to_511_byte_packets: 355610202
rx_512_to_1023_byte_packets: 608223572
rx_1024_to_1522_byte_packets: 65320154
rx_1523_to_9022_byte_packets: 0
tx_64_byte_packets: 112166114
tx_65_to_127_byte_packets: 3010346100
tx_128_to_255_byte_packets: 4087240164
tx_256_to_511_byte_packets: 1625596725
tx_512_to_1023_byte_packets: 3037109096
tx_1024_to_1522_byte_packets: 927187571
tx_1523_to_9022_byte_packets: 0
rx_xon_frames: 0
rx_xoff_frames: 0
tx_xon_frames: 79164
tx_xoff_frames: 89685
rx_mac_ctrl_frames: 0
rx_filtered_packets: 1
rx_ftq_discards: 0
rx_discards: 0
rx_fw_discards: 6857729
sysctl linecount: 504
sysctl hash: dd6aab90d0fd9ae90742c5f812a78734e2f2ff1c -
____________________
hosthash: b7550c65d37622a131b1e47f066773defbb4d817 -
Linux 2.6.18-308.16.1.el5 #1 SMP Tue Sep 18 07:21:07 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
IBM System x3550 -[7978EHU]-
bnx2: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.1.11 (July 20, 2011)
eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem c8000000, IRQ 90, node addr 00215e3f1ec4
eth1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem ce000000, IRQ 177, node addr 00215e3f1ec6
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
stepping : 6
cpu MHz : 2493.753
cache size : 6144 KB
physical id : 1
siblings : 4
12:00:01 AM IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s
Average: lo 1883.04 1883.04 263726.79 263726.79 0.00 0.00 0.00
Average: eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: eth1 7019.18 7312.47 720911.92 2214861.10 0.00 0.00 1.02
Average: sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:00:01 AM IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s
Average: lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
driver: bnx2
version: 2.1.11
firmware-version: bc 4.0.3 ipms 1.6.0
bus-info: 0000:06:00.0
Ring parameters for eth1:
Pre-set maximums:
RX: 2040
RX Mini: 0
RX Jumbo: 8160
TX: 255
Current hardware settings:
RX: 255
RX Mini: 0
RX Jumbo: 0
TX: 255
Coalesce parameters for eth1:
Adaptive RX: off TX: off
stats-block-usecs: 999936
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 18
rx-frames: 12
rx-usecs-irq: 18
rx-frames-irq: 2
tx-usecs: 80
tx-frames: 20
tx-usecs-irq: 18
tx-frames-irq: 2
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
NIC statistics:
rx_bytes: 4621548539323
rx_error_bytes: 0
tx_bytes: 12598031299743
tx_error_bytes: 0
rx_ucast_packets: 46260356368
rx_mcast_packets: 5352446
rx_bcast_packets: 3474589
tx_ucast_packets: 47008853953
tx_mcast_packets: 118164
tx_bcast_packets: 5471
tx_mac_errors: 0
tx_carrier_errors: 0
rx_crc_errors: 0
rx_align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
tx_deferred: 0
tx_excess_collisions: 0
tx_late_collisions: 0
tx_total_collisions: 0
rx_fragments: 0
rx_jabbers: 0
rx_undersize_packets: 0
rx_oversize_packets: 0
rx_64_byte_packets: 126851062
rx_65_to_127_byte_packets: 4117708205
rx_128_to_255_byte_packets: 2346047550
rx_256_to_511_byte_packets: 356266112
rx_512_to_1023_byte_packets: 604666332
rx_1024_to_1522_byte_packets: 62938478
rx_1523_to_9022_byte_packets: 0
tx_64_byte_packets: 111216848
tx_65_to_127_byte_packets: 2984505931
tx_128_to_255_byte_packets: 4027485330
tx_256_to_511_byte_packets: 1577669672
tx_512_to_1023_byte_packets: 3015060448
tx_1024_to_1522_byte_packets: 933575954
tx_1523_to_9022_byte_packets: 0
rx_xon_frames: 0
rx_xoff_frames: 0
tx_xon_frames: 129873
tx_xoff_frames: 145090
rx_mac_ctrl_frames: 0
rx_filtered_packets: 1
rx_ftq_discards: 0
rx_discards: 0
rx_fw_discards: 6752713
sysctl linecount: 504
sysctl hash: 4626e3788c72e091487afe1e3a7cfd32278ab07d -
Wondering if the box is a Dell? There's a well known issue with the bnx2i driver and chipsets shipped by Dell. The result is randomly dropped packets under heavy network load. Would seem logical that the tuned-up ring buffers could trigger it, if this is the case.
I believe Dell offers their own version of the driver as a fix. The other fix is to do something like this in modprobe.conf:
options bnx2i disable_msi=1
Can't hurt to try, anyhow. And x2 what kce said. One of the best written questions I've ever seen here.
Even if you're sure that you have a full list of load balancer VIPs for your servers, run a packet capture anyway. Just because your machine won't respond to ARP for an IP address doesn't mean that bogus packets can't be sent to it. Make sure the traffic being sent to your MAC addresses are matching up with configured IP addresses.
I appreciate the time that people put into this question, but my own due diligence was lacking here. In hindsight, I needed to build a PCAP filter like this:
Where:
There were a number of load balancer VIPs that were not given to me (I don't control the LB), and they were passing traffic on TCP port 53 in ways that would result in RX discards. The volume of traffic on these legacy IPs was so low that it was not likely to be noticed by an admin eyeballing traffic on the wire.