I have a Dell R610 with a dual-port 10G Broadcom NIC (BCM5719, using the bnx2x driver); it serves ~1000 PPPoE users with an average throughput of 1-1.5 Gbps and peaks of up to 3 Gbps.
I noticed packet loss to the gateway whenever traffic exceeded 1 Gbps. Looking at the interface statistics, instead of the expected drops I found only rx errors (on both ports, but mostly on the WAN port) and overruns. My first reaction was to swap the SFP modules between the ports, then to test a new SFP module, and finally to replace the machine with the available identical cold spare, all with no improvement.
After some reading I started by increasing the ring buffer to its maximum value (ethtool -G enp5s0f0 4078) and playing with the offload settings (ethtool -K enp5s0f0 gro off gso off gso off lro off). I suppose the second command did the trick, but I am not 100% sure, because I additionally upgraded the Linux distro (Debian) to its latest release, together with the bnx2x driver and its non-free firmware.
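For reference, a minimal sketch of how the ring maximum could be read from the NIC instead of being hard-coded. The helper name rx_ring_max is made up for illustration; it assumes that `ethtool -g` prints the "Pre-set maximums" section first, so the first line starting with "RX:" carries the maximum. Note that `ethtool -G` expects a named parameter such as rx in front of the value.

```shell
#!/bin/sh
# Hypothetical helper: print the pre-set maximum RX ring size,
# reading `ethtool -g` output from stdin.
# Assumption: the "Pre-set maximums" section comes before the
# "Current hardware settings" section, so the first "RX:" line is the maximum.
rx_ring_max() {
    awk '/^RX:/ { print $2; exit }'
}

# Usage (needs root and the real interface, so left commented out here):
#   max=$(ethtool -g enp5s0f0 | rx_ring_max)
#   ethtool -G enp5s0f0 rx "$max"
#   ethtool -k enp5s0f0    # lists the current offload settings
```

Reading the value back with `ethtool -g` after the change confirms whether the driver accepted it.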
I have also shortened nf_conntrack_tcp_timeout_established from its default of 432000 s (5 days) to 10 min (although it never filled up, the conntrack table shrank from ~110k to ~15k established connections).
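As a small sketch of the timeout change described above: the two values expressed in seconds, plus the sysctl calls that would apply and check them. The variable names are only illustrative, and the sysctl lines are commented out because they need root and a loaded conntrack module.

```shell
#!/bin/sh
# Default timeout: 5 days expressed in seconds.
default_timeout=$((5 * 24 * 60 * 60))
# Shortened timeout: 10 minutes expressed in seconds.
new_timeout=$((10 * 60))
echo "default=$default_timeout new=$new_timeout"

# To apply and inspect (root, nf_conntrack loaded):
#   sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established="$new_timeout"
#   sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
```

Comparing nf_conntrack_count against nf_conntrack_max is one way to confirm that the table is nowhere near full.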
Not sure if this or the ethtool settings above made the difference, but I got rid of the packet loss problem and now
Current situation: no more packet loss, traffic seems not to be affected anymore, but the rx errors counter is continuously increasing (quicker when the throughput goes above 1.5 Gbps).
I did not have dropped packets, either in the interface counters or as a result of a full conntrack table; the number of overruns was quite big until the
The graph below shows the errors counter (taken from ifconfig) with 5 days of uptime, along with the interface statistics.
I would be very grateful for an explanation of this phenomenon and, if possible, a mitigation of the problem.
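To put a number on "continuously increasing", a rough sketch that samples the kernel's rx_errors counter twice and prints the growth per second. The function names are made up for illustration; the counter is read from /sys/class/net/<interface>/statistics/rx_errors, and the result is truncated by integer division.

```shell
#!/bin/sh
# Errors per second between two counter samples.
# $1 = earlier sample, $2 = later sample, $3 = interval in seconds
error_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# Sample rx_errors for an interface twice and print the rate.
# $1 = interface name, $2 = interval in seconds
sample_rx_errors() {
    file="/sys/class/net/$1/statistics/rx_errors"
    before=$(cat "$file")
    sleep "$2"
    after=$(cat "$file")
    error_rate "$before" "$after" "$2"
}

# Usage: sample_rx_errors enp5s0f0 10
```

Running it at different traffic levels would show how strongly the error rate follows throughput.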
[root@gw01]:~# ethtool -S enp5s0f0
NIC statistics:
[0]: rx_bytes: 5754726086899
[0]: rx_ucast_packets: 4555048326
[0]: rx_mcast_packets: 0
[0]: rx_bcast_packets: 0
[0]: rx_discards: 0
[0]: rx_phy_ip_err_discards: 0
[0]: rx_skb_alloc_discard: 0
[0]: rx_csum_offload_errors: 0
[0]: tx_exhaustion_events: 0
[0]: tx_bytes: 6070322710813
[0]: tx_ucast_packets: 15474233735
[0]: tx_mcast_packets: 9
[0]: tx_bcast_packets: 0
[0]: tpa_aggregations: 0
[0]: tpa_aggregated_frames: 0
[0]: tpa_bytes: 0
[0]: driver_filtered_tx_pkt: 0
[1]: rx_bytes: 6111893987703
[1]: rx_ucast_packets: 4778498023
[1]: rx_mcast_packets: 0
[1]: rx_bcast_packets: 0
[1]: rx_discards: 0
[1]: rx_phy_ip_err_discards: 0
[1]: rx_skb_alloc_discard: 0
[1]: rx_csum_offload_errors: 53
[1]: tx_exhaustion_events: 0
[1]: tx_bytes: 16548597730
[1]: tx_ucast_packets: 60592175
[1]: tx_mcast_packets: 0
[1]: tx_bcast_packets: 0
[1]: tpa_aggregations: 0
[1]: tpa_aggregated_frames: 0
[1]: tpa_bytes: 0
[1]: driver_filtered_tx_pkt: 0
[2]: rx_bytes: 5807120321999
[2]: rx_ucast_packets: 4580626375
[2]: rx_mcast_packets: 0
[2]: rx_bcast_packets: 0
[2]: rx_discards: 0
[2]: rx_phy_ip_err_discards: 0
[2]: rx_skb_alloc_discard: 0
[2]: rx_csum_offload_errors: 25
[2]: tx_exhaustion_events: 0
[2]: tx_bytes: 17307821972
[2]: tx_ucast_packets: 64955978
[2]: tx_mcast_packets: 143
[2]: tx_bcast_packets: 1
[2]: tpa_aggregations: 0
[2]: tpa_aggregated_frames: 0
[2]: tpa_bytes: 0
[2]: driver_filtered_tx_pkt: 0
[3]: rx_bytes: 7585668862381
[3]: rx_ucast_packets: 6050090451
[3]: rx_mcast_packets: 0
[3]: rx_bcast_packets: 0
[3]: rx_discards: 0
[3]: rx_phy_ip_err_discards: 0
[3]: rx_skb_alloc_discard: 0
[3]: rx_csum_offload_errors: 0
[3]: tx_exhaustion_events: 0
[3]: tx_bytes: 15857583114
[3]: tx_ucast_packets: 62050481
[3]: tx_mcast_packets: 2
[3]: tx_bcast_packets: 0
[3]: tpa_aggregations: 0
[3]: tpa_aggregated_frames: 0
[3]: tpa_bytes: 0
[3]: driver_filtered_tx_pkt: 0
[4]: rx_bytes: 5507320349168
[4]: rx_ucast_packets: 4323030058
[4]: rx_mcast_packets: 0
[4]: rx_bcast_packets: 0
[4]: rx_discards: 0
[4]: rx_phy_ip_err_discards: 0
[4]: rx_skb_alloc_discard: 0
[4]: rx_csum_offload_errors: 0
[4]: tx_exhaustion_events: 0
[4]: tx_bytes: 18875914644
[4]: tx_ucast_packets: 61667057
[4]: tx_mcast_packets: 0
[4]: tx_bcast_packets: 1
[4]: tpa_aggregations: 0
[4]: tpa_aggregated_frames: 0
[4]: tpa_bytes: 0
[4]: driver_filtered_tx_pkt: 0
[5]: rx_bytes: 7597606902068
[5]: rx_ucast_packets: 5984661040
[5]: rx_mcast_packets: 0
[5]: rx_bcast_packets: 0
[5]: rx_discards: 0
[5]: rx_phy_ip_err_discards: 0
[5]: rx_skb_alloc_discard: 0
[5]: rx_csum_offload_errors: 1
[5]: tx_exhaustion_events: 0
[5]: tx_bytes: 15257461291
[5]: tx_ucast_packets: 61970141
[5]: tx_mcast_packets: 0
[5]: tx_bcast_packets: 0
[5]: tpa_aggregations: 0
[5]: tpa_aggregated_frames: 0
[5]: tpa_bytes: 0
[5]: driver_filtered_tx_pkt: 0
[6]: rx_bytes: 6104830059179
[6]: rx_ucast_packets: 4796493913
[6]: rx_mcast_packets: 0
[6]: rx_bcast_packets: 0
[6]: rx_discards: 910
[6]: rx_phy_ip_err_discards: 0
[6]: rx_skb_alloc_discard: 0
[6]: rx_csum_offload_errors: 1
[6]: tx_exhaustion_events: 0
[6]: tx_bytes: 17389300423
[6]: tx_ucast_packets: 64203382
[6]: tx_mcast_packets: 0
[6]: tx_bcast_packets: 0
[6]: tpa_aggregations: 0
[6]: tpa_aggregated_frames: 0
[6]: tpa_bytes: 0
[6]: driver_filtered_tx_pkt: 0
[7]: rx_bytes: 6185384387977
[7]: rx_ucast_packets: 4817905006
[7]: rx_mcast_packets: 0
[7]: rx_bcast_packets: 0
[7]: rx_discards: 0
[7]: rx_phy_ip_err_discards: 0
[7]: rx_skb_alloc_discard: 0
[7]: rx_csum_offload_errors: 0
[7]: tx_exhaustion_events: 0
[7]: tx_bytes: 16405943750
[7]: tx_ucast_packets: 60554882
[7]: tx_mcast_packets: 0
[7]: tx_bcast_packets: 0
[7]: tpa_aggregations: 0
[7]: tpa_aggregated_frames: 0
[7]: tpa_bytes: 0
[7]: driver_filtered_tx_pkt: 0
rx_bytes: 50654550957374
rx_error_bytes: 0
rx_ucast_packets: 39886353192
rx_mcast_packets: 0
rx_bcast_packets: 0
rx_crc_errors: 0
rx_align_errors: 0
rx_undersize_packets: 0
rx_oversize_packets: 0
rx_fragments: 0
rx_jabbers: 0
rx_discards: 910
rx_filtered_packets: 49130
rx_mf_tag_discard: 0
pfc_frames_received: 0
pfc_frames_sent: 0
rx_brb_discard: 27405292
rx_brb_truncate: 1329322
rx_pause_frames: 0
rx_mac_ctrl_frames: 0
rx_constant_pause_events: 0
rx_phy_ip_err_discards: 0
rx_skb_alloc_discard: 0
rx_csum_offload_errors: 80
tx_exhaustion_events: 0
tx_bytes: 6187965333737
tx_error_bytes: 0
tx_ucast_packets: 15910227831
tx_mcast_packets: 154
tx_bcast_packets: 2
tx_mac_errors: 0
tx_carrier_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
tx_deferred: 0
tx_excess_collisions: 0
tx_late_collisions: 0
tx_total_collisions: 0
tx_64_byte_packets: 2006785053
tx_65_to_127_byte_packets: 8659794036
tx_128_to_255_byte_packets: 1069905700
tx_256_to_511_byte_packets: 348433834
tx_512_to_1023_byte_packets: 446363890
tx_1024_to_1522_byte_packets: 3378961793
tx_1523_to_9022_byte_packets: 0
tx_pause_frames: 7912342
tpa_aggregations: 0
tpa_aggregated_frames: 0
tpa_bytes: 0
recoverable_errors: 0
unrecoverable_errors: 0
driver_filtered_tx_pkt: 0
Tx LPI entry count: 0
ptp_skipped_tx_tstamp: 0
[root@gw01]:~# ethtool -S enp5s0f1
NIC statistics:
[0]: rx_bytes: 6254611311587
[0]: rx_ucast_packets: 15509699659
[0]: rx_mcast_packets: 1380
[0]: rx_bcast_packets: 2526946
[0]: rx_discards: 15041
[0]: rx_phy_ip_err_discards: 0
[0]: rx_skb_alloc_discard: 0
[0]: rx_csum_offload_errors: 0
[0]: tx_exhaustion_events: 0
[0]: tx_bytes: 6265388531698
[0]: tx_ucast_packets: 4863734758
[0]: tx_mcast_packets: 1361
[0]: tx_bcast_packets: 0
[0]: tpa_aggregations: 0
[0]: tpa_aggregated_frames: 0
[0]: tpa_bytes: 0
[0]: driver_filtered_tx_pkt: 0
[1]: rx_bytes: 4399727408
[1]: rx_ucast_packets: 8829318
[1]: rx_mcast_packets: 473
[1]: rx_bcast_packets: 35434
[1]: rx_discards: 0
[1]: rx_phy_ip_err_discards: 0
[1]: rx_skb_alloc_discard: 0
[1]: rx_csum_offload_errors: 0
[1]: tx_exhaustion_events: 0
[1]: tx_bytes: 6513873746426
[1]: tx_ucast_packets: 5033111560
[1]: tx_mcast_packets: 1207
[1]: tx_bcast_packets: 0
[1]: tpa_aggregations: 0
[1]: tpa_aggregated_frames: 0
[1]: tpa_bytes: 0
[1]: driver_filtered_tx_pkt: 0
[2]: rx_bytes: 5185615964
[2]: rx_ucast_packets: 8755258
[2]: rx_mcast_packets: 514
[2]: rx_bcast_packets: 2329
[2]: rx_discards: 0
[2]: rx_phy_ip_err_discards: 0
[2]: rx_skb_alloc_discard: 0
[2]: rx_csum_offload_errors: 0
[2]: tx_exhaustion_events: 0
[2]: tx_bytes: 6367820159824
[2]: tx_ucast_packets: 4904543961
[2]: tx_mcast_packets: 1622
[2]: tx_bcast_packets: 4268
[2]: tpa_aggregations: 0
[2]: tpa_aggregated_frames: 0
[2]: tpa_bytes: 0
[2]: driver_filtered_tx_pkt: 0
[3]: rx_bytes: 2903903736
[3]: rx_ucast_packets: 8047737
[3]: rx_mcast_packets: 544
[3]: rx_bcast_packets: 7248
[3]: rx_discards: 0
[3]: rx_phy_ip_err_discards: 0
[3]: rx_skb_alloc_discard: 0
[3]: rx_csum_offload_errors: 0
[3]: tx_exhaustion_events: 0
[3]: tx_bytes: 6592027526021
[3]: tx_ucast_packets: 5129457175
[3]: tx_mcast_packets: 1034
[3]: tx_bcast_packets: 0
[3]: tpa_aggregations: 0
[3]: tpa_aggregated_frames: 0
[3]: tpa_bytes: 0
[3]: driver_filtered_tx_pkt: 0
[4]: rx_bytes: 6931882922
[4]: rx_ucast_packets: 10206552
[4]: rx_mcast_packets: 1719
[4]: rx_bcast_packets: 4319
[4]: rx_discards: 0
[4]: rx_phy_ip_err_discards: 0
[4]: rx_skb_alloc_discard: 0
[4]: rx_csum_offload_errors: 0
[4]: tx_exhaustion_events: 0
[4]: tx_bytes: 6448967965669
[4]: tx_ucast_packets: 5022492659
[4]: tx_mcast_packets: 971
[4]: tx_bcast_packets: 0
[4]: tpa_aggregations: 0
[4]: tpa_aggregated_frames: 0
[4]: tpa_bytes: 0
[4]: driver_filtered_tx_pkt: 0
[5]: rx_bytes: 3800756009
[5]: rx_ucast_packets: 8248957
[5]: rx_mcast_packets: 477
[5]: rx_bcast_packets: 3161083
[5]: rx_discards: 0
[5]: rx_phy_ip_err_discards: 0
[5]: rx_skb_alloc_discard: 0
[5]: rx_csum_offload_errors: 0
[5]: tx_exhaustion_events: 0
[5]: tx_bytes: 6111594864886
[5]: tx_ucast_packets: 4809171334
[5]: tx_mcast_packets: 2055
[5]: tx_bcast_packets: 0
[5]: tpa_aggregations: 0
[5]: tpa_aggregated_frames: 0
[5]: tpa_bytes: 0
[5]: driver_filtered_tx_pkt: 0
[6]: rx_bytes: 5162315054
[6]: rx_ucast_packets: 9801307
[6]: rx_mcast_packets: 376
[6]: rx_bcast_packets: 5682
[6]: rx_discards: 0
[6]: rx_phy_ip_err_discards: 0
[6]: rx_skb_alloc_discard: 0
[6]: rx_csum_offload_errors: 0
[6]: tx_exhaustion_events: 0
[6]: tx_bytes: 6186413580299
[6]: tx_ucast_packets: 4819348468
[6]: tx_mcast_packets: 1755
[6]: tx_bcast_packets: 27814
[6]: tpa_aggregations: 0
[6]: tpa_aggregated_frames: 0
[6]: tpa_bytes: 0
[6]: driver_filtered_tx_pkt: 0
[7]: rx_bytes: 3957400032
[7]: rx_ucast_packets: 7706880
[7]: rx_mcast_packets: 439
[7]: rx_bcast_packets: 2078
[7]: rx_discards: 0
[7]: rx_phy_ip_err_discards: 0
[7]: rx_skb_alloc_discard: 0
[7]: rx_csum_offload_errors: 0
[7]: tx_exhaustion_events: 0
[7]: tx_bytes: 6382270562870
[7]: tx_ucast_packets: 4927082505
[7]: tx_mcast_packets: 1207
[7]: tx_bcast_packets: 0
[7]: tpa_aggregations: 0
[7]: tpa_aggregated_frames: 0
[7]: tpa_bytes: 0
[7]: driver_filtered_tx_pkt: 0
rx_bytes: 6286954089263
rx_error_bytes: 1176551
rx_ucast_packets: 15571295668
rx_mcast_packets: 5922
rx_bcast_packets: 5745119
rx_crc_errors: 0
rx_align_errors: 0
rx_undersize_packets: 0
rx_oversize_packets: 709
rx_fragments: 0
rx_jabbers: 0
rx_discards: 15041
rx_filtered_packets: 34473441
rx_mf_tag_discard: 0
pfc_frames_received: 0
pfc_frames_sent: 0
rx_brb_discard: 50179
rx_brb_truncate: 2570
rx_pause_frames: 0
rx_mac_ctrl_frames: 0
rx_constant_pause_events: 0
rx_phy_ip_err_discards: 0
rx_skb_alloc_discard: 0
rx_csum_offload_errors: 0
tx_exhaustion_events: 0
tx_bytes: 50868356937693
tx_error_bytes: 0
tx_ucast_packets: 39508942420
tx_mcast_packets: 11212
tx_bcast_packets: 32082
tx_mac_errors: 0
tx_carrier_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
tx_deferred: 0
tx_excess_collisions: 0
tx_late_collisions: 0
tx_total_collisions: 0
tx_64_byte_packets: 112055708
tx_65_to_127_byte_packets: 2743786458
tx_128_to_255_byte_packets: 1274063534
tx_256_to_511_byte_packets: 675953588
tx_512_to_1023_byte_packets: 832925023
tx_1024_to_1522_byte_packets: 33870217710
tx_1523_to_9022_byte_packets: 0
tx_pause_frames: 8837
tpa_aggregations: 0
tpa_aggregated_frames: 0
tpa_bytes: 0
recoverable_errors: 0
unrecoverable_errors: 0
driver_filtered_tx_pkt: 0
Tx LPI entry count: 0
ptp_skipped_tx_tstamp: 0
[root@gw01]:~# ifconfig enp5s0f0
enp5s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 1.2.3.1 netmask 255.255.255.252 broadcast 1.2.3.3
inet6 fe80::f6e9:d4ff:fe95:98f0 prefixlen 64 scopeid 0x20<link>
ether f4:e9:d4:95:98:f0 txqueuelen 1000 (Ethernet)
RX packets 39878608868 bytes 50643967437038 (46.0 TiB)
RX errors 28719050 dropped 0 overruns 910 frame 28718140
TX packets 15907741385 bytes 6187264747768 (5.6 TiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 36 memory 0xd3000000-d37fffff
[root@gw01]:~# ip -s link show enp5s0f0
8: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether f4:e9:d4:95:98:f0 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped overrun mcast
50644374239906 39878906616 28719967 0 28719057 0
TX: bytes packets errors dropped carrier collsns
6187303001480 15907840103 0 0 0 0
[root@gw01]:~# ifconfig enp5s0f1
enp5s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::f6e9:d4ff:fe95:98f2 prefixlen 64 scopeid 0x20<link>
ether f4:e9:d4:95:98:f2 txqueuelen 1000 (Ethernet)
RX packets 15575664787 bytes 6286503267405 (5.7 TiB)
RX errors 68499 dropped 0 overruns 15041 frame 53458
TX packets 39504799770 bytes 50862595134132 (46.2 TiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 48 memory 0xd4000000-d47fffff
[root@gw01]:~# ip -s link show enp5s0f1
11: enp5s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether f4:e9:d4:95:98:f2 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped overrun mcast
6286530470849 15575793330 68499 0 52749 5920
TX: bytes packets errors dropped carrier collsns
50863147969521 39505197930 0 0 0 0
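As a cross-check, the enp5s0f1 counters above appear to be related by simple sums. The sketch below uses only values copied from the output in this post; the mapping between the ethtool counters and the ifconfig / ip fields is an inference from these numbers, not something confirmed from the bnx2x documentation.

```shell
#!/bin/sh
# Values copied from the `ethtool -S enp5s0f1` totals in this post.
rx_brb_discard=50179
rx_brb_truncate=2570
rx_oversize_packets=709
rx_discards=15041

# Assumption: "overrun" in `ip -s link` is the sum of the two BRB counters.
overrun=$((rx_brb_discard + rx_brb_truncate))
# Assumption: "frame" in ifconfig additionally includes oversize packets.
frame=$((overrun + rx_oversize_packets))
# Assumption: total RX errors also add rx_discards (ifconfig's "overruns").
errors=$((frame + rx_discards))

echo "overrun=$overrun frame=$frame errors=$errors"
```

For enp5s0f1 these sums agree with the posted figures: overrun 52749 in ip -s link, frame 53458 in ifconfig, and 68499 RX errors in both. For enp5s0f0 the commands were run at slightly different moments under load, so the sums only agree approximately, but the pattern is the same: the RX errors there are almost entirely rx_brb_discard and rx_brb_truncate.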