Ask and Learn's questions -server

Ask and Learn

Asked: 2019-05-31 17:09:50 +0800 CST

Packet drop on HP ProLiant DL360 G9 running RHEL 6.10

We have an HP ProLiant DL360 G9 running RHEL 6.10 with two Intel 82599ES 10-Gigabit SFI/SFP+ adapters. The HP product name is HP Ethernet 10Gb 2-port 560SFP+ Adapter.

eth5 and eth6 are showing a lot of packet drops (rx_missed_errors). After I disabled flow control at the NIC level, rx_missed_errors stopped increasing, but rx_no_dma_resources started increasing daily.

Both are standalone interfaces, not part of a bond.
eth5 and eth6 are on different cards.
Both cards are installed in PCIe 3.0 x16 slots.
irqbalance is running on the server.

Update 1

Ring parameters for eth5 and eth6 are the same and already at max.

Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096

I noticed the following for eth6 in /proc/interrupts:

 Sun Jun  2 19:39:42 EDT 2019

            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      CPU12      CPU13      CPU14      CPU15      CPU16      CPU17      CPU18      CPU19
 165:          0          0          0          0          0     484430     111744     333783     458868     577617          0          0          0          0          0      17978     402211      84832     183482   10567190   PCI-MSI-edge      eth6-TxRx-0
 166:          0          0          0          0          0      92569    2522312      36248      19459       1970          0          0          0          0          0      10140      33710      10180    1071214     651534   PCI-MSI-edge      eth6-TxRx-1
 167:          0          0          0          0          0      41060    2532170      37345      10970      92570          0          0          0          0          0       3055      22158      12485    1203344     494179   PCI-MSI-edge      eth6-TxRx-2
 168:          0          0          0          0          0     218925       8555    2312817     115650     126113          0          0          0          0          0      14575       3965     114145     995924     538667   PCI-MSI-edge      eth6-TxRx-3
 169:          0          0          0          0          0       7354       7781     199591    2262057      45221          0          0          0          0          0      34813     176350     105008     649389     962393   PCI-MSI-edge      eth6-TxRx-4
 170:          0          0          0          0          0      27982      23890      44703     162340    2597754          0          0          0          0          0      25991      22873      11846     885511     943057   PCI-MSI-edge      eth6-TxRx-5
 171:          0          0          0          0          0      16710        370        155   17725587    7504781          0          0          0          0          0 1054801625    1644839      14655  583745291  266971465   PCI-MSI-edge      eth6-TxRx-6
 172:          0          0          0          0          0       9823       6688     407394      11207      44103          0          0          0          0          0      88057    2496075       9284      56799    1391075   PCI-MSI-edge      eth6-TxRx-7
 173:          0          0          0          0          0      21175       1995     125490     151465      27120          0          0          0          0          0      19960     177195    2288457     787724     848755   PCI-MSI-edge      eth6-TxRx-8
 174:          0          0          0          0          0       7835       2210       3990      56075     106870          0          0          0          0          0     109740      24135      27720    2599827    1510934   PCI-MSI-edge      eth6-TxRx-9
 175:          0          0          0          0          0      42450       2605      39545      54520     162830          0          0          0          0          0      56035      11380      33815      52905    3993251   PCI-MSI-edge      eth6-TxRx-10
 176:          0          0          0          0          0      92335      33470    2290862       7545     227035          0          0          0          0          0       7550      25460      17225      65205    1682649   PCI-MSI-edge      eth6-TxRx-11
 177:          0          0          0          0          0      81685      56468    2273033     264820     195585          0          0          0          0          0     120640      36250      29450     244895    1146510   PCI-MSI-edge      eth6-TxRx-12
 178:          0          0          0          0          0      39655      24693     703993    1680384      22325          0          0          0          0          0     147980      27170      41585      72085    1689466   PCI-MSI-edge      eth6-TxRx-13
 179:          0          0          0          0          0     108905       1335      48265    2415832      19985          0          0          0          0          0       3545      23360      12590      35185    1780334   PCI-MSI-edge      eth6-TxRx-14
 180:          0          0          0          0          0     134826     291569      98014       9159    2262093          0          0          0          0          0     128867      18499      20078      39858    1463678   PCI-MSI-edge      eth6-TxRx-15
 181:          0          0          0          0          0       3220      37430      39030     129550      11070          0          0          0          0          0    2382452      24840      10860     146795    1664089   PCI-MSI-edge      eth6-TxRx-16
 182:          0          0          0          0          0      23120      28700     134025      96455      31545          0          0          0          0          0      30340    2262857      24485     144620    1673189   PCI-MSI-edge      eth6-TxRx-17
 183:          0          0          0          0          0       8900      29070      22490     112785     186240          0          0          0          0          0      40690      31665    2274862      37160    1705474   PCI-MSI-edge      eth6-TxRx-18
 184:          0          0          0          0          0      77090      18270      68465      53235     142648          0          0          0          0          0      16295      33770      29175    2367462    1642926   PCI-MSI-edge      eth6-TxRx-19
 185:          0          0          0          0          0         11          0          0          0          0          0          0          0          0          0          0          0          0          0          4   PCI-MSI-edge      eth6

So it looks like CPU cores 15, 18 and 19 are under stress processing traffic on eth6.
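
To rank the CPUs for a single queue without reading the columns by eye, the row can be parsed directly. The following is a minimal sketch, not part of the original post: it takes the eth6-TxRx-6 row (IRQ 171) from the listing above as a string, and the helper names `parse_irq_line` and `busiest_cpus` are made up for illustration.

```python
def parse_irq_line(line, ncpus):
    """Split one /proc/interrupts row into (irq, per-CPU counts, label)."""
    fields = line.split()
    irq = fields[0].rstrip(":")
    counts = [int(value) for value in fields[1:1 + ncpus]]
    label = " ".join(fields[1 + ncpus:])
    return irq, counts, label


def busiest_cpus(counts, top=3):
    """Return the CPU numbers with the largest counters, largest first."""
    ranked = sorted(range(len(counts)), key=lambda cpu: counts[cpu], reverse=True)
    return ranked[:top]


# Row for IRQ 171 copied from the listing above (20 CPU columns).
SAMPLE = (
    "171: 0 0 0 0 0 16710 370 155 17725587 7504781 "
    "0 0 0 0 0 1054801625 1644839 14655 583745291 266971465 "
    "PCI-MSI-edge eth6-TxRx-6"
)

irq, counts, label = parse_irq_line(SAMPLE, ncpus=20)
print(irq, label, busiest_cpus(counts))
# prints: 171 PCI-MSI-edge eth6-TxRx-6 [15, 18, 19]
```

These counters are cumulative since boot, so they show where the interrupts have landed over time rather than the current load.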

Basically, I have no idea where to look next. I am guessing this may have something to do with IRQ affinity, but I am not sure. I am also thinking of disabling irqbalance, but I am not sure whether that will make any difference.

Any suggestions?

Update 2

Here is the NIC driver info. I don't think we have that bug, as that was in 2009.

driver: ixgbe
version: 4.2.1-k
firmware-version: 0x800008ea
bus-info: 0000:08:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

The data arriving on both eth5 and eth6 is multicast data. Is that enough? Setting up port mirroring needs a ticket to the network engineering team and will take time. I am also not sure what to tell them to look for.

If I understand your comments correctly, there is a way to balance an eth6 TxRx queue across more than one CPU core. I did some searching myself and collected the following information; hopefully it is useful to you.

ethtool -x eth5 and eth6

RX flow hash indirection table for eth5 with 20 RX ring(s):
    0:      0     1     2     3     4     5     6     7
    8:      8     9    10    11    12    13    14    15
   16:      0     1     2     3     4     5     6     7
   24:      8     9    10    11    12    13    14    15
   32:      0     1     2     3     4     5     6     7
   40:      8     9    10    11    12    13    14    15
   48:      0     1     2     3     4     5     6     7
   56:      8     9    10    11    12    13    14    15
   64:      0     1     2     3     4     5     6     7
   72:      8     9    10    11    12    13    14    15
   80:      0     1     2     3     4     5     6     7
   88:      8     9    10    11    12    13    14    15
   96:      0     1     2     3     4     5     6     7
  104:      8     9    10    11    12    13    14    15
  112:      0     1     2     3     4     5     6     7
  120:      8     9    10    11    12    13    14    15
RSS hash key:
3c:f9:4a:0e:fc:7e:cb:83:c2:2a:a4:1c:cf:59:38:1c:ca:54:38:b9:6b:e8:2b:63:6e:d2:9f:eb:fc:04:c2:86:6d:e3:54:f2:73:30:6a:65
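
The table above lists 128 entries but only ever names rings 0 to 15, even though the interface reports 20 RX rings. The sketch below rebuilds that layout and checks which rings are never referenced. It assumes (this is not stated in the post) that the NIC selects a table entry from the low-order bits of the RSS hash; `build_table` and `queue_for_hash` are illustrative names.

```python
RETA_SIZE = 128   # number of entries printed by `ethtool -x` above
RSS_QUEUES = 16   # the highest ring named in the table is 15
RX_RINGS = 20     # "with 20 RX ring(s)"


def build_table(size=RETA_SIZE, queues=RSS_QUEUES):
    """Reproduce the round-robin layout shown in the ethtool output."""
    return [entry % queues for entry in range(size)]


def queue_for_hash(hash_value, table):
    """Assumed lookup: the low-order hash bits select the table entry."""
    return table[hash_value % len(table)]


table = build_table()
referenced = set(table)
unused = [ring for ring in range(RX_RINGS) if ring not in referenced]
print(unused)                             # [16, 17, 18, 19]

# An arbitrary example hash value; the same hash always maps to the same ring.
print(queue_for_hash(0x1234ABCD, table))  # 13
```

Under that assumption, all packets that produce the same hash land on the same ring, whatever the rest of the table looks like.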

ethtool -n eth5 rx-flow-hash udp4 and eth6

UDP over IPV4 flows use these fields for computing Hash flow key:
IP SA
IP DA

I also ran set_irq_affinity on both eth5 and eth6:

sudo ./set_irq_affinity local eth5

IFACE CORE MASK -> FILE
=======================
eth5 0 1 -> /proc/irq/144/smp_affinity
eth5 1 2 -> /proc/irq/145/smp_affinity
eth5 2 4 -> /proc/irq/146/smp_affinity
eth5 3 8 -> /proc/irq/147/smp_affinity
eth5 4 10 -> /proc/irq/148/smp_affinity
eth5 10 400 -> /proc/irq/149/smp_affinity
eth5 11 800 -> /proc/irq/150/smp_affinity
eth5 12 1000 -> /proc/irq/151/smp_affinity
eth5 13 2000 -> /proc/irq/152/smp_affinity
eth5 14 4000 -> /proc/irq/153/smp_affinity
eth5 0 1 -> /proc/irq/154/smp_affinity
eth5 1 2 -> /proc/irq/155/smp_affinity
eth5 2 4 -> /proc/irq/156/smp_affinity
eth5 3 8 -> /proc/irq/157/smp_affinity
eth5 4 10 -> /proc/irq/158/smp_affinity
eth5 10 400 -> /proc/irq/159/smp_affinity
eth5 11 800 -> /proc/irq/160/smp_affinity
eth5 12 1000 -> /proc/irq/161/smp_affinity
eth5 13 2000 -> /proc/irq/162/smp_affinity
eth5 14 4000 -> /proc/irq/163/smp_affinity

sudo ./set_irq_affinity local eth6

IFACE CORE MASK -> FILE
=======================
eth6 5 20 -> /proc/irq/165/smp_affinity
eth6 6 40 -> /proc/irq/166/smp_affinity
eth6 7 80 -> /proc/irq/167/smp_affinity
eth6 8 100 -> /proc/irq/168/smp_affinity
eth6 9 200 -> /proc/irq/169/smp_affinity
eth6 15 8000 -> /proc/irq/170/smp_affinity
eth6 16 10000 -> /proc/irq/171/smp_affinity
eth6 17 20000 -> /proc/irq/172/smp_affinity
eth6 18 40000 -> /proc/irq/173/smp_affinity
eth6 19 80000 -> /proc/irq/174/smp_affinity
eth6 5 20 -> /proc/irq/175/smp_affinity
eth6 6 40 -> /proc/irq/176/smp_affinity
eth6 7 80 -> /proc/irq/177/smp_affinity
eth6 8 100 -> /proc/irq/178/smp_affinity
eth6 9 200 -> /proc/irq/179/smp_affinity
eth6 15 8000 -> /proc/irq/180/smp_affinity
eth6 16 10000 -> /proc/irq/181/smp_affinity
eth6 17 20000 -> /proc/irq/182/smp_affinity
eth6 18 40000 -> /proc/irq/183/smp_affinity
eth6 19 80000 -> /proc/irq/184/smp_affinity
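
The MASK column is a hexadecimal bitmask with one bit set per CPU core. A small sketch reproduces the values printed by the script; the helper name `cpu_mask` is illustrative, and it assumes fewer than 32 CPUs so that the mask fits in a single hex group.

```python
def cpu_mask(cores):
    """Return the smp_affinity-style hex mask for an iterable of core numbers."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return format(mask, "x")


print(cpu_mask([5]))    # 20
print(cpu_mask([15]))   # 8000
print(cpu_mask([19]))   # 80000

# All ten cores the script used for eth6, combined into one mask.
local_cores = list(range(5, 10)) + list(range(15, 20))
print(cpu_mask(local_cores))  # f83e0
```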

Update 3

I modified the udp4 rx-flow-hash to include the source and destination ports, but it did not make any difference.

ethtool -n eth5 rx-flow-hash udp4

UDP over IPV4 flows use these fields for computing Hash flow key:
IP SA
IP DA
L4 bytes 0 & 1 [TCP/UDP src port]
L4 bytes 2 & 3 [TCP/UDP dst port]

I disabled irqbalance and manually updated /proc/irq/171/smp_affinity_list to include all 10 'local' CPU cores.

cat /proc/irq/171/smp_affinity_list

5-9,15-19

Here is the output of grep 171: /proc/interrupts taken right after I made the changes above (added the source and destination ports to the udp4 rx-flow-hash and set /proc/irq/171/smp_affinity_list to 5-9,15-19). Let's call it "before".

Here is the same grep 171: from /proc/interrupts this morning. Let's call it "after".

Before 171:          0          0          0          0          0      16840        390        155   17725587    7505131          0          0          0          0          0 1282081848  184961789      21430  583751571  266997575   PCI-MSI-edge      eth6-TxRx-6
After  171:          0          0          0          0          0      16840        390        155   17725587    7505131          0          0          0          0          0 1282085923  184961789      21430  583751571  267026844   PCI-MSI-edge      eth6-TxRx-6

As you can see above, IRQ 171 is handled almost entirely by CPU 19; the only other counter that moved is CPU 15, and only slightly. If irqbalance is running, a different CPU will handle IRQ 171, but for some reason IRQ 171 can't be balanced across more than one CPU.
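
Subtracting the two rows per CPU shows where the interrupts landed between the samples. A minimal sketch with the counter columns of the two rows above pasted in as strings (`per_cpu_delta` is an illustrative name, not a standard tool):

```python
BEFORE = ("171: 0 0 0 0 0 16840 390 155 17725587 7505131 "
          "0 0 0 0 0 1282081848 184961789 21430 583751571 266997575")
AFTER = ("171: 0 0 0 0 0 16840 390 155 17725587 7505131 "
         "0 0 0 0 0 1282085923 184961789 21430 583751571 267026844")


def per_cpu_delta(before, after):
    """Return {cpu: increase} for every CPU whose counter changed."""
    old = [int(value) for value in before.split()[1:]]
    new = [int(value) for value in after.split()[1:]]
    return {cpu: b - a for cpu, (a, b) in enumerate(zip(old, new)) if b != a}


print(per_cpu_delta(BEFORE, AFTER))  # {15: 4075, 19: 29269}
```

In these two samples only the CPU 15 and CPU 19 counters moved, with CPU 19 taking most of the increase.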

Here are the packet drop updates:

Wed Jun 5 01:39:41 EDT 2019
ethtool -S eth6 | grep -E "rx_missed|no_buff|no_dma"
rx_no_buffer_count: 0
rx_missed_errors: 2578857
rx_no_dma_resources: 3456533

Thu Jun 6 05:43:34 EDT 2019
njia@c4z-ut-rttp-b19 $ sudo ethtool -S eth6 | grep -E "rx_missed|no_buff|no_dma"
rx_no_buffer_count: 0
rx_missed_errors: 2578857
rx_no_dma_resources: 3950904
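
The growth between the two samples can be computed rather than read off. A sketch that parses the `name: value` lines printed by `ethtool -S` (the function names are illustrative; the numbers are the ones shown above):

```python
JUN5 = """
rx_no_buffer_count: 0
rx_missed_errors: 2578857
rx_no_dma_resources: 3456533
"""

JUN6 = """
rx_no_buffer_count: 0
rx_missed_errors: 2578857
rx_no_dma_resources: 3950904
"""


def parse_counters(text):
    """Turn 'name: value' lines into a dict of integers."""
    counters = {}
    for line in text.strip().splitlines():
        name, _, value = line.partition(":")
        counters[name.strip()] = int(value)
    return counters


def counter_growth(old, new):
    """Difference of every counter present in the older sample."""
    return {name: new[name] - old[name] for name in old}


growth = counter_growth(parse_counters(JUN5), parse_counters(JUN6))
print(growth)
# {'rx_no_buffer_count': 0, 'rx_missed_errors': 0, 'rx_no_dma_resources': 494371}
```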

The time of day does not matter here, as the multicast data stops after 16:00 each day.

I found this article on the Red Hat site: "Packet loss when multiple processes subscribe to the same multicast group".

Our developer also mentioned that if we run only one instance of our application, the number of drops is reduced significantly. Usually there are 8 instances.

I increased net.core.rmem_default from 4 MB to 16 MB:

sysctl -w net.core.rmem_default=16777216
net.core.rmem_default = 16777216
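
`net.core.rmem_default` only sets the receive buffer size a socket gets when the application does not request one itself. One way to confirm that a newly created socket picks up the new value is to ask the kernel. A minimal sketch, meant to be run on the server after the sysctl change (the function name is illustrative):

```python
import socket


def default_udp_rcvbuf():
    """Create a fresh UDP socket and report its SO_RCVBUF in bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


print(default_udp_rcvbuf())
```

An application that calls setsockopt with SO_RCVBUF itself overrides this default, and on Linux that request is capped by net.core.rmem_max.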

Here is the current UDP stack status; I will check again tomorrow.

Fri Jun  7 00:40:10 EDT 2019
netstat -s | grep -A 4 Udp:

Udp:
    90579753493 packets received
    1052 packets to unknown port received.
    1264898431 packet receive errors
    1295021855 packets sent

Asked: 2017-02-16 15:41:27 +0800 CST

How to troubleshoot rx_tcp_udp_chksum_err

Recently our Nagios started reporting network interface errors on a few servers, and I have no idea how to identify what caused these errors.

I collected some information here and hope someone can provide some advice.

eth1      Link encap:Ethernet  HWaddr 00:0F:53:08:A6:EC
          inet addr:10.182.4.17  Bcast:10.182.7.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:164600148032 errors:67859 dropped:26955363 overruns:2 frame:67859
          TX packets:3498398714 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:73232469076775 (66.6 TiB)  TX bytes:2998456371714 (2.7 TiB)
          Interrupt:40


ethtool -i eth1

   driver: sfc
   version: 3.2
   firmware-version: 3.3.0.6298
   bus-info: 0000:04:00.0
   supports-statistics: yes
   supports-test: yes
   supports-eeprom-access: no
   supports-register-dump: yes
   supports-priv-flags: no

ethtool -S eth1

     NIC statistics:
     tx_bytes: 2998441377182
     tx_good_bytes: 2998441377182
     tx_bad_bytes: 0
     tx_packets: 3498277681
     tx_bad: 0
     tx_pause: 10
     tx_control: 10
     tx_unicast: 3469342716
     tx_multicast: 28924898
     tx_broadcast: 10067
     tx_lt64: 0
     tx_64: 10707489
     tx_65_to_127: 2090483055
     tx_128_to_255: 1016140
     tx_256_to_511: 536073713
     tx_512_to_1023: 1122081
     tx_1024_to_15xx: 517594725
     tx_15xx_to_jumbo: 341280478
     tx_gtjumbo: 0
     tx_collision: 0
     tx_single_collision: 0
     tx_multiple_collision: 0
     tx_excessive_collision: 0
     tx_deferred: 0
     tx_late_collision: 0
     tx_excessive_deferred: 0
     tx_non_tcpudp: 0
     tx_mac_src_error: 0
     tx_ip_src_error: 0
     tx_tso_bursts: 0
     tx_tso_long_headers: 0
     tx_tso_packets: 0
     tx_pushes: 997738192
     rx_bytes: 73230776458079
     rx_good_bytes: 73159089285734
     rx_bad_bytes: 71687172345
     rx_packets: 164596356458
     rx_good: 164410019215
     rx_bad: 67675
     rx_pause: 0
     rx_control: 0
     rx_unicast: 3862651550
     rx_multicast: 160533922396
     rx_broadcast: 13445269
     rx_lt64: 0
     rx_64: 29145974
     rx_65_to_127: 24196814809
     rx_128_to_255: 85333977542
     rx_256_to_511: 29573419148
     rx_512_to_1023: 10692885531
     rx_1024_to_15xx: 11810617170
     rx_15xx_to_jumbo: 2959496284
     rx_gtjumbo: 0
     rx_bad_lt64: 0
     rx_bad_64_to_15xx: 0
     rx_bad_15xx_to_jumbo: 0
     rx_bad_gtjumbo: 0
     rx_overflow: 2
     rx_missed: 0
     rx_false_carrier: 0
     rx_symbol_error: 0
     rx_align_error: 0
     rx_length_error: 0
     rx_internal_error: 0
     rx_nodesc_drop_cnt: 26955363
     rx_reset: 0
     rx_tobe_disc: 186269583
     rx_ip_hdr_chksum_err: 0
     rx_tcp_udp_chksum_err: 67646
     rx_mcast_mismatch: 3470220
     rx_frm_trunc: 0
     rx_nodesc_trunc: 0

The Solarflare user guide just says: "rx_tcp_udp_chksum_err: Number of packets received with TCP/UDP checksum error."
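
For context, the counter refers to the standard 16-bit ones' complement checksum that UDP and TCP carry over a pseudo-header plus the segment. The sketch below shows the UDP over IPv4 case as defined in RFC 768; it illustrates what a receiver verifies and is not the sfc driver's implementation. The source address, ports and payload are made-up sample values.

```python
import struct

UDP_PROTO = 17


def ones_complement_sum(data):
    """16-bit ones' complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src, dst, udp_len):
    """IPv4 pseudo-header: source, destination, zero, protocol, UDP length."""
    return src + dst + struct.pack("!BBH", 0, UDP_PROTO, udp_len)


def udp_checksum(src, dst, segment):
    """Checksum for a UDP segment whose checksum field is still zero."""
    total = ones_complement_sum(pseudo_header(src, dst, len(segment)) + segment)
    return (~total & 0xFFFF) or 0xFFFF  # a computed 0 is transmitted as 0xFFFF


def checksum_ok(src, dst, segment):
    """A valid segment sums to 0xFFFF once its checksum is included."""
    return ones_complement_sum(pseudo_header(src, dst, len(segment)) + segment) == 0xFFFF


src = bytes([10, 182, 4, 1])    # hypothetical sender
dst = bytes([10, 182, 4, 17])   # eth1 address from the ifconfig output above
payload = b"hello world"        # made-up payload
length = 8 + len(payload)       # UDP header is 8 bytes

csum = udp_checksum(src, dst, struct.pack("!HHHH", 5000, 5001, length, 0) + payload)
good = struct.pack("!HHHH", 5000, 5001, length, csum) + payload
bad = good[:-1] + bytes([good[-1] ^ 0x01])  # flip one bit in the payload

print(checksum_ok(src, dst, good), checksum_ok(src, dst, bad))  # True False
```

A packet that fails this check was altered somewhere between the point where the sender computed the checksum and the point where the receiver verified it.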

How do I troubleshoot this?

Asked: 2015-07-27 21:09:43 +0800 CST

What does this snmp v1 ID mean

We are seeing two Dell DRACs that have started reporting the following warning:

snmp trap server3.5.2.2.4331

I have no idea what it means. I searched for that OID and only found this: http://oid-info.com/get/1.3.6.1.4.1.181.2.3.5.3.2.2

Does anyone have any idea?

Thanks

Asked: 2015-06-17 18:02:25 +0800 CST

Zenoss time based monitoring

We use Zenoss for system monitoring, and a few backup servers use 100% of their NIC bandwidth from 17:00 to 22:00 every day.

I was hoping for a setting that lets me set up different thresholds for different time periods, but I did not find one.

Does Zenoss support this feature?

Thanks

Asked: 2015-04-29 22:12:11 +0800 CST

Ansible command line retrieving ssh password from vault

I am trying to set up Ansible to manage Linux boxes from different customers, and here is what we have to work with.

No public key authentication - I want it as much as you do, but it won't happen any time soon.
We log in as root, and each customer has a different root password for all Linux boxes. We are pushing to disable direct root login and do everything via sudo but, again, it will take some time.

I managed to create an Ansible vault file for each customer with ansible_ssh_user and ansible_ssh_pass in it, and the following playbook works fine.

---
- hosts:
    - SERV01
    - SERV02
  vars_files:
    - roles/common/vault/main.yml

  tasks:
    - name: enable and start ntpd
      service: name=ntpd enabled=yes state=running

Now I would like to know how I can use vault files from the command line, but none of the following worked.

ansible customer1 -m shell -a "var_files:roles/common/vault/main.yml uptime" --ask-vault-pass

ansible customer1 -m shell -a "uptime" -e "vars_files:roles/common/vault/main.yml"  --ask-vault-pass

What am I doing wrong?

Thanks

Asked: 2015-02-18 18:32:27 +0800 CST

Is there a way to duplicate entire AWS environment

I know I can create images from my existing instances and relaunch them later. What I want to know is whether there is a way to duplicate my whole environment, for example duplicating every VM in UAT to create Beta.
