We are trying to set up a 40 Gbit/s connection between two servers and are seeing odd CPU behaviour when running iperf. Throughput also only reaches around 10 Gbit/s of the possible 40.
Server specs:
- AMD EPYC 7413
- 8x MultiBitECC 3200 MHz 16384 MB Memory
- Supermicro H12SSL-CT
- Intel XL710 40GbE
- Ubuntu 20.04.3 LTS 5.4.0-84-gene
The servers are connected directly to each other via fibre, with no switches in between.
Example:
host1# iperf -s
host2# iperf -c host1 -i 1 -t 120
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 1.39 GBytes 12.0 Gbits/sec
[ 3] 1.0- 2.0 sec 1.00 GBytes 8.61 Gbits/sec
[ 3] 2.0- 3.0 sec 1.03 GBytes 8.88 Gbits/sec
[ 3] 3.0- 4.0 sec 1.04 GBytes 8.92 Gbits/sec
[ 3] 4.0- 5.0 sec 1021 MBytes 8.56 Gbits/sec
[ 3] 5.0- 6.0 sec 1.05 GBytes 9.01 Gbits/sec
[ 3] 6.0- 7.0 sec 1.02 GBytes 8.78 Gbits/sec
[ 3] 7.0- 8.0 sec 1.02 GBytes 8.74 Gbits/sec
[ 3] 8.0- 9.0 sec 1.01 GBytes 8.69 Gbits/sec
[ 3] 9.0-10.0 sec 1.02 GBytes 8.75 Gbits/sec
[ 3] 10.0-11.0 sec 1.05 GBytes 9.03 Gbits/sec
[ 3] 11.0-12.0 sec 1015 MBytes 8.51 Gbits/sec
[ 3] 12.0-13.0 sec 1.02 GBytes 8.72 Gbits/sec
[ 3] 13.0-14.0 sec 1014 MBytes 8.51 Gbits/sec
[ 3] 14.0-15.0 sec 974 MBytes 8.17 Gbits/sec
[ 3] 0.0-15.0 sec 15.6 GBytes 8.92 Gbits/sec
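To put a number on the shortfall, the per-second intervals can be averaged and compared against the 40 Gbit/s line rate. This is only a minimal sketch: it assumes iperf 2 style output with rates reported in Gbits/sec, and the names `per_second_rates` and `link_utilisation` as well as the regular expression are my own, not part of iperf.

```python
import re

# Matches iperf 2 interval lines such as
# "[ 3] 1.0- 2.0 sec 1.00 GBytes 8.61 Gbits/sec".
# Assumption: the rate column is reported in Gbits/sec.
INTERVAL = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-\s*([\d.]+)\s+sec\s+[\d.]+\s+\w+\s+([\d.]+)\s+Gbits/sec"
)


def per_second_rates(iperf_output):
    """Return the Gbit/s value of every one-second interval line."""
    rates = []
    for line in iperf_output.splitlines():
        match = INTERVAL.search(line)
        if not match:
            continue
        start, end, gbits = (float(g) for g in match.groups())
        if end - start < 1.5:  # skip the final summary line covering the whole run
            rates.append(gbits)
    return rates


def link_utilisation(rates, link_gbits=40.0):
    """Mean measured rate as a fraction of the nominal link rate."""
    if not rates:
        raise ValueError("no interval lines found")
    return sum(rates) / len(rates) / link_gbits


if __name__ == "__main__":
    sample = """\
[ 3] 0.0- 1.0 sec 1.39 GBytes 12.0 Gbits/sec
[ 3] 1.0- 2.0 sec 1.00 GBytes 8.61 Gbits/sec
[ 3] 0.0-15.0 sec 15.6 GBytes 8.92 Gbits/sec
"""
    rates = per_second_rates(sample)
    print(f"{len(rates)} intervals, {link_utilisation(rates):.1%} of a 40 Gbit/s link")
```

Piping the full client output into `per_second_rates` instead of the short sample gives the figure for the whole run.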
Searching around, I found the official performance tuning guide from AMD and some material on fasterdata.es.net.
They suggest changing certain system settings, such as the CPU governor and the TCP buffer sizes. I made the changes accordingly and only gained about 1 Gbit/s.
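To confirm that such settings are actually in effect, the governor and buffer limits can be read back from the kernel. The following is a minimal sketch, assuming the standard Linux sysfs and procfs locations; the function names and the particular list of sysctl keys are illustrative choices of mine and are not quoted from either guide.

```python
from collections import Counter
from pathlib import Path


def governor_counts(sysfs_root="/sys/devices/system/cpu"):
    """Count how many logical CPUs use each cpufreq scaling governor."""
    counts = Counter()
    for path in Path(sysfs_root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        try:
            counts[path.read_text().strip()] += 1
        except OSError:
            continue  # CPU offline or attribute not readable
    return dict(counts)


def tcp_buffer_limits(proc_root="/proc/sys"):
    """Return socket/TCP buffer limits in bytes, keyed by sysctl name."""
    # Assumption: these are the keys of interest; adjust to what was tuned.
    names = [
        "net/core/rmem_max",
        "net/core/wmem_max",
        "net/ipv4/tcp_rmem",
        "net/ipv4/tcp_wmem",
    ]
    limits = {}
    for name in names:
        path = Path(proc_root) / name
        if path.exists():
            limits[name.replace("/", ".")] = [int(v) for v in path.read_text().split()]
    return limits


if __name__ == "__main__":
    print("governors:", governor_counts())
    for key, values in tcp_buffer_limits().items():
        print(key, values)
```

If every core reports the same governor and the `max` values match what was configured, the tuning was at least applied; it does not show whether the settings help.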
When I checked the CPU clock speed, the CPU always clocked down to around 400 MHz while iperf was running.
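One way to watch the clock speed during a run is to sample the `cpu MHz` lines in `/proc/cpuinfo` once per second. This is a rough sketch rather than a precise measurement: it assumes an x86 Linux host where that field is present, and the helper names are hypothetical.

```python
import re
import time
from pathlib import Path

CPUINFO = Path("/proc/cpuinfo")


def parse_cpu_mhz(cpuinfo_text):
    """Extract the 'cpu MHz' value of every logical CPU from /proc/cpuinfo text."""
    values = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, flags=re.MULTILINE)
    return [float(v) for v in values]


def summarise(mhz):
    """Return (min, max, mean) of the sampled per-core frequencies."""
    return min(mhz), max(mhz), sum(mhz) / len(mhz)


if __name__ == "__main__":
    # Run this in a second terminal while iperf is active.
    for _ in range(3):
        if not CPUINFO.exists():
            break
        mhz = parse_cpu_mhz(CPUINFO.read_text())
        if mhz:
            low, high, mean = summarise(mhz)
            print(f"min {low:.0f} MHz  max {high:.0f} MHz  mean {mean:.0f} MHz")
        time.sleep(1)
```

Looking at the minimum and maximum separately matters here, because a single busy core can be hidden by an average over many idle ones.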
Any suggestions as to why iperf sends the CPU to sleep, or how I could improve single-stream TCP throughput? Running multiple TCP streams in parallel utilizes the bandwidth better, but that is not our use case.
Thank you.