I have several machines with ConnectX-7 InfiniBand cards plugged into an NVIDIA QM9700 switch. I've confirmed 400 Gbit NDR at both ends (ibstat on the hosts and the console on the switch). The machines are running Ubuntu 22.04 with the Mellanox OFED 5.8-3.0.7.0 drivers. I've done a lot of testing with ib_write_bw, and the most I can get is ~251 Gbit/s. The actual test commands are:
Server side (host_a):
numactl -N 0 -m 0 ib_write_bw -d mlx5_4 -F --report_gbits
Client side (host_b):
numactl -N 0 -m 0 ib_write_bw -d mlx5_4 -F --report_gbits --run_infinitely host_a
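For reference, the 400 Gbit check was just ibstat on each host; a quick way to eyeball it (same device name as above, the grep is only there to trim the output) is:
ibstat mlx5_4 | grep -iE 'state|rate'
which should show the port Active/LinkUp and Rate: 400 for NDR.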
The cards sit in the NUMA domains that the numactl bindings target (a quick sysfs check for that is shown after the output below), but I've tried other combinations with no luck. Output ends up looking something like this:
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port       : OFF          Device          : mlx5_4
Number of qps   : 1            Transport type  : IB
Connection type : RC           Using SRQ       : OFF
PCIe relax order: ON
ibv_wr* API     : ON
TX depth        : 128
CQ Moderation   : 1
Mtu             : 4096[B]
Link type       : IB
Max inline data : 0[B]
rdma_cm QPs     : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x54 QPN 0x0058 PSN xxx RKey 0x1820e0 VAddr xxx
remote address: LID 0x53 QPN 0x0058 PSN xxx RKey 0x1820e0 VAddr xxx
---------------------------------------------------------------------------------------
#bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]    MsgRate[Mpps]
65536      2353827        0.00               246.81                0.470754
65536      2339084        0.00               245.27                0.467815
65536      2338736        0.00               245.23                0.467746
65536      2338574        0.00               245.22                0.467713
65536      2338610        0.00               245.22                0.467720
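The NUMA placement is easy to double-check from sysfs; this is just the generic PCI attribute, with mlx5_4 being the device from the commands above:
cat /sys/class/infiniband/mlx5_4/device/numa_node
That should print 0 to match the numactl -N 0 -m 0 binding.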
I know this is probably a long shot, but I'm wondering if anyone who has actually achieved 400 Gbit/s over InfiniBand with ib_write_bw knows something we've missed.
So the answer ended up being that we needed to set the PCIe parameter MAX_ACC_OUT_READ to 128. Once that was set via
mlxconfig -y -d mlx5_4 set MAX_ACC_OUT_READ=128
for each card, followed by power cycling the machines, throughput jumped from ~250 Gbit/s to ~375 Gbit/s. Not 400, but I'll take it. To do each card:
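Something along these lines (the device enumeration via /sys/class/infiniband and the ADVANCED_PCI_SETTINGS line are assumptions beyond the single command above; on some firmware that setting has to be enabled before MAX_ACC_OUT_READ is exposed, and it's harmless if it already is):

# apply to every mlx5 device on the host, then power cycle
for dev in /sys/class/infiniband/mlx5_*; do
    dev=$(basename "$dev")
    sudo mlxconfig -y -d "$dev" set ADVANCED_PCI_SETTINGS=1
    sudo mlxconfig -y -d "$dev" set MAX_ACC_OUT_READ=128
done

# after the power cycle, confirm the value stuck:
# mlxconfig -d mlx5_4 query | grep MAX_ACC_OUT_READ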