I know loopback traffic still goes through the kernel network stack up to the IP layer, including syscall overhead and some memory-copy overhead. DPDK and RDMA use different techniques to avoid these.
So let's say I have two machines connected via DPDK/RDMA and I run a network latency test between them: will that be faster than loopback on a single machine?
I did a quick test of `ping localhost` on a CPU E5-2630 v4 @ 2.20GHz, which averages 0.010ms.
I came up with this question while testing my Ceph cluster with vstart.sh; I want to minimize network latency so I can carefully analyze how OSD-related code affects latency.
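For reference, this is roughly how I measure loopback latency in user space, a minimal sketch of my own (UDP round trip over 127.0.0.1, port 9000 chosen arbitrarily), which pays the same syscall and kernel-stack cost described above:

```c
/* Sketch: measure loopback RTT over UDP on 127.0.0.1. Each round trip pays
 * two sendto()/recvfrom() syscall pairs plus two passes through the kernel
 * loopback path. Port 9000 is an arbitrary choice for the example. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    int srv = socket(AF_INET, SOCK_DGRAM, 0);
    int cli = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(9000);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    bind(srv, (struct sockaddr *)&addr, sizeof(addr));

    char buf[64] = "probe";
    struct sockaddr_in peer;
    socklen_t plen = sizeof(peer);
    struct timespec t0, t1;
    double total_ns = 0;
    const int iters = 10000;

    for (int i = 0; i < iters; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* client -> server -> client, all over the loopback interface */
        sendto(cli, buf, sizeof(buf), 0, (struct sockaddr *)&addr, sizeof(addr));
        recvfrom(srv, buf, sizeof(buf), 0, (struct sockaddr *)&peer, &plen);
        sendto(srv, buf, sizeof(buf), 0, (struct sockaddr *)&peer, plen);
        recvfrom(cli, buf, sizeof(buf), 0, NULL, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        total_ns += (double)(t1.tv_sec - t0.tv_sec) * 1e9
                  + (double)(t1.tv_nsec - t0.tv_nsec);
    }
    printf("average UDP loopback RTT: %.3f us\n", total_ns / iters / 1000.0);

    close(srv);
    close(cli);
    return 0;
}
```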
Based on the conversation in the comments, the real question is:
Does DPDK/RDMA between two machines give lower latency than a localhost ping?
[Answer] Yes, you can achieve the same latency, but there are some caveats:
- `rte_eth_tx_burst` only enqueues the packet descriptors for DMA over PCIe; it does not actually send the packets out on the wire.
- `rte_eth_tx_buffer_flush` explicitly flushes any previously buffered packets to the hardware.
- `rte_pktmbuf_alloc` grabs an mbuf; set its ref_cnt to a high value such as 250 so the mbuf is not freed back to the pool after transmit and can be reused for every probe.

Hence, with the right NIC (one that supports low-latency transmit), the DPDK API `rte_eth_tx_buffer_flush`, and a pre-allocated mbuf with its ref_cnt raised, you can achieve 0.010ms on average (a sketch of this pattern is shown below).

Note: for a better baseline, use a packet generator or packet blaster to send ICMP requests to the target machine with both the kernel and the DPDK solution, and compare the real performance under load at line rates such as 1%, 5%, 10%, 25%, 50%, 75%, and 100%.
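A minimal sketch of the pattern above, assuming port 0 / the TX queue were already configured and started elsewhere and that the caller supplies a pre-built frame (not a complete application): the probe mbuf is allocated once with a raised ref_cnt, and every send is staged and immediately flushed to the NIC.

```c
/* Sketch only: pre-allocated mbuf with high ref_cnt + explicit flush per send.
 * Assumes the port and TX queue were configured and started elsewhere. */
#include <rte_ethdev.h>
#include <rte_malloc.h>
#include <rte_mbuf.h>
#include <rte_memcpy.h>

static struct rte_eth_dev_tx_buffer *txbuf;
static struct rte_mbuf *probe;

void tx_setup(struct rte_mempool *pool, const void *frame, uint16_t len)
{
    /* Grab one mbuf and raise its ref_cnt so the PMD's free-after-transmit
     * only decrements the count instead of returning it to the pool. */
    probe = rte_pktmbuf_alloc(pool);
    rte_mbuf_refcnt_set(probe, 250);
    rte_memcpy(rte_pktmbuf_mtod(probe, void *), frame, len);
    probe->data_len = len;
    probe->pkt_len  = len;

    /* One-slot TX buffer that we flush explicitly for every probe. */
    txbuf = rte_zmalloc("txbuf", RTE_ETH_TX_BUFFER_SIZE(1), 0);
    rte_eth_tx_buffer_init(txbuf, 1);
}

void tx_one_probe(uint16_t port, uint16_t queue)
{
    /* Stage the packet ... */
    rte_eth_tx_buffer(port, queue, txbuf, probe);
    /* ... then flush immediately: this calls rte_eth_tx_burst() underneath,
     * which enqueues the descriptor for DMA over PCIe right away instead of
     * waiting for the buffer to fill up. */
    rte_eth_tx_buffer_flush(port, queue, txbuf);
}
```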