I directly connected two PowerEdge 6950 servers back to back (crossover, using straight cables) on two different PCIe adapters.
I get a gigabit link on each of these lines (1000 MBit, full duplex, flow control in both directions).
Now I am trying to bond these interfaces into bond0 using balance-rr (round-robin) on both sides (I want to get 2000 MBit for a single IP session).
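For reference, a minimal sketch of what I mean by that bond setup (SLES-style /etc/sysconfig/network/ifcfg-bond0; the slave names eth2/eth4 are the ones from the ethtool commands further down, the IP address is just a placeholder):
# sketch only - adjust slave names and address to your setup
STARTMODE='auto'
BOOTPROTO='static'
IPADDR='10.0.0.1/24'
BONDING_MASTER='yes'
BONDING_MODULE_OPTS='mode=balance-rr miimon=100'
BONDING_SLAVE0='eth2'
BONDING_SLAVE1='eth4'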
When I tested the throughput by transferring /dev/zero to /dev/null using dd bs=1M and netcat in TCP mode, I get a throughput of 70 MB/s - not, as expected, more than 150 MB/s.
When I use the lines individually, I get about 98 MB/s on each line if the traffic goes in a different direction on each line, and 70 MB/s and 90 MB/s if the traffic goes in the "same" direction.
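For reference, the test itself looked roughly like this (port and receiver address are placeholders; traditional netcat syntax with -l -p is assumed, the OpenBSD variant drops the -p):
# receiver: discard everything arriving on TCP port 5000
nc -l -p 5000 > /dev/null
# sender: push 8 GiB of zeroes across the link
dd if=/dev/zero bs=1M count=8192 | nc <receiver-ip> 5000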
After reading through the bonding readme (/usr/src/linux/Documentation/networking/bonding.txt), I found the following section useful (13.1.1 MT Bonding Mode Selection for Single Switch Topology):
balance-rr: This mode is the only mode that will permit a single TCP/IP connection to stripe traffic across multiple interfaces. It is therefore the only mode that will allow a single TCP/IP stream to utilize more than one interface's worth of throughput. This comes at a cost, however: the striping often results in peer systems receiving packets out of order, causing TCP/IP's congestion control system to kick in, often by retransmitting segments.
It is possible to adjust TCP/IP's congestion limits by altering the net.ipv4.tcp_reordering sysctl parameter. The usual default value is 3, and the maximum useful value is 127. For a four interface balance-rr bond, expect that a single TCP/IP stream will utilize no more than approximately 2.3 interface's worth of throughput, even after adjusting tcp_reordering.
Note that this out of order delivery occurs when both the sending and receiving systems are utilizing a multiple interface bond. Consider a configuration in which a balance-rr bond feeds into a single higher capacity network channel (e.g., multiple 100Mb/sec ethernets feeding a single gigabit ethernet via an etherchannel capable switch). In this configuration, traffic sent from the multiple 100Mb devices to a destination connected to the gigabit device will not see packets out of order. However, traffic sent from the gigabit device to the multiple 100Mb devices may or may not see traffic out of order, depending upon the balance policy of the switch. Many switches do not support any modes that stripe traffic (instead choosing a port based upon IP or MAC level addresses); for those devices, traffic flowing from the gigabit device to the many 100Mb devices will only utilize one interface.
Now I changed that parameter from 3 to 127 on both connected servers.
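On the command line that change amounts to the following (plus the matching line in /etc/sysctl.conf, shown in the update further down, to make it persistent):
sysctl -w net.ipv4.tcp_reordering=127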
After bonding again I get about 100 MB/s but still not more than that.
Any ideas why?
Update: Hardware details from lspci -v:
24:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
Subsystem: Intel Corporation PRO/1000 PT Dual Port Server Adapter
Flags: bus master, fast devsel, latency 0, IRQ 24
Memory at dfe80000 (32-bit, non-prefetchable) [size=128K]
Memory at dfea0000 (32-bit, non-prefetchable) [size=128K]
I/O ports at dcc0 [size=32]
Capabilities: [c8] Power Management version 2
Capabilities: [d0] MSI: Mask- 64bit+ Count=1/1 Enable-
Capabilities: [e0] Express Endpoint, MSI 00
Kernel driver in use: e1000
Kernel modules: e1000
Update final results:
8589934592 bytes (8.6 GB) copied, 35.8489 seconds, 240 MB/s
I changed a lot of TCP/IP and low-level driver options, including enlarging the network buffers. This is why dd now shows numbers greater than 200 MB/s: dd terminates while there is still output waiting to be transferred (sitting in the send buffers).
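A way to cross-check this is to measure on the receiving side instead, where only bytes that actually arrived over the wire get counted (same placeholder port as in the test sketch above):
# receiver: let dd count and time the bytes that really came in over the network
nc -l -p 5000 | dd of=/dev/null bs=1M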
Update 2011-08-05: Settings that were changed to achieve the goal (/etc/sysctl.conf):
# See http://www-didc.lbl.gov/TCP-tuning/linux.html
# raise TCP max buffer size to 16 MB. default: 131071
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# raise autotuning TCP buffer limits
# min, default and max number of bytes to use
# Defaults:
#net.ipv4.tcp_rmem = 4096 87380 174760
#net.ipv4.tcp_wmem = 4096 16384 131072
# Tuning:
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Default: Backlog 300
net.core.netdev_max_backlog = 2500
#
# Oracle-DB settings:
fs.file-max = 6815744
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
kernel.shmmax = 2147659776
kernel.sem = 1250 256000 100 1024
net.core.rmem_default = 262144
net.core.wmem_default = 262144
#
# Tuning for network-bonding according to bonding.txt:
net.ipv4.tcp_reordering = 127
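To apply these without a reboot (assuming the settings live in /etc/sysctl.conf as above):
sysctl -p /etc/sysctl.conf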
Special settings for the bond-device (SLES: /etc/sysconfig/network/ifcfg-bond0):
MTU='9216'
LINK_OPTIONS='txqueuelen 10000'
Note that setting the biggest possible MTU was the key to the solution.
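For a quick test without editing the config file, roughly equivalent runtime commands are (older systems can use ifconfig bond0 mtu 9216 txqueuelen 10000 instead):
ip link set dev bond0 mtu 9216
ip link set dev bond0 txqueuelen 10000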
Tuning of the rx/tx buffers of the involved network cards:
/usr/sbin/ethtool -G eth2 rx 2048 tx 2048
/usr/sbin/ethtool -G eth4 rx 2048 tx 2048
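To verify what the driver actually accepted, and what the hardware maximums are, the ring sizes can be read back, e.g.:
/usr/sbin/ethtool -g eth2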
I had a similar problem trying to raise the speed of a DRBD synchronization over two gigabit links some time ago. In the end I managed to get about 150 MB/s sync speed. These were the settings that I applied on both nodes:
You could also try to enable interrupt coalescing for your network cards, if you don't have it enabled already (with ethtool --coalesce).
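A minimal sketch of that (parameter support and sensible values depend on the driver, so treat the number as an example only):
# show the current coalescing settings
ethtool -c eth2
# example: limit RX interrupts to roughly one per 100 microseconds
ethtool -C eth2 rx-usecs 100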
Have you configured this two-way trunk on the switch? If not, then it won't work like that; it'll just work in active/passive mode and only use one of the 1 Gbps links.
It looks like the PowerEdge 6950 may be limited to PCI slots, which top out at 133 MB/s shared across the entire bus. You might be seeing I/O limitations of the system bus architecture itself.
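Whether the cards actually sit on a PCIe link, and at what width/speed, can be checked with lspci (slot address taken from the output in the question):
lspci -vv -s 24:00.0 | grep -iE 'lnkcap|lnksta'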
Outside of having other systems with different hardware and I/O architectures to test, cabling might also come into play. Possible factors include different cable ratings (Cat 5e vs. Cat 6) as well as lengths (shorter is not always better).
Jumbo frames?
Using jumbo frames is a gigantic help, as long as your switch and NICs support it. If you have an unmanaged switch, most likely you are not going to get anywhere near the bandwidth you want, but that's not the case if you are binding the ports together on the switch. Here's something I learned a long time ago: 65% of the time, it's a physical issue. Are you using Cat 6 cable?
If you have configured jumbo frames on your NICs, which by the look of it you have, make sure you have configured your switches to support the higher MTU as well.
Jumbo frames are a great performance boost on gigabit networks, but you need to ensure you have configured them end to end (both the source and destination servers and the network switches they use).
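One quick way to verify jumbo frames end to end is a don't-fragment ping with a near-MTU payload; assuming the MTU of 9216 used above (9188 bytes of ICMP payload plus 28 bytes of headers) and a placeholder peer address:
ping -M do -s 9188 <peer-ip>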