Question

Kyle Hailey

Asked: 2011-05-17 14:47:47 +0800 CST

different ACK behaviors (slowing down throughput?)

I'm running netio (http://freshmeat.net/projects/netio/) on one machine (OpenSolaris) and contacting two different Linux machines, machine A and machine B, both on 2.6.18-128.el5. Machine A gets a network throughput of 10MB/sec with netio and machine B gets 100MB/sec. On the OpenSolaris machine I DTraced the connections and all the interactions look the same: same window sizes on the receive and send sides, same ssthresh, same congestion window sizes. However, the slow machine sends an ACK for every 2 or 3 receives, whereas the fast machine sends an ACK every 12 receives. All three machines are on the same switch. Here is the DTrace output.

Fast machine:

delta send   recd  
 (us) bytes  bytes  swnd snd_ws   rwnd rcv_ws   cwnd    ssthresh 
  122 1448 \      195200      7 131768      2 128872  1073725440 
   37 1448 \      195200      7 131768      2 128872  1073725440 
   20 1448 \      195200      7 131768      2 128872  1073725440 
   18 1448 \      195200      7 131768      2 128872  1073725440 
   18 1448 \      195200      7 131768      2 128872  1073725440 
   18 1448 \      195200      7 131768      2 128872  1073725440 
   18 1448 \      195200      7 131768      2 128872  1073725440 
   19 1448 \      195200      7 131768      2 128872  1073725440 
   18 1448 \      195200      7 131768      2 128872  1073725440 
   18 1448 \      195200      7 131768      2 128872  1073725440  
   57 1448 \      195200      7 131768      2 128872  1073725440
  171 1448 \      195200      7 131768      2 128872  1073725440    
   29  912 \      195200      7 131768      2 128872  1073725440   
   30      /    0 195200      7 131768      2 128872  1073725440

Slow machine:

delta send   recd  
 (us) bytes  bytes  swnd snd_ws   rwnd rcv_ws   cwnd    ssthresh 
  161      /    0 195200     7 131768      2 127424   1073725440  
   52 1448 \      195200     7 131768      2 128872   1073725440 
   33 1448 \      195200     7 131768      2 128872   1073725440   
   11 1448 \      195200     7 131768      2 128872   1073725440   
  143      /    0 195200     7 131768      2 128872   1073725440   
   46 1448 \      195200     7 131768      2 130320   1073725440   
   31 1448 \      195200     7 131768      2 130320   1073725440   
   11 1448 \      195200     7 131768      2 130320   1073725440   
  157      /    0 195200     7 131768      2 130320   1073725440  
   46 1448 \      195200     7 131768      2 131768   1073725440 
   18 1448 \      195200     7 131768      2 131768   1073725440

DTrace code (dtrace also reported "130717 drops on CPU 0"):

#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option defaultargs
inline int TICKS=$1;
inline string ADDR=$$2;
dtrace:::BEGIN
{
       TIMER = ( TICKS != NULL ) ?  TICKS : 1 ;
       ticks = TIMER;
       TITLE = 10;
       title = 0;
       walltime=timestamp;
       printf("starting up ...\n");
}
tcp:::send
/     ( args[2]->ip_daddr == ADDR || ADDR == NULL ) /
{
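    /* mark this connection ID so the tcp:::receive clause below only
       reports connections we have sent on (netio here, despite the
       "nfs" name) */
    nfs[args[1]->cs_cid]=1;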
    delta= timestamp-walltime;
    walltime=timestamp;
    printf("%6d %8d \ %8s  %8d %8d %8d  %8d %8d %12d %12d %12d %8d %8d  %d  \n",
        delta/1000,
        args[2]->ip_plength - args[4]->tcp_offset,
        "",
        args[3]->tcps_swnd,
        args[3]->tcps_snd_ws,
        args[3]->tcps_rwnd,
        args[3]->tcps_rcv_ws,
        args[3]->tcps_cwnd,
        args[3]->tcps_cwnd_ssthresh,
        args[3]->tcps_sack_fack,
        args[3]->tcps_sack_snxt,
        args[3]->tcps_rto,
        args[3]->tcps_mss,
        args[3]->tcps_retransmit
      );
    flag=0;
    title--;
}
tcp:::receive
/ ( args[2]->ip_saddr == ADDR || ADDR == NULL ) && nfs[args[1]->cs_cid] /
{
      delta=timestamp-walltime;
      walltime=timestamp;

      printf("%6d %8s / %8d  %8d %8d %8d  %8d %8d %12d %12d %12d %8d %8d  %d  \n",
        delta/1000,
        "",
        args[2]->ip_plength - args[4]->tcp_offset,
        args[3]->tcps_swnd,
        args[3]->tcps_snd_ws,
        args[3]->tcps_rwnd,
        args[3]->tcps_rcv_ws,
        args[3]->tcps_cwnd,
        args[3]->tcps_cwnd_ssthresh,
        args[3]->tcps_sack_fack,
        args[3]->tcps_sack_snxt,
        args[3]->tcps_rto,
        args[3]->tcps_mss,
        args[3]->tcps_retransmit
      );
    flag=0;
    title--;
}

Follow-up: I added the number of unacknowledged bytes to the output, and it turns out the slow connection runs up its unacknowledged bytes until it hits the congestion window, whereas the fast machine never hits its congestion window. Here is the output from the slow machine when its unacknowledged bytes hit the congestion window:

unack    unack    delta  bytes   bytes       send   receive  cong       ssthresh
bytes    bytes     us     sent   received    window window   window
sent     received
139760      0     31     1448 \             195200  131768   144800   1073725440
139760      0     33     1448 \             195200  131768   144800   1073725440
144104      0     29     1448 \             195200  131768   146248   1073725440
145552      0     31          / 0           195200  131768   144800   1073725440
145552      0     41     1448 \             195200  131768   147696   1073725440
147000      0     30          / 0           195200  131768   144800   1073725440
147000      0     22     1448 \             195200  131768    76744        72400
147000      0     28          / 0           195200  131768    76744        72400
147000      0     18     1448 \             195200  131768    76744        72400
147000      0     26          / 0           195200  131768    76744        72400
147000      0     17     1448 \             195200  131768    76744        72400
147000      0     27          / 0           195200  131768    76744        72400
147000      0     18     1448 \             195200  131768    76744        72400
147000      0     56          / 0           195200  131768    76744        72400
147000      0     22     1448 \             195200  131768    76744        72400
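
The sudden change in the last two columns can be checked by hand. The Python sketch below uses only numbers from the table above. It shows that the new values are consistent with a standard loss response, where the threshold becomes half the old congestion window and the window becomes the threshold plus three segments. Whether the Solaris stack computes them exactly this way is an assumption. The 1448 byte segment size is taken from the bytes-sent column, and the function name loss_response is hypothetical.

```python
def loss_response(cwnd, mss):
    """Return (ssthresh, cwnd) after halving the window and adding 3 segments."""
    ssthresh = cwnd // 2
    return ssthresh, ssthresh + 3 * mss


MSS = 1448             # segment size seen in the bytes-sent column
CWND_BEFORE = 144800   # congestion window on the row just before the drop
UNACKED = 147000       # unacknowledged bytes at that point

ssthresh, cwnd = loss_response(CWND_BEFORE, MSS)
print(ssthresh, cwnd)            # 72400 76744, matching the table
print(UNACKED >= CWND_BEFORE)    # True: bytes in flight had reached the window
print(UNACKED - cwnd)            # 70256 bytes in flight beyond the reduced window
```

With about 70 KB already in flight beyond the reduced window, the sender can only release new data as ACKs arrive, which fits the alternating send and receive rows at the end of the table.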

dtrace code:

#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option defaultargs
inline int TICKS=$1;
inline string ADDR=$$2;
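/* sends only: tcp:::receive has its own clause below, so listing it
   here would print received segments twice when no address is given */
tcp:::send
/     ( args[2]->ip_daddr == ADDR || ADDR == NULL ) /
{
    /* mark this connection ID so the receive clause only reports
       connections we have sent on (netio here, despite the "nfs" name) */
    nfs[args[1]->cs_cid]=1;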
    delta= timestamp-walltime;
    walltime=timestamp;
    printf("%6d %6d %6d %8d \ %8s  %8d %8d %8d  %8d %8d %12d %12d %12d %8d %8d  %d  \n",
        args[3]->tcps_snxt - args[3]->tcps_suna ,
        args[3]->tcps_rnxt - args[3]->tcps_rack,
        delta/1000,
        args[2]->ip_plength - args[4]->tcp_offset,
        "",
        args[3]->tcps_swnd,
        args[3]->tcps_snd_ws,
        args[3]->tcps_rwnd,
        args[3]->tcps_rcv_ws,
        args[3]->tcps_cwnd,
        args[3]->tcps_cwnd_ssthresh,
        args[3]->tcps_sack_fack,
        args[3]->tcps_sack_snxt,
        args[3]->tcps_rto,
        args[3]->tcps_mss,
        args[3]->tcps_retransmit
      );
}
tcp:::receive
/ ( args[2]->ip_saddr == ADDR || ADDR == NULL ) && nfs[args[1]->cs_cid] /
{
      delta=timestamp-walltime;
      walltime=timestamp;
      printf("%6d %6d %6d %8s / %-8d  %8d %8d %8d  %8d %8d %12d %12d %12d %8d %8d  %d  \n",
        args[3]->tcps_snxt - args[3]->tcps_suna ,
        args[3]->tcps_rnxt - args[3]->tcps_rack,
        delta/1000,
        "",
        args[2]->ip_plength - args[4]->tcp_offset,
        args[3]->tcps_swnd,
        args[3]->tcps_snd_ws,
        args[3]->tcps_rwnd,
        args[3]->tcps_rcv_ws,
        args[3]->tcps_cwnd,
        args[3]->tcps_cwnd_ssthresh,
        args[3]->tcps_sack_fack,
        args[3]->tcps_sack_snxt,
        args[3]->tcps_rto,
        args[3]->tcps_mss,
        args[3]->tcps_retransmit
      );
}

It is still an open question why one machine falls behind and the other doesn't.

1 Answer

sysadmin1138 · Answer 1 · 2011-05-17T18:33:38+08:00

I have seen behavior like this before. I've seen two causes for it:

Bad TCP/IP flow control negotiation
Bad drivers

TCP/IP flow-control problems are less likely in your case since both machines are running the same kernel and (except for the device kernel modules if different) therefore running the same TCP/IP code.

Drivers, though, are a different story.

I had a Windows 2003 server a while back that simply couldn't transfer more than 6-10MB/s to certain servers, and as that was a backup-to-disk server this simply wasn't acceptable. When I looked at some packet captures, they looked a LOT like what you're seeing. What fixed it was updating the network drivers (Broadcom, as it happened) on the receiving server (the Server 2003 backup server) to something newer. Once that was done, I was getting 60-80MB/s.

Since this is Linux, you just might be running into a Large Segment Offload problem of some kind. This does rely in some part on the NIC hardware itself handling the splitting of large segments. If that is not working for some reason (bad firmware?) it can cause these kinds of odd delays. This is configured on a per-driver or interface basis. ethtool -K can configure it by device.
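
As a rough illustration of the ethtool check mentioned above, the following Python sketch builds the two command lines involved: ethtool -k to show the current offload settings and ethtool -K to change them. The interface name eth0 and the features chosen (tso, gso) are assumptions for illustration, and the helper name offload_commands is hypothetical. The sketch only prints the commands and does not run them.

```python
def offload_commands(iface, features=("tso", "gso")):
    """Build ethtool command lines to inspect and disable offload features.

    Returns a pair of argument lists: one that shows the current offload
    settings and one that turns the named features off.
    """
    show = ["ethtool", "-k", iface]
    disable = ["ethtool", "-K", iface]
    for feature in features:
        disable += [feature, "off"]
    return show, disable


# "eth0" is a placeholder; substitute the interface on the slow machine.
show, disable = offload_commands("eth0")
print(" ".join(show))     # ethtool -k eth0
print(" ".join(disable))  # ethtool -K eth0 tso off gso off
```

Turning the offloads off on the slow machine and rerunning netio is a quick way to see whether segmentation offload is involved. The features available depend on the driver and the ethtool version.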
