I'm on an OpenSolaris box and I want to monitor the VMs on an ESX box. VMware has a remote toolkit in Perl for Linux and a remote esxtop for Linux, but is there a way to collect ESX statistics on a non-Linux box, in this case OpenSolaris x86? I could use ssh, expect, and esxtop (a sketch of that fallback is below), but I want to know if there is a way to do this with a remote API; I'd rather not use ssh and expect.
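For reference, here is a minimal sketch of the ssh-based fallback I'd like to avoid, assuming ssh access to the ESX service console and that esxtop supports the -b (batch), -d (delay), and -n (iterations) flags; the hostname is hypothetical:

#!/bin/sh
# Hypothetical fallback: pull esxtop batch output over ssh from
# OpenSolaris. ESXHOST is a placeholder hostname.
ESXHOST=esx01
ssh root@$ESXHOST "esxtop -b -d 5 -n 12" > esxtop_`date +%Y%m%d%H%M`.csv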
I am wondering why NFS v4 would be so much faster than NFS v3 and if there are any parameters on v3 that could be tweaked.
I mount a file system:
sudo mount -o 'rw,bg,hard,nointr,rsize=1048576,wsize=1048576,vers=4' toto:/test /test
and then run:
dd if=/test/file of=/dev/null bs=1024k
I can read at 200-400 MB/s, but when I change the version to vers=3, remount, and rerun the dd, I only get 90 MB/s. The file I'm reading from is an in-memory file on the NFS server. Both sides of the connection are Solaris and have 10GbE NICs. I avoid any client-side caching by remounting between all tests. I used dtrace on the server to measure how fast data is being served via NFS. For both v3 and v4 I changed:
nfs4_bsize
nfs3_bsize
from the default 32K to 1M (on v4 I maxed out at 150 MB/s with 32K). I've also tried tweaking
- nfs3_max_threads
- clnt_max_conns
- nfs3_async_clusters
to improve the v3 performance, but no luck. (The sketch below shows how I set these tunables.)
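For the record, a minimal sketch of how I set these kinds of kernel tunables on Solaris via the live-kernel mdb route; the tunable names come from the text above, but the module prefix in the /etc/system lines is my assumption, so verify it on your release:

#!/bin/sh
# Bump the NFS client block sizes to 1M on a live kernel
# (affects mounts made after the change). "0t" means decimal in mdb.
echo "nfs3_bsize/W 0t1048576" | mdb -kw
echo "nfs4_bsize/W 0t1048576" | mdb -kw

# Persistent equivalent in /etc/system (module prefix assumed):
#   set nfs:nfs3_bsize=1048576
#   set nfs:nfs4_bsize=1048576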
On v3, if I run four parallel dd's, the throughput goes down from 90 MB/s to 70-80 MB/s, which leads me to believe the problem is some shared resource; if so, I'm wondering what that resource is and whether I can increase it. (A sketch of the test loop I'm using is below.)
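For reproducibility, here is a minimal sketch of the test cycle, assuming the same server, mount point, and options as above; the loop structure and parallel-reader count are mine:

#!/bin/sh
# Usage: ./nfstest.sh [nfs-version] [parallel-readers]  (run as root)
VERS=${1:-3}
PAR=${2:-1}
# Remount between runs to defeat client-side caching.
umount /test 2>/dev/null
mount -o rw,bg,hard,nointr,rsize=1048576,wsize=1048576,vers=$VERS \
    toto:/test /test
i=0
while [ $i -lt $PAR ]; do
    dd if=/test/file of=/dev/null bs=1024k &
    i=`expr $i + 1`
done
wait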
DTrace code to get window sizes:
#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option defaultargs
inline string ADDR=$$1;
dtrace:::BEGIN
{
TITLE = 10;
title = 0;
printf("starting up ...\n");
self->start = 0;
}
tcp:::send, tcp:::receive
/ self->start == 0 /
{
walltime[args[1]->cs_cid]= timestamp;
self->start = 1;
}
tcp:::send, tcp:::receive
/ title == 0 &&
( ADDR == NULL || args[3]->tcps_raddr == ADDR ) /
{
printf("%4s %15s %6s %6s %6s %8s %8s %8s %8s %8s %8s %8s %8s %8s %8s\n",
"cid",
"ip",
"usend" ,
"urecd" ,
"delta" ,
"send" ,
"recd" ,
"ssz" ,
"sscal" ,
"rsz",
"rscal",
"congw",
"conthr",
"flags",
"retran"
);
title = TITLE ;
}
tcp:::send
/ ( ADDR == NULL || args[3]->tcps_raddr == ADDR ) /
{
nfs[args[1]->cs_cid]=1; /* this is an NFS thread */
this->delta= timestamp-walltime[args[1]->cs_cid];
walltime[args[1]->cs_cid]=timestamp;
this->flags="";
this->flags= strjoin((( args[4]->tcp_flags & TH_FIN ) ? "FIN|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags & TH_SYN ) ? "SYN|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags & TH_RST ) ? "RST|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags & TH_PUSH ) ? "PUSH|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags & TH_ACK ) ? "ACK|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags & TH_URG ) ? "URG|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags & TH_ECE ) ? "ECE|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags & TH_CWR ) ? "CWR|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags == 0 ) ? "null " : ""),this->flags);
printf("%5d %14s %6d %6d %6d %8d \ %-8s %8d %6d %8d %8d %8d %12d %s %d \n",
args[1]->cs_cid%1000,
args[3]->tcps_raddr ,
args[3]->tcps_snxt - args[3]->tcps_suna ,
args[3]->tcps_rnxt - args[3]->tcps_rack,
this->delta/1000,
args[2]->ip_plength - args[4]->tcp_offset,
"",
args[3]->tcps_swnd,
args[3]->tcps_snd_ws,
args[3]->tcps_rwnd,
args[3]->tcps_rcv_ws,
args[3]->tcps_cwnd,
args[3]->tcps_cwnd_ssthresh,
this->flags,
args[3]->tcps_retransmit
);
this->flags=0;
title--;
this->delta=0;
}
tcp:::receive
/ nfs[args[1]->cs_cid] && ( ADDR == NULL || args[3]->tcps_raddr == ADDR ) /
{
this->delta= timestamp-walltime[args[1]->cs_cid];
walltime[args[1]->cs_cid]=timestamp;
this->flags="";
this->flags= strjoin((( args[4]->tcp_flags & TH_FIN ) ? "FIN|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags & TH_SYN ) ? "SYN|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags & TH_RST ) ? "RST|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags & TH_PUSH ) ? "PUSH|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags & TH_ACK ) ? "ACK|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags & TH_URG ) ? "URG|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags & TH_ECE ) ? "ECE|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags & TH_CWR ) ? "CWR|" : ""),this->flags);
this->flags= strjoin((( args[4]->tcp_flags == 0 ) ? "null " : ""),this->flags);
printf("%5d %14s %6d %6d %6d %8s / %-8d %8d %6d %8d %8d %8d %12d %s %d \n",
args[1]->cs_cid%1000,
args[3]->tcps_raddr ,
args[3]->tcps_snxt - args[3]->tcps_suna ,
args[3]->tcps_rnxt - args[3]->tcps_rack,
this->delta/1000,
"",
args[2]->ip_plength - args[4]->tcp_offset,
args[3]->tcps_swnd,
args[3]->tcps_snd_ws,
args[3]->tcps_rwnd,
args[3]->tcps_rcv_ws,
args[3]->tcps_cwnd,
args[3]->tcps_cwnd_ssthresh,
this->flags,
args[3]->tcps_retransmit
);
this->flags=0;
title--;
this->delta=0;
}
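To run the script above (the filename is hypothetical; the optional argument is the remote address to filter on, per the ADDR inline at the top):

chmod +x tcpwin.d
./tcpwin.d                  # trace all connections
./tcpwin.d 192.168.100.186  # only this remote address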
Output looks like this (not from this particular situation):
cid ip usend urecd delta send recd ssz sscal rsz rscal congw conthr flags retran
320 192.168.100.186 240 0 272 240 \ 49232 0 1049800 5 1049800 2896 ACK|PUSH| 0
320 192.168.100.186 240 0 196 / 68 49232 0 1049800 5 1049800 2896 ACK|PUSH| 0
320 192.168.100.186 0 0 27445 0 \ 49232 0 1049800 5 1049800 2896 ACK| 0
24 192.168.100.177 0 0 255562 / 52 64060 0 64240 0 91980 2920 ACK|PUSH| 0
24 192.168.100.177 52 0 301 52 \ 64060 0 64240 0 91980 2920 ACK|PUSH| 0
Some column definitions:
usend - unacknowledged send bytes
urecd - unacknowledged received bytes
ssz - send window
rsz - receive window
congw - congestion window
I'm planning on taking snoop captures of the dd's over v3 and v4 and comparing them (a sketch is below). I already tried this once, but there was too much traffic, and I used a disk file instead of a cached file, which made comparing timings meaningless. I'll rerun the snoop captures with cached data and no other traffic between the boxes. TBD.
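A minimal sketch of the capture I have in mind, using snoop's -d (device) and -o (output file) options with a host filter; the 10GbE interface name is hypothetical:

#!/bin/sh
# Capture only traffic to/from the NFS server during one dd run.
snoop -d nxge0 -o /var/tmp/nfs_v3.cap host toto &
SNOOP_PID=$!
dd if=/test/file of=/dev/null bs=1024k
kill $SNOOP_PID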
Additionally, the network guys say there are no traffic shapers or bandwidth limiters on the connections.
Is there any way to limit the output from esxtop in batch mode? I tried running it in batch mode and got 16,000 columns! I could filter this out post-collection, but at that kind of data volume it seems like I'd be wasting resources. The interactive output from esxtop is fairly customizable; here is a pretty good discussion of esxtop: http://www.yellow-bricks.com/esxtop/ If the batch-mode output can't be limited (though saving an interactive configuration and replaying it may help, as sketched below), then I will probably look at parsing the interactive output programmatically. Another option would be using the SDK from VMware, but I haven't found any practical examples. I'm doing the collection from OpenSolaris. There is a Perl SDK for Linux and Windows, but I'd rather do everything from OpenSolaris if possible.
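One approach worth trying, assuming batch mode honors a saved interactive configuration (the W save key and the -c flag exist in the esxtop versions I've seen, but verify on yours; the config path is the ESX 4.x default):

# Interactively remove the fields you don't want, then press W to
# save the configuration (by default to ~/.esxtop4rc on ESX 4.x).
esxtop
# Replay that configuration in batch mode: 12 samples, 5s apart.
esxtop -b -c ~/.esxtop4rc -d 5 -n 12 > esxtop.csv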
I want to read from and write to a ramdisk on OpenSolaris for performance-testing purposes. The tests are aimed at network transmission, and I want to rule out disk performance. I set up the ramdisk on the NFS server, machine A, with
mkfile -nv 1000m `pwd`/ramdisk
in a directory that was mounted via NFS onto machine B. Reading the file went fine, but writing to it just overwrote it as an ordinary file. I then set up a ramdisk with
ramdiskadm -a ramdisk1 1000m
which I can write to fine, but which I can't access over NFS. The ramdisk appears under /dev/ramdisk, which is a link into /devices/pseudo. I added /devices/pseudo to /etc/dfs/sharetab and mounted it on machine B without error, but the contents of the directory on machine B are empty. (The commands I used are sketched below.)
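For reference, a minimal sketch of the two setups described above; the share invocation in the second case is my reconstruction of what I attempted, and the directory path is hypothetical:

#!/bin/sh
# Attempt 1: sparse file in an NFS-shared directory on machine A.
mkfile -nv 1000m /export/test/ramdisk

# Attempt 2: a real ramdisk device.
ramdiskadm -a ramdisk1 1000m   # creates /dev/ramdisk/ramdisk1
# Sharing the device tree directly; this is the part that showed up
# empty on machine B (NFS shares filesystems, not raw device nodes):
share -F nfs /devices/pseudo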
I'm running netio (http://freshmeat.net/projects/netio/) on one machine (OpenSolaris) and contacting two different Linux machines (both on 2.6.18-128.el5), machine A and machine B. Machine A has a network throughput of 10 MB/s with netio and machine B 100 MB/s. On the OpenSolaris box I dtraced the connections, and all the interactions look the same: same window sizes on receive and send, same ssthresh, same congestion window sizes. But the slow machine sends an ACK for every 2 or 3 receives, whereas the fast machine sends an ACK every 12 receives. All three machines are on the same switch. Here is the DTrace output. Fast machine:
 delta   send   recd
  (us)  bytes  bytes      swnd  snd_ws     rwnd  rcv_ws     cwnd    ssthresh
   122   1448 \         195200       7   131768       2   128872  1073725440
    37   1448 \         195200       7   131768       2   128872  1073725440
    20   1448 \         195200       7   131768       2   128872  1073725440
    18   1448 \         195200       7   131768       2   128872  1073725440
    18   1448 \         195200       7   131768       2   128872  1073725440
    18   1448 \         195200       7   131768       2   128872  1073725440
    18   1448 \         195200       7   131768       2   128872  1073725440
    19   1448 \         195200       7   131768       2   128872  1073725440
    18   1448 \         195200       7   131768       2   128872  1073725440
    18   1448 \         195200       7   131768       2   128872  1073725440
    57   1448 \         195200       7   131768       2   128872  1073725440
   171   1448 \         195200       7   131768       2   128872  1073725440
    29    912 \         195200       7   131768       2   128872  1073725440
    30        /      0  195200       7   131768       2   128872  1073725440
Slow machine:
 delta   send   recd
  (us)  bytes  bytes      swnd  snd_ws     rwnd  rcv_ws     cwnd    ssthresh
   161        /      0  195200       7   131768       2   127424  1073725440
    52   1448 \         195200       7   131768       2   128872  1073725440
    33   1448 \         195200       7   131768       2   128872  1073725440
    11   1448 \         195200       7   131768       2   128872  1073725440
   143        /      0  195200       7   131768       2   128872  1073725440
    46   1448 \         195200       7   131768       2   130320  1073725440
    31   1448 \         195200       7   131768       2   130320  1073725440
    11   1448 \         195200       7   131768       2   130320  1073725440
   157        /      0  195200       7   131768       2   130320  1073725440
    46   1448 \         195200       7   131768       2   131768  1073725440
    18   1448 \         195200       7   131768       2   131768  1073725440
DTrace code:
dtrace: 130717 drops on CPU 0

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option defaultargs

inline int TICKS=$1;
inline string ADDR=$$2;

dtrace:::BEGIN
{
    TIMER = ( TICKS != NULL ) ? TICKS : 1;
    ticks = TIMER;
    TITLE = 10;
    title = 0;
    walltime = timestamp;
    printf("starting up ...\n");
}

tcp:::send
/ ( args[2]->ip_daddr == ADDR || ADDR == NULL ) /
{
    nfs[args[1]->cs_cid] = 1; /* this is an NFS thread */
    delta = timestamp - walltime;
    walltime = timestamp;
    printf("%6d %8d \ %8s %8d %8d %8d %8d %8d %12d %12d %12d %8d %8d %d \n",
        delta/1000,
        args[2]->ip_plength - args[4]->tcp_offset,
        "",
        args[3]->tcps_swnd,
        args[3]->tcps_snd_ws,
        args[3]->tcps_rwnd,
        args[3]->tcps_rcv_ws,
        args[3]->tcps_cwnd,
        args[3]->tcps_cwnd_ssthresh,
        args[3]->tcps_sack_fack,
        args[3]->tcps_sack_snxt,
        args[3]->tcps_rto,
        args[3]->tcps_mss,
        args[3]->tcps_retransmit
    );
    flag = 0;
    title--;
}

tcp:::receive
/ ( args[2]->ip_saddr == ADDR || ADDR == NULL ) && nfs[args[1]->cs_cid] /
{
    delta = timestamp - walltime;
    walltime = timestamp;
    printf("%6d %8s / %8d %8d %8d %8d %8d %8d %12d %12d %12d %8d %8d %d \n",
        delta/1000,
        "",
        args[2]->ip_plength - args[4]->tcp_offset,
        args[3]->tcps_swnd,
        args[3]->tcps_snd_ws,
        args[3]->tcps_rwnd,
        args[3]->tcps_rcv_ws,
        args[3]->tcps_cwnd,
        args[3]->tcps_cwnd_ssthresh,
        args[3]->tcps_sack_fack,
        args[3]->tcps_sack_snxt,
        args[3]->tcps_rto,
        args[3]->tcps_mss,
        args[3]->tcps_retransmit
    );
    flag = 0;
    title--;
}
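To invoke the script above (the filename is hypothetical; the first argument is the tick count per the TICKS inline, the second the remote address to filter on):

./tcpdelta.d 1 192.168.1.5   # hypothetical filename and address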
Followup: I added the number of unacknowledged bytes to the output, and it turns out the slow machine does run up its unacknowledged bytes until it hits the congestion window, whereas the fast machine never hits its congestion window. Here is the output from the slow machine when its unacknowledged bytes hit the congestion window:
 unack   unack  delta  bytes   bytes         send  receive     cong    ssthresh
 bytes   bytes   (us)   sent    recd       window   window   window
  sent    recd
139760       0     31   1448 \             195200   131768   144800  1073725440
139760       0     33   1448 \             195200   131768   144800  1073725440
144104       0     29   1448 \             195200   131768   146248  1073725440
145552       0     31        /      0      195200   131768   144800  1073725440
145552       0     41   1448 \             195200   131768   147696  1073725440
147000       0     30        /      0      195200   131768   144800  1073725440
147000       0     22   1448 \             195200   131768    76744       72400
147000       0     28        /      0      195200   131768    76744       72400
147000       0     18   1448 \             195200   131768    76744       72400
147000       0     26        /      0      195200   131768    76744       72400
147000       0     17   1448 \             195200   131768    76744       72400
147000       0     27        /      0      195200   131768    76744       72400
147000       0     18   1448 \             195200   131768    76744       72400
147000       0     56        /      0      195200   131768    76744       72400
147000       0     22   1448 \             195200   131768    76744       72400
DTrace code:
#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option defaultargs

inline int TICKS=$1;
inline string ADDR=$$2;

tcp:::send, tcp:::receive
/ ( args[2]->ip_daddr == ADDR || ADDR == NULL ) /
{
    nfs[args[1]->cs_cid] = 1; /* this is an NFS thread */
    delta = timestamp - walltime;
    walltime = timestamp;
    printf("%6d %6d %6d %8d \ %8s %8d %8d %8d %8d %8d %12d %12d %12d %8d %8d %d \n",
        args[3]->tcps_snxt - args[3]->tcps_suna,
        args[3]->tcps_rnxt - args[3]->tcps_rack,
        delta/1000,
        args[2]->ip_plength - args[4]->tcp_offset,
        "",
        args[3]->tcps_swnd,
        args[3]->tcps_snd_ws,
        args[3]->tcps_rwnd,
        args[3]->tcps_rcv_ws,
        args[3]->tcps_cwnd,
        args[3]->tcps_cwnd_ssthresh,
        args[3]->tcps_sack_fack,
        args[3]->tcps_sack_snxt,
        args[3]->tcps_rto,
        args[3]->tcps_mss,
        args[3]->tcps_retransmit
    );
}

tcp:::receive
/ ( args[2]->ip_saddr == ADDR || ADDR == NULL ) && nfs[args[1]->cs_cid] /
{
    delta = timestamp - walltime;
    walltime = timestamp;
    printf("%6d %6d %6d %8s / %-8d %8d %8d %8d %8d %8d %12d %12d %12d %8d %8d %d \n",
        args[3]->tcps_snxt - args[3]->tcps_suna,
        args[3]->tcps_rnxt - args[3]->tcps_rack,
        delta/1000,
        "",
        args[2]->ip_plength - args[4]->tcp_offset,
        args[3]->tcps_swnd,
        args[3]->tcps_snd_ws,
        args[3]->tcps_rwnd,
        args[3]->tcps_rcv_ws,
        args[3]->tcps_cwnd,
        args[3]->tcps_cwnd_ssthresh,
        args[3]->tcps_sack_fack,
        args[3]->tcps_sack_snxt,
        args[3]->tcps_rto,
        args[3]->tcps_mss,
        args[3]->tcps_retransmit
    );
}
The question remains why one machine falls behind and the other doesn't ...
I have two machines on the same subnet, X.Y.Z.1 and X.Y.Z.2, connected directly with a crossover cable. I can
$ ping X.Y.Z.2
from X.Y.Z.1 and the response is that machine 2 is alive, but if I do something like
$ ping -s X.Y.Z.2
it hangs. Machine 1 is OpenSolaris; machine 2 has been HP-UX, Linux, and Solaris SPARC. Second test:
$ ssh X.Y.Z.2
connects and asks about the DSA key, which I accept with "yes", and then it hangs. (A diagnostic sketch is below.)
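One thing I can try, using Solaris ping's data-size form (ping -s host data_size npackets) to see whether only larger packets are being dropped, e.g. by a duplex or MTU mismatch on the crossover link; the sizes here are arbitrary test values:

#!/bin/sh
# Probe with increasing payloads; if small pings answer and large
# ones hang, suspect the link layer rather than IP routing.
for SZ in 8 64 512 1024 1472; do
    echo "payload $SZ bytes:"
    ping -s X.Y.Z.2 $SZ 3
done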