
Question

pjmorse

Asked: 2011-03-26 10:39:54 +0800 CST

How can I find the source of my high load issues on Ubuntu server?


We have an Ubuntu 10.04 VPS serving a Rails site which often shows pretty high load, but doesn't have high CPU or memory numbers. Reading a lot of other questions here on Server Fault suggests to me that this is an I/O issue (i.e. there are processes stuck in the I/O wait state, which drives up load). I'm trying to track down those processes, but not having much luck. I'd appreciate help with (a) ways to identify the guilty processes, and/or (b) confirmation that I'm asking the right question.

Here's a snapshot of top:

top - 18:28:49 up 5 days,  3:07,  2 users,  load average: 1.79, 1.83, 1.73
Tasks:  82 total,   1 running,  81 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.3%sy,  0.0%ni, 99.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.1%st
Mem:   1794980k total,  1780384k used,    14596k free,    13356k buffers
Swap:   524284k total,     3116k used,   521168k free,  1012272k cached

Notice low swap, CPUs mostly idle; that's why I think we're I/O bound instead of memory or CPU bound.

Here's iostat (I've obfuscated the server name):

$ iostat -x 1 3
Linux 2.6.35.2-xenU (our.server.com)     03/25/11        _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.75    0.19    0.50    0.31    0.01   97.24

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvdap1            0.01    11.52    2.19    3.18   145.12   117.55    48.97     0.08   15.60   1.67   0.90
xvdap9            0.01     0.01    0.00    0.00     0.10     0.14    62.62     0.00   13.20   6.09   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdap9            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdap9            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

iotop won't run on this box:

$ iotop
Could not run iotop as some of the requirements are not met:
- Linux >= 2.6.20 with I/O accounting support (CONFIG_TASKSTATS, CONFIG_TASK_DELAY_ACCT, CONFIG_TASK_IO_ACCOUNTING): Not found
- Python >= 2.5 or Python 2.4 with the ctypes module: Found

ps seldom finds any processes in the D state:

$ sudo ps -eo pid,user,state,cmd | awk '$3 ~ /D/ { print $0 }'
  976 root     D [kjournald]
$ sudo ps -eo pid,user,state,cmd | awk '$3 ~ /D/ { print $0 }'
$ sudo ps -eo pid,user,state,cmd | awk '$3 ~ /D/ { print $0 }'
$
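A single ps run is only a snapshot, so a process that spends short bursts in the D state is easy to miss. As a minimal sketch, assuming a POSIX shell and the procps ps used above, the same filter can be run repeatedly with a timestamp on each hit. The function names filter_d_state and sample_d_state are hypothetical, chosen here for illustration:

```shell
#!/bin/sh
# Keep only lines of `ps -eo pid,user,state,cmd` output (read from stdin)
# whose third column starts with D (uninterruptible sleep).
# The ps header line has "S" in that column, so it is dropped as well.
filter_d_state() {
    awk '$3 ~ /^D/ { print }'
}

# Take COUNT samples (default 60), INTERVAL seconds apart (default 1),
# and prefix every D-state process found with the current time.
sample_d_state() {
    count=${1:-60}
    interval=${2:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        ps -eo pid,user,state,cmd | filter_d_state |
            while IFS= read -r line; do
                printf '%s %s\n' "$(date +%T)" "$line"
            done
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Running something like sample_d_state 300 1 while the load is elevated would show whether the same command keeps turning up in the D state, or whether nothing does.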

What's my next troubleshooting step?

ETA: I ran vmstat:

$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0   3116 509372  22880 773232    0    0    18    15   24   14  2  0 97  0

That wa value of 0 makes me wonder if I/O is really the problem.

Also, yes, I know load in the 1.x range isn't really a problem - but this app has a history of ramping up load until it chokes, and if I can track the source while it still has a low fever I might spare a fatality (to torture a metaphor).
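One caveat about the vmstat run above: with no interval argument, vmstat prints a single report holding averages since boot, so a wa of 0 there says little about what is happening right now. A hedged sketch, assuming the 16-column layout shown above (wa as the 16th field), that summarizes interval samples instead. The function name summarize_vmstat is hypothetical:

```shell
#!/bin/sh
# Summarize `vmstat INTERVAL COUNT` output read from stdin.
# Header lines are skipped because their first field is not a number.
# The first data line is skipped because it holds averages since boot.
# Reports the peak of r (runnable), b (blocked) and wa (I/O wait %).
summarize_vmstat() {
    awk '$1 ~ /^[0-9]+$/ {
             if (++seen == 1) next
             if ($1 > r) r = $1
             if ($2 > b) b = $2
             if ($16 > wa) wa = $16
             n++
         }
         END { printf "samples=%d max_r=%d max_b=%d max_wa=%d\n", n, r, b, wa }'
}

# Example: vmstat 1 30 | summarize_vmstat
```

A non-zero max_b or max_wa over such a window would point back toward I/O; peaks in r with idle disks would point toward runnable processes instead.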

1 Answer


Christopher Karel · Answer 1 · 2011-03-26T11:09:15+08:00


I would recommend searching for anything not in the S (sleeping) state. It's possible you've got zombie processes, which can get counted as something running despite not really doing anything.

$ ps -eo pid,user,state,cmd | awk '$3 !~ /S/ { print $0 }'

This will show any non-sleeping processes (running, waiting on I/O, zombied, etc.).
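For a quick overview before reading individual lines, the states can also be tallied. This is only a sketch, assuming procps ps (where state= prints the state column without a header). The function name tally_states is hypothetical:

```shell
#!/bin/sh
# Count process state codes read from stdin, one per line.
# Only the first character is used, so multi-character codes such as
# "Ss" or "R+" (as printed by `ps -eo stat=`) fold into S and R.
# Output: "<count> <state>", highest count first, ties ordered by state.
tally_states() {
    awk 'NF { n[substr($1, 1, 1)]++ }
         END { for (s in n) print n[s], s }' |
        sort -k1,1nr -k2,2
}

# Example: ps -eo state= | tally_states
```

On a mostly idle box the S line should dominate; repeated runs that show more than the occasional R, D or Z entry give a starting point for the filter above.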

It's worth noting that your load average isn't terribly alarming. Assuming you have more than two cores on the box, there's no doubt plenty of CPU power to go around. But obviously still worth looking into if you don't expect 1-2 processes running at any given time.
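To put the load figure in proportion to the CPU count, the 1-minute load average can be divided by the number of CPUs. A minimal sketch, assuming Linux's /proc/loadavg and /proc/cpuinfo. The function name load_per_cpu is hypothetical:

```shell
#!/bin/sh
# Print the 1-minute load average divided by the CPU count.
# $1: a line in /proc/loadavg format (first field = 1-minute load)
# $2: number of CPUs
load_per_cpu() {
    echo "$1" | awk -v n="$2" '{ printf "%.2f\n", $1 / n }'
}

# Example on a Linux host:
#   load_per_cpu "$(cat /proc/loadavg)" "$(grep -c '^processor' /proc/cpuinfo)"
```

A result that stays below 1.00 suggests that, on average, there was not more than one running or uninterruptible task per CPU over the last minute.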

--Christopher Karel

