I have a forum with a lot of visitors. Some days the load average climbs to 40 without any increase in the number of visitors. As you can see from the output below, the I/O wait time is high (57%). How do I find the reason for that?
The server software is Apache, MySQL and PHP.
root@server:~# top
top - 13:22:08 up 283 days, 22:06, 1 user, load average: 13.84, 24.75, 22.79
Tasks: 333 total, 1 running, 331 sleeping, 0 stopped, 1 zombie
Cpu(s): 20.6%us, 7.9%sy, 0.0%ni, 13.4%id, 57.1%wa, 0.1%hi, 0.9%si, 0.0%st
Mem: 4053180k total, 3868680k used, 184500k free, 136380k buffers
Swap: 9936160k total, 12144k used, 9924016k free, 2166552k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23930 mysql 20 0 549m 122m 6580 S 90 3.1 4449:04 mysqld
17422 www-data 20 0 223m 20m 10m S 2 0.5 0:00.21 apache2
17555 www-data 20 0 222m 19m 9968 S 2 0.5 0:00.13 apache2
17264 www-data 20 0 225m 19m 8972 S 1 0.5 0:00.17 apache2
17251 www-data 20 0 220m 12m 4912 S 1 0.3 0:00.12 apache2
root@server:~# top
top - 13:39:59 up 283 days, 22:24, 1 user, load average: 6.66, 10.39, 13.95
Tasks: 318 total, 1 running, 317 sleeping, 0 stopped, 0 zombie
Cpu(s): 13.6%us, 4.2%sy, 0.0%ni, 40.5%id, 40.6%wa, 0.2%hi, 0.8%si, 0.0%st
Mem: 4053180k total, 4010992k used, 42188k free, 119544k buffers
Swap: 9936160k total, 12160k used, 9924000k free, 2290716k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23930 mysql 20 0 549m 122m 6580 S 44 3.1 4457:30 mysqld
19946 www-data 20 0 223m 21m 10m S 5 0.6 0:00.77 apache2
17316 www-data 20 0 226m 23m 11m S 1 0.6 0:01.76 apache2
17333 www-data 20 0 222m 21m 11m S 1 0.5 0:01.55 apache2
18212 www-data 20 0 225m 22m 11m S 1 0.6 0:01.58 apache2
19528 www-data 20 0 220m 13m 5480 S 1 0.3 0:00.63 apache2
19600 www-data 20 0 224m 20m 11m S 1 0.5 0:00.73 apache2
19942 www-data 20 0 225m 21m 10m S 1 0.5 0:00.82 apache2
20232 www-data 20 0 222m 16m 8760 S 1 0.4 0:00.65 apache2
20243 www-data 20 0 223m 21m 11m S 1 0.5 0:00.57 apache2
20299 www-data 20 0 225m 20m 9m S 1 0.5 0:00.67 apache2
20441 www-data 20 0 225m 21m 10m S 1 0.5 0:00.57 apache2
21201 www-data 20 0 220m 12m 5148 S 1 0.3 0:00.19 apache2
21362 www-data 20 0 220m 12m 5032 S 1 0.3 0:00.17 apache2
21364 www-data 20 0 220m 12m 4916 S 1 0.3 0:00.14 apache2
21366 www-data 20 0 220m 12m 5124 S 1 0.3 0:00.22 apache2
21373 www-data 20 0 222m 14m 7060 S 1 0.4 0:00.26 apache2
Here are a few tools to find disk activity:
iotop
vmstat 1
iostat 1
lsof
strace -e trace=open <application>
strace -e trace=open -p <pid>
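For context, the %wa figure that top reports comes from the kernel's iowait counter. A minimal sketch that reads it straight from /proc/stat (note this is cumulative since boot, not the instantaneous per-interval figure top computes):

```shell
# Read cumulative CPU counters from the "cpu" line of /proc/stat.
# Fields after "cpu": user nice system idle iowait irq softirq steal ...
# (all in clock ticks since boot)
read -r _ user nice system idle iowait _ < /proc/stat
total=$((user + nice + system + idle + iowait))
echo "iowait: $((100 * iowait / total))% of CPU time since boot"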
In the output of ps auxf you'll also see which processes are in uninterruptible disk sleep (state D) because they are waiting for I/O.

You may also want to create a backup and check whether the hard drive is slowly failing. A hard drive generally starts to slow down before it dies, which could also explain the high load.
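To capture just those directly, a one-liner along these lines works (state, pid and comm are standard ps output specifiers):

```shell
# List tasks currently in uninterruptible disk sleep (state "D").
# A persistently long list here points at the I/O bottleneck.
dtasks=$(ps -eo state,pid,comm | awk '$1 ~ /^D/ { print $2, $3 }')
echo "${dtasks:-no tasks currently in D state}"
```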
The output from top suggests that the DBMS is experiencing most of the I/O waits, so database tuning issues are an obvious candidate to investigate.
I/O waiting on a database server, particularly during load spikes, is a clue that your DBMS is either disk-bound (i.e. you need a faster disk subsystem) or has a tuning issue. You should probably also look into profiling your database server, i.e. get a trace of what it's doing and which queries are taking the time.
Some starting points for diagnosing database tuning issues:
Find the queries that take up the most time, and look at the query plans. See if any have odd query plans such as a table scan where it shouldn't be. Maybe the database needs an index added.
Long resource wait times may mean that some key resource pool needs to be expanded.
Long I/O wait times may mean that you need a faster disk subsystem.
Are your log and data volumes on separate drives? Database logs consist of many small sequential writes (essentially they behave like a ring buffer). If a busy random-access workload shares the same disks as your logs, it will disproportionately affect logging throughput. For a database transaction to commit, the log entries must be written out to disk, so this places a bottleneck on the whole system.
Note that some MySQL storage engines don't use logs so this may not be an issue in your case.
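As an illustration of the first point, MySQL's EXPLAIN shows the query plan; the table and column names here are hypothetical:

```sql
-- A full table scan shows up as type=ALL with key=NULL:
EXPLAIN SELECT * FROM posts WHERE topic_id = 123;

-- If that column is filtered on frequently, adding an index
-- usually turns the scan into a cheap ref lookup:
ALTER TABLE posts ADD INDEX idx_topic_id (topic_id);
```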
Footnote: Queuing systems
Queuing systems (a statistical model for throughput) get hyperbolically slower as the system approaches saturation. As a high-level approximation, response times stretch by a factor of 1/(1-utilisation): a system that is 50% saturated responds about 2x slower than an idle one, a 90% saturated system about 10x slower, and a 99% saturated system about 100x slower.
Thus, on a system that is close to saturation, small changes in load can result in large changes to wait times, in this case manifesting as time spent waiting on I/O. If the I/O capacity of your disk subsystem is nearly saturated then small changes in load can result in significant changes in response times.
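The approximation above is the 1/(1-utilisation) stretch factor from queueing theory; a quick tabulation with awk:

```shell
# Response-time stretch factor 1/(1-u) for a few utilisation levels.
for u in 0.50 0.90 0.99; do
  awk -v u="$u" 'BEGIN { printf "%.0f%% utilised -> response times ~x%.0f\n", u * 100, 1 / (1 - u) }'
done
```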
Run iotop, or atop -dD, to see which processes are doing I/O. Use strace if you need a closer look.

What the users are doing could be as significant as how many of them are actually there. Operations like searching the forum are more demanding than simply loading and viewing individual threads or lists of threads.
Also: are you running on a dedicated server or a VPS? If your service is not on a dedicated server, the activity of applications running on the same host will have an effect, since the VMs sharing the host with yours compete for their share of the I/O resource.
As others have pointed out, tools like iotop will help you look deeper into which tasks are sitting waiting for I/O responses and which files they are accessing at the time.

In both screens it sure looks like mysqld is responsible. You need to see what that daemon is doing, i.e. what queries are running.
As Flip says, it looks like the problem is around what MySQL is doing.

Around half of your physical memory is currently being used for I/O caching, and forum software usually generates lots of quick queries returning small numbers of rows, with highly skewed hot areas of disk, so something is definitely screwy if the system is spending this much time in I/O wait. I only ever see CPU/disk usage like that when running queries that update millions of rows.

The high load average is a direct consequence of the I/O.

Crank up your MySQL logging to see if there's bad code in there or whether changing indexes would help. Analyzing your tables may also help (but probably not much).
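For MySQL, the slow query log is the usual way to do that; a sketch of the relevant my.cnf settings (paths and thresholds are illustrative, and older MySQL versions spell the option log_slow_queries):

```ini
# my.cnf: log statements that take longer than 2 seconds
[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time     = 2
log_queries_not_using_indexes = 1
```

The mysqldumpslow tool that ships with MySQL can then summarise the resulting log by query pattern.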
C.
After checking iotop and the other tools, also check the dmesg output; you might see the root cause of the issue there. In my case it was "CIFS VFS: Server file.core.windows.net has not responded in 120 seconds. Reconnecting..."
I got this very high wa CPU usage on a server. It turned out it didn't have enough available memory, and the kswapd0 process was causing the high wa CPU usage. The server didn't have any swap space, so I created some (1 GB) on the Ubuntu server.
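The standard Ubuntu sequence for creating and enabling a 1 GB swap file is roughly the following (a sketch, run as root; the /swapfile path is conventional, not mandatory):

```shell
# Create and enable a 1 GB swap file.
fallocate -l 1G /swapfile          # or: dd if=/dev/zero of=/swapfile bs=1M count=1024
chmod 600 /swapfile                # swap files must not be world-readable
mkswap /swapfile                   # format it as swap
swapon /swapfile                   # enable it immediately
echo '/swapfile none swap sw 0 0' >> /etc/fstab   # persist across reboots
```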
The wa CPU usage is now very low, at 0% most of the time.