Running top to check the I/O wait, I get these figures:
Cpu(s): 6.7%us, 1.4%sy, 1.2%ni, 85.5%id, 5.0%wa, 0.0%hi, 0.3%si, 0.0%st
Looking at these figures (%us ~= %wa), do they mean that:
- there are almost as many CPU processes waiting as working? (=> bad)
- the working processes are waiting 5.0% of their execution time? (=> OK in this case)
- something else?
You need to be careful when evaluating these figures.
IOWait in this context is the measure of time, over a given period, that a CPU (or all CPUs) spent idle because all runnable tasks were waiting for an I/O operation to be fulfilled.
In your example, if you have 20 CPUs and one task really hammering the disk, that task is (in effect) spending 100% of its time in IOWait, so the CPU it runs on spends almost 100% of its time in IOWait. However, if the 19 other CPUs are effectively idle and not using this disk, they report 0% IOWait. This averages out to 5% IOWait overall, when in fact a peek at your disk utilization could report 100%. If the application waiting on disk is critical to you, this 5% is misleading: the bottlenecked task is likely seeing far worse performance than "going 5% slow" suggests.
Also remember that, for the most part, CPUs run tasks and tasks are what request I/O. If two separate tasks are busy querying the same disk on two separate CPUs, both CPUs will sit at 100% IOWait (and, in the 20-CPU example, the overall average will be 10% IOWait).
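The averaging is easier to see through if you look at per-CPU figures instead of the machine-wide aggregate. A minimal sketch, assuming the sysstat package is installed (pressing 1 inside top gives a similar per-CPU view):

    # per-CPU statistics, one-second interval, five samples
    mpstat -P ALL 1 5
    # a single core sitting near 100 in the %iowait column stands out here,
    # instead of being diluted to 5% in the machine-wide average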
Basically, if you have a lot of tasks that request I/O, especially from the same disk, and that disk is 100% utilized (see
iostat -mtx
, sketched below), then this is bad.
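A minimal sketch of reading that output (nothing beyond the sysstat package assumed):

    # extended device statistics in MB, with timestamps, every second for five samples
    iostat -mtx 1 5
    # the %util column is the disk utilization mentioned above: a device pinned
    # near 100% is saturated; await (or r_await/w_await on newer versions)
    # shows how long requests are queuing on it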
No. The working processes are almost certainly waiting full-time for I/O. It's just that the percentage is fudged by the averaging ("the other CPUs are not busy") or by the fact that the CPU has many tasks to run, many of which don't need to do I/O.
As a general rule, on a multi-CPU system, an IOWait percentage equal to 100 divided by the number of CPUs you have (e.g. 100 / 20 CPUs = 5%, the share a single fully-waiting CPU contributes) is probably something to investigate.
See above. But note that applications doing very heavy writing get throttled (they stop using writeback and start writing directly to disk). This causes those tasks to show high IOWait while other tasks on the same CPU writing to the same disk do not. So exceptions do exist.
Also note that if you have one CPU dedicated to running two tasks, one a heavy I/O reader/writer and the other a heavy CPU user, that CPU will report 50% IOWait. If you have ten tasks like this, it would be 10% IOWait (and a horrific load), so the reported number can be much lower than the actual problem.
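If you want to reproduce this kind of behaviour yourself, a rough sketch (the scratch path /tmp/iowait-test is just an example; note that /tmp is memory-backed on some distributions, so pick a path on a real disk):

    # generate ~1 GB of direct (uncached) writes to force real disk I/O
    dd if=/dev/zero of=/tmp/iowait-test bs=1M count=1024 oflag=direct
    # meanwhile, in another terminal, watch per-CPU %wa in top (press 1)
    # or the device columns in iostat -mtx 1
    rm /tmp/iowait-test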
I think you really need to take a look at
iostat -mtx
to get some disk utilization metrics, and
pidstat -d
to get some per-process metrics, then consider whether the applications hitting those disks in that way are likely to cause a problem, or whether other potential applications hitting those disks are likely to cause one.

CPU metrics really act as indicators of underlying issues; they are general, so understanding where they may be too general is a good thing.
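For the per-process side, a minimal sketch (pidstat also comes from the sysstat package):

    # per-process disk I/O, one-second interval, five samples
    pidstat -d 1 5
    # kB_rd/s and kB_wr/s show which processes generate the traffic;
    # on recent sysstat versions the iodelay column shows time spent
    # blocked waiting for I/O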
It means that 5% of CPU time is spent waiting for disk I/O to finish, and 6.7% of CPU time is spent actually doing the processing required by userland processes.
Check vmstat output, e.g.
vmstat 1 30
As long as the process count in column b does not pile up, you're good. Column b indicates the number of processes in uninterruptible sleep (D state), which are blocked until a disk I/O operation finishes.
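A sketch of where column b sits in that output (header as printed by procps vmstat; your version may differ slightly):

    vmstat 1 30
    # procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
    #  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
    # b is the second column (tasks in uninterruptible D-state sleep);
    # wa, near the right, is the same iowait percentage top reports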
So, to answer your questions:
No. The time is roughly the same, but this is not necessarily a problem. As long as processes don't start piling up in D state, you are good. Improvements may include adding more RAM, to give the page cache (disk cache) more room and satisfy more reads from memory instead of disk, or tuning the disk scheduler (sketched below).
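For the scheduler-tuning part, a hedged sketch (sda is just an example device; which schedulers are available depends on your kernel):

    # show the available I/O schedulers; the active one is in brackets
    cat /sys/block/sda/queue/scheduler
    # switch to another one, e.g. deadline, as root
    echo deadline > /sys/block/sda/queue/scheduler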
This is the portion of CPU time spent handling userland processes; there is nothing to be worried about here, especially with so much idle (85.5%id) CPU time.

The wait state is when a process that is otherwise runnable is stopped waiting for I/O. It's a sign of contention, usually for disk resources.
It does mean that some of your processes aren't running as fast as they could, but that's pretty normal.
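If you want to see which processes are sitting in that wait state at a given moment, a small sketch using plain ps:

    # list tasks currently in uninterruptible (D-state) sleep
    ps -eo state=,pid=,comm= | awk '$1 ~ /^D/'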