As much as I have read about iowait, it is still a mystery to me.
I know it's the time spent by the CPU waiting for IO operations to complete, but what kind of IO operations precisely? What I am also not sure about is why it is so important. Can't the CPU just do something else while the IO operation completes, and then get back to processing data?
Also, what are the right tools to diagnose exactly which process(es) waited for IO?
And what are the ways to minimize IO wait time?
Yes, the operating system will schedule other processes to run while one is blocked on IO. However, inside that process, unless it's using asynchronous IO, it will not progress until whatever IO operation it is waiting on is complete.
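To make the point about blocking versus overlapping IO concrete, here is a minimal sketch. It assumes a worker thread is an acceptable stand-in for "asynchronous IO" (it is not true kernel-level async IO), and the names `read_file`, `sum_of_squares` and `overlapped` are made up for illustration.

```python
import concurrent.futures
import os
import tempfile


def read_file(path):
    """Blocking read: the calling thread makes no progress until it returns."""
    with open(path, "rb") as f:
        return f.read()


def sum_of_squares(n):
    """Stand-in for CPU work that does not depend on the file contents."""
    return sum(i * i for i in range(n))


def overlapped(path, n):
    """Start the read in a worker thread, compute meanwhile, then collect both."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_file, path)  # IO starts in the background
        computed = sum_of_squares(n)            # main thread keeps working
        data = pending.result()                 # block only when the data is needed
    return data, computed


if __name__ == "__main__":
    # Hypothetical demo file; any readable path would do.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 1_000_000)
    try:
        data, computed = overlapped(tmp.name, 100_000)
        print(len(data), computed)
    finally:
        os.unlink(tmp.name)
```

If `read_file` were called directly instead, the computation could not start until the read had returned, which is the situation described above.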
Some tools you might find useful:
iostat, to monitor the service times of your disks
iotop (if your kernel supports it), to monitor the breakdown of IO requests per process
strace, to look at the actual operations issued by a process
Old question, recently bumped, but felt the existing answers were insufficient.
IOWait definition & properties
IOWait (usually labeled %wa in top) is a sub-category of idle (%idle is usually expressed as all idle except defined subcategories), meaning the CPU is not doing anything. Therefore, as long as there is another process that the CPU could be processing, it will do so. Additionally, idle, user, system, iowait, etc. are measurements with respect to the CPU. In other words, you can think of iowait as the idle time caused by waiting for IO.
Precisely, iowait is the share of processor ticks during which the CPU was idle while at least one IO request was outstanding. Time spent handling hardware and software interrupts is accounted separately, usually labeled %hi and %si.
Importance & Potential misconception
IOWait is important because it is often a key metric for knowing whether you're bottlenecked on IO. But absence of iowait does not necessarily mean your application is not bottlenecked on IO. Consider two applications running on a system. If program 1 is heavily IO-bottlenecked and program 2 is a heavy CPU user, the %user + %system of the CPU may still be something like ~100% and, correspondingly, iowait would show 0. But that's just because program 2 is CPU-intensive; the numbers appear to say nothing about program 1 because all of this is from the CPU's point of view.
Tools to Detect IOWait
See the posts by Dave Cheney and Xerxes.
But also, a simple top will show it in %wa.
Reducing IOWait
Also, as we are now almost entering 2013, in addition to what others said, simply awesome IO storage devices, namely SSDs, are now an affordable option. SSDs are awesome!!!
I found the explanation and examples from this link very useful: What exactly is "iowait"?. BTW, for the sake of completeness, the I/O here refers to disk I/O, but could also include I/O on a network mounted disk (such as nfs), as explained in this other post.
I will quote a few important sections (in case the link goes dead), some of those would be repetitions of what others have said already, but to me at least these were clearer:
I was wondering what happens when system has other processes ready to run while one process is waiting for I/O. The below explains it:
And here is an example:
The full text is worth reading. Here is a mirror of this page, in case it goes down.
iowait
iowait is time that the processor(s) are waiting (i.e. are in an idle state and doing nothing), during which there were in fact outstanding disk I/O requests.
This usually means that the block devices (i.e. physical disks, not memory) are too slow, or simply saturated.
You should hence note that if you see a high load average on your system, and on inspection notice that most of this is actually due to I/O wait, it does not necessarily mean that your system is in trouble. This occurs when your machine simply has nothing to do other than run I/O-bound processes (i.e. processes that do more I/O than anything else, such as non-I/O-bound system calls). That should also be apparent from the fact that anything you do on the system is still very responsive.
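As a rough illustration of where the iowait figure comes from on Linux, the sketch below samples the aggregate `cpu` line of `/proc/stat` twice and reports the iowait share of the elapsed ticks. It assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are hypothetical, and tools such as top or sar may round or aggregate differently.

```python
import time


def parse_cpu_line(line):
    """Return (iowait_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Assumed order: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already included in user time, so only the first eight are summed.
    values = [int(v) for v in fields[1:9]]
    return values[4], sum(values)


def iowait_percent(before, after):
    """Percentage of ticks spent in iowait between two samples of the 'cpu' line."""
    iow0, total0 = parse_cpu_line(before)
    iow1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (iow1 - iow0) / elapsed


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(1.0)
    second = read_cpu_line()
    print(f"iowait over the last second: {iowait_percent(first, second):.1f}%")
```

Because the counters are cumulative since boot, a single reading says little; it is the difference between two samples that corresponds to the percentage shown by the monitoring tools.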
tools
sar (from the sysstat package, available on most *nix machines)
iostat
sarface (a front-end to sar)
For Solaris, I use DTrace to look at what the processes are doing if I need to see what I/O operations are running. For Linux, there's a similar program called SystemTap, which provides a similar level of exposure to the kernel and process calls.
One example I used when learning DTrace was to compare a cp command to a dd command. You can see that dd does a lot more reads for the write, while cp does not, mostly because of the buffer size dd uses by default (if I'm remembering correctly).
What kind of IO operations are involved will depend on your applications and setup.
It is important because in some cases the CPU can't get the data or instructions that it needs to continue. In some cases it can continue, but what it can do will depend on which apps are running. If you have a single-threaded application which does lots of disk access, then you will need to wait.
To minimise the IO time, buy more and faster memory, get faster disks, and defrag the disks you have.
If it is an in-house application which is the bottleneck, see if it can be optimised to read in bigger blocks or to do IO asynchronously.
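The effect of reading in bigger blocks can be sketched by counting how many read calls it takes to consume the same file with different block sizes. This is only an illustration: `count_reads` is a made-up helper, and opening the file unbuffered is an assumption chosen so that each `read()` corresponds roughly to one read system call.

```python
import os
import tempfile


def count_reads(path, block_size):
    """Read the whole file in block_size chunks; return (bytes_read, read_calls)."""
    total = 0
    calls = 0
    # buffering=0 yields a raw file object, bypassing Python's own read buffer.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            calls += 1  # includes the final call that signals end of file
            if not chunk:
                break
            total += len(chunk)
    return total, calls


if __name__ == "__main__":
    # Hypothetical 1 MiB demo file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (1024 * 1024))
    try:
        for size in (512, 4096, 65536):
            nbytes, calls = count_reads(tmp.name, size)
            print(f"block size {size:>6}: {calls} read calls for {nbytes} bytes")
    finally:
        os.unlink(tmp.name)
```

Fewer, larger requests generally mean less per-call overhead, although the page cache and the device's own characteristics decide how much this matters in practice.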
Running ps aux prints the STAT of each process.
If STAT is D or Ds, the process is in uninterruptible sleep (usually IO).
When a process enters uninterruptible sleep to wait for IO, the nr_iowait counter of the run queue is incremented, and while nr_iowait > 0, the idle time of the CPU is counted as iowait.
vmstat also shows how many processes are blocked:
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep.
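The same D state that ps reports can be read directly from `/proc/<pid>/stat` on Linux. The following is a minimal sketch under that assumption; `parse_stat` and `uninterruptible_processes` are hypothetical names, and the scan covers processes only, not individual threads.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            stat_path = os.path.join(proc_root, entry, "stat")
            with open(stat_path, errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited or is not readable
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

A process that shows up here repeatedly across several runs is a reasonable first candidate when looking for whatever is waiting on IO.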
http://bencane.com/2012/08/06/troubleshooting-high-io-wait-in-linux/