I'm having a problem with a Linux system, and I have found sysstat and sar reporting huge peaks of disk I/O, average service time and average wait time. How could I determine which process is causing these peaks the next time it happens? Is it possible to do this with sar? Can I find this information in the already recorded sar files?
Output of sar -d; the system stall happened around 12:58-13:01:
12:40:01 DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
12:40:01 dev8-0 11.57 0.11 710.08 61.36 0.01 0.97 0.37 0.43
12:45:01 dev8-0 13.36 0.00 972.93 72.82 0.01 1.00 0.32 0.43
12:50:01 dev8-0 13.55 0.03 616.56 45.49 0.01 0.70 0.35 0.47
12:55:01 dev8-0 13.99 0.08 917.00 65.55 0.01 0.86 0.37 0.52
13:01:02 dev8-0 6.28 0.00 400.53 63.81 0.89 141.87 141.12 88.59
13:05:01 dev8-0 22.75 0.03 932.13 40.97 0.01 0.65 0.27 0.62
13:10:01 dev8-0 13.11 0.00 634.55 48.42 0.01 0.71 0.38 0.50
This is also a follow-up question to another thread I started yesterday.
If you are lucky enough to catch the next peak utilization period, you can study per-process I/O stats interactively, using iotop.
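For example, something like this (iotop needs root; -o limits the list to processes that are actually doing I/O at that moment):

$ sudo iotop -o
# live per-process read/write rates; the arrow keys change the sort column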
You can use pidstat to print cumulative I/O statistics per process every 20 seconds; the command, and the columns each row will have, are sketched below.
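A minimal sketch (the exact column set varies a little between sysstat versions):

$ pidstat -dl 20
# UID, PID   - owner and process ID
# kB_rd/s    - kilobytes the task has caused to be read from disk per second
# kB_wr/s    - kilobytes the task has caused, or shall cause, to be written per second
# kB_ccwr/s  - kilobytes whose write-out to disk has been cancelled by the task
# Command    - the full command line (because of -l)

Watch the kB_rd/s and kB_wr/s columns around the time of the next stall to see which command is responsible.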
Nothing beats ongoing monitoring; you simply cannot get time-sensitive data back after the event... There are a couple of things you might be able to check to implicate or eliminate, however:
/proc is your friend. Fields 10 and 11 of /proc/diskstats are the accumulated sectors written and the accumulated time (ms) spent writing; sorting on them, as sketched below, will show your hot file-system partitions.
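A sketch of that sort (field numbering as in the 2.6+ /proc/diskstats format):

$ sort -n -k 10 /proc/diskstats   # rank devices/partitions by sectors written
$ sort -n -k 11 /proc/diskstats   # rank them by milliseconds spent writing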
Fields 1, 2 and 42 of each /proc/[PID]/stat are the PID, command and cumulative block-I/O-wait ticks; see the sketch below. This will show your hot processes, though only if they are still running. (You probably want to ignore your filesystem journalling threads.)
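One way to pull those fields out (a sketch; note that a command name containing spaces will shift the field numbers on that line):

$ awk '{ print $1, $2, $42 }' /proc/[0-9]*/stat | sort -n -k 3
# PID, command, aggregated block-I/O delay in clock ticks; the biggest waiters sort to the bottom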
The usefulness of the above depends on uptime, the nature of your long running processes, and how your file systems are used.
Caveats: this does not apply to pre-2.6 kernels; check your documentation if unsure.
(Now go and do your future self a favour: install Munin/Nagios/Cacti/whatever ;-)
Use atop (http://www.atoptool.nl/). Write the data to a compressed file that atop can read later in an interactive style. Take a reading (delta) every 10 seconds and do it 1080 times (3 hours, so if you forget about it the output file won't run you out of disk). After the bad thing happens again, read the raw file back with atop; a sketch of both commands follows.
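A sketch, with /tmp/atop.raw as a made-up path for the raw file:

$ atop -w /tmp/atop.raw 10 1080 &
# record a snapshot every 10 seconds, 1080 samples, in the background
$ atop -r /tmp/atop.raw
# replay the recorded samples interactively after the stall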
(even if it is still running in the background, it just appends every 10 seconds)
Since you said I/O, I would hit 3 keys: t, d, D.
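(If I remember atop's key bindings correctly: d switches the process columns to disk-related ones, D sorts processes by disk activity, and t steps forward to the next sample when reading a raw file.)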
Use btrace. It's easy to use, for example btrace /dev/sda. If the command is not available, it is probably in the blktrace package.

EDIT: Since debugfs is not enabled in the kernel, you might try
date >>/tmp/wtf && ps -eo "cmd,pid,min_flt,maj_flt" >>/tmp/wtf
or similar. Logging page faults is of course not at all the same as using btrace, but if you are lucky, it MAY give you some hint about the most disk-hungry processes. I just tried that on one of my most I/O-intensive servers and the list included the processes I know are consuming lots of I/O.
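A sketch of one way to keep that logging running unattended (the 60-second interval is just a guess at something reasonable):

$ while true; do date >>/tmp/wtf; ps -eo "cmd,pid,min_flt,maj_flt" >>/tmp/wtf; sleep 60; done &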
Disk utilization by each process:

$ glances
# (with htop, the best tool to get an idea of what is going on; hit the right arrow key to sort processes by disk utilization)

$ sudo iotop -ao
# (-a accumulated; -o show only processes with activity)