Everyone knows that databases tend to do lots of small random I/O, while big-data systems like Kafka tend to do large sequential I/O. But if I'm approaching this as a sysadmin without making assumptions, how do I determine what a running application is doing on my system, or in general what kind of I/O my system is seeing? Without knowing how the application is written, how do I determine whether it is doing mostly sequential or random I/O, thereby making it easier for me to choose the right kind of disks, etc., to deploy?
I can use iostat to get the average request size (avgrq-sz) and a count of the IOPS (r/s + w/s). How do I determine whether these requests are mostly sequential or random?
Yes, there are actually a couple of tools you can use to monitor this, such as iotop and the iostat command you mentioned.
Depending on your distribution, you can use either of the following to install iotop:
$ sudo apt-get install iotop or
$ yum install iotop
Run: root@tomcat-1-vm:/# iotop (to see the list of running processes and their disk I/O)
You can also show only the processes that are actually doing I/O by passing the -o option
Example: root@tomcat-1-vm:/# iotop -o
To learn more about disk I/O output and information, you can check the proc filesystem page
Again depending on the distribution, you may install iostat (part of the sysstat package) as follows:
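As a rough illustration of what the proc filesystem exposes per process, here is a minimal sketch that reads input in the format of /proc/<pid>/io and prints the average number of bytes per read and write system call. The function name syscall_sizes is made up for this example. Note that rchar and wchar count everything passed to read/write-style calls (including cached and terminal I/O), so this only hints at how the application issues its requests, not at what reaches the disk or whether it is sequential.

```shell
# Hypothetical helper: average bytes per read/write syscall.
# Expects /proc/<pid>/io-format lines on stdin (rchar, wchar, syscr, syscw).
syscall_sizes() {
    awk '
        $1 == "rchar:" { rchar = $2 }   # bytes passed to read-like syscalls
        $1 == "wchar:" { wchar = $2 }   # bytes passed to write-like syscalls
        $1 == "syscr:" { syscr = $2 }   # number of read-like syscalls
        $1 == "syscw:" { syscw = $2 }   # number of write-like syscalls
        END {
            # Guard against division by zero when no calls were made.
            printf "avg_read_bytes=%d avg_write_bytes=%d\n",
                (syscr ? rchar / syscr : 0), (syscw ? wchar / syscw : 0)
        }'
}
```

Usage, assuming 1234 is the PID of interest (a placeholder) and you are root or own the process: syscall_sizes < /proc/1234/io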
$ sudo apt-get install sysstat or
$ yum install sysstat
Run: root@tomcat-1-vm:/# iostat -dx 5
You may also refer to the following iostat examples post for options you can include to get more detailed reports.
Another command you can use is dstat.
Install: root@tomcat-1-vm:/# sudo apt-get install dstat
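The extended iostat figures come from kernel counters that are also visible in /proc/diskstats, so a small sketch can make the arithmetic explicit. The function below (io_profile is a made-up name) reads diskstats-format lines on stdin and prints the average request size in KB and the share of requests that were merged. Treat the interpretation as an assumption rather than a rule: the kernel merges requests that are adjacent on disk, so a high merge share together with a large average size suggests mostly sequential I/O, while small requests with few merges suggest random I/O. The counters are cumulative since boot, so sample twice and compare if you want a recent picture.

```shell
# Hypothetical helper: summarize one device from diskstats-format input on stdin.
# Fields used: $3 name, $4 reads completed, $5 reads merged, $6 sectors read,
#              $8 writes completed, $9 writes merged, $10 sectors written.
# Sectors in this file are always 512 bytes.
io_profile() {
    awk -v dev="$1" '
        $3 == dev {
            ios     = $4 + $8     # completed requests
            merged  = $5 + $9     # requests merged into an adjacent one
            sectors = $6 + $10    # 512-byte sectors transferred
            if (ios == 0) { print dev ": no completed I/O"; exit }
            printf "%s: avg_kb=%.1f merged_pct=%.1f\n", dev,
                sectors * 512 / 1024 / ios, 100 * merged / (ios + merged)
        }'
}
```

Usage, assuming the device is called sda: io_profile sda < /proc/diskstats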