I am reading a large file sequentially from the disk and trying to understand the iostat output while the reading is taking place.
- Size of the file: 10 GB
- Read buffer: 4 KB
- Read-ahead (/sys/block/sda/queue/read_ahead_kb): 128 KB
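For reference, the reader is essentially the following loop (a minimal sketch; the file path and the bare-bones error handling are placeholders, not my actual code):

```c
/* Minimal sketch of the reader described above: sequential read(2)
 * calls with a 4 KB buffer. The file path is a placeholder. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUF_SIZE 4096                    /* 4 KB read buffer */

int main(void)
{
    char buf[BUF_SIZE];
    int fd = open("/data/bigfile", O_RDONLY);   /* hypothetical 10 GB file */
    if (fd < 0) { perror("open"); return 1; }

    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        ;                                /* data processing elided */
    if (n < 0) perror("read");

    close(fd);
    return 0;
}
```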
The iostat output is as follows:

```
Device:  rrqm/s  wrqm/s    r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sda        0.00    0.00 833.00 14.00 103.88   0.05    251.30      6.07   5.69     2.33   205.71   1.18 100.00
```
Computing the average size of an I/O request (rMB/s divided by r/s) gives 103.88 / 833 ≈ 0.125 MB ≈ 128 KB, which is the read-ahead value; avgrq-sz agrees, since it is reported in 512-byte sectors and 251.30 × 512 B ≈ 126 KB. This seems to indicate that although the read system call specifies a 4 KB buffer, the actual disk I/O is issued according to the read-ahead value.
When I increased the read-ahead value to 256 KB, the iostat output was as follows:

```
Device:  rrqm/s  wrqm/s    r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sda        0.00   28.00 412.00 12.00 102.50   0.05    495.32     10.78  12.15     4.76   265.83   2.36 100.00
```
Again the average I/O request size matched the read-ahead value: 102.50 / 412 ≈ 0.249 MB, i.e. roughly 256 KB.
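As an aside, the read-ahead behavior can also be influenced per file descriptor instead of via sysfs; here is a minimal sketch using posix_fadvise(2) (my understanding is that on Linux, POSIX_FADV_SEQUENTIAL typically enlarges, often doubles, the readahead window for that file):

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>

/* Hint sequential access on an already-open descriptor. On Linux this
 * typically enlarges the readahead window for the file relative to the
 * device default in read_ahead_kb. Returns 0 on success, an errno
 * value otherwise (posix_fadvise does not set errno). */
int hint_sequential(int fd)
{
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    if (err)
        fprintf(stderr, "posix_fadvise: error %d\n", err);
    return err;
}
```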
This pattern held up to a read-ahead value of 512 KB, but did not hold when I moved up to 1024 KB: the average I/O request size stayed at 512 KB. Increasing max_sectors_kb (the maximum amount of data per block-layer I/O request) from its default of 512 to 1024 did not help here either.
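For the record, here is what an explicit prefetch would look like with the Linux-specific readahead(2) syscall (a sketch only; as far as I understand, the block layer still splits any request at max_sectors_kb, which is itself capped by the hardware limit max_hw_sectors_kb, so I would not expect this alone to produce larger requests):

```c
#define _GNU_SOURCE                  /* readahead(2) is Linux-specific */
#include <fcntl.h>
#include <stdio.h>

/* Ask the kernel to populate the page cache ahead of the reader.
 * Even a large count is, as far as I understand, still broken into
 * block-layer requests no bigger than max_sectors_kb. */
void prefetch(int fd, off_t offset, size_t count)
{
    if (readahead(fd, offset, count) < 0)
        perror("readahead");
}
```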
Why is this happening? Ideally I would like to minimize my read IOPS as much as possible and read a larger amount of data per I/O request (larger than 512 KB per request). Additionally, I am hitting 100% disk utilization in all cases; I would like to throttle myself to read at 50-60% disk utilization while keeping good sequential throughput. In short, what are the optimal application/kernel settings for sequential read I/O?
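To make the throttling part concrete, what I have in mind is pacing reads in the application, along these lines (a rough sketch; the 60 MB/s target and the 128 KB chunk size are arbitrary placeholders):

```c
#define _POSIX_C_SOURCE 199309L     /* clock_gettime, nanosleep */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define BUF_SIZE   (128 * 1024)            /* 128 KB chunks (placeholder) */
#define TARGET_BPS (60 * 1024 * 1024)      /* ~60 MB/s cap (placeholder) */

/* Pace a sequential read loop so the average rate stays at TARGET_BPS:
 * after each chunk, sleep until the total bytes read so far fit within
 * the time budget implied by the target rate. */
void throttled_read(int fd)
{
    static char buf[BUF_SIZE];
    struct timespec start, now;
    long long total = 0;
    ssize_t n;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        total += n;
        double allowed = (double)total / TARGET_BPS;   /* seconds budgeted */
        clock_gettime(CLOCK_MONOTONIC, &now);
        double elapsed = (now.tv_sec - start.tv_sec)
                       + (now.tv_nsec - start.tv_nsec) / 1e9;
        if (elapsed < allowed) {
            double d = allowed - elapsed;
            struct timespec ts = { (time_t)d, (long)((d - (time_t)d) * 1e9) };
            nanosleep(&ts, NULL);
        }
    }
    if (n < 0)
        perror("read");
}
```

Whether pacing in the application like this is the right way to land at 50-60% utilization, versus an I/O scheduler or cgroup setting, is part of what I am unsure about.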