While salvaging a 2-disk failure in my 3-disk RAID 5 setup, I noticed that reconstruction was faster with NCQ disabled (~90 MB/s) than with NCQ enabled (~50 MB/s). Running bonnie++ to benchmark the two configurations also revealed significantly better write performance for Sequential Output Block:
- no NCQ: 85 MB/s, 2021 ms latency
- NCQ: 62 MB/s, 57118 ms latency
Isn't 57 seconds a tad excessive?
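(For reference, the rebuild speed is what md reports in /proc/mdstat. The kernel's per-device rebuild speed limits are also tunable, in case anyone wants to rule those out; the paths below are the standard kernel interfaces:)

    # Watch rebuild progress; md prints a line like "speed=90000K/sec"
    watch -n 5 cat /proc/mdstat

    # Rebuild speed limits in KB/s per device (kernel defaults: 1000 min, 200000 max)
    cat /proc/sys/dev/raid/speed_limit_min
    cat /proc/sys/dev/raid/speed_limit_max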
I disabled NCQ with the kernel parameter libata.force=noncq and ran the tests twice, alternating between the two configurations with no other changes. I also ran bonnie++ on partitions of the three component disks and found no significant difference between running with and without NCQ.
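For completeness, NCQ can also be toggled per device at runtime through sysfs, which makes A/B testing possible without a reboot (sda below is just an example device):

    # Check the current queue depth (31 is typical with NCQ enabled)
    cat /sys/block/sda/device/queue_depth

    # A depth of 1 effectively disables NCQ for this device
    echo 1 > /sys/block/sda/device/queue_depth

    # Restore NCQ
    echo 31 > /sys/block/sda/device/queue_depth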
Here's the full bonnie++ output:
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
raid5-noncq 24G 435 92 85154 18 53585 9 3409 93 154272 10 297.6 13
Latency 525ms 2021ms 235ms 27652us 158ms 747ms
raid5 24G 372 81 61591 7 60999 9 3130 86 160280 10 296.4 13
Latency 18784us 57118ms 11323ms 59583us 133ms 150ms
sdd-ext4-noncq 24G 513 97 73428 8 33118 4 3324 91 90266 5 170.6 1
Latency 17985us 503ms 1805ms 30066us 15626us 1341ms
sdd-ext4 24G 499 97 71223 8 33015 4 3326 95 95342 5 276.0 3
Latency 17689us 1124ms 1345ms 11202us 18187us 1213ms
sde-ext4-noncq 24G 517 97 48200 5 22385 3 3555 94 62578 3 174.3 1
Latency 22423us 1609ms 2296ms 13131us 22446us 1960ms
sde-ext4 24G 491 97 47942 5 22317 3 3281 95 62669 3 257.1 3
Latency 20081us 2860ms 2434ms 12207us 27984us 990ms
sdb-ext4-noncq 24G 976 99 81552 9 47557 6 3514 95 146167 8 208.6 7
Latency 13004us 227ms 758ms 40575us 59198us 219ms
sdb-ext4 24G 1014 99 79603 8 48778 6 3598 97 146225 8 310.1 10
Latency 10686us 659ms 400ms 20054us 67295us 226ms
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
raid5-noncq 16 8997 38 +++++ +++ 18071 22 18097 75 +++++ +++ +++++ +++
Latency 25974us 695us 738us 123us 21us 43us
raid5 16 4227 21 +++++ +++ 15053 25 11634 58 +++++ +++ 20341 33
Latency 26221us 735us 735us 172us 15us 55us
sdd-ext4-noncq 16 10453 53 +++++ +++ +++++ +++ 16324 82 +++++ +++ +++++ +++
Latency 245us 1126us 895us 204us 30us 46us
sdd-ext4 16 15339 51 +++++ +++ +++++ +++ 23763 77 +++++ +++ +++++ +++
Latency 192us 957us 641us 115us 94us 101us
sde-ext4-noncq 16 12825 41 +++++ +++ +++++ +++ 21636 68 +++++ +++ +++++ +++
Latency 452us 662us 642us 115us 20us 41us
sde-ext4 16 13185 45 +++++ +++ +++++ +++ 23033 77 +++++ +++ +++++ +++
Latency 136us 634us 655us 118us 28us 41us
sdb-ext4-noncq 16 4657 78 +++++ +++ +++++ +++ 6912 97 +++++ +++ 23160 95
Latency 89592us 180us 115us 348us 46us 131us
sdb-ext4 16 5061 75 +++++ +++ +++++ +++ 7011 97 +++++ +++ 23400 95
Latency 12010us 110us 119us 633us 30us 143us
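In case it helps anyone reproduce this, a bonnie++ invocation along these lines produces output in the format above (the mount point and label are illustrative):

    # -d: test directory, -s: file size (commonly set to at least 2x RAM),
    # -m: label shown in the first column, -u: user to run as when invoked as root
    bonnie++ -d /mnt/raid5 -s 24G -m raid5-noncq -u root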
Thanks for updating the question! It looks like the RAID 5 result is simply exposing that NCQ reorders requests without any knowledge of how the data must be laid out across the member disks, even though the RAID is in software. In my experience this is a really common issue with I/O schedulers and hardware RAID as well (see http://blog.nexcess.net/2010/11/07/changing-your-linux-io-scheduler/).
See also: https://raid.wiki.kernel.org/index.php/Performance
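If you want another knob to experiment with on the md side, the RAID 5/6 stripe cache directly affects how well writes are gathered into full stripes before they hit the disks. A minimal sketch, assuming your array is md0 (the value is in pages per member device; the kernel default is 256):

    # Check the current stripe cache size
    cat /sys/block/md0/md/stripe_cache_size

    # Enlarge it; larger values often help RAID 5 sequential write throughput
    echo 8192 > /sys/block/md0/md/stripe_cache_size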
I think the other tests confirm that NCQ is working when talking directly to a disk: they either show a performance boost or are, as near as makes no difference, identical to the no-NCQ runs.
Regarding the 57-second value: it is an artifact of how much I/O buffering you have and which scheduler you are using. If you are concerned about the interactive responsiveness of the I/O system, you should probably investigate a different scheduler (see: http://blog.nexcess.net/2010/11/07/changing-your-linux-io-scheduler/).
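Schedulers can be switched per device at runtime, so it's cheap to experiment (sda is just an example; the available names depend on your kernel build):

    # Show available schedulers; the active one is in brackets
    cat /sys/block/sda/queue/scheduler

    # Try deadline, which tends to bound worst-case latency better than cfq
    echo deadline > /sys/block/sda/queue/scheduler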