I am having wildly different software raid10 performance and behavior on two otherwise identical machines.
I have two machines which are hardware identical, bought at the same time, with the same software versions, hardware versions, and firmware versions. Each has a SAS controller with 8 x 6 Gb/s channels going to a SAS enclosure which holds 12 SAS disks.
On machine 1, which is stable and seems to be working perfectly, each disk in the raid array behaves more or less identically: busy time is equal (about 33% across all disks under production load), and write and read performance are not degraded while the weekly software raid check runs. The full raid check completes in about a day, using all available spare bandwidth to finish as fast as possible; this amounts to about 200 MB/sec of reads while the check runs.
Machine 2 is a problem child. Its full raid check essentially never completes, even though it is also configured to use all available disk bandwidth. While it attempts to check, it plods along at 5 MB/sec, and write performance drops to about 30 MB/sec for the duration. Also, four disks sit at 35% busy, while the remaining ones average 22% busy.
After cancelling the raid check on machine 2, the write speed returns to about 160 MB/sec.
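The check throttle on both machines is governed by the standard Linux md speed-limit sysctls; a hedged sketch (generic md knobs, not taken from this post) of how to confirm both boxes really are configured the same:

```shell
# Print the md resync/check throttle settings (values are in KB/s).
# These are the stock Linux paths; the loop skips them if md is not loaded.
for f in /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max; do
    if [ -r "$f" ]; then
        printf '%s = %s KB/s\n' "$f" "$(cat "$f")"
    fi
done
# While a check is running, /proc/mdstat shows its progress and current speed.
```

If the limits match on both machines, the throttle itself can be ruled out as the cause of the 5 MB/sec check speed.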
If I use `dd` to test each individual mpath device, on machine 1 I get most read speeds around 145 MB/sec per drive, with the lowest at 119 MB/sec followed by 127 MB/sec; the rest are all in the 145 MB/sec range.
On machine 2, three disks read at about 107 MB/sec, and the rest are all above 135 MB/sec, with one disk peaking at 191 MB/sec (!).
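The per-device read test described above can be sketched roughly like this (the mpath names are placeholders for whatever multipath devices exist on the box):

```shell
# Approximate per-device sequential read test. iflag=direct bypasses the page
# cache so the disk, not RAM, is what gets measured. Device names are
# assumptions; the loop skips any path that is not a block device.
for dev in /dev/mapper/mpath?; do
    if [ -b "$dev" ]; then
        echo "$dev:"
        dd if="$dev" of=/dev/null bs=1M count=1024 iflag=direct 2>&1 | tail -n 1
    fi
done
```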
I admit to being well out of my comfort zone here, but I cannot find any evidence to draw a conclusion from. I have also checked the SMART stats on each disk in both machines, and while there is a fair number of "read corrected" errors on all disks, there seems to be no correlation between those values and read performance, nor with the difference in busy %.
Nothing I can find explains the poor performance when performing a RAID check of the array on one box vs on the other. Suggestions on where to go next to debug this would be appreciated.
I found the problem. The write cache was disabled on 4 of the 12 disks in the software array.
Here's what I did to narrow this down:
I broke the array apart and used `dd` with `oflag=direct` to test the write speed of each disk. The disks with the higher busy % turned out to be the ones that could only write about 75 MB/sec, while all the others sustained 180 MB/sec for both 1 GB and 10 GB test writes.
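A sketch of that destructive write test, wrapped in a helper (the name `write_test` is hypothetical) so the 1 GB and 10 GB passes are easy to repeat. This overwrites the target device, so only run it on an array that has been broken apart:

```shell
# Destructive sequential write test. oflag=direct bypasses the page cache so
# the disk's own write path (and its on-disk cache setting) is what gets
# measured; the last dd line reports the achieved throughput.
write_test() {
    # $1 = block device (e.g. /dev/mapper/mpatha), $2 = size in MiB
    dd if=/dev/zero of="$1" bs=1M count="$2" oflag=direct 2>&1 | tail -n 1
}
# write_test /dev/mapper/mpatha 1024     # 1 GB pass (destroys data!)
# write_test /dev/mapper/mpatha 10240    # 10 GB sustained pass
```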
However, since the four slow disks were very consistent with each other, I started digging in and installed `sdparm` so I could fiddle with the SCSI mode pages. Once I saw that the default for WCE (write cache enable) is on, but these four disks had it off, I turned it on. Write speed went up to the 180 MB/sec mark, and the array is now rebuilding at about 1 GB/sec (roughly the maximum this set of disks can do with this controller).
For future users: the check command is

    sdparm --get=WCE /dev/mapper/mpatha

and to set it:

    sdparm --set=WCE --save /dev/mapper/mpatha

Additionally, power saving was enabled on those disks -- this prevented OS-level caching from maintaining speed, although `oflag=direct` still did.
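To spot this kind of mismatch faster next time, a small survey loop (mpath names again placeholders) can report WCE across every multipath device at once, so a disk with its cache disabled stands out immediately:

```shell
# Print the WCE (write cache enable) setting for each multipath device.
# Skips any path that is not a block device, so it is safe to run as-is.
for dev in /dev/mapper/mpath?; do
    if [ -b "$dev" ]; then
        printf '%s: %s\n' "$dev" "$(sdparm --get=WCE "$dev" | tail -n 1)"
    fi
done
```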