There are multiple questions here - but it starts with this: we have a Dell PowerEdge R710 with a PERC 6/i RAID controller (or controllers) in a RAID10 configuration.
The system is running Ubuntu Server 10.04 LTS with MySQL doing a read-intensive workload.
I increased readahead using blockdev --setra ### /dev/sda (the reads are, at least in theory, sequential). This does not seem to have had a significant impact. I have not changed the disk elevator (I've seen both noop and deadline recommended).
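For what it's worth, here is a quick way to double-check what the block layer is actually using before and after tuning (the device name /dev/sda is assumed; substitute your PERC virtual disk):

    # current readahead, reported in 512-byte sectors
    blockdev --getra /dev/sda
    # the active elevator is shown in [brackets]
    cat /sys/block/sda/queue/scheduler
    # switch to deadline at runtime (not persistent across reboots)
    echo deadline > /sys/block/sda/queue/scheduler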
The load on the system skyrockets and it appears to be related to disk I/O waits: the system can spend up to 50% of its time waiting for disk I/O, while CPU usage sits at about 7-10%. A comparable system with RAID5 and a write-intensive MySQL installation smokes this system entirely.
The RAID10 system appears to have two PERC 6/i controllers given what Dell OpenManage reports; however, only Controller 0 has an enclosure and only Controller 0 has the RAID on it. The RAID is made up of four disks (slots 0-3 I believe) with two free slots.
The system is also running in a PowerSaving profile that lets the operating system manage the CPU speeds.
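If you want to see how far the OS is actually throttling the cores, the cpufreq sysfs entries show the governor and current clock speed (this assumes the cpufreq driver is loaded under the OS-control power profile; the paths may simply be absent if BIOS is managing the clocks):

    # governor per core; "ondemand" or "powersave" means the OS is scaling clocks
    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    # current vs. available frequencies for core 0
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies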
The system is also afflicted with the fsync() bug found in some Linux kernels.
Lastly, the PERC 6/i is reporting that the firmware is out of date: it has 6.2.0-0013 and wants 6.3.0-0001.
Now the questions:
- Is it possible to move one part of the RAID10 array to a second controller?
- Are there actually two controllers that can be used in the same backplane or am I missing something?
- Would a firmware update fix the disk speed issue?
- Would the RAID level have anything to do with the large disk IO wait?
- How much of an effect would the PowerSaving mode have? (Some reports seem to say it slows the kernel down.)
I strongly suspect that there is some kind of configuration that will zap the disks into frighteningly high speeds, but I can't seem to pin it down.
Update: The four disks used here are the Hitachi HDS721010CLA332 model, which is listed as having a SATA "Bus Protocol" yet also a "SAS Address". Are these disks those SAS-impersonating drives I've heard about that are supposed to be quite slow? In any case, these are apparently 7200 RPM drives.
The comparison system has SAS drives in it: the Seagate ST31000640SS, also 7200 RPM. The comparison system also has both RAID controllers in use, each with a "backplane" entry associated with it.
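To confirm what the controller thinks these drives are (bus protocol, media type, negotiated speed), something along these lines should work, assuming Dell OpenManage and smartmontools are installed; the megaraid device index is a guess and may need adjusting per slot:

    # physical disk inventory behind controller 0: bus protocol, media, link speed
    omreport storage pdisk controller=0
    # SMART identity through the PERC; the index after "megaraid," may differ
    smartctl -d megaraid,0 -i /dev/sda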
The PERC 6/i is a dual-port controller; each port has four SAS lanes. On the 8x2.5in R710 chassis, that's a one-to-one mapping of front-panel disks to SAS lanes; on the 3.5in chassis, lanes 6 and 7 are unused. With a 4-disk array, you could move two disks to slots 4 and 5 to split the workload between channels, although there's still the single processor and memory on the PERC card.
Updating firmware is typically a good idea, and is a fairly painless process (although it does require a reboot.)
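If OpenManage is installed, the current firmware revision (along with cache and battery status) can be confirmed before and after the update with:

    # lists each controller with its firmware version and status
    omreport storage controller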
- A 4-disk RAID 10 gives you the performance of 2 disks for writes and 4 disks for reads (absolute best-case scenario). A 7200 rpm HDD should give 75-100 IOPS. What kind of performance do you actually see? Do you see %util close to 100 in iostat? (See the command sketch after the answer section.)
- If the primary load is generated by a database, what makes you think it is going to be mainly sequential? Databases are the stereotypical random-access case. You can use iostat to see the average request size; collectl will additionally give you information on I/O merges done in the kernel. Does it agree with your expectation of mainly sequential reads?
- What fsync() kernel bug do you mean?
- What filesystem do you use? With what mount options? The noatime option can buy you a noticeable speedup on ext[34], because modification of access time can mean an extra write for every read of a file (worst case, high-resolution timestamps).

Answer section ;)
- A firmware update may help, but do not expect miracles. You may gain a couple of percent, not orders of magnitude.
- RAID 10 is the best level for performance (if you want to keep redundancy), so it shouldn't cause problems in and of itself. However, you may have partitions and/or LVs not aligned with the stripe size. This could potentially double the I/Os needed for small random reads (worst-case scenario), and will impose overhead on any type of I/O (the sketch after this list shows a quick way to check alignment).
- Power Saving mode shouldn't cost you much. From what you tell us the disks are too busy to be spun down, and the CPU is waiting for I/O anyhow.
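A minimal diagnostic sketch tying these points together, assuming sysstat, collectl and the LVM tools are installed and that the MySQL data directory is its own filesystem (the /var/lib/mysql mount point is only an example):

    # extended device stats: watch %util, avgrq-sz (request size) and await
    iostat -x 5
    # detailed per-disk view, including read/write merges done by the kernel
    collectl -sD
    # try noatime without editing fstab (only if the data dir is its own mount)
    mount -o remount,noatime /var/lib/mysql
    # partition start offsets in sectors, to compare against the stripe size
    fdisk -lu /dev/sda
    # if LVM is used, where the physical extents begin on each PV
    pvs -o +pe_start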
Be careful using tools that show average CPU load. That number is certainly a good starting point for a ballpark figure, but if you see 50% load on a 24-CPU system, how do you know 12 CPUs aren't being 100% utilized while the other 12 sit idle? I've seen cases where the load is <10% yet one CPU is being hammered at 100% processing interrupts. -mark
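mpstat (from the sysstat package) gives the per-CPU breakdown that a single load average hides:

    # per-CPU utilization every 5 seconds; look for one core pegged on %irq/%soft
    mpstat -P ALL 5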
One of our servers had that RAID controller and firmware revision; apparently, the newest version of the firmware fixes a bug where the write-cache battery doesn't properly charge. Due to the battery not being charged, the controller switches to Write Through mode to protect your data, significantly impacting your performance.
Update the firmware and give it a few hours for the battery to charge. Then you'll be running normally.
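If OpenManage is available, the battery state and the resulting write policy can be checked directly; until the battery reports healthy, the virtual disk will typically stay in Write Through:

    # battery status for controller 0 (e.g. Charging, Degraded, Ok)
    omreport storage battery controller=0
    # virtual disk properties, including the current write policy
    omreport storage vdisk controller=0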