I am trying to tune my NAS, running Openfiler, and I'm wondering why I'm getting relatively poor read performance from 4 WD RE3 drives in RAID 5.
EDIT: Please note I am talking about the buffered disk read speed, not cached speeds.
EDIT: Changed formatting to make clear there are two sets of output.
When I run hdparm on the meta device I get the level of performance I'd expect; drop down to the volume and it's a third of the speed!
Does anyone have any idea why? Is LVM that bad?
Dean
Meta device /dev/md0 results
[root@nas2 etc]# hdparm -tT /dev/md0

/dev/md0:
 Timing cached reads:   4636 MB in  2.00 seconds = 2318.96 MB/sec
 Timing buffered disk reads:  524 MB in  3.01 seconds = 174.04 MB/sec
Vol group /dev/mapper/vg1-vol1 results
[root@nas2 etc]# hdparm -tT /dev/mapper/vg1-vol1

/dev/mapper/vg1-vol1:
 Timing cached reads:   4640 MB in  2.00 seconds = 2320.28 MB/sec
 Timing buffered disk reads:  200 MB in  3.01 seconds = 66.43 MB/sec
Edit: See the section from the hdparm man page below, which suggests this is a perfectly valid test for sequential read performance, which is the issue I am trying to resolve.
-t Perform timings of device reads for benchmark and comparison purposes. For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading through the buffer cache to the disk without any prior caching of data. This measurement is an indication of how fast the drive can sustain sequential data reads under Linux, without any filesystem overhead. To ensure accurate measurements, the buffer cache is flushed during the processing of -t using the BLKFLSBUF ioctl. If the -T flag is also specified, then a correction factor based on the outcome of -T will be incorporated into the result reported for the -t operation.
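For completeness, here is how I would repeat the runs a few times on an otherwise idle system, as the man page recommends; a rough sketch using the two devices above:

# repeat each benchmark three times on an otherwise idle system,
# as suggested by the hdparm man page
for i in 1 2 3; do hdparm -tT /dev/md0; done
for i in 1 2 3; do hdparm -tT /dev/mapper/vg1-vol1; done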
The default readahead settings for LVM are really pessimistic. Try
blockdev --setra 8192 /dev/vg1/vol1
and see what that bumps your LVM performance up to. You will always take a performance hit using LVM; we measure it on properly configured systems at about 10% of underlying block device performance.
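If you want to see what you are starting from and confirm the change took effect, blockdev can also report the current readahead (in 512-byte sectors); a quick sketch, assuming the same volume path as above:

# current readahead value, in 512-byte sectors
blockdev --getra /dev/vg1/vol1
# raise it to 8192 sectors (4 MB) and re-run the sequential read test
blockdev --setra 8192 /dev/vg1/vol1
hdparm -t /dev/vg1/vol1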
I don't have a good explanation, but I can confirm the results.
Testing of the RAID (raid5, 4x1.5TB drives):
Test of the volume, which uses md2 as its physical device:
I made the change proposed by womble and saw results like this:
Make sure that you compare apples to apples.
hdparm -t
reads from the beginning of the device, which is also the fastest part of your disk if you're giving it a whole disk (and it's spinning platters). Make sure you compare it with an LV from the beginning of the disk.
To see the mapping, use
pvdisplay -m
(Okay, granted, the difference in numbers may be negligible, but at least think about it :)
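One way to make the comparison fairer is to check where the LV actually sits on the physical volume, then read from a comparable region of the raw md device with the page cache bypassed. A sketch, assuming /dev/md0 is the PV backing vg1; the offset is purely illustrative:

# show which physical extents on the PV back each LV
pvdisplay -m /dev/md0
# read 512 MB starting well into the raw device, bypassing the page cache,
# to compare against an LV that maps to that region (skip value is illustrative)
dd if=/dev/md0 of=/dev/null bs=1M count=512 skip=1048576 iflag=direct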
The workload created by hdparm -t is not representative of almost any use case except streaming reads from a single large file. Also, if performance is a concern, don't use raid5.
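If you want numbers closer to a real workload, a tool such as fio can drive random reads against the volume; a rough sketch, assuming fio is available on the box and with purely illustrative job parameters:

# 30-second random-read test against the LV, bypassing the page cache;
# --readonly guards against accidentally writing to the device
fio --name=randread --filename=/dev/mapper/vg1-vol1 --readonly \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio \
    --runtime=30 --time_based --numjobs=4 --group_reporting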
You can figure out where hdparm is spending its time with blktrace (if it's in I/O) or oprofile (if it's on CPU). Knowing the LVM setup would also help (pvdisplay, vgdisplay, lvdisplay).
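For example, a minimal sketch of tracing the I/O while the benchmark runs in another terminal, plus the LVM layout commands mentioned above:

# trace block I/O on the LV and decode it live; run hdparm in a second terminal
blktrace -d /dev/mapper/vg1-vol1 -o - | blkparse -i -
# dump the LVM layout
pvdisplay
vgdisplay
lvdisplay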