I have the following hardware:
- 2x SuperMicro servers (128GB RAM, 2x 8-core AMD CPUs)
- 2x LSI SAS2008 PCIe MPT-Fusion2 HBAs per server (2 SAS ports per card)
- 1x LSI CTS2600 DAS with 24x W.D. 15.7k RPM 600GB SAS drives.
Each server is running openSUSE 11.4 with a custom build of multipath-tools, built from upstream and incorporating the openSUSE 11.3 patch set. All 4 SAS ports on each server are connected to the DAS, 2 to each of the DAS RAID controllers.
The DAS is set up with 22 drives in a RAID10 with a 128k stripe. I created a single 500GB volume group on the array and exported it to one of the servers.
Multipath is set up to multipath I/O to the 500GB LUN exported to the server. Here is the multipath.conf file:
defaults {
    path_checker          "directio"
    path_selector         "queue-length 0"
    path_grouping_policy  "multibus"
    prio                  "random"
    features              "1 queue_if_no_path"   # queue IO if all paths are lost
}
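After editing the configuration, the daemon and the maps have to be reloaded before the changes take effect. Roughly (a sketch; the exact service command varies by distribution and multipath-tools version):

    # Restart the daemon so it rereads /etc/multipath.conf
    /etc/init.d/multipathd restart

    # Force a reload of the multipath device maps
    multipath -r

    # Re-check the topology, path states, and priorities
    multipath -ll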
Output of 'multipath -l':
pg (360080e50001b658a000005104df8c650) dm-0 LSI,INF-01-00
size=500G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 4:0:0:1 sda 8:0 active undef running
| `- 5:0:0:1 sde 8:64 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
|- 4:0:1:1 sdc 8:32 active undef running
`- 5:0:1:1 sdg 8:96 active undef running
Notice that the second path group shows "status=enabled", not "status=active" as the first group does. A look at iostat confirms that only the first two paths are actually carrying I/O:
Linux 2.6.37.6-0.5-default (slipdb01-primary) 07/07/2011 _x86_64_ (16 CPU)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 1.18 441.70 30.44 4748.62 21.58 0.79 1.79 0.24 10.60
sdb 0.00 0.00 0.00 0.00 0.00 0.00 14.22 0.00 83.56 82.92 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 8.06 0.00 334.53 331.73 0.02
sdd 0.00 0.00 0.00 0.00 0.00 0.00 16.99 0.00 98.73 95.76 0.00
sde 0.00 0.00 1.18 441.70 30.43 4747.77 21.58 0.79 1.79 0.24 10.60
sdf 0.00 0.00 0.00 0.00 0.00 0.00 14.43 0.00 77.17 76.66 0.00
sdg 0.00 0.00 0.00 0.00 0.00 0.00 8.06 0.00 301.72 297.05 0.02
sdh 0.00 0.00 0.00 0.00 0.00 0.00 14.29 0.00 83.12 82.69 0.00
sdi 0.00 0.00 0.08 0.48 8.73 35.82 159.00 0.06 99.95 1.08 0.06
sdj 0.00 2311.06 0.00 340.49 0.01 10606.18 62.30 0.04 0.12 0.08 2.83
dm-0 0.02 1353.74 2.36 883.40 60.86 9496.39 21.58 0.95 1.08 0.13 11.20
dm-2 0.00 0.00 2.38 2237.14 60.86 9496.39 8.54 1.90 0.84 0.05 11.20
As I understand it, setting path_grouping_policy to 'multibus' should put all paths in a single group and balance the IO across ALL of them, so I should see 4 active paths. If I change path_grouping_policy to 'failover', I see the same 2 active paths.
Additionally, notice that I have path_selector set to 'queue-length 0', yet the output of 'multipath -l' clearly shows it is using round-robin.
Does anyone have any ideas on why multipath-tools won't use all 4 paths, and why it is ignoring my choice of path selection algorithm?
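For reference, the path groups, per-path state, and the configuration multipathd is actually using can also be queried through its interactive console (a sketch; the available commands vary by multipath-tools version):

    # Open the multipathd interactive console
    multipathd -k

    # Inside the console:
    #   show topology    - path groups and per-path state
    #   show paths       - checker state for every path
    #   show config      - the configuration multipathd is actually using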
Many thanks...
Well, it seems the array is not active/active in the manner I thought, and in the manner I thought was the common definition. The CTS2600 is active/active in the sense that it can serve LUN1 from Controller A and LUN2 from Controller B, but not LUN1 from Controllers A and B at the same time. So it looks like I can't get all 4 paths going to one LUN.
However, I did figure out that I can load balance the IO across both controllers. I did this by creating a 22-drive RAID10 Volume Group on the CTS2600 array, creating two Volumes, setting the preferred path for Volume A to Controller A and for Volume B to Controller B, and exporting both to the server. I then initialized them as LVM2 Physical Volumes using their names under /dev/mapper/. Next I created an LVM2 Volume Group containing both Physical Volumes. Since I have two LUNs, I added the option "--stripes 2" to the 'lvcreate' command. I then formatted, mounted, and used the device as usual. Watching both 'iostat' and the SANtricity built-in Performance Monitor, it was clear that the IO was being spread across both controllers, as expected.
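Roughly, the LVM side of that looked like the following (a sketch; the /dev/mapper names, volume group name, filesystem, and mount point are illustrative, not the exact ones from my setup):

    # Initialize both multipathed LUNs as LVM2 Physical Volumes
    # (use the map names shown by 'multipath -l'; these are placeholders)
    pvcreate /dev/mapper/lun_a /dev/mapper/lun_b

    # Create a Volume Group spanning both Physical Volumes
    vgcreate vg_data /dev/mapper/lun_a /dev/mapper/lun_b

    # Create a Logical Volume striped across both LUNs, and therefore both controllers
    # (--stripesize is in KB; 128 matches the array's 128k stripe, but that choice is an assumption)
    lvcreate --stripes 2 --stripesize 128 -l 100%FREE -n lv_data vg_data

    # Format and mount as usual
    mkfs.ext4 /dev/vg_data/lv_data
    mount /dev/vg_data/lv_data /mnt/data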
Thanks to a kind gentleman on #postgres for giving me the heads up on the --stripes option to make this happen (especially since LSI was unwilling or unable to help).
Additionally, I left out details on the queue-length and path_selector bits. The storage device I'm using is already in multipath's built-in hardware database, and as such has certain defaults set, including path_selector. My /etc/multipath.conf was missing a "devices { device { ... } }" section, which is where you override those per-device defaults. After adding one, I was able to confirm that I could change (and multipath would actually use) the path_selector setting, i.e. 'queue-length 0'. The portion I added to /etc/multipath.conf was a device section along the lines of the sketch below.
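The vendor and product strings below come from the 'multipath -l' output above; the remaining values simply mirror the defaults section:

    devices {
        device {
            vendor                "LSI"
            product               "INF-01-00"
            path_grouping_policy  "multibus"
            path_selector         "queue-length 0"
            path_checker          "directio"
            features              "1 queue_if_no_path"
        }
    }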
Hope this helps somebody.
This is sometimes called "Dual Active", and it is not the true Active/Active that FC SAN engineers are accustomed to. Vendors could stand to do a better job of describing the limitations of their SAS-based products. This article explains all the modes quite well.