I am setting up a Linux-HA cluster running:
* pacemaker-1.1.5
* openais-1.1.4
* multipath-tools-0.4.9
* openSUSE 11.4, kernel 2.6.37
The cluster configuration passed a health check by LINBIT, so I'm pretty confident in it.
Multipath is being used because we have an LSI SAS array connected to each host via two HBAs (four paths total per host). What I would like to do now is test the failover capabilities by removing paths from the multipath setup.
The multipath paths are as follows:
pgsql-data (360080e50001b658a000006874e398abe) dm-0 LSI,INF-01-00
size=6.0T features='0' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 4:0:0:1 sda 8:0 active undef running
| `- 5:0:0:1 sde 8:64 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 4:0:1:1 sdc 8:32 active undef running
  `- 5:0:1:1 sdg 8:96 active undef running
To simulate losing a path, I echo "offline" into /sys/block/{path}/device/state (a sketch of the exact commands follows the output below). This causes the path to appear failed/faulty to multipath, as follows:
pgsql-data (360080e50001b658a000006874e398abe) dm-0 LSI,INF-01-00
size=6.0T features='0' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 4:0:1:1 sdc 8:32 failed faulty offline
| `- 5:0:1:1 sdg 8:96 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 4:0:0:1 sda 8:0 active undef running
  `- 5:0:0:1 sde 8:64 active undef running
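For reference, here is a minimal sketch of that fault-injection step (sdc is taken from the output above; run as root):

# Take a single path device offline at the SCSI layer
echo offline > /sys/block/sdc/device/state

# Verify that multipath now sees the path as failed/faulty
multipath -l pgsql-data

# To bring the path back afterwards
echo running > /sys/block/sdc/device/state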
However, I notice from watching /var/log/messages that the rdac checker still reports the path as up:
multipathd: pgsql-data: sdc - rdac checker reports path is up
Also, let's step back to the multipath -l output: notice how the failed path is still in the active group? It should have been moved to the enabled group, and an active/running path from the enabled group should have taken its place in active.
Now, if we down the other path in the active group, sdg, not only does rdac still report the path as up, but the multipath resource goes into a FAILED state in the cluster and neither of the two active/enabled paths takes its place. The result is a segfault, a kernel BUG about being unable to dereference a NULL pointer, and the cluster STONITHing the node.
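As a sketch, the sequence that triggers this (same device names as above):

# Offline both paths in the currently active path group
echo offline > /sys/block/sdc/device/state
echo offline > /sys/block/sdg/device/state

# Then watch the cluster state and the kernel log
crm_mon -1
tail -f /var/log/messages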
db01-primary:/home/kendall/scripts # crm resource show
 db01-secondary-stonith (stonith:external/ipmi) Started
 db01-primary-stonith (stonith:external/ipmi) Started
 Master/Slave Set: master_drbd [drbd_pg_xlog]
     Masters: [ db01-primary ]
     Slaves: [ db01-secondary ]
 Resource Group: ha-pgsql
     multipathd (lsb:/etc/init.d/multipathd) Started FAILED
     pgsql_mp_fs (ocf::heartbeat:Filesystem) Started
     pg_xlog_fs (ocf::heartbeat:Filesystem) Started
     ha-DBIP-mgmt (ocf::heartbeat:IPaddr2) Started
     ha-DBIP (ocf::heartbeat:IPaddr2) Started
     postgresql (ocf::heartbeat:pgsql) Started
     incron (lsb:/etc/init.d/incron) Started
     pgbouncer (lsb:/etc/init.d/pgbouncer) Stopped
     pager-email (ocf::heartbeat:MailTo) Stopped
db01-primary:/home/kendall/scripts # multipath -l
pgsql-data (360080e50001b658a000006874e398abe) dm-0 LSI,INF-01-00
size=6.0T features='0' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| |- 4:0:1:1 sdc 8:32 failed faulty offline
| `- 5:0:1:1 sdg 8:96 failed faulty offline
`-+- policy='round-robin 0' prio=0 status=active
  |- 4:0:0:1 sda 8:0 active undef running
  `- 5:0:0:1 sde 8:64 active undef running
Here is an excerpt from /var/log/messages showing the kernel bug:
Aug 17 15:30:40 db01-primary multipathd: 8:96: mark as failed
Aug 17 15:30:40 db01-primary multipathd: pgsql-data: remaining active paths: 2
Aug 17 15:30:40 db01-primary kernel: [ 1833.424180] sd 5:0:1:1: rejecting I/O to offline device
Aug 17 15:30:40 db01-primary kernel: [ 1833.424281] device-mapper: multipath: Failing path 8:96.
Aug 17 15:30:40 db01-primary kernel: [ 1833.428389] sd 4:0:0:1: rdac: array , ctlr 1, queueing MODE_SELECT command
Aug 17 15:30:40 db01-primary multipathd: dm-0: add map (uevent)
Aug 17 15:30:41 db01-primary kernel: [ 1833.804418] sd 4:0:0:1: rdac: array , ctlr 1, MODE_SELECT completed
Aug 17 15:30:41 db01-primary kernel: [ 1833.804437] sd 5:0:0:1: rdac: array , ctlr 1, queueing MODE_SELECT command
Aug 17 15:30:41 db01-primary kernel: [ 1833.808127] sd 5:0:0:1: rdac: array , ctlr 1, MODE_SELECT completed
Aug 17 15:30:42 db01-primary multipathd: pgsql-data: sda - rdac checker reports path is up
Aug 17 15:30:42 db01-primary multipathd: 8:0: reinstated
Aug 17 15:30:42 db01-primary kernel: [ 1835.639635] device-mapper: multipath: adding disabled device 8:32
Aug 17 15:30:42 db01-primary kernel: [ 1835.639652] device-mapper: multipath: adding disabled device 8:96
Aug 17 15:30:42 db01-primary kernel: [ 1835.640666] BUG: unable to handle kernel NULL pointer dereference at (null)
Aug 17 15:30:42 db01-primary kernel: [ 1835.640688] IP: [<ffffffffa01408a3>] dm_set_device_limits+0x23/0x140 [dm_mod]
There is also a stack trace, which is available at http://pastebin.com/gifMj7gu
multipath.conf is available at http://pastebin.com/dw9pqF3Z
Does anyone have any insight into this, and/or how to proceed? I can reproduce it every time.
OK, so it turns out that just setting "offline" in /sys/block/{dev}/device/state was not sufficient to make rdac report the path as being down. Last night I spent some time with the unit, pulling the SAS cables and watching the behavior of the system. That works properly. Not quite "as expected", because when an active path goes down it does not get replaced by a path from the enabled group, but that's a different issue. Failover itself worked as expected: once the last path was lost, the cluster shut down the database and related resources and transferred them to the secondary node.
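If you want to watch the path states in real time while pulling cables, something like this works (the map name is from the output above):

# Poll the live path states once a second
watch -n1 'multipath -ll pgsql-data'

# Or ask the running multipathd daemon directly
multipathd -k'show paths'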
If you find yourself in a similar situation, you can try setting the multipath hwhandler to "0" in multipath.conf; you'll have to set this in the device{} section. This basically disables the rdac path checking, so once the device is offline'd, it's really offline.
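A sketch of that change, using the vendor/product strings from the LSI array shown above (check yours with multipath -ll; the multipath.conf keyword for the handler is hardware_handler):

devices {
        device {
                vendor           "LSI"
                product          "INF-01-00"
                # replaces the '1 rdac' handler shown in the output above
                hardware_handler "0"
        }
}

After editing multipath.conf, reload the maps (e.g. multipathd -k'reconfigure', or restart the multipathd service) for the change to take effect.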