I have a multipath config that was working but now shows a "faulty" path:
[root@nas ~]# multipath -ll
sdd: checker msg is "readsector0 checker reports path is down"
mpath1 (36001f93000a63000019f000200000000) dm-2 XIOTECH,ISE1400
[size=200G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
\_ 1:0:0:1 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 2:0:0:1 sdd 8:48 [active][faulty]
At the same time, I'm seeing these three lines over and over in /var/log/messages:
Feb 5 12:52:57 nas kernel: sd 2:0:0:1: SCSI error: return code = 0x00010000
Feb 5 12:52:57 nas kernel: end_request: I/O error, dev sdd, sector 0
Feb 5 12:52:57 nas kernel: Buffer I/O error on device sdd, logical block 0
And this line shows up fairly often too:
Feb 5 12:52:58 nas multipathd: sdd: readsector0 checker reports path is down
One thing I don't understand is why it's using the readsector0 checker when my /etc/multipath.conf file says to use tur:
[root@nas ~]# tail -n15 /etc/multipath.conf
devices {
        device {
                vendor "XIOTECH "
                product "ISE1400 "
                path_grouping_policy multibus
                getuid_callout "/sbin/scsi_id -g -u -d /dev/%n"
                path_checker tur
                prio_callout "none"
                path_selector "round-robin 0"
                failback immediate
                no_path_retry 12
                user_friendly_names yes
        }
}
Looking at the upstream documentation (http://christophe.varoqui.free.fr/usage.html), this paragraph seems relevant:
For each path:
\_ host:channel:id:lun devnode major:minor [path_status][dm_status_if_known]
The dm status (dm_status_if_known) is like the path status
(path_status), but from the kernel's point of view. The dm status has two
states: "failed", which is analogous to "faulty", and "active" which
covers all other path states. Occasionally, the path state and the
dm state of a device will temporarily not agree.
It's been well over 24 hours for me, so it's not temporary.
So with all that as background, my questions are:
- how can I determine the root cause here?
- how can I manually perform whatever check it's doing from the command line?
- why is it ignoring my multipath.conf (did I do it wrong?)
Thanks in advance for any ideas. If there's anything else I can provide, let me know in a comment and I'll edit it into the post.
There's a subtle bug in your multipath.conf: vendor and product are matched at the regexp level, and the trailing padding spaces you've added are causing multipathd to fail to match your configuration against the actual devices on the system. If you examine the output of
echo 'show config' | multipathd -k
you would find two device sections for your SAN: one that matches with all the extra spaces you added, and the default config (should it exist) provided by the internal database. Adjust your multipath.conf to look like this:
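devices {
        device {
                # same as your config, minus the space padding in vendor/product
                vendor "XIOTECH"
                product "ISE1400"
                path_grouping_policy multibus
                getuid_callout "/sbin/scsi_id -g -u -d /dev/%n"
                path_checker tur
                prio_callout "none"
                path_selector "round-robin 0"
                failback immediate
                no_path_retry 12
                user_friendly_names yes
        }
}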
A SCSI Inquiry reports the vendor as a fixed 8-character field; if the name doesn't use all 8 characters, the device pads it with spaces to reach 8. Multipathd is interpreting the spec to the letter of the law, so you could also have used
"XIOTECH.*"
if you really want to be sure. Once you make these changes, stop multipathd using your initscripts, run multipath -F (which flushes the existing multipath maps), and then start multipathd again. Your config file should be honored now. If you still have problems, reboot.
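On a RHEL-style system (an assumption; substitute your distro's service commands as needed) that sequence looks like:

    service multipathd stop
    multipath -F                  # flush the existing multipath device maps
    service multipathd start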
If there's ever any doubt that your config file is being honored, examine the running config using the echo incantation above and compare what's loaded in the database against your config file.
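As for running the checks by hand: a rough command-line equivalent, assuming you have sg3_utils installed (sdd being your faulty path here), would be:

    # readsector0 equivalent: read the first 512-byte sector of the path
    dd if=/dev/sdd of=/dev/null bs=512 count=1
    # tur equivalent: issue a SCSI TEST UNIT READY to the device
    sg_turs -v /dev/sdd

If the dd read fails with the same I/O error you're seeing in /var/log/messages, the problem is on the transport/array side of that path rather than in multipathd itself.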