I have two OpenSuSE 11.4 hosts connected to an LSI CTS2600 storage array via SAS. Every time I reboot the hosts, I see output like this in dmesg:
[ 255.942890] end_request: I/O error, dev sdg, sector 8
[ 256.445301] sd 5:0:1:1: [sdg] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 256.445308] sd 5:0:1:1: [sdg] Sense Key : Illegal Request [current]
[ 256.445315] sd 5:0:1:1: [sdg] <> ASC=0x94 ASCQ=0x1ASC=0x94 ASCQ=0x1
[ 256.445326] sd 5:0:1:1: [sdg] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
It just so happens that the devices reporting the I/O error are always the devices in the passive path group.
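To double-check that correlation, something like the following rough sketch can pull the affected device names out of the kernel log so they can be compared against the path groups that `multipath -ll` reports. The function name `io_error_devs` is made up for illustration, and the pattern assumes the `end_request: I/O error, dev ...` message format shown above.

```shell
# List the unique block devices that logged an I/O error.
# Reads kernel log text on stdin and prints one device name per line.
# Assumes the "end_request: I/O error, dev sdX, sector N" format shown above.
io_error_devs() {
    sed -n 's|.*end_request: I/O error, dev \([a-z0-9]*\),.*|\1|p' | sort -u
}

# Typical use on a live system:
#   dmesg | io_error_devs
# then compare the names against the path groups listed by:
#   multipath -ll
```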
First, I'm wondering: why does this happen? I assume it has something to do with the system seeing the attached SAS hardware and querying it before the proper device drivers and/or software are loaded, but I'm not positive.
Second, what can I do to stop this from happening? Besides increasing the boot time, since the system sits there and re-queries the device again and again, it looks bad in the logs, kicks off Nagios alerts, and is generally just annoying.
Since I feel like it's related in some fashion to drivers or modules, here's some boot information:
INITRD_MODULES: dm-multipath, mptbase, mpt2sas, mptscsi, mptspi, mptsas, 3w-sas, thermal, ata_generic, processor, fan
MODULES_LOADED_ON_BOOT: drbd, dm-multipath
It looks to me like I've got my bases covered with the INITRD_MODULES, but I'm not sure.
Your array looks to be the OEM's version of a Dell MD3220, right? I have an MD3200i, which is the LFF, iSCSI version.
I had similar errors on the secondary path group, caused by multipath trying to use or check (I'm not sure which) all existing paths to the LUN.
I'm not sure that the RDAC SCSI device handler module will help in your case; my Debian host has the following:
Out of the box, it was the only change I needed to get up and running, albeit with lousy performance, which is where a SAS-attached version like yours would have come in handy.
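On the openSUSE side, a comparable change might look roughly like the sketch below. It rests on a few assumptions: that the RDAC handler in question is the kernel's `scsi_dh_rdac` module, that `/etc/sysconfig/kernel` holds the list in the usual `INITRD_MODULES="mod1 mod2 ..."` form, and that GNU `sed` is available. The helper name `add_initrd_module` is hypothetical. I haven't verified that this cures the boot-time errors on a CTS2600, so treat it as something to try rather than a known fix.

```shell
#!/bin/sh
# add_initrd_module FILE MODULE
# Appends MODULE to the INITRD_MODULES line of FILE unless it is already listed.
# Assumption: the line looks like INITRD_MODULES="mod1 mod2 ..." (space-separated).
add_initrd_module() {
    file=$1
    module=$2
    # Already present as a whole word inside the quoted list? Then do nothing.
    if grep -Eq "^INITRD_MODULES=\"([^\"]* )?${module}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Otherwise insert the module just before the closing quote (GNU sed -i).
    sed -i "s/^\(INITRD_MODULES=\"[^\"]*\)\"/\1 ${module}\"/" "$file"
}

# On a real host, as root, roughly:
#   add_initrd_module /etc/sysconfig/kernel scsi_dh_rdac
#   mkinitrd    # rebuild the initrd so the handler is available early in boot
```

The point of putting the handler in the initrd is that it is then already loaded when the SAS HBA driver first discovers the passive paths, instead of arriving after the generic probing has logged its errors.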