In my central syslog I can see some instances of the following error from LSI's RDAC multi-pathing driver for Linux:
[RAIDarray.mpp]MY_NICE_STORAGE_ARRAY:1:0:7 Cmnd-failed try alt ctrl 0. vcmnd SN 2436 pdev H1:C0:T0:L7 0x05/0x94/0x01 0x08000002 mpp_status:1
also some instances of
[RAIDarray.mpp]MY_NICE_STORAGE_ARRAY:1:0:10 Illegal Request ASC/ASCQ 0x20/0x0, SKSBs 0x0/0x0/0x0
followed by
[RAIDarray.mpp]MY_NICE_STORAGE_ARRAY:1:0:10 IO FAILURE. vcmnd SN 887 pdev H2:C0:T0:L10 0x05/0x20/0x00 0x08000002 mpp_status:1
I get it from nearly all of my machines on the SAN during the day, but not from all of them at once; typically one of them roughly every five hours. None of the FC switches or FC HBAs show any errors from today, and all paths to every LUN are up when I check them. Performance (IOPS and sequential access) is also fine. Has anyone seen this?
Well, ASC/ASCQ 0x20/0x0 translates to INVALID COMMAND OPERATION CODE, which might as well have been "INVALID FIELD IN CDB", i.e. this command is not supported at this target. What we don't know is which command actually caused this fallout. Turning on verbose debugging for this proprietary multipath driver might help.
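For reference, the `0x05/0x20/0x00` triple in the log line is a sense key followed by ASC/ASCQ, and you can translate it mechanically. Here's a minimal sketch; the sense-key and ASC/ASCQ names come from the standard SCSI tables (only a small excerpt included), while the `decode_sense` helper itself is just something I threw together, not part of any LSI tool:

```python
# Tiny excerpt of the standard SCSI sense-key and ASC/ASCQ tables.
SENSE_KEYS = {
    0x02: "NOT READY",
    0x03: "MEDIUM ERROR",
    0x04: "HARDWARE ERROR",
    0x05: "ILLEGAL REQUEST",
    0x06: "UNIT ATTENTION",
}

ASC_ASCQ = {
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
    (0x25, 0x00): "LOGICAL UNIT NOT SUPPORTED",
}

def decode_sense(triple: str) -> str:
    """Decode a 'key/ASC/ASCQ' string like '0x05/0x20/0x00'."""
    key, asc, ascq = (int(x, 16) for x in triple.split("/"))
    sk = SENSE_KEYS.get(key, f"sense key 0x{key:02x}")
    detail = ASC_ASCQ.get(
        (asc, ascq), f"vendor/unknown ASC/ASCQ 0x{asc:02x}/0x{ascq:02x}")
    return f"{sk}: {detail}"

print(decode_sense("0x05/0x20/0x00"))
# ILLEGAL REQUEST: INVALID COMMAND OPERATION CODE
```

Running it against the second log line's `0x05/0x94/0x01` gives "ILLEGAL REQUEST" with a vendor-range ASC, which is consistent with nobody outside LSI knowing what it means.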
The vendor-specific multipath driver messages aren't helping much: the 0x02 is the status byte set to CHECK CONDITION, which means we have a problem, and the driver byte is 0x08, which can be anything the vendor wants, IIRC. I don't know what 0x05/0x94/0x01 stands for; ask support.
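If it helps, the `0x08000002` word can be split into bytes the way the Linux SCSI midlayer packs them (driver/host/message/status, high byte to low). The helper below is my own sketch of that unpacking, assuming the mpp driver is reporting a standard midlayer result word:

```python
def decode_result(result: int) -> dict:
    """Split a Linux SCSI result word into its component bytes."""
    return {
        "driver": (result >> 24) & 0xFF,  # 0x08: driver set sense data
        "host":   (result >> 16) & 0xFF,  # 0x00: low-level transport OK
        "msg":    (result >> 8)  & 0xFF,
        "status": result         & 0xFF,  # 0x02: CHECK CONDITION
    }

print(decode_result(0x08000002))
# {'driver': 8, 'host': 0, 'msg': 0, 'status': 2}
```

So the HBA and fabric reported the command delivered fine; the target itself rejected it, which matches your clean switch/HBA counters.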
Seeing that this is SAN-wide, and assuming you're running the same LSI RDAC multipath driver on all of the machines, I would concentrate my efforts on an LSI multipath bug or a SAN configuration problem. I would also look into any clustering configurations and make sure they haven't been switched on by accident.
Since you're using the LSI mpath driver, you should really start with their support and take it from there. It's important to keep perspective here: so far this message hasn't resulted in any fatal or detrimental behavior that you've measured. Keep that in mind if/when support asks you to reassemble your SAN ;).