Last weekend I had to replace a hard disk in a RAID 1 array on a PERC 5/i controller because the controller reported it as failed.
After replacing the disk, everything worked for five days, and then the controller started complaining about the new disk as well:
# megaclisas-status
-- Controller informations --
-- ID | Model
c0 | PERC 5/i Adapter
-- Arrays informations --
-- ID | Type | Size | Status | InProgress
c0u0 | RAID1 | 465G | Degraded | None
-- Disks informations
-- ID | Model | Status
c0u0p0 | WD-WMAYP4753240WDC WD5003ABYX-01WERA1 01.01S02 | Failed
c0u0p1 | S13TJ1KQ503997 SAMSUNG HD502IJ 1AA01110 | Online, Spun Up
There is at least one disk/array in a NOT OPTIMAL state.
So now I'm a bit suspicious of the controller; it's hard to believe that a brand-new disk would fail after so little uptime, or is it? What can I do to diagnose the source of the problem? And is there a way to just reset the status the controller thinks the disk is in?
If multiple drives keep failing in the same slot, then it's most likely the backplane they are connected to, or possibly a physical fault in the connector for that particular slot. Can you use a different slot on the backplane?
Note that it could still be the hard disks: if they haven't been stored or transported properly, they can fail even when brand new.
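To tell whether the disk itself is actually failing, you can query it directly through the controller and pull the controller's event log. A rough sketch, assuming smartmontools and the MegaCli tool are installed, the array is /dev/sda and the drive's device ID is 0 (read the real ID from the PDList output; depending on the package the binary may be called megacli, MegaCli or MegaCli64):

# smartctl -a -d megaraid,0 /dev/sda
# megacli -PDList -aALL
# megacli -AdpEventLog -GetEvents -f events.log -aALL

The first command reads the SMART data of the disk sitting behind the PERC, the second shows the controller's view of each drive (state and error counters), and the third dumps the controller's event log to events.log so you can see exactly when and why the drive was marked failed.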
Even if it is possible to "reset" the status on the controller and clear this error somehow, why would you want to do that without being sure you've eliminated the root cause of the problem? The whole point of mirroring is that you can trust the RAID members to be consistent, and you have evidence that this isn't likely here.
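That said, since the question explicitly asks about resetting the status: once you are satisfied the drive and the slot are good, the controller can usually be told to take the drive back and rebuild it. A sketch, assuming adapter 0 and a drive at enclosure 32, slot 0 (both are placeholders; read the real enclosure and slot numbers from megacli -PDList first):

# megacli -PDMakeGood -PhysDrv [32:0] -a0
# megacli -PDRbld -Start -PhysDrv [32:0] -a0
# megacli -PDRbld -ShowProg -PhysDrv [32:0] -a0

PDMakeGood clears the failed flag, PDRbld -Start kicks off the rebuild into the mirror, and -ShowProg lets you watch its progress. If the drive comes back with a foreign configuration instead, that configuration may need to be imported or cleared (megacli -CfgForeign -Scan / -Clear) before the rebuild will start.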
I had a similar issue, and it came down to replacing the SCSI cable that connected the RAID card to the drive bay. I had replaced the card, but that didn't solve the problem. Have a look at the cable. HTH
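One way to tell a flaky cable or link apart from a genuinely dying disk is to look at the error counters the controller keeps per drive: media errors point at the disk surface, while a growing "other" error count often indicates link or transport problems. Assuming MegaCli is installed, something along these lines shows the relevant counters:

# megacli -PDList -aALL | egrep 'Slot|Media Error|Other Error|Predictive|Firmware state'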