In a previous message I asked how to rebuild a faulty disk in a RAID 5 array with 4 disks. I mounted a new drive (drive 4) in place of the faulty one and started a rebuild. During the rebuild, another disk (drive 2) started throwing ECC errors and timeouts. At 95% of the rebuild process, the computer rebooted and hung at the start screen, with the controller (3ware 9500s) showing an error (drive 2 not found), and the typical noise of a failing drive could be heard coming from drive 2. I turned the PC off and on a few times, with no change. Then I left the PC off for an hour. When I turned it on again, this time the missing drive (drive 2) was back in place. I could boot the operating system, and the controller automatically restarted the rebuild. At a certain point, the controller reported a rebuild error and halted the rebuild process. The server is now running with drive 2 showing errors and drive 4 showing an OK status, but the array is degraded because the rebuild could not complete. It looks like I'm at a dead end: at least 3 drives need to be OK for the array to be healthy, but one drive has errors and one drive is not rebuilt. What can I try?
Your best bet is to restore from backups. But I'm guessing you don't have those, or you wouldn't be asking the question.
So, failing backups, your next best bet is to copy as much of the data off as possible (from the sound of things, you'll have at least a couple of unreadable sectors that can't be copied) with whatever method you favor: file copy, disk image, disk-level copy, etc. Then, once you have your data, you can replace the faulty drives, create a new RAID array and copy your data back.
Failing that, you can either go through the expensive process of professional data recovery or just accept your data loss and move on, depending on how much your data is worth to you.
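To illustrate what a disk-level copy that tolerates unreadable sectors looks like, here is a minimal Python sketch. It is not a replacement for a dedicated tool such as GNU ddrescue, which retries and keeps a map of bad areas; it only shows the idea of reading block by block and zero-filling whatever cannot be read. The function name, the block size, and the example paths in the comment are my own assumptions, and it relies on os.pread, so it is Unix-only.

```python
import os


def copy_skipping_bad_blocks(src_path, dst_path, block_size=4096):
    """Copy src to dst block by block, zero-filling blocks that cannot be read.

    Returns the list of byte offsets whose block was unreadable or short.
    Hypothetical helper for illustration only.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and block devices.
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = os.pread(src, want, offset)
                except OSError:
                    # Read error (e.g. an unreadable sector): keep nothing.
                    data = b""
                if len(data) < want:
                    # Record the problem and pad with zeros so that all
                    # later blocks stay at their original offsets.
                    bad_offsets.append(offset)
                    data = data.ljust(want, b"\x00")
                dst.write(data)
                offset += want
    finally:
        os.close(src)
    return bad_offsets


# Example call (paths are placeholders, not taken from this thread):
#   bad = copy_skipping_bad_blocks("/dev/sdX", "/mnt/rescue/array.img")
```

The returned offsets tell you which parts of the image are zero-filled, so you know which files to distrust afterwards. Write the image to a different, healthy disk, never back onto the degraded array.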
The easiest thing would be to restore from backup. But you're probably asking this question because you don't have one. In that case, you will need to call a disk drive recovery center and see what they can do for you.
When you finally get this rebuilt you'll learn the real value of a backup system that works.
Can you show the output of tw_cli /c0 show all? If drive 2 is in the ECC-ERROR state, you may be able to continue the rebuild by telling the controller to ignore the ECC errors on drive 2. @Sergey Vasilov's answer in the thread "What does 3Ware's tw_cli mean by a "DEGRADED" disk vs "ECC-ERROR"?" has the right information. (I used to know this offhand, but had to look up the commands, and Sergey's answer was the first hit in a Google search, so I'll give him the credit.) Because it's always better to actually quote the answer:
Even if this lets you rebuild the array, you may still have filesystem corruption or data loss. Or you may not.
@Daniel this is the output from tw_cli