I have (had?) a RAID5 array with 3 devices. One of them died, and after a few days the RAID stopped working entirely. At first I could restart it without any problems, but it stopped again after a few hours; I restarted it, it stopped again after a few moments, and so on. For about a month now, the RAID hasn't started at all. (During that month I didn't touch the RAID, as I didn't have time for it.)
I don't know whether this is a hardware failure (of the drives) or "just" a loose contact on the power cable; I had problems with that a year ago, so I'm currently hoping for "just" a loose contact. The RAID mostly holds data I have a backup of, but the backup is missing the changes from the last month or so.
I found this blog post about recovering a RAID5 with two failed disks. It describes a situation similar to the one I (hope I) have: the drives (or at least one of the two failed ones) aren't really defective but were only detached from the computer. Their approach is to re-create the RAID5 using all devices except the first one that failed.
In my case, I have three disks and one of them is dead, so only two are left: /dev/sda1 and /dev/sdc1, where the latter is the one that was "detached" (at least, I hope it isn't dead). So I hope to get the most important information by examining this device:
sudo mdadm --examine /dev/sdc1
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 83cb326b:8da61825:203b04db:473acb55 (local to host sebastian)
  Creation Time : Wed Jul 28 03:52:54 2010
     Raid Level : raid5
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
     Array Size : 1465143808 (1397.27 GiB 1500.31 GB)
   Raid Devices : 3
  Total Devices : 2
Preferred Minor : 127

    Update Time : Tue Oct 23 19:19:10 2012
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : eaa3f133 - correct
         Events : 523908

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       0        0        2      faulty removed
So it was on October 23rd that the RAID stopped working entirely.
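For comparison (and as a safety net), I plan to run the same examination on the other remaining device and check that its Events counter and Update Time roughly match; the output file name below is just something I made up:

# Examine the other remaining device as well; the Events counter and
# Update Time should show how far apart the two disks are.
sudo mdadm --examine /dev/sda1

# Keep a copy of both outputs before trying anything destructive.
sudo mdadm --examine /dev/sda1 /dev/sdc1 > mdadm-examine-backup.txt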
Now I want to recover the array using the two remaining devices with the command
sudo mdadm --verbose --create /dev/md127 --chunk=64 --level=5 --raid-devices=3 /dev/sda1 /dev/sdc1 missing
I hope someone can tell me whether this is the correct command to use. I'm very nervous... It asks me to confirm the following data about the drives to be used to re-create the array:
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: /dev/sda1 appears to contain an ext2fs file system
    size=1465143808K  mtime=Tue Oct 23 14:39:49 2012
mdadm: /dev/sda1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Wed Jul 28 03:52:54 2010
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Wed Jul 28 03:52:54 2010
mdadm: size set to 732570816K
Continue creating array?
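I haven't answered this prompt yet. Before I do, my assumption is that I should make sure nothing is still (partially) assembled under /dev/md127; this is only what I would check myself, not something from the blog post:

# Check whether any md array is currently active.
cat /proc/mdstat

# If /dev/md127 is still (partially) assembled, show its state and stop it
# before re-creating it (assuming it is not mounted).
sudo mdadm --detail /dev/md127
sudo mdadm --stop /dev/md127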
Additional info: I originally created the array from 3 × 750 GB drives, so the file system is 1.5 TB (ext2). In particular, I wonder whether the line saying that /dev/sda1 contains a 1.5 TB ext2 file system is to be expected, because in the blog post linked above, their output doesn't show such a line...
I also wonder whether I should zero the superblock on either device first...
Are there any checks I can do beforehand to make reasonably sure that this won't completely destroy something that still has a chance of being recovered?
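For reference, these are the only non-destructive checks I could come up with myself, and I'm not sure they are enough (smartctl comes from the smartmontools package; /dev/sdc is the disk I hope was merely detached, and /mnt is just an example mount point):

# Check the SMART health of the disk that I hope was only detached.
sudo smartctl -H /dev/sdc
sudo smartctl -a /dev/sdc

# If the re-created array comes up, check the file system strictly read-only
# before mounting or writing anything.
sudo fsck.ext2 -n /dev/md127
sudo mount -o ro /dev/md127 /mnt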