I had Ubuntu 10.04 Server running on a software RAID 0. Yesterday I left it running continuously for 10 hours, and when I came back the computer was acting strange. I could not shut it down; it kept saying "Bus error" or something similar, so I forced a shutdown by holding the power button for 4 seconds. Then I turned it back on, and here came the disaster: the RAID was broken. The system kept dumping out "Failed command: READ DMA EXT".

I tried to run fsck.ext4 /dev/md0 from the Alternate CD rescue mode, but fsck.ext4 said: "Attempt to read block from filesystem resulted in short read". So I booted Hiren's BootCD, ran its hard drive scanner, and found 12 bad sectors on the second hard drive (near the very end of the drive, more than 80% in from the beginning, as I recall). I told the software to fix the 12 bad sectors, but I doubt Ubuntu understands the fix.
I ran the Alternate CD rescue mode again and did e2fsck /dev/sda, but it said the device or resource was busy.
God and geeks, how can 12 bad sectors mess up my whole RAID? What should I do to get my RAID and Ubuntu working again?
P.S. Once I get things working again, I'll switch to RAID 5. I swear.
RAID 0 has no redundancy, so errors will break the entire array. Are you confusing it with RAID 1 (mirrored)?
Can you tell us how your RAID 0 array was set up? I had the impression that it consists of 2 physical drives:
/dev/sda + /dev/sdb
and that the resulting device is /dev/md0. Now you are talking about /dev/md1. Does
/dev/md0 = /dev/sda1 + /dev/sdb1
and
/dev/md1 = /dev/sda2 + /dev/sdb2?
If so, how do you expect to repair the md0 filesystem (which is spread across two devices/partitions) when you run the check on only one of those devices? This is RAID 0, not 1. -> Is it the same "Superblock invalid" error?
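You can check the actual layout from the rescue shell, assuming mdadm is available there and the device names above, with:

cat /proc/mdstat
mdadm --detail /dev/md0
mdadm --examine /dev/sda1 /dev/sdb1

which shows which partitions belong to which md device.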
This error message appears because your RAID daemon is still running. On RHEL/CentOS you can stop the RAID service/daemon with the command:
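(A sketch, assuming the stock RHEL/CentOS md monitoring service and an array named /dev/md0; adjust the names for your system.)

service mdmonitor stop
mdadm --stop /dev/md0

The mdadm --stop line stops the assembled array itself, which is what actually releases the member disks so they are no longer reported as busy.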
After stopping the RAID, check the file system using fsck -fyC /dev/sda.
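If the filesystem actually lives on the striped md device rather than on a single disk (as the comments above suggest), one option, assuming the members are /dev/sda1 and /dev/sdb1, is to reassemble the array and run the check on it instead:

mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1
fsck -fyC /dev/md0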