So I have a Debian 7 server with 3 hard drives. Its RAID-1 is basically configured this way:
md0: sda1, sdb1 --> / (root)
md1: sda5, sdc1 + sdb5 (spare) --> /data (sdc1 is on an SSD, and sda5 is marked 'writemostly').
Both sda and sdb have GRUB installed on them.
When installing an extra network card, I messed up and unplugged sdc's data cable (note that sdc doesn't have GRUB or /, and should have nothing to do with booting).
So the system booted fine after that. I noticed my error, shut down the machine, and plugged sdc back in (while mdadm was rebuilding md1 on the spare).
Now the system gave me either the dreaded GRUB shell or just a black screen with a blinking cursor, depending on which hard drive(s) I unplugged. But no combination of hard drives gave me a successful boot. I also tried having all 3 drives connected and telling the BIOS to boot from each of the boot drives manually.
What I did in the end was boot Debian setup in rescue mode, assemble the RAID devices, and let them rebuild.
This didn't result in a successful boot.
So I booted rescue mode again, and manually re-installed GRUB on sda and sdb. This fixed my problem.
My question is: what happened here? a) sdc shouldn't affect booting in any way, AFAIK? b) Even if something in the RAID rebuilding process I interrupted did affect booting, why did the system not boot after I rebuilt the RAID arrays in rescue mode? Why did I have to re-install GRUB on sda and sdb manually if, as far as I understand, the sectors on the drives that house GRUB don't have anything to do with the RAID arrays?
First off: don't do anything more. By interrupting one rebuild and testing various combinations it's possible that data has been corrupted, destroyed or lost. It's usually best to let one operation complete fully before trying the next step--interruptions introduce uncertainty and confusion, and lost time is usually much better than lost data.
The tack I'd suggest:
Work on one drive at a time. dd the full drive to a backup (if available) before writing any changes.
With each drive, attempt to mount each partition on its own without RAID. I believe you need to mdadm --stop /dev/mdX to detach it from RAID, and then you can mount it as normal.
Find a clean (or the least messed-up) copy of each partition and transfer them to non-RAID drive(s).
Once you have restored a bootable non-RAID system you should be able to rebuild your RAID devices. Since you have three drives and only two partitions you should be able to do this without additional disks (except for the dd backup--which isn't required, but is a great way to keep from digging any deeper).
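The steps above could be sketched roughly as follows. This is only an illustration, not a tested recovery procedure: the paths /mnt/backup and /mnt/check are placeholders I made up, the device names are taken from the question, and the script only prints the commands unless DRY_RUN=0 is set. Mounting read-only is my addition, to avoid writing to a member while inspecting it.

```shell
#!/bin/sh
# Sketch only. By default every command is printed, not executed (DRY_RUN=1).
# /mnt/backup and /mnt/check are hypothetical paths; adjust before real use.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: image the whole drive before changing anything on it.
# usage: backup_drive <drive> <image file>
backup_drive() {
    run dd if="$1" of="$2" bs=4M
}

# Step 2: stop the array, then mount one member on its own, read-only.
# usage: inspect_member <md device> <member partition> <mount point>
inspect_member() {
    run mdadm --stop "$1"
    run mount -o ro "$2" "$3"
}

# Example for one drive (sda) and its /data member (sda5 in md1).
backup_drive /dev/sda /mnt/backup/sda.img
inspect_member /dev/md1 /dev/sda5 /mnt/check
```

One caveat: as far as I know, a RAID-1 member only mounts directly when the md superblock sits at the end of the partition (metadata 0.90 or 1.0). With 1.1 or 1.2 metadata the filesystem starts at an offset, so check the version with mdadm --examine on the member first.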