The simple question: how does initramfs know how to assemble mdadm RAID arrays at startup?
My problem: I boot my server and get:
Gave up waiting for root device.
ALERT! /dev/disk/by-uuid/[UUID] does not exist. Dropping to a shell!
This happens because /dev/md0 (which is /boot, RAID 1) and /dev/md1 (which is /, RAID 5) are not being assembled correctly. /dev/md0 isn't assembled at all. /dev/md1 is assembled, but instead of using /dev/sda2, /dev/sdb2, /dev/sdc2, and /dev/sdd2, it uses /dev/sda, /dev/sdb, /dev/sdc, and /dev/sdd.
To fix this and boot my server I do:
$(initramfs) mdadm --stop /dev/md1
$(initramfs) mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
$(initramfs) mdadm --assemble /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
$(initramfs) exit
And it boots properly and everything works. Now I just need the RAID arrays to assemble properly at boot so I don't have to manually assemble them. I've checked /etc/mdadm/mdadm.conf, and the UUIDs of the two arrays listed in that file match the UUIDs from $ mdadm --detail /dev/md[0,1].
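(For reference, the UUIDs can be compared with something along these lines; just a sketch:)
mdadm --detail /dev/md0 | grep -i uuid
mdadm --detail /dev/md1 | grep -i uuid
grep ^ARRAY /etc/mdadm/mdadm.conf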
Other details: Ubuntu 10.10, GRUB2, mdadm 2.6.7.1
UPDATE: I have a feeling it has to do with superblocks. $ mdadm --examine /dev/sda outputs the same thing as $ mdadm --examine /dev/sda2. $ mdadm --examine /dev/sda1 seems to be fine because it outputs information about /dev/md0. I don't know if this is the problem or not, but it seems to fit with /dev/md1 getting assembled with /dev/sd[abcd] instead of /dev/sd[abcd]2.
I tried zeroing the superblock on /dev/sd[abcd]. This removed the superblock from /dev/sd[abcd]2 as well and prevented me from being able to assemble /dev/md1 at all. I had to $ mdadm --create to get it back. This also put the superblocks back to the way they were.
Well, looking at the scripts used to assemble the initramfs, I'm thinking the problem is probably just that your /etc/mdadm/mdadm.conf is out of date.
When your system is up with the arrays assembled, execute the following command to update your mdadm config. You may want to double-check the resulting file just in case as well.
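On Ubuntu this usually boils down to regenerating the ARRAY lines from the running arrays, something like:
# Append ARRAY lines generated from the currently running arrays
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
# Double-check the result and remove any stale or duplicate ARRAY lines
cat /etc/mdadm/mdadm.conf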
Once done, update your initramfs with:
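On Ubuntu that is typically:
update-initramfs -u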
If this consistently fails, then your superblocks (the metadata used to assemble the arrays) may be shot. You may want to examine each of your drives and their partitions to verify. Worst case, zero out the superblocks via mdadm and recreate the arrays.
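A sketch of how to do that (device names here are only examples; the last command is destructive and wipes the RAID metadata on that member):
# Inspect the RAID superblock on each member partition
mdadm --examine /dev/sda1
mdadm --examine /dev/sda2
# Destructive last resort before recreating the array
mdadm --zero-superblock /dev/sda2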
It sounds like your initramfs was created when your RAID setup was wrong (or just different from how it is now) and hasn't been updated since.
You could run update-initramfs (which is normally run after kernel updates) and hopefully this will rebuild your initramfs file, including building in the right RAID configuration files.
Here's a workaround I came up with:
Add this script to /etc/initramfs-tools/scripts/local-top (see the sketch below). This fixes the RAID arrays before the system tries to mount md1 to /root. I had to add the pauses in order to get the commands to work consistently. This doesn't actually fix the problem, but it's the best solution I've found that doesn't require changing the RAID arrays or upgrading software.
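A sketch of such a script, reconstructed from the description above (the sleep values are assumptions; the prereqs boilerplate is what initramfs-tools expects from local-top scripts):
#!/bin/sh
# Stop the mis-assembled array, then reassemble both arrays from the
# correct partitions, pausing so the devices have time to settle.
PREREQ=""
prereqs()
{
    echo "$PREREQ"
}
case "$1" in
prereqs)
    prereqs
    exit 0
    ;;
esac

sleep 5
mdadm --stop /dev/md1
sleep 5
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
sleep 5
mdadm --assemble /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
sleep 5
The script needs to be executable (chmod +x), and update-initramfs -u has to be re-run so it gets copied into the initramfs.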
I have the same problem, and found this link that explains why it happens: https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/599515. It seems that your sda2 partition goes all the way to the end of the disk, so the RAID superblock at the end of sda2 also sits at the end of the whole disk. That makes sda and sda2 look the same to mdadm, and it ends up assembling md1 with sda instead of sda2.
To answer the question: yes, it does have to do with superblocks. The technical documentation is here: https://raid.wiki.kernel.org/index.php/RAID_superblock_formats
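A quick way to see which superblock format an array and its members are using, for example:
mdadm --detail /dev/md1 | grep -i version
mdadm --examine /dev/sda2 | grep -i version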
Are /dev/sd[abcd]2 set as type "fd" (RAID auto-detect) in the partition table? Run fdisk -l | less to see the partition tables. It sounds like the initrd is not detecting the partitions, but is then seeing the superblock on the raw device. Or it could be that there's an incorrect mdadm.conf on the initrd, but I would expect that update-initramfs would fix that.
You can extract the initrd by creating a directory, cd-ing into it, and then unpacking the image (see the sketch below). Then you can see all the files that make up the initrd and any scripts that it runs. Investigating these may help track down what exactly is causing it.
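For example, assuming a gzip-compressed initrd (the default here) and /tmp/initrd as a scratch directory:
mkdir /tmp/initrd
cd /tmp/initrd
zcat /boot/initrd.img-$(uname -r) | cpio -idv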
But first check the partition tables...
Similar problem with RAID + LVM on a Debian Lenny box. Before exiting the initramfs shell, a couple of commands have to be run by hand; a sketch of the sequence follows.
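The exact commands depend on the layout; assuming the root filesystem is on LVM on top of the md array, the usual sequence in the initramfs shell is something like:
$(initramfs) mdadm --assemble --scan
$(initramfs) lvm vgchange -ay
$(initramfs) exit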