I have a machine running Arch x86_64 2.6.30.
Its root filesystem is on a RAID 5 array that is assembled automatically as /dev/md0.
This is done via the kernel params as follows:
kernel /vmlinuz26 md=0,/dev/sda3,/dev/sdb3,/dev/sdc3 md=2,/dev/sda2,/dev/sdb2,/dev/sdc2 rootfstype=ext4 root=/dev/md0 ro
This used to work fine; however, it sometimes fails to assemble the md0 array.
Everything starts out normally: the kernel probes for block IDs, finds matches, and then waits up to 10 seconds for the array to assemble.
Usually this finishes instantly, but sometimes it waits the full 10 seconds, times out, and drops to a recovery console that I can't use because it doesn't accept any input. I suspect that's because the only keyboards I have are USB keyboards (even though they work fine in the GRUB menu).
When this happens, I just reboot and the array assembles just fine.
Btw, this happens around 30% of the time.
It happens after clean shutdowns.
It can happen more than once in a row.
Since it fails to mount the rootfs, it can't write anything to the logs.
Has anyone ever seen something like this?
Any ideas why it might be happening?
Without even a screenshot of the error, I don't think anybody can say for sure. The best thing you can do is introduce an initrd into your boot process and switch to identifying devices by UUID instead of by device name (device names are subject to change, for example if you leave a pendrive in one of the USB slots).
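For instance (the UUID below is only a placeholder; read the real one with blkid), and assuming your initrd is able to resolve it, the boot entry could reference the filesystem rather than a device name:

blkid /dev/md0
/dev/md0: UUID="0a1b2c3d-placeholder" TYPE="ext4"

kernel /vmlinuz26 root=/dev/disk/by-uuid/0a1b2c3d-placeholder rootfstype=ext4 ro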
Since you're building the RAID out of partitions, check that they're all marked as Linux RAID Autodetect (type FD) rather than Linux (type 83). As long as you don't change the partition size, you can safely modify the type with your partition editor of choice. If you're feeling particularly paranoid, just do one partition at a time so you don't nuke the whole RAID.
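A rough sketch of how to check and change the type with fdisk (device names taken from your kernel line; the interactive prompts may differ slightly between fdisk versions):

fdisk -l /dev/sda        # look at the Id column for sda2 and sda3
fdisk /dev/sda
  t                      # change a partition's type
  3                      # partition number
  fd                     # Linux raid autodetect
  w                      # write the table and quit

Repeat for /dev/sdb and /dev/sdc, then reboot so the kernel re-reads the partition tables.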
Your kernel is sufficiently modern that, as long as the appropriate md/RAID support is either built in or supplied in an initrd, it will scan all of the available partitions. For every FD-type partition it finds, it checks for an md superblock, and all partitions carrying the same array UUID are reassembled automatically without you having to spell it out at boot.
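You can verify the superblocks and check that all members report the same array UUID with mdadm (the exact fields in the output vary with the metadata version):

mdadm --examine /dev/sda3 /dev/sdb3 /dev/sdc3 | grep -i uuid
mdadm --detail /dev/md0

All three --examine lines should show an identical UUID; if one doesn't, that member won't be picked up with the rest.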
You can of course keep defining it too, but I'd start by checking that the partition types are all FD, then drop the md= definitions and see if that helps.
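If autodetection does its job, the GRUB entry could shrink to something like this (just a sketch that keeps your existing root and fstype settings, minus the md= definitions):

kernel /vmlinuz26 root=/dev/md0 rootfstype=ext4 ro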
My guess is that your kernel may simply be re-ordering the drive names, since without something like udev to do persistent naming it just assigns them on a first-come, first-served basis.
My suggestion would be to use an initrd that includes udev, so you can make sure the drives end up with the right names. It would be even more robust to use udev (or something similar, custom-built) and then refer to the drives via /dev/disk/by-uuid/, since those paths usually don't change.
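On Arch that roughly translates to something like the following (hook names and the preset name depend on your mkinitcpio version, so treat this as an outline rather than exact config):

mdadm --examine --scan >> /etc/mdadm.conf      # record the array by UUID instead of sdX names
# in /etc/mkinitcpio.conf, make sure udev and mdadm are listed in HOOKS, e.g.:
# HOOKS="base udev autodetect pata scsi sata mdadm filesystems"
mkinitcpio -p kernel26                         # rebuild the initrd

With the array described by UUID in mdadm.conf, it no longer matters in which order the kernel enumerates the disks, and you can also switch root= to /dev/disk/by-uuid/<filesystem-uuid> if you prefer.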