About a month and a half ago I noticed that the two hard drives in an OpenSuSE 11.3 server were dying. Going by the SMART data, I replaced /dev/sdb first: I removed the drive from the array, shut down the server, swapped the drive, rebooted, and added the new drive back into the array. So far, so good. IIRC, I also installed GRUB onto this drive. Then I started replacing and rebuilding /dev/sda. I can't remember whether I shut down to replace /dev/sda (the drives are in hot-swap carriers/bays), but on the first reboot I had all kinds of GRUB trouble that prevented me from booting into the OS. I finally got it booted, but noticed some strange behavior. For example, according to /proc/mdstat only one drive is active in the array:
openvpn01:/home/Kendall # cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb3[2]
      20972784 blocks super 1.0 [2/1] [_U]
      bitmap: 1/161 pages [4KB], 64KB chunk

md1 : active raid1 sdb2[2]
      5245208 blocks super 1.0 [2/1] [_U]
      bitmap: 2/11 pages [8KB], 256KB chunk

md0 : active raid1 sdb1[2]
      1052212 blocks super 1.0 [2/1] [_U]
      bitmap: 0/9 pages [0KB], 64KB chunk

unused devices: <none>
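For reference, replacing /dev/sdb went roughly like this (from memory, so treat the exact invocations as an approximation; copying the partition table with sfdisk in particular is an assumption):

# fail and remove the old drive's partitions from each array
mdadm --manage /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
mdadm --manage /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
mdadm --manage /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3
# shut down, swap the physical drive, reboot, then partition the new drive
sfdisk -d /dev/sda | sfdisk /dev/sdb
# add the new partitions back and put GRUB on the new drive
mdadm --manage /dev/md0 --add /dev/sdb1
mdadm --manage /dev/md1 --add /dev/sdb2
mdadm --manage /dev/md2 --add /dev/sdb3
grub-install /dev/sdb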
Hm, alright, so I attempt to add the /dev/sda partitions back into the arrays, starting with /dev/sda1:
mdadm --manage /dev/md0 --add /dev/sda1
mdadm: add new device failed for /dev/sda1 as 3: Device or resource busy
That's odd...but notice what lsof shows us:
openvpn01:/home/Kendall # lsof /dev/sda3 | head -15
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
init 1 root cwd DIR 8,3 4096 128 /
init 1 root rtd DIR 8,3 4096 128 /
init 1 root txt REG 8,3 39468 404103 /sbin/init
init 1 root mem REG 8,3 91812 33849572 /lib/libaudit.so.1.0.0
init 1 root mem REG 8,3 17392 33648690 /lib/libdl-2.11.2.so
init 1 root mem REG 8,3 1674953 33683537 /lib/libc-2.11.2.so
init 1 root mem REG 8,3 55024 33994082 /lib/libpam.so.0.82.2
init 1 root mem REG 8,3 120868 33828745 /lib/libselinux.so.1
init 1 root mem REG 8,3 143978 33683531 /lib/ld-2.11.2.so
kthreadd 2 root cwd DIR 8,3 4096 128 /
...so it appears that the root filesystem is actually running from /dev/sda3. In the MD-RAID setup, md2 is the root FS array, and /dev/sd[ab]3 are the partitions in that array. Looking at the mounts:
openvpn01:/home/Kendall # cat /proc/mounts
/dev/sda3 / xfs rw,relatime,attr2,noquota 0 0
/dev/md1 /boot ext4 rw,relatime,user_xattr,acl,barrier=1,data=ordered 0 0
/dev/sda3 definitely has the root filesystem mounted on it, yet /boot is using the array.
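For what it's worth, I assume the following would show whether the bootloader/initrd was explicitly told to use /dev/sda3 as root (GRUB legacy paths, since this is 11.3); I can post the output if it helps:

# what root= the kernel was actually booted with
cat /proc/cmdline
# what the (legacy) GRUB config passes as root=
grep -B1 -A3 "^title" /boot/grub/menu.lst
# what fstab thinks the root device should be
awk '$2 == "/"' /etc/fstab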
Additionally, when I go into the Boot Loader Configuration screen in yast2 and look at the boot loader details, it still lists the old drives under "Disk order settings" (I can tell by the serial numbers).
Basically, I am now worried about the array. The OS thinks there is only one drive in the array, and that is NOT the drive the root filesystem is mounted on! I'm planning to fix the remaining GRUB issues in the next few days, but I'm worried about what will happen to the data on the array: will it be able to rebuild itself without hosing any (or all) of my data?
Hopefully I've provided enough detail; if not, please comment and I'll add whatever is needed.
Thanks,
Kendall
What probably happened was that your initrd decided to use /dev/sda3 as the root filesystem, but when it was building the MD arrays, /dev/sdb3 had a later modification time than /dev/sda3 and was used to back the arrays.

# mdadm --examine /dev/sd??
# mdadm --detail /dev/md?

may give you some additional clues as to what's going on.

The safest route would be just doing a backup and rebuild from a live CD, as Zoredache recommends. Make sure you back up both /dev/sda? and the MD arrays - one or both may have more recent data than the other.
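A minimal sketch of what that backup could look like from a live CD, assuming /dev/sdb3 is indeed the member currently backing md2 and that an external backup disk shows up as /dev/sdc1 (both assumptions):

# assemble the degraded root array explicitly from the member that was backing it
mdadm --assemble --run /dev/md2 /dev/sdb3
mkdir -p /mnt/sda3 /mnt/md2 /mnt/backup
# mount both copies read-only; nouuid is needed because both XFS
# filesystems carry the same UUID
mount -o ro /dev/sda3 /mnt/sda3
mount -o ro,nouuid /dev/md2 /mnt/md2
# hypothetical external backup disk
mount /dev/sdc1 /mnt/backup
# copy both, preserving hard links, ACLs and extended attributes
rsync -aHAX /mnt/sda3/ /mnt/backup/sda3/
rsync -aHAX /mnt/md2/ /mnt/backup/md2/

Once both copies are safely backed up, you can compare them and decide which one to rebuild the array from.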