I've got a server running Debian Squeeze with a RAID 5 array of three 500 GB drives, which I didn't set up myself. When booting, the status of one partition in the RAID array seems to be bad.
md: bind<sda2>
md: bind<sdc2>
md: bind<sdb2>
md: kicking non-fresh sda2 from array!
md: unbind<sda2>
md: export_rdev(sda2)
raid5: device sdb2 operational as raid disk 1
raid5: device sdc2 operational as raid disk 2
raid5: allocated 3179kB for md1
1: w=1 pa=0 pr=3 m=1 a=2 r=3 op1=0 op2=0
2: w=2 pa=0 pr=3 m=1 a=2 r=3 op1=0 op2=0
raid5: raid level 5 set md1 active with 2 out of 3 devices, algorithm 2
RAID5 conf printout:
--- rd:3 wd:2
disk 1, o:1, dev:sdb2
disk 2, o:1, dev:sdc2
md1: detected capacity change from 0 to 980206485504
md1: unknown partition table
/proc/mdstat also tells me the partition is missing:
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 sdb2[1] sdc2[2]
957232896 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]
md0 : active raid1 sda1[0] sdc1[2](S) sdb1[1]
9767424 blocks [2/2] [UU]
When running sudo mdadm -D, the partition shows up as removed and the array as degraded.
/dev/md1:
Version : 0.90
Creation Time : Mon Jun 30 00:09:01 2008
Raid Level : raid5
Array Size : 957232896 (912.89 GiB 980.21 GB)
Used Dev Size : 478616448 (456.44 GiB 490.10 GB)
Raid Devices : 3
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Thu Aug 11 16:58:50 2011
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 03205c1c:cef34d5c:5f1c2cc0:8830ac2b
Events : 0.275646
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2
2 8 34 2 active sync /dev/sdc2
/dev/md0:
Version : 0.90
Creation Time : Mon Jun 30 00:08:50 2008
Raid Level : raid1
Array Size : 9767424 (9.31 GiB 10.00 GB)
Used Dev Size : 9767424 (9.31 GiB 10.00 GB)
Raid Devices : 2
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu Aug 11 17:21:20 2011
State : active
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
UUID : f824746f:143df641:374de2f8:2f9d2e62
Events : 0.93
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 8 33 - spare /dev/sdc1
However, md0 seems to be OK. So, what does all this tell me? Can the disk be faulty even though md0 is working? If not, can I just re-add /dev/sda2 to the md1 array to solve the problem?
Keeping the array working with a broken disk is the exact purpose of RAID 5: it stores redundancy information so you can lose one disk without losing data.
I would recommend replacing the disk as soon as possible, because if you lose another disk, all your data will be gone.
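Before swapping hardware, it may be worth confirming that the drive itself is unhealthy rather than having been kicked after a transient glitch. A minimal sketch (smartmontools may need to be installed first):
# any I/O errors logged against sda?
dmesg | grep -i 'sda'
# SMART health and the counters that usually betray a dying disk
sudo apt-get install smartmontools
sudo smartctl -H /dev/sda
sudo smartctl -A /dev/sda | egrep -i 'realloc|pending|uncorrect'
Even if SMART looks clean, a partition that keeps getting kicked from the array is a good reason to follow the advice here and replace the drive.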
The R in RAID stands for Redundant.
RAID 5 is N+1 redundant: if you lose one disk you're at N, and the system will keep operating fine as long as you don't lose another one. If you lose a second disk you are at N-1 and your universe collapses (or at the very least you lose lots of data).
Like SvenW said, replace the disk AS SOON AS POSSIBLE. (Follow your distribution's instructions for replacing disks in md RAID arrays, and for God's sake make sure you replace the correct disk! Pulling out one of the active disks will really screw up your day.)
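One way to make sure you pull the right drive is to note its serial number before powering down; a minimal sketch:
# print the drive's serial number
sudo smartctl -i /dev/sda | grep -i 'serial'
# or match the kernel name to the by-id symlinks, which embed the serial
ls -l /dev/disk/by-id/ | grep 'sda'
Compare that serial against the label on the physical disk before removing it.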
Also be aware that when you replace a disk in a RAID 5 there is a lot of resulting disk activity as the new drive is rebuilt (lots of reads on the old disks, lots of writes on the new one). This has two major implications:
1. Your system will be slow during the rebuild. How slow depends on your disks and disk I/O subsystem (see the sketch below for monitoring and throttling the rebuild).
2. You may lose another disk during/shortly after the rebuild. (All that disk I/O sometimes triggers enough errors from another drive that the controller declares it "bad".)
The chances of #2 increase as you have more disks in your array and follow the standard "bathtub curve" of hard drive mortality. This is part of why you should have a backup, and one of the many reasons you hear the mantra "RAID is not a backup" repeated so often on ServerFault.
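On the rebuild itself: progress shows up in /proc/mdstat, and the md rebuild speed limits let you trade rebuild time against system responsiveness. A minimal sketch; the value below is only an example, not a recommendation:
# watch the rebuild/resync progress
watch cat /proc/mdstat
# current md rebuild throttling limits (KB/s)
cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
# example: cap the rebuild rate to keep the system responsive during the resync
echo 50000 | sudo tee /proc/sys/dev/raid/speed_limit_max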
Even though /dev/sda1 appears to be working fine in md0 now, the fact that the other partition on the same disk (sda2) is faulty bodes ill for the health of the drive. I must concur with the other opinions already expressed here: replace the sda drive immediately.
Of course, that means you will need to mdadm --fail and mdadm --remove partition sda1 from array md0, even though it appears to be fine right now. And when you install the replacement drive, you will need to ensure that its partitions are at least as large as those on the old drive, so that its partitions can be properly added to the md0 and md1 arrays.
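Putting the answers together, the replacement could look roughly like the sketch below. It assumes the new disk shows up as /dev/sda again and that the drives use MBR (msdos) partition tables, which is likely for an array created in 2008 but worth verifying; adjust device names to your system.
# drop the old disk's partition from md0 (sda2 has already been kicked from md1)
sudo mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
# power off, swap the physical drive, boot, then copy the partition layout
# from a surviving disk (sfdisk handles MBR tables; GPT would need sgdisk)
sudo sfdisk -d /dev/sdb | sudo sfdisk /dev/sda
# add the new partitions back and let md rebuild
sudo mdadm /dev/md0 --add /dev/sda1
sudo mdadm /dev/md1 --add /dev/sda2
# monitor the resync
cat /proc/mdstat
Note that md0 has a spare (sdc1), so failing sda1 out of it will likely trigger an immediate rebuild onto that spare; that is expected behaviour.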