I've got 4x 500GB drives in software RAID:
/dev/md0 is RAID 1 and mounted to /boot
/dev/md1 is RAID 10 and is swap
/dev/md2 is RAID 10 and is the main system and data device
I looked at mdadm this evening and noticed on md2...
State : clean, degraded
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 0 0 1 removed
2 8 35 2 active sync /dev/sdc3
3 8 51 3 active sync /dev/sdd3
Checking md0 and md1 all drives are shown as active sync and the device state as clean.
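To spot this condition without reading the table by eye, the device table from `mdadm --detail` can be filtered for slots whose state is `removed`. This is only a minimal sketch, assuming the column layout shown above (Number, Major, Minor, RaidDevice, State). `removed_slots` is a hypothetical helper name, and the here-document simply replays the md2 rows quoted in this post.

```shell
#!/bin/sh
# Hypothetical helper: print the RaidDevice slot of every member that
# mdadm --detail reports as "removed". Assumes the five-column layout
# (Number Major Minor RaidDevice State) shown in this post. Some mdadm
# versions print "-" instead of a number in the first column for a
# removed slot, so both forms are accepted.
removed_slots() {
    awk '$1 ~ /^([0-9]+|-)$/ && $5 == "removed" { print $4 }'
}

# Demo on the md2 rows quoted above. On a live system this would be:
#   mdadm --detail /dev/md2 | removed_slots
removed_slots <<'EOF'
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 0 0 1 removed
2 8 35 2 active sync /dev/sdc3
3 8 51 3 active sync /dev/sdd3
EOF
```

An empty result means no slot is marked as removed; here the helper reports slot 1, the one /dev/sdb3 used to occupy.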
Here are the full outputs from mdadm for each device, plus the output of /proc/mdstat: http://pastebin.com/VL0uYdU9
So it looks like /dev/sdb1 and /dev/sdb2 are functioning in /dev/md0 and /dev/md1 respectively. But /dev/sdb3 has dropped out (apparently it's been removed) from /dev/md2
With RAID 10 I believe the data is OK unless I also lose the drive on the other side of that mirror. I am of course backing up to an external device and have verified that the backups are sound.
I've done some log grepping and noticed this pair of log lines...
Dec 9 04:25:37 hostname smartd[3199]: Device: /dev/sdb, 1 Currently unreadable (pending) sectors
Dec 9 04:25:37 hostname smartd[3199]: Device: /dev/sdb, 1 Offline uncorrectable sectors
These repeat every 30 minutes. This appears to have been going on for a while, and it looks like the drive has failed a SMART data check.
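Since smartd logs the same warning on every polling cycle, it can help to collapse the log down to the distinct warnings per device. This is a rough sketch, assuming the syslog layout of the two lines quoted above. `smart_warnings` is a hypothetical helper name, and the log path in the comment is an assumption about this system.

```shell
#!/bin/sh
# Hypothetical helper: reduce smartd log lines to their distinct
# "Device: ..." messages, each prefixed with how often it occurred.
# Lines that do not come from smartd are ignored.
smart_warnings() {
    awk '/smartd\[/ && match($0, /Device: .*/) {
             count[substr($0, RSTART, RLENGTH)]++
         }
         END { for (msg in count) print count[msg] "x " msg }' |
        LC_ALL=C sort
}

# Demo on the two lines quoted above. On a live system this might be
# (log path is an assumption):
#   smart_warnings < /var/log/messages
smart_warnings <<'EOF'
Dec 9 04:25:37 hostname smartd[3199]: Device: /dev/sdb, 1 Currently unreadable (pending) sectors
Dec 9 04:25:37 hostname smartd[3199]: Device: /dev/sdb, 1 Offline uncorrectable sectors
EOF
```

For the drive's own view, `smartctl -H /dev/sdb` prints the overall health verdict and `smartctl -A /dev/sdb` lists the attribute table, which should include the pending and offline-uncorrectable sector counts that smartd is complaining about.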
On Jan 7th an idiot user rebooted the server, thinking it would solve a mail relay problem.
Here's the boot log from /var/log/messages: http://pastebin.com/jGVsDD54
Why do /dev/sdb1 and /dev/sdb2 appear to be functioning OK when only /dev/sdb3 has failed?
Is it just a particular bad sector that happens to be on sdb3?
Is it worth attempting to re-add this partition to the md2 array?
Or should I bin the drive and replace with a fresh drive?
A SMART failure indicates that an overall drive failure is imminent (the timeframe is impossible to predict, however); replace this drive ASAP.
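A rough outline of what the replacement could look like, written as a dry run that only prints the commands. The assumptions are not confirmed by the question: that /dev/sda is a healthy disk with the same partition layout, that the drives use a partition table `sfdisk` can dump and restore, and that the new disk appears as /dev/sdb again. `plan` is a hypothetical function name. Check every device name against your own `mdadm --detail` output before running anything for real.

```shell
#!/bin/sh
# Dry-run sketch: every step is only printed, nothing is executed.
# Assumptions (not confirmed by the post): /dev/sda is a healthy disk
# with the same partition layout, and the replacement disk shows up
# as /dev/sdb again.
BAD=/dev/sdb
GOOD=/dev/sda

plan() {
    # sdb1 and sdb2 are still active in md0 and md1, so they need to be
    # failed and removed before the disk is pulled. sdb3 is already
    # listed as removed from md2.
    echo "mdadm /dev/md0 --fail ${BAD}1 --remove ${BAD}1"
    echo "mdadm /dev/md1 --fail ${BAD}2 --remove ${BAD}2"
    # After the physical swap: copy the partition table from the
    # healthy disk onto the new one.
    echo "sfdisk -d ${GOOD} | sfdisk ${BAD}"
    # Add the new partitions; md resyncs each array in the background.
    echo "mdadm /dev/md0 --add ${BAD}1"
    echo "mdadm /dev/md1 --add ${BAD}2"
    echo "mdadm /dev/md2 --add ${BAD}3"
    # Watch the rebuild progress.
    echo "cat /proc/mdstat"
}

plan
```

Printing the plan first makes it easy to review the device names; the array stays degraded until the resync of md2 completes, so the backups remain the safety net in the meantime.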