I was trying to replace a faulty HDD with a new one, but the new HDD will not sync with the old one. The sync process runs up to about 30% and then stops.
cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda3[0] sdb3[2](S)
1458319504 blocks super 1.0 [2/1] [U_]
md1 : active raid1 sda2[3] sdb2[2]
524276 blocks super 1.0 [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[2]
6291444 blocks super 1.0 [2/2] [UU]
md0 and md1 synced successfully, but md2 did not. Here are the details:
mdadm --detail /dev/md2
/dev/md2:
Version : 1.0
Creation Time : Fri May 24 11:22:21 2013
Raid Level : raid1
Array Size : 1458319504 (1390.76 GiB 1493.32 GB)
Used Dev Size : 1458319504 (1390.76 GiB 1493.32 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Mon Aug 4 22:08:23 2014
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Name : rescue:2 (local to host rescue)
UUID : 96b46a6c:f520938c:f94879df:27851e8a
Events : 616
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 0 0 1 removed
2 8 19 - spare /dev/sdb3
Is there any solution? I want to back up my data.
The mdadm "--grow" switch should pull the spare into the array. Something like "# mdadm --grow /dev/md2 --raid-devices=3". If that fails, I'd tail syslog to find out why.
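In practice that approach might look roughly like this (the syslog path is an assumption; it varies by distro):
mdadm --grow /dev/md2 --raid-devices=3
watch cat /proc/mdstat        # watch the rebuild progress
tail -f /var/log/syslog       # look here if the rebuild stalls again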
This should do the job; /dev/sdb3 is still marked as a spare, hence the (S).
If this is not enough, you can remove it and try re-adding it:
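A sketch, using the device names from the output above (double-check them on your system):
mdadm /dev/md2 --remove /dev/sdb3    # works because sdb3 is currently a spare
mdadm /dev/md2 --add /dev/sdb3       # add it back so the rebuild can start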
You may want to stop and restart the array:
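Roughly (make sure nothing on md2 is mounted or otherwise in use first):
mdadm --stop /dev/md2
mdadm --assemble /dev/md2 /dev/sda3 /dev/sdb3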
And your last option would be to force the resync (don't worry, it's not destructive):
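One non-destructive way to kick off a resync is through the md sysfs interface (assuming md2 as above; "check" is the read-only variant if you prefer it):
echo repair > /sys/block/md2/md/sync_action   # ask md to re-read everything and rewrite anything inconsistent
cat /proc/mdstat                              # the action shows up here while it runs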
Also, just restarting the array is usually enough to do the job without further hassle. And there is more: you can even re-create the whole thing with mdadm --create. ;)
Sorry for the late arrival; I am surprised nobody answered this. There is even a link to a similar problem, but I doubt cables are in play in this case.
You started a sync to a new disk, but when the sync reached 30%, the source (the last remaining drive, the one that has all the data) hit a read error. On a read error, the Linux MD RAID driver retries the read from the other component devices, but in this case there is no synced component device to read from, so it gives up. It stops the sync at the first such unrecoverable error and then restarts the sync from the beginning. Of course, pulling the spare out and re-adding it won't help. You have to use other means to complete the sync, or otherwise retrieve the (slightly corrupted) data.
The system might still work perfectly, because this sector may not contain any data, so it is never read during normal operation; but a RAID sync is the special case where everything gets read. We call such a case a silent bad block.
The first idea is to force the drive to remap the bad block internally. Unfortunately, there is no way to do this with a guarantee, but there is a high chance that if you write to this particular sector, it will get remapped and then read back successfully. To do that, one can use the hdparm utility (notice that --repair-sector is an alias for --write-sector):
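For example (the device name is an assumption; point it at the disk that is throwing the read errors, here presumably the remaining source disk, and use your own sector number):
hdparm --repair-sector 448271680 --yes-i-know-what-i-am-doing /dev/sda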
I deliberately put an almost random number here. That is 896543360/2, where the big number was taken from the dmesg error message. You have to calculate it yourself for your case. Be extremely careful. I suggest doing a read check (--read-sector) with the same number first, to trigger the same error message and thereby prove that this is indeed the right sector. Note that you will lose whatever is in this sector, but it is unreadable anyway, so it is already essentially lost; and if it is a silent bad block, there was no useful information in it. Repeat this for all unreadable blocks. You'll need to replace this drive too, once the sync is complete.
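To find the sector numbers, pull them from the kernel log; something along these lines (the exact message wording varies by kernel and driver):
dmesg | grep -iE 'i/o error|unrecovered read error|sector'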
The other way to help the situation requires stopping the service for an extended period of time. You need to stop the faulty RAID and run ddrescue from the failing disk to a new disk. After that, you first need to remove the old device completely and start the system from the new disk (with degraded arrays, I know). Then, if it works, add another new disk and complete the sync. In case you wondered, I have happened to do successful repairs both ways.
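For the ddrescue step, something like this (device names and the map file path are examples only; triple-check which disk is the failing source and which is the blank target before running):
ddrescue -f -r3 /dev/sda /dev/sdc /root/sda-rescue.map   # -f: allow writing to a block device, -r3: retry the bad areas three times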
The lesson here is: just having a RAID is not enough. For your data to be safe you need to monitor your array's health, scrub it periodically (i.e. perform a read check of all devices and compare them, to be sure every block gets read) and, of course, take the required actions in a timely manner. Hardware RAIDs also have the ability to set up automatic periodic scrubbing. For each MD RAID, you should do this once a month:
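For example, for md2 (repeat for each of your arrays; this is the standard md sysfs check trigger):
echo check > /sys/block/md2/md/sync_action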
(Debian has this by default, AFAIK.) So when some disk gets a silent unreadable sector, you will discover it within a month. Then don't forget to replace the dying disk as soon as possible!