I'm trying to solve this problem I'm having with an mdadm raid1.
I have an Ubuntu 9.04 server running on a software 2-drive RAID1 with mdadm. Yesterday one of the drives failed, so I replaced it with a brand new drive of the same size. I removed the faulty drive, copied the partition from the remaining good drive to the new drive, and then added it to the raid. It re-synced and the system worked fine, until the drive that hadn't failed was also marked as failed.
Now I had the raid running solely on the new drive, so I purchased another drive and repeated the procedure above. At that point I had two brand new drives and the raid was syncing. However, after a few minutes I checked /proc/mdstat and the raid was no longer syncing.
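For reference, the replacement procedure I followed the second time around was roughly this (reconstructed from memory, so treat the exact commands as approximate; sfdisk is just how I'd describe the partition-copy step):

# copy the partition table from the remaining good drive (sdb) to the new drive (sdc)
sfdisk -d /dev/sdb | sfdisk /dev/sdc
# add the new drive's raid partition back into the array
mdadm /dev/md1 --add /dev/sdc3
# watch the resync progress
cat /proc/mdstat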
mdadm --detail /dev/md1 shows: (sdb is the first new drive, and sdc is the second new drive)
root@dola:/home/jjaramillo# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90
  Creation Time : Sat Dec 20 00:42:05 2008
     Raid Level : raid1
     Array Size : 974711680 (929.56 GiB 998.10 GB)
  Used Dev Size : 974711680 (929.56 GiB 998.10 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Jun 2 10:09:35 2010
          State : clean, degraded
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

           UUID : bba497c6:5029ba0b:bfa4f887:c0dc8f3d
         Events : 0.5395594

    Number   Major   Minor   RaidDevice State
       2       8       35        0      spare rebuilding   /dev/sdc3
       1       8       19        1      active sync   /dev/sdb3
I've tried removing and re-adding the drive a few times, but the same thing happens: the raid fails to resync. I've looked at /var/log/messages and found the following:
Jun 2 07:57:36 dola kernel: [35708.917337] sd 5:0:0:0: [sdb] Unhandled sense code
Jun 2 07:57:36 dola kernel: [35708.917339] sd 5:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 2 07:57:36 dola kernel: [35708.917342] sd 5:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
Jun 2 07:57:36 dola kernel: [35708.917346] Descriptor sense data with sense descriptors (in hex):
Jun 2 07:57:36 dola kernel: [35708.917348]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jun 2 07:57:36 dola kernel: [35708.917357]         00 43 9e 47
Jun 2 07:57:36 dola kernel: [35708.917360] sd 5:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
So it looks like there's some kind of error on sdb (the first new drive). My question is: what would be the best approach to get the raid up and running again? I've thought about dd'ing /dev/md1 to a blank hard drive, then re-doing the raid from scratch and loading the data back, but there could be an easier solution.
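What I have in mind for the dd fallback is only a sketch (the target /dev/sdd would be a hypothetical spare disk; conv=noerror,sync is there so read errors don't abort the copy):

dd if=/dev/md1 of=/dev/sdd bs=1M conv=noerror,sync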
Any help would be appreciated.
RE:
You shouldn't be copying partitions on your own.
The only thing you should have to do is put the new drive into your system, and use mdadm to add it to your raid group.
If you really did do a copy (i.e. a dd if=/dev/good_disk of=/dev/new_disk), you probably wound up copying the raid UUIDs and other metadata that let mdadm know which disk is which, and that confuses it.
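If that's what happened, one way out (just a sketch; /dev/new_disk3 is a placeholder for the raid partition on the new drive, e.g. sdc3, so double-check the device name first) is to wipe the copied superblock so mdadm treats the disk as a blank member again, then add it back:

# clear any md superblock that came along with the copy
mdadm --zero-superblock /dev/new_disk3
# then add it back as a fresh member
mdadm /dev/md1 --add /dev/new_disk3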
Install the new HD, partition it like Tom O'Connor suggested, and then use mdadm to repair the array. See the mdadm man page under "For Manage mode:" for the --add option.
You may have to "--fail" the first replacement drive first.
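In Manage mode that would look roughly like this (a sketch only; /dev/sdX3 is a placeholder, so substitute the member you actually intend to replace and make sure it is not the only remaining good copy):

# mark the member as failed and pull it out of the array
mdadm /dev/md1 --fail /dev/sdX3 --remove /dev/sdX3
# add the freshly partitioned replacement
mdadm /dev/md1 --add /dev/sdX3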
You shouldn't attempt to prepare the new drive in any meaningful way unless your raid constituents are actually disk PARTITIONS rather than whole disks. In that case, you would create a partition on the new drive that is the same size as the one on the remaining active disk.
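A quick way to check which case you're in and to size the new partition (sfdisk is just one option here, and the device names are examples):

# members listed as e.g. sdb3/sdc3 mean the array is built on partitions, not whole disks
cat /proc/mdstat
# compare the partition tables so the new partition is at least as large as the existing member
sfdisk -l /dev/sdb
sfdisk -l /dev/sdc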
You never need to touch the old drive at all -- it's assumed to be failed and unreliable.
The correct procedure is to remove the broken drive, add a new, empty drive, and then use mdadm to add that new drive to the array. You'd do it something like this:
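Something along these lines (a sketch only; the device names are examples, so adjust them for whichever member actually failed):

# remove the failed member from the array (if mdadm hasn't already dropped it)
mdadm /dev/md1 --remove /dev/sdc3
# physically swap in the new drive, partition it, then add it
mdadm /dev/md1 --add /dev/sdc3
# watch the rebuild
watch cat /proc/mdstat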
The kernel will then sync the new drive into the array, copying the data from the one remaining good drive.