I've got a server running Debian Squeeze with a RAID 5 array of three 500 GB drives, which I didn't set up myself. When booting, the status of one partition in the RAID array seems to be bad.
md: bind<sda2>
md: bind<sdc2>
md: bind<sdb2>
md: kicking non-fresh sda2 from array!
md: unbind<sda2>
md: export_rdev(sda2)
raid5: device sdb2 operational as raid disk 1
raid5: device sdc2 operational as raid disk 2
raid5: allocated 3179kB for md1
1: w=1 pa=0 pr=3 m=1 a=2 r=3 op1=0 op2=0
2: w=2 pa=0 pr=3 m=1 a=2 r=3 op1=0 op2=0
raid5: raid level 5 set md1 active with 2 out of 3 devices, algorithm 2
RAID5 conf printout:
--- rd:3 wd:2
disk 1, o:1, dev:sdb2
disk 2, o:1, dev:sdc2
md1: detected capacity change from 0 to 980206485504
md1: unknown partition table
/proc/mdstat also tells me the partition is missing:
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 sdb2[1] sdc2[2]
957232896 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]
md0 : active raid1 sda1[0] sdc1[2](S) sdb1[1]
9767424 blocks [2/2] [UU]
When running sudo mdadm -D, the partition shows up as removed and the array as degraded.
/dev/md1:
Version : 0.90
Creation Time : Mon Jun 30 00:09:01 2008
Raid Level : raid5
Array Size : 957232896 (912.89 GiB 980.21 GB)
Used Dev Size : 478616448 (456.44 GiB 490.10 GB)
Raid Devices : 3
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Thu Aug 11 16:58:50 2011
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 03205c1c:cef34d5c:5f1c2cc0:8830ac2b
Events : 0.275646
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2
2 8 34 2 active sync /dev/sdc2
/dev/md0:
Version : 0.90
Creation Time : Mon Jun 30 00:08:50 2008
Raid Level : raid1
Array Size : 9767424 (9.31 GiB 10.00 GB)
Used Dev Size : 9767424 (9.31 GiB 10.00 GB)
Raid Devices : 2
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu Aug 11 17:21:20 2011
State : active
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
UUID : f824746f:143df641:374de2f8:2f9d2e62
Events : 0.93
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 8 33 - spare /dev/sdc1
However, md0 seems to be OK. So, what does all this tell me? Can the disk be faulty even though md0 is working? If not, can I just re-add /dev/sda2 to the md1 array to solve the problem?
Keeping the array working with a broken disk is the exact purpose of RAID 5: it stores redundancy information so you can lose one disk without losing data.
I would recommend replacing the disk as soon as possible, because if you lose another disk, all your data will be gone.
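Before swapping hardware, it may be worth confirming that the drive itself is unhealthy rather than having been kicked after a transient glitch. A minimal sketch (smartmontools may need to be installed first):
# any I/O errors logged against sda?
dmesg | grep -i 'sda'
# SMART health and the counters that usually betray a dying disk
sudo apt-get install smartmontools
sudo smartctl -H /dev/sda
sudo smartctl -A /dev/sda | egrep -i 'realloc|pending|uncorrect'
Even if SMART looks clean, a partition that keeps getting kicked from the array is a good reason to follow the advice here and replace the drive.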
The R in RAID stands for Redundant.
RAID 5 is N+1 redundant: if you lose one disk you're at N, and the system will keep operating fine as long as you don't lose another one. If you lose a second disk you are at N-1 and your universe collapses (or at the very least you lose lots of data).
Like SvenW said, replace the disk AS SOON AS POSSIBLE. (Follow your distribution's instructions for replacing disks in md RAID arrays, and for God's sake make sure you replace the correct disk! Pulling out one of the active disks will really screw up your day.)
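One way to make sure you pull the right drive is to note its serial number before powering down; a minimal sketch:
# print the drive's serial number
sudo smartctl -i /dev/sda | grep -i 'serial'
# or match the kernel name to the by-id symlinks, which embed the serial
ls -l /dev/disk/by-id/ | grep 'sda'
Compare that serial against the label on the physical disk before removing it.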
Also be aware that when you replace a disk in a RAID 5 there is a lot of resulting disk activity as the new drive is rebuilt (lots of reads on the old disks, lots of writes on the new one). This has two major implications:
1. Your system will be slow during the rebuild. How slow depends on your disks and disk I/O subsystem (see the sketch below for monitoring and throttling the rebuild).
2. You may lose another disk during/shortly after the rebuild. (All that disk I/O sometimes triggers enough errors from another drive that the controller declares it "bad".)
The chances of #2 increase as you have more disks in your array and follow the standard "bathtub curve" of hard drive mortality. This is part of why you should have a backup, and one of the many reasons you hear the mantra "RAID is not a backup" repeated so often on ServerFault.
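On the rebuild itself: progress shows up in /proc/mdstat, and the md rebuild speed limits let you trade rebuild time against system responsiveness. A minimal sketch; the value below is only an example, not a recommendation:
# watch the rebuild/resync progress
watch cat /proc/mdstat
# current md rebuild throttling limits (KB/s)
cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
# example: cap the rebuild rate to keep the system responsive during the resync
echo 50000 | sudo tee /proc/sys/dev/raid/speed_limit_max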
Even though /dev/sda1 appears to be working fine in md0 now, the fact that the other partition on the same disk (sda2) is faulty bodes ill for the health of the drive. I must concur with the other opinions already expressed here: replace the sda drive immediately.
Of course, that means you will need to mdadm --fail and mdadm --remove partition sda1 from array md0, even though it appears to be fine right now. And when you install the replacement drive, you will need to ensure that its partitions are at least as large as those on the old drive, so that its partitions can be properly added to the md0 and md1 arrays.
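Putting the answers together, the replacement could look roughly like the sketch below. It assumes the new disk shows up as /dev/sda again and that the drives use MBR (msdos) partition tables, which is likely for an array created in 2008 but worth verifying; adjust device names to your system.
# drop the old disk's partition from md0 (sda2 has already been kicked from md1)
sudo mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
# power off, swap the physical drive, boot, then copy the partition layout
# from a surviving disk (sfdisk handles MBR tables; GPT would need sgdisk)
sudo sfdisk -d /dev/sdb | sudo sfdisk /dev/sda
# add the new partitions back and let md rebuild
sudo mdadm /dev/md0 --add /dev/sda1
sudo mdadm /dev/md1 --add /dev/sda2
# monitor the resync
cat /proc/mdstat
Note that md0 has a spare (sdc1), so failing sda1 out of it will likely trigger an immediate rebuild onto that spare; that is expected behaviour.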