I'm using mdadm for several RAID1 mirrors. md7 is an N-way mirror consisting of 3 spinning disks (all flagged write-mostly) and an SSD:
md7 : active raid1 sdd1[0] sde5[3](W) sdf5[4](W) sdc1[1](W)
234428416 blocks [4/4] [UUUU]
md6 : active raid1 sdf6[0] sde6[1]
1220988096 blocks [2/2] [UU]
md2 : active raid1 sdb6[0] sda6[1]
282229824 blocks [2/2] [UU]
md1 : active raid1 sdb2[0] sda2[1]
19534976 blocks [2/2] [UU]
md0 : active raid1 sdb1[0] sda1[1]
192640 blocks [2/2] [UU]
The entire system has hung 3 times in the past 2 weeks, requiring a hard reset. For the time being, I'm going to assume the system hang is unrelated to my md issue, although I can't completely discount that possibility. Each time we've rebooted, md7 has required a rebuild, but I can't figure out how to tell from the logs which disk triggered the rebuild. I thought iostat might be able to help me while the RAID was still rebuilding:
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 43.39 1038.34 558.83 223108 120075
sdb 66.88 1445.47 648.86 310588 139420
sdc 36.42 12.99 22256.81 2792 4782320
sdd 190.75 23227.78 331.14 4990954 71152
md0 2.11 21.39 0.23 4596 50
md1 173.72 1855.87 522.14 398770 112192
md2 11.68 65.84 27.59 14146 5928
md6 27.42 149.83 69.51 32194 14936
sde 75.83 70.81 22326.91 15214 4797384
sdf 79.31 99.41 22326.91 21360 4797384
sr0 0.04 2.61 0.00 560 0
md7 202.31 1287.41 331.07 276626 71136
...but it looks to me like md7 is using sdd to rebuild all the other disks in that RAID. I thought maybe this was simply because sdd is an SSD and all the other disks are marked write-mostly, but in that case, it should only rebuild the one disk that was out of sync (unless all the spinning disks just happened to be out of sync, which seems unlikely to me).
Another theory I have: the spinning disks are always out of sync after one of these crashes simply because the SSD's writes are so fast that it finishes writing a block while the others are still writing it, and the system happens to lock up before the slower disks complete that write. Is that plausible?
So, how do I tell which disk(s) triggered the resync? Is the fact that I have an n-way mirror with mixed SSD and spinning disks possibly responsible for the fact that all the spinning disks are always rebuilt after one of these freezes, or does the md driver guarantee that a block isn't considered written on one disk until it's successfully written on all disks?
I understand that (at least Linux) RAID works something like a filesystem for these purposes - if the system crashes while it's in use, it will need to be checked on reboot. So the cause of your system's crashes may not be any of the disks in the array.
As Michael points out above, the hangs and the consequent unclean shutdowns are the reason you are seeing your RAID rebuild. The kernel md driver resyncs unclean arrays to ensure they are truly in sync, since a hang, crash, or power loss gives no guarantee as to which writes actually got flushed out to disk.
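As for the "which disk triggered the resync?" part of the question, one approach is to compare the per-member "Events" counters that mdadm stores in each superblock: a member whose counter lags the others was stale at assembly time. This is a hedged sketch; the extraction is demonstrated on a sample line so it runs anywhere, and the commented commands (using the md7 member devices from the question) are what you would actually run, as root, on the live system:

```shell
# On the live system (as root), print each md7 member's event counter:
#   mdadm --examine /dev/sdc1 | awk '/Events/ {print $NF}'
# and likewise for /dev/sdd1, /dev/sde5 and /dev/sdf5, then compare --
# a lagging counter marks the stale member.
# Demonstrated here on a captured sample line so the snippet is runnable:
sample='         Events : 12345'
printf '%s\n' "$sample" | awk '/Events/ {print $NF}'
```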
Now, as to why sdd is getting used: the first thing to understand is that after an unclean shutdown, it is the array as a whole, not an individual member device, that gets marked dirty. The md(4) manpage linked above explains that an unclean RAID-1 is resynced by copying the contents of the first drive onto all the others. In your example, the md7 array has partitions on drives sdc, sdd, sde and sdf, but if you look at your mdstat output:

md7 : active raid1 sdd1[0] sde5[3](W) sdf5[4](W) sdc1[1](W)

note how the first member, marked with a [0], is sdd1, a partition on sdd. That's the reason sdd is being used as the read source -- it's the first drive in md7.
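The slot numbers in /proc/mdstat can be picked out mechanically. Here is a small sketch, using the mdstat line quoted in the question as sample input, that extracts the member holding slot [0], i.e. the device md reads from during a RAID-1 resync:

```shell
# Sample mdstat line taken from the question; on a live system you would
# instead grep the md7 line out of /proc/mdstat.
mdstat_line='md7 : active raid1 sdd1[0] sde5[3](W) sdf5[4](W) sdc1[1](W)'

# Match the token ending in "[0]" and strip the slot suffix.
slot0=$(printf '%s\n' "$mdstat_line" | grep -o '[a-z0-9]*\[0\]' | sed 's/\[0\]//')
echo "$slot0"
```

For the array above this prints sdd1, confirming that sdd holds the first slot.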