This is a somewhat theoretical question about ZFS and RAID-Z. I'll use a three disk single-parity array as an example for clarity, but the problem can be extended to any number of disks and any parity.
Suppose we have disks A, B, and C in the pool, and that it is clean.
Suppose now that we physically add disk D with the intention of replacing disk C, and that disk C is still functioning correctly and is being replaced only as preventive maintenance. Some admins would just yank C and install D, which is a little tidier since devices need not change IDs; however, that leaves the array temporarily degraded, so for this example suppose we install D without offlining or removing C. The Solaris docs indicate that we can replace a disk without first offlining it, using a command such as:
zpool replace pool C D
This should cause a resilvering onto D. Let us say that resilvering proceeds "downwards" along a "cursor." (I don't know the actual terminology used in the internal implementation.)
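For concreteness, the whole procedure might look like this on a live system (the pool and device names here are placeholders, not from an actual run):

    # D is physically installed; C is still online and healthy
    zpool replace tank c0t2d0 c0t3d0   # C = c0t2d0, D = c0t3d0
    zpool status tank                  # watch the resilver progress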
Suppose now that midway through the resilvering, disk A fails. In theory this should be recoverable: above the cursor, B and D contain sufficient data and parity, and below the cursor, B and C do. However, whether this is actually recoverable depends on internal design decisions in ZFS which I am not aware of (and which the manual doesn't state in certain terms).
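To make the geometry concrete, here is the state of each disk if A dies mid-resilver (this just restates the claim above):

                     A        B        C        D
    above cursor:  failed   intact   intact   copied   -> B and D suffice
    below cursor:  failed   intact   intact   empty    -> B and C suffice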
If ZFS continues to send writes to C below the cursor, then we are fine. If, however, ZFS internally treats C as though it were already gone, resilvering D only from the data and parity on A and B and writing only to A and B below the cursor, then we're toast.
Some experimenting could answer this question, but I was hoping someone here already knows which way ZFS handles this situation. Thank you in advance for any insight!
Testing with a file-based pool (v28 on FreeBSD 8.3, using file-backed md devices) suggests that it should work: I was able to offline one of the remaining disks while the resilver was in progress. Ideally this would be tested with real disks, actually pulling one, to be 100% sure, but ZFS was perfectly happy to let me offline the disk.
Before offlining md0, the pool was still fully ONLINE, so it appears that ZFS simply mirrors the replaced disk onto the new disk while still treating the whole set as available during the process.
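For anyone who wants to reproduce it, a setup along these lines should do (paths and sizes are arbitrary, not the exact commands I ran):

    # create four small backing files and attach them as md devices
    truncate -s 256m /tmp/d0 /tmp/d1 /tmp/d2 /tmp/d3
    mdconfig -a -t vnode -f /tmp/d0   # -> md0
    mdconfig -a -t vnode -f /tmp/d1   # -> md1
    mdconfig -a -t vnode -f /tmp/d2   # -> md2
    mdconfig -a -t vnode -f /tmp/d3   # -> md3

    # single-parity pool on three devices, md3 held back as the replacement
    zpool create tank raidz md0 md1 md2

    # put some data in so the resilver takes long enough to interrupt
    dd if=/dev/urandom of=/tank/fill bs=1m count=128

    # replace md2 with md3, then offline another disk mid-resilver
    zpool replace tank md2 md3
    zpool offline tank md0
    zpool status tank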
Disk C is still used in the RAID-Z exactly as it was until it is detached from the vdev. As Matt points out, ZFS implements a replace by attaching the replacement disk as a mirror of the disk being replaced and resilvering onto the replacement. The RAID-Z vdev is never degraded and never resilvered (until A fails, which is an entirely separate event from the replacement operation).
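You can watch this happen in zpool status: while the replace is running, C and D appear together under an interim "replacing" vdev inside the RAID-Z. Schematically (device names illustrative, exact layout varies by ZFS version):

    NAME             STATE
    tank             ONLINE
      raidz1-0       ONLINE
        A            ONLINE
        B            ONLINE
        replacing-0  ONLINE
          C          ONLINE
          D          ONLINE  (resilvering)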
I'm not sure that this matters.
In most cases, you shouldn't be using RAID-Z rather than mirrors... If you do, you should be doing so with a hot spare.
Resilvering will fail if one of the disks it's reading from fails or becomes unavailable, the same as with an unrecoverable read error. And disk C would be gone by that point...
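If you do keep a spare, attaching one is a one-liner; the pool and device names below are placeholders:

    # add a hot spare; with a fault-management agent running (FMA on
    # Solaris, zfsd on newer FreeBSD) it is swapped in automatically
    # when a disk faults
    zpool add tank spare c0t4d0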