jplitza

Asked: 2017-02-12 08:48:54 +0800 CST

ZFS doesn't want to detach replaced disk

I wanted to replace a disk in my zpool by issuing the following command:

zpool replace -o ashift=12 pool /dev/mapper/transport /dev/mapper/data2

ZFS got to work and resilvered the pool. In the process, there were some read errors on the old disk, and after it finished, zpool status -v looked like this:

  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 6,30T in 147h38m with 6929 errors on Sat Feb 11 13:31:05 2017
config:

    NAME             STATE     READ WRITE CKSUM
    pool             ONLINE       0     0 16,0K
      raidz1-0       ONLINE       0     0 32,0K
        data1        ONLINE       0     0     0
        replacing-1  ONLINE       0     0     0
          transport  ONLINE   14,5K     0     0
          data2      ONLINE       0     0     0
        data3        ONLINE       0     0     0
    logs
      data-slog      ONLINE       0     0     0
    cache
      data-cache     ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
<list of 3 files>

I expected the old disk to be detached from the pool, but it wasn't. I tried to detach it manually:

# zpool detach pool /dev/mapper/transport
cannot detach /dev/mapper/transport: no valid replicas

But when I exported the pool, removed the old drive, and imported the pool again, it seemed to work flawlessly: it started resilvering again, and the pool is DEGRADED, not FAULTED:

  pool: pool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Feb 11 17:28:50 2017
    42,7G scanned out of 9,94T at 104M/s, 27h43m to go
    1,68G resilvered, 0,42% done
config:

    NAME                        STATE     READ WRITE CKSUM
    pool                        DEGRADED     0     0     9
      raidz1-0                  DEGRADED     0     0    18
        data1                   ONLINE       0     0     0
        replacing-1             DEGRADED     0     0     0
          15119075650261564517  UNAVAIL      0     0     0  was /dev/mapper/transport
          data2                 ONLINE       0     0     0  (resilvering)
        data3                   ONLINE       0     0     0  (resilvering)
    logs
      data-slog                 ONLINE       0     0     0
    cache
      data-cache                ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
<list of 3 files>

Still, although the old drive is clearly not necessary for full functionality of the pool, I can neither detach it nor take it offline:

# zpool offline pool 15119075650261564517
cannot offline 15119075650261564517: no valid replicas

What is going on?

Update: Apparently, ZoL hadn't given up on the failing device just yet. After I replaced the 3 files with permanent errors and let the resilver finish, the old drive was finally removed. (One of the files was a zvol, so I had to create a new one, copy the contents over with dd conv=noerror, and destroy the old one.)
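For readers in the same situation, the zvol step described above might look roughly like the following. This is only a sketch under assumptions: the dataset names and size are hypothetical placeholders, the sync flag, the bs value, and the final rename are additions that the update does not mention, and the function only prints the commands so that nothing destructive runs by accident.

```shell
# Sketch only: prints the commands for rebuilding a damaged zvol instead of
# running them. Dataset names and the size are hypothetical placeholders.
plan_zvol_copy() {
    old=$1    # damaged volume, e.g. pool/vol (hypothetical)
    new=$2    # temporary name for the copy, e.g. pool/vol-new (hypothetical)
    size=$3   # e.g. 10G; should match the volsize of the old volume

    # 1. Create a fresh zvol of the same size.
    echo "zfs create -V $size $new"
    # 2. Copy block by block. conv=noerror keeps going after read errors;
    #    sync (an assumption, not in the original post) zero-pads failed
    #    reads so that offsets stay aligned. A small bs limits how much is
    #    zero-filled around each unreadable region.
    echo "dd if=/dev/zvol/$old of=/dev/zvol/$new bs=4k conv=noerror,sync"
    # 3. Destroy the damaged volume, then (optionally) reuse its name.
    echo "zfs destroy $old"
    echo "zfs rename $new $old"
}
```

Calling `plan_zvol_copy pool/vol pool/vol-new 10G` prints the four commands for review; they would then be run by hand as root once the names and size have been checked against `zfs list`.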

I'd still be interested in what ZoL was thinking. Everything that didn't cause read or checksum errors had been copied over to the new device, and ZoL had already marked the sectors that caused errors as permanently failed. So why hang on to an old device that it clearly didn't intend to read from anymore?
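In both status listings above, the old disk stays attached for as long as a replacing-N vdev is shown. A minimal sketch for checking that mechanically, assuming the output format shown in this post (the function name is made up for illustration):

```shell
# Sketch: read `zpool status` output on stdin and report whether a
# replacement is still pending, i.e. a "replacing-N" vdev is still listed.
replace_pending() {
    if grep -Eq '^[[:space:]]*replacing-[0-9]+[[:space:]]'; then
        echo pending
    else
        echo done
    fi
}
```

Usage would be something like `zpool status pool | replace_pending`. It only inspects the vdev tree and says nothing about why the replacement has not completed.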
