I have a Proxmox server with a few zpools. One of them, rust01, is a 4-disk zpool whose special metadata device and write cache were NVMe M.2 drives on the motherboard (a single drive for each - I know, stupid, but that's what was done).
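(Roughly speaking, that layout corresponds to something like the pool below - the RAID level and device names are only illustrative, the point being that the special metadata vdev and the log device were each a single, non-redundant NVMe drive:)
# 4 spinning disks for data, plus one NVMe as the special (metadata) vdev
# and one NVMe as the SLOG / write cache - no redundancy on either;
# -f is needed because the single devices don't match the data vdev's redundancy
zpool create -f rust01 raidz1 sdc sdd sde sdf \
    special nvme0n1 \
    log nvme2n1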
It appears as though rust01 has had a catastrophic failure. When I click on rust01 in the Server View, I get the following error:
could not activate storage 'rust01', zfs error: cannot import 'rust01': I/O error (500)
When I go to { Server } > Disks > ZFS, I do not see the rust01 zpool. When I go to { Server } > Disks, I don't even see the 4 disks, the special metadata device, or the read/write cache drives.
When I run zpool status -x, I get:
all pools are healthy
When I run zpool import rust01, I get the following message:
cannot import 'rust01': I/O error
Destroy and re-create the pool from a backup source.
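(For what it's worth, I know zpool import has recovery-oriented options along these lines, but I am not sure which of them are safe or useful when it's the special vdev that has failed - the flags are from the zpool-import man page, and the device path is just where my disks normally show up:)
# scan a specific directory of device links instead of the default
zpool import -d /dev/disk/by-id rust01
# attempt a read-only import
zpool import -o readonly=on rust01
# recovery mode: discard the last few transactions if that makes the pool importable
zpool import -F rust01
# import with a missing log device (applies to the SLOG only, not the special vdev)
zpool import -m rust01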
When I run zpool status rust01, I get:
cannot open 'rust01': no such pool
When I reboot the server, the following error is emailed to me:
ZFS has detected that a device was removed.
impact: Fault tolerance of the pool may be compromised.
eid: 10
class: statechange
state: UNAVAIL
host: pve01
time: 2024-09-14 21:20:32-0500
vpath: /dev/nvme2n1p1
vphys: pci-0000:41:00.0-nvme-1
vguid: 0x297D516B1F1D6494
devid: nvme-Samsung_SSD_970_EVO_Plus_2TB_S6S2NS0T815592K-part1
pool: rust01 (0xE4AAC2680D8B6A7E)
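(To confirm whether the NVMe drive named in that email is detected at all, commands along these lines can be used - the device name and serial are taken from the email above:)
# list every NVMe namespace the kernel currently sees
nvme list
# SMART / health data for the suspect drive
smartctl -a /dev/nvme2n1
# check whether the by-id link from the email still resolves
ls -l /dev/disk/by-id/ | grep S6S2NS0T815592K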
When I run zpool destroy rust01, I get the following error:
cannot open 'rust01': no such pool
Ideally, I would like to get rust01 back online. I am fairly certain the issue is the special metadata disk mentioned in the email above. That said, I would be happy to destroy and recreate rust01. All of the VMs on that pool are backed up, so I can easily restore them if needed. My problem, however, is that I can't find a way to get Proxmox/ZFS to release the disks associated with the corrupt rust01 zpool. Below is the output of lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 1 465.8G 0 disk
|-sda1 8:1 1 1007K 0 part
|-sda2 8:2 1 1G 0 part
`-sda3 8:3 1 464G 0 part
sdb 8:16 1 465.8G 0 disk
|-sdb1 8:17 1 1007K 0 part
|-sdb2 8:18 1 1G 0 part
`-sdb3 8:19 1 464G 0 part
sdc 8:32 1 3.6T 0 disk
|-sdc1 8:33 1 3.6T 0 part
`-sdc9 8:41 1 8M 0 part
sdd 8:48 1 3.6T 0 disk
|-sdd1 8:49 1 3.6T 0 part
`-sdd9 8:57 1 8M 0 part
sde 8:64 1 3.6T 0 disk
|-sde1 8:65 1 3.6T 0 part
`-sde9 8:73 1 8M 0 part
sdf 8:80 1 3.6T 0 disk
|-sdf1 8:81 1 3.6T 0 part
`-sdf9 8:89 1 8M 0 part
sdg 8:96 1 0B 0 disk
sdh 8:112 1 0B 0 disk
sdi 8:128 1 0B 0 disk
sdj 8:144 1 0B 0 disk
sdk 8:160 1 0B 0 disk
sdl 8:176 1 0B 0 disk
sdm 8:192 1 0B 0 disk
sdn 8:208 1 0B 0 disk
sr0 11:0 1 1024M 0 rom
sr1 11:1 1 1024M 0 rom
sr2 11:2 1 1024M 0 rom
sr3 11:3 1 1024M 0 rom
zd0 230:0 0 4M 0 disk
zd16 230:16 0 80G 0 disk
`-zd16p1 230:17 0 80G 0 part
zd32 230:32 0 64G 0 disk
|-zd32p1 230:33 0 1M 0 part
|-zd32p2 230:34 0 2G 0 part
`-zd32p3 230:35 0 62G 0 part
zd48 230:48 0 40G 0 disk
|-zd48p1 230:49 0 600M 0 part
|-zd48p2 230:50 0 1G 0 part
`-zd48p3 230:51 0 38.4G 0 part
zd64 230:64 0 32G 0 disk
|-zd64p1 230:65 0 31G 0 part
|-zd64p2 230:66 0 1K 0 part
`-zd64p5 230:69 0 975M 0 part
zd80 230:80 0 90G 0 disk
|-zd80p1 230:81 0 100M 0 part
|-zd80p2 230:82 0 16M 0 part
|-zd80p3 230:83 0 89.4G 0 part
`-zd80p4 230:84 0 523M 0 part
zd96 230:96 0 90G 0 disk
|-zd96p1 230:97 0 499M 0 part
|-zd96p2 230:98 0 128M 0 part
|-zd96p3 230:99 0 88.5G 0 part
`-zd96p4 230:100 0 920M 0 part
zd112 230:112 0 100G 0 disk
|-zd112p1 230:113 0 499M 0 part
|-zd112p2 230:114 0 99M 0 part
|-zd112p3 230:115 0 16M 0 part
`-zd112p4 230:116 0 99.4G 0 part
zd128 230:128 0 64G 0 disk
|-zd128p1 230:129 0 1M 0 part
|-zd128p2 230:130 0 2G 0 part
`-zd128p3 230:131 0 62G 0 part
zd144 230:144 0 90G 0 disk
|-zd144p1 230:145 0 500M 0 part
`-zd144p2 230:146 0 89.5G 0 part
zd160 230:160 0 60G 0 disk
|-zd160p1 230:161 0 100M 0 part
|-zd160p2 230:162 0 16M 0 part
|-zd160p3 230:163 0 59.4G 0 part
`-zd160p4 230:164 0 450M 0 part
zd176 230:176 0 32G 0 disk
|-zd176p1 230:177 0 1M 0 part
|-zd176p2 230:178 0 2G 0 part
`-zd176p3 230:179 0 30G 0 part
zd192 230:192 0 100G 0 disk
|-zd192p1 230:193 0 450M 0 part
|-zd192p2 230:194 0 99M 0 part
|-zd192p3 230:195 0 15.8M 0 part
|-zd192p4 230:196 0 89.4G 0 part
`-zd192p5 230:197 0 256K 0 part
zd208 230:208 0 32G 0 disk
|-zd208p1 230:209 0 600M 0 part
|-zd208p2 230:210 0 1G 0 part
`-zd208p3 230:211 0 30.4G 0 part
zd224 230:224 0 1M 0 disk
nvme1n1 259:0 0 1.8T 0 disk
|-nvme1n1p1 259:1 0 1.8T 0 part
`-nvme1n1p9 259:2 0 8M 0 part
nvme3n1 259:3 0 1.8T 0 disk
|-nvme3n1p1 259:4 0 1.8T 0 part
`-nvme3n1p9 259:5 0 8M 0 part
nvme0n1 259:6 0 1.8T 0 disk
|-nvme0n1p1 259:7 0 1.8T 0 part
`-nvme0n1p9 259:8 0 8M 0 part
nvme2n1 259:9 0 1.8T 0 disk
|-nvme2n1p1 259:10 0 1.8T 0 part
`-nvme2n1p9 259:11 0 8M 0 part
There are other VMs on this host running on other zpools, and those appear fine. As such, reinstalling everything is not an option I want to entertain.
UPDATE: After running through the steps outlined in "Fix your dead SSD with the power cycle method", the 4TB hard drives are showing up in Proxmox again, although the zpool is still not accessible. Some progress, but still no access to the data.
Any ideas on how to proceed, other than wiping the disks in the affected zpool?
In the end I gave up and decided to destroy the rust01 zpool. When trying to destroy a corrupt zpool used by Proxmox, you may run into a situation where the disks always appear to be in use and both command-line and UI attempts to wipe them fail. Because the pool was never successfully imported, zpool destroy cannot open it, so the member disks have to be released and cleared directly. You may need to comment out the zpool's entry in /etc/pve/storage.cfg so that Proxmox stops trying to activate it. After that, you will be able to wipe all of the disks associated with the failed zpool; a rough sketch of the steps is below.
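As a rough sketch (the storage.cfg entry and device names below are from my setup and will differ on yours):
# 1. In /etc/pve/storage.cfg, comment out the failed pool's entry so
#    Proxmox stops trying to activate it, e.g.:
#      #zfspool: rust01
#      #        pool rust01
#      #        content images,rootdir
#      #        mountpoint /rust01
# 2. Clear the ZFS label on a member device (data, special, and
#    cache/log partitions alike):
zpool labelclear -f /dev/sdc1
# 3. Wipe the partition table so the disk shows up as empty again
#    (or use Disks > Wipe Disk in the Proxmox UI):
wipefs -a /dev/sdc
sgdisk --zap-all /dev/sdc
Repeat the labelclear/wipe step for each remaining disk that belonged to the pool.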