I have a 3x1TB ZFS (ZoL) RAIDZ1. One drive failed. I'm replacing it with a 4TB disk.
$ sudo zpool replace pool <old guid> /dev/disk/by-id/<new id>
cannot replace <old guid> with /dev/disk/by-id/<new id>:
new device has a different optimal sector size;
use the option '-o ashift=N' to override the optimal size
$ zdb -C | grep ashift
ashift: 9
$ sudo zpool replace pool <old guid> /dev/disk/by-id/<new id> -o ashift=9
This works, and the array is now resilvering. However, to be honest, I didn't understand the impact of this setting; I just wanted to replace the faulty disk as soon as possible, and I plan to replace the other 1TB disks with 4TB ones in the near future. In hindsight, performance with an alignment shift of 2^9 (512-byte sectors) is described as horrible on 4K-sector drives.
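Before settling on an ashift, it can help to check what sector sizes the new drive actually reports. A quick way to do that (the device path is a placeholder; substitute your disk):

```shell
# Check what sector sizes the drive advertises (replace sdX with your disk).
# Many large drives are "512e": they report a 512B logical sector but a
# 4096B physical sector, which is why ashift=12 is usually recommended.
lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sdX

# Equivalent low-level queries:
sudo blockdev --getss /dev/sdX    # logical sector size
sudo blockdev --getpbsz /dev/sdX  # physical sector size
```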
I understand this setting is immutable: once I have replaced the other two disks, I cannot change the ashift value to 2^12, which, if I understand correctly, is recommended for 4TB disks.
Did I just shoot myself in the foot? How best to proceed? Can I disable autoexpand, create a new volume on the new array with ashift=12, and copy the old volume to the new volume on the same drive? Is that possible and recommendable?
Choosing a too-small ashift value essentially causes the disk to perform a read-modify-write cycle internally on every write. Whether you'll notice that performance hit depends on your usage pattern: do you do a lot of synchronous writes and need very high performance? If so, you will notice the difference; if not, you might not. Async writes are less affected, because ZFS batches them together into large multi-block updates, so there won't be many sub-block writes (only the first and last blocks of a large contiguous write can be sub-block).

The main downside of switching to 4KiB blocks is that compression won't work as well, since compressed blocks can only be rounded up to the nearest 4KiB instead of the nearest 512B. If that matters to you more than top-end write performance, maybe leave it as is.
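To put rough numbers on the compression point: on-disk allocations are rounded up to 2^ashift bytes, so a record that compresses to, say, 5000 bytes (an arbitrary figure chosen for illustration) wastes more space at ashift=12 than at ashift=9. A minimal sketch of that rounding:

```shell
compressed=5000   # hypothetical compressed record size in bytes
for ashift in 9 12; do
  sector=$((1 << ashift))                                  # allocation unit: 2^ashift
  alloc=$(( (compressed + sector - 1) / sector * sector )) # round up to next unit
  echo "ashift=$ashift: $compressed bytes occupy $alloc bytes on disk"
done
```

For this example, that comes out to 5120 bytes allocated at ashift=9 versus 8192 at ashift=12.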
If write performance really matters, you should rebuild the pool. The easiest way is probably a zfs send / zfs receive to a new pool configured the way you want.

Regarding your specific suggestion for avoiding buying all the hardware for the new pool at once before doing a clean send/receive: since you already have data written with ashift=9, you can't add an ashift=12 disk to the pool (ZFS would not be able to address blocks that are 512B-aligned on a disk that is 4KiB-aligned; ashift is a setting on top-level vdevs, not individual disks), hence the warning you saw. It might be possible to make a new pool on the second half of the new drive and copy all the data from the first pool over to it, but that pool would not have the same redundancy as the original, since you can't build a RAIDZ1 out of one partition of one disk. Maybe you could copy to the second pool with no redundancy, then reconfigure the disks/partitions in the original pool with the correct ashift and redundancy, and copy all the data back.
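If you go the rebuild route, the send/receive step might look something like the sketch below. The pool name "newpool" and the by-id device paths are placeholders; -r/-R snapshot and replicate the whole dataset hierarchy, and -F lets the receive overwrite the freshly created target:

```shell
# Sketch only: "newpool" and the by-id paths are placeholders for your setup.
# Create the replacement pool with the desired alignment up front:
sudo zpool create -o ashift=12 newpool raidz1 \
    /dev/disk/by-id/disk1 /dev/disk/by-id/disk2 /dev/disk/by-id/disk3

# Snapshot everything recursively, then replicate the full hierarchy:
sudo zfs snapshot -r pool@migrate
sudo zfs send -R pool@migrate | sudo zfs receive -F newpool
```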