I am trying to understand how ZFS dedup handles the case where some (but not all) datasets in a pool are deduped, specifically how this affects dedup table size and RAM usage. I found this quote from the FreeBSD mailing list in 2012:
"Note that only file systems that you enabled dedup for will actually participate in dedup. File systems that have dedup=off won't go through the dedup."
As an example, suppose we have two zpools, A and B. Pool A has 4 datasets containing 21 TB of data:
- Datasets #1 and #2 each contain 0.5 TB of data, with dedup on
- Datasets #3 and #4 each contain 10 TB of data, with dedup off
Pool B has one dataset containing 1 TB of data, with dedup on.
It's clear that the dedup functionality applies to the entirety of each pool. What isn't clear is whether the RAM impact of dedup depends only on the deduped datasets. In other words, all other things being equal, will the dedup table size and RAM impact be similar for pool A and pool B, or far larger for pool A than for pool B?
I think the dedup table should be similar in size for both pools (it is pool-wide, but non-deduped datasets shouldn't add to its size), mainly because if it were much larger, that would be equivalent to forcing dedup on the whole pool, not just on specific datasets. However, it isn't clear to me whether this is actually the case.
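To put rough numbers on the two possibilities, here is a back-of-envelope sketch. It does not read any real ZFS state, and every constant is an assumption: a 128 KiB recordsize with every block full and unique (the worst case for table size), "TB" treated as TiB, and about 320 bytes per in-core dedup table entry, which is a commonly quoted rule of thumb rather than an exact figure. The names `ddt_entries`, `ddt_ram_gib` and `count_all` are hypothetical.

```python
# Hypothetical back-of-envelope DDT sizing; all constants are assumptions.
TIB = 2**40
RECORDSIZE = 128 * 2**10   # assumed: 128 KiB recordsize, every block full
BYTES_PER_DDT_ENTRY = 320  # assumed: commonly quoted in-core size per entry

# (size in TiB, dedup on?) for each dataset in the example above
POOL_A = [(0.5, True), (0.5, True), (10, False), (10, False)]
POOL_B = [(1, True)]


def ddt_entries(datasets, count_all):
    """Upper bound on table entries, assuming every block is unique.

    count_all=False: only dedup=on datasets add entries (my hypothesis).
    count_all=True:  every block in the pool adds an entry (the alternative).
    """
    total = 0
    for size_tib, dedup_on in datasets:
        if dedup_on or count_all:
            total += int(size_tib * TIB) // RECORDSIZE
    return total


def ddt_ram_gib(entries):
    """Estimated in-core table size in GiB under the per-entry assumption."""
    return entries * BYTES_PER_DDT_ENTRY / 2**30


for name, pool in (("A", POOL_A), ("B", POOL_B)):
    for count_all in (False, True):
        n = ddt_entries(pool, count_all)
        print(f"pool {name}, count_all={count_all}: "
              f"{n:,} entries, ~{ddt_ram_gib(n):.1f} GiB")
```

Under these assumptions, pool A comes out at about 2.5 GiB (the same as pool B) if only the deduped datasets count, and about 52.5 GiB if all 21 TB counted, which is the 21x difference the question is about.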