I have some stored ZFS "send streams" (i.e., they were created by redirecting the output of zfs send to a file). I would like to examine the contents of these streams without receiving and writing them into a filesystem; for example, I'd like to view a list of filenames inside the stream. Is there any way to do this?
I've done some reading and searching, but haven't found anything that looks like what I'm talking about. I'm using both the FreeBSD and the ZFS on Linux implementations of ZFS.
You can get some information by piping them into zstreamdump -d, but that will not give you file names directly, because there are no files as such in the stream. The stream is purely the difference between two trees, described in terms of blocks. However, the code is public, so if you manage to add detection and parsing of ZFS structures, you can get more out of it.

ZFS is structured internally as a tree, and all operations are performed on that tree. Files, directories, filenames, attributes, and everything else are just data in that tree. Snapshots, volumes, and filesystems are tree roots, and when you take another snapshot you are just storing the current root somewhere. A live system generates a new root for each transaction, constantly drifting away from the older roots while keeping many of the data "leaves" from the previous tree intact. The stream represents the list of operations that must be performed on tree A for it to become tree B.
I'm just trying to say that you may not see the data you are looking for in the stream, because it is not required to be there. When a file is deleted, the corresponding blocks are simply freed, so you can't tell what its name or contents were. When a file is changed, it is referenced by its object ID, so the stream would not give you its name even if the file was rewritten from scratch, as long as the directory entry was not updated.
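To make that concrete, here is a toy sketch in Python. It is not the real send stream format: a "snapshot" is just a dict from object ID to contents, object 1 stands in for a directory, and make_stream and apply_stream are hypothetical helper names invented for this illustration.

```python
# Toy model only: this is NOT the real ZFS send stream format.
# A "snapshot" is a dict mapping object IDs to that object's contents.
# Object 1 plays the role of a directory (file name -> object ID).

def make_stream(old, new):
    """Return the operations that turn snapshot `old` into snapshot `new`."""
    ops = []
    # Objects that disappeared are only reported as freed, by ID.
    for obj_id in sorted(old.keys() - new.keys()):
        ops.append(("free", obj_id))
    # New or changed objects are written out in full, again by ID.
    for obj_id in sorted(new):
        if old.get(obj_id) != new[obj_id]:
            ops.append(("write", obj_id, new[obj_id]))
    return ops


def apply_stream(base, ops):
    """Apply a list of operations to `base` and return the resulting snapshot."""
    result = dict(base)
    for op in ops:
        if op[0] == "free":
            result.pop(op[1], None)
        else:
            _, obj_id, data = op
            result[obj_id] = data
    return result


snap_a = {1: {"notes.txt": 2, "todo.txt": 3}, 2: b"old notes", 3: b"buy milk"}

# notes.txt is rewritten; the directory object itself is untouched.
snap_b = {1: {"notes.txt": 2, "todo.txt": 3}, 2: b"new notes", 3: b"buy milk"}

# todo.txt is deleted; the directory changes and object 3 is freed.
snap_c = {1: {"notes.txt": 2}, 2: b"new notes"}

incremental_ab = make_stream(snap_a, snap_b)
incremental_bc = make_stream(snap_b, snap_c)
full_b = make_stream({}, snap_b)
```

In this model incremental_ab holds a single write for object 2 and no file name at all, and incremental_bc never mentions "todo.txt", only the freed object 3 and the new directory contents. Only full_b, which starts from an empty tree, carries the directory object with every name in it.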
You are in luck if the stream is not a differential stream, or if you have some data about its previous state. But that is only because a full stream transforms an empty root into the target tree, and so contains all the required data. In that case you can add block-parsing code to zstreamdump to detect and process ZFS internal data.

Short answer:
I do not believe there is any way to usefully catalog the contents of a send stream that is any lighter weight than piping it to zfs receive to recreate it as a dataset.
Much longer answer:
A send stream is a storage-block-level collection of data, not a filesystem-level collection of data. A send stream does not know or care about individual files; it's designed to replicate what are essentially raw block devices. While one user might exclusively use zfs send to replicate ZFS datasets with files stored directly on them, another might use it to replicate ZVOLs formatted with ext4, NTFS, or even an encrypted system like LUKS. In these cases, ZFS has absolutely no knowledge of what the contents of the volume are; it merely stores the raw blocks for them.

zfs send works exactly the same whether you're replicating a dataset or a raw zvol, because it simply doesn't care about anything beyond the raw block storage level. It doesn't know about filenames, file sizes, paths, or anything else. It knows which blocks belong in a given snapshot of a zvol or dataset, but it does not know how any of those blocks relate to one another.

So, there's no lightweight way to catalog the file contents of a zfs send stream, because there is no internal catalog of the files in one. Even if you know conclusively that this particular stream happens to be a full (not incremental) replication of an unencrypted ZFS dataset, you'd have to parse every block of it, one by one, to figure out which blocks contain filenames.

Essentially, in order to extract the filenames from a send stream, you'd be doing all the same work that zfs receive does when it applies that stream to a dataset in the first place.
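One thing that is cheap to check is whether a stored stream is full or incremental, since that only needs the first record. The Python sketch below is untested against real streams and rests on assumptions taken from my reading of the OpenZFS headers: that the stream starts with a 312-byte BEGIN record laid out as described in the comments, that the magic number is 0x2F5BACBAC, and that a nonzero "from" GUID marks an incremental stream. read_begin_record is a hypothetical helper name; verify the layout against your ZFS version before relying on it.

```python
import struct

# Assumption: value of the send stream magic number (DMU_BACKUP_MAGIC).
DMU_BACKUP_MAGIC = 0x2F5BACBAC

# Assumed layout of the BEGIN record at the start of a stream:
#   u32 record type, u32 payload length, u64 magic, u64 version info,
#   u64 creation time, u32 objset type, u32 flags,
#   u64 "to" snapshot GUID, u64 "from" snapshot GUID,
#   256-byte NUL-padded snapshot name.
BEGIN_LAYOUT = "IIQQQIIQQ256s"
BEGIN_SIZE = struct.calcsize("<" + BEGIN_LAYOUT)  # 312 bytes under this layout


def read_begin_record(data):
    """Decode the leading BEGIN record from the first bytes of a stream file."""
    if len(data) < BEGIN_SIZE:
        raise ValueError("need at least %d bytes" % BEGIN_SIZE)
    # Streams are written in the sender's byte order, so try both and
    # keep whichever one yields the expected magic number.
    for byte_order in ("<", ">"):
        fields = struct.unpack(byte_order + BEGIN_LAYOUT, data[:BEGIN_SIZE])
        if fields[2] == DMU_BACKUP_MAGIC:
            from_guid = fields[8]
            return {
                "creation_time": fields[4],
                "to_guid": fields[7],
                "from_guid": from_guid,
                "to_name": fields[9].split(b"\0", 1)[0].decode("utf-8", "replace"),
                # Assumption: a nonzero "from" GUID means an incremental stream.
                "incremental": from_guid != 0,
            }
    raise ValueError("magic number not found; not a recognised send stream")
```

You would pass it the first 312 bytes of the stored file, read in binary mode. This says nothing about the files inside the stream; it only tells you which snapshot the stream claims to produce and whether it expects an earlier snapshot to already exist on the receiving side.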