I just got an SSD and wanted to migrate my current Ubuntu installation to take advantage of its performance.
So I booted to an Ubuntu live CD, mounted the source (ext4) and target (SSD, btrfs mounted with compress-force=zlib,nodatacow,noatime,rw,ssd) drives, and began copying files using rsync:
sudo rsync -av --exclude=/home/ '/media/username/source/' '/media/username/target'
# /home stays on HDD for now
Rsync finished the job without problems. The file count is similar (about 1.2 million files on average, a number obtained via right click > Properties in Nautilus), but the resulting copy is 31 GB, way larger than the source, which is only 18 GB.
I checked the sizes by various methods:
- df
- Right click > Properties
- btrfs filesystem df
- Baobab
All gave similar results: the source is way smaller.
I know that btrfs uses some kind of metadata journal and "shadow copies" of files when COW is on. But COW is off, and even if it were on, this is the first population of data. Plus, there is no way 12 GB out of 31 GB could be metadata, right? o.o
Any idea WTH is going on? Or better yet, how to fix it?
By default, btrfs puts files smaller than 4 KiB into metadata blocks (to avoid the extra seeks that would occur if the data were placed far away from the metadata); this is controlled by the max_inline mount option. In addition, btrfs will duplicate metadata by default unless mkfs.btrfs detects that the selected device is non-rotational when creating the filesystem; this is controlled by the --metadata option in mkfs.btrfs. Taken together, this means that the on-disk size of each file less than 4 KiB is at least twice its actual data size.
At 1.2 million files and 18 GB, the average size of your files is 16 KB, and I suspect a lot of these are smaller than 4 KiB. This might explain the significant increase in disk space usage over ext4.
However, this explanation is suspicious because ext4, like most filesystems, is also inefficient at storing files smaller than 4 KiB: it defaults to a block size of 4 KiB, which means that every non-empty file occupies at least that much space on disk. Btrfs is different here in that it tightly packs the inlined data in its metadata blocks. I expect btrfs to become more space-efficient than ext4 (under current default options) for files smaller than 2 KiB.
Thus, I think my explanation here is wrong unless you have a lot of files that are between 2 KiB and 4 KiB, or you are using non-default options for ext4 or btrfs.
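One way to test this guess is to count how many files on the source fall into each size range. The following is only a rough sketch: the function name count_by_size is made up for illustration, it assumes GNU find (for -printf), and it looks at apparent file sizes, not on-disk usage.

```shell
#!/bin/sh
# Sketch: count regular files in three size buckets under a directory tree.
# The 2 KiB and 4 KiB thresholds come from the reasoning above.
# Assumes GNU find (-printf); count_by_size is a hypothetical helper name.
count_by_size() {
    dir=$1
    # "-size -Nc" matches files strictly smaller than N bytes.
    # Printing one dot per file and counting bytes is safe for odd filenames.
    # -xdev keeps find from descending into other mounted filesystems.
    under2k=$(( $(find "$dir" -xdev -type f -size -2048c -printf . | wc -c) ))
    under4k=$(( $(find "$dir" -xdev -type f -size -4096c -printf . | wc -c) ))
    total=$(( $(find "$dir" -xdev -type f -printf . | wc -c) ))
    echo "under 2 KiB: $under2k"
    echo "2 KiB to under 4 KiB: $(( under4k - under2k ))"
    echo "4 KiB or larger: $(( total - under4k ))"
}

# Example (path taken from the question):
#   count_by_size /media/username/source
```

If the middle bucket turns out to be small compared with the total, the inlining-plus-duplication explanation probably does not account for the extra space.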
But if this explanation is right, then you can reduce disk space usage in btrfs by not duplicating metadata: simply specify the --metadata single option when invoking mkfs.btrfs (obviously, this reduces redundancy, so the filesystem will be less resilient against metadata corruption). For an existing btrfs filesystem, you can convert duplicated metadata into single metadata using balance filters.
It is possible to disable data inlining using the max_inline=0 mount option, but I don't recommend this because it runs into the space-efficiency problem for small files that ext4 and other filesystems face.
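For the metadata conversion on an existing filesystem mentioned above, a minimal sketch could look like this. The helper names run and metadata_to_single and the mount point in the example are placeholders I made up; the btrfs subcommands themselves are standard btrfs-progs ones. The script only prints the commands unless RUN=1 is set, since a balance rewrites metadata on a live filesystem.

```shell
#!/bin/sh
# Sketch: convert existing btrfs metadata from DUP to the single profile.
# run and metadata_to_single are hypothetical helper names.
# Commands are only printed by default; set RUN=1 to actually execute them.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

metadata_to_single() {
    mnt=$1
    # Show the current profiles first (look for a "Metadata, DUP" line).
    run sudo btrfs filesystem df "$mnt"
    # Rewrite metadata chunks using the single profile. -f acknowledges
    # that this reduces metadata redundancy.
    run sudo btrfs balance start -f -mconvert=single "$mnt"
}

# Example (hypothetical mount point):
#   metadata_to_single /media/username/target
```

Running btrfs filesystem df again afterwards should show whether the metadata line changed from DUP to single and how much space was freed.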