du -hc --max-depth=1 /home/back/tvppclientarea/ | sort -k2
I use this to show the size of backup directories made with rsync and hard links. The command lists each directory and shows the amount of space added relative to the previous directory, i.e. how big each individual backup was.
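To illustrate what I mean (the sizes here are made up), the output looks something like this; because du only counts each hard-linked file the first time it sees it, each later directory only shows the data that is new in that backup:

1.5T    /home/back/tvppclientarea/1586563849_Sat_11Apr2020_0110
2.1G    /home/back/tvppclientarea/1586565194_Sat_11Apr2020_0133
1.5T    total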
For reference, the rsync command was:
rsync --archive --itemize-changes --human-readable --stats --delete --link-dest=/home/back/tvppclientarea/1586563849_Sat_11Apr2020_0110 [email protected]:/home/tvppclientarea/. /home/back/tvppclientarea/1586565194_Sat_11Apr2020_0133/.
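In case it matters, the timestamped destination and --link-dest values come from a small wrapper along these lines (a simplified sketch, not my exact script; the variable names are just for illustration):

# Find the most recent existing backup and build a new timestamped destination.
BASE=/home/back/tvppclientarea
PREV=$(ls -d "$BASE"/[0-9]*_* | sort | tail -n 1)
DEST="$BASE/$(date +%s_%a_%d%b%Y_%H%M)"
rsync --archive --itemize-changes --human-readable --stats --delete \
      --link-dest="$PREV" \
      [email protected]:/home/tvppclientarea/. "$DEST/."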
The thing is, the directory is 1.5 TB and the command takes a long time to run, several minutes. I was wondering if there is a way of speeding this up. I came across a command, ncdu, which I think may work (it does caching, so the second time you run it it is quicker), but I can't find how to replace my command with it.
Later versions of ncdu attempt to only count the space use of hard links once; see the man page for details. This is different from du behavior.

If you want to count hard links only once like that, there is no avoiding scanning the entire backup tree, or at least the destination and the --link-dest of one backup. That is necessary to find all references to the same inode.
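As a rough illustration of the problem with standard tools (this is not what ncdu does internally, just the same idea): whether a file's space is shared can only be decided by seeing every path that refers to its inode, which means walking both directories:

# %i = inode, %n = link count, %p = path; lines sharing an inode number
# are the same file hard linked from two backups.
find /home/back/tvppclientarea/1586563849_Sat_11Apr2020_0110 \
     /home/back/tvppclientarea/1586565194_Sat_11Apr2020_0133 \
     -type f -links +1 -printf '%i %n %p\n' | sort -n | head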
ncdu -o output files save every file of one report. It is not an incremental cache; all you can do is load the entire thing back into the ncdu UI. So the scan will still take minutes to run, but loading that report again later is much faster.
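Roughly like this (the report file name is just an example):

# Full scan once, writing the report instead of opening the UI:
ncdu -o /tmp/tvppclientarea.ncdu /home/back/tvppclientarea
# Any time later, browse that report without touching the backup disk:
ncdu -f /tmp/tvppclientarea.ncdu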
You could save an ncdu report from each individual backup rather than of the entire tree, i.e. covering the target directory and the --link-dest directory where the hard links point. Then comparing the sizes of the last backups is a matter of finding the corresponding report files and running ncdu -f on each.
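A sketch of that (the reports directory and the naming are just one way to organise it):

cd /home/back/tvppclientarea
mkdir -p reports
# One report per backup directory; each scan only walks that one backup.
for d in [0-9]*_*/ ; do
    ncdu -o "reports/${d%/}.ncdu" "$d"
done
# Compare recent backups later by loading their reports, no rescanning:
ncdu -f reports/1586565194_Sat_11Apr2020_0133.ncdu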
File system metadata means many small IOs, on the order of one per file when iterating over them like this. Improving the IOPS of the storage system may make this faster, perhaps by adding a caching tier of fast storage.
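A quick way to check whether metadata IOPS really is the bottleneck (needs root to drop the caches):

# Cold cache: every inode has to be read from disk.
sync; echo 3 > /proc/sys/vm/drop_caches
time du -hc --max-depth=1 /home/back/tvppclientarea/ > /dev/null
# Warm cache: the same metadata now comes from RAM.
time du -hc --max-depth=1 /home/back/tvppclientarea/ > /dev/null
# A large difference between the two runs means the walk is IOPS-bound.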