So I have an old folder with lots of stuff. I think there are duplicate files here and there.
Is it worth it to do a squashfs backup first? Or should I just 7zip it?
Here is my backup trick:
apt install -y squashfs-tools
cd /mnt/BackupDrive
mksquashfs /mnt/OldSourceDrive/ Backup-Deduped.squashfs -keep-as-directory
The previous answer, which states that Squashfs does not do duplicate detection, is incorrect. Squashfs explicitly detects duplicate files and stores the data only once. This happens before, and is entirely separate from, compression.
BTW the output of mksquashfs will tell you how many duplicate files it found.
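The summary printed at the end of a run includes a line along these lines (the exact wording may vary between squashfs-tools versions):
Number of duplicate files found 1234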
You can also change the compression algorithm and block size from the defaults of gzip and 128K. This usually achieves noticeably better compression.
mksquashfs /mnt/OldSourceDrive/ Backup-Deduped.squashfs -keep-as-directory -comp xz -b 1M
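To sanity-check the result, you can loop-mount the image and browse it like a normal (read-only) filesystem. A minimal sketch, assuming an empty mount point /mnt/check and root privileges:
# mount the squashfs image read-only via a loop device
mount -t squashfs -o loop Backup-Deduped.squashfs /mnt/check
ls /mnt/check
umount /mnt/check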
You've specifically mentioned that you have duplicate files, so it's worth pointing out that in general, filesystem or archive compression formats won't remove redundancy between duplicate files. A partial exception is solid formats like tar.gz, where the whole stream is compressed as one, but gzip's small 32K window means even that rarely makes a big difference to the space taken by the duplicate files. If duplicate files are the main reason you want to compress, it would be better to run a duplicate file finder over the folder and remove or hard link together any duplicates (see here), as in the sketch below.
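As a sketch, assuming fdupes and jdupes are installed (both are in most distro repos) and using the source path from the question:
# list groups of duplicate files recursively, without changing anything
fdupes -r /mnt/OldSourceDrive
# or replace each duplicate with a hard link to a single copy
jdupes -r -L /mnt/OldSourceDrive
Hard linking only helps within one filesystem, and any later edit to one "copy" changes them all, so prefer plain deletion if the duplicates are genuinely redundant.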
If you just need to compress a bunch of things once off, and you won't need ongoing access to write new files into the archive, it's easiest just to zip it up (you can use 7-zip). Note that on most people's drives these days the majority of space is taken by file formats that won't compress further (e.g. movies, photos).
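For example, with the p7zip command line tool (the archive name here is just a placeholder):
# create a .7z archive with maximum compression; solid mode
# (-ms=on, the default for .7z) compresses files together in one
# block, so identical files mostly collapse into each other
7z a -mx=9 -ms=on Backup.7z /mnt/OldSourceDrive/
One caveat: the .7z format doesn't preserve Unix ownership and permissions, unlike squashfs or tar, so for a system backup the squashfs route is safer.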