We need to back up a filesystem with lots of hardlinks. Since there are several hardlinks for each "true" file, we would like to skip all the hardlinks when backing up the filesystem, to avoid storing n exact copies of each file.
The backup is done using Tivoli Storage Manager Backup, and we've been unable to get it to treat hardlinks as anything other than separate files to be backed up alongside each other.
In case it's relevant for possible solutions, I'd like to note that it's possible to tell a hardlink from a proper file by the filename:
foobarbaz-123.ext # file
foobarbaz-123-1.ext # hardlink
foobarbaz-123-2.ext # hardlink
barbazfoo-456.ext # file
barbazfoo-456-1.ext # hardlink
barbazfoo-456-2.ext # hardlink
barbazfoo-456-3.ext # hardlink
That is, all hardlinks have two hyphens in the filename, whereas proper files have just the one.
The server is running Ubuntu Linux, and the files are situated on a gfs volume on our SAN.
A quick read of some TSM docs suggests "Don't do that!"
With unix, a "file" is just a directory entry that points to an inode. A "hard link" is just what you have when more than one directory entry (pointer) points to a given inode. For all intents and purposes, these two "files" are exactly 100% identical.
Hard links are a well established and understood mechanism in unix. It is proper and common to encounter them and it is common for backup software to understand exactly what a hardlink is and to back it up exactly as it should -- as another pointer to a specific piece of data, not as a unique and novel piece of data that happens to be exactly the same as the other hard links.
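To make the "several directory entries, one inode" point concrete, here is a minimal Python sketch. It is not what TSM does internally; it only illustrates how any tool could recognise hard links by comparing device and inode numbers. The helper names `group_by_inode` and `demo` are made up for this example, and the file names are borrowed from the question.

```python
import os
import tempfile
from collections import defaultdict


def group_by_inode(root):
    """Map (device, inode) -> sorted list of paths under root that share it."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: describe the entry itself, do not follow symlinks
            groups[(st.st_dev, st.st_ino)].append(path)
    return {key: sorted(paths) for key, paths in groups.items()}


def demo():
    """Create a file plus one hard link in a scratch directory and inspect them."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "foobarbaz-123.ext")
        link = os.path.join(root, "foobarbaz-123-1.ext")
        with open(original, "w") as fh:
            fh.write("payload")
        os.link(original, link)  # second directory entry for the same inode

        same_inode = os.stat(original).st_ino == os.stat(link).st_ino
        link_count = os.stat(original).st_nlink
        group_sizes = [len(paths) for paths in group_by_inode(root).values()]
        return same_inode, link_count, group_sizes


if __name__ == "__main__":
    print(demo())
```

Each group returned by `group_by_inode` is one piece of data on disk, however many names it has. Neither name in a group is "the original"; picking the first path is an arbitrary choice.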
A quick google of tsm and hardlinks indicates that tsm understands hard links and the docs specifically warn:
Interestingly, it seems like there are two different ways that you can do backups with TSM: backups and archives. The two seem to deal with hard links differently.
backing up and restoring files:
archiving and restoring files:
From this it seems that you'll blow your backup server up if it is "Archiving" things and it will do what you want if you're "backing up." Leave it to IBM to make it simple!
First, there is no difference between a "proper file" and a "hardlink": the hardlink is just another name for the same object. A softlink, on the other hand, is actually a file containing a pointer to the real file, which is why a softlink can cross filesystem boundaries and a hardlink cannot.
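The difference can be checked with a few lines of Python. This is only an illustrative sketch run against a scratch directory; the function name `compare_links` and the file names are invented for the example.

```python
import os
import tempfile


def compare_links():
    """Create a file, a hard link and a soft link, then compare how they look on disk."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "data.ext")
        hard = os.path.join(root, "hard.ext")
        soft = os.path.join(root, "soft.ext")
        with open(target, "w") as fh:
            fh.write("payload")
        os.link(target, hard)     # another name for the same inode
        os.symlink(target, soft)  # a separate small file that stores a path

        target_inode = os.lstat(target).st_ino
        return {
            "hard_shares_inode": os.lstat(hard).st_ino == target_inode,
            "soft_shares_inode": os.lstat(soft).st_ino == target_inode,
            "hard_is_symlink": os.path.islink(hard),
            "soft_is_symlink": os.path.islink(soft),
            "soft_points_to_target": os.readlink(soft) == target,
        }


if __name__ == "__main__":
    for key, value in compare_links().items():
        print(key, value)
```

The hard link cannot be told apart from the "real" file by inspecting it, while the soft link has its own inode and can be detected directly.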
About the actual problem: Have a look at the Exclude option and the include-exclude-list option in the documentation; you should be able to work something out with them, for example something like
exclude /path/to/your/files/*-*-?.*
or similar.
Without knowing anything about Tivoli Storage Manager: it wouldn't be possible to get any piece of software to treat hardlinks differently from files, since there is no actual difference between the original file handle and the other hardlinks. (It may be possible to script it based on file names.)
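If scripting on file names is the route taken, a rough Python sketch could look like the following. It relies entirely on the naming convention described in the question (one hyphen for the file to keep, two for the extra links) and does not look at inodes at all, so it is only as reliable as that convention. The function names are hypothetical, and how the resulting list would be handed to the backup client is left open.

```python
import os


def is_extra_link_name(filename):
    """True if the base name carries two hyphens, e.g. 'foobarbaz-123-1.ext'.

    Assumption: the naming convention from the question holds for every file.
    """
    stem, _ext = os.path.splitext(os.path.basename(filename))
    return stem.count("-") == 2


def split_names(filenames):
    """Return (keep, skip) lists based purely on the naming convention."""
    keep, skip = [], []
    for name in filenames:
        (skip if is_extra_link_name(name) else keep).append(name)
    return keep, skip


if __name__ == "__main__":
    names = [
        "foobarbaz-123.ext",
        "foobarbaz-123-1.ext",
        "foobarbaz-123-2.ext",
        "barbazfoo-456.ext",
        "barbazfoo-456-1.ext",
    ]
    keep, skip = split_names(names)
    print("keep:", keep)
    print("skip:", skip)
```

Unlike a single-character wildcard, counting hyphens also covers link suffixes with more than one digit, assuming such names exist.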
Upgrade to TSM 6.1 and activate deduplication. (currently only available with device type FILE, but patience is a virtue)