We've got a collection of servers that we're backing up using Duplicity. We're trying to build some functionality for our staff so that they can select a file and view the available versions for restore. Duplicity stores its metadata in tar files like:
16M Mar 6 07:20 duplicity-new-signatures.20140305T070733Z.to.20140306T070755Z.sigtar.gz
17M Mar 5 07:17 duplicity-new-signatures.20140304T070728Z.to.20140305T070733Z.sigtar.gz
74M Mar 4 08:02 duplicity-full-signatures.20140304T070728Z.sigtar.gz
13M Mar 3 09:11 duplicity-new-signatures.20140302T070743Z.to.20140303T070723Z.sigtar.gz
14M Mar 2 07:18 duplicity-new-signatures.20140301T070921Z.to.20140302T070743Z.sigtar.gz
18M Mar 1 07:22 duplicity-new-signatures.20140228T071001Z.to.20140301T070921Z.sigtar.gz
16M Feb 28 07:23 duplicity-new-signatures.20140227T071151Z.to.20140228T071001Z.sigtar.gz
15M Feb 27 07:27 duplicity-new-signatures.20140226T070820Z.to.20140227T071151Z.sigtar.gz
13M Feb 26 07:20 duplicity-new-signatures.20140225T071049Z.to.20140226T070820Z.sigtar.gz
14M Feb 25 07:28 duplicity-new-signatures.20140224T070941Z.to.20140225T071049Z.sigtar.gz
92M Feb 24 08:14 duplicity-full-signatures.20140224T070941Z.sigtar.gz
Each .sigtar.gz is a tar archive containing the signatures of all of the changed files. The signatures are stored as files named identically to the files they refer to.
- duplicity-full files contain a signature for every file in the set.
- duplicity-new files contain only signatures for files that have changed since the last full or incremental backup.
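To label each version shown to staff, the backup time can be read from the archive name itself. This is a minimal sketch that assumes the naming pattern in the listing above (a single timestamp for a full, `START.to.END` for an incremental); `archive_time` is a hypothetical helper name, not part of Duplicity.

```shell
# Hypothetical helper: print the timestamp of the backup an archive describes.
# Assumes names like the ones listed above:
#   duplicity-full-signatures.<TIME>.sigtar.gz
#   duplicity-new-signatures.<START>.to.<END>.sigtar.gz
archive_time() {
    name=${1##*/}              # drop any leading directory
    name=${name%.sigtar.gz}    # drop the extension
    printf '%s\n' "${name##*.}"  # keep the last dot-separated field
}
```

For an incremental this yields the END timestamp, which is the point in time that version represents.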
Essentially what I need to do is:
# Glob directly instead of parsing ls output, and quote the variable
for file in /root/.cache/duplicity/hashid/*.sigtar.gz; do
    tar -tzvf "$file" signature/path/to/specified/file.name
done
The problem is that even on the server that only contains 1/20th of the amount of data that we expect a 'fully-loaded' server to have, the listing of a 'full' can take in excess of 10 seconds. I shudder to think of how long this process might take once these machines fill up.
Is there any way to speed up the retrieval via tar?
Or if anyone happens to know of a better way to parse Duplicity metadata, I'd definitely like to know.
It is possible to speed up retrieval via tar with the --seek option. However, because the tarball is compressed, it isn't seekable, so the option does not help here. http://duplicity.nongnu.org/new_format.html acknowledges this limitation.
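Since the compressed archives cannot be seeked, one possible workaround is to pay the decompression cost once per archive and cache the member listing as plain text, so that later lookups are a grep rather than a full scan. The following is only a sketch under assumptions: the function names and the separate index directory are hypothetical, and it assumes a signature archive is not modified after it has been written.

```shell
# Hypothetical helper: list every archive in cache_dir once and store the
# member names in index_dir (kept separate so the Duplicity cache is untouched).
build_index() {
    cache_dir=$1
    index_dir=$2
    mkdir -p "$index_dir"
    for archive in "$cache_dir"/*.sigtar.gz; do
        [ -e "$archive" ] || continue
        index="$index_dir/${archive##*/}.idx"
        # Assumption: archives never change, so an existing index is reused.
        if [ ! -e "$index" ]; then
            # Write to a temp file first so a failed listing leaves no index.
            tar -tzf "$archive" > "$index.tmp" && mv "$index.tmp" "$index"
        fi
    done
}

# Print the name of every archive whose cached listing contains the member.
find_versions() {
    index_dir=$1
    member=$2
    for index in "$index_dir"/*.sigtar.gz.idx; do
        [ -e "$index" ] || continue
        # -x: whole-line match, -F: fixed string, -q: no output
        if grep -qxF -- "$member" "$index"; then
            name=${index##*/}
            printf '%s\n' "${name%.idx}"
        fi
    done
}
```

A nightly run of `build_index /root/.cache/duplicity/hashid /some/index/dir` would then only list the archives added since the previous run, and `find_versions /some/index/dir signature/path/to/specified/file.name` would answer the staff query from the cached text files. The index directory path here is a placeholder.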