By default, the process fails with a "too many open files" error. If I raise the limit manually, it fails with an out-of-memory error instead. For whatever reason, Git Annex in its current state doesn't seem optimised for this sort of task (adding thousands of files to a repository at once).
As a possible solution, my next thought was to do something like:
cd /
# Need to add each file's parent directories first, or adding the files fails
find . -type d -print0 | xargs -0 git annex add --$NONRECURSIVELY
find . -type f -print0 | xargs -0 git annex add
The problem with this solution is that, from what I can see in the documentation, there is no way to add a directory non-recursively in Git Annex. Is there something I'm missing, or a workaround for this?
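One thing worth trying in the meantime (untested, and assuming the real bottleneck is the number of paths handed to a single invocation) is to feed `git annex add` bounded batches via `xargs -n`. The batching itself is plain shell and can be demonstrated without git-annex; the scratch directory and the `echo` stand in for the real command:

```shell
# Scratch-dir demo of batching; swap the echo for the real git annex add.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"
# -print0/-0 keep odd filenames intact; -n 1000 caps each invocation's argv.
out=$(find "$dir" -type f -print0 | xargs -0 -n 1000 sh -c 'echo "batch of $# file(s)"' sh)
echo "$out"   # → batch of 3 file(s)
rm -r "$dir"
# For the real repository the same pipeline would be:
#   find . -type f -print0 | xargs -0 -n 1000 git annex add
```

Each `xargs` batch is a fresh process, so per-invocation limits like open file descriptors reset between batches.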
If my proposed solution is a dead end, are there other ways that people have solved this problem?
Update: do not do this.
Evidently, Git Annex moves each file added to the repository into a directory structure under .git/annex/objects, then replaces it with a symlink pointing at the real file inside .git. This would have been fine if I hadn't first experimented with adding /etc.
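You can mimic the layout by hand to see why this is so fragile (the two-level hash directories and the key name below are stand-ins, not real git-annex output — real keys look like SHA256E-s&lt;size&gt;--&lt;hash&gt;):

```shell
# Hand-built imitation of what "git annex add bigfile" leaves behind.
# "Wx/2f" and "KEY" are placeholders for the hash directories and key name.
work=$(mktemp -d) && cd "$work"
mkdir -p .git/annex/objects/Wx/2f/KEY
printf 'big payload' > .git/annex/objects/Wx/2f/KEY/KEY
ln -s .git/annex/objects/Wx/2f/KEY/KEY bigfile
readlink bigfile   # → .git/annex/objects/Wx/2f/KEY/KEY
cat bigfile        # → big payload
```

Only the symlink lives in the working tree; the content is inside .git — which is exactly why annexing /etc in place breaks everything that reads those files.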
Needless to say, server hosed. Luckily, I came up with a fix:
Edit: Disregard; I'm stupid; system's still hosed; this is going to be a long night.
Second edit: Managed to unhose the system. It involved a lot of manual reconstruction of /etc and reinstalling every package, including reconfiguring/unbreaking a large number of them and debugging/solving a ton of problems with APT. Would not try this again.
As far as the problem of version controlling 300 gigs of files, I'll come back with an update whenever I decide on something and get it working (regardless of whether or not it's with Git Annex).
Small update:
This entire problem was totally user error. My root drive is a 256 GB SSD, while one of the folders I was attempting to add mapped to a 1.5 TB RAID 1 array. No matter how I tried to accomplish this, it would inevitably have tried to copy more data into the /.git folder than the drive could fit (duh). No idea what I thought was going to happen :/.
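The check I should have run first is trivial: compare the payload's size against the free space on the filesystem holding the repository. `SRC` and `REPO` below are placeholders (they default to the current directory so the sketch is safe to run anywhere):

```shell
# Refuse to annex data that cannot physically fit on the repo's filesystem.
SRC="${SRC:-.}"    # data you intend to add (e.g. the RAID mount)
REPO="${REPO:-.}"  # where the repository (and .git/annex/objects) lives
need_kib=$(du -sk "$SRC" | cut -f1)
free_kib=$(df -Pk "$REPO" | awk 'NR==2 {print $4}')
if [ "$need_kib" -gt "$free_kib" ]; then
    echo "refusing: need ${need_kib} KiB, only ${free_kib} KiB free"
else
    echo "ok: ${free_kib} KiB free covers ${need_kib} KiB of data"
fi
```

In my case this would have compared ~1.5 TB of payload against a 256 GB root drive and refused immediately.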
This is why you don't mess with system directories...
Initialised the Git Annex repository on the 1.5 TB drive instead and just copied the root-level directories I wanted backed up. The normal
git annex add .
worked brilliantly, and my repository has been backing itself up to Glacier for the past five days or so using these Annex-Glacier hooks, with little issue.

I use annex for host management like this:
This all acts like a low-speed, distributed, "admin filesystem", with version control, staging, and whatever checks and balances you want to put into how you're using git and git annex.
If you're managing your machines sanely, you don't need to check in the entire root filesystem -- most of it doesn't vary from one machine to another. You do need to have some way of managing package installs and upgrades, but that tooling can itself be checked into the annex, along with the packages and other blobs which it uses as source data -- again, all versioned courtesy of git.
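A minimal sketch of that split (repository name, file names, and layout are all hypothetical, and the script exits quietly if git-annex isn't installed): small, diffable scripts stay in plain git, while large package blobs go through the annex.

```shell
# Hypothetical admin repo: tooling in plain git, package blobs in the annex.
command -v git-annex >/dev/null 2>&1 || { echo "git-annex not installed; skipping"; exit 0; }
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email admin@example.com && git config user.name admin
git annex init "admin-master"
mkdir -p tools packages
printf '#!/bin/sh\necho provisioning\n' > tools/provision.sh
printf 'stand-in package blob' > packages/foo_1.0.deb
git add tools/            # small scripts: ordinary git objects, full diffs
git annex add packages/   # large blobs: symlinks into .git/annex/objects
git commit -q -m "version tooling and package blobs together"
status=ok
```

Both halves land in the same history, so a given commit pins the provisioning scripts to the exact package blobs they were written against.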