On a server, install git
cd /
git init
git add .
git commit -a -m "Yes, this is server"
Then get /.git/ to point to a network drive (SAN, NFS, Samba, whatever) or a different disk. Use a cron job every hour/day etc. to commit the changes. The .git directory would contain a versioned copy of all the server files (excluding the useless/complicated ones like /proc, /dev, etc.).
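For illustration, a cron entry along these lines (the schedule and paths are assumptions) would do it:

# illustrative crontab entry: snapshot the filesystem every hour
0 * * * * cd / && git add -A && git commit -q -m "hourly snapshot"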
For a non-important development server where I don't want the hassle/cost of setting it up on a proper backup system, and where backups would only be for convenience (i.e. we don't need to back up this server, but it would save some time if things went wrong), could this be a valid backup solution, or will it just fall over in a big pile of poop?
You're not a silly person. Using git as a backup mechanism can be attractive, and despite what other folks have said, git works just fine with binary files. Read this page from the Git Book for more information on this topic. Basically, since git stores a complete compressed copy of each file version rather than diffs (delta compression only happens as a packfile optimization), it doesn't really care what your files look like (but the utility of git diff is pretty low for binary files with a stock configuration).

The biggest issue with using git for backup is that it does not preserve most filesystem metadata. Specifically, git does not record:

- file owners and groups
- file permissions (other than the executable bit)
- extended attributes and ACLs
- modification times
- empty directories, and special files such as device nodes, sockets, and FIFOs

You can solve this by writing tools to record this information explicitly into your repository, but it can be tricky to get this right.
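As a minimal sketch of that idea (GNU find is assumed; this is not a robust tool), you could version a metadata manifest alongside the files themselves:

# record mode, uid, gid, and mtime for every path into a file
# that gets committed along with the data
cd /
find . -path ./.git -prune -o -printf '%m %U %G %T@ %p\n' > .metadata
git add -A
git commit -m "snapshot with metadata"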
A Google search for git backup metadata yields a number of results that appear to be worth reading (including some tools that already attempt to compensate for the issues I've raised here).
etckeeper was developed for backing up /etc and solves many of these problems. I've not used it, but you might also look at bup, which is a backup tool based on git.
It can be a valid backup solution; etckeeper is based on this idea. But keep an eye on the permissions of the .git directory, because otherwise committing /etc/shadow means its contents become readable by anyone who can read the .git directory.
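For illustration, a minimal precaution (assuming the repository lives at /.git) is to lock the directory down to root:

# make the repository, including its copies of /etc/shadow, root-only
chown -R root:root /.git
chmod -R go-rwx /.git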
Whilst technically you could do this, I would put two caveats against it:
1. You are using a source version control system for binary data. You are therefore using it for something it was not designed for.
2. I worry about your development process if you don't have a process (documented or automated) for building a new machine. What if you got hit by a bus? Who would know what to do and what was important?
Disaster recovery is important, but it's better to automate (script) the setup of a new development box than to just back up everything. By all means use git for your scripts/documentation, but not for every file on a computer.
I use git as a backup for my Windows system, and it's been incredibly useful. At the bottom of the post, I show the scripts I use to set it up on a Windows system. Using git as a backup for any system provides two big advantages: a complete version history of every file, and fine-grained control over what gets backed up, how often, and where it goes.
Bottom line: a git backup gives you an incredible amount of power to control how your backups happen.
I configured this on my Windows system. The first step is to create the local git repo where you will commit all your local data. I recommend using a second local hard drive, though using the same hard drive will work too (but it's expected you'll push this somewhere remote, or else you're screwed if the hard drive dies).
You'll first need to install Cygwin (with rsync), and also install Git for Windows: http://git-scm.com/download/win
Next, create your local git repo (only run once):
init-repo.bat:
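A minimal sketch of such a script (the D:\backup location is an assumption; any second drive will do):

REM init-repo.bat -- one-time creation of the local backup repository
REM D:\backup is an assumed location
mkdir D:\backup
cd /d D:\backup
git init
REM keep git from rewriting line endings in backed-up files
git config core.autocrlf false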
Next, we have our backup script wrapper, which will be called regularly by Windows Scheduler:
gbackup.vbs:
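A minimal sketch (the path to gbackup.bat is an assumption); the .vbs wrapper exists so the batch file runs without flashing a console window:

' gbackup.vbs -- run the backup batch file invisibly
Set shell = CreateObject("WScript.Shell")
' 0 = hidden window, True = wait for the script to finish
shell.Run "C:\scripts\gbackup.bat", 0, True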
Next, we have the backup script itself that the wrapper calls:
gbackup.bat:
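A minimal sketch (the source, destination, and Cygwin paths are all assumptions; adjust to your layout):

REM gbackup.bat -- mirror selected files into the repo, then commit and push
c:\cygwin\bin\rsync.exe -a --delete --exclude-from=/cygdrive/c/scripts/exclude-from.txt /cygdrive/c/Users/me/ /cygdrive/d/backup/
cd /d D:\backup
git add -A
git commit -m "automatic backup %date% %time%"
REM push to the bare remote created with 'git init --bare'
git push origin master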
We have exclude-from.txt file, where we put all the files to ignore:
exclude-from.txt:
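Illustrative patterns only; what you exclude will depend on your own system:

# exclude-from.txt -- files rsync should skip (examples, not a complete list)
NTUSER.DAT*
AppData/Local/Temp/
*.tmp
pagefile.sys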
You'll need to go to any remote repos and do a 'git init --bare' on them. You can test things by executing the backup script by hand. Assuming everything works, go to Windows Scheduler and point an hourly task at the vbs file. After that, you'll have a git history of your computer for every hour. It's extremely convenient: ever accidentally delete a section of text and miss it? Just check your git repository.
Well, it's not a bad idea, but I think there are two red flags to be raised:

1. The .git/ folder lives on the same disk as the data it protects, so if that disk dies, the backup dies with it. But still, it can be a good backup for corruption-related problems. Or, like you said, if the .git/ folder is somewhere else.

2. A commit every hour or day adds up, and git never throws anything away on its own. So you may need to tell your cron job to add tags, and then make sure commits that are not tagged will be cleaned up.
I haven't tried it with a full system, but I'm using it for my MySQL backups (with the --skip-extended-insert option) and it has really worked well for me.
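For illustration, the core of such a setup might look like this (the database name and paths are assumptions); --skip-extended-insert writes one INSERT per row, so successive dumps diff and pack well in git:

# dump with one INSERT per row, then commit the new revision
mysqldump --skip-extended-insert mydb > /backup/mysql/mydb.sql
cd /backup/mysql
git add mydb.sql
git commit -q -m "mysql backup"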
You're going to run into problems with binary data files (their entire contents can and will change), and you might have problems with the .git folder getting really large. I would recommend setting up a .gitignore file and only backing up text files that you really know you need.
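For illustration, a .gitignore along these lines (the patterns are assumptions) keeps the obvious binary and transient data out of the repository:

# illustrative .gitignore patterns; adjust to your system
*.iso
*.jpg
*.zip
/var/cache/
/var/tmp/
/tmp/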
I once developed a backup solution based on Subversion. While it worked quite well (and git should work even better), I think there are better solutions out there.
I consider rsnapshot to be one of the better ones, if not the best. With good use of hard links, I have a 300 GB file server (with half a million files) with daily, weekly, and monthly backups going back as far as one year. Total used disk space is only one full copy plus the incremental part of each backup, but thanks to hard links I have a complete "live" directory structure in each of the backups. In other words, files are directly accessible not only under daily.0 (the most recent backup), but even in daily.1 (yesterday) or weekly.2 (two weeks ago), and so on.
By re-sharing the backup folder with Samba, my users are able to pull files from backups simply by pointing their PCs at the backup server.
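For a sense of what this takes, a minimal /etc/rsnapshot.conf along these lines (the values are illustrative; rsnapshot requires tabs between fields, and older versions spell "retain" as "interval"):

# where the rotated snapshots live
snapshot_root	/backup/snapshots/
# how many snapshots of each level to keep
retain	daily	7
retain	weekly	4
retain	monthly	12
# what to back up: source, then destination subdirectory
backup	/home/	localhost/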
Another very good option is rdiff-backup, but as I like to have files always accessible simply by pointing Explorer at \\servername, rsnapshot was the better solution for me.
I had the same idea to back up with git, basically because it allows versioned backups. Then I saw rdiff-backup, which provides that functionality (and much more). It has a really nice user interface (look at the CLI options). I'm quite happy with it. The --remove-older-than 2W option is pretty cool: it lets you just delete versions older than two weeks, and rdiff-backup
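For a sense of the workflow, a minimal sketch (the paths are assumptions):

# keep the newest version whole; older versions are stored as reverse diffs
rdiff-backup /home /backup/home
# drop everything older than two weeks
rdiff-backup --remove-older-than 2W /backup/home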
stores only diffs of files.

I am extremely new to git, but aren't branches local by default, and must be pushed explicitly to remote repositories? This was an unpleasant and unexpected surprise. After all, don't I want all of my local repo to be 'backed up' to the server? Reading the git book:

"Your local branches aren't automatically synchronized to the remotes you write to — you have to explicitly push the branches you want to share."
To me this meant that those local branches, like other non-git files on my local machine, are at risk of being lost unless backed up regularly by some non-git means. I do this anyway, but it broke my assumptions about git 'backing up everything' in my repo. I'd love clarification on this!
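If the intent is for the remote to hold everything, one way (assuming the remote is named origin) is to push all branches and tags explicitly:

# push every local branch, plus tags, to the backup remote
git push origin --all
git push origin --tags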