Ping a Specific Port

Question

Smudge

Asked: 2011-12-16 04:10:47 +0800 CST2011-12-16 04:10:47 +0800 CST 2011-12-16 04:10:47 +0800 CST

GIT as a backup tool

772

On a server, install git

cd /
git init
git add .
git commit -a -m "Yes, this is server"

Then get /.git/ to point to a network drive (SAN, NFS, Samba whatever) or different disk. Use a cron job every hour/day etc. to update the changes. The .git directory would contain a versioned copy of all the server files (excluding the useless/complicated ones like /proc, /dev etc.)

For a non-important development server where I don't want the hassle/cost of setting it up on a proper backup system, and where backups would only be for convenience (I.E. we don't need to backup this server but it would save some time if things went wrong), could this be a valid backup solution or will it just fall over in a big pile of poop?

16 Answers

Voted

larsks · Answer 1 · 2011-12-16T09:25:18+08:00

You're not a silly person. Using git as a backup mechanism can be attractive, and despite what other folks have said, git works just fine with binary files. Read this page from the Git Book for more information on this topic. Basically, since git is not using a delta storage mechanism, it doesn't really care what your files look like (but the utility of git diff is pretty low for binary files with a stock configuration).

The biggest issue with using git for backup is that it does not preserve most filesystem metadata. Specifically, git does not record:

file groups
file owners
file permissions (other than "is this executable")
extended attributes

You can solve this by writing tools to record this information explicitly into your repository, but it can be tricky to get this right.

A Google search for git backup metadata yields a number of results that appear to be worth reading (including some tools that already attempt to compensate for the issues I've raised here).

etckeeper was developed for backing up /etc and solves many of these problems.

stew · Answer 2 · 2011-12-16T05:27:22+08:00

stew

2011-12-16T05:27:22+08:002011-12-16T05:27:22+08:00

I've not used it, but you might look at bup which is a backup tool based on git.

28

Stone · Answer 3 · 2011-12-16T04:18:11+08:00

Stone

2011-12-16T04:18:11+08:002011-12-16T04:18:11+08:00

It can be a valid backup solution, etckeeper is based on this idea. But keep an eye on the .git directory permissions otherwise pushing /etc/shadow can be readable in the .git directory.

12

Phil Hannent · Answer 4 · 2011-12-16T05:45:57+08:00

Phil Hannent

2011-12-16T05:45:57+08:002011-12-16T05:45:57+08:00

Whilst technically you could do this I would put two caveats against it:

1, You are using a source version control system for binary data. You are therefore using it for something that it was not designed for.

2, I worry about your development process if you don't have a process (documentation or automated) for building a new machine. What if you got hit buy a bus, who would know what to do and what was important?

Disaster recovery is important, however its better to automate (script) the setup of a new development box than just backup everything. Sure use git for your script/documentation but not for every file on a computer.

12

user64141 · Answer 5 · 2015-03-22T09:10:17+08:00

I use git as a backup for my Windows system, and it's been incredibly useful. At the bottom of the post, I show the scripts I use to configure on a Windows system. Using git as a backup for any system provides 2 big advantages:

Unlike commercial solutions often use their own proprietary format, your backup is in an open source format that is widely supported and very well documented. This gives you full control of your data. It's very easy to see which files changed and when. If you want to truncate your history, you can do that as well. Want to obliterate something from your history? No problem. Getting a version of your file back is as simple as any git command.
As many or as few mirrors as you want, and all can have customized backup times. You'll get your local mirror, which is unburdened by slow Internet traffic, and thus gives you (1) the ability to do more frequent backups throughout the day and (2) a quick restoration time. (Frequent backups are a huge plus, because I find the most time I lose a document is by user-error. For example, your kid accidentally overwrites a document he's been working on for the last 5 hours.) But you'll get your remote mirror, which gives the advantage of data protection in case of a local disaster or theft. And suppose you want your remote mirror backing up at customized time to save your Internet bandwidth? No problem.

Bottom line: A git backup gives you incredible amounts of power on controlling how your backups happen.

I configured this on my Windows system. The first step is to create the local git repo where you will commit all your local data to. I recommend using a local second hard drive, but using the same harddrive will work to (but it's expected you'll push this somewhere remote, or otherwise your screwed if the harddrive dies.)

You'll first need to install cygwin (with rsync), and also install git for Windows: http://git-scm.com/download/win

Next, create your local git repo (only run once):

init-repo.bat:

@echo off
REM SCRIPT PURPOSE: CREATE YOUR LOCAL GIT-REPO (RUN ONLY ONCE)

REM Set where the git repository will be stored
SET GBKUP_LOCAL_MIRROR_HOME=E:\backup\mirror


REM Create the backup git repo. 
SET GIT_PARAMS=--git-dir=%GBKUP_LOCAL_MIRROR_HOME%\.git --work-tree=%GBKUP_LOCAL_MIRROR_HOME% 
mkdir %GBKUP_LOCAL_MIRROR_HOME%
git %GIT_PARAMS% init
git %GIT_PARAMS% config core.autocrlf false
git %GIT_PARAMS% config core.ignorecase false 
git %GIT_PARAMS% config core.fileMode false
git %GIT_PARAMS% config user.email backup@yourComputerName
git %GIT_PARAMS% config user.name backup

REM add a remote to the git repo.  Make sure you have set myRemoteServer in ~/.ssh/config   
REM The path on the remote server will vary.  Our remote server is a Windows machine running cygwin+ssh.  
REM For better security, you could install gitolite on the remote server, and forbid any non-fast-forward merges, and thus stop a malicious user from overwriting your backups.
git %GIT_PARAMS% remote add origin myRemoteServer:/cygdrive/c/backup/yourComputerName.git

REM treat all files as binary; so you don't have to worry about autocrlf changing your line endings
SET ATTRIBUTES_FILE=%GBKUP_LOCAL_MIRROR_HOME%\.git\info\attributes
echo.>> %ATTRIBUTES_FILE% 
echo *.gbkuptest text>> %ATTRIBUTES_FILE% 
echo * binary>> %ATTRIBUTES_FILE% 
REM compression is often a waste of time with binary files
echo * -delta>> %ATTRIBUTES_FILE% 
REM You may need to get rid of windows new lines. We use cygwin's tool
C:\cygwin64\bin\dos2unix %ATTRIBUTES_FILE%

Next, we have our backup script wrapper, which will be called regularly by Windows Scheduler:

gbackup.vbs:

' A simple vbs wrapper to run your bat file in the background
Set oShell = CreateObject ("Wscript.Shell") 
Dim strArgs
strArgs = "cmd /c C:\opt\gbackup\gbackup.bat"
oShell.Run strArgs, 0, false

Next, we have the backup script itself that the wrapper calls:

gbackup.bat:

    @echo off

REM Set where the git repository will be stored
SET GBKUP_LOCAL_MIRROR_HOME=E:\backup\mirror
REM the user which runs the scheduler
SET GBKUP_RUN_AS_USER=yourWindowsUserName
REM exclude file
SET GBKUP_EXCLUDE_FILE=/cygdrive/c/opt/gbackup/exclude-from.txt

SET GBKUP_TMP_GIT_DIR_NAME=git-renamed
for /f "delims=" %%i in ('C:\cygwin64\bin\cygpath %GBKUP_LOCAL_MIRROR_HOME%') do set GBKUP_LOCAL_MIRROR_CYGWIN=%%i

REM rename any .git directories as they were (see below command)
for /r %GBKUP_LOCAL_MIRROR_HOME% %%i in (%GBKUP_TMP_GIT_DIR_NAME%) do ren "%%i" ".git" 2> nul

SET RSYNC_CMD_BASE=C:\cygwin64\bin\rsync -ahv --progress --delete --exclude-from %GBKUP_EXCLUDE_FILE%

REM rsync all needed directories to local mirror
%RSYNC_CMD_BASE% /cygdrive/c/dev %GBKUP_LOCAL_MIRROR_CYGWIN%
%RSYNC_CMD_BASE% /cygdrive/c/Users/asmith %GBKUP_LOCAL_MIRROR_CYGWIN%
%RSYNC_CMD_BASE% /cygdrive/c/Users/bsmith %GBKUP_LOCAL_MIRROR_CYGWIN%

cacls %GBKUP_LOCAL_MIRROR_HOME% /t /e /p  %GBKUP_RUN_AS_USER%:f

REM rename any .git directories as git will ignore the entire directory, except the main one
for /r %GBKUP_LOCAL_MIRROR_HOME% %%i in (.git) do ren "%%i" "%GBKUP_TMP_GIT_DIR_NAME%" 2> nul
ren %GBKUP_LOCAL_MIRROR_HOME%\%GBKUP_TMP_GIT_DIR_NAME% .git

REM finally commit to git
SET GIT_PARAMS=--git-dir=%GBKUP_LOCAL_MIRROR_HOME%\.git --work-tree=%GBKUP_LOCAL_MIRROR_HOME% 
SET BKUP_LOG_FILE=%TMP%\git-backup.log
SET TO_LOG=1^>^> %BKUP_LOG_FILE% 2^>^&1
echo ===========================BACKUP START=========================== %TO_LOG%
For /f "tokens=2-4 delims=/ " %%a in ('date /t') do (set mydate=%%c-%%a-%%b)
For /f "tokens=1-2 delims=/:" %%a in ('time /t') do (set mytime=%%a%%b)
echo %mydate%_%mytime% %TO_LOG%
echo updating git index, committing, and then pushing to remote %TO_LOG%
REM Caution: The --ignore-errors directive tells git to continue even if it can't access a file.
git %GIT_PARAMS% add -Av --ignore-errors %TO_LOG%
git %GIT_PARAMS% commit -m "backup" %TO_LOG%
git %GIT_PARAMS% push -vv --progress origin master %TO_LOG%
echo ===========================BACKUP END=========================== %TO_LOG%

We have exclude-from.txt file, where we put all the files to ignore:

exclude-from.txt:

target/
logs/
AppData/
Downloads/
trash/
temp/
.idea/
.m2/
.IntelliJIdea14/
OLD/
Searches/
Videos/
NTUSER.DAT*
ntuser.dat*

You'll need to go to any remote repos and do a 'git init --bare' on them. You can test the script by executing the backup script. Assuming everything works, go to Windows Scheduler and point an hourly backup toward the vbs file. After that, you'll have a git history of your computer for every hour. It's extremely convenient -- every accidentally delete a section of text and miss it? Just check your git repository.

FMaz008 · Answer 6 · 2011-12-16T05:40:10+08:00

FMaz008

2011-12-16T05:40:10+08:002011-12-16T05:40:10+08:00

Well it's not a bad idea, but I think there is 2 red flags to be raised:

If the harddisk fail, you'll lose everything if you're not pushing your commit to another server/drive. ( Event if you've a plan for it, I prefer to mention. )

... but still, it can be a good backup for corruptions-related things. Or like you said, if the .git/ folder is somewhere else.

This backup will always increase in size. There's no pruning or rotation or anything by default.

... So you may need to tell your cronjob to add tags, and then make sure commit that are not tagged will be cleaned up.

5

Scott Keck-Warren · Answer 7 · 2011-12-16T05:23:59+08:00

Scott Keck-Warren

2011-12-16T05:23:59+08:002011-12-16T05:23:59+08:00

I haven't tried it with a full system but I'm using it for my MySQL backups (with the --skip-extended-insert option) and it has really worked well for me.

You're going to run into problem with binary data files (their entire contents could and will change) and you might have problems with the .git folder getting really large. I would recommend setting up a .gitignore file and only backing up text files that you really know you need.

3

shodanshok · Answer 8 · 2015-03-22T12:01:23+08:00

I once developped a backup solution based on subversion. While it worked quite well (and git should work even better), I think there are better solutions out here.

I consider rsnapshot to be one of the better - if not the better. With a good use of hard link, I have a 300 GB fileserver (with half a million files) with daily, weekly and montly backup going back as far as one years. Total used disk space is only one full copy + the incremental part of each backup, but thanks to hardlinks I have a complete "live" directory structure in each of the backups. In other word, files are directly accessible not only under daily.0 (the most recent backup), but even in daily.1 (yestarday) or weekly.2 (two week ago), and so on.

Resharing the backup folder with Samba, my users are able to pull the file from backups simply by pointing their PC to the backup server.

Another very good options is rdiff-backup, but as I like to have files always accessible simply by heading Explorer to \\servername, rsnapshot was a better solution for me.

Daniel · Answer 9 · 2011-12-16T10:07:19+08:00

Daniel

2011-12-16T10:07:19+08:002011-12-16T10:07:19+08:00

I had the same idea to backup with git, basically because it allows versioned backups. Then I saw rdiff-backup, which provides that functionality (and much more). It has a really nice user interface (look at the CLI options). I'm quite happy with that. The --remove-older-than 2W is pretty cool. It allows you to just delete versions older than 2 weeks. rdiff-backup stores only diffs of files.

2

Matthew Cornell · Answer 10 · 2013-03-07T05:22:15+08:00

Matthew Cornell

2013-03-07T05:22:15+08:002013-03-07T05:22:15+08:00

I am extremely new to git, but aren't branches local by default, and must be pushed explicitly to remote repositories? This was an unpleasant and unexpected surprise. After all, don't I want all of my local repo to be 'backed up' to the server? Reading the git book:

Your local branches aren’t automatically synchronized to the remotes you write to — you have to explicitly push the branches you want to share. That way, you can use private branches for work you don’t want to share, and push up only the topic branches you want to collaborate on.

To me this meant that those local branches, like other non-git files on my local machine, are at risk of being lost unless backed up regularly by some non-git means. I do this anyway, but it broke my assumptions about git 'backing up everything' in my repo. I'd love clarification on this!

2

GIT as a backup tool

Ping a Specific Port

Check if port is open or closed on a Linux server?

How to automate SSH login with password?

How do I tell Git for Windows where to find my private RSA key?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Resolve host name from IP address

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?