When asking GitLab support how to do a 3 TB backup of one's on-premises GitLab instance, they reply: use our tool that produces a tarball.
This just seems wrong to me on all levels. This tarball contains the PostgreSQL dump, Docker images, repo data, Git LFS data, configuration, and so on. Backing up terabytes of static data together with kilobytes of very dynamic data doesn't seem right. And then comes the issue that we want to do a backup every hour.
Question
I'd really like to know from others how they do it, to get a consistent backup.
ZFS on Linux would be fine with me, if that is part of the solution.
I would review what you are backing up and possibly use a "multi-path" approach. For example, you could back up the Git repositories by constantly running Git pulls on a backup server. That would copy only the diff and leave you with a second copy of all Git repositories. Presumably you could detect new repos with the API.
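As a rough sketch of that idea (the instance URL, the `GITLAB_TOKEN` environment variable, and the mirror directory below are assumptions), something like this could enumerate projects via the API and keep bare mirrors up to date:

```python
import os
import subprocess
from pathlib import Path

import requests

GITLAB_URL = "https://gitlab.example.com"   # assumption: your instance URL
TOKEN = os.environ["GITLAB_TOKEN"]          # assumption: a read-only access token
MIRROR_ROOT = Path("/backup/git-mirrors")   # assumption: where mirrors live

def list_projects():
    """Page through /api/v4/projects and yield project records."""
    page = 1
    while True:
        resp = requests.get(
            f"{GITLAB_URL}/api/v4/projects",
            headers={"PRIVATE-TOKEN": TOKEN},
            params={"per_page": 100, "page": page, "simple": "true"},
            timeout=30,
        )
        resp.raise_for_status()
        projects = resp.json()
        if not projects:
            return
        yield from projects
        page += 1

def mirror(project):
    """Clone once with --mirror, then only fetch deltas on later runs."""
    dest = MIRROR_ROOT / f"{project['path_with_namespace']}.git"
    if dest.exists():
        subprocess.run(["git", "-C", str(dest), "remote", "update", "--prune"],
                       check=True)
    else:
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Note: cloning private repos over HTTPS needs credentials (e.g. a
        # token via a git credential helper); SSH mirroring is an alternative.
        subprocess.run(["git", "clone", "--mirror",
                        project["http_url_to_repo"], str(dest)], check=True)

if __name__ == "__main__":
    for p in list_projects():
        mirror(p)
```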
And use the "built-in" backup procedures to back up the issues, etc. I doubt that the 3 TB comes from this part, so you would be able to do backups very often at very little cost. You could also set up the PostgreSQL database with a warm standby using replication.
Possibly your 3TB comes from container images in the Docker registry. Do you need to back those up? If so, then there may be a better approach just for that.
Basically, I would recommend really looking at what it is that makes up your backup and backing up the data in parts.
Even the backup tool from GitLab has options to include/exclude certain parts of the system such as the Docker Registry.
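For instance, on an Omnibus install the bundled backup task accepts SKIP values, so an hourly run of only the small, dynamic parts stays cheap. A minimal sketch (which components to skip depends on what you back up elsewhere):

```python
import subprocess

# Skip the bulky components that are handled separately (registry images,
# LFS objects, CI artifacts); keep the database, uploads, and metadata.
subprocess.run(
    ["gitlab-backup", "create", "SKIP=registry,lfs,artifacts"],
    check=True,
)
```

On older GitLab versions the equivalent entry point is `gitlab-rake gitlab:backup:create`.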
For such a short time between backups (1h), your best bet is to rely on filesystem-level snapshot and `send/recv` support.

If using ZoL is not a problem in your environment, I would strongly advise using it. ZFS is a very robust filesystem and you will really like all the extras (e.g. compression) it offers. When coupled with `sanoid/syncoid`, it can provide a very strong backup strategy. The main disadvantage is that it is not included in the mainline kernel, so you need to install/update it separately.

Alternatively, if you really need to restrict yourself to mainline-included stuff, you can use BTRFS. But be sure to understand its (many) drawbacks and pain points.
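As a rough illustration of the snapshot-and-ship idea (the dataset names and remote host below are assumptions, and sanoid/syncoid would normally handle the bookkeeping for you), an hourly job could look like:

```python
import subprocess
from datetime import datetime, timezone

DATASET = "tank/gitlab"            # assumption: dataset holding GitLab data
REMOTE = "backup@backuphost"       # assumption: ssh target also running ZFS
REMOTE_DATASET = "backup/gitlab"   # assumption: destination dataset

def snapshots(dataset):
    """Return existing snapshot names for the dataset, oldest first."""
    out = subprocess.run(
        ["zfs", "list", "-t", "snapshot", "-H", "-o", "name",
         "-s", "creation", dataset],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    return [s.split("@", 1)[1] for s in out]

def hourly_send():
    new = datetime.now(timezone.utc).strftime("hourly-%Y%m%d%H%M")
    prev = snapshots(DATASET)
    subprocess.run(["zfs", "snapshot", f"{DATASET}@{new}"], check=True)

    send = ["zfs", "send", f"{DATASET}@{new}"]
    if prev:
        # Incremental stream from the previous local snapshot
        # (assumed to already exist on the remote).
        send = ["zfs", "send", "-i", f"{DATASET}@{prev[-1]}", f"{DATASET}@{new}"]

    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    subprocess.run(["ssh", REMOTE, "zfs", "recv", "-F", REMOTE_DATASET],
                   stdin=sender.stdout, check=True)
    sender.stdout.close()
    if sender.wait() != 0:
        raise RuntimeError("zfs send failed")

if __name__ == "__main__":
    hourly_send()
```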
Finally, an alternative solution is to use `lvmthin` to take regular snapshots (e.g. with `snapper`), relying on third-party tools (e.g. `bdsync`, `blocksync`, etc.) to copy/ship the deltas only.

A different approach would be to have two replicated machines (via `DRBD`) where you take independent snapshots via `lvmthin`.
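A minimal sketch of the thin-snapshot side (the volume group and thin volume names are assumptions; shipping the deltas with `bdsync` or similar is left out):

```python
import subprocess
from datetime import datetime, timezone

VG = "vg0"                 # assumption: volume group name
THIN_LV = "gitlab-data"    # assumption: thin volume holding GitLab data

def take_thin_snapshot():
    """Create a thin snapshot; with a thin pool no size needs to be reserved."""
    name = datetime.now(timezone.utc).strftime("gitlab-%Y%m%d%H%M")
    subprocess.run(
        ["lvcreate", "--snapshot", "--name", name, f"{VG}/{THIN_LV}"],
        check=True,
    )
    # Thin snapshots are created with activation skipped by default; to read
    # or mount one, activate it with: lvchange -ay -K vg/snapshot-name
    return f"{VG}/{name}"

if __name__ == "__main__":
    print("created", take_thin_snapshot())
```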