I'm looking to make very frequent backups (every hour) of PostgreSQL data on several VMs (say 20-50) to the same archive server.
Here is more data, if needed: ideally, the system should support the load of 80 to 200 databases spread across all the VMs. The databases range from small (10MB - 100MB) to medium sized (500MB - 2GB) and are composed of hundreds of tables; a small portion of these tables can easily contain several thousand rows, up to around a million rows. Changes to the databases are mostly new records, some updates, and not much deletion. The available bandwidth is 100Mbit/s.
As I'm already doing so with a standard filesystem using incremental backups (rsync), I'm wondering if something similar could be achieved with PostgreSQL database backups.
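For context, the current file-level scheme is roughly the following, where rsync's delta algorithm keeps hourly transfers small (hostnames and paths are placeholders for illustration):

```
# Hourly, from each VM to the archive server: rsync only transfers
# the changed portions of changed files over SSH.
rsync -az --delete /srv/data/ archive:/backups/vm01/current/
```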
I have several possible options:
- I could choose to put the databases on a snapshottable filesystem (aufs as Docker uses it, ZFS, btrfs; but some of these seem to really slow PostgreSQL down).
- I'm ready to use WAL if necessary.
- It would be better if I could back up at the database level only, as I do not need to back up the whole PostgreSQL data directory, just the customer databases.
- I have some disk space on the PostgreSQL server that could hold an intermediate backup.
- I can afford a reasonable CPU workload on the VM side, but I'd rather minimize it on the backup server, as it will add up with the number of databases to back up.
- I'm not really looking for continuous archiving or PITR options. My backup server uses a filesystem (btrfs) to do efficient periodic snapshots of the backups, roughly as sketched below. It's good enough.
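For illustration, those periodic snapshots on the backup server are just read-only btrfs snapshots of the subvolume receiving the backups (subvolume paths are placeholders):

```
# After each hourly sync, freeze the received state; copy-on-write
# means unchanged data is shared between all snapshots.
btrfs subvolume snapshot -r /backups/vm01/current \
    /backups/vm01/snapshots/hourly-$(date +%Y%m%dT%H%M)
```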
I've thought about:
- using rsync in combination with pg_dump run locally on the server in SQL format, but I'm not sure which of the different dump formats I should use for maximum efficiency (see the sketch after this list);
- using a snapshottable filesystem that can send binary diffs at the block level (btrfs and ZFS are good at this), with or without a local dump (same question about the dump format to use);
- I've learned about the existence of pg_rman; I don't really know if it can be relied upon, and the setup and various processes seem somewhat heavier than pg_dump. Would it support having only incremental backups? And can we have a practical format on the backup side?
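To make the first idea concrete, here is the kind of pipeline I have in mind (database name, user and paths are placeholders; the comment about formats is my guess, not something I've verified):

```
# Hourly, on each VM: dump each customer database locally, then let
# rsync transfer only the changed parts of the dump file.
# -Fp = plain SQL; I suspect the compressed custom format (-Fc)
# changes entirely on each dump and defeats rsync's delta transfer.
pg_dump -U postgres -Fp customers_db > /var/backups/customers_db.sql
rsync -az /var/backups/customers_db.sql archive:/backups/vm01/
```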
And is there a way other than incremental backups to achieve low bandwidth usage?
So... how can I minimize bandwidth in my PostgreSQL backup scenario?
You're attempting to solve a problem that real database systems already have well-practiced answers for, using an awkward solution; this is understandable for people coming from a background in smaller database systems (I've done a very similar thing myself with MySQL and slogged through the consequences of bandwidth blowout).
You should use PostgreSQL's replication features; see http://www.postgresql.org/docs/9.3/interactive/high-availability.html
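A minimal sketch of what streaming replication looks like on 9.3 (hostnames, the replication user and data paths are placeholders; see the linked documentation for the full picture):

```
# --- primary: postgresql.conf ---
# wal_level = hot_standby
# max_wal_senders = 5
# wal_keep_segments = 64

# --- primary: pg_hba.conf ---
# host  replication  replicator  192.0.2.10/32  md5

# --- standby (archive server): take the initial base backup ---
pg_basebackup -h primary.example.com -U replicator \
    -D /var/lib/postgresql/9.3/main -X stream -P

# --- standby: recovery.conf ---
# standby_mode = 'on'
# primary_conninfo = 'host=primary.example.com user=replicator'
```

Once the base backup is taken, only the WAL stream crosses the wire, so the ongoing bandwidth is proportional to the actual changes rather than the size of the databases.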
Make your dumps in plain SQL format. Keep one full copy on the local VM, refreshed, let's say, every day. Then dump a new copy each hour and make a diff against the full copy. Transfer the full copy once a day and only the diff at other times. To restore, you will have to patch the full copy with a diff and execute the resulting SQL file.
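A sketch of that scheme (database name and paths are placeholders; assumes GNU diff and patch are available on both sides):

```
# Daily, on the VM: refresh the full reference dump and ship it.
pg_dump -Fp customers_db > /var/backups/full.sql
rsync -az /var/backups/full.sql archive:/backups/vm01/

# Hourly: dump again, but only transfer the textual diff.
pg_dump -Fp customers_db > /var/backups/hourly.sql
diff -u /var/backups/full.sql /var/backups/hourly.sql > /var/backups/hourly.diff
rsync -az /var/backups/hourly.diff archive:/backups/vm01/

# To restore on the archive side: rebuild the hourly dump and replay it.
patch -o /tmp/restored.sql /backups/vm01/full.sql /backups/vm01/hourly.diff
createdb customers_db
psql -d customers_db -f /tmp/restored.sql
```

Note that the diff stays small only as long as pg_dump emits rows in a reasonably stable order; heavy updates or a VACUUM FULL between dumps can reorder rows and inflate it.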