I'm looking to make very frequent backups (every hour) of PostgreSQL data on several VMs (say 20-50) to the same archive server.
Here is more data, if needed: ideally, the system should support the load of 80 to 200 databases spread across all the VMs. The databases range from small (10MB - 100MB) to medium sized (500MB - 2GB) and are composed of hundreds of tables; a small portion of these tables can easily contain several thousand rows, up to around a million rows. Changes to the databases are mostly new records, some updates, and not much deletion. The available bandwidth is 100Mbit/s.
As I'm already doing so with a standard filesystem using incremental backups (rsync), I'm wondering if something similar could be achieved with PostgreSQL database backups.
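For context, the current file-level scheme is roughly the following, where rsync's delta algorithm keeps hourly transfers small (hostnames and paths are placeholders for illustration):

```
# Hourly, from each VM to the archive server: rsync only transfers
# the changed portions of changed files over SSH.
rsync -az --delete /srv/data/ archive:/backups/vm01/current/
```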
I have several possible options:
- I could choose to put the databases on a snapshottable filesystem (aufs as Docker uses it, ZFS, btrfs; but some of these seem to really slow PostgreSQL down).
- I'm ready to use WAL if necessary.
- It would be better if I could back up at the database level only, as I do not need to back up the whole PostgreSQL data directory, just the customer databases.
- I have some disk space on the PostgreSQL server that could hold an intermediate backup.
- I can afford a reasonable CPU workload on the VM side, but I'd rather minimize it on the backup server, as it will add up with the number of databases to back up.
- I'm not really looking for continuous archiving or PITR options. My backup server uses a filesystem (btrfs) to do efficient periodic snapshots of the backups, roughly as sketched below. It's good enough.
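For illustration, those periodic snapshots on the backup server are just read-only btrfs snapshots of the subvolume receiving the backups (subvolume paths are placeholders):

```
# After each hourly sync, freeze the received state; copy-on-write
# means unchanged data is shared between all snapshots.
btrfs subvolume snapshot -r /backups/vm01/current \
    /backups/vm01/snapshots/hourly-$(date +%Y%m%dT%H%M)
```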
I've thought about:
- using rsync in combination with pg_dump run locally on the server in SQL format, but I'm not sure which of the different dump formats I should use for maximum efficiency (see the sketch after this list);
- using a snapshottable filesystem that can send binary diffs at the block level (btrfs and ZFS are good at this), with or without a local dump (same question about the dump format to use);
- I've learned about the existence of pg_rman; I don't really know if it can be relied upon, and the setup and various processes seem somewhat heavier than pg_dump. Would it support having only incremental backups? And can we have a practical format on the backup side?
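To make the first idea concrete, here is the kind of pipeline I have in mind (database name, user and paths are placeholders; the comment about formats is my guess, not something I've verified):

```
# Hourly, on each VM: dump each customer database locally, then let
# rsync transfer only the changed parts of the dump file.
# -Fp = plain SQL; I suspect the compressed custom format (-Fc)
# changes entirely on each dump and defeats rsync's delta transfer.
pg_dump -U postgres -Fp customers_db > /var/backups/customers_db.sql
rsync -az /var/backups/customers_db.sql archive:/backups/vm01/
```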
And is there a way other than incremental backups to achieve low bandwidth usage?
So... how can I minimize bandwidth in my PostgreSQL backup scenario?
You're attempting to solve a problem that real database systems already have well-practiced answers for, using an awkward solution; this is understandable for people coming from a background in smaller database systems (I've done a very similar thing myself with MySQL and slogged through the consequences of bandwidth blowout).
You should use PostgreSQL's replication features; see http://www.postgresql.org/docs/9.3/interactive/high-availability.html
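A minimal sketch of what streaming replication looks like on 9.3 (hostnames, the replication user and data paths are placeholders; see the linked documentation for the full picture):

```
# --- primary: postgresql.conf ---
# wal_level = hot_standby
# max_wal_senders = 5
# wal_keep_segments = 64

# --- primary: pg_hba.conf ---
# host  replication  replicator  192.0.2.10/32  md5

# --- standby (archive server): take the initial base backup ---
pg_basebackup -h primary.example.com -U replicator \
    -D /var/lib/postgresql/9.3/main -X stream -P

# --- standby: recovery.conf ---
# standby_mode = 'on'
# primary_conninfo = 'host=primary.example.com user=replicator'
```

Once the base backup is taken, only the WAL stream crosses the wire, so the ongoing bandwidth is proportional to the actual changes rather than the size of the databases.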
Make your dumps in plain SQL format. Keep one full copy on the local VM, refreshed, let's say, every day. Then dump a new copy each hour and make a diff against the full copy. Transfer the full copy once a day and only the diff at other times. To restore, you will have to patch the full copy with a diff and execute the resulting SQL file.
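A sketch of that scheme (database name and paths are placeholders; assumes GNU diff and patch are available on both sides):

```
# Daily, on the VM: refresh the full reference dump and ship it.
pg_dump -Fp customers_db > /var/backups/full.sql
rsync -az /var/backups/full.sql archive:/backups/vm01/

# Hourly: dump again, but only transfer the textual diff.
pg_dump -Fp customers_db > /var/backups/hourly.sql
diff -u /var/backups/full.sql /var/backups/hourly.sql > /var/backups/hourly.diff
rsync -az /var/backups/hourly.diff archive:/backups/vm01/

# To restore on the archive side: rebuild the hourly dump and replay it.
patch -o /tmp/restored.sql /backups/vm01/full.sql /backups/vm01/hourly.diff
createdb customers_db
psql -d customers_db -f /tmp/restored.sql
```

Note that the diff stays small only as long as pg_dump emits rows in a reasonably stable order; heavy updates or a VACUUM FULL between dumps can reorder rows and inflate it.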