What would be the recommended way of copying the production database into staging? Let's say a link to the production db dump is already available to the staging server.
When doing it by hand, I have to:
- stop any pods with apps that are connected to the db. In my case, it's the web app and the worker app
- drop and recreate the staging db
- run pg_restore on a pod (via a Job, or a one-off pod created to call pg_restore from); a rough sketch of this flow follows below
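In kubectl terms, roughly this (names here are placeholders, not my actual setup):

```sh
# Placeholder names throughout: adjust namespace, deployment, pod, and db names.
DUMP_URL="https://example.com/prod.dump"   # the pre-signed link to the prod dump

# stop the apps holding connections to the staging db
kubectl -n staging scale deployment web worker --replicas=0

# drop and recreate the staging database
kubectl -n staging exec postgres-0 -- dropdb -U postgres app_staging
kubectl -n staging exec postgres-0 -- createdb -U postgres app_staging

# restore from a one-off pod; note the stock postgres image ships without curl,
# so you'd bake an image that has it (or get the dump into the pod another way)
kubectl -n staging run pg-restore --rm -i --restart=Never \
  --image=postgres:11 --env=PGPASSWORD="$STAGING_DB_PASSWORD" -- \
  sh -c "curl -sf '$DUMP_URL' | pg_restore -h postgres -U postgres -d app_staging --no-owner"

# bring the apps back up
kubectl -n staging scale deployment web worker --replicas=2
```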
I have some ideas but I'd like to know what others have come up with.
I'm hoping for something that can be as simple as Heroku:

```sh
heroku pg:backups restore `heroku pg:backups public-url -r production` -r staging
```
I've been using pghoard, from https://github.com/ohmu/pghoard ...
It seems to work quite well once configured, but there is a good bit of setup overhead. At the very least you will need a Dockerfile, a start script, and Kubernetes configs to take advantage of it.
Once it's up and running, you should then be able to copy the latest backup into the bucket that your staging pghoard reads from, kill the current staging database, and, when the staging database pod starts again, pghoard can restore the latest backup from its bucket. Probably. (I have not tried using it this way, and I don't recall off the top of my head how pghoard chooses the version to restore in the event of a failure.)
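A hedged sketch of that flow; bucket names and paths are made up, and you should verify the pghoard_restore flags against the README of the version you run:

```sh
# copy the latest prod backup into the bucket staging's pghoard reads from
# (assuming S3 here; pghoard supports other object stores too)
aws s3 sync s3://prod-pghoard/ s3://staging-pghoard/

# kill the current staging database pod
kubectl -n staging delete pod postgres-0

# in the staging pod's start script, restore before starting postgres;
# pghoard ships a pghoard_restore tool for this (check its docs for exact flags)
pghoard_restore get-basebackup \
  --config /etc/pghoard/pghoard.json \
  --target-dir /var/lib/postgresql/data \
  --restore-to-master
```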
For frequently changed staging/test envs, you might also be able to skip using a persistent volume and treat the postgres pod/deployment, along with its storage, as ephemeral.
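For example (a sketch, not a tested manifest; it assumes your start script handles the restore on boot):

```sh
# emptyDir instead of a PVC: deleting the pod wipes the data, and the
# start script can re-restore the latest backup on the next boot
kubectl -n staging apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:11
        env:
        - name: POSTGRES_PASSWORD
          value: staging-only        # use a Secret for anything less throwaway
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        emptyDir: {}                 # storage dies with the pod, by design
EOF
```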
This is not as simple as the example you gave, but might be the most straightforward, and most likely to actually work. Probably not too hard to automate, either.
Things that I'm a lot less certain about: (These are definitely hypothetical. So... be careful.)
A second option would be to use Stolon, a cloud-native PostgreSQL HA manager, from https://github.com/sorintlab/stolon ...
This, however, sort of depends on your cluster configuration. If you are using one cluster, with multiple namespaces or labels to differentiate prod from staging, then this would be easy (ish).
Again, this isn't really what I would call an out-of-the-box setup. It would require significant tweaking (as well as changes to your production postgres setup) in order to make this work.
I guess a third option, off the top of my head, might be to configure streaming replication through postgres itself. A standby is read-only until promoted, though, so you would presumably have to promote the staging replica before writing to it; some mix of disabling/re-enabling replication, along with pg_rewind, might then get it back in sync afterwards.
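If anyone tries that, the dance would presumably look something like this (untested; pg_rewind requires wal_log_hints=on or data checksums to have been enabled on the target cluster beforehand):

```sh
# on the staging standby: stop following prod and start accepting writes
pg_ctl promote -D /var/lib/postgresql/data

# ...use staging for a while...

# later, rewind the now-diverged staging data dir back to prod's timeline;
# pg_rewind needs the target cleanly shut down first
pg_ctl stop -D /var/lib/postgresql/data
pg_rewind --target-pgdata=/var/lib/postgresql/data \
          --source-server="host=prod-db user=replication password=$REPL_PASSWORD"
# then restore the standby config (recovery.conf or standby.signal,
# depending on postgres version) and start it back up as a replica
```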