I make a database backup using a command like this (-Fd means "directory" format):
pg_dump -U postgres -Fd -j 8 -f testdb.dir testdb
My DB contains many schemas with personal data. At some point I might need to remove one schema from the backup. Restoring and re-dumping is not a feasible solution (there are too many backups).
So ideally I would need a command like command --remove-schema=user123 testdb.dir. That would just remove the appropriate 1234.dat.gz file and perhaps update toc.dat inside testdb.dir.
PS: I know I can list the contents with pg_restore -l, but I haven't found docs describing the format of the output file. I don't want to guess what is what and risk data loss.
There is no way to do it. You can try to remove it with grep -p, but that seems very complicated to me. It is probably easier to remove the schema after restoring.
If you have segregated the data so that all of the PII is contained in one schema, you could exclude that schema from your dump.
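From the man page for pg_dump, the relevant option is -N pattern (--exclude-schema=pattern), which does not dump any schemas matching the pattern.
This will break foreign key references from tables in other schemas in the recovered database. And it's only one way to do it; there are plenty of others depending on your use case.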
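As a rough sketch of two of those ways, here are shell wrappers for excluding the schema at dump time and for restoring into a scratch database and dropping the schema there. The function names, the scratch database, and the schema name are made up for illustration; the -U postgres and -j 8 options are simply copied from the question. Neither approach edits an existing backup in place.

```shell
#!/bin/sh
# Sketch only: function names and the arguments passed in are placeholders.
# Connection options (-U postgres, -j 8) mirror the command in the question.

# Dump a database in directory format, leaving one schema out.
# Note that --exclude-schema takes a pattern, not just a literal name.
dump_without_schema() {
  db=$1 out_dir=$2 schema=$3
  pg_dump -U postgres -Fd -j 8 --exclude-schema="$schema" -f "$out_dir" "$db"
}

# Restore an existing dump into a scratch database (assumed to exist
# already), then drop the unwanted schema there. The schema name is
# double-quoted in SQL, so it must match the stored name exactly.
restore_then_drop_schema() {
  dump_dir=$1 target_db=$2 schema=$3
  pg_restore -U postgres -d "$target_db" -j 8 "$dump_dir" &&
    psql -U postgres -d "$target_db" -c "DROP SCHEMA IF EXISTS \"$schema\" CASCADE;"
}
```

For example, dump_without_schema testdb testdb_nopii.dir user123 would write a new dump without that schema. As far as I know, pg_restore in PostgreSQL 10 and later also accepts -N / --exclude-schema, which would skip the schema at restore time instead of dropping it afterwards.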
If you are producing a data product, you probably want to write a custom export script that removes dangling IDs and adds formatting and other features your consumers will want. Backup tools are designed primarily to preserve all of the data and are rather blunt at filtering it.