I'm currently working with an embedded system that uses Postgresql for data storage. We currently have a problem where the boxes will sometimes get rebooted with no warning, and no proper shutdown. This obviously leaves us with database problems in some cases (invalid page header in certain high-traffic tables is the most common symptom).
What I want to know is, what's the easiest way to clear the errors? I'm obviously going to lose data, but since the errors usually occur in tables that have ephemeral data in them, I don't really care, I just want to get the system back into operation.
Right now our procedure is to drop and re-create any afflicted tables. Is there anything else that we could do that would be faster? As I said, I'm OK with losing any data on the affected page, I just want the thing up and running.
Platform is Ubuntu 7.04, Postgresql 8.2 (We can't force an upgrade onto the customer right now). Filesystem is ext3, on a 2 gig CF card.
Obviously, fixing the unexpected reboots is my top priority, but progress on that is slow (it's hard to reproduce in the lab). In the meantime, I'm hoping for a simpler solution that will let our field people deal more quickly with the issues that do arise.
Just want to follow up on bandaiding, in case somebody needs it:
You can enable zero_damaged_pages and run a VACCUUM on the affected table, this should clear out any pages which contain known defect data. This won't protect you from silent corruption of column values though, since PostgreSQL does no block/page level checksumming of it's data as of now.
So this is a last resort option, fixing the source of the problems is always preferable ;).
Have you tried setting WAL sync method to
fsync_writethrough
?