A postgres SELECT query ran out of control on our DB server and started eating up tons of memory and swap until the server ran out of memory. I found the particular process via ps aux | grep postgres
and ran kill -9 pid
. This killed the process and the memory freed up as expected. The rest of the system and postgres queries appeared to be unaffected. This server is running postgres 9.1.3 on SLES 9 SP4.
However, one of our developers chewed me out for killing a postgres process with kill -9
, saying that it will take down the entire postgres service. In reality, it did not. I've done this before a handful of times and have not seen any negative side effects.
With that said, and after further reading, it looks like kill pid
without the flags is the preferred way to kill a runaway postgres process, but per other users in the postgres community, it also sounds like postgres has "gotten better" over the years such that kill -9
on an individual query process/thread is no longer a death sentence.
Can someone enlighten me on the proper way to kill a runaway postgres process as well as the how disastrous (or benign) using kill -9
is with Postgres these days? Thanks for the insight.
voretaq7's answer covers the key points, including the correct way to terminate backends but I'd like to add a little more explanation.
kill -9
(ieSIGKILL
) should never, ever, ever be your first-choice default. It should be your last resort when the process doesn't respond to its normal shutdown requests and aSIGTERM
(kill -15
) has had no effect. That's true of Pg and pretty much everything else.kill -9
gives the killed process no chance to do any cleanup at all.When it comes to PostgreSQL, Pg sees a backed that's terminated by
kill -9
as a backed crash. It knows the backend might have corrupted shared memory - because you could've interrupted it half way through writing a page into shm or modifying one, for example - so it terminates and restarts all the other backends when it notices that a backend has suddenly vanished and exited with a non-zero error code.You'll see this reported in the logs.
If it appears to do no harm, that because Pg is restarting everything after the crash and your application is recovering from the lost connections cleanly. That doesn't make it a good idea. If nothing else backend crashes are less well tested than the normal-functioning parts of Pg and are much more complicated/varied, so the chances of a bug lurking in backend crash handling and recovery are higher.
BTW, if you
kill -9
the postmaster then removepostmaster.pid
and start it again without making sure everypostgres
backend is gone, very bad things can happen. This could easily happen if you accidentally killed the postmaster instead of a backend, saw the database had gone down, tried to restart it, removed the "stale" .pid file when the restart failed, and tried to restart it again. That's one of the reasons you should avoid wavingkill -9
around Pg, and shouldn't deletepostmaster.pid
.A demonstration:
To see exactly what happens when you
kill -9
a backend, try these simple steps. Open two terminals, open psql in each, and in each runSELECT pg_backend_pid();
. In another terminalkill -9
one of the PIDs. Now runSELECT pg_backend_pid();
in both psql sessions again. Notice how they both lost their connections?Session 1, which we killed:
Session 2, which was collateral damage:
See how both sessions were broken? That's why you don't
kill -9
a backend.I found the particular process via ps aux | grep postgres and ran kill -9 pid.
NO! BAD! STEP AWAY FROM THE BACKEND!
Seriously -- Don't kill Postgres backends like that -- TERRIBLE things can happen (even with all the stability enhancements that have been made since the 7.x days) which can trash your whole DB, and your developer is quite right to chew you out for doing this.
There is, in fact, a blessed and approved way of doing this from within Postgres -- It's even in the Postgres manual though that SO post does a better job of explaining it...
SELECT pg_cancel_backend(pid)
Sends a cancel (
SIGINT
) signal to the specified backend, which cancels the currently running query.select pg_terminate_backend(pid)
Sends a terminate (
SIGTERM
) signal to the specified backend, which cancels the query and aborts the backend (dropping its connection).Backend IDs can be obtained from the
pg_stat_activity
table (orps
)Killing a PostgreSQL client process should be fine. Killing a PostgreSQL daemon process might get you scolded.
Since SQL daemons have internal process controls as well, the preferred way is to try using that channel first.
See Stop (long) running SQL query in PostgreSQL... from StackOverflow.