I am running Bacula on a RedHat box. From time to time, the storage daemon bacula-sd stops working and becomes <defunct>.
[root@backup ~]# ps -ef | grep defunct | more
root 4801 29261 0 09:25 pts/5 00:00:00 grep defunct
root 5825 1 0 Oct18 ? 00:00:00 [bacula-sd] <defunct>
My question is, how can I kill this process? Its parent is 1, which is init, as far as I know, and I wouldn't want to kill the init process, would I?
'Normally' killing this process does not work:
[root@backup ~]# kill -0 5825
[root@backup ~]# kill -9 5825
Help is greatly appreciated!
Edit: running
[root@backup ~]# lsof -p 5825
produces the following output:
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
bacula-sd 5825 root cwd DIR 253,0 4096 3801089 /root
bacula-sd 5825 root rtd DIR 253,0 4096 2 /
bacula-sd 5825 root txt REG 253,0 2110599 368004 /usr/local/sbin/bacula-sd
bacula-sd 5825 root mem REG 253,0 75284 389867 /usr/lib/libz.so.1.2.3
bacula-sd 5825 root mem REG 253,0 46680 3604521 /lib/libnss_files-2.5.so
bacula-sd 5825 root mem REG 253,0 936908 369115 /usr/lib/libstdc++.so.6.0.8
bacula-sd 5825 root mem REG 253,0 125736 3606807 /lib/ld-2.5.so
bacula-sd 5825 root mem REG 253,0 1602128 3606885 /lib/libc-2.5.so
bacula-sd 5825 root mem REG 253,0 208352 3606892 /lib/libm-2.5.so
bacula-sd 5825 root mem REG 253,0 125744 3606887 /lib/libpthread-2.5.so
bacula-sd 5825 root mem REG 253,0 25940 3604573 /lib/libacl.so.1.1.0
bacula-sd 5825 root mem REG 253,0 15972 3604535 /lib/libattr.so.1.1.0
bacula-sd 5825 root mem REG 253,0 46548 3606908 /lib/libgcc_s-4.1.2-20080102.so.1
bacula-sd 5825 root mem REG 253,0 56422480 366368 /usr/lib/locale/locale-archive
bacula-sd 5825 root 0r CHR 1,3 1545 /dev/null
bacula-sd 5825 root 1r CHR 1,3 1545 /dev/null
bacula-sd 5825 root 2r CHR 1,3 1545 /dev/null
bacula-sd 5825 root 3u CHR 9,128 6469 /dev/nst0
bacula-sd 5825 root 4u IPv4 1023380 TCP backup:bacula-sd (LISTEN)
bacula-sd 5825 root 5u IPv4 2693268 TCP backup:bacula-sd->backup:53957 (CLOSE_WAIT)
bacula-sd 5825 root 7u IPv4 3248683 TCP backup:bacula-sd->backup:57629 (CLOSE_WAIT)
bacula-sd 5825 root 8u IPv4 3250966 TCP backup:bacula-sd->backup:37650 (CLOSE_WAIT)
bacula-sd 5825 root 9u IPv4 3253908 TCP backup:bacula-sd->backup:37671 (CLOSE_WAIT)
The only way to remove a zombie/defunct process is to kill its parent. Since the parent here is init (pid 1), that would also take down your system.
This pretty much leaves you with two options.
I'd go with the second.
Check whether there was a kernel panic.
Check whether the process is in "D" (uninterruptible sleep) state, where it is in kernel mode waiting for a syscall that has not returned yet (because of a kernel oops or for some other reason): http://www.nabble.com/What-causes-an-unkillable-process--td20645581.html
You could try restarting init:
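As a hedged sketch of what restarting init could look like, assuming a SysV-style init (typical of older RedHat releases): `telinit u` asks init to re-execute itself while preserving its state. The wrapper function `reexec_init` and its `--yes` flag are hypothetical, added only so that nothing runs by accident.

```shell
# Assumption: SysV-style init, where `telinit u` tells init to re-execute itself.
# reexec_init is a hypothetical wrapper: without --yes it only prints the command.
reexec_init() {
    if [ "${1:-}" = "--yes" ]; then
        telinit u
    else
        echo "would run: telinit u"
    fi
}

# Dry run (prints the command, changes nothing):
reexec_init
```

Run `reexec_init --yes` as root only once you are sure you want init to re-execute.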
Otherwise, I wouldn't worry too much. It's not running and it's not taking up any resources; it's just there so the kernel can remember it.
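To tell a zombie apart from a process stuck in "D" state, one can read the state letter the kernel reports. A minimal sketch, assuming Linux with /proc mounted; `proc_state` is a hypothetical helper name and 5825 is the PID from the question.

```shell
# Print the one-letter process state (R, S, D, Z, T, ...) for a PID.
# Assumption: Linux with /proc mounted; prints nothing if the PID does not exist.
proc_state() {
    awk '/^State:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

# Example: state of the current shell
proc_state "$$"

# For the process in the question one would run:
#   proc_state 5825
# "Z" means zombie, "D" means uninterruptible sleep.
# Roughly the same information via procps:
#   ps -o pid,ppid,stat,wchan:30,comm -p 5825
```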
If a zombie has init as its parent, then init has stopped working properly. One of the roles of init is to clean up zombies. If it doesn't do it, no one else will. So the only solution is to reboot. If init is broken, then a reboot may fail, so I'd shut down important services, sync the filesystem, then hit the power button instead.
Let's keep the panic down, shall we? A "defunct" or "zombie" process is not a process. It is simply an entry in the process table, with a saved exit code. Thus, a zombie holds no resources, takes no CPU cycles, and uses no memory, since it is not a process. Don't get all weird and itchy trying to "kill" zombie processes. Just like their namesakes, they can't be killed, since they're already dead. But unlike the brain-eating kind, they harm absolutely no-one, and won't bite other processes.
Don't let zombie processes eat your brain. Just ignore them.
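To see this for yourself, here is a small sketch (assuming Linux with /proc mounted) that creates a short-lived zombie. A subshell starts a child and then replaces itself with a `sleep` that never calls wait, so the finished child stays in the process table as "Z" until its parent exits. The variable names are arbitrary.

```shell
# Create a temporary zombie and look at its state.
pidfile=$(mktemp)

# The subshell forks "sleep 1", records its PID, then execs "sleep 3".
# "sleep 3" never waits for its child, so the child becomes a zombie.
( sleep 1 & echo "$!" > "$pidfile"; exec sleep 3 ) &

sleep 2                                   # by now the child has exited
child=$(cat "$pidfile")
zstate=$(awk '/^State:/ {print $2}' "/proc/$child/status")
echo "child $child state: $zstate"

wait                                      # parent exits; the zombie is then reaped
rm -f "$pidfile"
```

Once the parent has exited, the entry for the child disappears without anyone having "killed" it.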
Seems like you've got an orphaned process. As far as I know, the only way to kill these is to reboot the box. I've had this happen on my ESX servers (which are Linux under the hood) from time to time, and a host reboot is the fix (according to VMware support).
I'm a Windows guy, so take that for what it's worth.
I just had this issue while running Kindle under Wine: the Kindle window would not close even after I killed all the wine processes. Running ps showed a
[Kindle.exe] <defunct>
process whose parent is 1 (ps.tree is a self-made script to show the process tree):
I finally got rid of the
[Kindle.exe]
process and the ghost window by killing all threads of this process, by running this command:
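On Linux, a process whose main thread has exited while other threads are still alive also shows up as <defunct>, so it can be worth listing the threads before signalling anything. A hedged sketch, assuming Linux with /proc mounted; `list_threads` is a hypothetical helper, not the command used above.

```shell
# Print "TID STATE" for every thread of a PID.
# Assumption: Linux with /proc mounted; list_threads is an illustrative name.
list_threads() {
    pid=$1
    for t in /proc/"$pid"/task/*; do
        [ -r "$t/status" ] || continue
        printf '%s %s\n' "${t##*/}" "$(awk '/^State:/ {print $2}' "$t/status")"
    done
}

# Example: threads of the current shell
list_threads "$$"

# Roughly the same information via procps:
#   ps -L -o pid,tid,stat,comm -p <pid>
```

If the first line (the main thread) shows Z while other threads are in S, R or D, the process is not a true zombie and still holds its resources.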