I'm having a problem with my Debian server (kernel 2.6.38-3), which crashes* about once every three months, and I can't figure out why.
According to Pingdom the site died around 4 AM, but none of the logs I've looked at contain any information about an error.
These are the log files I've looked through:
- /var/log/messages
- /var/log/syslog
- /var/log/debug
- /var/log/kern.log
According to these logs, nothing is wrong.
Here is an example from /var/log/messages:
Jan 21 04:01:46 debian god[1195]: app still alive after 10s; sent SIGKILL
Jan 21 11:18:20 debian kernel: imklog 3.18.6, log source = /proc/kmsg started.
Any ideas what logs may contain the info I'm looking for?
*Crashes, as in: it does not respond to anything. The screen turns black, it doesn't respond to web requests, and I can't access it over SSH.
Sadly, probably none of them. When the kernel panics, there is no logging subsystem left to write the logs and no file handles left to write them to.
The only real option is to redirect the console to the serial port (/dev/ttyS0) and set up another server to log the output from there.
That way, when the kernel panics (if that's what's happening), you'll be able to tail the log from the monitoring server, across the serial port.
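For the monitoring side, here is a minimal sketch of what such a logger could look like, not a finished tool. It assumes the crashing server was booted with a kernel parameter along the lines of `console=ttyS0,115200n8`, that the two machines are connected with a null-modem cable, and that the serial port on the monitoring server is already set to the same speed (for example with `stty`). The names `log_console` and `run`, the device path and the log file name are all made up for illustration.

```python
import datetime


def log_console(stream, log_file):
    """Copy lines from a console byte stream to log_file with a UTC timestamp."""
    for raw in stream:
        # Console output is not guaranteed to be valid UTF-8, so never fail on it.
        line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        stamp = datetime.datetime.now(datetime.timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        )
        log_file.write(f"{stamp} {line}\n")
        # Flush after every line so the last messages before a hang reach the disk.
        log_file.flush()


def run(device="/dev/ttyS0", log_path="console.log"):
    """Read the serial device until it closes (hypothetical default paths)."""
    with open(device, "rb", buffering=0) as port, \
            open(log_path, "a", encoding="utf-8") as out:
        log_console(port, out)
```

Calling `run()` on the monitoring server and leaving it running (under `screen`, or as a service) should leave the panic message and call trace, if there is one, as the last lines of `console.log`.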
SIGKILL means something forcibly killed the process; in your log, god sent it because the app was still alive after 10 seconds. I think you need deeper monitoring: continuously track memory, CPU, swap, load average, the number of processes, zombies, and all running services, so you can find a suspect. I suggest installing Nagios to monitor all of the above.
Hopefully you have already checked the crontab, known kernel bugs, etc.
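Until something like Nagios is in place, a small sampler run from cron can record a few of these numbers. The following is only a rough sketch that reads the standard Linux `/proc` files; it is not a replacement for proper monitoring, and the function names and output format are invented for this example.

```python
import os
import time


def parse_meminfo(text):
    """Return selected /proc/meminfo fields in kB."""
    wanted = {"MemTotal", "MemFree", "SwapTotal", "SwapFree"}
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            values[key] = int(rest.split()[0])
    return values


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def process_state(stat_text):
    """Return the state letter from /proc/<pid>/stat, e.g. 'R', 'S' or 'Z'."""
    # The command name sits in parentheses and may contain spaces or ')',
    # so split after the last ')'.
    return stat_text.rsplit(")", 1)[1].split()[0]


def count_processes(proc_dir="/proc"):
    """Return (total, zombies) by scanning the numeric entries in proc_dir."""
    total = zombies = 0
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue
        try:
            path = os.path.join(proc_dir, entry, "stat")
            with open(path, errors="replace") as f:
                state = process_state(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        total += 1
        if state == "Z":
            zombies += 1
    return total, zombies


def sample():
    """Build one log line from the current /proc values."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    total, zombies = count_processes()
    return (
        f"{time.strftime('%Y-%m-%d %H:%M:%S')} load1={load[0]:.2f} "
        f"mem_free_kb={mem.get('MemFree')} swap_free_kb={mem.get('SwapFree')} "
        f"procs={total} zombies={zombies}"
    )
```

Appending `print(sample())` to the script and running it every minute from cron, with the output redirected to a file, gives you the last known state of the machine before the next hang. If memory or the process count climbs steadily before 4 AM, that points to a suspect.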
Thanks