If you have experienced a kernel panic, you can set up a remote kernel console (netconsole) to capture all the data that might be lost on the local console (especially if the crash is from a non-maskable interrupt, which tends to reboot the system).
On the system that you expect might crash, load the netconsole kernel module, giving it these addresses:
10.1.1.16 is the IP address of the local interface to send from
eth0 is the name of the local interface to send from
10.1.1.17 is the IP address of the remote interface to send to
00:19:BB:31:B8:0E is the MAC address of the remote interface to send to
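A minimal sketch of loading netconsole with those addresses, assuming the parameter syntax from the kernel's netconsole documentation (`[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]`). The source port 6665 is an assumption (it is netconsole's default); the target port 6666 matches the netcat listener below. Netconsole builds Ethernet frames directly without doing ARP, so if the remote system is on another subnet the MAC should be that of your gateway. The script only prints the command unless `APPLY=1` is set; the function name is hypothetical.

```sh
#!/bin/sh
# Sketch: build and (optionally) apply a netconsole configuration.
# The addresses are the example values from the text; replace them with yours.

# Build the netconsole= parameter string:
#   [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
netconsole_param() {
    # $1 src port, $2 src IP, $3 local device, $4 target port, $5 target IP, $6 target MAC
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

param=$(netconsole_param 6665 10.1.1.16 eth0 6666 10.1.1.17 00:19:BB:31:B8:0E)

if [ "${APPLY:-0}" = 1 ]; then
    # Requires root: load the module so kernel console output is sent over UDP.
    modprobe netconsole "netconsole=$param"
else
    # Dry run: only show what would be executed.
    echo "modprobe netconsole netconsole=$param"
fi
```

Run it with `APPLY=1` as root on the machine you expect to crash; without it, you can inspect the generated command first.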
On the remote system, run (this requires that you have netcat installed):
nc -l -p 6666 -u | tee capture.file
This captures all of the crashing machine's kernel console output on the remote system. Netconsole works at a much lower level than syslog (it receives the same kernel printk messages that are sent to the local console), so you may see the very last lines the kernel prints when it panics, even after syslog et al. have stopped working.
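Before waiting for a real crash, it can help to confirm that messages actually arrive in the capture file. Netconsole only forwards messages whose level passes the console log level, so this sketch raises that level to 8 (everything) and then injects a test line into the kernel log. It assumes a Linux kernel that accepts writes to /dev/kmsg; the function name and the `PRINTK`/`KMSG` overrides are hypothetical, there only so it can be tried against plain files without root.

```sh
#!/bin/sh
# Sketch: push a test line through the kernel log so it should show up
# in the remote capture file if netconsole is working.

netconsole_selftest() {
    msg=${1:-netconsole test}
    # Hypothetical overrides so the function can be tried against plain files.
    printk=${PRINTK:-/proc/sys/kernel/printk}
    kmsg=${KMSG:-/dev/kmsg}
    # Set the console log level to 8 so every message reaches the consoles,
    # netconsole included (writing a single number changes only that first field).
    echo 8 > "$printk" || return 1
    # "<4>" is the syslog priority prefix for a warning-level message.
    printf '<4>%s\n' "$msg" > "$kmsg"
}
```

Run `netconsole_selftest "hello"` as root on the sending machine while the netcat listener is running; the line should appear in capture.file on the remote system.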
What sort of crash? Everyone's recommendations about the dmesg and messages logs are good. If the machine just shuts off before it has a chance to log anything, I would guess it is overheating or has a power supply problem.
If so, it may help to check the hardware logs, if the machine keeps any. If you use Dell servers, Dell support can give you Linux tools to access these logs; other vendors may provide similar functionality.
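On servers with a BMC, the IPMI System Event Log is one vendor-neutral hardware log of this kind. A minimal sketch, assuming ipmitool is installed and can reach the local BMC; the keyword list used to pick out thermal and power events is a hypothetical heuristic, not a standard, and the function names are illustrative.

```sh
#!/bin/sh
# Sketch: pull out hardware events that could explain a silent power-off.

# Filter SEL text on stdin for entries mentioning temperature, power, voltage or fans.
sel_suspects() {
    grep -iE 'temperature|power|voltage|fan'
}

# On a real server (as root): list the event log and keep only suspicious entries.
show_sel_suspects() {
    ipmitool sel elist | sel_suspects
}
```

Running `show_sel_suspects` after an unexplained shutdown gives a short list to compare against the time the machine went down.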
Collecting a core over the network is probably overkill; you can dump it locally. This is a guide for setting up and testing kdump. If you follow the instructions and still can't get a dump created locally, then move on to capturing over the network.
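Before relying on kdump, you can check that it is armed. This sketch assumes a Linux kernel built with kexec support: memory must be reserved through a crashkernel= boot parameter, and /sys/kernel/kexec_crash_loaded reads 1 once a crash kernel has been loaded (normally by the kdump service). The optional path arguments and the function names are hypothetical, added so the checks can be tried against plain files.

```sh
#!/bin/sh
# Sketch: check the two preconditions for kdump to produce a vmcore.

# True if the kernel command line reserves memory for the crash kernel.
crashkernel_reserved() {
    grep -q 'crashkernel=' "${1:-/proc/cmdline}"
}

# True if a crash kernel is currently loaded and ready to take over.
kdump_loaded() {
    f=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$f" ] && [ "$(cat "$f")" = 1 ]
}

kdump_status() {
    if crashkernel_reserved && kdump_loaded; then
        echo "kdump appears ready"
    else
        echo "kdump is not ready: check crashkernel= and the kdump service"
    fi
}
```

To test the whole chain, the usual approach is `echo c > /proc/sysrq-trigger` as root, which panics the kernel immediately; only do this on a machine you can afford to reboot.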
Once you have a core dump, you'll need to analyze it with the crash utility. Install the kernel-debuginfo RPM that matches the kernel that crashed (normally your running kernel), then invoke crash; the whitepaper should give you the general gist. If you can get it open, the first thing to look at is the log: scroll down to the bottom and you should find clues about what was going on when the crash occurred.
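A minimal sketch of that workflow, assuming a Red Hat style layout where kdump writes each dump to a subdirectory of /var/crash and the debuginfo vmlinux is installed under /usr/lib/debug/lib/modules/<version>/. Both paths are common defaults but may differ on your system, and the function names are hypothetical.

```sh
#!/bin/sh
# Sketch: open the newest vmcore in crash and print the kernel log and backtrace.

# Print the most recently modified vmcore under the given crash directory.
latest_vmcore() {
    ls -1t "${1:-/var/crash}"/*/vmcore 2>/dev/null | head -n 1
}

# Run crash from a command file: "log" dumps the kernel message buffer,
# "bt" shows the backtrace of the panicking task, "exit" ends the session.
analyze_latest_vmcore() {
    vmcore=$(latest_vmcore "$1")
    [ -n "$vmcore" ] || { echo "no vmcore found" >&2; return 1; }
    # Assumes the crashed kernel is the one running now.
    vmlinux=/usr/lib/debug/lib/modules/$(uname -r)/vmlinux
    cmds=$(mktemp) || return 1
    printf 'log\nbt\nexit\n' > "$cmds"
    crash -s -i "$cmds" "$vmlinux" "$vmcore"
    rm -f "$cmds"
}
```

Piping `analyze_latest_vmcore | less` and jumping to the end of the log section is usually the quickest way to the panic message.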
Try starting process accounting:
/etc/init.d/psacct start
and, to have it start automatically at boot:
/sbin/chkconfig psacct on
Then use lastcomm(1) to see what was running and when.
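On systemd-based distributions the init script may not exist. This sketch picks whichever mechanism is present, assuming the service is still named psacct as in the text; it only prints the commands unless `APPLY=1` is set, and the function name is hypothetical.

```sh
#!/bin/sh
# Sketch: enable process accounting now and at boot.

psacct_enable_cmds() {
    if command -v systemctl >/dev/null 2>&1; then
        # systemd: start now and enable at boot in one step.
        echo "systemctl enable --now psacct"
    else
        # SysV init, as in the text.
        echo "/etc/init.d/psacct start"
        echo "/sbin/chkconfig psacct on"
    fi
}

psacct_enable_cmds | while IFS= read -r cmd; do
    if [ "${APPLY:-0}" = 1 ]; then sh -c "$cmd"; else echo "would run: $cmd"; fi
done
```

After the next crash, `lastcomm | less` lists the most recent commands first.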
Or try installing atop; it logs the machine's memory and process state every 10 minutes, so you can get an idea of what was going on. To read back a day's log, run:
atop -r /var/log/atop/atop_YYYYMMDD
and then use the t and T keys to step forwards and backwards. In 99% of cases, those two tools make it clear exactly what was going on.
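To jump straight to the minutes before a crash, a small helper might look like this. It assumes the default log location shown above and an atop version that supports `-b` to set the replay start time; the function names are hypothetical.

```sh
#!/bin/sh
# Sketch: open atop's log for a given day (default: today) in replay mode.

# Path of atop's daily raw log for a YYYYMMDD date.
atop_log_for() {
    echo "/var/log/atop/atop_${1:-$(date +%Y%m%d)}"
}

# Replay that day's log; an optional HH:MM start time skips close to the crash.
atop_replay() {
    log=$(atop_log_for "$1")
    if [ -n "$2" ]; then
        atop -r "$log" -b "$2"
    else
        atop -r "$log"
    fi
}
```

For example, `atop_replay 20240131 03:40` would start the replay at 03:40 on that day, after which t and T step through the samples.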
Have you checked /var/log/dmesg, /var/log/messages, and /var/log/syslog?
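A quick way to search those files is sketched below. The pattern list is a hypothetical starting point covering common kernel crash messages, not an exhaustive one, and the function name is illustrative.

```sh
#!/bin/sh
# Sketch: search log files for common signs of a kernel-level crash.

# Print matching lines, prefixed with the file name, from any readable files given.
scan_logs() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        grep -iE 'kernel panic|oops|BUG:|machine check|out of memory|hung_task' "$f" |
            while IFS= read -r line; do
                printf '%s: %s\n' "$f" "$line"
            done
    done
}
```

Call it as `scan_logs /var/log/dmesg /var/log/messages /var/log/syslog`. On systemd systems with a persistent journal, `journalctl -k -b -1` shows the kernel messages from the previous boot.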
You might also check the memory with memtest86.
You could configure the machine to write a kernel core dump over the network, but you would still need someone skilled in kernel debugging to analyze it.