Note:
- 2.6+ Kernel
- Or older 2.4
Question:
- Generic suggested guidelines with regards to Linux kernel crash dump analysis
- Skills required, e.g. kernel compilation skills
Suggestions:
- Detailed walk-through of the Red Hat Crash Utility
- Usage of Kdump
For basic crash dump analysis, no particular skills are needed. If you can follow the instructions and open a dump with crash, you can do some basic diagnostics without any in-depth knowledge of the kernel. For anything beyond the basics, however, you will need to know how to debug code with gdb, build up a good knowledge of kernel structure and code, and learn how x86 and x86_64 actually work. There are plenty of resources you can search for to help with that. Red Hat also runs a kernel internals course, which is well worth it (if someone else pays).
Once you have the dump open, a few basic checks will help diagnose a large number of dumps. When crash opens the dump it prints some basic information, including the load average at the time of the crash, which is always a useful pointer. The kernel ring buffer (the log command in crash) will give you a trace of the crash; searching the web for details taken from it will often show that it is a known issue with a fix. Another place to look is free memory (for example kmem -i in crash): if you're down to a handful of small pages, you know why the crash or hang occurred.
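As a rough sketch of the "take info from the log and search for it" step, the snippet below pulls the lines that usually identify a crash out of a saved copy of the ring buffer. The function name crash_signatures and the list of marker strings are my own assumptions for illustration; the markers are common kernel messages but far from a complete set, so treat this as a starting point and not a diagnostic tool.

```python
import re

# Substrings that commonly mark a crash report in a Linux kernel log.
# Illustrative assumption: this list is not exhaustive.
SIGNATURES = (
    "Kernel panic - not syncing",
    "BUG:",
    "Oops:",
    "general protection fault",
    "Out of memory:",
)

# Optional "<4>" style log-level prefix found in some saved logs.
LOGLEVEL = re.compile(r"^<\d+>")
# Optional "[  123.456789] " printk timestamp at the start of a line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def crash_signatures(log_text):
    """Return log lines containing a known marker, with prefixes stripped.

    Stripping the log level and timestamp leaves text that is easier to
    paste into a search engine or bug tracker.
    """
    hits = []
    for raw in log_text.splitlines():
        line = LOGLEVEL.sub("", raw.strip())
        line = TIMESTAMP.sub("", line).strip()
        if any(sig in line for sig in SIGNATURES):
            hits.append(line)
    return hits
```

To use it, save the output of the log command from the crash session to a text file, read that file into a string, and pass it to crash_signatures(). The lines it returns are the ones worth searching for first.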
This is a pretty big subject. I've never come across any really good tutorial-style resources with example crash dumps to look through, starting with simple-to-diagnose problems and leading through to much more in-depth root causes. Maybe that would be a worthy project.
Here's one pointer which may apply:
Kdump/Kexec Howto