The system is a spare Dell 2400 that I wiped clean and installed Ubuntu 10.04 on. Update Manager reports everything is current, and I haven't been mucking with drivers or tricky system settings. In fact, it has been a stable and friendly system to install and use.
So imagine my surprise when browsing to http://element-14.com/ (an otherwise useful community site for electronic engineering types) followed a redirect or two, then went to a black screen, then played the "I'm starting up" tune with the pink hazy smoke, after which nothing further works. The keyboard is dead, and the Alt-SysRq key combos do nothing.
More than just Firefox and the X server is crashing. I repeated the crash with an SSH session open, and not only did the connection get taken down, but the machine no longer responded to attempts to open a fresh connection.
I tried enabling Apport, in hopes that it would notice something and help identify the culprit, but it seems to be oblivious to the crash.
Each time, I've had to lean on the power button to reboot.
Google searches hint that there are issues with the particular Intel chipset that provides the VGA on this motherboard.
I'm looking for advice about how to proceed with debugging this kind of crash. Any ideas?
Update: Following the advice given, I set up the netconsole kernel module and a matching netcat instance to receive the log. I ran netcat on my XP box, used Alt-SysRq-S to verify it could receive kernel messages, then browsed to the site. Only two printk()s were logged:
[251728.009794] i915: Unknown parameter `modset'
[251728.051420] i915: Unknown parameter `modset'
Hmm. Perhaps my video driver is misconfigured? Especially since I see these same messages in the output of dmesg just after booting.
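Messages like these are easy to miss in a long dmesg listing. As a rough sketch (not something I ran at the time), a few lines of Python can pick them out. The function name find_unknown_params is made up for illustration, and the pattern assumes the message format quoted above; other kernel versions may word it differently.

```python
import re

# Assumed format, taken from the two lines quoted above:
#   [251728.009794] i915: Unknown parameter `modset'
# The leading timestamp is optional, since dmesg may be configured without it.
UNKNOWN_PARAM = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s+)?(?P<module>[\w-]+): "
    r"Unknown parameter `(?P<param>[^']+)'"
)


def find_unknown_params(log_text):
    """Return a sorted list of (module, parameter) pairs the kernel rejected."""
    found = set()
    for line in log_text.splitlines():
        match = UNKNOWN_PARAM.match(line.strip())
        if match:
            found.add((match.group("module"), match.group("param")))
    return sorted(found)


# Demo using the two lines captured over netconsole.
sample = (
    "[251728.009794] i915: Unknown parameter `modset'\n"
    "[251728.051420] i915: Unknown parameter `modset'\n"
)
for module, param in find_unknown_params(sample):
    print(f"{module}: unknown parameter {param!r}")
```

To scan a real log, the same function could be fed the saved output of dmesg instead of the sample string.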
At least this time I explicitly synced my disks before deliberately crashing the system.
For the record, lspci -nn | grep VGA says:
00:02.0 VGA compatible controller [0300]: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device [8086:2562] (rev 01)
Update: Solved!!!
The hint to use netconsole led to an epiphany. Googling around the phrase "i915 unknown parameter modset" suddenly led me to trip over the root cause.
The name of the option to the i915 driver is modeset not modset.
I changed /etc/modprobe.d/i915.conf to have the correct spelling, rebooted, and now I can access element-14 (and presumably other sites that do whatever it is that element-14 does that triggers the bug in the video driver) without an unpleasant forced reboot.
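A typo like this can be caught before rebooting by comparing the option names in the config file against the names the module actually accepts. Here is a minimal sketch, with several assumptions: the helper check_options is hypothetical, the set of valid names is passed in by hand (on a real system it could be built from the output of modinfo -p i915), and the value 0 is my guess at what such a line would contain, since the goal was to disable KMS.

```python
def check_options(conf_text, module, valid_params):
    """Return option names set for `module` that are not in `valid_params`.

    Only handles the simple modprobe.d form:
        options <module> <name>=<value> ...
    """
    unknown = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        fields = line.split()
        if len(fields) < 3 or fields[0] != "options" or fields[1] != module:
            continue  # not an options line for the module of interest
        for setting in fields[2:]:
            name = setting.split("=", 1)[0]
            if name not in valid_params:
                unknown.append(name)
    return unknown


# Illustrative only: "modeset" is the one parameter named in this post.
valid = {"modeset"}
print(check_options("options i915 modset=0\n", "i915", valid))   # misspelled
print(check_options("options i915 modeset=0\n", "i915", valid))  # corrected
```

The first call reports the misspelled name and the second reports nothing, which mirrors the difference between the broken and fixed versions of the file.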
This leaves behind the (apparently well known) issue that the i915 driver lacks quality, especially on older chipsets. Apparently the Kernel Mode Setting (KMS) feature is particularly deficient. With the option misspelled, the driver ignored it, defaulted to KMS enabled, and crashed. With it spelled correctly, KMS is disabled, and the driver survives whatever content was triggering the crash.
Also, there are a number of bug pages at Launchpad and other community sites that have the wrong spelling of the option name. I strongly suspect that is where I got the spelling I used.
Edit: I've copied the relevant solution to an actual answer, and improved my description of it here.
Assuming it's a kernel crash, you need to capture the kernel dump info. You can try using a kernel net console: https://wiki.ubuntu.com/Kernel/Netconsole
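If netcat isn't handy on the receiving machine, a small UDP listener does the same job, since netconsole just sends kernel messages as UDP datagrams. The following is only a sketch under some assumptions: port 6666 is netconsole's usual default target port but must match whatever the sender is configured with, and the function names and the netconsole.log filename are made up for illustration.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming netconsole datagrams.

    Port 6666 is assumed here; use the port the sending machine targets.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def read_messages(sock, count=None, timeout=None):
    """Yield decoded text from up to `count` datagrams (None means no limit).

    Stops early if `timeout` seconds pass without a datagram.
    """
    sock.settimeout(timeout)
    received = 0
    while count is None or received < count:
        try:
            data, _sender = sock.recvfrom(65535)
        except socket.timeout:
            return
        received += 1
        yield data.decode("utf-8", errors="replace")


def main():
    """Print kernel messages as they arrive and append them to a log file."""
    sock = open_listener()
    # buffering=1 is line buffering, so lines reach the file promptly
    with open("netconsole.log", "a", buffering=1) as log:
        for text in read_messages(sock):
            print(text, end="", flush=True)
            log.write(text)
```

Calling main() on the receiving machine leaves it listening until interrupted. Writing each message out as it arrives matters here, because the last few lines before the hang are the interesting ones.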
This is almost assuredly a graphics driver or graphics chip bug; in my experience, little else crushes a system like that. If you want to really muck about inside drivers that don't get much attention, do enjoy.
There are app-notes, device documentation, and code at Intel. Personally, I'd drop US$30-40 on the best damn PCI graphics card money can buy (yes, you do pay a premium for legacy hardware) and be done with it. Ask around and you may find someone with a similar vintage machine with such a card for free. I just recycled such a machine for a friend the other week.
The hint about netconsole from João Pinto led to an epiphany. Googling around the phrase "i915 unknown parameter modset" suddenly led me to trip over the root cause.
The name of the option to the i915 driver is spelled "modeset" not "modset".
I changed /etc/modprobe.d/i915.conf to have the correct spelling, rebooted, and now I can access element-14 without a forced reboot.