I see it mentioned from time to time when the kernel checks the CPUs. I suppose it's some kind of hardware bug of the CPU related to the HLT instruction, but I can't find any information about this. So what is hlt_bug actually?
I'm not sure of the exact details as it is an issue from long ago, possibly as far back as when 386 based machines were common.
The HLT instruction is privileged: when the CPU is not in "real" mode it can only be executed in "ring 0", so in a modern OS it should only be called by the kernel. It instructs the processor to pause until the next interrupt is received. Modern CPUs will drop into a low power state at this point, though it is not quite as simple as that for CPUs with multiple cores.
If I remember rightly, the bug was that some 386 CPUs would not wake in response to some interrupts in certain circumstances. The check for the bug is done by setting two timers: a shorter-period one whose interrupt the affected CPUs fail to respond to, and a longer-period one that they are known to respond to. If the first time the CPU wakes is in response to the longer-period timer, you know the bug exists, because the CPU should already have woken and serviced the shorter-period timer's interrupt. As the HLT instruction is almost never called outside the kernel you don't need to worry about it. I assume the only effect of the "hlt bug found" flag is to stop the power management code from calling HLT to idle processors that have the bug and so might not wake up.
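The two-timer idea described above can be modelled in a few lines. This is only a toy simulation of the check as recalled here, not the kernel's actual code; the names `Timer`, `first_wake`, `has_hlt_bug` and the tick values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timer:
    name: str
    period: int  # ticks until this timer's interrupt fires


def first_wake(timers, wakes_on):
    """Return (time, name) of the first interrupt that ends a simulated HLT.

    `wakes_on` is the set of timer names this hypothetical CPU responds to
    while halted; interrupts from any other timer are simply lost.
    """
    candidates = [(t.period, t.name) for t in timers if t.name in wakes_on]
    if not candidates:
        raise RuntimeError("CPU never wakes: it would hang in HLT")
    return min(candidates)


def has_hlt_bug(wakes_on):
    """Flag the bug if the longer-period timer is what ends the halt."""
    fast = Timer("fast", period=10)    # a healthy CPU wakes on this first
    slow = Timer("slow", period=100)   # assumed to wake even a buggy CPU
    _, woke_by = first_wake([fast, slow], wakes_on)
    return woke_by == "slow"
```

A CPU that responds to both timers wakes on the fast one and passes; a CPU that misses the fast timer only wakes on the slow one and is flagged.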
The only reference I've found to this bug online after a quick search (aside from copies of kernel boot output, the bugs.* source files, and this question; wow, questions on these sites hit Google's database fast!) is a discussion as to whether the check for it still needs to be kept in the kernel, as it is unlikely to affect any hardware configurations that people are using today or are going to use in future.
Edit: this HOWTO lists a HLT problem in some 486DX-100 chips (search the page for "no-hlt" for the reference). This may be the issue I'm remembering (rather than it being a problem with some 386 chips) or it may be a coincidence and there have been two wake-from-low-power-state bugs concerning that instruction.
I encountered one!
My first computer was a Soviet Iskra EVM (basically an IBM PC/XT with the Iron Curtain's own bus, but fully software compatible). On rare occasions it would freeze, sometimes producing garbage on the screen. Upon closer investigation I discovered:
The system had a Siemens SAB 8086 CPU running at 8 MHz.
The culprit was the HLT (0xF4) instruction which was killing the system regardless of whether the interrupts were disabled or enabled.
A simple sequence like 0xFA, 0xF4, 0xC3 (cli, hlt, ret) did NOT freeze the system gracefully, as one would expect, but produced garbage on screen and then froze up.
The similar sequence 0xFB, 0xF4, 0xC3 (sti, hlt, ret) did not quietly execute and return to the shell either: again garbage on screen, followed by a freeze or (rarely) a return to the shell.
Just 0xF4, 0xC3 (interrupts are normally enabled anyway) gave the same garbage, beeps and hang.
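For readers who want to check the byte sequences above, here is a small lookup sketch. It covers only the one-byte 8086 opcodes mentioned in this answer (plus 0xCC, which comes up later) and is in no sense a real disassembler; `OPCODES` and `disassemble` are illustrative names.

```python
# One-byte 8086 opcodes that appear in this answer.
OPCODES = {
    0xFA: "cli",   # clear interrupt flag (mask maskable interrupts)
    0xFB: "sti",   # set interrupt flag
    0xF4: "hlt",   # halt until the next interrupt
    0xC3: "ret",   # near return
    0xCC: "int3",  # one-byte breakpoint interrupt
}


def disassemble(code):
    """Map each byte to its mnemonic; unknown bytes are shown as raw data."""
    return [OPCODES.get(b, f"db 0x{b:02X}") for b in code]
```

For example, `disassemble(bytes([0xFA, 0xF4, 0xC3]))` gives the `cli`, `hlt`, `ret` sequence from the first test.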
I never figured out where the control was being transferred. I could have written a bootstrap loader which fills the memory with hooks (0xCC); then the INT 03h handler would have told me where it came from. But back then I never thought about it. Or maybe it wasn't just transferring control, but corrupting something somewhere, who knows? I never heard of a buggy HLT instruction on the Siemens CPUs, but this may be the case. I don't want to generalize: it might well have been just THIS ONE case, or perhaps a buggy batch.
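The 0xCC trick can be sketched as a toy model. The idea is that every byte not occupied by real code is a breakpoint, so a stray jump traps immediately and the handler can report the address it landed on. This is a simulation only, not 8086 code, and `fill_with_traps` and `run_from` are hypothetical helpers.

```python
INT3 = 0xCC  # one-byte breakpoint opcode


def fill_with_traps(size):
    """Model a block of memory with every byte set to INT3."""
    return bytearray([INT3]) * size


def run_from(memory, ip):
    """Fetch one opcode at `ip` and report where a trap fired, if it did.

    On a real 8086 the INT 3 handler sees a return address one byte past
    the 0xCC, so the landing address is that value minus one.
    """
    opcode = memory[ip]
    return_address = ip + 1        # what the CPU would push as IP
    if opcode == INT3:
        return return_address - 1  # address the stray jump landed on
    return None                    # real code was there, no trap
```

With the test sequence placed at some offset and everything else filled with traps, a jump to any other address is caught at once, which at least shows where control ended up.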
Well, to finish the story - back then I found another machine of the same model, but the one which had the Soviet stone inside (the KM1810VM86M - the faithfully stolen (borrowed) and then reproduced Intel 8086 CPU). I tried playing with the HLT instruction there, and IT WORKED the way it should, and the way the Intel 8086 Programmer's Reference says...
What an irony... and what a story! :)