
Question

Curtis

Asked: 2012-08-15 06:34:06 +0800 CST

Debian Linux server locked - no clues in the logs?


I had a server lock up this morning. Here is a screen shot from the console:

[Screenshot of the console output]

None of the messages in the screenshot mean anything to me. I have a feeling that the important stuff probably scrolled off the console. I cannot find any of the messages from the screenshot in syslog, messages, dmesg, the debug log, or anything else that was logged around the time of the crash. Shouldn't this stuff have been logged?

This is a Debian box running Proxmox. uname output:

2.6.32-4-pve #1 SMP Mon May 9 12:59:57 CEST 2011 x86_64 GNU/Linux

The server has been online for about a year with no other crashes and it booted up again just fine.

I would love to figure out what the issue might have been so that we can prevent it from occurring again in the future. But, from the evidence I have so far, I don't even know if this was a hardware or software issue. Ideas?

2 Answers


svenx · Answer 1 · 2012-08-15T09:18:42+08:00

Exactly which Debian kernel release do you run? You can see the full version and revision numbers if you do "dpkg -l | grep linux-image".

It looks like you're hitting a fairly common bug that I've seen strike numerous times: in kernels before 3.2 mainline, before 2.6.32.50 stable, and before Debian's 2.6.32-45 (based on 2.6.32.50 stable), a clock value overflows after ~208 days of uptime, and once that has happened the kernel can crash. I don't know exactly what triggers the crash after that point; the patch itself is pretty vague about it too:

Although we may still have enough bits to store the value of ns,
in some cases, we may not have enough bits to store cycles * cyc2ns_scale,
leading to an incorrect result.
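The ~208-day figure can be reproduced from the quoted patch text with a little arithmetic. The sketch below is only an illustration, not the kernel's code: it assumes cyc2ns_scale is a fixed-point factor with a 10-bit shift and a 2 GHz clock (neither number is stated in this answer), and the function names are made up for the example.

```python
MASK_64 = (1 << 64) - 1
SCALE_SHIFT = 10       # assumption: fixed-point shift used for cyc2ns_scale
CPU_KHZ = 2_000_000    # assumption: a 2 GHz clock, for illustration only

# Fixed-point "nanoseconds per cycle", scaled up by 2**SCALE_SHIFT.
CYC2NS_SCALE = (1_000_000 << SCALE_SHIFT) // CPU_KHZ


def cycles_to_ns_64bit(cycles: int) -> int:
    """Naive conversion with the product truncated to 64 bits."""
    return ((cycles * CYC2NS_SCALE) & MASK_64) >> SCALE_SHIFT


def cycles_to_ns_exact(cycles: int) -> int:
    """Same conversion using Python's unbounded integers (no overflow)."""
    return (cycles * CYC2NS_SCALE) >> SCALE_SHIFT


def cycles_after_days(days: int) -> int:
    """Cycles counted after the given number of days at CPU_KHZ."""
    return days * 86_400 * CPU_KHZ * 1_000


# cycles * cyc2ns_scale is roughly ns << SCALE_SHIFT, so the product
# stops fitting in 64 bits once ns reaches 2**(64 - SCALE_SHIFT).
overflow_days = 2 ** (64 - SCALE_SHIFT) / 1e9 / 86_400

for days in (200, 209):
    c = cycles_after_days(days)
    print(days, cycles_to_ns_64bit(c) == cycles_to_ns_exact(c))
print(round(overflow_days, 1))
```

Under these assumptions the two conversions agree at 200 days and diverge at 209, and the threshold lands close to the ~208 days mentioned above.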

I've seen upwards of a hundred crashes due to this issue before its cause was determined and the patch was deployed.

The bug was discussed at length on the LKML at the end of 2011. There could be a link to this divide-by-zero bug, but I haven't found any conclusion.

TL;DR: The likely fix is to upgrade to Debian's linux-image version 2.6.32-45 or later.
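Until the upgrade is done, it may help to know which machines have already passed the uptime mark. Here is a rough sketch that reads /proc/uptime (whose first field is seconds since boot) and compares it with the ~208 days quoted above; the function names and the rounded threshold are my own choices for illustration, and passing the threshold does not mean a crash is imminent.

```python
THRESHOLD_DAYS = 208  # the approximate figure from this answer, not an exact limit


def uptime_days(proc_uptime_text: str) -> float:
    """Convert the first field of /proc/uptime (seconds since boot) to days."""
    return float(proc_uptime_text.split()[0]) / 86_400


def past_threshold(proc_uptime_text: str) -> bool:
    """True if the uptime has reached the ~208-day mark."""
    return uptime_days(proc_uptime_text) >= THRESHOLD_DAYS


def read_proc_uptime(path: str = "/proc/uptime") -> str:
    """Read the raw uptime line; only meaningful on Linux."""
    with open(path) as f:
        return f.read()
```

On the server itself, `past_threshold(read_proc_uptime())` would give the answer for that machine.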

DerfK · Answer 2 · 2012-08-15T07:05:50+08:00

This is a screenshot of a kernel panic. The traceback is printed inside out, so whatever function finally killed the kernel is off the top of the screen. What is visible at the top, though, is a call to divide_error() in hpet_msi_next_event(). divide_error() is defined in the kernel as a trap for FPE_INTDIV, so something in hpet_msi_next_event() attempted to divide by zero.

Unfortunately, the cause of that could be hardware, software, or even just a transient bit-flip error. (Are you using ECC RAM?)
