Question

lexsys

Asked: 2009-07-31 04:17:45 +0800 CST

Why does my server accidentally go down?

772

I have a CentOS 5.3 based server with kernel 2.6.18-128.2.1.el5. It worked fine for nearly a month, but this week it went down three times. Each time I saw it in Nagios and wrote an email asking for the server to be rebooted. It then ran for 12-36 hours and went down again.

I looked through the log files. Just before the first fault, /var/log/messages contained this message:

logrotate: ALERT exited abnormally with [1]

After the second reboot, the sysadmin from the datacenter sent me this screenshot: http://www.freeimagehosting.net/uploads/bd9fb68d98.png

Before the third fault, /var/log/messages contained this message:

Eeek! page_mapcount(page) went negative (-1)

How should I investigate the problem?

UPD:

Part of the memtester output:

Compare OR          : FAILURE: 0x7e9f90d1 != 0x7e9fd2d1 at offset 0x06222609.
FAILURE: 0x7e9f90d1 != 0x7e9fd0d1 at offset 0x06222621.
FAILURE: 0x7e9f90d1 != 0x7e9fd1d1 at offset 0x06222661.
FAILURE: 0x7e9f90d1 != 0x7e9f92d1 at offset 0x06222681.
FAILURE: 0x7e9f90d1 != 0x7e9fd0d1 at offset 0x062226a1.
FAILURE: 0x7e9f90d1 != 0x7e9fd0d1 at offset 0x062226c1.
FAILURE: 0x7e9f90d1 != 0x7e9f93d1 at offset 0x062226e9.

It is faulty memory. Thank you for your help!
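
For anyone reading similar memtester output, XOR-ing the two values on each FAILURE line shows which bits differ. A minimal Python sketch, assuming the line format shown above; the function name `flipped_bits` and the regular expression are illustrative and not part of memtester:

```python
import re

# Matches lines of the form "FAILURE: 0xAAAA != 0xBBBB at offset 0xCCCC."
# (format assumed from the memtester output quoted in the question).
FAILURE_RE = re.compile(
    r"FAILURE: 0x([0-9a-fA-F]+) != 0x([0-9a-fA-F]+) at offset 0x([0-9a-fA-F]+)"
)


def flipped_bits(line):
    """Return (offset, xor_mask, bit_positions) for a FAILURE line, or None."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    first, second, offset = (int(group, 16) for group in match.groups())
    mask = first ^ second  # set bits mark positions where the two reads differ
    bits = [i for i in range(mask.bit_length()) if (mask >> i) & 1]
    return offset, mask, bits


if __name__ == "__main__":
    sample = [
        "Compare OR          : FAILURE: 0x7e9f90d1 != 0x7e9fd2d1 at offset 0x06222609.",
        "FAILURE: 0x7e9f90d1 != 0x7e9fd0d1 at offset 0x06222621.",
        "FAILURE: 0x7e9f90d1 != 0x7e9f92d1 at offset 0x06222681.",
    ]
    for line in sample:
        offset, mask, bits = flipped_bits(line)
        print("offset 0x%08x  mask 0x%08x  bits %s" % (offset, mask, bits))
```

Applied to the seven lines above, every mismatch is confined to bits 8, 9 and 14 of the word, which is consistent with the faulty memory diagnosis.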

5 Answers

TomOnTime · Answer 1 · 2009-07-31T04:30:45+08:00

Best Answer

My first guess is that Nagios has a small memory leak and after months of running ran out of RAM or swap. However, since the machine has crashed a few times in the same day, that suggests a faulty RAM chip. My first step would be to do a memory test or check the bad memory log (if your server supports it).
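
On Linux, one place such a bad memory log may show up is the EDAC counters in sysfs. A rough Python 3 sketch, assuming the kernel's EDAC driver is loaded for the chipset and exposes `ce_count` (corrected errors) and `ue_count` (uncorrected errors) under `/sys/devices/system/edac/mc`; whether this particular server supports that is an assumption, and `read_edac_counts` is a name made up for illustration:

```python
import glob
import os


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int, "ue_count": int}} per mc* directory.

    A counter that cannot be read or parsed is reported as None.
    """
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(base, "mc*"))):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                with open(os.path.join(mc_dir, name)) as handle:
                    entry[name] = int(handle.read().strip())
            except (OSError, ValueError):
                entry[name] = None
        counts[os.path.basename(mc_dir)] = entry
    return counts


if __name__ == "__main__":
    result = read_edac_counts()
    if not result:
        print("No EDAC memory controllers found (driver not loaded or unsupported).")
    for controller, entry in result.items():
        print(controller, entry)
```

A non-zero and growing count would point at the memory. An empty result only means the counters are not available, not that the RAM is fine.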

3

Kyle Brandt · Answer 2 · 2009-07-31T04:37:00+08:00

I vote faulty RAM too. I would recommend using memtest86 to do a thorough check of the RAM. Also, are the temperatures in the room nice and cool?

2

sybreon · Answer 3 · 2009-07-31T04:47:16+08:00

I vote faulty RAM too. If you cannot use memtest86 because the machine is remotely located, you may want to try a userspace tool, memtester, instead. It doesn't work quite as well, but it may pick up some memory errors if they are there.
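
A sketch of how a memtester run could be scripted from Python 3, assuming memtester is installed and on the PATH. The exit-status meanings are taken from the memtester man page as I understand it, so check them against your version; the helper names `decode_exit_status` and `run_memtester` and the default size are illustrative only:

```python
import subprocess

# Exit-status bits as described in the memtester man page (an assumption to
# verify against the installed version); a status of 0 means no errors.
EXIT_BITS = {
    0x01: "error allocating or locking memory, or invocation error",
    0x02: "error during stuck address test",
    0x04: "error during one of the other tests",
}


def decode_exit_status(status):
    """Translate a memtester exit status into a list of messages."""
    return [msg for bit, msg in sorted(EXIT_BITS.items()) if status & bit]


def run_memtester(size="512M", loops=1):
    """Run `memtester <size> <loops>` and return the decoded problems.

    Usually run as root so memtester can lock the memory it tests.
    """
    proc = subprocess.run(["memtester", size, str(loops)])
    return decode_exit_status(proc.returncode)
```

Calling `run_memtester("1024M", 3)` would test 1024 MB three times and return an empty list when nothing was flagged. Pick a size that leaves enough free memory for the services still running on the box.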

1

Jeremy Bouse · Answer 4 · 2009-07-31T04:26:46+08:00

At a quick glance, it looks like the process that panicked was Nagios. Has that been consistent every time it has panicked and locked up? If so, I would ask whether the problems started around the time you set up Nagios. If that's the case, you might want to try shutting Nagios down and see whether the server becomes stable again. If it does, you have found the culprit and need to look closer to see what's wrong with Nagios.

0

goo · Answer 5 · 2009-07-31T04:36:03+08:00

Google or the CentOS forums/list are likely to be your best bet. Without a crash dump it's going to be difficult to be sure, so you should look into getting that configured.

You can also search through the Red Hat Bugzilla. This looks like a possibility, based on the little you have from the screenshot.

0
