
Question

Milos

Asked: 2012-01-12 16:11:30 +0800 CST

Debian server freezing


I apologize in advance for not being a proper admin; I'm just a programmer with a server on which I installed Debian Etch plus MySQL, PHP, Apache, and ISPConfig.

So, it had an uptime of more than 900 days without a single problem (there's no significant load on it, just a couple of our services), and then it started to misbehave: suddenly it freezes (only ping works, nothing else), and when I try to restart it via the ISP's interface, it freezes completely. Then I have to ask support for a manual restart. After that, it works fine for a couple of days, and then the same thing happens again (three times so far).

I have now performed a network boot and run fsck (it reported 1.1% non-contiguous), and I hope that helps.

My question is: has anyone had a similar experience, and what could cause such a problem (when only ping works)?

Also, I looked in the system log but found nothing that could indicate a problem. Is there some other log I should look into?

Thanks for all the answers!

Sorry, I haven't registered yet, so I can't vote up. But thanks!

First, to clarify: this is a housed server, and the ISP's support offers network boot, reset, and manual reset functions.

It probably is an HDD issue: after the fsck everything seemed to work fine, until I looked deeper and realized that only the front page works, while the other pages don't (they give a '403 Forbidden' error, a blank page, or a MySQL error).

SSH also seems to work, but it actually doesn't: it refuses a wrong password, but when I enter the correct one, the connection just closes.

I will try to access the files once again through a network boot and back up as much as possible; then I'll have to replace the disk.

Is it possible to clone a disk with errors on it? Is it worth trying, anyway?

UPDATE: Today (one day after I asked the question) it turned out that the HDD is definitely defective. Once again, thanks for your time and help!

2 Answers


Brett Dikeman · Answer 1 · 2012-01-12T22:10:41+08:00

Assuming this is a dedicated physical server:

The next time it freezes, you should have your hosting company plug in a "crash cart" and see what's on the screen (console), or go there yourself. The next time it starts to act up, if you're able to log in, type "dmesg" and look for error messages; include them by editing your question and pasting them in, or by using pastebin.
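If dmesg prints a lot, a small filter can help pick out the disk-related lines before pasting them. This is only an illustrative sketch: the pattern list is an assumption (strings commonly seen in 2.6-era kernel messages about failing disks), not an exhaustive or authoritative set, and the function name is hypothetical.

```python
from __future__ import annotations

import re

# Assumed, non-exhaustive patterns often seen in 2.6-era kernel messages
# about failing disks; extend the list for your own kernel and driver.
SUSPECT_PATTERNS = [
    r"I/O error",                                     # block layer read/write failure
    r"\bata\d+(\.\d+)?: (exception|failed command)",  # libata error handling
    r"Medium Error",                                  # SCSI sense key
    r"Unrecovered read error",                        # SCSI additional sense text
    r"EXT3-fs error",                                 # filesystem-level error
]
SUSPECT_RE = re.compile("|".join(SUSPECT_PATTERNS), re.IGNORECASE)


def suspicious_lines(dmesg_text: str) -> list[str]:
    """Return the lines of a dmesg dump that match any suspect pattern."""
    return [line for line in dmesg_text.splitlines() if SUSPECT_RE.search(line)]
```

For example, pass it subprocess.run(["dmesg"], capture_output=True, text=True).stdout and print whatever comes back. An empty result does not prove the disk is healthy; it only means none of these particular strings showed up.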

I've snapped photos with a digital camera or cellphone in the past for later reference or showing to someone remotely. Any serious kernel messages will most likely be on screen (it depends on how logging is configured); without this information, the answers you get will be essentially wild guesses.

My wild guess is a hard drive failure; bring a bootable CD (Ubuntu is probably easiest) and run smartctl -a <insert hard drive device path here>. You'll get a list of drive health parameters and, more importantly, a log of errors from the drive, if any.
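To make the attribute part of that output easier to read, here is a minimal sketch that parses the table smartctl -A prints (columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and flags suspicious entries. Treating a non-zero raw value of Reallocated_Sector_Ct, Current_Pending_Sector or Offline_Uncorrectable as a warning sign is a common rule of thumb assumed here, not a vendor specification; the function names are hypothetical.

```python
from __future__ import annotations

import re

# Assumed rule of thumb: a non-zero raw value here usually means damaged media.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_smart_attributes(text: str) -> dict[str, dict]:
    """Parse the attribute table printed by `smartctl -A` into a dict keyed by name."""
    attrs: dict[str, dict] = {}
    for line in text.splitlines():
        fields = line.split(None, 9)  # RAW_VALUE may itself contain spaces
        if len(fields) < 10 or not all(f.isdigit() for f in (fields[0], *fields[3:6])):
            continue  # header, banner, blank or non-table line
        raw = re.match(r"\d+", fields[9])  # e.g. "36 (Min/Max 20/45)" -> 36
        attrs[fields[1]] = {
            "id": int(fields[0]),
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "when_failed": fields[8],
            "raw": int(raw.group()) if raw else None,
        }
    return attrs


def smart_warnings(attrs: dict[str, dict]) -> list[str]:
    """List attributes smartctl marks as failed, or watched ones with a non-zero raw value."""
    problems = []
    for name, a in attrs.items():
        if a["when_failed"] != "-":  # e.g. "FAILING_NOW" or "In_the_past"
            problems.append(f"{name}: WHEN_FAILED={a['when_failed']}")
        elif name in WATCHED and a["raw"]:
            problems.append(f"{name}: raw value {a['raw']}")
    return sorted(problems)
```

The parser is meant for the attribute table alone (smartctl -A); fed the full -a report it may pick up stray lines from other sections.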

Also: ignore the person who suggested doing an OS upgrade. That is exceptionally dangerous advice.

Update: Yes, it's possible to clone a damaged drive, if you don't have good or recent backups. Look at GNU ddrescue. It's an advanced tool, though. If money is on the line, send it out for recovery, or at least hire a pro sysadmin who has experience with data recovery.
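The general strategy behind a rescue copy is to grab everything that reads cleanly first, remember what failed, and only then go back and retry the failed parts, instead of hammering one bad spot. The toy sketch below illustrates that idea only; it is not how GNU ddrescue is implemented, and read_block is a hypothetical callback assumed to raise OSError on an unreadable block.

```python
from __future__ import annotations

from typing import Callable

ReadBlock = Callable[[int], bytes]  # hypothetical: raises OSError on a bad block


def rescue(read_block: ReadBlock, n_blocks: int, block_size: int,
           retries: int = 2) -> tuple[bytes, list[int]]:
    """Copy a failing device block by block, skipping errors, then retry the gaps."""
    image = bytearray(n_blocks * block_size)  # unreadable blocks stay zero-filled
    pending = list(range(n_blocks))
    for _ in range(1 + retries):  # first pass plus retry passes
        failed = []
        for i in pending:
            try:
                data = read_block(i)
            except OSError:
                failed.append(i)  # remember the gap and move on
                continue
            # Pad or trim so every block occupies exactly block_size bytes.
            image[i * block_size:(i + 1) * block_size] = data[:block_size].ljust(block_size, b"\x00")
        pending = failed
        if not pending:
            break
    return bytes(image), pending
```

The returned list of block numbers that never read plays the role of ddrescue's map file: it tells you which parts of the image are placeholders rather than real data.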

aseq · Answer 2 · 2012-01-12T16:41:54+08:00

It's possible this is a hardware issue: disk or memory errors, overheating (a clogged fan or air vents), or a network card that went bad. If there are no hardware errors, the first thing I would do is upgrade the system to Lenny, then Squeeze. It's possible that may automagically fix your problems.

I would also scan the disk for bad blocks (badblocks is also the name of the command). e2fsck has the following option:

-c     This option causes e2fsck to use badblocks(8) program to do a read-only scan of the device in order to find any bad 
       blocks.  If any bad blocks are found, they are added to the bad block inode to prevent them from being allocated to
       a file or directory. If this option is specified twice, then the bad block scan will be done using a 
       non-destructive read-write test.

That way, the filesystem will no longer allocate the blocks found to be bad to files or directories, which avoids errors caused by writing to them.
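Adding blocks to the bad block inode works because, to the block allocator, those blocks simply look taken. The toy allocator below (a hypothetical function, not ext3's real allocator) shows that idea: bad blocks are treated like permanently used blocks and are never handed out.

```python
from __future__ import annotations

import itertools


def allocate(n: int, total_blocks: int, used: set[int], bad_blocks: set[int]) -> list[int]:
    """Return the first n free block numbers, never handing out a bad block."""
    reserved = used | bad_blocks  # bad blocks look permanently "in use"
    free = (b for b in range(total_blocks) if b not in reserved)
    chosen = list(itertools.islice(free, n))
    if len(chosen) < n:
        raise OSError("no space left on device")
    return chosen
```

For example, with blocks 0 and 1 in use and blocks 2 and 5 marked bad, allocate(3, 10, {0, 1}, {2, 5}) skips straight past the bad ones.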

Also consider running a memory test using memtest86 or memtest86+. If it finds errors and you feel adventurous you can use memtest's output to feed to the kernel and map out any bad memory: http://rick.vanrein.org/linux/badram/

I know for a fact that it works very well. I once had a bad DIMM that would predictably crash and burn the system at a certain point of memory allocation. After using memtest to find the bad memory area, I used the badram kernel parameter to map it out, and the problem was solved.
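To see what the address/mask pairs mean, here is a sketch that turns a list of failing physical addresses into a single BadRAM-style pattern. It assumes the convention described on the BadRAM page (an address A is bad when (A & mask) == addr) and a 32-bit address width; memtest86+ can print such patterns itself, so this only illustrates the arithmetic. The badram= string format produced at the end is also an assumption to check against that page, and the function names are hypothetical.

```python
from __future__ import annotations


def badram_pattern(bad_addrs: list[int], width: int = 32) -> tuple[int, int]:
    """Return (addr, mask) such that every address in bad_addrs matches.

    An address A matches when (A & mask) == addr. The mask keeps exactly the
    bits on which all bad addresses agree, which gives the tightest single
    pattern; it may still cover some good addresses as well.
    """
    if not bad_addrs:
        raise ValueError("need at least one bad address")
    full = (1 << width) - 1
    first = bad_addrs[0]
    differing = 0
    for a in bad_addrs[1:]:
        differing |= a ^ first  # bits that vary somewhere in the set
    mask = full & ~differing
    return first & mask, mask


def matches(addr: int, pattern: tuple[int, int]) -> bool:
    """True if addr falls inside the region described by the (addr, mask) pattern."""
    base, mask = pattern
    return addr & mask == base


def badram_param(patterns: list[tuple[int, int]]) -> str:
    """Format patterns as a kernel command-line argument (assumed syntax)."""
    return "badram=" + ",".join(f"0x{a:08x},0x{m:08x}" for a, m in patterns)
```

With failing addresses 0x1000, 0x1004 and 0x1010, the bits that differ are 0x14, so the mask is 0xffffffeb and the pattern also covers 0x1014. That over-coverage is the price of describing several errors with one pair.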
