HP's hardware hooks are themselves proprietary, but they do expose their instrumentation via a number of 'open' methods such as SNMP, WMI, and WBEM. So you don't HAVE to use SIM/SMHP.
You should install HP's full complement of tools, the hpasm/hprsm packages, etc. They are literally the hardest packages to install I have ever seen; it seems they were written by people with no concern for ease of deployment. They provide a shell script you can run by hand: use that at first, until you figure out how to hack the script, write a wrapper, install the RPMs individually, or lean on the vendor to behave reasonably.
You should monitor syslog for the errors from these tools.
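A minimal sketch of that syslog check, run against a fabricated log fragment rather than a live /var/log/messages. The daemon names matched here (hpasmd and the cma*d agents) are typical of what the hpasm/hprsm packages log as, but verify against what your install actually emits.

```shell
#!/bin/sh
# Scan a syslog file for trouble reported by the HP agents.
# Assumption: agents log as hpasmd / cma*d (e.g. cmahealthd, cmaidad).
scan_hp_syslog() {
  grep -Ei 'hpasm|cma[a-z]+d' "$1" | grep -Ei 'fail|degrad|critical|error' || true
}

# Demo against a fabricated log fragment:
cat > /tmp/demo_messages <<'EOF'
Jan 10 03:12:01 db1 cmahealthd[2211]: Fan 2 status changed to degraded
Jan 10 03:12:05 db1 sshd[9921]: Accepted publickey for root
EOF
scan_hp_syslog /tmp/demo_messages
```

Against the demo input this prints only the cmahealthd fan line; the sshd noise is filtered out.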
You should parse hpasmcli (show server, show dimm) and hpacucli (controller all show, then for each controller slot=X pd all show) output to identify failures. If you rely on the syslog reporting, you will miss failures and have embarrassing disasters.
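A sketch of that parsing, under the assumption that failures show up either as hpasmcli-style "Status:" lines or as hpacucli-style "physicaldrive (... , STATUS)" lines. The exact text varies by tool version, so the demo feeds fabricated lines in the typical format; in production you would pipe in live output, e.g. `hpasmcli -s "show server; show dimm" | check_status` and `hpacucli ctrl slot=0 pd all show | check_status`.

```shell
#!/bin/sh
# Flag any component whose reported status is not Ok/OK.
check_status() {
  awk '
    # hpasmcli-style "Status:" lines: anything but Ok/OK is a failure
    /Status:/ && $NF != "Ok" && $NF != "OK" { bad++; print "FAIL: " $0 }
    # hpacucli-style physicaldrive lines: status is the last field in parens
    /physicaldrive/ && !/OK\)/              { bad++; print "FAIL: " $0 }
    END { exit (bad ? 1 : 0) }
  '
}

# Demo with fabricated output in the typical format:
cat <<'EOF' | check_status || echo "problems found"
Status:                Ok
Status:                Degraded
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, Failed)
EOF
```

The nonzero exit status makes this easy to wire into cron or Nagios-style checks.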
You should parse hplog output as well, and clear the output after checking it, archiving this output somewhere. Consider this a redundant check to the hpasmcli/hpacucli checking.
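The check/archive/clear cycle might look like the sketch below. It archives captured output to a timestamped file before anything is cleared; the clear command shown in the comment (`hpasmcli -s "clear iml"`) is an assumption, so verify the right invocation for your tool versions before enabling it. The demo uses fabricated text instead of a live `hplog -v` run.

```shell
#!/bin/sh
# Archive captured hplog output to a timestamped file, then (optionally)
# clear the log -- but only after the archive write has succeeded.
archive_log() {
  dir=$1; text=$2
  mkdir -p "$dir"
  file="$dir/hplog-$(date +%Y%m%d-%H%M%S).txt"
  printf '%s\n' "$text" > "$file"
  # Assumption -- verify before enabling:
  #   hpasmcli -s "clear iml"
  printf '%s\n' "$file"
}

# Demo with fabricated text instead of "$(hplog -v)":
f=$(archive_log /tmp/hplog-demo "0003 Caution 03:12 Fan 2 degraded")
grep 'degraded' "$f"
```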
You should use hponcfg to make sure the iLO is configured, and connect to the iLO to make sure it is actually responsive.
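A sketch of that check: on a live box you would run `hponcfg -w /tmp/ilo.xml` to dump the current iLO config, pull the address out of it, and probe that address. The XML field name below matches what hponcfg typically emits, but treat it as an assumption and check your own dump; the demo parses a fabricated config file.

```shell
#!/bin/sh
# Extract the iLO address from an hponcfg XML dump.
# Assumption: the dump contains an IP_ADDRESS VALUE = "..." attribute.
get_ilo_ip() {
  sed -n 's/.*IP_ADDRESS VALUE *= *"\([0-9.]*\)".*/\1/p' "$1" | head -n1
}

# Demo against a fabricated hponcfg dump:
cat > /tmp/ilo-demo.xml <<'EOF'
<MOD_NETWORK_SETTINGS>
  <IP_ADDRESS VALUE = "10.0.0.42"/>
</MOD_NETWORK_SETTINGS>
EOF
ip=$(get_ilo_ip /tmp/ilo-demo.xml)
echo "iLO address: $ip"
# Live responsiveness probe (not run here):
#   curl -ks -m 5 -o /dev/null "https://$ip/" && echo "iLO answers on 443"
```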
Make sure you can upgrade firmware, and do so regularly. HP releases critical firmware upgrades: for example, one that turns what used to be a crash from a minor memory error (with no indication of which DIMM was bad) into a simple fault light. HP changed my opinion about upgrading firmware when it is not absolutely required. (Well, it -is- absolutely required, you just have no one telling you so.)
Give up on the SNMP stuff. You have a lot of work to do, and SNMP is additional work that will not give you the full functionality you need, so you will still have to do everything else anyway.
The HP servers are still the best Intel servers with respect to reporting/managing hardware issues. They just have certain extremely annoying issues. Perhaps if every customer complains at least once they will make deployment easier. There's just no excuse for this.
A DL3[68]0 G5 running RHEL5, with the HP management tools constantly monitored and occasional stress tests of disk and memory, will be the most reliable Intel solution on the market. Just do your diligence to make sure you get your money's worth. HP provides you the tools; they just don't make them as easy to use as they should.
Only use HP RAM. It's just not worth the trouble otherwise. You don't need vendors pointing fingers at each other when a DIMM has a fault light go on.
Do a datacenter walkthrough for fault lights regularly and use this to correct failures in your monitoring scripts. This is how I learned that syslog is barely useful and you must check hpasmcli/hpacucli regularly.
The HP ASM tools & SNMP OIDs are what we mainly use for general component monitoring....
Alternatively, you can use smartmontools to monitor the disk drives, and most of the sensors should show up in lm_sensors.
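As a vendor-neutral cross-check, `smartctl -H` prints an overall-health verdict per drive. This sketch parses a canned PASSED line rather than touching a live drive; note that drives behind a Smart Array controller may need smartctl's cciss device type (`-d cciss,N`) if your smartmontools build supports it.

```shell
#!/bin/sh
# Return success only if the smartctl health verdict is PASSED.
smart_ok() {
  grep -q 'test result: PASSED'
}

# Demo with a canned verdict line instead of a live drive:
echo "SMART overall-health self-assessment test result: PASSED" | smart_ok \
  && echo "sda: healthy"

# Live use (as root), against every SCSI-style disk:
#   for d in /dev/sd?; do
#     smartctl -H "$d" | smart_ok || echo "$d: FAILING"
#   done
```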