HP's hardware hooks are themselves proprietary, but they do expose their instrumentation via a number of 'open' methods such as SNMP, WMI, and WBEM. So you don't HAVE to use SIM/SMHP.
You should install HP's full complement of tools, the hpasm/hprsm packages, etc. They are literally the hardest packages to install I have ever seen; it seems they were written by people with no concern for ease of deployment. They provide a shell script you can run by hand: use that at first, until you figure out how to hack the script, write a wrapper, install the RPMs individually, or lean on the vendor to behave reasonably.
You should monitor syslog for the errors from these tools.
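A minimal sketch of that syslog check, run against a fabricated log fragment rather than a live /var/log/messages. The daemon names matched here (hpasmd and the cma*d agents) are typical of what the hpasm/hprsm packages log as, but verify against what your install actually emits.

```shell
#!/bin/sh
# Scan a syslog file for trouble reported by the HP agents.
# Assumption: agents log as hpasmd / cma*d (e.g. cmahealthd, cmaidad).
scan_hp_syslog() {
  grep -Ei 'hpasm|cma[a-z]+d' "$1" | grep -Ei 'fail|degrad|critical|error' || true
}

# Demo against a fabricated log fragment:
cat > /tmp/demo_messages <<'EOF'
Jan 10 03:12:01 db1 cmahealthd[2211]: Fan 2 status changed to degraded
Jan 10 03:12:05 db1 sshd[9921]: Accepted publickey for root
EOF
scan_hp_syslog /tmp/demo_messages
```

Against the demo input this prints only the cmahealthd fan line; the sshd noise is filtered out.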
You should parse hpasmcli (show server, show dimm) and hpacucli (controller all show, then for each controller slot=X pd all show) output to identify failures. If you rely on the syslog reporting, you will miss failures and have embarrassing disasters.
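A sketch of that parsing, under the assumption that failures show up either as hpasmcli-style "Status:" lines or as hpacucli-style "physicaldrive (... , STATUS)" lines. The exact text varies by tool version, so the demo feeds fabricated lines in the typical format; in production you would pipe in live output, e.g. `hpasmcli -s "show server; show dimm" | check_status` and `hpacucli ctrl slot=0 pd all show | check_status`.

```shell
#!/bin/sh
# Flag any component whose reported status is not Ok/OK.
check_status() {
  awk '
    # hpasmcli-style "Status:" lines: anything but Ok/OK is a failure
    /Status:/ && $NF != "Ok" && $NF != "OK" { bad++; print "FAIL: " $0 }
    # hpacucli-style physicaldrive lines: status is the last field in parens
    /physicaldrive/ && !/OK\)/              { bad++; print "FAIL: " $0 }
    END { exit (bad ? 1 : 0) }
  '
}

# Demo with fabricated output in the typical format:
cat <<'EOF' | check_status || echo "problems found"
Status:                Ok
Status:                Degraded
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, Failed)
EOF
```

The nonzero exit status makes this easy to wire into cron or Nagios-style checks.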
You should parse hplog output as well, and clear the output after checking it, archiving this output somewhere. Consider this a redundant check to the hpasmcli/hpacucli checking.
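The check/archive/clear cycle might look like the sketch below. It archives captured output to a timestamped file before anything is cleared; the clear command shown in the comment (`hpasmcli -s "clear iml"`) is an assumption, so verify the right invocation for your tool versions before enabling it. The demo uses fabricated text instead of a live `hplog -v` run.

```shell
#!/bin/sh
# Archive captured hplog output to a timestamped file, then (optionally)
# clear the log -- but only after the archive write has succeeded.
archive_log() {
  dir=$1; text=$2
  mkdir -p "$dir"
  file="$dir/hplog-$(date +%Y%m%d-%H%M%S).txt"
  printf '%s\n' "$text" > "$file"
  # Assumption -- verify before enabling:
  #   hpasmcli -s "clear iml"
  printf '%s\n' "$file"
}

# Demo with fabricated text instead of "$(hplog -v)":
f=$(archive_log /tmp/hplog-demo "0003 Caution 03:12 Fan 2 degraded")
grep 'degraded' "$f"
```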
You should use hponcfg to make sure the iLO is configured, and connect to the iLO to make sure it is actually responsive.
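A sketch of that check: on a live box you would run `hponcfg -w /tmp/ilo.xml` to dump the current iLO config, pull the address out of it, and probe that address. The XML field name below matches what hponcfg typically emits, but treat it as an assumption and check your own dump; the demo parses a fabricated config file.

```shell
#!/bin/sh
# Extract the iLO address from an hponcfg XML dump.
# Assumption: the dump contains an IP_ADDRESS VALUE = "..." attribute.
get_ilo_ip() {
  sed -n 's/.*IP_ADDRESS VALUE *= *"\([0-9.]*\)".*/\1/p' "$1" | head -n1
}

# Demo against a fabricated hponcfg dump:
cat > /tmp/ilo-demo.xml <<'EOF'
<MOD_NETWORK_SETTINGS>
  <IP_ADDRESS VALUE = "10.0.0.42"/>
</MOD_NETWORK_SETTINGS>
EOF
ip=$(get_ilo_ip /tmp/ilo-demo.xml)
echo "iLO address: $ip"
# Live responsiveness probe (not run here):
#   curl -ks -m 5 -o /dev/null "https://$ip/" && echo "iLO answers on 443"
```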
Make sure you can upgrade firmware, and do so regularly. HP releases critical firmware upgrades: for example, one that turns what used to be a crash from a minor memory error (with no indication of which DIMM was bad) into a simple fault light. HP changed my opinion about upgrading firmware when it is not absolutely required. (Well, it -is- absolutely required, you just have no one telling you so.)
Give up on the SNMP stuff. You have a lot of work to do, and SNMP is additional work that will not give you the full functionality you need, so you will still have to do everything else anyway.
The HP servers are still the best Intel servers with respect to reporting/managing hardware issues. They just have certain extremely annoying issues. Perhaps if every customer complains at least once they will make deployment easier. There's just no excuse for this.
A DL3[68]0 G5 running RHEL5, with the HP management tools constantly monitored and occasional stress tests of disk and memory, will be the most reliable Intel solution on the market. Just do your diligence to make sure you get your money's worth. HP provides you the tools; they just don't make them as easy to use as they should.
Only use HP RAM. It's just not worth the trouble otherwise. You don't need vendors pointing fingers at each other when a DIMM has a fault light go on.
Do a datacenter walkthrough for fault lights regularly and use this to correct failures in your monitoring scripts. This is how I learned that syslog is barely useful and you must check hpasmcli/hpacucli regularly.
The HP ASM tools & SNMP OIDs are what we mainly use for general component monitoring....
Alternatively, you can use smartmontools to monitor the disk drives, and most of the sensors should show up in lm_sensors.
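As a vendor-neutral cross-check, `smartctl -H` prints an overall-health verdict per drive. This sketch parses a canned PASSED line rather than touching a live drive; note that drives behind a Smart Array controller may need smartctl's cciss device type (`-d cciss,N`) if your smartmontools build supports it.

```shell
#!/bin/sh
# Return success only if the smartctl health verdict is PASSED.
smart_ok() {
  grep -q 'test result: PASSED'
}

# Demo with a canned verdict line instead of a live drive:
echo "SMART overall-health self-assessment test result: PASSED" | smart_ok \
  && echo "sda: healthy"

# Live use (as root), against every SCSI-style disk:
#   for d in /dev/sd?; do
#     smartctl -H "$d" | smart_ok || echo "$d: FAILING"
#   done
```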