We use Nagios for monitoring. Is there a way to create hardware checks using the SNMP MIB for R820 servers running ESXi 5.x? Right now we are using this Python plugin:
However, we can no longer use it because of security policies within the organization. We are satisfied with the output of the current plugin, so it would be great if we could use a similar agentless check over SNMP. Thanks.
Maybe I'm weird, but I prefer to monitor my ESXi hosts in a vSphere cluster through the vCenter SNMP interface (coupled with email for certain events). That covers most of what I need. So it's alerting about events versus polling the hardware through something like Nagios.
Can you clarify which specific items you're most interested in monitoring at the host level?
I think vSphere's traps and email alerts can be as granular as you wish...
Nope. VMware has chosen to go the CIM route instead of SNMP, so you can't do exactly what you asked about. The only SNMP support they have implemented is trap-sending, which was very buggy last time I tried it (admittedly a few years ago).
Two good options have already been discussed here (check_esxi_hardware.py, OP5's check-esx-plugin).
As you're probably aware, Nagios Exchange is littered with other people's attempts to solve this, but most of them are outdated and will not work with modern VMware products.
Regarding the problem of having root access, the python plugin used to work without root access past the root level of the CIM tree (e.g., not inherited down to the VMs themselves), but that seems to no longer be the case as of 5.1. You could probably create a special role for Nagios to use (that isn't the administrator role), though.
Judging by the comments you made above (about wanting more-detailed hardware status monitoring), you might be better served by some IPMI checking through the service processor (BMC, LOM, iLO, whatever you want to call it) in that case.
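To illustrate what an IPMI-based check could look like, here is a minimal Python sketch that turns sensor listings into a Nagios result. It assumes the text comes from something like `ipmitool -I lanplus -H <bmc-host> -U <user> -P <password> sdr`, which prints lines of the form `name | reading | status`. The function name `evaluate_sdr` and the mapping from status codes to Nagios states are my own assumptions, not part of any existing plugin.

```python
# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Status column values as printed by `ipmitool sdr`.
# The mapping to Nagios states is an assumption; adjust to taste.
SEVERITY = {"ok": OK, "ns": OK, "nc": WARNING, "cr": CRITICAL, "nr": CRITICAL}

# Order in which states override each other (higher wins).
PRIORITY = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}


def evaluate_sdr(sdr_output):
    """Return (exit_code, message) for text in `name | reading | status` form."""
    worst = OK
    problems = []
    seen = 0
    for line in sdr_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, reading, status = parts
        seen += 1
        # Unrecognized status tokens are reported as UNKNOWN.
        state = SEVERITY.get(status.lower(), UNKNOWN)
        if state != OK:
            problems.append(f"{name}={reading} ({status})")
        if PRIORITY[state] > PRIORITY[worst]:
            worst = state
    if seen == 0:
        return UNKNOWN, "UNKNOWN - no sensor data found"
    if not problems:
        return OK, f"OK - {seen} sensors healthy"
    return worst, f"{LABELS[worst]} - " + ", ".join(problems)


if __name__ == "__main__":
    sample = "Temp | 30 degrees C | ok\nPS2 Status | 0x00 | cr\n"
    code, message = evaluate_sdr(sample)
    print(code, message)
```

A real check would run the command, pass its output to the function, print the message, and exit with the returned code.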
If you're specifically dealing with Dell hardware, you can add the Dell-specific offline bundle (VIB) to enable OpenManage support in ESXi.
In the future, you might be able to use the excellent check_openmanage plugin for this, but it's not currently possible.
We use the check_esx plugin from op5 (http://www.op5.org/community/plugin-inventory/op5-projects/check-esx-plugin) for exactly this purpose. You need to install the VMware Perl SDK.
We use it like this:
The check_esx plugin can monitor a lot of stuff, great work from the op5 guys.
The problem with running check_esxi_hardware as a read-only or other non-administrator user (not root) is caused by PAM behavior in ESXi 5.1 and later, which is either a feature or a bug depending on your point of view.
Any user that is created and assigned to any role other than the Administrator role is denied ALL in /etc/security/access.conf. Even if you clone the Administrator role and assign the new user to the cloned role, that user is still denied ALL in /etc/security/access.conf.
I have created a user "nagios" on an ESXi 5.5 host locally (not through vCenter) and assigned it to the "Read Only Role" under the permissions tab. By default its permissions in access.conf are "-:nagios:ALL"
If I ssh to the ESXi host and edit /etc/security/access.conf and change the nagios user permissions to "+:nagios:sfcb" or "+:nagios:ALL" then check_esxi_hardware works.
Using "+:nagios:sfcb" restricts the user "nagios" so it can only access the CIM Service.
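As a rough sketch of that edit, the Python function below rewrites the text of access.conf so the user's deny rule becomes an allow rule for sfcb. The function name `allow_cim_access` and its defaults are hypothetical, and it only transforms text: reading the file from the host and writing it back is left out.

```python
def allow_cim_access(conf_text, user="nagios", service="sfcb"):
    """Return conf_text with the user's rule replaced by '+:user:service'."""
    allow_rule = f"+:{user}:{service}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        # Each rule has the form permission:user:origin, e.g. "-:nagios:ALL".
        fields = line.strip().split(":")
        if len(fields) == 3 and fields[1].strip() == user:
            if not replaced:
                out.append(allow_rule)
                replaced = True
            continue  # drop any duplicate rules for the same user
        out.append(line)
    if not replaced:
        # pam_access applies the first matching rule, so a new rule goes
        # at the top, ahead of any catch-all deny such as "-:ALL:ALL".
        out.insert(0, allow_rule)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "+:root:ALL\n-:nagios:ALL\n-:ALL:ALL\n"
    print(allow_cim_access(sample), end="")
```

Passing `service="ALL"` would give the broader "+:nagios:ALL" variant instead.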
The problem you then run into is that changes to /etc/security/access.conf do not persist across reboots.
This is a thread in the VMware communities discussing this problem: https://communities.vmware.com/thread/464552?start=15&tstart=0
This is a very good article discussing the same problem using wbem: https://alpacapowered.wordpress.com/2013/09/27/configuring-and-securing-local-esxi-users-for-hardware-monitoring-via-wbem/
These are two blogs discussing making changes persistent over reboots in ESXi:
www.therefinedgeek.com.au/index.php/2012/02/01/enabling-ssh-access-in-esxi-5-0-for-non-root-users/
www.virtuallyghetto.com/2011/08/how-to-persist-configuration-changes-in.html
I can't make the last two links hyperlinks as this is my first post to serverfault and until you have 10 reputation points you can only put two links in an answer (which is fair).
I haven't decided which solution I will use to make this change persistent across reboots. I am still testing.
Thanks