Despite researching this topic quite a bit online (to be fair, I'm not a full-time sysadmin), I'm unable to figure this out.
We have a bunch of VMware ESXi 5.5 servers, some of which are integrated into vSphere and some of which are not (for cost reasons).
All of them run the standard ESXi image, with the exception of one machine which is running the Dell VMware ESXi image.
What I would like to accomplish seems simple: Configure the system so that it can be queried via SNMP from a remote host, whether it's snmpwalk, Nagios, PRTG etc. I'd like to see information from temperature sensors, installed disks and their status, fan speed, PSU status etc.
I was under the impression that installing Dell's version of VMware ESXi would automagically enable the necessary modules (OpenManage most importantly), but that does not seem to be the case.
I have conflicting information on whether this is even possible at all: some documents say that you cannot query a Dell VMware ESXi server via SNMP and need to use a CIM client instead. Then there are the OMSA VIBs one can install, etc.
I imagine this being a fairly common requirement, yet the docs available pull one in all different directions.
Is what I am trying to do even possible without a complete vSphere environment?
Yes, you can monitor the standalone ESXi Host using any SNMP monitoring software but some items may only be visible using a monitoring tool that supports the CIM protocol.
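For the SNMP side, the agent built into ESXi 5.5 is disabled by default and has to be switched on from the host's shell. A minimal sketch, assuming SNMP v1/v2c with a read-only community; the helper name `enable_esxi_snmp` and the community `public` are placeholders, not anything from your environment:

```shell
# Run in the ESXi Shell or over SSH on the host itself.
# enable_esxi_snmp is a hypothetical helper name; the esxcli calls are the point.
enable_esxi_snmp() {
    community="$1"    # read-only community string, e.g. "public" (placeholder)

    # Configure the community, then enable the agent.
    esxcli system snmp set --communities "$community" || return 1
    esxcli system snmp set --enable true || return 1

    # Open the host firewall for SNMP queries.
    esxcli network firewall ruleset set --ruleset-id snmp --allowed-all true || return 1
    esxcli network firewall ruleset set --ruleset-id snmp --enabled true || return 1

    # Print the resulting agent configuration as a sanity check.
    esxcli system snmp get
}

# On the host:            enable_esxi_snmp public
# From a monitoring box:  snmpwalk -v 2c -c public <esxi-host-ip>
```

If the walk returns data but not the sensors you want, that is usually the missing-CIM-provider situation rather than an SNMP problem.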
All of my ESXi Hosts are part of vCenter, but we monitor them directly (using the VMkernel Host IP address) with SolarWinds NPM. There are 5 or 6 CIM modules built into ESXi 5.5 that give you hardware health, but RAID card health is not one of them. You will need to add the Dell OMSA VIB, which adds the additional CIM agents, including the one for the RAID array. Brian Atkinson's post is still the best description of the process I have found:
https://communities.vmware.com/people/vmroyale/blog/2012/07/26/how-to-use-dell-dset-with-esxi
You only need to follow the instructions for installing the OMSA ESXi VIB if you are going to use a third-party monitoring tool that gives historical information and does alerting. If you wish to use the Dell OMSA Server, you can install it remotely on a bare-bones server, remotely in a VM, or locally as a VM.
You can use the OMSA Server to connect to DRAC and iDRAC out-of-band (OOB/IPMI/iLO) management cards, or to the ESXi Host after you install the OMSA VIB on it. You will not see the RAID health information in the DRAC or iDRAC, though, only when connecting the OMSA Server to an ESXi Host. I repeat the word "Server" so that the OMSA Server, which acts as a client, is not confused with the OMSA VIB installed on the ESXi Host.
Some useful resources:
Show the current CIM providers on an ESXi Host https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2053715
Show the currently installed VIBs on the ESXi Host from the Host's CLI:
esxcli software vib list
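To check whether the Dell provider is already present, the list can be filtered, and the offline bundle installed if it is missing. A sketch only: the helper names are made up for illustration, the match on "OpenManage"/"Dell" is an assumption about how the OMSA VIB is labelled, and the bundle path in the usage comment is hypothetical.

```shell
# Filter `esxcli software vib list` output (read from stdin) down to Dell rows.
# Assumption: the OMSA VIB shows "OpenManage" in its name or "Dell" as vendor.
find_dell_vibs() {
    grep -i -E 'openmanage|dell'
}

# Install a VIB offline bundle. esxcli expects an absolute path to the .zip.
install_vib_bundle() {
    esxcli software vib install -d "$1"
}

# On the host:
#   esxcli software vib list | find_dell_vibs
#   install_vib_bundle /vmfs/volumes/datastore1/<omsa-offline-bundle>.zip   # hypothetical path
# Reboot the host afterwards so the new CIM provider is loaded.
```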
You do see some minor additional hardware health details when you connect to a vCenter server rather than to the ESXi Host directly. Generally, though, if you do not see the hardware health you are looking for in the Configuration / Health Status panel, you are missing a CIM provider and need to locate and install its VIB on the ESXi Host. When you add the Dell OMSA VIB to the ESXi Host, a Storage sensor is added to the Health Status page, showing RAID volume, drive, controller and battery health for your storage controller. You may need to reset the sensors for it to show up, and the first time after the VIB install and reboot of the ESXi Host it can take 15 to 20 minutes.
If you do not see a sensor on the ESXi Host's Health Status page when you connect with the vSphere Client then you are most likely not going to see it when you are remotely polling the sensors with monitoring software.
Also note that not all servers have the same sensors, and you may not be able to get the same health status from all of them; it depends on the server hardware, the RAID card and the version of the CIM provider available for that combination. You may also need to upgrade or change the VIBs for the RAID card in order for the health status to work. The CIM provider (the OMSA VIB in this case) talks to the hardware through the device VIB (the real device driver) and passes this information to the CIM broker on the ESXi Host, also known as the Small Footprint CIM Broker daemon (sfcbd). When robust monitoring software polls the ESXi Host for hardware health, it gets some information using SNMP queries, some using CIM and some using the ESXi API (SOAP requests). The CIM client talks to the sfcbd process on the ESXi Host.
Sometimes the CIM process just stops working. When that happens, restart the sfcbd-watchdog process on the ESXi Host; this restarts the sfcbd service, and CIM polling will work again. From the CLI of the Host:
/etc/init.d/sfcbd-watchdog restart
I think that covers most of the items you need to get you running.
I understand what you're looking for: specific notes on how to manage and monitor the health of a standalone VMware ESXi host.
In practice, the approach should be slightly different. I'll explain how I manage hosts.
In a situation where you have multiple ESXi hosts under vCenter management, the assumption is that you leverage the vCenter for monitoring and health status, versus querying the individual hosts. There's a catch-all alarm that I configure in vCenter to alert on "Host Hardware Health". I typically don't care if it's a power supply, RAM, disk or any other component, but rather that the host is unhealthy.
Monitoring a standalone ESXi host isn't going to be very helpful, as the point of the Dell/HP drivers is to expose information to vCenter. And I don't believe it's the best practice to query individual hosts in this manner. Granted, that's because you ideally want your VM hosts centrally managed.
If you run vCenter with a single host, you DO get this ability, so maybe that's an option for your environment.
If you need some sort of out-of-band monitoring, couldn't you query the DRAC instead?
You can use the excellent https://exchange.nagios.org/directory/Plugins/Operating-Systems/*-Virtual-Environments/VMWare/check_vmware_api/details (with or without Nagios). It leverages the VMware API to get all the info you require for hardware monitoring.
You need the VMware Perl SDK, but other than that it's pretty straightforward. It works for all types of hardware (as long as the sensors are seen by the VMware API, they are checked).
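As a rough illustration of how the plugin can be called for hardware health (a sketch: the wrapper name, the `CHECK_VMWARE_API` variable, the address and the credentials are placeholders, and the `-l runtime -s health` selector is my understanding of the plugin's options, so confirm it against the plugin's `--help` output):

```shell
# Path to the plugin; adjust to wherever it is installed (placeholder default).
CHECK_VMWARE_API="${CHECK_VMWARE_API:-./check_vmware_api.pl}"

# esx_hw_health HOST USER PASSWORD
# Asks a single ESXi host for its overall hardware health and passes the
# plugin's exit status through unchanged
# (Nagios convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
esx_hw_health() {
    "$CHECK_VMWARE_API" -H "$1" -u "$2" -p "$3" -l runtime -s health
}

# Example: esx_hw_health 192.0.2.10 monitor 'secret'
```

Because the exit status follows the Nagios convention, the same call can be dropped into a cron job or any other scheduler that reacts to a non-zero return code.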
Try Zabbix (http://zabbix.com):
1) It's well-known, world-class monitoring software.
2) You can easily get started with the Zabbix appliance, which is also available as a pre-configured virtual image (based on openSUSE).
3) It can monitor ESX(i) hosts and virtual machines using the VMware web services (as the web client does). You can use low-level discovery rules to automatically discover VMware hypervisors and virtual machines and create hosts to monitor them, based on pre-defined host prototypes.
4) You'll be able to monitor all the hardware of your Dell servers using SNMP via iDRAC, including the RAID controller and the status of its volumes, physical disks, memory modules, PSUs and so on.
All of the hardware status information available in iDRAC can be accessed via SNMP (at least on servers with iDRAC 7/8; I've implemented hardware monitoring of 50+ Dell 12th/13th-generation servers for my company this way).
With Zabbix's LLD (low-level discovery) feature you can easily collect all hardware components for monitoring without manual enumeration and automatically create items (statuses, temperatures, fan speeds, disk sizes and serials, and so on), triggers (expressions to process monitoring data) and various actions.
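Outside Zabbix, the same iDRAC data can be spot-checked from any shell with net-snmp. A minimal sketch, assuming SNMP v2c is enabled on the iDRAC; the OID is what I understand to be the global system status in Dell's iDRAC MIB, and the numeric mapping follows Dell's usual status enumeration, so verify both against the MIB file for your firmware. Function names, address and community are placeholders.

```shell
# Assumed OID: globalSystemStatus in Dell's iDRAC MIB (verify for your firmware).
IDRAC_GLOBAL_STATUS_OID=1.3.6.1.4.1.674.10892.5.2.1.0

# Translate Dell's numeric status enumeration into a readable name.
dell_status_name() {
    case "$1" in
        1) echo other ;;
        2) echo unknown ;;
        3) echo ok ;;
        4) echo nonCritical ;;
        5) echo critical ;;
        6) echo nonRecoverable ;;
        *) echo "unrecognized($1)" ;;
    esac
}

# idrac_global_status IDRAC_ADDRESS COMMUNITY
idrac_global_status() {
    # -Oqve: print only the value, with enumerations shown as numbers.
    value="$(snmpget -v 2c -c "$2" -Oqve "$1" "$IDRAC_GLOBAL_STATUS_OID")" || return 1
    dell_status_name "$value"
}

# Example: idrac_global_status 192.0.2.20 public
```

Walking the whole Dell subtree (`snmpwalk -v 2c -c public <idrac-ip> 1.3.6.1.4.1.674`) shows what else the card exposes before you build discovery rules around it.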