Last night we received a notification from vCenter saying that it couldn't connect to the agent on one of our hosts, and that the "Host and Power status" was in error and the server was disconnected.
There were no issues with any guests running on the host, so we left it for the morning.
But when we checked the Tasks & Events and alert logs this morning, we found nothing. The host logs show no issues at that time either.
No reference to anything having gone wrong, nothing to tie that notification back to.
Even if the issue was temporary and fixed itself, shouldn't there be something in the logs indicating that some kind of trouble occurred?
Also, if it in fact automatically recovered, why didn't vCenter send its usual "Oh hai, everythings fine nao" notification when the system recovered?
Regarding the alerting for when an alarm clears, you need to alter the Alarm definition so that the notification is triggered on a change of state 'to green' rather than the default, which triggers when the state changes to anything 'from green'. To do this:
Here's the column you need to configure:
Regarding the disconnections, are you running ESXi or ESX? The logging on ESXi rolls over very quickly (especially messages), so you may not be able to go back far enough to see the disconnection information. If this is the case, you can rectify it by configuring the host to log to an external syslog server.
We've seen host disconnection issues for strange reasons recently. The most notable was a Check Point appliance between the host and vCenter that was interfering with packet order (via its 'Intelligent' IDS), causing hosts to regularly drop to an unmanageable state until we restarted the management services.
Are there any WAN links or firewalls between the hosts and vCenter?
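On the external syslog suggestion: if you don't already have a syslog server to point the host at, something like the following could serve as a stopgap receiver while you troubleshoot. It's a minimal sketch using only the Python standard library, not VMware tooling. The listen port (5514, chosen so it doesn't need root; standard syslog is UDP 514), the log path, and the function names are all assumptions for illustration. The host-side setting that points ESXi at the receiver depends on your ESXi version and isn't shown here.

```python
import re
import socketserver
from datetime import datetime, timezone

# Hypothetical settings for illustration; adjust for your environment.
LISTEN_HOST = "0.0.0.0"
LISTEN_PORT = 5514            # assumption: unprivileged port instead of 514
LOG_PATH = "esxi-remote.log"  # assumption: append-only file in the working dir

PRI_RE = re.compile(r"^<(\d{1,3})>")


def decode_pri(message):
    """Split a syslog line into (facility, severity, rest).

    The leading <N> field encodes facility * 8 + severity.
    Returns (None, None, message) if no valid PRI field is present.
    """
    match = PRI_RE.match(message)
    if not match:
        return None, None, message
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        return None, None, message
    return pri >> 3, pri & 7, message[match.end():]


def format_record(source_ip, message, received_at):
    """Build one log line stamped with the receive time and sender address."""
    facility, severity, rest = decode_pri(message)
    return "{} {} facility={} severity={} {}".format(
        received_at.isoformat(), source_ip, facility, severity, rest.strip()
    )


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        text = data.decode("utf-8", errors="replace")
        line = format_record(
            self.client_address[0], text, datetime.now(timezone.utc)
        )
        with open(LOG_PATH, "a", encoding="utf-8") as handle:
            handle.write(line + "\n")


if __name__ == "__main__":
    with socketserver.UDPServer((LISTEN_HOST, LISTEN_PORT), SyslogHandler) as server:
        server.serve_forever()
```

Because each line is stamped with the receive time and kept off the host, you can line it up against the time of the vCenter notification even after the host's own logs have rolled over. For anything longer term, a proper syslog daemon is the better choice.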