Currently we are using vFabric Hyperic 4.5.2.2 to monitor a number of systems.
The alerts and such were set up prior to my joining this team, but I have been looking at ways to improve them, namely to minimize the impact of the monitoring on the production servers without compromising coverage.
I've noticed that periodically Hyperic will just hammer the servers, sometimes maxing the CPU for 30 seconds to a minute.
While I know that reducing the number of monitors/alerts will help, I may not be able to do that until some other system architecture and layout changes are made.
In the meantime, is there a way to schedule the page requests that Hyperic makes, or to force them to be staggered? I've found how to change the collection interval, but this doesn't really address the core problem.
In addition, I am not sure if it is just the HTTP monitors that are causing the problems, though I am pretty sure they are contributing.
I was able to locate the server.log, but it either lacks info (perhaps due to the logging level?) or I don't know what I am looking for.
The overarching question I have is: how can I determine what Hyperic is doing that causes the monitored servers to sometimes all but lock up? This will, of course, likely lead to other questions, but I can address those as they arise.
I have looked at the answers to this question, but our Hyperic isn't set to scan the logs.
Thank you.
I had a similar issue with Hyperic 4.1.1. After it had run fine for 2+ years, we started seeing high CPU usage.
We isolated the issue to the agent. In our case, we were running the agent with its embedded JRE.
We installed the v6 build 35 JRE and set the HQ_JAVA_HOME environment variable to point to it.
(Note: Do not set this variable to the Java bin directory. Set it to the base jre6 directory instead; on Windows this is typically c:\program files\java\jre6 )
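If you want to double-check the value before restarting the agent, something like the following might help. It is only a rough sketch and not part of Hyperic: the function name check_java_home is made up for illustration, and it assumes a conventional JRE layout with the java executable under a bin subdirectory.

```python
import os


def check_java_home(path):
    """Return a list of problems with a candidate HQ_JAVA_HOME value.

    Hypothetical helper for illustration; an empty list means the
    value looks like a base JRE directory.
    """
    if not path:
        return ["HQ_JAVA_HOME is not set"]

    problems = []
    norm = os.path.normpath(path)

    # The variable should name the base JRE directory, not its bin directory.
    if os.path.basename(norm).lower() == "bin":
        problems.append("points at the bin directory; use its parent instead")

    # Assumption: a usable JRE has bin/java (bin\java.exe on Windows).
    exe = "java.exe" if os.name == "nt" else "java"
    if not os.path.isfile(os.path.join(norm, "bin", exe)):
        problems.append("no bin/%s found under %s" % (exe, norm))

    return problems


# Report on whatever the current environment holds.
for problem in check_java_home(os.environ.get("HQ_JAVA_HOME", "")):
    print("HQ_JAVA_HOME:", problem)
```

Run it in the same environment the agent starts from; if nothing is printed, the variable at least points at a directory that contains bin/java.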
We restarted the agent, and there is peace on earth!
http://pubs.vmware.com/vfabric5/index.jsp?topic=/com.vmware.vfabric.hyperic.4.6/Configure_JREs_for_Hyperic_Components.html