I'm looking to set up monitoring and alerting for a Java server-based app, and want to find best practices for monitoring JVM-specific metrics and for designing alerts based on those metrics.
So what are the key JVM metrics to monitor? Some possible contenders:
- Heap space used
- CPU usage
- GC frequency
- Time spent in GC
- Thread count
- Class count
- Object count
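For reference, most of the candidates above can be read in-process through the standard `java.lang.management` MXBeans. The following is only a minimal sketch: the class name `JvmMetricsSnapshot` and the metric key names are made up for illustration. CPU usage and object count are left out, as the standard MXBeans do not expose an object count and per-process CPU load needs a vendor-specific bean.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects a one-off snapshot of some JVM metrics.
final class JvmMetricsSnapshot {

    static Map<String, Long> collect() {
        Map<String, Long> metrics = new LinkedHashMap<>();

        // Heap space used (and the configured maximum, -1 if undefined).
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        metrics.put("heap.used.bytes", heap.getUsed());
        metrics.put("heap.max.bytes", heap.getMax());

        // Live thread count (daemon and non-daemon).
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        metrics.put("threads.live", (long) threads.getThreadCount());

        // Number of classes currently loaded.
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        metrics.put("classes.loaded", (long) classes.getLoadedClassCount());

        // GC frequency and time spent in GC, summed over all collectors.
        // Both getters return -1 when the value is undefined, hence the max().
        long gcCount = 0;
        long gcTimeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcCount += Math.max(0, gc.getCollectionCount());
            gcTimeMs += Math.max(0, gc.getCollectionTime());
        }
        metrics.put("gc.count", gcCount);
        metrics.put("gc.time.ms", gcTimeMs);

        return metrics;
    }

    public static void main(String[] args) {
        collect().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

The GC values are cumulative since JVM start, so a monitoring system would normally graph the difference between two samples rather than the raw numbers.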
And once you start watching some metrics, what are good alerting strategies for them? CPU usage seems like an easy one. Heap space seems good to monitor and graph, but it doesn't translate well into an alertable metric, since you expect it to grow toward capacity and then trigger a GC. Time spent in GC, on the other hand, especially as a ratio of overall time, seems to have good alerting potential.
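To make the "time spent in GC as a ratio of overall time" idea concrete, here is a rough sketch that samples the cumulative GC time twice and compares the difference to the elapsed wall-clock time. The class name `GcOverheadCheck`, the 10% threshold, and the one-second sampling interval are all assumptions chosen for illustration, not recommended values. Note also that `getCollectionTime()` is an approximate figure and, for concurrent collectors, may include work that did not pause the application.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: computes GC overhead over a sampling interval.
final class GcOverheadCheck {

    // Cumulative GC time in milliseconds, summed over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime()); // -1 means undefined
        }
        return total;
    }

    // Fraction of the interval spent in GC, clamped to [0, 1].
    static double gcOverhead(long gcMsStart, long gcMsEnd, long intervalMs) {
        if (intervalMs <= 0) {
            return 0.0;
        }
        double ratio = (double) (gcMsEnd - gcMsStart) / intervalMs;
        return Math.max(0.0, Math.min(1.0, ratio));
    }

    // Alert when the overhead exceeds the given threshold.
    static boolean shouldAlert(double overhead, double threshold) {
        return overhead > threshold;
    }

    public static void main(String[] args) throws InterruptedException {
        final double threshold = 0.10; // assumed value, tune per application
        final long intervalMs = 1_000; // assumed sampling interval

        long gcBefore = totalGcMillis();
        long clockBefore = System.nanoTime();
        Thread.sleep(intervalMs);
        long elapsedMs = (System.nanoTime() - clockBefore) / 1_000_000;

        double overhead = gcOverhead(gcBefore, totalGcMillis(), elapsedMs);
        System.out.printf("GC overhead: %.1f%%%n", overhead * 100);
        if (shouldAlert(overhead, threshold)) {
            System.out.println("ALERT: GC overhead above threshold");
        }
    }
}
```

In practice one would probably require the threshold to be exceeded for several consecutive intervals before alerting, so that a single long collection does not page anyone.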
I'm not looking for a tool per se (e.g. Hyperic or Nagios) to perform the monitoring, but if there is one that has an especially good Java template, default graph, or rule set, that would be a handy pointer.
I have used hprof before, which comes bundled with the JRE. It does heap and CPU profiling. I usually use it to monitor CPU usage and check which thread is taking the majority of the CPU. http://java.sun.com/developer/technicalArticles/Programming/HPROF.html
I have also used JProbe, which is commercial software. http://www.quest.com/jprobe/
Ruxit monitors JVM metrics and presents them in an infographic style. It provides insights into CPU, memory, traffic, retransmissions, connectivity, suspension, and the JVM. You can see screenshots here: Java Monitoring
Ruxit uses baselining to alert you only when necessary. I'm obviously a bit biased, as I work for Ruxit, but the infographic style of visualizing the metrics is really great.
There are several types of metrics: many Java applications expose in-application metrics through JMX, and then there are the JVM metrics you mentioned in the question.
For JMX you can use e.g. https://github.com/jmxtrans/jmxtrans and send the metrics to one of the various outputs available. For the standard metrics there are also tools like jstat(d), jinfo, jps, ... which are often helpful. In any case, I'd suggest taking a closer look at JMX monitoring. Applications often expose a lot of metrics through JMX, not only the JVM data.
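As a small illustration of what is available over JMX, the sketch below reads two of the standard JVM attributes from the platform MBean server by name. The class name `JmxAttributeReader` is made up for this example. It runs inside the JVM being inspected; a remote collector such as jmxtrans would query the same object names over a JMX connection instead.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical helper: reads JVM attributes by JMX object name.
final class JmxAttributeReader {

    // Used heap in bytes, read from the standard java.lang:type=Memory MBean.
    static long heapUsedBytes(MBeanServer server) {
        try {
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            CompositeData usage =
                    (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (Exception e) {
            throw new IllegalStateException("Could not read HeapMemoryUsage", e);
        }
    }

    // Live thread count, read from the standard java.lang:type=Threading MBean.
    static int threadCount(MBeanServer server) {
        try {
            ObjectName threading = new ObjectName("java.lang:type=Threading");
            return (Integer) server.getAttribute(threading, "ThreadCount");
        } catch (Exception e) {
            throw new IllegalStateException("Could not read ThreadCount", e);
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();

        // List every registered MBean, including application-specific ones.
        for (ObjectName name : server.queryNames(new ObjectName("*:*"), null)) {
            System.out.println(name);
        }

        System.out.println("heap used (bytes): " + heapUsedBytes(server));
        System.out.println("thread count: " + threadCount(server));
    }
}
```

Listing the registered names first is a quick way to see which application-level metrics a given server already publishes before configuring a collector.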
If you need extreme insight, go for Ruxit/Dynatrace. With that solution it is possible to track metrics across a complex infrastructure and down to individual Java methods. Cool stuff, but often beyond budget limits.