I have Zabbix Server working on Ubuntu, and an Agent installed on my Windows server. Template_Windows works wonders and gives me all sorts of stats that I'm happy to play with...
The problem is that I'm trying to monitor an application. Not a service or a website. An application.
I have one application that likes to crash unexpectedly. I'd like to be able to get a "Program is not responding" alert (and then funnel that into email/sms/reports).
I've got another server with an application that I'd like to monitor stats (CPU usage, etc).
I see how to monitor services... but these aren't services. They are applications that run while a "User" is logged in. I can't quite find a good tutorial on how to set up something like this.
Edit: Doing further research and tinkering... The question is becoming: Regardless of the method, how do I detect that an application is frozen/hung/not-responding?
- Use Proc_Counter and detect if there is zero activity for say... 15 seconds?
- Use a perf_info metric? I don't seem to see anything in it that would indicate a hung process, but the only man page I can find is 1.4 and current Zabbix is 1.8.4.
- A VBScript, command-line test, etc. that monitors/tests the application and produces output that can be tracked via UserParameters?
I can't seem to get something working. Once I can verify a hung process I can respond with task-kill/restart, email responsible party, etc... but I just can't seem to find a graceful way to detect a hung process/application.
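One way the third option could look, as a rough sketch rather than a tested solution: Windows' own tasklist command accepts a "STATUS eq NOT RESPONDING" filter, so a small script can ask for processes in that state and print 1 or 0 for the agent to pick up. The image name myapp.exe, the script path, and the item key app.hung below are all made-up placeholders. It is also an assumption that the agent (which normally runs as a service) can see the window state of a program running in a logged-in user's session, so that needs checking on the real box.

```python
import csv
import io
import subprocess
import sys


def hung_processes(tasklist_csv, image_name):
    """Return the PIDs of rows in tasklist CSV output whose image name matches."""
    pids = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Real rows have several columns (image, PID, session, ...).
        # The "INFO: No tasks are running..." message parses as one column.
        if len(row) >= 2 and row[0].lower() == image_name.lower():
            pids.append(int(row[1]))
    return pids


def query_tasklist(image_name):
    """Ask Windows which instances of image_name it flags as Not Responding."""
    cmd = [
        "tasklist",
        "/FI", "STATUS eq NOT RESPONDING",
        "/FI", "IMAGENAME eq " + image_name,
        "/FO", "CSV",  # machine-readable output
        "/NH",         # no header row
    ]
    return subprocess.run(cmd, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # "myapp.exe" is a placeholder; pass the real image name as an argument.
    name = sys.argv[1] if len(sys.argv) > 1 else "myapp.exe"
    print(1 if hung_processes(query_tasklist(name), name) else 0)
```

It could then be wired into zabbix_agentd.conf with something like UserParameter=app.hung[*],python C:\zabbix\check_hung.py $1 (path and key are hypothetical), giving an item that reads 1 while the program is hung and 0 otherwise.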
It took me forever to get simple-checks working. I haven't tried applications yet.
Does the second post here help at all? http://www.zabbix.com/forum/showthread.php?t=18206
You can go a number of directions.
Probably the easiest would be to build a user parameter which runs a script on the client system to check the health of your application. If that script takes more than 30 seconds to run, however, you will most likely be better served by a script which pushes the health data to Zabbix using zabbix_sender. On the server side you could do a number of things; probably one of the easiest would be a trigger that combines nodata() with last()=errorvalue. The nodata check works best if you have cron (or Task Scheduler on Windows) sending the data, since the timing of the check is then not controlled by Zabbix.
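A minimal sketch of the push approach, assuming zabbix_sender is on the PATH of the monitored machine and that the item is defined on the server as type "Zabbix trapper". The server name, host name, and key (zabbix.example.com, WINSRV01, app.hung) are placeholders, and the health value is hard-coded where a real check would go.

```python
import subprocess


def sender_command(server, host, key, value, binary="zabbix_sender"):
    """Build the zabbix_sender command line for a single value."""
    return [
        binary,
        "-z", server,      # Zabbix server to send to
        "-s", host,        # host name exactly as configured in the frontend
        "-k", key,         # key of the trapper item
        "-o", str(value),  # value to store
    ]


def push(server, host, key, value):
    """Send one value; True if zabbix_sender exited with status 0."""
    return subprocess.call(sender_command(server, host, key, value)) == 0


if __name__ == "__main__":
    # Placeholder result: 0 = healthy, 1 = hung. Replace with a real check.
    health = 0
    if not push("zabbix.example.com", "WINSRV01", "app.hung", health):
        print("zabbix_sender failed")
```

Run from cron or a scheduled task every minute or so, this would pair with a trigger along the lines of {WINSRV01:app.hung.nodata(300)}=1 | {WINSRV01:app.hung.last(0)}=1 in 1.8-style syntax, so that either silence for five minutes or an explicit error value raises the alert. The 300 seconds is just an example.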
There have been several discussions about this in the Zabbix forum.
http://www.zabbix.com/forum