Question

JustAGuy

Asked: 2013-10-27 05:07:01 +0800 CST

Nagios - measuring Average CPU Load

I've been looking for some hours now for a plugin that will notify me if the CPU load on one of my servers has been over 90% for the past 5 hours. No luck looking around the Nagios Exchange.

Can anyone help out?

Thanks!

4 Answers

dmourati · Answer 1 · 2013-10-27T10:41:48+08:00

CPU load under UNIX is typically defined as the number of processes in a runnable state, reported as averages over the last 1, 5, and 15 minutes. The uptime command is a common way to display the load average values.

~$ uptime
 18:35:22 up 1 min, 1 user, load average: 0.04, 0.01, 0.01

check_load takes a tuple of three elements, matching the 1, 5, and 15 minute averages and accepts both a warning and critical threshold.
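To make the triple-threshold idea concrete, here is a minimal Python sketch of that kind of comparison. It is not the check_load source; the function names are made up for illustration, the 0.7 warning values are arbitrary, and it assumes that any single average exceeding its threshold is enough to change the status.

```python
import os

OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin exit codes


def parse_triple(text):
    """Turn a string such as '0.9,0.9,0.9' into a tuple of three floats."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) != 3:
        raise ValueError("expected three comma-separated values")
    return tuple(parts)


def evaluate_load(load, warn, crit):
    """Compare (1, 5, 15 minute) averages against warning and critical triples.

    Assumption: one average above its threshold is enough to raise the status.
    """
    if any(value > limit for value, limit in zip(load, crit)):
        return CRITICAL
    if any(value > limit for value, limit in zip(load, warn)):
        return WARNING
    return OK


if __name__ == "__main__":
    load = os.getloadavg()  # same three numbers that uptime prints (Unix only)
    # The 0.7 warning triple is a hypothetical value chosen for this example.
    status = evaluate_load(load, parse_triple("0.7,0.7,0.7"), parse_triple("0.9,0.9,0.9"))
    print("load average: %.2f, %.2f, %.2f -> status %d" % (*load, status))
```

With the uptime sample above, (0.04, 0.01, 0.01) stays well under both triples and comes back as OK.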

As a rough idea, try check_load -c 0.9,0.9,0.9 with a check_interval of 1 hour and a max_check_attempts of 5.
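The reasoning behind that suggestion is that Nagios only sends a notification once a check has failed max_check_attempts times in a row. The Python sketch below simulates just that counting rule; it is not Nagios code, and the function name is hypothetical. It also assumes the rechecks are spaced an hour apart, which as far as I know means retry_interval would need to be set to an hour as well, since rechecks after a first failure are scheduled with retry_interval rather than check_interval.

```python
def first_hard_alert(results, max_check_attempts=5):
    """Return the index of the check that would trigger a notification.

    results is a sequence of booleans, True meaning the check came back non-OK.
    Returns None if the run of consecutive failures never gets long enough.
    """
    consecutive = 0
    for index, failed in enumerate(results):
        # Any OK result resets the counter, like a recovery from a soft state.
        consecutive = consecutive + 1 if failed else 0
        if consecutive == max_check_attempts:
            return index
    return None
```

For hourly checks, `first_hard_alert([True] * 5)` returns 4, meaning the alert fires on the fifth failing check, roughly four hours after the first one. A single OK result in between starts the count over.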

Also note the -r (--percpu) argument, which divides the load averages by the number of CPUs. This addresses the fact that most machines are multi-core: a single core can be fully utilized while the system as a whole still has excess capacity.

the-wabbit · Answer 2 · 2013-10-27T13:05:43+08:00

The basic check_load Nagios check only evaluates /proc/loadavg, which holds nothing but the 1, 5, and 15 minute averages. If you need a longer window, you need a recorded history reaching that far back. Incidentally, the sysstat package does just that: it records performance values at given intervals and makes them available via the sar command line utility. The ~~check_sa Nagios plugin~~ is capable of evaluating the output and averaging the values to match your needs.
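As a rough illustration of averaging over a recorded history, here is a small Python sketch. It is not how check_sa works internally (I have not checked); the function names are hypothetical, and it assumes you already have timestamped load samples, for example collected from sar.

```python
from datetime import datetime, timedelta


def window_average(samples, window, now):
    """Average the values of all samples taken within `window` before `now`.

    samples is an iterable of (datetime, float) pairs.
    Returns None when no sample falls inside the window.
    """
    recent = [value for stamp, value in samples if now - window <= stamp <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


def sustained_above(samples, threshold, window, now):
    """True only if every sample inside the window exceeds the threshold."""
    recent = [value for stamp, value in samples if now - window <= stamp <= now]
    return bool(recent) and all(value > threshold for value in recent)
```

Calling either function with `window=timedelta(hours=5)` covers the "past 5 hours" from the question. Whether you want the average or the stricter every-sample test depends on how you read "over 90% for the past 5 hours".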

I should add that Nagios is a rather poor choice when it comes to actually defining alarm thresholds based on performance values averaging over a certain period of time as this needs extensive state-keeping which Nagios does not support. Other monitoring systems collecting performance data are doing a better job here. I would suggest looking at OpenNMS or at least something like Munin if you can't manage the complexity and handle the technical requirements (SNMP) of the former. Both have the advantage of being able to draw fancy RRD graphs helping you to detect trends before you get them formalized in evaluation rules.

Nils · Answer 3 · 2013-11-09T14:17:35+08:00

Astonishing - isn't it?

We had to write a monitor ourselves for this, too.

The standard check_load is pretty meaningless on its own, since the load has to be put in relation to the number of (logical) processors in the system.

So roughly what we do:
- look up how many processors are reported in the system
- divide the current load by that number

There you will get that 90% mark you are after.

We use 100% for warning and 150% for critical.
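The two steps above can be sketched in Python roughly as follows. This is not our actual monitor, just an illustration; the function names are made up, and it assumes "current load" means the 1-minute average.

```python
import os

OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin exit codes


def load_percent(load, cpus):
    """Express a load average as a percentage of the available processors."""
    return 100.0 * load / cpus


def status_for(percent, warn=100.0, crit=150.0):
    """Map a percentage to a Nagios status using the thresholds quoted above."""
    if percent >= crit:
        return CRITICAL
    if percent >= warn:
        return WARNING
    return OK


def main():
    cpus = os.cpu_count() or 1       # logical processors; fall back to 1 if unknown
    one_minute = os.getloadavg()[0]  # assumption: use the 1-minute average
    percent = load_percent(one_minute, cpus)
    status = status_for(percent)
    label = ("OK", "WARNING", "CRITICAL")[status]
    print("LOAD %s - %.0f%% of %d CPUs" % (label, percent, cpus))
    return status
```

To use it as a plugin, the script would end with `sys.exit(main())` so that Nagios sees the status as the exit code. A load of 3.6 on a 4-processor box comes out at the 90% mark from the question.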

nandoP · Answer 4 · 2013-10-27T19:14:55+08:00

Install sysstat, then add a crontab entry along these lines:

sar -q 10000000 | mail somewhere@youwant.report.com

Basically, sar gives you status details at 10-minute intervals by default.

so for load avg...

[root@ops2 ~]# sar -q|tail -5
05:00:01 PM 0 527 0.00 0.01 0.00
05:10:01 PM 1 528 0.00 0.00 0.00
05:20:01 PM 6 537 0.00 0.00 0.00
05:30:01 PM 2 532 0.00 0.01 0.00
Average: 2 529 0.03 0.05 0.04
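If you would rather feed those numbers into a script than mail them, something like this Python sketch could pull the load averages out of the sar -q text. It assumes the column layout shown above, where the last three fields of each row are the 1, 5, and 15 minute averages; other sysstat versions may print extra columns, so check your own output first. The function names are hypothetical.

```python
def parse_sar_q(text):
    """Extract (ldavg-1, ldavg-5, ldavg-15) tuples from sar -q output.

    Assumption: the last three fields of each data row are the load averages.
    Header lines, blank lines, and the trailing Average: row are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "Average:":
            continue
        try:
            rows.append(tuple(float(field) for field in fields[-3:]))
        except ValueError:
            continue  # non-numeric fields, most likely a header line
    return rows


def average_column(rows, index):
    """Mean of one load-average column (0, 1, or 2), or None without rows."""
    if not rows:
        return None
    return sum(row[index] for row in rows) / len(rows)
```

Running `parse_sar_q` on the five lines above gives four data rows, and `average_column(rows, 1)` averages their 5-minute column.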

This can report on a number of things, although email-based server reporting is giving way to the likes of AppDynamics and New Relic, which dig much deeper (but cost money).

IMHO, Nagios is still the best for the money... and hell, you can even integrate it with ircd.

Nagios is definitely the way I would go. It is easy to use the prebuilt plugins or write your own NRPE plugins, and it works great with HipChat, IRC, PagerDuty, or custom alerting systems.
