sar -u 1 | awk '{print $9}'
This gives me the "CPU Idle" value every second. I'd like to get an email if the value is "0" 10 times in a row.
What would be the appropriate way to do it?
I found a preliminary solution:
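sar -u 1 | awk '{
  # count consecutive samples where %idle (field 9) is 0
  if (int($9) == 0) {
    i = i + 1
    print i, $9
  }
  # any non-zero sample resets the counter
  if (int($9) > 0) {
    i = 0
  }
  if (i >= 10) print "sending email"
}'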
but on the last line, where I print "sending email", I can't put a call to mutt like this:
sar -u 1 | awk '{
  if (int($9) == 0) {
    i = i + 1
    print i, $9
  }
  if (int($9) > 0) {
    i = 0
  }
  if (i>=10) mutt -s "VPNC Problem" [email protected] < /home/semenov/strace.output
}'
The problem is that it reports a syntax error in the mutt command call. Any ideas?
The appropriate way to do it is to NOT do it.
CPU Utilization (either %used or %idle) is a bogus value to monitor - it can (and SHOULD) be 100% at various times during normal operation. Do you really want a bunch of alerts because you happened to get 5-10 web requests at the same time your monitoring system checked CPU utilization? I'm betting the answer is no.
Instead you should monitor Load Average (reported by uptime among other tools), which is a measure of the number of processes which want to run right now (the length of RunQ in OS scheduling terms). The value is usually reported as three values: 1-minute load average ("now"), 5-minute load average, and 15-minute load average.
Load averages below 1 indicate an "unloaded" system (lots of free CPU time, no programs waiting around to execute).
High load averages ("high" being relative to the number of CPUs you have and your system's interactive performance under load) are a cause for concern, and should be investigated.
I typically use 10 as my threshold for load average alarms -- a value high enough that you shouldn't typically see it in production, but low enough that you should have time to respond to the situation once the alarm trips.
The script to monitor in either case is trivial:
The getting-and-stuffing part is left as an exercise for the reader.
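As a rough illustration of the load-average check described above, here is a minimal sketch. It assumes a Linux host (it reads /proc/loadavg), a working mail command, and the threshold of 10 mentioned earlier. The names THRESHOLD, MAILTO, get_load, load_too_high and check_load are made up for this example.

```shell
#!/bin/sh
# Sketch: alert when the 1-minute load average reaches a threshold.
# THRESHOLD, MAILTO and the function names are illustrative assumptions.

THRESHOLD="${THRESHOLD:-10}"
MAILTO="${MAILTO:-root}"

# Print the 1-minute load average. /proc/loadavg is Linux-specific.
get_load() {
    cut -d ' ' -f 1 /proc/loadavg
}

# Succeed (exit 0) when load $1 is at or above threshold $2.
# awk does the comparison because sh has no floating-point arithmetic.
load_too_high() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 >= max + 0) }'
}

# Send one alert mail if the current load is too high.
check_load() {
    load=$(get_load)
    if load_too_high "$load" "$THRESHOLD"; then
        echo "1-minute load average is $load" |
            mail -s "High load on $(hostname)" "$MAILTO"
    fi
}
```

Calling check_load from cron every minute or so would be one way to use it. On systems without /proc/loadavg, get_load would have to parse the output of uptime instead.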
If you really want to Do It Right you should investigate some monitoring systems and SNMP...
Okay, the correct command is this:
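A minimal sketch of how such a command could work, assuming awk's system() function is used to hand the mutt call to the shell. Inside an awk program a bare mutt ... line is parsed as awk code, which is where the syntax error comes from. The function name count_zero_runs, its arguments, and the address admin@example.com are placeholders invented for this example.

```shell
#!/bin/sh
# Sketch: run a shell command from awk once a field has been 0 for LIMIT rows in a row.
# count_zero_runs and its arguments are illustrative assumptions.

# Usage: <producer> | count_zero_runs FIELD LIMIT 'shell command'
# e.g.   sar -u 1 | count_zero_runs 9 10 \
#            'mutt -s "VPNC Problem" admin@example.com < /home/semenov/strace.output'
count_zero_runs() {
    awk -v col="$1" -v limit="$2" -v cmd="$3" '
        # Skip headers and other rows whose field is not a number.
        $col !~ /^[0-9]+([.,][0-9]+)?$/ { next }
        {
            if (int($col) == 0) zeros++   # one more sample with the value at 0
            else zeros = 0                # streak broken
            if (zeros >= limit) {
                system(cmd)               # hand the command string to sh
                zeros = 0                 # restart the count to avoid one mail per second
            }
        }'
}
```

Resetting the counter after each alert is a design choice: without it, every further second at 0 would send another mail.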