We have a 4-core production system that runs a lot of cron jobs, with a constantly non-empty process queue and a usual load of ~1.5.
During the night we do some IO-intensive work with Postgres. We also generate a graph showing load/memory usage (rrd-updates.sh). This "fails" sometimes under high IO load: it happens nearly every night, but not in every high-IO situation.
My "normal" solution would be to nice and ionice the postgres stuff and increase the prio of the graph generation.
However this still fails.
The graph generation is semi-protected against concurrent runs with flock.
I log the execution times; during high IO load the graph generation takes up to 5 min, seemingly resulting in a missing graph for up to 4 min. The timeframe exactly matches the Postgres activity (this sometimes happens during the day as well, though not as often).
Ionicing up to realtime priority (C1 N6 for graph_cron vs C2 N3 for postgres) and nicing well above Postgres (-5 graph_cron vs 10 postgres) did not solve the issue. Assuming the data is simply not collected in time, the additional issue is that ionice/nice somehow still do not seem to work.
Even with 90% IOwait and a load of 100 I was still able to run the data-collection command (free) with no more than maybe a 5 sec delay (in testing at least).
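One thing worth verifying is whether the IO class/priority actually stuck on the running processes. A minimal check (the pgrep patterns are assumptions about the process names on this box):

# print the effective IO scheduling class and priority of each process
for pid in $(pgrep -f rrd-updates.sh); do ionice -p "$pid"; done
for pid in $(pgrep -u postgres); do ionice -p "$pid"; done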
Sadly, I have not been able to reproduce this exactly in testing (having only a virtualized dev system).
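If it helps, one way to approximate sustained IO load on the dev system might be a plain dd write (path and size are placeholders; oflag=direct bypasses the page cache):

# generate background direct-IO write load for testing
dd if=/dev/zero of=/tmp/ioload.bin bs=1M count=4096 oflag=direct &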
Versions:
Kernel 2.6.32-5-686-bigmem
Debian Squeeze
rrdtool 1.4.3
Hardware: SAS 15K RPM HDD with LVM in hardware RAID1
mount options: ext3 with rw,errors=remount-ro
Scheduler: CFQ
crontab:
* * * * * root flock -n /var/lock/rrd-updates.sh nice -n-1 ionice -c1 -n7 /opt/bin/rrd-updates.sh
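Since flock -n exits silently when it cannot get the lock, and cron only sends mail when a job produces output, a variant worth trying could be the same entry with error reporting bolted on (sketch):

* * * * * root flock -n /var/lock/rrd-updates.sh nice -n-1 ionice -c1 -n7 /opt/bin/rrd-updates.sh || echo "rrd-updates.sh skipped or failed, exit code $?"

Any nonzero exit would then produce output and thus a cron mail.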
There seems to be a possibly related bug reported by Mr. Oetiker on GitHub for rrdcached:
https://github.com/oetiker/rrdtool-1.x/issues/326
This actually could be my issue (concurrent writes), but it does not explain why the cron job does not appear to fail.
Assuming I actually had 2 concurrent writes, flock -n would return exit code 1 (per the man page, confirmed in testing).
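For reference, the exit code is easy to confirm (the lock file path is arbitrary):

flock -n /tmp/test.lock sleep 10 &     # first instance holds the lock for 10 s
flock -n /tmp/test.lock true; echo $?  # prints 1 while the lock is held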
As I do not get an email with any output either, and the cron job actually runs fine all the other times, I am somewhat lost.
Example output: (graph screenshot not reproduced here)
Based on a comment, I have added the relevant part of the update script:
rrdtool update /var/rrd/cpu.rrd $(vmstat 5 2 | tail -n 1 | awk '{print "N:"$14":"$13}')   # second vmstat sample is a 5 s average, so this line alone blocks for ~5 s
rrdtool update /var/rrd/mem.rrd $(free | grep Mem: | awk '{print "N:"$2":"$3":"$4}')   # total:used:free memory, instantaneous snapshot
rrdtool update /var/rrd/mem_bfcach.rrd $(free | grep buffers/cache: | awk '{print "N:"$3+$4":"$3":"$4}')   # used/free without buffers+cache; $3+$4 reconstructs the total
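To narrow down which step stalls, one option might be to log each update's duration and exit code, e.g. (sketch; the log path is a placeholder):

t0=$(date +%s)
rrdtool update /var/rrd/cpu.rrd $(vmstat 5 2 | tail -n 1 | awk '{print "N:"$14":"$13}')
echo "cpu update rc=$? took $(( $(date +%s) - t0 ))s" >> /var/log/rrd-updates.log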
What am I missing, or where should I check further?
Remember: it is a production system, so no dev tools; no stack traces or similar are available or installable.
I guess it is not rrdtool that cannot update the graph, but rather that the data cannot be measured at that point. By the way, your method of measuring CPU and memory stats is flawed, because it gives you an instantaneous result. CPU and memory load can change drastically over the 60-second interval, but you take only one value. You should really consider using SNMP data, which gives averaged data over an interval. Plus, the whole pipeline seems to be more expensive and slower than an snmpget call. It could be the main reason for the gaps.
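For illustration, a minimal sketch of what that could look like with Net-SNMP's UCD MIB (this assumes a local snmpd with community "public"; the OIDs come from UCD-SNMP-MIB):

# averaged CPU percentages and memory figures from the local snmpd
snmpget -v2c -c public localhost UCD-SNMP-MIB::ssCpuUser.0 UCD-SNMP-MIB::ssCpuSystem.0
snmpget -v2c -c public localhost UCD-SNMP-MIB::memTotalReal.0 UCD-SNMP-MIB::memAvailReal.0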