I have a LAMP cluster that shares files via NFS, and occasionally one of the servers will be stricken for a while when mysterious flush processes start appearing.
Can anyone help me? The only way to resolve this is to reboot; killing the processes only spawns new ones.
top - 19:43:43 up 104 days, 4:52, 1 user, load average: 27.15, 56.72, 33.31
Tasks: 301 total, 9 running, 292 sleeping, 0 stopped, 0 zombie
Cpu(s): 15.6%us, 77.0%sy, 0.0%ni, 4.2%id, 2.0%wa, 0.0%hi, 1.2%si, 0.0%st
Mem: 8049708k total, 7060492k used, 989216k free, 157156k buffers
Swap: 4194296k total, 483228k used, 3711068k free, 928768k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
840 root 20 0 0 0 0 R 98.0 0.0 6:45.83 flush-0:24
843 root 20 0 0 0 0 R 97.6 0.0 5:50.32 flush-0:25
835 root 20 0 0 0 0 R 96.0 0.0 6:42.44 flush-0:22
836 root 20 0 0 0 0 R 95.0 0.0 6:51.56 flush-0:27
833 root 20 0 0 0 0 R 94.3 0.0 6:27.21 flush-0:23
841 root 20 0 0 0 0 R 93.7 0.0 6:46.97 flush-0:26
2305 apache 20 0 772m 31m 25m S 23.6 0.4 0:07.60 httpd
2298 apache 20 0 772m 31m 25m S 13.6 0.4 0:08.98 httpd
26771 apache 20 0 775m 47m 41m S 10.3 0.6 4:07.97 httpd
2315 apache 20 0 770m 29m 25m S 9.0 0.4 0:07.44 httpd
24370 memcache 20 0 457m 123m 608 S 8.6 1.6 66:20.28 memcached
1191 apache 20 0 770m 30m 26m S 8.3 0.4 0:13.54 httpd
2253 apache 20 0 771m 32m 27m S 8.3 0.4 0:11.75 httpd
3476 varnish 20 0 52.9g 2.0g 20m S 8.0 25.6 0:15.30 varnishd
17234 apache 20 0 775m 50m 45m S 7.0 0.6 9:22.09 httpd
23161 apache 20 0 780m 54m 43m S 7.0 0.7 6:33.40 httpd
Thanks
Your system is being overloaded with disk write requests, and your "dirty ratio" configuration is not optimal for your environment.
You can tune two administrative virtual memory parameters: dirty_background_ratio and dirty_ratio, both found in /proc/sys/vm/.
These parameters represent a percentage of memory. If you set a low value for dirty_ratio, you get more disk write activity but reduce the amount of RAM consumed for dirty-memory management. dirty_background_ratio is the percentage of memory at which the kernel's background flusher threads start writing dirty data out to disk, while dirty_ratio is the threshold at which processes generating the writes are themselves forced to flush. This means you must find the best compromise between the size of the dirty chunks to write (the flush processes) and the point at which the system stalls writers while it catches up. A relationship that often gives good performance is to keep dirty_background_ratio at roughly half of dirty_ratio.
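If you want to see what a server is currently using, and how much dirty memory it is holding at the moment, you can read the values straight from /proc (illustrative checks only; the numbers will of course differ on your machines):

cat /proc/sys/vm/dirty_ratio
cat /proc/sys/vm/dirty_background_ratio
grep -i dirty /proc/meminfo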
There can be several causes of this imbalance. Among the most common is an insufficient amount of RAM for the services installed; at other times it may simply be a drop in performance of the memory installed in your server, with causes ranging from poor ventilation to a faulty power supply.
Although most of these problems show up as software bugs, many of them are actually due to poor configuration of the hardware relative to the services installed on it, especially in the case of rented machines.
To help those less familiar with Linux machines, the above-mentioned parameters can be changed in this way:
Permanent mode:
(run these two commands only once; otherwise, edit the file with your favorite editor)
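As a sketch, assuming you go through /etc/sysctl.conf and using purely illustrative values of 10 and 5 (tune them for your own workload), the two appends would look like this:

echo 'vm.dirty_ratio = 10' >> /etc/sysctl.conf
echo 'vm.dirty_background_ratio = 5' >> /etc/sysctl.conf

Afterwards, sysctl -p loads the new values without a reboot.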
Temporary mode:
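For an immediate change that does not survive a reboot, the same illustrative values can be applied with sysctl -w (or by echoing into the files under /proc/sys/vm/):

sysctl -w vm.dirty_ratio=10
sysctl -w vm.dirty_background_ratio=5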
You can find more information about these settings at this link.
I found the following link with a similar discussion:
0005972: Top and uptime displays wrong load average value - CentOS Bug Tracker
In the last post it says:
The high load average issue is resolved in a newer version of the hpvsa driver (1.2.4-7) that is now released by HP. Contact HP Support to obtain a copy of the new driver.
Do you have
EnableMMAP Off
in your Apache configuration file? I'm not sure whether these are the right symptoms for it, but it's worth a try.
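If you want to try it, this is roughly where the directive would go; the file path is just an assumption for a CentOS-style layout, so use whichever main configuration file your Apache actually reads, and reload gracefully afterwards:

# /etc/httpd/conf/httpd.conf (path assumed)
EnableMMAP Off

apachectl graceful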
If you have an ext4 filesystem, check this bug: Slow writes to ext4 partition - INFO: task flush-253:7:2137 blocked for more than 120 seconds. It has been fixed in recent kernels (RHSA-2011-1530), which you can also obtain, of course, from CentOS.
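If you are unsure whether you are already running a kernel that contains that fix, a quick check is to compare the running and installed kernel versions against the advisory:

uname -r
rpm -q kernel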