I'm trying to narrow down the list of suspects for web servers that perform moderately well most of the time, with occasional bouts of poor performance. I'm analyzing the data collected and summarized by sar. I've noticed a few things, one of which is a high number of tasks in the run queue.
10:15:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
10:25:01 AM         2       150      0.05      0.05      0.06         0
10:35:01 AM         4       149      0.08      0.12      0.09         0
10:45:01 AM         6       150      0.13      0.19      0.15         0
10:55:01 AM         1       150      0.08      0.10      0.13         0
11:05:01 AM         4       150      0.20      0.35      0.23         0
11:15:01 AM         3       149      0.02      0.09      0.15         0
11:25:01 AM         7       149      0.04      0.05      0.11         0
11:35:01 AM         4       150      0.14      0.15      0.13         0
11:45:01 AM         6       150      0.27      0.18      0.16         0
11:55:01 AM         5       150      0.08      0.10      0.13         0
12:05:01 PM         3       149      0.35      0.40      0.26         0
12:15:01 PM        19       155      0.02      0.10      0.16         1
12:25:01 PM         2       150      0.00      0.07      0.12         0
12:35:02 PM         3       151      0.58      0.24      0.17         0
12:45:01 PM         8       150      0.02      0.13      0.15         0
12:55:01 PM         6       149      0.81      0.29      0.18         0
01:05:01 PM         3       148      0.00      0.09      0.13         0
01:15:01 PM         7       149      0.00      0.04      0.11         0
I believe these are 10-minute averages.
Is this an indicator that the web server is not performing as fast as it could if the average run queue length were lower?
Your load average remains low throughout this. With readings 10 minutes apart, I think it would be difficult to determine much. A high run queue with a correspondingly high load would indicate a resource issue; I don't think that's the case here. How are you quantifying "poor performance"?
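To make that check concrete, here is a minimal sketch that parses rows laid out like the table in the question and separates run-queue spikes that coincide with high load from those that do not. The function names and both thresholds (a run queue of 10, a 1-minute load of 1.0) are assumptions chosen for illustration, not recommended values.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    time: str
    runq_sz: int
    plist_sz: int
    ldavg_1: float
    ldavg_5: float
    ldavg_15: float
    blocked: int


def parse_sar_q(text: str) -> List[Sample]:
    """Parse data rows of sar run-queue output laid out like the question's table."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # A data row is: HH:MM:SS, AM/PM, then six numeric columns.
        if len(fields) != 8 or not fields[2].isdigit():
            continue  # skips the header row and blank lines
        samples.append(Sample(
            time=f"{fields[0]} {fields[1]}",
            runq_sz=int(fields[2]),
            plist_sz=int(fields[3]),
            ldavg_1=float(fields[4]),
            ldavg_5=float(fields[5]),
            ldavg_15=float(fields[6]),
            blocked=int(fields[7]),
        ))
    return samples


def flag_spikes(
    samples: List[Sample],
    runq_threshold: int = 10,     # hypothetical cutoff
    load_threshold: float = 1.0,  # hypothetical cutoff
) -> Tuple[List[Sample], List[Sample]]:
    """Return (spikes with high load, spikes without high load)."""
    with_load, without_load = [], []
    for s in samples:
        if s.runq_sz >= runq_threshold:
            target = with_load if s.ldavg_1 >= load_threshold else without_load
            target.append(s)
    return with_load, without_load


# Three rows copied from the question, plus its header line.
SAR_OUTPUT = """\
10:15:01 AM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 blocked
12:05:01 PM 3 149 0.35 0.40 0.26 0
12:15:01 PM 19 155 0.02 0.10 0.16 1
12:25:01 PM 2 150 0.00 0.07 0.12 0
"""

if __name__ == "__main__":
    with_load, without_load = flag_spikes(parse_sar_q(SAR_OUTPUT))
    for s in without_load:
        print(f"{s.time}: runq-sz={s.runq_sz} but ldavg-1={s.ldavg_1}")
```

On these rows, the only spike is the 12:15:01 PM reading, and it lands in the "without high load" group, which matches the reading above that the data does not show a sustained resource problem.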
This is more likely a symptom of poor performance than its cause. For example, some items may take a long time to process, and load balancing that is agnostic about the work involved in each query could then skew the load toward one server.
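As a rough illustration of that scenario, the sketch below compares a work-agnostic round-robin assignment with an idealized assignment that always picks the server with the least accumulated work. The request costs, the server count, and both function names are made up for the example, and the second strategy assumes the balancer knows each request's cost up front, which a real one usually does not.

```python
from typing import List


def round_robin(costs: List[int], n_servers: int) -> List[int]:
    """Assign requests in arrival order, ignoring how expensive each one is."""
    totals = [0] * n_servers
    for i, cost in enumerate(costs):
        totals[i % n_servers] += cost
    return totals


def least_loaded(costs: List[int], n_servers: int) -> List[int]:
    """Assign each request to the server with the least accumulated work."""
    totals = [0] * n_servers
    for cost in costs:
        idx = totals.index(min(totals))  # first server with the minimum total
        totals[idx] += cost
    return totals


# Hypothetical workload: mostly cheap requests, with an occasional expensive one.
COSTS = [1, 1, 1, 20, 1, 1, 1, 20]

if __name__ == "__main__":
    print("round robin :", round_robin(COSTS, 4))
    print("least loaded:", least_loaded(COSTS, 4))
```

With this workload, round robin happens to send both expensive requests to the same server, so that one server queues up far more work than the others even though the total work is modest.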