I have a pipeline that runs some computationally intensive tasks on a Linux machine. The script that launches these checks the current load average and, if it is above a certain threshold, waits until the load falls below it. This is on an Ubuntu virtual machine (running on an Ubuntu host, if that's relevant) which can have a variable number of cores assigned to it. Both our development and production machines are VMs running on the same physical server and we manually allocate cores to each as needed.
I have noticed that even when the VM has as few as 20 cores, a load of ~60 isn't bringing the machine to its knees. My understanding of how the Linux load average works was that anything above the number of CPUs indicates a problem, but apparently things are not quite as clear-cut as that.
I'm thinking of setting the threshold at something like $(grep -c processor /proc/cpuinfo) x N, where N >= 1. Is there any clever way of determining the value N should take so as to both maximise performance and minimise lag?
In other words, how can I know what maximum load average a machine can support before performance starts to degrade? I had naively expected that to be the number of CPUs (so N=1), but that doesn't seem to hold up. Since the number of cores can vary, testing possible combinations is both complicated and time-consuming and, since this is a machine used by various people, impractical.
So, how can I decide on an acceptable maximum load threshold as a function of the number of available cores?
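For reference, the gate I have in mind looks roughly like this (just a sketch; the 1.5 and run_pipeline.sh are placeholders, and THRESHOLD_FACTOR is the N I'm asking about):

    #!/bin/bash
    # Wait until the 1-minute load average drops below cores * N, then launch.
    THRESHOLD_FACTOR=1.5                       # the N I'm trying to pin down
    cores=$(grep -c processor /proc/cpuinfo)
    threshold=$(echo "$cores * $THRESHOLD_FACTOR" | bc -l)
    while true; do
        load=$(cut -d ' ' -f1 /proc/loadavg)   # current 1-minute load average
        if (( $(echo "$load < $threshold" | bc -l) )); then
            break
        fi
        sleep 30
    done
    ./run_pipeline.sh                          # stand-in for the intensive job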
Load is a very often misunderstood value on Linux.
On Linux, it is a measure of the number of tasks in the runnable (running or waiting to run) or uninterruptible sleep state.
Note that this counts tasks, not processes: threads are included in this value.
Load is sampled by the kernel every five seconds and maintained as an exponentially damped (weighted) moving average. Each five-second sample is folded into the previous value with a decay factor of exp(-5/60) for the one-minute figure, exp(-5/300) for the five-minute and exp(-5/900) for the fifteen-minute.
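To illustrate the damping (a sketch only: the kernel uses fixed-point arithmetic internally, and the starting values here are invented):

    # One five-second update of the 1-minute figure:
    # new_load = old_load * exp(-5/60) + current_tasks * (1 - exp(-5/60))
    awk -v load=2.00 -v tasks=4 'BEGIN {
        e = exp(-5/60)
        printf "next 1-minute load: %.2f\n", load * e + tasks * (1 - e)
    }'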
Generally speaking, load as a pure number has little value without a point of reference and I consider the value often misrepresented.
Misconception 1: Load as a Ratio
This is the most common misconception about load on Linux: that it can be used to measure CPU performance against some fixed ratio. That is not what load gives you.
To elaborate: people have an easy time understanding CPU utilization, which is use over time. You take the work done and divide it by the work possible.
Work possible in this regard is a fixed, known value, normally represented as a percentage out of 100; that's your fixed ratio.
Load, however, has no such constraint. There is no fixed maximum, which is why you are having difficulty deciding what to measure against.
To be precise, what load samples does have an upper bound at any instant: the total number of tasks present on the system when the sample is taken (which has no real bearing on how much CPU work is actually being done).
Load as it is calculated has no fixed maximum, since each sample is folded into a weighted average and the number of tasks present at sampling time is not recorded alongside the weighting.
Because I like food, an analogy: utilization is how fast you can eat a plate of food, and load is, on average, how many plates you have left to devour.
So, the difference between CPU utilization and load is subtle but important: CPU utilization is a measure of work being done, and load is a measure of work that needs to be done.
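A quick way to put those numbers side by side on a given box (nothing clever, just reading what the kernel already exposes):

    # 1-, 5- and 15-minute load averages next to the core count
    read one five fifteen rest < /proc/loadavg
    echo "load: $one $five $fifteen across $(nproc) cores"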
Misconception 2: Load is an Instant Measurement
The second fallacy is that load is a granular, instantaneous measurement: that you can read one number and understand the system's current state.
Load is not granular; it represents the general, longer-term condition of the system. Not only is it sampled every five seconds (so it misses short-lived tasks that start and finish within the five-second window), but it is reported as averages over 1, 5 and 15 minutes respectively.
You can't use it as an instant measure of capacity, only as a general sense of a system's burden over a longer period.
The load can be 100 and then be 10 only 30 seconds later. It's a value you have to keep watching to work with.
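If you want a feel for how it moves, the simplest thing is to watch it for a while:

    # Refresh every five seconds, matching the kernel's sampling interval
    watch -n 5 cat /proc/loadavg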
What can Load tell you?
It can give you an idea of the system's workload trend: is it being given more than it can cope with, or less?
Because the uninterruptible sleep state is included, load is muddied as a pure score of scheduled CPU work, but it does give you some indication of how much demand there is on the disk (technically, that is still work that needs to be done).
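If you suspect the load figure is being inflated by I/O rather than CPU, you can list the tasks currently in uninterruptible sleep (state D in ps):

    # Tasks in uninterruptible sleep: they count towards load but use no CPU
    ps -eo state,pid,comm | awk '$1 == "D"'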
Load also offers clues to anomalies on a system. If you see the load at 50+, it suggests something is amiss.
Load can also cause people to be concerned without reason.
In Summary
I find load a very woolly value, precisely because there are no absolutes with it. The measurement you get on one system is often meaningless when held up against another.
It's probably one of the first things I'd look at in top, purely to check for an obvious anomaly. Basically I'm using it like a thermometer: a reading of the general condition of a system, nothing more.
I find its sampling period far too long for most workloads I throw at my systems (which generally run on the order of seconds, not minutes). I suppose it makes sense for systems that execute long-running intensive tasks, but I don't really do much of that.
The other thing I use it for is long-term capacity management. It's a nice thing to graph over long periods of time (months), as you can use it to understand how much more work you are handling compared to a few months ago.
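For that kind of graphing, the sysstat package already records the load averages historically, assuming it is installed and its collector is enabled:

    # Run-queue length and load averages collected by sysstat
    sar -q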
Finally, to answer your question about what to do in your scenario: quite honestly, rather than using load as a factor in deciding when to run, my best suggestion is to execute your process under nice, giving other processes priority over it. This is good for a few reasons.
With a niceness of 0 (the default), each process gets a scheduler weight of 1024. The lower the weight, the less CPU time is offered to the process. Here is a table of this behaviour.
So, to compare, in a scenario where you have two processes waiting to run: if you renice a process to +10, it gets approximately 1/10th of the CPU time a nice-0 process gets. If you renice it to +19, it gets roughly 1/100th of the CPU time a nice-0 process gets.
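In practice, that just means launching the pipeline through nice, or lowering its priority after the fact (a sketch; run_pipeline.sh and the PID are stand-ins for your own job):

    # Start the job at the lowest scheduling priority
    nice -n 19 ./run_pipeline.sh

    # Or lower the priority of something already running
    renice -n 19 -p 12345    # 12345 being the pipeline's PID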
Note that you will probably still see the load at 1 or above for the duration of your pipeline, since the reniced task still counts as runnable.
I imagine this would be a more elegant solution to your problem.
From Wikipedia:
In other words, the load average reported by Linux includes any processes waiting for I/O (e.g. disk or network). This means that if your application is somewhat I/O-intensive you will have a high load average (i.e. many processes are waiting for I/O) with low CPU utilization (they sleep while waiting for I/O).
This, in turn, leads to a system that is responsive even with a seemingly overloaded load average.
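If you want to see that split on a live system, vmstat makes it fairly obvious (it ships with the procps package on Ubuntu): the 'r' column counts runnable tasks, 'b' counts tasks blocked on I/O, and 'wa' is the percentage of CPU time spent waiting for I/O.

    # Five samples at five-second intervals; the first line is an average since boot
    vmstat 5 5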