Looking at the output of top on our server, one of my colleagues told me that the fact that some processes got less than 100 "%CPU" was because I was running too many processes. He added that based on his experience if I run less than 6 processes, then probably all the processes would have 100 "%CPU".
I don't want to be an annoyance to other users, but I doubt what he said is correct. The server has 16 cores, and the current load average is between 10 and 11, so from what I have learned it is not overloaded. But I don't understand why some processes are getting less than 100 "%CPU". Is it really because of me?
Thanks and regards!
Here comes the output of top:
top - 16:34:13 up 32 days, 1:36, 12 users, load average: 10.61, 10.39, 10.22
Tasks: 380 total, 10 running, 370 sleeping, 0 stopped, 0 zombie
Cpu(s): 55.0%us, 1.7%sy, 0.0%ni, 42.2%id, 0.5%wa, 0.1%hi, 0.4%si, 0.0%st
Mem: 130766620k total, 39859784k used, 90906836k free, 849412k buffers
Swap: 47351548k total, 279456k used, 47072092k free, 19792956k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17197 tim 18 -2 1315m 1.3g 1504 R 100 1.0 4510:11 MLtest
28762 tim 18 -2 1315m 1.3g 1504 R 100 1.0 4633:01 MLtest
29249 tim 18 -2 1315m 1.3g 1504 R 100 1.0 4623:03 MLtest
29560 tim 18 -2 1315m 1.3g 1504 R 100 1.0 4626:59 MLtest
4904 tim 18 -2 1315m 1.3g 1504 R 100 1.0 4757:12 MLtest
5143 tim 18 -2 1315m 1.3g 1504 R 100 1.0 4759:40 MLtest
29389 tim 18 -2 1315m 1.3g 1504 R 99 1.0 4622:11 MLtest
5285 tim 18 -2 1315m 1.3g 1504 R 97 1.0 4758:49 MLtest
4763 tim 18 -2 1315m 1.3g 1504 R 93 1.0 4754:22 MLtest
9456 zma 18 -2 206m 85m 11m S 48 0.1 60:46.78 dropbox
7527 vals 18 -2 1266m 436m 42m S 4 0.3 613:57.10 MATLAB
2903 root 15 -5 0 0 0 S 1 0.0 19:00.01 rpciod/0
19133 vals 18 -2 1380m 503m 42m S 1 0.4 798:47.99 MATLAB
12454 tim 18 -2 19248 1588 1024 R 1 0.0 0:48.88 top
12 root RT -5 0 0 0 S 1 0.0 35:01.05 migration/3
2924 root 15 -5 0 0 0 S 1 0.0 27:20.92 nfsiod
12690 jun 18 -2 913m 84m 2684 S 1 0.1 121:55.65 MATLAB
19650 jun 18 -2 19244 1600 1028 S 1 0.0 6:58.41 top
6 root RT -5 0 0 0 S 0 0.0 129:49.45 migration/1
9 root RT -5 0 0 0 S 0 0.0 104:34.66 migration/2
2870 daemon 20 0 8180 404 308 S 0 0.0 5:18.91 portmap
8985 root 20 0 28484 344 264 S 0 0.0 6:24.77 hald-addon-stor
9293 root 20 0 369m 4208 2316 S 0 0.0 83:36.35 kdm_greet
24028 tim 18 -2 871m 140m 45m S 0 0.1 7:50.56 MATLAB
1 root 20 0 4104 304 224 S 0 0.0 0:03.59 init
2 root 15 -5 0 0 0 S 0 0.0 0:00.26 kthreadd
3 root RT -5 0 0 0 S 0 0.0 0:00.31 migration/0
4 root 15 -5 0 0 0 S 0 0.0 1:08.91 ksoftirqd/0
Not sure what your friend is talking about, but it sounds pretty arbitrary and... well, blatantly wrong.
The %CPU measure is somewhat misleading. In fact, any process that is currently "on" the CPU is getting 100% of that CPU at that moment in time. The percentage refers to how much CPU time the process received during the last sampling interval.
So the fact that they are displaying less than 100% CPU usage is not an indication of a problem.
A more relevant measure in your top output is this line: Cpu(s): 55.0%us, 1.7%sy, 0.0%ni, 42.2%id, 0.5%wa, 0.1%hi, 0.4%si, 0.0%st
It shows 42% idle time on the CPU. So your other processes, whatever they are, are not CPU-bound.
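To put a number on that: 42.2% idle across 16 cores is roughly six to seven cores' worth of spare capacity. A quick back-of-the-envelope sketch in Python (the parsing of top's Cpu(s) line is my own, for illustration):

```python
def idle_cores(cpus_line: str, n_cores: int) -> float:
    """Convert top's aggregate idle percentage into 'cores worth' of idle time."""
    stats = {}
    for part in cpus_line.split(":", 1)[1].split(","):
        value, name = part.strip().split("%")   # e.g. "42.2%id" -> ("42.2", "id")
        stats[name] = float(value)
    return stats["id"] / 100 * n_cores

line = "Cpu(s): 55.0%us, 1.7%sy, 0.0%ni, 42.2%id, 0.5%wa, 0.1%hi, 0.4%si, 0.0%st"
print(f"~{idle_cores(line, 16):.1f} cores' worth of idle time")  # ~6.8
```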
You can press "1" (one) and top will show the CPU stats at the top on a per-CPU basis. You might find that informative.

Programs do more than use the CPU. They wait on disk and network I/O; they wait on user input. Not every program that runs is going to use 100% of the CPU for top's refresh quantum. For example, when nothing's running, do you see init consuming 100% CPU? No.

Your friend is not only wrong; if you do what he says, it may well be counterproductive. If you have 16 cores and a load of 10, you should probably increase the number of MLtest processes you have running, assuming it is currently limited to 9 and somehow configurable. Why?
Well, a single-threaded process can only run on one CPU at a time, and if it uses 100% of that CPU, it is CPU-bound. So if you restrict yourself to 9 processes doing whatever it is that MLtest does, then you can only use 9 of those 16 cores.
Load refers to the number of processes running or waiting to run. You apparently have about 10 processes that need the CPU; who knows what they need to do. But if you only let your MLtest processes run on a few CPUs (remember, one single-threaded process per CPU), then you could see high load because all of those processes are always either running or waiting to run. By letting more processes run, you can get more work done at once and finish sooner, instead of having processes wait for a turn.
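As a rough sanity check, you can compare the load average against the core count yourself. This is my own rule-of-thumb sketch, not an exact model (Linux load also counts processes blocked in uninterruptible I/O wait):

```python
import os

# Rule of thumb: on a purely CPU-bound workload, a 1-minute load average near
# the core count means the machine is saturated; well below it means headroom.
load1, _load5, _load15 = os.getloadavg()   # Unix-only
cores = os.cpu_count()
print(f"1-min load {load1:.2f} on {cores} cores "
      f"-> about {max(cores - load1, 0):.1f} cores' worth of headroom")
```

On the machine in the question this would report roughly 5-6 cores of headroom, which is why adding MLtest processes is a reasonable suggestion.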
However, this is just one theoretical scenario. To actually solve the problem, you need to answer:
1) What process is waiting to run (causing the load)?
2) Are you restricting the number of MLtest processes that can run?
3) If you let more MLtest processes run, will your problem/program finish faster?
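For question 1), on Linux you can list which processes are currently runnable (state R) by scanning /proc. A minimal sketch, showing roughly what top counts as "running":

```python
import glob

def runnable():
    """Names of processes currently in state R (running/runnable) on Linux."""
    procs = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                data = f.read()
        except OSError:           # process exited while we were scanning
            continue
        # The command name is parenthesized and may contain spaces,
        # so find the last ')' and take the state field right after it.
        comm = data[data.index("(") + 1:data.rindex(")")]
        state = data[data.rindex(")") + 2]
        if state == "R":
            procs.append(comm)
    return procs

print(runnable())   # this script itself will appear, since it is running
```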
A number of things can cause this, and I would say first that this is no cause for alarm or concern.
Not knowing anything more about what you're doing than the process list you included exposes, and not knowing anything really about Matlab, I'm going to suggest some possible things that are going on that are completely normal, and can result in what you're seeing.
First, though, I want to point out that top is showing you an average value over a certain period of time, and probably a very short one -- on the order of a few seconds. One of your processes running at a mere 93% for a couple seconds (rather than 100%) is not a huge thing. It's probably back up to 100% (and a different process down to 93%) on the next interval.
Back to why:
If a process does anything requiring a system call, especially disk I/O, it may be idle for a time waiting for that operation to finish. This will result in < 100% CPU usage, as part of the time it's blocking on I/O. Other users' processes definitely have an effect here. There may be more than enough cores, but if you're all vying for bandwidth to the same hard disk, then nobody will see 100% CPU utilization.
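You can see this effect directly by comparing CPU time to wall-clock time, which is essentially what top's %CPU does over its refresh interval. A small illustrative sketch (the helper name is mine):

```python
import time

def cpu_fraction(work, seconds=0.5):
    """Fraction of wall-clock time this process spent on a CPU while
    repeatedly calling `work` -- the quantity top's %CPU estimates."""
    wall0, cpu0 = time.monotonic(), time.process_time()
    while time.monotonic() - wall0 < seconds:
        work()
    return (time.process_time() - cpu0) / (time.monotonic() - wall0)

busy = cpu_fraction(lambda: sum(range(1000)))      # pure computation
blocked = cpu_fraction(lambda: time.sleep(0.01))   # stands in for I/O waits
print(f"CPU-bound loop: ~{busy:.0%} CPU; sleeping loop: ~{blocked:.0%} CPU")
```

The sleeping loop reports only a few percent CPU even though the process is "working" the whole time, just as a process blocked on disk or network I/O would.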
Your application seems to use multiple processes, or even multiple threads, at once. This can speed things up to a point (and how far depends directly on the application and how it divides up its work). However, it can also carry a cost when it comes to communication between processes. If, for example, each child process (or thread) has to talk to every other one, the number of communication channels grows roughly quadratically with the number of processes. Even if each child is only communicating with a main process-in-charge, the children can block on communication with the parent while the parent talks to a different child. This isn't really all that different from blocking on disk I/O.
In the end, even with an infinite number of cores, you will likely see diminishing returns with each additional process you use to do your work. There's probably a sweet spot somewhere, and maybe it's 6, as your colleague suggests. But I wouldn't use his analysis (looking for <100% utilization) to determine where that sweet spot is.
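One common way to model that sweet spot (my addition, an Amdahl's-law style estimate, not something derivable from your top output) is to assume some fraction of each job is serialized, e.g. coordination with a parent process:

```python
def speedup(n_workers: int, serial_fraction: float) -> float:
    """Amdahl's law: best-case speedup when `serial_fraction` of the
    work cannot be parallelized."""
    return 1 / (serial_fraction + (1 - serial_fraction) / n_workers)

# With even 10% of the work serialized, returns flatten out fast:
for n in (1, 2, 4, 8, 16, 64):
    print(f"{n:3d} workers -> {speedup(n, 0.10):.2f}x")
```

With a 10% serial fraction, 16 workers give only about a 6.4x speedup, and no number of workers gets past 10x. The real serial fraction for MLtest is unknown; measuring throughput at a few different process counts is the honest way to find the sweet spot.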