Looking at the output of top on our server, one of my colleagues told me that the fact that some processes got less than 100 "%CPU" was because I was running too many processes. He added that based on his experience if I run less than 6 processes, then probably all the processes would have 100 "%CPU".
I don't want to be an annoyance to other users, but I doubt what he said is correct. The server has 16 cores, and the current load average is between 10 and 11, so from what I have learned it is not overloaded. But I don't understand why some processes are getting less than 100 "%CPU". Is it really because of me?
Thanks and regards!
Here comes the output of top:
top - 16:34:13 up 32 days, 1:36, 12 users, load average: 10.61, 10.39, 10.22
Tasks: 380 total, 10 running, 370 sleeping, 0 stopped, 0 zombie
Cpu(s): 55.0%us, 1.7%sy, 0.0%ni, 42.2%id, 0.5%wa, 0.1%hi, 0.4%si, 0.0%st
Mem: 130766620k total, 39859784k used, 90906836k free, 849412k buffers
Swap: 47351548k total, 279456k used, 47072092k free, 19792956k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17197 tim 18 -2 1315m 1.3g 1504 R 100 1.0 4510:11 MLtest
28762 tim 18 -2 1315m 1.3g 1504 R 100 1.0 4633:01 MLtest
29249 tim 18 -2 1315m 1.3g 1504 R 100 1.0 4623:03 MLtest
29560 tim 18 -2 1315m 1.3g 1504 R 100 1.0 4626:59 MLtest
4904 tim 18 -2 1315m 1.3g 1504 R 100 1.0 4757:12 MLtest
5143 tim 18 -2 1315m 1.3g 1504 R 100 1.0 4759:40 MLtest
29389 tim 18 -2 1315m 1.3g 1504 R 99 1.0 4622:11 MLtest
5285 tim 18 -2 1315m 1.3g 1504 R 97 1.0 4758:49 MLtest
4763 tim 18 -2 1315m 1.3g 1504 R 93 1.0 4754:22 MLtest
9456 zma 18 -2 206m 85m 11m S 48 0.1 60:46.78 dropbox
7527 vals 18 -2 1266m 436m 42m S 4 0.3 613:57.10 MATLAB
2903 root 15 -5 0 0 0 S 1 0.0 19:00.01 rpciod/0
19133 vals 18 -2 1380m 503m 42m S 1 0.4 798:47.99 MATLAB
12454 tim 18 -2 19248 1588 1024 R 1 0.0 0:48.88 top
12 root RT -5 0 0 0 S 1 0.0 35:01.05 migration/3
2924 root 15 -5 0 0 0 S 1 0.0 27:20.92 nfsiod
12690 jun 18 -2 913m 84m 2684 S 1 0.1 121:55.65 MATLAB
19650 jun 18 -2 19244 1600 1028 S 1 0.0 6:58.41 top
6 root RT -5 0 0 0 S 0 0.0 129:49.45 migration/1
9 root RT -5 0 0 0 S 0 0.0 104:34.66 migration/2
2870 daemon 20 0 8180 404 308 S 0 0.0 5:18.91 portmap
8985 root 20 0 28484 344 264 S 0 0.0 6:24.77 hald-addon-stor
9293 root 20 0 369m 4208 2316 S 0 0.0 83:36.35 kdm_greet
24028 tim 18 -2 871m 140m 45m S 0 0.1 7:50.56 MATLAB
1 root 20 0 4104 304 224 S 0 0.0 0:03.59 init
2 root 15 -5 0 0 0 S 0 0.0 0:00.26 kthreadd
3 root RT -5 0 0 0 S 0 0.0 0:00.31 migration/0
4 root 15 -5 0 0 0 S 0 0.0 1:08.91 ksoftirqd/0
Not sure what your friend is talking about, but it sounds pretty arbitrary and... well, blatantly wrong.
The %CPU measure is somewhat misleading. In fact, any process that is currently "on" the CPU is getting 100% of that CPU at that moment in time. The percentage refers to how much CPU time the process received during the last sampling interval.
So the fact that they are displaying less than 100% CPU usage is not an indication of a problem.
A more relevant measure in your top output is this line: Cpu(s): 55.0%us, 1.7%sy, 0.0%ni, 42.2%id, 0.5%wa, 0.1%hi, 0.4%si, 0.0%st
It shows 42% idle time on the CPU. So your other processes, whatever they are, are not CPU-bound.
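To put a number on that: 42.2% idle across 16 cores is roughly six to seven cores' worth of spare capacity. A quick back-of-the-envelope sketch in Python (the parsing of top's Cpu(s) line is my own, for illustration):

```python
def idle_cores(cpus_line: str, n_cores: int) -> float:
    """Convert top's aggregate idle percentage into 'cores worth' of idle time."""
    stats = {}
    for part in cpus_line.split(":", 1)[1].split(","):
        value, name = part.strip().split("%")   # e.g. "42.2%id" -> ("42.2", "id")
        stats[name] = float(value)
    return stats["id"] / 100 * n_cores

line = "Cpu(s): 55.0%us, 1.7%sy, 0.0%ni, 42.2%id, 0.5%wa, 0.1%hi, 0.4%si, 0.0%st"
print(f"~{idle_cores(line, 16):.1f} cores' worth of idle time")  # ~6.8
```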
You can press "1" (one) and top will show the CPU stats at the top on a per-CPU basis. You might find that informative.

Programs do more than use the CPU. They wait on disk and network I/O; they wait on user input. Not every program that runs is going to use 100% of the CPU for top's refresh quantum. For example, when nothing's running, do you see init consuming 100% CPU? No.

Your friend is not only wrong; if you do what he says, it may well be counterproductive. If you have 16 cores and a load of 10, you should probably increase the number of MLtest processes you have running, assuming it is currently limited to 9 and somehow configurable. Why?
Well, a single-threaded process can only run on one CPU at a time, and if it uses 100% of that CPU, it is CPU-bound. So if you restrict yourself to 9 processes doing whatever it is that MLtest does, then you can only use 9 of those 16 cores.
Load refers to the number of processes running or waiting to run. You apparently have about 10 processes that need the CPU; who knows what they need to do. But if you only let your MLtest processes run on a few CPUs (remember, one single-threaded process per CPU), then you could see high load because all of those processes are always either running or waiting to run. By letting more processes run, you can get more work done at once and finish sooner, instead of having processes wait for a turn.
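As a rough sanity check, you can compare the load average against the core count yourself. This is my own rule-of-thumb sketch, not an exact model (Linux load also counts processes blocked in uninterruptible I/O wait):

```python
import os

# Rule of thumb: on a purely CPU-bound workload, a 1-minute load average near
# the core count means the machine is saturated; well below it means headroom.
load1, _load5, _load15 = os.getloadavg()   # Unix-only
cores = os.cpu_count()
print(f"1-min load {load1:.2f} on {cores} cores "
      f"-> about {max(cores - load1, 0):.1f} cores' worth of headroom")
```

On the machine in the question this would report roughly 5-6 cores of headroom, which is why adding MLtest processes is a reasonable suggestion.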
However, this is just one theoretical scenario. To actually solve the problem, you need to answer:
1) What process is waiting to run (causing the load)?
2) Are you restricting the number of MLtest processes that can run?
3) If you let more MLtest processes run, will your problem/program finish faster?
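For question 1), on Linux you can list which processes are currently runnable (state R) by scanning /proc. A minimal sketch, showing roughly what top counts as "running":

```python
import glob

def runnable():
    """Names of processes currently in state R (running/runnable) on Linux."""
    procs = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                data = f.read()
        except OSError:           # process exited while we were scanning
            continue
        # The command name is parenthesized and may contain spaces,
        # so find the last ')' and take the state field right after it.
        comm = data[data.index("(") + 1:data.rindex(")")]
        state = data[data.rindex(")") + 2]
        if state == "R":
            procs.append(comm)
    return procs

print(runnable())   # this script itself will appear, since it is running
```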
A number of things can cause this, and I would say first that this is no cause for alarm or concern.
Not knowing anything more about what you're doing than the process list you included exposes, and not knowing anything really about Matlab, I'm going to suggest some possible things that are going on that are completely normal, and can result in what you're seeing.
First, though, I want to point out that top is showing you an average value over a certain period of time, and probably a very short one -- on the order of a few seconds. One of your processes running at a mere 93% for a couple seconds (rather than 100%) is not a huge thing. It's probably back up to 100% (and a different process down to 93%) on the next interval.
Back to why:
If a process does anything requiring a system call, especially disk I/O, it may be idle for a time waiting for that operation to finish. This will result in < 100% CPU usage, as part of the time it's blocking on I/O. Other users' processes definitely have an effect here. There may be more than enough cores, but if you're all vying for bandwidth to the same hard disk, then nobody will see 100% CPU utilization.
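You can see this effect directly by comparing CPU time to wall-clock time, which is essentially what top's %CPU does over its refresh interval. A small illustrative sketch (the helper name is mine):

```python
import time

def cpu_fraction(work, seconds=0.5):
    """Fraction of wall-clock time this process spent on a CPU while
    repeatedly calling `work` -- the quantity top's %CPU estimates."""
    wall0, cpu0 = time.monotonic(), time.process_time()
    while time.monotonic() - wall0 < seconds:
        work()
    return (time.process_time() - cpu0) / (time.monotonic() - wall0)

busy = cpu_fraction(lambda: sum(range(1000)))      # pure computation
blocked = cpu_fraction(lambda: time.sleep(0.01))   # stands in for I/O waits
print(f"CPU-bound loop: ~{busy:.0%} CPU; sleeping loop: ~{blocked:.0%} CPU")
```

The sleeping loop reports only a few percent CPU even though the process is "working" the whole time, just as a process blocked on disk or network I/O would.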
Your application seems to use multiple processes, or even multiple threads, at once. This can speed things up to a point (and how far depends directly on the application and how it divides up its work). However, it can also carry a cost when it comes to communication between processes. If, for example, each child process (or thread) has to talk to every other one, the number of communication channels grows roughly quadratically with the number of processes. Even if each child is only communicating with a main process-in-charge, the children can block on communication with the parent while the parent talks to a different child. This isn't really all that different from blocking on disk I/O.
In the end, even with an infinite number of cores, you will likely see diminishing returns with each additional process you use to do your work. There's probably a sweet spot somewhere, and maybe it's 6, as your colleague suggests. But I wouldn't use his analysis (looking for <100% utilization) to determine where that sweet spot is.
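One common way to model that sweet spot (my addition, an Amdahl's-law style estimate, not something derivable from your top output) is to assume some fraction of each job is serialized, e.g. coordination with a parent process:

```python
def speedup(n_workers: int, serial_fraction: float) -> float:
    """Amdahl's law: best-case speedup when `serial_fraction` of the
    work cannot be parallelized."""
    return 1 / (serial_fraction + (1 - serial_fraction) / n_workers)

# With even 10% of the work serialized, returns flatten out fast:
for n in (1, 2, 4, 8, 16, 64):
    print(f"{n:3d} workers -> {speedup(n, 0.10):.2f}x")
```

With a 10% serial fraction, 16 workers give only about a 6.4x speedup, and no number of workers gets past 10x. The real serial fraction for MLtest is unknown; measuring throughput at a few different process counts is the honest way to find the sweet spot.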