I wrote a monitoring plugin that reads the CPU usage numbers from /proc/stat. Unfortunately the numbers do not seem to match the manual page proc(5); in particular, I get different proportions on real machines than on Xen paravirtualized machines. The amount of idle time differs between these machines:
(An explanation of the code blocks that follow: the first part shows the fields of the "cpu" line (e.g. "cpu#1" is the first field of the CPU line), each followed by a boot count ("epoch"), the UNIX time, and the actual value, separated by colons (:). The next line, starting with "stat OK:", is the output of my monitoring plugin; here it outputs the differences for debugging purposes, but usually it would output difference rates. It also adds human-readable labels to the numbers; "time" is the time difference since the last call in seconds. Finally, I added the CPU-related lines from /proc/stat (with some time having elapsed since the plugin output).)
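(For illustration, here is a minimal sketch of how such a plugin might sample the aggregate "cpu" line and turn two samples into difference rates. This is hypothetical code following the field layout of proc(5), not my actual plugin:)

#include <stdio.h>
#include <unistd.h>

#define NFIELDS 10  /* user nice system idle iowait irq softirq steal guest guest_nice */

/* Read the ten fields of the aggregate "cpu" line; 0 on success. */
static int read_cpu(unsigned long long v[NFIELDS])
{
    FILE *f = fopen("/proc/stat", "r");
    if (!f)
        return -1;
    int n = fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &v[0], &v[1], &v[2], &v[3], &v[4],
                   &v[5], &v[6], &v[7], &v[8], &v[9]);
    fclose(f);
    return n == NFIELDS ? 0 : -1;
}

int main(void)
{
    unsigned long long a[NFIELDS], b[NFIELDS];
    const unsigned interval = 5;            /* seconds between the two samples */

    if (read_cpu(a) != 0)
        return 1;
    sleep(interval);
    if (read_cpu(b) != 0)
        return 1;
    for (int i = 0; i < NFIELDS; i++)       /* ticks per second, per state */
        printf(" %.2f", (double)(b[i] - a[i]) / interval);
    printf("\n");
    return 0;
}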
First, a physical machine with two 6-core CPUs, two threads per core (24 logical CPUs). Idle time seems to be about 900 times the sum of the other CPU states, corresponding to 99.89% idle and 0.06% user CPU. Also note that the ratio of idle time to elapsed time is about 2398.5 (849070 / 354); divided by USER_HZ (100), that roughly gives the number of CPUs (see the cross-check after the second /proc/stat listing below). It looks odd to me:
# physical two 6-core cpus, 2 threads each
cpu#1=0:1596547833:2667804
cpu#2=0:1596547833:90388
cpu#3=0:1596547833:1257514
cpu#4=0:1596547833:2735255340
cpu#5=0:1596547833:142707
cpu#6=0:1596547833:0
cpu#7=0:1596547833:107191
cpu#8=0:1596547833:0
cpu#9=0:1596547833:0
cpu#10=0:1596547833:0
stat OK: epoch=0, time=354, cpu.usr=581, cpu.ni=24, cpu.sys=288, cpu.idl=849070, cpu.iow=29, cpu.hirq=0, cpu.sirq=13, cpu.st=0, cpu.vgst=0, cpu.usr0=0
# cat /proc/stat
cpu 2668778 90430 1257998 2736664140 142741 0 107213 0 0 0
cpu0 116314 1436 53622 113861868 3864 0 81296 0 0 0
cpu1 142008 4782 32464 114026161 9767 0 4396 0 0 0
cpu2 167052 3058 63902 113932120 12818 0 1966 0 0 0
cpu3 120029 4260 28712 114058016 3337 0 1478 0 0 0
cpu4 145332 2972 61798 113983716 16115 0 1037 0 0 0
cpu5 114346 6809 27875 114060364 4110 0 1124 0 0 0
cpu6 126193 3720 54701 113999094 12348 0 968 0 0 0
cpu7 108188 4859 27436 114067537 6028 0 976 0 0 0
cpu8 121890 2820 51548 114020211 13474 0 940 0 0 0
cpu9 102942 4235 26150 114076765 3423 0 977 0 0 0
cpu10 125984 2724 48521 114014015 13950 0 845 0 0 0
cpu11 89154 4047 26674 114085160 7735 0 885 0 0 0
cpu12 116730 3894 397743 113663892 2352 0 884 0 0 0
cpu13 84306 4424 26164 114096015 2767 0 871 0 0 0
cpu14 127293 3539 44438 114033462 1294 0 922 0 0 0
cpu15 77740 3958 26201 114105245 358 0 854 0 0 0
cpu16 133217 3043 41476 114034324 737 0 958 0 0 0
cpu17 88893 4497 25736 114094645 662 0 838 0 0 0
cpu18 125887 2812 39150 114024555 1309 0 806 0 0 0
cpu19 65198 3560 25976 114092343 21838 0 802 0 0 0
cpu20 109361 3292 37270 114059144 1381 0 764 0 0 0
cpu21 71055 4094 26435 114111750 759 0 859 0 0 0
cpu22 118589 3643 37525 114052728 1567 0 883 0 0 0
cpu23 71069 3943 26468 114110998 737 0 875 0 0 0
Then a Xen paravirtualized machine with two virtual CPUs. Idle time is about 74 times the sum of the other CPU states, corresponding to 98.66% idle and 8% user CPU. Again, the ratio of idle time to elapsed time is 197.4 (7108 / 36), roughly corresponding to 2 CPUs. And here is one problem: user CPU plus idle exceeds 100%.
# virtual 2 cpus (Xen PV)
cpu#1=0:1596547988:1162034
cpu#2=0:1596547988:227660
cpu#3=0:1596547988:3036855
cpu#4=0:1596547988:701649884
cpu#5=0:1596547988:1037577
cpu#6=0:1596547988:0
cpu#7=0:1596547988:31478
cpu#8=0:1596547988:355862
cpu#9=0:1596547988:0
cpu#10=0:1596547988:0
stat OK: epoch=0, time=36, cpu.usr=16, cpu.ni=7, cpu.sys=28, cpu.idl=7108, cpu.iow=4, cpu.hirq=0, cpu.sirq=0, cpu.st=5, cpu.vgst=0, cpu.usr0=0
> cat /proc/stat
cpu 1162136 227690 3037149 701727879 1037664 0 31481 355901 0 0
cpu0 531438 112727 1469157 350497090 791387 0 31011 192100 0 0
cpu1 630698 114962 1567991 351230788 246276 0 470 163801 0 0
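As a cross-check of the two ratios claimed above, here is a hypothetical calculation using the difference values from the two "stat OK" lines:

#include <stdio.h>

/* Idle ticks per elapsed second, divided by USER_HZ, roughly gives
   the number of logical CPUs on a mostly idle machine. */
static double approx_cpus(double d_idle, double dt, double user_hz)
{
    return d_idle / dt / user_hz;
}

int main(void)
{
    printf("physical: %.2f\n", approx_cpus(849070.0, 354.0, 100.0)); /* ~23.99 */
    printf("xen pv:   %.2f\n", approx_cpus(7108.0, 36.0, 100.0));    /* ~1.97 */
    return 0;
}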
I know that the numbers in /proc/stat are in units of USER_HZ ticks, but as a common factor that shouldn't matter, right?
I feel the idle proportion does not match the rest of the CPU states (it seems too high), but I fail to see what's wrong.
(I also realize that with multiple CPUs you can never read those numbers from /proc/stat as one consistent snapshot, but the differences should be small enough to ignore.)
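(For completeness: USER_HZ does not have to be hard-coded; it can be queried at run time via sysconf(3). A minimal example:)

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Clock-tick unit used for the counters in /proc/stat (USER_HZ). */
    long user_hz = sysconf(_SC_CLK_TCK);
    printf("USER_HZ = %ld\n", user_hz);  /* typically 100 on Linux */
    return 0;
}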
Probably because of the "specialties" of /proc/stat, the main mistake was trying to relate the accumulated CPU ticks to elapsed time. Besides getting inconsistent readings for multiple CPUs, even the sum of all states for one CPU differs significantly between CPUs (one would guess that any CPU has to be in exactly one of the listed states at any time).
For example, the two-socket, 6-core, 2-thread machine has these sums:
So the correct algorithm sums up the tick counts of all fields for one CPU and then divides the individual fields of that (and only that) CPU by this sum. Then you get individual and total relative CPU stats like this:
Finally, if you take the average of all the individual CPUs' stats, you more or less get the total stats (the sums differ; that's what I called "inconsistent reading" above):
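Here is a minimal sketch of that computation (hypothetical code, not the actual plugin; it also prints the per-CPU sums whose variation I mentioned):

#include <stdio.h>
#include <string.h>

#define NFIELDS 10

/* For the aggregate "cpu" line and every "cpuN" line, divide each
   field by the sum of all fields of that same line, so every CPU is
   normalized by its own total tick count. */
int main(void)
{
    FILE *f = fopen("/proc/stat", "r");
    char line[512];

    if (!f)
        return 1;
    while (fgets(line, sizeof line, f)) {
        char name[16];
        unsigned long long v[NFIELDS] = {0};
        int n = sscanf(line,
                       "%15s %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
                       name, &v[0], &v[1], &v[2], &v[3], &v[4],
                       &v[5], &v[6], &v[7], &v[8], &v[9]);
        if (n < 2 || strncmp(name, "cpu", 3) != 0)
            continue;                   /* skip intr, ctxt, btime, ... */
        unsigned long long sum = 0;
        for (int i = 0; i < NFIELDS; i++)
            sum += v[i];
        if (sum == 0)
            continue;
        printf("%-6s sum=%llu", name, sum);   /* note: the sums differ per CPU */
        for (int i = 0; i < NFIELDS; i++)
            printf(" %6.2f%%", 100.0 * v[i] / sum);
        printf("\n");
    }
    fclose(f);
    return 0;
}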
All percentage numbers were output with printf("%6.2f%%", ...), so they are rounded.
Finally, here are the total numbers from a busy "48 CPU" machine: