I would like to know what CPU % and load average can be considered a safe range. Which indicators signal that something is wrong with the server?
top - 22:55:51 up 3 days, 6:39, 1 user, load average: 0.53, 0.43, 0.37
Tasks: 229 total, 2 running, 227 sleeping, 0 stopped, 0 zombie
Cpu0 : 16.2%us, 0.7%sy, 0.0%ni, 82.8%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu1 : 10.5%us, 0.7%sy, 0.0%ni, 88.5%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu2 : 9.0%us, 0.0%sy, 0.0%ni, 91.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.3%us, 0.3%sy, 0.0%ni, 99.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 1.0%us, 0.0%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 44.8%us, 2.6%sy, 0.0%ni, 37.0%id, 0.0%wa, 9.4%hi, 6.2%si, 0.0%st
Cpu6 : 3.0%us, 0.0%sy, 0.0%ni, 96.7%id, 0.0%wa, 0.0%hi, 0.4%si, 0.0%st
Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16468596k total, 2423908k used, 14044688k free, 200172k buffers
What a "safe" range is depends entirely on what this server is used for, and what you are prepared to accept with respect to high load.
There is no ready answer; some companies want their servers to never reach above 50% total utilization, some don't care as long as everything is done on time.
You start with the intended purpose of this box and work from there.
Do not focus so much on individual CPU % numbers; take note of the load average instead. These numbers will give you an idea of whether your system is 'overloaded'.
The three load average values are the average number of processes that were running or waiting to run over the last 1, 5 and 15 minutes (on Linux the count also includes processes blocked in uninterruptible I/O waits). For example, a load average of 1.0 on a single-CPU system means the CPU was fully occupied on average, and anything above that means processes had to queue and wait for CPU time. For systems with more than one CPU, divide the load average by the number of processors in your system. For example, on your 8-CPU system a value of 8.0 would mean the system is roughly 100% utilized.
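The division described above can be sketched in Python as follows. This is only a minimal illustration: the function name `load_per_cpu` is made up for this example, and the result is a rough proxy for utilization rather than an exact CPU percentage.

```python
def load_per_cpu(load_average: float, cpu_count: int) -> float:
    """Return the load average divided by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


# A load of 8.0 on an 8-CPU box is about one runnable process per CPU.
print(load_per_cpu(8.0, 8))  # 1.0
# The same load on a single CPU means most of those processes are queued.
print(load_per_cpu(8.0, 1))  # 8.0
```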
The load average: 0.53, 0.43, 0.37 in your example means that your system has averaged roughly 6.6% of its CPU capacity over the last minute, 5.4% over the last 5 minutes, and 4.6% over the last 15 minutes (each value divided by 8 CPUs), which is pretty low.
What number should worry you is relative, but generally you would not want to run at 75%+ sustained utilization if possible. I say this simply because around that level and higher, the temperatures in your server will rise and your fans will start running at full speed, more load will be put on your power system, and the room the server is in will get hotter (which can affect the cooling of other systems and AC costs). The life of your system may also be shortened by the increased fan use and higher temperatures over long periods.
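As a rough sketch of what "sustained" could mean in practice, the check below treats the load as sustained only when the 1-, 5- and 15-minute values are all above the threshold after dividing by the CPU count. The function name and the 0.75 default are assumptions taken from the rule of thumb above, not a standard.

```python
def is_sustained_high(load_averages, cpu_count, threshold=0.75):
    """True only if all three per-CPU load averages exceed the threshold."""
    return all(load / cpu_count > threshold for load in load_averages)


# Values from the top output in the question (8 CPUs).
print(is_sustained_high((0.53, 0.43, 0.37), 8))  # False
# A hypothetical busy box: all three windows are above 75% of 8 CPUs.
print(is_sustained_high((7.2, 6.9, 6.5), 8))     # True
# A short spike: only the 1-minute value is high, so it is not sustained.
print(is_sustained_high((9.0, 3.0, 1.5), 8))     # False
```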
Keep in mind that your system can go over 100% by this measure (the load average can exceed the number of CPUs), and in fact can go quite high under heavy load. Spikes are not uncommon (backups, bursts of internet traffic, system updates, etc.) and should generally only concern you if they are affecting your customers, sites or services, or if your system is running under heavy load for extended periods, as I stated above.
You can quickly use the uptime command to see the load averages.
Hope this helps!
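If you want these numbers in a script rather than reading them by eye, something like the following may work. The helper name `parse_load_averages` is hypothetical, and the regular expression assumes the "load average: a, b, c" layout shown in the top output above; `os.getloadavg()` is the standard-library way to read the same values on Unix-like systems.

```python
import os
import re


def parse_load_averages(line: str):
    """Extract the three load averages from an uptime- or top-style line."""
    pattern = r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)"
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("no load average found in line")
    return tuple(float(value) for value in match.groups())


sample = "top - 22:55:51 up 3 days, 6:39, 1 user, load average: 0.53, 0.43, 0.37"
print(parse_load_averages(sample))  # (0.53, 0.43, 0.37)

# On Unix-like systems the live values are available without parsing text.
if hasattr(os, "getloadavg"):
    try:
        print(os.getloadavg())
    except OSError:
        print("load average is not obtainable on this system")
```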
I think you've bought the idea that these metrics show how your server is performing. That's not the case: high values indicate what is constraining performance, which is something very different.
I see nothing in the data you've provided here to suggest that the performance of this server is constrained by CPU, disk I/O or memory.
If you want to know whether the services provided by this box are being adversely affected by performance, then measure the time taken to service requests.
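Measuring the time taken to service requests could look roughly like this. It is only a sketch: `fake_handler` is a hypothetical stand-in for whatever your service actually does (an HTTP request, a database query, and so on), and in practice you would more likely read these timings from your web server or application logs.

```python
import statistics
import time


def measure_latencies(handler, requests):
    """Call handler once per request and return each duration in seconds."""
    durations = []
    for request in requests:
        start = time.perf_counter()
        handler(request)
        durations.append(time.perf_counter() - start)
    return durations


def fake_handler(request):
    # Hypothetical stand-in for real work; sleeps for about 10 ms.
    time.sleep(0.01)


durations = measure_latencies(fake_handler, range(20))
print(f"median: {statistics.median(durations) * 1000:.1f} ms")
print(f"max:    {max(durations) * 1000:.1f} ms")
```

Tracking the median and the worst case over time shows whether users are actually affected, regardless of what the CPU and load figures say.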