Yesterday the CPU on my Xen-based VPS went to 100% for two hours and then returned to normal, seemingly on its own.
I have checked the logs, including syslog and auth.log, and nothing seems out of the ordinary.
- During this time the server seemed to be operating normally, as indicated by people logging in, emails being received, etc.
- Memory, disk and network usage during this time appeared to be normal.
- I hadn't rebooted the server in weeks, and I wasn't working on it that morning.
- I keep it updated with security updates and the like. It's Ubuntu 12.04 LTS.
- It runs nginx, MySQL and Postfix, along with a few other things.
Around the start of the event, syslog contains these entries:
```
Apr 27 07:55:34 ace kernel: [3791215.833595] [UFW LIMIT BLOCK] IN=eth0 OUT= MAC=___ SRC=209.126.230.73 DST=___ LEN=40 TOS=0x00 PREC=0x00 TTL=244 ID=2962 PROTO=TCP SPT=49299 DPT=465 WINDOW=1024 RES=0x00 SYN URGP=0
Apr 27 07:55:34 ace dovecot: pop3-login: Disconnected (no auth attempts): rip=209.126.230.73, lip=___
Apr 27 07:55:34 ace kernel: [3791216.012828] [UFW LIMIT BLOCK] IN=eth0 OUT= MAC=___ SRC=209.126.230.73 DST=___ LEN=40 TOS=0x00 PREC=0x00 TTL=244 ID=58312 PROTO=TCP SPT=49299 DPT=25 WINDOW=1024 RES=0x00 SYN URGP=0
Apr 27 07:55:34 ace kernel: [3791216.133155] [UFW LIMIT BLOCK] IN=eth0 OUT= MAC=___ SRC=209.126.230.73 DST=___ LEN=76 TOS=0x00 PREC=0x00 TTL=244 ID=63315 PROTO=UDP SPT=49299 DPT=123 LEN=56
```
But then again, I get entries like these all the time; they just indicate that UFW/iptables successfully blocked some unwanted connections, so they shouldn't be related.
I have a daily backup that runs just under two hours before the start of this "event". It seemed to run normally, although it did cause a higher server load than usual (but not higher CPU utilisation), pointing to possible I/O congestion. But it didn't coincide with the 100% CPU event.
My question is: how can I investigate the cause of an event like this that happened in the past, given that it's no longer happening?
If you have CPU load graphs available, they might give further insight into what the CPU was doing at the time. It could have been waiting for disk I/O, for instance; this shows up as iowait.
If these are not available and you're having difficulty finding a cause, the incident could very well be attributed to issues on the host server: a noisy neighbor (a VM on the same host that is misbehaving), or a hardware failure (a failing disk, for example, which could also cause high iowait).
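If the sysstat package happened to be installed and collecting data before the event, `sar` keeps exactly this kind of history and can be replayed for a past day. A minimal sketch, assuming the Debian/Ubuntu default log location under /var/log/sysstat:

```
# Only useful if sysstat was installed and enabled (ENABLED="true" in
# /etc/default/sysstat) *before* the incident, so historical data exists.
sudo apt-get install sysstat

# CPU usage for the 27th of the month, including %iowait (time spent
# waiting on disk) and %steal (CPU time taken by other guests on the
# Xen host -- useful for spotting a noisy neighbor):
sar -u -f /var/log/sysstat/sa27

# Run queue length and load averages for the same day:
sar -q -f /var/log/sysstat/sa27
```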
There is a utility called atop that keeps a detailed record of your processes and would have shown the answer here. atop takes a 'snapshot' of all your processes and resource usage at a configurable interval. This is not going to help you now, but it will if this were to happen again. See the atop website for more information: https://www.atoptool.nl/
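As a sketch of what that looks like in practice, assuming the Ubuntu package and its default log location under /var/log/atop:

```
sudo apt-get install atop    # the package also installs a logging service

# atop writes a snapshot of every process to /var/log/atop/atop_YYYYMMDD
# (every 10 minutes by default; the interval is configurable, e.g. via
# /etc/default/atop on Ubuntu).

# Replay a past day interactively (substitute the actual date):
atop -r /var/log/atop/atop_YYYYMMDD

# Inside the viewer: 't' steps forward to the next snapshot, 'T' steps
# back, and 'c' shows the full command line of each process, so you can
# see exactly what was eating CPU at the time.
```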
P.S. Ubuntu 12.04 has reached end-of-life status, and you should consider upgrading the machine since no more security updates are available for this version. See the Ubuntu release cycle: https://ubuntu.com/about/release-cycle