I've noticed that my AWS server occasionally starts using a bunch of CPU for no particular reason, looking something like this:
Observe that it does not occur at specific times, but has a very definite pattern to it. It lasts just under an hour.
Remoting to the machine during this occurrence would invariably make it stop happening. Leaving the account permanently logged on allowed me to capture a more fine-grained CPU usage trace. It looked like this:
That's right; the processes that actually consume that CPU are not in the list. Instead, they appear and disappear all the time. ProcMon was obviously the tool for the job, so I captured a trace. This is what I found:
There's also Postgres involved:
However, all the CPU usage comes from Winlogon, LogonUI, and related processes:
Here's a short excerpt of process start and stop events during this occurrence:
Note that postgres is not interleaved with each start/stop of smss/winlogon/etc, but only some of them.
Any ideas why this happens, and how to prevent it?
As for the Postgres part: Postgres creates a process, not a thread, for each session. This is quite costly on Windows (but rather efficient on Unix systems).
The Winlogon/LogonUI part is rather strange. Is the server remotely accessible? Could there be a network scanner on the network that tries to open port 3389 on the server and thus spawns an RDP session, which would explain the smss/winlogon/logonui sequence? I suspect a network scanner because the session is closed immediately.
So my guess for the bounty: you have an nmap process or some "network discovery" tool scanning ports on your network, or your server is open to the internet with no firewall on port 3389 (and maybe 5432).
The problem was that someone was brute-forcing my RDP login. A secondary issue was that network level authentication was disabled, making each login attempt relatively CPU-expensive.
The solution was to change the RDP port away from 3389 to stop the brute force attacks, and to enable network level authentication to reduce the CPU cost of a logon attempt.
Tip #1, from syneticon-dj: check the event logs. These spikes were correlated with lots of logon failures, trying usernames like "john", "admin", "test", etc, each one with about 3-5 different passwords. They arrived 3-4 seconds apart.
Tip #2, from Olivier S: this server, being an Amazon EC2 instance, requires RDP. The real problem was that by default, EC2 machines have Network Level Authentication disabled, for some reason. This means that every time someone wants to attempt a password, an entire logon UI is spun up, just to present them with a pretty remote desktop session. This is what caused all the CPU usage.
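If you want to check for the pattern from Tip #1 in your own logs, here is a rough Python sketch. It assumes you have already exported the failed-logon events (event ID 4625 in the Security log on current Windows versions) into `(timestamp, username)` pairs; the function name `summarize_failures` and the sample data are made up for illustration and are not taken from my actual logs.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import median


def summarize_failures(events):
    """Summarize failed logons given as (datetime, username) pairs.

    Returns the number of attempts per username and the median gap,
    in seconds, between consecutive attempts (None if fewer than two).
    """
    ordered = sorted(events)
    per_user = Counter(user for _, user in ordered)
    gaps = [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return per_user, (median(gaps) if gaps else None)


# Hypothetical sample: a few guessed usernames, one attempt every 3 seconds.
start = datetime(2024, 1, 1, 12, 0, 0)
usernames = ["john"] * 3 + ["admin"] * 4 + ["test"] * 3
sample = [(start + timedelta(seconds=3 * i), u) for i, u in enumerate(usernames)]

per_user, median_gap = summarize_failures(sample)
print(per_user.most_common(), median_gap)
```

Many different usernames with only a handful of attempts each, arriving at a steady interval of a few seconds, is what a dictionary attack looked like in my case.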
I found the answer to this was 15 people trying to brute force my RDP on port 3389.
Open up a command prompt and type
netstat -n
Look for your IP:3389. If there is more than one connection that isn't yours, somebody is trying to get in. The solution to stop the near-100% CPU usage was to change the default port 3389 to something else.
The port is stored in the registry, in the PortNumber value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp; you can google the exact steps.
You might also need to modify your firewall rules accordingly.
This cured my problem and I have my CPU back!
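If you would rather not eyeball the netstat output, the check above can be scripted. This is only a sketch: it assumes the usual Windows `netstat -n` column layout (Proto, Local Address, Foreign Address, State), the function name `remote_peers_on_port` is my own, and the sample output uses made-up documentation addresses rather than anything from my server.

```python
from collections import Counter


def remote_peers_on_port(netstat_output, local_port=3389):
    """Count TCP connections per remote address for one local port."""
    peers = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and non-TCP rows.
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        local, foreign = fields[1], fields[2]
        # Split on the last colon so IPv6 forms like [::1]:3389 also work.
        port = local.rpartition(":")[2]
        if port != str(local_port):
            continue
        peers[foreign.rpartition(":")[0]] += 1
    return peers


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.5:3389          198.51.100.7:50211     ESTABLISHED
  TCP    10.0.0.5:3389          203.0.113.9:61022      ESTABLISHED
  TCP    10.0.0.5:3389          203.0.113.9:61023      TIME_WAIT
  TCP    10.0.0.5:5432          10.0.0.8:49822         ESTABLISHED
"""

peers = remote_peers_on_port(SAMPLE)
for address, count in peers.most_common():
    print(address, count)
```

On the server itself you could feed it live data with `subprocess.run(["netstat", "-n"], capture_output=True, text=True).stdout`. Any remote address in the result that isn't your own is worth a closer look.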