I'm running something of a bare-bones server (based on Ubuntu 11.04) on an Amazon EC2 micro instance, whose purpose is simply to coordinate the activities of a few webservers. The machine ran well for a few weeks, but now is hanging frequently with its CPU redlined at 100%.
I logged into the machine over SSH and ran top, which revealed that the landscape-sysinfo process was the perpetrator consuming all of the system resources. A pstree revealed where it was situated:
init─┬─atd
     ├─cron
     ├─dhclient3
     ├─dovecot─┬─2*[dovecot-auth]
     │         ├─3*[imap-login]
     │         └─3*[pop3-login]
     ├─6*[getty]
     ├─master─┬─pickup
     │        └─qmgr
     ├─mountall
     ├─mysqld───11*[{mysqld}]
     ├─rsyslogd───3*[{rsyslogd}]
     ├─sshd─┬─sshd───sshd───bash
     │      ├─sshd───sshd───bash───top
     │      ├─sshd───sshd───bash───pstree
     │      └─sshd───sh───run-parts───50-landscape-sy───landscape-sys+
     ├─udevd───2*[udevd]
     ├─upstart-socket-
     ├─upstart-udev-br
     └─vsftpd
The offending process is listed here as the last child of sshd. If I manually kill landscape-sysinfo, the machine returns to normal - until the process spontaneously respawns, usually a few moments later. (I can "vouch for" the other sshd processes in the above tree. They were legitimate.)
I have no idea why landscape-sysinfo is spawning itself randomly. I doubly have no idea why it's the child of sshd.
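For anyone trying to trace a similar runaway process back to whatever launched it, here is a minimal sketch that walks up the parent chain through /proc. It assumes a Linux system with /proc mounted; the function name ancestry is just an illustrative choice, not a standard tool.

```shell
# Print "PID NAME" for a process and each of its ancestors, one per line.
# Assumes Linux with /proc mounted; "ancestry" is a hypothetical helper name.
ancestry() {
    pid=$1
    # Stop once there is no readable status file (e.g. the parent PID is 0).
    while [ -r "/proc/$pid/status" ]; do
        # "Name:" holds the command name (only its first word is printed here).
        name=$(awk '$1 == "Name:" { print $2 }' "/proc/$pid/status")
        echo "$pid $name"
        # "PPid:" holds the parent PID, which becomes the next hop.
        pid=$(awk '$1 == "PPid:" { print $2 }' "/proc/$pid/status")
    done
}
```

Calling it with the PID that top reports for the busy process should list that process first, followed by its parent, grandparent, and so on, which gives the same chain that pstree draws for a single branch.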
I'm obviously none too thrilled about having an SSH process running on my machine that I can't account for. Initially I feared a breach/trojan/backdoor, so I ran chkrootkit and rkhunter, but they both came up clean.
Does anybody have any idea what could be causing this process to run wild? Any thoughts on how to stop it from respawning?
I figured out the actual cause of the problem a while back, and figured I should document it here for the sake of others who may have similar issues. The root cause turned out to be trickier and more complicated than I initially expected.
In short, run-parts was working fine all along. Its going haywire was just the symptom of a different problem. The failure chain looked something like this:
1) On an entirely different machine, lsyncd (a file-syncing utility based on rsync) was running haywire for reasons beyond our concern here. What does concern us is that lsyncd was trying to sync files against this micro-instance (the one manifesting the problems) over SSH.
2) Because lsyncd was making dozens of simultaneous connections over SSH, each was seemingly being greeted with the SSH login banner, landscape-sysinfo, that Ubuntu provides by default. This explains what landscape-sysinfo is and why it is a child of sshd. It appeared that run-parts was the culprit, but in fact the issue was that the machine was being bombarded with SSH connections.
3) Exacerbating the issue was that this is a micro-instance on EC2, and I've since discovered that Amazon severely throttles micro-instances whose CPU consumption steadily rides above a certain threshold. (For an excellent explanation of the details, please see Greg's Ramblings. Many thanks to Greg for that article!)
Thus, the machine ran slowly for a few moments while it was being bombarded with SSH connections, and then became unusably slow after the throttling kicked in.
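If you suspect the same kind of connection flood, one way to check is to count established SSH connections per remote address. The following is a rough sketch that summarizes the output of ss -tn. It assumes sshd listens on the default port 22 and that the output has the usual State, Recv-Q, Send-Q, local address and peer address columns; count_ssh_peers is a made-up helper name.

```shell
# Read `ss -tn` style output on stdin and print "COUNT PEER" lines,
# busiest peer first. Assumes sshd is on port 22 (adjust if yours differs).
# "count_ssh_peers" is a hypothetical helper name.
count_ssh_peers() {
    awk '$1 == "ESTAB" && $4 ~ /:22$/ {
             peer = $5
             sub(/:[0-9]+$/, "", peer)   # drop the remote port, keep the address
             count[peer]++
         }
         END { for (p in count) print count[p], p }' | sort -rn
}
```

Running ss -tn | count_ssh_peers on the affected machine would then show whether a single host (in my case, the one running lsyncd) accounts for dozens of simultaneous sessions.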
Mystery solved!
It is a regularly scheduled cron job that gathers performance data.
Look here for (light) removal instructions. To just remove it entirely if you don't care about the data collection, either remove the package (if it will let you) or just find the crontab entry for it and comment it out.
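As a rough illustration of the "comment it out" route, the sketch below prefixes every active line in a given file that mentions a pattern with "# ". Which file actually holds the entry is not something I can state for certain, so the path is left as an argument; comment_out_entries is a hypothetical helper name. You would need root to edit system crontabs, and it is worth keeping a backup copy of the file first.

```shell
# Comment out every non-comment line in FILE that contains PATTERN.
# "comment_out_entries" is a hypothetical helper; the file path is up to you.
comment_out_entries() {
    pattern=$1
    file=$2
    tmp=$(mktemp) || return 1
    # Lines that mention the pattern and are not already comments get "# " prepended.
    awk -v pat="$pattern" '
        index($0, pat) && $0 !~ /^[ \t]*#/ { print "# " $0; next }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # rewrite in place, keeping permissions
    rm -f "$tmp"
}
```

For example, comment_out_entries landscape /path/to/crontab-file would disable any active line mentioning landscape in that file, where the path is a placeholder for wherever you find the entry.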