Forgive me for the length of this question... it is mostly details... only attempt to follow if you also enjoy reading log files... or drinking coffee.
I'll state the questions first:
1) How the heck did a nano process fire off, given what I've stated below?
2) How did nano manage to consume so many resources?
3) Surely it's no coincidence that this happened while working with the ossec restarts, so is that related?
This is a Red Hat 4.1.2-46 Xen environment, three cluster members. We updated our Hurricane monitoring code manually on Jan 17 at 11:34am. Two files were changed (using nano) while ossec was running:
preloaded-vars.conf
ossec.conf
ossec was then restarted, and the root user logged off.
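For context, the update was roughly the following sequence (the preloaded-vars.conf path is approximate since that file came out of the install tarball; the ossec.conf path matches the ps output further down, and ossec-control is the stock OSSEC control script):

# nano preloaded-vars.conf
# nano /opt/ossec/etc/ossec.conf
# /opt/ossec/bin/ossec-control restart
# exit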
Unfortunately the three servers went offline (ssh still worked) because a nano process ran away (I imagine this would also have happened had I used vi, so the editor type is not in question). Oddly, no cron job started nano, no one was logged into the server at the time, and I'm sure that I properly closed out of nano. Before I killed the PID, top provided me with the following insight:
Mem: 28359680k total, 28325064k used, 34616k free, 3424k buffers
Swap: 4194296k total, 4194296k used, 0k free, 70208k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26351 root 18 0 29.7g 25g 784 R 100.1 95.6 4424:38 nano
Note: the nano editor took up roughly 28 GB of RAM.
It took just over three days for this to take our servers down. I found something else before I killed the process. Notice that the nano process began about two hours after the file was first edited and root logged off, and notice that the TTY = ?.
# ps -ef | grep nano
root 7836 7689 0 13:19 pts/5 00:00:00 grep nano
root 26351 1 99 Jan17 ? 3-01:44:46 nano /opt/ossec/etc/ossec.conf
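For anyone chasing something similar, the same details (reparented to init, no controlling terminal, real start time, and which files the process actually has open) can be pulled with standard tools; the PID here is just the one from the output above:

# ps -o pid,ppid,lstart,tty,cmd -p 26351
# ls -l /proc/26351/fd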
Thankfully after I killed the PID I had:
Mem: 28359680k total, 1189924k used, 27169756k free, 4584k buffers
Swap: 4194296k total, 260284k used, 3934012k free, 104352k cached
I first expected to find that the process status would be stopped or traced, but it was running (see the R before the %CPU usage stat in the top output above).
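The state can also be read without top; a quick check using the same PID (R = running, S = sleeping, D = uninterruptible wait, T = stopped or traced, Z = zombie):

# ps -o pid,stat,cmd -p 26351
# grep ^State /proc/26351/status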
Additional Notes: The preloaded-vars.conf file was created from a .tar file (hence the 1000:1000 ownership). It was edited by root. The .save file was created when I killed nano (and it's smaller than the original file). On two of the Xen servers nano was stuck editing preloaded-vars.conf, and on the third nano was stuck editing ossec.conf; no ossec.conf.save was created when that nano was killed.
-rwxr-xr-x 1 1000 1000 2918 Jan 17 11:04 preloaded-vars.conf
-rw------- 1 root root 2909 Jan 20 13:13 preloaded-vars.conf.save
Further Findings: I've discovered that if I open the preloaded-vars.conf file in nano and then kill the PID from another terminal, nano's default behavior is to create a preloaded-vars.conf.save file when it receives SIGHUP or SIGTERM. I still don't understand what caused it to go off the rails to begin with.
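That behavior is easy to reproduce on a scratch copy (the file names here are just examples; run the nano and the pkill from two separate terminals):

# cp preloaded-vars.conf /tmp/nano-test.conf
# nano /tmp/nano-test.conf
# pkill -TERM -f '/tmp/nano-test.conf'
# ls -l /tmp/nano-test.conf.save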
Well, the answer to (2) is probably "You don't have any resource limits configured" - check out ulimit to solve that one.
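To make that concrete, a minimal sketch (the 4 GB figure is only a placeholder, and both values are in kB):

# ulimit -v 4194304

caps the virtual memory of anything started from that shell, and a line such as

*    hard    as    4194304

in /etc/security/limits.conf makes a per-user cap persistent (applied via PAM at login).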
No clue on the others though.