Ping a Specific Port

Question

Matthew

Asked: 2013-06-01 10:43:40 +0800 CST

"Cannot fork: retry..." error in RH5. Need info on nproc

We had a server effectively go down this morning. SSH access cut out, and at least temporarily network access went down as well. We were able to log in using out-of-band access and were presented with a screen full of "Init: cannot fork, retry.." messages.

When we tried to log in with a valid user ID and a bad password, we got the normal "invalid user/pass" error. With a correct user ID and password, however, we were simply shown the MOTD and then the login prompt again. It looks like the system was no longer able to launch any new processes (a successful login should start a shell; if it can't, I guess it drops you back at the login prompt?).

I found a description of the issue at Red Hat's knowledgebase (https://access.redhat.com/site/solutions/39497), but there is very little supplementary information on the error, just a suggested solution.

What exactly does nproc do? Is it a hard limit on the number of processes the system can have running at any point in time? When nproc is exceeded does it cause impacts like we saw? Is there any way to set it to unlimited? If not, how can we know what a safe or unsafe range is?

Any help or guidance would be very much appreciated, since it caused production issues and is now on the plate of several layer-8 folks :(

Edit: Also in /var/log/messages:

May 31 15:26:00 servername udevd[1637]: udev_event_run: fork of child failed: Resource temporarily unavailable
May 31 15:26:00 servername last message repeated 3 times
May 31 15:26:00 servername udevd-event[2461]: run_program: fork of '/lib/udev/udev_run_hotplugd' failed: Resource temporarily unavailable
May 31 15:26:00 servername udevd-event[2461]: run_program: fork of '/lib/udev/udev_run_devd' failed: Resource temporarily unavailable
May 31 15:26:00 servername udevd[1637]: udev_event_run: fork of child failed: Resource temporarily unavailable
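For reference, "Resource temporarily unavailable" is the C library's message for errno `EAGAIN`, which is what `fork()` fails with when the kernel refuses to create another process, typically because a process or thread limit such as nproc has been reached. The "cannot fork, retry" wording suggests the caller simply waits and tries again. Below is a minimal Python sketch of that retry pattern; `fork_with_retry` and its parameters are hypothetical names for illustration, not init's or udev's actual code.

```python
import errno
import os
import time


def fork_with_retry(max_attempts=5, delay=1.0):
    """Call os.fork(), retrying while it fails with EAGAIN.

    Hypothetical helper. Returns what os.fork() returns: 0 in the child,
    the child's PID in the parent. Re-raises the error if it is not
    EAGAIN or if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return os.fork()
        except OSError as exc:
            # EAGAIN is reported as "Resource temporarily unavailable".
            if exc.errno != errno.EAGAIN or attempt == max_attempts:
                raise
            print(f"cannot fork ({exc.strerror}), retry {attempt}/{max_attempts}")
            time.sleep(delay)


if __name__ == "__main__":
    pid = fork_with_retry()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup handlers
    _, status = os.waitpid(pid, 0)
    print(f"child {pid} exited with status {os.WEXITSTATUS(status)}")
```

Retrying only helps if the limit frees up (for example, when other processes exit); if something keeps the count pinned at the limit, every attempt fails and the messages simply repeat.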

2 Answers

Soham Chakraborty · Answer 1 · 2013-11-29T16:55:34+08:00

The error message means that the server hit its limit on the number of processes. There are two limits, soft and hard: the soft limit is the one that is enforced, and a non-root user can raise it only up to the hard limit. When you call fork(), you create a new process from an existing one; here, some condition is preventing fork() from succeeding.
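To see where a system stands against these limits, the sketch below reads the calling user's soft and hard nproc values (`RLIMIT_NPROC`, the limit `ulimit -u` shows and the `nproc` item in /etc/security/limits.conf sets; "unlimited" appears as `RLIM_INFINITY`) and counts the tasks that user currently owns. It assumes a Linux /proc filesystem. On Linux the limit applies per real user ID and counts threads, not just processes, so the sketch sums each process's `Threads:` field. It also prints the system-wide `kernel.threads-max` ceiling. The function names are illustrative, and note that the kernel does not enforce nproc for root.

```python
import os
import resource


def describe_limit(value):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def count_user_tasks(uid, proc="/proc"):
    """Sum the thread counts of all processes whose real UID is `uid`.

    RLIMIT_NPROC is checked against tasks (threads) owned by the real
    UID on Linux, so threads are counted rather than just processes.
    """
    total = 0
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "status")) as fh:
                fields = dict(line.split(":", 1) for line in fh if ":" in line)
        except OSError:
            continue  # the process exited while we were scanning
        uid_field = fields.get("Uid")
        # The first number on the Uid: line is the real UID.
        if uid_field and int(uid_field.split()[0]) == uid:
            total += int(fields.get("Threads", "1"))
    return total


def read_int(path):
    """Read a single integer from a /proc/sys file."""
    with open(path) as fh:
        return int(fh.read().strip())


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print(f"nproc soft limit: {describe_limit(soft)}")
    print(f"nproc hard limit: {describe_limit(hard)}")
    print(f"tasks owned by UID {os.getuid()}: {count_user_tasks(os.getuid())}")
    print(f"kernel.threads-max: {read_int('/proc/sys/kernel/threads-max')}")
```

If the in-use count sits close to the soft limit under normal load, that user has little headroom; comparing readings taken at busy times against the limit is one way to judge what a safe value might be.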

udev is failing to fork its child processes. My guess is that this happens at boot time. Look at this line:

/lib/udev/udev_run_hotplugd

So there is some hot-pluggable device involved. Otherwise, I don't see a reason for that program to be called.

Two suggestions for now:

1) If you can reproduce it, run it under strace and find the syscall where it fails. It is much easier that way. I don't remember exactly which syscall it is.

2) Run udev in debug mode: change udev_log=info to udev_log=debug (in /etc/udev/udev.conf), BUT test it first. It produces a HUGE amount of logs, and without a large ring buffer or an enormously wide monitor, it is easy to miss the messages shown on the terminal.
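One way to reproduce the nproc failure deliberately is to lower the soft limit in a throwaway helper process and then fork. The Python sketch below does that with the hypothetical helper `fork_under_nproc_limit`; for a non-root user, a limit of 1 should make the helper's fork fail with `EAGAIN`, while root is exempt and the fork succeeds. Running the script under `strace -f` should show the failing call; on Linux, glibc implements `fork()` with the `clone` system call.

```python
import errno
import os
import resource


def fork_under_nproc_limit(limit):
    """Fork a throwaway helper that lowers its own soft RLIMIT_NPROC to
    `limit` and then tries to fork once more.

    Hypothetical helper. Returns "OK" if that fork succeeded, the errno
    name (e.g. "EAGAIN") if it failed, or "" if the helper could not set
    the limit. The calling process's own limits are never changed.
    """
    read_fd, write_fd = os.pipe()
    helper = os.fork()
    if helper == 0:  # helper process
        try:
            os.close(read_fd)
            _, hard = resource.getrlimit(resource.RLIMIT_NPROC)
            resource.setrlimit(resource.RLIMIT_NPROC, (limit, hard))
            try:
                child = os.fork()
            except OSError as exc:
                result = errno.errorcode.get(exc.errno, str(exc.errno))
            else:
                if child == 0:
                    os._exit(0)  # grandchild: exit immediately
                os.waitpid(child, 0)
                result = "OK"
            os.write(write_fd, result.encode())
        finally:
            os._exit(0)  # never return into the caller's code from the helper
    os.close(write_fd)
    with os.fdopen(read_fd) as fh:
        result = fh.read()
    os.waitpid(helper, 0)
    return result


if __name__ == "__main__":
    print("fork with nproc=1:", fork_under_nproc_limit(1))
```

Doing the experiment in a separate helper keeps the lowered limit from affecting the shell or script you run it from.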

I have seen this issue a lot, though. If you have a subscription, why not ask the Red Hat folks?

mdpc · Answer 2 · 2013-06-01T10:48:27+08:00

Sounds like either (1) you ran out of memory and swap space, or (2) an errant process flooded your process table, preventing new processes from being created.
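Both possibilities can be checked from /proc. The sketch below (Linux only; the function names are illustrative) prints free memory and swap from /proc/meminfo and then counts running processes by command name, so that a single program with an unusually large number of copies stands out. It reads the `Name:` line of /proc/<pid>/status, which older kernels also provide.

```python
import os
from collections import Counter


def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into {field: value}; most values are in kB."""
    info = {}
    with open(path) as fh:
        for line in fh:
            key, _, rest = line.partition(":")
            parts = rest.split()
            if parts:
                info[key] = int(parts[0])
    return info


def processes_by_name(proc="/proc"):
    """Count live processes per command name (first line of each status file)."""
    counts = Counter()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        try:
            with open(os.path.join(proc, entry, "status")) as fh:
                name = fh.readline().partition(":")[2].strip()
        except OSError:
            continue  # the process exited while we were scanning
        counts[name] += 1
    return counts


if __name__ == "__main__":
    mem = read_meminfo()
    print(f"MemFree:  {mem['MemFree']} kB of {mem['MemTotal']} kB")
    print(f"SwapFree: {mem.get('SwapFree', 0)} kB of {mem.get('SwapTotal', 0)} kB")
    counts = processes_by_name()
    print(f"{sum(counts.values())} processes in total; most common commands:")
    for name, n in counts.most_common(5):
        print(f"{n:6d}  {name}")
```

Each process counted here also counts against its owner's nproc limit, so a command name with hundreds of entries is a natural first suspect for case (2).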
