Within three weeks, `systemd` has suddenly become unresponsive on two of my Ubuntu 20.04 LTS servers.
Symptoms:
- All `systemctl` commands for controlling services or accessing logs fail with error messages:

        Failed to retrieve unit state: Connection timed out
        Failed to get properties: Connection timed out

- `systemd` does not heed the signal from `logrotate` for reopening its log; it keeps writing to the renamed log file `/var/log/syslog.1` while the newly created `/var/log/syslog` remains empty.
- Lots of zombie processes accumulate from cron jobs and system management tasks, i.e. PID 1 (`systemd`) neglects its duty of reaping orphaned processes.
- Running services continue to run normally, but starting or stopping services is no longer possible, as even the legacy scripts in `/etc/init.d` redirect to the non-functional `systemctl`.
- Nothing unusual in the logs except the `Connection timed out` messages from attempted interactions with `systemd`.
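For reference, this is roughly how the symptoms show up from a shell. This is only a minimal sketch on a stock Ubuntu layout; the timeout value and paths are illustrative, not taken from the affected machines:

    # Overall manager state; on the affected servers this hangs and
    # eventually fails with "Connection timed out".
    timeout 30 systemctl is-system-running

    # Zombie (defunct) processes left unreaped by PID 1.
    ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'

    # Log rotation symptom: the renamed file keeps growing while the
    # freshly created /var/log/syslog stays empty.
    ls -l /var/log/syslog /var/log/syslog.1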
The commonly proposed corrective measures:

- `systemctl daemon-reexec`
- `kill -TERM 1`
- removing `/run/systemd/system/session-*.scope.d`

do not fix the problem. The only remedy is to reboot the entire system, which is of course both disruptive and problematic for a server on the other side of the globe.
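For completeness, the attempted remedies as shell commands (a sketch of the measures listed above, run as root; the `rm` pattern is my reading of "removing `/run/systemd/system/session-*.scope.d`"):

    # Ask systemd to serialize its state and re-execute itself.
    systemctl daemon-reexec

    # Equivalent attempt via signal: SIGTERM to PID 1 makes systemd
    # re-execute rather than terminate.
    kill -TERM 1

    # Workaround sometimes suggested for stale session scopes:
    # drop the leftover transient scope drop-ins.
    rm -rf /run/systemd/system/session-*.scope.d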
The same problem occurred with Ubuntu 16.04 LTS about once per month in a population of about 100 servers. It is much less frequent since the upgrade to 20.04 LTS, but not completely gone. Of the two servers that have been hit since 20.04 LTS, one had already been hit when it was still running 16.04 LTS.
Questions:

- What are possible causes for that sort of `systemd` malfunction?
- How can I diagnose this further?
- Is there a less disruptive way to recover from an unresponsive `systemd` than to reboot?
This is a very old question, but I hope this can save someone else some time.
I had an identical problem: some zombie processes, and `systemctl` answered every request with a timeout. As expected, the fix was to remove the stuck daemons. At least in our case, the solution was: