I'm running an Ubuntu 16.04 container under Proxmox 5.2-11. After applying the latest round of patches [1] I'm unable to log in at the console or over ssh.
I mounted the container root FS on the hypervisor and added pts/0 to /etc/security/access.conf (we run pam_access), and that allowed root login at the console. We have root : lxc/tty0 lxc/tty1 lxc/tty2 in access.conf, which I thought was sufficient, so why I now need pts/0 is puzzling.
I noticed ssh was not running, so I tried starting it by hand (/usr/sbin/sshd -DDD -f /etc/ssh/sshd_config) and received this error:
Missing privilege separation directory: /var/run/sshd
I created the directory by hand, started ssh, and was finally able to log in, but after a reboot the problem persists. The directory is not being created.
The only useful bits are in journalctl [2], and the only interesting part is something about "operation not permitted", with no further info.
I'm not too familiar with 16.04, so I'm wondering how I can find out more about the problem. I have no /var/log/syslog or /var/log/messages, only kern.log, so I'm kind of lost.
[1]
systemd-sysv 229-4ubuntu21.9
libpam-systemd 229-4ubuntu21.9
libsystemd0 229-4ubuntu21.9
systemd 229-4ubuntu21.9
udev 229-4ubuntu21.9
libudev1 229-4ubuntu21.9
iproute2 4.3.0-1ubuntu3.16.04.4
libsasl2-modules-db 2.1.26.dfsg1-14ubuntu0.1
libsasl2-2 2.1.26.dfsg1-14ubuntu0.1
ldap-utils 2.4.42dfsg-2ubuntu3.4
libldap-2.4-2 2.4.42dfsg-2ubuntu3.4
libsasl2-modules 2.1.26.dfsg1-14ubuntu0.1
libgs9-common 9.25dfsg1-0ubuntu0.16.04.3
ghostscript 9.25dfsg1-0ubuntu0.16.04.3
libgs9 9.25dfsg1-0ubuntu0.16.04.3
[2]
Nov 27 10:13:48 host16 systemd[1]: Starting OpenBSD Secure Shell server...
Nov 27 10:13:48 host16 sshd[474]: Missing privilege separation directory: /var/run/sshd
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Control process exited, code=exited status=255
Nov 27 10:13:48 host16 systemd[1]: Failed to start OpenBSD Secure Shell server.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Unit entered failed state.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Failed with result 'exit-code'.
Nov 27 10:13:48 host16 mysqld_safe[495]: Starting mysqld daemon with databases from /var/lib/mysql/mysql
Nov 27 10:13:48 host16 mysqld[500]: 181127 10:13:48 [Note] /usr/sbin/mysqld (mysqld 10.0.36-MariaDB-0ubuntu0.16.04.1) starting as process 499 ...
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Service hold-off time over, scheduling restart.
Nov 27 10:13:48 host16 systemd[1]: Stopped OpenBSD Secure Shell server.
Nov 27 10:13:48 host16 systemd[1]: Failed to reset devices.list on /system.slice/ssh.service: Operation not permitted
Nov 27 10:13:48 host16 systemd[1]: Starting OpenBSD Secure Shell server...
Nov 27 10:13:48 host16 sshd[502]: Missing privilege separation directory: /var/run/sshd
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Control process exited, code=exited status=255
Nov 27 10:13:48 host16 systemd[1]: Failed to start OpenBSD Secure Shell server.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Unit entered failed state.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Failed with result 'exit-code'.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Service hold-off time over, scheduling restart.
Nov 27 10:13:48 host16 systemd[1]: Stopped OpenBSD Secure Shell server.
Nov 27 10:13:48 host16 systemd[1]: Failed to reset devices.list on /system.slice/ssh.service: Operation not permitted
Nov 27 10:13:48 host16 systemd[1]: Starting OpenBSD Secure Shell server...
Nov 27 10:13:48 host16 sshd[503]: Missing privilege separation directory: /var/run/sshd
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Control process exited, code=exited status=255
Nov 27 10:13:48 host16 systemd[1]: Failed to start OpenBSD Secure Shell server.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Unit entered failed state.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Failed with result 'exit-code'.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Service hold-off time over, scheduling restart.
Nov 27 10:13:48 host16 systemd[1]: Stopped OpenBSD Secure Shell server.
Nov 27 10:13:48 host16 systemd[1]: Failed to reset devices.list on /system.slice/ssh.service: Operation not permitted
Nov 27 10:13:48 host16 systemd[1]: Starting OpenBSD Secure Shell server...
Nov 27 10:13:48 host16 sshd[504]: Missing privilege separation directory: /var/run/sshd
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Control process exited, code=exited status=255
Nov 27 10:13:48 host16 systemd[1]: Failed to start OpenBSD Secure Shell server.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Unit entered failed state.
Nov 27 10:13:48 host16 systemd[1]: ssh.service: Failed with result 'exit-code'.
Nov 27 10:13:49 host16 systemd[1]: ssh.service: Service hold-off time over, scheduling restart.
Nov 27 10:13:49 host16 systemd[1]: Stopped OpenBSD Secure Shell server.
Nov 27 10:13:49 host16 systemd[1]: ssh.service: Start request repeated too quickly.
Nov 27 10:13:49 host16 systemd[1]: Failed to start OpenBSD Secure Shell server.
Nov 27 10:13:49 host16 systemd[1]: ssh.service: Unit entered failed state.
Nov 27 10:13:49 host16 systemd[1]: ssh.service: Failed with result 'start-limit-hit'.
Nov 27 10:13:49 host16 systemd[1]: Started /etc/rc.local Compatibility.
Nov 27 10:13:49 host16 systemd[1]: Failed to reset devices.list on /system.slice/plymouth-quit.service: Operation not permitted
Nov 27 10:13:49 host16 systemd[1]: Starting Terminate Plymouth Boot Screen...
Nov 27 10:13:49 host16 systemd[1]: Failed to reset devices.list on /system.slice/plymouth-quit-wait.service: Operation not permitted
Nov 27 10:13:49 host16 systemd[1]: Starting Hold until boot process finishes up...
Nov 27 10:13:49 host16 systemd[1]: Failed to reset devices.list on /system.slice/rc-local.service: Operation not permitted
Nov 27 10:13:49 host16 systemd[1]: Started Hold until boot process finishes up.
Nov 27 10:13:49 host16 systemd[1]: Started Container Getty on /dev/pts/1.
Nov 27 10:13:49 host16 systemd[1]: Started Container Getty on /dev/pts/0.
Nov 27 10:13:49 host16 systemd[1]: Failed to reset devices.list on /system.slice/console-getty.service: Operation not permitted
Nov 27 10:13:49 host16 systemd[1]: Started Console Getty.
Nov 27 10:13:49 host16 systemd[1]: Reached target Login Prompts.
Nov 27 10:13:49 host16 systemd[1]: Started Terminate Plymouth Boot Screen.
Nov 27 10:13:52 host16 nslcd[338]: accepting connections
Nov 27 10:13:52 host16 nslcd[275]: ...done.
Nov 27 10:13:52 host16 systemd[1]: Started LSB: LDAP connection daemon.
Nov 27 10:13:52 host16 systemd[1]: Failed to reset devices.list on /system.slice/cron.service: Operation not permitted
Nov 27 10:13:52 host16 systemd[1]: Started Regular background program processing daemon.
Nov 27 10:13:52 host16 systemd[1]: Failed to reset devices.list on /system.slice/atd.service: Operation not permitted
Added systemd-tmpfiles --create output
Really bizarre... I checked /tmp and those files don't exist.
One mistake you made was trying to start sshd by hand. If you instead start sshd through official means, it should just work. The service command knows the correct way to start a service on your distribution, and this should work:
In case of sysv init scripts, that's everything you need to do. The reason the directory is missing is that /var/run is a symlink to /run, and /run is a tmpfs mount point. That means on each boot /var/run will start out empty. When you use the service command, the /etc/init.d/ssh script will be used to start sshd, but before doing that the script will create /var/run/sshd if it doesn't exist.
With systemd things work a bit differently. There will be a file called /usr/lib/tmpfiles.d/sshd.conf with this content:
During boot this should cause the /var/run/sshd directory to be created. What you need to verify is that the file exists and has the correct contents. If the /var/run/sshd directory is still missing, you can check whether it gets created when you run systemd-tmpfiles --create manually.
So /run (and /var/run symlinked to it) gets recreated every reboot. Except that systemd-tmpfiles isn't doing that for some files, including (/var)/run/sshd.
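To check whether systemd-tmpfiles is even being asked to create the directory, something along these lines could be used. It is only a sketch: the helper name has_tmpfiles_dir_entry is made up for illustration, and it assumes the entry is a plain "d" line with the directory in the second field.

```shell
#!/bin/sh
# Hypothetical helper: report whether a tmpfiles.d file declares a directory.
# $1 = tmpfiles.d config file, $2 = directory expected on a "d" line.
# Returns 0 if found, 1 if not found, 2 if the file cannot be read.
has_tmpfiles_dir_entry() {
    conf=$1
    dir=$2
    [ -r "$conf" ] || return 2
    # Matches lines shaped like: d /var/run/sshd 0755 root root
    awk -v dir="$dir" '$1 == "d" && $2 == dir { found = 1 } END { exit !found }' "$conf"
}

# Usage on the affected system (as root):
#   has_tmpfiles_dir_entry /usr/lib/tmpfiles.d/sshd.conf /var/run/sshd \
#       && echo "entry present" || echo "entry missing or file unreadable"
#   systemd-tmpfiles --create
#   ls -ld /var/run/sshd
```

If the entry is present but the directory still does not appear after running systemd-tmpfiles --create, the problem is more likely in how the entry is processed than in the file itself.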
Apparently, this is fixed by an OpenVZ kernel upgrade. But to actually fix it now, you edit /usr/lib/tmpfiles.d/sshd.conf and remove /var from the line d /var/run/sshd 0755 root root so that it reads instead: d /run/sshd 0755 root root
And that's it!
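The same edit can be scripted rather than done in an editor. The following is a minimal sketch, assuming the file contains the "d /var/run/sshd ..." line quoted above; the function name drop_var_prefix is hypothetical.

```shell
#!/bin/sh
# Rewrite "d /var/run/sshd ..." to "d /run/sshd ..." in a tmpfiles.d file.
# $1 = path to the config file (on the affected system this would be
#      /usr/lib/tmpfiles.d/sshd.conf, edited as root).
drop_var_prefix() {
    conf=$1
    tmp=$(mktemp) || return 1
    # Only touch directory ("d") lines that point at /var/run/sshd.
    sed 's|^d\([[:space:]][[:space:]]*\)/var/run/sshd|d\1/run/sshd|' "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"    # write back in place, keeping owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage (as root), keeping a backup first:
#   cp /usr/lib/tmpfiles.d/sshd.conf /root/sshd.conf.bak
#   drop_var_prefix /usr/lib/tmpfiles.d/sshd.conf
```

Lines that do not match are left untouched, so running it a second time changes nothing.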
And when openssh-server gets upgraded, we hope that they will have fixed this bug (or is it really a bug in systemd? or OpenVZ?); otherwise you could run into the same problem.
Apparently this gets resolved when running OpenVZ kernel 2.6.32-042stab134.7 or newer. I find it strange that no fix seems possible in the systemd start scripts. Probably an ugly hack like automatically creating /run/sshd/ at startup and then starting sshd would work.
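The "ugly hack" mentioned here could look roughly like the sketch below. The function name ensure_privsep_dir is made up, and where it gets called from (anything that runs before sshd starts) is left open; the mode 0755 is taken from the tmpfiles.d line quoted earlier in the thread.

```shell
#!/bin/sh
# Create the privilege separation directory if it is missing.
# $1 = directory to create; defaults to /run/sshd.
ensure_privsep_dir() {
    dir=${1:-/run/sshd}
    [ -d "$dir" ] || mkdir -p "$dir" || return 1
    # Same mode as in the tmpfiles.d entry: 0755.
    chmod 0755 "$dir"
}

# Usage (as root, before sshd is started):
#   ensure_privsep_dir && service ssh start
```

This only works around the symptom; the directory has to be recreated on every boot because /run is a tmpfs.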
The output of my systemd-tmpfiles --create:
The changelog of OpenVZ 2.6.32-042stab134.7 says this:
For as much trouble as I've had with systemd over the years, I must admit this issue stems instead from the Ansible synchronize directive.
For some reason, after provisioning this host with our Ansible scripts, the / directory (as well as /etc, /opt and others) was left owned by an admin user rather than root. After running chown to correct things, /var/run/sshd is now created on boot again.
I really appreciate all the input, but there is no bug here, at least in the sense that applying inappropriate ownership to root directories caused undefined system behavior.
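For anyone wanting to rule out the same ownership problem, a quick check could look like this. It is a sketch only: the helper names are hypothetical, and the directory list in the usage comment is illustrative rather than the exact set that was affected here.

```shell
#!/bin/sh
# Print the numeric owner (uid) of a path, using only ls and awk.
owner_uid() {
    ls -ldn -- "$1" | awk '{ print $3 }'
}

# Print each given path that exists but is not owned by uid 0 (root).
list_not_root_owned() {
    for path in "$@"; do
        [ -e "$path" ] || continue
        [ "$(owner_uid "$path")" = "0" ] || printf '%s\n' "$path"
    done
}

# Usage (directory list is illustrative):
#   list_not_root_owned / /etc /opt /var /run
# Then, as root, for each path reported:
#   chown root:root /etc
```

The check deliberately looks only at the directories themselves and does not recurse, since plenty of files below them are legitimately owned by other users.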
I also had this behavior. The problem in my case was that ssh.socket had somehow been enabled. After disabling ssh.socket, ssh.service starts normally on boot.
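Checking for this and undoing it might look like the sketch below. The function names are made up for illustration; the systemctl subcommands used are is-enabled and disable with --now, which stops the unit as well as disabling it.

```shell
#!/bin/sh
# Return 0 if systemctl reports the given unit as enabled.
unit_is_enabled() {
    [ "$(systemctl is-enabled "$1" 2>/dev/null)" = "enabled" ]
}

# Disable and stop ssh.socket if it is enabled; ssh.service is left alone.
disable_ssh_socket() {
    if unit_is_enabled ssh.socket; then
        systemctl disable --now ssh.socket
    else
        echo "ssh.socket is not enabled; nothing to do"
    fi
}

# Usage (as root on the affected system):
#   disable_ssh_socket
#   systemctl status ssh.service
```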
One way I've seen around this is to simply create that directory yourself in your Dockerfile.