I've just set up Slurm on a single physical machine, which will be the only system in the cluster (so far). This is on Ubuntu 18.04.
I have slurmdbd running, but when I attempt to start slurmd and slurmctld, the start operation times out. Why?
I'm issuing the following commands:
systemctl start slurmctld
systemctl start slurmd
I've also tried:
systemctl start slurmctld slurmd
and:
systemctl start slurmd slurmctld
This fails with the following, for slurmctld:
systemd[1]: slurmd.service: Can't open PID file /var/run/slurm-llnl/slurm-llnl/slurmd.pid (yet?) after start: No such file or directory
systemd[1]: slurmctld.service: Start operation timed out. Terminating.
systemd[1]: slurmctld.service: Failed with result 'timeout'.
systemd[1]: Failed to start Slurm controller daemon.
And for slurmd:
systemd[1]: slurmd.service: Start operation timed out. Terminating.
systemd[1]: slurmd.service: Failed with result 'timeout'.
systemd[1]: Failed to start Slurm node daemon.
However, when I start these manually (using two terminals) by issuing:
slurmctld -Dvvv
slurmd -Dvvv
everything appears to work.
Why is this? How am I supposed to start Slurm?
These are the service files (they should be standard; I only added verbose arguments and later removed them again):
# cat /lib/systemd/system/slurmd.service
[Unit]
Description=Slurm node daemon
After=network.target munge.service
ConditionPathExists=/etc/slurm-llnl/slurm.conf
Documentation=man:slurmd(8)
[Service]
Type=forking
EnvironmentFile=-/etc/default/slurmd
ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/slurm-llnl/slurmd.pid
KillMode=process
LimitNOFILE=51200
LimitMEMLOCK=infinity
LimitSTACK=infinity
[Install]
WantedBy=multi-user.target
# cat /lib/systemd/system/slurmctld.service
[Unit]
Description=Slurm controller daemon
After=network.target munge.service
ConditionPathExists=/etc/slurm-llnl/slurm.conf
Documentation=man:slurmctld(8)
[Service]
Type=forking
EnvironmentFile=-/etc/default/slurmctld
ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/slurm-llnl/slurmctld.pid
[Install]
WantedBy=multi-user.target
Look carefully at the PID file path in your log:
systemd[1]: slurmd.service: Can't open PID file /var/run/slurm-llnl/slurm-llnl/slurmd.pid (yet?) after start: No such file or directory
This path does not match the one declared in your /lib/systemd/system/slurmd.service. To fix it, correct the SlurmdPidFile field in /etc/slurm-llnl/slurm.conf. The same goes for SlurmctldPidFile.
Note also that the easy configurator, /usr/share/doc/slurm-wlm-doc/html/configurator.easy.html, offers /var/run/slurmd.pid by default, which fails as well.
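As a minimal sketch of the fix, assuming the PIDFile= values from the two unit files quoted in the question are the ones you want to keep, the relevant lines in /etc/slurm-llnl/slurm.conf would look roughly like this. The rest of your slurm.conf is not shown in the question, so treat this as a fragment to adapt, not a complete configuration:

```
# /etc/slurm-llnl/slurm.conf (fragment)
# Each value must match PIDFile= in the corresponding systemd unit.

# Matches PIDFile= in /lib/systemd/system/slurmctld.service
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid

# Matches PIDFile= in /lib/systemd/system/slurmd.service
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
```

After editing the file, start both services again with systemctl start slurmctld slurmd. With Type=forking, systemd waits for the daemon to write the PID file at the path given by PIDFile=, so a daemon that writes it somewhere else looks to systemd like it never finished starting, which would explain the timeout.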