We are running a Beowulf cluster using the Scyld distribution from Penguin Computing, and it looks like cgroups are configured on the head node, but not the compute nodes. I'm trying to configure Slurm to use the proctrack/cgroup plugin, but it won't work on the compute nodes.
For example, I can list the cgroups on the head node, but not on a compute node:
$ bpsh -1 systemd-cgls
├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
├─user.slice
...
$ bpsh 1 systemd-cgls
Failed to create bus connection: No such file or directory
$
If I look at the mount point for the cgroup system, it's mounted on the head node, but not the compute nodes. The compute nodes just have an empty directory at that location.
$ bpsh -1 findmnt /sys/fs/cgroup
TARGET SOURCE FSTYPE OPTIONS
/sys/fs/cgroup tmpfs tmpfs ro,nosuid,nodev,noexec,mode=755
$ bpsh 1 findmnt /sys/fs/cgroup
$ bpsh 1 ls -l /sys/fs/cgroup
total 0
$
I assume I have to start some cgroup service on the compute nodes, but how? I found the RHEL documentation on cgroups, but it only describes using them, not the initial setup.
Update
The cgroups(7) man page on man7.org describes how to mount cgroup controllers, but also says this:
Note that on many systems, the v1 controllers are automatically mounted under /sys/fs/cgroup; in particular, systemd(1) automatically creates such mount points.
That explains why I can't see any configuration for cgroups on the head node: they're just mounted automatically. Why aren't they mounted automatically on the compute nodes?
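As far as I can tell from that man page, mounting the v1 controllers by hand would look something like the following. This is an untested sketch, not something I have confirmed works on Scyld compute nodes: the function name `print_cgroup_v1_mounts` and the `CGROOT` variable are my own, and it only prints the commands so they can be reviewed before being run as root on a node (for example through `bpsh`).

```shell
# Root directory where systemd normally places the v1 hierarchies.
CGROOT=/sys/fs/cgroup

# Print (do not run) the commands that would mount each named cgroup v1
# controller on its own hierarchy under $CGROOT.
print_cgroup_v1_mounts() {
    # A tmpfs at the root gives the per-controller directories a place to live.
    echo "mount -t tmpfs cgroup_root $CGROOT"
    for ctrl in "$@"; do
        echo "mkdir -p $CGROOT/$ctrl"
        echo "mount -t cgroup -o $ctrl none $CGROOT/$ctrl"
    done
}

# Example (prints seven lines):
#   print_cgroup_v1_mounts freezer cpuset memory
```

Each controller gets a separate hierarchy here. systemd on the head node co-mounts some of them instead (cpu with cpuacct, net_cls with net_prio), so the resulting layout would not be identical.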
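It looks like the cgroup controllers are enabled in the kernel on the compute node, but not attached to any hierarchy (hierarchy ID 0), i.e. not mounted. Compare the head node (first listing) with compute node 0 (second listing):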
$ cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 6 1 1
cpu 4 1 1
cpuacct 4 1 1
memory 2 1 1
devices 3 1 1
freezer 10 1 1
net_cls 7 1 1
blkio 5 1 1
perf_event 9 1 1
hugetlb 8 1 1
pids 11 1 1
net_prio 7 1 1
$ bpsh 0 cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 0 1 1
cpu 0 1 1
cpuacct 0 1 1
memory 0 1 1
devices 0 1 1
freezer 0 1 1
net_cls 0 1 1
blkio 0 1 1
perf_event 0 1 1
hugetlb 0 1 1
pids 0 1 1
net_prio 0 1 1
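To make the comparison less error-prone, the two listings above can be reduced to just the controllers that are enabled but not attached to a hierarchy. This is a small sketch with a helper name of my own (`unmounted_controllers`); it assumes cgroup v1, where a hierarchy ID of 0 in the second column means the controller is not mounted.

```shell
# Read /proc/cgroups-formatted text on stdin and print the names of the
# controllers that are enabled (column 4 is 1) but have hierarchy ID 0.
unmounted_controllers() {
    awk '!/^#/ && $2 == 0 && $4 == 1 { print $1 }'
}

# On the head node:    unmounted_controllers < /proc/cgroups
# On compute node 0:   bpsh 0 cat /proc/cgroups | unmounted_controllers
```

Given the output above, I would expect this to print nothing on the head node and all twelve controllers on the compute node.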
I tried searching for "cgroup" in /var/log/messages, and I found the head node initializing the cgroup subsystems, but nothing from the compute nodes.