I have installed a Ceph cluster across three nodes, which needed to be completely reinstalled after testing. It seems that on one of the nodes some configuration data remained, which Ceph is still picking up.
On boot, Ceph seems to be looking for old OSDs which no longer exist. Here are the contents of our /var/log/ceph/ceph-volume.log:
[2022-03-08 09:32:10,581][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 1-f5f2a63b-540d-4277-ba18-a7db63ce5359
[2022-03-08 09:32:10,592][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 3-eb671fc9-6db3-444e-b939-ae37ecaa1446
[2022-03-08 09:32:10,825][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.2 with osd_fsid e45faa5d-f0af-45a9-8f6f-dac037d69569
[2022-03-08 09:32:10,837][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.0 with osd_fsid 16d1d2ad-37c1-420a-bc18-ce89ea9654f9
[2022-03-08 09:32:10,844][systemd][WARNING] command returned non-zero exit status: 1
[2022-03-08 09:32:10,844][systemd][WARNING] failed activating OSD, retries left: 25
[2022-03-08 09:32:10,853][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.1 with osd_fsid f5f2a63b-540d-4277-ba18-a7db63ce5359
[2022-03-08 09:32:10,853][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.0 with osd_fsid 59992b5f-806b-4bed-9951-bca0ef4e6f0a
[2022-03-08 09:32:10,855][systemd][WARNING] command returned non-zero exit status: 1
[2022-03-08 09:32:10,855][systemd][WARNING] failed activating OSD, retries left: 25
[2022-03-08 09:32:10,865][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.3 with osd_fsid eb671fc9-6db3-444e-b939-ae37ecaa1446
For comparison, the volumes we do have installed (found by ceph-volume lvm list) are:
osd fsid 3038f5ae-c579-410b-bb6d-b3590c2834ff
osd fsid b693f0d5-68de-462e-a1a8-fbdc137f4da4
osd fsid 4639ef09-a958-40f9-86ff-608ac651ca58
osd fsid c4531f50-b192-494d-8e47-533fe780bfa3
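(For reference, that inventory comes straight from running ceph-volume lvm list on the node. If you want to script the comparison of FSIDs, the command can also emit JSON; the --format flag below is an assumption about your ceph-volume version, so check ceph-volume lvm list --help first.)
# human-readable inventory of LVM-backed OSDs on this node
ceph-volume lvm list
# machine-readable output, easier to grep for a specific osd fsid
ceph-volume lvm list --format json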
Any ideas where this data might be coming from and how I can remove these 'orphaned' volumes?
I managed to resolve this. When Ceph sets up its OSDs, it also creates systemd service units to manage them.
I just had to jump into
/etc/systemd/system/multi-user.target.wants
and remove each troublesome service that was left behind by the old installation.
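In case it helps anyone else: the leftover units are typically named ceph-volume@lvm-<osd-id>-<osd-fsid>.service, and the <osd-id>-<osd-fsid> part matches the argument to the ceph-volume lvm trigger calls in the log above. A minimal cleanup sketch, using the first ID/FSID from my log purely as an illustration (your unit names will differ):
# see which ceph-volume units are still enabled at boot
ls /etc/systemd/system/multi-user.target.wants/ | grep ceph-volume
# disable an orphaned unit; this removes its symlink from multi-user.target.wants
# (deleting the symlink by hand with rm works too, which is what I did)
systemctl disable ceph-volume@lvm-1-f5f2a63b-540d-4277-ba18-a7db63ce5359.service
# then make systemd forget the stale state
systemctl daemon-reload
systemctl reset-failed
Either way, running daemon-reload afterwards keeps systemd's view consistent with what is actually on disk.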