I'm currently running a 3-node hyperconverged Proxmox/Ceph cluster. I'm in the process of transferring a large amount of data (100TB+) from an old unRAID instance to the new cluster infrastructure. I have to copy the data one HDD at a time to the new CephFS pool, then wipe that disk and add it to the OSD pool. I don't have any spare HDDs lying around, or the budget to buy more drives, which would make this process much easier.
Halfway through the process I'm now stuck: "ceph balancer status" reports "Too many objects are misplaced", and the 283 active+remapped+backfill_wait PGs have remained unchanged for over 12 hours now. The cluster is idling, but not "self healing" as I would expect it to.
Before I started this migration I pushed, pulled, and broke Ceph in a number of ways as part of testing, and I was always able to get it back to HEALTH_OK without any data loss or extended downtime beyond a service/server restart. I've read through the docs for this issue and haven't found anything useful on how to kick this back into gear:
https://docs.ceph.com/en/latest/rados/operations/health-checks/#object-misplaced
Data migration is currently on hold.
NB:
1 - There is a bit of a mismatch between my OSD sizes; the reweights in the output below were me trying to get Ceph to spread data onto the larger drives instead of constantly filling the small ones.
2 - The nearfull OSD is one of the three 4TB drives (there are 16TB drives sitting nearly empty, but it isn't balancing across to them).
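For reference, the standard views for this kind of imbalance (the plain ceph osd df output is pasted further down):

# per-host/per-OSD utilisation; the tree view groups the same numbers by host
ceph osd df tree
# names the nearfull OSD(s) and the threshold that was tripped
ceph health detail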
ceph balancer status
{
    "active": true,
    "last_optimize_duration": "0:00:00.000087",
    "last_optimize_started": "Mon Jun 3 17:56:27 2024",
    "mode": "upmap",
    "no_optimization_needed": false,
    "optimize_result": "Too many objects (0.401282 > 0.050000) are misplaced; try again later",
    "plans": []
}
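As far as I can tell, the 0.050000 in that message is the mgr option target_max_misplaced_ratio (default 5%): the balancer won't generate new plans while more than that fraction of objects is already misplaced, so the message itself is expected until backfill catches up. For completeness, it can be checked or raised like this, though raising it only lets the balancer queue more movement and does nothing to speed up the backfill that's already pending:

ceph config get mgr target_max_misplaced_ratio
ceph config set mgr target_max_misplaced_ratio 0.10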
ceph -s
  cluster:
    id:     {id}
    health: HEALTH_WARN
            1 nearfull osd(s)
            2 pgs not deep-scrubbed in time
            2 pool(s) nearfull
            1 pools have too many placement groups

  services:
    mon: 3 daemons, quorum {node1},{node2},{node3} (age 31h)
    mgr: {node3}(active, since 26h), standbys: {node1}, {node2}
    mds: 2/2 daemons up, 1 standby
    osd: 23 osds: 23 up (since 23h), 23 in (since 2h); 284 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   7 pools, 801 pgs
    objects: 10.19M objects, 37 TiB
    usage:   57 TiB used, 55 TiB / 112 TiB avail
    pgs:     12261409/30555598 objects misplaced (40.128%)
             513 active+clean
             283 active+remapped+backfill_wait
             2   active+clean+scrubbing+deep
             2   active+clean+scrubbing
             1   active+remapped+backfilling

  io:
    client: 15 MiB/s wr, 0 op/s rd, 71 op/s wr
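The checks that seem most relevant to the 283 backfill_wait PGs, as far as I understand it (each OSD only backfills one PG at a time by default, and backfill to any OSD over the backfillfull threshold is deferred entirely):

# where the nearfull / backfillfull / full thresholds currently sit
ceph osd dump | grep ratio
# which PGs are waiting and which OSDs they are moving between
ceph pg dump pgs_brief | grep backfill_wait
# the autoscaler's view behind the "too many placement groups" warning
ceph osd pool autoscale-status
# current per-OSD backfill concurrency limit
ceph config get osd osd_max_backfills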
ceph osd df
ID  CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 8  hdd     7.31639   1.00000  7.3 TiB  665 GiB  625 GiB    2 KiB  2.2 GiB  6.7 TiB   8.88  0.17   27  up
10  hdd     9.13480   1.00000  9.1 TiB   40 GiB   30 MiB    1 KiB  1.3 GiB  9.1 TiB   0.43  0.01   16  up
 5  ssd     0.72769   1.00000  745 GiB  248 GiB  246 GiB  189 MiB  2.5 GiB  497 GiB  33.32  0.65  133  up
 6  ssd     0.72769   1.00000  745 GiB  252 GiB  251 GiB  104 MiB  1.1 GiB  493 GiB  33.80  0.66  126  up
 7  hdd     5.49709   1.00000  5.5 TiB  259 GiB  219 GiB    1 KiB  1.6 GiB  5.2 TiB   4.61  0.09    9  up
22  hdd     9.13480   1.00000  9.1 TiB  626 GiB  586 GiB    1 KiB  2.7 GiB  8.5 TiB   6.70  0.13   12  up
15  ssd     0.72769   1.00000  745 GiB  120 GiB  118 GiB   53 MiB  1.3 GiB  626 GiB  16.05  0.31   71  up
16  ssd     0.87329   1.00000  894 GiB  128 GiB  126 GiB   56 MiB  1.9 GiB  766 GiB  14.35  0.28   78  up
17  ssd     0.43660   1.00000  447 GiB   63 GiB   62 GiB   25 MiB  1.2 GiB  384 GiB  14.11  0.28   40  up
18  ssd     0.43660   1.00000  447 GiB   91 GiB   89 GiB   24 MiB  1.8 GiB  357 GiB  20.25  0.40   48  up
19  ssd     0.72769   1.00000  745 GiB  132 GiB  130 GiB   67 MiB  2.1 GiB  613 GiB  17.71  0.35   82  up
20  ssd     0.72769   1.00000  745 GiB  106 GiB  104 GiB   24 MiB  1.9 GiB  639 GiB  14.28  0.28   65  up
21  ssd     0.72769   1.00000  745 GiB  127 GiB  124 GiB   62 MiB  2.2 GiB  619 GiB  17.00  0.33   75  up
 0  hdd    16.40039   1.00000   16 TiB   12 TiB   12 TiB    7 KiB   25 GiB  4.5 TiB  72.65  1.42  241  up
 1  hdd     3.66800   0.50000  3.7 TiB  2.6 TiB  2.6 TiB    6 KiB  6.1 GiB  1.1 TiB  71.08  1.39   56  up
 2  hdd     3.66800   0.09999  3.7 TiB  2.9 TiB  2.9 TiB    6 KiB  7.3 GiB  793 GiB  78.89  1.55   56  up
 3  hdd    14.58199   1.00000   15 TiB   10 TiB   10 TiB    6 KiB   22 GiB  4.4 TiB  69.94  1.37  216  up
 4  hdd     3.66800   0.09999  3.7 TiB  3.2 TiB  3.1 TiB    6 KiB  7.3 GiB  501 GiB  86.66  1.70   63  up
11  hdd    14.58199   0.95001   15 TiB   12 TiB   12 TiB    9 KiB   24 GiB  2.7 TiB  81.23  1.59  233  up
13  hdd    14.58199   0.95001   15 TiB   11 TiB   11 TiB    6 KiB   24 GiB  3.4 TiB  76.77  1.51  223  up
 9  ssd     0.72769   1.00000  745 GiB  139 GiB  137 GiB   63 MiB  1.5 GiB  606 GiB  18.65  0.37   80  up
12  ssd     1.81940   1.00000  1.8 TiB  311 GiB  308 GiB  146 MiB  2.5 GiB  1.5 TiB  16.67  0.33  182  up
14  ssd     1.45549   1.00000  1.5 TiB  247 GiB  245 GiB   88 MiB  2.2 GiB  1.2 TiB  16.57  0.32  143  up
                       TOTAL    112 TiB   57 TiB   57 TiB  902 MiB  145 GiB   55 TiB  51.00
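On the reweights: from what I've read, the upmap balancer works best when the override reweights are left at 1.00000, so the 0.09999 / 0.50000 / 0.95001 values above are probably working against it. If that's right, undoing them would look something like the sketch below (illustrative only, and presumably best left until backfill has caught up and the nearfull small drives have drained). Happy to be corrected on this.

# return the manually reweighted OSDs to full weight so upmap can handle placement
# syntax: ceph osd reweight {osd-num} {weight}
ceph osd reweight 1 1.0
ceph osd reweight 2 1.0
ceph osd reweight 4 1.0
ceph osd reweight 11 1.0
ceph osd reweight 13 1.0
# then score the resulting distribution and watch what the balancer does with it
ceph balancer eval
ceph balancer status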
I've since deleted about 1TB of redundant data and set Ceph to prioritise cluster traffic and maintenance tasks over client IO.
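The exact values aren't recorded above; the knobs involved are roughly the ones below (illustrative values, assuming a Quincy-or-later release with the default mClock scheduler):

# favour recovery/backfill over client IO in the mClock scheduler
ceph config set osd osd_mclock_profile high_recovery_ops
# on newer releases, raising backfill concurrency may need an explicit override first
ceph config set osd osd_mclock_override_recovery_settings true
ceph config set osd osd_max_backfills 2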
These settings will be reset before the cluster goes back into production; it's kinda pointless to have fast client IO if the backend can't absorb new storage any faster.
Repair ETA 2d