I recently replaced one of the HDDs that backs a brick in a GlusterFS cluster. I was able to map that HDD back into a brick and then get GlusterFS to replicate to it successfully.
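For context, the replacement itself followed the usual pattern. A minimal sketch, with a hypothetical hostname and brick paths (the volume name nova is the real one), looks something like this:

# Hypothetical host and brick paths -- adjust to the actual volume layout.
# With the new disk formatted and mounted, swap it in for the dead brick:
$ gluster volume replace-brick nova \
      node01:/bricks/nova-failed node01:/bricks/nova-new \
      commit force
# Self-heal then repopulates the new brick from its replica partner.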
However, one part of the process did not work for me: attempting to run the "heal" command on the volume with the replaced brick would continuously fail like this:
$ gluster volume heal nova
Locking failed on c551316f-7218-44cf-bb36-befe3d3df34b. Please check log file for details.
Locking failed on ae62c691-ae55-4c99-8364-697cb3562668. Please check log file for details.
Locking failed on cb78ba3c-256f-4413-ae7e-aa5c0e9872b5. Please check log file for details.
Locking failed on 79a6a414-3569-482c-929f-b7c5da16d05e. Please check log file for details.
Locking failed on 5f43c6a4-0ccd-424a-ae56-0492ec64feeb. Please check log file for details.
Locking failed on c7416c1f-494b-4a95-b48d-6c766c7bce14. Please check log file for details.
Locking failed on 6c0111fc-b5e7-4350-8be5-3179a1a5187e. Please check log file for details.
Locking failed on 88fcb687-47aa-4921-b3ab-d6c3b330b32a. Please check log file for details.
Locking failed on d73de03a-0f66-4619-89ef-b73c9bbd800e. Please check log file for details.
Locking failed on 4a780f57-37e4-4f1b-9c34-187a0c7e44bf. Please check log file for details.
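The UUIDs in those messages are peer IDs, so a reasonable first check is to map them back to hostnames and confirm that every peer is actually connected:

# Both commands list peer UUIDs alongside hostnames and connection state:
$ gluster pool list
$ gluster peer status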
The glusterd log largely echoed the above; specifically:
$ tail etc-glusterfs-glusterd.vol.log
[2015-08-03 23:08:03.289249] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.289258] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on c7416c1f-494b-4a95-b48d-6c766c7bce14. Please check log file for details.
[2015-08-03 23:08:03.289279] W [rpc-clnt-ping.c:199:rpc_clnt_ping_cbk] 0-management: socket or ib related error
[2015-08-03 23:08:03.289827] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.289858] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on d73de03a-0f66-4619-89ef-b73c9bbd800e. Please check log file for details.
[2015-08-03 23:08:03.290509] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.290529] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on 4a780f57-37e4-4f1b-9c34-187a0c7e44bf. Please check log file for details.
[2015-08-03 23:08:03.290597] E [glusterd-syncop.c:1804:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2015-08-03 23:07:03.351603] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2015-08-03 23:07:03.351644] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
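Since these are management-plane errors ("Could not find peer") rather than data-path errors, one low-risk step worth trying is restarting glusterd on the affected peers. glusterd is only the management daemon, so restarting it should leave the brick (glusterfsd) processes and client mounts untouched:

$ systemctl restart glusterd    # or: service glusterd restart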
Several other logs had also been written to around the time I attempted the heal:
$ ls -ltr
-rw------- 1 root root 41704 Aug 2 12:07 glfsheal-nova.log
-rw------- 1 root root 15986 Aug 2 12:07 cmd_history.log-20150802
-rw------- 1 root root 290359 Aug 3 19:07 var-lib-nova-instances.log
-rw------- 1 root root 221829 Aug 3 19:07 glustershd.log
-rw------- 1 root root 195472 Aug 3 19:07 nfs.log
-rw------- 1 root root 61831116 Aug 3 19:07 var-lib-nova-mnt-92ef2ec54fd18595ed18d8e6027a1b3d.log
-rw------- 1 root root 3504 Aug 3 19:08 cmd_history.log
-rw------- 1 root root 89294 Aug 3 19:08 cli.log
-rw------- 1 root root 136421 Aug 3 19:08 etc-glusterfs-glusterd.vol.log
Looking through them, it wasn't clear whether any of their contents were relevant to this particular problem.
Given that, I initially thought that the heal command could only be run from the primary node of the GlusterFS cluster. As it turned out, though, the real issue lay in the fact that the 11 nodes in the cluster were running two different versions of GlusterFS.
Once I realized this, I upgraded all of the nodes to the latest version of GlusterFS (3.7.3) and was able to run heals from any of the nodes, as one would expect.
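If you suspect a version mismatch like this, a quick way to confirm it is to query each node directly (the hostnames below are placeholders):

# Hostnames are placeholders for the actual peers in the pool.
$ for h in node01 node02 node03; do
>     printf '%s: ' "$h"; ssh "$h" gluster --version | head -1
> done

# Each node also records the cluster operating version locally:
$ grep operating-version /var/lib/glusterd/glusterd.info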