I recently replaced one of the HDDs that backs a brick in a GlusterFS cluster. I was able to map that HDD back into a brick and then get GlusterFS to replicate to it successfully.
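For context, the replacement itself followed the usual pattern. A minimal sketch, with a hypothetical hostname and brick paths (the volume name nova is the real one), looks something like this:

# Hypothetical host and brick paths -- adjust to the actual volume layout.
# With the new disk formatted and mounted, swap it in for the dead brick:
$ gluster volume replace-brick nova \
      node01:/bricks/nova-failed node01:/bricks/nova-new \
      commit force
# Self-heal then repopulates the new brick from its replica partner.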
However, one part of the process did not work for me: attempting to run the "heal" command on the volume with the replaced brick would continuously fail like this:
$ gluster volume heal nova
Locking failed on c551316f-7218-44cf-bb36-befe3d3df34b. Please check log file for details.
Locking failed on ae62c691-ae55-4c99-8364-697cb3562668. Please check log file for details.
Locking failed on cb78ba3c-256f-4413-ae7e-aa5c0e9872b5. Please check log file for details.
Locking failed on 79a6a414-3569-482c-929f-b7c5da16d05e. Please check log file for details.
Locking failed on 5f43c6a4-0ccd-424a-ae56-0492ec64feeb. Please check log file for details.
Locking failed on c7416c1f-494b-4a95-b48d-6c766c7bce14. Please check log file for details.
Locking failed on 6c0111fc-b5e7-4350-8be5-3179a1a5187e. Please check log file for details.
Locking failed on 88fcb687-47aa-4921-b3ab-d6c3b330b32a. Please check log file for details.
Locking failed on d73de03a-0f66-4619-89ef-b73c9bbd800e. Please check log file for details.
Locking failed on 4a780f57-37e4-4f1b-9c34-187a0c7e44bf. Please check log file for details.
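The UUIDs in those messages are peer IDs, so a reasonable first check is to map them back to hostnames and confirm that every peer is actually connected:

# Both commands list peer UUIDs alongside hostnames and connection state:
$ gluster pool list
$ gluster peer status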
The glusterd log largely echoed the above; specifically:
$ tail etc-glusterfs-glusterd.vol.log
[2015-08-03 23:08:03.289249] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.289258] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on c7416c1f-494b-4a95-b48d-6c766c7bce14. Please check log file for details.
[2015-08-03 23:08:03.289279] W [rpc-clnt-ping.c:199:rpc_clnt_ping_cbk] 0-management: socket or ib related error
[2015-08-03 23:08:03.289827] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.289858] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on d73de03a-0f66-4619-89ef-b73c9bbd800e. Please check log file for details.
[2015-08-03 23:08:03.290509] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.290529] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on 4a780f57-37e4-4f1b-9c34-187a0c7e44bf. Please check log file for details.
[2015-08-03 23:08:03.290597] E [glusterd-syncop.c:1804:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2015-08-03 23:07:03.351603] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2015-08-03 23:07:03.351644] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
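Since these are management-plane errors ("Could not find peer") rather than data-path errors, one low-risk step worth trying is restarting glusterd on the affected peers. glusterd is only the management daemon, so restarting it should leave the brick (glusterfsd) processes and client mounts untouched:

$ systemctl restart glusterd    # or: service glusterd restart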
Several other logs had also been written to around the time I attempted the heal:
$ ls -ltr
-rw------- 1 root root 41704 Aug 2 12:07 glfsheal-nova.log
-rw------- 1 root root 15986 Aug 2 12:07 cmd_history.log-20150802
-rw------- 1 root root 290359 Aug 3 19:07 var-lib-nova-instances.log
-rw------- 1 root root 221829 Aug 3 19:07 glustershd.log
-rw------- 1 root root 195472 Aug 3 19:07 nfs.log
-rw------- 1 root root 61831116 Aug 3 19:07 var-lib-nova-mnt-92ef2ec54fd18595ed18d8e6027a1b3d.log
-rw------- 1 root root 3504 Aug 3 19:08 cmd_history.log
-rw------- 1 root root 89294 Aug 3 19:08 cli.log
-rw------- 1 root root 136421 Aug 3 19:08 etc-glusterfs-glusterd.vol.log
Looking through them, it wasn't clear whether any of their contents were relevant to this particular problem.
Given that, I initially thought that the heal command could only be run from the primary node of the GlusterFS cluster. As it turned out, though, the real issue lay in the fact that the 11 nodes in the cluster were running two different versions of GlusterFS.
Once I realized this, I upgraded all of the nodes to the latest version of GlusterFS (3.7.3) and was able to run heals from any of the nodes, as one would expect.
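If you suspect a version mismatch like this, a quick way to confirm it is to query each node directly (the hostnames below are placeholders):

# Hostnames are placeholders for the actual peers in the pool.
$ for h in node01 node02 node03; do
>     printf '%s: ' "$h"; ssh "$h" gluster --version | head -1
> done

# Each node also records the cluster operating version locally:
$ grep operating-version /var/lib/glusterd/glusterd.info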