I recently inherited a GlusterFS setup that I know literally zero about. One of the HDDs that provides a brick to the volume failed. I was able to replace that HDD, and the host OS can see it. I've successfully formatted it, and it's now mounted in the same position as the drive it replaced.
Here's where I need help.
I believe I need to run a heal command of some sort but am rather confused by how to do this with GlusterFS. Here's some of the background info.
$ mount |grep glus
/dev/sdc1 on /data/glusterfs/sdc1 type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sdg1 on /data/glusterfs/sdg1 type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sdf1 on /data/glusterfs/sdf1 type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sdb1 on /data/glusterfs/sdb1 type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sdd1 on /data/glusterfs/sdd1 type xfs (rw,relatime,attr2,inode64,noquota)
127.0.0.1:/nova on /var/lib/nova/instances type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
127.0.0.1:/cinder on /var/lib/nova/mnt/92ef2ec54fd18595ed18d8e6027a1b3d type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
/dev/sde1 on /data/glusterfs/sde1 type xfs (rw,relatime,attr2,inode64,noquota)
The HDD I replaced is /dev/sde1. I have it mounted (as seen above), and when I run gluster volume info I see that it's listed there:
$ gluster volume info nova
Volume Name: nova
Type: Distributed-Replicate
Volume ID: f0d72d64-288c-4e72-9c53-2d16ce5687ac
Status: Started
Number of Bricks: 10 x 2 = 20
Transport-type: tcp
Bricks:
Brick1: icicle07:/data/glusterfs/sdb1/brick
Brick2: icicle08:/data/glusterfs/sdb1/brick
Brick3: icicle09:/data/glusterfs/sdb1/brick
Brick4: icicle10:/data/glusterfs/sdb1/brick
Brick5: icicle11:/data/glusterfs/sdb1/brick
Brick6: icicle07:/data/glusterfs/sdc1/brick
Brick7: icicle08:/data/glusterfs/sdc1/brick
Brick8: icicle09:/data/glusterfs/sdc1/brick
Brick9: icicle10:/data/glusterfs/sdc1/brick
Brick10: icicle11:/data/glusterfs/sdc1/brick
Brick11: icicle07:/data/glusterfs/sdd1/brick
Brick12: icicle08:/data/glusterfs/sdd1/brick
Brick13: icicle09:/data/glusterfs/sdd1/brick
Brick14: icicle10:/data/glusterfs/sdd1/brick
Brick15: icicle11:/data/glusterfs/sdd1/brick
Brick16: icicle07:/data/glusterfs/sde1/brick
Brick17: icicle08:/data/glusterfs/sde1/brick
Brick18: icicle09:/data/glusterfs/sde1/brick
Brick19: icicle10:/data/glusterfs/sde1/brick
Brick20: icicle11:/data/glusterfs/sde1/brick
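Presumably it would also help to check whether the brick process for the new disk (Brick20 on this host) is actually running; as far as I can tell this is the command for that:
$ gluster volume status nova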
Trying to run a heal command results in this:
$ gluster volume heal nova full
Locking failed on c551316f-7218-44cf-bb36-befe3d3df34b. Please check log file for details.
Locking failed on 79a6a414-3569-482c-929f-b7c5da16d05e. Please check log file for details.
Locking failed on ae62c691-ae55-4c99-8364-697cb3562668. Please check log file for details.
Locking failed on 5f43c6a4-0ccd-424a-ae56-0492ec64feeb. Please check log file for details.
Locking failed on cb78ba3c-256f-4413-ae7e-aa5c0e9872b5. Please check log file for details.
Locking failed on 6c0111fc-b5e7-4350-8be5-3179a1a5187e. Please check log file for details.
Locking failed on 88fcb687-47aa-4921-b3ab-d6c3b330b32a. Please check log file for details.
Locking failed on d73de03a-0f66-4619-89ef-b73c9bbd800e. Please check log file for details.
Locking failed on c7416c1f-494b-4a95-b48d-6c766c7bce14. Please check log file for details.
Locking failed on 4a780f57-37e4-4f1b-9c34-187a0c7e44bf. Please check log file for details.
Attempting to run the command again results in this:
$ gluster volume heal nova full
Another transaction is in progress. Please try again after sometime.
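For what it's worth, those UUIDs in the "Locking failed" messages look like peer UUIDs; as far as I know they can be mapped back to hostnames with something like this:
$ gluster pool list
(or gluster peer status, which lists the UUIDs of the other nodes)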
Restarting glusterd clears that lock, but I'm at a loss as to what the heal command above is actually trying to tell me. I find the logs fairly useless, since there are several of them and it's not entirely clear to me which log goes with what:
$ ls -ltr /var/log/glusterfs
...
-rw------- 1 root root 41711 Aug 1 00:51 glfsheal-nova.log-20150801
-rw------- 1 root root 0 Aug 1 03:39 glfsheal-nova.log
-rw------- 1 root root 4297 Aug 1 14:29 cmd_history.log-20150531
-rw------- 1 root root 830449 Aug 1 17:03 var-lib-nova-instances.log
-rw------- 1 root root 307535 Aug 1 17:03 glustershd.log
-rw------- 1 root root 255801 Aug 1 17:03 nfs.log
-rw------- 1 root root 4544 Aug 1 17:12 cmd_history.log
-rw------- 1 root root 28063 Aug 1 17:12 cli.log
-rw------- 1 root root 17370562 Aug 1 17:14 etc-glusterfs-glusterd.vol.log
-rw------- 1 root root 1759170187 Aug 1 17:14 var-lib-nova-mnt-92ef2ec54fd18595ed18d8e6027a1b3d.log
Any guidance would be appreciated.
EDIT #1
It seems as though the system is having an issue when it attempts to bring up the corresponding glusterfsd for the brick/HDD that I've added back in. Here's the output of the log file /var/log/glusterfs/bricks/data-glusterfs-sde1-brick.log:
[2015-08-01 21:40:25.143963] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.0 (args: /usr/sbin/glusterfsd -s icicle11 --volfile-id nova.icicle11.data-glusterfs-sde1-brick -p /var/lib/glusterd/vols/nova/run/icicle11-data-glusterfs-sde1-brick.pid -S /var/run/gluster/d0a51f364706915faa35c6cca46e9ce6.socket --brick-name /data/glusterfs/sde1/brick -l /var/log/glusterfs/bricks/data-glusterfs-sde1-brick.log --xlator-option *-posix.glusterd-uuid=5e09f3ec-bfbc-490b-bd93-8e083e8ebd05 --brick-port 49155 --xlator-option nova-server.listen-port=49155)
[2015-08-01 21:40:25.190863] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-08-01 21:40:48.359478] I [graph.c:269:gf_add_cmdline_options] 0-nova-server: adding option 'listen-port' for volume 'nova-server' with value '49155'
[2015-08-01 21:40:48.359513] I [graph.c:269:gf_add_cmdline_options] 0-nova-posix: adding option 'glusterd-uuid' for volume 'nova-posix' with value '5e09f3ec-bfbc-490b-bd93-8e083e8ebd05'
[2015-08-01 21:40:48.359696] I [server.c:392:_check_for_auth_option] 0-/data/glusterfs/sde1/brick: skip format check for non-addr auth option auth.login./data/glusterfs/sde1/brick.allow
[2015-08-01 21:40:48.359709] I [server.c:392:_check_for_auth_option] 0-/data/glusterfs/sde1/brick: skip format check for non-addr auth option auth.login.a9c47852-7dcf-4f89-80e5-110101943f36.password
[2015-08-01 21:40:48.359719] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-08-01 21:40:48.360606] I [rpcsvc.c:2213:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2015-08-01 21:40:48.360679] W [options.c:936:xl_opt_validate] 0-nova-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2015-08-01 21:40:48.361713] E [ctr-helper.c:250:extract_ctr_options] 0-gfdbdatastore: CTR Xlator is disabled.
[2015-08-01 21:40:48.361745] W [gfdb_sqlite3.h:238:gfdb_set_sql_params] 0-nova-changetimerecorder: Failed to retrieve sql-db-pagesize from params.Assigning default value: 4096
[2015-08-01 21:40:48.361762] W [gfdb_sqlite3.h:238:gfdb_set_sql_params] 0-nova-changetimerecorder: Failed to retrieve sql-db-cachesize from params.Assigning default value: 1000
[2015-08-01 21:40:48.361774] W [gfdb_sqlite3.h:238:gfdb_set_sql_params] 0-nova-changetimerecorder: Failed to retrieve sql-db-journalmode from params.Assigning default value: wal
[2015-08-01 21:40:48.361795] W [gfdb_sqlite3.h:238:gfdb_set_sql_params] 0-nova-changetimerecorder: Failed to retrieve sql-db-wal-autocheckpoint from params.Assigning default value: 1000
[2015-08-01 21:40:48.361812] W [gfdb_sqlite3.h:238:gfdb_set_sql_params] 0-nova-changetimerecorder: Failed to retrieve sql-db-sync from params.Assigning default value: normal
[2015-08-01 21:40:48.361825] W [gfdb_sqlite3.h:238:gfdb_set_sql_params] 0-nova-changetimerecorder: Failed to retrieve sql-db-autovacuum from params.Assigning default value: none
[2015-08-01 21:40:48.362666] I [trash.c:2363:init] 0-nova-trash: no option specified for 'eliminate', using NULL
[2015-08-01 21:40:48.362906] E [posix.c:5894:init] 0-nova-posix: Extended attribute trusted.glusterfs.volume-id is absent
[2015-08-01 21:40:48.362922] E [xlator.c:426:xlator_init] 0-nova-posix: Initialization of volume 'nova-posix' failed, review your volfile again
[2015-08-01 21:40:48.362930] E [graph.c:322:glusterfs_graph_init] 0-nova-posix: initializing translator failed
[2015-08-01 21:40:48.362956] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
[2015-08-01 21:40:48.363612] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
EDIT #2
OK, so one issue appears to be with the extended attribute not being present on the mounted brick's filesystem. This command is supposed to fix that:
$ grep volume-id /var/lib/glusterd/vols/nova/info | cut -d= -f2 | sed 's/-//g'
f0d72d64288c4e729c532d16ce5687ac
$ setfattr -n trusted.glusterfs.volume-id -v 0xf0d72d64288c4e729c532d16ce5687ac /data/glusterfs/sde1
Yet I'm still getting the above error about the attribute being absent:
[2015-08-01 18:44:50.481350] E [posix.c:5894:init] 0-nova-posix: Extended attribute trusted.glusterfs.volume-id is absent
Full output from the glusterd restart:
[2015-08-01 22:03:41.467668] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.0 (args: /usr/sbin/glusterfsd -s icicle11 --volfile-id nova.icicle11.data-glusterfs-sde1-brick -p /var/lib/glusterd/vols/nova/run/icicle11-data-glusterfs-sde1-brick.pid -S /var/run/gluster/d0a51f364706915faa35c6cca46e9ce6.socket --brick-name /data/glusterfs/sde1/brick -l /var/log/glusterfs/bricks/data-glusterfs-sde1-brick.log --xlator-option *-posix.glusterd-uuid=5e09f3ec-bfbc-490b-bd93-8e083e8ebd05 --brick-port 49155 --xlator-option nova-server.listen-port=49155)
[2015-08-01 22:03:41.514878] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-08-01 22:04:00.334285] I [graph.c:269:gf_add_cmdline_options] 0-nova-server: adding option 'listen-port' for volume 'nova-server' with value '49155'
[2015-08-01 22:04:00.334330] I [graph.c:269:gf_add_cmdline_options] 0-nova-posix: adding option 'glusterd-uuid' for volume 'nova-posix' with value '5e09f3ec-bfbc-490b-bd93-8e083e8ebd05'
[2015-08-01 22:04:00.334518] I [server.c:392:_check_for_auth_option] 0-/data/glusterfs/sde1/brick: skip format check for non-addr auth option auth.login./data/glusterfs/sde1/brick.allow
[2015-08-01 22:04:00.334529] I [server.c:392:_check_for_auth_option] 0-/data/glusterfs/sde1/brick: skip format check for non-addr auth option auth.login.a9c47852-7dcf-4f89-80e5-110101943f36.password
[2015-08-01 22:04:00.334540] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-08-01 22:04:00.335316] I [rpcsvc.c:2213:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2015-08-01 22:04:00.335371] W [options.c:936:xl_opt_validate] 0-nova-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2015-08-01 22:04:00.336170] E [ctr-helper.c:250:extract_ctr_options] 0-gfdbdatastore: CTR Xlator is disabled.
[2015-08-01 22:04:00.336190] W [gfdb_sqlite3.h:238:gfdb_set_sql_params] 0-nova-changetimerecorder: Failed to retrieve sql-db-pagesize from params.Assigning default value: 4096
[2015-08-01 22:04:00.336197] W [gfdb_sqlite3.h:238:gfdb_set_sql_params] 0-nova-changetimerecorder: Failed to retrieve sql-db-cachesize from params.Assigning default value: 1000
[2015-08-01 22:04:00.336211] W [gfdb_sqlite3.h:238:gfdb_set_sql_params] 0-nova-changetimerecorder: Failed to retrieve sql-db-journalmode from params.Assigning default value: wal
[2015-08-01 22:04:00.336217] W [gfdb_sqlite3.h:238:gfdb_set_sql_params] 0-nova-changetimerecorder: Failed to retrieve sql-db-wal-autocheckpoint from params.Assigning default value: 1000
[2015-08-01 22:04:00.336235] W [gfdb_sqlite3.h:238:gfdb_set_sql_params] 0-nova-changetimerecorder: Failed to retrieve sql-db-sync from params.Assigning default value: normal
[2015-08-01 22:04:00.336241] W [gfdb_sqlite3.h:238:gfdb_set_sql_params] 0-nova-changetimerecorder: Failed to retrieve sql-db-autovacuum from params.Assigning default value: none
[2015-08-01 22:04:00.336951] I [trash.c:2363:init] 0-nova-trash: no option specified for 'eliminate', using NULL
[2015-08-01 22:04:00.337131] E [posix.c:5894:init] 0-nova-posix: Extended attribute trusted.glusterfs.volume-id is absent
[2015-08-01 22:04:00.337142] E [xlator.c:426:xlator_init] 0-nova-posix: Initialization of volume 'nova-posix' failed, review your volfile again
[2015-08-01 22:04:00.337148] E [graph.c:322:glusterfs_graph_init] 0-nova-posix: initializing translator failed
[2015-08-01 22:04:00.337154] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
[2015-08-01 22:04:00.337629] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
OK, so it seems I had to do the following.
1. Add the extended attribute trusted.glusterfs.volume-id to the new brick. Notice that it needs to be set on the /brick directory; I tried it a level up from there and it didn't work. NOTE: the value for the volume-id comes from the grep of /var/lib/glusterd/vols/nova/info shown in EDIT #2.
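Pointed at the /brick directory this time, that works out to roughly the following (same volume ID as before):
$ grep volume-id /var/lib/glusterd/vols/nova/info | cut -d= -f2 | sed 's/-//g'
f0d72d64288c4e729c532d16ce5687ac
$ setfattr -n trusted.glusterfs.volume-id -v 0xf0d72d64288c4e729c532d16ce5687ac /data/glusterfs/sde1/brick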
2. Restart glusterd.
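On these hosts that's just a service restart, along the lines of (whichever init system applies):
$ service glusterd restart    # or: systemctl restart glusterd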
3. Watch the log for the brick, /var/log/glusterfs/bricks/data-glusterfs-sde1-brick.log. This time you'll see messages to the effect that the brick has come up, and as I watched the brick I could see it getting synced with the rest of the cluster.
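Watching it live is just a matter of tailing that file:
$ tail -f /var/log/glusterfs/bricks/data-glusterfs-sde1-brick.log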
4. Once that's complete, run a heal command to double-check things.
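The commands I have in mind here are the same heal from above plus its info variant, which (as I understand the CLI) lists anything still pending:
$ gluster volume heal nova full
$ gluster volume heal nova info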
Additional details
I also saw some related messages right after I restarted glusterd.
Confirming extended attributes
You can use the following command to see what attributes are present:
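A getfattr along these lines should do it (run as root; the hex encoding makes the volume-id easy to compare against the value set above):
$ getfattr -d -m . -e hex /data/glusterfs/sde1/brick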