Ping a Specific Port

Question

Pablo

Asked: 2014-07-10 22:22:18 +0800 CST2014-07-10 22:22:18 +0800 CST 2014-07-10 22:22:18 +0800 CST

GlusterFS failing to mount at boot with Ubuntu 14.04

772

Previously I asked about mounting GlusterFS at boot in an Ubuntu 12.04 server and the answer was that this was buggy in 12.04 and worked in 14.04. Curious I gave it a try on a virtual machine running on my laptop and in 14.04 it worked. Since this was critical for me, I decided to upgrade my running servers to 14.04 only to discover that GlusterFS is not mounting localhost volumes automatically either.

This is a Linode server and fstab looks like this:

# <file system> <mount point>          <type>    <options>                 <dump>  <pass>
proc        /proc                        proc    defaults                       0       0
/dev/xvda   /                            ext4    noatime,errors=remount-ro      0       1
/dev/xvdb   none                         swap    sw                             0       0
/dev/xvdc   /var/lib/glusterfs/brick01   ext4    defaults                       1       2
koraga.int.example.com:/public_uploads /var/www/shared/public/uploads glusterfs defaults,_netdev 0 0

The booting process likes like this (around the networking mounting part, which are the only fails):

 * Stopping Mount network filesystems                                    [ OK ]
 * Starting set sysctls from /etc/sysctl.conf                            [ OK ]
 * Stopping set sysctls from /etc/sysctl.conf                            [ OK ]
 * Starting configure virtual network devices                            [ OK ]
 * Starting Bridge socket events into upstart                            [ OK ]
 * Starting Waiting for state                                            [fail]
 * Stopping Waiting for state                                            [ OK ]
 * Starting Block the mounting event for glusterfs filesystems until the [fail]k interfaces are running
 * Starting Waiting for state                                            [fail]
 * Starting Block the mounting event for glusterfs filesystems until the [fail]k interfaces are running
 * Stopping Waiting for state                                            [ OK ]
 * Starting Signal sysvinit that remote filesystems are mounted          [ OK ]
 * Starting GNU Screen Cleanup                                           [ OK ]

I believe the log file /var/log/glusterfs/var-www-shared-public-uploads.log contains the main clue to the problem, as it's the only one that is really different between this server, where mounting is not working, and my local virtual server, where it is:

[2014-07-10 05:51:49.762162] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.1 (/usr/sbin/glusterfs --volfile-server=koraga.int.example.com --volfile-id=/public_uploads /var/www/shared/public/uploads)
[2014-07-10 05:51:49.774248] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-07-10 05:51:49.774278] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-07-10 05:51:49.775573] E [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to 192.168.134.227:24007 failed (Connection refused)
[2014-07-10 05:51:49.775634] E [glusterfsd-mgmt.c:1601:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: koraga.int.example.com (No data available)
[2014-07-10 05:51:49.775649] I [glusterfsd-mgmt.c:1607:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2014-07-10 05:51:49.776284] W [glusterfsd.c:1095:cleanup_and_exit] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23) [0x7f6718bf3f83] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x90) [0x7f6718bf7da0] (-->/usr/sbin/glusterfs(+0xcf13) [0x7f67192bbf13]))) 0-: received signum (1), shutting down
[2014-07-10 05:51:49.776314] I [fuse-bridge.c:5475:fini] 0-fuse: Unmounting '/var/www/shared/public/uploads'.

The status of the volume is:

Volume Name: public_uploads
Type: Distribute
Volume ID: 52aa6d85-f4ea-4c39-a2b3-d20d34ab5916
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: koraga.int.example.com:/var/lib/glusterfs/brick01/public_uploads
Options Reconfigured:
auth.allow: 127.0.0.1,192.168.134.227
client.ssl: off
server.ssl: off
nfs.disable: on

If I run mount -a after booting up, the volume is mounted correctly:

koraga.int.example.com:/public_uploads on /var/www/shared/public/uploads type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)

A couple of related log files show this:

/var/log/upstart/mounting-glusterfs-_var_www_shared_public_uploads.log:

start: Job failed to start

/var/log/upstart/wait-for-state-mounting-glusterfs-_var_www_shared_public_uploadsstatic-network-up.log:

status: Unknown job: static-network-up
start: Unknown job: static-network-up

but on my testing server, it shows exactly the same, so, I don't think this is relevant.

Any ideas what's wrong now?

Update: I tried the change of WAIT_FOR from static-network-up to networking and it still didn't work but all the [fail] messages at boot disappear. These are the contains of the log files under these conditions:

/var/log/glusterfs/var-www-shared-public-uploads.log contains:

wait-for-state stop/waiting

/var/log/upstart/wait-for-state-mounting-glusterfs-_var_www_shared_public_uploadsstatic-network-up.log contains:

start: Job is already running: networking

/var/log/glusterfs/var-www-shared-public-uploads.log contains:

[2014-07-11 17:19:38.000207] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.1 (/usr/sbin/glusterfs --volfile-server=koraga.int.example.com --volfile-id=/public_uploads /var/www/shared/public/uploads)
[2014-07-11 17:19:38.029421] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-07-11 17:19:38.029450] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-07-11 17:19:38.030288] E [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to 192.168.134.227:24007 failed (Connection refused)
[2014-07-11 17:19:38.030331] E [glusterfsd-mgmt.c:1601:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: koraga.int.example.com (No data available)
[2014-07-11 17:19:38.030345] I [glusterfsd-mgmt.c:1607:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2014-07-11 17:19:38.030984] W [glusterfsd.c:1095:cleanup_and_exit] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23) [0x7fd9495b7f83] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x90) [0x7fd9495bbda0] (-->/usr/sbin/glusterfs(+0xcf13) [0x7fd949c7ff13]))) 0-: received signum (1), shutting down
[2014-07-11 17:19:38.031013] I [fuse-bridge.c:5475:fini] 0-fuse: Unmounting '/var/www/shared/public/uploads'.

Update 2: I also tried this in the upstart file:

start on (started glusterfs-server and mounting TYPE=glusterfs)

but the computer failed to boot (don't know why yet).

7 Answers

Voted

mdekkers · Answer 1 · 2015-03-09T01:54:22+08:00

mdekkers

2015-03-09T01:54:22+08:002015-03-09T01:54:22+08:00

I managed to make this work through a combination of answers in this thread and this one: GlusterFS is failing to mount on boot

As per @Dan Pisarski edit /etc/init/mounting-glusterfs.conf to read:

exec start wait-for-state WAIT_FOR=networking WAITER=mounting-glusterfs-$MOUNTPOINT

As per @dialt0ne change /etc/fstab to read:

[serverip]:[vol]  [mountpoint]  glusterfs  defaults,nobootwait,_netdev,backupvolfile-server=[backupserverip],direct-io-mode=disable  0       0

Works For Me(tm) on Ubuntu 14.04.2 LTS

6

dialt0ne · Answer 2 · 2014-07-20T08:49:13+08:00

dialt0ne

2014-07-20T08:49:13+08:002014-07-20T08:49:13+08:00

I have run into the same problem on AWS on ubuntu 12.04. Here are some things you can do that worked for me:

add more fetch-attempts in your fstab

This will allow you to retry the volfile server while the network is unavailable.

add a backup volfile server in your fstab

This will allow for you to mount the filesystem from another gluster server member if the primary is down for some reason.

add nobootwait in your fstab

This allows the instance to continue booting while this filesystem isn't mounted.

A sample entry from my current fstab is:

10.20.30.40:/fs1 /example glusterfs defaults,nobootwait,_netdev,backupvolfile-server=10.20.30.41,fetch-attempts=10 0 2

I have not tested this on 14.04, but it works ok for my 12.04 instances.

4

Rarylson Freitas · Answer 3 · 2015-06-16T10:52:45+08:00

It's a bug

This is really a bug (the static-network-up is not a job, it's an event signal).

Moreover, using the network job as suggested in other answers is not the most correct solution.

So, I created this bug report and submitted a patch to this problem.

As a workaround, you can apply my proposed solution (at the end of this answer) and use the _netdev option in your fstab.

A better explanation is showed above too, but you can skip this explanation if you want.

Explanation

This is a bug in the mounting-glusterfs.conf. It can increase unnecessary 30 seconds in the boot in an Ubuntu Server, or even hang the boot process.

Because of this bug, the mountall process thinks that the mount failed (you'll see "Mount failed" errors in /var/log/boot.log). So, when not using the nobootwait/nofail flags in /etc/fstab, the bug can hang the mount process (and the boot process too). When using the nobootwait/nofail flags, the bug will increase the boot time in about 30 seconds.

The bug is caused by the following errors:

There is no need to wait for the network is up. The Ubuntu itself has the _netdev mount flag that will retry the mount for each time that an interface brings up;
However, it's necessary to wait for the GlusterFS Server daemon (for mounts using localhost);
- This was implemented in an old commit in the GlusterFS upstream project. However, this commit was overwritten;
It's wrong to use the wait-for-state upstart task to wait for a signal. It's used to wait for a job. static-network-up is an event signal, and not a job;
- This is why the "Unknown job: static-network-up" is logged;
It's wrong, when waiting for a job to be started, not passing the WAIT_STATE=running env var because it's not the default in wait-for-state.

Solution

/etc/init/mounting-glusterfs.conf:

author "Louis Zuckerman <me@louiszuckerman.com>"
description "Block the mounting event for glusterfs filesystems until the glusterfs-server is running"

instance $MOUNTPOINT

start on mounting TYPE=glusterfs
task
script
  if status glusterfs-server; then
    start wait-for-state WAIT_FOR=glusterfs-server WAIT_STATE=running \
        WAITER=mounting-glusterfs-$MOUNTPOINT
  fi
end script

PS: Use also the _netdev option in your fstab.

Dan Pisarski · Answer 4 · 2014-07-12T05:47:14+08:00

Dan Pisarski

2014-07-12T05:47:14+08:002014-07-12T05:47:14+08:00

I ran into this as well, and want to preface this answer with the statement that I am not an expert in this area so its possible there is a better solution to this!

But the issue seems to be that static-network-up is an event, not the name of an upstart job. However, the wait-for-state script expects a job name to be passed in as WAIT_FOR value. Thus, the error of "Unknown job" as you discovered above.

To resolve the issue I changed /etc/init/mounting-glusterfs.conf, changing:

exec start wait-for-state WAIT_FOR=static-network-up WAITER=mounting-glusterfs-$MOUNTPOINT

into:

exec start wait-for-state WAIT_FOR=networking WAITER=mounting-glusterfs-$MOUNTPOINT

networking is the name of an actual job (/etc/init/networking.conf) and I believe the job that typically emits static-network-up.

This change worked for me on Ubuntu 14.04.

1

Attila Heidrich · Answer 5 · 2015-07-31T04:50:26+08:00

Attila Heidrich

2015-07-31T04:50:26+08:002015-07-31T04:50:26+08:00

Thanks for the detailed explanation, I think I understand a lot more than earlier. Latest solution is almost working. The problems (actually one, since the first implies the second):

local shares (127.0.0.1:/share) still not mounted
mounted TYPE=glusterfs never satisfied, so the services which are dependent of the mounted TYPE=glusterfs state

/etc/fstab:

127.0.0.1:/control-share /mnt/glu-control-share glusterfs defaults,_netdev 0 0

/etc/init/mounting-glusterfs.conf: copied from above

/etc/init/salt-master.conf:

description "Salt Master"

start on (mounted TYPE=glusterfs
          and runlevel [2345])
stop on runlevel [!2345]
limit nofile 100000 100000
...

The local share must be mounted by hand, or by some automatism, salt-master must be started by hand after all reboots.

Noticed later: the above WAIT script in mounting-glusterfs... blocks the whole boot procedure, seems like glusterfs-server state never reaches running.

1

Vladislav Kulbatski · Answer 6 · 2018-02-10T02:15:38+08:00

Vladislav Kulbatski

2018-02-10T02:15:38+08:002018-02-10T02:15:38+08:00

I managed this with really simple solution:

add mount point to /etc/fstab

gluster:/VOLUME_NAME/local/mount/point glusterfs defaults,_netdev 0 0

add one line to your /etc/rc.local, so it will look like:
```
mount -a
exit 0
```

Now all glusterfs volumes would be mounted on a startup.

1

BDO · Answer 7 · 2021-10-15T05:04:54+08:00

BDO

2021-10-15T05:04:54+08:002021-10-15T05:04:54+08:00

Good thinking there. A simpler solution for Ubuntu/Debian is to use the x-systemd.automount service. So the command becomes localhost:/gv0 /srv glusterfs defaults,_netdev,noauto,x-systemd.automount 0 0. No need to do anything else. More info on https://serverfault.com/a/823582.

FROM: https://stanislas.blog/2018/10/how-to-mount-local-glusterfs-volume-boot-fstab-systemd-fix/

Test for glusterfs9, it works.

0

GlusterFS failing to mount at boot with Ubuntu 14.04

It's a bug

Explanation

Solution

Can you pass user/pass for HTTP Basic Authentication in URL parameters?

Ping a Specific Port

Check if port is open or closed on a Linux server?

How to automate SSH login with password?

How do I tell Git for Windows where to find my private RSA key?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?