Previously I asked about mounting GlusterFS at boot on an Ubuntu 12.04 server, and the answer was that this was buggy in 12.04 but worked in 14.04. Curious, I gave it a try on a virtual machine running on my laptop, and on 14.04 it worked. Since this was critical for me, I upgraded my running servers to 14.04, only to discover that GlusterFS is not mounting localhost volumes automatically there either.
This is a Linode server and fstab looks like this:
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc defaults 0 0
/dev/xvda / ext4 noatime,errors=remount-ro 0 1
/dev/xvdb none swap sw 0 0
/dev/xvdc /var/lib/glusterfs/brick01 ext4 defaults 1 2
koraga.int.example.com:/public_uploads /var/www/shared/public/uploads glusterfs defaults,_netdev 0 0
The boot process looks like this (around the network-mounting part, which shows the only failures):
* Stopping Mount network filesystems [ OK ]
* Starting set sysctls from /etc/sysctl.conf [ OK ]
* Stopping set sysctls from /etc/sysctl.conf [ OK ]
* Starting configure virtual network devices [ OK ]
* Starting Bridge socket events into upstart [ OK ]
* Starting Waiting for state [fail]
* Stopping Waiting for state [ OK ]
* Starting Block the mounting event for glusterfs filesystems until the network interfaces are running [fail]
* Starting Waiting for state [fail]
* Starting Block the mounting event for glusterfs filesystems until the network interfaces are running [fail]
* Stopping Waiting for state [ OK ]
* Starting Signal sysvinit that remote filesystems are mounted [ OK ]
* Starting GNU Screen Cleanup [ OK ]
I believe the log file /var/log/glusterfs/var-www-shared-public-uploads.log contains the main clue to the problem, as it's the only one that really differs between this server, where mounting is not working, and my local virtual server, where it is:
[2014-07-10 05:51:49.762162] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.1 (/usr/sbin/glusterfs --volfile-server=koraga.int.example.com --volfile-id=/public_uploads /var/www/shared/public/uploads)
[2014-07-10 05:51:49.774248] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-07-10 05:51:49.774278] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-07-10 05:51:49.775573] E [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to 192.168.134.227:24007 failed (Connection refused)
[2014-07-10 05:51:49.775634] E [glusterfsd-mgmt.c:1601:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: koraga.int.example.com (No data available)
[2014-07-10 05:51:49.775649] I [glusterfsd-mgmt.c:1607:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2014-07-10 05:51:49.776284] W [glusterfsd.c:1095:cleanup_and_exit] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23) [0x7f6718bf3f83] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x90) [0x7f6718bf7da0] (-->/usr/sbin/glusterfs(+0xcf13) [0x7f67192bbf13]))) 0-: received signum (1), shutting down
[2014-07-10 05:51:49.776314] I [fuse-bridge.c:5475:fini] 0-fuse: Unmounting '/var/www/shared/public/uploads'.
The status of the volume is:
Volume Name: public_uploads
Type: Distribute
Volume ID: 52aa6d85-f4ea-4c39-a2b3-d20d34ab5916
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: koraga.int.example.com:/var/lib/glusterfs/brick01/public_uploads
Options Reconfigured:
auth.allow: 127.0.0.1,192.168.134.227
client.ssl: off
server.ssl: off
nfs.disable: on
If I run mount -a after booting up, the volume is mounted correctly:
koraga.int.example.com:/public_uploads on /var/www/shared/public/uploads type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
A couple of related log files show this:
/var/log/upstart/mounting-glusterfs-_var_www_shared_public_uploads.log:

start: Job failed to start

/var/log/upstart/wait-for-state-mounting-glusterfs-_var_www_shared_public_uploadsstatic-network-up.log:

status: Unknown job: static-network-up
start: Unknown job: static-network-up
However, my testing server shows exactly the same, so I don't think this is relevant.
Any ideas what's wrong now?
Update: I tried changing WAIT_FOR from static-network-up to networking; it still didn't work, but all the [fail] messages at boot disappeared. These are the contents of the log files under these conditions:
/var/log/glusterfs/var-www-shared-public-uploads.log contains:

wait-for-state stop/waiting

/var/log/upstart/wait-for-state-mounting-glusterfs-_var_www_shared_public_uploadsstatic-network-up.log contains:

start: Job is already running: networking

/var/log/glusterfs/var-www-shared-public-uploads.log contains:
[2014-07-11 17:19:38.000207] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.1 (/usr/sbin/glusterfs --volfile-server=koraga.int.example.com --volfile-id=/public_uploads /var/www/shared/public/uploads)
[2014-07-11 17:19:38.029421] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-07-11 17:19:38.029450] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-07-11 17:19:38.030288] E [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to 192.168.134.227:24007 failed (Connection refused)
[2014-07-11 17:19:38.030331] E [glusterfsd-mgmt.c:1601:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: koraga.int.example.com (No data available)
[2014-07-11 17:19:38.030345] I [glusterfsd-mgmt.c:1607:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2014-07-11 17:19:38.030984] W [glusterfsd.c:1095:cleanup_and_exit] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23) [0x7fd9495b7f83] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x90) [0x7fd9495bbda0] (-->/usr/sbin/glusterfs(+0xcf13) [0x7fd949c7ff13]))) 0-: received signum (1), shutting down
[2014-07-11 17:19:38.031013] I [fuse-bridge.c:5475:fini] 0-fuse: Unmounting '/var/www/shared/public/uploads'.
Update 2: I also tried this in the upstart file:
start on (started glusterfs-server and mounting TYPE=glusterfs)
but the computer failed to boot (don't know why yet).
I managed to make this work through a combination of answers in this thread and this one: GlusterFS is failing to mount on boot
As per @Dan Pisarski, edit /etc/init/mounting-glusterfs.conf to read:

As per @dialt0ne, change /etc/fstab to read:

Works For Me(tm) on Ubuntu 14.04.2 LTS.
I have run into the same problem on AWS on Ubuntu 12.04. Here are some things that worked for me:

This will allow you to retry the volfile server while the network is unavailable.

This will allow you to mount the filesystem from another gluster server member if the primary is down for some reason.

Use nobootwait in your fstab. This allows the instance to continue booting while this filesystem isn't mounted.

A sample entry from my current fstab is:

I have not tested this on 14.04, but it works fine for my 12.04 instances.
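The option names and the sample entry did not survive in this copy of the answer. mount.glusterfs at the time supported fetch-attempts and backupvolfile-server mount options, which match the two behaviors described above, so the entry presumably resembled something like this (server names and paths are illustrative, not the poster's actual values):

```
# /etc/fstab (illustrative sketch, not the original entry)
gluster1:/public_uploads /mnt/uploads glusterfs defaults,_netdev,nobootwait,fetch-attempts=10,backupvolfile-server=gluster2 0 0
```

fetch-attempts retries fetching the volfile, backupvolfile-server names a fallback peer, and nobootwait lets boot continue if the mount still fails.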
It's a bug

This is really a bug (static-network-up is not a job, it's an event signal). Moreover, using the networking job as suggested in other answers is not the most correct solution. So, I created this bug report and submitted a patch for this problem.

As a workaround, you can apply my proposed solution (at the end of this answer) and use the _netdev option in your fstab. A fuller explanation follows, but you can skip it if you want.

Explanation

This is a bug in mounting-glusterfs.conf. It can add an unnecessary 30 seconds to the boot of an Ubuntu server, or even hang the boot process. Because of this bug, the mountall process thinks that the mount failed (you'll see "Mount failed" errors in /var/log/boot.log). So, when not using the nobootwait/nofail flags in /etc/fstab, the bug can hang the mount process (and the boot process too). When using the nobootwait/nofail flags, the bug increases the boot time by about 30 seconds.

The bug is caused by the following errors:

- the missing _netdev mount flag, which would retry the mount each time an interface comes up;
- using the wait-for-state upstart task to wait for a signal: it's meant to wait for a job, and static-network-up is an event signal, not a job;
- the missing WAIT_STATE=running env var, because it's not the default in wait-for-state.

Solution

/etc/init/mounting-glusterfs.conf:
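The proposed conf file was not preserved in this copy of the thread. Combining the three fixes listed above with the stock glusterfs-client upstart job (whose wait-for-state invocation is visible in the log file names earlier in the question), a sketch of the corrected file might look like this; treat it as an illustrative reconstruction, not the exact patch:

```
# /etc/init/mounting-glusterfs.conf (illustrative reconstruction)
description "Block the mounting event for glusterfs filesystems until the network interfaces are running"

instance $MOUNTPOINT

start on mounting TYPE=glusterfs
task

script
    # Wait for the networking *job* (not the static-network-up event)
    # and require it to reach the running state explicitly.
    start wait-for-state WAIT_FOR=networking WAIT_STATE=running WAITER=mounting-glusterfs-$MOUNTPOINT
end script
```

Combined with _netdev in the fstab entry, the mount is retried as interfaces come up instead of failing once and giving up.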
PS: Also use the _netdev option in your fstab.

I ran into this as well, and I want to preface this answer by saying that I am not an expert in this area, so it's possible there is a better solution!
But the issue seems to be that static-network-up is an event, not the name of an upstart job, while the wait-for-state script expects a job name as its WAIT_FOR value. Hence the "Unknown job" errors you discovered above.
To resolve the issue I changed /etc/init/mounting-glusterfs.conf, changing:
into:
networking is the name of an actual job (/etc/init/networking.conf) and, I believe, the job that typically emits static-network-up.
This change worked for me on Ubuntu 14.04.
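The before and after lines were lost in this copy of the answer. Given the stock job's wait-for-state invocation (its WAITER and WAIT_FOR values are visible in the upstart log file name quoted in the question), the edit was presumably along these lines:

```
# /etc/init/mounting-glusterfs.conf, inside the script stanza
# before:
start wait-for-state WAIT_FOR=static-network-up WAITER=mounting-glusterfs-$MOUNTPOINT
# after:
start wait-for-state WAIT_FOR=networking WAITER=mounting-glusterfs-$MOUNTPOINT
```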
Thanks for the detailed explanation, I think I understand a lot more than before. The latest solution is almost working. The problems (actually one, since the first implies the second):

- the local share (127.0.0.1:/share) is still not mounted
- the mounted TYPE=glusterfs event is never satisfied, so the services that depend on the mounted TYPE=glusterfs state don't start

/etc/fstab:

/etc/init/mounting-glusterfs.conf: copied from above

/etc/init/salt-master.conf:

The local share must be mounted by hand or by some automatism, and salt-master must be started by hand after every reboot.
Noticed later: the WAIT script above in mounting-glusterfs.conf blocks the whole boot procedure; it seems the glusterfs-server state never reaches running.
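Until the mounting event fires reliably, a dependent service such as salt-master can be gated behind a simple poll of /proc/mounts instead of the upstart event. This is a workaround sketch, not part of any answer above; the mountpoint, timeout, and commented-out service name are illustrative:

```shell
#!/bin/sh
# Wait until a given path appears as a mountpoint in /proc/mounts,
# then start whatever depends on it. Illustrative workaround only.

is_mounted() {
    # The mountpoint is the second whitespace-delimited field in
    # /proc/mounts, so match it surrounded by spaces.
    grep -qs " $1 " /proc/mounts
}

wait_for_mount() {
    tries=0
    while ! is_mounted "$1"; do
        tries=$((tries + 1))
        [ "$tries" -ge 30 ] && return 1   # give up after ~30s
        sleep 1
    done
    return 0
}

# Example: gate a dependent service on the gluster mount
# (mountpoint and service name taken from the question):
# wait_for_mount /var/www/shared/public/uploads && service salt-master start

# Demo against a mountpoint that always exists:
wait_for_mount / && echo "root is mounted"
```

Calling this from rc.local (or from a pre-start script in the dependent service's upstart job) avoids starting salt-master by hand after each reboot.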
I managed this with a really simple solution:

- add the mount point to /etc/fstab
- add one line to your /etc/rc.local, so it will look like:

Now all glusterfs volumes will be mounted on startup.
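The rc.local line itself was not captured in this copy. A common variant of this trick is to retry all glusterfs fstab entries at the end of boot, once glusterd is already running; presumably it was something like this (a sketch, not necessarily the poster's exact line):

```
#!/bin/sh -e
# /etc/rc.local (sketch)
# By the time rc.local runs, glusterd is up, so retry the glusterfs
# entries that failed earlier in the boot sequence.
mount -a -t glusterfs
exit 0
```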
Good thinking there. A simpler solution on Ubuntu/Debian is to use the x-systemd.automount service, so the entry becomes: localhost:/gv0 /srv glusterfs defaults,_netdev,noauto,x-systemd.automount 0 0. No need to do anything else. More info at https://serverfault.com/a/823582.
FROM: https://stanislas.blog/2018/10/how-to-mount-local-glusterfs-volume-boot-fstab-systemd-fix/

Tested with glusterfs 9; it works.