TRW

Asked: 2020-12-05 00:54:10 +0800 CST

Gluster Performance

I'm currently trying to set up a Gluster cluster, but the performance is strange and I'm not sure whether I configured something wrong. I'm using 4x Hetzner root servers running Debian Buster, each with an Intel i7, 128GB RAM, two NVMe drives and one HDD. Every system has a separate 10 Gbit/s network interface for internal communication (all hosts are directly connected to one switch in one rack).

When I test the network with iperf, I get around 9.41 Gbits/sec between all peers.

I've installed the Debian default glusterfs-server packages (glusterfs-server_5.5-3_amd64.deb).

I've built three volumes:

SSD (gv0) on /mnt/ssd/gfs/gv0
HDD (gv1) on /mnt/hdd/gfs/gv1
RAM-disc (gv2) on /mnt/ram/gfs/gv2

They were created with:

gluster volume create gv0 replica 2 transport tcp 10.255.255.1:/mnt/ssd/gfs/gv0 10.255.255.2:/mnt/ssd/gfs/gv0 10.255.255.3:/mnt/ssd/gfs/gv0 10.255.255.4:/mnt/ssd/gfs/gv0 force
...

After some configuration changes, all volumes look like this (gv0, gv1 and gv2 are configured identically):

# gluster volume info gv0
 
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 0fd68188-2b74-4050-831d-a590ef0faafd
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.255.255.1:/mnt/ssd/gfs/gv0
Brick2: 10.255.255.2:/mnt/ssd/gfs/gv0
Brick3: 10.255.255.3:/mnt/ssd/gfs/gv0
Brick4: 10.255.255.4:/mnt/ssd/gfs/gv0
Options Reconfigured:
performance.flush-behind: on
performance.cache-max-file-size: 512MB
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet

Later I found some tuning suggestions online and applied them, but the performance didn't change much (admittedly, this is a single-threaded performance test).

# gluster volume info gv0
 
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 0fd68188-2b74-4050-831d-a590ef0faafd
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.255.255.1:/mnt/ssd/gfs/gv0
Brick2: 10.255.255.2:/mnt/ssd/gfs/gv0
Brick3: 10.255.255.3:/mnt/ssd/gfs/gv0
Brick4: 10.255.255.4:/mnt/ssd/gfs/gv0
Options Reconfigured:
performance.write-behind-window-size: 1MB
cluster.readdir-optimize: on
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.readdir-ahead: on
performance.io-thread-count: 16
performance.io-cache: on
performance.flush-behind: on
performance.cache-max-file-size: 512MB
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
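
For reference, options like the ones listed above are set one at a time with gluster volume set. The following is a minimal sketch, not the exact commands I ran: print_volume_set is a hypothetical helper that reads "key: value" lines (the format shown under "Options Reconfigured") and only prints the matching commands, so nothing is changed on the cluster.

```shell
# Hypothetical helper: print (do not run) one "gluster volume set" command
# per "key: value" line read from stdin, for the volume named in $1.
print_volume_set() {
    vol=$1
    # IFS=': ' splits "server.event-threads: 4" into key and value.
    while IFS=': ' read -r key value; do
        if [ -n "$key" ]; then
            echo "gluster volume set $vol $key $value"
        fi
    done
}

# Example (prints two commands, changes nothing):
#   printf 'server.event-threads: 4\nclient.event-threads: 4\n' | print_volume_set gv0
```

Reviewing the printed commands before running them by hand makes it easier to change one option at a time and re-test.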

I also tried with and without jumbo frames, but that made no difference either.

# ip a s
...
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether 6c:b3:11:07:f1:18 brd ff:ff:ff:ff:ff:ff
    inet 10.255.255.2/24 brd 10.255.255.255 scope global enp3s0
       valid_lft forever preferred_lft forever

All three volumes are mounted directly on one of the peers:

10.255.255.1:gv0 /mnt/gv0 glusterfs defaults 0 0
10.255.255.1:gv1 /mnt/gv1 glusterfs defaults 0 0
10.255.255.1:gv2 /mnt/gv3 glusterfs defaults 0 0

Then I created some test data on a separate RAM disk. I wrote a script that generates many files using dd if=/dev/urandom in a for loop. I generated the files up front because /dev/urandom seems to top out at around 45Mb/s when I write to a RAM disk.

----- generate files 10240 x 100K
----- generate files 5120 x 1000K
----- generate files 1024 x 10000K
sum: 16000 MB on /mnt/ram1/
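
A minimal sketch of such a generator follows. It is not the exact script I used: the function name gen_files and the file naming scheme are illustrative, and it assumes "K" means KiB (dd's bs=1K).

```shell
# Hypothetical helper: create COUNT files of SIZE_K KiB each under DIR,
# filled with data from /dev/urandom.
gen_files() {
    dir=$1
    count=$2
    size_k=$3
    mkdir -p "$dir" || return 1
    echo "----- generate files $count x ${size_k}K"
    i=1
    while [ "$i" -le "$count" ]; do
        # The size is part of the file name so different sizes can share one directory.
        dd if=/dev/urandom of="$dir/file_${size_k}K_$i" bs=1K count="$size_k" 2>/dev/null || return 1
        i=$((i + 1))
    done
}

# Example matching the data set above:
#   gen_files /mnt/ram1 10240 100
#   gen_files /mnt/ram1 5120 1000
#   gen_files /mnt/ram1 1024 10000
```

Generating the files once and reusing them keeps /dev/urandom out of the timed part of the test.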

Now comes the transfer. I simply ran cp -r /mnt/ram1/* /mnt/gv0/ (and likewise for the other targets) to write and cp -r /mnt/gv0/* /mnt/ram1/ to read, and counted the seconds. The results look terrible.

                    read    write
ram <-> ram           4s       4s
ram <-> ssd           4s       7s
ram <-> hdd           4s       7s
ram <-> gv0 (ssd)   162s     145s
ram <-> gv1 (hdd)   164s     165s
ram <-> gv2 (ram)   158s     133s

So reading and writing on a local disk is up to around 40 times faster than on the Gluster cluster. That can't be right.

What am I missing?
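
To put the table into more familiar units, the timings can be converted into average throughput and average time per file. The sketch below assumes the full 16000 MB set (10240 + 5120 + 1024 = 16384 files) was copied in each run. The helper names are hypothetical, and timed_copy uses cp -r SRC/. DST/ rather than the SRC/* glob from my test.

```shell
# Copy the contents of SRC into DST and print the elapsed whole seconds.
timed_copy() {
    src=$1
    dst=$2
    start=$(date +%s)
    cp -r "$src"/. "$dst"/ || return 1
    end=$(date +%s)
    echo $((end - start))
}

# Average throughput in MB/s for TOTAL_MB transferred in SECONDS.
throughput_mbs() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}

# Average time per file in milliseconds for COUNT files copied in SECONDS.
ms_per_file() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", s * 1000 / n }'
}

# Examples with numbers from the table:
#   throughput_mbs 16000 4      # local RAM disk read
#   throughput_mbs 16000 162    # gv0 read
#   ms_per_file 16384 162       # gv0 read, average per file
```

With the gv0 read time of 162s this works out to roughly 99 MB/s, or just under 10 ms per file on average, compared with about 4000 MB/s for the local RAM disk.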

0 Answers