I've installed a completely fresh OS with Proxmox on 4 nodes. Every node has 2x NVMe and 1x HDD, one public NIC and one private NIC. On the public network there is an additional WireGuard interface running for PVE cluster communication. The private interface should be used only for the upcoming distributed storage.
# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
link/ether 6c:b3:11:07:f1:18 brd ff:ff:ff:ff:ff:ff
inet 10.255.255.2/24 brd 10.255.255.255 scope global enp3s0
valid_lft forever preferred_lft forever
inet6 fe80::6eb3:11ff:fe07:f118/64 scope link
valid_lft forever preferred_lft forever
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether b4:2e:... brd ff:ff:ff:ff:ff:ff
inet 168..../26 brd 168....127 scope global eno1
valid_lft forever preferred_lft forever
inet6 2a01:.../128 scope global
valid_lft forever preferred_lft forever
inet6 fe80::b62e:99ff:fecc:f5d0/64 scope link
valid_lft forever preferred_lft forever
4: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether a2:fd:6a:c7:f0:be brd ff:ff:ff:ff:ff:ff
inet6 2a01:....::2/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::..:f0be/64 scope link
valid_lft forever preferred_lft forever
6: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
link/none
inet 10.3.0.10/32 scope global wg0
valid_lft forever preferred_lft forever
inet6 fd01:3::a/128 scope global
valid_lft forever preferred_lft forever
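For reference, the MTU 9000 on the private NIC is set in /etc/network/interfaces, roughly like this (a sketch, not the literal file; address and interface name as on this node):

auto enp3s0
iface enp3s0 inet static
        address 10.255.255.2/24
        mtu 9000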
The nodes are fine and the PVE cluster is running as expected.
# pvecm status
Cluster information
-------------------
Name: ac-c01
Config Version: 4
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Tue Dec 15 22:36:44 2020
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000002
Ring ID: 1.11
Quorate: Yes
Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 4
Quorum: 3
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.3.0.4
0x00000002 1 10.3.0.10 (local)
0x00000003 1 10.3.0.13
0x00000004 1 10.3.0.16
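(The corosync links run over the WireGuard addresses, i.e. the nodelist in /etc/pve/corosync.conf looks roughly like this - a sketch, the node names are placeholders:)

nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.3.0.4
  }
  node {
    name: node2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.3.0.10
  }
  ...
}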
The PVE firewall is active in the cluster, but there is a rule that all PVE nodes can talk to each other on any protocol, any port, and any interface. This is true - I can ping, ssh, etc. between all nodes on all IPs.
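For illustration, the rule is roughly equivalent to this in /etc/pve/firewall/cluster.fw (a sketch, not the exact ruleset; the two subnets are the WireGuard and the private storage network, plus corresponding rules for the public addresses):

[RULES]
IN ACCEPT -source 10.3.0.0/24
IN ACCEPT -source 10.255.255.0/24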
Then I installed Ceph:
pveceph install
On the first node I initialized Ceph with
pveceph init -network 10.255.255.0/24
pveceph createmon
That works.
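After init and createmon, /etc/pve/ceph.conf on the first node contains the chosen network, roughly along these lines (a sketch from memory; exact keys may differ between PVE versions, fsid omitted):

[global]
        cluster_network = 10.255.255.0/24
        public_network = 10.255.255.0/24
        mon_host = 10.255.255.1
        ...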
On the second node I tried the same (I'm not sure if I need to set the -network option, so I tried with and without). That works too.
But pveceph createmon fails on every node except node1 with:
# pveceph createmon
got timeout
I can also reach port 10.255.255.1:6789 from any node. Whatever I try, I get "got timeout" on every node other than node1. Also, disabling the firewall doesn't have any effect.
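(The reachability check I mean is nothing more than something like this, from node2 towards node1's monitor address; the exact tooling doesn't matter:)

ping -c 3 10.255.255.1
nc -vz 10.255.255.1 6789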
When I remove the -network option, I can run all commands. It looks like it cannot talk via the second interface. But the interface is fine.
When I set network to 10.3.0.0/24 and cluster-network to 10.255.255.0/24, it works too, but I want all Ceph communication running via 10.255.255.0/24. What is wrong?
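(In command form, the variant that does work is roughly this - a sketch; the --cluster-network option of pveceph init may be spelled slightly differently depending on the PVE version:)

pveceph init --network 10.3.0.0/24 --cluster-network 10.255.255.0/24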
The problem is the MTU of 9000. Even when I run the complete Proxmox cluster via the private network, there are errors.
So, Ceph has a problem with jumbo frames.
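A quick way to see whether jumbo frames actually make it end-to-end on the storage network is a ping with the don't-fragment flag and a payload sized for MTU 9000 (8972 bytes = 9000 minus 20 bytes IP header minus 8 bytes ICMP header); if this times out while a normal ping works, the path does not really carry jumbo frames:

ping -M do -s 8972 -c 3 10.255.255.1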
Just for reference, the official documentation mentions jumbo frames as bringing important performance improvements:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/configuration_guide/ceph-network-configuration#verifying-and-configuring-the-mtu-value_conf
https://ceph.io/en/news/blog/2015/ceph-loves-jumbo-frames/
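The verification step from the Red Hat doc boils down to checking that every node (and every device in between) actually reports the intended MTU, e.g.:

ip link show | grep mtu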
I, for one, have seen read/write performance improvements of around 1400% after changing the MTU on the 6 nodes we set up (3 storage, 3 compute).
And no, this is not a typo. We went from 110 MB/s read/write with dd tests in Linux VMs to 1.5-1.6 GB/s afterwards (1 Gbps public network, 10 Gbps private network, OSDs on SATA SSDs).
Nota bene: changing the MTU on all network interfaces (public AND private) seems quite important! In our case, changing it only on the private NICs made the whole system go haywire.
I hope this helps someone! Cheers