I've searched for a while and can't find an answer, or even a direction to look in.
So: an XCP-ng cluster of three HP DL360p Gen8 servers, an MSA 2060 iSCSI storage array with 12 SAS 10K drives, a QNAP TS-1273U-RP, and a MikroTik CRS317 switch. The storage network is on a dedicated bridge in the MikroTik. All devices are connected with 3-meter copper DAC cables, and all of them report a 10G link. I have even configured MTU 9000 on all devices. Each server has an Ethernet card with two interfaces; one is used for the storage network only (eth1 on all three servers), and the storage network is on a different subnet from the management network. The Xen network backend is openvswitch.
Jumbo frames are working:
ping -M do -s 8972 -c 2 10.100.200.10   # QNAP
PING 10.100.200.10 (10.100.200.10) 8972(9000) bytes of data.
8980 bytes from 10.100.200.10: icmp_seq=1 ttl=64 time=1.01 ms
8980 bytes from 10.100.200.10: icmp_seq=2 ttl=64 time=0.349 ms
--- 10.100.200.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.349/0.682/1.015/0.333 ms
ping -M do -s 8972 -c 2 10.100.200.8   # MSA 2060
PING 10.100.200.8 (10.100.200.8) 8972(9000) bytes of data.
8980 bytes from 10.100.200.8: icmp_seq=1 ttl=64 time=9.83 ms
8980 bytes from 10.100.200.8: icmp_seq=2 ttl=64 time=0.215 ms
--- 10.100.200.8 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.215/5.023/9.832/4.809 ms
Problem: when I copy a virtual machine from one shared storage (QNAP) to the other (MSA), the write speed is about 45 MB/s. When I copy a large file (e.g. an install ISO) from the QNAP to a server's local storage, the speed is about 100 MB/s, and htop on that server shows one core at 100% load.
It clearly looks like the network is behaving like a 1G link.
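To put numbers on that claim (my own conversion, assuming decimal megabytes and 8 bits per byte, not output from any tool):

```shell
# Convert the observed copy speeds from MB/s to Gbit/s
# to compare them against nominal link rates.
rates=$(awk 'BEGIN {
    printf "45 MB/s  = %.2f Gbit/s\n", 45 * 8 / 1000
    printf "100 MB/s = %.2f Gbit/s\n", 100 * 8 / 1000
}')
echo "$rates"
```

So the 45 MB/s VM copy is only ~0.36 Gbit/s, well below even 1GbE line rate, while the 100 MB/s file copy (~0.8 Gbit/s) sits right at the classic 1GbE ceiling.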
Some info about the hardware:
ethtool -i eth1
driver: ixgbe
version: 5.5.2
firmware-version: 0x18b30001
expansion-rom-version:
bus-info: 0000:07:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
ethtool eth1
Settings for eth1:
Supported ports: [ FIBRE ]
Supported link modes: 10000baseT/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: 10000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 10000Mb/s
Duplex: Full
Port: Direct Attach Copper
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
lspci | grep net
07:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
Then I ran an iperf3 server on this host: iperf3 -s -4
Results on the server host:
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-10.04 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-10.04 sec 5.48 GBytes 4.69 Gbits/sec receiver
[ 7] 0.00-10.04 sec 0.00 Bytes 0.00 bits/sec sender
[ 7] 0.00-10.04 sec 5.44 GBytes 4.66 Gbits/sec receiver
[SUM] 0.00-10.04 sec 0.00 Bytes 0.00 bits/sec sender
[SUM] 0.00-10.04 sec 10.9 GBytes 9.35 Gbits/sec receiver
And the client on another host: iperf3 -c 10.100.200.20 -P 2 -t 10 -4
Results on the client host:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 5.49 GBytes 4.72 Gbits/sec 112 sender
[ 4] 0.00-10.00 sec 5.48 GBytes 4.71 Gbits/sec receiver
[ 6] 0.00-10.00 sec 5.45 GBytes 4.68 Gbits/sec 178 sender
[ 6] 0.00-10.00 sec 5.44 GBytes 4.67 Gbits/sec receiver
[SUM] 0.00-10.00 sec 10.9 GBytes 9.40 Gbits/sec 290 sender
[SUM] 0.00-10.00 sec 10.9 GBytes 9.38 Gbits/sec receiver
What should I test next, and how do I find the bottleneck?
iperf3 seems to show the link running at 10 Gbit/s, or am I interpreting the results incorrectly?
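If my reading is right, the raw link is fine. Converting the iperf3 SUM figure to bytes per second (my own arithmetic, assuming 8 bits per byte and decimal units):

```shell
# Convert the iperf3 aggregate throughput to GB/s
# so it can be compared directly with the copy speeds above.
net=$(awk 'BEGIN { printf "9.38 Gbit/s = %.2f GB/s\n", 9.38 / 8 }')
echo "$net"
```

That is roughly 1.17 GB/s end to end, more than an order of magnitude above the ~45 MB/s I see during the actual copy, so the bottleneck should be somewhere other than the raw link.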
Software versions:
xe host-list params=software-version
software-version (MRO) : product_version: 8.2.0; product_version_text: 8.2; product_version_text_short: 8.2; platform_name: XCP; platform_version: 3.2.0; product_brand: XCP-ng; build_number: release/stockholm/master/7; hostname: localhost; date: 2021-05-20; dbv: 0.0.1; xapi: 1.20; xen: 4.13.1-9.11.1; linux: 4.19.0+1; xencenter_min: 2.16; xencenter_max: 2.16; network_backend: openvswitch; db_schema: 5.602
software-version (MRO) : product_version: 8.2.0; product_version_text: 8.2; product_version_text_short: 8.2; platform_name: XCP; platform_version: 3.2.0; product_brand: XCP-ng; build_number: release/stockholm/master/7; hostname: localhost; date: 2021-05-20; dbv: 0.0.1; xapi: 1.20; xen: 4.13.1-9.11.1; linux: 4.19.0+1; xencenter_min: 2.16; xencenter_max: 2.16; network_backend: openvswitch; db_schema: 5.602
software-version (MRO) : product_version: 8.2.0; product_version_text: 8.2; product_version_text_short: 8.2; platform_name: XCP; platform_version: 3.2.0; product_brand: XCP-ng; build_number: release/stockholm/master/7; hostname: localhost; date: 2021-05-20; dbv: 0.0.1; xapi: 1.20; xen: 4.13.1-9.11.1; linux: 4.19.0+1; xencenter_min: 2.16; xencenter_max: 2.16; network_backend: openvswitch; db_schema: 5.602
The other two servers have HP 530FLR-SFP+ cards:
lspci | grep net
03:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
03:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
ethtool -i eth1
driver: bnx2x
version: 1.714.24 storm 7.13.11.0
firmware-version: bc 7.10.10
expansion-rom-version:
bus-info: 0000:03:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
ethtool eth1
Settings for eth1:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full
10000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: 10000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 10000Mb/s
Duplex: Full
Port: Direct Attach Copper
PHYAD: 1
Transceiver: internal
Auto-negotiation: off
Supports Wake-on: g
Wake-on: g
Current message level: 0x00000000 (0)
Link detected: yes
Edit 1 (Local storage test):
dmesg | grep sda
[ 13.093002] sd 0:1:0:0: [sda] 860051248 512-byte logical blocks: (440 GB/410 GiB)
[ 13.093077] sd 0:1:0:0: [sda] Write Protect is off
[ 13.093080] sd 0:1:0:0: [sda] Mode Sense: 73 00 00 08
[ 13.093232] sd 0:1:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 13.112781] sda: sda1 sda2 sda3 sda4 sda5 sda6
[ 13.114348] sd 0:1:0:0: [sda] Attached SCSI disk
[ 15.267456] EXT4-fs (sda1): mounting ext3 file system using the ext4 subsystem
[ 15.268750] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 17.597243] EXT4-fs (sda1): re-mounted. Opts: (null)
[ 18.991998] Adding 1048572k swap on /dev/sda6. Priority:-2 extents:1 across:1048572k
[ 19.279706] EXT4-fs (sda5): mounting ext3 file system using the ext4 subsystem
[ 19.281346] EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null)
dd if=/dev/sda of=/dev/null bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 11.1072 s, 92.2 MB/s
This is strange, because the server has a Smart Array P420i controller with 2 GB cache and a hardware RAID10 of six 146 GB 15K SAS drives. iLO shows that the storage is fine. On another server the result is similar: 1024000000 bytes (1.0 GB) copied, 11.8031 s, 86.8 MB/s
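One caveat I realise about my own test: dd with bs=1024 issues a read() syscall per KiB and, without iflag=direct, also measures the page cache, so it may be syscall-bound rather than disk-bound. A small illustration of the block-size effect on a scratch file (the scratch file stands in for the real device here; this is not my actual /dev/sda test):

```shell
#!/bin/sh
# Read the same data with two request sizes and compare the
# elapsed times dd reports. No O_DIRECT, so the page cache is
# involved; the point is only the per-request overhead.
set -e
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=64 2>/dev/null
small=$(dd if="$f" of=/dev/null bs=1024 2>&1 | tail -n 1)
large=$(dd if="$f" of=/dev/null bs=1M 2>&1 | tail -n 1)
echo "bs=1024: $small"
echo "bs=1M:   $large"
rm -f "$f"
```

With a larger block size (and ideally iflag=direct against the real device), the same array would likely report much higher sequential throughput than 92 MB/s.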
Edit 2 (Shared storage test):
QNAP (SSD RAID10):
dd if=/run/sr-mount/23d45731-c005-8ad6-a596-bab2d12ec6b5/01ce9f2e-c5b1-4ba8-b783-d3a5c1ac54f0.vhd of=/dev/null bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 11.2902 s, 90.7 MB/s
MSA (HP MSA-DP+ RAID):
dd if=/dev/mapper/3600c0ff000647bc2259a2f6101000000 of=/dev/null bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 11.3974 s, 89.8 MB/s
Still no more than 1 Gigabit worth of throughput... So when I transfer VM images between the shared storages, local storage isn't involved at all. Can openvswitch be the bottleneck?
Edit 3 (More disk tests):
sda = RAID10 of 6 × 146 GB 15K SAS drives; sdb = a single 146 GB 15K SAS drive in RAID0
dd if=/dev/sdb of=/dev/null bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 16.5326 s, 61.9 MB/s
[14:35 xcp-ng-em ssh]# dd if=/dev/sdb of=/dev/null bs=512k count=1000
1000+0 records in
1000+0 records out
524288000 bytes (524 MB) copied, 8.48061 s, 61.8 MB/s
[14:36 xcp-ng-em ssh]# dd if=/dev/sdb of=/dev/null bs=512k count=10000
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 84.9631 s, 61.7 MB/s
[14:37 xcp-ng-em ssh]# dd if=/dev/sda of=/dev/null bs=512k count=10000
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 7.03023 s, 746 MB/s