I have a functional XenServer 6.5 pool with two nodes, backed by an iSCSI share on a Dell MD3600i SAN. This works fine; it was set up before my time.
We've added three more nodes to the pool. However, these three new nodes will not connect to the storage.
Here's one of the original nodes, working fine:
[root@node1 ~]# iscsiadm -m session
tcp: [2] 10.19.3.11:3260,1 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [3] 10.19.3.14:3260,2 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [4] 10.19.3.12:3260,1 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [5] 10.19.3.13:3260,2 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
Here's one of the new nodes. Notice the corruption in the address?
[root@vnode3 ~]# iscsiadm -m session
tcp: [1] []:-1,2 ▒A<g▒▒▒-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [2] 10.19.3.12:3260,1 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [3] 10.19.3.11:3260,1 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [4] 10.19.3.14:3260,2 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
The missing IP address is .13, but another node is missing .12.
Comments:
I have live production VMs running on the existing nodes and nowhere to move them, so rebooting the SAN is not an option.
Multipathing is disabled on the original nodes, despite the SAN having four interfaces. This seems suboptimal, so I've turned on multipathing on the new nodes.
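(For reference, enabling multipathing on a XenServer 6.5 host from the CLI is roughly the sequence below; the UUID is a placeholder, and the host should be in maintenance mode with its storage unplugged before the setting is changed:)
xe host-disable uuid=<host-uuid>
xe host-param-set uuid=<host-uuid> other-config:multipathing=true
xe host-param-set uuid=<host-uuid> other-config:multipathhandle=dmp
xe host-enable uuid=<host-uuid>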
The three new nodes have awfully high system loads. The original boxes have a load average of 0.5 to 1, while the three new nodes are sitting at about 11.1 with no VMs running. top shows no high-CPU processes, so it's presumably something kernel-related? There are no processes locked in state D (uninterruptible sleep).
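(Side note for anyone chasing the same symptom: the Linux load average counts tasks in uninterruptible sleep as well as runnable ones, so wedged kernel iSCSI work can inflate it with no visible CPU use. The usual quick checks in dom0 are something like:)
ps -eo state,pid,comm | awk '$1 == "D"'          # any uninterruptible tasks, including kernel threads
dmesg | grep -iE 'iscsi|timed out' | tail -20    # recent iSCSI or SCSI errors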
If I tell XenCenter to "repair" those Storage Repositories, it sits spinning its wheels for hours till I hit cancel. The message is "Plugging PBD for node5".
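(When XenCenter hangs like this, the same plug can be attempted from the pool master's CLI, which usually surfaces the underlying storage error instead of just spinning; the UUIDs are placeholders:)
xe pbd-list sr-uuid=<sr-uuid> host-uuid=<new-host-uuid>   # find the PBD joining the SR to the problem host
xe pbd-plug uuid=<pbd-uuid>                               # try the plug by hand and watch for the real error
tail -f /var/log/SMlog                                    # the storage manager logs the attempt here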
Question: How do I get my new XenServer pool members to see the pool storage and work as expected?
EDIT: Further information
- None of the new nodes will do a clean reboot either: they get wedged at "stopping iSCSI" on shutdown, and I have to use the DRAC to remotely repower them (see the sketch after this list).
- XenCenter is adamant that the nodes are in maintenance mode and that they haven't finished booting.
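(The reboot wedge is usually the iSCSI shutdown script waiting on the dead session. A workaround sketch, to be used with care, is to drop the sessions by hand before rebooting; the SID comes from iscsiadm -m session:)
iscsiadm -m node --logoutall=all     # log out of every session (may hang on the broken one)
iscsiadm -m session -r <sid> -u      # or log out of a single session by its SID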
Good pool node:
[root@node1 ~]# multipath -ll
36f01faf000eaf7f90000076255c4a0f3 dm-36 DELL,MD36xxi
size=3.3T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=12 status=enabled
| |- 14:0:0:6 sdg 8:96 active ready running
| `- 15:0:0:6 sdi 8:128 active ready running
`-+- policy='round-robin 0' prio=11 status=enabled
|- 12:0:0:6 sdc 8:32 active ready running
`- 13:0:0:6 sdh 8:112 active ready running
36f01faf000eaf6fd0000098155ad077f dm-35 DELL,MD36xxi
size=917G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=14 status=enabled
| |- 12:0:0:5 sdb 8:16 active ready running
| `- 13:0:0:5 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=9 status=enabled
|- 14:0:0:5 sde 8:64 active ready running
`- 15:0:0:5 sdf 8:80 active ready running
Bad node:
[root@vnode3 ~]# multipath
Dec 24 02:56:44 | 3614187703d4a1c001e0582691d5d6902: ignoring map
[root@vnode3 ~]# multipath -ll
[root@vnode3 ~]# (i.e. no output at all; the exit code was 0)
Bad node:
[root@vnode3 ~]# iscsiadm -m session
tcp: [1] []:-1,2 ▒A<g▒▒▒-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [2] 10.19.3.12:3260,1 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [3] 10.19.3.11:3260,1 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
tcp: [4] 10.19.3.14:3260,2 iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb (non-flash)
[root@vnode3 ~]# iscsiadm -m node --loginall=all
Logging in to [iface: default, target: iqn.1984-05.com.dell:powervault.md3600i.6f01faf000eaf7f900000000531ae9bb, portal: 10.19.3.13,3260] (multiple)
^C iscsiadm: caught SIGINT, exiting...
So it tries to log into an IP on the SAN, but spins its wheels for hours till I hit ^C.
If the iSCSI discovery doesn't work, it's probably a matter of the IQN on the XenServer host, the MD3600i, or both not recognizing each other. Make sure the MD3600i allows access from the IQNs of all your XenServer hosts using Dell's MDSM utility, and then try to redo the iSCSI discovery:
iscsiadm -m discovery -t st -p (MD3600i-primary-controller-IP-address)
iscsiadm -m node --loginall=all
iscsiadm -m session
You should at least be able to ping the primary IP address of the MD3600i from your XenServers if you have network access.
Also note that you'll first need to set up separate iSCSI interfaces on the NICs of each new XenServer host and assign them unique static IP addresses on the same subnets as your other hosts' iSCSI connections.
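For example, something along these lines on each new host (the NIC device, host name, and final address are placeholders; only the 10.19.3.0/24 subnet comes from the question above):
xe pif-list host-name-label=<new-host> device=eth2
xe pif-reconfigure-ip uuid=<pif-uuid> mode=static IP=10.19.3.x netmask=255.255.255.0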
I hope that helps, --Tobias
For closure, there were multiple things wrong; the root cause turned out to be a mismatched MTU on the storage network.
Multipath seemed to have no bearing on the problem at all.
Deleting and fiddling around with files in /var/lib/iscsi/* on the XenServer nodes had no impact on the problem.
I had to use other means to reboot these newer boxes too; they would wedge up trying to stop the iSCSI service.
And finally, the corruption in the IQN name visible in
iscsiadm -m session
has vanished completely. This was possibly related to the MTU mismatch. For future internet searchers: good luck!
Edit: In September 2021, I had exactly the same issue with a Dell MD3800 SAN and some XCP-ng servers. Again, it was caused by a mismatched MTU, and Google just happened to serve up this question, which I had completely forgotten. Just goes to show how important it is to provide closure for future readers... that reader might be you.
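If you suspect the same cause, a quick way to confirm an MTU mismatch is a DF-flagged ping at jumbo-frame size from the host to each SAN portal (8972 bytes of payload assumes a 9000-byte MTU minus 28 bytes of IP/ICMP headers; the portal address here is just one from this question):
ping -M do -s 8972 -c 3 10.19.3.13   # times out or reports an error if anything in the path has a smaller MTU
ip link show | grep -i mtu           # compare the host NIC MTUs with what the SAN and switch ports expect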