Introduction
I am attempting to upgrade my installation of Openshift-Ansible from 3.6 to a higher version.
Currently, I'm running the following in an attempt to upgrade to v3.7:
ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -i hosts -k openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_7/upgrade.yml
I get the following failing task when running:
TASK [Check for invalid namespaces and SDN errors] ***********************************************************************************************************************************************************************
fatal: [10.0.0.51]: FAILED! => {"changed": false, "msg": "Failed to GET hostsubnet.", "results": {"cmd": "/usr/bin/oc get hostsubnet -o json -n default", "results": [{}], "returncode": 1, "stderr": "Unable to connect to the server: dial tcp: lookup docker1.foo.bar on 10.0.0.1:53: no such host\n", "stdout": ""}, "state": "list"}
Details
I've checked out the release-3.9 branch of the openshift-ansible project.
For the sake of brevity, I'll post only the portions of my hosts file that I think are relevant. Please let me know if I have left out important details:
[OSEv3:children]
masters
nodes
etcd
openshift_master_cluster_method=native
openshift_master_cluster_hostname=10.0.0.51
openshift_master_cluster_public_hostname=10.0.0.51
osm_cluster_network_cidr=10.168.0.0/13
[masters]
10.0.0.51
[etcd]
10.0.0.51
[nodes]
10.0.0.51 openshift_node_labels="{'region': 'infra','zone': 'default','node-role.kubernetes.io/compute': 'true'}" openshift_schedulable=true
10.0.0.52 openshift_node_labels="{'region': 'infra','zone': 'default','node-role.kubernetes.io/compute': 'true'}"
10.0.0.53 openshift_node_labels="{'region': 'infra','zone': 'default','node-role.kubernetes.io/compute': 'true'}"
I think this may be a name resolution problem. I have an /etc/resolv.conf inside 10.0.0.53 with the following contents:
nameserver 10.0.0.1
nameserver 10.0.0.53
If I do nslookup docker1.foo.bar 10.0.0.1, the lookup fails, which is expected because 10.0.0.1 can't resolve internal network names.
If I do nslookup docker1.foo.bar 10.0.0.53, the name resolves to 10.0.0.51 as expected.
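The same check can be repeated against every nameserver a host is configured with. The following is a minimal sketch, not part of openshift-ansible: `list_nameservers` and `check_resolution` are hypothetical helper names, and it assumes that the local `nslookup` exits non-zero when a name cannot be resolved, which may vary between versions.

```shell
# Hypothetical helper: print the nameserver addresses from a
# resolv.conf-style file, in the order they are listed.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Hypothetical helper: ask each listed nameserver for a host name and
# report whether it answered. Assumes nslookup returns a non-zero exit
# status when the lookup fails.
check_resolution() {
    host="$1"
    file="${2:-/etc/resolv.conf}"
    for ns in $(list_nameservers "$file"); do
        if nslookup "$host" "$ns" >/dev/null 2>&1; then
            echo "$ns: resolves $host"
        else
            echo "$ns: cannot resolve $host"
        fi
    done
}

# Example (run on the node being checked):
#   check_resolution docker1.foo.bar
```

Running it on each of the three nodes shows which nameserver is consulted first on that host, which is the one that matters for the failing task.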
What I've Already Done Before this
Initially, when I ran upgrade.retry as above, I got the following error:
TASK [openshift_excluder : Check the available origin-docker-excluder version is at most of the upgrade target version] **********************************************************************
[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using `result|version_compare` use `result is version_compare`. This feature will be removed in version 2.9.
Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
fatal: [docker1.foo.bar]: FAILED! => {"changed": false, "msg": "Available origin-docker-excluder version 3.9.0 is higher than the upgrade target version"}
...
So I added the following to my inventory file:
enable_excluders=false
This seems to let the playbook continue past that error, but then I hit the error described in the Introduction.
Question
How can I upgrade my installation of Openshift-Ansible, or, what is causing my error?
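Make 10.0.0.53 (or some other nameserver with knowledge of your private network) your primary nameserver by moving it to the top of /etc/resolv.conf. The failing lookup went to 10.0.0.1, the first nameserver listed, and a "no such host" answer from it is treated as final, so 10.0.0.53 was never consulted. With the internal nameserver first, the OpenShift client can resolve docker1.foo.bar and get the host subnet. This may not be enough to complete the upgrade, but it will resolve this issue.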
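As a rough sketch of that reordering, the helper below prints a copy of a resolv.conf with one nameserver moved to the top. `promote_nameserver` is a hypothetical name, and the sketch assumes each entry is written exactly as `nameserver <ip>` with a single space. It writes to standard output rather than editing the file in place, so the result can be reviewed first.

```shell
# Hypothetical helper: print a resolv.conf with the given nameserver
# listed first. Usage: promote_nameserver <ip> <resolv.conf path>
promote_nameserver() {
    ip="$1"
    file="$2"
    # Emit the preferred nameserver first.
    printf 'nameserver %s\n' "$ip"
    # Then emit every other line unchanged, dropping the old entry for
    # that nameserver. -F matches literally, -x matches whole lines.
    grep -v -x -F "nameserver $ip" "$file" || true
}

# Example (review the output before replacing the real file as root):
#   promote_nameserver 10.0.0.53 /etc/resolv.conf > /tmp/resolv.conf.new
```

On hosts where network tooling regenerates /etc/resolv.conf, a manual edit may be overwritten, so the change might need to be made in that tool's configuration instead.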