I'm trying to deploy OpenStack Queens with kolla-ansible (7.0.0) on Ubuntu hosts, following the official guide.
After successful bootstrap-servers and precheck runs, the deploy command fails:
RUNNING HANDLER [haproxy : Waiting for virtual IP to appear] **********************************************************
fatal: [testcloudcontrol01]: FAILED! => {"changed": false, "elapsed": 300, "msg": "Timeout when waiting for 10.52.41.98:3306"}
fatal: [testcloudcontrol02]: FAILED! => {"changed": false, "elapsed": 300, "msg": "Timeout when waiting for 10.52.41.98:3306"}
The check fails because the kolla_internal_vip_address does not come up.
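For anyone who wants to reproduce that wait outside of Ansible: the handler is essentially polling a TCP port on the VIP until it answers or 300 seconds pass. The following is only a minimal Python sketch of such a poll, not the code Ansible runs; the function name wait_for_port and its parameters are my own.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on that address.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call with the address and port from the error message:
# wait_for_port("10.52.41.98", 3306, timeout=10)
```

As long as no node holds the VIP, every connection attempt fails and the function returns False once the timeout is reached, which matches the failure shown above.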
globals.yml:
config_strategy: "COPY_ALWAYS"
kolla_base_distro: "ubuntu"
kolla_install_type: "binary"
openstack_release: "queens"
kolla_internal_vip_address: "10.52.41.98"
kolla_internal_fqdn: "testcloudapi.example.com"
kolla_external_vip_address: "{{ kolla_internal_vip_address }}"
kolla_external_fqdn: "{{ kolla_internal_fqdn }}"
network_interface: "ens160"
api_interface: "ens160"
storage_interface: "ens161"
keepalived_virtual_router_id: "148"
I'm currently fixed on queens because I want to replicate our production environment for testing.
The output of ip addr on one of the nodes where haproxy is supposed to deploy:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:a1:6a:2c brd ff:ff:ff:ff:ff:ff
inet 10.52.41.100/24 brd 10.52.41.255 scope global ens160
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fea1:6a2c/64 scope link
valid_lft forever preferred_lft forever
3: ens161: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:a1:7d:07 brd ff:ff:ff:ff:ff:ff
inet 10.52.42.100/24 brd 10.52.42.255 scope global ens161
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fea1:7d07/64 scope link
valid_lft forever preferred_lft forever
4: ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:a1:23:6e brd ff:ff:ff:ff:ff:ff
inet 10.52.40.100/24 brd 10.52.40.255 scope global ens224
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fea1:236e/64 scope link
valid_lft forever preferred_lft forever
5: ens256: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:a1:20:12 brd ff:ff:ff:ff:ff:ff
inet 10.52.44.100/24 brd 10.52.44.255 scope global ens256
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fea1:2012/64 scope link
valid_lft forever preferred_lft forever
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:b0:8a:93:e7 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
The nodes are VMware virtual machines with VMXNET3 NICs.
Output of docker logs keepalived:
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/keepalived/keepalived.conf
INFO:__main__:Copying /var/lib/kolla/config_files/keepalived.conf to /etc/keepalived/keepalived.conf
INFO:__main__:Setting permission for /etc/keepalived/keepalived.conf
INFO:__main__:Writing out command to execute
++ cat /run_command
+ CMD='/usr/sbin/keepalived -nld -p /run/keepalived.pid'
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
++ modprobe ip_vs
++ '[' -f /run/keepalived.pid ']'
+ echo 'Running command: '\''/usr/sbin/keepalived -nld -p /run/keepalived.pid'\'''
Running command: '/usr/sbin/keepalived -nld -p /run/keepalived.pid'
+ exec /usr/sbin/keepalived -nld -p /run/keepalived.pid
Thu Dec 13 12:10:26 2018: Starting Keepalived v1.3.9 (10/21,2017)
Thu Dec 13 12:10:26 2018: Opening file '/etc/keepalived/keepalived.conf'.
Thu Dec 13 12:10:26 2018: Starting Healthcheck child process, pid=11
Thu Dec 13 12:10:26 2018: Opening file '/etc/keepalived/keepalived.conf'.
Thu Dec 13 12:10:26 2018: Starting VRRP child process, pid=12
Thu Dec 13 12:10:26 2018: ------< Global definitions >------
Thu Dec 13 12:10:26 2018: Router ID = testcloudcontrol01.example.com
Thu Dec 13 12:10:26 2018: Default interface = eth0
Thu Dec 13 12:10:26 2018: LVS flush = false
Thu Dec 13 12:10:26 2018: VRRP IPv4 mcast group = 224.0.0.18
Thu Dec 13 12:10:26 2018: VRRP IPv6 mcast group = ff02::12
Thu Dec 13 12:10:26 2018: Gratuitous ARP delay = 5
Thu Dec 13 12:10:26 2018: Gratuitous ARP repeat = 5
Thu Dec 13 12:10:26 2018: Gratuitous ARP refresh timer = 0
Thu Dec 13 12:10:26 2018: Gratuitous ARP refresh repeat = 1
Thu Dec 13 12:10:26 2018: Gratuitous ARP lower priority delay = 4294
Thu Dec 13 12:10:26 2018: Gratuitous ARP lower priority repeat = -1
Thu Dec 13 12:10:26 2018: Send advert after receive lower priority advert = true
Thu Dec 13 12:10:26 2018: Send advert after receive higher priority advert = false
Thu Dec 13 12:10:26 2018: Gratuitous ARP interval = 0
Thu Dec 13 12:10:26 2018: Gratuitous NA interval = 0
Thu Dec 13 12:10:26 2018: VRRP default protocol version = 2
Thu Dec 13 12:10:26 2018: Iptables input chain = INPUT
Thu Dec 13 12:10:26 2018: Using ipsets = true
Thu Dec 13 12:10:26 2018: ipset IPv4 address set = keepalived
Thu Dec 13 12:10:26 2018: ipset IPv6 address set = keepalived6
Thu Dec 13 12:10:26 2018: ipset IPv6 address,iface set = keepalived_if6
Thu Dec 13 12:10:26 2018: VRRP check unicast_src = false
Thu Dec 13 12:10:26 2018: VRRP skip check advert addresses = false
Thu Dec 13 12:10:26 2018: VRRP strict mode = false
Thu Dec 13 12:10:26 2018: VRRP process priority = 0
Thu Dec 13 12:10:26 2018: VRRP don't swap = false
Thu Dec 13 12:10:26 2018: Checker process priority = 0
Thu Dec 13 12:10:26 2018: Checker don't swap = false
Thu Dec 13 12:10:26 2018: SNMP keepalived disabled
Thu Dec 13 12:10:26 2018: SNMP checker disabled
Thu Dec 13 12:10:26 2018: SNMP RFCv2 disabled
Thu Dec 13 12:10:26 2018: SNMP RFCv3 disabled
Thu Dec 13 12:10:26 2018: SNMP traps disabled
Thu Dec 13 12:10:26 2018: SNMP socket = default (unix:/var/agentx/master)
Thu Dec 13 12:10:26 2018: Network namespace = (default)
Thu Dec 13 12:10:26 2018: DBus disabled
Thu Dec 13 12:10:26 2018: DBus service name = (null)
Thu Dec 13 12:10:26 2018: Script security disabled
Thu Dec 13 12:10:26 2018: Default script uid:gid 0:0
Thu Dec 13 12:10:26 2018: Registering Kernel netlink reflector
Thu Dec 13 12:10:26 2018: Registering Kernel netlink command channel
Thu Dec 13 12:10:26 2018: Registering gratuitous ARP shared channel
Thu Dec 13 12:10:26 2018: Opening file '/etc/keepalived/keepalived.conf'.
Thu Dec 13 12:10:26 2018: WARNING - default user 'keepalived_script' for script execution does not exist - please create.
Thu Dec 13 12:10:26 2018: Truncating auth_pass to 8 characters
Thu Dec 13 12:10:26 2018: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Thu Dec 13 12:10:26 2018: ------< Global definitions >------
Thu Dec 13 12:10:26 2018: Router ID = testcloudcontrol01.example.com
Thu Dec 13 12:10:26 2018: Default interface = eth0
Thu Dec 13 12:10:26 2018: LVS flush = false
Thu Dec 13 12:10:26 2018: VRRP IPv4 mcast group = 224.0.0.18
Thu Dec 13 12:10:26 2018: VRRP IPv6 mcast group = ff02::12
Thu Dec 13 12:10:26 2018: Gratuitous ARP delay = 5
Thu Dec 13 12:10:26 2018: Gratuitous ARP repeat = 5
Thu Dec 13 12:10:26 2018: Gratuitous ARP refresh timer = 0
Thu Dec 13 12:10:26 2018: Gratuitous ARP refresh repeat = 1
Thu Dec 13 12:10:26 2018: Gratuitous ARP lower priority delay = 5
Thu Dec 13 12:10:26 2018: Gratuitous ARP lower priority repeat = 5
Thu Dec 13 12:10:26 2018: Send advert after receive lower priority advert = true
Thu Dec 13 12:10:26 2018: Send advert after receive higher priority advert = false
Thu Dec 13 12:10:26 2018: Gratuitous ARP interval = 0
Thu Dec 13 12:10:26 2018: Gratuitous NA interval = 0
Thu Dec 13 12:10:26 2018: VRRP default protocol version = 2
Thu Dec 13 12:10:26 2018: Iptables input chain = INPUT
Thu Dec 13 12:10:26 2018: Using ipsets = false
Thu Dec 13 12:10:26 2018: ipset IPv4 address set = keepalived
Thu Dec 13 12:10:26 2018: ipset IPv6 address set = keepalived6
Thu Dec 13 12:10:26 2018: ipset IPv6 address,iface set = keepalived_if6
Thu Dec 13 12:10:26 2018: VRRP check unicast_src = false
Thu Dec 13 12:10:26 2018: VRRP skip check advert addresses = false
Thu Dec 13 12:10:26 2018: VRRP strict mode = false
Thu Dec 13 12:10:26 2018: VRRP process priority = 0
Thu Dec 13 12:10:26 2018: VRRP don't swap = false
Thu Dec 13 12:10:26 2018: Checker process priority = 0
Thu Dec 13 12:10:26 2018: Checker don't swap = false
Thu Dec 13 12:10:26 2018: SNMP keepalived disabled
Thu Dec 13 12:10:26 2018: SNMP checker disabled
Thu Dec 13 12:10:26 2018: SNMP RFCv2 disabled
Thu Dec 13 12:10:26 2018: SNMP RFCv3 disabled
Thu Dec 13 12:10:26 2018: SNMP traps disabled
Thu Dec 13 12:10:26 2018: SNMP socket = default (unix:/var/agentx/master)
Thu Dec 13 12:10:26 2018: Network namespace = (default)
Thu Dec 13 12:10:26 2018: DBus disabled
Thu Dec 13 12:10:26 2018: DBus service name = (null)
Thu Dec 13 12:10:26 2018: Script security disabled
Thu Dec 13 12:10:26 2018: Default script uid:gid 0:0
Thu Dec 13 12:10:26 2018: ------< VRRP Topology >------
Thu Dec 13 12:10:26 2018: VRRP Instance = kolla_internal_vip_148
Thu Dec 13 12:10:26 2018: Using VRRPv2
Thu Dec 13 12:10:26 2018: Want State = BACKUP
Thu Dec 13 12:10:26 2018: Running on device = ens160
Thu Dec 13 12:10:26 2018: Skip checking advert IP addresses = no
Thu Dec 13 12:10:26 2018: Enforcing strict VRRP compliance = no
Thu Dec 13 12:10:26 2018: Using src_ip = 10.52.41.100
Thu Dec 13 12:10:26 2018: Gratuitous ARP delay = 5
Thu Dec 13 12:10:26 2018: Gratuitous ARP repeat = 5
Thu Dec 13 12:10:26 2018: Gratuitous ARP refresh timer = 0
Thu Dec 13 12:10:26 2018: Gratuitous ARP refresh repeat = 1
Thu Dec 13 12:10:26 2018: Gratuitous ARP lower priority delay = 5
Thu Dec 13 12:10:26 2018: Gratuitous ARP lower priority repeat = 5
Thu Dec 13 12:10:26 2018: Send advert after receive lower priority advert = true
Thu Dec 13 12:10:26 2018: Send advert after receive higher priority advert = false
Thu Dec 13 12:10:26 2018: Virtual Router ID = 148
Thu Dec 13 12:10:26 2018: Priority = 1
Thu Dec 13 12:10:26 2018: Advert interval = 1 sec
Thu Dec 13 12:10:26 2018: Accept enabled
Thu Dec 13 12:10:26 2018: Preempt disabled
Thu Dec 13 12:10:26 2018: Promote_secondaries disabled
Thu Dec 13 12:10:26 2018: Authentication type = SIMPLE_PASSWORD
Thu Dec 13 12:10:26 2018: Password = 0RXbQYFF
Thu Dec 13 12:10:26 2018: Tracked scripts = 1
Thu Dec 13 12:10:26 2018: check_alive weight 0
Thu Dec 13 12:10:26 2018: Virtual IP = 1
Thu Dec 13 12:10:26 2018: 10.52.41.98/32 dev ens160 scope global
Thu Dec 13 12:10:26 2018: ------< VRRP Scripts >------
Thu Dec 13 12:10:26 2018: VRRP Script = check_alive
Thu Dec 13 12:10:26 2018: Command = /check_alive.sh
Thu Dec 13 12:10:26 2018: Interval = 2 sec
Thu Dec 13 12:10:26 2018: Timeout = 0 sec
Thu Dec 13 12:10:26 2018: Weight = 0
Thu Dec 13 12:10:26 2018: Rise = 10
Thu Dec 13 12:10:26 2018: Fall = 2
Thu Dec 13 12:10:26 2018: Insecure = no
Thu Dec 13 12:10:26 2018: Status = INIT
Thu Dec 13 12:10:26 2018: Script uid:gid = 0:0
Thu Dec 13 12:10:26 2018: ------< NIC >------
Thu Dec 13 12:10:26 2018: Name = lo
Thu Dec 13 12:10:26 2018: index = 1
Thu Dec 13 12:10:26 2018: IPv4 address = 127.0.0.1
Thu Dec 13 12:10:26 2018: IPv6 address = ::
Thu Dec 13 12:10:26 2018: is UP
Thu Dec 13 12:10:26 2018: is RUNNING
Thu Dec 13 12:10:26 2018: MTU = 65536
Thu Dec 13 12:10:26 2018: HW Type = LOOPBACK
Thu Dec 13 12:10:26 2018: ------< NIC >------
Thu Dec 13 12:10:26 2018: Name = ens160
Thu Dec 13 12:10:26 2018: index = 2
Thu Dec 13 12:10:26 2018: IPv4 address = 10.52.41.100
Thu Dec 13 12:10:26 2018: IPv6 address = fe80::250:56ff:fea1:6a2c
Thu Dec 13 12:10:26 2018: MAC = 00:50:56:a1:6a:2c
Thu Dec 13 12:10:26 2018: is UP
Thu Dec 13 12:10:26 2018: is RUNNING
Thu Dec 13 12:10:26 2018: MTU = 1500
Thu Dec 13 12:10:26 2018: HW Type = ETHERNET
Thu Dec 13 12:10:26 2018: ------< NIC >------
Thu Dec 13 12:10:26 2018: Name = ens161
Thu Dec 13 12:10:26 2018: index = 3
Thu Dec 13 12:10:26 2018: IPv4 address = 10.52.42.100
Thu Dec 13 12:10:26 2018: IPv6 address = fe80::250:56ff:fea1:7d07
Thu Dec 13 12:10:26 2018: MAC = 00:50:56:a1:7d:07
Thu Dec 13 12:10:26 2018: is UP
Thu Dec 13 12:10:26 2018: is RUNNING
Thu Dec 13 12:10:26 2018: MTU = 1500
Thu Dec 13 12:10:26 2018: HW Type = ETHERNET
Thu Dec 13 12:10:26 2018: ------< NIC >------
Thu Dec 13 12:10:26 2018: Name = ens224
Thu Dec 13 12:10:26 2018: index = 4
Thu Dec 13 12:10:26 2018: IPv4 address = 10.52.40.100
Thu Dec 13 12:10:26 2018: IPv6 address = fe80::250:56ff:fea1:236e
Thu Dec 13 12:10:26 2018: MAC = 00:50:56:a1:23:6e
Thu Dec 13 12:10:26 2018: is UP
Thu Dec 13 12:10:26 2018: is RUNNING
Thu Dec 13 12:10:26 2018: MTU = 1500
Thu Dec 13 12:10:26 2018: HW Type = ETHERNET
Thu Dec 13 12:10:26 2018: ------< NIC >------
Thu Dec 13 12:10:26 2018: Name = ens256
Thu Dec 13 12:10:26 2018: index = 5
Thu Dec 13 12:10:26 2018: IPv4 address = 10.52.44.100
Thu Dec 13 12:10:26 2018: IPv6 address = fe80::250:56ff:fea1:2012
Thu Dec 13 12:10:26 2018: MAC = 00:50:56:a1:20:12
Thu Dec 13 12:10:26 2018: is UP
Thu Dec 13 12:10:26 2018: is RUNNING
Thu Dec 13 12:10:26 2018: MTU = 1500
Thu Dec 13 12:10:26 2018: HW Type = ETHERNET
Thu Dec 13 12:10:26 2018: ------< NIC >------
Thu Dec 13 12:10:26 2018: Name = docker0
Thu Dec 13 12:10:26 2018: index = 6
Thu Dec 13 12:10:26 2018: IPv4 address = 172.17.0.1
Thu Dec 13 12:10:26 2018: IPv6 address = ::
Thu Dec 13 12:10:26 2018: MAC = 02:42:b0:8a:93:e7
Thu Dec 13 12:10:26 2018: is UP
Thu Dec 13 12:10:26 2018: MTU = 1500
Thu Dec 13 12:10:26 2018: HW Type = ETHERNET
Thu Dec 13 12:10:26 2018: Using LinkWatch kernel netlink reflector...
Thu Dec 13 12:10:26 2018: VRRP_Instance(kolla_internal_vip_148) Entering BACKUP STATE
Thu Dec 13 12:10:26 2018: /check_alive.sh exited with status 1
Thu Dec 13 12:10:28 2018: /check_alive.sh exited with status 1
Thu Dec 13 12:10:30 2018: VRRP_Instance(kolla_internal_vip_148) Now in FAULT state
Thu Dec 13 12:10:30 2018: /check_alive.sh exited with status 1
Thu Dec 13 12:10:32 2018: /check_alive.sh exited with status 1
[message repeats until I stop the container]
That's it: both keepalived instances stay in the FAULT state, and the IP address is not activated on any of the VMs.
I went through this question and the answer, even though I don't have the error messages in the log files:
- keepalived_virtual_router_id has been changed and is unique
- I ran kolla-genpwd again. I confirmed that keepalived_password is set in /etc/kolla/passwords.yml
- kolla_internal_vip_address is accessible from network_interface. The main IP on that interface is in the same network. I can manually set the additional IP address and it works.
- kolla-ansible prechecks passes
- selinux is not active on Ubuntu
On the hypervisor side I tried enabling Promiscuous mode for the port group of that interface. That didn't make a difference.
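For completeness, the "same network" check from the list above can be reproduced with a few lines of Python. This is just a sketch using the standard ipaddress module; the function name vip_in_interface_network is my own, and the values are the ones from globals.yml and the ip addr output.

```python
import ipaddress


def vip_in_interface_network(vip: str, interface_cidr: str) -> bool:
    """Return True if vip lies in the interface's network and is not its own address."""
    iface = ipaddress.ip_interface(interface_cidr)
    address = ipaddress.ip_address(vip)
    return address in iface.network and address != iface.ip


# VIP from globals.yml, address of ens160 from the ip addr output.
print(vip_in_interface_network("10.52.41.98", "10.52.41.100/24"))  # True
```

This only confirms that the addressing is consistent. It says nothing about whether keepalived is able to bring the address up.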
After running into the same problem on bare metal, I dug deeper. It turned out that it wasn't keepalived but the haproxy container that had the problem.
The haproxy container keeps restarting because haproxy is started with the command line parameter -W, which does not exist in the haproxy version that is shipped in the container. The keepalived container, on the other hand, is configured with a check script that keeps exiting with an error; this is the "/check_alive.sh exited with status 1" message repeated in the keepalived log.
This check script is very simple: it checks the status of haproxy via a socket file.
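The script itself is not reproduced here. As a rough illustration of that kind of check, the following Python sketch sends the "show info" command to a haproxy admin socket and reports whether anything comes back. It is not the code from the container: the function name haproxy_responds is my own, and the socket path has to be taken from the stats socket line of your haproxy.cfg.

```python
import socket


def haproxy_responds(socket_path: str, timeout: float = 2.0) -> bool:
    """Return True if a "show info" command on the haproxy socket yields any data."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(socket_path)
            sock.sendall(b"show info\n")
            # Any reply at all means a haproxy process is serving the socket.
            return len(sock.recv(4096)) > 0
    except OSError:
        # Missing socket file, refused connection, or timeout: haproxy is not up.
        return False
```

A check built like this fails for as long as haproxy is not running, which would explain why keepalived never leaves the FAULT state.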
So, as long as haproxy is called with the invalid parameter and doesn't start, keepalived stays in FAULT state, with no floating IP up.
Using grep -R "haproxy -W" * I found that the command line for haproxy is defined in the file /usr/local/share/kolla-ansible/ansible/roles/haproxy/templates/haproxy.json.j2. I removed the -W parameter from the command line, which resulted in haproxy starting properly and keepalived changing to MASTER state and configuring the floating IP.
There is already a bug report open on Launchpad regarding this issue. There is also a slightly different solution in the comments (changing the same file).
This modification will of course be reverted when the file is updated. If you have the same problem, please log into Launchpad and mark that the bug (which was reported on 2018-06-08) affects you, so it gets priority and gets fixed.