I am trying to add an EC2 instance to an Elastic Load Balancer using an Ansible playbook, with the ec2_elb module. This is the task that should do this:
- name: "Add host to load balancer {{ load_balancer_name }}"
sudo: false
local_action:
module: ec2_elb
state: present
wait: true
region: "{{ region }}"
ec2_elbs: ['{{ load_balancer_name }}']
instance_id: "{{ ec2_id }}"
However, it routinely fails, with this output (verbosity turned up):
TASK: [Add host to load balancer ApiELB-staging] ******************************
<127.0.0.1> REMOTE_MODULE ec2_elb region=us-east-1 state=present instance_id=i-eb7e0cc7
<127.0.0.1> EXEC ['/bin/sh', '-c', 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868 && echo $HOME/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868']
<127.0.0.1> PUT /var/folders/d4/17fw96k107d5kbck6fb2__vc0000gn/T/tmpki4HPF TO /Users/pkaeding/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868/ec2_elb
<127.0.0.1> EXEC ['/bin/sh', '-c', u'LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /Users/pkaeding/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868/ec2_elb; rm -rf /Users/pkaeding/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868/ >/dev/null 2>&1']
failed: [10.0.115.149 -> 127.0.0.1] => {"failed": true}
msg: The instance i-eb7e0cc7 could not be put in service on LoadBalancer:ApiELB-staging. Reason: Instance has not passed the configured HealthyThreshold number of health checks consecutively.
FATAL: all hosts have already failed -- aborting
I have my ELB configuration defined like this (also via Ansible):
- name: "Ensure load balancer exists: {{ load_balancer_name }}"
sudo: false
local_action:
module: ec2_elb_lb
name: "{{ load_balancer_name }}"
state: present
region: "{{ region }}"
subnets: "{{ vpc_public_subnet_ids }}"
listeners:
- protocol: https
load_balancer_port: 443
instance_protocol: http
instance_port: 8888
ssl_certificate_id: "{{ ssl_cert }}"
health_check:
ping_protocol: http # options are http, https, ssl, tcp
ping_port: 8888
ping_path: "/internal/v1/status"
response_timeout: 5 # seconds
interval: 30 # seconds
unhealthy_threshold: 10
healthy_threshold: 10
register: apilb
When I access the status resource, either from my laptop or from the server itself (as localhost), I get a 200 response as expected. I also added a command task to the Ansible playbook, right before adding the instance to the ELB, to confirm that the application is booted up and serving requests properly (and it is):
- command: /usr/bin/curl -v --fail http://localhost:8888/internal/v1/status
I don't see any non-200 responses for the status check resource in the logs for my application (but of course, if the requests never made it as far as my application, they would not be logged).
The other weird thing is that the instance does get added to the ELB, and it seems to work properly. So I know that at some point, at least, the load balancer can access the application properly (for both the status check resource, and other resources). The AWS console shows the instance is healthy, and the Cloudwatch charts don't show any failed health checks.
Any ideas?
Adapted from my earlier comment:
Judging from the Ansible docs, there's a wait_timeout parameter which you will have to set to something higher than 300 for this to work (330 would be safe). With your current settings, the ELB needs healthy_threshold × interval = 10 × 30 = 300 seconds of consecutive passing checks before it reports the instance as InService, so the task's wait runs out before that happens. Alternatively, you could lower your interval or healthy_threshold (or both) so that you have to wait less than 300 seconds.
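Something like this ought to work (an untested sketch based on your task above, with everything else unchanged; 330 is just a value comfortably over the 300-second mark):

- name: "Add host to load balancer {{ load_balancer_name }}"
  sudo: false
  local_action:
    module: ec2_elb
    state: present
    wait: true
    wait_timeout: 330   # longer than healthy_threshold * interval = 300 seconds
    region: "{{ region }}"
    ec2_elbs: ['{{ load_balancer_name }}']
    instance_id: "{{ ec2_id }}"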
Also, your unhealthy_threshold is the same as your healthy_threshold, so once a web server starts throwing 500 responses it will stay in the pool for 5 minutes (10 checks × 30 seconds) before the ELB drops it.

Finally, you can use the ec2_elb option wait: no, so the task doesn't block waiting for the health checks at all.
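If you go the wait: no route, the registration task is your original one with the wait flipped (again an untested sketch, everything else unchanged); the module then returns as soon as the instance is registered, without waiting for it to reach InService:

- name: "Add host to load balancer {{ load_balancer_name }}"
  sudo: false
  local_action:
    module: ec2_elb
    state: present
    wait: no              # don't block until the instance passes its health checks
    region: "{{ region }}"
    ec2_elbs: ['{{ load_balancer_name }}']
    instance_id: "{{ ec2_id }}"

The trade-off is that a later play step can no longer assume the instance is actually serving traffic through the ELB yet.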