I am trying to add an EC2 instance to an Elastic Load Balancer using an Ansible playbook, with the ec2_elb module. This is the task that should do this:
- name: "Add host to load balancer {{ load_balancer_name }}"
sudo: false
local_action:
module: ec2_elb
state: present
wait: true
region: "{{ region }}"
ec2_elbs: ['{{ load_balancer_name }}']
instance_id: "{{ ec2_id }}"
However, it routinely fails, with this output (verbosity turned up):
TASK: [Add host to load balancer ApiELB-staging] ******************************
<127.0.0.1> REMOTE_MODULE ec2_elb region=us-east-1 state=present instance_id=i-eb7e0cc7
<127.0.0.1> EXEC ['/bin/sh', '-c', 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868 && echo $HOME/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868']
<127.0.0.1> PUT /var/folders/d4/17fw96k107d5kbck6fb2__vc0000gn/T/tmpki4HPF TO /Users/pkaeding/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868/ec2_elb
<127.0.0.1> EXEC ['/bin/sh', '-c', u'LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /Users/pkaeding/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868/ec2_elb; rm -rf /Users/pkaeding/.ansible/tmp/ansible-tmp-1409156786.81-113716163813868/ >/dev/null 2>&1']
failed: [10.0.115.149 -> 127.0.0.1] => {"failed": true}
msg: The instance i-eb7e0cc7 could not be put in service on LoadBalancer:ApiELB-staging. Reason: Instance has not passed the configured HealthyThreshold number of health checks consecutively.
FATAL: all hosts have already failed -- aborting
I have my ELB configuration defined like this (also via Ansible):
- name: "Ensure load balancer exists: {{ load_balancer_name }}"
sudo: false
local_action:
module: ec2_elb_lb
name: "{{ load_balancer_name }}"
state: present
region: "{{ region }}"
subnets: "{{ vpc_public_subnet_ids }}"
listeners:
- protocol: https
load_balancer_port: 443
instance_protocol: http
instance_port: 8888
ssl_certificate_id: "{{ ssl_cert }}"
health_check:
ping_protocol: http # options are http, https, ssl, tcp
ping_port: 8888
ping_path: "/internal/v1/status"
response_timeout: 5 # seconds
interval: 30 # seconds
unhealthy_threshold: 10
healthy_threshold: 10
register: apilb
When I access the status resource, either from my laptop or from the server itself (as localhost), I get a 200 response as expected. I also added a command task to the Ansible playbook, right before adding the instance to the ELB, to confirm that the application is booted up and serving requests properly (and it is):
- command: /usr/bin/curl -v --fail http://localhost:8888/internal/v1/status
I don't see any non-200 responses for the status check resource in the logs for my application (but of course, if the requests never made it as far as my application, they would not be logged).
The other weird thing is that the instance does get added to the ELB, and it seems to work properly. So I know that at some point, at least, the load balancer can access the application properly (for both the status check resource, and other resources). The AWS console shows the instance is healthy, and the Cloudwatch charts don't show any failed health checks.
Any ideas?
Adapted from my earlier comment:
Judging from the Ansible docs, there's a wait_timeout parameter which you will have to set to something higher than 300 for this to work (330 would be safe). With your current settings, the ELB needs healthy_threshold × interval = 10 × 30 = 300 seconds of consecutive passing checks before it reports the instance as InService, so the task's wait runs out before that happens. Alternatively, you could lower your interval or healthy_threshold (or both) so that you have to wait less than 300 seconds.
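Something like this ought to work (an untested sketch based on your task above, with everything else unchanged; 330 is just a value comfortably over the 300-second mark):

- name: "Add host to load balancer {{ load_balancer_name }}"
  sudo: false
  local_action:
    module: ec2_elb
    state: present
    wait: true
    wait_timeout: 330   # longer than healthy_threshold * interval = 300 seconds
    region: "{{ region }}"
    ec2_elbs: ['{{ load_balancer_name }}']
    instance_id: "{{ ec2_id }}"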
Also, your unhealthy_threshold is the same as your healthy_threshold, so once a web server starts throwing 500 responses it will stay in the pool for 5 minutes (10 checks × 30 seconds) before the ELB drops it.

Finally, you can use the ec2_elb option wait: no, so the task doesn't block waiting for the health checks at all.
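If you go the wait: no route, the registration task is your original one with the wait flipped (again an untested sketch, everything else unchanged); the module then returns as soon as the instance is registered, without waiting for it to reach InService:

- name: "Add host to load balancer {{ load_balancer_name }}"
  sudo: false
  local_action:
    module: ec2_elb
    state: present
    wait: no              # don't block until the instance passes its health checks
    region: "{{ region }}"
    ec2_elbs: ['{{ load_balancer_name }}']
    instance_id: "{{ ec2_id }}"

The trade-off is that a later play step can no longer assume the instance is actually serving traffic through the ELB yet.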