We're using Terraform to launch Auto Scaling groups (ASGs) for most of our AWS EC2 instances. The problem is that once in a while we want to do some extra work before terminating an instance, for example decommissioning a node from a cluster before the EC2 instance it runs on is terminated. If we simply lowered min == max (our default), an instance would be terminated and we could not run a graceful decommission.
Instead, I've tried lowering min to the new desired value (example: 6) and keeping max at the old value (example: 10). In this case the desired value stays at 10 (the max), and terminating an EC2 instance causes the ASG to launch a new one. NOTE: we are not setting the Terraform desired_capacity setting at all.
If I set desired_capacity manually, I risk the ASG terminating a node that has not been gracefully decommissioned, so I don't think that's an option for me.
What I'd ideally like is for the ASG to do nothing when the current EC2 instance count for that ASG is between min and max, and let me terminate instances manually. Obviously, if the count goes below min, I'd still like the ASG to launch a new EC2 instance.
Is there any way to achieve this?
There are two possible ways to achieve what you want:
Option 1: Suspend Auto Scaling Processes
You can put the Auto Scaling group's processing "on hold" while you are making your adjustments.
For example, you could try:
aws autoscaling suspend-processes --auto-scaling-group-name MyGroup
aws autoscaling resume-processes --auto-scaling-group-name MyGroup
http://docs.aws.amazon.com/cli/latest/reference/autoscaling/suspend-processes.html
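As a rough sketch of Option 1, the two commands can be wrapped so that only selected processes are paused rather than all of them (omitting --scaling-processes suspends every process). The function names, the AWS_CMD override, and the choice of Launch and Terminate are assumptions made for illustration, not part of the original answer:

```shell
#!/usr/bin/env bash
# AWS_CMD is a hypothetical override: set AWS_CMD=echo to print the
# commands instead of running them against a real account.

# Pause only the Launch and Terminate processes of one ASG.
pause_scaling() {
  local asg="$1"
  "${AWS_CMD:-aws}" autoscaling suspend-processes \
    --auto-scaling-group-name "$asg" \
    --scaling-processes Launch Terminate
}

# Resume the same two processes once the manual work is done.
resume_scaling() {
  local asg="$1"
  "${AWS_CMD:-aws}" autoscaling resume-processes \
    --auto-scaling-group-name "$asg" \
    --scaling-processes Launch Terminate
}
```

A typical run would be pause_scaling MyGroup, then the decommission and manual termination, then resume_scaling MyGroup. Once processes resume, the ASG will again try to reach its desired capacity, so that value probably needs lowering before resuming.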
Option 2: Use Auto Scaling Lifecycle Hooks
With Lifecycle Hooks, your launching and/or terminating EC2 instances are given an opportunity to perform initial or pre-termination processing. For example, you can have the hook notify the terminating instance that it is about to be terminated, so that it can decommission itself from your cluster.
http://docs.aws.amazon.com/autoscaling/latest/userguide/lifecycle-hooks.html
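A minimal sketch of how a termination hook might be registered and later completed from the CLI. The function names, the AWS_CMD override, the 300-second default timeout, and the CONTINUE default result are illustrative assumptions. How the instance is notified (for example via a notification target) is left out:

```shell
#!/usr/bin/env bash
# AWS_CMD is a hypothetical override: set AWS_CMD=echo for a dry run.

# Register a hook that holds terminating instances in a wait state
# for up to $timeout seconds (assumed default: 300).
add_termination_hook() {
  local asg="$1" hook="$2" timeout="${3:-300}"
  "${AWS_CMD:-aws}" autoscaling put-lifecycle-hook \
    --lifecycle-hook-name "$hook" \
    --auto-scaling-group-name "$asg" \
    --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
    --heartbeat-timeout "$timeout" \
    --default-result CONTINUE
}

# Tell the ASG that decommissioning is finished, so the instance can
# be terminated before the timeout expires.
finish_termination() {
  local asg="$1" hook="$2" instance_id="$3"
  "${AWS_CMD:-aws}" autoscaling complete-lifecycle-action \
    --lifecycle-hook-name "$hook" \
    --auto-scaling-group-name "$asg" \
    --lifecycle-action-result CONTINUE \
    --instance-id "$instance_id"
}
```

The decommission script on the node would call something like finish_termination once the node has left the cluster. If it never does, the instance is terminated anyway when the heartbeat timeout runs out.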
Proposed solutions:
Option 1: Your ASG is created with instance protection ON (the protect_from_scale_in argument, see the Terraform docs).
In this case, the sequence of operations for decommissioning an instance could be:
aws autoscaling set-instance-protection --instance-ids <instances_ids> --auto-scaling-group-name <asg_name> --no-protected-from-scale-in
Option 2: Your ASG was not created with instance protection.
In this case, the sequence of operations for decommissioning an instance could be:
aws autoscaling set-instance-protection --instance-ids <instances_ids> --auto-scaling-group-name <asg_name> --protected-from-scale-in
aws autoscaling set-instance-protection --instance-ids <instances_ids> --auto-scaling-group-name <asg_name> --no-protected-from-scale-in
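The answer does not spell out what happens around the protection change. One plausible reading, sketched here purely as an assumption, is that the decommissioned instances are unprotected while all others stay protected, and the desired capacity is then lowered so the ASG can only scale in the unprotected ones. The function name and the AWS_CMD override are hypothetical:

```shell
#!/usr/bin/env bash
# AWS_CMD is a hypothetical override: set AWS_CMD=echo for a dry run.

# Usage: release_for_scale_in <asg_name> <new_desired> <instance_id>...
# Assumes the listed instances were already gracefully decommissioned
# and every other instance in the group is still protected.
release_for_scale_in() {
  local asg="$1" new_desired="$2"
  shift 2
  # Step 1: remove scale-in protection from the decommissioned instances.
  "${AWS_CMD:-aws}" autoscaling set-instance-protection \
    --instance-ids "$@" \
    --auto-scaling-group-name "$asg" \
    --no-protected-from-scale-in &&
  # Step 2: lower the desired capacity (assumed step, not in the answer).
  "${AWS_CMD:-aws}" autoscaling set-desired-capacity \
    --auto-scaling-group-name "$asg" \
    --desired-capacity "$new_desired"
}
```

For this to work, the new desired capacity must not be below the group's min, which matches the question's setup of lowering min while leaving max alone.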