I did some testing today with my autoscaling setup on Azure Kubernetes Service (AKS). I noticed that when a scale-up was triggered, the next node took a while to spin up, so the last pod had to wait a long time to be scheduled. I would like new nodes to be added once the cluster reaches a certain threshold, while pods can still be scheduled on the nodes that are already running. Is that possible?
Yes, you can trigger autoscaling early enough that a pod does not have to wait for a new node. This article describes an elegant strategy for that, called "Pause Pods":
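As a rough illustration of the idea (my own sketch, not taken from the article): you run a few low-priority placeholder pods that only reserve CPU and memory. When a real pod needs room, the scheduler preempts a placeholder, so the real pod starts right away on an existing node, and the evicted placeholder goes Pending, which is what makes the autoscaler add a node. The small Python script below just prints the two manifests as JSON. The name `pause-pods`, the replica count, the resource requests and the image tag are all assumptions you would need to adjust for your cluster.

```python
import json


def pause_pod_manifests(replicas=2, cpu="500m", memory="512Mi"):
    """Build a PriorityClass and a Deployment of placeholder ("pause") pods.

    All names and sizes here are illustrative assumptions, not values
    from the article or from any particular cluster.
    """
    priority_class = {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": "pause-pods"},  # hypothetical name
        # Negative, so any pod with the default priority (0) can preempt it.
        # Kept above -10, which as far as I know is the Cluster Autoscaler's
        # default cutoff below which pending pods do not trigger a scale-up.
        "value": -1,
        "globalDefault": False,
        "description": "Low priority for placeholder pods that reserve spare capacity.",
    }

    labels = {"app": "pause-pods"}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "pause-pods"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "priorityClassName": "pause-pods",
                    "containers": [
                        {
                            "name": "pause",
                            # The pause image does nothing; the tag is an assumption.
                            "image": "registry.k8s.io/pause:3.9",
                            # The requests are the headroom each placeholder reserves.
                            "resources": {"requests": {"cpu": cpu, "memory": memory}},
                        }
                    ],
                },
            },
        },
    }

    # Wrap both objects in a List so they can be applied in one go.
    return {"apiVersion": "v1", "kind": "List", "items": [priority_class, deployment]}


if __name__ == "__main__":
    print(json.dumps(pause_pod_manifests(), indent=2))
```

If you save this as, say, `pause_pods.py`, you could try it with `python pause_pods.py | kubectl apply -f -`. The total headroom is roughly replicas times the requests, so size the placeholders to match the pods you want to be able to schedule immediately.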