Recently I learned how to set up a load balancer on AWS and direct it to a root / sub domain.
But while playing around with it, I noticed some delays. For example, my setup is 3 instances, each running nginx, node, and pm2.
I changed the HTML content on each instance, e.g. <h1>load 1</h1>, <h1>load 2</h1>, <h1>load 3</h1>.
Then when I kept reloading the load balancer DNS, it did show the instances switching randomly, so I knew it was working.
Then I went into one of the instances and stopped pm2.
When I refreshed my page again, sometimes it would show a gateway error, which I believe is because the request went to the instance that I stopped. Sometimes it would show the page, BUT with badly broken CSS, and when I opened the console there were tons of errors saying it cannot locate files.
It took 1-2 minutes until everything was totally fine again.
I am wondering if this is normal and how it is supposed to happen, or whether there is a way to optimise it so that users have a better experience?
Thanks for any advice.
When you stopped your instance, two problems appeared: requests routed straight to the dead instance got a gateway error, and pages served by the healthy instances still had some of their asset requests (CSS, images, scripts) sent to the dead one, which is why the styling broke and the console filled with "cannot locate file" errors.
This is simply how ELB works. The ELB uses health checks to know whether your instances are bad, but that takes time. The 1-2 minutes you are witnessing is the window during which the ELB is still determining whether the instance has failed; with typical default settings (roughly a 30-second check interval and two consecutive failures required), detection alone takes about a minute.
You can configure the frequency and failure thresholds of the health checks in your ELB configuration.
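For example, here is a minimal sketch of tightening the checks with the AWS SDK for JavaScript (v3), assuming an Application Load Balancer with a target group (the region, ARN, and function name are placeholders, not from the question):

```js
// Sketch: tighten target-group health checks so a dead instance is
// detected sooner. Assumes @aws-sdk/client-elastic-load-balancing-v2
// is installed and AWS credentials are configured.
const {
  ElasticLoadBalancingV2Client,
  ModifyTargetGroupCommand,
} = require("@aws-sdk/client-elastic-load-balancing-v2");

const client = new ElasticLoadBalancingV2Client({ region: "us-east-1" }); // placeholder region

async function tightenHealthCheck(targetGroupArn) {
  await client.send(new ModifyTargetGroupCommand({
    TargetGroupArn: targetGroupArn,
    HealthCheckIntervalSeconds: 10, // down from the default 30
    HealthCheckTimeoutSeconds: 5,   // must stay below the interval
    UnhealthyThresholdCount: 2,     // 2 misses: pulled from rotation (~20s)
    HealthyThresholdCount: 2,       // 2 passes: put back in rotation
  }));
}

// Placeholder ARN; substitute your own target group.
tightenHealthCheck("arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/0123456789abcdef")
  .catch(console.error);
```

With a 10-second interval and an unhealthy threshold of 2, a dead instance should drop out of rotation in roughly 20 seconds instead of a minute or more. (On a Classic ELB the same knobs live on the load balancer's own health check settings.)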
When you configure a load balancer, health checks are involved. Detection of a failed node is not instant: the health check has to fail first, usually several times in a row, and only then is the node removed from the load balancer.
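It also helps to point the health check at the node app itself rather than at a static file that nginx can serve even when the app is dead. A minimal sketch, assuming the app uses Express (the /health path and port are illustrative):

```js
// Sketch: a dedicated health-check route inside the node app. If the
// ELB health check targets this path (proxied through nginx), stopping
// the pm2 process makes the check fail on the next interval, instead of
// nginx answering on behalf of a dead backend.
const express = require("express");
const app = express();

app.get("/health", (req, res) => {
  // Returns 200 only while the node process is actually running.
  res.status(200).send("ok");
});

app.listen(3000, () => console.log("listening on 3000"));
```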
You can set the health checks to be very short, but this can make the situation worse. If your site is very busy and one instance fails a health check, that instance is removed and the remaining instances have even more work to do; they start failing health checks too, and like dominoes your site comes down.