We're running a fairly large Flask application and finding that, at random times, we'll end up with a very slow request (sometimes taking a minute or more).
I presume the issue is related to lazy-loading modules, and that the slow requests happen when a new worker needs to start up, or on reload. (We originally had this issue with Apache + mod_wsgi but decided to try uWSGI instead, since it pre-forks workers.) However, even with reloading it's not consistent: often I can reload and requests are a little slower, but not significantly.
I know lazy loading is a thing in Django, but as I understand from the docs, Flask doesn't do this unless it's configured to. I'm at a loss as to why requests would continue to be this slow.
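(One thing that might be worth ruling out on the uWSGI side: uWSGI can defer loading the app into each worker via its lazy-apps option. A minimal sketch of pinning the default behaviour, so the app is imported once in the master and workers inherit it via fork:)

[uwsgi]
# import the application once in the master before forking, so module
# import cost is paid at startup rather than on a worker's first request
lazy-apps = false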
To add to the mystery, I'm running this on EC2 behind a load balancer (with just the one instance). I seem to have more problems when connecting through the load balancer than when connecting directly, but again it's random. Most requests through the load balancer aren't adding more than about 10 ms.
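To narrow down where the time goes, a sketch like this (assuming the same app callable named in the uWSGI config) would log any request that is slow inside Flask itself; if the minute-long requests never show up here, the delay is upstream:

import logging
import time

from flask import Flask, g, request

logging.basicConfig(level=logging.INFO)
app = Flask(__name__)

@app.before_request
def _start_timer():
    # Wall-clock time when the request enters the application.
    g._start = time.time()

@app.after_request
def _log_slow(response):
    # If slow requests never appear here, the time is being spent
    # upstream (load balancer, nginx, or the uWSGI listen queue)
    # rather than inside Flask.
    elapsed = time.time() - getattr(g, '_start', time.time())
    if elapsed > 1.0:
        app.logger.warning("slow request: %s %s took %.2fs",
                           request.method, request.path, elapsed)
    return response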
Here are the various configs:
nginx:
server {
    listen 80;
    server_name dev.mysite.net;
    root /var/www/mysite;

    location / {
        include uwsgi_params;
        uwsgi_pass unix:/var/run/uwsgi/mysite.sock;
    }
}
uWSGI (managed by emperor):
[uwsgi]
base = /usr
app = my_app
pythonpath = /usr/lib/python2.7
pythonpath = /usr/lib/python2.7/site-packages
pythonpath = /usr/lib/python2.7/dist-packages
pythonpath = /var/www/mysite
socket = /var/run/uwsgi/%n.sock
module = %(app)
callable = app
logto = /var/log/uwsgi/%n.log
workers = 10
enable-threads = 1
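(While chasing this, options along these lines in the uWSGI config are handy for seeing what the workers are doing during a hang; a sketch, with the threshold values as guesses:)

# log any request slower than 1000 ms
log-slow = 1000
# recycle (and log) a worker that has been stuck for over 60 s
harakiri = 60
# expose worker state for live inspection, e.g. with uwsgitop
stats = /var/run/uwsgi/%n-stats.sock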
EDIT: One thing I did run across was that keep-alive timeout settings can cause problems, and the hang did seem to wait a few seconds longer than the keep-alive timeout. Adjusting it seems to have helped, but hasn't fixed it entirely, especially on requests through the load balancer.
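For reference, the usual guidance with ELB is to make the backend's keep-alive timeout longer than the ELB's idle timeout (60 s by default), so the balancer never sends a request down a connection the backend has already closed. A sketch of the nginx side (the 75 s value is an assumption; anything above the ELB idle timeout should do):

server {
    # keep idle connections open longer than the ELB's 60 s idle timeout,
    # so the balancer never reuses a connection nginx has already closed
    keepalive_timeout 75;
}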
Figured this issue out:
The issue was the availability zones on the ELB. I thought the ELB needed to be in the same availability zone as the instance, plus at least one public zone (the subnet my instance is in is private).
Apparently it doesn't matter where the instance is; all of the subnets attached to the ELB need to be public. Creating a public subnet in the same availability zone magically made it work, even though the instance itself is on a private subnet.
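For anyone else hitting this, something along these lines with the AWS CLI can confirm which subnets the (classic) ELB is attached to and attach the new public one; the load balancer name and subnet ID here are placeholders:

# show the subnets / availability zones the ELB is registered in
aws elb describe-load-balancers --load-balancer-names my-elb \
    --query 'LoadBalancerDescriptions[0].[Subnets,AvailabilityZones]'

# attach the new public subnet in the instance's availability zone
aws elb attach-load-balancer-to-subnets \
    --load-balancer-name my-elb --subnets subnet-0abc1234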