I run a SaaS monitoring service. Our standard integration example is a curl request to a special URL we provide. We have an intermittent issue where a user hits the 10 second curl timeout (the -m 10 param). However, my server logs show that the requests processed around that time took only 100-300ms, which is about normal for us.
We do see spikes in traffic at the top of every minute, but even then we seldom take longer than 1000ms.
One user in particular seems susceptible to these timeouts. I've asked him to put our IP in his hosts file to rule out a DNS issue (though I feel very confident it's not DNS).
I'd love any ideas on minimally invasive ways I could ask this user to help me troubleshoot. Before we dumped the Amazon ELB, timeouts were more common (though still very rare). I was able to reproduce the problem a couple of times, and I saw very strange "timed out at 0ms" errors, as if the connection had somehow been rejected immediately despite the 10 second timeout.
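One low-effort thing the user could run alongside the curl call is a small probe that times each phase of the request separately, so a timeout can be pinned on DNS, the TCP handshake, TLS, or waiting for the response. Below is a minimal sketch using only the Python standard library. The function name `probe`, its parameters, and the host `monitor.example.com` are placeholders invented for this example, not part of our service. Note that the timeout here applies to each socket operation, whereas curl's -m caps the whole transfer.

```python
import socket
import ssl
import time


def probe(host, port=443, path="/", timeout=10.0, use_tls=True):
    """Time each phase of one HTTP request and report which phase failed, if any."""
    timings = {}
    phase = "dns"
    sock = None
    try:
        # Phase 1: name resolution (rules DNS in or out).
        t = time.monotonic()
        family, socktype, proto, _, addr = socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM
        )[0]
        timings["dns"] = time.monotonic() - t

        # Phase 2: TCP handshake with the server's front end.
        phase = "connect"
        t = time.monotonic()
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)  # per-operation timeout, not a total budget
        sock.connect(addr)
        timings["connect"] = time.monotonic() - t

        # Phase 3 (optional): TLS handshake.
        if use_tls:
            phase = "tls"
            t = time.monotonic()
            context = ssl.create_default_context()
            sock = context.wrap_socket(sock, server_hostname=host)
            timings["tls"] = time.monotonic() - t

        # Phase 4: send the request and wait for the first response byte.
        phase = "first_byte"
        t = time.monotonic()
        request = (
            f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))
        if not sock.recv(1):
            raise ConnectionError("connection closed before any response")
        timings["first_byte"] = time.monotonic() - t

        return {"ok": True, "failed_phase": None, "error": None, "timings": timings}
    except OSError as exc:
        # Covers DNS errors, refused connections, timeouts, and TLS errors.
        return {
            "ok": False,
            "failed_phase": phase,
            "error": repr(exc),
            "timings": timings,
        }
    finally:
        if sock is not None:
            sock.close()


# Example (placeholder host): print(probe("monitor.example.com"))
```

If `connect` completes quickly but the failure lands in `first_byte`, that would suggest the request reached the front end (nginx in our case) and stalled somewhere behind it, which points at the server side rather than the user's network.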
There is nothing exotic in our iptables config, just blocking ports and bad IPs. The webserver stack is nginx-uwsgi.
The problem for us turned out to be that our uwsgi request queue was filling up. To fix that we had to adjust a uwsgi setting and a kernel setting: https://stackoverflow.com/questions/8516516/stuck-at-100-requests-uwsgi
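For anyone with the same symptom: as far as I can tell, the linked question concerns uWSGI's `listen` option (the socket listen queue size, which defaults to 100) and the Linux `net.core.somaxconn` sysctl. The text above does not name the two settings, so treat that mapping as an assumption. On Linux, a backlog larger than `somaxconn` is silently truncated to it, which is why both values have to be raised together. Here is a rough sketch of checking how the two interact; the function names are invented for illustration, and 1024 is an arbitrary example value.

```python
from pathlib import Path

# Assumption: uWSGI's default listen queue size when `listen` is not set.
UWSGI_DEFAULT_LISTEN = 100


def read_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Read the kernel's cap on socket listen backlogs (Linux only)."""
    return int(Path(path).read_text().strip())


def effective_backlog(requested, somaxconn):
    """Linux silently truncates a listen() backlog to net.core.somaxconn."""
    return min(requested, somaxconn)


if __name__ == "__main__":
    proc_path = Path("/proc/sys/net/core/somaxconn")
    if proc_path.exists():
        cap = read_somaxconn(proc_path)
        wanted = 1024  # hypothetical value for uWSGI's `listen` option
        print(f"net.core.somaxconn = {cap}")
        print(f"default uWSGI queue in effect: {effective_backlog(UWSGI_DEFAULT_LISTEN, cap)}")
        print(f"requested {wanted}, kernel allows: {effective_backlog(wanted, cap)}")
    else:
        print("No /proc/sys/net/core/somaxconn here (not Linux?)")
```

If the last line prints a number smaller than the requested value, the kernel setting is the limiting factor and would need to be raised as well.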