I've written an API that streams data bidirectionally over TCP. It works perfectly fine in my local setup, but when I move to production, it never hits the endpoint.
The production server uses an ELB to load balance an EC2 instance, and the whole thing is fronted by Cloudflare.
At first I couldn't even see the requests in the NGINX logs, so I tried going around Cloudflare and hitting the EC2 IP address directly. That did make the attempts show up in the access logs, but they all 499 immediately, like this:
[16/Sep/2024:03:13:50 +0000] "POST /api/user_sync HTTP/1.1" 499 0 "-" "MYAPP/1.0.6 CFNetwork/1494.0.7 Darwin/23.6.0"
499 seems to indicate that the client aborted the connection, but I can see in my app that the connection to the server isn't shut down until it times out on the client side. The 499 appears in the NGINX access log almost instantly, though, so it's as if NGINX is cutting the request off right away.
I unfortunately don't have much experience with NGINX, load balancing, or websites in general, so any help would be much appreciated.
I did find some posts where other users had similar problems, but for them the 499 either happened on every API call or only after a long period of time. For me it happens every single time, and only for this specific endpoint.
For anyone else who finds this post:
Both ELB and NGINX were getting in the way. ELB waits until a certain amount of data has been sent on the stream before passing it along, which doesn't work for us because the initial protocol payload is very small (around 100 characters). So we routed around it.
NGINX apparently is no fan of chunked transfer encoding (I think that was the problematic piece), so we ended up routing around it too.
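For anyone who would rather keep NGINX in the path instead of routing around it like we did, below is a rough sketch of the kind of location block people use for streamed / chunked request bodies. This is my guess at a workable config, not what we actually ran, and the upstream address (127.0.0.1:8080) is a made-up placeholder:

```nginx
# Sketch only: streaming-friendly proxying for the endpoint; the upstream address is a placeholder.
location /api/user_sync {
    proxy_pass http://127.0.0.1:8080;

    # Talk HTTP/1.1 to the upstream so chunked request bodies and keep-alive work.
    proxy_http_version 1.1;
    proxy_set_header Connection "";

    # Pass bytes through as they arrive instead of buffering the request/response.
    proxy_request_buffering off;
    proxy_buffering off;

    # Long-lived stream, so relax the default 60s proxy timeouts.
    proxy_read_timeout 1h;
    proxy_send_timeout 1h;
}
```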
Since this API isn't cacheable, we ended up routing around Varnish too.
This all involved things like setting up new security group rules and a separate hostname that forwards to the right port, so these calls hit a specific URL that is not our main URL (think purchase.ebay.com instead of www.ebay.com).
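For concreteness, the separate-hostname piece can look something like the sketch below in NGINX. The hostname, port, and certificate lines are placeholders I made up for illustration, not our real setup; the point is just that the dedicated hostname proxies this one endpoint straight to the app, skipping Varnish and the normal caching path:

```nginx
# Sketch only: dedicated hostname for the streaming endpoint (all names/ports are placeholders).
server {
    listen 443 ssl;
    server_name stream.example.com;   # separate from www.example.com

    # ssl_certificate / ssl_certificate_key lines omitted for brevity

    location /api/user_sync {
        # Straight to the app port, bypassing Varnish and the usual proxy chain.
        proxy_pass http://127.0.0.1:8080;

        # Same streaming-friendly settings as in the sketch above.
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_request_buffering off;
        proxy_buffering off;
    }
}
```

The security group side is then just allowing that port through to the instance; NGINX itself doesn't know anything about security groups.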
Again, forgive me for the non-technical nomenclature as I'm really not a web guy lol