I only really need help understanding the following image, but I will give the background for context.
We have an app that is configured to use a proxy on port 8080 and requires Internet access. At random times throughout the day, the app fails to connect and just dies. We are trying to figure out the cause. We have ruled out FW and proxy URL rules (it always hits the same URL, whether it works or fails). I think the issue is either network related or a performance issue on the proxy itself. To get to the bottom of it, I have been taking network captures when it happens.
If you look at the following image, it is a snippet with the IP details removed. The first line with source "42" is the client machine making a TLS request through the proxy (IP 35) on port 8080. NOTE: It usually works and requests the same URL/IP, but this is one of the times it failed. The bottom window is the details of the first green line.
The highlighted part "Next sequence number" matches the ACK of the last returned packet from 35 (2nd to last line). This is 35 essentially replying to the client stating it has received all the data that was sent to it. Since it acknowledges receipt of the data, the device is up and there are no FW or network issues. Notice that it does not send any data back though. Immediately after this the client issues a TCP RST. Here is my interpretation, but I'd like someone to verify it, as my TCP skills are a little rusty.
The client is sending some form of request to the proxy, but for some reason the proxy is not responding (at the application layer). Since the proxy DOES reply with TCP ACKs, this means that at the networking layer all is good. This would imply that when the data is passed up the networking stack to the proxy itself, it is the proxy that is dropping the connection. Why it does that I do not know yet, but I am looking for clarification so that I can speak to the proxy team and tell them they need to investigate this (they don’t think it is the proxy).
Other evidence to support my case is that the first 4 lines you see in the image before the RST are repeated many times. Again, this implies that the client is re-sending whatever request it has but never gets a response, and then it eventually gives up and issues a reset.
There is apparently a load balancer that sits in front of the proxy, and the proxy is actually several machines. I have a feeling that there is an issue with one of them at the backend and the LB is not removing the node from the pool, and therefore potentially sends the data to a black hole.
I am looking for a second opinion. Does the summary above look accurate based on the capture?
The RST does not follow immediately. The client sends it 30 seconds after the last ACK was sent by the server.
The lines before the RST are also not the same lines repeated. They have a different value for ACK.
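To illustrate the difference between a retransmission and new data, here is a minimal sketch. All numbers are made up, since the real values are not readable from the snippet, and the `Segment` type and `classify_client_segments` helper are hypothetical names for illustration only. A retransmitted segment would cover sequence numbers that were already sent. In the pattern described here the client's sequence numbers advance, and each ACK from the proxy equals the "Next sequence number" (sequence number plus payload length) of the data segment before it.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    sender: str   # "client" or "proxy" (illustrative labels)
    seq: int      # relative sequence number
    ack: int      # relative acknowledgment number
    length: int   # TCP payload length in bytes


def next_seq(seg):
    """'Next sequence number' of a data segment: seq + payload length."""
    return seg.seq + seg.length


def classify_client_segments(segments):
    """Label each client data segment as 'new' or 'retransmission'."""
    highest_next = 0
    labels = []
    for seg in segments:
        if seg.sender != "client" or seg.length == 0:
            continue  # skip proxy packets and pure ACKs
        if next_seq(seg) <= highest_next:
            # Covers only bytes that were already sent earlier.
            labels.append("retransmission")
        else:
            labels.append("new")
            highest_next = next_seq(seg)
    return labels


# Hypothetical trace: two data segments, each acknowledged by the proxy.
trace = [
    Segment("client", seq=1, ack=1, length=1400),
    Segment("proxy", seq=1, ack=1401, length=0),
    Segment("client", seq=1401, ack=1, length=517),
    Segment("proxy", seq=1, ack=1918, length=0),
]

labels = classify_client_segments(trace)
last_ack_matches = trace[-1].ack == next_seq(trace[-2])
print(labels, last_ack_matches)
```

If the ACK values in the capture grow like this, the client is sending one request in several segments and not repeating the same request.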
My interpretation here is that the client is sending a request with a larger payload (hence the multiple ACKs from the server to acknowledge it) and then expects the proxy to send the response back. After 30 seconds without a response, the client gives up and closes the connection with an RST.
It is not clear why the proxy does not send a response. It might be a problem with the proxy. But it might also be a problem with the upstream server, with the proxy just propagating the problem to the client.
Note that this interpretation might be wrong though. Not much context or packet capture is provided, so it is more of an educated guess.
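The timing can be checked from exported packet timestamps. The following is only a sketch with hypothetical timestamps and a hypothetical `idle_before_reset` helper. It measures how long the connection was silent between the last packet from the proxy and the client's RST. A gap close to a round value such as 30 seconds would be consistent with a timeout in the client application, though that is an assumption and not something the capture proves.

```python
def idle_before_reset(events):
    """Return seconds between the last proxy packet and the first client RST.

    events: list of (timestamp_seconds, sender, flags) in capture order,
    where flags is a set of TCP flag names. Returns None if there is no
    client RST or no earlier packet from the proxy.
    """
    last_proxy_time = None
    for ts, sender, flags in events:
        if sender == "proxy":
            last_proxy_time = ts
        elif sender == "client" and "RST" in flags:
            if last_proxy_time is None:
                return None
            return ts - last_proxy_time
    return None


# Hypothetical timeline resembling the capture described above.
events = [
    (0.00, "client", {"PSH", "ACK"}),
    (0.01, "proxy", {"ACK"}),
    (0.02, "client", {"PSH", "ACK"}),
    (0.03, "proxy", {"ACK"}),
    (30.03, "client", {"RST", "ACK"}),
]

gap = idle_before_reset(events)
print(round(gap, 2))
```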