Suppose I have an application that looks like the following:
Client <--> ELB <--> EC2
Would the end-to-end (E2E) latency be lower if I set up the ELB in TCP pass-through mode rather than with an HTTP listener?
My reasoning is that in TCP pass-through mode the ELB should add almost no extra hop cost compared with the following scenario:
Client <--> EC2
Is my understanding correct? If not, please walk me through why.
Your understanding isn't quite correct, because TCP pass-through is really payload pass-through. The balancer accepts the connection, then creates a new connection to the instance, then passes the payload back and forth on the connection. Traffic still passes through an extra device -- the balancer.
Switching to TCP mode is unlikely to make a substantial difference in latency, because once a request has been cut through in HTTP mode the behavior is similar: bytes are copied from one connection to the other.
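To make the "payload pass-through" idea concrete, here is a minimal sketch of a relay that accepts a client connection, opens a second connection to a backend, and copies bytes in both directions. It only illustrates the general pattern and is not how ELB is actually implemented; the names `pipe`, `handle_client`, and `start_relay` are made up for this example.

```python
import socket
import threading


def pipe(src, dst):
    """Copy bytes from src to dst until src reaches end-of-stream."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        # Propagate end-of-stream so the peer on the other leg sees it too.
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle_client(client, backend_addr):
    # A second, independent TCP connection: relay -> backend instance.
    backend = socket.create_connection(backend_addr)
    upstream = threading.Thread(target=pipe, args=(client, backend), daemon=True)
    upstream.start()
    pipe(backend, client)  # backend -> client direction runs in this thread
    upstream.join()
    client.close()
    backend.close()


def start_relay(backend_addr, listen_addr=("127.0.0.1", 0)):
    """Start the relay in background threads and return its (host, port)."""
    listener = socket.create_server(listen_addr)

    def accept_loop():
        while True:
            client, _ = listener.accept()
            threading.Thread(
                target=handle_client, args=(client, backend_addr), daemon=True
            ).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener.getsockname()
```

Calling `start_relay(("10.0.0.5", 8080))` (a placeholder backend address) returns the local address to point a client at. Every byte crosses two TCP connections, which is why the relay is still an extra hop even though it never inspects the payload.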
The disadvantage of TCP mode is that you lose something a Classic ELB in HTTP mode is able to do: reuse the same connections to the instances for sequential requests from multiple clients. It holds idle connections open to the instances, waiting for more client requests to arrive. That means fewer connections are initiated to the instances, and many requests can potentially use connections that are already established.
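The connection-reuse benefit can be sketched with Python's standard `http.client`, which keeps the underlying socket open between requests when the server allows keep-alive. `BackendPool` is a hypothetical name for this illustration, and a real balancer would manage many connections rather than one.

```python
import http.client


class BackendPool:
    """Hypothetical one-connection pool: keeps an HTTP/1.1 connection open."""

    def __init__(self, host, port):
        # The TCP connection is opened lazily on the first request.
        self.conn = http.client.HTTPConnection(host, port)

    def forward(self, method, path):
        # If the previous response was fully read and the server kept the
        # connection alive, this request reuses the same socket, so there is
        # no new TCP handshake to the instance.
        self.conn.request(method, path)
        resp = self.conn.getresponse()
        body = resp.read()  # must drain the body before the next request
        return resp.status, body
```

Requests arriving from different clients could all be passed through the same `forward` call one after another, so the instance sees a single long-lived connection instead of one connection per client.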
Depending on the application, an ALB -- application load balancer -- offers a further advantage, not only reusing instance connections, but supporting HTTP/2 on the browser side, allowing the browser to send concurrent requests, which are fanned out to the instances as parallel HTTP/1.1 requests.
Or, if you really want a TCP pass-through scenario, you probably want an NLB -- network load balancer. Unlike the other two balancer types, an NLB actually modifies the network behavior to create dynamic NAT mappings to the instances. There isn't a separate system handling the traffic, because NLBs are virtual entities. Classic and Application balancers, by contrast, are (as far as anyone can tell) implemented on "hidden" EC2 instances.