We're running across three zones in GCE with a TCP Proxy load balancer in front. The backend for the load balancer is on a single node within one of the zones. Using the health check, the load balancer can determine where to send the traffic. This works as expected when traffic originates outside of Google. For outgoing traffic we use a NAT gateway and a route with a lower priority value than the default Google route (800 instead of 1000, so it takes precedence) to route the traffic for 0.0.0.0/0 there.
However, when one of the backend servers that is not in the same zone as the (active) backend for the TCP load balancer tries to access the external address, it gets a connection timeout. The NAT gateway route does not seem to be used. If the request originates from the same zone as the active backend, it works.
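To make the routing setup concrete, here is a minimal sketch of how we understand route selection to work: the most specific matching destination wins, and among equally specific routes the lowest priority value wins. The route names, next hops, and the 10.0.0.0/16 subnet are hypothetical placeholders, not our actual configuration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str        # hypothetical route name
    dest: str        # destination range in CIDR notation
    priority: int    # lower value takes precedence
    next_hop: str    # hypothetical next-hop label


def select_route(routes, dst_ip):
    """Pick the route for dst_ip: longest prefix first, then lowest priority value."""
    addr = ipaddress.ip_address(dst_ip)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r.dest)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.dest).prefixlen, r.priority),
    )


# Assumed routing table: the default route, our NAT route, and a subnet route.
routes = [
    Route("default-internet", "0.0.0.0/0", 1000, "default-internet-gateway"),
    Route("nat-route", "0.0.0.0/0", 800, "nat-gateway-instance"),
    Route("subnet-local", "10.0.0.0/16", 0, "vpc-network"),
]

external = select_route(routes, "203.0.113.10")
internal = select_route(routes, "10.0.1.5")
print(external.name, internal.name)
```

Under these assumptions, traffic to an external address should pick the NAT route (800) over the default route (1000), which is why the timeout surprised us.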
One solution would of course be to make sure that there is an active backend in every zone, but we prefer not to do that. One thing we noticed is that when traffic from within the cluster goes through this external IP address, the client IP address in our logs appears to be the load balancer address itself. Any idea how to solve this another way? Is this expected behaviour, or did we possibly misconfigure a route or firewall rule?
Are you sure you are using a TCP proxy load balancer? If you are using a non-proxy load balancer, this behaviour is expected.
The Internal Load Balancer (ILB) is not an actual device sitting between the client and backend instances; it is implemented as programming in the software-defined network used by both client and server. The ILB logic is implemented on the client side. However, if a backend is also a client of the same ILB, the packet never "reaches" the ILB: because the ILB IP address is configured by the guest Google Compute Engine (GCE) agent running on the backend instance, the GCE agent handles the packet directly inside that instance.
In other words, the ILB internally configures a route and an IP address in the operating system of each backend instance. If a backend instance sends a request to the ILB, the request stays inside that instance and is not routed to any other backend instance.
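The behaviour described above can be illustrated with a toy model. This is only a sketch of the delivery logic as explained here, not how GCE is actually implemented; the `Instance` class, the `deliver` function, the instance names, and the address 10.0.0.100 are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    healthy: bool = True
    # IP addresses configured locally in the guest OS (assumption: every
    # backend of the load balancer has the load balancer IP in this set).
    local_ips: set = field(default_factory=set)


def deliver(client, dst_ip, lb_ip, backends):
    """Return the name of the instance that ends up handling the packet."""
    # A backend that owns the load balancer IP locally never leaves the instance.
    if dst_ip in client.local_ips:
        return client.name
    # Any other client is sent to a healthy backend, if there is one.
    if dst_ip == lb_ip:
        healthy = [b for b in backends if b.healthy]
        return healthy[0].name if healthy else None
    return None


lb_ip = "10.0.0.100"
active = Instance("backend-active", healthy=True, local_ips={lb_ip})
standby = Instance("backend-standby", healthy=False, local_ips={lb_ip})
outside = Instance("plain-client")
backends = [active, standby]

print(deliver(outside, lb_ip, lb_ip, backends))
print(deliver(standby, lb_ip, lb_ip, backends))
```

In this model a plain client reaches the active backend, while the standby backend's request stays on the standby instance itself. If nothing is serving there, that would look like the connection timeout described in the question.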
Documentation on internal load balancer can be found at this link.