In the diagram, we have one vnet, two subnets, and three systems.
- Azure "IP Forwarding" is enabled on the router interfaces.
- Routing tables are created for the "trust" and "untrust" subnets.
- Static routes are created on the machines (the obscured routes are host routes to make sure I don't cut myself off)
We can see that bob is successfully pinging alice.
Even though bob's default route is the router, the Azure routing table sets bob's default route to the router, and alice is not in the same subnet, the traffic does not pass through the router!
This raises two big questions for me:
Why and how is Azure doing this? This seems to completely defy Layer3 logic.
How are we supposed to do this in Azure?
My next guess on this is that this might need to be done with distinct vnets, but if I use vnets, does that mean 3 vnets? 1 for the virtual appliance and 1 for each subnet?
Here's what I figured out (answering my own question):
Why and how is Azure doing this?
Why:
Unlike a physical network, Azure has a record of every system and network interface in the environment.
In a physical network, the Address Resolution Protocol (ARP) is a broadcast-based protocol for discovering machines in a broadcast domain, i.e. the "locally attached" L2 environment. It is only necessary because there is no central record.
Azure operates without broadcasts. That is more efficient, and broadcasts are not necessary when you already know all the machines in the subnet.
How it routed the packets at L3:
For the routing problem, Azure observes every packet leaving Alice and delivers it to Bob without regard for the OS's default route. It doesn't matter that the OS handed the packet destined for 10.0.2.5 to 10.0.1.4 for routing. That L2 next-hop choice would have relied on ARP, which doesn't work here. Instead, the vnet controls L3 completely. With an Azure "vnet local" route type in the effective routing table, the packet is delivered directly to Bob whether or not Bob is in the same subnet.
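The route selection described above can be sketched in a few lines of Python. This is a simplified model, not Azure's implementation. It assumes the vnet's address range is 10.0.0.0/16 (the post does not state it), treats 10.0.1.4 as the router's address, and uses made-up names such as `select_route`. Selection is modelled as longest prefix match, with a user-defined route winning a tie against a system route.

```python
import ipaddress

# Tie-break when two routes have the same prefix: user-defined beats system.
SOURCE_PRIORITY = {"user": 0, "system": 1}


def select_route(routes, destination):
    """Pick the route with the longest matching prefix (simplified model)."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r["prefix"])]
    return min(
        candidates,
        key=lambda r: (
            -ipaddress.ip_network(r["prefix"]).prefixlen,
            SOURCE_PRIORITY[r["source"]],
        ),
    )


# ASSUMPTION: 10.0.0.0/16 is a guess at the vnet address range.
system_routes = [
    {"prefix": "10.0.0.0/16", "source": "system", "next_hop": "vnet local"},
    {"prefix": "0.0.0.0/0", "source": "system", "next_hop": "internet"},
]

# A UDR that only sets the default route to the router.
default_only = system_routes + [
    {"prefix": "0.0.0.0/0", "source": "user", "next_hop": "10.0.1.4"},
]

# The same table plus a UDR covering the vnet's own address range.
with_override = default_only + [
    {"prefix": "10.0.0.0/16", "source": "user", "next_hop": "10.0.1.4"},
]

print(select_route(default_only, "10.0.2.5")["next_hop"])   # vnet local
print(select_route(with_override, "10.0.2.5")["next_hop"])  # 10.0.1.4
```

In this model the default-route UDR never sees the packet for 10.0.2.5, because the vnet-range route is more specific. Only a user-defined route that covers the destination at least as specifically sends it to the router.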
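"vnet local" for the vnet's address range is a default system route in Azure. Azure picks the route with the longest matching prefix, so a 0.0.0.0/0 route pointing at the router never takes precedence over it. For the packet to be delivered to "router", the subnet's UDR needs to override this route with an entry that covers the destination at least as specifically.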
How should this be done?
Local routing tables on VMs are almost ignored. (You still need to be mindful of which interface a packet leaves on, as this determines the subnet it appears in and the routing table applied.)
Given that L2 is unicast, L3 allows destinations outside the subnet, and Azure routing tables are applied per subnet, a completely different design is possible.
What you "should" do isn't clear, but this is what I did.
Notice that the router is routing between two subnets in which it has no interface.
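One way to picture such a design is the sketch below. Everything in it is hypothetical: the subnet prefixes, the separate "router" subnet, the address 10.0.3.4 and the function names are assumptions for illustration, not values taken from the diagram. It models each subnet's routing table separately and traces the hops a packet would take, assuming the router forwards whatever it receives.

```python
import ipaddress

# ASSUMPTION: all prefixes and addresses below are invented for illustration.
SUBNETS = {
    "trust": "10.0.1.0/24",
    "untrust": "10.0.2.0/24",
    "router": "10.0.3.0/24",
}
ROUTER_IP = "10.0.3.4"

# One routing table per subnet: each end subnet reaches the other via the router.
ROUTE_TABLES = {
    "trust": {SUBNETS["untrust"]: ROUTER_IP},
    "untrust": {SUBNETS["trust"]: ROUTER_IP},
    "router": {},  # no overrides, so direct delivery applies
}


def subnet_of(ip):
    """Return the name of the subnet that contains ip."""
    addr = ipaddress.ip_address(ip)
    for name, prefix in SUBNETS.items():
        if addr in ipaddress.ip_network(prefix):
            return name
    raise ValueError(f"{ip} is not in any known subnet")


def next_hop(current, dst):
    """Apply the table of the subnet the packet is leaving from."""
    dst_addr = ipaddress.ip_address(dst)
    best_len, hop = -1, dst  # fall back to direct "vnet local" delivery
    for prefix, via in ROUTE_TABLES[subnet_of(current)].items():
        net = ipaddress.ip_network(prefix)
        if dst_addr in net and net.prefixlen > best_len:
            best_len, hop = net.prefixlen, via
    return hop


def path(src, dst, max_hops=8):
    """List every address the packet visits on its way from src to dst."""
    hops = [src]
    while hops[-1] != dst:
        if len(hops) > max_hops:
            raise RuntimeError("routing loop")
        hops.append(next_hop(hops[-1], dst))
    return hops


print(path("10.0.1.5", "10.0.2.5"))
```

The table that applies is chosen by the subnet the packet leaves from, which is why the router's own subnet needs no override: once the router forwards the packet, the default direct delivery carries it to the destination.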