I have a Linux server that is an OpenVPN endpoint, but also hosts a webserver. When my client connects to the server address for the webserver, the packets travel outside the VPN. Rightly so, since the route to the server set by OpenVPN is more specific than the default route to enter the VPN. However I see that as a "leak".
Hence I tried to set up something similar to what WireGuard does (WireGuard is great, but I need OpenVPN because the tunnel needs to be TCP).
I based my setup on the WireGuard page, as well as on other questions:
- Prevent routing loop with FwMark in Wireguard (hats off for the lecture held there!)
- Routing fwmark to VPN gateway using nftables mark
Despite this setup, Wireshark shows the HTTP/HTTPS requests still go through the physical interface and not through the VPN's tun0 interface. When I look at the packet marks with `nft monitor trace`, the meta mark seems to be properly set, and only the appropriate packets (to/from port 1194) appear.
So I suspected that either:
- the PBR rule does not work as expected, or
- the packet marking does not happen early enough.
I tried to change the chain that marks outgoing packets to:
- type route hook output
- type filter hook output
with no more luck in either case.
These commands return the following:
- ip rule:
0: from all lookup local
32764: from all lookup main suppress_prefixlength 0
32765: not from all fwmark 0x4 lookup vpn
32766: from all lookup main
32767: from all lookup default
- ip route show table vpn:
default dev tun0 scope link
- ip route:
default via 10.8.0.1 dev tun0 proto static metric 50
default via 192.168.1.1 dev wlp4s0 proto dhcp src 192.168.1.10 metric 600
10.8.0.0/24 dev tun0 proto kernel scope link src 10.8.0.2 metric 50
END.POINT.IP.ADDRESS via 192.168.1.1 dev wlp4s0 proto static metric 50
192.168.1.0/24 dev wlp4s0 proto kernel scope link src 192.168.1.10 metric 600
- nft list ruleset:
table inet vpn {
    chain premangle {
        type filter hook prerouting priority mangle; policy accept;
        ip saddr END.POINT.IP.ADDRESS tcp sport 1194 meta nftrace set 1
        meta mark set ct mark
    }
    chain postmangle {
        type filter hook postrouting priority mangle; policy accept;
        ip daddr END.POINT.IP.ADDRESS tcp dport 1194 meta nftrace set 1
        ip daddr END.POINT.IP.ADDRESS tcp dport 1194 meta mark set 0x00000004
        meta mark 0x00000004 ct mark set meta mark
    }
}
- traceroute -n --fwmark=0x4 END.POINT.IP.ADDRESS
shows it goes via the physical interface out of the vpn (as expected)
- traceroute -n END.POINT.IP.ADDRESS
shows it goes via the physical interface out of the vpn (UNWANTED)
Thank you so much in advance!
If not using Strict Reverse Path Forwarding ("SRPF"), then no nftables should be used at all.
While routed (forwarded) traffic usually works fine when marks are handled in iptables or nftables, locally initiated traffic that is rerouted because of a mark (in a `type route hook output` chain) usually runs into issues: the reroute check, which happens in the `type route hook output` chain, won't magically change the local source IP address that was already chosen on the client socket. It's usually the wrong IP address. It thus usually requires a NAT band-aid (which would be needed in a `type nat hook output` chain) and will probably make UDP handling even more difficult than it already is in a multi-homed environment. Using nftables for this should be avoided whenever possible.

Just as WireGuard does, OpenVPN can adequately set the firewall mark itself on its outgoing envelope traffic, and this then happens before any route lookup for locally outgoing traffic.
This works the same as with WireGuard: the outgoing envelope packets, on the real interface, get the mark, by having the client use `SO_MARK` on its socket before connecting to the server. Of course, if neither rerouting nor direct use of policy routing, including direct marking (with `SO_MARK` or an equivalent method), is in place, chances are it won't work at all.

So delete all nftables rules:
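Assuming the only custom ruleset is the `table inet vpn` shown in the question, that would be:

```shell
# remove the marking table entirely: OpenVPN will set the mark itself
nft delete table inet vpn
```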
and instead add in the client configuration:
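OpenVPN provides this through its `mark` directive, which sets `SO_MARK` on the tunnel socket; the value 4 below matches the fwmark used in the question's `ip rule`:

```
# mark OpenVPN's own (envelope) packets with fwmark 4
mark 4
```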
Keep the routing rules and table (they should probably be integrated in VPN hooks):
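A sketch reproducing the rules and table already shown in the question (assuming the `vpn` table name is declared in `/etc/iproute2/rt_tables`):

```shell
# everything uses table vpn, except packets carrying OpenVPN's own mark
ip route add default dev tun0 table vpn
ip rule add pref 32764 lookup main suppress_prefixlength 0
ip rule add pref 32765 not fwmark 4 lookup vpn
```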
Note: the parts at the end of this answer (only for the SRPF case) should be applied before adding the routing table entry above, to avoid a temporary disruption.
Do not add a default route through the VPN, nor an explicit route to the remote endpoint. Don't have the server push this configuration, or have the client ignore it:
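For example, with standard OpenVPN client directives (pick one; the exact choice depends on what the server pushes):

```
# ignore only a pushed redirect-gateway:
pull-filter ignore "redirect-gateway"
# or ignore every pushed route and gateway option (broader):
;route-nopull
```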
That way, the `default via 10.8.0.1 dev tun0` and `END.POINT.IP.ADDRESS via 192.168.1.1 dev wlp4s0` routes seen in the question's `ip route` output don't appear; only the `10.8.0.0/24 dev tun0` route gets added.
Instead, the policy routing rules will handle the default route by selecting the routing table `vpn` only when adequate.

As explained in my answer to the 1st linked Q/A, most of the nftables ruleset for WireGuard's `Table = auto` + `AllowedIPs = 0.0.0.0/0` is there to handle SRPF for reply traffic. There are a few cases:

rp_filter=0 everywhere

Including `net.ipv4.conf.default.rp_filter` and `net.ipv4.conf.all.rp_filter`. No RPF check: nothing to do. No nftables needed.

rp_filter=1

Now envelope reply traffic can fail SRPF.
Either choose Loose RPF on the main interface:
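For instance (`wlp4s0` being the main interface in the question):

```shell
# Loose RPF on the physical interface: reply packets arriving there pass the check
sysctl -w net.ipv4.conf.wlp4s0.rp_filter=2
```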
and be done with it. No nftables needed.

Or implement all the logic to mark return envelope traffic, just as is done for WireGuard:
- Have the fwmark also be used in the reverse path lookup by enabling `src_valid_mark` on the main interface (it could be set on `all` instead), thus allowing SRPF to pass.
- Transpose WireGuard's setup (IPv4 only here), as seen in the linked Q/A, with the additional corner cases described at the end also accounted for, so reply traffic gets the fwmark:
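A minimal IPv4-only sketch of both steps, assuming mark 4, interfaces `tun0`/`wlp4s0`, and the local VPN address 10.8.0.2 from the question (note that no endpoint address or port appears anywhere):

```shell
# let the fwmark be used during the reverse path lookup on the main interface
sysctl -w net.ipv4.conf.wlp4s0.src_valid_mark=1

nft -f - <<'EOF'
table inet vpn {
    chain preraw {
        type filter hook prerouting priority raw; policy accept;
        # optional: drop remote (LAN) attempts to reach the VPN-internal address
        iifname != "tun0" ip daddr 10.8.0.2 fib saddr type != local drop
    }
    chain premangle {
        type filter hook prerouting priority mangle; policy accept;
        # re-inject the saved mark into reply envelope packets
        meta l4proto tcp meta mark set ct mark
    }
    chain postmangle {
        type filter hook postrouting priority mangle; policy accept;
        # save OpenVPN's SO_MARK (4) into the connmark
        meta l4proto tcp meta mark 4 ct mark set meta mark
    }
}
EOF
```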
Chain `preraw` is optional and can be removed if needed: it protects against remote (LAN) attempts to access the internal VPN local address. The mark is created by OpenVPN on outgoing envelope packets, copied into the connmark at hook postrouting, and re-injected into reply envelope packets at hook prerouting. No endpoint address or port appears anywhere.
No rerouting is done (no `type route hook output` nor `type nat hook output` chain present).

Note: the sysctl command and the nftables ruleset above should both be executed before adding the default route to the routing table `vpn`, or a temporary loss of connectivity will happen until the VPN TCP socket recovers (still, only once both are added).

The client system can now reach the server from within the tunnel.
Connectivity tests can be done like this:
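For example, once the mark, rules and routes above are in place:

```shell
# should now go through the tunnel, reaching the endpoint in one hop
traceroute -n END.POINT.IP.ADDRESS
# while forcing the OpenVPN mark still bypasses the tunnel, as before
traceroute -n --fwmark=0x4 END.POINT.IP.ADDRESS
```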
OP's `traceroute` should now reach END.POINT.IP.ADDRESS in a single hop: through the VPN.

At least on an amd64 (x86-64) architecture, the VPN can be bypassed (as root) with:
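For instance with socat (an assumed reconstruction of the exact command; on amd64, `1` is `SOL_SOCKET` and `36` is `SO_MARK`):

```shell
# connect outside the tunnel by setting SO_MARK=4 before the TCP connect
socat -4 STDIO TCP:END.POINT.IP.ADDRESS:443,setsockopt-listen=1:36:4
```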
where `setsockopt-listen` means: use `SO_MARK` before connecting (rather than before listening, for this case), and the 4 is the same mark value as the one used by OpenVPN.

Note: the specific case of the client querying, through the tunnel, a UDP service on the server using the server's public IP address can hit a common issue not really related to the VPN, but to using UDP while being multi-homed. It requires the UDP service to be multi-homed aware: usually either by using multiple UDP sockets, binding one for each local address (so usually at least one per interface), or with a single unbound UDP socket by using `IP_PKTINFO` with additional handling code.

First, thanks so much for the prompt reply.
Ah, OK. I understand that since the initial routing decision for the requesting process has already happened, without NAT the re-routing would send the packets with the wrong source address and hence fail.
So I did the following:
Concerning rp_filter, my distro default is 2; I won't add any more complexity with that, so I'm leaving all this out of the way as advised. I set all of these to 0 for now.
I kept nftables only to trace packets and ensure OpenVPN did mark the packets (as per the below).
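A trace-only ruleset for that purpose could look like this (a sketch; it only sets `nftrace` on marked packets and no longer touches any mark):

```
table inet trace {
    chain postmangle {
        type filter hook postrouting priority mangle; policy accept;
        meta mark 4 meta nftrace set 1
    }
}
```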
However now, I cannot access the webserver anymore:
I did `ip route flush cache` just in case, with no success, while all seems in order with:

Wireshark does not show any packet related to the request, not even one to a wrong address or one without an answer.
Any clue as to what could be getting in the way, please? Sorry if I misunderstood any point; the setup you proposed frankly makes sense to me. Thanks!