I have two hosts which are attempting to set up an IPsec connection with each other. For this they have to communicate on UDP ports 500 and 4500, so I opened them in the firewalls on both ends (shown in relevant part):
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -m udp -p udp --dport 500 -j ACCEPT
-A INPUT -m udp -p udp --dport 4500 -j ACCEPT
#.....
-A INPUT -j REJECT --reject-with icmp6-port-unreachable
However, the key exchange never succeeds. Each side keeps attempting to retransmit the UDP packets over and over, never hearing a response, until they finally give up.
I started tcpdump on one end and observed that the UDP packet was being fragmented, and that an ICMP port unreachable was being returned after the second fragment came in.
An example of such a failed exchange (sanitized for your protection):
04:00:43.311572 IP6 (hlim 51, next-header Fragment (44) payload length: 1240) 2001:db8::be6b:d879 > 2001:db8:f:608::2: frag (0x5efa507c:0|1232) ipsec-nat-t > ipsec-nat-t: NONESP-encap: isakmp 2.0 msgid 00000001 cookie 55fa7f39522011ef->f8259707aad5f995: child_sa ikev2_auth[I]: [|v2e] (len mismatch: isakmp 1596/ip 1220)
04:00:43.311597 IP6 (hlim 51, next-header Fragment (44) payload length: 384) 2001:db8::be6b:d879 > 2001:db8:f:608::2: frag (0x5efa507c:1232|376)
04:00:43.311722 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 432) 2001:db8:f:608::2 > 2001:db8::be6b:d879: [icmp6 sum ok] ICMP6, destination unreachable, length 432, unreachable port[|icmp6]
The firewall logged the following in regard to this packet:
Aug 26 04:00:43 grummle kernel: iptables: REJECT IN=eth0 OUT= MAC=############### SRC=2001:0db8:0000:0000:0000:0000:be6b:d879 DST=2001:0db8:000f:0608:0000:0000:0000:0002 LEN=424 TC=0 HOPLIMIT=51 FLOWLBL=0 OPT ( FRAG:1232 ID:5efa507c ) PROTO=UDP
I was under the impression that Linux automatically reassembled fragments before passing them on to the packet filter. So why are these fragments not being reassembled, leaving the second fragment to be rejected?
The netfilter code only reassembles fragments for you prior to packet filtering if your firewall rules use connection tracking (i.e. the firewall rule is stateful and uses -m conntrack or the deprecated -m state) or NAT. Otherwise all the fragments are processed separately and you get issues like this one: a non-first fragment carries no UDP header, so it cannot match a --dport rule and falls through to the final REJECT.
This makes resolving the issue easy and obvious (in retrospect, anyway). Just add connection tracking to the firewall rules in question.
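As a minimal sketch, assuming the same two UDP ports and the rule layout shown in the question, the two port rules with connection tracking added might look like the following. Matching on the NEW state is an assumption made for illustration; the existing ESTABLISHED,RELATED rule would be expected to cover the rest of the exchange.

```
# Hypothetical replacement for the two --dport rules in the question.
# Using the conntrack match makes netfilter reassemble fragments before filtering.
-A INPUT -m conntrack --ctstate NEW -m udp -p udp --dport 500 -j ACCEPT
-A INPUT -m conntrack --ctstate NEW -m udp -p udp --dport 4500 -j ACCEPT
```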
Or for older Linux systems (e.g. RHEL 5 and earlier):
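On such systems the conntrack match may not be available, so the sketch below assumes the older state match instead, with the same two ports and the same assumed NEW state:

```
# Hypothetical equivalent using the deprecated state match.
-A INPUT -m state --state NEW -m udp -p udp --dport 500 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 4500 -j ACCEPT
```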