We use Cisco ASA for our IPSEC VPNs, using the EZVPN method. From time to time we encounter problems where an ISP has made a change to their network and our VPN stops working. Nine times out of ten the ISP denies that their change could have stopped this working - I suspect because they don't understand exactly what might have caused the problem. Rather than just bashing heads with them I want to try and point them in a direction that might get a speedier resolution.
In my current incident, I can ssh onto the external interface of the ASA and do a little poking around:
sh crypto isakmp sa
Active SA: 1
Rekey SA: 0 (A tunnel will report 1 Active and 1 Rekey SA during rekey)
Total IKE SA: 1
1 IKE Peer: {Public IP address of London ASA}
Type : user Role : initiator
Rekey : no State : AM_TM_INIT_XAUTH_V6C
At the other end of the link I see the following:
Active SA: 26
<snip>
25 IKE Peer: {public IP address of Port-Au-Prince-ASA}
Type : user Role : responder
Rekey : no State : AM_TM_INIT_MODECFG_V6H
I can't find any documentation for what AM_TM_INIT_XAUTH_V6C or AM_TM_INIT_MODECFG_V6H mean, but I'm pretty sure they indicate that the IKE handshake has failed for some reason.
Can anyone suggest any likely things that might be preventing IKE from succeeding, or specific details of what AM_TM_INIT_XAUTH_V6C means?
Update: We connected the ASA at the site of a customer of another ISP. The VPN connection came up immediately. This confirms that the problem is not configuration related. The ISP is now accepting responsibility and investigating further.
Update: The connection suddenly came back online last week. I have notified the ISP to see if they changed anything, but not heard back yet. Frustratingly I am now seeing a similar issue on another site. I found a Cisco doc on the effects of fragmentation on VPN. I am starting to think that this may be the cause of the issues I am seeing.
With a little assistance from Cisco I did some deeper analysis of what was happening, and figured out the things that I needed to be checking for. The useful things that Cisco told me:
debug crypto isakmp 5
gives enough detail to see whether problems are occurring with ISAKMP traffic.
clear crypto isakmp sa
clears out any stale security associations.
clear crypto isakmp {client_ip_address}
can be used on the HQ to clear out a specific security association (you don't necessarily want to clear all your security associations if it is only one device that is having trouble!).
Reading up a little on the IPSEC suite, and ISAKMP more specifically, showed that the following need to be allowed through any firewalls in the path:
UDP port 500 (ISAKMP/IKE)
UDP port 4500 (NAT-T, when NAT is detected in the path)
ESP, which is IP protocol 50 (a protocol in its own right, not a TCP or UDP port)
It seems a lot of people out there don't realise the important difference between IP protocols and TCP/UDP ports.
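To illustrate the protocol-versus-port distinction, here is a minimal Python sketch. The function name `classify` and its labels are made up for illustration; the numbers (IP protocol 17 for UDP, 50 for ESP, UDP ports 500 and 4500) are the standard assignments.

```python
# IP protocol numbers live in the IP header; port numbers live in the
# UDP/TCP header that follows it. ESP has no ports at all.
IP_PROTO_TCP = 6
IP_PROTO_UDP = 17
IP_PROTO_ESP = 50


def classify(ip_protocol, dst_port=None):
    """Label a packet by the IPsec role it plays, if any (illustrative only)."""
    if ip_protocol == IP_PROTO_ESP:
        return "ESP"
    if ip_protocol == IP_PROTO_UDP and dst_port == 500:
        return "ISAKMP"
    if ip_protocol == IP_PROTO_UDP and dst_port == 4500:
        return "NAT-T"
    return "other"
```

Note that `classify(IP_PROTO_TCP, 50)` comes back as "other": opening TCP or UDP port 50 on a firewall does nothing for ESP, which is the mistake people tend to make.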
The following packet captures focussed on the above types of traffic. These were set up on both the remote and HQ ASAs:
You can then download the captures from each device at https://{device_ip_address}/capture/ISAKMP/pcap and analyse them in Wireshark.
My packet captures showed that the ISAKMP traffic outlined above was getting fragmented - since those packets are encrypted, once they are fragmented it is hard to put them back together and things break.
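If you would rather check a downloaded capture for fragments without opening Wireshark, something like the following rough Python sketch could do it. It assumes the file is in classic pcap format with an Ethernet link type and no VLAN tags, which I have not verified for every ASA version; `find_ipv4_fragments` is just a name I picked.

```python
import struct


def find_ipv4_fragments(pcap_bytes):
    """Return (packet_number, more_fragments, offset_bytes) per IPv4 fragment."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic pcap file")
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")

    fragments = []
    pos = 24  # skip the 24-byte global header
    number = 0
    while pos + 16 <= len(pcap_bytes):
        # Per-packet header: ts_sec, ts_usec, captured length, original length
        _, _, incl_len, _ = struct.unpack(endian + "IIII", pcap_bytes[pos:pos + 16])
        frame = pcap_bytes[pos + 16:pos + 16 + incl_len]
        pos += 16 + incl_len
        number += 1
        # Need 14 bytes of Ethernet plus 20 bytes of IPv4, EtherType 0x0800
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        # Flags and fragment offset sit at bytes 6-7 of the IP header
        flags_frag = struct.unpack("!H", frame[20:22])[0]
        more_fragments = bool(flags_frag & 0x2000)
        offset_bytes = (flags_frag & 0x1FFF) * 8
        if more_fragments or offset_bytes:
            fragments.append((number, more_fragments, offset_bytes))
    return fragments
```

Usage would be along the lines of `find_ipv4_fragments(open("isakmp.pcap", "rb").read())`, where the file name is whatever you saved the download as. An empty list means no fragments were seen in that capture.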
Giving this information to the ISP meant they could do their own focussed checking, and resulted in them making some changes to a firewall. Turns out the ISP was blocking all ICMP traffic on their edge router, which meant that Path MTU Discovery was broken, resulting in fragmented ISAKMP packets. Once they stopped blanket blocking ICMP the VPN came up (and I expect all their customers started getting better service in general).
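For anyone wanting a feel for the arithmetic, here is a small Python sketch of how an IPv4 payload gets split when it does not fit the MTU of a link. The MTU values in the example are made up, not the ones from my incident, and `fragment_payload_sizes` is a hypothetical helper. The one real rule it encodes is that every fragment except the last must carry a multiple of 8 bytes.

```python
def fragment_payload_sizes(payload_len, mtu, ip_header_len=20):
    """Return the payload size of each fragment for an IPv4 packet."""
    max_payload = mtu - ip_header_len
    if payload_len <= max_payload:
        return [payload_len]  # fits as-is, no fragmentation
    # Non-final fragments must carry a multiple of 8 bytes
    per_fragment = (max_payload // 8) * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")
    sizes = []
    remaining = payload_len
    while remaining > max_payload:
        sizes.append(per_fragment)
        remaining -= per_fragment
    sizes.append(remaining)  # final fragment carries whatever is left
    return sizes
```

For example, a 1600-byte payload over a link with an MTU of 1400 gives `[1376, 224]`: two packets on the wire where the sender intended one. When Path MTU Discovery works, the sender learns the smaller MTU and sizes its packets to fit instead.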
It's quite possible your ISP is misinterpreting your traffic as P2P filesharing or something nefarious. Take a look at M-Lab to find out if that's what could be happening.
An AM_TM_INIT_XAUTH error likely means your pre-shared keys don't match. (source www.cisco.com/warp/public/471/easyvpn-nem.pdf)
All that needs to work to establish an IPSec session is for UDP traffic destined to port 500 (for IKE) and ESP traffic (or UDP 4500 for NAT-T) to be permitted. This seems like a configuration issue rather than an ISP-caused problem. Feel free to post your relevant configuration if you'd like some help verifying.