I am trying to set up a VPN tunnel using StrongSwan 5.1.2 between two Amazon AWS EC2 instances running Ubuntu 14.04.2 LTS. Prior to using StrongSwan, I used open(libre)swan on an Amazon RedHat AMI, which worked fine. For some reason I can't even get IKE to work here for StrongSwan. I triple checked my AWS configurations, and it all looks good, so it must be a problem with StrongSwan configuration.
As you will see below, the error I am getting is "Error writing to socket: Invalid argument". I have looked online and really can't find the solution to this. I am convinced my strongswan ipsec.conf is improperly configured.
Here is what I am working with:
Instance #1: N.Virginia - 10.198.0.164 with public EIP 54.X.X.X
Instance #2: Oregon - 10.194.0.176 with public EIP 52.Y.Y.Y
The (simple) topology is as follows:
[ Instance #1 within N.Virginia VPC <-> Public internet <-> Instance #2 within Oregon VPC ]
I verified that the following AWS configs are correct:
Security groups permit all
IP information is correct
Src/Dest disabled on both instances
ACLs permit all
routes are present and correct (route to 10.x will point to that local instance in order to be routed out to the VPN tunnel)
Below is the /etc/ipsec.conf (this is from Oregon, however it is the same on the N.Virginia instance except the left|right values are reversed):
config setup
charondebug="dmn 2, mgr 2, ike 2, chd 2, job 2, cfg 2, knl 2, net 2, enc 2, lib 2"
conn aws1oexternal-aws1nvexternal
left=52.Y.Y.Y (EIP)
leftsubnet=10.194.0.0/16
right=54.X.X.X (EIP)
rightsubnet=10.198.0.0/16
auto=start
authby=secret
type=tunnel
mobike=no
dpdaction=restart
Below is the /etc/ipsec.secrets *(reversed for other instance, obviously):
54.X.X.X 52.Y.Y.Y : PSK "Key_inserted_here"
Below is the /etc/strongswan.conf:
charon {
load_modular = yes
plugins {
include strongswan.d/charon/*.conf
}
}
Below is the /etc/sysctl.conf:
net.ipv4.ip_forward=1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
Here is the debug output from /var/log/syslog It seems the problem here is "error writing to socket: Invalid argument; after everything I tried, I continue to get this same error:
Jun 17 17:34:48 ip-10-198-0-164 charon: 13[IKE] retransmit 5 of request with message ID 0
Jun 17 17:34:48 ip-10-198-0-164 charon: 13[NET] sending packet: from 54.X.X.X[500] to 52.Y.Y.Y[500] (1212 bytes)
Jun 17 17:34:48 ip-10-198-0-164 charon: 03[JOB] next event in 75s 581ms, waiting]
Jun 17 17:34:48 ip-10-198-0-164 charon: 16[NET] sending packet: from 54.X.X.X[500] to 52.Y.Y.Y[500]
Jun 17 17:34:48 ip-10-198-0-164 charon: 13[MGR] checkin IKE_SA aws1vexternal-aws1oexternal[1]
Jun 17 17:34:48 ip-10-198-0-164 charon: 13[MGR] check-in of IKE_SA successful.
Jun 17 17:34:48 ip-10-198-0-164 charon: 16[NET] error writing to socket: Invalid argument
Jun 17 17:36:04 ip-10-198-0-164 charon: 03[JOB] got event, queuing job for execution
Jun 17 17:36:04 ip-10-198-0-164 charon: 03[JOB] no events, waiting
Jun 17 17:36:04 ip-10-198-0-164 charon: 08[MGR] checkout IKE_SA
Jun 17 17:36:04 ip-10-198-0-164 charon: 08[MGR] IKE_SA aws1vexternal-aws1oexternal[1] successfully checked out
Jun 17 17:36:04 ip-10-198-0-164 charon: 08[IKE] giving up after 5 retransmits
Jun 17 17:36:04 ip-10-198-0-164 charon: 08[IKE] establishing IKE_SA failed, peer not responding
Jun 17 17:36:04 ip-10-198-0-164 charon: 08[MGR] checkin and destroy IKE_SA aws1vexternal-aws1oexternal[1]
Jun 17 17:36:04 ip-10-198-0-164 charon: 08[IKE] IKE_SA aws1vexternal-aws1oexternal[1] state change: CONNECTING => DESTROYING
Jun 17 17:36:04 ip-10-198-0-164 charon: 08[MGR] check-in and destroy of IKE_SA successful
Below is what I have tried so far:
1) Verified layer 3
2) rebooted machines
3) Tried adding in leftid=
4) Tried doing ipsec update then ipsec restart
5) Tried adding nat_traversal=yes under confif setup (note that this shouldn't matter since ipsec statusall verified using IKEv2, which according to documentation automatically uses nat_traversal)
6) Tried omitting virtual_private <-- Was used according to AWS openswan documentation so I included it in strongswan config.
7) Tried disabling net.ipv4.conf.all.send_redirects = 0 and net.ipv4.conf.all.accept_redirects = 0 in /etc/sysctl.conf
8) Tried using private IP instead of EIPs. I no longer get the socket error, however obviously the two IPs can't communicate to each other to peer...
9) Tried adding this to strongswan.conf: load = aes des sha1 sha2 md5 gmp random nonce hmac stroke kernel-netlink socket-default updown
10) Tried using leftfirewall=yes, didn't work
Please help! Thanks!
EDIT #1:
Michael's response cleared the original problem, however I have a new problem related to routing. Both VPN instances are unable to ping each other. Furthermore, when I try to ping from a random instance in either subnet, to either another random instance or the far end VPN instance, I get the following ping response:
root@ip-10-194-0-80:~# ping 10.198.0.164
PING 10.198.0.164 (10.198.0.164) 56(84) bytes of data.
From 10.194.0.176: icmp_seq=1 Redirect Host(New nexthop: 10.194.0.176)
From 10.194.0.176: icmp_seq=2 Redirect Host(New nexthop: 10.194.0.176)
From 10.194.0.176: icmp_seq=3 Redirect Host(New nexthop: 10.194.0.176)
From 10.194.0.176: icmp_seq=4 Redirect Host(New nexthop: 10.194.0.176)
Obviously this must be a routing issue between the two VPN instances (most likely due to strongswan config or instance routing table) since the 10.194.0.80 host in the Oregon subnet is able to receive a response from the Oregon VPN instance. Route table + traceroute on instance:
root@ip-10-194-0-80:~# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 10.194.0.1 0.0.0.0 UG 0 0 0 eth0
10.194.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
root@ip-10-194-0-80:~# traceroute 10.198.0.164
traceroute to 10.198.0.164 (10.198.0.164), 30 hops max, 60 byte packets
1 10.194.0.176 (10.194.0.176) 0.441 ms 0.425 ms 0.409 ms^C
When I was using openswan, it did not require me to make any manual modifications to each instance's routing table.
Here is the Oregon VPN instance's routing table:
root@ip-10-194-0-176:~# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 10.194.0.1 0.0.0.0 UG 0 0 0 eth0
10.194.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
I'm a bit stumped.
EDIT #2:
Looks like routing between the VPN instances might not be the problem: /var/log/syslog shows packets being received from one VPN instance public IP to the other VPN instance
Jun 23 19:57:49 ip-10-194-0-176 charon: 10[NET] received packet: from 54.X.X.X[4500] to 10.194.0.176[4500] (76 bytes)
Looks like it is an issue related to Child Security Associations:
aws1oexternal-aws1nvexternal: child: 10.194.0.0/16 === 10.198.0.0/16 TUNNEL, dpdaction=restart
Security Associations (1 up, 0 **connecting**):
/var/log/syslog:
Jun 23 19:52:19 ip-10-194-0-176 charon: 02[IKE] failed to establish CHILD_SA, keeping IKE_SA
Jun 23 19:52:48 ip-10-194-0-176 charon: 11[IKE] queueing CHILD_CREATE task
Jun 23 19:52:48 ip-10-194-0-176 charon: 11[IKE] activating CHILD_CREATE task
Jun 23 19:52:48 ip-10-194-0-176 charon: 06[IKE] establishing CHILD_SA aws1oexternal-aws1nvexternal
Jun 23 19:52:48 ip-10-194-0-176 charon: 10[IKE] received FAILED_CP_REQUIRED notify, no CHILD_SA built
Jun 23 19:52:48 ip-10-194-0-176 charon: 10[IKE] failed to establish CHILD_SA, keeping IKE_SA
Jun 23 19:52:49 ip-10-194-0-176 charon: 14[CFG] looking for a child config for 10.194.0.0/16 === 10.198.0.0/16
Jun 23 19:52:49 ip-10-194-0-176 charon: 14[CFG] found matching child config "aws1oexternal-aws1nvexternal" with prio 10
Jun 23 19:52:49 ip-10-194-0-176 charon: 14[IKE] configuration payload negotiation failed, no CHILD_SA built
Jun 23 19:52:49 ip-10-194-0-176 charon: 14[IKE] failed to establish CHILD_SA, keeping IKE_SA
***EDIT #3: Problem solved (uhh, actually see EDIT #4 below...)****
Problem fixed.
1) I did not properly follow Michael's config directions. I also configured a rightsourceip and leftsourceip together, thereby causing both instances to believe they were both initiators. I ensured that one was an initiator and one was a requestor; this fixed the IKE problem.
2) I figured out that I also had to explicitly set the esp parameter. Even though there is already a default (aes128-sha1,3des-sha1), the esp parameter still has to be set in order for the instance to know to use esp OR ah (but not both). I ended up using aes128-sha1-modp2048.
Hope this posting helps the next linux newbie set this up!!
Cheers!
EDIT #4: Problem (not really) solved
While troubleshooting a separate issue related to strongswan, I changed the "leftfirewall" parameter, tested, didn't fix my separate issue, then reverted back to the orig config beforehand (commented out leftfirewall). I then noticed that I now couldn't ping across the tunnel. After going crazy for hours trying to figure out what happened, I commented out the esp parameter to see what would happen: I CAN NOW PING ACROSS THE TUNNEL AGAIN! <- so, there is a possibility there are some ipsec ghosts running around playing tricks on me and that the esp parameter isn't really the fix for the TS_UNACCEPTABLE errors (although other resources online state the esp parameter is the fix...)
EDIT #5: Problem fully solved
I ended up moving everything into a test environment and starting from scratch. I installed from source using the latest version (5.3.2) rather than the older version that was in the Ubuntu repo (5.1.2). This cleared the problem I was having above, and verified layer 7 connectivity using netcat (great tool!!) between multiple subnets over the VPN tunnel.
Also: It is NOT required to enable DNS hostnames for the VPC (as I was incorrectly led to believe by Amazon), FYI>
Hope this all helps!!!!!!
Additional edit 2/11/2017:
As per JustEngland's request, copying the working configuration below (leaving out certain details in order to prevent identification in any way):
Side A:
# ipsec.conf - strongSwan IPsec configuration file
# basic configuration
config setup
# Add connections here.
conn %default
ikelifetime= You choose; must match other side
keylife= You choose; must match other side
rekeymargin= You choose; must match other side
keyingtries=1
keyexchange= You choose; must match other side
authby=secret
mobike=no
conn side-a
left=10.198.0.124
leftsubnet=10.198.0.0/16
leftid=54.y.y.y
leftsourceip=10.198.0.124
right=52.x.x.x
rightsubnet=10.194.0.0/16
auto=start
type=tunnel
# Add connections here.
root@x:~# cat /etc/ipsec.secrets
A.A.A.A B.B.B.B : PSK "Your Password"
Side B:
# ipsec.conf - strongSwan IPsec configuration file
# basic configuration
config setup
conn %default
ikelifetime= You choose; must match other side
keylife= You choose; must match other side
rekeymargin= You choose; must match other side
keyingtries=1
keyexchange= You choose; must match other side
authby=secret
mobike=no
conn side-b
left=10.194.0.129
leftsubnet=10.194.0.0/16
leftid=52.x.x.x
right=54.y.y.y
rightsubnet=10.198.0.0/16
rightsourceip=10.198.0.124
auto=start
type=tunnel
root@x:~# cat /etc/ipsec.secrets
B.B.B.B A.A.A.A : PSK "Your Password"
In VPC, the public IP address of an instance is never bound to the instance's stack, so you have to configure both the internal private address and the external public address. The invalid argument is presumably caused by trying to source traffic directly from the public IP address, which isn't known to your instance.
Problem fixed.
1) I did not properly follow Michael's config directions. I also configured a rightsourceip and leftsourceip together, thereby causing both instances to believe they were both initiators. I ensured that one was an initiator and one was a requestor; this fixed the IKE problem.
2) I figured out that I also had to explicitly set the esp parameter. Even though there is already a default (aes128-sha1,3des-sha1), the esp parameter still has to be set in order for the instance to know to use esp OR ah (but not both). I ended up using aes128-sha1-modp2048.