I have the following config:
LAN 01: 192.168.16.0/24 (LAN for internal servers)
LAN 02: 192.168.67.0/24 (LAN for workstations)
WAN: X.X.X.X
And then:
PFSENSE LAN IP: 192.168.16.1
PFSENSE LAN IP: 192.168.67.1 (it's a virtual IP)
LAN 01 and LAN 02 are physically connected (i.e. on the same switch. I know I should use separate LANs or at least VLANs for them, but I cannot easily change this configuration for now).
I have a PFSENSE installation (2.2) working where computers in LAN 02 get their IP addresses from a DHCP SERVER and use PFSENSE as default gateway.
Here's my problem:
If I sit at a computer on LAN 02 and ssh (or use any other persistent protocol, for that matter) to a server on LAN 01 like this:
$ ssh -l myself 192.168.16.25
I connect without issues. The connection lasts between 20 and 30 seconds, and then it is consistently dropped.
So my question is: What can I do to avoid getting the connection dropped?
I did a tcpdump from both sides and, at some point, packets start to get duplicated. It looks like this:
I have this option enabled, which I thought would help, but it didn't.
I should mention that this exact same configuration, using a LINUX FIREWALL (iptables) works perfectly.
Any ideas?
I'm guessing your listing of LAN1 and LAN2 as both 192.168.1.0/24 is wrong, given that the capture shows one is 192.168.16.0 and the other is 192.168.67.0, hopefully both /24s.
The static route filtering option has no applicability here.
I'm guessing you either have overlapping networks (not a /24 mask on both, maybe a /16 on some hosts), or one of the affected systems is dual-homed on both networks, which causes asymmetric routing.
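To check the first possibility, here is a minimal Python sketch using the standard `ipaddress` module. The /24 networks come from the question; the /16 host mask is a hypothetical misconfiguration used only for illustration.

```python
import ipaddress

def networks_overlap(cidr_a, cidr_b):
    """Return True if the two networks share any address."""
    # strict=False lets us pass a host address with its mask,
    # e.g. what a host actually has configured on its interface.
    a = ipaddress.ip_network(cidr_a, strict=False)
    b = ipaddress.ip_network(cidr_b, strict=False)
    return a.overlaps(b)

# Masks as described in the question: two distinct /24s.
print(networks_overlap("192.168.16.0/24", "192.168.67.0/24"))

# Hypothetical misconfiguration: a LAN 01 host set to /16 instead of /24.
print(networks_overlap("192.168.16.25/16", "192.168.67.0/24"))
```

If the second case applies to any host, that host treats the other LAN as on-link and will ARP for the peer directly instead of sending the traffic to pfSense.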
I had a very similar problem and the issue was essentially a variation of asymmetric routing.
With my topology, I have a PFSENSE box with 2 LAN interfaces, both /24s but definitely different subnets. I then have an L2/L3 switch that connects both interfaces to the rest of the network in different VLANs. Also hanging off this switch are wired users and a segment that has all wireless users. The wired users are in one subnet/VLAN and the wireless users in another, and both of these subnets are the ones that exist on the PFSENSE box. All endpoints use the PFSENSE IP for their respective subnet as their default gateway. And finally, the switch also has an IP in both of the aforementioned subnets.
My problem was that if I was connected via wireless and SSH'd to the switch, I would connect fine and then drop in 20-30 secs. As you probably already realize, because the switch had an IP in the same subnet as my machine, return packets from the switch would go direct to my machine rather than following the same path as packets from my machine. The switch would essentially just side-step the PFSENSE box.
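The next-hop decision that causes this side-step can be sketched in a few lines of Python. This is only an illustration of ordinary on-link versus gateway selection: the function name `reply_next_hop`, the client address 192.168.67.50 and the second server address 192.168.67.25 are all hypothetical, while the other addresses are taken from the question.

```python
import ipaddress

def reply_next_hop(local_interfaces, default_gateway, destination):
    """Return "direct" if the destination is on-link for any local
    interface, otherwise return the default gateway address."""
    dest = ipaddress.ip_address(destination)
    for iface in local_interfaces:
        if dest in ipaddress.ip_interface(iface).network:
            return "direct"
    return default_gateway

client = "192.168.67.50"  # hypothetical workstation on LAN 02

# Single-homed server: replies go back through the firewall.
print(reply_next_hop(["192.168.16.25/24"], "192.168.16.1", client))

# Dual-homed server (hypothetical second address on LAN 02):
# replies bypass the firewall, so it only ever sees one direction.
print(reply_next_hop(["192.168.16.25/24", "192.168.67.25/24"],
                     "192.168.16.1", client))
```

On a real host the same question can be answered from its routing table; the sketch only shows why a second address in the client's subnet changes the return path.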
In many situations, this actually could work fine, especially for non-stateful intermediaries and/or UDP sessions. However, the PFSENSE box is a stateful device, so after a few seconds PFSENSE sees no response to the TCP OPEN and ends up killing the state. To confirm, you can tweak PFSENSE's TCP OPEN timeout value (System --> Advanced --> State Timeouts) and then observe that the time it takes for the SSH session to drop follows what you have set. The default for this value is, as expected, 30s. Upon removing the relevant IP from the switch - BAM - issue fixed.
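To make the timing concrete, here is a deliberately simplified Python model of the behavior described above. It is not pf's actual state machine: the class `ToyStateTable`, its methods and the flow tuple are hypothetical, and the only rule it encodes is "a state whose reply was never seen through the firewall is dropped once the opening timeout has elapsed".

```python
class ToyStateTable:
    """Toy stateful firewall: a flow with no observed reply expires
    after opening_timeout seconds (assumed default: 30)."""

    def __init__(self, opening_timeout=30):
        self.opening_timeout = opening_timeout
        self.states = {}  # flow -> {"created": seconds, "reply_seen": bool}

    def outbound(self, flow, now, syn=False):
        """Client-to-server packet crossing the firewall; True if passed."""
        if syn:
            self.states[flow] = {"created": now, "reply_seen": False}
            return True
        state = self.states.get(flow)
        if state is None:
            return False  # no state, packet is blocked
        expired = now - state["created"] >= self.opening_timeout
        if not state["reply_seen"] and expired:
            del self.states[flow]  # half-open state is killed
            return False
        return True

    def inbound(self, flow):
        """Server-to-client reply that actually crosses the firewall."""
        if flow in self.states:
            self.states[flow]["reply_seen"] = True

flow = ("192.168.67.50", "192.168.16.25", 22)  # hypothetical client address

# Asymmetric path: replies bypass the firewall, so inbound() is never called.
asym = ToyStateTable()
asym.outbound(flow, now=0, syn=True)
print(asym.outbound(flow, now=10), asym.outbound(flow, now=31))

# Symmetric path: the reply is seen, so the session survives.
sym = ToyStateTable()
sym.outbound(flow, now=0, syn=True)
sym.inbound(flow)
print(sym.outbound(flow, now=31))
```

Changing `opening_timeout` in the model moves the drop time with it, which mirrors the confirmation test suggested above.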
While not mentioned specifically in the OP's described topology, I suspect there may be a switch (or similar) between the relevant endpoints. Maybe not, but if so, it might be the same issue I had here. Alternatively, if the server in the OP's topology is dual-homed, then this issue would also occur.