I'm running Apache 2.2 on a Linux box. Is there any reliable way of detecting whether a visitor to my website is connecting through a VPN? I've heard of attempting to open a TCP connection back to the remote IP address, which can sometimes identify the user as using a proxy. But could the same method work with VPN users?
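The probing technique mentioned in the question can be sketched as follows: try a TCP connection to a few ports on the visitor's IP address and record which ones accept. This is a minimal Python sketch, not a reliable detector. The port list is an assumption (1194 and 1723 are the conventional OpenVPN and PPTP ports, 3128 and 8080 are common proxy ports), and the function name `probe_open_ports` is hypothetical. An open port is at best a weak hint: a VPN server may listen on a non-default port, and an ordinary host may run unrelated services on these ports.

```python
import socket
from typing import Iterable, List

# Hypothetical port list: 1194 (OpenVPN default), 1723 (PPTP),
# 3128 and 8080 (common HTTP proxy ports). These are conventions only.
DEFAULT_PORTS = (1194, 1723, 3128, 8080)


def probe_open_ports(host: str, ports: Iterable[int] = DEFAULT_PORTS,
                     timeout: float = 1.0) -> List[int]:
    """Return the ports on `host` that accept a TCP connection."""
    open_ports = []
    for port in ports:
        try:
            # create_connection raises OSError (including socket.timeout)
            # when the port is closed, filtered, or unreachable.
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            pass
    return open_ports
```

You would call `probe_open_ports(addr)` with the client address (for example `REMOTE_ADDR`), preferably outside the request path so the probe does not delay the response.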
Plain and simple: you can't. There are far too many types of VPN, too many providers, and too many variables overall to reliably tell whether someone is using a VPN.
You cannot reliably detect whether a user is using a VPN. There are many different VPN solutions, and whatever detection method you use, a new VPN solution could come along and defeat it.
That said, some VPN solutions can be detected.
The first signal I would look for is the PMTU (path MTU). Because the VPN adds its own headers, it must either reduce the MTU or rely on fragmentation beneath the VPN. A reduced MTU can be detected. But any kind of tunneling can reduce the MTU, not only VPN solutions; PPPoE, for example, does as well. So this method ends up requiring a long list of possible MTU values and the likelihood of each one indicating a VPN.
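The MTU heuristic described above can be sketched in Python. One observable stand-in for the MTU is the MSS (maximum segment size) the client advertises in its TCP SYN. This sketch assumes you already have that value, for example from a packet capture of the SYN rather than from Apache. It adds the IPv4 or IPv6 plus TCP header overhead to recover the MTU and looks it up in a small table. The function name `classify_mss` and the table contents are hypothetical and deliberately short; a real table would need many more entries, and a middlebox that clamps the MSS can skew the result.

```python
from typing import Tuple

# IPv4 header (20 bytes) + TCP header without options (20 bytes) = 40;
# IPv6 header (40 bytes) + TCP header without options (20 bytes) = 60.
IPV4_OVERHEAD = 40
IPV6_OVERHEAD = 60

# Illustrative table only; extend it with the MTU values you care about.
KNOWN_MTUS = {
    1500: "standard Ethernet path, no tunnel evident",
    1492: "PPPoE: reduced MTU, not necessarily a VPN",
}


def classify_mss(mss: int, ipv6: bool = False) -> Tuple[int, str]:
    """Infer the client's MTU from its advertised MSS and label it."""
    mtu = mss + (IPV6_OVERHEAD if ipv6 else IPV4_OVERHEAD)
    if mtu in KNOWN_MTUS:
        return mtu, KNOWN_MTUS[mtu]
    if mtu < 1500:
        # Some encapsulation is eating header space; a VPN is one candidate.
        return mtu, "reduced MTU: some tunnel, possibly a VPN"
    return mtu, "unusual MTU"
```

For example, an IPv4 SYN advertising an MSS of 1460 maps back to an MTU of 1500, while 1452 maps to 1492, the usual PPPoE value.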
I doubt it; it would look the same as if someone were using NAT.
There are several solutions for this nowadays.
Paid solutions:
blocked - paid API-based detection; raw feeds are also available for purchase
MaxMind - proxy detection via API
Free solutions:
W I T C H - detects OpenVPN via MSS values (from what kasperd said)
getIPIntel - machine-learning proxy/VPN detection via API