I have a load-balanced set of apache servers, running 20-odd vhosts, behind an HAProxy load balancer. I'm trying to get the Apache servers to record the actual client IPs in the log files, in order to use fail2ban effectively. There's plenty of articles on this around the net like this, or this, but the solutions that they describe don't seem to work in our case.
I've got HAProxy sending the x-forwarded-for header, which contains the IP of the actual client (I've checked the traffic with tcpdump, and it's definitely there), but the original client IP isn't appearing in the access_log, just the IP of the loadbalancer. The logging configuration for the vhost I'm using to test this is:
CustomLog "/var/log/apache2/www.sitename.co.uk-access_log" "%h %{x-forwarded-for}i %a %l %u %t \"%r\" %>s \"%{Referer}i\" \"%{User-Agent}i\""
But the log data stubbornly shows the LB IP and nothing else, like:
<HAProxy IP> - <HAProxy IP> - - [29/Oct/2020:13:37:08 +0000] "GET /wp-content/uploads//path/to/file.png HTTP/1.1" 304 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.3; WOW64; Trident/7.0; .NET4.0E; .NET4.0C; .NET CLR 3.5.30729; .NET CLR 2.0.50727; .NET CLR 3.0.30729; Tablet PC 2.0; Zoom 3.6.0; Microsoft Outlook 15.0.5285; Microsoft Outlook 15.0.5285; ms-office; MSOffice 15)"
...so I'm obviously missing something. But can anyone tell me what?
Software used: Apache/2.4.43 (OpenSSL/1.1.1d PHP/7.4.6), HAProxy 2.0.14, all running on OpenSuse 15.2
Answering my own question; this is because most of the hosts are using HTTPS. The frontend on HAProxy is therefore TCP, which means that the "forwardfor" option won't insert the header (since that's an option for HTTP frontends). Some of the vhosts don't use HTTP, which is why I saw some XFF headers in the traffic capture, but the majority do, and the ones I was testing all use HTTPS.
I'm going to have to fix this by getting HAProxy to terminate HTTPS and talk to the backend webservers over HTTP instead. Sorry for wasting everyone's time.