I have an Apache/nginx/whatever web server that logs client IP addresses to its access logs. These log files are rotated via logrotate.
I want to keep the IP addresses for some days, then after 7 days, I want to remove the IPs from the log files for privacy reasons (mostly dictated by German law).
Using mod_removeip or something like that doesn't work because I need to filter some requests based on their IP addresses.
Is there any 'standard' way to do it? Maybe even with logrotate?
EDIT
I just found this script, but it depends on the ability to pipe all logging through the script in real time. I'm not really sure about the performance implications of this approach.
Also, this only works for the 'front-end' server logs, not the application server logs.
PCRE! (Perl-Compatible Regular Expressions)
Use that as a filter in a Perl script, or in any other suitable language (quite a few use PCRE or some close-enough regex dialect that will work), to rewrite your log files once they are 7 days old.
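As a minimal sketch of such a filter, here is one way it could look in Python, whose re module is close enough to PCRE for this pattern. The function name anonymize_line and the choice to zero the last two octets are assumptions for illustration; the sketch only handles IPv4 addresses.

```python
import re

# Four dot-separated groups of 1-3 digits, not embedded in a longer
# run of digits and dots. Octet ranges (0-255) are not validated.
IPV4 = re.compile(r"(?<![\d.])(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}(?![\d.])")


def anonymize_line(line: str) -> str:
    """Return the line with the last two octets of every IPv4 address zeroed."""
    return IPV4.sub(r"\1.\2.0.0", line)


if __name__ == "__main__":
    sample = '192.168.12.34 - - [10/Oct/2020:13:55:36 +0200] "GET / HTTP/1.1" 200 512'
    print(anonymize_line(sample))
```

Note that the pattern matches anything shaped like an IPv4 address anywhere in the line, so a four-part version number in a user-agent string would be rewritten too. If that matters, anchoring the pattern to the start of the line (where the default access log format puts the client address) is the stricter choice.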
On Ubuntu > 12.04 / Apache 2.4 with the default config, you could use a short routine that creates a copy of all *.gz files older than 7 days and replaces the last two bytes of every IP with 0.0 in the copy, which gets an ano suffix added to its name.
If you don't use compression, or use a different compression such as bz2, you have to change the commands accordingly, e.g. zcat -> bzcat.
Finally, you can call this routine via cron once per day or week.
I don't think logrotate will do it; you may need to create a script that decompresses the files, processes them through awk or sed to strip the IPs out, then recompresses them. You just can't do it on "active" log files.
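The decompress, filter, recompress routine described in these answers could be sketched roughly as follows. This is not the script from the answer above, only an illustration in Python under stated assumptions: the function name anonymize_old_logs, the .ano suffix placement, the use of file modification time to decide age, and the remove_original switch are all hypothetical choices, and only gzip-compressed files with IPv4 addresses are handled.

```python
import gzip
import re
import time
from pathlib import Path

# Bytes pattern, so log lines are rewritten without any decoding step.
IPV4 = re.compile(rb"(?<![\d.])(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}(?![\d.])")


def anonymize_old_logs(log_dir, max_age_days=7, suffix=".ano", remove_original=False):
    """Write IP-masked copies of *.gz logs older than max_age_days.

    Returns the list of newly written paths. Assumption: a file's age is
    taken from its modification time.
    """
    cutoff = time.time() - max_age_days * 86400
    written = []
    for src in sorted(Path(log_dir).glob("*.gz")):
        if src.name.endswith(suffix + ".gz"):
            continue  # this is already an anonymized copy
        if src.stat().st_mtime > cutoff:
            continue  # still inside the retention window
        dst = src.with_name(src.name[:-3] + suffix + ".gz")
        if dst.exists():
            continue  # copy was produced by an earlier run
        with gzip.open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
            for line in fin:
                fout.write(IPV4.sub(rb"\1.\2.0.0", line))
        if remove_original:
            src.unlink()  # drop the file that still holds full IPs
        written.append(dst)
    return written
```

Called once a day from cron, e.g. with the directory that holds the rotated Apache logs, this leaves recent files untouched. With remove_original left at False the full addresses remain on disk in the original files, so for the privacy goal in the question you would likely either enable it or delete the originals in a separate step. Renamed copies also fall outside logrotate's own naming scheme, so their eventual cleanup has to be handled separately.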