Has anyone had issues with logrotate before that cause a log file to get rotated and then go back to the same size it originally was? Here are my findings:
Logrotate Script:
    /var/log/mylogfile.log {
        rotate 7
        daily
        compress
        olddir /log_archives
        missingok
        notifempty
        copytruncate
    }
Verbose Output of Logrotate:
    copying /var/log/mylogfile.log to /log_archives/mylogfile.log.1
    truncating /var/log/mylogfile.log
    compressing log with: /bin/gzip
    removing old log /log_archives/mylogfile.log.8.gz
Log file after truncate happens
    [root@server ~]# ls -lh /var/log/mylogfile.log
    -rw-rw-r-- 1 part1 part1 0 Jan 11 17:32 /var/log/mylogfile.log
Literally Seconds Later:
    [root@server ~]# ls -lh /var/log/mylogfile.log
    -rw-rw-r-- 1 part1 part1 3.5G Jan 11 17:32 /var/log/mylogfile.log
RHEL Version:
    [root@server ~]# cat /etc/redhat-release
    Red Hat Enterprise Linux ES release 4 (Nahant Update 4)
Logrotate Version:
    [root@DAA21529WWW370 ~]# rpm -qa | grep logrotate
    logrotate-3.7.1-10.RHEL4
A few notes:
- The service can't be restarted on the fly, which is why I'm using copytruncate
- Logs are rotating every night, judging by the olddir directory having log files in it from each night
This is probably because even though you truncate the file, the process writing to it will continue writing at whatever offset it was last at, since it holds the file open without O_APPEND. So what's happening is: logrotate truncates the file, the size is zero, the process writes to the file again, continuing at the offset it left off at, and you now have a file consisting of null bytes up to the point where you truncated it, plus the new entries written to the log.
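This retained-offset behaviour is easy to reproduce in a few lines of shell; the file name below is hypothetical, and fd 3 stands in for the daemon's open log descriptor:

```shell
log=/tmp/offset_demo.log
exec 3> "$log"            # open for writing WITHOUT O_APPEND; offset starts at 0
printf 'AAAAAAAAAA' >&3   # write 10 bytes; fd 3's offset is now 10
: > "$log"                # truncate the file, as logrotate's copytruncate does
printf 'BBB' >&3          # still writes at offset 10, leaving bytes 0-9 as null bytes
od -c "$log"              # shows ten \0 bytes followed by B B B
exec 3>&-                 # close fd 3
```

The truncation resets the file size, but not the writer's file offset, so the next write recreates the gap.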
Running od -c on the file after the truncate and the sudden growth generated output saying that from offset 0 to 33255657600 the file consists of null bytes, followed by legible data. Getting into this state doesn't take the time it would take to actually write all those null bytes. The ext{2,3,4} filesystems support something called sparse files: if you seek past a region of a file that doesn't contain anything, that region is assumed to contain null bytes and won't take up space on disk. The null bytes aren't actually written, just assumed to be there, which is why going from 0 to 3.5 GB takes hardly any time. (You can test how long it takes by doing something like

    dd if=${HOME}/.bashrc of=largefile.bin seek=3432343264 bs=1

which should generate a file of over 3 GB in a few milliseconds.)
If you run ls -ls on your logfiles after they've been truncated and had a sudden growth again, the number at the beginning of the line shows the actual size in blocks occupied on disk, which is probably orders of magnitude smaller than the size reported by just ls -l.
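You can see the allocated-vs-apparent size difference with a small sketch; the file name is hypothetical, and this assumes a filesystem with sparse-file support (ext2/3/4, xfs, tmpfs):

```shell
# Create a ~1 MB sparse file by seeking past the end before writing one byte
dd if=/dev/zero of=/tmp/sparse_demo.bin bs=1 count=1 seek=1048576 2>/dev/null
ls -ls /tmp/sparse_demo.bin                  # first column: 512-byte blocks actually allocated
du -h --apparent-size /tmp/sparse_demo.bin   # apparent size (~1 MB)
du -h /tmp/sparse_demo.bin                   # actual disk usage (a few KB at most)
```

The apparent size is 1048577 bytes, but only the block holding the single written byte is allocated on disk.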
I am extremely confident that Kjetil has hit it. Drew, you may not yet be convinced by his explanation, but I urge you to read carefully what he's said.
If you accept it, the fix is either to stop and restart your application when the logs are rotated, or to use a tool like Apache's rotatelogs, where you feed the log output to the tool via a pipe and the tool takes care of rotating the logfile every so often. For example, one of my apache instances logs with
which causes a lot of logfiles with names like
to appear without restarting apache; I can then compress them manually after the fact. Note how the rotation is done every week, i.e. every 604800 seconds, that being the argument passed to rotatelogs.

If you can't stop and restart the app, and it can't log via a pipe, then I think you've got a real problem. Perhaps others will have suggestions.
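The original CustomLog line and filenames weren't shown above, but a typical piped-logging setup with rotatelogs looks something like this; the paths and log format here are illustrative, not from the original post:

```apache
# Illustrative httpd.conf fragment: pipe access logs through rotatelogs,
# rotating every 604800 seconds (one week)
CustomLog "|/usr/sbin/rotatelogs /var/log/httpd/access.%Y-%m-%d.log 604800" combined
```

Because the writer is rotatelogs itself rather than Apache, the logfile can be switched without restarting the server.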
It would be really great if you could send the whole logrotate configuration.

Why not try the kill -HUP method? (Classic reloading, not restarting.)
Also, check with lsof who is accessing the file.

Just use ">>", which means append, instead of ">", which means create, in your scripts that write to this file. I had the exact same issue and I fixed it by using append in my script.
SomeScript.sh >> output.txt
Hope that is clearer.
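The difference matters because ">>" opens the file with O_APPEND, so every write lands at the current end of file even after a truncation; a quick sketch with a hypothetical file name:

```shell
log=/tmp/append_demo.log
exec 3>> "$log"           # open with O_APPEND, as ">>" does
printf 'AAAAAAAAAA' >&3   # 10 bytes written at the end
: > "$log"                # truncate, as copytruncate does
printf 'BBB' >&3          # O_APPEND: write goes to the new end (offset 0)
od -c "$log"              # just B B B -- no null-byte gap
exec 3>&-
```

Contrast this with a plain ">" open, where the writer's stale offset would leave a gap of null bytes after the truncation.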