I have large numbers of small log files that are essentially write-only, unless I have to look at them for some reason. Right now, they accumulate in day-specific subdirectories in a logging folder (e.g. 2018-12-29 for yesterday, 2018-12-30 for today, etc.) and I end up tar/bzip2'ing them up later into single files per day.
That's not terribly convenient for me, and I was thinking that if I could create a compressed filesystem for each day, I could write directly to those filesystems, use less disk space, and not have to "go back" and compress each directory into a tarball. It would also make inspecting individual files later easier, because I could mount the filesystem and use it however I like -- grep, find, less, etc. -- rather than trying to use tar to stream the data through some command pipeline.
I know I can create a loopback device of arbitrary size, but I have to know that size in advance: if I guess too high, I waste disk space on unused capacity, and if I guess too low, I'll run out of space and my software will fail (or at the very least complain very loudly).
I know I can create a sparse file, but I'm not exactly sure how that will interact with a filesystem such as ext4 or the other filesystems available on Linux; it may end up allocating far more space than necessary due to backup superblocks and other metadata.
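As an illustration of the concern, here's a quick sparse-file experiment (the file name is arbitrary); du shows how much physical space actually gets allocated:

    # Create a 10 GiB sparse file: apparent size 10G, zero blocks allocated.
    truncate -s 10G sparse.img
    du -h --apparent-size sparse.img   # reports 10G
    du -h sparse.img                   # reports 0

    # Putting a filesystem on it immediately allocates blocks for
    # superblocks, inode tables and other metadata.
    mkfs.ext4 -Fq sparse.img
    du -h sparse.img                   # now reports the metadata overhead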
Is there a way to create a loop-device that can take up a minimal amount of physical space on the disk?
You could create a gzip-compressed ZFS pool backed by plain files and store your logs on it. There would be no need to do anything other than write the logs there.

From the outset, the files will only use their compressed size in the ZFS file systems. You will be able to read the data afterwards (grep, find, less, and so on), and even modify or delete files, even though that's not part of your requirements.
Should the pool become full, you can either grow the backing file (with the autoexpand property set to on) or add new backing files, and the file system's capacity will grow accordingly.
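A minimal sketch, assuming ZFS is installed; the pool name and paths are hypothetical, and note that file-based vdevs must be given as absolute paths:

    # Sparse backing file; it only consumes physical space as data arrives.
    truncate -s 10G /var/lib/logpool.img

    # Create the pool on the file, with gzip compression and a mountpoint.
    zpool create -O compression=gzip -O mountpoint=/logs logpool /var/lib/logpool.img

    # Allow the pool to grow when the backing file is enlarged.
    zpool set autoexpand=on logpool

    # Later, to grow: enlarge the file and tell ZFS to expand onto it.
    truncate -s +10G /var/lib/logpool.img
    zpool online -e logpool /var/lib/logpool.img

    # Check how well the logs compress.
    zfs get compressratio logpool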
You should investigate the use of logrotate(8) to help manage your log files. It can be configured to rename your files with a specific date format and compress them automatically. You can also configure it to keep a specified number of old logs (among many other things). Once you have it set up the way you want, you can basically forget about it.
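As a minimal sketch, a drop-in policy could look like this (the application name, path and retention count are hypothetical):

    # Install a per-application logrotate policy (run as root).
    cat > /etc/logrotate.d/myapp <<'EOF'
    /var/log/myapp/*.log {
        daily
        rotate 180
        compress
        dateext
        missingok
        notifempty
    }
    EOF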
Also, take a look at the tools that come with gzip/bzip2, e.g. zgrep, zless, bzgrep, bzless, etc. They let you work with compressed files directly, without having to build pipelines yourself.
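For instance (the file names are hypothetical):

    # Search and page through compressed logs without unpacking them first.
    zgrep -i 'error' /var/log/myapp/app.log-20181229.gz
    zless /var/log/myapp/app.log-20181229.gz
    bzgrep 'timeout' /var/log/myapp/app.log-20181228.bz2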
I know logrotate has been suggested to you here, but if you'd still like to go forward with the compressed-filesystem idea, why not create each filesystem only after its day is over? Your shell script would then calculate the size of the day's logging folder, create a loopback image file of the needed size, make a filesystem on it, mount it, move the log files there, and finally unmount the image; a sketch follows below.

I can feel the pain if some stupid application you cannot (or are not allowed to) do anything about creates millions of log files per day under some directory, and you still need to keep those on disk for half a year or so. In that case a loopback image might be a good idea, as the number of small files actively sitting on a partition would come down dramatically.
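A minimal sketch of such an end-of-day script, assuming a hypothetical /logs/YYYY-MM-DD layout and an /archive directory for the images (not production-hardened):

    #!/bin/bash
    set -euo pipefail

    day=$(date -d yesterday +%F)   # e.g. 2018-12-29
    src="/logs/$day"
    img="/archive/$day.img"

    # Size the image from the directory's actual usage, plus ~10% headroom
    # and a small fixed margin for filesystem metadata.
    kb=$(du -sk "$src" | cut -f1)
    size_kb=$(( kb + kb / 10 + 4096 ))

    # Create the image file and put an ext4 filesystem on it.
    truncate -s "${size_kb}K" "$img"
    mkfs.ext4 -Fq "$img"

    # Mount via a loop device, move the logs in, and detach.
    mnt=$(mktemp -d)
    mount -o loop "$img" "$mnt"
    mv "$src"/* "$mnt"/
    umount "$mnt"
    rmdir "$mnt" "$src"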