We are currently using syslog-ng to dump files to a network storage location. Each day, 5 .log files are written by various servers, and at the end of the day I need to merge the 5 files in chronological order and then compress them. For the past 2 years I've used logmerge and it's worked great. The exact syntax is:
/local/bin/logmerge -f /mnt/logs/Windows/`date -d yesterday +\%Y-\%m-\%d`-sys*.log | gzip -9 -c > /mnt/logs/Windows/`date -d yesterday +\%Y-\%m-\%d`.log.gz && rm -f /mnt/logs/Windows/`date -d yesterday +\%Y-\%m-\%d`-sys*.log
Over the past few weeks this process has broken because of how large the .log files have gotten. Each one is now over 7 GB, and logmerge fails trying to sort that many lines. Right now I'm just gzipping them, but that makes searching harder because the logs aren't in order.
Is there a better way to merge these files and zip them up?
It rather sounds like you may want to look into some form of database to store your logs.
One possibility might be to use the ELK stack (Elasticsearch, Logstash, and Kibana).
It isn't necessarily the answer you might have been looking for, but it sounds like you might have a legitimate use case for a solution like it. You can also consider something like Splunk, but given your volume of data, that will cost you.
Logstash can also be used on Windows machines to read the Event Log, which might allow you to achieve your goals without using syslog at all (if I am reading between the lines of your setup correctly).
It may also be that there is something you can do about how the logs are written to avoid such massive files, but if you are regularly dealing with 7 GB of logs that you periodically need to search through, a solution geared towards that use case would likely be more practical.
Update: I see. In that case, is it not possible to have syslog-ng write everything either to one large daily file (rather than 5), or to a series of files up to a certain size (e.g. ten 700 MB files, each created after the previous one fills)?
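A sketch of the single-daily-file option. ${YEAR}, ${MONTH} and ${DAY} are standard syslog-ng macros; the source name s_net and the path are placeholders for whatever your existing configuration uses:

```
# Point every network source at one shared daily destination, so events
# from all servers land in a single file in arrival order.
# s_net is an assumed source name; adjust to match your config.
destination d_daily {
    file("/mnt/logs/Windows/${YEAR}-${MONTH}-${DAY}.log");
};
log { source(s_net); destination(d_daily); };
```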
It really sounds like the core issue is having your data out of order, and there should be ways to avoid that by configuring syslog-ng accordingly. Since the timestamps appear to matter more than the sources, I would expect timestamps alone (or possibly timestamps plus a maximum file size) to determine how events are stored in the first place.
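If reconfiguring syslog-ng isn't an option and each of the five per-server files is already in timestamp order, GNU `sort` in merge-only mode (`sort -m`) can stand in for logmerge: it only interleaves already-sorted inputs, so it never needs to hold the whole 35 GB in memory. The one assumption is that lines begin with a lexicographically sortable timestamp (e.g. ISO 8601); classic "Jan  2 ..." syslog dates do not sort this way. A minimal sketch on demo data (the paths from your cron job would replace the temp files):

```shell
#!/bin/sh
# Streaming chronological merge with `sort -m` (merge-only mode).
# Each input must already be internally sorted; the merge pass then
# restores global order while reading the files as streams, unlike
# a full sort that must buffer everything.
set -e
DIR=$(mktemp -d)   # demo stand-in for /mnt/logs/Windows

printf '2024-01-01T00:00:01 host1 a\n2024-01-01T00:00:03 host1 c\n' \
    > "$DIR/2024-01-01-sys1.log"
printf '2024-01-01T00:00:02 host2 b\n' \
    > "$DIR/2024-01-01-sys2.log"

sort -m "$DIR"/2024-01-01-sys*.log | gzip -9 -c > "$DIR/2024-01-01.log.gz"

zcat "$DIR/2024-01-01.log.gz"   # lines in timestamp order: a, b, c
```

If the timestamps are not lexicographically sortable, a key option such as `sort -m -k1,2` over a parseable date field may still work, but test it on a sample first.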