Imagine some Linux systems with scripts of various kinds (mostly Perl, but it could be anything that writes to STDERR) that are run hundreds of times by different users with slightly different needs.
Logs are kept of the output and of the warnings/errors (stderr) from each run of the script, so thousands of logs accumulate.
The users make mistakes, and the developers don't always write clean code.
We'd like to tell from the logs what is going on, both programmatically (in each individual case) and administratively/analytically (to understand trends over time).
This problem can also be thought of in a web server/CGI context, since that also generates hundreds of script runs, but I'm not looking for solutions peculiar to Apache access/error logs.
What free/open source software tools exist, in general, for identifying and analysing unusual output from such a collection of logs where each log represents one run of a procedure?
Useful features could include:
- can compare the stdout/stderr from a given run to the historical output and determine which parts of that stdout or stderr are unusual or noteworthy (see the sketch after this list)
- can achieve 'compression' relative to storing all the logs in plain text, by eliminating the need to store the same error a hundred or more times
- can analyse the entire store for trends (this message is showing up more or less often than in the past) as well as counts (the most frequent errors are these)
- has a browsable user interface of some sort with graphics and data export
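For the first point, a minimal shell sketch of the idea, assuming one stderr log per run under a hypothetical `logs/` directory (the `run-*.stderr` names are made up for illustration):

```
# Build a historical line-frequency table from all past runs:
cat logs/run-*.stderr | sort | uniq -c | sort -n > baseline.freq

# Flag lines in a new run's stderr that never appeared historically:
# strip the leading counts from baseline.freq, then use the remaining
# lines as fixed-string, whole-line patterns and invert the match.
grep -vxFf <(sed 's/^ *[0-9]* //' baseline.freq) new-run.stderr
```

Anything this prints is at least "new"; a fancier version would also flag lines whose frequency spikes relative to the baseline.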
For instance, one can take all of the stderr logs, cat them together, and run them through sort, uniq -c, and sort again to produce a list of the error strings from least to most frequent (that's the baseline.freq step above). One could also start dumping the logs into a SQL database of some sort.
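On the SQL idea, a rough sketch using SQLite (the database, table, and file names are all made up for illustration, and it assumes the same hypothetical `logs/run-*.stderr` layout as above):

```
# One row per stderr line, tagged with the run it came from:
sqlite3 logs.db 'CREATE TABLE IF NOT EXISTS errors (run TEXT, line TEXT);'

# Convert each log to CSV (doubling any embedded quotes) and bulk-import it:
for f in logs/run-*.stderr; do
  awk -v run="$f" '{gsub(/"/, "\"\""); printf "\"%s\",\"%s\"\n", run, $0}' "$f"
done > errors.csv
sqlite3 logs.db <<'EOF'
.mode csv
.import errors.csv errors
EOF

# Counts (and, with a timestamp column added, trends) then become queries:
sqlite3 logs.db 'SELECT COUNT(*) AS n, line FROM errors GROUP BY line ORDER BY n;'
```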
This could become the building block of a tool, but maybe there are complete packages that already do this and much, much more, so I thought I would ask what other people use.
Do you develop in-house tools for this sort of thing, or are there good open source alternatives?
My thoughts would be: use petit (http://opensource.eyemg.com/Petit) instead of uniq for the analysis. Logs could be stored in .gz format, so the following may achieve your first three goals, and it is GPL. There is no graphical interface or notion of exporting, though.
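Something along these lines, assuming gzipped per-run logs (hypothetical names again, and the --hash flag is from my memory of petit's options, so check petit --help on your install):

```
# Feed the compressed stderr logs to petit without unpacking them on disk;
# --hash (if I remember right) prints each distinct line once with its count.
zcat logs/run-*.stderr.gz | petit --hash
```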
Or
It sounds like Splunk would be a good answer to many of your requirements, if not all. It's quite easy to get it running for an evaluation. If you've already considered it, perhaps comment on why it doesn't suit your needs.
Cheers