I'm trying to figure out how other people implement their log management systems.
I have 20-30 Linux servers and a few Windows boxes (most of them virtualized). We use a lot of Perl and Bash scripts for most of our automated jobs, and I'm trying to standardize their logging.
I've been looking at log4perl and log4sh for logging from scripts, and syslog-ng to get all the logs onto a centralized logging server. I've also read up on Splunk, though it sounds like the enterprise edition is pretty pricey and I might go over the free license limit with all my servers.
I've seen other tools like swatch and logcheck, but I'm not quite sure how all these pieces fit together... Any recommendations would be greatly appreciated!
I've got about 30 servers, and I just use straight up syslog to send all the logs to a single logging server. For backup, all of the machines are also configured to store their own logs locally for a few days, using logrotate to take care of the rotation and deletion of old logs.
Each of my application servers runs a small Perl script to send its logs to syslog, which then forwards them on to the loghost (Perl script below).
Then on the loghost we have some custom scripts that are similar to logcheck that basically watch the incoming logs for anything suspicious.
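To give an idea of what a logcheck-style watcher does, here is a minimal Python sketch. It is not the poster's actual script: the pattern lists and the names `classify` and `report` are made up for illustration. The idea is to flag known-bad lines, drop known-harmless ones, and report whatever is left over as unknown.

```python
import re

# Hypothetical patterns; a real deployment would tune these lists over time.
IGNORE = [re.compile(p) for p in (
    r"CRON\[\d+\]: \(root\) CMD",
    r"session (opened|closed) for user",
)]
ALERT = [re.compile(p, re.IGNORECASE) for p in (
    r"failed password",
    r"segfault",
    r"out of memory",
)]


def classify(line):
    """Return 'alert', 'ignore', or 'unknown' for one log line."""
    # Alerts are checked first so an ignore rule can never hide one.
    if any(p.search(line) for p in ALERT):
        return "alert"
    if any(p.search(line) for p in IGNORE):
        return "ignore"
    return "unknown"


def report(lines):
    """Collect the lines worth a human look: alerts and unknown lines."""
    buckets = {"alert": [], "unknown": []}
    for line in lines:
        kind = classify(line)
        if kind in buckets:
            buckets[kind].append(line.rstrip("\n"))
    return buckets
```

You would feed `report()` the new lines from the combined log file and mail out the result whenever either bucket is non-empty.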
We also have all of the email from every host going to one place, so that if any program complains that way, we get all the messages. This could theoretically go to a single mailbox that a program could act on and analyze.
Here is my logging Perl script. You pipe the program's output into it; it sends each line to syslog and echoes it back out so you can send it elsewhere (I send it to multilog). With the -q option it only logs to syslog.
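For readers who want something to start from, here is a rough Python sketch of the same pipe-through idea. It is not the Perl script described above; the function name `relay`, the `-t` tag option, its default of "app", and the daemon facility are all assumptions made for the example. Only the `-q` behaviour comes from the description.

```python
import argparse
import sys
import syslog


def relay(lines, log=syslog.syslog, echo=None):
    """Send each input line to `log`; optionally copy it to `echo` unchanged.

    Returns the number of lines read.
    """
    count = 0
    for line in lines:
        text = line.rstrip("\n")
        if text:                      # do not send blank lines to syslog
            log(text)
        if echo is not None:
            echo.write(line)          # pass the line through untouched
        count += 1
    return count


def main(argv=None):
    parser = argparse.ArgumentParser(description="Copy stdin to syslog.")
    parser.add_argument("-q", action="store_true",
                        help="log to syslog only, do not echo to stdout")
    parser.add_argument("-t", default="app",
                        help="syslog tag (hypothetical option)")
    args = parser.parse_args(argv)

    # LOG_DAEMON is an arbitrary choice here; use whatever facility
    # your loghost filters on.
    syslog.openlog(args.t, syslog.LOG_PID, syslog.LOG_DAEMON)
    relay(sys.stdin, echo=None if args.q else sys.stdout)
    syslog.closelog()
```

To use it as a filter, call `main()` at the bottom of the file and pipe into it, for example `myprogram 2>&1 | python3 logpipe.py -q` (the file name is just a placeholder).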
Although I haven't implemented it yet, I'm planning on moving all of my log-generating machines to rsyslog, and implementing a bastion-type server which will function as the collector of syslogs. From there, I think the free version of Splunk can do everything I need to pull out information.
Now just to implement it...
I use a central syslog host. Each edge system sends *.debug to the central loghost. The central syslog host runs syslog-ng, and has rules to split logs so that each machine generates its own files named for that day. It also dumps everything into a single file, against which I run a descendant of logcheck.sh.
Once a day I run a log compacter, which zips up any logs older than 7 days, and deletes anything older than 28 days. Between the two, it gives logs an expected life of 35 days on the server, which means that all logs should make it to monthly backups, where they can be recovered for up to two years.
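A compactor along those lines could look roughly like the Python sketch below. This is an illustration, not the script actually in use: the function name `compact` and its parameters are invented, and it assumes the 28-day clock starts when a file is compressed (the new .gz gets a fresh modification time), which is one way to arrive at 7 + 28 = 35 days.

```python
import gzip
import os
import shutil
import time

DAY = 86400  # seconds


def compact(log_dir, now=None, zip_after_days=7, delete_after_days=28):
    """Gzip plain logs older than zip_after_days and delete .gz files older
    than delete_after_days. Ages are measured from each file's mtime."""
    now = time.time() if now is None else now
    zipped, deleted = [], []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        age_days = (now - os.path.getmtime(path)) / DAY
        if name.endswith(".gz"):
            if age_days > delete_after_days:
                os.remove(path)
                deleted.append(name)
        elif age_days > zip_after_days:
            # The new .gz gets a fresh mtime, so its 28 days start now.
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)
            zipped.append(name)
    return zipped, deleted
```

Run it once a day from cron against the directory holding the per-host files. If your syslog daemon might still be writing to a file that is more than 7 days old, skip that file or make the daemon reopen its logs first.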
It's storage-intense, but seems to be the best way to assure coverage.
For centralized logging, I would highly recommend LogZilla. We've been using it for over a year now and absolutely love it. The UI is extremely easy to learn and use and installation took me about an hour.
Even if you don't, you really should try to get away from script-based monitoring, as that's exactly what you get: monitoring. What you should try to achieve is management. Repairing problems on the top talkers, etc. will greatly reduce the number of "fires" triggered by script-based monitoring. Here's a very good article on syslog management:
http://www.cisco.com/en/US/technologies/collateral/tk869/tk769/white_paper_c11-557812.html
We use an appliance from LogLogic for our enterprise logging. It's based on syslog, so all *nix boxes have no problem using it; there is a small app that needs to be installed on Windows servers. I can search on anything I want, including regex queries, and it seems to be able to handle quite a bit of load (our Active Directory setup alone generates a mind-boggling amount of traffic).
For the centralized logging server, you can take a look at my Octopussy project.
It's a lot of work at the beginning, but afterwards you can do a lot of things with these logs!
Here is a tutorial that I wrote that covers all of the aspects of centralized logging and analysis.
Link: http://crunchtools.com/centralizing-log-files/