We have several applications which generate their own plain-text log files, and I would like to forward those logs to a remote syslog server for centralized logging. I do not have root access on these machines, nor can I reconfigure syslog to redirect its output to a remote machine.
I've found some solutions online, but they're mostly people's homemade bash scripts, and I'm looking for something more robust, suitable for a potentially high-volume production environment. Preferably something designed with an eye toward a small footprint: a background daemon that keeps running, that can keep up with a lot of lines, etc. What solutions are currently available?
You've already rejected "other people's bash scripts", but this is a pretty common solution: some creative use of the logger command can follow a file and send its contents elsewhere. I personally wouldn't do this in a production environment, though.
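For example, assuming a logger build that supports remote logging (util-linux 2.20 or later; the host name, tag, and file path here are placeholders):

    tail -F /var/log/myapp/app.log | logger -n loghost.example.com -P 514 -t myapp -p local0.info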
A better option, which requires less scripting hackery, is to use rsyslogd and its text file input module (imfile), as yoonix mentioned. This is a pretty decent solution, though there is some potential for lost lines during a file rotation; and if you're on a Linux system with rsyslog as your syslog daemon, there's not much additional work required.
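A minimal sketch of an imfile setup, using the legacy configuration syntax (the file path, tag, and destination are placeholders):

    $ModLoad imfile
    $InputFileName /var/log/myapp/app.log
    $InputFileTag myapp:
    $InputFileStateFile stat-myapp
    $InputFileSeverity info
    $InputFileFacility local0
    $InputRunFileMonitor
    # forward the resulting messages to the central server (@@ = TCP, @ = UDP)
    local0.* @@loghost.example.com:514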
syslog-ng also supports a file input source with functionality similar to rsyslog's.
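The syslog-ng equivalent is a file() source paired with a network destination; a sketch, again with placeholder names:

    source s_appfile {
      file("/var/log/myapp/app.log" follow-freq(1) flags(no-parse));
    };
    destination d_remote {
      udp("loghost.example.com" port(514));
    };
    log { source(s_appfile); destination(d_remote); };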
IMHO the best solution - albeit one which requires modifying the application generating these logs - is to log to syslog directly. You don't want to be going through intermediary steps, files, etc. -- syslog is the SYStem LOGger, and things that write logs on a Unix platform should be sending them to syslog. Implementation of this is, unfortunately, left as an exercise for the reader (and application developer), and may not be possible if your developers are nonexistent, lazy, or incompetent....
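To give a flavor of what that looks like, here is a minimal sketch in C using the standard syslog(3) API (the ident string, facility, and message are placeholders):

    #include <syslog.h>

    int main(void)
    {
        /* open a connection to the system logger */
        openlog("myapp", LOG_PID, LOG_LOCAL0);

        /* each line you would have written to the log file becomes one call */
        syslog(LOG_INFO, "request handled in %d ms", 42);

        closelog();
        return 0;
    }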
You could use logstash with the file input and syslog output.
For example, create a configuration with the file (or files) you want to monitor and your syslog server info.
file-to-syslog.conf:
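A minimal sketch (the monitored path and syslog server details are placeholders):

    input {
      file {
        path => "/var/log/myapp/*.log"
      }
    }
    output {
      syslog {
        host => "loghost.example.com"
        port => 514
        protocol => "udp"
        facility => "local0"
        severity => "informational"
      }
    }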
Then start up logstash with something like (the path will vary by install):
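    bin/logstash -f file-to-syslog.conf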
I hacked together tail.c and logger.c into a single, small-footprint compiled program (binary) that is lightweight, fast, and stable. As long as it has read access to the log file(s), it works without needing root privilege.

I also made a couple of improvements to the native logger and added a new (optional) capability of inserting a text string at the beginning of each log line before it gets sent to the log server. The result is a program that can be run by itself, without needing shell pipes (i.e., no need for tail logfile | logger). It will run forever until explicitly killed or until it encounters an error writing to the network socket. It even continues to run if the log file is rotated or disappears entirely (it will just keep checking to see if the file reappears).

It's easy to use: just give it one or more log files to monitor, and each time a new line gets written to a file, it will send a copy of that line to the local or remote syslog server you specify. Plus the extra text string, if you use that option.
I actually finished the program back in December, but was waiting for Yahoo to take copyright and make it available, which they've now done. (I wrote it as part of my job at Yahoo).
filelogger program information and download link:
There are a number of ways to tackle this. But the very, very first thing you should do is: forward the logs using syslog itself.
Syslog (and many replacements for syslog) have built-in facilities to forward logging to another syslog server at a different address. You can easily do so by changing the configuration file and appending the address to forward the facility to. For instance, adding this line to /etc/rsyslog.conf:
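    *.* @192.168.1.1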
...would forward all facilities to the machine at 192.168.1.1, which (hopefully) has the service running. (A single @ forwards via UDP; use @@ for TCP.) The example I give here is for rsyslog, which is the stock syslog server on Debian, although it should work for many others. Consult the documentation for your implementation of syslog with man syslog and see what it says about "forwarding".

The remote syslog server can be anything you like. There are even products, like Splunk, which happily aggregate these logs into a single view with a web dashboard, search, event-driven notifications, etc. You can see more here: http://www.splunk.com/ If that doesn't meet your needs, you can use something else. There are even syslog servers that will dump to a SQL database!
Sure, you could write your own script/program/service to do this for you, but why re-invent the wheel when it's both done for you and already given to you?
Edit: So I went back and re-read the question, and noticed several comments. It sounds like:

1. you don't have root access on these machines;
2. the applications dump their logs as plain text files;
3. you can't reconfigure the local syslog daemon.

So let's address each one in sequence:
1. We don't need root to set up logging; we only need access to the syslog API. root is not a requirement for writing to syslog; if it were, then all of those services that drop privileges would be unable to write diagnostics to the log files.

2. Re: text dumps, this is normal. However, you should be able to use a subshell to pipe the output of STDERR and STDOUT to a program that calls the syslog API (see the example after this list). This isn't rocket science, it's far from being brittle, and it's well documented. In fact, it's one of the reasons that output redirection even exists. A simple command that could be tossed into a single shell script would be:

       ( my-application 2>&1 | my-syslog-shunt ) &
3. If you have the ability to alter your application's source code, you should write a shunt into it to dump the text output to syslog instead of a plain text file. This shouldn't be too hard: all you do is take the lines you would have written and wrap them with a call to the syslog API. However....

4. You might not have access to the source code at all, so you can't do this. In that case, something like #2 above would work fine.
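If you don't want to write your own shunt for #2, the stock logger utility can fill that role, since it logs each line it reads from standard input when no message argument is given (the tag and priority here are just examples):

    ( my-application 2>&1 | logger -t my-application -p local0.info ) &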
I'm answering my own question.
swatch might have worked, but I was unable to get Perl's Sys::Syslog module to work on the host, and the /usr/bin/logger installed on the host does not support logging to a remote server (util-linux-ng-2.17.2).
So the first thing I did was download the source code for util-linux-2.20.1, whose logger program does support remote logging. Upon testing, it became apparent that there is a limit on the number of characters allowed per log line. Digging into the source code, I found a hard-coded 400-character limit. (If you don't believe me, run "strings /usr/bin/logger | grep 400" on any Linux system.)
This limit is not acceptable for Apache-style logging (including nodejs), so I modified the code and increased the limit to 4096. While I was at it, I also added a new command-line option which allows one to insert an optional text string at the beginning of each log line. I did this because the nodejs logs do not include the hostname as one might see in Apache logs.
At this point, I could run a shell script with "tail -F -n 0 [logfile] | ./modified_logger ...." and it worked. But I had some concerns about running this under supervise (daemontools) or even in the background, because if either side of the pipe terminates, there is a risk the whole pipe will terminate. I also had concerns (albeit untested) about performance.
So I decided to combine the tail functionality with the logger functionality in a single executable binary, bypassing the need for Unix pipes or external programs. I did this by hacking tail.c from GNU coreutils and incorporating what I needed into the modified logger program.
The result is a new binary (117 KB) which I'm calling "filelogger", and which continuously monitors one or more files, logging each new line to a local or remote syslog server via either UDP or TCP. It works like a charm. I was able to do a little benchmarking: it logged about 17,000 lines (1.8 MB) in about 3 seconds, across subnets with a VLAN and a couple of physical switches in between, to a remote server running syslog-ng.
To run the program, you do something like the following (in the foreground, in the background, or supervised with daemontools):

    ./filelogger -t 'access' -d -p local1.info -n [remote loghost] -u /tmp/ignored -a $(hostname) /tmp/myfile1 /tmp/myfile2 ...
/tmp/myfile1 and /tmp/myfile2 are the files being monitored.
The "-a" is the new option I added. In this case I insert the local hostname at the beginning of each log line.
This solution was exactly the type of solution I was looking for when I asked the question and, as it turned out, one that did not exist until I made it myself. :)