I'd like to parse logfiles. Is the logfile format of syslogd the same for all systems? On my system (Debian Lenny), it's:
Mar 7 04:22:40 my-host-name ...
(I'm not much interested in the ... part)
Can I rely on this? And is there maybe some more-or-less official description? The manpage of syslogd describes the config format, but not the logfile format.
Ideally, the description would give the fields official names like (date, time, host, entry) or (datetime, hostname, message), and maybe some regular expressions as well. I'd like to use the names and regexes in my script to avoid unnecessary deviation from the standard and to make sure that the script runs everywhere.
Thanks
Chris
RFC 3164, which Warner pointed you to, describes the network format for UDP syslog messages. You can rely on this being what goes over the wire, but syslogd may write something slightly different to disk when it logs your messages.
That said, you can usually rely on syslog entries resembling what's described in the RFC, roughly in the form:
Date is of the form Jan 1 00:00:01
Hostname is usually the short hostname, but may be fully qualified (particularly if you're logging a message from a remote host)
Tag is freeform, but by convention doesn't contain ":". It is often of the form procname[PID], and I believe it is always followed by a literal ":"
Message is freeform
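As a rough illustration of the layout described above, here is a minimal Python sketch of a regular expression for one on-disk line. The group names (timestamp, hostname, tag, pid, message) and the function name parse_line are my own choice, not official names from the RFC, and the pattern assumes a traditional syslogd writing the classic timestamp. The tag is treated as optional, since lines such as "last message repeated N times" carry none.

```python
import re

# Hypothetical field names; RFC 3164 does not name the on-disk fields.
LINE_RE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "  # e.g. "Mar  7 04:22:40"
    r"(?P<hostname>\S+) "                                         # short or fully qualified
    r"(?:(?P<tag>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?: ?)?"            # optional "procname[PID]: "
    r"(?P<message>.*)$"                                           # freeform remainder
)


def parse_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = "Mar  7 04:22:40 my-host-name sshd[1234]: Accepted password for chris"
    print(parse_line(sample))
```

Lines that do not fit the pattern return None rather than raising, so a script can count or log them instead of silently misparsing a daemon that writes a different timestamp style.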
If you need a better guarantee of consistency in your log format, syslog-NG is worth looking into -- it will let you define your fields and insert markers to ensure you can parse the resulting files. syslog-NG also lets you include metadata like the facility+priority values from the message. Using syslog-NG reduces the definition of "everywhere" to "machines running syslog-NG with a configuration similar to yours", though.
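For context on the facility and priority values: in the RFC 3164 wire format they travel as a single PRI number in angle brackets at the start of the packet, computed as facility * 8 + severity, and a plain syslogd normally drops it when writing to disk. A minimal Python sketch of decoding it follows, assuming you have the raw packet text; the function name split_pri is hypothetical.

```python
import re

# PRI is one to three digits in angle brackets at the very start of the packet.
PRI_RE = re.compile(r"^<(?P<pri>\d{1,3})>")


def split_pri(packet):
    """Split a wire-format message into (facility, severity, rest).

    Returns (None, None, packet) when no PRI prefix is present.
    """
    match = PRI_RE.match(packet)
    if match is None:
        return None, None, packet
    # PRI = facility * 8 + severity, so divmod recovers both parts.
    facility, severity = divmod(int(match.group("pri")), 8)
    return facility, severity, packet[match.end():]


if __name__ == "__main__":
    # Example packet taken from RFC 3164; PRI 34 is facility 4, severity 2.
    wire = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
    print(split_pri(wire))
```

The remainder after the PRI has the same date, hostname, tag and message layout as the on-disk lines, so it can be handed to the same line parser.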
The RFC should answer this question. To my knowledge: yes, that's usually the case.
The devil is in the RFC that @warner linked:
4.1.3 MSG Part of a syslog Packet
This essentially says the developer can stick whatever they want into CONTENT, so there really is no standard for the actual contents of messages, just for the organization of messages. I might say that this is a flaw but I'm not sure yet.