How're you analysing log files from UNIX/Linux machines? We run several hundred servers which all generate their own log files, either directly or through syslog. I'm looking for a decent solution to aggregate these and pick out important events. This problem breaks down into 3 components:
1) Message transport
The classic way is to use syslog to log messages to a remote host. This works fine for applications that log into syslog, but it's less useful for apps that write to a local file. Solutions for this might include having the application log into a FIFO connected to a program that sends the messages via syslog, or writing something that will grep the local files and send the output to the central syslog host. However, if we go to the trouble of writing tools to get messages into syslog, would we be better off replacing the whole lot with something like Facebook's Scribe, which offers more flexibility and reliability than syslog?
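If we do end up writing our own glue, the grep-and-forward side is only a few lines. Here's a rough Python sketch of a tail-and-forward script; the central host, port, facility and file path are placeholders for illustration, not anything we actually run:

    import logging
    import logging.handlers
    import time

    # Hypothetical central syslog host and local file; adjust for your environment.
    CENTRAL_SYSLOG = ("loghost.example.com", 514)
    LOCAL_LOG = "/var/log/myapp/app.log"

    # SysLogHandler speaks the syslog protocol (UDP by default).
    handler = logging.handlers.SysLogHandler(
        address=CENTRAL_SYSLOG,
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0)
    logger = logging.getLogger("myapp-forwarder")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    def follow(path):
        """Yield lines appended to a file, like 'tail -f' (no rotation handling)."""
        with open(path) as f:
            f.seek(0, 2)              # start at the end of the file
            while True:
                line = f.readline()
                if not line:
                    time.sleep(1)
                    continue
                yield line.rstrip("\n")

    for line in follow(LOCAL_LOG):
        logger.info(line)             # forward each new line to the central host

It's only a sketch (no rotation handling, no buffering if the network drops), which is exactly why something purpose-built like Scribe starts to look attractive.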
2) Message aggregation
Log entries seem to fall into one of two types: per-host and per-service. Per-host messages are those which occur on one machine; think disk failures or suspicious logins. Per-service messages occur on most or all of the hosts running a service. For instance, we want to know when Apache finds an SSI error but we don't want the same error from 100 machines. In all cases we only want to see one of each type of message: we don't want 10 messages saying the same disk has failed, and we don't want a message each time a broken SSI is hit.
One approach to solving this is to aggregate multiple messages of the same type into one on each host, send the messages to a central server and then aggregate messages of the same kind into one overall event. SER can do this but it's awkward to use. Even after a couple of days of fiddling I had only rudimentary aggregations working and had to constantly look up the logic SER uses to correlate events. It's powerful but tricky stuff: I need something which my colleagues can pick up and use in the shortest possible time. SER rules don't meet that requirement.
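The per-host half of that scheme doesn't have to be complicated. As a very rough Python sketch (the patterns, window length and emit callback are made up for illustration), something like this collapses repeated messages of one type into a single summary per time window; the central server would then repeat the same trick across hosts, keyed on message type alone:

    import re
    import sys
    import time
    from collections import Counter

    # A few illustrative "message type" patterns; real rules would be site-specific.
    PATTERNS = [
        ("disk-failure", re.compile(r"I/O error.*(sd[a-z])")),
        ("ssi-error",    re.compile(r"unable to include.*in parsed file")),
    ]

    WINDOW = 300          # seconds to collect before emitting one summary per type
    counts = Counter()
    window_start = time.time()

    def classify(line):
        """Map a raw log line to a message type, or None if uninteresting."""
        for name, pattern in PATTERNS:
            if pattern.search(line):
                return name
        return None

    def handle(line, emit):
        """Count each interesting line; flush one summary per type per window."""
        global window_start
        msg_type = classify(line)
        if msg_type:
            counts[msg_type] += 1
        if time.time() - window_start >= WINDOW:
            for msg_type, n in counts.items():
                emit("%s: %d occurrence(s) in last %ds" % (msg_type, n, WINDOW))
            counts.clear()
            window_start = time.time()

    if __name__ == "__main__":
        # e.g. feed it log lines on stdin and just print the summaries for now
        for raw in sys.stdin:
            handle(raw, emit=print)

Writing this ourselves is easy enough; the hard part, as with SER, is a rule format simple enough that colleagues can maintain it.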
3) Generating alerts
How do we tell our admins when something interesting happens? Mail the group inbox? Inject into Nagios?
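If we go the Nagios route, one low-effort option is passive service checks: write a PROCESS_SERVICE_CHECK_RESULT line into the Nagios external command file. A minimal Python sketch, assuming the default command-file path and made-up host/service names:

    import time

    # Default Nagios external command file (a named pipe); the path varies between installs.
    NAGIOS_CMD = "/usr/local/nagios/var/rw/nagios.cmd"

    def submit_passive_result(host, service, code, output):
        """Inject a passive service check result into Nagios.

        code: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
        """
        line = "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
            int(time.time()), host, service, code, output)
        with open(NAGIOS_CMD, "w") as cmd_file:
            cmd_file.write(line)

    # e.g. raise a warning for the aggregated SSI errors from the previous step
    submit_passive_result("web01", "log-events", 1,
                          "ssi-error: 42 occurrence(s) in last 300s")

That keeps alerting, escalation and on-call routing in the tool the admins already watch, rather than yet another mailbox to ignore.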
So, how're you solving this problem? I don't expect an answer on a plate; I can work out the details myself but some high-level discussion on what is surely a common problem would be great. At the moment we're using a mishmash of cron jobs, syslog and who knows what else to find events. This isn't extensible, maintainable or flexible and as such we miss a lot of stuff we shouldn't.
Updated: we're already using Nagios for monitoring, which is great for detecting down hosts, testing services, etc., but less useful for scraping log files. I know there are log plugins for Nagios, but I'm interested in something more scalable and hierarchical than per-host alerts.
I've used three different systems for centralizing logs:
For #3 (Splunk), I typically use syslog-ng to forward the messages from each host directly into Splunk. It can also parse log files directly, but that can be a bit of a pain.
Splunk is pretty awesome for searching and categorizing your logs. I haven't used Splunk for log alerting, but I think it's possible.
You can take a look at OSSEC, a complete open-source HIDS. It does log analysis and can trigger actions or send mail on alerts. Alerts are triggered by a set of simple XML-based rules; many pre-defined rules for various log formats are included, and you can add your own.
http://www.ossec.net/
Take a look at Octopussy. It's fully customizable and seems to meet all your needs...
PS: I'm the developer of this solution.
You need to look into a monitoring system, for example Zenoss Core. Among other things, it says on the intro page:
See what-tool-do-you-use-to-monitor-your-servers.