We have been using AWStats for some time now to parse our Apache server logs into a format for the billing department.
A set of custom Python scripts is used to generate the merged logs from those passed up by each of the servers in the hosting cluster/farm.
The issue I am currently facing is that our logs have grown considerably for certain projects, some generating ~30GB/day of uncompressed logs. AWStats is not the most memory-efficient of parsers and will use upwards of 1GB of memory to process these logs (by comparison, a Python script + regex of mine will run within 450KB of memory).
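For reference, a minimal sketch of that kind of streaming approach (not my actual script): it reads the log line by line and keeps only running totals, so memory stays roughly flat regardless of file size. The regex assumes the Apache combined log format and is illustrative only.

```python
import re

# Apache combined log format is assumed here; adjust the regex to your LogFormat
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def tally(path):
    """Stream the file line by line; only the running totals and the
    set of unique hosts are kept in memory."""
    total_bytes, hosts = 0, set()
    with open(path) as f:
        for line in f:
            m = LOG_RE.match(line)
            if not m:
                continue  # skip malformed lines
            hosts.add(m.group('host'))
            if m.group('bytes') != '-':
                total_bytes += int(m.group('bytes'))
    return total_bytes, len(hosts)
```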
What I need is a replacement for AWStats that can handle large log files on a regular basis and produce a "billing friendly" output.
Stats should include bandwidth, unique visitors, visits per unique visitor, pages served, etc.
Ideally this should also allow us to import the historical AWStats data (which is currently in text files).
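For anyone attempting the same import: AWStats stores its data as plain-text files organized into BEGIN_XXX/END_XXX sections, so a rough first pass could just split the file by section. The layout assumed below is based on those data files; the meaning of each column would still have to be mapped per section.

```python
def read_awstats_sections(path):
    """Split an AWStats plain-text data file into its BEGIN_X/END_X sections.
    The section layout is an assumption; column semantics are not interpreted."""
    sections, name, rows = {}, None, []
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if line.startswith('BEGIN_'):
                # e.g. "BEGIN_DAY 31" -> section name "DAY"
                name, rows = line.split(' ')[0][len('BEGIN_'):], []
            elif line.startswith('END_'):
                if name is not None:
                    sections[name] = rows
                name = None
            elif name is not None:
                rows.append(line.split(' '))
    return sections
```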
So, in summary, my question is: is there any software available to do this?
As this has not been answered in over a year, I thought I would post an update on my plans.
I'll be leveraging Python's multiprocessing module to provide distributed processing of the logs, using a custom map + reduce methodology; a sketch of the approach is below.
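Roughly what I have in mind, using multiprocessing.Pool: each worker maps one log file (or chunk) down to partial, mergeable stats, and a reduce step folds them into cluster-wide totals. The regex, file names and the exact stats counted are placeholders, not the final design.

```python
import re
from collections import Counter
from multiprocessing import Pool

# Apache combined-style log line; adjust to your LogFormat (this regex is an assumption)
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<page>\S+)[^"]*" \d{3} (?P<bytes>\d+|-)'
)

def map_chunk(path):
    """Map step: boil one log file down to partial, mergeable stats."""
    stats = {'bytes': 0, 'hits': Counter(), 'hosts': set()}
    with open(path) as f:
        for line in f:
            m = LOG_RE.match(line)
            if not m:
                continue  # skip malformed lines
            stats['hosts'].add(m.group('host'))
            stats['hits'][m.group('page')] += 1
            if m.group('bytes') != '-':
                stats['bytes'] += int(m.group('bytes'))
    return stats

def reduce_stats(results):
    """Reduce step: merge the per-file stats into overall totals."""
    total = {'bytes': 0, 'hits': Counter(), 'hosts': set()}
    for r in results:
        total['bytes'] += r['bytes']
        total['hits'] += r['hits']
        total['hosts'] |= r['hosts']
    return total

if __name__ == '__main__':
    paths = ['web1.log', 'web2.log', 'web3.log']  # hypothetical per-server logs
    with Pool() as pool:
        totals = reduce_stats(pool.map(map_chunk, paths))
    print('bandwidth:', totals['bytes'], 'uniques:', len(totals['hosts']))
```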
If you find this question and do not want to "roll your own", there are a few Hadoop projects around that may help (I suggest looking at Pig).