I'm trying to find a way to parse our Amazon S3 access logs to get some webstats.
I've been trying to use AWStats 7, but once I get past day 9 of a given month it can't process any more logs because it runs out of memory. This server has 4 GB of memory.
Our S3 logs are rather big (~1 GB/day), and soon our CloudFront logs could be 10-20 GB/day.
Is there any software that can generate web stats from S3 (and soon CloudFront) logs?
I know about s3stat.com but I want something I can run on my own.
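Whatever tool ends up producing the reports, the parsing step itself doesn't have to grow with log volume. Below is a minimal sketch, assuming the standard S3 server access log layout (bracketed timestamp as the third field, Bytes Sent as the twelfth). It reads lines one at a time and keeps only per-day counters, so memory depends on the number of distinct days, not the number of requests. `daily_stats` is a hypothetical helper, not part of AWStats or any AWS tool.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# A field is a [bracketed] timestamp, a "quoted" string, or a bare token.
# Assumption: lines follow the S3 server access log layout, where the timestamp
# is bracketed and Request-URI, Referer and User-Agent are double-quoted.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

TIME_FIELD = 2         # e.g. [06/Feb/2019:00:00:38 +0000]
BYTES_SENT_FIELD = 11  # S3 writes "-" when no body was sent


def daily_stats(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count requests and bytes sent per day, reading one line at a time."""
    hits: Counter = Counter()
    bytes_sent: Counter = Counter()
    for line in lines:
        fields = TOKEN.findall(line)
        if len(fields) <= BYTES_SENT_FIELD:
            continue  # skip truncated or malformed lines
        # "[06/Feb/2019:00:00:38 +0000]" -> "06/Feb/2019"
        day = fields[TIME_FIELD][1:].split(":", 1)[0]
        hits[day] += 1
        if fields[BYTES_SENT_FIELD].isdigit():
            bytes_sent[day] += int(fields[BYTES_SENT_FIELD])
    return hits, bytes_sent
```

To cover a whole directory of downloaded logs without loading them at once, the files can be chained lazily, e.g. `daily_stats(fileinput.input(files=sorted(glob.glob("s3-logs/*"))))`, where `s3-logs/` is a placeholder path.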
I'd suggest GoAccess. We parse about 120 million hits in roughly 35 minutes, which is much faster than AWStats. It also doesn't seem to consume much RAM (< 1 GB); it's running on an 8 GB RAM system.
You should give it a try.
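If your GoAccess build doesn't read S3 access logs directly, one workaround is to rewrite each S3 line into the NCSA Combined Log Format, which GoAccess parses with `--log-format=COMBINED`. This is a sketch under the same field-order assumption as any S3 access log parser; `s3_to_combined` is a hypothetical helper, and the Requester field is replaced with `-` because S3 writes canonical user IDs there rather than user names.

```python
import re
from typing import Optional

# Same tokenizer idea: [bracketed] timestamp, "quoted" string, or bare token.
# Assumption: standard S3 server access log field order.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def s3_to_combined(line: str) -> Optional[str]:
    """Rewrite one S3 access log line as a Combined Log Format line."""
    fields = TOKEN.findall(line)
    if len(fields) < 17:
        return None  # truncated or not an S3 access log line
    remote_ip, timestamp, request_uri = fields[3], fields[2], fields[8]
    status, bytes_sent = fields[9], fields[11]
    referer, user_agent = fields[15], fields[16]  # already double-quoted
    # host ident user [time] "request" status bytes "referer" "user-agent"
    return (f"{remote_ip} - - {timestamp} {request_uri} {status} "
            f"{bytes_sent} {referer} {user_agent}")
```

Write the converted lines to a file and point GoAccess at it, for example `goaccess converted.log --log-format=COMBINED -o report.html`.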
I'd consider running Karmasphere Analyst (KSA) on Amazon EMR (Elastic MapReduce) to run SQL queries against your CloudFront log directory. KSA knows how to query from bucket -> folder -> gzip -> .log.
http://aws.amazon.com/elasticmapreduce/karmasphere/
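Karmasphere itself isn't shown here. As a small local stand-in for the same idea (SQL over the CloudFront log files), the sketch below streams gzipped CloudFront W3C-format logs into SQLite using only the Python standard library. It reads column names from each file's `#Fields:` header rather than hardcoding positions. The table layout, the choice of the `date`, `cs-uri-stem` and `sc-status` columns, and all function names are assumptions for illustration. It won't scale like a Hadoop cluster on EMR, but it shows the shape of the approach.

```python
import gzip
import sqlite3
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str, int]


def parse_lines(lines: Iterable[str]) -> Iterator[Row]:
    """Yield (date, cs-uri-stem, sc-status) from one CloudFront W3C log file."""
    columns: List[str] = []
    for line in lines:
        if line.startswith("#Fields:"):
            # The header names the tab-separated columns that follow
            columns = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not columns:
            continue  # skip #Version and anything before a #Fields header
        row = dict(zip(columns, line.rstrip("\r\n").split("\t")))
        status = row.get("sc-status", "")
        yield (row.get("date", ""), row.get("cs-uri-stem", ""),
               int(status) if status.isdigit() else 0)


def rows_from_files(paths: Iterable[Path]) -> Iterator[Row]:
    """Stream rows from gzipped log files without decompressing them to disk."""
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            yield from parse_lines(fh)


def load(db: sqlite3.Connection, rows: Iterable[Row]) -> None:
    db.execute("CREATE TABLE IF NOT EXISTS hits (day TEXT, uri TEXT, status INTEGER)")
    # executemany iterates the generator rather than building a list first
    db.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)
    db.commit()


def top_pages(db: sqlite3.Connection, limit: int = 10) -> List[Tuple[str, int]]:
    """Most requested URIs, counting only 2xx/3xx responses."""
    return db.execute(
        "SELECT uri, COUNT(*) AS n FROM hits WHERE status BETWEEN 200 AND 399 "
        "GROUP BY uri ORDER BY n DESC, uri LIMIT ?",
        (limit,),
    ).fetchall()
```

Typical use, with `cf-logs/` as a placeholder for a locally synced log directory: open `sqlite3.connect("cloudfront.db")`, call `load(db, rows_from_files(sorted(Path("cf-logs").glob("*.gz"))))`, then run `top_pages(db)` or any other SQL against the `hits` table.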