I am parsing my httpd access logs to determine which of our Google appliance's crawlers are bombarding my web servers, and when. If I type the command:
grep google /path/to/access_log | awk '{print $4, $14}'
I get a very large result set (like I said, they're bombarding me), at least 4 hits per second. I want to group that result set by timestamp and print, on each line, the number of hits per second. So ideally, I'd like something similar to
04/Aug/2011:15:56:16 Crawler1 6
04/Aug/2011:15:56:16 Crawler2 10
04/Aug/2011:15:56:17 Crawler1 8
04/Aug/2011:15:56:18 Crawler1 12
where the first column is the timestamp, the second is the 14th field (the Google crawler's ID), and the third is the count. The order of the columns is irrelevant.
This can be done in a single awk command, counting hits that fall in the same second with arrays, but it is hard to test without a sample input. Let's guess:
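A sketch of that approach, assuming (as in the question's command) that field 4 is the timestamp, typically with a leading `[` in common/combined log format, and field 14 is the crawler ID:

```shell
grep google /path/to/access_log \
  | awk '{
      ts = $4
      sub(/^\[/, "", ts)        # strip a leading bracket from the timestamp, if present
      count[ts " " $14]++       # key: "timestamp crawlerID"
    }
    END {
      for (key in count) print key, count[key]
    }'
```

Note that awk's `for (key in count)` iterates in an unspecified order, so pipe the output through `sort` if you need the lines in chronological order.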