I have an nginx log file, and I want to find out the market share of each major browser version. I am not interested in minor versions or operating systems. I would like to get something like this:
100 IE6
99 IE7
20 IE8
200 FF2
300 FF3
I know how to get the list of user agents from the file, but I want to aggregate the list to see only the major versions of the browsers. Is there a tool that does it?
awk(1) - selecting the full User-Agent string of GET requests
cut(1) - using the first word from it
sort(1) - sorting
uniq(1) - counting
sort(1) - sorting by count, reversed

PS. Of course it can be replaced by one awk/sed/perl/python/etc. script. I just wanted to show how rich the unix way is.

While the one-liner by SaveTheRbtz does the job, it took several hours to parse my nginx access log.

Here is a faster version based on his, which takes less than 1 minute per 100MB of log file (corresponding to about 1 million lines):
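One plausible way to get that kind of speed-up is to let awk do the counting itself, so that sort only sees the short summary instead of every log line. This is a minimal sketch under that assumption, not necessarily the exact command from this answer; the function name ua_first_word_counts and the log path in the usage line are hypothetical.

```shell
# Hypothetical helper: count requests per first User-Agent word.
# Reads a combined-format access log on stdin.
ua_first_word_counts() {
    # Split each line on double quotes. In the combined format the
    # User-Agent is the 6th quote-delimited field. Keep its first word
    # and tally it in an awk array, then sort the summary by count.
    awk -F'"' '{ split($6, w, " "); count[w[1]]++ }
               END { for (ua in count) print count[ua], ua }' |
        sort -rn
}
```

Usage, assuming the log lives at the usual nginx location: ua_first_word_counts < /var/log/nginx/access.log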
It works with the default access log format of nginx, which is the same as the combined format of Apache's httpd and has the User-Agent as the last field, delimited by ".

This is a slight variation of the accepted answer, using fgrep and cut.

There is something appealing about using "weaker" commands when it is possible.
Awstats should do the trick, but will supply far more information. I hope this helps...
Webalizer can do it.
Example:
webalizer -o reports_folder -M 5 log_file
-o reports_folder specifies the folder where the report is generated
-M 5 displays only the browser name and the major version number
log_file specifies the log file name

To get the user agent:
I'd use a shell script for that: a pipeline of cat, awk, sort and uniq will do the job.
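As a rough sketch of such a script, the pipeline below collapses each User-Agent to a browser name plus major version before counting, which gives output in the IE6 / FF3 style asked for in the question. It adds sed to the tools named above, assumes the combined log format, and only recognises the MSIE and Firefox/ tokens; the function name major_browser_counts is hypothetical, and other browsers would need their own patterns.

```shell
# Hypothetical helper: count requests per browser major version.
# Reads a combined-format access log on stdin.
major_browser_counts() {
    # Extract the User-Agent (6th quote-delimited field), then rewrite
    # "MSIE 6.0" to "IE6" and "Firefox/3.0.1" to "FF3". Lines matching
    # neither pattern are dropped because of sed -n.
    awk -F'"' '{ print $6 }' |
        sed -n -E \
            -e 's/.*MSIE ([0-9]+).*/IE\1/p' \
            -e 's/.*Firefox\/([0-9]+).*/FF\1/p' |
        sort | uniq -c | sort -rn
}
```

Usage, with the log path as an assumption: major_browser_counts < /var/log/nginx/access.log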