In bash I can grep some time measurements from a log file like this:
grep "time:" myLogfile.txt | cut -d' ' -f 3 >> timeMeasurements.txt
#timeMeasurements.txt
2.5
3.5
2.0
...
Now I would like to compute the mean of the values in timeMeasurements.txt. What is the quickest way to do that in bash?
I know that there is gnuplot and R, but it seems like one has to write some lengthy script for either one of them.
Obligatory GNU datamash version
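A minimal sketch of the datamash approach, assuming GNU datamash is installed and the file holds one number per line. The wrapper name mean_datamash is hypothetical and only there to make the command reusable; "mean 1" asks datamash for the mean of the first field.

```shell
# Hypothetical helper: mean of the first column of a file, using GNU datamash.
mean_datamash() {
    datamash mean 1 < "$1"
}

# Usage: mean_datamash timeMeasurements.txt
```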
ASIDE: it feels like this really should be possible natively in bc (i.e. without using the shell, or an external program, to loop over input values). The GNU bc implementation includes a read() function; however, it appears to be frustratingly difficult to get it to detect end-of-input. The best I could come up with is a script that you can pipe file input to, provided you terminate the input with any non-numeric character.
You could use awk. Bash itself is not very good at maths...
Notes
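The awk command that the notes below annotate can be sketched as follows. It is reconstructed from the fragments listed in those notes, so the exact layout is an assumption; sample_times.txt is a stand-in file created here so the example runs on its own, and the check for an empty file is an addition.

```shell
# Stand-in for timeMeasurements.txt, using the values from the question.
printf '2.5\n3.5\n2.0\n' > sample_times.txt

# Count the lines, sum the first field, and divide once all input is read.
mean=$(awk 'BEGIN { lines = 0; total = 0 }
            { lines++; total += $1 }
            END { if (lines > 0) print total / lines }' sample_times.txt)

echo "$mean"   # prints 2.66667 (awk's default output precision)
```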
lines=0; total=0: set variables to 0
lines++: increase lines by one for each line
total+=$1: add the value in each line to the running total
print total/lines: when done, divide the total by the number of values
Another way, using sed and bc:
The sed expression converts the input to something like this:
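A sketch of what such a sed stage could look like, assuming GNU sed (for the one-line form of the a command). The variable names n and x, the helper name to_bc and the file sample_times.txt are illustrative assumptions, not necessarily what the original command used.

```shell
# Stand-in for timeMeasurements.txt, using the values from the question.
printf '2.5\n3.5\n2.0\n' > sample_times.txt

# Hypothetical helper: rewrite each number as bc statements that bump a
# counter (n) and a running sum (x), then append the final division.
to_bc() {
    sed -e 's/^/n+=1; x+=/' -e '$a scale=2; x/n' "$1"
}

to_bc sample_times.txt
# Prints:
#   n+=1; x+=2.5
#   n+=1; x+=3.5
#   n+=1; x+=2.0
#   scale=2; x/n
```

Running to_bc sample_times.txt | bc on this data should print 2.66, because bc truncates the quotient to scale digits rather than rounding.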
This is piped to bc, which evaluates it line-by-line.
Adapting the R command from this U&L post:
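A minimal sketch of computing the mean with R from the shell, assuming Rscript is installed. This is an assumed stand-in rather than the command from that post, and the helper name mean_r is hypothetical.

```shell
# Hypothetical helper: read numbers from the file on stdin and print their mean.
mean_r() {
    Rscript -e 'x <- scan(file("stdin"), quiet = TRUE); cat(format(mean(x)), "\n", sep = "")' < "$1"
}

# Usage: mean_r timeMeasurements.txt
```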
You can use bc, the basic calculator, in a while loop with read:
Or more readably:
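A readable version along the lines of the explanation below might look like this. It is a reconstruction from that explanation, wrapped in a hypothetical function mean_with_bc so it can be reused. The counter is incremented in POSIX form, which has the same effect as ((count++)).

```shell
# Hypothetical helper: mean of the numbers in a file, one value per line.
mean_with_bc() {
    count=0
    sum=0
    while read -r num; do
        count=$((count + 1))              # same effect as ((count++))
        sum=$(echo "$sum + $num" | bc)    # bc does the floating point addition
    done < "$1"
    echo "scale=2; $sum / $count" | bc    # scale=2 keeps two decimal places
}

# Usage: mean_with_bc timeMeasurements.txt
```

With the three sample values from the question (2.5, 3.5, 2.0), this should print 2.66, as bc truncates rather than rounds.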
Explanation:
Use while read -r num; do ... ; done < timeMeasurements.txt to loop over the file. This means that we do something for each line of the file.
Increment the count variable with ((count++)).
Use command substitution, $(...), with echo piped to bc to add the value of the num variable for this line of the file to the sum of the num values from all previous lines. bc is used because bash only does integer arithmetic.
At this point the loop ends: the count variable contains the number of time measurement values, and the sum variable contains the sum of the time measurements.
Use echo with our variables to create the mean calculation, which is passed to bc. The scale=2 part tells bc how many digits to keep after the decimal point.
The datamash one seems a good option, but even acknowledging that my answer may be overkill, just in case you want to do a bit more than just a mean, octave is not so verbose:
If you are doing means, remember that the same mean could come from very different behaviours, so the standard deviation is usually also relevant:
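As a rough sketch, assuming GNU Octave is installed and the file holds one number per line, the mean and the sample standard deviation can be printed together. The helper name stats_octave is hypothetical, and the file name is pasted into the Octave code, so it must not contain quote characters.

```shell
# Hypothetical helper: print mean and sample standard deviation of a file.
stats_octave() {
    octave -q --eval "A = load('$1'); printf('mean = %.4f\nstd = %.4f\n', mean(A), std(A));"
}

# Usage: stats_octave timeMeasurements.txt
```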
or even a simple histogram is easy to do:
Also, I think datamash is not in the apt-get repositories for trusty, only for newer versions.
Edit:
Oneliner, for more script-friendly usages:
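One possible one-liner, assuming GNU Octave is installed; it prints only the number so that a script can capture it. The function name octave_mean is a hypothetical wrapper, and the file name must not contain quote characters.

```shell
# Hypothetical helper: print just the mean, suitable for command substitution.
octave_mean() {
    octave -q --eval "printf('%.4f\n', mean(load('$1')));"
}

# Usage in a script: avg=$(octave_mean timeMeasurements.txt)
```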