I'm looking for a handy program or script that I can pipe data into via stdin and that reports some basic statistics on the input. For instance, given a set of values separated by newlines, I would like to get:
- average for all values
- average of the data excluding the 5% smallest and 5% largest values
- standard deviation
Yes, I know it can be done with bash or awk, but maybe you already know something handy?
PS: I'm perfectly aware of the 'big cannons' like Octave, R and some others, but I need something much simpler.
Thanks.
You could try something along the lines of:
to get the average. And you could install the Statistics::Descriptive package http://search.cpan.org/~colink/Statistics-Descriptive-2.6/Descriptive.pm
to do what you need for the other requirements. Standard deviation is probably easy; the other one would take a few more lines to sort and filter. (No doubt it's possible to do in a single line... ;-)
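For comparison, here is a minimal Python sketch of the same idea. It is not the Perl approach above; it swaps in Python's standard `statistics` module and assumes one numeric value per line on stdin. The function names `parse_values` and `mean_and_stdev` are made up for illustration.

```python
import statistics
import sys


def parse_values(lines):
    """Convert an iterable of text lines to floats, skipping blank lines."""
    return [float(line) for line in lines if line.strip()]


def mean_and_stdev(values):
    """Return (mean, sample standard deviation) for a non-empty list of numbers."""
    mean = statistics.mean(values)
    # statistics.stdev needs at least two values; report 0.0 for a single value.
    stdev = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, stdev


if __name__ == "__main__":
    values = parse_values(sys.stdin)
    if values:  # print nothing for empty input
        mean, stdev = mean_and_stdev(values)
        print(f"mean={mean:g} stdev={stdev:g}")
```

Saved under a hypothetical name such as `stats.py`, it could be used as `some_command | python stats.py`. Note that `statistics.stdev` is the sample standard deviation (n - 1 denominator); `statistics.pstdev` would give the population version.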
This little AWK snippet will do part of what you're looking for:
The drop 5% part would be a little more complicated and depend on exactly how you mean it.
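One possible reading of the "drop 5%" part, sketched in Python rather than AWK: sort the values, discard the same number of items from each end, and average the rest. The rounding rule (round down, so fewer than 20 values means nothing is dropped) is an assumption, and `trimmed_mean` is a hypothetical helper name.

```python
import sys


def trimmed_mean(values, fraction=0.05):
    """Mean after dropping the smallest and largest `fraction` of the values.

    Assumption: the count dropped from each end is int(len(values) * fraction),
    i.e. rounded down. `values` must be non-empty.
    """
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # items to drop from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    data = [float(line) for line in sys.stdin if line.strip()]
    if data:  # print nothing for empty input
        print(f"trimmed mean={trimmed_mean(data):g}")
```

Other definitions are just as defensible, for example rounding the cut to the nearest item or weighting the boundary values, which is why the exact meaning matters.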
I know you're looking for something canned, but short of using R, Octave, SAS or SPSS, I don't know of anything.
Edit: Corrected formula
R might be exactly what you are looking for, or it may be total overkill for your purpose. Hard to tell from your question.
Anyway, check it out: http://en.wikipedia.org/wiki/R_(programming_language)
The first and last items are do-able (I've done them a couple of times) without maintaining the entire set of data in memory and without knowing the total number of items in advance. The middle item (dropping the outliers) is more challenging and requires maintaining the entire list in RAM or at least knowing the total number of items in advance.
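As a rough illustration of the first and last items, here is a Python sketch using Welford's online algorithm, which updates the mean and variance one value at a time. This is one well-known way to do it, not necessarily the method used above; the class name `RunningStats` is made up for the example.

```python
import math
import sys


class RunningStats:
    """Running mean and standard deviation without storing the data."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, x):
        """Fold one value into the running statistics (Welford update)."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    def stdev(self):
        """Sample standard deviation (n - 1); 0.0 with fewer than two values."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


if __name__ == "__main__":
    stats = RunningStats()
    for line in sys.stdin:
        if line.strip():
            stats.add(float(line))
    if stats.count:  # print nothing for empty input
        print(f"n={stats.count} mean={stats.mean:g} stdev={stats.stdev():g}")
```

Memory use stays constant regardless of input size, which is exactly what the trimmed average cannot offer without a second pass or the full list.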
I don't know of any simple pre-built tools to do any of those (though Octave and R sound as though they might be such).