How can I find the 5 lines with the lowest value in the third column of a .csv file, given that 1848 is in either the first or the second column?

I have a file that consists of lines like the ones below (other numbers are included as well). This is part of the output of:
$ grep 1848 filename.csv
1848,2598,11.310694021273559
1848,2599,10.947275955606203
1848,2600,10.635270124233982
1848,2601,11.916564552040725
1848,2602,12.119810736845844
1848,2603,12.406661156256154
1848,2604,10.636275056472996
1848,2605,12.549890992708612
1848,2606,9.783802450936204
1848,2607,11.253697489670264
1848,2608,12.16385432290674
1848,2609,10.30355814063016
1848,2610,12.102525596913923
1848,2611,11.636595992818505
1848,2612,10.741178028606866
1848,2613,11.352414275107423
1848,2614,12.204860161717253
1848,2615,12.959915468475387
1848,2616,11.320652192610872
Unfortunately, 1848 sometimes appears in the third column as well, and those matches need to be ignored:
6687,8963,9.241848677632822
6687,9111,10.537325656184889
6687,9506,11.315629894841848
With GNU sort:

(If the first column may have fewer or more than exactly 4 digits, replace `{4}` with `+`.)
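A pipeline of that shape might look like the sketch below. It assumes a `grep -E` filter in which `[0-9]{4}` matches the first column (the `{4}` mentioned in the note), followed by GNU `sort` on the third field; the wrapper name `lowest5_sort` is hypothetical.

```shell
# lowest5_sort FILE (hypothetical name): print the 5 lines of FILE whose first
# or second comma-separated field is 1848, smallest third field first.
# Assumption: the first column has exactly 4 digits; otherwise use [0-9]+.
#   ^1848,           -> 1848 is the first field
#   ^[0-9]{4},1848,  -> 1848 is the second field
# A 1848 inside the third field matches neither alternative.
# sort: -t, splits on commas, -k3,3g compares only field 3 as a
# floating-point number; LC_ALL=C keeps '.' as the decimal point.
lowest5_sort() {
  grep -E '^1848,|^[0-9]{4},1848,' "$1" |
    LC_ALL=C sort -t, -k3,3g |
    head -n 5
}
```

Usage: `lowest5_sort filename.csv`.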
With just awk (GNU awk, since `PROCINFO["sorted_in"]` is a gawk extension):

- `BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"}` makes later `for (i in a)` loops traverse arrays by index, compared as numbers, in ascending order.
- `$1==1848||$2==1848 {a[$3]=$0}` checks whether the first or the second field is 1848; if so, the third field (`$3`) becomes an index of the array `a`, with the whole record (`$0`) as the value.
- In `END {for(i in a) print a[i]}`, we simply iterate over the keys and print the values.

To get only the 5 records, add `head -5` at the end.

Just for the sake of completeness, you can also get only the first 5 records by adding a tiny `break` to the loop in `END`, so there is no need for `head` (or `tail`).