I want to count the number of distinct names in a text file laid out like this:
2008 girl Avah
2009 girl Avah
2008 girl Carleigh
2011 girl Kenley
2012 boy Joseph
2013 boy Joseph
2014 boy Isaac
2014 boy Brandon
So basically I want to skip the duplicates and get 6 as the answer. I tried awk
to access only the third column, but I can't get it to print the number of lines.
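One way to make awk print that count is sketched below. This is only an illustration: the temporary file stands in for your real file, and the sample lines are the ones from the question.

```shell
# Sample data from the question, written to a temporary file
# (a stand-in for the real file; its location is an assumption).
data=$(mktemp)
cat > "$data" <<'EOF'
2008 girl Avah
2009 girl Avah
2008 girl Carleigh
2011 girl Kenley
2012 boy Joseph
2013 boy Joseph
2014 boy Isaac
2014 boy Brandon
EOF

# Count a name only the first time it shows up in column 3,
# then print the total once every line has been read.
count=$(awk '!nameSeen[$3]++ { count++ } END { print count }' "$data")
echo "$count"

rm -f "$data"
```

The pattern `!nameSeen[$3]++` is true only on the first occurrence of each value in the third column, so the counter goes up once per distinct name.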
With awk: if a new name is found (!nameSeen[$3]++), increment the counter (count++), and at the END print the counter value.

Since your file appears to be pre-sorted on the name column, you could use uniq with the -f (--skip-fields) option to output only the first line of each name, and then count the lines.
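A minimal sketch of that uniq approach, assuming the sample data from the question sits in a temporary file (a stand-in for your real file). Note that uniq only collapses duplicates on adjacent lines, which is why the names have to be grouped together first.

```shell
# Sample data from the question (stand-in for the real file).
data=$(mktemp)
cat > "$data" <<'EOF'
2008 girl Avah
2009 girl Avah
2008 girl Carleigh
2011 girl Kenley
2012 boy Joseph
2013 boy Joseph
2014 boy Isaac
2014 boy Brandon
EOF

# -f 2 skips the first two fields (year and gender) when comparing
# adjacent lines, so only the name decides whether a line is a repeat.
count=$(uniq -f 2 "$data" | wc -l)
echo $count

rm -f "$data"
```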
If your data are not pre-sorted, you can combine sort -u with a -k field specification to achieve the same thing (although it's not clearly documented in the GNU sort man page).

It's overkill for this task, but you could also use GNU Datamash.
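Both suggestions above can be sketched in one short script. Treat it as an illustration rather than the exact commands originally posted: the temporary file is a stand-in for your data, the key is restricted to the third field with -k3,3, and the Datamash line only runs if datamash happens to be installed.

```shell
# Sample data from the question (stand-in for the real file).
data=$(mktemp)
cat > "$data" <<'EOF'
2008 girl Avah
2009 girl Avah
2008 girl Carleigh
2011 girl Kenley
2012 boy Joseph
2013 boy Joseph
2014 boy Isaac
2014 boy Brandon
EOF

# Sort on the third field only and keep one line per distinct key,
# then count the surviving lines. No pre-sorting is needed.
count=$(sort -u -k3,3 "$data" | wc -l)
echo $count

# Optional: GNU Datamash, using whitespace-separated fields (-W)
# and counting the unique values in field 3.
if command -v datamash >/dev/null 2>&1; then
    datamash -W countunique 3 < "$data"
fi

rm -f "$data"
```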
A rather simple quick way that explains itself:
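As a runnable sketch, here is the sed / sort -u / wc -l pipeline that this answer spells out further down, applied to the sample data from the question. The temporary file stands in for FileName, and the \< and \> word boundaries are a GNU sed feature, so this assumes GNU sed.

```shell
# Sample data from the question (stand-in for FileName).
data=$(mktemp)
cat > "$data" <<'EOF'
2008 girl Avah
2009 girl Avah
2008 girl Carleigh
2011 girl Kenley
2012 boy Joseph
2013 boy Joseph
2014 boy Isaac
2014 boy Brandon
EOF

# Strip the digits, then the words "boy" and "girl", so that only the
# name (with its leading spaces) is left on each line; sort -u keeps
# one copy of each name and wc -l counts them.
count=$(cat "$data" \
    | sed 's/[0-9]*//g' \
    | sed 's/\<boy\>//g' \
    | sed 's/\<girl\>//g' \
    | sort -u \
    | wc -l)
echo $count

rm -f "$data"
```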
A notice to @Rebi Khalifa:
Both @αғsнιη and @steeldriver are right in what they wrote in the comments below. They both used a field-selection approach, which is the same approach you were trying to implement, judging by your question, where you tried awk to access only the third column.
One does not need to be sophisticated to get things done in Ubuntu! Things can be done in many unimaginable ways.
One way which praises the KISS principle is to pipe (|) simple commands one into the next until the mission is accomplished:
cat FileName | sed 's/[0-9]*//g' | sed 's/\<boy\>//g' | sed 's/\<girl\>//g' | sort -u | wc -l

A really short and easy way, using Miller (https://github.com/johnkerl/miller)