Question

Suat Yazıcı

Asked: 2018-08-22 20:19:05 +0800 CST

How can I find duplicates in the first column, then remove the corresponding whole lines?

I have an xlsx file (a 110725x9 matrix) that I saved as tab-delimited text, because I don't know whether Unix tools can handle xlsx files. Duplicate rows are always consecutive.

For example, suppose the text file is as follows. Rows 3 and 4, 7 and 8, and 17 and 18 share the same first column. I'd like to always remove the upper line of each duplicate pair and keep the lower one.

2009,37214611872    2009    135 20  17,1    17,4    19,2    21,8    24,1
2009,37237442922    2009    135 22  16,5    14,5    12,6    11,2    10,5
2009,37260273973    2009    136 0   7,7     7,2     7,1     7,3     7,5
2009,37260273973    2009    136 0   7,7     7,2     7,0     7,2    7,4
2009,37488584475    2009    136 20  14,6    15,1    16,4    18,3    20,1
2009,37511415525    2009    136 22  15,9    14,6    12,8    10,9    9,4
2009,37534246575    2009    137 0   8,2     6,9     6,2     6,2     6,4
2009,37534246575    2009    137 0   8,1     6,8     6,1     6,0     6,3
2009,37557077626    2009    137 2   6,8     6,7     6,5     6,3     6,2
2009,37579908676    2009    137 4   5,8     5,6     5,4     5,4     5,7
2009,37602739726    2009    137 6   6,3     6,1     5,9     5,8     5,8
2009,37625570776    2009    137 8   4,5     5,2     6,0     6,6     7,2
2009,37648401826    2009    137 10  9,6     9,0     8,4     8,4     9,1
2009,37671232877    2009    137 12  11,4    11,7    12,4    13,4    14,4
2009,37694063927    2009    137 14  12,4    13,1    14,2    15,4    16,7
2009,37785388128    2009    137 22  15,5    14,0    12,2    10,3    8,7
2009,37808219178    2009    138 0   6,3     5,8     5,5     5,5     5,8
2009,37808219178    2009    138 0   6,2     5,7     5, 4    5,4     5,7

So the output should look like this:

2009,37214611872    2009    135 20  17,1    17,4    19,2    21,8    24,1
2009,37237442922    2009    135 22  16,5    14,5    12,6    11,2    10,5
2009,37260273973    2009    136 0   7,7     7,2     7,0     7,2    7,4
2009,37488584475    2009    136 20  14,6    15,1    16,4    18,3    20,1
2009,37511415525    2009    136 22  15,9    14,6    12,8    10,9    9,4
2009,37534246575    2009    137 0   8,1     6,8     6,1     6,0     6,3
2009,37557077626    2009    137 2   6,8     6,7     6,5     6,3     6,2
2009,37579908676    2009    137 4   5,8     5,6     5,4     5,4     5,7
2009,37602739726    2009    137 6   6,3     6,1     5,9     5,8     5,8
2009,37625570776    2009    137 8   4,5     5,2     6,0     6,6     7,2
2009,37648401826    2009    137 10  9,6     9,0     8,4     8,4     9,1
2009,37671232877    2009    137 12  11,4    11,7    12,4    13,4    14,4
2009,37694063927    2009    137 14  12,4    13,1    14,2    15,4    16,7
2009,37785388128    2009    137 22  15,5    14,0    12,2    10,3    8,7
2009,37808219178    2009    138 0   6,2     5,7     5, 4    5,4     5,7

How can I do that without sorting?

2 Answers

muru · Answer 1 · 2018-08-22T22:26:54+08:00

Best Answer

To remove duplicates based on a single column, you can use awk:

awk '!seen[$1]++' input-file > output-file

You can see an explanation for this in this Unix & Linux post.
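
As a rough illustration of what this one-liner does, here is a minimal sketch on a made-up four-line sample (the sample and the variable names are hypothetical, not from the question). The expression seen[$1]++ evaluates to 0 the first time a key appears and to a positive number afterwards, so its negation is true only for the first occurrence, and awk's default action prints that line. Note that this keeps the upper line of each pair, not the lower one.

```shell
#!/bin/sh
# Made-up sample: the key "b" appears twice.
dedup_sample='a 1
b 2
b 3
c 4'

# Keep only the first line seen for each value of the first field.
first_kept=$(printf '%s\n' "$dedup_sample" | awk '!seen[$1]++')
printf '%s\n' "$first_kept"
```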

Removing the earlier (upper) line of each pair is more complicated. Given that duplicates always come together, you can do:

awk 'prev && ($1 != prev) {print seen[prev]} {seen[$1] = $0; prev = $1} END {print seen[$1]}' input-file > output-file

Here, the middle block, {seen[$1] = $0; prev = $1}, saves the current line ($0) in the seen array with the first field ($1) as the index, then saves the first field in the prev variable. This prev is used by the first block when processing the next line.

In the first block, then, we check that prev is set (only true from the second line onwards) and that it differs from the current first field (prev was set while processing the previous line). If both conditions hold, we have moved past a run of duplicates and can print the last line saved for the previous key. At the END, we do the same for the last line.
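
To try the idea on a small sample, here is a minimal sketch. The function name keep_last_of_run and the four-line sample are hypothetical, made up for illustration. The awk program is a slight variation of the command above, not the same code: it remembers only the previous line instead of using an array, and it still assumes that duplicates are always adjacent.

```shell
#!/bin/sh
# keep_last_of_run: hypothetical helper name, not part of the answer above.
# Reads lines on stdin and, for each run of adjacent lines that share the
# same first field, prints only the last line of the run.
keep_last_of_run() {
    awk '
        NR > 1 && $1 != prev { print line }   # key changed: flush the saved line
        { line = $0; prev = $1 }              # remember the current line and its key
        END { if (NR) print line }            # flush the final run, if there was input
    '
}

# Made-up sample: the key "b" appears on two consecutive lines.
run_sample='a 1
b 2
b 3
c 4'

last_kept=$(printf '%s\n' "$run_sample" | keep_last_of_run)
printf '%s\n' "$last_kept"
```

With real data the function could be used as keep_last_of_run < input-file > output-file.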

8

user986805 · Answer 2 · 2019-11-21T10:33:11+08:00

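Using tac and uniq. Here uniq -w 16 compares only the first 16 characters of each line, which is the width of the first column in the sample data: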

$ tac text.txt | uniq -w 16 | tac
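
A rough sketch of why this works, assuming GNU coreutils (tac and uniq -w are not part of POSIX): the first tac reverses the input so the lower line of each pair comes first, uniq keeps the first line of each run, and the second tac restores the original order. The rows below are shortened versions of rows from the question, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Assumes GNU coreutils: `tac` and `uniq -w` are not specified by POSIX.
# Shortened rows modelled on the question's data; rows 2 and 3 share the
# same 16-character first column.
tac_sample='2009,37237442922 2009 135 22 16,5
2009,37260273973 2009 136 0 7,7
2009,37260273973 2009 136 0 7,0'

# Reverse, keep the first line of each run that matches on the first
# 16 characters, then reverse back: the lower line of each pair survives.
tac_result=$(printf '%s\n' "$tac_sample" | tac | uniq -w 16 | tac)
printf '%s\n' "$tac_result"
```

The fixed width only fits because every first-column value in the sample is exactly 16 characters long; with keys of varying width, the awk approach compares the whole first field instead.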

0
