How can I use docker without sudo?

Question

Lynob

Asked: 2014-11-08 04:29:10 +0800 CST2014-11-08 04:29:10 +0800 CST 2014-11-08 04:29:10 +0800 CST

Comparing two text files

772

I have 2 big csv files, file1.csv which looks like this

1,2,3,4
1,4,5,6
1,7,8,9
1,11,13,17

file2.csv which looks like this

1,2,3,4
1,7,8,9
2,4,9,10
13,14,17,18

These are just random numbers that I made up, basically the two numbers where identical, and sorted. I want to compare file1.csv and file2.csv and then copy the rows that are present in file1.csv but not in file2.csv to file3.csv. the delimiter is comma obviously

I tried

comm -2 -3 file.csv file2.csv > file3.csv

and I tried

diff -u file.csv file2.csv >> file3.csv

Both didn't work because file3 was bigger than file1 and file2. I tried different diff and comm commands, sometimes it's bigger than file2 and about the same size as file file1, I know that file3 has to be significantly less in size than file1 and file2. And of course I looked at file3, not the results I wanted

At this point, I know it could be done with diff or comm but I do not know the command to use.

4 Answers

Voted

Lety · Answer 1 · 2014-11-08T04:44:56+08:00

Best Answer

Lety

2014-11-08T04:44:56+08:002014-11-08T04:44:56+08:00

Try this command:

 grep -v -f file2.csv file1.csv > file3.csv

According to grep manual:

  -f FILE, --file=FILE
          Obtain  patterns  from  FILE,  one  per  line.   The  empty file
          contains zero patterns, and therefore matches nothing.   (-f  is
          specified by POSIX.)

  -v, --invert-match
          Invert the sense of matching, to select non-matching lines.  (-v
          is specified by POSIX.)

As Steeldriver said in his comment is better add also -x and -F that:

  -F, --fixed-strings
          Interpret PATTERN as a  list  of  fixed  strings,  separated  by
          newlines,  any  of  which is to be matched.  (-F is specified by
          POSIX.)
  -x, --line-regexp
          Select  only  those  matches  that exactly match the whole line.
          (-x is specified by POSIX.)

So, better command is:

 grep -xvFf file2.csv file1.csv > file3.csv

This command use file2.csv line as pattern and print line of file1.csv that doesn't match (-v).

9

choroba · Answer 2 · 2014-11-08T04:49:51+08:00

choroba

2014-11-08T04:49:51+08:002014-11-08T04:49:51+08:00

In order to be able to use comm, you have to sort the lines first.

comm -23 <(sort file1.csv) <(sort file2.csv) > file3.csv

7

Jacob Vlijm · Answer 3 · 2014-11-08T04:48:18+08:00

Jacob Vlijm

2014-11-08T04:48:18+08:002014-11-08T04:48:18+08:00

A python option:

#!/usr/bin/env python3

import sys

def readfile(file):
    with open(file) as src:
        return [line.strip() for line in src.readlines()]

lines_1 = readfile(sys.argv[1]); lines_2 = readfile(sys.argv[2])

for line in lines_1:
    if not line in lines_2:
        print(line)

Output:

1,4,5,6
1,11,13,17

Paste the script into an empty file as extract.py, make it executable and run it by the command:

<script> <file_1> <file_2>

Or, to write it directly to file_3:

<script> <file_1> <file_2> >file_3

3

αғsнιη · Answer 4 · 2014-11-08T05:46:02+08:00

αғsнιη

2014-11-08T05:46:02+08:002014-11-08T05:46:02+08:00

Using diff command do grep and no storing required.

Output if lines exist in file1 but not in file2:

$ diff file{1,2}.csv | grep -Po "^< \K.*"
1,4,5,6
1,11,13,17

And output if lines exist in file2 but not in file1, with just changing left angle(<) to right angle(>):

$ diff file{1,2}.csv | grep -Po "^> \K.*"
2,4,9,10
13,14,17,18

3

Comparing two text files

How to install Google Chrome

Is there a command to list all users? Also to add, delete, modify users, in the terminal?

How to delete a non-empty directory in Terminal?

How to unzip a zip file from the Terminal?

How can I copy the contents of a folder to another folder in a different directory using terminal?

How do I install a .deb file via the command line?

How do I run .sh scripts?

How do I install a .tar.gz (or .tar.bz2) file?

How to list all installed packages

Unable to lock the administration directory (/var/lib/dpkg/) is another process using it?