I have two big CSV files. file1.csv looks like this:
1,2,3,4
1,4,5,6
1,7,8,9
1,11,13,17
file2.csv looks like this:
1,2,3,4
1,7,8,9
2,4,9,10
13,14,17,18
These are just random numbers that I made up; basically the two files have the same format and are sorted. I want to compare file1.csv and file2.csv and then copy the rows that are present in file1.csv but not in file2.csv to file3.csv. The delimiter is a comma.
I tried
comm -2 -3 file.csv file2.csv > file3.csv
and I tried
diff -u file.csv file2.csv >> file3.csv
Neither worked, because file3 was bigger than file1 and file2. I tried different diff and comm commands; sometimes the result is bigger than file2 and about the same size as file1. I know that file3 has to be significantly smaller than file1 and file2. And of course I looked at file3, and it did not contain the results I wanted.
At this point, I know it could be done with diff or comm, but I do not know the command to use.
Try this command:
According to the grep manual:
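As Steeldriver said in his comment, it is better to also add -x and -F, so that each line of file2.csv is treated as a fixed string that must match a whole line of file1.csv.
So, a better command is: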
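A minimal sketch of that command, assuming the files are named file1.csv and file2.csv as in the question. The scratch directory and the sample data are only there to make the example self-contained; on real data only the grep line is needed.

```shell
# Work in a scratch directory so no real files are touched (illustrative setup).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data copied from the question.
printf '%s\n' 1,2,3,4 1,4,5,6 1,7,8,9 1,11,13,17 > file1.csv
printf '%s\n' 1,2,3,4 1,7,8,9 2,4,9,10 13,14,17,18 > file2.csv

# -f file2.csv : read one pattern per line from file2.csv
# -F           : treat each pattern as a fixed string, not a regular expression
# -x           : a pattern must match the whole line
# -v           : print the lines of file1.csv that match none of the patterns
grep -Fxv -f file2.csv file1.csv > file3.csv

cat file3.csv
```

Without -x, a pattern such as 1,2,3,4 would also match a longer line like 11,2,3,4 as a substring. Without -F, characters such as . in the data would be interpreted as regular-expression operators.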
This command uses the lines of file2.csv as patterns and prints the lines of file1.csv that don't match (-v).

In order to be able to use comm, you have to sort the lines first.

A Python option:
Output:
Paste the script into an empty file as extract.py, make it executable and run it by the command:
Or, to write it directly to file_3:
Use the diff command piped to grep; no sorting is required.
Output if lines exist in file1 but not in file2:
And the output if lines exist in file2 but not in file1, just by changing the left angle bracket (<) to a right angle bracket (>):
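A minimal sketch of the diff and grep pipeline described above, using the sample data from the question. In diff's default output, lines found only in the first file are prefixed with "< " and lines found only in the second file with "> ". The sed step that strips this two-character marker and the output file names only_in_file1.csv and only_in_file2.csv are additions for illustration, not necessarily part of the original answer.

```shell
# Work in a scratch directory so no real files are touched (illustrative setup).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data copied from the question.
printf '%s\n' 1,2,3,4 1,4,5,6 1,7,8,9 1,11,13,17 > file1.csv
printf '%s\n' 1,2,3,4 1,7,8,9 2,4,9,10 13,14,17,18 > file2.csv

# Rows present in file1.csv but not in file2.csv:
# keep the lines diff marks with "<", then drop the "< " marker.
diff file1.csv file2.csv | grep '^<' | sed 's/^< //' > only_in_file1.csv

# Rows present in file2.csv but not in file1.csv:
# same pipeline with ">" instead of "<".
diff file1.csv file2.csv | grep '^>' | sed 's/^> //' > only_in_file2.csv

cat only_in_file1.csv
cat only_in_file2.csv
```

Note that diff compares lines by position, so this may give different results from the grep approach when the same rows appear in a different order in the two files.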