
Question

André M. Faria

Asked: 2015-05-19 13:11:46 +0800 CST

Remove duplicates from two files and merge the unique ones


I have two big text files, checksums_1.txt and checksums_2.txt. I want to parse these files, remove the duplicates between them and merge the unique lines into one file.

Each line in both files has the following structure:

size, md5, path

Example: Checksums_1.txt

9565, a4fs614as6fas4fa4s46fsaf1, /mnt/app/1tier/2tier/filename.exe
9565, a4fs614as6fas4fa4s46fsaf1, /mnt/app/1tier/2tier/filename2.exe

Example: Checksums_2.txt

9565, a4fs614as6fas4fa4s46fsaf1, /mnt/temp/1tier/2tier/filename.exe
9565, a4fs614as6fas4fa4s46fsaf1, /mnt/temp/1tier/2tier/filename2.exe
9565, a4fs614as6fas4fa4s46fsaf1, /mnt/temp/1tier/2tier/newfile.exe

The part of each line that must be used to compare checksums_1.txt with checksums_2.txt is the path after the mount point /mnt/app/ or /mnt/temp/. In other words, everything from the start of each line up to the end of the mount point (/mnt/app/ or /mnt/temp/) is ignored.
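A minimal sketch of that comparison key in Python, assuming the comma-separated size, md5, path layout shown above and only the two mount points named in the question; the function name comparison_key is hypothetical:

```python
MOUNT_POINTS = ("/mnt/app/", "/mnt/temp/")


def comparison_key(line):
    """Return the path after the mount point, used to detect duplicates."""
    # Split into at most three fields (size, md5, path) and trim spaces around each
    _size, _md5, path = (field.strip() for field in line.split(",", 2))
    for mount_point in MOUNT_POINTS:
        if path.startswith(mount_point):
            return path[len(mount_point):]
    # Paths under any other mount point are compared as-is
    return path


print(comparison_key("9565, a4fs614as6fas4fa4s46fsaf1, /mnt/app/1tier/2tier/filename.exe"))
print(comparison_key("9565, a4fs614as6fas4fa4s46fsaf1, /mnt/temp/1tier/2tier/filename.exe"))
```

Both calls print 1tier/2tier/filename.exe, so the two lines count as duplicates even though their mount points differ.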

The data in checksums_1.txt is more important, so if a duplicate is found, the line from checksums_1.txt must go into the merged file.

Part of Checksums_1.txt

1058,b8203a236b4f15316e516165a6546666,/mnt/app/Certificados/ca.crt
2694,8a815adefde4fa0c263e74832b15de64,/mnt/app/Certificados/ca.db.certs/01.pem
136,77bf2e5313dbaac4df76a4b72df2e2ad,/mnt/app/Certificados/ca.db.index

Part of Checksums_2.txt

1058,b8203a236b4f1531616318284202c9e6,/mnt/temp/Certificados/ca.crt
3,72b2ac90f7f3ff075a937d6be8fc3dc3,/mnt/temp/Certificados/ca.db.serial 
2694,8a815adefde4fa0c263e74832b15de64,/mnt/temp/Certificados/ca.db.certs/01.pem
136,77bf2e5313dbaac4df76a4b72df2e2ad,/mnt/temp/Certificados/ca.db.index

Example of the merged file

1058,b8203a236b4f15316e516165a6546666,/mnt/app/Certificados/ca.crt 
3,72b2ac90f7f3ff075a937d6be8fc3dc3,/mnt/temp/Certificados/ca.db.serial 
2694,8a815adefde4fa0c263e74832b15de64,/mnt/app/Certificados/ca.db.certs/01.pem
136,77bf2e5313dbaac4df76a4b72df2e2ad,/mnt/app/Certificados/ca.db.index
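The precedence rule can be checked against this sample with a short sketch (not taken from any of the answers): lines are keyed by the path after the mount point, file 2 is loaded first and file 1 overwrites matching keys. The helper names key and merge are hypothetical, and the sketch relies on Python 3.7+ dictionaries keeping insertion order, which gives the same line order as the merged example above.

```python
def key(line):
    # Comparison key: the path field with the /mnt/app/ or /mnt/temp/ prefix removed
    path = line.split(",", 2)[2].strip()
    for mount_point in ("/mnt/app/", "/mnt/temp/"):
        if path.startswith(mount_point):
            return path[len(mount_point):]
    return path


def merge(lines1, lines2):
    """Merge two checksum lists; on duplicate keys the line from lines1 wins."""
    merged = {}
    for line in lines2:
        merged[key(line)] = line.strip()
    # Overwriting an existing key keeps its original position in the dict
    for line in lines1:
        merged[key(line)] = line.strip()
    return list(merged.values())


checksums_1 = [
    "1058,b8203a236b4f15316e516165a6546666,/mnt/app/Certificados/ca.crt",
    "2694,8a815adefde4fa0c263e74832b15de64,/mnt/app/Certificados/ca.db.certs/01.pem",
    "136,77bf2e5313dbaac4df76a4b72df2e2ad,/mnt/app/Certificados/ca.db.index",
]
checksums_2 = [
    "1058,b8203a236b4f1531616318284202c9e6,/mnt/temp/Certificados/ca.crt",
    "3,72b2ac90f7f3ff075a937d6be8fc3dc3,/mnt/temp/Certificados/ca.db.serial",
    "2694,8a815adefde4fa0c263e74832b15de64,/mnt/temp/Certificados/ca.db.certs/01.pem",
    "136,77bf2e5313dbaac4df76a4b72df2e2ad,/mnt/temp/Certificados/ca.db.index",
]

for line in merge(checksums_1, checksums_2):
    print(line)
```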

3 Answers


lgpasquale · Answer 1 · 2015-05-20T07:00:20+08:00

If you are willing to use Python (that is, if performance is not a concern), the following script does what you want:

#!/usr/bin/env python3

import sys
import csv
import re

mountpoint1 = "/mnt/app/"
mountpoint2 = "/mnt/temp/"

if len(sys.argv) != 4:
    print('Usage: {} <input file 1> <input file 2> <output file>'.format(sys.argv[0]))
    sys.exit(1)

inputFileName1 = sys.argv[1]
inputFileName2 = sys.argv[2]
outputFileName = sys.argv[3]

# We place entries from both input files in the same dictionary
# The key will be the filename stripped of the mountpoint
# The value will be the whole line
fileDictionary = dict()

# First we read entries from file2, so that those
# from file1 will later overwrite them when needed
with open(inputFileName2, newline='') as inputFile2:
    csvReader = csv.reader(inputFile2)
    for row in csvReader:
        if len(row) == 3:
            # Strip surrounding spaces, then remove the mountpoint only at the start of the path
            key = re.sub('^' + re.escape(mountpoint2), '', row[2].strip())
            # The value will be the whole line
            fileDictionary[key] = ','.join(row)

# Entries from file1 will overwrite those from file2
with open(inputFileName1, newline='') as inputFile1:
    csvReader = csv.reader(inputFile1)
    for row in csvReader:
        if len(row) == 3:
            # Strip surrounding spaces, then remove the mountpoint only at the start of the path
            key = re.sub('^' + re.escape(mountpoint1), '', row[2].strip())
            # The value will be the whole line
            fileDictionary[key] = ','.join(row)

# Write all the entries to the output file; dictionaries keep insertion
# order (Python 3.7+), so the lines follow file2's order, with lines that
# exist only in file1 at the end
with open(outputFileName, 'w') as outputFile:
    for key in fileDictionary:
        outputFile.write(fileDictionary[key])
        outputFile.write('\n')

Save the script as merge-checksums.py and give it execute permission:

chmod u+x merge-checksums.py

and run it as:

./merge-checksums.py Checksums_1.txt Checksums_2.txt out.txt
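The output order of this script comes from how Python dictionaries behave (guaranteed since Python 3.7): re-assigning an existing key replaces its value but keeps its position. A small illustration with made-up keys and values:

```python
entries = {}
# Entries from file 2 are inserted first
entries["a.crt"] = "file2 line for a.crt"
entries["b.serial"] = "file2 line for b.serial"
# An entry from file 1 with the same key replaces the value in place
entries["a.crt"] = "file1 line for a.crt"
# A key only present in file 1 is appended at the end
entries["c.index"] = "file1 line for c.index"

for value in entries.values():
    print(value)
```

This prints the file 1 line for a.crt, then the file 2 line for b.serial, then the file 1 line for c.index: the merged file follows the line order of the second input file, with lines found only in the first file appended at the end.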

A.B. · Answer 2 · 2015-05-20T07:47:15+08:00

The bash version (with awk and grep):

#!/bin/bash

filename1="$1"
filename2="$2"

keys=$(awk -F'/' '{ for(i=4;i<NF;i++) printf "%s",$i "/"; if (NF) printf "%s",$NF; printf "\n"}' "$filename1" "$filename2" | awk '{gsub(/^[ \t]+|[ \t]+$/,"")};1' | sort -u)

while read -r key
do
    match=$(grep -F -- "$key" "$filename1")
    if [ "$match" != "" ]
    then
        echo "$match"
    else
        grep -F -- "$key" "$filename2"
    fi
done <<< "$keys"
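The first awk command builds each key by splitting the line on "/" and keeping the fields from the fourth one onward, so it drops /mnt/<name>/ whatever the mount point is called, but it assumes the mount point is always exactly two directories deep. A rough Python equivalent of that step, including the whitespace trimming done by the second awk (the function name awk_style_key is hypothetical):

```python
def awk_style_key(line):
    # Like awk -F'/' printing fields 4..NF: drop everything up to the third '/'
    fields = line.split("/")
    # Join the remaining fields and trim surrounding whitespace, like the second awk
    return "/".join(fields[3:]).strip()


print(awk_style_key("9565, 1111111111111111111111111, /mnt/app/1tier/2tier/filename.exe"))
print(awk_style_key("9565, 3333333333333333333333333, /mnt/temp/1tier/2tier/newfile.exe"))
```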

Checksums_1.txt

9565, 1111111111111111111111111, /mnt/app/1tier/2tier/filename.exe
9565, 0000000000000000000000000, /mnt/app/1tier/2tier/filename2.exe

Checksums_2.txt

9565, 2222222222222222222222222, /mnt/temp/1tier/2tier/filename.exe
9565, 0000000000000000000000000, /mnt/temp/1tier/2tier/filename2.exe
9565, 3333333333333333333333333, /mnt/temp/1tier/2tier/newfile.exe

Run with

./merge_checksum Checksums_1.txt Checksums_2.txt > Checksums_3.txt

Checksums_3.txt

9565, 1111111111111111111111111, /mnt/app/1tier/2tier/filename.exe
9565, 0000000000000000000000000, /mnt/app/1tier/2tier/filename2.exe
9565, 3333333333333333333333333, /mnt/temp/1tier/2tier/newfile.exe

Or with the input files swapped:

./merge_checksum Checksums_2.txt Checksums_1.txt > Checksums_3.txt

Checksums_3.txt

9565, 0000000000000000000000000, /mnt/temp/1tier/2tier/filename2.exe
9565, 2222222222222222222222222, /mnt/temp/1tier/2tier/filename.exe
9565, 3333333333333333333333333, /mnt/temp/1tier/2tier/newfile.exe

Jacob Vlijm · Answer 3 · 2015-05-20T10:29:05+08:00

Assuming neither file is huge, the Python script below does the job as well.

How it works

The script reads both files. Each line of file_1 (the file that takes precedence) is split on the mount point you set for that file in the head section of the script (in your example, /mnt/app).

Next, the lines of file_1 are written to the output file (the merged file). At the same time, every line of file_2 that contains the identifying string (the part after the mount point) is removed from the list of file_2 lines. Finally, the remaining lines of file_2 (those with no duplicate in file_1) are written to the output file as well. The result:

file_1:

1058,b8203a236b4f15316e516165a6546666,/mnt/app/Certificados/ca.crt
2694,8a815adefde4fa0c263e74832b15de64,/mnt/app/Certificados/ca.db.certs/01.pem
136,77bf2e5313dbaac4df76a4b72df2e2ad,/mnt/app/Certificados/ca.db.index

file_2:

1058,b8203a236b4f15316e516165a6546666,/mnt/app/Certificados/ca.crt
3,72b2ac90f7f3ff075a937d6be8fc3dc3,/mnt/temp/Certificados/ca.db.serial
2694,8a815adefde4fa0c263e74832b15de64,/mnt/app/Certificados/ca.db.certs/01.pem
136,77bf2e5313dbaac4df76a4b72df2e2ad,/mnt/app/Certificados/ca.db.index

merged:

1058,b8203a236b4f15316e516165a6546666,/mnt/app/Certificados/ca.crt
2694,8a815adefde4fa0c263e74832b15de64,/mnt/app/Certificados/ca.db.certs/01.pem
136,77bf2e5313dbaac4df76a4b72df2e2ad,/mnt/app/Certificados/ca.db.index
3,72b2ac90f7f3ff075a937d6be8fc3dc3,/mnt/temp/Certificados/ca.db.serial
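The steps described above can be replayed in memory on this example. This is a sketch of the same idea (keep file_1, drop the file_2 lines that contain a file_1 path, append the rest), not the script itself; it builds a list instead of writing a file:

```python
m_point = "/mnt/app"

file_1 = [
    "1058,b8203a236b4f15316e516165a6546666,/mnt/app/Certificados/ca.crt",
    "2694,8a815adefde4fa0c263e74832b15de64,/mnt/app/Certificados/ca.db.certs/01.pem",
    "136,77bf2e5313dbaac4df76a4b72df2e2ad,/mnt/app/Certificados/ca.db.index",
]
file_2 = [
    "1058,b8203a236b4f15316e516165a6546666,/mnt/app/Certificados/ca.crt",
    "3,72b2ac90f7f3ff075a937d6be8fc3dc3,/mnt/temp/Certificados/ca.db.serial",
    "2694,8a815adefde4fa0c263e74832b15de64,/mnt/app/Certificados/ca.db.certs/01.pem",
    "136,77bf2e5313dbaac4df76a4b72df2e2ad,/mnt/app/Certificados/ca.db.index",
]

# Identifying strings: the part of each file_1 line after the mount point
checks = [line.split(m_point)[-1] for line in file_1]
# Keep only the file_2 lines that contain none of the identifying strings
remaining = [line for line in file_2 if not any(c in line for c in checks)]
merged = file_1 + remaining

for line in merged:
    print(line)
```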

The script

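#!/usr/bin/env python3
#---set the path to file1, file2 and the mountpoint used in file1 below
f1 = "/path/to/file_1"; m_point = "/mnt/app"; f2 = "/path/to/file_2"
merged = "/path/to/merged_file"
#---
with open(f1) as src:
    # (whole line, part after the mount point) for every line of file_1
    lines1 = [(l, l.split(m_point)[-1]) for l in src.read().splitlines()]
with open(f2) as src:
    lines2 = src.read().splitlines()

# open the merged file once; "w" replaces the output of a previous run
with open(merged, "w") as out:
    for l in lines1:
        out.write(l[0]+"\n")
        # remove the lines of file_2 that contain the same path
        for line in [line for line in lines2 if l[1] in line]:
            lines2.remove(line)
    # write the lines of file_2 that have no duplicate in file_1
    for l in lines2:
        out.write(l+"\n")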

How to use

1. Copy the script into an empty file and save it as merge.py.
2. In the head section of the script, set the paths to f1 (file_1), f2 (file_2) and the merged file, and the mount point as it appears in file_1.
3. Run it with the command:
```
python3 /path/to/merge.py
```

Edit

Or a tiny bit shorter:

#!/usr/bin/env python3
#---set the path to file1, file2 and the mountpoint used in file1 below
f1 = "/path/to/file_1"; m_point = "/mnt/app"; f2 = "/path/to/file_2"
merged = "/path/to/merged_file"
#---
lines = lambda f: open(f).read().splitlines()
lines1 = lines(f1); lines2 = lines(f2); checks = [l.split(m_point)[-1] for l in lines1]
# keep only the lines of file_2 that contain none of the file_1 paths
# (removing them one by one fails if a line matches more than one path)
lines2 = [l for l in lines2 if not any(c in l for c in checks)]
with open(merged, "w") as out:
    for item in lines1+lines2:
        out.write(item+"\n")
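Both versions of this script treat a file_2 line as a duplicate when it merely contains a file_1 path (the "in" test), so a short path from file_1 can also remove an unrelated file_2 line whose path ends with it. A hedged illustration with made-up lines, plus a stricter variant that compares the whole path after the mount point; /mnt/temp as the mount point of file_2 is an assumption taken from the question:

```python
m_point_1 = "/mnt/app"
m_point_2 = "/mnt/temp"  # assumed mount point used in file_2

file_1 = ["10,aaaa,/mnt/app/a.txt"]
file_2 = ["20,bbbb,/mnt/temp/data/a.txt", "30,cccc,/mnt/temp/a.txt"]

checks = [line.split(m_point_1)[-1] for line in file_1]

# Substring test: "/a.txt" also occurs in "/mnt/temp/data/a.txt"
substring_kept = [l for l in file_2 if not any(c in l for c in checks)]

# Exact test: compare the full path after the mount point instead
check_set = set(checks)
exact_kept = [l for l in file_2 if l.split(m_point_2)[-1] not in check_set]

print(substring_kept)  # prints []
print(exact_kept)      # prints ['20,bbbb,/mnt/temp/data/a.txt']
```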
