So basically what I want to do is compare two files line by line, by column 2. How could I accomplish this?
File_1.txt:
User1 US
User2 US
User3 US
File_2.txt:
User1 US
User2 US
User3 NG
Output_File:
User3 has changed
Look into the diff command. It's a good tool, and you can read all about it by typing man diff into your terminal. The command you'll want to run is
diff File_1.txt File_2.txt
which will output the difference between the two.
A quick note on reading the output: the 'arrows' (< and >) show the value of the line in the left file (<) vs the right file (>), with the left file being the one you entered first on the command line, in this case File_1.txt.
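For the sample files in the question, the output is short enough to show in full. The sketch below recreates File_1.txt and File_2.txt with printf, assuming one "name value" pair per line as in the question, and runs diff on them. The 3c3 header means line 3 of the left file was changed into line 3 of the right file.

```bash
# Recreate the two sample files from the question
printf 'User1 US\nUser2 US\nUser3 US\n' > File_1.txt
printf 'User1 US\nUser2 US\nUser3 NG\n' > File_2.txt

# Normal diff output: "<" lines come from File_1.txt, ">" lines from File_2.txt
diff File_1.txt File_2.txt
# 3c3
# < User3 US
# ---
# > User3 NG
```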
Additionally, you can save the output with
diff ... | tee Output_File
This pipes the results from diff into tee, which prints them to the console and also writes them to a file. That way you can save them for later if you don't want to view it all on the console right that second.
Or you can use Meld Diff.
Install it by running sudo apt-get install meld.
Your example:
Comparing directories:
Example with a full text file:
You can use vimdiff.
Example:
FWIW, I rather like what I get with side-by-side output from diff:
diff -y File_1.txt File_2.txt
would give something like:
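Here is a minimal sketch, assuming GNU diff. In side-by-side mode, a | between the two columns marks a changed line. Adding --suppress-common-lines drops the lines that are the same in both files.

```bash
printf 'User1 US\nUser2 US\nUser3 US\n' > File_1.txt
printf 'User1 US\nUser2 US\nUser3 NG\n' > File_2.txt

# Full side-by-side view; "|" marks the changed line
diff -y File_1.txt File_2.txt

# Only the lines that differ (here a single line for User3)
diff -y --suppress-common-lines File_1.txt File_2.txt
```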
You can use the command cmp:
cmp File_1.txt File_2.txt
The output would be:
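For the question's files, the first byte that differs is the U of US versus the N of NG on line 3. The two 9-byte lines before it plus "User3 " put it at byte 25. The sketch below recreates the files. It also shows cmp -s, which prints nothing and only sets the exit status (0 for identical, 1 for different).

```bash
printf 'User1 US\nUser2 US\nUser3 US\n' > File_1.txt
printf 'User1 US\nUser2 US\nUser3 NG\n' > File_2.txt

# Report the position of the first difference.
# The wording varies between cmp versions ("byte 25" or "char 25"),
# but it ends with the line number: line 3 here.
cmp File_1.txt File_2.txt

# Silent mode: no output, just the exit status
if cmp -s File_1.txt File_2.txt; then
  echo "same"
else
  echo "different"   # an error (exit status 2) would also land here
fi
```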
Meld is a really great tool. But you can also use diffuse to visually compare two files:
Literally sticking to the question (file1, file2, outputfile with a "has changed" message), the script below works.
Copy the script into an empty file, save it as compare.py, make it executable, and run it with the command:
The script:
With a few extra lines, you can make it print either to an output file or to the terminal, depending on whether the output file is defined:
To print to a file:
To print to the terminal window:
The script:
An easy way is to use colordiff, which behaves like diff but colorizes its output. This is very helpful for reading diffs. Using your example,
colordiff -u File_1.txt File_2.txt
where the -u option gives a unified diff. This is what the colorized diff looks like:
Install colordiff by running sudo apt-get install colordiff.
Install git and use
git diff --no-index File_1.txt File_2.txt
and you will get output in a nice colored format.
Git installation
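The colordiff answer uses the unified format (-u). Plain diff -u prints the same layout without colors, which is handy where colordiff is not installed. In this sketch with the question's files, lines starting with - come from File_1.txt and lines starting with + come from File_2.txt.

```bash
printf 'User1 US\nUser2 US\nUser3 US\n' > File_1.txt
printf 'User1 US\nUser2 US\nUser3 NG\n' > File_2.txt

# Unified format: a @@ hunk header, then context lines (leading space),
# removed lines (leading "-") and added lines (leading "+")
diff -u File_1.txt File_2.txt
```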
colcmp.sh
Compares name/value pairs in 2 files in the format "name value\n". Writes the name to Output_File if changed. Requires bash v4+ for associative arrays.
Usage
Output_File
Source (colcmp.sh)
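For comparison, here is a much shorter sketch of the same check. It is not the colcmp.sh source. It swaps the sed-and-source technique explained below for a plain while read loop and uses hypothetical array names old and new. It assumes one "name value" pair per line, and it writes the question's "has changed" message for every name present in both files with a different value.

```bash
# Sketch (not the original colcmp.sh): report "<name> has changed" for every
# name that appears in both files with a different value. Needs bash 4+.

# Sample input from the question
printf 'User1 US\nUser2 US\nUser3 US\n' > File_1.txt
printf 'User1 US\nUser2 US\nUser3 NG\n' > File_2.txt

declare -A old new     # hypothetical names: old = File_1.txt, new = File_2.txt

# Redirecting the file into the loop (instead of piping) keeps the loop in the
# current shell, so the array assignments are still there afterwards.
while read -r name value; do
  [[ -n $name ]] || continue          # skip blank lines
  old[$name]=$value
done < File_1.txt

while read -r name value; do
  [[ -n $name ]] || continue
  new[$name]=$value
done < File_2.txt

: > Output_File                       # start with an empty output file
for name in "${!old[@]}"; do
  # ${new[$name]+set} is non-empty only if the name also exists in File_2.txt
  if [[ -n ${new[$name]+set} && ${old[$name]} != "${new[$name]}" ]]; then
    echo "$name has changed" >> Output_File
  fi
done

cat Output_File                       # User3 has changed
```

Because read puts the remainder of the line into its last variable, a value containing spaces survives intact. A name containing spaces would not work.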
Explanation
Breakdown of the code and what it means, to the best of my understanding. I welcome edits and suggestions.
Basic File Compare
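cmp will set the value of $? as follows: 0 if the files are identical, 1 if they differ, and 2 if there was trouble (for example, one of the files could not be read).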
I chose to use a case..esac statement to evaluate $? because the value of $? changes after every command, including test ([).
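Here is a sketch of that case..esac pattern. It uses cmp -s so nothing is printed. The messages and the variable status_msg are hypothetical.

```bash
printf 'User1 US\nUser2 US\nUser3 US\n' > File_1.txt
printf 'User1 US\nUser2 US\nUser3 NG\n' > File_2.txt

cmp -s File_1.txt File_2.txt
case $? in            # $? still holds cmp's exit status here
  0) status_msg="files are identical" ;;
  1) status_msg="files differ" ;;
  *) status_msg="cmp reported an error" ;;
esac
echo "$status_msg"    # files differ
```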
Alternatively I could have used a variable to hold the value of $?:
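Here is a sketch of the variable approach; rc is a hypothetical name. The key point is to copy $? on the very next line, before any test ([) replaces it.

```bash
printf 'User1 US\nUser2 US\nUser3 US\n' > File_1.txt
printf 'User1 US\nUser2 US\nUser3 NG\n' > File_2.txt

cmp -s File_1.txt File_2.txt
rc=$?                        # save cmp's status before anything else runs

if [ "$rc" -eq 0 ]; then     # this [ sets $? to its own result...
  echo "files are identical"
elif [ "$rc" -eq 1 ]; then   # ...so testing $? here would no longer see cmp's status
  echo "files differ"
else
  echo "cmp reported an error" >&2
fi
```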
Above does the same thing as the case statement. IDK which I like better.
Clear the Output
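One common way to clear a file from a script is to redirect the output of the no-op command : into it. This is sketched here as an assumption; the script may use another form, such as echo -n > file.

```bash
echo "stale line" > Output_File   # pretend a previous run left something behind
: > Output_File                   # truncate to zero bytes (creates the file if missing)
if [ ! -s Output_File ]; then     # -s is true only for a non-empty file
  echo "Output_File is empty"
fi
```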
Above clears the output file so if no users changed, the output file will be empty.
I do this inside the case statements so that Output_File remains unchanged on error.
Copy User File to Shell Script
Above copies File_1.txt to ~/.colcmp.arrays.tmp.sh, a hidden file in the current user's home dir.
For example, if the current user is john, the above would be the same as cp "File_1.txt" /home/john/.colcmp.arrays.tmp.sh
Escape Special Characters
Basically, I'm paranoid. I know that these characters could have special meaning or execute an external program when run in a script as part of variable assignment:
What I don't know is how much I don't know about bash. I don't know what other characters might have special meaning, but I want to escape them all with a backslash:
sed can do a lot more than regular expression pattern matching. The script pattern "s/(find)/(replace)/" specifically performs a find-and-replace.
"s/(find)/(replace)/(modifiers)"
in english: capture any punctuation or special character as capture group 1 (\\1)
in english: prefix all special characters with a backslash
in english: if more than one match is found on the same line, replace them all
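The exact character list is not given above, so this sketch assumes the POSIX class [[:punct:]] stands in for "any punctuation or special character". The (find) part captures one such character as group 1. Inside single quotes, the replacement \\\1 is a literal backslash followed by group 1. The g modifier applies the substitution to every match on the line.

```bash
line='User4 a$b"c'
# Prefix every punctuation character with a backslash
escaped=$(printf '%s\n' "$line" | sed -E 's/([[:punct:]])/\\\1/g')
printf '%s\n' "$escaped"    # User4 a\$b\"c
```

One caveat applies if the escaped text is later placed inside double quotes. Bash only removes a backslash there when it precedes $, a backtick, ", \ or a newline. So a value like U.S, escaped to U\.S and assigned as "U\.S", keeps its backslash.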
Comment Out the Entire Script
Above uses a regular expression to prefix every line of ~/.colcmp.arrays.tmp.sh with a bash comment character (#). I do this because later I intend to execute ~/.colcmp.arrays.tmp.sh using the source command and because I don't know for sure the whole format of File_1.txt.
I don't want to accidentally execute arbitrary code. I don't think anyone does.
"s/(find)/(replace)/"
in english: capture each line as capture group 1 (\\1)
in english: replace each line with a pound symbol followed by the line that was replaced
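Here is a sketch of the comment-out step. It writes to a local file, arrays.tmp.sh, a hypothetical name standing in for ~/.colcmp.arrays.tmp.sh. It also redirects sed's output instead of editing in place, because sed -i takes different arguments on GNU and BSD systems.

```bash
printf 'User1 US\nUser2 US\nUser3 US\n' > File_1.txt

# ^(.*)$ captures the whole line as group 1; "#\1" writes it back behind a "#"
sed -E 's/^(.*)$/#\1/' File_1.txt > arrays.tmp.sh
cat arrays.tmp.sh
# #User1 US
# #User2 US
# #User3 US
```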
Convert User Value to A1[User]="value"
Above is the core of this script.
Converts
#User1 US
to
A1[User1]="US"
or
A2[User1]="US" (for the 2nd file)
"s/(find)/(replace)/"
in english: capture the user name as capture group 1, and
capture the rest of the line as capture group 2
(replace) = A1\\[\\1\\]=\"\\2\"
A1\\[ = start array assignment in an array called A1
\\] = close the array key, e.g. A1[User1]="US"
= = assignment operator, e.g. variable=value
\" = quote value to capture spaces ... although now that I think about it, it would have been easier to let the code above that backslashes everything also backslash space characters.
in english: replace each line in the format
#name value
with an array assignment operator in the format A1[name]="value"
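Here is a sketch of the conversion. It assumes the user name contains no spaces, so group 1 is ([^ ]+), everything up to the first space. Like the previous steps, it writes to hypothetical local files instead of ~/.colcmp.arrays.tmp.sh. Lines that don't match the pattern keep their leading #, so they stay harmless comments when the file is sourced.

```bash
# Commented copy of File_1.txt, plus an empty line ("#") that will not match
printf '#User1 US\n#User2 US\n#User3 US\n#\n' > arrays.tmp.sh

sed -E 's/^#([^ ]+) (.*)$/A1[\1]="\2"/' arrays.tmp.sh > A1.tmp.sh
cat A1.tmp.sh
# A1[User1]="US"
# A1[User2]="US"
# A1[User3]="US"
# #
```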
Make Executable
Above uses chmod to make the array script file executable.
I'm not sure if this is necessary.
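On the chmod question: source only needs read permission, because the current shell reads the file itself. The execute bit matters only when the file is run as a program, for example ./file.sh. Here is a quick sketch with a hypothetical file, vars.tmp.sh.

```bash
printf 'greeting="hello"\n' > vars.tmp.sh
chmod a-x vars.tmp.sh     # make sure the file is NOT executable
source ./vars.tmp.sh      # still works: source reads the file, it does not exec it
echo "$greeting"          # hello
```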
Declare Associative Array (bash v4+)
The capital -A indicates that the variables declared will be associative arrays.
This is why the script requires bash v4 or greater.
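The distinction matters because bash treats the subscript of an ordinary (indexed) array as an arithmetic expression. A user name like User1 is read as an unset variable, which evaluates to 0, so every user would overwrite index 0. Here is a sketch comparing the two; the array names are hypothetical.

```bash
declare -a indexed        # ordinary (indexed) array
declare -A assoc          # associative array (bash 4+)

# The subscript of an indexed array is arithmetic: User1 and User2 are
# unset variable names, both evaluate to 0, so the second assignment
# overwrites the first.
indexed[User1]=US
indexed[User2]=NG

assoc[User1]=US
assoc[User2]=NG

echo "${#indexed[@]} ${#assoc[@]}"   # 1 2
echo "${indexed[0]}"                  # NG
```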
Execute our Array Variable Assignment Script
We have already converted lines of
User value
to lines of A1[User]="value".
Above we source the script to run it in the current shell. We do this so we can keep the variable values that get set by the script. If you execute the script directly, it spawns a new shell, and the variable values are lost when the new shell exits, or at least that's my understanding.
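That understanding matches how bash behaves. Running the file with bash starts a child process whose variables vanish when it exits, while source runs the lines in the current shell. Here is a sketch with a hypothetical two-line array script.

```bash
declare -A A1
printf 'A1[User1]="US"\nA1[User3]="US"\n' > A1.tmp.sh

bash A1.tmp.sh                              # child shell: its A1 is discarded on exit
echo "after bash:   ${#A1[@]} entries"      # after bash:   0 entries

source ./A1.tmp.sh                          # current shell: assignments stick
echo "after source: ${#A1[@]} entries"      # after source: 2 entries
```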
This Should Be a Function
We do the same thing for $1 and A1 that we do for $2 and A2. It really should be a function. I think at this point this script is confusing enough and it works, so I'm not gonna fix it.
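For what it's worth, a function version could look like the sketch below. load_pairs is a hypothetical name. It repeats the comment-out and conversion steps but leaves out the escaping step for brevity. The target array must be declared with declare -A outside the function. That way, the assignments made by the sourced file inside the function go to that existing global array instead of creating a new variable.

```bash
# Hypothetical helper: load_pairs FILE ARRAY_NAME
# Turns each "name value" line of FILE into ARRAY_NAME[name]="value" and sources it.
load_pairs() {
  local file=$1 array=$2 tmp
  tmp=$(mktemp) || return 1
  # 1st expression: comment out every line; 2nd: convert "#name value" lines.
  # Lines that don't match stay commented out and do nothing when sourced.
  sed -E -e 's/^/#/' -e "s/^#([^ ]+) (.*)\$/${array}[\\1]=\"\\2\"/" "$file" > "$tmp"
  source "$tmp"
  rm -f "$tmp"
}

printf 'User1 US\nUser2 US\nUser3 US\n' > File_1.txt
printf 'User1 US\nUser2 US\nUser3 NG\n' > File_2.txt

declare -A A1 A2          # must exist (as associative arrays) before the calls
load_pairs File_1.txt A1
load_pairs File_2.txt A2
echo "User3: ${A1[User3]} -> ${A2[User3]}"   # User3: US -> NG
```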
Detect Users Removed
Above loops through associative array keys
Above uses variable substitution to detect the difference between a variable that is unset and one that has been explicitly set to a zero-length string.
Apparently, there are a lot of ways to see if a variable has been set. I chose the one with the most votes.
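A common form of this test is ${name+x}. It yields x when the key exists, even if its value is empty, and nothing when the key was never set. The similar ${name:+x} would also yield nothing for an empty value. Here is a sketch with a hypothetical array.

```bash
declare -A A2=([User1]=US [User2]="")   # User2 is set, but to an empty string

for key in User1 User2 User3; do
  if [ -n "${A2[$key]+x}" ]; then
    echo "$key: set"
  else
    echo "$key: unset"
  fi
done
# User1: set
# User2: set
# User3: unset
```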
Above adds the user $i to the Output_File
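Put together, the removed-user check can be sketched as follows. The sample arrays are hypothetical (User2 is missing from the second file). As an assumption, the sketch writes just the user name to Output_File.

```bash
declare -A A1=([User1]=US [User2]=US [User3]=US)
declare -A A2=([User1]=US [User3]=NG)     # hypothetical: User2 was removed

: > Output_File
for i in "${!A1[@]}"; do                  # every user in the 1st file
  if [ -z "${A2[$i]+x}" ]; then           # key never set in A2: user removed
    echo "$i" >> Output_File              # assumption: write just the name
  fi
done

cat Output_File     # User2
```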
Detect Users Added or Changed
Above clears a variable so we can keep track of users that did not change.
Above loops through associative array keys
Above uses variable substitution to see if a variable has been set.
Because $i is the array key (user name), ${A2[$i]} should return the value associated with the current user from File_2.txt.
For example, if $i is User1, the above reads as ${A2[User1]}
Above adds the user $i to the Output_File
Because $i is the array key (user name), ${A1[$i]} should return the value associated with the current user from File_1.txt, and ${A2[$i]} should return the value from File_2.txt.
Above compares the associated values for user $i from both files.
Above adds the user $i to the Output_File
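Here is a sketch of the added-or-changed check with hypothetical sample data: User3 changed, and User4 exists only in the second file. It loops over the keys of A2 and treats a key missing from A1 as added; otherwise it compares the two values. Output_File again gets just the user names.

```bash
declare -A A1=([User1]=US [User2]=US [User3]=US)
declare -A A2=([User1]=US [User2]=US [User3]=NG [User4]=FR)   # hypothetical data

: > Output_File
for i in "${!A2[@]}"; do                       # every user in the 2nd file
  if [ -z "${A1[$i]+x}" ]; then                # not in the 1st file: added
    echo "$i" >> Output_File
  elif [ "${A1[$i]}" != "${A2[$i]}" ]; then    # in both, but the value differs: changed
    echo "$i" >> Output_File
  fi
done

sort Output_File    # keys come out in no particular order, so sort for display
# User3
# User4
```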
Above creates a comma separated list of users who did not change. Note there are no spaces in the list, or else the next check would need to be quoted.
Above reports the value of $USERSWHODIDNOTCHANGE but only if there is a value in $USERSWHODIDNOTCHANGE. The way this is written, $USERSWHODIDNOTCHANGE cannot contain any spaces. If it does need spaces, above could be rewritten as follows:
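Here is a sketch of building that comma-separated list and of the quoted check. ${USERSWHODIDNOTCHANGE:+,} adds a comma only when the list is already non-empty. The sample arrays and the "No change:" message are hypothetical.

```bash
declare -A A1=([User1]=US [User2]=US [User3]=US)
declare -A A2=([User1]=US [User2]=US [User3]=NG)   # hypothetical data

USERSWHODIDNOTCHANGE=
for i in "${!A2[@]}"; do
  if [ -n "${A1[$i]+x}" ] && [ "${A1[$i]}" = "${A2[$i]}" ]; then
    # ${VAR:+,} expands to "," only when VAR is already non-empty
    USERSWHODIDNOTCHANGE+="${USERSWHODIDNOTCHANGE:+,}$i"
  fi
done

# Quoted, so the test keeps working even if the list ever contains spaces
if [ -n "$USERSWHODIDNOTCHANGE" ]; then
  echo "No change: $USERSWHODIDNOTCHANGE"    # e.g. No change: User1,User2 (order may vary)
fi
```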