Right now we are attempting to build gold configurations for our environment. One piece of software that we use relies on large XML files to contain the bulk of its configuration. We want to take our lab environment, catalog it as our "gold configuration", and then be able to audit against that configuration in the future.
Since diff is a textual, line-by-line comparison and NOT a logical comparison, we can't use it to compare files in this case (the order of elements and attributes in these files carries no meaning, so two equivalent files can still differ as text). What I am looking for is something that can parse the two XML files and compare them element by element. So far we have yet to find any utilities that can do this. The OS doesn't matter; I can run it on whatever platform it works on. The preference is something off the shelf.
Any ideas?
Edit: One issue we have run into is that one vendor's config files will occasionally mention the same element several times, each time with different attributes. Whatever diff utility we use would need either to tell the occurrences apart by their sets of attributes or to treat them all as part of one element. Tall order :)
Two approaches that I use are (a) to canonicalize both XML files and then compare their serializations, and (b) to use the XPath 2.0 deep-equal() function. Both approaches are OK for telling you whether the files are the same, but not very good at telling you where they differ.
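As a rough illustration of approach (a), here is a minimal sketch that uses the `canonicalize()` function from Python's standard library (`xml.etree.ElementTree`, Python 3.8 or later). The function name `same_canonical_form` and the sample documents are made up for the example. Note that canonicalization sorts attributes and normalizes tag syntax, but it does not reorder elements.

```python
from xml.etree.ElementTree import canonicalize  # available in Python 3.8+


def same_canonical_form(xml_a: str, xml_b: str) -> bool:
    """Return True if both documents have the same C14N 2.0 serialization."""
    # strip_text=True removes leading/trailing whitespace in text content,
    # so differences in indentation are ignored.
    return canonicalize(xml_a, strip_text=True) == canonicalize(xml_b, strip_text=True)


# Hypothetical sample documents, not taken from any real product.
a = '<config><server port="80" name="web"/></config>'
b = '<config>\n  <server name="web" port="80"></server>\n</config>'
c = '<config><server name="web" port="8080"/></config>'

print(same_canonical_form(a, b))  # attribute order and formatting differ only
print(same_canonical_form(a, c))  # the port value differs
```

This only answers "same or not"; it says nothing about where the documents differ.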
A commercial tool that specializes in this problem is DeltaXML.
If you have things that you consider equivalent, but which aren't equivalent at the XML level - for example, elements in a different order - then you may have to be prepared to do a transformation to normalize the documents before comparison.
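One possible normalizing transformation, sketched in Python with the standard library only: sort every element's attributes and children recursively, then compare the serializations. The helper names `normalize` and `equivalent` are invented for this sketch, and it assumes that sibling order and whitespace never matter in your files, which may not hold for every vendor's format. Repeated elements that differ only in their attributes stay distinct, because each one is sorted by its full serialized form.

```python
import xml.etree.ElementTree as ET


def normalize(elem):
    """Recursively put attributes and child elements into a fixed order."""
    for child in elem:
        normalize(child)
    # Re-insert attributes in sorted order so serialization is deterministic.
    attrs = sorted(elem.attrib.items())
    elem.attrib.clear()
    elem.attrib.update(attrs)
    # Drop surrounding whitespace; tail text is discarded entirely, which is
    # an assumption that suits configuration data but not mixed content.
    elem.text = (elem.text or "").strip() or None
    elem.tail = None
    # Children are already normalized, so their serialized form is a stable key.
    elem[:] = sorted(elem, key=ET.tostring)
    return elem


def equivalent(xml_a, xml_b):
    """Compare two documents while ignoring sibling and attribute order."""
    a = normalize(ET.fromstring(xml_a))
    b = normalize(ET.fromstring(xml_b))
    return ET.tostring(a) == ET.tostring(b)


# Hypothetical documents with a repeated <pool> element.
gold = '<cfg><pool name="a" size="5"/><pool name="b" size="9"/><log level="info"/></cfg>'
same = '<cfg>\n <log level="info"/>\n <pool size="9" name="b"/>\n <pool name="a" size="5"/>\n</cfg>'
changed = '<cfg><pool name="a" size="6"/><pool name="b" size="9"/><log level="info"/></cfg>'

print(equivalent(gold, same))
print(equivalent(gold, changed))
```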
Good answer here:
Apologies for any failure to adhere to serverfault conventions ... I'm sure someone will let me know and I will amend appropriately.
Python-based xmldiff looks like a very attractive solution; it claims to "extract differences between two xml files and to return a set of primitives to apply on source tree to obtain the destination tree."
Example:
a1.xml
a2.xml
xmldiff a1.xml a2.xml
I wrote a simple Python tool for this called xmldiffs.
Get it at https://github.com/joh/xmldiffs
There are commercial and free tools that are "XML aware" and will do what you need without much hassle. There is also apparently an Eclipse plugin that will compare XML.
Alternatively, you could impose an ordering on the elements with xsl:sort, or by specifying a sequence order in your XSD (which would make any existing documents that don't follow it invalid by definition). Then "normalize" both documents before comparing them with a non-XML-aware diff tool. Normalizing first removes any formatting differences that don't affect the actual content of the XML. (Think of it as applying an XML code style filter.)
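If XSLT is not convenient, a similar "normalize, then use a plain text diff" workflow can be sketched in Python. This is a different technique from xsl:sort: it flattens each document into one sorted line per element (path plus sorted attributes) and hands the lines to the standard `difflib` module. The names `flatten` and `report` and the labels "gold" and "current" are assumptions made for the illustration.

```python
import difflib
import xml.etree.ElementTree as ET


def flatten(elem, path=""):
    """Yield one line per element: its path, sorted attributes, and text."""
    attrs = "".join(f'[@{k}="{v}"]' for k, v in sorted(elem.attrib.items()))
    # Attributes are kept in the path so that children of repeated
    # elements remain tied to the right parent.
    here = f"{path}/{elem.tag}{attrs}"
    text = (elem.text or "").strip()
    yield f"{here} = {text}" if text else here
    for child in elem:
        yield from flatten(child, here)


def report(xml_a, xml_b):
    """Return unified-diff lines between the flattened, sorted documents."""
    lines_a = sorted(flatten(ET.fromstring(xml_a)))
    lines_b = sorted(flatten(ET.fromstring(xml_b)))
    return list(difflib.unified_diff(lines_a, lines_b, "gold", "current",
                                     lineterm="", n=0))


# Hypothetical documents: same content in a different order, one value changed.
gold = '<cfg><pool name="a" size="5"/><log level="info"/></cfg>'
current = '<cfg><log level="debug"/><pool size="5" name="a"/></cfg>'

for line in report(gold, current):
    print(line)
```

Because the lines are sorted, element order is ignored, and the diff output points at the path of whatever changed, which helps with the "where do they differ" question.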