XML Diff Comparator
Compare two XML documents locally with parser diagnostics, keyed path matching, namespace-aware values, patch-operation rows, XML change scripts, chart counts, and exports.
XML diffs are most useful when they compare the document tree instead of treating the file as ordinary lines of text. A feed, SOAP payload, sitemap, SVG, WSDL, Maven file, or vendor configuration can change in ways that are easy to miss in a plain text diff: one attribute value changes, a repeated record moves, a namespace prefix is renamed, or a direct text value is edited inside a deeply nested element.
A parsed XML comparison works from the elements, attributes, namespace-aware names, and direct text that an XML parser can read. That makes the review closer to the data structure the receiving system will see. It also makes the limits clearer: the documents must be well-formed XML before any comparison is trustworthy, and a successful parse is not the same as validation against an XML Schema, DTD, or application contract.
Tree-aware comparison is helpful during release reviews, data migration checks, configuration audits, and vendor payload debugging. The important question is usually not only whether two files differ, but where the changed value lives and whether the difference is a real data change or review noise caused by order, whitespace, or namespace-prefix choices.
How to Use This Tool:
Start with a Review profile, then place the baseline document in Original XML and the changed document in Revised XML. You can paste text, format either side, browse for one XML-like file, drop one file onto each textarea, or load the sample pair to see the expected review flow.
- Choose a Review profile for the starting rule set. The config drift, schema evolution, document content, and strict position audit profiles each tune matching, scope, whitespace, namespace, context, and identity attributes.
- Review Node matching. Keyed sibling matching uses identity attributes such as id, name, key, code, uid, and sku before falling back to sibling position. Exact path position treats repeated elements as position-dependent.
- Paste or load the original XML. If the file is not XML-like or is larger than 2 MB, the file status explains what to fix before the comparison can proceed.
- Paste or load the revised XML. Results update after edits, drops, file loads, or rule changes, so you can adjust settings without rerunning a separate command.
- Set Diff scope to show changed, added, and removed records for review, the full parsed tree for audit, content-only differences, or text-node changes only.
- Pick a Whitespace policy. Normalized whitespace reduces formatting noise by compacting text and attribute spacing; preserved whitespace is better when spaces inside text values are meaningful.
- Pick a Namespace names policy. Local names with namespace URI help align documents that use different prefixes for the same namespace, while qualified prefix names preserve literal prefix differences.
- Review the summary, then use XML Change Script, Node Change Ledger, XML Patch Ops, Comparison Audit, XML Change Mix Chart, and JSON for the level of detail you need.
When the summary says Needs valid XML, fix the parse message first. A diff generated from a failed parse would hide the document structure that XML consumers actually read.
Interpreting Results:
The headline count is the number of material parsed-record differences across the full XML tree. A material difference can be a changed value, a record that appears only in the revised document, or a record that appears only in the original document. The visible ledger count can be smaller when the selected diff scope filters the review down to text-only or content-only changes.
- Changed means both documents contain the matched record, but the compared value differs after the active namespace and whitespace policies are applied.
- Added means the record exists only in the revised XML.
- Removed means the record exists only in the original XML.
- Unchanged appears when the full tree or context rows include records that matched without a material value change.
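The four statuses above can be sketched as a comparison of two path-to-value maps. This is a minimal illustration, not the tool's implementation: the `Records` shape, the `classify` helper, and the sample paths are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: display path -> compared value.
Records = Dict[str, str]


def classify(original: Records, revised: Records) -> List[Tuple[str, str]]:
    """Return (path, status) pairs using the four statuses described above."""
    rows = []
    for path in sorted(set(original) | set(revised)):
        if path not in original:
            status = "added"      # only in the revised document
        elif path not in revised:
            status = "removed"    # only in the original document
        elif original[path] != revised[path]:
            status = "changed"    # matched record, different value
        else:
            status = "unchanged"  # matched record, same value
        rows.append((path, status))
    return rows


original = {
    "/order/@id": "7",
    "/order/item[@sku='A1']/qty/text()": "2",
    "/order/note/text()": "rush",
}
revised = {
    "/order/@id": "7",
    "/order/item[@sku='A1']/qty/text()": "3",
    "/order/item[@sku='B2']/qty/text()": "1",
}

rows = classify(original, revised)
# Material differences exclude unchanged rows.
material = [row for row in rows if row[1] != "unchanged"]
print(len(material))  # 3
```

Only changed, added, and removed rows count as material here, which mirrors how the headline count is described.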
Do not treat a low change count as proof that two XML files are equivalent for every consuming system. Check the Comparison Audit for matching, namespace, whitespace, depth, and warning notes, then inspect the exact paths and values in the Node Change Ledger.
Technical Details:
XML has a logical tree structure made from elements, attributes, character data, processing instructions, comments, namespaces, and other information items. A practical diff for configuration and data-review work usually focuses on the parts that affect application data most often: element names, attribute values, and text held directly inside elements.
Well-formedness comes first. XML processors are required to detect malformed syntax, such as missing end tags or improperly nested markup, before a reliable tree can be built. Validation is a separate question. A well-formed file can still violate an XSD, DTD, business rule, required field, or allowed-value list.
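As a rough illustration of the well-formedness gate, the sketch below uses Python's standard `xml.etree.ElementTree` parser rather than the browser parser the tool relies on. The `parse_issue` helper is a hypothetical name, and the exact error text depends on the parser.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_issue(xml_text: str) -> Optional[str]:
    """Return None when the text is well-formed, else a short issue message."""
    if not xml_text.strip():
        return "empty input"
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The message wording and position come from the underlying parser.
        return str(exc)
    return None


print(parse_issue("<invoice><status>draft</status></invoice>"))     # None
print(parse_issue("<invoice><status>draft</invoice>") is not None)  # True
```

A `None` result only means a tree could be built. It says nothing about whether the document satisfies an XSD, DTD, or business rule.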
Transformation Core:
The comparison turns each readable XML document into comparable records, then assigns each record a review status.
| Stage | Compared content | Review effect |
|---|---|---|
| Parse | Each document must produce one XML document element. | Empty input, malformed XML, or missing document elements stop the comparison and produce an input issue. |
| Record collection | Element entries, attributes, and direct text or CDATA text are recorded with display paths. | Comments, processing instructions, schema validity, and DTD semantics are not treated as ledger rows. |
| Path matching | Repeated elements match by identity attributes when available, or by sibling position when no key is found. | Stable keys reduce reorder noise in feeds, orders, inventories, and configuration lists. |
| Value comparison | Matched element names, attribute values, and direct text values are compared after the active whitespace and namespace policies. | Each comparable record becomes changed, added, removed, or unchanged. |
| Scope filtering | The full comparison is filtered for the selected review scope. | The summary can count more material changes than the currently visible ledger when the scope is narrower. |
| Patch operation mapping | Material changes are expressed as add, remove, or replace operations with XPath-like selectors. | The XML Patch Ops tab gives reviewers a compact handoff list separate from the full ledger. |
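The record collection stage can be approximated as a tree walk that flattens elements, attributes, and direct text into path-keyed records. This is a minimal sketch using Python's `xml.etree.ElementTree`. The `collect_records` helper, the positional path format, and the trimming of text values are assumptions made for illustration, not the tool's actual code.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def collect_records(xml_text: str) -> Dict[str, str]:
    """Flatten elements, attributes, and direct text into path -> value records."""
    records: Dict[str, str] = {}

    def walk(element: ET.Element, path: str) -> None:
        records[path] = element.tag  # element entry
        for name, value in element.attrib.items():
            records[f"{path}/@{name}"] = value
        # Direct text is the text before the first child plus the text
        # that follows each child (stored by ElementTree as child.tail).
        text = (element.text or "") + "".join(child.tail or "" for child in element)
        if text.strip():
            # Trimmed for brevity; whitespace-only text is skipped (assumption).
            records[f"{path}/text()"] = text.strip()
        seen: Dict[str, int] = {}
        for child in element:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    root = ET.fromstring(xml_text)
    walk(root, f"/{root.tag}[1]")
    return records


original = collect_records('<item sku="A1"><qty>2</qty></item>')
revised = collect_records('<item sku="A1"><qty>3</qty></item>')
changed = {
    path: (original[path], revised[path])
    for path in original
    if path in revised and original[path] != revised[path]
}
print(changed)  # {'/item[1]/qty[1]/text()': ('2', '3')}
```

The default ElementTree parser drops comments and processing instructions, so they never become records in this sketch.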
Rule Core:
Most interpretation errors come from matching policy, namespace treatment, or whitespace treatment rather than from the changed value itself.
| Choice | Rule | Use when | Watch for |
|---|---|---|---|
| Keyed sibling matching | Identity attributes are tried before sibling position. | Repeated records have stable values such as id, sku, or code. | Wrong or missing identity attributes can still fall back to position. |
| Exact path position | Repeated siblings match by their order in the parsed tree. | Order itself is meaningful, such as ordered steps or priority lists. | Pure reordering can appear as value changes. |
| Normalize whitespace | Runs of whitespace are compacted before text and attribute values are compared. | Indentation, wrapping, or pretty-printing should not dominate the diff. | Meaningful spacing inside a text value may be hidden. |
| Preserve exact whitespace | Text and attribute values keep their exact spacing for comparison. | Whitespace carries meaning in the data or review policy. | Formatting-only edits can create many text changes. |
| Local names and namespace URI | Names are aligned by namespace URI plus local name. | Two documents use different prefixes for the same vocabulary. | Namespace URI strings must still match exactly. |
| Qualified prefix names | Literal prefixes are kept as part of the names shown and compared. | Prefix changes are part of the review contract. | Prefix aliases can add noise when the namespace URI is what matters. |
Canonical XML is a stricter standards-defined way to normalize a document representation for signature and equivalence use cases. This comparator does not claim canonical XML equivalence. It gives a review-oriented diff over parsed records so humans can inspect changed paths, values, and comparison settings.
Limitations and Privacy Notes:
The comparison runs in the browser and is meant for review, not for proving every XML-processing or security property of a document. Use the output as a structured change report, then confirm critical changes against the receiving system's schema, parser, and business rules.
- Browsed or dropped files are limited to one XML-like file per input and must be smaller than 2 MB.
- Only well-formed XML with a document element can be compared.
- The ledger focuses on elements, attributes, and direct text or CDATA text. It does not validate schemas or compare comments, processing instructions, DTD behavior, or external entity behavior as review rows.
- XML security issues such as external entities and entity expansion depend on parser configuration in the environment that will process the XML. Treat untrusted XML carefully before using it in production systems.
Worked Examples:
A quantity update in an order is a typical value-change case. With original XML like <item sku="A1"><qty>2</qty></item> and revised XML changing the quantity to 3, keyed sibling matching keeps the record anchored to sku="A1". The summary reports a material change, and Node Change Ledger shows a Changed text row at a path ending in /qty[1]/text() with original value 2 and revised value 3.
A reordered list needs a matching decision before the count means much. If the original XML lists <item sku="A1"/> followed by <item sku="B2"/> and the revised XML swaps them, keyed sibling matching can align each item by sku. Exact path position instead uses the first and second item positions as the comparison anchors, so the Comparison Audit warning about reorder noise is a cue to confirm whether order is meaningful.
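The difference between the two matching modes in the reorder case can be sketched as follows. The `child_segments` and `item_records` helpers are hypothetical, and the priority order among identity attributes is an assumption. Only the attribute names themselves come from the tool's description.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Attribute names listed for keyed sibling matching; the priority order is assumed.
IDENTITY_ATTRIBUTES = ("id", "name", "key", "code", "uid", "sku")


def child_segments(parent: ET.Element, keyed: bool) -> List[str]:
    """Build one path segment per child: keyed when possible, else positional."""
    segments = []
    seen: Dict[str, int] = {}
    for child in parent:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        key = None
        if keyed:
            key = next((a for a in IDENTITY_ATTRIBUTES if a in child.attrib), None)
        if key is not None:
            segments.append(f"{child.tag}[@{key}='{child.attrib[key]}']")
        else:
            # No identity attribute found: fall back to sibling position.
            segments.append(f"{child.tag}[{seen[child.tag]}]")
    return segments


def item_records(xml_text: str, keyed: bool) -> Dict[str, Dict[str, str]]:
    """Map each child's path segment to its attributes."""
    root = ET.fromstring(xml_text)
    return {seg: dict(child.attrib) for seg, child in zip(child_segments(root, keyed), root)}


original = '<order><item sku="A1"/><item sku="B2"/></order>'
revised = '<order><item sku="B2"/><item sku="A1"/></order>'

print(item_records(original, True) == item_records(revised, True))    # True
print(item_records(original, False) == item_records(revised, False))  # False
```

With keyed segments the swap produces identical records. With positional segments both positions differ, which is the reorder noise the audit warns about.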
A parse failure must be fixed before any diff review. If the original input contains <invoice><status>draft</invoice>, the summary changes to Needs valid XML, the alert reports an original XML parse error, and the Comparison Audit shows an Input issue. The missing </status> end tag must be restored before ledger rows or chart counts can be trusted.
FAQ:
Does this validate XML against an XSD or DTD?
No. It reports parse errors for XML that is not well formed and then compares parsed records. Schema validity, DTD defaults, required fields, and business rules still need to be checked with the system or validator that owns that contract.
Why did reordering repeated elements create changes?
Check Node matching. Exact path position treats the first, second, and later siblings as separate anchors. If repeated records have stable attributes such as id, sku, or code, keyed sibling matching usually gives a more useful review.
Why are whitespace edits showing up?
The Whitespace policy controls this. Normalize whitespace when indentation and line wrapping should not matter. Preserve exact whitespace when spaces inside text or attribute values carry meaning for the receiving system.
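A small sketch of the two whitespace policies, assuming that normalization collapses every run of whitespace to one space and trims the ends. The tool's exact normalization rule is not documented here, and `values_match` is a hypothetical helper.

```python
def values_match(original: str, revised: str, normalize: bool) -> bool:
    """Compare two text or attribute values under a whitespace policy."""
    if normalize:
        # Collapse whitespace runs to single spaces and trim (assumed rule).
        original = " ".join(original.split())
        revised = " ".join(revised.split())
    return original == revised


pretty = "\n    Ships in\n    two boxes\n  "
compact = "Ships in two boxes"

print(values_match(pretty, compact, normalize=True))   # True
print(values_match(pretty, compact, normalize=False))  # False
# Normalizing also hides a double space that might be meaningful.
print(values_match("A  B", "A B", normalize=True))     # True
```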
What should I use for namespace prefixes?
Use local names and namespace URI when different prefixes point to the same XML vocabulary. Use qualified prefix names when the literal prefix spelling is part of the review requirement or when you need to see prefix changes explicitly.
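The two naming policies can be illustrated with Python's standard `xml.dom.minidom`, which exposes both the literal qualified name and the namespace URI of an element. The `name_key` helper, the policy labels, and the `urn:example:finance` URI are hypothetical.

```python
from xml.dom import minidom


def name_key(xml_text: str, policy: str) -> str:
    """Return the comparison name of the document element under a policy."""
    root = minidom.parseString(xml_text).documentElement
    if policy == "local-and-uri":
        # Prefix is ignored; the namespace URI plus local name is compared.
        return f"{{{root.namespaceURI}}}{root.localName}"
    # Qualified prefix name: the literal prefix stays part of the name.
    return root.tagName


original = '<fin:status xmlns:fin="urn:example:finance">open</fin:status>'
revised = '<f:status xmlns:f="urn:example:finance">open</f:status>'

print(name_key(original, "local-and-uri"))  # {urn:example:finance}status
print(name_key(original, "local-and-uri") == name_key(revised, "local-and-uri"))  # True
print(name_key(original, "qualified") == name_key(revised, "qualified"))          # False
```

The URI comparison is a plain string comparison, so two URIs that differ by a single character are treated as different namespaces.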
Is pasted XML uploaded for comparison?
No upload is needed for the comparison itself. Pasted text and browsed files are read in the browser, and export buttons create local copies of the current script, tables, chart, or JSON report.
Glossary:
- Well-formed XML
- XML that satisfies the syntax rules needed to build a parsed document tree.
- Namespace URI
- The URI string that identifies an XML vocabulary for namespace-aware names.
- Local name
- The name part of an element or attribute after any namespace prefix is removed.
- Qualified name
- The literal XML name form that may include a prefix, such as fin:status.
- Identity attribute
- An attribute value used to match repeated sibling elements across the original and revised documents.
- Direct text
- Text or CDATA content that belongs directly to an element, not to one of its child elements.
References:
- Extensible Markup Language (XML) 1.0 (Fifth Edition), W3C Recommendation, 26 November 2008.
- Namespaces in XML 1.0 (Third Edition), W3C Recommendation, 8 December 2009.
- XML Information Set (Second Edition), W3C Recommendation, 4 February 2004.
- Canonical XML Version 1.1, W3C Recommendation, 2 May 2008.
- Parsing and serializing XML, MDN Web Docs, 13 October 2025.
- XML External Entity Prevention Cheat Sheet, OWASP Cheat Sheet Series.