XML Diff Comparator

Review profile:

Start from a config, schema, document, or strict positional review profile; then adjust any rule below.

Node matching:

Choose whether repeated XML elements match by identity attributes or by exact path position.

Original XML:

Paste the baseline XML or drop one XML-like file onto the textarea.

Revised XML:

Paste the changed XML or drop one XML-like file onto the textarea.

Diff scope:

Show only material changes for review, all parsed records for audit, or content-only differences.

Whitespace policy:

Normalize for feed/config reviews; preserve when text spacing is meaningful.

Namespace names:

Use local names when documents may use different prefixes for the same namespace URI.

Comparison actions:

Use the sample pair for a quick cold-start review, swap the comparison direction, format both sources, or clear both editors.

XML pair

Diff updates automatically after edits, drops, swaps, or rule changes.

Script context:

Use 0 for compact scripts or more rows for review context.

Identity attributes:

Default names cover common feed, SOAP, Maven, and config identifiers.

XML changes can look larger or smaller than they really are depending on how the file is compared. A line-by-line diff highlights characters and line positions, which is useful for hand-edited text but often noisy for XML. XML applications usually read a parsed tree: one document element, nested elements, attributes, namespace-aware names, and character data. A review that follows that tree is closer to what a feed reader, build system, SOAP client, sitemap crawler, or configuration loader will consume.

The same XML information can be serialized with different indentation, attribute order, namespace prefixes, or empty-element spelling. Some of those differences are harmless for a receiving system; others change the actual data. A useful review names the changed element, attribute, or text value and then checks whether that path matters to the system that reads it.
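To make the point about equivalent serializations concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It is an illustration only, not this tool's implementation; the helper name `tree_signature` and the sample `item` element are assumptions made for the example. It also trims text and ignores text that follows child elements, which is a simplification.

```python
import xml.etree.ElementTree as ET

def tree_signature(element):
    """Return a nested tuple of tag, sorted attributes, trimmed text, and children."""
    text = (element.text or "").strip()
    attributes = tuple(sorted(element.attrib.items()))
    children = tuple(tree_signature(child) for child in element)
    return (element.tag, attributes, text, children)

# Same information, different serialization: attribute order, indentation,
# and empty-element spelling all differ between the two strings.
original = '<item sku="A1" qty="2"><note></note></item>'
revised = """<item qty="2" sku="A1">
  <note/>
</item>"""

same_tree = tree_signature(ET.fromstring(original)) == tree_signature(ET.fromstring(revised))
print(same_tree)  # True
```

A line-by-line diff of the two strings would flag every line, while the parsed signatures are identical.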

Common XML comparison questions and risks
Review question	What matters	Common trap
Did a value change?	Attribute values and direct text inside the matched element.	Pretty-printing or wrapped text can distract from the actual value.
Did a record move?	Stable identifiers such as an ID, code, SKU, or name.	Position-only comparison can turn a reorder into many apparent edits.
Did a namespace change?	The namespace URI and local name, not just the visible prefix.	Different prefixes can point to the same vocabulary.
Is the XML valid?	Well-formed XML can be parsed; schema validity is a separate contract.	A clean parse does not prove that an XSD, DTD, or business rule accepts the document.

Tree-aware comparison is most helpful when XML holds repeated records or structured settings: invoice line items, Maven dependencies, RSS entries, SVG elements, WSDL definitions, sitemap URLs, or vendor configuration blocks. In those files, a changed value can sit several levels deep, and a raw text diff can bury it among spacing, prefix, or ordering changes.

Figure: Parsed XML trees compared through matched element, attribute, and direct text records.

A parsed XML diff still has boundaries. It can show where two well-formed documents differ, but it cannot decide whether a changed field is allowed by a schema, whether a default from a DTD should be applied, or whether an application treats two values as equivalent. Treat the diff as a focused change review, then confirm important changes against the system that owns the XML contract.

How to Use This Tool:

Use the controls to decide how repeated nodes, whitespace, and namespace prefixes should be treated before you trust the change count.

Choose a Review profile. Config drift, Schema evolution, Document content, and Strict position audit set sensible starting values for node matching, scope, whitespace, namespace handling, context rows, and identity attributes.
Set Node matching. Keyed sibling matching tries identity attributes before using sibling order; Exact path position treats the first, second, and later repeated elements as distinct positions.
Paste, browse, or drop the baseline document into Original XML, then add the changed document to Revised XML. The file picker accepts XML-like text files such as XML, XSD, SVG, RSS, Atom, WSDL, TXT, and config files no larger than 2 MB.
Use Diff scope to choose the review view. Show changed, added, and removed records for a normal review; switch to All parsed nodes for an audit; use Text and attributes only or Text node changes only when element-name rows would add noise.
Pick a Whitespace policy. Normalize whitespace for formatted feeds and configuration files; preserve exact whitespace when spaces inside text or attribute values carry meaning.
Pick Namespace names. Local names and namespace URI avoids false differences when prefixes change but the namespace URI is the same. Qualified prefix names is stricter when prefix spelling is part of the review.
Open Advanced when you need more script context or a custom identity-attribute list. Results update after edits, drops, swaps, formatting, and rule changes.

If the summary changes to Needs valid XML, fix the reported parse error first. The result tabs are meaningful only once both documents parse into XML trees.

Interpreting Results:

The headline figure counts material changes across the full parsed comparison, even when the selected Diff scope shows a narrower ledger. A material change is a changed value, a record found only in the revised XML, or a record found only in the original XML.

Changed means a matched element, attribute, or direct text record exists in both documents but the compared value differs after the active whitespace and namespace rules.
Added means the record exists only in Revised XML.
Removed means the record exists only in Original XML.
Unchanged appears in full-tree or context views when a parsed record matches without a material value change.
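The four statuses can be sketched as a comparison of two path-to-value maps. This is a hedged illustration rather than the tool's code: the `classify` function, the record dictionaries, and the sample paths are all hypothetical, and the values are assumed to have already passed through the active whitespace and namespace rules.

```python
def classify(original_records, revised_records):
    """Label each record path as changed, added, removed, or unchanged."""
    statuses = {}
    for path in original_records.keys() | revised_records.keys():
        if path not in original_records:
            statuses[path] = "added"      # exists only in Revised XML
        elif path not in revised_records:
            statuses[path] = "removed"    # exists only in Original XML
        elif original_records[path] != revised_records[path]:
            statuses[path] = "changed"    # matched, but the value differs
        else:
            statuses[path] = "unchanged"
    return statuses

original_records = {"/order/@id": "7", "/order/status/text()": "draft", "/order/note/text()": "rush"}
revised_records = {"/order/@id": "7", "/order/status/text()": "sent", "/order/eta/text()": "2d"}

statuses = classify(original_records, revised_records)
# Material changes exclude unchanged records, as in the headline figure.
material_count = sum(1 for status in statuses.values() if status != "unchanged")
print(material_count)  # 3
```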

Start with XML Change Script when you need a readable path-by-path review. Use Node Change Ledger for the status, kind, original value, revised value, and review note behind each visible row. Use XML Patch Ops when you need a compact add, remove, or replace list. Use Comparison Audit to check parse counts, ignored comments or processing instructions, matching policy, namespace policy, whitespace policy, and warnings.

XML diff result artifacts and uses
Artifact	Use it for	What to verify
XML Change Script	A readable script-style review of paths and changed values.	Whether the selected diff scope and context rows include enough surrounding information.
Node Change Ledger	A row-by-row table of status, kind, path, values, and review notes.	Whether the path and matching note match the document semantics.
XML Patch Ops	A compact add, remove, or replace handoff list.	Whether the XPath-like selector is specific enough for the receiving workflow.
Comparison Audit	Counts, policies, parse signals, ignored-information notes, and warnings.	Whether comments, processing instructions, large ledgers, or positional matching warnings affect trust.
XML Change Mix Chart	A count view of changed, added, removed, and unchanged records.	Whether a dominant count is caused by a real change or by a matching-policy choice.
JSON	A structured copy of summary, settings, counts, visible changes, and patch operations.	Whether the same settings were used across comparisons you plan to compare.
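As a rough picture of how material differences could become an add, remove, or replace list like XML Patch Ops, consider the sketch below. The operation shape (`op`, `path`, `value`) and the function name `to_patch_ops` are assumptions for illustration; the tool's actual output format is not specified here.

```python
def to_patch_ops(original_records, revised_records):
    """Convert differing path-to-value records into add, remove, or replace operations."""
    ops = []
    for path in sorted(original_records.keys() | revised_records.keys()):
        before = original_records.get(path)
        after = revised_records.get(path)
        if before is None:
            ops.append({"op": "add", "path": path, "value": after})
        elif after is None:
            ops.append({"op": "remove", "path": path})
        elif before != after:
            ops.append({"op": "replace", "path": path, "value": after})
        # Equal values produce no operation.
    return ops

before_records = {
    '/invoice/customer[@code="C-42"]/text()': "Ada",
    "/invoice/@currency": "EUR",
}
after_records = {
    '/invoice/customer[@code="C-42"]/text()': "Ada Lovelace",
    "/invoice/@currency": "EUR",
    "/invoice/due/text()": "2025-01-31",
}

patch_ops = to_patch_ops(before_records, after_records)
for op in patch_ops:
    print(op)
```

Whether a selector such as `/invoice/customer[@code="C-42"]/text()` is specific enough still depends on the receiving workflow.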

A small count is not proof of full XML equivalence. Confirm the comparison settings, inspect the exact paths, and treat schema validity, DTD defaults, digital signatures, and application-specific rules as separate checks.

Technical Details:

XML 1.0 separates a document's physical text from its logical structure. The physical file contains characters, markup, entity references, declarations, and whitespace. The logical structure that most applications consume is a tree rooted at one document element, with child elements, attributes, namespace information, comments, processing instructions, and character information available to the XML processor.

Well-formedness is the first gate. A document with a missing end tag, badly nested element, or broken syntax cannot produce a reliable XML tree. Validation is stricter and separate: a well-formed document may still fail an XML Schema, DTD, required-field rule, allowed-value list, or business contract.

Transformation Core:

The comparison reduces each parsed document to records that can be matched and labeled. The record model is intentionally review-oriented: it focuses on element names, attributes, and direct text or CDATA content, while reporting comments and processing instructions as audit information rather than value rows.

XML diff transformation stages
Stage	Mechanism	Review effect
Parse	Each input must produce a well-formed XML document with a document element.	Empty input, malformed markup, or missing document structure stops the comparison with an input issue.
Collect records	Element records, attribute records, and direct text or CDATA records are collected with display paths.	The ledger centers on data-bearing structure rather than raw line positions.
Match paths	Repeated sibling elements are matched by configured identity attributes or by parsed sibling position.	Stable keys reduce reorder noise; positional matching preserves order-sensitive reviews.
Compare values	Matched records are compared after the active whitespace and namespace-name policies are applied.	Each record becomes changed, added, removed, or unchanged.
Filter view	The complete record comparison is filtered by the selected diff scope.	The headline material count can be larger than the visible ledger when the view is narrowed.
Map handoff rows	Material differences are converted into add, remove, or replace operations with XPath-like selectors.	Reviewers can separate the detailed ledger from a compact change-operation list.
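The Parse and Collect records stages can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the tool's code: `collect_records` is a hypothetical helper, the path format is simplified, and repeated siblings are not yet disambiguated by key or position.

```python
import xml.etree.ElementTree as ET

def collect_records(element, path=""):
    """Yield (path, kind, value) rows for an element, its attributes, and its direct text."""
    here = f"{path}/{element.tag}"
    yield (here, "element", element.tag)
    for name, value in sorted(element.attrib.items()):
        yield (f"{here}/@{name}", "attribute", value)
    # Direct text is the element's own text plus the text that follows each child.
    # ElementTree already merges CDATA sections into ordinary text.
    pieces = [element.text or ""] + [child.tail or "" for child in element]
    direct_text = "".join(pieces).strip()
    if direct_text:
        yield (f"{here}/text()", "text", direct_text)
    for child in element:
        yield from collect_records(child, here)

root = ET.fromstring('<invoice id="9"><status><![CDATA[draft]]></status></invoice>')
records = list(collect_records(root))
for record in records:
    print(record)
```

Comments and processing instructions do not appear in these rows, which mirrors the review-oriented record model described above.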

Rule Core:

Three policies usually explain why two XML reviews disagree: repeated-node matching, namespace treatment, and whitespace treatment.

XML diff rule and interpretation map
Policy	Rule	Useful for	Risk to check
Keyed sibling matching	Configured identity attributes such as `id`, `name`, `key`, `code`, `uid`, or `sku` are tried before sibling order.	Feeds, inventories, dependency lists, order lines, and configuration maps with stable identifiers.	A missing or wrong identity value can still make a row fall back to position.
Exact path position	Repeated siblings match by their order under the same parent.	Priority lists, ordered steps, ranked rules, or documents where order changes meaning.	A pure reorder can appear as multiple changed or replaced records.
Normalize whitespace	Runs of whitespace in compared text and attribute values are compacted before comparison.	Reviews where indentation, wrapping, and pretty-printing should not dominate the diff.	Meaningful spacing inside a text value can be hidden.
Preserve exact whitespace	Compared text and attribute values keep their exact spacing.	Strict reviews where spaces are data, not formatting.	Formatting-only edits can create many text differences.
Local names and namespace URI	Element and attribute names are compared by namespace URI plus local name where applicable.	Documents that use different prefixes for the same XML vocabulary.	Namespace URI strings must still match exactly.
Qualified prefix names	The literal prefix and name spelling are kept for comparison.	Contracts where prefix spelling itself must be reviewed.	Prefix aliases can create noise when the namespace URI is the meaningful identity.
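The two whitespace rows in the table can be illustrated with a small sketch. It assumes that compacting means collapsing each run of whitespace to one space and trimming the ends; the trimming step and the names `normalize_whitespace` and `values_equal` are assumptions, since the exact rule is not spelled out here.

```python
import re

def normalize_whitespace(value):
    """Collapse each run of whitespace to one space and trim the ends (assumed rule)."""
    return re.sub(r"\s+", " ", value).strip()

def values_equal(original, revised, policy="normalize"):
    """Compare two values under a 'normalize' or 'preserve' whitespace policy."""
    if policy == "normalize":
        return normalize_whitespace(original) == normalize_whitespace(revised)
    return original == revised

wrapped = "Ada\n    Lovelace"
single_line = "Ada Lovelace"

print(values_equal(wrapped, single_line))                     # True
print(values_equal(wrapped, single_line, policy="preserve"))  # False
```

The same pair of values is a match under one policy and a text change under the other, which is why the policy should stay fixed across repeat reviews.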

Information Boundary:

The XML Information Set includes more than elements, attributes, and character data. Comments, processing instructions, namespace declarations, DTD-related information, unparsed entities, base URI, and declaration-derived defaults can matter in some systems. A review focused on element, attribute, and direct text changes is easier to read, but it is not the same as a complete infoset comparison.

Canonical XML is a standards-defined method for producing a normalized physical representation for equivalence and signature workflows. A human-readable parsed-record diff has a different purpose. It helps reviewers locate changed paths and values, but it does not prove canonical equivalence, signature preservation, or schema acceptance.

For repeat reviews, keep the same matching, whitespace, namespace, and scope policies. Changing those rules between runs can change the count even when the XML inputs are identical.

Limitations and Privacy Notes:

The comparison runs in the browser. Pasted XML and loaded file text are read locally for the comparison and are not uploaded by the compare step; exported scripts, tables, chart images, and JSON files are generated from the current browser result.

Each browsed or dropped file input uses the first XML-like file and rejects files larger than 2 MB.
Both inputs must be well-formed XML with a document element before the comparison can produce ledger rows.
The ledger compares elements, attributes, and direct text or CDATA text. Comments and processing instructions are counted in the audit but are not compared as ledger rows.
No XML Schema, DTD, business-rule, digital-signature, or canonical XML validation is performed.
Untrusted XML can be risky in systems that process external entities, DTDs, or entity expansion. Review security-sensitive XML with the parser settings used by the receiving application.
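For the last point, one common precaution in a receiving system is to refuse any document that declares a DTD before doing further processing. The sketch below shows that idea with Python's standard `xml.parsers.expat` module. It is an assumption-laden illustration, not a description of this tool and not a complete defense: `reject_doctype` and `DoctypeNotAllowed` are hypothetical names, and the right settings depend on the parser the receiving application actually uses.

```python
import xml.parsers.expat

class DoctypeNotAllowed(ValueError):
    """Raised when input declares a DTD, which this sketch refuses to process."""

def reject_doctype(xml_text):
    """Parse xml_text and raise DoctypeNotAllowed if it contains a DOCTYPE declaration."""
    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeNotAllowed(f"DOCTYPE declaration found for {name!r}")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    # Malformed input raises xml.parsers.expat.ExpatError instead.
    parser.Parse(xml_text, True)

suspicious = '<!DOCTYPE a [<!ENTITY x "y">]><a>&x;</a>'
try:
    reject_doctype(suspicious)
    rejected = False
except DoctypeNotAllowed:
    rejected = True

print(rejected)  # True
```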

Worked Examples:

An invoice customer update is a typical text change. If <customer code="C-42">Ada</customer> becomes <customer code="C-42">Ada Lovelace</customer>, keyed sibling matching anchors the row to code="C-42". Node Change Ledger shows Changed, kind text, a path ending in /customer[@code="C-42"]/text(), original value Ada, and revised value Ada Lovelace.

A reordered item list needs a matching rule that reflects the business meaning. If two <item sku="..."> records swap order, Keyed sibling matching can still align each item by sku. Exact path position instead compares the first item with the first item and the second item with the second item, so the Comparison Audit warning about reorder noise is a cue to confirm whether order itself matters.
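The reorder case can be reproduced with a short sketch that builds a match key for each sibling, either from an identity attribute or from position. The helper names are hypothetical, and the order in which identity attributes are tried is an assumption; only the attribute names come from the defaults listed earlier.

```python
import xml.etree.ElementTree as ET

IDENTITY_ATTRIBUTES = ("id", "name", "key", "code", "uid", "sku")

def sibling_keys(parent, keyed=True):
    """Map a match key to each child: an identity attribute when available, else position."""
    keys = {}
    position_by_tag = {}
    for child in parent:
        position_by_tag[child.tag] = position_by_tag.get(child.tag, 0) + 1
        key = f"{child.tag}[{position_by_tag[child.tag]}]"  # positional fallback
        if keyed:
            for attribute in IDENTITY_ATTRIBUTES:
                if attribute in child.attrib:
                    key = f'{child.tag}[@{attribute}="{child.attrib[attribute]}"]'
                    break
        keys[key] = child
    return keys

def differing_keys(original, revised, keyed):
    """Return the keys matched in both documents whose attributes or text differ."""
    a, b = sibling_keys(original, keyed), sibling_keys(revised, keyed)
    return sorted(k for k in a.keys() & b.keys()
                  if (a[k].attrib, a[k].text) != (b[k].attrib, b[k].text))

original = ET.fromstring('<items><item sku="A">1</item><item sku="B">2</item></items>')
revised = ET.fromstring('<items><item sku="B">2</item><item sku="A">1</item></items>')

print(differing_keys(original, revised, keyed=True))   # []
print(differing_keys(original, revised, keyed=False))  # ['item[1]', 'item[2]']
```

With keys, the swap produces no differences; by position, both items look changed even though no value was edited.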

A namespace-prefix change can be harmless or important depending on the review policy. If <fin:status> and <f:status> use the same namespace URI, Local names and namespace URI treats the prefix alias as the same name and focuses on the status text. Qualified prefix names preserves the literal prefix, so the ledger can show added and removed records where a URI-aware review would show one matched name.
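The difference between the two namespace policies can be seen with Python's standard `xml.dom.minidom`, which exposes the namespace URI, local name, and literal qualified name of each element. This is an illustrative sketch: the `element_name` helper, the policy labels `"uri"` and `"prefix"`, and the URI `urn:example:finance` are assumptions made for the example.

```python
from xml.dom import minidom

def element_name(xml_text, policy):
    """Return the document element's comparison name under the chosen policy."""
    root = minidom.parseString(xml_text).documentElement
    if policy == "uri":
        # Namespace-aware identity: URI plus local name, prefix ignored.
        return (root.namespaceURI, root.localName)
    # Literal spelling, including whatever prefix the author chose.
    return root.tagName

original = '<fin:status xmlns:fin="urn:example:finance">paid</fin:status>'
revised = '<f:status xmlns:f="urn:example:finance">paid</f:status>'

print(element_name(original, "uri") == element_name(revised, "uri"))        # True
print(element_name(original, "prefix") == element_name(revised, "prefix"))  # False
```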

A malformed document must be fixed before the diff can be trusted. If Original XML contains <invoice><status>draft</invoice>, the summary reports Needs valid XML, the alert names an original XML parse error, and Comparison Audit shows Input issue. Add the missing </status> end tag before reading counts, paths, or chart values.
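The well-formedness gate in this example can be sketched with Python's standard parser. The function name `parse_or_report` and the message wording are assumptions for illustration; the tool's own alert text may differ.

```python
import xml.etree.ElementTree as ET

def parse_or_report(label, xml_text):
    """Return (root, None) for well-formed XML or (None, message) for a parse failure."""
    try:
        return ET.fromstring(xml_text), None
    except ET.ParseError as error:
        return None, f"{label} XML parse error: {error}"

# The <status> element is never closed, so no reliable tree can be built.
broken_root, broken_issue = parse_or_report("Original", "<invoice><status>draft</invoice>")
fixed_root, fixed_issue = parse_or_report("Original", "<invoice><status>draft</status></invoice>")

print(broken_issue)
print(fixed_root.tag, fixed_issue)
```

Only the second call yields a tree worth comparing; the first should stop the review until the markup is repaired.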

FAQ:

Does the comparator validate XML against an XSD or DTD?

No. It checks whether both inputs can be parsed as well-formed XML, then compares parsed records. Schema rules, DTD defaults, required fields, allowed values, and business rules still need a validator or the receiving system.

Why did a reorder create many changes?

Check Node matching. Exact path position treats sibling order as meaningful. If records have stable attributes such as id, sku, or code, Keyed sibling matching usually gives a cleaner review.

Why are namespace prefixes ignored?

Local names and namespace URI compares namespace-aware names by URI and local name, so different prefixes can still match. Choose Qualified prefix names when literal prefix spelling must be reviewed.

Why do formatting edits appear as text changes?

Whitespace policy controls that behavior. Use Normalize whitespace to reduce indentation and wrapping noise; use Preserve exact whitespace when spacing inside values is meaningful.

What does the XML Change Mix Chart count?

It counts changed, added, removed, and unchanged parsed records across the comparison. Use Comparison Audit and Node Change Ledger to explain why a count is high before treating the chart as a summary of business impact.

Is pasted XML uploaded for comparison?

No upload is needed for the comparison action itself. The browser reads pasted text and loaded files, then creates local copies when you use the script, table, chart, or JSON export buttons.

Glossary:

Well-formed XML: XML that satisfies the syntax rules needed to build a parsed document tree.
Document element: The single root element that contains the rest of the XML document structure.
Namespace URI: The URI string that identifies an XML vocabulary for namespace-aware names.
Local name: The element or attribute name after any namespace prefix is removed.
Qualified name: The visible XML name form that may include a prefix, such as fin:status.
Identity attribute: An attribute whose value is used to match repeated sibling elements across the original and revised documents.
Direct text: Text or CDATA content that belongs directly to an element, not to one of its child elements.
Canonical XML: A standards-defined normalized XML representation used for equivalence and signature workflows.

References:

Extensible Markup Language (XML) 1.0 (Fifth Edition), W3C Recommendation, 26 November 2008.
Namespaces in XML 1.0 (Third Edition), W3C Recommendation, 8 December 2009.
XML Information Set (Second Edition), W3C Recommendation, 4 February 2004.
Canonical XML Version 1.1, W3C Recommendation, 2 May 2008.
Parsing and serializing XML, MDN Web Docs, 13 October 2025.
XML External Entity Prevention Cheat Sheet, OWASP Cheat Sheet Series.