
Introduction

HTML character references are escape sequences that let text carry characters the HTML parser would otherwise treat as markup. They are common in copied CMS content, email fragments, documentation snippets, and source examples where &lt; should mean a visible less-than sign rather than the start of a tag.

Unescaping turns those references back into readable characters. That helps when a copied paragraph shows &amp; instead of an ampersand, when a code example has been escaped twice, or when a reviewer needs to see whether a fragment becomes plain text, markup-shaped text, or something that deserves a safety review.

Escaped source references pass through strict or tolerant decoding and then appear as decoded text with audit flags.
Nested escapes may need more than one pass, while missing semicolons should be reviewed before the decoded text is reused.

The useful result is not only the readable text. Decoding can reveal actual angle brackets, event-handler attributes, script tags, or broken references that were hidden behind entities. Unescaped output should be treated as text until it has been reviewed for the destination where it will be pasted.

Unescaping is also different from sanitizing. It does not remove unsafe markup or make untrusted HTML suitable for rendering. It reveals what the references represent so a person can copy readable text, compare escaped and decoded forms, or decide that the decoded fragment needs a separate HTML sanitizer or code review.

Technical Details:

HTML defines character references as named, decimal numeric, or hexadecimal numeric sequences that begin with an ampersand. During parsing, recognized references resolve to Unicode characters. The same visible character can have more than one spelling, such as &lt;, &#60;, and &#x3C; for a less-than sign.

Named references use a vocabulary maintained by the HTML standard. Some legacy names are accepted without a trailing semicolon in HTML parsing, but that tolerance is not a good editing habit. Decimal and hexadecimal numeric references point directly to Unicode code points, which is why &#169; and &#xA9; both resolve to the same copyright character.
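The equivalence of these spellings can be checked outside the browser. The sketch below uses Python's standard `html.unescape`, which applies the HTML5 reference rules, as a stand-in for the browser parser the page relies on; the variable names are illustrative only.

```python
import html

# Named, decimal, and hexadecimal spellings of the less-than sign (U+003C).
less_than_forms = ["&lt;", "&#60;", "&#x3C;"]
decoded_less_than = [html.unescape(form) for form in less_than_forms]

# Named, decimal, and hexadecimal spellings of the copyright sign (U+00A9).
copyright_forms = ["&copy;", "&#169;", "&#xA9;"]
decoded_copyright = [html.unescape(form) for form in copyright_forms]

# Every spelling in a group resolves to the same single code point.
print({f"U+{ord(ch):04X}" for ch in decoded_less_than})
print({f"U+{ord(ch):04X}" for ch in decoded_copyright})
```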

The transformation is a scan-and-replace process. Reference-shaped tokens are found, classified, decoded through the browser parser, and recorded with the pass number, source offset, decoded display value, Unicode code points, and status. Repeated passes matter when the source has been escaped more than once.
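A minimal sketch of one scan pass follows. It is not the page's own code: the `TOKEN` pattern, the row field names, and the use of Python's `html.unescape` in place of the browser parser are all assumptions made for illustration. The Skipped status belongs to strict mode and is not modeled here.

```python
import html
import re

# Assumed shape of a "reference-shaped token": named, decimal, or hex,
# with an optional trailing semicolon.
TOKEN = re.compile(r"&(?:#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);?")

def classify(token):
    """Return the reference form based on the characters after the ampersand."""
    if token[1:3].lower() == "#x":
        return "hex"
    if token[1] == "#":
        return "decimal"
    return "named"

def scan(text, pass_number=1):
    """Build one ledger row per reference-shaped token found in text."""
    rows = []
    for match in TOKEN.finditer(text):
        token = match.group(0)
        decoded = html.unescape(token)
        resolved = decoded != token
        rows.append({
            "reference": token,
            "type": classify(token),
            "offset": match.start(),
            "decoded": decoded,
            "code_points": [f"U+{ord(ch):04X}" for ch in decoded] if resolved else [],
            "pass": pass_number,
            "status": "Decoded" if resolved else "Unresolved",
        })
    return rows

rows = scan("Tom &amp; Jerry &#169; &#xA9; &cop;")
for row in rows:
    print(row["offset"], row["reference"], row["type"], row["status"])
```

The misspelled `&cop;` stays in the ledger as Unresolved, which is the behavior the status table describes for names the parser does not recognize.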

HTML character reference forms decoded by the page

| Reference form | Example | How it resolves | Review note |
| --- | --- | --- | --- |
| Named | &amp;, &lt;, &copy; | Uses an HTML reference name to return the matching character or characters. | Case and semicolon tolerance follow HTML parser behavior in tolerant mode. |
| Decimal numeric | &#60; | Converts the base-10 number to the corresponding Unicode code point. | Useful when the source avoids named references or needs explicit code points. |
| Hex numeric | &#x3C;, &#x3c; | Converts the hexadecimal number to the corresponding Unicode code point. | Common in security notes, logs, and generated markup. |

The main rule changes with Reference mode. HTML5 tolerant mode resolves common semicolonless references whenever the parser accepts them. Strict semicolons mode decodes only references that end in a semicolon; a missing semicolon is preserved and counted for review instead of being silently changed.
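The difference between the two modes can be sketched as follows. This is an illustration under assumptions, not the page's implementation: `STRICT_TOKEN`, `BARE_TOKEN`, `decode_strict`, and `decode_tolerant` are hypothetical names, and Python's `html.unescape` stands in for the tolerant browser parser.

```python
import html
import re

# Strict mode: only tokens that end in a semicolon are candidates.
STRICT_TOKEN = re.compile(r"&(?:#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")
# Reference-shaped tokens that are NOT followed by a semicolon.
BARE_TOKEN = re.compile(
    r"&(?:#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*)(?![A-Za-z0-9;])"
)

def decode_strict(text):
    """Decode semicolon-terminated references; report the bare ones as skipped."""
    decoded = STRICT_TOKEN.sub(lambda m: html.unescape(m.group(0)), text)
    # Count a bare token as skipped only if tolerant decoding would change it.
    skipped = [
        m.group(0)
        for m in BARE_TOKEN.finditer(text)
        if html.unescape(m.group(0)) != m.group(0)
    ]
    return decoded, skipped

def decode_tolerant(text):
    """Let the HTML5 rules resolve legacy semicolonless names as well."""
    return html.unescape(text)

source = "Tom &copy 2026 &amp; Co."
strict_text, skipped = decode_strict(source)
tolerant_text = decode_tolerant(source)
print(strict_text, skipped)
print(tolerant_text)
```

In strict mode `&copy` stays in the output and is listed for review, while tolerant decoding turns it into the copyright sign.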

Decode status values and their meanings

| Status | Condition | Result in decoded text | Why it matters |
| --- | --- | --- | --- |
| Decoded | The reference resolves to a different character or character sequence. | The resolved character replaces the reference. | The ledger shows the original token, decoded value, code points, and pass. |
| Skipped | Strict mode sees a reference-shaped token without a semicolon. | The original token stays in place. | Semicolon repair should be intentional when reviewing source text. |
| Unresolved | The parser does not resolve the token to another value. | The original token stays in place. | Spelling errors, unsupported names, and malformed numeric references remain visible. |

Transformation Core:
escaped source
  -> scan named, decimal, and hexadecimal references
  -> run pass 1 and record each token
  -> run later passes only while the previous pass changed text
  -> apply NFC or NFKC normalization when selected
  -> trim only outer whitespace when selected
  -> decoded text, entity ledger, safety audit, and JSON summary
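The flow above can be approximated in a few lines. The sketch below is an assumption-laden illustration: the `decode` function, its parameter names, and its return shape are hypothetical, and `html.unescape` stands in for the browser parser in tolerant mode. The ledger, audit, and JSON outputs are left out.

```python
import html
import unicodedata

def decode(source, max_passes=2, normalization=None, trim=False):
    """Run up to max_passes decode passes, then normalize and trim if asked."""
    if normalization not in (None, "NFC", "NFKC"):
        raise ValueError("normalization must be None, 'NFC', or 'NFKC'")
    text = source
    passes_run = 0
    for _ in range(max_passes):
        passes_run += 1
        decoded = html.unescape(text)
        changed = decoded != text
        text = decoded
        if not changed:
            break  # a later pass runs only while the previous one changed text
    if normalization:
        text = unicodedata.normalize(normalization, text)
    if trim:
        text = text.strip()  # outer whitespace only; internal spacing is kept
    return text, passes_run

one_pass = decode("  &amp;lt;b&amp;gt;  ", max_passes=1)
full = decode("  &amp;lt;b&amp;gt;  ", max_passes=6, trim=True)
print(one_pass)
print(full)
```

With six passes allowed, the double-escaped input stops after the third pass, because that pass no longer changes the text.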

Unicode normalization happens after reference decoding. NFC keeps characters in their common composed form, while NFKC also folds many compatibility characters into standard equivalents. That can help comparison and matching, but it can also change distinctions that were visible in the original text. Leaving normalization off preserves the decoded characters as closely as the parser returns them.
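To make the NFC and NFKC difference concrete, the sketch below decodes two references and normalizes the results with Python's standard `unicodedata` module. The sample strings are chosen for illustration and are not taken from the page.

```python
import html
import unicodedata

# "e" followed by a combining acute accent (U+0301), written as a reference.
accent = html.unescape("e&#x301;")
# LATIN SMALL LIGATURE FI (U+FB01), a compatibility character.
ligature = html.unescape("&#xFB01;le")

nfc_accent = unicodedata.normalize("NFC", accent)        # composes to U+00E9
nfc_ligature = unicodedata.normalize("NFC", ligature)    # ligature is kept
nfkc_ligature = unicodedata.normalize("NFKC", ligature)  # ligature becomes "fi"

print(len(accent), len(nfc_accent))
print(len(ligature), len(nfc_ligature), len(nfkc_ligature))
```

NFC only recomposes the accented letter, while NFKC also replaces the ligature with two plain letters, which is the kind of visible distinction that normalization can erase.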

The safety audit is a text signal, not a security decision. Markup-shaped segments are counted when decoded text contains tag-like strings. Script-bearing patterns include script, iframe, object, embed, event-handler attributes, and javascript: text. These findings mean the decoded result deserves review before rendering; they do not prove every unsafe case or clear every safe case.
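A text-signal audit of this kind can be sketched with two regular expressions. The patterns and the `audit` function below are assumptions for illustration, not the page's actual rules, and they share the limitation described above: they flag text worth reviewing and prove nothing about safety.

```python
import re

# Tag-shaped strings such as <article> or </p>.
MARKUP = re.compile(r"</?[A-Za-z][^<>]*>")
# Script-bearing patterns named in the text: script, iframe, object, embed,
# event-handler attributes, and javascript: text.
SCRIPT = re.compile(
    r"<\s*(?:script|iframe|object|embed)\b|\bon[a-z]+\s*=|javascript:",
    re.IGNORECASE,
)

def audit(decoded):
    """Return counts and a summary badge for decoded text."""
    markup_segments = len(MARKUP.findall(decoded))
    script_signal = SCRIPT.search(decoded) is not None
    if script_signal:
        badge = "Script signal"
    elif markup_segments:
        badge = "Markup text"
    else:
        badge = "Text only"
    return {
        "markup_segments": markup_segments,
        "script_signal": script_signal,
        "badge": badge,
    }

print(audit("<script>alert(1)</script>"))
print(audit('<article class="notice">Tom & Jerry</article>'))
print(audit("1 < 2 and 3 > 2"))
```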

Local text files are read as text and limited to 1 MB. Decoded output, ledger rows, audit rows, and the JSON summary come from the current browser session. The JSON view escapes angle brackets and ampersands for display so the structured record can be inspected without rendering decoded markup.
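The display escaping of the JSON view can be approximated as below. The `json_for_display` helper and the record fields are hypothetical; the sketch only shows the idea of escaping angle brackets and ampersands after serializing, so the record can be shown without rendering any decoded markup.

```python
import html
import json

def json_for_display(record):
    """Serialize a record, then escape &, <, and > for safe display as text."""
    raw = json.dumps(record, ensure_ascii=False, indent=2)
    return html.escape(raw, quote=False)

record = {"decoded": "<b>Tom & Jerry</b>", "passes": 2}
shown = json_for_display(record)
print(shown)

# Unescaping the displayed form once recovers the original JSON text.
restored = json.loads(html.unescape(shown))
```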

Everyday Use & Decision Guide:

Start with Decode passes set to 2 when content came from a CMS, email client, ticket system, or exported template. Double-escaped fragments such as &amp;lt; usually need a second pass before they become the readable less-than sign. Use 1 pass when you are auditing the first level only and want nested references to stay visible.

Keep Reference mode on HTML5 tolerant for ordinary copied content. Switch to Strict semicolons when you are reviewing source quality, because a missing semicolon can be a real defect in a template or a clue that an ampersand was not meant to start a reference.

  • Use Decoded Text for the readable result, then copy or download only after checking the summary badges.
  • Use Entity Ledger when you need to see which token changed, which pass changed it, and which Unicode code points appeared.
  • Use Safety Audit before pasting decoded output into a template, preview, rich text editor, or documentation page.
  • Use JSON when a ticket, review note, or handoff needs the source, options, decoded text, ledger rows, and warnings together.

If the summary says Decoded with unresolved references, do not assume the readable parts are enough. Open Entity Ledger and look for Skipped or Unresolved rows. A typo such as &cop; may need correction, while a literal brand string containing an ampersand may be fine to leave alone.

Leave Unicode normalization as No change unless comparison is the goal. Turn on Trim outer whitespace only when leading and trailing blank space came from copying, because the switch removes the outside whitespace after decoding while preserving internal spacing and line breaks.

Step-by-Step Guide:

Follow the controls in order when decoding a fragment or local text file.

  1. Paste escaped content into Escaped HTML, use Browse file, or drop a TXT or HTML file onto the textarea. If the file is over 1 MB, choose a smaller text file before continuing.
  2. Set Decode passes to a value from 1 to 6. Watch the summary detail for the input and output character counts across the passes actually run.
  3. Choose Reference mode. Use HTML5 tolerant for normal copied content, or Strict semicolons when semicolonless tokens should stay visible.
  4. Open Advanced only when needed. Pick NFC or NFKC under Unicode normalization, and enable Trim outer whitespace only for accidental outside spacing.
  5. Read the summary box. Decoded text ready means the pass finished without unresolved tokens; Decoded with unresolved references means the ledger needs review.
  6. Open Decoded Text and inspect the final text before copying. If markup appears, keep treating it as text until the destination has been reviewed.
  7. Open Entity Ledger to inspect Decoded, Skipped, and Unresolved rows by reference, type, decoded value, and pass.
  8. Open Safety Audit and pause on Markup signal, Script signal, or Replacement characters before reusing the result.
  9. Use JSON when you need a structured record of the current parameters, summary counts, decoded text, entity rows, audit rows, and warnings.

Interpreting Results:

The most important result is Decoded Text, but the summary badges and audit rows decide whether the text is ready to reuse. Text only means the current text scan did not find markup-shaped or script-bearing patterns. It does not prove that the content is safe for every destination.

Markup text means the decoded result contains tag-shaped strings such as <article>. That may be exactly what you wanted for a readable code example, but it should not be rendered as trusted HTML without a separate sanitizer or review. Script signal is stronger: inspect the decoded text for script tags, event-handler attributes, embedded objects, or javascript: strings before copying it into anything that can render markup.

Use Entity Ledger to resolve confidence problems. A Skipped row usually points to strict mode preserving a missing semicolon. An Unresolved row usually means the name, number, or spelling was not recognized by the parser. Fix the source or change Reference mode, then check that the summary and audit rows match the result you intend to hand off.

Worked Examples:

Double-escaped CMS snippet. A fragment such as &amp;lt;article class=&amp;quot;notice&amp;quot;&amp;gt;Tom &amp;amp; Jerry &amp;copy; 2026&amp;lt;/article&amp;gt; needs Decode passes set to 2. Decoded Text becomes <article class="notice">Tom & Jerry © 2026</article>, while Entity Ledger shows which references changed on pass 1 and pass 2. The Markup signal row should be reviewed because the readable result now contains tag-shaped text.

Strict review of a missing semicolon. With Reference mode set to Strict semicolons, the source Tom &copy 2026 &amp; Co. keeps &copy in the output and decodes &amp; to an ampersand. Entity Ledger reports the missing-semicolon token as Skipped, and the summary reports an unresolved count. In HTML5 tolerant mode, the same source may decode the legacy named reference if the parser accepts it.

Script-shaped output found during review. The escaped source &lt;script&gt;alert(1)&lt;/script&gt; decodes in one pass to <script>alert(1)</script>. Decoded Text still displays it as text, but Safety Audit marks Script signal as unsafe text. That result is useful for diagnosis or documentation, not for rendering as trusted HTML.

FAQ:

Why did some references stay unchanged?

A token stays unchanged when Strict semicolons skips a missing-semicolon reference or when the browser parser does not recognize the name or numeric value. Check Entity Ledger for Skipped or Unresolved status.

Should I use one decode pass or more?

Use one pass when you want to inspect the first escape level. Use two passes for common double-escaped CMS or email content. The control accepts 1 to 6 passes and stops early when a pass no longer changes the text.

Does decoded markup become safe HTML?

No. The decoded result is displayed as text, and the audit can flag markup or script-shaped patterns, but unescaping does not sanitize HTML. Review or sanitize separately before rendering decoded content.

What file types can I load?

Use Browse file or drag and drop for TXT, HTML, or HTM text files. Files over 1 MB are rejected with a message asking for a smaller file.

Is pasted text sent away for decoding?

The current decoding work happens in the browser session after the page loads. Pasted text and local file content are read there for the decoded text, ledger, audit, and JSON summary.

Glossary:

Character reference
An escape sequence that represents a Unicode character in HTML text.
Named reference
A reference that uses an HTML name such as &lt; or &copy;.
Numeric reference
A decimal or hexadecimal reference that points to a Unicode code point.
Decode pass
One scan through the current text, replacing references that resolve during that scan.
Strict semicolons
The reference mode that preserves semicolonless tokens for review instead of decoding them.
Unicode normalization
A post-decode option that rewrites decoded characters into NFC or NFKC form for comparison or matching.
Markup signal
An audit finding that decoded text contains tag-shaped strings.
Script signal
An audit finding that decoded text contains script-bearing patterns that need review before rendering.
