
Introduction:

HTML character references exist because some characters have jobs inside markup. The less-than sign can start a tag, the ampersand can start a reference, and quote characters can close an attribute value. Escaping changes those characters into text-safe references so the browser reads them as content instead of structure.

Unescaping runs the idea in the other direction. It turns references such as &amp;, &lt;, &quot;, decimal references, and hexadecimal references back into their readable characters. That is useful when content has been copied out of a CMS, email template, XML export, localization file, or audit log and needs to be checked by a person.

[Figure: raw text (<span title="x">, Coffee & tea) passes through an escape context (text node or attribute value) and becomes reference text (&lt;span&gt;, Coffee &amp; tea). Caption: Escaping preserves text meaning while changing how reserved characters appear in markup.]

The context matters. Text between tags usually needs ampersands and angle brackets escaped. Attribute values also need the quote character that matches the attribute delimiter, and many teams escape both single and double quotes to make snippets easier to move between systems. XML has a smaller predefined set and treats missing semicolons more strictly than a forgiving HTML parser.
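
As a rough illustration of the text-node versus attribute distinction, the sketch below uses Python's standard `html.escape`, which is a stand-in and not this tool's own code. The sample string and variable names are assumptions made for the example, and Python writes the single quote as `&#x27;`, which may differ from the reference this tool emits.

```python
import html

raw = 'Tom & "Jerry" <3 it\'s'

# Text-node style: only &, <, and > are replaced; quotes stay as they are.
text_node = html.escape(raw, quote=False)

# Attribute style: both quote characters are replaced as well.
attribute = html.escape(raw, quote=True)

print(text_node)  # Tom &amp; "Jerry" &lt;3 it's
print(attribute)  # Tom &amp; &quot;Jerry&quot; &lt;3 it&#x27;s
```

The attribute form is longer but can be placed inside either a single-quoted or a double-quoted attribute value without closing it early.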

Escaping is not the same as sanitizing. A decoded string can still contain markup, event-handler text, script-like fragments, or URLs that should not be inserted into a page. Character references help preserve text and avoid accidental parser confusion; they do not prove that the decoded content is safe to render as trusted HTML.

How to Use This Tool:

Paste a snippet, drop a TXT, HTML, XML, or SVG file into the text area, or load the sample. The input is processed locally in the browser, and the result updates as the operation and advanced options change.

  1. Leave Operation on Auto detect for a first pass, or pin Escape source or Unescape entities when repeating a review.
  2. Choose an Escape profile that matches the target context. Use attribute-safe output for snippets that may land inside quoted attributes, text-node safe output for visible body text, XML five predefined output for XML-like content, or numeric profiles when named references are not wanted.
  3. For unescaping, choose HTML5 tolerant when reviewing ordinary web content and Strict semicolons when missing semicolons should remain visible.
  4. Set Decode passes above one only when the text is nested or double-escaped, such as &amp;lt;. The process stops early when another pass would not change the text.
  5. Use the advanced line-break, Unicode normalization, and trimming options only when the destination system needs that exact output shape.
  6. Review Converted Text, then check Transform Ledger and Copy Safety before copying content into templates, docs, issue comments, or code review notes.
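
The decode-pass behavior in step 4 can be sketched as a small loop. This is a minimal illustration built on Python's `html.unescape`; the function name `decode_passes` and its return shape are hypothetical and are not taken from the tool.

```python
import html

def decode_passes(text: str, max_passes: int = 1) -> tuple[str, int]:
    """Unescape up to max_passes times, stopping once a pass changes nothing."""
    used = 0
    for _ in range(max_passes):
        decoded = html.unescape(text)
        if decoded == text:
            break  # another pass would not change the text
        text = decoded
        used += 1
    return text, used

one = decode_passes("&amp;lt;b&amp;gt;", 1)   # ('&lt;b&gt;', 1)
two = decode_passes("&amp;lt;b&amp;gt;", 2)   # ('<b>', 2)
five = decode_passes("&amp;lt;b&amp;gt;", 5)  # ('<b>', 2), stopped early
```

Returning the number of passes actually used makes it easy to see whether a high pass limit was needed at all.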

Interpreting Results:

Converted Text is the direct output. When escaping, it should replace reserved characters according to the selected context. When unescaping, it should turn recognized references into readable characters while preserving unresolved or deliberately skipped references.

Transform Ledger shows the tokens that changed, their offsets, pass numbers, and status. A row marked Skipped usually means strict mode preserved a missing-semicolon reference. A row marked Unresolved means the parser did not resolve that reference, so the token needs manual review.
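
To make the ledger idea concrete, here is a hedged sketch of how such rows could be collected for an unescape run. The regular expression, the `ledger` function, the row keys, and the "Decoded" status label are all assumptions for illustration; only the Skipped and Unresolved statuses come from the description above.

```python
import html
import re

# Reference-shaped tokens, with the trailing semicolon captured separately.
TOKEN = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*)(;?)")

def ledger(text: str, strict: bool = True) -> list[dict]:
    rows = []
    for m in TOKEN.finditer(text):
        token, body, semi = m.group(0), m.group(1), m.group(2)
        if body[0] != "#":
            kind = "named"
        elif body[1] in "xX":
            kind = "hex"
        else:
            kind = "decimal"
        if strict and not semi:
            # Strict mode keeps missing-semicolon tokens as written.
            out, status = token, "Skipped"
        else:
            out = html.unescape(token)
            status = "Unresolved" if out == token else "Decoded"
        rows.append({"offset": m.start(), "input": token,
                     "output": out, "type": kind, "status": status})
    return rows

rows = ledger("a &lt; b &amp c &bogus;")
for row in rows:
    print(row["offset"], row["input"], row["status"])
```

With the sample input, the three tokens come back as Decoded, Skipped, and Unresolved in that order.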

Copy Safety is a practical warning layer. It flags raw angle brackets after escaping, markup-shaped output after unescaping, and script-shaped patterns such as script tags, iframes, event attributes, or javascript: text. Treat those warnings as review prompts, not as a complete security verdict.
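
A warning layer of this kind can be approximated with a few pattern checks. The patterns, the `copy_safety` function, and the note wording below are hypothetical; the tool's real checks are not documented here, and a sketch like this is a review prompt, not a sanitizer.

```python
import re

# Hypothetical patterns chosen for illustration only.
MARKUP = re.compile(r"</?[A-Za-z][^>]*>")
SCRIPTY = re.compile(r"<\s*(script|iframe)\b|\bon[a-z]+\s*=|javascript:",
                     re.IGNORECASE)

def copy_safety(output: str, operation: str) -> list[str]:
    notes = []
    if operation == "escape" and re.search(r"[<>]", output):
        notes.append("raw angle brackets remain after escaping")
    if operation == "unescape" and MARKUP.search(output):
        notes.append("decoded output looks like markup")
    if SCRIPTY.search(output):
        notes.append("script-shaped pattern present")
    return notes

print(copy_safety("&lt;b&gt;", "escape"))
print(copy_safety("<strong>Sale</strong>", "unescape"))
print(copy_safety("<img onerror=alert(1)>", "unescape"))
```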

HTML escaping result interpretation

Signal | Likely meaning | Check before reuse
No change | The selected mode found no characters or references it needed to transform. | Confirm that the operation was pinned correctly and the text was not already in the desired form.
Many changed tokens | The text contains many reserved characters, references, line breaks, or nested escapes. | Use the ledger to spot accidental double-escaping or an overly broad escape profile.
Markup signal | Decoded output looks like real tags or tag fragments. | Do not paste it into an HTML-rendering surface unless it has been reviewed and sanitized separately.
Script signal | Decoded output contains text commonly associated with active content. | Treat it as unsafe text until a security-aware review says otherwise.

Technical Details:

HTML references use an ampersand-led token to represent a character. Named references use memorable names, decimal numeric references use a base-10 code point, and hexadecimal numeric references use base 16. A browser-style HTML parser is forgiving for some legacy names written without a trailing semicolon, such as &amp or &copy, while strict XML-like handling expects the semicolon and the smaller predefined set.

Escaping is context-sensitive because the same character can be harmless in one place and structural in another. The ampersand should be escaped before creating references for other characters; otherwise, the ampersand inside a newly created reference is escaped again. Attribute profiles also need to protect the quote delimiter so the value cannot close early.
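
The ordering point is easy to see with plain string replacement. The two helper functions below are illustrative names, not part of the tool; they differ only in when the ampersand is handled.

```python
def escape_amp_first(text: str) -> str:
    # Ampersand is handled before any other reference is created.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

def escape_amp_last(text: str) -> str:
    # Wrong order: the ampersand inside each new &lt; or &gt; gets escaped again.
    return text.replace("<", "&lt;").replace(">", "&gt;").replace("&", "&amp;")

sample = "a < b & c"
good = escape_amp_first(sample)
bad = escape_amp_last(sample)

print(good)  # a &lt; b &amp; c
print(bad)   # a &amp;lt; b &amp; c
```

The second result is the accidental double escape that later needs two decode passes to undo.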

Transformation Core:

HTML escape and unescape transformation rules

Mode | Core rule | Boundary
Text-node escape | Replace &, <, and > with named references. | Does not escape quote characters because plain text between tags does not use quotes as delimiters.
Attribute escape | Escape ampersand, angle brackets, double quote, and single quote. | Useful when the same string may move between single-quoted and double-quoted attributes.
XML five predefined | Use &amp;, &lt;, &gt;, &quot;, and &apos;. | A closer fit for XML-like content than broad HTML named references.
Numeric escape | Emit decimal or hexadecimal character references for reserved characters. | Useful when named references are undesirable, but still requires context review.
Unescape | Resolve named, decimal, and hexadecimal reference-shaped tokens into characters. | Strict mode preserves missing-semicolon tokens; tolerant mode follows browser-like behavior for common cases.
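
The difference between tolerant and strict unescaping can be sketched as follows. Python's `html.unescape` stands in for the browser-like tolerant mode, and `unescape_strict` is a hypothetical helper that only resolves semicolon-terminated tokens; neither is claimed to match the tool's exact behavior.

```python
import html
import re
from html.entities import html5

# Reference-shaped tokens that end with a semicolon.
TERMINATED = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")

def _resolve(match: re.Match) -> str:
    token = match.group(0)
    body = token[1:]  # for example "amp;" or "#169;"
    if body.startswith("#") or body in html5:
        return html.unescape(token)
    return token  # unknown name: leave it for manual review

def unescape_strict(text: str) -> str:
    return TERMINATED.sub(_resolve, text)

sample = "&lt;p&gt; Fish &amp chips &copy 2024 &#169;"
tolerant = html.unescape(sample)
strict = unescape_strict(sample)

print(tolerant)  # <p> Fish & chips © 2024 ©
print(strict)    # <p> Fish &amp chips &copy 2024 ©
```

Tolerant decoding guesses that `&amp` and `&copy` were meant as references, while the strict sketch leaves them visible for review.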

Reference Anatomy:

HTML character reference anatomy

Reference form | Example | What it represents
Named | &copy; | A named entry in the HTML character reference set.
Decimal numeric | &#169; | A Unicode code point written in base 10.
Hex numeric | &#xA9; | The same code point written in base 16.
Nested | &amp;lt; | A reference that needs more than one decode pass to become <.
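
The three single-pass forms can be generated from a code point and checked against each other. This sketch uses Python's `html.entities.codepoint2name`, which covers only the older HTML 4 names; the `reference_forms` helper is a hypothetical name introduced for the example.

```python
import html
from html.entities import codepoint2name

def reference_forms(ch: str) -> dict[str, str]:
    """Build decimal, hex, and (when one exists) named references for ch."""
    cp = ord(ch)
    forms = {"decimal": f"&#{cp};", "hex": f"&#x{cp:X};"}
    name = codepoint2name.get(cp)
    if name:
        forms["named"] = f"&{name};"
    return forms

forms = reference_forms("©")
print(forms)  # {'decimal': '&#169;', 'hex': '&#xA9;', 'named': '&copy;'}

# Every form decodes back to the same character.
decoded = {html.unescape(ref) for ref in forms.values()}
print(decoded)  # {'©'}
```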

When selected, Unicode normalization is applied only after conversion. NFC can make canonically equivalent text compare more consistently, while NFKC also folds compatibility characters, such as ligatures and full-width letters, into their plain equivalents. Leave normalization off when byte-for-byte source fidelity matters.
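
The effect of the two normalization forms can be seen with Python's standard `unicodedata` module. The sample strings are assumptions chosen to show one canonical case and one compatibility case.

```python
import unicodedata

# 'e' followed by a combining acute accent: five code points in total.
decomposed = "Cafe\u0301"
nfc = unicodedata.normalize("NFC", decomposed)

# The single-character ligature U+FB01 followed by "le".
ligature = "\ufb01le"
nfc_ligature = unicodedata.normalize("NFC", ligature)
nfkc_ligature = unicodedata.normalize("NFKC", ligature)

print(len(decomposed), len(nfc))   # 5 4
print(nfc == "Caf\u00e9")          # True
print(nfc_ligature == ligature)    # True, NFC leaves the ligature alone
print(nfkc_ligature)               # file
```

NFKC changes which characters are present, so it is the riskier choice when the output must match the source exactly.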

Privacy and Safety Notes:

The transformation runs in the browser for pasted text and selected files. That helps with review snippets from templates, CMS exports, and logs, but it does not remove the need to handle sensitive content carefully before pasting it into any web page.

Escaped output is still text. Decoded output can become markup-shaped text. Use a dedicated sanitizer, template auto-escaping, content security policy, and application-specific validation when content might be rendered as HTML for other users.

Worked Examples:

Escaping an attribute snippet. Input such as <a title="Coffee & tea"> becomes a copy-safe string where angle brackets, ampersand, and quotes are represented as references. The ledger should show the quote and bracket replacements, and the copy-safety check should not report remaining raw angle brackets.

Unescaping a CMS export. Input such as &lt;strong&gt;Sale&lt;/strong&gt; decodes to visible tag-shaped text. That may be what you need for an editor review, but the markup warning is important because the result should not be treated as trusted HTML by default.

Checking a double escape. Input such as &amp;lt; needs two decode passes to become <. One pass produces &lt;, which is still a reference-shaped token.

FAQ:

Does escaping prevent cross-site scripting?

Escaping for the correct context is one part of safe output handling, but it is not a full sanitizer. Decoded or user-supplied content still needs the same security controls your application normally requires.

Why did strict mode leave a token unchanged?

Strict mode preserves reference-shaped text that is missing a semicolon. That makes broken or ambiguous source easier to audit instead of silently guessing what the browser might do.

Why are apostrophes sometimes written as &#39; instead of &apos;?

&apos; belongs to XML's predefined set and is widely understood in modern HTML, but numeric &#39; remains a common HTML-safe choice for single quotes in attributes.

Glossary:

Character reference
A named, decimal, or hexadecimal token that stands in for a character in HTML text.
Escaping
Replacing reserved characters with references so they are treated as content rather than markup syntax.
Unescaping
Resolving references back into their readable characters.
Text node
Visible text between tags, where quotes usually do not need escaping.
Attribute value
Text inside a tag's attribute, where the quote delimiter must be protected.