HTML Escape and Unescape Tool

Operation:

Auto detect unescapes entity-shaped source and escapes raw markup or text; pinned modes bypass detection.

Source text:

Use one snippet at a time; files stay local in the browser and follow the selected operation.

Drop TXT, HTML, XML, or SVG onto the textarea.

Escape profile:

Attribute-safe escapes both quote types; text-node only escapes ampersands and angle brackets.

Unescape handling:

Use strict mode when reviewing broken ampersands or source that must not be guessed.

Decode passes:

One pass is literal; two passes handles common double-escaped CMS or email fragments.

Line breaks on escape:

Only escape operations use this setting; unescape preserves decoded line breaks.

Unicode normalization:

Leave unchanged for exact source fidelity; NFC/NFKC can merge visually equivalent Unicode forms.

Trim outer whitespace:

Off by default so copied snippets retain exact surrounding whitespace.

HTML mixes readable text with syntax. Most letters, numbers, and punctuation can appear as ordinary content, but a few characters tell the parser to start a tag, begin a character reference, or close an attribute value. Escaping changes those reserved characters into references such as &lt;, &amp;, and &quot; so the browser reads them as text instead of structure.

Unescaping travels in the other direction. It resolves character references back into the characters a person expects to read, which is useful when CMS exports, email templates, translation strings, XML snippets, SVG fragments, log entries, or issue comments arrive already encoded. The risky case is repeated decoding: &amp;lt;strong&amp;gt; may look harmless after one pass, when it reads &lt;strong&gt;, then become tag-shaped text after a second pass.

Text node: Visible content between tags. Ampersands and angle brackets are the main characters that usually need protection.
Attribute value: Text inside a tag attribute. The quote character used to wrap the value must also be escaped.
Character reference: An ampersand-led token that stands for a character by name, decimal code point, or hexadecimal code point.

Markup text, context choice, and reference text in an HTML escaping flow

Context decides which characters need protection. Visible text mainly needs ampersands and angle brackets escaped. A quoted attribute value also needs the matching quote protected because one stray delimiter can end the value early. XML-shaped content is stricter about its predefined entity set and semicolon use, while HTML keeps several legacy named-reference allowances for older pages.

Escaping is output encoding, not a sanitizer. It can show untrusted text without giving reserved characters their markup role, but it does not decide which HTML tags, attributes, or URLs should be allowed. Once text is decoded again, tag-shaped fragments, event attributes, embedded objects, and script-like URL strings still need a separate safety review before they are rendered as markup.
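The difference between text-node and attribute-safe escaping can be sketched with Python's standard `html.escape`. This is an illustration of the context rule, not this tool's own code; the sample string is made up, and Python writes the single quote as `&#x27;`, which may differ from the reference a given escape profile emits.

```python
import html

untrusted = 'Tom & "Jerry\'s" <b>'

# Text-node style: only &, <, and > are replaced; quotes stay literal.
text_node = html.escape(untrusted, quote=False)

# Attribute-safe style: both quote characters are replaced as well.
attribute_safe = html.escape(untrusted, quote=True)

print(text_node)
print(attribute_safe)
```

Either result displays as text, but neither call decides whether a tag, attribute, or URL should be allowed; that remains a sanitizer's job.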

How to Use This Tool:

Paste the text to convert, drop one TXT, HTML, XML, or SVG file into the text area, or browse for a file under 1 MB. The conversion updates from the current input and settings, so you can pin a direction, compare passes, and use the output as a new source without reloading the page.

Start with HTML or escaped text. If a file is too large, choose a smaller text file; if more than one file is dropped, only the first file is imported and the extra-file warning explains what was ignored.
Use Operation to choose the direction. Auto detect decodes common reference-shaped input and escapes ordinary text or markup; choose Escape source or Unescape entities when you need a repeatable review.
For escaping, set Escape profile to the target context. Use attribute-safe output when the text might sit inside quotes, text-node output for visible body text, XML five predefined for XML-like snippets, or numeric/reference profiles when named references are not wanted.
For decoding, choose Unescape handling. HTML5 tolerant follows browser-style behavior for common references, while Strict semicolons leaves missing-semicolon tokens visible for review.
Set Decode passes above one for nested text such as &amp;lt;. The run can use up to six passes, but it stops early when another pass would not change the output.
Use the advanced controls only when the destination needs that exact shape. Line breaks can be preserved, converted to <br>, or encoded as &#10;; Unicode can be normalized to NFC or NFKC; outer whitespace can be trimmed after conversion.
Check Converted Text first, then use Transform Ledger and Copy Safety to confirm what changed and what still needs review. If the output looks wrong, switch the operation or use Use output as input to test the opposite direction.
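The Auto detect step in the list above can be approximated with a small heuristic. The sketch below is an assumption about how such detection might work, not the tool's actual logic: `auto_convert` and `REFERENCE` are hypothetical names, and the rule is simply "unescape when a semicolon-terminated reference is present, otherwise escape".

```python
import html
import re

# Hypothetical pattern for a semicolon-terminated character reference.
REFERENCE = re.compile(r"&(?:#\d+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")

def auto_convert(source: str) -> tuple[str, str]:
    """Return (operation, output) using a simple reference-shape heuristic."""
    if REFERENCE.search(source):
        return "unescape", html.unescape(source)
    return "escape", html.escape(source, quote=True)

print(auto_convert("&lt;p&gt;"))
print(auto_convert("Fish & chips"))
```

A heuristic like this can guess wrong on mixed or ambiguous input, which is why pinning the operation gives a more repeatable review.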

Interpreting Results:

Converted Text is the text to copy or inspect. In escape mode, it should contain references for the characters covered by the selected profile. In unescape mode, recognized named, decimal, and hexadecimal references become readable characters unless strict handling or parser rules leave them unchanged.

Transform Ledger is the audit trail for changed or inspected tokens. It lists the offset, input token, output token, reference type or code point, decode pass, and status. Escaped and Decoded rows changed the text. Skipped rows were preserved by strict missing-semicolon handling. Unresolved rows matched the reference pattern but did not resolve.
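A ledger of this kind can be sketched for the escape direction as follows. The function name, row keys, and the `ESCAPES` mapping are illustrative assumptions; the tool's own rows also carry a reference type and decode pass that this sketch leaves out.

```python
# Hypothetical attribute-safe mapping used only for this illustration.
ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;"}

def escape_with_ledger(source: str):
    """Escape covered characters and record one ledger row per change."""
    out = []
    ledger = []
    for offset, ch in enumerate(source):
        replacement = ESCAPES.get(ch)
        if replacement is None:
            out.append(ch)  # not covered by the profile, copied as-is
            continue
        out.append(replacement)
        ledger.append({
            "sequence": len(ledger) + 1,
            "offset": offset,          # position in the source text
            "input": ch,
            "output": replacement,
            "status": "Escaped",
        })
    return "".join(out), ledger

text, rows = escape_with_ledger("a<b & c")
for row in rows:
    print(row)
```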

Copy Safety highlights practical review signals such as remaining raw angle brackets after escaping, unresolved references after decoding, markup-shaped segments, script-bearing patterns, and entity-like tokens left in the output. These warnings are not a sanitizer verdict; they are cues for where a person or application security check should look next.

HTML escaping and unescaping result signals
Signal	Likely meaning	Useful follow-up
No change	The selected operation found no covered characters or resolvable references.	Confirm the operation direction and check whether the input is already in the desired form.
Skipped or unresolved references	Strict handling preserved missing semicolons, or the parser did not recognize the token.	Inspect the exact token in the ledger before correcting source text or switching to tolerant decoding.
Remaining raw markup	Escaped output still contains `<` or `>` characters.	Check that the escape profile was broad enough for the target context.
Markup or script signal	Decoded output looks like tags, event attributes, embedded objects, or script-like URL text.	Treat the text as untrusted until a sanitizer or security review says it is safe to render.
Output entity tokens remain	The result still contains reference-shaped text, often because the input is nested or partially encoded.	Increase decode passes only when a second pass is intended, then compare the ledger before reuse.
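Some of the signals in the table above can be approximated with pattern checks. The patterns below are rough assumptions for illustration, not the tool's rules, and `review_signals` is a hypothetical helper; unlike the tool, this sketch applies every check regardless of whether the output came from escaping or decoding.

```python
import re

# Hypothetical heuristics keyed by the signal names used in the table.
CHECKS = {
    "Remaining raw markup": re.compile(r"[<>]"),
    "Markup or script signal": re.compile(
        r"<\s*/?[A-Za-z]|\bon\w+\s*=|javascript:", re.IGNORECASE
    ),
    "Output entity tokens remain": re.compile(
        r"&(?:#\d+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);"
    ),
}

def review_signals(output: str) -> list[str]:
    """Return the names of all signals whose pattern appears in the output."""
    return [name for name, pattern in CHECKS.items() if pattern.search(output)]

print(review_signals("&lt;b&gt;"))
print(review_signals("<img onerror=x>"))
```

Pattern matches like these only point at text worth reviewing; they cannot confirm that output is safe to render.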

Technical Details:

HTML character references begin with & and resolve through named entries or Unicode code points. Named references such as &copy; rely on the HTML reference table, which includes legacy forms for compatibility. Decimal numeric references such as &#169; and hexadecimal numeric references such as &#xA9; point to the same Unicode character by number.

Escaping is a context rule, not a universal text-cleaning rule. A text node does not use quotes as delimiters, so quote characters can remain plain text there. Attribute values are different because a quote can terminate the value. XML adds another boundary: its five predefined entities are amp, lt, gt, apos, and quot, and XML processors expect entity references to be semicolon-delimited.

Transformation Core:

HTML escape and unescape transformation rules
Path	Rule used	Important boundary
Text-node escape	Replace ampersand and angle brackets with named references.	Quotes are not changed because they are not text-node delimiters.
Attribute-safe escape	Replace ampersand, angle brackets, double quote, and single quote.	Single quotes use `&#39;` so the output works in common HTML attribute contexts.
XML five predefined	Use `&amp;`, `&lt;`, `&gt;`, `&quot;`, and `&apos;`.	This matches XML's predefined entity set rather than the broad HTML named-reference list.
Decimal or hex numeric escape	Write covered reserved characters as numeric character references.	The character choice is still context-bound; numeric form does not make unsafe markup safe.
Named common symbols	Escape reserved characters and selected common symbols such as non-breaking space, copyright, registered, trademark, currency, degree, and dash references.	It is a practical subset, not the full HTML named-reference table.
Unescape	Resolve named, decimal, and hexadecimal reference-shaped tokens into characters.	Strict mode preserves tokens without semicolons; tolerant mode follows browser-style resolution for common cases.
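The escape paths in the table can be modeled as character-to-reference mappings. The profile keys, `numeric_profile`, and `escape_with_profile` below are hypothetical names chosen for this sketch, and the single-quote reference in the attribute-safe mapping is an assumption.

```python
BASE = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}

# Illustrative profiles; the single-quote choices are assumptions.
PROFILES = {
    "text-node": dict(BASE),
    "attribute-safe": {**BASE, '"': "&quot;", "'": "&#39;"},
    "xml-five": {**BASE, '"': "&quot;", "'": "&apos;"},
}

def numeric_profile(chars: str = "&<>\"'", hex_form: bool = False) -> dict:
    """Build a mapping that writes each covered character as a numeric reference."""
    if hex_form:
        return {c: f"&#x{ord(c):X};" for c in chars}
    return {c: f"&#{ord(c)};" for c in chars}

def escape_with_profile(source: str, mapping: dict) -> str:
    # str.translate visits each source character once, so the "&" inside a
    # freshly written reference is never escaped again.
    return source.translate(str.maketrans(mapping))

snippet = '<a title="Coffee & tea">'
print(escape_with_profile(snippet, PROFILES["text-node"]))
print(escape_with_profile(snippet, PROFILES["attribute-safe"]))
print(escape_with_profile("<", numeric_profile(hex_form=True)))
```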

Reference Forms:

HTML character reference forms
Form	Example	Meaning
Named reference	`&lt;`	A name from the HTML character reference list represents the less-than sign.
Decimal numeric reference	`&#60;`	The Unicode code point is written in base 10.
Hex numeric reference	`&#x3C;`	The same code point is written in base 16.
Nested reference text	`&amp;lt;`	One decode pass produces `&lt;`; a later pass can produce `<`.
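A quick check with Python's standard `html.unescape` shows the named, decimal, and hexadecimal forms of the less-than sign resolving to the same character, and a nested reference needing two passes. This only illustrates the reference forms and is not the tool's own decoder.

```python
import html

# Named, decimal, and hexadecimal references for the less-than sign.
forms = ["&lt;", "&#60;", "&#x3C;"]
decoded = [html.unescape(form) for form in forms]
print(decoded)

# A nested reference resolves one layer per pass.
nested = "&amp;lt;"
first_pass = html.unescape(nested)
second_pass = html.unescape(first_pass)
print(first_pass, second_pass)
```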

Each source character is transformed once during escaping, so ampersands created inside a new reference are not escaped again in the same pass. Line-feed handling applies only in escape mode: line breaks can stay as line feeds, become <br> plus a line feed, or become &#10;. Carriage returns are not carried into the escaped result, so CRLF input effectively becomes LF-based output.
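The line-break behavior can be sketched as below, assuming the three modes are: keep the line feed, emit a `<br>` tag plus a line feed, or emit the numeric reference `&#10;`. The function name and mode labels are hypothetical.

```python
import html

def escape_with_line_breaks(source: str, mode: str = "preserve") -> str:
    """Escape text, then apply one of three assumed line-break modes."""
    # Carriage returns are dropped first, so CRLF input becomes LF-based.
    escaped = html.escape(source.replace("\r", ""), quote=True)
    if mode == "br":
        return escaped.replace("\n", "<br>\n")
    if mode == "numeric":
        return escaped.replace("\n", "&#10;")
    return escaped  # "preserve": line feeds stay as they are

sample = "a<b\r\nc"
print(repr(escape_with_line_breaks(sample)))
print(repr(escape_with_line_breaks(sample, "br")))
print(repr(escape_with_line_breaks(sample, "numeric")))
```

Line breaks are rewritten after the reserved characters are escaped, so the inserted tag is not itself turned into references.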

Decode passes matter when text has been encoded more than once. With two passes, &amp;lt; first becomes &lt;, then becomes <. If a pass makes no change, decoding stops before the requested maximum. Unicode normalization, when selected, is applied after conversion; trimming outer whitespace happens last.

Privacy and Safety Notes:

The conversion for pasted snippets and selected files runs in the browser. Selected files are read as text on the page, and no upload is needed for the escape or unescape operation. Avoid using secrets in any browser-based text tool unless you are comfortable with local browser form state, clipboard use, downloads, and any address you choose to share.

Escaped text is safer to display as text, but decoded text can become markup-shaped again. Use the correct output encoding for the final context, and use a sanitizer when untrusted HTML is meant to render as HTML rather than visible text.

Worked Examples:

Escaping a link snippet for review. Input such as <a title="Coffee & tea"> with the attribute-safe profile produces &lt;a title=&quot;Coffee &amp; tea&quot;&gt; in Converted Text. The Transform Ledger should show changes for the angle brackets, quote marks, and ampersand.

Auditing a missing semicolon. When a string such as Sale &copy is wrapped in semicolon-terminated tag references (&lt; and &gt;), strict mode decodes the terminated references, while the missing-semicolon copyright token remains marked as Skipped. That makes the questionable token visible instead of silently accepting a broken source string.
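Strict semicolon handling can be contrasted with tolerant decoding in a short sketch. `strict_unescape` and `TOKEN` are hypothetical names, the `<b>` wrapper in the sample input is invented for illustration, and Python's `html.unescape` stands in for browser-style tolerant behavior.

```python
import html
import re
from html.entities import html5

# Hypothetical pattern for reference-shaped tokens, with or without ";".
TOKEN = re.compile(r"&(?:#\d+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);?")

def strict_unescape(source: str):
    """Decode only semicolon-terminated references; report the rest as skipped."""
    skipped = []

    def resolve(match: re.Match) -> str:
        token = match.group(0)
        if not token.endswith(";"):
            skipped.append((match.start(), token))  # left visible for review
            return token
        if token[1] == "#":
            return html.unescape(token)   # numeric reference
        return html5.get(token[1:], token)  # exact named lookup, e.g. "lt;"

    return TOKEN.sub(resolve, source), skipped

sample = "&lt;b&gt;Sale &copy&lt;/b&gt;"
print(strict_unescape(sample))
print(html.unescape(sample))  # tolerant: also resolves the bare &copy
```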

Checking nested CMS output. Input such as &amp;lt;em&amp;gt;Draft&amp;lt;/em&amp;gt; needs more than one decode pass. One pass leaves &lt;em&gt;Draft&lt;/em&gt;; a second pass reveals tag-shaped text, <em>Draft</em>, and Copy Safety should prompt a markup review before that result is rendered anywhere.

FAQ:

Is escaping the same as sanitizing HTML?

No. Escaping writes reserved characters as text-safe references for a context. Sanitizing decides which HTML is allowed to remain when content will be rendered as markup.

Why did Auto detect choose the wrong direction?

Auto detect looks for common entity-shaped text. If a raw snippet contains only ordinary text or an ambiguous ampersand, pin Escape source or Unescape entities before reviewing the result.

Why did strict mode leave a reference unchanged?

Strict mode keeps missing-semicolon tokens unchanged and marks them in Transform Ledger. Switch to HTML5 tolerant only when browser-style legacy decoding is the behavior you want to inspect.

What file types can be imported?

Use one TXT, HTML, HTM, XML, or SVG file under 1 MB. The file is read as text; binary files and oversized exports should be cleaned up or split before importing.

Why are entity-like tokens still present in the output?

The text may be intentionally escaped, partially decoded, or nested. Check Output entity tokens in Copy Safety and compare the pass rows before adding more decode passes.

Glossary:

Character reference: An HTML or XML token that represents a character by name or Unicode code point.
Escaping: Replacing reserved characters with references so they are handled as text.
Unescaping: Resolving reference tokens back into readable characters.
Text node: Visible text between tags, where quote marks normally have no delimiter role.
Attribute value: Text inside a tag attribute, where the surrounding quote delimiter must be protected.
Decode pass: One complete attempt to resolve reference-shaped tokens in the current text.
