HTML Escaper
Escape HTML text online for body, attribute, XML, or aggressive contexts with entity ledgers, validation checks, and exports for safer markup handoff.
Introduction
HTML escaping turns characters that can be read as markup into character references that can be displayed as ordinary text. The difference matters whenever copied text includes ampersands, angle brackets, quote marks, existing entities, or non-ASCII characters that need to survive a handoff without being parsed as tags or attributes.
This tool takes pasted text, typed text, or a local TXT, HTML, XML, or SVG file and prepares escaped output for the selected HTML destination. The default attribute-safe mode treats ampersands, angle brackets, double quotes, and apostrophes. Text-node mode keeps quote marks readable when the value will sit between tags. XML mode uses the five predefined XML and HTML5 references. Aggressive mode encodes every non-alphanumeric character for strict attribute handoff.
The result updates as the source changes. A summary counts entity treatments, the main output view is ready to copy or download, the ledger explains which characters were changed or preserved, and the checks table calls out common review points such as existing entities and HTML-only context boundaries.
Processing happens in the browser session after the page loads. Pasted content and dropped files are not sent away for escaping, which is useful when the source includes internal snippets, draft copy, or private template fragments. Still, escaping is not sanitizing. It prepares text for HTML text or attribute insertion, but it does not clean unsafe markup, validate a template, or make URL, JavaScript, or CSS contexts safe.
Technical Details:
HTML character references represent characters using forms such as `&amp;`, `&lt;`, `&quot;`, `&#39;`, or `&apos;`. During HTML parsing, those references resolve back to literal characters. Escaping works in the opposite direction: it takes literal characters from source text and writes reference-shaped output so the browser reads them as text instead of syntax.
The selected target controls which characters are replaced. In a text node, the important risk is tag or entity parsing, so ampersands and angle brackets are enough for normal display text. In an attribute value, quote marks can close the attribute early, so both quote types are escaped by default. XML mode follows the predefined entity set, including `&apos;` for apostrophes when readable named entities are selected.
| Escape target | Characters treated by the normal rule | Best fit |
|---|---|---|
| HTML attribute-safe text | `&`, `<`, `>`, `"`, and apostrophe (`'`) | Quoted attributes, reusable snippets, and values that might be pasted between tags or inside attributes. |
| HTML text node only | `&`, `<`, and `>` | Visible text that will sit between opening and closing tags. |
| XML / HTML5 predefined entities | The five predefined XML characters: ampersand, angle brackets, double quote, and apostrophe. | XML-like handoff where the predefined entity names should be used consistently. |
| Aggressive attribute encoding | Every character that is not an ASCII letter or digit. | Strict handoff formats where spaces, punctuation, line breaks, and symbols should all become references. |
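A minimal Python sketch of the four targets in the table, assuming readable named entities and a decimal fallback for characters without a built-in name. The target names (`"attribute"`, `"text"`, `"xml"`, `"aggressive"`) and the helper functions are hypothetical, not the tool's own code.

```python
# Hypothetical sketch of the four escape targets with readable named entities.
# Target names and the decimal fallback are assumptions, not the tool's code.

# Characters each non-aggressive target treats.
TARGET_CHARS = {
    "attribute": set("&<>\"'"),
    "text": set("&<>"),
    "xml": set("&<>\"'"),
}

NAMED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def reference_for(ch: str, target: str) -> str:
    """Readable reference for ch, falling back to a decimal reference."""
    if ch in NAMED:
        return NAMED[ch]
    if ch == "'":
        # XML mode uses the predefined name; the HTML targets use the numeric form.
        return "&apos;" if target == "xml" else "&#39;"
    return f"&#{ord(ch)};"


def escape_for_target(text: str, target: str = "attribute") -> str:
    out = []
    for ch in text:
        if target == "aggressive":
            # Everything except ASCII letters and digits is encoded.
            treat = not (ch.isascii() and ch.isalnum())
        else:
            treat = ch in TARGET_CHARS[target]
        out.append(reference_for(ch, target) if treat else ch)
    return "".join(out)


print(escape_for_target("Tom's <b>", "text"))       # Tom's &lt;b&gt;
print(escape_for_target("Tom's <b>", "attribute"))  # Tom&#39;s &lt;b&gt;
print(escape_for_target("Tom's <b>", "xml"))        # Tom&apos;s &lt;b&gt;
print(escape_for_target("a b!", "aggressive"))      # a&#32;b&#33;
```

The same input produces four different outputs, which is why the target should be chosen from the destination rather than from the source text.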
Entity style changes how replacements are spelled, not which characters are replaced. Readable named entities use `&amp;`, `&lt;`, `&gt;`, `&quot;`, and either `&#39;` or `&apos;` for apostrophes depending on the target. Decimal style writes references such as `&#60;`. Hex style writes references such as `&#x3C;`. Characters without one of the built-in readable names become numeric references when they are escaped.
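To make the three spellings concrete, here is a small Python sketch. The `format_reference` helper, its style names, and the choice of uppercase hex digits are illustrative assumptions. In this sketch the named style falls back to decimal for any character outside the built-in set, including the apostrophe, which matches the `&#39;` form used by the HTML targets.

```python
# Hypothetical helper showing one character spelled in each entity style.
# Style names and uppercase hex digits are assumptions for illustration.
READABLE = {"&": "amp", "<": "lt", ">": "gt", '"': "quot"}


def format_reference(ch: str, style: str = "named") -> str:
    code = ord(ch)
    if style == "decimal":
        return f"&#{code};"
    if style == "hex":
        return f"&#x{code:X};"
    # Named style: a readable name when one is built in, else a decimal reference.
    name = READABLE.get(ch)
    return f"&{name};" if name else f"&#{code};"


for style in ("named", "decimal", "hex"):
    print(style, format_reference("<", style))
# named &lt;
# decimal &#60;
# hex &#x3C;
```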
Existing-entity handling decides what happens to text that already looks like a valid named or numeric reference. The default escapes every ampersand, so `&copy;` becomes visible source text rather than resolving to a copyright symbol later. Preserve mode leaves valid references unchanged and counts them separately in the ledger. That option is useful only when the original references are already trusted and intentionally present.
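A sketch of the preserve-valid policy in Python, assuming text-node rules (`&`, `<`, `>`) and treating a reference as valid when it is a known HTML5 name (checked against the standard library's `html.entities.html5` table) or a numeric reference in the Unicode range. The `escape_text` function is hypothetical, and its validity check is simplified; for example, it does not reject surrogate code points.

```python
import html
import re
from html.entities import html5

# An entity-shaped run: &name;  &#123;  or  &#x7B;
ENTITY_RE = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def is_valid_reference(ref: str) -> bool:
    body = ref[1:-1]  # drop the leading "&" and trailing ";"
    if body[:2] in ("#x", "#X"):
        value = int(body[2:], 16)
    elif body.startswith("#"):
        value = int(body[1:])
    else:
        return body + ";" in html5  # html5 keys include the semicolon
    return 0 < value <= 0x10FFFF  # simplified numeric range check


def escape_text(text: str, preserve_valid: bool = False) -> tuple[str, int]:
    """Escape &, <, > and return (output, number of preserved references)."""
    out, preserved, i = [], 0, 0
    while i < len(text):
        if preserve_valid and text[i] == "&":
            m = ENTITY_RE.match(text, i)
            if m and is_valid_reference(m.group()):
                out.append(m.group())  # keep the trusted reference as-is
                preserved += 1
                i = m.end()
                continue
        out.append(html.escape(text[i], quote=False))
        i += 1
    return "".join(out), preserved


print(escape_text("&copy; &bogus; <b>"))
# ('&amp;copy; &amp;bogus; &lt;b&gt;', 0)
print(escape_text("&copy; &bogus; <b>", preserve_valid=True))
# ('&copy; &amp;bogus; &lt;b&gt;', 1)
```

Entity-shaped text that is not a known reference, such as `&bogus;`, is still escaped under the preserve policy, so only recognizable references pass through unchanged.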
| Option | What it changes | When to use it |
|---|---|---|
| Preserve valid entities | Leaves valid named, decimal, or hex references unchanged and reports them as preserved. | The source already contains intentional references that should still resolve later. |
| Escape all ampersands | Treats every ampersand as literal text, including entity-shaped input. | The source should display exactly as written or might contain untrusted references. |
| Escape non-ASCII | Encodes characters above ASCII, such as symbols, accented letters, and emoji, when the target is not already aggressive. | A legacy system cannot reliably carry UTF-8 text. |
| Normalize line endings | Preserves pasted endings or rewrites them to LF or CRLF before escaping. | A downstream file format or review workflow expects one line-ending convention. |
| Trim outer whitespace | Removes leading and trailing whitespace before conversion while keeping interior spacing. | The copied source includes accidental blank lines around the real payload. |
source text or local file
-> optional trim
-> optional line-ending normalization
-> selected HTML target
-> existing entity policy
-> readable, decimal, or hex entity style
-> escaped output, entity ledger, checks, and JSON summary
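The pipeline above can be sketched in Python as follows. This is a hypothetical outline, not the tool's implementation: it uses the standard library's `html.escape` as a stand-in for the target, policy, and style steps, so it always escapes every ampersand and writes readable names (Python spells the apostrophe `&#x27;`). The option names `trim`, `line_endings`, and `quote` are assumptions, as is the handling of lone CR characters.

```python
import html
import json
import re


def run_pipeline(source: str, *, trim: bool = False,
                 line_endings: str = "preserve", quote: bool = True) -> str:
    """Hypothetical end-to-end pass that returns a JSON summary string."""
    text = source.strip() if trim else source        # 1. optional trim
    if line_endings in ("LF", "CRLF"):                # 2. optional normalization
        text = re.sub(r"\r\n?", "\n", text)           #    CRLF and lone CR -> LF
        if line_endings == "CRLF":
            text = text.replace("\n", "\r\n")
    # 3-5. target, escape-all policy, and readable style, approximated with the
    # standard library: quote=True ~ attribute-safe, quote=False ~ text node.
    escaped = html.escape(text, quote=quote)
    return json.dumps({                               # 6. output + summary
        "options": {"trim": trim, "line_endings": line_endings, "quote": quote},
        "source_length": len(source),
        "output_length": len(escaped),
        "output": escaped,
    })


print(run_pipeline("  <b>Hi</b>\r\n", trim=True, quote=False))
```

Trimming runs before normalization, so trailing line breaks removed by the trim never reach the line-ending step, which mirrors the order listed above.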
Everyday Use & Decision Guide:
Use HTML attribute-safe text as the starting point when the destination is not certain. It escapes the characters that most often break quoted attributes and still works for ordinary text display. Switch to HTML text node only when you know the value will appear between tags and you want quote marks to remain readable.
Choose XML / HTML5 predefined entities for XML-like output where apostrophes should be written as `&apos;` under the readable style. Choose Aggressive attribute encoding when a receiving system expects nearly every punctuation mark, space, line break, or symbol to be encoded. Aggressive output is harder to read, so it is usually a handoff format rather than the best choice for human-edited templates.
The existing-entity setting is the most important safety decision. If the source came from a user, a ticket, a copied web page, or any place where entity-shaped text might not be deliberate, keep Escape all ampersands. If the source came from a trusted template and already contains intentional references such as `&amp;` or `&copy;`, Preserve valid entities can avoid double-escaping them.
| Task | Good starting choice | Review before copying |
|---|---|---|
| Show a markup example as text | Text node mode, escape all ampersands, readable names. | Quotes may remain literal, which is fine between tags but not inside attributes. |
| Paste a value into a quoted attribute | Attribute-safe mode, escape all ampersands, readable names. | Confirm the ledger treated both quote types when they appear. |
| Prepare XML-flavored text | XML mode and readable names. | Apostrophes should appear as `&apos;` rather than the HTML numeric form `&#39;`. |
| Send text through a legacy system | Attribute-safe or aggressive mode with non-ASCII escaping, decimal or hex style as requested. | Output length can grow quickly, especially with emoji, spaces, and line breaks. |
| Document exactly how text was treated | Use the ledger, checks table, and JSON export after setting the final options. | The export should match the selected target and entity policy, not an earlier draft. |
Leave non-ASCII characters unescaped for normal UTF-8 pages unless a receiving system has a clear restriction. Encoding every symbol can be helpful for old systems, but it also makes the output harder to review and may hide readable text behind numeric references.
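When a legacy system does require it, Python's built-in `xmlcharrefreplace` error handler produces decimal references for every non-ASCII character. The `escape_non_ascii` name is hypothetical, and applying it after HTML escaping is a design choice this sketch assumes, so that the `&` of each new reference is not escaped a second time.

```python
import html


def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    # "xmlcharrefreplace" writes each character ASCII cannot encode as &#NNN;.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Escape the HTML syntax characters first; otherwise the "&" of each new
# numeric reference would itself be escaped to "&amp;".
print(escape_non_ascii(html.escape("Café & ☕")))  # Caf&#233; &amp; &#9749;
```

Each emoji or symbol grows from one character to a reference of seven or more, which is part of why this option is best left off for ordinary UTF-8 pages.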
Step-by-Step Guide:
- Paste text or markup into Text or markup, or use Browse file or drag and drop for a local text file.
- Set Escape target to match the destination where the escaped text will be pasted.
- Choose Existing entities. Use Escape all ampersands for literal display and untrusted source text. Use Preserve valid entities only for trusted references.
- Open Advanced when the handoff needs a specific entity style, non-ASCII escaping, line-ending normalization, or trimming.
- Read the summary. It reports the source length, output length, selected context, entity policy, entity style, and total entity treatments.
- Use Escaped HTML Text to copy or download the final escaped text.
- Open Entity Treatment Ledger to see each treated character, replacement entity, count, and usage note.
- Open Escape Checks to confirm source status, coverage, existing entity handling, HTML-only context boundaries, and length expansion.
- Use JSON when you need a structured record of the inputs, options, escaped output, ledger rows, and checks.
Interpreting Results:
The summary's headline count is the number of replacements plus any valid entities preserved by policy. A high count does not automatically mean the source was unsafe; it often means the selected target is strict or the source contains many repeated brackets, ampersands, spaces, or non-ASCII characters. A zero count means no characters required treatment under the current settings.
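A rough Python sketch of where the count comes from, assuming attribute-safe rules (`quote=True`) or text-node rules (`quote=False`) and no preserved references; the `ledger` helper is hypothetical. Under a preserve policy, each preserved reference would be added to the replacement total.

```python
from collections import Counter


def ledger(text: str, quote: bool = True) -> tuple[Counter, int]:
    """Count characters that attribute-safe (quote=True) or text-node rules treat."""
    treated = set("&<>") | (set("\"'") if quote else set())
    counts = Counter(ch for ch in text if ch in treated)
    return counts, sum(counts.values())


counts, total = ledger('<a title="x">A & B</a>')
print(total)        # 7
print(counts["<"])  # 2
```

The per-character `Counter` corresponds to ledger rows, while the summed total corresponds to the headline figure.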
The escaped output should be read with the selected target in mind. In text-node mode, quote marks can remain unchanged because they do not close an attribute between tags. In attribute-safe mode, both quote types are treated so the same value is safer to reuse inside quoted attributes. In aggressive mode, spaces and punctuation become references too, so the output may expand much more than expected.
| Result view | What it tells you | Common misread |
|---|---|---|
| Escaped HTML Text | The final escaped string after normalization and selected policies. | Assuming the same output is also safe for URL, JavaScript, or CSS contexts. |
| Entity Treatment Ledger | Which characters were replaced or preserved, how many times, and why. | Ignoring preserved references when they came from a source that was not trusted. |
| Escape Checks | Source readiness, coverage, entity policy, HTML-only boundary, and length ratio. | Treating a passing check as sanitizer approval. It is still only HTML escaping. |
| JSON | A structured snapshot of inputs, options, output, ledger rows, and checks. | Sharing JSON that contains sensitive source text when only the escaped output is needed. |
The HTML-only check is deliberate. HTML escaping is the right treatment for text displayed in an HTML text node or quoted attribute. URLs need percent-encoding, JavaScript strings need JavaScript string escaping, CSS needs CSS escaping, and untrusted markup needs sanitizing or removal rather than simple entity replacement.
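A short Python comparison shows why one treatment does not cover the others. It uses three standard-library functions as stand-ins for each context: `html.escape` for HTML, `urllib.parse.quote` for a URL component, and `json.dumps` for a JavaScript string literal. `json.dumps` alone is not a complete recipe for embedding data inside a `<script>` element; it only illustrates that the rules differ.

```python
import html
import json
from urllib.parse import quote

value = 'Tom & "Jerry" <3'

print(html.escape(value))      # Tom &amp; &quot;Jerry&quot; &lt;3
print(quote(value, safe=""))   # Tom%20%26%20%22Jerry%22%20%3C3
print(json.dumps(value))       # "Tom & \"Jerry\" <3"
```

The JSON form leaves `&` and `<` untouched, and the HTML form leaves spaces untouched, so neither output is safe to reuse in the other's context.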
Worked Examples:
Markup shown as text
A source such as `<strong>Save & close</strong>` in text-node mode becomes visible example text rather than a real strong element. The angle brackets and ampersand are escaped, while quote handling is not needed because the result sits between tags.
Input:
`<strong>Save & close</strong>`
Text-node output:
`&lt;strong&gt;Save &amp; close&lt;/strong&gt;`
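Python's standard library can reproduce this example: `html.escape` with `quote=False` treats only `&`, `<`, and `>`, which matches the text-node rule. This is a cross-check, not the tool's own code.

```python
import html

source = "<strong>Save & close</strong>"
# quote=False treats only &, <, and >, matching the text-node rule.
text_node = html.escape(source, quote=False)
print(text_node)  # &lt;strong&gt;Save &amp; close&lt;/strong&gt;

# One round of entity decoding (as an HTML parser would do) restores the source.
assert html.unescape(text_node) == source
```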
Attribute value with quotes
Attribute-safe mode is a better fit when the text could be placed inside a quoted attribute. A label such as Save "draft" & close needs the ampersand and the double quotes treated so the destination attribute stays closed.
Input:
`Save "draft" & close`
Attribute-safe output:
`Save &quot;draft&quot; &amp; close`
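The same example in Python, using `html.escape` with its default `quote=True`. Note one spelling difference: Python writes the apostrophe as the hex reference `&#x27;`, which resolves to the same character as the `&#39;` form. The `<button>` markup is only an illustrative destination.

```python
import html

label = 'Save "draft" & close'
attr_value = html.escape(label)  # quote=True is the default
print(attr_value)                # Save &quot;draft&quot; &amp; close
print(f'<button title="{attr_value}">')
# <button title="Save &quot;draft&quot; &amp; close">

# Apostrophes are escaped too, so single-quoted attributes stay closed.
print(html.escape("it's"))       # it&#x27;s
```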
Trusted existing entities
With escape-all mode, `&copy;` is treated as literal source text and its leading ampersand is escaped. With preserve-valid mode, a valid reference can remain unchanged and appears as a preserved ledger entry. That second behavior is useful for trusted templates, not for unknown input.
Input:
`Already escaped: &copy; &amp; &#169;`
Escape all ampersands:
`Already escaped: &amp;copy; &amp;amp; &amp;#169;`
Preserve valid entities:
`Already escaped: &copy; &amp; &#169;`
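A Python cross-check of the escape-all result, using `html.unescape` as a rough stand-in for how a browser resolves references during parsing (an approximation, not a full HTML parser).

```python
import html

source = "Already escaped: &copy; &amp; &#169;"
escaped_all = html.escape(source, quote=False)
print(escaped_all)  # Already escaped: &amp;copy; &amp;amp; &amp;#169;

# Decoding the escaped output once displays the source exactly as written.
assert html.unescape(escaped_all) == source
# Decoding the unescaped source instead resolves the references.
print(html.unescape(source))  # Already escaped: © & ©
```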
FAQ:
Is this the same as sanitizing HTML?
No. Escaping turns selected characters into references so text can be displayed in an HTML context. Sanitizing inspects markup and removes or allows elements, attributes, and protocols according to a policy. Use a sanitizer when you need to accept and render user-provided markup.
Why does attribute-safe mode escape apostrophes?
An apostrophe can close a single-quoted attribute. Attribute-safe mode treats both quote types so the same escaped value is safer to reuse in either quote style.
Why does preserving existing entities need caution?
A preserved reference can still resolve later. That is useful when the reference is intentional, but risky when the source is unknown and you wanted exact literal display. Escape all ampersands when in doubt.
Should non-ASCII characters always be escaped?
No. Normal UTF-8 pages can carry accented letters, symbols, and emoji directly. Turn on non-ASCII escaping only when a legacy handoff cannot safely carry Unicode or a receiving system explicitly asks for numeric references.
Why is the escaped output longer?
References take more characters than the original symbols. For example, one ampersand becomes `&amp;`, and one less-than sign becomes `&lt;`. Aggressive mode and non-ASCII escaping can expand the text much more.
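A small Python sketch of the length ratio reported by the checks, assuming the ratio is output length divided by source length; the `expansion_ratio` helper and its handling of empty input are assumptions.

```python
import html


def expansion_ratio(source: str, escaped: str) -> float:
    """Output length divided by source length (1.0 for empty input, by assumption)."""
    return len(escaped) / len(source) if source else 1.0


src = "a & b < c"
out = html.escape(src)  # 'a &amp; b &lt; c'
print(len(src), len(out), round(expansion_ratio(src, out), 2))  # 9 16 1.78
```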
Glossary:
- HTML character reference: Text such as `&amp;` or `&lt;` that represents a character during HTML parsing.
- Text node: Ordinary text between HTML tags, such as the visible words inside a paragraph.
- Attribute value: Text assigned to an HTML attribute, usually inside single or double quotes.
- Non-ASCII: Characters outside the basic ASCII range, including many accented letters, currency symbols, mathematical signs, and emoji.
- Double-escaping: Escaping text that already contains references, which can make a reference display literally instead of resolving to its character.