HTML Entity Lookup

Characters in HTML have two identities at once. One identity is the character a reader sees, such as a copyright sign, an ampersand, or a mathematical symbol. The other is the source text the browser reads before it can display anything. When a character is also part of HTML syntax, the source form matters as much as the visible glyph.

Character references are the standard way to write a character through a source token. A named reference such as &copy;, a decimal numeric reference such as &#169;, and a hexadecimal numeric reference such as &#xA9; can all resolve to the same Unicode value. The choice affects readability, portability, and how easy it is to review a template later.
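As a quick check that the named, decimal, and hexadecimal spellings resolve to the same value, here is a minimal Python sketch using the standard library's html.unescape. The variable names are illustrative and are not part of this tool.

```python
import html

# Three spellings of the copyright sign: named, decimal, and hexadecimal.
references = ["&copy;", "&#169;", "&#xA9;"]

# html.unescape applies HTML5 character-reference decoding to a string.
resolved = {html.unescape(ref) for ref in references}

# The set holds a single character because all three spellings agree.
print([f"U+{ord(ch):04X}" for ch in resolved])
```

The set collapses to one entry, so the choice between the three forms is about source readability, not about the character that reaches the reader.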

The confusing part is that HTML is only one destination. The same value may need one spelling in text between tags, another inside an attribute, another in a JavaScript string, another in CSS generated content, and another in a URL component. Copying the first entity that looks right can create a string that is valid in one place and wrong in another.

Common HTML character reference decisions
Question	Useful distinction	Common mistake
Do I need a named entity?	Named references are readable when an official semicolon-terminated name exists.	Assuming every Unicode character has a named HTML reference.
Is the numeric value enough?	Numeric references point directly to Unicode code points and work even without a name.	Forgetting that a visible symbol may be a sequence of multiple code points.
Where will I paste it?	Text nodes, quoted attributes, unquoted attributes, JavaScript, CSS, and URLs use different escape rules.	Moving an HTML entity into code or a URL and expecting the destination language to decode it.

Several terms help keep the decision clear. A character reference is the source token that starts with an ampersand. A code point is the Unicode number behind a character, written in forms such as U+00A9. A grapheme cluster is what people often treat as one character even when it uses more than one code point, such as a base symbol plus a combining mark.
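The difference between a code point and a grapheme cluster can be seen with a short Python sketch. It assumes the two-code-point sequence U+2242 U+0338 as the sample and uses only the standard unicodedata module. Python strings count code points, not grapheme clusters.

```python
import unicodedata

# U+2242 followed by U+0338, a combining overlay mark.
text = "\u2242\u0338"

# One entry per code point, in U+XXXX notation.
code_points = [f"U+{ord(ch):04X}" for ch in text]

# unicodedata.combining returns a non-zero class for combining marks.
combining_flags = [unicodedata.combining(ch) != 0 for ch in text]

print(len(text), code_points, combining_flags)
```

A reader sees one symbol, but the string length is 2 and only the second code point is flagged as a combining mark.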

Legacy parsing rules add one more wrinkle. Some names can be recognized without a semicolon for compatibility with old pages, and several names can point to the same character or sequence. New HTML is easier to audit when named references include the semicolon and when attribute values are quoted.
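The legacy behavior can be observed with Python's standard html module, whose html.entities.html5 table stores names without the leading ampersand and lists legacy no-semicolon forms as separate keys. This is a sketch of the parsing rule, not of this tool's internals. The sample strings are made up for illustration.

```python
import html
from html.entities import html5

# "copy" is a legacy name, so both spellings are present in the table.
has_legacy_form = "copy" in html5
has_modern_form = "copy;" in html5

# "hellip" has no legacy form; only the semicolon-terminated key exists.
hellip_is_legacy = "hellip" in html5

# A legacy name decodes even without the semicolon.
decoded_legacy = html.unescape("&copy 2024")

# A legacy prefix can swallow the start of a longer, unknown name.
misread = html.unescape("&notit;")

print(has_legacy_form, has_modern_form, hellip_is_legacy)
print(repr(decoded_legacy), repr(misread))
```

The last case is why the semicolon helps reviewers: the decoder matches the legacy name not and leaves the remaining characters as literal text.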

An entity lookup answers a narrow question: what value does this token, name, symbol, or code point represent, and which copy form fits the destination syntax? It does not prove that a larger HTML fragment is safe, sanitized, or valid as a full document.

How to Use This Tool:

Use one entity token, bare entity name, symbol, or code point sequence at a time. The lookup separates identity checks from copy-ready output so you can confirm the value before choosing a paste target.

Enter a value in Entity, name, symbol, or code point. Useful examples include &copy;, copy, &#169;, U+00A9, a literal quote, or U+2242 U+0338.
Keep Lookup direction on Auto for normal work. Pin Decode entity or name when the input already looks like an HTML reference, or Encode symbol or code point when you start from a glyph or Unicode value.
Use Input hint when the source is ambiguous. The code point hint is best for U+, 0x, decimal scalar, or numeric character reference input.
Set Preferred HTML output to choose the main recommendation. Readable named first favors official names when available, while decimal and hex force numeric references.
Open Advanced when reviewing edge cases. Strict entity parsing rejects loose forms, and Attribute-value decode rules checks how the value behaves inside an attribute context.
Read Reference Board first. Confirm the resolved text, Unicode sequence, UTF-8 bytes, recommended named alias, numeric references, JavaScript escape, CSS escape, and URL component form.
If the summary reports Fix input and retry, correct the case-sensitive name, add a semicolon, remove extra literal text, or keep every code point inside the valid Unicode scalar range before copying from the result tabs.

Interpreting Results:

The primary output is the preferred HTML form for the current settings. It may differ from the text you typed because decode mode resolves a reference to its character first, while encode mode turns a character or code point into source-safe forms.

Reference Board is the identity check. Use it when similar-looking symbols might hide different code points, when a combining mark is present, or when you need the exact UTF-8 bytes before copying into another system.

Paste Targets is the destination check. The HTML text row is for content between tags. Quoted attribute rows include the quote-specific escape. The unquoted attribute row is disabled when whitespace, quotes, equals signs, grave accents, empty text, or other syntax risks make unquoted markup unsafe.
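The unquoted-attribute check can be approximated in a few lines of Python. This is a minimal sketch that assumes the HTML syntax restrictions on unquoted attribute values (no ASCII whitespace, quotes, equals sign, angle brackets, or grave accent, and no empty value). The function name is hypothetical, and the tool may apply additional checks.

```python
# Characters that must not appear in an unquoted attribute value.
FORBIDDEN_UNQUOTED = set(" \t\n\f\r\"'=<>`")


def is_safe_unquoted(value: str) -> bool:
    """Return True when value could be written without surrounding quotes."""
    if not value:
        return False  # an empty value cannot be written unquoted
    return not any(ch in FORBIDDEN_UNQUOTED for ch in value)


for sample in ["large", "two words", "a=b", ""]:
    print(repr(sample), is_safe_unquoted(sample))
```

When the check fails, quoting the attribute is the simpler fix than trying to escape each risky character.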

Entity Aliases lists official names for the same resolved value when the public standards catalog is available. Prefer the semicolon-terminated recommendation for new markup. Treat no-semicolon aliases as parser-compatibility information.

Code Points shows each scalar value separately. A single-looking character can produce more than one row when the result is a sequence, and copying only part of the sequence can change the displayed symbol.
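A per-code-point breakdown like the one described above can be sketched in Python as follows. The function name and dictionary keys are assumptions chosen for illustration. They do not describe the tool's real data model.

```python
def code_point_rows(text: str) -> list:
    """Build one row per code point with Unicode, decimal, and UTF-8 forms."""
    rows = []
    for index, ch in enumerate(text, start=1):
        rows.append({
            "index": index,
            "unicode": f"U+{ord(ch):04X}",
            "decimal": ord(ch),
            # bytes.hex(" ") separates the bytes with spaces (Python 3.8+).
            "utf8": ch.encode("utf-8").hex(" ").upper(),
        })
    return rows


for row in code_point_rows("\u2242\u0338"):
    print(row)
```

The sample produces two rows, which is the signal that both values must be copied together.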

Paste Review collects warnings such as bare-name shortcuts, missing semicolons, invisible characters, multiple code points, ignored extra literal text, numeric-only fallback, and cases where a quoted attribute form is safer than an unquoted one.

Technical Details:

HTML character references are resolved by the HTML parser before displayed text is formed. A named reference maps a case-sensitive name to one or more Unicode code points from the HTML named-reference table. A numeric reference maps a base-10 or base-16 integer to a Unicode scalar value.
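Both mappings can be illustrated with Python's html.entities.html5 table, which mirrors the HTML named-reference list, and with plain integer parsing for numeric references. This is a sketch for illustration. The variable names are arbitrary.

```python
from html.entities import html5

# Named lookup is case-sensitive: these two names map to different letters.
upper = html5["Aacute;"]
lower = html5["aacute;"]

# Some names map to more than one code point.
sequence = html5["nesim;"]

# Numeric references are integers in base 10 or base 16.
decimal_value = chr(int("169", 10))
hex_value = chr(int("A9", 16))

print(upper, lower, [f"U+{ord(ch):04X}" for ch in sequence])
print(decimal_value == hex_value)
```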

Unicode scalar values range from U+0000 through U+10FFFF, excluding surrogate code points. Surrogates are part of UTF-16 encoding, not standalone characters, so an explicit code point in that range is not a valid scalar value for a character reference.
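The range rule translates directly into a small predicate. The function name below is hypothetical.

```python
def is_scalar_value(code_point: int) -> bool:
    """True for U+0000..U+10FFFF, excluding the surrogate range U+D800..U+DFFF."""
    in_unicode_range = 0 <= code_point <= 0x10FFFF
    is_surrogate = 0xD800 <= code_point <= 0xDFFF
    return in_unicode_range and not is_surrogate


for value in (0x00A9, 0xD800, 0x10FFFF, 0x110000):
    print(f"U+{value:04X}", is_scalar_value(value))
```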

The same resolved text can still need different escaping after lookup. HTML text, HTML attributes, JavaScript strings, CSS strings, and URL components have separate grammars. The safest copy form is therefore tied to the destination, not only to the character identity.

Transformation Core:

HTML entity transformation paths
Input path	Example	Resolution step	Important boundary
Named reference	`&copy;`	The case-sensitive name maps through the named-reference catalog.	Names normally need the leading ampersand and trailing semicolon in real HTML source.
Bare name shortcut	`copy`	The bare word is treated as a lookup convenience for a matching named reference.	The shortcut is not the source text to paste into markup.
Decimal numeric reference	`&#169;`	The decimal integer becomes a Unicode scalar value.	Out-of-range values and surrogate code points are invalid.
Hex numeric reference	`&#x2014;`	The hexadecimal integer becomes a Unicode scalar value.	Hex digit case does not change the value, but named-reference case still matters.
Code point sequence	`U+2242 U+0338`	Each scalar value becomes a character, then the sequence is kept together.	Combining marks must travel with the base character when the combined display is intentional.
Literal symbol or grapheme	`&`	The first grapheme cluster is encoded into named, numeric, and destination-specific forms.	Extra literal text after the first cluster is reported rather than silently encoded as a phrase.
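The first five input paths in the table can be sketched as a single Python function. This is an illustrative approximation, not the tool's actual implementation. The names resolve and _checked are hypothetical. The sketch leaves out the literal-symbol path, and it does not reproduce the HTML parser's special remapping of certain numeric values.

```python
import re
from html.entities import html5


def _checked(cp: int) -> int:
    """Return cp if it is a Unicode scalar value, else raise ValueError."""
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    return cp


def resolve(query: str) -> str:
    """Resolve one lookup query to the text it represents."""
    q = query.strip()
    # Code point sequence such as "U+2242 U+0338": keep the sequence together.
    if re.fullmatch(r"(?:U\+[0-9A-Fa-f]{1,6}\s*)+", q):
        hex_values = re.findall(r"U\+([0-9A-Fa-f]{1,6})", q)
        return "".join(chr(_checked(int(h, 16))) for h in hex_values)
    # Hexadecimal numeric reference.
    match = re.fullmatch(r"&#[xX]([0-9A-Fa-f]+);", q)
    if match:
        return chr(_checked(int(match.group(1), 16)))
    # Decimal numeric reference.
    match = re.fullmatch(r"&#([0-9]+);", q)
    if match:
        return chr(_checked(int(match.group(1))))
    # Named reference or bare-name shortcut; the lookup is case-sensitive.
    match = re.fullmatch(r"&?([A-Za-z][A-Za-z0-9]*);?", q)
    if match and match.group(1) + ";" in html5:
        return html5[match.group(1) + ";"]
    raise ValueError(f"unrecognized lookup: {query!r}")


for sample in ["&copy;", "copy", "&#169;", "&#x2014;", "U+2242 U+0338"]:
    print(sample, [f"U+{ord(ch):04X}" for ch in resolve(sample)])
```

Raising on unknown names and on surrogate values, rather than guessing, matches the table's boundary notes for each path.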

Paste Context Rules:

Destination syntax for copied entity output
Destination	Rule to apply	Practical result
HTML text node	Escape characters that disturb markup parsing, especially ampersand and angle brackets.	Use between tags when the value is plain page text.
Double-quoted attribute	Escape the double quote along with normal markup-sensitive characters.	Use when the surrounding attribute is wrapped in `"`.
Single-quoted attribute	Escape the single quote along with normal markup-sensitive characters.	Use when the surrounding attribute is wrapped in `'`.
Unquoted attribute	Reject forms that contain whitespace, quotes, equals signs, grave accents, empty text, or other token-breaking characters.	Use only when the row is enabled and unquoted syntax is deliberate.
JavaScript string	Use JavaScript escape syntax rather than an HTML character reference.	Copy this form for source strings, not for rendered HTML text.
CSS string or content	Use CSS backslash code point escaping with proper termination.	Copy this form for CSS-generated text or CSS string values.
URL component	Use percent-encoding instead of entity syntax.	Copy this form for query values or path components that must travel through a URL.
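The destination rules above can be sketched with Python's standard library. The helper names (paste_forms, js_escape, css_escape) are hypothetical, and the policy of escaping everything except ASCII letters and digits in the JavaScript and CSS forms is a conservative assumption made for this example. It is not necessarily the output this tool produces.

```python
import html
from urllib.parse import quote


def _is_plain(ch: str) -> bool:
    # ASCII letters and digits are left unescaped in this sketch.
    return ch.isascii() and ch.isalnum()


def js_escape(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if _is_plain(ch):
            out.append(ch)
        elif cp > 0xFFFF:
            out.append(f"\\u{{{cp:X}}}")  # code point escape, e.g. \u{1F680}
        else:
            out.append(f"\\u{cp:04X}")    # four-digit escape, e.g. \u00A9
    return "".join(out)


def css_escape(text: str) -> str:
    # Backslash plus hex digits, terminated by a space so that following
    # characters are not read as part of the escape.
    return "".join(ch if _is_plain(ch) else f"\\{ord(ch):X} " for ch in text)


def paste_forms(text: str) -> dict:
    base = html.escape(text, quote=False)  # escapes &, <, and > only
    return {
        "html_text": base,
        "attr_double": base.replace('"', "&quot;"),
        "attr_single": base.replace("'", "&#39;"),
        "javascript": js_escape(text),
        "css": css_escape(text),
        "url": quote(text, safe=""),       # percent-encodes the UTF-8 bytes
    }


for key, value in paste_forms('"').items():
    print(key, repr(value))
```

For a literal double quote, only the double-quoted attribute form changes among the HTML rows, which shows why the wrapper quote decides what must be escaped.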

Alias and Catalog Limits:

Official HTML named references can include multiple names for the same resolved value. Some compatibility names omit the semicolon, and some names resolve to a sequence instead of one code point. Alias rows are strongest when the standards catalog is available; numeric references remain the fallback when no named reference exists for the exact value.
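An alias listing can be approximated by scanning Python's html.entities.html5 table in reverse. The function name aliases_for is hypothetical. The sort order, with semicolon-terminated names first, is a design choice for this sketch that mirrors the recommendation to prefer those forms.

```python
from html.entities import html5


def aliases_for(text: str) -> list:
    """List every catalog name that resolves to text, semicolon forms first."""
    names = [name for name, value in html5.items() if value == text]
    return sorted(names, key=lambda name: (not name.endswith(";"), name))


print(aliases_for("\u00a9"))         # names for the copyright sign
print(aliases_for("\u2242\u0338"))   # names that resolve to a sequence
print(aliases_for("\U0001F680"))     # no named reference: numeric fallback
```

An empty list is the cue to fall back to a numeric reference.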

Fonts and rendering engines can make different values look alike. Whitespace, zero-width characters, variation selectors, combining marks, and emoji sequences should be checked through the code point and byte rows rather than by appearance alone.
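Characters that are easy to miss by eye can be flagged from their Unicode general category. The sketch below uses Python's unicodedata module. The category selection and labels are assumptions for illustration and do not reproduce the tool's full review logic.

```python
import unicodedata

# General categories worth a second look, with illustrative labels.
REVIEW_CATEGORIES = {
    "Cf": "format character (often invisible)",
    "Mn": "nonspacing mark (combining mark or variation selector)",
    "Zs": "space separator",
}


def review_flags(text: str) -> list:
    """Return (code point, reason) pairs for characters that deserve review."""
    flags = []
    for ch in text:
        category = unicodedata.category(ch)
        if category in REVIEW_CATEGORIES:
            flags.append((f"U+{ord(ch):04X}", REVIEW_CATEGORIES[category]))
    return flags


# Zero-width space, no-break space, and a variation selector.
print(review_flags("a\u200bb\u00a0c\ufe0f"))
```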

Accuracy and Privacy Notes:

The lookup runs in the browser for the value you enter. The page may load public standards data for named-reference coverage, but your typed value is not sent as a query to that catalog.

The result identifies one character reference, symbol, or code point sequence and prepares destination-specific copy forms. It does not sanitize a larger fragment, validate a full document, or guarantee that every font will display the resolved value identically.

Worked Examples:

Readable copyright mark. Enter &copy; with Lookup direction on Auto or Decode entity or name. The value resolves to U+00A9, and the named output is a readable choice when ordinary HTML source is the destination.

Hex numeric reference. Enter &#x2014; and use strict parsing. The decode path resolves the em dash code point, then the paste rows show the named, decimal, hex, CSS, JavaScript, and URL-safe options that fit different destinations.

Attribute quote. Enter a literal double quote with Encode symbol or code point and turn on Attribute-value decode rules. The double-quoted attribute row should differ from the single-quoted row because the wrapper quote decides what must be escaped.

Combining sequence. Enter U+2242 U+0338 with Input hint set to Unicode code point or sequence. The code point rows should show both scalar values, and the review notes should warn that the resolved display depends on the full sequence.

Emoji code point. Enter U+1F680 and force Preferred HTML output to Hex numeric. The hex entity is usually the clearest HTML fallback when no readable named reference exists for that exact emoji.

FAQ:

Should I use a named reference or a numeric reference?

Use a semicolon-terminated named reference when an official name exists and helps humans read the source. Use decimal or hexadecimal numeric output when no name exists, when exact code point identity matters more, or when your project uses a numeric style consistently.
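The choice can be expressed as a small Python sketch that prefers a semicolon-terminated name and falls back to a numeric form. The function name and the style values are hypothetical. Preferring a lowercase name when several exist is an assumption made for this example.

```python
from html.entities import html5


def preferred_reference(text: str, style: str = "named") -> str:
    """Return an HTML reference for text in the requested style."""
    if style == "named":
        names = [n for n, v in html5.items() if v == text and n.endswith(";")]
        if names:
            # Prefer an all-lowercase name, then alphabetical order.
            names.sort(key=lambda n: (n != n.lower(), n))
            return "&" + names[0]
        style = "hex"  # numeric fallback when no name exists
    if style == "hex":
        return "".join(f"&#x{ord(ch):X};" for ch in text)
    return "".join(f"&#{ord(ch)};" for ch in text)


print(preferred_reference("\u00a9"))
print(preferred_reference("\u00a9", "decimal"))
print(preferred_reference("\U0001F680"))
```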

Why does a missing semicolon warning matter?

Some legacy names can parse without a semicolon, but that compatibility shortcut is easier to misread near other text. A semicolon-terminated reference is the safer default for new HTML.

Why did one visible character produce several code points?

Some characters are built from a sequence, such as a base character plus a combining mark or an emoji sequence. Copying only one code point can change the display or meaning.

Can I paste an HTML entity directly into JavaScript or CSS?

Usually no. JavaScript and CSS strings use their own escape syntax. Use the JavaScript or CSS paste row when the destination is source code rather than HTML markup.

Why is the unquoted attribute row disabled?

Unquoted attributes cannot safely contain many characters that are harmless in quoted attributes. If the row is disabled, use a quoted attribute form or change the surrounding markup to quote the value.

Glossary:

Character reference: A source token that represents one or more characters when HTML is parsed.
Numeric reference: A decimal or hexadecimal reference that maps directly to a Unicode code point.
Unicode scalar value: A Unicode code point from U+0000 through U+10FFFF, excluding surrogate code points.
Grapheme cluster: The visible unit a reader often treats as one character, even when it is made from multiple code points.
Legacy no-semicolon alias: A compatibility form that may parse in some HTML contexts but should not be the default for new markup.
