Unicode Character Lookup
Inspect Unicode characters online: code points, escapes, normalization forms, UTF byte views, and safer copy targets for debugging exact text.
Introduction:
Unicode text is made from code points, but users often see grapheme clusters: one visible character can contain several code points joined by combining marks, variation selectors, or zero-width joiners. That distinction matters when debugging broken search, copied symbols, invisible spaces, or emoji sequences.
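As a rough illustration of the gap between what is seen and what is stored, the sketch below lists the code points behind two clusters that usually render as one visible character each. The helper name `code_points` is hypothetical and is not part of the tool; it uses only Python's standard `unicodedata` module.

```python
import unicodedata

def code_points(text: str) -> list[tuple[str, str]]:
    """Return (U+ notation, Unicode name) for every code point in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# "e" followed by a combining acute accent: one visible character, two code points.
accented = "e\u0301"

# Woman + zero-width joiner + personal computer: one emoji, three code points.
technologist = "\U0001F469\u200D\U0001F4BB"

for sample in (accented, technologist):
    print(len(sample), "code points")
    for u_plus, name in code_points(sample):
        print("  ", u_plus, name)
```

Python's `len` counts code points, not grapheme clusters. Counting clusters needs a segmentation library, which this sketch leaves out on purpose.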
The safest way to discuss a suspicious character is often not the glyph itself. U+ notation, escape forms, UTF-8 bytes, and normalization output make the value explicit even when fonts, editors, or messaging systems render it differently.
Unicode lookup identifies representation and risk cues. It does not prove that every application will render or compare the text the same way, because fonts, segmentation libraries, normalization policy, and security filters still vary.
Technical Details:
The lookup accepts literal graphemes, code point tokens, JavaScript or JSON escapes, CSS escapes, and percent-encoded UTF-8. Auto mode first recognizes explicit tokens such as U+2713, 0x2713, numeric references, and Unicode escapes; escaped text mode can force a specific dialect.
| Input path | Accepted examples | Output emphasis |
|---|---|---|
| Literal grapheme | e plus combining mark, emoji cluster | Code point count and grapheme count. |
| Code point sequence | U+200B, U+1F469 U+200D U+1F4BB | Exact U+ sequence. |
| JavaScript or JSON escapes | \\u2713, \\u{1F469} | Decoded literal and source dialect. |
| CSS escapes | \\1F469 \\200D \\1F4BB | CSS target form and decoded text. |
| Percent UTF-8 | %E2%80%8B | UTF-8 bytes and URL-safe reference. |
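The input paths in the table can be approximated with a few small decoders. This is a minimal sketch, not the tool's actual parser: the function names are hypothetical, the regular expressions cover only the forms shown in the table, and JavaScript surrogate-pair escapes are not recombined.

```python
import re
from urllib.parse import unquote

def decode_code_point_tokens(text: str) -> str:
    """Decode whitespace-separated tokens such as 'U+2713' or '0x2713'."""
    chars = []
    for token in text.split():
        match = re.fullmatch(r"(?:U\+|0x)([0-9A-Fa-f]{1,6})", token)
        if match is None:
            raise ValueError(f"not a code point token: {token!r}")
        chars.append(chr(int(match.group(1), 16)))
    return "".join(chars)

def decode_js_escapes(text: str) -> str:
    """Decode four-digit and braced u-escapes; other text passes through."""
    pattern = r"\\u\{([0-9A-Fa-f]{1,6})\}|\\u([0-9A-Fa-f]{4})"
    return re.sub(pattern, lambda m: chr(int(m.group(1) or m.group(2), 16)), text)

def decode_css_escapes(text: str) -> str:
    """Decode CSS hex escapes; one optional whitespace ends each escape."""
    return re.sub(r"\\([0-9A-Fa-f]{1,6})\s?", lambda m: chr(int(m.group(1), 16)), text)

def decode_percent_utf8(text: str) -> str:
    """Decode percent-encoded UTF-8, failing loudly on invalid byte sequences."""
    return unquote(text, encoding="utf-8", errors="strict")

print(decode_code_point_tokens("U+1F469 U+200D U+1F4BB"))
print(decode_js_escapes(r"\u2713"))
print(decode_css_escapes(r"\1F469 \200D \1F4BB"))
print(repr(decode_percent_utf8("%E2%80%8B")))
```

Keeping one decoder per dialect mirrors the idea of forcing a dialect when the automatic guess is wrong: the same characters can be valid in more than one syntax.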
The result compares normalization forms NFC, NFD, NFKC, and NFKD when the runtime supports normalization. It flags surrogate code points, private-use code points, noncharacters, controls, invisible format characters, whitespace, and combining marks when those checks are enabled. External Unicode name and property modules are loaded for richer names and categories, with local fallback labels if they are unavailable.
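The review flags can be approximated from Unicode general categories. The sketch below is an assumption about how such checks might look, not the tool's implementation: the function name `review_flags` and the flag labels are hypothetical, and the category data comes from whichever Unicode version ships with the Python runtime.

```python
import unicodedata

def review_flags(ch: str) -> list[str]:
    """Return risk cues for a single code point, based on its general category."""
    cp = ord(ch)
    category = unicodedata.category(ch)
    flags = []
    if category == "Cs":
        flags.append("surrogate")
    if category == "Co":
        flags.append("private-use")
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        flags.append("noncharacter")
    if category == "Cc":
        flags.append("control")
    if category == "Cf":
        flags.append("invisible format")
    if ch.isspace():
        flags.append("whitespace")
    if category.startswith("M"):
        flags.append("combining mark")
    return flags

for sample in ("\u2713", "\u200b", "\ue000", "\u0301", "\ufdd0"):
    print(f"U+{ord(sample):04X}", review_flags(sample))
```

A character can carry more than one flag; a tab, for example, is both a control and whitespace.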
Everyday Use & Decision Guide:
Use Auto detect for pasted characters and common U+ tokens. Switch to Escaped text when the input is source-code syntax or percent-encoded bytes and the auto guess is not the dialect you intended.
- Use U+ notation in tickets when the visible glyph could be confused or invisible.
- Use Normalized literal only when canonical or compatibility folding is actually desired.
- Check Review flags before copying private-use, surrogate, control, or invisible characters.
- Use Normalization rows to see whether storage or search may change the code point sequence.
The first grapheme cluster is the inspection target. If extra text is pasted after it, review notes call that out so the result is not mistaken for a full-string analysis.
Step-by-Step Guide:
- Paste a glyph, U+ token, escape sequence, or percent-encoded UTF-8 into the lookup field.
- Choose Auto detect, Literal grapheme, Code point token or sequence, or Escaped text.
- Set the escape dialect and normalization form if the input path needs to be forced.
- Run the lookup and read Reference Board for name, U+ sequence, type profile, and bytes.
- Use Paste Targets for JSON, CSS, HTML numeric references, UTF-8, UTF-16, and UTF-32 forms.
- If Review needed appears, prefer explicit U+ notation over raw copying.
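The copy forms named in the steps above can be reproduced outside the tool for cross-checking. The following is a minimal sketch under stated assumptions: `paste_targets` and `hex_units` are hypothetical names, the exact formatting the tool emits may differ, and text containing lone surrogates will fail to encode.

```python
import json

def hex_units(data: bytes, width: int) -> str:
    """Split bytes into fixed-width code units, each shown as uppercase hex."""
    return " ".join(data[i:i + width].hex().upper() for i in range(0, len(data), width))

def paste_targets(text: str) -> dict[str, str]:
    """Build explicit, ASCII-only representations of text for pasting."""
    return {
        # json.dumps escapes non-ASCII by default, using surrogate pairs above U+FFFF.
        "json": json.dumps(text),
        # A space after each CSS escape keeps following hex digits from being absorbed.
        "css": " ".join(f"\\{ord(ch):X}" for ch in text),
        "html": "".join(f"&#x{ord(ch):X};" for ch in text),
        "utf8": hex_units(text.encode("utf-8"), 1),
        "utf16": hex_units(text.encode("utf-16-be"), 2),
        "utf32": hex_units(text.encode("utf-32-be"), 4),
    }

for label, value in paste_targets("\u2713").items():
    print(f"{label:6} {value}")
```

Big-endian encodings are used here so that each hex group reads the same as the code unit value, with no byte order mark in the output.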
Interpreting Results:
Lookup complete means no enabled high-severity flags were found. Review needed means the character or sequence may be invisible, application-defined, malformed as standalone text, or otherwise risky to copy raw.
A glyph that looks identical after normalization may still have different code points before normalization. Use the Code point sequence and Normalization Board when exact storage or comparison matters.
Worked Examples:
Check mark. The visible character ✓ resolves to U+2713, with JavaScript, CSS, HTML, UTF-8, and percent-encoded forms available for copying.
Combining acute. e plus U+0301 may normalize to a single precomposed form under NFC. The normalization row shows that the code point sequence changed.
Zero-width space. U+200B may not be visible in normal text. The result flags it and recommends U+ notation for debugging notes.
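The combining acute example can be checked with Python's standard `unicodedata.normalize`. This sketch only illustrates the comparison; `normalization_rows` is a hypothetical helper, and the row layout is an assumption rather than the tool's output format.

```python
import unicodedata

def normalization_rows(text: str) -> list[tuple[str, str, bool]]:
    """Return (form, U+ sequence, changed) for each of the four normalization forms."""
    rows = []
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        normalized = unicodedata.normalize(form, text)
        sequence = " ".join(f"U+{ord(ch):04X}" for ch in normalized)
        rows.append((form, sequence, normalized != text))
    return rows

# "e" plus U+0301 composes to U+00E9 under NFC and NFKC, and stays decomposed otherwise.
for form, sequence, changed in normalization_rows("e\u0301"):
    print(form, sequence, "changed" if changed else "unchanged")
```

Compatibility forms go further than canonical ones: under NFKC the ligature U+FB01 becomes the two letters "fi", which is why a normalized literal is only appropriate when that folding is wanted.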
FAQ:
Why does one emoji have several code points?
Many emoji are grapheme clusters built from base emoji, zero-width joiners, variation selectors, or modifiers.
What does normalization change?
Normalization can compose, decompose, or compatibility-fold characters so comparison and storage behave consistently.
Why are private-use code points flagged?
Private-use values are application-defined, so they may lose meaning outside the system that assigned them.
Glossary:
- Code point: A numbered Unicode value such as U+2713.
- Grapheme cluster: What a user usually perceives as one character.
- Normalization: A standard way to transform equivalent Unicode sequences.
- Surrogate: A code point in the range U+D800 to U+DFFF, reserved for UTF-16 code units; on its own it is not a Unicode scalar value.