Unicode character lookup inputs
Try U+2713, %E2%9C%93, or one pasted symbol.

Introduction:

Unicode text is made from code points, but users often see grapheme clusters: one visible character can contain several code points joined by combining marks, variation selectors, or zero-width joiners. That distinction matters when debugging broken search, copied symbols, invisible spaces, or emoji sequences.
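The gap between what is seen and what is stored can be made concrete with a minimal Python sketch. The helper name code_points is hypothetical and is not part of the lookup tool; it only lists the code points behind two strings that each display as one character.

```python
def code_points(text: str) -> list[str]:
    """Return the U+ notation for every code point in text."""
    return [f"U+{ord(ch):04X}" for ch in text]

accented = "e\u0301"                          # e + combining acute accent
technologist = "\U0001F469\u200D\U0001F4BB"   # woman + zero-width joiner + laptop

print(code_points(accented))       # ['U+0065', 'U+0301']
print(code_points(technologist))   # ['U+1F469', 'U+200D', 'U+1F4BB']
```

Each string is one grapheme cluster to a reader but two or three code points to a search index or a length check.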

The safest way to discuss a suspicious character is often to avoid pasting the glyph itself. U+ notation, escape forms, UTF-8 bytes, and normalization output make the value explicit even when fonts, editors, or messaging systems render it differently.

Figure: Unicode lookup path from glyph to exact reference forms.

Unicode lookup identifies representation and risk cues. It does not prove that every application will render or compare the text the same way, because fonts, segmentation libraries, normalization policy, and security filters still vary.

Technical Details:

The lookup accepts literal graphemes, code point tokens, JavaScript or JSON escapes, CSS escapes, and percent-encoded UTF-8. Auto mode first recognizes explicit tokens such as U+2713, 0x2713, numeric references, and Unicode escapes; escaped text mode can force a specific dialect.
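The detection order described above could be sketched roughly as follows. This is an illustration under assumptions, not the tool's actual parser: the function name decode_token and the returned path labels are hypothetical, CSS escapes and numeric references are left out, and JSON surrogate-pair escapes are not recombined.

```python
import re
from urllib.parse import unquote

# \u{1F469} (JavaScript) or \u2713 (JavaScript and JSON)
ESCAPE = r"\\u\{([0-9A-Fa-f]{1,6})\}|\\u([0-9A-Fa-f]{4})"

def decode_token(raw: str) -> tuple[str, str]:
    """Return (detected_path, decoded_text) for one lookup input.

    Raises ValueError if a token names a value above U+10FFFF.
    """
    text = raw.strip()
    parts = text.split()
    # Explicit code point tokens first: U+2713, 0x2713, or a sequence of them.
    if parts and all(re.fullmatch(r"(?:U\+|0x)[0-9A-Fa-f]{1,6}", p) for p in parts):
        return "code point", "".join(chr(int(p[2:], 16)) for p in parts)
    # Then Unicode escapes.
    if re.fullmatch(f"(?:{ESCAPE})+", text):
        decoded = re.sub(ESCAPE, lambda m: chr(int(m.group(1) or m.group(2), 16)), text)
        return "escape", decoded
    # Then percent-encoded UTF-8 bytes.
    if re.fullmatch(r"(?:%[0-9A-Fa-f]{2})+", text):
        return "percent", unquote(text, encoding="utf-8", errors="strict")
    # Anything else is treated as literal text.
    return "literal", text

print(decode_token("U+2713"))      # ('code point', '✓')
print(decode_token("%E2%9C%93"))   # ('percent', '✓')
```

Checking explicit tokens before falling back to literal text mirrors the behavior described for Auto mode; a forced dialect would simply skip the branches that do not apply.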

Unicode lookup input and output paths

Input path | Accepted examples | Output emphasis
Literal grapheme | e plus combining mark, emoji cluster | Code point count and grapheme count.
Code point sequence | U+200B, U+1F469 U+200D U+1F4BB | Exact U+ sequence.
JavaScript or JSON escapes | \u2713, \u{1F469} | Decoded literal and source dialect.
CSS escapes | \1F469 \200D \1F4BB | CSS target form and decoded text.
Percent UTF-8 | %E2%80%8B | UTF-8 bytes and URL-safe reference.

The result compares normalization forms NFC, NFD, NFKC, and NFKD when the runtime supports normalization. It flags surrogate code points, private-use code points, noncharacters, controls, invisible format characters, whitespace, and combining marks when those checks are enabled. External Unicode name and property modules are loaded for richer names and categories, with local fallback labels if they are unavailable.
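As a rough illustration of the flag categories listed above, the following Python sketch classifies a single code point using the standard unicodedata module. The function name review_flags and the flag labels are assumptions made for this example; the tool's own rules and severity levels may differ.

```python
import unicodedata

def review_flags(ch: str) -> list[str]:
    """Return review flags for one code point, based on its general category."""
    cp = ord(ch)
    cat = unicodedata.category(ch)
    flags = []
    if cat == "Cs":
        flags.append("surrogate")
    if cat == "Co":
        flags.append("private-use")
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        flags.append("noncharacter")
    if cat == "Cc":
        flags.append("control")
    if cat == "Cf":
        flags.append("invisible format")
    if cat in ("Zs", "Zl", "Zp") or ch.isspace():
        flags.append("whitespace")
    if cat.startswith("M"):
        flags.append("combining mark")
    return flags

print(review_flags("\u200B"))   # ['invisible format']
print(review_flags("\u2713"))   # []
```

The names and categories come from whatever Unicode version ships with the Python build, which is one reason results can differ between runtimes.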

Everyday Use & Decision Guide:

Use Auto detect for pasted characters and common U+ tokens. Switch to Escaped text when the input is source-code syntax or percent-encoded bytes and the auto guess is not the dialect you intended.

  • Use U+ notation in tickets when the visible glyph could be confused or invisible.
  • Use Normalized literal only when canonical or compatibility folding is actually desired.
  • Check Review flags before copying private-use, surrogate, control, or invisible characters.
  • Use Normalization rows to see whether storage or search may change the code point sequence.

The first grapheme cluster is the inspection target. If extra text is pasted after it, review notes call that out so the result is not mistaken for a full-string analysis.
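Splitting off the first cluster can be approximated as below. This is a deliberate simplification, not the Unicode segmentation rules of UAX #29: it only attaches combining marks, variation selectors, emoji skin-tone modifiers, and zero-width-joiner sequences, so flags built from regional indicators, Hangul jamo, and CRLF are not handled. The name first_cluster is hypothetical.

```python
import unicodedata

ZWJ = "\u200D"

def _extends(ch: str) -> bool:
    """True for code points that this sketch attaches to the preceding base."""
    return (
        unicodedata.category(ch).startswith("M")   # combining marks, variation selectors
        or 0x1F3FB <= ord(ch) <= 0x1F3FF           # emoji skin-tone modifiers
    )

def first_cluster(text: str) -> tuple[str, str]:
    """Return (approximate first grapheme cluster, remaining text)."""
    if not text:
        return "", ""
    end = 1
    while end < len(text):
        ch = text[end]
        if _extends(ch):
            end += 1
        elif ch == ZWJ:
            # Keep the joiner and, if present, the code point it joins.
            end = min(end + 2, len(text))
        else:
            break
    return text[:end], text[end:]

target, extra = first_cluster("e\u0301xyz")
print(len(target), repr(extra))   # 2 'xyz'
```

A non-empty second value is the situation the review notes describe: only the first cluster was inspected, and the rest was left alone.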

Step-by-Step Guide:

  1. Paste a glyph, U+ token, escape sequence, or percent-encoded UTF-8 into the lookup field.
  2. Choose Auto detect, Literal grapheme, Code point token or sequence, or Escaped text.
  3. Set the escape dialect and normalization form if the input path needs to be forced.
  4. Run the lookup and read Reference Board for name, U+ sequence, type profile, and bytes.
  5. Use Paste Targets for JSON, CSS, HTML numeric references, UTF-8, UTF-16, and UTF-32 forms.
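The paste-target forms named in step 5 can be reproduced with a short Python sketch. The function name paste_targets and the dictionary keys are assumptions for illustration; the tool's exact formatting (letter case, separators) may differ, and input containing lone surrogates would fail to encode here.

```python
def paste_targets(text: str) -> dict[str, str]:
    """Build common reference forms for a short string."""
    utf16 = text.encode("utf-16-be")
    # Split the UTF-16 bytes into 16-bit code units, written as hex.
    units = [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)]
    return {
        "json": "".join(f"\\u{u}" for u in units),            # one \uXXXX per code unit
        "css": " ".join(f"\\{ord(ch):X}" for ch in text),     # space ends each escape
        "html": "".join(f"&#x{ord(ch):X};" for ch in text),   # numeric character reference
        "utf8": text.encode("utf-8").hex(" ").upper(),
        "utf16": " ".join(units),
        "utf32": " ".join(f"{ord(ch):08X}" for ch in text),
    }

targets = paste_targets("\u2713")
print(targets["json"])   # \u2713
print(targets["utf8"])   # E2 9C 93
```

For a code point above U+FFFF, the JSON form comes out as two escapes because JSON escapes describe UTF-16 code units rather than code points.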
  6. If Review needed appears, prefer explicit U+ notation over raw copying.

Interpreting Results:

Lookup complete means no enabled high-severity flags were found. Review needed means the character or sequence may be invisible, application-defined, malformed as standalone text, or otherwise risky to copy raw.

A glyph that looks identical after normalization may still have different code points before normalization. Use the Code point sequence and Normalization Board when exact storage or comparison matters.
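A small Python check shows why this matters for comparison. The helper same_after is a hypothetical name; unicodedata.normalize is the standard library call.

```python
import unicodedata

precomposed = "\u00E9"    # é stored as one code point
decomposed = "e\u0301"    # e followed by combining acute accent

def same_after(form: str, a: str, b: str) -> bool:
    """Compare two strings after applying the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(precomposed == decomposed)                    # False
print(same_after("NFC", precomposed, decomposed))   # True
```

A system that compares raw strings treats these as different values, while one that normalizes first treats them as equal.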

Worked Examples:

Check mark. The visible character resolves to U+2713, with JavaScript, CSS, HTML, UTF-8, and percent-encoded forms available for copying.

Combining acute. e plus U+0301 normalizes to the single precomposed code point U+00E9 under NFC. The normalization row shows that the code point sequence changed.

Zero-width space. U+200B may not be visible in normal text. The result flags it and recommends U+ notation for debugging notes.
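For debugging notes, one way to make such characters visible is to replace them with U+ tokens, as in this Python sketch. The function name reveal_invisibles and the angle-bracket format are assumptions for illustration, and the sketch treats every control and format character as invisible, including tabs and newlines.

```python
import unicodedata

def reveal_invisibles(text: str) -> str:
    """Replace control (Cc) and format (Cf) characters with visible U+ tokens."""
    out = []
    for ch in text:
        if unicodedata.category(ch) in ("Cc", "Cf"):
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)

print(reveal_invisibles("pass\u200Bword"))   # pass<U+200B>word
```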

FAQ:

Why does one emoji have several code points?

Many emoji are grapheme clusters built from base emoji, zero-width joiners, variation selectors, or modifiers.

What does normalization change?

Normalization can compose, decompose, or compatibility-fold characters so comparison and storage behave consistently.
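The difference between the four forms is easiest to see side by side. The Python sketch below uses a hypothetical helper, normalization_rows, on a two-character input: the fi ligature U+FB01, which only compatibility forms fold, and precomposed é, which canonical decomposition splits.

```python
import unicodedata

def normalization_rows(text: str) -> dict[str, str]:
    """Return the U+ sequence produced by each normalization form."""
    rows = {}
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        normalized = unicodedata.normalize(form, text)
        rows[form] = " ".join(f"U+{ord(ch):04X}" for ch in normalized)
    return rows

for form, sequence in normalization_rows("\uFB01\u00E9").items():
    print(form, sequence)
# NFC U+FB01 U+00E9
# NFD U+FB01 U+0065 U+0301
# NFKC U+0066 U+0069 U+00E9
# NFKD U+0066 U+0069 U+0065 U+0301
```

NFC and NFD keep the ligature because it is only compatibility-equivalent to the letters f and i; NFKC and NFKD replace it, which changes the text rather than just its encoding.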

Why are private-use code points flagged?

Private-use values are application-defined, so they may lose meaning outside the system that assigned them.

Glossary:

Code point
A numbered Unicode value such as U+2713.
Grapheme cluster
What a user usually perceives as one character.
Normalization
A standard way to transform equivalent Unicode sequences.
Surrogate
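A code point in the range U+D800 to U+DFFF, reserved for the paired code units that UTF-16 uses to encode values above U+FFFF. On its own it is not a Unicode scalar value.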
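The relationship between a supplementary code point and its surrogate pair follows the standard UTF-16 arithmetic, sketched here in Python. The function names to_surrogates and is_surrogate are hypothetical.

```python
def is_surrogate(cp: int) -> bool:
    """True if cp lies in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF

def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use a surrogate pair")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits of the offset
    return high, low

high, low = to_surrogates(0x1F469)
print(f"{high:04X} {low:04X}")   # D83D DC69
```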