Introduction:

Text statistics describe how writing is structured and how it reads in practice. A word counter and readability checker helps you see length, variety, and pace so you can edit with intent.

Counts and clarity scores arrive together; frequency lists and n‑grams then reveal which words carry the most weight. Paste text or drop a file, choose options that fit your draft, and read results that are easy to compare.

A short paragraph may show a quick reading time while a dense passage can lift the grade level and lower reading ease. Use consistent settings across drafts so changes reflect your edits rather than different rules.

Results are estimates shaped by language and punctuation, so unusual formatting or very small samples can mislead. When comparing versions, look for the direction of change rather than single‑point differences.

Technical Details:

The analysis observes words, sentences, characters, whitespace, and syllables in a snapshot of text. From these quantities it derives average word length, average sentence length, lexical density, estimated reading minutes, and estimated speaking minutes. Lexical density is the share of unique words within all words, sometimes called a type–token ratio.
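As a minimal sketch of these derived quantities (the function and field names are illustrative, not the tool's actual internals), assuming tokens and counts are already available:

```ts
// Illustrative derivation of the averages and lexical density described above.
// Assumes words, sentence count, and syllable count are already computed.
function deriveStats(words: string[], sentences: number, syllables: number) {
  const totalWords = words.length;
  // Unique words are case-folded here for illustration; see the Case sensitive option.
  const uniqueWords = new Set(words.map((w) => w.toLowerCase())).size;
  return {
    averageWordLength: words.reduce((sum, w) => sum + w.length, 0) / totalWords,
    averageSentenceLength: totalWords / sentences,    // ASL: words per sentence
    averageSyllablesPerWord: syllables / totalWords,  // ASW: syllables per word
    lexicalDensity: (uniqueWords / totalWords) * 100, // type–token ratio, as a percentage
  };
}
```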

Two readability indices are computed from sentence length and syllables per word. Reading ease increases as sentences shorten and syllables per word fall, while grade level rises with longer sentences or more syllables per word. These indices summarize effort, not writing quality.

Tokens are built from letters with diacritics and internal apostrophes or hyphens, with an option to include numbers. Sentence boundaries follow terminal punctuation. Syllables are estimated with a vowel‑group heuristic and common ending adjustments.
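A minimal sketch of this tokenization, sentence splitting, and syllable heuristic, under the rules described above (the tool's exact patterns may differ):

```ts
// Tokens: Unicode letters, optionally joined by internal apostrophes or hyphens;
// numbers with "." or "," decimals are added only when requested.
function tokenize(text: string, includeNumbers = false): string[] {
  const words = text.match(/\p{L}+(?:['’-]\p{L}+)*/gu) ?? [];
  const numbers = includeNumbers ? text.match(/\d+(?:[.,]\d+)?/g) ?? [] : [];
  return words.concat(numbers);
}

// Sentence boundaries from terminal punctuation, as described above.
const splitSentences = (text: string): string[] =>
  text.split(/[.!?]+/).map((s) => s.trim()).filter(Boolean);

// Vowel-group heuristic with a common silent-"e" adjustment (English-oriented).
function estimateSyllables(word: string): number {
  const w = word.toLowerCase();
  const groups = w.match(/[aeiouy]+/g)?.length ?? 0;
  const silentE = w.endsWith("e") && !w.endsWith("le") && groups > 1 ? 1 : 0;
  return Math.max(1, groups - silentE);
}
```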

Comparisons are most meaningful within the same language and with the same options for case handling, stop‑words, accent normalization, minimum word length, and n‑gram rules. Extremely short texts can produce noisy indices.

ASL = W / S
ASW = SYL / W
FRE = 206.835 − 1.015 × ASL − 84.6 × ASW
FKGL = 0.39 × ASL + 11.8 × ASW − 15.59
Symbols and units used in formulas
Symbol | Meaning | Unit/Datatype | Source
W | Total words | count | Derived
S | Total sentences | count | Derived
SYL | Total syllables | count | Estimated
ASL | Average sentence length | words/sentence | Derived
ASW | Average syllables per word | syllables/word | Derived
FRE | Reading ease score | 0–100 | Computed
FKGL | Grade level estimate | grade | Computed
Worked example
Given: W = 200, S = 10, SYL = 300
ASL = 200 / 10 = 20
ASW = 300 / 200 = 1.5
FRE = 206.835 − 1.015 × 20 − 84.6 × 1.5 = 59.64
FKGL = 0.39 × 20 + 11.8 × 1.5 − 15.59 = 9.91

In this case the reading ease is moderate while the grade level is about tenth grade. Values are rounded to two decimals.
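The same arithmetic as a short sketch (illustrative names; rounded to two decimals as above):

```ts
// Compute FRE and FKGL from word, sentence, and syllable counts.
function readability(W: number, S: number, SYL: number) {
  const ASL = W / S;   // average sentence length
  const ASW = SYL / W; // average syllables per word
  return {
    FRE: +(206.835 - 1.015 * ASL - 84.6 * ASW).toFixed(2),
    FKGL: +(0.39 * ASL + 11.8 * ASW - 15.59).toFixed(2),
  };
}

readability(200, 10, 300); // ≈ { FRE: 59.64, FKGL: 9.91 }
```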

Options and parameters
Parameter | Meaning | Unit/Datatype | Typical Range | Sensitivity | Notes
Case sensitive | Treat uppercase and lowercase as distinct | boolean | false or true | Affects unique words and frequencies | Lowercasing improves comparability
Include numbers | Count numeric tokens as words | boolean | false or true | Shifts counts in data‑heavy text | Recognizes decimals with “.” or “,”
Remove stop‑words | Exclude common function words | boolean | false or true | Raises lexical density | Fixed English list
Normalize accents | Fold diacritics to base letters | boolean | false or true | Unifies spellings | Applied before tokenizing
Minimum word length | Discard short tokens | integer | 1–∞ | Small shift to counts | Default 1
N‑gram size | Length of contiguous token sequences | integer | 1–3 | Reveals phrases | 1 = words, 2 = bigrams, 3 = trigrams
Across sentences | Allow n‑grams to cross sentence ends | boolean | false or true | Changes phrase counts | When off, sequences reset at sentence punctuation
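A minimal sketch of how the N‑gram size and Across sentences options interact (illustrative; tokens are assumed to be grouped by sentence already):

```ts
// Build n-grams from sentence-grouped tokens; when acrossSentences is false,
// sequences reset at each sentence boundary.
function buildNgrams(sentences: string[][], n: number, acrossSentences: boolean): string[] {
  const spans = acrossSentences ? [sentences.flat()] : sentences;
  const grams: string[] = [];
  for (const tokens of spans) {
    for (let i = 0; i + n <= tokens.length; i++) {
      grams.push(tokens.slice(i, i + n).join(" "));
    }
  }
  return grams;
}

buildNgrams([["the", "cat"], ["sat", "down"]], 2, false); // ["the cat", "sat down"]
buildNgrams([["the", "cat"], ["sat", "down"]], 2, true);  // ["the cat", "cat sat", "sat down"]
```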

Units, precision, and rounding

  • Averages, lexical density, and readability indices use two decimals with a period as the decimal separator.
  • Estimated reading minutes assume 200 words/minute; speaking minutes assume 125 words/minute, rounded to the nearest minute with a minimum of one.
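As a sketch, these baselines and rounding rules reduce to (names illustrative):

```ts
// 200 words/minute reading, 125 words/minute speaking;
// round to the nearest minute, never below one.
const readMinutes = (words: number) => Math.max(1, Math.round(words / 200));
const speakMinutes = (words: number) => Math.max(1, Math.round(words / 125));

readMinutes(450);  // 2 (450 / 200 = 2.25)
speakMinutes(450); // 4 (450 / 125 = 3.6)
```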

Validation and bounds

Input controls and limits
Field | Type | Min | Max | Step/Pattern | Notes
Top N words | number | 1 | — | integer | Controls frequency table and chart
Top N n‑grams | number | 1 | — | integer | Controls n‑gram table
Minimum word length | number | 1 | — | step 1 | Shorter tokens are discarded
N‑gram size | select | 1 | 3 | 1 | 1 = words, 2 = bigrams, 3 = trigrams
File input | file | — | — | text read | Reads content of the selected file as text
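A minimal clamping sketch for the numeric controls (bounds as listed above; the helper name is hypothetical):

```ts
// Clamp a numeric control to its integer bounds.
const clampInt = (value: number, min: number, max = Number.POSITIVE_INFINITY) =>
  Math.min(max, Math.max(min, Math.trunc(value)));

clampInt(0, 1);    // 1 (Top N words: minimum 1)
clampInt(5, 1, 3); // 3 (N-gram size: 1–3)
```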

I/O and tokenization notes

  • Input is plain text pasted or read from a local file.
  • Tokens include letters with diacritics, apostrophes, and hyphens; optional decimals use “.” or “,”.
  • Accent normalization folds characters like “café” to “cafe” before tokenizing.
  • Curly apostrophes normalize to straight for counting consistency.
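A sketch of these normalization steps using standard Unicode operations (the tool's exact order may differ):

```ts
// Fold accents by stripping combining marks after NFD decomposition,
// then straighten curly apostrophes for consistent counting.
function normalizeForCounting(text: string): string {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // "café" → "cafe"
    .replace(/’/g, "'");             // curly apostrophe → straight
}
```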

Networking, storage, and charts

  • Processing is browser‑based; files are read locally and not uploaded.
  • Copy and download actions create CSV and JSON in the session only.
  • Charts render from in‑page data; no network requests are made for analysis.
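One way a fully local CSV download can work in the browser (illustrative; not necessarily the tool's actual export code):

```ts
// Build a CSV in memory and trigger a download without any network request.
function downloadCsv(rows: string[][], filename: string): void {
  const csv = rows
    .map((row) => row.map((cell) => `"${cell.replace(/"/g, '""')}"`).join(","))
    .join("\n");
  const url = URL.createObjectURL(new Blob([csv], { type: "text/csv" }));
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  link.click();
  URL.revokeObjectURL(url);
}
```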

Assumptions and limitations

  • Syllable counts use a heuristic and can miscount rare words or names.
  • Sentence splitting relies on “.” “!” “?” and may miss abbreviations or ellipses.
  • Stop‑words are English‑only and fixed; other languages may skew density.
  • Numbers are included by default; toggling them changes counts and averages.
  • Very short texts can produce unstable readability scores.
  • Accent folding can merge distinct words in some languages.
  • N‑grams reflect contiguous tokens only; no stemming or lemmatization is applied.
  • Reading and speaking speeds are fixed baselines and do not reflect individual rates.

Edge cases and error sources

  • Mixed decimal separators in numbers within the same text.
  • Non‑ASCII punctuation or unusual quotes affecting token boundaries.
  • Single‑sentence or single‑word inputs yielding division by small counts.
  • Long unbroken strings causing wide tokens and skewed histograms.
  • Abbreviations with periods increasing sentence counts unexpectedly.
  • Hyphenated compounds counted as single tokens rather than two words.
  • Grapheme clusters that contain combining marks without normalization.
  • Right single quotes and straight apostrophes mixing within contractions.
  • Binary or rich‑text files opened as text producing noisy characters.
  • Extremely long inputs stressing memory during frequency aggregation.

Scientific and standards context

The reading ease and grade formulas are widely attributed to Rudolf Flesch and to work by J. Peter Kincaid and colleagues on grade level. Type–token ratio is a common lexical diversity measure in linguistics.

Privacy and compliance

No data is transmitted or stored server‑side. Clipboard and downloads operate locally for your session.

Step‑by‑Step Guide:

Text analysis measures counts, readability, and frequent terms to guide revision.

  1. Paste text or drop a file into the input area.
  2. Open Advanced to set case, numbers, stop‑words, and accents.
  3. Choose Minimum word length and N‑gram size.
  4. Toggle Across sentences for phrase spans if needed.
  5. Review statistics, then scan frequency, n‑grams, and charts.
  6. Copy or download CSV or JSON for your records.

Example: After removing stop‑words and setting minimum length to 3, frequent terms shift to content words that better represent the topic.
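A sketch of that filtering pass (the stop‑word list here is abbreviated and hypothetical; the tool uses a fixed English list):

```ts
// Drop stop-words and tokens shorter than the minimum length,
// leaving content words for the frequency table.
const STOP_WORDS = new Set(["the", "and", "of", "to", "a", "in", "is", "it"]);

function contentWords(tokens: string[], minLength = 3): string[] {
  return tokens
    .map((t) => t.toLowerCase())
    .filter((t) => t.length >= minLength && !STOP_WORDS.has(t));
}
```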

Use the same settings to compare drafts confidently.

FAQ:

Is my data stored?

No. Text is processed in your browser, and exports are created locally for your session.

No server‑side storage.

How accurate are the syllable counts?

They use a heuristic that is strong for common English words yet may miss rare names or unusual endings.

Treat as estimates.

Which units are used?

Word and sentence counts are integers. Averages and scores use two decimals. Reading and speaking minutes are whole numbers.

Period as decimal separator.

Can I run it without an internet connection?

Yes, once loaded it operates entirely in the page without remote requests for analysis.

Charts render from local data.

Does it cost anything?

There is no charge indicated by the package metadata. If licensing changes, follow the site’s terms.

Check project terms if published.

How do I count unique words only?

Disable Case sensitive so capitalization variants fold together, keep numbers off if not needed, set a minimum length, and consult the Unique Words figure or the frequency table.

Stop‑words can be removed.

What does a borderline score mean?

Scores near your target suggest minor edits may flip direction. Short sentences or simpler words usually increase ease and lower grade.

Use trends across drafts.

Troubleshooting:

  • Numbers dominate results — turn off Include numbers.
  • Too many tiny words — increase Minimum word length.
  • Phrases look odd — disable Across sentences.
  • Mixed character cases — disable Case sensitive.
  • Charts seem empty — ensure there is enough text and increase Top N.
  • Accents split words — enable Normalize accents.

Glossary:

Token
A unit treated as a word for counting.
Type–token ratio
Unique words divided by total words.
Bigram
Two‑word contiguous sequence.
Trigram
Three‑word contiguous sequence.
Average sentence length
Words per sentence across the text.
Reading ease
Score indicating simplicity of text.
Grade level
Approximate school grade needed to read comfortably.
Stop‑word
Common function word often excluded from analysis.