Text Sorter
Sort pasted lists or TXT/CSV text in your browser with natural order, cleanup filters, seeded shuffle, counts, and copy-ready output.
Introduction:
Small text lists rarely arrive in a tidy order. Filenames come from folders, titles come from notes, tags come from forms, and issue labels may be copied from several places before anyone decides what order should be kept. Sorting gives those loose items a repeatable shape so missing entries, duplicates, long labels, and odd numbering become easier to see.
Alphabetical order is not one fixed rule. A computer can compare raw characters, apply language-aware collation, treat embedded numbers as real numbers, or sort by item length. Each choice can be correct for a different job. File10 before File2 follows a plain character comparison, while File2 before File10 matches how people normally read numbered filenames.
| Decision | What changes | Example to check |
|---|---|---|
| Natural or plain alphabetical order | Digit runs are compared as numbers in natural order, but as separate characters in plain alphabetical order. | File2 and File10 |
| Case-sensitive or case-folded comparison | Uppercase and lowercase may be treated as different text or as the same letters for sorting and duplicate checks. | alpha and Alpha |
| Locale-aware collation | Accents and alphabet conventions can move the same visible letters in different languages. | a, ä, and z |
| Item cleanup before comparison | Trimming, blank removal, article handling, leading-number stripping, and filters can change both order and counts. | 1. The Report |
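The File2 and File10 case from the table can be reproduced with the standard `Intl.Collator` API. This is a minimal sketch, not the tool's own code: the variable names and the fixed `en` locale are assumptions made for illustration.

```typescript
// Illustrative only: the page does not show the tool's internal comparator.
const fileLabels: string[] = ["File10", "File2", "File1"];

// Plain comparison: digits are compared character by character.
const plainCollator = new Intl.Collator("en", { numeric: false });
const plainOrder = fileLabels.slice().sort(plainCollator.compare);

// Natural comparison: digit runs are compared as numbers.
const naturalCollator = new Intl.Collator("en", { numeric: true });
const naturalOrder = fileLabels.slice().sort(naturalCollator.compare);

console.log(plainOrder);   // ["File1", "File10", "File2"]
console.log(naturalOrder); // ["File1", "File2", "File10"]
```

Both orders are valid; only the `numeric` option differs between them.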
The first practical question is where one item ends and the next begins. A newline list, a comma-separated phrase list, a tab-separated token stream, and a space-separated word list need different boundaries. Choosing the wrong separator can make a whole paste look like one item, or split a phrase into fragments that were meant to stay together.
Plain text sorting works best when each item is a simple line or token. Full CSV records, quoted commas, escaped delimiters, and multi-column data need a CSV parser before sorting, because the delimiter has meaning inside the record itself. For single-column lists and ordinary text tokens, a sorter can make cleanup, order, and duplicate policy visible before the list is reused.
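The effect of the separator choice can be sketched with a naive splitter. The `splitItems` helper and the `SeparatorName` type below are hypothetical names, and the patterns are an assumption about how plain token splitting might work. It is deliberately not a CSV parser.

```typescript
type SeparatorName = "newline" | "comma" | "semicolon" | "tab" | "space";

// Hypothetical patterns: non-newline modes also break on line breaks.
const SEPARATOR_PATTERNS: Record<SeparatorName, RegExp> = {
  newline: /\n/,
  comma: /[,\n]/,
  semicolon: /[;\n]/,
  tab: /[\t\n]/,
  space: /[ \n]/,
};

// Naive token splitting: quotes have no special meaning here.
function splitItems(source: string, separator: SeparatorName): string[] {
  const normalized = source.replace(/\r\n?/g, "\n"); // normalize CR and CRLF
  return normalized.split(SEPARATOR_PATTERNS[separator]);
}

const pasted = 'Austin\n"Boston, MA"\nChicago';
console.log(splitItems(pasted, "newline")); // 3 items, "Boston, MA" stays whole
console.log(splitItems(pasted, "comma"));   // 4 items, "Boston, MA" is cut in two
console.log(splitItems("a,b,c", "newline")); // 1 item: the separator is wrong
```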
How to Use This Tool:
Start by proving that the source text was split into the right items. After that, choose the comparison rule and add cleanup only where it improves the final list.
- Paste the list into Text to sort, drop a text file onto the field, or choose Browse TXT/CSV. A loaded file should show a source summary with line and character counts.
- Set Input separator before changing the sort mode. Use New Line for one item per line; use comma, semicolon, tab, or space only for simple token streams.
If the first result looks like one giant item, the separator is usually wrong. Switch back to New Line for ordinary pasted lists.
- Choose Sort mode. Use alphabetical modes for names and words, natural modes for numbered filenames, Character length for short-to-long review, Reverse to flip the current sequence, or Shuffle for a randomized order.
- Set Case sensitive and Locale when uppercase, lowercase, or accented letters affect the order. Keep the same locale when comparing two sorted runs.
- Open Advanced for cleanup controls such as Trim each item, Ignore blank lines, Collapse internal whitespace, Strip leading numbers, Ignore leading articles, and Deduplicate items.
- Add Include text filter, Exclude text filter, or Regex include only after the basic split looks right.
If the regex warning shows Invalid regex, fix the pattern or return Regex filter mode to No regex filter before trusting the filtered output.
- Choose Output separator, Newline style, optional line numbers, prefix, and suffix. Check Sorted Text for the copy-ready text, then use Text Stats, Character Counts, or JSON when you need counts or a review record.
Interpreting Results:
Sorted Text is the paste-ready result. It includes the final output separator, newline style, optional line numbers, prefix, and suffix. The other tabs explain what happened to the list, but they are not a substitute for checking the actual text you plan to reuse.
- Text Stats compares total items, non-blank items, unique items after cleanup and filtering, output items, and average item length.
- Character Counts measures the sorted item array before final line numbers, prefixes, suffixes, and joining are added.
- JSON records the chosen settings, sorted items, summary counts, warnings, and generated artifacts in a structured form.
- The Unique badge and the final output count can differ when duplicate removal is off, blanks remain, or filters remove items.
High item counts do not prove the list was split correctly. A comma inside "Boston, MA" can become a false boundary, and a space separator can break multi-word titles. Compare the first few output lines against the original text before using the result as a record list.
Technical Details:
Collation is the comparison rule that decides which string comes first. Language-aware collation matters because Unicode code point order is not the same as reader-expected alphabetical order. Accents, case, punctuation, and local alphabet rules can all affect where a word belongs.
Natural sorting changes comparison for digit runs. Instead of comparing the 1 in File10 with the 2 in File2 character by character, the numeric parts 10 and 2 are compared as numbers. That makes file and issue labels easier to read when names contain counts, sequence numbers, or version-like suffixes.
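The gap between code point order and language-aware collation can be shown with three short strings. This sketch assumes a runtime with German and Swedish collation data. Where a locale is missing, `Intl.Collator` falls back to a default and the Swedish result may differ.

```typescript
// "\u00e4" is the letter ä; the escape avoids encoding problems in transit.
const sampleLetters: string[] = ["z", "\u00e4", "a"];

// Default Array sort compares UTF-16 code units: ä (U+00E4) lands after z (U+007A).
const codeUnitOrder = sampleLetters.slice().sort();

// German collation keeps ä next to a.
const germanOrder = sampleLetters.slice().sort(new Intl.Collator("de").compare);

// Swedish treats ä as a separate letter after z (needs Swedish locale data).
const swedishOrder = sampleLetters.slice().sort(new Intl.Collator("sv").compare);

console.log(codeUnitOrder); // ["a", "z", "ä"]
console.log(germanOrder);   // ["a", "ä", "z"]
console.log(swedishOrder);  // ["a", "z", "ä"] when Swedish collation is available
```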
Transformation Core:
- Split the source text by the selected input separator. Newline mode normalizes carriage returns first; comma, semicolon, tab, and space modes also treat line breaks as item boundaries.
- Apply cleanup: trim item edges, strip leading list numbers, collapse internal whitespace, and optionally remove blank items.
- Apply plain include and exclude filters under the active case-sensitivity rule.
- Apply regex inclusion only when regex mode is enabled, the pattern is present, and the pattern compiles with the accepted flags.
- Sort, reverse, or shuffle the processed items under the selected mode.
- Remove duplicates when requested. Case-sensitive mode uses exact item text; case-folded mode uses lowercase comparison keys.
- Decorate with optional line numbers, prefix, and suffix, then join items with the chosen output separator and newline style.
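The steps above can be sketched as one function. This is a simplified illustration under stated assumptions, not the tool's implementation: the `SortSettings` shape, the `sortText` name, and the leading-number pattern are hypothetical. Filters, reverse, and shuffle are left out to keep it short, and only newline splitting is shown.

```typescript
// Hypothetical settings object; field names are not taken from the tool.
interface SortSettings {
  locale?: string;
  trim: boolean;
  stripLeadingNumbers: boolean;
  collapseWhitespace: boolean;
  ignoreBlank: boolean;
  caseSensitive: boolean;
  natural: boolean;
  dedupe: boolean;
  lineNumbers: boolean;
  prefix: string;
  suffix: string;
  joiner: string;
}

function sortText(source: string, s: SortSettings): string {
  // 1. Split in newline mode, normalizing carriage returns first.
  let items = source.replace(/\r\n?/g, "\n").split("\n");

  // 2. Cleanup: trim, strip list numbers such as "1." or "2)", collapse spaces.
  items = items.map((item) => {
    let out = item;
    if (s.trim) out = out.trim();
    if (s.stripLeadingNumbers) out = out.replace(/^\s*\d+[.)]\s*/, "");
    if (s.collapseWhitespace) out = out.replace(/\s+/g, " ");
    return out;
  });
  if (s.ignoreBlank) items = items.filter((item) => item.trim() !== "");

  // 3. Sort with language-aware comparison.
  const collator = new Intl.Collator(s.locale || undefined, {
    numeric: s.natural,
    sensitivity: s.caseSensitive ? "variant" : "base",
  });
  items.sort(collator.compare);

  // 4. Deduplicate: exact text, or lowercase keys when case is folded.
  if (s.dedupe) {
    const seen = new Set<string>();
    items = items.filter((item) => {
      const key = s.caseSensitive ? item : item.toLowerCase();
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    });
  }

  // 5. Decorate, then join.
  return items
    .map((item, i) => (s.lineNumbers ? `${i + 1}. ` : "") + s.prefix + item + s.suffix)
    .join(s.joiner);
}

const demoOutput = sortText("  2. beta\n1. Alpha\n\n3) alpha\n10. gamma", {
  locale: "en", trim: true, stripLeadingNumbers: true, collapseWhitespace: true,
  ignoreBlank: true, caseSensitive: false, natural: true, dedupe: true,
  lineNumbers: true, prefix: "", suffix: "", joiner: "\n",
});
console.log(demoOutput); // "1. Alpha\n2. beta\n3. gamma"
```

Because cleanup runs before comparison, the original list numbers do not influence the order, and `alpha` collapses into `Alpha` under the case-folded duplicate rule.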
| Mode | Comparison behavior | Good fit |
|---|---|---|
| Alphabetical A-Z / Z-A | Language-aware text comparison without numeric collation. | Names, titles, labels, and words without important embedded numbers. |
| Natural A-Z / Z-A | Language-aware comparison with digit runs treated as numeric values. | Filenames, numbered tasks, photo names, mixed IDs, and version-like labels. |
| Character length | Shortest item first, with alphabetical comparison used for equal lengths. | Finding long labels, compacting prompt lists, or reviewing naming consistency. |
| Reverse | Flips the processed sequence without alphabetical comparison. | Turning newest-first notes into oldest-first order, or reversing an existing queue. |
| Shuffle | Randomizes the processed list; a seed repeats the same shuffle for the same processed items. | Review rotations, randomized prompts, and temporary queues that need a repeatable draw. |
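The Character length and Reverse rows can be illustrated with a short comparator. This sketch assumes length is measured in UTF-16 code units via `String.length`. The page does not state which unit the tool counts, and the function name is hypothetical.

```typescript
const tieBreakCollator = new Intl.Collator("en");

// Shortest first; equal lengths fall back to alphabetical comparison.
function byCharacterLength(a: string, b: string): number {
  return a.length - b.length || tieBreakCollator.compare(a, b);
}

const fruitLabels: string[] = ["pear", "fig", "apple", "kiwi"];

const lengthOrder = fruitLabels.slice().sort(byCharacterLength);
console.log(lengthOrder); // ["fig", "kiwi", "pear", "apple"]

// Reverse flips the sequence as given, with no comparison at all.
const reversedOrder = fruitLabels.slice().reverse();
console.log(reversedOrder); // ["kiwi", "apple", "fig", "pear"]
```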
Metrics Core:
Average item length in Text Stats is calculated from the sorted item array before numbering, prefixes, suffixes, or the final joiner are added: the total character count across the sorted items divided by the number of sorted items.
The displayed average is rounded to two decimal places. Character Counts also reports non-whitespace characters, word count, average words per item, shortest item length, and longest item length for the sorted items.
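A minimal sketch of these counts follows. The `ItemMetrics` shape and `measureItems` name are hypothetical, and the word rule (splitting on whitespace runs) is an assumption. The tool may count words or characters differently.

```typescript
interface ItemMetrics {
  averageLength: number;
  shortest: number;
  longest: number;
  nonWhitespaceChars: number;
  wordCount: number;
  averageWords: number;
}

// Round to two decimal places for display.
function roundTwo(value: number): number {
  return Math.round(value * 100) / 100;
}

// Measures undecorated items: no line numbers, prefixes, suffixes, or joiner.
function measureItems(items: string[]): ItemMetrics {
  if (items.length === 0) {
    return { averageLength: 0, shortest: 0, longest: 0, nonWhitespaceChars: 0, wordCount: 0, averageWords: 0 };
  }
  const lengths = items.map((item) => item.length);
  const totalChars = lengths.reduce((sum, n) => sum + n, 0);
  const nonWhitespaceChars = items.reduce((sum, item) => sum + item.replace(/\s/g, "").length, 0);
  const wordCount = items.reduce(
    (sum, item) => sum + item.split(/\s+/).filter((word) => word !== "").length,
    0
  );
  return {
    averageLength: roundTwo(totalChars / items.length),
    shortest: Math.min.apply(null, lengths),
    longest: Math.max.apply(null, lengths),
    nonWhitespaceChars,
    wordCount,
    averageWords: roundTwo(wordCount / items.length),
  };
}

const sampleMetrics = measureItems(["fig", "kiwi", "red apple"]);
console.log(sampleMetrics.averageLength); // 5.33 (16 characters over 3 items)
console.log(sampleMetrics.wordCount);     // 4
```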
| Setting or input | Boundary to check | Practical effect |
|---|---|---|
| Locale | Blank uses the browser default; an unsupported language tag falls back to default collation. | Accented-letter order can vary between locales and browsers. |
| Case sensitive | Off uses base sensitivity for comparison and lowercase keys for duplicates. | Alpha and alpha can collapse into one duplicate group. |
| Regex include | Accepted flags are g, i, m, s, u, and y; an invalid pattern raises a warning. | The regex pass should be corrected before filtered output is copied. |
| TXT/CSV file | Text-like files are accepted only up to 2 MB. | Large or binary files are rejected before sorting starts. |
| Comma-separated text | Comma mode splits plain tokens, not quoted CSV records. | A value such as "Boston, MA" can be split into separate items. |
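The Case sensitive row combines two mechanisms: base sensitivity for ordering and lowercase keys for duplicate checks. The sketch below shows both with standard APIs. The `dedupeItems` helper is a hypothetical name, not the tool's function.

```typescript
// Base sensitivity: strings that differ only in case compare as equal.
const foldedCollator = new Intl.Collator("en", { sensitivity: "base" });
const foldedResult = foldedCollator.compare("alpha", "Alpha");
console.log(foldedResult); // 0

// Variant sensitivity: case differences are significant.
const exactCollator = new Intl.Collator("en", { sensitivity: "variant" });
const exactResult = exactCollator.compare("alpha", "Alpha");
console.log(exactResult !== 0); // true

// Keep the first occurrence of each key; later duplicates are dropped.
function dedupeItems(items: string[], caseSensitive: boolean): string[] {
  const seen = new Set<string>();
  return items.filter((item) => {
    const key = caseSensitive ? item : item.toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

console.log(dedupeItems(["Alpha", "alpha", "Beta"], false)); // ["Alpha", "Beta"]
console.log(dedupeItems(["Alpha", "alpha", "Beta"], true));  // ["Alpha", "alpha", "Beta"]
```

Note that base sensitivity also ignores accent differences when ordering, while a lowercase key does not, so the two rules are close but not identical.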
Limitations and Privacy Notes:
The sorter is built for plain text lists, not full spreadsheet or CSV-record parsing. Pasted text and selected text-like files are processed in the browser session rather than uploaded for sorting.
- Quoted CSV fields, escaped separators, and multi-column records are not preserved as records.
- Selected files larger than 2 MB are rejected to keep browser-side sorting responsive.
- Unseeded shuffle uses current browser randomness and is not suitable for audit-grade random draws.
- Locale-sensitive order can differ when a different browser, operating system, or locale fallback is used.
Worked Examples:
These cases show where separator choice, comparison mode, and cleanup settings change the result.
Numbered filenames
A pasted list containing File10, File2, and File1 should use Input separator set to New Line and Sort mode set to Natural (A-Z). Sorted Text should show File1, File2, then File10.
Title list with leading articles
For The Matrix, Arrival, and A Beautiful Mind, enable Ignore leading articles before sorting. The articles remain visible in Sorted Text, but comparison treats Matrix and Beautiful Mind as the title starts.
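One way to get this behavior is to compare on a derived key while leaving the displayed text alone. The article list (a, an, the) and the `articleSortKey` name are assumptions for illustration. The page does not list which articles the tool recognizes.

```typescript
// Assumed English articles; must be followed by whitespace to count.
const LEADING_ARTICLE = /^(?:a|an|the)\s+/i;

// Key used only for comparison; the original title is never modified.
function articleSortKey(title: string): string {
  return title.replace(LEADING_ARTICLE, "");
}

const movieTitles: string[] = ["The Matrix", "Arrival", "A Beautiful Mind"];
const titleCollator = new Intl.Collator("en");

const sortedTitles = movieTitles
  .slice()
  .sort((x, y) => titleCollator.compare(articleSortKey(x), articleSortKey(y)));

console.log(sortedTitles); // ["Arrival", "A Beautiful Mind", "The Matrix"]
```

Arrival keeps its leading A because no whitespace follows it, so it is not mistaken for an article.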
Repeatable review rotation
With four reviewer names, choose Shuffle and enter release-v1-order in Shuffle seed. The same names, cleanup settings, and seed produce the same randomized Sorted Text, which makes the order easier to revisit later.
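A repeatable shuffle needs a deterministic random source derived from the seed text. The sketch below uses an FNV-1a style hash, a mulberry32-style generator, and a Fisher-Yates shuffle. These are swapped-in, well-known techniques chosen for illustration; the page does not say which generator the tool uses, so the actual order it produces will differ. The reviewer names are made up.

```typescript
// FNV-1a style hash: turns the seed text into a 32-bit unsigned integer.
function hashSeed(seed: string): number {
  let h = 2166136261;
  for (let i = 0; i < seed.length; i++) {
    h ^= seed.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return h >>> 0;
}

// mulberry32-style generator: returns floats in [0, 1).
function seededRandom(initialState: number): () => number {
  let a = initialState | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle on a copy, driven by the seeded generator.
function seededShuffle<T>(items: T[], seed: string): T[] {
  const out = items.slice();
  const rand = seededRandom(hashSeed(seed));
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = out[i];
    out[i] = out[j];
    out[j] = tmp;
  }
  return out;
}

const reviewers: string[] = ["Ana", "Ben", "Chen", "Dee"];
const firstDraw = seededShuffle(reviewers, "release-v1-order");
const secondDraw = seededShuffle(reviewers, "release-v1-order");
console.log(firstDraw.join(",") === secondDraw.join(",")); // true
```

Repeatability depends on the processed items as well as the seed: changing a cleanup setting that alters the item list changes the draw.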
Regex warning recovery
If Regex include uses [A-Z and the warning shows Invalid regex, the regex pass is not trustworthy. Correct the pattern to something valid, such as [A-Z], then compare Total items and Output items in Text Stats.
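The warning path can be sketched with a guarded `RegExp` construction. The helper names and the result shape are hypothetical. The accepted flag list comes from the settings table, and the warning text mirrors the message described above.

```typescript
const ACCEPTED_FLAGS = "gimsuy";

interface RegexCompileResult {
  regex: RegExp | null;
  warning: string | null;
}

// Returns a warning instead of throwing when the pattern or flags are bad.
function compileInclude(pattern: string, flags: string): RegexCompileResult {
  for (let i = 0; i < flags.length; i++) {
    if (ACCEPTED_FLAGS.indexOf(flags.charAt(i)) === -1) {
      return { regex: null, warning: "Invalid regex" };
    }
  }
  try {
    return { regex: new RegExp(pattern, flags), warning: null };
  } catch (err) {
    return { regex: null, warning: "Invalid regex" };
  }
}

// Keep only matching items; reset lastIndex because g and y make test() stateful.
function includeMatches(items: string[], regex: RegExp): string[] {
  return items.filter((item) => {
    regex.lastIndex = 0;
    return regex.test(item);
  });
}

console.log(compileInclude("[A-Z", "").warning);  // "Invalid regex"
console.log(compileInclude("[A-Z]", "").warning); // null

const imagePattern = compileInclude("^IMG_\\d+", "g");
const imageItems = imagePattern.regex
  ? includeMatches(["IMG_10", "notes", "IMG_2"], imagePattern.regex)
  : [];
console.log(imageItems); // ["IMG_10", "IMG_2"]
```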
Advanced Tips:
- Use Natural (A-Z) for filenames and numbered labels before adding duplicate removal, so duplicate checks run after the human-readable order is established.
- Set Locale explicitly for multilingual or accented lists that must be compared later with the same order.
- Keep Case sensitive on when SKU-a and SKU-A are distinct records; turn it off when capitalization is only formatting noise.
- Use Regex include for structured patterns such as ^IMG_\d+, but use plain include and exclude text for simple contains checks.
- Check Character Counts after prefix, suffix, or line-number changes if the destination has length limits, because the tab measures sorted items before final decoration.
FAQ:
How do I make File2 sort before File10?
Choose Natural (A-Z). Natural sorting treats digit runs as numeric values, so 2 sorts before 10.
What should I enter for locale?
Use a BCP 47 language tag such as en-US, fr, or de when accented-letter order matters. Leave the field blank to use the browser default.
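How a locale field might be turned into a comparator can be sketched as follows. The `makeCollator` helper is hypothetical. It relies on two documented behaviors of `Intl.Collator`: a malformed tag throws a `RangeError`, while a well-formed but unsupported tag falls back silently to a default locale.

```typescript
// Build a collator from user input, falling back to the runtime default.
function makeCollator(localeTag: string, natural: boolean): Intl.Collator {
  const options: Intl.CollatorOptions = { numeric: natural };
  const tag = localeTag.trim();
  if (tag === "") {
    return new Intl.Collator(undefined, options); // blank: runtime default
  }
  try {
    return new Intl.Collator(tag, options);
  } catch (err) {
    // Structurally invalid tag (RangeError): use the default instead.
    return new Intl.Collator(undefined, options);
  }
}

const usCollator = makeCollator("en-US", true);
console.log(usCollator.compare("File2", "File10") < 0); // true

// Neither call throws; resolvedOptions() reports the locale actually in use.
console.log(makeCollator("", false).resolvedOptions().locale);
console.log(makeCollator("not a tag", false).resolvedOptions().locale);
```

Logging `resolvedOptions().locale` is a practical way to confirm which locale two sorted runs really used before comparing them.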
Why did my CSV rows split in the wrong place?
Comma mode treats commas as plain item separators. It does not preserve quoted fields, escaped commas, or multi-column CSV records.
Why does JSON differ from Sorted Text?
JSON contains the sorted item array, settings, counts, and warnings. Sorted Text also includes line numbers, prefixes, suffixes, and the selected output separator.
What does Invalid regex mean?
The regex pattern or flag combination could not be compiled. Fix the pattern or switch Regex filter mode to No regex filter before using the filtered result.
Are dropped files uploaded for sorting?
No. Selected and dropped text-like files are read by the browser, and sorting runs in the current browser session.
Glossary:
- Collation
- The comparison rule used to decide the order of strings.
- Natural sort
- A text order that compares embedded digit runs as numbers.
- BCP 47 language tag
- A standard language identifier such as en-US, fr, or de.
- Separator
- The character, space, tab, or line break used to split source text into individual items.
- Regex
- A pattern used to match text that follows a structure.
- Seeded shuffle
- A randomized order that can be repeated when the same input, cleanup settings, and seed are used.
References:
- Unicode Technical Standard #10: Unicode Collation Algorithm, Unicode Consortium, 3 September 2025.
- Intl.Collator, MDN Web Docs, 10 July 2025.
- RFC 5646: Tags for Identifying Languages, RFC Editor, September 2009.
- Regular expressions, MDN Web Docs, 30 October 2025.
- File API, World Wide Web Consortium, 3 December 2025.