Duplicate Line Remover
Remove duplicate lines online with case, trimming, blank-line, keep-rule, sorting, and spacing controls for cleaner list exports and audits.
Introduction
Duplicate lines show up in pasted contact lists, keyword exports, redirect maps, logs, allowlists, product IDs, and spreadsheet columns. Removing repeats safely depends on the matching rule, not just on whether two lines look similar at a glance.
Two entries can be duplicates because they differ only by case, leading spaces, trailing spaces, or repeated internal whitespace. The same two entries can also be distinct when they are codes, fixed-width records, or case-sensitive identifiers.
Good cleanup preserves the evidence behind the change. The cleaned list is useful for copying, while line metrics and a duplicate ledger explain what was kept, what was removed, and why.
Technical Details:
Duplicate-line removal works by building a comparison key for every eligible line. The key can trim line edges, fold case, collapse repeated internal spaces, and include or ignore blank rows depending on the selected controls.
After keys are built, entries with the same key form a duplicate group. The keep rule chooses the first occurrence, the last occurrence, or no occurrence when Keep only lines that appear once is selected.
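The key-building and keep-rule steps can be sketched in a few lines of JavaScript. This is a minimal illustration under assumed option names (`caseSensitive`, `trimEdges`, `collapseSpaces`, `ignoreBlanks`, `keepRule`), not the tool's actual source:

```javascript
// Build the comparison key for one line from the matching controls.
function buildKey(line, opts) {
  let key = opts.trimEdges ? line.trim() : line;
  if (opts.collapseSpaces) key = key.replace(/\s+/g, ' '); // fold repeated internal spaces
  if (!opts.caseSensitive) key = key.toLowerCase();         // fold case
  return key;
}

// Group lines by key, then apply the keep rule to each group.
function dedupe(lines, opts) {
  const groups = new Map(); // comparison key -> indices of matching lines
  lines.forEach((line, i) => {
    if (opts.ignoreBlanks && line.trim() === '') return; // drop blank rows
    const key = buildKey(line, opts);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(i);
  });
  const kept = [];
  for (const indices of groups.values()) {
    if (opts.keepRule === 'unique-only' && indices.length > 1) continue;
    kept.push(opts.keepRule === 'last' ? indices[indices.length - 1] : indices[0]);
  }
  // Source order: emit kept lines in their original positions.
  return kept
    .sort((a, b) => a - b)
    .map(i => (opts.trimEdges ? lines[i].trim() : lines[i]));
}
```

With case-insensitive matching, trimmed edges, and keep-first, `dedupe(['alpha', 'beta', 'Alpha', 'beta ', 'gamma'], …)` keeps alpha, beta, and gamma.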
Rule Core:
| Rule | Effect on Matching | Result to Review |
|---|---|---|
| Case-sensitive match | On keeps Apple and apple separate. Off compares lowercase forms. | Unique match keys changes when capitalization variants merge. |
| Trim line edges | On removes leading and trailing spaces before matching and in cleaned output. | Duplicate Ledger shows the retained cleaned text. |
| Ignore blank lines | On drops blank rows before grouping. Off allows one blank-line key to be kept or removed like any other duplicate group. | Comparable lines reports ignored blank rows. |
| Keep rule | Keep first keeps the earliest source line, Keep last keeps the latest, and unique-only removes every repeated group. | Kept output lines and Removed lines show the effect. |
| Output order | Source order keeps retained source positions. A-Z and Z-A sorting happen after duplicate removal. | Output order states whether sorting changed the final list order. |
The grouping is not limited to adjacent lines. A repeated key can be found even when matching entries are far apart in the source list. That differs from traditional adjacent-line utilities, where sorting is often needed before de-duplication.
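The difference between whole-list grouping and adjacent-only removal can be shown in a few lines (illustrative only):

```javascript
const lines = ['alpha', 'beta', 'alpha']; // the duplicates are not adjacent

// Adjacent-only removal (the classic uniq model without sorting):
// each line is compared only to its predecessor, so nothing is removed.
const adjacent = lines.filter((l, i) => i === 0 || l !== lines[i - 1]);

// Whole-list, first-occurrence-wins: a Set tracks every key seen so far,
// so the second 'alpha' is removed even though it is far from the first.
const seen = new Set();
const wholeList = lines.filter(l => !seen.has(l) && seen.add(l));
// adjacent keeps all three lines; wholeList keeps ['alpha', 'beta']
```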
Text-like files are read in the browser and are limited to 5 MB. The same grouped result drives Cleaned Lines, Line Metrics, Duplicate Ledger, and JSON output.
Everyday Use & Decision Guide:
For emails, URLs, keywords, and copied spreadsheet columns, start with Case-sensitive match off, Trim line edges on, Ignore blank lines on, Keep first occurrence, and Preserve source order. That removes common paste noise without reshuffling the list.
For IDs, codes, log lines, or fixed-width data, make the matching stricter. Turn case-sensitive matching on when capitalization matters, and leave internal-space collapse off when spacing is part of the value.
- Use Keep last occurrence when later rows supersede earlier rows in an exported list.
- Use Keep only lines that appear once only when every repeated value should disappear completely.
- Open Duplicate Ledger before replacing the source list; it names kept lines, removed lines, and removed text.
- Sort A-Z or Z-A only after verifying the retained lines, because sorting is applied after removal.
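The two starting points above can be summarized as option presets. The property names here are assumptions chosen for illustration, not the tool's actual configuration keys:

```javascript
// Loose preset for emails, URLs, keywords, and pasted spreadsheet columns:
// merge paste noise without reshuffling the list.
const listDefaults = {
  caseSensitive: false, // merge capitalization variants
  trimEdges: true,      // drop leading/trailing paste noise
  ignoreBlanks: true,
  keepRule: 'first',
  outputOrder: 'source',
};

// Strict preset for IDs, codes, logs, and fixed-width records:
// capitalization and spacing are part of the value.
const strictDefaults = { ...listDefaults, caseSensitive: true, trimEdges: false };
```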
Do not rely only on the removed-line count. A high reduction can be good for noisy keyword lists and harmful for records where case or spacing carries meaning.
Step-by-Step Guide:
- Paste one entry per line into Line list, or use Browse TXT for a TXT, CSV, or LOG file under the browser-side size limit.
- Set Case-sensitive match, Trim line edges, and Ignore blank lines before judging the counts.
- Open Advanced and choose Keep rule: first, last, or unique-only.
- Choose Output order. Keep source order for audit trails, or sort after de-duplication for keyword and contact lists.
- Use Line Metrics to check original lines, comparable lines, unique keys, kept output lines, removed lines, and duplicate groups.
- Open Duplicate Ledger when removed lines are nonzero. If a kept line or removed line looks wrong, adjust the matching rules before copying Cleaned Lines.
Interpreting Results:
Kept lines is the size of the cleaned output. Removed lines is the number of source entries dropped by the current keep rule. Unique match keys is the count after matching rules are applied.
The main false-confidence risk is assuming that fewer lines always means a better list. If Case-sensitive match is off, ABC and abc share a key; if Trim line edges is on, beta and the same word padded with leading or trailing spaces share a key.
Use Duplicate Ledger as the corrective check. It shows the duplicate key, occurrence count, kept line, removed line numbers, removed text, and kept text for every repeated group.
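A ledger entry per repeated group can be derived directly from the key-to-indices grouping. This sketch uses an assumed entry shape (key, occurrence count, 1-based kept and removed line numbers), not the tool's actual JSON:

```javascript
// Build a ledger entry for every repeated group.
// keyOf is the comparison-key function chosen by the matching controls.
function buildLedger(lines, keyOf, keepRule = 'first') {
  const groups = new Map(); // key -> 0-based indices of matching lines
  lines.forEach((line, i) => {
    const key = keyOf(line);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(i);
  });
  const ledger = [];
  for (const [key, idx] of groups) {
    if (idx.length < 2) continue; // only repeated groups enter the ledger
    const keptIdx = keepRule === 'last' ? idx[idx.length - 1] : idx[0];
    ledger.push({
      key,
      occurrences: idx.length,
      keptLine: keptIdx + 1, // 1-based line numbers for display
      removedLines: idx.filter(i => i !== keptIdx).map(i => i + 1),
    });
  }
  return ledger;
}
```

For the list `['a', 'b', 'a', 'a']` with an identity key, the ledger holds one group: key a, three occurrences, line 1 kept, lines 3 and 4 removed.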
Worked Examples:
With the default sample alpha, beta, Alpha, beta with a trailing space, gamma, and another beta, case-insensitive matching and trimmed edges group alpha with Alpha and all beta variants together. Keep first occurrence leaves alpha, beta, and gamma.
If Case-sensitive match is turned on for the same sample, alpha and Alpha become separate keys. The removed-line count drops because capitalization now changes identity.
A troubleshooting case starts with an empty cleaned output after choosing Keep only lines that appear once. That means every comparable line belonged to a repeated group under the current rules. Switch back to Keep first occurrence or tighten case and spacing rules if repeated values should leave one representative line.
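The troubleshooting case above can be reproduced with a small count-based check (illustrative only): under case-insensitive matching, every line below belongs to a repeated group, so keeping only once-occurring lines retains nothing.

```javascript
const lines = ['Beta', 'beta', 'ALPHA', 'alpha'];

// Count occurrences per case-folded comparison key.
const counts = new Map();
for (const l of lines) {
  const k = l.toLowerCase();
  counts.set(k, (counts.get(k) ?? 0) + 1);
}

// Keep only lines whose key appears exactly once.
const uniqueOnly = lines.filter(l => counts.get(l.toLowerCase()) === 1);
// uniqueOnly is empty: switch to keep-first, or turn case-sensitive
// matching on, if one representative per value should survive.
```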
FAQ:
Does sorting affect which duplicate is kept?
No. Duplicate grouping and the keep rule run first. Output order sorting is applied only to the retained entries.
Why did only one blank line survive?
When Ignore blank lines is off, blank rows become a match key. With Keep first occurrence, one blank row can remain while later blank rows are removed.
Can non-adjacent duplicates be removed?
Yes. Lines are grouped by comparison key across the whole source list, so matching entries do not need to be next to each other.
Does loaded file content leave the browser?
The file reader loads TXT, CSV, or LOG content into the page for browser-side cleanup; no server upload path is described for line-list content.
Glossary:
- Comparison key
- The value used for duplicate grouping after case, trimming, blank-line, and spacing rules are applied.
- Duplicate group
- Two or more entries that share the same comparison key.
- Keep rule
- The rule that decides whether the first, last, or no repeated entry survives.
- Comparable lines
- Source lines that remain after the blank-line policy is applied.
References:
- uniq invocation, GNU Coreutils manual, version 9.10.