HTML Tag Stripper
Turn pasted HTML, emails, or page fragments into browser-local plain text with scope choices, link checks, alt-text handling, and ledgers.
Introduction:
Copied HTML is often useful long after the page layout stops mattering. A product description may need to move from a storefront into a spreadsheet, an email reply may need to become a support note, or a developer may need to inspect the text hiding inside a template fragment. The visible words are only part of the job. HTML also carries links, images, list order, table shape, hidden text, character references, and script or style blocks that can change what a reader sees after markup is removed.
Tags are the angle-bracket markers in the source, while elements are the document parts a browser builds from those markers. That distinction matters because a safe-looking delete operation can flatten meaning. Removing <li> tags without list markers turns separate steps into a run-on paragraph. Dropping <a> tags may keep the label while losing the destination. Ignoring <img> can remove a chart label, product variant, or warning that existed only as alternative text.
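The list-flattening problem can be reproduced in a few lines. The following is a minimal sketch in Python's standard library, not the tool's own code; the class name `ListAwareText` and the sample markup are made up for illustration. It compares a naive tag delete with a conversion that writes a number for each list item.

```python
import re
from html.parser import HTMLParser

SOURCE = "<ol><li>Unplug the device</li><li>Wait ten seconds</li></ol>"

# Naive approach: delete anything that looks like a tag.
naive = re.sub(r"<[^>]+>", "", SOURCE)


class ListAwareText(HTMLParser):
    """Hypothetical helper: starts a numbered line for each <li>."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self.counter = 0

    def handle_starttag(self, tag, attrs):
        if tag == "ol":
            self.counter = 0          # restart numbering for each ordered list
        elif tag == "li":
            self.counter += 1
            self.lines.append(f"{self.counter}. ")

    def handle_data(self, data):
        if self.lines:                # text before the first item is ignored here
            self.lines[-1] += data


parser = ListAwareText()
parser.feed(SOURCE)
parser.close()
structured = "\n".join(parser.lines)

print(naive)       # Unplug the deviceWait ten seconds
print(structured)  # two lines: "1. Unplug the device" and "2. Wait ten seconds"
```

The naive result glues the two steps together, while the element-aware version keeps their order and separation.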
- Readable copy
- The words that should survive for pasting into a ticket, document, spreadsheet, search field, or plain-text message.
- Structure
- Line breaks, list markers, headings, and table rows that keep the text understandable after visual layout disappears.
- References
- Link destinations and image source clues that may need to remain visible for review, attribution, or follow-up work.
- Hidden material
- Comments, hidden elements, templates, scripts, styles, and accessibility-hidden text that may be noise for normal copy but relevant for an audit.
Different plain-text jobs value different parts of the source. A customer-service note usually needs readable paragraphs and the main link destinations. A spreadsheet handoff may need one table row per line. A search snippet may need a compact single line. An accessibility or content audit may need alternative text and a record of hidden material, while a normal copy task may treat that same material as noise.
The common mistake is treating tag removal as sanitization. Plain text can be safer to copy than raw markup because it is not executable HTML, but it does not prove the original markup is safe to render again. Any workflow that will display user-supplied HTML still needs proper sanitization, contextual output encoding, and URL handling at the destination.
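A short illustration of why stripped text is not sanitized output, assuming the destination is an HTML page and using only Python's standard `html` module. The sample string is invented. Source that contained escaped markup decodes into literal angle brackets, so the plain text still needs encoding at the point where it is written into HTML.

```python
import html

# Text extracted from source whose markup was entity-escaped.
extracted = html.unescape("Try &lt;img src=x onerror=alert(1)&gt; here")
# `extracted` now holds literal angle brackets as ordinary characters.

# Unsafe: interpolating the text straight into an HTML page.
unsafe_fragment = f"<p>{extracted}</p>"

# Safer for an HTML text context: encode at the point of output.
safe_fragment = f"<p>{html.escape(extracted)}</p>"

print(unsafe_fragment)
print(safe_fragment)
```

`html.escape` covers HTML text and quoted attribute values only. URLs, script contexts, and stylesheets need their own encoding rules, which is why the destination decides the handling.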
How to Use This Tool:
Start with the smallest source that contains the content you want to keep, then choose how the converted text should treat structure, references, and non-visible material.
- Paste markup into HTML source, drop source text onto the editor, choose Browse HTML for one .html, .htm, or .txt file under 1 MB, or use Load sample to see a populated result.
- Set Content scope. Auto content root tries article, main, role-main, and common content containers first. Body or fragment uses the parsed body or pasted fragment. Document title plus body prepends the title before the body text.
- Choose Output style. Readable paragraphs keeps block spacing, One block per line helps with row review, and Compact one line collapses whitespace for short fields or snippets.
- Decide whether Hidden content and Script/style content belong in the result. The default skips hidden and aria-hidden text and removes script, style, template, and noscript blocks.
- Set Link handling and Image text. Keep link labels alone, append safe URLs, create Markdown-style links, or add a reference list. For images, keep alt text, ignore images, or add safe source URLs beside the alt text.
- Use Entity handling when source spelling matters. Decode entities for ordinary reading, or preserve named and numeric tokens when you are checking escaping or preparing another conversion step.
- Open Advanced for list markers, table flattening, blank-line limits, and outer whitespace. Review Plain Text first, then use Text Cleanup Ledger, Element Treatment Map, and Link and Image Ledger to check warnings and source treatment.
If the warning list says the selected scope produced no visible text, switch scope, keep hidden content for audit, or paste a smaller fragment that contains the visible body you expected.
Interpreting Results:
Plain Text is the copy-ready result, but it is only trustworthy after a quick source check. Compare the opening heading, list order, important URLs, table rows, image alt text, and any warning about skipped hidden elements or removed script and style blocks.
- Text Cleanup Ledger summarizes scope, output style, stripped tag count, hidden-content treatment, link and image handling, script/style treatment, entity mode, list and table policy, and warnings.
- Element Treatment Map shows which element names were found and how their text was handled.
- Link and Image Ledger is the main place to check missing or suppressed destinations, empty link labels, image alt text, and safe source handling.
- JSON keeps the settings, summary, plain text, warnings, ledgers, and source note together for repeat review.
A tidy result can still be incomplete. Missing alt text, flattened tables, broad body scope, and suppressed URL schemes can all produce readable output while hiding the reason a human should review the source again.
Technical Details:
HTML-to-text conversion starts by parsing the source into a document tree. That is more reliable than a raw text replacement because copied fragments often contain omitted wrappers, mixed-case element names, nested links, comments, and malformed markup. Once a browser has a document tree, the text can be extracted from nodes while element names guide spacing and review rules.
There is no single correct plain-text rendering for every HTML document. A paragraph break, a table row, a link destination, and an image's alternative text are different kinds of meaning. The conversion has to turn each one into a text convention that matches the user's next workflow.
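The parse-then-extract idea can be sketched outside the browser. This example uses Python's `html.parser`, which reports start tags, end tags, and text as events instead of building a full document tree the way a browser does, so it is an approximation and not the tool's implementation. The tag sets and the names `TextExtractor` and `extract_text` are assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Assumed, incomplete tag sets for the sketch.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "li", "tr", "section", "article"}
SKIPPED_TAGS = {"script", "style", "template", "noscript"}


class TextExtractor(HTMLParser):
    """Collects text into blocks; element names decide where blocks break."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = [[]]
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so <H1> and <h1> look the same.
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.blocks.append([])

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.blocks.append([])

    def handle_data(self, data):
        if not self.skip_depth:
            self.blocks[-1].append(data)
    # Comments arrive through handle_comment, which does nothing by default.


def extract_text(source):
    parser = TextExtractor()
    parser.feed(source)
    parser.close()
    joined = ("".join(parts).strip() for parts in parser.blocks)
    return "\n\n".join(block for block in joined if block)


sample = ("<H1>Title</H1><!-- note --><P>First <b>bold</b> text</P>"
          "<script>var x = 1;</script><p>Second")
print(extract_text(sample))
```

The sample mixes upper-case tags, a comment, a script block, and an unclosed paragraph, and the result is three clean blocks: the title, the first paragraph, and the second paragraph.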
Transformation Core:
| Stage | Rule | Review Evidence |
|---|---|---|
| Parse source | Pasted text and supported local files are read as HTML so fragments and full documents can be normalized before extraction. | Source character count, source line count, file note, and empty-source warnings. |
| Select content | The source can use an automatic content root, the body or fragment, or the document title followed by body text. | Source scope in Text Cleanup Ledger. |
| Render text | Text nodes are kept, block elements create boundaries, list items receive markers, and table cells are flattened according to the selected table policy. | Plain-text word count, output line count, stripped tag count, and Element Treatment Map. |
| Handle references | Links keep their labels and may add safe URLs; images may contribute alt text, no text, or alt text plus a safe source URL. | Link and Image Ledger, link count, image count, and unsafe URL count. |
| Normalize whitespace | The final text becomes readable paragraphs, one block per line, or one compact line, with optional trimming and a blank-line cap. | Output style, Maximum blank lines, and the final Plain Text result. |
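The last stage in the table can be sketched as a small function. This is an illustration under assumptions: the style names `paragraphs`, `lines`, and `compact`, the parameter names, and the exact blank-line rule are invented here and may differ from the tool's behavior.

```python
import re


def cap_blank_lines(text, max_blank_lines):
    """Allow at most `max_blank_lines` consecutive empty lines."""
    limit = "\n" * (max_blank_lines + 1)
    return re.sub(r"\n{%d,}" % (max_blank_lines + 2), limit, text)


def normalize(text, style="paragraphs", max_blank_lines=1, trim=True):
    if style == "compact":
        # One line: every whitespace run becomes a single space.
        return " ".join(text.split())
    if style == "lines":
        # One block per line: blank lines separate blocks.
        blocks = re.split(r"\n\s*\n", text)
        lines = [" ".join(block.split()) for block in blocks]
        return "\n".join(line for line in lines if line)
    # "paragraphs": keep block spacing, drop trailing spaces, cap blank lines.
    lines = [line.rstrip() for line in text.split("\n")]
    result = cap_blank_lines("\n".join(lines), max_blank_lines)
    return result.strip() if trim else result


raw = "  Title \n\n\n\nFirst line\nwraps here \n\nSecond  "
print(normalize(raw, "paragraphs"))
print(normalize(raw, "lines"))
print(normalize(raw, "compact"))
```

Keeping the three styles in one function makes it easy to compare them on the same extracted text before choosing one for a spreadsheet cell, a ticket, or a search field.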
Element Handling Map:
| Construct | Text Treatment | What to Check |
|---|---|---|
| Headings, paragraphs, sections, articles, main regions | Markup is removed while readable block boundaries are preserved. | Broad scope can pull in navigation, repeated headers, sidebars, or footer text. |
| Links | Visible link text remains. Safe URLs can be omitted, appended, converted to Markdown-style links, or collected as references. | Empty labels, very long URLs, and suppressed destinations need manual review. |
| Images | Image tags are removed. Alternative text can be kept, ignored, or paired with a safe source URL. | Missing or decorative alt text can remove real meaning from instructions, products, charts, and linked images. |
| Lists | Ordered items use generated numbers. Unordered items can use dash, asterisk, or no prefix. | Nested menus and copied navigation may still need cleanup after extraction. |
| Tables | Rows become tab-separated cells, pipe-separated cells, or readable cell lines. | Merged cells, header relationships, and layout-only tables cannot be fully represented in plain text. |
| Scripts, styles, templates, noscript blocks | These blocks are removed by default or kept as text when selected for review. | Keeping this text is useful for audits, not for ordinary publishable copy. |
| Hidden and aria-hidden elements | Hidden text is skipped by default or kept when the audit setting is selected. Comments are omitted. | Hidden legal text, accessibility labels, tracking fragments, or template leftovers may matter in review work. |
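As one example of the treatments in the map, the four link options can be sketched with Python's `html.parser`. The mode names, the output formats, and the class `LinkText` are assumptions for illustration, and the sketch deliberately leaves out the unsafe-scheme check and nested links.

```python
from html.parser import HTMLParser


class LinkText(HTMLParser):
    """Hypothetical converter: mode is 'label', 'append', 'markdown', or 'references'."""

    def __init__(self, mode="append"):
        super().__init__()
        self.mode = mode
        self.out = []
        self.href = None
        self.label = []
        self.references = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.label = []

    def handle_data(self, data):
        # Text inside a link with an href is buffered as the label.
        (self.label if self.href is not None else self.out).append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self.href is None:
            return
        label, url = "".join(self.label).strip(), self.href
        self.href = None
        if self.mode == "append":
            self.out.append(f"{label} ({url})")
        elif self.mode == "markdown":
            self.out.append(f"[{label}]({url})")
        elif self.mode == "references":
            self.references.append(url)
            self.out.append(f"{label} [{len(self.references)}]")
        else:  # "label": keep the visible text only
            self.out.append(label)


def convert(source, mode):
    parser = LinkText(mode)
    parser.feed(source)
    parser.close()
    text = "".join(parser.out)
    if parser.references:
        notes = [f"[{i}] {url}" for i, url in enumerate(parser.references, 1)]
        text += "\n\n" + "\n".join(notes)
    return text


sample = 'See <a href="https://example.com/docs">the docs</a> now.'
for mode in ("label", "append", "markdown", "references"):
    print(convert(sample, mode))
```

The reference-list mode keeps the sentence readable and moves the destination to a numbered note, which matches the case where prose and link audit need to stay separate.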
Entity handling decides whether character references become readable characters. Decoding turns source tokens such as &amp;, &nbsp;, and numeric references into their text form. Preserving entities keeps the exact source spelling visible, which helps when checking double escaping, broken encoding, or a handoff into another converter.
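The two entity modes can be sketched with Python's `html.parser` by turning off automatic reference conversion, so that named and numeric references arrive as separate events. The class `EntityAwareText` and the function `text_of` are hypothetical names, and this is not the tool's own code.

```python
import html
from html.parser import HTMLParser


class EntityAwareText(HTMLParser):
    """Keeps or decodes character references depending on `decode`."""

    def __init__(self, decode=True):
        # convert_charrefs=False reports references through the two
        # handlers below instead of decoding them inside handle_data.
        super().__init__(convert_charrefs=False)
        self.decode = decode
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):      # named reference, e.g. "amp"
        token = f"&{name};"
        self.parts.append(html.unescape(token) if self.decode else token)

    def handle_charref(self, name):        # numeric reference, e.g. "233" or "xe9"
        token = f"&#{name};"
        self.parts.append(html.unescape(token) if self.decode else token)


def text_of(source, decode=True):
    parser = EntityAwareText(decode)
    parser.feed(source)
    parser.close()
    return "".join(parser.parts)


sample = "<p>Caf&#233; &amp; bar &pound;5</p>"
print(text_of(sample, decode=True))    # Café & bar £5
print(text_of(sample, decode=False))   # Caf&#233; &amp; bar &pound;5
print(text_of("&amp;lt;", decode=True))  # &lt;  (a sign of double escaping)
```

One limitation of this sketch: a reference written without its closing semicolon is rebuilt with one in preserve mode, so the output is not byte-for-byte source spelling in that case.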
Unsafe URL suppression is deliberately narrow. It keeps javascript:, vbscript:, and data: destinations out of copied references, including simple whitespace-obfuscated forms. That is a review guard for plain text, not a complete link-security policy or an HTML sanitizer.
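A narrow scheme check of this kind might look like the sketch below. What counts as a simple whitespace-obfuscated form is not spelled out above, so stripping all whitespace and ASCII control characters before comparing is an assumption, and the function name `is_unsafe_url` is made up.

```python
import re

UNSAFE_SCHEMES = ("javascript:", "vbscript:", "data:")


def is_unsafe_url(url):
    """Return True when the URL starts with a suppressed scheme."""
    # Assumption: drop whitespace and control characters anywhere in the
    # value, then compare case-insensitively against the scheme list.
    compact = re.sub(r"[\s\x00-\x1f]+", "", url).lower()
    return compact.startswith(UNSAFE_SCHEMES)


for candidate in ("javascript:alert(1)",
                  " Java\tScript:alert(1)",
                  "data:text/html,hi",
                  "https://example.com/?q=javascript:",
                  "mailto:a@example.com"):
    print(candidate.strip(), is_unsafe_url(candidate))
```

The check only looks at the start of the value, so a scheme name that appears later in a query string is not flagged. That keeps it a review guard and nothing more.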
Privacy and Limits:
Pasted source and supported local files are processed in the browser after the page loads. The conversion does not upload the source HTML, generated plain text, ledgers, JSON record, or current settings.
- File loading accepts one HTML, HTM, or TXT text file under 1 MB. Trim very large pages to the article, email body, table, or fragment that needs review.
- Local processing does not protect anything you later copy, download, screenshot, share in a URL, or paste into another service.
- Plain text is not sanitized HTML. Rendering the original markup anywhere else still requires destination-specific sanitization and output encoding.
- Visual layout, merged table relationships, event handlers, comments, and decorative images may be lost or intentionally omitted.
Advanced Tips:
- Use Auto content root for articles and emails first, then check Source scope when navigation or footer text appears in the result.
- Choose Reference list at bottom when the destination needs clean prose plus a separate link audit trail.
- Switch Table handling to Rows with tab-separated cells before pasting table text into spreadsheets.
- Keep Script/style content only when auditing source material. Remove it for ordinary reader-facing copy.
- Use Preserve entity tokens when the exact escaping is evidence, then compare with decoded text before publishing or sending the copy onward.
Worked Examples:
Support email handoff
An HTML email includes a paragraph, two action links, a tracking image, and a script or style block. With Readable paragraphs, Append URLs in parentheses, Use alt text, and Remove script/style/template blocks, the Plain Text result keeps the message and safe destinations while the warning list notes the removed block.
Spreadsheet table cleanup
A copied HTML table needs to become reviewable rows. Set Table handling to Rows with tab-separated cells and check that each row appears on its own line. The Element Treatment Map should confirm table, row, and cell counts, but merged cells still need manual review because their visual relationships do not fully survive flattening.
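The tab-separated flattening in this example can be sketched as follows. It is a simplified illustration using Python's `html.parser`, with hypothetical names `TableRows` and `table_to_tsv`. It ignores colspan, rowspan, and nested tables, which is exactly where manual review is still needed.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of each cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.cell = None

    def _close_cell(self):
        if self.cell is not None and self.rows:
            # Collapse whitespace so stray tabs or newlines cannot split a cell.
            self.rows[-1].append(" ".join("".join(self.cell).split()))
        self.cell = None

    def handle_starttag(self, tag, attrs):
        if tag in ("tr", "td", "th"):
            self._close_cell()        # tolerate omitted </td> and </tr> tags
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._close_cell()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def table_to_tsv(source):
    parser = TableRows()
    parser.feed(source)
    parser.close()
    return "\n".join("\t".join(row) for row in parser.rows)


sample = ("<table><tr><th>Item</th><th>Qty</th></tr>"
          "<tr><td>Blue  mug</td><td>2</td></tr></table>")
print(table_to_tsv(sample))
```

Each row lands on its own line with one tab between cells, which is the shape most spreadsheets accept on paste.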
Escaping audit
Source containing &amp;, &nbsp;, and numeric references can be checked in two passes. Decode entities to text shows the human-readable copy, while Preserve entity tokens keeps the exact source tokens visible for a bug report or converter handoff.
No visible text warning
A fragment made mostly of hidden elements may produce a warning that the selected scope has no visible text. Switch Hidden content to Keep hidden source text for audit if that hidden material is the subject, or paste a smaller visible fragment if the wrong source was selected.
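The situation can be reproduced with a small sketch. It treats the `hidden` attribute and `aria-hidden="true"` as hidden, which is an assumption about the rule, and the warning wording and the names `VisibleText` and `extract_visible` are invented. It also assumes non-void elements have explicit end tags.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the depth count.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class VisibleText(HTMLParser):
    def __init__(self, keep_hidden=False):
        super().__init__()
        self.keep_hidden = keep_hidden
        self.hidden_depth = 0
        self.skipped = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attr = dict(attrs)
        is_hidden = "hidden" in attr or attr.get("aria-hidden") == "true"
        if self.hidden_depth:
            self.hidden_depth += 1      # nested inside a hidden element
        elif is_hidden and not self.keep_hidden:
            self.hidden_depth = 1
            self.skipped += 1

    def handle_endtag(self, tag):
        if tag not in VOID and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)


def extract_visible(source, keep_hidden=False):
    parser = VisibleText(keep_hidden)
    parser.feed(source)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    warnings = []
    if parser.skipped:
        warnings.append(f"Skipped {parser.skipped} hidden element(s).")
    if not text:
        warnings.append("Selected scope produced no visible text.")
    return text, warnings


sample = ('<div hidden><p>Draft<br> note</p></div> '
          '<span aria-hidden="true">icon</span>')
print(extract_visible(sample))
print(extract_visible(sample, keep_hidden=True))
```

With the default setting the sample yields an empty string and two warnings. Keeping hidden text returns the draft note and the icon label with no warnings.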
FAQ:
Does stripping tags make HTML safe?
No. The result is plain text for reading, copying, or audit notes. If the original markup will be rendered again, use proper sanitization, contextual output encoding, and destination-specific URL handling.
Why are some URLs missing?
URL schemes such as javascript:, vbscript:, and data: are suppressed from copied references. Check Link and Image Ledger to see which item was suppressed.
What files can I load?
Browse HTML and drag-and-drop accept one HTML, HTM, or TXT text file under 1 MB. Larger sources should be trimmed before loading or pasted as smaller fragments.
Should image alt text be kept?
Keep alt text when images carry instructions, product meaning, chart labels, or link purpose. Ignore images when they are decorative or when their alt text would add noise to the plain-text handoff.
Why did navigation or footer text appear?
The selected scope may be broader than the content you wanted, for example the whole body instead of the article. Check Source scope, switch content scope, or paste only the article, email body, or fragment you intended to convert.
Glossary:
- Element
- A parsed HTML document part, such as a paragraph, link, image, list item, or table cell.
- Tag
- The source marker that starts or ends an element, such as <p> or </a>.
- Content root
- The selected part of the parsed document used for extraction, such as an article, main region, body, fragment, or title plus body.
- Entity
- An HTML character reference such as &amp; or &nbsp; that can be decoded or preserved.
- Alternative text
- Image text that describes the image information or function when the image itself is unavailable or not useful.
- Unsafe URL scheme
- A destination scheme such as javascript:, vbscript:, or data: that should not be copied into ordinary text without review.
References:
- DOMParser: parseFromString(), MDN Web Docs.
- HTML elements reference, MDN Web Docs, 6 February 2026.
- Images Tutorial, W3C Web Accessibility Initiative, 8 April 2026.
- Cross Site Scripting Prevention Cheat Sheet, OWASP Cheat Sheet Series.