
PDF redaction fails most often when the visible mark is mistaken for the security step. A black rectangle, white box, highlight, or label can make a page look clean while the original text, image pixels, annotation, form value, bookmark, attachment, metadata, or searchable OCR text still remains in the file. A safe release copy has to remove or replace the sensitive content, clean hidden document data, and then survive another search and inspection pass.

PDF files make this work harder than ordinary page images. A single page can contain selectable text, raster images, vector drawing commands, optional content, comments, form fields, signatures, embedded files, document properties, and accessibility text. Some pages are image-only scans with no selectable text. Others look like scans but also carry an invisible OCR layer. Search results are useful clues, but they are not enough to prove that every sensitive item has been found or removed.

Figure: PDF redaction release flow, from finding targets to removal, hidden-data cleanup, and final verification.

A redaction plan is the checklist before the irreversible step. It should name the PDF, the page range, the phrases or patterns to search, the manual areas that search cannot see, the hidden-data cleanup requirement, and the checks expected after the release copy is produced. The plan is not the release copy. The final PDF still needs a redaction process that removes data rather than drawing cover marks over it.

Common PDF redaction risk areas

| Risk area | Why search may miss it | Release check |
| --- | --- | --- |
| Selectable text | Search can find many text objects, but only on the pages scanned and only for the targets supplied. | Search the redacted output again for every target. |
| Scans and images | Text inside pixels may not be selectable, even when it is visible on the page. | Use manual areas, OCR-aware review, or page-level removal. |
| Annotations and forms | Comments, highlights, fields, and stamps can hold text outside the main page content. | Flatten, remove, or sanitize them according to the release policy. |
| Metadata and attachments | Document properties, embedded files, prior saves, or hidden structures may not appear on the page. | Run hidden-data cleanup and inspect the final file properties. |

The key habit is to separate planning from publication. Planning helps reviewers find and record risk. Publication depends on true removal, cleanup, and final verification in the output PDF.

How to Use This Tool:

Use the planner to build local preflight evidence before sending a PDF to a production redaction workflow.

  1. Choose one file with Source PDF. The browser checks one PDF at a time and rejects files above the 80 MB preflight guard.
  2. Set Target filename for the future release copy. This name appears in evidence and handoff output; it does not create a redacted PDF.
  3. Enter Pages to review. Use all, *, odd, even, single pages, closed ranges such as 1-3, or open ranges such as 8-.
  4. Add Search targets. Paste one phrase per line, use slash-style regular expressions, or load one TXT, CSV, TSV, MD, or LOG target file under 1 MB.
  5. Add Manual areas for signatures, scanned text, images, stamps, tables, or full pages that selectable-text search cannot prove. Use page: x,y,width,height | label or page: full-page | label.
  6. Leave Hidden data cleanup enabled for any production handoff. Turning it off moves that gate to Blocked because metadata, comments, attachments, layers, forms, and hidden text still need cleanup.
  7. Open Advanced only when needed. Keep Overlay label short, and raise Text preflight guard only for a small, text-heavy PDF on a capable browser.
  8. Run Analyze PDF or Refresh preflight. Fix invalid ranges, duplicate targets, malformed regular expressions, bad manual-area syntax, oversize files, or page selections above the guard before using the evidence.
  9. Review Redaction Gate, Target Ledger, PDF Evidence, and Handoff Plan. Export evidence only after the source, pages, targets, and cleanup gates match the intended handoff.
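The page expressions accepted in step 3 can be sketched as a small parser. This is an illustrative reading of the rules on this page, not the planner's own code: the function name `parse_page_range` is hypothetical, and the choice to reject a single page or a range start beyond the document while clipping only the end of a range is an assumption.

```python
def parse_page_range(expr: str, page_count: int) -> tuple[list[int], list[str]]:
    """Resolve a page expression to sorted one-based pages plus warnings.

    Raises ValueError for malformed tokens, pages outside the document,
    or an empty selection. (Hypothetical helper, not the planner's API.)
    """
    pages: set[int] = set()
    warnings: list[str] = []
    all_pages = range(1, page_count + 1)
    for token in (t.strip().lower() for t in expr.split(",")):
        if token in ("all", "*"):
            pages.update(all_pages)
        elif token == "odd":
            pages.update(p for p in all_pages if p % 2 == 1)
        elif token == "even":
            pages.update(p for p in all_pages if p % 2 == 0)
        elif token.isdecimal():
            page = int(token)
            if not 1 <= page <= page_count:
                raise ValueError(f"page {page} is outside 1-{page_count}")
            pages.add(page)
        else:
            # Closed range "2-5" or open range "8-" (runs to the last page).
            start_text, sep, end_text = token.partition("-")
            if not sep or not start_text.isdecimal() or (end_text and not end_text.isdecimal()):
                raise ValueError(f"malformed page token: {token!r}")
            start = int(start_text)
            end = int(end_text) if end_text else page_count
            if start < 1 or start > page_count or end < start:
                raise ValueError(f"range {token!r} does not fit 1-{page_count}")
            if end > page_count:
                # Assumption: only the end of a range is clipped, with a warning.
                warnings.append(f"range {token!r} clipped to page {page_count}")
                end = page_count
            pages.update(range(start, end + 1))
    if not pages:
        raise ValueError("page selection is empty")
    return sorted(pages), warnings


pages, warnings = parse_page_range("1-3, 5, 8-", 10)
```

Returning warnings separately from errors keeps a clipped range usable while still leaving a visible trace in the evidence.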

Interpreting Results:

Redaction Gate is the main readiness view. Ready and Planned mean the planning evidence is usable. Blocked on the irreversible output step is expected because the planner does not remove content streams, rasterize pages, clean metadata, or create a final PDF.

  • Detected in Target Ledger means selectable text on the selected pages matched a phrase or regular expression. It does not clear images, forms, annotations, OCR layers, or hidden objects.
  • Not detected means the local text scan found no match for that target on the selected pages. It is not proof that the information is absent from pixels, form values, or hidden structures.
  • Selectable text with zero characters usually means image-only pages or pages the text scan cannot read. Treat that as a manual review signal.
  • Hidden data markers widen the cleanup plan when metadata, annotations, attachments, layers, forms, signatures, active content, or hidden text hints appear.
  • Target filename is a handoff name, not evidence that a release file exists.
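The Detected and Not detected statuses above can be illustrated with a minimal ledger builder. It assumes the selectable text has already been extracted into a mapping of page number to string; the names `build_ledger` and `pages_without_text` and the row fields are hypothetical, not the planner's own schema.

```python
import re


def build_ledger(page_texts: dict[int, str], patterns: dict[str, re.Pattern]) -> list[dict]:
    """Count selectable-text matches per target across the selected pages."""
    rows = []
    for label, pattern in patterns.items():
        hits = {page: len(pattern.findall(text)) for page, text in page_texts.items()}
        pages = sorted(page for page, count in hits.items() if count)
        rows.append({
            "target": label,
            # "Not detected" only describes selectable text on these pages.
            "status": "Detected" if pages else "Not detected",
            "matches": sum(hits.values()),
            "pages": pages,
        })
    return rows


def pages_without_text(page_texts: dict[int, str]) -> list[int]:
    """Pages with zero selectable characters: a manual review signal."""
    return sorted(page for page, text in page_texts.items() if not text.strip())


# Illustrative data only; page 2 stands in for an image-only scan.
page_texts = {1: "Account number 12345. Contact: jo@example.com", 2: ""}
patterns = {
    "account number": re.compile(re.escape("account number"), re.IGNORECASE),
    "ssn-like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
ledger = build_ledger(page_texts, patterns)
needs_manual_review = pages_without_text(page_texts)
```

In this sketch a Not detected row and an empty page are reported side by side, which makes it harder to read a clean ledger as proof of absence.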

Do not treat a clean ledger as safe disclosure. After production redaction, reopen the output PDF, search every target again, inspect hidden-data findings, test copy and paste near redacted areas, and confirm the page count before release.

Technical Details:

PDF redaction is governed by object removal, not by page appearance alone. Text, images, annotations, form fields, optional layers, and document metadata can all survive a visible overlay. The release copy needs the sensitive object removed or replaced, and any hidden data that could expose it needs to be cleaned before the file is shared.

Preflight work is narrower. It can validate that a selected source looks like a PDF, resolve a page range, parse search targets and manual areas, inspect common structure markers, and scan selectable text within a browser-side page guard. Those checks are evidence for a later redaction workflow. They do not perform the irreversible removal step.
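As a rough sketch of the first of those checks, the source test reduces to a size guard and a header test. The function name, the status labels, and the reading of 80 MB as 80 × 1024 × 1024 bytes are assumptions made for illustration.

```python
MAX_SOURCE_BYTES = 80 * 1024 * 1024  # assumption: MB read as 1024 * 1024 bytes


def check_pdf_source(data: bytes, max_bytes: int = MAX_SOURCE_BYTES) -> tuple[str, str]:
    """Return a (status, evidence) pair for the PDF source gate."""
    if len(data) > max_bytes:
        return "Review", f"source is {len(data)} bytes, above the preflight guard"
    if not data.startswith(b"%PDF-"):
        return "Review", "source does not start with a PDF header"
    return "Ready", f"PDF header found, {len(data)} bytes"


status, evidence = check_pdf_source(b"%PDF-1.7\n% illustrative bytes only\n")
```

A header test only shows that the file looks like a PDF. It says nothing about whether the rest of the file parses.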

Rule Core:

PDF redaction preflight gate rules

| Gate | Ready or planned condition | Review or stop condition | Reason |
| --- | --- | --- | --- |
| PDF source | One selected source starts with a PDF header and stays below 80 MB. | No source, multiple dropped files, non-PDF header, or oversize source. | The evidence must be tied to one clean original copy. |
| Page selection | The page expression resolves to at least one detected page. | Malformed range, empty selection, or a page outside the document. | Only selected pages are scanned for selectable text. |
| Redaction targets | At least one valid phrase, regular expression, manual rectangle, or full-page item exists. | No target, duplicate target, invalid expression, or malformed manual area. | The target list becomes the removal and verification checklist. |
| Hidden data cleanup | The cleanup switch remains enabled. | Cleanup is off. | Non-visible data still needs sanitizer coverage after visible redaction. |
| Selectable text preflight | The selected pages scan within the guard and return match or no-match evidence. | Reader error, range error, zero selectable characters, or too many selected pages. | Search evidence is helpful, but image-only and hidden content need separate review. |
| Irreversible output engine | Not available in this planner. | Always blocked. | A separate redaction process must remove data rather than draw cover rectangles. |
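The gate rules can be condensed into a single evaluation function. The sketch below is one possible encoding under stated assumptions: the `PreflightInputs` fields and the Ready and Review labels are hypothetical, while Planned and Blocked for hidden data cleanup and the always-blocked output engine follow the rules described on this page.

```python
from dataclasses import dataclass


@dataclass
class PreflightInputs:
    """Hypothetical summary of what the earlier checks found."""
    source_ok: bool
    selected_pages: int
    valid_targets: int
    target_errors: int
    cleanup_enabled: bool
    text_scan_ok: bool
    selectable_chars: int


def evaluate_gates(inp: PreflightInputs) -> dict[str, str]:
    """Map preflight findings to one status per gate."""
    targets_ready = inp.valid_targets > 0 and inp.target_errors == 0
    text_ready = inp.text_scan_ok and inp.selectable_chars > 0
    return {
        "PDF source": "Ready" if inp.source_ok else "Review",
        "Page selection": "Ready" if inp.selected_pages > 0 else "Review",
        "Redaction targets": "Ready" if targets_ready else "Review",
        "Hidden data cleanup": "Planned" if inp.cleanup_enabled else "Blocked",
        "Selectable text preflight": "Ready" if text_ready else "Review",
        # The planner never removes content, so this gate cannot clear.
        "Irreversible output engine": "Blocked",
    }


gates = evaluate_gates(PreflightInputs(True, 8, 2, 0, True, True, 1200))
```

Hard-coding the last gate means no combination of inputs can make the plan look like a finished release.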

Target and Range Rules:

PDF redaction target and range parsing rules

| Input | Accepted shape | Check performed | Correction path |
| --- | --- | --- | --- |
| Page range | `all`, `*`, `odd`, `even`, page numbers, comma lists, and ranges such as `2-5` or `8-`. | Pages use one-based numbering. Closed ranges include both ends, and ranges past the final page are clipped with a warning. | Fix malformed tokens or page numbers outside the detected document. |
| Phrase target | One phrase per line, such as `account number`. | The phrase is matched case-insensitively against selectable text. | Add manual areas when the same information appears only as pixels or scans. |
| Regex target | Slash-style expression such as `/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i`. | The expression must compile before it can be counted as a ready target. | Fix invalid syntax before relying on the target ledger. |
| Manual rectangle | `2: 72,120,240,36 \| signature block`. | The page must exist, and width and height must be positive PDF point values. | Confirm coordinates against the page box before production redaction. |
| Full page | `3: full-page \| appendix`. | The page must exist and is recorded as a full-page redaction or removal item. | Verify final page count and document structure after production. |
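The target and manual-area shapes above can be sketched as two small parsers. Everything here is illustrative: `ManualArea`, `parse_manual_area`, and `parse_targets` are hypothetical names, only the `i` flag of a slash-style expression is honored, and Python's `re` syntax is assumed to be close enough to the browser's regular expression syntax for simple patterns.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManualArea:
    page: int
    rect: Optional[tuple[float, float, float, float]]  # None means full page
    label: str


def parse_manual_area(line: str, page_count: int) -> ManualArea:
    """Parse 'page: x,y,width,height | label' or 'page: full-page | label'."""
    spec, _, label = line.partition("|")
    page_text, sep, shape = spec.partition(":")
    if not sep or not page_text.strip().isdecimal():
        raise ValueError(f"expected 'page: ...' in {line!r}")
    page = int(page_text)
    if not 1 <= page <= page_count:
        raise ValueError(f"page {page} is outside 1-{page_count}")
    if shape.strip().lower() == "full-page":
        return ManualArea(page, None, label.strip())
    try:
        x, y, width, height = (float(v) for v in shape.split(","))
    except ValueError:
        raise ValueError(f"expected x,y,width,height in {line!r}") from None
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return ManualArea(page, (x, y, width, height), label.strip())


def parse_targets(text: str) -> tuple[list[tuple[str, re.Pattern]], list[str]]:
    """Split a target list into compiled ready targets and problem messages."""
    ready, problems, seen = [], [], set()
    for line in (raw.strip() for raw in text.splitlines()):
        if not line:
            continue
        slash = re.fullmatch(r"/(.+)/([a-z]*)", line)
        key = line if slash else line.lower()  # phrases are case-insensitive
        if key in seen:
            problems.append(f"duplicate target: {line}")
            continue
        seen.add(key)
        try:
            if slash:
                flags = re.IGNORECASE if "i" in slash.group(2) else 0
                pattern = re.compile(slash.group(1), flags)
            else:
                pattern = re.compile(re.escape(line), re.IGNORECASE)
        except re.error as exc:
            problems.append(f"invalid expression {line}: {exc}")
            continue
        ready.append((line, pattern))
    return ready, problems


area = parse_manual_area("2: 72,120,240,36 | signature block", page_count=14)
ready, problems = parse_targets("account number\nAccount Number\n/[a-z/\n/\\d{4}/")
```

Collecting problems instead of stopping at the first one lets a reviewer fix every bad line in a single pass.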

Evidence Marker Map:

PDF evidence markers and redaction implications

| Evidence group | Signals shown | Redaction implication |
| --- | --- | --- |
| Structure | PDF header, EOF marker, page hints, object count, stream count, and cross-reference count. | These are document-shape clues, not a complete parser audit. |
| Security | Encryption marker. | Use an owner-approved password path and a production parser before release work. |
| Hidden data | Metadata, annotations, attachments, optional layers, and hidden text hints. | Sanitization must cover more than visible redaction boxes. |
| Active or formal content | JavaScript, forms, signatures, and timestamp markers. | Policy may require flattening, removal, signature handling, or separate approval. |
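A marker scan of this kind can be approximated by counting well-known PDF name tokens in the raw bytes. The sketch below is a heuristic under loud assumptions: the marker list is an illustrative selection of standard PDF names, not the planner's actual list, and tokens inside compressed or object streams are invisible to a byte-level search.

```python
import re

# Assumed marker set: standard PDF names chosen for illustration.
MARKERS = {
    "eof marker": rb"%%EOF",
    "page hints": rb"/Type\s*/Page\b",
    "objects": rb"\b\d+\s+\d+\s+obj\b",
    "streams": rb"\bendstream\b",
    "cross-references": rb"startxref",
    "encryption": rb"/Encrypt\b",
    "metadata": rb"/Metadata\b",
    "annotations": rb"/Annots\b",
    "attachments": rb"/EmbeddedFiles?\b",
    "optional layers": rb"/OCProperties\b",
    "hidden text hints": rb"\b3\s+Tr\b",  # text render mode 3 is invisible text
    "javascript": rb"/(?:JavaScript|JS)\b",
    "forms": rb"/AcroForm\b",
    "signatures": rb"/Sig\b",
    "timestamps": rb"/DocTimeStamp\b",
}


def scan_markers(data: bytes) -> dict[str, int]:
    """Count raw-byte occurrences of each marker; zero is not proof of absence."""
    return {name: len(re.findall(pattern, data)) for name, pattern in MARKERS.items()}


# Tiny hand-written fragment, not a valid PDF.
sample = (
    b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R /AcroForm 5 0 R "
    b"/Metadata 6 0 R >>\nendobj\n3 0 obj\n<< /Type /Page /Annots [4 0 R] >>\n"
    b"endobj\nstartxref\n0\n%%EOF\n"
)
signals = scan_markers(sample)
```

Counts like these match the table's caution: they widen the cleanup plan when present, and they cannot stand in for a production parser.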

Privacy Notes:

The selected PDF, pasted targets, loaded target list, manual areas, and generated evidence are handled in the browser for this preflight workflow. No redacted PDF is created, and evidence files are produced only when you choose a copy, CSV, DOCX, or JSON action.

  • The browser may need network access to load PDF reading code before selectable-text scanning can run.
  • Target strings and match evidence can appear on screen and in downloaded evidence, so treat exports as sensitive material.
  • A preflight report is not a retention control. Store exported evidence with the same care as the source PDF.

Worked Examples:

These examples show how planning evidence differs from true redaction output.

Contract with account and email targets

A reviewer selects a 12-page contract, sets Pages to review to 1-8, and enters account number plus an email regular expression in Search targets. Target Ledger may show Detected rows with match counts and page counts, while Handoff Plan still keeps Apply true redaction blocked until separate software removes the content.

Signature on an image-only page

A scanned approval page has no selectable text, but the signature block needs removal. The reviewer enters 2: 72,120,240,36 | signature block in Manual areas. If PDF Evidence reports zero selectable characters, the correct follow-up is manual or OCR-aware review rather than trusting text search.

Range and cleanup failure

A reviewer enters 99 for a 14-page PDF and turns off Hidden data cleanup. Redaction Gate moves Page selection to review and Hidden data cleanup to Blocked. Correct the range to 9-14, turn cleanup back on, and rerun Refresh preflight before exporting evidence.

Advanced Tips:

  • Use Manual areas for photographs, signatures, stamps, chart labels, and scanned text because selectable-text search cannot prove those pixels are covered.
  • Keep Pages to review narrow when a document is large. A smaller page range produces clearer evidence and keeps the scan within the text preflight guard.
  • Leave Hidden data cleanup on even when the visible targets look simple. Metadata, annotations, forms, attachments, and optional layers can still expose sensitive information.
  • Use Redaction appearance and Overlay label as reviewer instructions only. They are not proof that underlying content will be removed.
  • Export Redaction Gate and Target Ledger after fixing review rows so the production redactor receives a clean checklist.
  • After production redaction, compare the final page count and rerun target searches against the output PDF, not just the original plan.
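The last tip can be sketched as a post-production check that runs against the output PDF, not the plan. It assumes text has already been extracted from the output, one string per page, by whatever production tool is in use; `verify_release` and its parameters are hypothetical names.

```python
import re


def verify_release(
    expected_pages: int,
    output_page_texts: list[str],
    patterns: dict[str, re.Pattern],
) -> list[str]:
    """Return failure messages; an empty list means these checks passed."""
    failures = []
    if len(output_page_texts) != expected_pages:
        failures.append(
            f"page count is {len(output_page_texts)}, expected {expected_pages}"
        )
    for label, pattern in patterns.items():
        for number, text in enumerate(output_page_texts, start=1):
            if pattern.search(text):
                failures.append(f"target {label!r} still found on page {number}")
    return failures


# Illustrative output text; page 2 still leaks the phrase.
checks = {"account number": re.compile(re.escape("account number"), re.IGNORECASE)}
failures = verify_release(2, ["Summary page", "Account Number 12345"], checks)
```

An empty result covers selectable text only. Hidden-data inspection and copy-and-paste tests near redacted areas remain separate checks.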

FAQ:

Does this create a redacted PDF?

No. It creates local preflight evidence and a handoff plan. The irreversible output step remains blocked, and there is no redacted PDF download.

Why are black boxes not enough?

A drawn box can hide content visually while the original text, image data, metadata, comments, form values, attachments, or hidden layers remain in the PDF. True redaction removes or replaces the underlying data and then sanitizes hidden information.

What does Not detected mean?

It means the selected pages did not produce a selectable-text match for that target. It does not clear scanned images, OCR text, form fields, annotations, attachments, or manual areas.

What should I do when the text preflight guard blocks a range?

Narrow Pages to review first. Raise Text preflight guard only for a small, text-heavy PDF on a capable browser. Large or complex files belong in a production parser.

Can I use a target list file?

Yes. Load one TXT, CSV, TSV, MD, or LOG file under 1 MB. The file text becomes the Search targets list, with one phrase or slash-style regular expression per line.

Glossary:

Selectable text
Text that can be extracted from PDF page content and searched for target matches.
Manual area
A page rectangle or full-page instruction used for sensitive material that text search may not see.
Hidden data
Non-visible PDF information such as metadata, comments, attachments, optional layers, forms, or hidden text hints.
OCR layer
A searchable text layer added behind a scanned page image.
Sanitization
The cleanup step that removes hidden or leftover document data after visible redaction work.
