Introduction

PDF redaction is the permanent removal of sensitive material from a document before it is shared, filed, or handed to another team. A visible black box is only a mark on the page unless the underlying text, image pixels, form values, and other stored objects are removed or flattened in a way that prevents later recovery.

Planning redaction is difficult because a PDF is not just what appears on screen. It can contain selectable text, scanned images, comments, form fields, file attachments, metadata, alternate layers, digital signatures, JavaScript, and text used for accessibility or optical character recognition. Sensitive information can appear in any of those places, so a useful review separates visible targets from hidden-data cleanup and verification.

Figure: PDF redaction preflight path from source document to evidence, handoff plan, and blocked output engine.

Good redaction work therefore has two stages. The first stage identifies what must be removed and proves that the document can be searched or reviewed. The second stage uses a trusted production process to remove the material, sanitize hidden content, and verify that the released copy no longer exposes the targets.

Technical Details:

A PDF stores visible page marks and supporting document objects separately. Page content can be vector text, images, or form fields; extra objects can hold annotations, metadata, embedded files, optional content groups, structure tags, hidden text hints, JavaScript, and signatures. A cover rectangle drawn over text may hide it from view while leaving the original text selectable, searchable, extractable, or recoverable from the document structure.

Redaction planning starts by proving basic PDF structure, selecting the pages under review, and listing every removal target. Searchable targets can be plain phrases or regular expressions. Non-searchable targets, such as signatures, scanned tables, photographs, or whole appendices, need manual areas because text extraction may never see the sensitive content.

Selectable-text preflight is useful evidence, not proof of safe redaction. It can find target matches in parsed text on selected pages, but it does not read every image pixel, recover every optical-character-recognition layer, rewrite content streams, sanitize metadata, or validate a final released copy. The output engine remains blocked until a production process can remove content rather than only document a plan.

Rule Core

The readiness gate separates planning signals from irreversible output. Rows can be ready or planned while the final output gate stays blocked.

PDF redaction preflight gate rules
| Gate | Ready condition | Stop condition | Review meaning |
| --- | --- | --- | --- |
| PDF source | One selected file passes the PDF header check and is below the 80 MB browser guard. | No file, non-PDF header, or oversize source. | Use a clean original copy and process any release output separately. |
| Page selection | The range resolves to at least one page in the detected page count. | Malformed range, page outside the document, or an empty selection. | The selected pages define where text matching and handoff planning apply. |
| Redaction targets | At least one valid phrase, regular expression, manual area, or full-page item exists. | Duplicate or invalid regex target, malformed manual area, or no target. | The target ledger becomes the checklist for production removal and later verification. |
| Hidden-data cleanup | The cleanup switch is enabled. | Cleanup is off. | Metadata, comments, attachments, layers, forms, hidden text, and active content still need sanitizer coverage. |
| Selectable-text preflight | Selected pages scan within the page guard and return match evidence or no-match evidence. | Text engine error, page range error, or selected pages above the guard. | Text matches help plan redaction; no matches do not clear image-only or hidden content risk. |
| Output engine | Never ready in the current disabled state. | No content-stream removal, rasterization, OCR reset, sanitizer, or verification worker is present. | Use the result as a handoff plan, not as a redacted PDF. |
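
The gate rules above can be sketched as a small status function. This is a minimal illustration, not the tool's implementation: the `PreflightState` fields, the function names, and the choice of which failing gates read Review versus Blocked are assumptions made for the example. The only fixed points taken from the rules are that cleanup is Planned when enabled and that the output engine is always Blocked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreflightState:
    """Hypothetical summary of the preflight inputs (names are illustrative)."""
    source_ok: bool        # header check passed and size is within the guard
    selected_pages: int    # number of pages the range resolved to
    valid_targets: int     # usable phrase, regex, manual-area, or full-page items
    flagged_targets: int   # duplicate, invalid, or malformed items
    cleanup_enabled: bool  # hidden-data cleanup switch
    text_scan_ok: bool     # selectable-text scan finished within the page guard


def evaluate_gates(state: PreflightState) -> dict[str, str]:
    """Return one status label per gate."""
    targets_ok = state.valid_targets > 0 and state.flagged_targets == 0
    return {
        "PDF source": "Ready" if state.source_ok else "Blocked",
        "Page selection": "Ready" if state.selected_pages > 0 else "Review",
        "Redaction targets": "Ready" if targets_ok else "Review",
        "Hidden-data cleanup": "Planned" if state.cleanup_enabled else "Blocked",
        "Selectable-text preflight": "Ready" if state.text_scan_ok else "Review",
        # No removal, sanitizer, or verification worker exists, so never ready.
        "Output engine": "Blocked",
    }


# Gates that must pass before the plan can be handed off.
HANDOFF_GATES = ("PDF source", "Page selection", "Redaction targets", "Hidden-data cleanup")


def handoff_ready(gates: dict[str, str]) -> bool:
    """The plan can be handed off even though the output engine is blocked."""
    return all(gates[name] in ("Ready", "Planned") for name in HANDOFF_GATES)
```

Keeping the output engine out of `HANDOFF_GATES` mirrors the rule that a plan can be ready for handoff while irreversible output stays blocked.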

Page ranges use one-based PDF page numbers. A blank value, all, or * selects every page. The words odd and even select parity groups. A closed range such as 2-5 includes both endpoints, while an open range such as 8- continues through the final page. Ranges that start beyond the document or point to a missing page are treated as review errors.
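
The range rules in the previous paragraph can be sketched as a short parser. The function name `parse_page_range` is hypothetical, and combining several tokens with commas (for example `1-3, 5, 8-`) is an assumption; the text only lists those forms as individual examples.

```python
def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Resolve a range string to sorted one-based page numbers.

    Raises ValueError for malformed tokens, pages outside the document,
    or a selection that resolves to no pages.
    """
    all_pages = range(1, page_count + 1)
    selected: set[int] = set()

    def page(text: str) -> int:
        number = int(text)
        if not 1 <= number <= page_count:
            raise ValueError(f"page {number} is outside 1-{page_count}")
        return number

    text = spec.strip().lower()
    # A blank value selects every page, the same as "all" or "*".
    tokens = ["all"] if not text else [part.strip() for part in text.split(",")]
    for token in tokens:
        if token in ("all", "*"):
            selected.update(all_pages)
        elif token == "odd":
            selected.update(p for p in all_pages if p % 2 == 1)
        elif token == "even":
            selected.update(p for p in all_pages if p % 2 == 0)
        elif token.isdecimal():
            selected.add(page(token))
        elif "-" in token:
            first, _, last = (part.strip() for part in token.partition("-"))
            if not first.isdecimal() or (last and not last.isdecimal()):
                raise ValueError(f"malformed range: {token!r}")
            start = page(first)
            end = page(last) if last else page_count  # "8-" runs to the last page
            if end < start:
                raise ValueError(f"range runs backwards: {token!r}")
            selected.update(range(start, end + 1))
        else:
            raise ValueError(f"malformed range: {token!r}")
    if not selected:
        raise ValueError("selection is empty")
    return sorted(selected)
```

Under these assumptions, `parse_page_range("8-", 10)` resolves to pages 8 through 10, and `parse_page_range("99", 14)` raises `ValueError` because the page is outside the document.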

Target and area input formats
| Input type | Accepted shape | How it is used | Common caution |
| --- | --- | --- | --- |
| Phrase target | One phrase per line, such as account number. | The phrase is escaped and matched without case sensitivity across selectable text. | A phrase cannot find scanned pixels or misspelled variants. |
| Regex target | JavaScript-style syntax such as /[A-Z0-9._%+-]+@[A-Z0-9.-]+/i. | The expression is compiled globally, with duplicate flags removed and g added when missing. | An invalid pattern is kept in the ledger with review status until corrected. |
| Manual area | 2: 72,120,240,36 \| signature block. | The area is recorded in PDF points from the lower-left page corner. | Coordinates must have positive width and height and must target an existing page. |
| Full page | 3: full-page \| appendix. | The entire page is marked for removal or raster redaction in the production process. | Full-page handling should preserve intended page count and accessibility requirements. |
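
The phrase and regex rows can be illustrated with a rough Python analogue. This is a sketch under stated assumptions, not the tool's code: the tool compiles JavaScript expressions, whose syntax differs from Python's `re` in places, and the function name `build_target`, the returned dictionary shape, and the flag mapping are all hypothetical. Python has no `g` flag, so it is tracked only in the normalized flag string.

```python
import re

# Assumption: only these JavaScript flags are mapped; any other flag sends the row to review.
_FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def build_target(line: str) -> dict[str, object]:
    """Turn one input line into a ledger-style row with a compiled pattern."""
    text = line.strip()
    slash_form = re.fullmatch(r"/(.+)/([a-z]*)", text, flags=re.DOTALL)
    if slash_form is None:
        # Phrase target: escape it so punctuation is literal, match case-insensitively.
        pattern = re.compile(re.escape(text), re.IGNORECASE)
        return {"type": "phrase", "source": text, "flags": "gi",
                "pattern": pattern, "status": "ready"}

    source, raw_flags = slash_form.groups()
    flags = "".join(dict.fromkeys(raw_flags))  # drop duplicate flags, keep order
    if "g" not in flags:
        flags += "g"                           # always search globally
    row = {"type": "regex", "source": source, "flags": flags,
           "pattern": None, "status": "review"}
    try:
        compiled_flags = 0
        for flag in flags.replace("g", ""):
            compiled_flags |= _FLAG_MAP[flag]  # KeyError for unmapped flags
        row["pattern"] = re.compile(source, compiled_flags)
        row["status"] = "ready"
    except (KeyError, re.error):
        pass  # invalid rows stay in the ledger with review status
    return row
```

Returning a row instead of raising keeps an invalid pattern visible for correction, which matches the caution in the regex row.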

The evidence scan samples the PDF bytes and looks for structural markers that often matter during redaction review. Marker detection is intentionally conservative: finding a marker means the item deserves attention, while not finding a marker does not prove the item is absent from every object in a complex PDF.

PDF evidence markers and redaction implications
| Marker group | Evidence shown | Why it matters |
| --- | --- | --- |
| Structure | Header, EOF marker, page hints, object count, stream count, and xref count. | Basic structure confirms the source looks like a PDF and gives rough complexity clues. |
| Security | Encryption marker. | Encrypted PDFs need an approved password flow and stronger production parsing. |
| Hidden data | Metadata, annotations, attachments, optional content, and hidden text hints. | These objects can carry sensitive material outside the visible page marks. |
| Active or formal content | JavaScript, AcroForm or XFA forms, signatures, and timestamp markers. | Release policy may require flattening, removal, signature handling, or a separate approval path. |
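
A byte-level marker scan of this kind might look like the sketch below. The tokens are standard PDF name keys, but which tokens the tool actually checks, how it counts objects, and whether 80 MB means 80 × 1024 × 1024 bytes are assumptions. Because compressed object streams can hide these names, a missing marker stays inconclusive.

```python
# Assumption: 80 MB is interpreted as 80 * 1024 * 1024 bytes.
MAX_SOURCE_BYTES = 80 * 1024 * 1024

# Hypothetical marker table: label -> PDF name token searched for in the raw bytes.
MARKERS = {
    "encryption": b"/Encrypt",
    "metadata": b"/Metadata",
    "annotations": b"/Annots",
    "attachments": b"/EmbeddedFile",
    "optional_content": b"/OCProperties",
    "javascript": b"/JavaScript",
    "acroform": b"/AcroForm",
    "xfa": b"/XFA",
    "signature": b"/Sig",
}


def scan_pdf_evidence(data: bytes) -> dict[str, object]:
    """Collect conservative structural hints from raw PDF bytes."""
    evidence: dict[str, object] = {
        "header": data.startswith(b"%PDF-"),
        "within_size_guard": len(data) <= MAX_SOURCE_BYTES,
        "eof_marker": b"%%EOF" in data[-1024:],   # expected near the end of the file
        "object_count": data.count(b" obj"),      # matches "1 0 obj", not "endobj"
        "stream_count": data.count(b"endstream"),
        "xref_count": data.count(b"startxref"),
    }
    for label, token in MARKERS.items():
        # True means "deserves attention"; False does not prove absence.
        evidence[label] = token in data
    return evidence
```

Substring matching is deliberately loose: `/Sig` also matches longer names such as `/SigFlags`, which errs toward flagging an item for review.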

Everyday Use & Decision Guide:

Start with the exact source PDF that will go into the redaction workflow. The source stays in the browser for preflight, and the summary should change from Choose a PDF to a selected source status. If the selected file is above the 80 MB guard or fails the header check, use a smaller valid PDF or move straight to a production parser.

Use Search targets for names, account fragments, email patterns, identifiers, or phrases that should be searchable. Use one item per line. Put regexes in slash notation when a pattern is safer than a literal phrase, and check the ledger if a regex row moves to Review.

Manual areas are the right place for signatures, stamps, image-only tables, scanned text, logos, or any sensitive region that text search cannot find. Use point coordinates only when a reviewer can map them back to the PDF page box. For an appendix or page that should be removed as a whole, use full-page rather than trying to approximate a large rectangle.
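
As a rough illustration of the manual-area format, the sketch below parses lines such as `2: 72,120,240,36 | signature block` and `3: full-page | appendix`. The names `ManualArea` and `parse_manual_area` are hypothetical, and reading the four numbers as x, y, width, height is an assumption based on the requirement for positive width and height.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ManualArea:
    page: int
    full_page: bool
    # Assumed order: x, y, width, height in PDF points from the lower-left corner.
    rect: Optional[tuple[float, float, float, float]]
    note: str


def parse_manual_area(line: str, page_count: int) -> ManualArea:
    """Parse 'page: x,y,width,height | note' or 'page: full-page | note'."""
    head, _, note = line.partition("|")
    page_text, colon, shape = head.partition(":")
    if not colon or not page_text.strip().isdecimal():
        raise ValueError("expected 'page: x,y,width,height | note'")
    page = int(page_text.strip())
    if not 1 <= page <= page_count:
        raise ValueError(f"page {page} is outside 1-{page_count}")

    shape = shape.strip().lower()
    if shape == "full-page":
        return ManualArea(page, True, None, note.strip())

    parts = shape.split(",")
    if len(parts) != 4:
        raise ValueError("expected four comma-separated coordinates")
    try:
        x, y, width, height = (float(part) for part in parts)
    except ValueError:
        raise ValueError("coordinates must be numbers") from None
    if not all(math.isfinite(v) for v in (x, y, width, height)):
        raise ValueError("coordinates must be finite")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return ManualArea(page, False, (x, y, width, height), note.strip())
```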

  • Leave Hidden data cleanup enabled for any release handoff; turning it off blocks the cleanup gate.
  • Use Pages to review to narrow a large packet before text preflight, but make sure skipped pages do not contain sensitive material.
  • Raise Text preflight guard only for smaller text-heavy PDFs on a capable browser, because the scan runs locally.
  • Use Redaction appearance and Overlay label as handoff preferences only. Visual boxes are not proof of permanent removal.
  • Review Redaction Gate before any export. The row for Irreversible output engine should still be Blocked.

The best first pass is a narrow page range, a few high-confidence search targets, hidden cleanup on, and manual areas for anything visual. After the preflight, use Target Ledger and PDF Evidence to decide what the production redaction process must remove and what the final verification pass must search again.

Step-by-Step Guide:

Use the preflight as a planning path, then send the evidence to a real redaction process.

  1. Choose Source PDF with Browse PDF or the drop area. The summary should show the selected file, size, and page count after analysis.
  2. Set Pages to review. Use all, odd, even, a single page such as 5, a range such as 1-3, or an open range such as 8-. Fix any range error before trusting the scan.
  3. Enter Search targets. Use phrases for exact text and regexes for patterns such as email addresses, tax IDs, or account formats. Duplicate and invalid rows appear in Target Ledger with review wording.
  4. Add Manual areas for image-only or coordinate-based removals, then keep Hidden data cleanup on. Open Advanced only when target filename, mark appearance, overlay label, or page guard settings need to be recorded.
  5. Click Analyze PDF or Refresh preflight. If the text engine reports a page limit, narrow the range or adjust the guard. If no selectable text is found, plan manual review, optical-character-recognition handling, or page rasterization.
  6. Review Redaction Gate, Target Ledger, PDF Evidence, and Handoff Plan. The plan is ready for handoff when source, pages, targets, and cleanup are all valid; the output engine stays blocked either way.
  7. Copy or download CSV, DOCX, or JSON evidence for the production process. After production redaction, search every target again, inspect hidden data, compare page count, and test copy/paste extraction before release.
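
The post-redaction checks in step 7 could be partly automated along these lines. This is only a sketch: it assumes the released copy's page text has already been extracted by a separate, trusted tool and passed in as a list of strings, and the function name `verify_release` is hypothetical. An empty result covers selectable text and page count only; images, metadata, and other hidden data still need their own inspection.

```python
def verify_release(
    phrases: list[str],
    released_pages: list[str],
    expected_page_count: int,
) -> list[str]:
    """Return human-readable problems found in the released copy's text."""
    problems: list[str] = []
    if len(released_pages) != expected_page_count:
        problems.append(
            f"page count is {len(released_pages)}, expected {expected_page_count}"
        )
    for phrase in phrases:
        needle = phrase.casefold()  # literal, case-insensitive comparison
        for number, text in enumerate(released_pages, start=1):
            if needle in text.casefold():
                problems.append(f"'{phrase}' still extractable on page {number}")
    return problems
```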

A complete preflight ends with clear target rows, cleanup planned, evidence exported, and no claim that a redacted PDF was created.

Interpreting Results:

Redaction Gate is the main status view. Ready and Planned mean the inputs can support a handoff. Blocked on the output engine is expected and important: it means the current state has not removed or rewritten sensitive PDF content.

Target Ledger is the review checklist. A Detected row means selectable text matched on the selected pages. Not detected means the selectable-text scan did not find the target; it does not mean the sensitive item is absent from images, forms, annotations, OCR text, or other hidden objects.
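
The Detected and Not detected statuses can be sketched as a lookup over already-extracted page text. The function name `build_ledger`, the row shape, and the inputs (a mapping of target label to compiled pattern, and a mapping of page number to extracted text) are assumptions made for illustration.

```python
import re


def build_ledger(
    targets: dict[str, re.Pattern[str]],
    page_texts: dict[int, str],
) -> list[dict[str, object]]:
    """Report, per target, which selected pages contain a selectable-text match."""
    rows: list[dict[str, object]] = []
    for label, pattern in targets.items():
        pages = sorted(
            page for page, text in page_texts.items() if pattern.search(text)
        )
        rows.append({
            "target": label,
            # "Not detected" only describes selectable text on the selected pages.
            "status": "Detected" if pages else "Not detected",
            "pages": pages,
        })
    return rows
```

A page with an empty string, as a scanned page would typically produce, can never yield Detected here, which is why a Not detected row does not clear image content.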

  • Hidden data markers should widen the production cleanup plan, especially when metadata, attachments, layers, annotations, or hidden text hints appear.
  • Selectable text with zero characters usually points to scanned pages or images; use manual areas or OCR-aware production review.
  • Encryption marker means the source should go through an approved password and parsing workflow before redaction.
  • Target filename is only a planning value; it does not mean a redacted output exists.

The safest interpretation is conservative. Treat the tables as evidence for what to remove and verify, then rely on a separate production engine and post-redaction search to decide whether the release copy is safe.

Worked Examples:

Contract with account and email targets

A 12-page contract needs account numbers and contact emails removed from pages 1 through 8. Enter 1-8 in Pages to review, add account number and an email regex in Search targets, and keep Hidden data cleanup on. After analysis, Target Ledger may show detected matches across several pages, while Handoff Plan still marks true redaction as blocked until a production engine removes the content.

Signature block on a scanned page

A scanned approval page has no selectable text, but the signature block needs removal. Add a manual entry such as 2: 72,120,240,36 | signature block. If Selectable text shows zero characters, the result supports manual or raster redaction planning rather than text search. Target Ledger should show the manual area as ready when the page number and coordinates are valid.

Appendix scheduled for whole-page removal

A 30-page disclosure packet has a confidential appendix on page 29. Add 29: full-page | confidential appendix and keep the page range wide enough to include that page. Redaction Gate should report one manual target, and Handoff Plan should tell the production process to remove or raster-redact the whole page while preserving the intended release structure.

Range and cleanup failure

A reviewer enters 99 for a 14-page PDF and turns off Hidden data cleanup. The page gate moves to review because the page is outside the document, and the cleanup gate becomes blocked. Correct the range to a valid value such as 9-14, turn cleanup back on, and rerun Refresh preflight before exporting evidence.

FAQ:

Does this create a redacted PDF?

No. It prepares local preflight evidence and a handoff plan. The output engine remains blocked, and no redacted PDF download is produced.

Why are black boxes not enough?

A box can hide content visually while the original text or image data remains in the PDF. True redaction must remove or rasterize the sensitive content and then sanitize hidden data.

What does Not detected mean in the target ledger?

It means the selectable-text scan did not match that target on the selected pages. It does not clear scanned images, hidden text, form values, annotations, or attachments.

What should I do when the page range fails?

Enter one-based page numbers that fall within the page count of the loaded PDF. Accepted examples include all, odd, even, 1-3, 5, and 8-.

Does the selected PDF get uploaded?

The selected PDF is read in the browser for preflight. Handle the source and exported evidence according to your data-handling rules, because copied rows, downloads, and screenshots can still expose sensitive details.

When is the plan ready for handoff?

The plan is ready when source, page selection, targets, and hidden-data cleanup are valid. It is still only a handoff plan until a production process removes content and verification confirms the result.

Glossary:

True redaction
Permanent removal or rasterization of sensitive content so it cannot be selected, copied, extracted, or recovered from the release copy.
Sanitization
Cleanup of hidden PDF data such as metadata, comments, attachments, layers, forms, scripts, and stale text.
Selectable text
Text that a PDF parser can extract from a page and compare with phrase or regex targets.
Manual area
A planned redaction rectangle or full-page removal that covers content search cannot reliably find.
PDF point
A page-coordinate unit used to describe manual areas from the lower-left corner of the page.
Optional content
PDF layers or layer-like objects that may show or hide page content depending on viewer settings.
EOF marker
The end-of-file marker used as a basic PDF completeness signal during preflight.