PDF Repair Tool
Repair a damaged PDF locally, review parser and structure warnings, and download a normalized copy when pages can be rebuilt.
Introduction
A damaged PDF rarely fails for the same reason twice. The visible pages may look ordinary while the file around those pages has a broken header, stale cross-reference data, trailing transfer text, encrypted content, missing object records, or a final marker that no longer matches the bytes that follow it.
That mismatch matters because PDF readers do not all forgive the same problems. A desktop viewer might recover enough information to display the pages, while an upload portal, print system, e-signature workflow, archive validator, or conversion step rejects the same file. Repair therefore begins with separating a display problem from a structure problem.
- Header: The early marker that identifies the byte stream as a PDF and usually names the PDF version.
- Objects and pages: Numbered records and page tree entries that connect pages to content, resources, annotations, forms, and metadata.
- Cross-reference data: Lookup information that lets a reader jump to objects instead of guessing where records begin.
- Trailer and EOF: Closing information and the end marker that help a reader locate the document catalog and the latest revision.
The important question is whether a parser can still load the page tree. If it can, a clean save may remove harmless bytes after the final %%EOF, write a simpler structure, or create a new page-based copy. If it cannot, marker counts are only clues, and the better answer is usually a fresh export, a new download, or a specialist repair pass.
Signed PDFs need special caution. A signature can depend on exact byte ranges, so a rewritten copy may open correctly while no longer preserving the original signature. Keep the damaged original, test any repaired copy where it will be used, and avoid replacing evidence files without checking signature requirements.
How to Use This Tool:
Use the workflow as a local triage pass. Load one source file, check whether the browser can parse its pages, then create a copy only when the result says repair is possible.
- Drop or browse one file in Damaged PDF. The drop zone accepts one PDF at a time and reports ignored extra or unsupported files.
- Use Load sample when you want a harmless demonstration. The sample is a valid PDF with extra bytes after the final EOF marker, so the warning path is visible.
- Pick Repair attempt. Rewrite parsed PDF saves the parsed document again, Fresh page copy builds a new PDF from loaded pages, and Diagnostics only skips download creation.
- Set Output filename. Unsafe characters are cleaned up automatically, and a missing .pdf extension is added before download.
- Open Advanced only for large files. Browser work limit accepts 10 to 200 MB and defaults to 80 MB because a repair pass may hold both source and output bytes in memory.
- Choose Analyze PDF. If the action state says Work limit exceeded, raise the limit on a capable desktop browser or use a smaller source file.
- Read Diagnostic Ledger first, then Recovery Plan. Download from Rewritten PDF only when the tab says a local file is ready.
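The filename cleanup in the Output filename step can be pictured with a small Python sketch. The tool's exact rules are not documented here, so the allow-list, the underscore replacement, the `repaired.pdf` fallback, and the function name `clean_output_filename` are all assumptions made for illustration.

```python
import re


def clean_output_filename(name: str, default: str = "repaired.pdf") -> str:
    """Hypothetical cleanup; the real tool may use different rules."""
    # Keep only the last path segment so the name cannot carry a directory.
    base = re.split(r"[\\/]", name.strip())[-1]
    # Assumed allow-list: letters, digits, dot, underscore, space, hyphen.
    base = re.sub(r"[^A-Za-z0-9._ -]", "_", base)
    # Trim stray dots and spaces left at either end.
    base = base.strip(" .")
    if not base:
        return default
    # Add the extension only when it is missing (case-insensitive check).
    if not base.lower().endswith(".pdf"):
        base += ".pdf"
    return base


print(clean_output_filename("invoice:final?"))
print(clean_output_filename("scans/march report"))
```

Checking the extension case-insensitively avoids producing names such as `report.PDF.pdf`.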
Interpreting Results:
Parser result carries the repair decision. Parsed means the browser loaded the document and counted pages, so a rewrite or fresh page copy can be attempted unless encryption is present. Blocked means the available structure was not readable enough for this local repair path.
Do not treat marker counts as a pass/fail result. A file can contain a header, several page markers, and an EOF marker while still failing because the page tree, object streams, cross-reference data, or encrypted content cannot be loaded.
- Trust Page signal most when it says Parsed; raw /Page matches are only an estimate.
- Treat Encryption marker as a stop condition. This workflow does not unlock encrypted PDFs or bypass document passwords.
- Use Recovery Plan for ownership. Missing header or EOF points toward a fresh source export, while parser-blocked structure points toward specialist repair.
- Open the downloaded PDF in a separate reader, compare the page count, and inspect important pages before replacing or uploading it.
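The ownership guidance above can be summarized as a small decision function. This is only a sketch: the order of the checks, the wording of each hint, and the name `recovery_hint` are assumptions, not the tool's actual Recovery Plan logic.

```python
def recovery_hint(has_header: bool, has_eof: bool,
                  parsed: bool, encrypted: bool) -> str:
    """Hypothetical mapping from diagnostic signals to a next step."""
    # Encryption is treated as a stop condition before anything else.
    if encrypted:
        return "Open with the valid password or export an unencrypted copy."
    # A missing header or EOF marker points back to the source of the file.
    if not has_header or not has_eof:
        return "Request a fresh source export or a new download."
    # Markers are present but the page tree did not load.
    if not parsed:
        return "Escalate to a specialist repair tool."
    return "Rewrite or copy pages, then verify the copy in a separate reader."


print(recovery_hint(has_header=True, has_eof=False, parsed=False, encrypted=False))
```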
Technical Details:
A PDF normally gives readers two navigation anchors: an identifying header near the beginning and closing data near the end. The final area usually contains a trailer and a startxref pointer, which help locate cross-reference information and the document catalog. Incremental saves can append later revisions, so more than one plausible marker may appear in a healthy file.
Light repair is possible only after the page tree can be loaded. Saving the parsed document again can remove trailing bytes, produce a simpler byte stream, and write cleaner cross-reference information. It cannot infer missing page content, decrypt protected data, or promise that forms, annotations, attachments, metadata, and signatures survive exactly as before.
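To make the startxref pointer concrete, the following Python sketch reads the last pointer in a byte stream and checks whether something that looks like cross-reference data begins at that offset. It is a rough illustration under stated assumptions: the helper names are hypothetical, the plausibility test is deliberately shallow, and a real parser does far more validation.

```python
import re
from typing import Optional


def last_startxref_offset(data: bytes) -> Optional[int]:
    """Return the offset named by the last startxref keyword, if any."""
    # Incremental saves append later revisions, so the last match wins.
    matches = re.findall(rb"startxref\s+(\d+)", data)
    return int(matches[-1]) if matches else None


def pointer_looks_plausible(data: bytes, offset: Optional[int]) -> bool:
    """Shallow check: does the offset land on an xref table or an object?"""
    if offset is None or offset >= len(data):
        return False
    window = data[offset:offset + 32]
    # A classic table starts with "xref"; a cross-reference stream starts
    # with an object header such as "12 0 obj".
    return window.startswith(b"xref") or re.match(rb"\d+\s+\d+\s+obj", window) is not None
```

A pointer that fails this check is one example of the stale cross-reference data described in the introduction.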
Rule Core:
| Signal | What It Tests | Practical Meaning |
|---|---|---|
| PDF header | Looks for a PDF version marker within the first 1024 bytes. | A missing header blocks repair because the byte stream cannot be confidently treated as a PDF. |
| EOF marker | Counts end markers and measures bytes after the final marker. | Trailing bytes can be harmless transfer noise when pages still parse; a missing EOF often points to truncation. |
| Cross-reference clues | Checks for startxref, xref, and trailer clues. | Weak clues raise caution because readers may not find object records consistently. |
| Object balance | Compares object and end-object markers as a rough structure check. | A large mismatch suggests incomplete or malformed records, especially when parsing is blocked. |
| Encryption marker | Looks for the document encryption dictionary marker. | Encrypted PDFs need a valid password or unencrypted export before this repair path is useful. |
| Parser result | Attempts to load the document and count pages. | A parsed, unencrypted file can produce output; a blocked parser leaves only diagnostics and recovery guidance. |
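The marker-based rows of this table can be approximated with a raw byte scan. The Python sketch below is an assumption about how such a scan might look, not the tool's implementation; the function name `scan_markers` and the result keys are hypothetical. The Parser result row is not covered, because it requires an actual PDF parser.

```python
import re


def scan_markers(data: bytes) -> dict:
    """Collect rough structure clues from raw bytes (illustrative only)."""
    # Header: a version marker within the first 1024 bytes.
    header = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    # EOF: count markers and measure what follows the final one.
    last_eof = data.rfind(b"%%EOF")
    # Object balance: "N G obj" openers versus "endobj" closers.
    obj_count = len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data))
    endobj_count = data.count(b"endobj")
    return {
        "header_version": header.group(1).decode() if header else None,
        "eof_count": data.count(b"%%EOF"),
        "trailing_bytes": len(data) - (last_eof + 5) if last_eof >= 0 else None,
        "has_startxref": b"startxref" in data,
        # Exclude the "xref" that is part of the word "startxref".
        "has_xref": re.search(rb"(?<!start)xref", data) is not None,
        "has_trailer": b"trailer" in data,
        "object_balance": obj_count - endobj_count,
        "encrypt_marker": b"/Encrypt" in data,
    }
```

These values are clues only: a file can score well on every key and still fail to parse.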
Formula Core:
Two displayed quantities are simple byte calculations: the selected browser work limit and the bytes after the final EOF marker.
With the default 80 MB limit, the maximum source size is 80 × 1024 × 1024 = 83,886,080 bytes. If a file has 18,000 total bytes and the final %%EOF marker begins at byte 17,940, the trailing byte count is 18,000 - (17,940 + 5), or 55 bytes, where 5 is the length of the %%EOF marker itself.
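Both calculations fit in a few lines of Python. The function names are hypothetical, and the strict "greater than" comparison used for the limit check is an assumption; the arithmetic itself follows the figures quoted above.

```python
EOF_MARKER = b"%%EOF"  # 5 bytes long


def work_limit_bytes(limit_mb: int) -> int:
    """Convert the Browser work limit (10 to 200 MB) to bytes."""
    if not 10 <= limit_mb <= 200:
        raise ValueError("work limit must be between 10 and 200 MB")
    return limit_mb * 1024 * 1024


def trailing_bytes(total_size: int, last_eof_offset: int) -> int:
    """Bytes that follow the final EOF marker."""
    return total_size - (last_eof_offset + len(EOF_MARKER))


def exceeds_work_limit(source_size: int, limit_mb: int) -> bool:
    """Assumed check: a source larger than the limit is not parsed."""
    return source_size > work_limit_bytes(limit_mb)


print(work_limit_bytes(80))           # 83886080
print(trailing_bytes(18_000, 17_940)) # 55
```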
| Mode | Output Behavior | Best Fit |
|---|---|---|
| Rewrite parsed PDF | Saves the loaded document again with a normalized byte stream. | Parser-readable files with trailing transfer noise or stale structure. |
| Fresh page copy | Creates a new PDF from pages the parser can copy. | Cases where pages load but the original file structure should be reduced. |
| Diagnostics only | Produces diagnostics without saving a PDF. | Risk review before changing signed, legal, or business-critical documents. |
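The condition under which a download is expected can be written as a single predicate. This sketch assumes the three inputs are already known from the diagnostics; the `Mode` enum and the function name are hypothetical labels chosen to mirror the table.

```python
from enum import Enum


class Mode(Enum):
    REWRITE = "Rewrite parsed PDF"
    FRESH_COPY = "Fresh page copy"
    DIAGNOSTICS_ONLY = "Diagnostics only"


def download_expected(parsed: bool, encrypted: bool, mode: Mode) -> bool:
    """A file is offered only for parsed, unencrypted input in a saving mode."""
    return parsed and not encrypted and mode is not Mode.DIAGNOSTICS_ONLY


print(download_expected(True, False, Mode.REWRITE))           # True
print(download_expected(True, False, Mode.DIAGNOSTICS_ONLY))  # False
```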
Limitations:
This is a browser-based repair attempt for PDFs that still parse. It is not a forensic recovery suite.
- It does not unlock encrypted files, bypass passwords, or remove document permissions.
- It does not perform deep cross-reference rebuilding, damaged stream extraction, or missing object reconstruction.
- Fresh page copy may drop source-level features that are not part of copied pages.
- Any rewrite can invalidate digital signatures because the saved bytes change.
Privacy Notes:
The selected PDF is read, scanned, parsed, and rewritten in the browser session. The work limit exists because a repair attempt may hold the original bytes and the repaired bytes in memory at the same time.
Worked Examples:
Upload rejects a readable invoice
An invoice opens in a desktop reader but fails a vendor portal. After loading it with Repair attempt set to Rewrite parsed PDF, Diagnostic Ledger shows a found header, a found EOF marker, 55 trailing bytes, and Parser result as Parsed. The Rewritten PDF download is a reasonable copy to test in the portal after comparing the page count and key pages.
Large catalog hits the work limit
A 120 MB product catalog is selected while Browser work limit remains at 80 MB. The action state reports Work limit exceeded, and no parser run starts. On a capable desktop browser, raising the limit to at least 125 MB or using a smaller export is the correction path.
Signed contract needs review first
A contract parses successfully but Diagnostic Ledger reports Signature markers. Diagnostics only is the safer first pass because it collects the warning without creating a changed PDF. If a copy is later rewritten, keep the original and ask the receiving workflow whether a fresh signature is required.
Visible markers but no parsed pages
A scanned PDF shows a header and a few /Page markers, but Parser result becomes Blocked. The marker scan is not enough to rebuild pages, so Recovery Plan points toward a fresh source export or a specialist repair engine.
FAQ:
Can this repair every damaged PDF?
No. A download is created only when the browser parser can load the document. Severe truncation, damaged streams, missing objects, and encrypted files need another source file or specialist software.
Why do page markers appear without a repaired download?
Raw page markers are scan clues. A download appears only when Parser result becomes Parsed and Repair attempt is not set to Diagnostics only.
What should I do with an encrypted PDF?
Use a PDF reader with the valid password or export an unencrypted copy from the source application. This workflow does not unlock encrypted documents.
Will the repaired copy keep a digital signature valid?
Assume no. A rewrite changes file bytes, and signed PDFs commonly depend on byte ranges. Keep the original signed PDF and create a new signed export when signature validity matters.
Why use Fresh page copy instead of Rewrite parsed PDF?
Fresh page copy is useful when page content loads but you want a cleaner page-based document. It can drop source-level features, so compare important pages before using it as a replacement.
Glossary:
- Cross-reference data: PDF lookup information that tells a reader where objects are stored in the file.
- EOF marker: The end marker, written as %%EOF, that indicates where a PDF revision should finish.
- Object stream: A compressed container that can hold several PDF objects and can be difficult to recover when damaged.
- Page tree: The PDF structure that organizes pages and connects them to their content and resources.
- Parser result: The local load attempt that determines whether pages can be read well enough for a browser rewrite.
- Trailing bytes: Bytes found after the final EOF marker, often caused by transfer footers, concatenation, or incomplete cleanup.