PDF Page Extractor
Extract selected pages from a PDF in your browser, preserve typed order or remove repeats, and verify the new file with a manifest.
Introduction
PDF page extraction creates a new PDF from selected page positions in an existing document. The job sounds simple, but it often sits inside careful work: sending only the pages approved for review, building an exhibit packet, trimming a long scan, separating signed pages, or sharing a small attachment without exposing the whole source file.
The safest way to think about extraction is by page position, not printed page label. A PDF has a first page, second page, third page, and so on. The visible labels on those pages may use Roman numerals, chapter prefixes, restarted numbering, covers, inserts, or no labels at all. When a selector says page 3, a PDF extractor normally means the third page object in the file, not the page that happens to print the number 3.
| Situation | What matters most | Common mistake |
|---|---|---|
| Sending selected records | Only the approved source pages should appear in the new file. | Choosing pages by visible labels without checking source positions. |
| Building a review packet | The output may need a deliberate order that differs from the source. | Sorting a typed sequence that was meant to stay custom. |
| Cleaning a scan | Odd, even, and open ranges can remove predictable blanks or covers. | Assuming odd and even refer to printed labels instead of PDF positions. |
| Preparing a filing attachment | The extracted file should be checked against the source before submission. | Assuming extraction verifies signatures, bookmarks, forms, tags, or filing rules. |
Sequence choices change the document. A packet might need page 12 before page 4, or it might need the same instruction page repeated before two forms. A cleanup run might need the opposite behavior: sorted pages with repeats removed. Those are different outputs even when the selected source pages overlap.
source PDF positions:    1 2 3 4 5 6
typed selector:          1, 4-5, 2, 2
typed-order output:      1 4 5 2 2
ascending-unique output: 1 2 4 5
Extraction is also not redaction. Copying selected pages can reduce what is shared, but it does not search for sensitive text, comments, hidden content, attachments, metadata, links, document policies, or leftover information on the selected pages. Legal, medical, financial, school, and personnel files need a separate review before the extracted PDF becomes the record copy.
A page manifest is useful because it turns the extraction into an auditable mapping. It should show which output page came from which source page and which selector produced it. That small trail catches many errors caused by shifted numbering, copied repeats, or a range typed from memory.
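The mapping a manifest records can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code: the function name `build_manifest`, the row keys, and the file name `packet.pdf` are assumptions chosen to mirror the manifest columns described on this page.

```python
from typing import Dict, List, Tuple


def build_manifest(
    expanded: List[Tuple[str, List[int]]],
    source_file: str,
    sequence: str = "typed-order",
) -> List[Dict[str, object]]:
    """Map each output page to its source page and the token that selected it.

    `expanded` pairs each selector token with the one-based source positions
    it produced, in typed order. All names here are hypothetical.
    """
    rows: List[Dict[str, object]] = []
    for token, pages in expanded:
        for source_page in pages:
            rows.append(
                {
                    "output_page": len(rows) + 1,  # position in the new PDF
                    "source_page": source_page,    # position in the source PDF
                    "token": token,
                    "sequence": sequence,
                    "source_file": source_file,
                }
            )
    return rows


# The selector "1, 4-5, 2, 2" from the example above, already expanded.
manifest = build_manifest(
    [("1", [1]), ("4-5", [4, 5]), ("2", [2]), ("2", [2])],
    source_file="packet.pdf",
)
for row in manifest:
    print(row["output_page"], "<-", row["source_page"], "via", row["token"])
```

Reading the rows top to bottom makes a copied repeat or a shifted range visible before the file is shared.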
How to Use This Tool:
Load one PDF, enter the source page positions to copy, check the manifest, and download the generated PDF only after the mapping matches your intent.
- Choose Browse PDF, drop one PDF into the upload area, or use Load sample to inspect the workflow. If several files are dropped, only the first file is used.
- Wait for the source summary to show the file name, page count, and size. Loading is blocked when the file is too large, lacks a PDF extension or PDF MIME type, has no PDF header, has no pages, or cannot be parsed in the browser.
- Enter Pages to extract. Accepted selectors include a single page such as `8`, comma-separated values such as `1, 4, 9`, closed ranges such as `2-6`, open ranges such as `10-` or `-5`, plus `all`, `odd`, and `even`.
- Use All, Odd, or Even after a PDF is loaded when a quick selector should expand against the current page count.
- Open Advanced when sequence behavior matters. Typed order, allow duplicates keeps the entered order and repeats. Ascending unique pages sorts source pages and removes repeats before copying.
- Set Page guard between 1 and 500 selected pages. The guard protects the browser from an accidental large extraction run.
- Name the Output filename. Unsafe filename characters are converted to hyphens, and `.pdf` is added when missing.
- Review Page Manifest and Extraction Checks. Use Extract pages, then download from Extracted PDF only after the output page count, repeated-page notice, and source-page mapping are correct.
Interpreting Results:
The generated PDF is the main result, but the tables explain whether it is ready for the intended handoff. Start with the output page count, then inspect the manifest and checks before sharing the file.
| Result area | What to check | Useful export |
|---|---|---|
| Extracted PDF | Generated status, source file, selected page count, repeated pages, output name, and output size. | Download PDF or copy the status table as CSV. |
| Page Manifest | Output page, source page, range token, sequence mode, and source file for each copied page. | Copy CSV, download CSV, export DOCX, or copy individual rows. |
| Extraction Checks | PDF processing support, source validity, file-size cap, page range validity, page guard, repeats, and output readiness. | Copy CSV, download CSV, export DOCX, or copy individual rows. |
| JSON | Machine-readable settings, source facts, selection details, output facts, checks, and warnings. | Copy JSON or download JSON. |
Repeated pages should match the document's purpose. In typed-order mode, 1, 2, 2, 5 creates four output pages because source page 2 is copied twice. If the repeat is accidental, switch to Ascending unique pages and check the manifest again.
Range needs review means the selector cannot safely expand. Typical causes include a blank entry, an unknown token, a reversed range, page 0, a page beyond the loaded PDF, a selector that produces no pages, or a selected count above the page guard.
A ready status confirms that selected source pages were copied into a new browser-generated PDF. It does not prove that printed page labels, visible signatures, bookmarks, form fields, accessibility tags, attachments, document policies, or filing rules still meet your requirements.
Technical Details:
A PDF stores an ordered page tree along with document-level structures such as outlines, forms, metadata, attachments, security settings, and optional page labels. Page extraction copies selected page objects into a new document. The result should be treated as a derived PDF that needs its own review, not as proof that every document-level feature from the source survived unchanged.
The selector language is one-based because readers normally describe documents as page 1, page 2, and so on. During copying, each selected page is addressed by its source position. That is why a manifest is more reliable than a visible label when front matter, inserted scans, or custom numbering shifts printed page numbers away from PDF positions.
The selected source file is read in the browser session. It must be provided by file picker, drag and drop, or the built-in sample. A user-provided file must have a PDF extension or PDF MIME type, fit the 150 MB cap, start with a PDF header, and parse with at least one page before extraction can proceed.
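The source checks listed above can be sketched as a single function. This is an illustration under stated assumptions rather than the tool's code: `check_pdf_source` is a hypothetical name, the 150 MB cap is interpreted as 150 × 1024 × 1024 bytes, the header is taken to be the `%PDF-` marker at byte zero, and the page count is assumed to come from whatever PDF parser is in use.

```python
import os
from typing import List

# Assumption: the page's "150 MB" cap is read here as 150 * 1024 * 1024 bytes.
MAX_BYTES = 150 * 1024 * 1024


def check_pdf_source(name: str, mime: str, data: bytes, page_count: int) -> List[str]:
    """Return a list of problems; an empty list means the source may proceed.

    `page_count` is assumed to be supplied by a separate PDF parser.
    """
    problems: List[str] = []
    has_extension = os.path.splitext(name)[1].lower() == ".pdf"
    has_mime = mime.lower() == "application/pdf"
    if not (has_extension or has_mime):
        problems.append("not a PDF extension or PDF MIME type")
    if len(data) > MAX_BYTES:
        problems.append("file exceeds the 150 MB cap")
    if not data.startswith(b"%PDF-"):
        problems.append("missing PDF header")
    if page_count < 1:
        problems.append("no pages")
    return problems


print(check_pdf_source("scan.pdf", "", b"%PDF-1.7\n...", page_count=12))
print(check_pdf_source("scan.pdf", "application/pdf", b"PK\x03\x04", page_count=1))
```

Collecting every problem, instead of stopping at the first, is a design choice for this sketch so that all failed checks can be reported together.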
Selector Rules:
| Selector | Example | Expansion rule | Validation boundary |
|---|---|---|---|
| Single page | `8` | Copies source page 8 once. | The page must exist in the loaded PDF. |
| Comma list | `1, 4, 9` | Reads selectors from left to right after spaces are removed. | Each token must be recognized. |
| Closed range | `2-6` | Copies pages 2, 3, 4, 5, and 6. | The start must be at least 1 and cannot be greater than the end. |
| Open range | `10-` or `-5` | Uses the first or last source page when one side is omitted. | The resolved range must stay inside the loaded page count. |
| Whole document | `all` | Copies every source page. | The resulting count must stay within the page guard. |
| Parity | `odd` or `even` | Copies odd-numbered or even-numbered source positions. | Printed labels, Roman numerals, and prefixes are not consulted. |
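The selector rules in the table can be sketched as a small parser. This is a hedged illustration, not the tool's implementation: `expand_selector` and `_span` are hypothetical names, keywords are treated as case-insensitive by assumption, and the error messages are invented for the example.

```python
import re
from typing import List

_NUMBER = re.compile(r"[0-9]+")
_RANGE = re.compile(r"([0-9]*)-([0-9]*)")


def _span(start: int, end: int, page_count: int, token: str) -> range:
    """Validate a one-based inclusive span against the loaded page count."""
    if start < 1:
        raise ValueError(f"{token!r}: pages start at 1")
    if start > page_count or end > page_count:
        raise ValueError(f"{token!r}: beyond the {page_count}-page source")
    if start > end:
        raise ValueError(f"{token!r}: range starts after it ends")
    return range(start, end + 1)


def expand_selector(text: str, page_count: int) -> List[int]:
    """Expand selector text into source positions, left to right."""
    pages: List[int] = []
    # Spaces are removed first; lowercasing keywords is an assumption.
    for token in text.replace(" ", "").lower().split(","):
        if token == "all":
            pages.extend(range(1, page_count + 1))
        elif token == "odd":
            pages.extend(range(1, page_count + 1, 2))
        elif token == "even":
            pages.extend(range(2, page_count + 1, 2))
        elif _NUMBER.fullmatch(token):
            pages.extend(_span(int(token), int(token), page_count, token))
        else:
            match = _RANGE.fullmatch(token)
            if token in ("", "-") or match is None:
                raise ValueError(f"unrecognized token: {token!r}")
            # An omitted side resolves to the first or last source page.
            start = int(match.group(1)) if match.group(1) else 1
            end = int(match.group(2)) if match.group(2) else page_count
            pages.extend(_span(start, end, page_count, token))
    if not pages:
        raise ValueError("selector produces no pages")
    return pages


print(expand_selector("1, 4-5, 2, 2", page_count=6))
print(expand_selector("-2, 5-", page_count=6))
```

The parser returns pages in typed order with repeats intact; sorting and de-duplication are left to the sequence step.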
Sequence Core:
Let $S_{\text{raw}}$ be the page sequence produced by the selector text before the sequence choice is applied, and let $S$ be the final sequence that is copied. Typed-order mode uses the raw sequence directly, so $S = S_{\text{raw}}$. Ascending-unique mode sets $S$ to the unique source pages of $S_{\text{raw}}$ sorted in ascending order.
The output page count is the length of $S$. Extraction is allowed only when that count is nonzero and does not exceed the selected page guard.
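The two sequence modes and the guard check can be sketched as follows. The function name `apply_sequence` and the mode strings are assumptions for illustration, not identifiers taken from the tool.

```python
from typing import List


def apply_sequence(raw: List[int], mode: str, guard: int = 500) -> List[int]:
    """Turn the raw selector sequence into the final sequence to copy."""
    if not 1 <= guard <= 500:
        raise ValueError("page guard must be between 1 and 500")
    if mode == "typed":
        final = list(raw)          # keep entered order and repeats
    elif mode == "ascending-unique":
        final = sorted(set(raw))   # sort and drop repeats
    else:
        raise ValueError(f"unknown sequence mode: {mode!r}")
    if not final:
        raise ValueError("selection produces no pages")
    if len(final) > guard:
        raise ValueError(f"{len(final)} pages exceeds the guard of {guard}")
    return final


raw = [1, 4, 5, 2, 2]
print(apply_sequence(raw, "typed"))
print(apply_sequence(raw, "ascending-unique"))
```

In this sketch the guard is applied to the final sequence, so a selection with many repeats can pass in ascending-unique mode yet fail in typed-order mode.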
| Boundary | Limit or rule | Why it exists |
|---|---|---|
| Source size | 150 MB maximum | The browser may need memory for the source PDF, copied pages, and generated PDF at the same time. |
| Selected pages | 1 to 500 pages per run | The guard helps prevent an accidental large copy from freezing the tab. |
| File identity | PDF extension or PDF MIME type, followed by a PDF header check | A renamed or mismatched file should fail before page counting starts. |
| Output filename | Letters, numbers, dot, underscore, and hyphen are kept; other character runs become hyphens | The generated download name stays readable and receives a PDF extension. |
| Output structure | Selected pages are copied into one generated PDF | Bookmarks, attachments, forms, signatures, tags, and other document-level features still need separate review. |
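The output filename rule in the table can be sketched with one regular expression. This is an assumption-laden illustration: `safe_output_name` is a hypothetical name, "letters" is read as ASCII letters only, and the fallback name for an empty result is invented for the example.

```python
import re


def safe_output_name(name: str) -> str:
    """Keep letters, digits, dot, underscore, and hyphen; hyphenate the rest."""
    # Each run of other characters collapses into a single hyphen.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "-", name.strip())
    if cleaned.strip("-.") == "":
        cleaned = "extracted-pages"  # hypothetical fallback, not from the tool
    if not cleaned.lower().endswith(".pdf"):
        cleaned += ".pdf"
    return cleaned


print(safe_output_name("Exhibit A: pages 3/4"))
print(safe_output_name("q3_report"))
```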
source PDF
-> confirm file type, size, header, and page count
-> expand selectors into source page positions
-> apply typed order or ascending unique sequence
-> copy selected source pages into a generated PDF
-> verify with PDF status, page manifest, checks, and JSON
Privacy Notes:
The selected PDF is read in the browser session, and the extracted PDF is generated there for download. No upload is needed for the extraction work, but local processing is not the same as permanent confidentiality.
Handle the source file, downloaded PDF, copied CSV, exported DOCX, downloaded JSON, browser downloads folder, and any shared device memory according to the sensitivity of the document. For confidential records, close unused tabs and remove temporary downloads when the work is finished.
Worked Examples:
Invoice packet with backup pages
A 24-page packet needs the cover plus two supporting pages. Enter 1, 7-8 and keep typed order. The manifest should show output page 1 from source page 1, output page 2 from source page 7, and output page 3 from source page 8. The selected page count should be 3.
Repeated instruction page
A form packet needs the same instruction page before two different sections. Entering 2, 5-7, 2, 12 in typed-order mode creates six output pages because source page 2 appears twice. That repeat is useful when the second copy is deliberate.
Open range above the guard
A 900-page scan loads successfully, but 10- selects 891 pages. With the guard at its 500-page maximum, extraction is blocked. Narrow the range or split the work into smaller runs that each stay within the guard.
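Splitting an oversized range into guard-sized runs is simple arithmetic. The helper below is a sketch with a hypothetical name, `split_into_runs`; it only produces closed-range selectors and does not touch any PDF.

```python
from typing import List


def split_into_runs(start: int, end: int, guard: int = 500) -> List[str]:
    """Split the inclusive range start..end into selectors of at most `guard` pages."""
    if guard < 1 or start < 1 or start > end:
        raise ValueError("expected 1 <= start <= end and guard >= 1")
    runs: List[str] = []
    first = start
    while first <= end:
        last = min(first + guard - 1, end)
        runs.append(f"{first}-{last}")
        first = last + 1
    return runs


# The open range 10- on a 900-page scan resolves to 10-900, which is 891 pages.
print(split_into_runs(10, 900))
```

Each returned selector can be extracted as its own run, giving two output files for the 900-page example.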
Printed labels shifted by front matter
A contract prints the first exhibit page as page 1, but the PDF has a cover and index before it. Entering 1-3 copies the cover, index, and first exhibit page. Use the PDF viewer's source positions, such as 3-5, then verify the Source page column.
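When printed labels are plain numbers that run continuously after a fixed amount of front matter, the shift is a constant offset. The sketch below assumes exactly that situation; `labels_to_positions` is a hypothetical helper, and it does not apply to Roman numerals, prefixes, or restarted numbering.

```python
def labels_to_positions(first_label: int, last_label: int, front_matter_pages: int) -> str:
    """Convert a printed-label range into a source-position selector.

    Assumes printed label 1 sits right after `front_matter_pages` unlabeled pages.
    """
    if first_label < 1 or last_label < first_label or front_matter_pages < 0:
        raise ValueError("expected 1 <= first_label <= last_label and a non-negative offset")
    return f"{first_label + front_matter_pages}-{last_label + front_matter_pages}"


# A cover and an index precede exhibit page 1, so the offset is 2.
print(labels_to_positions(1, 3, front_matter_pages=2))
```

The Source page column of the manifest remains the authoritative check, because the offset assumption fails as soon as a page is inserted or a section restarts its numbering.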
FAQ:
Does the selected PDF leave my browser?
No upload is needed for extraction. The source bytes are read from the selected local file, selected pages are copied in the browser session, and the new PDF is offered as a download.
Can I repeat a page in the extracted PDF?
Yes. With the "Typed order, allow duplicates" mode, a selector such as 1, 2, 2, 5 copies source page 2 twice. Choose Ascending unique pages when repeats should be removed.
Why is my range rejected?
The range is rejected when it is blank, uses an unknown token, starts after it ends, references page 0, asks for a page beyond the loaded PDF, selects no pages, or exceeds the current page guard.
Do odd and even use printed page numbers?
No. odd and even use source page positions from the loaded PDF. Printed page labels can differ when the document has covers, front matter, section prefixes, or Roman numerals.
Will bookmarks, forms, and signatures be preserved?
Do not assume that document-level structure or signature status is preserved. Review bookmarks, forms, links, comments, signatures, accessibility tags, attachments, and filing requirements in a dedicated PDF viewer or editor when they matter.
Can this be used as a redaction step?
No. Extracting selected pages can reduce what is included in a new PDF, but it does not inspect hidden data, comments, metadata, attachments, or sensitive content on the selected pages. Use a real redaction workflow when material must be removed.
Glossary:
- Page range: A text selector that expands to one or more source page positions.
- Source page position: The numeric position of a page inside the loaded PDF, which may differ from printed page labels.
- Typed order: The selected sequence created by reading the requested selectors from left to right, including repeated pages.
- Ascending unique pages: A sequence mode that sorts selected source pages and removes duplicates before copying.
- Page guard: The selected-page cap for one extraction run in the browser.
- Page manifest: The table that maps each output page to its source page, selector token, sequence mode, and source file.