
Introduction:

PDF conversion is not one job. It can mean pulling out editable text, turning visible pages into images, building a review document, or making page-level rows for cleanup.

A PDF is a fixed-page format, so the page you see is not always the structure a converter can recover. Digital reports often contain selectable text positioned on the page. Scanned contracts, signed forms, and image-only archives may contain a picture of text with no searchable text layer. Layout-heavy PDFs can also mix both kinds of content, which is why a page can look perfect while its extracted words appear out of order or incomplete.

The first choice is whether the output should be editable or faithful to the page appearance. Text, Markdown, HTML, RTF, DOCX, CSV, XLSX, and JSON are useful when words need to be searched, copied, reviewed, or cleaned up. PNG, JPG, WebP, and image-backed PPTX output are better when the page itself is the evidence, such as a scanned signature page, map, stamped approval, or diagram.

[Figure: one PDF page feeding two paths. The text layer yields words, lines, and rows (editable); the rendered page yields pixels and slides (visual).]
The same PDF page can be treated as text to extract or as a page image to preserve.

Page range matters because a PDF can be large, encrypted, or expensive for a browser to render. A small range is safer when testing a new output, and a page cap prevents a long conversion from freezing the tab. If the output needs small print or line art, higher render DPI can help, but it also creates bigger canvases and larger downloads.

PDF conversion choices and common review risks
| PDF situation | Better direction | Common mistake |
| --- | --- | --- |
| Selectable text in a report, memo, or brief | Text, document, row, or structured review output. | Assuming columns, headers, and reading order will match the visual page. |
| Scanned paper pages without OCR | Page images or image-backed slides. | Expecting editable text when the PDF only contains pictures of text. |
| Forms, signatures, stamps, maps, and diagrams | PNG, JPG, WebP, or PPTX page rendering. | Choosing text-first output when visual fidelity is the actual requirement. |
| Invoice-like pages with apparent tables | Line rows or page rows as a cleanup starting point. | Treating positioned PDF text as if it were a real spreadsheet grid. |

A reliable conversion starts with a short inspection pass. Confirm that the selected pages contain the kind of content the output expects, then use warnings about empty text, duplicate pages, capped ranges, password protection, or oversized rendering as prompts to change the path.

How to Use This Tool:

Start with one PDF and the handoff you need. The controls change slightly because text-first, row, document, image, and slide outputs preserve different parts of the file.

  1. Select or drop one file in Source PDF. If the file is encrypted, open Advanced, enter an owner-approved PDF password, and retry.
  2. Choose Convert to. Use TXT, RTF, HTML, Markdown, DOCX, CSV, XLSX, or JSON for extractable text and rows. Use PNG, JPG, WebP, or PPTX when page appearance matters more than editable text.
  3. Set Pages with all, one page such as 3, a range such as 2-6, or comma-separated selections such as 1,4,8-10. Invalid pages, descending ranges, and pages outside the loaded PDF show a review message.
  4. Adjust the format-specific controls that appear. Text layout can keep page sections, compact lines, or continuous text. Data rows can use line rows or page rows. Visual outputs can set Render DPI, Page background, and JPG or WebP quality.
  5. Use Filename prefix when the download name needs to differ from the PDF filename, and set Page cap to 20, 40, or 80 pages. A selection above the cap is trimmed and reported as a warning.
  6. Review the active preview, the Fidelity Audit, the PDF Export Readiness Map, and any warning text before downloading. Empty text, low row counts, capped pages, or canvas-size warnings mean the output needs a different range, lower DPI, OCR, or a visual format.

Interpreting Results:

Text output is strongest when the selected pages already contain selectable text. Word counts, line counts, and previews are useful checks, but they do not prove that sidebars, columns, footnotes, or headers were reconstructed in the same order a person reads on the page.

Row output is a cleanup aid, not a promise of table detection. Line rows expose each extracted line for filtering and review. Page rows keep one record per selected page when page-level traceability matters more than line-by-line cleanup.
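
The difference between the two row modes can be shown with a small sketch. It assumes the extracted text is already available as a mapping from page number to page text; the column names `page`, `line`, `text`, and `word_count` are illustrative and may not match the tool's actual export columns.

```python
def line_rows(pages: dict[int, str]) -> list[dict]:
    """One record per non-empty extracted line, tagged with its page."""
    rows = []
    for page_number, text in pages.items():
        line_number = 0
        for line in text.splitlines():
            if line.strip():
                line_number += 1
                rows.append(
                    {"page": page_number, "line": line_number, "text": line.strip()}
                )
    return rows


def page_rows(pages: dict[int, str]) -> list[dict]:
    """One record per selected page, even when the page has no text."""
    return [
        {
            "page": page_number,
            "text": " ".join(text.split()),  # collapse line breaks and runs of spaces
            "word_count": len(text.split()),
        }
        for page_number, text in pages.items()
    ]
```

A page with no extractable text produces no line rows but still produces one page row with a word count of zero, which is one way a low row count can hint at a missing text layer.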

Image and slide output is the right path when the page appearance is the deliverable. It preserves scans, signatures, stamps, drawings, and complex layouts as pixels, so any text on the page is not editable unless an OCR step is run separately.

Treat warnings as decision points. A successful download can still be the wrong handoff if the selected PDF has no text layer, the page range was capped, the file needed a password, or the render settings would make a page too large for the browser canvas.

Technical Details:

PDF pages are assembled from drawing instructions, positioned text, images, paths, fonts, and resource references. They are not stored as word-processor paragraphs or spreadsheet cells. Text extraction reads the available text items and groups them into lines. Rendering asks the page to draw itself at a chosen scale, then saves the resulting pixels.
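
Grouping positioned text into lines can be sketched as follows. This is a simplified illustration under stated assumptions: `TextItem` is a hypothetical record of a text fragment and its baseline position in PDF points, and the 2-point tolerance is an arbitrary choice, not a value taken from the tool.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    text: str
    x: float  # horizontal position in points
    y: float  # baseline position in points; PDF y grows upward


def group_into_lines(items: list[TextItem], tolerance: float = 2.0) -> list[str]:
    """Group items whose baselines are within `tolerance` points into one line."""
    lines: list[list[TextItem]] = []
    # Sort top to bottom: a larger y is higher on the page.
    for item in sorted(items, key=lambda i: -i.y):
        if lines and abs(lines[-1][0].y - item.y) <= tolerance:
            lines[-1].append(item)
        else:
            lines.append([item])
    # Within each line, read left to right.
    return [
        " ".join(i.text for i in sorted(line, key=lambda i: i.x)) for line in lines
    ]
```

Because this kind of grouping looks only at coordinates, two side-by-side columns that share a baseline end up merged into the same line, which is one reason extracted reading order can differ from the visual page.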

The distinction explains most conversion edge cases. A born-digital report can yield useful editable words while losing visual reading order. A scan can render cleanly but produce no text because there is no OCR layer to read. A visual table may export as ordinary lines because the PDF never contained spreadsheet cells, formulas, or merged ranges.

Transformation Core:

PDF conversion transformation core
| Stage | Text-first path | Visual path |
| --- | --- | --- |
| Load and select pages | The PDF is opened in the browser and the page expression is resolved into a page list. | The same page list becomes the render queue for images or slides. |
| Read page content | Positioned text is normalized into page text, line rows, page rows, word counts, and previews. | Each selected page is drawn to a canvas using the selected DPI and background color. |
| Build output | TXT, RTF, HTML, Markdown, CSV, XLSX, DOCX, and JSON use the extracted text model. | PNG, JPG, WebP, and PPTX use rendered page images. |
| Review signals | Duplicate pages, capped selections, invalid ranges, and missing extractable text are reported. | Large canvas dimensions, high-DPI size risk, and unavailable canvas rendering stop the export with a message. |

Validation and Limit Rules:

PDF converter validation and limit rules
| Check | Rule | Effect |
| --- | --- | --- |
| File type | One PDF file, using a PDF extension or PDF media type. | Other files are rejected before conversion. |
| File size | Browser-side conversions are kept under 120 MB. | Larger source files show a size warning instead of running. |
| Page range | all, *, page numbers, ascending ranges, and comma-separated mixes are accepted. | Invalid tokens, out-of-range pages, and descending ranges require correction. |
| Page cap | The selected page list is limited to 20, 40, or 80 pages. | Extra selected pages are trimmed and a warning names the cap event. |
| Canvas size | Rendered pages must stay within 16,384 pixels per side and 32,000,000 total pixels. | Lower the DPI or select fewer pages when a page would exceed the canvas limit. |
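
The canvas rule is easiest to see as arithmetic. PDF page sizes are expressed in points, with 72 points per inch by default, so the rendered size in pixels is the page size multiplied by DPI / 72. The sketch below applies the two limits from the table; the function names are hypothetical, and rounding up to whole pixels is an assumption about how a renderer might size its canvas.

```python
import math

POINTS_PER_INCH = 72          # default PDF user-space unit
MAX_SIDE_PIXELS = 16_384      # per-side limit from the table above
MAX_TOTAL_PIXELS = 32_000_000  # total-pixel limit from the table above


def render_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI (rounded up)."""
    width_px = math.ceil(width_pt * dpi / POINTS_PER_INCH)
    height_px = math.ceil(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


def fits_canvas(width_px: int, height_px: int) -> bool:
    """True when both the per-side and total-pixel limits are respected."""
    return (
        width_px <= MAX_SIDE_PIXELS
        and height_px <= MAX_SIDE_PIXELS
        and width_px * height_px <= MAX_TOTAL_PIXELS
    )
```

A US Letter page (612 x 792 points) renders to 1224 x 1584 pixels at 144 DPI, well inside both limits. At 600 DPI the same page becomes 5100 x 6600 pixels, which is 33,660,000 pixels in total and exceeds the total-pixel limit even though neither side is over 16,384.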

Format Boundaries:

PDF output format boundaries
| Output family | Best fit | Boundary |
| --- | --- | --- |
| Text and document formats | Search, copy, editing drafts, and review notes. | They do not rebuild the original page layout or OCR image-only text. |
| Rows and spreadsheets | Line cleanup, page-level records, and data-review handoff. | They are not true table extraction from cell structures or formulas. |
| Images and slides | Visual evidence, scan review, and page-by-page presentation handoff. | They preserve appearance as pixels, not editable text. |

Privacy and Accuracy Notes:

The selected PDF is read in the browser session. Conversion does not require sending the PDF content or the typed password to a server, and the JSON output redacts the password field when one was entered.

  • Use PDF passwords only for files you are authorized to open, and close or refresh the page when finished on a shared computer.
  • Text quality depends on the PDF's existing text layer; scanned pages need OCR before text-first outputs become useful.
  • Browser memory, page count, file size, render DPI, and canvas limits can affect large conversions even when the PDF itself is valid.
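
Password redaction in an export can be sketched like this. The field name `password` and the marker text `[redacted]` are assumptions for illustration; the tool's actual JSON keys and marker are not documented here.

```python
import json

REDACTED_MARKER = "[redacted]"  # hypothetical marker text


def export_settings_json(params: dict) -> str:
    """Serialize conversion settings without exposing an entered password."""
    safe = dict(params)  # copy so the caller's dict is left untouched
    if safe.get("password"):
        safe["password"] = REDACTED_MARKER
    return json.dumps(safe, indent=2)
```

Copying the settings before redacting keeps the in-memory password usable for the current session while ensuring it never reaches the serialized output.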

Advanced Tips:

  • Run a small page range first when a PDF is long, encrypted, scanned, or layout-heavy. A two-page test usually reveals whether text extraction or rendering is the better path.
  • Keep Text layout on page sections when reviewers need page traceability. Use continuous text only when the next editor expects one uninterrupted text stream.
  • Choose line rows for invoice cleanup and page rows for page-level review notes. Neither option recreates original spreadsheet cells.
  • Use 144 DPI and a white background as a normal visual-review starting point. Increase DPI only when small detail is unreadable, and reduce the page range if canvas warnings appear.
  • Use the Fidelity Audit and readiness map before handing off a file. The most important checks are selected page count, extracted text presence, row count, and any cap or render warning.

Worked Examples:

Digital report to editable notes

A 12-page report with selectable text can be converted to DOCX with page section headings. Check the preview and Fidelity Audit for selected pages, text presence, and line counts, then skim headers, footers, and column order before editing the downloaded document.

Scanned signed form

A signed form with no selectable text should be rendered as PNG or JPG at 144 DPI with a white background. If text output reports no extractable text, use page images for visual review or run OCR first and retry a text-first format.

Invoice rows for cleanup

An invoice PDF may have line-item text without real spreadsheet cells. Choose CSV or XLSX with line rows, inspect the row count, and expect to split columns or merge wrapped lines after download.
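
The cleanup after download might look like the sketch below. It rests on two loudly stated assumptions: that column gaps survive extraction as runs of two or more spaces, and that a line with a single cell is a wrapped continuation of the previous row's first column. Real invoices often break both assumptions, so treat this as a starting heuristic with hypothetical function names.

```python
import re


def split_columns(line: str) -> list[str]:
    """Split a line on runs of two or more whitespace characters."""
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]


def clean_rows(lines: list[str], expected_columns: int) -> list[list[str]]:
    """Split lines into cells and fold single-cell lines into the prior row."""
    rows: list[list[str]] = []
    for line in lines:
        cells = split_columns(line)
        if not cells:
            continue  # skip blank lines
        if rows and len(cells) == 1 and expected_columns > 1:
            # Treat as a wrapped description belonging to the previous row.
            rows[-1][0] = rows[-1][0] + " " + cells[0]
        else:
            rows.append(cells)
    return rows
```

Rows whose cell count still differs from `expected_columns` after this pass are good candidates for manual review.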

FAQ:

Why is the text output empty?

The selected pages may be scanned images or may not contain extractable text. Use PNG, JPG, WebP, or PPTX if you only need visual copies, or run OCR first and retry a text-first format.

Can I convert only certain pages?

Yes. Use all, a single page, an ascending range, or comma-separated selections. Duplicate pages are ignored, and a selection above the page cap is trimmed with a warning.

Why does spreadsheet output need cleanup?

PDFs usually position text on a page instead of storing spreadsheet cells, formulas, merged regions, or table relationships. CSV and XLSX output are structured starting points for cleanup.

Which image settings should I start with?

Use 144 DPI and a white background for ordinary document review. Increase DPI only for small details, and lower DPI or reduce selected pages if a canvas-size warning appears.

Does the password appear in JSON output?

No. JSON output replaces an entered password with a redacted marker. The password remains sensitive while typed into the browser session, so clear the page when finished on shared machines.

Glossary:

Text layer
Selectable text stored separately from the visible drawing of a PDF page.
OCR
Optical character recognition, the process of turning scanned page images into searchable text.
DPI
Dots per inch, used here as the render scale for page images and image-backed slides.
Canvas
The browser drawing surface used to render a PDF page before saving image output.
Page cap
The maximum number of selected PDF pages converted in one browser-side run.
Page range
The selected page set, such as all pages, one page, a range, or comma-separated ranges.