API Pagination Window Calculator

Total items:

Enter the record count to retrieve from the collection.

Page size:

Use the page size you will send as limit, page[size], per_page, or equivalent.

API page cap:

Set the endpoint maximum so the modeled request count never assumes an invalid limit.

Rate limit:

Enter allowed page requests per minute after any shared-client allowance.

req/min

Average page latency:

Use a measured average or a conservative sample from staging logs.

Concurrent workers:

Use the number of page requests that can be in flight at once.

workers

Average item size:

Use a sampled response payload divided by item count.

bytes/item

Pagination style:

Choose cursor when the API returns next tokens; choose offset for page/offset collections.

Endpoint or resource:

Optional label for exports, copied rows, and the JSON payload.

Response overhead:

Use this for metadata envelopes, headers, or uncompressed wrappers.

KiB/page

Retry reserve:

Enter 0 for a clean run; add 5-25% when retries are common.



A long API export is not just a question of how many records exist. The practical question is how those records move through a page boundary, a provider limit, a request allowance, and the client's own waiting time. A list with 125,000 records may be harmless when it can be pulled in large cursor pages under a generous quota, or disruptive when the endpoint caps each page, returns bulky payloads, and starts throttling the client halfway through the job.

Pagination turns one large collection into a sequence of smaller responses. Page size controls the amount requested at once, while an API page cap controls the amount the service will actually allow. Many APIs also expose a continuation value, next link, page number, offset, or incremental checkpoint so the client knows where to resume. Those mechanisms sound similar, but they behave differently when records are inserted, removed, or reordered while the export is still running.

API pagination styles and planning risks
Pagination style	Best fit	Planning risk
Cursor or next-token	Large lists where the service returns a continuation value after each page.	Tokens can expire or become invalid when filters, ordering, selected fields, or authorization context changes.
Offset or page number	Small, stable lists where jumping to a known position is useful.	Deep offsets can slow down, hit provider depth limits, or skip and duplicate records when the collection changes.
Incremental window	Exports sorted by a stable timestamp or ID, with checkpoints saved after successful pages.	Late-arriving records and non-monotonic sort keys can leave gaps unless the overlap and retry policy are deliberate.
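The skip risk of offset pagination can be shown with a tiny in-memory sketch. The list of record IDs and the `fetch_offset_page` helper are hypothetical stand-ins for a real endpoint; the only point is that a deletion in an already-fetched range shifts every later position.

```python
def fetch_offset_page(collection, offset, limit):
    """Return one page of a list by position, as an offset-style API would."""
    return collection[offset:offset + limit]


# Hypothetical collection of ten record IDs, read as two pages of five.
records = list(range(1, 11))

first_page = fetch_offset_page(records, offset=0, limit=5)

# A record inside the already-fetched range is deleted before the next request,
# so every later record moves one position toward the start.
records.remove(3)

second_page = fetch_offset_page(records, offset=5, limit=5)

seen = set(first_page) | set(second_page)
skipped = [r for r in records if r not in seen]

print(first_page)   # [1, 2, 3, 4, 5]
print(second_page)  # [7, 8, 9, 10]
print(skipped)      # [6]
```

Record 6 still exists but is never returned, because it moved into the position range that the first request had already covered. A cursor or keyset approach avoids this by resuming from a value rather than a position.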

The retrieval window also has a pacing side. A rate limit describes how many requests may be sent in a time window. Latency describes how long each page takes to return. Concurrency lets several page requests wait at the same time, but it does not increase the provider's request allowance. Extra workers help only while the client is latency-bound; once the time required by the rate limit is at least as long as the time spent waiting on page responses, more workers mainly increase queue pressure and the chance of 429 responses.
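A short sketch of that plateau, assuming the two-ceiling model this page describes (request allowance versus batched latency). The function name and the sample inputs of 250 requests, 480 req/min, and 180 ms are illustrative choices, not values read from any provider.

```python
import math


def clean_window_seconds(page_requests, requests_per_minute, latency_ms, workers):
    """Return the clean-run window: the slower of the two ceilings."""
    rate_limit_s = page_requests * 60 / requests_per_minute
    latency_s = math.ceil(page_requests / workers) * latency_ms / 1000
    return max(rate_limit_s, latency_s)


# Sweep the worker count while every other input stays fixed.
for workers in (1, 2, 4, 8):
    window = clean_window_seconds(250, 480, 180, workers)
    print(f"{workers} workers -> {window:.2f} s")
```

With these inputs the single-worker run is latency-bound at 45 seconds, and every larger worker count lands on the same 31.25-second rate-limit ceiling.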

Payload size is the quieter limit. A fast run can still be a poor plan when every page carries heavy records, wrapper overhead, or uncompressed response bodies. Large payloads affect disk space, network transfer, parser memory, downstream queue depth, and retry cost. A safer export often comes from smaller partitions with stored checkpoints rather than one maximum-size pull that has to restart from the beginning.

A pagination estimate is therefore a preflight model, not a provider guarantee. It helps choose a page size, budget an execution window, spot quota or payload risks, and decide when the job needs sharding, but the live run still has to obey provider headers, token expiry, retry backoff, and the service's own pagination rules.

How to Use This Tool:

Start with the collection size and endpoint rules, then use the review rows and charts to decide whether the modeled export window is safe enough to run.

Set Total items to the number of records in the export window. If the API only gives an approximate count, enter the best current estimate and treat Modeled retrieval window as a planning value rather than a promise.
Enter Page size and API page cap. When API page cap is greater than zero and lower than the requested page size, Page requests is calculated from the cap and Window Review asks you to lower the sent limit.
Add the pacing values with Rate limit, Average page latency, and Concurrent workers. The Window Metrics tab shows both Rate-limit window and Latency window, and the slower one becomes the active ceiling.
Choose Pagination style. Use cursor mode for next-token or next-link responses, offset mode for page-number or offset lists, and incremental mode for stable timestamp or ID windows.
Use Average item size and the advanced Response overhead field when transfer volume matters. Payload estimate should be checked before large backfills, not only after timing looks acceptable.
Add Retry reserve when 429 responses, transient 5xx errors, lock waits, or bounded backoff are likely. The reserve multiplies the final modeled window instead of changing the clean-run page count.
Read Window Review and Shard Plan before using the number. Fix cap warnings, split very large payloads, and compare the current worker count with the single-worker, retry-reserved, and burst scenarios.

The result is ready to use when the summary badge says window ready and the review rows match the provider behavior you expect. A review plan badge means at least one input or assumption needs attention first.

Interpreting Results:

Modeled retrieval window is the headline timing estimate. Read its detail text before acting on it: a rate-limit ceiling means the request allowance dominates, while a latency ceiling means response time and worker count still affect the result.

Page requests, Payload estimate, and Effective item rate are the main sanity checks. A short window can still be risky when the payload is measured in many GiB, the provider caps page size below the planned request, or the retry path is not idempotent.

When Rate-limit window is slower, adding Concurrent workers will not shorten the modeled time unless the provider allowance also increases.
When Latency window is slower, better connection reuse, lower page latency, or a modest worker increase can help until the rate limit becomes slower.
Page Size Tradeoff identifies the fastest tested page size, but the fastest candidate may still be too bulky for memory, streaming, or retry safety.
Shard Plan compares timing scenarios. It does not prove that the API will accept the burst worker count or the same token in parallel requests.

Do not treat a clean estimate as proof that a production pull will complete on schedule. Run a small sample and compare observed latency, response size, throttle headers, and retry frequency with the values entered here.

Technical Details:

A pagination plan starts by converting records into page requests. The requested page size is rounded down to a whole number and compared with any known endpoint maximum. A partial final page still counts as a request, so the request count uses a ceiling division rather than ordinary division.

The timing model compares two ceilings for the same request count. The rate-limit ceiling spreads requests across the allowed requests per minute. The latency ceiling groups requests into worker batches and multiplies those batches by average page latency. The larger ceiling controls the base window, and the retry reserve is applied after that comparison.

Formula Core

\[
\begin{array}{lcl}
\text{Effective page size} & = & \begin{cases} \min(\text{requested page size}, \text{API page cap}) & \text{if API page cap} > 0 \\ \text{requested page size} & \text{otherwise} \end{cases} \\
\text{Page requests} & = & \left\lceil \frac{\text{total items}}{\text{effective page size}} \right\rceil \\
\text{Rate-limit seconds} & = & \frac{\text{page requests}}{\text{requests per minute}} \times 60 \\
\text{Latency seconds} & = & \left\lceil \frac{\text{page requests}}{\text{workers}} \right\rceil \times \frac{\text{average latency ms}}{1000} \\
\text{Modeled window} & = & \max(\text{rate-limit seconds}, \text{latency seconds}) \times \left(1 + \frac{\text{retry reserve percent}}{100}\right) \\
\text{Payload bytes} & = & \text{total items} \times \text{average item bytes} + \text{page requests} \times \text{response overhead KiB} \times 1024
\end{array}
\]
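The formulas translate almost line for line into code. The following is a minimal sketch, not the calculator's actual source: the names `WindowEstimate` and `estimate_window` are hypothetical, and the function assumes inputs are already validated (positive page size, rate limit, and worker count).

```python
import math
from dataclasses import dataclass


@dataclass
class WindowEstimate:
    effective_page_size: int
    page_requests: int
    rate_limit_seconds: float
    latency_seconds: float
    modeled_window_seconds: float
    payload_bytes: float


def estimate_window(total_items, page_size, page_cap, requests_per_minute,
                    latency_ms, workers, retry_reserve_pct=0.0,
                    item_bytes=0.0, overhead_kib=0.0):
    """Apply the formula core; assumes inputs were validated beforehand."""
    # A cap of zero means "no known cap", so the requested size is used as-is.
    effective = min(page_size, page_cap) if page_cap > 0 else page_size
    # Ceiling division: a partial final page still costs one request.
    requests = math.ceil(total_items / effective)
    rate_s = requests * 60 / requests_per_minute
    latency_s = math.ceil(requests / workers) * latency_ms / 1000
    # The retry reserve scales the slower ceiling, not the request count.
    window = max(rate_s, latency_s) * (1 + retry_reserve_pct / 100)
    payload = total_items * item_bytes + requests * overhead_kib * 1024
    return WindowEstimate(effective, requests, rate_s, latency_s, window, payload)


estimate = estimate_window(125_000, 500, 1_000, 480, 180, 4, retry_reserve_pct=20)
print(estimate.page_requests, estimate.rate_limit_seconds,
      estimate.latency_seconds, estimate.modeled_window_seconds)
```

For 125,000 items at 500 per page, 480 req/min, 180 ms latency, 4 workers, and a 20% reserve, this gives 250 requests, a 31.25-second rate-limit ceiling, an 11.34-second latency ceiling, and a modeled window of about 37.5 seconds.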

API pagination variables and exact roles
Quantity	Unit	Technical role
Total items	records	Rounded down to a whole count and used only for the selected export window.
Page size	records per request	Rounded down to a whole number and never allowed below one record.
API page cap	records per request	A cap of zero means no known cap; a positive cap limits the effective page size.
Rate limit	requests per minute	Divides the request count to give the rate-limit ceiling in seconds; provider headers should override estimates when they differ.
Concurrent workers	in-flight page requests	Rounded down to a whole number and never allowed below one worker.
Retry reserve	percent	Clamped from 0% to 300% and applied to the final modeled window.
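The rounding and clamping rules in the table could be applied in one normalization step before any timing math runs. This is a sketch under assumptions: `normalize_inputs` is a hypothetical name, and flooring the page cap and treating negative totals or caps as zero are guesses that the table does not state.

```python
import math


def normalize_inputs(total_items, page_size, page_cap, workers, retry_reserve_pct):
    """Round and clamp raw form values following the variable table."""
    return {
        # Whole record count; the floor at zero is an assumption.
        "total_items": max(0, math.floor(total_items)),
        # Whole number, never below one record.
        "page_size": max(1, math.floor(page_size)),
        # Zero means no known cap; flooring and the zero floor are assumptions.
        "page_cap": max(0, math.floor(page_cap)),
        # Whole number, never below one worker.
        "workers": max(1, math.floor(workers)),
        # Clamped to the 0%..300% range.
        "retry_reserve_pct": min(300.0, max(0.0, float(retry_reserve_pct))),
    }


print(normalize_inputs(125_000.9, 0.4, 0, 3.7, 450))
```

Normalizing once up front keeps the later formulas free of guard clauses such as division by a zero page size.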

With 125,000 records, a 500-record effective page size, 480 requests per minute, 180 ms page latency, and 4 workers, the request count is 250. The rate-limit ceiling is 31.25 seconds. The latency ceiling is 63 worker batches times 0.18 seconds, or 11.34 seconds. The modeled clean-run window is therefore 31.25 seconds because the rate limit is slower. A 20% retry reserve raises the modeled window to 37.5 seconds.

API pagination warning and review rules
Rule	Boundary	Effect on interpretation
Page-size cap	API page cap > 0 and requested Page size > cap	The effective page size is reduced to the cap and the review recommends lowering the sent limit.
Rate-limited run	Rate-limit window >= Latency window	Request allowance controls the modeled time.
Latency-limited run	Latency window > Rate-limit window	Worker count and average page latency can still change the modeled time.
Deep offset warning	Offset mode and more than 100 page requests	The review recommends cursor or keyset pagination when the provider supports it.
Payload review	Payload estimate above 5 GiB	Confirm response streaming, compression, disk space, and retry behavior.
Very large payload warning	Payload estimate above 1 TiB	Split the export into smaller windows before one continuous run.
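The boundaries in this table can be read as a small rule function. The sketch below is illustrative only: `review_flags` and its flag strings are hypothetical, and it applies each row independently, so a payload above 1 TiB raises both payload flags.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def review_flags(page_size, page_cap, rate_limit_s, latency_s,
                 style, page_requests, payload_bytes):
    """Return the review rules that fire for one modeled plan."""
    flags = []
    if page_cap > 0 and page_size > page_cap:
        flags.append("page-size cap")
    # Ties count as rate-limited, matching the >= boundary in the table.
    flags.append("rate-limited" if rate_limit_s >= latency_s else "latency-limited")
    if style == "offset" and page_requests > 100:
        flags.append("deep offset")
    if payload_bytes > 5 * GIB:
        flags.append("payload review")
    if payload_bytes > TIB:
        flags.append("very large payload")
    return flags


print(review_flags(2_000, 1_000, 31.25, 11.34, "offset", 120, 6 * GIB))
```

For a 2,000-record request against a 1,000-record cap, in offset mode with 120 requests and a 6 GiB payload, the sketch reports the cap mismatch, the rate-limited state, the deep offset warning, and the payload review.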

The page-size comparison tests 50, 100, 250, 500, 1000, the current effective size, twice the current size, and four times the current size. Positive API caps limit those candidates before timing is calculated. When two candidates finish in the same modeled time, the candidate with fewer page requests is favored.
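One way to picture the comparison is a loop over the candidate sizes that keeps the shortest clean-run window and breaks ties on request count. This is a sketch under assumptions: `best_page_size` is a hypothetical name, the retry reserve is left out because it scales every candidate equally, and inputs are assumed to be validated.

```python
import math


def best_page_size(total_items, effective_size, page_cap,
                   requests_per_minute, latency_ms, workers):
    """Return (window_seconds, page_requests, page_size) for the best candidate."""
    candidates = {50, 100, 250, 500, 1000,
                  effective_size, 2 * effective_size, 4 * effective_size}
    if page_cap > 0:
        # A positive cap limits every candidate before timing is calculated.
        candidates = {min(size, page_cap) for size in candidates}
    results = []
    for size in sorted(candidates):
        requests = math.ceil(total_items / size)
        rate_s = requests * 60 / requests_per_minute
        latency_s = math.ceil(requests / workers) * latency_ms / 1000
        results.append((max(rate_s, latency_s), requests, size))
    # Tuple ordering: shortest window first, then fewer page requests.
    return min(results)


print(best_page_size(125_000, 500, 1_000, 480, 180, 4))
```

With a 1,000-record cap the 2,000-record candidate collapses to 1,000, which wins here at 125 requests and a 15.625-second window. The result still needs the memory and retry-cost check described under Page Size Tradeoff.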

The fetch-time curve is a straight projection from zero items to the modeled window. It is useful for planning checkpoints and status updates, but it does not model uneven page arrivals, retry backoff, provider pauses, token expiration, or shifting offsets.
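Because the curve is a straight line, the expected progress at any elapsed time is a simple proportion. A minimal sketch, with `expected_items_at` as a hypothetical helper name:

```python
def expected_items_at(elapsed_seconds, total_items, modeled_window_seconds):
    """Linear projection of items fetched after a given elapsed time."""
    if modeled_window_seconds <= 0:
        return float(total_items)
    # Clamp so times before the start or after the window stay in range.
    fraction = min(1.0, max(0.0, elapsed_seconds / modeled_window_seconds))
    return total_items * fraction


# Halfway through a 31.25-second window for 125,000 items.
print(expected_items_at(15.625, 125_000, 31.25))
```

Comparing this projection with the count actually checkpointed at the same elapsed time is a quick way to notice that the latency or throttling assumptions have drifted.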

Accuracy Notes:

The calculation runs from the numbers entered on the page. It does not call the target API, inspect live response headers, validate a cursor, or know whether provider tokens will remain valid for the entire job.

Update Average page latency and Average item size from a recent sample before planning a large backfill.
Keep filters, sorting, selected fields, authorization context, and page-size policy stable when comparing runs.
Treat provider documentation, rate-limit headers, and Retry-After values as authoritative when they conflict with the model.
Use smaller partitions when a failed export would be expensive to replay from the beginning.
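Honoring Retry-After in a live client means handling both forms the HTTP header can take: a number of seconds or an HTTP date. The sketch below uses only the Python standard library; `retry_after_seconds` is a hypothetical helper, and treating a date without timezone information as UTC is an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None):
    """Return how long to wait for a Retry-After value (seconds or HTTP date)."""
    value = header_value.strip()
    if value.isdigit():
        return float(value)
    now = now or datetime.now(timezone.utc)
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        # Assumption: a date without timezone information is read as UTC.
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means no further wait is required.
    return max(0.0, (when - now).total_seconds())


reference = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_seconds("120"))
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=reference))
```

A client would typically sleep for the returned value before retrying the same page, and the observed waits are a useful input when choosing a Retry reserve for the next plan.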

Worked Examples:

Rate-limited order backfill

For 125,000 records at Page size 500, API page cap 1,000, Rate limit 480 req/min, Average page latency 180 ms, and Concurrent workers 4, Page requests is 250 and Modeled retrieval window is about 31.3 seconds. The result is rate-limited: the latency ceiling is only 11.34 seconds, so doubling workers would not shorten this clean-run estimate.

Requested size above the provider cap

If the same export requests 2,000 records per page while API page cap is 1,000, the request count is calculated from 1,000 records per page. Window Review reports the page-size mismatch, and the practical correction is to lower the sent limit before deploying the job.

Offset depth review

For 60,000 records at 500 records per page, offset mode produces 120 page requests. That crosses the offset warning boundary, so Window Review recommends cursor or keyset pagination if the provider offers it. If offset is the only option, a smaller date or ID window reduces the depth of each run.
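Splitting by ID range could look like the sketch below. It assumes contiguous integer IDs with one record per ID, which real collections rarely guarantee, and `id_windows` is a hypothetical helper. The 100-page default mirrors the offset warning boundary.

```python
def id_windows(min_id, max_id, page_size, max_pages_per_run=100):
    """Split an inclusive ID range so each window needs at most max_pages_per_run pages."""
    span = page_size * max_pages_per_run
    windows = []
    start = min_id
    while start <= max_id:
        end = min(start + span - 1, max_id)
        windows.append((start, end))
        start = end + 1
    return windows


# 60,000 IDs at 500 per page: the first window fills 100 pages, the second 20.
print(id_windows(1, 60_000, 500))
```

Each window can then run as its own offset export with a stored checkpoint, so a failure replays only that window.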

Retry-heavy endpoint

Adding a 20% Retry reserve to the default 31.3-second plan raises Modeled retrieval window to about 37.5 seconds. The reserve does not model the exact backoff sequence, but it keeps a clean-run estimate from being mistaken for a realistic throttled job window.

Advanced Tips:

Use Endpoint or resource to label the job when comparing multiple collections in copied rows or JSON output.
Set Response overhead when responses include large envelopes, metadata, link arrays, or uncompressed wrappers.
Compare Fetch Time Curve with the real job's checkpoint cadence; large gaps between expected and observed progress usually mean latency, throttling, or payload assumptions are stale.
Read Page Size Tradeoff with the provider cap in mind. A larger page is not safer if it increases memory pressure or makes retries too expensive.
Keep the Retry-reserved window in Shard Plan close to the real runbook when the endpoint often returns 429 or transient 5xx responses.

FAQ:

Does concurrency increase the API rate limit?

No. Concurrent workers changes how many page requests can wait at once, but Rate limit still controls the allowed request pace.

Why was my page size reduced?

A positive API page cap lower than the requested Page size becomes the effective page size. The review flags the mismatch so the real request limit can be corrected.

Why does offset mode warn after 100 pages?

Offset mode is flagged when Page requests exceeds 100 because deep offset pulls are more likely to hit page-depth limits or shift when records change during the run.

Why does the modeled window differ from the real export?

The estimate uses average latency, a fixed request allowance, and a straight progress curve. Real exports can vary because of server load, cache state, compression, network path, changing data, token expiry, and retry backoff.

What happens when total items is zero?

Window Review reports an empty retrieval scope. Some APIs still require one confirmation request to learn that the collection is empty, so use provider behavior when planning that final check.

Glossary:

Cursor: A continuation value or next link returned by an API so the next request can resume from a service-defined position.
Offset: A page number or item position supplied by the client to skip earlier records in a collection.
Page cap: The maximum number of records an endpoint will allow in one page response.
Rate limit: The allowed number of requests in a time window, often scoped to an account, token, endpoint, or IP address.
Latency ceiling: The time implied by page-response latency and the number of workers waiting on page responses at once.
Retry reserve: An added percentage of time used to keep the plan from assuming every request succeeds on the first attempt.
