Test Shard Runtime Calculator
Calculate CI test shard wall time from test count, average duration, setup cost, split balance, runner capacity, target gap, and cost curves.
Introduction:
Test sharding splits one test suite across several CI jobs so feedback arrives sooner than it would from one long serial run. The useful question is not only how many tests exist. Runtime depends on how evenly the work is split, how much setup each shard repeats, how many jobs can actually run at once, and how much parallel execution slows the runners down.
A shard plan can look fast on paper and still miss a pull-request target when setup time dominates or runner slots are scarce. Twelve shards do not help much if only four jobs can run at a time, because the suite finishes in queued waves. Timing-based splitters can reduce the slowest-shard tail, but stale or incomplete timing data can still leave one shard carrying more work than the average.
Good shard sizing turns a noisy CI problem into a tradeoff you can inspect. The goal is usually a useful feedback time, not the largest possible shard count. A smaller plan that meets the target with fewer runner minutes may be better than the fastest sampled plan when CI minutes, shared databases, browser workers, or self-hosted runners are already under pressure.
A model cannot prove that a test splitter is configured correctly or that every test is independent. It helps compare shard counts, runner capacity, setup cost, retries, and target time before changing a CI matrix or spending time on deeper pipeline work.
Technical Details:
Shard runtime is governed by the critical path, not by total runner work alone. The total adjusted test time is divided by the shard count, then corrected for runner efficiency and the expected slowest-shard tail. The slowest shard job also pays the per-shard setup cost. If the shard count is higher than the active job limit, the same slowest-shard job time is paid once for each queued wave.
The split-balance setting supplies the base tail buffer. Timing-weighted splits start with the lowest buffer, file timing starts higher, count or name splits assume more variance, and manual or high-variance splits assume the largest tail. Timing coverage then adjusts that buffer. Lower coverage means fewer recent timing records are available, so the slowest shard deserves more reserve.
Formula Core:
The core estimate uses seconds internally. Percent fields are converted to factors before the calculation.
| Split balance | Base tail | Coverage penalty | Practical meaning |
|---|---|---|---|
| Timing-weighted split | 5% | 0.12% per missing coverage point | Recent case timing should keep the slowest shard near the average shard. |
| File timing split | 12% | 0.18% per missing coverage point | File-level timing helps, but large files can still dominate one shard. |
| Count or name split | 24% | 0.08% per missing coverage point | Equal counts can hide very different runtimes across files or cases. |
| Manual or high-variance split | 38% | 0.05% per missing coverage point | One hand-picked bucket or known hotspot can define the wall time. |
The effective tail buffer is the base tail plus the timing-coverage penalty plus any extra imbalance buffer, clamped from 0% to 400%. Retry or rerun uplift increases total test work before the split. Runner efficiency divides throughput, so 82% efficiency means each shard needs more elapsed time than the raw average test seconds would suggest.
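The calculation described above can be sketched as a small Python function. The function name, parameter names, and defaults here are hypothetical, and the constants simply mirror the tables in this section; the live calculator may differ in detail.

```python
import math

def estimate_wall_time(
    total_tests=2400,         # illustrative defaults, not the tool's internals
    avg_test_s=1.8,
    shards=12,
    max_concurrent=12,
    per_shard_setup_s=75.0,
    base_tail=0.05,           # timing-weighted split
    coverage_penalty=0.0012,  # per missing coverage point
    timing_coverage=90.0,     # percent
    extra_imbalance=0.0,
    retry_uplift=0.0,
    runner_efficiency=0.82,
    serial_setup_s=0.0,
    report_merge_s=0.0,
):
    """Modeled wall time in seconds: serial setup, then queued waves of
    the slowest shard job, then report merge."""
    # Retry or rerun uplift inflates total test work before the split.
    total_work_s = total_tests * avg_test_s * (1.0 + retry_uplift)
    avg_shard_s = total_work_s / shards
    # Effective tail buffer: base + coverage penalty + extra, clamped 0%..400%.
    tail = base_tail + coverage_penalty * (100.0 - timing_coverage) + extra_imbalance
    tail = min(max(tail, 0.0), 4.0)
    # Runner efficiency divides throughput, so lower efficiency stretches elapsed time.
    slowest_shard_s = avg_shard_s * (1.0 + tail) / runner_efficiency
    slowest_job_s = per_shard_setup_s + slowest_shard_s
    # Shards above the active job limit run in queued waves,
    # each paying the slowest-shard job time again.
    waves = math.ceil(shards / max_concurrent)
    return serial_setup_s + waves * slowest_job_s + report_merge_s
```

With the defaults above, one wave of twelve shards lands near nine minutes; cutting concurrency to four triples the estimate because three waves each pay the slowest job.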
| Result field | What it means | Best audit check |
|---|---|---|
| Current wall time | Modeled elapsed time for the entered shard count, including serial setup, queued waves, per-shard setup, slowest-shard test time, and report merge. | Compare it with recent CI run duration for the same suite and similar runner load. |
| Parallel speedup | Serial baseline divided by modeled wall time. | Check whether setup tax or queued waves are limiting the gain. |
| Modeled tail buffer | The slowest-shard reserve from split balance, timing coverage, and any extra imbalance. | Use shard reports to see whether the slowest shard is actually close to that reserve. |
| Runner-minute cost | The modeled cost of all shard jobs plus serial setup and report merge, expressed in runner minutes. | Use it with the scenario table before adding shards only for speed. |
| Target gap | Time inside or over the target wall time when a positive target is entered. | Treat a small overage differently from a plan that misses by a full queued wave. |
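As a rough illustration of the Runner-minute cost row, the sketch below charges every shard job its setup plus its efficiency-adjusted share of test time. Whether the tool bills the average or the slowest per-shard time here is an assumption; the numbers reuse the illustrative defaults from the Formula Core section.

```python
shards = 12
per_shard_setup_s = 75.0
avg_shard_s = 360.0          # total test seconds / shards (2400 * 1.8 / 12)
runner_efficiency = 0.82

# Runner-minute cost charges every shard job, not just the slowest one,
# so it rises with shard count even when wall time barely improves.
job_s = per_shard_setup_s + avg_shard_s / runner_efficiency
runner_minutes = shards * job_s / 60.0
```

Around 103 runner minutes for this plan, before any serial setup or report merge is added.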
Everyday Use & Decision Guide:
Start with one CI suite or job label, then keep the test unit consistent. If your splitter distributes files, use total files and average seconds per file. If it distributes cases or packages, use that same unit for both Total tests and Average test time.
Set Shards to the matrix size you plan to run and Max concurrent jobs to the runner slots that can really execute this suite at the same time. If shards exceed the concurrency limit, the result shows queued waves, which usually explains why adding shards failed to reduce feedback time.
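The queued-wave count is a simple ceiling division of shards by concurrent slots; for example, assuming the twelve-shard, four-slot case:

```python
import math

# 12 shards with only 4 concurrent runner slots finish in 3 queued waves,
# each wave paying the slowest-shard job time again.
waves = math.ceil(12 / 4)
```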
- Use Per-shard setup time for checkout, dependency restore, database boot, browser install, fixture setup, and other repeated work paid by every shard.
- Choose Timing-weighted split only when recent timing data covers most of the distributed work. Use lower Timing coverage after large test rewrites or stale timing files.
- Lower Runner efficiency when parallel shards compete for CPU, a database, browser workers, cache storage, or shared test services.
- Use Retry or rerun uplift when flaky retries or failed-test reruns consume runner time and are not already included in the average test duration.
- Use Serial setup time and Report merge time only for work that sits on the critical feedback path before or after shards run.
The scenario table is the main place to compare options. The fastest sampled shard count may not be the best plan if it burns many more runner minutes for a small wall-time gain. When a target is entered, Cost-aware target plan points to the lowest runner-minute sample that still meets the target.
Do not read a green target badge as proof that sharding is healthy. The estimate depends on the entered averages, split balance, and runner efficiency. If the next live run is slower, check the actual slowest shard, queue time, setup logs, and retry rate before increasing shard count again.
Step-by-Step Guide:
Use recent CI history first, then tune advanced fields only when the basic estimate matches the shape of real runs.
- Enter Suite name so tables, JSON, and exported rows are tied to the right CI job.
- Enter Total tests and Average test time from the same distributed unit. Do not mix case count with file-level average duration.
- Enter Per-shard setup time, then set Shards and Max concurrent jobs. Watch the wave badge to see whether the model is running one wave or several.
- Choose Split balance and Timing coverage. If timing data is new, missing, or stale, lower the coverage instead of assuming a perfect split.
- Set Runner efficiency and Target wall time. A target of 0 disables the target gate and leaves the result as a planning estimate.
- Open Advanced for extra imbalance, retry uplift, serial setup, and report merge time when real CI logs show those costs.
- Review Runtime Metrics and Shard Plan Audit before the charts. The tables explain whether the target, concurrency, split balance, setup tax, or retry load is driving the result.
- Use Shard Runtime Curve, Runner-Minute Cost Curve, the scenario table, or JSON output only after the assumptions match the CI run you intend to change.
Interpreting Results:
Current wall time is the headline estimate, but the useful diagnosis is usually in the supporting rows. A long wall time with Queued waves points to runner capacity. A long wall time with high Setup tax points to repeated setup work. A high Modeled tail buffer points to uneven split quality or missing timing data.
Shard Runtime Curve compares modeled wall time with the ideal split floor across sampled shard counts. When the modeled line flattens, more shards are adding setup and tail cost faster than they are removing work from each shard. Runner-Minute Cost Curve shows the price of those options in active runner time.
| Pattern | Likely cause | Useful follow-up |
|---|---|---|
| Queued waves | Shard count is greater than active runner capacity. | Raise runner capacity, reduce shards, or split the suite into jobs with different capacity limits. |
| Setup tax above 35% | Repeated setup is taking a large share of each slow shard job. | Cache dependencies, reuse services, move work outside shard jobs, or reduce over-sharding. |
| High tail buffer | Split balance is weak, timing coverage is low, or extra imbalance was added. | Collect fresh timing data and inspect the slowest shard before trusting a lower buffer. |
| Fastest plan costs much more | Additional shards reduce wall time but increase total runner minutes. | Use the cost-aware target plan when the target is already met. |
The calculation and exports run in the page from the values you enter. Suite names, shard assumptions, JSON output, chart images, CSV rows, and DOCX tables can still reveal CI structure when shared outside the team.
Worked Examples:
Twelve shards with enough runners:
The default-style setup of 2400 test units at 1.8 sec/test, 75 sec/shard setup, 12 shards, 12 concurrent jobs, 90% timing coverage, and 82% runner efficiency lands near a single-wave plan. The wall time is driven by the slowest shard plus setup, and the summary can compare it with the 10 min target. If the result is inside target, the next question is whether fewer shards can still meet the target with lower runner-minute cost.
Twelve shards with four runner slots:
Keeping the same suite but setting Max concurrent jobs to 4 creates three waves. Each wave pays the slowest-shard job time, so the wall time rises even though the total test work per shard is unchanged. The plan audit should call out queued waves and suggest either matching capacity to shard count or reducing shard count to the active limit.
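Back-of-the-envelope arithmetic for these two scenarios, using the illustrative constants from the Formula Core section (a sketch under those assumptions, not the tool's exact output):

```python
import math

total_s = 2400 * 1.8                  # 4320 s of raw test work
per_shard_s = total_s / 12            # 360 s per shard on average
tail = 0.05 + 0.0012 * (100 - 90)     # timing-weighted base + coverage penalty
# Slowest shard job: setup plus tail-buffered, efficiency-adjusted test time.
slowest_job_s = 75 + per_shard_s * (1 + tail) / 0.82

one_wave = math.ceil(12 / 12) * slowest_job_s    # 12 slots: about 9 minutes
three_waves = math.ceil(12 / 4) * slowest_job_s  # 4 slots: about 27 minutes
```

The per-shard work is identical in both runs; only the wave count changes, which is why the four-slot plan takes roughly three times as long.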
Count split after a large test rewrite:
A team that switches from timing data to count-based splitting after adding many new integration tests should expect a larger tail. Setting Split balance to Count or name split and lowering Timing coverage makes the estimate less optimistic. If the modeled result misses target, the first fix is usually better timing data or a manual split review, not simply more shards.
FAQ:
Should average test time include setup?
Use average test execution time when setup is entered separately. If your measured average already includes setup, set Per-shard setup time carefully so the same cost is not counted twice.
Why can more shards make the estimate worse?
Every shard repeats setup, and shards above the concurrency limit run in waves. When those costs are larger than the saved test time per shard, the wall-time curve can flatten or rise.
What does runner efficiency mean?
It is effective throughput after contention. Use 100% for isolated runners. Lower it when shards compete for CPU, memory, databases, browsers, services, or cache resources.
Does the fastest sampled plan check every possible shard count?
No. The scenario table samples a practical range around the current shard count, the concurrency limit, and nearby counts. Treat it as a planning scan, then test promising candidates in CI.
Can this replace real shard timing reports?
No. Use real shard reports to confirm the slowest shard, queue delay, retry rate, and setup time. The estimate is useful before changing CI, but live timing decides whether the assumptions were right.
Glossary:
- Shard
- One CI job, container, node, or matrix entry assigned a portion of the suite.
- Test unit
- The item your splitter distributes, such as a test case, spec file, package, or test file.
- Timing coverage
- The share of distributed work with recent timing data that can guide a balanced split.
- Tail buffer
- The reserve added for the slowest shard being heavier than the average shard.
- Setup tax
- The portion of a slow shard job spent on repeated setup instead of executing tests.
- Runner-minute cost
- The modeled total runner time consumed by all shard jobs, plus critical serial setup and report merge time.
References:
- Test splitting and parallelism, CircleCI Docs.
- Running variations of jobs in a workflow, GitHub Docs.
- CI/CD YAML syntax reference, GitLab Docs.