
Introduction

A slow CI test suite often looks like a simple parallelism problem, but the wait time at the end of a run is shaped by more than the number of shards. Test sharding divides one suite into smaller jobs so they can run at the same time. The result people usually care about is CI wall time, which is the elapsed time from the start of the suite path until the last shard and any merge work have finished.

The slowest shard sets the finish. If nine shards finish in six minutes and one shard finishes in eleven, the suite still waits eleven minutes before reporting the full result. This tail can come from a slow test file, a high-cost fixture, browser setup, database contention, stale timing history, or a split method that balances by count instead of duration.

Shard
One assigned portion of the suite, usually run as a CI job, matrix entry, container, node, or device task.
Runner slot
Capacity for one job to execute at the same time as other jobs, within the CI system's concurrency limit.
Wave
A batch of shards that can run together. Extra shards wait for the next wave when shard count exceeds runner slots.
Setup tax
Repeated per-shard work such as checkout, dependency restore, service boot, fixture setup, or browser installation.

More shards can reduce feedback time, but only while the split still has enough runner capacity and the repeated setup cost stays small compared with test execution. A 40-minute suite split into eight equal chunks sounds like a five-minute run. That estimate breaks down when each shard spends a minute restoring dependencies, when only four jobs can run at once, or when one shard receives the slowest group of tests.
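The arithmetic behind that 40-minute example can be sketched in a few lines. This is a rough illustration only: it assumes perfectly equal chunks, and the helper name `rough_wall_minutes` and its parameters are hypothetical, not part of the calculator.

```python
import math

def rough_wall_minutes(suite_min, shards, setup_min=0.0, slots=None):
    """Rough wall time: equal chunks plus per-shard setup, run in waves.

    Assumes ideal balance (no slow-shard tail). All names are illustrative.
    """
    slots = shards if slots is None else slots
    waves = math.ceil(shards / slots)          # batches needed when slots < shards
    per_shard = suite_min / shards + setup_min # every shard repeats the setup
    return waves * per_shard

naive = rough_wall_minutes(40, 8)                             # simple division
with_setup = rough_wall_minutes(40, 8, setup_min=1)           # one minute of restore per shard
four_slots = rough_wall_minutes(40, 8, setup_min=1, slots=4)  # only four jobs at once

print(naive, with_setup, four_slots)  # 5.0 6.0 12.0
```

Under these assumptions the five-minute estimate becomes six minutes with setup and twelve minutes once the eight shards have to queue in two waves, before any slow-shard tail is considered.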

Figure: Test work is split into shards, limited by runner slots, and measured as wall time plus runner-minute cost. A shard plan trades elapsed feedback time against repeated setup, slowest-shard tail, and total runner time.

Timing data is the usual defense against a long tail. A timing-aware splitter can place slow tests more evenly than a count-based split, but it only helps when recent duration data covers the files or cases being divided. Large refactors, renamed tests, new slow suites, flaky retries, and machine changes can make old timing records optimistic.

A useful sharding estimate therefore answers two questions together. One question is whether the suite can meet the feedback target. The other is how many runner minutes that target consumes. The answer remains a planning estimate, because CI schedulers, cache hits, machine sizes, shared services, and retry behavior vary, but the estimate can show when adding shards is likely to help and when setup tax or runner capacity is the real limit.

How to Use This Tool:

Model one suite or test family at a time. Use recent successful runs for the first pass, then adjust the assumptions for stale timing data, retries, queueing, and setup overhead.

  1. Enter a Suite name that identifies the CI job, suite, or test family you are sizing.
  2. Set Total tests and Average test time using the same distributed unit. If the CI job splits files, use file count and average seconds per file; if it splits cases, use case count and average seconds per case.
  3. Add Per-shard setup time for work repeated by every shard, such as checkout, dependency restore, service startup, browser setup, or fixture preparation.
  4. Choose Shards and Max concurrent jobs. When shards exceed max concurrent jobs, the result badges and audit table show queued waves.
  5. Pick Split balance and Timing coverage. Use a lower coverage value after large test additions, file renames, missing reports, or CI infrastructure changes.
  6. Set Runner efficiency below 100 percent when parallel jobs slow each other through CPU, memory, database, browser, network, or shared-service contention.
  7. Use Target wall time when the suite gates pull requests, deploys, or release checks. A zero target removes the within-target or over-target classification.
  8. Open Advanced for extra imbalance reserve, retry or rerun uplift, one-time serial setup, and post-shard report merge time.
  9. Fix any Check shard inputs alert before reading the tables. Then use Runtime Metrics, Shard Plan Audit, and Shard Scenario Table to compare the current plan with nearby shard counts.

Interpreting Results:

Current wall time is the estimated elapsed feedback time for the selected shard count. It includes one-time serial setup, queued waves, the slowest modeled shard in each wave, and report merge time. If a target is set, the summary states whether the plan is within target or over target.

Runner-minute cost is not the same as wall time. Twelve shards that each run for about nine minutes may finish in about nine minutes when all jobs run together, but they still consume about 108 runner minutes. That distinction matters when hosted-runner quotas, self-hosted fleet size, CI budget, or queue depth are the constraint.

Shard runtime outputs and interpretation checks

| Output | Meaning | Check before changing CI |
| --- | --- | --- |
| Current wall time | Modeled elapsed time for the entered shard plan. | Confirm setup, merge, retry, and concurrency assumptions against recent runs. |
| Modeled tail buffer | Extra slowest-shard reserve from split balance, timing coverage, and manual imbalance. | Compare it with real per-shard durations before trusting the estimate. |
| Target gap | Time inside or over the selected target wall time. | A disabled target removes pass or risk status, so read the scenario table instead. |
| Fastest sampled plan | The lowest wall-time option among sampled shard counts. | Check runner-minute cost and setup tax before adopting the fastest count. |
| Cost-aware target plan | The lowest runner-minute sampled plan that still meets the target. | If no sampled plan meets the target, reduce setup, improve balance, or add capacity before adding shards. |

A plan inside the target can still be wasteful if the runner-minute cost climbs faster than wall time falls. Use Shard Runtime Curve to find where the wall-time curve flattens, and use Runner-Minute Cost Curve to see where extra shards mainly add duplicated work.

Do not treat an optimistic model as proof that the CI change is safe. Re-run the suite after changing split rules, machine size, cache policy, retry behavior, or max concurrency, then compare the real shard durations with the modeled tail buffer.

Technical Details:

A shard runtime estimate starts with total test work, then applies the constraints that make parallel CI different from simple division. Total work is divided by shard count, runner efficiency reduces effective throughput under parallel load, and split imbalance inflates the slowest shard. Per-shard setup is added after the divided work because each shard repeats it.

Concurrency controls how many times that slowest-shard job duration is paid. If shard count fits inside the runner limit, the modeled suite runs in one wave. If shard count exceeds the limit, wall time pays the slowest-shard duration once per wave. Runner-minute cost, meanwhile, counts the modeled job time across all shards, so it can rise even when wall time improves.

Formula Core:

The equations use seconds until results are formatted as minutes and durations. Let N be total tests, t average seconds per test, R retry uplift percent, q shard count, c max concurrent jobs, E runner efficiency percent, P per-shard setup seconds, A serial setup seconds, M merge seconds, and B effective imbalance percent.

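$$
\begin{aligned}
W &= N \times t \times \left(1 + \frac{R}{100}\right) \\
B &= \text{base tail} + (100 - H) \times \text{coverage penalty} + X \\
S &= P + \frac{W / q}{E / 100} \times \left(1 + \frac{B}{100}\right) \\
\text{waves} &= \left\lceil \frac{q}{c} \right\rceil \\
\text{wall} &= A + \text{waves} \times S + M \\
\text{runner minutes} &= \frac{q \times S + A + M}{60}
\end{aligned}
$$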

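H is timing coverage percent and X is the extra imbalance buffer. Effective imbalance is bounded after the split-strategy formula is applied. With 2,400 tests at 1.8 seconds each, total work is 4,320 seconds before retry uplift. A timing-weighted split at 90% timing coverage with no extra buffer gives a tail reserve of 5% + 10 × 0.12% = 6.2%. At 12 shards and 82% runner efficiency, each shard carries 360 seconds of test work, which becomes about 466 seconds after efficiency and tail reserve. Adding 75 seconds of setup per shard makes the slowest shard job about 541 seconds. With all 12 shards running together and no serial setup or merge time, wall time is about 9.0 minutes and runner-minute cost is about 108.2 minutes.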
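The equations above translate directly into a short function. The sketch below is one possible reading of them, not the calculator's actual source: the function name `shard_plan`, the argument defaults, and the returned tuple layout are assumptions, and the effective imbalance B is passed in already computed.

```python
import math

def shard_plan(N, t, q, c, P, E=100.0, B=0.0, R=0.0, A=0.0, M=0.0):
    """Return (slowest_shard_sec, waves, wall_sec, runner_minutes).

    Symbols follow the Formula Core text; the function name and the
    tuple layout are illustrative assumptions.
    """
    W = N * t * (1 + R / 100)                    # total test work in seconds
    S = P + (W / q) / (E / 100) * (1 + B / 100)  # slowest modeled shard job
    waves = math.ceil(q / c)                     # batches of shards that queue
    wall = A + waves * S + M                     # elapsed feedback time
    runner_minutes = (q * S + A + M) / 60        # total runner time consumed
    return S, waves, wall, runner_minutes

# Worked example from the text: 2,400 tests, 12 shards, 12 concurrent jobs.
S, waves, wall, runner_min = shard_plan(
    N=2400, t=1.8, q=12, c=12, P=75, E=82, B=6.2
)
print(round(S), waves, round(wall / 60, 1), round(runner_min, 1))
# 541 1 9.0 108.2
```

Calling the same function with `c=3` keeps the shard job at about 541 seconds but returns four waves, which is how the concurrency limit shows up in wall time without changing runner-minute cost.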

Split balance settings and modeled imbalance rules

| Split balance | Base tail | Penalty per missing timing percent | Interpretation |
| --- | --- | --- | --- |
| Timing-weighted split | 5% | 0.12% | Recent timing data should keep shard lengths near the average. |
| File timing split | 12% | 0.18% | File-level timing helps, but large files can still dominate one shard. |
| Count or name split | 24% | 0.08% | Runtime variance is not directly balanced, so the reserve is larger. |
| Manual or high-variance split | 38% | 0.05% | Manual groups or volatile suites need inspection because one group can dominate wall time. |

Input boundaries used by the shard runtime model

| Input area | Accepted boundary | Why it matters |
| --- | --- | --- |
| Total tests, shards, max concurrent jobs | At least 1 | These values define total work, split count, active jobs, and waves. |
| Average test time | Greater than 0 seconds | Zero-duration work would make speedup and cost comparisons meaningless. |
| Timing coverage | 0% to 100% | Incomplete timing data increases the modeled tail reserve. |
| Runner efficiency | 1% to 100% | Values below 100% model slower effective throughput under parallel load. |
| Retry uplift | 0% to 500% | Models extra test work caused by flakes, reruns, or retry policies. |
| Extra imbalance buffer | 0% to 300% | Adds manual reserve for known slow files, devices, or volatile suites. |
| Setup and merge times | 0 seconds or greater | Negative overhead would hide real work and produce misleading comparisons. |
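The input boundaries can be expressed as a small validation pass. This is a sketch under assumptions: the dictionary keys and the function name `check_shard_inputs` are hypothetical and do not mirror the calculator's internal field names; only the numeric limits come from the table.

```python
def check_shard_inputs(v):
    """Return a list of problems; an empty list means the inputs pass.

    Key names are illustrative; limits follow the input boundary table.
    """
    problems = []
    for key in ("total_tests", "shards", "max_concurrent_jobs"):
        if v[key] < 1:
            problems.append(f"{key} must be at least 1")
    if v["avg_test_sec"] <= 0:
        problems.append("avg_test_sec must be greater than 0")
    ranges = {
        "timing_coverage_pct": (0, 100),
        "runner_efficiency_pct": (1, 100),
        "retry_uplift_pct": (0, 500),
        "extra_imbalance_pct": (0, 300),
    }
    for key, (lo, hi) in ranges.items():
        if not lo <= v[key] <= hi:
            problems.append(f"{key} must be between {lo} and {hi}")
    for key in ("per_shard_setup_sec", "serial_setup_sec", "merge_sec"):
        if v[key] < 0:
            problems.append(f"{key} must be 0 or greater")
    return problems

example = {
    "total_tests": 2400, "shards": 12, "max_concurrent_jobs": 12,
    "avg_test_sec": 1.8, "timing_coverage_pct": 90,
    "runner_efficiency_pct": 82, "retry_uplift_pct": 0,
    "extra_imbalance_pct": 0, "per_shard_setup_sec": 75,
    "serial_setup_sec": 0, "merge_sec": 0,
}
print(check_shard_inputs(example))  # []
```

Collecting every problem instead of stopping at the first one lets all invalid fields be reported together.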

Scenario rows are sampled rather than exhaustive for very large ranges. The current shard count, the concurrency limit, nearby counts, and broader representative counts are included so the curves show practical knee points without turning the table into thousands of rows.
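A scenario comparison of this kind can be sketched as a sweep over candidate shard counts. The candidate list below is hypothetical, since the page does not state the exact sampling rule, and the sketch assumes zero serial setup and merge time, a fixed 6.2% tail reserve, and the 4,320 seconds of work from the earlier example. `Plan`, `model`, and `pick_plans` are illustrative names.

```python
import math
from typing import NamedTuple

class Plan(NamedTuple):
    shards: int
    wall_min: float
    runner_min: float

def model(q, W=4320.0, c=12, P=75.0, E=82.0, B=6.2):
    """Model one shard count; defaults reuse the document's worked example."""
    S = P + (W / q) / (E / 100) * (1 + B / 100)  # slowest shard job, seconds
    waves = math.ceil(q / c)                     # queued batches
    return Plan(q, waves * S / 60, q * S / 60)

def pick_plans(candidates, target_min):
    """Return (fastest plan, cheapest plan that still meets the target)."""
    plans = [model(q) for q in candidates]
    fastest = min(plans, key=lambda p: p.wall_min)
    meeting = [p for p in plans if p.wall_min <= target_min]
    cost_aware = min(meeting, key=lambda p: p.runner_min) if meeting else None
    return fastest, cost_aware

fastest, cost_aware = pick_plans([4, 6, 8, 12, 16, 24], target_min=13)
print(fastest.shards, cost_aware.shards)  # 12 8
```

With a 13-minute target and 12 runner slots, 12 shards give the lowest wall time, but 8 shards still meet the target while spending about five fewer runner minutes. Counts above 12 queue into a second wave and lose ground on both measures.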

Limitations and Privacy Notes:

The estimate is based on the values entered on the page. It does not read CI logs, fetch timing history from a provider, inspect workflow files, or verify that the named suite exists.

  • Average test time, timing coverage, runner efficiency, retry uplift, and setup cost should be checked against recent CI evidence.
  • Suite names and exported assumptions can reveal internal project structure or CI capacity, so review them before sharing reports outside the team.
  • Use a real CI trial after changing split rules, machine size, caching, test selection, retry policy, or runner limits.

Advanced Tips:

  • Keep Total tests and Average test time tied to the same unit. Mixing file count with case-level timing can understate or overstate total work.
  • Separate repeated setup from test execution whenever possible. If historical timing already includes setup, lower Per-shard setup time to avoid double-counting.
  • Raise Max concurrent jobs only when the CI account or runner fleet can actually run that many jobs at once.
  • Lower Timing coverage after refactors or missing timing reports, even if the split strategy is still timing-aware.
  • Add Retry or rerun uplift when flaky tests consume runner time; fix the flakes before treating shard count as the only lever.
  • Compare Fastest sampled plan with Cost-aware target plan before changing CI, because the fastest option may spend more runner minutes than the target needs.

Worked Examples:

Balanced timing split

A backend integration suite has 2,400 test units at 1.8 seconds each. With 12 shards, 12 concurrent jobs, 75 seconds of setup per shard, 90% timing coverage, and 82% runner efficiency, the plan runs in one wave. The modeled wall time is about 9.0 minutes, while runner-minute cost is about 108.2 minutes because all 12 shard jobs consume runner time.

Concurrency bottleneck

The same 12 shards on only three runner slots run in four waves. Wall time rises to roughly four times the slowest-shard job duration, from about 9.0 minutes to about 36.1 minutes, while runner-minute cost stays at about 108.2 minutes. Adding shards without raising real concurrency may therefore add queueing instead of faster feedback.

Setup-heavy over-sharding

A small suite with long dependency restore time may look faster at high shard counts until Shard Plan Audit flags setup-heavy work. In that case, improving caching, moving work into one-time serial setup, or reducing report merge time can beat adding more shards.
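The setup-heavy pattern is easy to see with made-up numbers. The sketch below assumes a hypothetical suite with 300 seconds of test work and 120 seconds of setup per shard, ideal efficiency, no tail reserve, and enough runner slots for one wave. The function name `setup_profile` is illustrative, and the threshold the audit uses to flag setup-heavy work is not stated on the page, so none is applied here.

```python
def setup_profile(total_work_sec, shards, setup_sec):
    """Return (setup share of each shard job, wall minutes, runner minutes).

    Assumes one wave, 100% efficiency, and no imbalance (illustrative only).
    """
    job = setup_sec + total_work_sec / shards  # seconds per shard job
    return setup_sec / job, job / 60, shards * job / 60

for q in (2, 5, 10):
    share, wall_min, runner_min = setup_profile(300, q, 120)
    print(q, f"{share:.0%}", round(wall_min, 1), round(runner_min, 1))
# 2 44% 4.5 9.0
# 5 67% 3.0 15.0
# 10 80% 2.5 25.0
```

Going from 2 to 10 shards saves two minutes of wall time but nearly triples runner minutes, and by then four fifths of every shard job is setup. Halving the setup time would help every shard count at once.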

Stale timing history

After a large refactor, a timing-weighted split may look optimistic because many tests changed names or duration. Lower Timing coverage and add Extra imbalance buffer until new CI runs rebuild reliable timing history.
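The effect of lowering timing coverage can be sketched as a lookup on the split balance table. The dictionary keys and the function name `effective_imbalance` are hypothetical. The source says the effective imbalance is bounded afterward but does not give the bounds, so this sketch leaves the value unclamped.

```python
# (base tail %, penalty % per missing timing percent), from the split balance table
SPLIT_RULES = {
    "timing_weighted": (5.0, 0.12),
    "file_timing": (12.0, 0.18),
    "count_or_name": (24.0, 0.08),
    "manual": (38.0, 0.05),
}

def effective_imbalance(split, coverage_pct, extra_pct=0.0):
    """Tail reserve in percent before any bounding (bounds are not documented)."""
    base, penalty = SPLIT_RULES[split]
    return base + (100 - coverage_pct) * penalty + extra_pct

fresh = effective_imbalance("timing_weighted", coverage_pct=90)
stale = effective_imbalance("timing_weighted", coverage_pct=50, extra_pct=10)
print(round(fresh, 1), round(stale, 1))  # 6.2 21.0
```

Dropping coverage from 90% to 50% and adding a 10% manual buffer moves the reserve from 6.2% to 21%, close to what a count-based split would assume with full coverage.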

FAQ:

Why does adding shards sometimes stop reducing wall time?

The plan may be limited by max concurrent jobs, repeated setup, report merge time, runner efficiency, or the slowest-shard tail. Once those costs dominate, extra shards do less useful work.

Should average test time include setup?

Use test execution time only when setup is entered separately. If the historical average already includes setup, lower Per-shard setup time or you will count the same overhead twice.

What split balance should I choose for GitHub Actions matrix jobs?

Choose the balance that matches the script or framework assigning work to each matrix job. The matrix controls job variations and possible concurrency, while the test splitter decides whether work is balanced by timing, file timing, count, or manual groups.

Why is timing coverage separate from split balance?

Split balance describes the method. Timing coverage describes how much useful recent duration data the method has. A timing-aware splitter with missing or stale timing data still needs extra tail reserve.

Can the result predict flaky-test retries exactly?

No. Retry or rerun uplift is an assumption for extra work caused by flakes or reruns. Use recent retry rates and failure history to choose the uplift, then compare the estimate with real CI runs.

Glossary:

CI wall time
Elapsed feedback time from the start of the modeled suite path until all shard and merge work has completed.
Runner-minute cost
Total runner time consumed across shard jobs and one-time overhead.
Shard
One assigned portion of a test suite, usually run as a separate CI job or matrix entry.
Wave
A batch of shards that can run together under the available runner limit.
Tail
Extra wait time caused by the slowest shard being heavier than the average shard.
Timing coverage
The share of distributed work with useful recent duration data for timing-aware splitting.
Setup tax
The repeated setup time each shard pays before or around test execution.
