Test Shard Runtime Calculator
Calculate CI test shard wall time from test count, average duration, setup cost, split balance, runner capacity, target gap, and cost curves.
Introduction:
Test sharding splits one test suite across several CI jobs so feedback arrives sooner than it would from one long serial run. The useful question is not only how many tests exist. Runtime depends on how evenly the work is split, how much setup each shard repeats, how many jobs can actually run at once, and how much parallel execution slows the runners down.
A shard plan can look fast on paper and still miss a pull-request target when setup time dominates or runner slots are scarce. Twelve shards do not help much if only four jobs can run at a time, because the suite finishes in queued waves. Timing-based splitters can reduce the slowest-shard tail, but stale or incomplete timing data can still leave one shard carrying more work than the average.
Good shard sizing turns a noisy CI problem into a tradeoff you can inspect. The goal is usually a useful feedback time, not the largest possible shard count. A smaller plan that meets the target with fewer runner minutes may be better than the fastest sampled plan when CI minutes, shared databases, browser workers, or self-hosted runners are already under pressure.
A model cannot prove that a test splitter is configured correctly or that every test is independent. It helps compare shard counts, runner capacity, setup cost, retries, and target time before changing a CI matrix or spending time on deeper pipeline work.
Technical Details:
Shard runtime is governed by the critical path, not by total runner work alone. The total adjusted test time is divided by the shard count, then corrected for runner efficiency and the expected slowest-shard tail. The slowest shard job also pays the per-shard setup cost. If the shard count is higher than the active job limit, the same slowest-shard job time is paid once for each queued wave.
The split-balance setting supplies the base tail buffer. Timing-weighted splits start with the lowest buffer, file timing starts higher, count or name splits assume more variance, and manual or high-variance splits assume the largest tail. Timing coverage then adjusts that buffer. Lower coverage means fewer recent timing records are available, so the slowest shard deserves more reserve.
Formula Core:
The core estimate uses seconds internally. Percent fields are converted to factors before the calculation.
| Split balance | Base tail | Coverage penalty | Practical meaning |
|---|---|---|---|
| Timing-weighted split | 5% | 0.12% per missing coverage point | Recent case timing should keep the slowest shard near the average shard. |
| File timing split | 12% | 0.18% per missing coverage point | File-level timing helps, but large files can still dominate one shard. |
| Count or name split | 24% | 0.08% per missing coverage point | Equal counts can hide very different runtimes across files or cases. |
| Manual or high-variance split | 38% | 0.05% per missing coverage point | One hand-picked bucket or known hotspot can define the wall time. |
The effective tail buffer is the base tail plus the timing-coverage penalty plus any extra imbalance buffer, clamped from 0% to 400%. Retry or rerun uplift increases total test work before the split. Runner efficiency divides throughput, so 82% efficiency means each shard needs more elapsed time than the raw average test seconds would suggest.
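The calculation described above can be sketched as a small Python function. The function name, parameter names, and defaults here are hypothetical, and the constants simply mirror the tables in this section; the live calculator may differ in detail.

```python
import math

def estimate_wall_time(
    total_tests=2400,         # illustrative defaults, not the tool's internals
    avg_test_s=1.8,
    shards=12,
    max_concurrent=12,
    per_shard_setup_s=75.0,
    base_tail=0.05,           # timing-weighted split
    coverage_penalty=0.0012,  # per missing coverage point
    timing_coverage=90.0,     # percent
    extra_imbalance=0.0,
    retry_uplift=0.0,
    runner_efficiency=0.82,
    serial_setup_s=0.0,
    report_merge_s=0.0,
):
    """Modeled wall time in seconds: serial setup, then queued waves of
    the slowest shard job, then report merge."""
    # Retry or rerun uplift inflates total test work before the split.
    total_work_s = total_tests * avg_test_s * (1.0 + retry_uplift)
    avg_shard_s = total_work_s / shards
    # Effective tail buffer: base + coverage penalty + extra, clamped 0%..400%.
    tail = base_tail + coverage_penalty * (100.0 - timing_coverage) + extra_imbalance
    tail = min(max(tail, 0.0), 4.0)
    # Runner efficiency divides throughput, so lower efficiency stretches elapsed time.
    slowest_shard_s = avg_shard_s * (1.0 + tail) / runner_efficiency
    slowest_job_s = per_shard_setup_s + slowest_shard_s
    # Shards above the active job limit run in queued waves,
    # each paying the slowest-shard job time again.
    waves = math.ceil(shards / max_concurrent)
    return serial_setup_s + waves * slowest_job_s + report_merge_s
```

With the defaults above, one wave of twelve shards lands near nine minutes; cutting concurrency to four triples the estimate because three waves each pay the slowest job.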
| Result field | What it means | Best audit check |
|---|---|---|
| Current wall time | Modeled elapsed time for the entered shard count, including serial setup, queued waves, per-shard setup, slowest-shard test time, and report merge. | Compare it with recent CI run duration for the same suite and similar runner load. |
| Parallel speedup | Serial baseline divided by modeled wall time. | Check whether setup tax or queued waves are limiting the gain. |
| Modeled tail buffer | The slowest-shard reserve from split balance, timing coverage, and any extra imbalance. | Use shard reports to see whether the slowest shard is actually close to that reserve. |
| Runner-minute cost | The modeled cost of all shard jobs plus serial setup and report merge, expressed in runner minutes. | Use it with the scenario table before adding shards only for speed. |
| Target gap | Time inside or over the target wall time when a positive target is entered. | Treat a small overage differently from a plan that misses by a full queued wave. |
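As a rough illustration of the Runner-minute cost row, the sketch below charges every shard job its setup plus its efficiency-adjusted share of test time. Whether the tool bills the average or the slowest per-shard time here is an assumption; the numbers reuse the illustrative defaults from the Formula Core section.

```python
shards = 12
per_shard_setup_s = 75.0
avg_shard_s = 360.0          # total test seconds / shards (2400 * 1.8 / 12)
runner_efficiency = 0.82

# Runner-minute cost charges every shard job, not just the slowest one,
# so it rises with shard count even when wall time barely improves.
job_s = per_shard_setup_s + avg_shard_s / runner_efficiency
runner_minutes = shards * job_s / 60.0
```

Around 103 runner minutes for this plan, before any serial setup or report merge is added.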
Everyday Use & Decision Guide:
Start with one CI suite or job label, then keep the test unit consistent. If your splitter distributes files, use total files and average seconds per file. If it distributes cases or packages, use that same unit for both Total tests and Average test time.
Set Shards to the matrix size you plan to run and Max concurrent jobs to the runner slots that can really execute this suite at the same time. If shards exceed the concurrency limit, the result shows queued waves, which usually explains why adding shards failed to reduce feedback time.
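The queued-wave count is a simple ceiling division of shards by concurrent slots; for example, assuming the twelve-shard, four-slot case:

```python
import math

# 12 shards with only 4 concurrent runner slots finish in 3 queued waves,
# each wave paying the slowest-shard job time again.
waves = math.ceil(12 / 4)
```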
- Use Per-shard setup time for checkout, dependency restore, database boot, browser install, fixture setup, and other repeated work paid by every shard.
- Choose Timing-weighted split only when recent timing data covers most of the distributed work. Use lower Timing coverage after large test rewrites or stale timing files.
- Lower Runner efficiency when parallel shards compete for CPU, a database, browser workers, cache storage, or shared test services.
- Use Retry or rerun uplift when flaky retries or failed-test reruns consume runner time and are not already included in the average test duration.
- Use Serial setup time and Report merge time only for work that sits on the critical feedback path before or after shards run.
The scenario table is the main place to compare options. The fastest sampled shard count may not be the best plan if it burns many more runner minutes for a small wall-time gain. When a target is entered, Cost-aware target plan points to the lowest runner-minute sample that still meets the target.
Do not read a green target badge as proof that sharding is healthy. The estimate depends on the entered averages, split balance, and runner efficiency. If the next live run is slower, check the actual slowest shard, queue time, setup logs, and retry rate before increasing shard count again.
Step-by-Step Guide:
Use recent CI history first, then tune advanced fields only when the basic estimate matches the shape of real runs.
- Enter Suite name so tables, JSON, and exported rows are tied to the right CI job.
- Enter Total tests and Average test time from the same distributed unit. Do not mix case count with file-level average duration.
- Enter Per-shard setup time, then set Shards and Max concurrent jobs. Watch the wave badge to see whether the model is running one wave or several.
- Choose Split balance and Timing coverage. If timing data is new, missing, or stale, lower the coverage instead of assuming a perfect split.
- Set Runner efficiency and Target wall time. A target of 0 disables the target gate and leaves the result as a planning estimate.
- Open Advanced for extra imbalance, retry uplift, serial setup, and report merge time when real CI logs show those costs.
- Review Runtime Metrics and Shard Plan Audit before the charts. The tables explain whether the target, concurrency, split balance, setup tax, or retry load is driving the result.
- Use Shard Runtime Curve, Runner-Minute Cost Curve, the scenario table, or JSON output only after the assumptions match the CI run you intend to change.
Interpreting Results:
Current wall time is the headline estimate, but the useful diagnosis is usually in the supporting rows. A long wall time with Queued waves points to runner capacity. A long wall time with high Setup tax points to repeated setup work. A high Modeled tail buffer points to uneven split quality or missing timing data.
Shard Runtime Curve compares modeled wall time with the ideal split floor across sampled shard counts. When the modeled line flattens, more shards are adding setup and tail cost faster than they are removing work from each shard. Runner-Minute Cost Curve shows the price of those options in active runner time.
| Pattern | Likely cause | Useful follow-up |
|---|---|---|
| Queued waves | Shard count is greater than active runner capacity. | Raise runner capacity, reduce shards, or split the suite into jobs with different capacity limits. |
| Setup tax above 35% | Repeated setup is taking a large share of each slow shard job. | Cache dependencies, reuse services, move work outside shard jobs, or reduce over-sharding. |
| High tail buffer | Split balance is weak, timing coverage is low, or extra imbalance was added. | Collect fresh timing data and inspect the slowest shard before trusting a lower buffer. |
| Fastest plan costs much more | Additional shards reduce wall time but increase total runner minutes. | Use the cost-aware target plan when the target is already met. |
The calculation and exports run in the page from the values you enter. Suite names, shard assumptions, JSON output, chart images, CSV rows, and DOCX tables can still reveal CI structure when shared outside the team.
Worked Examples:
Twelve shards with enough runners:
The default-style setup of 2400 test units at 1.8 sec/test, 75 sec/shard setup, 12 shards, 12 concurrent jobs, 90% timing coverage, and 82% runner efficiency lands near a single-wave plan. The wall time is driven by the slowest shard plus setup, and the summary can compare it with the 10 min target. If the result is inside target, the next question is whether fewer shards can still meet the target with lower runner-minute cost.
Twelve shards with four runner slots:
Keeping the same suite but setting Max concurrent jobs to 4 creates three waves. Each wave pays the slowest-shard job time, so the wall time rises even though the total test work per shard is unchanged. The plan audit should call out queued waves and suggest either matching capacity to shard count or reducing shard count to the active limit.
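Back-of-the-envelope arithmetic for these two scenarios, using the illustrative constants from the Formula Core section (a sketch under those assumptions, not the tool's exact output):

```python
import math

total_s = 2400 * 1.8                  # 4320 s of raw test work
per_shard_s = total_s / 12            # 360 s per shard on average
tail = 0.05 + 0.0012 * (100 - 90)     # timing-weighted base + coverage penalty
# Slowest shard job: setup plus tail-buffered, efficiency-adjusted test time.
slowest_job_s = 75 + per_shard_s * (1 + tail) / 0.82

one_wave = math.ceil(12 / 12) * slowest_job_s    # 12 slots: about 9 minutes
three_waves = math.ceil(12 / 4) * slowest_job_s  # 4 slots: about 27 minutes
```

The per-shard work is identical in both runs; only the wave count changes, which is why the four-slot plan takes roughly three times as long.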
Count split after a large test rewrite:
A team that switches from timing data to count-based splitting after adding many new integration tests should expect a larger tail. Setting Split balance to Count or name split and lowering Timing coverage makes the estimate less optimistic. If the modeled result misses target, the first fix is usually better timing data or a manual split review, not simply more shards.
FAQ:
Should average test time include setup?
Use average test execution time when setup is entered separately. If your measured average already includes setup, set Per-shard setup time carefully so the same cost is not counted twice.
Why can more shards make the estimate worse?
Every shard repeats setup, and shards above the concurrency limit run in waves. When those costs are larger than the saved test time per shard, the wall-time curve can flatten or rise.
What does runner efficiency mean?
It is effective throughput after contention. Use 100% for isolated runners. Lower it when shards compete for CPU, memory, databases, browsers, services, or cache resources.
Does the fastest sampled plan check every possible shard count?
No. The scenario table samples a practical range around the current shard count, the concurrency limit, and nearby counts. Treat it as a planning scan, then test promising candidates in CI.
Can this replace real shard timing reports?
No. Use real shard reports to confirm the slowest shard, queue delay, retry rate, and setup time. The estimate is useful before changing CI, but live timing decides whether the assumptions were right.
Glossary:
- Shard
- One CI job, container, node, or matrix entry assigned a portion of the suite.
- Test unit
- The item your splitter distributes, such as a test case, spec file, package, or test file.
- Timing coverage
- The share of distributed work with recent timing data that can guide a balanced split.
- Tail buffer
- The reserve added for the slowest shard being heavier than the average shard.
- Setup tax
- The portion of a slow shard job spent on repeated setup instead of executing tests.
- Runner-minute cost
- The modeled total runner time consumed by all shard jobs, plus critical serial setup and report merge time.
References:
- Test splitting and parallelism, CircleCI Docs.
- Running variations of jobs in a workflow, GitHub Docs.
- CI/CD YAML syntax reference, GitLab Docs.