Kubernetes Cluster Capacity Calculator
Calculate Kubernetes replica capacity online from allocatable node resources, pod requests, rollout surge, PDB rules, and failure tolerance before scaling.
Introduction:
Kubernetes capacity planning is the work of turning node resources into a replica count that still holds during rollouts and failures. CPU, memory, pod slots, local ephemeral storage, and pod IPs can each become the first limit. A service can look comfortable on healthy nodes and still fail a rollout when surge pods, a PodDisruptionBudget, or a lost zone removes part of the scheduler runway.
Replica capacity is useful only when it matches the way the workload is actually scheduled. Requests, DaemonSet overhead, reserved headroom, and topology spread all reduce the pool available to application pods. That makes the practical question more specific than total cluster size: how many replicas can run at the selected pod shape while the cluster also absorbs the selected disruption model?
The result is a planning ceiling, not a guarantee that every future pod will place cleanly. Real clusters also depend on taints, affinities, quotas, storage classes, autoscaler behavior, and per-zone imbalance. Treat the number as a conservative deployment target to compare against live scheduler and autoscaler evidence.
Technical Details:
The capacity model starts with allocatable node resources, subtracts DaemonSet overhead, then applies reserve, utilization, topology-spread, and fragmentation factors. The smallest remaining resource budget becomes the steady pod capacity. CPU, memory, and pod slots are always modeled; ephemeral storage and subnet IPs join the limiting set only when non-zero inputs make them meaningful.
Rollout safety is evaluated separately from steady placement. A rolling update with maxSurge needs extra temporary pods, and a PodDisruptionBudget can require a minimum number of replicas to stay available after failures. The safe replica target is the smallest of the healthy cluster, selected node-loss, largest-zone-loss, and PDB-derived targets.
| Quantity | Rule | Why it matters |
|---|---|---|
| Effective pod request | pod request + pod overhead | A pod must fit after runtime overhead is included. |
| Steady capacity | Minimum of CPU, memory, pod slots, optional storage, and optional pod IP budget | The first exhausted resource sets the ceiling. |
| Rollout peak | replicas + ceil(replicas * maxSurge) | The cluster must hold temporary surge pods. |
| PDB floor | ceil(replicas * minAvailable%) | Availability policy can bind before raw resources do. |
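Assembled together, the rules in the table can be sketched as a small Python calculation. The helper names and the three-scenario set below are illustrative assumptions, not the tool's actual internals:

```python
import math

def pods_per_node(cpu, mem, slots, pod_cpu, pod_mem):
    """The first exhausted per-node budget sets the pod count."""
    return min(int(cpu // pod_cpu), int(mem // pod_mem), slots)

def max_replicas_with_surge(capacity, max_surge):
    """Largest replica count whose rollout peak, replicas +
    ceil(replicas * maxSurge), still fits in `capacity` pods."""
    r = capacity
    while r > 0 and r + math.ceil(r * max_surge) > capacity:
        r -= 1
    return r

def safe_target(nodes, node_loss, zone_loss_nodes, per_node, max_surge):
    """Smallest of the healthy, node-loss, and largest-zone-loss targets."""
    scenarios = (nodes, nodes - node_loss, nodes - zone_loss_nodes)
    return min(max_replicas_with_surge(n * per_node, max_surge)
               for n in scenarios)
```

For example, `safe_target(6, 1, 2, 45, 0.25)` evaluates a healthy six-node cluster against a one-node loss and a two-node zone loss; the four-node zone-loss scenario binds and returns 144 even though the healthy cluster holds 270 pods.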
The tool clamps impossible inputs rather than letting them produce misleading numbers. For example, availability zones cannot exceed worker nodes, DaemonSet pod count cannot exceed maxPods, and node-loss drills are reduced so at least one node remains. Warnings call out cases where a single pod may not fit, DaemonSet overhead collapses a resource budget, or subnet IPs leave no pod runway.
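A minimal sketch of that clamping behavior, with illustrative parameter names rather than the tool's own:

```python
def clamp_inputs(nodes, zones, daemonset_pods, max_pods, node_loss):
    """Clamp impossible inputs instead of producing misleading numbers."""
    zones = min(zones, nodes)                       # zones cannot exceed worker nodes
    daemonset_pods = min(daemonset_pods, max_pods)  # DaemonSet pods fit under maxPods
    node_loss = min(node_loss, nodes - 1)           # at least one node must remain
    return zones, daemonset_pods, node_loss
```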
Everyday Use & Decision Guide:
Start with the workload profile closest to the deployment, then replace the node and pod request values with numbers from the target cluster. The built-in profiles are only seeds; the useful result comes from matching real allocatable CPU, memory, maxPods, DaemonSet tax, and average pod requests.
- Use `Recommended replicas` as the first publication ceiling, not the healthy-only steady pod count.
- Check `Binding constraint` before buying nodes. Memory, pod slots, PDB, or subnet IPs often explain the shortage better than CPU.
- Set `Desired safe replicas` when you need a node-count path to a known target.
- Leave a non-zero storage request only when workloads declare local ephemeral-storage requests.
A high ceiling does not prove that all pods will schedule. Verify the same target with live namespace quotas, affinity rules, autoscaler limits, and per-zone node groups before changing production replicas.
Step-by-Step Guide:
- Choose a workload profile or keep `Custom`, then enter worker nodes, allocatable CPU, allocatable memory, and `maxPods`.
- Enter average pod CPU and memory requests, plus ephemeral-storage requests only if Kubernetes schedules that resource for the workload.
- Open advanced settings for DaemonSet overhead, reserve, topology spread, rollout `maxSurge`, PDB `minAvailable`, and node failures to tolerate.
- Review `Capacity Metrics` for the recommended replica ceiling and the first limiting resource.
- Use `Failure Scenarios` and `Scale Path` to see whether the rollout peak survives selected node or zone loss.
- If warnings appear, fix impossible inputs or lower the target before treating the plan as publishable.
Interpreting Results:
The safest read is the smallest safe target shown after rollout and failure policy are applied. Steady capacity is still useful, but it does not include the full rollout peak or PDB pressure by itself.
A row marked Broken or Tight, or one showing negative peak slack, is a stop signal. Add nodes, lower requests, reduce `maxSurge`, revisit the PDB, or expand pod-subnet IP space before approving a larger replica count.
Worked Examples:
Stateless service. Six nodes with 8 vCPU, 32 GiB memory, 110 pod slots, 15% reserve, and 0.15 vCPU pods may look roomy. If the largest-zone outage leaves four nodes and maxSurge is 25%, the safe target can be much lower than the healthy steady pod count because rollout peak pods must fit after the zone loss.
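Under simplifying assumptions (CPU is the only binding budget, no DaemonSet overhead, and the largest zone holds two of the six nodes), the arithmetic for this example looks like:

```python
import math

nodes, cpu, reserve, pod_cpu = 6, 8.0, 0.15, 0.15
per_node = int((cpu * (1 - reserve)) // pod_cpu)  # 6.8 usable vCPU -> 45 pods
healthy = nodes * per_node                        # 270 pods on healthy nodes
after_zone_loss = 4 * per_node                    # 180 pods on four nodes

# Largest replica count whose 25% surge peak still fits after zone loss
replicas = after_zone_loss
while replicas + math.ceil(replicas * 0.25) > after_zone_loss:
    replicas -= 1
```

The safe target of 144 replicas is roughly half the 270-pod healthy steady count, which is why a rollout during a zone outage is the binding scenario here.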
Service mesh workload. Raising DaemonSet CPU and memory overhead for sidecars and node agents can move the limiter from pod slots to memory. The Capacity Levers rows show whether adding nodes, lowering pod memory, or reducing platform overhead buys the most safe replicas.
Subnet bottleneck. When usable pod-subnet IPs are entered, a high CPU and memory budget can still fail. If rollout peak is short by pod IPs, the next fix is subnet or CNI planning rather than smaller container requests.
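How a non-zero IP ceiling joins the limiting set can be sketched as follows; this is an illustrative helper, not the tool's code:

```python
def steady_pods(cpu_budget, mem_budget, pod_slots, usable_ips=0):
    """Return (capacity, limiter). IPs only count as a budget when a
    non-zero usable pod-IP ceiling is entered."""
    budgets = {"cpu": cpu_budget, "memory": mem_budget, "pod slots": pod_slots}
    if usable_ips > 0:
        budgets["pod IPs"] = usable_ips
    limiter = min(budgets, key=budgets.get)
    return budgets[limiter], limiter
```

With ample CPU and memory but only 256 usable IPs, `steady_pods(400, 500, 330, 256)` reports pod IPs as the limiter, pointing at subnet or CNI planning rather than smaller container requests.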
FAQ:
Does this replace scheduler testing?
No. It estimates a safe ceiling from resource requests and policy inputs. Live scheduling can still fail because of affinity, taints, quotas, storage, or autoscaler limits.
Why can the recommended replicas be lower than steady capacity?
Rollout surge, selected node loss, largest-zone loss, and PDB minAvailable can all require extra room beyond steady-state pod placement.
When should pod IP budget be enabled?
Use it when the pod network has a known usable IP ceiling. Leave it at zero when IP allocation is not the limiting planning constraint.
Glossary:
- Allocatable
- Node resource available to pods after Kubernetes and system reservations.
- DaemonSet overhead
- Per-node pods and resource requests consumed before application pods are placed.
- Rollout peak
- The temporary pod count created by replicas plus `maxSurge`.
- PodDisruptionBudget
- A policy that limits voluntary disruption by requiring a minimum number of pods to remain available.