
Introduction

Ceph storage rarely turns raw drive labels into usable data at a simple ratio. A cluster can have plenty of installed terabytes and still be too tight for recovery if one host fails, if a pool needs more placement domains than the topology offers, or if OSD fill thresholds leave too little room for backfill. Capacity planning is therefore a recovery question as much as a size question.

The first distinction is raw capacity versus protected capacity. Raw capacity is the sum of the OSD space you plan to make available. Protected capacity is what remains after the pool's data protection method, operating headroom, and failure reserve are considered. A replicated pool spends capacity on complete copies of each object. An erasure-coded pool splits each object into data chunks and coding chunks, improving storage efficiency in many cases while requiring enough separate failure domains for those chunks to land safely.

Ceph capacity planning terms

| Planning term | What it controls | Common mistake |
| --- | --- | --- |
| Protection profile | Replica count or erasure-code k+m chunks. | Comparing capacity without checking the needed placement spread. |
| Failure domain | The unit expected to disappear together, such as a host, OSD, or custom block. | Using average capacity when the largest host would create the real recovery gap. |
| Nearfull threshold | The operating fill point used for the usable-capacity estimate. | Treating nearfull as a target instead of an alert boundary. |
| Placement groups | The pool shards that CRUSH maps across OSDs. | Changing PG counts from arithmetic alone without checking autoscaler guidance. |

Nearfull, backfillfull, and full are related but not interchangeable. Nearfull is an early warning that the cluster is getting crowded. Backfillfull can prevent recovery data from being placed on an OSD. Full can stop normal writes. A capacity estimate that ignores this ordering may look acceptable during a healthy day and fail exactly when a replacement OSD or host has to absorb data.

Figure: Flow from raw Ceph capacity through operating filters, recovery reserve, nearfull policy, and protection efficiency. A useful Ceph estimate holds back recovery room before applying replication or erasure-code efficiency.

Topology also changes the answer. Six equal hosts can place a 4+2 erasure-coded profile across six hosts, but a smaller host count cannot satisfy the same spread even if raw capacity is high. Mixed-capacity clusters add another wrinkle because losing the largest host matters more than losing an average host. That is why a planning number should be read with the failure-domain and spread checks beside it.

The final result is a sizing estimate, not a live cluster audit. Device classes, CRUSH rules, balancer state, pool autoscaling, snapshots, compression, quotas, and workload write patterns can all move the real usable limit. The estimate is most valuable before hardware purchase, pool creation, or protection-profile changes, where it can reveal a bad assumption while it is still cheap to fix.

How to Use This Tool:

Use the calculator as a what-if planner: enter the raw estate, choose the pool protection profile, then read the capacity and guardrail outputs together.

  1. Set Total nodes. For equal hosts, enter one Capacity per node and unit. For mixed hosts, turn on Heterogeneous nodes and fill the per-node capacity rows; blank rows count as zero.
  2. Choose a Protection preset such as Replication 3x, EC 4+2, or EC 6+3. Select Custom when you need to edit size, min_size, k, or m directly.
  3. Set Failure domain and Domains to tolerate. Host reserve uses the largest entered hosts first, OSD reserve uses average OSD size, and custom reserve uses the capacity block you enter.
  4. Enter OSDs per node, Target PG per OSD, and Data pools to produce Suggested PG per pool and Estimated PG shards per OSD.
  5. Adjust OSD nearfull target. Use the Advanced section only when the scenario needs custom Backfillfull ratio, Full ratio, OSD or metadata overhead, CRUSH imbalance skew, or Accept degraded PGs.
  6. If the summary reports topology risk, a raw shortfall, or a failed recovery check, open Recovery Briefing and Cluster Guardrails. Fix the failed spread, fault-tolerance, or nearfull assumption before treating Usable capacity as a plan.

Interpreting Results:

The headline Usable capacity is the protected data capacity available at the Nearfull used fill level. It is lower than raw storage because the estimate first deducts overhead and skew, then keeps reserve for the selected failure target, then applies the protection efficiency.

A green summary means the entered numbers satisfy the calculator's reserve, fault-tolerance, threshold-order, and spread checks. It does not prove that a live CRUSH rule, device class, balancer state, or pool autoscaler will agree.

  • Nearfull used below Requested nearfull means the recovery-safe clamp lowered the operating fill point.
  • Extra raw for requested nearfull above zero means more raw capacity is needed to keep the requested nearfull value without turning on Accept degraded PGs.
  • Minimum spread must be less than or equal to Available spread for host or OSD failure-domain models.
  • Scheme Matrix compares protection profiles on the same raw estate. A higher usable number is useful only when its fault tolerance and spread verdict also fit.

Technical Details:

Ceph pools store RADOS objects through placement groups, and CRUSH maps those placement groups to OSD sets. In a replicated pool, each placement group is placed on as many OSDs as the pool size requires. In an erasure-coded pool, each object is split into k data chunks and m coding chunks, so the pool needs enough distinct placement targets for k+m chunks.

Capacity math starts with the entered node capacities in one selected unit. The unit selector labels the numbers and results; it does not convert mixed SI and IEC values for you. Keep all entries in the same unit before comparing TB, TiB, PB, or PiB scenarios.
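Because the unit selector only labels values, any conversion has to happen before entry. The helper below is a minimal sketch of that conversion. The `UNIT_BYTES` table and `convert` function are illustrative names, not part of the calculator.

```python
# Bytes per unit: decimal (SI) units use powers of 10, binary (IEC) units powers of 2.
UNIT_BYTES = {
    "TB": 10**12,
    "TiB": 2**40,
    "PB": 10**15,
    "PiB": 2**50,
}


def convert(value, from_unit, to_unit):
    """Convert a capacity figure between SI and IEC units."""
    return value * UNIT_BYTES[from_unit] / UNIT_BYTES[to_unit]


# A drive marketed as 10 TB holds roughly 9.09 TiB.
print(f"{convert(10, 'TB', 'TiB'):.2f} TiB")
```

Convert every node capacity to one unit this way before entering it, then select the matching unit label.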

The recovery reserve is a raw-domain estimate. Host reserve sums the largest entered hosts first, OSD reserve multiplies the selected failure count by average OSD size, and custom reserve multiplies the failure count by the custom block size. That reserve is compared against raw post-overhead capacity when calculating the recommended nearfull point.
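The three reserve rules can be sketched as one function. This is an illustration under stated assumptions rather than the calculator's code: the function name and parameters are hypothetical, and average OSD size is assumed to be the raw total divided by nodes times OSDs per node.

```python
def raw_reserve(node_caps, failures, domain="host", osds_per_node=1, custom_block=0.0):
    """Raw capacity held back for `failures` simultaneous failure-domain losses."""
    if domain == "host":
        # Largest hosts first: the worst case is losing the biggest nodes.
        return sum(sorted(node_caps, reverse=True)[:failures])
    if domain == "osd":
        # Assumption: average OSD size = raw total / (nodes * OSDs per node).
        avg_osd = sum(node_caps) / (len(node_caps) * osds_per_node)
        return failures * avg_osd
    if domain == "custom":
        return failures * custom_block
    raise ValueError(f"unknown failure domain: {domain}")


mixed = [18, 18, 10, 10, 10, 10]
print(raw_reserve(mixed, 1))                       # one host: the largest, 18
print(raw_reserve([10] * 6, 2, "osd", osds_per_node=8))  # two average OSDs
```

Sorting before summing is what makes the host reserve follow the largest host instead of the average one in a mixed fleet.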

Formula Core

The main capacity path computes effective raw capacity, chooses the nearfull value used for the estimate, and applies protection efficiency.

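$$
\begin{aligned}
\text{Raw effective} &= \Big(\sum_{i \in \text{nodes}} C_i\Big) \times (1 - O) \times (1 - S) \\
\text{Recommended nearfull} &= 1 - \frac{R}{\text{Raw effective}} \\
\text{Nearfull used} &= \min(\text{Requested nearfull},\ \text{Recommended nearfull}) \\
\text{Usable capacity} &= \text{Raw effective} \times \text{Nearfull used} \times E
\end{aligned}
$$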

In the formula, C is each entered node capacity, O is OSD or metadata overhead, S is CRUSH imbalance skew, R is the selected raw-domain reserve, and E is protection efficiency. When Accept degraded PGs is on, Nearfull used keeps the requested nearfull value instead of applying the recovery-safe minimum.
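The capacity path described above can be written as a short function. This is a minimal sketch that follows the stated formulas. The function and parameter names are hypothetical, and the guard against a non-positive raw total is an added assumption.

```python
def usable_capacity(node_caps, reserve, efficiency, requested_nearfull=0.85,
                    overhead=0.0, skew=0.0, allow_degraded=False):
    """Return (usable capacity, nearfull used) in the same unit as node_caps."""
    raw_effective = sum(node_caps) * (1 - overhead) * (1 - skew)
    if raw_effective <= 0:
        # Assumption: treat an empty estate as an input error.
        raise ValueError("raw effective capacity must be positive")
    recommended = 1 - reserve / raw_effective
    if allow_degraded:
        # Accept degraded PGs: keep the requested value, skip the clamp.
        nearfull_used = requested_nearfull
    else:
        nearfull_used = min(requested_nearfull, recommended)
    return raw_effective * nearfull_used * efficiency, nearfull_used


# Six 10 TB hosts, Replication 3x, one host (10 TB) held in reserve.
usable, nearfull = usable_capacity([10] * 6, reserve=10, efficiency=1 / 3)
print(f"nearfull used {nearfull:.2%}, usable {usable:.2f} TB")
```

With these inputs the clamp lowers the 85% request to about 83.33%, and the usable figure comes out near 16.67 TB.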

Ceph protection formulas and placement requirements

| Protection model | Efficiency | Healthy fault tolerance | Minimum spread |
| --- | --- | --- | --- |
| Replication n | 1 / n | n - 1 domains | n domains |
| Erasure coding k+m | k / (k + m) | m domains | k + m domains |
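The table translates directly into two small constructors. The `Profile` class and function names below are illustrative, not taken from the calculator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    efficiency: float      # fraction of used raw capacity that holds data
    fault_tolerance: int   # domains that can be lost while healthy
    min_spread: int        # distinct placement domains required


def replicated(n):
    """Replication n: n full copies across n domains."""
    return Profile(efficiency=1 / n, fault_tolerance=n - 1, min_spread=n)


def erasure_coded(k, m):
    """Erasure coding k+m: k data chunks plus m coding chunks."""
    return Profile(efficiency=k / (k + m), fault_tolerance=m, min_spread=k + m)


print(replicated(3))
print(erasure_coded(4, 2))
```

Comparing the two profiles side by side shows the trade: EC 4+2 doubles the efficiency of Replication 3x but needs six placement domains instead of three.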

Placement group planning is a sizing hint, not a replacement for Ceph autoscaling. The calculator rounds the suggested pool value to a power of two after estimating how many PG shards each OSD would carry.

$$
\text{Suggested PG per pool} = \text{round power of two}\left(\frac{\text{OSDs} \times \text{Target PG per OSD}}{\text{Data pools} \times \text{Shard factor}}\right)
$$

The shard factor is the replica count for replicated pools and k for erasure-coded pools. Estimated PG shards per OSD then multiplies the suggested pool count by data pools and shard factor, and divides by the estimated OSD count.
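A minimal sketch of the PG arithmetic follows. The source does not say how "round power of two" breaks ties or which direction it rounds, so this version assumes the nearest power of two on a log scale. The function names are hypothetical.

```python
import math


def suggested_pg_per_pool(osds, target_pg_per_osd, data_pools, shard_factor):
    """Suggested PG count for one pool, rounded to a power of two."""
    raw = osds * target_pg_per_osd / (data_pools * shard_factor)
    if raw < 1:
        return 1
    # Assumption: nearest power of two, measured on a log2 scale.
    return 2 ** round(math.log2(raw))


def pg_shards_per_osd(pg_per_pool, data_pools, shard_factor, osds):
    """Estimated PG shards each OSD carries after rounding."""
    return pg_per_pool * data_pools * shard_factor / osds


# 6 nodes x 8 OSDs, target 100 PGs per OSD, one replicated pool with size 3.
pgs = suggested_pg_per_pool(osds=48, target_pg_per_osd=100, data_pools=1, shard_factor=3)
print(pgs, pg_shards_per_osd(pgs, data_pools=1, shard_factor=3, osds=48))
```

Here the unrounded value is 1600, which rounds up to 2048 and yields 128 shards per OSD, above the 100 target. That gap is one reason to confirm the figure against the autoscaler.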

Ceph validation and guardrail boundaries

| Check | Boundary used | Meaning when it fails |
| --- | --- | --- |
| Failure target fit | Domains to tolerate <= healthy fault tolerance | The selected profile cannot stay healthy through that many losses. |
| Spread fit | Available spread >= minimum spread | The chosen failure-domain model has too few placement targets. |
| Nearfull reserve | Requested nearfull <= recommended nearfull | The requested fill target leaves too little recovery reserve. |
| Threshold ordering | Nearfull < backfillfull < full | The calculation raises backfillfull and full as needed to keep the order valid. |
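The four boundaries can be expressed as plain comparisons. The sketch below only reports whether the entered values pass. It does not reproduce how the calculator raises backfillfull and full, since the source does not give that rule. All names are hypothetical.

```python
def guardrail_checks(tolerate, fault_tolerance, available_spread, min_spread,
                     requested_nearfull, recommended_nearfull,
                     backfillfull, full):
    """Return a pass/fail flag for each guardrail boundary."""
    return {
        "failure_target_fit": tolerate <= fault_tolerance,
        "spread_fit": available_spread >= min_spread,
        "nearfull_reserve": requested_nearfull <= recommended_nearfull,
        # Reports the entered order only; no automatic adjustment is modeled.
        "threshold_ordering": requested_nearfull < backfillfull < full,
    }


# Six hosts, Replication 3x, one host to tolerate, 85% requested nearfull.
checks = guardrail_checks(
    tolerate=1, fault_tolerance=2,
    available_spread=6, min_spread=3,
    requested_nearfull=0.85, recommended_nearfull=1 - 10 / 60,
    backfillfull=0.90, full=0.95,
)
for name, ok in checks.items():
    print(f"{name}: {'pass' if ok else 'fail'}")
```

In this scenario only the nearfull reserve check fails, because the 85% request exceeds the recommended value of about 83.33%.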

A six-node cluster with 10 TB per node, Replication 3x, one host reserve, no overhead, and no skew has 60 TB raw effective capacity. One host reserve is 10 TB, so recommended nearfull is 1 - 10 / 60 = 83.33%. Usable capacity is then 60 × 0.8333 × 1/3, or about 16.67 TB. The same reserve under EC 4+2 applies 4/6 efficiency and produces about 33.33 TB, but only when six suitable placement domains exist.

Accuracy Notes:

The estimate uses the values you enter. It does not read live OSD utilization, CRUSH weights, device classes, pool quotas, snapshots, compression ratios, balancer state, or autoscaler recommendations.

  • Use OSD or metadata overhead when marketed drive size differs from usable OSD capacity after formatting and BlueStore overhead.
  • Use CRUSH imbalance skew when mixed devices, uneven weights, or known hot OSDs make ideal distribution too optimistic.
  • Treat Accept degraded PGs as an emergency what-if. It keeps the requested nearfull target even when the recovery-safe clamp would lower it.
  • Confirm Suggested PG per pool with live Ceph autoscaler status before changing production pg_num.

Worked Examples:

A six-host replicated pool with Total nodes set to 6, Capacity per node set to 10 TB, Replication 3x, one host in Domains to tolerate, and 85% requested nearfull reports Nearfull used near 83.33%. Usable capacity is about 16.67 TB because one 10 TB host has to remain outside the used raw portion.

The same six hosts under EC 4+2 can show roughly double the Usable value in Scheme Matrix because efficiency changes from 1/3 to 4/6. The comparison is acceptable only if Minimum spread is 6 and Available spread is also at least 6.

A mixed fleet with two 18 TB hosts and four 10 TB hosts should use Heterogeneous nodes. If the plan shows Extra raw for requested nearfull, lower OSD nearfull target, add raw capacity, or reduce the tolerated-domain target before relying on the headline usable number.
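The mixed-fleet case can be checked by hand with the formulas from Technical Details. The sketch below assumes Replication 3x, one host to tolerate, an 85% requested nearfull, and no overhead or skew. None of those settings are stated for this example, so treat the numbers as illustrative.

```python
node_caps = [18, 18, 10, 10, 10, 10]   # TB per host
requested_nearfull = 0.85
efficiency = 1 / 3                     # assumed Replication 3x

raw = sum(node_caps)                   # 76 TB, no overhead or skew assumed
reserve = max(node_caps)               # one host lost: the largest, 18 TB
recommended = 1 - reserve / raw
nearfull_used = min(requested_nearfull, recommended)
usable = raw * nearfull_used * efficiency

print(f"recommended nearfull {recommended:.1%}")   # 76.3%
print(f"usable {usable:.2f} TB")                   # 19.33
```

Reserving the 18 TB host instead of an average host pulls the recommended nearfull well below the 85% request, which is the condition the calculator flags before the headline number can be trusted.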

FAQ:

Why is usable capacity lower than raw storage?

The calculator reduces raw storage by overhead, skew, nearfull policy, failure reserve, and protection efficiency. Replication 3x turns only one third of the used raw capacity into protected data.

Does changing TB to TiB convert my numbers?

No. The selected unit labels the entered capacities and the results. Convert the numbers yourself before comparing decimal units such as TB with binary units such as TiB.

What should I do when spread fails?

Compare Minimum spread with Available spread. Add placement domains, choose a protection profile with a smaller minimum spread (a lower replica count or a smaller k+m), or verify the custom CRUSH path manually if the failure domain is custom.

Why is requested nearfull different from nearfull used?

Nearfull used is clamped down when the requested nearfull value leaves too little recovery reserve. Accept degraded PGs disables that clamp for emergency what-if modeling.

Can I use the PG result directly in production?

Use Suggested PG per pool as a planning starting point only. Check Ceph autoscaler status and the live pool's CRUSH rule before changing production placement groups.

Glossary:

OSD
An Object Storage Daemon that stores Ceph data and participates in placement, recovery, and rebalancing.
Placement group
A pool shard that maps many objects to a set of OSDs through CRUSH.
CRUSH
The Ceph placement system that chooses OSDs according to topology, weights, and rules.
Nearfull
The operating fill threshold used for the capacity estimate and early warning logic.
min_size
The minimum replica or chunk count needed for degraded I/O to continue.
Failure domain
The unit expected to fail together, such as a host, one OSD, or a custom capacity block.
