Projected Usable Capacity

[Interactive calculator: enter per-node capacity and node count (or a manual list for mixed nodes), choose a protection scheme, and set nearfull, overhead, skew, and failure tolerance. The results panel reports nodes, raw storage, overhead, raw post-overhead, effective and recommended nearfull, protection, protected efficiency, efficiency @ min_size, usable capacity, redundancy overhead, and reserved/free raw.]

Introduction:

Ceph stores objects across many nodes using replication or erasure coding to survive failures while distributing load. Usable capacity differs from raw capacity because space is reserved for redundancy, metadata, and rebalancing. Planning therefore requires converting node-level figures into protected efficiency, then applying operational headroom so recovery can complete within your fault-tolerance goals.

This tool models that process from first principles. You enter homogeneous or heterogeneous node capacities, choose a protection scheme, and set operational factors such as nearfull safety, service overhead, placement skew, and failure tolerance. The reactive engine calculates raw post-overhead capacity, recommends a safe nearfull threshold, and derives usable capacity plus a concise breakdown.

Use it when designing a new cluster, validating an expansion, or explaining trade-offs to stakeholders. For example, five 8 TB nodes at 3× replication often yield roughly 10 TB usable after headroom. Caution: it estimates space, not performance; leave margin for background repairs, snapshots, and upgrades.

Technical Details:

Concept Overview

Replication stores r full copies of each object; erasure coding splits data into k data and m parity chunks, reconstructable from any k. Protected efficiency expresses payload as a fraction of consumed raw space. Operational factors include filesystem/OSD overhead, a nearfull ratio limiting usable raw to preserve recovery headroom, and placement skew reflecting CRUSH imbalance. A failure domain defines the blast radius considered in headroom.
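
In code form, the two efficiency terms are one-liners. A minimal TypeScript sketch (function names are illustrative, not the tool's actual API):

```ts
// Protected efficiency: payload stored per byte of raw space consumed.
function replicationEfficiency(r: number): number {
  return 1 / r; // e.g. r = 3 -> 0.3333
}

function erasureCodingEfficiency(k: number, m: number): number {
  return k / (k + m); // e.g. k = 4, m = 2 -> 0.6667
}
```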

Core Equations

Raw post-overhead:
R = R_raw × (1 − o)

Protected efficiency:
E = 1 / r (replication)
E = k / (k + m) (erasure coding)

Recommended nearfull:
n = min(0.95, max(0.50, 1 − min(1, (f × D) / R)))

Usable capacity:
U = R × n × E × (1 − s)

  • R_raw: sum of node capacities; o: overhead fraction; R: raw post-overhead.
  • r: replicas; k: data chunks; m: parity chunks; E: protected efficiency.
  • n: effective nearfull; f: tolerated failure domains; D: capacity of one domain.
  • s: skew fraction modeling placement imbalance; U: usable capacity.
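
These equations translate directly into code. The sketch below mirrors them one-for-one under the symbol definitions above; the `CapacityInputs` interface and function names are illustrative assumptions, not the calculator's internal API.

```ts
// Illustrative capacity pipeline mirroring the Core Equations.
interface CapacityInputs {
  rawTB: number;             // R_raw: sum of node capacities
  overhead: number;          // o: filesystem/metadata overhead fraction
  efficiency: number;        // E: 1/r or k/(k+m)
  skew: number;              // s: CRUSH placement-skew fraction
  failDomains: number;       // f: tolerated failure-domain losses
  domainTB: number;          // D: capacity of one (largest) domain
  requestedNearfull: number; // user-requested nearfull ratio
}

function usableCapacity(p: CapacityInputs) {
  const R = p.rawTB * (1 - p.overhead); // raw post-overhead
  const recommended = Math.min(
    0.95,
    Math.max(0.50, 1 - Math.min(1, (p.failDomains * p.domainTB) / R)),
  );
  const n = Math.min(p.requestedNearfull, recommended); // effective nearfull
  const U = R * n * p.efficiency * (1 - p.skew);        // usable capacity
  return { R, recommended, n, U };
}
```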

Interpretation & Output Semantics

| Metric | Meaning | Implication |
|---|---|---|
| Usable | Estimated payload space after protection and headroom. | Primary figure for planning dataset growth. |
| Redundancy | Raw consumed to achieve durability. | Grows with higher replication or parity. |
| Reserved | Raw left unused due to nearfull headroom. | Enables rebalance and repair after failures. |
| Protected efficiency | Payload divided by consumed raw. | Equals 1/replicas or k/(k+m). |
| Efficiency @ min_size | Payload efficiency when operating at the minimum healthy size. | Repairs may temporarily reduce efficiency. |
| Recommended nearfull | Ceiling ensuring enough free space to heal tolerated failures. | Lower when domains are large or many. |

Use the protected efficiency and recommended nearfull together; raising one typically lowers the other. A higher headroom percentage trades capacity for faster, safer recovery.
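
The redundancy and reserved rows follow from U, E, and R by simple accounting. A sketch, assuming skew losses are folded into the reserved figure (the tool may bucket them differently):

```ts
// Derive breakdown rows from usable capacity U, protected efficiency E,
// and raw post-overhead R. Illustrative accounting, not the tool's code.
function breakdown(U: number, E: number, R: number) {
  const consumedRaw = U / E;            // raw needed to hold U with protection
  const redundantRaw = consumedRaw - U; // copies/parity beyond the payload
  const reservedRaw = R - consumedRaw;  // headroom left free by nearfull (and skew)
  return { redundantRaw, reservedRaw };
}
// With U = 10, E = 1/3, R = 38: redundantRaw = 20, reservedRaw = 8.
```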

Variables & Parameters

| Parameter | Meaning | Unit/Datatype | Typical Range | Notes |
|---|---|---|---|---|
| Capacity per node | Homogeneous node size. | MB/GB/TB/PB or MiB/GiB/TiB/PiB | 1–100 TB | Use the manual list for mixed nodes. |
| Total nodes | Node count when homogeneous. | Integer | 3–1000 | Minimum depends on chosen protection. |
| Node capacities | Comma/space list for heterogeneous clusters. | Numbers | Varies | Parsed as a set of node sizes. |
| Protection mode | Replication or erasure coding. | Enum | | Choose to match pools. |
| Replicas (r) | Full copies stored. | Integer ≥ 1 | 2–4 | Efficiency = 1/r. |
| Rep min_size | Minimum healthy copies. | Integer | 2–3 | Lower implies degraded risk. |
| EC k, m | Data and parity chunks. | Integers | k = 4–12, m = 2–6 | Efficiency = k/(k+m). |
| EC min_size | Minimum chunks to serve IO. | Integer ≥ k | k to k+m | Lower increases risk under failures. |
| Nearfull | Allowed fullness before backoff. | Fraction | 0.50–0.95 | Clamped to recommendation unless overridden. |
| OSD/metadata overhead | Non-payload storage. | Fraction | 0–0.15 | Accounts for filesystems and metadata. |
| CRUSH skew | Imbalance across devices. | Fraction | 0–0.15 | Reduces usable to reflect hot/overfull nodes. |
| Failure domain | Unit used for headroom. | Host, OSD, or custom | | Domain capacity multiplies tolerated failures. |
| Domains to tolerate | Failures healed without reweighting. | Integer ≥ 0 | 0–2 | Higher values reduce nearfull. |
| Accept degraded | Allow operating above recommendation. | Boolean | Off/On | Risky under multiple failures. |
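
A sketch of input sanitization matching the typical ranges above; the clamping rules here are assumptions, and the tool's actual validation may differ:

```ts
// Clamp user inputs to the documented typical ranges (illustrative).
const clamp = (x: number, lo: number, hi: number) =>
  Math.min(hi, Math.max(lo, x));

function sanitizeInputs(raw: {
  nearfull: number; overhead: number; skew: number; replicas: number;
}) {
  return {
    nearfull: clamp(raw.nearfull, 0.50, 0.95), // fraction, 0.50–0.95
    overhead: clamp(raw.overhead, 0, 0.15),    // fraction, 0–0.15
    skew: clamp(raw.skew, 0, 0.15),            // fraction, 0–0.15
    replicas: Math.max(1, Math.round(raw.replicas)), // integer >= 1
  };
}
```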

Worked Example

Example (5 nodes × 8 TB, replication r=3, overhead o=0.05, nearfull request 0.80, tolerate f=1 host, domain D=8 TB, skew s=0):

R = 40 × (1 − 0.05) = 38
n_rec = 1 − 8/38 = 0.7895
n = min(0.80, 0.7895) = 0.7895
E = 1/3 = 0.3333
U = 38 × 0.7895 × 0.3333 × (1 − 0) ≈ 10.0 TB

Redundancy ≈ 20 TB; reserved headroom ≈ 8 TB.
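
Feeding the same inputs into the `usableCapacity` sketch from Core Equations reproduces these figures:

```ts
// Reproduces the worked example using the illustrative sketch above.
const r = usableCapacity({
  rawTB: 40, overhead: 0.05, efficiency: 1 / 3, skew: 0,
  failDomains: 1, domainTB: 8, requestedNearfull: 0.80,
});
// r.R === 38, r.recommended ≈ 0.7895, r.n ≈ 0.7895, r.U ≈ 10.0 (TB)
```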

Assumptions & Limitations

  • Headroom is based on the largest domain and tolerated failures, not rack-level power or network constraints.
  • Skew models placement imbalance, not transient spikes or IO hotspots.
  • Nearfull override accepts degraded service and may prolong recovery under additional failures.
  • Overhead parameter aggregates filesystem, small-object effects, and metadata; it is not a precise filesystem model.

Edge Cases & Error Sources

  • Highly heterogeneous nodes lower usable space because headroom is sized by the largest domain.
  • Very small clusters may violate placement rules, making the efficiency at min_size optimistic.
  • Incorrect units (TB vs TiB) lead to consistent scaling errors across all outputs.
  • Setting min_size equal to k+m allows service but eliminates parity tolerance, increasing risk during recovery (a guard sketch follows this list).
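
The min_size pitfall in the last point can be checked mechanically. A hypothetical guard (not the tool's actual validation):

```ts
// Warn about risky erasure-coding min_size settings (illustrative).
// Valid range is [k, k+m]; min_size = k+m means any single chunk loss
// blocks IO until recovery completes.
function checkEcMinSize(k: number, m: number, minSize: number): string[] {
  const warnings: string[] = [];
  if (minSize < k || minSize > k + m) {
    warnings.push(`min_size must be between ${k} and ${k + m}`);
  } else if (minSize === k + m) {
    warnings.push("min_size = k+m leaves no chunk-loss tolerance while serving IO");
  }
  return warnings;
}
```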

Scientific Validity & References

Equations align with standard Ceph concepts for replica count, erasure coding parameters, placement via CRUSH, nearfull thresholds, and pool min_size. See the Ceph documentation on replication, erasure coding profiles, and cluster fullness thresholds, and the “CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data” paper by Weil et al. for placement theory.

Privacy & Compliance

This calculator processes non-personal infrastructure data and performs all computations client-side; organizational policies and data-handling rules may still apply.

Step-by-Step Guide:

Follow these steps to estimate usable space and export results for planning.

  1. Enter capacity per node and total nodes, or toggle heterogeneous input and paste a list of node sizes.
  2. Select protection mode and set replicas or k+m with an appropriate min_size.
  3. Adjust nearfull, overhead, and skew to match operational practice; prefer the recommended nearfull for safe recovery.
  4. Choose the failure domain, set domains to tolerate, and refine OSDs per node or custom capacity if applicable.
  5. Review the table, the layer-breakdown chart, and the JSON output, then copy or download CSV/JSON for records or peer review (a minimal export sketch follows below).
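
For the export step, a minimal JSON serialization sketch (the payload shape is an assumption; the tool's schema may differ):

```ts
// Bundle assumptions with results so exports are self-describing.
function exportJson(
  results: Record<string, number>,
  assumptions: Record<string, unknown>,
): string {
  return JSON.stringify(
    { assumptions, results, generatedAt: new Date().toISOString() },
    null,
    2,
  );
}
```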

Warning: Aggressively raising nearfull or lowering min_size increases operational risk during drive or host failures.

FAQ:

Is my data stored?

No. Calculations run in your browser, and nothing is sent to a server.

What is nearfull?

A safety cap on raw usage that preserves enough free space to heal the number of failures you plan to tolerate at your chosen domain size.

Why does efficiency drop at min_size?

During repair or degraded states, fewer replicas or chunks serve IO; payload per raw temporarily aligns with the min_size efficiency.

Can I mix node sizes?

Yes. Paste capacities as a list. Headroom is sized by the largest domain, so heterogeneity can reduce usable space.
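
Parsing such a list is straightforward; a sketch of a tolerant parser (illustrative, not the tool's implementation):

```ts
// Accept comma- and/or whitespace-separated capacities; drop invalid entries.
function parseCapacities(input: string): number[] {
  return input
    .split(/[,\s]+/)
    .filter(Boolean)
    .map(Number)
    .filter((x) => Number.isFinite(x) && x > 0);
}
// parseCapacities("8, 8 12 16") -> [8, 8, 12, 16]
```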

Does it model racks or power zones?

It models a generic failure domain by capacity. For rack-aware planning, use a domain that reflects the largest independent blast radius.

Are binary units supported?

Yes. You can use decimal (MB, GB, TB, PB) or binary (MiB, GiB, TiB, PiB) units consistently across inputs and outputs.
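
The decimal/binary distinction matters because the factors differ by roughly 10% at this scale:

```ts
// Exact unit factors in bytes.
const TB = 1e12;     // terabyte (decimal)
const TiB = 2 ** 40; // tebibyte (binary), ~1.0995e12 bytes
// An 8 TB drive expressed in binary units:
const eightTbInTib = (8 * TB) / TiB; // ≈ 7.28 TiB
```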

Troubleshooting:

  • Usable shows 0 — check that node count and capacities are positive and protection parameters are valid.
  • Nearfull recommendation seems low — reduce tolerated failures or failure-domain size, or add nodes to increase total raw.
  • Chart appears empty — ensure results are ready and that at least one node capacity is non-zero.
  • CSV/JSON lacks values — confirm browser clipboard/download permissions and that calculations completed first.
  • Unexpected units — verify that inputs and expectations both use decimal or binary consistently.

Advanced Tips:

  • Tip: Model maintenance by temporarily increasing tolerated failures; this lowers nearfull and reveals conservative headroom.
  • Tip: Compare replication and erasure coding at equal durability to visualize the efficiency benefits versus CPU and latency trade-offs.
  • Tip: Use custom domain capacity for rack-sized failure planning when hosts vary or OSD counts differ widely.
  • Tip: Keep skew modest; large values suggest topology or weight tuning rather than capacity headroom alone.
  • Tip: Export JSON to capture assumptions alongside results for peer review and reproducibility.

Glossary:

Usable capacity
Payload space after protection and headroom.
Replication
Storing multiple full copies of an object.
Erasure coding
Splitting data into chunks with parity for recovery.
min_size
Minimum replicas or chunks required to serve IO.
Nearfull
Fullness limit to preserve recovery space.
CRUSH skew
Imbalance in data placement across devices.
Failure domain
Capacity at risk from a single fault.

Calculations run entirely in your browser; no data is transmitted or stored server-side.