Capacity Upgrade Brief Report
Build a capacity upgrade brief from service utilization, growth, threshold timing, lead time, and ranked options with charts and approval notes.
Capacity risk usually appears in the measurements before it appears as a full outage. A firewall can still pass traffic, a storage pool can still accept writes, or a queue can still drain overnight while the growth curve already shows that the approval and implementation window is too short for a calm upgrade.
A credible upgrade brief connects the current peak load, the expected demand path, and the time needed to add capacity that is actually usable. A high utilization number is not automatically urgent when growth is flat and redundancy is healthy. A moderate number can be risky when demand compounds, the bottleneck sits in the critical path, or every realistic option needs weeks of procurement, licensing, installation, and validation.
The decision is also financial. Overprovisioning locks budget into idle headroom, while underprovisioning leads to latency, retries, failed writes, packet loss, lost failover margin, or emergency changes at a poor time. A useful capacity note explains why the work should start now, what evidence points to the bottleneck, and which option gives enough relief without hiding cost or disruption.
- Action threshold: The utilization level where capacity work should begin while there is still time to approve, buy, deploy, and validate the change.
- Critical threshold: The higher danger band where latency, queueing, exhaustion, loss of failover margin, or user-visible errors may become unacceptable.
- Implementation lead time: The elapsed time before extra capacity is truly available, including procurement, licensing, delivery, maintenance windows, rollout, and post-change checks.
- Reserve: The headroom left below the action threshold after the upgrade, so the service does not land exactly on the next trigger point.
Thresholds need to match the resource and failure mode. CPU saturation, inspection throughput, storage I/O, queue depth, address exhaustion, connection limits, and licensed session pools do not degrade in the same way. Some resources fail abruptly, while others first show rising latency, longer queues, or reduced redundancy. The chosen action threshold should therefore be tied to the measured bottleneck rather than copied from a generic percentage.
Forecasts are still assumptions. Demand can flatten after a migration, jump after a launch, or move to another component after a partial upgrade. The brief should make those assumptions visible enough for engineering, finance, and change approvers to challenge the numbers before the service is already in the critical band.
How to Use This Tool:
Start with measurements you would defend in a change or budget review. The generated brief is a structured draft, not a substitute for telemetry and stakeholder approval.
- Enter Service or platform using the name that appears in tickets, dashboards, procurement notes, or change records.
- Set Current peak utilization, Monthly demand growth, and Planning horizon. Use a sustained peak for normal planning, and use a spike only when that spike is the service risk being briefed.
- Set Action threshold and Critical threshold. The critical value must be greater than the action value, or the validation list reports "Critical threshold must be greater than the action threshold."
- Enter Implementation lead time, choose Capacity strategy, and set Target reserve after upgrade. These fields change urgency, target utilization, and required capacity gain.
- Add Bottleneck evidence as a concise paragraph that names the constrained component, the user or operational symptom, and the consequence. That text is included in the generated brief.
- Paste Upgrade options as one row per option. The full row order is option, cost, gain percent, lead months, disruption from 1 to 5, and note. Load sample fills a working set, and Normalize rows rewrites parsed rows into the expected CSV-style shape.
- Use Advanced when the review needs Forecast confidence, Cost unit, Budget ceiling, Maximum acceptable disruption, or the executive summary toggle. Forecast confidence changes wording in the brief, not the calculations.
- Check Capacity Ledger first for forecast utilization, breach timing, required gain, and recommendation. Use Option Ranking, Utilization Runway, and Option Relief Stack to explain why the top option ranked ahead of alternatives.
- If the summary says Capacity brief needs input, fix the validation messages before copying the Markdown brief, using the charts, or relying on the JSON output.
Advanced Tips:
- Set Action threshold to the point where work should begin, not to the point where the service fails. Procurement and change windows need runway.
- Use Lead for constrained, customer-facing, or hard-to-expand systems where extra reserve is worth the cost. Use Lag only when evidence is still weak and tighter headroom is acceptable.
- Keep Forecast confidence honest. It changes the wording reviewers see, so a low-confidence forecast should still say which telemetry, seasonality, or launch assumptions need confirmation.
- Leave Budget ceiling at zero when budget is unknown. Once a real ceiling exists, use it to surface over-budget warnings without hiding high-relief options.
- Use Maximum acceptable disruption to reflect the change window and rollback tolerance. A technically strong option can still be the wrong recommendation if it exceeds the allowed disruption level.
- Review both Utilization Runway and Option Relief Stack. The runway chart shows timing, while the relief stack shows whether each option brings horizon utilization below target.
Interpreting Results:
The headline forecast is utilization at the planning horizon before the recommended option is applied. Read it with the severity badge and Action threshold breach. A service can show Start now before the horizon reaches the critical threshold when the action breach is inside the implementation lead time.
The recommended option is the highest-ranked entered option, not a purchase order. It can rank first while still showing Partial relief only, over budget ceiling, or disruption above tolerance. Treat those phrases as review flags that need a decision, not as hidden filters that removed the option from consideration.
| Output | What it can support | What to verify |
|---|---|---|
| Required capacity gain | The minimum gain needed to bring horizon utilization down to the target utilization. | Vendor sizing, workload mix, architecture changes, and whether the gain applies to the real bottleneck. |
| Action threshold breach | The estimated time until the current trend reaches the action threshold. | Recent telemetry, known launches, seasonality, incidents, and any trend break in the measurement window. |
| Recommended horizon utilization | The modeled horizon utilization after the top option's capacity gain is applied. | Whether the option can be delivered by its lead month and whether deployment risk is acceptable. |
| Utilization Runway | The no-upgrade path compared with the recommended option over the planning horizon. | Whether action and critical thresholds are appropriate for this service and resource type. |
A calm Monitor badge can still hide risk when forecast confidence is low or bottleneck evidence already shows user-visible symptoms. A severe badge should be checked against fresh telemetry before escalation, especially when the growth rate came from a short or unusual sample.
Technical Details:
Capacity runway compares a projected demand curve with two utilization thresholds. The action threshold is the operational trigger for starting capacity work. The critical threshold is a stronger risk boundary where the service may degrade, lose redundancy, or exhaust a constrained resource.
Monthly growth is compounded. That makes a steady percentage increase more aggressive than a flat point increase, which matches many workloads where new users, traffic, data, or sessions build on the previous month. Zero growth produces a flat forecast, and a service already at or above a threshold has a breach timing of 0 months.
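A minimal Python sketch of the compounding and breach-timing behavior described above. The function names are hypothetical, and returning `None` when growth is zero or negative is an assumption; the tool may report that case differently.

```python
import math
from typing import Optional


def projected_utilization(current_pct: float, monthly_growth_pct: float, months: int) -> float:
    """Compound the current utilization forward by a monthly percentage rate."""
    rate = monthly_growth_pct / 100.0  # 6% is treated as 0.06
    return current_pct * (1.0 + rate) ** months


def months_to_threshold(current_pct: float, monthly_growth_pct: float,
                        threshold_pct: float) -> Optional[float]:
    """Months until compounded growth reaches the threshold.

    Returns 0.0 when utilization is already at or above the threshold.
    Returns None (an assumption of this sketch) when growth is not positive,
    because a flat or shrinking forecast never reaches the threshold.
    """
    if current_pct >= threshold_pct:
        return 0.0
    rate = monthly_growth_pct / 100.0
    if rate <= 0 or current_pct <= 0:
        return None
    return math.log(threshold_pct / current_pct) / math.log(1.0 + rate)
```

With zero growth, `projected_utilization(70, 0, 6)` stays at 70, and a service at 80% against a 75% threshold gets a breach timing of 0 months.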
Formula Core
Projected utilization is computed first. Threshold timing is then calculated with logarithms when utilization is below the threshold and monthly growth is positive. Required gain compares the horizon forecast with the target utilization after reserve and strategy adjustments.
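Written out, the three steps take roughly the following form. The symbols are shorthand introduced here: $U_0$ is current utilization, $g$ is monthly growth as a decimal, $n$ is the planning horizon in months, $T$ is a threshold, and $U_{\text{target}}$ is target utilization. The required-gain expression is inferred from the worked figures on this page, and the floor at zero is an assumption.

```latex
\begin{aligned}
U_n &= U_0\,(1+g)^{n} \\
t_T &= \frac{\ln\left(T / U_0\right)}{\ln(1+g)} \qquad \text{for } U_0 < T,\ g > 0 \\
G_{\text{req}} &= \max\!\left(0,\ \frac{U_n}{U_{\text{target}}} - 1\right) \times 100\%
\end{aligned}
```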
| Term | Unit | Role in the calculation |
|---|---|---|
| current utilization | Percent of usable capacity | Starting point for the forecast and threshold timing. |
| monthly growth | Decimal rate | 6% is treated as 0.06 before compounding. |
| months | Whole months | Planning horizon used for the forecast and chart points. |
| target utilization | Percent | Action threshold minus reserve and strategy adjustment, clamped below the action threshold. |
With 78% current peak utilization and 6% monthly growth, a six-month forecast is about 110.6%. If target utilization is 65%, required capacity gain is about 70.2%. Displayed values are rounded for readability, while option ranking uses the underlying numeric values.
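The arithmetic behind those two figures can be checked with a few lines of Python. This is a sketch of the calculation as described, not the tool's own code; the variable names are illustrative, and the required-gain expression is inferred from the figures quoted above.

```python
current_pct = 78.0   # current peak utilization
growth = 0.06        # 6% monthly growth as a decimal rate
months = 6           # planning horizon
target_pct = 65.0    # target utilization after reserve and strategy adjustment

# Compound growth over the horizon.
forecast_pct = current_pct * (1 + growth) ** months

# Gain needed so that forecast / (1 + gain) lands on the target.
required_gain_pct = (forecast_pct / target_pct - 1) * 100

print(f"forecast: {forecast_pct:.1f}%")            # forecast: 110.6%
print(f"required gain: {required_gain_pct:.1f}%")  # required gain: 70.2%
```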
| Strategy | Reserve effect | Urgency effect | Planning meaning |
|---|---|---|---|
| Lead | Adds 5 percentage points of reserve pressure. | Treats the approval window as 1 month tighter. | Favors earlier approval and more headroom. |
| Match | Uses the entered reserve as-is. | Uses the entered implementation lead time as-is. | Balances lead time, cost, and incremental capacity. |
| Lag | Relaxes reserve pressure by 4 percentage points. | Allows about 0.75 months more timing tolerance. | Accepts tighter runway and emphasizes cost restraint. |
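One way to read the strategy table is as a pair of adjustments per strategy. The sketch below encodes that reading in Python. The names are hypothetical, the clamp is taken to mean "never above the action threshold", and modeling the urgency effect as a shift added to the entered lead time is an assumption about how the tool applies it.

```python
# (extra reserve in percentage points, shift applied to lead time in months)
STRATEGY_ADJUSTMENTS = {
    "lead": (5.0, 1.0),     # more reserve, approval window 1 month tighter
    "match": (0.0, 0.0),    # entered values used as-is
    "lag": (-4.0, -0.75),   # less reserve, about 0.75 months more tolerance
}


def target_utilization(action_pct: float, reserve_pts: float, strategy: str) -> float:
    """Action threshold minus reserve and strategy adjustment."""
    extra_reserve, _ = STRATEGY_ADJUSTMENTS[strategy]
    target = action_pct - reserve_pts - extra_reserve
    # Assumed clamp: the target never rises above the action threshold.
    return min(target, action_pct)


def adjusted_lead_time(lead_months: float, strategy: str) -> float:
    """Lead time compared against breach timing when deciding urgency."""
    _, shift = STRATEGY_ADJUSTMENTS[strategy]
    return lead_months + shift
```

For a 75% action threshold and a 10-point reserve, this gives targets of 60% for Lead, 65% for Match, and 69% for Lag.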
Status labels are ordered from most severe to least severe. Inclusive boundaries matter: current utilization at or above the critical threshold is Critical now, and forecast utilization at or above the critical threshold is enough for Start now. The Lead strategy tightens the urgency window by one month, while Lag relaxes it by 0.75 months.
| Status | Boundary | Meaning |
|---|---|---|
| Critical now | Current utilization >= critical threshold. | The service is already in the critical band. |
| Start now | Forecast >= critical threshold, or action breach timing is within adjusted lead time. | The change path is too close to the forecast action breach. |
| Approve in horizon | Forecast >= action threshold and the stronger rules above do not apply. | The planning horizon crosses the action point. |
| Watch current load | Current utilization >= action threshold while the stronger rules do not apply. | The service is already above the action point. |
| Monitor | Current and forecast utilization remain below the action threshold. | No upgrade is indicated by the current forecast, but the assumptions still need review. |
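The status boundaries can be sketched as an ordered chain of checks, evaluated from most to least severe. This Python version is an illustration under stated assumptions: the function name is hypothetical, "within adjusted lead time" is read as breach timing less than or equal to lead time plus the strategy shift, and a forecast that never reaches the action threshold is treated as an infinite breach time.

```python
import math


def classify_status(current_pct: float, growth_pct: float, months: int,
                    action_pct: float, critical_pct: float,
                    lead_months: float, strategy: str = "match") -> str:
    """Return a status label using inclusive boundaries, most severe first."""
    rate = growth_pct / 100.0
    forecast_pct = current_pct * (1.0 + rate) ** months

    # Months until the action threshold is reached on the current trend.
    if current_pct >= action_pct:
        breach_months = 0.0
    elif rate > 0:
        breach_months = math.log(action_pct / current_pct) / math.log(1.0 + rate)
    else:
        breach_months = math.inf  # assumption: flat growth never breaches

    # Lead tightens the window by one month, Lag relaxes it by 0.75 months.
    shift = {"lead": 1.0, "match": 0.0, "lag": -0.75}[strategy]
    adjusted_lead = lead_months + shift

    if current_pct >= critical_pct:
        return "Critical now"
    if forecast_pct >= critical_pct or breach_months <= adjusted_lead:
        return "Start now"
    if forecast_pct >= action_pct:
        return "Approve in horizon"
    if current_pct >= action_pct:
        return "Watch current load"
    return "Monitor"
```

A service at 70% growing 2% per month with a 75% action threshold reaches the action point in about 3.5 months, so a 4-month lead time yields Start now while a 2-month lead time yields Approve in horizon.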
| Factor | How ranking uses it | Output effect |
|---|---|---|
| Relief fit | Capacity gain is compared with required gain and capped so one oversized option cannot dominate every signal. | Strongly affects rank and the decision note. |
| Runway | Post-upgrade months to action threshold are compared with the planning horizon. | Rewards options that leave usable time after the chosen horizon. |
| Cost | Positive costs are compared with the cheapest positive option in the entered set. | Improves cost fit for lower-cost options, but weak relief can still lose. |
| Lead time | Option lead time is checked against action breach timing. | Penalizes slow options when the service is already near or past action. |
| Disruption and budget | Options above budget ceiling or maximum acceptable disruption receive penalties. | The option stays visible and the decision note names the warning. |
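The ranking weights are not documented here, so the sketch below does not attempt to reproduce the score. It only illustrates, in Python, how an option's post-upgrade horizon utilization and its review flags could be derived from the factors in the table. The `Option` class and `review_flags` function are hypothetical, and dividing the forecast by one plus the gain is inferred from the worked figures on this page.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Option:
    name: str
    cost: float
    gain_pct: float      # capacity gain in percent
    lead_months: float
    disruption: int      # 1 (low) to 5 (high)


def review_flags(option: Option, forecast_pct: float, target_pct: float,
                 budget_ceiling: float, max_disruption: int) -> Tuple[float, List[str]]:
    """Return post-upgrade horizon utilization and any warning flags."""
    # Added capacity scales the denominator, so utilization shrinks.
    post_upgrade_pct = forecast_pct / (1.0 + option.gain_pct / 100.0)

    flags: List[str] = []
    if post_upgrade_pct > target_pct:
        flags.append("Partial relief only")
    # A ceiling of zero means the budget is unknown, so no warning is raised.
    if budget_ceiling > 0 and option.cost > budget_ceiling:
        flags.append("over budget ceiling")
    if option.disruption > max_disruption:
        flags.append("disruption above tolerance")
    return post_upgrade_pct, flags
```

Flagged options are returned rather than dropped, which mirrors the table's point that warnings stay visible instead of acting as filters.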
Accuracy Notes:
The analysis is a planning model, not a live monitoring feed. It uses the numbers and option rows entered in the browser session and does not query telemetry, procurement, vendor, or change-management systems.
- Use a measurement window that represents the service's real peak pattern, not a one-off anomaly unless that anomaly is the risk being briefed.
- Check that the selected utilization metric matches the bottleneck. CPU, storage throughput, memory pressure, queue depth, and session limits need different evidence.
- Validate option gains after deployment. A stated gain may not apply if the bottleneck moves to another component or the migration changes workload shape.
- Revisit forecasts after launches, seasonal changes, migrations, incidents, or pricing/licensing changes that can shift demand or delivery lead time.
Worked Examples:
Firewall capacity approval
An internet edge firewall starts at 78% current peak utilization with 6% monthly growth across a 6-month horizon. The action threshold is 75%, the critical threshold is 90%, implementation lead time is 2 months, and target reserve is 10 points with the Match strategy, which puts target utilization at 65%. Utilization Runway reports about 110.6% at month 6 and can show Start now. A 90% gain option lowers Recommended horizon utilization to about 58.2%, which is below the 65% target.
Lead-time edge case
A worker pool at 70% utilization grows 2% per month across a 6-month horizon with a 75% action threshold. Action threshold breach is about 3.5 months away. If Implementation lead time is 4 months, the status can become Start now because the orderly change path is already longer than the runway.
Invalid option rows before review
A pasted row such as "License expansion,15000,0,Needs vendor quote" has a capacity gain of 0, so it triggers the validation message "Option row 1 needs a name and positive capacity gain." and keeps the summary at Capacity brief needs input. Changing the row to "License expansion,15000,30,2,2,Needs vendor quote" lets Option Ranking, Capacity Ledger, and the Markdown brief populate.
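A rough Python sketch of how such rows could be parsed and validated. The function name, the returned dictionary shape, and the two messages marked as hypothetical are assumptions; only the "needs a name and positive capacity gain" message comes from this page. Both the six-field and the shorter four-field row shapes are accepted.

```python
import csv
from typing import Optional, Tuple


def parse_option_row(row: str, row_number: int) -> Tuple[Optional[dict], Optional[str]]:
    """Parse one option row; return (option, None) or (None, error message)."""
    fields = [field.strip() for field in next(csv.reader([row]), [])]

    if len(fields) == 6:    # option, cost, gain, lead months, disruption, note
        name, cost, gain, lead, disruption, note = fields
    elif len(fields) == 4:  # shorter form: option, cost, gain, note
        name, cost, gain, note = fields
        lead = disruption = None
    else:
        # Hypothetical message, not taken from the tool.
        return None, f"Option row {row_number} has an unexpected number of fields."

    try:
        option = {
            "name": name,
            "cost": float(cost),
            "gain_pct": float(gain),
            "lead_months": float(lead) if lead is not None else None,
            "disruption": int(disruption) if disruption is not None else None,
            "note": note,
        }
    except ValueError:
        # Hypothetical message, not taken from the tool.
        return None, f"Option row {row_number} has a non-numeric value."

    if not option["name"] or option["gain_pct"] <= 0:
        return None, f"Option row {row_number} needs a name and positive capacity gain."
    return option, None
```

Using the `csv` module rather than a plain `split(",")` keeps a quoted note that contains commas in a single field.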
FAQ:
What should I enter for current peak utilization?
Use a recent sustained peak as a percentage of usable capacity. A single sample can overstate urgency unless the brief is specifically about that spike and its service impact.
Why does the brief say Start now before the horizon ends?
The status compares Action threshold breach with Implementation lead time and the selected strategy. If the action threshold is too close for the change path, Start now can appear before the planning horizon reaches the critical threshold.
What format do upgrade option rows use?
Use one row per option in the order option, cost, gain percent, lead months, disruption from 1 to 5, and note. Shorter rows with option, cost, gain percent, and note can parse, but ranking has less timing and disruption evidence.
Why is an over-budget option still listed?
The Budget ceiling is a ranking penalty and warning, not a filter. Over-budget options remain visible so reviewers can decide whether stronger relief is worth escalation.
Does forecast confidence change the math?
No. Forecast confidence changes wording in the capacity ledger and brief. Projected utilization, breach timing, required gain, ranking, and charts use the numeric inputs.
Does it connect to monitoring systems?
No. The analysis uses values and option rows entered in the browser session. It does not query telemetry, procurement, vendor, or change-management systems.
Glossary:
- Action threshold: The utilization level where capacity work should begin before service quality, resilience, or delivery lead time is at risk.
- Critical threshold: The higher utilization band where unacceptable latency, queueing, packet loss, exhaustion, or failover-margin loss may occur.
- Planning horizon: The future month used to evaluate forecast utilization, required gain, and option relief.
- Implementation lead time: The time needed for procurement, licensing, approval, deployment, and validation before capacity relief is available.
- Target reserve: The intended gap below the action threshold after the upgrade has been applied.
- Required capacity gain: The minimum percent gain needed to bring horizon utilization down to target utilization.
- Horizon utilization: The forecast utilization at the planning horizon, either without an upgrade or after an option's gain is applied.