
Introduction:

A reliability incident needs more than a raw error percentage. A 99.9% availability service-level objective, or SLO, leaves only 0.1% of eligible events for errors during the chosen window. Burn rate turns the current bad-event share into a pace: the service may be spending that allowance slowly, exactly as planned, or so fast that the window is at risk long before it ends.

The same error ratio can mean very different things under different targets. A 0.8% failed-request ratio is an 8.00x burn under a 99.9% SLO, but only a 0.80x burn under a 99% SLO. Remaining budget changes urgency too. A service with 95% of its budget left can absorb more current burn than a service that has already used nearly all of its allowance.
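The comparison above can be checked with a few lines of Python. This is a minimal sketch; the function name `burn_rate` is illustrative and is not taken from the tool.

```python
def burn_rate(error_pct: float, slo_pct: float) -> float:
    """Burn multiple: current error ratio divided by the SLO error allowance."""
    if not 0 < slo_pct < 100:
        raise ValueError("SLO must be greater than 0% and less than 100%")
    budget_pct = 100 - slo_pct  # error allowance, in percent
    return error_pct / budget_pct


# Same 0.8% error ratio, two different targets.
for slo in (99.9, 99.0):
    print(f"{slo}% SLO: {burn_rate(0.8, slo):.2f}x burn")
```

The loop reports 8.00x for the 99.9% target and 0.80x for the 99% target, matching the figures in the paragraph.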

Service-level indicator
The measured reliability signal, often good requests divided by eligible requests.
Service-level objective
The target that the service is expected to meet over a policy window.
Error budget
The failure allowance left by the SLO, usually expressed as 100% - SLO%.
Burn rate
The current error ratio divided by the SLO error allowance.

Burn-rate alerting is useful because it separates immediate paging pressure from slower follow-up work. A large spike over a short lookback can spend a meaningful part of the budget in minutes, while a lower but persistent error rate may deserve a ticket because it will still exhaust the budget if it continues. Multi-window alerting compares short and long lookbacks so teams do not page on brief noise or miss a sustained leak.
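One common way to combine lookbacks is to raise an alert only when both the short and the long window are at or above the same burn threshold. The sketch below shows that rule as an assumption about typical practice, not as a description of how this tool evaluates rows. The function name and sample burn values are hypothetical.

```python
def multi_window_alert(short_burn: float, long_burn: float, threshold: float) -> bool:
    """Fire only when both lookbacks are at or above the burn threshold."""
    return short_burn >= threshold and long_burn >= threshold


# Brief spike: the short window is hot, the long window is not.
print(multi_window_alert(short_burn=20.0, long_burn=2.0, threshold=14.4))   # False
# Sustained problem: both windows are above the threshold.
print(multi_window_alert(short_burn=16.0, long_burn=15.0, threshold=14.4))  # True
```

Under this rule a slow leak is caught by a separate row with a lower threshold and longer lookbacks, rather than by the paging row.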

Figure: Error budget burn rate compares SLO allowance, observed errors, burn multiple, and remaining runway.

Burn rate is still a planning estimate, not root-cause evidence. It assumes the error ratio represents the same service-level indicator, window, and eligibility rules used by the SLO. Mixing dashboards, traffic slices, or policy windows can make the multiplier look precise while the underlying comparison is wrong.

How to Use This Tool:

Start with the SLO policy and then enter the current error sample. The result updates as soon as the inputs are valid.

  1. Enter Service or SLO name with the label your team uses in dashboards, alert routes, and incident notes.
  2. Set Availability SLO to the target success percentage. Values must be greater than 0% and less than 100%, because a 100% SLO leaves no error budget for the burn-rate division.
  3. Set SLO window to the full compliance period, such as 7, 28, 30, or 90 days.
  4. Enter Current error ratio from the measurement query you are reviewing. Keep the numerator, denominator, and traffic slice aligned with the SLO.
  5. Set Budget remaining to the share of the original error budget still available in the current window.
  6. Set Observed window to the lookback behind the current error ratio. This controls Observed-window budget spend.
  7. Open Advanced only when your team uses different burn thresholds for Fast page, Sustained page, Ticket, or Slow ticket.
  8. If the summary says Burn-rate inputs need review, fix the listed input before using Burn Ledger, Mitigation Targets, Alert Windows, or the chart tabs.

Interpreting Results:

Current burn rate is the main urgency signal. At 1.00x, the current error ratio matches the pace the SLO can sustain for a full window. Below 1.00x, the service is spending less budget than the policy allows. Above 1.00x, the current pace would exhaust a fresh full budget before the window ends.

Runway to exhaustion uses the burn rate and Budget remaining now. A high burn with plenty of budget left may still leave response time, while a lower burn near the end of the budget can be more urgent. Treat the runway as a straight-line estimate, then verify that the current error ratio still reflects the active incident.
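To illustrate how a lower burn can still be the more urgent case, the sketch below computes a straight-line runway for two hypothetical situations in a 30-day window. The function name and the second scenario (3.00x burn with 5% remaining) are assumptions chosen for illustration.

```python
def runway_hours(burn: float, remaining_pct: float, window_days: float) -> float:
    """Hours until the remaining budget reaches zero at a constant burn rate."""
    if burn <= 0:
        return float("inf")  # assumption: zero burn is treated as unlimited runway
    consumed_per_hour = burn * 100 / (window_days * 24)  # % of full budget per hour
    return remaining_pct / consumed_per_hour


scenarios = [
    ("high burn, plenty left", 8.0, 62.0),
    ("lower burn, nearly spent", 3.0, 5.0),
]
for label, burn, remaining in scenarios:
    hours = runway_hours(burn, remaining, window_days=30)
    print(f"{label}: {burn:.2f}x, {remaining:.0f}% left, {hours:.1f} h runway")
```

The 8.00x case has about 55.8 hours of runway, while the 3.00x case has about 12.0 hours, so the smaller multiple leaves less time to respond.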

How to interpret error budget burn rate outputs

| Output | What it helps decide | Verification cue |
| --- | --- | --- |
| Current burn rate | Whether the current error ratio is below, at, or above the sustainable budget pace. | Confirm the SLO target and error-ratio units are from the same policy. |
| Budget consumed per hour | How much of the full window budget disappears each hour at the current pace. | Check that the SLO window matches the dashboard's budget window. |
| Runway to exhaustion | How long the entered remaining budget lasts if current burn continues. | Refresh the error ratio before making paging or rollback decisions. |
| Observed-window budget spend | How much full-window budget the current lookback would consume. | Keep Observed window aligned with the metric query lookback. |
| Maximum possible burn | Whether a configured threshold can be crossed for the chosen SLO. | Review thresholds that require more than 100% bad events. |

A crossed threshold does not identify the failing dependency, and a clear threshold does not prove the service is healthy. Compare Burn Ledger with Alert Windows, then check the live SLI query before changing incident severity.

Technical Details:

An availability SLO converts a success target into an error allowance. The error allowance is small for high-reliability services, so modest-looking error ratios can produce large burn multiples. A 99.9% SLO allows 0.1% errors, so a 0.8% current error ratio spends budget eight times faster than the SLO can sustain over its full window.

Runway depends on both pace and inventory. Burn rate describes pace against a full fresh budget, while remaining budget describes how much of that budget is still available now. The same burn rate can mean different operational urgency depending on where the service sits in the current SLO window.

Formula Core:

The equations use percent units because the inputs are displayed as percentages. Equivalent decimal formulas work when every percentage is divided by 100 consistently.

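$$
\begin{aligned}
B_{\text{pct}} &= 100 - SLO_{\text{pct}} \\
BR &= \frac{E_{\text{current}}}{B_{\text{pct}}} \\
W_{\text{hr}} &= W_{\text{days}} \times 24 \\
C_{\text{hour}} &= \frac{BR \times 100}{W_{\text{hr}}} \\
T_{\text{runway}} &= \frac{B_{\text{remaining}}}{C_{\text{hour}}} \\
C_{\text{observed}} &= \frac{BR \times H_{\text{observed}} \times 100}{W_{\text{hr}}} \\
E_{\text{trigger}} &= BR_{\text{threshold}} \times B_{\text{pct}}
\end{aligned}
$$

Here $SLO_{\text{pct}}$ is the availability target, $B_{\text{pct}}$ is the SLO error budget percentage, $E_{\text{current}}$ is the current error ratio percentage, $BR$ is burn rate, $W_{\text{days}}$ and $W_{\text{hr}}$ are the SLO window in days and hours, $C_{\text{hour}}$ is the full-budget percentage consumed per hour, $B_{\text{remaining}}$ is the Budget remaining percentage, $T_{\text{runway}}$ is hours until remaining budget reaches zero, $H_{\text{observed}}$ is the Observed window in hours, $C_{\text{observed}}$ is the Observed-window budget spend, $BR_{\text{threshold}}$ is a configured burn threshold, and $E_{\text{trigger}}$ is the error ratio that crosses that threshold.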

Using the default values, a 99.9% SLO gives a 0.1% error allowance. A 0.800% current error ratio gives 0.800 / 0.100 = 8.00x. In a 30-day window, 8.00x consumes about 8 * 100 / 720 = 1.111% of the full error budget per hour. With 62% budget remaining, runway is about 62 / 1.111 = 55.8 hours, or 2.3 days.
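The arithmetic above can be reproduced with a short Python sketch. The `BurnResult` class and `compute` function are illustrative names, not part of the tool, and treating zero burn as infinite runway is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BurnResult:
    budget_pct: float         # SLO error budget, in percent
    burn_rate: float          # burn multiple
    consumed_per_hour: float  # % of the full budget spent per hour
    runway_hours: float       # hours until remaining budget reaches zero
    observed_spend: float     # % of the full budget spent over the observed window


def compute(slo_pct: float, error_pct: float, window_days: float,
            remaining_pct: float, observed_hours: float) -> BurnResult:
    budget_pct = 100 - slo_pct
    burn = error_pct / budget_pct
    window_hours = window_days * 24
    per_hour = burn * 100 / window_hours
    # Assumption: zero burn means the budget never runs out.
    runway = remaining_pct / per_hour if per_hour > 0 else float("inf")
    observed = burn * observed_hours * 100 / window_hours
    return BurnResult(budget_pct, burn, per_hour, runway, observed)


r = compute(slo_pct=99.9, error_pct=0.8, window_days=30,
            remaining_pct=62, observed_hours=1)
print(f"burn {r.burn_rate:.2f}x, {r.consumed_per_hour:.3f}%/h, "
      f"runway {r.runway_hours:.1f} h ({r.runway_hours / 24:.1f} days)")
```

With the default values this reports an 8.00x burn, 1.111% of the budget per hour, and a runway of 55.8 hours, or 2.3 days.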

Error budget burn rate validation bounds

| Input | Accepted range | Role in the calculation |
| --- | --- | --- |
| Availability SLO | > 0% and < 100% | Defines the error allowance used as the burn-rate denominator. |
| SLO window | > 0 days | Sets the window hours used for hourly budget consumption. |
| Current error ratio | >= 0% and <= 100% | Supplies the observed bad-event share. |
| Budget remaining | >= 0% and <= 100% | Sets the remaining inventory for runway. |
| Observed window | > 0 hours | Scales the observed-window budget spend. |
| Burn thresholds | > 0x | Set the alert rows and mitigation ceilings. |

Default burn alert threshold rules

| Alert row | Default burn | Window label | Crossing rule |
| --- | --- | --- | --- |
| Fast page | 14.40x | 5 min / 1 hr | Current burn rate greater than or equal to 14.40x crosses the row. |
| Sustained page | 6.00x | 30 min / 6 hr | Current burn rate greater than or equal to 6.00x crosses the row. |
| Ticket | 3.00x | 2 hr / 24 hr | Current burn rate greater than or equal to 3.00x crosses the row. |
| Slow ticket | 1.00x | 6 hr / 72 hr | Current burn rate greater than or equal to 1.00x crosses the row. |
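
The crossing rule can be sketched as a single comparison per row. The dictionary below copies the default thresholds listed above; the names `DEFAULT_THRESHOLDS` and `crossed_rows` are illustrative.

```python
DEFAULT_THRESHOLDS = {
    "Fast page": 14.4,
    "Sustained page": 6.0,
    "Ticket": 3.0,
    "Slow ticket": 1.0,
}


def crossed_rows(burn: float, thresholds: dict[str, float] = DEFAULT_THRESHOLDS) -> list[str]:
    """Return the alert rows whose threshold the burn rate meets or exceeds."""
    return [name for name, limit in thresholds.items() if burn >= limit]


print(crossed_rows(8.0))   # ['Sustained page', 'Ticket', 'Slow ticket']
print(crossed_rows(15.0))  # ['Fast page', 'Sustained page', 'Ticket', 'Slow ticket']
print(crossed_rows(0.8))   # []
```

Because the comparison is greater than or equal, a burn rate exactly at a threshold counts as crossed.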

Mitigation ceilings reverse the same burn-rate equation. For a 99.9% SLO, the 6.00x sustained page threshold maps to an error ratio of 6 * 0.1% = 0.600%. Because a burn rate equal to the threshold still crosses the row, clearing it at a current error ratio of 0.800% means bringing the error ratio below 0.600%: a drop of just over 0.200 percentage points, or about 25.0% relative to the current error ratio.
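
A minimal sketch of that reverse calculation follows. The function name `mitigation_target` and its return shape are hypothetical; it returns the error ratio ceiling, the gap in percentage points, and the gap relative to the current error ratio.

```python
def mitigation_target(slo_pct: float, error_pct: float, threshold: float):
    """Return (ceiling %, gap in percentage points, gap relative to current %)."""
    budget_pct = 100 - slo_pct
    ceiling = threshold * budget_pct                # error ratio at the threshold
    reduction_pts = max(error_pct - ceiling, 0.0)   # 0 when already under the ceiling
    reduction_rel = 100 * reduction_pts / error_pct if error_pct > 0 else 0.0
    return ceiling, reduction_pts, reduction_rel


ceiling, pts, rel = mitigation_target(slo_pct=99.9, error_pct=0.8, threshold=6.0)
print(f"ceiling {ceiling:.3f}%, reduce by {pts:.3f} points ({rel:.1f}% relative)")
```

For the example in the paragraph this gives a 0.600% ceiling, a 0.200-point gap, and a 25.0% relative reduction.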

Accuracy Notes:

The calculation is only as good as the SLO and metric values entered. It does not query a monitoring system or confirm that a service is currently failing.

  • Use the same eligible-event definition for Availability SLO and Current error ratio.
  • Keep Budget remaining from the same SLO window as the target.
  • Refresh the metric query when incident conditions are changing quickly.
  • Review any threshold whose Error ratio trigger is above 100%, because that row cannot be crossed for the selected SLO.

Worked Examples:

Checkout API with sustained page pressure

With Service or SLO name set to checkout-api availability, Availability SLO at 99.9%, SLO window at 30 days, Current error ratio at 0.800%, Budget remaining at 62%, and Observed window at 1 hour, Current burn rate is 8.00x. Budget consumed per hour is about 1.111%, Runway to exhaustion is about 2.3 days, and Alert Windows shows that the sustained page row is crossed.

Sharp spike that reaches the fast page row

Keeping the 99.9% SLO and 30-day window, changing Current error ratio to 1.500% makes Current burn rate 15.00x. The default Fast page threshold is 14.40x, so the Error ratio trigger is 1.440%. Mitigation Targets shows that the error ratio must fall below that ceiling to clear the fast page row.

Threshold that cannot be reached

A 90% SLO gives a 10% error allowance. A 14.40x fast page threshold would require an Error ratio trigger of 144%, which is impossible because Current error ratio is bounded at 100%. That is a policy mismatch rather than a service-health signal, so the threshold or SLO should be reviewed before relying on that alert row.
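
The reachability check can be sketched as follows. Both function names are hypothetical, and the sketch assumes a trigger of exactly 100% still counts as reachable because the error ratio is allowed to equal 100%.

```python
def max_possible_burn(slo_pct: float) -> float:
    """Burn rate when 100% of events are bad."""
    return 100 / (100 - slo_pct)


def is_reachable(threshold: float, slo_pct: float) -> bool:
    """A threshold is reachable when its error ratio trigger is at most 100%."""
    return threshold * (100 - slo_pct) <= 100


print(f"max burn at 90% SLO: {max_possible_burn(90):.2f}x")
print("14.40x reachable at 90% SLO:", is_reachable(14.4, 90))      # False
print("14.40x reachable at 99.9% SLO:", is_reachable(14.4, 99.9))  # True
```

A 90% SLO caps the burn rate at 10.00x, so any threshold above that value can never be crossed.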

Input that must be corrected first

If Availability SLO is entered as 100%, the summary changes to Burn-rate inputs need review. A 100% SLO leaves a zero error budget, so the calculator cannot produce a meaningful Current burn rate, Runway to exhaustion, or threshold comparison until the target is changed to a value below 100%.
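
The accepted ranges from the validation bounds table can be sketched as a pre-check that runs before any division. The function name and message strings are hypothetical and do not reproduce the tool's own wording.

```python
def validate_inputs(slo_pct: float, window_days: float, error_pct: float,
                    remaining_pct: float, observed_hours: float,
                    thresholds: dict[str, float]) -> list[str]:
    """Return a list of problems; an empty list means the inputs are usable."""
    problems = []
    if not 0 < slo_pct < 100:
        problems.append("Availability SLO must be > 0% and < 100%")
    if not window_days > 0:
        problems.append("SLO window must be > 0 days")
    if not 0 <= error_pct <= 100:
        problems.append("Current error ratio must be between 0% and 100%")
    if not 0 <= remaining_pct <= 100:
        problems.append("Budget remaining must be between 0% and 100%")
    if not observed_hours > 0:
        problems.append("Observed window must be > 0 hours")
    for name, value in thresholds.items():
        if not value > 0:
            problems.append(f"{name} threshold must be > 0x")
    return problems


defaults = {"Fast page": 14.4, "Sustained page": 6.0, "Ticket": 3.0, "Slow ticket": 1.0}
print(validate_inputs(100, 30, 0.8, 62, 1, defaults))
print(validate_inputs(99.9, 30, 0.8, 62, 1, defaults))
```

The first call flags only the SLO target, and the second returns an empty list because every default value is inside its accepted range.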

FAQ:

Is burn rate the same as error ratio?

No. Current error ratio is the observed percentage of bad events. Current burn rate divides that percentage by the SLO error allowance, so 0.800% errors become 8.00x burn under a 99.9% SLO.

Why does runway change when budget remaining changes?

Runway to exhaustion uses both Current burn rate and Budget remaining now. The burn multiple can stay unchanged while runway gets shorter as the remaining budget percentage falls.

What does an unreachable alert threshold mean?

It means the threshold burn multiplied by the SLO error allowance would require more than 100% bad events. Review the threshold when Error ratio trigger says Above 100% error ratio.

Does the result check my monitoring system?

No. The calculation uses the values you enter. Pull Current error ratio, Budget remaining, and SLO window from the same monitoring and SLO policy context before trusting the result.

When should I change the default thresholds?

Change them when your team has an explicit alert policy for the service. Critical user-facing services may need different page or ticket thresholds than internal batch services with looser recovery expectations.

Glossary:

Service-level indicator (SLI)
The measured reliability signal, such as the share of eligible requests that succeed.
Service-level objective (SLO)
The reliability target for a service over a defined window.
Error budget
The allowed failure percentage left by the SLO.
Burn rate
The current error ratio divided by the allowed error budget percentage.
Observed window
The metric lookback behind the current error ratio.
Budget runway
The estimated time until remaining budget reaches zero if the current burn continues.
Alert window
A burn threshold paired with a lookback label and page or ticket route.
