API Error Budget Analyzer

API name:

Name the API or SLO slice represented by the status-code sample.

Availability SLO:

Enter the target success percentage for this API SLO.

Compliance period (days):

Use the SLO window you report against, not just the sample period.

Observed window:

Set the measurement span so the burn-rate projection uses the right traffic pace.

Budget-consuming statuses:

Define which response codes should burn the API error budget.

Status counts:

Use one status count per line, or paste sampled log lines to count one request per matching status.


API error budgets turn an availability target into a count of requests that may fail during a compliance period. A 99.9% service-level objective leaves 0.1% of eligible requests for budget-consuming outcomes. If an endpoint handles 10,000,000 eligible requests in a month, that target allows about 10,000 bad events before the service misses the objective.
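The arithmetic behind that example can be sketched in a few lines of Python. The function name `allowed_bad_events` is hypothetical and chosen only for illustration. It is not part of the analyzer.

```python
def allowed_bad_events(slo_percent: float, eligible_requests: int) -> float:
    """Return how many eligible requests may fail before the SLO is missed."""
    if not 0 < slo_percent < 100:
        raise ValueError("SLO must be greater than 0 and less than 100 percent")
    budget_ratio = (100 - slo_percent) / 100  # 99.9% leaves about 0.001
    return eligible_requests * budget_ratio


# 10,000,000 eligible requests at 99.9% leave roughly 10,000 bad events.
# round() hides the tiny floating-point error in 100 - 99.9.
print(round(allowed_bad_events(99.9, 10_000_000)))
```

The same function shows why the allowance is not a fixed number: halving the traffic halves the count of bad events the target can absorb.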

The budget is useful because percentages can hide operational urgency. A small error ratio can still burn too quickly when traffic is high, and a short incident can consume a large share of a strict objective. Reliability reviews therefore need both the observed error mix and the pace at which that mix would spend the period allowance if it continued.

HTTP status counts are a practical starting point for request-based availability checks. Teams often treat 5xx responses as service failures, then decide whether throttling, client-facing 4xx responses, gateway-specific codes, or edge statuses should also spend budget for a particular service-level indicator. That decision must match the SLO definition, not just a generic status-code habit.

Figure: Error budget burn diagram with watch and overrun lines across a compliance period.

An error budget readout is not an incident diagnosis by itself. It does not prove the API is healthy, and it does not say which deployment, dependency, region, customer segment, or retry pattern caused failures. It gives a fast reliability accounting view that should be checked against logs, monitors, ownership, and user impact before a release or incident decision.

Technical Details:

A request-based availability SLO has three moving parts: eligible events, bad events, and the target success percentage. Eligible events are the requests that count toward the objective. Bad events are the subset that the SLO definition treats as failures. The error budget is the difference between perfection and the SLO target, expressed as allowed bad events over the compliance period.

Burn rate compares the observed bad-event ratio with the allowed bad-event ratio. A burn rate of 1.00x means the current error mix, sustained at the same traffic pace, would use exactly the full allowance by the end of the compliance period. A burn rate below 1.00x leaves budget if traffic and errors continue at the same pace. A burn rate above 1.00x spends budget faster than the period can sustain.

Formula Core

The calculation normalizes the observed request sample to a daily pace, then projects that pace across the compliance period. Percentages shown in the result are display values; the arithmetic uses ratios before formatting.

\begin{array}{lcl} \text{Budget ratio} & = & \frac{100 - \text{SLO percent}}{100} \\ \text{Observed error ratio} & = & \frac{\text{budget-consuming requests}}{\text{eligible requests}} \\ \text{Burn rate} & = & \frac{\text{observed error ratio}}{\text{budget ratio}} \\ \text{Daily request pace} & = & \text{eligible requests} \times \frac{24}{\text{observed hours}} \\ \text{Projected period budget} & = & \text{daily request pace} \times \text{compliance days} \times \text{budget ratio} \\ \text{Projected bad requests} & = & \text{daily request pace} \times \text{compliance days} \times \text{observed error ratio} \\ \text{Projected budget used} & = & \frac{\text{projected bad requests}}{\text{projected period budget}} \times 100 \\ \text{Projected exhaustion} & = & \frac{\text{compliance days}}{\text{burn rate}} \end{array}

For a 99.9% SLO, the budget ratio is 0.001. If a sample has 253,093 eligible requests and 493 budget-consuming responses, the observed error ratio is about 0.001948. Dividing 0.001948 by 0.001 gives a burn rate of about 1.95x, so the same mix would spend a 30-day allowance in about 15.4 days.
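The formula chain can be written as a short Python sketch to make the order of operations concrete. The names `BudgetProjection` and `project_budget` are hypothetical. Projected bad requests are taken here as the observed bad requests scaled to the same daily pace, which is an assumption that agrees with the worked numbers on this page.

```python
from dataclasses import dataclass


@dataclass
class BudgetProjection:
    budget_ratio: float
    observed_error_ratio: float
    burn_rate: float
    projected_period_budget: float
    projected_bad_requests: float
    projected_budget_used_pct: float
    projected_exhaustion_days: float


def project_budget(slo_percent, eligible, bad, observed_hours, compliance_days):
    """Apply the formula core to one status-code sample."""
    if eligible <= 0 or observed_hours <= 0 or compliance_days <= 0:
        raise ValueError("eligible requests, hours, and days must be positive")
    budget_ratio = (100 - slo_percent) / 100
    observed_error_ratio = bad / eligible
    burn_rate = observed_error_ratio / budget_ratio
    daily_pace = eligible * 24 / observed_hours
    projected_period_budget = daily_pace * compliance_days * budget_ratio
    # Assumption: bad requests continue at the same ratio and traffic pace.
    projected_bad = daily_pace * compliance_days * observed_error_ratio
    used_pct = projected_bad / projected_period_budget * 100
    # With no bad requests, the budget is never exhausted at this pace.
    exhaustion = compliance_days / burn_rate if burn_rate > 0 else float("inf")
    return BudgetProjection(
        budget_ratio, observed_error_ratio, burn_rate,
        projected_period_budget, projected_bad, used_pct, exhaustion,
    )


p = project_budget(99.9, eligible=253_093, bad=493,
                   observed_hours=24, compliance_days=30)
print(f"{p.burn_rate:.2f}x, {p.projected_exhaustion_days:.1f} days")
```

Ratios stay unformatted until the final print, which mirrors the note that percentages are display values only.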

Supported status count and budget-consuming status patterns
Input or pattern	Accepted form	How it is counted
Status counts	`200=248000`, `500:55`, `503,18`, or status followed by a count	The stated count is added to that HTTP status.
Log-like lines	Lines containing `status=500`, `status_code=500`, `code=500`, access-log status fields, or another HTTP status token	Each matching line counts as one request for the first detected status.
Status class	`5xx`, `4xx`, or another class from `1xx` to `5xx`	Any status in that hundred range is marked as budget-consuming.
Status range	`500-599`, `400-499`, or another inclusive HTTP range	Any code between the two endpoints, including both endpoints, spends budget.
Exact status	`429`, `499`, `500`, or another exact code	Only that status spends budget unless another pattern also matches.
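A minimal Python sketch of the patterns in this table follows. It is an illustration only, not the analyzer's parser: the regular expressions and the function names `parse_status_counts` and `is_budget_consuming` are assumptions, and access-log formats without a `status=`-style key are deliberately left out.

```python
import re
from collections import Counter

# Count rows such as "200=248000", "500:55", "503,18", or "200 248000".
COUNT_ROW = re.compile(r"^\s*(\d{3})\s*[=:,\s]\s*(\d+)\s*$")
# Log-like tokens such as "status=500", "status_code=500", or "code=500".
LOG_TOKEN = re.compile(r"\b(?:status_code|status|code)=(\d{3})\b")


def parse_status_counts(text: str) -> Counter:
    """Add stated counts for count rows; count one request per log-like line."""
    counts = Counter()
    for line in text.splitlines():
        row = COUNT_ROW.match(line)
        if row:
            status, count = int(row.group(1)), int(row.group(2))
        else:
            token = LOG_TOKEN.search(line)  # first detected status wins
            if not token:
                continue
            status, count = int(token.group(1)), 1
        if 100 <= status <= 599:  # ignore values outside the HTTP range
            counts[status] += count
    return counts


def is_budget_consuming(status: int, patterns: str) -> bool:
    """Match a status against comma-separated classes, ranges, or exact codes."""
    for pattern in (p.strip().lower() for p in patterns.split(",")):
        if re.fullmatch(r"[1-5]xx", pattern):
            if status // 100 == int(pattern[0]):
                return True
        elif re.fullmatch(r"\d{3}-\d{3}", pattern):
            low, high = (int(part) for part in pattern.split("-"))
            if low <= status <= high:  # inclusive on both endpoints
                return True
        elif re.fullmatch(r"\d{3}", pattern) and status == int(pattern):
            return True
    return False


sample = "200=248000\n500:55\n503,18\nlevel=error status=500 path=/pay"
counts = parse_status_counts(sample)
bad = sum(n for s, n in counts.items() if is_budget_consuming(s, "5xx,429"))
print(dict(counts), bad)
```

Trying count rows before log tokens keeps a line like `503,18` from being read as a single request.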

The main state rules are intentionally conservative. The result is over budget when projected period use reaches 100% or the observed sample has already used more bad requests than the sample allowance. It moves into watch status when projected period use reaches 70% or burn rate rises above 1.00x. Separate guardrail rows call out 6.00x and 14.40x fast-burn territory because those thresholds are common SRE alerting reference points, not because every API should page at those exact numbers.

API error budget state and guardrail boundaries
Signal	Boundary	Meaning
Budget overrun	Projected period use >= 100% or sample delta < 0	The current sample is already beyond its local allowance or projects to spend the period budget.
Burn watch	Projected period use >= 70% or burn rate > 1.00x	The sample has not necessarily missed the SLO, but it deserves review before more release risk is added.
Sustainable burn	Burn rate <= 1.00x	The current bad-event ratio is at or below the long-period allowance.
Fast-burn watch	Burn rate >= 6.00x	The error mix can spend a meaningful share of the budget quickly if confirmed by recent traffic.
Critical burn gate	Burn rate >= 14.40x	The sample resembles a high-severity burn case and should be checked against short-window monitors.
Sample confidence	Allowed bad events in sample >= 10	Smaller allowances can make a handful of failures look louder than the service reality.
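The boundaries in this table can be expressed as a small Python sketch. The function name `classify`, the label "within budget" for the calm state, and the reading of sample delta as allowed bad events in the sample minus observed bad events are assumptions made for illustration.

```python
def classify(burn_rate, projected_use_pct, sample_delta, allowed_in_sample):
    """Return the summary state plus a pass/fail flag for each guardrail row."""
    if projected_use_pct >= 100 or sample_delta < 0:
        state = "budget overrun"
    elif projected_use_pct >= 70 or burn_rate > 1.0:
        state = "burn watch"
    else:
        state = "within budget"  # hypothetical label for the calm state
    guardrails = {
        "sustainable burn": burn_rate <= 1.0,
        "fast-burn watch": burn_rate >= 6.0,
        "critical burn gate": burn_rate >= 14.4,
        "sample confidence": allowed_in_sample >= 10,
    }
    return state, guardrails


# Default checkout sample: 493 bad requests against about 253.1 allowed.
state, rows = classify(1.948, 194.8, 253.093 - 493, 253.093)
print(state, rows)
```

The overrun check runs first, so a sample that trips both the overrun and the watch boundaries reports the more severe state.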

The result is deterministic from the pasted rows and settings. It does not query an API, verify a live SLO object, or inspect request payloads. That makes the output fast and reproducible, but it also means the user must decide which requests are eligible, which status codes spend budget, and whether the sample window represents the same traffic mix as the compliance period.

Everyday Use & Decision Guide:

Start with the SLO definition you would defend in a review. Enter API name as the service, endpoint group, or SLO slice that appears in incident notes. Set Availability SLO to the target success percentage, then set Compliance period to the window used for reporting, such as 7, 28, 30, or 90 days.

Use Observed window for the measurement span behind the pasted data, not the compliance period. A two-hour sample and a 30-day SLO answer different questions: the sample describes the current mix, while the projection asks what happens if that mix continues. If traffic has a heavy daily or weekly cycle, compare several windows before using the projection in a release decision.

The most important setup choice is Budget-consuming statuses. A first pass of 5xx,429 is useful for many API checks because it treats service failures and throttling as budget spend. Use only 5xx when client throttling is excluded from the SLO. Add 499 or a range such as 500-599 only when those statuses match the service-level indicator definition.

Budget Ledger shows the SLO target, eligible requests, budget-consuming requests, observed success rate, burn rate, projected use, exhaustion timing, and remaining budget events.
Status Mix is the quickest place to catch a wrong pattern, such as 429 being treated as allowed when throttling should count.
Burn Guardrails explains whether the run is sustainable, in watch status, near fast-burn territory, or too small to trust without a longer sample.
Budget Burn Curve shows how the projected budget use moves through the compliance period and where it crosses 70% or 100%.
JSON is the compact handoff when another workflow needs the current inputs, summary values, rows, and curve points.
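The crossing points on Budget Burn Curve can be approximated with a short Python sketch. It assumes budget use accumulates linearly through the period at the observed burn rate, which matches the projected exhaustion formula but may not match how the analyzer draws its curve. The function names are hypothetical.

```python
def budget_use_at_day(burn_rate, day, compliance_days):
    """Percent of the period budget used by a given day at a constant burn rate."""
    return burn_rate * day / compliance_days * 100


def crossing_day(burn_rate, compliance_days, threshold_pct):
    """Day on which projected use reaches the threshold, or None if it never does."""
    if burn_rate <= 0:
        return None
    day = threshold_pct / 100 * compliance_days / burn_rate
    return day if day <= compliance_days else None


# A 1.948x burn over a 30-day period, as in the default sample.
print(crossing_day(1.948, 30, 70), crossing_day(1.948, 30, 100))
```

A return value of `None` means the line stays under that threshold for the whole compliance period.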

A green or calm summary does not prove the API is ready for a risky deployment. Check Allowed bad requests in sample, Current burn rate, and Projected remaining budget together. Low traffic can make a few 500s look severe, while high traffic can spend thousands of budget events before the percentage looks dramatic.

This is best suited to SRE review prep, post-incident status-code accounting, release readiness checks, and quick comparisons between candidate SLO definitions. Use the copied rows or downloaded table only after the status mix and guardrails agree with the SLO policy you mean to apply.

Step-by-Step Guide:

Work from the SLO target to the status mix, then verify the budget rows before using the headline summary.

Enter API name. The value appears in the summary line, Budget Ledger, and JSON output, so use the same service or endpoint label you would use in an incident note.
Set Availability SLO. If the value is not greater than 0 and less than 100, the validation area shows Availability SLO must be greater than 0 and less than 100 percent.
Set Compliance period in days and Observed window in hours. Fix any zero or negative value before reading Projected period usage or Projected exhaustion.
Enter Budget-consuming statuses with exact codes, classes, or inclusive ranges. Use Status Mix after the run to confirm each status is labeled Budget consuming or Allowed success as intended.
Paste Status counts as count rows, CSV-style rows, or log-like lines. If no status is detected, the validation area asks for examples such as 200=10000 and 500=12.
Read the summary heading and badges, then open Budget Ledger for Current burn rate, Projected period usage, Projected remaining budget, and Projected exhaustion.
Open Burn Guardrails when the summary says burn watch or budget overrun. Check whether the 70%, 100%, 6.00x, 14.40x, or sample-confidence rows explain the warning.
Use Budget Burn Curve and JSON only after the ledger and status mix are credible. The curve is disabled until the inputs produce a valid analysis.

Interpreting Results:

The main output is Current burn rate. At or below 1.00x, the observed error ratio is no worse than the SLO allowance. Above 1.00x, the same mix would exhaust the period budget before the compliance period ends. Projected period usage translates that pace into percent of budget consumed by period end.

Use Projected remaining budget as the release-risk cue. A positive value means the projection still leaves budget events. A negative value means the current pace would miss the SLO without a lower error rate, lower traffic, a changed eligibility definition, or a different compliance window. That negative number is not a count of future incidents; it is the projected budget shortfall under the current sample.

How to read API error budget analyzer outputs
Output	Trust this for	Do not overread
Observed success rate	Whether the pasted sample itself is above or below the target.	It does not prove the full compliance period passed.
Allowed bad requests in sample	The local allowance for the exact sample size and SLO target.	It is not a fixed request count; it changes with traffic volume.
Current burn rate	How fast the sample spends budget compared with a sustainable pace.	It does not identify the root cause or confirm the issue is still active.
Projected exhaustion	The day count where the budget crosses 100% if the same mix continues.	It is not a forecast that traffic and failures will actually stay unchanged.
Sample confidence	Whether the sample allowance is large enough for a practical readout.	A low allowance does not make the math wrong, but it should slow decisions based on tiny traffic.

Verify the Status Mix before accepting any alarming result. A 1.95x burn rate means something very different when 429 intentionally spends budget than when throttling should have been excluded. If the wrong statuses are counted, fix Budget-consuming statuses and rerun the analysis before sending the ledger to another team.

Worked Examples:

Default checkout API sample

With Availability SLO set to 99.9%, Compliance period set to 30 days, Observed window set to 24 hours, and Budget-consuming statuses set to 5xx,429, the default counts produce 253,093 eligible requests and 493 budget-consuming requests. Observed success rate is about 99.805%, Current burn rate is about 1.95x, and Projected period usage is about 194.8%. The Projected remaining budget is negative, so the summary moves to budget overrun.

Two-hour 5xx-only check before a deploy

A team pastes two hours of counts (200=119,000, 201=4,000, 404=900, 500=12, and 503=8), then uses 5xx as the budget pattern for a 99.9% 30-day SLO. The sample has 123,920 eligible requests and 20 bad requests. Allowed bad requests in sample is about 123.9, Current burn rate is 0.16x, and Projected period usage is 16.1%, leaving about 37,411 projected budget events. That result is inside the budget, provided 4xx responses are truly excluded from the SLO.
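The numbers in this example can be reproduced with a short Python sketch. The function name `project_sample` is hypothetical, and the sketch hard-codes the 5xx class as the budget pattern to keep it small.

```python
def project_sample(counts, slo_percent, observed_hours, compliance_days):
    """Project a status-count sample across the compliance period (5xx only)."""
    eligible = sum(counts.values())
    bad = sum(n for status, n in counts.items() if 500 <= status <= 599)
    budget_ratio = (100 - slo_percent) / 100
    # Scale factor from the observed window to the whole compliance period.
    scale = 24 / observed_hours * compliance_days
    return {
        "eligible": eligible,
        "bad": bad,
        "allowed_in_sample": eligible * budget_ratio,
        "burn_rate": (bad / eligible) / budget_ratio,
        "projected_budget": eligible * scale * budget_ratio,
        "projected_bad": bad * scale,
    }


result = project_sample(
    {200: 119_000, 201: 4_000, 404: 900, 500: 12, 503: 8},
    slo_percent=99.9, observed_hours=2, compliance_days=30,
)
remaining = result["projected_budget"] - result["projected_bad"]
print(f"burn {result['burn_rate']:.2f}x, remaining {remaining:,.0f} events")
```

The 900 responses with status 404 still count as eligible requests here, so they enlarge the allowance without spending it.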

Throttling included in a stricter SLO

For a 99.95% SLO over 28 days, a six-hour sample contains 200=480,000, 204=6,000, 400=800, 429=180, and 500=20. With `429,500` as the budget pattern (the exact codes 429 and 500), Observed success rate is about 99.959%, just above target, and the sample still has about 43.5 bad requests of local allowance left. The projection uses about 82.1% of the period budget, so Burn Guardrails raises a policy watch even though the observed sample has not crossed 100%.
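A brief Python sketch shows how the sample check and the projection check can disagree in this example. The function name `sample_and_projection` is hypothetical. Under the constant-pace assumption the observed hours and compliance days cancel out of the projected percentage, so the sketch does not need them.

```python
def sample_and_projection(slo_percent, eligible, bad):
    """Return (local allowance left in the sample, projected period use in %)."""
    budget_ratio = (100 - slo_percent) / 100
    allowed_in_sample = eligible * budget_ratio
    sample_delta = allowed_in_sample - bad
    # Projected bad / projected budget reduces to error ratio / budget ratio.
    projected_use_pct = (bad / eligible) / budget_ratio * 100
    return sample_delta, projected_use_pct


eligible = 480_000 + 6_000 + 800 + 180 + 20  # every pasted status is eligible
bad = 180 + 20                               # exact codes 429 and 500
delta, use_pct = sample_and_projection(99.95, eligible, bad)
print(f"allowance left {delta:.1f}, projected use {use_pct:.1f}%")
```

A positive allowance left together with projected use of 70% or more is the combination that produces a watch rather than an overrun.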

No parsed status rows

If Status counts contains pasted notes without any HTTP status tokens, the summary changes to Check input and the validation list asks for rows such as 200=10000 and 500=12. After replacing the notes with parseable rows, reopen Status Mix before trusting the ledger. The table should show request counts, share percentages, and the intended Budget role for each detected status.

FAQ:

Should 429 count against the error budget?

Only if the SLO says throttled requests are bad events for the user journey being measured. The default pattern includes 429, but you can remove it and use 5xx alone when rate limiting is excluded from the objective.

Why can the sample pass while the projection is in watch status?

The sample allowance checks the observed window only. The projection stretches the same traffic and error pace across the compliance period, so a sample can remain under its local allowance while still using 70% or more of the projected period budget.

What does a burn rate above 1.00x mean?

It means the observed bad-event ratio is higher than the SLO allows over a full period. If that mix continued, Projected exhaustion would land before the compliance period ends.

Why does a low-traffic sample look noisy?

The Sample confidence row compares the allowed bad events in the sample with 10. When the allowance is lower than 10, a few failures can swing burn rate sharply, so aggregate a longer window or a related SLO slice before making a high-risk decision.
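As a rough guide, the smallest sample that clears the allowance of 10 can be computed directly from the SLO target. The Python sketch below uses `Fraction` so that a target such as 99.9 does not pick up floating-point error. The function name `min_requests_for_confidence` is hypothetical.

```python
import math
from fractions import Fraction


def min_requests_for_confidence(slo_percent: str, min_allowance: int = 10) -> int:
    """Smallest eligible-request count whose sample allowance reaches min_allowance."""
    budget_ratio = (100 - Fraction(slo_percent)) / 100  # exact, e.g. 1/1000
    if budget_ratio <= 0 or budget_ratio >= 1:
        raise ValueError("SLO must be greater than 0 and less than 100 percent")
    return math.ceil(min_allowance / budget_ratio)


for target in ("99", "99.9", "99.95"):
    print(target, min_requests_for_confidence(target))
```

Stricter targets need proportionally larger samples, which is why a short window on a low-traffic endpoint often fails this row.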

Does the analyzer call the API being reviewed?

No. It parses the status counts or log-like lines you paste and calculates the result in the browser session. It does not make a live request to the named API or verify a monitoring-system SLO.

Why did I get a validation error after pasting logs?

The parser needs a recognizable HTTP status from 100 to 599. Use rows such as 200=10000, CSV-style pairs such as 500,12, or log lines that expose a status field. Then confirm the parsed statuses in Status Mix.

Glossary:

API error budget: The allowed number of budget-consuming API responses for a chosen SLO and compliance period.
Availability SLO: The target success percentage for eligible requests, such as 99.9% or 99.95%.
Eligible requests: The request count included in the SLO calculation after the user pastes status counts or log-like rows.
Budget-consuming requests: Requests whose HTTP statuses match the chosen class, range, or exact-code pattern.
Burn rate: The observed error ratio divided by the allowed error ratio for the SLO.
Compliance period: The reporting window, in days, used to project budget use and exhaustion timing.

References:

Concepts in service monitoring, Google Cloud Observability, updated 2026-04-29.
Alerting on your burn rate, Google Cloud Observability, updated 2026-04-29.
Prometheus Alerting: Turn SLOs into Alerts, Site Reliability Engineering Workbook, Google and O'Reilly Media, 2018.