
Introduction:

API reliability work often starts after a release, a traffic spike, or an incident export raises the same practical concern: did the service spend too much of its reliability budget? A raw count of 500 responses is useful, but it becomes more meaningful when it is compared with the service level objective, the amount of traffic, and the time window behind the sample.

An availability SLO describes the success rate a service aims to meet over a stated period. The gap between that target and perfect availability is the error budget. A 99.9% availability target leaves 0.1% of eligible requests for failures during the compliance period. For high-volume APIs, that allowance can be thousands of requests. For a low-volume endpoint, one failed request can be enough to make a short sample look alarming.
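A minimal sketch of that conversion follows. It assumes the SLO is entered as a percentage and the request count covers the whole compliance period. The function name `allowed_bad_requests` is hypothetical and is not part of the tool.

```python
def allowed_bad_requests(slo_percent: float, eligible_requests: int) -> float:
    """Return how many eligible requests may fail while still meeting the SLO."""
    if not 0 < slo_percent < 100:
        raise ValueError("SLO must be greater than 0 and less than 100")
    # The error budget is the gap between the target and 100%, as a ratio.
    budget_ratio = (100 - slo_percent) / 100
    return eligible_requests * budget_ratio


# A high-volume API and a low-volume endpoint under the same 99.9% target.
for requests in (50_000_000, 2_000):
    print(f"{requests:>12,} requests -> {allowed_bad_requests(99.9, requests):,.0f} allowed failures")
```

With two thousand requests, the same 99.9% target allows only about two failures. That is why short, low-volume samples swing so sharply.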

SLI
The measured reliability signal, such as the fraction of eligible API requests that succeed.
SLO
The target value for that signal, such as 99.9% successful requests over 30 days.
Error budget
The allowed miss rate for the SLO, converted into requests or events for the period.
Burn rate
How fast the sample is spending that allowance compared with the sustainable pace.

HTTP status codes need a policy decision before they become SLO failures. Many teams count 5xx responses because those usually point to a server-side problem. Some count 429 when rate limiting causes a user-visible failure. Others include client disconnects or selected 4xx codes when those responses reflect service behavior rather than user mistakes. The important part is consistency: changing the budget-consuming status rule can change the apparent reliability of the same traffic sample.

Diagram of an API error budget rail with consumed budget, remaining budget, and burn-rate guardrails.
Common HTTP status classes and SLO interpretation cautions
| Status area | Typical meaning | SLO caution |
| --- | --- | --- |
| 2xx and selected 3xx | The API completed the requested work or redirected as intended. | Usually counted as successful, subject to the service's own SLI definition. |
| 4xx | The request could not be completed as sent, or the client was rate-limited. | Some codes may be user error, while 429 or selected edge failures may still harm users. |
| 5xx | The server failed or could not complete the method. | Often treated as budget-consuming for availability SLOs. |

An error-budget calculation is a triage aid, not a full observability system. It can explain a pasted status sample, compare policies, and show whether the current failure rate is sustainable. It cannot know which requests your SLI excludes, whether retries duplicated failures, or whether the sample captures normal traffic for the whole compliance period.

How to Use This Tool:

Use the same SLO wording, period, and status-code policy that the service owner would use in an incident note or SLO dashboard.

  1. Set API name to the service, route group, or SLO slice that should appear in the Budget Ledger.
  2. Set Availability SLO and Compliance period to the target and reporting window you actually use. A 99.9% 30-day SLO leaves a 0.1% request budget over 30 days.
  3. Set Observed window to the hours covered by the pasted counts or log excerpt, not the full SLO period.
  4. Fill Budget-consuming statuses with classes such as 5xx, exact codes such as 429, or inclusive ranges such as 500-599.
    Changing this policy can move the same status sample from budget available to budget overrun, so keep the pattern fixed when comparing runs.
  5. Paste Status counts, drop one TXT/CSV/LOG file onto the textarea, choose Browse TXT/CSV, or choose Load sample. Accepted rows include 500=12, 500,12, named status and count fields, and access-log lines that contain HTTP status codes.
    If an input issue says no rows were parsed, simplify the source to one status=count pair per line before using the burn-rate result.
  6. Review Budget Ledger for current burn rate and projected period usage, then check Status Mix, Burn Guardrails, and Budget Burn Curve when you need to explain which codes spent the budget and whether the projection crosses 70% or 100%.
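Step 5 accepts several row shapes. The sketch below handles only the two simplest, `500=12` and `500,12`, and skips every other line. It is an illustration under that assumption, not the tool's parser, which also reads named fields and access-log lines. `parse_status_counts` is a hypothetical name.

```python
import re
from collections import Counter

# Matches "500=12" or "500,12", allowing spaces around the separator.
ROW = re.compile(r"^\s*(\d{3})\s*[=,]\s*(\d+)\s*$")


def parse_status_counts(text: str) -> Counter:
    """Sum status=count or status,count rows; ignore lines that do not match."""
    counts: Counter = Counter()
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        status, count = int(match.group(1)), int(match.group(2))
        if 100 <= status <= 599:  # only HTTP status codes are kept
            counts[status] += count
    if not counts:
        raise ValueError("no rows parsed; use one status=count pair per line")
    return counts


sample = """200=10000
500,12
429 = 3
not a status row
200=500"""
print(dict(parse_status_counts(sample)))
```

Repeated statuses are summed, so the two `200` rows above combine into one count.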

Interpreting Results:

Current burn rate is the fastest sustainability readout. A value of 1.00x means the observed error ratio is exactly equal to the SLO's allowed error ratio. Values above 1.00x mean the same mix would run out of budget before the compliance period ends.

Projected period usage and Projected remaining budget show the consequence in period terms. A projection near 100% deserves review even if the short sample has not yet caused a visible outage. A negative remaining budget means the observed pace points to an SLO miss unless traffic, failures, or the policy changes.

API error budget interpretation thresholds
| Result cue | Boundary | How to respond |
| --- | --- | --- |
| Sustainable burn | Current burn rate <= 1.00x | The sample stays within the long-window budget if it represents normal traffic. |
| Budget policy watch | Projected period use >= 70% | Review release risk, ownership, and mitigation while the budget is still recoverable. |
| Budget spent | Projected period use >= 100% or the sample already exceeds its allowed bad-request count | Confirm with production SLI telemetry and treat the sample as an SLO threat if active traffic agrees. |
| Fast-burn watch | Current burn rate >= 6.00x | Check a short-window metric before paging from a small or partial sample. |
| Critical burn gate | Current burn rate >= 14.40x | Escalate when the same rate is still present in live traffic. |
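The thresholds in the table can be expressed as a small classifier. This sketch assumes the boundaries exactly as listed and checks each row independently, so one sample can trigger several cues. The cue names come from the table, but `threshold_cues` and its inputs are hypothetical, and the tool may combine or order cues differently.

```python
def threshold_cues(burn_rate: float, projected_use_pct: float,
                   bad_requests: int, allowed_bad_in_sample: float) -> list[str]:
    """Return every table cue whose boundary the sample meets."""
    cues = []
    if burn_rate <= 1.00:
        cues.append("Sustainable burn")
    if projected_use_pct >= 70:
        cues.append("Budget policy watch")
    if projected_use_pct >= 100 or bad_requests > allowed_bad_in_sample:
        cues.append("Budget spent")
    if burn_rate >= 6.00:
        cues.append("Fast-burn watch")
    if burn_rate >= 14.40:
        cues.append("Critical burn gate")
    return cues


# The shipped sample's figures: 1.95x burn, 194.8% projected use,
# 493 bad responses against about 253 allowed in the 24-hour sample.
print(threshold_cues(1.95, 194.8, 493, 253.093))
```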

Avoid false confidence from a clean-looking status mix. If the Sample confidence guardrail says the sample has fewer than 10 allowed bad events, one failure can swing the burn rate sharply. Aggregate a longer window or a related API slice before treating the result as a paging signal.
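The guardrail's minimum can be checked directly: multiply the eligible requests by the allowed error ratio and compare the result with 10. The following is a minimal sketch with the threshold taken from this page. `sample_confidence` is a hypothetical name, and the rounding step is an assumption that keeps floating-point noise from pushing an exact boundary case, such as 10,000 requests at 99.9%, just below 10.

```python
MIN_ALLOWED_BAD_EVENTS = 10  # guardrail minimum described on this page


def sample_confidence(eligible_requests: int, slo_percent: float) -> tuple[float, bool]:
    """Return the allowed bad events in the sample and whether it clears the guardrail."""
    # Round away float noise such as 9.99999999999943 before comparing.
    allowed = round(eligible_requests * (100 - slo_percent) / 100, 9)
    return allowed, allowed >= MIN_ALLOWED_BAD_EVENTS


for requests in (1_000, 25_000, 253_093):
    allowed, ok = sample_confidence(requests, 99.9)
    print(f"{requests:>7,} requests: {allowed:6.1f} allowed bad events, "
          f"{'adequate' if ok else 'weak'}")
```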

Technical Details:

Error-budget math converts an availability target into an allowed failure ratio, then compares that allowance with the observed ratio of budget-consuming responses. Request-count SLOs are naturally proportional: twice the traffic creates twice the allowed number of bad responses, but the allowed percentage stays the same.

Burn rate is a pace measurement. A 30-day SLO does not require waiting 30 days to detect risk, because a short sample can be scaled by its observed request rate. That projection is useful for triage, but it inherits every weakness of the sample: missing routes, retry amplification, planned maintenance, bot traffic, and one-off deploy spikes all change the meaning of the count.
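One way to turn a pace into timing is to assume a full budget at the start of the period and a burn rate that stays constant. Under those assumptions, the budget runs out after the period length divided by the burn rate. This is general error-budget arithmetic and is not necessarily how the tool draws its Budget Burn Curve. `days_until_exhausted` is a hypothetical helper.

```python
import math


def days_until_exhausted(burn_rate: float, period_days: float) -> float:
    """Days until a full-period budget is spent if the burn rate stays constant."""
    if burn_rate <= 0:
        return math.inf  # no budget-consuming responses, so the budget is never spent
    return period_days / burn_rate


for rate in (1.0, 1.95, 6.0, 14.4):
    print(f"{rate:5.2f}x burn -> budget gone in {days_until_exhausted(rate, 30):5.1f} days")
```

At 1.00x, the budget lasts exactly the 30-day period. At the sample's 1.95x pace, it would last roughly half that time.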

Formula Core:

The core calculation uses request counts rather than downtime minutes. The same equations apply whether the input came from aggregated status counts or parsed log lines.

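$$B_{\text{ratio}} = \frac{100 - \text{SLO}}{100}$$

$$E_{\text{obs}} = \frac{\text{budget-consuming requests}}{\text{eligible requests}}$$

$$\text{Burn rate} = \frac{E_{\text{obs}}}{B_{\text{ratio}}}$$

$$\text{Projected used \%} = \frac{\text{projected bad requests}}{\text{projected budget events}} \times 100$$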
API error budget formula variables and units
| Term | Meaning | Unit |
| --- | --- | --- |
| SLO | Availability target entered as a percent, greater than 0 and less than 100. | percent |
| B_ratio | Allowed fraction of eligible requests that may consume budget. | ratio |
| E_obs | Observed bad-response ratio from the selected status policy. | ratio |
| Projected budget events | Observed request pace scaled to the compliance period, multiplied by the allowed error ratio. | requests |

With a 99.9% SLO, the allowed error ratio is 0.001. In the sample counts, 5xx plus 429 produces 493 budget-consuming responses out of 253,093 eligible requests. The observed error ratio is about 0.1948%, so the burn rate is about 1.95x. At a 24-hour observed window and a 30-day compliance period, that pace projects to about 194.8% of the period budget.
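The worked figures above can be reproduced with a short script. It assumes the stated daily-pace rule (scale by 24 divided by the observed hours, then by the period in days) and uses only the aggregate counts quoted here. `burn_summary` is a hypothetical helper. Because the same scale factor multiplies both the projected bad requests and the projected budget events, projected period usage works out to the burn rate times 100 under these formulas.

```python
def burn_summary(bad: int, eligible: int, slo_percent: float,
                 window_hours: float, period_days: float) -> dict[str, float]:
    """Apply the Formula Core equations to one status sample."""
    budget_ratio = (100 - slo_percent) / 100          # B_ratio
    observed_ratio = bad / eligible                   # E_obs
    burn_rate = observed_ratio / budget_ratio
    scale = (24 / window_hours) * period_days         # observed window -> full period
    projected_bad = bad * scale
    projected_budget_events = eligible * scale * budget_ratio
    return {
        "observed_error_pct": observed_ratio * 100,
        "burn_rate": burn_rate,
        "projected_used_pct": projected_bad / projected_budget_events * 100,
    }


result = burn_summary(bad=493, eligible=253_093, slo_percent=99.9,
                      window_hours=24, period_days=30)
print(f"E_obs {result['observed_error_pct']:.4f}%  "
      f"burn {result['burn_rate']:.2f}x  "
      f"projected {result['projected_used_pct']:.1f}%")
```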

Parsing and Rule Core:

API status parsing and burn classification rules
| Mechanism | Rule | Limit or boundary |
| --- | --- | --- |
| Status source parsing | Rows can be parsed from status=count, CSV-like pairs, named status/count fields, access-log status positions, or the first HTTP status code in a line. | Only three-digit HTTP status codes from 100 to 599 are recognized. |
| Budget-consuming pattern | Tokens can be classes such as 5xx, exact codes such as 429, or inclusive ranges such as 500-599. | Unsupported tokens or reversed ranges stop the analysis until corrected. |
| Daily pace | Observed requests and bad responses are scaled by 24 / observed window hours. | A short incident window is valid only if that pace is the scenario being tested. |
| Summary status | Budget overrun appears when projected usage reaches 100% or the observed sample already exceeds its allowed bad-request count. | Burn watch appears at 70% projected use or when burn rate is above 1.00x. |
| Sample confidence | The guardrail expects at least 10 allowed bad events in the sample. | Below that point, a small number of failures can create a noisy burn-rate estimate. |
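The Budget-consuming pattern row can be sketched as a parser that turns tokens into inclusive status ranges. It follows the three token forms and the rejection rule described in the table. The token separators (commas or whitespace) and the function names `parse_policy` and `consumes_budget` are assumptions, not the tool's code.

```python
import re

CLASS = re.compile(r"^([1-5])xx$", re.IGNORECASE)   # e.g. 5xx
EXACT = re.compile(r"^(\d{3})$")                     # e.g. 429
RANGE = re.compile(r"^(\d{3})-(\d{3})$")             # e.g. 500-599, inclusive


def parse_policy(pattern: str) -> list[tuple[int, int]]:
    """Turn '5xx,429' style text into inclusive (low, high) status ranges."""
    ranges = []
    for token in re.split(r"[,\s]+", pattern.strip()):
        if not token:
            continue
        if m := CLASS.match(token):
            low = int(m.group(1)) * 100
            ranges.append((low, low + 99))
        elif m := EXACT.match(token):
            code = int(m.group(1))
            ranges.append((code, code))
        elif m := RANGE.match(token):
            low, high = int(m.group(1)), int(m.group(2))
            if low > high:
                raise ValueError(f"reversed range: {token}")
            ranges.append((low, high))
        else:
            raise ValueError(f"unsupported token: {token}")
    if not ranges:
        raise ValueError("add at least one budget-consuming status pattern")
    return ranges


def consumes_budget(status: int, ranges: list[tuple[int, int]]) -> bool:
    """True when the status falls inside any policy range."""
    return any(low <= status <= high for low, high in ranges)


policy = parse_policy("5xx, 429")
print(policy, consumes_budget(503, policy), consumes_budget(404, policy))
```

Raising an error on the first bad token mirrors the table's rule that unsupported tokens or reversed ranges stop the analysis instead of being silently skipped.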

Percentages are displayed with rounded decimal places, while request counts are rounded to whole events for readability. Negative remaining budget is not clipped to zero because the size of the overrun is useful when comparing policies or deciding how much recovery time is needed.
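To see why clipping would hide information, here is a sketch of projected remaining budget under the same formulas. The value is left negative on purpose and rounded only for display. The helper name `projected_remaining` and the output wording are hypothetical.

```python
def projected_remaining(bad: int, eligible: int, slo_percent: float,
                        window_hours: float, period_days: float) -> tuple[float, float]:
    """Return projected remaining budget as (requests, percent of period budget)."""
    scale = (24 / window_hours) * period_days
    budget_events = eligible * scale * (100 - slo_percent) / 100
    remaining = budget_events - bad * scale  # not clipped at zero
    return remaining, remaining / budget_events * 100


requests_left, pct_left = projected_remaining(493, 253_093, 99.9, 24, 30)
print(f"{requests_left:,.0f} requests ({pct_left:.1f}% of the period budget)")
```

A clipped display would show 0 for both the 493-response sample and any worse one. The signed value shows how far past the budget the pace points.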

Accuracy and Privacy Notes:

Pasted text and local TXT, CSV, or LOG files are read in the browser for parsing. It is still wise to sanitize request paths, identifiers, tokens, and user data before pasting logs, because status-code analysis usually needs counts and codes rather than full request details.
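Because only codes and counts are needed, one option is to reduce access logs to status counts locally before pasting anything. The sketch below assumes lines in the NCSA Common Log Format, where the status code follows the quoted request line. Other log formats would need a different pattern. `log_to_status_counts` is a hypothetical helper, and the log lines use documentation-range addresses.

```python
import re
from collections import Counter

# In Common Log Format the status code follows the quoted request line.
STATUS_AFTER_REQUEST = re.compile(r'"[^"]*"\s+(\d{3})\b')


def log_to_status_counts(lines: list[str]) -> str:
    """Keep only status codes from access-log lines and emit status=count rows."""
    counts = Counter()
    for line in lines:
        match = STATUS_AFTER_REQUEST.search(line)
        if match and 100 <= int(match.group(1)) <= 599:
            counts[int(match.group(1))] += 1
    # Paths, client addresses, and user names are dropped; only codes survive.
    return "\n".join(f"{status}={count}" for status, count in sorted(counts.items()))


log = [
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /v1/orders/991 HTTP/1.1" 200 512',
    '203.0.113.8 - - [10/Oct/2024:13:55:37 +0000] "POST /v1/orders HTTP/1.1" 503 0',
    '203.0.113.7 - - [10/Oct/2024:13:55:38 +0000] "GET /v1/orders/992 HTTP/1.1" 200 498',
]
print(log_to_status_counts(log))
```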

  • The result does not replace production SLI telemetry, alert windows, or incident policy.
  • Partial logs, retry storms, canary traffic, maintenance windows, and low request volume can skew burn-rate projections.
  • Use the same status policy and compliance period when comparing two runs. A policy change can make the same traffic look better or worse.

Advanced Tips:

  • Use Normalize after pasting a mixed log excerpt when you want to audit the parsed status counts before sharing the result.
  • Compare 5xx alone with 5xx,429 when rate limiting is user-visible and the SLO owner has not settled the policy yet.
  • Treat Fast-burn watch and Critical burn gate as investigation cues unless the same burn rate is present in the live short-window SLI.
  • Use Budget Burn Curve to explain timing to release managers, then keep Compliance period unchanged so the curve matches the reported SLO window.
  • Use CSV, DOCX, or JSON exports only after removing request paths, tokens, and user identifiers from pasted log lines.

Worked Examples:

These cases use the shipped sample and common incident-review situations so the burn-rate numbers can be checked against visible result fields.

Release watch from the sample source

The sample has 253,093 eligible requests over 24 hours. Counting 5xx and 429 creates 493 budget-consuming responses, so Budget Ledger reports about 1.95x current burn rate and 194.8% projected period usage for a 99.9% 30-day SLO. That is a burn watch and a projected budget overrun, so confirm the same pace in production SLI telemetry before changing release status.

Rate-limit policy comparison

The same sample counted with only 5xx has 73 budget-consuming responses. Adding 429 changes the Status Mix role for rate-limited requests and moves Projected period usage from about 28.8% to about 194.8%, turning a comfortable projection into an over-budget one. That comparison is useful when a team is deciding whether user-visible rate limits should spend the availability budget.
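The comparison can be reproduced from the aggregate figures quoted on this page. The per-code split below is a simplified stand-in chosen only to match those totals: 73 responses in 5xx, 420 responses with status 429, and the rest successful. It is not the shipped sample's actual code mix. The helper `projected_use_pct` is hypothetical, and the window-to-period scale is omitted because it cancels out of the ratio.

```python
from collections.abc import Callable

# Simplified stand-in that matches the quoted totals, not the real sample rows.
status_counts = {200: 252_600, 429: 420, 500: 73}
SLO_PERCENT = 99.9


def projected_use_pct(counts: dict[int, int], is_bad: Callable[[int], bool]) -> float:
    """Projected period usage; the window-to-period scale cancels out."""
    eligible = sum(counts.values())
    bad = sum(n for status, n in counts.items() if is_bad(status))
    budget_ratio = (100 - SLO_PERCENT) / 100
    return bad / eligible / budget_ratio * 100


policies = {
    "5xx": lambda s: 500 <= s <= 599,
    "5xx,429": lambda s: 500 <= s <= 599 or s == 429,
}
for name, rule in policies.items():
    print(f"{name:<8} projected period usage {projected_use_pct(status_counts, rule):6.1f}%")
```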

Low-traffic boundary

A small endpoint with 1,000 requests and one 500 response under a 99.9% SLO can show a 1.00x burn rate, but Sample confidence remains weak because the sample allows only one bad event. A longer window or an aggregated route group gives a more stable read before treating the single failure as a paging signal.

Parsing cleanup

A pasted excerpt that contains text but no parseable three-digit statuses returns an input issue asking for rows such as 200=10000 and 500=12. Reformatting the excerpt into one status/count pair per line and choosing Normalize should populate Status Mix with request counts and budget roles.

FAQ:

What should count as a bad API response?

Use the same rule your SLO uses. A common starting point is 5xx, and the status pattern also accepts exact codes such as 429 or inclusive ranges when your policy counts them.

Why does a short sample project across the whole period?

The observed request and bad-response counts are converted to a daily pace, then scaled to the compliance period. That helps with triage, but the projection is only as representative as the observed window.

Does projected exhaustion mean the SLO has already failed?

No. Projected exhaustion assumes the observed pace continues. Compare it with the live SLI source and the actual remaining budget before declaring an SLO miss.

Why can one failure create a large burn rate?

Strict SLOs leave small budgets, and low-traffic samples may have fewer than 10 allowed bad events. Check Sample confidence and aggregate a longer window when the request count is small.

What should I fix if the input is rejected?

Use an SLO greater than 0 and less than 100, enter positive period and window values, add at least one valid budget-consuming status pattern, and provide parseable status counts or log lines.

Glossary:

Service level indicator (SLI)
The measured signal used to judge reliability, such as the fraction of eligible API requests that succeed.
Service level objective (SLO)
The target value for the SLI over a stated period.
Error budget
The portion of eligible requests that may fail while the service still meets its SLO.
Burn rate
The observed bad-response ratio divided by the allowed error ratio.
Compliance period
The full reporting window used for the SLO projection.
Budget-consuming status
An HTTP status code or status class that the selected policy treats as spending error budget.

References: