DORA metric inputs
Analyze one application or service boundary at a time.
Use the same reporting window you use for team reviews (days).
Pick the cadence that matches the current improvement conversation.
Target hours from commit to production deployment (hr).
Maximum acceptable percent of deployments requiring immediate intervention (%).
Target recovery time for failed deployments that need intervention (hr).
Maximum acceptable percent of deployments that are unplanned incident rework (%).
Paste CSV with one production deployment per row, or browse/drop a CSV or TXT file.
Matched case-insensitively against status and change_type.
Use words such as hotfix, rework, emergency, incident, patch, or unplanned.

Introduction

DORA metrics give software teams a shared way to discuss delivery speed and release stability without reducing everything to story points, ticket volume, or opinion. The measurements are anchored in production movement: when a change was ready, when it reached users, whether it caused immediate repair work, and how long recovery took when a release went wrong.

The current DORA model separates delivery throughput from deployment instability. Throughput includes change lead time, deployment frequency, and failed deployment recovery time. Instability includes change failure rate and deployment rework rate. The distinction matters because a team can move changes often and still create avoidable incidents, or recover quickly from failures while users wait too long for ordinary changes.

Deployment frequency
How often production deployments happen during a reporting period.
Change lead time
Elapsed time from committed code to production deployment.
Change failure rate
Share of deployments that require immediate intervention such as rollback, hotfix, or incident response.
Failed deployment recovery time
Elapsed time from a failed deployment to restored service.
Deployment rework rate
Share of deployments that represent unplanned work caused by production problems.
[Figure: DORA metric map from commit to deploy, recover, and review]

Good DORA reporting depends on a stable measurement boundary. One service, application, or product area measured over a consistent window gives a cleaner signal than a mixed portfolio where release policies, incident severity, and deployment automation differ. A monthly platform report can still be useful, but it should preserve the service-level rows that explain why the overall number moved.

Names and status words are also part of the measurement. A rollback, failed deployment, incident, hotfix, reverted change, or emergency patch may represent the same practical event in different delivery systems. Without a shared token policy, two teams can appear to have different failure rates even when their production experience is similar.

DORA metrics are strongest as trend and improvement signals, not as a complete account of engineering performance. They do not explain customer value, incident severity, business priority, or why a release was delayed. They show where to ask sharper questions about batch size, release friction, recovery practice, and unplanned repair work.

How to Use This Tool:

Prepare one deployment row per production release. A header row is optional, but named columns make the result easier to audit.

  1. Enter a service or team name, then set the measurement window from 1 to 180 days. Deployment frequency is normalized against that window.
  2. Choose the benchmark profile: Elite uses a target of 14 deployments per week, High uses 1 per week, Medium uses 0.25 per week, and Baseline leaves frequency as trend-only.
  3. Set the lead time, change failure, recovery, and rework targets if your review uses different guardrails. The time targets accept hours, and the rate targets accept percentages.
  4. Paste deployment rows, drop a CSV or TXT file, browse for a file, or start with the sample. Files larger than 1 MB are rejected so the page stays responsive.
  5. Use the expected CSV shape when possible: id, commit_time, deploy_time, status, recovered_time, and change_type. The last two fields are optional, but they improve failure, recovery, and rework analysis.
  6. Open Advanced when your release system uses custom words. Failure tokens are matched against status and change type, while rework tokens identify unplanned incident work.
  7. Read the DORA Scorecard first, then inspect the Deployment Ledger, Improvement Brief, Throughput Cadence Chart, Stability Recovery Chart, and JSON output when you need evidence for a review.
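
As a rough illustration of the CSV shape named in step 5, the sketch below reads rows with Python's standard csv module. It assumes a header row is present. The sample values and the `read_rows` helper are hypothetical and are not taken from the tool.

```python
import csv
import io

# Hypothetical sample in the expected column order; the values are invented.
SAMPLE_CSV = """id,commit_time,deploy_time,status,recovered_time,change_type
d-101,2024-05-06T09:00:00Z,2024-05-06T15:30:00Z,success,,feature
d-102,2024-05-07T10:00:00Z,2024-05-08T11:00:00Z,rollback,2024-05-08T13:00:00Z,hotfix
"""

EXPECTED_COLUMNS = ["id", "commit_time", "deploy_time", "status",
                    "recovered_time", "change_type"]


def read_rows(text: str) -> list[dict[str, str]]:
    """Read deployment rows, filling absent optional columns with ''."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for raw in reader:
        # raw.get returns None for columns the file does not have.
        rows.append({col: (raw.get(col) or "").strip() for col in EXPECTED_COLUMNS})
    return rows


rows = read_rows(SAMPLE_CSV)
print(len(rows), rows[1]["status"], repr(rows[0]["recovered_time"]))
```

Keeping every row in the same six-key shape means later steps can treat a missing optional field and an empty one the same way.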

If the page shows input warnings, fix the source rows before treating the summary as a management-ready result. Missing deploy times remove rows from all metrics, and missing recovery times weaken the recovery reading for failed deployments.

Advanced Tips:

  • Keep benchmark profiles and targets stable for recurring reviews. Changing the profile from High to Elite can be useful for aspiration, but it also changes the on-target signal even when the underlying deployment data has not moved.
  • Normalize deployment rows after loading a messy export when you want a clean audit copy. The normalized text keeps the expected columns and rewrites parseable timestamps into a consistent date-time shape.
  • Tune failure tokens before tuning rework tokens. Change failure rate is the primary instability measure, while rework rate is a useful companion for incident-driven hotfix and unplanned repair patterns.
  • Compare chart tabs only after matching the reporting window. The weekly chart buckets start on Monday in UTC, so deployments near a local week boundary can appear in a different week than a local team calendar.
  • Use table exports for review packets and JSON for repeatable audit trails. The scorecard gives the headline, but the deployment ledger preserves the row-level evidence behind each metric.
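
The normalization tip can be pictured as rewriting every parseable timestamp into one shape. The sketch below is a guess at that idea, not the tool's code: it assumes ISO 8601 input, picks an ISO 8601 UTC string as the target shape, and treats values without a timezone as UTC. The `normalize_timestamp` name is hypothetical.

```python
from datetime import datetime, timezone


def normalize_timestamp(value: str) -> str:
    """Return an ISO 8601 UTC string, or '' when the value cannot be parsed."""
    text = value.strip()
    if not text:
        return ""
    # Some Python versions of fromisoformat reject a trailing 'Z'.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return ""
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(normalize_timestamp("2024-05-06 09:00:00"))
print(normalize_timestamp("2024-05-06T09:00:00+02:00"))
```

Returning an empty string for unparseable values keeps the bad cell visible in the audit copy instead of silently replacing it with a guess.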

Interpreting Results:

The DORA Scorecard gives the quickest judgment. Frequency and lead time describe flow. Change fail rate, failed deployment recovery time, and deployment rework rate describe stability pressure. The summary badges are useful for scanning, but the ledger and warnings explain whether the inputs are complete enough to trust.

DORA result cues and corrective checks

| Result cue | What to trust | What to verify |
| --- | --- | --- |
| Deployment frequency | Valid deployments divided by window days and multiplied by 7, reported as deployments per week. | Confirm the window and service boundary before comparing teams or periods. |
| Change lead time | Median and p85 hours from commit time to deploy time for eligible rows. | Inspect rows with missing commit time or commit time after deployment. |
| Change fail rate | Percentage of valid deployments that match a failure token. | Check whether rollback, incident, outage, hotfix, and reverted-change labels are used consistently. |
| Failed deployment recovery time | Median hours from failed deploy time to recovered time for failed rows with recovery data. | Treat failed rows with missing recovered time as a reporting gap, not as fast recovery. |
| Deployment rework rate | Percentage of valid deployments marked by rework tokens such as hotfix, emergency, patch, or unplanned. | Use it as a companion to change failure rate, not as a substitute for it. |
| Improvement Brief | Prioritized findings based on the selected targets and parser warnings. | Review P1 and P2 items against the ledger before assigning improvement work. |

A green frequency signal does not mean the delivery system is healthy if failure rate or rework is high. For a fair trend review, keep the same window, token rules, source export, and service boundary across runs.

Technical Details:

DORA measurement starts by deciding which events count as production deployments. In this analyzer, a valid deployment is any row with a parseable deployment timestamp. The count of valid deployments is the numerator for frequency and the denominator for failure rate and rework rate, so rows without a usable deploy time are skipped rather than partially counted.
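
A minimal sketch of that validity rule, assuming ISO 8601 timestamps and rows held as dictionaries. The `parse_time` and `split_valid` helpers are hypothetical names, and the tool may accept more timestamp formats than this parser does.

```python
from datetime import datetime
from typing import Optional


def parse_time(value: str) -> Optional[datetime]:
    """Parse an ISO 8601 date or date-time; return None when unparseable."""
    text = (value or "").strip().replace("Z", "+00:00")
    if not text:
        return None
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return None


def split_valid(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate rows with a usable deploy_time from rows that are skipped."""
    valid, skipped = [], []
    for row in rows:
        if parse_time(row.get("deploy_time", "")) is not None:
            valid.append(row)
        else:
            skipped.append(row)
    return valid, skipped


valid, skipped = split_valid([
    {"id": "a", "deploy_time": "2024-05-06T15:30:00Z"},
    {"id": "b", "deploy_time": ""},
    {"id": "c", "deploy_time": "soon"},
])
print(len(valid), len(skipped))
```

Returning the skipped rows alongside the valid ones makes it easy to surface them as input warnings.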

Lead time measures elapsed hours from commit to deployment. It is excluded when the commit timestamp is missing or later than the deployment timestamp, because negative lead time would hide an extraction or clock-ordering problem. Recovery time is calculated only for deployments classified as failed and only when the recovery timestamp is at or after the failed deployment.
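
The eligibility checks for lead and recovery time can be sketched as follows. This assumes ISO 8601 input and treats timestamps without a timezone as UTC so that mixed values can be compared. All helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_time(value: str) -> Optional[datetime]:
    """Parse ISO 8601 text; naive values are assumed to be UTC."""
    text = (value or "").strip().replace("Z", "+00:00")
    if not text:
        return None
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def hours_between(start: Optional[datetime], end: Optional[datetime]) -> Optional[float]:
    """Elapsed hours, or None when a time is missing or end precedes start."""
    if start is None or end is None or end < start:
        return None
    return (end - start).total_seconds() / 3600


def lead_hours(row: dict) -> Optional[float]:
    """Commit to deploy; excluded when commit is missing or after deploy."""
    return hours_between(parse_time(row.get("commit_time", "")),
                         parse_time(row.get("deploy_time", "")))


def recovery_hours(row: dict, failed: bool) -> Optional[float]:
    """Deploy to recovery; applies only to rows already classified as failed."""
    if not failed:
        return None
    return hours_between(parse_time(row.get("deploy_time", "")),
                         parse_time(row.get("recovered_time", "")))
```

Returning `None` instead of a negative number keeps out-of-order timestamps out of the medians while leaving the row countable as a deployment.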

Token matching controls failure and rework classification. Status and change-type text are lowercased before comparison, so matching against the configured words is case-insensitive. A row can count as both failed and rework when, for example, a hotfix deployment also represents unplanned incident repair.
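
One way to picture the classification is a substring check over the lowercased status and change-type text. Substring matching is an assumption here, and the token lists are illustrative only. `matches_any` is a hypothetical helper.

```python
# Illustrative token lists; configure the words your own systems use.
FAILURE_TOKENS = ["rollback", "failed", "incident", "hotfix"]
REWORK_TOKENS = ["hotfix", "rework", "emergency", "incident", "patch", "unplanned"]


def matches_any(row: dict, tokens: list[str]) -> bool:
    """True when status or change_type contains any token, ignoring case."""
    haystack = f'{row.get("status", "")} {row.get("change_type", "")}'.lower()
    return any(tok.strip().lower() in haystack for tok in tokens if tok.strip())


row = {"status": "Rollback", "change_type": "Hotfix"}
print(matches_any(row, FAILURE_TOKENS), matches_any(row, REWORK_TOKENS))
```

With substring matching, a short token such as patch would also match a word like dispatch, which is one reason to review the token list against real status values.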

Formula Core

The frequency calculation converts a count over the selected reporting window into a weekly rate.

\[
\text{deployment frequency per week} = \frac{\text{valid deployment count}}{\text{measurement window days}} \times 7
\]
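
The same conversion as a small Python sketch. The function name is hypothetical, and the 1 to 180 day check mirrors the window range described in the usage steps.

```python
def deployments_per_week(valid_count: int, window_days: int) -> float:
    """Convert a deployment count over a window into a weekly rate."""
    if not 1 <= window_days <= 180:
        raise ValueError("window_days must be between 1 and 180")
    return valid_count / window_days * 7


# Seven valid deployments in a 30-day window.
print(round(deployments_per_week(7, 30), 3))
```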

Lead and recovery durations are calculated in hours, then summarized with medians. The p85 lead-time value shows the slower tail of changes.

\[
\text{lead hours}_i = \frac{\text{deploy time}_i - \text{commit time}_i}{1\ \text{hour}}, \qquad \text{recovery hours}_j = \frac{\text{recovered time}_j - \text{failed deploy time}_j}{1\ \text{hour}}
\]
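
A short sketch of the summary step, using invented lead-time values. The percentile method is not specified in this document, so the nearest-rank rule below is an assumption. An interpolating method would give slightly different p85 values. `percentile_nearest_rank` is a hypothetical helper.

```python
import math
import statistics
from typing import Optional


def percentile_nearest_rank(values: list[float], pct: int) -> Optional[float]:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Invented lead times in hours for rows that passed the eligibility checks.
lead = [2.0, 4.0, 6.5, 9.0, 12.0, 20.0, 48.0]
print(statistics.median(lead), percentile_nearest_rank(lead, 85))
```

A median well below the p85 value, as in this invented sample, is the pattern that signals a slow tail of changes.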

Rate metrics use the same valid deployment denominator, which makes failure and rework percentages comparable inside one run.

\[
\text{change failure rate} = \frac{\text{failed deployment count}}{\text{valid deployment count}} \times 100, \qquad \text{deployment rework rate} = \frac{\text{rework deployment count}}{\text{valid deployment count}} \times 100
\]
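
Both rates share one calculation, sketched below. Returning `None` when there are no valid deployments is an assumption, since the behavior for an empty run is not described here. `rate_pct` is a hypothetical name.

```python
from typing import Optional


def rate_pct(matching_count: int, valid_count: int) -> Optional[float]:
    """Percentage of valid deployments that matched a classification."""
    if valid_count == 0:
        return None  # assumption: no valid rows means no rate to report
    return matching_count / valid_count * 100


# Three failed rows out of seven valid deployments.
print(round(rate_pct(3, 7), 1))
```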

Rule Core

DORA analyzer rules and boundary behavior

| Rule | Boundary behavior | Result impact |
| --- | --- | --- |
| Valid deployment | deploy_time must parse as a date or date-time. | Invalid deploy rows are skipped from all counts and rates. |
| Lead-time row | commit_time must parse and be at or before deploy_time. | Invalid lead-time rows still count as deployments but not in median or p85 lead time. |
| Failure row | Status or change type contains one configured failure token. | The row counts in change failure rate and may require recovery data. |
| Recovery row | recovered_time must parse and be at or after the failed deploy time. | Missing recovery data creates a warning and blocks an on-target recovery signal. |
| Rework row | Status or change type contains one configured rework token. | The row counts in deployment rework rate whether or not it also counts as failed. |
| Weekly charts | Deployments are grouped by the UTC week that starts on Monday. | Chart labels may differ from local calendar expectations near week boundaries. |
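
The weekly grouping rule can be sketched as a function that maps a deployment time to the Monday that starts its UTC week. The `utc_week_start` name is hypothetical, and treating naive timestamps as UTC is an assumption.

```python
from datetime import date, datetime, timedelta, timezone


def utc_week_start(moment: datetime) -> date:
    """Return the Monday (UTC) of the week that contains the given moment."""
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    utc_day = moment.astimezone(timezone.utc).date()
    return utc_day - timedelta(days=utc_day.weekday())  # Monday is weekday 0


# Sunday evening at UTC-07:00 is already Monday in UTC.
local_sunday = datetime(2024, 5, 12, 20, 0, tzinfo=timezone(timedelta(hours=-7)))
print(utc_week_start(local_sunday))
```

The example deployment falls on a Sunday in the team's local calendar but lands in the following UTC week, which is the boundary effect the table describes.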

Target Signals

DORA target signal operators

| Metric | On-target rule | Default or profile target |
| --- | --- | --- |
| Deployment frequency | Weekly frequency is greater than or equal to the profile target, or Baseline is selected. | Elite 14/week, High 1/week, Medium 0.25/week, Baseline trend-only. |
| Change lead time | Median lead time is less than or equal to the lead time target. | 24 hours by default. |
| Change fail rate | Change failure rate is less than or equal to the change failure target. | 15% by default. |
| Failed deployment recovery time | No failed deployments exist, or all failed rows have valid recovery data and median recovery is less than or equal to target. | 4 hours by default. |
| Deployment rework rate | Rework rate is less than or equal to the rework target. | 10% by default. |
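
The on-target rules in the table can be sketched as simple comparisons. The function names are hypothetical, and treating a missing value as not on target in `at_most` is an assumption rather than documented behavior.

```python
import statistics
from typing import Optional

# Weekly frequency targets per profile; None marks Baseline as trend-only.
PROFILE_TARGETS = {"Elite": 14.0, "High": 1.0, "Medium": 0.25, "Baseline": None}


def frequency_on_target(per_week: float, profile: str) -> bool:
    """On target when the rate meets the profile target, or for Baseline."""
    target = PROFILE_TARGETS[profile]
    return target is None or per_week >= target


def at_most(value: Optional[float], target: float) -> bool:
    """Shared rule for median lead time, failure rate, and rework rate."""
    return value is not None and value <= target


def recovery_on_target(failed_count: int, recovery_hours: list[float],
                       target_hours: float = 4.0) -> bool:
    """On target with no failures, or full recovery data within the target."""
    if failed_count == 0:
        return True
    if len(recovery_hours) < failed_count:
        return False  # a failed row lacks valid recovery data
    return statistics.median(recovery_hours) <= target_hours


print(frequency_on_target(1.63, "High"), at_most(42.9, 15.0),
      recovery_on_target(3, [1.0, 2.0]))
```

The recovery rule is the only one that can fail because of missing data alone, which is why a blank recovered_time blocks an on-target signal.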

A sample substitution shows the scale of the arithmetic. Seven valid deployments in a 30-day window produce 7 / 30 * 7 = 1.633 deployments per week, displayed as about 1.63/week in detailed output and 1.6/week in the summary. If three of those deployments match failure tokens, the change failure rate is 3 / 7 * 100 = 42.857%, displayed as 42.9% in the scorecard.

Different vendor dashboards may define the production boundary, successful deployment count, incident relation, or rolling period differently. Use the analyzer for transparent CSV-based review, and avoid mixing its numbers with another platform unless the event definitions match.

Accuracy and Privacy Notes:

  • Pasted and selected CSV text is read in the browser for calculation. External chart code may be loaded by the page, but deployment rows are not sent to a server to calculate results.
  • Deployment and incident rows can reveal sensitive service, timing, and outage information. Remove customer names, secrets, and incident details that are not needed for the metric.
  • Trend comparisons are valid only when the service boundary, measurement window, deployment source, and token rules stay consistent.
  • Incident severity, customer impact, root cause, ownership, and business value are not inferred from the CSV unless those facts are represented in the visible rows.

Worked Examples:

Healthy cadence with stability pressure. The sample data has 7 valid deployments across a 30-day window, so Deployment frequency is about 1.63/week and meets the High profile target of 1/week. Three rows match failure tokens, so Change fail rate is 42.9% against the default 15% target. The Scorecard should point review toward release quality and failed-change causes before the team celebrates cadence.

Long lead-time tail. A team may show a 9.0 hr median lead time while the p85 value sits around 22.3 hr. That means most changes move fast, but a smaller group waits much longer. The Improvement Brief can still recommend checking review queues, build delays, approvals, or environment promotion even when the median is on target.

Recovery data missing. If a failed deployment row has status rollback but no recovered_time, the row still counts in Change fail rate. Failed deployment recovery time becomes less trustworthy because a missing recovery timestamp is not proof of quick restoration. Fill the restored time from the incident record before comparing recovery to the 4 hr target.

Hotfix-heavy week. A release export with several hotfix or unplanned rows can meet Deployment frequency while failing the Deployment rework rate target. The practical follow-up is to separate planned feature delivery from incident repair and inspect which production triggers caused the rework.

FAQ:

Can I analyze more than one service at once?

You can, but the result is easier to act on when each run uses one service, application, or team boundary. Mixed services can hide which system caused the failure, delay, or rework signal.

Why are some rows skipped?

Rows without a parseable deploy_time are skipped because deployment time is required for every metric. Use Normalize or fix the source export so each deployment has a valid date or date-time.

Why is lead time missing for valid deployments?

Lead time needs a valid commit_time that is not later than deploy_time. Rows that fail that check still count as deployments, but they are excluded from median and p85 lead time.

How should failure tokens be chosen?

Use the words your release and incident systems actually use for failed deployments, rollbacks, incidents, outages, reverted changes, and hotfixes. Keep the token list stable when comparing periods.

Is deployment rework a core DORA metric?

Current DORA guidance includes deployment rework rate in the five-metric model. In this analyzer, it is calculated from configured rework tokens and reported beside the four longer-established delivery metrics.

Why does another dashboard show different DORA numbers?

Platforms differ in production environment rules, successful deployment counting, incident linkage, and rolling-window logic. Compare numbers only after matching the deployment source, window, service boundary, and failure definitions.

Glossary:

Change lead time
Hours from committed code to production deployment for rows with valid commit and deploy timestamps.
Change failure rate
Percentage of valid deployments classified as failed by the configured status or change-type tokens.
Deployment frequency
Valid deployment count normalized to deployments per week over the selected measurement window.
Deployment rework rate
Percentage of valid deployments marked as unplanned repair work by the configured rework tokens.
Failed deployment recovery time
Median hours from failed deployment to recorded recovery for failed rows with valid recovery timestamps.
Measurement window
The day count used to normalize deployment frequency and frame a trend comparison.
Token policy
The chosen words that classify rows as failed changes or unplanned rework.