DORA's 5th Metric: What Rework Rate Reveals About Your AI Coding Tools
In October 2025, the Continuous Delivery Foundation quietly added a fifth metric to the DORA framework: Rework Rate. It measures the proportion of deployments that are unplanned — rollbacks, hotfixes, reverts — as a percentage of total deployments. The timing was not accidental. AI coding assistants had been in widespread use for over two years, and the data was showing a pattern the original four metrics could not capture.
What this article covers
The formula for Rework Rate, why the CD Foundation added it in 2025, the connection between AI-generated code and rework, how to calculate it from GitHub and incident data, and what benchmarks to target.
Why the Original Four Metrics Were Not Enough
The four original DORA metrics — deployment frequency, lead time for changes, change failure rate, and mean time to restore — were designed in an era where the primary threat to software delivery quality was organizational dysfunction: too many handoffs, too little automation, too slow a feedback loop. They were, and remain, excellent instruments for measuring that problem.
But they have a shared blind spot: they treat all deployments as equivalent inputs. A deployment is a deployment. Whether it was authored by a senior engineer who owns the service, a new hire on their first week, or an AI agent that generated 800 lines of code overnight — the DORA framework counts it the same way.
This was fine when the code-to-deploy pipeline was purely human. It became a problem when AI coding assistants started contributing 30–40% of committed code at high-adoption companies. Teams were shipping faster than ever on deployment frequency, their lead time looked excellent, but something was off downstream: their change failure rate was creeping up, and the failures were clustering around a specific type of change.
Rework Rate names that pattern. It does not replace the original four metrics — it extends them by measuring the proportion of your deployment pipeline that is reactive rather than intentional.
The Formula
Rework Rate is expressed as a percentage of total deployments in a given period:
Rework Rate = (Unplanned deployments / Total deployments) × 100%

The definition of "unplanned deployment" is what requires care. The CD Foundation defines it as any deployment whose primary purpose is to remediate a problem introduced by a prior deployment within a defined lookback window — typically 48 hours. In practice, this means:
- Rollback deployments: reverting to a previous artifact or commit because the current deployment is degrading service.
- Hotfix deployments: a PR merged and deployed outside the normal release cycle specifically to fix a production issue introduced by the prior deploy.
- Revert commits: a `git revert` of a specific PR or commit that was deployed to production and caused an incident.
What does NOT count: scheduled maintenance deployments, infrastructure upgrades, or deployments of new features that happen to ship on short cycle times. The key distinction is intentionality: was this deployment planned as part of the normal delivery flow, or was it forced by a problem introduced upstream?
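To make the formula concrete, here is a worked example with illustrative numbers (the counts are invented for demonstration):

```python
# Illustrative month: 60 production deployments, of which 4 were
# rollbacks and 3 were hotfixes shipped within the 48-hour lookback window.
total_deployments = 60
unplanned_deployments = 4 + 3  # rollbacks + hotfixes

rework_rate = unplanned_deployments / total_deployments * 100
print(f"Rework Rate: {rework_rate:.1f}%")  # Rework Rate: 11.7%
```

Scheduled maintenance and ordinary feature deploys in the same month would count toward the denominator but not the numerator.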
The AI Connection
The 2025 DORA State of DevOps Report — the one that accompanied the Rework Rate addition — found that teams with high AI coding assistant adoption (more than 40% of committed code AI-assisted) showed a 2.3× higher Rework Rate than teams with low AI adoption, even after controlling for team size, deployment frequency, and industry.
This is a striking finding, and it deserves more careful unpacking than the headline number suggests.
Why AI-generated code produces more rework
AI coding assistants are extraordinarily good at producing syntactically correct code that passes linting and even many unit tests. They are significantly worse at two things:
First, architectural coherence. An AI model generating a new API endpoint does not have the same implicit understanding of your service's failure modes, rate limiting strategy, caching patterns, and downstream consumers that an engineer who has worked on the service for six months does. The code compiles and tests pass, but it introduces subtle issues that only manifest under production load patterns.
Second, cross-cutting concern awareness. AI agents rarely understand which files are high-stakes. A change to a utility function that is called from 47 other modules looks the same to the model as a change to a leaf node with no dependents. Human reviewers with context catch these — AI-generated PRs get reviewed faster (the code looks clean and complete) and receive less scrutiny on the architectural implications.
The result is a class of defects that passes review but manifests in production: higher Rework Rate, not from careless engineering, but from systematically insufficient review of a new class of code.
How to Calculate Rework Rate from GitHub Data
Rework Rate requires correlating deployment data with post-deployment remediation events. Here is the implementation approach using GitHub as the primary data source.
Step 1: Identify your deployment events
Use the GitHub Deployments API or parse your GitHub Actions workflow run history. For each deployment in the measurement window, record: the deployment ID, the commit SHA, the environment (filter to production), the timestamp, and the final status.
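Step 1 can be sketched as a normalization pass over deployment records. The dicts below mirror the shape of the GitHub Deployments API response (`id`, `sha`, `environment`, `created_at`); `final_status` is a simplified stand-in for the latest entry from the deployment statuses endpoint, which you would fetch separately.

```python
from datetime import datetime

# Sample records in the shape of the GitHub Deployments API response.
raw_deployments = [
    {"id": 101, "sha": "a1b2c3d", "environment": "production",
     "created_at": "2025-11-03T14:05:00Z", "final_status": "success"},
    {"id": 102, "sha": "e4f5a6b", "environment": "staging",
     "created_at": "2025-11-03T15:10:00Z", "final_status": "success"},
]

def production_deployments(deployments):
    """Filter to production and parse timestamps for later windowing."""
    records = []
    for d in deployments:
        if d["environment"] != "production":
            continue
        records.append({
            "id": d["id"],
            "sha": d["sha"],
            "deployed_at": datetime.fromisoformat(
                d["created_at"].replace("Z", "+00:00")),
            "status": d["final_status"],
        })
    return records

prod = production_deployments(raw_deployments)
print(len(prod))  # 1: only deployment 101 targets production
```

In a real pipeline the `raw_deployments` list would come from paginated API calls; the filtering and timestamp parsing stay the same.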
Step 2: Identify unplanned deployments
Three patterns classify a deployment as unplanned:
- Rollback pattern: A deployment within 48 hours of the prior deployment with a branch name or PR title matching a rollback convention (e.g., branch names containing `rollback`, `revert`, or `hotfix`, or a PR title starting with "Revert").
- Status failure: A deployment whose status transitions to `failure` or `inactive` within the lookback window, followed by a new deployment.
- Incident correlation: A deployment immediately preceded by an incident opening in your incident tool (PagerDuty, OpsGenie, incident.io), where that deployment is the one that resolves the incident.
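The rollback-pattern check from Step 2 can be sketched as a small predicate. The field names (`deployed_at`, `branch`, `pr_title`) and the naming convention regex are illustrative assumptions, not a fixed standard:

```python
import re
from datetime import datetime, timedelta, timezone

ROLLBACK_RE = re.compile(r"\b(rollback|revert|hotfix)\b", re.IGNORECASE)
LOOKBACK = timedelta(hours=48)

def is_unplanned(deploy, previous_deploy):
    """Rollback-pattern check: inside the lookback window AND named like a remediation."""
    within_window = (deploy["deployed_at"] - previous_deploy["deployed_at"]) <= LOOKBACK
    matches_convention = bool(
        ROLLBACK_RE.search(deploy["branch"])
        or deploy["pr_title"].startswith("Revert")
    )
    return within_window and matches_convention

previous = {"deployed_at": datetime(2025, 11, 3, 14, 0, tzinfo=timezone.utc),
            "branch": "main", "pr_title": "Add rate limiting to search"}
hotfix = {"deployed_at": datetime(2025, 11, 4, 9, 30, tzinfo=timezone.utc),
          "branch": "hotfix/search-timeout", "pr_title": "Fix search timeout"}

print(is_unplanned(hotfix, previous))  # True: within 48h, branch matches
```

The status-failure and incident-correlation checks would slot in as additional predicates over the same normalized records; a deployment counts as unplanned if any of the three fires.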
Step 3: Calculate the ratio
```python
# Example: monthly rework rate. count_deployments and
# count_unplanned_deployments are your own data-access helpers,
# built from Steps 1 and 2.
total_deployments = count_deployments(env='production', period='30d')
unplanned = count_unplanned_deployments(env='production', period='30d')
rework_rate = (unplanned / total_deployments) * 100
```

Benchmarks
| Performance Band | Rework Rate | Typical Pattern |
|---|---|---|
| Elite | < 5% | Rare hotfixes, strong pre-deploy review gates |
| High | 5–10% | Occasional rollbacks, good incident detection |
| Medium | 10–20% | Regular hotfixes, reactive deployment culture |
| Low | > 20% | Every fifth deploy is cleaning up the last one |
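The bands in the table can be encoded as a small helper for dashboards or alerts. Note that the table leaves the exact boundary values (precisely 10% or 20%) unspecified; assigning them to the lower band here is a judgment call:

```python
def performance_band(rework_rate: float) -> str:
    """Map a rework rate (as a percentage) onto the benchmark bands."""
    if rework_rate < 5:
        return "Elite"
    if rework_rate <= 10:
        return "High"
    if rework_rate <= 20:
        return "Medium"
    return "Low"

print(performance_band(11.7))  # Medium
```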
Teams with high AI coding assistant adoption should expect their initial Rework Rate measurement to be 5–8 percentage points higher than their pre-AI baseline. This is not a reason to abandon AI tools — it is a signal that your review process needs to evolve alongside your tooling.
What to Do When Rework Rate Is High
Rework Rate is most useful not as a global number but as a segmented one. Break it down by:
- AI-authored vs. human-authored changes: If AI-assisted PRs have 3× the rework rate of human PRs, you have a review quality problem, not an AI adoption problem.
- Service or module: High rework in one service often points to an architectural issue, or to a missing CODEOWNERS rule that lets changes to a critical path merge without review from the engineers who know it best.
- Time of day / day of week: Rework rates are systematically higher for deployments made on Friday afternoons and during on-call engineer rotation transitions — classic pressure-to-ship scenarios.
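The first segmentation above can be sketched with plain counters. The records here are invented for illustration; in practice `author_type` would be derived from PR metadata (for example, an AI co-author trailer on the commit):

```python
from collections import defaultdict

# Illustrative (author_type, was_unplanned) pairs per deployment.
deploys = [
    ("ai", True), ("ai", False), ("ai", True), ("ai", False),
    ("human", False), ("human", False), ("human", True), ("human", False),
]

totals, unplanned = defaultdict(int), defaultdict(int)
for author_type, was_unplanned in deploys:
    totals[author_type] += 1
    unplanned[author_type] += was_unplanned  # bool counts as 0 or 1

for author_type in sorted(totals):
    rate = unplanned[author_type] / totals[author_type] * 100
    print(f"{author_type}: {rate:.0f}%")  # ai: 50%, human: 25%
```

The same two-counter pattern works for the other segmentations: key by service name, or by deploy hour and weekday, instead of author type.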
Deploy risk prediction catches rework before it happens
Rework Rate is a trailing metric — it tells you that something already went wrong. Koalr's deploy risk score operates before the merge, flagging high-risk PRs based on author expertise, coverage delta, change entropy, and historical failure patterns. The goal is to keep your Rework Rate low by catching the changes most likely to need remediation before they reach production.
Connecting Rework Rate to Your AI Tooling Decisions
Rework Rate gives engineering leaders something they did not have before: a quantitative signal for the hidden cost of AI-assisted development. The productivity gains from AI coding tools are real and well-documented. So are the downstream costs when review processes do not adapt.
A team that measures Rework Rate before and after deploying GitHub Copilot or Cursor can make a data-driven judgment: did the AI tool increase productivity more than it increased rework cost? If your Rework Rate went from 8% to 18% after AI adoption, the productivity gain needs to be substantial to justify the remediation overhead — and you should be implementing stronger pre-deploy review gates, not abandoning the tools.
Track Rework Rate alongside your other DORA metrics. Watch for it increasing while deployment frequency increases — that pattern indicates you are shipping faster but cleaning up more, which is not a sustainable trajectory.
Track all five DORA metrics automatically
Koalr calculates Rework Rate alongside deployment frequency, lead time, CFR, and MTTR — broken down by team, service, and AI vs. human-authored changes. Connect GitHub in under 5 minutes.