The Rework Rate Problem: Measuring AI Code Quality at Scale
As AI coding tools generate a larger share of your codebase, the question of AI code quality becomes urgent — and vague. "Is our AI code good?" is not a measurable question. "What is our rework rate for AI-assisted changes versus human-written changes?" is. This guide walks through the practical work of defining, measuring, and monitoring rework rate as a quality signal for AI-generated code.
Why rework rate beats code review scores
PR approval rates, review times, and comment counts are process metrics. Rework rate is an outcome metric — it measures whether code that shipped actually worked. Outcome metrics are dramatically harder to game and much more useful for evaluating AI tool ROI.
Step 1: Define What Counts as Rework
Before measuring rework rate, your team needs to agree on a consistent definition. This is the step most teams skip, and skipping it makes every downstream comparison meaningless.
There are three categories of rework events to capture:
Category A: Revert commits
A revert commit is a git revert of a specific commit or PR that was deployed to production. git generates these commit messages as Revert "[original subject]", and GitHub's Revert button opens a PR titled Revert "[original PR title]" — making both relatively easy to identify by title pattern matching.
Revert commits are the clearest signal of rework: the team explicitly undid something that was shipped. The ambiguity is whether the revert was motivated by a production issue (true rework) or by a decision to change direction (planned rollback, not rework). The simplest way to disambiguate: a revert within 48 hours of the original deploy is rework. A revert after 48 hours is more likely to be a planned change.
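The 48-hour heuristic can be expressed directly in code. A minimal sketch, assuming you can obtain the deploy timestamp from your deploy log and the revert timestamp from the revert PR (`classify_revert` and its threshold are illustrative, not an existing API):

```python
from datetime import datetime, timedelta

def classify_revert(deployed_at: datetime, reverted_at: datetime,
                    threshold_hours: int = 48) -> str:
    """Apply the 48-hour heuristic: a revert shortly after deploy is rework;
    a revert long after is more likely a planned change of direction."""
    lag = reverted_at - deployed_at
    return "rework" if lag <= timedelta(hours=threshold_hours) else "planned"
```

The threshold is a team-level knob: teams with slow-surfacing bugs may want to widen it, at the cost of counting some planned rollbacks as rework.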
Category B: Hotfix branches and PRs
Hotfix PRs are changes merged outside the normal release cycle specifically to address a production issue. They are identifiable by branch name patterns (hotfix/*, fix/prod-*, emergency/*) or PR labels that your team applies to track them.
The challenge with hotfixes is that they are easy to misclassify in both directions. Not all "hotfix" branches are actual production fixes — some teams use the convention for any urgent change, including non-production-impacting ones. And some actual hotfixes do not follow the naming convention because the engineer was moving fast. Establish a team norm: hotfix branch naming is required for any change that directly addresses a production incident.
Category C: Rollback deployments
A rollback deployment is a deployment that re-deploys a previous artifact to production. In GitHub Deployments API terms, this appears as a new deployment with a SHA that matches a prior deployment's SHA, or a deployment where the description indicates it is a rollback.
Rollbacks are distinct from reverts: a rollback returns the running system to a previous state without changing the codebase; a revert changes the codebase to undo a previous change. Both count as rework.
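Detecting a rollback from deployment history reduces to spotting a deployment whose SHA already appeared earlier in the same environment. A sketch under the assumption that each deployment record carries `sha` and `created_at` fields (mirroring the GitHub Deployments API) and the list is ordered oldest-first:

```python
def find_rollbacks(deployments):
    """Flag any deployment that re-deploys a SHA seen in an earlier deployment.

    `deployments` is an oldest-first list of dicts with at least
    'sha' and 'created_at' keys.
    """
    seen = set()
    rollbacks = []
    for dep in deployments:
        if dep["sha"] in seen:
            rollbacks.append(dep)  # re-deploy of a prior artifact = rollback
        seen.add(dep["sha"])
    return rollbacks
```

In practice you would run this per environment (production only), since re-deploying an old SHA to staging is routine rather than rework.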
Step 2: Tag AI-Generated Changes at Merge Time
To compute rework rate separately for AI vs. human changes, you need to know at merge time which PRs contained significant AI-generated code. The most reliable method is a label or tag applied by the tool or the developer.
```yaml
# GitHub Actions: auto-apply label when Copilot is used
name: Label AI PRs
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  detect-ai:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - name: Check for AI co-authorship
        run: |
          # Scan every commit message in the PR for a Copilot co-author trailer
          if gh api "repos/$REPO/pulls/$PR_NUMBER/commits" --jq '.[].commit.message' \
            | grep -qi "co-authored-by:.*copilot"; then
            gh pr edit "$PR_NUMBER" --repo "$REPO" --add-label "ai-assisted"
          fi
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO: ${{ github.repository }}
```
Note that on a pull_request event the checked-out ref is a synthetic merge commit, so inspect the PR's commit list via the API rather than git log -1 on the workspace — otherwise the co-author trailers are not visible.
Alternative approaches: require developers to self-report AI assistance in a PR template checkbox, or use the GitHub API to identify PRs opened by bot accounts (for autonomous agents like Devin).
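The bot-account approach is a simple check on the PR author. A sketch assuming the PR payload follows the GitHub REST shape (`user.login`, `user.type`); the agent login names below are illustrative placeholders, not a definitive list:

```python
# Placeholder logins — replace with the bot accounts your org actually uses.
KNOWN_AGENT_LOGINS = {"devin-ai-integration[bot]", "copilot-swe-agent[bot]"}

def is_agent_authored(pr) -> bool:
    """True if the PR was opened by a known autonomous-agent bot account."""
    user = pr.get("user", {})
    return user.get("type") == "Bot" and user.get("login") in KNOWN_AGENT_LOGINS
```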
Step 3: Calculate Rework Rate Segmented by AI vs. Human
With a consistent rework definition (Step 1) and AI labels applied at merge time (Step 2), the calculation itself is a straightforward pass over merged PRs:
```python
import re

def is_rework_pr(pr):
    """Classify a merged PR as rework; returns (is_rework, category)."""
    title = pr.get("title", "").lower()
    branch = pr.get("head", {}).get("ref", "").lower()
    # Pattern A: revert commit / revert PR
    if title.startswith("revert"):
        return True, "revert"
    # Pattern B: hotfix branch naming
    hotfix_patterns = [r"hotfix/", r"fix/prod", r"emergency/", r"fix/incident"]
    if any(re.search(p, branch) for p in hotfix_patterns):
        return True, "hotfix"
    # Pattern C: rollback label
    labels = [label["name"] for label in pr.get("labels", [])]
    if "rollback" in labels or "incident-fix" in labels:
        return True, "rollback"
    return False, None

def compute_rework_rates(merged_prs):
    ai_total = ai_rework = 0
    human_total = human_rework = 0
    for pr in merged_prs:
        labels = [label["name"] for label in pr.get("labels", [])]
        is_ai = "ai-assisted" in labels or "ai-generated" in labels
        is_rw, _ = is_rework_pr(pr)
        if is_ai:
            ai_total += 1
            ai_rework += int(is_rw)
        else:
            human_total += 1
            human_rework += int(is_rw)
    return {
        "ai_rework_rate": (ai_rework / ai_total * 100) if ai_total else 0,
        "human_rework_rate": (human_rework / human_total * 100) if human_total else 0,
        "ai_prs": ai_total,
        "human_prs": human_total,
    }
```
Step 4: Build the Dashboard
A useful rework rate dashboard for AI quality tracking shows at minimum:
- Rework rate trend (30-day rolling): Separate lines for AI-assisted and human-written PRs. The primary comparison you want to watch.
- Rework rate by service or module: Which services have the highest AI-assisted rework? This identifies where AI tools are performing worst and where additional review gates are needed.
- Rework lag distribution: How many hours between the original PR and the rework event? Short lag (under 4 hours) indicates rapid detection — good incident response. Long lag (over 48 hours) may indicate subtle bugs that take time to surface.
- AI adoption percentage (same chart): Show AI adoption rate on the same time axis as rework rate. If rework rate climbs in lockstep with AI adoption, the tools are a likely contributor; if the two move independently, the cause probably lies elsewhere.
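The trend line in the first chart can be derived from the same PR records used in Step 3. A sketch that buckets merged PRs by ISO week and computes the two rates per bucket — it assumes each PR dict carries a `merged_at` ISO timestamp plus the labels from Step 2, and takes the rework check as a boolean predicate so you can plug in whatever classifier you settled on:

```python
from collections import defaultdict
from datetime import datetime

def weekly_rework_trend(merged_prs, is_rework):
    """Return {iso_week: {"ai": rate_pct, "human": rate_pct}}.

    `is_rework` is a boolean predicate over a PR dict.
    """
    # Per week and segment, track [rework_count, total_count]
    buckets = defaultdict(lambda: {"ai": [0, 0], "human": [0, 0]})
    for pr in merged_prs:
        week = datetime.fromisoformat(pr["merged_at"]).strftime("%G-W%V")
        labels = {label["name"] for label in pr.get("labels", [])}
        segment = "ai" if labels & {"ai-assisted", "ai-generated"} else "human"
        counts = buckets[week][segment]
        counts[0] += int(is_rework(pr))
        counts[1] += 1
    return {
        week: {seg: (c[0] / c[1] * 100) if c[1] else 0 for seg, c in segs.items()}
        for week, segs in buckets.items()
    }
```

Weekly buckets smooth out day-to-day noise; for a 30-day rolling window, replace the week key with a sliding date range over the same counts.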
What a Good Result Looks Like
Teams that implement AI-aware rework rate tracking typically find one of three patterns:
Pattern 1: AI rework rate ≈ human rework rate. The AI tools are being reviewed effectively. The review process has adapted to account for AI-generated code. This is the target state.
Pattern 2: AI rework rate is 2–3× human rework rate. The review process has not adapted: the team is approving AI code too quickly, without sufficient coverage review or integration checking. Intervention: implement a coverage-first review process and CODEOWNERS enforcement for AI PRs.
Pattern 3: AI rework rate is >5× human rework rate. The AI tools are being used for changes that are beyond their capability — high-complexity, cross-cutting changes that require deep domain expertise. Intervention: restrict AI tool use to lower-risk change categories (isolated functions, test generation, documentation) and implement scope gates on AI PR submissions.
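The CODEOWNERS enforcement mentioned under Pattern 2 is path-based, so one workable sketch is to route the services with the highest AI-assisted rework to designated reviewers. Paths and team names below are illustrative placeholders:

```
# CODEOWNERS sketch — paths and teams are placeholders
/services/payments/   @org/payments-senior-reviewers
/services/auth/       @org/security-reviewers
```

Combined with a branch-protection rule requiring code-owner review, no PR touching these paths can merge without an owner's approval. Note that CODEOWNERS applies to every PR on those paths regardless of label; a gate conditioned on the ai-assisted label needs a separate workflow check.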
Koalr tracks rework rate by AI vs. human automatically
Koalr detects AI-assisted PRs, tracks rework events (reverts, hotfixes, rollbacks), and shows you the rework rate comparison side-by-side so you can measure whether your AI tools are helping or hurting delivery quality.
Track AI code quality with rework rate
Koalr automatically segments rework rate by AI-assisted vs. human-written changes, giving you the data to evaluate your AI tool ROI with outcome metrics rather than process proxies. Connect GitHub in 5 minutes.