AI & Engineering · March 22, 2026 · 10 min read

Author Expertise Signals: Why AI-Written PRs Need Different Scrutiny

One of the most predictive signals in deployment risk modeling is author file expertise — a measure of how much prior experience the PR author has with the specific files being changed. Engineers who have never committed to a file before have significantly higher incident rates for changes to that file. AI coding agents, by definition, have zero file expertise on every change they make. This has specific implications for how you score and route AI-generated PRs.

The core signal

Changes to a file by an author who has never committed to that file before have a 2.1× higher incident rate than changes by engineers with 5+ prior commits to the file. AI agents are always in the "zero prior commits" bucket — for every file, every time.

What Author File Expertise Measures

Author file expertise is calculated per-file, per-author, from your git commit history. For a given PR, you compute a score for each file being modified: how many times has this specific author committed to this specific file in the past N months?

# Simplified expertise calculation, shelling out to `git log`
import subprocess

def git_log(author, path, since):
    """Return one line (commit hash) per matching commit."""
    out = subprocess.run(
        ["git", "log", "--format=%H", f"--author={author}",
         f"--since={since}", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()

def file_expertise(author_email, file_path, lookback_days=90):
    commits = git_log(
        author=author_email,
        path=file_path,
        since=f"{lookback_days} days ago",
    )
    return len(commits)  # 0 = first-time contributor to this file

def pr_expertise_score(pr_author, changed_files):
    scores = [file_expertise(pr_author, f) for f in changed_files]
    # Return worst-case (minimum) expertise across all changed files
    return min(scores) if scores else 0

The signal is strongest at the extremes. Authors with zero prior commits to a file (first-time contributors) have significantly higher incident rates. Authors with 10+ prior commits have much lower rates. The inflection point is around 3–5 commits — after that, additional commits add diminishing predictive power.
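One way to make that inflection point explicit is to bucket the raw commit count into coarse bands before feeding it into a risk model. A minimal sketch; the band names and the exact cutoffs (0, 3, 10) are illustrative choices drawn from the ranges above, not fixed constants:

```python
def expertise_band(commit_count: int) -> str:
    """Map a per-file commit count to a coarse expertise band.

    Thresholds mirror the ranges discussed above: the signal is
    strongest at the extremes, with an inflection around 3-5 commits.
    """
    if commit_count == 0:
        return "none"    # first-time contributor: highest risk
    if commit_count < 3:
        return "low"
    if commit_count < 10:
        return "medium"
    return "high"        # 10+ commits: diminishing returns beyond this
```

Bucketing also makes the signal robust to outliers: the difference between 20 and 40 prior commits carries little extra information, so both land in the same band.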

Why File Expertise Predicts Deployment Risk

File expertise is a proxy for domain knowledge. An engineer who has committed to your payments/charge.service.ts file 15 times over the past year has accumulated knowledge about that file that is not written in the code or in any documentation:

  • Which error codes from the payment gateway map to which retry strategies
  • Why that seemingly unnecessary null check exists (it prevented a $200K incident in 2024)
  • Which downstream services consume this function's output and are sensitive to schema changes in the return type
  • Which edge cases in the test suite are actually testing production scenarios versus historical incidents

A first-time contributor to the file — human or AI — does not have this knowledge. They can read the code and understand what it does. They cannot know what it means, what its history is, or what the unwritten constraints are.

The AI Agent Expertise Problem

Human engineers gradually accumulate file expertise over their tenure on a team. A new hire starts with zero expertise everywhere and, over 6–12 months, builds expertise in the components they work on most. Their risk profile improves automatically as they spend time in the codebase.

AI coding agents do not accumulate expertise. Copilot, Cursor, Devin, or any other AI agent working on a PR has no commit history to any file in your codebase. From the perspective of the author expertise signal, every AI-generated PR is authored by a first-day new hire who has never seen the codebase before — regardless of how many AI-generated PRs have been merged in the past.

This problem cannot be solved by giving the AI more context in the prompt. The expertise signal captures accumulated knowledge that exists in the engineer's head, not in the codebase. No amount of additional context in the prompt will give an AI agent the equivalent of a human expert's institutional memory.

Implications for Risk Scoring

If you are building or using a deployment risk scoring system, the author expertise signal needs to be applied differently for AI-generated code:

| Scenario | Expertise Score | Risk Implication |
| --- | --- | --- |
| Expert engineer, owned file 2+ years | High (20+ commits) | Lower risk contribution from this signal |
| Engineer, some exposure to file | Medium (3–10 commits) | Moderate risk contribution |
| Engineer, first-time contributor | Low (0–2 commits) | Elevated risk — trigger CODEOWNERS review |
| AI agent (any tool) | Zero (always) | Highest risk — mandatory expert review required |
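These scenarios can be encoded directly as a scoring rule. A sketch, assuming a simple boolean `is_ai_authored` flag and illustrative tier names; the key property is that AI authorship always maps to the highest-risk tier, regardless of any other signal:

```python
def risk_tier(expertise_score: int, is_ai_authored: bool) -> str:
    """Map the scenario table to a risk tier for routing decisions."""
    if is_ai_authored:
        return "mandatory-expert-review"  # AI agent: zero expertise, always
    if expertise_score <= 2:
        return "codeowners-review"        # first-time contributor
    if expertise_score <= 10:
        return "moderate"
    return "low"                          # 20+ commits: expert author
```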

How to identify AI-authored PRs

Several signals can be used to identify AI-generated PRs:

  • Co-authored-by bot signatures: Many AI tools add Co-authored-by: GitHub Copilot <copilot@github.com> or similar to commit messages.
  • PR labels: Teams using GitHub Copilot Workspace, Devin, or similar tools often have the tool auto-apply a label (e.g., ai-generated) to PRs it creates.
  • PR description patterns: AI-generated PRs often have characteristic description formats — very complete, structured with sections, sometimes too verbose. This is a heuristic, not a deterministic signal.
  • PR author is a bot account: Autonomous agents like Devin create PRs from their own GitHub accounts, which can be identified as bot accounts.

Compensating Controls for Zero-Expertise PRs

When a PR has zero author expertise (AI-generated or human first-time contributor), the risk scoring system should trigger compensating controls:

Mandatory CODEOWNERS review. Route the PR to the team that owns the affected files. This is the most important compensating control — it ensures that someone with file expertise reviews the change, even if the author has none.

Elevated coverage threshold. Apply a higher coverage delta requirement for zero-expertise PRs. Where a regular PR might require coverage to stay flat, a zero-expertise PR should require a coverage increase — the lack of domain knowledge should be compensated by more thorough automated testing.

Staging validation requirement. For zero-expertise PRs touching critical services, require staging validation before production merge, regardless of the test results.

Reduced blast radius deployment. Consider canary deployment patterns for zero-expertise PRs to high-risk services — route 5% of traffic to the new version before full rollout.
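Tied together, the four controls above can be derived mechanically from the expertise score. A sketch, assuming a hypothetical Controls structure and a touches_critical_service flag; the thresholds (a +1 point coverage requirement, 5% canary traffic) are illustrative, not recommendations:

```python
from dataclasses import dataclass

@dataclass
class Controls:
    """Compensating controls to apply to a PR before merge."""
    codeowners_review: bool = False
    min_coverage_delta: float = 0.0   # required coverage change, in points
    staging_validation: bool = False
    canary_traffic_pct: int = 100     # 100 = full rollout, no canary

def controls_for(expertise_score: int, touches_critical_service: bool) -> Controls:
    if expertise_score > 0:
        return Controls()             # normal path: no extra controls
    # Zero expertise: AI-generated or human first-time contributor.
    return Controls(
        codeowners_review=True,       # always route to the owning team
        min_coverage_delta=1.0,       # require a coverage increase, not just flat
        staging_validation=touches_critical_service,
        canary_traffic_pct=5 if touches_critical_service else 100,
    )
```

Keeping the controls in one derivation function makes the policy auditable: every zero-expertise PR gets the same treatment, with no per-team manual exceptions.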

Koalr calculates file expertise automatically

Koalr computes per-file author expertise from your git history and applies it as a weighted signal in the deploy risk score. AI-generated PRs are automatically identified and scored with zero expertise weighting, triggering CODEOWNERS routing without manual process overhead.

Apply expertise-weighted risk scoring to every PR

Koalr calculates author file expertise from your git history and uses it to route zero-expertise PRs — including all AI-generated code — to mandatory expert review. Connect GitHub in 5 minutes.