Minor Contributors and Deployment Risk: The Data Behind First-Time File Edits
Among all the signals that predict deployment risk, author file expertise is one of the most consistently predictive — and one of the most underutilized. The insight is simple: an engineer's familiarity with the specific files they are modifying is a strong predictor of whether those changes will cause a production incident. The data on first-time file edits in particular is striking enough to warrant treating it as a high-weight signal in any deployment risk model.
The First-Edit Finding
PRs where the author has never previously committed to at least one of the modified files have a 2.1× higher incident rate than PRs where the author has prior commit history in all modified files. For PRs touching two or more first-time files, the incident rate multiplier rises to 3.4×, and to 5.1× at four or more.
Why File Familiarity Predicts Incidents
Code is not uniformly legible. Every non-trivial file in a production codebase has implicit knowledge embedded in it — knowledge that does not appear in the code itself but exists in the minds of engineers who have worked with it extensively:
- Known edge cases and gotchas. The null pointer condition that only manifests with a specific input combination. The race condition that occurs under high load. The validation rule that is not documented anywhere but was added after an incident.
- Downstream dependencies. Which other services consume this function's output. Which tests in remote repos would catch a change to this interface. Which configuration changes are required alongside code changes in this module.
- Performance characteristics. The query in this file that is already close to its execution time limit. The cache key structure that must not change without a coordinated cache invalidation. The batch size that was tuned specifically for memory constraints on the current instance size.
- Historical failure modes. The three previous incidents caused by changes to this file, and the design decisions that were made as a result. Engineers who participated in those incidents carry this knowledge; engineers encountering the file for the first time do not.
An engineer modifying a file for the first time is working without this accumulated context. The code change itself may be technically correct — it does what it is intended to do — but it may also interact with edge cases or downstream dependencies in ways the author could not have anticipated without the institutional knowledge.
Defining "Minor Contributor" for Risk Scoring
For deployment risk purposes, contributor familiarity should be defined at the file level rather than the repository level. An engineer can have extensive experience in a repository and zero experience with a specific module. Repository-level contribution history is a much weaker predictor than file-level history.
A practical expertise scoring approach:
```python
def calculate_file_expertise(author: str, filepath: str,
                             commit_history: list[dict],
                             lookback_days: int = 180) -> float:
    """
    Calculate expertise score (0.0 to 1.0) for a specific author/file pair.

    author: GitHub username of the PR author
    filepath: path of the file being modified
    commit_history: list of commits touching this file, each with 'author' and 'days_ago'
    lookback_days: how far back to consider (default: 6 months)
    """
    recent_commits = [
        c for c in commit_history
        if c["author"] == author and c["days_ago"] <= lookback_days
    ]
    if not recent_commits:
        return 0.0  # Never committed to this file in lookback window

    # Base score: presence of any contribution
    base_score = 0.3

    # Additional weight for each commit (diminishing returns)
    commit_count = len(recent_commits)
    commit_score = min(0.4, 0.15 * commit_count)  # Caps at 0.4 for 3+ commits

    # Recency weight: more recent = more relevant
    most_recent = min(c["days_ago"] for c in recent_commits)
    if most_recent <= 30:
        recency_score = 0.3
    elif most_recent <= 90:
        recency_score = 0.2
    else:
        recency_score = 0.1

    return base_score + commit_score + recency_score  # Max: 1.0
```
The PR-Level Aggregation
For a PR that modifies multiple files, the PR-level expertise score should not be a simple average of per-file scores. A PR where the author is highly expert in 9 out of 10 changed files but a complete novice in 1 critical file is riskier than the average would suggest. The minimum-expertise file is the weak link.
```python
def calculate_pr_expertise_score(author: str, changed_files: list[str],
                                 commit_histories: dict[str, list]) -> dict:
    """
    Calculate PR-level expertise, weighted toward the weakest files.
    Returns a dict with score and details for the risk explanation.
    """
    file_scores = {}
    for filepath in changed_files:
        history = commit_histories.get(filepath, [])
        file_scores[filepath] = calculate_file_expertise(author, filepath, history)

    first_time_files = [f for f, score in file_scores.items() if score == 0.0]
    low_expertise_files = [f for f, score in file_scores.items() if 0 < score < 0.4]

    # Weight toward the minimum: min 40%, mean 60%
    scores = list(file_scores.values())
    min_score = min(scores)
    mean_score = sum(scores) / len(scores)
    weighted_score = (0.4 * min_score) + (0.6 * mean_score)

    return {
        "expertise_score": weighted_score,
        "first_time_files_count": len(first_time_files),
        "first_time_files": first_time_files,
        "low_expertise_files": low_expertise_files,
        "risk_contribution": "high" if len(first_time_files) >= 2 else
                             "elevated" if len(first_time_files) == 1 else "normal",
    }
```
Calibrated Data: Incident Rate by First-Time File Count
| First-Time Files in PR | Incident Rate Multiplier | Recommended Action |
|---|---|---|
| 0 (all familiar) | 1.0× (baseline) | Standard review process |
| 1 first-time file | 2.1× | Ensure CODEOWNERS review for that file |
| 2–3 first-time files | 3.4× | Route to domain expert reviewers; elevated risk score |
| 4+ first-time files | 5.1× | Block or require engineering lead sign-off |
| AI agent (all files first-time) | 6–8× (estimated) | Mandatory domain expert review for every changed file |
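The human-authored rows of the table above can be encoded as a simple lookup for use downstream of `calculate_pr_expertise_score`. This is a minimal sketch; the function name is illustrative, and the thresholds and multipliers are taken directly from the table:

```python
def first_time_file_risk(first_time_count: int) -> dict:
    """Map the number of first-time files in a PR to the calibrated
    incident-rate multiplier and the recommended review action."""
    if first_time_count == 0:
        return {"multiplier": 1.0, "action": "Standard review process"}
    if first_time_count == 1:
        return {"multiplier": 2.1, "action": "Ensure CODEOWNERS review for that file"}
    if first_time_count <= 3:
        return {"multiplier": 3.4,
                "action": "Route to domain expert reviewers; elevated risk score"}
    return {"multiplier": 5.1, "action": "Block or require engineering lead sign-off"}
```

The count feeding this lookup is the `first_time_files_count` field returned by the PR-level aggregation.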
Special Case: AI Agents and Zero-Expertise Contributions
AI coding agents — Devin, GitHub Copilot Workspace, autonomous Claude agents — always have a file expertise score of zero for every file they modify. They have no commit history, no accumulated institutional knowledge, and no ability to reason about the file-level context that human experts carry.
This means the first-time file edit risk signal is structurally always triggered by AI agent PRs. A risk scoring system should treat AI agent contributions as explicitly first-time for all files — and route them accordingly to domain expert reviewers regardless of PR size or other signals.
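One way to implement this is to force every per-file score to zero when the PR author is an agent, before the scores feed into aggregation. The sketch below is an assumption, not a fixed API: the `BOT_AUTHORS` set and the `[bot]`-suffix heuristic (used by GitHub App accounts) are illustrative, and real bot account names vary:

```python
# Hypothetical allowlist of known agent accounts; extend as needed.
BOT_AUTHORS = {"devin-ai-integration[bot]", "copilot-workspace[bot]"}

def is_ai_agent(author: str) -> bool:
    """Heuristic: match known agents or the '[bot]' suffix GitHub Apps use."""
    return author in BOT_AUTHORS or author.endswith("[bot]")

def effective_file_scores(author: str, file_scores: dict[str, float]) -> dict[str, float]:
    """Zero out expertise for agent-authored PRs so the first-time-file
    signal fires for every changed file."""
    if is_ai_agent(author):
        return {f: 0.0 for f in file_scores}
    return file_scores
```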
Using the Signal Without Discouraging Learning
A reasonable objection: if engineers are penalized for modifying unfamiliar files, does this create a disincentive to cross-train and expand knowledge across the codebase?
The answer is in how the signal is used. Surfacing higher risk for first-time file edits should not block those PRs or penalize the engineer — it should trigger additional review from the files' expert owners. The outcome of that review session is both improved PR quality and knowledge transfer from the owner to the author. Done well, the CODEOWNERS review process is a cross-training mechanism, not a barrier.
The risk scoring context shown in the GitHub Check Run should be framed accordingly: "This PR modifies 2 files the author has not previously contributed to. Automatic review from @acme/auth-team has been requested." This surfaces the expert, not a judgment about the author.
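A message along those lines could be generated from the aggregation output. This is a sketch under the assumption that the owning team handle (e.g. `@acme/auth-team`) is resolved from CODEOWNERS elsewhere; the function name is illustrative:

```python
def format_expertise_message(first_time_files: list[str], owner_team: str) -> str:
    """Frame the signal around the requested expert review, not the author."""
    n = len(first_time_files)
    if n == 0:
        return "The author has prior commit history in every modified file."
    noun = "file" if n == 1 else "files"
    return (f"This PR modifies {n} {noun} the author has not previously "
            f"contributed to. Automatic review from {owner_team} has been requested.")
```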
Koalr calculates file-level expertise for every PR author
Koalr scores each PR author's familiarity with every file they modify — weighting the risk score toward the weakest files — and surfaces this signal in the PR risk breakdown so reviewers understand exactly where the expertise gap is.
Score author expertise automatically on every PR
Koalr calculates file-level expertise scores from commit history and surfaces first-time file edits in the PR risk breakdown — routing expert reviewers to the right PRs before merge. Connect GitHub in 5 minutes.