How We Detect AI-Authored Commits at Deploy Time
DORA 2025 found that AI-generated code increases delivery instability. But "AI code volume" as a metric is meaningless unless you can measure it at the commit level — not by surveying developers. Here is how Koalr does it.
The Problem With Measuring AI Code Adoption
Most teams trying to measure their AI coding assistant adoption do it through surveys or license counts. Neither tells you what you actually need to know: which specific commits and pull requests contain AI-generated code.
GitHub publishes aggregate Copilot statistics — acceptance rate, suggestions per session — but these numbers don't map to individual PRs. When you're trying to understand why a specific deploy failed, knowing that "your team accepts 32% of Copilot suggestions on average" is useless. Knowing that the PR that caused the incident had 7 of 9 commits attributed to an AI assistant — that's a signal.
Co-authored-by Trailers Are the Ground Truth
GitHub Copilot, Cursor, Claude Code (via the VS Code extension), Codeium, and Amazon CodeWhisperer all support — and in most configurations, automatically add — Git commit trailer lines when they generate a commit. These follow Git's trailer convention (RFC 822-style `Key: value` lines at the end of the commit message) and look like this:
```text
# GitHub Copilot
Co-authored-by: GitHub Copilot <copilot@github.com>

# Cursor (model-specific)
Co-authored-by: Cursor <cursor@cursor.sh>
Co-authored-by: Cursor (claude-3-5-sonnet) <cursor@cursor.sh>

# Claude Code (Anthropic VS Code extension)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

# Codeium
Co-authored-by: Codeium <codeium@codeium.com>

# Amazon CodeWhisperer
Co-authored-by: Amazon CodeWhisperer <codewhisperer@amazon.com>
```
These trailers are present in the raw commit object and are accessible via the GitHub API without any special permissions — they're part of the commit message. This means Koalr can check every commit in a PR for AI authorship without requiring additional OAuth scopes, without reading your source code, and without adding any latency to your CI pipeline.
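Because trailers are just lines at the tail of the commit message, detection needs nothing beyond string parsing. A minimal sketch (the `parseCoAuthors` helper here is hypothetical, not part of Koalr's shipped code):

```typescript
// Co-authored-by trailers are plain lines at the end of the commit message,
// so extracting them is simple string work — no diff or source access needed.
function parseCoAuthors(message: string): string[] {
  return message
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /^co-authored-by:/i.test(line))
    .map((line) => line.replace(/^co-authored-by:\s*/i, ""));
}

const msg = [
  "Add retry logic to the deploy queue",
  "",
  "Co-authored-by: GitHub Copilot <copilot@github.com>",
].join("\n");

parseCoAuthors(msg); // → ["GitHub Copilot <copilot@github.com>"]
```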
The Implementation
When a pull request is opened or updated, Koalr's risk-check pipeline fetches the commit list for that PR and scans each commit message for AI authorship signals. The scan runs concurrently with our other signals (author expertise, coverage delta, change entropy) using Promise.all — no serial wait.
```typescript
const AI_COAUTHOR_PATTERNS = [
  /co-authored-by:.*copilot@github\.com/i,
  /co-authored-by:.*cursor@cursor\.sh/i,
  /co-authored-by:.*noreply@anthropic\.com/i, // Claude Code
  /co-authored-by:.*codeium@codeium\.com/i,
  /co-authored-by:.*codewhisperer@amazon\.com/i,
];

// Fetch commits via Octokit (requires no extra scopes — 'repo' already granted)
const commits = await octokit.rest.pulls.listCommits({
  owner, repo, pull_number: prNumber, per_page: 100,
});

const aiCommits = commits.data.filter((c) =>
  AI_COAUTHOR_PATTERNS.some((pattern) => pattern.test(c.commit.message))
);

const signal: AiAuthoredSignal = {
  detected: aiCommits.length > 0,
  aiCommitCount: aiCommits.length,
  totalCommitCount: commits.data.length,
  aiCommitFraction: aiCommits.length / commits.data.length,
  toolsDetected: extractTools(aiCommits),
  mentionedInDescription: AI_MENTION_PATTERNS.some((p) => p.test(prBody + prTitle)),
};
```

The result is an AiAuthoredSignal interface that captures not just whether AI was involved, but what fraction of commits were AI-authored and which tools were detected. A PR where 9/9 commits are Copilot-generated is very different from one where 1/15 commits has a Claude Code trailer.
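The snippet above references two helpers, `extractTools` and `AI_MENTION_PATTERNS`, without showing them. A plausible sketch of both follows; the actual Koalr implementations may differ in detail:

```typescript
// Assumed mapping from known co-author emails to tool labels.
const TOOL_BY_EMAIL: Record<string, string> = {
  "copilot@github.com": "GitHub Copilot",
  "cursor@cursor.sh": "Cursor",
  "noreply@anthropic.com": "Claude Code",
  "codeium@codeium.com": "Codeium",
  "codewhisperer@amazon.com": "Amazon CodeWhisperer",
};

interface CommitLike {
  commit: { message: string };
}

// Collect the distinct tools whose trailer email appears in any commit.
function extractTools(commits: CommitLike[]): string[] {
  const found = new Set<string>();
  for (const c of commits) {
    const message = c.commit.message.toLowerCase();
    for (const [email, tool] of Object.entries(TOOL_BY_EMAIL)) {
      if (message.includes(email)) found.add(tool);
    }
  }
  return Array.from(found);
}

// Loose patterns for prose mentions in the PR title/body. These are weaker
// evidence than a trailer, so they feed a separate field in the signal.
const AI_MENTION_PATTERNS = [
  /generated (?:with|by) (?:github )?copilot/i,
  /written (?:with|by) (?:claude|cursor|codeium)/i,
  /\bai-(?:generated|assisted)\b/i,
];
```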
How It Appears in the GitHub Check Run
When Koalr posts its deploy risk Check Run to a pull request, the AI authorship signal appears as an informational row in the signal breakdown table — not as a score penalty. This is a deliberate design choice.
ℹ️ AI authorship is informational — no score adjustment. Koalr is building correlation data between AI-authored PRs and downstream incident rates.
We deliberately chose not to add a score penalty for AI-authored code. The research shows a correlation between high AI code volume at the team level and deployment instability, but that's a population-level finding. Adding a per-PR penalty would punish engineers for using tools that often improve productivity, based on aggregate data that may not apply to their specific case.
Instead, Koalr surfaces the signal and collects outcome data. As we accumulate evidence across organizations — which AI tools correlate with higher post-deploy incident rates, and under what conditions — we will incorporate the signal into the risk model with empirical backing.
What We See in Practice
Across our internal testing with our own codebase, a few patterns have already emerged:
- Copilot trailers are the most common by volume — GitHub's default VS Code integration adds them automatically with no configuration.
- Cursor trailers sometimes include model metadata in the author name (e.g., `Cursor (claude-3-5-sonnet) <cursor@cursor.sh>`) — useful for distinguishing GPT-4o suggestions from Sonnet suggestions.
- Claude Code trailers identify the specific model version (e.g., `Claude Sonnet 4.6`) — which matters as model capability varies significantly.
- PR description mentions are less reliable than commit trailers. Engineers sometimes write "Generated with Copilot" in the PR body, but the commit-level trailer is more precise.
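Pulling the model name out of those trailers can be sketched as follows; `modelFromTrailer` is a hypothetical helper built on the two formats observed above, not Koalr's actual parser:

```typescript
// Extract a model identifier from a co-author trailer value, assuming the
// two conventions seen in practice: Cursor puts the model in parentheses
// after the tool name, while Claude Code uses the model name as the author.
function modelFromTrailer(trailer: string): string | null {
  // "Cursor (claude-3-5-sonnet) <cursor@cursor.sh>" → "claude-3-5-sonnet"
  const paren = trailer.match(/\(([^)]+)\)/);
  if (paren) return paren[1];
  // "Claude Sonnet 4.6 <noreply@anthropic.com>" → "Claude Sonnet 4.6"
  if (trailer.includes("noreply@anthropic.com")) {
    const name = trailer.replace(/\s*<[^>]+>\s*$/, "").trim();
    return name.length > 0 ? name : null;
  }
  // Other tools (Copilot, Codeium, CodeWhisperer) carry no model metadata.
  return null;
}
```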
Why This Is a Competitive Moat
No other engineering metrics platform — not Swarmia, LinearB, Jellyfish, or Axify — tracks AI authorship at the commit level. The closest is Axify's AI tool adoption tracking, which shows GitHub Copilot and Cursor usage metrics pulled from those platforms' APIs. But that tells you seats and session counts, not which specific PRs contain AI code.
The commit-level approach gives Koalr something none of them have: a dataset linking specific pull requests that contained AI-generated code to their downstream deployment outcomes. Over time, this becomes a training signal that no competitor can replicate without building the same instrumentation from scratch.
What's Next
Three follow-on directions we're exploring:
- Model-level breakdown — when we can distinguish Sonnet from GPT-4o from Claude 3 Haiku via the trailer metadata, we can start to build model-specific risk profiles.
- Outcome correlation reporting — a dashboard view that shows AI-authored PRs over time and the incident rate for each tool, once organizations have sufficient history.
- Score adjustment with empirical backing — once we have statistical significance in the correlation data, add an optional risk modifier that organizations can enable if they see consistent evidence that AI-authored PRs have higher incident rates in their codebase.
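The outcome-correlation reporting above reduces to a simple aggregation once PRs are joined to incident data. A sketch under an assumed record shape (`PrOutcome` is illustrative, not Koalr's actual schema):

```typescript
// Hypothetical per-PR outcome record: which tools were detected, and whether
// the PR was later linked to a post-deploy incident.
interface PrOutcome {
  toolsDetected: string[];
  causedIncident: boolean;
}

// Aggregate incident rate per AI tool across a set of PR outcomes.
function incidentRateByTool(outcomes: PrOutcome[]): Map<string, number> {
  const totals = new Map<string, { prs: number; incidents: number }>();
  for (const o of outcomes) {
    for (const tool of o.toolsDetected) {
      const t = totals.get(tool) ?? { prs: 0, incidents: 0 };
      t.prs += 1;
      if (o.causedIncident) t.incidents += 1;
      totals.set(tool, t);
    }
  }
  return new Map(
    Array.from(totals.entries()).map(([tool, t]) => [tool, t.incidents / t.prs])
  );
}
```

A rate computed this way is only descriptive; as the section notes, any score adjustment would wait for statistical significance per organization.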
See your team's AI authorship data
Connect your GitHub org and Koalr will start tracking AI-authored commits in your pull requests immediately — no configuration required beyond the repo scope.
Try Koalr free — no credit card →