Deploy Risk · March 27, 2026 · 11 min read

Change Entropy: The Hidden Signal in Your Deployment History

Most deployment risk signals focus on the quantity of change: lines added, files touched, PR size. Change entropy captures something different — the dispersion of change across the codebase. A PR that touches 500 lines in one file is fundamentally different from a PR that touches 500 lines scattered across 50 files. Entropy quantifies that difference and correlates strongly with deployment failure rate.

The intuition

Concentrated changes are easier to reason about, test, and review. Scattered changes touch more of the codebase simultaneously, create more potential interaction effects, and are harder for reviewers to track. High entropy signals a change that is doing many things at once — a pattern associated with higher incident rates.

What Change Entropy Measures

Change entropy borrows from information theory. Shannon entropy measures the unpredictability, or disorder, of a data source. Applied to code changes, it measures how dispersed the changes are across files:

  • Low entropy: Most of the changed lines are concentrated in a small number of files. The change is focused. A reviewer can understand the full scope of the change by reading a small number of files deeply.
  • High entropy: Changed lines are distributed roughly equally across many files. No single file contains more than a small fraction of the total changes. The change is scattered. A reviewer must hold the context of many different files in their head simultaneously to reason about the change.

The Formula

Change entropy is the Shannon entropy of the per-file change distribution: H = −Σᵢ pᵢ log₂ pᵢ, where pᵢ is file i's share of the total changed lines:

# Shannon entropy of change distribution across files
import math

def change_entropy(file_changes: dict[str, int]) -> float:
    """
    file_changes: {filename: lines_changed}
    Returns entropy in bits (0 = all changes in one file, higher = more dispersed)
    """
    total = sum(file_changes.values())
    if total == 0 or len(file_changes) == 0:
        return 0.0

    entropy = 0.0
    for lines in file_changes.values():
        if lines > 0:
            p = lines / total
            entropy -= p * math.log2(p)

    return entropy

# Example: 500 lines in one file
mono = change_entropy({"payments.py": 500})
print(f"Concentrated: {mono:.2f} bits")  # 0.00 bits — perfectly concentrated

# Example: 100 lines each in 5 files
spread5 = change_entropy({f"file{i}.py": 100 for i in range(5)})
print(f"5 files even: {spread5:.2f} bits")  # 2.32 bits

# Example: 10 lines each in 50 files
spread50 = change_entropy({f"file{i}.py": 10 for i in range(50)})
print(f"50 files even: {spread50:.2f} bits")  # 5.64 bits — high entropy

The maximum entropy for N files is log₂(N), reached when every file has exactly the same number of changed lines. Normalizing by this maximum gives a 0–1 score that is comparable across PRs that touch different numbers of files:

def normalized_entropy(file_changes: dict[str, int]) -> float:
    """Returns entropy normalized to 0–1 scale."""
    n = len(file_changes)
    if n <= 1:
        return 0.0
    raw = change_entropy(file_changes)
    max_entropy = math.log2(n)
    return raw / max_entropy
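With normalization, the examples from above land on a comparable scale. This sketch inlines both functions so it runs standalone; an uneven spread scores between the extremes:

```python
import math

def change_entropy(fc: dict[str, int]) -> float:
    """Shannon entropy (bits) of the per-file change distribution."""
    total = sum(fc.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in fc.values() if n > 0)

def normalized_entropy(fc: dict[str, int]) -> float:
    """Entropy divided by its maximum log2(N), giving a 0-1 score."""
    n = len(fc)
    if n <= 1:
        return 0.0
    return change_entropy(fc) / math.log2(n)

# Perfectly concentrated vs. perfectly even:
print(normalized_entropy({"payments.py": 500}))                    # 0.0
print(normalized_entropy({f"f{i}.py": 100 for i in range(5)}))     # 1.0

# One dominant file plus minor edits lands in between:
print(round(normalized_entropy({"core.py": 400, "a.py": 50, "b.py": 50}), 2))  # 0.58
```

Note that both the 5-file and the 50-file even spreads normalize to 1.0, even though their raw entropies differ; normalization isolates dispersion from file count.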

The Research: Entropy vs. Incident Rate

Several research papers on software defect prediction have validated entropy as a signal. The finding that is most actionable for deploy risk scoring: normalized change entropy above 0.7 correlates with a 1.8× higher incident rate compared to changes with entropy below 0.3, after controlling for total change size.

The key insight from the research: entropy and change size are correlated but not the same signal. A large, focused refactor (low entropy, high line count) has a different risk profile from a medium-sized scattered change (high entropy, moderate line count). Both signals contribute independently to risk prediction — entropy adds predictive power on top of simple change size.

Change Type              Lines Changed   Entropy            Incident Rate
Focused refactor         800             0.1 (low)          6%
Feature in one module    200             0.3 (low-medium)   4%
Cross-cutting change     300             0.7 (high)         9%
Scattered refactor       500             0.9 (very high)    14%
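One way to read the table: size and entropy can be combined into a single heuristic score. The weights and the log-scaled size term below are illustrative placeholders, not fitted coefficients from the research:

```python
import math

def combined_risk(total_lines: int, norm_entropy: float,
                  w_size: float = 0.4, w_entropy: float = 0.6) -> float:
    """Toy 0-1 risk score combining change size and dispersion.

    Weights are illustrative only; a real model would fit them
    on historical incident data.
    """
    # Log-scale the size so an 800-line PR isn't 4x a 200-line one;
    # saturates at 1000 lines.
    size_score = min(math.log10(max(total_lines, 1)) / 3.0, 1.0)
    return w_size * size_score + w_entropy * norm_entropy

# Rows from the table above: the scattered refactor outranks the
# focused one despite touching fewer lines.
print(round(combined_risk(800, 0.1), 2))  # 0.45  (focused refactor)
print(round(combined_risk(500, 0.9), 2))  # 0.9   (scattered refactor)
```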

How to Calculate Change Entropy from a GitHub PR

import requests
import math

def get_pr_file_changes(owner, repo, pr_number, token):
    """Fetch per-file change counts for a PR."""
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/files"
    headers = {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github.v3+json",
    }
    resp = requests.get(url, headers=headers, params={"per_page": 100})
    resp.raise_for_status()
    files = resp.json()  # PRs with >100 files need pagination via the Link header

    return {
        f["filename"]: f["additions"] + f["deletions"]
        for f in files
        if f["additions"] + f["deletions"] > 0
    }

def pr_entropy_score(owner, repo, pr_number, token):
    file_changes = get_pr_file_changes(owner, repo, pr_number, token)
    return {
        "normalized_entropy": normalized_entropy(file_changes),
        "raw_entropy": change_entropy(file_changes),
        "file_count": len(file_changes),
        "total_lines": sum(file_changes.values()),
    }

# Usage
result = pr_entropy_score("myorg", "myrepo", 123, "ghp_...")
print(f"Normalized entropy: {result['normalized_entropy']:.2f}")
print(f"Files changed: {result['file_count']}")
print(f"Total lines: {result['total_lines']}")

High-Entropy PRs: What to Do

When a PR has high entropy, there are two questions to ask before routing it to additional review:

1. Is the scatter intentional? Some cross-cutting changes — renaming a function, updating an API contract that is used everywhere, fixing a security vulnerability in a shared utility — are inherently high-entropy by necessity. In these cases, entropy is not a defect signal; it is a characteristic of the change type. The appropriate response is more thorough review, not rejection.

2. Can the change be decomposed? High-entropy PRs that mix unrelated changes — fixing a bug in the auth service while also updating the UI while also refactoring a utility — are candidates for splitting. Lower-entropy atomic changes are easier to review, easier to roll back if something goes wrong, and produce cleaner attribution when an incident does occur.
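A cheap mechanical check for question 2 is to cluster the changed files and see whether the PR falls into several coherent groups. The sketch below groups by top-level directory, which is a simplistic assumption; a real implementation might use code ownership or module boundaries instead:

```python
from collections import defaultdict

def split_candidates(file_changes: dict[str, int]) -> dict[str, dict[str, int]]:
    """Group changed files by top-level directory as candidate sub-PRs."""
    clusters: dict[str, dict[str, int]] = defaultdict(dict)
    for path, lines in file_changes.items():
        top = path.split("/", 1)[0] if "/" in path else "(root)"
        clusters[top][path] = lines
    return dict(clusters)

# A PR mixing auth, UI, and utility changes splits into three clusters:
pr = {
    "auth/session.py": 120, "auth/tokens.py": 40,
    "ui/dashboard.tsx": 90, "ui/nav.tsx": 30,
    "utils/retry.py": 60,
}
for cluster, files in split_candidates(pr).items():
    print(cluster, sum(files.values()), "lines in", len(files), "files")
```

If each cluster has low internal entropy, splitting the PR along those cluster boundaries yields the lower-entropy atomic changes described above.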

For AI-generated code specifically, high entropy is a red flag beyond the usual signal strength. AI agents tend to make scattered changes because they follow all implications of a change without the human instinct to leave cleanup for a separate PR. High entropy in an AI PR often indicates the agent has over-scoped the task.

Koalr calculates entropy for every PR

Koalr computes normalized change entropy as one of its deploy risk signals, combined with author expertise, coverage delta, and review thoroughness. High-entropy PRs get higher risk scores and are routed to additional reviewers automatically.

Score change entropy alongside 10+ other risk signals

Koalr combines change entropy with author expertise, coverage delta, DDL detection, and timing signals into a single 0–100 deploy risk score for every open PR. Connect GitHub in 5 minutes.