# Change Entropy: The Hidden Signal in Your Deployment History
Most deployment risk signals focus on the quantity of change: lines added, files touched, PR size. Change entropy captures something different — the dispersion of change across the codebase. A PR that touches 500 lines in one file is fundamentally different from a PR that touches 500 lines scattered across 50 files. Entropy quantifies that difference and correlates strongly with deployment failure rate.
## The Intuition
Concentrated changes are easier to reason about, test, and review. Scattered changes touch more of the codebase simultaneously, create more potential interaction effects, and are harder for reviewers to track. High entropy signals a change that is doing many things at once — a pattern associated with higher incident rates.
## What Change Entropy Measures
Change entropy borrows from information theory: Shannon entropy measures the unpredictability or disorder of a data source. Applied to code changes, it measures how dispersed the changes are across files:
- Low entropy: Most of the changed lines are concentrated in a small number of files. The change is focused. A reviewer can understand the full scope of the change by reading a small number of files deeply.
- High entropy: Changed lines are distributed roughly equally across many files. No single file contains more than a small fraction of the total changes. The change is scattered. A reviewer must hold the context of many different files in their head simultaneously to reason about the change.
## The Formula
Change entropy is calculated using the Shannon entropy formula applied to the per-file change distribution:
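In symbols, with n changed files and pᵢ the fraction of the total changed lines that fall in file i:

H = −Σᵢ₌₁ⁿ pᵢ · log₂(pᵢ)

H is 0 when a single file holds every change and reaches log₂(n) when the changes are spread perfectly evenly. In code: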
```python
# Shannon entropy of change distribution across files
import math

def change_entropy(file_changes: dict[str, int]) -> float:
    """
    file_changes: {filename: lines_changed}
    Returns entropy in bits (0 = all changes in one file, higher = more dispersed).
    """
    total = sum(file_changes.values())
    if total == 0 or len(file_changes) == 0:
        return 0.0
    entropy = 0.0
    for lines in file_changes.values():
        if lines > 0:
            p = lines / total
            entropy -= p * math.log2(p)
    return entropy

# Example: 500 lines in one file
mono = change_entropy({"payments.py": 500})
print(f"Concentrated: {mono:.2f} bits")  # 0.00 bits — perfectly concentrated

# Example: 100 lines each in 5 files
spread5 = change_entropy({f"file{i}.py": 100 for i in range(5)})
print(f"5 files even: {spread5:.2f} bits")  # 2.32 bits

# Example: 10 lines each in 50 files
spread50 = change_entropy({f"file{i}.py": 10 for i in range(50)})
print(f"50 files even: {spread50:.2f} bits")  # 5.64 bits — high entropy
```

The maximum entropy for N files is log₂(N), reached when all files have exactly equal change. Normalizing by the maximum gives a 0–1 score that is comparable across PRs of different sizes:
```python
def normalized_entropy(file_changes: dict[str, int]) -> float:
    """Returns entropy normalized to a 0–1 scale."""
    n = len(file_changes)
    if n <= 1:
        return 0.0
    raw = change_entropy(file_changes)
    max_entropy = math.log2(n)
    return raw / max_entropy
```

## The Research: Entropy vs. Incident Rate
Several research papers on software defect prediction have validated entropy as a signal. The finding that is most actionable for deploy risk scoring: normalized change entropy above 0.7 correlates with a 1.8× higher incident rate compared to changes with entropy below 0.3, after controlling for total change size.
The key insight from the research: entropy and change size are correlated but not the same signal. A large, focused refactor (low entropy, high line count) has a different risk profile from a medium-sized scattered change (high entropy, moderate line count). Both signals contribute independently to risk prediction — entropy adds predictive power on top of simple change size.
| Change Type | Lines Changed | Normalized Entropy | Incident Rate |
|---|---|---|---|
| Focused refactor | 800 | 0.1 (low) | 6% |
| Feature in one module | 200 | 0.3 (low-medium) | 4% |
| Cross-cutting change | 300 | 0.7 (high) | 9% |
| Scattered refactor | 500 | 0.9 (very high) | 14% |
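As a sketch of how the two signals might combine in practice, here is a toy bucketing function. The thresholds are illustrative, loosely following the 0.3/0.7 split cited above; they are not a validated model, and a production scorer would weight many more signals:

```python
import math

def change_entropy(file_changes: dict[str, int]) -> float:
    """Shannon entropy (bits) of the per-file change distribution."""
    total = sum(file_changes.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log2(n / total)
        for n in file_changes.values()
        if n > 0
    )

def normalized_entropy(file_changes: dict[str, int]) -> float:
    """Entropy normalized to 0–1 by the log2(n) maximum."""
    n = len(file_changes)
    if n <= 1:
        return 0.0
    return change_entropy(file_changes) / math.log2(n)

def risk_bucket(file_changes: dict[str, int]) -> str:
    """Toy risk bucket from size plus dispersion. Thresholds are illustrative."""
    size = sum(file_changes.values())
    ent = normalized_entropy(file_changes)
    if ent >= 0.7 and size >= 300:
        return "high"    # large and scattered: route to extra review
    if ent >= 0.7 or size >= 800:
        return "medium"  # scattered or very large, but not both
    return "low"         # focused and moderately sized

# A 500-line change spread evenly over 50 files lands in the high bucket
scattered = {f"mod{i}.py": 10 for i in range(50)}
print(risk_bucket(scattered))  # high
```

Note that a focused 500-line change (`{"payments.py": 500}`) comes out `low` under these thresholds, matching the table's point that size alone is not the risk driver.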
## How to Calculate Change Entropy from a GitHub PR
```python
import math

import requests

def get_pr_file_changes(owner, repo, pr_number, token):
    """Fetch per-file change counts for a PR, following pagination."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/pulls/{pr_number}/files?per_page=100"
    )
    headers = {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github.v3+json",
    }
    file_changes = {}
    # The endpoint returns at most 100 files per page, so follow the
    # Link header for PRs that touch more files than that.
    while url:
        resp = requests.get(url, headers=headers)
        resp.raise_for_status()
        for f in resp.json():
            changed = f["additions"] + f["deletions"]
            if changed > 0:
                file_changes[f["filename"]] = changed
        url = resp.links.get("next", {}).get("url")
    return file_changes

def pr_entropy_score(owner, repo, pr_number, token):
    file_changes = get_pr_file_changes(owner, repo, pr_number, token)
    return {
        "normalized_entropy": normalized_entropy(file_changes),
        "raw_entropy": change_entropy(file_changes),
        "file_count": len(file_changes),
        "total_lines": sum(file_changes.values()),
    }

# Usage
result = pr_entropy_score("myorg", "myrepo", 123, "ghp_...")
print(f"Normalized entropy: {result['normalized_entropy']:.2f}")
print(f"Files changed: {result['file_count']}")
print(f"Total lines: {result['total_lines']}")
```

## High-Entropy PRs: What to Do
When a PR has high entropy, there are two questions to ask before routing it to additional review:
1. Is the scatter intentional? Some cross-cutting changes — renaming a function, updating an API contract that is used everywhere, fixing a security vulnerability in a shared utility — are inherently high-entropy by necessity. In these cases, entropy is not a defect signal; it is a characteristic of the change type. The appropriate response is more thorough review, not rejection.
2. Can the change be decomposed? High-entropy PRs that mix unrelated changes — fixing a bug in the auth service while also updating the UI while also refactoring a utility — are candidates for splitting. Lower-entropy atomic changes are easier to review, easier to roll back if something goes wrong, and produce cleaner attribution when an incident does occur.
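One cheap heuristic for question 2 (a hypothetical sketch, not part of the research above): group the changed files by top-level directory. If the changes cluster into a few directories, each group is a candidate for its own lower-entropy PR:

```python
from collections import defaultdict

def group_by_top_dir(file_changes: dict[str, int]) -> dict[str, dict[str, int]]:
    """Group per-file change counts by top-level directory.

    Root-level files are grouped under ".".
    """
    groups: dict[str, dict[str, int]] = defaultdict(dict)
    for path, lines in file_changes.items():
        top = path.split("/", 1)[0] if "/" in path else "."
        groups[top][path] = lines
    return dict(groups)

# A PR mixing auth, UI, and utility changes splits into three focused groups
pr = {
    "auth/login.py": 120, "auth/session.py": 40,
    "ui/dashboard.tsx": 200, "ui/nav.tsx": 30,
    "utils/dates.py": 60,
}
for top, files in group_by_top_dir(pr).items():
    print(top, sum(files.values()))
```

Directory boundaries are only a proxy for logical boundaries, but when each group also has low internal entropy, the split usually maps cleanly onto separate reviewable PRs.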
For AI-generated code specifically, high entropy is an even stronger red flag than usual. AI agents tend to make scattered changes because they follow every implication of a change without the human instinct to leave cleanup for a separate PR. High entropy in an AI-authored PR often indicates the agent has over-scoped the task.
## Koalr calculates entropy for every PR
Koalr computes normalized change entropy as one of its deploy risk signals, combined with author expertise, coverage delta, and review thoroughness. High-entropy PRs get higher risk scores and are routed to additional reviewers automatically.
## Score change entropy alongside 10+ other risk signals
Koalr combines change entropy with author expertise, coverage delta, DDL detection, and timing signals into a single 0–100 deploy risk score for every open PR. Connect GitHub in 5 minutes.