Deploy Risk · March 28, 2026 · 12 min read

DDL Migrations: The Riskiest Change Nobody Tracks

If you model deployment risk across a large dataset of incidents, one signal stands out as the strongest single predictor of production failures: the presence of a Data Definition Language (DDL) statement in the deployment. ALTER TABLE, DROP COLUMN, CREATE INDEX, RENAME TABLE — these SQL statements change the structure of your database, and they interact with production traffic in ways that pure code changes do not.

The data point

Deployments containing DDL migrations have 4.2× the incident rate of deployments without DDL. This is the strongest single signal in deployment risk research, exceeding PR size, author expertise, and coverage delta in predictive power.

Why DDL Is Uniquely Dangerous

Code changes are generally reversible. If you deploy a bug, you can roll back to the previous artifact. The database state before and after the rollback is the same — the code was the variable. DDL migrations break this assumption.

When you run ALTER TABLE users ADD COLUMN last_login_at TIMESTAMP, that column exists in your production database. If you then roll back the application code because of an incident, the column still exists. Your old code does not know about it — but the database does. Depending on your ORM configuration, this might be fine (the old code ignores unknown columns) or it might cause failures (the old code has strict schema validation).
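
Whether rolled-back code survives a new column often comes down to how it reads rows. A self-contained illustration using SQLite as a stand-in for the production database (the table and column names follow the example above; the two reader functions are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER, email TEXT)")
con.execute("INSERT INTO users VALUES (1, 'a@example.com')")

# "Old" application code, written before the migration ran:
def old_code_explicit(con):
    # Explicit column list: unaffected by columns it does not know about.
    return con.execute("SELECT id, email FROM users").fetchone()

def old_code_star(con):
    # SELECT * with positional unpacking: breaks as soon as a column appears.
    id_, email = con.execute("SELECT * FROM users").fetchone()
    return id_, email

# The migration from the incident scenario above:
con.execute("ALTER TABLE users ADD COLUMN last_login_at TIMESTAMP")

print(old_code_explicit(con))   # still works: (1, 'a@example.com')
try:
    old_code_star(con)
except ValueError as e:
    print("broken:", e)         # too many values to unpack
```

The same split shows up in ORMs: explicit column mappings tolerate unknown columns, while strict schema validation or positional row handling does not.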

Destructive DDL makes this worse by orders of magnitude: DROP COLUMN last_login_at is not reversible. If you drop a column that the current production code depends on, you have an incident. If you discover this after the migration runs and try to roll back the application code, the old code references a column that no longer exists. You are stuck.

The Locking Problem

Beyond reversibility, DDL operations interact with production database traffic through table locking. Most ALTER TABLE operations in PostgreSQL, MySQL, and other relational databases acquire an exclusive lock on the table being modified. While the migration runs, reads and writes to that table queue up waiting for the lock to release.

For small tables, this takes milliseconds. For large tables — the users table at a SaaS company with 500,000 active accounts, or the events table with 50 million rows — this takes minutes. During those minutes, every database query that touches the locked table is blocked. If your application has a connection pool with 100 connections and they all block waiting for the migration to finish, new requests cannot get a connection. Your API returns 500s. Your users see errors. You have an incident.

Common DDL locking scenarios

| DDL Operation | Lock Type | Risk Level |
| --- | --- | --- |
| ADD COLUMN (nullable, no default) | Minimal (instant in PG 11+) | Low |
| ADD COLUMN (with default value) | Exclusive lock; full table rewrite before PG 11 | High |
| CREATE INDEX (without CONCURRENTLY) | Blocks all writes for the duration (SHARE lock in PG) | Very High |
| CREATE INDEX CONCURRENTLY | No blocking lock | Low |
| DROP COLUMN | Exclusive lock + irreversible | Very High |
| RENAME TABLE | Exclusive lock + breaks code references | Extreme |
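
One practical mitigation for the queueing failure mode is to run every DDL statement under a short lock timeout, so a migration that cannot acquire its lock fails fast and can be retried off-peak, instead of stalling the connection pool behind it. A minimal sketch — PostgreSQL-specific, and both the helper name and the 2-second default are illustrative, not a standard API:

```python
def with_lock_timeout(ddl: str, timeout: str = "2s") -> list[str]:
    """Wrap a DDL statement so it errors out quickly instead of queueing.

    If the ALTER cannot acquire its lock within `timeout`, PostgreSQL
    aborts the statement rather than blocking every query behind it.
    """
    return [
        f"SET lock_timeout = '{timeout}';",
        ddl,
        "RESET lock_timeout;",
    ]

statements = with_lock_timeout(
    "ALTER TABLE users ADD COLUMN last_login_at TIMESTAMP;"
)
```

A failed acquisition then surfaces as a quick, retryable migration error rather than a site-wide outage.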

How to Detect DDL in PRs

Detecting DDL in a PR requires scanning migration files for SQL keywords. The detection logic depends on your migration framework (Prisma, Flyway, Liquibase, Alembic, custom SQL files).

import re
from pathlib import Path

# Non-capturing groups so re.findall returns the full matched statement,
# not just the group contents.
DDL_KEYWORDS = [
    r"\bALTER\s+TABLE\b",
    r"\bDROP\s+(?:TABLE|COLUMN|INDEX|CONSTRAINT)\b",
    r"\bCREATE\s+(?:TABLE|INDEX|UNIQUE\s+INDEX)\b",
    r"\bRENAME\s+(?:TABLE|COLUMN|TO)\b",
    r"\bTRUNCATE\b",
]

HIGH_RISK_DDL = [
    r"\bDROP\s+TABLE\b",
    r"\bDROP\s+COLUMN\b",
    r"\bRENAME\s+TABLE\b",
    r"\bTRUNCATE\b",
    # CREATE [UNIQUE] INDEX without the CONCURRENTLY keyword
    r"\bCREATE\s+(?:UNIQUE\s+)?INDEX\s+(?!CONCURRENTLY)\w",
]

def scan_file_for_ddl(file_content: str) -> dict:
    """Scan a file for DDL statements."""
    content_upper = file_content.upper()
    ddl_found = []
    high_risk = False

    for pattern in DDL_KEYWORDS:
        matches = re.findall(pattern, content_upper)
        if matches:
            ddl_found.extend(matches)

    for pattern in HIGH_RISK_DDL:
        if re.search(pattern, content_upper):
            high_risk = True

    return {
        "has_ddl": len(ddl_found) > 0,
        "ddl_statements": ddl_found,
        "is_high_risk": high_risk,
    }

def scan_pr_for_ddl(pr_files: list) -> dict:
    """Scan all files in a PR diff for DDL statements."""
    migration_extensions = {".sql", ".prisma", ".xml"}  # Add .py for Alembic

    results = []
    for f in pr_files:
        filename = f.get("filename", "")
        patch = f.get("patch", "")

        # Focus on migration files
        if any(filename.endswith(ext) for ext in migration_extensions):
            result = scan_file_for_ddl(patch)
            if result["has_ddl"]:
                result["filename"] = filename
                results.append(result)

    return {
        "ddl_migrations_found": len(results) > 0,
        "high_risk_ddl": any(r["is_high_risk"] for r in results),
        "migration_files": results,
    }
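
The high-risk regexes are the subtle part, so it is worth checking them in isolation. A condensed, standalone version of the check (the pattern subset and function name here are illustrative, not part of any framework):

```python
import re

# Condensed high-risk subset; same regex style as the scanner above.
HIGH_RISK_DDL = [
    r"\bDROP\s+(?:TABLE|COLUMN)\b",
    r"\bRENAME\s+TABLE\b",
    r"\bTRUNCATE\b",
    # CREATE [UNIQUE] INDEX not followed by CONCURRENTLY
    r"\bCREATE\s+(?:UNIQUE\s+)?INDEX\s+(?!CONCURRENTLY)\w",
]

def is_high_risk(sql: str) -> bool:
    upper = sql.upper()
    return any(re.search(p, upper) for p in HIGH_RISK_DDL)

print(is_high_risk("CREATE INDEX idx_users_email ON users (email);"))               # True
print(is_high_risk("CREATE INDEX CONCURRENTLY idx_users_email ON users (email);"))  # False
print(is_high_risk("ALTER TABLE users ADD COLUMN last_login_at TIMESTAMP;"))        # False
```

The negative lookahead is what lets the same CREATE INDEX statement flip from Very High to Low risk depending on one keyword.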

Blue-Green Migration Patterns

The safest pattern for DDL migrations in production is blue-green schema migration — making the schema change backward-compatible before deploying the application code that uses it. The three-step process:

Step 1: Backward-compatible schema change. Add new columns as nullable. Add new tables. Create indexes concurrently. These changes are safe to deploy while the old application code is running.

Step 2: Deploy the application code. The new application code writes to the new columns/tables. The old code ignores them. Both versions of the code work with the current schema.

Step 3: Cleanup migration. After the new code is fully deployed and stable (typically 24–48 hours), drop old columns, add NOT NULL constraints, remove backward-compatibility bridges.

This pattern eliminates the rollback risk entirely: if step 2 has a problem, you roll back the application code and the schema is still compatible with the old code.
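
The three steps can be sketched as a migration plan. This is a hypothetical expand/contract example for replacing a `users.email` column with `users.email_address` — the phase names, column names, and SQL are illustrative, not a migration-framework API:

```python
# Each phase is safe against the application code running beside it.
MIGRATION_PLAN = {
    # Step 1: backward-compatible. Old code never sees an error.
    "expand": [
        "ALTER TABLE users ADD COLUMN email_address TEXT;",  # nullable, no default
        "CREATE INDEX CONCURRENTLY idx_users_email_address ON users (email_address);",
    ],
    # Step 2: deploy application code that writes both columns and
    # backfills old rows. No DDL runs in this phase.
    "deploy": [],
    # Step 3: destructive cleanup, only after the new code is stable
    # (typically 24-48 hours).
    "contract": [
        "ALTER TABLE users DROP COLUMN email;",
    ],
}
```

Note that the only irreversible statement lives in the contract phase, after the point where a rollback would ever need the old column.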

Weighting DDL in Your Risk Score

In a composite risk score (0–100), DDL detection should be a high-weight additive signal. Recommended weights:

  • Any DDL detected: +25 points to risk score
  • High-risk DDL (DROP, RENAME, blocking INDEX): +40 points and route to mandatory DBA review
  • DDL on a table with >1M rows: +50 points (requires table size lookup from database)

The weight should be additive with other signals, not multiplicative — a PR that has DDL AND high change entropy AND poor coverage should score near the ceiling, not overflow.
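
A minimal sketch of that scoring, wired to the output shape of `scan_pr_for_ddl` above. One assumption is labeled in the docstring: the recommended weights are read as escalating tiers (take the highest applicable) rather than stacked on top of each other:

```python
def ddl_risk_points(scan: dict, max_table_rows: int = 0) -> int:
    """DDL contribution to a 0-100 composite risk score.

    `scan` is shaped like the output of scan_pr_for_ddl. Reads the
    recommended weights as tiers (an assumption; they could also be
    summed).
    """
    if scan.get("ddl_migrations_found") and max_table_rows > 1_000_000:
        return 50
    if scan.get("high_risk_ddl"):
        return 40
    if scan.get("ddl_migrations_found"):
        return 25
    return 0

def composite_risk(signal_points: list[int]) -> int:
    # Additive across signals, clamped at the ceiling so stacked
    # signals saturate at 100 rather than overflow.
    return min(100, sum(signal_points))

score = composite_risk([
    ddl_risk_points({"ddl_migrations_found": True, "high_risk_ddl": True}),
    35,  # hypothetical change-entropy signal
    30,  # hypothetical coverage-delta signal
])
```

The clamp in `composite_risk` is what keeps a worst-case PR at the ceiling instead of producing a score above 100.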

Koalr detects DDL and adjusts risk scores automatically

Koalr scans migration files in every PR for DDL statements, distinguishes between safe and high-risk DDL operations, and routes PRs with dangerous schema changes to mandatory expert review before merge.

Catch dangerous schema changes before they reach production

Koalr automatically scans every PR for DDL migrations, flags high-risk operations, and ensures schema changes get expert review before merge. Connect GitHub in 5 minutes.