How to Improve Lead Time for Changes: A Practical Guide for Engineering Teams
Lead time for changes is the DORA metric that most directly reflects engineering efficiency. Elite teams ship in under an hour. Most teams take days or weeks. This guide gives a step-by-step breakdown of where lead time actually goes — and exactly what to fix first, depending on where your team's constraint lives.
What this guide covers
We decompose lead time into four measurable segments — coding time, PR review wait, CI pipeline time, and deploy pipeline time — and provide specific, implementable interventions for each. Includes GitHub API queries for baselining, a bottleneck diagnosis framework, and a 90-day improvement plan.
Why lead time for changes is the most important DORA metric
Of the four DORA metrics — deployment frequency, lead time for changes, change failure rate, and mean time to restore — lead time is the one most directly within your team's control. Deployment frequency is a lagging indicator: it rises as lead time falls. Change failure rate reflects code quality and testing culture, which changes slowly. MTTR depends heavily on your incident management process and tooling.
Lead time for changes is different. It is a direct measure of how long it takes your team to move an idea through the delivery pipeline and into production. It captures friction at every stage — in how code is written, reviewed, tested, and deployed. And it is the metric that DORA research most consistently correlates with business outcomes: the 2023 State of DevOps Report found that elite performers (lead time under one hour) were 2× more likely to meet or exceed their reliability targets and 3× more likely to exceed their organizational performance goals than low performers (lead time over one month).
The challenge is that "lead time for changes" is a single number that aggregates several fundamentally different kinds of delay. Teams that try to improve the number without decomposing it end up optimizing the wrong thing. The most common mistake: investing heavily in CI pipeline speed when the actual bottleneck is PR review wait time that is 5× longer.
Where lead time actually goes: decomposing the metric
Lead time for changes breaks into four segments, each with different root causes and different interventions:
| Segment | Definition | Typical range | Primary lever |
|---|---|---|---|
| Coding time | First commit → PR opened for review | 1–3 days | PR size, trunk-based dev |
| PR review wait | PR opened → first meaningful review | 1–5 days | Review SLAs, auto-assignment |
| CI pipeline time | PR opened → CI green | 15–60 min | Parallelization, caching |
| Deploy pipeline time | Merge → production live | 5–30 min | Deploy-on-merge, blue-green |
For most teams, PR review wait is the largest segment — often accounting for 60–70% of total lead time. That is the place to start. Teams that optimize CI without fixing review latency see minimal improvement to their lead time number, because they are shaving minutes off a segment that is measured in days.
Step 1: Measure your baseline
Before you can improve lead time, you need to measure each segment separately. The GitHub API provides all the data you need. The key timestamps and how to calculate each segment:
- Total lead time: deployment.created_at minus pull_request.created_at (or the timestamp of the first commit in the branch, for a tighter measure)
- Review wait time: pull_request_review_event.created_at (first review event) minus pull_request.created_at
- CI duration: check_suite.updated_at minus check_suite.created_at (for the final passing check suite run)
- Deploy duration: deployment.created_at minus pull_request.merged_at
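Assuming the relevant event timestamps have already been fetched from the GitHub API, the segment arithmetic above can be sketched in Python. The field names follow the events listed in the text; the sample timestamps are illustrative:

```python
from datetime import datetime

def parse(ts: str) -> datetime:
    """Parse a GitHub ISO-8601 timestamp such as '2024-05-01T12:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def lead_time_segments(pr_opened, first_review, merged,
                       ci_started, ci_finished, deployed):
    """Return the lead-time segments as timedeltas.

    Inputs are ISO-8601 strings taken from pull_request, review,
    check_suite, and deployment events, as described in the text.
    """
    return {
        "review_wait": parse(first_review) - parse(pr_opened),
        "ci_duration": parse(ci_finished) - parse(ci_started),
        "deploy_duration": parse(deployed) - parse(merged),
        "total": parse(deployed) - parse(pr_opened),
    }

# Illustrative timestamps for a single PR:
segments = lead_time_segments(
    pr_opened="2024-05-01T09:00:00Z",
    first_review="2024-05-02T11:00:00Z",
    merged="2024-05-02T15:00:00Z",
    ci_started="2024-05-01T09:05:00Z",
    ci_finished="2024-05-01T09:35:00Z",
    deployed="2024-05-02T15:20:00Z",
)
print(segments["review_wait"])  # 1 day, 2:00:00
```

Run this per PR, collect the results per team, and you have the four baseline distributions the rest of this guide works from.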
One critical note on statistics: use the median, not the mean. Long-lived branches (multi-sprint features, large refactors) skew the mean dramatically and make the metric look worse than the typical PR experience actually is. The median gives you the number that represents what your typical engineer experiences on a typical day.
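A tiny worked example of why the median matters, using illustrative lead times in hours where one long-lived branch sits among nine typical PRs:

```python
from statistics import mean, median

# Lead times in hours for ten PRs: nine typical, one long-lived branch.
lead_times = [4, 6, 8, 5, 7, 6, 9, 5, 8, 400]

print(round(mean(lead_times), 1))  # 45.8 — skewed badly by the outlier
print(median(lead_times))          # 6.5  — the typical PR experience
```

One multi-sprint branch moves the mean from roughly 6 hours to nearly 46, while the median stays put.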
Calculate each segment at the team level, not just per-repository. Different teams within the same organization often have wildly different lead time profiles — and the bottleneck differs by team. A platform team with 20-minute CI and a 6-hour deploy pipeline has a different problem from a product team with 45-minute CI and 2-day review wait.
The bottleneck diagnosis: finding YOUR constraint
Once you have baselines for each segment, the constraint is usually obvious. Apply these thresholds as a first-pass diagnosis:
- Review wait > 2 days: your bottleneck is review culture, reviewer availability, or PR size — not CI. Optimizing CI first would be wrong.
- CI time > 30 minutes: your bottleneck is test suite performance. The fix is parallelization and caching — not process changes.
- Deploy time > 15 minutes: your bottleneck is deploy pipeline design. Blue-green deployments, automated smoke tests, and deploy-on-merge are the levers.
- Coding time > 5 days: your bottleneck is PR size and working habits. Trunk-based development and feature flags are the interventions.
The framework here is deliberately sequential: fix the largest segment first. A 50% reduction in a segment that accounts for 5% of total lead time produces a 2.5% improvement overall. A 50% reduction in a segment that accounts for 65% of total lead time produces a 32% improvement. The math is simple but teams routinely ignore it.
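The weighting arithmetic is worth making explicit. A one-line helper, with the two cases from the text as examples:

```python
def overall_improvement(segment_share: float, segment_reduction: float) -> float:
    """Overall lead-time reduction from improving one segment.

    segment_share: the segment's fraction of total lead time (0-1).
    segment_reduction: the fractional reduction achieved in that segment.
    """
    return segment_share * segment_reduction

print(overall_improvement(0.05, 0.5))  # 0.025 — a 2.5% overall improvement
print(overall_improvement(0.65, 0.5))  # 0.325 — roughly 32% overall
```

Plug in your own segment shares from the baseline measurement and the priority order falls out immediately.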
Intervention 1: Reduce PR review wait time (highest impact for most teams)
Review wait time is the most impactful and most neglected segment. It is also the one most amenable to process change without infrastructure investment. There are four interventions that consistently move the number.
Set PR size limits
The research is consistent and dramatic: PRs over 400 lines take 3× longer to receive a first review than PRs under 200 lines. The mechanism is psychological as much as logistical — a reviewer who opens a 600-line diff and sees they have 45 minutes of work ahead of them will defer to a less loaded moment. That moment may come 36 hours later.
Establish a soft limit of 400 lines (excluding auto-generated code, lockfiles, and migrations) and a hard conversation for PRs over 600 lines. You do not need to enforce this mechanically at first — the act of measuring PR size per author and surfacing it in team reviews changes behavior faster than most teams expect.
The GitHub API gives you this directly: pull_request.additions + pull_request.deletions. You can enforce a size check as a GitHub Actions Check Run that posts a warning comment on PRs over your chosen threshold.
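A minimal sketch of such a check as a GitHub Actions workflow. The 400-line threshold and the comment text are illustrative assumptions, and this version counts raw additions plus deletions without excluding generated files:

```yaml
name: pr-size-check
on: pull_request

jobs:
  size:
    runs-on: ubuntu-latest
    steps:
      - name: Warn on large PRs
        uses: actions/github-script@v7
        with:
          script: |
            const pr = context.payload.pull_request;
            const size = pr.additions + pr.deletions; // same fields as the API
            if (size > 400) { // illustrative threshold from the guide
              await github.rest.issues.createComment({
                ...context.repo,
                issue_number: pr.number,
                body: `This PR changes ${size} lines — consider splitting it.`,
              });
            }
```

A warning comment is usually enough to start; promote it to a failing check only once the team has internalized the norm.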
Implement reviewer auto-assignment
Requiring authors to manually select reviewers introduces friction and inconsistency. Authors tend to select reviewers they know are responsive — which concentrates review load on a small number of people and leaves expertise mismatched. CODEOWNERS with GitHub's auto-assign reviewer feature eliminates both problems.
Configure .github/CODEOWNERS with team-level ownership entries for each major directory. Then enable "Require review from Code Owners" in branch protection rules. GitHub will automatically request review from the relevant owner team when a PR touches those paths. Combined with GitHub's load-balancing option (which assigns to the team member with the fewest open review requests), this distributes review work evenly and ensures the right people are always in the loop.
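A sketch of such a file; the directory layout and team handles are illustrative assumptions, not a prescribed structure:

```
# .github/CODEOWNERS — team-level ownership per directory
/frontend/  @acme/web-team
/backend/   @acme/api-team
/infra/     @acme/platform-team
```

Team-level entries (rather than individual usernames) are what make the load-balancing assignment work.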
Create review SLAs
The single most effective process change most teams can make is establishing an explicit expectation: first review within 4 hours during business hours. This is not aspirational — it is a team agreement, tracked and visible. When review wait times are invisible, they drift. When they are measured and discussed in retros, they improve.
The target thresholds to work toward: first review comment or approval within 4 hours of PR opening; full review turnaround (all review rounds complete) within 24 hours. Koalr's PR dashboard surfaces stale PRs — PRs that have been open without a review for longer than your configured SLA threshold — and can alert the assigned reviewer via Slack.
The GitHub API endpoint to track this: GET /repos/{owner}/{repo}/pulls/{number}/reviews — the timestamp of the first review event minus the PR open timestamp gives you time-to-first-review per PR, per reviewer. Track this metric per team member; review latency often concentrates in a small number of individuals.
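Given the JSON array that endpoint returns, the per-PR calculation can be sketched as follows; the review data here is illustrative:

```python
from datetime import datetime

def parse(ts: str) -> datetime:
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def time_to_first_review(pr_created_at, reviews):
    """Hours from PR open to first review event.

    `reviews` is the parsed JSON array returned by
    GET /repos/{owner}/{repo}/pulls/{number}/reviews.
    Returns None if the PR has not been reviewed yet.
    """
    if not reviews:
        return None
    first = min(parse(r["submitted_at"]) for r in reviews)
    return (first - parse(pr_created_at)).total_seconds() / 3600

# Illustrative review events for one PR opened at 09:00:
reviews = [
    {"submitted_at": "2024-05-02T15:30:00Z", "user": {"login": "bob"}},
    {"submitted_at": "2024-05-01T13:00:00Z", "user": {"login": "alice"}},
]
print(time_to_first_review("2024-05-01T09:00:00Z", reviews))  # 4.0
```

Group the results by the first reviewer's login to see whether latency concentrates in a few individuals.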
Use draft PRs for early feedback
Opening a PR in draft status when work is 50–70% complete accomplishes two things: it gets architecture-level feedback before the implementation is locked in (when rework is cheap), and it gives reviewers visibility into upcoming work so they can plan their time. Teams that adopt this practice report that the surprise "here is a 500-line PR that needs review today" pattern — which is a major driver of review latency — almost disappears. Reviewers have already seen the design and the review session becomes confirmation rather than discovery.
Intervention 2: Shrink PR size
PR size reduction is the intervention with the highest downstream leverage. It improves review wait, review quality, and change failure rate simultaneously. The research benchmarks are stark: PRs under 100 lines have a median review time of approximately 40 minutes. PRs over 500 lines have a median review time of over 4 hours — and those 4 hours are spread across days, not a single session.
Feature flags enable small, safe merges
The most common objection to small PRs is "the feature is not complete yet, I cannot ship it." Feature flags dissolve this objection. Merge the feature into main behind a flag set to off — the code ships to production but is invisible to users. Subsequent PRs complete the feature and eventually flip the flag. This pattern allows daily merges of partial work without user impact, which is exactly the definition of trunk-based development.
The implementation overhead for a basic feature flag system is low: a database table or environment variable with a flag name and boolean value is sufficient to start. More sophisticated platforms (LaunchDarkly, Unleash, Flagsmith) add percentage rollouts and targeting rules when you need them.
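As a sketch of the environment-variable version, with the flag name chosen purely for illustration:

```python
import os

def flag_enabled(name: str, default: bool = False) -> bool:
    """Minimal environment-variable feature flag.

    FLAG_NEW_CHECKOUT=on enables the 'new_checkout' flag; anything
    else (or an unset variable) falls back to the default.
    """
    value = os.environ.get(f"FLAG_{name.upper()}")
    if value is None:
        return default
    return value.lower() in ("1", "true", "on")

# Merged code ships dark until the flag is flipped:
if flag_enabled("new_checkout"):
    ...  # new code path, invisible to users until the flag is on
```

The database-table version is the same shape with a lookup instead of os.environ.get; migrate to a dedicated platform only when you need percentage rollouts or per-user targeting.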
Trunk-based development as a practice
The DORA research is unambiguous: trunk-based development — merging to main at least daily, with short-lived feature branches — is a predictor of elite delivery performance. The causal path runs through PR size: when branches cannot live for more than a day or two, they are forced to stay small. Small branches produce small PRs. Small PRs review fast and deploy without incident.
The transition from long-lived feature branches to trunk-based development is the hardest cultural change in this guide. Teams resist it because it requires confidence in CI (which catches broken code immediately), feature flags (which allow safe partial merges), and a shared understanding of what "done" means when code ships behind a flag. Build these preconditions before pushing trunk-based development as a practice.
Intervention 3: Reduce CI pipeline time
Once review wait is addressed, CI time is typically the next constraint. Thirty- to sixty-minute CI pipelines are extremely common and entirely unnecessary for most codebases. The target is CI green in under 10 minutes for most PRs. This is achievable for almost any test suite with the right architecture.
Parallelize test suites
GitHub Actions' matrix strategy allows you to split tests across multiple parallel runners. A test suite that takes 40 minutes to run sequentially can run in 10 minutes with 4 parallel shards. The configuration is straightforward:
```yaml
jobs:
  test:
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests (shard ${{ matrix.shard }} of 4)
        run: jest --shard=${{ matrix.shard }}/4
```

For Jest specifically, the --shard=N/M flag splits test files evenly across M workers, running shard N. Vitest, pytest-xdist, and RSpec all have equivalent sharding primitives. The key insight: the wall-clock time of your CI is bounded by the slowest shard, not the total test runtime. With well-balanced shards, 4 runners reduce CI time to approximately 25% of the original — plus a few minutes of runner startup overhead.
Cache dependencies aggressively
Dependency installation is often the single largest time sink in a CI pipeline, and it is almost entirely cacheable. The actions/cache action stores and restores dependency directories based on a cache key derived from your lockfile hash. A cache hit eliminates the install step entirely — typically saving 3–10 minutes per run.
```yaml
- uses: actions/cache@v3
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-
```

For monorepos, cache the entire node_modules directory keyed to the root lockfile hash. For pnpm workspaces, use ~/.local/share/pnpm/store as the cache path. For Python, cache ~/.cache/pip. For Cargo (Rust), cache ~/.cargo/registry and the project's target/ directory.
Fail fast on first failure
Configure matrix jobs to cancel remaining runs as soon as one shard fails. This prevents a broken PR from consuming 10 minutes of runner time across 4 shards when the failure is obvious after 2 minutes on shard 1. In GitHub Actions, add fail-fast: true to the matrix strategy block (it is the default, but worth making explicit). The practical effect: failed CI surfaces faster, feedback loops tighten, and your bill from GitHub Actions shrinks.
Use path filters to skip irrelevant jobs
Not every PR needs to run every test. A change that touches only the frontend should not trigger backend integration tests. GitHub Actions' paths filter allows jobs to be skipped when changed files do not match the filter. A documentation update that modifies only *.md files should not run your full test suite at all.
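As an illustration, assuming a repository split into backend/ and shared/ directories, a backend test workflow might trigger only when relevant paths change:

```yaml
# .github/workflows/backend-tests.yml — paths are illustrative
on:
  pull_request:
    paths:
      - 'backend/**'
      - 'shared/**'                        # code both halves depend on
      - '.github/workflows/backend-tests.yml'
```

A frontend-only or docs-only PR never starts this job at all, which saves both wall-clock time and runner minutes.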
The caveat: path filters require careful design. Missing a critical dependency path (e.g., a shared config file that affects both frontend and backend) can cause tests to be skipped when they should run. Start with obvious non-overlapping paths and expand from there.
Intervention 4: Streamline the deploy pipeline
Deploy pipeline time — the gap between merge and production — is the segment most frequently over-engineered. Teams accumulate manual gates, sequential steps, and approval requirements over years of incident-driven policy changes, resulting in deploy pipelines that take 20–45 minutes for changes that carry minimal risk. The right fix is surgical: preserve gates for high-risk deploys, eliminate them for standard ones.
Blue-green deployments eliminate migration wait
A common source of deploy pipeline time is waiting for the new version to be ready before cutting over traffic. Blue-green deployments reverse this: the new version is fully started and health-checked before a single request is routed to it. The cutover is instant. The previous version stays running for a configurable window, enabling immediate rollback by re-routing traffic rather than re-deploying.
The deploy time reduction from blue-green is typically 30–50% compared to in-place deployments that drain connections before starting the new instance. More importantly, it makes the deploy safer — your health check runs against real production infrastructure before any user traffic touches the new version.
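The cutover decision itself is simple enough to state as a toy sketch. This is not a deployment tool, just the control logic, with the health check abstracted as a callable:

```python
def cutover(active: str, candidate: str, healthy) -> str:
    """Route traffic to `candidate` only if its health check passes.

    `healthy` is a callable(version) -> bool standing in for a real
    health-check request against the idle environment. Returns the
    version that should receive traffic.
    """
    if healthy(candidate):
        return candidate  # instant cutover: the new version is already warm
    return active         # check failed: keep serving the old version

# Rollback is just the same switch in reverse — the old version is
# still running, so no redeploy is needed.
print(cutover("blue", "green", healthy=lambda v: True))  # green
```

The point the sketch makes: the slow parts (boot, warm-up, health check) all happen before the switch, so the switch itself is effectively free.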
Deploy on merge
For teams still deploying manually or on a schedule, the transition to deploy-on-merge is the highest-leverage process change available. When CI passes on main, the CD pipeline triggers automatically. There is no human gate, no deployment window, no "can someone deploy the release?" message in Slack. The code goes from merged to production in the time it takes the deploy pipeline to run.
The preconditions for deploy-on-merge: reliable CI that catches regressions before merge, automated smoke tests post-deploy, and fast rollback capability. All three are achievable independently of deploy-on-merge — and building them first makes the transition to continuous deployment smooth rather than chaotic.
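A minimal sketch of a deploy-on-merge workflow in GitHub Actions; the script paths are illustrative placeholders for your own CD entry points:

```yaml
name: deploy
on:
  push:
    branches: [main]   # every merge to main triggers the pipeline

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to production
        run: ./scripts/deploy.sh  # illustrative CD entry point
      - name: Post-deploy smoke tests
        run: ./scripts/smoke.sh   # a failure here should trigger rollback
```

Starting with this pattern against a staging environment is a low-risk way to build confidence before pointing it at production.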
Remove manual approval gates for standard deploys
Many teams have a manual approval step in their deploy pipeline that was added after an incident years ago and never removed. If every deploy requires a human to click "approve" in a tool, that human becomes a bottleneck — and the latency accumulates silently in a segment that teams rarely measure separately from CI.
The right model is not zero manual gates — it is conditional gates. Standard deploys (low deploy risk score, small change, well-tested) deploy automatically. High-risk deploys (large changes, unfamiliar files, elevated risk signals) require additional review before proceeding. Koalr's deploy risk score provides the signal: PRs under a risk threshold of 50 auto-deploy; PRs over 70 route to a manual approval queue. The decision is made per-PR, not as a blanket policy.
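The routing logic can be sketched in a few lines. The thresholds mirror the example in the text; how to treat scores between the two bands is an assumption here, not something the text specifies:

```python
def deploy_decision(risk_score: float,
                    auto_threshold: float = 50,
                    review_threshold: float = 70) -> str:
    """Route a PR's deploy based on a per-PR risk score.

    Below auto_threshold: deploy automatically once CI is green.
    Above review_threshold: queue for manual approval.
    In between: assumed here to get a lightweight review.
    """
    if risk_score < auto_threshold:
        return "auto-deploy"
    if risk_score > review_threshold:
        return "manual-approval"
    return "lightweight-review"

print(deploy_decision(32))  # auto-deploy
print(deploy_decision(85))  # manual-approval
```

The essential property is that the gate is a function of the change, not a blanket policy applied to every deploy.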
The lead time vs. safety tradeoff — and why it is a false one
The most common objection to reducing lead time is that speed comes at the cost of safety. The DORA research directly contradicts this. Elite teams have both short lead time AND low change failure rates — they are not in tension. The mechanism: shorter lead time forces smaller PRs. Smaller PRs have lower risk per deploy. Lower risk per deploy means fewer incidents. The path to safety runs through speed, not against it.
Metrics to track alongside lead time
Improving lead time in isolation is possible but risky. The interventions above can produce shortcuts — PRs that shrink in size by cutting tests, or deploys that speed up by skipping review — that improve lead time while degrading quality. Track these four metrics alongside lead time to ensure the improvement is genuine:
- Review coverage rate: the percentage of PRs that received at least one review before merge. This should not fall as you reduce review wait time. Faster reviews are the goal; fewer reviews is not.
- Test coverage on changed files: per-PR coverage delta, not aggregate repository coverage. If CI time improvements come from skipping tests, this will surface it.
- Deployment frequency: a leading indicator that lead time improvements are real. As lead time decreases, teams naturally deploy more often. If deployment frequency does not rise alongside falling lead time, the measurement may be off.
- Change failure rate: the proportion of deployments that result in a degradation requiring remediation. If lead time improvements are genuine — driven by smaller PRs and better review coverage rather than gate removal — CFR should hold steady or fall.
A 90-day lead time improvement plan
Most teams can achieve a 40–60% reduction in median lead time within 90 days without any new tooling purchases, given focused execution on the interventions above. The 90-day plan:
Month 1: Measure and change the review process
Start by instrumenting. Set up the GitHub API queries to measure each segment. Calculate your current baseline for total lead time, review wait, CI time, and deploy time — at the team level, not repository level. Identify your constraint.
In parallel, implement CODEOWNERS auto-assignment if it is not already in place. Establish the review SLA agreement in your team (4h first review, 24h turnaround) and introduce PR size guidelines — target under 400 lines for most PRs, with a process for splitting larger ones. Hold a team retro specifically on review latency, sharing the per-person time-to-first-review data. This is typically the most impactful conversation the team has had about lead time.
Month 2: Attack CI pipeline time
With review process changes underway, turn to CI. Identify the slowest test jobs in your current pipeline. Implement parallelization using matrix sharding for the largest suites. Add dependency caching for the languages in your stack. Enable fail-fast on matrix jobs. Add path filters to skip irrelevant jobs for documentation and non-functional changes.
Measure CI time before and after each change. Target: CI green in under 10 minutes for 80% of PRs. Track the slowest 20% separately — they often have a different root cause (integration tests against real databases, end-to-end browser tests) that requires a different intervention.
Month 3: Continuous deployment for low-risk changes
By month 3, review wait should be measurably lower and CI faster. The third month focuses on the deploy pipeline. Implement deploy-on-merge for your staging environment if you have not already. Configure automated smoke tests that run post-deploy and gate promotion to production.
Remove manual approval steps for deploys that score below your risk threshold. For teams using Koalr, configure the risk-based auto-deploy rule: PRs with a risk score under 50 deploy automatically when CI passes; PRs over 70 queue for manual review before deploy. This preserves safety for high-risk changes while eliminating friction for the majority.
At the end of 90 days, recalculate your four segment metrics. Teams that execute consistently across all three months typically see: review wait reduced by 50–70%, CI time reduced by 40–60%, deploy pipeline reduced by 30–50%, and total lead time reduced by 40–60%. The teams that do not see this improvement usually stalled in Month 1 — either failing to get buy-in on the review SLA or not tracking the baseline rigorously enough to know what changed.
Lead time for changes is ultimately a measure of how well your team has eliminated friction from the path between an idea and production. The interventions are known. The data to diagnose the constraint is available in GitHub. The primary inputs are will, measurement, and execution — not new tools or infrastructure spend.
Measure your lead time and find your constraint — free
Koalr connects to GitHub and breaks down your lead time into all four segments — coding time, review wait, CI, and deploy — at the team level, with per-PR detail and trend tracking. Connect in under 5 minutes and see exactly where your time goes.