How to Improve Deployment Frequency: A Practical Engineering Playbook
Deployment frequency is the DORA metric that compounds fastest. Teams that deploy more often get feedback faster, build smaller batches, and — counterintuitively — have lower change failure rates. Going from once per sprint to multiple times per day is achievable for most teams. This guide shows the path.
The compounding DORA metric
DORA research shows that elite teams deploying multiple times per day have 2× lower MTTR and 5× lower change failure rates than teams deploying once per month. Higher frequency does not cause more incidents — it causes fewer, because it forces the smaller batches and tighter feedback loops that make quality engineering possible.
Why deployment frequency improves everything else
Most engineering teams think about deployment frequency as an output — the result of having good CI/CD infrastructure and reliable code. That framing is backwards. Deployment frequency is an input. It shapes how your team builds software, how quickly you learn from production, and how much risk accumulates in any given change.
The causal chain is well established. Each deployment is a feedback loop: you ship code, production tells you whether it behaves as expected, you adjust. Teams that deploy more often close that loop more frequently. Over months, this compresses dramatically. A team deploying ten times per day runs 3,650 feedback loops per year. A team deploying twice per month runs 24. The compound learning effect of 152× more feedback loops is not incremental — it changes the character of how a team relates to production.
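The arithmetic behind that comparison is simple to check:

```python
# Annual feedback loops at the two cadences from the text:
# ten deploys per day vs. two per month.
deploys_per_year_daily = 10 * 365   # 3,650 loops per year
deploys_per_year_monthly = 2 * 12   # 24 loops per year

ratio = deploys_per_year_daily / deploys_per_year_monthly
print(round(ratio))  # 152
```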
DORA's longitudinal research across thousands of teams provides the quantitative backing for this intuition. Teams in the elite deployment frequency tier (multiple deployments per day) have 2× lower mean time to restore and 5× lower change failure rates compared to teams in the low tier (once per month or less). The mechanism is the forced batch size reduction: when you deploy daily, a single change can contain at most one day's worth of work. Small batches are easier to test, easier to reason about, and dramatically easier to roll back when something goes wrong.
There is one important myth to address before going further. Teams stuck at once-per-sprint deployment cadences often say: "we cannot deploy more often because our customers do not want constant changes." This conflates two distinct concepts. A deployment is an infrastructure event — pushing code to a production environment. A release is a product event — making a feature visible to users. The two do not have to be coupled. You can deploy twenty times per day while releasing features on a scheduled cadence that suits your customers. Feature flags are the mechanism. We will return to this in Intervention 3.
The deployment frequency ladder
Before optimizing, you need to know where you are. Teams sit at one of five levels on the deployment frequency ladder. Each level has characteristic practices and a distinct set of barriers preventing the move to the next level.
[Figure: The Deployment Frequency Ladder]
Most engineering teams sit at Level 1 or Level 2. The barriers preventing the move to Level 3 and 4 are almost always process and cultural — not technical. The tools for continuous delivery have been commodity infrastructure for a decade. What is missing is the organizational commitment to small batches, automated pipelines, and separated deploy and release concerns.
Root cause analysis: why teams do not deploy more often
Before prescribing interventions, it is worth understanding the actual blocking factors. In our analysis of teams stuck at Level 1 and 2, six root causes appear consistently.
Manual deployment process. The most common and most fixable barrier. Deploying requires a human to run a command, SSH into a server, or navigate a deployment UI. This creates activation energy for every single deploy. Teams with manual deploy processes naturally minimize the number of deploys to minimize that friction. The fix is full automation — a topic we cover in Intervention 1.
Long CI pipelines. A 45-minute CI run makes "deploy on every commit to main" economically painful. If you have 8 engineers committing to main and CI takes 45 minutes, you either queue builds (delaying deploys by hours) or pay for significant parallelization infrastructure. The solution is parallelizing and caching CI — most teams can get a 45-minute pipeline down to 10 minutes with targeted work on test splitting and build caching.
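The test-splitting half of that work usually comes down to deterministic sharding: divide the test files across N parallel CI jobs so every job independently agrees on which subset it owns. A minimal sketch of the idea (the shard count and file names are illustrative):

```python
import hashlib

def shard_for(test_file: str, total_shards: int) -> int:
    """Assign a test file to a shard deterministically, so every
    parallel CI job computes the same split without coordination."""
    digest = hashlib.sha256(test_file.encode()).hexdigest()
    return int(digest, 16) % total_shards

def files_for_shard(test_files: list[str], shard: int, total_shards: int) -> list[str]:
    """Return the subset of test files this CI job should run."""
    return [f for f in test_files if shard_for(f, total_shards) == shard]

# Example: split a 20-file suite across 4 parallel jobs.
suite = [f"tests/test_{i}.py" for i in range(20)]
shards = [files_for_shard(suite, s, 4) for s in range(4)]
# Every file lands in exactly one shard.
assert sorted(f for group in shards for f in group) == sorted(suite)
```

Hash-based splitting is stable across runs but ignores test duration; timing-based splitting (grouping by recorded runtimes) balances the shards better at the cost of needing historical data.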
Large batch sizes. Teams that hold changes until a release "bundle" is ready naturally land at Level 1. Batch size reduction is the most leveraged intervention, and it is primarily a team agreement, not a technical change. Trunk-based development — covered in Intervention 2 — is the forcing function.
Fear of incidents. "We are not confident enough in our deploys to ship more frequently." This is a reasonable concern, but it is a fixable one. Pre-merge deploy risk scoring gives teams an objective answer to the question "should we deploy this?" rather than relying on intuition. We cover this in Intervention 4.
Manual QA gates. If every deploy requires a QA engineer to sign off on a manual test pass, deployment frequency is bounded by QA throughput. The productive path is not eliminating QA — it is automating the common cases (smoke tests, regression suites, critical path tests) so that manual QA effort is reserved for genuinely ambiguous cases that automated coverage cannot address.
Environment promotion bottlenecks. Promoting from staging to production requires approval from multiple teams or manually filing a change request. This is common in regulated industries and larger organizations. The solution is policy automation: codify the approval criteria, automate the checks, and require human approval only for changes that fall outside automated policy bounds.
Intervention 1: Automate the deployment pipeline
The first intervention is the most mechanical. If deploying requires a human to do anything beyond merging a PR, that is activation energy you are paying on every deploy. Eliminate it.
The standard pattern with GitHub Actions is an `on: push` trigger on the main branch that runs your deploy workflow:
```yaml
on:
  push:
    branches: [main]

jobs:
  deploy:
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: pnpm test
      - name: Deploy (blue-green)
        run: ./scripts/deploy.sh --strategy=blue-green
      - name: Health check
        run: ./scripts/health-check.sh --timeout=120
      - name: Smoke tests
        run: pnpm test:smoke
      - name: Rollback on failure
        if: failure()
        run: ./scripts/rollback.sh
```

The blue-green deploy with a health check before traffic cutover is a significant anxiety reducer for teams nervous about frequent deploys. The new version is provisioned and health-checked before a single user request reaches it. If the health check fails, the old version continues serving traffic and the pipeline fails loudly. No user impact, no manual intervention needed.
Add smoke tests that run after deploy and trigger an automatic rollback on failure. This closes the loop on the most common failure mode — the deploy succeeds but the application is misbehaving. A five-test smoke suite that hits your critical user paths takes two days to write and provides a meaningful safety net that makes deploying more often much less scary.
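The health-check step in a pipeline like the one above is, at its core, a poll-until-healthy loop with a deadline, mirroring a flag like `--timeout=120`. A sketch of that loop (the probe function is a stand-in for a real HTTP check against the new version):

```python
import time

def wait_until_healthy(probe, timeout_s: float, interval_s: float = 1.0) -> bool:
    """Poll `probe()` until it returns True or `timeout_s` elapses.
    Returns False on timeout so the caller can trigger a rollback."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False

# Example with a fake probe that becomes healthy on the third attempt.
attempts = iter([False, False, True])
assert wait_until_healthy(lambda: next(attempts), timeout_s=10, interval_s=0.01)
```

The key design choice is failing closed: a timeout returns a definite False rather than hanging, so the pipeline's rollback step always gets a signal.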
Time to implement for most stacks: one to two engineering days. This is the highest return-per-effort intervention on this list for teams at Level 0 or 1.
Intervention 2: Reduce batch size with trunk-based development
Batch size reduction is the single most impactful lever for improving all four DORA metrics simultaneously, and trunk-based development is the practice that forces it.
Trunk-based development has one defining rule: all engineers merge to main at least once per day. That rule has a powerful forcing function built in — if you must merge daily, your branches must be small enough to merge daily. Long-lived feature branches (three-plus days) are structurally incompatible with the practice. You cannot build a two-week feature on a two-week branch if the branch must be merged tomorrow.
The response most teams have is: "but we cannot merge incomplete features to main." This is where feature flags become essential — and we will cover that in Intervention 3. For now, accept the constraint: trunk-based development requires that incomplete features be hidden behind flags, not isolated on long branches.
The practical implementation has three components:
GitHub branch protection settings. Enable "require branches to be up to date before merging" on your main branch. This forces engineers to merge main into their branch before they can merge, which surfaces conflicts early rather than at the end of a two-week sprint. It also creates a natural feedback signal: if your branch is frequently out of date, your batches are too large.
Team agreement on branch lifetime. No long-lived feature branches without a documented plan to split the work. The working limit is 24 hours for most changes. Work that cannot be expressed in a 24-hour branch should be broken down into incremental steps, each of which ships behind a flag.
PR size norms. Establish a team agreement on PR size. A reasonable starting point: PRs over 400 lines of non-generated code should have a documented reason to be that large. Most of the time, the reason does not exist — the PR is large because it accumulated scope over several days, not because the work required it. Tracking average PR size over time, alongside deployment frequency, is a leading indicator for deployment frequency improvement.
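Enforcing the size norm can start as a simple report: flag PRs over the agreed threshold and watch the median over time. A sketch using the 400-line limit from the agreement above (the data shape is illustrative):

```python
from statistics import median

PR_SIZE_LIMIT = 400  # non-generated lines, per the team agreement

def oversized(prs: list[dict]) -> list[str]:
    """Return IDs of PRs that exceed the size norm and need a documented reason."""
    return [pr["id"] for pr in prs if pr["lines_changed"] > PR_SIZE_LIMIT]

prs = [
    {"id": "PR-101", "lines_changed": 120},
    {"id": "PR-102", "lines_changed": 640},
    {"id": "PR-103", "lines_changed": 85},
]
print(oversized(prs))                             # ['PR-102']
print(median(pr["lines_changed"] for pr in prs))  # 120
```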
Intervention 3: Separate deployment from release
This is the most important conceptual shift for teams stuck at the "we cannot deploy because the feature is not ready" barrier. The confusion between deployment and release is so common, and so costly, that it deserves careful unpacking.
Deployment is an infrastructure event: you push code from your repository to a production server. The code exists in production. No user necessarily sees it. No business decision has been made about whether to show it.
Release is a product event: you make a feature visible to users. A business decision has been made. A product manager has said "this feature is ready." Marketing may be involved. The customer-facing experience changes.
Feature flags decouple these two events completely. You can deploy twenty times per day, shipping code that includes partially-built features, while releasing features on whatever cadence your product team chooses. The flag is off until the product team turns it on. From the user's perspective, nothing changed. From the engineering team's perspective, every commit is in production immediately.
The dark launch pattern is the practical implementation of this separation. Ship the code, turn the feature on for internal users first (your own team, your support staff, beta customers), collect feedback, fix issues found in that smaller blast radius, then gradually expand the rollout. By the time you flip the flag for all users, you have already run production traffic through the code — just for a controlled subset.
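A gradual rollout of the kind described above is commonly implemented by hashing the user ID into a stable bucket and comparing it to a rollout percentage. The sketch below shows the idea (flag names and the hashing scheme are illustrative, not a specific vendor's implementation):

```python
import hashlib

def flag_enabled(flag: str, user_id: str, rollout_pct: int) -> bool:
    """Deterministically bucket a user into [0, 100) for this flag.
    The same user always lands in the same bucket, so raising
    rollout_pct only ever adds users -- it never flips anyone off."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_pct

# 0% and 100% behave as hard off/on; mid-range answers are stable.
assert flag_enabled("new-checkout", "user-42", 100) is True
assert flag_enabled("new-checkout", "user-42", 0) is False
```

Hashing the flag name together with the user ID keeps buckets independent across flags, so the same user is not always the guinea pig for every rollout.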
For teams at Level 1 who say they cannot increase deployment frequency due to product release schedules: adopting feature flags transforms the conversation. The product team controls the release schedule as before. The engineering team ships code to production continuously. Both concerns are satisfied without either team compromising.
Feature flags as a prerequisite
Trunk-based development without feature flags creates a real problem: incomplete features get deployed and seen by users. Feature flags solve this by separating the code ship from the feature ship. Teams that adopt trunk-based development without simultaneously adopting feature flags frequently regress to long-lived branches within 60 days. Implement both together.
Intervention 4: Use deploy risk scoring to deploy with confidence
The psychological barrier to frequent deployment is often understated in technical discussions about CI/CD. Engineers know the tooling exists. They know continuous delivery is possible. But the question that makes them hesitate is: "will this particular change break something?" Without an objective answer to that question, the answer defaults to caution.
Deploy risk scoring provides an objective answer. Instead of intuition or experience or gut feel, each PR gets a 0–100 score based on signals that are empirically predictive of post-deployment incidents: change size, file churn, author expertise in the changed files, review coverage, timing, historical failure rate of touched files, and test coverage delta. A PR that scores 35 is a different conversation than a PR that scores 78.
The practical effect on deployment frequency is significant. When engineers can see that a change scores 28 — small diff, experienced author in these files, good review coverage, Tuesday morning — the psychological barrier drops substantially. That change is not scary. The data says so. Deploy it.
Conversely, when a change scores 74, the risk score surfaces the specific reason: large diff, author has no prior commits to three of the changed files, merged without a second reviewer. The engineer now has actionable information — not a block, but a signal. Split the PR, add a reviewer, or deploy with eyes open and a rollback plan ready.
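The article does not publish a scoring formula, but the general shape of such a score is a weighted sum of normalized risk signals clamped to 0–100. A hypothetical sketch to make the idea concrete (the weights and signal names are invented for illustration, not any vendor's actual model):

```python
# Hypothetical weights -- illustrative only, not an actual model.
WEIGHTS = {
    "diff_size": 0.30,             # larger diffs are riskier
    "file_churn": 0.15,            # hot files fail more often
    "author_unfamiliarity": 0.25,  # no prior commits to touched files
    "review_gap": 0.20,            # missing second reviewer
    "timing": 0.10,                # e.g. Friday-evening merges
}

def risk_score(signals: dict[str, float]) -> int:
    """Combine per-signal risk values in [0, 1] into a 0-100 score."""
    raw = sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
    return round(min(max(raw, 0.0), 1.0) * 100)

# A small, well-reviewed change by an experienced author scores low.
low_risk = {"diff_size": 0.1, "file_churn": 0.2, "author_unfamiliarity": 0.0,
            "review_gap": 0.1, "timing": 0.0}
print(risk_score(low_risk))  # 8
```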
Teams using Koalr report that pre-merge risk visibility reduces the psychological barrier to frequent deployment substantially. The data supports this: PRs that score under 40 in Koalr's model have a sub-2% incident rate across tracked deployments. That number gives teams a concrete foundation for the intuition that "this change is safe to ship."
The 90-day frequency improvement plan
The four interventions above are not sequential — but they do have natural dependencies. Automating the pipeline is a prerequisite for continuous delivery. Trunk-based development is a prerequisite for deploying multiple times per day. Feature flags are a prerequisite for trunk-based development to be sustainable. Risk scoring is additive at any stage but most impactful once the automated pipeline is in place.
A realistic 90-day plan for a team moving from Level 1 (monthly scheduled releases) to Level 3–4 (continuous delivery):
Month 1: Automate the deploy pipeline. Remove every manual step from the deploy process. Set up CD on merge to main. Add a health check and basic smoke tests. Target: every merge to main results in a production deployment automatically. Measure: deploys per week should rise from 2–4 (sprint-end bundled) to 20–40 (per-commit). Track change failure rate in parallel — it should hold steady or improve as batch sizes shrink.
Month 2: Reduce batch sizes. Introduce trunk-based development practices. Implement feature flags for any in-progress work that is not ready to be user-visible. Set team norms on branch lifetime (24-hour target) and PR size (under 400 lines non-generated). Target: 50% reduction in median PR age and median PR size. Measure PR age and PR lines changed weekly — these are the leading indicators for deployment frequency improvement.
Month 3: Decouple deploys from releases with feature flags. Instrument all in-progress features with flags. Establish a dark launch process: internal preview, then gradual rollout. Establish deploy risk scoring as a standard part of pre-merge review. Target: deploying at least once per day per service. Measure: deploys per day (not per sprint) and MTTR (which should be falling as batch sizes shrink and rollback becomes simpler).
One measurement note: track deployment frequency per service, not per repository. A monorepo that deploys five independent services should count five deploys per day when each service deploys once that day. This matters because frequency targets differ between services — a customer-facing API has different deployment economics than a weekly batch job.
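Counting per service rather than per repository falls out of grouping deploy events by service before aggregating, and of filtering to production only. A sketch (the event shape is illustrative):

```python
from collections import Counter

def deploys_per_service(events: list[dict]) -> Counter:
    """Count production deploy events per service, not per repo."""
    return Counter(e["service"] for e in events if e["env"] == "production")

# One monorepo day: five services each deployed once to production,
# plus a staging deploy that must not be counted (DORA is production only).
events = [
    {"service": f"svc-{i}", "env": "production"} for i in range(5)
] + [{"service": "svc-0", "env": "staging"}]

counts = deploys_per_service(events)
print(sum(counts.values()))  # 5
```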
Deployment frequency benchmarks
How does your team's current deployment frequency compare to DORA tiers? The table below maps frequency tiers to practices and outcomes based on the DORA State of DevOps research:
| Tier | Frequency | Practices | Change Failure Rate |
|---|---|---|---|
| Elite | Multiple / day | CD on merge, feature flags, trunk-based development | <5% |
| High | Daily – weekly | CD on demand, short-lived branches (<3 days) | 5–10% |
| Medium | Weekly – monthly | Manual deploy trigger, sprint-cycle releases | 10–15% |
| Low | Monthly or less | Ad-hoc, waterfall, high-ceremony release bundles | >15% |
The inverse relationship between deployment frequency and change failure rate is the most counterintuitive finding in the DORA research for teams that have not lived it. Intuitively, more deploys should mean more chances for failures. In practice, the batch size reduction that higher frequency forces more than compensates — smaller changes fail less often, and when they do fail, the failure is smaller, faster to diagnose, and easier to roll back.
Common mistakes when increasing deployment frequency
Teams that successfully automate their pipeline and start deploying more often sometimes encounter avoidable pitfalls in the transition. These are the most common:
Increasing frequency without feature flags. Deploying incomplete features to production and exposing them to users creates a real product quality problem and a customer trust problem. If you are moving to continuous deployment without feature flags already in place, implement the flags first. The sequence matters.
Increasing frequency without improving monitoring. More deploys mean more potential signals in your alerting system. If your monitoring is already noisy and poorly tuned, increasing deployment frequency will amplify the noise. Before pushing to continuous deployment, audit your alerts: how many are actionable? What is your false positive rate? Deploying frequently into poor observability is a recipe for alert fatigue that causes real incidents to be missed.
Counting staging deploys in your frequency metric. Deployment frequency for DORA purposes is production deployments only. Teams sometimes inflate their reported frequency by counting staging, QA, or preview environment deploys. This is a vanity metric — staging deploys do not close a production feedback loop and do not contribute to the batch size reduction that drives the DORA outcomes. Track production deployments specifically.
Not measuring change failure rate alongside frequency. Frequency is only a positive metric if change failure rate holds steady or improves as frequency rises. It is entirely possible to increase deployment frequency while also increasing change failure rate — by shipping lower-quality changes faster. Frequency rising while CFR also rises is not a DORA improvement. Always track the two metrics together. The goal is the elite tier on both dimensions.
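Keeping the two metrics paired is straightforward once each deploy is labeled with its outcome. A sketch computing weekly frequency and change failure rate together (the data shape is illustrative):

```python
def weekly_dora(deploys: list[dict]) -> dict:
    """Report deployment frequency and change failure rate together,
    so a frequency gain that degrades quality is visible immediately."""
    total = len(deploys)
    failures = sum(1 for d in deploys if d["caused_incident"])
    return {
        "deploys": total,
        "change_failure_rate": failures / total if total else 0.0,
    }

# 20 deploys this week, one of which caused an incident: 5% CFR.
week = [{"caused_incident": False}] * 19 + [{"caused_incident": True}]
print(weekly_dora(week))
```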
Skipping the cultural work. Continuous deployment is not a technical achievement alone — it is a team agreement about how work is structured. If engineers are not bought into small batches, short branches, and daily merges to main, they will find ways to work around the technical infrastructure. The pipeline automation is the easy part. The team agreement is where most improvement efforts stall or regress. Make the rationale explicit, show the data, and run the 90-day plan with visibility into the metrics it is meant to improve.
Measure your deployment frequency with Koalr
Koalr tracks deployment frequency per service alongside change failure rate, MTTR, and lead time — with DORA tier classification and trend analysis built in. Connect GitHub in under five minutes and see where your team sits on the deployment frequency ladder today.