LaunchDarkly and DORA Metrics: Connecting Flag Deployments to Delivery Health
DORA metrics — deploy frequency, lead time for changes, change failure rate, and mean time to recovery (MTTR) — are the standard framework for measuring software delivery health. Most DORA implementations measure deployments. With feature flags, value reaches users in two distinct steps (deploying the code, then enabling the flag), and most DORA dashboards capture only the first.
Why feature flags create a measurement gap in DORA
The DORA research program, founded as DevOps Research and Assessment and now run by Google Cloud, publishes its findings through the annual State of DevOps Report and defines deployment as the unit of delivery measurement. A deployment is when code changes go live to production. This definition made sense in 2014, before feature flags were standard practice in most engineering organizations.
In a flag-heavy engineering organization, the deployment event and the release event are decoupled. Code is deployed continuously, but value reaches users when a flag is enabled. A team that deploys ten times a week but rolls out features once a month has high deploy frequency and low release frequency. Which number should their DORA dashboard show?
The answer depends on what question you are trying to answer. If you want to measure the health of your CI/CD pipeline, deployment frequency is the right metric. If you want to measure how quickly value reaches users — which is what DORA is ultimately trying to capture — you need to include flag rollout events.
LaunchDarkly does not expose DORA metrics natively. It is a feature flag platform, not an engineering analytics tool. Bridging the two requires pulling LaunchDarkly's flag event data into your DORA measurement pipeline alongside your deployment events.
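A minimal sketch of that bridge: deployment events from your CI system and entries from LaunchDarkly's audit log merged into one chronological stream. The field names here (`finished_at`, `sha`, `date`, `name`) are illustrative assumptions about your pipeline's shape, not a fixed schema — check them against your actual CI payloads and audit-log responses.

```python
from datetime import datetime, timezone

def normalize_events(deployments, audit_entries):
    """Merge CI deployments and LaunchDarkly audit-log entries into one
    chronological stream. Field names are assumptions, not a schema."""
    events = [{"ts": d["finished_at"], "kind": "deployment", "ref": d["sha"]}
              for d in deployments]
    for e in audit_entries:
        # LaunchDarkly audit-log timestamps are epoch milliseconds.
        ts = datetime.fromtimestamp(e["date"] / 1000, tz=timezone.utc)
        events.append({"ts": ts, "kind": "flag_change", "ref": e["name"]})
    return sorted(events, key=lambda ev: ev["ts"])

# Hypothetical sample: one deployment, one flag change a day later.
deploys = [{"finished_at": datetime(2024, 1, 1, tzinfo=timezone.utc), "sha": "a1b2c3"}]
entries = [{"date": 1704153600000, "name": "new-checkout"}]  # 2024-01-02 UTC
timeline = normalize_events(deploys, entries)
```

Once both sources land on one timeline, every metric below becomes a query over that stream rather than a join across two systems.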
How each DORA metric maps to LaunchDarkly events
| DORA Metric | LaunchDarkly Event | Without Flags | With Flags |
|---|---|---|---|
| Deploy Frequency | Flag rollout event (percentage increase or full enable) | Only counts code deployments; underestimates actual delivery frequency | Each flag rollout increment is a release event — ships value to users even without a new deployment |
| Lead Time for Changes | Time from PR merge → flag reaching 100% rollout | Measured PR merge → deployment only; ignores time-to-user | Full lead time includes flag staging period; reveals hidden latency between "shipped" and "available" |
| Change Failure Rate | Flag kill-switch triggered = implicit failure signal | Only counts explicit rollbacks and hotfixes; misses flag-masked failures | Kill-switch events surface failures that were resolved silently without appearing in deployment history |
| MTTR | Time from kill-switch (incident signal) → stable rollout resume | Only measures incidents that went through formal PagerDuty/OpsGenie workflow | Flag-resolved incidents have their own MTTR signal; often faster but also often invisible to reporting |
Deploy frequency: flags as release events
Deployment frequency measures how often an organization successfully releases to production. Elite DORA performers deploy multiple times per day. But for a team using LaunchDarkly, many of those deployments carry no user-visible change — the new code ships dark, behind flags set to 0%.
There are two ways to handle this in DORA measurement. The first is to count only code deployments, accepting that your deploy frequency metric measures CI/CD pipeline health but does not represent release cadence. The second is to count each flag rollout increment as a deployment event, treating the enabling of a flag for users as the point at which value is delivered.
Neither approach is universally correct. The right choice depends on what your team is optimizing for. If engineering leadership uses deploy frequency to assess pipeline maturity, count deployments. If product leadership uses it to assess release cadence, count flag rollouts. The important thing is to be explicit about which definition you are using — most teams are not.
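To make the two definitions concrete, here is a sketch over a hypothetical four-week window, with counts chosen to echo the ten-deploys-a-week, one-rollout-a-month team described above:

```python
from datetime import date

def weekly_frequency(event_dates, weeks):
    """Average delivery events per week over the measurement window."""
    return len(event_dates) / weeks

# Hypothetical month: frequent deployments, one user-facing rollout.
deploy_days = [date(2024, 3, d) for d in (1, 4, 5, 8, 11, 12, 15, 18, 22, 25)]
rollout_days = [date(2024, 3, 15)]  # the one flag that reached users

pipeline_frequency = weekly_frequency(deploy_days, weeks=4)   # CI/CD health
release_frequency = weekly_frequency(rollout_days, weeks=4)   # release cadence
```

The same helper, fed different event streams, answers two different questions — which is exactly why stating the definition matters.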
There is also a third pattern worth distinguishing: the dark launch. In this pattern, a feature is deployed with its flag targeting 0% of users, but the code path is exercised in production to warm caches and validate configuration. This is a deployment without being a release — no users see the feature, yet the infrastructure is already live. Dark launch events are worth tracking separately as a leading indicator of upcoming releases.
Lead time: the hidden gap between merge and rollout
Lead time for changes measures the time from a commit being made to it running in production. DORA defines "running in production" loosely — technically it means deployed, not necessarily user-visible. But the spirit of the metric is measuring how quickly value moves from idea to user.
For teams using feature flags, there is a meaningful and often large gap between "deployed to production" and "available to users." A feature might be deployed in January but held behind a flag until a product launch in March. The lead time measured from commit to deployment looks excellent. The lead time measured from commit to users seeing the feature is two months.
Koalr tracks both. The deployment lead time (commit → deployment) tells you about your CI/CD pipeline efficiency. The release lead time (commit → flag reaching 100% rollout) tells you about your full delivery cycle. The difference between the two — the time a feature spends staged behind a flag — is a metric we call flag staging time.
High flag staging time is not always bad. Planned launches, legal review periods, and coordinated go-to-market timing all justify holding features behind flags for weeks. But high flag staging time that accumulates accidentally — because the product decision stalled, because the rollout got deprioritized, or because no one scheduled the final enable — represents waste. Tracking it makes the waste visible.
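The arithmetic behind these three numbers is simple subtraction over the same three timestamps. A sketch, using hypothetical dates where a change merges in January, deploys a day later, and reaches 100% rollout in March:

```python
from datetime import datetime, timezone

def lead_times(merged_at, deployed_at, fully_rolled_out_at):
    """Split lead time into pipeline time and flag staging time."""
    deployment_lead = deployed_at - merged_at       # merge -> production
    release_lead = fully_rolled_out_at - merged_at  # merge -> 100% of users
    staging = fully_rolled_out_at - deployed_at     # time held behind the flag
    return deployment_lead, release_lead, staging

# Hypothetical feature: merged Jan 10, deployed Jan 11, launched Mar 11.
merged = datetime(2024, 1, 10, tzinfo=timezone.utc)
deployed = datetime(2024, 1, 11, tzinfo=timezone.utc)
rolled_out = datetime(2024, 3, 11, tzinfo=timezone.utc)
dep_lead, rel_lead, staging = lead_times(merged, deployed, rolled_out)
```

Here the deployment lead time looks excellent (one day) while the release lead time is two months — the gap is entirely flag staging time.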
Change failure rate: kill switches as implicit failure signals
Change failure rate measures the percentage of deployments that result in a failure requiring a hotfix, rollback, or patch. It is calculated as failed deployments divided by total deployments over a measurement period.
Feature flags create a new category of failure signal that does not appear in the standard change failure rate calculation: the flag kill switch. When an engineer disables a flag in response to user-reported errors, elevated error rates, or performance degradation, that is a rollback. It is functionally identical to a deployment rollback — it reverts a change to reduce user impact — but it happens through LaunchDarkly rather than through your deployment pipeline.
If you measure change failure rate only from deployment rollbacks and hotfixes, you are undercounting failures. Every kill-switch event that was triggered in response to an incident is a failure that should increment your change failure rate counter. Teams that use flags extensively often have lower apparent change failure rates precisely because flags make rollbacks so fast and painless — the failures are still happening, but they resolve quickly and do not show up in deployment history.
Koalr includes kill-switch events in change failure rate calculation. This typically increases a team's reported change failure rate, but it also gives a more accurate picture. A team with a 2% deployment-only change failure rate and a 5% flag-inclusive change failure rate has a real 5% failure rate that is partly masked by effective use of flags.
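A sketch of both calculations, with hypothetical counts chosen to mirror the 2%-versus-5% example above. Note that whether rollout events belong in the denominator is the same judgment call as the deploy-frequency definition earlier — this version counts them as delivery events:

```python
def change_failure_rates(deployments, rollbacks, rollout_events, kill_switches):
    """Compare deployment-only CFR with a flag-inclusive CFR.
    Counts here are per measurement period (e.g. one quarter)."""
    deployment_only = rollbacks / deployments
    flag_inclusive = (rollbacks + kill_switches) / (deployments + rollout_events)
    return deployment_only, flag_inclusive

# Hypothetical quarter: 100 deployments, 2 rollbacks, 40 user-facing
# rollout events, 5 incident-driven kill switches.
dep_only_cfr, flag_incl_cfr = change_failure_rates(100, 2, 40, 5)
```

The deployment-only rate is 2%; including kill switches it rises to 5% — the failures were always there, they just resolved through LaunchDarkly instead of the deployment pipeline.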
MTTR: flag-resolved incidents and recovery time
Mean time to recovery measures how long it takes to restore service after a failure. Traditional MTTR calculation uses incident management systems: the gap between when an alert fires and when the incident is marked resolved.
Feature flags create a parallel recovery path that bypasses the formal incident management workflow. When an engineer disables a flag to stop user impact, service restores within seconds — but the incident may never be formally opened in PagerDuty or OpsGenie if the flag kill switch resolved the problem before anyone filed a ticket.
This is a genuine improvement in MTTR, but it is invisible to your MTTR dashboard. The recovery happened; it just happened outside the measurement system.
Koalr measures flag-resolved MTTR as a separate metric: the time from when error rates spiked (or when a user report was filed) to when a kill switch was toggled and error rates returned to baseline. This gives you a complete picture of recovery performance — both incidents that went through formal channels and incidents that were resolved silently through flag management.
For elite DORA performers, flag-resolved MTTR is often under five minutes. That is a meaningful capability worth surfacing. A team that achieves sub-five-minute recovery through flag management should see that in their metrics — not have it invisible because it bypassed the incident management tool.
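A sketch of the calculation, with hypothetical incident timestamps. Each measured span runs from the error-rate spike to error rates returning to baseline after the kill switch was toggled:

```python
from datetime import datetime, timedelta, timezone

def flag_resolved_mttr(incidents):
    """Mean recovery time for incidents resolved via a flag kill switch.
    Each incident is a (error_spike_at, baseline_restored_at) pair."""
    durations = [restored - spiked for spiked, restored in incidents]
    return sum(durations, timedelta()) / len(durations)

# Three hypothetical flag-resolved incidents: 3-, 5-, and 4-minute recoveries.
t0 = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
incidents = [
    (t0, t0 + timedelta(minutes=3)),
    (t0, t0 + timedelta(minutes=5)),
    (t0, t0 + timedelta(minutes=4)),
]
mttr = flag_resolved_mttr(incidents)
```

A four-minute mean recovery is the kind of number that never appears in PagerDuty-derived MTTR, because none of these incidents were formally opened.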
Implementing LaunchDarkly-aware DORA metrics
To implement DORA metrics that account for LaunchDarkly flag events, you need to pull data from three sources and correlate them on a common timeline:
- GitHub or your source control — commit timestamps, deployment events, PR metadata
- LaunchDarkly audit log — flag enable/disable events, percentage rollout changes, kill-switch events
- Incident management — PagerDuty or OpsGenie alerts, incident open and close timestamps
Koalr connects all three. Flag events from LaunchDarkly appear in the same timeline as deployments and incidents, so your DORA metrics automatically incorporate flag rollout events without manual reconciliation.
The implementation also handles the disambiguation problem: not every flag enable is a release event, and not every kill switch is a failure. Koalr distinguishes between flag changes that affect end users (rollout changes, targeting rule changes) and flag changes that affect only internal configuration (maintenance flags, ops toggles). Only user-facing flag events contribute to DORA metric calculations.
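A sketch of that filter. The action names and the tag convention here are illustrative assumptions about how a team might label its flags, not LaunchDarkly built-ins — substitute whatever actions and tags your audit log actually emits:

```python
# Team convention (assumed): flags tagged this way are internal-only.
OPS_TAGS = {"ops", "config", "maintenance"}

# Assumed audit-log action names for changes that can affect end users.
USER_FACING_ACTIONS = {"updateOn", "updateRules", "updateFallthrough"}

def is_user_facing(event):
    """True if a flag change should count toward DORA metrics."""
    if event["action"] not in USER_FACING_ACTIONS:
        return False  # e.g. description or metadata edits
    return not (set(event.get("tags", [])) & OPS_TAGS)

events = [
    {"action": "updateOn", "tags": ["checkout"]},       # user-facing rollout
    {"action": "updateOn", "tags": ["ops"]},            # internal ops toggle
    {"action": "updateDescription", "tags": []},        # metadata only
]
dora_events = [e for e in events if is_user_facing(e)]
```

Only the first event survives the filter, which is the behavior you want: ops toggles and metadata edits should never inflate release frequency.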
What good looks like
A team with a mature LaunchDarkly + DORA integration typically sees three things in their metrics:
- Deploy frequency increases when flag rollout events are included, reflecting the actual cadence at which value reaches users.
- Change failure rate increases slightly as kill-switch events are included, giving a more accurate picture of the failure rate the team is actually experiencing.
- MTTR decreases as flag-resolved incidents are included, revealing fast recovery capabilities that were previously invisible to reporting.
These changes do not reflect a deterioration in engineering health — they reflect a more accurate measurement of it. The team that previously appeared to have a 1% change failure rate and a 45-minute MTTR may actually have a 3% failure rate and a 6-minute MTTR. Both numbers are more useful for making decisions about where to invest in reliability engineering.
DORA metrics that include your LaunchDarkly data
Koalr connects LaunchDarkly flag events to your DORA dashboard — deploy frequency, lead time, change failure rate, and MTTR. Get a complete picture of delivery health, not just the deployment half of it.