Deploy Risk · March 15, 2026 · 7 min read

Feature Flags as a Deployment Safety Net: LaunchDarkly + Koalr

Feature flags are the best rollback mechanism most teams have — and the one they consistently underuse. LaunchDarkly gives you kill-switch capability at runtime without a redeploy. But most teams have no systematic visibility into which PRs introduce new flags versus remove old ones, and flag sprawl becomes its own deployment risk over time.

The flag coverage gap

A kill switch only works if it exists. Koalr tracks whether high-risk PRs contain flag references in their diffs — and surfaces a "No rollback mechanism detected" warning when they do not. Not a hard block. An advisory that prompts the right conversation before merge.

Why feature flags are your best deployment safety net

Rollback strategies come in several forms: git reverts, redeployments of previous artifact versions, database migration reversals. All of them involve time — time to detect the problem, time to initiate the rollback, time for the deployment pipeline to run. During that window, users are hitting broken code.

Feature flags operate differently. When a flag controls a code path, disabling it in production takes seconds: log into LaunchDarkly, flip the toggle, the change propagates to your SDK clients within milliseconds via the streaming connection. No pipeline. No build. No deployment. The rollback window collapses from minutes-to-hours down to seconds.
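The mechanics are easy to see in miniature. The sketch below is illustrative only (a plain dict stands in for LaunchDarkly's flag store, and the flag key is hypothetical), but it shows why flipping a flag is an instant rollback while a redeploy is not:

```python
# Illustrative only: a flag-guarded code path. In production the lookup
# would be an SDK call backed by LaunchDarkly's streaming connection;
# here a plain dict stands in for the flag store.
flag_store = {"payments-v2-checkout": True}

def checkout(cart):
    # The new code path runs only while the flag is on; turning the
    # flag off reverts to the legacy path with no build or deploy.
    if flag_store.get("payments-v2-checkout", False):
        return f"v2 checkout ({len(cart)} items)"
    return f"legacy checkout ({len(cart)} items)"

print(checkout(["book"]))                     # v2 path
flag_store["payments-v2-checkout"] = False    # the "kill switch"
print(checkout(["book"]))                     # legacy path, instantly
```

The rollback here is a data change, not a code change, which is exactly why the window collapses from a pipeline run to a toggle flip.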

The engineering industry has understood this for over a decade. Facebook's gatekeeper system, Google's experiment framework, and every major SaaS company's internal deployment tooling have converged on the same insight: flags decouple deployment from release, and that decoupling is one of the most powerful risk reduction levers available at the delivery layer. LaunchDarkly productized this pattern for the broader market and made it accessible to teams of any size.

The problem is not that teams lack access to feature flags. The problem is that flag usage is inconsistent, untracked, and unmeasured. High-risk changes ship without flags. Flags accumulate and never get cleaned up. Targeting rule complexity grows unchecked. The kill-switch capability exists, but the operational visibility to use it well does not.

That is the gap Koalr fills with its LaunchDarkly integration.

The three flag lifecycle risks

Flag-related deployment risk falls into three distinct failure modes, each occurring at a different point in the flag lifecycle.

Risk 1: No flag on a risky change

The most acute risk is also the most common: a high-risk PR ships to production without any feature flag wrapping its critical code paths. There is no kill switch. If the change causes an incident, the only recovery path is a full redeployment — which takes time the system may not have.

Koalr adds a +20 penalty to the deploy risk score when it detects this combination: a PR scoring above 60 on the base risk model (large diff, new DB migration, low test coverage, or other high-signal indicators), and no reference to a LaunchDarkly SDK call in the diff. The detection looks for the canonical SDK patterns: ldClient.variation(...) in server-side Node code, ld.BoolVariation(...) in Go, useFlags() in React, and the equivalent variation calls in other SDK variants. If none are present in a high-risk PR, the score reflects the absence of a rollback mechanism.

This signal is not about mandating flags on every PR. A minor copy fix or a CSS adjustment does not need a feature flag. The signal is specifically targeted at the PRs where the risk model has already identified meaningful deployment risk from other signals. Those are the changes where a kill switch has the highest expected value.

Risk 2: Stale flags

The opposite failure mode unfolds more slowly. A feature flag is created for a controlled rollout, the rollout completes successfully, the flag reaches 100% of traffic — and then nothing happens. The flag is never removed. Weeks become months.

Stale flags accumulate technical debt in several ways. The flag condition branches remain in the codebase, adding code paths that will never be evaluated. Test suites that simulate both flag states double the test surface area for a code path that is effectively dead. New engineers reading the code cannot tell whether the flag is live or legacy without checking LaunchDarkly. And in the worst case, a stale flag is accidentally re-evaluated in a new context — with unexpected results.

Koalr defines a stale flag as one that has been at 100% rollout for more than 30 days without removal. This definition is conservative enough to account for teams that run extended post-rollout observation periods before cleanup, while still surfacing flags that have genuinely been forgotten. The stale flag count and list are displayed in Koalr's coverage page, with flag key, owner team, creation date, and last evaluation date.

Risk 3: Flag targeting rules explosion

The third failure mode is structural: a single flag accumulates more than 20 targeting rules over time. Targeting rule complexity is one of the subtler risks in flag management, but it is a real one.

Flags with many rules become fragile. A rule that targets users by email pattern interacts with a rule targeting by plan tier, which interacts with a rule targeting by geographic region. Understanding who actually sees which variation requires mentally simulating all rules in order. Adding a new rule to a complex flag risks introducing a contradiction or unintended overlap. Debugging unexpected flag evaluations in production becomes significantly harder.

LaunchDarkly's API surfaces targeting rule count per flag through the GET /api/v2/flags/{projectKey} response, which returns the rules array inside each environment's targeting configuration. Koalr tracks the trend of targeting rule count across your flag inventory and alerts when any flag exceeds 20 rules, or when the fleet-wide average rule count is trending up over a 30-day window.
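As a sketch of what such a check might look like (the 20-rule threshold is from the text; the field layout follows LaunchDarkly's flag-list response shape, and the function name and inventory data are illustrative):

```python
RULE_LIMIT = 20  # complexity threshold from the text

def complex_flags(flags, env="production", limit=RULE_LIMIT):
    """Return (key, rule_count) for flags whose targeting rule count
    in the given environment exceeds the limit."""
    out = []
    for flag in flags:
        rules = flag.get("environments", {}).get(env, {}).get("rules", [])
        if len(rules) > limit:
            out.append((flag["key"], len(rules)))
    return out

# Hypothetical inventory: one simple flag, one over the threshold.
inventory = [
    {"key": "simple-flag",
     "environments": {"production": {"rules": []}}},
    {"key": "tangled-flag",
     "environments": {"production": {"rules": [{} for _ in range(23)]}}},
]
print(complex_flags(inventory))  # [('tangled-flag', 23)]
```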

The LaunchDarkly data model

Understanding what Koalr collects from LaunchDarkly requires a brief orientation to LaunchDarkly's data model. The key entities are projects, feature flags, environments, and flag evaluations.

Projects and environments

A LaunchDarkly project is the top-level namespace, typically corresponding to a product or application. Within a project, environments provide separate targeting configurations — production, staging, and development typically each have independent flag states and targeting rules. A flag can be on in production and off in staging, or can use completely different targeting rules per environment.

This separation matters for Koalr's integration: the stale flag and kill-switch signals are environment-specific. You configure Koalr to monitor the production environment, and all flag state analysis is scoped to that environment's configuration and evaluation history.

Feature flags API

The primary endpoint Koalr uses for flag inventory is:

GET /api/v2/flags/{projectKey}

Response shape (abbreviated):
{
  "items": [
    {
      "key": "payments-v2-checkout",
      "name": "Payments V2 Checkout Flow",
      "kind": "boolean",
      "tags": ["payments", "q1-2026"],
      "variations": [
        { "value": true },
        { "value": false }
      ],
      "environments": {
        "production": {
          "on": true,
          "fallthrough": { "variation": 0 },
          "rules": [...],
          "_summary": {
            "variations": {
              "0": { "rules": 0, "rollout": 100000 },
              "1": { "rules": 2, "rollout": 0 }
            }
          }
        }
      },
      "creationDate": 1738800000000,
      "_maintainer": { "email": "alice@example.com" }
    }
  ]
}

The _summary.variations object is the key field for stale flag detection. When rollout reaches 100000 (representing 100%) on the primary variation, and creationDate is more than 30 days old with no recent targeting rule changes, the flag enters the stale candidate pool.
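A minimal sketch of that staleness test, working over the response shape above (the "no recent targeting rule changes" condition is omitted for brevity, and the helper name is illustrative, not Koalr's code):

```python
import time

STALE_DAYS = 30
DAY_MS = 86_400_000

def is_stale(flag, env="production", now_ms=None):
    """Stale candidate: some variation at 100% rollout (100000 in
    LaunchDarkly's weight units) and the flag older than 30 days."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    variations = flag["environments"][env]["_summary"]["variations"]
    fully_rolled = any(v.get("rollout") == 100_000 for v in variations.values())
    return fully_rolled and (now_ms - flag["creationDate"]) > STALE_DAYS * DAY_MS

# Flag shaped like the abbreviated response above.
flag = {
    "key": "payments-v2-checkout",
    "creationDate": 1738800000000,
    "environments": {"production": {"_summary": {"variations": {
        "0": {"rules": 0, "rollout": 100_000},
        "1": {"rules": 2, "rollout": 0},
    }}}},
}
print(is_stale(flag, now_ms=1738800000000 + 40 * DAY_MS))  # True
```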

Authentication to the REST API uses an access token in the Authorization header: Authorization: api-<access-token>. This is distinct from the SDK key used by your application at runtime (Authorization: <sdk-key>), which only permits flag evaluation, not API reads. The access token for Koalr needs Reader role only — no write access required.
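A sketch of an authenticated flag-inventory request (the endpoint and header format are from the text; the project key and token are placeholders, and the network call itself is left commented out since it needs real credentials):

```python
import urllib.request

PROJECT_KEY = "my-app"          # placeholder project key
ACCESS_TOKEN = "api-xxxxxxxx"   # placeholder Reader-role token

# The access token goes directly in the Authorization header.
req = urllib.request.Request(
    f"https://app.launchdarkly.com/api/v2/flags/{PROJECT_KEY}",
    headers={"Authorization": ACCESS_TOKEN},
)
# resp = urllib.request.urlopen(req)  # commented out: needs a real token
print(req.get_header("Authorization"))
```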

Flag evaluations

For kill-switch activation tracking, Koalr uses the evaluations endpoint:

GET /api/v2/usage/evaluations/{projectKey}/{environmentKey}/{featureFlagKey}

Query parameters:
  from=<unix_ms>   start of time window
  to=<unix_ms>     end of time window

Response:
{
  "series": [
    { "time": 1738800000000, "value": 14823 },
    { "time": 1738886400000, "value": 0 },
    { "time": 1738972800000, "value": 12 }
    ...
  ]
}

A kill-switch activation is inferred from a discontinuity in evaluation counts: a flag with sustained daily evaluations (indicating live traffic) that drops to zero or near zero corresponds to a flag being turned off in production. Koalr correlates these drops against incident timestamps from your PagerDuty or incident.io integration to determine whether the deactivation was a reactive rollback or an intentional controlled shutdown.
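One way to sketch that discontinuity check over the series shape above (the near-zero floor and the 10x sustained-traffic multiplier are assumed parameters for illustration, not Koalr's actual heuristic):

```python
def kill_switch_candidates(series, floor=50):
    """Return timestamps where evaluations drop from sustained traffic
    to (near) zero between consecutive points in the series."""
    drops = []
    for prev, cur in zip(series, series[1:]):
        # Sustained traffic followed by near-zero is the discontinuity.
        if prev["value"] > floor * 10 and cur["value"] <= floor:
            drops.append(cur["time"])
    return drops

# Series shaped like the evaluations response above.
series = [
    {"time": 1738800000000, "value": 14823},
    {"time": 1738886400000, "value": 0},
    {"time": 1738972800000, "value": 12},
]
print(kill_switch_candidates(series))  # [1738886400000]
```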

What Koalr tracks from LaunchDarkly

Koalr syncs your LaunchDarkly flag inventory nightly and surfaces four primary metrics in the engineering dashboard.

New flags per week

Flag creation velocity is a leading indicator of both feature delivery pace and future cleanup burden. A team shipping three to five new flags per week is in a healthy growth pattern — active feature development with controlled rollouts. A team creating 25 flags in a week either has a major release in progress or has abandoned any discipline around when a flag is appropriate. Koalr tracks this as a weekly trend chart alongside your deployment frequency metric.

Stale flag count

The stale flag count is the number of flags in your production environment that are at 100% rollout and have been for more than 30 days. Koalr displays the raw count as a KPI tile and provides the full list in the coverage page — flag key, owner (from the _maintainer field or the team tag on the flag), creation date, and last evaluation date. The list is sorted by age descending: the flags that have been stale longest appear first, which is usually where the cleanup value is highest.

Kill-switch activations

Kill-switch activations over the trailing 90 days represent your real-world rollback rate. This is a metric most engineering teams have never measured, which means they have no baseline understanding of how often their feature flag infrastructure is actually protecting them from incidents. Koalr surfaces the count, the trend, and the correlation with incident data (more on this in the DORA signal section below).

Flag coverage on risky PRs

The flag coverage rate is the percentage of PRs that scored 60+ on Koalr's deploy risk model and contained at least one LaunchDarkly SDK call in their diff. A coverage rate of 100% means every high-risk change shipped with a kill switch available. A rate of 40% means most high-risk changes shipped without one.

This metric is most useful as a trend rather than a point-in-time number. A team that starts at 30% coverage and reaches 75% over a quarter has made a real improvement in rollback readiness — one that should eventually be visible in their Change Failure Rate and MTTR metrics.
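Computing the rate itself is straightforward; the sketch below is illustrative, with hypothetical field names standing in for Koalr's internal PR records:

```python
def flag_coverage_rate(prs, risk_threshold=60):
    """Share of high-risk PRs (score >= threshold) whose diff
    contained at least one flag SDK reference, as a percentage."""
    risky = [p for p in prs if p["risk_score"] >= risk_threshold]
    if not risky:
        return None  # no high-risk PRs in the window
    covered = sum(1 for p in risky if p["has_flag_reference"])
    return round(100 * covered / len(risky), 1)

prs = [
    {"risk_score": 78, "has_flag_reference": True},
    {"risk_score": 65, "has_flag_reference": False},
    {"risk_score": 30, "has_flag_reference": False},  # below threshold, ignored
]
print(flag_coverage_rate(prs))  # 50.0
```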

The deploy risk signal: no rollback mechanism detected

When Koalr's deploy risk model scores a PR at 60 or above — indicating meaningful deployment risk from change size, author expertise, coverage delta, or other signals — it checks the PR diff for LaunchDarkly SDK references. The check is pattern-based and scans the full text of added lines in the diff:

// Detected patterns (any of these in added lines satisfy the check)
ldClient.variation(
ldClient.boolVariation(
ldClient.stringVariation(
useFlags()
useLDClient()
useFeatureFlag(
ld.BoolVariation(
ld.StringVariation(

If none of these patterns appear in the diff of a 60+ risk PR, Koalr adds a "No rollback mechanism detected" advisory to the PR's risk breakdown. This advisory carries a +20 adjustment to the risk score and appears in the PR detail view alongside the other contributing signals.

The advisory is not a hard block. Koalr deliberately avoids binary pass/fail gates that engineers work around rather than engage with. The goal is to surface the risk clearly enough that the pre-merge conversation includes it: "this PR scores 78, partially because we have no flag on the payment flow change — do we want to add one, or are we comfortable shipping this without a rollback path?" That is a decision the team should make consciously, not skip by default.
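The scan-and-adjust logic can be sketched as follows (the pattern list and the +20 / 60-point numbers come from the text; the function itself is an illustrative reconstruction, not Koalr's implementation):

```python
# Detection patterns from the list above, applied as substring checks
# against the added lines of a PR diff.
SDK_PATTERNS = [
    "ldClient.variation(", "ldClient.boolVariation(",
    "ldClient.stringVariation(", "useFlags()", "useLDClient()",
    "useFeatureFlag(", "ld.BoolVariation(", "ld.StringVariation(",
]

def adjusted_score(base_score, added_lines):
    """Apply the +20 'no rollback mechanism' adjustment to PRs
    scoring 60+ with no SDK reference in their added lines."""
    has_flag = any(p in line for line in added_lines for p in SDK_PATTERNS)
    if base_score >= 60 and not has_flag:
        return base_score + 20, "No rollback mechanism detected"
    return base_score, None

score, advisory = adjusted_score(72, ["const total = cart.sum();"])
print(score, advisory)  # 92 No rollback mechanism detected
```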

Setting up LaunchDarkly in Koalr

The integration requires an access token from LaunchDarkly and your project key. Setup takes under two minutes.

Step 1: Generate a LaunchDarkly access token

In your LaunchDarkly account, navigate to Account Settings → Authorization → Access Tokens. Create a new token with the built-in Reader role. Reader access is sufficient for all of Koalr's LaunchDarkly data collection — flag inventory, targeting rules, and evaluation usage data. No write access is needed or requested.

Copy the token immediately after creation. LaunchDarkly will not display it again.

Step 2: Add the integration in Koalr

In Koalr, go to Settings → Integrations → LaunchDarkly. Paste your access token and enter your LaunchDarkly project key. The project key is the short identifier visible in your LaunchDarkly project URL — typically a lowercase slug like my-app or platform.

Step 3: Select the environment to monitor

Choose the environment whose flag state and evaluation history you want Koalr to analyze. For most teams this is production. Koalr will use this environment's targeting configuration for stale flag detection and kill-switch activation tracking.

After saving, Koalr performs an initial flag inventory sync and populates the stale flag list and creation velocity chart. Kill-switch activation history is backfilled for the trailing 90 days using the LaunchDarkly evaluations API.

Subsequent syncs run nightly. Flag coverage on new PRs is checked in real time as PRs are opened or updated, using your connected GitHub integration.

Kill-switch activations as a DORA signal

Kill-switch activations — moments when a feature flag was turned off in production — are a largely unmeasured signal in most engineering organizations. Koalr treats them as a direct input to Change Failure Rate analysis.

The traditional Change Failure Rate definition counts incidents or hotfixes as the failure signal. Flag-based rollbacks are a distinct third category: a team that prevents an incident from escalating by disabling a flag within minutes of detecting anomalous behavior has technically executed a successful deployment from a user-impact perspective, but the need to activate the kill switch is evidence that the change introduced a problem.

Kill-switch activations vs. incident correlation (trailing 90 days)
| Flag deactivated | Incident opened within 2h | Classification | DORA impact |
| --- | --- | --- | --- |
| payments-v2-checkout | Yes — P1 opened 14 min after | Reactive rollback | Counts as change failure |
| dark-mode-beta | No | Controlled shutdown | Neutral — planned |
| search-reindex-v3 | Yes — P2 opened 48 min after | Reactive rollback | Counts as change failure |
| new-onboarding-flow | No | A/B test stopped | Neutral — intentional |
| cdn-edge-caching | Yes — P1 opened 7 min after | Reactive rollback | Counts as change failure |

Koalr applies a two-hour attribution window for incident correlation: if a kill-switch activation is followed by an incident ticket within two hours, Koalr classifies the activation as a reactive rollback and counts it against Change Failure Rate. Activations without a correlated incident (planned flag shutdowns, A/B test conclusions, deliberate phased rollbacks) are classified as neutral and do not affect the metric.
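The classification rule can be sketched as follows (the two-hour window is from the text; the function and timestamp arguments are illustrative):

```python
TWO_HOURS_MS = 2 * 60 * 60 * 1000

def classify_activation(deactivated_at_ms, incident_times_ms):
    """Reactive rollback if any incident opened within two hours
    AFTER the flag was turned off; otherwise neutral."""
    for t in incident_times_ms:
        if 0 <= t - deactivated_at_ms <= TWO_HOURS_MS:
            return "reactive_rollback"  # counts against Change Failure Rate
    return "neutral"

# e.g. a P1 opened 14 minutes after deactivation, as in the table above
print(classify_activation(1_000_000, [1_000_000 + 14 * 60 * 1000]))  # reactive_rollback
```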

The practical value of this signal is that it surfaces a pattern most DORA measurements miss entirely: teams that are using feature flags to prevent incidents from becoming visible to users are often underreporting their true change failure rate. The flags are working — and working well — but the deployment quality signal is being lost. Koalr makes it visible.

Flag hygiene with AI Chat

Koalr's AI Chat interface has direct access to your LaunchDarkly flag inventory as part of its context. You can ask natural language questions about your flag state and get structured answers without navigating the LaunchDarkly dashboard:

AI Chat — Flag hygiene queries
You: Which flags have been at 100% for over 30 days?
Koalr: I found 6 stale flags in your production environment that have been at 100% rollout for more than 30 days:

1. payments-v2-checkout — owned by Payments team, created 94 days ago, last evaluated yesterday
2. dark-mode-beta — owned by Frontend team, created 67 days ago, last evaluated 3 days ago
3. search-reindex-v3 — owner unknown (no team tag), created 52 days ago, last evaluated 12 days ago

...and 3 more. The highest-priority cleanup candidate is payments-v2-checkout — it has been at 100% for over 90 days with daily evaluations, suggesting the flag code is still in place but has been fully rolled out for months.
You: Which flags had targeting rule changes in the last 7 days?
Koalr: 3 flags had targeting rule modifications in the last 7 days: new-billing-engine (2 rules added, now at 18 total — approaching the complexity threshold), feature-ai-assistant (1 rule added), and mobile-checkout-redesign (rollout increased from 10% to 50%).

The AI Chat can also correlate flag data with your PR history. "Did any of our stale flags ship in a PR that we could retrospectively identify as high-risk?" is a question that cuts across the GitHub and LaunchDarkly datasets simultaneously — something that would require manual cross-referencing without the integrated data model.

Flag hygiene best practices

Connecting LaunchDarkly to Koalr gives you visibility. Acting on that visibility requires operational practices that turn the data into team behavior. Five practices have proven most effective for teams managing flag sprawl at scale:

  • Establish naming conventions at flag creation. A flag named feature-payments-v2-checkout-q1-2026 carries more information than new-checkout. Include the team prefix, the feature name, and the quarter. The quarter component makes stale identification obvious — a flag tagged Q1 2026 that still exists in Q3 has clearly outlived its intended lifecycle.
  • Require owner team tags on every flag. LaunchDarkly supports custom tags on flags. Standardize on a team: tag prefix (e.g., team:payments) and enforce it in your flag creation checklist. Ownership without a tag is accountability without a home — nobody removes the flag because nobody is accountable for it.
  • Set a TTL policy per flag type. Not all flags have the same expected lifespan. Release flags (controlling a new feature rollout) should have a 30-day TTL after reaching 100%. Kill-switch flags (permanent off switches for risk mitigation) have indefinite TTLs. Experiment flags (A/B tests) have TTLs tied to statistical significance windows. Document these in your engineering handbook and reference them in Koalr's stale threshold configuration.
  • Hold quarterly flag graduation ceremonies. A "flag graduation" is the planned removal of a flag from both LaunchDarkly and the codebase after a successful rollout. Scheduling this as a recurring team ritual — 30 minutes every quarter to review and action the stale flag list — prevents the cleanup backlog from growing indefinitely. Koalr's stale flag list exports to a CSV that can seed this review session.
  • Alert on zero evaluations for live flags. A flag that is marked as on in production but receiving zero evaluations is either misconfigured or orphaned — the flag is serving no traffic, which means either the code path it guards is unreachable, or the flag check was removed from code without removing the flag from LaunchDarkly. Both situations indicate a configuration problem. Koalr fires an alert when a production flag goes from sustained evaluations to zero for more than 48 hours without a corresponding kill-switch event.

Bringing it together

Feature flags are not a new idea, and LaunchDarkly is not a new product. The gap that Koalr addresses is not the tooling — it is the measurement layer that connects flag usage to deployment risk and delivery quality outcomes.

Without measurement, feature flags are a capability that engineering teams possess but use inconsistently. Some PRs get flags; most do not. Some flags get cleaned up; most accumulate. Kill-switch activations happen but are invisible to the engineering metrics that leadership reviews. The safety net exists but its utilization is unmeasured and its effectiveness unquantified.

With Koalr's LaunchDarkly integration, the picture changes. You can see which high-risk PRs shipped without a rollback path. You can track the stale flag count as a hygiene metric with a clear owner and a clear trend. You can understand how often kill-switches are actually catching deployment failures, and whether that rate is improving or worsening. And you can answer the question that matters most to engineering leadership: is our deployment safety posture getting better?

That is what the integration is designed to enable — not a dashboard for its own sake, but the data that makes better deployment decisions possible.

Connect LaunchDarkly to Koalr in under 60 seconds

A Reader-role access token and your project key is all it takes. Koalr immediately surfaces your stale flag count, flag coverage on high-risk PRs, and kill-switch activation history — alongside your GitHub, DORA, and deploy risk data in a single dashboard.