Does Koalr replace PagerDuty or OpsGenie?

Koalr does not replace PagerDuty or OpsGenie — those tools manage your on-call rotations and escalations. Koalr integrates with both via webhooks and uses your incident data for MTTR trending, deploy-risk correlation, and change failure rate analysis.

Is Koalr a DORA metrics tool?

DORA metrics (deployment frequency, lead time, change failure rate, MTTR) are included in every Koalr plan, but they are the foundation, not the ceiling. Koalr is the only platform that uses DORA signals alongside 28 additional signals to predict whether a specific pull request will cause a production incident — before you merge.

How does Koalr predict deployment failures?

Koalr scores every pull request 0–100 using 36 signals: coverage delta, CODEOWNERS compliance, file churn, author experience, change entropy, DDL migrations, SLO burn rate, PR size, review latency, and more. The model is trained on the correlation between these signals and production incidents.

How is Koalr different from Swarmia or LinearB?

Swarmia and LinearB are excellent DORA dashboards. Neither has pre-deploy risk prediction, CODEOWNERS drift detection, or LLM chat on live engineering data. Koalr ships four capabilities together: deploy risk scoring, CODEOWNERS governance, coverage-to-risk correlation, and Koalr AI.

Can Koalr automatically block risky deploys?

Yes, on the Business plan. Koalr writes a GitHub Check Run named "Koalr Deploy Risk" that fails when a PR score exceeds your configured threshold. Combined with branch protection rules requiring that check to pass, this blocks any engineer from merging until the risk factors are addressed.

Is my source code ever sent to Koalr or an AI model?

No. Koalr never reads or stores your source code. We pull metadata only: PR titles, file paths, line counts, check run statuses, commit SHAs, and coverage percentages.

What Is Blast Radius in Software Deployments (And How to Model It)

What blast radius means in software

The term comes from military engineering, where it describes the area of effect of an explosion. In software, it describes the set of services, users, or systems that are affected when a specific component fails.

In a monolith, blast radius is simple: one deployment, one failure surface. In a microservices architecture, it is a graph problem. Every service has callers that depend on it and dependencies that it calls in turn. A failure propagates along those edges, from a failing service to its callers, and the question is how far and how severely.

Blast radius modeling answers two questions before you deploy:

→ If this service fails, which downstream services are at risk?
→ How severe is the risk at each hop in the dependency chain?

The service dependency graph

To model blast radius, you first need a service dependency graph: a directed graph where nodes are services and edges represent call relationships. If Service A calls Service B, there is a directed edge from A to B, and if B fails, A is affected.
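A minimal sketch of such a graph in Python, assuming the call relationships are already available as (caller, callee) pairs. The service names and the build_graph helper are hypothetical and only illustrate the edge direction described above.

```python
from collections import defaultdict

# Hypothetical call relationships, each written as (caller, callee).
CALLS = [
    ("checkout-service", "payments-service"),
    ("recommendation-service", "payments-service"),
    ("web-frontend", "checkout-service"),
]

def build_graph(calls):
    """Return (callees_of, callers_of) adjacency maps for a directed call graph."""
    callees_of = defaultdict(set)
    callers_of = defaultdict(set)
    for caller, callee in calls:
        callees_of[caller].add(callee)   # edge caller -> callee
        callers_of[callee].add(caller)   # reverse index for failure impact
    return callees_of, callers_of

callees_of, callers_of = build_graph(CALLS)

# If payments-service fails, its direct callers are the first services affected.
affected = sorted(callers_of["payments-service"])
print(affected)  # ['checkout-service', 'recommendation-service']
```

Keeping the reverse index (callers_of) alongside the forward edges makes the "who is affected if this fails" lookup a single dictionary access.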

In practice, this graph comes from three sources:

1. Declared dependencies: service manifests, API gateway configurations, or service mesh topology
2. Observed traffic: actual call patterns from your observability stack (Datadog APM, Jaeger, OpenTelemetry traces)
3. Incident history: which services actually went down together in past incidents, capturing implicit dependencies that never make it into manifests

The incident history source is underused. Two services that consistently co-fail even without a declared dependency often share a database, a Redis cluster, or a third-party API. Historical co-failure is a more honest dependency map than any documentation.
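One way to surface these implicit dependencies is to count how often pairs of services appear in the same incident and keep the pairs that have no declared edge. The sketch below assumes each incident is available as a set of affected service names. The function name, the min_count cutoff, and the sample data are illustrative assumptions, not a description of how Koalr does it.

```python
from collections import Counter
from itertools import combinations

def cofailure_candidates(incidents, declared_edges, min_count=2):
    """Find service pairs that failed together repeatedly without a declared edge.

    incidents: list of sets of service names that were down in the same incident.
    declared_edges: set of frozenset({a, b}) pairs with a known dependency.
    Returns a sorted list of ((service_a, service_b), co_failure_count).
    """
    counts = Counter()
    for services in incidents:
        for a, b in combinations(sorted(services), 2):
            counts[frozenset((a, b))] += 1
    return sorted(
        (tuple(sorted(pair)), n)
        for pair, n in counts.items()
        if n >= min_count and pair not in declared_edges
    )

# Hypothetical incident history.
incidents = [
    {"orders", "inventory"},
    {"orders", "inventory", "search"},
    {"search", "billing"},
]
declared = {frozenset(("orders", "search"))}

candidates = cofailure_candidates(incidents, declared)
print(candidates)  # [(('inventory', 'orders'), 2)]
```

A pair returned here is only a candidate: it suggests a shared database, cache, or third-party API worth investigating, not a confirmed dependency.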

How risk propagates hop-by-hop

Once you have the graph, you can propagate risk through it. The model Koalr uses works like this:

The propagation formula

propagated_score = source_score × decay_factor^hop_count × coupling_weight

source_score — the deploy risk score of the service being deployed (0–100)
decay_factor — typically 0.6–0.75, meaning risk attenuates with each hop
hop_count — number of edges from the failing service to the affected service
coupling_weight — 0.0–1.0 representing how tightly coupled the services are (call frequency, shared data stores, SLA dependency)

So if payments-service has a risk score of 80, a Hop 1 neighbor with tight coupling (0.9) and a decay factor of 0.7 would receive a propagated score of 80 × 0.7¹ × 0.9 = 50.4. A Hop 2 neighbor with loose coupling (0.3) would receive 80 × 0.7² × 0.3 = 11.8.
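The same arithmetic can be checked with a few lines of Python. This is a direct transcription of the formula above. The function name is an assumption, and the inputs are the ones from the payments-service example.

```python
def propagated_score(source_score, decay_factor, hop_count, coupling_weight):
    """propagated_score = source_score * decay_factor^hop_count * coupling_weight"""
    return source_score * decay_factor ** hop_count * coupling_weight

# Hop 1 neighbor, tight coupling (0.9), decay factor 0.7.
hop1 = propagated_score(80, 0.7, 1, 0.9)
# Hop 2 neighbor, loose coupling (0.3), same decay factor.
hop2 = propagated_score(80, 0.7, 2, 0.3)

print(round(hop1, 1), round(hop2, 1))  # 50.4 11.8
```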

The exponential decay matters because in real systems, failures do attenuate. Circuit breakers, retries, graceful degradation, and feature flags all reduce propagation. Linear propagation models overstate downstream risk and produce alert fatigue.

Why Hop 1 neighbors matter most

In practice, the engineering insight from blast radius modeling comes almost entirely from Hop 1 — the direct callers of the service being deployed. These are the services with no buffer: if your service returns 500s, they see 500s. If your service adds 200ms of latency, they add 200ms to their p99.

What makes this useful pre-merge is that Hop 1 callers are often owned by different teams. The payments-service team may not know that recommendation-service has added a synchronous call to their checkout path in the last sprint. Blast radius visualization surfaces that dependency, along with the on-call contact for the downstream team, before the deploy window opens.

At Hop 2 and beyond, the signal-to-noise ratio drops. By the time risk has propagated two or three hops through circuit breakers and independent retry budgets, the propagated score is usually low enough to be informational rather than actionable.

Blast radius as a deploy gate

The most direct use of blast radius modeling is as a pre-merge gate. If the source service has a risk score above a threshold and the Hop 1 propagated score for any downstream service also exceeds a threshold, the deploy is flagged for human review.

This is more useful than gating on source risk alone because it surfaces the systemic exposure. A moderate-risk change to a service with 12 tight downstream callers is more dangerous than a high-risk change to an isolated service with no callers. The source risk score tells you one thing; the blast radius tells you the system-level consequence.
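A minimal sketch of this two-condition gate, assuming Hop 1 propagated scores have already been computed. The function name and both threshold values (60 and 40) are hypothetical. In practice they would be configured per team.

```python
def needs_human_review(source_score, hop1_scores,
                       source_threshold=60, propagated_threshold=40):
    """Flag a deploy when the source is risky AND at least one Hop 1
    neighbor receives a propagated score above its own threshold."""
    if source_score <= source_threshold:
        return False
    return any(score > propagated_threshold for score in hop1_scores.values())

# High-risk change to an isolated service: nothing to propagate to.
isolated = needs_human_review(90, {})

# Lower-risk change to a service with a tightly coupled caller
# (65 * 0.7 * 0.9 is roughly 40.95, using the propagation formula).
coupled = needs_human_review(65, {"checkout-service": 65 * 0.7 * 0.9})

print(isolated, coupled)  # False True
```

Under these assumed thresholds the isolated service is not escalated for blast radius reasons, while the lower-scoring but tightly coupled change is. A gate on source risk alone would still apply separately.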

Concretely, the workflow looks like this:

1. PR opens. Koalr calculates the deploy risk score for the source service (0–100, 36-signal model).
2. Blast radius is computed by walking the dependency graph from the source service, calculating propagated scores at each hop.
3. If any downstream service exceeds the propagated risk threshold, the PR is flagged and the downstream team owner is notified.
4. The PR author and downstream owners see the full hop-by-hop breakdown before approving the merge.
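The graph walk in step 2 can be sketched as a breadth-first traversal. This is an illustration under stated assumptions, not Koalr's implementation: the affected_by map, the max_hops cutoff, the threshold of 40, and the choice to use the coupling weight of the final edge into each service are all hypothetical.

```python
from collections import deque

def blast_radius(source, source_score, affected_by, decay_factor=0.7, max_hops=3):
    """Walk outward from `source` and return {service: (hop_count, score)}.

    affected_by[s] maps each service directly affected by s (a direct caller)
    to the coupling weight of that edge. Hop count is the shortest distance
    from the source; ties at the same hop keep the highest score.
    """
    results = {}
    hops = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        hop = hops[current] + 1
        if hop > max_hops:
            continue
        for neighbor, coupling in affected_by.get(current, {}).items():
            score = source_score * decay_factor ** hop * coupling
            if neighbor not in hops:
                hops[neighbor] = hop
                results[neighbor] = (hop, score)
                queue.append(neighbor)
            elif hops[neighbor] == hop and score > results[neighbor][1]:
                results[neighbor] = (hop, score)
    return results

# Hypothetical graph: who is affected if the key service fails, and how tightly.
affected_by = {
    "payments-service": {"checkout-service": 0.9, "recommendation-service": 0.5},
    "checkout-service": {"web-frontend": 0.3},
}

radius = blast_radius("payments-service", 80, affected_by)
flagged = sorted(name for name, (_, score) in radius.items() if score > 40)
print(flagged)  # ['checkout-service']
```

Breadth-first order guarantees each service is first reached at its minimum hop count, which is what the decay exponent should use.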

The dependency graph maintenance problem

The hardest part of blast radius modeling is not the algorithm — it is keeping the dependency graph current. In fast-moving teams, service dependencies change weekly. New internal clients appear. Deprecated call paths linger in the graph. The coupling weight of a relationship changes when teams switch from synchronous HTTP calls to async event queues.

Three practices reduce graph staleness:

→ Continuous trace-based discovery. Ingest OpenTelemetry spans and update edge weights from actual traffic patterns weekly. Declared dependencies become a fallback, not the primary source.
→ Co-failure mining. When an incident resolves, analyze which services went down together and add or weight edges accordingly.
→ Team-owned graph segments. Service owners attest to their callers and callees as part of the onboarding process, and the graph UI makes ownership visible so staleness is noticed.
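As a rough sketch of the trace-based practice, edge weights can be refreshed from observed call counts. The example assumes spans have already been reduced to (caller, callee) pairs, one per observed call, and it approximates coupling by call frequency relative to the busiest edge. Both the input shape and that normalization are assumptions made for illustration. A real coupling weight would also account for shared data stores and SLA dependency.

```python
from collections import Counter

def edge_weights_from_spans(spans):
    """Derive a 0.0-1.0 weight per (caller, callee) edge from call counts.

    spans: iterable of (caller, callee) pairs, one per observed call.
    The busiest edge gets weight 1.0 and the rest are scaled relative to it.
    """
    counts = Counter(spans)
    if not counts:
        return {}
    busiest = max(counts.values())
    return {edge: n / busiest for edge, n in counts.items()}

# Hypothetical week of traffic: 4 calls on one edge, 1 on another.
spans = [("checkout-service", "payments-service")] * 4 + [
    ("recommendation-service", "payments-service")
]

weights = edge_weights_from_spans(spans)
print(weights[("recommendation-service", "payments-service")])  # 0.25
```

Edges that disappear from the weekly span data simply drop out of the result, which is one way deprecated call paths can age out of the graph.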

Blast Radius in Koalr

Koalr's blast radius tool lets you input any service name and risk score and see the full hop-by-hop propagation tree with propagated scores and risk badges, running against your actual service dependency graph.