Why PR merges are the wrong DORA signal in GitOps

The standard approach to measuring deployment frequency — counting merged PRs to the main branch of your application repository — made sense before GitOps. In a traditional CI/CD pipeline, a merge to main triggers a deploy pipeline that pushes directly to production. Merge and deploy are coupled events separated by minutes at most.

GitOps decouples them. In an ArgoCD-based GitOps workflow, the process looks like this: a developer merges code to the application repository, a CI pipeline builds and pushes a container image, the image tag is updated in the GitOps repository (either by the CI pipeline or a separate image update automation), and then ArgoCD syncs the new state to the cluster. Each of those steps introduces latency and a potential failure point.

If you count PR merges to the application repository as deployments, you are counting an event that may not have reached production yet, or may never reach it. A sync failure, a policy violation rejected by a Kubernetes admission webhook during the sync, or a resource health check that immediately degrades will all produce a PR merge with no corresponding working deployment in production. Your deployment frequency count goes up; nothing actually changes in prod.

The lead time measurement problem is equally serious. Lead time for changes is defined as the time from code commit to code running in production. If you measure lead time as commit-to-PR-merge, you are measuring the time to code review completion — a useful metric, but not DORA lead time. The actual DORA lead time ends when ArgoCD finishes syncing the change to the production cluster.

ArgoCD sync events as the correct DORA signal

Each of the four DORA metrics maps cleanly to specific ArgoCD events and status transitions. Using ArgoCD as the data source rather than GitHub Events produces numbers that reflect what is actually happening in production.

Deployment frequency

Count Synced events on production ArgoCD applications, not PR merges to the application repository. Specifically: events with reason: Sync and type: Normal from the ArgoCD application events API, scoped to applications with a production environment label.

This correctly handles cases where a single PR merge produces zero production syncs (the sync fails, or the change is reverted before it is ever synced) or multiple production syncs (for example, the same change is rolled out to several production applications, each with its own sync event).
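As a rough illustration of the counting rule above, the sketch below tallies successful syncs per day from a list of event dictionaries. The event shape (type, reason, eventTime) follows the fields described in this article; the function name and the sample data are hypothetical, and the timestamp format is assumed to be ISO 8601 in UTC.

```python
from collections import Counter
from datetime import datetime


def deployments_per_day(events):
    """Count events with reason 'Sync' and type 'Normal', grouped by UTC date."""
    counts = Counter()
    for event in events:
        if event.get("reason") == "Sync" and event.get("type") == "Normal":
            # Assumption: eventTime looks like 2026-03-15T14:22:01Z.
            ts = datetime.strptime(event["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
            counts[ts.date().isoformat()] += 1
    return dict(counts)


# Hypothetical sample: two successful syncs and one failed sync.
sample_events = [
    {"type": "Normal", "reason": "Sync", "eventTime": "2026-03-15T14:22:01Z"},
    {"type": "Warning", "reason": "Sync", "eventTime": "2026-03-15T15:02:10Z"},
    {"type": "Normal", "reason": "Sync", "eventTime": "2026-03-16T09:00:00Z"},
]
daily = deployments_per_day(sample_events)
```

The failed sync on 15 March is skipped, so each day in the sample contributes one deployment.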

Lead time for changes

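Lead time runs from the first commit included in a change to the ArgoCD sync completion timestamp. ArgoCD records the Git commit SHA that was synced in the application status (the app.status.sync.revision field). When manifests live in a separate GitOps repository, that SHA identifies a commit in the GitOps repository, so it first has to be mapped to the application commit that produced the deployed image, for example through the image tag. From there the history traces back to the first commit in the change set, giving you the full commit-to-production duration, which is the correct DORA definition.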

Using commit-to-PR-merge instead underestimates lead time by excluding the image build, GitOps repo update, and ArgoCD sync phases — which commonly add 10 to 45 minutes and can add hours when sync policies are not set to auto-sync.
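The arithmetic behind this definition is simple once the two timestamps are known. The sketch below assumes you have already resolved the commit timestamps for the change set and the sync completion time; the function names and sample values are hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(ts):
    """Parse an ISO 8601 UTC timestamp such as 2026-03-15T14:22:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def lead_time_minutes(commit_times, sync_completed_at):
    """Minutes from the earliest commit in the change set to sync completion."""
    first_commit = min(parse_utc(t) for t in commit_times)
    delta = parse_utc(sync_completed_at) - first_commit
    return delta.total_seconds() / 60


# Hypothetical change set: two commits, synced to production at 14:22.
commits = ["2026-03-15T13:05:00Z", "2026-03-15T13:40:00Z"]
minutes = lead_time_minutes(commits, "2026-03-15T14:22:00Z")
```

Measuring from the earliest commit rather than the merge commit is what keeps the build, GitOps repo update, and sync phases inside the measured interval.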

Change failure rate

A change failure in the ArgoCD model is a production sync that either fails outright or leaves the application in a degraded health state. Two patterns contribute to CFR:

The first pattern is a failed sync: a sync event with reason: Sync and type: Warning — indicating ArgoCD attempted to sync but the operation failed before reaching a healthy state. This is a direct deployment failure.

The second pattern is a successful sync followed by health degradation: the sync completes with type: Normal, but within a short window the application health status transitions from Healthy to Degraded or Missing. This is the GitOps equivalent of a deploy that succeeds at the infrastructure level but causes an application-level failure. Koalr uses a 5-minute window for this correlation — a pattern discussed in detail later.

MTTR

Mean time to recovery is measured as the duration between the health status transitioning to Degraded and the subsequent transition back to Healthy. ArgoCD emits ResourceHealthDegraded events when health drops, and the application health field returns to Healthy when recovery is complete — via a rollback sync, a hotfix sync, or a manual remediation. Both timestamps are available from the ArgoCD events and status API.
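One way to turn those two timestamps into MTTR is sketched below. It assumes the health transitions for a single application have already been extracted as chronologically ordered (timestamp, status) pairs; that input shape and the function names are hypothetical.

```python
from datetime import datetime


def recovery_durations(transitions):
    """Return seconds spent Degraded for each Degraded -> Healthy cycle.

    transitions: chronologically ordered (iso_timestamp, health_status) pairs.
    """
    durations = []
    degraded_since = None
    for ts, status in transitions:
        when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        if status == "Degraded" and degraded_since is None:
            degraded_since = when  # start the recovery clock once per incident
        elif status == "Healthy" and degraded_since is not None:
            durations.append((when - degraded_since).total_seconds())
            degraded_since = None
    return durations


def mean_time_to_recovery(transitions):
    """Average recovery time in seconds, or None if nothing recovered yet."""
    durations = recovery_durations(transitions)
    return sum(durations) / len(durations) if durations else None


# Hypothetical sample: two incidents, lasting 10 and 20 minutes.
sample = [
    ("2026-03-15T10:00:00Z", "Degraded"),
    ("2026-03-15T10:01:00Z", "Degraded"),
    ("2026-03-15T10:10:00Z", "Healthy"),
    ("2026-03-15T12:00:00Z", "Degraded"),
    ("2026-03-15T12:20:00Z", "Healthy"),
]
mttr_seconds = mean_time_to_recovery(sample)
```

Repeated Degraded events within one incident do not restart the clock, so the duration always runs from the first degradation to the recovery.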

ArgoCD API overview

ArgoCD exposes a REST API that provides full access to application state, sync history, and events. The base URL for all API requests is:

https://your-argocd-server.example.com/api/v1/

Authentication uses a bearer token in the Authorization header:

Authorization: Bearer <argocd-token>

Generate a token for a read-only service account using the ArgoCD CLI:

argocd account generate-token --account koalr-readonly

The minimum RBAC permissions required for DORA metric collection are applications, get and applications, list on the project or projects you want to monitor. No write permissions are needed or requested.

Key endpoints

GET /applications — Lists all ArgoCD applications with their current health status (Healthy, Degraded, Missing, Unknown), sync status (Synced, OutOfSync), and the destination cluster and namespace. Used to build the application inventory and the app-to-Git-repo-to-environment mapping.

GET /applications/{name}/events — Returns paginated event history for a specific application. Events include sync attempts, health transitions, and resource-level events. This is the primary source for deployment frequency and CFR data. The response includes type (Normal or Warning), reason (Sync, ResourceHealthDegraded, etc.), message, and eventTime.

GET /applications/{name}/resource-tree — Returns the current health status of every Kubernetes resource managed by the application. Useful for understanding the scope of a health degradation event — which specific pods, deployments, or services are unhealthy.

GET /projects — Lists all ArgoCD projects. Used to scope metric collection to specific projects (for example, restricting to the production project rather than collecting events from all dev and staging applications).
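To make the request shape concrete, the sketch below builds (but does not send) an authenticated request for the events endpoint using only the Python standard library. The server URL and token are the placeholders used in this article; the helper name is hypothetical.

```python
import urllib.request
from urllib.parse import quote


def build_events_request(base_url, app_name, token):
    """Build a GET request for /applications/{name}/events with a bearer token."""
    url = f"{base_url.rstrip('/')}/applications/{quote(app_name, safe='')}/events"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


req = build_events_request(
    "https://your-argocd-server.example.com/api/v1/",
    "payments-service-production",
    "<argocd-token>",
)
```

Passing the request to urllib.request.urlopen would perform the call; the response body is JSON and can be decoded with json.load.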

Webhook alternative for real-time collection

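For real-time metric updates rather than polling, ArgoCD can push events to an external endpoint through its notifications subsystem (Argo CD Notifications). Webhooks are not configured in the ArgoCD UI; register the Koalr webhook endpoint as a webhook service in the argocd-notifications-cm ConfigMap and subscribe the production applications to it. The notification template should include these application status fields: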

app.status.sync.status — current sync state
app.status.health.status — current health state
app.status.operationState.phase — sync operation result (Succeeded, Failed, Error)

ArgoCD triggers webhooks on three configurable events: on-sync-succeeded, on-sync-failed, and on-health-degraded. These map directly to the three DORA event types Koalr tracks — successful deployment, failed deployment, and health degradation.
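On the receiving side, the mapping from payload to DORA event type could look roughly like the sketch below. It assumes the payload is a JSON object nested as app.status.*, matching the field paths listed above; the function name and the returned labels are hypothetical, not Koalr's actual implementation.

```python
def classify_webhook(payload):
    """Map a webhook payload to one of the three DORA event types."""
    status = payload.get("app", {}).get("status", {})
    phase = status.get("operationState", {}).get("phase")
    health = status.get("health", {}).get("status")

    if phase in ("Failed", "Error"):
        return "failed_deployment"
    if health == "Degraded":
        # Checked before Succeeded so a synced-but-unhealthy app is not
        # recorded as a clean deployment.
        return "health_degradation"
    if phase == "Succeeded":
        return "successful_deployment"
    return "ignored"


# Hypothetical payloads for the three triggers.
ok = {"app": {"status": {"operationState": {"phase": "Succeeded"},
                         "health": {"status": "Healthy"}}}}
failed = {"app": {"status": {"operationState": {"phase": "Failed"}}}}
degraded = {"app": {"status": {"operationState": {"phase": "Succeeded"},
                               "health": {"status": "Degraded"}}}}
```

The order of the checks is a design choice: failure and degradation take precedence so that a payload carrying both a Succeeded phase and a Degraded health status is treated as a degradation.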

The ArgoCD events data model

A successful deployment event from GET /applications/{name}/events looks like this:

{
  "type": "Normal",
  "reason": "Sync",
  "message": "Successfully synced (diff: app-diff.yaml)",
  "eventTime": "2026-03-15T14:22:01Z",
  "series": null,
  "firstTimestamp": "2026-03-15T14:22:01Z",
  "lastTimestamp": "2026-03-15T14:22:01Z",
  "count": 1,
  "involvedObject": {
    "kind": "Application",
    "name": "payments-service-production",
    "namespace": "argocd"
  }
}

The three event patterns Koalr uses for DORA signal extraction:

reason: Sync with type: Normal — successful deployment event. Increments deployment frequency. The eventTime timestamp is used as the deployment completion time for lead time calculation.
reason: Sync with type: Warning — sync failed. The sync attempt is counted as a change failure event. Does not increment deployment frequency.
reason: ResourceHealthDegraded — a resource managed by the application has become unhealthy. When this event follows a successful sync within a 5-minute window, Koalr classifies the sync as a change failure. The eventTime of this event starts the MTTR clock.

How Koalr processes ArgoCD data

Koalr ingests ArgoCD data in four steps, combining historical backfill with real-time webhook processing to provide both up-to-date metrics and historical trend data.

Step 1: Application inventory

On initial connection, Koalr calls GET /applications to enumerate all ArgoCD applications and build the mapping from ArgoCD application name to Git repository URL, target revision branch, and destination cluster. This inventory is used to associate sync events with the correct repositories and to apply environment filtering — only applications tagged as production feed into DORA metrics.
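The inventory step can be pictured with the sketch below, which flattens an application list response into a lookup table. The items, metadata, spec.source, and spec.destination fields follow the ArgoCD Application resource; the environment label key, the helper names, and the sample data are assumptions for illustration, and applications with multiple sources are not handled.

```python
def build_inventory(app_list, env_label="environment"):
    """Map application name -> repo, revision, destination, and environment."""
    inventory = {}
    for item in app_list.get("items") or []:
        meta = item.get("metadata", {})
        spec = item.get("spec", {})
        source = spec.get("source", {})
        dest = spec.get("destination", {})
        inventory[meta["name"]] = {
            "repo_url": source.get("repoURL"),
            "target_revision": source.get("targetRevision"),
            "cluster": dest.get("server") or dest.get("name"),
            "namespace": dest.get("namespace"),
            # Assumption: the environment is stored in a metadata label.
            "environment": (meta.get("labels") or {}).get(env_label),
        }
    return inventory


def production_apps(inventory):
    """Names of applications whose environment label is 'production'."""
    return sorted(n for n, app in inventory.items() if app["environment"] == "production")


# Hypothetical response with one production and one staging application.
sample_response = {"items": [
    {"metadata": {"name": "payments-service-production",
                  "labels": {"environment": "production"}},
     "spec": {"source": {"repoURL": "https://github.com/example/gitops",
                         "targetRevision": "main"},
              "destination": {"server": "https://kubernetes.default.svc",
                              "namespace": "payments"}}},
    {"metadata": {"name": "payments-service-staging",
                  "labels": {"environment": "staging"}},
     "spec": {"source": {"repoURL": "https://github.com/example/gitops",
                         "targetRevision": "main"},
              "destination": {"name": "staging-cluster",
                              "namespace": "payments"}}},
]}
inventory = build_inventory(sample_response)
```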

Step 2: Historical event backfill

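Koalr calls GET /applications/{name}/events for each production application, paginating through up to 90 days of event history. This populates historical deployment frequency, lead time, and CFR data from the moment of connection rather than requiring a waiting period for data to accumulate. How far back the backfill actually reaches depends on the cluster: ArgoCD application events are stored as Kubernetes Events, which the API server retains only for its configured --event-ttl (one hour by default).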

Step 3: Real-time webhook processing

After backfill, Koalr switches to webhook-based ingestion. Each sync event, health degradation, and health recovery fires the Koalr webhook endpoint in real time. This keeps deployment frequency counts, CFR, and MTTR current without requiring continuous polling of the ArgoCD API.

Step 4: GitHub commit correlation

The ArgoCD sync payload includes app.status.sync.revision — the Git commit SHA that was synced. Koalr uses this SHA to trace back through the GitHub commit history to the first commit in the change set, computing the full lead time from first commit to ArgoCD sync completion. This correlation is what makes accurate GitOps lead time measurement possible: without the commit SHA from ArgoCD, there is no way to connect a production sync event to the developer who authored the code.

Setting up ArgoCD in Koalr

The setup process requires creating a read-only ArgoCD service account, generating a token, and configuring the connection in Koalr Settings.

Create a read-only service account

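ArgoCD has no CLI command for creating local accounts. Declare the account with the apiKey capability in the argocd-cm ConfigMap (this example assumes ArgoCD is installed in the argocd namespace):

kubectl patch configmap argocd-cm -n argocd --type merge -p '{"data":{"accounts.koalr-readonly":"apiKey"}}'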

Then add the minimum required RBAC permissions to the argocd-rbac-cm ConfigMap in your ArgoCD namespace:

p, role:koalr-readonly, applications, list, */*, allow
p, role:koalr-readonly, applications, get, */*, allow
g, koalr-readonly, role:koalr-readonly

This grants list and get permissions on all applications in all projects. If you want to restrict Koalr to specific projects (for example, only the production project), replace the */* glob with production/*.

Generate a token

argocd account generate-token --account koalr-readonly --expires-in 8760h

The --expires-in 8760h flag sets a one-year expiry. Note this in your credentials rotation schedule — Koalr will surface a warning when the token is within 30 days of expiry.
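If you track rotation yourself, the expiry check reduces to date arithmetic. The sketch below assumes you recorded when the token was issued; the function names are hypothetical, and the 30-day threshold mirrors the warning window mentioned above.

```python
from datetime import datetime, timedelta, timezone


def days_until_expiry(issued_at, lifetime_hours=8760, now=None):
    """Whole days remaining before a token issued at issued_at expires."""
    now = now or datetime.now(timezone.utc)
    expires_at = issued_at + timedelta(hours=lifetime_hours)
    return (expires_at - now).days


def needs_rotation(issued_at, warn_days=30, **kwargs):
    """True when the token is within warn_days of expiry (or already expired)."""
    return days_until_expiry(issued_at, **kwargs) <= warn_days


# Hypothetical token issued on 1 January 2026 with the one-year lifetime.
issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
mid_december = datetime(2026, 12, 15, tzinfo=timezone.utc)
early_june = datetime(2026, 6, 1, tzinfo=timezone.utc)
```

With these sample dates, a check run in mid-December flags the token for rotation while one run in early June does not.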

Connect in Koalr

Navigate to Settings → Integrations → ArgoCD. Enter your ArgoCD server URL and the generated token. Koalr will validate the connection by calling GET /applications and listing the discovered applications for you to review.

Select which ArgoCD projects to monitor. Koalr recommends scoping to production-only initially, because staging and dev sync events add noise to deployment frequency counts and dilute CFR signal. You can add additional environments later if you want environment-specific metrics.

After connecting, Koalr displays a webhook URL in the integration settings. Add this URL as a webhook service in the argocd-notifications-cm ConfigMap to enable real-time sync event delivery, and enable the on-sync-succeeded, on-sync-failed, and on-health-degraded triggers for the monitored applications.

Scoping recommendation

DORA metrics are production metrics by definition. The DORA research framework measures the performance of changes reaching end users — which means production only. Including staging syncs in deployment frequency counts inflates the number without reflecting delivery to users. Koalr applies production-only scoping by default; staging data is available separately under environment-specific views.

Reading your GitOps DORA data in Koalr

Once ArgoCD is connected and the initial backfill completes (typically 2 to 5 minutes for most teams), the DORA dashboard updates to reflect ArgoCD-sourced deployment data.

Deployment frequency now displays per-service sync frequency rather than PR merges. Each row in the deployment frequency chart corresponds to an ArgoCD application, showing sync counts by day, week, or month. Teams that previously appeared to deploy daily based on PR merge counts often see their actual production sync frequency is lower — the gap is the signal.

Lead time for changes is calculated from first Git commit to ArgoCD sync completion, using the commit SHA from the sync revision payload. Lead time values are typically 15 to 60 minutes longer than commit-to-PR-merge measurements for teams with manual GitOps repo update processes, and 10 to 20 minutes longer for teams with automated image update workflows.

Change failure rate counts sync failures plus sync-then-degrade events per ArgoCD application per period. CFR is displayed both aggregate and per-application — which surfaces which specific services are failing most frequently, rather than the average across all services.
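The per-application and aggregate views can be derived from the same classified sync outcomes, as in the sketch below. The outcome labels and the function name are hypothetical, and the denominator is assumed to be all sync attempts in the period, including failed ones.

```python
def change_failure_rate(outcomes):
    """Return (per_app_rates, aggregate_rate) from (app, outcome) pairs.

    outcome is one of 'clean', 'sync_failed', or 'sync_then_degrade'.
    """
    totals = {}
    for app, outcome in outcomes:
        total, failed = totals.get(app, (0, 0))
        totals[app] = (total + 1, failed + (outcome != "clean"))

    per_app = {app: failed / total for app, (total, failed) in totals.items()}
    all_total = sum(total for total, _ in totals.values())
    all_failed = sum(failed for _, failed in totals.values())
    aggregate = all_failed / all_total if all_total else 0.0
    return per_app, aggregate


# Hypothetical period: one noisy service and one clean service.
sample_outcomes = [
    ("payments", "clean"),
    ("payments", "clean"),
    ("payments", "sync_failed"),
    ("payments", "sync_then_degrade"),
    ("search", "clean"),
    ("search", "clean"),
]
per_app, aggregate = change_failure_rate(sample_outcomes)
```

In this sample the aggregate rate is one third, while the per-application breakdown shows that every failure comes from a single service.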

AI Chat can query your ArgoCD data directly. Examples: "Which of our ArgoCD apps had the most sync failures last month?" returns the app list with failure count and most common failure reason. "What is the average lead time for the payments-service this quarter?" traces commit SHAs back through GitHub history and returns the full commit-to-production duration broken down by phase.

Multi-cluster and multi-environment setup

Most production GitOps environments involve multiple ArgoCD instances — one per cluster (dev, staging, prod) or multiple regional production clusters. Koalr supports multiple ArgoCD endpoints and handles the environment scoping at the connection level.

Add each ArgoCD instance separately in Settings → Integrations → ArgoCD. Each connection has its own server URL, token, and environment label. Tag production cluster connections with the production environment label. Staging and dev connections can be added with their respective labels for environment-specific views, but only production-labeled connections feed the primary DORA metrics dashboard.

For teams running multiple regional production clusters (for example, ArgoCD instances in us-east-1 and eu-west-1), add both with the production label. Koalr aggregates sync events across all production-labeled connections for the deployment frequency total, and breaks out per-cluster data in the service detail view.

Token per cluster

Each ArgoCD instance requires its own service account and token, because a token is only valid on the instance that issued it. Create a koalr-readonly account and generate a token on each instance separately. Token expiry management applies to each instance independently.

The GitOps-specific CFR pattern: sync-then-degrade

Standard DORA change failure rate definitions cover two clear cases: a deploy that causes an immediate outage (detectable via alerting or incident creation) and a deploy that requires a hotfix or rollback. Both are straightforward to detect.

GitOps introduces a third pattern that standard CFR tooling misses: a sync operation completes successfully at the ArgoCD level — all resources are applied, the sync phase reports Succeeded — but the application health status degrades within minutes of the sync completing. This is the "bad sync" pattern, and it is specifically a GitOps failure mode.

What causes it: Kubernetes resource application succeeds (the manifest is valid and the API server accepted it), but the resulting pod fails health checks, crashes on startup, or the deployment rollout stalls because the new image cannot be pulled. ArgoCD reports the sync as successful because the manifests were applied, then emits a ResourceHealthDegraded event as the pods fail to start.

Koalr detects this pattern with a 5-minute correlation window: any ResourceHealthDegraded event that occurs within 5 minutes of a successful Sync event on the same application is classified as a change failure, and the sync event is counted in CFR rather than as a clean deployment. The 5-minute window covers the typical Kubernetes pod startup and health check timing; degradations that occur more than 5 minutes after sync are more likely caused by external factors rather than the sync itself.

This pattern is not covered by standard CFR definitions and is largely invisible to teams measuring CFR from incident tickets or PagerDuty alerts, because the sync-then-degrade pattern frequently resolves before an alert fires: the old ReplicaSet often keeps serving traffic while the new pods fail, and the change is rolled back within minutes. Koalr surfaces it from the ArgoCD event stream regardless of whether a human-visible incident occurred.
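The 5-minute correlation window described above can be sketched as follows. This is an illustration of the idea rather than Koalr's implementation: it assumes all events belong to one application, uses the event fields shown earlier in this article, and treats the window boundary as inclusive.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)


def _ts(event):
    return datetime.strptime(event["eventTime"], "%Y-%m-%dT%H:%M:%SZ")


def find_sync_then_degrade(events, window=WINDOW):
    """Return eventTimes of successful syncs followed by degradation in window."""
    flagged = []
    last_sync = None
    for event in sorted(events, key=_ts):
        if event["reason"] == "Sync" and event["type"] == "Normal":
            last_sync = event
        elif event["reason"] == "ResourceHealthDegraded" and last_sync is not None:
            if _ts(event) - _ts(last_sync) <= window:
                flagged.append(last_sync["eventTime"])
            last_sync = None  # each sync is judged against one degradation only
    return flagged


# Hypothetical stream: the first sync degrades after 3.5 minutes,
# the second only after 12 minutes.
stream = [
    {"type": "Normal", "reason": "Sync", "eventTime": "2026-03-15T14:22:01Z"},
    {"type": "Warning", "reason": "ResourceHealthDegraded",
     "eventTime": "2026-03-15T14:25:31Z"},
    {"type": "Normal", "reason": "Sync", "eventTime": "2026-03-15T16:00:00Z"},
    {"type": "Warning", "reason": "ResourceHealthDegraded",
     "eventTime": "2026-03-15T16:12:00Z"},
]
bad_syncs = find_sync_then_degrade(stream)
```

Only the first sync in the sample is flagged; the later degradation falls outside the window and is left to be attributed to other causes.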

Putting it together: GitOps DORA at a glance

DORA Metric	ArgoCD Signal	Wrong signal (common)
Deployment Frequency	`reason: Sync, type: Normal` events on production apps	PR merges to main branch
Lead Time	First commit → `eventTime` of sync completion (via revision SHA)	Commit → PR merge timestamp
Change Failure Rate	Sync failures + sync-then-degrade within 5 min	PagerDuty incidents only
MTTR	`ResourceHealthDegraded` → `Healthy` transition duration	Incident open → incident resolved

The ArgoCD-sourced signals are more precise and more complete than the common alternatives for every metric. They are also harder to collect, which is why most DORA tooling defaults to the GitHub Events approach despite its inaccuracies. Koalr handles the ArgoCD API integration, event correlation, and commit SHA tracing so you get accurate GitOps DORA metrics without building the pipeline yourself.

GitOps DORA Metrics: How ArgoCD Deployment Events Feed Deployment Frequency and CFR