Kubernetes Deployment Metrics: How to Track DORA from K8s Events
Kubernetes is now the default deployment substrate for most engineering teams — but it actively works against DORA measurement. A kubectl apply is not tied to a GitHub commit. An ArgoCD sync is not tied to a PR merge. Without deliberate instrumentation, your deployment frequency, lead time, change failure rate, and MTTR numbers are either wrong or missing entirely. This guide covers the four event sources that give you accurate Kubernetes DORA metrics and how to use each one.
K8s abstracts the deploy from the code
In a Kubernetes environment, the artifact that reaches production is a container image — not a commit or a PR. The path from commit to running pod passes through a CI pipeline, an image registry, a CD tool, and Kubernetes itself. Each hop breaks the direct commit-to-deploy correlation that simpler DORA tooling assumes. Accurate K8s DORA measurement requires capturing events at the pod or rollout level, then tracing backwards to the originating commit.
Why Kubernetes creates DORA measurement challenges
Traditional DORA tooling was built for a simpler deployment model: a merge to the main branch triggers a pipeline that pushes code directly to a server. The commit is the deployment. In this model, counting deployments means counting merges, and lead time is commit-to-merge.
Kubernetes breaks both assumptions. Between a developer pushing a commit and that code reaching a production pod, the following steps occur: a CI pipeline builds a container image and pushes it to a registry; a CD tool (ArgoCD, Flux, Spinnaker, or a plain CI step) applies a Kubernetes manifest to the cluster; the Kubernetes scheduler places pods on nodes; the kubelet pulls the image and starts containers; readiness probes pass; the Deployment controller marks the rollout complete. Any of these steps can fail independently, and none of them are visible from GitHub Events alone.
The most common result is that teams using Kubernetes measure DORA metrics from the wrong signal — PR merges to the application repository — and end up with deployment frequency counts that are inflated (counting merges that never reached production) and lead time values that are systematically too short (cutting off the measurement before the image build, registry push, and rollout phases that commonly add 15 to 60 minutes).
Change failure rate is even more problematic. A Kubernetes deployment can succeed at the manifest application level — the API server accepted the new Deployment spec — while the pods immediately enter CrashLoopBackOff. Standard CFR tools that only watch for explicit rollback events miss this entirely. And MTTR measurements that rely on incident ticket open/close times miss the many incidents that are resolved through automated Kubernetes self-healing before a human opens a ticket.
The four K8s deployment event sources for DORA
There is no single universal Kubernetes deployment event. Which event source gives you accurate DORA data depends on how your team deploys. Four sources cover the majority of Kubernetes deployment patterns.
1. GitHub Deployments API
The GitHub Deployments API is a first-party mechanism for recording that a specific commit was deployed to a specific environment. Your CD pipeline (or a post-deploy hook) posts a deployment event on each successful rollout, attaching the commit SHA, the environment name, and a status of success or failure.
This approach works with any Kubernetes tooling because the GitHub Deployment event is created by your pipeline, not by Kubernetes itself. It integrates with GitHub status checks, shows deployment history per environment in the GitHub UI, and gives DORA tools a clean, standardized event stream to consume.
The tradeoff: it requires your CD pipeline to post the event explicitly. It does not happen automatically. Teams often implement this as the last step in a GitHub Actions workflow or Tekton pipeline, after confirming the Kubernetes rollout completed successfully.
Example GitHub Actions steps to post a deployment event after a successful rollout (note the `id: deploy` on the first step, which the status update step references):

```yaml
- name: Create GitHub Deployment
  id: deploy
  uses: chrnorm/deployment-action@v2
  with:
    token: ${{ secrets.GITHUB_TOKEN }}
    environment: production
    ref: ${{ github.sha }}

- name: Wait for rollout
  run: kubectl rollout status deployment/api -n production --timeout=300s

- name: Update deployment status
  uses: chrnorm/deployment-status@v2
  with:
    token: ${{ secrets.GITHUB_TOKEN }}
    deployment-id: ${{ steps.deploy.outputs.deployment_id }}
    state: success
    environment-url: https://api.example.com
```

DORA tooling that reads the GitHub Deployments API sees a clean commit SHA paired with an environment and a success/failure status — exactly what is needed for deployment frequency and lead time calculation.
2. ArgoCD sync events
For teams using ArgoCD, the sync event is the deployment event. Each time ArgoCD successfully syncs an Application to the cluster, it emits a Kubernetes event with reason: Sync and type: Normal. A failed sync emits type: Warning.
ArgoCD sync events are the most precise Kubernetes deployment signal available for GitOps teams because they reflect the actual state of the cluster — not what was pushed to Git, and not what the CI pipeline attempted. A sync event only fires when Kubernetes has accepted the new manifests and the ArgoCD operation completed.
The Application CRD carries everything needed for DORA measurement: app.status.sync.revision contains the Git commit SHA that was synced; app.status.operationState.phase indicates Succeeded or Failed; app.status.health.status indicates whether the resulting pods are Healthy or Degraded.
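As a concrete sketch, these fields can be lifted from the Application object (as returned by `kubectl get application <name> -o json`) into a single DORA event. The function and the event field names here are illustrative, not part of any ArgoCD API:

```python
def dora_event_from_argocd_app(app):
    """Extract the DORA-relevant fields from an ArgoCD Application object."""
    status = app.get("status", {})
    op = status.get("operationState", {})
    return {
        "service": app["metadata"]["name"],
        "sha": status.get("sync", {}).get("revision"),      # Git commit that was synced
        "phase": op.get("phase"),                           # Succeeded / Failed / Error
        "health": status.get("health", {}).get("status"),   # Healthy / Degraded
        "deployed_at": op.get("finishedAt"),
    }

# Minimal Application fragment for illustration
app = {
    "metadata": {"name": "api"},
    "status": {
        "sync": {"revision": "abc123def456"},
        "operationState": {"phase": "Succeeded", "finishedAt": "2026-03-16T10:00:00Z"},
        "health": {"status": "Healthy"},
    },
}
event = dora_event_from_argocd_app(app)

# A sync counts toward deployment frequency only when both checks pass
is_successful_deploy = event["phase"] == "Succeeded" and event["health"] == "Healthy"
```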
For a deeper walkthrough of ArgoCD-specific DORA instrumentation, see the ArgoCD DORA metrics guide.
3. Flux GitOps reconcile events
Flux uses two primary custom resources for deployment: Kustomization (for raw manifests or Kustomize overlays) and HelmRelease (for Helm charts). Both resources emit Kubernetes events on each reconcile cycle.
A successful Kustomization reconcile sets the resource condition Ready=True and emits a ReconciliationSucceeded event. A failed reconcile sets Ready=False and emits ReconciliationFailed. The HelmRelease resource follows the same pattern, with InstallSucceeded and UpgradeSucceeded events for new installs and upgrades respectively.
Flux's Notification Controller can route these events to external webhook endpoints, making it straightforward to push reconcile events to a DORA metrics platform in real time.
4. Custom deployment webhook
For teams not using ArgoCD or Flux — running plain Helm, Kustomize via CI, or a bespoke deployment tool — the most reliable approach is emitting a structured webhook event on each rollout completion. This can be triggered from a Kubernetes event watcher that monitors Deployment objects for rollout completion, or as an explicit step at the end of any CD pipeline.
The webhook approach is the most portable: it works with any Kubernetes tooling and does not depend on a specific GitOps controller being present. The tradeoff is that you own the instrumentation code.
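A minimal sketch of the pipeline-step variant in Python, assuming a hypothetical webhook endpoint; the payload fields mirror the deployment event schema used throughout this guide:

```python
import json
import subprocess
import urllib.request

def wait_for_rollout(deployment, namespace, timeout="300s"):
    """Block until the rollout completes; kubectl rollout status exits 0 on success."""
    result = subprocess.run(
        ["kubectl", "rollout", "status", f"deployment/{deployment}",
         "-n", namespace, f"--timeout={timeout}"],
    )
    return result.returncode == 0

def build_deployment_event(service, environment, sha, image_tag, deployed_at):
    """Assemble the structured deployment event payload."""
    return {
        "service": service,
        "environment": environment,
        "sha": sha,
        "image_tag": image_tag,
        "deployed_at": deployed_at,
        "deployed_by": "ci-pipeline",
    }

def post_event(event, endpoint):
    """POST the event to the metrics platform (endpoint is a placeholder)."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)

event = build_deployment_event(
    "api", "production", "abc123def456", "sha-abc123def456", "2026-03-16T10:00:00Z"
)
# In a real pipeline: if wait_for_rollout("api", "production"): post_event(event, url)
```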
Kubernetes deployment event schema
Regardless of which event source you use, the deployment event payload should capture a consistent set of fields to support all four DORA metrics. A complete deployment event looks like this:
```json
{
  "service": "api",
  "environment": "production",
  "sha": "abc123def456",
  "image_tag": "v1.2.3",
  "deployed_at": "2026-03-16T10:00:00Z",
  "deployed_by": "argocd-sync",
  "rollout_duration_seconds": 145,
  "replicas_updated": 5,
  "previous_sha": "def456abc789"
}
```

Each field serves a specific purpose. sha and previous_sha enable lead time calculation — the commit timestamp of sha is the end of the lead time window, and previous_sha lets you determine which commits are new in this deployment. rollout_duration_seconds is the time the Kubernetes Deployment rollout took to complete, a useful operational metric separate from DORA lead time. replicas_updated distinguishes a full rollout (all replicas on new image) from a partial rollout.
Deployment frequency from Kubernetes
Deployment frequency counts successful rollout completions per service per time period. The key word is successful — an attempted deployment that ends in rollback or a failed health check does not increment deployment frequency; it contributes to change failure rate.
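As an illustrative sketch, counting frequency from a stream of deployment events. The field names follow the event schema above, plus an assumed status field recorded by the pipeline; weekly bucketing is one reasonable choice:

```python
from collections import Counter
from datetime import datetime

def weekly_deployment_frequency(events):
    """Count successful production deployments per ISO week.
    Failed or rolled-back attempts are excluded -- they feed CFR instead."""
    counts = Counter()
    for e in events:
        if e["environment"] != "production" or e["status"] != "success":
            continue
        week = datetime.fromisoformat(e["deployed_at"]).strftime("%G-W%V")
        counts[week] += 1
    return dict(counts)

events = [
    {"environment": "production", "status": "success", "deployed_at": "2026-03-16T10:00:00"},
    {"environment": "production", "status": "failure", "deployed_at": "2026-03-16T12:00:00"},
    {"environment": "staging",    "status": "success", "deployed_at": "2026-03-17T09:00:00"},
    {"environment": "production", "status": "success", "deployed_at": "2026-03-18T15:00:00"},
]
# Only the two successful production deploys count; the failure and the
# staging deploy are filtered out.
print(weekly_deployment_frequency(events))
```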
From the command line, a successful rollout is indicated by a zero exit code from:
```bash
kubectl rollout status deployment/api -n production
```

For ArgoCD, the equivalent check is both conditions being true simultaneously:

```
Application.status.sync.status == "Synced"
Application.status.health.status == "Healthy"
```

A sync that reaches Synced but leaves the application in Degraded health is not a successful deployment — it is a change failure. Both conditions must be true.
HPA scaling events are not deployments — filter them out
Horizontal Pod Autoscaler scaling events change the number of running pods without changing the pod spec or the container image. An HPA scale-up creates new pods from the same image that is already running — it is not a deployment. Deployment frequency tracking must filter for pod spec changes only: events where .spec.template changed (i.e., a new image tag, updated environment variables, or modified resource limits). Replica count changes alone should be excluded. Most GitOps tools handle this correctly because they only create sync events when the Git source changes; raw Kubernetes event watchers need explicit filtering.
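A sketch of the filter, comparing .spec.template between the running and the incoming Deployment objects (in practice the pod-template-hash label encodes the same comparison):

```python
def is_real_deployment(old, new):
    """A change counts as a deployment only if the pod template changed.
    Replica-count-only changes (HPA scaling) are filtered out."""
    return old["spec"]["template"] != new["spec"]["template"]

running = {"spec": {"replicas": 3,
                    "template": {"spec": {"containers": [{"image": "ghcr.io/org/api:sha-abc123"}]}}}}

# HPA scale-up: replicas change, pod template identical -> not a deployment
scaled = {"spec": {"replicas": 5,
                   "template": {"spec": {"containers": [{"image": "ghcr.io/org/api:sha-abc123"}]}}}}

# New image tag -> a real deployment
rollout = {"spec": {"replicas": 3,
                    "template": {"spec": {"containers": [{"image": "ghcr.io/org/api:sha-def456"}]}}}}

print(is_real_deployment(running, scaled))   # False
print(is_real_deployment(running, rollout))  # True
```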
Lead time from Kubernetes — connecting commits to deploys
Lead time requires connecting a deployment event to the Git commit that triggered it. Kubernetes does not store this connection natively — a pod's spec contains an image reference, not a Git SHA. The connection must be recovered by parsing the image tag.
The most reliable approach is a consistent image tagging convention that embeds the Git SHA. The recommended format:
```
ghcr.io/org/api:sha-abc123def456
```

With this convention, extracting the SHA from a running pod is straightforward:

```bash
kubectl get pod api-7d8b9f-xkz2p \
  -o jsonpath='{.spec.containers[0].image}'
# → ghcr.io/org/api:sha-abc123def456
```

Parse the sha- prefix off the tag to recover the Git SHA. Then query the GitHub API for the commit timestamp at that SHA. Lead time is:

```
lead_time = deploy_timestamp - commit_timestamp
```

For multi-commit changes, use the timestamp of the earliest commit in the changeset (from the previous deployment SHA to the current one) as the start of the lead time window — this gives you the full commit-to-production duration for the longest-waiting change, which is the correct DORA definition.
Teams using semantic version tags (v1.2.3) without an embedded SHA need an additional lookup step: query the container registry for the image manifest, then look up the build that produced that image in the CI system to recover the source SHA. This works but adds pipeline complexity. SHA-tagged images are strongly recommended for any team that wants accurate lead time measurement.
Change failure rate from Kubernetes
Three Kubernetes signals contribute to change failure rate: deployment-correlated incident alerts, rollback events, and post-deploy pod health degradation.
Deployment-correlated incidents
Correlate PagerDuty or Alertmanager alerts with deployment events using a time window. An alert that fires within 30 minutes after a deployment to the same service is a strong CFR signal. Koalr uses a ±30 minute window for this correlation by default, which captures the vast majority of deployment-triggered incidents while excluding unrelated alerts.
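A sketch of the window check; the 30-minute constant mirrors the default described above:

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)  # correlation window, per the default above

def correlated(alert_ts, deploy_ts, alert_service, deploy_service):
    """An alert is deployment-correlated when it targets the same service
    and fires within the window around the deployment."""
    if alert_service != deploy_service:
        return False
    delta = abs(datetime.fromisoformat(alert_ts) - datetime.fromisoformat(deploy_ts))
    return delta <= WINDOW

print(correlated("2026-03-16T10:12:00", "2026-03-16T10:00:00", "api", "api"))      # True
print(correlated("2026-03-16T11:00:00", "2026-03-16T10:00:00", "api", "api"))      # False: 60 min out
print(correlated("2026-03-16T10:12:00", "2026-03-16T10:00:00", "api", "billing"))  # False: wrong service
```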
Rollback events
A kubectl rollout undo is the strongest possible CFR signal — it means the team explicitly decided the deployment was bad enough to revert. Kubernetes records this as a new Deployment revision with the previous pod spec. Watch for Deployment ReplicaSet changes that restore an older revision (the kubernetes.io/change-cause annotation often contains "rollback" for these events).
For ArgoCD rollbacks:

```bash
argocd app rollback <app-name> <revision>
```

ArgoCD records this as a sync to a previous revision, distinguishable from a forward sync by the operation annotation.
Post-deploy pod health degradation
Two Kubernetes health signals are leading indicators of a bad deployment even before a human-visible incident occurs:
CrashLoopBackOff spikes post-deploy. When a new deployment rolls out and pods immediately enter CrashLoopBackOff, the application is failing to start. Watch for pods in this state on the Deployment selector within 5 minutes of a rollout completing.
OOMKilled events post-deploy. A surge of OOMKilled pod events after a deployment indicates the new version has a memory regression. The container exceeded its memory limit and was terminated by the kernel. This frequently does not trigger a PagerDuty alert immediately (the pod restarts, traffic is served by other replicas) but it will eventually cascade into service degradation.
Both patterns can be detected from the Kubernetes Events API:

```bash
kubectl get events -n production \
  --field-selector reason=OOMKilling \
  --sort-by='.lastTimestamp'
```

MTTR from Kubernetes
Mean time to recovery in a Kubernetes environment has a clear start and end signal that is independent of incident ticketing systems.
Incident start: the moment an alert fires in PagerDuty, Datadog, or Alertmanager — or the moment pods enter a degraded state post-deploy (CrashLoopBackOff, OOMKilled, failed readiness probes).
Recovery: the moment a rollback completes (kubectl rollout undo exits zero) or a new fix deployment reaches healthy pods. The Kubernetes recovery signal is unambiguous:

```bash
kubectl get deployment api -n production \
  -o jsonpath='{.status.availableReplicas}'
```

When availableReplicas equals spec.replicas, the service is fully recovered. MTTR is the duration from incident start to this timestamp.
Kubernetes MTTR is often shorter than incident-ticket-based MTTR because Kubernetes self-healing (automatic pod restarts, replica set rollback) resolves many incidents before a human manually closes a ticket. This is a real improvement in recovery speed — measure it.
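Putting the start and recovery signals together, MTTR over a set of incidents is a simple mean. The timestamps below are illustrative:

```python
from datetime import datetime

def mttr_seconds(incidents):
    """Mean time to recovery over (start, recovered) timestamp pairs.
    Start = alert fired or pods degraded post-deploy;
    recovered = availableReplicas == spec.replicas again."""
    durations = [
        (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()
        for start, end in incidents
    ]
    return sum(durations) / len(durations)

incidents = [
    ("2026-03-16T10:05:00", "2026-03-16T10:17:00"),  # 12 min: rollback completed
    ("2026-03-17T14:00:00", "2026-03-17T14:08:00"),  # 8 min: self-healed pod restart
]
print(mttr_seconds(incidents) / 60)  # 10.0 minutes
```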
ArgoCD-specific DORA instrumentation — the recommended GitOps approach
For teams running ArgoCD, the recommended instrumentation uses ArgoCD Notifications to emit deployment events in real time. ArgoCD Notifications is a built-in controller that watches Application status and triggers webhooks on configurable transitions.
The three notification triggers needed for complete DORA coverage:
- on-sync-succeeded — fires when app.status.operationState.phase == Succeeded. This is the deployment frequency event.
- on-sync-failed — fires when app.status.operationState.phase == Failed. This is a change failure event.
- on-health-degraded — fires when app.status.health.status == Degraded. When this follows an on-sync-succeeded within 5 minutes, it upgrades that sync to a change failure and starts the MTTR clock.
Add the notification triggers and a Koalr webhook template to the ArgoCD Notifications ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  trigger.on-sync-succeeded: |
    - when: app.status.operationState.phase in ['Succeeded']
      send: [koalr-deployment-webhook]
  trigger.on-sync-failed: |
    - when: app.status.operationState.phase in ['Failed', 'Error']
      send: [koalr-deployment-webhook]
  trigger.on-health-degraded: |
    - when: app.status.health.status == 'Degraded'
      send: [koalr-deployment-webhook]
  template.koalr-deployment-webhook: |
    webhook:
      koalr:
        method: POST
        path: /api/integrations/argocd/webhook
        body: |
          {
            "app": "{{.app.metadata.name}}",
            "revision": "{{.app.status.sync.revision}}",
            "phase": "{{.app.status.operationState.phase}}",
            "health": "{{.app.status.health.status}}",
            "sync_started_at": "{{.app.status.operationState.startedAt}}",
            "sync_finished_at": "{{.app.status.operationState.finishedAt}}"
          }
```

Define the koalr webhook service in the same ConfigMap. In current ArgoCD versions, notification services live in argocd-notifications-cm, and the argocd-notifications-secret Secret holds only credentials, referenced with $-variable syntax:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  service.webhook.koalr: |
    url: https://api.koalr.com
    headers:
    - name: Authorization
      value: Bearer $koalr-api-key
---
apiVersion: v1
kind: Secret
metadata:
  name: argocd-notifications-secret
  namespace: argocd
stringData:
  koalr-api-key: <your-koalr-api-key>
```

Applications opt in to the triggers with a subscription annotation, e.g. notifications.argoproj.io/subscribe.on-sync-succeeded.koalr: "". This configuration emits a structured event to Koalr on every sync outcome and every health degradation — providing real-time deployment frequency, change failure rate, and MTTR data without any polling.
Flux-specific DORA instrumentation
Flux uses its Notification Controller to route reconcile events to external webhooks. The two resources to watch for deployment events are Kustomization and HelmRelease.
Check Kustomization status conditions to determine deployment success:

```bash
kubectl get kustomization api -n flux-system -o jsonpath='{.status.conditions}'
```

A condition with type: Ready and status: True means the reconcile succeeded. The condition message includes the revision (Git SHA or semver tag) that was applied.
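To recover the commit SHA for lead time, parse the revision string (also available in status.lastAppliedRevision). Recent Flux versions format revisions as branch@sha1:sha; older versions used branch/sha. A sketch handling both, as an assumption about those two formats:

```python
def revision_sha(revision):
    """Extract the Git SHA from a Flux revision string.
    Handles 'main@sha1:<sha>' (recent Flux) and 'main/<sha>' (older Flux)."""
    if "@sha1:" in revision:
        return revision.split("@sha1:", 1)[1]
    return revision.rsplit("/", 1)[-1]

print(revision_sha("main@sha1:abc123def456"))  # abc123def456
print(revision_sha("main/abc123def456"))       # abc123def456
```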
Configure the Flux Notification Controller to route events to Koalr. Create an Alert resource:
```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: koalr-dora
  namespace: flux-system
spec:
  providerRef:
    name: koalr-webhook
  eventSeverity: info
  eventSources:
  - kind: Kustomization
    namespace: flux-system
    matchLabels:
      environment: production
  - kind: HelmRelease
    namespace: flux-system
    matchLabels:
      environment: production
```

Pair this with a Provider resource pointing to the Koalr webhook endpoint:

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: koalr-webhook
  namespace: flux-system
spec:
  type: generic
  address: https://api.koalr.com/integrations/flux/webhook
  secretRef:
    name: koalr-webhook-secret
```

Flux emits ReconciliationSucceeded and ReconciliationFailed events that map directly to successful deployment and change failure signals for DORA.
For CLI-based event monitoring during incident response:

```bash
flux get kustomizations --watch
```

Common Kubernetes DORA pitfalls
| Pitfall | What goes wrong | Fix |
|---|---|---|
| Measuring kubectl apply | Apply can succeed while pods crash — counts bad deploys as successes | Wait for rollout status exit code 0 before counting |
| Counting HPA scale events | Inflates deployment frequency with autoscaling noise | Filter for pod spec changes only; ignore replica count changes |
| Missing canary deployments | Partial rollouts are counted before they reach 100%, skewing frequency | Count as deployment only when canary weight reaches 100% |
| Mixing staging deploys into DORA | Staging deploy frequency inflates the number; staging failures inflate CFR | Scope all DORA metrics to production environment label only |
| Using semver tags only | Cannot trace image tag → Git SHA → commit timestamp for lead time | Use SHA-tagged images: sha-abc123 suffix convention |
| Relying on incident tickets | Self-healing deployments that crash and recover are invisible to ticketing systems | Add pod health event watchers for CrashLoopBackOff and OOMKilled post-deploy |
DORA metric sources by K8s tooling
| DORA Metric | ArgoCD | Flux | GitHub Deploys |
|---|---|---|---|
| Deployment Frequency | Sync + Healthy events | ReconciliationSucceeded | Deployment status success |
| Lead Time | sync.revision SHA → commit timestamp | Condition message SHA → commit timestamp | Deployment ref SHA → commit timestamp |
| Change Failure Rate | Sync failed + sync-then-degrade (5 min window) | ReconciliationFailed + health events | Deployment status failure + alerts |
| MTTR | Degraded → Healthy timestamp delta | Failed condition → Ready=True delta | Alert start → availableReplicas restored |
How Koalr handles Kubernetes DORA measurement
Koalr's ArgoCD integration reads sync events directly from the ArgoCD Application CRD and Notifications controller, correlating each sync to its commit SHA for lead time calculation. The GitHub integration reads the GitHub Deployments API for teams using that approach, and correlates deployment events with pull requests and commit history to compute full commit-to-production lead time.
Koalr applies the correct filters automatically: HPA scaling events are excluded from deployment frequency counts, staging and development environment events are separated from production DORA data, and the sync-then-degrade CFR pattern is detected using the 5-minute post-sync health window.
The result is that DORA metrics in Koalr reflect what is actually happening in your Kubernetes production environment — not what Git Events suggest happened, and not what a CI pipeline reported it attempted.
Further reading
For a deeper dive into GitOps-specific DORA measurement with ArgoCD — including the sync-then-degrade CFR pattern, multi-cluster setup, and the full ArgoCD API reference — see the ArgoCD DORA metrics guide. For a foundational overview of all four DORA metrics and how to interpret your scores, see the DORA metrics guide.
Connect your Kubernetes environment to Koalr
Koalr's ArgoCD and GitHub Deployments integrations give you accurate Kubernetes DORA metrics from day one — deployment frequency from actual rollout completions, lead time from commit SHA tracing, CFR from sync failures and post-deploy health degradation, and MTTR from pod recovery signals. Setup takes under 10 minutes.