DevSecOps Metrics: How to Measure Security as Part of Your DORA Pipeline
Most engineering teams track DORA metrics in one tool and vulnerability counts in another. Security incidents end up as Change Failure Rate spikes with no upstream signal. This post walks through the 8 DevSecOps metrics that close that gap — and how to wire Snyk, GitHub Advanced Security, and SonarCloud into the same dashboard your engineering team already uses.
The IBM finding that changes the math
IBM's Systems Sciences Institute research found that fixing a security vulnerability at coding time costs roughly one-sixth as much as finding and remediating it after production deployment. That 6× multiplier is the economic foundation of shift-left security — and it is why DevSecOps metrics belong in your engineering velocity dashboard, not just a quarterly security report.
What is DevSecOps and what does "shift left" actually mean?
DevSecOps is the practice of integrating security testing directly into the development and deployment pipeline — rather than treating security as a post-release gate enforced by a separate team. The "shift left" metaphor refers to moving security checks earlier on the software delivery timeline: from the right side (production, post-deployment review) toward the left side (code review, CI build, pre-merge check).
The goal is to find vulnerabilities when they are cheapest to fix. A vulnerable dependency caught by a Snyk scan during a PR review costs a developer 10 minutes to remediate — upgrade a package, push a new commit, done. The same vulnerability caught after a production breach costs an incident response team hours, involves legal review, requires customer notification under GDPR or SOC 2, and lands as a Change Failure Rate event in your DORA dashboard. The code change is identical. The cost is not.
In practice, shift-left security means: static analysis (SAST) runs in the CI pipeline on every commit; dependency scanning (SCA) runs on every PR that modifies a package manifest; secret detection runs as a pre-commit hook and again in CI; container image scanning runs before every deploy to a production environment. None of these checks require a security team member to take action — they are automated gates built into the same pipeline that runs your unit tests and builds your Docker image.
What DevSecOps adds to DORA is a set of measurable signals that connect security posture to engineering velocity. Without these signals, your DORA Change Failure Rate includes security incidents with no leading indicator — you see the spike after the incident, with no upstream data showing the vulnerability that caused it was detectable three weeks earlier. DevSecOps metrics close that gap.
The 8 DevSecOps metrics to track
1. Mean Time to Remediate (MTTR for security)
Security MTTR is the time elapsed from vulnerability discovery — when a CVE is first detected by your scanning tooling — to when the patched version is deployed to production. This is distinct from general DORA MTTR (time to restore service after an incident); security MTTR measures proactive remediation, not reactive recovery.
Industry targets vary by severity tier. A commonly applied framework, consistent with CISA's Known Exploited Vulnerabilities remediation guidance and FedRAMP requirements:
- **Critical** — Actively exploitable or on the CISA KEV catalog. Emergency patch required. Block deploys of affected services until resolved.
- **High** — Significant risk but not actively exploited. Schedule in next sprint. Surface in team security debt review.
- **Medium** — Backlog candidate. Track trend. If count is growing across sprints, escalate priority.
- **Low** — Accepted risk in most contexts. Track for compliance reporting. Remediate opportunistically during dependency upgrades.
Security MTTR is most useful as a trend metric rather than an absolute number. If your team's median MTTR for critical CVEs moves from 18 hours to 72 hours over three months, something in the remediation process has broken — even if 72 hours is within some published benchmark. Trending is the signal; the target is the floor.
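As a minimal sketch of how the trend can be computed, the function below takes a list of vulnerability records with detection and remediation timestamps and returns the median MTTR in hours for one severity tier. The record shape (`severity`, `detected_at`, `remediated_at`) is a hypothetical export format, not any specific tool's schema:

```python
from datetime import datetime
from statistics import median

def security_mttr_hours(records, severity):
    """Median hours from CVE detection to patched deploy for one severity tier.

    Records still open (no remediated_at) are excluded -- they belong in the
    open-debt count, not the remediation-speed metric.
    """
    durations = [
        (datetime.fromisoformat(r["remediated_at"])
         - datetime.fromisoformat(r["detected_at"])).total_seconds() / 3600
        for r in records
        if r["severity"] == severity and r.get("remediated_at")
    ]
    return median(durations) if durations else None

records = [
    {"severity": "critical", "detected_at": "2024-03-01T09:00:00",
     "remediated_at": "2024-03-02T03:00:00"},   # 18 hours
    {"severity": "critical", "detected_at": "2024-03-05T10:00:00",
     "remediated_at": "2024-03-05T22:00:00"},   # 12 hours
    {"severity": "high", "detected_at": "2024-03-01T09:00:00",
     "remediated_at": None},                    # still open -> excluded
]
print(security_mttr_hours(records, "critical"))  # 15.0
```

Running this weekly and plotting the median per severity tier gives the trend line the section describes; the absolute target is only the floor.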
2. Vulnerability introduction rate
The vulnerability introduction rate measures net-new CVEs introduced per deployment, sourced from Snyk or Dependabot scanning. A deployment that upgrades five packages and resolves two high CVEs while introducing one medium CVE has a net introduction rate of -1 (net improvement). A deployment that adds a new dependency with three transitive high-severity vulnerabilities has a rate of +3.
This metric is most valuable at the PR level — it answers the question "did this change make our security posture better or worse?" before the merge, not after. Tracking it as a per-deployment aggregate over time gives you a directional view of whether your team is net-reducing or net-increasing vulnerability debt with each release cycle. See Snyk + DORA: How to Track Security Debt as an Engineering Metric for the detailed scoring formula Koalr applies to this signal.
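The arithmetic above can be sketched directly: net introduction rate per deployment, plus the running total across a release cycle. The `cves_introduced` / `cves_resolved` fields are an assumed per-deployment record, not a specific scanner's output:

```python
def net_introduction_rate(deploy):
    """Net-new CVEs for one deployment: introduced minus resolved.

    Positive = security posture worsened; negative = net improvement.
    """
    return len(deploy["cves_introduced"]) - len(deploy["cves_resolved"])

def cumulative_debt_trend(deploys):
    """Running sum of the net introduction rate across a release cycle."""
    total, trend = 0, []
    for d in deploys:
        total += net_introduction_rate(d)
        trend.append(total)
    return trend

deploys = [
    # Resolves two CVEs, introduces one: net -1 (the example from the text)
    {"cves_introduced": ["CVE-2024-1111"],
     "cves_resolved": ["CVE-2023-9999", "CVE-2023-8888"]},
    # New dependency pulls in three vulnerable transitives: net +3
    {"cves_introduced": ["CVE-2024-2222", "CVE-2024-3333", "CVE-2024-4444"],
     "cves_resolved": []},
]
print([net_introduction_rate(d) for d in deploys])  # [-1, 3]
print(cumulative_debt_trend(deploys))               # [-1, 2]
```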
3. Dependency vulnerability debt
Dependency vulnerability debt is the snapshot count of known CVEs present in your production dependencies, broken down by severity tier: critical, high, medium, and low. This is the Snyk dashboard number — but tracking it as a time-series metric rather than a static count changes its meaning from "how bad is it right now" to "is the team reducing debt faster than new CVEs are being introduced."
A healthy engineering organization's dependency vulnerability debt chart looks like a sawtooth with a declining average: CVE counts spike when new vulnerabilities are published against packages you depend on, then drop as the team applies patches. If the sawtooth baseline is flat or rising over quarters, the team is not keeping pace with the rate of CVE disclosure in their dependency ecosystem.
Per-repo and per-team breakdown is more actionable than an org-wide aggregate. A single repository with 40 open high-severity CVEs owned by a team that is underwater on feature work is a prioritization conversation, not a dashboard metric.
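A per-repo breakdown can be derived from any flat export of open CVEs. The sketch below assumes a list of records with `repo` and `severity` fields and ranks repositories by critical-plus-high count, surfacing the ones that warrant a prioritization conversation:

```python
from collections import Counter

def debt_by_repo(open_cves):
    """Group open CVEs into per-repo severity counts."""
    debt = {}
    for cve in open_cves:
        debt.setdefault(cve["repo"], Counter())[cve["severity"]] += 1
    return debt

def worst_repos(open_cves, n=3):
    """Repos ranked by critical + high count -- the prioritization list."""
    debt = debt_by_repo(open_cves)
    return sorted(debt,
                  key=lambda r: debt[r]["critical"] + debt[r]["high"],
                  reverse=True)[:n]

open_cves = [
    {"repo": "payments", "severity": "high"},
    {"repo": "payments", "severity": "high"},
    {"repo": "payments", "severity": "critical"},
    {"repo": "web",      "severity": "low"},
    {"repo": "web",      "severity": "high"},
]
print(worst_repos(open_cves, n=2))  # ['payments', 'web']
```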
4. Secret leak rate
The secret leak rate measures the number of secrets — API keys, tokens, passwords, certificates, private keys — committed to version control per month. Secrets committed to Git are a persistent exposure even after the commit is removed from the main branch: they remain in Git history, in forks, in GitHub search indexes, and potentially in third-party services that scan public repositories.
Detection tooling includes Gitleaks (open source), TruffleHog, and GitHub Advanced Security's secret scanning. Gitleaks can be configured as a pre-commit hook, blocking the commit at the developer's workstation before it reaches the remote. GitHub Advanced Security secret scanning runs on every push and alerts within seconds of a secret landing on a branch.
The target is zero. A non-zero secret leak rate is not a performance metric to optimize — it is a process failure. Each leaked secret requires immediate revocation and rotation, a review of access logs to determine if the secret was used by unauthorized parties, and a post-incident review to identify why the pre-commit hook or CI gate failed to catch it. Tracking the rate over time identifies whether the prevention tooling is working and whether specific teams or repositories account for a disproportionate share of leaks.
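Computing the monthly rate is a simple bucketing exercise over alert creation timestamps. The `created_at` field name mirrors secret-scanning alert exports, but the record shape here is a simplified assumption:

```python
from collections import Counter
from datetime import datetime

def monthly_leak_rate(alerts):
    """Count secrets committed per calendar month from alert creation times."""
    return Counter(
        # Normalize a trailing "Z" so fromisoformat accepts UTC timestamps
        datetime.fromisoformat(a["created_at"].replace("Z", "+00:00")).strftime("%Y-%m")
        for a in alerts
    )

alerts = [
    {"created_at": "2024-01-03T12:00:00Z"},
    {"created_at": "2024-01-28T09:30:00Z"},
    {"created_at": "2024-02-14T16:45:00Z"},
]
print(monthly_leak_rate(alerts))  # Counter({'2024-01': 2, '2024-02': 1})
```

Since the target is zero, any non-empty month in this counter is an input to a post-incident review, not a number to optimize.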
5. SAST finding rate
The SAST (Static Application Security Testing) finding rate measures security vulnerabilities identified by static analysis tools — SonarCloud, CodeQL, Semgrep — per 1,000 lines of code on changed files. This normalizes the finding count against code volume, making it comparable across repositories of different sizes and between teams with different code output rates.
SAST tools catch a different class of vulnerability than dependency scanners. Where Snyk finds CVEs in packages you import, SAST tools find vulnerabilities in code you write: SQL injection, cross-site scripting (XSS), insecure deserialization, hardcoded credentials, path traversal, and so on. A high SAST finding rate in a repository indicates either a training gap (developers are not applying secure coding patterns) or a legacy codebase that was written before security gates were in place.
SonarCloud exposes SAST findings via its API, segmented into security hotspots (code that requires manual review) and confirmed vulnerabilities (code that is definitively insecure). The new_security_hotspots metric — hotspots on changed code only — is the quality gate signal for PR-level security review. This scope (new code only) prevents legacy technical debt from blocking all new development while still enforcing security standards on code being actively written.
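Extracting that signal from a SonarCloud measures response can be sketched as below. The response shape is simplified and should be verified against your SonarCloud API version; in particular, new-code metrics like `new_security_hotspots` arrive under a `period`/`periods` key rather than a plain `value`:

```python
def new_hotspots(measures_response):
    """Pull the new-code security hotspot count from a (simplified)
    api/measures/component response."""
    for m in measures_response["component"]["measures"]:
        if m["metric"] == "new_security_hotspots":
            # New-code metrics report their value per analysis period
            period = m.get("period") or (m.get("periods") or [{}])[0]
            return int(period.get("value", 0))
    return 0  # metric absent -> no new hotspots reported

sample = {
    "component": {
        "key": "my_org_my_repo",  # hypothetical project key
        "measures": [
            {"metric": "new_security_hotspots", "period": {"value": "2"}},
        ],
    }
}
print(new_hotspots(sample))  # 2
```

A non-zero return here is exactly the condition that should fail the PR-level quality gate.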
6. License compliance rate
The license compliance rate measures the percentage of open-source dependencies in your codebase that carry licenses compatible with your commercial licensing model. This is a legal risk metric, not a security metric — but it belongs in a DevSecOps dashboard because non-compliant licenses are discovered by the same SCA tooling (Snyk, FOSSA, Black Duck) and require the same remediation workflow (identify, replace, or get legal clearance).
The most common license compliance failures in commercial software: GPL-licensed packages included in proprietary codebases without reciprocal source release obligations being met; AGPL packages in SaaS products (AGPL requires source disclosure when software is used as a network service); and LGPL packages linked in ways that trigger copyleft requirements. Snyk's license compliance scanning identifies these automatically and categorizes each dependency by license type and risk tier.
A target of 100% compliance is appropriate for most commercial software companies. The metric is most useful during dependency addition — catching a GPL package during PR review is trivially cheap; discovering it during a legal audit before an acquisition is not.
7. Security gate pass rate
The security gate pass rate measures the percentage of pull requests that pass all automated security checks — dependency scanning, SAST, secret detection, license compliance — without a manual override by a developer or security team member. This metric captures the effectiveness of your shift-left security tooling as a preventive control, not just a detection control.
A gate pass rate of 85% means 15% of PRs are either failing security checks and being merged anyway (via override) or triggering manual security review. Breaking that 15% down by override reason is more actionable than the rate alone: overrides for "no fix available" (the vulnerable package has no patched version yet) are different from overrides for "low risk, accepted by security team" versus overrides where no reason was recorded — the last category represents process failure.
Teams with high override rates on their security gates typically have one of three problems: gates are too strict (blocking on medium and low CVEs that the team legitimately accepts), gates are not integrated into the review workflow (developers see a red check and merge anyway), or the remediation path is too expensive (fixing the flagged issue requires a major version upgrade with breaking changes). Each problem has a different solution.
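The pass rate and the override breakdown the previous paragraphs describe can be computed from a per-PR gate log. The `gate` / `override_reason` fields are an assumed record shape:

```python
from collections import Counter

def gate_pass_rate(prs):
    """Fraction of PRs that passed all security gates without an override."""
    return sum(1 for p in prs if p["gate"] == "passed") / len(prs)

def override_breakdown(prs):
    """Overridden PRs grouped by recorded reason. A missing reason is itself
    a process failure, so it gets its own bucket."""
    return Counter(
        p.get("override_reason") or "no reason recorded"
        for p in prs if p["gate"] == "overridden"
    )

prs = [
    {"gate": "passed"},
    {"gate": "passed"},
    {"gate": "passed"},
    {"gate": "overridden", "override_reason": "no fix available"},
    {"gate": "overridden"},  # override with no reason -> process failure
]
print(gate_pass_rate(prs))      # 0.6
print(override_breakdown(prs))  # {'no fix available': 1, 'no reason recorded': 1}
```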
8. Penetration test escape rate
The penetration test escape rate measures the percentage of vulnerabilities found during external penetration testing that were not previously caught by automated security scanning in the CI pipeline. This is a lagging indicator — pen tests happen quarterly at most — but it is the ground-truth measure of whether your automated security tooling is actually catching the vulnerabilities that matter.
A high escape rate (pen test finds many issues that automated tooling missed) indicates gaps in your automated security coverage: the scanning tools are not covering the right attack surfaces, their rule sets are outdated, or the application has architectural vulnerabilities that static analysis cannot detect (business logic flaws, authentication design issues, privilege escalation paths). A declining escape rate over successive pen tests indicates that the automated tooling is improving and covering more of the attack surface.
Track pen test findings by category (OWASP Top 10 classification) to identify whether the gaps are concentrated in specific vulnerability classes — this informs which SAST rules or manual review checklists to add to the pipeline.
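A sketch of both computations, assuming each pen-test finding is recorded with an OWASP category and a flag for whether automated scanning had already caught it (hypothetical field names):

```python
from collections import Counter

def escape_rate(findings):
    """Share of pen-test findings the automated pipeline had not caught."""
    if not findings:
        return 0.0
    return sum(1 for f in findings if not f["caught_by_automation"]) / len(findings)

def escapes_by_category(findings):
    """Escaped findings bucketed by OWASP Top 10 category, to show where
    SAST rules or review checklists have gaps."""
    return Counter(f["owasp_category"]
                   for f in findings if not f["caught_by_automation"])

findings = [
    {"owasp_category": "A03:Injection",             "caught_by_automation": True},
    {"owasp_category": "A01:Broken Access Control", "caught_by_automation": False},
    {"owasp_category": "A01:Broken Access Control", "caught_by_automation": False},
    {"owasp_category": "A07:Auth Failures",         "caught_by_automation": False},
]
print(escape_rate(findings))  # 0.75
```

Here the concentration in broken access control would point at authorization review checklists rather than more dependency scanning.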
Integrating security into the DORA pipeline
Change Failure Rate should include security incidents
The DORA definition of Change Failure Rate counts deployments that result in a degraded service or require remediation. Most teams implement this by counting incidents, on-call pages, or rollbacks. Security incidents — a breach, an exploited vulnerability, an unauthorized access event — meet the DORA definition of a deployment failure: they result in degraded service or require an emergency remediation deployment.
The practical implementation is straightforward: your incident management system (PagerDuty, incident.io, Opsgenie) should have a "security incident" category. Incidents in that category are tagged and counted in your Change Failure Rate calculation alongside performance and availability incidents. This makes the security impact on DORA visible — and creates the feedback loop that motivates investment in shift-left security tooling.
Lead time for security fixes
General DORA Lead Time measures the time from code commit to production deployment. Security lead time is a distinct variant: the time from vulnerability discovery to patched deployment. These are different workflows — security patches often bypass normal sprint planning, require expedited review, and may involve coordinating with external package maintainers — and they should be tracked separately.
Tracking security lead time over time reveals whether your emergency remediation process is improving. A team that takes 72 hours to deploy a critical CVE patch in Q1 and 18 hours in Q3 has meaningfully improved its security response capability — even if their regular deployment lead time has not changed. This improvement is invisible if you only track general lead time.
Deployment frequency gate: blocking deploys on critical CVEs
The most direct integration of security into the DORA pipeline is a deployment frequency gate: a CI/CD check that blocks deployments introducing net-new critical CVEs. This is implemented as a pipeline step that runs snyk test --json against the build artifact, parses the output for critical-severity findings, and fails the build if any are present.
```yaml
# GitHub Actions: Snyk security gate
- name: Snyk security scan
  run: |
    snyk test --json --severity-threshold=critical > snyk-output.json || true
    CRITICAL_COUNT=$(jq '[.vulnerabilities[] | select(.severity == "critical")] | length' snyk-output.json)
    if [ "$CRITICAL_COUNT" -gt "0" ]; then
      echo "FAIL: $CRITICAL_COUNT critical CVEs found. Deployment blocked."
      exit 1
    fi
  env:
    SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
```

Note that the gate threshold should match your remediation SLA. If your policy requires critical CVEs resolved within 24 hours, the gate blocks any deployment with an unresolved critical CVE older than 24 hours — not just newly introduced ones. This prevents a pattern where a team defers a critical CVE indefinitely by never touching the affected package.
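The age-aware variant of the gate can be sketched as follows. The scan output alone carries no age information, so this assumes the pipeline maintains a first-seen ledger (CVE id mapped to first detection time); both the ledger and the SLA table are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical SLA policy: critical CVEs must be resolved within 24 hours
SLA = {"critical": timedelta(hours=24)}

def sla_blocking_vulns(vulnerabilities, first_seen, now):
    """Return vulnerability ids that should block this deploy: any finding
    whose severity has an SLA and whose age exceeds the SLA window.

    `vulnerabilities` mirrors the `vulnerabilities` array of a scan report;
    `first_seen` is the pipeline-maintained ledger of first detection times.
    """
    blocking = []
    for v in vulnerabilities:
        window = SLA.get(v["severity"])
        seen = first_seen.get(v["id"])
        if window and seen and now - seen > window:
            blocking.append(v["id"])
    return blocking

now = datetime(2024, 6, 10, 12, 0, tzinfo=timezone.utc)
vulns = [
    {"id": "SNYK-JS-AAA-1", "severity": "critical"},
    {"id": "SNYK-JS-BBB-2", "severity": "critical"},
]
first_seen = {
    "SNYK-JS-AAA-1": datetime(2024, 6, 8, 12, 0, tzinfo=timezone.utc),  # 48h old: blocks
    "SNYK-JS-BBB-2": datetime(2024, 6, 10, 2, 0, tzinfo=timezone.utc),  # 10h old: in grace window
}
print(sla_blocking_vulns(vulns, first_seen, now))  # ['SNYK-JS-AAA-1']
```

The fresh critical CVE still surfaces as a warning; only the one that has overstayed its SLA fails the build.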
Snyk integration for DevSecOps
Snyk is the most widely adopted SCA (Software Composition Analysis) tool for engineering teams and provides the richest API for pulling vulnerability data into a DORA dashboard. The key endpoints for DevSecOps metric collection:
```
# Vulnerability count by severity (current snapshot)
GET /orgs/{org_id}/issues?filter[severity][]=critical&filter[severity][]=high&filter[status]=open

# Per-project health score (0–100)
GET /orgs/{org_id}/projects/{project_id}
# Response includes: attributes.health_score (Snyk's composite security score)

# PR-level vulnerability delta (via Check Run result)
GET /repos/{owner}/{repo}/check-runs?check_name=Snyk
# Parse conclusion and output.summary for new CVE count
```

Snyk's project health score (0–100) is a composite metric that accounts for vulnerability count, severity, age, and fixability. A health score below 70 indicates significant unresolved vulnerability debt. Tracking health scores by repository over time provides a single number for executive reporting without losing the per-CVE detail that engineering teams need for remediation planning.
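Turning a page of the issues endpoint into the severity counts a dashboard needs is a one-liner. The `data[].attributes.effective_severity_level` path below is a sketch of the REST response shape and should be verified against your Snyk API version:

```python
from collections import Counter

def severity_counts(issues_page):
    """Tally open issues by severity from one page of a Snyk issues response."""
    return Counter(
        issue["attributes"]["effective_severity_level"]
        for issue in issues_page["data"]
    )

page = {"data": [
    {"attributes": {"effective_severity_level": "critical"}},
    {"attributes": {"effective_severity_level": "high"}},
    {"attributes": {"effective_severity_level": "high"}},
]}
print(severity_counts(page))  # Counter({'high': 2, 'critical': 1})
```

Persisting these counts per day per repo produces the sawtooth time series described under dependency vulnerability debt.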
The PR-level vulnerability delta — "this PR introduced 2 new high CVEs" — is the most actionable DevSecOps signal for developers. It surfaces at review time, when the change is still cheap to make, and it names the specific CVEs and the packages that introduced them. Koalr surfaces this in the Snyk integration as part of the deploy risk score on every PR. For the full breakdown of how these signals are weighted, see Snyk + DORA: How to Track Security Debt as an Engineering Metric.
GitHub Advanced Security for DevSecOps
GitHub Advanced Security (GHAS) provides three native security capabilities that integrate directly with the GitHub Pull Request workflow — no external tool required if your code is on GitHub.
Code scanning with CodeQL
CodeQL is GitHub's semantic code analysis engine. Unlike pattern-matching SAST tools, CodeQL builds a database of your code's data flows and queries it for security vulnerabilities: does user-controlled input ever reach a SQL query without sanitization? Does a deserialization call operate on untrusted data? CodeQL outputs findings in SARIF (Static Analysis Results Interchange Format), which GitHub renders in the Security tab and as PR annotations directly on the vulnerable line of code.
Code scanning alerts are queryable via the GitHub API, making them a pull target for DORA dashboards:
```
# List open code scanning alerts
GET /repos/{owner}/{repo}/code-scanning/alerts?state=open&severity=critical,high

# Alert shape includes:
#   number, state, severity, rule.id, rule.name,
#   most_recent_instance.location (file + line),
#   created_at, updated_at
```

Secret scanning
GitHub secret scanning runs on every push to any branch in a GHAS-enabled repository. It detects over 200 secret patterns from providers including AWS, Azure, Google Cloud, Stripe, Twilio, and GitHub itself. When a secret is detected, GitHub immediately alerts the repository owner and, for participating secret providers, sends an automatic revocation request to the provider — the secret is invalidated before most humans would even see the alert.
Secret scanning alerts are queryable for metric collection, giving you the data required to calculate secret leak rate:
```
GET /repos/{owner}/{repo}/secret-scanning/alerts?state=open

# For resolved (already remediated) secrets:
GET /repos/{owner}/{repo}/secret-scanning/alerts?state=resolved&resolution=revoked
```

Dependabot security updates
Dependabot generates automated pull requests when a dependency has a known CVE and a patched version is available. These PRs include the CVE details, severity, and the version upgrade required to resolve it. For DevSecOps metric tracking, Dependabot PRs are a signal in themselves: a repository with 15 open Dependabot security PRs that have been open for 30+ days is accumulating vulnerability debt faster than the team is processing remediation work.
Tracking the Dependabot PR merge rate — percentage of Dependabot security PRs merged within the target MTTR window — gives you a process health metric that is distinct from vulnerability count. A team with excellent Dependabot merge rate but worsening vulnerability count is being outpaced by CVE disclosure rates in their dependency ecosystem, which is a different problem (dependency selection strategy) than a team with a poor Dependabot merge rate (prioritization problem).
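The merge-rate calculation can be sketched from a flat export of Dependabot PRs. The `security`, `opened_at`, and `merged_at` fields are an assumed record shape, not the raw GitHub API response:

```python
from datetime import datetime, timedelta

def dependabot_merge_rate(prs, window=timedelta(days=7)):
    """Share of Dependabot security PRs merged within the target MTTR window.

    Unmerged PRs count against the rate -- an open security PR is
    unremediated debt, not a pending success.
    """
    security = [p for p in prs if p.get("security")]
    if not security:
        return None
    merged_in_window = sum(
        1 for p in security
        if p.get("merged_at")
        and (datetime.fromisoformat(p["merged_at"])
             - datetime.fromisoformat(p["opened_at"])) <= window
    )
    return merged_in_window / len(security)

prs = [
    {"security": True,  "opened_at": "2024-05-01T09:00:00", "merged_at": "2024-05-03T09:00:00"},  # in window
    {"security": True,  "opened_at": "2024-05-01T09:00:00", "merged_at": "2024-05-20T09:00:00"},  # too slow
    {"security": True,  "opened_at": "2024-05-01T09:00:00", "merged_at": None},                   # still open
    {"security": False, "opened_at": "2024-05-01T09:00:00", "merged_at": "2024-05-01T10:00:00"},  # not security
]
print(dependabot_merge_rate(prs))  # 1 of 3 security PRs in window
```

The 7-day window here is an illustrative default; set it to your actual severity-tier SLA.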
SonarCloud security metrics
SonarCloud adds SAST coverage to the DevSecOps stack. Its security-relevant metrics surface in PR quality gates and via its REST API.
Security hotspots and the security rating
SonarCloud distinguishes between security hotspots (code that requires human review to determine if it is vulnerable — it may or may not be, depending on context) and vulnerabilities (code that is definitively insecure). This distinction matters for metric tracking: hotspot count is a review burden metric, vulnerability count is a confirmed risk metric.
The SonarCloud security rating (A through E) maps to vulnerability counts:
| Rating | Condition | Action |
|---|---|---|
| A | 0 vulnerabilities | No action required |
| B | At least 1 minor vulnerability | Schedule remediation |
| C | At least 1 major vulnerability | Prioritize in current sprint |
| D | At least 1 critical vulnerability | Block deployment |
| E | At least 1 blocker vulnerability | Emergency remediation required |
Quality gate: new code only
The most effective SonarCloud quality gate configuration for DevSecOps focuses on new code — code changed in the current PR — rather than the entire codebase. This prevents legacy security debt from blocking all new development while still enforcing security standards on code being actively written. The key quality gate condition is new_security_hotspots: if any new security hotspots are introduced in this PR, the gate fails and the PR is blocked until the hotspot is either reviewed and marked safe (with justification) or remediated.
This configuration embeds security review into the code review workflow without creating a separate security review queue. The developer who wrote the code is presented with the hotspot inline in the PR, reviews it in context, and either fixes it or documents why it is acceptable. This is the operational definition of shift-left security: the review happens at the cheapest possible moment.
Quantifying security debt: the business case
Remediation cost by severity tier
Security debt has a dollar cost that engineering teams rarely make explicit in prioritization conversations. Research from the Ponemon Institute and IBM's Cost of a Data Breach Report provides the underlying numbers; the following framework translates severity tiers into remediation cost estimates:
- **Low/medium severity, proactive fix** — Developer time to identify, upgrade package, test, and deploy. No incident involved.
- **High severity (~$5,000)** — Expedited triage, security review, potential breaking change to remediate, extended testing cycle.
- **Critical severity (~$15,000)** — Emergency patch cycle, on-call involvement, incident coordination, post-fix verification across environments.
- **Exploited vulnerability ($50,000+)** — Incident response, forensics, legal, customer notification, regulatory reporting, reputation cost. Can reach millions for material breaches.
An organization's security debt floor is: (open critical CVE count × $15,000) + (open high CVE count × $5,000). This gives the minimum expected remediation cost assuming no exploitation — a floor, not a ceiling. If a critical CVE is exploited, the cost floor jumps to $50,000+ per incident.
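The formula is trivial to codify, which is exactly the point: it turns a Snyk severity snapshot into a dollar figure for prioritization conversations. The per-tier costs are the estimates from this section, not universal constants:

```python
def security_debt_floor(critical_count, high_count,
                        critical_cost=15_000, high_cost=5_000):
    """Minimum expected remediation cost in dollars, assuming no exploitation.

    Cost defaults are this article's estimates; tune them to your org's
    actual incident and remediation data.
    """
    return critical_count * critical_cost + high_count * high_cost

# Example: 3 open critical CVEs and 12 open high CVEs
print(security_debt_floor(3, 12))  # 105000
```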
The IBM shift-left research quantifies the multiplier directly: a vulnerability found and fixed during code review costs 1× (baseline). The same vulnerability found in QA costs 5× — testing overhead, delay, redeployment. Found in production but before exploitation: 15×. Found after exploitation: 30×+. Shift-left security is not a compliance exercise — it is a cost reduction program with a measurable ROI.
Compliance-as-code metrics
For organizations subject to SOC 2, HIPAA, PCI-DSS, or FedRAMP, compliance requirements map directly to measurable engineering metrics. The "compliance-as-code" approach codifies policy requirements as automated checks that run in CI/CD pipelines and produce measurable pass/fail rates.
Policy violation rate (OPA/Gatekeeper)
Open Policy Agent (OPA) with Kubernetes Gatekeeper enforces security policies at admission time — before a workload can be deployed to the cluster. Policies can require that containers do not run as root, that all images come from approved registries, that resource limits are set, or that specific security contexts are configured. The policy violation rate measures how often deployment attempts are blocked by OPA policies, indicating misconfigurations in infrastructure-as-code being caught pre-production rather than discovered in a security audit.
Compliance scan pass rate
Tools like Chef InSpec, Prowler (AWS), or Security Command Center (GCP) run compliance scans against your cloud infrastructure and produce pass/fail results against specific control frameworks. Tracking the compliance scan pass rate over time — percentage of controls passing — gives a continuous compliance posture metric rather than a point-in-time audit finding. A declining pass rate in the weeks before an audit is an early warning signal, not a surprise.
Infrastructure drift detection
Infrastructure drift — the state of running infrastructure diverging from what is defined in Terraform or CloudFormation — is a security risk when the drift represents configuration weaknesses introduced outside the IaC pipeline (manual console changes, emergency patches that were never codified). Running terraform plan in CI provides a drift detection signal: if plan shows changes that were not intentionally made, drift exists. Measuring the drift detection rate — how often plan detects unexpected changes — over time indicates whether your infrastructure governance process is working.
Making DevSecOps metrics actionable in your team's workflow
The metric collection described above is only useful if it changes behavior. The behavioral change DevSecOps metrics are designed to produce is specific: developers review security signals at PR time rather than ignoring them; teams prioritize vulnerability remediation in sprint planning rather than deferring it indefinitely; engineering managers see security debt alongside velocity metrics rather than receiving a separate quarterly security report.
The most effective integration point is the pull request review. Surfacing the vulnerability delta, the open critical count, and the SAST findings directly in the PR — not in a separate Snyk dashboard or SonarCloud report — puts the information in front of the developer at the moment of maximum leverage. This is what Koalr's Snyk integration does: it pulls the three most predictive vulnerability signals from Snyk and incorporates them into the deploy risk score visible on every PR, alongside the code-structure signals (change size, author expertise, review coverage) that the team already uses to assess deployment risk.
The result is a unified pre-deployment signal: not "this PR is large and touches a critical path" and separately "this PR introduces a high CVE (check Snyk)," but a single risk score that accounts for both — with the contributing signals broken down in a panel the developer sees before clicking merge. Security becomes part of the engineering team's risk vocabulary, not a separate audit function that shows up after the fact.
For teams beginning the DevSecOps journey, the practical starting point is two metrics: security MTTR for critical CVEs and the vulnerability introduction rate per PR. These two metrics, tracked weekly and reviewed in sprint retrospectives, produce the behavioral change that makes all the other metrics improve. MTTR creates urgency around remediation; introduction rate creates awareness at the moment of writing code. Everything else — SAST finding rate, license compliance, pen test escape rate — is refinement once those two fundamentals are in place.
See also: How Dependency Vulnerabilities Affect Deployment Risk Scores for a deep dive into how Koalr incorporates open CVEs as a leading indicator in its deploy risk model.
Surface vulnerability deltas as deploy risk signals in Koalr
Koalr's Snyk integration pulls vulnerability delta, unresolved critical count, and fix lag into every PR's deploy risk score — giving developers the security context they need before they click merge, not after an incident. Connect Snyk in under 5 minutes.