The Platform Engineering Maturity Model: From Manual Deploys to Risk-Gated Automation
Platform engineering has a natural progression — from ad hoc manual processes through increasing automation, visibility, and intelligence. Most organizations know they are somewhere on this spectrum but are not sure exactly where, or what the next level looks like in practice. This maturity model gives you a clear framework for assessing where your team is and what investments to prioritize.
Where Most Teams Fall
The 2025 DORA survey found the following distribution: 8% of teams at Level 1, 31% at Level 2, 38% at Level 3, 18% at Level 4, and 5% at Level 5. Most engineering organizations are at Level 3 — measuring DORA metrics but not yet using them to gate deployments automatically.
Level 1: Manual Deploys
Definition: Deployments require significant manual steps: engineers SSH into servers, run deployment scripts by hand, and coordinate releases across team members via Slack or email. No automated deployment pipeline exists.
Characteristics:
- Deployment is a scheduled event, not a routine operation
- Only specific engineers have the knowledge and access to deploy
- Deployment failures require the same engineers to diagnose and fix
- No standardized rollback procedure
- Deployment frequency: monthly or less
Risks at this level: The knowledge required to deploy is concentrated in a few individuals, creating a bus factor risk. Deployments are inherently high-stakes events because they are infrequent and complex. Human error in the manual process is a significant failure mode.
Move to Level 2 by: Building a CI/CD pipeline that automates at least the build and test phases. Even a partially automated pipeline that requires manual approval before production is a significant improvement.
Level 2: CI/CD Pipeline
Definition: Code changes trigger an automated pipeline that builds, tests, and (with some form of approval) deploys to production. The deployment process is codified and repeatable.
Characteristics:
- Automated build and test on every PR and every merge to main
- Deployment to staging is automatic on merge
- Deployment to production requires a manual trigger or approval
- Rollback is possible (revert commit, re-deploy prior artifact)
- Deployment frequency: weekly to daily
The limitation: At Level 2, the pipeline knows whether the build succeeded and tests passed. It does not know whether the change is safe to deploy. The deployment decision is still entirely human, based on intuition rather than data.
Move to Level 3 by: Instrumenting DORA metrics. You cannot improve what you do not measure. Add deployment event tracking, incident attribution, and lead time calculation to give leadership and teams visibility into their delivery performance.
Level 3: DORA Metrics
Definition: The organization measures deployment frequency, lead time for changes, change failure rate, and MTTR. These metrics are reviewed regularly and used to identify improvement opportunities.
Characteristics:
- All four DORA metrics calculated and dashboarded
- Metrics reviewed in engineering leadership meetings
- Improvement initiatives tied to specific metric targets
- Incident attribution to deployments is at least partially automated
- Deployment frequency: daily to multiple times per day
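All four metrics reduce to simple arithmetic over two event streams: deployments and incidents. A minimal sketch of the calculation, using hypothetical record shapes (the field names are illustrative, not a real Koalr or DORA API):

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical event records -- field names are illustrative, not a real API.
deployments = [
    {"merged_at": datetime(2025, 6, 2, 9), "deployed_at": datetime(2025, 6, 2, 11), "caused_incident": False},
    {"merged_at": datetime(2025, 6, 3, 10), "deployed_at": datetime(2025, 6, 3, 16), "caused_incident": True},
    {"merged_at": datetime(2025, 6, 4, 9), "deployed_at": datetime(2025, 6, 4, 10), "caused_incident": False},
]
incidents = [
    {"opened_at": datetime(2025, 6, 3, 17), "resolved_at": datetime(2025, 6, 3, 19)},
]
days_in_window = 7

# Deployment frequency: deploys per day over the measurement window.
deploy_frequency = len(deployments) / days_in_window

# Lead time for changes: median merge-to-deploy duration.
lead_time = median(d["deployed_at"] - d["merged_at"] for d in deployments)

# Change failure rate: share of deployments attributed to an incident.
cfr = sum(d["caused_incident"] for d in deployments) / len(deployments)

# MTTR: mean time from incident open to resolution.
mttr = sum((i["resolved_at"] - i["opened_at"] for i in incidents), timedelta()) / len(incidents)
```

The hard part in practice is not the arithmetic but the attribution: reliably linking each incident back to the deployment that caused it, which is why the characteristics above call out automated incident attribution.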
The limitation: DORA metrics are outcome metrics — they tell you what happened after the fact. A 12% change failure rate tells you that something went wrong in 12% of your deployments last quarter. It does not tell you which specific open PR is most likely to become next quarter's incident.
Move to Level 4 by: Adding pre-deployment risk scoring. The transition from Level 3 to Level 4 is the transition from retrospective measurement to predictive risk management.
Level 4: Risk-Scored Deploys
Definition: Every deployment is scored for risk before it merges and deploys. The score is used to route high-risk changes to additional review, adjust deployment timing, and provide deployment windows appropriate to the change's risk profile.
Characteristics:
- Every PR receives a risk score (0–100) before merge
- Risk score incorporates: author file expertise, change size and entropy, coverage delta, DDL detection, review thoroughness, deployment timing
- High-risk PRs are automatically routed to additional reviewers
- CODEOWNERS enforcement is automated and tracked
- Deployment frequency: multiple times per day
- CFR: typically lower than Level 3 — risk scoring catches problems before deploy
The key shift: At Level 4, the platform treats different changes differently based on their risk profile. A one-line config update and a 500-line cross-service refactor do not have the same review requirements or deployment approval process.
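One common way to combine signals like those listed above is a weighted sum of normalized factors. The sketch below is illustrative only — the factor names, weights, and normalization are hypothetical, not Koalr's actual model:

```python
# Hypothetical risk factors, each normalized to 0.0-1.0 (1.0 = riskiest).
# Names and weights are illustrative, not Koalr's actual model.
WEIGHTS = {
    "author_unfamiliarity": 0.20,  # author has little history in the touched files
    "change_size": 0.20,           # lines changed, scaled to 0-1
    "change_entropy": 0.15,        # how scattered the change is across files
    "coverage_drop": 0.15,         # test coverage delta (drops score high)
    "ddl_detected": 0.15,          # schema migrations are inherently riskier
    "thin_review": 0.10,           # little review activity relative to change size
    "risky_timing": 0.05,          # e.g. a Friday-afternoon deploy window
}

def risk_score(factors: dict[str, float]) -> int:
    """Combine clamped, normalized factors into a 0-100 score."""
    raw = sum(WEIGHTS[name] * min(max(value, 0.0), 1.0)
              for name, value in factors.items())
    return round(raw * 100)

# A one-line config tweak vs. a large cross-service refactor with a migration:
small_change = risk_score({"author_unfamiliarity": 0.1, "change_size": 0.02,
                           "change_entropy": 0.1, "coverage_drop": 0.0,
                           "ddl_detected": 0.0, "thin_review": 0.2,
                           "risky_timing": 0.0})
big_refactor = risk_score({"author_unfamiliarity": 0.6, "change_size": 0.9,
                           "change_entropy": 0.8, "coverage_drop": 0.5,
                           "ddl_detected": 1.0, "thin_review": 0.7,
                           "risky_timing": 0.0})
```

Even this toy model separates the two examples cleanly, which is the point of the key shift: the score, not a uniform policy, determines how much scrutiny a change gets.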
Move to Level 5 by: Making the risk-gating fully automated. At Level 4, humans still make the final deployment decision informed by the risk score. At Level 5, the pipeline makes that decision autonomously within defined parameters.
Level 5: Autonomous Risk-Gated Pipeline
Definition: The deployment pipeline makes autonomous deployment decisions for low-risk and medium-risk changes. High-risk changes are automatically blocked and routed to human review. The system learns from outcome data to improve its risk predictions over time.
Characteristics:
- Low-risk changes (score 0–40) deploy automatically on merge, without human approval
- Medium-risk changes (score 41–70) deploy after a minimum review time and CODEOWNERS approval, without additional human escalation
- High-risk changes (score 71+) are automatically blocked at the GitHub Check Run level and require explicit engineering lead sign-off
- Risk model is updated with outcome data — when a high-score PR causes an incident, that outcome is fed back to improve future predictions
- SLO burn rate is incorporated in real time: an elevated burn rate pauses routine deployments
- Deployment frequency: continuous (limited only by CI time and risk score)
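Once a score exists, the gating logic itself is small; the complexity lives in the model behind it. A minimal sketch of the routing described above (the thresholds and the burn-rate check mirror the list, but the function and action names are illustrative):

```python
def gate(score: int, slo_burn_rate_elevated: bool) -> str:
    """Map a 0-100 risk score to a pipeline action.

    Thresholds follow the Level 5 policy sketched above; the burn-rate
    pause applies only to routine (non-blocked) deploys.
    """
    if score > 70:
        return "block"            # fail the check run, require lead sign-off
    if slo_burn_rate_elevated:
        return "pause"            # hold routine deploys until burn rate recovers
    if score > 40:
        return "require_review"   # minimum review time + CODEOWNERS approval
    return "auto_deploy"          # merge deploys straight to production

# Example routing decisions:
assert gate(12, False) == "auto_deploy"
assert gate(55, False) == "require_review"
assert gate(85, False) == "block"
assert gate(12, True) == "pause"
```

Note the ordering: the high-risk block is checked before the burn-rate pause, so a dangerous change is never silently queued — it is rejected outright and surfaced to a human.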
| Level | Deploy Frequency | Typical CFR | Key Capability Gap |
|---|---|---|---|
| 1: Manual | Monthly | > 20% | No automated pipeline |
| 2: CI/CD | Weekly | 10–20% | No visibility into outcomes |
| 3: DORA Metrics | Daily | 7–15% | No predictive risk signal |
| 4: Risk Scoring | Multiple/day | 3–8% | Human still approves everything |
| 5: Autonomous | Continuous | < 3% | — |
Where Most Teams Should Focus
The most impactful transition in this model is from Level 3 to Level 4. The majority of engineering teams have invested in CI/CD and have at least some DORA metric visibility. The move to risk-scored deployments is where the relationship between measurement and action becomes direct — where the data you have been collecting starts preventing incidents rather than just describing them.
Level 5 is achievable but requires significant trust in the risk model. Teams typically reach it by starting with Level 4 for 6–12 months, building confidence in the score accuracy through outcome tracking, and gradually expanding the set of changes that deploy without additional human review.
Koalr takes you from Level 3 to Level 4
If you are already measuring DORA metrics, Koalr adds the deploy risk scoring layer on top of your existing CI/CD and DORA metrics infrastructure, with no rearchitecting required: it scores every PR, routes high-risk changes to additional reviewers, and posts check runs to GitHub that block or warn on high-risk merges. Connect GitHub in 5 minutes and start scoring PRs immediately — the transition from measurement to prevention in one platform.