DORA Metrics for Bitbucket Cloud Teams in 2026
Bitbucket Cloud gives you repositories, pipelines, and pull requests. It does not give you DORA metrics. Here is exactly how to calculate all four DORA metrics from Bitbucket data — which APIs to use, what the gaps are, and how to fill them.
Why DORA metrics matter for Bitbucket teams
The DORA (DevOps Research and Assessment) metrics — deploy frequency, lead time for changes, change failure rate, and mean time to recovery — are the most widely validated framework for measuring software delivery performance. The research behind them, now published annually in the State of DevOps Report, links high performance on these four metrics to better organizational outcomes: faster feature delivery, higher reliability, and more sustainable engineering teams.
For Bitbucket Cloud teams, DORA metrics represent a gap. Bitbucket provides the raw data — commits, pull requests, pipeline runs, deployment events — but it does not calculate DORA metrics from that data. The Insights tab that previously surfaced some PR analytics was removed by Atlassian. In 2026, if you want DORA metrics from your Bitbucket repos, you need to either build your own measurement pipeline or use a third-party tool like Koalr.
This guide walks through each DORA metric in detail: what data you need, where it lives in Bitbucket, and what is genuinely missing. Understanding the data model makes it easier to evaluate which tools are calculating DORA metrics accurately versus using approximations that produce misleadingly clean numbers.
Deploy frequency: what Bitbucket knows and what it does not
Deploy frequency measures how often your team successfully deploys to production. Bitbucket Pipelines generates two types of events that are relevant here: pipeline run completion events and deployment environment events.
A pipeline run completion event fires every time a pipeline finishes — whether it is a build, a test suite, a deployment, or an arbitrary script. Not all pipeline runs are deployments. If you use a separate deployment step within a pipeline that targets a named deployment environment (staging, production), Bitbucket tracks this through the Deployments API, which records the environment name, the pipeline run that triggered it, the commit SHA, and the deployment timestamp.
To calculate accurate deploy frequency, you need to use Bitbucket's Deployments API, not the general pipeline runs API. A common mistake is counting pipeline completions as deployments — this overcounts significantly if you run separate pipelines for tests, builds, and deployments.
The configuration requirement is the key gap: to use the Deployments API, pipelines must be configured with named deployment environments. Teams that have not done this — where deployments are just steps in a pipeline without environment declarations — produce deployment data that the Deployments API cannot surface. In those cases, deploy frequency requires inference from pipeline step naming conventions, which is unreliable.
Once you have deployment events, classifying your team into DORA performance tiers is straightforward. Elite performers deploy multiple times per day. High performers deploy between once per day and once per week. Medium performers deploy between once per week and once per month. Low performers deploy less frequently than once per month.
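The filtering and tier classification described above can be sketched in a few lines of Python. The record fields (`environment`, `state`) are illustrative stand-ins for what the Deployments API returns, not its exact response schema:

```python
def production_deploys(deployments):
    """Keep only successful deployments to the production environment.

    `deployments` is a list of dicts with illustrative `environment`
    and `state` keys, derived from Deployments API records.
    """
    return [
        d for d in deployments
        if d["environment"] == "production" and d["state"] == "successful"
    ]


def dora_tier(deploy_count, period_days):
    """Classify deploy frequency into a DORA performance tier."""
    per_day = deploy_count / period_days
    if per_day > 1:
        return "elite"    # multiple deploys per day
    if per_day >= 1 / 7:
        return "high"     # between once per day and once per week
    if per_day >= 1 / 30:
        return "medium"   # between once per week and once per month
    return "low"          # less than once per month
```

Note that the tier boundaries follow calendar days, matching how the DORA research reports frequency bands.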
Lead time for changes: the three-part calculation
Lead time for changes measures the time from a commit being made to that commit running in production. For most teams, this is not a single number — it is the sum of three distinct phases that each have different drivers and different interventions when they are slow.
The three phases of lead time, with the Bitbucket data source for each:
Phase 1: Coding time. The time from the first commit on a branch to the PR being opened. This is available from Bitbucket's git data — the commit timestamp on the first commit that diverged from the base branch, compared to the PR creation timestamp. This phase is often called "time to open PR" and represents how long a developer worked on a feature before asking for review.
Phase 2: Review time. The time from PR creation to PR merge. This breaks down further into time to first review (PR created → first review comment or approval) and review resolution time (first review → merge). Bitbucket's Pull Requests API records the PR creation timestamp, all review event timestamps, and the merge timestamp. This is the most complete DORA-relevant data Bitbucket natively exposes.
Phase 3: Deployment time. The time from PR merge to the commit appearing in a production deployment. This requires correlating the PR merge event with a Bitbucket Deployments API event that includes the merge commit SHA. If your deployment pipeline runs automatically on merge to main, this phase is typically short — minutes, not hours. If deployments are manual or scheduled, this phase can add hours or days to your lead time.
The total lead time is the sum of all three phases. Most teams are surprised by how much of their lead time is in Phase 2 (review) versus Phase 3 (deployment). Review time is typically the largest component, and it is also the most directly improvable through team practices: smaller PR sizes, explicit reviewer assignment via CODEOWNERS, and review SLAs all reduce this phase.
What Bitbucket does not give you natively is a calculated lead time metric. The raw timestamps are available, but calculating lead time requires joining commit data, PR data, and deployment data across three separate API endpoints — work that a purpose-built DORA tool does automatically.
Change failure rate: what counts as a failure in Bitbucket
Change failure rate (CFR) measures the percentage of deployments that cause a failure requiring remediation — a hotfix, a rollback, or an incident response. It is calculated as failed deployments divided by total deployments.
The challenge with CFR in any platform, including Bitbucket, is defining what constitutes a failure. There are three types of failure signals available from Bitbucket data:
Pipeline failures in production: If a deployment pipeline step fails in the production environment, this is a clear failure signal. Bitbucket's Deployments API records deployment status (successful, failed, in-progress), so failed deployments are directly accessible. This is the cleanest failure signal and the easiest to use.
Revert commits: A revert commit (one that reverses a previous commit) is a strong signal that a change caused a problem. Bitbucket's git data includes commit messages, and revert commits follow a standard naming pattern that tools can detect. This signal requires parsing commit message patterns, which is slightly more complex but reliable for teams that use standard git practices.
Hotfix PRs: A PR merged to main that was branched from main (rather than from a feature branch) and merged quickly — within hours of a production deployment — is a candidate hotfix signal. This inference is weaker than the previous two signals and produces false positives for teams with frequent small PRs.
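The first two signals can be sketched in a few lines. The revert pattern below matches git's default revert subject line (`Revert "original subject"`); the CFR formula is the one given above:

```python
import re

# git's default revert subject line: Revert "original commit subject"
REVERT_PATTERN = re.compile(r'^Revert ".*"')


def is_revert(commit_message):
    """Detect a standard git revert commit from its message."""
    return bool(REVERT_PATTERN.match(commit_message))


def change_failure_rate(total_deployments, failed_deployments):
    """CFR = failed deployments / total deployments, as a percentage."""
    if total_deployments == 0:
        return 0.0
    return 100.0 * failed_deployments / total_deployments
```

Teams that rewrite revert subject lines (for example via squash merges) will need a broader pattern, which is one reason this signal is only reliable under standard git practices.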
What Bitbucket does not give you is a calculated CFR metric. It also does not surface incident data — if a deployment caused an on-call alert at 2 AM but was resolved without a revert or a hotfix, Bitbucket has no record of the failure. Accurate CFR for production incidents requires an incident management integration (PagerDuty, OpsGenie, or incident.io).
Elite DORA performers have a change failure rate below 5%. High performers are 5–10%. For context, teams often discover their actual CFR is higher than they expected when they start measuring it formally — because informal rollbacks and quick hotfixes were not previously counted.
MTTR: the metric Bitbucket cannot calculate alone
Mean time to recovery (MTTR) measures the average time to restore service after a production failure. This metric has no native data source in Bitbucket — by definition, MTTR requires an incident signal (when did the failure begin) and a resolution signal (when was service restored), neither of which Bitbucket tracks.
The standard approach for MTTR in a Bitbucket environment is to integrate an incident management tool:
- PagerDuty: Incident open and close timestamps are available via the PagerDuty API. MTTR is calculated as the average time from incident.triggered to incident.resolved across production incidents in a measurement period.
- OpsGenie: Similar to PagerDuty — alert created and alert closed timestamps, filtered to production severity alerts.
- incident.io: Incident declared and incident resolved timestamps, with the added benefit of structured severity and impact metadata.
If your team does not have an incident management tool and relies on ad-hoc Slack notifications or email alerts, MTTR is effectively unmeasurable. This is a gap worth addressing independently of DORA measurement — structured incident management improves response time regardless of whether you are tracking the metric.
Elite DORA performers have MTTR under one hour. High performers restore service in under one day. For context, teams that are measuring MTTR for the first time often discover their actual recovery time is significantly longer than their intuition suggests — because informal resolution events were not timestamped.
Pulling it together: the Bitbucket DORA data model
To calculate all four DORA metrics from Bitbucket Cloud, you need data from these sources, in order of availability:
- Bitbucket Pull Requests API: PR creation, review events, approval events, merge events. This is the most complete data source for lead time Phase 2 and a partial signal for CFR.
- Bitbucket Deployments API: Deployment events tied to named environments. Required for deploy frequency and lead time Phase 3. Requires proper pipeline configuration to be useful.
- Bitbucket git data (commits API): Commit timestamps, commit messages (for revert detection), branch ancestry (for lead time Phase 1). Available but requires pagination handling for large repos.
- Incident management tool API (external): Required for MTTR. Not available from Bitbucket. Integrate PagerDuty, OpsGenie, or incident.io.
The practical challenge is that joining these data sources correctly requires significant engineering effort if you are building the measurement pipeline yourself. A purpose-built tool like Koalr handles the API integration, data joins, and metric calculation automatically — and surfaces the results in a dashboard without requiring you to maintain a custom data pipeline.
| DORA Metric | Bitbucket Data Source | Native Support | Key Gaps |
|---|---|---|---|
| Deploy Frequency | Bitbucket Pipelines deployment events (deployment environment API) | Partial — pipeline runs visible, no DORA-tier classification | No classification into elite/high/medium/low tiers. Pipeline runs ≠ deployments without environment configuration. |
| Lead Time for Changes | Commit timestamps + PR merge time + Pipelines deployment event time | None — no native lead time metric | Requires correlating git commit data with PR merge events and deployment pipeline completion. Bitbucket exposes the raw data but calculates nothing. |
| Change Failure Rate | Pipeline failure events + manual rollback detection via revert PRs | None — no native failure rate metric | Bitbucket Pipelines does not distinguish between a failed deployment and a failed test run. Rollbacks require detecting revert commit patterns. |
| MTTR | Not available in Bitbucket — requires incident management tool integration | None | MTTR requires an external incident signal (PagerDuty, OpsGenie, or incident.io). Bitbucket has no native incident tracking. |
Common mistakes when calculating DORA from Bitbucket
Teams building their own DORA measurement from Bitbucket data consistently run into the same pitfalls. Avoiding these produces more accurate baselines and more useful trend data.
- Counting pipeline runs as deployments. The most common mistake. Bitbucket Pipelines runs include builds, test runs, and deployments. Only pipeline steps that target a production deployment environment should be counted. Use the Deployments API, not the Pipelines API, for this.
- Starting lead time at PR open, not first commit. Lead time for changes is defined from first commit to production, not from PR creation to production. Teams that start at PR creation systematically undercount their lead time by the length of the coding phase, which hides long development cycles.
- Using merge events as the deployment event. If your deployment pipeline does not run automatically on merge, or if it runs and then has a separate approval gate, using the merge timestamp as the deployment timestamp understates lead time Phase 3. Always use the actual deployment event timestamp.
- Excluding weekends from lead time. DORA research uses calendar time, not business hours. Excluding weekends makes lead time look better than it is and obscures the business impact of slow delivery cycles.
- Ignoring revert commits in CFR. Teams that do not count revert commits as failures systematically undercount their change failure rate. A revert commit is a deployment that caused a problem — it should always count.
Getting started with Bitbucket DORA metrics
If you are starting from scratch with DORA measurement on Bitbucket, the fastest path to a working dashboard is to connect a purpose-built analytics tool rather than build your own measurement pipeline. The API integration, data joining, and metric calculation are solved problems — the value is in the insights the metrics produce, not in building the pipeline.
Koalr connects directly to Bitbucket Cloud via OAuth, backfills 90 days of PR and deployment history, and produces a DORA dashboard with per-team and per-repo breakdowns on first login. It also calculates PR cycle time by stage and provides deploy risk scoring for every PR — capabilities that go beyond DORA and address the deployment reliability gaps that DORA metrics reveal but do not solve.
If you want to build your own pipeline, the Bitbucket Cloud REST API documentation for the Deployments and Pull Requests endpoints is the right starting point. Budget for significant pagination handling and rate limit management before your data collection pipeline is production-ready.
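As one illustration of the pagination pattern: Bitbucket Cloud list endpoints wrap results in a `values` array and link to the next page via a `next` URL. A minimal generator, with the HTTP call injected so retry and rate-limit handling can live inside the `fetch` callable, might look like:

```python
def paginate(fetch, url):
    """Yield every item across a paginated Bitbucket Cloud endpoint.

    `fetch` is any callable that takes a URL and returns the decoded
    JSON dict (e.g. a thin wrapper around requests.get that also
    sleeps and retries on rate-limit responses). Iteration stops when
    a page has no `next` link.
    """
    while url:
        page = fetch(url)
        yield from page.get("values", [])
        url = page.get("next")
```

Keeping the transport behind a callable also makes the collection logic testable against canned pages, without hitting the live API.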
DORA metrics for your Bitbucket repos — ready in 30 minutes
Koalr connects to Bitbucket Cloud and calculates all four DORA metrics, PR cycle time, and deploy risk scores from your actual pipeline and PR data. No spreadsheets, no custom pipelines.