Engineering Metrics · March 16, 2026 · 10 min read

Engineering Throughput vs. Velocity: Why Teams Measure the Wrong Thing

Sprint velocity is the most widely reported engineering metric and one of the least useful. It measures how many story points a team completed, which reflects estimated effort, not delivery performance. Throughput metrics measure what actually ships. Here is why the distinction matters and how to make the switch.

What this guide covers

  • The velocity trap and why story points lie
  • The throughput metrics that measure real delivery output
  • Why velocity is easy to game and throughput is not
  • How to measure throughput at the team level (not individual)
  • A practical transition playbook for engineering leaders

The Velocity Trap: Measuring Effort, Not Output

Story points were invented to help teams estimate relative effort, not to measure delivery performance. When Ron Jeffries and the XP community developed planning poker in the early 2000s, the intent was better sprint planning — sizing work so teams could make realistic commitments. Velocity was a byproduct: if the team consistently completes 30 points per sprint, you can plan the next sprint around 30 points.

Somewhere between then and now, velocity became a management metric. Teams are expected to report velocity upward, and executives use velocity trends to assess team performance. This creates an immediate perverse incentive: teams optimize for velocity, not for delivery. The easiest way to increase velocity is to inflate point estimates. A 3-point story becomes a 5-point story; an 8-point story becomes 13. Velocity goes up. Delivery speed does not.

The fundamental problem is that story points are a measure of estimated effort input. Throughput is a measure of output — work delivered. A team that completes 40 story points but ships no features to production has zero throughput. A team that ships 3 deployments per day has high throughput regardless of what their story point estimates look like.

How Velocity Lies to You

Inflated estimates

When teams know they are measured on velocity, estimation sessions drift toward higher numbers. Engineers add buffer to protect themselves from missing commitments. Managers who push for higher velocity get higher numbers — not faster delivery. The velocity number climbs while cycle time stays flat or grows.

Cherry-picking easy tasks

High-point stories that require deep investigation, architectural decisions, or external dependencies get deprioritized in favor of lower-risk stories that will definitely close within the sprint. The backlog accumulates hard work while velocity metrics look healthy. This is sometimes called the "low-hanging fruit problem" — teams harvest easy stories to hit their velocity number while the hard, high-value work sits.

Gaming sprint boundaries

Work that is 90% complete at the end of a sprint either gets rushed to closure (introducing bugs) or gets carried over without counting toward that sprint's velocity. Both outcomes distort the signal. Rushed closures increase change failure rate; carryover stories create artificial volatility in velocity numbers that has nothing to do with actual team performance.

Velocity does not account for quality

A team completing 40 story points that produces three production incidents is not performing better than a team completing 25 story points with zero incidents. Velocity is silent on quality. Throughput metrics — when combined with change failure rate — give you both the quantity and the quality signal.

| Dimension | Sprint Velocity | Throughput Metrics |
| --- | --- | --- |
| What it measures | Estimated effort completed | Actual work delivered |
| Unit of measurement | Story points (abstract) | PRs, deployments, features (concrete) |
| Gaming risk | High — inflate estimates | Low — output is observable |
| Comparable across teams | No — teams use different scales | Yes — same units everywhere |
| Quality signal | None | Via change failure rate + MTTR |
| Trend detection | Noisy — estimate drift obscures trends | Clear — output does not drift |

The Throughput Metrics That Actually Matter

Deployment frequency (DORA)

Deployment frequency is the most direct measure of delivery throughput: how often does the team ship working software to production? It is output-focused, hard to game (you either deployed or you did not), and directly correlated with business outcomes. DORA research shows that elite teams deploy multiple times per day; high performers deploy daily to weekly. Tracking deployment frequency over time tells you whether your delivery capability is improving or degrading.
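Computing deployment frequency is a simple aggregation once you have deploy timestamps out of your CI/CD system. A minimal sketch, assuming the deploy dates have already been exported (the export step itself is pipeline-specific):

```python
from collections import Counter
from datetime import date

def deploys_per_week(deploy_dates: list[date]) -> dict[tuple[int, int], int]:
    """Count deployments per (ISO year, ISO week) from a list of deploy dates."""
    counts: Counter = Counter()
    for d in deploy_dates:
        iso = d.isocalendar()
        counts[(iso.year, iso.week)] += 1
    return dict(counts)

# Five deploys across two ISO weeks of March 2026 (illustrative data)
deploys = [date(2026, 3, 2), date(2026, 3, 3), date(2026, 3, 3),
           date(2026, 3, 10), date(2026, 3, 12)]
print(deploys_per_week(deploys))  # {(2026, 10): 3, (2026, 11): 2}
```

Weekly buckets smooth out weekends and holidays; switch the key to the date itself if you want per-day cadence.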

PR throughput (PRs merged per week)

The number of pull requests merged per week is a team-level throughput measure that captures the rate at which code moves through the development pipeline. It is not a quality measure on its own — a high PR throughput with a high change failure rate is a problem — but combined with quality metrics, it reveals whether the team is accelerating or stalling. Segment by repository and by team to identify where throughput is constrained.
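The per-repository segmentation is again a counting exercise once merge events are in hand (pulled from the GitHub API or a webhook log; the fetch is omitted here). A hypothetical sketch:

```python
from collections import defaultdict

def pr_throughput_by_repo(merge_events: list[tuple[str, str]]) -> dict[str, dict[str, int]]:
    """merge_events: (repo, iso_week) pairs, one per merged PR.
    Returns PRs merged per week, segmented by repository."""
    counts: dict = defaultdict(lambda: defaultdict(int))
    for repo, week in merge_events:
        counts[repo][week] += 1
    return {repo: dict(weeks) for repo, weeks in counts.items()}

# Illustrative merge events for two repositories
events = [("api", "2026-W11"), ("api", "2026-W11"), ("web", "2026-W11"),
          ("api", "2026-W12"), ("web", "2026-W12"), ("web", "2026-W12")]
print(pr_throughput_by_repo(events))
```

A repository whose weekly count drops while the others hold steady is the place to look for a review or WIP bottleneck.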

Cycle time

Cycle time measures the elapsed time from when work starts (first commit on a branch, or PR opening) to when it ships to production. Unlike story points, cycle time is objective and cannot be inflated. A team whose cycle time is trending downward is genuinely getting faster. A team whose velocity is trending up but whose cycle time is flat or growing is gaming their estimates.

Measure cycle time at the p50 and p90. The p50 tells you the typical experience; the p90 tells you how bad the outliers are. Teams with high p90 cycle times often have a small number of long-lived PRs or tickets that are masking the team's true capability.
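The p50/p90 split needs nothing beyond the standard library. A sketch, assuming cycle times have already been collected in hours:

```python
import statistics

def cycle_time_p50_p90(hours: list[float]) -> tuple[float, float]:
    """Return (p50, p90) cycle time using inclusive percentile interpolation."""
    cuts = statistics.quantiles(hours, n=10, method="inclusive")
    return statistics.median(hours), cuts[8]  # cuts[8] is the 90th percentile

# Nine quick PRs and one long-lived outlier (hours from first commit to deploy)
hours = [4, 6, 8, 10, 12, 14, 16, 20, 30, 120]
p50, p90 = cycle_time_p50_p90(hours)
print(p50, p90)  # 13.0 39.0
```

Here the p50 looks healthy while the p90 exposes the long-lived outlier, which is exactly the masking pattern described above.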

PR size distribution

Smaller PRs move faster, get better reviews, and introduce fewer bugs. Tracking median PR size over time reveals whether the team is breaking work into small, shippable units or accumulating large batch changes. A downward trend in median PR size typically correlates with improvements in both cycle time and deployment frequency.
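One lightweight way to watch the full distribution, not just the median, is to bucket PRs by total lines changed; the thresholds below are illustrative, not a standard:

```python
from collections import Counter

def size_bucket(lines_changed: int) -> str:
    """Bucket a PR by additions + deletions. Thresholds are illustrative."""
    if lines_changed <= 50:
        return "small"
    if lines_changed <= 250:
        return "medium"
    return "large"

pr_sizes = [12, 48, 90, 300, 25, 800, 140]  # lines changed per merged PR
distribution = Counter(size_bucket(n) for n in pr_sizes)
print(dict(distribution))  # {'small': 3, 'medium': 2, 'large': 2}
```

A healthy trend is the "small" share growing over time; a growing "large" share is an early warning for slower reviews and longer cycle times.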

Features shipped per sprint

Not all throughput metrics need to be GitHub-derived. Tracking the number of user-facing features shipped per sprint — even counting them manually — gives a business-legible throughput measure that executives understand. Unlike story points, features shipped is a concrete output that non-engineers can reason about.

The Throughput Metric Stack

  • Deployment Frequency (source: CI/CD pipeline): deploys per day or week. Direct measure of output cadence. Elite: multiple per day. High: daily.
  • PR Throughput (source: GitHub): PRs merged per week. Reveals delivery rate trends; drops signal review or WIP bottlenecks.
  • Cycle Time (source: GitHub): hours from first commit to deploy. Hard-to-game speed measure; a downward trend means genuine acceleration.

Team Throughput vs. Individual Throughput

One of the most important caveats in any throughput measurement program: throughput metrics should be measured at the team level, not at the individual level.

Individual PR throughput or deployment counts are not meaningful performance indicators. Different engineers work on different types of problems — a week spent on a complex architectural refactor may produce one PR, while a week of bug fixes may produce ten. Comparing these numbers across individuals measures the nature of the work, not the quality of the engineer.

Individual throughput measurement also creates perverse incentives: engineers break work into more, smaller PRs to inflate their count, cherry-pick easy bugs to hit a weekly PR target, and avoid difficult architectural work that takes time but creates real value. These are exactly the same failure modes as individual story point tracking, just with a different metric.

Use throughput metrics to understand team-level delivery capability and trend direction. Use them to identify structural bottlenecks (low PR throughput on a specific repository often indicates a review bottleneck, not a performance problem). Never use them to rank or evaluate individual engineers.

How to Transition from Velocity to Throughput Reporting

Transitioning away from velocity-based reporting requires careful change management. Executives who have been looking at velocity charts for years will push back. The transition is easier when framed as adding accuracy, not changing the metric.

Phase 1: Add throughput metrics alongside velocity (weeks 1–4)

Start collecting deployment frequency, cycle time, and PR throughput without removing velocity from reporting. Show both sets of numbers in engineering reviews. This gives you a baseline and lets stakeholders see the two metrics moving in relationship to each other — often revealing that velocity is increasing while throughput is flat.

Phase 2: Correlate velocity with throughput (weeks 4–8)

Run a correlation analysis: in weeks when the team completed high velocity, did throughput (deployments, cycle time, PRs merged) also improve? For most teams, the correlation is weak. Showing this data to executives is more persuasive than arguing about measurement philosophy.

Phase 3: Shift sprint reviews to throughput language (week 8+)

Reframe sprint reviews around throughput questions: how many deployments did we ship? What was our median cycle time? Did PR throughput increase or decrease? Retire the story point velocity chart from executive dashboards and replace it with a deployment frequency chart and a cycle time trend.

What to tell stakeholders

The most effective framing for executives is simplicity: "Deployment frequency tells us how often we ship value to customers. Cycle time tells us how fast we move from idea to production. Story points tell us how well we estimated effort. The first two are the ones that affect the business."

Throughput Benchmarks for Engineering Teams

Based on DORA research and industry data from 2024–2026, here are the throughput benchmarks that serve as useful targets for different maturity levels:

  • Deployment frequency: Elite teams deploy multiple times per day. High performers deploy daily to weekly. Medium performers deploy weekly to monthly. If your team deploys less than once per week, deployment frequency is a high-leverage improvement target.
  • Cycle time: Elite teams achieve cycle times under 24 hours from commit to production. High performers are typically 1–7 days. Medium performers 1–4 weeks. If your cycle time is measured in weeks, the bottleneck is usually in review wait time or deployment pipeline speed.
  • PR throughput: A 6-engineer team in a healthy delivery cadence typically merges 25–50 PRs per week, depending on PR size norms. Below 15 PRs per week for a team of this size usually indicates either very large PRs or review bottlenecks.
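The benchmarks above collapse into a small classifier that is handy for dashboards; the numeric thresholds are approximate readings of the DORA bands, not official cut-offs:

```python
def deploy_frequency_tier(deploys_per_week: float) -> str:
    """Map weekly deploy counts to a DORA-style tier (thresholds approximate)."""
    if deploys_per_week >= 10:     # multiple deploys per working day
        return "elite"
    if deploys_per_week >= 1:      # daily to weekly
        return "high"
    if deploys_per_week >= 0.25:   # weekly to monthly
        return "medium"
    return "low"

print(deploy_frequency_tier(14))   # elite
print(deploy_frequency_tier(0.5))  # medium
```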

Throughput and the DORA Framework

The DORA four key metrics are, at their core, a throughput measurement framework. Deployment frequency and lead time for changes are both throughput measures — how often you ship and how fast you ship. Change failure rate and MTTR are the quality companion metrics that tell you whether the throughput is sustainable.

Teams that adopt DORA metrics are effectively making the transition from velocity-based to throughput-based reporting, even if they do not frame it that way. The DORA framework provides the academic and industry credibility that helps engineering leaders get executive buy-in for the transition.

See your real throughput trends — without story points

Koalr calculates deployment frequency, PR throughput, and cycle time automatically from your GitHub and CI/CD data — no manual tracking, no story point inflation. Give executives a throughput dashboard that is honest, trend-based, and resistant to gaming. Ask the AI chat "is our delivery throughput increasing or decreasing this quarter?" and get a direct answer backed by your actual data.