Value Stream Mapping for Software Engineering: How to Find and Eliminate Delivery Bottlenecks
Most teams know their delivery is slower than it should be. Sprint retrospectives surface the same frustrations — PRs sitting unreviewed, QA becoming a bottleneck, deploys queued behind a weekly release window. Value stream mapping gives you the framework to stop guessing about where time goes and start measuring it with precision.
What this guide covers
- What value stream mapping is and where it comes from
- The 7 waste categories in software delivery
- A step-by-step mapping process with a real worked example
- How VSM connects to DORA metrics
- How to design a future state targeting 70%+ flow efficiency
What is Value Stream Mapping?
Value stream mapping (VSM) is a lean manufacturing technique that originated in the Toyota Production System. Toyota used it to visualize every step in a physical production line — from raw materials arriving at the factory to finished vehicles leaving it — and identify where time was being lost to waiting, rework, and unnecessary movement.
Adapted for software engineering, VSM maps the full flow from a customer request — a backlog item, a feature ticket, a bug report — to working software running in production. Every step between those two endpoints is captured: refinement sessions, sprint planning, coding, code review, CI/CD pipelines, QA sign-off, deployment, and post-deploy monitoring. Each step is tagged with two measurements: the active time (when someone or something is actually working on it) and the wait time (when it sits idle, queued, or blocked).
The reason VSM is powerful is that it makes invisible time visible. Most engineering teams can tell you how long coding takes. Almost none can tell you, without looking at the data, that a typical feature spends 7.5 days sitting idle between active work sessions. The wait time is the waste. And waste, once measured, can be systematically eliminated.
For engineering managers and platform engineers, VSM is the diagnostic tool that turns vague delivery frustrations into a specific bottleneck you can act on.
The 7 Waste Categories in Software Delivery
Lean manufacturing identified 7 categories of waste (muda) in physical production. Each maps directly to a software delivery equivalent. Recognizing which category your team's slowdowns fall into is the first step toward eliminating them.
1. Overproduction
In manufacturing, overproduction means making more units than customers ordered. In software, it means building features nobody uses. This is arguably the most expensive waste category — it consumes engineering time, adds maintenance burden, increases complexity, and never generates value. Teams optimizing for story point velocity without measuring feature adoption are especially prone to overproduction.
2. Waiting
Waiting is the dominant waste in most software value streams. A pull request sitting in the review queue for 1.5 days before anyone looks at it. A CI/CD build queued behind 15 other jobs. A deploy waiting for the Thursday release window because that is when the release manager is available. Waiting is pure delay with no value added — and it is the waste that shows up most clearly when you run a VSM exercise on your own delivery pipeline.
3. Defects
Bugs that escape to production, rework caused by unclear requirements, tests that are re-written because the first implementation was wrong. Every defect is work that must be done twice. The later in the value stream a defect is discovered, the more expensive it becomes to fix — a bug caught in code review costs far less than one caught by a customer three weeks after deploy.
4. Transport and Handoffs
Each time work crosses a team boundary — from product to design, from design to engineering, from engineering to QA, from QA to operations — there is a handoff. Each handoff introduces a wait state (the receiving team is busy), a context translation cost (things get misunderstood in the transfer), and a feedback loop delay (the originating team does not hear about problems until much later). The classic requirements → design → dev → QA → ops waterfall is essentially a sequential chain of transport waste.
5. Over-processing
Doing more work than the customer actually requires. Excessive documentation written for PRs that nobody reads. Design reviews that add two weeks of calendar time for features that are already well-understood. Gold-plating: adding polish and micro-optimizations to features before they have proven they will be used at all. Over-processing often feels productive in the moment because people are busy — but it is not producing customer value.
6. Motion
Context switching between tasks. Being pulled from a complex piece of work to join an emergency meeting. Jumping between three different Jira tickets in the same afternoon because all three are technically "in progress." Every context switch costs a re-ramp time estimated at 15–20 minutes before an engineer reaches flow state again. Teams carrying high work-in-progress (WIP) compound this problem: more things in flight means more interruptions and more motion waste.
7. Inventory
In manufacturing, inventory is raw materials or finished goods piling up in a warehouse. In software, inventory is work that has been started but not finished: unmerged pull requests that have been sitting approved for days, tickets stuck in "In Progress" with no recent commits, backlog items groomed and estimated but never actually pulled into a sprint. Inventory waste is particularly insidious because it looks like progress — tickets are moving through the workflow — but nothing has actually been delivered to users.
How to Map Your Software Delivery Value Stream
A VSM exercise has six steps. The first three establish your current state; the last three identify the bottleneck and design the improvement.
Step 1: Define the value stream start and end
The start of your value stream is the point at which a customer request enters your system. For most engineering teams this is a backlog item being created in Jira or Linear — a feature request, bug report, or technical task that has been acknowledged and placed in the queue. The end of the value stream is working software deployed to production and available to users.
This definition is important because it captures the full lead time, not just the engineering portion. Teams that define their value stream start as "sprint planning" or "coding begins" hide the planning and refinement wait time, which is often substantial.
Step 2: Map all steps in the current state
Walk through every step between your start and end points and write each one down. A typical medium-sized engineering team will have steps like: backlog refinement, sprint planning, coding, code review, CI/CD pipeline, QA sign-off, deployment, and post-deploy monitoring confirmation. Do not skip steps because they seem small — a 15-minute deploy process preceded by a 2-day deploy window wait is very different from a 15-minute automated deploy.
Step 3: Measure active time vs. wait time for each step
For each step, estimate two numbers: the active time (how long a person or system is actively working on the ticket) and the wait time (how long the ticket sits between being ready for this step and someone actually starting it). Be honest with the estimates. The wait time numbers will feel uncomfortable — that discomfort is the entire point.
If you have GitHub, Jira, or Linear data, you can make these estimates data-driven. Jira timestamps show when a ticket entered each status and when it left. GitHub PR timestamps show when a PR was opened and when the first review was submitted. Use the data if you have it; use honest estimates if you do not.
Step 4: Calculate total lead time and flow efficiency
Sum all active times and all wait times separately. Total lead time is their sum. Flow efficiency — the key metric from VSM — is calculated as:
Flow efficiency = (Total active time / Total lead time) × 100%

Industry research suggests the average software team operates at 15–40% flow efficiency. Elite teams reach 50% or above. If your first VSM exercise produces a number below 25%, that is completely normal — and it means the upside from improvement is very large.
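In code, the calculation is a one-liner. A minimal sketch, using the totals from the worked example later in this guide:

```python
def flow_efficiency(total_active: float, total_wait: float) -> float:
    """Flow efficiency = (total active time / total lead time) x 100%."""
    return 100.0 * total_active / (total_active + total_wait)

# Worked example from this guide: ~2.5 days active, ~7.5 days waiting,
# for a 10-day total lead time.
print(f"{flow_efficiency(2.5, 7.5):.0f}%")  # 25%
```

The same function reports the impact of any improvement: cutting wait time from 7.5 days to 0.5 days lifts the result from 25% to roughly 83%.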
Step 5: Identify the biggest bottleneck
Look at your wait times by step. One step will almost always have a disproportionately large wait time relative to its active time. That is your bottleneck — the single point in the value stream causing more delay than any other. For most software teams, it is either PR review wait (work piling up waiting for a reviewer) or manual deploy gate (work queued behind a scheduled release window).
The Theory of Constraints principle applies here: improving a non-bottleneck step does not improve overall throughput. All improvement effort should be focused on the bottleneck until it is no longer the slowest step, at which point the bottleneck will have shifted to the next-worst step.
Step 6: Design the future state
Once you have identified the bottleneck, design a target future state that eliminates or reduces it. Quantify the improvement: if you eliminate the 2-day deploy window wait with continuous deployment, what does the new flow efficiency look like? Set a target and a timeline, then track the metrics as you make the change.
A Real VSM Example
The table below shows a value stream map for a typical medium-performing engineering team — the kind of numbers that come up repeatedly when teams do their first honest VSM exercise. The pattern is consistent: lots of active work happening efficiently, surrounded by long idle wait states.
| Step | Active Time | Wait Time | Notes |
|---|---|---|---|
| Backlog refinement | 30 min | 3 days | Ticket waiting for next refinement session |
| Sprint planning | 1 hour | 0 | Happens same session as refinement pull |
| Coding | 2 days | 0 | Active development, no idle time |
| PR review | 1 hour | 1.5 days | Waiting for reviewer to pick up the PR |
| CI/CD build | 15 min | 0 | Automated, runs immediately on merge |
| QA sign-off | 2 hours | 1 day | Waiting for QA availability in staging |
| Deploy (manual gate) | 30 min | 2 days | Waiting for Thursday deploy window |
| Totals | ~2.5 days | ~7.5 days | Flow efficiency: 25% |
This team's engineers are working efficiently when they are working. The coding, review, and QA steps each happen at reasonable speed. The problem is that work spends 75% of its time sitting idle — waiting for a meeting, waiting for a reviewer, waiting for a deploy window. Total elapsed time from backlog to production: 10 days. Actual work performed: 2.5 days. Three of the four wait states (PR review, QA handoff, deploy gate) are addressable with process or tooling changes.
How DORA Metrics Relate to Value Stream Mapping
DORA metrics and VSM are measuring the same underlying delivery system from different angles. Understanding the relationship between them helps teams use both frameworks more effectively.
Lead time for changes is the most direct DORA equivalent of total VSM lead time. In DORA terms, lead time is measured from first commit to production deployment — capturing most of the coding, review, CI/CD, and deploy steps in the value stream. A VSM exercise tells you exactly which steps are driving lead time up, giving you the specific targets for improvement that DORA alone cannot provide.
PR review wait time is consistently the largest single component of lead time for most software teams. The 1.5-day review wait in the example above is not unusual — industry data suggests median PR review time-to-first-review across mid-sized engineering organizations is between 4 and 24 hours, with long tails extending to multiple days on complex PRs. Reducing this wait is the highest-leverage lead time improvement most teams can make.
Manual deploy approval is the second most common bottleneck. The 2-day deploy window wait in the example means work that is fully reviewed, tested, and ready to ship sits idle for 20% of the 10-day total lead time for purely process reasons. Teams with automated continuous deployment eliminate this wait entirely — and that improvement flows directly into both better lead time and higher deployment frequency metrics.
Change failure rate represents the cost of defects escaping the value stream. If your QA step is a bottleneck and you eliminate it by shifting quality left, you need to ensure your automated test coverage picks up the work the manual QA was doing — otherwise you eliminate the wait but increase your CFR. VSM improvements should always be evaluated against change failure rate to confirm the bottleneck was resolved rather than bypassed.
The connection to lead time
If you are already tracking DORA metrics, lead time for changes is the aggregate output of your entire value stream. VSM tells you exactly which step is consuming that lead time. The two frameworks are most useful together.
What to Measure With Tools
A VSM exercise run as a workshop can produce useful directional insights, but the estimates are vulnerable to optimism bias — teams tend to underestimate wait times and overestimate active work time. Instrument your value stream from actual data sources wherever possible.
PR review wait: GitHub API
The time from a PR being opened to the first review comment or approval is available from the GitHub Pull Requests API. Pull created_at from the PR and the earliest submitted_at from its reviews. The delta is your time-to-first-review. Track this as a median across all PRs in a rolling 30-day window, broken down by repository and team. Outliers — PRs sitting more than 48 hours without a review — are your clearest signal of reviewer bottleneck.
CI/CD pipeline duration: GitHub Actions or CircleCI
GitHub Actions exposes workflow run durations via the Workflow Runs API. For each completed run on your main branch, capture created_at and updated_at to get total pipeline duration. Track this over time — unexplained increases in pipeline duration are usually caused by test suite growth without parallelization, flaky tests causing retries, or dependency caching failures.
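The duration calculation itself is simple; a sketch against the shape of one element of the `workflow_runs` array (note the caveat in the docstring about queue time):

```python
from datetime import datetime

def run_duration_minutes(run: dict) -> float:
    """Total wall-clock minutes for one workflow run.

    `run` is one element of `workflow_runs` from
    GET /repos/{owner}/{repo}/actions/runs. created_at -> updated_at
    includes any time the run spent queued before a runner picked it up;
    if you want execution time only, start from `run_started_at` instead.
    """
    start = datetime.fromisoformat(run["created_at"].replace("Z", "+00:00"))
    end = datetime.fromisoformat(run["updated_at"].replace("Z", "+00:00"))
    return (end - start).total_seconds() / 60

run = {"created_at": "2024-05-01T10:00:00Z",
       "updated_at": "2024-05-01T10:25:00Z"}
print(run_duration_minutes(run))  # 25.0
```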
PR merge-to-deploy lag: GitHub Deployments API
After a PR is merged, how long before its changes reach production? Correlate PR merge timestamps with the subsequent deployment that includes the merge commit SHA. This gap captures CI/CD duration plus any manual deploy gate time. If this number is consistently above 4 hours for an organization targeting high performance, a manual gate is almost certainly involved.
Deploy frequency: GitHub Environments API
Count successful production deployments per day, week, or month using the GitHub Deployments API filtered to your production environment with a success status. Deploy frequency is both a DORA metric and a proxy for how well you have eliminated deploy gate waste — teams deploying continuously have no deploy window wait by definition.
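One way the weekly count might look, assuming the timestamps have already been filtered to successful production deployments as described above:

```python
from collections import Counter
from datetime import datetime

def deploys_per_week(timestamps: list[str]) -> dict:
    """Count deployments per ISO week, keyed by (year, week)."""
    weeks = Counter()
    for ts in timestamps:
        year, week, _ = datetime.fromisoformat(
            ts.replace("Z", "+00:00")).isocalendar()
        weeks[(year, week)] += 1
    return dict(weeks)

print(deploys_per_week([
    "2024-05-02T10:00:00Z", "2024-05-02T16:00:00Z",  # two deploys, same Thursday
    "2024-05-09T10:00:00Z",                           # one the following week
]))  # {(2024, 18): 2, (2024, 19): 1}
```

A team stuck on a weekly deploy window shows one cluster per week; a continuously deploying team shows counts spread across every working day.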
Ticket cycle time: Jira or Linear
In Jira, the time a ticket spends in each workflow status is available via the issue changelog. In Linear, issue state transitions are similarly timestamped. Tracking how long tickets spend in "In Progress" versus "In Review" versus "In QA" gives you a data-driven VSM at the ticket level, covering the planning and handoff portions of the value stream that GitHub data alone cannot capture.
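A sketch of the per-status accumulation, run against a simplified view of the changelog (real entries come from the Jira issue changelog endpoint, keeping only items where the changed field is "status"; the status names below are examples):

```python
from collections import defaultdict
from datetime import datetime

def time_in_status_hours(changelog: list[tuple[str, str, str]]) -> dict:
    """Hours a ticket spent in each status.

    `changelog` is (timestamp, from_status, to_status) per transition,
    in chronological order. Each interval between consecutive transitions
    is credited to the status the ticket entered at the interval's start.
    """
    totals = defaultdict(float)
    for (t1, _, status), (t2, _, _) in zip(changelog, changelog[1:]):
        start = datetime.fromisoformat(t1)
        end = datetime.fromisoformat(t2)
        totals[status] += (end - start).total_seconds() / 3600
    return dict(totals)

log = [
    ("2024-05-01T09:00:00", "To Do",       "In Progress"),
    ("2024-05-03T09:00:00", "In Progress", "In Review"),
    ("2024-05-04T21:00:00", "In Review",   "Done"),
]
print(time_in_status_hours(log))  # {'In Progress': 48.0, 'In Review': 36.0}
```

Aggregated over a quarter's worth of tickets, these per-status totals are effectively an automated VSM of the planning and handoff portion of the stream.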
Common Bottlenecks and How to Eliminate Them
PR review wait — the most common bottleneck
The most effective interventions for PR review wait are structural rather than cultural. Setting a review SLA (for example, first review within 4 hours of opening) creates a clear expectation, but it only works if engineers are assigned reviews automatically rather than waiting for someone to volunteer. Use GitHub's CODEOWNERS file to auto-assign reviewers based on file ownership — this eliminates the diffusion of responsibility that causes PRs to go unreviewed. Pair this with keeping PR size small. PRs under 200 lines of changed code are reviewed 40% faster on average than PRs over 400 lines, and they accumulate smaller wait times because reviewers prioritize shorter reviews.
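A minimal CODEOWNERS sketch (the `@acme` team names are placeholders; the file lives at `.github/CODEOWNERS`, and GitHub uses the last matching pattern for each changed file):

```
# .github/CODEOWNERS -- owners are auto-requested as reviewers on PRs
# that touch matching files. Last matching pattern wins.
*           @acme/platform-reviewers
/api/       @acme/backend-team
/web/       @acme/frontend-team
*.tf        @acme/infra-team
```

Pairing this with a branch protection rule that requires code owner review makes the auto-assignment binding rather than advisory.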
Manual deploy gates
The right fix for a manual deploy gate is to remove it by building the confidence that made the gate necessary in the first place. This means automated testing at the appropriate coverage level, a staged rollout strategy (canary deployments or feature flags) that limits blast radius, and automated rollback triggered by monitoring alerts. Once the safety net is automated, the manual gate becomes redundant — not a shortcut around it but an unnecessary layer on top of better controls.
Long CI/CD builds
Build times above 10 minutes significantly increase the friction cost of each change, which reduces deploy frequency and increases batch size (engineers wait and batch changes rather than committing frequently). The standard interventions: parallelize test suites across multiple workers, cache dependencies between runs, run fast unit tests first and fail early before triggering slower integration tests, and split monolithic test suites by domain. A CI pipeline that takes 25 minutes can often be brought below 8 minutes with parallelization and caching alone.
QA handoffs
Manual QA sign-off is a serializing constraint — one QA engineer can only review one feature at a time, and QA availability creates a queue that grows proportionally with team output. The shift-left testing approach addresses this at the source: move the testing responsibility earlier (to developers writing tests alongside code) and later (to monitoring and canary analysis in production) rather than concentrating it in a pre-deploy manual gate. Automated regression suites that run on every PR replace the manual QA check for regression coverage. Feature flags allow gradual rollouts that make the binary "ready / not ready" QA decision less consequential.
Sprint planning ceremony overhead
Teams spending 4–8 hours per sprint in planning ceremonies are incurring a significant motion and over-processing cost. Two alternatives reduce this. First, continuous flow (Kanban-style) eliminates the batch planning cycle entirely — work flows through the value stream continuously rather than in sprint-size batches. Second, for teams that need sprints for coordination, investing in better asynchronous refinement (detailed written tickets with acceptance criteria, recorded architecture decisions) reduces the time required in synchronous planning meetings.
Designing Your Future State
Once you have identified your primary bottleneck, design a future state value stream that eliminates it. A realistic target for teams at 25% flow efficiency is 50–60% as a 6-month goal, with 70%+ as the longer-term target for teams reaching elite performance.
Applied to the example VSM above, a future state targeting 70%+ flow efficiency might look like this:
Eliminate QA handoff: replace the 2-hour manual QA review and 1-day wait with automated testing (unit + integration + end-to-end for critical paths) combined with feature flags for gradual rollout. Active time decreases to near-zero; wait time drops to zero.
Automate deployment: continuous deployment from main branch with canary analysis replaces the Thursday deploy window. The 2-day deploy gate wait disappears. The 30-minute deploy active time is retained but now happens immediately after CI passes.
CODEOWNERS auto-assignment with 4-hour SLA: reduces the PR review wait from 1.5 days to under 4 hours. Smaller PRs (enforced via a PR size limit policy) further reduce review time.
One wait remains from the current state: the 3-day refinement queue. Moving to continuous, asynchronous refinement (per the sprint planning alternatives above) removes it. The revised future state: ~2.5 days active time, ~0.5 days wait time. Flow efficiency: 83%. Total lead time from backlog to production: 3 days instead of 10.
Tools for VSM in Software Engineering
A VSM exercise can start with nothing more than a whiteboard and honest time estimates. But tooling that automates data collection turns VSM from a quarterly workshop into a continuously updated view of where your delivery pipeline is losing time.
Manual workshop tools
Miro and Lucidchart both support VSM diagram templates and are commonly used for initial current-state mapping workshops. They are good for getting a team aligned on the shape of the value stream but require manual data entry and do not update automatically as the team's delivery patterns change.
Automated measurement
Koalr measures your value stream automatically from your existing GitHub, Jira, and Linear data. PR review wait, CI/CD duration, merge-to-deploy lag, ticket cycle time by status — all calculated from live integration data without manual instrumentation. The result is a continuously updated view of where time is being lost in your delivery pipeline, with team and repository breakdowns that make bottleneck identification fast. When you make a process change — adding a review SLA, enabling continuous deployment — the impact is visible in the metrics within days rather than waiting for the next retrospective.
Other platforms in the engineering analytics space — Waydev, LinearB, Swarmia — offer some visibility into PR cycle times and deploy frequency, but none provide an integrated VSM view that ties together ticket age by status, PR review wait, CI duration, and deploy lag in a single flow efficiency calculation.
| Capability | Koalr | LinearB | Waydev | JIRA Adv. Roadmaps |
|---|---|---|---|---|
| PR review wait tracking | ✓ | ✓ | ✓ | ✗ |
| Ticket cycle time by status | ✓ | ✓ | ✗ | Partial |
| Flow efficiency calculation | ✓ | ✗ | ✗ | ✗ |
| CI/CD duration tracking | ✓ | Partial | ✗ | ✗ |
| Deploy risk prediction | ✓ | ✗ | ✗ | ✗ |
Getting Started
The fastest way to run your first VSM exercise is to schedule a 90-minute workshop with your engineering leads and walk through the six steps with a whiteboard. Pull any Jira or GitHub data you already have access to and use it to pressure-test the estimates. The output of that 90 minutes — a current-state map with wait times quantified — is often the clearest view an engineering leadership team has ever had of where their delivery time actually goes.
The next step is to make that view continuous. A one-time VSM exercise produces a snapshot; a platform that measures your value stream from live data produces a dashboard you can act on every week.
If your team is already tracking DORA metrics, VSM is the natural complement — it explains the 'why' behind the lead time number and gives you the specific bottleneck to target. If you are not yet tracking DORA, starting with VSM and connecting it to DORA metrics gives you both the diagnostic and the outcome measurement in one motion.
See your value stream metrics automatically
Koalr measures PR review wait, CI/CD duration, deploy lag, and ticket cycle time from your existing GitHub, Jira, and Linear data — no manual instrumentation required. Connect in under 5 minutes and see where your delivery time is going.