Team Health · 9 min read · March 16, 2026

How to Build an Engineering Team Health Score

A single number that captures team health is seductive but dangerous if built wrong. Here's a methodology for constructing a composite health score that engineering managers can actually use in 1:1s and leadership reviews — without reducing people to a metric.

Engineering team health scores are having a moment. Every VP Engineering wants a dashboard that surfaces struggling teams before attrition or missed deadlines make the problem visible. But most implementations fail in one of two ways: they measure only output metrics (velocity, deployment frequency) and miss the human factors, or they measure only sentiment (survey responses) and miss the operational signals that predict burnout six weeks before engineers report it.

A robust health score requires both. Here's how to construct one that holds up under scrutiny from engineers and leadership alike.

The Four Pillars of Team Health

A useful engineering team health score draws from four distinct data sources, each measuring something the others cannot capture.

Delivery health captures whether the team is shipping reliably. This is your DORA data: deployment frequency, lead time for changes, change failure rate, and MTTR. Delivery health is a lagging indicator — it tells you how the team has been performing, not why.

Flow health captures whether engineers have uninterrupted time to do deep work. Metrics here include the ratio of planned work completed to unplanned interruptions absorbed, average PR cycle time (excluding outliers), and the number of context switches per engineer per week, estimated from commit activity across repositories.

Review health captures the quality of the team's code review culture. Key signals: median time to first review, PR review participation rate (what percentage of engineers are reviewing, not just receiving reviews), rework rate (how often does code need significant changes after review approval), and PR size distribution.

Well-being signals capture what the data cannot. This includes short pulse survey scores (one or two questions per week, not a quarterly monster survey), on-call burden measured by alert volume and overnight pages per engineer, and PTO utilization rate. Teams with very low PTO utilization are often the most at-risk for attrition.

Weighting Methodology

Equal weighting across pillars is the wrong starting point. Delivery health should carry more weight for teams that own customer-facing services with SLOs. Flow health matters more for teams doing complex, research-heavy work where context-switching is especially costly. Here's a weighting framework that works for most product engineering teams:

Recommended weighting

  • Delivery health (DORA): 35%
  • Flow health: 25%
  • Review health: 20%
  • Well-being signals: 20%

Adjust well-being upward to 30% for teams with high on-call rotation or recent organizational change. Platform teams with no direct customer impact can reduce delivery health to 25% and increase flow health to 35%.

Within each pillar, normalize metrics to a 0–100 scale before weighting. For DORA metrics, use the DORA performance bands as anchors: elite performance maps to 90–100, high to 70–89, medium to 40–69, and low to 0–39. For subjective metrics like survey scores, use your own historical baseline rather than industry benchmarks — a team consistently scoring 7/10 is healthier than a team that spikes to 9/10 and then drops to 4/10.
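The weighting and normalization above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the pillar names, the band midpoints used as anchors, and the example sub-scores are all assumptions, and the pillar sub-scores are assumed to be pre-normalized to the 0–100 scale described above.

```python
# Illustrative sketch of the composite calculation. Pillar sub-scores are
# assumed to already be normalized to 0-100; names and values are examples.

DEFAULT_WEIGHTS = {
    "delivery": 0.35,   # DORA-based delivery health
    "flow": 0.25,
    "review": 0.20,
    "wellbeing": 0.20,
}

def normalize_dora_band(band: str) -> float:
    """Map a DORA performance band to the midpoint of its anchor range
    (elite: 90-100, high: 70-89, medium: 40-69, low: 0-39)."""
    midpoints = {"elite": 95.0, "high": 79.5, "medium": 54.5, "low": 19.5}
    return midpoints[band]

def composite_score(pillars: dict[str, float],
                    weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of 0-100 pillar sub-scores. Weights must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(pillars[name] * w for name, w in weights.items())

score = composite_score({"delivery": 82, "flow": 55, "review": 70, "wellbeing": 48})
# 0.35*82 + 0.25*55 + 0.20*70 + 0.20*48 = 66.05
```

Adjusting weights for a platform team is then just a different weights dict, with the same sum-to-one check guarding against typos.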

Red / Amber / Green Thresholds

Once you have a composite score on a 0–100 scale, you need thresholds that trigger different management responses. Avoid the trap of making thresholds too tight — a score that flips between amber and green week-over-week creates alert fatigue and teaches managers to ignore it.

RAG thresholds

  • Green (score 70–100): Team is operating well. Review in regular 1:1s. No escalation needed.
  • Amber (score 45–69): One or more pillars showing stress. Schedule a dedicated conversation with the EM. Identify which pillar is dragging the score.
  • Red (score below 45): Team is at risk. Escalate to VP Engineering. Create a 30-day improvement plan with specific metric targets.
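The threshold mapping itself is trivial, which is the point: the bands should be boring and stable. A minimal sketch, using the thresholds above:

```python
# Map a 0-100 composite score to a RAG status using the thresholds above.

def rag_status(score: float) -> str:
    if score >= 70:
        return "green"
    if score >= 45:
        return "amber"
    return "red"
```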

The threshold for escalation matters less than the consistency of your response once thresholds are crossed. If amber teams consistently get attention and improve, engineers learn that the score is connected to real support — not just surveillance. If amber scores are ignored, the score becomes meaningless and engineers stop taking surveys seriously.

Using the Health Score in 1:1s

The biggest mistake engineering managers make with health scores is presenting them to engineers as judgments rather than conversation starters. Engineers who feel surveilled will game any metric you put in front of them.

The right approach is to share the pillar breakdown, not the composite number. In a 1:1, show an engineer their team's flow health trend and ask: "Our unplanned work ratio has been climbing for three weeks. What are you seeing on the ground?" This creates a conversation where the data is the opening question, not the verdict.

Never use a team's health score in performance reviews for individual engineers. The score reflects team-level dynamics, not individual contribution. An excellent engineer on a struggling team will have a low health score through no fault of their own. Conflating team health with individual performance will destroy trust in the metric overnight.

Using the Health Score in Leadership Reviews

At the VP and CTO level, health scores serve a different purpose: portfolio visibility. With eight or more engineering teams and limited leadership bandwidth, you need a signal that surfaces which teams need attention without requiring the VP to review every team's metrics manually each week.

In your monthly engineering leadership sync, display the health score trend for all teams on a single view. Look for teams that have been amber for more than three consecutive weeks — they are the ones where a VP-level conversation or resource intervention may be needed. Single-week amber events are noise. Persistent amber is signal.
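The "persistent amber" filter can be sketched as a check on the most recent run of weekly statuses. This is an illustrative assumption about how the history is stored (a list of weekly RAG strings, oldest first); "more than three consecutive weeks" means a trailing run of four or more.

```python
# Flag teams whose most recent weeks form an amber streak of >= min_weeks.
# Weekly history is assumed to be a list of RAG strings, oldest first.

def persistent_amber(weekly_statuses: list[str], min_weeks: int = 4) -> bool:
    run = 0
    for status in reversed(weekly_statuses):  # walk back from the latest week
        if status == "amber":
            run += 1
        else:
            break
    return run >= min_weeks

persistent_amber(["green", "amber", "amber", "amber", "amber"])  # True: persistent signal
persistent_amber(["amber", "green", "amber"])                    # False: single-week noise
```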

The well-being signal early warning system

Well-being survey scores typically decline four to six weeks before delivery metrics start to suffer. If you're only watching DORA metrics, you're reacting to problems that started a month ago. Teams with rapidly declining survey scores but still-green DORA metrics are your highest-priority intervention targets — they're running on reserves that will eventually exhaust.
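One way to operationalize this early-warning pattern is to flag teams whose pulse trend has dropped sharply while delivery health is still green. The drop threshold and green floor below are illustrative assumptions, not part of the methodology; tune them against your own historical baseline.

```python
# Flag teams with a sharp pulse-score decline while delivery health is still
# green. drop_threshold and green_floor are illustrative, not prescriptive.

def early_warning(pulse_scores: list[float], delivery_score: float,
                  drop_threshold: float = 1.5, green_floor: float = 70.0) -> bool:
    if len(pulse_scores) < 2 or delivery_score < green_floor:
        return False  # not enough history, or delivery already reflects trouble
    return pulse_scores[0] - pulse_scores[-1] >= drop_threshold

# Weekly pulses slid from 8.0 to 6.0 while delivery health sits at 85:
early_warning([8.0, 7.5, 6.8, 6.0], delivery_score=85)  # True
```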

Building and maintaining a health score requires commitment from engineering management to actually respond to what it surfaces. The score itself is easy. The harder part is building organizational muscle to act on amber signals before they turn red.

See your team health score

Koalr automatically calculates a composite health score for every engineering team, combining DORA metrics, flow data, review health, and pulse survey results into a single view your leadership team can act on.
