Leadership System Design: Org Architecture & Process Engineering

Engineering Delivery Metrics and Systems

4 min read

What you measure shapes what your teams optimize for. The wrong metrics create perverse incentives -- measuring lines of code rewards verbosity, measuring story points completed rewards point inflation. The right metrics illuminate whether your engineering organization is delivering value sustainably. EM interviews frequently test whether you understand which metrics matter and how to use them without turning them into weapons.

DORA Metrics

The most rigorously validated engineering metrics come from the DORA (DevOps Research and Assessment) team at Google. This research, led by Nicole Forsgren, Jez Humble, and Gene Kim, was published in the book "Accelerate" (2018) and is backed by years of survey data from thousands of organizations. The research identifies four key metrics that predict both organizational performance and software delivery capability:

| Metric | What It Measures | Elite Performance | Low Performance |
| --- | --- | --- | --- |
| Deployment Frequency | How often code is deployed to production | On-demand, multiple times per day | Between once per month and once every six months |
| Lead Time for Changes | Time from code commit to running in production | Less than one hour | Between one month and six months |
| Change Failure Rate | Percentage of deployments that cause a failure requiring remediation | 0-15% | 46-60% |
| Time to Restore Service (MTTR) | How long it takes to recover from a failure in production | Less than one hour | Between one month and six months |
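All four metrics can be derived from deployment and incident records. The sketch below computes them from a hypothetical log; the record format and the sample values are invented for illustration, not a standard schema:

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deployment records over a one-week window:
# (commit_time, deploy_time, caused_failure, time_to_restore)
deploys = [
    (datetime(2024, 3, 1, 9),  datetime(2024, 3, 1, 11), False, None),
    (datetime(2024, 3, 1, 13), datetime(2024, 3, 2, 10), True,  timedelta(hours=2)),
    (datetime(2024, 3, 3, 8),  datetime(2024, 3, 3, 9),  False, None),
    (datetime(2024, 3, 4, 10), datetime(2024, 3, 4, 15), False, None),
]
window_days = 7

# Deployment frequency: deploys per day over the observation window.
deploy_frequency = len(deploys) / window_days

# Lead time for changes: median time from commit to production.
lead_time = median(sorted(d - c for c, d, _, _ in deploys))

# Change failure rate: fraction of deploys that required remediation.
failure_rate = sum(1 for _, _, failed, _ in deploys if failed) / len(deploys)

# Time to restore service: median restore time across failed deploys.
restores = [r for _, _, failed, r in deploys if failed]
mttr = median(restores) if restores else None

print(f"{deploy_frequency:.2f} deploys/day, lead time {lead_time}, "
      f"CFR {failure_rate:.0%}, MTTR {mttr}")
```

Medians are used for the time-based metrics because a single outlier deploy would distort an average.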

The power of DORA metrics is that they move together: high-performing organizations deploy more frequently AND have lower failure rates AND recover faster. This disproves the common assumption that moving fast requires sacrificing stability.

When discussing DORA metrics in interviews, make three key points:

  1. Use them as team-level indicators, not individual performance metrics. DORA metrics reflect system health and process effectiveness, not whether a specific engineer is performing well.
  2. Track trends, not absolute numbers. A team improving from monthly to weekly deployments is making meaningful progress even if they are not yet deploying daily.
  3. Address all four metrics together. Optimizing deployment frequency without monitoring change failure rate leads to shipping broken code faster.

Velocity Tracking

Velocity -- the amount of work a team completes per sprint, measured in story points -- is useful for internal planning and trend analysis. It answers the question: "Based on our recent pace, how much can we realistically commit to next sprint?"

Common pitfalls with velocity:

  • Comparing velocity across teams. Team A completing 40 points per sprint and Team B completing 25 points says nothing about relative productivity. Each team calibrates story points differently.
  • Using velocity as a performance metric. When managers use velocity to evaluate teams, teams inflate point estimates to look productive. The metric becomes meaningless.
  • Ignoring velocity variance. A team that averages 30 points per sprint but swings between 15 and 45 has a predictability problem worth investigating.

The correct use of velocity is to calculate a rolling average (typically over 3-5 sprints) and use it as a forecasting input. If the rolling average is 32 points and the backlog for the next milestone contains 128 points, you can forecast approximately four sprints of work.
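The forecasting arithmetic is simple enough to sketch directly. The velocity numbers below are made up, but the rolling average and backlog match the worked example above; the spread calculation addresses the variance pitfall:

```python
import math
from statistics import mean, stdev

recent_velocities = [30, 34, 28, 35, 33]  # last five sprints, story points

rolling_avg = mean(recent_velocities)   # forecasting input, not a target
spread = stdev(recent_velocities)       # large spread => predictability problem

backlog_points = 128
sprints_needed = math.ceil(backlog_points / rolling_avg)

print(f"avg={rolling_avg}, forecast={sprints_needed} sprints")
```

Rounding up with `ceil` is deliberate: a partial sprint of remaining work still occupies a full sprint on the calendar.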

Roadmap Creation

Engineering roadmaps bridge strategy and execution. They answer three questions: What are we building? Why does it matter? When will it be ready?

A practical roadmap structure uses three time horizons:

  • Now (current quarter): Committed work with high confidence in scope and timing. Items here should be broken into epics with estimates.
  • Next (next quarter): Planned work with moderate confidence. Scope is defined at the theme or initiative level. Timing is approximate.
  • Later (beyond two quarters): Directional bets with low confidence. These represent strategic intentions, not commitments. They will change as you learn more.

This structure communicates honestly about certainty. Stakeholders who see a detailed 12-month roadmap with exact dates should be skeptical -- no one has that level of foresight in software development. A Now-Next-Later roadmap sets appropriate expectations.
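As a data structure, a Now-Next-Later roadmap is just items tagged with a horizon and a confidence level. A minimal sketch (the item names and field choices are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class RoadmapItem:
    title: str
    horizon: str      # "now" | "next" | "later"
    confidence: str   # "high" | "moderate" | "low"
    granularity: str  # "epic" (estimated) | "theme" | "direction"

roadmap = [
    RoadmapItem("Checkout latency reduction", "now", "high", "epic"),
    RoadmapItem("Self-serve reporting", "next", "moderate", "theme"),
    RoadmapItem("ML-driven recommendations", "later", "low", "direction"),
]

# Group by horizon for a stakeholder-facing view.
by_horizon: dict[str, list[str]] = {}
for item in roadmap:
    by_horizon.setdefault(item.horizon, []).append(item.title)
```

Encoding confidence and granularity explicitly keeps the roadmap honest: an item cannot sit in "later" with epic-level estimates attached.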

When building roadmaps, account for these common traps:

  • All features, no infrastructure. If the roadmap contains zero investment in platform improvements, tech debt, or reliability, the team will slow down over time.
  • No dependencies marked. Call out cross-team dependencies explicitly. A feature that requires work from three teams needs coordination, and the roadmap should reflect that.
  • Missing capacity buffers. Plan for roughly 70% of capacity. The remaining 30% absorbs unplanned work, production incidents, and scope adjustments.
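The capacity-buffer arithmetic is worth making concrete. Assuming a hypothetical team with a rolling-average velocity of 32 points and six sprints in the quarter:

```python
velocity = 32   # rolling-average story points per sprint (assumed)
sprints = 6     # sprints in the planning quarter (assumed)

raw_capacity = velocity * sprints    # 192 points if nothing goes wrong
plannable = raw_capacity * 0.70      # commit roadmap work against this
buffer = raw_capacity - plannable    # absorbs incidents and scope changes

print(f"plan against {plannable:.0f} points, hold {buffer:.0f} in reserve")
```

Teams that plan against the raw 192 points guarantee missed commitments, because incidents and unplanned work always consume some of the quarter.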

Dependency Management

Dependencies between teams are the primary source of delivery delays in mid-to-large engineering organizations. Managing them requires making them visible and reducing them systematically.

Tactics for dependency management:

  • Dependency boards. Maintain a shared board (physical or digital) where every cross-team dependency is tracked with an owner, a status, and a target date.
  • Regular dependency syncs. A weekly 15-minute standup where teams surface blockers and update dependency status. This catches problems early when they are cheap to fix.
  • Architecture that reduces dependencies. Invest in APIs, event-driven systems, and platform abstractions that allow teams to work independently. The best dependency management is having fewer dependencies.
  • Sequence work to minimize blocking. When planning a quarter, identify which items will unblock other teams and prioritize them early.
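The sequencing tactic above is a topological-sort problem: order the work so that everything a team depends on lands first. A sketch using the standard library's `graphlib` (Python 3.9+); the project names and dependency edges are invented:

```python
from graphlib import TopologicalSorter

# Hypothetical cross-team dependencies: each key depends on the items in its set.
deps = {
    "checkout-redesign": {"payments-api-v2", "design-system-update"},
    "payments-api-v2": {"auth-service-migration"},
    "design-system-update": set(),
    "auth-service-migration": set(),
}

# static_order() yields items so that every dependency precedes its dependents,
# i.e. the unblocking work comes first. It also raises CycleError if two teams
# are accidentally blocking each other.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Even without code, the same exercise on a whiteboard during quarterly planning surfaces which items must start early because other teams are waiting on them.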

OKR Setting for Engineering

OKRs (Objectives and Key Results), originated by Andy Grove at Intel and popularized by John Doerr at Google, provide a framework for connecting engineering work to business outcomes.

The structure is straightforward:

  • Objective: A qualitative, inspirational goal. Example: "Make our API the fastest and most reliable in the industry."
  • Key Results: 2-4 measurable outcomes that indicate whether the objective is being achieved. Example: "Reduce p99 API latency from 800ms to 200ms" or "Achieve 99.95% uptime for all Tier 1 endpoints."

Common mistakes EMs make with OKRs:

  • Output-based key results. "Ship the new caching layer" is an output. "Reduce database read latency by 60%" is an outcome. Outcome-based key results allow teams to find the best solution rather than being locked into a predetermined implementation.
  • Too many OKRs. Three objectives with three key results each means nine measurable targets. That is already at the upper limit. More than that dilutes focus.
  • Sandbagging targets. If every key result is achieved at 100%, the targets were not ambitious enough. A healthy OKR system expects 60-70% achievement on stretch goals.
  • No mid-cycle check-ins. OKRs set at the start of a quarter and reviewed only at the end are not useful for course correction. Run check-ins at weeks 3 and 6 to assess progress and adjust.
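Mid-cycle check-ins are easier when each key result is scored consistently. One common convention, sketched below, grades a KR as the fraction of the distance from its starting value to its target; the key results reuse the examples above, with the mid-cycle "current" values invented:

```python
# Hypothetical key results: (description, start, target, current at check-in)
key_results = [
    ("p99 API latency (ms)", 800, 200, 450),
    ("Tier 1 uptime (%)", 99.80, 99.95, 99.91),
]

def kr_score(start: float, target: float, current: float) -> float:
    """Fraction of the start-to-target distance covered so far, clamped to [0, 1].

    Works for decreasing metrics (latency) as well as increasing ones (uptime),
    since numerator and denominator share the same sign.
    """
    progress = (current - start) / (target - start)
    return max(0.0, min(1.0, progress))

for desc, start, target, current in key_results:
    print(f"{desc}: {kr_score(start, target, current):.0%}")
```

Under the 60-70% expectation for stretch goals, a KR sitting at roughly half its distance at the week-6 check-in is on track, not failing.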

When an interviewer asks you to set OKRs for a hypothetical team, demonstrate the connection between the team's work and the company's strategic priorities. Show that you think in terms of outcomes, not outputs.

You have now completed the four lessons in this module on Leadership System Design. Take the module quiz to test your understanding of organizational design, team topologies, engineering processes, and delivery metrics.
