Code-review prompts (security / perf / readability)

Severity tagging done well

Severity tags are how a code review becomes actionable. "There's a bug" leaves the reviewer guessing whether to block the merge. "HIGH — SQL injection" doesn't.

Three labels are usually enough: HIGH, MED, LOW. More than three creates analysis paralysis. Two (just "important" and "not") loses the ability to differentiate "must fix before merge" from "fix in the next sprint."

Here is the rubric to embed in your review prompts. Different teams will calibrate differently; what matters is that your prompt commits to a definition.

Severity | Definition | Example
HIGH     | Production-impacting if shipped: security, data loss, correctness violations | SQL injection, missing auth check, off-by-one in financial logic
MED      | Will cause an issue eventually: latent bugs, performance issues at scale, missing error handling | Unhandled exception path, N+1 query, missing input validation
LOW      | Style, readability, minor refactors | Variable naming, dead code, repetitive helpers
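If your reviews run from scripts or shared prompt templates, it can help to keep the rubric text in one place so every prompt embeds the same definitions. A minimal sketch; the module and constant names are hypothetical, and the wording is just the rubric above:

```python
# severity_rubric.py -- hypothetical shared module: one copy of the team's rubric,
# pasted verbatim into every review prompt so calibration never drifts.
SEVERITY_RUBRIC = """\
Use this severity rubric:
- HIGH: would cause a production incident if shipped (security, data loss, correctness)
- MED: latent issue that will surface eventually (perf at scale, error handling, hidden bugs)
- LOW: style or readability, no behavioural impact
"""
```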

The review captured in lesson 1 calibrated correctly. The SQL injection is HIGH — it can be exploited today. The missing input validation is MED — it won't bite immediately but will eventually. The short variable name is LOW — it slows reading but doesn't break anything.

The three severity tiers, side by side:

HIGH / MED / LOW severity rubric

HIGH
  Verdict: BLOCK
  When: Production-impacting if shipped
  Examples: SQL injection, auth bypass
  Reviewer action: Do not merge
  Typical findings:
    • Security exploit reachable today
    • Data loss or corruption
    • Off-by-one in financial logic

MED
  Verdict: REQUEST_CHANGES
  When: Will surface eventually
  Examples: Unhandled exception, N+1 query
  Reviewer action: Author fixes, re-requests review
  Typical findings:
    • Latent bug, not exploited today
    • Performance issue at scale
    • Missing input validation

LOW
  Verdict: APPROVE (optional fixes)
  When: No behavioural impact
  Examples: Naming, dead code
  Reviewer action: Merge or fix in next sprint
  Typical findings:
    • Style or readability
    • Repetitive helper
    • Variable name too short

A common mistake is letting the model self-calibrate without a rubric. Without explicit definitions, models tend to over-tag HIGH (everything sounds urgent in code review prose) or under-tag (the model is hedging because it doesn't know your team's bar). Pasting your rubric in the prompt fixes both.

Try this version of the review prompt with explicit calibration:

Review this code. Use this severity rubric:

  • HIGH: would cause a production incident if shipped (security, data loss, correctness)
  • MED: latent issue that will surface eventually (perf at scale, error handling, hidden bugs)
  • LOW: style or readability, no behavioural impact

Categories: SECURITY, PERFORMANCE, READABILITY. 0–3 issues per category. Format: SEVERITY — issue (line N) — fix in one line. End with a 1-line verdict: APPROVE / REQUEST_CHANGES / BLOCK.

The output will be more consistent across reviews because the model is now using your team's bar, not its own.
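If you are driving reviews from a script rather than pasting into a chat window, the same prompt drops straight into a model call. A sketch assuming the OpenAI Python client and a placeholder model name; adapt it to whatever provider your team uses:

```python
from openai import OpenAI  # assumes the official OpenAI Python client is installed

REVIEW_PROMPT = """Review this code. Use this severity rubric:
- HIGH: would cause a production incident if shipped (security, data loss, correctness)
- MED: latent issue that will surface eventually (perf at scale, error handling, hidden bugs)
- LOW: style or readability, no behavioural impact

Categories: SECURITY, PERFORMANCE, READABILITY. 0-3 issues per category.
Format: SEVERITY — issue (line N) — fix in one line.
End with a 1-line verdict: APPROVE / REQUEST_CHANGES / BLOCK."""

def review(code: str, model: str = "gpt-4o") -> str:  # model name is a placeholder
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": REVIEW_PROMPT},  # your team's bar, not the model's
            {"role": "user", "content": code},
        ],
    )
    return response.choices[0].message.content
```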

A trick for teams that want stricter calibration: ask the model to justify any HIGH tag in one sentence. "HIGH — SQL injection (line 5) — string concatenation directly from user input is exploitable today." Now you can audit whether the HIGH was warranted. A finding tagged HIGH that the model can't justify in one sentence probably wasn't HIGH.
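Because every finding is a single line in a fixed format, the audit can be mechanical: split each HIGH finding into its fields and flag the ones whose final clause is too thin to count as a justification. A rough sketch; the regex and the word-count threshold are assumptions you would tune:

```python
import re

# One finding per line: "SEVERITY — issue (line N) — fix/justification in one line".
# The separator pattern is an assumption; models sometimes emit "-" instead of "—".
FINDING = re.compile(r"^(HIGH|MED|LOW)\s+[—-]+\s+(.+?)\s+[—-]+\s+(.+)$")

def unjustified_highs(review_output: str) -> list[str]:
    """Return HIGH findings whose final clause looks too thin to audit."""
    flagged = []
    for line in review_output.splitlines():
        match = FINDING.match(line.strip())
        if match and match.group(1) == "HIGH":
            justification = match.group(3).strip()
            if len(justification.split()) < 4:  # arbitrary bar; calibrate per team
                flagged.append(line.strip())
    return flagged
```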

You can also use severity tags for triage at scale. If you're reviewing a 50-file PR, ask the model to output only HIGH and MED findings across all files in one combined list. LOW findings get a separate low_findings.md you skim later. The same prompt skeleton, scaled by output filtering.
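The split doesn't have to happen inside the prompt; you can also ask for everything and filter afterwards. A sketch of that post-processing step, reusing the one-line finding format (the function name is made up, low_findings.md mirrors the idea above):

```python
from pathlib import Path

def triage(review_output: str, low_file: str = "low_findings.md") -> list[str]:
    """Keep HIGH/MED findings for the main review; park LOW findings in a file."""
    urgent, low = [], []
    for line in review_output.splitlines():
        stripped = line.strip()
        if stripped.startswith(("HIGH", "MED")):
            urgent.append(stripped)
        elif stripped.startswith("LOW"):
            low.append(stripped)
    if low:
        Path(low_file).write_text("\n".join(low) + "\n", encoding="utf-8")
    return urgent
```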

Next up: turning a feature spec into pytest tests.
