Code-review prompts (security / perf / readability)

Severity tagging done well

Severity tags are how a code review becomes actionable. "There's a bug" leaves the reviewer guessing whether to block the merge. "HIGH — SQL injection" doesn't.

Three labels are usually enough: HIGH, MED, LOW. More than three creates analysis paralysis. Two (just "important" and "not") loses the ability to differentiate "must fix before merge" from "fix in the next sprint."

Here is the rubric to embed in your review prompts. Different teams will calibrate differently; what matters is that your prompt commits to a definition.

Severity | Definition | Example
HIGH     | Production-impacting if shipped: security, data loss, correctness violations | SQL injection, missing auth check, off-by-one in financial logic
MED      | Will cause an issue eventually: latent bugs, performance issues at scale, missing error handling | Unhandled exception path, N+1 query, missing input validation
LOW      | Style, readability, minor refactors | Variable naming, dead code, repetitive helpers
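If your reviews run from scripts or shared prompt templates, it can help to keep the rubric text in one place so every prompt embeds the same definitions. A minimal sketch; the module and constant names are hypothetical, and the wording is just the rubric above:

```python
# severity_rubric.py -- hypothetical shared module: one copy of the team's rubric,
# pasted verbatim into every review prompt so calibration never drifts.
SEVERITY_RUBRIC = """\
Use this severity rubric:
- HIGH: would cause a production incident if shipped (security, data loss, correctness)
- MED: latent issue that will surface eventually (perf at scale, error handling, hidden bugs)
- LOW: style or readability, no behavioural impact
"""
```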

The review captured in lesson 1 calibrated correctly. The SQL injection is HIGH — it can be exploited today. The missing input validation is MED — it won't bite immediately but will eventually. The short variable name is LOW — it slows reading but doesn't break anything.

The three severity tiers, side by side:

HIGH / MED / LOW severity rubric

HIGH
  Verdict: BLOCK
  When: Production-impacting if shipped
  Examples: SQL injection, auth bypass
  Reviewer action: Do not merge
  Typical findings:
    • Security exploit reachable today
    • Data loss or corruption
    • Off-by-one in financial logic

MED
  Verdict: REQUEST_CHANGES
  When: Will surface eventually
  Examples: Unhandled exception, N+1 query
  Reviewer action: Author fixes, re-requests review
  Typical findings:
    • Latent bug, not exploited today
    • Performance issue at scale
    • Missing input validation

LOW
  Verdict: APPROVE (optional fixes)
  When: No behavioural impact
  Examples: Naming, dead code
  Reviewer action: Merge or fix in next sprint
  Typical findings:
    • Style or readability
    • Repetitive helper
    • Variable name too short

A common mistake is letting the model self-calibrate without a rubric. Without explicit definitions, models tend to over-tag HIGH (everything sounds urgent in code review prose) or under-tag (the model is hedging because it doesn't know your team's bar). Pasting your rubric in the prompt fixes both.

Try this version of the review prompt with explicit calibration:

Review this code. Use this severity rubric:

  • HIGH: would cause a production incident if shipped (security, data loss, correctness)
  • MED: latent issue that will surface eventually (perf at scale, error handling, hidden bugs)
  • LOW: style or readability, no behavioural impact

Categories: SECURITY, PERFORMANCE, READABILITY. 0–3 issues per category. Format: SEVERITY — issue (line N) — fix in one line. End with a 1-line verdict: APPROVE / REQUEST_CHANGES / BLOCK.

The output will be more consistent across reviews because the model is now using your team's bar, not its own.
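If you are driving reviews from a script rather than pasting into a chat window, the same prompt drops straight into a model call. A sketch assuming the OpenAI Python client and a placeholder model name; adapt it to whatever provider your team uses:

```python
from openai import OpenAI  # assumes the official OpenAI Python client is installed

REVIEW_PROMPT = """Review this code. Use this severity rubric:
- HIGH: would cause a production incident if shipped (security, data loss, correctness)
- MED: latent issue that will surface eventually (perf at scale, error handling, hidden bugs)
- LOW: style or readability, no behavioural impact

Categories: SECURITY, PERFORMANCE, READABILITY. 0-3 issues per category.
Format: SEVERITY — issue (line N) — fix in one line.
End with a 1-line verdict: APPROVE / REQUEST_CHANGES / BLOCK."""

def review(code: str, model: str = "gpt-4o") -> str:  # model name is a placeholder
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": REVIEW_PROMPT},  # your team's bar, not the model's
            {"role": "user", "content": code},
        ],
    )
    return response.choices[0].message.content
```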

A trick for teams that want stricter calibration: ask the model to justify any HIGH tag in one sentence. "HIGH — SQL injection (line 5) — string concatenation directly from user input is exploitable today." Now you can audit whether the HIGH was warranted. A finding tagged HIGH that the model can't justify in one sentence probably wasn't HIGH.
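Because every finding is a single line in a fixed format, the audit can be mechanical: split each HIGH finding into its fields and flag the ones whose final clause is too thin to count as a justification. A rough sketch; the regex and the word-count threshold are assumptions you would tune:

```python
import re

# One finding per line: "SEVERITY — issue (line N) — fix/justification in one line".
# The separator pattern is an assumption; models sometimes emit "-" instead of "—".
FINDING = re.compile(r"^(HIGH|MED|LOW)\s+[—-]+\s+(.+?)\s+[—-]+\s+(.+)$")

def unjustified_highs(review_output: str) -> list[str]:
    """Return HIGH findings whose final clause looks too thin to audit."""
    flagged = []
    for line in review_output.splitlines():
        match = FINDING.match(line.strip())
        if match and match.group(1) == "HIGH":
            justification = match.group(3).strip()
            if len(justification.split()) < 4:  # arbitrary bar; calibrate per team
                flagged.append(line.strip())
    return flagged
```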

You can also use severity tags for triage at scale. If you're reviewing a 50-file PR, ask the model to output only HIGH and MED findings across all files in one combined list. LOW findings get a separate low_findings.md you skim later. The same prompt skeleton, scaled by output filtering.
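The split doesn't have to happen inside the prompt; you can also ask for everything and filter afterwards. A sketch of that post-processing step, reusing the one-line finding format (the function name is made up, low_findings.md mirrors the idea above):

```python
from pathlib import Path

def triage(review_output: str, low_file: str = "low_findings.md") -> list[str]:
    """Keep HIGH/MED findings for the main review; park LOW findings in a file."""
    urgent, low = [], []
    for line in review_output.splitlines():
        stripped = line.strip()
        if stripped.startswith(("HIGH", "MED")):
            urgent.append(stripped)
        elif stripped.startswith("LOW"):
            low.append(stripped)
    if low:
        Path(low_file).write_text("\n".join(low) + "\n", encoding="utf-8")
    return urgent
```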

Next up: turning a feature spec into pytest tests.
