Lesson 15 of 42

Your first 10 patterns

Self-Critique: Make the Model Grade Itself

4 min read

Self-critique is the pattern where you ask the model to write something, immediately grade its own output against criteria you specify, and revise if any criterion fails. It's surprisingly effective because the criticism step draws on the model's evaluative skill, which is usually sharper than its first-draft skill.

How the prompt is shaped

The structure is always the same:

  1. Generate something (a headline, a description, a code snippet).
  2. Score it against a list of named criteria.
  3. Revise if any criterion scores below a threshold.

You write this in one prompt. The model does all three steps in one response.
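As a minimal sketch, the three steps can be packed into a single prompt string. The function name, criteria list, and threshold below are illustrative placeholders, not fixed parts of the pattern:

```python
# Minimal sketch of a single-prompt self-critique template.
# CRITERIA and THRESHOLD are examples -- swap in your own.
CRITERIA = ["clarity", "curiosity", "specificity"]
THRESHOLD = 8

def build_self_critique_prompt(task: str, criteria=CRITERIA, threshold=THRESHOLD) -> str:
    """Pack generate -> score -> revise into one prompt string."""
    names = ", ".join(criteria)
    return (
        f"{task}\n\n"
        f"Then critique your own output against these criteria: {names}. "
        f"Score each from 1-10. If any criterion scores below "
        f"{threshold}/10, write a revised version that fixes it."
    )

prompt = build_self_critique_prompt(
    'Write a 2-sentence headline for a blog post titled '
    '"Why your A/B tests are lying to you".'
)
print(prompt)
```

Sending this one string to the model produces the draft, the critique, and (if needed) the revision in a single response.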

The self-critique loop

  1. Generate: produce the first draft from the prompt.
  2. Score: grade it against 3-5 named criteria.
  3. Decide: check whether any criterion falls below the threshold.
  4. Revise: rewrite to fix the failing criterion.
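The same loop can also be run as explicit control flow across multiple model calls. This is a sketch, not a fixed API: the model calls are passed in as plain functions, and the toy run below uses canned stand-ins instead of a real LLM client.

```python
# The four-step loop as explicit control flow. `generate`, `score`, and
# `revise` are plain callables so the skeleton stays client-agnostic;
# in real use they would wrap your LLM client of choice.
def self_critique_loop(generate, score, revise, threshold=8, max_rounds=2):
    draft = generate()                      # 1. Generate
    for _ in range(max_rounds):
        scores = score(draft)               # 2. Score: {criterion: 1-10}
        failing = [c for c, s in scores.items() if s < threshold]
        if not failing:                     # 3. Decide
            break
        draft = revise(draft, failing)      # 4. Revise
    return draft

# Toy run with canned functions standing in for model calls.
drafts = iter(["jargon-heavy draft", "plain-English draft"])
result = self_critique_loop(
    generate=lambda: next(drafts),
    score=lambda d: {"clarity": 9 if "plain" in d else 7, "curiosity": 9},
    revise=lambda d, failing: next(drafts),
)
print(result)  # the revision scores above the threshold, so the loop stops
```

The `max_rounds` cap matters: without it, a draft that never clears the threshold would loop forever.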

A worked example — headline generation

The prompt:

Write a 2-sentence headline for a blog post titled "Why your A/B tests
are lying to you". Then critique your own headline against these criteria:
clarity, curiosity, specificity. If any criterion scores below 8/10, write
a second headline that fixes it.

Captured output:

# Headline

**Your A/B Tests Might Be Statistically Significant—But Completely Wrong. Here's Why Sample Ratio Mismatch and 5 Other Hidden Biases Are Sabotaging Your Results.**

# Critique

**Clarity: 7/10**
The headline makes the core message clear (A/B tests can be wrong despite seeming valid), but it's dense with jargon. "Sample Ratio Mismatch" and "statistical significance" may confuse readers unfamiliar with testing terminology.

**Curiosity: 9/10**
Strong curiosity driver—the tension between "statistically significant" and "completely wrong" creates intrigue. Promising to reveal "5 other hidden biases" provides a specific payoff.

**Specificity: 9/10**
Excellent specificity with concrete details: mentions specific problem (Sample Ratio Mismatch), quantifies issues (6 total biases), and identifies the practical impact (sabotaging results).

# Revised Headline

**Your A/B Tests Show a Clear Winner—But You're Making the Wrong Decision. Here's How 6 Invisible Errors Corrupt Your Data (And How to Catch Them).**

This revision improves clarity to 9/10 by removing technical jargon while maintaining the same tension. It keeps curiosity at 9/10 with relatable language ("clear winner" vs "wrong decision") and maintains specificity at 9/10 by adding a practical benefit ("how to catch them").

Captured from Claude Sonnet 4.5 (claude-sonnet-4-5) on 2026-04-27. Re-runs may differ slightly.

Notice what just happened:

  • The first draft scored 7/10 on clarity because of jargon — and the model itself flagged that.
  • Triggered by the "below 8/10" rule, it produced a second draft that swaps "Sample Ratio Mismatch" and "statistically significant" for plain-English equivalents ("clear winner", "invisible errors").
  • The improvement is real and concrete, not a cosmetic edit.

Why this works

The model is good at evaluation when given explicit criteria. Without criteria, "is this headline good?" elicits a vague "yes, it's pretty good" answer. With named axes (clarity, curiosity, specificity) and a numerical scale, the model is forced to look at the headline through three different lenses. That focused criticism is what surfaces the weakness.

Designing your criteria

Three rules for picking good criteria:

| Rule | Why |
| --- | --- |
| 3 to 5 criteria, no more. | Too many and the critique becomes a list of micro-quibbles. |
| Name them precisely. | "Good" is meaningless. "Specificity" or "fits 80 chars" or "no jargon" is concrete. |
| Make them independent. | "Clarity" and "ease of understanding" are the same thing — pick one. |
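If you want to act on the scores programmatically, they have to come back in a parseable shape. The helper below is a hypothetical sketch that assumes the critique uses the "Criterion: N/10" format the example prompt asked for; the function name is illustrative:

```python
import re

# Pull "Criterion: N/10" scores out of a critique, assuming the model
# was asked to score each criterion in that exact "X/10" format.
def parse_scores(critique: str) -> dict[str, int]:
    pattern = re.compile(r"\*?\*?(\w[\w ]*?):\s*(\d+)/10")
    return {name.strip().lower(): int(score)
            for name, score in pattern.findall(critique)}

critique = "**Clarity: 7/10**\n**Curiosity: 9/10**\n**Specificity: 9/10**"
print(parse_scores(critique))  # {'clarity': 7, 'curiosity': 9, 'specificity': 9}
```

Format-dependent parsing like this is fragile; if you rely on it, tell the model the exact score format to use in the prompt itself.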

A trap to avoid

Self-critique is not a substitute for showing the model what good looks like. If you have great examples (few-shot territory), use them. Self-critique shines when you need judgement on a draft you just generated — like editing a headline, tightening a paragraph, or stress-testing a plan.

When you'll reach for it

In Hagar's daily work, self-critique is the right move when:

  • She wrote a LinkedIn post and it feels "fine but not great".
  • She's pitching three campaign concepts and wants the model to score each.
  • An email draft sounds slightly off and she can't pinpoint why.

The pattern turns "this isn't quite right" into a structured diagnosis plus a fix.

Next: decomposition — when one big task becomes 5 smaller, manageable ones.
