Code-review prompts (security / perf / readability)
Tests from a spec
When you're reviewing a PR that's missing tests, you have two choices. Push back on the author and wait. Or write the tests yourself in the next minute and unblock the review. The second is usually faster and almost always healthier for the team.
A spec-to-tests prompt is the workhorse for this. Give the model the function signature, an input/output example, and an explicit list of cases to cover. It returns a test file you can paste in.
```
Write pytest tests for a function

    split_invoice_lines(text: str) -> list[dict]

that parses lines like:

    "3 x Latte @ 65" -> {"qty": 3, "item": "Latte", "unit_price": 65}

Cover:
- Single line happy path.
- Multiple lines (newline-separated).
- Whitespace tolerance (extra spaces).
- Decimal price ("@ 65.50").
- Invalid line (no "x" or no "@") raises ValueError.
- Empty string returns [].

Output ONLY the test file, with imports.
```
Claude Sonnet 4.5 returned a long file that included both the function and the tests (the response was truncated at the boundary). The relevant test portion looks like:
```python
def test_single_line_happy_path():
    """Test parsing a single valid invoice line."""
    text = "3 x Latte @ 65"
    result = split_invoice_lines(text)
    assert len(result) == 1
    assert result[0] == {"qty": 3, "item": "Latte", "unit_price": 65.0}


def test_multiple_lines():
    """Test parsing multiple newline-separated invoice lines."""
    text = """3 x Latte @ 65
2 x Espresso @ 50
1 x Cappuccino @ 70"""
    result = split_invoice_lines(text)
    assert len(result) == 3
    assert result[0] == {"qty": 3, "item": "Latte", "unit_price": 65.0}
    assert result[1] == {"qty": 2, "item": "Espresso", "unit_price": 50.0}
```
Captured from Claude Sonnet 4.5 (claude-sonnet-4-5) on 2026-04-27. Re-runs may differ slightly.
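The implementation half of the response is truncated above. For context, a minimal parser consistent with the spec (a sketch, not the model's actual code) might look like:

```python
import re

# Sketch of a parser matching the spec above; NOT the model's actual
# (truncated) implementation -- just one plausible shape for it.
def split_invoice_lines(text: str) -> list[dict]:
    results = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        m = re.fullmatch(r"(\d+)\s*x\s*(.+?)\s*@\s*(\d+(?:\.\d+)?)", line)
        if m is None:
            raise ValueError(f"cannot parse invoice line: {line!r}")
        results.append({
            "qty": int(m.group(1)),
            "item": m.group(2),
            "unit_price": float(m.group(3)),
        })
    return results
```

Note that the empty-string case falls out for free: `"".splitlines()` is `[]`, so the loop body never runs.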
Two important observations:
One: the model added a function implementation alongside the tests. That's because the prompt didn't say "tests only — assume the function exists." Without that constraint, the model helpfully writes the function so the tests have something to import. That's harmless if you're prototyping. It's noise if you're reviewing a PR that already has the implementation. Add the constraint:
```
Assume the function is already implemented in app/invoices.py and import it.
Output the test file only.
```
Two: each test does one thing. test_single_line_happy_path doesn't also check whitespace tolerance. test_multiple_lines doesn't also check decimal prices. The cases listed in the prompt became the test names — one case per test. That's the discipline you'd hope for from a careful colleague.
To turn a spec into a one-test-per-property suite, and make spec-to-tests prompts tighter for code-review use, add these constraints:
| Constraint | What it does |
|---|---|
| Assume the function is already implemented | Stops the model from rewriting the function |
| One assert per property | Avoids "passes for the wrong reason" |
| Each test name describes the property tested | Tests double as documentation |
| Use `parametrize` where 3+ cases share structure | Reduces line count for similar cases |
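The `parametrize` constraint, for instance, collapses the structurally identical single-line cases into one test. A sketch of what that looks like (the stand-in parser is included only so the snippet runs on its own; in a real review you would import the PR's implementation from `app/invoices.py` instead):

```python
import re
import pytest

# Minimal stand-in so the example is self-contained. In the real review:
#     from app.invoices import split_invoice_lines
def split_invoice_lines(text: str) -> list[dict]:
    pattern = re.compile(r"(\d+)\s*x\s*(.+?)\s*@\s*(\d+(?:\.\d+)?)")
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    matches = [pattern.fullmatch(l) for l in lines]
    if any(m is None for m in matches):
        raise ValueError("unparseable invoice line")
    return [{"qty": int(m[1]), "item": m[2], "unit_price": float(m[3])}
            for m in matches]


# One parametrized test replaces three near-identical single-line tests;
# the ids keep failure output readable ("test_parses_single_line[decimal_price]").
@pytest.mark.parametrize(
    "line, expected",
    [
        ("3 x Latte @ 65", {"qty": 3, "item": "Latte", "unit_price": 65.0}),
        ("3   x   Latte   @   65", {"qty": 3, "item": "Latte", "unit_price": 65.0}),
        ("2 x Espresso @ 65.50", {"qty": 2, "item": "Espresso", "unit_price": 65.5}),
    ],
    ids=["happy_path", "extra_whitespace", "decimal_price"],
)
def test_parses_single_line(line, expected):
    assert split_invoice_lines(line) == [expected]
```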
A useful next step on the same prompt: ask for a second test file that targets only the boundary cases the author probably forgot. Empty input, single character, max-length input, unicode characters. The model's recall on edge cases is excellent because it has seen thousands of similar bug reports in training. Use that.
You can chain the spec-to-tests prompt with the review prompt from lesson 1: "Review this PR. If tests are missing, generate them in pytest format." Now your review pass produces both the feedback and the tests in one round-trip.
Next up: getting the model to commit to a one-line verdict.