# Debugging prompts that actually localize

## "This is what should happen" framing
The hardest debugging prompts are the ones where there is no exception. The code runs, returns a value, and the value is wrong. No traceback to paste, no error class to look up. Just *this is not what I expected*.
These bugs need a specific kind of prompt — the expected vs. actual framing. You give the model the code, an input, the actual output, and the expected output. The gap between actual and expected is the bug specification.
Compare two attempts at the same prompt:
Weak:
```
Why is format_phone("01012345678") returning the wrong thing?
```
Strong:
```
Function format_phone("01012345678") returns "010-1234-5678".
Expected output: "+20 10 1234 5678" (E.164-style with Egypt country code).
What is causing the gap?
```
The strong version forces the model to compare two strings. The weak version invites it to invent what "wrong" means.
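To make the gap concrete, here is a minimal hypothetical `format_phone` (not from any real codebase) with exactly the kind of bug the strong prompt pins down: it applies local Egyptian grouping instead of converting to E.164.

```python
# Hypothetical buggy implementation, for illustration only.
def format_phone(number: str) -> str:
    digits = "".join(ch for ch in number if ch.isdigit())
    # Bug: keeps the leading trunk "0" and formats domestically,
    # instead of swapping the trunk prefix for the +20 country code.
    return f"{digits[:3]}-{digits[3:7]}-{digits[7:]}"

print(format_phone("01012345678"))  # 010-1234-5678, not +20 10 1234 5678
```

Given the strong prompt, the model can trace this code against the two strings and point at the missing country-code substitution; given the weak one, it has to guess what "wrong" means first.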
The same template applies to subtle logic bugs. Suppose you have a sort function that should be stable but isn't. Don't say "the sort is wrong." Say:
```
Input: [("a", 1), ("b", 1), ("a", 2)]
Actual output (from current code): [("a", 1), ("a", 2), ("b", 1)]
Expected output (stable sort by second element): [("a", 1), ("b", 1), ("a", 2)]
The sort is supposed to be stable. What in this code breaks stability?
```
Three properties to keep in this framing:
| Property | Why it matters |
|---|---|
| Concrete input | The model can mentally trace execution. Vague inputs invite vague answers. |
| Concrete actual output | Anchors the bug to a value, not a feeling. |
| Concrete expected output | Removes ambiguity about what "correct" means. |
If you're missing any of the three, you're guessing. Run the code in a REPL or add a print statement until you have all three, then prompt. Five minutes with a `print()` saves twenty minutes of "the model said try X but X is unrelated."
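As a sketch, that five-minute triage might look like this (reusing the hypothetical `format_phone` from earlier):

```python
# Capture all three values before writing the prompt.
inp = "01012345678"
actual = format_phone(inp)       # run the real code
expected = "+20 10 1234 5678"    # from the spec, docstring, or a colleague
print(f"input={inp!r}\nactual={actual!r}\nexpected={expected!r}")
```

The printed lines can go into the prompt verbatim; they already are the bug specification.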
Input, actual output, expected output: that is the expected-vs-actual triangle the model needs.
A useful reframing trick: when you can't articulate the expected output, ask the model to articulate it for you, but in a separate prompt. Something like "Given this function name and docstring, what should `format_phone('01012345678')` return for an Egyptian mobile number?" This turns the model into a spec-writer, not a debugger. Once you have the spec (actual plus expected), you can run the debugging prompt with full information.
This pattern also catches a class of bugs LLMs are notoriously bad at: the model "fixes" the wrong thing because it misread your intent. By making intent explicit (the expected output), you prevent the silent rewrite.
Next up: turning a fixed bug into a regression test on the spot.