Capstone — three production prompts
What good and bad submissions look like
You will grade yourself. The next person to read your prompt — a teammate, a future you, the engineer you hand it off to — will not. To make sure your self-grade survives that handoff, here is what each kind of submission looks like.
A passing submission
A capstone prompt that scores 8 or higher has the same shape every time:
- The system prompt fits in one screen. You can read it without scrolling.
- Each of the five slots has a clear line. You can point at the role, the capabilities, the constraints, the format, and the example — they are not blended together.
- The constraints include at least one banned-word rule and at least one refusal-scope rule.
- There is a single worked example showing the joint shape of all the rules above.
- The captured input is real (a real email, a real diff, a real ingredient list — not "imagine an email").
- The captured output is real (run through the live API, not "the model would probably say").
- The self-grade has 8, 9, or 10 ones in the rubric column, with a one-line note next to any zero explaining why it is not blocking.
If a teammate could pick up this prompt and ship a fix on Monday morning without asking you a single question, it passes.
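That shape is concrete enough to assemble mechanically. Here is a minimal sketch in Python under an invented "expense triage" scenario — everything in it (the role, the rules, the JSON keys) is illustrative, not part of the capstone spec. One slot per named variable, so a reviewer can point at each of the five:

```python
# A sketch of the five-slot shape under an invented "expense triage"
# scenario. Every name and rule below is illustrative, not part of
# the capstone spec.
ROLE = "You are an expense-triage assistant for a small finance team."

CAPABILITIES = (
    "You classify incoming expense emails and extract the vendor, "
    "amount, and date."
)

CONSTRAINTS = (
    "Hard rules:\n"
    '- Never use the words "probably" or "should be fine".\n'  # banned-word rule
    "- Refuse anything that is not an expense email. Reply exactly:\n"
    '  "Out of scope: I only handle expense emails."'  # refusal-scope rule
)

OUTPUT_FORMAT = 'Reply as one JSON object: {"vendor": str, "amount": str, "date": str}.'

EXAMPLE = (
    "Example:\n"
    "Input: Hi, attached is the $42.10 Staples receipt from May 3.\n"
    'Output: {"vendor": "Staples", "amount": "$42.10", "date": "2024-05-03"}'
)

# One slot per variable: a reviewer can point at each of the five.
SYSTEM_PROMPT = "\n\n".join([ROLE, CAPABILITIES, CONSTRAINTS, OUTPUT_FORMAT, EXAMPLE])
```

Joining the slots with blank lines keeps the whole thing on one screen and keeps the slots from blending together.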
A failing submission
Failing submissions break in the same predictable ways. Look at your three drafts and check whether any of these traps apply:
| Failure mode | What it looks like | Fix |
|---|---|---|
| Toy scenario | "Make me a haiku writer." Nobody asks for haiku at 3pm Tuesday. | Pick a real, repeating task. |
| Wall-of-text constraints | Three paragraphs of "be friendly, be warm, be helpful, also..." | Convert to bulleted hard rules. |
| No real input | "Here is what someone might send..." | Paste a real message you actually got. |
| Missing refusal scope | The assistant cheerfully discusses anything. | Add a "refuse anything outside X" line. |
| Forged outputs | "The model would probably reply..." | Run it through the API (see the capture sketch after this table) and paste the actual reply. |
| Self-grade inflation | 10/10 with no critical eye. | Read the prompt as if a stranger wrote it; grade harder. |
| Over 400 words | Prompts that try to anticipate every edge case. | Cut. The model handles edges if the rules are clear. |
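The forged-outputs fix is the cheapest one to script. Here is a minimal capture sketch, assuming the OpenAI Python SDK (`pip install openai`) and an `OPENAI_API_KEY` in your environment; the filename and model name are placeholders, and any provider's SDK works the same way:

```python
# Capture a real output: run a real input through the live API and
# paste what comes back verbatim. Assumes the OpenAI Python SDK and
# an OPENAI_API_KEY environment variable; filename and model name
# are placeholders.
from pathlib import Path

from openai import OpenAI

client = OpenAI()

# A real message you actually received, saved to disk -- not an imagined one.
real_input = Path("captured_email.txt").read_text()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whatever model you ship with
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},  # the five-slot string from the sketch above
        {"role": "user", "content": real_input},
    ],
)

# This, verbatim, is your captured output.
print(response.choices[0].message.content)
```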
Passing submission vs failing submission
| Passing (≥ 8/10) | Failing (< 8/10) |
|---|---|
| Teammate can pick it up cold on Monday | Toy scenario nobody asked for |
| Refusal scope holds against pushback | No refusal scope, no I-don't-know trigger |
| Output matches the spec, not just the rules | Spec and actual output do not match |
How to actually self-grade without lying to yourself
Three tactics:
- Wait 24 hours before grading. A prompt feels great the moment you write it. Sleep on it; come back tomorrow; grade then.
- Read it like a teammate inheriting it. Ask "if I were debugging this at 3am, would I know what each line was for?" If not, the prompt is not ready.
- Grade against the captures. If the prompt scored 9/10 on the rubric but the actual model output you captured does not match the spec, the prompt failed criterion 9, so drop a point (a tally sketch follows this list).
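If you want that tally to be mechanical, here is a sketch. The ten criterion names are invented stand-ins for your rubric's real wording; the ≥ 8 pass bar and the one-line-note-per-zero rule come from the capstone spec:

```python
# A mechanical tally for the self-grade. The ten criterion names are
# invented stand-ins for the rubric's real wording; the >= 8 pass bar
# and the one-line-note-per-zero rule come from the capstone spec.
rubric = {
    "fits on one screen": 1,
    "five slots pointable": 1,
    "banned-word rule present": 1,
    "refusal-scope rule present": 1,
    "single worked example": 1,
    "real captured input": 1,
    "real captured output": 1,
    "under 400 words": 1,
    "output matches spec": 0,  # the capture drifted from the format slot
    "teammate could ship a fix cold": 1,
}
notes = {
    "output matches spec": "JSON keys in the capture do not match the format slot",
}

score = sum(rubric.values())
print(f"{score}/10 -- {'pass' if score >= 8 else 'fail'}")
for criterion, hit in rubric.items():
    if hit == 0:
        print(f"  0 on {criterion!r}: {notes.get(criterion, 'MISSING one-line note')}")
```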
Honesty here is what makes the rubric useful. Every prompt engineer who ships in production has shipped a prompt they thought was 10/10 that turned out to be a 6. The difference is they caught it before users did.
You are done
When all three prompts pass at 8 or above, the capstone is complete. Hagar finished hers in one weekend. You can finish yours in less if your scenarios are clear. Save the three prompts somewhere you will find them in six months — they are the foundation you will iterate on for the rest of your prompt-engineering career.
Welcome out the other side. Module 1 was a worried Hagar staring at a blank ChatGPT tab. The capstone is you, with three production prompts in your back pocket.