Capstone — three production prompts

The 10-point rubric

Every prompt you ship in the capstone gets graded against the same ten criteria. One point per criterion, no half points. Anything below 8/10 means the prompt is not ready — redesign it before claiming it shipped.

The rubric

  1. Role / persona stated
  2. Specific task with measurable outcome
  3. Input format defined
  4. Output format defined (length, structure, tone)
  5. At least 2 hard constraints (must / must-not)
  6. At least 1 worked example (input → expected output)
  7. Refusal or fallback when out-of-scope
  8. "I don't know" trigger when info is missing
  9. Tested on at least one real input, and the output meets spec
  10. Total length under 400 words (system prompts that don't fit in working memory rot)

Passing is 8/10 (80%); anything below means redesign before shipping. (The 80% bar is consistent across all capstones in the Prompt Engineering Path: Code uses 40/50 and Cross-Model uses a five-row rubric. Same bar, different scales.)
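If you want the arithmetic to be impossible to fudge, the rubric drops straight into a checklist. A minimal sketch in Python: the criteria strings and the 8-point bar mirror the rubric above, while the names and structure are illustrative, not course tooling.

```python
# One point per criterion, no half points. The strings and the pass bar
# mirror the rubric above; everything else here is illustrative.
RUBRIC = [
    "Role / persona stated",
    "Specific task with measurable outcome",
    "Input format defined",
    "Output format defined (length, structure, tone)",
    "At least 2 hard constraints (must / must-not)",
    "At least 1 worked example (input -> expected output)",
    "Refusal or fallback when out-of-scope",
    "'I don't know' trigger when info is missing",
    "Tested on at least one real input - output meets spec",
    "Total length under 400 words",
]

PASS_BAR = 8  # 80%: anything below means redesign, not ship

def grade(checks: list[bool]) -> tuple[int, bool]:
    """Return (score, shippable). checks[i] is a strict yes/no for RUBRIC[i]."""
    if len(checks) != len(RUBRIC):
        raise ValueError("grade every criterion; skipping one is grade inflation")
    score = sum(checks)
    return score, score >= PASS_BAR
```

Calling grade([True] * 9 + [False]) returns (9, True): shippable, with one named gap still worth closing.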

The 10-point rubric — grouped by what each criterion guards against

Structure (5 points): criteria 1, 2, 3, 4, 10
  • Role stated, specific task, input format, output format, under 400 words (1 pt each)
  • What it buys you: anchors who and what, defines the shape end-to-end, keeps the prompt readable

Behaviour (3 points): criteria 5, 7, 8
  • Hard constraints, refusal scope, "I don't know" trigger (1 pt each)
  • What it buys you: stops tone drift, off-brand wandering, and confident hallucination

Evidence (2 points): criteria 6, 9
  • Worked example, tested on real input (1 pt each)
  • What it buys you: locks the input → output shape, catches real failures before ship, turns the rubric from theory into reality
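If you are tracking scores with the sketch above, the grouping encodes as a small lookup as well. The criterion numbers are 1-based, matching the rubric; the group keys are just the card titles.

```python
# Per-group subtotals for the grouping above. Numbers are 1-based criterion
# indices from the rubric; group names are illustrative.
GROUPS = {
    "structure": [1, 2, 3, 4, 10],  # max 5 pts
    "behaviour": [5, 7, 8],         # max 3 pts
    "evidence":  [6, 9],            # max 2 pts
}

def subtotals(checks: list[bool]) -> dict[str, int]:
    """Show where points were lost, i.e. which class of bug the prompt risks."""
    return {name: sum(checks[i - 1] for i in nums)
            for name, nums in GROUPS.items()}
```

A perfect structure subtotal with a weak behaviour subtotal is still a drift-prone prompt; the groups tell you which class of failure to fix first.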

How to grade each item, honestly

The trap with self-grading is being generous with yourself. The criteria are written so that "almost" does not score. Here is how to grade strictly:

  • Role stated — there is a single sentence in the prompt that names the role and the brand. "You are helpful" does not count.
  • Specific task with measurable outcome — could a teammate write a one-line check that decides if the output passes? If not, the task is not measurable. (A sketch of such a check follows this list.)
  • Input format defined — the prompt names what the user message will look like (a paragraph, a JSON object, a code block, a transcript). If you would not know how to feed it, fail it.
  • Output format defined — length AND structure AND tone are all specified. Two of three fails.
  • 2 hard constraints — count them. They must be specific ("never use 'unfortunately'") not vague ("be friendly").
  • 1 worked example — a real input and a real expected output, not a description of "what good output looks like".
  • Refusal scope — there is a line covering what the assistant will NOT do.
  • "I don't know" trigger — there is a line covering what the assistant says when it lacks the info.
  • Tested on real input — you actually fed it a real message and read the output, not "I think it would work".
  • Under 400 words — open the prompt in a word counter. Be honest. (The sketch after this list scripts this check.)
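Two of these checks are mechanical enough to script. A sketch, assuming the prompt lives in a plain-text file: the file handling is standard library, but the file name, thresholds, banned-word list (echoing the "unfortunately" example above), and naive sentence splitting are all illustrative.

```python
from pathlib import Path

def word_count_ok(prompt_path: str, limit: int = 400) -> bool:
    """Criterion 10: whitespace-separated word count, no eyeballing."""
    words = Path(prompt_path).read_text(encoding="utf-8").split()
    return len(words) <= limit

def output_meets_spec(output: str, max_sentences: int = 4,
                      banned: tuple[str, ...] = ("unfortunately",)) -> bool:
    """Criterion 9 helper: the kind of one-line check a teammate could run.
    The thresholds here are examples, not the course's spec."""
    sentences = [s for s in output.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    return len(sentences) <= max_sentences and not any(
        b in output.lower() for b in banned)
```

The scripts only make the pass/fail call repeatable; criterion 9 still requires you to actually read the output against the spec.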

Why this rubric exists

The rubric is reverse-engineered from the failure modes of prompts that crashed in production. Each criterion maps to a class of bug:

  • Missing role → tone drift across replies.
  • Vague task → outputs that "look reasonable" but do not solve the user's problem.
  • No format → markdown headings everywhere, or none at all.
  • No examples → tone matches the rules but the shape is wrong.
  • No refusal → the assistant cheerfully answers off-brand questions.
  • No "I don't know" trigger → confident hallucination.
  • Untested → all of the above shipped at once.
  • Over 400 words → contradictions inside the prompt that you cannot see at a glance.

Score 8 or above and you have avoided nearly all of these failure classes; score 10 and you have avoided them all. That is the bar.

Next: three example scenarios you can study or use as templates.
