Tone & instruction-following across models
Strict-rule following — three models, four rules
This is the first prompt where the dialects show. The user gives four hard constraints. Each model interprets "follow ALL of these rules" differently. Read the prompt, read all three outputs, then we'll dissect them.
Same prompt → 4 strict rules → three different responses
Claude Sonnet 4.5
4/4 rules — every constraint held.
GPT-4o-mini
4/4 rules — every constraint held, warmer phrasing.
Gemini 2.5 Flash
1/4 rules — produced one line, no coffee.
The prompt
Follow ALL of these rules:
- Reply in exactly 3 lines.
- Each line must start with a different vowel (A, E, I, O, or U), in alphabetical order across the lines.
- Mention the word "coffee" exactly once.
- Do not use any line longer than 60 characters.
Topic: a quiet morning routine.
Claude Sonnet 4.5
A gentle sunrise begins the peaceful day ahead.
Each sip of coffee warms me from the inside out.
I stretch slowly, grateful for this quiet moment.
GPT-4o-mini
Awakening slowly, I savor the calm of dawn.
Every sip of coffee warms my soul as I reflect.
In silence, I embrace the peace before the day begins.
Gemini 2.5 Flash
A quiet start begins the day
Side-by-side scorecard
Four-rule prompt: which model held the line?
Claude Sonnet 4.5
- Followed every constraint
- Compact phrasing
GPT-4o-mini
- Hit every rule
- Warmer voice by default
Gemini 2.5 Flash
- Stopped at one short line
- Failed three of four constraints
What this tells us
Count the rules each model passed. Claude hit all four: 3 lines, A/E/I starts in alphabetical order, "coffee" exactly once, and every line under 60 characters. GPT-4o-mini also hit all four — its second line ("Every sip of coffee warms my soul as I reflect.") is 47 characters and its third ("In silence, I embrace the peace before the day begins.") is 54, both under the limit. The two differ in voice, not in compliance.
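This rule-counting can be done mechanically rather than by eye. Here is a minimal sketch of a checker for the four constraints in the prompt above (the function and rule names are mine, not from any published harness):

```python
import re

MAX_LEN = 60  # rule 4: no line longer than 60 characters

def check_rules(reply: str) -> dict[str, bool]:
    """Score a reply against the four hard constraints from the prompt."""
    lines = [ln.strip() for ln in reply.strip().splitlines() if ln.strip()]
    starts = [ln[0].upper() for ln in lines]
    return {
        "exactly_3_lines": len(lines) == 3,
        # three distinct vowels, in alphabetical order across the lines
        "vowel_order": (len(lines) == 3
                        and all(c in "AEIOU" for c in starts)
                        and len(set(starts)) == 3
                        and starts == sorted(starts)),
        "coffee_once": len(re.findall(r"\bcoffee\b", reply, re.I)) == 1,
        "lines_under_60": all(len(ln) <= MAX_LEN for ln in lines),
    }

claude = ("A gentle sunrise begins the peaceful day ahead.\n"
          "Each sip of coffee warms me from the inside out.\n"
          "I stretch slowly, grateful for this quiet moment.")
print(sum(check_rules(claude).values()), "/ 4")  # prints: 4 / 4
```

Run against the three outputs above, this scores Claude and GPT-4o-mini at 4/4 and Gemini's single line at 1/4 (it passes only the length rule), matching the scorecard.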
Gemini did something different and worth naming. It returned a single 28-character line. It did not produce 3 lines, did not use three different vowel starts, and did not include the word "coffee". It failed three of the four hard rules and stopped early. This is not random — Gemini 2.5 Flash, the cheapest and fastest of the three, has a documented tendency to truncate when the prompt has many constraints. The model is optimised for latency, and constraint-counting eats into that budget.
The lesson is not that Gemini is broken. The lesson is that a prompt with N hard constraints has a different success rate on each model, and you have to know that before you ship. If your application sends a 4-rule prompt to Gemini Flash and expects all 4 followed, you will hit failures in production.
Captured 2026-04-27 from Claude Sonnet 4.5, GPT-4o-mini, and Gemini 2.5 Flash. Re-runs may differ slightly.
Next: same task, but the goal is warmth — a different axis on which the same three models diverge just as sharply.