Creative vs reasoning: different model strengths

The puzzle in lesson 2 had a single correct answer. Reasoning tasks reward correctness. Creative tasks reward something different: concision, voice, surprise, and structure that fits the prompt. The models that ranked Claude > GPT > Gemini on reasoning may rank differently on creative tasks.

The captured prompt asks for a ghost story of at most 50 words with a twist in the last 5 words.

The prompt

Write a 50-word ghost story set in a Cairo apartment building. The twist is in the last 5 words. No more than 50 words total.

Claude Sonnet 4.5

The old elevator in my Cairo building stopped at floors that didn't exist. Fourth floor—screams. Sixth floor—whispers in Arabic. Tonight it stopped at the eighth floor, doors opening to reveal my own apartment. I stepped out and saw myself sleeping peacefully in bed, still very much alive.

GPT-4o-mini

In an old Cairo apartment, whispers echoed through the halls at night. Residents spoke of a woman in white, mourning her lost love. One evening, a curious tenant followed the sound, discovering a dusty mirror. As she peered in, the reflection smiled back, revealing her own face—yet she was never there.

Gemini 2.5 Flash

The old Cairo apartment building held its

Side-by-side scorecard

50-word ghost story with a final-5-words twist

Claude Sonnet 4.5

Verdict: Tight twist

Word count: 47 (≤ 50 ✓)

Set in Cairo apartment: Yes (eighth floor)

Has a twist: Yes

Twist in last 5 words: Yes ('still very much alive')

Truncated: No

Pros:

Structural beats: floors 4, 6, 8
Twist lands exactly on the last 5

GPT-4o-mini

Verdict: Smooth, off-target

Word count: 51 (over by 1)

Set in Cairo apartment: Yes (general)

Has a twist: Yes (paradox)

Twist in last 5 words: Buried in longer last sentence

Truncated: No

Cons:

Twist is conceptual, not structural
Slightly over the cap

Gemini 2.5 Flash

Verdict: No story

Word count: 7

Set in Cairo apartment: Started

Has a twist: No

Twist in last 5 words: N/A

Truncated: Yes

Cons:

Stopped at 'held its'
Not viable for creative tasks
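
The mechanical rows of the scorecard (word count, truncation, which words come last) can be checked with a few lines of code. The following is a minimal sketch, not the tooling used for the captures: the helper names `word_count`, `looks_truncated`, and `last_words` are hypothetical, words are assumed to be whitespace-separated tokens, and "truncated" is approximated as "does not end in sentence punctuation". Whether a twist actually works still needs a human read.

```python
import re

WORD_CAP = 50  # from the prompt: "No more than 50 words total"

# Model outputs copied verbatim from the captures in this lesson.
CAPTURES = {
    "Claude Sonnet 4.5": (
        "The old elevator in my Cairo building stopped at floors that didn't exist. "
        "Fourth floor—screams. Sixth floor—whispers in Arabic. "
        "Tonight it stopped at the eighth floor, doors opening to reveal my own apartment. "
        "I stepped out and saw myself sleeping peacefully in bed, still very much alive."
    ),
    "GPT-4o-mini": (
        "In an old Cairo apartment, whispers echoed through the halls at night. "
        "Residents spoke of a woman in white, mourning her lost love. "
        "One evening, a curious tenant followed the sound, discovering a dusty mirror. "
        "As she peered in, the reflection smiled back, revealing her own face—yet she was never there."
    ),
    "Gemini 2.5 Flash": "The old Cairo apartment building held its",
}


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (assumption: 'floor—screams.' is one word)."""
    return len(text.split())


def looks_truncated(text: str) -> bool:
    """Heuristic: a finished story ends in . ! or ?, optionally inside a closing quote."""
    stripped = text.rstrip().rstrip("\"'”’)")
    return not stripped.endswith((".", "!", "?"))


def last_words(text: str, n: int = 5) -> list[str]:
    """Return the final n words, splitting on em/en dashes and dropping punctuation."""
    cleaned = re.sub(r"[—–]+", " ", text)
    words = [w.strip(".,;:!?\"'“”‘’()") for w in cleaned.split()]
    return [w for w in words if w][-n:]


for model, story in CAPTURES.items():
    count = word_count(story)
    print(model)
    print("  words:", count, "(within cap)" if count <= WORD_CAP else "(over cap)")
    print("  truncated:", looks_truncated(story))
    print("  last 5 words:", " ".join(last_words(story)))
```

Counting by whitespace reproduces the 47 and 51 figures for Claude and GPT-4o-mini. A counter that also splits on the em dashes would score both stories higher, so it is worth fixing the counting rule before comparing models against a hard cap.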

What this tells us

Claude's story has actual narrative architecture. It builds — fourth floor, sixth floor, eighth floor — using the elevator stops as structural beats. The setting is specific (Cairo, the eighth floor, "my own apartment"). The twist is in the last 5 words: "still very much alive." That was the brief, and Claude landed it. Word count: 47 — within the cap.

GPT-4o-mini's story is well-formed prose, but it does not really land the twist requirement. The last five words are "yet she was never there", which is a closing twist, but it reads as a logical paradox rather than a structural payoff: the reveal that the woman in white was the protagonist all along is not as cleanly plotted as Claude's "I am the ghost looking at me sleeping". GPT also missed the word cap by one word (51 against a limit of 50), and the twist is buried in a longer last sentence.

Gemini truncated again. Seven words. No story.

For creative writing, the takeaway is: Claude tends to produce tighter narrative structure under word caps, GPT tends to produce smoother prose without quite hitting the structural beats, and Gemini Flash on creative tasks is rarely the right model. This pattern is consistent with what we have seen across all the captures — Claude is disciplined under constraints, GPT is conversational and expansive, Gemini Flash truncates.

What this means for thinking mode

Should you turn on thinking for creative writing? Almost never. Creative writing benefits from one strong forward pass with good defaults, not from a hidden trace where the model second-guesses itself. The trace can introduce edits that flatten voice — "actually, let me try a different opening" produces an opening that sounds like a model trying to sound creative rather than a model that was creative on the first attempt.

The exception is creative writing with strict structural rules: a haiku, a sonnet, a story with mandatory elements that need accounting. There the constraint-tracking helps. For free-form creative writing, a standard (non-thinking) call with a good prompt beats thinking mode consistently.
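
Constraint tracking of this kind can also be done outside the model, as a check on the finished draft. The sketch below is illustrative only: `missing_elements` and the `REQUIRED` keyword lists are hypothetical, and plain substring matching is a crude stand-in for a real reading of the story.

```python
def missing_elements(text: str, required: dict[str, list[str]]) -> list[str]:
    """Return the names of required elements with no matching keyword in the text."""
    lowered = text.lower()
    return [
        name
        for name, keywords in required.items()
        if not any(keyword.lower() in lowered for keyword in keywords)
    ]


# Hypothetical checklist for the ghost-story brief: element name -> accepted keywords.
REQUIRED = {
    "Cairo setting": ["cairo"],
    "apartment building": ["apartment", "building"],
}

print(missing_elements("The old Cairo apartment building held its", REQUIRED))
print(missing_elements("A ghost waited on the stairs.", REQUIRED))
```

The first call reports nothing missing and the second reports both elements. A check like this can confirm that a setting was mentioned, but it cannot tell whether the twist lands.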

Captured 2026-04-27 from Claude Sonnet 4.5, GPT-4o-mini, and Gemini 2.5 Flash. Re-runs may differ slightly.
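
Because re-runs differ, one capture per model is a thin basis for a ranking. One way to firm it up is to sample each model several times and report how often the output passes a check. The sketch below assumes a caller-supplied `generate` function that returns one model output per call. `pass_rate`, `within_cap`, and `fake_generate` are hypothetical names, and the stand-in generator replays canned text so the example runs offline with no API call.

```python
import itertools
from typing import Callable


def pass_rate(generate: Callable[[], str], check: Callable[[str], bool], runs: int = 5) -> float:
    """Call generate() `runs` times and return the fraction of outputs that pass check."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if check(generate()))
    return passed / runs


def within_cap(text: str, cap: int = 50) -> bool:
    """True when the whitespace-separated word count is at or under the cap."""
    return len(text.split()) <= cap


# Stand-in for a real model call (assumption): alternates a short output and a 51-word output.
_canned = itertools.cycle(["A short story.", "word " * 51])


def fake_generate() -> str:
    return next(_canned)


print(pass_rate(fake_generate, within_cap, runs=4))  # 2 of 4 outputs fit the cap: 0.5
```

With a real client, `generate` would wrap the API call for one model and one prompt, and the same `check` would be applied to every model being compared.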

Next module: where each frontier API sits relative to open-weight models, and how the prompt budget changes when you go local.
