Reasoning modes — when to think, when to skip
Creative vs reasoning — different model strengths
The puzzle in lesson 2 had a single correct answer. Reasoning tasks reward correctness. Creative tasks reward something different: concision, voice, surprise, structure that fits the prompt. The same models that ranked Claude > GPT > Gemini on reasoning may rank differently on creative work.
The captured prompt asks for a 50-word ghost story with the twist in the last 5 words.
The prompt
Write a 50-word ghost story set in a Cairo apartment building. The twist is in the last 5 words. No more than 50 words total.
Claude Sonnet 4.5
The old elevator in my Cairo building stopped at floors that didn't exist. Fourth floor—screams. Sixth floor—whispers in Arabic. Tonight it stopped at the eighth floor, doors opening to reveal my own apartment. I stepped out and saw myself sleeping peacefully in bed, still very much alive.
GPT-4o-mini
In an old Cairo apartment, whispers echoed through the halls at night. Residents spoke of a woman in white, mourning her lost love. One evening, a curious tenant followed the sound, discovering a dusty mirror. As she peered in, the reflection smiled back, revealing her own face—yet she was never there.
Gemini 2.5 Flash
The old Cairo apartment building held its
Side-by-side scorecard
50-word ghost story with a final-5-words twist
Claude Sonnet 4.5
- Structural beats: floors 4, 6, 8
- Twist lands in the last 5 words
GPT-4o-mini
- Twist is conceptual, not structural
- Slightly over the cap
Gemini 2.5 Flash
- Stopped at 'held its'
- Not viable for creative tasks
What this tells us
Claude's story has actual narrative architecture. It builds (fourth floor, sixth floor, eighth floor), using the elevator stops as structural beats. The setting is specific: Cairo, the eighth floor, "my own apartment." The twist, "still very much alive," lands inside the final five words, exactly where the brief put it. Word count: 47, within the cap.
GPT-4o-mini's story is well-formed prose, but it does not really land the twist requirement. The last five words are "yet she was never there," a closing twist that is more a logical paradox than a structural one: "the woman in white was the protagonist all along" is not as cleanly plotted as Claude's "I am the ghost watching myself sleep." GPT also runs just over the 50-word cap, and the twist is buried at the end of a longer final sentence rather than isolated in the last five words.
Gemini truncated again. Seven words. No story.
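The brief's structural constraints are mechanically checkable, which is useful when re-running captures like these. Below is a minimal sketch; the `check_brief` helper is my own convention (not part of any capture), and it uses naive whitespace splitting, so a dash-joined token like "floor—screams" counts as one word. Other tokenizers will give slightly different counts.

```python
def check_brief(story: str, word_cap: int = 50, twist_tail: int = 5) -> dict:
    """Check the brief's two mechanical constraints:
    total word count, and what falls in the final `twist_tail` words."""
    words = story.split()  # naive whitespace split; punctuation stays attached
    return {
        "word_count": len(words),
        "within_cap": len(words) <= word_cap,
        "last_words": " ".join(words[-twist_tail:]),
    }

# Claude Sonnet 4.5's captured story, verbatim
claude = ("The old elevator in my Cairo building stopped at floors that "
          "didn't exist. Fourth floor—screams. Sixth floor—whispers in Arabic. "
          "Tonight it stopped at the eighth floor, doors opening to reveal my "
          "own apartment. I stepped out and saw myself sleeping peacefully in "
          "bed, still very much alive.")

report = check_brief(claude)
print(report["word_count"], report["within_cap"])  # 47 True
print(report["last_words"])                        # bed, still very much alive.
```

A check like this only verifies the countable parts of the brief; whether the twist actually reads as a twist still needs a human (or judge-model) pass.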
For creative writing, the takeaway: Claude tends to produce tighter narrative structure under word caps, GPT tends to produce smoother prose without quite hitting the structural beats, and Gemini Flash is rarely the right model for creative tasks. The pattern is consistent with what we have seen across all the captures: Claude is disciplined under constraints, GPT is conversational and expansive, Gemini Flash truncates.
What this means for thinking mode
Should you turn on thinking for creative writing? Almost never. Creative writing benefits from one strong forward pass with good defaults, not from a hidden trace where the model second-guesses itself. The trace can introduce edits that flatten voice — "actually, let me try a different opening" produces an opening that sounds like a model trying to sound creative rather than a model that was creative on the first attempt.
The exception is creative writing with strict structural rules — a haiku, a sonnet, a story with mandatory elements that need accounting. There the constraint-tracking helps. For free-form creative writing, base models with good prompts beat thinking modes consistently.
Captured 2026-04-27 from Claude Sonnet 4.5, GPT-4o-mini, and Gemini 2.5 Flash. Re-runs may differ slightly.
Next module: where each frontier API sits relative to open-weight models — and how the prompt budget changes when you go local.