Tone & instruction-following across models
Tone differences — warmer without losing intent
Same three models, very different tone defaults. The user wants a passive-aggressive deadline reminder rewritten warmer and more direct, under 80 words, no new commitments. Watch what each model thinks "warm" means.
The prompt
Rewrite this email warmer and more direct. Keep it under 80 words. Do not add new commitments.
Original: "Hi team, Per my last note, the deck is overdue. We need it by EOD or we cannot ship the campaign Monday. Please prioritise. Thanks."
Claude Sonnet 4.5
Hi team,
Just checking in on the deck we discussed. I know you're juggling a lot right now!
To keep our Monday campaign launch on track, we'll need it by end of day today.
Appreciate your help with this!
Thanks,
GPT-4o-mini
Subject: Quick Reminder: Deck Due Today
Hi team,
I hope you're all doing well! Just a friendly reminder that the deck is overdue. We really need it by the end of the day to ensure we can ship the campaign on Monday. Your help in prioritizing this is greatly appreciated!
Thanks so much!
Gemini 2.5 Flash
Hi team,
The deck is overdue
Side-by-side scorecard
Warmth rewrite under 80 words
Claude Sonnet 4.5
- Hit the 80-word cap with margin
- Stripped the implicit threat
GPT-4o-mini
- Volunteers email scaffolding you didn't ask for
- Filler phrases ('I hope you're all doing well')
Gemini 2.5 Flash
- Stopped after the greeting
- Returned the un-rewritten problem
What this tells us
Three completely different ideas of "warm". Claude opened with empathy ("I know you're juggling a lot right now"), softened the deadline language ("we'll need it"), and stripped the implicit threat. It also followed every constraint: no subject line, under 80 words, no new commitments. Word count: 40.
GPT-4o-mini added a subject line the user did not ask for ("Quick Reminder: Deck Due Today"). Adding fields that were not requested is GPT-4o-mini's most consistent dialect quirk across this course: when it sees an email-shaped task, it volunteers email-shaped scaffolding. The body itself is warm but redundant ("I hope you're all doing well" is exactly the kind of filler Claude's version leaves out). Word count: 51.
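The two mechanical constraints discussed above (the word cap and the unrequested subject line) can be checked in code. What follows is a minimal sketch, not course tooling: the function names are hypothetical, "word" is assumed to mean a whitespace-separated token (other counting rules give slightly different totals), and a line beginning with "Subject:" is assumed to signal email scaffolding. The "no new commitments" constraint is a judgment call and is not covered.

```python
import re

WORD_LIMIT = 80  # from the prompt: "Keep it under 80 words"


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def has_subject_line(text: str) -> bool:
    """True if any line starts with 'Subject:' (case-insensitive)."""
    return any(
        re.match(r"\s*subject\s*:", line, re.IGNORECASE)
        for line in text.splitlines()
    )


def check_rewrite(text: str, limit: int = WORD_LIMIT) -> dict:
    """Report the mechanical checks for one model reply."""
    words = count_words(text)
    return {
        "words": words,
        "under_limit": words < limit,  # "under 80" is read as strictly fewer
        "unrequested_subject": has_subject_line(text),
    }
```

Passing each captured reply through check_rewrite gives a quick first filter before a human reads for tone.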
Gemini truncated again. A greeting and one unfinished sentence, no warmth, and that sentence is just the original problem statement restated. This is the failure mode you saw in the previous lesson but in a different domain: when the prompt asks for transformation under constraints, Flash sometimes returns the un-transformed version with formatting damage.
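A reply that only restates the original can be flagged automatically. The sketch below is one rough heuristic, assuming that a line counts as "copied" when its normalized words appear as a contiguous run in the original email; the function names and the choice to treat an empty reply as fully untransformed are illustrative assumptions, not an established metric.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def copied_fraction(original: str, output: str) -> float:
    """Share of the reply's non-empty lines found verbatim in the original.

    1.0 means every line was lifted from the source; values near 0.0 mean
    the reply was actually rewritten. An empty reply is scored 1.0
    (an assumption: nothing was transformed).
    """
    source = f" {normalize(original)} "  # pad so matches align on word boundaries
    lines = [normalize(line) for line in output.splitlines()]
    lines = [line for line in lines if line]
    if not lines:
        return 1.0
    copied = sum(1 for line in lines if f" {line} " in source)
    return copied / len(lines)
```

On the captured replies, the truncated Gemini reply scores 1.0 because both of its lines come straight from the original, while a genuine rewrite scores well below that. Short sign-offs such as "Thanks" still count as copied, so treat the score as a prompt for review, not a verdict.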
The takeaway for tone tasks: GPT and Claude both produce usable warmth, but they get there via opposite levers. Claude removes pressure. GPT adds friendliness. If your brand voice is Bayt Coffee — direct, warm, no fluff — Claude's pattern matches. If your brand voice is American-corporate-friendly, GPT's pattern matches. Gemini, on this kind of task, needs the prompt rewritten to be far less subtle.
Captured 2026-04-27 from Claude Sonnet 4.5, GPT-4o-mini, and Gemini 2.5 Flash. Re-runs may differ slightly.
Next: how to choose the right model per task, with a decision tree you can take to your team.