# Tone & instruction-following across models

## Why the same prompt produces three outputs
Hagar finished the Foundation course six months ago. Her marketing side project at Bayt Coffee turned into a real role, and she is now the AI engineer at the small Cairo startup. Last Monday her CTO walked into standup with a single question: "Why are we paying for Claude when GPT-4o-mini is one-tenth the price? Should we switch?"
That question is what this course answers.
You already know how to write a prompt. The Foundation course taught you the 5-slot skeleton, format locking, system prompts, few-shot, the I-don't-know trigger. Those rules still hold. What changes here is that the same prompt — same words, same constraints, same examples — produces a different answer depending on which model receives it.
Not slightly different. Visibly different. Different lengths, different formats, different willingness to push back, different rule-following discipline, different tone defaults.
## What you will see across this course
Every comparison lesson in this course was captured live on 2026-04-27 from three frontier APIs:
| Model | Vendor | Role in the comparison |
|---|---|---|
| Claude Sonnet 4.5 | Anthropic | The careful, format-disciplined one |
| GPT-4o-mini | OpenAI | The chatty, friendly default |
| Gemini 2.5 Flash | Google | The fast, sometimes-truncating one |
These are not screenshots from a blog post. They are not paraphrased. The exact bytes the API returned are pasted into each lesson. When a model fails an instruction, you will see the failure verbatim. When two models give answers that look almost identical, you will see how the small differences matter.
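Under the hood, a capture like this is just one prompt string handed to three SDK clients. Here is a minimal sketch, assuming the `anthropic`, `openai`, and `google-genai` Python SDKs; the model IDs are assumptions based on current aliases, so pin the dated IDs your vendor publishes before relying on the numbers:

```python
# A sketch of how the captured outputs were gathered: the identical
# prompt goes to all three APIs; the raw text comes back untouched.
import anthropic
from openai import OpenAI
from google import genai

PROMPT = "Summarize the attached ticket in exactly three bullet points."

def ask_claude(prompt: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env
    resp = client.messages.create(
        model="claude-sonnet-4-5",  # assumed alias; use a dated ID in production
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

def ask_gpt(prompt: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from env
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_gemini(prompt: str) -> str:
    client = genai.Client()  # reads GEMINI_API_KEY from env
    resp = client.models.generate_content(
        model="gemini-2.5-flash", contents=prompt
    )
    return resp.text

for name, fn in [("claude", ask_claude), ("gpt", ask_gpt), ("gemini", ask_gemini)]:
    print(f"--- {name} ---")
    print(fn(PROMPT))  # printed verbatim: no strip(), no reformatting
```

The discipline that matters is in the last line: nothing downstream touches the string, so the failures you will see in these lessons are the model's, not the pipeline's.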
## What you will learn to decide
- Which model produces the cleanest output for this task — without any prompt rewriting.
- Which model needs the prompt rewritten to match its dialect, and what the rewrite looks like.
- Which model fails on this task in a way that no prompt fix can repair, and what to fall back to.
- What it costs you in tokens, latency, and dollars to get to a usable answer on each (the arithmetic is sketched below).
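The dollars part of that last point is plain arithmetic over token counts. A minimal sketch, with placeholder per-million-token prices; these are assumptions for illustration, not quoted rates, so substitute the numbers from each vendor's pricing page:

```python
# Cost of one request = input_tokens * input_price + output_tokens * output_price.
PRICE_PER_MTOK = {  # (input_usd, output_usd) per million tokens -- hypothetical
    "claude-sonnet-4.5": (3.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gemini-2.5-flash": (0.30, 2.50),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICE_PER_MTOK[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Example traffic profile: 1,200 tokens in, 400 tokens out, 10,000 requests/month.
for model in PRICE_PER_MTOK:
    monthly = 10_000 * request_cost(model, 1_200, 400)
    print(f"{model}: ${monthly:,.2f}/month")
```

Run against your real traffic profile, this one loop produces the per-model cost column of the comparison report Hagar's CTO is actually asking for.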
Hagar's CTO question — Claude vs GPT, should we switch — is not answered with "yes" or "no". It is answered with: "for these tasks GPT works fine, for these tasks Claude is worth paying for, for these tasks Gemini is faster, and here is the comparison report."
That report is your capstone.
Next: a strict-rule prompt sent to all three models. The output is uglier than you think.