Tone & instruction-following across models

Why the same prompt produces three outputs

4 min read

Hagar finished the Foundation course six months ago. Her marketing side project at Bayt Coffee turned into a real role — she is now the AI engineer at a small Cairo startup, and her CTO walked into standup last Monday with a single question: "Why are we paying for Claude when GPT-4o-mini is one-tenth the price? Should we switch?"

That question is what this course answers.

You already know how to write a prompt. The Foundation course taught you the 5-slot skeleton, format locking, system prompts, few-shot, the I-don't-know trigger. Those rules still hold. What changes here is that the same prompt — same words, same constraints, same examples — produces a different answer depending on which model receives it.

Not slightly different. Visibly different. Different lengths, different formats, different willingness to push back, different rule-following discipline, different tone defaults.

Cross-model knowledge map

What you will see across this course

Every comparison lesson in this course was captured live on 2026-04-27 from three frontier APIs:

Model             | Vendor    | Role in the comparison
------------------|-----------|------------------------------------
Claude Sonnet 4.5 | Anthropic | The careful, format-disciplined one
GPT-4o-mini       | OpenAI    | The chatty, friendly default
Gemini 2.5 Flash  | Google    | The fast, sometimes-truncating one

These are not screenshots from a blog post. They are not paraphrased. The exact bytes the API returned are pasted into each lesson. When a model fails an instruction, you will see the failure verbatim. When two models give answers that look almost identical, you will see how the small differences matter.

What you will learn to decide

  1. Which model produces the cleanest output for this task — without any prompt rewriting.
  2. Which model needs the prompt rewritten to match its dialect, and what the rewrite looks like.
  3. Which model fails on this task in a way that no prompt fix can repair, and what to fall back to.
  4. What it costs you in tokens, latency, and dollars to get to a usable answer on each.
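The fourth decision comes down to simple per-request arithmetic. As a minimal sketch — the per-million-token prices below are placeholder assumptions for illustration, not real vendor rates, and the model keys are made-up labels — you can compare what one request costs on each model:

```python
# Illustrative cost comparison across three models.
# PRICE_PER_MTOK values are ASSUMED placeholders, not actual vendor pricing.
PRICE_PER_MTOK = {
    # model label: (input $/1M tokens, output $/1M tokens)
    "claude-sonnet": (3.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gemini-flash": (0.30, 2.50),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request, given its token counts."""
    price_in, price_out = PRICE_PER_MTOK[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Same 1,200-token prompt, roughly 400-token answer, on each model:
for model in PRICE_PER_MTOK:
    print(f"{model}: ${request_cost(model, 1_200, 400):.6f}")
```

The point is not the specific numbers but the habit: a model that needs two retries or a longer rewritten prompt to produce a usable answer can erase a 10x per-token price advantage.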

Hagar's CTO question — Claude vs GPT, should we switch — is not answered with "yes" or "no". It is answered with: "for these tasks GPT works fine, for these tasks Claude is worth paying for, for these tasks Gemini is faster, and here is the comparison report."

That report is your capstone.

Next: a strict-rule prompt sent to all three models. The output is uglier than you think.
