System-prompt behavior — where each model listens hardest
Faithful summarisation — count the embellishments
The instruction here is unusually demanding: "Do not add facts not in the source. Do not soften or strengthen claims." Three sentences exactly. The source is a small pilot study with explicit caveats from the author. This is a faithfulness test — does the model preserve the qualifiers, or does it polish them into something cleaner that the original did not say?
The prompt
Summarise this in exactly 3 sentences. Do not add facts not in the source. Do not soften or strengthen claims.
Source: "In a small pilot, 14 of 22 participants reported feeling more focused after switching from afternoon coffee to herbal tea. The pilot ran for 2 weeks. The author cautions the sample is small and not blinded; results may not generalise."
Claude Sonnet 4.5
In a small pilot study, 14 of 22 participants reported feeling more focused after switching from afternoon coffee to herbal tea. The pilot study lasted for 2 weeks. The author cautions that the sample is small and not blinded, and results may not generalise.
GPT-4o-mini
In a small pilot study, 14 of 22 participants reported increased focus after switching from afternoon coffee to herbal tea. The study lasted for 2 weeks. The author warns that the sample size is small and not blinded, so the results may not be generalizable.
Gemini 2.5 Flash
A small, two-week pilot study
Side-by-side scorecard
Faithful summary: did the qualifiers survive?
| Model | Qualifiers | Verdict |
| --- | --- | --- |
| Claude Sonnet 4.5 | Preserves the author's qualifiers | Safest for compliance digests |
| GPT-4o-mini | Two small strengthenings | Risky at scale for research digests |
| Gemini 2.5 Flash | Stopped at "A small, two-week pilot study" | Unusable as a summary |
What this tells us
Claude is the most literal of the three. Its summary is essentially a re-typesetting of the source: same numbers (14 of 22), same time (2 weeks), same author qualifiers ("small and not blinded", "may not generalise"). Three sentences exactly. No new claims.
GPT-4o-mini changes two phrases. "Reported feeling more focused" became "reported increased focus", a tiny strengthening: "feeling more focused" is a subjective report, while "increased focus" sounds slightly more like an objective measurement. The other change: "the author cautions" became "the author warns", which is semantically close but stronger. Both shifts are small, and in a single summary they look harmless. Across thousands of summaries in a research-digest pipeline, though, they accumulate into a systematic over-strengthening of claims.
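At pipeline scale, one cheap guard is a lexical check that flags summaries dropping the source's hedging vocabulary. A minimal sketch; the hedge list and the substring matching are illustrative assumptions, not a validated lexicon:

```python
# Crude lexical drift check: flag summaries that lose hedging terms
# present in the source. The hedge list is an illustrative assumption;
# extend and validate it for your own domain.
HEDGES = {"may", "might", "cautions", "pilot", "small",
          "not blinded", "preliminary", "suggests"}

def hedge_terms(text: str) -> set[str]:
    """Hedge terms present in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return {h for h in HEDGES if h in lowered}

def dropped_hedges(source: str, summary: str) -> set[str]:
    """Hedge terms the summary lost relative to the source."""
    return hedge_terms(source) - hedge_terms(summary)

source = ("In a small pilot, 14 of 22 participants reported feeling more "
          "focused after switching from afternoon coffee to herbal tea. "
          "The pilot ran for 2 weeks. The author cautions the sample is "
          "small and not blinded; results may not generalise.")
gpt_summary = ("In a small pilot study, 14 of 22 participants reported "
               "increased focus after switching from afternoon coffee to "
               "herbal tea. The study lasted for 2 weeks. The author warns "
               "that the sample size is small and not blinded, so the "
               "results may not be generalizable.")

print(dropped_hedges(source, gpt_summary))  # {'cautions'}
```

A check this crude cannot tell that "warns" replaced "cautions" with a stronger verb; it only notices that the softer word disappeared, which is often enough to route the summary to a human.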
Gemini truncated. "A small, two-week pilot study" is a five-word fragment that stops mid-sentence: no claim, no number, no caveat. It is useless as a summary, and worse than useless if downstream code assumes it has a real one.
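A cheap guard against exactly this failure is validating the output's shape before accepting it. A minimal sketch, assuming the three-sentence contract from the prompt above; the sentence counting is a deliberately crude heuristic:

```python
import re

def looks_like_valid_summary(text: str, expected_sentences: int = 3) -> bool:
    """Reject empty, truncated, or wrong-length summaries.

    Crude heuristics: the text must end in terminal punctuation and
    contain the expected number of sentence boundaries. A fragment
    like "A small, two-week pilot study" fails both checks.
    """
    text = text.strip()
    if not text or text[-1] not in ".!?":
        return False  # fragment: does not terminate like a sentence
    boundaries = re.findall(r"[.!?](?:\s|$)", text)
    return len(boundaries) == expected_sentences

print(looks_like_valid_summary("A small, two-week pilot study"))  # False
```

Abbreviations and decimals will confuse naive boundary counting, so a real pipeline would use a proper sentence tokenizer; even this version, though, catches the Gemini failure before it propagates.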
For a faithful-summarisation task, the kind that feeds research digests, regulatory filings, and medical content review, Claude's literal preservation is the safest default. GPT-4o-mini works if you can accept some semantic drift, which is fine for marketing summaries but dangerous for compliance work. Gemini Flash, on this task, is not a candidate at all.
Notice that all three got the numbers right whenever they stated them: 14 of 22, 2 weeks. The drift is not in the numbers. It is in the qualifiers, and the qualifiers are exactly what makes a summary faithful or unfaithful.
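That split suggests what can be automated: numeric consistency is mechanically checkable, qualifier fidelity mostly is not. A sketch of the mechanical half (regex-only, so it misses spelled-out figures like Gemini's "two-week", and an empty summary passes trivially; pair it with the shape check above):

```python
import re

def numbers_in(text: str) -> set[str]:
    """Digit tokens present in the text: '14', '22', '2', ..."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def numbers_preserved(source: str, summary: str) -> bool:
    """True if the summary introduces no number the source lacks."""
    return numbers_in(summary) <= numbers_in(source)

source = "14 of 22 participants; the pilot ran for 2 weeks."
print(numbers_preserved(source, "14 of 22 felt more focused over 2 weeks."))  # True
print(numbers_preserved(source, "15 of 22 felt more focused."))               # False
```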
Captured 2026-04-27 from Claude Sonnet 4.5, GPT-4o-mini, and Gemini 2.5 Flash. Re-runs may differ slightly.
Next: a translation guide for moving prompts between vendors when one is failing.