Reasoning modes — when to think, when to skip
A logic puzzle across three base models
A small logic puzzle is the cleanest place to see reasoning differences: three constraints, three runners, work out the order. The output format is fixed as one paragraph of reasoning followed by a 1st/2nd/3rd assignment. None of the three models had thinking mode enabled for this run; this is base-model performance.
The prompt
Three friends — Ali, Bilal, and Cyrus — finished a race in some order.
- Ali did not finish last.
- Bilal finished after Cyrus.
- Cyrus did not finish first.
Who finished in each position? Show your one-paragraph reasoning, then give the final answer in this format: 1st: 2nd: 3rd:
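Before the transcripts, here is a minimal sketch of how a run like this could be issued against all three APIs with thinking disabled everywhere. This is an assumption about the setup, not a capture of the original harness: the model IDs and the `thinking_budget=0` knob for Flash are illustrative.

```python
# Hypothetical reproduction of the base-model run: one prompt, three
# providers, no extended thinking anywhere. Model IDs are assumptions.
import anthropic
from openai import OpenAI
from google import genai
from google.genai import types

PROMPT = """Three friends — Ali, Bilal, and Cyrus — finished a race in some order.
- Ali did not finish last.
- Bilal finished after Cyrus.
- Cyrus did not finish first.
Who finished in each position? Show your one-paragraph reasoning, then
give the final answer in this format: 1st: 2nd: 3rd:"""

# Claude: omitting the `thinking` parameter keeps it in base mode.
claude = anthropic.Anthropic().messages.create(
    model="claude-sonnet-4-5",  # assumed model ID
    max_tokens=500,
    messages=[{"role": "user", "content": PROMPT}],
)

# GPT-4o-mini: a non-reasoning model, so there is nothing to disable.
gpt = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": PROMPT}],
)

# Gemini 2.5 Flash: thinking_budget=0 explicitly turns thinking off.
gemini = genai.Client().models.generate_content(
    model="gemini-2.5-flash",
    contents=PROMPT,
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0)
    ),
)

print(claude.content[0].text)
print(gpt.choices[0].message.content)
print(gemini.text)
```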
Claude Sonnet 4.5
Let me work through this step by step. Since Cyrus did not finish first, he finished either 2nd or 3rd. Since Bilal finished after Cyrus, if Cyrus finished 2nd, then Bilal finished 3rd; if Cyrus finished 3rd, then Bilal would need to finish after 3rd, which is impossible. Therefore, Cyrus finished 2nd and Bilal finished 3rd. This means Ali finished 1st, which is consistent with the constraint that Ali did not finish last.
1st: Ali 2nd: Cyrus 3rd: Bilal
GPT-4o-mini
To determine the finishing order of Ali, Bilal, and Cyrus, we analyze the provided clues. First, since Ali did not finish last, he must have finished either first or second. Second, since Bilal finished after Cyrus, he cannot be in first place. Third, since Cyrus did not finish first, he must have finished either second or third. The only possible scenario that satisfies all these conditions is if Cyrus finished second, Ali finished first, and Bilal finished last. Thus, the final positions are:
1st: Ali 2nd: Cyrus 3rd: Bilal
Gemini 2.5 Flash
Cyrus could not have finished first, and Bilal finished after Cyrus,
Side-by-side scorecard
Logic puzzle: did the base model solve it?
Claude Sonnet 4.5
- Solved it
- Picks most-constrained variable first
- Shows the impossible case
GPT-4o-mini
- Solved it
- Right answer, hand-wavier explanation
- Hides the actual deduction step
Gemini 2.5 Flash
- Did not solve it
- Stopped after restating two clues
- Not a candidate for reasoning tasks
What this tells us
Claude and GPT-4o-mini both arrived at the correct answer (Ali / Cyrus / Bilal) and both produced the answer in the requested format. They got there via slightly different reasoning paths.
Claude's reasoning is constraint-elimination: pick the most-constrained variable first (Cyrus, who has two explicit constraints), enumerate the cases, eliminate the impossible one. This is the textbook approach to small constraint-satisfaction problems and Claude executes it crisply.
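The puzzle is small enough to verify mechanically. As a sanity check on both models' answers, a brute-force sketch that enumerates all 3! = 6 orderings and keeps the ones satisfying every constraint:

```python
# Enumerate every possible finishing order and test the constraints.
from itertools import permutations

runners = ["Ali", "Bilal", "Cyrus"]

def satisfies(order):
    # order[0] is 1st place, order[-1] is last (3rd) place
    ali_not_last = order[-1] != "Ali"
    bilal_after_cyrus = order.index("Bilal") > order.index("Cyrus")
    cyrus_not_first = order[0] != "Cyrus"
    return ali_not_last and bilal_after_cyrus and cyrus_not_first

solutions = [order for order in permutations(runners) if satisfies(order)]
print(solutions)  # [('Ali', 'Cyrus', 'Bilal')] — unique, matching both models
```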
GPT-4o-mini's reasoning is constraint-listing: gather what each constraint implies, narrow the search space, identify the only consistent assignment. The chain is correct but indirect: "the only possible scenario that satisfies all these conditions" is doing more work than the explanation shows, because the response never rules out Cyrus finishing third, which would push Bilal into a nonexistent fourth place. GPT-4o-mini got the answer; the explanation hides the actual deduction.
Gemini Flash truncated. Again. Two clauses in, no answer, no format. This is a base-model run, not a thinking-mode run, but Flash has produced nothing usable for any reasoning task we have shown so far. Thinking mode is supposed to fix this; Gemini 2.5 Pro with thinking_config enabled is a different beast (sketched below). But Flash on its own does not have the headroom for even a small constrained reasoning problem.
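For reference, this is roughly what the thinking-mode upgrade looks like in the google-genai SDK. The model ID and budget value are illustrative assumptions; whether the extra latency pays off is the next lesson's question.

```python
# Sketch: the same prompt against Gemini 2.5 Pro with thinking enabled.
# thinking_budget and include_thoughts are real google-genai options,
# but the budget value here is arbitrary.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment
response = client.models.generate_content(
    model="gemini-2.5-pro",  # assumed model ID
    contents=PROMPT,  # the race-puzzle prompt from the earlier sketch
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=1024,   # tokens reserved for internal reasoning
            include_thoughts=True,  # surface a thought summary in the response
        )
    ),
)
print(response.text)
```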
The takeaway for Hagar's CTO question: if your task involves any kind of multi-step deduction, even a tiny one like this, Claude and GPT-4o-mini are both viable on the base model. Gemini Flash is not. The next lesson asks: should you upgrade to thinking mode anyway? When do the latency and cost pay back?
Captured 2026-04-27 from Claude Sonnet 4.5, GPT-4o-mini, and Gemini 2.5 Flash. Re-runs may differ slightly.
Next: when extended thinking actually pays back the latency cost.