Temperature in plain English
Temperature is the third API knob, and the most misunderstood. Every tutorial gives a vague "controls creativity" definition and moves on. Here is the practical version.
When the model picks the next word, it is choosing from a probability distribution. Temperature is how much you scale that distribution before sampling. Low temperature sharpens the peak and suppresses the long tail, making the top choice almost certain. High temperature flattens the peak and lets unlikely words show up.
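The scaling step is easy to sketch. This is a minimal illustration of temperature-scaled softmax, not any provider's actual sampler; the logit values and the function name are made up for the example:

```python
import math

def apply_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax into probabilities.

    Lower temperature sharpens the distribution (the top choice dominates);
    higher temperature flattens it (the long tail gets more mass).
    A tiny floor avoids dividing by zero at temperature 0.
    """
    t = max(temperature, 1e-6)
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy next-word logits: the first candidate is the model's favorite.
logits = [2.0, 1.5, 0.5, -1.0]

cold = apply_temperature(logits, 0.1)  # near-greedy: top word dominates
warm = apply_temperature(logits, 1.0)  # unscaled distribution
```

At temperature 0.1 the top candidate absorbs nearly all the probability mass; at 1.0 the runners-up keep a real share, which is exactly the run-to-run variation the dial below describes.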
Mentally, you can think of it as a dial from 0 to 1:
| Temperature | What it feels like | Use it for |
|---|---|---|
| 0.0 – 0.2 | Strict. Same input mostly returns the same output. | Classification, extraction, JSON, code, anything you grade against a fixed answer. |
| 0.3 – 0.5 | Steady. Small variation, still controlled. | Customer support replies, summaries, tone-locked rewrites. |
| 0.6 – 0.8 | Loose. Each run feels noticeably different. | Brainstorming, marketing copy, taglines, alternate phrasings. |
| 0.9 – 1.0 | Wild. Outputs swing hard between runs. | Creative fiction, ad concepts, idea generation when you want surprise. |
Two practical rules
- Default to 0.3 for production. It is boring on purpose. You want repeatability when real users are on the other end. Save the high-temperature runs for the brainstorming session, not the live customer reply.
- Temperature does not fix a bad prompt. If your system prompt is vague, cranking temperature up just gives you "vaguer in five different ways". Cranking it down gives you "the same vague answer every time". Tighten the prompt first; tune temperature last.
A common trap
People reach for temperature when the real problem is missing constraints. If the model keeps adding fluffy disclaimers, the fix is a system prompt that says "no disclaimers, no caveats" — not temperature 0.0. Temperature controls how the model picks words, not which words are allowed.
The Bayt Coffee assistant Hagar is about to build will live at temperature 0.3: deterministic enough that a complaint about a late order produces almost the same reply every time, but loose enough that the language does not feel robotic.
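In code, that decision is just one field on the request. Below is a hedged sketch of assembling a chat-style request body; the field names follow the widely used chat-completions shape, and `build_request`, the model name, and the prompts are placeholders rather than a real SDK call:

```python
def build_request(system_prompt, user_message, temperature=0.3):
    """Assemble a chat-style request body.

    The dict layout mirrors the common chat-completions shape; adapt the
    field names to your provider's SDK. Temperature defaults to the
    production-safe 0.3 recommended above.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("keep temperature in the 0-1 dial range")
    return {
        "model": "your-model-here",  # placeholder, not a real model name
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    }

# Example: a hypothetical complaint reaching the assistant.
req = build_request(
    "You are the Bayt Coffee assistant. No disclaimers, no caveats.",
    "My order is 40 minutes late.",
)
```

Pinning the default in one helper also means a brainstorming script can pass `temperature=0.8` explicitly, while every production path stays at 0.3 without anyone remembering to set it.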
Next module: scaling the 5-slot prompt skeleton up into a real production system prompt.