Prompt Engineering for Interviews
Prompt Design Patterns: Zero-Shot to Tree of Thoughts
Why This Matters for Interviews
OpenAI, Anthropic, and Meta expect L5-L6 candidates to design production-ready prompts on the spot. Common interview scenarios:
- System design: "Build a code review assistant. What prompting strategy would you use?"
- Debugging: "Our chatbot hallucinates on edge cases. How would you fix it with prompting?"
- Comparison: "When would you use CoT vs ReAct vs ToT?"
Real Interview Question (Anthropic L6):
"A customer wants Claude to solve complex math problems. Walk me through how you'd design the prompt, from zero-shot to advanced techniques. What are the trade-offs?"
The Prompting Hierarchy
From simplest to most complex:
Zero-Shot → Few-Shot → Chain-of-Thought → ReAct → Tree of Thoughts

| Pattern | What it adds | Profile |
|---|---|---|
| Zero-Shot | No examples | Fast, cheap |
| Few-Shot | Examples | Better consistency |
| Chain-of-Thought | Shown reasoning | Much better on reasoning tasks |
| ReAct | Actions (tool use) | Grounded in external facts |
| Tree of Thoughts | Explored paths | Best quality, most expensive |
Interview Insight: Always start with zero-shot, add complexity only if needed.
Zero-Shot Prompting (The Baseline)
Definition: Prompt with task description, no examples.
When to Use:
- Task is well-defined and within LLM's capabilities
- Speed and cost matter (no extra tokens)
- GPT-5.2 / Claude 4.5 level models (strong instruction following)
Example - Code Review:
from openai import OpenAI

client = OpenAI()

def zero_shot_review(code):
    prompt = """You are an expert code reviewer. Review the following code for:
1. Bugs and edge cases
2. Performance issues
3. Security vulnerabilities

Provide specific, actionable feedback.

Code to review:
```python
{code}
```"""
    response = client.chat.completions.create(
        model="gpt-5.2-mini",
        messages=[
            {"role": "system", "content": "You are an expert code reviewer."},
            {"role": "user", "content": prompt.format(code=code)}
        ],
        temperature=0.3  # Lower temperature for consistent reviews
    )
    return response.choices[0].message.content
Pros:
- ✅ Fast: No extra prompt tokens
- ✅ Simple: Easy to test and iterate
- ✅ Works well for GPT-5+ models
Cons:
- ❌ Less consistent than few-shot
- ❌ May miss domain-specific patterns
- ❌ Struggles with complex reasoning
Interview Question: "When would zero-shot fail?"
Strong Answer:
"Zero-shot fails when: (1) Task requires domain-specific format (e.g., legal documents, medical reports), (2) Ambiguous instructions (model guesses intent), (3) Complex multi-step reasoning without guidance. Example: asking GPT to 'debug this code' might give generic advice, but showing an example debug output guides the format and depth expected."
Few-Shot Prompting (In-Context Learning)
Definition: Provide 2-8 input-output examples before the actual task.
The Magic Number:
- GPT-5.2: 3-5 examples optimal (diminishing returns after 5)
- Claude Opus 4.5: 2-4 examples (better instruction following)
- Smaller models: 5-8 examples needed
Example - Sentiment Analysis with Nuance:
def few_shot_sentiment(text):
    prompt = """Classify sentiment as: POSITIVE, NEGATIVE, or MIXED (if contradictory).

Examples:
Input: "The product is great but shipping took forever."
Output: MIXED

Input: "Absolutely love it! Best purchase ever."
Output: POSITIVE

Input: "Waste of money. Broke after 2 days."
Output: NEGATIVE

Input: "Works okay I guess, nothing special."
Output: MIXED

Now classify:
Input: "{text}"
Output:"""
    response = client.chat.completions.create(
        model="gpt-5.2-mini",
        messages=[{"role": "user", "content": prompt.format(text=text)}],
        temperature=0.0,  # Deterministic for classification
        max_tokens=10
    )
    return response.choices[0].message.content.strip()
Why This Works:
- Model learns task format (classification, not explanation)
- Model learns edge cases (MIXED sentiment)
- Model learns output style (one-word answers)
Interview Deep Dive: "How do you select few-shot examples?"
Production Strategy:
1. Diversity: Cover different input patterns.

   # Bad: all positive examples
   examples = ["Great!", "Awesome!", "Love it!"]

   # Good: diverse patterns
   examples = [
       ("Great product!", "POSITIVE"),
       ("Terrible quality", "NEGATIVE"),
       ("Good but pricey", "MIXED"),
       ("Meh, it's okay", "MIXED")
   ]

2. Similarity: Retrieve examples similar to the current input (RAG-style).

   from sklearn.metrics.pairwise import cosine_similarity
   from openai import OpenAI

   client = OpenAI()

   def get_embedding(text):
       response = client.embeddings.create(
           model="text-embedding-3-large",
           input=text
       )
       return response.data[0].embedding

   def select_examples(query, example_pool, k=3):
       """Select the k examples most similar to the query."""
       query_emb = get_embedding(query)
       similarities = []
       for ex in example_pool:
           ex_emb = get_embedding(ex['input'])
           sim = cosine_similarity([query_emb], [ex_emb])[0][0]
           similarities.append((sim, ex))
       # Sort by score only -- sorting raw (score, dict) tuples raises
       # TypeError on tied scores, since dicts aren't comparable
       similarities.sort(key=lambda pair: pair[0], reverse=True)
       return [ex for _, ex in similarities[:k]]

3. Ordering: Put the most complex example last (it primes the model).

   # Better: complex example right before the task
   examples = [
       ("Simple case", "output"),
       ("Medium case", "output"),
       ("Complex edge case", "output")  # This one matters most
   ]
Cost Consideration:
For GPT-5.2 ($1.75 / $14 per 1M tokens):
- 5 examples × 100 tokens = 500 extra input tokens per request
- At 10,000 requests/day, that's 5M tokens/day = $8.75/day at the $1.75/1M input rate
- Optimization: keep the examples in a stable system-prompt prefix so prompt caching applies (up to a 90% discount on cached tokens; sketched below)
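A minimal sketch of that optimization, assuming an OpenAI-style chat API where prompt caching keys on a stable prefix (discount rates and caching rules vary by provider):

```python
# Static few-shot block lives in the system message so the prefix is
# byte-identical across requests and eligible for prompt caching.
FEW_SHOT_BLOCK = """Classify sentiment as: POSITIVE, NEGATIVE, or MIXED.

Input: "The product is great but shipping took forever."
Output: MIXED
Input: "Absolutely love it! Best purchase ever."
Output: POSITIVE"""

def cached_few_shot_sentiment(text):
    response = client.chat.completions.create(
        model="gpt-5.2-mini",
        messages=[
            {"role": "system", "content": FEW_SHOT_BLOCK},            # stable, cacheable
            {"role": "user", "content": f'Input: "{text}"\nOutput:'}  # varies per call
        ],
        temperature=0.0,
        max_tokens=10
    )
    return response.choices[0].message.content.strip()
```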
Chain-of-Thought (CoT) Prompting
The Breakthrough (Wei et al., 2022):
Showing worked step-by-step reasoning in the prompt dramatically improves math/logic performance; even the zero-shot variant, simply appending "Let's think step by step" (Kojima et al., 2022), can add 30-40 points on some benchmarks.
Why It Works:
- Forces model to decompose problem
- Makes reasoning inspectable (you can debug wrong steps)
- Activates System 2 thinking (deliberate, not reflexive)
Example - Math Word Problem:
def cot_math_solver(problem):
    prompt = """Solve this step-by-step:

Problem: {problem}

Let's think through this carefully:
1. Identify what we need to find
2. Extract the relevant numbers and relationships
3. Set up the equation
4. Solve step-by-step
5. Verify the answer makes sense

Solution:"""
    response = client.chat.completions.create(
        model="gpt-5.2",  # Larger model for reasoning
        messages=[{"role": "user", "content": prompt.format(problem=problem)}],
        temperature=0.0
    )
    return response.choices[0].message.content
# Example usage
problem = """A train travels 120 km in 2 hours, then 180 km in 3 hours.
What is its average speed for the entire journey?"""
solution = cot_math_solver(problem)
print(solution)
Output:
1. We need to find: Average speed for the entire journey
2. Extract information:
- First leg: 120 km in 2 hours
- Second leg: 180 km in 3 hours
3. Set up the equation:
Average speed = Total distance / Total time
4. Solve:
Total distance = 120 + 180 = 300 km
Total time = 2 + 3 = 5 hours
Average speed = 300 / 5 = 60 km/h
5. Verify: 60 km/h × 5 hours = 300 km ✓
Answer: 60 km/h
Few-Shot CoT (Even Better):
def few_shot_cot_solver(problem):
    prompt = """Solve math problems step-by-step.

Example 1:
Problem: If 5 apples cost $3, how much do 8 apples cost?
Solution:
Step 1: Find cost per apple: $3 ÷ 5 = $0.60 per apple
Step 2: Multiply by 8: $0.60 × 8 = $4.80
Answer: $4.80

Example 2:
Problem: A rectangle has length 12 cm and width 5 cm. What is its area?
Solution:
Step 1: Use formula: Area = length × width
Step 2: Calculate: 12 × 5 = 60 cm²
Answer: 60 cm²

Now solve:
Problem: {problem}
Solution:"""
    response = client.chat.completions.create(
        model="gpt-5.2",
        messages=[{"role": "user", "content": prompt.format(problem=problem)}],
        temperature=0.0
    )
    return response.choices[0].message.content
Interview Question (OpenAI): "When does CoT fail? How would you detect and handle it?"
Strong Answer:
"CoT fails when: (1) Model lacks domain knowledge (no reasoning can fix missing facts), (2) Problem requires external tools (e.g., precise calculation - use ReAct instead), (3) Multiple valid solution paths confuse the model (use Tree of Thoughts). Detection: Parse the CoT output, verify each step's logic. If a step contradicts previous steps or makes unjustified leaps, flag for human review or re-generate with stronger constraints."
ReAct (Reasoning + Acting)
The Pattern (Yao et al., 2023):
Interleave reasoning steps with tool calls to ground answers in facts.
Structure:
Thought: What do I need to do?
Action: Call tool X with input Y
Observation: Tool returns Z
Thought: Based on Z, next I should...
Action: Call tool W
Observation: ...
Answer: Final response
Example - Customer Support with Database Lookup:
import json
import re

def react_customer_support(query, tools):
    """Run a ReAct loop: the model alternates Thought/Action, we execute
    tools and feed back Observations until it emits an Answer.

    tools: dict mapping tool name -> callable
    """
    system_prompt = """You are a customer support agent with access to tools.

Available tools:
- lookup_order(order_id): Get order details
- check_inventory(product_id): Check stock status
- create_refund(order_id, reason): Initiate refund

Use this format:
Thought: [your reasoning]
Action: [tool_name(arguments)]
Observation: [tool output]
... (repeat as needed)
Answer: [final response to customer]"""

    max_iterations = 5
    conversation = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query}
    ]

    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-5.2",
            messages=conversation,
            temperature=0.0
        )
        content = response.choices[0].message.content
        conversation.append({"role": "assistant", "content": content})

        # Check whether the final answer has been produced
        if "Answer:" in content:
            return content.split("Answer:")[-1].strip()

        # Parse and execute the next action
        action_lines = [l for l in content.split("\n") if l.strip().startswith("Action:")]
        if action_lines:
            action = action_lines[0].replace("Action:", "").strip()
            observation = execute_tool(action, tools)
            # Feed the tool output back as an observation
            conversation.append({
                "role": "user",
                "content": f"Observation: {observation}"
            })
        else:
            break

    return f"Unable to complete request after {max_iterations} iterations."

def execute_tool(action, tools):
    """Parse a 'tool_name(arguments)' string and execute it."""
    # Simple parser (production would use the function calling API)
    match = re.match(r"(\w+)\((.*?)\)", action)
    if match:
        tool_name = match.group(1)
        args = match.group(2).strip("\"' ")  # strip quotes the model adds
        if tool_name in tools:
            return tools[tool_name](args)
    return "Tool not found or invalid syntax."

# Example usage
tools = {
    "lookup_order": lambda order_id: json.dumps({
        "order_id": order_id,
        "status": "shipped",
        "tracking": "1Z999AA10123456784",
        "estimated_delivery": "2026-01-05"
    }),
    "check_inventory": lambda product_id: json.dumps({
        "product_id": product_id,
        "in_stock": True,
        "quantity": 47
    })
}

query = "Where is my order #12345? I need it by January 4th."
response = react_customer_support(query, tools)
print(response)
Expected ReAct Trace:
Thought: I need to look up the order status to see if it will arrive on time.
Action: lookup_order("12345")
Observation: {"order_id": "12345", "status": "shipped", "tracking": "1Z999AA10123456784", "estimated_delivery": "2026-01-05"}
Thought: The order is shipped but won't arrive until January 5th, one day late. I should inform the customer.
Answer: Your order #12345 is currently in transit with tracking number 1Z999AA10123456784. Unfortunately, the estimated delivery is January 5th, which is one day later than you need. Would you like me to check if we can expedite shipping or issue a partial refund for the delay?
Interview Question: "ReAct vs function calling API (GPT-5.2, Claude 4.5) - what's the difference?"
Answer:
"ReAct is a prompting pattern where reasoning and actions are in text. Function calling is a native API feature where the model outputs structured JSON for tool use, which is more reliable. Use function calling when available (GPT-5.2, Claude 4.5, Gemini 3 Pro all support it). Use ReAct when: (1) You need to see reasoning traces for debugging, (2) Working with models without function calling, (3) Complex multi-step reasoning where showing the thought process improves accuracy."
Tree of Thoughts (ToT)
The Idea (Yao et al., 2023):
For complex problems, explore multiple reasoning paths (like a search tree), evaluate each, and select the best.
When to Use:
- Creative writing (generate multiple story outlines)
- Code optimization (explore different algorithms)
- Strategic planning (evaluate options)
Cost Warning: ToT generates 3-5x more tokens than CoT. Only use for high-value tasks.
Example - Algorithm Design:
import json
from typing import Dict

def tree_of_thoughts_solver(problem: str, num_paths: int = 3) -> Dict:
    """Generate multiple solution paths, evaluate each, select the best."""
    # Step 1: Generate multiple reasoning paths
    paths = []
    for i in range(num_paths):
        prompt = f"""Solve this problem. Propose a unique approach (this is attempt {i+1} of {num_paths}):

Problem: {problem}

Approach:"""
        response = client.chat.completions.create(
            model="gpt-5.2",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.8  # Higher temperature for diverse approaches
        )
        paths.append({
            "path_id": i,
            "approach": response.choices[0].message.content
        })

    # Step 2: Evaluate each path
    for path in paths:
        eval_prompt = f"""Evaluate this approach on a scale of 1-10 for:
- Correctness (will it work?)
- Efficiency (time/space complexity)
- Simplicity (easy to implement?)

Approach:
{path['approach']}

Evaluation (JSON format):"""
        response = client.chat.completions.create(
            model="gpt-5.2",
            messages=[{"role": "user", "content": eval_prompt}],
            temperature=0.0,
            response_format={"type": "json_object"}
        )
        path["evaluation"] = json.loads(response.choices[0].message.content)
        path["total_score"] = sum(path["evaluation"].values())

    # Step 3: Select the best path
    best_path = max(paths, key=lambda p: p["total_score"])

    # Step 4: Refine the best path into a full solution
    refine_prompt = f"""Take this approach and implement it fully:

{best_path['approach']}

Provide complete, production-ready code with comments:"""
    response = client.chat.completions.create(
        model="gpt-5.2",
        messages=[{"role": "user", "content": refine_prompt}],
        temperature=0.3
    )

    return {
        "all_paths": paths,
        "best_path": best_path,
        "final_solution": response.choices[0].message.content
    }

# Example
problem = """Design an algorithm to find the k most frequent elements in an array.
Optimize for: O(n log k) time complexity."""

result = tree_of_thoughts_solver(problem, num_paths=3)
print(f"Best approach (score: {result['best_path']['total_score']/30*100:.0f}%):")
print(result['final_solution'])
Interview Insight: Mentioning ToT shows you know cutting-edge techniques, but acknowledge it's expensive and only for critical tasks.
Pattern Selection Framework (Interview Gold)
Decision Matrix:
| Pattern | Best For | Cost | Consistency | Example Use Case |
|---|---|---|---|---|
| Zero-Shot | Simple, well-defined tasks | $ | Medium | Summarization, translation |
| Few-Shot | Format learning, classification | $$ | High | Sentiment analysis, entity extraction |
| CoT | Math, logic, multi-step reasoning | $$$ | High | Debugging, planning |
| ReAct | Tool use, factual grounding | $$$$ | Very High | Customer support, data retrieval |
| ToT | Creative, strategic, optimization | $$$$$ | Medium | Algorithm design, story writing |
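One way to turn the matrix into code, as a hypothetical router (the names and the trait checklist are illustrative, not a standard API):

```python
def choose_pattern(needs_tools: bool, multi_step_reasoning: bool,
                   format_sensitive: bool, explores_alternatives: bool) -> str:
    """Pick the cheapest pattern that fits, following the matrix above."""
    if explores_alternatives:
        return "tree-of-thoughts"  # most expensive; high-value tasks only
    if needs_tools:
        return "react"             # grounding via tool calls
    if multi_step_reasoning:
        return "chain-of-thought"
    if format_sensitive:
        return "few-shot"
    return "zero-shot"             # default: start simple

# A support bot that must check a database:
assert choose_pattern(True, True, False, False) == "react"
```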
Interview Question (Meta): "Design a code documentation generator. Walk through your prompting strategy."
Strong Answer:
1. Start with Zero-Shot:
- Prompt: "Generate docstring for this function"
- Test on 10 functions
- If quality is good (>90% acceptance), ship it
2. If inconsistent, add Few-Shot:
- Collect 5 high-quality examples (diverse function types)
- Include edge cases (async functions, generators, etc.)
- Re-test
3. If still missing context, add CoT:
- "Analyze the function step-by-step: inputs, logic, outputs, edge cases"
- Then generate docstring
- Improves quality for complex functions
4. If hallucinating (making up behavior), add ReAct:
- Tool: AST parser to verify function signature
- Tool: Run tests to verify behavior
- Ground docstring in actual code behavior
I'd start with zero-shot + GPT-5.2-mini ($0.15/$0.60 per 1M) for cost efficiency,
then upgrade to few-shot CoT only for functions that fail quality checks (maybe 10%).
This optimizes both cost and quality.
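A sketch of that escalation policy (quality_check here is a hypothetical stand-in for whatever acceptance test you run, e.g. docstring lint plus spot checks):

```python
def llm(prompt: str, model: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0
    )
    return response.choices[0].message.content

def quality_check(docstring: str) -> bool:
    # Hypothetical gate; a real pipeline would lint and sample for human review
    return len(docstring) > 40 and "Args:" in docstring and "Returns:" in docstring

def generate_docstring(func_source: str) -> str:
    """Escalate from cheap zero-shot to CoT only when the draft fails the gate."""
    draft = llm(f"Write a Google-style docstring for:\n{func_source}",
                model="gpt-5.2-mini")  # cheap first pass handles ~90% of functions
    if quality_check(draft):
        return draft
    # Fallback: step-by-step analysis on the larger model
    return llm("Analyze this function step-by-step (inputs, logic, outputs, edge "
               f"cases), then write a Google-style docstring:\n{func_source}",
               model="gpt-5.2")
```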
Advanced: Self-Consistency (Ensemble CoT)
The Technique (Wang et al., 2022):
- Generate multiple CoT reasoning paths (e.g., 5)
- Extract final answers from each
- Return majority vote
When to Use: High-stakes decisions where accuracy > cost (medical, legal, financial).
from collections import Counter
from typing import Dict

def self_consistency_solver(problem: str, num_samples: int = 5) -> Dict:
    """Generate multiple CoT solutions and take a majority vote."""
    solutions = []
    for _ in range(num_samples):
        response = client.chat.completions.create(
            model="gpt-5.2",
            messages=[{
                "role": "user",
                "content": f"Solve step-by-step:\n{problem}\nEnd with 'Answer: <value>'."
            }],
            temperature=0.7  # Some randomness for diverse reasoning paths
        )
        solutions.append(response.choices[0].message.content)

    # Extract final answers (requesting an explicit 'Answer:' marker in the
    # prompt is what makes this split reliable)
    answers = [s.split("Answer:")[-1].strip() for s in solutions if "Answer:" in s]

    # Majority vote
    vote_counts = Counter(answers)
    if not vote_counts:
        return {"all_solutions": solutions, "majority_answer": None,
                "confidence": 0.0, "vote_distribution": {}}
    majority_answer, count = vote_counts.most_common(1)[0]

    return {
        "all_solutions": solutions,
        "majority_answer": majority_answer,
        "confidence": count / num_samples,
        "vote_distribution": dict(vote_counts)
    }

# Example
result = self_consistency_solver(
    "If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?",
    num_samples=5
)
print(f"Answer: {result['majority_answer']}")
print(f"Confidence: {result['confidence']*100:.0f}%")
Cost: 5x a single CoT call. For GPT-5.2: ~$0.10 per problem (if 1K tokens avg).
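A back-of-envelope check on that estimate (token counts are assumptions; the rates are the GPT-5.2 prices quoted earlier):

```python
samples = 5
input_tokens, output_tokens = 200, 800  # assumed ~1K tokens per sample
cost = samples * (input_tokens * 1.75 + output_tokens * 14) / 1_000_000
print(f"${cost:.3f} per problem")  # ≈ $0.058 -- same order as the ~$0.10 figure
```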
Common Mistakes to Avoid (Interview Debugging)
Mistake 1: Vague Instructions
# Bad
prompt = "Make this code better"
# Good
prompt = """Refactor this code to:
1. Reduce time complexity from O(n²) to O(n log n)
2. Add error handling for edge cases
3. Add docstrings following Google style"""
Mistake 2: No Output Constraints
# Bad - model might write an essay
prompt = "What is 2+2?"
# Good
prompt = "What is 2+2? Answer with only the number."
# Or use max_tokens=5
Mistake 3: Ignoring Model Limitations
# Bad - the model has no access to live data
prompt = "What's the stock price of AAPL right now?"
# Good - use ReAct with tool
prompt = "Check the stock price of AAPL using the get_stock_price tool, then analyze the trend."
Key Takeaways for Interviews
- ✅ Start simple: Zero-shot first, add complexity only if needed
- ✅ Know the costs: Few-shot = 2-3x, CoT = 3-5x, ToT = 10x+ token usage
- ✅ Few-shot selection: Diverse, similar to query, complex examples last
- ✅ CoT unlocks reasoning: Essential for math, logic, debugging
- ✅ ReAct for grounding: Use tools to prevent hallucination
- ✅ ToT for exploration: Only for high-value creative/strategic tasks
Next: Learn how in-context learning actually works under the hood in Lesson 2.