Prompt Engineering for Interviews
Prompt Design Patterns: Zero-Shot to Tree of Thoughts
Why This Matters for Interviews
OpenAI, Anthropic, and Meta expect L5-L6 candidates to design production-ready prompts on the spot. Common interview scenarios:
- System design: "Build a code review assistant. What prompting strategy would you use?"
- Debugging: "Our chatbot hallucinates on edge cases. How would you fix it with prompting?"
- Comparison: "When would you use CoT vs ReAct vs ToT?"
Real Interview Question (Anthropic L6):
"A customer wants Claude to solve complex math problems. Walk me through how you'd design the prompt, from zero-shot to advanced techniques. What are the trade-offs?"
The Prompting Hierarchy
From simplest to most complex:
Zero-Shot → Few-Shot → Chain-of-Thought → ReAct → Tree of Thoughts

| Pattern | What it adds | Profile |
|---|---|---|
| Zero-Shot | No examples | Fast, cheap |
| Few-Shot | Examples | Better consistency |
| Chain-of-Thought | Shown reasoning | Much better on reasoning tasks |
| ReAct | Actions (tool use) | Grounded in external facts |
| Tree of Thoughts | Explored paths | Best quality, most expensive |
Interview Insight: Always start with zero-shot, add complexity only if needed.
Zero-Shot Prompting (The Baseline)
Definition: Prompt with task description, no examples.
When to Use:
- Task is well-defined and within LLM's capabilities
- Speed and cost matter (no extra tokens)
- GPT-5.2 / Claude 4.5 level models (strong instruction following)
Example - Code Review:
from openai import OpenAI

client = OpenAI()

def zero_shot_review(code):
    prompt = """You are an expert code reviewer. Review the following code for:
1. Bugs and edge cases
2. Performance issues
3. Security vulnerabilities

Provide specific, actionable feedback.

Code to review:
```python
{code}
```"""
    response = client.chat.completions.create(
        model="gpt-5.2-mini",
        messages=[
            {"role": "system", "content": "You are an expert code reviewer."},
            {"role": "user", "content": prompt.format(code=code)}
        ],
        temperature=0.3  # Lower temperature for consistent reviews
    )
    return response.choices[0].message.content
Pros:
- ✅ Fast: No extra prompt tokens
- ✅ Simple: Easy to test and iterate
- ✅ Works well for GPT-5+ models
Cons:
- ❌ Less consistent than few-shot
- ❌ May miss domain-specific patterns
- ❌ Struggles with complex reasoning
Interview Question: "When would zero-shot fail?"
Strong Answer:
"Zero-shot fails when: (1) Task requires domain-specific format (e.g., legal documents, medical reports), (2) Ambiguous instructions (model guesses intent), (3) Complex multi-step reasoning without guidance. Example: asking GPT to 'debug this code' might give generic advice, but showing an example debug output guides the format and depth expected."
Few-Shot Prompting (In-Context Learning)
Definition: Provide 2-8 input-output examples before the actual task.
The Magic Number:
- GPT-5.2: 3-5 examples optimal (diminishing returns after 5)
- Claude Opus 4.5: 2-4 examples (better instruction following)
- Smaller models: 5-8 examples needed
Example - Sentiment Analysis with Nuance:
def few_shot_sentiment(text):
    prompt = """Classify sentiment as: POSITIVE, NEGATIVE, or MIXED (if contradictory).

Examples:
Input: "The product is great but shipping took forever."
Output: MIXED

Input: "Absolutely love it! Best purchase ever."
Output: POSITIVE

Input: "Waste of money. Broke after 2 days."
Output: NEGATIVE

Input: "Works okay I guess, nothing special."
Output: MIXED

Now classify:
Input: "{text}"
Output:"""
    response = client.chat.completions.create(
        model="gpt-5.2-mini",
        messages=[{"role": "user", "content": prompt.format(text=text)}],
        temperature=0.0,  # Deterministic for classification
        max_tokens=10
    )
    return response.choices[0].message.content.strip()
Why This Works:
- Model learns task format (classification, not explanation)
- Model learns edge cases (MIXED sentiment)
- Model learns output style (one-word answers)
Interview Deep Dive: "How do you select few-shot examples?"
Production Strategy:
1. Diversity: Cover different input patterns.

   # Bad: all positive examples
   examples = ["Great!", "Awesome!", "Love it!"]

   # Good: diverse patterns
   examples = [
       ("Great product!", "POSITIVE"),
       ("Terrible quality", "NEGATIVE"),
       ("Good but pricey", "MIXED"),
       ("Meh, it's okay", "MIXED")
   ]

2. Similarity: Retrieve examples similar to the current input (RAG-style).

   from sklearn.metrics.pairwise import cosine_similarity
   from openai import OpenAI

   client = OpenAI()

   def get_embedding(text):
       response = client.embeddings.create(
           model="text-embedding-3-large",
           input=text
       )
       return response.data[0].embedding

   def select_examples(query, example_pool, k=3):
       """Select the k examples most similar to the query."""
       query_emb = get_embedding(query)
       similarities = []
       for ex in example_pool:
           ex_emb = get_embedding(ex['input'])
           sim = cosine_similarity([query_emb], [ex_emb])[0][0]
           similarities.append((sim, ex))
       # Sort by score only -- sorting raw (score, dict) tuples raises
       # TypeError on tied scores, since dicts aren't comparable
       similarities.sort(key=lambda pair: pair[0], reverse=True)
       return [ex for _, ex in similarities[:k]]

3. Ordering: Put the most complex example last (it primes the model).

   # Better: complex example right before the task
   examples = [
       ("Simple case", "output"),
       ("Medium case", "output"),
       ("Complex edge case", "output")  # This one matters most
   ]
Cost Consideration:
For GPT-5.2 ($1.75 / $14 per 1M tokens):
- 5 examples × 100 tokens = 500 extra input tokens per request
- At 10,000 requests/day, that's 5M tokens/day = $8.75/day at the $1.75/1M input rate
- Optimization: keep the examples in a stable system-prompt prefix so prompt caching applies (up to a 90% discount on cached tokens; sketched below)
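A minimal sketch of that optimization, assuming an OpenAI-style chat API where prompt caching keys on a stable prefix (discount rates and caching rules vary by provider):

```python
# Static few-shot block lives in the system message so the prefix is
# byte-identical across requests and eligible for prompt caching.
FEW_SHOT_BLOCK = """Classify sentiment as: POSITIVE, NEGATIVE, or MIXED.

Input: "The product is great but shipping took forever."
Output: MIXED
Input: "Absolutely love it! Best purchase ever."
Output: POSITIVE"""

def cached_few_shot_sentiment(text):
    response = client.chat.completions.create(
        model="gpt-5.2-mini",
        messages=[
            {"role": "system", "content": FEW_SHOT_BLOCK},            # stable, cacheable
            {"role": "user", "content": f'Input: "{text}"\nOutput:'}  # varies per call
        ],
        temperature=0.0,
        max_tokens=10
    )
    return response.choices[0].message.content.strip()
```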
Chain-of-Thought (CoT) Prompting
The Breakthrough (Wei et al., 2022):
Showing worked step-by-step reasoning in the prompt dramatically improves math/logic performance; even the zero-shot variant, simply appending "Let's think step by step" (Kojima et al., 2022), can add 30-40 points on some benchmarks.
Why It Works:
- Forces model to decompose problem
- Makes reasoning inspectable (you can debug wrong steps)
- Activates System 2 thinking (deliberate, not reflexive)
Example - Math Word Problem:
def cot_math_solver(problem):
    prompt = """Solve this step-by-step:

Problem: {problem}

Let's think through this carefully:
1. Identify what we need to find
2. Extract the relevant numbers and relationships
3. Set up the equation
4. Solve step-by-step
5. Verify the answer makes sense

Solution:"""
    response = client.chat.completions.create(
        model="gpt-5.2",  # Larger model for reasoning
        messages=[{"role": "user", "content": prompt.format(problem=problem)}],
        temperature=0.0
    )
    return response.choices[0].message.content
# Example usage
problem = """A train travels 120 km in 2 hours, then 180 km in 3 hours.
What is its average speed for the entire journey?"""
solution = cot_math_solver(problem)
print(solution)
Output:
1. We need to find: Average speed for the entire journey
2. Extract information:
- First leg: 120 km in 2 hours
- Second leg: 180 km in 3 hours
3. Set up the equation:
Average speed = Total distance / Total time
4. Solve:
Total distance = 120 + 180 = 300 km
Total time = 2 + 3 = 5 hours
Average speed = 300 / 5 = 60 km/h
5. Verify: 60 km/h × 5 hours = 300 km ✓
Answer: 60 km/h
Few-Shot CoT (Even Better):
def few_shot_cot_solver(problem):
    prompt = """Solve math problems step-by-step.

Example 1:
Problem: If 5 apples cost $3, how much do 8 apples cost?
Solution:
Step 1: Find cost per apple: $3 ÷ 5 = $0.60 per apple
Step 2: Multiply by 8: $0.60 × 8 = $4.80
Answer: $4.80

Example 2:
Problem: A rectangle has length 12 cm and width 5 cm. What is its area?
Solution:
Step 1: Use formula: Area = length × width
Step 2: Calculate: 12 × 5 = 60 cm²
Answer: 60 cm²

Now solve:
Problem: {problem}
Solution:"""
    response = client.chat.completions.create(
        model="gpt-5.2",
        messages=[{"role": "user", "content": prompt.format(problem=problem)}],
        temperature=0.0
    )
    return response.choices[0].message.content
Interview Question (OpenAI): "When does CoT fail? How would you detect and handle it?"
Strong Answer:
"CoT fails when: (1) Model lacks domain knowledge (no reasoning can fix missing facts), (2) Problem requires external tools (e.g., precise calculation - use ReAct instead), (3) Multiple valid solution paths confuse the model (use Tree of Thoughts). Detection: Parse the CoT output, verify each step's logic. If a step contradicts previous steps or makes unjustified leaps, flag for human review or re-generate with stronger constraints."
ReAct (Reasoning + Acting)
The Pattern (Yao et al., 2023):
Interleave reasoning steps with tool calls to ground answers in facts.
Structure:
Thought: What do I need to do?
Action: Call tool X with input Y
Observation: Tool returns Z
Thought: Based on Z, next I should...
Action: Call tool W
Observation: ...
Answer: Final response
Example - Customer Support with Database Lookup:
import json
import re

def react_customer_support(query, tools):
    """Run a ReAct loop: the model alternates Thought/Action, we execute
    tools and feed back Observations until it emits an Answer.

    tools: dict mapping tool name -> callable
    """
    system_prompt = """You are a customer support agent with access to tools.

Available tools:
- lookup_order(order_id): Get order details
- check_inventory(product_id): Check stock status
- create_refund(order_id, reason): Initiate refund

Use this format:
Thought: [your reasoning]
Action: [tool_name(arguments)]
Observation: [tool output]
... (repeat as needed)
Answer: [final response to customer]"""

    max_iterations = 5
    conversation = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query}
    ]

    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-5.2",
            messages=conversation,
            temperature=0.0
        )
        content = response.choices[0].message.content
        conversation.append({"role": "assistant", "content": content})

        # Check whether the final answer has been produced
        if "Answer:" in content:
            return content.split("Answer:")[-1].strip()

        # Parse and execute the next action
        action_lines = [l for l in content.split("\n") if l.strip().startswith("Action:")]
        if action_lines:
            action = action_lines[0].replace("Action:", "").strip()
            observation = execute_tool(action, tools)
            # Feed the tool output back as an observation
            conversation.append({
                "role": "user",
                "content": f"Observation: {observation}"
            })
        else:
            break

    return f"Unable to complete request after {max_iterations} iterations."

def execute_tool(action, tools):
    """Parse a 'tool_name(arguments)' string and execute it."""
    # Simple parser (production would use the function calling API)
    match = re.match(r"(\w+)\((.*?)\)", action)
    if match:
        tool_name = match.group(1)
        args = match.group(2).strip("\"' ")  # strip quotes the model adds
        if tool_name in tools:
            return tools[tool_name](args)
    return "Tool not found or invalid syntax."

# Example usage
tools = {
    "lookup_order": lambda order_id: json.dumps({
        "order_id": order_id,
        "status": "shipped",
        "tracking": "1Z999AA10123456784",
        "estimated_delivery": "2026-01-05"
    }),
    "check_inventory": lambda product_id: json.dumps({
        "product_id": product_id,
        "in_stock": True,
        "quantity": 47
    })
}

query = "Where is my order #12345? I need it by January 4th."
response = react_customer_support(query, tools)
print(response)
Expected ReAct Trace:
Thought: I need to look up the order status to see if it will arrive on time.
Action: lookup_order("12345")
Observation: {"order_id": "12345", "status": "shipped", "tracking": "1Z999AA10123456784", "estimated_delivery": "2026-01-05"}
Thought: The order is shipped but won't arrive until January 5th, one day late. I should inform the customer.
Answer: Your order #12345 is currently in transit with tracking number 1Z999AA10123456784. Unfortunately, the estimated delivery is January 5th, which is one day later than you need. Would you like me to check if we can expedite shipping or issue a partial refund for the delay?
Interview Question: "ReAct vs function calling API (GPT-5.2, Claude 4.5) - what's the difference?"
Answer:
"ReAct is a prompting pattern where reasoning and actions are in text. Function calling is a native API feature where the model outputs structured JSON for tool use, which is more reliable. Use function calling when available (GPT-5.2, Claude 4.5, Gemini 3 Pro all support it). Use ReAct when: (1) You need to see reasoning traces for debugging, (2) Working with models without function calling, (3) Complex multi-step reasoning where showing the thought process improves accuracy."
Tree of Thoughts (ToT)
The Idea (Yao et al., 2023):
For complex problems, explore multiple reasoning paths (like a search tree), evaluate each, and select the best.
When to Use:
- Creative writing (generate multiple story outlines)
- Code optimization (explore different algorithms)
- Strategic planning (evaluate options)
Cost Warning: ToT generates 3-5x more tokens than CoT. Only use for high-value tasks.
Example - Algorithm Design:
import json
from typing import Dict

def tree_of_thoughts_solver(problem: str, num_paths: int = 3) -> Dict:
    """Generate multiple solution paths, evaluate each, select the best."""
    # Step 1: Generate multiple reasoning paths
    paths = []
    for i in range(num_paths):
        prompt = f"""Solve this problem. Propose a unique approach (this is attempt {i+1} of {num_paths}):

Problem: {problem}

Approach:"""
        response = client.chat.completions.create(
            model="gpt-5.2",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.8  # Higher temperature for diverse approaches
        )
        paths.append({
            "path_id": i,
            "approach": response.choices[0].message.content
        })

    # Step 2: Evaluate each path
    for path in paths:
        eval_prompt = f"""Evaluate this approach on a scale of 1-10 for:
- Correctness (will it work?)
- Efficiency (time/space complexity)
- Simplicity (easy to implement?)

Approach:
{path['approach']}

Evaluation (JSON format):"""
        response = client.chat.completions.create(
            model="gpt-5.2",
            messages=[{"role": "user", "content": eval_prompt}],
            temperature=0.0,
            response_format={"type": "json_object"}
        )
        path["evaluation"] = json.loads(response.choices[0].message.content)
        path["total_score"] = sum(path["evaluation"].values())

    # Step 3: Select the best path
    best_path = max(paths, key=lambda p: p["total_score"])

    # Step 4: Refine the best path into a full solution
    refine_prompt = f"""Take this approach and implement it fully:

{best_path['approach']}

Provide complete, production-ready code with comments:"""
    response = client.chat.completions.create(
        model="gpt-5.2",
        messages=[{"role": "user", "content": refine_prompt}],
        temperature=0.3
    )

    return {
        "all_paths": paths,
        "best_path": best_path,
        "final_solution": response.choices[0].message.content
    }

# Example
problem = """Design an algorithm to find the k most frequent elements in an array.
Optimize for: O(n log k) time complexity."""

result = tree_of_thoughts_solver(problem, num_paths=3)
print(f"Best approach (score: {result['best_path']['total_score']/30*100:.0f}%):")
print(result['final_solution'])
Interview Insight: Mentioning ToT shows you know cutting-edge techniques, but acknowledge it's expensive and only for critical tasks.
Pattern Selection Framework (Interview Gold)
Decision Matrix:
| Pattern | Best For | Cost | Consistency | Example Use Case |
|---|---|---|---|---|
| Zero-Shot | Simple, well-defined tasks | $ | Medium | Summarization, translation |
| Few-Shot | Format learning, classification | $$ | High | Sentiment analysis, entity extraction |
| CoT | Math, logic, multi-step reasoning | $$$ | High | Debugging, planning |
| ReAct | Tool use, factual grounding | $$$$ | Very High | Customer support, data retrieval |
| ToT | Creative, strategic, optimization | $$$$$ | Medium | Algorithm design, story writing |
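One way to turn the matrix into code, as a hypothetical router (the names and the trait checklist are illustrative, not a standard API):

```python
def choose_pattern(needs_tools: bool, multi_step_reasoning: bool,
                   format_sensitive: bool, explores_alternatives: bool) -> str:
    """Pick the cheapest pattern that fits, following the matrix above."""
    if explores_alternatives:
        return "tree-of-thoughts"  # most expensive; high-value tasks only
    if needs_tools:
        return "react"             # grounding via tool calls
    if multi_step_reasoning:
        return "chain-of-thought"
    if format_sensitive:
        return "few-shot"
    return "zero-shot"             # default: start simple

# A support bot that must check a database:
assert choose_pattern(True, True, False, False) == "react"
```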
Interview Question (Meta): "Design a code documentation generator. Walk through your prompting strategy."
Strong Answer:
1. Start with Zero-Shot:
- Prompt: "Generate docstring for this function"
- Test on 10 functions
- If quality is good (>90% acceptance), ship it
2. If inconsistent, add Few-Shot:
- Collect 5 high-quality examples (diverse function types)
- Include edge cases (async functions, generators, etc.)
- Re-test
3. If still missing context, add CoT:
- "Analyze the function step-by-step: inputs, logic, outputs, edge cases"
- Then generate docstring
- Improves quality for complex functions
4. If hallucinating (making up behavior), add ReAct:
- Tool: AST parser to verify function signature
- Tool: Run tests to verify behavior
- Ground docstring in actual code behavior
I'd start with zero-shot + GPT-5.2-mini ($0.15/$0.60 per 1M) for cost efficiency,
then upgrade to few-shot CoT only for functions that fail quality checks (maybe 10%).
This optimizes both cost and quality.
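A sketch of that escalation policy (quality_check here is a hypothetical stand-in for whatever acceptance test you run, e.g. docstring lint plus spot checks):

```python
def llm(prompt: str, model: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0
    )
    return response.choices[0].message.content

def quality_check(docstring: str) -> bool:
    # Hypothetical gate; a real pipeline would lint and sample for human review
    return len(docstring) > 40 and "Args:" in docstring and "Returns:" in docstring

def generate_docstring(func_source: str) -> str:
    """Escalate from cheap zero-shot to CoT only when the draft fails the gate."""
    draft = llm(f"Write a Google-style docstring for:\n{func_source}",
                model="gpt-5.2-mini")  # cheap first pass handles ~90% of functions
    if quality_check(draft):
        return draft
    # Fallback: step-by-step analysis on the larger model
    return llm("Analyze this function step-by-step (inputs, logic, outputs, edge "
               f"cases), then write a Google-style docstring:\n{func_source}",
               model="gpt-5.2")
```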
Advanced: Self-Consistency (Ensemble CoT)
The Technique (Wang et al., 2022):
- Generate multiple CoT reasoning paths (e.g., 5)
- Extract final answers from each
- Return majority vote
When to Use: High-stakes decisions where accuracy > cost (medical, legal, financial).
from collections import Counter
from typing import Dict

def self_consistency_solver(problem: str, num_samples: int = 5) -> Dict:
    """Generate multiple CoT solutions and take a majority vote."""
    solutions = []
    for _ in range(num_samples):
        response = client.chat.completions.create(
            model="gpt-5.2",
            messages=[{
                "role": "user",
                "content": f"Solve step-by-step:\n{problem}\nEnd with 'Answer: <value>'."
            }],
            temperature=0.7  # Some randomness for diverse reasoning paths
        )
        solutions.append(response.choices[0].message.content)

    # Extract final answers (requesting an explicit 'Answer:' marker in the
    # prompt is what makes this split reliable)
    answers = [s.split("Answer:")[-1].strip() for s in solutions if "Answer:" in s]

    # Majority vote
    vote_counts = Counter(answers)
    if not vote_counts:
        return {"all_solutions": solutions, "majority_answer": None,
                "confidence": 0.0, "vote_distribution": {}}
    majority_answer, count = vote_counts.most_common(1)[0]

    return {
        "all_solutions": solutions,
        "majority_answer": majority_answer,
        "confidence": count / num_samples,
        "vote_distribution": dict(vote_counts)
    }

# Example
result = self_consistency_solver(
    "If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?",
    num_samples=5
)
print(f"Answer: {result['majority_answer']}")
print(f"Confidence: {result['confidence']*100:.0f}%")
Cost: 5x a single CoT call. For GPT-5.2: ~$0.10 per problem (if 1K tokens avg).
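A back-of-envelope check on that estimate (token counts are assumptions; the rates are the GPT-5.2 prices quoted earlier):

```python
samples = 5
input_tokens, output_tokens = 200, 800  # assumed ~1K tokens per sample
cost = samples * (input_tokens * 1.75 + output_tokens * 14) / 1_000_000
print(f"${cost:.3f} per problem")  # ≈ $0.058 -- same order as the ~$0.10 figure
```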
Common Mistakes to Avoid (Interview Debugging)
Mistake 1: Vague Instructions
# Bad
prompt = "Make this code better"
# Good
prompt = """Refactor this code to:
1. Reduce time complexity from O(n²) to O(n log n)
2. Add error handling for edge cases
3. Add docstrings following Google style"""
Mistake 2: No Output Constraints
# Bad - model might write an essay
prompt = "What is 2+2?"
# Good
prompt = "What is 2+2? Answer with only the number."
# Or use max_tokens=5
Mistake 3: Ignoring Model Limitations
# Bad - the model has no access to live data
prompt = "What's the stock price of AAPL right now?"
# Good - use ReAct with tool
prompt = "Check the stock price of AAPL using the get_stock_price tool, then analyze the trend."
Key Takeaways for Interviews
- ✅ Start simple: Zero-shot first, add complexity only if needed
- ✅ Know the costs: Few-shot = 2-3x, CoT = 3-5x, ToT = 10x+ token usage
- ✅ Few-shot selection: Diverse, similar to query, complex examples last
- ✅ CoT unlocks reasoning: Essential for math, logic, debugging
- ✅ ReAct for grounding: Use tools to prevent hallucination
- ✅ ToT for exploration: Only for high-value creative/strategic tasks
Next: Learn how in-context learning actually works under the hood in Lesson 2.