Statistics & Probability
Probability Fundamentals
Probability questions test your ability to think rigorously under pressure. Interviewers want to see clear reasoning, not just final answers.
Bayes' Theorem
One of the most important formulas in data science interviews:
P(A|B) = P(B|A) × P(A) / P(B)
Where:
- P(A|B) = Probability of A given B (posterior)
- P(B|A) = Probability of B given A (likelihood)
- P(A) = Prior probability of A
- P(B) = Total probability of B
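Classic interview question: A test for a disease is 99% accurate, meaning it correctly identifies 99% of people who have the disease and 99% of people who do not. If 1% of the population has the disease, what's the probability that someone who tests positive actually has the disease?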
Given:
- P(Disease) = 0.01
- P(Positive|Disease) = 0.99
- P(Positive|No Disease) = 0.01 (false positive)
Calculate:
P(Disease|Positive) = P(Positive|Disease) × P(Disease) / P(Positive)
P(Positive) = P(Positive|Disease) × P(Disease) + P(Positive|No Disease) × P(No Disease)
= 0.99 × 0.01 + 0.01 × 0.99
= 0.0099 + 0.0099 = 0.0198
P(Disease|Positive) = (0.99 × 0.01) / 0.0198 = 0.5 = 50%
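Key insight: Despite a 99% accurate test, there's only a 50% chance the person has the disease. This counterintuitive result comes from the low base rate: among 10,000 people, the 100 who are sick produce 99 true positives, while the 9,900 who are healthy produce 99 false positives.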
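The calculation above can be checked with a few lines of Python. This is a minimal sketch; the function name `posterior` and its parameter names are illustrative choices, not part of any library.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem."""
    # Law of total probability: positives come from true and false positives.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Numbers from the disease-test question.
p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(Disease | Positive) = {p:.3f}")

# Same test, but a 10% base rate: the posterior rises sharply.
p_common = posterior(prior=0.10, sensitivity=0.99, false_positive_rate=0.01)
print(f"With a 10% base rate: {p_common:.3f}")
```

Varying `prior` while holding the test characteristics fixed is a quick way to show an interviewer how strongly the base rate drives the answer.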
Expected Value and Variance
Expected Value (mean):
E[X] = Σ xᵢ × P(xᵢ)
Variance:
Var(X) = E[(X - μ)²] = E[X²] - (E[X])²
Interview question: A casino game costs $10 to play. You roll a die: if you roll a 6, you receive $50 (a net gain of $40); otherwise, you lose your $10 stake. What's the expected value per game?
E[X] = P(win) × $40 + P(lose) × (-$10)
= (1/6) × $40 + (5/6) × (-$10)
= $6.67 - $8.33
= -$1.67
Expected loss of $1.67 per game.
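The same numbers can be reproduced exactly with Python's `fractions` module, which also makes it easy to apply the variance formula Var(X) = E[X²] - (E[X])² to this game. This is a sketch; the variable names are illustrative.

```python
from fractions import Fraction

# Net outcome in dollars -> probability, for the dice game.
outcomes = {40: Fraction(1, 6), -10: Fraction(5, 6)}

# E[X] = sum of x * P(x)
mean = sum(x * p for x, p in outcomes.items())

# Var(X) = E[X^2] - (E[X])^2
second_moment = sum(x**2 * p for x, p in outcomes.items())
variance = second_moment - mean**2

print(f"E[X]   = {mean} = {float(mean):.2f}")
print(f"Var(X) = {variance} = {float(variance):.2f}")
print(f"SD(X)  = {float(variance) ** 0.5:.2f}")
```

The standard deviation is much larger than the expected loss, which is why a single session can easily end in profit even though the game loses money on average.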
Common Probability Distributions
| Distribution | Use Case | Key Property |
|---|---|---|
| Normal | Continuous data, averages | 68-95-99.7 rule |
| Binomial | Count of successes in n trials | Fixed n, independent trials, same success probability |
| Poisson | Count of events in interval | λ = mean = variance |
| Exponential | Time between events | Memoryless property |
Interview tip: Always state which distribution you're assuming and why.
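Two of the key properties in the table can be verified numerically from the probability mass functions. The sketch below uses only the standard library; the helper names and the parameter values (n = 10, p = 0.3, λ = 3) are arbitrary choices for illustration, and the Poisson sum is truncated at k = 99, where the remaining tail is negligible.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(exactly k events) when the mean count is lam."""
    return exp(-lam) * lam**k / factorial(k)


def mean_and_variance(pmf, support):
    """Mean and variance of a discrete distribution over `support`."""
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean) ** 2 * pmf(k) for k in support)
    return mean, var


n, p, lam = 10, 0.3, 3.0
b_mean, b_var = mean_and_variance(lambda k: binomial_pmf(k, n, p), range(n + 1))
p_mean, p_var = mean_and_variance(lambda k: poisson_pmf(k, lam), range(100))

print(f"Binomial: mean={b_mean:.4f} (np), var={b_var:.4f} (np(1-p))")
print(f"Poisson:  mean={p_mean:.4f}, var={p_var:.4f} (both equal lambda)")
```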
Classic Probability Puzzles
The Birthday Problem
Question: How many people are needed for at least a 50% chance that two of them share a birthday?
Answer: Just 23 people.
P(no match with n people) = 365/365 × 364/365 × 363/365 × ... × (365-n+1)/365
P(at least one match) = 1 - P(no match)
At n=23: P(match) ≈ 50.7%
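The product above is straightforward to compute directly. This sketch assumes 365 equally likely birthdays and ignores leap years; the function name `p_shared_birthday` is illustrative.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_no_match = 1.0
    for i in range(n):
        # Person i + 1 must avoid the i birthdays already taken.
        p_no_match *= (days - i) / days
    return 1 - p_no_match


# Smallest group size where a match is at least as likely as not.
smallest = next(n for n in range(1, 366) if p_shared_birthday(n) >= 0.5)

print(f"n=22: {p_shared_birthday(22):.3f}")
print(f"n=23: {p_shared_birthday(23):.3f}")
print(f"Smallest n with P(match) >= 0.5: {smallest}")
```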
Monty Hall Problem
Scenario: Three doors, one car, two goats. You pick door 1. The host, who knows where the car is, opens door 3 to reveal a goat. Should you switch to door 2?
Answer: Yes! Switching gives 2/3 probability of winning.
Initial pick: 1/3 chance of car
Switch: 2/3 chance of car (your first pick is wrong 2/3 of the time, and in each of those cases the host's reveal leaves the car behind the one remaining door)
Why this matters: It tests conditional probability reasoning.
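If the 2/3 answer still feels wrong, a short simulation is a convincing check. The sketch below assumes the standard rules: the host always opens a door that is neither your pick nor the car. The function names, the trial count, and the seed are illustrative choices.

```python
import random


def play(switch, rng):
    """Play one round; return True if the final pick wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host knowingly opens a goat door that is not the player's pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the win probability of a strategy by simulation."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print(f"Stay:   {win_rate(switch=False):.3f}")
print(f"Switch: {win_rate(switch=True):.3f}")
```

The estimates should land close to 1/3 for staying and 2/3 for switching, with small run-to-run variation depending on the seed.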
Interview Framework
When solving probability problems:
- Define the sample space: What are all possible outcomes?
- Identify the events: What are we calculating?
- State assumptions: Independence, distributions
- Write the formula: Show your work
- Check reasonableness: Does the answer make sense?
Practice explaining your reasoning out loud. Interviewers care more about your thought process than the final number.