Statistics & Probability
Statistics Interview Problems
Practice these classic statistics problems that appear frequently in data science interviews. Focus on clear reasoning and stating assumptions.
Problem 1: Two-Sample t-Test
Scenario: You're testing a new checkout flow. Control group (n=500) has mean conversion of 4.2% (std=1.8%). Treatment group (n=500) has mean conversion of 4.8% (std=2.0%). Is the difference significant?
Solution:
Step 1: State hypotheses
H₀: μ_treatment = μ_control
H₁: μ_treatment ≠ μ_control
Step 2: Calculate the standard error of the difference (unpooled form; with equal group sizes it equals the pooled SE)
SE = √[(s₁²/n₁) + (s₂²/n₂)]
= √[(0.018²/500) + (0.020²/500)]
= √[(0.000324/500) + (0.0004/500)]
= √[0.000001448]
= 0.00120
Step 3: Calculate t-statistic
t = (x̄_treatment - x̄_control) / SE
= (0.048 - 0.042) / 0.00120
= 0.006 / 0.00120
= 5.0
Step 4: Compare to critical value
df ≈ 998 (n₁ + n₂ - 2); at this size the t distribution is essentially normal, so the two-tailed critical t at α=0.05 ≈ 1.96
Our t=5.0 > 1.96
Conclusion: Significant at α=0.05 (p < 0.001). The new checkout flow has a statistically significantly higher conversion rate.
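A minimal Python sketch of Steps 2 to 4, working from the summary statistics only. The function name `two_sample_t` is hypothetical, and the p-value uses a normal approximation, an assumption that holds well with roughly 1,000 degrees of freedom. Without intermediate rounding, t comes out at about 4.99 rather than 5.0, and the Welch degrees of freedom at about 987.

```python
import math
from statistics import NormalDist

def two_sample_t(mean_c, sd_c, n_c, mean_t, sd_t, n_t):
    """Two-sample t-test from summary statistics, using the unpooled (Welch) SE."""
    var_c = sd_c ** 2 / n_c  # squared standard error of the control mean
    var_t = sd_t ** 2 / n_t  # squared standard error of the treatment mean
    se = math.sqrt(var_c + var_t)
    t = (mean_t - mean_c) / se
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (var_c + var_t) ** 2 / (var_c ** 2 / (n_c - 1) + var_t ** 2 / (n_t - 1))
    # Assumption: with df near 1000 the t distribution is close to normal,
    # so a normal approximation to the two-sided p-value is adequate here
    p = 2 * NormalDist().cdf(-abs(t))
    return se, t, df, p

se, t, df, p = two_sample_t(0.042, 0.018, 500, 0.048, 0.020, 500)
print(f"SE={se:.5f} t={t:.2f} df={df:.0f} p={p:.1e}")
```

If SciPy is available, `scipy.stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=False)` runs the same Welch test with an exact t-distribution p-value.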
Problem 2: Chi-Square Test for Independence
Scenario: Does user device type affect premium subscription rates?
| Device | Subscribed | Not Subscribed | Total |
|---|---|---|---|
| Mobile | 120 | 880 | 1000 |
| Desktop | 200 | 800 | 1000 |
Solution:
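Step 1: Calculate expected counts, E = (row total × column total) / grand total. The column totals are 320 subscribed and 1,680 not subscribed; the grand total is 2,000.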
E(Mobile, Sub) = (1000 × 320) / 2000 = 160
E(Mobile, Not) = (1000 × 1680) / 2000 = 840
E(Desktop, Sub) = 160
E(Desktop, Not) = 840
Step 2: Calculate chi-square statistic
χ² = Σ (O - E)² / E
= (120-160)²/160 + (880-840)²/840 + (200-160)²/160 + (800-840)²/840
= 10 + 1.9 + 10 + 1.9
= 23.8
Step 3: Compare to critical value
df = (rows-1) × (cols-1) = 1
Critical χ² at α=0.05 = 3.84
χ² = 23.8 > 3.84
Conclusion: Device type is significantly associated with subscription rate (p < 0.001): 20% of desktop users subscribe versus 12% of mobile users.
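The same computation as a standard-library Python sketch; the helper name `chi_square_2x2` is illustrative. It derives expected counts from the row and column totals, sums (O - E)²/E, and uses the identity that for one degree of freedom the upper-tail probability is erfc(√(χ²/2)).

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no Yates correction)."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 2x2 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    expected, chi2 = [], 0.0
    for i, row in enumerate(table):
        exp_row = []
        for j, observed in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand
            exp_row.append(e)
            chi2 += (observed - e) ** 2 / e
        expected.append(exp_row)
    # For df = 1, P(chi-square > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p

observed = [[120, 880],  # Mobile: subscribed, not subscribed
            [200, 800]]  # Desktop: subscribed, not subscribed
expected, chi2, p = chi_square_2x2(observed)
print(expected)
print(round(chi2, 2), p)
```

Note that SciPy's `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default; pass `correction=False` to reproduce the uncorrected 23.8.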
Problem 3: Correlation vs Causation
Interview question: "We found that users who use our mobile app have 3x higher retention than web-only users. Should we invest more in mobile?"
Strong answer:
"Before recommending increased mobile investment, I'd investigate several alternatives:
- Selection bias: Are mobile users fundamentally different? They may be more engaged overall, using both platforms.
- Reverse causation: Does mobile cause retention, or do retained users eventually download the app?
- Confounders:
  - Notification access (mobile users receive push notifications)
  - Demographic differences (age, tech-savviness)
  - Use case differences
What I'd do:
- Compare retention for users who started on mobile vs web (cohort analysis)
- Control for user characteristics in regression
- Look at retention change when users adopt mobile after using web
- If possible, run an experiment encouraging web users to try mobile"
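A toy simulation can make the confounder argument concrete. Everything in it is hypothetical: the probabilities are invented for illustration, and the simulated `engaged` flag stands in for any trait that drives both mobile adoption and retention. By construction, mobile has zero causal effect, yet the naive mobile-vs-web comparison shows a sizeable retention gap; comparing within each engagement level removes it.

```python
import random

random.seed(0)

# Hypothetical data-generating process: engagement drives BOTH mobile adoption
# and retention; mobile usage itself has no effect on retention.
users = []  # each entry: (engaged, mobile, retained)
for _ in range(100_000):
    engaged = random.random() < 0.3
    mobile = random.random() < (0.7 if engaged else 0.2)
    retained = random.random() < (0.6 if engaged else 0.2)
    users.append((engaged, mobile, retained))

def retention(rows):
    return sum(retained for _, _, retained in rows) / len(rows)

# Naive comparison: mobile users vs web-only users
naive_ratio = (retention([u for u in users if u[1]])
               / retention([u for u in users if not u[1]]))

# Stratified comparison: mobile vs web within each engagement level
strat_ratios = {}
for level in (True, False):
    stratum = [u for u in users if u[0] == level]
    strat_ratios[level] = (retention([u for u in stratum if u[1]])
                           / retention([u for u in stratum if not u[1]]))

print(f"naive mobile/web retention ratio: {naive_ratio:.2f}")
print({k: round(v, 2) for k, v in strat_ratios.items()})
```

In practice the confounder is rarely observed this cleanly, which is why the answer above ends with an experiment.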
Problem 4: Simpson's Paradox
Scenario: A drug trial shows:
| Group | Drug Success | Control Success |
|---|---|---|
| Mild cases | 80% (80/100) | 90% (180/200) |
| Severe cases | 30% (60/200) | 20% (20/100) |
| Overall | 47% (140/300) | 67% (200/300) |
Drug looks worse overall but better for severe cases!
Explanation:
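"This is a form of Simpson's Paradox. (In the textbook case the drug would win in every stratum yet lose overall; here the strata disagree, but the same allocation imbalance distorts the overall comparison.) The drug appears worse overall (47% vs 67%), but when we stratify by severity: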
- Severe cases: Drug 30% vs Control 20% (drug better)
- Mild cases: Drug 80% vs Control 90% (drug worse)
The paradox occurs because:
- Drug was given more often to severe cases (200 severe vs 100 mild)
- Control was given more often to mild cases (200 mild vs 100 severe)
- Severe cases have lower success rates overall
Correct interpretation: The drug outperforms control on severe cases (the harder problem) but underperforms on mild cases, so the decision should be made per severity group. The overall average is misleading because of unequal allocation."
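One way to remove the allocation effect is direct standardization: average each arm's stratum-specific success rates using a common severity mix. The sketch below uses the counts from the table; the helper names are illustrative. The whole trial contains 300 mild and 300 severe patients, so a 50/50 mix is the trial's own severity distribution. Under that mix both arms come out at 55%, which shows that the 20-point overall gap comes entirely from how severity was allocated.

```python
# Stratified counts from the table: (successes, patients)
data = {
    "drug":    {"mild": (80, 100),  "severe": (60, 200)},
    "control": {"mild": (180, 200), "severe": (20, 100)},
}

def crude_rate(arm):
    """Overall success rate, implicitly weighted by this arm's own severity mix."""
    successes = sum(s for s, _ in data[arm].values())
    patients = sum(n for _, n in data[arm].values())
    return successes / patients

def standardized_rate(arm, weights):
    """Stratum-specific success rates averaged with a common severity mix."""
    return sum(weights[k] * s / n for k, (s, n) in data[arm].items())

# Severity mix of the pooled trial population: 300 mild, 300 severe
common_mix = {"mild": 0.5, "severe": 0.5}
for arm in data:
    print(arm, round(crude_rate(arm), 3), round(standardized_rate(arm, common_mix), 3))
```

Choosing a different reference mix (for example, the severity mix of the patients who would actually receive the drug) changes the standardized rates, so the choice of weights should be stated explicitly.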
Problem 5: Power Analysis
Question: "How many users do we need per group to detect a 5% relative improvement in conversion rate (from 10% to 10.5%) with 80% power at α=0.05?"
Solution:
Using the standard per-group sample-size formula for a two-sided two-proportion z-test:
n = 2 × [(Zα/2 + Zβ)² × p̄(1-p̄)] / (p₁ - p₂)²
Where:
- Zα/2 = 1.96 (for α=0.05, two-tailed)
- Zβ = 0.84 (for 80% power)
- p₁ = 0.10, p₂ = 0.105
- p̄ = (0.10 + 0.105) / 2 = 0.1025
n = 2 × [(1.96 + 0.84)² × 0.1025 × 0.8975] / (0.005)²
= 2 × [7.84 × 0.092] / 0.000025
= 2 × 0.721 / 0.000025
= 57,680 per group
Need ~58,000 users per group (116,000 total) to detect this small effect.
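The formula above as a short Python function; the name `n_per_group` and its optional z-value overrides are illustrative. It computes exact z-values with `statistics.NormalDist` by default and rounds up, since fractional users are not possible. Passing the rounded z-values 1.96 and 0.84 reproduces the hand calculation without intermediate rounding (about 57,700 per group); exact z-values push the answer slightly higher. The last call shows that doubling the relative lift to 10% cuts the required sample to roughly a quarter.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80, z_alpha=None, z_beta=None):
    """Per-group n for a two-sided two-proportion z-test:
    n = 2 * (z_alpha/2 + z_beta)^2 * p_bar * (1 - p_bar) / (p1 - p2)^2
    """
    if z_alpha is None:
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    if z_beta is None:
        z_beta = NormalDist().inv_cdf(power)  # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    n = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / (p1 - p2) ** 2
    return math.ceil(n)  # round up to whole users

print(n_per_group(0.10, 0.105))                              # exact z-values
print(n_per_group(0.10, 0.105, z_alpha=1.96, z_beta=0.84))   # rounded z-values
print(n_per_group(0.10, 0.11))                               # 10% relative lift
```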
Interview insight: "This highlights why detecting small effects requires large samples. I'd ask whether a 5% relative improvement is worth the cost of this experiment, or if we should focus on larger potential improvements first."
Show your work step by step. Interviewers care more about your process than about whether you have memorized formulas.