Statistics & Probability

Statistics Interview Problems

Practice these classic statistics problems that appear frequently in data science interviews. Focus on clear reasoning and stating assumptions.

Problem 1: Two-Sample t-Test

Scenario: You're testing a new checkout flow. Control group (n=500) has mean conversion of 4.2% (std=1.8%). Treatment group (n=500) has mean conversion of 4.8% (std=2.0%). Is the difference significant?

Solution:

Step 1: State hypotheses
H₀: μ_treatment = μ_control
H₁: μ_treatment ≠ μ_control

Step 2: Calculate the standard error of the difference (unpooled form; with equal group sizes it equals the pooled value)
SE = √[(s₁²/n₁) + (s₂²/n₂)]
   = √[(0.018²/500) + (0.020²/500)]
   = √[(0.000324/500) + (0.0004/500)]
   = √[0.000001448]
   = 0.00120

Step 3: Calculate t-statistic
t = (x̄₁ - x̄₂) / SE
  = (0.048 - 0.042) / 0.00120
  = 0.006 / 0.00120
  = 5.0

Step 4: Compare to critical value
df = n₁ + n₂ - 2 = 998; two-tailed critical t at α=0.05 ≈ 1.96
Our t=5.0 > 1.96, so we reject H₀

Conclusion: Significant at α=0.05. The new checkout flow has a statistically significantly higher conversion rate.
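
The arithmetic in Steps 2 to 4 can be checked with a short Python sketch that uses only the standard library. The function name `two_sample_t` is an illustrative choice rather than part of the problem. The p-value relies on a normal approximation to the t distribution, an assumption that is reasonable here only because the degrees of freedom are close to 1,000.

```python
import math
from statistics import NormalDist


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (se, t, welch_df, approx_p) from summary statistics."""
    var1, var2 = sd1**2 / n1, sd2**2 / n2
    se = math.sqrt(var1 + var2)  # standard error of the difference
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite degrees of freedom (slightly below n1 + n2 - 2).
    welch_df = (var1 + var2) ** 2 / (var1**2 / (n1 - 1) + var2**2 / (n2 - 1))
    # Assumption: with df this large, the normal tail approximates the t tail.
    approx_p = 2 * (1 - NormalDist().cdf(abs(t)))
    return se, t, welch_df, approx_p


# Treatment first, so a positive t means treatment converts better.
se, t, welch_df, approx_p = two_sample_t(0.048, 0.020, 500, 0.042, 0.018, 500)
print(f"SE={se:.5f}  t={t:.2f}  df={welch_df:.0f}  p={approx_p:.1e}")
```

The unrounded t-statistic is about 4.99, which the worked solution rounds to 5.0. The Welch degrees of freedom come out a little under 998, which does not change the critical value at this sample size.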

Problem 2: Chi-Square Test for Independence

Scenario: Does user device type affect premium subscription rates?

Device     Subscribed   Not Subscribed   Total
Mobile     120          880              1000
Desktop    200          800              1000

Solution:

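Step 1: Calculate expected values (row total × column total / grand total; the column totals are 320 subscribed and 1680 not subscribed)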
E(Mobile, Sub) = (1000 × 320) / 2000 = 160
E(Mobile, Not) = (1000 × 1680) / 2000 = 840
E(Desktop, Sub) = 160
E(Desktop, Not) = 840

Step 2: Calculate chi-square statistic
χ² = Σ (O - E)² / E
   = (120-160)²/160 + (880-840)²/840 + (200-160)²/160 + (800-840)²/840
   = 10 + 1.9 + 10 + 1.9
   = 23.8

Step 3: Compare to critical value
df = (rows-1) × (cols-1) = 1
Critical χ² at α=0.05 = 3.84

χ² = 23.8 > 3.84

Conclusion: Device type is significantly associated with subscription rate.
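
The same calculation can be sketched in standard-library Python. The function name `chi_square_independence` is hypothetical, and no continuity correction is applied, which matches the hand calculation. The p-value shortcut is an assumption that holds only for df = 1.

```python
import math


def chi_square_independence(table):
    """Return (statistic, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


observed = [[120, 880],   # Mobile: subscribed, not subscribed
            [200, 800]]   # Desktop: subscribed, not subscribed
statistic, df, expected = chi_square_independence(observed)

# Valid for df = 1 only: a chi-square variable with 1 df is a squared
# standard normal, so its upper tail equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))
print(f"chi2={statistic:.2f}  df={df}  p={p_value:.1e}")
```

For tables larger than 2×2 the `erfc` shortcut no longer applies, and a chi-square distribution function from a statistics library would be needed instead.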

Problem 3: Correlation vs Causation

Interview question: "We found that users who use our mobile app have 3x higher retention than web-only users. Should we invest more in mobile?"

Strong answer:

"Before recommending increased mobile investment, I'd investigate several alternatives:

  1. Selection bias: Are mobile users fundamentally different? They may be more engaged overall, using both platforms.

  2. Reverse causation: Does mobile cause retention, or do retained users eventually download the app?

  3. Confounders:

    • Notification access (mobile users receive push notifications)
    • Demographic differences (age, tech-savviness)
    • Use case differences

What I'd do:

  • Compare retention for users who started on mobile vs web (cohort analysis)
  • Control for user characteristics in regression
  • Look at retention change when users adopt mobile after using web
  • If possible, run an experiment encouraging web users to try mobile"
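
To make the confounding argument concrete, here is a minimal Python sketch of a stratified comparison. The records are hypothetical toy data invented purely for illustration (they are not from any real product), and the "engagement tier" variable stands in for whatever confounder is suspected.

```python
from collections import defaultdict


def retention_rates(records, key):
    """Retention rate per group, where key(record) picks the grouping."""
    tally = defaultdict(lambda: [0, 0])  # group -> [retained, total]
    for record in records:
        cell = tally[key(record)]
        cell[0] += int(record[2])
        cell[1] += 1
    return {group: kept / total for group, (kept, total) in tally.items()}


def cohort(platform, tier, retained_n, total_n):
    """Build toy (platform, engagement_tier, retained) records."""
    return [(platform, tier, i < retained_n) for i in range(total_n)]


# HYPOTHETICAL toy data: mobile users skew toward the high-engagement tier.
records = (
    cohort("mobile", "high", 72, 90) + cohort("web", "high", 8, 10)
    + cohort("mobile", "low", 2, 10) + cohort("web", "low", 18, 90)
)

overall = retention_rates(records, key=lambda r: r[0])
by_tier = retention_rates(records, key=lambda r: (r[1], r[0]))
print(overall)  # mobile looks far better in aggregate
print(by_tier)  # within each engagement tier the platforms match
```

In this constructed example the aggregate gap comes entirely from who uses mobile, not from mobile itself. Real data would rarely be this clean, which is why the experiment in the last bullet is the strongest evidence.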

Problem 4: Simpson's Paradox

Scenario: A drug trial shows:

Group          Drug Success     Control Success
Mild cases     80% (80/100)     90% (180/200)
Severe cases   30% (60/200)     20% (20/100)
Overall        47% (140/300)    67% (200/300)

Drug looks worse overall but better for severe cases!

Explanation:

"This is Simpson's Paradox. The drug appears worse overall (47% vs 67%), but when we stratify by severity:

  • Severe cases: Drug 30% vs Control 20% (drug better)
  • Mild cases: Drug 80% vs Control 90% (drug worse)

The paradox occurs because:

  1. Drug was given more often to severe cases (200 severe vs 100 mild)
  2. Control was given more often to mild cases (200 mild vs 100 severe)
  3. Severe cases have lower success rates overall

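Correct interpretation: Compare within severity strata rather than on the pooled numbers. The drug is more effective for severe cases (the harder problem) and less effective for mild cases. The overall average is misleading because of unequal allocation: the drug arm has twice as many severe cases as the control arm. Strictly speaking, the textbook form of the paradox has the direction reverse in every stratum; here it reverses only for severe cases."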
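
The effect of the unequal allocation can be checked with a short Python sketch built from the counts in the table. The severity-standardized rate is one common adjustment and is an addition here, not part of the original explanation. It assumes each arm is reweighted to the pooled trial's case mix (300 mild, 300 severe).

```python
# (successes, patients) per arm and severity stratum, from the table above.
trial = {
    "drug":    {"mild": (80, 100),  "severe": (60, 200)},
    "control": {"mild": (180, 200), "severe": (20, 100)},
}


def crude_rate(arm):
    """Overall success rate, ignoring severity."""
    successes = sum(s for s, _ in trial[arm].values())
    patients = sum(n for _, n in trial[arm].values())
    return successes / patients


def standardized_rate(arm):
    """Success rate if the arm had the pooled trial's severity mix."""
    stratum_sizes = {
        stratum: sum(trial[a][stratum][1] for a in trial)
        for stratum in trial[arm]
    }
    total = sum(stratum_sizes.values())
    return sum(
        (s / n) * stratum_sizes[stratum] / total
        for stratum, (s, n) in trial[arm].items()
    )


for arm in trial:
    print(arm, f"crude={crude_rate(arm):.1%}",
          f"standardized={standardized_rate(arm):.1%}")
```

Under that reweighting both arms come out at 55%, so the 20-point crude gap is an artifact of the case mix. The stratum-level differences (drug better for severe, worse for mild) remain the meaningful comparison.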

Problem 5: Power Analysis

Question: "How many users do we need per group to detect a 5% relative improvement in conversion rate (from 10% to 10.5%) with 80% power at α=0.05?"

Solution:

Using the standard formula for a two-proportion test:

n = 2 × [(Zα/2 + Zβ)² × p̄(1-p̄)] / (p₁ - p₂)²

Where:
- Zα/2 = 1.96 (for α=0.05, two-tailed)
- Zβ = 0.84 (for 80% power)
- p₁ = 0.10, p₂ = 0.105
- p̄ = (0.10 + 0.105) / 2 = 0.1025

n = 2 × [(1.96 + 0.84)² × 0.1025 × 0.8975] / (0.005)²
  = 2 × [7.84 × 0.092] / 0.000025
  = 2 × 0.721 / 0.000025
  = 57,680 per group

Need ~58,000 users per group (116,000 total) to detect this small effect.
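
The formula above can be wrapped in a small Python function for reuse. This is a sketch using only the standard library; the function name `sample_size_two_proportions` is hypothetical, and the z-values are computed exactly rather than rounded to 1.96 and 0.84.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n = 2 (z_a/2 + z_b)^2 p_bar (1 - p_bar) / (p1 - p2)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    n = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / (p1 - p2) ** 2
    return math.ceil(n)  # round up to whole users


n_per_group = sample_size_two_proportions(0.10, 0.105)
print(n_per_group, "per group,", 2 * n_per_group, "total")
```

Because it uses unrounded z-values and no intermediate rounding, the result lands slightly above the hand calculation but still rounds to about 58,000 per group. Calling it with a larger lift, such as `sample_size_two_proportions(0.10, 0.11)`, shows how quickly the requirement falls as the effect size grows.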

Interview insight: "This highlights why detecting small effects requires large samples. I'd ask whether a 5% relative improvement is worth the cost of this experiment, or if we should focus on larger potential improvements first."

Show your work step-by-step. Interviewers care more about your process than about memorized formulas.
