Statistics & Probability
Hypothesis Testing
Hypothesis testing is the foundation of data-driven decision making. Interviewers expect you to understand not just the mechanics, but the interpretation and limitations.
The Hypothesis Testing Framework
Every hypothesis test follows this structure:
1. State hypotheses:
   - H₀ (Null): The default assumption (no effect, no difference)
   - H₁ (Alternative): What we're testing for
2. Choose significance level (α): Typically 0.05 or 0.01
3. Calculate test statistic: Based on sample data
4. Make decision: Reject H₀ if p-value < α
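As a minimal sketch of these four steps (a one-sample t-test on made-up page-load times; scipy assumed available, numbers are illustrative only):

```python
import numpy as np
from scipy import stats

# Hypothetical data: page load times (seconds) from 40 sessions
rng = np.random.default_rng(42)
sample = rng.normal(loc=2.1, scale=0.4, size=40)

# 1. Hypotheses: H0: mean load time = 2.0s, H1: mean != 2.0s
mu_0 = 2.0

# 2. Significance level
alpha = 0.05

# 3. Test statistic and p-value
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)

# 4. Decision
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```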
Type I and Type II Errors
| Decision | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | Type I Error (α) | Correct |
| Fail to Reject H₀ | Correct | Type II Error (β) |
Type I Error (False Positive):
- Rejecting H₀ when it's true
- Probability = α (significance level)
- "Crying wolf" - saying there's an effect when there isn't
Type II Error (False Negative):
- Failing to reject H₀ when it's false
- Probability = β
- Missing a real effect
Power = 1 - β: The probability of correctly detecting an effect when it exists.
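A quick simulation makes α, β, and power concrete. This sketch (effect size, group size, and repeat count are arbitrary choices, not from the text) estimates both error rates for a two-sample t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, alpha, n_sims = 50, 0.05, 5000

# Type I error: both groups drawn from the same distribution (H0 true)
false_pos = sum(
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue < alpha
    for _ in range(n_sims)
)

# Power: treatment group has a real effect of 0.5 standard deviations (H0 false)
true_pos = sum(
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0.5, 1, n)).pvalue < alpha
    for _ in range(n_sims)
)

print(f"Estimated Type I error rate: {false_pos / n_sims:.3f}")  # ~0.05 = alpha
print(f"Estimated power (1 - beta):  {true_pos / n_sims:.3f}")   # ~0.70 for this design
```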
Interview question: "A/B test shows p=0.03. Is the new feature better?"
Good answer: "At α=0.05, we'd reject the null hypothesis that there's no difference. However, statistical significance doesn't mean practical significance. I'd also look at effect size and confidence intervals before recommending a launch."
p-Values: What They Really Mean
p-value = Probability of observing data this extreme (or more) IF the null hypothesis is true
Common misinterpretations to avoid:
| Wrong | Correct |
|---|---|
| p=0.03 means 3% chance H₀ is true | p=0.03 means a 3% chance of data this extreme (or more) if H₀ is true |
| p=0.03 means 97% chance effect is real | The effect size is a separate question |
| p=0.06 means "almost significant" | Either significant or not - no "almost" |
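One way to internalize the definition: when H₀ is true, p-values are uniformly distributed, so p ≤ 0.03 occurs in about 3% of experiments even though no effect exists. A small simulation sketch (sample size and repeat count are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims = 10_000

# H0 is true by construction: both samples come from the same distribution
p_values = np.array([
    stats.ttest_ind(rng.normal(0, 1, 30), rng.normal(0, 1, 30)).pvalue
    for _ in range(n_sims)
])

# Roughly 3% of null experiments still produce p <= 0.03
print(f"Share of p-values <= 0.03: {(p_values <= 0.03).mean():.3f}")
```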
Interview trap: "We got p=0.06. Should we collect more data?"
Answer: Deciding to collect more data because p is close to the threshold is a form of p-hacking (optional stopping). Sample size should be fixed in advance based on a power analysis; peeking at the result and then extending the experiment inflates your false positive rate.
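To see why, here is a rough simulation of "peek, and collect more if p is close": even with no true effect, the false positive rate climbs above the nominal 5%. (The peeking rule and sample sizes here are illustrative, not from the text.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sims, alpha = 5000, 0.05
false_positives = 0

for _ in range(n_sims):
    a, b = rng.normal(0, 1, 100), rng.normal(0, 1, 100)  # no true difference
    p = stats.ttest_ind(a, b).pvalue
    if alpha <= p < 0.10:                                # "almost significant": collect more
        a = np.concatenate([a, rng.normal(0, 1, 100)])
        b = np.concatenate([b, rng.normal(0, 1, 100)])
        p = stats.ttest_ind(a, b).pvalue
    false_positives += p < alpha

print(f"False positive rate with optional stopping: {false_positives / n_sims:.3f}")  # > 0.05
```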
Confidence Intervals
A 95% confidence interval means: If we repeated this experiment many times, 95% of the intervals would contain the true parameter.
It does NOT mean: There's a 95% probability the true value is in this interval.
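The repeated-sampling interpretation is easy to check empirically. A sketch (true mean, spread, and sample size are made up) showing that roughly 95% of intervals cover the true value:

```python
import numpy as np

rng = np.random.default_rng(3)
true_mean, n, n_sims = 10.0, 50, 10_000
covered = 0

for _ in range(n_sims):
    sample = rng.normal(true_mean, 2.0, n)
    x_bar, s = sample.mean(), sample.std(ddof=1)
    half_width = 1.96 * s / np.sqrt(n)
    covered += (x_bar - half_width) <= true_mean <= (x_bar + half_width)

print(f"Coverage of nominal 95% intervals: {covered / n_sims:.3f}")  # ~0.95
```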
Construction for mean (large sample):
CI = x̄ ± z × (s / √n)
For 95% CI: z = 1.96
For 99% CI: z = 2.58
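A direct translation of the formula (assuming a sample large enough for the normal approximation; the measurements below are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(loc=12.0, scale=0.3, size=40)  # hypothetical measurements, n > 30

x_bar, s, n = data.mean(), data.std(ddof=1), len(data)

z = 1.96                                         # 95% CI; use 2.58 for 99%
margin = z * s / np.sqrt(n)
print(f"mean = {x_bar:.3f}, 95% CI = [{x_bar - margin:.3f}, {x_bar + margin:.3f}]")
```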
Interview insight: Confidence intervals are more informative than p-values because they show effect size and uncertainty.
Scenario A: p=0.04, 95% CI = [0.1%, 2.0%]
Scenario B: p=0.04, 95% CI = [5.0%, 25.0%]
Both are "significant," but Scenario B shows a practically meaningful effect.
Common Tests Quick Reference
| Test | Use Case | Assumptions |
|---|---|---|
| t-test (one sample) | Mean vs known value | Normal distribution or n > 30 |
| t-test (two sample) | Compare two group means | Independence, normality |
| Paired t-test | Before/after same subjects | Paired observations |
| Chi-square | Categorical independence | Expected count ≥ 5 per cell |
| ANOVA | Compare 3+ group means | Normality, equal variance |
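Each of these maps to a standard scipy.stats call. A quick-reference sketch (toy arrays only, just to show the function names and inputs):

```python
import numpy as np
from scipy import stats

a = np.array([5.1, 4.9, 5.4, 5.0, 5.2, 4.8])
b = np.array([5.6, 5.5, 5.9, 5.7, 5.4, 5.8])
c = np.array([5.0, 5.3, 5.1, 5.2, 4.9, 5.4])

stats.ttest_1samp(a, popmean=5.0)       # one-sample t-test: mean vs known value
stats.ttest_ind(a, b)                   # two-sample t-test: compare two group means
stats.ttest_rel(a, b)                   # paired t-test: before/after on the same subjects
stats.f_oneway(a, b, c)                 # one-way ANOVA: compare 3+ group means

table = np.array([[30, 70], [45, 55]])  # 2x2 contingency table of counts
stats.chi2_contingency(table)           # chi-square test of independence
```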
Interview Problem: A/B Test Analysis
Scenario: Control has 5% conversion rate. Treatment has 5.5% conversion rate. Each group has 10,000 users. Is this significant?
Solution:
H₀: p_treatment = p_control
H₁: p_treatment ≠ p_control
Pooled proportion: p = (500 + 550) / 20000 = 0.0525
Standard error: SE = √[p(1-p)(1/n₁ + 1/n₂)]
= √[0.0525 × 0.9475 × (1/10000 + 1/10000)]
= √[0.0497 × 0.0002]
≈ 0.00315
z = (0.055 - 0.05) / 0.00315 ≈ 1.59
p-value ≈ 0.11 (two-tailed)
Conclusion: Not significant at α=0.05. The 10% relative lift (0.5 percentage points absolute) could be due to chance.
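The same calculation in code, following the pooled two-proportion z-test above (conversions and sample sizes taken from the scenario):

```python
import numpy as np
from scipy import stats

conv_control, n_control = 500, 10_000    # 5.0% conversion
conv_treat, n_treat = 550, 10_000        # 5.5% conversion

p_control = conv_control / n_control
p_treat = conv_treat / n_treat
p_pooled = (conv_control + conv_treat) / (n_control + n_treat)

se = np.sqrt(p_pooled * (1 - p_pooled) * (1 / n_control + 1 / n_treat))
z = (p_treat - p_control) / se
p_value = 2 * stats.norm.sf(abs(z))      # two-tailed

print(f"z = {z:.2f}, p-value = {p_value:.3f}")  # z ≈ 1.59, p ≈ 0.11
```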
Always connect statistical results to business implications. "Not significant" doesn't mean "no effect" - it means we can't distinguish the effect from noise.