Real-World ML Problem Solving
Fraud Detection & Anomaly Detection
Fraud Detection Challenges
Imbalanced Data:
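- Fraud is typically <1% of transactions, so a model that always predicts "not fraud" is still >99% accurate
- Solutions:
  - SMOTE (Synthetic Minority Over-sampling Technique), applied to the training split only
  - Class weights in loss function
  - Ensemble methods (Random Forest with balanced class weights or balanced subsampling; a plain Random Forest is still biased toward the majority class)
  - Anomaly detection (treat fraud as outlier)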
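The class-weight option above can be sketched in plain Python. The weighting rule used here, n_samples / (n_classes * class_count), is the common "balanced" heuristic; the function names and the toy 1%-fraud labels are illustrative assumptions, not part of any particular library.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_log_loss(y_true, p_fraud, weights):
    """Binary cross-entropy where each sample is scaled by its class weight."""
    eps = 1e-12  # keeps log() away from 0
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_fraud):
        p = min(max(p, eps), 1 - eps)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


# Toy data (assumption): 990 legitimate transactions, 10 frauds.
labels = [0] * 990 + [1] * 10
weights = balanced_class_weights(labels)

# A lazy model that predicts P(fraud) = 0.01 for every transaction.
lazy_predictions = [0.01] * len(labels)
unweighted = weighted_log_loss(labels, lazy_predictions, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(labels, lazy_predictions, weights)
```

With these weights each fraud counts as much as roughly 99 legitimate transactions, so the lazy model's loss rises sharply and the optimizer can no longer ignore the minority class.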
Features:
- Transaction: Amount, time, location, merchant
- Behavioral: Average spend, frequency, velocity
- Network: Device fingerprint, IP address
- Historical: Past fraud flags, dispute rate
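As a rough illustration of the behavioral features listed above, the sketch below derives amount deviation, velocity, and a location-change flag from a user's history. The record layout (dicts with `amount`, `ts` in seconds, `country`) and the one-hour window are assumptions made for the example.

```python
from statistics import mean, pstdev


def behavioral_features(history, txn, window_seconds=3600):
    """Compare a new transaction against the same user's prior transactions."""
    amounts = [t["amount"] for t in history]
    avg = mean(amounts) if amounts else 0.0
    std = pstdev(amounts) if len(amounts) > 1 else 0.0

    # Velocity: how many prior transactions fall inside the look-back window.
    recent = [t for t in history if 0 <= txn["ts"] - t["ts"] <= window_seconds]

    # Location change: compare against the most recent prior transaction.
    last = max(history, key=lambda t: t["ts"]) if history else None

    return {
        "amount_zscore": (txn["amount"] - avg) / std if std > 0 else 0.0,
        "velocity_1h": len(recent),
        "country_changed": int(last is not None and last["country"] != txn["country"]),
    }


# Hypothetical user history and incoming transaction.
history = [
    {"amount": 20, "ts": 0, "country": "US"},
    {"amount": 30, "ts": 1000, "country": "US"},
    {"amount": 25, "ts": 2000, "country": "US"},
    {"amount": 25, "ts": 9000, "country": "US"},
]
txn = {"amount": 500, "ts": 9600, "country": "BR"}
features = behavioral_features(history, txn)
```

In a real-time system these per-user aggregates would usually be precomputed and cached rather than recalculated from raw history on every request.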
Interview Q: "Credit card fraud detection - what model?" A:
- Start: Logistic Regression or Random Forest (interpretable for compliance)
- Handle imbalance: Class weights, SMOTE, or anomaly detection
- Features: Transaction velocity, location change, amount deviation
- Real-time: Low latency (<100ms), use cached user profiles
- Evaluation: Precision and recall (not accuracy), summarized with AUC-PR
- Monitoring: Concept drift (fraudsters adapt)
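To see why the evaluation point above favors precision-recall over accuracy, here is a small stdlib-only sketch. The `average_precision` function is a simple step-wise approximation of AUC-PR (mean precision at each true-positive rank); it assumes no tied scores among positives, and the labels and scores are made up for illustration.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (fraud) class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(y_true, scores):
    """Mean of precision measured at the rank of each true positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Toy data (assumption): 98 legitimate transactions, 2 frauds.
y_true = [0] * 98 + [1] * 2

# "Always legitimate" looks excellent on accuracy and useless on recall.
all_negative = [0] * 100
accuracy = sum(1 for y, p in zip(y_true, all_negative) if y == p) / len(y_true)
precision, recall = precision_recall(y_true, all_negative)

# A scoring model that ranks the frauds 1st and 3rd.
scores = [0.1] * 98 + [0.9, 0.7]
scores[0] = 0.8  # one legitimate transaction scored above a fraud
ap = average_precision(y_true, scores)
```

Here the trivial classifier reaches 98% accuracy while catching no fraud at all, which is exactly the failure that precision, recall, and AUC-PR expose.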
Anomaly Detection
Techniques:
- Statistical: Z-score, IQR
- Isolation Forest: Random splits isolate outliers in fewer steps, so a short average path length signals an anomaly
- Autoencoders: Train on normal data; a high reconstruction error flags an anomaly
- One-class SVM: Learn boundary of normal data
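The two statistical techniques in the list above are simple enough to sketch with the standard library. The thresholds (3 standard deviations, 1.5 × IQR) are the conventional defaults, and the transaction amounts are invented for the example.

```python
from statistics import mean, pstdev, quantiles


def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical transaction amounts: 20 ordinary purchases and one large one.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 20] * 2 + [500]

z_flags = zscore_outliers(amounts)
iqr_flags = iqr_outliers(amounts)
```

One caveat with the Z-score: the outlier itself inflates the mean and standard deviation, so on very small samples an extreme value may fail to cross the threshold. The IQR rule is less sensitive to this because quartiles barely move when a single extreme value is added.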
Interview Q: "Detect unusual login activity" A:
- Features: Login time, location, device, failed attempts
- Baseline: User's historical patterns
- Model: Isolation Forest or autoencoder
- Alert: Threshold on anomaly score + contextual rules
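The alerting step above (threshold on a score plus contextual rules) might look like the sketch below. The hand-weighted point score is only a stand-in for an Isolation Forest or autoencoder score, and the field names, point values, and thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserBaseline:
    """Per-user profile built from historical logins (hypothetical structure)."""
    usual_hours: set
    known_devices: set
    known_countries: set


def anomaly_score(baseline, login):
    """Stand-in score: add points for each deviation from the user's baseline."""
    score = 0
    if login["hour"] not in baseline.usual_hours:
        score += 3
    if login["device"] not in baseline.known_devices:
        score += 3
    if login["country"] not in baseline.known_countries:
        score += 4
    return score


def should_alert(baseline, login, threshold=6, max_failed_attempts=5):
    """Alert when the score crosses the threshold OR a contextual rule fires."""
    brute_force = login["failed_attempts"] >= max_failed_attempts
    return anomaly_score(baseline, login) >= threshold or brute_force


baseline = UserBaseline(
    usual_hours=set(range(8, 19)),
    known_devices={"laptop-1", "phone-1"},
    known_countries={"US"},
)

normal = {"hour": 9, "device": "laptop-1", "country": "US", "failed_attempts": 0}
odd = {"hour": 3, "device": "unknown", "country": "RO", "failed_attempts": 0}
brute = {"hour": 10, "device": "laptop-1", "country": "US", "failed_attempts": 6}

alerts = [should_alert(baseline, login) for login in (normal, odd, brute)]
```

Keeping the rule check separate from the score means a clear signal such as repeated failed attempts still raises an alert even when the login otherwise matches the user's usual pattern.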
:::