ML Fundamentals Questions

Supervised Learning Deep Dive

5 min read

Core Algorithms You Must Know

1. Linear Regression

Interview Question: "Explain how linear regression works and when you'd use it."

Answer Framework:

  • What it is: Fits a linear relationship between features and continuous target
  • Formula: y = w₁x₁ + w₂x₂ + ... + b
  • How it learns: Minimizes Mean Squared Error (MSE) using normal equation or gradient descent
  • When to use: Simple baseline, interpretable coefficients, linear relationships
  • Limitations: Assumes linearity, sensitive to outliers, can't model complex patterns

Code Example:

import numpy as np

def fit_linear_regression(X, y):
    """
    Fit using normal equation: w = (X^T X)^(-1) X^T y

    Time: O(n * d^2 + d^3) — O(n * d^2) to form X^T X, O(d^3) to invert it (n=samples, d=features)
    Space: O(d^2)
    """
    X_with_bias = np.hstack([np.ones((X.shape[0], 1)), X])
    weights = np.linalg.inv(X_with_bias.T @ X_with_bias) @ X_with_bias.T @ y
    return weights

# Test
X = np.array([[1], [2], [3], [4]])
y = np.array([2, 4, 6, 8])
weights = fit_linear_regression(X, y)
print(f"Intercept: {weights[0]:.2f}, Slope: {weights[1]:.2f}")

Common Follow-ups:

  • "What if X^T X is not invertible?" → Use regularization (Ridge/Lasso) or SVD
  • "How do you handle categorical features?" → One-hot encoding or target encoding
  • "Linear regression vs logistic regression?" → Linear for continuous, logistic for binary classification

2. Logistic Regression

Interview Question: "How does logistic regression work for classification?"

Answer Framework:

  • What it is: Linear model + sigmoid function for probability estimates
  • Formula: P(y=1|x) = σ(w^T x) where σ(z) = 1/(1 + e^(-z))
  • Loss function: Binary cross-entropy (log loss)
  • Decision boundary: Linear (can be made non-linear with feature engineering)
  • Output: Probabilities between 0 and 1

Key Insight: Despite "regression" in the name, it's a classification algorithm.

Implementation:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logistic_regression_predict(X, weights):
    """
    Predict probabilities using logistic regression

    X: (n_samples, n_features)
    weights: (n_features + 1,) including bias
    """
    X_with_bias = np.hstack([np.ones((X.shape[0], 1)), X])
    logits = X_with_bias @ weights
    probabilities = sigmoid(logits)
    return probabilities

# Convert probabilities to class predictions
def predict_class(probabilities, threshold=0.5):
    return (probabilities >= threshold).astype(int)
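
Training minimizes the binary cross-entropy mentioned above. A minimal gradient-descent sketch, reusing sigmoid from the block above; the learning rate and iteration count are illustrative.

def fit_logistic_regression(X, y, lr=0.1, n_iters=1000):
    """
    Batch gradient descent on binary cross-entropy.

    Loss: -mean(y * log(p) + (1 - y) * log(1 - p))
    Gradient w.r.t. weights: X^T (p - y) / n
    """
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])
    n, d = X_b.shape
    weights = np.zeros(d)

    for _ in range(n_iters):
        p = sigmoid(X_b @ weights)          # predicted probabilities
        gradient = X_b.T @ (p - y) / n
        weights -= lr * gradient

    return weights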

Common Interview Questions:

  • "Why use sigmoid?" → Maps any real number to (0,1), differentiable for gradient descent
  • "Multi-class logistic regression?" → Softmax regression (one-vs-all or multinomial)
  • "How to choose threshold?" → Depends on precision/recall trade-off for your use case

3. Decision Trees

Interview Question: "Explain how decision trees make decisions and their pros/cons."

Answer Framework:

  • How they work: Recursively split data based on features to maximize information gain
  • Splitting criteria:
    • Classification: Gini impurity or entropy
    • Regression: Variance reduction
  • Pros: Interpretable, handles non-linear relationships, no feature scaling needed
  • Cons: Prone to overfitting, high variance, unstable (small data changes → different tree)

Gini Impurity Formula:

Gini = 1 - Σ(p_i)² for all classes i

Example Calculation:

def gini_impurity(labels):
    """
    Calculate Gini impurity for a set of labels

    Example: [0, 0, 1, 1, 1] -> 0.48
    """
    from collections import Counter
    counts = Counter(labels)
    total = len(labels)
    impurity = 1.0

    for count in counts.values():
        prob = count / total
        impurity -= prob ** 2

    return impurity

# Test
labels = [0, 0, 1, 1, 1]
print(f"Gini: {gini_impurity(labels):.2f}")  # 0.48

Key Interview Points:

  • "How to prevent overfitting?" → Limit max_depth, min_samples_split, pruning
  • "Decision tree vs random forest?" → Single tree overfits; forest aggregates many trees for better generalization
  • "Can handle missing values?" → Yes, with surrogate splits or imputation

4. Random Forests

Interview Question: "How do random forests improve on decision trees?"

Answer Framework:

  • Key technique: Bagging (Bootstrap Aggregating)
  • Process:
    1. Create N bootstrap samples (sample with replacement)
    2. Train decision tree on each, using random subset of features at each split
    3. Average predictions (regression) or vote (classification)
  • Why it works: Reduces variance while maintaining low bias
  • Feature randomness: Decorrelates trees, prevents dominant features from appearing in every tree

Code Concept:

import numpy as np
from scipy import stats

def random_forest_predict(X, trees):
    """
    Aggregate predictions from multiple trees

    For classification: majority vote
    For regression: average
    """
    predictions = np.array([tree.predict(X) for tree in trees])  # (n_trees, n_samples)

    # Classification: majority vote across trees for each sample
    final_pred = stats.mode(predictions, axis=0)[0]

    # Regression alternative:
    # final_pred = np.mean(predictions, axis=0)

    return final_pred

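The training side (bootstrap sampling plus feature randomness) can be sketched in a few lines. This assumes scikit-learn's DecisionTreeClassifier is available and that X, y are NumPy arrays; n_trees and max_features are illustrative defaults.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_random_forest(X, y, n_trees=100, random_state=0):
    """
    Train trees on bootstrap samples; max_features='sqrt' gives each
    split a random subset of features, decorrelating the trees.
    """
    rng = np.random.default_rng(random_state)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        tree = DecisionTreeClassifier(max_features="sqrt")
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees
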
Advantages Over Single Tree:

  • Lower variance (less overfitting)
  • More robust to noise
  • Feature importance estimates
  • Out-of-bag (OOB) error for validation

Trade-offs:

  • Less interpretable than single tree
  • Slower to train and predict
  • More memory intensive

5. Gradient Boosting (XGBoost, LightGBM)

Interview Question: "Explain gradient boosting and when to use it over random forests."

Answer Framework:

  • Key technique: Boosting (sequential ensemble)
  • Process:
    1. Train weak learner (shallow tree)
    2. Calculate residuals (errors)
    3. Train next tree to predict residuals
    4. Add to ensemble with learning rate
    5. Repeat
  • Difference from bagging: Sequential (each tree corrects previous) vs parallel

Pseudocode:

def gradient_boosting_concept(X, y, n_trees, learning_rate):
    """
    Conceptual gradient boosting

    F_0(x) = initial prediction (mean)
    F_m(x) = F_{m-1}(x) + learning_rate * h_m(x)

    where h_m(x) is a tree trained on residuals
    """
    F = np.full(len(y), y.mean())  # Initial prediction

    for m in range(n_trees):
        residuals = y - F
        tree_m = fit_tree(X, residuals)  # placeholder: fit a shallow regression tree to the errors
        F += learning_rate * tree_m.predict(X)

    return F
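
A runnable version of the same idea, assuming scikit-learn's DecisionTreeRegressor as the weak learner; the hyperparameters are illustrative.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Boosted shallow regression trees with squared-error loss."""
    base = y.mean()
    F = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residuals = y - F  # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        F += learning_rate * tree.predict(X)
        trees.append(tree)
    return base, trees

def gradient_boosting_predict(X, base, trees, learning_rate=0.1):
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred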

When to Use:

  • Gradient Boosting (XGBoost, LightGBM):

    • Tabular data competitions (Kaggle)
    • Need highest accuracy
    • Have time for hyperparameter tuning
    • Features are heterogeneous
  • Random Forests:

    • Want out-of-the-box performance
    • Less tuning time
    • More robust to hyperparameters
    • Easier to parallelize

Key Interview Points:

  • "How to prevent overfitting?" → Lower learning rate, max_depth, early stopping
  • "XGBoost vs LightGBM?" → LightGBM faster for large datasets, uses histogram-based splits
  • "Why learning rate?" → Prevents overfitting by making smaller updates

Algorithm Comparison Table

| Algorithm           | Interpretability | Speed  | Accuracy   | Overfitting Risk  | Hyperparameter Tuning |
|---------------------|------------------|--------|------------|-------------------|-----------------------|
| Linear Regression   | High             | Fast   | Low-Medium | Low               | Minimal               |
| Logistic Regression | High             | Fast   | Medium     | Low               | Minimal               |
| Decision Tree       | Medium-High      | Fast   | Medium     | High              | Medium                |
| Random Forest       | Low              | Medium | High       | Medium            | Medium                |
| Gradient Boosting   | Low              | Slow   | Very High  | High if not tuned | High                  |

How to Answer "When would you use algorithm X?"

Template:

  1. Nature of data: Linear vs non-linear patterns, feature types
  2. Problem requirements: Interpretability, speed, accuracy priority
  3. Data size: Small datasets → simpler models; large → can use complex
  4. Baseline: Start simple (linear/logistic), then try ensembles if needed

Example Answer:

"For a credit scoring problem with 50 features and 100K samples, I'd start with logistic regression as a baseline for interpretability and speed. If accuracy isn't sufficient, I'd try random forest for robust performance with minimal tuning. For maximum accuracy in a Kaggle-style competition, I'd use XGBoost with careful cross-validation and hyperparameter tuning."

Key Takeaways

  1. Know the fundamentals - Be able to explain math and intuition
  2. Understand trade-offs - No algorithm is best for everything
  3. Implementation matters - Know how to code from scratch for interviews
  4. Relate to experience - Connect to projects you've worked on

What's Next?

In the next lesson, we'll cover neural networks and deep learning concepts that frequently appear in ML interviews.

No spam. Unsubscribe anytime.