Python for ML Interviews

Why Python Mastery Matters

In ML engineering interviews, you're expected to code fluently in Python. Unlike general software engineering interviews where any language works, ML roles almost universally require Python proficiency because:

Most major ML libraries are Python-first (TensorFlow, PyTorch, scikit-learn)
Data manipulation happens in Python (pandas, NumPy)
Interviewers will evaluate your Python idioms and style

Essential Python Patterns

1. List Comprehensions

List comprehensions are more than syntactic sugar: they signal Pythonic thinking and are often faster than an equivalent loop that calls append().

Basic Pattern:

# Bad: Using loops
result = []
for x in range(10):
    if x % 2 == 0:
        result.append(x * x)

# Good: List comprehension
result = [x * x for x in range(10) if x % 2 == 0]
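
The speed claim is easy to check yourself. The sketch below wraps both versions in functions (the function names and input sizes are illustrative choices, not part of the lesson) and times them with timeit. Actual numbers depend on your machine and Python version, so treat the output as a rough comparison only.

```python
import timeit

def squares_loop(n):
    """Even squares via an explicit append loop."""
    result = []
    for x in range(n):
        if x % 2 == 0:
            result.append(x * x)
    return result

def squares_comprehension(n):
    """Same result via a list comprehension."""
    return [x * x for x in range(n) if x % 2 == 0]

# Both versions produce identical output
assert squares_loop(10) == squares_comprehension(10) == [0, 4, 16, 36, 64]

# Timings vary by machine and Python version; no specific speedup is implied
loop_time = timeit.timeit(lambda: squares_loop(10_000), number=200)
comp_time = timeit.timeit(lambda: squares_comprehension(10_000), number=200)
print(f"loop: {loop_time:.3f}s  comprehension: {comp_time:.3f}s")
```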

Nested Comprehensions:

# Flatten a 2D matrix
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Bad
flat = []
for row in matrix:
    for val in row:
        flat.append(val)

# Good
flat = [val for row in matrix for val in row]
# Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Dictionary Comprehensions:

# Create feature name to index mapping
features = ['age', 'income', 'score']
feature_to_idx = {name: idx for idx, name in enumerate(features)}
# Output: {'age': 0, 'income': 1, 'score': 2}

Interview Question:

"Given a list of numbers, create a dictionary mapping each number to its square, but only for even numbers."

def create_square_dict(numbers):
    return {n: n**2 for n in numbers if n % 2 == 0}

# Test
print(create_square_dict([1, 2, 3, 4, 5, 6]))
# Output: {2: 4, 4: 16, 6: 36}

2. Collections Module

The collections module provides powerful data structures that solve common ML interview problems.

Counter - For Frequency Counting:

from collections import Counter

# Common interview task: Find most common elements
words = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple']
counter = Counter(words)

print(counter.most_common(2))
# Output: [('apple', 3), ('banana', 2)]

# Useful for: Feature frequency, class distribution, word counts
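
As one example of the class-distribution use case, the sketch below turns raw label counts into per-class proportions. The labels list is made-up sample data.

```python
from collections import Counter

# Hypothetical labels for a small classification dataset
labels = ['cat', 'dog', 'cat', 'cat', 'bird', 'dog']
counts = Counter(labels)
total = sum(counts.values())

# Fraction of samples that fall in each class
class_share = {label: count / total for label, count in counts.items()}
print(class_share)

# A Counter returns 0 for unseen keys instead of raising KeyError
print(counts['fish'])  # 0
```

Checking class proportions this way is a quick first step for spotting class imbalance before training.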

defaultdict - Avoid KeyError:

from collections import defaultdict

# Bad: Manual key checking
word_indices = {}
for idx, word in enumerate(['cat', 'dog', 'cat', 'bird']):
    if word not in word_indices:
        word_indices[word] = []
    word_indices[word].append(idx)

# Good: defaultdict
word_indices = defaultdict(list)
for idx, word in enumerate(['cat', 'dog', 'cat', 'bird']):
    word_indices[word].append(idx)

# Output: {'cat': [0, 2], 'dog': [1], 'bird': [3]}

deque - For Sliding Windows:

from collections import deque

def moving_average(values, window_size):
    """Calculate moving average using deque"""
    window = deque(maxlen=window_size)
    result = []

    for val in values:
        window.append(val)
        result.append(sum(window) / len(window))

    return result

# Test
print(moving_average([1, 2, 3, 4, 5], 3))
# Output: [1.0, 1.5, 2.0, 3.0, 4.0]
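
A likely follow-up question: sum(window) rescans the window on every step, so the version above costs O(n * k) for window size k. A minimal sketch of an O(n) variant keeps a running sum instead. The function name is an illustrative choice, and with long streams of floats the running sum can accumulate small rounding errors.

```python
from collections import deque

def moving_average_running_sum(values, window_size):
    """Moving average that updates a running sum instead of re-summing."""
    window = deque()
    running_sum = 0.0
    result = []

    for val in values:
        window.append(val)
        running_sum += val
        # Drop the oldest value once the window is over capacity
        if len(window) > window_size:
            running_sum -= window.popleft()
        result.append(running_sum / len(window))

    return result

print(moving_average_running_sum([1, 2, 3, 4, 5], 3))
# [1.0, 1.5, 2.0, 3.0, 4.0]
```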

3. NumPy-Style Operations

Many ML coding questions involve matrix/array operations. Know these patterns cold.

Array Creation:

import numpy as np

# Zeros, ones, identity
zeros = np.zeros((3, 3))
ones = np.ones((2, 4))
identity = np.eye(3)

# From lists
arr = np.array([[1, 2], [3, 4]])

# Ranges
linear = np.arange(0, 10, 2)      # [0, 2, 4, 6, 8]
linspace = np.linspace(0, 1, 5)   # [0.0, 0.25, 0.5, 0.75, 1.0]

Reshaping and Transposing:

# Reshape
arr = np.arange(12)
matrix = arr.reshape(3, 4)    # 3x4 matrix

# Transpose
transposed = matrix.T         # 4x3 matrix

# Flatten
flat = matrix.flatten()       # Back to 1D array

Broadcasting:

# Add scalar to matrix (broadcasting)
matrix = np.array([[1, 2, 3], [4, 5, 6]])
result = matrix + 10
# [[11, 12, 13], [14, 15, 16]]

# Center each row by subtracting its mean (a common preprocessing step in ML)
row_means = matrix.mean(axis=1, keepdims=True)   # shape (2, 1)
centered = matrix - row_means
# [[-1., 0., 1.], [-1., 0., 1.]]
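
The keepdims=True argument is what makes the row-wise subtraction broadcast correctly. The sketch below prints the shapes involved and shows the error you get without it.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

# keepdims=True keeps the reduced axis, giving shape (2, 1)
row_means = matrix.mean(axis=1, keepdims=True)
print(row_means.shape)  # (2, 1)

# (2, 3) - (2, 1) broadcasts the mean across each row
centered = matrix - row_means
print(centered)

# Without keepdims the means have shape (2,), which cannot
# broadcast against (2, 3) because the trailing dimensions differ
flat_means = matrix.mean(axis=1)
try:
    matrix - flat_means
except ValueError as exc:
    print("Shape mismatch:", exc)
```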

Indexing and Slicing:

# Boolean indexing (very common in ML)
arr = np.array([1, 2, 3, 4, 5, 6])
even = arr[arr % 2 == 0]      # [2, 4, 6]

# Fancy indexing
indices = [0, 2, 4]
selected = arr[indices]       # [1, 3, 5]

# 2D slicing
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
sub = matrix[:2, 1:]          # First 2 rows, columns 1 onwards
# [[2, 3], [5, 6]]

Interview Question:

"Normalize each column of a matrix to have mean 0 and standard deviation 1 (z-score normalization)."

def normalize_columns(X):
    """
    X: numpy array of shape (n_samples, n_features)
    Returns: normalized array
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std

# Test
X = np.array([[1, 2], [3, 4], [5, 6]])
normalized = normalize_columns(X)
print(normalized.mean(axis=0))  # ~[0, 0]
print(normalized.std(axis=0))   # ~[1, 1]
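
One edge case worth raising with the interviewer: a constant column has a standard deviation of 0, so the division above yields NaN. A minimal sketch of one way to guard against that follows. The function name, the eps threshold, and the choice to map constant columns to 0 are all assumptions; other conventions exist.

```python
import numpy as np

def normalize_columns_safe(X, eps=1e-12):
    """Z-score each column; constant columns become all zeros."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Replace (near-)zero std with 1 so the division is well defined
    safe_std = np.where(std < eps, 1.0, std)
    return (X - mean) / safe_std

# Second column is constant, so its std is 0
X = np.array([[1, 7], [3, 7], [5, 7]])
print(normalize_columns_safe(X))
```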

4. Efficient Iteration Patterns

enumerate() - Get Index and Value:

items = ['a', 'b', 'c']  # Example data

# Bad
for i in range(len(items)):
    print(i, items[i])

# Good
for i, item in enumerate(items):
    print(i, item)

# Start from custom index
for i, item in enumerate(items, start=1):
    print(i, item)

zip() - Iterate Multiple Lists:

names = ['Alice', 'Bob', 'Charlie']
scores = [85, 92, 78]

# Create name-score pairs
for name, score in zip(names, scores):
    print(f"{name}: {score}")

# Create dictionary
name_to_score = dict(zip(names, scores))
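
One behavior to be aware of: zip() stops at the shortest input without warning. The sketch below shows the truncation, the padded alternative itertools.zip_longest, and how to "unzip" pairs. The shortened scores list is made up for illustration.

```python
from itertools import zip_longest

names = ['Alice', 'Bob', 'Charlie']
scores = [85, 92]  # One score is missing

# zip stops at the shortest input, silently dropping 'Charlie'
truncated = list(zip(names, scores))
print(truncated)  # [('Alice', 85), ('Bob', 92)]

# zip_longest pads the shorter input with a fill value instead
padded = list(zip_longest(names, scores, fillvalue=None))
print(padded)  # [('Alice', 85), ('Bob', 92), ('Charlie', None)]

# "Unzip" a list of pairs back into separate tuples
unzipped_names, unzipped_scores = zip(*truncated)
print(unzipped_names)   # ('Alice', 'Bob')
print(unzipped_scores)  # (85, 92)
```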

itertools - Powerful Combinations:

from itertools import combinations, permutations, product

# All pairs from a list (common in similarity calculations)
items = [1, 2, 3]
pairs = list(combinations(items, 2))
# [(1, 2), (1, 3), (2, 3)]

# Cartesian product (grid search)
learning_rates = [0.01, 0.1]
batch_sizes = [32, 64]
configs = list(product(learning_rates, batch_sizes))
# [(0.01, 32), (0.01, 64), (0.1, 32), (0.1, 64)]
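
For grid search it is often handier to get each configuration as a dictionary rather than a bare tuple. A minimal sketch, assuming a hypothetical param_grid dictionary of hyperparameter names to candidate values:

```python
from itertools import product

# Hypothetical hyperparameter grid
param_grid = {
    'learning_rate': [0.01, 0.1],
    'batch_size': [32, 64],
}

keys = list(param_grid)
# One dict per combination, pairing each key with its chosen value
configs = [dict(zip(keys, values)) for values in product(*param_grid.values())]

for config in configs:
    print(config)
# {'learning_rate': 0.01, 'batch_size': 32}
# {'learning_rate': 0.01, 'batch_size': 64}
# {'learning_rate': 0.1, 'batch_size': 32}
# {'learning_rate': 0.1, 'batch_size': 64}
```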

5. Time and Space Complexity Awareness

Always analyze your code's efficiency—interviewers will ask.

Common Complexity Patterns:

Operation	Time Complexity	Note
`x in list`	O(n)	Linear scan
`x in set`	O(1) average	Hash lookup
`list.append()`	O(1) amortized	Occasionally O(n) when resizing
`list.insert(0, x)`	O(n)	Shifts all elements
`dict[key]`	O(1) average	Hash lookup
`sorted(list)`	O(n log n)	Timsort algorithm
`list comprehension`	O(n)	For n elements
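
The gap between list and set membership is easy to measure. The sketch below times a worst-case lookup in each. The collection size and repeat count are arbitrary choices, and the exact timings will differ between machines.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # Worst case for the list: the last element

# Both containers agree on the answer; only the cost differs
assert (target in as_list) == (target in as_set)

list_time = timeit.timeit(lambda: target in as_list, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)
print(f"list: {list_time:.5f}s  set: {set_time:.5f}s")
```

If a list is searched many times, converting it to a set once up front is usually worth the O(n) conversion cost.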

Space Complexity Example:

# O(1) space - in-place
def reverse_in_place(arr):
    left, right = 0, len(arr) - 1
    while left < right:
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1

# O(n) space - creates new list
def reverse_new_list(arr):
    return arr[::-1]  # Creates a copy

6. Common Pitfalls to Avoid

Pitfall 1: Mutable Default Arguments

# Bug!
def add_sample(sample, samples=[]):
    samples.append(sample)
    return samples

# Each call modifies the SAME list
print(add_sample(1))  # [1]
print(add_sample(2))  # [1, 2] - Unexpected!

# Fix
def add_sample(sample, samples=None):
    if samples is None:
        samples = []
    samples.append(sample)
    return samples

Pitfall 2: Shallow vs Deep Copy

import copy

# Shallow copy - nested objects are shared
original = [[1, 2], [3, 4]]
shallow = original.copy()
shallow[0][0] = 999
print(original)  # [[999, 2], [3, 4]] - Original changed!

# Deep copy - fully independent
original = [[1, 2], [3, 4]]
deep = copy.deepcopy(original)
deep[0][0] = 999
print(original)  # [[1, 2], [3, 4]] - Original unchanged
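
A related case that comes up in ML code: basic indexing and slicing of a NumPy array returns a view that shares memory with the original, so the same aliasing surprise can occur. A minimal sketch:

```python
import numpy as np

# Basic indexing returns a view that shares memory with the original
original = np.array([[1, 2], [3, 4]])
view = original[0]
view[0] = 999
print(original[0, 0])  # 999 - original changed through the view

# .copy() gives an independent array
original = np.array([[1, 2], [3, 4]])
independent = original[0].copy()
independent[0] = 999
print(original[0, 0])  # 1 - original unchanged
```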

Pitfall 3: Integer Division in Python 3

# Python 3 uses true division
result = 5 / 2      # 2.5 (float)

# For integer division
result = 5 // 2     # 2 (int)

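# Common in ML: calculating batch counts
batch_size = 32
total_samples = 100
num_full_batches = total_samples // batch_size   # 3, not 3.125

# Careful: floor division drops the last partial batch (4 samples here).
# Use ceiling division if every sample must be covered.
num_batches = (total_samples + batch_size - 1) // batch_size  # 4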
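
When every sample has to be processed, the batch count should round up so the final partial batch is included. A minimal sketch of a batching generator built on that idea (iter_batches is a hypothetical helper name, not a library function):

```python
import math

def iter_batches(samples, batch_size):
    """Yield consecutive batches, including a final partial one."""
    num_batches = math.ceil(len(samples) / batch_size)
    for b in range(num_batches):
        start = b * batch_size
        # Slicing past the end is safe; the last batch is just shorter
        yield samples[start:start + batch_size]

samples = list(range(100))
batch_lengths = [len(batch) for batch in iter_batches(samples, 32)]
print(batch_lengths)  # [32, 32, 32, 4]
```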

Pitfall 4: Modifying List While Iterating

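# Bug!
numbers = [1, 2, 4, 5]
for i, n in enumerate(numbers):
    if n % 2 == 0:
        numbers.pop(i)  # Shifts later items left, so the next one is skipped
print(numbers)  # [1, 4, 5] - the 4 was never checked

# Fix 1: Iterate backwards
numbers = [1, 2, 4, 5]
for i in range(len(numbers) - 1, -1, -1):
    if numbers[i] % 2 == 0:
        numbers.pop(i)
print(numbers)  # [1, 5]

# Fix 2: List comprehension (best)
numbers = [1, 2, 4, 5]
numbers = [n for n in numbers if n % 2 != 0]
print(numbers)  # [1, 5]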

Interview Coding Best Practices

1. Think Out Loud

Don't code in silence. Explain your approach:

"I'll use a dictionary to store frequency counts"
"This will be O(n) time complexity"
"Let me handle the edge case where the array is empty"

2. Start with Examples

def function(arr):
    """
    Example:
    Input: [1, 2, 2, 3, 3, 3]
    Output: {1: 1, 2: 2, 3: 3}

    Approach: Use Counter from collections
    """
    # Your code
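
Filled in, the template above might look like the sketch below. The function name count_frequencies is an illustrative choice; the approach is the one named in the docstring.

```python
from collections import Counter

def count_frequencies(arr):
    """
    Example:
    Input: [1, 2, 2, 3, 3, 3]
    Output: {1: 1, 2: 2, 3: 3}

    Approach: Use Counter from collections
    """
    # Convert to a plain dict so the result matches the documented output
    return dict(Counter(arr))

print(count_frequencies([1, 2, 2, 3, 3, 3]))  # {1: 1, 2: 2, 3: 3}
print(count_frequencies([]))                  # {}
```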

3. Write Clean, Readable Code

# Bad: Cryptic variable names
def f(x):
    return sum([i*i for i in x if i%2==0])

# Good: Descriptive names, clear logic
def sum_of_even_squares(numbers):
    """Return sum of squares of even numbers"""
    even_numbers = [n for n in numbers if n % 2 == 0]
    squares = [n ** 2 for n in even_numbers]
    return sum(squares)

4. Test Your Code

Always test with:

Normal case: [1, 2, 3, 4]
Edge cases: [], [1]
Special values: [0], negative numbers
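
In an interview these checks can be as simple as a handful of assert statements. The sketch below applies the checklist to a sum-of-even-squares function, redefined here so the block runs on its own.

```python
def sum_of_even_squares(numbers):
    """Return sum of squares of even numbers"""
    return sum(n ** 2 for n in numbers if n % 2 == 0)

# Normal case: 2**2 + 4**2
assert sum_of_even_squares([1, 2, 3, 4]) == 20

# Edge cases: empty input and a single odd element
assert sum_of_even_squares([]) == 0
assert sum_of_even_squares([1]) == 0

# Special values: zero counts as even; negatives work because -2 % 2 == 0
assert sum_of_even_squares([0]) == 0
assert sum_of_even_squares([-2, -3]) == 4

print("All checks passed")
```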

Practice Problem

Problem: Implement a function to find the top K most frequent elements in an array.

from collections import Counter
import heapq

def top_k_frequent(nums, k):
    """
    Given an integer array nums and k, return the k most frequent elements.

    Example:
    Input: nums = [1,1,1,2,2,3], k = 2
    Output: [1, 2]

    Time: O(n + m log k), where m is the number of distinct values (at most O(n log k))
    Space: O(n)
    """
    # Count frequencies
    counts = Counter(nums)

    # Use heap to find top k
    # Counter.most_common(k) also works, but let's use heap approach
    return heapq.nlargest(k, counts.keys(), key=counts.get)

# Test
print(top_k_frequent([1,1,1,2,2,3], 2))  # [1, 2]
print(top_k_frequent([1], 1))            # [1]
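
If the interviewer asks you to write the heap logic by hand rather than call heapq.nlargest, one possible version keeps a min-heap of at most k entries. This is a sketch under two assumptions: the function name is illustrative, and ties are broken by comparing the values themselves, so the values must be mutually comparable (as integers are).

```python
from collections import Counter
import heapq

def top_k_frequent_heap(nums, k):
    """Top-k frequent elements using an explicit size-k min-heap."""
    counts = Counter(nums)
    heap = []  # Min-heap of (count, value), never larger than k

    for value, count in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (count, value))
        else:
            # Push the new entry, then pop the smallest of the k + 1
            heapq.heappushpop(heap, (count, value))

    # Most frequent first
    return [value for count, value in sorted(heap, reverse=True)]

print(top_k_frequent_heap([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
print(top_k_frequent_heap([1], 1))                 # [1]
```

Each heap operation costs O(log k), so the loop over m distinct values takes O(m log k) after the O(n) counting pass.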

Key Takeaways

Master list comprehensions - They come up constantly in ML coding questions
Know the collections module - Counter, defaultdict, deque are interview favorites
Practice NumPy operations - Matrix manipulation is core to ML
Analyze complexity - Always state time and space complexity
Write clean code - Readable code shows engineering maturity
Test thoroughly - Catch bugs before the interviewer does

What's Next?

In the next lesson, we'll focus specifically on array and matrix manipulation patterns that appear repeatedly in ML interviews.

:::