Python & Algorithms for ML

Common Algorithm Patterns

5 min read

The Pattern Approach

Most coding problems follow recognizable patterns. Master 5-6 core patterns and you can solve 80% of interview questions. This lesson focuses on patterns adapted for ML contexts.

Pattern 1: Two Pointers

When to Use:

  • Problems involving sorted arrays
  • Finding pairs with specific properties
  • Reversing or rearranging elements
  • Comparing sequences

Basic Template:

def two_pointers(arr):
    left, right = 0, len(arr) - 1

    while left < right:
        # Process elements at left and right
        if condition:
            left += 1
        else:
            right -= 1

    return result

ML Interview Example: Find Pair with Target Sum

def find_pair_sum(numbers, target):
    """
    Find two numbers that sum to target in sorted array

    Time: O(n)
    Space: O(1)
    """
    left, right = 0, len(numbers) - 1

    while left < right:
        current_sum = numbers[left] + numbers[right]

        if current_sum == target:
            return [left, right]
        elif current_sum < target:
            left += 1  # Need larger sum
        else:
            right -= 1  # Need smaller sum

    return None

# Test
print(find_pair_sum([1, 2, 3, 4, 6], target=6))  # [1, 3] (2+4=6)

Advanced: Remove Duplicates In-Place

def remove_duplicates(arr):
    """
    Remove duplicates from sorted array in-place

    Example: [1,1,2,2,3] -> [1,2,3] (length=3)
    Return new length

    Time: O(n)
    Space: O(1)
    """
    if not arr:
        return 0

    write = 1  # Position to write next unique element

    for read in range(1, len(arr)):
        if arr[read] != arr[read - 1]:
            arr[write] = arr[read]
            write += 1

    return write

# Test
arr = [1, 1, 2, 2, 3, 4, 4]
length = remove_duplicates(arr)
print(arr[:length])  # [1, 2, 3, 4]

Pattern 2: Sliding Window

When to Use:

  • Problems involving subarrays/substrings
  • Running calculations over fixed-size windows
  • Finding optimal subarray
  • Time series problems

Basic Template:

def sliding_window(arr, k):
    window_start = 0
    window_sum = 0
    result = []

    for window_end in range(len(arr)):
        # Add new element to window
        window_sum += arr[window_end]

        # Window is size k
        if window_end >= k - 1:
            result.append(window_sum)
            # Remove oldest element
            window_sum -= arr[window_start]
            window_start += 1

    return result

ML Interview Example: Maximum Sum Subarray

def max_sum_subarray(arr, k):
    """
    Find maximum sum of any contiguous subarray of size k

    Example: arr=[2,1,5,1,3,2], k=3
    Answer: 9 (subarray [5,1,3])

    Time: O(n)
    Space: O(1)
    """
    max_sum = float('-inf')
    window_sum = 0
    window_start = 0

    for window_end in range(len(arr)):
        window_sum += arr[window_end]

        # Once we hit window size k
        if window_end >= k - 1:
            max_sum = max(max_sum, window_sum)
            # Slide window
            window_sum -= arr[window_start]
            window_start += 1

    return max_sum

# Test
print(max_sum_subarray([2, 1, 5, 1, 3, 2], k=3))  # 9

Advanced: Longest Substring with K Distinct Characters

def longest_substring_k_distinct(s, k):
    """
    Find length of longest substring with at most k distinct characters

    Example: s="araaci", k=2
    Answer: 4 (substring "araa")

    Time: O(n)
    Space: O(k) for character frequency map
    """
    from collections import defaultdict

    char_freq = defaultdict(int)
    window_start = 0
    max_length = 0

    for window_end in range(len(s)):
        # Add character to window
        char_freq[s[window_end]] += 1

        # Shrink window if more than k distinct
        while len(char_freq) > k:
            char_freq[s[window_start]] -= 1
            if char_freq[s[window_start]] == 0:
                del char_freq[s[window_start]]
            window_start += 1

        max_length = max(max_length, window_end - window_start + 1)

    return max_length

# Test
print(longest_substring_k_distinct("araaci", k=2))  # 4

When to Use:

  • Searching in sorted arrays
  • Finding thresholds or boundaries
  • Optimization problems with monotonic property
  • Hyperparameter tuning contexts

Basic Template:

def binary_search(arr, target):
    left, right = 0, len(arr) - 1

    while left <= right:
        mid = (left + right) // 2

        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1

    return -1  # Not found

ML Interview Example: Find Threshold

def find_threshold(scores, k):
    """
    Find minimum threshold where at least k scores are >= threshold

    Example: scores=[1,2,3,4,5], k=3
    Answer: 3 (scores 3,4,5 are >= 3)

    Time: O(n log n) for sort + O(log n) for binary search = O(n log n)
    Space: O(1)
    """
    scores_sorted = sorted(scores)
    left, right = 0, len(scores) - 1
    result = scores_sorted[-1]

    while left <= right:
        mid = (left + right) // 2
        threshold = scores_sorted[mid]

        # Count how many scores >= threshold
        count = len(scores) - mid

        if count >= k:
            result = threshold
            left = mid + 1  # Try higher threshold
        else:
            right = mid - 1  # Need lower threshold

    return result

# Test
print(find_threshold([1, 2, 3, 4, 5], k=3))  # 3

Advanced: Find Peak Element

def find_peak(arr):
    """
    Find any peak element (element greater than neighbors)

    Example: [1,2,3,1] -> index 2 (value 3)

    Time: O(log n)
    Space: O(1)
    """
    left, right = 0, len(arr) - 1

    while left < right:
        mid = (left + right) // 2

        if arr[mid] < arr[mid + 1]:
            # Peak is on the right
            left = mid + 1
        else:
            # Peak is on the left or at mid
            right = mid

    return left

# Test
print(find_peak([1, 2, 3, 1]))  # 2
print(find_peak([1, 2, 1, 3, 5, 6, 4]))  # 5 or 1 (multiple peaks)

Pattern 4: Hash Maps (Dictionaries)

When to Use:

  • Counting frequency
  • Finding duplicates
  • Storing seen elements
  • Building lookup tables

ML Interview Example: Find Most Frequent Class

from collections import Counter

def most_frequent_class(predictions):
    """
    Find most frequently predicted class

    Example: [1,2,1,3,1,2] -> 1

    Time: O(n)
    Space: O(k) where k = number of unique classes
    """
    counter = Counter(predictions)
    return counter.most_common(1)[0][0]

# Test
print(most_frequent_class([1, 2, 1, 3, 1, 2]))  # 1

Advanced: Group Anagrams

from collections import defaultdict

def group_anagrams(words):
    """
    Group words that are anagrams

    Example: ["eat","tea","tan","ate","nat","bat"]
    Output: [["eat","tea","ate"], ["tan","nat"], ["bat"]]

    Time: O(n * k log k) where k = avg word length
    Space: O(n * k)
    """
    groups = defaultdict(list)

    for word in words:
        # Sort word to get canonical form
        key = ''.join(sorted(word))
        groups[key].append(word)

    return list(groups.values())

# Test
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(words))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]

ML Context: Feature Value Mapping

def encode_categorical(categories):
    """
    Encode categorical values to integers

    Example: ["red", "blue", "red", "green", "blue"]
    Output: [0, 1, 0, 2, 1], {"red": 0, "blue": 1, "green": 2}

    Time: O(n)
    Space: O(k) where k = number of unique categories
    """
    encoding = {}
    encoded = []
    next_id = 0

    for cat in categories:
        if cat not in encoding:
            encoding[cat] = next_id
            next_id += 1
        encoded.append(encoding[cat])

    return encoded, encoding

# Test
encoded, mapping = encode_categorical(["red", "blue", "red", "green", "blue"])
print(encoded)   # [0, 1, 0, 2, 1]
print(mapping)   # {'red': 0, 'blue': 1, 'green': 2}

Pattern 5: Stack

When to Use:

  • Parsing expressions
  • Matching brackets
  • Maintaining order with constraints
  • Backtracking problems

ML Interview Example: Valid Parentheses

def valid_parentheses(s):
    """
    Check if string has valid parentheses

    Example: "({[]})" -> True
             "({[})" -> False

    Time: O(n)
    Space: O(n)
    """
    stack = []
    pairs = {')': '(', '}': '{', ']': '['}

    for char in s:
        if char in '({[':
            stack.append(char)
        elif char in ')}]':
            if not stack or stack[-1] != pairs[char]:
                return False
            stack.pop()

    return len(stack) == 0

# Test
print(valid_parentheses("({[]})"))  # True
print(valid_parentheses("({[})"))   # False

ML Context: Moving Average

class MovingAverage:
    """
    Calculate moving average of last k values

    Used in: Smoothing loss curves, running metrics

    Time: O(1) per add
    Space: O(k)
    """
    def __init__(self, k):
        from collections import deque
        self.k = k
        self.queue = deque(maxlen=k)
        self.sum = 0

    def add(self, val):
        # If queue is full, subtract oldest value
        if len(self.queue) == self.k:
            self.sum -= self.queue[0]

        self.queue.append(val)
        self.sum += val
        return self.sum / len(self.queue)

# Test
ma = MovingAverage(3)
print(ma.add(1))   # 1.0
print(ma.add(10))  # 5.5
print(ma.add(3))   # 4.67
print(ma.add(5))   # 6.0 (avg of [10, 3, 5])

Complete Interview Problem

Problem: Implement a function to find the K closest data points to a target point in feature space.

import heapq

def k_closest_points(points, target, k):
    """
    Find K closest points to target using Euclidean distance

    points: List of tuples [(x1, y1), (x2, y2), ...]
    target: Tuple (x, y)
    k: Number of closest points to return

    Returns: List of k closest points

    Time: O(n log k) using heap
    Space: O(k) for heap

    Pattern: Heap (priority queue)
    """
    def distance(p1, p2):
        return ((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2) ** 0.5

    # Max heap (negate distances for max heap behavior)
    max_heap = []

    for point in points:
        dist = distance(point, target)

        if len(max_heap) < k:
            # Haven't filled k points yet
            heapq.heappush(max_heap, (-dist, point))
        elif dist < -max_heap[0][0]:
            # This point is closer than furthest point in heap
            heapq.heapreplace(max_heap, (-dist, point))

    return [point for (_, point) in max_heap]

# Test
points = [(1, 3), (3, 4), (2, -1), (-2, 2), (5, 1)]
target = (0, 0)
k = 3
result = k_closest_points(points, target, k)
print(result)  # Closest 3 points to origin

Pattern Recognition Guide

Problem Clue Pattern Time Complexity
"Subarray of size k" Sliding Window O(n)
"Sorted array" + search Binary Search O(log n)
"Two elements sum to..." Two Pointers or Hash Map O(n)
"Most/least frequent" Hash Map + Heap O(n log k)
"Valid parentheses/brackets" Stack O(n)
"K closest/largest/smallest" Heap O(n log k)
"Substring with condition" Sliding Window + Hash Map O(n)

Key Takeaways

  1. Recognize patterns - Don't solve from scratch every time
  2. Start with brute force - Optimize after you have a working solution
  3. Draw examples - Visualize the algorithm on paper
  4. State complexity - Always mention time and space
  5. Test edge cases - Empty arrays, single elements, duplicates

What's Next?

In Module 3, we'll apply these algorithmic skills to ML-specific interview questions, covering supervised learning algorithms, neural networks, and optimization techniques.

:::