Python & Algorithms for ML
Common Algorithm Patterns
The Pattern Approach
Most coding problems follow recognizable patterns. Master 5-6 core patterns and you can solve 80% of interview questions. This lesson focuses on patterns adapted for ML contexts.
Pattern 1: Two Pointers
When to Use:
- Problems involving sorted arrays
- Finding pairs with specific properties
- Reversing or rearranging elements
- Comparing sequences
Basic Template:
def two_pointers(arr):
left, right = 0, len(arr) - 1
while left < right:
# Process elements at left and right
if condition:
left += 1
else:
right -= 1
return result
ML Interview Example: Find Pair with Target Sum
def find_pair_sum(numbers, target):
"""
Find two numbers that sum to target in sorted array
Time: O(n)
Space: O(1)
"""
left, right = 0, len(numbers) - 1
while left < right:
current_sum = numbers[left] + numbers[right]
if current_sum == target:
return [left, right]
elif current_sum < target:
left += 1 # Need larger sum
else:
right -= 1 # Need smaller sum
return None
# Test
print(find_pair_sum([1, 2, 3, 4, 6], target=6)) # [1, 3] (2+4=6)
Advanced: Remove Duplicates In-Place
def remove_duplicates(arr):
"""
Remove duplicates from sorted array in-place
Example: [1,1,2,2,3] -> [1,2,3] (length=3)
Return new length
Time: O(n)
Space: O(1)
"""
if not arr:
return 0
write = 1 # Position to write next unique element
for read in range(1, len(arr)):
if arr[read] != arr[read - 1]:
arr[write] = arr[read]
write += 1
return write
# Test
arr = [1, 1, 2, 2, 3, 4, 4]
length = remove_duplicates(arr)
print(arr[:length]) # [1, 2, 3, 4]
Pattern 2: Sliding Window
When to Use:
- Problems involving subarrays/substrings
- Running calculations over fixed-size windows
- Finding optimal subarray
- Time series problems
Basic Template:
def sliding_window(arr, k):
window_start = 0
window_sum = 0
result = []
for window_end in range(len(arr)):
# Add new element to window
window_sum += arr[window_end]
# Window is size k
if window_end >= k - 1:
result.append(window_sum)
# Remove oldest element
window_sum -= arr[window_start]
window_start += 1
return result
ML Interview Example: Maximum Sum Subarray
def max_sum_subarray(arr, k):
"""
Find maximum sum of any contiguous subarray of size k
Example: arr=[2,1,5,1,3,2], k=3
Answer: 9 (subarray [5,1,3])
Time: O(n)
Space: O(1)
"""
max_sum = float('-inf')
window_sum = 0
window_start = 0
for window_end in range(len(arr)):
window_sum += arr[window_end]
# Once we hit window size k
if window_end >= k - 1:
max_sum = max(max_sum, window_sum)
# Slide window
window_sum -= arr[window_start]
window_start += 1
return max_sum
# Test
print(max_sum_subarray([2, 1, 5, 1, 3, 2], k=3)) # 9
Advanced: Longest Substring with K Distinct Characters
def longest_substring_k_distinct(s, k):
"""
Find length of longest substring with at most k distinct characters
Example: s="araaci", k=2
Answer: 4 (substring "araa")
Time: O(n)
Space: O(k) for character frequency map
"""
from collections import defaultdict
char_freq = defaultdict(int)
window_start = 0
max_length = 0
for window_end in range(len(s)):
# Add character to window
char_freq[s[window_end]] += 1
# Shrink window if more than k distinct
while len(char_freq) > k:
char_freq[s[window_start]] -= 1
if char_freq[s[window_start]] == 0:
del char_freq[s[window_start]]
window_start += 1
max_length = max(max_length, window_end - window_start + 1)
return max_length
# Test
print(longest_substring_k_distinct("araaci", k=2)) # 4
Pattern 3: Binary Search
When to Use:
- Searching in sorted arrays
- Finding thresholds or boundaries
- Optimization problems with monotonic property
- Hyperparameter tuning contexts
Basic Template:
def binary_search(arr, target):
left, right = 0, len(arr) - 1
while left <= right:
mid = (left + right) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1
return -1 # Not found
ML Interview Example: Find Threshold
def find_threshold(scores, k):
"""
Find minimum threshold where at least k scores are >= threshold
Example: scores=[1,2,3,4,5], k=3
Answer: 3 (scores 3,4,5 are >= 3)
Time: O(n log n) for sort + O(log n) for binary search = O(n log n)
Space: O(1)
"""
scores_sorted = sorted(scores)
left, right = 0, len(scores) - 1
result = scores_sorted[-1]
while left <= right:
mid = (left + right) // 2
threshold = scores_sorted[mid]
# Count how many scores >= threshold
count = len(scores) - mid
if count >= k:
result = threshold
left = mid + 1 # Try higher threshold
else:
right = mid - 1 # Need lower threshold
return result
# Test
print(find_threshold([1, 2, 3, 4, 5], k=3)) # 3
Advanced: Find Peak Element
def find_peak(arr):
"""
Find any peak element (element greater than neighbors)
Example: [1,2,3,1] -> index 2 (value 3)
Time: O(log n)
Space: O(1)
"""
left, right = 0, len(arr) - 1
while left < right:
mid = (left + right) // 2
if arr[mid] < arr[mid + 1]:
# Peak is on the right
left = mid + 1
else:
# Peak is on the left or at mid
right = mid
return left
# Test
print(find_peak([1, 2, 3, 1])) # 2
print(find_peak([1, 2, 1, 3, 5, 6, 4])) # 5 or 1 (multiple peaks)
Pattern 4: Hash Maps (Dictionaries)
When to Use:
- Counting frequency
- Finding duplicates
- Storing seen elements
- Building lookup tables
ML Interview Example: Find Most Frequent Class
from collections import Counter
def most_frequent_class(predictions):
"""
Find most frequently predicted class
Example: [1,2,1,3,1,2] -> 1
Time: O(n)
Space: O(k) where k = number of unique classes
"""
counter = Counter(predictions)
return counter.most_common(1)[0][0]
# Test
print(most_frequent_class([1, 2, 1, 3, 1, 2])) # 1
Advanced: Group Anagrams
from collections import defaultdict
def group_anagrams(words):
"""
Group words that are anagrams
Example: ["eat","tea","tan","ate","nat","bat"]
Output: [["eat","tea","ate"], ["tan","nat"], ["bat"]]
Time: O(n * k log k) where k = avg word length
Space: O(n * k)
"""
groups = defaultdict(list)
for word in words:
# Sort word to get canonical form
key = ''.join(sorted(word))
groups[key].append(word)
return list(groups.values())
# Test
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(words))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
ML Context: Feature Value Mapping
def encode_categorical(categories):
"""
Encode categorical values to integers
Example: ["red", "blue", "red", "green", "blue"]
Output: [0, 1, 0, 2, 1], {"red": 0, "blue": 1, "green": 2}
Time: O(n)
Space: O(k) where k = number of unique categories
"""
encoding = {}
encoded = []
next_id = 0
for cat in categories:
if cat not in encoding:
encoding[cat] = next_id
next_id += 1
encoded.append(encoding[cat])
return encoded, encoding
# Test
encoded, mapping = encode_categorical(["red", "blue", "red", "green", "blue"])
print(encoded) # [0, 1, 0, 2, 1]
print(mapping) # {'red': 0, 'blue': 1, 'green': 2}
Pattern 5: Stack
When to Use:
- Parsing expressions
- Matching brackets
- Maintaining order with constraints
- Backtracking problems
ML Interview Example: Valid Parentheses
def valid_parentheses(s):
"""
Check if string has valid parentheses
Example: "({[]})" -> True
"({[})" -> False
Time: O(n)
Space: O(n)
"""
stack = []
pairs = {')': '(', '}': '{', ']': '['}
for char in s:
if char in '({[':
stack.append(char)
elif char in ')}]':
if not stack or stack[-1] != pairs[char]:
return False
stack.pop()
return len(stack) == 0
# Test
print(valid_parentheses("({[]})")) # True
print(valid_parentheses("({[})")) # False
ML Context: Moving Average
class MovingAverage:
"""
Calculate moving average of last k values
Used in: Smoothing loss curves, running metrics
Time: O(1) per add
Space: O(k)
"""
def __init__(self, k):
from collections import deque
self.k = k
self.queue = deque(maxlen=k)
self.sum = 0
def add(self, val):
# If queue is full, subtract oldest value
if len(self.queue) == self.k:
self.sum -= self.queue[0]
self.queue.append(val)
self.sum += val
return self.sum / len(self.queue)
# Test
ma = MovingAverage(3)
print(ma.add(1)) # 1.0
print(ma.add(10)) # 5.5
print(ma.add(3)) # 4.67
print(ma.add(5)) # 6.0 (avg of [10, 3, 5])
Complete Interview Problem
Problem: Implement a function to find the K closest data points to a target point in feature space.
import heapq
def k_closest_points(points, target, k):
"""
Find K closest points to target using Euclidean distance
points: List of tuples [(x1, y1), (x2, y2), ...]
target: Tuple (x, y)
k: Number of closest points to return
Returns: List of k closest points
Time: O(n log k) using heap
Space: O(k) for heap
Pattern: Heap (priority queue)
"""
def distance(p1, p2):
return ((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2) ** 0.5
# Max heap (negate distances for max heap behavior)
max_heap = []
for point in points:
dist = distance(point, target)
if len(max_heap) < k:
# Haven't filled k points yet
heapq.heappush(max_heap, (-dist, point))
elif dist < -max_heap[0][0]:
# This point is closer than furthest point in heap
heapq.heapreplace(max_heap, (-dist, point))
return [point for (_, point) in max_heap]
# Test
points = [(1, 3), (3, 4), (2, -1), (-2, 2), (5, 1)]
target = (0, 0)
k = 3
result = k_closest_points(points, target, k)
print(result) # Closest 3 points to origin
Pattern Recognition Guide
| Problem Clue | Pattern | Time Complexity |
|---|---|---|
| "Subarray of size k" | Sliding Window | O(n) |
| "Sorted array" + search | Binary Search | O(log n) |
| "Two elements sum to..." | Two Pointers or Hash Map | O(n) |
| "Most/least frequent" | Hash Map + Heap | O(n log k) |
| "Valid parentheses/brackets" | Stack | O(n) |
| "K closest/largest/smallest" | Heap | O(n log k) |
| "Substring with condition" | Sliding Window + Hash Map | O(n) |
Key Takeaways
- Recognize patterns - Don't solve from scratch every time
- Start with brute force - Optimize after you have a working solution
- Draw examples - Visualize the algorithm on paper
- State complexity - Always mention time and space
- Test edge cases - Empty arrays, single elements, duplicates
What's Next?
In Module 3, we'll apply these algorithmic skills to ML-specific interview questions, covering supervised learning algorithms, neural networks, and optimization techniques.
:::