Build a Search and Recommendation Engine
Instructions
In this lab, you will build the core components of a product search and recommendation system from scratch in TypeScript. This is the exact system you would design in a "Design product search for e-commerce" interview question -- but here you implement it.
Architecture Overview
```
Documents (product catalog)
             ↓
Tokenizer (lowercase, stop-words, stemming)
             ↓
Inverted Index (term → posting lists with TF)
             ↓
BM25 Scorer (rank by relevance)
             ↓
Search Results

User Interaction History
             ↓
┌────────────────────┬────────────────────┐
│   Collaborative    │   Content-Based    │
│       Filter       │       Filter       │
│ (cosine similarity │ (feature vectors)  │
│  on user ratings)  │                    │
└────────┬───────────┴─────────┬──────────┘
         │                     │
         ▼                     ▼
  Hybrid Recommender (blend + cold-start)
             ↓
      Recommendations
```
Step-by-Step Guide
FILE 1: Tokenizer (src/search/tokenizer.ts)
Build a text tokenizer that prepares raw text for indexing. Your tokenizer should:
- Convert text to lowercase
- Split on non-alphanumeric characters (whitespace, punctuation)
- Remove English stop words (a, an, the, is, are, was, were, in, on, at, to, for, of, and, or, but, not, with, this, that, it, be, have, do, will, can, etc.)
- Apply a simple suffix-stripping stemmer that handles common English suffixes:
  - "ing" -> remove (e.g., "running" -> "runn")
  - "ed" -> remove (e.g., "played" -> "play")
  - "s" at the end -> remove (e.g., "shoes" -> "shoe")
  - "ly" -> remove (e.g., "quickly" -> "quick")
  - Only strip if the remaining stem is at least 3 characters
Export a `tokenize(text: string): string[]` function.
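If you want a starting point, the requirements above can be sketched roughly like this (the stop-word list is the one given in the instructions; a full implementation may extend it):

```typescript
// Sketch of src/search/tokenizer.ts, covering the rules listed above.
const STOP_WORDS = new Set([
  "a", "an", "the", "is", "are", "was", "were", "in", "on", "at", "to",
  "for", "of", "and", "or", "but", "not", "with", "this", "that", "it",
  "be", "have", "do", "will", "can",
]);

// Strip at most one matching suffix, and only when the remaining stem
// keeps at least 3 characters.
function stem(token: string): string {
  for (const suffix of ["ing", "ed", "ly", "s"]) {
    if (token.endsWith(suffix) && token.length - suffix.length >= 3) {
      return token.slice(0, token.length - suffix.length);
    }
  }
  return token;
}

export function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/) // split on runs of non-alphanumeric characters
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t))
    .map(stem);
}
```

For example, `tokenize("The quickly running shoes")` drops "the" and yields `["quick", "runn", "shoe"]`.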
FILE 2: Inverted Index (src/search/inverted-index.ts)
Build an inverted index that maps terms to posting lists. Your index should:
- Accept documents via `addDocument(docId: string, text: string)`
- Tokenize the text using your tokenizer
- Build posting lists where each entry contains `{ docId, termFrequency }`
- Track document lengths (number of tokens per document) for BM25
- Track the total number of documents and the average document length
- Provide `getPostings(term: string): PostingEntry[]` to retrieve the posting list for a term
- Provide `getDocLength(docId: string): number` and `getAvgDocLength(): number`
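One reasonable layout is posting lists in a `Map` keyed by term. The sketch below uses a trivial inline tokenizer so it stays self-contained; the lab expects your FILE 1 `tokenize` instead:

```typescript
// Stand-in for the FILE 1 tokenizer (lowercase + split only).
const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);

export interface PostingEntry {
  docId: string;
  termFrequency: number;
}

export class InvertedIndex {
  private postings = new Map<string, PostingEntry[]>();
  private docLengths = new Map<string, number>();

  addDocument(docId: string, text: string): void {
    const tokens = tokenize(text);
    this.docLengths.set(docId, tokens.length);
    // Count term frequencies within this document, then append one
    // posting entry per distinct term.
    const tf = new Map<string, number>();
    for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);
    for (const [term, termFrequency] of tf) {
      const list = this.postings.get(term) ?? [];
      list.push({ docId, termFrequency });
      this.postings.set(term, list);
    }
  }

  getPostings(term: string): PostingEntry[] {
    return this.postings.get(term) ?? [];
  }

  getDocLength(docId: string): number {
    return this.docLengths.get(docId) ?? 0;
  }

  getAvgDocLength(): number {
    if (this.docLengths.size === 0) return 0;
    let total = 0;
    for (const len of this.docLengths.values()) total += len;
    return total / this.docLengths.size;
  }

  // Accessor name is an assumption; expose the document count however you like.
  get docCount(): number {
    return this.docLengths.size;
  }
}
```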
FILE 3: BM25 Scorer (src/search/bm25-scorer.ts)
Implement the BM25 ranking algorithm. Your scorer should:
- Accept the inverted index as a dependency
- Use standard parameters: k1 = 1.2, b = 0.75
- Implement `score(query: string, docId: string): number` that computes the BM25 score
- Implement `search(query: string, topK?: number): SearchResult[]` that returns the top-K documents ranked by BM25 score
- Use the correct BM25 formula:
  - IDF(t) = log((N - df + 0.5) / (df + 0.5) + 1)
  - Score contribution per term = IDF * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * docLen / avgDocLen))
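The per-term math can be sketched against the FILE 2 interface. The `Index` interface below is a local assumption standing in for your `InvertedIndex` (including an assumed `docCount` accessor), and query tokenization is taken as already done:

```typescript
interface PostingEntry { docId: string; termFrequency: number; }

// Assumed shape of the FILE 2 index; adapt names to your implementation.
interface Index {
  getPostings(term: string): PostingEntry[];
  getDocLength(docId: string): number;
  getAvgDocLength(): number;
  docCount: number;
}

const K1 = 1.2;
const B = 0.75;

export function bm25Score(index: Index, queryTerms: string[], docId: string): number {
  const N = index.docCount;
  const docLen = index.getDocLength(docId);
  const avgDocLen = index.getAvgDocLength();
  let score = 0;
  for (const term of queryTerms) {
    const postings = index.getPostings(term);
    const df = postings.length; // document frequency of the term
    const tf = postings.find((p) => p.docId === docId)?.termFrequency ?? 0;
    if (tf === 0 || df === 0) continue; // term absent: contributes nothing
    // IDF with "+1" inside the log, so it never goes negative.
    const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1);
    // Length-normalized TF saturation, per the formula above.
    score += (idf * (tf * (K1 + 1))) / (tf + K1 * (1 - B + B * (docLen / avgDocLen)));
  }
  return score;
}
```

`search` then amounts to scoring every candidate document and sorting descending.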
FILE 4: Autocomplete (src/search/autocomplete.ts)
Build a prefix-based autocomplete service. Your service should:
- Maintain a list of terms with popularity scores
- Accept new entries via `addEntry(term: string, score: number)`
- Implement `suggest(prefix: string, limit?: number): AutocompleteResult[]` that:
  - Finds all terms starting with the lowercase prefix
  - Ranks by popularity score descending
  - Returns at most `limit` results (default 10)
- Support updating scores via `updateScore(term: string, delta: number)` (e.g., increment by 1 on each search)
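A linear-scan sketch of this service is below; a trie would scale better, but a `Map` scan matches the lab's scope. Lowercasing terms on insert is an assumption that keeps lookups case-insensitive:

```typescript
export interface AutocompleteResult {
  term: string;
  score: number;
}

export class Autocomplete {
  private scores = new Map<string, number>();

  addEntry(term: string, score: number): void {
    this.scores.set(term.toLowerCase(), score);
  }

  updateScore(term: string, delta: number): void {
    const key = term.toLowerCase();
    this.scores.set(key, (this.scores.get(key) ?? 0) + delta);
  }

  suggest(prefix: string, limit = 10): AutocompleteResult[] {
    const p = prefix.toLowerCase();
    return [...this.scores.entries()]
      .filter(([term]) => term.startsWith(p)) // prefix match
      .map(([term, score]) => ({ term, score }))
      .sort((a, b) => b.score - a.score) // popularity descending
      .slice(0, limit);
  }
}
```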
FILE 5: Collaborative Filter (src/recommendation/collaborative-filter.ts)
Implement user-based collaborative filtering. Your filter should:
- Store user-item ratings via `addRating(userId: string, itemId: string, rating: number)`
- Compute cosine similarity between two users based on their co-rated items
- Implement `findSimilarUsers(userId: string, topK?: number): SimilarUser[]`
- Implement `recommend(userId: string, topN?: number): Recommendation[]` that:
  - Finds the top-K most similar users
  - Collects items those users rated highly that the target user has not rated
  - Predicts a score using a weighted average of similar users' ratings
  - Returns the top-N recommendations sorted by predicted score
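The heart of this file is cosine similarity restricted to co-rated items. A sketch, assuming ratings live in a nested `Map` (one of several reasonable layouts):

```typescript
// userId -> (itemId -> rating); an assumed storage layout.
type Ratings = Map<string, Map<string, number>>;

export function cosineSimilarity(ratings: Ratings, userA: string, userB: string): number {
  const a = ratings.get(userA);
  const b = ratings.get(userB);
  if (!a || !b) return 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [itemId, ra] of a) {
    const rb = b.get(itemId);
    if (rb === undefined) continue; // only co-rated items count
    dot += ra * rb;
    normA += ra * ra;
    normB += rb * rb;
  }
  if (normA === 0 || normB === 0) return 0; // zero-vector edge case
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

`findSimilarUsers` is then a sort over this similarity, and `recommend` a weighted average over the neighbors' ratings.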
FILE 6: Content Filter (src/recommendation/content-filter.ts)
Implement content-based filtering with item feature vectors. Your filter should:
- Register items with feature vectors via `addItem(itemId: string, features: number[])`
- Build a user profile by averaging the feature vectors of items the user has liked
- Implement `buildUserProfile(likedItemIds: string[]): number[]`
- Implement `recommend(likedItemIds: string[], topN?: number): Recommendation[]` that:
  - Builds the user profile vector
  - Computes cosine similarity between the user profile and every item not in `likedItemIds`
  - Returns top-N items ranked by similarity
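The profile-and-rank flow might look like this, assuming all feature vectors share the same length:

```typescript
export interface Recommendation {
  itemId: string;
  score: number;
}

// Cosine similarity between two equal-length vectors; 0 for zero vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

export class ContentFilter {
  private items = new Map<string, number[]>();

  addItem(itemId: string, features: number[]): void {
    this.items.set(itemId, features);
  }

  buildUserProfile(likedItemIds: string[]): number[] {
    const liked = likedItemIds
      .map((id) => this.items.get(id))
      .filter((f): f is number[] => f !== undefined);
    if (liked.length === 0) return [];
    // Element-wise average of the liked items' feature vectors.
    const profile = new Array(liked[0].length).fill(0);
    for (const features of liked) {
      for (let i = 0; i < features.length; i++) profile[i] += features[i];
    }
    return profile.map((v) => v / liked.length);
  }

  recommend(likedItemIds: string[], topN = 10): Recommendation[] {
    const profile = this.buildUserProfile(likedItemIds);
    const liked = new Set(likedItemIds);
    return [...this.items.entries()]
      .filter(([itemId]) => !liked.has(itemId)) // skip already-liked items
      .map(([itemId, features]) => ({ itemId, score: cosine(profile, features) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topN);
  }
}
```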
FILE 7: Hybrid Recommender (src/recommendation/hybrid-recommender.ts)
Build a hybrid recommender that blends collaborative and content-based approaches. Your recommender should:
- Accept both a collaborative filter and a content filter as dependencies
- Implement `recommend(userId: string, likedItemIds: string[], topN?: number): Recommendation[]`
- Handle cold start: if the user has fewer than a configurable threshold (default 5) of ratings, weight content-based results more heavily (e.g., 70% content, 30% collaborative)
- For warm users: weight collaborative results more heavily (e.g., 70% collaborative, 30% content)
- Deduplicate results (same item from both sources): keep the higher score
- Return final results sorted by blended score
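The blending step can be sketched on its own, assuming both filters already return `{ itemId, score }` lists; a standalone `blend` function is used here, while the lab wants it wrapped in a class that calls both filters:

```typescript
interface Recommendation { itemId: string; score: number; }

export function blend(
  collaborative: Recommendation[],
  content: Recommendation[],
  ratingCount: number,
  coldStartThreshold = 5,
  topN = 10,
): Recommendation[] {
  // Cold-start users lean on content (30/70); warm users lean on
  // collaborative (70/30).
  const collabWeight = ratingCount < coldStartThreshold ? 0.3 : 0.7;
  const contentWeight = 1 - collabWeight;
  const byItem = new Map<string, number>();
  for (const { itemId, score } of collaborative) {
    byItem.set(itemId, score * collabWeight);
  }
  for (const { itemId, score } of content) {
    const weighted = score * contentWeight;
    const existing = byItem.get(itemId);
    // Deduplicate: when both sources suggest an item, keep the higher score.
    byItem.set(itemId, existing === undefined ? weighted : Math.max(existing, weighted));
  }
  return [...byItem.entries()]
    .map(([itemId, score]) => ({ itemId, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}
```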
FILE 8: Main Entry (src/index.ts)
Wire everything together in a demonstration:
- Create sample product documents (at least 5)
- Index them using the inverted index
- Search with the BM25 scorer and print results
- Add autocomplete entries and demonstrate suggestions
- Set up user ratings and demonstrate collaborative filtering recommendations
- Set up item features and demonstrate content-based recommendations
- Demonstrate the hybrid recommender with both a cold-start user and a warm user
Hints
- For the stemmer, a simple suffix check is sufficient -- you do not need to implement the full Porter Stemmer algorithm
- For cosine similarity, handle the zero-vector edge case (return 0 when either vector has zero magnitude)
- In collaborative filtering, only compute similarity on items that both users have rated (co-rated items)
- For the hybrid recommender, use a Map to deduplicate by itemId and keep the higher score
- The BM25 IDF formula used here includes the "+1" inside the log to avoid negative IDF for terms that appear in more than half the documents