Lab

Build a Search and Recommendation Engine

40 min
Advanced
Unlimited free attempts

Instructions

In this lab, you will build the core components of a product search and recommendation system from scratch in TypeScript. This is the exact system you would design in a "Design product search for e-commerce" interview question -- but here you implement it.

Architecture Overview

Documents (product catalog)
            ▼
Tokenizer (lowercase, stop-words, stemming)
            ▼
Inverted Index (term → posting lists with TF)
            ▼
BM25 Scorer (rank by relevance)
            ▼
Search Results

User Interaction History
┌────────────────────┬────────────────────┐
│ Collaborative      │ Content-Based      │
│ Filter             │ Filter             │
│ (cosine similarity │ (feature vectors)  │
│  on user ratings)  │                    │
└────────┬───────────┴─────────┬──────────┘
         │                     │
         ▼                     ▼
     Hybrid Recommender (blend + cold-start)
                    ▼
            Recommendations

Step-by-Step Guide

FILE 1: Tokenizer (src/search/tokenizer.ts)

Build a text tokenizer that prepares raw text for indexing. Your tokenizer should:

  1. Convert text to lowercase
  2. Split on non-alphanumeric characters (whitespace, punctuation)
  3. Remove English stop words (a, an, the, is, are, was, were, in, on, at, to, for, of, and, or, but, not, with, this, that, it, be, have, do, will, can, etc.)
  4. Apply a simple suffix-stripping stemmer that handles common English suffixes:
    • "ing" -> remove (e.g., "running" -> "runn")
    • "ed" -> remove (e.g., "played" -> "play")
    • "s" at the end -> remove (e.g., "shoes" -> "shoe")
    • "ly" -> remove (e.g., "quickly" -> "quick")
    • Only strip if the remaining stem is at least 3 characters

Export a tokenize(text: string): string[] function.
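The steps above can be sketched as follows. This is a minimal version, not a reference solution: the stop-word set is the subset listed in step 3, and the `export` keyword is omitted so the snippet runs standalone.

```typescript
// Subset of the stop-word list from step 3.
const STOP_WORDS = new Set([
  "a", "an", "the", "is", "are", "was", "were", "in", "on", "at",
  "to", "for", "of", "and", "or", "but", "not", "with", "this",
  "that", "it", "be", "have", "do", "will", "can",
]);

function stem(token: string): string {
  // Strip one suffix, longest first, only if the stem keeps >= 3 chars.
  for (const suffix of ["ing", "ed", "ly", "s"]) {
    if (token.endsWith(suffix) && token.length - suffix.length >= 3) {
      return token.slice(0, token.length - suffix.length);
    }
  }
  return token;
}

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)                              // split on non-alphanumeric runs
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t)) // drop empties and stop words
    .map(stem);                                        // stem last, per steps 3-4
}
```

Note that stemming runs after stop-word removal, matching the step order above, and that "bus" survives intact because stripping "s" would leave a 2-character stem.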

FILE 2: Inverted Index (src/search/inverted-index.ts)

Build an inverted index that maps terms to posting lists. Your index should:

  1. Accept documents via addDocument(docId: string, text: string)
  2. Tokenize the text using your tokenizer
  3. Build posting lists where each entry contains { docId, termFrequency }
  4. Track document lengths (number of tokens per document) for BM25
  5. Track the total number of documents and the average document length
  6. Provide getPostings(term: string): PostingEntry[] to retrieve the posting list for a term
  7. Provide getDocLength(docId: string): number and getAvgDocLength(): number
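One way to structure this index is a nested map from term to per-document frequency. In this self-contained sketch a trivial lowercase splitter stands in for FILE 1's tokenize:

```typescript
type PostingEntry = { docId: string; termFrequency: number };

// Stand-in for FILE 1's tokenizer (no stop words or stemming here).
const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

class InvertedIndex {
  private postings = new Map<string, Map<string, number>>(); // term -> (docId -> tf)
  private docLengths = new Map<string, number>();

  addDocument(docId: string, text: string): void {
    const tokens = tokenize(text);
    this.docLengths.set(docId, tokens.length); // token count, for BM25 length norm
    for (const term of tokens) {
      let byDoc = this.postings.get(term);
      if (!byDoc) this.postings.set(term, (byDoc = new Map()));
      byDoc.set(docId, (byDoc.get(docId) ?? 0) + 1);
    }
  }

  getPostings(term: string): PostingEntry[] {
    const byDoc = this.postings.get(term);
    if (!byDoc) return [];
    return [...byDoc].map(([docId, termFrequency]) => ({ docId, termFrequency }));
  }

  getDocLength(docId: string): number {
    return this.docLengths.get(docId) ?? 0;
  }

  getDocCount(): number {
    return this.docLengths.size;
  }

  getAvgDocLength(): number {
    const n = this.docLengths.size;
    if (n === 0) return 0;
    let total = 0;
    for (const len of this.docLengths.values()) total += len;
    return total / n;
  }
}
```

Storing per-term frequencies in a `Map` keyed by docId keeps `addDocument` a single pass over the tokens; the average document length is derived on demand rather than cached.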

FILE 3: BM25 Scorer (src/search/bm25-scorer.ts)

Implement the BM25 ranking algorithm. Your scorer should:

  1. Accept the inverted index as a dependency
  2. Use standard parameters: k1 = 1.2, b = 0.75
  3. Implement score(query: string, docId: string): number that computes the BM25 score
  4. Implement search(query: string, topK?: number): SearchResult[] that returns the top-K documents ranked by BM25 score
  5. Use the correct BM25 formula:
    • IDF(t) = log((N - df + 0.5) / (df + 0.5) + 1)
    • Score contribution per term = IDF * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * docLen / avgDocLen))
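The two formula bullets are worth checking in isolation before wiring them to the index. This sketch works on plain numbers (tf, df, N, docLen, avgDocLen) rather than the index class:

```typescript
const k1 = 1.2;
const b = 0.75;

// IDF(t) = log((N - df + 0.5) / (df + 0.5) + 1)
function idf(N: number, df: number): number {
  return Math.log((N - df + 0.5) / (df + 0.5) + 1);
}

// Per-term contribution: IDF * (tf * (k1+1)) / (tf + k1 * (1 - b + b * docLen/avgDocLen))
function termScore(
  tf: number, df: number, N: number, docLen: number, avgDocLen: number,
): number {
  const lengthNorm = 1 - b + b * (docLen / avgDocLen);
  return idf(N, df) * (tf * (k1 + 1)) / (tf + k1 * lengthNorm);
}
```

Two properties to sanity-check: the score grows with tf but saturates (the k1 term in the denominator), and the "+1" inside the log keeps IDF positive even when df = N.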

FILE 4: Autocomplete (src/search/autocomplete.ts)

Build a prefix-based autocomplete service. Your service should:

  1. Maintain a list of terms with popularity scores
  2. Accept new entries via addEntry(term: string, score: number)
  3. Implement suggest(prefix: string, limit?: number): AutocompleteResult[] that:
    • Finds all terms starting with the lowercase prefix
    • Ranks by popularity score descending
    • Returns at most limit results (default 10)
  4. Support updating scores via updateScore(term: string, delta: number) (e.g., increment by 1 on each search)
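One possible sketch uses a flat `Map` and a linear prefix scan, which is fine at this lab's scale; a production autocomplete would typically use a trie or sorted array instead.

```typescript
type AutocompleteResult = { term: string; score: number };

class Autocomplete {
  private scores = new Map<string, number>(); // lowercase term -> popularity

  addEntry(term: string, score: number): void {
    this.scores.set(term.toLowerCase(), score);
  }

  updateScore(term: string, delta: number): void {
    const key = term.toLowerCase();
    this.scores.set(key, (this.scores.get(key) ?? 0) + delta);
  }

  suggest(prefix: string, limit = 10): AutocompleteResult[] {
    const p = prefix.toLowerCase();
    return [...this.scores]
      .filter(([term]) => term.startsWith(p))   // prefix match
      .map(([term, score]) => ({ term, score }))
      .sort((a, b) => b.score - a.score)        // popularity descending
      .slice(0, limit);
  }
}
```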

FILE 5: Collaborative Filter (src/recommendation/collaborative-filter.ts)

Implement user-based collaborative filtering. Your filter should:

  1. Store user-item ratings via addRating(userId: string, itemId: string, rating: number)
  2. Compute cosine similarity between two users based on their co-rated items
  3. Implement findSimilarUsers(userId: string, topK?: number): SimilarUser[]
  4. Implement recommend(userId: string, topN?: number): Recommendation[] that:
    • Finds the top-K most similar users
    • Collects items those users rated highly that the target user has not rated
    • Predicts a score using weighted average of similar users' ratings
    • Returns top-N recommendations sorted by predicted score
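A sketch of the steps above follows. Note one simplifying assumption: cosine magnitudes are computed over co-rated items only, which is a common shortcut for sparse rating data.

```typescript
type SimilarUser = { userId: string; similarity: number };
type Recommendation = { itemId: string; score: number };

class CollaborativeFilter {
  private ratings = new Map<string, Map<string, number>>(); // userId -> (itemId -> rating)

  addRating(userId: string, itemId: string, rating: number): void {
    let byItem = this.ratings.get(userId);
    if (!byItem) this.ratings.set(userId, (byItem = new Map()));
    byItem.set(itemId, rating);
  }

  // Cosine similarity restricted to co-rated items; 0 for zero vectors.
  private cosine(a: Map<string, number>, b: Map<string, number>): number {
    let dot = 0, magA = 0, magB = 0;
    for (const [item, ra] of a) {
      const rb = b.get(item);
      if (rb === undefined) continue;
      dot += ra * rb; magA += ra * ra; magB += rb * rb;
    }
    return magA === 0 || magB === 0 ? 0 : dot / (Math.sqrt(magA) * Math.sqrt(magB));
  }

  findSimilarUsers(userId: string, topK = 5): SimilarUser[] {
    const me = this.ratings.get(userId);
    if (!me) return [];
    return [...this.ratings.keys()]
      .filter((u) => u !== userId)
      .map((u) => ({ userId: u, similarity: this.cosine(me, this.ratings.get(u)!) }))
      .filter((s) => s.similarity > 0)
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, topK);
  }

  recommend(userId: string, topN = 5): Recommendation[] {
    const me = this.ratings.get(userId) ?? new Map<string, number>();
    const acc = new Map<string, { num: number; den: number }>();
    for (const { userId: other, similarity } of this.findSimilarUsers(userId)) {
      for (const [itemId, rating] of this.ratings.get(other)!) {
        if (me.has(itemId)) continue; // skip items the target already rated
        const a = acc.get(itemId) ?? { num: 0, den: 0 };
        a.num += similarity * rating; // similarity-weighted rating
        a.den += similarity;
        acc.set(itemId, a);
      }
    }
    return [...acc]
      .map(([itemId, { num, den }]) => ({ itemId, score: num / den }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topN);
  }
}
```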

FILE 6: Content Filter (src/recommendation/content-filter.ts)

Implement content-based filtering with item feature vectors. Your filter should:

  1. Register items with feature vectors via addItem(itemId: string, features: number[])
  2. Build a user profile by averaging the feature vectors of items the user has liked
  3. Implement buildUserProfile(likedItemIds: string[]): number[]
  4. Implement recommend(likedItemIds: string[], topN?: number): Recommendation[] that:
    • Builds the user profile vector
    • Computes cosine similarity between the user profile and every item not in likedItemIds
    • Returns top-N items ranked by similarity
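A sketch of the steps above, under the assumption that all registered feature vectors share the same dimension:

```typescript
type Recommendation = { itemId: string; score: number };

class ContentFilter {
  private items = new Map<string, number[]>(); // itemId -> feature vector

  addItem(itemId: string, features: number[]): void {
    this.items.set(itemId, features);
  }

  // Average the feature vectors of liked items, skipping missing ids.
  buildUserProfile(likedItemIds: string[]): number[] {
    const vecs = likedItemIds
      .map((id) => this.items.get(id))
      .filter((v): v is number[] => v !== undefined);
    if (vecs.length === 0) return [];
    const profile = new Array(vecs[0].length).fill(0);
    for (const v of vecs) for (let i = 0; i < profile.length; i++) profile[i] += v[i];
    return profile.map((x) => x / vecs.length);
  }

  private cosine(a: number[], b: number[]): number {
    let dot = 0, magA = 0, magB = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i]; magA += a[i] * a[i]; magB += b[i] * b[i];
    }
    return magA === 0 || magB === 0 ? 0 : dot / (Math.sqrt(magA) * Math.sqrt(magB));
  }

  recommend(likedItemIds: string[], topN = 5): Recommendation[] {
    const profile = this.buildUserProfile(likedItemIds);
    if (profile.length === 0) return [];
    const liked = new Set(likedItemIds);
    return [...this.items]
      .filter(([id]) => !liked.has(id)) // exclude already-liked items
      .map(([itemId, features]) => ({ itemId, score: this.cosine(profile, features) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topN);
  }
}
```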

FILE 7: Hybrid Recommender (src/recommendation/hybrid-recommender.ts)

Build a hybrid recommender that blends collaborative and content-based approaches. Your recommender should:

  1. Accept both a collaborative filter and a content filter as dependencies
  2. Implement recommend(userId: string, likedItemIds: string[], topN?: number): Recommendation[]
  3. Handle cold start: if the user has fewer than a configurable threshold (default 5) of ratings, weight content-based results more heavily (e.g., 70% content, 30% collaborative)
  4. For warm users: weight collaborative results more heavily (e.g., 70% collaborative, 30% content)
  5. Deduplicate results (same item from both sources): keep the higher score
  6. Return final results sorted by blended score
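The blend-and-dedup logic can be sketched on its own. Here the two filters and the user's rating count are injected as plain functions so the block stands alone; in the actual lab they come from FILES 5 and 6, and these names are illustrative.

```typescript
type Recommendation = { itemId: string; score: number };

class HybridRecommender {
  constructor(
    // Stand-ins for the collaborative and content filters (FILES 5-6).
    private collabRecommend: (userId: string) => Recommendation[],
    private contentRecommend: (likedItemIds: string[]) => Recommendation[],
    private ratingCount: (userId: string) => number,
    private coldStartThreshold = 5,
  ) {}

  recommend(userId: string, likedItemIds: string[], topN = 5): Recommendation[] {
    // Cold start: favor content-based; warm users: favor collaborative.
    const cold = this.ratingCount(userId) < this.coldStartThreshold;
    const cfWeight = cold ? 0.3 : 0.7;
    const cbWeight = cold ? 0.7 : 0.3;

    const blended = new Map<string, number>();
    const merge = (recs: Recommendation[], weight: number) => {
      for (const { itemId, score } of recs) {
        const scaled = score * weight;
        const prev = blended.get(itemId);
        // Deduplicate by itemId, keeping the higher score.
        if (prev === undefined || scaled > prev) blended.set(itemId, scaled);
      }
    };
    merge(this.collabRecommend(userId), cfWeight);
    merge(this.contentRecommend(likedItemIds), cbWeight);

    return [...blended]
      .map(([itemId, score]) => ({ itemId, score }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topN);
  }
}
```

Injecting the filters as dependencies (step 1) keeps this class testable with stubs and independent of either filter's internals.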

FILE 8: Main Entry (src/index.ts)

Wire everything together in a demonstration:

  1. Create sample product documents (at least 5)
  2. Index them using the inverted index
  3. Search with the BM25 scorer and print results
  4. Add autocomplete entries and demonstrate suggestions
  5. Set up user ratings and demonstrate collaborative filtering recommendations
  6. Set up item features and demonstrate content-based recommendations
  7. Demonstrate the hybrid recommender with both a cold-start user and a warm user

Hints

  • For the stemmer, a simple suffix check is sufficient -- you do not need to implement the full Porter Stemmer algorithm
  • For cosine similarity, handle the zero-vector edge case (return 0 when either vector has zero magnitude)
  • In collaborative filtering, only compute similarity on items that both users have rated (co-rated items)
  • For the hybrid recommender, use a Map to deduplicate by itemId and keep the higher score
  • The BM25 IDF formula used here includes the "+1" inside the log to avoid negative IDF for terms that appear in more than half the documents

Grading Rubric

  • Tokenizer correctly lowercases input, splits on non-alphanumeric characters, removes stop words from the provided set, and applies the suffix-stripping stemmer (ing, ed, ly, s) only when the remaining stem is at least 3 characters (10 points)
  • Inverted index addDocument correctly tokenizes text, counts term frequencies per document, builds posting lists with PostingEntry objects, tracks document lengths, and maintains an accurate docCount and average document length (15 points)
  • BM25 scorer implements the correct formula with k1 = 1.2 and b = 0.75: IDF = log((N - df + 0.5) / (df + 0.5) + 1), term score = IDF * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * docLen / avgDocLen)); the search method scores all documents, filters out zero scores, sorts descending, and returns the top-K (20 points)
  • Autocomplete addEntry stores lowercase terms with scores, suggest filters by lowercase prefix, sorts by score descending, and returns at most limit results; updateScore correctly adjusts existing term scores (10 points)
  • Collaborative filter stores ratings in nested maps, computes cosine similarity only on co-rated items (handling the zero-vector case), findSimilarUsers filters for positive similarity and returns the top-K, recommend generates predictions using a weighted average of similar users' ratings and returns top-N sorted results (15 points)
  • Content filter stores item feature vectors, buildUserProfile averages the feature vectors of liked items (skipping missing items), cosineSimilarity handles zero-magnitude vectors, recommend computes similarity between the user profile and all non-liked items and returns the top-N sorted by similarity (10 points)
  • Hybrid recommender checks the user's rating count against coldStartThreshold, applies the correct weights (cold: 0.3 CF / 0.7 CB; warm: 0.7 CF / 0.3 CB), scales scores from both sources, deduplicates by itemId keeping the higher score, and returns the top-N sorted by blended score (15 points)
  • Main entry creates at least 5 product documents, indexes them, performs a BM25 search and prints results, demonstrates autocomplete suggestions, sets up user ratings for collaborative filtering and item features for content filtering, and demonstrates hybrid recommendations for both a cold-start and a warm user (5 points)
