Build a Search and Recommendation Engine
Instructions
In this lab, you will build the core components of a product search and recommendation system from scratch in TypeScript. This is the exact system you would design in a "Design product search for e-commerce" interview question -- but here you implement it.
Architecture Overview
```
Documents (product catalog)
             ↓
Tokenizer (lowercase, stop-words, stemming)
             ↓
Inverted Index (term → posting lists with TF)
             ↓
BM25 Scorer (rank by relevance)
             ↓
Search Results

User Interaction History
             ↓
┌────────────────────┬────────────────────┐
│   Collaborative    │   Content-Based    │
│       Filter       │       Filter       │
│ (cosine similarity │ (feature vectors)  │
│  on user ratings)  │                    │
└────────┬───────────┴─────────┬──────────┘
         │                     │
         ▼                     ▼
  Hybrid Recommender (blend + cold-start)
             ↓
      Recommendations
```
Step-by-Step Guide
FILE 1: Tokenizer (src/search/tokenizer.ts)
Build a text tokenizer that prepares raw text for indexing. Your tokenizer should:
- Convert text to lowercase
- Split on non-alphanumeric characters (whitespace, punctuation)
- Remove English stop words (a, an, the, is, are, was, were, in, on, at, to, for, of, and, or, but, not, with, this, that, it, be, have, do, will, can, etc.)
- Apply a simple suffix-stripping stemmer that handles common English suffixes:
  - "ing" -> remove (e.g., "running" -> "runn")
  - "ed" -> remove (e.g., "played" -> "play")
  - "s" at the end -> remove (e.g., "shoes" -> "shoe")
  - "ly" -> remove (e.g., "quickly" -> "quick")
  - Only strip if the remaining stem is at least 3 characters
Export a `tokenize(text: string): string[]` function.
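If you want a starting point, the requirements above can be sketched roughly like this (the stop-word list is the one given in the instructions; a full implementation may extend it):

```typescript
// Sketch of src/search/tokenizer.ts, covering the rules listed above.
const STOP_WORDS = new Set([
  "a", "an", "the", "is", "are", "was", "were", "in", "on", "at", "to",
  "for", "of", "and", "or", "but", "not", "with", "this", "that", "it",
  "be", "have", "do", "will", "can",
]);

// Strip at most one matching suffix, and only when the remaining stem
// keeps at least 3 characters.
function stem(token: string): string {
  for (const suffix of ["ing", "ed", "ly", "s"]) {
    if (token.endsWith(suffix) && token.length - suffix.length >= 3) {
      return token.slice(0, token.length - suffix.length);
    }
  }
  return token;
}

export function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/) // split on runs of non-alphanumeric characters
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t))
    .map(stem);
}
```

For example, `tokenize("The quickly running shoes")` drops "the" and yields `["quick", "runn", "shoe"]`.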
FILE 2: Inverted Index (src/search/inverted-index.ts)
Build an inverted index that maps terms to posting lists. Your index should:
- Accept documents via `addDocument(docId: string, text: string)`
- Tokenize the text using your tokenizer
- Build posting lists where each entry contains `{ docId, termFrequency }`
- Track document lengths (number of tokens per document) for BM25
- Track the total number of documents and the average document length
- Provide `getPostings(term: string): PostingEntry[]` to retrieve the posting list for a term
- Provide `getDocLength(docId: string): number` and `getAvgDocLength(): number`
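One reasonable layout is posting lists in a `Map` keyed by term. The sketch below uses a trivial inline tokenizer so it stays self-contained; the lab expects your FILE 1 `tokenize` instead:

```typescript
// Stand-in for the FILE 1 tokenizer (lowercase + split only).
const tokenize = (text: string): string[] =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);

export interface PostingEntry {
  docId: string;
  termFrequency: number;
}

export class InvertedIndex {
  private postings = new Map<string, PostingEntry[]>();
  private docLengths = new Map<string, number>();

  addDocument(docId: string, text: string): void {
    const tokens = tokenize(text);
    this.docLengths.set(docId, tokens.length);
    // Count term frequencies within this document, then append one
    // posting entry per distinct term.
    const tf = new Map<string, number>();
    for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);
    for (const [term, termFrequency] of tf) {
      const list = this.postings.get(term) ?? [];
      list.push({ docId, termFrequency });
      this.postings.set(term, list);
    }
  }

  getPostings(term: string): PostingEntry[] {
    return this.postings.get(term) ?? [];
  }

  getDocLength(docId: string): number {
    return this.docLengths.get(docId) ?? 0;
  }

  getAvgDocLength(): number {
    if (this.docLengths.size === 0) return 0;
    let total = 0;
    for (const len of this.docLengths.values()) total += len;
    return total / this.docLengths.size;
  }

  // Accessor name is an assumption; expose the document count however you like.
  get docCount(): number {
    return this.docLengths.size;
  }
}
```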
FILE 3: BM25 Scorer (src/search/bm25-scorer.ts)
Implement the BM25 ranking algorithm. Your scorer should:
- Accept the inverted index as a dependency
- Use standard parameters: k1 = 1.2, b = 0.75
- Implement `score(query: string, docId: string): number` that computes the BM25 score
- Implement `search(query: string, topK?: number): SearchResult[]` that returns the top-K documents ranked by BM25 score
- Use the correct BM25 formula:
  - IDF(t) = log((N - df + 0.5) / (df + 0.5) + 1)
  - Score contribution per term = IDF * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * docLen / avgDocLen))
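The per-term math can be sketched against the FILE 2 interface. The `Index` interface below is a local assumption standing in for your `InvertedIndex` (including an assumed `docCount` accessor), and query tokenization is taken as already done:

```typescript
interface PostingEntry { docId: string; termFrequency: number; }

// Assumed shape of the FILE 2 index; adapt names to your implementation.
interface Index {
  getPostings(term: string): PostingEntry[];
  getDocLength(docId: string): number;
  getAvgDocLength(): number;
  docCount: number;
}

const K1 = 1.2;
const B = 0.75;

export function bm25Score(index: Index, queryTerms: string[], docId: string): number {
  const N = index.docCount;
  const docLen = index.getDocLength(docId);
  const avgDocLen = index.getAvgDocLength();
  let score = 0;
  for (const term of queryTerms) {
    const postings = index.getPostings(term);
    const df = postings.length; // document frequency of the term
    const tf = postings.find((p) => p.docId === docId)?.termFrequency ?? 0;
    if (tf === 0 || df === 0) continue; // term absent: contributes nothing
    // IDF with "+1" inside the log, so it never goes negative.
    const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1);
    // Length-normalized TF saturation, per the formula above.
    score += (idf * (tf * (K1 + 1))) / (tf + K1 * (1 - B + B * (docLen / avgDocLen)));
  }
  return score;
}
```

`search` then amounts to scoring every candidate document and sorting descending.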
FILE 4: Autocomplete (src/search/autocomplete.ts)
Build a prefix-based autocomplete service. Your service should:
- Maintain a list of terms with popularity scores
- Accept new entries via `addEntry(term: string, score: number)`
- Implement `suggest(prefix: string, limit?: number): AutocompleteResult[]` that:
  - Finds all terms starting with the lowercase prefix
  - Ranks by popularity score descending
  - Returns at most `limit` results (default 10)
- Support updating scores via `updateScore(term: string, delta: number)` (e.g., increment by 1 on each search)
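A linear-scan sketch of this service is below; a trie would scale better, but a `Map` scan matches the lab's scope. Lowercasing terms on insert is an assumption that keeps lookups case-insensitive:

```typescript
export interface AutocompleteResult {
  term: string;
  score: number;
}

export class Autocomplete {
  private scores = new Map<string, number>();

  addEntry(term: string, score: number): void {
    this.scores.set(term.toLowerCase(), score);
  }

  updateScore(term: string, delta: number): void {
    const key = term.toLowerCase();
    this.scores.set(key, (this.scores.get(key) ?? 0) + delta);
  }

  suggest(prefix: string, limit = 10): AutocompleteResult[] {
    const p = prefix.toLowerCase();
    return [...this.scores.entries()]
      .filter(([term]) => term.startsWith(p)) // prefix match
      .map(([term, score]) => ({ term, score }))
      .sort((a, b) => b.score - a.score) // popularity descending
      .slice(0, limit);
  }
}
```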
FILE 5: Collaborative Filter (src/recommendation/collaborative-filter.ts)
Implement user-based collaborative filtering. Your filter should:
- Store user-item ratings via `addRating(userId: string, itemId: string, rating: number)`
- Compute cosine similarity between two users based on their co-rated items
- Implement `findSimilarUsers(userId: string, topK?: number): SimilarUser[]`
- Implement `recommend(userId: string, topN?: number): Recommendation[]` that:
  - Finds the top-K most similar users
  - Collects items those users rated highly that the target user has not rated
  - Predicts a score using a weighted average of similar users' ratings
  - Returns the top-N recommendations sorted by predicted score
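The heart of this file is cosine similarity restricted to co-rated items. A sketch, assuming ratings live in a nested `Map` (one of several reasonable layouts):

```typescript
// userId -> (itemId -> rating); an assumed storage layout.
type Ratings = Map<string, Map<string, number>>;

export function cosineSimilarity(ratings: Ratings, userA: string, userB: string): number {
  const a = ratings.get(userA);
  const b = ratings.get(userB);
  if (!a || !b) return 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [itemId, ra] of a) {
    const rb = b.get(itemId);
    if (rb === undefined) continue; // only co-rated items count
    dot += ra * rb;
    normA += ra * ra;
    normB += rb * rb;
  }
  if (normA === 0 || normB === 0) return 0; // zero-vector edge case
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

`findSimilarUsers` is then a sort over this similarity, and `recommend` a weighted average over the neighbors' ratings.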
FILE 6: Content Filter (src/recommendation/content-filter.ts)
Implement content-based filtering with item feature vectors. Your filter should:
- Register items with feature vectors via `addItem(itemId: string, features: number[])`
- Build a user profile by averaging the feature vectors of items the user has liked
- Implement `buildUserProfile(likedItemIds: string[]): number[]`
- Implement `recommend(likedItemIds: string[], topN?: number): Recommendation[]` that:
  - Builds the user profile vector
  - Computes cosine similarity between the user profile and every item not in `likedItemIds`
  - Returns top-N items ranked by similarity
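The profile-and-rank flow might look like this, assuming all feature vectors share the same length:

```typescript
export interface Recommendation {
  itemId: string;
  score: number;
}

// Cosine similarity between two equal-length vectors; 0 for zero vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

export class ContentFilter {
  private items = new Map<string, number[]>();

  addItem(itemId: string, features: number[]): void {
    this.items.set(itemId, features);
  }

  buildUserProfile(likedItemIds: string[]): number[] {
    const liked = likedItemIds
      .map((id) => this.items.get(id))
      .filter((f): f is number[] => f !== undefined);
    if (liked.length === 0) return [];
    // Element-wise average of the liked items' feature vectors.
    const profile = new Array(liked[0].length).fill(0);
    for (const features of liked) {
      for (let i = 0; i < features.length; i++) profile[i] += features[i];
    }
    return profile.map((v) => v / liked.length);
  }

  recommend(likedItemIds: string[], topN = 10): Recommendation[] {
    const profile = this.buildUserProfile(likedItemIds);
    const liked = new Set(likedItemIds);
    return [...this.items.entries()]
      .filter(([itemId]) => !liked.has(itemId)) // skip already-liked items
      .map(([itemId, features]) => ({ itemId, score: cosine(profile, features) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topN);
  }
}
```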
FILE 7: Hybrid Recommender (src/recommendation/hybrid-recommender.ts)
Build a hybrid recommender that blends collaborative and content-based approaches. Your recommender should:
- Accept both a collaborative filter and a content filter as dependencies
- Implement `recommend(userId: string, likedItemIds: string[], topN?: number): Recommendation[]`
- Handle cold start: if the user has fewer than a configurable threshold (default 5) of ratings, weight content-based results more heavily (e.g., 70% content, 30% collaborative)
- For warm users: weight collaborative results more heavily (e.g., 70% collaborative, 30% content)
- Deduplicate results (same item from both sources): keep the higher score
- Return final results sorted by blended score
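The blending step can be sketched on its own, assuming both filters already return `{ itemId, score }` lists; a standalone `blend` function is used here, while the lab wants it wrapped in a class that calls both filters:

```typescript
interface Recommendation { itemId: string; score: number; }

export function blend(
  collaborative: Recommendation[],
  content: Recommendation[],
  ratingCount: number,
  coldStartThreshold = 5,
  topN = 10,
): Recommendation[] {
  // Cold-start users lean on content (30/70); warm users lean on
  // collaborative (70/30).
  const collabWeight = ratingCount < coldStartThreshold ? 0.3 : 0.7;
  const contentWeight = 1 - collabWeight;
  const byItem = new Map<string, number>();
  for (const { itemId, score } of collaborative) {
    byItem.set(itemId, score * collabWeight);
  }
  for (const { itemId, score } of content) {
    const weighted = score * contentWeight;
    const existing = byItem.get(itemId);
    // Deduplicate: when both sources suggest an item, keep the higher score.
    byItem.set(itemId, existing === undefined ? weighted : Math.max(existing, weighted));
  }
  return [...byItem.entries()]
    .map(([itemId, score]) => ({ itemId, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}
```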
FILE 8: Main Entry (src/index.ts)
Wire everything together in a demonstration:
- Create sample product documents (at least 5)
- Index them using the inverted index
- Search with the BM25 scorer and print results
- Add autocomplete entries and demonstrate suggestions
- Set up user ratings and demonstrate collaborative filtering recommendations
- Set up item features and demonstrate content-based recommendations
- Demonstrate the hybrid recommender with both a cold-start user and a warm user
Hints
- For the stemmer, a simple suffix check is sufficient -- you do not need to implement the full Porter Stemmer algorithm
- For cosine similarity, handle the zero-vector edge case (return 0 when either vector has zero magnitude)
- In collaborative filtering, only compute similarity on items that both users have rated (co-rated items)
- For the hybrid recommender, use a Map to deduplicate by itemId and keep the higher score
- The BM25 IDF formula used here includes the "+1" inside the log to avoid negative IDF for terms that appear in more than half the documents