Hybrid Search & Reranking
Why Hybrid Search
Neither semantic nor keyword search is perfect alone. Hybrid search combines their strengths to achieve better retrieval.
The Limitations Problem
Semantic Search Limitations
Vector similarity struggles with:
# Query: "error 404"
# Semantic search finds: "page not found", "missing resource"
# But may rank too low: Documents that literally say "error 404"
# Query: "HIPAA compliance"
# Semantic search finds: "healthcare privacy regulations"
# But may rank too low: Documents using the exact acronym "HIPAA"
Fails at:
- Exact terms and acronyms (API, HIPAA, HTTP)
- Product names and IDs (iPhone 15, SKU-12345)
- Code snippets and error messages
- Numerical queries (dates, versions, prices)
Keyword Search Limitations
BM25/TF-IDF struggles with:
# Query: "how to fix slow application"
# Keyword search finds: Documents with "slow" and "application"
# But misses: "performance optimization techniques"
# Query: "cheap flights to Paris"
# Keyword search finds: Documents with "cheap" and "flights"
# But misses: "budget-friendly airfare to France"
Fails at:
- Synonyms and paraphrases
- Conceptual matching
- Understanding intent
- Handling typos
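The vocabulary gap can be seen with a few lines of Python. This is a minimal sketch, not BM25 itself: `overlap_score` is a hypothetical stand-in that only counts shared terms, and the stop-word list and sample documents are assumptions made for illustration.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard one).
STOP_WORDS = {"to", "the", "a", "and", "in", "of"}

def tokenize(text: str) -> set[str]:
    """Lowercase, split on non-alphanumeric characters, drop stop words."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOP_WORDS

def overlap_score(query: str, document: str) -> int:
    """Count distinct query terms that literally appear in the document."""
    return len(tokenize(query) & tokenize(document))

query = "cheap flights to Paris"
docs = [
    "Cheap flights and hotel deals",
    "Budget-friendly airfare to France",
]
scores = [overlap_score(query, doc) for doc in docs]
print(scores)
```

The second document is the closer paraphrase of the query, yet it shares no content terms with it, so any scorer built purely on term overlap has nothing to match on.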
Complementary Strengths
| Aspect | Semantic | Keyword | Hybrid |
|---|---|---|---|
| Exact terms | Poor | Excellent | Excellent |
| Synonyms | Excellent | Poor | Excellent |
| Acronyms | Poor | Excellent | Excellent |
| Concepts | Excellent | Poor | Excellent |
| Typo tolerance | Good | Poor | Good |
Real-World Examples
Example 1: Technical Documentation
Query: "OAuth 2.0 authentication flow"
Keyword matches:
- "OAuth 2.0 is an authorization framework..."
- "The OAuth 2.0 flow begins with..."
Semantic matches:
- "Token-based authentication process..."
- "Third-party login implementation..."
Hybrid combines both → Complete coverage
Example 2: E-commerce
Query: "blue running shoes size 10"
Keyword matches:
- Products with "blue", "running", "shoes", "10" in title
- SKUs matching exactly
Semantic matches:
- "Athletic footwear in navy"
- "Jogging sneakers"
Hybrid combines both → Better product discovery
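One common way to combine the two result lists is Reciprocal Rank Fusion (RRF). The sketch below is a minimal illustration under stated assumptions: the source does not say which fusion method it has in mind, the document IDs are hypothetical, and `k = 60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical result lists for the query "OAuth 2.0 authentication flow".
keyword_hits = ["oauth_framework", "oauth_flow", "token_auth"]
semantic_hits = ["token_auth", "third_party_login", "oauth_flow"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Documents that appear in both lists rise to the top, while documents found by only one retriever are still kept, which is the "complete coverage" behavior described in the examples.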
When to Use Hybrid Search
START
│
▼
Do queries contain exact terms (IDs, codes, acronyms)?
│
├─ YES → Hybrid search needed
│
▼
Do users search with varying vocabulary?
│
├─ YES → Hybrid search needed
│
▼
Is your domain highly technical?
│
├─ YES → Hybrid search recommended
│
▼
Pure semantic may be sufficient
(but hybrid rarely hurts)
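The decision flow above can be written as a small function. This is only a sketch: `QueryProfile` and `recommend_retrieval` are hypothetical names, and the three boolean flags would have to come from your own analysis of query logs.

```python
from dataclasses import dataclass

@dataclass
class QueryProfile:
    """Answers to the three questions in the flowchart (hypothetical structure)."""
    has_exact_terms: bool      # IDs, codes, acronyms
    varied_vocabulary: bool    # users phrase the same need differently
    highly_technical: bool     # domain-specific jargon

def recommend_retrieval(profile: QueryProfile) -> str:
    """Walk the flowchart top to bottom and return the first matching outcome."""
    if profile.has_exact_terms or profile.varied_vocabulary:
        return "hybrid (needed)"
    if profile.highly_technical:
        return "hybrid (recommended)"
    return "semantic (may be sufficient)"

print(recommend_retrieval(QueryProfile(True, False, False)))
print(recommend_retrieval(QueryProfile(False, False, False)))
```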
Performance Comparison
In benchmark comparisons, hybrid search typically outperforms either method used alone:
| Method | Recall@10 | Precision@10 | MRR |
|---|---|---|---|
| BM25 only | 0.72 | 0.58 | 0.65 |
| Semantic only | 0.78 | 0.62 | 0.71 |
| Hybrid | 0.86 | 0.70 | 0.79 |
Results from MS MARCO dataset benchmark
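For readers who want to run this kind of comparison on their own data, the three metrics in the table can be computed as follows. This is a minimal sketch using the standard definitions. The document IDs are made up, and the code does not reproduce the figures above.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    return len(set(retrieved[:k]) & relevant) / k

def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    return sum(reciprocal_rank(ret, rel) for ret, rel in runs) / len(runs)

# Hypothetical single-query example.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}

recall = recall_at_k(retrieved, relevant, k=3)
precision = precision_at_k(retrieved, relevant, k=3)
mrr = mean_reciprocal_rank([(retrieved, relevant), (["d9"], {"d9"})])
```

Running the same functions over BM25-only, semantic-only, and fused result lists for a fixed query set gives a like-for-like comparison on your own corpus.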
The Cost of Hybrid
| Factor | Impact |
|---|---|
| Latency | ~1.5x single method (parallelizable) |
| Complexity | Moderate (fusion logic needed) |
| Storage | 2x (vectors + inverted index) |
| Maintenance | Two indexes to update |
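The "parallelizable" note on latency means the two retrievers can run concurrently, so the wall-clock cost is closer to the slower of the two than to their sum. The sketch below illustrates this with placeholder functions: `keyword_search` and `vector_search` are hypothetical stand-ins that simulate latency with `time.sleep`, not calls into a real search backend.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def keyword_search(query: str) -> list[str]:
    """Placeholder keyword retriever; sleeps to simulate index latency."""
    time.sleep(0.10)
    return ["doc_a", "doc_b"]

def vector_search(query: str) -> list[str]:
    """Placeholder vector retriever; sleeps to simulate embedding + ANN latency."""
    time.sleep(0.15)
    return ["doc_b", "doc_c"]

def hybrid_retrieve(query: str) -> tuple[list[str], list[str]]:
    """Run both retrievers concurrently and return both result lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        keyword_future = pool.submit(keyword_search, query)
        vector_future = pool.submit(vector_search, query)
        return keyword_future.result(), vector_future.result()

start = time.perf_counter()
keyword_hits, vector_hits = hybrid_retrieve("error 404")
elapsed = time.perf_counter() - start
print(f"both retrievers finished in {elapsed:.2f}s")
```

Threads suit this case because real retrievers spend most of their time waiting on I/O. The fusion step then runs on the two returned lists.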
Verdict: The quality improvement almost always justifies the additional complexity for production RAG systems.
Key Insight: Hybrid search isn't about choosing between methods. Different query types need different retrieval approaches, and combining them provides robustness.
Next, let's implement hybrid search with BM25 and vector retrieval.