Hybrid Search & Reranking

Why Hybrid Search

Neither semantic nor keyword search is perfect alone. Hybrid search combines their strengths to achieve better retrieval.

The Limitations Problem

Semantic Search Limitations

Vector similarity struggles with:

# Query: "error 404"
# Semantic search finds: "page not found", "missing resource"
# But may miss or rank too low: documents that literally say "error 404"

# Query: "HIPAA compliance"
# Semantic search finds: "healthcare privacy regulations"
# But may miss or rank too low: documents that rely on the exact acronym "HIPAA"

Often weak at:

  • Exact terms and acronyms (API, HIPAA, HTTP)
  • Product names and IDs (iPhone 15, SKU-12345)
  • Code snippets and error messages
  • Numerical queries (dates, versions, prices)
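One way to act on this list is to flag queries that contain such literal tokens so they can be routed to a keyword-capable retriever. The following is a minimal sketch; the patterns and the `has_exact_terms` helper are illustrative assumptions, not an exhaustive or standard detector.

```python
import re

# Hypothetical patterns for tokens that usually need literal matching.
EXACT_TERM_PATTERNS = [
    re.compile(r"\b[A-Z]{2,}\b"),       # all-caps acronyms: API, HIPAA, HTTP
    re.compile(r"\b[A-Za-z]+-\d+\b"),   # hyphenated IDs such as SKU-12345
    re.compile(r"\b\d+(?:\.\d+)+\b"),   # dotted versions: 2.0, 1.2.3
    re.compile(r"\b\d{3,}\b"),          # codes and long numbers: 404
]


def has_exact_terms(query: str) -> bool:
    """Return True if the query contains a token that looks like an exact term."""
    return any(pattern.search(query) for pattern in EXACT_TERM_PATTERNS)


print(has_exact_terms("error 404"))                    # True
print(has_exact_terms("HIPAA compliance"))             # True
print(has_exact_terms("how to fix slow application"))  # False
```

A heuristic like this will miss some cases (for example, short product numbers), so it is best treated as a hint rather than a gate.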

Keyword Search Limitations

BM25/TF-IDF struggles with:

# Query: "how to fix slow application"
# Keyword search finds: Documents with "slow" and "application"
# But misses: "performance optimization techniques"

# Query: "cheap flights to Paris"
# Keyword search finds: Documents with "cheap" and "flights"
# But misses: "budget-friendly airfare to France"

Fails at:

  • Synonyms and paraphrases
  • Conceptual matching
  • Understanding intent
  • Handling typos
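The vocabulary-mismatch problem can be seen directly in a small BM25 scorer. The sketch below is a simplified pure-Python version of BM25, assuming whitespace tokenization, a tiny hand-picked stopword list, and the common defaults `k1=1.5` and `b=0.75`; production systems would normally use a search engine or a dedicated library instead.

```python
import math
from collections import Counter

STOPWORDS = {"to", "from", "the", "a"}  # tiny illustrative list (assumption)


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no lexical overlap, no contribution
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "Cheap flights to Paris from London",
    "Budget-friendly airfare to France",
]
scores = bm25_scores("cheap flights to Paris", docs)
print(scores)  # the second score is 0.0: no shared terms with the query
```

The second document is relevant to the query, yet it scores exactly zero because it shares no terms with it. This is the gap that semantic retrieval is meant to fill.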

Complementary Strengths

Aspect         | Semantic  | Keyword   | Hybrid
Exact terms    | Poor      | Excellent | Excellent
Synonyms       | Excellent | Poor      | Excellent
Acronyms       | Poor      | Excellent | Excellent
Concepts       | Excellent | Poor      | Excellent
Typo tolerance | Good      | Poor      | Good

Real-World Examples

Example 1: Technical Documentation

Query: "OAuth 2.0 authentication flow"

Keyword matches:
- "OAuth 2.0 is an authorization framework..."
- "The OAuth 2.0 flow begins with..."

Semantic matches:
- "Token-based authentication process..."
- "Third-party login implementation..."

Hybrid combines both → Complete coverage
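This lesson does not prescribe a specific way to combine the two result lists. One common choice is Reciprocal Rank Fusion (RRF), sketched below. The document IDs are hypothetical stand-ins for the matches above, `k=60` is a conventional default rather than a tuned value, and the overlap on `oauth-flow` is assumed for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document receives 1 / (k + rank) from every list it appears in,
    so documents found by more than one retriever rise toward the top.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


# Hypothetical IDs for the OAuth 2.0 example.
keyword_hits = ["oauth-framework", "oauth-flow"]
semantic_hits = ["token-auth", "oauth-flow", "third-party-login"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused[0])  # oauth-flow, the only document present in both lists
```

RRF uses only rank positions, so it avoids having to normalize BM25 scores and cosine similarities onto a common scale.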

Example 2: E-commerce

Query: "blue running shoes size 10"

Keyword matches:
- Products with "blue", "running", "shoes", "10" in title
- SKUs matching exactly

Semantic matches:
- "Athletic footwear in navy"
- "Jogging sneakers"

Hybrid combines both → Better product discovery

Deciding whether you need hybrid search:

START
Do queries contain exact terms (IDs, codes, acronyms)?
  ├─ YES → Hybrid search needed
  └─ NO ↓
Do users search with varying vocabulary?
  ├─ YES → Hybrid search needed
  └─ NO ↓
Is your domain highly technical?
  ├─ YES → Hybrid search recommended
  └─ NO → Pure semantic may be sufficient
           (but hybrid rarely hurts)
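
The decision flow above can be written as a small function. This is only a sketch: the function name, parameter names, and return strings are assumptions made for illustration.

```python
def recommend_retrieval(exact_terms: bool,
                        varied_vocabulary: bool,
                        technical_domain: bool) -> str:
    """Walk the three questions in order and return a recommendation."""
    if exact_terms:
        return "hybrid (needed)"
    if varied_vocabulary:
        return "hybrid (needed)"
    if technical_domain:
        return "hybrid (recommended)"
    # Hybrid rarely hurts, but pure semantic search may be enough here.
    return "semantic (may be sufficient)"


print(recommend_retrieval(exact_terms=True, varied_vocabulary=False,
                          technical_domain=False))   # hybrid (needed)
print(recommend_retrieval(exact_terms=False, varied_vocabulary=False,
                          technical_domain=False))   # semantic (may be sufficient)
```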

Performance Comparison

Studies show hybrid search outperforms both methods alone:

Method        | Recall@10 | Precision@10 | MRR
BM25 only     | 0.72      | 0.58         | 0.65
Semantic only | 0.78      | 0.62         | 0.71
Hybrid        | 0.86      | 0.70         | 0.79

Results from MS MARCO dataset benchmark
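
For reference, the three metrics in the table can be computed as follows. This is a minimal sketch using their standard definitions; the document IDs and relevance judgments are made up for illustration and are not taken from the benchmark.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def precision_at_k(retrieved, relevant, k=10):
    """Fraction of the top k slots filled by relevant documents."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is found."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (retrieved, relevant) pairs."""
    return sum(reciprocal_rank(ret, rel) for ret, rel in runs) / len(runs)


# Hypothetical single query: four results returned, three relevant documents exist.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}

print(recall_at_k(retrieved, relevant))     # 2 of 3 relevant documents found
print(precision_at_k(retrieved, relevant))  # 2 relevant results in 10 slots
print(reciprocal_rank(retrieved, relevant)) # first relevant result at rank 2
```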

The Cost of Hybrid

Factor      | Impact
Latency     | ~1.5x single method (parallelizable)
Complexity  | Moderate (fusion logic needed)
Storage     | 2x (vectors + inverted index)
Maintenance | Two indexes to update
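
The "parallelizable" note on latency means the two retrievers can run at the same time, so the wait is governed by the slower one rather than by their sum. A minimal sketch with Python's `concurrent.futures` follows; `keyword_search` and `semantic_search` are hypothetical stubs that simulate index lookups with a short sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def keyword_search(query):
    """Stub standing in for a BM25 index lookup (hypothetical)."""
    time.sleep(0.05)
    return ["doc-a", "doc-b"]


def semantic_search(query):
    """Stub standing in for a vector index lookup (hypothetical)."""
    time.sleep(0.05)
    return ["doc-b", "doc-c"]


def hybrid_retrieve(query):
    """Run both retrievers concurrently and return both result lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        keyword_future = pool.submit(keyword_search, query)
        semantic_future = pool.submit(semantic_search, query)
        return keyword_future.result(), semantic_future.result()


start = time.perf_counter()
keyword_hits, semantic_hits = hybrid_retrieve("error 404")
elapsed = time.perf_counter() - start
print(keyword_hits, semantic_hits)
print(f"elapsed: {elapsed:.3f}s")  # close to one lookup, not two in sequence
```

Threads suit this case because real retrievers spend most of their time waiting on I/O (network or disk). The fusion step still adds some overhead on top of the slower lookup.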

Verdict: The quality improvement almost always justifies the additional complexity for production RAG systems.

Key Insight: Hybrid search isn't about choosing between methods—it's about recognizing that different query types need different retrieval approaches, and combining them provides robustness.

Next, let's implement hybrid search with BM25 and vector retrieval.
