Lesson 12 of 24

Hybrid Search & Reranking

Why Hybrid Search

Neither semantic nor keyword search is perfect alone. Hybrid search combines their strengths to achieve better retrieval.

The Limitations Problem

Semantic Search Limitations

Vector similarity struggles with:

# Query: "error 404"
# Semantic search finds: "page not found", "missing resource"
# But can miss or under-rank: documents that literally say "error 404"

# Query: "HIPAA compliance"
# Semantic search finds: "healthcare privacy regulations"
# But can miss or under-rank: documents using the exact acronym "HIPAA"

Fails at:

  • Exact terms and acronyms (API, HIPAA, HTTP)
  • Product names and IDs (iPhone 15, SKU-12345)
  • Code snippets and error messages
  • Numerical queries (dates, versions, prices)

Keyword Search Limitations

BM25/TF-IDF struggles with:

# Query: "how to fix slow application"
# Keyword search finds: Documents with "slow" and "application"
# But misses: "performance optimization techniques"

# Query: "cheap flights to Paris"
# Keyword search finds: Documents with "cheap" and "flights"
# But misses: "budget-friendly airfare to France"

Fails at:

  • Synonyms and paraphrases
  • Conceptual matching
  • Understanding intent
  • Handling typos
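The vocabulary gap described above can be shown with a toy scorer. This is a minimal sketch, not BM25: it only counts shared terms, and the `STOPWORDS` set, `tokenize`, and `keyword_overlap` are hypothetical names introduced for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list for this illustration.
STOPWORDS = {"a", "and", "in", "the", "to"}

def tokenize(text: str) -> set[str]:
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def keyword_overlap(query: str, document: str) -> int:
    """Count the query terms that appear verbatim in the document."""
    return len(tokenize(query) & tokenize(document))

query = "cheap flights to Paris"
docs = [
    "Cheap flights and hotel deals",
    "Budget-friendly airfare to France",
]

# The paraphrased document shares no content terms with the query,
# so a purely lexical scorer gives it no credit at all.
scores = {doc: keyword_overlap(query, doc) for doc in docs}
for doc, score in scores.items():
    print(f"{score}  {doc}")
```

Real keyword engines weight terms by rarity and document length, but the blind spot is the same: no shared terms means no match.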

Complementary Strengths

| Aspect | Semantic | Keyword | Hybrid |
|---|---|---|---|
| Exact terms | Poor | Excellent | Excellent |
| Synonyms | Excellent | Poor | Excellent |
| Acronyms | Poor | Excellent | Excellent |
| Concepts | Excellent | Poor | Excellent |
| Typo tolerance | Good | Poor | Good |

Real-World Examples

Example 1: Technical Documentation

Query: "OAuth 2.0 authentication flow"

Keyword matches:
- "OAuth 2.0 is an authorization framework..."
- "The OAuth 2.0 flow begins with..."

Semantic matches:
- "Token-based authentication process..."
- "Third-party login implementation..."

Hybrid combines both → Broader coverage

Example 2: E-commerce

Query: "blue running shoes size 10"

Keyword matches:
- Products with "blue", "running", "shoes", "10" in title
- SKUs matching exactly

Semantic matches:
- "Athletic footwear in navy"
- "Jogging sneakers"

Hybrid combines both → Better product discovery

START
Do queries contain exact terms (IDs, codes, acronyms)?
  ├─ YES → Hybrid search needed
  └─ NO ↓
Do users search with varying vocabulary?
  ├─ YES → Hybrid search needed
  └─ NO ↓
Is your domain highly technical?
  ├─ YES → Hybrid search recommended
  └─ NO → Pure semantic may be sufficient
          (but hybrid rarely hurts)
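
The same checklist can be written as a small function, which is handy when the choice has to be documented or made per collection. This is a sketch only: `SearchProfile`, its field names, and `recommend_retrieval` are hypothetical, and the returned strings simply mirror the wording of the decision tree.

```python
from dataclasses import dataclass

@dataclass
class SearchProfile:
    """Hypothetical answers to the three questions in the decision tree."""
    has_exact_terms: bool     # IDs, codes, acronyms appear in queries
    varied_vocabulary: bool   # users phrase the same need in different ways
    highly_technical: bool    # domain is dense with jargon

def recommend_retrieval(profile: SearchProfile) -> str:
    """Walk the questions in order and return the first matching outcome."""
    if profile.has_exact_terms:
        return "hybrid (needed)"
    if profile.varied_vocabulary:
        return "hybrid (needed)"
    if profile.highly_technical:
        return "hybrid (recommended)"
    return "semantic (may be sufficient)"

support_docs = SearchProfile(
    has_exact_terms=True, varied_vocabulary=True, highly_technical=True
)
print(recommend_retrieval(support_docs))
```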

Performance Comparison

Hybrid search typically outperforms either method used alone. The figures below are illustrative:

| Method | Recall@10 | Precision@10 | MRR |
|---|---|---|---|
| BM25 only | 0.72 | 0.58 | 0.65 |
| Semantic only | 0.78 | 0.62 | 0.71 |
| Hybrid | 0.86 | 0.70 | 0.79 |

These are illustrative figures based on typical MS MARCO benchmark patterns, not measured results. Actual numbers vary by dataset and implementation.
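
For readers who want to measure their own system, the three metrics can be computed as sketched below. The function names and the sample document IDs are made up for illustration, and precision is divided by k, which is one common convention.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)

def precision_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the top k slots filled by relevant documents."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k

def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

# Made-up example: one query, four results, three relevant documents.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}
print(recall_at_k(ranked, relevant, k=2))     # 1 of 3 relevant docs found
print(precision_at_k(ranked, relevant, k=2))  # 1 of 2 slots is relevant
print(reciprocal_rank(ranked, relevant))      # first hit is at rank 2
```

Running the same labeled query set through keyword-only, semantic-only, and hybrid retrieval gives a like-for-like comparison on your own data.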

The Cost of Hybrid

| Factor | Impact |
|---|---|
| Latency | ~1.5x single method (parallelizable) |
| Complexity | Moderate (fusion logic needed) |
| Storage | 2x (vectors + inverted index) |
| Maintenance | Two indexes to update |
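
The "parallelizable" note on latency means the two retrievers can run at the same time. Here is a rough sketch using Python's standard `concurrent.futures`. The `keyword_search` and `vector_search` functions are stand-in stubs that sleep to simulate I/O, not real retrievers.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def keyword_search(query: str) -> list[str]:
    """Stub for a BM25 lookup; sleeps to simulate an index call."""
    time.sleep(0.05)
    return ["doc_a", "doc_b"]

def vector_search(query: str) -> list[str]:
    """Stub for a vector-store lookup; sleeps to simulate an index call."""
    time.sleep(0.05)
    return ["doc_b", "doc_c"]

def retrieve_both(query: str) -> tuple[list[str], list[str]]:
    """Run both retrievers concurrently and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        keyword_future = pool.submit(keyword_search, query)
        vector_future = pool.submit(vector_search, query)
        return keyword_future.result(), vector_future.result()

keyword_hits, vector_hits = retrieve_both("error 404")
print(keyword_hits, vector_hits)
```

With I/O-bound retrievers, the wall-clock time approaches that of the slower call rather than the sum of both, and the fusion step adds a little on top.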

Verdict: The quality improvement almost always justifies the additional complexity for production RAG systems.

Key Insight: Hybrid search isn't about choosing between methods—it's about recognizing that different query types need different retrieval approaches, and combining them provides robustness.
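
To make "combining them" concrete, here is a sketch of Reciprocal Rank Fusion (RRF), one widely used way to merge two ranked lists. It is not necessarily the fusion method this course goes on to use. The document IDs are invented, and `k=60` is a commonly used default rather than a tuned value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores sum(1 / (k + rank))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Invented results for one query from each retriever.
keyword_ranking = ["doc_404", "doc_b", "doc_a"]
semantic_ranking = ["doc_b", "doc_missing", "doc_404"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)
```

Documents that both retrievers return rise to the top, while documents found by only one retriever still appear further down. RRF uses ranks only, so the incompatible score scales of BM25 and vector similarity never need to be normalized.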

Next, let's implement hybrid search with BM25 and vector retrieval.
