Lesson 15 of 23

Hybrid Search & Reranking

Query Enhancement

3 min read

Improving the query before retrieval can significantly improve retrieval quality. These techniques bridge the gap between how users phrase their questions and how documents are written.

The Query-Document Gap

Users phrase questions differently from the way documents are written:

User Query: "Why is my app slow?"
Document: "Performance optimization techniques include..."

Gap: different vocabulary, and a question matched against a statement.
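You can observe this gap directly by comparing embedding similarities. A quick sketch with a made-up document snippet, assuming sentence-transformers and the all-MiniLM-L6-v2 model are available (any embedding model shows a similar pattern):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Illustrative texts (the document snippet is hypothetical)
document = "Performance optimization techniques include caching, query tuning, and profiling."
casual_query = "Why is my app slow?"
doc_style_query = "Application performance optimization techniques"

doc_emb, casual_emb, doc_style_emb = model.encode([document, casual_query, doc_style_query])

# Document-style phrasing typically scores noticeably higher than the casual question
print(util.cos_sim(casual_emb, doc_emb))
print(util.cos_sim(doc_style_emb, doc_emb))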

Query Expansion

Generate multiple query variations to improve recall:

def expand_query(query: str, llm) -> list[str]:
    """Generate query variations for better coverage."""
    prompt = f"""Generate 3 alternative search queries for:
    "{query}"

    Include:
    1. A rephrased version
    2. A more technical version
    3. A simpler version

    Return only the queries, one per line."""

    response = llm.invoke(prompt)
    variations = [v.strip() for v in response.content.strip().split('\n') if v.strip()]

    # Include original query
    return [query] + variations[:3]

# Example
query = "Why is my app slow?"
expanded = expand_query(query, llm)
# ["Why is my app slow?",
#  "What causes application performance issues?",
#  "Application latency troubleshooting",
#  "App running slowly"]

Multi-Query Retrieval

Search with all query variations:

class MultiQueryRetriever:
    def __init__(self, retriever, llm):
        self.retriever = retriever
        self.llm = llm

    def search(self, query: str, k: int = 10) -> list[dict]:
        # Expand query
        queries = expand_query(query, self.llm)

        # Retrieve for each variation
        all_results = []
        seen_ids = set()

        for q in queries:
            results = self.retriever.search(q, k=k)
            for result in results:
                if result["id"] not in seen_ids:
                    all_results.append(result)
                    seen_ids.add(result["id"])

        # Rerank the combined results against the original query (see the sketch below)
        return self._rerank(query, all_results, k)
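The `_rerank` method is referenced but not shown here. One minimal sketch, assuming sentence-transformers is installed and that each result dict carries a "text" field (both assumptions, not requirements of the retriever interface used in this lesson):

from sentence_transformers import CrossEncoder

# Cross-encoder reranker (one of several possible choices)
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, results: list[dict], k: int) -> list[dict]:
    """Score each (query, text) pair with a cross-encoder and keep the top k."""
    if not results:
        return []
    scores = cross_encoder.predict([(query, r["text"]) for r in results])
    ranked = sorted(zip(results, scores), key=lambda pair: pair[1], reverse=True)
    return [result for result, _ in ranked[:k]]

# MultiQueryRetriever._rerank can simply delegate to this helper:
#     def _rerank(self, query, results, k):
#         return rerank(query, results, k)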

HyDE (Hypothetical Document Embeddings)

Generate a hypothetical answer, then search for similar documents:

class HyDERetriever:
    def __init__(self, vectorstore, llm, embeddings):
        self.vectorstore = vectorstore
        self.llm = llm
        self.embeddings = embeddings

    def search(self, query: str, k: int = 5) -> list[dict]:
        # Generate hypothetical document
        prompt = f"""Write a detailed answer to this question as if you were
        writing documentation:

        Question: {query}

        Answer:"""

        hypothetical_doc = self.llm.invoke(prompt).content

        # Embed the hypothetical document
        hyde_embedding = self.embeddings.embed_query(hypothetical_doc)

        # Search using hypothetical embedding
        results = self.vectorstore.similarity_search_by_vector(
            hyde_embedding,
            k=k
        )

        return results

# Example
query = "How do I implement rate limiting?"

# Hypothetical doc generated:
# "Rate limiting can be implemented using a token bucket algorithm.
#  First, define a bucket size and refill rate..."

# This embedding matches documentation better than the question would

Why HyDE works:

  • A question and the document that answers it often land in different regions of embedding space
  • A hypothetical answer is written in the same style as real documentation
  • Its embedding therefore matches the relevant document embeddings more closely

Query Decomposition

Break complex queries into sub-queries:

def decompose_query(query: str, llm) -> list[str]:
    """Break complex query into simpler sub-queries."""
    prompt = f"""Analyze this query and break it into simpler sub-queries
    that can be answered independently:

    Query: {query}

    If the query is already simple, return it as-is.
    Otherwise, return 2-4 sub-queries, one per line.

    Sub-queries:"""

    response = llm.invoke(prompt)
    sub_queries = [q.strip() for q in response.content.strip().split('\n') if q.strip()]

    return sub_queries if len(sub_queries) > 1 else [query]

# Example
query = "Compare OAuth and JWT for API authentication and show implementation"

sub_queries = decompose_query(query, llm)
# ["What is OAuth for API authentication?",
#  "What is JWT for API authentication?",
#  "How to implement OAuth?",
#  "How to implement JWT?"]

Step-Back Prompting

Generate a more general query first:

def step_back_query(query: str, llm) -> str:
    """Generate a broader, more general query."""
    prompt = f"""Given this specific query, generate a broader question
    that would provide useful background context:

    Specific query: {query}

    Broader question:"""

    return llm.invoke(prompt).content.strip()

class StepBackRetriever:
    def __init__(self, retriever, llm):
        self.retriever = retriever
        self.llm = llm

    def search(self, query: str, k: int = 5) -> list[dict]:
        # Get step-back query
        broad_query = step_back_query(query, self.llm)

        # Retrieve for both (fewer results for the broad query)
        specific_results = self.retriever.search(query, k=k)
        broad_results = self.retriever.search(broad_query, k=max(1, k // 2))

        # Combine (broad context first, then specific), dropping duplicates
        combined, seen = [], set()
        for result in broad_results + specific_results:
            if result["id"] not in seen:
                combined.append(result)
                seen.add(result["id"])
        return combined

# Example
query = "Why does HNSW index have higher memory usage than IVF?"

step_back = "How do vector database indexing algorithms work?"
# Broad context helps answer the specific question

Query Transformation Pipeline

Combine multiple techniques:

class QueryTransformPipeline:
    def __init__(self, retriever, llm, embeddings):
        self.retriever = retriever
        self.llm = llm
        self.embeddings = embeddings

    def search(
        self,
        query: str,
        k: int = 5,
        use_expansion: bool = True,
        use_hyde: bool = False,
        use_decomposition: bool = False
    ) -> list[dict]:
        all_results = []
        seen = set()

        # Original query
        queries = [query]

        # Query expansion (expand_query already returns the original
        # query first, so skip it to avoid searching it twice)
        if use_expansion:
            queries.extend(expand_query(query, self.llm)[1:])

        # Query decomposition
        if use_decomposition:
            queries.extend(decompose_query(query, self.llm))

        # Retrieve for all queries
        for q in queries:
            if use_hyde:
                results = self._hyde_search(q, k=k)
            else:
                results = self.retriever.search(q, k=k)

            for r in results:
                if r["id"] not in seen:
                    all_results.append(r)
                    seen.add(r["id"])

        # Rerank against the original query (helper methods sketched below)
        return self._rerank(query, all_results, k)
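The pipeline assumes two helpers that are not defined above. A minimal sketch of both, to be added to QueryTransformPipeline: the HyDE step assumes the retriever can search by a raw embedding (the `search_by_vector` name here is hypothetical), and `_rerank` delegates to the cross-encoder rerank helper sketched earlier.

    def _hyde_search(self, query: str, k: int = 5) -> list[dict]:
        """HyDE step: embed a hypothetical answer, then search with that vector."""
        prompt = f"Write a short documentation-style answer to: {query}"
        hypothetical_doc = self.llm.invoke(prompt).content
        hyde_embedding = self.embeddings.embed_query(hypothetical_doc)
        # Assumes the retriever exposes a search-by-embedding method (hypothetical name)
        return self.retriever.search_by_vector(hyde_embedding, k=k)

    def _rerank(self, query: str, results: list[dict], k: int) -> list[dict]:
        # Delegate to the standalone cross-encoder rerank() helper sketched earlier
        return rerank(query, results, k)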

Choosing Enhancement Techniques

Technique          Best For                      Latency Impact
Query expansion    Vocabulary mismatch           +100-200 ms
HyDE               Q&A retrieval                 +200-500 ms
Decomposition      Complex queries               +200-400 ms
Step-back          Questions needing context     +100-200 ms

START
  Simple factual query?
    ├─ YES → Query expansion only
  Q&A over documentation?
    ├─ YES → HyDE + expansion
  Complex multi-part query?
    ├─ YES → Decomposition
  Query needs background context?
    ├─ YES → Step-back prompting
  Default → Query expansion (good baseline)

Latency Note: Query enhancement adds LLM calls. Cache common query transformations and consider async processing for production systems.
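A minimal caching sketch, assuming the module-level llm client used in the earlier examples (for multi-process deployments, swap the in-memory cache for something shared such as Redis):

from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_expand_query(query: str) -> list[str]:
    """Cache expansions per query string to avoid repeated LLM calls."""
    return expand_query(query, llm)  # reuses the module-level `llm` from earlier examples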

In the next module, we'll learn how to evaluate RAG systems systematically using RAGAS and other frameworks.
