Build a RAG-Powered Conversational Agent
Instructions
In this lab, you'll build a complete RAG-powered conversational agent in Python. Your agent will ingest documents with configurable chunking, perform vector similarity search, decompose complex queries, manage conversation memory, allocate context window budgets, attribute claims to sources, and guard against hallucinations.
This covers the full lifecycle of a production RAG agent — the kind of system you'd be asked to design or implement in an agent engineer interview.
Architecture Overview
Documents → Chunking → Embeddings → Vector Store
↑
User Query → Memory Context |
↓ |
Query Decomposition (Agentic RAG) |
↓ |
Vector Search ←─────────────────────────┘
↓
Context Budget Allocation
↓
LLM Generation (with source attribution)
↓
Hallucination Guard
↓
Response to User (with citations)
Step 1: Document Store (document_store.py)
Build a DocumentStore class that ingests text documents and splits them into chunks using configurable strategies:
- Fixed-size chunking: Split text every N characters with configurable overlap
- Semantic chunking: Group consecutive sentences by embedding similarity — start a new chunk when the similarity between consecutive sentences drops below a threshold
- Recursive chunking: Split by paragraphs first, then by sentences if a paragraph exceeds the max chunk size, then by character count as a last resort
Each chunk should carry metadata: chunk_id, document_id, source_filename, chunk_index, strategy_used.
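As a starting point, the fixed-size strategy can be sketched as follows. The Chunk dataclass and fixed_size_chunks names are illustrative, not required by the rubric; your DocumentStore can organize this however you like.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    chunk_id: str
    document_id: str
    source_filename: str
    chunk_index: int
    strategy_used: str
    text: str

def fixed_size_chunks(text, document_id, source_filename, size=200, overlap=50):
    """Split text every `size` characters, overlapping consecutive chunks."""
    chunks = []
    step = size - overlap  # how far the window advances each iteration
    for i, start in enumerate(range(0, len(text), step)):
        chunks.append(Chunk(
            chunk_id=f"{document_id}-{i}",
            document_id=document_id,
            source_filename=source_filename,
            chunk_index=i,
            strategy_used="fixed",
            text=text[start:start + size],
        ))
    return chunks
```

Note that each chunk's last `overlap` characters reappear at the start of the next chunk, which helps retrieval when a relevant passage straddles a boundary.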
Step 2: Vector Search (vector_search.py)
Build a VectorSearch class that provides similarity search over document chunks:
- Accept an embedding function (callable that takes text and returns a list of floats)
- Index chunks by computing and storing their embeddings
- Search by computing the query embedding and finding the top-K most similar chunks using cosine similarity
- Support metadata filtering (e.g., filter by source_filename or document_id)
- Return results with similarity scores
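A minimal pure-Python version of the similarity search (no numpy) might look like this; the (text, embedding) pair format for the index is just one possible shape:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec, indexed, k=3):
    """indexed: list of (chunk_text, embedding) pairs; returns (score, text) pairs."""
    scored = [(cosine(query_vec, emb), text) for text, emb in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Metadata filtering fits naturally as a pre-filter over the indexed list before scoring.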
Step 3: Agentic RAG (agentic_rag.py)
Build an AgenticRAG class that goes beyond naive retrieve-and-generate:
- Query decomposition: Analyze a complex question and break it into simpler sub-queries that can each be searched independently
- Multi-query retrieval: Execute each sub-query against the vector store and merge results, removing duplicates
- Self-reflection: After retrieval, evaluate whether the retrieved chunks contain sufficient information to answer the original question. Return a quality assessment (sufficient / partial / insufficient) and optionally suggest a reformulated query
Step 4: Memory Manager (memory_manager.py)
Build a MemoryManager class that handles conversation memory:
- Short-term buffer: Store the last N conversation turns (user messages + agent responses) in a list
- Long-term summarization: When the buffer exceeds a configurable limit, compress older turns into a running summary using an LLM call (accept a summarization function as a parameter)
- Memory retrieval: Return the combined context — summary (if any) plus recent turns — formatted for injection into the LLM prompt
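A minimal sketch of the buffer-plus-summary pattern follows. The summarize_fn signature shown here (old summary plus evicted turns in, new summary out) is an assumption; any callable that folds evicted turns into a running summary works.

```python
class MemoryBuffer:
    def __init__(self, max_turns, summarize_fn):
        self.max_turns = max_turns
        self.summarize_fn = summarize_fn  # (old_summary, evicted_turns) -> str
        self.turns = []    # list of (user_message, agent_response)
        self.summary = ""

    def add_turn(self, user, agent):
        self.turns.append((user, agent))
        if len(self.turns) > self.max_turns:
            # Compress the overflow into the running summary
            evicted = self.turns[:-self.max_turns]
            self.turns = self.turns[-self.max_turns:]
            self.summary = self.summarize_fn(self.summary, evicted)

    def context(self):
        """Summary (if any) plus recent turns, formatted for the LLM prompt."""
        lines = []
        if self.summary:
            lines.append(f"Summary: {self.summary}")
        for user, agent in self.turns:
            lines.append(f"User: {user}")
            lines.append(f"Agent: {agent}")
        return "\n".join(lines)
```

In tests you can pass a trivial summarize_fn; in production it would be an LLM call.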
Step 5: Context Budget Manager (context_budget.py)
Build a ContextBudgetManager class that allocates a finite token budget across competing components:
- Accept a total token budget and allocation percentages for: system prompt, conversation memory, retrieved documents, and output reserve
- Accept a token counting function (callable that takes text and returns an integer)
- An allocate() method: given the system prompt, memory context, and retrieved chunks, trim each component to fit within its allocated budget. Trim retrieved chunks by removing the lowest-scored ones first; trim memory by summarizing or truncating from the oldest turns
- Return the final assembled context with each component within budget
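The chunk-trimming rule (drop the lowest-scored chunks until the budget fits) can be sketched as:

```python
def trim_chunks(scored_chunks, budget, count_tokens):
    """scored_chunks: list of (score, text) pairs. Drop lowest-scored chunks
    until the total token count fits within budget."""
    kept = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)
    while kept and sum(count_tokens(text) for _, text in kept) > budget:
        kept.pop()  # lowest-scored chunk is last after the sort
    return kept
```

The same shape applies to the memory component, except trimming proceeds from the oldest turns instead of the lowest scores.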
Step 6: Source Attribution (source_attribution.py)
Build a SourceAttributor class that links claims in the agent's response to source chunks:
- Accept the agent's response text and the list of source chunks that were used
- Split the response into individual claims (sentence-level)
- For each claim, compute similarity against all source chunks and assign the best-matching chunk as the source
- Assign a confidence score (the similarity value) to each attribution
- Return a list of Attribution objects: claim, source_chunk_id, source_document, confidence, relevant_excerpt
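A sentence-level attribution pass might look like the sketch below. The dict result and the pluggable similarity callable are illustrative choices; a real implementation would typically use embedding-based similarity rather than the token-overlap function used in the test.

```python
def attribute_claims(response, chunks, similarity):
    """Split a response into sentence-level claims and link each to its best chunk.

    chunks: list of (chunk_id, text) pairs.
    similarity: (claim, chunk_text) -> float, e.g. embedding cosine similarity.
    """
    claims = [s.strip() for s in response.split(". ") if s.strip()]
    attributions = []
    for claim in claims:
        best_id, best_score = None, -1.0
        for chunk_id, text in chunks:
            score = similarity(claim, text)
            if score > best_score:
                best_id, best_score = chunk_id, score
        attributions.append({"claim": claim,
                             "source_chunk_id": best_id,
                             "confidence": best_score})
    return attributions
```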
Step 7: Hallucination Guard (hallucination_guard.py)
Build a HallucinationGuard class that cross-references the agent's answer against retrieved content:
- Accept the agent's response and the retrieved source chunks
- Split the response into individual claims
- For each claim, check if it is supported by any source chunk (similarity above a configurable threshold)
- Classify each claim as: supported, partially_supported, or unsupported
- Return a GuardResult with: the list of classified claims, an overall trust score (ratio of supported claims), and a boolean is_safe flag (true if the trust score exceeds a configurable safety threshold)
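A sketch of the guard logic, using the same pluggable similarity callable as earlier steps. The cutoff for partially_supported (half the support threshold) is an arbitrary choice for illustration; pick and document your own.

```python
def guard(response, chunk_texts, similarity,
          support_threshold=0.7, safety_threshold=0.8):
    """Classify each sentence-level claim and compute an overall trust score."""
    claims = [s.strip() for s in response.split(". ") if s.strip()]
    classified = []
    supported = 0
    for claim in claims:
        # Best support found across all retrieved chunks
        best = max((similarity(claim, text) for text in chunk_texts), default=0.0)
        if best >= support_threshold:
            label = "supported"
            supported += 1
        elif best >= support_threshold / 2:  # illustrative midpoint cutoff
            label = "partially_supported"
        else:
            label = "unsupported"
        classified.append((claim, label))
    trust_score = supported / len(claims) if claims else 0.0
    return {"claims": classified, "trust_score": trust_score,
            "is_safe": trust_score >= safety_threshold}
```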
What to Submit
The editor has 7 file sections with TODO comments. Replace each TODO with your Python code. The AI grader will evaluate each section against the rubric.
Hints
- For cosine similarity, use the dot product divided by the product of magnitudes, or use numpy if you prefer
- For sentence splitting, splitting on ". " (period + space) is a reasonable starting point
- For the embedding function parameter, design your code to accept any callable with signature (str) -> List[float] — this makes it testable with mock embeddings
- For token counting, accept any callable with signature (str) -> int — a simple approximation is len(text.split())
- The hallucination guard threshold is a design choice — 0.7 is a reasonable default for cosine similarity