Lesson 1 of 24

RAG Architecture Deep Dive

Beyond Basic RAG

6 min read

The outcome first — what you'll build in the capstone

Before we talk theory, here's what the end of this course looks like:

$ curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"q": "What did I decide about Q3 in my March planning notes?"}'

{
  "answer": "Per your 2026-03-14 notes, Q3 priorities are: (1) ship MCP integration [cite:1],
             (2) launch the Arabic site [cite:2], (3) cut infra 30% [cite:1].",
  "citations": [
    {"id": 1, "source": "meetings/2026-03-14-planning.md"},
    {"id": 2, "source": "meetings/2026-03-14-planning.md"}
  ]
}
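Before any of that exists, it helps to see the shape of the service. Below is a minimal FastAPI sketch of the /ask endpoint with retrieval and generation stubbed out. The response fields mirror the example above; the stubbed chunk and placeholder answer are illustrative only, not the capstone's real implementation.

# Minimal sketch of the /ask endpoint's shape. Retrieval and generation are
# stubbed placeholders here; the real versions are built in Modules 2-6.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AskRequest(BaseModel):
    q: str

@app.post("/ask")
def ask(req: AskRequest):
    # Placeholder retrieval: the capstone swaps this for hybrid search + rerank.
    chunks = [{"id": 1, "source": "meetings/2026-03-14-planning.md"}]
    # Placeholder generation: the capstone swaps this for a grounded LLM call.
    answer = f"(stub) Answer to: {req.q} [cite:1]"
    return {
        "answer": answer,
        "citations": [{"id": c["id"], "source": c["source"]} for c in chunks],
    }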

Every improvement you learn across the next six modules — better chunking, hybrid search, reranking, RAGAS evaluation, monitoring — makes this service answer more accurately over your actual documents. Not a toy.

Why "basic RAG" fails in production

The naive recipe looks elegant:

def naive_rag(query: str):
    # Assumes a pre-built `vectorstore` and `llm` client are in scope.
    # Embed the query and take the 4 nearest chunks -- no filtering, no reranking.
    docs = vectorstore.similarity_search(query, k=4)
    # Concatenate whatever came back, relevant or not.
    context = "\n".join(d.page_content for d in docs)
    # Single LLM call with the raw context stuffed into the prompt.
    return llm.invoke(f"Context: {context}\n\nQuestion: {query}")

Run this on 200 real documents and you'll hit all four canonical failures within the first hour:

  1. Query–document mismatch. Users phrase questions in natural language ("What did I decide about Q3?"). Documents are written in declarative prose ("Q3 priorities: MCP integration, Arabic launch, 30% infra cut"). Cosine similarity between the two can be surprisingly low.
  2. Irrelevant chunks pollute context. If 2 of your 4 retrieved chunks are about Q2 (not Q3), the LLM often prefers the majority and confabulates.
  3. No verification of retrieval quality. Basic RAG returns the top-k whether or not any of them are actually relevant. "What's the capital of Brazil?" over a docs folder about your company still returns 4 company docs — the LLM then either hallucinates or awkwardly says "I don't know." A minimal fix is sketched just after this list.
  4. Fixed retrieval regardless of complexity. "Who owns the payments service?" needs 1 chunk. "Compare our Q2 and Q3 priorities across engineering and marketing" needs 8 chunks from 3 different docs. Top-k=4 is wrong for both.
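Failure 3 in particular has a cheap mitigation worth seeing early. The sketch below gates retrieval on a relevance score; the retriever object, its search return shape, and the 0.75 threshold are all assumptions for illustration, not a specific library's API.

def retrieve_with_gate(query: str, retriever, k: int = 4, min_score: float = 0.75):
    # Assumes retriever.search returns (chunk_text, similarity) pairs with
    # similarity in [0, 1], higher = more relevant. Both details are illustrative.
    scored = retriever.search(query, k=k)
    relevant = [text for text, score in scored if score >= min_score]
    # Returning None lets the caller say "I don't have that" instead of
    # handing the LLM four confidently irrelevant chunks.
    return relevant or None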

The three generations of RAG

Generation | What it adds | When to use it
--- | --- | ---
Naive RAG | Embedding search + LLM call | Demos, prototypes, < 50 documents
Advanced RAG | Query rewriting, hybrid search, reranking, grounded generation | Production with 100–100K documents — this course
Agentic RAG | Multi-step retrieval, self-correction, tool use, adaptive k | Complex analytical queries, cross-domain research agents

The honest truth most tutorials skip: Agentic RAG usually isn't what you need. Most business questions ("what's our refund policy," "what did the customer say about X") get answered well by properly-tuned Advanced RAG at a fraction of the latency and cost.
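To make the middle row of that table concrete, here is the stage order of Advanced RAG as a runnable skeleton. Every function is a deliberately trivial placeholder; the real versions are what Modules 2 through 6 build, and none of the names map to a real library.

# Stage order of Advanced RAG. Each function is a trivial placeholder --
# the real implementations are what the rest of the course builds.

def rewrite_query(query: str) -> str:
    # Module 4: rephrase conversational questions into retrieval-friendly text.
    return query

def hybrid_search(query: str, k: int = 20) -> list[str]:
    # Module 4: BM25 + vector search, fused into one candidate list.
    return [f"candidate chunk {i}" for i in range(k)]

def rerank(query: str, candidates: list[str], k: int = 5) -> list[str]:
    # Module 4: score candidates against the query, keep only the best k.
    return candidates[:k]

def grounded_answer(query: str, chunks: list[str]) -> str:
    # Module 6: answer only from the retained chunks, with citations.
    return f"Answer to {query!r}, grounded in {len(chunks)} chunks."

def advanced_rag(query: str) -> str:
    return grounded_answer(query, rerank(query, hybrid_search(rewrite_query(query))))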

This course's focus

You're going to build Advanced RAG — the production-ready middle. Module by module:

Module | What you learn | What you ship at the end of the module
--- | --- | ---
1 (this one) | Why naive fails, RAG vs fine-tune, pipeline, failure modes | A mental model of where quality actually lives
2 | Embedding choice + vector DB choice | Indexed corpus on Supabase pgvector with 3072-dim embeddings
3 | Chunking that preserves meaning | Your corpus rechunked so answers land on clean boundaries
4 | Hybrid search + reranking | A hybrid retriever with BM25 + vector fusion + LLM rerank
5 | RAGAS evaluation + test datasets | A test set + numeric scores for your system
6 | Production hardening + capstone | The full FastAPI RAG service, deployed, with citations

Key insight

Most RAG failures aren't model problems — they're retrieval problems. Master retrieval, and answer quality follows.

Swap Claude Sonnet 4.6 for GPT-5 — your answer quality barely moves. Swap your naive top-k retrieval for hybrid search with reranking — your answer quality goes from 60% helpful to 90%.
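"Hybrid search" concretely means fusing two ranked lists, keyword (BM25) and vector, into one. A common way to do that is reciprocal rank fusion, sketched below with made-up document IDs; k=60 is the conventional constant, and Module 4 builds the real retrievers that would feed it.

# Reciprocal rank fusion: merge ranked lists of doc IDs into one ranking.
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document scores higher the closer it sits to the top of each list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative doc IDs only:
bm25_hits = ["doc_q3_planning", "doc_q2_review", "doc_budget"]
vector_hits = ["doc_q3_planning", "doc_budget", "doc_offsite_notes"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# doc_q3_planning ranks first: it sits near the top of both lists.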

Build checkpoint — do this before the next lesson

You don't need to code yet, but you DO need to pick:

  1. What corpus will you RAG over? Your Notion export? Your company wiki? A folder of meeting notes? Pick now, not later. Roughly 20–200 documents is the sweet spot for the first build.
  2. Where do those documents live? If they're scattered across Slack/email/etc., spend 15 minutes exporting a clean folder of .md / .pdf files. Quality of your RAG is bounded by what you feed it.
  3. Write down 5 questions you'd actually want answered from that corpus. You'll use these to evaluate every module's improvement. Having a ground truth matters more than any technique you'll learn; a tiny sketch for capturing them follows this list.
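If it helps, capture those questions in a small file now so every module can be scored against the same ground truth. The filename and fields below are just suggestions; the one example question is lifted from the demo at the top of this lesson.

# Save your 5 evaluation questions in a reusable file. Filename and fields
# are suggestions only; adapt them to your own corpus.
import json

eval_questions = [
    {"question": "What did I decide about Q3 in my March planning notes?",
     "expected_source": "meetings/2026-03-14-planning.md"},
    # ...add four more drawn from your own documents
]

with open("eval_questions.json", "w") as f:
    json.dump(eval_questions, f, indent=2)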

Next: RAG vs Fine-tuning — when each wins, and when the right answer is both.
