Memory & Knowledge

RAG Integration

Retrieval-Augmented Generation (RAG) gives agents access to external knowledge beyond their training data. It's essential for building agents that need current, specific, or proprietary information.

How RAG Works

User Query → Embed → Search Vector DB → Retrieve Docs → Augment Prompt → Generate
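To make the pipeline above concrete, here is a minimal, dependency-free sketch of the first five stages. It assumes a toy bag-of-words "embedding" and an in-memory list in place of a real embedding model and vector database; the function names (embed, cosine, retrieve, augment) are illustrative and not part of any library.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a sparse term-count vector.
    Assumption: a real system would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    """Embed the query, score every document, and return the top-k texts."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def augment(query, retrieved):
    """Build the prompt that would be sent to the LLM in the final stage."""
    context = "\n".join(f"- {d}" for d in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above:"

docs = [
    "Refunds are processed within 14 days of the return request.",
    "Our office is closed on public holidays.",
    "Employees accrue 1.5 vacation days per month.",
]
top = retrieve("How long do refunds take?", docs, k=1)
prompt = augment("How long do refunds take?", top)
```

The last stage (Generate) is simply a call to your LLM with prompt as input.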

Basic RAG Implementation

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# 1. Create the embedding model and the chat model (both read OPENAI_API_KEY)
embeddings = OpenAIEmbeddings()
llm = ChatOpenAI()

# 2. Load and chunk documents
# load_documents and split_into_chunks are placeholders for your own loader
# and splitter; they should return LangChain Document objects.
documents = load_documents("./knowledge_base/")
chunks = split_into_chunks(documents, chunk_size=500)

# 3. Create vector store
vectorstore = Chroma.from_documents(chunks, embeddings)

# 4. Create retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}  # Return top 5 matches
)

# 5. RAG chain
def rag_query(question):
    # Retrieve relevant docs (invoke replaces the deprecated get_relevant_documents)
    docs = retriever.invoke(question)
    context = "\n\n".join(doc.page_content for doc in docs)

    # Generate with context
    prompt = f"""Context:
{context}

Question: {question}

Answer based only on the context provided:"""
    response = llm.invoke(prompt)
    return response.content  # Return the answer text, not the message object
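The snippet above calls load_documents and split_into_chunks without defining them. One possible shape for these helpers is sketched below, assuming plain .txt files and fixed-size chunks measured in characters. The Doc class is a hypothetical stand-in that mirrors the page_content and metadata attributes of a LangChain Document; in a real pipeline you would construct actual Document objects instead.

```python
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (same attribute names)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def load_documents(directory):
    """Read every .txt file under `directory` into a Doc, recording its path."""
    docs = []
    for path in sorted(Path(directory).rglob("*.txt")):
        text = path.read_text(encoding="utf-8")
        docs.append(Doc(page_content=text, metadata={"source": str(path)}))
    return docs

def split_into_chunks(documents, chunk_size=500):
    """Cut each document into consecutive pieces of at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    for doc in documents:
        text = doc.page_content
        for start in range(0, len(text), chunk_size):
            chunks.append(Doc(
                page_content=text[start:start + chunk_size],
                # Carry the source forward so answers can cite it later
                metadata={**doc.metadata, "start": start},
            ))
    return chunks
```

Keeping the source path in each chunk's metadata costs almost nothing at indexing time and makes source attribution possible later.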

RAG for Agents

Integrate RAG as a tool for your agent:

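from langchain_core.tools import Tool

# Create RAG tool
rag_tool = Tool(
    name="knowledge_base",
    description="Search the company knowledge base for policies, procedures, and documentation",
    func=rag_query
)

# Add to agent
# create_agent stands in for your framework's agent constructor, and
# other_tools for a list of whatever additional tools the agent should have.
agent = create_agent(
    llm=llm,
    tools=[rag_tool, *other_tools]
)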
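Conceptually, a tool is a name, a description the model reads when choosing what to call, and a function that takes and returns text. The sketch below illustrates that idea without any framework. SimpleTool, fake_rag_query, tool_menu, and run_tool are all hypothetical names, and real agent frameworks handle this dispatch for you.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SimpleTool:
    """Hypothetical minimal tool record: what the agent needs to know and call."""
    name: str
    description: str
    func: Callable[[str], str]

def fake_rag_query(question: str) -> str:
    """Placeholder for the rag_query function from the basic implementation."""
    return f"[knowledge base answer for: {question}]"

TOOLS = {
    t.name: t
    for t in [
        SimpleTool(
            name="knowledge_base",
            description="Search the company knowledge base for policies, procedures, and documentation",
            func=fake_rag_query,
        )
    ]
}

def tool_menu() -> str:
    """Text the agent's LLM would see when deciding which tool to call."""
    return "\n".join(f"{t.name}: {t.description}" for t in TOOLS.values())

def run_tool(name: str, tool_input: str) -> str:
    """Dispatch a tool call chosen by the model; unknown names yield an error string."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"Error: unknown tool '{name}'"
    return tool.func(tool_input)
```

Because the model selects tools from their descriptions alone, a specific description ("policies, procedures, and documentation") tends to matter more than the tool's name.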

Chunking Strategies

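| Strategy | Best For | Chunk Size |
| --- | --- | --- |
| Fixed size | General text | 500-1000 tokens |
| Semantic | Technical docs | Variable |
| Sentence | Conversational | 1-3 sentences |
| Document | Short files | Full document |

# Structure-aware chunking example: the splitter tries paragraph, line,
# sentence, and word boundaries in turn. With the default length function,
# chunk_size and chunk_overlap are counted in characters, not tokens.
from langchain_text_splitters import RecursiveCharacterTextSplitter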

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,  # Overlap prevents losing context at boundaries
    separators=["\n\n", "\n", ". ", " "]
)
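The "Sentence" row of the table can be sketched without any library. This is a minimal illustration that assumes sentences end with ".", "!", or "?" followed by whitespace, which will mis-split abbreviations such as "e.g."; the function name and parameters are hypothetical.

```python
import re

def sentence_chunks(text, sentences_per_chunk=2, overlap=0):
    """Group consecutive sentences into chunks, optionally sharing
    `overlap` sentences between neighbouring chunks."""
    if not 0 <= overlap < sentences_per_chunk:
        raise ValueError("overlap must be >= 0 and smaller than sentences_per_chunk")
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    step = sentences_per_chunk - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + sentences_per_chunk]))
        if start + sentences_per_chunk >= len(sentences):
            break  # The last sentence is already covered
    return chunks

text = ("RAG retrieves documents. It adds them to the prompt. "
        "The model then answers. Sources can be cited.")
no_overlap = sentence_chunks(text, sentences_per_chunk=2)
with_overlap = sentence_chunks(text, sentences_per_chunk=2, overlap=1)
```

With overlap=1 each chunk repeats the last sentence of the previous one, which is the sentence-level analogue of chunk_overlap in the splitter above.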

Advanced RAG Techniques

Hybrid Search

Combine semantic and keyword search:

from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Keyword search (BM25Retriever requires the rank_bm25 package).
# Index the same chunks as the vector store so both retrievers return comparable units.
bm25 = BM25Retriever.from_documents(chunks)

# Semantic search
semantic = vectorstore.as_retriever()

# Combine both
hybrid = EnsembleRetriever(
    retrievers=[bm25, semantic],
    weights=[0.3, 0.7]  # Weight semantic higher
)
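To see what the weights do, it helps to look at one common way of merging ranked lists: weighted Reciprocal Rank Fusion (RRF). LangChain's documentation describes EnsembleRetriever as using RRF, but the sketch below is an independent illustration and not the library's code; the function name and the constant c=60 are assumptions.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists of document ids (best first).

    Each list contributes weight / (c + rank) to a document's score,
    so a document ranked highly by a heavily weighted retriever wins.
    """
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

bm25_ranking = ["a", "b", "c"]       # keyword search results
semantic_ranking = ["b", "d", "a"]   # vector search results
fused = weighted_rrf([bm25_ranking, semantic_ranking], weights=[0.3, 0.7])
```

Here "b" ends up first because the higher-weighted semantic list ranks it at the top, while "a" stays close behind because both lists contain it.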

Re-ranking

Improve relevance with a second pass:

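from cohere import Client

def rerank_results(query, documents):
    client = Client(api_key="...")
    response = client.rerank(
        model="rerank-english-v3.0",  # Example model name; use one available to your account
        query=query,
        documents=[doc.page_content for doc in documents],
        top_n=3
    )
    # Each result carries the index of the original document and a relevance score
    return [documents[r.index] for r in response.results]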
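The two-pass pattern can be tried offline by swapping the hosted re-ranker for any scoring function. The sketch below uses a toy term-overlap scorer purely as a stand-in; it is far weaker than a trained re-ranking model, and both function names are hypothetical.

```python
import re

def overlap_score(query, text):
    """Toy relevance score: fraction of query terms that appear in the text."""
    q_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    t_terms = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0

def second_pass(query, candidates, scorer=overlap_score, top_n=3):
    """Re-score first-pass candidates and keep the best top_n.
    Ties keep their first-pass order."""
    scored = [(scorer(query, text), i) for i, text in enumerate(candidates)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidates[i] for _, i in scored[:top_n]]

candidates = [
    "Holiday schedule for 2024",
    "Refund policy: refunds take 14 days",
    "How refunds are taxed",
]
best = second_pass("refund policy days", candidates, top_n=2)
```

The usual setup retrieves a generous first-pass set (for example 20 to 50 chunks) cheaply, then spends the more expensive scorer only on those candidates.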

RAG Best Practices

Do:

  • Use overlapping chunks
  • Include metadata (source, date)
  • Implement relevance filtering
  • Update index regularly
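Two of the practices above, relevance filtering and metadata for source attribution, fit in a few lines. This sketch assumes each retrieved chunk arrives as a (text, metadata, score) tuple where a higher score means more relevant. Some vector stores return distances instead, where lower is better, so check which convention yours uses. The function name and the 0.5 threshold are illustrative.

```python
def filter_and_cite(scored_chunks, min_score=0.5):
    """Drop weak matches, then number the survivors so the answer can cite them.

    Returns the context string for the prompt and a parallel list of sources.
    """
    kept = [item for item in scored_chunks if item[2] >= min_score]
    kept.sort(key=lambda item: item[2], reverse=True)
    context_lines, sources = [], []
    for n, (text, meta, _score) in enumerate(kept, start=1):
        context_lines.append(f"[{n}] {text}")
        sources.append(f"[{n}] {meta.get('source', 'unknown')} ({meta.get('date', 'n.d.')})")
    return "\n".join(context_lines), sources

scored = [
    ("Refunds take 14 days.", {"source": "refunds.md", "date": "2024-01-10"}, 0.82),
    ("Office closed on holidays.", {"source": "hours.md"}, 0.21),
    ("Refund requests need a receipt.", {"source": "refunds.md", "date": "2023-11-02"}, 0.64),
]
context, sources = filter_and_cite(scored, min_score=0.5)
```

If no chunk clears the threshold, the context comes back empty, which is a useful signal to have the agent say it does not know instead of answering from weak matches.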

Don't:

  • Chunk too small (loses context)
  • Chunk too large (noise)
  • Ignore source attribution
  • Skip evaluation
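Evaluation does not need to be elaborate to be useful. A minimal sketch, assuming you have a small hand-labelled set of (question, expected source) pairs and a retrieval function that returns ranked source ids, is the hit rate at k: the share of questions whose expected source appears in the top k results. All names here are hypothetical.

```python
def hit_rate_at_k(eval_set, retrieve_fn, k=5):
    """Fraction of questions whose expected source is among the top-k results."""
    if not eval_set:
        return 0.0
    hits = sum(1 for question, expected in eval_set
               if expected in retrieve_fn(question)[:k])
    return hits / len(eval_set)

# Fake retriever for illustration: maps each question to a ranked list of sources
fake_results = {
    "q1": ["a", "b"],
    "q2": ["c", "a"],
    "q3": ["d"],
    "q4": [],
}
eval_set = [("q1", "a"), ("q2", "a"), ("q3", "x"), ("q4", "a")]

rate_at_5 = hit_rate_at_k(eval_set, fake_results.get, k=5)
rate_at_1 = hit_rate_at_k(eval_set, fake_results.get, k=1)
```

Re-running the same evaluation set after changing chunk size, overlap, or retriever weights shows whether the change helped retrieval or only felt like it did.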

Next, we'll explore different memory types for maintaining agent state.
