RAG Integration
Retrieval-Augmented Generation (RAG) gives agents access to external knowledge beyond their training data. It's essential for building agents that need current, specific, or proprietary information.
How RAG Works
User Query → Embed → Search Vector DB → Retrieve Docs → Augment Prompt → Generate
Basic RAG Implementation
```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = ChatOpenAI()

# 1. Create embeddings
embeddings = OpenAIEmbeddings()

# 2. Load and chunk documents
documents = DirectoryLoader("./knowledge_base/").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# 3. Create vector store
vectorstore = Chroma.from_documents(chunks, embeddings)

# 4. Create retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5},  # Return top 5 matches
)

# 5. RAG chain
def rag_query(question: str) -> str:
    # Retrieve relevant docs
    docs = retriever.invoke(question)
    context = "\n\n".join(doc.page_content for doc in docs)

    # Generate with context
    response = llm.invoke(
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer based on the context provided:"
    )
    return response.content
```
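A quick smoke test, assuming the knowledge base above has been indexed (the question is illustrative):

```python
# Ask a question that should be answerable from the indexed documents
print(rag_query("What is our refund policy?"))
```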
RAG for Agents
Integrate RAG as a tool for your agent:
```python
from langchain.tools import Tool

# Create RAG tool
rag_tool = Tool(
    name="knowledge_base",
    description="Search the company knowledge base for policies, procedures, and documentation",
    func=rag_query,
)

# Add to agent alongside your other tools
agent = create_agent(
    llm=llm,
    tools=[rag_tool, *other_tools],
)
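```

The description does real work here: a tool-calling agent decides whether to query knowledge_base from the name and description alone, so state clearly what the knowledge base covers. Assuming create_agent returns a standard LangChain AgentExecutor (a sketch, not a guaranteed API), usage looks like:

```python
# The agent decides on its own when to consult the knowledge base
result = agent.invoke({"input": "What is the parental leave policy?"})
print(result["output"])
```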
Chunking Strategies
| Strategy | Best For | Chunk Size |
|---|---|---|
| Fixed size | General text | 500-1000 tokens |
| Semantic | Technical docs | Variable |
| Sentence | Conversational | 1-3 sentences |
| Document | Short files | Full document |
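For the table's first row, a minimal fixed-size splitter in LangChain (parameters are illustrative; note that LangChain counts characters by default, not tokens):

```python
from langchain.text_splitter import CharacterTextSplitter

# Fixed-size chunking: split on one separator into ~500-character pieces
fixed = CharacterTextSplitter(separator="\n", chunk_size=500, chunk_overlap=50)
fixed_chunks = fixed.split_documents(documents)
```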
```python
# Recursive chunking: split on structure first (paragraphs, then lines,
# then sentences) -- a strong default for technical docs
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,  # Overlap prevents losing context at boundaries
    separators=["\n\n", "\n", ". ", " "],
)
```
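For genuinely semantic chunking (the table's second row), LangChain's experimental SemanticChunker splits where embedding similarity between adjacent sentences drops; a minimal sketch:

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

# Splits where meaning shifts, so chunk sizes vary with the content
semantic_splitter = SemanticChunker(OpenAIEmbeddings())
semantic_chunks = semantic_splitter.split_documents(documents)
```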
Advanced RAG Techniques
Hybrid Search
Combine semantic and keyword search:
```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Keyword search over the same chunks the vector store indexes
bm25 = BM25Retriever.from_documents(chunks)

# Semantic search
semantic = vectorstore.as_retriever()

# Combine both
hybrid = EnsembleRetriever(
    retrievers=[bm25, semantic],
    weights=[0.3, 0.7],  # Weight semantic higher
)
```
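EnsembleRetriever fuses the two ranked lists with weighted reciprocal rank fusion, and usage is the same as any retriever (the query is illustrative):

```python
# Returns chunks that score well under either keyword or semantic search
docs = hybrid.invoke("What is the expense reimbursement limit?")
```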
Re-ranking
Improve relevance with a second pass:
```python
import cohere

def rerank_results(query, documents):
    co = cohere.Client(api_key="...")
    response = co.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=[doc.page_content for doc in documents],
        top_n=3,
    )
    # rerank returns indices into the input list, ordered by relevance
    return [documents[r.index] for r in response.results]
```
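A common pattern is to over-retrieve cheaply, then let the re-ranker pick the best few; a sketch reusing the vector store from earlier (k=20 and the question are illustrative):

```python
question = "How do I submit an expense report?"

# Cast a wide net with cheap retrieval, then re-rank for precision
candidates = vectorstore.as_retriever(search_kwargs={"k": 20}).invoke(question)
top_docs = rerank_results(question, candidates)
```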
RAG Best Practices
✅ Do:
- Use overlapping chunks
- Include metadata (source, date)
- Implement relevance filtering (see the sketch after these lists)
- Update the index regularly
❌ Don't:
- Chunk too small (loses context)
- Chunk too large (buries the answer in noise)
- Ignore source attribution
- Skip evaluation
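For the filtering items above, two sketches using Chroma-backed retrievers (the threshold and the metadata field are illustrative):

```python
# Relevance filtering: drop weak matches below a similarity threshold
filtered = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.7, "k": 5},
)

# Metadata filtering: restrict retrieval to a specific source document
scoped = vectorstore.as_retriever(
    search_kwargs={"k": 5, "filter": {"source": "hr_handbook.pdf"}},
)
```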
Next, we'll explore different memory types for maintaining agent state.