RAG System Design
Vector Database Selection
Choosing the right vector database is crucial for RAG system performance. This lesson covers the major options and how to select the best one for your use case.
Vector Database Landscape
┌─────────────────────────────────────────────────────────────┐
│                   Vector Database Options                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Managed Services             │  Self-Hosted                │
│  ──────────────────────       │  ─────────────────────────  │
│  • Pinecone                   │  • Milvus                   │
│  • Weaviate Cloud             │  • Qdrant                   │
│  • Zilliz Cloud               │  • Chroma                   │
│                               │  • Weaviate                 │
│  Database Extensions          │                             │
│  ──────────────────────       │  In-Memory                  │
│  • pgvector (PostgreSQL)      │  ─────────────────────────  │
│  • Atlas Vector (MongoDB)     │  • FAISS                    │
│                               │  • Annoy                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘
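Every option in the landscape above answers the same basic question: which stored vectors are closest to a query vector? The following is a minimal brute-force sketch of that operation in pure Python, assuming cosine similarity as the metric. The function names and toy vectors are illustrative and do not come from any of the listed products, which use approximate indexes rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query, vectors, top_k=2):
    """Score every stored vector and return the top_k (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional "embeddings"; real ones have hundreds or thousands of dimensions.
toy_vectors = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.9, 0.1, 0.0],
    "doc3": [0.0, 1.0, 0.0],
}

results = brute_force_search([1.0, 0.0, 0.0], toy_vectors, top_k=2)
print(results)
```

A full scan like this costs one similarity computation per stored vector, which is why scale is the first column to check when comparing databases.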
Comparison Table
| Database | Best For | Scale | Filtering | Managed |
|---|---|---|---|---|
| Pinecone | Production, ease of use | Billions | Good | Yes |
| Qdrant | Filtering, self-hosted | Billions | Excellent | Optional |
| Milvus | High performance | Billions | Good | Optional |
| pgvector | PostgreSQL users | Millions | SQL-native | Via providers |
| Weaviate | GraphQL, hybrid search | Billions | Good | Optional |
| Chroma | Prototyping, small scale | Thousands to millions | Basic | No |
Pinecone
Strengths:
- Fully managed, zero ops
- Serverless pricing option
- Simple API
from pinecone import Pinecone

# Initialize the client and connect to an existing index
pc = Pinecone(api_key="your-key")
index = pc.Index("documents")

# Upsert vectors (`embedding` is a precomputed list of floats)
index.upsert(
    vectors=[
        {
            "id": "doc1",
            "values": embedding,
            "metadata": {"source": "manual.pdf", "page": 5}
        }
    ],
    namespace="product-docs"
)

# Query with a metadata filter; include_metadata returns the stored metadata
results = index.query(
    vector=query_embedding,
    top_k=10,
    namespace="product-docs",
    filter={"source": {"$eq": "manual.pdf"}},
    include_metadata=True
)
Considerations:
- Vendor lock-in
- Costs scale with vectors stored
- Limited filtering compared to Qdrant
Qdrant
Strengths:
- Excellent filtering capabilities
- Open-source with cloud option
- Rich payload support
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchValue, PointStruct, Range, VectorParams,
)

client = QdrantClient(url="http://localhost:6333")

# Create collection (size must match the embedding model's output dimension)
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)

# Upsert (`embedding` is a precomputed list of 1536 floats)
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=embedding,
            payload={
                "source": "manual.pdf",
                "page": 5,
                "category": "technical"
            }
        )
    ]
)

# Query with complex filtering: every condition under `must` has to match.
# Note: newer qdrant-client releases deprecate `search` in favor of `query_points`.
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="technical")),
            FieldCondition(key="page", range=Range(gte=1, lte=10))
        ]
    ),
    limit=10
)
Considerations:
- Self-hosted requires DevOps
- Managed Qdrant Cloud is available, but it is a newer offering than Pinecone
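To make the filter in the Qdrant query concrete, here is a small pure-Python sketch of how a `must` clause behaves: a point passes only if every condition holds. This illustrates the semantics only and is not Qdrant's implementation; the `matches` helper and the sample payloads are hypothetical.

```python
def matches(payload, must):
    """Return True when the payload satisfies every condition in `must`."""
    for cond in must:
        value = payload.get(cond["key"])
        if value is None:
            return False  # a missing field cannot satisfy a condition
        if "match" in cond and value != cond["match"]["value"]:
            return False
        if "range" in cond:
            bounds = cond["range"]
            if "gte" in bounds and value < bounds["gte"]:
                return False
            if "lte" in bounds and value > bounds["lte"]:
                return False
    return True

must = [
    {"key": "category", "match": {"value": "technical"}},
    {"key": "page", "range": {"gte": 1, "lte": 10}},
]

payloads = [
    {"source": "manual.pdf", "page": 5, "category": "technical"},
    {"source": "manual.pdf", "page": 42, "category": "technical"},
    {"source": "faq.pdf", "page": 2, "category": "marketing"},
]

kept = [p for p in payloads if matches(p, must)]
print(kept)
```

Only the first payload passes both conditions. The second fails the page range and the third fails the category match.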
pgvector
Strengths:
- Familiar PostgreSQL
- SQL joins with vector search
- Existing infrastructure
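import psycopg2

# Connect (the DSN is a placeholder; adjust it for your environment)
conn = psycopg2.connect("dbname=rag user=postgres")
cursor = conn.cursor()

# Enable extension
cursor.execute("CREATE EXTENSION IF NOT EXISTS vector")

# Create table
cursor.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT,
        embedding vector(1536),
        metadata JSONB
    )
""")

# Create index for faster search (build it after loading data so the
# IVFFlat clusters reflect the real distribution)
cursor.execute("""
    CREATE INDEX ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100)
""")
conn.commit()

# Query: <=> is cosine distance, so 1 - distance is cosine similarity.
# psycopg2 sends a Python list as an array, hence the ::vector cast.
cursor.execute("""
    SELECT id, content, metadata,
           1 - (embedding <=> %s::vector) AS similarity
    FROM documents
    WHERE metadata->>'category' = 'technical'
    ORDER BY embedding <=> %s::vector
    LIMIT 10
""", (query_embedding, query_embedding))
rows = cursor.fetchall()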
Considerations:
- Limited scale (millions, not billions)
- Index build time grows with dataset size
- Great for small to medium applications
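The `lists = 100` setting in the index above is a tuning choice. As a rough starting point, the pgvector documentation suggests `rows / 1000` lists for up to 1M rows and `sqrt(rows)` beyond that, with `sqrt(lists)` probes at query time. The helper below is a hypothetical sketch of that rule of thumb, not part of pgvector itself.

```python
import math

def suggest_ivfflat_params(row_count):
    """Return a (lists, probes) starting point for an IVFFlat index."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = round(math.sqrt(row_count))
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes

small = suggest_ivfflat_params(100_000)
large = suggest_ivfflat_params(4_000_000)
print(small, large)
```

By this heuristic, `lists = 100` fits a table of roughly 100,000 rows. The probe count is applied per session with `SET ivfflat.probes = ...`; higher values trade query speed for recall, so treat both numbers as values to benchmark rather than fixed answers.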
Selection Framework
Decision Tree
Start
│
├─ Need billions of vectors?
│   ├─ Yes ──▶ Pinecone or Milvus
│   └─ No ───▶ Continue
│
├─ Need complex filtering?
│   ├─ Yes ──▶ Qdrant
│   └─ No ───▶ Continue
│
├─ Already using PostgreSQL?
│   ├─ Yes ──▶ pgvector
│   └─ No ───▶ Continue
│
├─ Need zero ops?
│   ├─ Yes ──▶ Pinecone
│   └─ No ───▶ Continue
│
└─ Prototyping only?
    ├─ Yes ──▶ Chroma
    └─ No ───▶ Qdrant or Milvus
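One way to read the decision tree is as a chain of checks evaluated in order, where the first "yes" wins. The function below is a sketch of that reading; the function name, parameter names, and the final fall-through value are assumptions made for illustration.

```python
def recommend_vector_db(
    needs_billions=False,
    needs_complex_filtering=False,
    uses_postgres=False,
    needs_zero_ops=False,
    prototyping_only=False,
):
    """Walk the questions in order and return the first matching suggestion."""
    if needs_billions:
        return "Pinecone or Milvus"
    if needs_complex_filtering:
        return "Qdrant"
    if uses_postgres:
        return "pgvector"
    if needs_zero_ops:
        return "Pinecone"
    if prototyping_only:
        return "Chroma"
    # Assumed fall-through: self-hosted production use
    return "Qdrant or Milvus"

print(recommend_vector_db(uses_postgres=True))
print(recommend_vector_db(needs_billions=True, uses_postgres=True))
print(recommend_vector_db(prototyping_only=True))
```

Because the checks are ordered, scale outranks everything else: a team already on PostgreSQL that needs billions of vectors is still steered toward Pinecone or Milvus.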
Cost Comparison (100M vectors)
| Provider | Monthly Cost (USD, approx.) | Notes |
|---|---|---|
| Pinecone | $700-2000 | Serverless or pod-based |
| Qdrant Cloud | $500-1500 | Based on cluster size |
| Self-hosted | $200-500 | Compute + storage |
| pgvector | $100-300 | An existing PostgreSQL instance may suffice |
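Much of the cost at this scale comes from simply holding the vectors. The back-of-the-envelope calculation below assumes 1536-dimensional embeddings (the size used in this lesson's examples) stored as 4-byte floats; it ignores index structures, metadata, and replication, all of which add to the total. The helper name is hypothetical.

```python
def raw_vector_storage_gb(num_vectors, dimensions, bytes_per_value=4):
    """Raw size of the vectors alone, in decimal gigabytes."""
    return num_vectors * dimensions * bytes_per_value / 1e9

storage_gb = raw_vector_storage_gb(100_000_000, 1536)
print(f"{storage_gb:.1f} GB")
```

Under these assumptions, 100M vectors occupy about 614 GB before any index overhead, which helps explain why pricing tracks the number of vectors stored and why smaller embedding dimensions or quantization reduce cost.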
Interview Tip
When discussing vector databases, always mention:
- Scale requirements: millions vs. billions of vectors
- Filtering needs: metadata queries
- Operational complexity: managed vs. self-hosted
- Cost at scale: show that you understand the economics
Next, we'll explore hybrid retrieval strategies that combine dense and sparse retrieval.