Lesson 10 of 23

RAG System Design

Vector Database Selection

5 min read

Choosing the right vector database is crucial for RAG system performance. This lesson covers the major options and how to select the best one for your use case.

Vector Database Landscape

┌─────────────────────────────────────────────────────────────┐
│                Vector Database Options                       │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Managed Services          │    Self-Hosted                 │
│  ──────────────────────    │    ─────────────────────────   │
│  • Pinecone                │    • Milvus                    │
│  • Weaviate Cloud          │    • Qdrant                    │
│  • Zilliz Cloud            │    • Chroma                    │
│                            │    • Weaviate                  │
│  Database Extensions       │                                │
│  ──────────────────────    │    In-Memory                   │
│  • pgvector (PostgreSQL)   │    ─────────────────────────   │
│  • Atlas Vector (MongoDB)  │    • FAISS                     │
│                            │    • Annoy                     │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Comparison Table

Database   Best For                   Scale       Filtering    Managed
Pinecone   Production, ease of use    Billions    Good         Yes
Qdrant     Filtering, self-hosted     Billions    Excellent    Optional
Milvus     High performance           Billions    Good         Optional
pgvector   PostgreSQL users           Millions    SQL-native   Via providers
Weaviate   GraphQL, hybrid search     Billions    Good         Optional
Chroma     Prototyping, small scale   Thousands   Basic        No

Pinecone

Strengths:

  • Fully managed, zero ops
  • Serverless pricing option
  • Simple API
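
The snippets in this lesson assume `embedding` and `query_embedding` already exist as vectors from an embedding model. A minimal sketch using the OpenAI embeddings API (text-embedding-3-small returns 1536-dimensional vectors, matching the index sizes used below; substitute whichever model your pipeline uses):

from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    # One call per text here; pass a list as `input` to batch in production
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

embedding = embed("Chapter 5 covers troubleshooting steps for the device.")
query_embedding = embed("How do I reset the device?")
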
from pinecone import Pinecone

# Initialize
pc = Pinecone(api_key="your-key")
index = pc.Index("documents")

# Upsert vectors
index.upsert(
    vectors=[
        {
            "id": "doc1",
            "values": embedding,
            "metadata": {"source": "manual.pdf", "page": 5}
        }
    ],
    namespace="product-docs"
)

# Query
results = index.query(
    vector=query_embedding,
    top_k=10,
    namespace="product-docs",
    filter={"source": {"$eq": "manual.pdf"}}
)

Considerations:

  • Vendor lock-in
  • Costs scale with vectors stored
  • Limited filtering compared to Qdrant

Qdrant

Strengths:

  • Excellent filtering capabilities
  • Open-source with cloud option
  • Rich payload support

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct,
    Filter, FieldCondition, MatchValue, Range,
)

client = QdrantClient(url="http://localhost:6333")

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)

# Upsert
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=embedding,
            payload={
                "source": "manual.pdf",
                "page": 5,
                "category": "technical"
            }
        )
    ]
)

# Query with complex filtering
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="technical")),
            FieldCondition(key="page", range=Range(gte=1, lte=10)),
        ]
    ),
    limit=10
)

Considerations:

  • Self-hosted requires DevOps
  • Cloud option available but newer

pgvector

Strengths:

  • Familiar PostgreSQL
  • SQL joins with vector search
  • Existing infrastructure

import psycopg2
from pgvector.psycopg2 import register_vector

# Connect and enable the extension (DSN shown is a placeholder)
conn = psycopg2.connect("dbname=rag user=postgres")
cursor = conn.cursor()
cursor.execute("CREATE EXTENSION IF NOT EXISTS vector")
conn.commit()

# Register the vector type so embeddings (numpy arrays) can be bound as parameters
register_vector(conn)

# Create table
cursor.execute("""
    CREATE TABLE documents (
        id SERIAL PRIMARY KEY,
        content TEXT,
        embedding vector(1536),
        metadata JSONB
    )
""")

# Create index for faster search
cursor.execute("""
    CREATE INDEX ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100)
""")

# Query (pass query_embedding as a numpy array so register_vector can adapt it)
cursor.execute("""
    SELECT id, content, metadata,
           1 - (embedding <=> %s) AS similarity
    FROM documents
    WHERE metadata->>'category' = 'technical'
    ORDER BY embedding <=> %s
    LIMIT 10
""", (query_embedding, query_embedding))

Considerations:

  • Limited scale (millions, not billions)
  • Index build time on large datasets
  • Great for small to medium applications

Selection Framework

Decision Tree

Start
  ├─ Need billions of vectors?
  │     ├─ Yes ──▶ Pinecone or Milvus
  │     └─ No ───▶ Continue
  ├─ Need complex filtering?
  │     ├─ Yes ──▶ Qdrant
  │     └─ No ───▶ Continue
  ├─ Already using PostgreSQL?
  │     ├─ Yes ──▶ pgvector
  │     └─ No ───▶ Continue
  ├─ Need zero ops?
  │     ├─ Yes ──▶ Pinecone
  │     └─ No ───▶ Qdrant or Milvus
  └─ Prototyping only?
        ├─ Yes ──▶ Chroma
        └─ No ───▶ Qdrant
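
If it helps to think about this in code, here is a rough sketch of the same decision tree as a Python function. The return values simply mirror the tree above and are simplifications, not hard rules:

def pick_vector_db(
    num_vectors: int,
    needs_complex_filtering: bool,
    uses_postgres: bool,
    needs_zero_ops: bool,
    prototyping: bool,
) -> str:
    # Each branch corresponds to one question in the decision tree
    if num_vectors >= 1_000_000_000:
        return "Pinecone or Milvus"
    if needs_complex_filtering:
        return "Qdrant"
    if uses_postgres:
        return "pgvector"
    if needs_zero_ops:
        return "Pinecone"
    if prototyping:
        return "Chroma"
    return "Qdrant or Milvus"

print(pick_vector_db(50_000_000, False, True, False, False))  # pgvector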

Cost Comparison (100M vectors)

Provider       Monthly Cost   Notes
Pinecone       $700-2000      Serverless or pod-based
Qdrant Cloud   $500-1500      Based on cluster size
Self-hosted    $200-500       Compute + storage
pgvector       $100-300       Existing DB may work
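
These figures shift with pricing models, but the main cost driver is memory and storage. A quick back-of-envelope sizing for the 100M-vector scenario, assuming 1536-dimensional float32 embeddings (indexes and metadata add overhead on top of this):

num_vectors = 100_000_000
dimensions = 1536
bytes_per_float = 4  # float32

raw_gb = num_vectors * dimensions * bytes_per_float / 1e9
print(f"Raw vector data: {raw_gb:.0f} GB")  # ~614 GB before index overhead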

Interview Tip

When discussing vector databases, always mention:

  1. Scale requirements - millions vs billions
  2. Filtering needs - metadata queries
  3. Operational complexity - managed vs self-hosted
  4. Cost at scale - show you understand economics

Next, we'll explore hybrid retrieval strategies that combine dense and sparse retrieval.
