Lesson 10 of 23
RAG System Design

Vector Database Selection


The vector database you choose affects a RAG system's retrieval latency, how precisely it can filter by metadata, what it costs to run, and how far it can scale. This lesson compares the major options and gives a framework for choosing among them.

Vector Database Landscape

┌─────────────────────────────────────────────────────────────┐
│                   Vector Database Options                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Managed Services          │    Self-Hosted                 │
│  ──────────────────────    │    ─────────────────────────   │
│  • Pinecone                │    • Milvus                    │
│  • Weaviate Cloud          │    • Qdrant                    │
│  • Zilliz Cloud            │    • Chroma                    │
│                            │    • Weaviate                  │
│  Database Extensions       │                                │
│  ──────────────────────    │    In-Memory                   │
│  • pgvector (PostgreSQL)   │    ─────────────────────────   │
│  • Atlas Vector (MongoDB)  │    • FAISS                     │
│                            │    • Annoy                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
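Every option in the landscape answers the same core query: given a query vector, which stored vectors are most similar? The simplest answer is an exact brute-force scan over every vector, which is what a FAISS flat index does. The systems above add approximate indexes, persistence, and filtering on top of that idea. Below is a minimal pure-Python sketch. It assumes cosine similarity and tiny toy vectors, and the names `cosine_similarity` and `exact_top_k` are illustrative rather than taken from any library:

```python
import heapq
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity dot(a, b) / (|a| * |b|); assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Toy flat (brute-force) search: score every stored vector, keep the k best."""
    scored = ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Toy 3-dimensional "embeddings"; real ones have hundreds or thousands of dimensions
store = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.9, 0.1, 0.0],
    "doc3": [0.0, 1.0, 0.0],
}
print(exact_top_k([1.0, 0.0, 0.0], store, k=2))
```

The scan costs one similarity computation per stored vector per query. This is why large collections rely on approximate indexes such as HNSW or IVF instead.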

Comparison Table

Database | Best For                 | Scale     | Filtering  | Managed
-------- | ------------------------ | --------- | ---------- | -------------
Pinecone | Production, ease of use  | Billions  | Good       | Yes
Qdrant   | Filtering, self-hosted   | Billions  | Excellent  | Optional
Milvus   | High performance         | Billions  | Good       | Optional
pgvector | PostgreSQL users         | Millions  | SQL-native | Via providers
Weaviate | GraphQL, hybrid search   | Billions  | Good       | Optional
Chroma   | Prototyping, small scale | Thousands | Basic      | No

Pinecone

Strengths:

  • Fully managed, zero ops
  • Serverless pricing option
  • Simple API
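
```python
from pinecone import Pinecone

# Initialize the client and open an existing index named "documents"
pc = Pinecone(api_key="your-key")
index = pc.Index("documents")

# `embedding` and `query_embedding` are placeholders: lists of floats from
# your embedding model, with the same dimension the index was created with.

# Upsert vectors; namespaces partition one index into separately queryable groups
index.upsert(
    vectors=[
        {
            "id": "doc1",
            "values": embedding,
            "metadata": {"source": "manual.pdf", "page": 5}
        }
    ],
    namespace="product-docs"
)

# Query the 10 nearest vectors in the namespace that match the metadata filter
results = index.query(
    vector=query_embedding,
    top_k=10,
    namespace="product-docs",
    filter={"source": {"$eq": "manual.pdf"}},
    include_metadata=True  # metadata is only returned when requested
)

for match in results.matches:
    print(match.id, match.score, match.metadata)
```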
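The `filter` argument uses MongoDB-style operators such as `$eq`. A toy in-memory evaluator makes the semantics concrete. It handles a small subset of operators (`$eq`, `$ne`, `$in`, `$gte`, `$lte`) and treats namespaces as separate groups of records. This is a hypothetical illustration of how such a filter selects candidates, not Pinecone's implementation. The helpers `matches` and `candidates`, and the rule that multiple fields are ANDed, are assumptions of this sketch:

```python
from typing import Any

# Toy subset of comparison operators; each takes (field_value, operand)
OPERATORS = {
    "$eq": lambda value, operand: value == operand,
    "$ne": lambda value, operand: value != operand,
    "$in": lambda value, operand: value in operand,
    "$gte": lambda value, operand: value is not None and value >= operand,
    "$lte": lambda value, operand: value is not None and value <= operand,
}

def matches(metadata: dict[str, Any], flt: dict[str, dict[str, Any]]) -> bool:
    """Hypothetical helper: True if every field condition holds (ANDed in this sketch)."""
    for field, conditions in flt.items():
        value = metadata.get(field)
        for op, operand in conditions.items():
            if not OPERATORS[op](value, operand):
                return False
    return True

# Records grouped by namespace, like separate partitions of one index
records = {
    "product-docs": [
        {"id": "doc1", "metadata": {"source": "manual.pdf", "page": 5}},
        {"id": "doc2", "metadata": {"source": "faq.md", "page": 1}},
        {"id": "doc3", "metadata": {"source": "manual.pdf", "page": 40}},
    ],
}

def candidates(namespace: str, flt: dict[str, dict[str, Any]]) -> list[str]:
    """Hypothetical helper: IDs in the namespace whose metadata passes the filter."""
    return [r["id"] for r in records.get(namespace, []) if matches(r["metadata"], flt)]

print(candidates("product-docs", {"source": {"$eq": "manual.pdf"}}))
print(candidates("product-docs", {"source": {"$eq": "manual.pdf"}, "page": {"$lte": 10}}))
```

A real engine also ranks the surviving candidates by vector similarity. That step is omitted here to keep the focus on the filter.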

Considerations:

  • Vendor lock-in
  • Costs scale with vectors stored
  • Limited filtering compared to Qdrant

Qdrant

Strengths:

  • Excellent filtering capabilities
  • Open-source with cloud option
  • Rich payload support
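
```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchValue, PointStruct, Range, VectorParams,
)

client = QdrantClient(url="http://localhost:6333")

# `embedding` and `query_embedding` are placeholders: 1536-dim float lists
# from your embedding model; `size` below must match that dimension.

# Create the collection once (create_collection fails if it already exists)
if not client.collection_exists("documents"):
    client.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
    )

# Upsert: each point carries a vector plus an arbitrary JSON payload
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=embedding,
            payload={
                "source": "manual.pdf",
                "page": 5,
                "category": "technical"
            }
        )
    ]
)

# Query with complex filtering: every "must" condition has to hold.
# query_points needs qdrant-client 1.10+; older clients used client.search.
results = client.query_points(
    collection_name="documents",
    query=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="technical")),
            FieldCondition(key="page", range=Range(gte=1, lte=10)),
        ]
    ),
    limit=10
).points
```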
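Filtering quality matters because of how filters interact with top-k search. A naive engine retrieves the k nearest vectors first and applies the filter afterwards (post-filtering). When matching points are rare, this can return fewer than k results, or none at all. Qdrant evaluates filter conditions during the vector search itself, which avoids the problem. The toy comparison below shows only the effect, not either engine's internals. It uses brute-force search over 1-D "vectors" with made-up payloads, and the function names are hypothetical:

```python
def post_filter_search(points, query, k, predicate):
    """Toy post-filtering: take the k nearest points, then drop those failing the filter."""
    nearest = sorted(points, key=lambda p: abs(p["vector"] - query))[:k]
    return [p["id"] for p in nearest if predicate(p["payload"])]

def pre_filter_search(points, query, k, predicate):
    """Toy filter-first search: drop failing points, then take the k nearest survivors."""
    allowed = [p for p in points if predicate(p["payload"])]
    nearest = sorted(allowed, key=lambda p: abs(p["vector"] - query))[:k]
    return [p["id"] for p in nearest]

# 1-D toy vectors; only every fifth point is "technical"
points = [
    {"id": i, "vector": float(i), "payload": {"category": "technical" if i % 5 == 0 else "general"}}
    for i in range(20)
]

def is_technical(payload):
    return payload["category"] == "technical"

print(post_filter_search(points, query=2.0, k=3, predicate=is_technical))  # → []
print(pre_filter_search(points, query=2.0, k=3, predicate=is_technical))   # → [0, 5, 10]
```

Post-filtering finds the three nearest points (ids 2, 1, 3), but none of them is technical, so it returns nothing. Filtering first returns three valid matches.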

Considerations:

  • Self-hosted requires DevOps
  • Qdrant Cloud (managed) is a newer offering than Pinecone's service

pgvector

Strengths:

  • Familiar PostgreSQL
  • SQL joins with vector search
  • Existing infrastructure
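
```python
import numpy as np
import psycopg2
from pgvector.psycopg2 import register_vector

# Connection string is a placeholder; point it at your own database
conn = psycopg2.connect("dbname=rag user=postgres")
cursor = conn.cursor()

# Enable the extension (pgvector must be installed on the server)
cursor.execute("CREATE EXTENSION IF NOT EXISTS vector")
conn.commit()

# Let psycopg2 send numpy arrays as the vector type (needs the extension first)
register_vector(conn)

# Create table
cursor.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT,
        embedding vector(1536),
        metadata JSONB
    )
""")

# Create an approximate index for faster search; IVFFlat derives its lists
# from the rows present at build time, so build it after loading data
cursor.execute("""
    CREATE INDEX IF NOT EXISTS documents_embedding_idx ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100)
""")
conn.commit()

# Query: <=> is cosine distance, so 1 - distance is cosine similarity.
# With an approximate index the WHERE filter is applied to the rows the
# index returns, so a very selective filter may yield fewer than 10 rows.
q = np.array(query_embedding)  # query_embedding: placeholder list of 1536 floats
cursor.execute("""
    SELECT id, content, metadata,
           1 - (embedding <=> %s) AS similarity
    FROM documents
    WHERE metadata->>'category' = 'technical'
    ORDER BY embedding <=> %s
    LIMIT 10
""", (q, q))
rows = cursor.fetchall()
```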

Considerations:

  • Limited scale (millions, not billions)
  • Index build time on large datasets
  • Great for small to medium applications

Selection Framework

Decision Tree

Start
  ├─ Prototyping only?
  │     ├─ Yes ──▶ Chroma
  │     └─ No ───▶ Continue
  ├─ Need billions of vectors?
  │     ├─ Yes ──▶ Pinecone or Milvus
  │     └─ No ───▶ Continue
  ├─ Need complex filtering?
  │     ├─ Yes ──▶ Qdrant
  │     └─ No ───▶ Continue
  ├─ Already using PostgreSQL?
  │     ├─ Yes ──▶ pgvector
  │     └─ No ───▶ Continue
  └─ Need zero ops?
        ├─ Yes ──▶ Pinecone
        └─ No ───▶ Qdrant or Milvus
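Read top to bottom, the tree is an ordered chain of checks in which the first matching answer wins. Below is a minimal sketch of that logic. It uses a hypothetical `Requirements` record whose field names are made up for illustration. The prototyping check runs first so that every branch stays reachable:

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    """Hypothetical inputs to the decision tree; field names are illustrative."""
    prototyping_only: bool = False
    billions_of_vectors: bool = False
    complex_filtering: bool = False
    uses_postgres: bool = False
    zero_ops: bool = False

def recommend(req: Requirements) -> str:
    """Ask the questions in order and return the first recommendation that applies."""
    if req.prototyping_only:
        return "Chroma"
    if req.billions_of_vectors:
        return "Pinecone or Milvus"
    if req.complex_filtering:
        return "Qdrant"
    if req.uses_postgres:
        return "pgvector"
    if req.zero_ops:
        return "Pinecone"
    return "Qdrant or Milvus"

print(recommend(Requirements(uses_postgres=True)))                          # → pgvector
print(recommend(Requirements(complex_filtering=True, uses_postgres=True)))  # → Qdrant
```

Because the checks are ordered, a team already on PostgreSQL that also needs complex filtering is steered to Qdrant. The filtering question is asked first.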

Cost Comparison (100M vectors)

Provider     | Monthly Cost | Notes
------------ | ------------ | -----------------------
Pinecone     | $700-2000    | Serverless or pod-based
Qdrant Cloud | $500-1500    | Based on cluster size
Self-hosted  | $200-500     | Compute + storage
pgvector     | $100-300     | Existing DB may work
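Much of the cost comes from how much memory and disk the vectors themselves need. Below is a back-of-the-envelope sketch. It assumes 1536-dimensional float32 embeddings (the dimension used in the examples above) and counts raw vector bytes only, with no index overhead, metadata, or replicas. The helper name `raw_vector_bytes` is illustrative:

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for dense vectors: count x dimensions x bytes per value (float32 = 4)."""
    return num_vectors * dims * bytes_per_value

total = raw_vector_bytes(100_000_000, 1536)
print(f"{total / 1e9:.1f} GB")     # → 614.4 GB (decimal gigabytes)
print(f"{total / 2**30:.1f} GiB")  # → 572.2 GiB (binary gibibytes)
```

The figure scales linearly with both vector count and dimension. Index structures, payloads, and replicas add to it.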

Interview Tip

When discussing vector databases, always mention:

  1. Scale requirements - millions vs billions
  2. Filtering needs - metadata queries
  3. Operational complexity - managed vs self-hosted
  4. Cost at scale - show you understand economics

Next, we'll explore hybrid retrieval strategies that combine dense and sparse retrieval.

Module 3: RAG System Design
