AI System Design Fundamentals

Components of AI Systems

Modern AI systems are composed of several key building blocks. Understanding when and how to use each component is essential for system design interviews.

Core Components Overview

┌────────────────────────────────────────────────────────────┐
│                   AI System Architecture                   │
├─────────────┬─────────────┬─────────────┬──────────────────┤
│   Gateway   │    LLM      │   Vector    │     Storage      │
│   & API     │   Layer     │   Store     │     & Cache      │
├─────────────┼─────────────┼─────────────┼──────────────────┤
│ Rate Limit  │ OpenAI      │ Pinecone    │ PostgreSQL       │
│ Auth        │ Anthropic   │ Qdrant      │ Redis            │
│ Load Balance│ Local LLMs  │ Milvus      │ S3               │
└─────────────┴─────────────┴─────────────┴──────────────────┘
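The diagram lists the components side by side. Below is a rough sketch of how a single request might pass through them, assuming a RAG-style flow and assuming the cache is checked before retrieval (one reasonable ordering, not the only one). `RequestPath` and every injected callable are hypothetical names chosen for illustration, not a framework API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class RequestPath:
    """Hypothetical wiring of the four columns; each field is an injected callable."""
    authenticate: Callable[[str], bool]        # Gateway & API: auth
    within_limit: Callable[[str], bool]        # Gateway & API: rate limit
    cache_get: Callable[[str], Optional[str]]  # Storage & Cache: lookup
    cache_set: Callable[[str, str], None]      # Storage & Cache: write
    retrieve: Callable[[str], List[str]]       # Vector Store: relevant context
    llm: Callable[[str], str]                  # LLM Layer

    def handle(self, api_key: str, question: str) -> Tuple[int, str]:
        # Reject cheaply at the gateway before touching anything expensive
        if not self.authenticate(api_key):
            return 401, "unauthorized"
        if not self.within_limit(api_key):
            return 429, "rate limit exceeded"
        # A cache hit skips both retrieval and the LLM call
        cached = self.cache_get(question)
        if cached is not None:
            return 200, cached
        # Miss: retrieve context, call the LLM, then populate the cache
        context = "\n".join(self.retrieve(question))
        answer = self.llm(f"Context:\n{context}\n\nQuestion: {question}")
        self.cache_set(question, answer)
        return 200, answer
```

Injecting each component as a plain function keeps the request path testable with fakes, and lets you swap a provider or vector store without touching the flow itself.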

1. LLM Layer

The brain of your system. Key considerations:

| Provider | Best For | Trade-off |
|---|---|---|
| OpenAI GPT-5.4 | Complex reasoning | Higher cost, rate limits |
| Claude | Long context, safety | Availability varies by region |
| Open-source (Llama) | Privacy, cost control | Requires infrastructure |

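# Example: LLM abstraction layer
class LLMProvider:
    def __init__(self, provider: str):
        self.provider = provider

    def complete(self, prompt: str, **kwargs) -> str:
        # Route to the provider-specific implementation
        if self.provider == "openai":
            return self._openai_complete(prompt, **kwargs)
        elif self.provider == "anthropic":
            return self._anthropic_complete(prompt, **kwargs)
        # Fail loudly instead of silently returning None
        raise ValueError(f"Unsupported provider: {self.provider}")

    def _openai_complete(self, prompt: str, **kwargs) -> str:
        # Call the OpenAI SDK here
        raise NotImplementedError

    def _anthropic_complete(self, prompt: str, **kwargs) -> str:
        # Call the Anthropic SDK here
        raise NotImplementedError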
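An `if/elif` chain has to be edited every time a provider is added. One alternative, sketched below under our own assumptions, is a registry that maps provider names to completion functions, plus an ordered fallback that tries the next provider when a call fails (for example on the rate limits or regional availability issues from the table). `LLMRouter`, `register`, and `complete_with_fallback` are hypothetical names, not part of any SDK.

```python
from typing import Callable, Dict, List

# Hypothetical signature: a provider is any function (prompt, **options) -> completion text
CompletionFn = Callable[..., str]


class LLMRouter:
    def __init__(self) -> None:
        self._providers: Dict[str, CompletionFn] = {}

    def register(self, name: str, fn: CompletionFn) -> None:
        # New providers plug in here; complete() never changes
        self._providers[name] = fn

    def complete(self, provider: str, prompt: str, **kwargs) -> str:
        try:
            fn = self._providers[provider]
        except KeyError:
            raise ValueError(f"Unknown provider: {provider}") from None
        return fn(prompt, **kwargs)

    def complete_with_fallback(self, providers: List[str], prompt: str, **kwargs) -> str:
        # Try providers in order; collect errors so the final failure is debuggable
        errors = []
        for name in providers:
            try:
                return self.complete(name, prompt, **kwargs)
            except Exception as exc:  # e.g. a rate limit or regional outage
                errors.append(f"{name}: {exc}")
        raise RuntimeError("All providers failed: " + "; ".join(errors))
```

In practice each registered function would wrap a real SDK call; the router only decides which one runs.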

2. Vector Database

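Stores embeddings and returns the ones nearest to a query embedding (semantic search). Critical for RAG (retrieval-augmented generation) systems, which retrieve relevant documents before calling the LLM.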

| Database | Strengths | Considerations |
|---|---|---|
| Pinecone | Managed, easy to scale | Vendor lock-in |
| Qdrant | Open-source, filtering | Self-hosted complexity |
| pgvector | Familiar PostgreSQL | Limited scale |
| Milvus | High performance | Operational overhead |
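To make "semantic search" concrete, here is a minimal in-memory sketch of what a vector store does: upsert embeddings with metadata, optionally filter on metadata, and return the top-k most similar items by cosine similarity. It is a brute-force stand-in for illustration only; production vector databases typically use approximate nearest-neighbor indexes such as HNSW instead of scanning every vector. `InMemoryVectorStore`, `upsert`, `search`, and `where` are hypothetical names, not any product's API.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Brute-force stand-in for a vector DB: O(n) comparisons per query."""

    def __init__(self) -> None:
        # Each item: (doc_id, embedding, metadata)
        self.items: List[Tuple[str, List[float], Dict[str, str]]] = []

    def upsert(self, doc_id: str, embedding: List[float],
               metadata: Optional[Dict[str, str]] = None) -> None:
        # Replace any existing entry with the same id, then append
        self.items = [item for item in self.items if item[0] != doc_id]
        self.items.append((doc_id, embedding, metadata or {}))

    def search(self, query: List[float], k: int = 3,
               where: Optional[Dict[str, str]] = None) -> List[Tuple[float, str]]:
        # Apply the metadata filter, score the survivors, keep the k most similar
        where = where or {}
        scored = (
            (cosine(query, emb), doc_id)
            for doc_id, emb, meta in self.items
            if all(meta.get(key) == value for key, value in where.items())
        )
        return heapq.nlargest(k, scored)
```

The `where` filter mirrors the kind of metadata filtering the table credits to Qdrant; in this sketch it is a simple exact-match check.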

3. Caching Layer

Reduces cost and latency by reusing responses to repeated or semantically similar queries instead of calling the LLM again.

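# Semantic caching pattern
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, similarity_threshold=0.95):
        self.threshold = similarity_threshold
        self.entries = []  # (embedding, response) pairs; embeddings aren't hashable dict keys

    def set(self, query_embedding, response):
        self.entries.append((query_embedding, response))

    def get(self, query_embedding):
        # Linear scan: return the first cached response above the threshold
        for cached_embedding, response in self.entries:
            if cosine_similarity(query_embedding, cached_embedding) > self.threshold:
                return response
        return None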
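A cheaper first tier, which a semantic cache would typically sit behind, is an exact-match cache keyed on a hash of the normalized prompt, combined with the cache-aside pattern: look up first, call the LLM only on a miss, then store the result. The sketch below uses an in-memory dict with a per-entry TTL as a stand-in for a key-value store like Redis. `ExactMatchCache`, `cached_complete`, and the `"example-model"` default are hypothetical, and normalizing only whitespace (not case) is our own conservative assumption.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class ExactMatchCache:
    """In-memory stand-in for a key-value store such as Redis, with per-entry TTL."""

    def __init__(self, ttl_seconds: float = 3600, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self.store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, response)

    @staticmethod
    def key(prompt: str, model: str) -> str:
        # Collapse whitespace so trivially different prompts share a key;
        # include the model so different models never share answers
        normalized = " ".join(prompt.split())
        return hashlib.sha256(f"{model}:{normalized}".encode()).hexdigest()

    def get(self, k: str) -> Optional[str]:
        entry = self.store.get(k)
        if entry is None or entry[0] < self.clock():
            self.store.pop(k, None)  # drop the expired entry, if any
            return None
        return entry[1]

    def set(self, k: str, response: str) -> None:
        self.store[k] = (self.clock() + self.ttl, response)


def cached_complete(cache: ExactMatchCache, llm_call: Callable[[str], str],
                    prompt: str, model: str = "example-model") -> str:
    """Cache-aside: check the cache first, call the LLM only on a miss."""
    k = cache.key(prompt, model)
    hit = cache.get(k)
    if hit is not None:
        return hit
    response = llm_call(prompt)
    cache.set(k, response)
    return response
```

The TTL bounds how stale an answer can get; choose it based on how quickly the underlying data changes.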

4. Message Queue

Handles async processing for long-running AI tasks.

When to use:

  • Tasks taking > 30 seconds
  • Batch processing of documents
  • Retry logic for failed LLM calls

Popular choices: Redis Queue (RQ), Celery (a task framework that runs on a broker such as Redis or RabbitMQ), AWS SQS
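The three use cases above share one shape: the caller submits a job and returns immediately, a worker processes it in the background, and failed LLM calls are retried with backoff. The sketch below shows that shape in-process with Python's standard `queue` and `threading` modules. `JobQueue`, `submit`, and the `results` dict are hypothetical names; RQ, Celery, and SQS each have their own APIs and add persistence and multiple workers, which this sketch does not attempt.

```python
import queue
import threading
import time
import uuid
from typing import Any, Callable, Dict, Tuple


class JobQueue:
    """Hypothetical in-process stand-in for a real broker (RQ, Celery, SQS)."""

    def __init__(self, handler: Callable[[Any], Any], max_retries: int = 3,
                 base_delay: float = 0.5):
        self.handler = handler          # the slow work, e.g. an LLM call
        self.max_retries = max_retries  # retries after the first attempt
        self.base_delay = base_delay    # seconds; doubled after each failure
        self.jobs = queue.Queue()
        self.results: Dict[str, Tuple[str, Any]] = {}  # job_id -> ("done", value) or ("failed", error)
        threading.Thread(target=self._work, daemon=True).start()

    def submit(self, payload: Any) -> str:
        """Enqueue and return immediately; the caller checks `results` later."""
        job_id = str(uuid.uuid4())
        self.jobs.put((job_id, payload))
        return job_id

    def _work(self) -> None:
        while True:
            job_id, payload = self.jobs.get()
            for attempt in range(self.max_retries + 1):
                try:
                    self.results[job_id] = ("done", self.handler(payload))
                    break
                except Exception as exc:
                    if attempt == self.max_retries:
                        self.results[job_id] = ("failed", str(exc))
                    else:
                        # Exponential backoff: base_delay, 2x, 4x, ...
                        time.sleep(self.base_delay * 2 ** attempt)
            self.jobs.task_done()
```

With `max_retries=3`, a handler that fails twice (say, on a rate limit) and then succeeds ends up recorded as `("done", ...)` after three attempts; a handler that keeps failing is recorded as `("failed", <error message>)` instead of crashing the worker.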

5. API Gateway

The entry point for all requests.

Responsibilities:

  • Authentication and authorization
  • Rate limiting (critical for cost control)
  • Request routing
  • Response formatting
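Rate limiting is often implemented as a token bucket per API key: each key holds up to `capacity` tokens, tokens refill at `rate` per second, and a request is allowed only if enough tokens remain. The sketch below is one such limiter under our own assumptions; the `cost` parameter is a hypothetical hook so that expensive requests (for example, weighted by estimated LLM tokens) can drain the bucket faster, which ties rate limiting directly to cost control. `TokenBucketLimiter` is not a gateway product's API.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-API-key token bucket: refills `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so tests can control time
        self.buckets: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last_seen)

    def allow(self, api_key: str, cost: float = 1.0) -> bool:
        now = self.clock()
        # New keys start with a full bucket
        tokens, last = self.buckets.get(api_key, (self.capacity, now))
        # Refill in proportion to elapsed time, never beyond capacity
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self.buckets[api_key] = (tokens, now)
        return allowed
```

A bucket allows short bursts up to `capacity` while holding the long-run average to `rate`, which is usually friendlier to clients than a hard per-second cap. When `allow` returns `False`, a gateway would typically respond with HTTP 429.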

Component Selection Framework

When choosing components, ask:

  1. Scale: How many requests per second?
  2. Latency: What response time is acceptable?
  3. Cost: What's the budget per query?
  4. Reliability: What's the uptime requirement?
  5. Complexity: Can the team maintain it?
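Several of these questions turn into quick arithmetic. The sketch below estimates cost per query from token counts, average in-flight requests from Little's law (arrival rate times average time in system), monthly spend, and the downtime an uptime target allows. All function names are hypothetical, and the prices passed in are placeholders, not any provider's actual pricing; a 30-day month is also an assumption.

```python
from dataclasses import dataclass

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


@dataclass
class Workload:
    requests_per_second: float  # Scale
    avg_latency_s: float        # Latency
    input_tokens: int           # per request
    output_tokens: int          # per request


def cost_per_query(w: Workload, usd_per_1m_in: float, usd_per_1m_out: float) -> float:
    # Prices are per million tokens; input and output are usually priced differently
    return (w.input_tokens * usd_per_1m_in + w.output_tokens * usd_per_1m_out) / 1_000_000


def concurrent_requests(w: Workload) -> float:
    # Little's law: average in-flight requests = arrival rate x average time in system
    return w.requests_per_second * w.avg_latency_s


def monthly_cost(w: Workload, usd_per_1m_in: float, usd_per_1m_out: float) -> float:
    return cost_per_query(w, usd_per_1m_in, usd_per_1m_out) * w.requests_per_second * SECONDS_PER_MONTH


def allowed_downtime_minutes(uptime: float, days: int = 30) -> float:
    # Reliability: the fraction of the period you are allowed to be down
    return (1 - uptime) * days * 24 * 60


w = Workload(requests_per_second=10, avg_latency_s=2.0, input_tokens=1000, output_tokens=500)
print(cost_per_query(w, 1.0, 4.0), concurrent_requests(w), monthly_cost(w, 1.0, 4.0))
print(allowed_downtime_minutes(0.999))
```

With these placeholder inputs ($1 and $4 per million input and output tokens), each query costs $0.003, about 20 requests are in flight at once, the monthly bill comes to roughly $77,760, and a 99.9% uptime target allows about 43.2 minutes of downtime per 30 days. Numbers like these tend to settle the Cost and Complexity questions quickly, for example by showing when caching or a cheaper model pays for itself.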

Next, we'll dive into scalability concepts for AI systems.
