AI System Design Fundamentals

Components of AI Systems

Modern AI systems are composed of several key building blocks. Understanding when and how to use each component is essential for system design interviews.

Core Components Overview

┌─────────────────────────────────────────────────────────────┐
│                      AI System Architecture                  │
├─────────────┬─────────────┬─────────────┬──────────────────┤
│   Gateway   │    LLM      │   Vector    │     Storage      │
│   & API     │   Layer     │   Store     │     & Cache      │
├─────────────┼─────────────┼─────────────┼──────────────────┤
│ Rate Limit  │ OpenAI      │ Pinecone    │ PostgreSQL       │
│ Auth        │ Anthropic   │ Qdrant      │ Redis            │
│ Load Balance│ Local LLMs  │ Milvus      │ S3               │
└─────────────┴─────────────┴─────────────┴──────────────────┘

1. LLM Layer

The brain of your system. Key considerations:

Provider              Best For                Trade-off
OpenAI GPT-4          Complex reasoning       Higher cost, rate limits
Claude                Long context, safety    Availability varies by region
Open-source (Llama)   Privacy, cost control   Requires infrastructure

# Example: LLM abstraction layer
class LLMProvider:
    def __init__(self, provider: str):
        self.provider = provider

    def complete(self, prompt: str, **kwargs) -> str:
        # Route the request to the configured provider
        if self.provider == "openai":
            return self._openai_complete(prompt, **kwargs)
        elif self.provider == "anthropic":
            return self._anthropic_complete(prompt, **kwargs)
        raise ValueError(f"Unknown provider: {self.provider}")
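An abstraction layer like this also makes failover straightforward: try providers in order and fall back when one fails. A minimal sketch of that pattern — the provider callables below are stand-ins, not real SDK calls:

```python
class FallbackLLM:
    """Try each provider in order; return the first successful completion."""
    def __init__(self, providers):
        self.providers = providers  # list of (name, callable) pairs

    def complete(self, prompt: str) -> str:
        errors = []
        for name, fn in self.providers:
            try:
                return fn(prompt)
            except Exception as exc:
                errors.append((name, exc))  # record the failure, try the next
        raise RuntimeError(f"All providers failed: {errors}")

# Stand-in provider functions (a real system would wrap vendor SDK calls)
def flaky(prompt):
    raise TimeoutError("rate limited")

def reliable(prompt):
    return f"echo: {prompt}"

llm = FallbackLLM([("openai", flaky), ("anthropic", reliable)])
```

In an interview, mentioning this fallback ordering is a quick way to address both the rate-limit and availability trade-offs from the table above.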

2. Vector Database

Stores embeddings for semantic search. Critical for RAG systems.

Database   Strengths                Considerations
Pinecone   Managed, easy to scale   Vendor lock-in
Qdrant     Open-source, filtering   Self-hosted complexity
pgvector   Familiar PostgreSQL      Limited scale
Milvus     High performance         Operational overhead
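Whichever database you pick, the core operation is the same: given a query embedding, return the k stored vectors with the highest similarity. A brute-force sketch of that query (real databases use approximate indexes like HNSW to do this at scale):

```python
import numpy as np

def top_k(query, corpus, k=3):
    """Brute-force cosine-similarity search over a matrix of embeddings."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity per document
    idx = np.argsort(scores)[::-1][:k]   # indices of the k best matches
    return idx, scores[idx]
```

This is O(n) per query, which is exactly why the databases above exist once n grows past a few hundred thousand vectors.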

3. Caching Layer

Reduces costs and latency by storing repeated queries.

# Semantic caching pattern
import numpy as np

class SemanticCache:
    def __init__(self, similarity_threshold=0.95):
        self.threshold = similarity_threshold
        self.entries = []  # list of (embedding, response) pairs

    def set(self, query_embedding, response):
        self.entries.append((np.asarray(query_embedding, dtype=float), response))

    def get(self, query_embedding):
        q = np.asarray(query_embedding, dtype=float)
        for cached_embedding, response in self.entries:
            if self._cosine(q, cached_embedding) >= self.threshold:
                return response
        return None  # cache miss: call the LLM, then set()

    @staticmethod
    def _cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

4. Message Queue

Handles async processing for long-running AI tasks.

When to use:

  • Tasks taking > 30 seconds
  • Batch processing of documents
  • Retry logic for failed LLM calls

Popular choices: Redis Queue, Celery, AWS SQS
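The retry logic can be sketched with the standard library alone — a failed task simply goes back on the queue until it exhausts its attempts. A production system would use one of the queues above with a dead-letter queue for permanent failures:

```python
import queue

def process_with_retries(tasks: "queue.Queue", handler, max_retries=3):
    """Drain a queue of (payload, attempt) tasks, re-enqueueing failures."""
    results = []
    while not tasks.empty():
        payload, attempt = tasks.get()
        try:
            results.append(handler(payload))
        except Exception:
            if attempt + 1 < max_retries:
                tasks.put((payload, attempt + 1))  # retry later
            # else: route to a dead-letter queue in a real system

    return results

jobs = queue.Queue()
for doc in ["a", "b"]:
    jobs.put((doc, 0))

results = process_with_retries(jobs, str.upper)  # handler is illustrative
```

The key idea to articulate in an interview: the queue decouples request acceptance from completion, so a 5-minute document batch doesn't hold an HTTP connection open.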

5. API Gateway

The entry point for all requests.

Responsibilities:

  • Authentication and authorization
  • Rate limiting (critical for cost control)
  • Request routing
  • Response formatting
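Rate limiting is the responsibility most worth knowing in detail, since every uncapped request is real LLM spend. A minimal token-bucket limiter, one instance per client (the rate and capacity values are illustrative):

```python
import time

class TokenBucket:
    """Allow `rate` requests/second on average, with bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)     # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice the bucket state lives in Redis so that every gateway replica enforces the same limit.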

Component Selection Framework

When choosing components, ask:

  1. Scale: How many requests per second?
  2. Latency: What's acceptable response time?
  3. Cost: What's the budget per query?
  4. Reliability: What's the uptime requirement?
  5. Complexity: Can the team maintain it?
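Questions 1–3 combine into a back-of-envelope cost estimate, which interviewers often expect you to do aloud. A sketch with purely illustrative prices (not current vendor rates):

```python
def monthly_llm_cost(requests_per_day, avg_input_tokens, avg_output_tokens,
                     price_in_per_1k, price_out_per_1k, cache_hit_rate=0.0):
    """Estimate monthly LLM spend; cache hits skip the model call entirely."""
    billable = requests_per_day * 30 * (1 - cache_hit_rate)
    per_request = (avg_input_tokens / 1000) * price_in_per_1k \
                + (avg_output_tokens / 1000) * price_out_per_1k
    return billable * per_request

# Illustrative: 10k requests/day, 1k input / 500 output tokens,
# $0.01/$0.03 per 1k tokens, 30% semantic-cache hit rate
cost = monthly_llm_cost(10_000, 1_000, 500, 0.01, 0.03, cache_hit_rate=0.3)
```

Note how directly the caching layer shows up in the math: the hit rate subtracts straight off the bill, which is why caching appears in almost every AI system design.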

Next, we'll dive into scalability concepts for AI systems.
