AI System Design Fundamentals

Components of AI Systems

Modern AI systems are composed of several key building blocks. Understanding when and how to use each component is essential for system design interviews.

Core Components Overview

┌─────────────────────────────────────────────────────────────┐
│                      AI System Architecture                  │
├─────────────┬─────────────┬─────────────┬──────────────────┤
│   Gateway   │    LLM      │   Vector    │     Storage      │
│   & API     │   Layer     │   Store     │     & Cache      │
├─────────────┼─────────────┼─────────────┼──────────────────┤
│ Rate Limit  │ OpenAI      │ Pinecone    │ PostgreSQL       │
│ Auth        │ Anthropic   │ Qdrant      │ Redis            │
│ Load Balance│ Local LLMs  │ Milvus      │ S3               │
└─────────────┴─────────────┴─────────────┴──────────────────┘

1. LLM Layer

The brain of your system. Key considerations:

Provider              Best For                Trade-off
OpenAI GPT-4          Complex reasoning       Higher cost, rate limits
Claude                Long context, safety    Availability varies by region
Open-source (Llama)   Privacy, cost control   Requires infrastructure

# Example: LLM abstraction layer
class LLMProvider:
    def __init__(self, provider: str):
        self.provider = provider

    def complete(self, prompt: str, **kwargs) -> str:
        # Route the request to the configured provider
        if self.provider == "openai":
            return self._openai_complete(prompt, **kwargs)
        elif self.provider == "anthropic":
            return self._anthropic_complete(prompt, **kwargs)
        raise ValueError(f"Unknown provider: {self.provider}")
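An abstraction layer like this also makes failover straightforward: try providers in order and fall back when one fails. A minimal sketch of that pattern — the provider callables below are stand-ins, not real SDK calls:

```python
class FallbackLLM:
    """Try each provider in order; return the first successful completion."""
    def __init__(self, providers):
        self.providers = providers  # list of (name, callable) pairs

    def complete(self, prompt: str) -> str:
        errors = []
        for name, fn in self.providers:
            try:
                return fn(prompt)
            except Exception as exc:
                errors.append((name, exc))  # record the failure, try the next
        raise RuntimeError(f"All providers failed: {errors}")

# Stand-in provider functions (a real system would wrap vendor SDK calls)
def flaky(prompt):
    raise TimeoutError("rate limited")

def reliable(prompt):
    return f"echo: {prompt}"

llm = FallbackLLM([("openai", flaky), ("anthropic", reliable)])
```

In an interview, mentioning this fallback ordering is a quick way to address both the rate-limit and availability trade-offs from the table above.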

2. Vector Database

Stores embeddings for semantic search. Critical for RAG systems.

Database   Strengths                Considerations
Pinecone   Managed, easy to scale   Vendor lock-in
Qdrant     Open-source, filtering   Self-hosted complexity
pgvector   Familiar PostgreSQL      Limited scale
Milvus     High performance         Operational overhead
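Whichever database you pick, the core operation is the same: given a query embedding, return the k stored vectors with the highest similarity. A brute-force sketch of that query (real databases use approximate indexes like HNSW to do this at scale):

```python
import numpy as np

def top_k(query, corpus, k=3):
    """Brute-force cosine-similarity search over a matrix of embeddings."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity per document
    idx = np.argsort(scores)[::-1][:k]   # indices of the k best matches
    return idx, scores[idx]
```

This is O(n) per query, which is exactly why the databases above exist once n grows past a few hundred thousand vectors.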

3. Caching Layer

Reduces costs and latency by storing repeated queries.

# Semantic caching pattern
import numpy as np

class SemanticCache:
    def __init__(self, similarity_threshold=0.95):
        self.threshold = similarity_threshold
        self.entries = []  # list of (embedding, response) pairs

    def set(self, query_embedding, response):
        self.entries.append((np.asarray(query_embedding, dtype=float), response))

    def get(self, query_embedding):
        q = np.asarray(query_embedding, dtype=float)
        for cached_embedding, response in self.entries:
            if self._cosine(q, cached_embedding) >= self.threshold:
                return response
        return None  # cache miss: call the LLM, then set()

    @staticmethod
    def _cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

4. Message Queue

Handles async processing for long-running AI tasks.

When to use:

  • Tasks taking > 30 seconds
  • Batch processing of documents
  • Retry logic for failed LLM calls

Popular choices: Redis Queue, Celery, AWS SQS
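The retry logic can be sketched with the standard library alone — a failed task simply goes back on the queue until it exhausts its attempts. A production system would use one of the queues above with a dead-letter queue for permanent failures:

```python
import queue

def process_with_retries(tasks: "queue.Queue", handler, max_retries=3):
    """Drain a queue of (payload, attempt) tasks, re-enqueueing failures."""
    results = []
    while not tasks.empty():
        payload, attempt = tasks.get()
        try:
            results.append(handler(payload))
        except Exception:
            if attempt + 1 < max_retries:
                tasks.put((payload, attempt + 1))  # retry later
            # else: route to a dead-letter queue in a real system

    return results

jobs = queue.Queue()
for doc in ["a", "b"]:
    jobs.put((doc, 0))

results = process_with_retries(jobs, str.upper)  # handler is illustrative
```

The key idea to articulate in an interview: the queue decouples request acceptance from completion, so a 5-minute document batch doesn't hold an HTTP connection open.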

5. API Gateway

The entry point for all requests.

Responsibilities:

  • Authentication and authorization
  • Rate limiting (critical for cost control)
  • Request routing
  • Response formatting
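Rate limiting is the responsibility most worth knowing in detail, since every uncapped request is real LLM spend. A minimal token-bucket limiter, one instance per client (the rate and capacity values are illustrative):

```python
import time

class TokenBucket:
    """Allow `rate` requests/second on average, with bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)     # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice the bucket state lives in Redis so that every gateway replica enforces the same limit.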

Component Selection Framework

When choosing components, ask:

  1. Scale: How many requests per second?
  2. Latency: What's acceptable response time?
  3. Cost: What's the budget per query?
  4. Reliability: What's the uptime requirement?
  5. Complexity: Can the team maintain it?
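Questions 1–3 combine into a back-of-envelope cost estimate, which interviewers often expect you to do aloud. A sketch with purely illustrative prices (not current vendor rates):

```python
def monthly_llm_cost(requests_per_day, avg_input_tokens, avg_output_tokens,
                     price_in_per_1k, price_out_per_1k, cache_hit_rate=0.0):
    """Estimate monthly LLM spend; cache hits skip the model call entirely."""
    billable = requests_per_day * 30 * (1 - cache_hit_rate)
    per_request = (avg_input_tokens / 1000) * price_in_per_1k \
                + (avg_output_tokens / 1000) * price_out_per_1k
    return billable * per_request

# Illustrative: 10k requests/day, 1k input / 500 output tokens,
# $0.01/$0.03 per 1k tokens, 30% semantic-cache hit rate
cost = monthly_llm_cost(10_000, 1_000, 500, 0.01, 0.03, cache_hit_rate=0.3)
```

Note how directly the caching layer shows up in the math: the hit rate subtracts straight off the bill, which is why caching appears in almost every AI system design.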

Next, we'll dive into scalability concepts for AI systems.
