AI System Design Fundamentals
Components of AI Systems
Modern AI systems are composed of several key building blocks. Understanding when and how to use each component is essential for system design interviews.
Core Components Overview
```
┌─────────────────────────────────────────────────────────────┐
│                   AI System Architecture                    │
├─────────────┬─────────────┬─────────────┬───────────────────┤
│  Gateway    │    LLM      │   Vector    │      Storage      │
│  & API      │   Layer     │   Store     │      & Cache      │
├─────────────┼─────────────┼─────────────┼───────────────────┤
│ Rate Limit  │ OpenAI      │ Pinecone    │ PostgreSQL        │
│ Auth        │ Anthropic   │ Qdrant      │ Redis             │
│ Load Balance│ Local LLMs  │ Milvus      │ S3                │
└─────────────┴─────────────┴─────────────┴───────────────────┘
```
1. LLM Layer
The brain of your system. Key considerations:
| Provider | Best For | Trade-off |
|---|---|---|
| OpenAI GPT-4 | Complex reasoning | Higher cost, rate limits |
| Claude | Long context, safety | Availability varies by region |
| Open-source (Llama) | Privacy, cost control | Requires infrastructure |
```python
# Example: LLM abstraction layer
class LLMProvider:
    def __init__(self, provider: str):
        self.provider = provider

    def complete(self, prompt: str, **kwargs) -> str:
        # Route to the appropriate provider's client
        if self.provider == "openai":
            return self._openai_complete(prompt, **kwargs)
        elif self.provider == "anthropic":
            return self._anthropic_complete(prompt, **kwargs)
        raise ValueError(f"Unsupported provider: {self.provider}")
```
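With the routing hidden behind one interface, calling code stays provider-agnostic, and swapping models becomes a configuration change rather than a refactor. A usage sketch (the prompt is purely illustrative):

```python
# Swap "openai" for "anthropic" without touching any call sites
llm = LLMProvider("openai")
answer = llm.complete("Summarize the trade-offs of semantic caching.")
```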
2. Vector Database
Stores embeddings for semantic search; critical for retrieval-augmented generation (RAG) systems. A minimal sketch of the core store-and-search operation follows the table below.
| Database | Strengths | Considerations |
|---|---|---|
| Pinecone | Managed, easy to scale | Vendor lock-in |
| Qdrant | Open-source, filtering | Self-hosted complexity |
| pgvector | Familiar PostgreSQL | Limited scale |
| Milvus | High performance | Operational overhead |
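Whichever database you pick, the core contract is the same: upsert embeddings, then retrieve the nearest neighbors of a query vector. A brute-force, in-memory sketch of that contract (illustrative only; real databases replace the linear scan with an index such as HNSW or IVF, and add metadata filtering and persistence):

```python
# Minimal vector-store sketch: upsert embeddings, return top-k by cosine similarity
import numpy as np

class TinyVectorStore:
    def __init__(self):
        self.ids: list[str] = []
        self.vectors: list[np.ndarray] = []

    def upsert(self, doc_id: str, embedding: list[float]) -> None:
        self.ids.append(doc_id)
        self.vectors.append(np.asarray(embedding, dtype=np.float32))

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        q = np.asarray(query, dtype=np.float32)
        mat = np.stack(self.vectors)
        # Cosine similarity of the query against every stored vector
        sims = (mat @ q) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q))
        top = np.argsort(sims)[::-1][:k]  # indices of the k most similar
        return [(self.ids[i], float(sims[i])) for i in top]
```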
3. Caching Layer
Reduces cost and latency by reusing cached responses for repeated or near-duplicate queries instead of calling the model again.
```python
# Semantic caching pattern
import numpy as np

def cosine_similarity(a, b) -> float:
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

class SemanticCache:
    def __init__(self, similarity_threshold: float = 0.95):
        self.threshold = similarity_threshold
        self.entries = []  # (embedding, response) pairs

    def get(self, query_embedding):
        for cached_embedding, response in self.entries:
            if cosine_similarity(query_embedding, cached_embedding) >= self.threshold:
                return response
        return None  # miss: call the LLM, then store the result via set()

    def set(self, query_embedding, response):
        self.entries.append((query_embedding, response))
```
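The linear scan is fine for a small cache; at scale, store the cached embeddings in the vector database itself so lookups use an index instead of comparing against every entry. Treat the threshold as a tuning knob: too low and users get answers to subtly different questions, too high and the hit rate collapses.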
4. Message Queue
Handles async processing for long-running AI tasks.
When to use:
- Tasks taking > 30 seconds
- Batch processing of documents
- Retry logic for failed LLM calls
Popular choices: Redis Queue, Celery, AWS SQS
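As a concrete illustration, here is a minimal sketch using Celery with a Redis broker; the broker URL, task name, and stubbed LLM call are assumptions, not fixed choices:

```python
# tasks.py: async LLM task with retries (Celery + Redis broker)
from celery import Celery

app = Celery("ai_tasks", broker="redis://localhost:6379/0")

@app.task(bind=True, max_retries=3)
def summarize_document(self, text: str) -> str:
    try:
        # Stand-in for a real provider call, e.g. LLMProvider("openai").complete(...)
        return fake_llm_call(f"Summarize:\n{text}")
    except Exception as exc:
        # Exponential backoff between attempts: 1s, 2s, 4s
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)

def fake_llm_call(prompt: str) -> str:
    return "stub summary"  # stand-in so the sketch runs without an API key
```

The caller enqueues work with `summarize_document.delay(text)` and returns immediately; a separate worker process executes the task and handles the retries.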
5. API Gateway
The entry point for all requests.
Responsibilities:
- Authentication and authorization
- Rate limiting (critical for cost control; see the sketch below)
- Request routing
- Response formatting
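Because every uncached request can trigger a paid model call, rate limiting is worth sketching concretely. A minimal per-key token bucket (the capacity and refill rate are illustrative defaults):

```python
# Token-bucket rate limiter sketch: each API key holds up to `capacity`
# tokens, refilled at `rate` tokens/second; a request consumes one token.
import time

class TokenBucket:
    def __init__(self, capacity: int = 10, rate: float = 1.0):
        self.capacity = capacity
        self.rate = rate
        self.buckets = {}  # api_key -> (tokens, last_refill_timestamp)

    def allow(self, api_key: str) -> bool:
        now = time.monotonic()
        tokens, last = self.buckets.get(api_key, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[api_key] = (tokens - 1, now)
            return True
        self.buckets[api_key] = (tokens, now)
        return False
```

In production this state typically lives in Redis so the limits hold across gateway replicas.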
Component Selection Framework
When choosing components, ask:
- Scale: How many requests per second?
- Latency: What's acceptable response time?
- Cost: What's the budget per query?
- Reliability: What's the uptime requirement?
- Complexity: Can the team maintain it?
Next, we'll dive into scalability concepts for AI systems.