AI System Design Fundamentals

Design Framework

Having a structured approach separates good candidates from great ones. The RADIO framework provides a systematic way to tackle any AI system design problem.

The RADIO Framework

| Step | Focus | Time |
|------|-------|------|
| Requirements | What are we building? | 5-10 min |
| Architecture | High-level components | 10-15 min |
| Data | Storage, flow, models | 5-10 min |
| Infrastructure | Scaling, deployment | 5-10 min |
| Operations | Monitoring, safety | 5 min |

R - Requirements

Always start here. Ask clarifying questions:

Functional Requirements:
- What is the primary user interaction?
- What inputs and outputs are expected?
- Are there accuracy requirements?

Non-Functional Requirements:
- Expected QPS (queries per second)?
- Acceptable latency (p50, p99)?
- Budget constraints?
- Compliance requirements (GDPR, HIPAA)?

Example dialogue:

Interviewer: "Design a document Q&A system."

You: "Before I start, I'd like to clarify a few things:

  • How large are the documents? Single pages or hundreds of pages?
  • Do we need to cite sources in our answers?
  • What's the expected latency? Sub-second or is 5-10 seconds acceptable?
  • How many concurrent users should we support?"
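
The QPS, latency, and concurrency questions are linked by Little's law: requests in flight = arrival rate × average time in the system. A minimal sketch of that arithmetic follows; the function name and the traffic numbers are hypothetical and chosen only for illustration.

```python
import math


def required_concurrency(qps: float, avg_latency_s: float) -> int:
    """Little's law: in-flight requests = arrival rate x average time in system."""
    return math.ceil(qps * avg_latency_s)


# Hypothetical answers from the interviewer: 50 QPS, 2 s average latency.
in_flight = required_concurrency(qps=50, avg_latency_s=2.0)
print(in_flight)  # 100
```

If the interviewer relaxes latency from sub-second to 5-10 seconds, the same QPS implies several times more requests held open at once, which feeds directly into the Infrastructure step.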

A - Architecture

Draw the high-level system:

┌──────────────────────────────────────────────────────────┐
│                    Document Q&A System                    │
├──────────────────────────────────────────────────────────┤
│                                                          │
│   ┌─────────┐    ┌─────────┐    ┌─────────────────────┐ │
│   │  User   │───▶│   API   │───▶│   Query Processor   │ │
│   └─────────┘    └─────────┘    └─────────────────────┘ │
│                                          │               │
│                                          ▼               │
│   ┌─────────┐    ┌─────────┐    ┌─────────────────────┐ │
│   │ Vector  │◀───│Embedding│◀───│      Retriever      │ │
│   │   DB    │    │  Model  │    └─────────────────────┘ │
│   └─────────┘    └─────────┘             │               │
│                                          ▼               │
│                              ┌─────────────────────────┐ │
│                              │    LLM (Generation)     │ │
│                              └─────────────────────────┘ │
│                                          │               │
│                                          ▼               │
│                              ┌─────────────────────────┐ │
│                              │   Response + Citations  │ │
│                              └─────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
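
The query path in the diagram can be sketched as one function with pluggable components. This is an illustrative outline, not a reference implementation: `Passage`, `Answer`, `answer_query`, and the two stand-in functions are hypothetical names, and the `retrieve` callable hides the embedding model and vector DB behind a single call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Passage:
    doc_id: str
    text: str


@dataclass
class Answer:
    text: str
    citations: List[str]


def answer_query(
    question: str,
    retrieve: Callable[[str, int], List[Passage]],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> Answer:
    """Query Processor -> Retriever -> LLM -> Response + Citations."""
    cleaned = " ".join(question.split())  # query processing: normalize whitespace
    passages = retrieve(cleaned, top_k)
    context = "\n".join(f"[{p.doc_id}] {p.text}" for p in passages)
    prompt = f"Answer using only the context.\n{context}\nQuestion: {cleaned}"
    return Answer(text=generate(prompt), citations=[p.doc_id for p in passages])


# Stand-ins so the sketch runs without a vector DB or an LLM provider.
def fake_retrieve(query: str, k: int) -> List[Passage]:
    corpus = [
        Passage("doc-1", "Refunds are issued within 14 days."),
        Passage("doc-2", "Shipping takes 3 to 5 business days."),
    ]
    return corpus[:k]


def fake_generate(prompt: str) -> str:
    return "Refunds are issued within 14 days."


result = answer_query("  How long do refunds take? ", fake_retrieve, fake_generate, top_k=1)
print(result.citations)  # ['doc-1']
```

Passing the retriever and generator in as callables keeps each box in the diagram independently replaceable, which is useful when the discussion moves to scaling and fallbacks.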

D - Data

Define data models and flow:

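# Core data models
from __future__ import annotations  # lets Document reference Chunk before it is defined

from dataclasses import dataclass
from typing import List

@dataclass
class Document:
    id: str
    content: str
    metadata: dict  # source, date, author
    chunks: List[Chunk]

@dataclass
class Chunk:
    id: str
    document_id: str
    content: str
    embedding: List[float]
    position: int  # Position within the document, for citation

@dataclass
class Query:
    id: str
    text: str
    embedding: List[float]
    retrieved_chunks: List[Chunk]
    response: str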

Data flow:

  1. Documents ingested → chunked → embedded → stored
  2. Query received → embedded → similar chunks retrieved
  3. Chunks + query → LLM → response with citations
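
Steps 1 and 2 of the data flow can be sketched with word-window chunking and cosine-similarity search. This is a toy version under stated assumptions: the function names are hypothetical, the 2-d vectors are hand-written stand-ins for a real embedding model, and a production system would delegate the search to a vector DB.

```python
import math
from typing import List, Tuple


def chunk_text(text: str, size: int, overlap: int) -> List[str]:
    """Split text into windows of `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # last window reached the end of the text
            break
    return chunks


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: List[float], index: List[Tuple[str, List[float]]], k: int) -> List[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]


chunks = chunk_text("a b c d e", size=3, overlap=1)
print(chunks)  # ['a b c', 'c d e']

# Hand-written vectors stand in for real embeddings.
index = [("c1", [1.0, 0.0]), ("c2", [0.0, 1.0]), ("c3", [1.0, 1.0])]
nearest = top_k([1.0, 0.0], index, k=2)
print(nearest)  # ['c1', 'c3']
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing some text twice.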

I - Infrastructure

Discuss scaling and deployment:

| Component | Scaling Strategy |
|-----------|------------------|
| API Layer | Horizontal, auto-scale on CPU |
| Vector DB | Sharding by document collection |
| LLM Calls | Multiple API keys, provider fallback |
| Cache | Redis cluster, replicated |
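
Provider fallback for LLM calls can be sketched as an ordered list of callables tried in turn. The names here (`call_with_fallback`, `primary`, `secondary`) are hypothetical, and the two providers are fakes rather than real SDK clients; real code would catch provider-specific errors and likely add timeouts and retries.

```python
from typing import Callable, Sequence, Tuple


def call_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, completion)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # assumption: any exception means "try the next one"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary(prompt: str) -> str:
    raise TimeoutError("rate limited")  # simulate an outage


def secondary(prompt: str) -> str:
    return f"echo: {prompt}"


used, text = call_with_fallback("hi", [("primary", primary), ("secondary", secondary)])
print(used, text)  # secondary echo: hi
```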

Cost estimation:

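Daily queries: 100,000
Avg tokens per query: 2,000 (input) + 500 (output)
Pricing implied by the formula: $0.01 per 1K input tokens, $0.03 per 1K output tokens

LLM cost: 100K × ($0.01 × 2 + $0.03 × 0.5) = $3,500/day ≈ $105,000/month (30 days)
Vector DB: $100/month
Infrastructure: $500/month

Total: ~$105,600/month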
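
The same estimate as a small calculation, which makes it easy to rerun when the interviewer changes an input. The function name is hypothetical; prices are treated as per 1K tokens and the month as 30 days, both assumptions read off the arithmetic above.

```python
def daily_llm_cost(
    queries: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Daily LLM spend in dollars, with prices quoted per 1,000 tokens."""
    per_query = (
        (input_tokens / 1000) * input_price_per_1k
        + (output_tokens / 1000) * output_price_per_1k
    )
    return queries * per_query


daily = daily_llm_cost(100_000, 2_000, 500, 0.01, 0.03)
monthly = daily * 30 + 100 + 500  # assumption: 30-day month, plus vector DB and infrastructure

print(f"${daily:,.0f}/day")      # $3,500/day
print(f"${monthly:,.0f}/month")  # $105,600/month
```

The LLM calls account for over 99% of the total, so caching and prompt trimming are the levers worth discussing first.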

O - Operations

Cover monitoring and safety:

Monitoring:

  • Latency percentiles (p50, p95, p99)
  • Error rates by type
  • Cache hit rates
  • Cost per query
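
Latency percentiles can be computed from raw samples with the nearest-rank method. This is a minimal sketch with made-up latencies; a production setup would more likely rely on histogram metrics from its monitoring stack than on sorting raw samples.

```python
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, with one slow outlier.
latencies_ms = [120, 80, 200, 95, 1500, 110, 105, 90, 130, 100]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(p50, p95)  # 105 1500
```

The median looks healthy while p95 exposes the 1.5-second outlier, which is why tail percentiles are tracked alongside p50.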

Safety:

  • Input validation (length, content filtering)
  • Output guardrails (PII detection, harmful content)
  • Rate limiting per user
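
Per-user rate limiting is often implemented as a token bucket. The sketch below is one in-memory, single-process version under that assumption; the class name is hypothetical, and a multi-instance API layer would need shared state (for example in the cache tier) instead of a local dict.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-user token bucket: bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._state: Dict[str, Tuple[float, float]] = {}  # user_id -> (tokens, last_seen)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        tokens, last = self._state.get(user_id, (self.capacity, now))
        # Refill for the elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        self._state[user_id] = (tokens - 1 if allowed else tokens, now)
        return allowed


fake_now = [0.0]
limiter = TokenBucketLimiter(capacity=2, rate=1.0, clock=lambda: fake_now[0])

decisions = [limiter.allow("alice") for _ in range(3)]
print(decisions)  # [True, True, False]

fake_now[0] = 1.0  # one second later, one token has been refilled
print(limiter.allow("alice"))  # True
```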

Evaluation:

  • Automated metrics (retrieval accuracy, response relevance)
  • Human evaluation sampling
  • A/B testing for prompt changes
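
For the automated retrieval metrics, two common choices are recall@k and reciprocal rank. The source does not name specific metrics, so treat these as one reasonable option; the chunk ids and relevance labels below are hypothetical.

```python
from typing import List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant chunks that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical labelled example: two chunks are known to answer the query.
retrieved = ["c7", "c2", "c9", "c4"]
relevant = {"c2", "c4"}

print(recall_at_k(retrieved, relevant, k=3))  # 0.5
print(reciprocal_rank(retrieved, relevant))   # 0.5
```

Averaging these over a labelled query set gives a number that can be compared across prompt or retriever changes before running an A/B test.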

Framework in Action

Practice: When given a design problem, write down RADIO vertically and fill in each section. This keeps you organized and ensures you don't miss critical aspects.

Now that you have the fundamentals, let's dive into LLM application architecture.
