AI System Design Fundamentals
Design Framework
A structured approach is what separates great candidates from good ones. The RADIO framework provides a systematic way to tackle any AI system design problem.
The RADIO Framework
| Step | Focus | Time |
|---|---|---|
| Requirements | What are we building? | 5-10 min |
| Architecture | High-level components | 10-15 min |
| Data | Storage, flow, models | 5-10 min |
| Infrastructure | Scaling, deployment | 5-10 min |
| Operations | Monitoring, safety | 5 min |
R - Requirements
Always start here. Ask clarifying questions:
Functional Requirements:
- What is the primary user interaction?
- What inputs and outputs are expected?
- Are there accuracy requirements?
Non-Functional Requirements:
- Expected QPS (queries per second)?
- Acceptable latency (p50, p99)?
- Budget constraints?
- Compliance requirements (GDPR, HIPAA)?
Example dialogue:
Interviewer: "Design a document Q&A system."
You: "Before I start, I'd like to clarify a few things:
- How large are the documents? Single pages or hundreds of pages?
- Do we need to cite sources in our answers?
- What's the expected latency? Sub-second or is 5-10 seconds acceptable?
- How many concurrent users should we support?"
A - Architecture
Draw the high-level system:
┌──────────────────────────────────────────────────────────┐
│ Document Q&A System │
├──────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────────────────┐ │
│ │ User │───▶│ API │───▶│ Query Processor │ │
│ └─────────┘ └─────────┘ └─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────────────────┐ │
│ │ Vector │◀───│Embedding│◀───│ Retriever │ │
│ │ DB │ │ Model │ └─────────────────────┘ │
│ └─────────┘ └─────────┘ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ LLM (Generation) │ │
│ └─────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ Response + Citations │ │
│ └─────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
D - Data
Define data models and flow:
```python
from __future__ import annotations  # allows the forward reference to Chunk

from typing import List

# Core data models
class Document:
    id: str
    content: str
    metadata: dict  # source, date, author
    chunks: List[Chunk]

class Chunk:
    id: str
    document_id: str
    content: str
    embedding: List[float]
    position: int  # for citation

class Query:
    id: str
    text: str
    embedding: List[float]
    retrieved_chunks: List[Chunk]
    response: str
```
Data flow:
- Documents ingested → chunked → embedded → stored
- Query received → embedded → similar chunks retrieved
- Chunks + query → LLM → response with citations
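The ingest-and-retrieve flow above can be sketched end to end. This is a minimal illustration only: the character-frequency `embed` function is a toy stand-in for a real embedding model, and all names (`ingest`, `retrieve`, `cosine`) are hypothetical.

```python
import math
from dataclasses import dataclass

def embed(text: str) -> list[float]:
    # Toy embedding: normalized letter-frequency vector.
    # A real system would call an embedding model here instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-length, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

@dataclass
class Chunk:
    id: str
    document_id: str
    content: str
    position: int
    embedding: list[float]

def ingest(doc_id: str, content: str, chunk_size: int = 100) -> list[Chunk]:
    # Documents ingested -> chunked -> embedded -> "stored" (returned here).
    return [
        Chunk(f"{doc_id}-{i}", doc_id, content[i:i + chunk_size], i,
              embed(content[i:i + chunk_size]))
        for i in range(0, len(content), chunk_size)
    ]

def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    # Query embedded -> most similar chunks retrieved.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, c.embedding), reverse=True)[:k]
```

In a real pipeline the retrieved chunks and the query would then be assembled into a prompt for the generation step.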
I - Infrastructure
Discuss scaling and deployment:
| Component | Scaling Strategy |
|---|---|
| API Layer | Horizontal, auto-scale on CPU |
| Vector DB | Sharding by document collection |
| LLM Calls | Multiple API keys, provider fallback |
| Cache | Redis cluster, replicated |
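The "provider fallback" row can be made concrete with a short sketch: try each configured provider in order and move to the next on failure. The provider callables here are hypothetical placeholders for real SDK calls.

```python
from typing import Callable

class ProviderError(Exception):
    """Raised when an LLM provider call fails (timeout, rate limit, outage)."""

def call_with_fallback(prompt: str,
                       providers: list[tuple[str, Callable[[str], str]]]) -> str:
    # Try each (name, call_fn) provider in order; fall back on failure.
    errors = []
    for name, call_fn in providers:
        try:
            return call_fn(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))
```

In production you would add per-provider timeouts and retry budgets, but the ordering-with-fallback shape stays the same.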
Cost estimation (assuming $0.01 per 1K input tokens and $0.03 per 1K output tokens):
Daily queries: 100,000
Avg tokens per query: 2,000 (input) + 500 (output)
LLM cost: 100K × ($0.01 × 2 + $0.03 × 0.5) = $3,500/day
Vector DB: $100/month
Infrastructure: $500/month
Total: ~$106,000/month ($3,500/day × 30 + $600)
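It's worth being able to reproduce this kind of back-of-the-envelope arithmetic on the spot. A small helper (hypothetical function name, rates as assumed above) makes the calculation explicit:

```python
def llm_cost_per_day(queries: int, in_tokens: int, out_tokens: int,
                     in_rate_per_1k: float, out_rate_per_1k: float) -> float:
    # Cost = queries x (input tokens + output tokens, each at its per-1K rate)
    per_query = (in_tokens / 1000) * in_rate_per_1k + (out_tokens / 1000) * out_rate_per_1k
    return queries * per_query

daily = llm_cost_per_day(100_000, 2_000, 500, 0.01, 0.03)  # ~ $3,500/day
monthly = daily * 30 + 100 + 500                           # ~ $105,600/month
```

Note that LLM API calls dominate the bill; the fixed vector DB and infrastructure costs are rounding error by comparison, which is why caching and prompt trimming are the highest-leverage cost optimizations.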
O - Operations
Cover monitoring and safety:
Monitoring:
- Latency percentiles (p50, p95, p99)
- Error rates by type
- Cache hit rates
- Cost per query
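Latency percentiles are simple to compute from raw samples. This is a nearest-rank sketch; production systems typically use histogram-based estimators rather than sorting every sample.

```python
import math

def percentile(latencies_ms: list[float], p: float) -> float:
    # Nearest-rank percentile: the smallest sample such that at least
    # p% of all samples are <= it.
    s = sorted(latencies_ms)
    rank = max(1, math.ceil(p / 100 * len(s)))
    return s[rank - 1]
```

Reporting p50, p95, and p99 together surfaces tail latency that an average would hide.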
Safety:
- Input validation (length, content filtering)
- Output guardrails (PII detection, harmful content)
- Rate limiting per user
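Per-user rate limiting can be sketched with a sliding-window counter. The class name and interface here are illustrative, not a specific library's API.

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    """Per-user rate limiter: at most max_requests per window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self._hits[user_id]
        while q and q[0] <= now - self.window:  # evict hits outside the window
            q.popleft()
        if len(q) < self.max_requests:
            q.append(now)
            return True
        return False
```

At scale this state would live in a shared store such as Redis so that all API replicas enforce one consistent limit.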
Evaluation:
- Automated metrics (retrieval accuracy, response relevance)
- Human evaluation sampling
- A/B testing for prompt changes
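One common automated retrieval metric is recall@k: the fraction of known-relevant chunks that appear in the top-k retrieved results. A minimal sketch (hypothetical function name):

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    # Fraction of relevant chunks found in the top-k retrieved results.
    if not relevant_ids:
        return 0.0
    hits = set(retrieved_ids[:k]) & relevant_ids
    return len(hits) / len(relevant_ids)
```

Tracked over a fixed evaluation set, this catches retrieval regressions before they surface as bad answers.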
Framework in Action
Practice: When given a design problem, write down RADIO vertically and fill in each section. This keeps you organized and ensures you don't miss critical aspects.
Now that you have the fundamentals, let's dive into LLM application architecture.