Building Your Own AI Assistant

Model Selection & Platform Considerations

5 min read

Choosing the right model for your AI assistant is crucial. This lesson surveys the model landscape as of January 2026 and shows how to select a model based on your needs.

Current Model Landscape (January 2026)

Claude Models (Anthropic)

| Model | Best For | Context | Cost (input/output per MTok) |
| --- | --- | --- | --- |
| Claude Opus 4.5 | Complex reasoning, agentic tasks | 200K | $15 / $75 |
| Claude Sonnet 4.5 | Balanced performance, coding | 200K | $3 / $15 |
| Claude Haiku 4 | Fast responses, simple tasks | 200K | $0.25 / $1.25 |

Claude Model Selection:
Opus 4.5 when:
- Complex multi-step reasoning needed
- Highest accuracy required
- Building autonomous agents
- Top benchmark performance matters (80.9% on SWE-Bench Verified)

Sonnet 4.5 when:
- Coding tasks
- Balance of speed and quality
- Most production use cases
- Best coding model currently

Haiku 4 when:
- Simple queries
- High volume, low latency
- Cost-sensitive applications
- Classification tasks
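The per-MTok prices in the Claude table above translate directly into per-request costs. The sketch below is illustrative: the model keys and `estimate_cost` helper are made up for this lesson, but the prices come from the table.

```python
# Rough per-request cost estimator using the Claude prices above
# ($ input / $ output per million tokens). Model keys are illustrative.
PRICES_PER_MTOK = {
    "claude-opus-4.5":   (15.00, 75.00),
    "claude-sonnet-4.5": (3.00, 15.00),
    "claude-haiku-4":    (0.25, 1.25),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of one request."""
    in_price, out_price = PRICES_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply on Sonnet 4.5
print(round(estimate_cost("claude-sonnet-4.5", 2_000, 500), 4))  # 0.0135
```

Note the 5x output/input price ratio: verbose responses dominate cost, which is one reason routing simple queries to Haiku pays off quickly.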

OpenAI Models

| Model | Best For | Context | Cost |
| --- | --- | --- | --- |
| GPT-5.2 Pro | Complex analysis, research | 128K | Premium |
| GPT-5.2 Thinking | Step-by-step reasoning | 128K | High |
| GPT-5.2 Instant | Fast responses | 128K | Standard |
| GPT-5.2-Codex | Agentic coding | 128K | Variable |

GPT-5.2 Selection:
Pro when:
- Deep research tasks
- Maximum capability needed
- Complex document analysis

Thinking when:
- Multi-step problems
- Math and logic
- Explicit reasoning required

Instant when:
- Real-time applications
- Chatbots
- Quick answers

Codex when:
- Autonomous coding
- Multi-file operations
- Agentic development

Google Models

| Model | Best For | Context | Cost |
| --- | --- | --- | --- |
| Gemini 3 Ultra | Multimodal, complex tasks | 2M | Premium |
| Gemini 3 Pro | Production workloads | 1M | Standard |
| Gemini 3 Flash | Speed-optimized | 1M | Low |

Gemini 3 Selection:
Ultra when:
- Complex multimodal tasks
- Video analysis
- Maximum capability

Pro when:
- Production applications
- Balanced performance
- Long context needs

Flash when:
- Fastest responses needed (roughly 3x faster than Gemini 2.5 Pro)
- High-volume applications

Model Selection Framework

Decision Matrix

Step 1: Task Complexity
├── Simple (Q&A, classification) → Haiku/Flash/Instant
├── Medium (coding, analysis) → Sonnet/Pro
└── Complex (reasoning, agents) → Opus/Pro/Ultra

Step 2: Latency Requirements
├── Real-time (<1s) → Haiku/Flash/Instant
├── Interactive (<5s) → Sonnet/Pro
└── Batch (any) → Any model

Step 3: Cost Constraints
├── High volume, low margin → Haiku/Flash
├── Moderate volume → Sonnet/Pro
└── Low volume, high value → Opus/Ultra

Step 4: Special Requirements
├── Coding → Claude Sonnet 4.5, GPT-5.2-Codex
├── Long context → Gemini 3 (2M tokens)
├── Multimodal → Gemini 3, Claude Opus 4.5
└── Agentic → Claude Opus 4.5, Devin
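The four steps above can be collapsed into a small routing function. This is a minimal sketch: the tier labels mirror the matrix, but the function name, argument shapes, and thresholds (e.g. the 1-second latency cutoff) are assumptions for illustration.

```python
# Sketch of the decision matrix as code. Steps 1-3 are applied in
# order; tier labels match the matrix above.
def pick_tier(complexity: str, latency_budget_s: float, high_volume: bool) -> str:
    """Map task traits to a model tier per the decision matrix."""
    # Step 1/2: simple tasks or real-time budgets go to the fast tier
    if complexity == "simple" or latency_budget_s < 1:
        return "Haiku/Flash/Instant"
    # Step 1/3: complex, low-volume, high-value work justifies the top tier
    if complexity == "complex" and not high_volume:
        return "Opus/Pro/Ultra"
    # Default: the balanced middle tier
    return "Sonnet/Pro"

print(pick_tier("simple", 5.0, True))     # Haiku/Flash/Instant
print(pick_tier("complex", 30.0, False))  # Opus/Pro/Ultra
print(pick_tier("medium", 5.0, False))    # Sonnet/Pro
```

Step 4 (special requirements like long context or agentic coding) would layer on top as overrides, pinning specific models rather than tiers.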

Cost Optimization Strategies

Model Routing:
┌─────────────────┐
│  User Request   │
└────────┬────────┘
    ┌────┴────┐
    │Classifier│ (Haiku/Flash - cheap)
    └────┬────┘
    ┌────┴────────────────────────┐
    │                              │
    ▼                              ▼
Simple Request              Complex Request
    │                              │
    ▼                              ▼
Haiku/Flash                 Sonnet/Opus
(Fast, cheap)              (Capable, expensive)
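The routing diagram above can be sketched in a few lines. In production the classifier would itself be a cheap model call (Haiku/Flash); here it is a keyword heuristic stand-in, and the tier names are placeholders.

```python
# Sketch of classifier-based model routing. classify() stands in for a
# cheap Haiku/Flash-class model call; markers are illustrative.
def classify(request: str) -> str:
    """Placeholder classifier: keyword heuristic instead of a model call."""
    complex_markers = ("refactor", "analyze", "plan", "debug")
    return "complex" if any(m in request.lower() for m in complex_markers) else "simple"

def route(request: str) -> str:
    """Send simple requests to the cheap tier, complex ones to the capable tier."""
    return {"simple": "haiku-or-flash", "complex": "sonnet-or-opus"}[classify(request)]

print(route("What time zone is UTC+2?"))            # haiku-or-flash
print(route("Refactor this module into packages"))  # sonnet-or-opus
```

The pattern works because the classifier call costs a tiny fraction of a full response from the expensive model, so even modest routing accuracy saves money.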

Cursor's Model Strategy

From Cursor's implementation:

Cursor Model Routing:
- Tab completion: Fast model (Haiku-class)
- Chat: Sonnet 4.5 (default)
- Agent mode: Opus 4.5 (complex tasks)
- Background agents: Sonnet 4.5

Dynamic selection based on:
- Task complexity detected
- User tier (Pro/Business)
- Token budget remaining

Platform Comparison

Direct API vs. Managed Platforms

Direct API (Claude/OpenAI/Google):
Pros:
- Full control
- Lower per-token cost
- Custom implementation
- No vendor lock-in (prompt level)

Cons:
- Build infrastructure yourself
- Handle rate limiting
- Manage failovers
- No built-in tools

Managed Platforms (Cursor/Windsurf):
Pros:
- Built-in IDE integration
- Pre-configured tools
- Team collaboration
- Managed infrastructure

Cons:
- Higher cost
- Less customization
- Platform dependencies

Multi-Model Architecture

Multi-Model Pattern:
┌─────────────────────────────────────┐
│         Application Layer           │
├─────────────────────────────────────┤
│         Router/Orchestrator         │
├─────────┬─────────┬─────────────────┤
│ Claude  │  GPT    │    Gemini       │
│ Sonnet  │  5.2    │    Flash        │
├─────────┴─────────┴─────────────────┤
│         Fallback Strategy           │
└─────────────────────────────────────┘

Benefits:
- Redundancy
- Cost optimization
- Feature matching
- No single vendor dependency
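The redundancy benefit comes from the fallback strategy at the bottom of the diagram: if one provider fails, the orchestrator retries the next. A minimal sketch, using stub callables in place of real SDK clients:

```python
# Sketch of the fallback layer in the multi-model pattern: try
# providers in order, falling through on failure. The client callables
# are stubs; in practice each would wrap a real provider SDK.
from typing import Callable

def with_fallback(providers: list[tuple[str, Callable[[str], str]]],
                  prompt: str) -> tuple[str, str]:
    """Try each (name, client) pair in order; return (provider, reply)."""
    last_error = None
    for name, client in providers:
        try:
            return name, client(prompt)
        except Exception as exc:  # e.g. timeout, rate limit
            last_error = exc
    raise RuntimeError("all providers failed") from last_error

def flaky(prompt: str) -> str:
    raise TimeoutError("primary unavailable")

def backup(prompt: str) -> str:
    return f"answer to: {prompt}"

provider, reply = with_fallback([("claude", flaky), ("gemini", backup)], "hello")
print(provider)  # gemini
```

A real orchestrator would also normalize each provider's prompt format and response schema, which is where the "feature matching" benefit is earned.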

Context Window Strategies

Context Management:

Small context (<10K tokens):
- Direct prompt
- No special handling

Medium context (10K-50K):
- Chunking strategy
- Summary of earlier context
- Key information first

Large context (50K-200K):
- RAG hybrid approach
- Rolling summaries
- Indexed retrieval

Massive context (200K+):
- Gemini 3 for full context
- Or hierarchical summarization
- Key-value caching
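The four context bands map naturally to a dispatcher. The band boundaries below come from the text; the strategy labels are shorthand placeholders for the techniques listed above.

```python
# Sketch mapping context size to a handling strategy, per the bands
# above. Strategy names are shorthand labels, not library calls.
def context_strategy(tokens: int) -> str:
    if tokens < 10_000:
        return "direct-prompt"                # small: no special handling
    if tokens < 50_000:
        return "chunk-and-summarize"          # medium: chunk, summarize earlier turns
    if tokens <= 200_000:
        return "rag-hybrid"                   # large: indexed retrieval + rolling summaries
    return "long-context-model-or-summary"    # massive: Gemini 3 or hierarchical summary

print(context_strategy(4_000))    # direct-prompt
print(context_strategy(120_000))  # rag-hybrid
print(context_strategy(500_000))  # long-context-model-or-summary
```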

Testing Across Models

Model Testing Framework:
1. Define test cases
   - Simple queries
   - Complex reasoning
   - Edge cases
   - Safety scenarios

2. Run against multiple models
   - Record responses
   - Measure latency
   - Track costs

3. Evaluate quality
   - Accuracy score
   - Format compliance
   - Safety adherence

4. Calculate ROI
   - Quality per dollar
   - Latency trade-offs
   - User satisfaction
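Steps 1-4 fit into a small evaluation harness. This sketch substitutes local functions for model calls and uses substring matching as the accuracy check; real evals would use graded rubrics and actual per-token costs.

```python
# Minimal eval harness for the framework above: run each case against
# each model, record accuracy and latency, compute quality per dollar.
# Model functions and per-call costs are stubs for illustration.
import time

def run_eval(models: dict, cases: list[tuple[str, str]]) -> dict:
    """models: {name: (fn, cost_per_call)}; cases: [(prompt, expected)]."""
    results = {}
    for name, (fn, cost) in models.items():
        passed, started = 0, time.perf_counter()
        for prompt, expected in cases:
            if expected in fn(prompt):        # crude substring "grading"
                passed += 1
        elapsed = time.perf_counter() - started
        accuracy = passed / len(cases)
        results[name] = {
            "accuracy": accuracy,
            "latency_s": elapsed,
            "quality_per_dollar": accuracy / (cost * len(cases)),
        }
    return results

models = {"cheap": (lambda p: p.upper(), 0.001), "echo": (lambda p: p, 0.01)}
cases = [("hello", "HELLO"), ("bye", "BYE")]
scores = run_eval(models, cases)
print(scores["cheap"]["accuracy"])  # 1.0
print(scores["echo"]["accuracy"])   # 0.0
```

The quality-per-dollar figure is the key output: a model that scores 5% lower but costs 10x less often wins the routing slot.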

Model Migration Path

Migration Strategy:
Phase 1: Start with best model (Opus/Pro)
- Learn what works
- No cost optimization
- Focus on quality

Phase 2: Identify downgrade candidates
- Simple queries
- Repeatable patterns
- Non-critical paths

Phase 3: Implement routing
- Classify requests
- Route appropriately
- Monitor quality

Phase 4: Optimize continuously
- A/B test models
- Update routing rules
- New model evaluation
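Phases 3 and 4 often start as a simple traffic split: send a fraction of requests to the cheaper model and compare quality before updating routing rules. A sketch, with the bucketing function and 20% split as illustrative assumptions:

```python
# Sketch of a phased A/B traffic split for model migration: route a
# configurable fraction of requests to the cheaper model, keep the
# rest on the premium one. Seeding per request ID keeps each
# request's bucket stable across retries.
import random

def ab_route(request_id: int, cheap_fraction: float = 0.2) -> str:
    """Deterministically bucket a request into 'cheap' or 'premium'."""
    rng = random.Random(request_id)  # stable per-request bucket
    return "cheap" if rng.random() < cheap_fraction else "premium"

buckets = [ab_route(i) for i in range(1000)]
share = buckets.count("cheap") / len(buckets)
print(round(share, 2))  # typically close to 0.2
```

Because the bucket is a pure function of the request ID, a user who retries gets the same model, which keeps quality comparisons clean.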

Key Insight: Model selection isn't a one-time decision; it's an ongoing optimization. Start with the best model for quality, then gradually introduce routing for cost efficiency. The best production systems use multiple models strategically, based on task requirements.

Next, we'll put it all together with a complete implementation example.

Quiz

Module 6: Building Your Own AI Assistant
