Building Your Own AI Assistant
Model Selection & Platform Considerations
5 min read
Choosing the right model for your AI assistant is crucial. This lesson covers the latest models (January 2026) and how to select based on your needs.
Current Model Landscape (January 2026)
Claude Models (Anthropic)
| Model | Best For | Context | Cost |
|---|---|---|---|
| Claude Opus 4.5 | Complex reasoning, agentic tasks | 200K | $15/$75 per MTok |
| Claude Sonnet 4.5 | Balanced performance, coding | 200K | $3/$15 per MTok |
| Claude Haiku 4 | Fast responses, simple tasks | 200K | $0.25/$1.25 per MTok |
Claude Model Selection:
Opus 4.5 when:
- Complex multi-step reasoning needed
- Highest accuracy required
- Building autonomous agents
- 80.9% on SWE-Bench Verified
Sonnet 4.5 when:
- Coding tasks
- Balance of speed and quality
- Most production use cases
- Best coding model currently
Haiku 4 when:
- Simple queries
- High volume, low latency
- Cost-sensitive applications
- Classification tasks
OpenAI Models
| Model | Best For | Context | Cost |
|---|---|---|---|
| GPT-5.2 Pro | Complex analysis, research | 128K | Premium |
| GPT-5.2 Thinking | Step-by-step reasoning | 128K | High |
| GPT-5.2 Instant | Fast responses | 128K | Standard |
| GPT-5.2-Codex | Agentic coding | 128K | Variable |
GPT-5.2 Selection:
Pro when:
- Deep research tasks
- Maximum capability needed
- Complex document analysis
Thinking when:
- Multi-step problems
- Math and logic
- Explicit reasoning required
Instant when:
- Real-time applications
- Chatbots
- Quick answers
Codex when:
- Autonomous coding
- Multi-file operations
- Agentic development
Google Models
| Model | Best For | Context | Cost |
|---|---|---|---|
| Gemini 3 Ultra | Multimodal, complex tasks | 2M | Premium |
| Gemini 3 Pro | Production workloads | 1M | Standard |
| Gemini 3 Flash | Speed-optimized | 1M | Low |
Gemini 3 Selection:
Ultra when:
- Complex multimodal tasks
- Video analysis
- Maximum capability
Pro when:
- Production applications
- Balanced performance
- Long context needs
Flash when:
- Fastest responses needed
- 3x faster than 2.5 Pro
- High-volume applications
Model Selection Framework
Decision Matrix
Step 1: Task Complexity
├── Simple (Q&A, classification) → Haiku/Flash/Instant
├── Medium (coding, analysis) → Sonnet/Pro/Flash
└── Complex (reasoning, agents) → Opus/Pro/Ultra
Step 2: Latency Requirements
├── Real-time (<1s) → Haiku/Flash/Instant
├── Interactive (<5s) → Sonnet/Pro
└── Batch (any) → Any model
Step 3: Cost Constraints
├── High volume, low margin → Haiku/Flash
├── Moderate volume → Sonnet/Pro
└── Low volume, high value → Opus/Ultra
Step 4: Special Requirements
├── Coding → Claude Sonnet 4.5, GPT-5.2-Codex
├── Long context → Gemini 3 (2M tokens)
├── Multimodal → Gemini 3, Claude Opus 4.5
└── Agentic → Claude Opus 4.5, Devin
Cost Optimization Strategies
Model Routing:
┌─────────────────┐
│ User Request │
└────────┬────────┘
│
┌────┴────┐
│Classifier│ (Haiku/Flash - cheap)
└────┬────┘
│
┌────┴────────────────────────┐
│ │
▼ ▼
Simple Request Complex Request
│ │
▼ ▼
Haiku/Flash Sonnet/Opus
(Fast, cheap) (Capable, expensive)
Cursor's Model Strategy
From Cursor's implementation:
Cursor Model Routing:
- Tab completion: Fast model (Haiku-class)
- Chat: Sonnet 4.5 (default)
- Agent mode: Opus 4.5 (complex tasks)
- Background agents: Sonnet 4.5
Dynamic selection based on:
- Task complexity detected
- User tier (Pro/Business)
- Token budget remaining
Platform Comparison
API Direct vs Managed Platforms
Direct API (Claude/OpenAI/Google):
Pros:
- Full control
- Lower per-token cost
- Custom implementation
- No vendor lock-in (prompt level)
Cons:
- Build infrastructure yourself
- Handle rate limiting
- Manage failovers
- No built-in tools
Managed Platforms (Cursor/Windsurf):
Pros:
- Built-in IDE integration
- Pre-configured tools
- Team collaboration
- Managed infrastructure
Cons:
- Higher cost
- Less customization
- Platform dependencies
Multi-Model Architecture
Multi-Model Pattern:
┌─────────────────────────────────────┐
│ Application Layer │
├─────────────────────────────────────┤
│ Router/Orchestrator │
├─────────┬─────────┬─────────────────┤
│ Claude │ GPT │ Gemini │
│ Sonnet │ 5.2 │ Flash │
├─────────┴─────────┴─────────────────┤
│ Fallback Strategy │
└─────────────────────────────────────┘
Benefits:
- Redundancy
- Cost optimization
- Feature matching
- No single vendor dependency
Context Window Strategies
Context Management:
Small context (<10K tokens):
- Direct prompt
- No special handling
Medium context (10K-50K):
- Chunking strategy
- Summary of earlier context
- Key information first
Large context (50K-200K):
- RAG hybrid approach
- Rolling summaries
- Indexed retrieval
Massive context (200K+):
- Gemini 3 for full context
- Or hierarchical summarization
- Key-value caching
Testing Across Models
Model Testing Framework:
1. Define test cases
- Simple queries
- Complex reasoning
- Edge cases
- Safety scenarios
2. Run against multiple models
- Record responses
- Measure latency
- Track costs
3. Evaluate quality
- Accuracy score
- Format compliance
- Safety adherence
4. Calculate ROI
- Quality per dollar
- Latency trade-offs
- User satisfaction
Model Migration Path
Migration Strategy:
Phase 1: Start with best model (Opus/Pro)
- Learn what works
- No cost optimization
- Focus on quality
Phase 2: Identify downgrade candidates
- Simple queries
- Repeatable patterns
- Non-critical paths
Phase 3: Implement routing
- Classify requests
- Route appropriately
- Monitor quality
Phase 4: Optimize continuously
- A/B test models
- Update routing rules
- New model evaluation
Key Insight: Model selection isn't one-time—it's an ongoing optimization. Start with the best model for quality, then gradually introduce routing for cost efficiency. The best production systems use multiple models strategically based on task requirements.
Next, we'll put it all together with a complete implementation example. :::