Open Source Model Landscape
The open-source LLM ecosystem has exploded. Here's your guide to the major model families and when to use each.
The Major Model Families (2025)
┌─────────────────────────────────────────────────────────────────┐
│                   Open Source Model Landscape                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Meta (Llama)          Mistral AI           Microsoft (Phi)     │
│  ────────────          ──────────           ───────────────     │
│  • Llama 3.1/3.2       • Mistral 7B         • Phi-3             │
│  • 1B, 3B, 8B, 70B     • Mixtral 8x7B       • 3.8B (mini)       │
│  • Best all-rounder    • Mixtral 8x22B      • 14B (medium)      │
│  • Llama license       • Best efficiency    • Best for size     │
│                        • Apache 2.0         • MIT License       │
│                                                                 │
│  Alibaba (Qwen)        DeepSeek             Google (Gemma)      │
│  ──────────────        ────────             ──────────────      │
│  • Qwen 2.5            • DeepSeek-V3        • Gemma 2           │
│  • 0.5B to 72B         • DeepSeek-Coder     • 2B, 9B, 27B       │
│  • Best multilingual   • Best cost/perf     • Best instruction  │
│  • Apache 2.0          • MIT License          following         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Model Comparison by Task
Llama 3.1/3.2 (Meta) - The Gold Standard
Best for: General-purpose, most tasks
# Available sizes in Ollama (8B and 70B ship under the llama3.1 tag)
ollama pull llama3.2:1b    # Ultra-fast, edge devices
ollama pull llama3.2       # 3B default, mobile/laptop
ollama pull llama3.1:8b    # Great balance of speed and quality
ollama pull llama3.1:70b   # Maximum capability
Strengths:
- Best overall quality across tasks
- Excellent instruction following
- Strong reasoning and coding
- Most community support and fine-tunes
Weaknesses:
- 70B requires significant hardware
- Not the best for pure coding tasks
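A quick way to sanity-check any of these before committing: pull the default tag and fire a one-off prompt from the shell (a minimal sketch; the prompt text is arbitrary):
ollama pull llama3.2                # 3B default tag
ollama run llama3.2 "Explain mixture-of-experts models in two sentences."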
Mistral/Mixtral (Mistral AI) - Efficiency King
Best for: Fast inference, resource-constrained environments
ollama pull mistral # 7B, excellent efficiency
ollama pull mixtral # 8x7B MoE, near-70B quality
ollama pull mixtral:8x22b # Maximum Mistral capability
Strengths:
- Best tokens-per-second for its quality level
- MoE architecture (Mixtral) activates only ~13B parameters per token
- Excellent for European languages
Weaknesses:
- Smaller community than Llama
- Fewer fine-tuned variants available
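To check the efficiency claim yourself, ollama run's --verbose flag prints timing stats (including eval rate in tokens/second) after each response; running the same prompt across models is a quick, unscientific benchmark:
# Compare generation speed; check the "eval rate" line in the output
ollama run mistral "Write a haiku about GPUs." --verbose
ollama run llama3.1:8b "Write a haiku about GPUs." --verbose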
DeepSeek-V3 - Best Value
Best for: Maximum quality per dollar
ollama pull deepseek-v3 # Latest (December 2024)
ollama pull deepseek-coder # Specialized for code
Strengths:
- Matches GPT-4 on many benchmarks
- Extremely cost-effective training
- Excellent coding capabilities
Weaknesses:
- Newer, less battle-tested
- Fewer fine-tuned variants
- Full model is a 671B-parameter MoE; running it locally far exceeds consumer hardware
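For code work, a common pattern is to inline a file into the prompt with shell substitution (a sketch; main.py is a placeholder for your own file):
# Ask the code-specialized model to review a local file
ollama run deepseek-coder "Review this code for bugs: $(cat main.py)"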
Phi-3 (Microsoft) - Small but Mighty
Best for: Edge deployment, mobile, resource-limited
ollama pull phi3:mini # 3.8B, runs on phones
ollama pull phi3:medium # 14B, laptop-friendly
Strengths:
- Incredible quality for size
- Runs on minimal hardware
- Fast inference
Weaknesses:
- Limited context length
- Not suitable for complex reasoning
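Edge deployments often talk to Ollama over its local REST API rather than the CLI. A minimal request (the prompt and num_ctx value are illustrative; capping the context window keeps memory use down):
curl http://localhost:11434/api/generate -d '{
  "model": "phi3:mini",
  "prompt": "Classify this ticket as bug or feature: app crashes on login",
  "stream": false,
  "options": { "num_ctx": 2048 }
}'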
Qwen 2.5 (Alibaba) - Multilingual Champion
Best for: Non-English languages, especially Asian languages
ollama pull qwen2.5:0.5b # Ultra-compact
ollama pull qwen2.5:7b # Good balance
ollama pull qwen2.5:72b # Full capability
Strengths:
- Best multilingual support
- Excellent for Chinese, Japanese, Korean
- Strong reasoning in all languages
Weaknesses:
- Less English-focused tuning than Llama
- May need more explicit prompting for English-only tasks
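The simplest multilingual test is to prompt in the target language directly (a sketch; any sentence works):
# Ask for an explanation entirely in Chinese
ollama run qwen2.5:7b "请用中文解释什么是大语言模型"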
Quick Selection Guide
What's your primary use case?
├── General Assistant / Chat
│   └── Use: llama3.1:8b or llama3.1:70b
│
├── Code Generation / Review
│   └── Use: deepseek-coder or llama3.1:70b
│
├── Fast Inference Needed
│   └── Use: mistral or phi3:mini
│
├── Non-English Languages
│   └── Use: qwen2.5:7b or qwen2.5:72b
│
├── Edge / Mobile Deployment
│   └── Use: phi3:mini or llama3.2:1b
│
├── RAG / Document Q&A
│   └── Use: llama3.1:8b or mistral
│
└── Maximum Quality (hardware available)
    └── Use: llama3.1:70b or deepseek-v3
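If you want to automate that decision per machine, a small shell sketch can pick a tier from installed RAM (the thresholds are assumptions drawn from the table below; uses macOS sysctl, so swap in /proc/meminfo on Linux):
#!/bin/sh
# Pick a default model tier from total RAM (assumption-laden thresholds)
ram_gb=$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
if   [ "$ram_gb" -ge 80 ]; then model=llama3.1:70b
elif [ "$ram_gb" -ge 12 ]; then model=llama3.1:8b
else                            model=phi3:mini
fi
ollama pull "$model"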
Model Sizes and Requirements
| Model | Parameters | VRAM | RAM (CPU) | Speed (M3 Max) |
|---|---|---|---|---|
| phi3:mini | 3.8B | 3 GB | 6 GB | 80 tok/s |
| mistral | 7B | 6 GB | 10 GB | 45 tok/s |
| llama3.1:8b | 8B | 7 GB | 12 GB | 40 tok/s |
| qwen2.5:14b | 14B | 12 GB | 20 GB | 25 tok/s |
| mixtral:8x7b | 47B* | 26 GB | 48 GB | 20 tok/s |
| llama3.1:70b | 70B | 40 GB | 80 GB | 8 tok/s |
*Mixtral activates ~13B parameters per token (MoE), but the full 47B weights must still fit in memory. Figures assume Ollama's default 4-bit (Q4) quantization.
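The VRAM column follows from quantization arithmetic: at 4-bit precision each weight costs about half a byte, plus overhead for the KV cache and runtime. A back-of-envelope check, not an exact formula:
# ~0.5 bytes/param at Q4, plus roughly 15% runtime overhead
echo "70 * 0.5 * 1.15" | bc   # llama3.1:70b → ~40 GB, matching the table
echo "8 * 0.5 * 1.15" | bc    # llama3.1:8b → ~4.6 GB weights; KV cache pushes it toward 7 GB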
Licensing Overview
Open for Commercial Use:
├── Apache 2.0: Mistral, Qwen 2.5 (most sizes)
├── MIT: DeepSeek, Phi-3
├── Llama Community License: Llama 3.1/3.2 (commercial use with conditions)
└── Gemma License: Gemma 2 (with restrictions)
Key Considerations:
• All listed models allow commercial use, though terms differ
• Llama and Gemma attach acceptable-use policies
• Check fine-tuned model licenses separately
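You can verify what license actually ships with a model from the terminal; ollama show prints model details, including the bundled license text:
ollama show llama3.2    # details include parameters, template, and license
ollama show mistral     # should report Apache 2.0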
Staying Current
The landscape changes monthly. Key resources:
- Hugging Face Open LLM Leaderboard - Benchmark comparisons
- Ollama Model Library (ollama.com/library) - Available models
- r/LocalLLaMA - Community discussions and discoveries
- Papers With Code - Latest research and benchmarks
In the next module, we'll get hands-on with Ollama to run these models locally.