Lesson 3 of 22

Why Local LLMs?

Open Source Model Landscape

3 min read

The open-source LLM ecosystem has exploded. Here's your guide to the major model families and when to use each.

The Major Model Families (2026)

┌─────────────────────────────────────────────────────────────────┐
│                    Open Source Model Landscape                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Meta (Llama)        Mistral AI           Microsoft (Phi)       │
│  ───────────         ──────────           ───────────────       │
│  • Llama 3.3, 4      • Mistral Small 3    • Phi-4               │
│  • 3.2: 1B, 3B       • Mixtral 8x22B      • 14B (medium)        │
│  • 3.1: 8B, 3.3: 70B • Best efficiency    • 3.8B (mini)         │
│  • Llama License     • Apache 2.0         • MIT License         │
│                                                                 │
│  Alibaba (Qwen)      DeepSeek             Google (Gemma)        │
│  ──────────────      ────────             ──────────────        │
│  • Qwen 3 / 2.5      • DeepSeek-V3 / R1   • Gemma 3             │
│  • 0.5B to 72B       • DeepSeek-Coder     • 2B, 9B, 27B         │
│  • Best multilingual • Best cost/perf     • Best instruction    │
│  • Apache 2.0        • MIT License        • following           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Model Comparison by Task

Llama 3.x / 4 (Meta) - The Gold Standard

Best for: General-purpose, most tasks

# Available sizes in Ollama
ollama pull llama3.2:1b    # Ultra-fast, edge devices
ollama pull llama3.2:3b    # Mobile/laptop
ollama pull llama3.1:8b    # Great balance (Llama 3.1)
ollama pull llama3.3:70b   # Maximum capability (Llama 3.3)

Strengths:

  • Best overall quality across tasks
  • Excellent instruction following
  • Strong reasoning and coding
  • Most community support and fine-tunes

Weaknesses:

  • 70B requires significant hardware
  • Specialized code models (e.g., DeepSeek-Coder) can beat it on pure coding tasks
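Once a model is pulled, it can be queried programmatically as well as from the CLI. A minimal Python sketch against Ollama's local REST API (default port 11434; the helper names are my own, and this assumes the Ollama server is running):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> dict:
    # stream=False returns one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server):
# print(ask("llama3.1:8b", "Summarize MoE in one sentence."))
```

Swapping models is just a string change, which is why the size/quality trade-offs below matter more than API details.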

Mistral/Mixtral (Mistral AI) - Efficiency King

Best for: Fast inference, resource-constrained environments

ollama pull mistral        # 7B, excellent efficiency
ollama pull mixtral        # 8x7B MoE, near-70B quality
ollama pull mixtral:8x22b  # Maximum Mistral capability

Strengths:

  • Best tokens/second for quality level
  • MoE architecture (Mixtral) - only ~13B params active per token
  • Excellent for European languages

Weaknesses:

  • Smaller community than Llama
  • Fewer fine-tuned variants available
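The MoE arithmetic is worth seeing concretely: Mixtral 8x7B holds ~47B parameters, but its router sends each token through only 2 of its 8 expert FFNs, so most weights sit idle on any given token. A rough back-of-envelope sketch (the expert-fraction split is my assumption, not an official figure):

```python
def moe_active_params_b(total_b: float, n_experts: int, top_k: int,
                        expert_fraction: float = 0.96) -> float:
    # expert_fraction: share of weights living in the expert FFNs (assumed)
    expert_b = total_b * expert_fraction    # weights split across experts
    shared_b = total_b - expert_b           # attention, embeddings, router
    per_expert_b = expert_b / n_experts
    return shared_b + top_k * per_expert_b  # only top_k experts fire per token

# Mixtral 8x7B: ~47B total, 2-of-8 routing -> roughly 13B active per token
active = moe_active_params_b(47, n_experts=8, top_k=2)
```

This is why Mixtral's inference cost tracks a ~13B dense model while its quality tracks much larger ones.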

DeepSeek-V3 / R1 - Best Value

Best for: Maximum quality per dollar, strong reasoning

ollama pull deepseek-v3    # Frontier-competitive general model
ollama pull deepseek-r1    # Reasoning-optimized (2025)
ollama pull deepseek-coder # Specialized for code

Strengths:

  • Matches GPT-4o on many benchmarks
  • Extremely cost-effective training
  • Excellent coding capabilities

Weaknesses:

  • Newer, less battle-tested
  • Fewer fine-tuned variants

Phi-4 (Microsoft) - Small but Mighty

Best for: Edge deployment, mobile, resource-limited

ollama pull phi3:mini      # 3.8B, runs on phones
ollama pull phi4           # 14B, laptop-friendly, latest generation

Strengths:

  • Incredible quality for size
  • Runs on minimal hardware
  • Fast inference

Weaknesses:

  • Limited context length
  • Not suitable for complex reasoning

Qwen 2.5 (Alibaba) - Multilingual Champion

Best for: Non-English languages, especially Asian languages

ollama pull qwen2.5:0.5b   # Ultra-compact
ollama pull qwen2.5:7b     # Good balance
ollama pull qwen2.5:72b    # Full capability

Strengths:

  • Best multilingual support
  • Excellent for Chinese, Japanese, Korean
  • Strong reasoning in all languages

Weaknesses:

  • Less English-focused tuning than Llama
  • English output can feel slightly less natural at the same size

Quick Selection Guide

What's your primary use case?

├── General Assistant / Chat
│   └── Use: llama3.1:8b or llama3.3:70b
├── Code Generation / Review
│   └── Use: deepseek-coder or llama3.3:70b
├── Fast Inference Needed
│   └── Use: mistral or phi3:mini
├── Non-English Languages
│   └── Use: qwen2.5:7b or qwen2.5:72b
├── Edge / Mobile Deployment
│   └── Use: phi3:mini or llama3.2:1b
├── RAG / Document Q&A
│   └── Use: llama3.1:8b or mistral
└── Maximum Quality (hardware available)
    └── Use: llama3.3:70b or deepseek-v3
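The decision tree above maps directly to a lookup. A sketch taking the first recommendation from each branch (the use-case keys are chosen here for illustration):

```python
def pick_model(use_case: str) -> str:
    # First recommendation from each branch of the guide above
    recommendations = {
        "chat": "llama3.1:8b",
        "code": "deepseek-coder",
        "fast": "mistral",
        "multilingual": "qwen2.5:7b",
        "edge": "phi3:mini",
        "rag": "llama3.1:8b",
        "max_quality": "llama3.3:70b",
    }
    return recommendations.get(use_case, "llama3.1:8b")  # sensible default

pick_model("code")  # -> "deepseek-coder"
```

In practice you'd also gate on available VRAM, which the table below covers.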

Model Sizes and Requirements

Model          Parameters   VRAM     RAM (CPU)   Speed (M3 Max)
phi3:mini      3.8B         3 GB     6 GB        80 tok/s
mistral        7B           6 GB     10 GB       45 tok/s
llama3.1:8b    8B           7 GB     12 GB       40 tok/s
qwen2.5:14b    14B          12 GB    20 GB       25 tok/s
mixtral:8x7b   47B*         26 GB    48 GB       20 tok/s
llama3.3:70b   70B          40 GB    80 GB       8 tok/s

*Mixtral has ~47B total parameters but only ~13B are active per token (MoE)
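The VRAM column follows a common rule of thumb: parameter count times effective bytes per weight (Ollama pulls roughly 4-bit quantized weights by default), plus overhead for the KV cache and activations. A hedged sketch; the constants are rough assumptions, and real usage grows with context length:

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float = 4.5,
                     overhead: float = 1.3) -> float:
    # bits_per_weight ~4.5 approximates Q4 quantization incl. metadata (assumed)
    # overhead covers KV cache and activation buffers (rough assumption)
    weight_gb = params_b * bits_per_weight / 8  # billions of params -> GB
    return weight_gb * overhead

# Ballpark for llama3.1:8b -- same order as the table's 7 GB
vram = estimate_vram_gb(8)
```

If the estimate exceeds your VRAM, Ollama spills layers to CPU RAM, which is why the RAM (CPU) column is roughly the model weight plus working memory.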

Licensing Overview

Open for Commercial Use:
├── Llama Community License: Llama 3.x, 4 (with acceptable-use policy)
├── Apache 2.0: Mistral, Qwen 2.5 / 3
├── MIT: DeepSeek, Phi-3
└── Gemma License: Gemma 2 / 3 (with restrictions)

Key Considerations:
• All listed models allow commercial use
• Llama has acceptable use policy (no harm)
• Check fine-tuned model licenses separately

Staying Current

The landscape changes monthly. Key resources:

  1. Hugging Face Open LLM Leaderboard - Benchmark comparisons
  2. Ollama Model Library - Available models: ollama.com/library
  3. r/LocalLLaMA - Community discussions and discoveries
  4. Papers With Code - Latest research and benchmarks

In the next module, we'll get hands-on with Ollama to run these models locally.
