Open Source Model Landscape

The open-source LLM ecosystem has exploded. Here's your guide to the major model families and when to use each.

The Major Model Families (2025)

┌─────────────────────────────────────────────────────────────────┐
│                   Open Source Model Landscape                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Meta (Llama)        Mistral AI           Microsoft (Phi)       │
│  ───────────         ──────────           ───────────────       │
│  • Llama 3.1/3.2     • Mistral 7B         • Phi-3               │
│  • 1B, 3B, 8B, 70B   • Mixtral 8x7B       • 3.8B (mini)         │
│  • Best all-rounder  • Mixtral 8x22B      • 14B (medium)        │
│  • Llama license     • Best efficiency    • Best for size       │
│                      • Apache 2.0         • MIT License         │
│                                                                 │
│  Alibaba (Qwen)      DeepSeek             Google (Gemma)        │
│  ──────────────      ────────             ──────────────        │
│  • Qwen 2.5          • DeepSeek-V3        • Gemma 2             │
│  • 0.5B to 72B       • DeepSeek-Coder     • 2B, 9B, 27B         │
│  • Best multilingual • Best cost/perf     • Best instruction    │
│  • Apache 2.0        • MIT License          following           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Model Comparison by Task

Llama 3.1/3.2 (Meta) - The Gold Standard

Best for: General-purpose, most tasks

# Available sizes in Ollama (Llama 3.2 ships 1B/3B; 8B and 70B are Llama 3.1)
ollama pull llama3.2:1b    # Ultra-fast, edge devices
ollama pull llama3.2:3b    # Mobile/laptop (the llama3.2 default)
ollama pull llama3.1:8b    # Great balance of quality and speed
ollama pull llama3.1:70b   # Maximum capability
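
Once pulled, a model can be tested straight from the terminal (a minimal smoke test; ollama run also pulls a model automatically if it is missing):

ollama run llama3.2:3b "Explain the difference between VRAM and system RAM in two sentences."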

Strengths:

  • Best overall quality across tasks
  • Excellent instruction following
  • Strong reasoning and coding
  • Most community support and fine-tunes

Weaknesses:

  • 70B requires significant hardware
  • Not the best for pure coding tasks

Mistral/Mixtral (Mistral AI) - Efficiency King

Best for: Fast inference, resource-constrained environments

ollama pull mistral        # 7B, excellent efficiency
ollama pull mixtral        # 8x7B MoE, near-70B quality
ollama pull mixtral:8x22b  # Maximum Mistral capability

Strengths:

  • Best tokens/second for quality level
  • MoE architecture (Mixtral) - activates only ~13B of its ~47B params per token
  • Excellent for European languages

Weaknesses:

  • Smaller community than Llama
  • Fewer fine-tuned variants available

DeepSeek-V3 - Best Value

Best for: Maximum quality per dollar

ollama pull deepseek-v3    # Latest (December 2024); 671B MoE, needs server-class hardware
ollama pull deepseek-coder # Specialized for code

Strengths:

  • Competitive with GPT-4 on many benchmarks
  • Extremely cost-effective training
  • Excellent coding capabilities

Weaknesses:

  • Newer, less battle-tested
  • Fewer fine-tuned variants

Phi-3 (Microsoft) - Small but Mighty

Best for: Edge deployment, mobile, resource-limited

ollama pull phi3:mini      # 3.8B, runs on phones
ollama pull phi3:medium    # 14B, laptop-friendly
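
If speed is the selling point, it is worth measuring it on your own hardware: the --verbose flag makes ollama run print timing statistics after each reply, including the eval rate in tokens per second:

ollama run phi3:mini --verbose "Summarize why small models suit edge devices."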

Strengths:

  • Incredible quality for size
  • Runs on minimal hardware
  • Fast inference

Weaknesses:

  • Limited context length
  • Weaker at complex, multi-step reasoning than larger models

Qwen 2.5 (Alibaba) - Multilingual Champion

Best for: Non-English languages, especially Asian languages

ollama pull qwen2.5:0.5b   # Ultra-compact
ollama pull qwen2.5:7b     # Good balance
ollama pull qwen2.5:72b    # Full capability
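
A quick, informal way to probe multilingual quality is a round-trip translation; this prompt is only an illustrative check, not a benchmark:

ollama run qwen2.5:7b "Translate into Japanese, then back into English: The meeting moved to Thursday afternoon."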

Strengths:

  • Best multilingual support
  • Excellent for Chinese, Japanese, Korean
  • Strong reasoning in all languages

Weaknesses:

  • Less English-focused tuning than Llama
  • May need more explicit prompting on English-only tasks

Quick Selection Guide

What's your primary use case?

├── General Assistant / Chat
│   └── Use: llama3.1:8b or llama3.1:70b
├── Code Generation / Review
│   └── Use: deepseek-coder or llama3.1:70b
├── Fast Inference Needed
│   └── Use: mistral or phi3:mini
├── Non-English Languages
│   └── Use: qwen2.5:7b or qwen2.5:72b
├── Edge / Mobile Deployment
│   └── Use: phi3:mini or llama3.2:1b
├── RAG / Document Q&A
│   └── Use: llama3.1:8b or mistral
└── Maximum Quality (hardware available)
    └── Use: llama3.1:70b or deepseek-v3
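
Whichever branch you land on, candidates are easy to compare programmatically through Ollama's local REST API, which listens on port 11434 by default; setting "stream": false returns the whole response as one JSON object:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Give three bullet points on when to prefer a local LLM.",
  "stream": false
}'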

Model Sizes and Requirements

Model          Parameters   VRAM    RAM (CPU)   Speed (M3 Max, approx.)
phi3:mini      3.8B         3 GB    6 GB        80 tok/s
mistral        7B           6 GB    10 GB       45 tok/s
llama3.1:8b    8B           7 GB    12 GB       40 tok/s
qwen2.5:14b    14B          12 GB   20 GB       25 tok/s
mixtral:8x7b   47B*         26 GB   48 GB       20 tok/s
llama3.1:70b   70B          40 GB   80 GB       8 tok/s

*Mixtral 8x7B totals ~47B parameters but activates only ~13B per token due to its MoE design
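
The VRAM column follows a rough rule of thumb: at Ollama's default ~4-bit quantization, weights take about 0.5-0.6 GB per billion parameters, plus 1-2 GB for the KV cache and runtime overhead. A throwaway shell helper (an illustrative estimate only; real usage varies with quantization and context length):

# Hypothetical helper: rough VRAM needed at ~4-bit quantization
# usage: vram_estimate <params_in_billions>
vram_estimate() {
  awk -v p="$1" 'BEGIN { printf "~%.1f GB weights + 1-2 GB overhead\n", p * 0.55 }'
}
vram_estimate 8    # ~4.4 GB weights + overhead, in line with the table's 7 GB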

Licensing Overview

Open for Commercial Use:
├── Apache 2.0: Mistral/Mixtral, Qwen 2.5 (most sizes)
├── MIT: DeepSeek, Phi-3
├── Llama Community License: Llama 3.1/3.2 (commercial use with conditions)
└── Gemma License: Gemma 2 (with restrictions)

Key Considerations:
• All listed models allow commercial use, though Llama and Gemma attach conditions
• Llama's license adds an acceptable use policy and requires a separate agreement above 700M monthly active users
• Check fine-tuned model licenses separately - they may differ from the base model

Staying Current

The landscape changes monthly. Key resources:

  1. Hugging Face Open LLM Leaderboard - Benchmark comparisons
  2. Ollama Model Library - Available models: ollama.com/library
  3. r/LocalLLaMA - Community discussions and discoveries
  4. Papers With Code - Latest research and benchmarks
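
Two CLI commands also help you keep track locally (ollama show prints a model's parameter count, quantization, and license):

ollama list              # installed models and their on-disk sizes
ollama show llama3.2     # details for one model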

In the next module, we'll get hands-on with Ollama to run these models locally.
