Ollama Fundamentals
Installing Ollama
3 min read
Ollama makes running local LLMs as easy as running Docker containers. Let's get it installed on your system.
What is Ollama?
┌──────────────────────────────────────────────────────────┐
│                          Ollama                          │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  • Open-source LLM runner (MIT License)                  │
│  • Runs GGUF quantized models                            │
│  • Built-in model management (pull, run, delete)         │
│  • REST API on port 11434                                │
│  • Native optimization for Apple Silicon, NVIDIA, AMD    │
│  • Simple CLI interface                                  │
│                                                          │
│  Think of it as "Docker for LLMs"                        │
│                                                          │
└──────────────────────────────────────────────────────────┘
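The "Docker for LLMs" analogy carries over to the day-to-day commands. As a quick sketch (using the publicly available llama3.2 model as an example), the workflow mirrors docker pull / docker run / docker images:
# Pull a model from the Ollama registry (compare: docker pull)
ollama pull llama3.2
# Run it interactively (compare: docker run -it)
ollama run llama3.2
# List models on disk (compare: docker images)
ollama list
# Remove a model you no longer need (compare: docker rmi)
ollama rm llama3.2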
Installation by Platform
macOS (Recommended: Native Install)
# Option 1: Direct download (easiest)
# Visit https://ollama.com/download and download the app
# Option 2: Homebrew
brew install ollama
# Verify installation
ollama --version
# Output: ollama version 0.4.x
After installation, Ollama runs as a background service automatically.
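If you installed via Homebrew rather than the desktop app, the background service is managed with brew services instead of the menu-bar icon; a rough sketch:
# Start Ollama as a background service (Homebrew install)
brew services start ollama
# Check or stop it later
brew services list
brew services stop ollama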
Linux
# One-line installer (recommended)
curl -fsSL https://ollama.com/install.sh | sh
# Verify installation
ollama --version
# Check service status
systemctl status ollama
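The install script registers a systemd service; if it isn't active yet, enable and start it yourself, and keep in mind that the server's logs go to the journal:
# Enable at boot and start now
sudo systemctl enable ollama
sudo systemctl start ollama
# Follow the server logs
journalctl -u ollama -f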
For manual installation or specific distributions:
# Ubuntu/Debian manual install
curl -L https://ollama.com/download/ollama-linux-amd64 -o ollama
chmod +x ollama
sudo mv ollama /usr/local/bin/
# Start Ollama server
ollama serve
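A manual install does not register a service, so `ollama serve` only runs while that terminal stays open. If you want it supervised, a minimal systemd unit along these lines works (the binary path and the dedicated ollama user are assumptions to adapt to your setup):
# /etc/systemd/system/ollama.service (minimal sketch)
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always

[Install]
WantedBy=default.target
Reload systemd with `sudo systemctl daemon-reload`, then enable and start the service as shown above.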
Windows
# Option 1: Download installer from https://ollama.com/download
# Option 2: Using winget
winget install Ollama.Ollama
# Verify installation
ollama --version
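Recent Windows 10/11 builds ship curl.exe, so you can also confirm the background service is answering from PowerShell or cmd:
# Confirm the API is listening on the default port
curl.exe http://localhost:11434/api/version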
Docker (All Platforms)
# CPU only
docker run -d -v ollama:/root/.ollama -p 11434:11434 \
--name ollama ollama/ollama
# With NVIDIA GPU
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 \
--name ollama ollama/ollama
# Verify it's running
curl http://localhost:11434/api/version
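With Docker, the ollama CLI lives inside the container, so model commands go through docker exec (llama3.2 here is just an example model):
# Pull and chat with a model inside the container
docker exec -it ollama ollama run llama3.2
# See which models are stored in the ollama volume
docker exec -it ollama ollama list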
Verifying Your Installation
# Check version
ollama --version
# List available commands
ollama help
# Test the API endpoint
curl http://localhost:11434/api/tags
# On a fresh install (no models pulled yet) this returns: {"models":[]}
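If you script your setup (CI, dotfiles, provisioning), a small readiness loop against the API avoids racing the server start; this sketch assumes the default host and port:
# Wait up to 30 seconds for the Ollama server to come up
for i in $(seq 1 30); do
  if curl -sf http://localhost:11434/api/version > /dev/null; then
    echo "Ollama is up"
    break
  fi
  sleep 1
done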
GPU Detection
Ollama automatically detects and uses available GPUs:
# Check what Ollama sees: load a model so the server initializes its backends
ollama run llama3.2 --verbose
# The GPU detection lines appear in the server log
# (the terminal running `ollama serve`, or `journalctl -u ollama` on Linux):
# ggml_cuda_init: found 1 CUDA devices:
#   Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9
# Or for Apple Silicon:
# ggml_metal_init: found 1 Metal devices:
#   Device 0: Apple M3 Max
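Once a model is loaded, `ollama ps` is the quickest way to confirm it actually landed on the GPU:
# Show loaded models and where they are running
ollama ps
# The PROCESSOR column reads "100% GPU" when the whole model fits in VRAM,
# or a CPU/GPU split when some layers spill over to system RAM.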
Common Installation Issues
Port Already in Use
# Check if something is using port 11434
lsof -i :11434
# Kill the process or change Ollama's port
OLLAMA_HOST=0.0.0.0:11435 ollama serve
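Remember that the CLI and any curl calls also need to know about the non-default port; OLLAMA_HOST works on the client side as well:
# Point the client at the relocated server
OLLAMA_HOST=127.0.0.1:11435 ollama list
curl http://localhost:11435/api/version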
Insufficient Permissions (Linux)
# Add yourself to the ollama group
sudo usermod -aG ollama $USER
# Log out and back in, or:
newgrp ollama
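If the group change alone doesn't fix it, check the ownership of the service's model directory (the Linux default path from the storage table further down):
# Should be owned by the ollama user/group
ls -ld /usr/share/ollama/.ollama/models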
GPU Not Detected
# For NVIDIA, ensure drivers are installed
nvidia-smi
# For AMD (ROCm)
rocm-smi
# If still not detected, try reinstalling with GPU support
curl -fsSL https://ollama.com/install.sh | sh
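Before reinstalling, it's worth grepping the server log, which usually states why a GPU was skipped (missing driver, unsupported compute capability, not enough VRAM):
# Inspect GPU-related lines in the server log (systemd installs)
journalctl -u ollama --no-pager | grep -iE 'cuda|rocm|gpu' | tail -n 20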
Storage Location
Ollama stores models in:
| Platform | Default Location |
|---|---|
| macOS | ~/.ollama/models |
| Linux | /usr/share/ollama/.ollama/models |
| Windows | C:\Users\<user>\.ollama\models |
To change storage location:
# Set custom model directory
export OLLAMA_MODELS=/path/to/models
ollama serve
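On Linux, where Ollama usually runs as a systemd service, the variable has to reach the service rather than your interactive shell; a drop-in override is the standard way (the path is only an example, and the directory must be writable by the ollama user):
# Add the variable to the service via a drop-in override
sudo systemctl edit ollama
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_MODELS=/path/to/models"
# Apply the change
sudo systemctl restart ollama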
Ollama Architecture
┌──────────────────────────────────────────────────────────┐
│                   Ollama Architecture                    │
├──────────────────────────────────────────────────────────┤
│                                                          │
│   CLI (ollama run/pull/...)                              │
│             │                                            │
│             ▼                                            │
│   ┌───────────────────┐                                  │
│   │   Ollama Server   │ ◄── REST API (port 11434)        │
│   │   (background)    │                                  │
│   └─────────┬─────────┘                                  │
│             │                                            │
│             ▼                                            │
│   ┌───────────────────┐                                  │
│   │     llama.cpp     │ ◄── Inference engine             │
│   │  (optimized C++)  │                                  │
│   └─────────┬─────────┘                                  │
│             │                                            │
│             ▼                                            │
│   ┌───────────────────┐                                  │
│   │      GPU/CPU      │ ◄── Hardware acceleration        │
│   │  (CUDA/Metal/CPU) │                                  │
│   └───────────────────┘                                  │
│                                                          │
└──────────────────────────────────────────────────────────┘
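Because everything in the diagram funnels through the REST API, a single request exercises the whole stack once you have pulled a model (llama3.2 here is only an example):
# One-shot generation through the REST API (requires a pulled model)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'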
You're now ready to run your first model!