Ollama Fundamentals
Installing Ollama
3 min read
Ollama makes running local LLMs as easy as running Docker containers. Let's get it installed on your system.
What is Ollama?
┌──────────────────────────────────────────────────────────┐
│                          Ollama                          │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  • Open-source LLM runner (MIT License)                  │
│  • Runs GGUF quantized models                            │
│  • Built-in model management (pull, run, delete)         │
│  • REST API on port 11434                                │
│  • Native optimization for Apple Silicon, NVIDIA, AMD    │
│  • Simple CLI interface                                  │
│                                                          │
│  Think of it as "Docker for LLMs"                        │
│                                                          │
└──────────────────────────────────────────────────────────┘
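The "Docker for LLMs" analogy carries over to the day-to-day commands. As a quick sketch (using the publicly available llama3.2 model as an example), the workflow mirrors docker pull / docker run / docker images:
# Pull a model from the Ollama registry (compare: docker pull)
ollama pull llama3.2
# Run it interactively (compare: docker run -it)
ollama run llama3.2
# List models on disk (compare: docker images)
ollama list
# Remove a model you no longer need (compare: docker rmi)
ollama rm llama3.2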
Installation by Platform
macOS (Recommended: Native Install)
# Option 1: Direct download (easiest)
# Visit https://ollama.com/download and download the app
# Option 2: Homebrew
brew install ollama
# Verify installation
ollama --version
# Output: ollama version 0.4.x
After installation, Ollama runs as a background service automatically.
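If you installed via Homebrew rather than the desktop app, the background service is managed with brew services instead of the menu-bar icon; a rough sketch:
# Start Ollama as a background service (Homebrew install)
brew services start ollama
# Check or stop it later
brew services list
brew services stop ollama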
Linux
# One-line installer (recommended)
curl -fsSL https://ollama.com/install.sh | sh
# Verify installation
ollama --version
# Check service status
systemctl status ollama
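The install script registers a systemd service; if it isn't active yet, enable and start it yourself, and keep in mind that the server's logs go to the journal:
# Enable at boot and start now
sudo systemctl enable ollama
sudo systemctl start ollama
# Follow the server logs
journalctl -u ollama -f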
For manual installation or specific distributions:
# Ubuntu/Debian manual install
curl -L https://ollama.com/download/ollama-linux-amd64 -o ollama
chmod +x ollama
sudo mv ollama /usr/local/bin/
# Start Ollama server
ollama serve
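A manual install does not register a service, so `ollama serve` only runs while that terminal stays open. If you want it supervised, a minimal systemd unit along these lines works (the binary path and the dedicated ollama user are assumptions to adapt to your setup):
# /etc/systemd/system/ollama.service (minimal sketch)
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always

[Install]
WantedBy=default.target
Reload systemd with `sudo systemctl daemon-reload`, then enable and start the service as shown above.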
Windows
# Option 1: Download installer from https://ollama.com/download
# Option 2: Using winget
winget install Ollama.Ollama
# Verify installation
ollama --version
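Recent Windows 10/11 builds ship curl.exe, so you can also confirm the background service is answering from PowerShell or cmd:
# Confirm the API is listening on the default port
curl.exe http://localhost:11434/api/version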
Docker (All Platforms)
# CPU only
docker run -d -v ollama:/root/.ollama -p 11434:11434 \
--name ollama ollama/ollama
# With NVIDIA GPU
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 \
--name ollama ollama/ollama
# Verify it's running
curl http://localhost:11434/api/version
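With Docker, the ollama CLI lives inside the container, so model commands go through docker exec (llama3.2 here is just an example model):
# Pull and chat with a model inside the container
docker exec -it ollama ollama run llama3.2
# See which models are stored in the ollama volume
docker exec -it ollama ollama list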
Verifying Your Installation
# Check version
ollama --version
# List available commands
ollama help
# Test the API endpoint
curl http://localhost:11434/api/tags
# On a fresh install (no models pulled yet) this returns: {"models":[]}
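If you script your setup (CI, dotfiles, provisioning), a small readiness loop against the API avoids racing the server start; this sketch assumes the default host and port:
# Wait up to 30 seconds for the Ollama server to come up
for i in $(seq 1 30); do
  if curl -sf http://localhost:11434/api/version > /dev/null; then
    echo "Ollama is up"
    break
  fi
  sleep 1
done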
GPU Detection
Ollama automatically detects and uses available GPUs:
# Check what Ollama sees: load a model so the server initializes its backends
ollama run llama3.2 --verbose
# The GPU detection lines appear in the server log
# (the terminal running `ollama serve`, or `journalctl -u ollama` on Linux):
# ggml_cuda_init: found 1 CUDA devices:
#   Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9
# Or for Apple Silicon:
# ggml_metal_init: found 1 Metal devices:
#   Device 0: Apple M3 Max
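Once a model is loaded, `ollama ps` is the quickest way to confirm it actually landed on the GPU:
# Show loaded models and where they are running
ollama ps
# The PROCESSOR column reads "100% GPU" when the whole model fits in VRAM,
# or a CPU/GPU split when some layers spill over to system RAM.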
Common Installation Issues
Port Already in Use
# Check if something is using port 11434
lsof -i :11434
# Kill the process or change Ollama's port
OLLAMA_HOST=0.0.0.0:11435 ollama serve
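Remember that the CLI and any curl calls also need to know about the non-default port; OLLAMA_HOST works on the client side as well:
# Point the client at the relocated server
OLLAMA_HOST=127.0.0.1:11435 ollama list
curl http://localhost:11435/api/version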
Insufficient Permissions (Linux)
# Add yourself to the ollama group
sudo usermod -aG ollama $USER
# Log out and back in, or:
newgrp ollama
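If the group change alone doesn't fix it, check the ownership of the service's model directory (the Linux default path from the storage table further down):
# Should be owned by the ollama user/group
ls -ld /usr/share/ollama/.ollama/models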
GPU Not Detected
# For NVIDIA, ensure drivers are installed
nvidia-smi
# For AMD (ROCm)
rocm-smi
# If still not detected, try reinstalling with GPU support
curl -fsSL https://ollama.com/install.sh | sh
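Before reinstalling, it's worth grepping the server log, which usually states why a GPU was skipped (missing driver, unsupported compute capability, not enough VRAM):
# Inspect GPU-related lines in the server log (systemd installs)
journalctl -u ollama --no-pager | grep -iE 'cuda|rocm|gpu' | tail -n 20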
Storage Location
Ollama stores models in:
| Platform | Default Location |
|---|---|
| macOS | ~/.ollama/models |
| Linux | /usr/share/ollama/.ollama/models |
| Windows | C:\Users\<user>\.ollama\models |
To change storage location:
# Set custom model directory
export OLLAMA_MODELS=/path/to/models
ollama serve
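On Linux, where Ollama usually runs as a systemd service, the variable has to reach the service rather than your interactive shell; a drop-in override is the standard way (the path is only an example, and the directory must be writable by the ollama user):
# Add the variable to the service via a drop-in override
sudo systemctl edit ollama
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_MODELS=/path/to/models"
# Apply the change
sudo systemctl restart ollama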
Ollama Architecture
┌──────────────────────────────────────────────────────────┐
│                   Ollama Architecture                    │
├──────────────────────────────────────────────────────────┤
│                                                          │
│   CLI (ollama run/pull/...)                              │
│             │                                            │
│             ▼                                            │
│   ┌───────────────────┐                                  │
│   │   Ollama Server   │ ◄── REST API (port 11434)        │
│   │   (background)    │                                  │
│   └─────────┬─────────┘                                  │
│             │                                            │
│             ▼                                            │
│   ┌───────────────────┐                                  │
│   │     llama.cpp     │ ◄── Inference engine             │
│   │  (optimized C++)  │                                  │
│   └─────────┬─────────┘                                  │
│             │                                            │
│             ▼                                            │
│   ┌───────────────────┐                                  │
│   │      GPU/CPU      │ ◄── Hardware acceleration        │
│   │  (CUDA/Metal/CPU) │                                  │
│   └───────────────────┘                                  │
│                                                          │
└──────────────────────────────────────────────────────────┘
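Because everything in the diagram funnels through the REST API, a single request exercises the whole stack once you have pulled a model (llama3.2 here is only an example):
# One-shot generation through the REST API (requires a pulled model)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'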
You're now ready to run your first model!