Ollama Fundamentals
Ollama CLI Mastery
The Ollama CLI is your primary interface for managing and running models. Let's explore its full capabilities.
Complete Command Reference
┌─────────────────────────────────────────────────────────────────┐
│                       Ollama CLI Commands                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Model Management            Server Control                     │
│  ────────────────            ──────────────                     │
│  pull   - Download model     serve - Start server               │
│  push   - Upload model       ps    - List running models        │
│  list   - Show models        stop  - Stop a running model       │
│  rm     - Delete model                                          │
│  cp     - Copy model         Information                        │
│  create - Create custom      ───────────                        │
│                              show    - Model details            │
│  Execution                   help    - Show help                │
│  ─────────                   version - Show version             │
│  run - Interactive                                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Model Management Commands
Listing Models
# List all downloaded models
ollama list
# NAME                     ID              SIZE      MODIFIED
# llama3.2:latest          a80c4f17acd5    4.7 GB    2 hours ago
# mistral:latest           2ae6f6dd7a3d    4.1 GB    1 day ago
# deepseek-coder:latest    8934d96d3f08    4.7 GB    3 days ago
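`ollama list` reports per-model sizes; total disk usage can be checked against the model store directory (commonly `~/.ollama/models`; Linux service installs store under the `ollama` user's home instead, and `OLLAMA_MODELS` overrides either):

```shell
# Total disk footprint of the local model store
du -sh ~/.ollama/models
```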
Model Information
# Show detailed model info
ollama show llama3.2
# Output includes:
# - Model architecture
# - Parameters
# - Quantization level
# - License
# - Template format
# Show specific sections
ollama show llama3.2 --modelfile # Show Modelfile
ollama show llama3.2 --license # Show license
ollama show llama3.2 --template # Show prompt template
ollama show llama3.2 --parameters # Show parameters
Pulling Specific Versions
# Pull latest version
ollama pull llama3.2
# Pull specific size (available tags vary by model)
ollama pull llama3.2:1b      # 1 billion parameters
ollama pull llama3.2:3b      # 3 billion parameters
ollama pull llama3.1:8b      # 8B and 70B sizes ship under llama3.1
ollama pull llama3.1:70b
# Pull specific quantization (check the model's library page for exact tags)
ollama pull llama3.2:3b-instruct-q4_0    # 4-bit quantization
ollama pull llama3.2:3b-instruct-q8_0    # 8-bit quantization
Copying and Renaming
# Copy a model (useful before modifications)
ollama cp llama3.2 my-llama
# Now you have both:
# - llama3.2 (original)
# - my-llama (copy for customization)
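There is no dedicated rename command as of this writing, so the usual pattern is copy-then-delete:

```shell
# "Rename" a model: copy to the new name, then remove the old tag
ollama cp llama3.2 my-llama
ollama rm llama3.2
```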
Running Models with Parameters
Temperature Control
Note that `ollama run` does not accept sampling flags on the command line; parameters are set inside the interactive session with `/set parameter` (or baked into a Modelfile, or passed via the API):
# Lower temperature = more deterministic
ollama run llama3.2
>>> /set parameter temperature 0.1
>>> Write a haiku about coding
# Higher temperature = more creative
>>> /set parameter temperature 1.5
>>> Write a haiku about coding
Context Length
# Increase context window (uses more memory)
>>> /set parameter num_ctx 8192
# Default is typically 2048 or 4096
GPU Layers
# Control how many layers run on GPU vs CPU
>>> /set parameter num_gpu 35
# 0 = CPU only (useful if GPU memory is low)
>>> /set parameter num_gpu 0
Runtime Parameters Table
These are the option names accepted by `/set parameter` (and by the API's `options` object):

| Parameter | Description | Default | Range |
|---|---|---|---|
| `temperature` | Randomness | 0.8 | 0.0-2.0 |
| `top_p` | Nucleus sampling | 0.9 | 0.0-1.0 |
| `top_k` | Top-k sampling | 40 | 1-100 |
| `num_ctx` | Context length | 2048 | 512-32768 |
| `num_gpu` | GPU layers | auto | 0-100 |
| `repeat_penalty` | Repetition penalty | 1.1 | 0.0-2.0 |
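The same parameters can also be set per-request through the REST API's `options` object, using the underlying option names (`temperature`, `num_ctx`, and so on). A sketch, assuming a server listening on the default port 11434:

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a haiku about coding",
  "stream": false,
  "options": { "temperature": 0.1, "num_ctx": 8192 }
}'
```

`"stream": false` returns one complete JSON object instead of a stream of token chunks.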
Process Management
Viewing Running Models
# See what models are currently loaded
ollama ps
# NAME        ID              SIZE      PROCESSOR      UNTIL
# llama3.2    a80c4f17acd5    5.1 GB    100% GPU       4 minutes
# mistral     2ae6f6dd7a3d    4.5 GB    50% GPU/CPU    Idle
Stopping Models
# Stop a specific model (free memory)
ollama stop llama3.2
# Models also unload automatically after timeout (default: 5 min)
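`ollama stop` takes one model at a time; to unload everything at once, the `ps` output can be scripted (a sketch; `-r` is GNU xargs' no-run-if-empty flag):

```shell
# Skip the header row, then stop each running model by name
ollama ps | awk 'NR>1 {print $1}' | xargs -r -n1 ollama stop
```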
Keep Models Loaded
# Keep a model loaded indefinitely (any negative duration)
ollama run llama3.2 --keepalive -1m
# Set specific duration
ollama run llama3.2 --keepalive 30m
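The `--keepalive` flag only covers that one invocation; a server-wide default can be set with the `OLLAMA_KEEP_ALIVE` environment variable before starting the server:

```shell
# Keep models loaded for 1 hour after last use (a negative value means forever)
export OLLAMA_KEEP_ALIVE=1h
ollama serve
```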
Server Configuration
Environment Variables
# Change API host/port
export OLLAMA_HOST=0.0.0.0:11434
# Set model storage location
export OLLAMA_MODELS=/mnt/external/models
# Control concurrency
export OLLAMA_NUM_PARALLEL=2       # Parallel requests per loaded model
export OLLAMA_MAX_LOADED_MODELS=3  # Models kept in memory at once
# Start server with config
ollama serve
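Note that on Linux installs where Ollama runs as a systemd service, variables exported in your shell never reach the server process; a drop-in override is the usual fix:

```shell
# Open an editor for a drop-in override of the ollama service unit
sudo systemctl edit ollama
# Add these lines in the editor, then save:
#   [Service]
#   Environment="OLLAMA_MODELS=/mnt/external/models"
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart ollama
```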
Useful Server Flags
# Start server in debug mode
OLLAMA_DEBUG=1 ollama serve
# Allow cross-origin requests (for web apps)
OLLAMA_ORIGINS="*" ollama serve
Advanced Usage Patterns
Batch Processing
# Process multiple files
for file in *.txt; do
echo "Processing $file..."
cat "$file" | ollama run llama3.2 "Summarize:" > "${file%.txt}_summary.txt"
done
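The output filename in the loop comes from shell parameter expansion: `${file%.txt}` strips the shortest trailing `.txt` match. A standalone check of the naming it produces:

```shell
file="notes.txt"
# %.txt removes the suffix, leaving the bare name to build on
echo "${file%.txt}_summary.txt"
```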
JSON Output
# Get structured JSON response
ollama run llama3.2 "List 3 programming languages as JSON array" --format json
# Output: ["Python", "JavaScript", "Go"]
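Because `--format json` constrains the model to emit syntactically valid JSON, the result can be piped straight into JSON-aware tooling. A standalone sketch using `python3 -m json.tool` as the validator (assumes `python3` on PATH; the echoed array stands in for model output):

```shell
# Validate and pretty-print; malformed JSON makes json.tool exit non-zero
echo '["Python", "JavaScript", "Go"]' | python3 -m json.tool
```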
Chaining Models
# Use one model's output as input to another
ollama run llama3.2 "Write a story about AI" | \
ollama run mistral "Critique this story:"
Quick Debugging
# Check server logs
journalctl -u ollama -f # Linux (systemd)
tail -f ~/.ollama/logs/server.log # macOS
# Test API directly
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Hello"
}'
# Check GPU utilization
watch -n 1 nvidia-smi # NVIDIA
The CLI gives you full control over your local LLM workflow. In the next lesson, we'll learn to create custom models with Modelfiles.