Ollama Fundamentals
Running Your First Model
3 min read
Let's pull and run your first local LLM. In just a few minutes, you'll have a powerful AI running entirely on your machine.
Your First Model: Llama 3.2
# Pull the model (one-time download)
ollama pull llama3.2
# Output:
# pulling manifest
# pulling 8934d96d3f08... 100% ▕████████████████▏ 4.7 GB
# pulling 8c17c2ebb0ea... 100% ▕████████████████▏ 7.0 KB
# verifying sha256 digest
# writing manifest
# success
The default model is about 4.7 GB on disk; download time depends on your internet speed.
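As a rough sanity check on download time, you can divide the model size by your connection speed. The 50 Mbps figure below is just an assumed example bandwidth, not anything Ollama reports:

```python
# Rough download-time estimate: size in bits / bandwidth.
model_gb = 4.7                       # size reported by `ollama pull`
bandwidth_mbps = 50                  # assumed connection speed (example)

size_megabits = model_gb * 1000 * 8  # GB -> megabits (decimal units)
seconds = size_megabits / bandwidth_mbps
print(f"~{seconds / 60:.0f} minutes at {bandwidth_mbps} Mbps")  # ~13 minutes at 50 Mbps
```

At 50 Mbps the 4.7 GB download works out to roughly 13 minutes; halve or double the bandwidth and the estimate scales accordingly.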
Running the Model
# Start an interactive chat session
ollama run llama3.2
# You'll see:
>>>
# Type your message and press Enter
>>> What is the capital of France?
The capital of France is Paris.
>>>
Basic Chat Interactions
>>> Explain quantum computing in simple terms
Quantum computing uses quantum mechanics principles to process
information in ways classical computers cannot.
Key concepts:
1. **Qubits**: Unlike classical bits (0 or 1), qubits can exist
in superposition - being 0 and 1 simultaneously.
2. **Entanglement**: Qubits can be connected so measuring one
instantly affects the other, regardless of distance.
3. **Quantum speedup**: For specific problems, quantum computers
can explore many solutions at once, dramatically faster.
>>> /bye
Useful Chat Commands
Inside the chat session, you can use special commands:
>>> /help # Show available commands
>>> /clear # Clear the conversation history
>>> /set parameter # Change runtime parameters
>>> /show info # Display model information
>>> /bye # Exit the chat session (or Ctrl+D)
Multi-line Input
For longer prompts, use triple quotes:
>>> """
... Write a Python function that:
... 1. Takes a list of numbers
... 2. Returns the sum of even numbers
... 3. Includes docstring and type hints
... """
def sum_even_numbers(numbers: list[int]) -> int:
    """
    Calculate the sum of even numbers in a list.

    Args:
        numbers: A list of integers

    Returns:
        The sum of all even numbers in the list
    """
    return sum(n for n in numbers if n % 2 == 0)
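If you paste the generated function into a Python session, you can confirm it does what the prompt asked for:

```python
def sum_even_numbers(numbers: list[int]) -> int:
    """Calculate the sum of even numbers in a list."""
    return sum(n for n in numbers if n % 2 == 0)

print(sum_even_numbers([1, 2, 3, 4, 5, 6]))  # 2 + 4 + 6 = 12
print(sum_even_numbers([1, 3, 5]))           # no even numbers -> 0
```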
Running Different Models
# List downloaded models
ollama list
# NAME ID SIZE MODIFIED
# llama3.2:latest a80c4f17acd5 4.7 GB 5 minutes ago
# Pull and run Mistral
ollama pull mistral
ollama run mistral
# Pull a specific size variant
ollama pull llama3.1:70b # Larger, more capable (llama3.2 itself only ships 1b and 3b)
ollama pull llama3.2:1b # Smaller, faster
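For scripting, the columnar `ollama list` output parses cleanly with whitespace splitting. A minimal sketch using the sample listing shown above (in a real script you would capture the command's output instead of hardcoding it):

```python
# Sample output from `ollama list` (copied from above).
listing = """NAME              ID              SIZE      MODIFIED
llama3.2:latest   a80c4f17acd5    4.7 GB    5 minutes ago"""

models = []
for line in listing.splitlines()[1:]:        # skip the header row
    name, model_id, size, unit = line.split()[:4]
    models.append({"name": name, "id": model_id, "size": f"{size} {unit}"})

print(models[0]["name"], models[0]["size"])  # llama3.2:latest 4.7 GB
```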
One-liner Queries
For scripting and quick queries:
# Single prompt (no interactive session)
ollama run llama3.2 "What is 2+2?"
# Output: 2 + 2 = 4
# Pipe a prompt to the model
echo "Why is the sky blue?" | ollama run llama3.2
# Read from file
cat document.txt | ollama run llama3.2 "Summarize:"
Comparing Models
Try the same prompt with different models:
# Test coding ability
echo "Write a binary search in Python" | ollama run llama3.2
echo "Write a binary search in Python" | ollama run mistral
echo "Write a binary search in Python" | ollama run deepseek-coder
Performance Indicators
Watch the output speed to understand model performance:
# Run with verbose output
ollama run llama3.2 --verbose
>>> Hello
Hello! How can I help you today?
# Stats shown after response:
# total duration: 1.234s
# load duration: 0.123s
# prompt eval count: 5 token(s)
# prompt eval duration: 0.050s
# prompt eval rate: 100.00 tokens/s
# eval count: 12 token(s)
# eval duration: 0.800s
# eval rate: 15.00 tokens/s <-- Generation speed
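Both rates are simply token counts divided by durations. Plugging in the numbers from the stats above:

```python
# Reproduce the two rates from the verbose stats above.
prompt_eval_count, prompt_eval_duration = 5, 0.050  # tokens, seconds
eval_count, eval_duration = 12, 0.800

prompt_rate = prompt_eval_count / prompt_eval_duration
gen_rate = eval_count / eval_duration

print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")  # 100.00 tokens/s
print(f"eval rate: {gen_rate:.2f} tokens/s")            # 15.00 tokens/s
```

The eval rate (generation speed) is the number to watch: it tells you how fast the model produces new tokens on your hardware.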
Model Storage
# Check model storage location
ls ~/.ollama/models/
# See disk usage
du -sh ~/.ollama/models/*
# Remove a model to free space
ollama rm mistral
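If you prefer checking disk usage from Python, here is a minimal sketch of what `du` computes, pointed at the default Ollama model directory (the path is an assumption based on the default install location shown above):

```python
import os

def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under `path` (like `du`)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):  # skip broken symlinks
                total += os.path.getsize(full)
    return total

# Default model location on Linux/macOS; prints 0.0 GB if it doesn't exist.
models_dir = os.path.expanduser("~/.ollama/models")
print(f"{dir_size_bytes(models_dir) / 1e9:.1f} GB")
```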
Quick Reference
| Command | Description |
|---|---|
| `ollama pull <model>` | Download a model |
| `ollama run <model>` | Start interactive chat |
| `ollama run <model> "prompt"` | Single query |
| `ollama list` | Show downloaded models |
| `ollama rm <model>` | Delete a model |
| `ollama show <model>` | Show model details |
You now have a local LLM running on your machine. No internet required, no API costs, complete privacy. In the next lesson, we'll explore more CLI commands and parameters.