Building Applications with Ollama
Ollama API Basics
Ollama provides a REST API that makes it easy to integrate local LLMs into any application. Let's explore the core endpoints.
API Overview
┌─────────────────────────────────────────────────────────────────┐
│                         Ollama REST API                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Base URL: http://localhost:11434                               │
│                                                                 │
│  Endpoints:                                                     │
│  ──────────                                                     │
│  POST   /api/generate - Generate text (completion)              │
│  POST   /api/chat     - Chat conversation                       │
│  POST   /api/embed    - Generate embeddings                     │
│  GET    /api/tags     - List models                             │
│  POST   /api/pull     - Pull a model                            │
│  DELETE /api/delete   - Delete a model                          │
│  POST   /api/show     - Model information                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
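Before exploring the endpoints, it's worth confirming the server is actually reachable. A quick check, assuming a default install listening on port 11434:
# Returns the server version as JSON
curl http://localhost:11434/api/version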
Generate Endpoint
The /api/generate endpoint is for text completion:
# Basic generation
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "The capital of France is",
"stream": false
}'
# Response:
{
"model": "llama3.2",
"response": " Paris.",
"done": true,
"context": [1, 2, 3, ...],
"total_duration": 1234567890,
"load_duration": 123456789,
"prompt_eval_count": 7,
"eval_count": 3
}
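In scripts you usually want just the generated text. A small sketch, assuming jq is installed:
# Extract only the response text
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "The capital of France is",
  "stream": false
}' | jq -r '.response'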
Generation with Parameters
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Write a short poem about coding",
"stream": false,
"options": {
"temperature": 0.7,
"top_p": 0.9,
"num_ctx": 4096,
"stop": ["\n\n"]
}
}'
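The options map accepts most of the parameters you can set in a Modelfile. For example, fixing the seed and dropping the temperature to 0 should make output repeatable, and num_predict caps the response length (a sketch; exact determinism can still vary across hardware):
# Reproducible, length-capped generation
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a short poem about coding",
  "stream": false,
  "options": {
    "temperature": 0,
    "seed": 42,
    "num_predict": 100
  }
}'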
Chat Endpoint
The /api/chat endpoint handles multi-turn conversations:
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is Python?"},
{"role": "assistant", "content": "Python is a programming language."},
{"role": "user", "content": "What is it used for?"}
],
"stream": false
}'
# Response:
{
"model": "llama3.2",
"message": {
"role": "assistant",
"content": "Python is used for web development, data science..."
},
"done": true
}
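Note that the API is stateless: the full message history must be sent on every request. A minimal sketch of a conversation loop in shell, assuming jq is installed (the ask helper is just for illustration):
# Keep the conversation history in a shell variable and grow it each turn
HISTORY='[{"role": "system", "content": "You are a helpful assistant."}]'

ask() {
  # Append the user's question to the history
  HISTORY=$(echo "$HISTORY" | jq --arg q "$1" '. + [{"role": "user", "content": $q}]')
  # Send the full history and extract the assistant's reply
  REPLY=$(curl -s http://localhost:11434/api/chat \
    -d "{\"model\": \"llama3.2\", \"messages\": $HISTORY, \"stream\": false}" \
    | jq -r '.message.content')
  # Append the reply so the model sees it on the next turn
  HISTORY=$(echo "$HISTORY" | jq --arg a "$REPLY" '. + [{"role": "assistant", "content": $a}]')
  echo "$REPLY"
}

ask "What is Python?"
ask "What is it used for?"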
Streaming Responses
For real-time output, use streaming:
# Streaming returns newline-delimited JSON, roughly one object per token
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Tell me a story",
"stream": true
}'
# Output (multiple JSON lines):
{"model":"llama3.2","response":"Once","done":false}
{"model":"llama3.2","response":" upon","done":false}
{"model":"llama3.2","response":" a","done":false}
{"model":"llama3.2","response":" time","done":false}
...
{"model":"llama3.2","response":"","done":true}
Processing Streams in Shell
# Pretty print streamed output
curl -s http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Explain AI in one sentence",
"stream": true
}' | while read -r line; do
echo "$line" | jq -r '.response' | tr -d '\n'
done
echo
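Because the stream is newline-delimited JSON, jq can also consume it directly, avoiding the shell loop:
# -j prints raw output with no trailing newlines
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain AI in one sentence",
  "stream": true
}' | jq -j '.response'
echo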
Embeddings Endpoint
Generate vector embeddings for RAG and semantic search:
curl http://localhost:11434/api/embed -d '{
"model": "llama3.2",
"input": "Ollama is a great tool for local LLMs"
}'
# Response:
{
"model": "llama3.2",
"embeddings": [[0.123, -0.456, 0.789, ...]]
}
# Multiple inputs
curl http://localhost:11434/api/embed -d '{
"model": "llama3.2",
"input": ["First text", "Second text", "Third text"]
}'
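Chat models can produce embeddings, but a dedicated embedding model is usually faster and yields better vectors; nomic-embed-text is one common choice (pull it first with ollama pull nomic-embed-text). A quick way to check the vector dimensionality with jq:
# Print the length of the first embedding vector
curl -s http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "Ollama is a great tool for local LLMs"
}' | jq '.embeddings[0] | length'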
Model Management Endpoints
List Models
curl http://localhost:11434/api/tags
# Response:
{
"models": [
{
"name": "llama3.2:latest",
"modified_at": "2024-12-15T10:30:00Z",
"size": 4700000000,
"digest": "sha256:abc123..."
}
]
}
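To get just the model names in a script:
# One model name per line
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'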
Model Information
curl http://localhost:11434/api/show -d '{
"model": "llama3.2"
}'
# Response includes modelfile, parameters, template
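The full response is verbose; jq can pull out a single field:
# Show only the model's default parameters
curl -s http://localhost:11434/api/show -d '{
  "model": "llama3.2"
}' | jq -r '.parameters'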
Pull Model
# Pull with progress (stream)
curl http://localhost:11434/api/pull -d '{
"name": "mistral",
"stream": true
}'
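Each streamed line reports progress, so you can watch just the status field:
# Prints e.g. "pulling manifest", then layer progress, then "success"
curl -s http://localhost:11434/api/pull -d '{
  "model": "mistral",
  "stream": true
}' | jq -r '.status'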
Error Handling
# Model not found
curl http://localhost:11434/api/generate -d '{
"model": "nonexistent-model",
"prompt": "Hello"
}'
# Response:
{
"error": "model 'nonexistent-model' not found"
}
Common HTTP status codes:
200: Success
400: Bad request (invalid parameters)
404: Model not found
500: Server error
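A sketch of branching on the status code in a script, using curl's --write-out flag to capture it:
# Capture the HTTP status separately from the response body
STATUS=$(curl -s -o /tmp/ollama_body.json -w "%{http_code}" \
  http://localhost:11434/api/generate \
  -d '{"model": "nonexistent-model", "prompt": "Hello"}')

if [ "$STATUS" -ne 200 ]; then
  echo "Request failed ($STATUS): $(jq -r '.error' /tmp/ollama_body.json)"
fi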
API Quick Reference
| Endpoint | Method | Purpose |
|---|---|---|
| /api/generate | POST | Text completion |
| /api/chat | POST | Chat conversation |
| /api/embed | POST | Generate embeddings |
| /api/tags | GET | List models |
| /api/show | POST | Model details |
| /api/pull | POST | Download model |
| /api/delete | DELETE | Remove model |
| /api/copy | POST | Copy model |
The REST API provides everything you need to integrate Ollama into any application. In the next lesson, we'll use Python for a more convenient developer experience.