Lesson 8 of 22

Building Applications with Ollama

Ollama API Basics

3 min read

Ollama provides a REST API that makes it easy to integrate local LLMs into any application. Let's explore the core endpoints.

API Overview

┌─────────────────────────────────────────────────────────────────┐
│                         Ollama REST API                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Base URL: http://localhost:11434                               │
│                                                                 │
│  Endpoints:                                                     │
│  ──────────                                                     │
│  POST   /api/generate  - Generate text (completion)             │
│  POST   /api/chat      - Chat conversation                      │
│  POST   /api/embed     - Generate embeddings                    │
│  GET    /api/tags      - List models                            │
│  POST   /api/pull      - Pull a model                           │
│  DELETE /api/delete    - Delete a model                         │
│  POST   /api/show      - Model information                      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
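
Before calling any endpoint, you can verify the server is up: the root path answers with a short plain-text message.

# Quick health check
curl http://localhost:11434/

# Response:
Ollama is running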

Generate Endpoint

The /api/generate endpoint is for text completion:

# Basic generation
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "The capital of France is",
  "stream": false
}'

# Response:
{
  "model": "llama3.2",
  "response": " Paris.",
  "done": true,
  "context": [1, 2, 3, ...],
  "total_duration": 1234567890,
  "load_duration": 123456789,
  "prompt_eval_count": 7,
  "eval_count": 3
}
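
All duration fields are reported in nanoseconds. Real responses also include an eval_duration field (elided above), which lets you compute generation speed as a quick sanity check:

# Tokens per second = eval_count / eval_duration (durations are nanoseconds)
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "The capital of France is",
  "stream": false
}' | jq '.eval_count / (.eval_duration / 1e9)'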

Generation with Parameters

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a short poem about coding",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "num_ctx": 4096,
    "stop": ["\n\n"]
  }
}'
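
These options map to the model's sampling settings:

  • temperature: higher values produce more varied output (0.7 is a balanced default)
  • top_p: nucleus sampling; restricts choices to the top 90% of probability mass
  • num_ctx: size of the context window, in tokens
  • stop: generation halts when any of these sequences is produced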

Chat Endpoint

The /api/chat endpoint handles multi-turn conversations:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a programming language."},
    {"role": "user", "content": "What is it used for?"}
  ],
  "stream": false
}'

# Response:
{
  "model": "llama3.2",
  "message": {
    "role": "assistant",
    "content": "Python is used for web development, data science..."
  },
  "done": true
}
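
Since the reply is nested under message, pair the endpoint with jq to extract just the text:

# Print only the assistant's reply
curl -s http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "What is Python?"}],
  "stream": false
}' | jq -r '.message.content'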

Streaming Responses

For real-time output, use streaming:

# Streaming emits one JSON object per chunk (roughly one per token)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Tell me a story",
  "stream": true
}'

# Output (multiple JSON lines):
{"model":"llama3.2","response":"Once","done":false}
{"model":"llama3.2","response":" upon","done":false}
{"model":"llama3.2","response":" a","done":false}
{"model":"llama3.2","response":" time","done":false}
...
{"model":"llama3.2","response":"","done":true}

Processing Streams in Shell

jq can consume the stream of newline-delimited JSON objects directly. Avoid a naive while-read loop here: read strips the leading spaces that tokens often carry.

# Print streamed tokens as they arrive (-j joins raw output without newlines)
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain AI in one sentence",
  "stream": true
}' | jq -j '.response'
echo
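
The same approach works for /api/chat streams, where each chunk carries its text under message.content:

# Stream a chat response
curl -s http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Tell me a story"}],
  "stream": true
}' | jq -j '.message.content'
echo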

Embeddings Endpoint

Generate vector embeddings for RAG and semantic search:

curl http://localhost:11434/api/embed -d '{
  "model": "llama3.2",
  "input": "Ollama is a great tool for local LLMs"
}'

# Response:
{
  "model": "llama3.2",
  "embeddings": [[0.123, -0.456, 0.789, ...]]
}

# Multiple inputs
curl http://localhost:11434/api/embed -d '{
  "model": "llama3.2",
  "input": ["First text", "Second text", "Third text"]
}'
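
Any chat model can produce embeddings, but dedicated embedding models (for example, nomic-embed-text from the Ollama library) are smaller and generally better suited to the task. Each model produces vectors of a fixed size, which you can inspect with jq:

# Check the dimensionality of the returned vectors
curl -s http://localhost:11434/api/embed -d '{
  "model": "llama3.2",
  "input": "Ollama is a great tool for local LLMs"
}' | jq '.embeddings[0] | length'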

Model Management Endpoints

List Models

curl http://localhost:11434/api/tags

# Response:
{
  "models": [
    {
      "name": "llama3.2:latest",
      "modified_at": "2024-12-15T10:30:00Z",
      "size": 4700000000,
      "digest": "sha256:abc123..."
    }
  ]
}
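
For scripting, a jq filter reduces this to just the model names:

curl -s http://localhost:11434/api/tags | jq -r '.models[].name'

# Output:
llama3.2:latest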

Model Information

curl http://localhost:11434/api/show -d '{
  "model": "llama3.2"
}'

# Response includes modelfile, parameters, template
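
Each of those fields can be pulled out individually with jq, for example the prompt template:

# Show just the prompt template
curl -s http://localhost:11434/api/show -d '{
  "model": "llama3.2"
}' | jq -r '.template'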

Pull Model

# Pull with progress (stream)
curl http://localhost:11434/api/pull -d '{
  "name": "mistral",
  "stream": true
}'
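
Delete Model

# Note the DELETE verb; the request body names the model
curl -X DELETE http://localhost:11434/api/delete -d '{
  "model": "mistral"
}'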

Error Handling

# Model not found
curl http://localhost:11434/api/generate -d '{
  "model": "nonexistent-model",
  "prompt": "Hello"
}'

# Response:
{
  "error": "model 'nonexistent-model' not found"
}

Common HTTP status codes:

  • 200: Success
  • 400: Bad request (invalid parameters)
  • 404: Model not found
  • 500: Server error
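
Since error bodies are ordinary JSON, it is often easier to branch on the status code directly:

# Print only the HTTP status code
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:11434/api/generate -d '{
  "model": "nonexistent-model",
  "prompt": "Hello"
}'

# Output:
404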

API Quick Reference

Endpoint        Method   Purpose
─────────────   ──────   ───────────────────
/api/generate   POST     Text completion
/api/chat       POST     Chat conversation
/api/embed      POST     Generate embeddings
/api/tags       GET      List models
/api/show       POST     Model details
/api/pull       POST     Download model
/api/delete     DELETE   Remove model
/api/copy       POST     Copy model

The REST API provides everything you need to integrate Ollama into any application. In the next lesson, we'll use Python for a more convenient developer experience.
