LLM Gateways & Routing

LLM Gateway Architecture

3 min read

LLM gateways provide a unified interface for managing multiple LLM providers, enabling routing, fallbacks, cost optimization, and observability from a single control point.

Why LLM Gateways?

Without Gateway:                With Gateway:
─────────────────               ────────────────

┌─────────────┐                 ┌─────────────┐
│  Service A  │───→ OpenAI     │  Service A  │─┐
└─────────────┘                 └─────────────┘ │
┌─────────────┐                 ┌─────────────┐ │   ┌─────────┐
│  Service B  │───→ Anthropic  │  Service B  │─┼──→│ Gateway │───→ Providers
└─────────────┘                 └─────────────┘ │   └─────────┘
                                                │       │
┌─────────────┐                 ┌─────────────┐ │   ┌───┴───┐
│  Service C  │───→ Azure      │  Service C  │─┘   │OpenAI │
└─────────────┘                 └─────────────┘     │Anthro │
                                                    │Azure  │
Problems:                       Benefits:           │Bedrock│
• N×M integrations             • Single API        └───────┘
• No unified monitoring        • Central control
• Hard to switch providers     • Easy fallbacks
• Scattered credentials        • Unified logging

Gateway Features

┌─────────────────────────────────────────────────────────────┐
│                   LLM Gateway Capabilities                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Routing & Load Balancing                                   │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  • Model-based routing                               │   │
│  │  • Latency-based routing                             │   │
│  │  • Cost-based routing                                │   │
│  │  • Weighted distribution                             │   │
│  │  • Fallback chains                                   │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Reliability                                                │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  • Automatic retries with backoff                    │   │
│  │  • Fallback to alternative providers                 │   │
│  │  • Health checks and circuit breakers                │   │
│  │  • Request queuing during outages                    │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Cost Management                                            │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  • Budget limits per team/project                    │   │
│  │  • Rate limiting by user/API key                     │   │
│  │  • Spend tracking and alerts                         │   │
│  │  • Automatic model downgrade on budget               │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Security & Governance                                      │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  • Centralized credential management                 │   │
│  │  • Virtual API keys for teams                        │   │
│  │  • Request/response logging                          │   │
│  │  • PII detection and redaction                       │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Gateway Solutions Comparison

Solution Type Best For Key Features
LiteLLM Open-source Flexibility, self-host 100+ providers, async, A2A
Helicone SaaS/Self-host Observability 8ms latency, caching
Portkey SaaS Enterprise Governance, security
Martian SaaS Routing optimization Auto-routing
OpenRouter SaaS Simple access Pay-per-use

Common Architecture Patterns

Pattern 1: Central Gateway

┌──────────────────────────────────────────────────────────┐
│                    Central Gateway                        │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  All services → Gateway → Providers                      │
│                                                          │
│  Pros:                      Cons:                        │
│  • Single point of control  • Single point of failure    │
│  • Easy management          • Latency added              │
│  • Unified logging          • Scaling complexity         │
│                                                          │
└──────────────────────────────────────────────────────────┘

Pattern 2: Sidecar Gateway

┌──────────────────────────────────────────────────────────┐
│                    Sidecar Gateway                        │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌─────────────────────┐                                 │
│  │  Pod                │                                 │
│  │  ┌────────┐ ┌─────┐│                                 │
│  │  │ Service│→│Sidecar│→ Providers                      │
│  │  └────────┘ └─────┘│                                 │
│  └─────────────────────┘                                 │
│                                                          │
│  Pros:                      Cons:                        │
│  • No network hop          • Per-pod resource usage      │
│  • Local caching           • Configuration sprawl        │
│  • Service isolation       • Complex updates             │
│                                                          │
└──────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│                    Hybrid Pattern                         │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  SDK (client-side) + Central Gateway (server-side)       │
│                                                          │
│  ┌────────────┐                                         │
│  │  Service   │                                         │
│  │  ┌───────┐ │     ┌─────────┐                         │
│  │  │LiteLLM│──────→│ Gateway │──→ Providers            │
│  │  │  SDK  │ │     │(routing)│                         │
│  │  └───────┘ │     └─────────┘                         │
│  └────────────┘                                         │
│                                                          │
│  SDK handles: retries, formatting, streaming             │
│  Gateway handles: routing, budgets, logging              │
│                                                          │
└──────────────────────────────────────────────────────────┘

Key Considerations

  1. Latency: Gateway adds 1-10ms overhead; acceptable for most LLM calls (100ms+)
  2. Reliability: Gateway should be highly available; consider multi-region
  3. Caching: Gateway can cache identical requests for significant cost savings
  4. Compliance: Central logging helps with audit requirements
  5. Team Access: Virtual keys enable per-team quotas and tracking :::

Quiz

Module 5: LLM Gateways & Routing

Take Quiz