Service Mesh & Networking for ML
Istio for ML Workloads
4 min read
Istio gives production ML systems three capabilities they would otherwise have to build themselves: traffic management, service-to-service security (mTLS), and observability. As of 2025-2026, the Kubernetes Gateway API Inference Extension, which Istio supports, layers ML-specific features such as inference-aware routing on top of this foundation.
Service Mesh Architecture for ML
┌─────────────────────────────────────────────────────────────────────┐
│                      Istio for ML Architecture                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                Istio Control Plane (Istiod)                 │    │
│  │  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐         │    │
│  │  │ Config &     │ │ Traffic Mgmt │ │ Cert Mgmt    │         │    │
│  │  │ Discovery    │ │ (ex-Pilot)   │ │ (ex-Citadel) │         │    │
│  │  └──────────────┘ └──────────────┘ └──────────────┘         │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                  │                                  │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                 Data Plane (Envoy Sidecars)                 │    │
│  │                                                             │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │    │
│  │  │ Inference   │  │ Feature     │  │ Model       │          │    │
│  │  │ Service     │  │ Store       │  │ Registry    │          │    │
│  │  │ [Envoy]     │  │ [Envoy]     │  │ [Envoy]     │          │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘          │    │
│  │                                                             │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │    │
│  │  │ Training    │  │ Pipeline    │  │ Monitoring  │          │    │
│  │  │ Jobs        │  │ Controller  │  │ Stack       │          │    │
│  │  │ [Envoy]     │  │ [Envoy]     │  │ [Envoy]     │          │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘          │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
Istio Installation for ML Platform
# Install Istio (default profile) with access logging and tracing enabled
istioctl install --set profile=default \
  --set meshConfig.accessLogFile=/dev/stdout \
  --set meshConfig.enableTracing=true \
  --set values.pilot.traceSampling=10

# Enable sidecar injection for the ML namespaces
kubectl label namespace ml-serving istio-injection=enabled
kubectl label namespace ml-training istio-injection=enabled

# Verify the installation
istioctl verify-install
kubectl get pods -n istio-system
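Note that `values.pilot.traceSampling` is the legacy knob; on recent Istio releases, sampling is usually configured through the Telemetry API instead. A minimal sketch, assuming an extension provider named `otel` is already defined in your mesh config:

# Mesh-wide trace sampling via the Telemetry API
apiVersion: telemetry.istio.io/v1      # v1alpha1 on older releases
kind: Telemetry
metadata:
  name: mesh-tracing
  namespace: istio-system              # istio-system scope applies mesh-wide
spec:
  tracing:
  - providers:
    - name: otel                       # assumed provider name; must exist in meshConfig
    randomSamplingPercentage: 10.0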
Gateway API Inference Extension (2025)
# Gateway API Gateway for ML routing
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ml-gateway
  namespace: ml-serving
spec:
  gatewayClassName: istio
  listeners:
  - name: http
    port: 80
    protocol: HTTP
  - name: grpc
    # Gateway API has no GRPC listener protocol; gRPC is served over an
    # HTTP/2-capable listener and routed with GRPCRoute
    port: 8081
    protocol: HTTP
---
# Inference-aware HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: inference-route
  namespace: ml-serving
spec:
  parentRefs:
  - name: ml-gateway
  hostnames:
  - "inference.ml.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1/models
    backendRefs:
    - name: kserve-predictor
      port: 8080
      weight: 100
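The Inference Extension itself adds CRDs for model-aware load balancing, letting a route target a pool of model-server pods behind an endpoint picker instead of a plain Service. A hedged sketch (the API version, the pool name `llm-pool`, the `app: vllm` selector, and the `llm-endpoint-picker` reference are all assumptions; check what your Istio release ships):

# InferencePool from the Gateway API Inference Extension (assumed v1alpha2)
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llm-pool
  namespace: ml-serving
spec:
  selector:
    app: vllm                    # pods running the model server (assumed label)
  targetPortNumber: 8000
  extensionRef:
    name: llm-endpoint-picker    # endpoint-picker service (assumed name)
---
# An HTTPRoute rule then references the pool instead of a Service:
# backendRefs:
# - group: inference.networking.x-k8s.io
#   kind: InferencePool
#   name: llm-pool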
mTLS Configuration for ML Services
# Strict mTLS for all ML services
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: ml-mtls
  namespace: ml-serving
spec:
  mtls:
    mode: STRICT
---
# Allow only specific identities to call the inference service
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: inference-access
  namespace: ml-serving
spec:
  selector:
    matchLabels:
      app: inference-service
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/ml-frontend/sa/frontend-sa"
        - "cluster.local/ns/ml-serving/sa/gateway-sa"
    to:
    - operation:
        methods: ["POST", "GET"]
        paths: ["/v1/models/*", "/v2/models/*"]
Traffic Management for ML
# Virtual Service for inference routing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: inference-routing
  namespace: ml-serving
spec:
  hosts:
  - inference-service
  http:
  # Route based on model version header
  - match:
    - headers:
        x-model-version:
          exact: "v2"
    route:
    - destination:
        host: inference-service
        subset: v2
        port:
          number: 8080
    timeout: 30s
    retries:
      attempts: 3
      perTryTimeout: 10s
      retryOn: 5xx,reset,connect-failure
  # Default route: 90/10 canary split between subsets
  - route:
    - destination:
        host: inference-service
        subset: v1
        port:
          number: 8080
      weight: 90
    - destination:
        host: inference-service
        subset: v2
        port:
          number: 8080
      weight: 10
---
# Destination Rule defining the subsets, with circuit breaking
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: inference-destination
  namespace: ml-serving
spec:
  host: inference-service
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1000
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 1000
        http2MaxRequests: 1000
        maxRequestsPerConnection: 100
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
      maxEjectionPercent: 50
    loadBalancer:
      simple: LEAST_REQUEST
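Because the 90/10 split lives in the VirtualService, promoting the canary is a config patch rather than a redeploy. A sketch using a JSON patch (the `/spec/http/1` index assumes the default route is the second rule, as defined above):

# Shift the canary from 10% to 50% of traffic
kubectl patch virtualservice inference-routing -n ml-serving --type=json -p='[
  {"op": "replace", "path": "/spec/http/1/route/0/weight", "value": 50},
  {"op": "replace", "path": "/spec/http/1/route/1/weight", "value": 50}
]'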
Request Timeouts for Inference
# Long timeouts for LLM inference
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: llm-timeouts
spec:
  hosts:
  - llm-service
  http:
  - match:
    - uri:
        prefix: /v1/chat/completions
    timeout: 120s
    retries:
      attempts: 2
      perTryTimeout: 60s
      retryOn: 5xx,reset
  - match:
    - uri:
        prefix: /v1/embeddings
    timeout: 30s
    retries:
      attempts: 3
      perTryTimeout: 10s
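One caveat: the route timeout covers the entire response, so streamed completions (SSE) running longer than 120s would be cut off mid-stream. A common pattern is a dedicated rule that disables the timeout for streaming clients; a sketch matching on the Accept header, which must be listed before the 120s rule since Istio evaluates rules in order:

# Streamed completions: 0s disables the route timeout entirely
- match:
  - uri:
      prefix: /v1/chat/completions
    headers:
      accept:
        exact: text/event-stream
  timeout: 0s
  route:
  - destination:
      host: llm-service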
Sidecar Resource Configuration
# Sidecar defaults tuned for ML workloads
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        # Accept HTTP/1.0 requests from legacy clients
        ISTIO_META_HTTP10: "1"
      # Number of Envoy worker threads
      concurrency: 2
  values:
    global:
      proxy:
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
---
# Per-pod sidecar override for inference pods (annotations only; pod spec omitted)
apiVersion: v1
kind: Pod
metadata:
  annotations:
    sidecar.istio.io/proxyCPU: "500m"
    sidecar.istio.io/proxyMemory: "512Mi"
    sidecar.istio.io/proxyCPULimit: "1000m"
    sidecar.istio.io/proxyMemoryLimit: "1Gi"
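Large model-artifact downloads (often tens of GB from object storage) also flow through the sidecar by default, burning proxy CPU for no routing benefit. The `traffic.sidecar.istio.io` annotations can exclude that traffic; the CIDR and port below are placeholders for your object store:

# Bypass the sidecar for bulk model downloads (values are placeholders)
apiVersion: v1
kind: Pod
metadata:
  annotations:
    # Outbound traffic to the object store skips Envoy entirely
    traffic.sidecar.istio.io/excludeOutboundIPRanges: "10.100.0.0/16"
    traffic.sidecar.istio.io/excludeOutboundPorts: "9000"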
Next lesson: Observability and distributed tracing for ML pipelines.