Service Mesh & Networking for ML
Istio for ML Workloads
4 min read
Istio gives production ML systems three capabilities they would otherwise have to build themselves: traffic management, service-to-service security (mTLS), and observability. As of 2025-2026, the Kubernetes Gateway API Inference Extension, which Istio supports, layers ML-specific features such as inference-aware routing on top of this foundation.
Service Mesh Architecture for ML
┌─────────────────────────────────────────────────────────────────────┐
│                      Istio for ML Architecture                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                Istio Control Plane (Istiod)                 │    │
│  │  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐         │    │
│  │  │ Config &     │ │ Traffic Mgmt │ │ Cert Mgmt    │         │    │
│  │  │ Discovery    │ │ (ex-Pilot)   │ │ (ex-Citadel) │         │    │
│  │  └──────────────┘ └──────────────┘ └──────────────┘         │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                  │                                  │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                 Data Plane (Envoy Sidecars)                 │    │
│  │                                                             │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │    │
│  │  │ Inference   │  │ Feature     │  │ Model       │          │    │
│  │  │ Service     │  │ Store       │  │ Registry    │          │    │
│  │  │ [Envoy]     │  │ [Envoy]     │  │ [Envoy]     │          │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘          │    │
│  │                                                             │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │    │
│  │  │ Training    │  │ Pipeline    │  │ Monitoring  │          │    │
│  │  │ Jobs        │  │ Controller  │  │ Stack       │          │    │
│  │  │ [Envoy]     │  │ [Envoy]     │  │ [Envoy]     │          │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘          │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
Istio Installation for ML Platform
# Install Istio (default profile) with access logging and tracing enabled
istioctl install --set profile=default \
  --set meshConfig.accessLogFile=/dev/stdout \
  --set meshConfig.enableTracing=true \
  --set values.pilot.traceSampling=10

# Enable sidecar injection for the ML namespaces
kubectl label namespace ml-serving istio-injection=enabled
kubectl label namespace ml-training istio-injection=enabled

# Verify the installation
istioctl verify-install
kubectl get pods -n istio-system
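Note that `values.pilot.traceSampling` is the legacy knob; on recent Istio releases, sampling is usually configured through the Telemetry API instead. A minimal sketch, assuming an extension provider named `otel` is already defined in your mesh config:

# Mesh-wide trace sampling via the Telemetry API
apiVersion: telemetry.istio.io/v1      # v1alpha1 on older releases
kind: Telemetry
metadata:
  name: mesh-tracing
  namespace: istio-system              # istio-system scope applies mesh-wide
spec:
  tracing:
  - providers:
    - name: otel                       # assumed provider name; must exist in meshConfig
    randomSamplingPercentage: 10.0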
Gateway API Inference Extension (2025)
# Gateway API Gateway for ML routing
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ml-gateway
  namespace: ml-serving
spec:
  gatewayClassName: istio
  listeners:
  - name: http
    port: 80
    protocol: HTTP
  - name: grpc
    # Gateway API has no GRPC listener protocol; gRPC is served over an
    # HTTP/2-capable listener and routed with GRPCRoute
    port: 8081
    protocol: HTTP
---
# Inference-aware HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: inference-route
  namespace: ml-serving
spec:
  parentRefs:
  - name: ml-gateway
  hostnames:
  - "inference.ml.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1/models
    backendRefs:
    - name: kserve-predictor
      port: 8080
      weight: 100
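The Inference Extension itself adds CRDs for model-aware load balancing, letting a route target a pool of model-server pods behind an endpoint picker instead of a plain Service. A hedged sketch (the API version, the pool name `llm-pool`, the `app: vllm` selector, and the `llm-endpoint-picker` reference are all assumptions; check what your Istio release ships):

# InferencePool from the Gateway API Inference Extension (assumed v1alpha2)
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llm-pool
  namespace: ml-serving
spec:
  selector:
    app: vllm                    # pods running the model server (assumed label)
  targetPortNumber: 8000
  extensionRef:
    name: llm-endpoint-picker    # endpoint-picker service (assumed name)
---
# An HTTPRoute rule then references the pool instead of a Service:
# backendRefs:
# - group: inference.networking.x-k8s.io
#   kind: InferencePool
#   name: llm-pool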
mTLS Configuration for ML Services
# Strict mTLS for all ML services
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: ml-mtls
  namespace: ml-serving
spec:
  mtls:
    mode: STRICT
---
# Allow only specific identities to call the inference service
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: inference-access
  namespace: ml-serving
spec:
  selector:
    matchLabels:
      app: inference-service
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/ml-frontend/sa/frontend-sa"
        - "cluster.local/ns/ml-serving/sa/gateway-sa"
    to:
    - operation:
        methods: ["POST", "GET"]
        paths: ["/v1/models/*", "/v2/models/*"]
Traffic Management for ML
# Virtual Service for inference routing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: inference-routing
  namespace: ml-serving
spec:
  hosts:
  - inference-service
  http:
  # Route based on model version header
  - match:
    - headers:
        x-model-version:
          exact: "v2"
    route:
    - destination:
        host: inference-service
        subset: v2
        port:
          number: 8080
    timeout: 30s
    retries:
      attempts: 3
      perTryTimeout: 10s
      retryOn: 5xx,reset,connect-failure
  # Default route: 90/10 canary split between subsets
  - route:
    - destination:
        host: inference-service
        subset: v1
        port:
          number: 8080
      weight: 90
    - destination:
        host: inference-service
        subset: v2
        port:
          number: 8080
      weight: 10
---
# Destination Rule defining the subsets, with circuit breaking
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: inference-destination
  namespace: ml-serving
spec:
  host: inference-service
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1000
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 1000
        http2MaxRequests: 1000
        maxRequestsPerConnection: 100
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
      maxEjectionPercent: 50
    loadBalancer:
      simple: LEAST_REQUEST
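Because the 90/10 split lives in the VirtualService, promoting the canary is a config patch rather than a redeploy. A sketch using a JSON patch (the `/spec/http/1` index assumes the default route is the second rule, as defined above):

# Shift the canary from 10% to 50% of traffic
kubectl patch virtualservice inference-routing -n ml-serving --type=json -p='[
  {"op": "replace", "path": "/spec/http/1/route/0/weight", "value": 50},
  {"op": "replace", "path": "/spec/http/1/route/1/weight", "value": 50}
]'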
Request Timeouts for Inference
# Long timeouts for LLM inference
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: llm-timeouts
spec:
  hosts:
  - llm-service
  http:
  - match:
    - uri:
        prefix: /v1/chat/completions
    timeout: 120s
    retries:
      attempts: 2
      perTryTimeout: 60s
      retryOn: 5xx,reset
  - match:
    - uri:
        prefix: /v1/embeddings
    timeout: 30s
    retries:
      attempts: 3
      perTryTimeout: 10s
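One caveat: the route timeout covers the entire response, so streamed completions (SSE) running longer than 120s would be cut off mid-stream. A common pattern is a dedicated rule that disables the timeout for streaming clients; a sketch matching on the Accept header, which must be listed before the 120s rule since Istio evaluates rules in order:

# Streamed completions: 0s disables the route timeout entirely
- match:
  - uri:
      prefix: /v1/chat/completions
    headers:
      accept:
        exact: text/event-stream
  timeout: 0s
  route:
  - destination:
      host: llm-service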
Sidecar Resource Configuration
# Sidecar defaults tuned for ML workloads
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        # Accept HTTP/1.0 requests from legacy clients
        ISTIO_META_HTTP10: "1"
      # Number of Envoy worker threads
      concurrency: 2
  values:
    global:
      proxy:
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
---
# Per-pod sidecar override for inference pods (annotations only; pod spec omitted)
apiVersion: v1
kind: Pod
metadata:
  annotations:
    sidecar.istio.io/proxyCPU: "500m"
    sidecar.istio.io/proxyMemory: "512Mi"
    sidecar.istio.io/proxyCPULimit: "1000m"
    sidecar.istio.io/proxyMemoryLimit: "1Gi"
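Large model-artifact downloads (often tens of GB from object storage) also flow through the sidecar by default, burning proxy CPU for no routing benefit. The `traffic.sidecar.istio.io` annotations can exclude that traffic; the CIDR and port below are placeholders for your object store:

# Bypass the sidecar for bulk model downloads (values are placeholders)
apiVersion: v1
kind: Pod
metadata:
  annotations:
    # Outbound traffic to the object store skips Envoy entirely
    traffic.sidecar.istio.io/excludeOutboundIPRanges: "10.100.0.0/16"
    traffic.sidecar.istio.io/excludeOutboundPorts: "9000"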
Next lesson: Observability and distributed tracing for ML pipelines.