Service Mesh & Networking for ML

Network Policies for ML Security

ML platforms handle sensitive data and proprietary models, both of which call for strict network segmentation. Kubernetes NetworkPolicies combined with Istio authorization policies provide defense-in-depth security for ML workloads.

ML Network Security Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                    ML Network Security Layers                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  Layer 1: Kubernetes NetworkPolicy (L3/L4)                   │    │
│  │  - Pod-to-pod communication rules                            │    │
│  │  - Namespace isolation                                        │    │
│  │  - CIDR-based egress control                                 │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                              │                                       │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  Layer 2: Istio AuthorizationPolicy (L7)                     │    │
│  │  - Service-to-service authorization                          │    │
│  │  - HTTP method/path-based rules                              │    │
│  │  - JWT validation                                            │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                              │                                       │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  Layer 3: mTLS Encryption                                    │    │
│  │  - All traffic encrypted                                      │    │
│  │  - Certificate-based identity                                 │    │
│  │  - Automatic rotation                                         │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘
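
Layers 1 and 2 are configured with the resources below. Layer 3, mTLS, is usually enabled once per namespace (or mesh-wide) rather than per workload; a minimal sketch enforcing STRICT mutual TLS for the ml-serving namespace, assuming Istio sidecar injection is already enabled there:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: ml-serving
spec:
  mtls:
    mode: STRICT   # reject plaintext traffic between sidecar-injected workloads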

Default Deny Policy

# Default deny all ingress/egress for ML namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: ml-serving
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
# Allow DNS for all pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: ml-serving
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53   # DNS falls back to TCP for large responses
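
Note that with default-deny egress in place, Istio sidecars in this namespace can no longer reach the control plane for certificates and configuration. If the pods are sidecar-injected, an additional allowance toward istiod is needed; a minimal sketch, assuming the default istiod XDS port 15012 in istio-system:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-istiod-egress
  namespace: ml-serving
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: istio-system
    ports:
    - protocol: TCP
      port: 15012   # sidecar-to-istiod XDS and certificate traffic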

Inference Service Network Policy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: inference-service-policy
  namespace: ml-serving
spec:
  podSelector:
    matchLabels:
      app: inference-service
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Allow from Istio ingress gateway
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: istio-system
      podSelector:
        matchLabels:
          istio: ingressgateway
    ports:
    - protocol: TCP
      port: 8080
  # Allow from frontend services
  - from:
    - namespaceSelector:
        matchLabels:
          app.kubernetes.io/part-of: ml-platform
      podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 8080
  # Allow Prometheus scraping
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
      podSelector:
        matchLabels:
          app: prometheus
    ports:
    - protocol: TCP
      port: 8082
  egress:
  # Allow HTTPS egress to model storage (S3/GCS). NetworkPolicy cannot filter by
  # hostname, so this opens port 443 to any external endpoint
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
    ports:
    - protocol: TCP
      port: 443
  # Allow to feature store
  - to:
    - podSelector:
        matchLabels:
          app: feature-store
    ports:
    - protocol: TCP
      port: 6566
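
Egress allowances only open the sending side. Because the default-deny policy covers every pod in ml-serving, the feature store also needs an ingress rule before the inference pods can reach it on port 6566; a minimal sketch, assuming the feature store runs in this namespace under the app: feature-store label:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: feature-store-ingress
  namespace: ml-serving
spec:
  podSelector:
    matchLabels:
      app: feature-store
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: inference-service
    ports:
    - protocol: TCP
      port: 6566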

Training Job Isolation

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: training-job-isolation
  namespace: ml-training
spec:
  podSelector:
    matchLabels:
      workload-type: training
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Only allow from pipeline controller
  - from:
    - podSelector:
        matchLabels:
          app: kubeflow-pipelines
    ports:
    - protocol: TCP
      port: 2222
  egress:
  # Allow worker-to-worker rendezvous for distributed training (PyTorch's default
  # ports; NCCL bootstrap/data connections may need a wider port range)
  - to:
    - podSelector:
        matchLabels:
          workload-type: training
    ports:
    - protocol: TCP
      port: 29500
    - protocol: TCP
      port: 29501
  # Allow to data storage
  - to:
    - ipBlock:
        cidr: 10.0.0.0/8
    ports:
    - protocol: TCP
      port: 2049  # NFS
    - protocol: TCP
      port: 443   # Object storage
  # Allow artifact upload
  - to:
    - podSelector:
        matchLabels:
          app: mlflow
    ports:
    - protocol: TCP
      port: 5000
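
The isolation policy only attaches to pods that carry the workload-type: training label, and the label must be set on the pod template rather than on the Job or pipeline object itself. A minimal sketch of a Job template carrying it; the name and image are illustrative:

apiVersion: batch/v1
kind: Job
metadata:
  name: example-training-job          # illustrative name
  namespace: ml-training
spec:
  template:
    metadata:
      labels:
        workload-type: training       # matched by training-job-isolation
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/ml/trainer:latest   # placeholder image
        ports:
        - containerPort: 29500        # rendezvous port allowed between training pods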

Istio Authorization for ML APIs

# Only authenticated users can access inference
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: inference-authz
  namespace: ml-serving
spec:
  selector:
    matchLabels:
      app: inference-service
  action: ALLOW
  rules:
  # Allow health checks without auth
  - to:
    - operation:
        methods: ["GET"]
        paths: ["/health", "/ready", "/v2/health/*"]
  # Require JWT for inference endpoints
  - from:
    - source:
        requestPrincipals: ["*"]
    to:
    - operation:
        methods: ["POST"]
        paths: ["/v1/models/*", "/v2/models/*"]
    when:
    - key: request.auth.claims[aud]
      values: ["ml-inference-api"]
---
# Rate limiting per user
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: rate-limit-policy
  namespace: ml-serving
spec:
  selector:
    matchLabels:
      app: inference-service
  action: CUSTOM
  provider:
    name: rate-limiter
  rules:
  - to:
    - operation:
        paths: ["/v1/models/*"]

Egress Control for Model Downloads

# Allow only specific model registries
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: model-download-egress
  namespace: ml-serving
spec:
  podSelector:
    matchLabels:
      needs-model-access: "true"
  policyTypes:
  - Egress
  egress:
  # HTTPS egress for model downloads (e.g. HuggingFace Hub). NetworkPolicy cannot
  # restrict by hostname, so domain-level control comes from the Istio resources below
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
    ports:
    - protocol: TCP
      port: 443
---
# Istio ServiceEntry for allowed domains
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: allowed-model-registries
  namespace: ml-serving
spec:
  hosts:
  - huggingface.co
  - "*.huggingface.co"
  - storage.googleapis.com
  - "*.s3.amazonaws.com"
  ports:
  - number: 443
    name: https
    protocol: HTTPS
  resolution: DNS
  location: MESH_EXTERNAL
---
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: model-loader-sidecar
  namespace: ml-serving
spec:
  workloadSelector:
    labels:
      needs-model-access: "true"
  egress:
  - hosts:
    # hosts take namespace/dnsName entries, not ServiceEntry names, so the
    # registry hostnames from the ServiceEntry above are listed explicitly
    - "istio-system/*"
    - "./huggingface.co"
    - "./*.huggingface.co"
    - "./storage.googleapis.com"
    - "./*.s3.amazonaws.com"

Sensitive Data Protection

# Isolate PII processing pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pii-processor-isolation
  namespace: ml-sensitive
spec:
  podSelector:
    matchLabels:
      data-classification: pii
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          pii-access: "true"
    ports:
    - protocol: TCP
      port: 8080
  egress:
  # Egress restricted to pods in this namespace; no internet access for PII workloads
  - to:
    - podSelector: {}
    ports:
    - protocol: TCP   # no port specified, so any TCP port within the namespace
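
Because the PII pods' egress is now limited to in-namespace peers, DNS lookups (UDP 53 to kube-system) are also dropped, which breaks service-name resolution. If the PII workloads still need in-cluster name resolution, add a DNS allowance mirroring the earlier allow-dns policy; a minimal sketch:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: ml-sensitive
spec:
  podSelector:
    matchLabels:
      data-classification: pii
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53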

Next lesson: High availability and disaster recovery for ML services.
