Kubernetes Foundations for ML
Namespaces & Resource Quotas for ML Teams
3 min read
Multi-tenancy is critical when multiple ML teams share GPU clusters. Namespaces provide isolation, while resource quotas prevent any team from monopolizing expensive resources.
Namespace Strategy for ML
Team-Based Namespaces
# Namespace for ML Research team
apiVersion: v1
kind: Namespace
metadata:
  name: ml-research
  labels:
    team: research
    cost-center: rd-001
    gpu-tier: high-priority
---
# Namespace for ML Production
apiVersion: v1
kind: Namespace
metadata:
  name: ml-production
  labels:
    team: platform
    cost-center: prod-001
    gpu-tier: critical
---
# Namespace for ML Experimentation
apiVersion: v1
kind: Namespace
metadata:
  name: ml-experiments
  labels:
    team: data-science
    cost-center: ds-001
    gpu-tier: best-effort
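Applying the manifests is a one-liner, and the labels immediately pay off for filtering and chargeback reporting. A quick sketch, assuming the three namespaces above are saved in a file called namespaces.yaml (the filename is just a placeholder):
# Create (or update) the three team namespaces
kubectl apply -f namespaces.yaml
# Slice the cluster by label, e.g. find all best-effort namespaces
kubectl get namespaces -l gpu-tier=best-effort
# Show team, cost center, and GPU tier for every namespace
kubectl get namespaces -L team,cost-center,gpu-tier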
Namespace Isolation Model
┌──────────────────────────────────────────────────┐
│              GPU Kubernetes Cluster              │
├──────────────────────────────────────────────────┤
│  ml-production (Critical)                        │
│  ├── inference-deployments (always running)      │
│  ├── model-servers (autoscaled)                  │
│  └── quota: 16 GPUs guaranteed                   │
├──────────────────────────────────────────────────┤
│  ml-research (High Priority)                     │
│  ├── training-jobs (batch)                       │
│  ├── notebooks (interactive)                     │
│  └── quota: 32 GPUs limit, 8 guaranteed          │
├──────────────────────────────────────────────────┤
│  ml-experiments (Best Effort)                    │
│  ├── experiment-jobs (preemptible)               │
│  ├── hyperparameter-tuning                       │
│  └── quota: 8 GPUs limit, 0 guaranteed           │
└──────────────────────────────────────────────────┘
Resource Quotas
GPU Quota Configuration
# Resource quota for ml-research namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ml-research-quota
  namespace: ml-research
spec:
  hard:
    # GPU limits
    requests.nvidia.com/gpu: "8"
    limits.nvidia.com/gpu: "32"
    # Compute limits
    requests.cpu: "64"
    limits.cpu: "128"
    requests.memory: "256Gi"
    limits.memory: "512Gi"
    # Storage limits
    requests.storage: "2Ti"
    persistentvolumeclaims: "20"
    # Object counts
    pods: "50"
    services: "10"
    secrets: "50"
    configmaps: "50"
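Every pod created in ml-research is counted against these totals at admission time. Note that once a quota constrains cpu and memory, pods must declare explicit requests and limits for them (or inherit defaults from a LimitRange, covered below) or the API server rejects them. A hypothetical training pod like the one below consumes 2 of the 8 guaranteed GPU requests; if the namespace total would exceed a hard value, pod creation fails with a quota-exceeded error.
# Hypothetical pod counted against ml-research-quota
apiVersion: v1
kind: Pod
metadata:
  name: quota-demo
  namespace: ml-research
spec:
  restartPolicy: Never
  containers:
  - name: trainer
    image: pytorch/pytorch:latest
    resources:
      requests:
        cpu: "4"
        memory: "16Gi"
        nvidia.com/gpu: "2"
      limits:
        cpu: "8"
        memory: "32Gi"
        nvidia.com/gpu: "2"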
Production Namespace Quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ml-production-quota
  namespace: ml-production
spec:
  hard:
    # Guaranteed GPU allocation
    requests.nvidia.com/gpu: "16"
    limits.nvidia.com/gpu: "16"
    # Higher compute for inference
    requests.cpu: "128"
    limits.cpu: "256"
    requests.memory: "512Gi"
    limits.memory: "1Ti"
    # Generous pod count for inference autoscaling
    pods: "200"
    services: "50"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values:
      - ml-critical
      - ml-high
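Because of the scopeSelector, this quota only tracks pods that run with one of the matching priority classes (defined later in this section), so production workloads need to set priorityClassName explicitly. A minimal sketch of an inference Deployment, using a placeholder image:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
  namespace: ml-production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      priorityClassName: ml-critical   # matches the quota's scopeSelector
      containers:
      - name: server
        image: registry.example.com/model-server:latest   # placeholder image
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            nvidia.com/gpu: "1"
          limits:
            cpu: "8"
            memory: "32Gi"
            nvidia.com/gpu: "1"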
LimitRanges for Default Resources
Default Requests and GPU Limits
apiVersion: v1
kind: LimitRange
metadata:
  name: ml-limit-range
  namespace: ml-research
spec:
  limits:
  # Default container limits
  - type: Container
    default:
      cpu: "2"
      memory: "8Gi"
    defaultRequest:
      cpu: "1"
      memory: "4Gi"
    max:
      cpu: "32"
      memory: "128Gi"
      nvidia.com/gpu: "8"
    min:
      cpu: "100m"
      memory: "128Mi"
  # PVC size limits
  - type: PersistentVolumeClaim
    max:
      storage: "500Gi"
    min:
      storage: "1Gi"
Priority Classes for ML Workloads
Priority Class Hierarchy
# Critical: Production inference (should never be preempted)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: ml-critical
value: 1000000
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "Production ML inference - must never be preempted"
---
# High: Research training (can preempt experiments)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: ml-high
value: 100000
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "Research training jobs"
---
# Low: Experiments (preemptible by the tiers above)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: ml-low
value: 1000
globalDefault: false
# Never means these pods never preempt others; they can still be
# evicted by higher-priority pods because of their low value.
preemptionPolicy: Never
description: "Experimental workloads - can be preempted"
Using Priority in Training Jobs
apiVersion: batch/v1
kind: Job
metadata:
  name: research-training
  namespace: ml-research
spec:
  template:
    spec:
      priorityClassName: ml-high   # Can preempt ml-low experiment pods
      restartPolicy: Never         # Required for Jobs (Never or OnFailure)
      containers:
      - name: trainer
        image: pytorch/pytorch:latest
        resources:
          limits:
            nvidia.com/gpu: 4
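The experiment tier is the mirror image: jobs in ml-experiments run with ml-low, which is what allows the research job above to displace them when GPUs are scarce. A minimal sketch of a preemptible experiment job (the name and command are stand-ins for a real trial):
apiVersion: batch/v1
kind: Job
metadata:
  name: hp-sweep-trial-01
  namespace: ml-experiments
spec:
  template:
    spec:
      priorityClassName: ml-low   # preemptible: higher tiers can evict these pods
      restartPolicy: OnFailure    # restart failed containers in place
      containers:
      - name: trial
        image: pytorch/pytorch:latest
        command: ["python", "-c", "print('placeholder for a real trial script')"]
        resources:
          limits:
            nvidia.com/gpu: 1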
Checking Quota Usage
# View quota usage
kubectl describe resourcequota ml-research-quota -n ml-research
# Output:
# Name:                    ml-research-quota
# Namespace:               ml-research
# Resource                 Used   Hard
# --------                 ----   ----
# limits.nvidia.com/gpu    12     32
# requests.nvidia.com/gpu  8      8
# pods                     15     50
# requests.memory          128Gi  256Gi
# List all quotas in cluster
kubectl get resourcequota --all-namespaces
# Check whether a pod would fit the quota. Quota is enforced by the API
# server, so use a server-side dry run; newer kubectl releases have dropped
# the --requests flag, in which case apply a full pod manifest with
# --dry-run=server instead.
kubectl run test --image=nginx --dry-run=server \
  --requests='nvidia.com/gpu=4' -n ml-research
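To eyeball GPU headroom across all three tiers at once, a small loop over the namespaces is enough (a convenience sketch, not a required tool):
# Show GPU quota usage for every ML namespace
for ns in ml-production ml-research ml-experiments; do
  echo "== ${ns} =="
  kubectl describe resourcequota -n "${ns}" | grep -E 'nvidia.com/gpu|^Name'
done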
Next, we'll cover kubectl essentials and debugging techniques for ML workloads.