Kubernetes Foundations for ML

Namespaces & Resource Quotas for ML Teams

Multi-tenancy is critical when multiple ML teams share GPU clusters. Namespaces provide isolation, while resource quotas prevent any team from monopolizing expensive resources.

Namespace Strategy for ML

Team-Based Namespaces

# Namespace for ML Research team
apiVersion: v1
kind: Namespace
metadata:
  name: ml-research
  labels:
    team: research
    cost-center: rd-001
    gpu-tier: high-priority
---
# Namespace for ML Production
apiVersion: v1
kind: Namespace
metadata:
  name: ml-production
  labels:
    team: platform
    cost-center: prod-001
    gpu-tier: critical
---
# Namespace for ML Experimentation
apiVersion: v1
kind: Namespace
metadata:
  name: ml-experiments
  labels:
    team: data-science
    cost-center: ds-001
    gpu-tier: best-effort
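
Assuming the manifests above are saved to a file such as namespaces.yaml (an illustrative name), you can apply them and then use the labels for filtering and cost reporting:

# Create the namespaces, then query them by label
kubectl apply -f namespaces.yaml
kubectl get namespaces -l gpu-tier=best-effort
kubectl get namespaces -L team,cost-center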

Namespace Isolation Model

┌─────────────────────────────────────────────────────────────────┐
│                     GPU Kubernetes Cluster                       │
├─────────────────────────────────────────────────────────────────┤
│  ml-production (Critical)                                        │
│  ├── inference-deployments (always running)                     │
│  ├── model-servers (autoscaled)                                 │
│  └── quota: 16 GPU hard cap                                     │
├─────────────────────────────────────────────────────────────────┤
│  ml-research (High Priority)                                     │
│  ├── training-jobs (batch)                                      │
│  ├── notebooks (interactive)                                    │
│  └── quota: 32 GPU hard cap                                     │
├─────────────────────────────────────────────────────────────────┤
│  ml-experiments (Best Effort)                                    │
│  ├── experiment-jobs (preemptible)                              │
│  ├── hyperparameter-tuning                                      │
│  └── quota: 8 GPU hard cap                                      │
└─────────────────────────────────────────────────────────────────┘

Quotas cap each namespace's consumption; the effective "guarantees" for the production tier come from the priority classes and preemption rules defined later in this lesson.
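
Namespaces only isolate workloads if access to them is scoped per team. One common pattern, sketched here under the assumption that engineers authenticate as members of a group (the group name ml-research-team is illustrative), is to bind each team's group to the built-in edit ClusterRole inside its own namespace:

# Hypothetical RoleBinding: research team may modify only ml-research
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: research-team-edit
  namespace: ml-research
subjects:
- kind: Group
  name: ml-research-team   # assumed group from your identity provider
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit               # built-in role, granted only within this namespace
  apiGroup: rbac.authorization.k8s.io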

Resource Quotas

GPU Quota Configuration

# Resource quota for ml-research namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ml-research-quota
  namespace: ml-research
spec:
  hard:
    # GPU cap. Extended resources support only requests.-prefixed quota
    # items; GPUs cannot be overcommitted, so requests always equal limits.
    requests.nvidia.com/gpu: "32"

    # Compute limits
    requests.cpu: "64"
    limits.cpu: "128"
    requests.memory: "256Gi"
    limits.memory: "512Gi"

    # Storage limits
    requests.storage: "2Ti"
    persistentvolumeclaims: "20"

    # Object counts
    pods: "50"
    services: "10"
    secrets: "50"
    configmaps: "50"
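
Each pod admitted to ml-research is charged against this quota at admission time. A minimal pod (the name quota-demo and resource sizes are illustrative) that would consume 4 of the 32-GPU budget:

# Hypothetical pod: uses 4 GPUs of the ml-research quota
apiVersion: v1
kind: Pod
metadata:
  name: quota-demo
  namespace: ml-research
spec:
  restartPolicy: Never
  containers:
  - name: trainer
    image: pytorch/pytorch:latest
    resources:
      requests:
        cpu: "4"
        memory: "16Gi"
      limits:
        cpu: "8"
        memory: "32Gi"
        nvidia.com/gpu: 4   # extended resource: request defaults to the limit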

Production Namespace Quota

apiVersion: v1
kind: ResourceQuota
metadata:
  name: ml-production-quota
  namespace: ml-production
spec:
  hard:
    # Dedicated GPU budget (requests.-prefixed only for extended resources)
    requests.nvidia.com/gpu: "16"

    # Higher compute for inference
    requests.cpu: "128"
    limits.cpu: "256"
    requests.memory: "512Gi"
    limits.memory: "1Ti"

    # Generous pod headroom for autoscaled inference
    pods: "200"
    services: "50"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values:
      - ml-critical
      - ml-high
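
Because of the scopeSelector, this quota only tracks pods running at those two priority classes. To keep unclassified pods from slipping past it, a companion quota can reject pods that set no priority class at all; a minimal sketch (the name pods-require-priority is illustrative):

# Hypothetical companion quota: forbid pods without a priority class
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pods-require-priority
  namespace: ml-production
spec:
  hard:
    pods: "0"
  scopeSelector:
    matchExpressions:
    - operator: DoesNotExist
      scopeName: PriorityClass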

LimitRanges for Default Resources

Default GPU Requests

apiVersion: v1
kind: LimitRange
metadata:
  name: ml-limit-range
  namespace: ml-research
spec:
  limits:
  # Default container limits
  - type: Container
    default:
      cpu: "2"
      memory: "8Gi"
    defaultRequest:
      cpu: "1"
      memory: "4Gi"
    max:
      cpu: "32"
      memory: "128Gi"
      nvidia.com/gpu: "8"
    min:
      cpu: "100m"
      memory: "128Mi"

  # PVC size limits
  - type: PersistentVolumeClaim
    max:
      storage: "500Gi"
    min:
      storage: "1Gi"

Priority Classes for ML Workloads

Priority Class Hierarchy

# Critical: Production inference (never preempted)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: ml-critical
value: 1000000
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "Production ML inference - never preempt"
---
# High: Research training (can preempt experiments)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: ml-high
value: 100000
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "Research training jobs"
---
# Low: Experiments (preemptible)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: ml-low
value: 1000
globalDefault: false
preemptionPolicy: Never   # these pods never preempt others
description: "Experimental workloads - may be preempted by higher classes"

Using Priority in Training Jobs

apiVersion: batch/v1
kind: Job
metadata:
  name: research-training
  namespace: ml-research
spec:
  template:
    spec:
      priorityClassName: ml-high  # can preempt ml-low experiment pods
      restartPolicy: Never        # required for Jobs (Never or OnFailure)
      containers:
      - name: trainer
        image: pytorch/pytorch:latest
        resources:
          limits:
            nvidia.com/gpu: 4
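
When the cluster is full, scheduling this Job should evict ml-low pods to make room. One way to observe that from the experiments side (exact event wording varies by Kubernetes version):

# Look for preemption events on experiment pods
kubectl get events -n ml-experiments --sort-by=.lastTimestamp | grep -i preempt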

Checking Quota Usage

# View quota usage
kubectl describe resourcequota ml-research-quota -n ml-research

# Output:
# Name:                    ml-research-quota
# Namespace:               ml-research
# Resource                 Used    Hard
# --------                 ----    ----
# requests.nvidia.com/gpu  12      32
# pods                     15      50
# requests.memory          128Gi   256Gi

# List all quotas in cluster
kubectl get resourcequota --all-namespaces

# Test whether a manifest fits the quota: a server-side dry run goes
# through admission control, including the ResourceQuota check
# (gpu-pod.yaml stands in for any pod manifest you want to test)
kubectl apply -f gpu-pod.yaml --dry-run=server -n ml-research

Next, we'll cover kubectl essentials and debugging techniques for ML workloads.
