GPU Scheduling & Resource Management
Kueue & Volcano: Advanced GPU Scheduling
4 min read
Organizations that treat GPUs as shared, policy-driven resources get far more out of them at AI scale. Kueue and Volcano supply the queue-based admission control and gang scheduling that ML workloads need and that native Kubernetes scheduling lacks.
The Queue Management Problem
Without Queue Management
┌─────────────────────────────────────────────────────────────────┐
│ Problem: Native Kubernetes Scheduling                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ Team A submits: 32 GPU job                                      │
│ Team B submits:  8 GPU job                                      │
│ Team C submits: 64 GPU job                                      │
│                                                                 │
│ Kubernetes behavior:                                            │
│   - First pod scheduled gets resources                          │
│   - No fairness across teams                                    │
│   - Distributed training: partial pod allocation (deadlock!)    │
│   - No borrowing/lending between quotas                         │
│   - Jobs stuck waiting with no visibility                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
With Queue Management
┌─────────────────────────────────────────────────────────────────┐
│ Solution: Kueue Queue Management                                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ Cohort: gpu-fleet (64 GPUs total)                               │
│ ├── ClusterQueue: team-a (quota: 16 GPUs, borrow up to 32)      │
│ ├── ClusterQueue: team-b (quota: 16 GPUs, borrow up to 32)      │
│ └── ClusterQueue: team-c (quota: 32 GPUs, borrow up to 16)      │
│     (each team submits through a LocalQueue in its namespace)   │
│                                                                 │
│ Workload submitted → Admission control → Gang admission         │
│                                                                 │
│ Benefits:                                                       │
│   - Fair share across teams                                     │
│   - Gang admission (all-or-nothing)                             │
│   - Borrowing when queues are idle                              │
│   - Preemption policies                                         │
│   - Queue visibility and priorities                             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
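The diagram maps onto Kueue objects roughly as follows. This is a minimal sketch covering only the GPU resource: the cohort and queue names come from the diagram, and it references the nvidia-a100 ResourceFlavor defined in the next section, so treat it as illustrative rather than a drop-in manifest.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a
spec:
  cohort: gpu-fleet                 # ClusterQueues in the same cohort can lend and borrow quota
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: nvidia-a100             # ResourceFlavor defined in the next section
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 16            # guaranteed share
        borrowingLimit: 32          # may borrow up to 32 idle GPUs from the cohort
team-b and team-c are defined the same way with their own quotas, and each team submits through a LocalQueue in its namespace, as shown later in this article.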
Kueue: Kubernetes-Native Job Queueing
Installing Kueue
# Install Kueue
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.9.0/manifests.yaml
# Verify installation
kubectl get pods -n kueue-system
# Check CRDs
kubectl get crd | grep kueue
# clusterqueues.kueue.x-k8s.io
# localqueues.kueue.x-k8s.io
# resourceflavors.kueue.x-k8s.io
# workloads.kueue.x-k8s.io
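Kueue decides which workload types it manages through its controller configuration. If you plan to queue Kubeflow PyTorchJobs (there is an example further down) in addition to plain Jobs, the framework has to be listed there. The fragment below is a sketch assuming the kueue-manager-config ConfigMap layout shipped with the release manifests; check the ConfigMap that your version installs before editing it.
# Fragment of the kueue-manager-config ConfigMap in kueue-system,
# data key controller_manager_config.yaml
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
integrations:
  frameworks:
  - "batch/job"
  - "kubeflow.org/pytorchjob"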
Resource Flavors
# Define GPU types as flavors
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: nvidia-a100
spec:
  nodeLabels:
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: nvidia-h100
spec:
  nodeLabels:
    nvidia.com/gpu.product: NVIDIA-H100-SXM5-80GB
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: nvidia-l4
spec:
  nodeLabels:
    nvidia.com/gpu.product: NVIDIA-L4
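Besides node labels, recent Kueue versions let a flavor carry node taints and the tolerations to go with them, which is handy for spot or preemptible GPU pools. The sketch below uses GKE's spot label and taint key as an assumption; substitute whatever your provider or node pools use.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: nvidia-a100-spot
spec:
  nodeLabels:
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
    cloud.google.com/gke-spot: "true"    # assumption: GKE spot node label
  nodeTaints:                            # taints Kueue expects on the matching nodes
  - key: cloud.google.com/gke-spot
    value: "true"
    effect: NoSchedule
  tolerations:                           # injected into admitted pods so they can land there
  - key: cloud.google.com/gke-spot
    operator: Equal
    value: "true"
    effect: NoSchedule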
ClusterQueue Configuration
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-cluster
spec:
  namespaceSelector: {}              # admit workloads from all namespaces
  queueingStrategy: BestEffortFIFO
  cohort: gpu-fleet                  # borrowing happens between ClusterQueues in a cohort
  preemption:
    reclaimWithinCohort: Any
    withinClusterQueue: LowerPriority
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: nvidia-a100
      resources:
      - name: "cpu"
        nominalQuota: 256
        borrowingLimit: 128
      - name: "memory"
        nominalQuota: 1Ti
        borrowingLimit: 512Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 32
        borrowingLimit: 16
    - name: nvidia-h100
      resources:
      - name: "cpu"
        nominalQuota: 128
      - name: "memory"
        nominalQuota: 512Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 16
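The preemption block above ranks workloads by priority. Kueue has its own WorkloadPriorityClass, separate from the pod PriorityClass, which workloads select with the kueue.x-k8s.io/priority-class label. A sketch with names of my choosing:
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: prod-critical
value: 10000                         # higher values can preempt lower-priority workloads
description: "Production training jobs"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: research-batch
value: 100
description: "Best-effort research experiments"
A Job opts in by adding the label kueue.x-k8s.io/priority-class: prod-critical next to its queue-name label.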
LocalQueue per Team
# Team A's queue
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: ml-research-queue
  namespace: ml-research
spec:
  clusterQueue: gpu-cluster
---
# Team B's queue
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: ml-production-queue
  namespace: ml-production
spec:
  clusterQueue: gpu-cluster
Submitting Jobs to Kueue
apiVersion: batch/v1
kind: Job
metadata:
  name: distributed-training
  namespace: ml-research
  labels:
    kueue.x-k8s.io/queue-name: ml-research-queue
spec:
  parallelism: 4
  completions: 4
  suspend: true                      # Kueue unsuspends the Job once it is admitted
  template:
    spec:
      containers:
      - name: trainer
        image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
        resources:
          requests:
            nvidia.com/gpu: 8
            cpu: "32"
            memory: "128Gi"
          limits:
            nvidia.com/gpu: 8
            cpu: "32"
            memory: "128Gi"
      restartPolicy: Never
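The same queue-name label works for other workload types Kueue integrates with. As a sketch, assuming the Kubeflow training operator is installed and kubeflow.org/pytorchjob is enabled in the Kueue configuration shown earlier, a one-master, three-worker training job could be queued like this:
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: distributed-training-ptj
  namespace: ml-research
  labels:
    kueue.x-k8s.io/queue-name: ml-research-queue
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: pytorch            # the container must be named "pytorch"
            image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
            resources:
              limits:
                nvidia.com/gpu: 8
    Worker:
      replicas: 3
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: pytorch
            image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
            resources:
              limits:
                nvidia.com/gpu: 8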
Monitoring Kueue
# Check queue status
kubectl get clusterqueue gpu-cluster -o yaml
# View pending/admitted workloads
kubectl get workloads -n ml-research
# Check LocalQueue status
kubectl describe localqueue ml-research-queue -n ml-research
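For every queued object, Kueue creates a Workload that records the admission decision, which is what the commands above are listing. The shape below is abridged and the values are illustrative; inspect a real one with kubectl get workload NAME -o yaml and verify the fields against your Kueue version.
apiVersion: kueue.x-k8s.io/v1beta1
kind: Workload
metadata:
  name: job-distributed-training-abc12   # illustrative; created and owned by the Job above
  namespace: ml-research
spec:
  queueName: ml-research-queue
  podSets:
  - name: main
    count: 4                             # the Job's parallelism
status:
  admission:
    clusterQueue: gpu-cluster            # where quota was reserved
  conditions:
  - type: Admitted
    status: "True"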
Volcano: Gang Scheduling for Distributed Training
Why Gang Scheduling?
┌─────────────────────────────────────────────────────────────────┐
│ Problem: Partial Pod Allocation                                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ 4-worker distributed training needs 4 GPUs simultaneously       │
│                                                                 │
│ Without gang scheduling:                                        │
│   Worker 0: ✓ Scheduled (waiting for others)                    │
│   Worker 1: ✓ Scheduled (waiting for others)                    │
│   Worker 2: ✗ Pending (no GPU)                                  │
│   Worker 3: ✗ Pending (no GPU)                                  │
│                                                                 │
│ Result: DEADLOCK - GPUs wasted, training stuck!                 │
│                                                                 │
│ With gang scheduling:                                           │
│   All 4 workers: ✗ Waiting until 4 GPUs available               │
│   All 4 workers: ✓ Admitted together                            │
│                                                                 │
│ Result: No wasted resources                                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
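Under the hood, Volcano's gang semantics hang off a PodGroup: the scheduler refuses to bind any pod in the group until minMember pods (and, optionally, minResources) can all be placed. The Volcano Job controller shown below creates a PodGroup for you automatically; the sketch here, with illustrative names, is roughly what one looks like if you manage plain pods yourself (check the Volcano docs for the annotation your version uses to attach pods to a group).
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: pytorch-gang
  namespace: ml-research
spec:
  minMember: 4                 # bind nothing until 4 pods can be scheduled together
  minResources:
    nvidia.com/gpu: "4"        # also require 4 GPUs' worth of free capacity
  queue: default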
Installing Volcano
# Install Volcano
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml
# Verify
kubectl get pods -n volcano-system
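Volcano admits jobs from Queues, which divide the cluster by weight and optional hard caps. The Job example in the next section uses the built-in default queue; a dedicated team queue might look like the sketch below (the name and numbers are illustrative).
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: ml-research
spec:
  weight: 4                    # proportional share relative to other queues
  reclaimable: true            # capacity borrowed by this queue can be reclaimed
  capability:
    nvidia.com/gpu: "32"       # hard cap for this queue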
Volcano Job Example
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: pytorch-distributed
  namespace: ml-research
spec:
  minAvailable: 4              # gang scheduling: all 4 pods or nothing
  schedulerName: volcano
  plugins:
    env: []                    # inject task metadata as environment variables
    svc: []                    # headless service + stable hostnames per task
  queue: default
  tasks:
  - replicas: 1
    name: master
    template:
      spec:
        restartPolicy: OnFailure
        containers:
        - name: pytorch
          image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
          command: ["python", "-m", "torch.distributed.launch"]
          # svc plugin exposes the master pod as <job>-<task>-<index>.<job>
          args: ["--master_addr=pytorch-distributed-master-0.pytorch-distributed", "--nproc_per_node=1", "train.py"]
          resources:
            limits:
              nvidia.com/gpu: 1
  - replicas: 3
    name: worker
    template:
      spec:
        restartPolicy: OnFailure
        containers:
        - name: pytorch
          image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
          command: ["python", "-m", "torch.distributed.launch"]
          args: ["--master_addr=pytorch-distributed-master-0.pytorch-distributed", "--nproc_per_node=1", "train.py"]
          resources:
            limits:
              nvidia.com/gpu: 1
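Volcano Jobs can also react to pod-level failures as a unit, which matters for gangs: losing one worker usually means restarting the whole job rather than a single pod. A sketch of lifecycle-policy fields you could add to the spec above (values illustrative):
spec:
  maxRetry: 3                  # how many times the whole job may be restarted
  policies:
  - event: PodEvicted
    action: RestartJob         # restart the entire gang, not just the evicted pod
  - event: PodFailed
    action: RestartJob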
Kueue vs Volcano
| Feature | Kueue | Volcano |
|---|---|---|
| Primary Focus | Job queueing & admission | Gang scheduling |
| Preemption | Advanced policies | Basic |
| Multi-tenancy | Strong (cohorts, borrowing) | Basic |
| Job API | Native batch/v1 Jobs (plus integrations) | Custom Volcano Job CRD |
| CNCF Status | Kubernetes SIG project | CNCF Incubating |
| Best For | Fair sharing, quotas | Distributed training |
Recommendation: the two are complementary. Use Kueue for queue management and quotas, and Volcano when you need strict gang scheduling for distributed training.
Next, we'll cover NVIDIA KAI Scheduler and Dynamic Resource Allocation (DRA) for cutting-edge GPU management.