Kubernetes Zero-Downtime Deployments: 2026 Hands-On Guide

June 9, 2026

A zero-downtime Kubernetes deployment needs four things working together: a readiness probe so traffic only reaches pods that are actually ready, a rolling update with maxUnavailable: 0, a preStop hook that outlasts endpoint propagation, and an app that drains in-flight requests on SIGTERM. This guide builds all four on a real cluster.

TL;DR

You will build a small Node.js service, run it on a local kind cluster, and prove — with a live load test — that a naive Deployment drops requests during a rollout. Then you will fix it step by step: readiness probes gate traffic on the way up, maxUnavailable: 0 keeps capacity during the roll, and a native preStop sleep plus graceful SIGTERM draining closes the connection-loss window on the way down. Budget about 30 minutes. Every version here is pinned to current releases — Kubernetes 1.36, kind v0.32.0, and Node 24 LTS.

What you'll learn

Why Kubernetes rolling updates drop connections even when your app handles SIGTERM perfectly
How to write a Node.js HTTP server that drains in-flight requests and exits cleanly
How readiness probes gate traffic to pods on scale-up (and why liveness is different)
How to tune maxUnavailable and maxSurge for a true zero-downtime rolling update
How the preStop sleep action (GA in Kubernetes 1.34) bridges the endpoint-removal race
How terminationGracePeriodSeconds interacts with preStop and your drain logic
How a PodDisruptionBudget protects you during node drains

Prerequisites

Pin these versions so the commands below behave exactly as written:

Docker (or Podman) running locally — kind builds its node "machine" as a container.
kind v0.32.0, which defaults to Kubernetes 1.36.1.¹ Earlier kind releases work too; the native preStop sleep action used in Step 7 is GA in Kubernetes 1.34 and has shipped on by default since 1.30 (beta).²
kubectl matching your cluster (1.35 or 1.36 client is fine against a 1.36 server).
Node.js 24 LTS (24.16.0 at the time of writing) only if you want to run the app outside the container first.³

Check your tools:

kind version          # expect v0.32.0
kubectl version --client
docker info >/dev/null && echo "docker is running"

New to the platform? Our Kubernetes fundamentals guide covers Pods, Deployments, and Services before you dive into the failure modes below.

Step 1 — A Node service that shuts down gracefully

The single most important rule of graceful shutdown: when your process receives SIGTERM, stop accepting new connections, let in-flight requests finish, and then exit. The default Node behavior on SIGTERM is to die immediately, severing every open connection — so you have to handle the signal yourself.

Create src/server.js:

// src/server.js — minimal, zero-dependency HTTP server with graceful shutdown.
// Targets Node 24 LTS. On SIGTERM it stops accepting new connections,
// drains in-flight requests, then exits 0.
import http from 'node:http';

const PORT = Number(process.env.PORT ?? 8080);
const WORK_MS = Number(process.env.WORK_MS ?? 500);
const READY_DELAY_MS = Number(process.env.READY_DELAY_MS ?? 0);
const DRAIN_TIMEOUT_MS = Number(process.env.DRAIN_TIMEOUT_MS ?? 25_000);

const log = (msg, extra = {}) =>
  console.log(JSON.stringify({ ts: new Date().toISOString(), msg, ...extra }));

const server = http.createServer((req, res) => {
  if (req.url === '/healthz') {
    res.writeHead(200, { 'content-type': 'text/plain' });
    res.end('ok\n');
    return;
  }
  // Simulate real work so draining is observable.
  setTimeout(() => {
    res.writeHead(200, { 'content-type': 'text/plain' });
    res.end(`handled by ${process.pid}\n`);
  }, WORK_MS);
});

// Simulate warm-up (config load, DB pool, cache prime) before we start listening.
setTimeout(() => {
  server.listen(PORT, () => log('listening', { port: PORT, pid: process.pid }));
}, READY_DELAY_MS);

let shuttingDown = false;
function shutdown(signal) {
  if (shuttingDown) return;
  shuttingDown = true;
  log('signal received, draining', { signal });

  // Stop accepting new connections; the callback runs once all
  // in-flight requests have completed.
  server.close(() => {
    log('drain complete, exiting', { code: 0 });
    process.exit(0);
  });
  // Close idle keep-alive sockets (server.close() also does this on Node 19+).
  server.closeIdleConnections();

  // Safety net: never hang past the grace period.
  setTimeout(() => {
    log('drain timeout, forcing exit', { code: 1 });
    process.exit(1);
  }, DRAIN_TIMEOUT_MS).unref();
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));

Two details that trip people up. First, server.close() stops accepting new connections and waits for active requests to finish; since Node 19.0.0 it also closes idle keep-alive sockets automatically, so the explicit server.closeIdleConnections() call here mainly documents intent and is harmless on newer runtimes.⁴ Second, the READY_DELAY_MS knob fakes a slow start-up; we will use it to make the readiness-probe failure obvious.

Verify the drain behavior locally before containerizing. Start the server with a long per-request delay, fire a request, then send SIGTERM while it is in flight:

WORK_MS=1500 node src/server.js &     # prints {"msg":"listening",...}
SRV=$!
sleep 1                                # let the server start listening first
curl -s -w 'in-flight: HTTP %{http_code} in %{time_total}s\n' http://localhost:8080/ &
sleep 0.4
kill -TERM $SRV                        # terminate mid-request
wait                                   # let curl and the server finish

Expected output — the in-flight request finishes with 200 even though the server received SIGTERM mid-flight, and the process exits cleanly:

{"ts":"...","msg":"listening","port":8080,"pid":8}
{"ts":"...","msg":"signal received, draining","signal":"SIGTERM"}
in-flight: HTTP 200 in 1.506991s
{"ts":"...","msg":"drain complete, exiting","code":0}

A new request started after SIGTERM is refused (the listener is closed) — exactly what we want. The app is correct. Now watch Kubernetes drop requests anyway.
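The drain contract can also be written down as a tiny model, which makes the ordering easy to reason about. This is a conceptual sketch only: createDrainTracker is a hypothetical helper, not part of Node and not how server.close() is implemented internally.

```javascript
// Conceptual model of "stop accepting, finish in-flight work, then exit".
// createDrainTracker is a hypothetical helper, not a Node.js API.
function createDrainTracker() {
  let inFlight = 0;
  let closing = false;
  let onDrained = null;

  const maybeFinish = () => {
    if (closing && inFlight === 0 && onDrained) {
      const done = onDrained;
      onDrained = null; // fire at most once
      done();
    }
  };

  return {
    // Returns false once closing has begun: new work is refused.
    begin() {
      if (closing) return false;
      inFlight += 1;
      return true;
    },
    // Marks one unit of work as finished and checks whether we are drained.
    end() {
      inFlight -= 1;
      maybeFinish();
    },
    // Stops accepting work; the callback runs once nothing is in flight.
    close(callback) {
      closing = true;
      onDrained = callback;
      maybeFinish();
    },
  };
}

const events = [];
const tracker = createDrainTracker();
tracker.begin();                                   // request A is in flight
tracker.close(() => events.push('drained'));       // "SIGTERM" arrives
events.push(`new request accepted: ${tracker.begin()}`);
tracker.end();                                     // request A finishes
console.log(events); // [ 'new request accepted: false', 'drained' ]
```

The order of the two events mirrors the log lines above: new work is turned away first, and the drain callback fires only after the last in-flight request ends.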

Step 2 — Containerize it and load it into kind

Create a Dockerfile next to src/:

# syntax=docker/dockerfile:1
FROM node:24-slim
ENV NODE_ENV=production
WORKDIR /app
COPY src/ ./src/
USER node
EXPOSE 8080
CMD ["node", "src/server.js"]

Build the image and spin up a cluster. kind runs each node as a container, so you load the image directly into the cluster instead of pushing to a registry:

docker build -t zdt-demo:v1 .
kind create cluster --name zdt
kind load docker-image zdt-demo:v1 --name zdt

kind load docker-image copies your local image into every node so imagePullPolicy: IfNotPresent finds it without a registry. Confirm the cluster is up:

kubectl get nodes   # one control-plane node, STATUS Ready

Step 3 — Deploy the naive way (and watch it drop requests)

Here is a Deployment with none of the zero-downtime machinery — no readiness probe, default rolling strategy, and a 5-second simulated warm-up that mimics a real app loading config and warming a connection pool. Create k8s/deployment-naive.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: zdt-demo:v1
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
          env:
            - name: READY_DELAY_MS
              value: "5000"   # 5s warm-up before the server listens
            - name: WORK_MS
              value: "500"

And a Service to load-balance across the pods (k8s/service.yaml):

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
    - name: http
      port: 80
      targetPort: 8080

Apply both and wait for the pods to settle:

kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/deployment-naive.yaml
kubectl rollout status deployment/web

The problem is invisible at rest — three pods, all serving. It only shows up during a rollout, which is exactly when your users are watching.

Step 4 — Measure the downtime

You cannot fix what you cannot see, so run a continuous load loop from inside the cluster against the Service, then trigger a rollout. This in-cluster busybox pod sends one request after another for 60 seconds and counts successes and failures:

kubectl run probe --image=busybox:1.37 --restart=Never --rm -i -- \
  sh -c 'ok=0; fail=0; end=$(( $(date +%s) + 60 ));
         while [ $(date +%s) -lt $end ]; do
           if wget -q -T 2 -O /dev/null http://web/; then ok=$((ok+1));
           else fail=$((fail+1)); echo "drop at $(date +%T)"; fi
         done; echo "RESULT ok=$ok fail=$fail"'

While that loop runs, open a second terminal and force a rollout (this re-creates all three pods, just like shipping a new image would):

kubectl rollout restart deployment/web

When the loop finishes you will see a non-zero fail count, plus a drop at ... line for each failed request. Most drops land in the window where the Service routes a request to a brand-new pod whose Node process is still in its 5-second warm-up and is not yet listening on port 8080; the rest come from the shutdown side, which Step 7 covers. Without a readiness probe, Kubernetes considers a container "ready" the moment it starts, so the pod joins the Service endpoints before the app can actually serve.⁵ Hitting the Service directly like this, each failure is a connection refused; with an ingress or load balancer in front, the same gap usually surfaces as a 502. Let's close it.

Step 5 — Readiness probes: only send traffic to ready pods

A readiness probe tells Kubernetes whether a pod should receive traffic. When it fails, the pod is removed from the Service's endpoints but keeps running (no restart). A liveness probe is different: when it fails, the kubelet restarts the container. Confusing the two is a classic way to cause an outage, so keep liveness lenient and let readiness do the traffic gating.⁵

Create the production manifest k8s/deployment.yaml (we will keep adding to it through Step 7):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: zdt-demo:v1
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
          env:
            - name: READY_DELAY_MS
              value: "5000"
            - name: WORK_MS
              value: "500"
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 0
            periodSeconds: 2
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 3

During the 5-second warm-up the server is not listening, so the readiness httpGet to /healthz gets connection-refused and fails. The pod stays out of the Service until the probe succeeds, after which traffic flows. The liveness probe starts later (initialDelaySeconds: 10) and polls slowly, so it never trips during normal warm-up.
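To get a feel for the timing, here is a rough back-of-the-envelope sketch of when the first readiness probe can succeed. It assumes an idealized kubelet that probes exactly at initialDelaySeconds + k × periodSeconds and ignores jitter and probe timeouts; firstReadyAtSeconds is a hypothetical helper name.

```javascript
// Idealized model: probes fire exactly at initialDelaySeconds + k * periodSeconds.
// Real kubelets add jitter, so treat the result as an estimate, not a guarantee.
function firstReadyAtSeconds({ warmupSeconds, initialDelaySeconds, periodSeconds }) {
  if (periodSeconds <= 0) throw new RangeError('periodSeconds must be positive');
  // If the app is already listening at the first probe, that probe succeeds.
  if (warmupSeconds <= initialDelaySeconds) return initialDelaySeconds;
  // Otherwise the first probe at or after the end of warm-up is the one that passes.
  const ticks = Math.ceil((warmupSeconds - initialDelaySeconds) / periodSeconds);
  return initialDelaySeconds + ticks * periodSeconds;
}

// Values from the manifest above: 5s warm-up, initialDelaySeconds 0, periodSeconds 2.
const readyAt = firstReadyAtSeconds({
  warmupSeconds: 5,
  initialDelaySeconds: 0,
  periodSeconds: 2,
});
console.log(readyAt); // 6
```

Under these assumptions the pod joins the Service roughly 6 seconds after the container starts, and the first liveness probe at 10 seconds already finds a listening server.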

Apply it and re-run the load loop from Step 4 during a kubectl rollout restart. The failure count from new-pod warm-up disappears. But there is still a gap on the shutdown side, and a subtler capacity question during the roll. Two more steps.

Step 6 — Roll without dropping capacity (maxUnavailable & maxSurge)

A rolling update replaces pods in batches. Two knobs control it: maxUnavailable (how many pods may be down below the desired count) and maxSurge (how many extra pods may be created above it). The defaults are 25% each.⁶ With 25%, maxUnavailable rounds down and maxSurge rounds up, so the exact behavior depends on your replica count — which is precisely why you should set them explicitly rather than rely on rounding.
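The rounding is easy to check by hand. The sketch below applies the rule just described (maxUnavailable rounds down, maxSurge rounds up) to a few replica counts; resolveRollingUpdate is a hypothetical helper, not a Kubernetes API.

```javascript
// Resolve percentage-based rolling-update knobs to absolute pod counts.
// maxUnavailable rounds down, maxSurge rounds up, as described above.
function resolveRollingUpdate(replicas, maxUnavailablePct = 25, maxSurgePct = 25) {
  return {
    maxUnavailable: Math.floor((replicas * maxUnavailablePct) / 100),
    maxSurge: Math.ceil((replicas * maxSurgePct) / 100),
  };
}

for (const replicas of [3, 4, 10]) {
  console.log(replicas, resolveRollingUpdate(replicas));
}
// 3  -> maxUnavailable 0, maxSurge 1
// 4  -> maxUnavailable 1, maxSurge 1
// 10 -> maxUnavailable 2, maxSurge 3
```

With three replicas the defaults happen to resolve to the safe 0/1 pair, but a fourth replica is enough to allow one pod to be unavailable.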

For zero downtime, set maxUnavailable: 0 so the rollout never drops below your desired ready count, and maxSurge: 1 so it adds a new pod before retiring an old one. Add a strategy block and a minReadySeconds cushion to the top of the spec in k8s/deployment.yaml:

spec:
  replicas: 3
  minReadySeconds: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: web
  # ... template unchanged ...

minReadySeconds: 5 tells the Deployment a new pod must stay ready for 5 seconds before it counts toward availability and the next old pod is removed — a cheap guard against pods that flap ready, then immediately fall over. With maxUnavailable: 0, the only way to roll is to surge a new ready pod first, so the served capacity never dips.
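A toy simulation shows why this pair of settings holds capacity steady. It is a deliberately simplified model, assuming new pods become ready as soon as they are created; the real controller also waits on probes and minReadySeconds. simulateRollout is a hypothetical name.

```javascript
// Simplified rolling-update model: each iteration surges as many new pods as
// maxSurge allows (treated as ready immediately), then retires as many old
// pods as maxUnavailable allows. Tracks the lowest ready count seen.
function simulateRollout({ replicas, maxSurge, maxUnavailable }) {
  if (maxSurge === 0 && maxUnavailable === 0) {
    throw new RangeError('maxSurge and maxUnavailable cannot both be 0');
  }
  let oldPods = replicas;
  let newPods = 0;
  let minReady = replicas;
  let peakPods = replicas;

  while (oldPods > 0 || newPods < replicas) {
    // Scale up: total pods may not exceed replicas + maxSurge.
    const room = Math.max(0, replicas + maxSurge - (oldPods + newPods));
    newPods += Math.min(room, replicas - newPods);
    peakPods = Math.max(peakPods, oldPods + newPods);

    // Scale down: ready pods may not fall below replicas - maxUnavailable.
    const spare = Math.max(0, oldPods + newPods - (replicas - maxUnavailable));
    oldPods -= Math.min(spare, oldPods);
    minReady = Math.min(minReady, oldPods + newPods);
  }
  return { minReady, peakPods };
}

console.log(simulateRollout({ replicas: 3, maxSurge: 1, maxUnavailable: 0 }));
// { minReady: 3, peakPods: 4 }
console.log(simulateRollout({ replicas: 3, maxSurge: 0, maxUnavailable: 1 }));
// { minReady: 2, peakPods: 3 }
```

The first configuration pays for its safety with one extra pod at peak; the second stays within three pods but runs a third short of capacity for most of the roll.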

Step 7 — Close the shutdown race with a preStop hook

Here is the part that is easy to get wrong. Your app already drains on SIGTERM, so why would terminating a pod ever drop a request?

Because two things happen at the same time when a pod is deleted (or replaced by a rollout):

1. The kubelet starts terminating the pod: the terminationGracePeriodSeconds clock starts, the kubelet runs your preStop hook, then sends SIGTERM to the container, and finally sends SIGKILL if the pod is still alive when that single shared budget runs out.⁷
2. The control plane removes the pod from the Service's EndpointSlice, and then kube-proxy and your ingress controller sync that change to their routing tables.

Path 2 is eventually consistent and runs in parallel with path 1. So for a short window, a node can still route a fresh request to a pod that has already received SIGTERM and closed its listener — and the client gets a connection refused.⁸ An app that drains perfectly cannot help here, because the problem is that traffic is still being sent to it.

The fix is a preStop hook that simply sleeps. It delays SIGTERM long enough for endpoint removal to propagate, and because preStop runs to completion before SIGTERM is sent, the container stays alive and serving during the sleep.⁷ Kubernetes 1.34 promoted a native sleep action to GA, so you no longer need a sleep binary in your image — it works even on distroless.²
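The race and its fix fit in a small timeline model. This is a sketch under stated assumptions: the app closes its listener the instant SIGTERM arrives (as server.js does), and propagationMs is a made-up figure for how long routing tables take to stop sending traffic to the pod. Real propagation times vary by cluster, and requestOutcome is a hypothetical helper.

```javascript
// Toy timeline, in milliseconds since the pod was marked for deletion.
// preStopMs:     how long the preStop hook delays SIGTERM (0 = no hook).
// propagationMs: hypothetical time until no node routes to this pod anymore.
function requestOutcome({ arrivalMs, preStopMs, propagationMs }) {
  // Once endpoint removal has propagated, the request never reaches this pod.
  if (arrivalMs >= propagationMs) return 'routed to another pod';
  // Still routed here: the listener stays open until preStop ends and SIGTERM lands.
  return arrivalMs < preStopMs ? 'served' : 'connection refused';
}

// A request arriving 1s after deletion, with 3s of assumed propagation delay:
console.log(requestOutcome({ arrivalMs: 1000, preStopMs: 0, propagationMs: 3000 }));
// connection refused
console.log(requestOutcome({ arrivalMs: 1000, preStopMs: 15000, propagationMs: 3000 }));
// served
console.log(requestOutcome({ arrivalMs: 4000, preStopMs: 15000, propagationMs: 3000 }));
// routed to another pod
```

As long as the sleep is longer than the propagation delay, the "connection refused" branch becomes unreachable, which is the whole point of the hook.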

Add a lifecycle block and a grace period to the container spec in k8s/deployment.yaml:

spec:
  template:
    spec:
      terminationGracePeriodSeconds: 45
      containers:
        - name: web
          # ... image, env, probes unchanged ...
          lifecycle:
            preStop:
              sleep:
                seconds: 15
          resources:
            requests:
              cpu: 25m
              memory: 32Mi
            limits:
              cpu: 100m
              memory: 64Mi

Sizing matters. terminationGracePeriodSeconds covers the whole termination — the preStop hook plus your app's drain. The grace clock starts when termination begins, so it must exceed the preStop sleep plus your longest expected request. Here: 15s sleep + a 0.5s request + headroom, well within 45s. The official guidance is blunt: if your hook takes 55 seconds and the container needs 10 more to stop, a terminationGracePeriodSeconds below 65 will SIGKILL the container before it finishes.⁷
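The budget check is simple arithmetic, sketched below with a hypothetical terminationBudget helper. The first call uses this guide's values and takes the app's 25-second DRAIN_TIMEOUT_MS default as the worst-case drain; the second reproduces the 55 + 10 example from the official guidance against a 60-second grace period.

```javascript
// Does the grace period cover the preStop hook plus the app's drain time?
function terminationBudget({ gracePeriodSeconds, preStopSeconds, drainSeconds }) {
  const needed = preStopSeconds + drainSeconds;
  const slack = gracePeriodSeconds - needed;
  return { needed, slack, fits: slack >= 0 };
}

// This guide: 15s sleep, up to 25s drain (DRAIN_TIMEOUT_MS), 45s grace period.
console.log(terminationBudget({ gracePeriodSeconds: 45, preStopSeconds: 15, drainSeconds: 25 }));
// { needed: 40, slack: 5, fits: true }

// The documentation's cautionary case: 55s hook + 10s stop against 60s.
console.log(terminationBudget({ gracePeriodSeconds: 60, preStopSeconds: 55, drainSeconds: 10 }));
// { needed: 65, slack: -5, fits: false }
```

A negative slack means the container is SIGKILLed with work still in flight, so recheck this sum whenever you raise the sleep or the drain timeout.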

On clusters older than 1.30 (or with the PodLifecycleSleepAction gate disabled), swap the native action for the portable form, which needs a shell and sleep in the image: preStop: { exec: { command: ["/bin/sh","-c","sleep 15"] } }.
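If you generate manifests from code, the choice between the two hook shapes can be made from the server version. The sketch below is one possible approach, assuming the feature gate is left at its default; preStopHook is a hypothetical helper, and the version string would come from wherever you already read cluster metadata.

```javascript
// Pick a preStop hook shape from a Kubernetes version string such as "v1.36.1".
// Assumes the PodLifecycleSleepAction gate is at its default for that release.
function preStopHook(serverVersion, seconds) {
  const match = /^v?(\d+)\.(\d+)/.exec(serverVersion);
  if (!match) throw new Error(`unrecognized version: ${serverVersion}`);
  const major = Number(match[1]);
  const minor = Number(match[2]);
  // The native sleep action is on by default from 1.30 onward.
  const nativeSleep = major > 1 || (major === 1 && minor >= 30);
  return nativeSleep
    ? { sleep: { seconds } }
    : { exec: { command: ['/bin/sh', '-c', `sleep ${seconds}`] } };
}

console.log(JSON.stringify(preStopHook('v1.36.1', 15)));
// {"sleep":{"seconds":15}}
console.log(JSON.stringify(preStopHook('v1.29.4', 15)));
// {"exec":{"command":["/bin/sh","-c","sleep 15"]}}
```

The returned object is the value of lifecycle.preStop; the exec form still requires a shell and a sleep binary in the image.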

Here is the complete k8s/deployment.yaml after Steps 5–7, ready to copy and apply:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  minReadySeconds: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      terminationGracePeriodSeconds: 45
      containers:
        - name: web
          image: zdt-demo:v1
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
          env:
            - name: READY_DELAY_MS
              value: "5000"
            - name: WORK_MS
              value: "500"
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 0
            periodSeconds: 2
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 3
          lifecycle:
            preStop:
              sleep:
                seconds: 15
          resources:
            requests:
              cpu: 25m
              memory: 32Mi
            limits:
              cpu: 100m
              memory: 64Mi

Apply the finished manifest:

kubectl apply -f k8s/deployment.yaml
kubectl rollout status deployment/web

Step 8 — Survive node drains with a PodDisruptionBudget

The rolling-update strategy from Step 6 covers the deployments you trigger. But node drains for upgrades or autoscaling are a separate, voluntary disruption — and a single drain can try to evict several of your pods at once. A PodDisruptionBudget (PDB) caps how many pods of a set can be voluntarily evicted at a time, forcing kubectl drain and the cluster autoscaler to wait rather than take you below a safe count.⁹

Create k8s/pdb.yaml:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web

kubectl apply -f k8s/pdb.yaml
kubectl get pdb web   # ALLOWED DISRUPTIONS should be 1

With three replicas and minAvailable: 2, only one pod can be evicted at a time during a drain. A PDB constrains only voluntary disruptions through the Eviction API: it can't stop involuntary disruptions like a node crash (though those still count against the budget), and Deployment rolling updates are governed by your maxUnavailable/maxSurge strategy, not the PDB.⁹
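The ALLOWED DISRUPTIONS column follows directly from the healthy pod count. A minimal sketch, assuming an integer minAvailable as in the manifest above (allowedDisruptions is a hypothetical helper):

```javascript
// How many pods may be voluntarily evicted right now, given an integer minAvailable.
function allowedDisruptions(healthyPods, minAvailable) {
  return Math.max(0, healthyPods - minAvailable);
}

console.log(allowedDisruptions(3, 2)); // 1
console.log(allowedDisruptions(2, 2)); // 0
```

The second line is the case that matters during a drain: once one pod is already down, further evictions are refused until a replacement is healthy again.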

Verification: prove zero downtime

Run the same load loop from Step 4 one more time, and trigger a rollout in a second terminal:

# terminal 1
kubectl run probe --image=busybox:1.37 --restart=Never --rm -i -- \
  sh -c 'ok=0; fail=0; end=$(( $(date +%s) + 60 ));
         while [ $(date +%s) -lt $end ]; do
           if wget -q -T 2 -O /dev/null http://web/; then ok=$((ok+1));
           else fail=$((fail+1)); echo "drop at $(date +%T)"; fi
         done; echo "RESULT ok=$ok fail=$fail"'

# terminal 2
kubectl rollout restart deployment/web

This time the result should read fail=0. New pods only receive traffic after they warm up (readiness), capacity never dips (maxUnavailable: 0/maxSurge: 1), and terminating pods keep serving until endpoints update (preStop sleep) and then drain in-flight work (graceful SIGTERM). For a higher-fidelity test, swap the sequential busybox loop for a concurrent tool like fortio or hey and watch the response-code histogram stay 100% 200.
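If you want to compare runs or gate a CI job on the outcome, the loop's final line is easy to parse. The sketch below is one way to do it; parseResult is a hypothetical helper and the sample counts are invented for illustration, not measured output.

```javascript
// Parse the "RESULT ok=<n> fail=<n>" line printed by the load loop.
function parseResult(line) {
  const match = /^RESULT ok=(\d+) fail=(\d+)$/.exec(line.trim());
  if (!match) throw new Error(`not a RESULT line: ${line}`);
  const ok = Number(match[1]);
  const fail = Number(match[2]);
  const total = ok + fail;
  return { ok, fail, zeroDowntime: total > 0 && fail === 0 };
}

// Invented sample lines, for illustration only.
console.log(parseResult('RESULT ok=110 fail=0'));
// { ok: 110, fail: 0, zeroDowntime: true }
console.log(parseResult('RESULT ok=95 fail=5'));
// { ok: 95, fail: 5, zeroDowntime: false }
```

Requiring total > 0 guards against a run where the probe pod never reached the Service at all, which would otherwise look like a clean result.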

Two commands worth knowing for any rollout:

kubectl rollout status deployment/web    # watch a roll complete
kubectl rollout undo deployment/web      # roll back to the previous ReplicaSet

When you're done, tear down the cluster: kind delete cluster --name zdt.

Troubleshooting

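Rollout hangs at "waiting for deployment to finish." With maxUnavailable: 0 and maxSurge: 1, a new pod must become ready before the next step. If your readiness probe never passes (wrong path, wrong port, app crash), the rollout stalls by design while the old pods keep serving. After progressDeadlineSeconds (600 by default) the Deployment is reported as having exceeded its progress deadline, but it is not rolled back automatically. Check kubectl describe pod <new-pod> and kubectl logs <new-pod>.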

Requests still drop on shutdown. The most likely cause is a preStop sleep shorter than the time your ingress or kube-proxy needs to stop routing to the pod. Increase the sleep (10–20s is common) and confirm terminationGracePeriodSeconds is larger than the sleep plus your longest request. Also confirm the app actually handles SIGTERM: a process that ignores it is SIGKILLed at the end of the grace period, dropping everything in flight.

error: lifecycle.preStop.sleep ... unknown field. The native sleep action is GA in 1.34 and has been on by default since 1.30 (beta); only clusters older than that — or ones with the feature gate disabled — reject the field. Use the exec/sleep form shown in Step 7, or upgrade.²

Liveness probe restarts pods during load. A liveness probe pointed at a slow or dependency-heavy endpoint will fail under pressure and restart healthy pods, amplifying an incident. Point liveness at a cheap local check (like /healthz here), keep initialDelaySeconds generous, and let readiness — not liveness — handle "temporarily busy."

kind load says the image isn't found. You either built a different tag than the manifest references or loaded into the wrong cluster. Match docker build -t zdt-demo:v1 to the manifest's image: and pass the same --name to kind load.

Next steps & further reading

You now have a Deployment that ships new versions without dropping a single request. From here:

Put the Service behind real ingress with the Kubernetes Gateway API and confirm the same zero-downtime behavior end to end.
Before production, walk the Kubernetes security best practices checklist — non-root containers (already done here via USER node), resource limits, and network policy.
Add progressive delivery (canary or blue-green) on top of this rolling-update baseline once you have metrics to gate on.

¹ kind v0.32.0 release notes (default node image kindest/node:v1.36.1, published 2026-06-02): https://github.com/kubernetes-sigs/kind/releases/tag/v0.32.0
² KEP-3960 "Pod lifecycle sleep action" (status: implemented; stable: v1.34): https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3960-pod-lifecycle-sleep-action/README.md and the v1.33 container-lifecycle update: https://kubernetes.io/blog/2025/05/14/kubernetes-v1-33-updates-to-container-lifecycle/
³ Node.js releases (24 LTS): https://nodejs.org/en/about/previous-releases
⁴ Node.js HTTP server.close() and server.closeIdleConnections(): https://nodejs.org/api/http.html#servercloseidleconnections
⁵ Kubernetes, Configure Liveness, Readiness and Startup Probes: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
⁶ Kubernetes, Deployments (rolling update maxUnavailable/maxSurge defaults are 25%): https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
⁷ Kubernetes, Container Lifecycle Hooks (preStop runs before SIGTERM; grace period covers hook + stop): https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
⁸ Kubernetes, Pod Lifecycle, Termination of Pods (endpoint removal happens alongside the grace period): https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination
⁹ Kubernetes, Specifying a Disruption Budget for your Application: https://kubernetes.io/docs/tasks/run-application/configure-pdb/