
Zero-Downtime Deployments on Kubernetes

A practical guide to achieving true zero-downtime deployments with Kubernetes, covering rolling updates, health checks, PodDisruptionBudgets, and blue-green strategies.

Deploying without downtime seems simple until you actually try it. There are dozens of subtle ways your users can experience errors during a rollout. Let me walk you through the strategies I use to achieve true zero-downtime deployments.

The Problem

During a typical deployment, there's a window where:

  1. Old pods are terminating (may still receive traffic)
  2. New pods are starting (not yet ready to serve)
  3. Load balancers are updating their backends

If any of these overlap incorrectly, users see 502s, connection resets, or timeout errors.

Rolling Update Strategy

The foundation of zero-downtime deployments in Kubernetes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  labels:
    app: api-server
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Create 1 new pod before killing old ones
      maxUnavailable: 0  # Never have fewer than desired replicas
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: api
          image: registry.example.com/api:v2.1.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "128Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "1000m"

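Key insight: Setting maxUnavailable: 0 means the rollout never takes the number of available pods below the desired replica count. Combined with maxSurge: 1, Kubernetes creates one new pod, waits for it to become Ready, and only then terminates an old one. The cluster therefore needs spare capacity for one extra pod during the rollout.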
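
The arithmetic behind those two settings can be sketched in a few lines. This assumes absolute values rather than percentages, and RolloutBounds is a hypothetical helper, not part of any Kubernetes client library.

```go
package main

import "fmt"

// RolloutBounds returns the pod-count limits a RollingUpdate respects,
// given absolute (non-percentage) maxSurge and maxUnavailable values.
func RolloutBounds(replicas, maxSurge, maxUnavailable int) (maxTotal, minAvailable int) {
	maxTotal = replicas + maxSurge
	minAvailable = replicas - maxUnavailable
	if minAvailable < 0 {
		minAvailable = 0
	}
	return maxTotal, minAvailable
}

func main() {
	// Values from the Deployment above: replicas 4, maxSurge 1, maxUnavailable 0.
	maxTotal, minAvailable := RolloutBounds(4, 1, 0)
	fmt.Printf("at most %d pods, at least %d available\n", maxTotal, minAvailable)
	// prints "at most 5 pods, at least 4 available"
}
```

With these values the rollout proceeds one pod at a time, so a slow-starting image makes the whole rollout proportionally slower.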

Health Checks: The Foundation

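Without proper health checks, Kubernetes can't know when your app is ready to serve traffic. The readiness probe gates traffic, the liveness probe restarts a hung container, and the startup probe holds off the other two until a slow-starting app is up. All three go in the container spec: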

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 2

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 2
  failureThreshold: 30  # Allow up to 60s for startup
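
The "up to 60s" figure comes from multiplying periodSeconds by failureThreshold, and the same arithmetic applies to the other two probes. The sketch below is a rough approximation under the assumption that timeoutSeconds and scheduling jitter can be ignored; Probe and FailureWindow are hypothetical names.

```go
package main

import "fmt"

// Probe holds the two settings that determine how long a run of consecutive
// failures lasts before Kubernetes acts. Values are in seconds.
type Probe struct {
	PeriodSeconds    int
	FailureThreshold int
}

// FailureWindow approximates the seconds of consecutive failures tolerated.
// It ignores timeoutSeconds and jitter, so treat the result as a rough figure.
func (p Probe) FailureWindow() int {
	return p.PeriodSeconds * p.FailureThreshold
}

func main() {
	// Values copied from the probe configuration above.
	startup := Probe{PeriodSeconds: 2, FailureThreshold: 30}
	liveness := Probe{PeriodSeconds: 10, FailureThreshold: 3}
	readiness := Probe{PeriodSeconds: 5, FailureThreshold: 2}

	fmt.Printf("startup budget: %ds\n", startup.FailureWindow())
	fmt.Printf("liveness restart after: %ds\n", liveness.FailureWindow())
	fmt.Printf("readiness removal after: %ds\n", readiness.FailureWindow())
}
```

With this configuration, a pod that stops answering /ready is pulled from the Service after roughly 10 seconds, while a pod that stops answering /healthz is restarted after roughly 30.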

Your readiness endpoint should verify that every dependency the pod needs to serve a request is reachable and that warmup has finished:

func readyHandler(w http.ResponseWriter, r *http.Request) {
    // Check database connectivity
    if err := db.PingContext(r.Context()); err != nil {
        http.Error(w, "db not ready", http.StatusServiceUnavailable)
        return
    }

    // Check cache connectivity  
    if err := redis.Ping(r.Context()).Err(); err != nil {
        http.Error(w, "cache not ready", http.StatusServiceUnavailable)
        return
    }

    // Check that warmup is complete
    if !cache.IsWarmed() {
        http.Error(w, "cache warming", http.StatusServiceUnavailable)
        return
    }

    w.WriteHeader(http.StatusOK)
    w.Write([]byte("ready"))
}

Graceful Shutdown with PreStop Hook

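The most common source of dropped requests is a race during termination: Kubernetes removes the pod from the Service endpoints and begins shutting the pod down at the same time, so the pod can stop accepting connections before kube-proxy and ingress controllers have stopped routing to it. A preStop hook delays the shutdown: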

lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]

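This 10-second sleep gives kube-proxy and ingress controllers time to update their routing tables before the pod receives SIGTERM. The container image must include sh and sleep for the hook to run, and the sleep counts against terminationGracePeriodSeconds.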

In your application code:

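import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    srv := &http.Server{Addr: ":8080", Handler: router}
    go func() { // ErrServerClosed just means Shutdown was called
        if err := srv.ListenAndServe(); err != http.ErrServerClosed {
            log.Fatalf("listen: %v", err)
        }
    }()

    // Wait for SIGTERM
    stop := make(chan os.Signal, 1)
    signal.Notify(stop, syscall.SIGTERM)
    <-stop

    // Finish in-flight requests; 45s + 10s preStop fits the 60s grace period
    ctx, cancel := context.WithTimeout(context.Background(), 45*time.Second)
    defer cancel()
    if err := srv.Shutdown(ctx); err != nil {
        log.Printf("shutdown: %v", err)
    }
}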

PodDisruptionBudgets

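Protect against voluntary disruptions (node drains, cluster upgrades). A PodDisruptionBudget limits evictions; it does not constrain the Deployment's own rolling update, which is governed by maxSurge and maxUnavailable: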

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 3  # Always keep at least 3 pods running
  selector:
    matchLabels:
      app: api-server
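
The budget translates into a simple count of how many pods may be evicted at a given moment. The sketch below shows that arithmetic for a minAvailable budget expressed as an absolute number; AllowedDisruptions is a hypothetical helper, not a Kubernetes API.

```go
package main

import "fmt"

// AllowedDisruptions returns how many pods may be evicted right now without
// dropping below minAvailable. It never returns a negative number.
func AllowedDisruptions(healthy, minAvailable int) int {
	if healthy <= minAvailable {
		return 0
	}
	return healthy - minAvailable
}

func main() {
	// 4 replicas from the Deployment, minAvailable 3 from the budget above.
	fmt.Println(AllowedDisruptions(4, 3)) // prints 1
	// If one pod is already unhealthy, a node drain has to wait.
	fmt.Println(AllowedDisruptions(3, 3)) // prints 0
}
```

Setting minAvailable equal to the replica count would leave zero allowed disruptions and block node drains, so keep at least one pod of headroom.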

The Complete Picture

Deployment Flow

Here's the sequence during a zero-downtime deployment:

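  1. New pod created with updated image
  2. Startup probe begins checking
  3. Once startup succeeds, readiness probe takes over
  4. Pod marked Ready → added to Service endpoints
  5. Traffic starts flowing to new pod
  6. Old pod marked Terminating: its removal from Service endpoints and its preStop hook (sleep 10s) start in parallel
  7. kube-proxy and ingress controllers stop routing to old pod while the hook sleeps
  8. preStop hook finishes → SIGTERM sent to old pod's process
  9. Old pod completes in-flight requests
  10. Old pod terminates (SIGKILL if still running when the 60s grace period expires)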

Conclusion

Zero-downtime deployments aren't a single feature — they're the result of multiple mechanisms working together correctly. Get the health checks right, handle shutdown gracefully, and always test your deployment strategy under load before relying on it in production.
