Deploying without downtime seems simple until you actually try it. There are dozens of subtle ways your users can experience errors during a rollout. Let me walk you through the strategies I use to achieve true zero-downtime deployments.
The Problem
During a typical deployment, there's a window where:
- Old pods are terminating (may still receive traffic)
- New pods are starting (not yet ready to serve)
- Load balancers are updating their backends
If any of these overlap incorrectly, users see 502s, connection resets, or timeout errors.
Rolling Update Strategy
The foundation of zero-downtime deployments in Kubernetes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  labels:
    app: api-server
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Create 1 new pod before killing old ones
      maxUnavailable: 0  # Never have fewer than desired replicas
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: api
          image: registry.example.com/api:v2.1.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "128Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "1000m"

Key insight: Setting maxUnavailable: 0 ensures you always have at least the desired number of healthy pods. Combined with maxSurge: 1, Kubernetes creates a new pod first, waits for it to be ready, then terminates an old one.
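To make the arithmetic behind those two settings concrete, here is a minimal Go sketch. The function name rolloutBounds is made up for illustration, and it assumes absolute values only (Kubernetes also accepts percentages, which this ignores).

```go
package main

import "fmt"

// rolloutBounds returns the most pods that can exist at once and the fewest
// that are guaranteed available while a rolling update is in progress.
// Hypothetical helper: only absolute maxSurge/maxUnavailable values are
// handled; percentage values are out of scope for this sketch.
func rolloutBounds(replicas, maxSurge, maxUnavailable int) (maxTotal, minAvailable int) {
	maxTotal = replicas + maxSurge
	minAvailable = replicas - maxUnavailable
	if minAvailable < 0 {
		minAvailable = 0
	}
	return maxTotal, minAvailable
}

func main() {
	// Values from the Deployment above: replicas 4, maxSurge 1, maxUnavailable 0.
	maxTotal, minAvailable := rolloutBounds(4, 1, 0)
	fmt.Printf("at most %d pods, at least %d available\n", maxTotal, minAvailable)
	// prints "at most 5 pods, at least 4 available"
}
```

With these values the rollout replaces one pod at a time, so the cluster needs spare capacity for one extra pod's resource requests.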
Health Checks: The Foundation
Without proper health checks, Kubernetes can't know when your app is ready to serve traffic:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 2
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 2
  failureThreshold: 30  # Allow up to 60s for startup

Your readiness endpoint should verify:
func readyHandler(w http.ResponseWriter, r *http.Request) {
	// Check database connectivity
	if err := db.PingContext(r.Context()); err != nil {
		http.Error(w, "db not ready", http.StatusServiceUnavailable)
		return
	}
	// Check cache connectivity
	if err := redis.Ping(r.Context()).Err(); err != nil {
		http.Error(w, "cache not ready", http.StatusServiceUnavailable)
		return
	}
	// Check that warmup is complete
	if !cache.IsWarmed() {
		http.Error(w, "cache warming", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
	w.Write([]byte("ready"))
}

Graceful Shutdown with PreStop Hook
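The most common source of dropped requests is a race during termination: Kubernetes removes the pod from the Service endpoints and begins shutting it down at the same time, so the pod can stop accepting connections before every proxy has stopped routing to it. A preStop hook delays the shutdown: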
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]

This 10-second sleep gives the kube-proxy and ingress controllers time to update their routing tables before the pod starts shutting down.
In your application code:
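// Imports needed: context, errors, log, net/http, os, os/signal, syscall, time
func main() {
	srv := &http.Server{Addr: ":8080", Handler: router}
	go func() {
		// ErrServerClosed is the expected result of a graceful Shutdown
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatalf("listen: %v", err)
		}
	}()
	// Wait for SIGTERM (sent once the preStop hook has finished)
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM)
	<-stop
	// Stop accepting new connections, finish existing ones.
	// 10s preStop sleep + 45s drain stays under the 60s terminationGracePeriodSeconds.
	ctx, cancel := context.WithTimeout(context.Background(), 45*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("graceful shutdown did not complete: %v", err)
	}
}

PodDisruptionBudgets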
Protect against voluntary disruptions (node drains, cluster upgrades):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 3  # Always keep at least 3 pods running
  selector:
    matchLabels:
      app: api-server

The Complete Picture
Here's the sequence during a zero-downtime deployment:
- New pod created with updated image
- Startup probe begins checking
- Once startup succeeds, readiness probe takes over
- Pod marked Ready → added to Service endpoints
- Traffic starts flowing to new pod
- Old pod marked Terminating: its removal from Service endpoints and its preStop hook (sleep 10s) start in parallel
- Proxies and ingress controllers stop routing to the old pod while it sleeps
- preStop hook finishes → SIGTERM sent to old pod's process
- Old pod completes in-flight requests
- Old pod terminates
Conclusion
Zero-downtime deployments aren't a single feature — they're the result of multiple mechanisms working together correctly. Get the health checks right, handle shutdown gracefully, and always test your deployment strategy under load before relying on it in production.
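As a starting point for that kind of testing, here is a small Go prober you could run against a service while a rollout is in progress. It is a sequential checker, not a substitute for a real load test, and the names (countFailures, the probe usage string), the request count, and the interval are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// countFailures calls do n times, pausing interval between calls, and returns
// how many calls errored or returned a 5xx status.
func countFailures(n int, interval time.Duration, do func() (int, error)) int {
	failures := 0
	for i := 0; i < n; i++ {
		status, err := do()
		if err != nil || status >= 500 {
			failures++
		}
		time.Sleep(interval)
	}
	return failures
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: probe <url>")
		return
	}
	client := &http.Client{Timeout: 2 * time.Second}
	get := func() (int, error) {
		resp, err := client.Get(os.Args[1])
		if err != nil {
			return 0, err
		}
		resp.Body.Close()
		return resp.StatusCode, nil
	}
	// 600 requests spaced 100ms apart run for at least one minute.
	failed := countFailures(600, 100*time.Millisecond, get)
	fmt.Printf("%d of 600 requests failed\n", failed)
}
```

Start it, trigger the rollout, and treat any non-zero failure count as a sign that one of the mechanisms above is misconfigured.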