---
title: "Upgrading Kubernetes Clusters Safely"
description: "Step-by-step procedures for upgrading Kubernetes clusters across managed services and self-managed environments, including version skew policy, pre-upgrade checks, and rollback strategies."
url: https://agent-zone.ai/knowledge/kubernetes/cluster-upgrades/
section: knowledge
date: 2026-02-22
categories: ["kubernetes"]
tags: ["upgrades","version-skew","kubeadm","eks","aks","gke","cluster-maintenance"]
skills: ["cluster-upgrade-planning","api-deprecation-detection","etcd-backup","node-pool-management"]
tools: ["kubectl","kubeadm","etcdctl","aws","az","gcloud"]
levels: ["intermediate"]
word_count: 817
formats:
  json: https://agent-zone.ai/knowledge/kubernetes/cluster-upgrades/index.json
  html: https://agent-zone.ai/knowledge/kubernetes/cluster-upgrades/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Upgrading+Kubernetes+Clusters+Safely
---


# Upgrading Kubernetes Clusters Safely

Kubernetes releases a new minor version roughly every four months, and only the three most recent minor releases receive security patches -- a cluster that falls out of that window stops getting fixes. Skipping minor versions during a control plane upgrade is not supported: every upgrade must step through each minor version sequentially.
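
Planning a jump of several versions therefore means planning several upgrades. A tiny sketch of the required path (the version numbers and the `upgrade_path` helper are illustrative, not a real tool):

```shell
# Print the minor versions an upgrade must step through, in order.
# Illustration only -- the version numbers are examples.
upgrade_path() {
  seq "$(( $1 + 1 ))" "$2" | sed 's/^/1./'
}

upgrade_path 28 31   # prints 1.29, 1.30 and 1.31: one control plane upgrade each
```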

## Version Skew Policy

The version skew policy defines which component version combinations are supported:

- **kube-apiserver** instances within an HA cluster can differ by at most 1 minor version.
- **kubelet** can be up to 3 minor versions older than kube-apiserver (raised from 2 in Kubernetes 1.28), but never newer.
- **kube-controller-manager** and **kube-scheduler** must not be newer than kube-apiserver and can be up to 1 minor version older.
- **kube-proxy** must not be newer than kube-apiserver and, like kubelet, can be up to 3 minor versions older as of 1.28.
- **kubectl** is supported within 1 minor version (older or newer) of kube-apiserver.

The practical consequence: always upgrade the control plane first, then node pools. Never upgrade nodes past the control plane version.
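
The kubelet skew rule can be sanity-checked before touching anything. A minimal sketch, where `skew_ok` is our own helper (not a kubectl feature) and minor versions are passed as bare numbers:

```shell
# Returns success if a kubelet at minor $2 is within the supported skew of an
# API server at minor $1: never newer, and at most 3 minors older (1.28+ rule).
skew_ok() {
  api=$1; kubelet=$2
  [ "$kubelet" -le "$api" ] && [ $(( api - kubelet )) -le 3 ]
}

# In a live cluster, feed it real minors, e.g. from:
#   kubectl get nodes -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}'
skew_ok 31 29 && echo "within skew"
```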

## Pre-Upgrade Checklist

Run every one of these before starting the upgrade.

### 1. Check API Deprecations

```bash
# Surface deprecated API usage recorded by the API server
kubectl get --raw /metrics | grep apiserver_requested_deprecated_apis

# Scan all resources for deprecated APIs
kubectl api-resources --verbs=list -o name | \
  xargs -I {} kubectl get {} --all-namespaces -o json 2>/dev/null | \
  jq -r '.items[] | select(.apiVersion | test("v1beta1|v1alpha1")) | "\(.apiVersion) \(.kind) \(.metadata.namespace)/\(.metadata.name)"'
```

Tools like `pluto` give cleaner output:

```bash
pluto detect-all-in-cluster --target-versions k8s=v1.31
```

### 2. Review PodDisruptionBudgets

PDBs that are too tight will block node drains during upgrade. Check for PDBs that allow zero disruptions:

```bash
kubectl get pdb --all-namespaces -o json | \
  jq -r '.items[] | select(.status.disruptionsAllowed == 0) | "\(.metadata.namespace)/\(.metadata.name) allowed=\(.status.disruptionsAllowed)"'
```

### 3. Back Up etcd

```bash
ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-pre-upgrade.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

etcdctl snapshot status /tmp/etcd-pre-upgrade.db --write-out=table
```

### 4. Verify Addon Compatibility

Check release notes for your target version. Common breakage points: ingress controllers, CSI drivers, CNI plugins, and cert-manager. Each addon documents which Kubernetes versions it supports.

```bash
# Check current addon versions
kubectl get deploy -n kube-system -o custom-columns=NAME:.metadata.name,IMAGE:.spec.template.spec.containers[*].image
helm list -A
```

## Managed Kubernetes Upgrades

### EKS

EKS upgrades the control plane and node groups separately. The control plane upgrade is non-disruptive to running workloads.

```bash
# Upgrade control plane
aws eks update-cluster-version --name my-cluster --kubernetes-version 1.31

# Wait for completion
aws eks wait cluster-active --name my-cluster

# Upgrade the managed node group (defaults to the control plane's version);
# EKS replaces nodes in a rolling fashion, respecting PodDisruptionBudgets
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name workers
```

For zero-downtime node upgrades, use a blue/green node pool strategy: create a new node group at the target version, shift workloads, delete the old group.
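
A sketch of that rotation with the AWS CLI, wrapped in a function rather than run directly; the cluster name, node group names, subnets, and IAM role ARN are all placeholders you would replace:

```shell
# Blue/green node group rotation for EKS (sketch; names and ARNs are placeholders)
blue_green_rotate() {
  # 1. Create a replacement node group at the target version
  aws eks create-nodegroup \
    --cluster-name my-cluster \
    --nodegroup-name workers-v2 \
    --kubernetes-version 1.31 \
    --subnets subnet-aaa subnet-bbb \
    --node-role arn:aws:iam::123456789012:role/eksNodeRole
  aws eks wait nodegroup-active --cluster-name my-cluster --nodegroup-name workers-v2

  # 2. Cordon and drain the old group so workloads reschedule onto the new nodes
  kubectl cordon -l eks.amazonaws.com/nodegroup=workers
  kubectl drain -l eks.amazonaws.com/nodegroup=workers \
    --ignore-daemonsets --delete-emptydir-data

  # 3. Delete the old group only after the new one is verified
  aws eks delete-nodegroup --cluster-name my-cluster --nodegroup-name workers
}
```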

### GKE

GKE supports surge upgrades natively. Configure surge settings to control how many extra nodes are created during upgrade:

```bash
gcloud container clusters upgrade my-cluster \
  --master --cluster-version 1.31 --zone us-central1-a

# Configure surge settings on the node pool
gcloud container node-pools update workers \
  --cluster my-cluster --zone us-central1-a \
  --max-surge-upgrade 3 --max-unavailable-upgrade 0

# Upgrade the node pool itself
gcloud container clusters upgrade my-cluster \
  --node-pool workers --zone us-central1-a
```

Setting `--max-unavailable-upgrade 0` ensures no capacity loss during the upgrade. GKE creates up to 3 extra nodes at a time, migrates workloads, then removes the old nodes.

### AKS

```bash
# Check available versions
az aks get-upgrades --resource-group myRG --name myCluster -o table

# Upgrade control plane only
az aks upgrade --resource-group myRG --name myCluster \
  --kubernetes-version 1.31 --control-plane-only

# Upgrade node pool
az aks nodepool upgrade --resource-group myRG --cluster-name myCluster \
  --name workers --kubernetes-version 1.31 --max-surge 33%
```

## Self-Managed Upgrades (kubeadm)

### Upgrade Control Plane Nodes

```bash
# On the first control plane node
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.31.0-1.1
apt-mark hold kubeadm

# Check what will change
kubeadm upgrade plan

# Apply the upgrade
kubeadm upgrade apply v1.31.0

# Upgrade kubelet and kubectl
apt-mark unhold kubelet kubectl
apt-get install -y kubelet=1.31.0-1.1 kubectl=1.31.0-1.1
apt-mark hold kubelet kubectl
systemctl daemon-reload && systemctl restart kubelet
```

On additional control plane nodes, use `kubeadm upgrade node` instead of `kubeadm upgrade apply`.

### Upgrade Worker Nodes

Drain, upgrade, uncordon -- one node at a time:

```bash
# From a machine with kubectl access
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# On the worker node
apt-mark unhold kubeadm kubelet
apt-get update && apt-get install -y kubeadm=1.31.0-1.1
kubeadm upgrade node
apt-get install -y kubelet=1.31.0-1.1
apt-mark hold kubeadm kubelet
systemctl daemon-reload && systemctl restart kubelet

# From the kubectl machine
kubectl uncordon node-1
```

## Post-Upgrade Validation

```bash
# Verify all nodes are at the new version and Ready
kubectl get nodes -o wide

# Check system pods are running
kubectl get pods -n kube-system

# Verify API server is serving the expected version
kubectl version

# Run a smoke test deployment
kubectl create deployment smoke-test --image=nginx:alpine --replicas=2
kubectl expose deployment smoke-test --port=80
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- curl -s http://smoke-test
kubectl delete deployment smoke-test && kubectl delete svc smoke-test
```

## Rollback Strategy

Control plane rollbacks on managed services are generally not supported -- the upgrade path is forward. For self-managed clusters, rollback requires restoring from the etcd backup taken pre-upgrade and reinstalling the previous kubelet/kubeadm versions.
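
A rough sketch of that restore on a kubeadm control plane node, assuming the default paths and the snapshot taken in the pre-upgrade step (newer etcd releases move restore functionality to `etcdutl`, so check your version first). The commands are wrapped in a function rather than run directly:

```shell
# Restore the pre-upgrade etcd snapshot on a kubeadm control plane node (sketch).
restore_etcd_snapshot() {
  # Stop the static pods by moving their manifests out of the watched directory
  mv /etc/kubernetes/manifests/kube-apiserver.yaml \
     /etc/kubernetes/manifests/etcd.yaml /tmp/

  # Materialize the snapshot into a fresh data directory
  ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-pre-upgrade.db \
    --data-dir /var/lib/etcd-restored

  # Swap the restored data in, keeping the old directory as a fallback
  mv /var/lib/etcd /var/lib/etcd.bak
  mv /var/lib/etcd-restored /var/lib/etcd

  # Returning the manifests restarts etcd and the API server
  mv /tmp/kube-apiserver.yaml /tmp/etcd.yaml /etc/kubernetes/manifests/
}
```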

The real rollback strategy is preparation: always upgrade non-production first, run the full validation suite, and only then proceed to production. For managed services, the blue/green node pool approach gives you a rollback path -- keep the old node pool until the new one is verified, then delete it.

