Advanced Kubernetes Debugging
Every Kubernetes failure follows a pattern, and every pattern has a diagnostic sequence. This guide covers the most common failure modes you will encounter in production, with the exact commands and thought process to move from symptom to resolution.
Systematic Debugging Methodology
Before diving into specific scenarios, internalize this sequence. It applies to nearly every pod issue:
# Step 1: What state is the pod in?
kubectl get pod <pod> -n <ns> -o wide
# Step 2: What does the full pod spec and event history show?
kubectl describe pod <pod> -n <ns>
# Step 3: What did the application log before it failed?
# --previous reads the last terminated instance; drop it if the container has not restarted.
kubectl logs <pod> -n <ns> --previous --all-containers
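# Step 4: Can you get inside the container?
# Images without a shell (distroless, scratch) need an ephemeral debug container instead, e.g. via kubectl debug.
kubectl exec -it <pod> -n <ns> -- /bin/sh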
# Step 5: Is the node healthy?
kubectl describe node <node-name>
# kubectl top needs the Metrics API (typically metrics-server) running in the cluster.
kubectl top node <node-name>
Each failure mode below follows this pattern, with specific things to look for at each step.
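The sequence can also be wrapped in a small shell function so that the non-interactive steps run in one pass. This is a minimal sketch, not an established tool: the function name `triage` and the `KUBECTL` override variable are hypothetical, and only the kubectl commands themselves come from the steps above. Step 4 is interactive, so the sketch prints that command rather than running it.

```shell
#!/bin/sh
# Sketch only: `triage` and KUBECTL are hypothetical names, not part of kubectl.
# Set KUBECTL="echo kubectl" to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

triage() {
  pod="$1"; ns="$2"
  if [ -z "$pod" ] || [ -z "$ns" ]; then
    echo "usage: triage <pod> <ns> [node]" >&2
    return 2
  fi

  echo "== Step 1: pod state =="
  $KUBECTL get pod "$pod" -n "$ns" -o wide

  echo "== Step 2: spec and events =="
  $KUBECTL describe pod "$pod" -n "$ns"

  echo "== Step 3: logs =="
  # --previous fails when no earlier container instance exists,
  # so fall back to the current instance's logs.
  $KUBECTL logs "$pod" -n "$ns" --previous --all-containers ||
    $KUBECTL logs "$pod" -n "$ns" --all-containers

  echo "== Step 4: interactive, run by hand =="
  echo "kubectl exec -it $pod -n $ns -- /bin/sh"

  echo "== Step 5: node health =="
  # Use the third argument if given, otherwise ask the API for the node name.
  node="${3:-$($KUBECTL get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}')}"
  if [ -n "$node" ]; then
    $KUBECTL describe node "$node"
    # top needs the Metrics API; keep going if it is not available.
    $KUBECTL top node "$node" || echo "node metrics unavailable" >&2
  else
    echo "pod is not scheduled to a node yet" >&2
  fi
}
```

After sourcing the file, `triage <pod> <ns>` runs steps 1, 2, 3, and 5 against the cluster. Running it once with `KUBECTL="echo kubectl"` is a cheap way to check which commands it would issue before pointing it at production.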