Scenario: Debugging Kubernetes Network Connectivity End-to-End

Scenario: Debugging Kubernetes Network Connectivity End-to-End#

The report comes in as it always does: “my application can’t reach another service.” This is one of the most common and most frustrating categories of Kubernetes issues because the networking stack has multiple layers, and the symptom (timeout, connection refused, 502) tells you almost nothing about which layer is broken.

This scenario walks through a systematic diagnostic process, starting from the symptom and narrowing down to the root cause. Follow these steps in order. Each step either identifies the problem or eliminates a layer from the investigation.

Advanced Kubernetes Debugging: CrashLoopBackOff, ImagePullBackOff, OOMKilled, and Stuck Pods

Advanced Kubernetes Debugging#

Every Kubernetes failure follows a pattern, and every pattern has a diagnostic sequence. This guide covers the most common failure modes you will encounter in production, with the exact commands and thought process to move from symptom to resolution.

Systematic Debugging Methodology#

Before diving into specific scenarios, internalize this sequence. It applies to nearly every pod issue:

# Step 1: What state is the pod in?
kubectl get pod <pod> -n <ns> -o wide

# Step 2: What does the full pod spec and event history show?
kubectl describe pod <pod> -n <ns>

# Step 3: What did the application log before it failed?
kubectl logs <pod> -n <ns> --previous --all-containers

# Step 4: Can you get inside the container?
kubectl exec -it <pod> -n <ns> -- /bin/sh

# Step 5: Is the node healthy?
kubectl describe node <node-name>
kubectl top node <node-name>

Each failure mode below follows this pattern, with specific things to look for at each step.