Kubernetes Troubleshooting Decision Trees: Symptom to Diagnosis to Fix

Kubernetes Troubleshooting Decision Trees#

Troubleshooting Kubernetes in production is about eliminating possibilities in the right order. Every symptom maps to a finite set of causes, and each cause has a specific diagnostic command. The decision trees below encode that mapping. Start at the symptom, follow the branches, run the commands, and the output tells you which branch to take next.

These trees are designed to be followed mechanically. No intuition required – just execute the commands and interpret the results.
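
For example, a pod stuck in CrashLoopBackOff has a short first branch (pod and namespace names here are placeholders):

kubectl describe pod api-server-7d4b9 -n prod     # check Events: OOMKilled? failed probes? image pull errors?
kubectl logs api-server-7d4b9 -n prod --previous  # logs from the crashed container, not the restarted one

If describe shows OOMKilled, the fix is a memory limit adjustment; if the previous logs show an application panic, the fix is in the code. Either way the output picks the next branch for you.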

Load Balancer Patterns: L4 vs L7, Health Checks, Session Affinity, and Cloud LB Selection

L4 vs L7 Load Balancing#

The distinction between Layer 4 and Layer 7 load balancing determines what the load balancer can see and what routing decisions it can make.

Layer 4 (Transport) load balancers work at the TCP/UDP level. They see source/destination IPs and ports but not the content of the traffic, and they forward raw TCP connections to backends. This makes them fast (no protocol parsing), protocol-agnostic (they work for HTTP, gRPC, database connections, and custom protocols), and largely transparent (the backend sees the client's connection essentially unmodified, though whether the original source IP is preserved depends on the load balancer mode). Use L4 for database connections, raw TCP services, and any workload that needs maximum throughput with minimum latency.
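
As a concrete example, on AWS you can request an L4 Network Load Balancer with a Service annotation. The service name and backend below are illustrative:

apiVersion: v1
kind: Service
metadata:
  name: postgres-lb
  annotations:
    # AWS-specific annotation: provision an NLB (L4) rather than the classic ELB
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432

The NLB forwards raw TCP to the pods; it never inspects the Postgres wire protocol.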

Minikube Networking: Services, Ingress, DNS, and LoadBalancer Emulation#

Minikube networking behaves differently from cloud Kubernetes in ways that cause confusion. LoadBalancer services do not get external IPs by default, the minikube IP may or may not be directly reachable from your host depending on the driver, and ingress requires specific addon setup. Understanding these differences prevents hours of debugging connection timeouts to services that are actually running fine.
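
The LoadBalancer case has a one-command workaround: minikube tunnel creates a route from your host into the cluster and assigns external IPs to pending LoadBalancer services. The service name below is a placeholder:

minikube tunnel              # run in a separate terminal; may prompt for sudo
kubectl get svc my-service   # EXTERNAL-IP moves from <pending> to a routable address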

How Minikube Networking Works#

Minikube creates a single node (a VM or container depending on the driver) with its own IP address. Pods inside the cluster get IPs from an internal CIDR. Services get ClusterIPs from another internal range. The bridge between your host machine and the cluster depends entirely on which driver you use.
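
A few commands expose each layer of that picture (service names are placeholders):

minikube ip                          # the node's IP – reachable from the host on some drivers only
kubectl get pods -o wide             # pod IPs from the internal pod CIDR
kubectl get svc                      # ClusterIPs from the internal service range
minikube service my-service --url    # a host-reachable URL for a NodePort or LoadBalancer service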

Minikube Setup, Drivers, and Resource Configuration#

Minikube runs a single-node Kubernetes cluster on your local machine. The difference between a minikube setup that feels like a toy and one that behaves like production comes down to three choices: the driver, the resource allocation, and the Kubernetes version. Get these wrong and you spend more time fighting the tool than using it.
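
All three choices are flags on minikube start. The values below are illustrative, not recommendations:

minikube start \
  --driver=docker \
  --cpus=4 \
  --memory=8192 \
  --kubernetes-version=v1.31.0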

Installation#

On macOS with Homebrew:

brew install minikube

On Linux via direct download:
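
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube

(Substitute minikube-linux-arm64 on ARM hosts.)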

Minikube with Docker Driver on Apple Silicon

Why the Docker Driver on ARM64#

When running Minikube on Apple Silicon (M1/M2/M3/M4), the driver you choose determines whether your containers run natively or through emulation. The Docker driver runs containers directly on the host architecture — ARM64 — with zero emulation overhead.

This matters because QEMU user-mode emulation, which kicks in when you try to run amd64 images on ARM64, cannot reliably execute Go binaries. The specific failure is a crash in lfstack.push, deep in Go’s runtime memory management. This is not a fixable application bug — it is a fundamental incompatibility between QEMU’s user-mode emulation and Go’s lock-free stack implementation.
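
The practical consequence: start with the Docker driver and verify the node really is ARM64 before pulling images.

minikube start --driver=docker
kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.architecture}'   # should print arm64

Any image you run should then publish an arm64 variant; amd64-only images will typically fall back to the QEMU user-mode path described above.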

Multi-Cluster Kubernetes: Architecture, Networking, and Management Patterns

Multi-Cluster Kubernetes#

A single Kubernetes cluster is a single blast radius. A bad deployment, a control plane failure, a misconfigured admission webhook – any of these can take down everything. Multi-cluster is not about complexity for its own sake. It is about isolation, resilience, and operating workloads that span regions, regulations, or teams.

Why Multi-Cluster#

Blast radius isolation. A cluster-wide failure (etcd corruption, bad admission webhook, API server overload) only affects one cluster. Critical workloads in another cluster are untouched.
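
The day-to-day mechanics of that isolation are mundane: each cluster gets its own kubeconfig context, and nothing you run against one context can touch another. Context names here are hypothetical:

kubectl config get-contexts
kubectl config use-context prod-us-east
kubectl --context prod-eu-west get nodes   # query a second cluster without switching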

OAuth2 and OIDC for Infrastructure

OAuth2 vs OIDC: What Actually Matters#

OAuth2 is an authorization framework. It answers the question “what is this client allowed to do?” by issuing access tokens. It does not tell you who the user is. OIDC (OpenID Connect) is a layer on top of OAuth2 that adds authentication. It answers “who is this user?” by adding an ID token – a signed JWT containing user identity claims like email, name, and group memberships.
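
A decoded ID token payload looks something like this – all values are illustrative, and the groups claim in particular depends on what your identity provider is configured to emit:

{
  "iss": "https://idp.example.com",
  "sub": "248289761001",
  "aud": "kubernetes",
  "exp": 1718000000,
  "iat": 1717996400,
  "email": "jane@example.com",
  "groups": ["platform-team", "cluster-admins"]
}

The signature over this payload is what lets infrastructure like the Kubernetes API server trust the claims without calling back to the provider.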

OPA Gatekeeper: Policy as Code for Kubernetes#

Gatekeeper is a Kubernetes-native policy engine built on Open Policy Agent (OPA). It runs as a validating admission webhook and evaluates policies written in Rego against every matching API request. Instead of deploying raw Rego files to an OPA server, Gatekeeper uses Custom Resource Definitions: you define policies as ConstraintTemplates and instantiate them as Constraints. This makes policy management declarative, auditable, and version-controlled.
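
A minimal sketch, modeled on the canonical required-labels example from the Gatekeeper docs: the ConstraintTemplate carries the Rego and a parameters schema, and the Constraint instantiates it against a set of resources.

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        # schema for the parameters each Constraint supplies
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("missing required labels: %v", [missing])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]

With both applied, any new Namespace without an owner label is rejected at admission.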

Pod Affinity and Anti-Affinity: Co-locating and Spreading Workloads

Pod Affinity and Anti-Affinity#

Node affinity controls which nodes a pod can run on. Pod affinity and anti-affinity go further – they control whether a pod should run near or away from other specific pods. This is how you co-locate a frontend with its cache for low latency, or spread database replicas across failure domains for high availability.

Pod Affinity: Schedule Near Other Pods#

Pod affinity tells the scheduler “place this pod in the same topology domain as pods matching a label selector.” The topology domain is defined by topologyKey – it could be the same node, the same zone, or any other node label.
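
A sketch of the co-location case – a frontend pod that must land on the same node as a cache pod (names and labels are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: redis-cache
          # same node; use topology.kubernetes.io/zone for same-zone co-location
          topologyKey: kubernetes.io/hostname
  containers:
    - name: frontend
      image: nginx:1.27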

Pod Security Standards and Admission: Replacing PodSecurityPolicy

Pod Security Standards and Admission#

PodSecurityPolicy (PSP) was removed from Kubernetes in v1.25. Its replacement is Pod Security Admission (PSA), a built-in admission controller that enforces three predefined security profiles. PSA is simpler than PSP – no separate policy objects, no RBAC bindings to manage – but it is also less flexible. You apply security standards to namespaces via labels and the admission controller handles enforcement.
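
Applying a standard is a namespace label away. A namespace that enforces the restricted profile might look like this (the namespace name is a placeholder):

apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted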

The Three Security Standards#

Kubernetes defines three Pod Security Standards, each progressively more restrictive:
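
Privileged. Unrestricted. Allows known privilege escalations; intended for system- and infrastructure-level workloads managed by trusted users.

Baseline. Minimally restrictive. Blocks known privilege escalations – privileged containers, host namespaces, hostPath volumes, added capabilities – while admitting a default pod spec unchanged.

Restricted. Heavily restricted. Follows current pod hardening best practices: containers run as non-root, drop all capabilities, set a RuntimeDefault or Localhost seccomp profile, and disallow privilege escalation.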