Kubernetes Audit Logging: Policies, Backends, and Threat Detection

Kubernetes Audit Logging

Kubernetes audit logging records every request to the API server: who made the request, what they asked for, and what happened. Without audit logging, you have no visibility into who accessed secrets, who changed RBAC roles, or who exec’d into a production pod. It is the foundation of security monitoring in Kubernetes.

Audit Policy

The audit policy defines which events to record and at what detail level. There are four levels:
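In upstream Kubernetes the four levels are None, Metadata, Request, and RequestResponse. The sketch below writes a small policy file that uses each level once. The file path and the choice of which resources get which level are assumptions for illustration, not a recommended production policy. Rules are evaluated in order and the first match wins, so the specific rules come first and the catch-all comes last.

```shell
# Write a minimal audit policy to a local file (the path is an arbitrary choice).
policy_file="${TMPDIR:-/tmp}/audit-policy.yaml"

cat > "$policy_file" <<'EOF'
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Secrets: record who touched them, but never the request or response body.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  # RBAC writes: keep the full request and response bodies.
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "rbac.authorization.k8s.io"
  # Remaining read-only traffic: do not log.
  - level: None
    verbs: ["get", "list", "watch"]
  # Everything else: metadata plus the request body.
  - level: Request
EOF

echo "wrote $policy_file"
```

The API server reads such a file through its `--audit-policy-file` flag; how that flag is set depends on how the control plane is deployed.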

Kubernetes Audit Logging: Tracking API Activity for Security and Compliance

Audit logging records every request to the Kubernetes API server. Every kubectl command, every controller reconciliation, every kubelet heartbeat, every admission webhook call – all of it can be captured with the requester’s identity, the target resource, the timestamp, and optionally the full request and response bodies. Without audit logging, you have no record of who did what in your cluster. With it, you can trace security incidents, satisfy compliance requirements, and debug access control issues.
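As a rough illustration of "who did what", the sketch below filters a JSON-lines audit log for requests that touched secrets. It assumes the log backend is writing JSON events to a file (the path is whatever `--audit-log-path` points to in your cluster) and that `jq` is installed. The function name `who_touched_secrets` is hypothetical.

```shell
# Print timestamp, user, verb, namespace, and name for every completed
# request against a secret. $1 is the path to a JSON-lines audit log.
who_touched_secrets() {
  jq -r 'select(.objectRef.resource == "secrets" and .stage == "ResponseComplete")
         | [ .requestReceivedTimestamp,
             .user.username,
             .verb,
             (.objectRef.namespace // "-"),
             (.objectRef.name // "-") ]
         | @tsv' "$1"
}
```

A call would look like `who_touched_secrets /var/log/kubernetes/audit.log`, where the path is an assumption. Filtering on the `ResponseComplete` stage avoids counting the same request once per audit stage.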

Security Incident Response for Infrastructure

Incident Response Overview

Security incidents in infrastructure environments follow a predictable lifecycle. The difference between a contained incident and a catastrophic breach is usually preparation and speed of response. This playbook covers the six phases of incident response with specific commands and procedures for Kubernetes and containerized infrastructure.

The phases are sequential but overlap in practice: you may be containing one aspect of an incident while still detecting the full scope.

Advanced Kubernetes Debugging: CrashLoopBackOff, ImagePullBackOff, OOMKilled, and Stuck Pods

Advanced Kubernetes Debugging

Every Kubernetes failure follows a pattern, and every pattern has a diagnostic sequence. This guide covers the most common failure modes you will encounter in production, with the exact commands and thought process to move from symptom to resolution.

Systematic Debugging Methodology

Before diving into specific scenarios, internalize this sequence. It applies to nearly every pod issue:

# Step 1: What state is the pod in?
kubectl get pod <pod> -n <ns> -o wide

# Step 2: What does the full pod spec and event history show?
kubectl describe pod <pod> -n <ns>

# Step 3: What did the application log before it failed?
kubectl logs <pod> -n <ns> --previous --all-containers

# Step 4: Can you get inside the container?
kubectl exec -it <pod> -n <ns> -- /bin/sh

# Step 5: Is the node healthy?
kubectl describe node <node-name>
kubectl top node <node-name>

Each failure mode below follows this pattern, with specific things to look for at each step.
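The non-interactive steps of the sequence above can be bundled into one helper. This is a minimal sketch: `triage_pod` is a hypothetical function name, the namespace defaults to `default` as an assumption, and Step 4 is left out because `kubectl exec -it` needs an interactive terminal.

```shell
# Run Steps 1, 2, 3, and 5 for one pod. Usage: triage_pod <pod> [namespace]
triage_pod() {
  pod="$1"
  ns="${2:-default}"

  echo "== Step 1: pod state =="
  kubectl get pod "$pod" -n "$ns" -o wide

  echo "== Step 2: spec and events =="
  kubectl describe pod "$pod" -n "$ns"

  echo "== Step 3: logs =="
  # --previous fails when no container has restarted yet,
  # so fall back to the current container's logs.
  kubectl logs "$pod" -n "$ns" --previous --all-containers ||
    kubectl logs "$pod" -n "$ns" --all-containers

  echo "== Step 5: node health =="
  # Look up the node the pod is scheduled on; empty if still unscheduled.
  node="$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}')"
  if [ -n "$node" ]; then
    kubectl describe node "$node"
    kubectl top node "$node"   # requires metrics-server in the cluster
  else
    echo "pod is not scheduled to a node yet"
  fi
}
```

For example, `triage_pod web-0 prod` prints all four sections in order, which is convenient for pasting into an incident ticket.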

Kubernetes Troubleshooting Decision Trees: Symptom to Diagnosis to Fix

Kubernetes Troubleshooting Decision Trees

Troubleshooting Kubernetes in production is about eliminating possibilities in the right order. Every symptom maps to a finite set of causes, and each cause has a specific diagnostic command. The decision trees below encode that mapping. Start at the symptom, follow the branches, run the commands, and the output tells you which branch to take next.
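The symptom-to-command mapping can be pictured as a lookup. The sketch below is an illustration of the idea, not the trees themselves: `next_step` is a hypothetical helper, and the branch chosen for each status is an assumption about a sensible first command.

```shell
# Print a suggested first diagnostic command for a pod status.
# Usage: next_step <status> <pod> <namespace>
next_step() {
  status="$1"
  pod="$2"
  ns="$3"
  case "$status" in
    CrashLoopBackOff)
      # The container keeps restarting: read what it logged before dying.
      echo "kubectl logs $pod -n $ns --previous --all-containers" ;;
    ImagePullBackOff|ErrImagePull)
      # The pull error text appears in the pod's events.
      echo "kubectl describe pod $pod -n $ns" ;;
    OOMKilled)
      # Confirm the last termination reason recorded for each container.
      echo "kubectl get pod $pod -n $ns -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'" ;;
    Pending)
      # Scheduling failures are reported as events on the pod.
      echo "kubectl get events -n $ns --field-selector involvedObject.name=$pod" ;;
    *)
      # Unknown symptom: start from the full description.
      echo "kubectl describe pod $pod -n $ns" ;;
  esac
}
```

Calling `next_step CrashLoopBackOff web-0 prod` prints the logs command for that pod; the output of that command then decides the next branch.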

These trees are designed to be followed mechanically. No intuition required – just execute the commands and interpret the results.