---
title: "Kubernetes Events Debugging: Patterns, Filtering, and Alerting"
description: "Using Kubernetes events for debugging workload issues. Event structure, filtering by reason and type, common event patterns that indicate problems, and event-based alerting with kubewatch and Event Exporter."
url: https://agent-zone.ai/knowledge/kubernetes/kubernetes-events-debugging/
section: knowledge
date: 2026-02-22
categories: ["kubernetes"]
tags: ["events","debugging","monitoring","kubewatch","troubleshooting","observability"]
skills: ["event-analysis","workload-debugging","event-based-alerting","problem-diagnosis"]
tools: ["kubectl","kubewatch","kubernetes-event-exporter"]
levels: ["beginner","intermediate"]
word_count: 1276
formats:
  json: https://agent-zone.ai/knowledge/kubernetes/kubernetes-events-debugging/index.json
  html: https://agent-zone.ai/knowledge/kubernetes/kubernetes-events-debugging/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Kubernetes+Events+Debugging%3A+Patterns%2C+Filtering%2C+and+Alerting
---


# Kubernetes Events Debugging

Kubernetes events are the cluster's built-in audit trail for what is happening to resources. When a pod fails to schedule, a container crashes, a node runs out of disk, or a volume fails to mount, the system records an event. Events are the first place to look when something goes wrong, and learning to read them efficiently separates quick diagnosis from hours of guessing.

## Event Structure

Every Kubernetes event has these fields:

| Field | Description |
|---|---|
| `type` | `Normal` or `Warning`. Normal events are informational. Warning events indicate problems. |
| `reason` | Machine-readable cause: `Scheduled`, `Pulling`, `Started`, `BackOff`, `FailedScheduling`, etc. |
| `message` | Human-readable description of what happened. |
| `involvedObject` | The resource the event is about (Pod, Node, Deployment, PVC, etc.). |
| `source` | The component that generated the event (kubelet, scheduler, controller-manager). |
| `firstTimestamp` | When the event first occurred. |
| `lastTimestamp` | When the event most recently occurred. |
| `count` | How many times the event has been observed. A high count means the problem is repeating. |
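
In raw form (`kubectl get events -o yaml`), a single event carries all of these fields. An abbreviated sketch with illustrative names and timestamps:

```yaml
apiVersion: v1
kind: Event
type: Warning
reason: BackOff
message: Back-off restarting failed container
involvedObject:
  kind: Pod
  name: web-api-7d4f8b6c9-x2k4p
  namespace: production
source:
  component: kubelet
  host: worker-1
firstTimestamp: "2026-02-22T10:01:12Z"
lastTimestamp: "2026-02-22T10:14:37Z"
count: 7
```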

Events are not persisted indefinitely. By default, the API server retains events for one hour (set by its `--event-ttl` flag) and then garbage-collects them. If you need historical events, export them to a logging system.

## Viewing Events

### All Events in a Namespace

```bash
# Default listing (output order is not guaranteed)
kubectl get events -n production

# Sort by last timestamp for chronological order
kubectl get events -n production --sort-by='.lastTimestamp'

# Watch events in real time
kubectl get events -n production --watch
```

### Events for a Specific Resource

```bash
# Events for a specific pod
kubectl describe pod my-pod -n production
# The Events section at the bottom shows all events for this pod

# Events for a specific deployment
kubectl describe deployment web-api -n production

# Events for a specific node
kubectl describe node worker-1
```

### Events Across All Namespaces

```bash
kubectl get events --all-namespaces --sort-by='.lastTimestamp'
```

## Filtering Events

Raw event output is noisy. Filtering is essential for finding the signal.

### Filter by Type (Warning Only)

```bash
# Show only warning events -- these are the ones that indicate problems
kubectl get events -n production --field-selector type=Warning
```

This is the single most useful filter. Normal events tell you things are working. Warning events tell you things are broken.

### Filter by Reason

```bash
# Find all scheduling failures
kubectl get events --all-namespaces --field-selector reason=FailedScheduling

# Find failed operations (reason=Failed covers image pull and container start failures)
kubectl get events --all-namespaces --field-selector reason=Failed

# Find all OOMKilled events (requires searching message text)
kubectl get events --all-namespaces -o json | \
  jq -r '.items[] | select(.message | test("OOM")) |
    "\(.metadata.namespace)/\(.involvedObject.name): \(.message)"'
```

### Filter by Involved Object

```bash
# Events for a specific object type
kubectl get events -n production --field-selector involvedObject.kind=Pod

# Events for a specific named resource
kubectl get events -n production \
  --field-selector involvedObject.name=web-api-7d4f8b6c9-x2k4p

# Events for a specific node
kubectl get events --field-selector involvedObject.kind=Node,involvedObject.name=worker-1
```

### Combined Filters

```bash
# Warning events for pods in production
kubectl get events -n production \
  --field-selector type=Warning,involvedObject.kind=Pod
```

### Custom Output Columns

```bash
# Compact output with the fields that matter
kubectl get events -n production \
  -o custom-columns=TIME:.lastTimestamp,TYPE:.type,REASON:.reason,OBJECT:.involvedObject.name,MESSAGE:.message
```

## Common Event Patterns and What They Mean

### Scheduling Failures

**Event:** `FailedScheduling` with message containing `Insufficient cpu` or `Insufficient memory`

```
Warning  FailedScheduling  pod/web-api-xyz  0/3 nodes are available: 3 Insufficient cpu.
```

**Fix:** Reduce resource requests, add nodes, or check current allocation with `kubectl describe nodes | grep -A 5 "Allocated resources"`.

If the message mentions `node(s) had taint`, all nodes have taints the pod does not tolerate. Add tolerations to the pod spec or untaint the nodes.
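
As a sketch, a toleration matching a hypothetical `dedicated=ingress:NoSchedule` taint would go in the pod spec like this:

```yaml
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "ingress"
      effect: "NoSchedule"
```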

### Image Pull Failures

**Event:** `Failed` with message `Failed to pull image` or `ErrImagePull`

```
Warning  Failed     pod/web-api-xyz  Failed to pull image "registry.example.com/web-api:2.0.0": rpc error: code = NotFound desc = failed to pull and unpack image
Warning  BackOff    pod/web-api-xyz  Back-off pulling image "registry.example.com/web-api:2.0.0"
```

**Cause:** The image does not exist, the tag is wrong, or the node cannot authenticate with the registry.

**Fix:** Verify the image and tag exist. Check image pull secrets:

```bash
# Verify the image exists
docker manifest inspect registry.example.com/web-api:2.0.0

# Check if the pod has an imagePullSecret configured
kubectl get pod web-api-xyz -n production -o jsonpath='{.spec.imagePullSecrets}'

# Verify the secret exists and has valid credentials
kubectl get secret regcred -n production -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
```

### CrashLoopBackOff

**Event:** `BackOff` with message `Back-off restarting failed container`

```
Warning  BackOff  pod/worker-abc  Back-off restarting failed container
```

**Cause:** The container starts, crashes, and Kubernetes restarts it with exponentially increasing delays. The container's own logs explain the actual error.

**Fix:**

```bash
# Check the container's logs from the current (or previous crashed) instance
kubectl logs worker-abc -n production
kubectl logs worker-abc -n production --previous

# Check the exit code
kubectl get pod worker-abc -n production -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
# Exit code 1: application error. Exit code 137: SIGKILL (usually OOMKilled). Exit code 139: segfault.
```
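
The exit-code conventions above are worth committing to memory; as a memory aid, here is a hypothetical helper (not a kubectl feature) that decodes the common codes, using the rule that codes above 128 mean "killed by signal N" where N = code - 128:

```shell
# Hypothetical helper: translate a container exit code into a hint.
# Codes above 128 mean "killed by signal N" where N = code - 128.
decode_exit_code() {
  case "$1" in
    0)   echo "clean exit" ;;
    1)   echo "application error" ;;
    137) echo "SIGKILL (128+9) -- often OOMKilled" ;;
    139) echo "SIGSEGV (128+11) -- segfault" ;;
    143) echo "SIGTERM (128+15) -- graceful shutdown" ;;
    *)
      if [ "$1" -gt 128 ]; then
        echo "killed by signal $(($1 - 128))"
      else
        echo "application-defined exit code"
      fi ;;
  esac
}

decode_exit_code 137   # -> SIGKILL (128+9) -- often OOMKilled
```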

### Volume Mount Failures

**Event:** `FailedMount` -- the PersistentVolume cannot be mounted (attached to another node, storage class unavailable, PVC pending).

```bash
kubectl get pvc -n production
kubectl describe pvc data-db-0 -n production
```

### Probe Failures

**Event:** `Unhealthy` with liveness or readiness probe details. Liveness failures restart the container. Readiness failures remove the pod from service endpoints. Common cause: `initialDelaySeconds` too short.

```bash
kubectl get pod web-api-xyz -n production -o jsonpath='{.spec.containers[0].livenessProbe}'
```
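
If the probe fires before the application finishes booting, raising `initialDelaySeconds` (or adding a `startupProbe`) is the usual fix. An illustrative sketch with assumed path, port, and timing values:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30   # was too short for this app's boot time
  periodSeconds: 10
  failureThreshold: 3
```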

### OOMKilled

Not always a direct event, but visible in pod status. The container exceeded its memory limit.

```bash
kubectl get pods -n production -o json | \
  jq -r '.items[] | .metadata.name as $pod | .status.containerStatuses[]? |
    select(.lastState.terminated.reason == "OOMKilled") |
    "\($pod)/\(.name) restartCount=\(.restartCount)"'
```
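
When a container is repeatedly OOMKilled, the fix is usually to raise its memory limit (or fix the leak). Illustrative values:

```yaml
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"   # raised from a limit the workload kept exceeding
```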

## Node-Level Events

Node events reveal infrastructure issues -- `NodeNotReady`, `EvictionThreshold`, `OOMKilling`, `KernelDeadlock`:

```bash
kubectl describe node worker-1 | tail -30

# Find nodes with pressure conditions
kubectl get nodes -o json | \
  jq -r '.items[] | select(any(.status.conditions[]; .type != "Ready" and .status == "True")) |
    "\(.metadata.name): \([.status.conditions[] | select(.type != "Ready" and .status == "True") | .type])"'
```

## Event-Based Alerting

### Kubernetes Event Exporter

Event Exporter watches all cluster events and forwards them to external sinks (Slack, Elasticsearch, webhooks). Configure it to route warning events to alerting channels and all events to a log store for post-incident analysis:

```yaml
# event-exporter-config.yaml (ConfigMap data)
logLevel: error
route:
  routes:
    - match:
        - receiver: "slack-warnings"
          kind: "Pod|Node|Deployment"
          type: "Warning"
    - match:
        - receiver: "elasticsearch-all"
receivers:
  - name: "slack-warnings"
    webhook:
      endpoint: "https://hooks.slack.com/services/T00/B00/xxx"
      headers:
        Content-Type: application/json
      layout:
        text: "{{ .Type }} {{ .Reason }} in {{ .Namespace }}/{{ .InvolvedObject.Name }}: {{ .Message }}"
  - name: "elasticsearch-all"
    elasticsearch:
      hosts:
        - "http://elasticsearch:9200"
      index: kube-events
      useEventID: true
```

### Kubewatch

Kubewatch is a simpler tool focused on resource state changes. Install via Helm:

```bash
# Assumes a chart repo serving kubewatch has been added under the alias "kubewatch"
helm install kubewatch kubewatch/kubewatch \
  --set rbac.create=true \
  --set slack.enabled=true \
  --set slack.channel="#k8s-alerts" \
  --set slack.token="xoxb-your-token" \
  --set resourcesToWatch.pod=true \
  --set resourcesToWatch.deployment=true \
  --set namespaceToWatch="production"
```

### Prometheus Event Metrics

kube-state-metrics does not export per-event metrics, but several event exporters can expose event counts to Prometheus. Assuming your pipeline exposes a metric like the `kube_event_count` used below (the exact name varies by exporter), you can create alerting rules for recurring problems:

```yaml
groups:
- name: kubernetes-events
  rules:
  - alert: PodSchedulingFailure
    expr: increase(kube_event_count{reason="FailedScheduling",type="Warning"}[15m]) > 5
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Repeated scheduling failures detected"
```

## Debugging Workflow Using Events

When a workload is not behaving as expected, follow this event-driven debugging sequence:

```bash
# 1. Get warning events in the namespace, most recent first
kubectl get events -n production --field-selector type=Warning --sort-by='.lastTimestamp'

# 2. If events point to a specific pod, get its full event history
kubectl describe pod <pod-name> -n production

# 3. If events mention scheduling, check node capacity
kubectl describe nodes | grep -A 5 "Allocated resources"

# 4. If events mention image pull, verify the image
kubectl get pod <pod-name> -n production -o jsonpath='{.spec.containers[*].image}'

# 5. If events mention volume mount, check PVC status
kubectl get pvc -n production

# 6. If events mention probe failures, check application logs
kubectl logs <pod-name> -n production --previous

# 7. If no events exist (event TTL expired), check pod status directly
kubectl get pod <pod-name> -n production -o yaml | grep -A 20 "status:"
```

Events are ephemeral by design. For post-incident analysis, make sure your event exporter or logging pipeline is capturing events before you need them. Discovering that events expired before you could read them is one of the more frustrating Kubernetes debugging experiences.

