---
title: "Resource Requests and Limits: CPU, Memory, QoS, and OOMKilled Debugging"
description: "How Kubernetes CPU and memory requests and limits work, QoS classes, what happens when you get them wrong, and how to right-size your containers."
url: https://agent-zone.ai/knowledge/kubernetes/resource-requests-limits/
section: knowledge
date: 2026-02-22
categories: ["kubernetes"]
tags: ["resources","cpu","memory","qos","oomkilled","limits","requests"]
skills: ["resource-sizing","oomkilled-debugging","capacity-planning"]
tools: ["kubectl","vertical-pod-autoscaler"]
levels: ["intermediate"]
word_count: 1005
formats:
  json: https://agent-zone.ai/knowledge/kubernetes/resource-requests-limits/index.json
  html: https://agent-zone.ai/knowledge/kubernetes/resource-requests-limits/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Resource+Requests+and+Limits%3A+CPU%2C+Memory%2C+QoS%2C+and+OOMKilled+Debugging
---


# Resource Requests and Limits

Requests and limits control how Kubernetes schedules pods and enforces resource usage. Getting them wrong leads to pods that get evicted, throttle to a crawl, or starve other workloads on the same node.

## Requests vs Limits

**Requests** are what the scheduler uses for placement. When you request 500m CPU and 256Mi memory, Kubernetes only places the pod on a node with that much unallocated capacity. A request is a scheduling guarantee, not a usage cap: the node's allocatable capacity is debited by the request, and the CPU request also sets the container's cgroup weight, so it gets a proportional share when CPU is contended.
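To see what the scheduler is placing against, inspect a node's allocatable capacity and what is already requested on it (the node name here is illustrative):

```shell
# Capacity minus system reservations -- what the scheduler can hand out
kubectl describe node worker-1 | grep -A 6 "Allocatable"

# Sum of requests already placed on this node
kubectl describe node worker-1 | grep -A 8 "Allocated resources"
```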

**Limits** are the ceiling. If your container tries to use more memory than its limit, it gets OOMKilled. If it tries to use more CPU than its limit, it gets throttled (not killed).

```yaml
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```

**Units:**
- CPU: `1` = 1 vCPU/core. `250m` = 0.25 cores. `100m` is a common minimum for lightweight services.
- Memory: `Mi` (mebibytes) and `Gi` (gibibytes). Do not use `M` and `G` (decimal) unless you mean it -- `128Mi` is ~134 MB, while `128M` is exactly 128 MB.
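The difference between the binary and decimal suffixes is plain shell arithmetic:

```shell
# Mi/Gi are binary suffixes (powers of 1024); M/G are decimal (powers of 1000)
echo $(( 128 * 1024 * 1024 ))   # 128Mi -> 134217728 bytes (~134 MB)
echo $(( 128 * 1000 * 1000 ))   # 128M  -> 128000000 bytes (128 MB)
```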

## What Happens Without Them

If you set **no requests and no limits**, the pod is BestEffort QoS. It can use whatever is available on the node, but it is the first to be evicted when the node runs low on resources. In a production cluster, this is a recipe for random evictions.

If you set **requests but no limits**, the container is guaranteed its requested resources but can burst above them. This is often the right choice for CPU -- let it burst when the node has spare cycles. For memory, this is riskier because the container can grow unbounded until the node OOM killer intervenes.

## QoS Classes

Kubernetes assigns a Quality of Service class to every pod based on its resource configuration:

| QoS Class | Condition | Eviction Priority |
|---|---|---|
| **Guaranteed** | Every container has requests = limits for both CPU and memory | Last to be evicted |
| **Burstable** | At least one container has a request or limit set, but they are not all equal | Middle |
| **BestEffort** | No requests or limits on any container | First to be evicted |

```yaml
# Guaranteed: requests == limits
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m
    memory: 512Mi

# Burstable: requests < limits (most common)
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

# BestEffort: nothing set (do not do this in production)
# resources: {}
```

For critical workloads (databases, payment services), use Guaranteed. For general web services, Burstable is usually fine.
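You can check which class Kubernetes assigned to a running pod -- it is recorded in the pod status:

```shell
# Prints Guaranteed, Burstable, or BestEffort
kubectl get pod web-api-6d4f8b7c9-x2k4m -o jsonpath='{.status.qosClass}'
```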

## CPU Throttling

When a container hits its CPU limit, the kernel throttles it using CFS (Completely Fair Scheduler) bandwidth control. The container does not get killed -- it just runs slower. This manifests as increased request latency with no obvious cause in application logs.

Detect throttling by checking the container's cgroup metrics:

```bash
# Exec into the pod and check throttling stats (cgroup v2 path)
kubectl exec web-api-6d4f8b7c9-x2k4m -- cat /sys/fs/cgroup/cpu.stat
# Look for: nr_throttled and throttled_usec
# (cgroup v1 nodes use /sys/fs/cgroup/cpu/cpu.stat, with throttled_time instead)
```

A growing `nr_throttled` count means your CPU limit is too low. **Recommendation:** For most workloads, do not set a CPU limit at all. Set a CPU request (which guarantees scheduling) and let the container burst when the node has capacity. CPU limits cause more problems than they solve unless you need strict multi-tenant isolation.

```yaml
# Recommended for most services
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    # No CPU limit -- let it burst
    memory: 512Mi
```

## OOMKilled Debugging

When a container exceeds its memory limit, the kernel OOM killer terminates it. The pod shows `OOMKilled` in its status.

```bash
# Check for OOMKilled
kubectl get pod web-api-6d4f8b7c9-x2k4m -o jsonpath='{.status.containerStatuses[0].lastState}'

# See the exit code (137 = OOMKilled / SIGKILL)
kubectl describe pod web-api-6d4f8b7c9-x2k4m | grep -A5 "Last State"

# Check current memory usage
kubectl top pod web-api-6d4f8b7c9-x2k4m
```

Common causes of OOMKilled:
1. **Memory limit too low** -- the application legitimately needs more memory. Increase the limit.
2. **Memory leak** -- the application grows over time. The fix is in the application, not the limit.
3. **JVM/runtime overhead** -- your app uses 200Mi but the JVM overhead pushes total container memory past the limit. Account for runtime overhead in your limit. For Java: `-XX:MaxRAMPercentage=75.0` keeps the heap at 75% of the container limit.
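For the JVM case, the flag can be wired in through the container spec. A sketch (the image name and sizes are illustrative; `JAVA_TOOL_OPTIONS` is the standard JVM environment hook):

```yaml
containers:
- name: web-api
  image: registry.example.com/web-api:1.0  # illustrative
  env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=75.0"  # heap capped at ~384Mi of the 512Mi limit
  resources:
    requests:
      memory: 512Mi
    limits:
      memory: 512Mi  # the remaining ~25% absorbs metaspace, threads, direct buffers
```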

**Node-level OOM** is different from container OOM. When a node runs low on memory, the kubelet evicts pods that are using more than their requests first -- which means BestEffort pods (which have no requests) go first, then Burstable pods above their requests, while Guaranteed pods and Burstable pods within their requests are evicted last. This is why QoS class matters.

## LimitRanges and ResourceQuotas

**LimitRange** sets default and maximum resource values per container in a namespace. Use it to prevent anyone from deploying pods without resource requests:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: 250m
      memory: 256Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    max:
      cpu: "2"
      memory: 2Gi
    min:
      cpu: 50m
      memory: 64Mi
```

**ResourceQuota** caps the total resources consumed by all pods in a namespace:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
```

When a ResourceQuota is active, **every new pod must specify requests and limits** for each resource the quota covers. If the quota constrains memory and a pod omits a memory request, the API server rejects the pod. This is why a LimitRange with defaults should always accompany a ResourceQuota.
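Once the quota is in place, you can watch consumption against the hard caps:

```shell
# The status section shows used vs hard for each covered resource
kubectl describe resourcequota compute-quota -n production
```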

## Vertical Pod Autoscaler

The Vertical Pod Autoscaler (VPA) monitors actual resource usage and recommends (or automatically applies) better request/limit values. Run it in recommendation mode first:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Off"  # "Off" = recommend only, "Auto" = apply changes
```

Check recommendations with:

```bash
kubectl describe vpa web-api-vpa
```

VPA and HPA (Horizontal Pod Autoscaler) can conflict if both try to scale based on CPU. If you use both, have HPA scale on custom metrics and VPA handle resource sizing.

## Practical Recommendations

- Always set memory limits. An unconstrained container can take down the entire node.
- Consider omitting CPU limits to avoid throttling. Use CPU requests to guarantee scheduling.
- Start with generous limits and tighten based on VPA recommendations or `kubectl top pod` data over a week.
- Set LimitRange defaults on every namespace so no pod runs without resources defined.
- For production databases: Guaranteed QoS with requests equal to limits.

