---
title: "Cluster Autoscaling: HPA, Cluster Autoscaler, and KEDA"
description: "How to configure pod and node autoscaling in Kubernetes using HPA v2, Cluster Autoscaler, and KEDA for event-driven workloads."
url: https://agent-zone.ai/knowledge/kubernetes/cluster-autoscaling/
section: knowledge
date: 2026-02-22
categories: ["kubernetes"]
tags: ["autoscaling","hpa","cluster-autoscaler","keda","scaling"]
skills: ["autoscaling-configuration","capacity-planning"]
tools: ["kubectl","helm"]
levels: ["intermediate"]
word_count: 964
formats:
  json: https://agent-zone.ai/knowledge/kubernetes/cluster-autoscaling/index.json
  html: https://agent-zone.ai/knowledge/kubernetes/cluster-autoscaling/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Cluster+Autoscaling%3A+HPA%2C+Cluster+Autoscaler%2C+and+KEDA
---


# Cluster Autoscaling

Kubernetes autoscaling operates at two levels: pod-level (HPA adds or removes pod replicas) and node-level (Cluster Autoscaler adds or removes nodes). Getting them to work together requires understanding how each makes decisions.

## Horizontal Pod Autoscaler (HPA)

HPA adjusts the replica count of a Deployment, StatefulSet, or ReplicaSet based on observed metrics. The metrics-server must be running in your cluster for CPU and memory metrics.

### Basic HPA on CPU

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

This scales `my-app` between 2 and 10 replicas, targeting 70% average CPU utilization across all pods. The HPA checks metrics every 15 seconds (default) and computes the desired replica count as:

```
desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue))
```
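As a worked example with hypothetical numbers: 3 replicas at 90% average CPU against a 70% target yields ceil(3 × 90/70) = ceil(3.86) = 4 replicas. A minimal sketch of the formula in Python:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """HPA's core formula: scale proportionally to the metric ratio, rounding up."""
    return math.ceil(current_replicas * (current_metric / target_metric))

# 3 replicas at 90% average CPU, targeting 70% -> scale up to 4
print(desired_replicas(3, 90, 70))   # 4

# 10 replicas at 20% average CPU, targeting 70% -> scale down to 3
print(desired_replicas(10, 20, 70))  # 3
```

The real controller also applies a tolerance (default 10%): if the ratio is close enough to 1.0, it skips scaling to avoid churn on small fluctuations.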

**Your pods must have CPU requests defined.** Without requests, HPA cannot calculate a utilization percentage; it reports `FailedGetResourceMetric` in its events and skips scaling on that metric.
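For reference, the requests live in the Deployment's pod template (a sketch; the container name and values are illustrative):

```yaml
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          resources:
            requests:
              cpu: 250m        # HPA computes utilization against this value
              memory: 256Mi
            limits:
              memory: 512Mi
```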

### Multiple Metrics

HPA v2 supports scaling on multiple metrics simultaneously. The HPA calculates the desired replica count for each metric independently and takes the maximum:

```yaml
spec:
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
```

Custom metrics (like `http_requests_per_second`) require a metrics adapter such as Prometheus Adapter or Datadog Cluster Agent that implements the `custom.metrics.k8s.io` API.
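As a sketch of what exposing such a metric involves, a Prometheus Adapter rule can derive `http_requests_per_second` from a request counter (the series name `http_requests_total` and the 2-minute rate window are assumptions about your instrumentation):

```yaml
# prometheus-adapter rules config (a sketch)
rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```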

### Scaling Behavior and Stabilization

HPA v2 lets you control how fast scaling happens in each direction. This prevents thrashing -- rapid scale-up/scale-down cycles caused by metric fluctuations:

```yaml
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # scale up immediately
      policies:
        - type: Percent
          value: 100                    # can double pod count per period
          periodSeconds: 60
        - type: Pods
          value: 4                      # or add up to 4 pods per period
          periodSeconds: 60
      selectPolicy: Max                 # use whichever policy allows more scaling

    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
      policies:
        - type: Percent
          value: 10                     # remove at most 10% of pods per period
          periodSeconds: 60
      selectPolicy: Min                 # use whichever policy is more conservative
```

The stabilization window looks at desired replica counts over the window duration and picks the highest (for scale-down) or lowest (for scale-up). A 300-second scale-down window means the HPA will not scale down until the desired count has been consistently lower for 5 minutes.

`selectPolicy: Max` means "use the policy that allows the most change" (aggressive). `selectPolicy: Min` means "use the policy that allows the least change" (conservative). `selectPolicy: Disabled` prevents scaling in that direction entirely.
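For example, to freeze automatic scale-down entirely (useful during an incident or load test) while still allowing scale-up, a minimal sketch:

```yaml
spec:
  behavior:
    scaleDown:
      selectPolicy: Disabled   # HPA will never reduce replicas on its own
```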

### Debugging HPA

```bash
# Check current status and events
kubectl describe hpa my-app

# Watch scaling decisions
kubectl get hpa my-app --watch

# Common problems:
# "unable to fetch metrics" -- metrics-server not running or pods have no resource requests
# "failed to get cpu utilization" -- resource requests not set on containers
# ScalingActive condition is False -- check the events for the reason
```

## Cluster Autoscaler

Cluster Autoscaler adjusts the number of nodes in your cluster. It runs as a Deployment inside the cluster and calls your cloud provider's API (the node groups behind EKS, GKE, AKS, and others) to add or remove nodes.

**Scale-up trigger:** A pod is unschedulable because no node has enough resources. The Cluster Autoscaler simulates whether adding a node from any node group would allow the pod to schedule. If yes, it provisions a node.

**Scale-down trigger:** A node's utilization (sum of pod requests / node capacity) drops below a threshold (default 50%) for a sustained period (default 10 minutes). The autoscaler checks if all pods on the node can be moved elsewhere. If yes, it drains and removes the node.
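Both scale-down thresholds are tunable via upstream Cluster Autoscaler flags. A sketch of the relevant container args (the flag names are from the upstream project; exact wiring depends on your install method):

```yaml
# excerpt from a cluster-autoscaler Deployment spec (a sketch)
containers:
  - name: cluster-autoscaler
    command:
      - ./cluster-autoscaler
      - --scale-down-utilization-threshold=0.5   # default: 50%
      - --scale-down-unneeded-time=10m           # default: 10 minutes
      - --scale-down-delay-after-add=10m         # wait after a scale-up before considering scale-down
```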

### Pod Disruption Budgets

PDBs protect workloads during node scale-down. The Cluster Autoscaler respects PDBs -- it will not drain a node if doing so violates a PDB:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2        # or use maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app
```

Without PDBs, the Cluster Autoscaler can drain all replicas of your app simultaneously during scale-down. Always define PDBs for production workloads.

### Scale-Down Blockers

Certain pods prevent node removal:
- Pods with local storage (both `emptyDir` and `hostPath` volumes block eviction by default, controlled by the `--skip-nodes-with-local-storage` flag)
- Pods not managed by a controller (bare pods without a Deployment/ReplicaSet)
- Pods with `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` annotation
- Kube-system pods without a PDB

Annotate pods that are safe to evict even though they have local storage:

```yaml
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
```

## KEDA: Event-Driven Autoscaling

KEDA (Kubernetes Event-Driven Autoscaling) extends HPA with external event sources. It can scale based on queue depth, database connections, cron schedules, Prometheus queries, and dozens of other sources.

Install KEDA:

```bash
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace
```

### Scale on Queue Depth

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
spec:
  scaleTargetRef:
    name: order-processor    # Deployment name
  minReplicaCount: 0         # KEDA can scale to zero (HPA cannot)
  maxReplicaCount: 20
  cooldownPeriod: 300
  triggers:
    - type: rabbitmq
      metadata:
        host: amqp://guest:guest@rabbitmq.default.svc:5672/   # demo credentials; use a TriggerAuthentication in production
        queueName: orders
        queueLength: "5"     # 1 pod per 5 messages in queue
```
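The trigger behaves like an average-value target of 5 messages per pod: KEDA (via the HPA it manages) divides total queue depth by `queueLength`, rounds up, and clamps to the configured replica bounds. A worked sketch with hypothetical numbers:

```python
import math

def keda_desired_replicas(queue_depth: int, per_pod_target: int,
                          min_replicas: int, max_replicas: int) -> int:
    """Approximate replica count for a queue-depth trigger."""
    if queue_depth == 0:
        return min_replicas          # with minReplicaCount: 0, this means scale to zero
    desired = math.ceil(queue_depth / per_pod_target)
    return max(min_replicas, min(desired, max_replicas))

print(keda_desired_replicas(23, 5, 0, 20))   # 5 pods for 23 queued messages
print(keda_desired_replicas(500, 5, 0, 20))  # capped at maxReplicaCount: 20
```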

KEDA also supports Prometheus queries, cron schedules, Kafka consumer lag, AWS SQS, and dozens of other trigger types. Its key advantage over plain HPA is **scale-to-zero**. Standard HPA requires `minReplicas >= 1`. KEDA manages the zero-to-one transition by watching the event source directly. Once the first pod is running, KEDA hands off to HPA for further scaling.
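For example, a cron trigger can pre-scale a Deployment for a known traffic window (the times and timezone are illustrative):

```yaml
triggers:
  - type: cron
    metadata:
      timezone: UTC
      start: "0 8 * * *"       # scale up at 08:00
      end: "0 18 * * *"        # scale back down at 18:00
      desiredReplicas: "10"
```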

## Putting It Together

A production autoscaling setup typically combines all three:

1. **HPA** scales pods based on CPU/memory and custom metrics.
2. **Cluster Autoscaler** adds nodes when pending pods cannot be scheduled.
3. **PDBs** protect workloads during scale-down events.
4. **KEDA** handles event-driven workloads that need scale-to-zero.

The interaction works like this: HPA requests more pods; the new pods go Pending because existing nodes are full; Cluster Autoscaler detects the pending pods and provisions a node; the pods get scheduled. On scale-down, the sequence reverses: HPA reduces replicas, Cluster Autoscaler detects underutilized nodes, respects PDBs, then drains and removes the nodes.

