Choosing an Autoscaling Strategy: HPA vs VPA vs KEDA vs Karpenter/Cluster Autoscaler

Choosing an Autoscaling Strategy#

Kubernetes autoscaling operates at two distinct layers: pod-level scaling changes how many pods run or how large they are, while node-level scaling changes how many nodes exist in the cluster to host those pods. Getting the right combination of tools at each layer is the key to a system that responds to demand without wasting resources.

The Two Scaling Layers#

Understanding which layer a tool operates on prevents the most common misconfiguration – expecting pod-level scaling to solve node-level capacity problems, or vice versa.

Kubernetes Cost Audit and Reduction: A Systematic Operational Plan

Kubernetes Cost Audit and Reduction#

Kubernetes clusters accumulate cost waste silently. Resource requests padded “just in case” during initial deployment never get revisited. Load balancers created for debugging stay running. PVCs from deleted applications persist. Over six months, a cluster originally running at $5,000/month can drift to $12,000 with no corresponding increase in actual workload.

This operational plan works through cost reduction systematically, starting with visibility (you cannot cut what you cannot see), moving through quick wins, then tackling the larger structural optimizations that require data collection and careful rollout.

Kubernetes Cost Optimization: Rightsizing, Resource Efficiency, and Waste Reduction

Kubernetes Cost Optimization#

Most Kubernetes clusters run at 15-30% actual CPU utilization but are billed for the full provisioned capacity. The gap between what you reserve and what you use is pure waste. This article covers the practical workflow for finding and eliminating that waste.

The Cost Problem: Requests vs Actual Usage#

Kubernetes resource requests are the foundation of cost. When a pod requests 4 CPUs, the scheduler reserves 4 CPUs on a node regardless of whether the pod ever uses more than 0.1 CPU. The node is sized (and billed) based on what is reserved, not what is consumed.

Vertical Pod Autoscaler (VPA): Right-Sizing Resource Requests Automatically

Vertical Pod Autoscaler (VPA)#

Horizontal scaling adds more pod replicas. Vertical scaling gives each pod more (or fewer) resources. VPA automates the vertical side by watching actual CPU and memory usage over time and adjusting resource requests to match reality. Without it, teams guess at resource requests during initial deployment and rarely revisit them, leading to either waste (over-provisioned) or instability (under-provisioned).

What VPA Does#

VPA monitors historical and current resource usage for pods in a target Deployment (or StatefulSet, DaemonSet, etc.) and produces recommendations for CPU and memory requests. Depending on the configured mode, it either reports these recommendations passively or actively applies them by evicting and recreating pods with updated requests.