Cloud Cost Optimization

The Cost Optimization Hierarchy#

Cloud cost optimization follows a hierarchy of impact. Work from the top down – fixing the wrong tier of commitment discount matters far less than shutting down resources nobody uses.

  1. Eliminate waste – turn off unused resources, delete orphaned storage
  2. Right-size – match instance sizes to actual usage
  3. Use commitment discounts – reserved instances, savings plans, CUDs
  4. Shift to spot/preemptible – for fault-tolerant workloads
  5. Optimize storage and network – tiering, transfer patterns, caching
  6. Architect for cost – serverless, auto-scaling, multi-region strategy

Eliminating Waste#

The fastest cost reduction comes from finding resources that serve no purpose. Every cloud provider accumulates these: instances left running after a test, snapshots from decommissioned servers, load balancers with no backends, unattached disks.

Vertical Pod Autoscaler (VPA): Right-Sizing Resource Requests Automatically

Vertical Pod Autoscaler (VPA)#

Horizontal scaling adds more pod replicas. Vertical scaling gives each pod more (or fewer) resources. VPA automates the vertical side by watching actual CPU and memory usage over time and adjusting resource requests to match reality. Without it, teams guess at resource requests during initial deployment and rarely revisit them, leading to either waste (over-provisioned) or instability (under-provisioned).

What VPA Does#

VPA monitors historical and current resource usage for pods in a target Deployment (or StatefulSet, DaemonSet, etc.) and produces recommendations for CPU and memory requests. Depending on the configured mode, it either reports these recommendations passively or actively applies them by evicting and recreating pods with updated requests.