Self-Hosted CI Runners at Scale: GitHub Actions Runner Controller, GitLab Runners on K8s, and Autoscaling

Self-Hosted CI Runners at Scale#

GitHub-hosted and GitLab SaaS runners work until they do not. You hit limits when you need private network access to deploy to internal infrastructure, specific hardware like GPUs or ARM64 machines, compliance requirements that prohibit running code on shared infrastructure, or cost control when you are burning thousands of dollars per month on hosted runner minutes.

Self-hosted runners solve these problems but introduce operational complexity: you now own runner provisioning, scaling, security, image updates, and cost management.

Choosing an Autoscaling Strategy: HPA vs VPA vs KEDA vs Karpenter/Cluster Autoscaler

Choosing an Autoscaling Strategy#

Kubernetes autoscaling operates at two distinct layers: pod-level scaling changes how many pods run or how large they are, while node-level scaling changes how many nodes exist in the cluster to host those pods. Getting the right combination of tools at each layer is the key to a system that responds to demand without wasting resources.

The Two Scaling Layers#

Understanding which layer a tool operates on prevents the most common misconfiguration – expecting pod-level scaling to solve node-level capacity problems, or vice versa.

Zero-Egress Architecture with Cloudflare R2: Eliminating Data Transfer Costs

Zero-Egress Architecture with Cloudflare R2#

Every major cloud provider charges you to download your own data. AWS S3 charges $0.09/GB. Google Cloud Storage charges $0.12/GB. Azure Blob charges $0.087/GB. These egress fees are the most unpredictable line item on cloud bills – they scale with success. The more users download your data, the more you pay.

Cloudflare R2 charges $0 for egress. Zero. Unlimited. Every download is free, whether it is 1 GB or 100 TB. R2 uses the S3-compatible API, so existing tools and SDKs work without changes. This single pricing difference changes how you architect storage, serving, and cross-cloud data flow.

Spot Instances and Preemptible Nodes: Running Kubernetes on Discounted Compute

Spot Instances and Preemptible Nodes#

Spot instances are unused cloud capacity sold at a steep discount – typically 60-90% off on-demand pricing. The trade-off: the cloud provider can reclaim them with minimal notice. AWS gives a 2-minute warning, GCP gives 30 seconds, and Azure varies. Running Kubernetes workloads on spot instances is one of the most effective cost reduction strategies available, but it requires architecture that tolerates sudden node loss.