Cluster Autoscaling: HPA, Cluster Autoscaler, and KEDA

Cluster Autoscaling#

Kubernetes autoscaling operates at two levels: pod-level (HPA adds or removes pod replicas) and node-level (Cluster Autoscaler adds or removes nodes). Getting them to work together requires understanding how each makes decisions.

Horizontal Pod Autoscaler (HPA)#

HPA adjusts the replica count of a Deployment, StatefulSet, or ReplicaSet based on observed metrics. The metrics-server must be running in your cluster for CPU and memory metrics to be available.
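
If you are unsure whether metrics-server is installed, a quick check (assuming the standard deployment name in kube-system) is:

kubectl get deployment metrics-server -n kube-system
kubectl top pods

If kubectl top errors out, the HPA has no resource metrics to act on.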

Basic HPA on CPU#

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

This scales my-app between 2 and 10 replicas, targeting 70% average CPU utilization across all pods. The HPA checks metrics every 15 seconds (default) and computes the desired replica count as:
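
desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )

For example, if 4 replicas average 90% CPU against the 70% target, the HPA asks for ceil(4 * 90 / 70) = 6 replicas.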

CockroachDB Day-2 Operations

Adding and Removing Nodes#

Adding a node: start a new cockroach process with --join pointing to existing nodes. CockroachDB automatically rebalances ranges to the new node.

cockroach start --insecure --store=node4-data \
  --advertise-addr=node4:26257 \
  --join=node1:26257,node2:26257,node3:26257

Watch rebalancing in the DB Console under Metrics > Replication, or query directly:

SELECT node_id, range_count, lease_count FROM crdb_internal.kv_store_status;

Decommissioning a node moves all range replicas off before shutdown, preventing under-replication:

cockroach node decommission 4 --insecure --host=node1:26257

# Monitor progress
cockroach node status --insecure --host=node1:26257 --decommission

Do not simply kill a node. Without decommissioning, CockroachDB treats it as a failure and waits 5 minutes before re-replicating. On Kubernetes with the operator, scale by changing spec.nodes in the CrdbCluster resource.
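
As a sketch of the operator path (field names follow the public CrdbCluster API; the cluster name is a placeholder), scaling from three to four nodes is a one-line change:

apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: cockroachdb
spec:
  nodes: 4   # was 3; the operator starts the new pod and the cluster rebalances onto it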

CockroachDB Setup and Architecture

Architecture: What CockroachDB Actually Does Under the Hood#

CockroachDB is a distributed SQL database that stores data across multiple nodes while presenting a single logical database to clients. Understanding three concepts is essential before deploying it.

Ranges. All data is stored in key-value pairs, sorted by key. CockroachDB splits this sorted keyspace into contiguous chunks called ranges, each targeting 512 MiB by default. Every SQL table, index, and system table maps to one or more ranges. When a range grows beyond the threshold, it splits automatically.
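
You can see this mapping directly from SQL. SHOW RANGES lists the ranges backing a table (the table name here is only an example):

SHOW RANGES FROM TABLE users;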

ConfigMaps and Secrets: Configuration Management in Kubernetes

ConfigMaps and Secrets#

ConfigMaps hold non-sensitive configuration data. Secrets hold sensitive data such as passwords, tokens, and TLS certificates. They look similar in structure but differ in handling: Secret values are base64-encoded (an encoding, not encryption), access to them is somewhat more restricted by default, and they can be encrypted at rest if the cluster is configured for it.

Creating ConfigMaps#

From a literal value:

kubectl create configmap app-config \
  --from-literal=LOG_LEVEL=info \
  --from-literal=MAX_CONNECTIONS=100

From a file:

kubectl create configmap nginx-config --from-file=nginx.conf

The key name defaults to the filename. Override it with --from-file=custom-key=nginx.conf.
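
Once created, the keys become environment variables or mounted files in pods. A minimal sketch of loading app-config into a container's environment (pod and image names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: my-app:1.0
      envFrom:
        - configMapRef:
            name: app-config   # exposes LOG_LEVEL and MAX_CONNECTIONS as env vars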

Container Registry Management: Tagging, Signing, and Operations

Container Registry Management#

A container registry stores and distributes your images. Getting registry operations right – tagging, access control, garbage collection, signing – prevents a class of problems ranging from “which version is deployed?” to “someone pushed a compromised image.”

Registry Options#

Docker Hub – The default registry. Free tier has rate limits (100 pulls per 6 hours for anonymous, 200 for authenticated). Public images only on free plans.

GitHub Container Registry (ghcr.io) – Tight integration with GitHub Actions. Free for public images, included storage for private repos. Authenticate with a GitHub PAT or GITHUB_TOKEN in Actions.
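
In practice that authentication is a docker login against ghcr.io. A sketch for a workflow step, assuming the token is exposed to the step as GITHUB_TOKEN and the image path is a placeholder:

# GITHUB_ACTOR is set by Actions; GITHUB_TOKEN must be passed in via the step's env
echo "$GITHUB_TOKEN" | docker login ghcr.io -u "$GITHUB_ACTOR" --password-stdin
docker push ghcr.io/my-org/my-app:1.2.3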

Container Runtime Security Hardening

Why Runtime Security Matters#

Container images get scanned for vulnerabilities before deployment. Admission controllers enforce pod security standards at creation time. But neither addresses what happens after the container starts running. Runtime security fills this gap: it detects and prevents malicious behavior inside running containers.

A compromised container with a properly hardened runtime is limited in what damage it can cause. Without runtime hardening, a single container escape can compromise the entire node.
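
Runtime hardening starts in the pod spec itself. A minimal sketch of a locked-down container using standard securityContext fields (the values shown are a common baseline, not a universal prescription):

apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  containers:
    - name: app
      image: my-app:1.0
      securityContext:
        runAsNonRoot: true
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
        seccompProfile:
          type: RuntimeDefault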

Converting kubectl Manifests to Helm Charts: Packaging for Reuse

Converting kubectl Manifests to Helm Charts#

You have a set of YAML files that you kubectl apply to deploy your application. They work, but deploying to a second environment means copying files and editing values by hand. Helm charts solve this by parameterizing your manifests.

Step 1: Scaffold the Chart#

Create the chart structure with helm create:

helm create my-app

This generates:

my-app/
  Chart.yaml           # Chart metadata (name, version, appVersion)
  values.yaml          # Default configuration values
  charts/              # Subcharts / dependencies
  templates/
    deployment.yaml    # Deployment template
    service.yaml       # Service template
    ingress.yaml       # Ingress template
    hpa.yaml           # HorizontalPodAutoscaler
    serviceaccount.yaml
    _helpers.tpl       # Named template helpers
    NOTES.txt          # Post-install message
    tests/
      test-connection.yaml

Delete the generated templates you do not need. Keep _helpers.tpl – it provides essential naming functions.
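
The parameterization is what makes the chart reusable. A trimmed sketch of what the scaffolded templates/deployment.yaml does with values (close to what helm create generates; your values.yaml keys may differ):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "my-app.fullname" . }}
  labels:
    {{- include "my-app.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      {{- include "my-app.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "my-app.selectorLabels" . | nindent 8 }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          ports:
            - containerPort: {{ .Values.service.port }}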

Converting kubectl Manifests to Terraform: From Manual Applies to Infrastructure as Code

Converting kubectl Manifests to Terraform#

You have a working Kubernetes setup built with kubectl apply -f. It works, but there is no state tracking, no dependency graph, and no way to reliably reproduce it. Terraform fixes all three problems.
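
As a rough sketch of where this ends up (provider configuration plus one managed resource; names, namespace, and kubeconfig path are placeholders):

terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
  }
}

provider "kubernetes" {
  config_path = "~/.kube/config"
}

resource "kubernetes_config_map" "app_config" {
  metadata {
    name      = "app-config"
    namespace = "my-app"
  }
  data = {
    LOG_LEVEL = "info"
  }
}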

Step 1: Export Existing Resources#

Start by extracting what you have. For each resource type, export the YAML:

kubectl get deployment,service,configmap,ingress -n my-app -o yaml > exported.yaml

For a single resource with cleaner output:

Debugging ArgoCD: Diagnosing Sync Failures, Health Checks, RBAC, and Repo Issues

Debugging ArgoCD#

Most ArgoCD problems fall into predictable categories: sync stuck in a bad state, resources showing OutOfSync when they should not be, health checks reporting wrong status, RBAC blocking operations, or repository connections failing. Here is how to diagnose and fix each one.
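
Whatever the category, the same few commands surface most of the evidence (the application name is a placeholder):

argocd app get my-app     # sync status, health, and per-resource conditions
argocd app diff my-app    # what ArgoCD thinks differs between Git and the cluster
kubectl -n argocd get events --sort-by=.lastTimestamp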

Application Stuck in Progressing#

An application stuck in Progressing means ArgoCD is waiting for a resource to become healthy that never does. The most common causes:

Deploying Nginx on Kubernetes

Deploying Nginx on Kubernetes#

Nginx shows up in Kubernetes in two completely different roles. First, as a regular Deployment serving static content or acting as a reverse proxy for your application. Second, as an Ingress controller that watches Ingress resources and dynamically reconfigures itself. These are different deployments with different images and different configuration models. Knowing when to use which saves you from over-engineering or under-engineering your setup.

Nginx as a Web Server (Deployment + Service + ConfigMap)#

For serving static files or acting as a reverse proxy in front of your application pods, deploy nginx as a standard Deployment.
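
A minimal sketch of such a Deployment (name, replica count, and image tag are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.27
          ports:
            - containerPort: 80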