Chaos Engineering: From First Experiment to Mature Practice

Why Break Things on Purpose#

Production systems fail in ways that testing environments never reveal. Connection pool exhaustion under load, a cascading timeout across three services, a DNS cache that masks a routing change until it expires – these failures surface only when real conditions collide in ways nobody predicted. Chaos engineering is the discipline of deliberately injecting failures into a system to discover weaknesses before they cause outages.
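
A first experiment can be as small as killing a single pod and watching what happens. A minimal sketch, assuming an application running in a namespace called my-app (dedicated tools like Chaos Mesh or Litmus make this repeatable, but the core idea fits in two commands):

# Pick one pod at random and delete it (the my-app namespace is illustrative)
kubectl get pods -n my-app -o name | shuf -n 1 | xargs kubectl delete -n my-app

# Watch the Deployment replace it; check dashboards and alerts meanwhile
kubectl get pods -n my-app -w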

Choosing a CNI Plugin: Calico vs Cilium vs Flannel vs Cloud-Native CNI

The Container Network Interface (CNI) plugin is one of the most consequential infrastructure decisions in a Kubernetes cluster. It determines how pods get IP addresses, how traffic flows between them, whether network policies are enforced, and what observability you get into network behavior. Changing CNI after deployment is painful – it typically requires draining and rebuilding nodes, or rebuilding the cluster entirely. Choose carefully up front.
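
If you are not sure what an existing cluster runs, the CNI usually shows up as a DaemonSet in the kube-system namespace. A quick check (names and namespaces vary by install method and version):

# CNI agents run as DaemonSets; look for calico-node, cilium, kube-flannel, aws-node
kubectl get daemonsets -A | grep -iE 'calico|cilium|flannel|aws-node'

# On a node itself, the active CNI configuration lives here
ls /etc/cni/net.d/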

Choosing an Autoscaling Strategy: HPA vs VPA vs KEDA vs Karpenter/Cluster Autoscaler

Kubernetes autoscaling operates at two distinct layers: pod-level scaling changes how many pods run or how large they are, while node-level scaling changes how many nodes exist in the cluster to host those pods. Getting the right combination of tools at each layer is the key to a system that responds to demand without wasting resources.

The Two Scaling Layers#

Understanding which layer a tool operates on prevents the most common misconfiguration – expecting pod-level scaling to solve node-level capacity problems, or vice versa.
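
The interaction is easiest to see from the failure mode. A sketch of what each layer observes when demand spikes (namespace and resource names are illustrative):

# Pod layer: HPA (or KEDA) raises the replica count...
kubectl get hpa -n my-app

# ...but if no node has room, the new pods sit Pending:
kubectl get pods -n my-app --field-selector=status.phase=Pending

# Node layer: Cluster Autoscaler or Karpenter provisions nodes in response
# to those Pending pods - no pod-level setting can fix this
kubectl get nodes -w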

Choosing an Ingress Controller: Nginx vs Traefik vs HAProxy vs Cloud ALB/NLB

An Ingress controller is the component that actually routes external traffic into your cluster. The Ingress resource (or Gateway API resource) defines the rules – which hostnames and paths map to which backend Services – but without a controller watching those resources and configuring a reverse proxy, nothing happens. The choice of controller affects performance, configuration ergonomics, TLS management, protocol support, and operational cost.

Unlike CNI plugins, you can run multiple ingress controllers in the same cluster, which is a common pattern for separating internal and external traffic. This reduces the stakes of any single choice, but your primary controller still deserves careful selection.
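
The separation works through IngressClass: each controller reconciles only the Ingresses whose ingressClassName matches the class it registered. A sketch of the external half, with illustrative class name, hostname, and backend Service:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: public-site
spec:
  ingressClassName: external-nginx   # an internal app would use e.g. internal-nginx
  rules:
    - host: www.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80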

Cloud-Native vs Portable Infrastructure: A Decision Framework

Every infrastructure decision sits on a spectrum between portability and fidelity. On one end, you have generic Kubernetes running on minikube or kind – it works everywhere, costs nothing, and captures the behavior of the Kubernetes API itself. On the other end, you have cloud-native managed services – EKS with IRSA and ALB Ingress Controller, GKE with Workload Identity and Cloud Load Balancing, AKS with Azure AD Pod Identity and Azure Load Balancer. These capture the behavior of the actual platform your workloads will run on.
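
The gap between the two ends is concrete. As a sketch of the cloud-native end on EKS, IRSA binds a Kubernetes ServiceAccount to an IAM role through a single annotation – something kind or minikube cannot reproduce (the account ID and role name are illustrative):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: default
  annotations:
    # IRSA: pods using this ServiceAccount receive AWS credentials for this role
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-app-role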

Cluster Autoscaling: HPA, Cluster Autoscaler, and KEDA

Kubernetes autoscaling operates at two levels: pod-level (HPA adds or removes pod replicas) and node-level (Cluster Autoscaler adds or removes nodes). Getting them to work together requires understanding how each makes decisions.

Horizontal Pod Autoscaler (HPA)#

HPA adjusts the replica count of a Deployment, StatefulSet, or ReplicaSet based on observed metrics. The metrics-server must be running in your cluster to supply CPU and memory metrics.
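
A quick way to confirm the prerequisite (the deployment name and namespace assume the standard metrics-server install):

kubectl get deployment metrics-server -n kube-system

# If metrics-server is healthy, this returns live per-pod CPU and memory usage
kubectl top pods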

Basic HPA on CPU#

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

This scales my-app between 2 and 10 replicas, targeting 70% average CPU utilization across all pods. The HPA checks metrics every 15 seconds (default) and computes the desired replica count as:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

For example, 4 replicas averaging 90% CPU against the 70% target yields ceil(4 * 90 / 70) = ceil(5.14) = 6 replicas.

ConfigMaps and Secrets: Configuration Management in Kubernetes

ConfigMaps hold non-sensitive configuration data. Secrets hold sensitive data like passwords, tokens, and TLS certificates. They look similar in structure but differ in handling: Secret values are base64-encoded (an encoding, not encryption), access to Secrets is slightly more restricted by default, and they can be encrypted at rest if the cluster is configured for it.

Creating ConfigMaps#

From a literal value:

kubectl create configmap app-config \
  --from-literal=LOG_LEVEL=info \
  --from-literal=MAX_CONNECTIONS=100

From a file:

kubectl create configmap nginx-config --from-file=nginx.conf

The key name defaults to the filename. Override it with --from-file=custom-key=nginx.conf.
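
To use the ConfigMap, reference it from a pod spec. A sketch that injects every key of the app-config map created above as an environment variable (the container name and image are illustrative):

# Pod spec excerpt
containers:
  - name: app
    image: my-app:1.0
    envFrom:
      - configMapRef:
          name: app-config   # LOG_LEVEL and MAX_CONNECTIONS become env vars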

Converting kubectl Manifests to Helm Charts: Packaging for Reuse

You have a set of YAML files that you kubectl apply to deploy your application. They work, but deploying to a second environment means copying files and editing values by hand. Helm charts solve this by parameterizing your manifests.

Step 1: Scaffold the Chart#

Create the chart structure with helm create:

helm create my-app

This generates:

my-app/
  Chart.yaml           # Chart metadata (name, version, appVersion)
  values.yaml          # Default configuration values
  charts/              # Subcharts / dependencies
  templates/
    deployment.yaml    # Deployment template
    service.yaml       # Service template
    ingress.yaml       # Ingress template
    hpa.yaml           # HorizontalPodAutoscaler
    serviceaccount.yaml
    _helpers.tpl       # Named template helpers
    NOTES.txt          # Post-install message
    tests/
      test-connection.yaml

Delete the generated templates you do not need. Keep _helpers.tpl – it provides essential naming functions.
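
Parameterization then looks like this: templates read from values.yaml through the .Values object. An excerpt in the shape recent helm create scaffolds generate (exact contents vary by Helm version):

# templates/deployment.yaml (excerpt)
containers:
  - name: {{ .Chart.Name }}
    image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"

# values.yaml (excerpt)
image:
  repository: nginx
  tag: ""

Each environment then overrides values at install time, e.g. helm install my-app ./my-app --set image.tag=1.2.3, or with a per-environment values file passed via -f.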

Converting kubectl Manifests to Terraform: From Manual Applies to Infrastructure as Code

You have a working Kubernetes setup built with kubectl apply -f. It works, but there is no state tracking, no dependency graph, and no way to reliably reproduce it. Terraform fixes all three problems.

Step 1: Export Existing Resources#

Start by extracting what you have. For each resource type, export the YAML:

kubectl get deployment,service,configmap,ingress -n my-app -o yaml > exported.yaml

For a single resource, you can get cleaner output by stripping server-populated fields such as status and managedFields – a sketch, assuming the kubectl-neat plugin is installed:
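
kubectl get deployment my-app -n my-app -o yaml | kubectl neat > deployment.yaml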

Deploying Nginx on Kubernetes

Nginx shows up in Kubernetes in two completely different roles. First, as a regular Deployment serving static content or acting as a reverse proxy for your application. Second, as an Ingress controller that watches Ingress resources and dynamically reconfigures itself. These are different deployments with different images and different configuration models. Knowing when to use which saves you from over-engineering or under-engineering your setup.

Nginx as a Web Server (Deployment + Service + ConfigMap)#

For serving static files or acting as a reverse proxy in front of your application pods, deploy nginx as a standard Deployment.
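
A minimal sketch of that shape (image tag, replica count, and labels are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.27
          ports:
            - containerPort: 80

A Service in front of it and a ConfigMap holding nginx.conf complete the trio named in the heading.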