Choosing an Ingress Controller: Nginx vs Traefik vs HAProxy vs Cloud ALB/NLB

Choosing an Ingress Controller

An Ingress controller is the component that actually routes external traffic into your cluster. The Ingress resource (or Gateway API resource) defines the rules – which hostnames and paths map to which backend Services – but without a controller watching those resources and configuring a reverse proxy, nothing happens. The choice of controller affects performance, configuration ergonomics, TLS management, protocol support, and operational cost.

Unlike CNI plugins, ingress controllers are not mutually exclusive: you can run several in the same cluster, and doing so is a common pattern for separating internal and external traffic. This lowers the stakes of any single choice, but your primary controller still deserves careful selection.

Choosing Kubernetes Workload Types: Deployment vs StatefulSet vs DaemonSet vs Job

Choosing Kubernetes Workload Types

Kubernetes provides several workload controllers, each designed for a specific class of application behavior. Choosing the wrong one leads to data loss, unnecessary complexity, or workloads that fight the platform instead of leveraging it. This guide walks through the decision criteria and tradeoffs for each type.

The Workload Types at a Glance

| Workload Type | Lifecycle | Pod Identity | Scaling Model | Storage Model | Typical Use |
|---|---|---|---|---|---|
| Deployment | Long-running | Interchangeable | Horizontal replicas | Shared or none | Web servers, APIs, stateless microservices |
| StatefulSet | Long-running | Stable, ordered | Ordered horizontal | Per-pod persistent | Databases, message queues, distributed consensus |
| DaemonSet | Long-running | One per node | Tied to node count | Node-local | Log collectors, monitoring agents, network plugins |
| Job | Run to completion | Disposable | Parallel completions | Ephemeral | Batch processing, migrations, one-time tasks |
| CronJob | Scheduled | Disposable | Per-schedule run | Ephemeral | Periodic backups, cleanup, scheduled reports |
| ReplicaSet | Long-running | Interchangeable | Horizontal replicas | Shared or none | Almost never used directly |

Decision Criteria

The choice comes down to four questions:

Cloud Behavioral Divergence Guide: Where AWS, Azure, and GCP Actually Differ

Cloud Behavioral Divergence Guide

Running the “same” workload on AWS, Azure, and GCP does not produce the same behavior. The Kubernetes API is portable, application containers are portable, and SQL queries are portable. Everything else – identity, networking, storage, load balancing, DNS, and managed service behavior – diverges in ways that matter for production reliability.

This guide documents the specific divergence points with practical examples. Use it when translating infrastructure from one cloud to another, when debugging behavior that differs between environments, or when assessing migration risk.

Cloud-Native vs Portable Infrastructure: A Decision Framework

Cloud-Native vs Portable Infrastructure

Every infrastructure decision sits on a spectrum between portability and fidelity. On one end, you have generic Kubernetes running on minikube or kind – it works everywhere, costs nothing, and captures the behavior of the Kubernetes API itself. On the other end, you have cloud-native managed services – EKS with IRSA and the AWS Load Balancer Controller (formerly the ALB Ingress Controller), GKE with Workload Identity and Cloud Load Balancing, AKS with Azure AD Pod Identity (since deprecated in favor of Workload Identity) and Azure Load Balancer. These capture the behavior of the actual platform your workloads will run on.

Cluster Autoscaling: HPA, Cluster Autoscaler, and KEDA

Cluster Autoscaling

Kubernetes autoscaling operates at two levels: pod-level (HPA adds or removes pod replicas) and node-level (Cluster Autoscaler adds or removes nodes). Getting them to work together requires understanding how each makes decisions.

Horizontal Pod Autoscaler (HPA)

HPA adjusts the replica count of a Deployment, StatefulSet, or ReplicaSet based on observed metrics. The metrics-server must be running in your cluster for CPU and memory metrics.

Basic HPA on CPU

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

This scales my-app between 2 and 10 replicas, targeting 70% average CPU utilization across all pods. The HPA checks metrics every 15 seconds (default) and computes the desired replica count as:
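The upstream Kubernetes documentation gives the calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The following is a minimal Python sketch of that arithmetic, not the controller's actual code. The function name and parameter names are hypothetical, the clamping mirrors minReplicas and maxReplicas from the manifest above, and the 10% tolerance is assumed to match the documented default.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Illustrative version of the HPA replica calculation (hypothetical helper)."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result is always kept within the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas averaging 90% CPU against a 70% target
print(desired_replicas(4, 90, 70, min_replicas=2, max_replicas=10))
```

With the manifest above, a sustained average well over 70% grows the Deployment toward 10 replicas, while small deviations around the target cause no change.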

CockroachDB Day-2 Operations

Adding and Removing Nodes

Adding a node: start a new cockroach process with --join pointing to existing nodes. CockroachDB automatically rebalances ranges to the new node.

cockroach start --insecure --store=node4-data \
  --advertise-addr=node4:26257 \
  --join=node1:26257,node2:26257,node3:26257

Watch rebalancing in the DB Console under Metrics > Replication, or query directly:

SELECT node_id, range_count, lease_count FROM crdb_internal.kv_store_status;

Decommissioning a node moves all range replicas off before shutdown, preventing under-replication:

cockroach node decommission 4 --insecure --host=node1:26257

# Monitor progress
cockroach node status --insecure --host=node1:26257 --decommission

Do not simply kill a node. Without decommissioning, CockroachDB treats it as a failure and waits 5 minutes before re-replicating. On Kubernetes with the operator, scale by changing spec.nodes in the CrdbCluster resource.

CockroachDB Setup and Architecture

Architecture: What CockroachDB Actually Does Under the Hood

CockroachDB is a distributed SQL database that stores data across multiple nodes while presenting a single logical database to clients. Understanding three concepts is essential before deploying it.

Ranges. All data is stored in key-value pairs, sorted by key. CockroachDB splits this sorted keyspace into contiguous chunks called ranges, each targeting 512 MiB by default. Every SQL table, index, and system table maps to one or more ranges. When a range grows beyond the threshold, it splits automatically.
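To make the idea of contiguous ranges over a sorted keyspace concrete, here is a toy Python sketch. It is not CockroachDB's splitting algorithm: the function name, the greedy packing strategy, and the input shape (key, size-in-bytes pairs) are all assumptions made for illustration. Only the 512 MiB default comes from the text above.

```python
RANGE_MAX_BYTES = 512 * 1024 * 1024  # default range size target cited above


def split_into_ranges(sorted_kv_sizes, max_bytes=RANGE_MAX_BYTES):
    """Group (key, size) pairs, already sorted by key, into contiguous ranges.

    Toy model: start a new range whenever adding the next key would push
    the current range past max_bytes.
    """
    ranges, current, current_bytes = [], [], 0
    for key, size in sorted_kv_sizes:
        if current and current_bytes + size > max_bytes:
            ranges.append(current)
            current, current_bytes = [], 0
        current.append(key)
        current_bytes += size
    if current:
        ranges.append(current)
    return ranges


# A tiny threshold makes the splitting visible.
print(split_into_ranges([("a", 4), ("b", 4), ("c", 4), ("d", 4)], max_bytes=10))
```

The point of the sketch is that each range covers an unbroken span of the sorted keys, so any key can be located by finding the range whose span contains it.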

ConfigMaps and Secrets: Configuration Management in Kubernetes

ConfigMaps and Secrets

ConfigMaps hold non-sensitive configuration data. Secrets hold sensitive data like passwords, tokens, and TLS certificates. They look similar in structure but differ in handling: Secret values are base64-encoded (an encoding, not encryption), stored with slightly restricted access by default, and can be encrypted at rest if the cluster is configured for it.
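Because base64 is a reversible encoding, anyone who can read a Secret object can recover its plaintext. The short Python sketch below shows the round trip using only the standard library. The helper names and the sample value are hypothetical and are not part of any Kubernetes client.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a string the way values appear under a Secret's data field."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding; no key or password is required."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_secret_value("s3cr3t")  # sample value, for illustration only
print(encoded)
print(decode_secret_value(encoded))
```

This is why access control on Secrets and encryption at rest matter: the encoding itself provides no confidentiality.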

Creating ConfigMaps

From a literal value:

kubectl create configmap app-config \
  --from-literal=LOG_LEVEL=info \
  --from-literal=MAX_CONNECTIONS=100

From a file:

kubectl create configmap nginx-config --from-file=nginx.conf

The key name defaults to the filename. Override it with --from-file=custom-key=nginx.conf.

Container Registry Management: Tagging, Signing, and Operations

Container Registry Management

A container registry stores and distributes your images. Getting registry operations right – tagging, access control, garbage collection, signing – prevents a class of problems ranging from “which version is deployed?” to “someone pushed a compromised image.”

Registry Options

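Docker Hub – The default registry. The free tier is rate limited (historically 100 pulls per 6 hours for anonymous users and 200 for authenticated free accounts; the limits have changed over time, so check the current numbers). Free plans are aimed at public images, with private repositories tightly restricted.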

GitHub Container Registry (ghcr.io) – Tight integration with GitHub Actions. Free for public images, included storage for private repos. Authenticate with a GitHub PAT or GITHUB_TOKEN in Actions.

Container Runtime Security Hardening

Why Runtime Security Matters

Container images get scanned for vulnerabilities before deployment. Admission controllers enforce pod security standards at creation time. But neither addresses what happens after the container starts running. Runtime security fills this gap: it detects and prevents malicious behavior inside running containers.

A compromised container with a properly hardened runtime is limited in what damage it can cause. Without runtime hardening, a single container escape can compromise the entire node.