Kubernetes Operator Development: Patterns, Frameworks, and Best Practices

Kubernetes Operator Development

Operators are custom controllers that manage custom resources defined by CRDs. They encode operational knowledge – the kind of tasks a human operator would perform – into software that runs inside the cluster. An operator watches for changes to its custom resources and reconciles the actual state to match the desired state, creating, updating, or deleting child resources as needed.
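As a concrete sketch, here is what the operator's inputs look like: a CRD that registers a new API type, and an instance of that type that the reconcile loop acts on. The `Cache` kind, the `example.com` group, and the `replicas` field are all illustrative, not from any real project.

```yaml
# Hypothetical CRD an operator would register (all names illustrative)
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: caches.example.com        # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: Cache
    plural: caches
    singular: cache
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
---
# An instance of the custom resource. The operator's reconcile loop turns
# this desired state into child resources (e.g. a StatefulSet and a Service).
apiVersion: example.com/v1alpha1
kind: Cache
metadata:
  name: session-cache
spec:
  replicas: 3
```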

Operator Maturity Model

The Operator Framework defines five maturity levels:

| Level | Capability | Example |
|-------|------------|---------|
| 1 | Basic install | Helm operator deploys the application |
| 2 | Seamless upgrades | Operator handles version migrations |
| 3 | Full lifecycle | Backup, restore, failure recovery |
| 4 | Deep insights | Exposes metrics, fires alerts, generates dashboards |
| 5 | Auto-pilot | Auto-scaling, auto-healing, auto-tuning without human input |

Most custom operators target Level 2-3. Levels 4-5 are typically reached by mature projects like the Prometheus Operator or Rook/Ceph.

Kubernetes Resource Management: QoS Classes, Eviction, OOM Scoring, and Capacity Planning

Kubernetes Resource Management Deep Dive

Resource management in Kubernetes is the mechanism that decides which pods get scheduled, which pods get killed when the node runs low, and how much CPU and memory each container is actually allowed to use. The surface-level concept of requests and limits is straightforward. The underlying mechanics – QoS classification, CFS CPU quotas, kernel OOM scoring, kubelet eviction thresholds – are where misconfigurations cause production outages.
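To make the surface-level part concrete: the QoS class falls directly out of how requests and limits are set. These two minimal pod specs (names and values illustrative) show the difference between Guaranteed and Burstable; a pod with no requests or limits at all would be BestEffort.

```yaml
# Guaranteed: every container sets limits equal to requests for CPU and memory
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests: { cpu: 500m, memory: 256Mi }
        limits:   { cpu: 500m, memory: 256Mi }
---
# Burstable: requests are set, but at least one limit differs (or is absent)
apiVersion: v1
kind: Pod
metadata:
  name: burstable-example
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests: { cpu: 250m, memory: 128Mi }
        limits:   { cpu: "1",  memory: 512Mi }
```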

Kubernetes Troubleshooting Decision Trees: Symptom to Diagnosis to Fix

Kubernetes Troubleshooting Decision Trees

Troubleshooting Kubernetes in production is about eliminating possibilities in the right order. Every symptom maps to a finite set of causes, and each cause has a specific diagnostic command. The decision trees below encode that mapping. Start at the symptom, follow the branches, run the commands, and the output tells you which branch to take next.

These trees are designed to be followed mechanically. No intuition required – just execute the commands and interpret the results.

Managed Kubernetes vs Self-Managed: EKS/AKS/GKE vs kubeadm vs k3s vs RKE

Managed Kubernetes vs Self-Managed

The fundamental tradeoff is straightforward: managed Kubernetes trades control for reduced operational burden, while self-managed Kubernetes gives you full control at the cost of owning everything – etcd, certificates, upgrades, high availability, and recovery.

This decision has cascading effects on team structure, hiring, on-call burden, and long-term maintenance cost. Choose deliberately.

Managed Kubernetes (EKS, AKS, GKE)

The cloud provider runs the control plane: API server, etcd, controller manager, scheduler. They handle patching, scaling, and high availability for these components. You manage worker nodes and workloads.

Minikube Networking: Services, Ingress, DNS, and LoadBalancer Emulation

Minikube networking behaves differently from cloud Kubernetes in ways that cause confusion. LoadBalancer services do not get external IPs by default, the minikube IP may or may not be directly reachable from your host depending on the driver, and ingress requires specific addon setup. Understanding these differences prevents hours of debugging connection timeouts to services that are actually running fine.

How Minikube Networking Works

Minikube creates a single node (a VM or container depending on the driver) with its own IP address. Pods inside the cluster get IPs from an internal CIDR. Services get ClusterIPs from another internal range. The bridge between your host machine and the cluster depends entirely on which driver you use.
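For instance, a LoadBalancer Service that would get an external IP in the cloud stays `<pending>` on minikube until you run `minikube tunnel` in a separate terminal (or reach it via `minikube service <name>`, which prints a usable URL). The service below is a minimal illustration; the `app: web` selector and ports are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer   # EXTERNAL-IP shows <pending> until `minikube tunnel` runs
  selector:
    app: web           # illustrative label; must match your pods
  ports:
    - port: 80
      targetPort: 8080
```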

Minikube Setup, Drivers, and Resource Configuration

Minikube runs a single-node Kubernetes cluster on your local machine. The difference between a minikube setup that feels like a toy and one that behaves like production comes down to three choices: the driver, the resource allocation, and the Kubernetes version. Get these wrong and you spend more time fighting the tool than using it.

Installation

On macOS with Homebrew:

brew install minikube

On Linux via direct download:

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube

Multi-Cluster Kubernetes: Architecture, Networking, and Management Patterns

Multi-Cluster Kubernetes

A single Kubernetes cluster is a single blast radius. A bad deployment, a control plane failure, a misconfigured admission webhook – any of these can take down everything. Multi-cluster is not about complexity for its own sake. It is about isolation, resilience, and operating workloads that span regions, regulations, or teams.

Why Multi-Cluster

Blast radius isolation. A cluster-wide failure (etcd corruption, bad admission webhook, API server overload) only affects one cluster. Critical workloads in another cluster are untouched.

OPA Gatekeeper: Policy as Code for Kubernetes

OPA Gatekeeper: Policy as Code for Kubernetes

Gatekeeper is a Kubernetes-native policy engine built on Open Policy Agent (OPA). It runs as a validating admission webhook and evaluates policies written in Rego against every matching API request. Instead of deploying raw Rego files to an OPA server, Gatekeeper uses Custom Resource Definitions: you define policies as ConstraintTemplates and instantiate them as Constraints. This makes policy management declarative, auditable, and version-controlled.
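A sketch of that pattern, based on Gatekeeper's widely used required-labels example: the ConstraintTemplate carries the Rego and defines a new `K8sRequiredLabels` kind, and the Constraint instantiates it with parameters and a match scope. The constraint name and the `owner` label are illustrative.

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels      # the Constraint kind this template creates
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("missing required labels: %v", [missing])
        }
---
# Instantiate the template: every Namespace must carry an `owner` label
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]
```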

Pod Affinity and Anti-Affinity: Co-locating and Spreading Workloads

Pod Affinity and Anti-Affinity

Node affinity controls which nodes a pod can run on. Pod affinity and anti-affinity go further – they control whether a pod should run near or away from other specific pods. This is how you co-locate a frontend with its cache for low latency, or spread database replicas across failure domains for high availability.

Pod Affinity: Schedule Near Other Pods

Pod affinity tells the scheduler “place this pod in the same topology domain as pods matching a label selector.” The topology domain is defined by topologyKey – it could be the same node, the same zone, or any other node label.
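A minimal example of that shape, with illustrative labels: this pod requires scheduling onto a node that already runs a pod labeled `app: redis-cache`. Swap `topologyKey` for `topology.kubernetes.io/zone` to only require the same zone rather than the same node.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: redis-cache               # illustrative target label
          topologyKey: kubernetes.io/hostname  # same node; zone key widens the domain
  containers:
    - name: web
      image: nginx
```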

Pod Security Standards and Admission: Replacing PodSecurityPolicy

Pod Security Standards and Admission

PodSecurityPolicy (PSP) was removed from Kubernetes in v1.25. Its replacement is Pod Security Admission (PSA), a built-in admission controller that enforces three predefined security profiles. PSA is simpler than PSP – no separate policy objects, no RBAC bindings to manage – but it is also less flexible. You apply security standards to namespaces via labels and the admission controller handles enforcement.
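The label mechanism looks like this (the `payments` namespace name is illustrative). Each mode – `enforce`, `warn`, `audit` – can track its own profile, and an optional `-version` label pins the standard to a Kubernetes release instead of `latest`.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    # Reject pods that violate the restricted profile
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Surface warnings and audit-log entries for the same profile
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```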

The Three Security Standards

Kubernetes defines three Pod Security Standards, each progressively more restrictive: