Kubernetes Production Readiness Checklist: Everything to Verify Before Going Live

Kubernetes Production Readiness Checklist#

This checklist is designed for agents to audit a Kubernetes cluster before production workloads run on it. Every item includes the verification command and what a passing result looks like. Work through each category sequentially. A failing item in Cluster Health should be fixed before checking Workload Configuration.


Cluster Health#

These are non-negotiable. If any of these fail, stop and fix them before evaluating anything else.

Kubernetes Scheduler: How Pods Get Placed on Nodes

Kubernetes Scheduler: How Pods Get Placed on Nodes#

The scheduler (kube-scheduler) watches for newly created pods that have no node assignment. For each unscheduled pod, the scheduler selects the best node and writes a binding back to the API server. The kubelet on that node then starts the pod. If no node is suitable, the pod stays Pending until conditions change.

The scheduler is the reason pods run where they do. Understanding its internals is essential for diagnosing Pending pods, designing placement constraints, and managing cluster utilization.

Kubernetes Service Types and DNS-Based Discovery

Kubernetes Service Types and DNS-Based Discovery#

Services are the stable networking abstraction in Kubernetes. Pods come and go, but a Service gives you a consistent DNS name and IP address that routes to the right set of pods. Choosing the wrong Service type or misunderstanding DNS discovery is behind a large percentage of connectivity failures.

Service Types#

ClusterIP (Default)#

ClusterIP creates an internal-only virtual IP. Only pods inside the cluster can reach it. This is what you want for internal communication between microservices.

Kustomize Patterns: Bases, Overlays, and Practical Transformers

Kustomize Patterns#

Kustomize lets you customize Kubernetes manifests without templating. You start with plain YAML (bases) and layer modifications (overlays) on top. It is built into kubectl, so there is no extra tool to install.

Base and Overlay Structure#

The standard layout separates shared manifests from per-environment customizations:

k8s/
  base/
    kustomization.yaml
    deployment.yaml
    service.yaml
    configmap.yaml
  overlays/
    dev/
      kustomization.yaml
      replica-patch.yaml
    staging/
      kustomization.yaml
      ingress.yaml
    production/
      kustomization.yaml
      replica-patch.yaml
      hpa.yaml

The base kustomization.yaml lists the resources:

# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - deployment.yaml
  - service.yaml
  - configmap.yaml

An overlay references the base and adds modifications:

Minikube to Cloud Migration: 10 Things That Change on EKS, GKE, and AKS

Minikube to Cloud Migration Guide#

Minikube is excellent for learning and local development. But almost everything that “just works” on minikube requires explicit configuration on a cloud cluster. Here are the 10 things that change.

1. Ingress Controller Becomes a Cloud Load Balancer#

On minikube: You enable the NGINX ingress addon with minikube addons enable ingress. Traffic reaches your services through minikube tunnel or minikube service.

On cloud: The ingress controller must be deployed explicitly, and it provisions a real cloud load balancer. On AWS, the AWS Load Balancer Controller creates ALBs or NLBs from Ingress resources. On GKE, the built-in GCE ingress controller creates Google Cloud Load Balancers. You pay per load balancer.

Multi-Tenancy Patterns: Namespace Isolation, vCluster, and Dedicated Clusters

Multi-Tenancy Patterns: Namespace Isolation, vCluster, and Dedicated Clusters#

Multi-tenancy in Kubernetes means running workloads for multiple teams, customers, or environments on shared infrastructure. The core tension is always the same: sharing reduces cost, but isolation prevents blast radius. Choosing the wrong model creates security gaps or wastes money. This guide provides a framework for selecting the right approach and implementing it correctly.

The Three Models#

Every Kubernetes multi-tenancy approach falls into one of three categories, each with different isolation guarantees:

Namespace Strategy and Multi-Tenancy: Isolation, Quotas, and Policies

Namespace Strategy and Multi-Tenancy#

Namespaces are the foundation for isolating workloads in a shared Kubernetes cluster. Without a deliberate strategy, teams deploy into arbitrary namespaces, resources are unbound, and one misbehaving application can take down the entire cluster.

Why Namespaces Matter#

Namespaces provide four isolation boundaries:

  • RBAC scoping: Roles and RoleBindings are namespace-scoped, so you can grant teams access to their namespaces only.
  • Resource quotas: Limit CPU, memory, and object counts per namespace, preventing one team from starving others.
  • Network policies: Restrict traffic between namespaces so a compromised application cannot reach services it should not.
  • Organizational clarity: kubectl get pods -n payments-prod shows exactly what you expect, not a jumble of unrelated workloads.

System Namespaces#

These exist in every cluster and should be off-limits to application teams:

Network Policies: Namespace Isolation and Pod-to-Pod Rules

Network Policies: Namespace Isolation and Pod-to-Pod Rules#

By default, every pod in a Kubernetes cluster can talk to every other pod. Network policies let you restrict that. They are namespace-scoped resources that select pods by label and define allowed ingress and egress rules.

Critical Prerequisite: CNI Support#

Network policies are only enforced if your CNI plugin supports them. Calico, Cilium, and Weave all support network policies. Flannel does not. If you are running Flannel, you can create NetworkPolicy resources without errors, but they will have absolutely no effect. This is a silent failure that wastes hours of debugging.

Node Drain and Cordon: Safe Node Maintenance

Node Drain and Cordon#

Node maintenance is a routine part of cluster operations: kernel patches, instance type changes, Kubernetes upgrades, hardware replacement. The tools are kubectl cordon (stop scheduling new pods) and kubectl drain (evict existing pods). Getting the flags and sequence right is the difference between a seamless operation and a production incident.

Cordon: Mark Unschedulable#

Cordon sets the spec.unschedulable field on a node to true. The scheduler will not place new pods on it, but existing pods continue running undisturbed.

Pod Lifecycle and Probes: Init Containers, Hooks, and Health Checks

Pod Lifecycle and Probes#

Understanding how Kubernetes starts, monitors, and stops pods is essential for running reliable services. Misconfigurations here cause cascading failures, dropped requests, and restart loops that are difficult to diagnose.

Pod Startup Sequence#

When a pod is scheduled, this is the exact order of operations:

  1. Init containers run sequentially. Each must exit 0 before the next starts.
  2. All regular containers start simultaneously.
  3. postStart hooks fire (in parallel with the container’s main process).
  4. Startup probe begins checking (if defined).
  5. Once the startup probe passes, liveness and readiness probes begin.

Init Containers#

Init containers run before your application containers and are used for setup tasks: waiting for a dependency, running database migrations, cloning config from a remote source.