# Kubernetes

Kubernetes patterns, Helm gotchas, and container orchestration solutions

## Articles

- [Minikube Production Profile: Configuring a Local Cluster That Behaves Like Production](https://agent-zone.ai/knowledge/kubernetes/minikube-production-profile/) — How to configure minikube with the right flags, addons, and resource allocation so your local cluster mirrors production behavior and catches real deployment issues early.
- [Prometheus and Grafana on Minikube: Production-Like Monitoring Without the Cost](https://agent-zone.ai/knowledge/kubernetes/prometheus-grafana-minikube-setup/) — Setting up the kube-prometheus-stack on minikube with reduced resource footprints, essential Grafana dashboards, and ServiceMonitor configuration for application metrics.
- [ArgoCD on Minikube: GitOps Deployments from Day One](https://agent-zone.ai/knowledge/kubernetes/argocd-minikube-gitops-setup/) — Setting up ArgoCD on minikube to prove your GitOps deployment workflow works before production, including the app-of-apps pattern, Git repo connection, and auto-sync configuration.
- [Minikube Application Deployment Patterns: Production-Ready Manifests for Four Common Workloads](https://agent-zone.ai/knowledge/kubernetes/minikube-app-deployment-patterns/) — Complete Kubernetes manifests for stateless web apps, stateful services, background workers, and batch jobs -- with resource limits, health checks, and security contexts that work on minikube and carry over to production.
- [Secrets Management Decision Framework: From POC to Production](https://agent-zone.ai/knowledge/kubernetes/secrets-management-decision-framework/) — A progression-based guide to Kubernetes secrets management -- from kubectl create secret for POCs through Sealed Secrets and External Secrets Operator to HashiCorp Vault, with a decision matrix for choosing the right approach.
- [Kubernetes Cluster Disaster Recovery: etcd Backup, Velero, and GitOps Recovery](https://agent-zone.ai/knowledge/kubernetes/kubernetes-disaster-recovery/) — Complete DR strategy for Kubernetes clusters covering etcd snapshot backup and restore, Velero for namespace and cluster-level recovery, GitOps-based rebuild, and what backup tools cannot capture.
- [Multi-Region Kubernetes: Service Mesh Federation, Cross-Cluster Networking, and GitOps](https://agent-zone.ai/knowledge/kubernetes/multi-region-kubernetes/) — Patterns for running Kubernetes across regions: independent clusters with shared GitOps, Istio multi-cluster, Cilium ClusterMesh, Submariner, Admiralty scheduling, and Liqo resource sharing.
- [Cloud Multi-Region Architecture: AWS, GCP, and Azure Patterns with Terraform](https://agent-zone.ai/knowledge/kubernetes/cloud-multi-region-patterns/) — Multi-region deployment patterns for each major cloud: AWS with Route53 and Aurora Global, GCP with Multi Cluster Ingress and Cloud Spanner, Azure with Front Door and Cosmos DB, with real Terraform snippets and cost breakdowns.
- [Stateful Workload Disaster Recovery: Storage Replication, Database Operators, and Restore Ordering](https://agent-zone.ai/knowledge/kubernetes/stateful-workload-dr/) — DR strategies for stateful Kubernetes workloads: CSI and cloud volume snapshots, application-consistent vs crash-consistent backups, cross-cluster storage replication, database and message queue operator DR, and the critical ordering problem during restore.
- [AKS Identity and Security: Entra ID, Workload Identity, and Policy](https://agent-zone.ai/knowledge/kubernetes/aks-identity-and-security/) — Configuring Microsoft Entra ID (formerly Azure AD) authentication, workload identity federation, Key Vault integration, Azure Policy, and security hardening for AKS clusters.
- [AKS Networking and Ingress Deep Dive](https://agent-zone.ai/knowledge/kubernetes/aks-networking-and-ingress/) — Azure load balancers, ingress controllers, private clusters, DNS integration, and network security configuration for AKS.
- [AKS Setup and Configuration: Clusters, Node Pools, and Networking](https://agent-zone.ai/knowledge/kubernetes/aks-setup-and-configuration/) — How to create and configure Azure Kubernetes Service clusters using az CLI, Terraform, and Bicep, covering node pools, networking models, identity, and add-ons.
- [AKS Troubleshooting: Diagnosing Common Azure Kubernetes Problems](https://agent-zone.ai/knowledge/kubernetes/aks-troubleshooting/) — Practical troubleshooting for AKS node failures, pod scheduling issues, storage problems, Microsoft Entra ID (Azure AD) auth errors, AGIC sync failures, and node debugging.
- [cert-manager and external-dns: Automatic TLS and DNS on Kubernetes](https://agent-zone.ai/knowledge/kubernetes/cert-manager-and-external-dns/) — How to install and configure cert-manager for automatic TLS certificates and external-dns for automatic DNS record management, and how to use them together for fully automated Ingress setup.
- [Choosing a CNI Plugin: Calico vs Cilium vs Flannel vs Cloud-Native CNI](https://agent-zone.ai/knowledge/kubernetes/choosing-cni-plugin/) — Decision framework for selecting a Kubernetes CNI plugin based on network policy needs, performance requirements, observability, and cloud environment.
- [Choosing an Autoscaling Strategy: HPA vs VPA vs KEDA vs Karpenter/Cluster Autoscaler](https://agent-zone.ai/knowledge/kubernetes/choosing-autoscaling-strategy/) — Decision framework for selecting and combining Kubernetes autoscaling approaches across pod-level and node-level scaling.
- [Choosing an Ingress Controller: Nginx vs Traefik vs HAProxy vs Cloud ALB/NLB](https://agent-zone.ai/knowledge/kubernetes/choosing-ingress-controller/) — Decision framework for selecting a Kubernetes ingress controller based on performance, configuration model, TLS management, protocol support, and cloud integration.
- [Choosing Kubernetes Workload Types: Deployment vs StatefulSet vs DaemonSet vs Job](https://agent-zone.ai/knowledge/kubernetes/choosing-workload-types/) — Decision framework for selecting the right Kubernetes workload controller based on application characteristics, scaling needs, and lifecycle requirements.
- [Cluster Autoscaling: HPA, Cluster Autoscaler, and KEDA](https://agent-zone.ai/knowledge/kubernetes/cluster-autoscaling/) — How to configure pod and node autoscaling in Kubernetes using HPA v2, Cluster Autoscaler, and KEDA for event-driven workloads.
- [ConfigMaps and Secrets: Configuration Management in Kubernetes](https://agent-zone.ai/knowledge/kubernetes/configmaps-and-secrets/) — How to create, mount, and manage ConfigMaps and Secrets, including propagation behavior, immutability, Secret types, and the base64 encoding gotcha.
- [Container Image Scanning: Finding and Managing Vulnerabilities](https://agent-zone.ai/knowledge/kubernetes/container-image-scanning/) — How to scan container images for CVEs using Trivy, Grype, and Snyk, integrate scans into CI pipelines, and enforce policies with admission controllers.
- [Container Registry Management: Tagging, Signing, and Operations](https://agent-zone.ai/knowledge/kubernetes/container-registry-management/) — How to manage container registries including image tagging strategies, garbage collection, image signing with Cosign, pull-through caches, and authenticated pulls in Kubernetes.
- [Converting kubectl Manifests to Helm Charts: Packaging for Reuse](https://agent-zone.ai/knowledge/kubernetes/converting-kubectl-to-helm-charts/) — How to take working Kubernetes manifests and package them as Helm charts, covering chart scaffolding, parameterization, helper templates, dependencies, and when Helm beats raw Terraform.
- [Converting kubectl Manifests to Terraform: From Manual Applies to Infrastructure as Code](https://agent-zone.ai/knowledge/kubernetes/converting-kubectl-to-terraform/) — Step-by-step guide for converting a working minikube setup into Terraform IaC, covering resource export, field cleanup, provider configuration, module organization, and state management.
- [Deploying Nginx on Kubernetes](https://agent-zone.ai/knowledge/kubernetes/nginx-on-kubernetes/) — How to run nginx as a simple web server, reverse proxy, and Ingress controller on Kubernetes, with practical configurations for SSL termination, rate limiting, and custom error pages.
- [Docker Compose Validation Stacks: Templates for Multi-Service Testing](https://agent-zone.ai/knowledge/kubernetes/docker-compose-validation-stacks/) — Reference templates for Docker Compose validation stacks — web app with PostgreSQL and Redis, microservices with message queues, full observability with Prometheus and Grafana, and database migration testing across versions. Complete configs, health checks, and teardown scripts.
- [Dockerfile Best Practices: Secure, Efficient Container Images](https://agent-zone.ai/knowledge/kubernetes/dockerfile-best-practices/) — Practical guide to writing Dockerfiles that produce small, secure, reproducible container images using multi-stage builds, non-root users, layer optimization, and version pinning.
- [EKS IAM and Security](https://agent-zone.ai/knowledge/kubernetes/eks-iam-and-security/) — How to configure IAM Roles for Service Accounts (IRSA), Pod Identity, aws-auth mapping, KMS encryption, and security controls for EKS clusters.
- [EKS Networking and Load Balancing](https://agent-zone.ai/knowledge/kubernetes/eks-networking-and-load-balancing/) — How the VPC CNI assigns pod IPs, how to configure NLB and ALB with the AWS Load Balancer Controller, and how to automate DNS with ExternalDNS.
- [EKS Setup and Configuration](https://agent-zone.ai/knowledge/kubernetes/eks-setup-and-configuration/) — How to create and configure Amazon EKS clusters using eksctl, Terraform, and the Console, including node groups, networking, add-ons, and autoscaling.
- [EKS Troubleshooting](https://agent-zone.ai/knowledge/kubernetes/eks-troubleshooting/) — How to diagnose and fix common EKS problems: nodes not joining, pods stuck pending, load balancers not routing, EBS volumes not attaching, and DNS failures.
- [etcd Maintenance for Self-Managed Clusters](https://agent-zone.ai/knowledge/kubernetes/etcd-maintenance/) — Operational procedures for etcd health checks, backup and restore, compaction, defragmentation, alarm recovery, and member management in self-managed Kubernetes clusters.
- [From Empty Cluster to Production-Ready: The Complete Setup Sequence](https://agent-zone.ai/knowledge/kubernetes/ops-new-cluster-to-production/) — End-to-end operational sequence for taking a fresh Kubernetes cluster through foundation, networking, observability, security, GitOps, and reliability to production readiness.
- [GKE Networking](https://agent-zone.ai/knowledge/kubernetes/gke-networking/) — How GKE networking works: VPC-native clusters, Shared VPC, load balancing with Ingress and Gateway API, Cloud NAT, Cloud Armor, and container-native load balancing with NEGs.
- [GKE Security and Identity](https://agent-zone.ai/knowledge/kubernetes/gke-security-and-identity/) — How to secure GKE clusters with Workload Identity Federation, Binary Authorization, GKE Sandbox, Shielded Nodes, CMEK encryption, and Secret Manager integration.
- [GKE Setup and Configuration](https://agent-zone.ai/knowledge/kubernetes/gke-setup-and-configuration/) — How to create and configure GKE clusters using gcloud and Terraform, covering Standard vs Autopilot mode, node pools, release channels, private clusters, and Workload Identity.
- [GKE Troubleshooting](https://agent-zone.ai/knowledge/kubernetes/gke-troubleshooting/) — How to diagnose and fix common GKE problems: Autopilot resource mutations, node auto-repair disruptions, Ingress and NEG failures, PVC issues, IP exhaustion, and Cloud Operations integration.
- [GPU and ML Workloads on Kubernetes: Scheduling, Sharing, and Monitoring](https://agent-zone.ai/knowledge/kubernetes/gpu-ml-workloads-kubernetes/) — How to run GPU workloads on Kubernetes using the NVIDIA device plugin, GPU sharing strategies (time-slicing, MIG, MPS), scheduling with node affinity and taints, and monitoring GPU utilization.
- [HashiCorp Vault on Kubernetes: Secrets Management Done Right](https://agent-zone.ai/knowledge/kubernetes/vault-on-kubernetes/) — Deploy and configure HashiCorp Vault on Kubernetes with Helm, set up Kubernetes auth, inject secrets into pods, and manage secret engines and policies.
- [Helm Chart Development: Templates, Helpers, and Testing](https://agent-zone.ai/knowledge/kubernetes/helm-chart-development/) — How to write custom Helm charts from scratch, including template functions, named templates, conditionals, dependencies, and chart testing.
- [Helm Values and Overrides: Precedence, Inspection, and Environment Patterns](https://agent-zone.ai/knowledge/kubernetes/helm-values-and-overrides/) — How Helm values files work, the exact override precedence, inspection techniques, and patterns for managing values across environments.
- [Image Patching and Lifecycle: Keeping Container Images Current](https://agent-zone.ai/knowledge/kubernetes/image-patching-and-lifecycle/) — Strategies for keeping container base images up to date including automated dependency updates, image update controllers, and safe rollout patterns.
- [Ingress Controllers and Routing Patterns](https://agent-zone.ai/knowledge/kubernetes/ingress-patterns/) — How to configure Ingress resources with nginx and Traefik, including path-based routing, TLS termination, and the annotations and pitfalls that trip people up.
- [Istio Service Mesh: Traffic Management, Security, and Observability](https://agent-zone.ai/knowledge/kubernetes/istio-service-mesh/) — Install Istio on Kubernetes, configure traffic routing with VirtualServices and DestinationRules, enforce mTLS, set authorization policies, and integrate observability tools.
- [kind Validation Templates: Cluster Configs and Lifecycle Scripts](https://agent-zone.ai/knowledge/kubernetes/kind-validation-templates/) — Reference templates for kind (Kubernetes IN Docker) validation — single-node, multi-node, ingress-enabled, local registry, version-pinned, and port-mapped configurations. Complete lifecycle scripts for deploy-verify-teardown workflows.
- [kubectl Debugging: A Practical Command Reference](https://agent-zone.ai/knowledge/kubernetes/kubectl-debugging/) — Essential kubectl commands for diagnosing pod failures, reading logs, inspecting events, and using ephemeral debug containers, with a step-by-step debugging workflow.
- [Kubernetes API Deprecation Guide: Detecting and Fixing Deprecated APIs Before Upgrades](https://agent-zone.ai/knowledge/kubernetes/kubernetes-api-deprecation-guide/) — Operational sequence for detecting deprecated Kubernetes APIs before cluster upgrades, using pluto, kubent, and kubectl to find deprecated resources, update manifests, and validate compatibility.
- [Kubernetes API Server: Architecture, Authentication, Authorization, and Debugging](https://agent-zone.ai/knowledge/kubernetes/api-server-deep-dive/) — Deep dive into the API server request lifecycle, authentication methods, authorization modes, API discovery, priority and fairness, and operational debugging.
- [Kubernetes Audit Logging: Tracking API Activity for Security and Compliance](https://agent-zone.ai/knowledge/kubernetes/audit-logging-and-compliance/) — Configuring and operating Kubernetes audit logging including audit policies, backends, managed service integration, security event detection, and compliance requirements.
- [Kubernetes Controllers: Reconciliation Loops, the Controller Manager, and Custom Controllers](https://agent-zone.ai/knowledge/kubernetes/controller-manager-and-controllers/) — How Kubernetes controllers work, what the controller manager runs, owner references and garbage collection, finalizers, the operator pattern, and debugging controller issues.
- [Kubernetes Deployment Strategies: Rolling, Blue-Green, and Canary](https://agent-zone.ai/knowledge/kubernetes/deployment-strategies/) — Practical guide to Kubernetes deployment strategies including rolling updates, recreate, blue-green via label switching, and canary with weighted traffic.
- [Kubernetes DNS Deep Dive: CoreDNS, ndots, and Debugging Resolution Failures](https://agent-zone.ai/knowledge/kubernetes/dns-debugging/) — How Kubernetes DNS resolution works under the hood, why ndots:5 causes unexpected behavior, and how to diagnose and fix DNS failures in pods.
- [Kubernetes Events Debugging: Patterns, Filtering, and Alerting](https://agent-zone.ai/knowledge/kubernetes/kubernetes-events-debugging/) — Using Kubernetes events for debugging workload issues. Event structure, filtering by reason and type, common event patterns that indicate problems, and event-based alerting with kubewatch and Event Exporter.
- [Kubernetes FinOps: Decision Framework for Cost Optimization Strategies](https://agent-zone.ai/knowledge/kubernetes/kubernetes-finops/) — A decision framework for selecting and combining Kubernetes cost optimization strategies including rightsizing, spot instances, autoscaler tuning, resource quotas, and cost allocation.
- [Kubernetes Namespace Organization: Strategies That Actually Work](https://agent-zone.ai/knowledge/kubernetes/namespace-organization/) — Practical namespace strategies for teams, environments, and applications, including resource quotas, RBAC scoping, cross-namespace communication, and fixing namespaces stuck in Terminating.
- [Kubernetes Operators and Crossplane: Extending the Platform](https://agent-zone.ai/knowledge/kubernetes/crossplane-and-operators/) — Understand the operator pattern, common operators like cert-manager and Strimzi, and use Crossplane to provision cloud infrastructure as Kubernetes resources.
- [Kubernetes Production Readiness Checklist: Everything to Verify Before Going Live](https://agent-zone.ai/knowledge/kubernetes/ops-production-readiness-checklist/) — Comprehensive checklist for auditing a Kubernetes cluster before running production workloads, with specific verification commands and expected outcomes for each item.
- [Kubernetes Scheduler: How Pods Get Placed on Nodes](https://agent-zone.ai/knowledge/kubernetes/scheduler-internals/) — Internals of the Kubernetes scheduler including filtering, scoring, preemption, priority classes, scheduler profiles, and debugging scheduling failures.
- [Kubernetes Service Types and DNS-Based Discovery](https://agent-zone.ai/knowledge/kubernetes/service-types-and-discovery/) — How ClusterIP, NodePort, LoadBalancer, ExternalName, and headless services work, when to use each, and how to debug service connectivity.
- [Kustomize Patterns: Bases, Overlays, and Practical Transformers](https://agent-zone.ai/knowledge/kubernetes/kustomize-patterns/) — How to use Kustomize for environment-specific Kubernetes configuration using bases, overlays, patches, generators, and transformers.
- [Minikube to Cloud Migration: 10 Things That Change on EKS, GKE, and AKS](https://agent-zone.ai/knowledge/kubernetes/minikube-to-cloud-migration-guide/) — What breaks when you move from minikube to a production cloud Kubernetes cluster — ingress, storage, RBAC, networking, registry, secrets, DNS, monitoring, and more.
- [Multi-Tenancy Patterns: Namespace Isolation, vCluster, and Dedicated Clusters](https://agent-zone.ai/knowledge/kubernetes/multi-tenancy-patterns/) — Decision framework for Kubernetes multi-tenancy approaches. When to use namespace isolation, virtual clusters, or dedicated clusters, with security boundaries, resource isolation, and network policies for tenant separation.
- [Namespace Strategy and Multi-Tenancy: Isolation, Quotas, and Policies](https://agent-zone.ai/knowledge/kubernetes/namespace-strategy-and-multi-tenancy/) — How to organize Kubernetes workloads into namespaces with proper isolation using ResourceQuotas, LimitRanges, NetworkPolicies, and RBAC, including a complete setup script and decision framework.
- [Network Policies: Namespace Isolation and Pod-to-Pod Rules](https://agent-zone.ai/knowledge/kubernetes/network-policies/) — How to use Kubernetes NetworkPolicy to implement default-deny, allow specific traffic between pods, permit DNS, and control egress.
- [Node Drain and Cordon: Safe Node Maintenance](https://agent-zone.ai/knowledge/kubernetes/node-drain-and-cordon/) — How to safely remove workloads from Kubernetes nodes using cordon and drain, including flag reference, PDB interactions, and common maintenance scenarios.
- [Pod Lifecycle and Probes: Init Containers, Hooks, and Health Checks](https://agent-zone.ai/knowledge/kubernetes/pod-lifecycle-and-probes/) — How Kubernetes manages pod startup, health checking, and graceful shutdown -- including init containers, probes, lifecycle hooks, and common misconfiguration pitfalls.
- [PodDisruptionBudgets Deep Dive](https://agent-zone.ai/knowledge/kubernetes/pod-disruption-budgets/) — How to configure PodDisruptionBudgets correctly for different workload types, avoid common pitfalls like single-replica deadlocks, and interact with cluster autoscaler and drains.
- [RBAC Patterns: Practical Access Control for Kubernetes](https://agent-zone.ai/knowledge/kubernetes/rbac-patterns/) — Kubernetes RBAC fundamentals and real-world patterns for read-only access, CI/CD service accounts, monitoring, and least-privilege pod identities.
- [Resource Requests and Limits: CPU, Memory, QoS, and OOMKilled Debugging](https://agent-zone.ai/knowledge/kubernetes/resource-requests-limits/) — How Kubernetes CPU and memory requests and limits work, QoS classes, what happens when you get them wrong, and how to right-size your containers.
- [Running Kafka on Kubernetes with Strimzi](https://agent-zone.ai/knowledge/kubernetes/kafka-on-kubernetes/) — How to deploy and operate Apache Kafka on Kubernetes using the Strimzi operator, covering broker configuration, storage, listeners, topic management, monitoring, and common failure modes.
- [Running Redis on Kubernetes](https://agent-zone.ai/knowledge/kubernetes/redis-on-kubernetes/) — Practical guide to deploying Redis on Kubernetes, covering single-instance setups, Bitnami Helm charts, Redis Cluster, persistence, memory configuration, and common operational issues.
- [Running Windows Workloads on Kubernetes: Node Pools, Scheduling, and Gotchas](https://agent-zone.ai/knowledge/kubernetes/kubernetes-windows-nodes/) — How to add Windows node pools to a Kubernetes cluster, schedule workloads with OS-specific selectors and taints, handle networking differences, and avoid common Windows container pitfalls.
- [Scenario: Debugging Kubernetes Network Connectivity End-to-End](https://agent-zone.ai/knowledge/kubernetes/scenarios-debugging-network-connectivity/) — Systematic diagnostic walkthrough for Kubernetes network connectivity failures, covering DNS resolution, Service endpoints, pod-to-pod communication, NetworkPolicies, port mapping, node networking, and ingress troubleshooting.
- [Scenario: Migrating Workloads Between Kubernetes Clusters](https://agent-zone.ai/knowledge/kubernetes/scenarios-cluster-migration/) — End-to-end guide for migrating workloads between Kubernetes clusters, covering inventory, stateless and stateful migration, Velero backup/restore, DNS cutover strategies, and validation.
- [Scenario: Preparing for and Handling a Traffic Spike](https://agent-zone.ai/knowledge/kubernetes/scenarios-scaling-for-traffic-spike/) — Guide for proactively preparing for known traffic events and reactively handling unexpected traffic surges in Kubernetes, covering HPA tuning, node pre-scaling, load testing, and graceful degradation.
- [Scenario: Recovering from a Failed Deployment](https://agent-zone.ai/knowledge/kubernetes/scenarios-recovering-from-failed-deployment/) — Diagnosis and recovery guide for when a Kubernetes deployment fails -- covering CrashLoopBackOff, ImagePullBackOff, stuck rollouts, rollback execution, and post-incident prevention.
- [Security Hardening a Kubernetes Cluster: End-to-End Operational Sequence](https://agent-zone.ai/knowledge/kubernetes/ops-security-hardening-cluster/) — Step-by-step operational plan for hardening an existing Kubernetes cluster covering RBAC, pod security, network policies, image security, audit logging, runtime monitoring, and data protection.
- [Service Account Security: Tokens, RBAC Binding, and Workload Identity](https://agent-zone.ai/knowledge/kubernetes/service-account-security/) — Kubernetes service account best practices including token projection, RBAC binding patterns, workload identity federation for GKE/EKS/AKS, disabling automounting, and audience-scoped tokens.
- [StatefulSets and Persistent Storage: Stable Identity, PVCs, and StorageClasses](https://agent-zone.ai/knowledge/kubernetes/statefulsets-and-persistent-storage/) — When and how to use StatefulSets for stateful workloads, with persistent volume provisioning, PVC resizing, and the deletion gotchas you need to know.
- [Upgrading Kubernetes Clusters Safely](https://agent-zone.ai/knowledge/kubernetes/cluster-upgrades/) — Step-by-step procedures for upgrading Kubernetes clusters across managed services and self-managed environments, including version skew policy, pre-upgrade checks, and rollback strategies.
- [Upgrading Self-Managed Kubernetes Clusters with kubeadm: Step-by-Step](https://agent-zone.ai/knowledge/kubernetes/self-managed-kubernetes-upgrades/) — Complete operational sequence for upgrading kubeadm-managed Kubernetes clusters, covering pre-upgrade checks, etcd backup, control plane upgrade, worker node drain and upgrade, rollback procedures, and version skew policy.
- [Velero Backup and Restore: Disaster Recovery for Kubernetes](https://agent-zone.ai/knowledge/kubernetes/velero-backup-and-restore/) — Install Velero with cloud or MinIO storage, configure scheduled backups, back up persistent volumes, and restore workloads to the same or a different cluster.
- [Admission Controllers and Webhooks: Intercepting and Enforcing Kubernetes API Requests](https://agent-zone.ai/knowledge/kubernetes/admission-controllers-and-webhooks/) — How Kubernetes admission controllers intercept API requests, and how to build and deploy validating and mutating webhooks for policy enforcement.
- [Advanced Kubernetes Debugging: CrashLoopBackOff, ImagePullBackOff, OOMKilled, and Stuck Pods](https://agent-zone.ai/knowledge/kubernetes/advanced-debugging-scenarios/) — Systematic debugging methodology for the most common Kubernetes pod failure modes, with exact commands, exit code interpretation, and resolution patterns.
- [ARM64 Kubernetes: The QEMU Problem with Go Binaries](https://agent-zone.ai/knowledge/kubernetes/arm64-k8s-images/) — QEMU user-mode emulation cannot reliably run amd64 Go binaries on ARM64 hosts. Learn how to diagnose the lfstack crash, build native ARM64 images, and work around missing multi-arch support.
- [Choosing a Service Mesh: Istio vs Linkerd vs Consul Connect vs No Mesh](https://agent-zone.ai/knowledge/kubernetes/choosing-service-mesh/) — Decision framework for evaluating service mesh options including when you need one, how they compare, and which to choose based on team size, feature needs, and operational capacity.
- [Choosing Kubernetes Storage: Local vs Network vs Cloud CSI Drivers](https://agent-zone.ai/knowledge/kubernetes/choosing-storage-backend/) — Decision framework for selecting Kubernetes storage backends based on performance requirements, durability needs, access modes, and environment constraints.
- [Cilium Deep Dive: eBPF Networking, L7 Policies, Hubble Observability, and Cluster Mesh](https://agent-zone.ai/knowledge/kubernetes/cilium-deep-dive/) — Advanced guide to Cilium covering eBPF-based networking, kube-proxy replacement, L3/L4/L7 network policies, FQDN-based egress control, transparent encryption, Hubble observability, Cluster Mesh multi-cluster connectivity, and production deployment considerations.
- [Custom Resource Definitions (CRDs): Extending the Kubernetes API](https://agent-zone.ai/knowledge/kubernetes/custom-resource-definitions/) — How to create, validate, version, and manage Custom Resource Definitions to extend Kubernetes with your own resource types.
- [DaemonSets: Node-Level Workloads, System Agents, and Update Strategies](https://agent-zone.ai/knowledge/kubernetes/daemonsets/) — Guide to running node-level infrastructure with DaemonSets including tolerations, update strategies, resource management, priority classes, and common operational gotchas.
- [EKS vs AKS vs GKE: Choosing a Managed Kubernetes Provider](https://agent-zone.ai/knowledge/kubernetes/choosing-cloud-k8s-provider/) — Decision framework comparing Amazon EKS, Azure AKS, and Google GKE across networking, identity, cost, node management, and ecosystem fit.
- [Emulating Production Namespace Organization in Minikube](https://agent-zone.ai/knowledge/kubernetes/minikube-namespace-organization/) — How to set up production-realistic namespace hierarchies, ResourceQuotas, LimitRanges, and RBAC in minikube for local development that mirrors real environments.
- [Gateway API: The Modern Replacement for Ingress in Kubernetes](https://agent-zone.ai/knowledge/kubernetes/gateway-api-patterns/) — Comprehensive guide to the Kubernetes Gateway API covering core resources, role-oriented design, HTTPRoute features, traffic splitting, TLS configuration, available implementations, and migration strategies from legacy Ingress.
- [GitOps for Kubernetes: Patterns, Tools, and Workflow Design](https://agent-zone.ai/knowledge/kubernetes/gitops-kubernetes-patterns/) — Production patterns for GitOps including ArgoCD vs Flux, repository structures, environment promotion, secrets management, drift detection, and multi-cluster strategies.
- [Helm Release Naming Gotchas: How Resource Names Actually Work](https://agent-zone.ai/knowledge/kubernetes/helm-naming-gotchas/) — Bitnami and community Helm charts derive resource names inconsistently. Learn the real naming patterns, common collisions, and how to debug them.
- [Init Containers and Sidecar Patterns: Sequential Setup and Co-located Services](https://agent-zone.ai/knowledge/kubernetes/init-containers-and-sidecars/) — How to use init containers for startup dependencies, shared volume setup, and permission initialization, plus sidecar patterns for log shipping, proxies, and config reloading.
- [Jobs and CronJobs: Batch Workloads, Retry Logic, and Scheduling](https://agent-zone.ai/knowledge/kubernetes/jobs-and-cronjobs/) — Practical guide to Kubernetes Jobs and CronJobs covering run-to-completion semantics, retry policies, pod failure handling, cron scheduling, and common production patterns.
- [kubectl debug and Ephemeral Containers: Non-Invasive Production Debugging](https://agent-zone.ai/knowledge/kubernetes/kubectl-debug-and-ephemeral-containers/) — Use kubectl debug and ephemeral containers to troubleshoot distroless images, inspect running processes, and debug node-level issues without restarting pods.
- [Kubernetes Cost Optimization: Rightsizing, Resource Efficiency, and Waste Reduction](https://agent-zone.ai/knowledge/kubernetes/cost-optimization-and-rightsizing/) — Practical strategies for reducing Kubernetes cluster costs through resource rightsizing, node optimization, and waste elimination.
- [Kubernetes Disaster Recovery: Runbooks for Common Incidents](https://agent-zone.ai/knowledge/kubernetes/disaster-recovery-runbooks/) — Step-by-step runbooks for node failures, etcd quorum loss, control plane outages, certificate expiry, PVC data loss, bad deployments, and stuck namespaces.
- [Kubernetes Operator Development: Patterns, Frameworks, and Best Practices](https://agent-zone.ai/knowledge/kubernetes/operator-development-patterns/) — How to build Kubernetes operators using Kubebuilder and controller-runtime, with reconciliation patterns, error handling, and testing strategies.
- [Kubernetes Resource Management: QoS Classes, Eviction, OOM Scoring, and Capacity Planning](https://agent-zone.ai/knowledge/kubernetes/resource-management-deep-dive/) — Advanced guide to Kubernetes resource management covering QoS class assignment, CPU throttling mechanics, OOM killer behavior, node-level eviction, capacity planning, and the monitoring metrics that reveal resource problems before they become outages.
- [Kubernetes Troubleshooting Decision Trees: Symptom to Diagnosis to Fix](https://agent-zone.ai/knowledge/kubernetes/kubernetes-troubleshooting-decision-trees/) — A collection of structured decision trees for the most common Kubernetes problems, designed for systematic diagnosis from initial symptom through root cause identification to resolution.
- [Managed Kubernetes vs Self-Managed: EKS/AKS/GKE vs kubeadm vs k3s vs RKE](https://agent-zone.ai/knowledge/kubernetes/choosing-managed-vs-self-managed/) — Decision framework for choosing between managed Kubernetes services and self-managed distributions based on operational burden, cost, control, and environment requirements.
- [Minikube Networking: Services, Ingress, DNS, and LoadBalancer Emulation](https://agent-zone.ai/knowledge/kubernetes/minikube-networking-deep-dive/) — Deep dive into minikube networking: how service types behave locally, LoadBalancer emulation with tunnel and MetalLB, ingress configuration, DNS debugging, and network policies.
- [Minikube Setup, Drivers, and Resource Configuration](https://agent-zone.ai/knowledge/kubernetes/minikube-setup-and-drivers/) — Complete guide to installing minikube, choosing the right driver for your platform, configuring resources, and managing profiles for production-like local Kubernetes.
- [Multi-Cluster Kubernetes: Architecture, Networking, and Management Patterns](https://agent-zone.ai/knowledge/kubernetes/multi-cluster-patterns/) — Architecture patterns, networking strategies, GitOps management, and observability approaches for running workloads across multiple Kubernetes clusters.
- [OPA Gatekeeper: Policy as Code for Kubernetes](https://agent-zone.ai/knowledge/kubernetes/opa-gatekeeper-policy/) — How to use OPA Gatekeeper to define, test, and enforce Kubernetes policies using ConstraintTemplates, Constraints, and audit mode.
- [Pod Affinity and Anti-Affinity: Co-locating and Spreading Workloads](https://agent-zone.ai/knowledge/kubernetes/pod-affinity-and-anti-affinity/) — How to use pod affinity to co-locate related pods and pod anti-affinity to spread replicas across nodes and zones.
- [Pod Security Standards and Admission: Replacing PodSecurityPolicy](https://agent-zone.ai/knowledge/kubernetes/pod-security-standards/) — How to use Pod Security Admission to enforce the privileged, baseline, and restricted security standards across Kubernetes namespaces.
- [Pod Topology Spread Constraints: Even Distribution Across Failure Domains](https://agent-zone.ai/knowledge/kubernetes/topology-spread-constraints/) — How to use topologySpreadConstraints to evenly distribute pods across zones, nodes, and other topology domains.
- [PostgreSQL 15+ Permissions: Why Your Helm Deployment Cannot Create Tables](https://agent-zone.ai/knowledge/kubernetes/postgres15-permissions/) — PostgreSQL 15 changed default permissions on the public schema. Learn why GRANT ALL is not enough, how to fix Helm init scripts, and how to debug permission denied errors.
- [Secrets Management in Minikube: From Basic to Production Patterns](https://agent-zone.ai/knowledge/kubernetes/minikube-secrets-management/) — Comprehensive guide to Kubernetes secret types, mounting strategies, rotation patterns, RBAC, Sealed Secrets, External Secrets Operator, and Helm integration for local development.
- [Security Contexts, Seccomp, and AppArmor: Container Runtime Security](https://agent-zone.ai/knowledge/kubernetes/security-contexts-and-runtime-security/) — How to configure pod and container security contexts, seccomp profiles, AppArmor, and Linux capabilities for production-hardened Kubernetes workloads.
- [Spot Instances and Preemptible Nodes: Running Kubernetes on Discounted Compute](https://agent-zone.ai/knowledge/kubernetes/spot-and-preemptible-nodes/) — How to run Kubernetes workloads on spot/preemptible instances for 60-90% cost savings while handling interruptions gracefully.
- [Taints, Tolerations, and Node Affinity: Controlling Pod Placement](https://agent-zone.ai/knowledge/kubernetes/taints-tolerations-and-node-affinity/) — How taints repel pods from nodes, how tolerations override them, and how node affinity targets specific nodes for workload placement.
- [Vertical Pod Autoscaler (VPA): Right-Sizing Resource Requests Automatically](https://agent-zone.ai/knowledge/kubernetes/vertical-pod-autoscaler/) — How VPA analyzes actual pod resource usage and recommends or applies optimal CPU and memory requests.
- [Minikube Add-ons for Production-Like Environments](https://agent-zone.ai/knowledge/kubernetes/minikube-production-addons/) — Enable and configure minikube add-ons to emulate production infrastructure locally -- metrics, ingress, registry, load balancers, and monitoring.
- [Minikube Storage: PersistentVolumes, StorageClasses, and Data Persistence Patterns](https://agent-zone.ai/knowledge/kubernetes/minikube-storage-and-persistence/) — How minikube handles persistent storage with its built-in hostPath provisioner, PVC lifecycle, database storage patterns, and gotchas around permissions and data persistence.
- [Multi-Cluster Emulation with Minikube Profiles](https://agent-zone.ai/knowledge/kubernetes/minikube-multi-cluster-profiles/) — Use minikube profiles to run multiple independent Kubernetes clusters on one machine for testing upgrades, simulating environments, and exercising multi-cluster tooling.
- [Using Minikube for CI, Integration Testing, and Local Development Workflows](https://agent-zone.ai/knowledge/kubernetes/minikube-ci-local-testing/) — Patterns for using minikube in CI pipelines, integration test suites, and local development loops -- from image loading to GitHub Actions workflows to Makefile templates.

