Cloud-Native vs Portable Infrastructure: A Decision Framework

Cloud-Native vs Portable Infrastructure#

Every infrastructure decision sits on a spectrum between portability and fidelity. On one end, you have generic Kubernetes running on minikube or kind – it works everywhere, costs nothing, and captures the behavior of the Kubernetes API itself. On the other end, you have cloud-native managed services – EKS with IRSA and ALB Ingress Controller, GKE with Workload Identity and Cloud Load Balancing, AKS with Azure AD Pod Identity and Azure Load Balancer. These capture the behavior of the actual platform your workloads will run on.

Ephemeral Cloud Clusters: Create, Validate, Destroy Sequences for EKS, GKE, and AKS

Ephemeral Cloud Clusters#

Ephemeral clusters exist for one purpose: validate something, then disappear. They are not staging environments, not shared dev clusters, not long-lived resources that someone forgets to turn off. The operational model is strict – create, validate, destroy – and the entire sequence must be automated so that destruction cannot be forgotten.

The cost of getting this wrong is real. A three-node EKS cluster left running over a weekend costs roughly $15. Left running for a month, $200. Multiply by the number of developers or CI pipelines that create clusters, and forgotten ephemeral infrastructure becomes a significant budget line item. Every template in this article includes auto-destroy mechanisms to prevent this.
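
To make the destroy step impossible to skip, wrap the whole sequence in one script and hook teardown to the shell's exit. The sketch below uses GKE with placeholder names (cluster name, region, and the manifests used for validation are assumptions); the same trap pattern applies to eksctl or az aks.

#!/usr/bin/env bash
# Ephemeral cluster lifecycle: create, validate, destroy – all in one run.
set -euo pipefail

CLUSTER="ephemeral-$(date +%s)"
REGION="us-central1"

cleanup() {
  # Runs on success, failure, or interruption, so the cluster is never left behind
  gcloud container clusters delete "$CLUSTER" --region "$REGION" --quiet || true
}
trap cleanup EXIT

# Create
gcloud container clusters create "$CLUSTER" --region "$REGION" --num-nodes 1
gcloud container clusters get-credentials "$CLUSTER" --region "$REGION"

# Validate (placeholder: apply manifests and wait for deployments to become Available)
kubectl apply -f ./manifests/
kubectl wait --for=condition=Available deployment --all --timeout=300s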

GCP Fundamentals for Agents

Projects and Organization#

GCP organizes resources into Projects, which sit under Folders and an Organization. A project is the fundamental unit of resource organization, billing, and API enablement. Every GCP resource belongs to exactly one project.

# Set the active project
gcloud config set project my-prod-project

# List all projects
gcloud projects list

# Create a new project
gcloud projects create staging-project-2026 \
  --name="Staging" \
  --organization=ORG_ID

# Enable required APIs (must be done per-project)
gcloud services enable compute.googleapis.com
gcloud services enable container.googleapis.com
gcloud services enable sqladmin.googleapis.com

Check which project is currently active:
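
# Show the currently active project
gcloud config get-value project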

GKE Networking#

GKE networking centers on VPC-native clusters, where pods and services get IP addresses from VPC subnet ranges. This integrates Kubernetes networking directly into Google Cloud’s VPC, enabling native routing, firewall rules, and load balancing without extra overlays.

VPC-Native Clusters and Alias IP Ranges#

VPC-native clusters use alias IP ranges on the subnet. You allocate two secondary ranges: one for pods, one for services.

# Create subnet with secondary ranges
gcloud compute networks subnets create gke-subnet \
  --network my-vpc \
  --region us-central1 \
  --range 10.0.0.0/20 \
  --secondary-range pods=10.4.0.0/14,services=10.8.0.0/20

# Create cluster using those ranges
gcloud container clusters create my-cluster \
  --region us-central1 \
  --network my-vpc \
  --subnetwork gke-subnet \
  --cluster-secondary-range-name pods \
  --services-secondary-range-name services \
  --enable-ip-alias

The pod range needs to be large. A /14 gives about 262,000 pod IPs. Each node reserves a /24 from the pod range (256 IPs, 110 usable pods per node). If you have 100 nodes, that consumes 100 /24 blocks. Undersizing the pod range is a common cause of IP exhaustion – the cluster cannot add nodes even though VMs are available.
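
If the VPC cannot spare a /14, one mitigation is to lower the per-node pod ceiling so each node reserves a smaller block. A sketch, reusing the names from above (cluster, network, and subnet names are placeholders); with a maximum of 32 pods per node, GKE reserves a /26 per node instead of a /24:

# Smaller per-node reservation via a lower max-pods setting
gcloud container clusters create my-cluster \
  --region us-central1 \
  --network my-vpc \
  --subnetwork gke-subnet \
  --enable-ip-alias \
  --default-max-pods-per-node 32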

GKE Security and Identity#

GKE security covers identity (who can do what), workload isolation (sandboxing untrusted code), supply chain integrity (ensuring only trusted images run), and data protection (encryption at rest). These features layer on top of standard Kubernetes RBAC and network policies.

Workload Identity Federation#

Workload Identity Federation for GKE is the successor to the original Workload Identity. It uses the standard GCP IAM federation model: the cluster still opts in through the workload pool (PROJECT_ID.svc.id.goog), but IAM roles can now be granted directly to a Kubernetes service account's federated principal, so binding to a Google Cloud service account becomes optional rather than required. The goal is unchanged: pods get GCP credentials without exported keys.
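
A minimal sketch of the federation-style grant, assuming a namespace backend and a Kubernetes service account api-reader (project ID, project number, and role are placeholders):

# Enable the workload pool on the cluster (on by default for Autopilot clusters)
gcloud container clusters update my-cluster \
  --region us-central1 \
  --workload-pool=PROJECT_ID.svc.id.goog

# Grant a role directly to the Kubernetes service account's federated principal
gcloud projects add-iam-policy-binding PROJECT_ID \
  --role roles/storage.objectViewer \
  --member "principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/backend/sa/api-reader"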

GKE Setup and Configuration#

GKE is Google’s managed Kubernetes service. The two major decisions when creating a cluster are the mode (Standard vs Autopilot) and the networking model (VPC-native is now the default and the only option for new clusters). Everything else – node pools, release channels, Workload Identity – layers on top of those choices.

Standard vs Autopilot#

Standard mode gives you full control over node pools, machine types, and node configuration. You manage capacity, pay per node (whether pods are using the resources or not), and can run DaemonSets, privileged containers, and host-network pods.
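
For comparison, a sketch of creating one cluster in each mode (names, region, and machine type are placeholders):

# Autopilot: Google manages nodes; billing follows pod resource requests
gcloud container clusters create-auto autopilot-cluster \
  --region us-central1

# Standard: you size and pay for the node pool whether pods use it or not
gcloud container clusters create standard-cluster \
  --region us-central1 \
  --machine-type e2-standard-4 \
  --num-nodes 2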

GKE Troubleshooting#

GKE adds a layer of Google Cloud infrastructure on top of Kubernetes, which means some problems are pure Kubernetes issues and others are GKE-specific. This guide covers the GKE-specific problems that trip people up.

Autopilot Resource Adjustment#

Autopilot automatically mutates pod resource requests to fit its scheduling model. If you request cpu: 100m and memory: 128Mi, Autopilot may bump the request to cpu: 250m and memory: 512Mi. This affects your billing (you pay per resource request) and can cause unexpected OOMKills if the limits were set relative to the original request.
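
To see what Autopilot actually admitted, compare the live pod spec against your manifest. A quick check (pod name is a placeholder):

# Print the resources Autopilot actually set on the running pod
kubectl get pod my-pod -o jsonpath='{.spec.containers[0].resources}'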

Minikube to Cloud Migration: 10 Things That Change on EKS, GKE, and AKS

Minikube to Cloud Migration Guide#

Minikube is excellent for learning and local development. But almost everything that “just works” on minikube requires explicit configuration on a cloud cluster. Here are the 10 things that change.

1. Ingress Controller Becomes a Cloud Load Balancer#

On minikube: You enable the NGINX ingress addon with minikube addons enable ingress. Traffic reaches your services through minikube tunnel or minikube service.

On cloud: The ingress controller must be deployed explicitly, and it provisions a real cloud load balancer. On AWS, the AWS Load Balancer Controller creates ALBs or NLBs from Ingress resources. On GKE, the built-in GCE ingress controller creates Google Cloud Load Balancers. You pay per load balancer.
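
The difference shows up directly in the workflow; a rough sketch (ingress name is a placeholder):

# minikube: enable the bundled NGINX ingress controller
minikube addons enable ingress

# GKE: applying an Ingress provisions a Google Cloud Load Balancer;
# watch for the external IP to appear once provisioning finishes
kubectl get ingress my-ingress --watch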

Running Windows Workloads on Kubernetes: Node Pools, Scheduling, and Gotchas

Running Windows Workloads on Kubernetes#

Kubernetes supports Windows worker nodes alongside Linux worker nodes in the same cluster. This enables running Windows-native applications – .NET Framework services, IIS-hosted applications, Windows-specific middleware – on Kubernetes without rewriting them for Linux. However, Windows nodes are not interchangeable with Linux nodes. There are fundamental differences in networking, storage, container runtime behavior, and resource management that you must account for.
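
Scheduling is the first place this shows up: Windows nodes register with the standard kubernetes.io/os=windows label, and workloads have to target it explicitly. A quick check, assuming a Windows node pool already exists:

# List Windows nodes to confirm the pool joined the cluster
kubectl get nodes -l kubernetes.io/os=windows

Pods then need a matching nodeSelector on that label; without one, the scheduler may place Linux pods on Windows nodes (or the reverse), where the images cannot run.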

Core Constraints#

Before adding Windows nodes, understand what is and is not supported:

Terraform Cloud Architecture Patterns: VPC/EKS/RDS on AWS, VNET/AKS on Azure, VPC/GKE on GCP

Terraform Cloud Architecture Patterns#

The three-tier architecture — networking, managed Kubernetes, managed database — is the most common pattern for production deployments on any major cloud. The concepts are identical across AWS, Azure, and GCP. The Terraform code is not. Resource names differ, required arguments differ, default behaviors differ, and the gotchas that catch agents and humans are cloud-specific.

This article shows the real Terraform for each layer on each cloud, side by side, so agents can write correct infrastructure code for whichever cloud the user deploys to.