GKE Setup and Configuration

GKE Setup and Configuration#

GKE is Google’s managed Kubernetes service. The two major decisions when creating a cluster are the mode (Standard vs Autopilot) and the networking model (VPC-native is now the default and the only option for new clusters). Everything else – node pools, release channels, Workload Identity – layers on top of those choices.

Standard vs Autopilot#

Standard mode gives you full control over node pools, machine types, and node configuration. You manage capacity, pay per node (whether pods are using the resources or not), and can run DaemonSets, privileged containers, and host-network pods.

GKE Troubleshooting

GKE Troubleshooting#

GKE adds a layer of Google Cloud infrastructure on top of Kubernetes, which means some problems are pure Kubernetes issues and others are GKE-specific. This guide covers the GKE-specific problems that trip people up.

Autopilot Resource Adjustment#

Autopilot automatically mutates pod resource requests to fit its scheduling model. If you request cpu: 100m and memory: 128Mi, Autopilot may bump the request to cpu: 250m and memory: 512Mi. This affects your billing (you pay per resource request) and can cause unexpected OOMKills if the limits were set relative to the original request.

Infrastructure Knowledge Scoping for Agents

Infrastructure Knowledge Scoping for Agents#

An agent working on infrastructure tasks needs to operate at the right level of specificity. Giving generic Kubernetes advice when the user runs EKS with IRSA is unhelpful – the agent misses the IAM integration that will make or break the deployment. Giving EKS-specific advice when the user runs minikube on a laptop is equally unhelpful – the agent references services and configurations that do not exist.

Multi-Cloud Networking Patterns

Multi-Cloud Networking Patterns#

Multi-cloud networking connects workloads across two or more cloud providers into a coherent network. The motivations vary – vendor redundancy, best-of-breed service selection, regulatory requirements – but the challenges are the same: private connectivity between isolated networks, consistent service discovery, and traffic routing that handles failures.

VPN Tunnels Between Clouds#

IPsec VPN tunnels are the simplest way to connect two cloud networks. Each provider offers managed VPN gateways that terminate IPsec tunnels, encrypting traffic between VPCs without exposing it to the public internet.

Running Windows Workloads on Kubernetes: Node Pools, Scheduling, and Gotchas

Running Windows Workloads on Kubernetes#

Kubernetes supports Windows worker nodes alongside Linux worker nodes in the same cluster. This enables running Windows-native applications – .NET Framework services, IIS-hosted applications, Windows-specific middleware – on Kubernetes without rewriting them for Linux. However, Windows nodes are not interchangeable with Linux nodes. There are fundamental differences in networking, storage, container runtime behavior, and resource management that you must account for.

Core Constraints#

Before adding Windows nodes, understand what is and is not supported:

Service Account Security: Tokens, RBAC Binding, and Workload Identity

Service Account Security#

Every pod in Kubernetes runs as a service account. By default, that is the default service account in the pod’s namespace, with an auto-mounted API token that never expires. This default configuration is overly permissive for most workloads. Hardening service accounts is one of the highest-impact security improvements you can make in a Kubernetes cluster.

The Default Problem#

When a pod starts without specifying a service account, Kubernetes does three things:

Terraform Cloud Architecture Patterns: VPC/EKS/RDS on AWS, VNET/AKS on Azure, VPC/GKE on GCP

Terraform Cloud Architecture Patterns#

The three-tier architecture — networking, managed Kubernetes, managed database — is the most common pattern for production deployments on any major cloud. The concepts are identical across AWS, Azure, and GCP. The Terraform code is not. Resource names differ, required arguments differ, default behaviors differ, and the gotchas that catch agents and humans are cloud-specific.

This article shows the real Terraform for each layer on each cloud, side by side, so agents can write correct infrastructure code for whichever cloud the user deploys to.

Upgrading Kubernetes Clusters Safely

Upgrading Kubernetes Clusters Safely#

Kubernetes releases a new minor version roughly every four months. Staying current is not optional – clusters more than three versions behind lose security patches, and skipping versions during upgrade is not supported. Every upgrade must step through each minor version sequentially.

Version Skew Policy#

The version skew policy defines which component version combinations are supported:

  • kube-apiserver instances within an HA cluster can differ by at most 1 minor version.
  • kubelet can be up to 3 minor versions older than kube-apiserver (changed from 2 in Kubernetes 1.28+), but never newer.
  • kube-controller-manager, kube-scheduler, and kube-proxy must not be newer than kube-apiserver and can be up to 1 minor version older.
  • kubectl is supported within 1 minor version (older or newer) of kube-apiserver.

The practical consequence: always upgrade the control plane first, then node pools. Never upgrade nodes past the control plane version.

Cloud Networking Fundamentals: VPCs, Subnets, Security Groups, and Connectivity

VPC Concepts#

A Virtual Private Cloud is an isolated virtual network inside a cloud provider. Every resource you launch – EC2 instances, RDS databases, Lambda functions with VPC access – lives inside a VPC. The VPC defines an IP address range using CIDR notation, and all resources within it get addresses from that range.

The most common mistake is giving every VPC a /16 (65,536 addresses). This wastes IP space and causes problems later when you need to peer VPCs – overlapping CIDR blocks cannot be peered. Plan your IP allocation before building anything.

EKS vs AKS vs GKE: Choosing a Managed Kubernetes Provider

EKS vs AKS vs GKE: Choosing a Managed Kubernetes Provider#

All three major managed Kubernetes services run certified, conformant Kubernetes. The differences lie in networking models, identity integration, node management, upgrade experience, cost, and ecosystem strengths. Your choice should be driven by where the rest of your infrastructure lives, your team’s existing expertise, and specific feature requirements.

Feature Comparison#

Control Plane#

GKE has the most polished upgrade experience. Release channels (Rapid, Regular, Stable) provide automatic upgrades with configurable maintenance windows. Surge upgrades handle node pools with minimal disruption. Google invented Kubernetes, and GKE reflects that pedigree in control plane operations.