GKE Security and Identity#

GKE security covers identity (who can do what), workload isolation (sandboxing untrusted code), supply chain integrity (ensuring only trusted images run), and data protection (encryption at rest). These features layer on top of standard Kubernetes RBAC and network policies.

Workload Identity Federation#

Workload Identity Federation for GKE is the successor to the original Workload Identity. It builds on the standard GCP IAM federation model and, in many cases, removes the need for an intermediary Google Cloud service account: IAM roles can be granted directly to the Kubernetes service account principal. The core idea is unchanged: pods get Google Cloud credentials through their Kubernetes service account, without exported keys. The classic pattern of binding a Kubernetes service account to a Google Cloud service account still works and remains common.
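
A minimal sketch of the classic binding flow (the cluster, project, namespace, and account names are placeholders):

# Enable the workload pool on an existing Standard cluster
gcloud container clusters update my-cluster \
  --region us-central1 \
  --workload-pool=my-project.svc.id.goog

# Allow the Kubernetes service account to impersonate the Google service account
gcloud iam service-accounts add-iam-policy-binding \
  app-gsa@my-project.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:my-project.svc.id.goog[my-namespace/app-ksa]"

# Point the Kubernetes service account at the Google service account
kubectl annotate serviceaccount app-ksa \
  --namespace my-namespace \
  iam.gke.io/gcp-service-account=app-gsa@my-project.iam.gserviceaccount.com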

GKE Setup and Configuration#

GKE is Google’s managed Kubernetes service. The two major decisions when creating a cluster are the mode (Standard vs Autopilot) and the networking model (VPC-native is the default for new clusters and the recommended choice). Everything else – node pools, release channels, Workload Identity – layers on top of those choices.

Standard vs Autopilot#

Standard mode gives you full control over node pools, machine types, and node configuration. You manage capacity, pay per node (whether pods are using the resources or not), and can run DaemonSets, privileged containers, and host-network pods.
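
One way to create a cluster in each mode (the names and region are placeholders):

# Standard: you pick machine types and manage node pools
gcloud container clusters create my-standard-cluster \
  --region us-central1 \
  --num-nodes 1 \
  --machine-type e2-standard-4 \
  --release-channel regular

# Autopilot: Google manages the nodes; you pay per pod resource request
gcloud container clusters create-auto my-autopilot-cluster \
  --region us-central1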

GKE Troubleshooting#

GKE adds a layer of Google Cloud infrastructure on top of Kubernetes, which means some problems are pure Kubernetes issues and others are GKE-specific. This guide covers the GKE-specific problems that trip people up.

Autopilot Resource Adjustment#

Autopilot automatically mutates pod resource requests to fit its scheduling model. If you request cpu: 100m and memory: 128Mi, Autopilot may bump the request to cpu: 250m and memory: 512Mi. This affects your billing (you pay per resource request) and can cause unexpected OOMKills if the limits were set relative to the original request.
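
To see what Autopilot actually admitted, read the resources back from the live pod rather than trusting your manifest (the pod name is a placeholder):

# Compare your manifest's requests with what the cluster admitted
kubectl get pod my-pod -o jsonpath='{.spec.containers[0].resources}'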

GPU and ML Workloads on Kubernetes#

Running GPU workloads on Kubernetes requires hardware-aware scheduling that the default scheduler does not provide out of the box. GPUs are expensive – an NVIDIA A100 node costs $3-12/hour on cloud providers – so efficient utilization matters far more than with CPU workloads. This article covers the full stack from device plugin installation through GPU sharing and monitoring.

The NVIDIA Device Plugin#

Kubernetes has no native understanding of GPUs. The NVIDIA device plugin bridges that gap by exposing GPUs as a schedulable resource (nvidia.com/gpu). Without it, the scheduler has no idea which nodes have GPUs or how many are available.
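
One way to install the plugin and consume the resource it exposes (the Helm release details and pod spec are illustrative; managed platforms such as GKE ship their own plugin deployment):

helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm upgrade --install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace

Once the DaemonSet is running on GPU nodes, pods request GPUs through limits:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # illustrative CUDA base image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # whole GPUs only; the plugin does not split them by default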

HashiCorp Vault on Kubernetes#

Vault centralizes secret management with dynamic credentials, encryption as a service, and fine-grained access control. On Kubernetes, workloads authenticate using service accounts and pull secrets without hardcoding anything.

Installation with Helm#

helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update

Dev Mode (Single Pod, In-Memory)#

Automatically initialized and unsealed, stores everything in memory, loses all data on restart. Root token is root. Never use this in production.

helm upgrade --install vault hashicorp/vault \
  --namespace vault --create-namespace \
  --set server.dev.enabled=true \
  --set injector.enabled=true

Production Mode (HA with Integrated Raft Storage)#

Run Vault in HA mode with Raft consensus – a 3-node StatefulSet with persistent storage.
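
A sketch of the corresponding chart values (the replica count is illustrative; the full set of server.ha.* options is documented in the hashicorp/vault chart):

helm upgrade --install vault hashicorp/vault \
  --namespace vault --create-namespace \
  --set server.ha.enabled=true \
  --set server.ha.raft.enabled=true \
  --set server.ha.replicas=3

# Unlike dev mode, you must initialize and unseal manually
kubectl -n vault exec vault-0 -- vault operator init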

Helm Chart Development#

Writing your own Helm charts turns static YAML into reusable, configurable packages. The learning curve is in Go’s template syntax and Helm’s conventions, but once you internalize the patterns, chart development is fast.

Chart Structure#

Create a new chart scaffold:

helm create my-app

This generates:

my-app/
  Chart.yaml              # chart metadata (name, version, dependencies)
  values.yaml             # default configuration values
  charts/                 # dependency charts (populated by helm dependency update)
  templates/              # Kubernetes manifest templates
    deployment.yaml
    service.yaml
    ingress.yaml
    serviceaccount.yaml
    hpa.yaml
    NOTES.txt             # post-install instructions (printed after helm install)
    _helpers.tpl          # named template definitions
    tests/
      test-connection.yaml # helm test pod

Chart.yaml#

The Chart.yaml defines your chart’s identity and dependencies. A minimal sketch (the name, versions, and the postgresql dependency below are illustrative):
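
apiVersion: v2
name: my-app
description: A Helm chart for my-app
type: application
version: 0.1.0        # chart version; bump on every chart change
appVersion: "1.16.0"  # version of the application being deployed
dependencies:
  - name: postgresql
    version: "12.x.x"
    repository: https://charts.bitnami.com/bitnami
    condition: postgresql.enabled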

Helm Values and Overrides#

Every Helm chart has a values.yaml file that defines defaults. When you install or upgrade a release, you override those defaults through values files (-f) and inline flags (--set). Getting the precedence wrong leads to silent misconfigurations where you think you set something but the chart used a different value.
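
As a sketch of those precedence rules (the chart path, file names, and keys are placeholders): later -f files override earlier ones, and --set overrides every values file.

# values-prod.yaml overrides values-base.yaml; --set overrides both
helm upgrade --install my-app ./my-app \
  -f values-base.yaml \
  -f values-prod.yaml \
  --set image.tag=1.2.3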

Inspecting Chart Defaults#

Before overriding anything, look at what the chart provides. helm show values dumps the full default values.yaml for any chart. For example, against the community ingress-nginx chart (the chart reference here is just an example):
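
helm show values ingress-nginx/ingress-nginx > defaults.yaml

Skim the dump before writing overrides: the keys you override must match the chart’s structure exactly, or Helm will silently ignore them.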

Image Patching and Lifecycle#

Building a container image and deploying it is the easy part. Keeping it patched over weeks, months, and years is where most teams fail. A container image deployed today with zero known vulnerabilities will accumulate CVEs as new vulnerabilities are disclosed against its OS packages, language runtime, and dependencies. You need an automated system that detects stale base images, triggers rebuilds, and rolls out updates safely.
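
The detection step can be as small as a digest comparison. A minimal sketch, assuming crane (from go-containerregistry) is available and the webhook variable is a placeholder for your CI trigger:

#!/bin/sh
# Compare the registry's current digest for the base image
# against the digest recorded when the image was last built.
BASE_IMAGE="ubuntu:22.04"
BUILT_FROM_DIGEST="$1"   # recorded at build time, passed in here

CURRENT_DIGEST=$(crane digest "$BASE_IMAGE")
if [ "$CURRENT_DIGEST" != "$BUILT_FROM_DIGEST" ]; then
  echo "Base image moved to $CURRENT_DIGEST; triggering rebuild"
  curl -X POST "$CI_REBUILD_WEBHOOK"   # placeholder CI trigger
fi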

Ingress Controllers and Routing Patterns#

An Ingress resource defines HTTP routing rules – which hostnames and paths map to which backend Services. But an Ingress resource does nothing on its own. You need an Ingress controller running in the cluster to watch for Ingress resources and configure the actual reverse proxy.
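
A minimal routing rule, as a sketch (the host, service name, and port are placeholders):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  ingressClassName: nginx          # which controller should act on this resource
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80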

Ingress Controllers#

The two most common controllers are nginx-ingress and Traefik.

nginx-ingress (ingress-nginx):

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace

Note: there are two different nginx ingress projects: kubernetes/ingress-nginx (the community project) and nginxinc/kubernetes-ingress (from NGINX Inc). The community version is far more common. Make sure you install from https://kubernetes.github.io/ingress-nginx, not the NGINX Inc chart.

Istio Service Mesh#

Istio adds a proxy sidecar (Envoy) to every pod in the mesh. These proxies handle traffic routing, mutual TLS, retries, circuit breaking, and telemetry without changing application code. The control plane (istiod) pushes configuration to all sidecars.
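
A sketch of standing up a mesh with sidecar injection (the profile and namespace are illustrative):

# Install the control plane
istioctl install --set profile=demo -y

# Label a namespace so new pods get the Envoy sidecar
kubectl label namespace default istio-injection=enabled

# Restart existing workloads so they pick up the sidecar
kubectl rollout restart deployment -n default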

When You Actually Need a Service Mesh#

You need Istio when you have multiple services requiring mTLS, fine-grained traffic control (canary releases, fault injection), or consistent observability across service-to-service communication. If you have fewer than five services, standard Kubernetes Services and NetworkPolicies are sufficient. A service mesh adds operational complexity – more moving parts, higher memory usage per sidecar, and a learning curve for proxy-level debugging.