Ephemeral Cloud Clusters: Create, Validate, Destroy Sequences for EKS, GKE, and AKS

Ephemeral Cloud Clusters#

Ephemeral clusters exist for one purpose: validate something, then disappear. They are not staging environments, not shared dev clusters, not long-lived resources that someone forgets to turn off. The operational model is strict – create, validate, destroy – and the entire sequence must be automated so that destruction cannot be forgotten.

The cost of getting this wrong is real. A three-node EKS cluster left running over a weekend costs roughly $15. Left running for a month, $200. Multiply by the number of developers or CI pipelines that create clusters, and forgotten ephemeral infrastructure becomes a significant budget line item. Every template in this article includes auto-destroy mechanisms to prevent this.

etcd Maintenance for Self-Managed Clusters

etcd Maintenance for Self-Managed Clusters#

etcd is the backing store for all Kubernetes cluster state. Every object – pods, services, secrets, configmaps – lives in etcd. If etcd is unhealthy, your cluster is unhealthy. If etcd data is lost, your cluster is gone. Managed Kubernetes services (EKS, GKE, AKS) handle etcd for you, but self-managed clusters require you to operate it directly.

All etcdctl commands below require TLS flags. Set these as environment variables to avoid repeating them:

Event-Driven Architecture for Microservices

Event-Driven Architecture for Microservices#

In a microservices architecture, services need to communicate. The two fundamental approaches are synchronous (request-response) and asynchronous (event-driven). Most systems use both – the decision is which interactions should be synchronous and which should be event-driven.

Synchronous vs Asynchronous Communication#

Synchronous (request-response): Service A calls Service B and waits for a response. Simple, familiar, and works well when A needs the response to continue. The cost is temporal coupling – if B is down, A fails.

FIPS 140 Compliance: Validated Cryptography, FIPS-Enabled Runtimes, and Kubernetes Deployment

FIPS 140 Compliance#

FIPS 140 (Federal Information Processing Standard 140) is a US and Canadian government standard for cryptographic modules. If you sell software to US federal agencies, process federal data, or operate under FedRAMP, you must use FIPS 140-validated cryptographic modules. Many regulated industries (finance, healthcare, defense) also require or strongly prefer FIPS compliance.

FIPS 140 does not tell you which algorithms to use — it validates that a specific implementation of those algorithms has been tested and certified by an accredited lab (CMVP — Cryptographic Module Validation Program).

From Empty Cluster to Production-Ready: The Complete Setup Sequence

From Empty Cluster to Production-Ready#

This is the definitive operational plan for taking a fresh Kubernetes cluster and making it production-ready. Each phase builds on the previous one, with verification steps between phases and rollback notes where applicable. An agent should be able to follow this sequence end-to-end.

Estimated timeline: 5 days for a single operator. Phases 1-2 are blocking prerequisites. Phases 3-6 can partially overlap.


Phase 1 – Foundation (Day 1)#

Everything else depends on a healthy cluster with proper namespacing and storage. Do not proceed until every verification step passes.

GCP Fundamentals for Agents

Projects and Organization#

GCP organizes resources into Projects, which sit under Folders and an Organization. A project is the fundamental unit of resource organization, billing, and API enablement. Every GCP resource belongs to exactly one project.

# Set the active project
gcloud config set project my-prod-project

# List all projects
gcloud projects list

# Create a new project
gcloud projects create staging-project-2026 \
  --name="Staging" \
  --organization=ORG_ID

# Enable required APIs (must be done per-project)
gcloud services enable compute.googleapis.com
gcloud services enable container.googleapis.com
gcloud services enable sqladmin.googleapis.com

Check which project is currently active:

GCP Terraform Patterns: Projects, GKE, Workload Identity, Cloud SQL, and Common Gotchas

GCP Terraform Patterns#

GCP’s Terraform provider (google and google-beta) has patterns distinct from both AWS and Azure. The biggest differences: APIs must be explicitly enabled per project, IAM uses a binding model (not inline policies), and GKE requires secondary IP ranges for VPC-native networking. GCP resources also tend to have longer creation times with more eventual consistency.

Projects and API Enablement#

Before creating any resource in GCP, the corresponding API must be enabled in the project. This is the most common source of first-time failures.

Git Branching Strategies: Trunk-Based, GitHub Flow, and When to Use What

Git Branching Strategies#

Choosing a branching strategy is choosing your team’s speed limit. The wrong model introduces merge conflicts, stale branches, and release bottlenecks. The right model depends on how you deploy, how big your team is, and how much you trust your test suite.

Trunk-Based Development#

Everyone commits to main (or very short-lived branches that merge within hours). No long-running feature branches. No develop branch. No release branches unless you need to patch old versions.

GitHub Actions Fundamentals: Workflows, Triggers, Jobs, and Data Passing

GitHub Actions Fundamentals#

GitHub Actions is CI/CD built into GitHub. Workflows are YAML files in .github/workflows/. They run on GitHub-hosted or self-hosted machines in response to repository events. No external CI server required.

Workflow File Structure#

Every workflow has three levels: workflow (triggers and config), jobs (parallel units of work), and steps (sequential commands within a job).

name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.23'
      - run: go test ./...

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: golangci/golangci-lint-action@v6

Jobs run in parallel by default. Steps within a job run sequentially. Each job gets a fresh runner – no state carries over between jobs unless you explicitly pass it via artifacts or outputs.

GitHub Actions Kubernetes Pipeline: From Git Push to Helm Deploy

GitHub Actions Kubernetes Pipeline#

This guide builds a complete pipeline: push code, build a container image, validate the Helm chart, and deploy to Kubernetes. Each stage gates the next, so broken images never reach your cluster.

Pipeline Overview#

The pipeline has four stages:

  1. Build and push the container image to GitHub Container Registry (GHCR).
  2. Lint and validate the Helm chart with helm lint and kubeconform.
  3. Deploy to dev automatically on pushes to main.
  4. Promote to staging and production via manual approval.

Complete Workflow File#

# .github/workflows/deploy.yml
name: Build and Deploy

on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      environment:
        description: "Target environment"
        required: true
        type: choice
        options: [dev, staging, production]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      image-tag: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v4

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=
            type=ref,event=branch

      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

  validate:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v4

      - name: Helm lint
        run: helm lint ./charts/my-app -f charts/my-app/values.yaml

      - name: Install kubeconform
        run: |
          curl -sL https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz \
            | tar xz -C /usr/local/bin

      - name: Validate rendered templates
        run: |
          helm template my-app ./charts/my-app \
            --set image.tag=${{ needs.build.outputs.image-tag }} \
            | kubeconform -strict -summary \
              -kubernetes-version 1.29.0

  deploy-dev:
    runs-on: ubuntu-latest
    needs: [build, validate]
    if: github.ref == 'refs/heads/main'
    environment: dev
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v4

      - name: Set up kubeconfig
        run: |
          mkdir -p ~/.kube
          echo "${{ secrets.KUBECONFIG_DEV }}" | base64 -d > ~/.kube/config
          chmod 600 ~/.kube/config

      - name: Deploy with Helm
        run: |
          helm upgrade --install my-app ./charts/my-app \
            --namespace my-app-dev \
            --create-namespace \
            -f charts/my-app/values-dev.yaml \
            --set image.tag=${{ needs.build.outputs.image-tag }} \
            --wait --timeout 300s

      - name: Verify deployment
        run: kubectl rollout status deployment/my-app -n my-app-dev --timeout=120s

  deploy-staging:
    runs-on: ubuntu-latest
    needs: [build, validate, deploy-dev]
    environment: staging
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v4

      - name: Set up kubeconfig
        run: |
          mkdir -p ~/.kube
          echo "${{ secrets.KUBECONFIG_STAGING }}" | base64 -d > ~/.kube/config
          chmod 600 ~/.kube/config

      - name: Deploy with Helm
        run: |
          helm upgrade --install my-app ./charts/my-app \
            --namespace my-app-staging \
            --create-namespace \
            -f charts/my-app/values-staging.yaml \
            --set image.tag=${{ needs.build.outputs.image-tag }} \
            --wait --timeout 300s

  deploy-production:
    runs-on: ubuntu-latest
    needs: [build, validate, deploy-staging]
    environment: production
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v4

      - name: Set up kubeconfig
        run: |
          mkdir -p ~/.kube
          echo "${{ secrets.KUBECONFIG_PROD }}" | base64 -d > ~/.kube/config
          chmod 600 ~/.kube/config

      - name: Deploy with Helm
        run: |
          helm upgrade --install my-app ./charts/my-app \
            --namespace my-app-prod \
            --create-namespace \
            -f charts/my-app/values-production.yaml \
            --set image.tag=${{ needs.build.outputs.image-tag }} \
            --wait --timeout 300s

Key Design Decisions#

Image Tagging with Git SHA#

The docker/metadata-action generates tags from the git SHA. This creates immutable, traceable image tags – you can always identify exactly which commit produced a given deployment.