Ephemeral Cloud Clusters: Create, Validate, Destroy Sequences for EKS, GKE, and AKS

Ephemeral Cloud Clusters#

Ephemeral clusters exist for one purpose: validate something, then disappear. They are not staging environments, not shared dev clusters, not long-lived resources that someone forgets to turn off. The operational model is strict – create, validate, destroy – and the entire sequence must be automated so that destruction cannot be forgotten.

The cost of getting this wrong is real. A three-node EKS cluster left running over a weekend costs roughly $15. Left running for a month, $200. Multiply by the number of developers or CI pipelines that create clusters, and forgotten ephemeral infrastructure becomes a significant budget line item. Every template in this article includes auto-destroy mechanisms to prevent this.
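The simplest enforcement is structural: register destruction before creation, so cleanup runs even when validation fails. A minimal sketch using eksctl and a bash EXIT trap (cluster name, region, and the validation script are illustrative):

#!/usr/bin/env bash
set -euo pipefail

CLUSTER="ephemeral-$(date +%s)"
REGION="us-east-1"

# Register teardown first: the trap fires on success, failure, or interrupt
trap 'eksctl delete cluster --name "$CLUSTER" --region "$REGION" --wait' EXIT

eksctl create cluster --name "$CLUSTER" --region "$REGION" --nodes 3

# Hypothetical validation step – substitute your own checks
./run-validation.sh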

GitHub Actions Kubernetes Pipeline: From Git Push to Helm Deploy

GitHub Actions Kubernetes Pipeline#

This guide builds a complete pipeline: push code, build a container image, validate the Helm chart, and deploy to Kubernetes. Each stage gates the next, so broken images never reach your cluster.

Pipeline Overview#

The pipeline has four stages:

  1. Build and push the container image to GitHub Container Registry (GHCR).
  2. Lint and validate the Helm chart with helm lint and kubeconform.
  3. Deploy to dev automatically on pushes to main.
  4. Promote to staging and production via manual approval.

Complete Workflow File#

# .github/workflows/deploy.yml
name: Build and Deploy

on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      environment:
        description: "Target environment"
        required: true
        type: choice
        options: [dev, staging, production]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      image-tag: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v4

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          # Give the SHA tag top priority so outputs.version is the immutable SHA
          tags: |
            type=sha,prefix=,priority=1000
            type=ref,event=branch

      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

  validate:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v4

      - name: Helm lint
        run: helm lint ./charts/my-app -f charts/my-app/values.yaml

      - name: Install kubeconform
        run: |
          curl -sL https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz \
            | tar xz -C /usr/local/bin

      - name: Validate rendered templates
        run: |
          helm template my-app ./charts/my-app \
            --set image.tag=${{ needs.build.outputs.image-tag }} \
            | kubeconform -strict -summary \
              -kubernetes-version 1.29.0

  deploy-dev:
    runs-on: ubuntu-latest
    needs: [build, validate]
    if: github.ref == 'refs/heads/main'
    environment: dev
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v4

      - name: Set up kubeconfig
        run: |
          mkdir -p ~/.kube
          echo "${{ secrets.KUBECONFIG_DEV }}" | base64 -d > ~/.kube/config
          chmod 600 ~/.kube/config

      - name: Deploy with Helm
        run: |
          helm upgrade --install my-app ./charts/my-app \
            --namespace my-app-dev \
            --create-namespace \
            -f charts/my-app/values-dev.yaml \
            --set image.tag=${{ needs.build.outputs.image-tag }} \
            --wait --timeout 300s

      - name: Verify deployment
        run: kubectl rollout status deployment/my-app -n my-app-dev --timeout=120s

  deploy-staging:
    runs-on: ubuntu-latest
    needs: [build, validate, deploy-dev]
    environment: staging
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v4

      - name: Set up kubeconfig
        run: |
          mkdir -p ~/.kube
          echo "${{ secrets.KUBECONFIG_STAGING }}" | base64 -d > ~/.kube/config
          chmod 600 ~/.kube/config

      - name: Deploy with Helm
        run: |
          helm upgrade --install my-app ./charts/my-app \
            --namespace my-app-staging \
            --create-namespace \
            -f charts/my-app/values-staging.yaml \
            --set image.tag=${{ needs.build.outputs.image-tag }} \
            --wait --timeout 300s

  deploy-production:
    runs-on: ubuntu-latest
    needs: [build, validate, deploy-staging]
    environment: production
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v4

      - name: Set up kubeconfig
        run: |
          mkdir -p ~/.kube
          echo "${{ secrets.KUBECONFIG_PROD }}" | base64 -d > ~/.kube/config
          chmod 600 ~/.kube/config

      - name: Deploy with Helm
        run: |
          helm upgrade --install my-app ./charts/my-app \
            --namespace my-app-prod \
            --create-namespace \
            -f charts/my-app/values-production.yaml \
            --set image.tag=${{ needs.build.outputs.image-tag }} \
            --wait --timeout 300s

Key Design Decisions#

Image Tagging with Git SHA#

The docker/metadata-action generates tags from the git SHA – the type=sha,prefix= rule emits the bare short commit SHA, and its raised priority makes it the version output that the deploy jobs consume. This keeps image tags immutable and traceable: you can always identify exactly which commit produced a given deployment.
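Given those rules, a push to main at a hypothetical commit abc1234 yields two tags:

ghcr.io/your-org/your-app:abc1234
ghcr.io/your-org/your-app:main

The SHA tag is what the deploy jobs pin; the branch tag is a convenient moving pointer.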

Grafana Dashboards for Kubernetes Monitoring

Data Source Configuration#

Grafana connects to backend data stores through data sources. For a complete Kubernetes observability stack, you need three: Prometheus for metrics, Loki for logs, and Tempo for traces.

Provision data sources declaratively so they survive Grafana restarts and are version-controlled:

# grafana/provisioning/datasources/observability.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    # Explicit uid so the datasourceUid cross-references below resolve
    uid: prometheus
    access: proxy
    url: http://prometheus-operated:9090
    isDefault: true
    jsonData:
      timeInterval: "15s"
      exemplarTraceIdDestinations:
        - name: traceID
          datasourceUid: tempo

  - name: Loki
    type: loki
    uid: loki
    access: proxy
    url: http://loki-gateway:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: '"traceID":"(\w+)"'
          url: "$${__value.raw}"
          datasourceUid: tempo

  - name: Tempo
    type: tempo
    uid: tempo
    access: proxy
    url: http://tempo:3200
    jsonData:
      tracesToMetrics:
        datasourceUid: prometheus
        tags: [{key: "service.name", value: "job"}]
      serviceMap:
        datasourceUid: prometheus
      nodeGraph:
        enabled: true

The cross-linking configuration lets you click from a metric data point to the trace that generated it (exemplars), and extracts trace IDs from log lines (derived fields) so you can jump straight to the matching trace in Tempo. The explicit uid on each data source is what the datasourceUid references resolve against.

Grafana Loki for Log Aggregation

Loki Architecture#

Loki is a log aggregation system designed by Grafana Labs. Unlike Elasticsearch, Loki does not index log content. It indexes only metadata labels, then stores compressed log chunks in object storage. This makes it cheaper to operate and simpler to scale, at the cost of slower full-text search across massive datasets.
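That design shapes how queries run: a LogQL query first selects streams by label (cheap, served from the index), then filters line content by scanning the matching chunks. A sketch with illustrative label names:

# Stream selector hits the label index; the line filter scans chunk contents
{namespace="prod", app="checkout"} |= "timeout"

# Metric query: per-second rate of matching lines over a 5-minute window
rate({namespace="prod", app="checkout"} |= "timeout" [5m])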

The core components are:

  • Distributor: Receives incoming log streams from agents, validates labels, and forwards to ingesters via consistent hashing.
  • Ingester: Buffers log data in memory, builds compressed chunks, and flushes them to long-term storage (S3, GCS, filesystem).
  • Querier: Executes LogQL queries by fetching chunk references from the index and reading chunk data from storage.
  • Compactor: Runs periodic compaction on the index (especially for boltdb-shipper) and handles retention enforcement by deleting old data.
  • Query Frontend (optional): Splits large queries into smaller ones, caches results, and distributes work across queriers.

Deployment Modes#

Loki supports three deployment modes, each suited to a different scale: monolithic (every component in one binary, fine for small volumes), simple scalable (separate read and write paths), and microservices (each component deployed and scaled independently).
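For the monolithic mode, a minimal single-binary configuration looks roughly like this (directories, schema date, and storage choices are illustrative):

# loki-config.yaml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h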

Infrastructure Knowledge Scoping for Agents

Infrastructure Knowledge Scoping for Agents#

An agent working on infrastructure tasks needs to operate at the right level of specificity. Giving generic Kubernetes advice when the user runs EKS with IRSA is unhelpful – the agent misses the IAM integration that will make or break the deployment. Giving EKS-specific advice when the user runs minikube on a laptop is equally unhelpful – the agent references services and configurations that do not exist.

Jenkins Debugging: Diagnosing Stuck Builds, Pipeline Failures, Performance Issues, and Kubernetes Agent Problems

Jenkins Debugging#

Jenkins failures fall into a few categories: builds stuck waiting, cryptic pipeline errors, performance degradation, and Kubernetes agent pods that refuse to launch.

Builds Stuck in Queue#

When a build sits in the queue and never starts, check the queue tooltip in the UI – it tells you why. Common causes:

No agents with matching labels. The pipeline requests agent { label 'docker-arm64' } but no agent has that label. Check Manage Jenkins > Nodes to see available labels.
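You can also pull the label inventory from the REST API rather than clicking through the UI – a sketch assuming an API token and jq (the environment variable names are illustrative):

# List every label each agent advertises
curl -s -u "$JENKINS_USER:$JENKINS_API_TOKEN" \
  "$JENKINS_URL/computer/api/json?tree=computer[displayName,assignedLabels[name]]" \
  | jq -r '.computer[] | "\(.displayName): \(.assignedLabels | map(.name) | join(", "))"'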

Jenkins Kubernetes Integration: Dynamic Pod Agents, Pod Templates, and In-Cluster Builds

Jenkins Kubernetes Integration#

The kubernetes plugin gives Jenkins elastic build capacity. Each build spins up a pod, runs its work, and the pod is deleted. No idle agents, no capacity planning, no snowflake build servers.

The Kubernetes Plugin#

The plugin creates agent pods on demand. When a pipeline requests an agent, a pod is created from a template, its JNLP container connects back to Jenkins, the build runs, and the pod is deleted.
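Wired up through JCasC, a minimal cloud definition looks roughly like this (names, namespace, and image are illustrative, and the exact schema depends on your plugin version):

# jenkins.yaml (JCasC excerpt)
jenkins:
  clouds:
    - kubernetes:
        name: "kubernetes"
        serverUrl: "https://kubernetes.default"
        namespace: "jenkins-agents"
        jenkinsUrl: "http://jenkins:8080"
        templates:
          - name: "default"
            label: "k8s-agent"
            containers:
              - name: "jnlp"
                image: "jenkins/inbound-agent:latest"
                workingDir: "/home/jenkins/agent"

A pipeline then requests it with agent { label 'k8s-agent' }.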

Jenkins Setup and Configuration: Installation, JCasC, Plugins, Credentials, and Agents

Jenkins Setup and Configuration#

Jenkins is a self-hosted automation server. Unlike managed CI services, you own the infrastructure, which means you control everything from plugin versions to executor capacity. This guide covers the three main installation methods and the configuration patterns that make Jenkins manageable at scale.

Installation with Docker#

The fastest way to run Jenkins locally or in a VM:

docker run -d \
  --name jenkins \
  -p 8080:8080 \
  -p 50000:50000 \
  -v jenkins_home:/var/jenkins_home \
  jenkins/jenkins:lts-jdk17

Port 8080 is the web UI. Port 50000 is the JNLP agent port for inbound agent connections. The named volume is critical – without it, all configuration and build history live only inside the container and are lost when it is removed.
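On first start, Jenkins generates an initial admin password, printed to the container log and written to disk:

# Either command retrieves the initial admin password
docker logs jenkins
docker exec jenkins cat /var/jenkins_home/secrets/initialAdminPassword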

kind Validation Templates: Cluster Configs and Lifecycle Scripts

kind Validation Templates#

kind (Kubernetes IN Docker) runs Kubernetes clusters using Docker containers as nodes. It was designed for testing Kubernetes itself, which makes it an excellent tool for validating infrastructure changes. It starts fast, uses fewer resources than minikube, and is disposable by design.

This article provides copy-paste cluster configurations and complete lifecycle scripts for common validation scenarios.

Cluster Configuration Templates#

Basic Single-Node#

The simplest configuration. One container acts as both control plane and worker. Sufficient for validating that deployments, services, ConfigMaps, and Secrets work correctly.
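The corresponding config is a single stanza – a bare kind create cluster gives you the same thing, but the explicit file is a useful base to extend:

# kind-basic.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane

Create and tear it down with:

kind create cluster --name validate --config kind-basic.yaml
kind delete cluster --name validate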

Knative: Serverless on Kubernetes

Knative: Serverless on Kubernetes#

Knative brings serverless capabilities to any Kubernetes cluster. Unlike managed serverless platforms, you own the cluster – Knative adds autoscaling to zero, revision-based deployments, and event-driven invocation on top of standard Kubernetes primitives. This gives you the serverless developer experience without vendor lock-in.

Knative has two independent components: Serving (request-driven compute that scales to zero) and Eventing (event routing and delivery). You can install either or both.
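A minimal Serving example shows the model – applying this Service creates a revision that scales to zero when idle (image and annotation values are illustrative):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      annotations:
        # Allow scale-to-zero and cap each revision at 5 pods
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "5"
    spec:
      containers:
        - image: ghcr.io/knative/helloworld-go:latest
          env:
            - name: TARGET
              value: "world"

Every change to the template stamps out a new immutable revision, which is what makes gradual rollouts and instant rollback possible.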