Building a Kubernetes Deployment Pipeline: From Code Push to Production#

A deployment pipeline connects a code commit to a running container in your cluster. This operational sequence walks through building one end-to-end, with decision points at each phase and verification steps to confirm the pipeline works before moving on.

Phase 1 – Source Control and CI#

Step 1: Repository Structure#

Every deployable service needs three things alongside its application code: a Dockerfile, deployment manifests, and a CI pipeline definition.
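
One common layout puts all three next to the source (directory and file names here are illustrative, not prescriptive):

my-service/
├── src/                      # application code
├── Dockerfile
├── charts/my-service/        # Helm chart: deployment manifests
│   ├── Chart.yaml
│   ├── values.yaml
│   └── templates/
└── .github/workflows/
    └── deploy.yml            # CI pipeline definition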

CI/CD Patterns for Monorepos#

A monorepo puts multiple packages, services, and applications in a single repository. This simplifies cross-package changes and dependency management, but it breaks the assumption that most CI systems are built on: one repo means one build. Without careful pipeline design, every commit triggers a full rebuild of everything, and CI becomes the bottleneck.

The Core Problem#

In a monorepo, a commit that touches packages/auth-service/src/handler.ts should build and test auth-service and its dependents, but not billing-service or frontend. Getting this right is the central challenge of monorepo CI.
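
A first approximation in GitHub Actions is a per-service workflow gated by path filters. This is a minimal sketch with hypothetical paths; it catches direct changes, but the shared-dependency list must be maintained by hand, which is why larger monorepos derive the affected set from the dependency graph instead:

# Hypothetical workflow for auth-service only
name: auth-service CI
on:
  push:
    paths:
      - "packages/auth-service/**"
      - "packages/shared/**"   # hand-maintained list of dependencies
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test --workspace=packages/auth-service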

Debugging GitHub Actions#

When a GitHub Actions workflow fails or does not behave as expected, the problem falls into a few predictable categories. This guide covers each category with diagnostic steps and fixes.

Workflow Not Triggering#

The most common GitHub Actions “bug” is a workflow that never runs.

Check the event and branch filter. A push trigger with branches: [main] will not fire for pushes to feature/xyz. For pull_request triggers, the branches filter matches the PR’s base branch, not its head branch:
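
on:
  push:
    branches: [main]       # pushes to feature/xyz do not trigger
  pull_request:
    branches: [main]       # matches PRs whose *base* is main,
                           # whatever the head branch is called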

GitHub Actions Fundamentals#

GitHub Actions is CI/CD built into GitHub. Workflows are YAML files in .github/workflows/. They run on GitHub-hosted or self-hosted machines in response to repository events. No external CI server required.

Workflow File Structure#

Every workflow has three levels: workflow (triggers and config), jobs (parallel units of work), and steps (sequential commands within a job).

name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.23'
      - run: go test ./...

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: golangci/golangci-lint-action@v6

Jobs run in parallel by default. Steps within a job run sequentially. Each job gets a fresh runner – no state carries over between jobs unless you explicitly pass it via artifacts or outputs.
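
For example, a job can expose values through outputs and a downstream job can read them via the needs context (names here are illustrative):

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      version: ${{ steps.ver.outputs.version }}
    steps:
      - id: ver
        run: echo "version=1.2.3" >> "$GITHUB_OUTPUT"

  deploy:
    runs-on: ubuntu-latest
    needs: build    # waits for build and gains access to its outputs
    steps:
      - run: echo "Deploying version ${{ needs.build.outputs.version }}"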

GitHub Actions Kubernetes Pipeline#

This guide builds a complete pipeline: push code, build a container image, validate the Helm chart, and deploy to Kubernetes. Each stage gates the next, so broken images never reach your cluster.

Pipeline Overview#

The pipeline has four stages:

  1. Build and push the container image to GitHub Container Registry (GHCR).
  2. Lint and validate the Helm chart with helm lint and kubeconform.
  3. Deploy to dev automatically on pushes to main.
  4. Promote to staging and production via manual approval.

Complete Workflow File#

# .github/workflows/deploy.yml
name: Build and Deploy

on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      environment:
        description: "Target environment"
        required: true
        type: choice
        options: [dev, staging, production]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      image-tag: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v4

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=
            type=ref,event=branch

      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

  validate:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v4

      - name: Helm lint
        run: helm lint ./charts/my-app -f charts/my-app/values.yaml

      - name: Install kubeconform
        run: |
          curl -sL https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz \
            | tar xz -C /usr/local/bin

      - name: Validate rendered templates
        run: |
          helm template my-app ./charts/my-app \
            --set image.tag=${{ needs.build.outputs.image-tag }} \
            | kubeconform -strict -summary \
              -kubernetes-version 1.29.0

  deploy-dev:
    runs-on: ubuntu-latest
    needs: [build, validate]
    if: github.ref == 'refs/heads/main'
    environment: dev
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v4

      - name: Set up kubeconfig
        run: |
          mkdir -p ~/.kube
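          # Assumes the secret stores a base64-encoded kubeconfig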
          echo "${{ secrets.KUBECONFIG_DEV }}" | base64 -d > ~/.kube/config
          chmod 600 ~/.kube/config

      - name: Deploy with Helm
        run: |
          helm upgrade --install my-app ./charts/my-app \
            --namespace my-app-dev \
            --create-namespace \
            -f charts/my-app/values-dev.yaml \
            --set image.tag=${{ needs.build.outputs.image-tag }} \
            --wait --timeout 300s

      - name: Verify deployment
        run: kubectl rollout status deployment/my-app -n my-app-dev --timeout=120s

  deploy-staging:
    runs-on: ubuntu-latest
    needs: [build, validate, deploy-dev]
    environment: staging
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v4

      - name: Set up kubeconfig
        run: |
          mkdir -p ~/.kube
          echo "${{ secrets.KUBECONFIG_STAGING }}" | base64 -d > ~/.kube/config
          chmod 600 ~/.kube/config

      - name: Deploy with Helm
        run: |
          helm upgrade --install my-app ./charts/my-app \
            --namespace my-app-staging \
            --create-namespace \
            -f charts/my-app/values-staging.yaml \
            --set image.tag=${{ needs.build.outputs.image-tag }} \
            --wait --timeout 300s

  deploy-production:
    runs-on: ubuntu-latest
    needs: [build, validate, deploy-staging]
    environment: production
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v4

      - name: Set up kubeconfig
        run: |
          mkdir -p ~/.kube
          echo "${{ secrets.KUBECONFIG_PROD }}" | base64 -d > ~/.kube/config
          chmod 600 ~/.kube/config

      - name: Deploy with Helm
        run: |
          helm upgrade --install my-app ./charts/my-app \
            --namespace my-app-prod \
            --create-namespace \
            -f charts/my-app/values-production.yaml \
            --set image.tag=${{ needs.build.outputs.image-tag }} \
            --wait --timeout 300s

Key Design Decisions#

Image Tagging with Git SHA#

The docker/metadata-action generates tags from the git SHA. This creates immutable, traceable image tags – you can always identify exactly which commit produced a given deployment.
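
With the tags configuration above, a push of commit 3f2c1ab to main would produce image references along these lines (SHA shortened for illustration):

ghcr.io/<owner>/<repo>:3f2c1ab   # type=sha,prefix= – immutable
ghcr.io/<owner>/<repo>:main      # type=ref,event=branch – moves with the branch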

Load Testing Strategies: Tools, Patterns, and CI Integration#

Why Load Test#

Performance problems discovered in production are expensive. A service that handles 100 requests per second in dev might collapse at 500 in production because connection pools exhaust, garbage collection pauses compound, or a downstream service starts throttling. Load testing reveals these limits before users do.

Load testing answers specific questions: What is the maximum throughput before errors start? At what concurrency does latency degrade beyond acceptable limits? Can the system sustain expected traffic for hours without resource leaks? Will a traffic spike cause cascading failures?
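
As a concrete starting point, a simple HTTP load generator such as hey can probe the first two questions. A sketch, with a placeholder URL and numbers chosen arbitrarily:

# Step concurrency up until the error rate or tail latency degrades.
# -z: test duration, -c: concurrent workers
for c in 10 50 100 250 500; do
  echo "--- concurrency: $c ---"
  hey -z 60s -c "$c" https://staging.example.com/api/orders
done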

Running Terraform in CI/CD Pipelines#

The Core Pattern#

The standard CI/CD pattern for Terraform: run terraform plan on every pull request and post the output as a PR comment. Run terraform apply only when the PR merges to main. This gives reviewers visibility into what will change before approving.

GitHub Actions Workflow#

name: Terraform
on:
  pull_request:
    paths: ["infra/**"]
  push:
    branches: [main]
    paths: ["infra/**"]

permissions:
  id-token: write    # OIDC
  contents: read
  pull-requests: write  # PR comments

jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra
    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-ci
          aws-region: us-east-1

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
          terraform_wrapper: true  # captures stdout for PR comments

      - name: Terraform Init
        run: terraform init -input=false

      - name: Terraform Format Check
        run: terraform fmt -check -recursive

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan
        id: plan
        run: terraform plan -input=false -no-color -out=tfplan
        continue-on-error: true

      - name: Post Plan to PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const plan = `${{ steps.plan.outputs.stdout }}`;
            const truncated = plan.length > 60000
              ? plan.substring(0, 60000) + "\n\n... truncated ..."
              : plan;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `#### Terraform Plan\n\`\`\`\n${truncated}\n\`\`\``
            });

      - name: Plan Status
        if: steps.plan.outcome == 'failure'
        run: exit 1

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -input=false tfplan

The plan step uses continue-on-error: true so the PR comment step still runs. A separate step checks the actual plan outcome. Apply only runs on pushes to main.

Self-Service Infrastructure Patterns#

The Problem Self-Service Solves#

Developers need infrastructure: databases, caches, message queues, storage buckets, DNS records. In most organizations, getting these means filing a ticket, waiting days for someone to provision, and receiving credentials in a Slack DM. This bottleneck incentivizes workarounds — manual console provisioning, skipped security configs, everything crammed into shared databases.

Self-service infrastructure lets developers provision what they need directly, within guardrails the platform team defines. Choose a resource from a catalog, fill in parameters, and the system provisions it and returns connection details. No tickets, no waiting.
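
What a catalog entry looks like depends on the platform. As one illustration, a Crossplane-style claim lets a developer request a database with a few parameters while the platform team’s composition enforces the guardrails behind the abstraction (all names below are hypothetical):

apiVersion: platform.example.org/v1alpha1
kind: PostgreSQLInstance        # abstraction published by the platform team
metadata:
  name: orders-db
  namespace: team-orders
spec:
  parameters:
    storageGB: 20
    size: small                 # maps to vetted instance classes
  writeConnectionSecretToRef:
    name: orders-db-conn        # connection details delivered as a Secret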

Setting Up Multi-Environment Infrastructure: Dev, Staging, and Production#

Running a single environment is straightforward. Running three that drift apart silently is where teams lose weeks debugging “it works in dev.” This operational sequence walks through setting up dev, staging, and production environments that stay consistent where it matters and diverge only where intended.

Phase 1 – Environment Strategy#

Step 1: Define Environments#

Each environment serves a distinct purpose:

  • Dev: Rapid iteration. Developers deploy frequently, break things, and recover quickly. Data is disposable. Resources are minimal.
  • Staging: Production mirror. Same Kubernetes version, same network policies, same resource quotas. External services point to staging endpoints. Used for integration testing and pre-release validation.
  • Production: Real users, real data. Changes go through approval gates. Monitoring is comprehensive and alerting reaches on-call engineers.
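
In Helm terms, the intended divergence between these environments lives in per-environment values files while the chart itself stays shared. A sketch, with illustrative numbers:

# values-dev.yaml – minimal and disposable
replicaCount: 1
resources:
  requests: { cpu: 100m, memory: 128Mi }

# values-production.yaml – sized for real traffic
replicaCount: 3
resources:
  requests: { cpu: 500m, memory: 512Mi }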

Step 2: Isolation Model#

Decision point: Separate clusters per environment versus namespaces in a shared cluster.
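
If the answer is namespaces in a shared cluster, the boundary is enforced with quotas and network policies rather than physical separation. A minimal sketch of the quota side:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: env-quota
  namespace: my-app-dev
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    pods: "20"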

Software Bill of Materials and Vulnerability Management#

What Is an SBOM#

A Software Bill of Materials is a machine-readable inventory of every component in a software artifact. It lists packages, libraries, versions, licenses, and dependency relationships. An SBOM answers the question: what exactly is inside this container image, binary, or repository?

When a new CVE drops, organizations without SBOMs scramble to determine which systems are affected. Organizations with SBOMs query a database and have the answer in seconds.
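
As one example toolchain (an assumption; many scanners offer equivalents), Anchore’s syft generates the SBOM and grype matches it against vulnerability databases:

# Generate an SPDX-format SBOM for a container image at build time
syft ghcr.io/example/my-app:3f2c1ab -o spdx-json > sbom.spdx.json

# When a new CVE drops: scan the stored SBOM instead of re-pulling images
grype sbom:./sbom.spdx.json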