Developer Self-Service Workflows

The Cost of Not Having Self-Service#

A developer needs a PostgreSQL database. They file a ticket. It sits in a backlog for two days. A DBA provisions it and sends credentials via Slack DM. Elapsed time: three days. Actual work: five minutes of configuration. Multiply that across every database, cache, queue, and namespace, and manual provisioning becomes the single largest drag on velocity. Self-service lets developers provision pre-approved resources directly, within guardrails the platform team defines.

Choosing a Kubernetes Policy Engine#

Kubernetes does not enforce security best practices by default. A freshly deployed cluster allows containers to run as root, pull images from any registry, mount the host filesystem, and use the host network. Policy engines close this gap by intercepting API requests through admission webhooks and rejecting or modifying resources that violate your rules.

The three main options – Pod Security Admission (built-in), OPA Gatekeeper, and Kyverno – serve different needs. Choosing the wrong one leads to either insufficient enforcement or unnecessary operational burden.
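
To make the enforcement model concrete, here is a minimal Kyverno ClusterPolicy sketch that rejects Pods that do not opt out of running as root. The policy name and message are illustrative, and the pattern checks only the pod-level securityContext; production policies (such as those in Kyverno's policy library) typically also check per-container fields.

# Minimal Kyverno sketch; name and message are illustrative.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-run-as-nonroot
spec:
  validationFailureAction: Enforce   # reject violations instead of just auditing them
  rules:
    - name: pod-level-run-as-nonroot
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Pods must set securityContext.runAsNonRoot: true."
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true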

Implementing Compliance as Code#

Compliance as code encodes compliance requirements as machine-readable policies evaluated automatically, continuously, and with every change. Instead of quarterly spreadsheet audits, a policy like “all S3 buckets must have encryption enabled” becomes a check that runs in CI, blocks non-compliant Terraform plans, and scans running infrastructure hourly. Evidence generation is automatic. Drift is detected immediately.

Step 1: Map Compliance Controls to Technical Policies#

Translate your compliance framework’s controls into specific, testable technical requirements. This mapping bridges auditor language and infrastructure code.
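
The mapping itself can live in version control next to the policies. The sketch below uses a hypothetical schema (the control ID, tool names, and check names are illustrative) to show how one auditor-facing control fans out into concrete checks at different stages:

# Hypothetical control-to-policy mapping; IDs, tools, and schedules are illustrative.
controls:
  - id: CC6.1                              # SOC 2 common criteria reference
    statement: "Data at rest is encrypted."
    policies:
      - check: s3-bucket-encryption-enabled
        tool: checkov                      # static scan of Terraform code
        stage: pull-request                # blocks non-compliant plans in CI
      - check: s3-bucket-encryption-enabled
        tool: aws-config                   # scans running infrastructure
        stage: runtime
        schedule: hourly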

Running Terraform in CI/CD Pipelines

The Core Pattern#

The standard CI/CD pattern for Terraform: run terraform plan on every pull request and post the output as a PR comment. Run terraform apply only when the PR merges to main. This gives reviewers visibility into what will change before approving.

GitHub Actions Workflow#

name: Terraform
on:
  pull_request:
    paths: ["infra/**"]
  push:
    branches: [main]
    paths: ["infra/**"]

permissions:
  id-token: write    # OIDC
  contents: read
  pull-requests: write  # PR comments

jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra
    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-ci
          aws-region: us-east-1

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
          terraform_wrapper: true  # captures stdout for PR comments

      - name: Terraform Init
        run: terraform init -input=false

      - name: Terraform Format Check
        run: terraform fmt -check -recursive

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan
        id: plan
        run: terraform plan -input=false -no-color -out=tfplan
        continue-on-error: true

      - name: Post Plan to PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const plan = `${{ steps.plan.outputs.stdout }}`;
            const truncated = plan.length > 60000
              ? plan.substring(0, 60000) + "\n\n... truncated ..."
              : plan;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `#### Terraform Plan\n\`\`\`\n${truncated}\n\`\`\``
            });

      - name: Plan Status
        if: steps.plan.outcome == 'failure'
        run: exit 1

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -input=false tfplan

The plan step uses continue-on-error: true so the PR comment step still runs even when the plan fails; the separate Plan Status step then fails the job if the plan did not succeed. Apply runs only on pushes to main, reusing the tfplan file produced earlier in the same job.

Self-Service Infrastructure Patterns

The Problem Self-Service Solves#

Developers need infrastructure: databases, caches, message queues, storage buckets, DNS records. In most organizations, getting these means filing a ticket, waiting days for someone to provision them, and receiving credentials in a Slack DM. This bottleneck incentivizes workarounds — manual console provisioning, skipped security configs, everything crammed into shared databases.

Self-service infrastructure lets developers provision what they need directly, within guardrails the platform team defines. A developer chooses a resource from a catalog, fills in parameters, and the system provisions it and returns connection details. No tickets, no waiting.
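
What the developer actually touches can be as small as a single manifest. The sketch below follows the Crossplane claim pattern; the API group, kind, and parameter names are hypothetical stand-ins for whatever abstraction your platform team publishes in the catalog.

# Hypothetical self-service claim (Crossplane-style); API group, kind, and
# parameters are placeholders for your platform's own catalog entries.
apiVersion: platform.example.org/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: orders-db
  namespace: team-orders
spec:
  parameters:
    version: "16"
    storageGB: 20
  writeConnectionSecretToRef:
    name: orders-db-conn                   # credentials arrive as a Secret, not a Slack DM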

Static Validation Patterns#

Static validation catches infrastructure errors before anything is deployed. No cluster needed, no cloud credentials needed, no cost incurred. These tools analyze configuration files – Helm charts, Kubernetes manifests, Terraform modules, Kustomize overlays – and report problems that would cause failures at deploy time.

Static validation does not replace integration testing. It cannot verify that a service starts successfully, that a pod can pull its image, or that a database accepts connections. What it catches are structural errors: malformed YAML, invalid API versions, missing required fields, policy violations, deprecated resources, and misconfigured values. In practice, this covers roughly 40% of infrastructure issues – the ones that are cheapest to find and cheapest to fix.
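
As a starting point, the job below wires a few of these tools into CI. The tools themselves (helm, kubeconform, terraform fmt/validate) are real; the chart path and directory layout are illustrative and assume a repository with charts/my-app and infra/ directories.

name: static-validate
on:
  pull_request:

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install kubeconform
        run: |
          curl -sSL https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz \
            | sudo tar -xz -C /usr/local/bin kubeconform

      - name: Lint chart and schema-check rendered manifests
        run: |
          helm lint charts/my-app
          helm template charts/my-app | kubeconform -strict -summary

      - name: Terraform static checks       # no cloud credentials needed
        working-directory: infra
        run: |
          terraform init -backend=false -input=false
          terraform fmt -check -recursive
          terraform validate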

Testing Infrastructure Code#

Infrastructure code has a unique testing challenge: the thing you are testing is expensive to instantiate. You cannot spin up a VPC, an RDS instance, and an EKS cluster for every pull request and tear them down five minutes later without significant cost and time. But you also cannot ship untested infrastructure changes to production without risk.

The solution is the same as in software engineering: a testing pyramid. Fast, cheap tests at the bottom catch most errors. Slower, expensive tests at the top catch the rest. The key is knowing what to test at which level.
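
At the top of the pyramid, an integration job applies real infrastructure to an ephemeral workspace, verifies it, and always tears it down. A sketch of such a job (it assumes cloud credentials are configured, for example via OIDC as in the earlier workflow, and that scripts/smoke-test.sh is a hypothetical verification script):

  integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # cloud credentials assumed configured here (e.g. via OIDC)
      - name: Apply to an ephemeral workspace
        working-directory: infra
        run: |
          terraform init -input=false
          terraform workspace new ci-${{ github.run_id }}
          terraform apply -auto-approve -input=false
      - name: Verify the provisioned resources
        run: ./scripts/smoke-test.sh        # hypothetical smoke test
      - name: Destroy
        if: always()                        # tear down even when the test fails
        working-directory: infra
        run: |
          terraform destroy -auto-approve -input=false
          terraform workspace select default
          terraform workspace delete ci-${{ github.run_id }}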

Zero Trust Architecture#

Zero trust means no implicit trust. A request from inside the corporate network is treated with the same suspicion as a request from the public internet. Every request must prove who it is, what it is allowed to do, and that it is coming from a healthy device or service — regardless of network location.
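
In a service mesh, for example, this can be enforced per request with authorization policies keyed to workload identity rather than IP ranges. A sketch using Istio (the namespace, service account, and label names are illustrative):

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: orders-allow-frontend
  namespace: orders
spec:
  selector:
    matchLabels:
      app: orders
  action: ALLOW
  rules:
    - from:
        - source:
            # mTLS-verified workload identity, not a source IP
            principals: ["cluster.local/ns/frontend/sa/web"]
      to:
        - operation:
            methods: ["GET", "POST"]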

This is not a product you buy. It is an architectural approach that requires changes to authentication, authorization, network design, and monitoring.