Implementing Compliance as Code

Compliance as code encodes compliance requirements as machine-readable policies that are evaluated automatically, continuously, and with every change. Instead of quarterly spreadsheet audits, a policy like “all S3 buckets must have encryption enabled” becomes a check that runs in CI, blocks non-compliant Terraform plans, and scans running infrastructure hourly. Evidence is generated automatically, and drift is detected at the next scan rather than at the next audit.

Step 1: Map Compliance Controls to Technical Policies

Translate your compliance framework’s controls into specific, testable technical requirements. This mapping bridges auditor language and infrastructure code.
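As a rough illustration of such a mapping, the sketch below pairs a control with a testable check over simplified resource records. The control label `ENC-01`, the `Policy` class, and the `encryption_enabled` field are all hypothetical; a real setup would take its control IDs from the framework in use and its resource data from a Terraform plan or cloud inventory.

```python
from dataclasses import dataclass
from typing import Callable

# A resource is a plain dict here (hypothetical shape), e.g. derived from
# a Terraform plan or a cloud inventory export.
Resource = dict


@dataclass(frozen=True)
class Policy:
    control_id: str                      # hypothetical label, not a real framework ID
    description: str                     # the auditor-facing requirement
    applies_to: str                      # resource type the check targets
    check: Callable[[Resource], bool]    # returns True when compliant


POLICIES = [
    Policy(
        control_id="ENC-01",
        description="All S3 buckets must have encryption enabled",
        applies_to="s3_bucket",
        check=lambda r: bool(r.get("encryption_enabled")),
    ),
]


def evaluate(resources, policies=POLICIES):
    """Return (control_id, resource name) for every failed check."""
    violations = []
    for resource in resources:
        for policy in policies:
            if resource.get("type") != policy.applies_to:
                continue
            if not policy.check(resource):
                violations.append((policy.control_id, resource.get("name", "<unnamed>")))
    return violations
```

A CI job could fail the build whenever `evaluate` returns a non-empty list, and the list itself can be stored as audit evidence.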

Incident Management Lifecycle

Incident Lifecycle Overview

An incident is an unplanned disruption to a service requiring coordinated response. The lifecycle has six phases: detection, triage, communication, mitigation, resolution, and review. Each has defined actions, owners, and exit criteria.
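One way to picture the six phases and their exit criteria is as a small state machine. This is only a sketch: the `Phase` enum and `next_phase` helper are hypothetical names, and the strictly linear ordering is a simplification, since in practice communication typically continues alongside mitigation.

```python
from enum import Enum


class Phase(Enum):
    DETECTION = 1
    TRIAGE = 2
    COMMUNICATION = 3
    MITIGATION = 4
    RESOLUTION = 5
    REVIEW = 6


def next_phase(current: Phase, exit_criteria_met: bool) -> Phase:
    """Advance only when the current phase's exit criteria are satisfied."""
    if not exit_criteria_met or current is Phase.REVIEW:
        # Stay put: either the phase is unfinished or the lifecycle is complete.
        return current
    return Phase(current.value + 1)
```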

Phase 1: Detection

Incidents are detected through three channels. Automated monitoring is best – alerts fire on SLO violations or error thresholds before users notice. Internal reports come from other teams noticing issues with dependencies. Customer reports are worst case – if users detect your incidents first, your observability has gaps.

Infrastructure Capacity Planning: Measurement, Projection, and Scaling

What Capacity Planning Solves

Running out of capacity during a traffic spike causes outages. Over-provisioning wastes money continuously. Capacity planning is the process of measuring what you use now, projecting what you will need, and ensuring resources are available before demand arrives. Without it, you are either constantly firefighting resource exhaustion or explaining to finance why your cloud bill doubled.

Capacity planning is not a one-time exercise. It is a recurring process – monthly for fast-growing services, quarterly for stable ones.
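The measure-then-project loop can be sketched with a simple linear trend. The example below assumes evenly spaced usage samples (one per review period) and a hypothetical `headroom` fraction at which you want to act; real growth is often not linear, so treat the result as a first estimate.

```python
import math


def fit_line(samples):
    """Least-squares slope and intercept for evenly spaced samples (x = 0, 1, 2, ...)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_exhausted(samples, capacity, headroom=0.8):
    """Whole periods after the last sample until projected usage reaches
    headroom * capacity. Returns None when usage is flat or shrinking."""
    slope, intercept = fit_line(samples)
    if slope <= 0:
        return None
    limit = headroom * capacity
    x_cross = (limit - intercept) / slope   # period index where the trend hits the limit
    last_x = len(samples) - 1
    return max(0, math.ceil(x_cross - last_x))
```

For example, with monthly samples of 100, 110, 120, and 130 units against a limit of 160, the trend line reaches the limit three months after the last sample.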

Infrastructure Knowledge Scoping for Agents

An agent working on infrastructure tasks needs to operate at the right level of specificity. Giving generic Kubernetes advice when the user runs EKS with IRSA is unhelpful – the agent misses the IAM integration that will make or break the deployment. Giving EKS-specific advice when the user runs minikube on a laptop is equally unhelpful – the agent references services and configurations that do not exist.

Infrastructure Security Testing Approaches

Choosing the Right Testing Approach

Infrastructure security testing is not one activity. It is a spectrum from fully automated scanning to manual adversarial testing. Each approach has different costs, coverage, and compliance implications. Choosing the wrong approach wastes budget on low-value scans or leaves critical gaps unexamined.

The core decision is: what are you trying to learn, and what constraints do you operate under?

Decision Matrix

| Question | Automated Scanning | Kubernetes-Specific Testing | Network Scanning | Manual Penetration Testing |
| --- | --- | --- | --- | --- |
| What does it find? | Known CVEs, misconfigurations, missing patches | K8s-specific misconfigurations, RBAC issues, pod security gaps | Open ports, exposed services, protocol weaknesses | Business logic flaws, chained exploits, privilege escalation paths |
| How often? | Continuous or daily | Every cluster change, weekly minimum | Weekly to monthly | Annually or after major architecture changes |
| Who runs it? | Automated pipeline or security team | Platform/SRE team | Security team or automated | Specialized pentest firm or red team |
| Cost | Low (tooling cost only) | Low (open-source tools) | Low to medium | High ($20k-$100k+ per engagement) |
| False positive rate | Medium to high | Low | Medium | Very low |
| Compliance fit | PCI-DSS 11.2, SOC2 CC7.1 | CIS Kubernetes Benchmark | PCI-DSS 11.2, NIST 800-53 | PCI-DSS 11.3, SOC2 CC4.1 |

When to Use Each Approach

Use automated scanning when you need continuous visibility into known vulnerabilities across your infrastructure. This is the baseline. Every organization should run automated scans regardless of what other testing they do.

Ingress Controllers and Routing Patterns

An Ingress resource defines HTTP routing rules – which hostnames and paths map to which backend Services. But an Ingress resource does nothing on its own. You need an Ingress controller running in the cluster to watch for Ingress resources and configure the actual reverse proxy.
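To make the shape of those routing rules concrete, here is a small sketch that builds a `networking.k8s.io/v1` Ingress manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts. The helper name `ingress_manifest`, the host `app.example.com`, and the service names are hypothetical; the `nginx` class name assumes an ingress-nginx controller with its default IngressClass.

```python
import json


def ingress_manifest(name, host, routes, ingress_class="nginx"):
    """Build an Ingress; routes is a list of (path, service_name, port) tuples."""
    paths = [
        {
            "path": path,
            "pathType": "Prefix",
            "backend": {"service": {"name": service, "port": {"number": port}}},
        }
        for path, service, port in routes
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            # Selects which controller should act on this resource.
            "ingressClassName": ingress_class,
            "rules": [{"host": host, "http": {"paths": paths}}],
        },
    }


if __name__ == "__main__":
    manifest = ingress_manifest(
        "web",
        "app.example.com",
        [("/api", "api-svc", 8080), ("/", "frontend-svc", 80)],
    )
    print(json.dumps(manifest, indent=2))
```

Applying this manifest has no effect until a controller for the named class is running in the cluster.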

Ingress Controllers

The two most common controllers are nginx-ingress and Traefik.

nginx-ingress (ingress-nginx):

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx --namespace ingress-nginx --create-namespace

Note: there are two different NGINX ingress projects: kubernetes/ingress-nginx (community) and nginxinc/kubernetes-ingress (NGINX Inc). The community version is far more common. Make sure you install from https://kubernetes.github.io/ingress-nginx, not the NGINX Inc chart.

Integrating Infrastructure as Code with CI/CD: Patterns for Safe, Automated Infrastructure Delivery

Integrating Infrastructure as Code with CI/CD

Running Terraform locally works for one person. It breaks down when multiple people (or agents) modify infrastructure concurrently, when changes need review before applying, and when environments (dev/staging/prod) need synchronized promotion. CI/CD pipelines solve this by making the plan-review-apply cycle automated, auditable, and safe.

This article covers the patterns for integrating Terraform into CI/CD, from the basic plan-on-PR flow to multi-directory monorepos with dependency ordering and environment promotion.
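A minimal sketch of the plan step in a plan-on-PR flow might look like the following. It relies on Terraform's `-detailed-exitcode` flag, where exit code 0 means the plan succeeded with no changes, 2 means it succeeded with changes, and 1 means an error. The function names and the `tfplan` output file name are illustrative choices, and the sketch assumes `terraform init` has already run in the target directory.

```python
import subprocess


def interpret_plan_exit_code(code: int) -> str:
    """Map `terraform plan -detailed-exitcode` results to a pipeline outcome."""
    if code == 0:
        return "no-changes"   # nothing to review or apply
    if code == 2:
        return "changes"      # post the plan to the PR for review
    return "error"            # fail the pipeline


def plan(directory: str) -> str:
    """Run a non-interactive plan and save it so apply uses exactly what was reviewed."""
    result = subprocess.run(
        ["terraform", "plan", "-input=false", "-detailed-exitcode", "-out=tfplan"],
        cwd=directory,
    )
    return interpret_plan_exit_code(result.returncode)
```

Saving the plan with `-out` lets a later apply step run `terraform apply tfplan`, so what gets applied is the plan that was reviewed rather than a fresh one.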

Istio Security: mTLS, Authorization Policies, and Egress Control

Istio Security

Istio provides three security capabilities that are difficult to implement without a service mesh: automatic mutual TLS between services, fine-grained authorization policies, and egress traffic control. These features work at the infrastructure layer, meaning applications do not need any code changes.

Automatic mTLS with PeerAuthentication

Istio’s sidecar proxies can automatically encrypt all pod-to-pod traffic with mutual TLS. The key resource is PeerAuthentication. There are three modes:

  • PERMISSIVE – Accepts both plaintext and mTLS traffic. This is the default and exists for migration. Do not leave it in production.
  • STRICT – Requires mTLS for all traffic. Plaintext connections are rejected.
  • DISABLE – Turns off mTLS entirely.

Enable strict mTLS across the entire mesh:
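As a sketch, a mesh-wide policy is a PeerAuthentication with no workload selector placed in Istio's root namespace. The example below builds that manifest as a Python dict and prints JSON that `kubectl apply -f` accepts. It assumes the root namespace is `istio-system` (the usual default) and uses the `security.istio.io/v1beta1` API version; the helper name is hypothetical.

```python
import json


def mesh_wide_strict_mtls(root_namespace="istio-system"):
    """PeerAuthentication without a selector in the root namespace applies mesh-wide."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": root_namespace},
        "spec": {"mtls": {"mode": "STRICT"}},
    }


if __name__ == "__main__":
    print(json.dumps(mesh_wide_strict_mtls(), indent=2))
```

Workloads without a sidecar cannot speak mTLS, so it is worth confirming every client is in the mesh before switching from PERMISSIVE to STRICT.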

Istio Service Mesh: Traffic Management, Security, and Observability

Istio Service Mesh

Istio adds a proxy sidecar (Envoy) to every pod in the mesh. These proxies handle traffic routing, mutual TLS, retries, circuit breaking, and telemetry without changing application code. The control plane (istiod) pushes configuration to all sidecars.

When You Actually Need a Service Mesh

You need Istio when you have multiple services requiring mTLS, fine-grained traffic control (canary releases, fault injection), or consistent observability across service-to-service communication. If you have fewer than five services, standard Kubernetes Services and NetworkPolicies are usually sufficient. A service mesh adds operational complexity – more moving parts, higher memory usage per sidecar, and a learning curve for proxy-level debugging.

Jenkins Debugging: Diagnosing Stuck Builds, Pipeline Failures, Performance Issues, and Kubernetes Agent Problems

Jenkins Debugging

Jenkins failures fall into a few categories: builds stuck waiting, cryptic pipeline errors, performance degradation, and Kubernetes agent pods that refuse to launch.

Builds Stuck in Queue

When a build sits in the queue and never starts, check the queue tooltip in the UI – it tells you why. Common causes:

No agents with matching labels. The pipeline requests agent { label 'docker-arm64' } but no agent has that label. Check Manage Jenkins > Nodes to see available labels.