Human-in-the-Loop Patterns: Approval Gates, Escalation, and Progressive Autonomy

Human-in-the-Loop Patterns

The most common failure mode in agent-driven work is not a wrong answer – it is a correct action taken without permission. An agent that deletes a file to “clean up,” force-pushes a branch to “fix history,” or restarts a service to “apply changes” can cause more damage in one unauthorized action than a dozen wrong answers.

Human-in-the-loop design is not about limiting agent capability. It is about matching autonomy to risk. Safe, reversible actions should proceed without interruption. Dangerous, irreversible actions should require explicit approval. The challenge is building this classification into the workflow without turning every action into a confirmation dialog.
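One way to picture the classification is a small gate that lets safe actions run and routes dangerous ones through an approval callback. This is a minimal sketch, not a prescribed design: the `Action` fields, the `classify` rule, and the `approve` callback are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    SAFE = "safe"            # reversible and local: proceed without asking
    DANGEROUS = "dangerous"  # irreversible or shared: require approval


@dataclass(frozen=True)
class Action:
    # Hypothetical description of an agent action; the fields are assumptions.
    name: str
    reversible: bool
    touches_shared_state: bool


def classify(action: Action) -> Risk:
    """Assumed rule: anything irreversible or shared counts as dangerous."""
    if not action.reversible or action.touches_shared_state:
        return Risk.DANGEROUS
    return Risk.SAFE


def run(action: Action, approve) -> str:
    """Run safe actions directly; call `approve` before dangerous ones."""
    if classify(action) is Risk.SAFE:
        return f"executed {action.name}"
    if approve(action):
        return f"executed {action.name} (approved)"
    return f"blocked {action.name}"


read_file = Action("read_file", reversible=True, touches_shared_state=False)
force_push = Action("force_push", reversible=False, touches_shared_state=True)

print(run(read_file, approve=lambda a: False))   # never asks for approval
print(run(force_push, approve=lambda a: False))  # asks, and is denied
```

Because only the dangerous branch reaches `approve`, safe actions never produce a confirmation prompt, which is the property the paragraph above is after.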

Image Patching and Lifecycle: Keeping Container Images Current

Image Patching and Lifecycle

Building a container image and deploying it is the easy part. Keeping it patched over weeks, months, and years is where most teams fail. A container image deployed today with zero known vulnerabilities will accumulate CVEs as new vulnerabilities are disclosed against its OS packages, language runtime, and dependencies. You need an automated system that detects stale base images, triggers rebuilds, and rolls out updates safely.
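The detection step can be sketched as a comparison between the base image digest recorded at build time and the digest the registry currently serves, with an age limit as a fallback. This is an illustrative sketch only: `DeployedImage`, `needs_rebuild`, the 30-day default, and the digest strings are assumptions, and a real system would fetch digests from the registry rather than hard-code them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DeployedImage:
    # Hypothetical record kept for each deployed image.
    name: str
    base_digest: str     # digest of the base image when this image was built
    built_at: datetime


def needs_rebuild(image, current_base_digest, now, max_age=timedelta(days=30)):
    """Rebuild when the base image has changed upstream or the build is too old."""
    if image.base_digest != current_base_digest:
        return True
    return now - image.built_at > max_age


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
api = DeployedImage("api", "sha256:aaa", datetime(2024, 5, 25, tzinfo=timezone.utc))
worker = DeployedImage("worker", "sha256:aaa", datetime(2024, 3, 1, tzinfo=timezone.utc))

print(needs_rebuild(api, "sha256:aaa", now))     # fresh build, base unchanged
print(needs_rebuild(api, "sha256:bbb", now))     # base image moved upstream
print(needs_rebuild(worker, "sha256:aaa", now))  # base unchanged, build too old
```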

Implementing Compliance as Code

Compliance as code encodes compliance requirements as machine-readable policies evaluated automatically, continuously, and with every change. Instead of quarterly spreadsheet audits, a policy like “all S3 buckets must have encryption enabled” becomes a check that runs in CI, blocks non-compliant Terraform plans, and scans running infrastructure hourly. Evidence generation is automatic. Drift is detected immediately.
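As a rough illustration, the S3 encryption policy can be written as a function over a resource inventory that returns the violating buckets, plus a wrapper that turns the result into a CI exit code. The inventory format here (dictionaries with `address`, `type`, and `encrypted` keys) is a simplified, hypothetical stand-in and not the real Terraform plan JSON schema; `unencrypted_buckets` and `ci_exit_code` are names invented for this sketch.

```python
def unencrypted_buckets(resources):
    """Return addresses of S3 buckets whose `encrypted` flag is not True."""
    return [
        r["address"]
        for r in resources
        if r["type"] == "aws_s3_bucket" and r.get("encrypted") is not True
    ]


def ci_exit_code(resources):
    """Return 0 when compliant and 1 otherwise, so a CI step can fail the build."""
    return 1 if unencrypted_buckets(resources) else 0


# Hypothetical inventory; a real pipeline would derive this from plan output.
resources = [
    {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket", "encrypted": True},
    {"address": "aws_s3_bucket.uploads", "type": "aws_s3_bucket", "encrypted": False},
    {"address": "aws_instance.web", "type": "aws_instance"},
]

print(unencrypted_buckets(resources))
print(ci_exit_code(resources))
```

The list of violating addresses doubles as audit evidence: storing it with each run gives a timestamped record of what was checked and what failed.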

Step 1: Map Compliance Controls to Technical Policies

Translate your compliance framework’s controls into specific, testable technical requirements. This mapping bridges auditor language and infrastructure code.

Incident Management Lifecycle

Incident Lifecycle Overview

An incident is an unplanned disruption to a service requiring coordinated response. The lifecycle has six phases: detection, triage, communication, mitigation, resolution, and review. Each has defined actions, owners, and exit criteria.

Phase 1: Detection

Incidents are detected through three channels. Automated monitoring is best – alerts fire on SLO violations or error thresholds before users notice. Internal reports come from other teams noticing issues with dependencies. Customer reports are the worst case – if users detect your incidents first, your observability has gaps.

Infrastructure Capacity Planning: Measurement, Projection, and Scaling

What Capacity Planning Solves

Running out of capacity during a traffic spike causes outages. Over-provisioning wastes money continuously. Capacity planning is the process of measuring what you use now, projecting what you will need, and ensuring resources are available before demand arrives. Without it, you are either constantly firefighting resource exhaustion or explaining to finance why your cloud bill doubled.

Capacity planning is not a one-time exercise. It is a recurring process – monthly for fast-growing services, quarterly for stable ones.
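The measure-then-project loop can be sketched with a deliberately simple linear model: estimate daily growth from recent usage samples, then compute how many days remain before capacity is exhausted. The function names, the sample values, and the linear-growth assumption are all illustrative; real services often need seasonality or percentile-based models instead.

```python
import math


def daily_growth(samples):
    """Average per-day change across evenly spaced daily usage samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return (samples[-1] - samples[0]) / (len(samples) - 1)


def days_until_exhaustion(current_usage, capacity, growth_per_day):
    """Days until usage reaches capacity, or None if usage is not growing."""
    if growth_per_day <= 0:
        return None
    remaining = capacity - current_usage
    if remaining <= 0:
        return 0
    return math.ceil(remaining / growth_per_day)


# Hypothetical disk usage in GB over four consecutive days, 1000 GB provisioned.
usage_gb = [600, 610, 620, 630]
growth = daily_growth(usage_gb)

print(growth)                                             # 10.0
print(days_until_exhaustion(usage_gb[-1], 1000, growth))  # 37
```

Rerunning a calculation like this on the monthly or quarterly cadence described above turns the projection into an early warning instead of a one-off estimate.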

Infrastructure Knowledge Scoping for Agents

An agent working on infrastructure tasks needs to operate at the right level of specificity. Giving generic Kubernetes advice when the user runs EKS with IRSA is unhelpful – the agent misses the IAM integration that will make or break the deployment. Giving EKS-specific advice when the user runs minikube on a laptop is equally unhelpful – the agent references services and configurations that do not exist.

Infrastructure Security Testing Approaches

Choosing the Right Testing Approach

Infrastructure security testing is not one activity. It is a spectrum from fully automated scanning to manual adversarial testing. Each approach has different costs, coverage, and compliance implications. Choosing the wrong approach wastes budget on low-value scans or leaves critical gaps unexamined.

The core decision is: what are you trying to learn, and what constraints do you operate under?

Decision Matrix

| Question | Automated Scanning | Kubernetes-Specific Testing | Network Scanning | Manual Penetration Testing |
|---|---|---|---|---|
| What does it find? | Known CVEs, misconfigurations, missing patches | K8s-specific misconfigurations, RBAC issues, pod security gaps | Open ports, exposed services, protocol weaknesses | Business logic flaws, chained exploits, privilege escalation paths |
| How often? | Continuous or daily | Every cluster change, weekly minimum | Weekly to monthly | Annually or after major architecture changes |
| Who runs it? | Automated pipeline or security team | Platform/SRE team | Security team or automated | Specialized pentest firm or red team |
| Cost | Low (tooling cost only) | Low (open-source tools) | Low to medium | High ($20k-$100k+ per engagement) |
| False positive rate | Medium to high | Low | Medium | Very low |
| Compliance fit | PCI-DSS 11.2, SOC2 CC7.1 | CIS Kubernetes Benchmark | PCI-DSS 11.2, NIST 800-53 | PCI-DSS 11.3, SOC2 CC4.1 |

When to Use Each Approach

Use automated scanning when you need continuous visibility into known vulnerabilities across your infrastructure. This is the baseline. Every organization should run automated scans regardless of what other testing they do.

Ingress Controllers and Routing Patterns

An Ingress resource defines HTTP routing rules – which hostnames and paths map to which backend Services. But an Ingress resource does nothing on its own. You need an Ingress controller running in the cluster to watch for Ingress resources and configure the actual reverse proxy.

Ingress Controllers

The two most common controllers are nginx-ingress and Traefik.

nginx-ingress (ingress-nginx):

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx --namespace ingress-nginx --create-namespace

Note: there are two different nginx ingress projects: kubernetes/ingress-nginx (community) and nginxinc/kubernetes-ingress (NGINX Inc). The community version is far more common. Make sure you install from https://kubernetes.github.io/ingress-nginx, not the NGINX Inc chart.

Integrating Infrastructure as Code with CI/CD: Patterns for Safe, Automated Infrastructure Delivery

Integrating Infrastructure as Code with CI/CD

Running Terraform locally works for one person. It breaks down when multiple people (or agents) modify infrastructure concurrently, when changes need review before applying, and when environments (dev/staging/prod) need synchronized promotion. CI/CD pipelines solve this by making the plan-review-apply cycle automated, auditable, and safe.

This article covers the patterns for integrating Terraform into CI/CD, from the basic plan-on-PR flow to multi-directory monorepos with dependency ordering and environment promotion.

Istio Security: mTLS, Authorization Policies, and Egress Control

Istio Security

Istio provides three security capabilities that are difficult to implement without a service mesh: automatic mutual TLS between services, fine-grained authorization policies, and egress traffic control. These features work at the infrastructure layer, meaning applications do not need any code changes.

Automatic mTLS with PeerAuthentication

Istio’s sidecar proxies can automatically encrypt all pod-to-pod traffic with mutual TLS. The key resource is PeerAuthentication. There are three modes:

  • PERMISSIVE – Accepts both plaintext and mTLS traffic. This is the default and exists for migration. Do not leave it in production.
  • STRICT – Requires mTLS for all traffic. Plaintext connections are rejected.
  • DISABLE – Turns off mTLS entirely.

Enable strict mTLS across the entire mesh: