Terraform Safety for Agents: Plans, Applies, and the Human Approval Gate

Terraform Safety for Agents

Terraform is the most dangerous tool most agents have access to. A single terraform apply can create, modify, or destroy real infrastructure — databases with production data, networking that carries live traffic, security groups that protect running services. There is no undo button. terraform destroy is not an undo — it is a different destructive action.

This article defines the safety protocols agents must follow when working with Terraform: what to check before every plan, how to read plan output for danger, how to present plans to humans, when to apply vs when to stop, and how to handle state conflicts.
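As one illustration of reading plan output for danger, the sketch below scans Terraform's machine-readable plan for resources that would be deleted or replaced. It assumes the plan was saved with terraform plan -out=tfplan and rendered with terraform show -json tfplan; the function name find_destructive_changes and the sample resource addresses are hypothetical, and a real check would likely look at more than deletions.

```python
import json


def find_destructive_changes(plan_json: str) -> list[tuple[str, str]]:
    """Return (address, kind) for each resource the plan would destroy or replace.

    Relies on the `resource_changes[].change.actions` list in Terraform's
    JSON plan representation. A replacement shows up as both "delete" and
    "create" in the same actions list.
    """
    plan = json.loads(plan_json)
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" not in actions:
            continue  # create, update, and no-op do not remove anything
        kind = "replace" if "create" in actions else "destroy"
        flagged.append((rc["address"], kind))
    return flagged


# Hypothetical plan fragment, for illustration only.
sample_plan = json.dumps({
    "resource_changes": [
        {"address": "aws_db_instance.main", "change": {"actions": ["delete"]}},
        {"address": "aws_instance.web", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["update"]}},
    ]
})

for address, kind in find_destructive_changes(sample_plan):
    print(f"STOP and ask a human: {address} would be {kind}d")
```

An agent following this pattern would refuse to apply whenever the returned list is non-empty and would present the flagged addresses to a human instead.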

Testing Infrastructure Code: The Validation Pyramid from Lint to Integration

Testing Infrastructure Code

Infrastructure code has a unique testing challenge: the thing you are testing is expensive to instantiate. You cannot spin up a VPC, an RDS instance, and an EKS cluster for every pull request and tear them down five minutes later without significant cost and time. But you also cannot ship untested infrastructure changes to production without risk.

The solution is the same as in software engineering: a testing pyramid. Fast, cheap tests at the bottom catch most errors. Slower, expensive tests at the top catch the rest. The key is knowing what to test at which level.
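The cheap-first ordering can be sketched as a small runner that stops at the first failing level, so slower checks never run against code that already failed a faster one. The stage list here is an assumption for illustration: terraform fmt -check and terraform validate are real CLI commands, but which stages belong at which level is a choice each team makes, and run_pyramid is a hypothetical helper.

```python
import subprocess
from typing import Callable, Optional, Sequence

Stage = tuple[str, list[str]]

# Cheapest checks first. The stage selection is illustrative, not prescriptive.
# Note: `terraform validate` expects an initialized working directory.
STAGES: list[Stage] = [
    ("format", ["terraform", "fmt", "-check", "-recursive"]),
    ("validate", ["terraform", "validate"]),
]


def run_command(cmd: list[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_pyramid(
    stages: Sequence[Stage],
    runner: Callable[[list[str]], int] = run_command,
) -> Optional[str]:
    """Run stages in order; return the name of the first failing stage, or None."""
    for name, cmd in stages:
        if runner(cmd) != 0:
            return name  # stop here: later stages are slower and costlier
    return None
```

Passing the runner as a parameter keeps the ordering logic testable without Terraform installed; in CI, calling run_pyramid(STAGES) would use the real commands.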

Validation Path Selection: Choosing the Right Approach for Infrastructure Testing

Validation Path Selection

Not every infrastructure change needs a full Kubernetes cluster to validate. Some changes can be verified with a linter in under a second. Others genuinely need a multi-node cluster with ingress, persistent volumes, and network policies. The cost of choosing wrong is real in both directions: too little validation lets broken configs reach production, while too much wastes minutes or hours on environments you did not need.

Choosing an Infrastructure as Code Tool: Terraform vs Pulumi vs CloudFormation/Bicep vs Crossplane

Choosing an Infrastructure as Code Tool

Infrastructure as Code tools differ in language, state management, provider ecosystem, and operational model. The choice affects how your team writes, reviews, tests, and maintains infrastructure definitions for years. Switching IaC tools mid-project is possible but expensive – it typically means rewriting all definitions and carefully importing existing resources into the new tool’s state.

Decision Criteria

Before comparing tools, establish what matters to your organization:

Terraform State Management Patterns

Why Remote State

Terraform stores the mapping between your configuration and real infrastructure in a state file. By default this is a local terraform.tfstate file. That breaks the moment a second person or a CI pipeline needs to run terraform apply. Remote state solves three problems: team collaboration (everyone reads the same state), CI/CD access (pipelines need state without copying files), and disaster recovery (your laptop dying should not lose your infrastructure mapping).