Agent-Oriented Terraform: Linear Patterns for Machine-Managed Infrastructure

Agent-Oriented Terraform#

Most Terraform code is written by humans for humans. It favors abstraction, DRY principles, and deep module nesting — patterns that make sense when a human maintains a mental model of the codebase. Agents do not maintain mental models. They read code fresh each time, trace references to resolve dependencies, and reason about the full resource graph in a single context window.

The patterns that make Terraform elegant for humans make it expensive for agents. Deep module nesting multiplies the files an agent must read. Variable threading through three layers of modules hides dependencies behind indirection. Complex for_each over maps of objects creates resources that are invisible until runtime. The agent spends most of its context on navigation, not comprehension.

AWS Terraform Patterns: IAM, Networking, EKS, RDS, and Common Gotchas

AWS Terraform Patterns#

AWS is the most common Terraform target and the most complex. It has more services, more configuration options, and more subtle gotchas than Azure or GCP. This article covers the AWS-specific patterns that agents need to write correct, secure Terraform — with emphasis on the mistakes that cause real production issues.

IAM: The Foundation of Everything#

Every AWS resource that does anything needs IAM permissions. The two patterns agents must know: service roles (letting AWS services act on your behalf) and IRSA (letting Kubernetes pods assume IAM roles).

Azure Terraform Patterns: Resource Groups, AKS, Managed Identity, and Common Gotchas

Azure Terraform Patterns#

Azure’s Terraform provider (azurerm) has its own idioms, naming conventions, and gotchas that differ significantly from AWS. The biggest differences: everything lives in a Resource Group, identity management uses Managed Identity (not IAM roles), and many services require explicit Private DNS Zone configuration for private networking.

Resource Groups: Azure’s Organizational Unit#

Every Azure resource belongs to a Resource Group. This is the first thing you create and the last thing you delete.

Building Machine Images with Packer: Templates, Builders, Provisioners, and CI/CD

Building Machine Images with Packer#

Machine images (AMIs, Azure Managed Images, GCP Images) are the foundation of immutable infrastructure. Instead of provisioning a base OS and configuring it at boot, you build a pre-configured image and launch instances from it. Packer automates this process: it launches a temporary instance, runs provisioners to configure it, creates an image from the result, and destroys the temporary instance.

This operational sequence walks through building, testing, and managing machine images with Packer from template creation through CI/CD integration.

Cloud Migration Strategies: The 7 Rs Framework

Cloud Migration Strategies#

A company does not “migrate to the cloud” – it migrates dozens or hundreds of applications, each with different characteristics, dependencies, and risk profiles. The 7 Rs framework provides vocabulary for per-workload decisions, but selecting the right R requires understanding the application, its dependencies, and the organization’s tolerance for change.

The 7 Rs#

Rehost (Lift and Shift)#

Move the application to cloud infrastructure with minimal changes. A VM on-premises becomes an EC2 instance. OS, application code, and configuration remain the same.

Diagnosing Common Terraform Problems

Stuck State Lock#

A CI job was cancelled, a laptop lost network, or a process crashed mid-apply. Terraform refuses to run:

Error acquiring the state lock
Lock Info:
  ID:        f8e7d6c5-b4a3-2109-8765-43210fedcba9
  Operation: OperationTypeApply
  Who:       deploy@ci-runner
  Created:   2026-02-20 09:15:22 +0000 UTC

Verify the lock holder is truly dead. Check CI job status, then:

terraform force-unlock f8e7d6c5-b4a3-2109-8765-43210fedcba9

If the lock was from a crashed apply, the state may be partially updated. Run terraform plan immediately after unlocking to see the current situation.

Docker Compose Patterns for Local Development

Multi-Service Stack Structure#

A typical local development stack has an application, a database, and maybe a cache or message broker. The compose file should read top-to-bottom like a description of your system.

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    env_file:
      - .env
    volumes:
      - ./src:/app/src
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: myapp
      POSTGRES_PASSWORD: localdev
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./db/init.sql:/docker-entrypoint-initdb.d/init.sql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U myapp"]
      interval: 5s
      timeout: 3s
      retries: 5

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

volumes:
  pgdata:

depends_on and Healthchecks#

The depends_on field controls startup order, but without a condition it only waits for the container to start, not for the service inside to be ready. A Postgres container starts in under a second, but the database process takes several seconds to accept connections. Use condition: service_healthy paired with a healthcheck to block until the dependency is actually ready.

Ephemeral Cloud Clusters: Create, Validate, Destroy Sequences for EKS, GKE, and AKS

Ephemeral Cloud Clusters#

Ephemeral clusters exist for one purpose: validate something, then disappear. They are not staging environments, not shared dev clusters, not long-lived resources that someone forgets to turn off. The operational model is strict – create, validate, destroy – and the entire sequence must be automated so that destruction cannot be forgotten.

The cost of getting this wrong is real. A three-node EKS cluster left running over a weekend costs roughly $15. Left running for a month, $200. Multiply by the number of developers or CI pipelines that create clusters, and forgotten ephemeral infrastructure becomes a significant budget line item. Every template in this article includes auto-destroy mechanisms to prevent this.

GCP Terraform Patterns: Projects, GKE, Workload Identity, Cloud SQL, and Common Gotchas

GCP Terraform Patterns#

GCP’s Terraform provider (google and google-beta) has patterns distinct from both AWS and Azure. The biggest differences: APIs must be explicitly enabled per project, IAM uses a binding model (not inline policies), and GKE requires secondary IP ranges for VPC-native networking. GCP resources also tend to have longer creation times with more eventual consistency.

Projects and API Enablement#

Before creating any resource in GCP, the corresponding API must be enabled in the project. This is the most common source of first-time failures.

HAProxy Configuration and Operations

Configuration File Structure#

HAProxy uses a single configuration file, typically /etc/haproxy/haproxy.cfg. The configuration is divided into four sections: global (process-level settings), defaults (default values for all proxies), frontend (client-facing listeners), and backend (server pools).

global
    log /dev/log local0
    log /dev/log local1 notice
    maxconn 50000
    user haproxy
    group haproxy
    daemon
    stats socket /run/haproxy/admin.sock mode 660 level admin
    tune.ssl.default-dh-param 2048

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    timeout http-request 10s
    timeout http-keep-alive 10s
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http

The maxconn in the global section sets the hard limit on total simultaneous connections. The timeout values in defaults apply to all frontends and backends unless overridden. The stats socket directive enables the runtime API, which is essential for operational management.