Advanced Terraform State Management

Remote Backends#

Every team beyond a single developer needs remote state. The three major backends:

S3 + DynamoDB (AWS):

terraform {
  backend "s3" {
    bucket         = "myorg-tfstate"
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

Azure Blob Storage:

terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "myorgtfstate"
    container_name       = "tfstate"
    key                  = "prod/network/terraform.tfstate"
  }
}

Google Cloud Storage:

terraform {
  backend "gcs" {
    bucket = "myorg-tfstate"
    prefix = "prod/network"
  }
}

All three support locking natively (DynamoDB for S3, blob leases for Azure, object locking for GCS). Always enable encryption at rest and restrict access with IAM.

Agent Context Preservation for Long-Running Workflows: Checkpoints, Sub-Agent Delegation, and Avoiding Context Pollution

Agent Context Preservation for Long-Running Workflows#

The context window is the single most important constraint in agent-driven work. A single-turn task uses a fraction of it. A multi-hour project fills it, overflows it, and degrades the agent’s reasoning quality long before the task is complete. Agents that work effectively on ambitious projects are not smarter – they manage context better.

This article covers practical, battle-tested patterns for preserving context across long sessions, delegating to sub-agents without losing coherence, and avoiding context pollution – the gradual degradation that happens when irrelevant information accumulates in the working context.

Agent Debugging Patterns: Tracing Decisions in Production

Agent Debugging Patterns#

When an agent produces a wrong answer, the question is always the same: why did it do that? Unlike traditional software where you read a stack trace, agent failures are buried in a chain of LLM decisions, tool calls, and context accumulation. Debugging agents requires specialized observability that captures not just what happened, but what the agent was thinking at each step.

Tracing Agent Decision Chains#

Every agent action follows a decision chain: the model reads its context, decides which tool to call (or whether to respond directly), processes the result, and decides again. To debug failures, you need to see this chain as a structured trace.

Agent Error Handling: Retries, Degradation, and Circuit Breakers

Agent Error Handling#

Agents call tools that call APIs that talk to services that query databases. Every link in that chain can fail. The difference between a useful agent and a frustrating one is what happens when something breaks.

Classify the Failure First#

Before deciding how to handle an error, classify it. The strategy depends entirely on whether the failure is transient or permanent.

Transient failures will likely succeed on retry: network timeouts, rate limits (HTTP 429), server overload (HTTP 503), connection resets, temporary DNS failures. These are the majority of failures in practice.

Agent Memory and Retrieval: Patterns for Persistent, Searchable Agent Knowledge

Agent Memory and Retrieval#

An agent without memory repeats mistakes, forgets context, and relearns the same facts every session. An agent with too much memory wastes context window tokens on irrelevant history and retrieves noise instead of signal. Effective memory sits between these extremes – storing what matters, retrieving what is relevant, and forgetting what is stale.

This reference covers the concrete patterns for building agent memory systems, from simple file-based approaches to production-grade retrieval pipelines.

Agent Runbook Generation: Producing Verified Infrastructure Deliverables

Agent Runbook Generation#

An agent that says “you should probably add a readiness probe to your deployment” is giving advice. An agent that hands you a tested manifest with the readiness probe configured, verified against a real cluster, with rollback steps if the probe misconfigures – that agent is producing a deliverable. The difference matters.

The core thesis of infrastructure agent work is that the output is always a deliverable – a runbook, playbook, tested manifest, or validated configuration – never a direct action on someone else’s systems. This article covers the complete workflow for generating those deliverables: understanding requirements, planning steps, executing in a sandbox, capturing what worked, and packaging the result.

Agent Sandboxing: Isolation Strategies for Execution Environments

Agent Sandboxing#

An AI agent that can execute code, run shell commands, or call APIs needs a sandbox. Without one, a single bad tool call – whether from a bug, a hallucination, or a prompt injection attack – can read secrets, modify production data, or pivot to other systems. This article is a decision framework for choosing the right sandboxing strategy based on your trust level, threat model, and performance requirements.

Agent Security Patterns: Defending Against Injection, Leakage, and Misuse

Agent Security Patterns#

An AI agent with tool access is a program that can read files, call APIs, execute code, and modify systems – driven by natural language input. Every classic security concern applies, plus new attack surfaces unique to LLM-powered systems. This article covers practical defenses, not theoretical risks.

Prompt Injection Defense#

Prompt injection is the most agent-specific security threat. An attacker embeds instructions in data the agent processes – a file, a web page, an API response – and the agent follows those instructions as if they came from the user.

Agent-Friendly API Design: Building APIs That Agents Can Consume

Agent-Friendly API Design#

Most APIs are designed for human developers who read documentation, interpret ambiguous error messages, and adapt their approach based on experience. Agents do not have these skills. They parse structured responses, follow explicit instructions, and fail on ambiguity. An API that is pleasant for humans to use may be impossible for an agent to use reliably.

This reference covers practical patterns for designing APIs – or modifying existing ones – so that agents can consume them effectively.

Agent-Oriented Terraform: Linear Patterns for Machine-Managed Infrastructure

Agent-Oriented Terraform#

Most Terraform code is written by humans for humans. It favors abstraction, DRY principles, and deep module nesting — patterns that make sense when a human maintains a mental model of the codebase. Agents do not maintain mental models. They read code fresh each time, trace references to resolve dependencies, and reason about the full resource graph in a single context window.

The patterns that make Terraform elegant for humans make it expensive for agents. Deep module nesting multiplies the files an agent must read. Variable threading through three layers of modules hides dependencies behind indirection. Complex for_each over maps of objects creates resources that are invisible until runtime. The agent spends most of its context on navigation, not comprehension.