Advanced Terraform State Management

Remote Backends#

Every team beyond a single developer needs remote state. The three major backends:

S3 + DynamoDB (AWS):

terraform {
  backend "s3" {
    bucket         = "myorg-tfstate"
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

Azure Blob Storage:

terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "myorgtfstate"
    container_name       = "tfstate"
    key                  = "prod/network/terraform.tfstate"
  }
}

Google Cloud Storage:

terraform {
  backend "gcs" {
    bucket = "myorg-tfstate"
    prefix = "prod/network"
  }
}

All three support state locking natively: the S3 backend locks through the DynamoDB table, the azurerm backend uses blob leases, and the gcs backend writes a lock file alongside the state object. Always enable encryption at rest, turn on versioning so earlier state is recoverable, and restrict access with IAM.
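For the S3 backend, that means provisioning the bucket and lock table themselves. A minimal sketch, assuming the AWS provider (names are illustrative):

resource "aws_s3_bucket" "tfstate" {
  bucket = "myorg-tfstate"
}

# Versioning keeps prior state versions recoverable
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  versioning_configuration {
    status = "Enabled"
  }
}

# The S3 backend expects a table with a "LockID" string hash key
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

These bootstrap resources are the one piece that cannot live in the state they store; create them once with local state (or by hand) before pointing any backend at them.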

Diagnosing Common Terraform Problems

Stuck State Lock#

A CI job was cancelled, a laptop lost network, or a process crashed mid-apply. Terraform refuses to run:

Error acquiring the state lock
Lock Info:
  ID:        f8e7d6c5-b4a3-2109-8765-43210fedcba9
  Operation: OperationTypeApply
  Who:       deploy@ci-runner
  Created:   2026-02-20 09:15:22 +0000 UTC

Verify the lock holder is truly dead. Check CI job status, then:

terraform force-unlock f8e7d6c5-b4a3-2109-8765-43210fedcba9

If the lock was from a crashed apply, the state may be partially updated. Run terraform plan immediately after unlocking to see how far the interrupted run got before touching anything else.
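One way to make that check explicit, assuming a shell session with access to the same backend:

terraform plan -detailed-exitcode
# exit 0: state matches the configuration; the crashed run left nothing pending
# exit 1: the plan itself errored
# exit 2: the plan succeeded and there are changes to review before re-applying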

Running Terraform in CI/CD Pipelines

The Core Pattern#

The standard CI/CD pattern for Terraform: run terraform plan on every pull request and post the output as a PR comment. Run terraform apply only when the PR merges to main. This gives reviewers visibility into what will change before approving.

GitHub Actions Workflow#

name: Terraform
on:
  pull_request:
    paths: ["infra/**"]
  push:
    branches: [main]
    paths: ["infra/**"]

permissions:
  id-token: write    # OIDC
  contents: read
  pull-requests: write  # PR comments

jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra
    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-ci
          aws-region: us-east-1

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
          terraform_wrapper: true  # captures stdout for PR comments

      - name: Terraform Init
        run: terraform init -input=false

      - name: Terraform Format Check
        run: terraform fmt -check -recursive

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan
        id: plan
        run: terraform plan -input=false -no-color -out=tfplan
        continue-on-error: true

      - name: Post Plan to PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        env:
          # Pass the plan through the environment rather than interpolating
          # it into the script, so backticks or ${ in the plan output cannot
          # break out of the JS template literal.
          PLAN: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            const plan = process.env.PLAN || "";
            const truncated = plan.length > 60000
              ? plan.substring(0, 60000) + "\n\n... truncated ..."
              : plan;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `#### Terraform Plan\n\`\`\`\n${truncated}\n\`\`\``
            });

      - name: Plan Status
        if: steps.plan.outcome == 'failure'
        run: exit 1

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -input=false tfplan

The plan step uses continue-on-error: true so the PR comment step still runs. A separate step checks the actual plan outcome. Apply only runs on pushes to main.

Setting Up Multi-Environment Infrastructure: Dev, Staging, and Production

Running a single environment is straightforward. Running three that drift apart silently is where teams lose weeks debugging “it works in dev.” This operational sequence walks through setting up dev, staging, and production environments that stay consistent where it matters and diverge only where intended.

Phase 1 – Environment Strategy#

Step 1: Define Environments#

Each environment serves a distinct purpose:

  • Dev: Rapid iteration. Developers deploy frequently, break things, and recover quickly. Data is disposable. Resources are minimal.
  • Staging: Production mirror. Same Kubernetes version, same network policies, same resource quotas. External services point to staging endpoints. Used for integration testing and pre-release validation.
  • Production: Real users, real data. Changes go through approval gates. Monitoring is comprehensive and alerting reaches on-call engineers.
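Where environments intentionally diverge (instance sizes, replica counts, retention), encode the difference in per-environment variable files rather than in the configuration itself. A sketch, with illustrative values:

# dev.tfvars
instance_type = "t3.micro"
replica_count = 1

# staging.tfvars
instance_type = "m5.large"
replica_count = 2

# prod.tfvars
instance_type = "m5.large"
replica_count = 6

Each environment is then applied with terraform apply -var-file=<env>.tfvars against its own state, so staging genuinely mirrors production except where a value says otherwise.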

Step 2: Isolation Model#

Decision point: separate clusters per environment versus namespaces in a shared cluster. Separate clusters give the strongest isolation (a dev blast radius cannot reach production) at the cost of running three control planes; namespaces in a shared cluster are cheaper and simpler to operate, but every environment shares one upgrade path and one failure domain.
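Whichever isolation model you pick, keep Terraform state separate per environment. One common repository layout, assuming a directory-per-environment structure (paths are illustrative):

infra/
  modules/            # shared modules every environment consumes
    vpc/
    cluster/
  envs/
    dev/
      main.tf         # calls the shared modules
      backend.tf      # key = "dev/terraform.tfstate"
      dev.tfvars
    staging/
    prod/

Because every environment calls the same modules, drift between them reduces to a diff of a few small per-environment files.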

Terraform Core Concepts and Workflow

Providers, Resources, and Data Sources#

Terraform has three core object types. Providers are plugins that talk to APIs (AWS, Azure, GCP, Kubernetes, GitHub). Resources are the things you create and manage. Data sources read existing objects without managing them.

# providers.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  required_version = ">= 1.5.0"
}

provider "aws" {
  region = var.region
}

# A resource Terraform creates and manages
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  tags = { Name = "main-vpc" }
}

# A data source that reads an existing AMI
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.instance_type
  subnet_id     = aws_vpc.main.id
}

Resources create, update, and delete. Data sources only read. If you need information about something Terraform does not manage, use a data source.

Terraform Modules: Structure, Composition, and Reuse

What Modules Are#

A Terraform module is a directory containing .tf files. Every Terraform configuration is already a module (the “root module”). When you call another module from your root module, that is a “child module.” Modules let you encapsulate a set of resources behind a clean interface of input variables and outputs.

Module Structure#

A well-organized module looks like this:

modules/vpc/
  main.tf           # resource definitions
  variables.tf      # input variables
  outputs.tf        # output values
  versions.tf       # required providers and terraform version
  README.md         # usage documentation

The module itself has no backend, no provider configuration, and no hardcoded values. Everything configurable comes in through variables. Everything downstream consumers need comes out through outputs.
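A minimal sketch of calling such a module from a root module (variable and output names are illustrative):

module "vpc" {
  source     = "../modules/vpc"
  name       = "prod"
  cidr_block = "10.0.0.0/16"
}

# Downstream resources consume the module's outputs
resource "aws_security_group" "web" {
  vpc_id = module.vpc.vpc_id
}

The calling configuration owns the backend and provider configuration; the module stays portable across environments and accounts.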

Terraform State Management Patterns

Why Remote State#

Terraform stores the mapping between your configuration and real infrastructure in a state file. By default this is a local terraform.tfstate file. That breaks the moment a second person or a CI pipeline needs to run terraform apply. Remote state solves three problems: team collaboration (everyone reads the same state), CI/CD access (pipelines need state without copying files), and disaster recovery (your laptop dying should not lose your infrastructure mapping).