---
title: "Diagnosing Common Terraform Problems"
description: "Practical fixes for stuck state locks, dependency cycles, unexpected plan changes, import errors, slow plans, and partial apply recovery."
url: https://agent-zone.ai/knowledge/infrastructure/terraform-debugging/
section: knowledge
date: 2026-02-22
categories: ["infrastructure"]
tags: ["terraform","debugging","state","lifecycle","troubleshooting"]
skills: ["terraform-debugging","infrastructure-as-code","state-recovery"]
tools: ["terraform"]
levels: ["intermediate"]
word_count: 786
formats:
  json: https://agent-zone.ai/knowledge/infrastructure/terraform-debugging/index.json
  html: https://agent-zone.ai/knowledge/infrastructure/terraform-debugging/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Diagnosing+Common+Terraform+Problems
---


## Stuck State Lock

A CI job was cancelled, a laptop lost network, or a process crashed mid-apply. Terraform refuses to run:

```
Error acquiring the state lock
Lock Info:
  ID:        f8e7d6c5-b4a3-2109-8765-43210fedcba9
  Operation: OperationTypeApply
  Who:       deploy@ci-runner
  Created:   2026-02-20 09:15:22 +0000 UTC
```

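Before unlocking, confirm that the process holding the lock is really gone: check the CI job status and ask whoever is named under `Who`. Forcing the lock while another run is still writing can corrupt state. Then release it with the lock ID from the error: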

```bash
terraform force-unlock f8e7d6c5-b4a3-2109-8765-43210fedcba9
```

If the lock came from a crashed `apply`, the state may be only partially updated. Run `terraform plan` right after unlocking to see what Terraform thinks still needs to change.

## Provider Authentication Failures

```
Error: error configuring S3 Backend: no valid credential sources found
```

Check in order: (1) environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, or `AWS_PROFILE`), (2) shared credentials file (`~/.aws/credentials`), (3) instance role or OIDC token. Common traps:

- Running in CI without OIDC configured or credentials injected
- `AWS_PROFILE` set to a profile that does not exist on the CI runner
- MFA-protected profiles that cannot work non-interactively
- Expired SSO session: run `aws sso login --profile your-profile`

For Azure: check `ARM_CLIENT_ID`, `ARM_CLIENT_SECRET`, `ARM_TENANT_ID`, `ARM_SUBSCRIPTION_ID`. For GCP: check `GOOGLE_CREDENTIALS` or `GOOGLE_APPLICATION_CREDENTIALS` path.

## Dependency Cycles

```
Error: Cycle: aws_security_group.a, aws_security_group.b
```

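Two resources reference each other, so Terraform cannot decide which to create first. Security groups are the classic case: each group's inline `ingress` rule names the other group. Fix it by declaring the groups without inline rules and moving each rule into its own resource: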

```hcl
resource "aws_security_group" "a" {
  name   = "sg-a"
  vpc_id = aws_vpc.main.id
}

resource "aws_security_group" "b" {
  name   = "sg-b"
  vpc_id = aws_vpc.main.id
}

resource "aws_security_group_rule" "a_from_b" {
  type                     = "ingress"
  security_group_id        = aws_security_group.a.id
  source_security_group_id = aws_security_group.b.id
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
}

resource "aws_security_group_rule" "b_from_a" {
  type                     = "ingress"
  security_group_id        = aws_security_group.b.id
  source_security_group_id = aws_security_group.a.id
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
}
```

Separate rule resources break the cycle because Terraform can create both groups first, then both rules.
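
For contrast, a cycle like the one in the error above usually comes from inline rules. The following is a minimal sketch of the problematic shape, assuming the same `aws_vpc.main` and the same ports as the fixed version; it is illustrative, not taken from a real configuration:

```hcl
# Anti-pattern: each group's inline ingress block references the other group,
# so neither can be created first and Terraform reports a cycle.
resource "aws_security_group" "a" {
  name   = "sg-a"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.b.id] # a depends on b
  }
}

resource "aws_security_group" "b" {
  name   = "sg-b"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.a.id] # b depends on a
  }
}
```

Avoid mixing inline `ingress`/`egress` blocks with standalone rule resources on the same group; the AWS provider documentation warns that the two approaches conflict and overwrite each other.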

## Plan Shows Unexpected Changes

When `terraform plan` shows changes you did not make, investigate the cause:

**State drift.** Someone modified the resource outside Terraform. Run `terraform plan -refresh-only` to see what drifted, then decide whether to update your config or let Terraform revert the change.

**Provider upgrade changed defaults.** A provider update may interpret attributes differently. Pin provider versions and review changelogs before upgrading.
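
One way to pin versions is a `required_providers` block. This is a sketch: the `~> 5.0` constraint and the `>= 1.5.0` Terraform version are assumed values, so substitute whatever your configuration has been tested against.

```hcl
terraform {
  # Assumed minimum CLI version; adjust to what your team runs
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Allows upgrades within 5.x but blocks a jump to 6.0
      version = "~> 5.0"
    }
  }
}
```

The exact version selected within that range is recorded in `.terraform.lock.hcl`, so commit that file as well.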

**Lifecycle blocks** prevent unwanted changes:

```hcl
resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.medium"

  lifecycle {
    ignore_changes = [ami]   # AMI updates handled by a separate process
  }
}

resource "aws_db_instance" "main" {
  # ...
  lifecycle {
    prevent_destroy = true   # Block accidental deletion
  }
}
```

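`ignore_changes` tells Terraform to disregard differences in the listed attributes when planning updates, so out-of-band changes to them are not reverted. `prevent_destroy` makes any plan that would destroy the resource fail with an error; it only protects the resource while the block remains in the configuration.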

## "Resource Already Exists" on Apply

```
Error: error creating S3 Bucket (my-bucket): BucketAlreadyOwnedByYou
```

The resource exists in AWS, but Terraform state either has no record of it or records something else at that address. Two options:

**Import it** (Terraform starts managing it):

```bash
terraform import aws_s3_bucket.logs my-bucket
```
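
On Terraform 1.5 and later, the same import can be declared in configuration so that it shows up in `terraform plan` before anything is written to state. A minimal sketch, assuming the bucket name `my-bucket` from the error above and a matching resource block:

```hcl
# Declarative import (Terraform 1.5+): planned and applied like any other change
import {
  to = aws_s3_bucket.logs
  id = "my-bucket"
}

# The target resource block must exist in configuration
resource "aws_s3_bucket" "logs" {
  bucket = "my-bucket"
}
```

After a successful apply the `import` block can be deleted; leaving it in place is harmless.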

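**Remove a stale state entry** (if a previous partial apply left the address pointing at the wrong or a deleted object), then import or apply again:

```bash
# Drops the entry from state only; the real resource is not touched
terraform state rm aws_s3_bucket.logs
# Then import the real resource, or plan/apply again
```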

## Slow Plans

Large configurations take minutes to plan because Terraform refreshes every resource via API calls.

**Target specific resources** during development:

```bash
terraform plan -target=module.ecs
terraform plan -target=aws_instance.web
```

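Do not use `-target` in CI or production applies. It plans only the targeted resources and their dependencies, so everything else is ignored and the resulting state can fall out of step with the configuration as a whole.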

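**Increase parallelism** (default is 10; the flag works with both `plan` and `apply`):

```bash
terraform plan -parallelism=30
```

Higher values issue more concurrent API calls, which can trigger provider rate limits.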

**Split large configs** into smaller root modules. If networking, compute, and databases are independent blast radii, they should be separate state files.
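
Once configurations are split, one root module can read another's outputs through the `terraform_remote_state` data source. The sketch below assumes an S3 backend, a hypothetical bucket, state key, and region, and a hypothetical `private_subnet_id` output declared in the networking configuration:

```hcl
# In the compute root module: read outputs published by the networking state
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "myorg-tfstate"             # assumed bucket name
    key    = "network/terraform.tfstate" # hypothetical key
    region = "us-east-1"                 # hypothetical region
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.medium"

  # Only root-level outputs of the other configuration are visible here
  subnet_id = data.terraform_remote_state.network.outputs.private_subnet_id
}
```

This keeps the compute plan from refreshing networking resources, at the cost of an explicit contract (the outputs) between the two states.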

## Version Constraint Conflicts

```
Error: Failed to query available provider packages
Could not retrieve the list of available versions for provider hashicorp/aws:
locked provider registry.terraform.io/hashicorp/aws 4.67.0 does not match
configured version constraint ~> 5.0
```

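The `.terraform.lock.hcl` file records the exact provider versions selected by the last `terraform init`. If you change version constraints in your config, re-run init with `-upgrade`, which picks the newest versions the constraints allow and rewrites the lock file accordingly. Deleting the lock file first is unnecessary:

```bash
terraform init -upgrade
```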

Then commit the new lock file. In a team, coordinate lock file updates to avoid merge conflicts.

## Debugging with TF_LOG

When error messages are not enough, enable debug logging:

```bash
# Levels: TRACE, DEBUG, INFO, WARN, ERROR
TF_LOG=DEBUG terraform plan

# Log to file instead of stderr
TF_LOG=TRACE TF_LOG_PATH=terraform.log terraform plan

# Provider-specific logging
TF_LOG_PROVIDER=TRACE terraform plan
```

`TRACE` is extremely verbose but shows the exact API calls Terraform makes, which helps with "why did the provider send this request?" problems. Trace logs can contain credentials and other sensitive values, so review them before sharing.

## Handling API Rate Limits

With large configurations, Terraform can hit provider API rate limits:

```
Error: error reading S3 Bucket: SlowDown: Please reduce your request rate
```

The AWS provider retries throttled requests automatically, so this error generally means the retries ran out. Reducing parallelism is a blunt but effective fix: `terraform apply -parallelism=5`.
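
If the retries still run out, the AWS provider's `max_retries` argument can raise the ceiling. A sketch, with the region and the value `40` as assumptions (the provider documentation lists 25 as the default):

```hcl
provider "aws" {
  region = "us-east-1" # hypothetical region

  # Retry throttled or transiently failing API calls more times before giving up
  max_retries = 40
}
```

More retries make runs slower rather than lighter on the API, so this complements lower parallelism rather than replacing it.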

## Recovering from Partial Applies

If `terraform apply` fails midway, Terraform still saves state for every resource it finished creating or changing before the error. Run `terraform plan` to see the remaining work, then run `terraform apply` again; Terraform picks up where it left off.

If the partial state is badly broken (rare, usually provider bugs), restore from backup:

```bash
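# Keep a copy of the broken state before touching anything
terraform state pull > broken.tfstate
# Find an earlier version (requires S3 versioning on the state bucket)
aws s3api list-object-versions --bucket myorg-tfstate --prefix prod/terraform.tfstate
aws s3api get-object --bucket myorg-tfstate --key prod/terraform.tfstate \
  --version-id "previous-version-id" restored.tfstate
# The restored file has a lower serial than the current state, so a plain push
# is rejected; -force overrides that check, so verify the file first
terraform state push -force restored.tfstate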
```
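
The recovery above only works if the state bucket keeps old versions. A minimal sketch of enabling that with the AWS provider, assuming the bucket `myorg-tfstate` already exists and is managed from a separate bootstrap configuration (the resource label `tfstate` is hypothetical):

```hcl
# Keep every prior version of the state object so it can be restored later
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = "myorg-tfstate"

  versioning_configuration {
    status = "Enabled"
  }
}
```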

