Advanced Terraform State Management

Remote Backends

Every team beyond a single developer needs remote state. The three major backends:

S3 + DynamoDB (AWS):

terraform {
  backend "s3" {
    bucket         = "myorg-tfstate"
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

Azure Blob Storage:

terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "myorgtfstate"
    container_name       = "tfstate"
    key                  = "prod/network/terraform.tfstate"
  }
}

Google Cloud Storage:

terraform {
  backend "gcs" {
    bucket = "myorg-tfstate"
    prefix = "prod/network"
  }
}

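All three support state locking, though the mechanism differs: the S3 backend relies on the DynamoDB table named in dynamodb_table, azurerm takes a lease on the state blob, and gcs writes a lock object next to the state in the bucket. Always enable encryption at rest and restrict access to the state storage with IAM.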
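For the S3 backend, neither the bucket nor the lock table is created by terraform init, so both have to exist beforehand. The following is a minimal sketch of how they might be provisioned from a separate bootstrap configuration. The bucket and table names are taken from the backend block above; the resource labels (tfstate, terraform_locks), the pay-per-request billing mode, and the choice of SSE-KMS with the AWS-managed key are assumptions for illustration, not requirements. The one fixed detail is that the table's partition key must be a string attribute named LockID.

```hcl
# Bootstrap configuration (assumed to live apart from the stacks that use the backend).

# State bucket; the name matches the "bucket" argument of the backend block.
resource "aws_s3_bucket" "tfstate" {
  bucket = "myorg-tfstate"
}

# Default encryption at rest for every object written to the bucket.
# "aws:kms" with no key ID falls back to the AWS-managed S3 key (an assumption here).
resource "aws_s3_bucket_server_side_encryption_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Lock table; the name matches the "dynamodb_table" argument of the backend block.
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the S3 backend expects exactly this key name

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Keeping these resources in their own small configuration avoids the circular dependency of a stack trying to store its state in a bucket it has not yet created.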

Grafana Mimir for Long-Term Prometheus Storage

Prometheus stores metrics on local disk with a practical retention limit of weeks to a few months. Beyond that, you need a long-term storage solution. Grafana Mimir is a horizontally scalable, multi-tenant time series database designed for exactly this purpose. It is API-compatible with Prometheus – Grafana queries Mimir using the same PromQL, and Prometheus pushes data to Mimir via remote_write.

Mimir is the successor to Cortex. Grafana Labs forked Cortex, rewrote significant portions for performance, and released Mimir under the AGPLv3 license. If you see references to Cortex architecture, the concepts map directly to Mimir with improvements.

Long-Term Metrics Storage: Thanos vs Grafana Mimir vs VictoriaMetrics

The Retention Problem

Prometheus stores metrics on local disk with a default retention of 15 days. Most production teams extend this to 30 or 90 days, but local storage has hard limits. A single Prometheus instance cannot scale disk beyond the node it runs on. It provides no high availability – if the instance goes down, you lose scraping and query access. And each Prometheus instance only sees its own targets, so there is no unified view across clusters or regions.