---
title: "GCP Terraform Patterns: Projects, GKE, Workload Identity, Cloud SQL, and Common Gotchas"
description: "GCP-specific Terraform patterns for the google provider. Covers project and API enablement, VPC networking with secondary ranges, GKE with Workload Identity, Cloud SQL with private service networking, IAM binding patterns, and GCP-specific gotchas that cause silent failures and permission errors."
url: https://agent-zone.ai/knowledge/infrastructure/gcp-terraform-patterns/
section: knowledge
date: 2026-02-22
categories: ["infrastructure"]
tags: ["terraform","gcp","gke","workload-identity","cloud-sql","iam","vpc","secondary-ranges","service-networking","gotchas"]
skills: ["gcp-terraform","gke-setup","workload-identity-patterns","cloud-sql-configuration"]
tools: ["terraform","gcloud"]
levels: ["intermediate"]
word_count: 1387
formats:
  json: https://agent-zone.ai/knowledge/infrastructure/gcp-terraform-patterns/index.json
  html: https://agent-zone.ai/knowledge/infrastructure/gcp-terraform-patterns/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=GCP+Terraform+Patterns%3A+Projects%2C+GKE%2C+Workload+Identity%2C+Cloud+SQL%2C+and+Common+Gotchas
---


# GCP Terraform Patterns

The GCP Terraform providers (`google` and `google-beta`) follow patterns distinct from both AWS and Azure. The biggest differences: APIs must be explicitly enabled per project, IAM uses a binding model (not inline policies), and VPC-native GKE clusters need secondary IP ranges on the subnet. GCP resources also tend to have longer creation times and more eventual consistency.

## Projects and API Enablement

Before creating any resource in GCP, the corresponding API must be enabled in the project. This is the most common source of first-time failures.

```hcl
variable "project_id" {
  type        = string
  description = "GCP project ID (not the project number)"
}

# Enable required APIs
resource "google_project_service" "apis" {
  for_each = toset([
    "compute.googleapis.com",
    "container.googleapis.com",
    "sqladmin.googleapis.com",
    "servicenetworking.googleapis.com",
    "iam.googleapis.com",
    "cloudresourcemanager.googleapis.com",
  ])

  project = var.project_id
  service = each.value

  disable_on_destroy = false  # do not disable API when Terraform destroys
}
```

**Gotcha**: API enablement is eventually consistent. The API might report as enabled before it is fully ready. Make dependent resources `depends_on` the API enablement, and if that alone is not enough, put a short `time_sleep` (from the `hashicorp/time` provider) between the two:

```hcl
resource "time_sleep" "api_warmup" {
  depends_on      = [google_project_service.apis]
  create_duration = "30s"
}

resource "google_container_cluster" "main" {
  depends_on = [time_sleep.api_warmup]
  # ...
}
```
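
`time_sleep` is not part of the `google` provider; it comes from `hashicorp/time`. A minimal sketch of the provider requirements that the examples in this article rely on might look like the following. The version constraints are placeholder assumptions, so pin them to whatever you have actually tested:

```hcl
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 5.0" # assumption: any recent major version
    }
    time = {
      source  = "hashicorp/time" # provides the time_sleep resource
      version = ">= 0.9"         # assumption
    }
  }
}

provider "google" {
  project = var.project_id
}
```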

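**Gotcha**: Set `disable_on_destroy = false` explicitly unless the configuration owns the entire project. When the flag is `true`, `terraform destroy` disables the API, which can break or shut down every resource in the project that depends on that API, including resources managed by other Terraform configurations. Check the default for your provider version rather than relying on it.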

## IAM Binding Patterns

Terraform exposes three resource types for project-level IAM. Using the wrong one causes silent permission overwrites.

```hcl
# google_project_iam_member — ADDITIVE, always safe
# Adds one member to one role. Does not affect other members in that role.
resource "google_project_iam_member" "gke_logging" {
  project = var.project_id
  role    = "roles/logging.logWriter"
  member  = "serviceAccount:${google_service_account.gke_nodes.email}"
}

# google_project_iam_binding — AUTHORITATIVE for the role
# Sets the COMPLETE list of members for a role. Removes anyone not listed.
# DANGEROUS: can silently remove permissions granted by other Terraform configs or manually.
resource "google_project_iam_binding" "editors" {
  project = var.project_id
  role    = "roles/editor"
  members = [
    "user:admin@example.com",
    "serviceAccount:ci@project.iam.gserviceaccount.com",
  ]
  # Anyone else who had roles/editor? Gone.
}

# google_project_iam_policy — AUTHORITATIVE for the ENTIRE project
# Sets ALL IAM bindings for the project. Removes everything not listed.
# EXTREMELY DANGEROUS: can lock you out of the project.
# Almost never use this.
```

**Rule for agents**: Always use `google_project_iam_member`. Never use `google_project_iam_binding` unless you are certain you control all members of that role. Never use `google_project_iam_policy`.

### Service Accounts

```hcl
resource "google_service_account" "app" {
  account_id   = "my-app-sa"
  display_name = "My Application Service Account"
  project      = var.project_id
}

# Grant specific permissions
resource "google_project_iam_member" "app_storage" {
  project = var.project_id
  role    = "roles/storage.objectViewer"
  member  = "serviceAccount:${google_service_account.app.email}"
}

resource "google_project_iam_member" "app_sql" {
  project = var.project_id
  role    = "roles/cloudsql.client"
  member  = "serviceAccount:${google_service_account.app.email}"
}
```
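
Several examples in this article reference `google_service_account.gke_nodes` without defining it. One possible definition is sketched below. The `account_id` is a hypothetical name, and the role list is an assumption about what a typical node service account needs, so adjust it to your workloads:

```hcl
# Dedicated identity for GKE nodes (the account_id is a hypothetical name)
resource "google_service_account" "gke_nodes" {
  account_id   = "gke-nodes"
  display_name = "GKE Node Service Account"
  project      = var.project_id
}

# Additive grants: one google_project_iam_member per role.
# roles/logging.logWriter is left out because the IAM section already grants it.
resource "google_project_iam_member" "gke_nodes" {
  for_each = toset([
    "roles/monitoring.metricWriter",
    "roles/monitoring.viewer",
    "roles/artifactregistry.reader",
  ])

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.gke_nodes.email}"
}
```

Using `for_each` over a set of roles keeps every grant additive while avoiding one hand-written resource block per role.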

**Gotcha**: GCP IAM changes are eventually consistent. They typically take effect within a couple of minutes, but propagation can take 7 minutes or more. If a resource fails with `PERMISSION_DENIED` immediately after a role is granted, suspect a propagation delay before assuming the permission is missing.
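
When a resource exercises a freshly granted permission during its own creation, one workaround is to route the dependency through a `time_sleep`. This is a sketch rather than a guaranteed fix: the 120-second duration is an assumption to tune, the pairing of this grant with the node pool is purely illustrative, and `time_sleep` requires the `hashicorp/time` provider:

```hcl
# Give the role grant time to propagate before anything relies on it
resource "time_sleep" "iam_propagation" {
  depends_on      = [google_project_iam_member.gke_logging]
  create_duration = "120s" # assumption: tune to the delays you observe
}

# Illustrative downstream resource that waits for the grant to settle
resource "google_container_node_pool" "main" {
  depends_on = [time_sleep.iam_propagation]
  # ...
}
```

The sleep only runs when the `time_sleep` resource is created, so it slows down the first apply but not later ones.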

## VPC Networking with Secondary Ranges

VPC-native GKE clusters draw pod and service IPs from secondary IP ranges on the subnet:

```hcl
resource "google_compute_network" "main" {
  name                    = "production-vpc"
  auto_create_subnetworks = false
  project                 = var.project_id
}

resource "google_compute_subnetwork" "gke" {
  name          = "gke-subnet"
  project       = var.project_id
  region        = var.region
  network       = google_compute_network.main.id
  ip_cidr_range = "10.0.0.0/24"    # node IPs

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.1.0.0/16"   # 65K pod IPs (256 nodes at a /24 per node)
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.2.0.0/20"   # 4K service IPs
  }

  private_ip_google_access = true   # nodes can reach Google APIs without external IP
}
```

**Gotcha**: `auto_create_subnetworks = false` is essential. The default (`true`) creates a subnet in every region with /20 CIDRs — almost never what you want.

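**Gotcha**: Secondary range sizing matters. GKE does not hand out pod IPs one at a time; it carves a fixed block out of the pods range for each node, sized to hold roughly twice the node's maximum pod count. With the default 110 pods per node, every node takes a /24 (256 addresses), so a /16 pods range supports at most 256 nodes, not 65K ÷ 110. If you need more nodes from the same range, lower the per-node maximum (for example to 32, which takes a /26 per node).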
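
Because GKE assigns each node a fixed block from the pods range, the node ceiling is a matter of counting blocks, not dividing addresses by pods. The arithmetic can be sketched in Terraform itself. The local and output names here are hypothetical, and the /24 per-node block assumes the default maximum of 110 pods per node:

```hcl
locals {
  pod_range_prefix = 16 # pods secondary range above: 10.1.0.0/16
  per_node_prefix  = 24 # block GKE assigns per node at the default 110 max pods

  # Each node consumes one per-node block, so the ceiling is the number of
  # per-node blocks that fit in the pods range: 2^(24 - 16) = 256.
  max_nodes = pow(2, local.per_node_prefix - local.pod_range_prefix)
}

output "max_nodes_for_pod_range" {
  value = local.max_nodes
}
```

Setting `default_max_pods_per_node = 32` on the cluster shrinks the per-node block to a /26, which raises the ceiling for the same /16 to 2^(26 - 16) = 1024 nodes.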

**Gotcha**: `private_ip_google_access = true` is required for private GKE nodes to reach Artifact Registry, Cloud APIs, and other Google services without NAT.

## GKE Configuration

```hcl
resource "google_container_cluster" "main" {
  name     = "production"
  project  = var.project_id
  location = var.region  # regional cluster (HA across zones)

  network    = google_compute_network.main.id
  subnetwork = google_compute_subnetwork.gke.id

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  # Remove default node pool and manage separately
  remove_default_node_pool = true
  initial_node_count       = 1

  # Workload Identity
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Private cluster
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false  # allow kubectl from internet (or true for fully private)
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # Release channel for auto-upgrades
  release_channel {
    channel = "REGULAR"  # RAPID, REGULAR, or STABLE
  }

  # Network policy enforcement
  network_policy {
    enabled  = true
    provider = "CALICO"
  }

  depends_on = [google_project_service.apis]
}

resource "google_container_node_pool" "main" {
  name     = "production-nodes"
  project  = var.project_id
  location = var.region
  cluster  = google_container_cluster.main.name

  initial_node_count = 3

  autoscaling {
    min_node_count = 2
    max_node_count = 10
  }

  node_config {
    machine_type    = "e2-standard-4"
    service_account = google_service_account.gke_nodes.email

    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]

    workload_metadata_config {
      mode = "GKE_METADATA"  # required for Workload Identity
    }

    shielded_instance_config {
      enable_secure_boot = true
    }
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }
}
```

**Gotcha**: `remove_default_node_pool = true` requires `initial_node_count` to be at least 1. GKE creates the default pool and then immediately deletes it. Without `initial_node_count`, Terraform fails.

**Gotcha**: `master_ipv4_cidr_block` must be a /28 that does not overlap with any subnet in the VPC. Forgetting this produces a confusing error about CIDR range conflicts.
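
A related wrinkle in the node pool above is that it sets `initial_node_count`. The provider documentation warns that a change to this value forces the pool to be recreated, and suggests either omitting it or ignoring later changes. A sketch of the second option, shown as a fragment of the same resource:

```hcl
resource "google_container_node_pool" "main" {
  # ... same arguments as above ...

  initial_node_count = 3

  lifecycle {
    # Treat initial_node_count as a create-time value only, so a manual
    # resize does not trigger a destroy-and-recreate on the next apply.
    ignore_changes = [initial_node_count]
  }
}
```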

### GKE Workload Identity

```hcl
# GCP service account for the workload
resource "google_service_account" "workload" {
  account_id   = "my-app-workload"
  display_name = "My App Workload Identity"
  project      = var.project_id
}

# Allow the K8s service account to impersonate the GCP service account
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = google_service_account.workload.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[default/my-app]"
}

# Grant the GCP SA permissions it needs
resource "google_project_iam_member" "workload_storage" {
  project = var.project_id
  role    = "roles/storage.objectViewer"
  member  = "serviceAccount:${google_service_account.workload.email}"
}

# K8s service account annotated with GCP SA
resource "kubernetes_service_account" "app" {
  metadata {
    name      = "my-app"
    namespace = "default"
    annotations = {
      "iam.gke.io/gcp-service-account" = google_service_account.workload.email
    }
  }
}
```

**Gotcha**: The `member` format for Workload Identity binding is `serviceAccount:{project}.svc.id.goog[{namespace}/{sa-name}]`. The brackets are literal — they are part of the member string, not formatting.
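
The `kubernetes_service_account` resource above needs a configured `kubernetes` provider, which the example does not show. A common way to wire it to the cluster is sketched below, assuming the `hashicorp/kubernetes` provider is declared in `required_providers` and that the machine running Terraform can reach the cluster endpoint:

```hcl
# Short-lived access token for the identity Terraform is running as
data "google_client_config" "default" {}

provider "kubernetes" {
  host  = "https://${google_container_cluster.main.endpoint}"
  token = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(
    google_container_cluster.main.master_auth[0].cluster_ca_certificate
  )
}
```

Configuring a provider from a cluster created in the same apply can be fragile, so many teams keep cluster creation and in-cluster resources in separate configurations.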

## Cloud SQL with Private Networking

```hcl
# Reserve an IP range for service networking
resource "google_compute_global_address" "private_ip" {
  name          = "sql-private-ip"
  project       = var.project_id
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 16
  network       = google_compute_network.main.id
}

# Create the peering connection
resource "google_service_networking_connection" "private_vpc" {
  network                 = google_compute_network.main.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.private_ip.name]

  depends_on = [google_project_service.apis]
}

resource "google_sql_database_instance" "main" {
  name             = "production-postgres"
  project          = var.project_id
  database_version = "POSTGRES_15"
  region           = var.region

  settings {
    tier              = "db-custom-2-8192"
    disk_size         = 50
    disk_autoresize   = true
    availability_type = "REGIONAL"

    ip_configuration {
      ipv4_enabled    = false           # no public IP
      private_network = google_compute_network.main.id
    }

    backup_configuration {
      enabled                        = true
      point_in_time_recovery_enabled = true
      start_time                     = "03:00"
    }

    maintenance_window {
      day  = 7  # Sunday
      hour = 3
    }
  }

  deletion_protection = true

  depends_on = [google_service_networking_connection.private_vpc]
}
```

**Gotcha**: The service networking connection must exist before Cloud SQL can use private IP. The `depends_on` is mandatory — without it, Terraform races and the database creation fails.

**Gotcha**: Cloud SQL instance names must be unique within a project and cannot be reused for up to a week after deletion. If you destroy and recreate, use a different name or wait.
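
One way to sidestep the name-reuse window is to append a random suffix to the instance name. The sketch below assumes the `hashicorp/random` provider is available, and the `db_name_suffix` resource name is illustrative:

```hcl
resource "random_id" "db_name_suffix" {
  byte_length = 4
}

resource "google_sql_database_instance" "main" {
  name = "production-postgres-${random_id.db_name_suffix.hex}"
  # ... remaining arguments as above ...
}
```

The suffix is stored in state and stays stable across applies. It changes only when the `random_id` resource itself is replaced, which you can do deliberately when recreating the database.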

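**Gotcha**: The top-level `deletion_protection = true` is a Terraform provider guard. It only stops Terraform from destroying the instance, and it is separate from both Terraform's `lifecycle { prevent_destroy = true }` and the API-level flag `settings.deletion_protection_enabled`, which also blocks deletion from the console and `gcloud`. For production databases, set the provider guard and the API-level flag, and consider `prevent_destroy` as well.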
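
The layers of deletion protection can be combined on one instance. The sketch below is a fragment of the instance resource above and relies on the provider's `settings.deletion_protection_enabled` argument for the API-level guard:

```hcl
resource "google_sql_database_instance" "main" {
  # ... other arguments as above ...

  # Provider-level guard: Terraform refuses to destroy the instance
  deletion_protection = true

  settings {
    # ... other settings as above ...

    # API-level guard: also blocks deletion via console, gcloud, and API
    deletion_protection_enabled = true
  }

  lifecycle {
    # Terraform core guard: any plan that would destroy this resource errors out
    prevent_destroy = true
  }
}
```

To delete such an instance on purpose, each guard has to be turned off and applied first, which is the point of layering them.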

## Common GCP Terraform Gotchas

| Gotcha | Symptom | Fix |
|---|---|---|
| API not enabled | `googleapi: Error 403: <API> has not been used in project <number> before or it is disabled` | Add `google_project_service` for the API |
| API propagation delay | `PERMISSION_DENIED` after enabling API | Add `time_sleep` or `depends_on` chain |
| IAM eventual consistency | Permission denied after granting role | Wait a few minutes and retry. Not a Terraform issue. |
| `iam_binding` overwrites | Other permissions silently removed | Use `google_project_iam_member`, never `iam_binding` |
| Cloud SQL name reuse | Cannot create instance with recently deleted name | Use unique names or wait up to a week |
| Default network exists | Terraform plan shows unexpected resources | Delete default network or import it |
| GKE secondary ranges required | Cluster creation fails with IP range error | Define secondary ranges on the subnet |
| Private cluster master CIDR | Overlap error with existing ranges | Use a /28 from unused CIDR space (e.g. 172.16.0.0/28) |
| Service networking dependency | Cloud SQL fails without private networking | Add `depends_on` for service networking connection |
| `disable_on_destroy` default | API disabled on `terraform destroy`, cascading deletes | Set `disable_on_destroy = false` on all `google_project_service` |
| Labels vs tags | GCP uses `labels` (key-value) not `tags` (network tags) | Use `labels` for metadata, `tags` for firewall targeting |
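
The labels-versus-tags distinction in the last row is easiest to see on a single resource. In this sketch the instance, the firewall rule, and `var.zone` are hypothetical; the network and subnet are the ones defined earlier:

```hcl
resource "google_compute_instance" "web" {
  name         = "web-1"
  project      = var.project_id
  zone         = var.zone # hypothetical variable, e.g. "us-central1-a"
  machine_type = "e2-small"

  # labels: key-value metadata for billing, filtering, and ownership
  labels = {
    env  = "production"
    team = "platform"
  }

  # tags: network tags, matched by firewall rules and routes
  tags = ["web"]

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    subnetwork = google_compute_subnetwork.gke.id
  }
}

resource "google_compute_firewall" "allow_https" {
  name    = "allow-https-web"
  project = var.project_id
  network = google_compute_network.main.id

  allow {
    protocol = "tcp"
    ports    = ["443"]
  }

  source_ranges = ["0.0.0.0/0"]
  target_tags   = ["web"] # applies only to instances carrying this network tag
}
```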

