Data Consistency in Multi-Region Deployments

Data Consistency in Multi-Region Deployments#

When you replicate data across regions, you are forced to choose between consistency, latency, and availability. You cannot have all three. Every multi-region system makes this tradeoff explicitly or, more dangerously, implicitly by ignoring it until production exposes the consequences.

The Fundamental Tension#

Strong consistency means every read sees the most recent write, regardless of which region it comes from. This requires cross-region coordination on every write (30-100ms per round trip). Eventual consistency means reads might see stale data, but replicas converge given enough time – usually milliseconds to seconds, but during partitions it can be minutes.

Active-Active Architecture Patterns: Multi-Region, Data Replication, and Split-Brain Resolution

What Active-Active Actually Means#

Active-active means both (or all) regions are serving production traffic simultaneously. Not standing by. Not warmed up and waiting. Actually processing real user requests right now. A user in Frankfurt hits the EU region; a user in Virginia hits the US-East region. Both regions are authoritative. Both can read and write.

This is fundamentally different from active-passive, where the secondary region exists but does not serve traffic until failover. The distinction matters because active-active introduces a class of problems that active-passive avoids entirely – primarily, what happens when two regions modify the same data at the same time.

Active-Passive vs Active-Active: Decision Framework for Multi-Region Architecture

The Core Difference#

Active-passive: one region handles all traffic, a second region stands ready to take over. Failover is an event – something triggers it, traffic shifts, and there is a gap between detection and recovery.

Active-active: both regions handle production traffic simultaneously. There is no failover event for regional traffic – if one region fails, the other is already serving users. The complexity is in keeping data consistent across regions, not in switching traffic.

Multi-Region Kubernetes: Service Mesh Federation, Cross-Cluster Networking, and GitOps

Multi-Region Kubernetes#

Running Kubernetes in a single region is a single point of failure at the infrastructure level. Region outages are rare but real – AWS us-east-1 has gone down multiple times, taking entire companies offline. Multi-region Kubernetes addresses this, but it introduces complexity in networking, state management, and deployment coordination that you must handle deliberately.

Independent Clusters with Shared GitOps#

The simplest multi-region pattern: run completely independent clusters in each region, deploy the same applications to all of them using GitOps, and route traffic with DNS or a global load balancer.

Cloud Multi-Region Architecture: AWS, GCP, and Azure Patterns with Terraform

Cloud Multi-Region Architecture Patterns#

Multi-region is not just running clusters in two places. It is the networking between them, the data replication strategy, the traffic routing, and the cost of keeping it all running. Each cloud provider has different primitives and different pricing models. Here is how to build it on each.

The three pillars: a Kubernetes cluster per region for compute, a global traffic routing layer to direct users to the nearest healthy region, and a multi-region database for state. Get any one wrong and multi-region gives you complexity without resilience.

CockroachDB Day-2 Operations

Adding and Removing Nodes#

Adding a node: start a new cockroach process with --join pointing to existing nodes. CockroachDB automatically rebalances ranges to the new node.

cockroach start --insecure --store=node4-data \
  --advertise-addr=node4:26257 \
  --join=node1:26257,node2:26257,node3:26257

Watch rebalancing in the DB Console under Metrics > Replication, or query directly:

SELECT node_id, range_count, lease_count FROM crdb_internal.kv_store_status;

Decommissioning a node moves all range replicas off before shutdown, preventing under-replication:

cockroach node decommission 4 --insecure --host=node1:26257

# Monitor progress
cockroach node status --insecure --host=node1:26257 --decommission

Do not simply kill a node. Without decommissioning, CockroachDB treats it as a failure and waits 5 minutes before re-replicating. On Kubernetes with the operator, scale by changing spec.nodes in the CrdbCluster resource.

Data Sovereignty and Residency: Jurisdictional Requirements, GDPR, and Multi-Region Architecture

Data Sovereignty and Residency#

Data sovereignty is the principle that data is subject to the laws of the country where it is stored or processed. Data residency is the requirement to keep data within a specific geographic boundary. These are not abstract legal concepts — they dictate where you deploy infrastructure, how you replicate data, and what services you can use.

Get this wrong and the consequences are regulatory fines, contract violations, and loss of customer trust. GDPR fines alone have exceeded billions of euros since enforcement began.

Terraform Provider Configuration Patterns: Versioning, Aliasing, Multi-Region, and Authentication

Terraform Provider Configuration Patterns#

Providers are Terraform’s interface to cloud APIs. Misconfiguring them causes resources to be created in the wrong region, with the wrong credentials, or with an incompatible provider version. These failures are often silent — Terraform succeeds, but the resource is in the wrong place.

Version Constraints#

The Required Providers Block#

Every configuration should declare its providers with version constraints:

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.25"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.12"
    }
  }
}

Version Constraint Syntax#

ConstraintMeaningExample
= 5.31.0Exact version onlyPin for maximum reproducibility
~> 5.0Any 5.x (>=5.0.0, <6.0.0)Allow minor + patch updates
~> 5.31Any 5.31.x (>=5.31.0, <5.32.0)Allow patch updates only
>= 5.0, < 6.0RangeSame as ~> 5.0 but explicit
>= 5.0Any version 5.0 or newerDangerous — allows breaking changes

Recommended patterns: