---
title: "Database Cross-Region Replication Patterns"
description: "Cross-region replication strategies for databases — synchronous vs asynchronous tradeoffs, semi-synchronous replication, replication topologies, monitoring lag with PromQL, MySQL Group Replication, PostgreSQL logical replication, and promoting read replicas for DR."
url: https://agent-zone.ai/knowledge/databases/database-cross-region-replication/
section: knowledge
date: 2026-02-22
categories: ["databases"]
tags: ["replication","cross-region","disaster-recovery","postgresql","mysql","monitoring","prometheus","replication-lag","logical-replication","group-replication"]
skills: ["replication-design","cross-region-architecture","monitoring-setup","failover-management"]
tools: ["postgresql","mysql","prometheus","grafana","psql","mysqld"]
levels: ["intermediate","advanced"]
word_count: 1099
formats:
  json: https://agent-zone.ai/knowledge/databases/database-cross-region-replication/index.json
  html: https://agent-zone.ai/knowledge/databases/database-cross-region-replication/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Database+Cross-Region+Replication+Patterns
---


# Database Cross-Region Replication Patterns

Cross-region replication exists because regions fail. AWS us-east-1 has had multiple multi-hour outages. If your database runs in a single region, a regional failure takes your application down entirely. Cross-region replication gives you a copy of the data somewhere else so you can recover.

The fundamental problem is physics. Light through fiber between US East and US West takes about 30ms one way. Every replication strategy is a different answer to the question: do you wait for the remote region to confirm it has the data before telling the client the write succeeded?

## Synchronous Replication -- Zero Data Loss, High Latency Cost

Synchronous cross-region replication means the primary waits for the remote replica to confirm it has written the data to disk before acknowledging the commit. Every write pays the full round-trip latency penalty.

In practice, this means every INSERT, UPDATE, or DELETE takes at least 60ms longer (30ms each way for US East to US West). For a transaction that normally takes 2ms, this is a 30x slowdown. For workloads doing thousands of writes per second, this is rarely acceptable.

```
-- PostgreSQL: enable synchronous replication to a named standby
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby_west)';
SELECT pg_reload_conf();
```

**When synchronous cross-region makes sense:** Financial systems where losing a single transaction means regulatory violations. Systems doing fewer than 100 writes per second where the latency is tolerable. Systems where you can batch writes to amortize the round-trip cost.

## Asynchronous Replication -- Fast Writes, Data Loss Window

Asynchronous replication lets the primary commit immediately and sends WAL/binlog to the replica in the background. The replica is always some amount behind the primary. This gap is your data loss window if the primary region fails.

Under normal load, async lag for cross-region replication is typically 100ms-2 seconds. Under write spikes, lag can grow to minutes. During a network partition between regions, lag grows unbounded until the partition heals.

```
-- PostgreSQL: check replication lag on the standby
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;

-- MySQL: check seconds behind source
SHOW REPLICA STATUS\G
-- Look at Seconds_Behind_Source
```

**The data loss question:** If your primary region dies with 2 seconds of replication lag, you lose 2 seconds of transactions. For most applications, this is acceptable. For payment processing, it is not. Know your RPO before choosing.

## Semi-Synchronous -- The Middle Ground

MySQL offers semi-synchronous replication as a native feature. The primary waits for at least one replica to acknowledge receipt (not replay, just receipt) of the transaction's binlog events before committing. If no replica acknowledges within a timeout, it falls back to async.

```ini
# MySQL source configuration
plugin-load-add = "rpl_semi_sync_source=semisync_source.so"
rpl_semi_sync_source_enabled = 1
rpl_semi_sync_source_wait_for_replica_count = 1
rpl_semi_sync_source_timeout = 5000  # 5 seconds, then fallback to async
```

The timeout fallback is critical. Without it, a network partition between regions would block all writes on the primary. With it, you get zero data loss under normal conditions and graceful degradation to async when the remote region is unreachable.

PostgreSQL does not have a direct equivalent but you can approximate it: set `synchronous_commit = remote_write` which waits for the OS to receive the data but not flush to disk, combined with a monitoring alert if the standby disconnects.

## Replication Topologies

### Star (Hub and Spoke)

One primary replicates directly to all replicas. Every replica gets data with minimal lag. For 2-3 replicas, this is fine. For 5+ replicas, the primary's network bandwidth for WAL sending becomes a concern.

### Chain (Cascading)

Primary replicates to Region B, which replicates to Region C. Reduces primary load but adds cumulative lag. If Primary-to-B lag is 1 second and B-to-C lag is 1 second, Region C is 2 seconds behind.

```
-- PostgreSQL cascading: Region C standby's postgresql.auto.conf
primary_conninfo = 'host=regionb-standby.internal port=5432 user=replicator'
```

Use cascading when you have 4+ regions. Put your most critical DR region on a direct link to the primary. If the intermediate node dies, all downstream replicas lose their source -- automate reconfiguration to handle this.

## MySQL Group Replication for Multi-Region

Group Replication provides multi-primary or single-primary replication with automatic conflict detection. For cross-region, single-primary mode is safer because multi-primary across regions creates high-latency certification conflicts.

```ini
# MySQL Group Replication - single-primary mode
group_replication_single_primary_mode = ON
group_replication_group_name = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
group_replication_enforce_update_everywhere_checks = OFF
```

Group Replication uses Paxos-based consensus internally. Cross-region latency directly impacts commit throughput because each transaction needs a majority of members to certify it. With 3 members across 2 regions, every write waits for at least one cross-region round trip.

## PostgreSQL Logical Replication for Selective Tables

When you only need specific tables replicated across regions (not the entire database), logical replication avoids the overhead of full physical replication.

```sql
-- On the publisher (primary region)
CREATE PUBLICATION orders_pub FOR TABLE orders, order_items, customers;

-- On the subscriber (DR region)
CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=primary.us-east-1.internal dbname=myapp user=replicator'
  PUBLICATION orders_pub;
```

The subscriber can have additional local tables, different indexes, even a different PostgreSQL version. **Limitation:** Logical replication does not replicate DDL. Schema changes must be applied manually on both sides.

## Promoting a Read Replica

When the primary region fails and you need to promote the DR replica:

```bash
# PostgreSQL
pg_ctl promote -D /var/lib/postgresql/16/main
# Or via SQL (PostgreSQL 12+)
SELECT pg_promote();

# MySQL - stop replication and reset
STOP REPLICA;
RESET REPLICA ALL;
```

After promotion, update your application's connection string or DNS. If you use a connection proxy (PgBouncer, ProxySQL), update its backend configuration. The time between "primary is dead" and "DR replica is serving writes" is your actual RTO.

## Monitoring Replication Lag with PromQL

If you expose PostgreSQL metrics via `postgres_exporter` or MySQL metrics via `mysqld_exporter`, these PromQL queries catch replication problems:

```promql
# PostgreSQL: replication lag in seconds (alert if > 30s)
pg_replication_lag_seconds > 30

# PostgreSQL: WAL send lag in bytes (alert if > 100MB)
pg_stat_replication_pg_wal_lsn_diff > 100000000

# MySQL: seconds behind source (alert if > 30s)
mysql_slave_status_seconds_behind_master > 30

# Rate of lag increase (lag growing by > 1s per minute)
rate(pg_replication_lag_seconds[5m]) > 1
```

The rate-of-increase alert is the most important one. A lag of 5 seconds that is stable is fine. A lag of 5 seconds that is growing by 1 second per minute means you will hit your RPO limit in 25 minutes and need to investigate immediately.

### Alerting Thresholds

| Metric | Warning | Critical |
|---|---|---|
| Replication lag (same region) | > 5 seconds | > 30 seconds |
| Replication lag (cross-region) | > 15 seconds | > 60 seconds |
| Lag rate of increase | > 0.5s/min | > 2s/min |
| Replica disconnected | - | > 60 seconds |
| WAL send backlog | > 500 MB | > 2 GB |

Set your critical threshold below your RPO. If your RPO is 60 seconds, critical should fire at 30 seconds to give you time to respond.

