PostgreSQL Disaster Recovery

PostgreSQL Disaster Recovery#

A DR plan for PostgreSQL has three layers: streaming replication for fast failover, WAL archiving for point-in-time recovery, and a backup tool like pgBackRest for managing retention. Each layer covers a different failure mode – replication for server crashes, WAL archiving for data corruption that replicates, full backups for when everything goes wrong.

Streaming Replication for DR#

Synchronous vs Asynchronous – The Core Tradeoff#

Asynchronous replication is the default. The primary streams WAL to the standby, but does not wait for confirmation before committing. This means the primary is fast, but the standby can be seconds behind. If the primary dies, those uncommitted-on-standby transactions are lost.

PostgreSQL Replication

PostgreSQL Replication#

Streaming replication gives you a full binary copy for high availability and read scaling. Logical replication gives you selective table-level syncing between databases that can run different PostgreSQL versions.

Streaming Replication Setup#

Configure the Primary#

# postgresql.conf
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB

Create a replication role and allow connections:

CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'repl-secret';
# pg_hba.conf
host  replication  replicator  10.0.0.0/8  scram-sha-256

Initialize the Standby#

sudo systemctl stop postgresql-16
sudo rm -rf /var/lib/postgresql/16/main/*
pg_basebackup -h primary-host -U replicator -D /var/lib/postgresql/16/main \
  --checkpoint=fast --wal-method=stream -R -P
sudo chown -R postgres:postgres /var/lib/postgresql/16/main
sudo systemctl start postgresql-16

The -R flag creates standby.signal and writes connection info to postgresql.auto.conf. The standby now continuously receives and replays WAL from the primary, accepting read-only queries by default.