PostgreSQL Disaster Recovery

PostgreSQL Disaster Recovery#

A DR plan for PostgreSQL has three layers: streaming replication for fast failover, WAL archiving for point-in-time recovery, and a backup tool like pgBackRest for managing retention. Each layer covers a different failure mode – replication for server crashes, WAL archiving for data corruption that replicates, full backups for when everything goes wrong.

Streaming Replication for DR#

Synchronous vs Asynchronous – The Core Tradeoff#

Asynchronous replication is the default. The primary streams WAL to the standby, but does not wait for confirmation before committing. This means the primary is fast, but the standby can be seconds behind. If the primary dies, those uncommitted-on-standby transactions are lost.

Database High Availability Patterns

Database High Availability Patterns#

Every database HA decision starts with two numbers: RPO (Recovery Point Objective – how much data you can afford to lose) and RTO (Recovery Time Objective – how long the database can be unavailable). These numbers dictate the pattern, and each pattern carries specific operational tradeoffs.

Core Concepts#

RPO = 0 means zero data loss. Every committed transaction must survive a failure. This requires synchronous replication, which adds latency to every write.