CQRS and Event Sourcing

CQRS and Event Sourcing#

CQRS (Command Query Responsibility Segregation) and event sourcing are frequently discussed together but are independent patterns. You can use either one without the other. Understanding each separately is essential before combining them.

CQRS: Separate Read and Write Models#

Most applications use the same data model for reads and writes. The same orders table serves the API that creates orders and the API that lists them. This works until read and write requirements diverge significantly.

Distributed Data Consistency Patterns

Distributed Data Consistency Patterns#

In a monolith with a single database, you get ACID transactions. In microservices, each service owns its database. Cross-service consistency requires explicit patterns because distributed ACID transactions are impractical in most real systems.

CAP Theorem: Practical Implications#

The CAP theorem states that a distributed system can provide at most two of three guarantees: Consistency, Availability, and Partition tolerance. Since network partitions are inevitable, you choose between consistency and availability during partitions.

Database Schema Migrations in CI/CD: Tools, Pipeline Integration, and Zero-Downtime Strategies

Database Schema Migrations in CI/CD#

Schema migrations are the riskiest step in most deployment pipelines. Application code can be rolled back in seconds by deploying the previous container image. A database migration that drops a column, changes a data type, or restructures a table cannot be undone by pressing a button. Yet many teams run migrations manually, or tack them onto deployment scripts without testing, rollback plans, or zero-downtime considerations.

PostgreSQL Disaster Recovery

PostgreSQL Disaster Recovery#

A DR plan for PostgreSQL has three layers: streaming replication for fast failover, WAL archiving for point-in-time recovery, and a backup tool like pgBackRest for managing retention. Each layer covers a different failure mode – replication for server crashes, WAL archiving for data corruption that replicates, full backups for when everything goes wrong.

Streaming Replication for DR#

Synchronous vs Asynchronous – The Core Tradeoff#

Asynchronous replication is the default. The primary streams WAL to the standby, but does not wait for confirmation before committing. This means the primary is fast, but the standby can be seconds behind. If the primary dies, those uncommitted-on-standby transactions are lost.

Database Cross-Region Replication Patterns

Database Cross-Region Replication Patterns#

Cross-region replication exists because regions fail. AWS us-east-1 has had multiple multi-hour outages. If your database runs in a single region, a regional failure takes your application down entirely. Cross-region replication gives you a copy of the data somewhere else so you can recover.

The fundamental problem is physics. Light through fiber between US East and US West takes about 30ms one way. Every replication strategy is a different answer to the question: do you wait for the remote region to confirm it has the data before telling the client the write succeeded?

Planning and Executing Database Migrations: Schema Changes, Data Migrations, and Zero-Downtime Patterns

Planning and Executing Database Migrations#

Database migrations are the highest-risk routine operations most teams perform. A bad migration can cause downtime, data loss, or application errors that cascade across every service that touches the affected tables. This operational sequence walks through the assessment, planning, execution, and rollback of database migrations from simple column additions to full platform changes.

Phase 1 – Assessment#

Step 1: Classify the Migration#

Every migration falls into one of three categories, each with a different risk profile:

PostgreSQL Backup and Recovery

PostgreSQL Backup and Recovery#

A backup you have never tested restoring is not a backup. This covers the main backup tools, when to use each, point-in-time recovery, and automation.

Logical Backups: pg_dump and pg_dumpall#

pg_dump exports a single database as SQL or a compressed binary format. It takes a consistent snapshot without blocking writes.

# Custom format (compressed, supports parallel restore)
pg_dump -U postgres -Fc -d myapp -f myapp.dump

# Directory format (parallel dump)
pg_dump -U postgres -Fd -j 4 -d myapp -f myapp_dir/

pg_dumpall exports every database plus cluster-wide objects. In practice, dump roles separately and per-database for flexibility:

PostgreSQL Debugging

PostgreSQL Debugging#

When PostgreSQL breaks, it usually falls into a handful of patterns. This is a reference for diagnosing each one with specific queries and commands.

Connection Refused#

Work through these in order:

1. Is PostgreSQL running?

sudo systemctl status postgresql-16

2. Is it listening on the right address?

ss -tlnp | grep 5432

If it shows 127.0.0.1:5432 but you need remote access, set listen_addresses = '*' in postgresql.conf.

3. Does pg_hba.conf allow the connection? Check logs for no pg_hba.conf entry for host:

PostgreSQL Performance Tuning

PostgreSQL Performance Tuning#

Most PostgreSQL performance problems come from missing indexes, bad query plans, connection overhead, or table bloat. This covers how to diagnose each one.

Reading EXPLAIN ANALYZE#

EXPLAIN shows the query plan. EXPLAIN ANALYZE actually executes the query and shows real timings.

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';
Index Scan using idx_orders_customer on orders  (cost=0.43..8.45 rows=1 width=120) (actual time=0.023..0.025 rows=3 loops=1)
  Index Cond: (customer_id = 42)
  Filter: (status = 'pending'::text)
  Rows Removed by Filter: 12
Planning Time: 0.152 ms
Execution Time: 0.048 ms

What to look for: Seq Scan on large tables means a missing index. Rows Removed by Filter means the index fetched extra rows that a composite index would eliminate. actual rows far from estimated rows means stale statistics – run ANALYZE tablename;. Nested Loop with high loops count usually wants a hash join; check the inner table’s indexes.

PostgreSQL Replication

PostgreSQL Replication#

Streaming replication gives you a full binary copy for high availability and read scaling. Logical replication gives you selective table-level syncing between databases that can run different PostgreSQL versions.

Streaming Replication Setup#

Configure the Primary#

# postgresql.conf
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB

Create a replication role and allow connections:

CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'repl-secret';
# pg_hba.conf
host  replication  replicator  10.0.0.0/8  scram-sha-256

Initialize the Standby#

sudo systemctl stop postgresql-16
sudo rm -rf /var/lib/postgresql/16/main/*
pg_basebackup -h primary-host -U replicator -D /var/lib/postgresql/16/main \
  --checkpoint=fast --wal-method=stream -R -P
sudo chown -R postgres:postgres /var/lib/postgresql/16/main
sudo systemctl start postgresql-16

The -R flag creates standby.signal and writes connection info to postgresql.auto.conf. The standby now continuously receives and replays WAL from the primary, accepting read-only queries by default.