PostgreSQL Backup and Recovery#

A backup you have never tested restoring is not a backup. This covers the main backup tools, when to use each, point-in-time recovery, and automation.

Logical Backups: pg_dump and pg_dumpall#

pg_dump exports a single database as SQL or a compressed binary format. It takes a consistent snapshot without blocking writes.

# Custom format (compressed, supports parallel restore)
pg_dump -U postgres -Fc -d myapp -f myapp.dump

# Directory format (parallel dump)
pg_dump -U postgres -Fd -j 4 -d myapp -f myapp_dir/
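
To restore a custom-format dump in parallel, create the target database first and point pg_restore at it (database and file names here are illustrative):

# Restore with 4 parallel jobs
createdb -U postgres myapp_restored
pg_restore -U postgres -j 4 -d myapp_restored myapp.dump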

pg_dumpall exports every database plus cluster-wide objects such as roles and tablespaces. In practice, dump the globals separately and each database individually for restore flexibility; a minimal sketch (user and file names are illustrative):

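# Roles and tablespaces only
pg_dumpall -U postgres --globals-only -f globals.sql

# Each database separately, in custom format
pg_dump -U postgres -Fc -d myapp -f myapp.dump
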
PostgreSQL Debugging#

When PostgreSQL breaks, it usually falls into a handful of patterns. This is a reference for diagnosing each one with specific queries and commands.

Connection Refused#

Work through these in order:

1. Is PostgreSQL running?

sudo systemctl status postgresql-16

2. Is it listening on the right address?

ss -tlnp | grep 5432

If it shows 127.0.0.1:5432 but you need remote access, set listen_addresses = '*' in postgresql.conf and restart the server; listen_addresses is not reloadable.

3. Does pg_hba.conf allow the connection? Check the logs for "no pg_hba.conf entry for host":

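The full log line names the client address, user, and database (the values below are placeholders):

FATAL:  no pg_hba.conf entry for host "10.0.0.50", user "app", database "myapp"

Add a matching rule and reload; pg_hba.conf changes take effect on reload, not restart:

# pg_hba.conf
host  myapp  app  10.0.0.0/24  scram-sha-256

sudo systemctl reload postgresql-16
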
PostgreSQL Performance Tuning#

Most PostgreSQL performance problems come from missing indexes, bad query plans, connection overhead, or table bloat. This covers how to diagnose each one.

Reading EXPLAIN ANALYZE#

EXPLAIN shows the query plan. EXPLAIN ANALYZE actually executes the query and shows real timings.

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';
Index Scan using idx_orders_customer on orders  (cost=0.43..8.45 rows=1 width=120) (actual time=0.023..0.025 rows=3 loops=1)
  Index Cond: (customer_id = 42)
  Filter: (status = 'pending'::text)
  Rows Removed by Filter: 12
Planning Time: 0.152 ms
Execution Time: 0.048 ms

What to look for:

- Seq Scan on large tables means a missing index.
- Rows Removed by Filter means the index fetched extra rows that a composite index would eliminate.
- actual rows far from estimated rows means stale statistics; run ANALYZE tablename;.
- Nested Loop with a high loops count usually wants a hash join; check the inner table's indexes.

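For the plan above, a composite index covering both columns lets the index satisfy the entire WHERE clause and removes the filter step (the index name is illustrative):

CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);
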
PostgreSQL Replication#

Streaming replication gives you a full binary copy for high availability and read scaling. Logical replication gives you selective table-level syncing between databases that can run different PostgreSQL versions.

Streaming Replication Setup#

Configure the Primary#

# postgresql.conf
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB

Create a replication role and allow connections:

CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'repl-secret';
# pg_hba.conf
host  replication  replicator  10.0.0.0/8  scram-sha-256
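
Reload the server so the pg_hba.conf change takes effect; the role itself is usable immediately:

sudo systemctl reload postgresql-16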

Initialize the Standby#

sudo systemctl stop postgresql-16
sudo rm -rf /var/lib/postgresql/16/main/*
sudo -u postgres pg_basebackup -h primary-host -U replicator \
  -D /var/lib/postgresql/16/main --checkpoint=fast --wal-method=stream -R -P
sudo systemctl start postgresql-16

The -R flag creates standby.signal and writes connection info to postgresql.auto.conf. The standby now continuously receives and replays WAL from the primary, accepting read-only queries by default.
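
To verify replication is flowing, query pg_stat_replication on the primary and pg_is_in_recovery() on the standby:

-- On the primary: one row per connected standby
SELECT client_addr, state, sync_state FROM pg_stat_replication;

-- On the standby: returns true while in recovery
SELECT pg_is_in_recovery();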

PostgreSQL Setup and Configuration#

Every PostgreSQL deployment boils down to three things: get the binary running, configure who can connect, and tune the memory settings.

Installation Methods#

Package Managers#

On Debian/Ubuntu, use the official PostgreSQL APT repository:

sudo apt install -y postgresql-common
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo apt install -y postgresql-16

On macOS: brew install postgresql@16 && brew services start postgresql@16

On RHEL/Fedora:

sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo dnf install -y postgresql16-server
sudo /usr/pgsql-16/bin/postgresql-16-setup initdb
sudo systemctl enable --now postgresql-16

Config files live at /etc/postgresql/16/main/ (Debian) or /var/lib/pgsql/16/data/ (RHEL).
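
Whichever method you used, a quick smoke test confirms the server accepts connections (this assumes the default postgres OS user created by the packages):

sudo -u postgres psql -c 'SELECT version();'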

Progressive Agent Adoption: From First Task to Autonomous Workflows#

Nobody goes from “I have never used an agent” to “my agent runs multi-hour autonomous workflows” in one step. Trust builds through experience. Each successful task at one level creates confidence to try the next. Skipping levels creates fear and bad outcomes — the agent does something unexpected, the human loses trust, and adoption stalls.

This article maps the adoption ladder from first task to autonomous workflows, with concrete examples of what to try at each level and signals that indicate readiness to move up.

Prometheus and Grafana Monitoring Stack#

Prometheus Architecture#

Prometheus pulls metrics from targets at regular intervals (scraping). Each target exposes an HTTP endpoint (typically /metrics) that returns metrics in a text format. Prometheus stores the scraped data in a local time-series database and evaluates alerting rules against it. Grafana connects to Prometheus as a data source and renders dashboards.

Scrape Configuration#

The core of Prometheus configuration is the scrape config. Each scrape_config block defines a set of targets and how to scrape them.
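
A minimal prometheus.yml sketch with a single static job (the job name and target address are placeholders; node_exporter conventionally listens on 9100):

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]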

Prometheus Architecture Deep Dive#

Pull-Based Scraping Model#

Prometheus pulls metrics from targets rather than having targets push metrics to it. Every scrape interval (1m by default, commonly set to 15s in the global config), Prometheus sends an HTTP GET to each target's metrics endpoint. The target responds with all its current metric values in Prometheus exposition format.

This pull model has concrete advantages. Prometheus controls the scrape rate, so a misbehaving target cannot flood the system. You can scrape a target from your laptop with curl http://target:8080/metrics to see exactly what Prometheus sees. Targets that go down are immediately detectable because the scrape fails.
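
Fetching a target by hand shows the exposition format directly; a response looks roughly like this (metric names and labels are illustrative):

# HELP http_requests_total Total HTTP requests served
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3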

Prompt Engineering for Infrastructure Operations: Templates, Safety, and Structured Reasoning#

Infrastructure prompts differ from general-purpose prompts in one critical way: the output often drives real actions on real systems. A hallucinated filename in a creative writing task is harmless. A hallucinated resource name in a Kubernetes delete command causes an outage. Every prompt pattern here is designed with that asymmetry in mind – prioritizing correctness and safety over cleverness.

Structured Output for Infrastructure Data#

Infrastructure operations produce structured data: IP addresses, resource names, status codes, configuration values. Free-form text responses create parsing fragility. Force structured output from the start.
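
One way to enforce this is to pin the response to an explicit JSON shape and forbid prose. A sketch of such a prompt (the schema and field names are invented for illustration):

Report the health of each host. Respond with ONLY a JSON array, no prose.
Each element must match:
  {"host": "<hostname>", "status": "healthy" | "degraded" | "down" | "unknown", "detail": "<one sentence>"}
If you are unsure about a host, use "unknown" rather than guessing.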

Prompt Engineering for Local Models: Presets, Focus Areas, and Differences from Cloud Model Prompting#

Prompting a 7B local model is not the same as prompting Claude or GPT-4. Cloud models are overtrained on instruction following, tolerate vague prompts, and self-correct. Small local models need more structure, more constraints, and more explicit formatting instructions. The prompts that work effortlessly on cloud models often produce garbage on local models.

This is not a weakness — it is a design consideration. Local models trade generality for speed and cost. Your prompts must compensate by being more specific.
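
In practice that means spelling out everything a cloud model would infer. A sketch of the same request written for a small local model (the task and wording are invented for illustration):

You are a log triage assistant.
Task: classify the log line below as ERROR, WARNING, or INFO.
Rules:
- Output exactly one word: ERROR, WARNING, or INFO.
- Do not explain. Do not add punctuation.
Log line: <paste log line here>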