CockroachDB Day-2 Operations

Adding and Removing Nodes#

Adding a node: start a new cockroach process with --join pointing to existing nodes. CockroachDB automatically rebalances ranges to the new node.

cockroach start --insecure --store=node4-data \
  --advertise-addr=node4:26257 \
  --join=node1:26257,node2:26257,node3:26257

Watch rebalancing in the DB Console under Metrics > Replication, or query directly:

SELECT node_id, range_count, lease_count FROM crdb_internal.kv_store_status;

Decommissioning a node moves all range replicas off before shutdown, preventing under-replication:

cockroach node decommission 4 --insecure --host=node1:26257

# Monitor progress
cockroach node status --insecure --host=node1:26257 --decommission

Do not simply kill a node. Without decommissioning, CockroachDB treats it as a failure and waits 5 minutes before re-replicating. On Kubernetes with the operator, scale by changing spec.nodes in the CrdbCluster resource.

CockroachDB Debugging and Troubleshooting

Node Liveness Issues#

Every node must renew its liveness record every 4.5 seconds. Failure to renew marks the node suspect, then dead, triggering re-replication of its ranges.

cockroach node status --insecure --host=localhost:26257

Look at is_live. If a node shows false, check in order:

Process crashed. Check cockroach-data/logs/ for fatal or panic entries. OOM kills are the most common cause – check dmesg | grep -i oom on the host.

Network partition. The node runs but cannot reach peers. If cockroach node status succeeds locally but fails from other nodes, the problem is network-level (firewalls, security groups, DNS).

CockroachDB Setup and Architecture

Architecture: What CockroachDB Actually Does Under the Hood#

CockroachDB is a distributed SQL database that stores data across multiple nodes while presenting a single logical database to clients. Understanding three concepts is essential before deploying it.

Ranges. All data is stored in key-value pairs, sorted by key. CockroachDB splits this sorted keyspace into contiguous chunks called ranges, each targeting 512 MiB by default. Every SQL table, index, and system table maps to one or more ranges. When a range grows beyond the threshold, it splits automatically.

Database High Availability Patterns

Database High Availability Patterns#

Every database HA decision starts with two numbers: RPO (Recovery Point Objective – how much data you can afford to lose) and RTO (Recovery Time Objective – how long the database can be unavailable). These numbers dictate the pattern, and each pattern carries specific operational tradeoffs.

Core Concepts#

RPO = 0 means zero data loss. Every committed transaction must survive a failure. This requires synchronous replication, which adds latency to every write.

Database Performance Investigation Runbook

Database Performance Investigation Runbook#

When a database is slow, resist the urge to immediately tune configuration parameters. Follow this sequence: identify what is slow, understand why, then fix the specific bottleneck. Most performance problems are caused by missing indexes or a single bad query, not global configuration issues.

Phase 1 – Identify Slow Queries#

The first step is always finding which queries are consuming the most time.

PostgreSQL: pg_stat_statements#

Enable the extension if not already loaded:

Database Testing Strategies

Database Testing Strategies#

Database tests are the tests most teams get wrong. They either skip them entirely (testing with mocks, then discovering schema mismatches in production), or they build a fragile suite sharing a single database where tests interfere with each other. The right approach depends on what you are testing and what tradeoffs you can accept.

Fixtures vs Factories#

Fixtures#

Fixtures are static SQL files loaded before a test suite runs:

MongoDB Operational Patterns

MongoDB Operational Patterns#

MongoDB operations center on three areas: keeping the cluster healthy (replica sets and sharding), protecting data (backups), and keeping queries fast (indexes and explain plans). This reference covers the practical commands and patterns for each.

Replica Set Setup#

A replica set is the minimum production deployment – three data-bearing members that elect a primary and maintain identical copies of the data.

Launching Members#

Each member runs mongod with the same --replSet name:

MySQL 8.x Setup and Configuration

MySQL 8.x Setup and Configuration#

MySQL 8.x is the current production series. It introduced caching_sha2_password as the default auth plugin, CTEs, window functions, and a redesigned data dictionary. Getting it installed is straightforward; getting it configured correctly for production takes more thought.

Installation#

Package Managers#

On Ubuntu/Debian, the MySQL APT repository gives you the latest 8.x:

# Add the MySQL APT repo
wget https://dev.mysql.com/get/mysql-apt-config_0.8.30-1_all.deb
sudo dpkg -i mysql-apt-config_0.8.30-1_all.deb
sudo apt update
sudo apt install mysql-server

On RHEL/Rocky/AlmaLinux:

MySQL Backup and Recovery

MySQL Backup and Recovery#

A backup that has never been restored is not a backup. This guide covers the tools, when to use each, and how to verify your backups work.

Logical vs Physical Backups#

Logical backups export SQL statements. Portable across versions but slow for large databases. Physical backups copy raw InnoDB data files. Fast but tied to the same MySQL version. Physical backups are essential once your database exceeds a few hundred gigabytes.

MySQL Debugging: Common Problems and Solutions

MySQL Debugging: Common Problems and Solutions#

When MySQL breaks, it falls into a handful of failure modes. Here are the diagnostic workflows, in order of frequency.

Access Denied Errors#

Access denied for user 'appuser'@'10.0.1.5' (using password: YES) means wrong password, user does not exist for that host, or missing privileges.

Diagnosis:

-- 1. Does the user exist for that host?
SELECT user, host, plugin FROM mysql.user WHERE user = 'appuser';
-- MySQL matches user+host pairs. 'appuser'@'localhost' != 'appuser'@'%'.

-- 2. Check grants
SHOW GRANTS FOR 'appuser'@'%';

-- 3. Auth plugin mismatch? Old clients can't handle caching_sha2_password:
ALTER USER 'appuser'@'%' IDENTIFIED WITH mysql_native_password BY 'password';

To reset a lost root password: