CockroachDB Day-2 Operations

Adding and Removing Nodes#

Adding a node: start a new cockroach process with --join pointing to existing nodes. CockroachDB automatically rebalances ranges to the new node.

cockroach start --insecure --store=node4-data \
  --advertise-addr=node4:26257 \
  --join=node1:26257,node2:26257,node3:26257

Watch rebalancing in the DB Console under Metrics > Replication, or query directly:

SELECT node_id, range_count, lease_count FROM crdb_internal.kv_store_status;

Decommissioning a node moves all range replicas off before shutdown, preventing under-replication:

cockroach node decommission 4 --insecure --host=node1:26257

# Monitor progress
cockroach node status --insecure --host=node1:26257 --decommission

Do not simply kill a node. Without decommissioning, CockroachDB treats it as a failure and waits 5 minutes before re-replicating. On Kubernetes with the operator, scale by changing spec.nodes in the CrdbCluster resource.

CockroachDB Debugging and Troubleshooting

Node Liveness Issues#

Every node must renew its liveness record every 4.5 seconds. Failure to renew marks the node suspect, then dead, triggering re-replication of its ranges.

cockroach node status --insecure --host=localhost:26257

Look at is_live. If a node shows false, check in order:

Process crashed. Check cockroach-data/logs/ for fatal or panic entries. OOM kills are the most common cause – check dmesg | grep -i oom on the host.

Network partition. The node runs but cannot reach peers. If cockroach node status succeeds locally but fails from other nodes, the problem is network-level (firewalls, security groups, DNS).

CockroachDB Setup and Architecture

Architecture: What CockroachDB Actually Does Under the Hood#

CockroachDB is a distributed SQL database that stores data across multiple nodes while presenting a single logical database to clients. Understanding three concepts is essential before deploying it.

Ranges. All data is stored in key-value pairs, sorted by key. CockroachDB splits this sorted keyspace into contiguous chunks called ranges, each targeting 512 MiB by default. Every SQL table, index, and system table maps to one or more ranges. When a range grows beyond the threshold, it splits automatically.

Choosing a Database Strategy: On Kubernetes vs Managed Service, and PostgreSQL vs MySQL vs CockroachDB

Choosing a Database Strategy#

Every Kubernetes-based platform eventually faces two questions: should the database run inside the cluster or as a managed service, and which database engine fits the workload? These decisions are difficult to reverse. A database migration is one of the highest-risk operations in production. Getting the initial decision roughly right saves months of future pain.

Where to Run: Kubernetes vs Managed Service#

This is not a technology question. It is an organizational question about who owns database operations and what tradeoffs the team will accept.