---
title: "CockroachDB Day-2 Operations"
description: "Node management, rolling upgrades, backup and restore, monitoring, changefeeds, and multi-region configuration for CockroachDB clusters."
url: https://agent-zone.ai/knowledge/databases/cockroachdb-operations/
section: knowledge
date: 2026-02-22
categories: ["databases"]
tags: ["cockroachdb","operations","backup","monitoring","cdc","multi-region"]
skills: ["cockroachdb-administration","database-operations","disaster-recovery"]
tools: ["cockroach","kubectl","db-console"]
levels: ["intermediate"]
word_count: 841
formats:
  json: https://agent-zone.ai/knowledge/databases/cockroachdb-operations/index.json
  html: https://agent-zone.ai/knowledge/databases/cockroachdb-operations/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=CockroachDB+Day-2+Operations
---


## Adding and Removing Nodes

**Adding a node:** start a new `cockroach` process with `--join` pointing to existing nodes. CockroachDB automatically rebalances ranges to the new node.

```bash
cockroach start --insecure --store=node4-data \
  --advertise-addr=node4:26257 \
  --join=node1:26257,node2:26257,node3:26257
```

Watch rebalancing in the DB Console under Metrics > Replication, or query directly:

```sql
SELECT node_id, range_count, lease_count FROM crdb_internal.kv_store_status;
```

**Decommissioning a node** moves all range replicas off before shutdown, preventing under-replication:

```bash
cockroach node decommission 4 --insecure --host=node1:26257

# Monitor progress
cockroach node status --insecure --host=node1:26257 --decommission
```

Do not simply kill a node. Without decommissioning, CockroachDB treats it as a failure and waits 5 minutes before re-replicating. On Kubernetes with the operator, scale by changing `spec.nodes` in the CrdbCluster resource.
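That 5-minute window is the `server.time_until_store_dead` cluster setting. A hedged sketch of checking and adjusting it (the longer value shown is illustrative, not a recommendation):

```sql
-- Inspect the dead-store timeout (default 5m0s)
SHOW CLUSTER SETTING server.time_until_store_dead;

-- Lengthen it temporarily, e.g. for short planned maintenance
-- where you do not want to decommission
SET CLUSTER SETTING server.time_until_store_dead = '15m0s';

-- Restore the default afterwards
RESET CLUSTER SETTING server.time_until_store_dead;
```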

## Rolling Upgrades

CockroachDB supports rolling upgrades with no downtime. The process is strict: you must upgrade one major version at a time, and all nodes must be on the same version before finalizing.

```bash
# 1. Check current version
cockroach sql --insecure -e "SELECT version();"

# 2. Upgrade each node's binary one at a time
#    Stop node, replace binary, restart. Repeat for every node.

# 3. After ALL nodes are running the new version, finalize
cockroach sql --insecure -e "SET CLUSTER SETTING version = crdb_internal.node_executable_version();"
```

Until you finalize, the cluster runs in mixed-version mode: new features from the upgrade are unavailable, but you can still roll back by downgrading the binaries. After finalization, rollback is no longer possible. On Kubernetes, update the image tag in the CrdbCluster spec and the operator performs the rolling restart.
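Note that recent versions auto-finalize once every node runs the new binary unless you block it. To hold the rollback window open while you validate, one approach is the `cluster.preserve_downgrade_option` setting (the version string below is illustrative; use your cluster's current major version):

```sql
-- Before upgrading binaries: pin the current version so the
-- cluster does not auto-finalize
SET CLUSTER SETTING cluster.preserve_downgrade_option = '24.1';

-- After validating on the new binaries: allow finalization to proceed
RESET CLUSTER SETTING cluster.preserve_downgrade_option;
```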

## Backup and Restore

CockroachDB has built-in `BACKUP` and `RESTORE` SQL statements. These operate at the cluster, database, or table level and write to cloud storage or local node storage.

```sql
-- Full backup of a database to S3
BACKUP DATABASE myapp INTO 's3://my-bucket/backups?AUTH=implicit'
  WITH revision_history;

-- Incremental backup (appended to the latest full backup)
BACKUP DATABASE myapp INTO LATEST IN 's3://my-bucket/backups?AUTH=implicit'
  WITH revision_history;

-- Restore a database (into a new name to avoid conflicts)
RESTORE DATABASE myapp FROM LATEST IN 's3://my-bucket/backups?AUTH=implicit'
  WITH new_db_name = 'myapp_restored';

-- Restore to a specific point in time
RESTORE DATABASE myapp FROM LATEST IN 's3://my-bucket/backups?AUTH=implicit'
  AS OF SYSTEM TIME '2026-02-21 14:00:00'
  WITH new_db_name = 'myapp_pit_restore';
```

**Scheduled backups** automate the process. CockroachDB manages the schedule internally:

```sql
CREATE SCHEDULE daily_backup FOR BACKUP DATABASE myapp
  INTO 's3://my-bucket/backups?AUTH=implicit'
  WITH revision_history
  RECURRING '@daily'
  FULL BACKUP '@weekly'
  WITH SCHEDULE OPTIONS first_run = 'now';

-- List schedules
SHOW SCHEDULES;

-- Pause or drop a schedule
PAUSE SCHEDULE 123456789;
DROP SCHEDULE 123456789;
```

The `FULL BACKUP '@weekly'` clause means daily runs produce incrementals, and once per week a full backup is taken. The `revision_history` option enables point-in-time restore to any timestamp within the backup's revision window.
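To verify what a schedule has actually produced, the backup collection can be inspected with `SHOW` statements (bucket path matches the examples above):

```sql
-- List the full-backup subdirectories in the collection
SHOW BACKUPS IN 's3://my-bucket/backups?AUTH=implicit';

-- Inspect the latest chain: the full backup plus its incremental layers
SHOW BACKUP FROM LATEST IN 's3://my-bucket/backups?AUTH=implicit';
```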

## Monitoring with the DB Console

The DB Console (port 8080 by default) provides real-time visibility. Key pages to check regularly:

- **Cluster Overview**: Node count, liveness status, total ranges, under-replicated ranges.
- **Metrics > SQL**: Query latency (p50, p99), transaction rates, active connections.
- **Metrics > Replication**: Range count per node, leaseholder distribution, rebalancing activity.
- **Metrics > Storage**: Disk usage per node, compaction activity, LSM health.
- **Hot Ranges**: Identifies ranges with disproportionate read/write traffic.
- **Statements/Transactions**: Slow query fingerprints, execution counts, contention time.

For Prometheus integration, CockroachDB exposes metrics at `/_status/vars` on the HTTP port:

```yaml
# Prometheus scrape config
scrape_configs:
  - job_name: cockroachdb
    metrics_path: /_status/vars
    scheme: http
    static_configs:
      - targets:
          - crdb-1:8080
          - crdb-2:8080
          - crdb-3:8080
```

Critical alerts to configure: `ranges_underreplicated > 0` for longer than 1 hour, `node_liveness` changes, `capacity_available` dropping below 20%, and `sql_service_latency_p99` exceeding your SLA threshold.
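A sketch of two of those alerts as Prometheus alerting rules, assuming the metric names exposed at `/_status/vars` (`ranges_underreplicated`, `capacity_available`, `capacity`); verify the names against your CockroachDB version before relying on them:

```yaml
groups:
  - name: cockroachdb
    rules:
      - alert: UnderReplicatedRanges
        expr: ranges_underreplicated > 0
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "Under-replicated ranges on {{ $labels.instance }} for over an hour"
      - alert: LowDiskCapacity
        expr: capacity_available / capacity < 0.20
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Less than 20% disk capacity available on {{ $labels.instance }}"
```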

## Changefeeds for CDC

Changefeeds emit row-level changes as they happen, enabling change data capture (CDC) pipelines to Kafka, cloud storage, or webhooks.

```sql
-- Enterprise changefeed to Kafka
CREATE CHANGEFEED FOR TABLE myapp.orders
  INTO 'kafka://kafka-broker:9092?topic_prefix=crdb_'
  WITH format = json, resolved = '10s', updated;

-- Core changefeed (free, but outputs to the SQL client, not for production pipelines)
EXPERIMENTAL CHANGEFEED FOR orders WITH format = json;

-- Changefeed to cloud storage
CREATE CHANGEFEED FOR TABLE myapp.orders
  INTO 's3://my-bucket/cdc?AUTH=implicit'
  WITH format = json, resolved = '30s';

-- Monitor changefeeds
SHOW CHANGEFEED JOBS;
```

The `resolved` option emits timestamp markers so downstream consumers know they have seen all changes up to that point.
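Downstream, a consumer can use those markers as checkpoints. A minimal sketch, assuming the JSON envelope shapes implied by the examples above: row messages carry an `after` field (plus `updated` when that option is set), while resolved markers are bare `{"resolved": "<hlc timestamp>"}` objects in CockroachDB's decimal HLC format:

```python
import json

def hlc_key(ts):
    """Order HLC timestamps like '1700000000000000000.0000000000'
    without float precision loss: compare (wall clock, logical) as ints."""
    wall, logical = ts.split(".")
    return (int(wall), int(logical))

def process_batch(raw_messages):
    """Split changefeed messages into row events and the latest resolved
    timestamp. Once a resolved marker is seen, no change older than it
    will ever arrive, so it is safe to checkpoint consumption there."""
    rows, resolved = [], None
    for raw in raw_messages:
        msg = json.loads(raw)
        if "resolved" in msg:
            if resolved is None or hlc_key(msg["resolved"]) > hlc_key(resolved):
                resolved = msg["resolved"]
        else:
            rows.append(msg.get("after"))  # "after" is null for deletes
    return rows, resolved

batch = [
    '{"after": {"id": 1, "status": "new"}, "updated": "1700000000000000000.0000000000"}',
    '{"after": null, "updated": "1700000000000000001.0000000000"}',
    '{"resolved": "1700000000000000002.0000000000"}',
]
rows, mark = process_batch(batch)
```

The string-splitting comparison matters because HLC wall-clock values exceed the range where floats represent integers exactly.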

## Multi-Region Configuration

Set `--locality` on each node at startup so CockroachDB can make placement decisions:

```bash
cockroach start --locality=region=us-east-1,zone=us-east-1a ...
cockroach start --locality=region=us-west-2,zone=us-west-2a ...
cockroach start --locality=region=eu-west-1,zone=eu-west-1a ...
```

Then configure the database:

```sql
ALTER DATABASE myapp SET PRIMARY REGION "us-east-1";
ALTER DATABASE myapp ADD REGION "us-west-2";
ALTER DATABASE myapp ADD REGION "eu-west-1";

-- REGIONAL BY ROW: each row has a crdb_region column controlling placement
ALTER TABLE orders SET LOCALITY REGIONAL BY ROW;

-- GLOBAL: replicated everywhere, fast reads from any region, slower writes
ALTER TABLE config SET LOCALITY GLOBAL;

-- REGIONAL BY TABLE: entire table pinned to one region
ALTER TABLE audit_log SET LOCALITY REGIONAL BY TABLE IN "us-east-1";
```

`REGIONAL BY ROW` adds a hidden `crdb_region` column and places leaseholders in the row's region for sub-10ms local reads. Writes still require cross-region Raft consensus (typically 60-100ms inter-continental).
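Once a table is `REGIONAL BY ROW`, the hidden column behaves like ordinary SQL, so row placement can be inspected and changed directly (table name matches the examples above; the `id` filter is illustrative):

```sql
-- Where do rows currently live?
SELECT crdb_region, count(*) FROM orders GROUP BY crdb_region;

-- Re-home one row; CockroachDB moves its leaseholder and replicas
UPDATE orders SET crdb_region = 'eu-west-1' WHERE id = 42;
```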

The `SURVIVE REGION FAILURE` goal requires at least three database regions; CockroachDB raises the replication factor to five so a quorum survives the loss of any one region:

```sql
ALTER DATABASE myapp SURVIVE REGION FAILURE;
```

This increases storage cost but guarantees the database remains available even if an entire region goes offline.
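To confirm the goal took effect, the standard multi-region introspection statements show the survival goal and region membership:

```sql
SHOW SURVIVAL GOAL FROM DATABASE myapp;
SHOW REGIONS FROM DATABASE myapp;
```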

