Stateful Workload Disaster Recovery: Storage Replication, Database Operators, and Restore Ordering

Stateful Workload Disaster Recovery

Stateless workloads are easy to recover – redeploy from Git and they are running. Stateful workloads carry data that cannot be regenerated. Databases, message queues, object stores, and anything else with a PersistentVolume need a deliberate DR strategy that goes beyond “we have Velero.”

The fundamental challenge: you must capture data at a point in time where the application state is consistent, replicate that data to a recovery site, and restore it in the correct order. Get any of these wrong and you recover corrupted data or a broken dependency chain.
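The restore-ordering requirement can be sketched as a topological sort over a dependency graph. This is a minimal illustration, not a definitive implementation: the component names (`storage`, `postgres`, `kafka`, `api`) and the dependencies between them are hypothetical and stand in for whatever your own cluster runs.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key is restored only after
# every component in its value set has been restored.
DEPENDENCIES = {
    "storage": set(),
    "postgres": {"storage"},
    "kafka": {"storage"},
    "api": {"postgres", "kafka"},
}


def restore_order(dependencies):
    """Return a restore sequence in which dependencies come first.

    Raises graphlib.CycleError if the dependency graph contains a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(restore_order(DEPENDENCIES))
```

Under these assumptions `storage` is always restored first and `api` last; the relative order of `postgres` and `kafka` is not fixed because neither depends on the other.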

StatefulSets and Persistent Storage: Stable Identity, PVCs, and StorageClasses

StatefulSets and Persistent Storage

Deployments treat pods as interchangeable. StatefulSets do not – each pod gets a stable hostname, a persistent volume, and an ordered startup sequence. This is what you need for databases, message queues, and any workload where identity matters.

StatefulSet vs Deployment

| Feature | Deployment | StatefulSet |
|---|---|---|
| Pod names | Random suffix (web-api-6d4f8) | Ordinal index (postgres-0, postgres-1) |
| Startup order | All at once | Sequential (0, then 1, then 2) |
| Stable network identity | No | Yes, via headless Service |
| Persistent storage | Shared or none | Per-pod via volumeClaimTemplates |
| Scaling down | Removes random pods | Removes highest ordinal first |

Use StatefulSets when your application needs any of: stable hostnames, ordered deployment/scaling, or per-pod persistent storage. Common examples: PostgreSQL, MySQL, Redis Sentinel, Kafka, ZooKeeper, Elasticsearch.
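To make the stable-hostname point concrete, the sketch below derives the DNS names that a StatefulSet's pods receive through a headless Service. It assumes the default cluster domain `cluster.local` and ordinals starting at 0; the function name and the StatefulSet, Service, and namespace values in the usage note are hypothetical.

```python
def statefulset_pod_dns(name, service, namespace, replicas,
                        cluster_domain="cluster.local"):
    """List the stable DNS names of a StatefulSet's pods.

    Pods are named <name>-<ordinal>, and a headless Service gives each one
    the record <pod>.<service>.<namespace>.svc.<cluster_domain>.
    """
    return [
        f"{name}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


if __name__ == "__main__":
    for host in statefulset_pod_dns("postgres", "postgres-headless", "db", 2):
        print(host)
```

With these example values, the first pod is reachable as `postgres-0.postgres-headless.db.svc.cluster.local`, and that name stays the same after the pod is rescheduled.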

Velero Backup and Restore: Disaster Recovery for Kubernetes

Velero Backup and Restore

Velero backs up Kubernetes resources and persistent volume data to object storage. It handles scheduled backups, on-demand snapshots, and restores to the same or a different cluster. It is a widely used open-source tool for Kubernetes disaster recovery.

Velero captures two things: Kubernetes API objects (stored as JSON) and persistent volume data (via cloud volume snapshots or file-level backup with Kopia).

Installation

You need an object storage bucket (S3, GCS, Azure Blob, or MinIO) and write credentials.

Choosing a Kubernetes Backup Strategy: Velero vs Native Snapshots vs Application-Level Backups

Choosing a Kubernetes Backup Strategy

Kubernetes clusters contain two fundamentally different types of state: cluster state (the Kubernetes objects themselves – Deployments, Services, ConfigMaps, Secrets, CRDs) and application data (the contents of Persistent Volumes). A complete backup strategy must address both. Most backup failures happen because teams back up one but not the other, or because they never test the restore process.

What Needs Backing Up

Before choosing tools, inventory what your cluster contains:

Choosing Kubernetes Storage: Local vs Network vs Cloud CSI Drivers

Choosing Kubernetes Storage

Storage decisions in Kubernetes are harder to change than almost any other architectural choice. Migrating data between storage backends in production involves downtime, risk, and careful planning. Understand the tradeoffs before provisioning your first PersistentVolumeClaim.

The decision comes down to five criteria: performance (IOPS and latency), durability (can you survive node failure), portability (can you move the workload), cost, and access mode (single pod or shared).

Storage Categories

Block Storage (ReadWriteOnce)

Block storage provides a raw disk attached to a single node. With ReadWriteOnce, the volume can be mounted read-write by only one node at a time; pods scheduled on that same node can share it, and the stricter ReadWriteOncePod mode limits access to a single pod. This is the most common storage type for databases, caches, and any workload that needs fast, consistent disk I/O.

Minikube Storage: PersistentVolumes, StorageClasses, and Data Persistence Patterns

Minikube Storage: PersistentVolumes, StorageClasses, and Data Persistence

Minikube ships with a built-in storage provisioner that handles PersistentVolumeClaims automatically. Understanding how it works – and where it differs from production storage – is essential for testing stateful workloads locally.

Default Storage: The hostPath Provisioner

When you start minikube, it registers a default StorageClass called standard backed by the k8s.io/minikube-hostpath provisioner. This provisioner creates PersistentVolumes as directories on the minikube node’s filesystem.
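As a rough illustration of how a workload would request storage from that default class, the sketch below builds a PersistentVolumeClaim manifest as a plain dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The helper name `build_pvc`, the claim name `demo-data`, and the `1Gi` size are assumptions made for the example.

```python
import json


def build_pvc(name, size="1Gi", storage_class="standard"):
    """Build a PersistentVolumeClaim manifest as a plain dict.

    storage_class defaults to minikube's built-in "standard" class.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Print the manifest; the output can be saved to a file for kubectl.
    print(json.dumps(build_pvc("demo-data"), indent=2))
```

Because `standard` is the default StorageClass in minikube, leaving out `storageClassName` entirely should bind to the same provisioner; naming it explicitly just makes the dependency visible when the manifest is later moved to a production cluster.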