{"page":{"agent_metadata":{"content_type":"reference","outputs":["cloud-dr-strategy","failover-timing-expectations","cost-estimate","automated-failover-decision"],"prerequisites":["cloud-services-basics","database-fundamentals","disaster-recovery-concepts","networking-fundamentals"]},"categories":["databases"],"content_plain":"Cloud Managed Database Disaster Recovery# Every cloud provider offers managed database DR, but the actual behavior during a failure rarely matches the marketing. The documented failover time is the best case. The real failover time includes detection delay, DNS propagation, and connection draining. This guide covers what actually happens.\nAWS: RDS and Aurora# RDS Multi-AZ# RDS Multi-AZ maintains a synchronous standby in a different availability zone. When the primary fails, RDS flips the DNS CNAME to the standby.\nDocumented failover time: 60-120 seconds. Actual failover time: 60-180 seconds. The variance comes from DNS caching (the 5-second TTL may be ignored by connection pools), failure detection delay (5-30 seconds), and crash recovery on the standby.\nMulti-AZ does not protect against region failure. Both AZs are in the same region. Cost: 2x the instance cost. The standby cannot serve reads.\nRDS Cross-Region Read Replicas# For cross-region DR, create a read replica in another region. This uses asynchronous replication, so there is a data loss window. Failover is manual; you must promote the replica yourself:\naws rds promote-read-replica --db-instance-identifier myapp-dr-west --region us-west-2\nPromotion takes 5-15 minutes. Your application needs a new connection string pointing to the promoted instance's endpoint. Total real RTO: 10-25 minutes including human decision time.\nCost: Full instance cost for the replica plus cross-region data transfer ($0.02/GB). 
A busy 500 GB database generates roughly $2,600/month in transfer costs alone; at $0.02/GB that corresponds to about 130 TB of replicated changes per month.\nAurora Global Database# Aurora Global Database replicates an entire Aurora cluster to up to five secondary regions using dedicated replication infrastructure outside of the database engine.\nDocumented replication lag: Under 1 second typically. Actual replication lag: 100-500ms under normal load. Can spike to 5-10 seconds during heavy write bursts or during Aurora storage scaling events.\nManaged failover (planned): Aurora supports managed planned failover where it promotes a secondary region and demotes the old primary. This takes 1-3 minutes and involves a brief global write outage.\nUnplanned failover (detach and promote): If the primary region is unreachable, you detach the secondary cluster and promote it. This takes 1-2 minutes for the promotion, but the decision to trigger it is on you.\n# Detach the secondary cluster from the global cluster; it becomes a standalone writable cluster\naws rds remove-from-global-cluster \\\n  --global-cluster-identifier myapp-global \\\n  --db-cluster-identifier arn:aws:rds:us-west-2:123456789:cluster:myapp-west \\\n  --region us-west-2\nCost: Full Aurora cluster cost in each region. Aurora storage replication is included in the service. A db.r6g.2xlarge Aurora cluster costs roughly $1,400/month per region. Two regions = $2,800/month minimum for compute alone.\nGCP: Cloud SQL# Cloud SQL HA# Cloud SQL HA uses a regional instance with a standby in a different zone within the same region. Failover is automatic.\nDocumented failover time: Under 60 seconds for most instance sizes. Actual failover time: 30-120 seconds. Smaller instances fail over faster. The failover includes an IP address reassignment (not DNS), which eliminates the DNS propagation problem that plagues RDS.\nCloud SQL Cross-Region Replicas# Promotion is manual:\ngcloud sql instances promote-replica myapp-dr-west --project=my-project\nActual promotion time: 5-10 minutes. 
After promotion, the old primary and the new primary are completely independent; there is no automatic reconfiguration.\nCost: Full instance cost in the DR region plus cross-region egress at $0.08-0.12/GB.\nAzure: SQL Database and Cosmos DB# Azure SQL Geo-Replication# Azure SQL Database supports active geo-replication to up to four secondary regions. Each secondary is readable. Failover groups add a listener abstraction: a single read-write endpoint and a read-only endpoint that automatically update DNS on failover.\naz sql failover-group create \\\n  --name myapp-fg \\\n  --server myapp-primary-eastus \\\n  --resource-group myapp-rg \\\n  --partner-server myapp-dr-westus \\\n  --partner-resource-group myapp-dr-rg \\\n  --failover-policy Automatic \\\n  --grace-period 1\nThe grace period prevents flapping on transient failures. The az CLI takes this value in hours, not minutes. The default of 1 hour is also the shortest value the service accepts, so automatic failover does not start until an outage has lasted at least that long; trigger a manual failover if a critical workload cannot wait.\nActual failover time with failover groups: 30-60 seconds for the database promotion plus the grace period, plus 30-60 seconds for DNS propagation.\nCosmos DB Multi-Region Writes# Cosmos DB supports multi-region writes where every region accepts writes simultaneously. Conflicts use last-write-wins by default or a custom stored procedure.\nFailover time: Near zero, because all regions already accept writes. If a region becomes unreachable, clients redirect via SDK retry logic (10-30 seconds).\nCost: Multi-region writes roughly double your RU cost. A 10,000 RU/s container in two regions costs approximately $1,170/month.\nAWS: DynamoDB Global Tables# DynamoDB Global Tables replicate tables across regions with multi-region writes. Conflict resolution is last-write-wins.\naws dynamodb update-table --table-name Orders \\\n  --replica-updates '[{\"Create\": {\"RegionName\": \"us-west-2\"}}]'\nReplication lag: Typically under 1 second. 
DynamoDB publishes a ReplicationLatency CloudWatch metric per region pair.\nFailover: There is no \"failover\" because all regions accept writes. If us-east-1 fails, your application in us-west-2 keeps working. You need to route traffic to the healthy region, but the database itself does not need any promotion.\nCost: Replicated write capacity is charged at 1.625x the standard rate. A table doing 1,000 WCU costs $467/month in one region and $759/month replicated to a second region.\nCost Comparison Summary#\nService | Single-Region HA | Cross-Region DR | Monthly Cost Premium\nRDS Multi-AZ | 2x instance | + replica + transfer | 2x-2.5x base\nAurora Global DB | Included | + full cluster per region | 2x-3x base\nCloud SQL HA | ~2x instance | + replica + egress | 2x-2.5x base\nAzure SQL + FG | Included in tier | + secondary DTUs | 1.5x-2x base\nCosmos DB multi-write | N/A (serverless) | + RUs per region | 2x RU cost\nDynamoDB Global Tables | N/A (serverless) | 1.625x WCU | 1.6x write cost\nAutomated vs Manual Failover Decisions# Automated failover sounds better, but it introduces the risk of split-brain: both regions think they are primary. Every managed service handles this differently, and not all of them handle it safely.\nAutomate failover when: You have a single-writer architecture, the service guarantees fencing of the old primary (Aurora Global, Azure SQL Failover Groups), and your RPO tolerance exceeds the typical replication lag.\nKeep failover manual when: You have application-level state to coordinate (cache invalidation, queue draining), unpredictable replication lag, or the cost of a false positive exceeds a few extra minutes of downtime. 
Most teams start manual and automate only after doing it manually at least three times.\n","date":"2026-02-22","description":"Disaster recovery options for cloud managed databases — RDS Multi-AZ, Aurora Global Database, Cloud SQL HA and cross-region replicas, Azure SQL geo-replication, Cosmos DB multi-region writes, DynamoDB Global Tables — with real failover timings, cost comparisons, and automation decisions.","lastmod":"2026-02-22","levels":["intermediate","advanced"],"reading_time_minutes":5,"section":"knowledge","skills":["cloud-database-architecture","disaster-recovery-planning","cost-optimization","failover-management"],"tags":["disaster-recovery","rds","aurora","cloud-sql","azure-sql","cosmos-db","dynamodb","multi-az","cross-region","failover","aws","gcp","azure"],"title":"Cloud Managed Database Disaster Recovery","tools":["aws-cli","gcloud","az-cli","terraform"],"url":"https://agent-zone.ai/knowledge/databases/cloud-managed-database-dr/","word_count":984}}