---
title: "Cloudflare KV Cache-Warming Doesn't Work the Way You Think"
description: "Writing to KV from a Worker on deploy does NOT push the value to all regional edges. KV is pull-on-first-read with a 60-second propagation delay. Why 'warm the cache before traffic arrives' breaks the premise, and what to do instead."
url: https://agent-zone.ai/knowledge/serverless/cloudflare-kv-cache-warming-misconception/
section: knowledge
date: 2026-05-20
categories: ["serverless"]
tags: ["cloudflare","kv","caching","performance","production-gotcha","edge-computing"]
skills: ["cloudflare-kv-design","caching-strategy","edge-architecture"]
tools: ["cloudflare-workers","kv","wrangler"]
levels: ["intermediate"]
word_count: 999
formats:
  json: https://agent-zone.ai/knowledge/serverless/cloudflare-kv-cache-warming-misconception/index.json
  html: https://agent-zone.ai/knowledge/serverless/cloudflare-kv-cache-warming-misconception/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Cloudflare+KV+Cache-Warming+Doesn%27t+Work+the+Way+You+Think
---


# Cloudflare KV Cache-Warming Doesn't Work the Way You Think

A common "obvious" optimization for Cloudflare KV: at the end of your deploy, write the top-N popular cache entries (search results, config blobs, computed views) so the cache is "warm" when production traffic arrives. **This doesn't do what you think.**

KV writes go to **central data stores only**. Regional edges populate **on first read in that region**, and an edge that already holds an older copy can keep serving it for **60 seconds or more** before it sees the update. Writing from one Worker doesn't push the value globally; the first read in every other region still pays the central-store fetch.

## TL;DR

- KV writes land in central stores, not edges.
- Edges populate on first read, with up to 60s propagation.
- Warming from one Worker only warms that Worker's edge — every other region still cold-starts.

## The CF docs are explicit

> "Updates to existing values may take up to 60 seconds (or potentially longer) to propagate to all edge locations."
> — Cloudflare KV: How it works

> "Writes are persisted to a central store. Edge locations populate from the central store on read."

There is no API primitive to "push" a value to every edge. The "warm everywhere" operation does not exist.

## Why the misconception is so common

The analogy is intuitive — it is just borrowed from systems where it actually holds:

- **Redis cluster**: a write lands on the shard that owns the key, and reads for that key are routed to the same shard
- **Memcached at fixed nodes**: direct write to a specific cache instance
- **CDN purge + prefetch**: supported as a first-class primitive
- **Service worker `cache.put()`**: local cache, no inheritance problem

KV breaks the analogy because the global namespace hides per-edge cache state behind a single API. You call `KV.put(key, value)` and it looks like the cache is now populated. It is — centrally. Not at the 300+ edge POPs.

## What actually happens when you "warm" KV from a Worker

1. Your warming Worker writes to `kvKey` from edge A.
2. The value lands in CF's central store.
3. Edge A's cache is seeded as a side effect of the write path.
4. A real user request arrives at edge B (different region).
5. Edge B reads from central — pays the cold-fetch latency (~50-200ms).
6. Edge B's cache is now seeded.

Warming only "worked" for edge A.
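The sequence above can be sketched as a toy in-memory model. This is only an illustration of pull-on-read behaviour, not Cloudflare's implementation: the `ToyKV` class, its methods, and the colo codes are all made up for the example.

```ts
// Toy model only: illustrates the pull-on-read behaviour described above.
// It is not Cloudflare's implementation; all names here are hypothetical.
class ToyKV {
  private central = new Map<string, string>();
  private edges = new Map<string, Map<string, string>>();

  private edgeCache(colo: string): Map<string, string> {
    let cache = this.edges.get(colo);
    if (!cache) {
      cache = new Map<string, string>();
      this.edges.set(colo, cache);
    }
    return cache;
  }

  // A write persists centrally and seeds only the edge that performed it.
  put(colo: string, key: string, value: string): void {
    this.central.set(key, value);
    this.edgeCache(colo).set(key, value);
  }

  // A read is served from the local edge if present; otherwise it is
  // fetched from the central store and cached at that edge.
  get(colo: string, key: string): { value: string | null; source: "edge" | "central" } {
    const cache = this.edgeCache(colo);
    const cached = cache.get(key);
    if (cached !== undefined) return { value: cached, source: "edge" };
    const value = this.central.get(key) ?? null;
    if (value !== null) cache.set(key, value);
    return { value, source: "central" };
  }
}

const kv = new ToyKV();
kv.put("AMS", "warmed-key", "v1");             // "warming" from edge A
const readAtA = kv.get("AMS", "warmed-key");   // served by edge A
const firstAtB = kv.get("SYD", "warmed-key");  // edge B still goes to central
const secondAtB = kv.get("SYD", "warmed-key"); // edge B is now seeded
console.log(readAtA.source, firstAtB.source, secondAtB.source); // prints: edge central edge
```

The model leaves out TTLs and the staleness window; it only shows that a write seeds one edge while every other edge waits for its own first read.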

## Diagnostic

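```ts
// Deploy, warm "warmed-key" from one region, then call this Worker from two regions.
export default {
  async fetch(request: Request, env: { CACHE: KVNamespace }): Promise<Response> {
    const t0 = Date.now();
    const value = await env.CACHE.get("warmed-key");
    const ms = Date.now() - t0;
    console.log(`region=${request.cf?.colo} found=${value !== null} ms=${ms}`);
    return new Response("ok");
  },
};
```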

If warming worked globally, the first post-warm read from edge B would be as fast as one from edge A. It is not: the first request at edge B measures roughly 50-200ms, and later requests at that same edge measure under 5ms.

## What to do instead

- **Lazy populate + sensible TTL.** First read at each edge pays the central-store fetch; subsequent reads are fast. With a one-hour `cacheTtl` on reads, the population overhead amortizes to near-zero.
- **Cache API (`cache.put(request, response)`)** for HTTP responses. Region-local too — but that is exactly the right model for response-level caching. Explicit, no surprise.
- **Edge-side rendering with cache headers** — let downstream CDN/browser caches do the storage; the Worker is just a router.
- **Durable Objects** for coordinated counters or state. Single-region by design, strong consistency, no propagation surprise.
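The lazy-populate option might look like the following sketch. `getOrCompute` and the `KVLike` interface are hypothetical names introduced here; `KVLike` is a minimal stand-in for the KV binding that covers only the two calls used, with the `cacheTtl` read option and the `expirationTtl` write option. Treat it as one possible shape under those assumptions, not a definitive implementation.

```ts
// Minimal stand-in for the KV binding, covering only the calls used below,
// so the sketch does not depend on @cloudflare/workers-types.
interface KVLike {
  get(key: string, options?: { cacheTtl?: number }): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Hypothetical helper: read through KV, computing and storing only on a miss.
async function getOrCompute(
  kv: KVLike,
  key: string,
  compute: () => Promise<string>,
  ttlSeconds = 3600,
): Promise<string> {
  // cacheTtl asks the edge serving this read to keep its copy for ttlSeconds.
  const cached = await kv.get(key, { cacheTtl: ttlSeconds });
  if (cached !== null) return cached;

  // Miss: compute once, then store centrally with an expiry.
  const fresh = await compute();
  await kv.put(key, fresh, { expirationTtl: ttlSeconds });
  return fresh;
}
```

Inside a Worker this would be called as something like `await getOrCompute(env.CACHE, key, () => runSearch(query))`, where `runSearch` is a placeholder for whatever produces the value. No deploy-time step is involved: each edge pays the central fetch once and serves from its own cache afterwards.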

## High-traffic auto-warming

If you have 1K req/sec hitting the same key across regions, every edge populates within seconds anyway. KV's pull-on-read converges fast under load. The "cold morning" problem only bites at low traffic — and at low traffic, the cold-start cost is also low in absolute terms (a handful of users see 200ms once per region per TTL window).

## Versioned cache keys on deploy

Common pattern: `cache:v${VERSION}:...` to invalidate atomically on deploy. After the version bump, every edge sees a miss on the new key and re-populates from central. Pre-writing the new key from one Worker does seed the central store, so the first read elsewhere finds the value there instead of recomputing it from origin, but it does not pre-populate the edges. The benefit is marginal and not worth a separate warming pipeline.
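A minimal sketch of the versioned-key pattern, assuming the version is available as a string at build or deploy time. The `versionedKey` helper and the example key parts are hypothetical.

```ts
// Hypothetical helper: prefix every cache key with the deploy version so a
// version bump makes all previous entries unreachable at once.
function versionedKey(version: string, ...parts: string[]): string {
  return ["cache", `v${version}`, ...parts].join(":");
}

const before = versionedKey("41", "search", "kubernetes");
const after = versionedKey("42", "search", "kubernetes");
console.log(before); // cache:v41:search:kubernetes
console.log(after);  // cache:v42:search:kubernetes
```

Entries under the old version are never read again after the bump, so it helps to write them with an `expirationTtl` and let them age out rather than deleting them explicitly.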

## Cost reality

`KV.put()` is **$5 per 1M writes** vs **$0.50 per 1M reads** — writes are 10× the read cost. A warming pipeline that batch-writes 100 keys on every deploy compounds this:

- Free tier: 1K writes/day. A 100-key warm uses 10% of the daily allowance per deploy.
- Paid: 10 deploys/day × 100 keys = 1,000 writes = $0.005/day in writes alone, against zero measured latency benefit outside the warming Worker's region.

You are paying real money for a placebo.
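The arithmetic above can be reproduced with a small calculation. The price and free-tier limit are the figures quoted in this section and are treated as assumptions; check current pricing before relying on them. The `warmingCost` function is a hypothetical helper.

```ts
// Figures quoted in this section; verify against current pricing.
const WRITE_USD_PER_MILLION = 5;
const FREE_TIER_WRITES_PER_DAY = 1_000;

// Hypothetical helper: daily write volume and cost of a deploy-time warming step.
function warmingCost(deploysPerDay: number, keysPerDeploy: number) {
  const writesPerDay = deploysPerDay * keysPerDeploy;
  return {
    writesPerDay,
    usdPerDay: (writesPerDay * WRITE_USD_PER_MILLION) / 1_000_000,
    freeTierShare: writesPerDay / FREE_TIER_WRITES_PER_DAY,
  };
}

const tenDeploys = warmingCost(10, 100);
const oneDeploy = warmingCost(1, 100);
console.log(tenDeploys.writesPerDay, tenDeploys.usdPerDay); // 1000 0.005
console.log(oneDeploy.freeTierShare); // 0.1
```

On the free tier, ten such deploys in a day would use the entire daily write allowance on warming alone.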

## Bottom line

If you find yourself designing a "warm the KV cache on deploy" pipeline, stop. The premise is broken — KV's architecture (central store + per-edge pull on read + 60s propagation) makes globally-warm KV physically impossible from a single Worker call.

Use lazy populate + TTL, plus (optionally) a versioned cache key for atomic invalidation. That is the correct shape.

## Common Mistakes

**Confirming "warming worked" by reading from the same Worker that wrote.** Of course it works — you are reading from the edge that just wrote. Test from a different region (or `colo`) and watch the latency.

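**Assuming `expirationTtl` controls propagation.** It does not. `expirationTtl` controls when the key expires from KV, not how it replicates; how long an edge keeps its cached copy is governed by the `cacheTtl` option on reads. A freshly-written key with `expirationTtl: 3600` still propagates lazily on first read in each region.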

**Building deploy hooks that warm hundreds of keys.** You pay 10× per write vs read, you do not get edge-level warmth, and you burn through free-tier write budgets. The cost is real; the win is imaginary.

**Conflating KV with the Cache API.** Cache API is also region-local, but it is honest about it: you call `cache.put(request, response)` and the docs state that the cached entry lives only in the data center that handled the request. No surprise. KV's surprise is that the API shape looks global while the semantics are not.

## Reference

This misconception surfaced (and a planned cache-warming step was killed before shipping) in the agent-zone Tier 3 optimization work — see commit `agent-zone@5c500f2` for the search optimization that used the correct lazy-populate pattern instead.

If your design doc says "warm the cache," rewrite it as "populate lazily with versioned key."

