---
title: "Cloudflare Search Optimization: A Tiered Methodology (App -> Schema -> Platform)"
description: "Three-tier framework for taking a Cloudflare Worker + D1 + KV search endpoint from 677ms to 355ms average, then unlocking platform-level wins with Smart Placement + D1 read replicas + Vectorize. Each tier scoped by cost (free / app-level / platform-feature-research-gated)."
url: https://agent-zone.ai/knowledge/serverless/cloudflare-search-optimization-tiered-methodology/
section: knowledge
date: 2026-05-20
categories: ["serverless"]
tags: ["cloudflare","cloudflare-workers","d1","kv","vectorize","search-optimization","fts5","smart-placement","latency","performance"]
skills: ["api-performance-tuning","cloudflare-workers-development","d1-database-optimization","caching-strategy"]
tools: ["cloudflare-workers","d1","kv","vectorize","wrangler","fts5"]
levels: ["intermediate","advanced"]
word_count: 1905
formats:
  json: https://agent-zone.ai/knowledge/serverless/cloudflare-search-optimization-tiered-methodology/index.json
  html: https://agent-zone.ai/knowledge/serverless/cloudflare-search-optimization-tiered-methodology/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Cloudflare+Search+Optimization%3A+A+Tiered+Methodology+%28App+-%3E+Schema+-%3E+Platform%29
---


# Cloudflare Search Optimization: A Tiered Methodology

A CF Workers + D1 + KV search endpoint has three classes of work you can ship to make it faster. They differ by cost-to-ship, not by impact. Order them right and you ship ~50% latency reduction in a day; order them wrong and you burn a week on Vectorize when the real win was a `SELECT *` you forgot to trim.

This page is the methodology, observed end-to-end on `api.agent-zone.ai/api/v1/knowledge/search` going from a 677ms baseline to 355ms then unlocking platform-level scale. Each tier is `scope -> moves -> measured impact -> shipped commit`.

## TL;DR — pick a tier in 30 seconds

- **Start at Tier 1.** Pure TS/SQL inside the Worker. No infra. Same-day PR. Always the biggest per-hour win.
- **Go to Tier 2** once Tier 1 is shipped and you still see cold paths >500ms. Pure SQLite + HTTP-standard work (bm25, ETag/304, narrow JOIN tables).
- **Go to Tier 3** only after Tier 1+2 and only if your callers are geographically distributed OR you need semantic recall. Workers Paid tier required for some moves. Mostly $0 incremental, but research-gated.
- **Do NOT reach for Tier 3 first.** At low traffic, structural latency lives in Tier 1. Smart Placement, replicas, and Vectorize do not fix a 50-row JSON payload you should not be sending.
- **Do NOT trust KV cache-warming.** Writes do not propagate to edges; the cache populates on read, per-colo. Plan for cold misses.

What to grep for before starting work:

- `SELECT *` or any column-list including `content_plain` in your search query
- Cache keys built from raw `q` without normalization (`q.trim().toLowerCase()`)
- TTLs <600s on responses that change only on content deploy
- `ORDER BY rank` without bm25 column weights
- Missing `ETag` / `If-None-Match` handling on read endpoints
- Single-region D1 with non-WNAM caller distribution

## Baseline before you cut code

Pull the per-request latency distribution before touching anything. If it's a tight band (say 587-882ms across 10 reqs, all `agent-zone-mcp/0.1.0`), the latency is structural — Tier 1 will move it. If it's outlier-dominated (9 fast + 1 cold), focus on TTL alone.

```sql
SELECT
  endpoint,
  COUNT(*) AS n,
  AVG(response_time_ms) AS avg_ms,
  MIN(response_time_ms) AS min_ms,
  MAX(response_time_ms) AS max_ms
FROM request_log
WHERE endpoint LIKE '/api/v1/knowledge/search%'
  AND timestamp > strftime('%s', 'now', '-7 days') * 1000
GROUP BY endpoint;
```

Observed baseline: 677ms avg, 587-882ms range, 0% effective cache hits.

## Tier 1 — App-only, same-day, free

Pure TS/SQL changes inside the Worker. No new infra, no new bindings, no platform-feature research. Ship as one PR. After Tier 1: 355ms avg (-48%), 201-697ms range, ~68% cache hits at ~210ms each.

### T1-A: drop `content_plain` from the search SELECT

A search response should not carry full article bodies. Each row of `content_index` averages ~8KB of `content_plain`; multiply by `limit=20` and you are serializing 160KB of body content into a result list nobody asked for. Use FTS5's `snippet()` instead and redirect full-body readers to `get_article`.

```sql
-- before:
SELECT id, title, description, content_plain, ...
FROM content_index JOIN content_fts ON ...

-- after:
SELECT
  ci.id, ci.title, ci.description,
  snippet(content_fts, -1, '<mark>', '</mark>', '...', 32) AS snippet
FROM content_index ci
JOIN content_fts ON content_fts.rowid = ci.rowid
WHERE content_fts MATCH ?
```

The snippet is highlighted, capped at ~32 tokens, and rendered server-side once instead of on every consumer.

### T1-B: normalize `q` before the cache key

`"ArgoCD"`, `"argocd"`, and `"  argocd  "` are the same FTS5 query (FTS5 is case-insensitive) but three different cache entries. Normalize once at the entry point and your hit rate jumps without any new cache logic.

```ts
const qNorm = (url.searchParams.get("q") ?? "")
  .trim()
  .toLowerCase()
  .replace(/\s+/g, " ");
const cacheKey = `search:v2:${qNorm}:${category}:${limit}`;
```

The `v2` suffix is the invalidation lever for T1-D below.

### T1-C: cache the formatted JSON, not raw D1 rows

Storing rows means you re-run the response-shaping code (snippet HTML, URL building, frontmatter merging) on every hit. Store the final `Response` body bytes in KV and serve them directly. Include `format` and a `content_version` stamp in the cache key so a content deploy invalidates cleanly.

```ts
const cached = await env.CACHE.get(cacheKey, "stream");
if (cached) {
  return new Response(cached, {
    headers: { "content-type": "application/json", "x-cache": "HIT" },
  });
}
const body = JSON.stringify(buildResponse(rows));
ctx.waitUntil(env.CACHE.put(cacheKey, body, { expirationTtl: 3600 }));
```

### T1-D: bump TTL with deploy-time invalidation

Default 300s TTL on a content set that changes every few hours is leaving free hits on the table. Bump to 3600s and bump `content_version` at sync time so a deploy doesn't serve stale rows.

```bash
# in sync-content.ts post-success hook:
npx wrangler kv:key put --binding=CACHE \
  content_version "$(date +%s)"
```

Read it once per request and fold into the cache key. Shipped: `agent-zone@8da5049`.

## Tier 2 — Bigger app changes, no platform features

Still pure SQLite + HTTP-standard work. Bigger PR than Tier 1, no new bindings. After Tier 2: cold-path -10% (608ms avg), ETag 304s return 0 bytes, bm25 visibly improves relevance.

### T2-A: bm25 column weights

FTS5's default `ORDER BY rank` weights every column equally. For a search index with title, description, content, and facets, that under-weights titles and over-weights body content. Use `bm25()` with explicit weights matching the FTS5 column order:

```sql
SELECT ci.id, ci.title, ci.description,
  snippet(content_fts, -1, '<mark>', '</mark>', '...', 32) AS snippet,
  bm25(content_fts, 1.0, 5.0, 3.0, 1.0, 2.0, 2.0, 1.5, 1.5) AS score
FROM content_index ci
JOIN content_fts ON content_fts.rowid = ci.rowid
WHERE content_fts MATCH ?
ORDER BY score
LIMIT ?
```

The weights map to FTS5 columns in the order declared at `CREATE VIRTUAL TABLE` time — get the order wrong and you boost `slug` instead of `title`. Verify with `PRAGMA table_info(content_fts);` first. Title 5x, description 3x, facets 2x is the production set.

### T2-B: weak ETag + 304

A bandwidth win, not a latency win when network RTT dominates — but free to ship and removes 100% of cold-path body bytes on conditional requests. Use FNV-1a 32-bit, NOT SHA-256 (`crypto.subtle.digest('SHA-256', ...)` is ~50x slower per request than a 9-line FNV loop).

```ts
function fnv1a32(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = (h + ((h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24))) >>> 0;
  }
  return `W/"${h.toString(16)}"`;
}

const etag = fnv1a32(body);
if (request.headers.get("if-none-match") === etag) {
  return new Response(null, { status: 304, headers: { etag } });
}
return new Response(body, { headers: { "content-type": "application/json", etag } });
```

The `W/` prefix marks it weak; correct for content where byte-equality is not guaranteed across runtimes.

### T2-C: narrow `content_search` table

A physical clone of `content_index` minus `content_plain`. JOIN row width drops ~13x (~8KB -> ~600B). At 456 rows the gain is modest; pays off as content grows past ~5K rows. Dual-write in `sync-content`:

```sql
CREATE TABLE content_search (
  id TEXT PRIMARY KEY,
  rowid INTEGER UNIQUE,
  title TEXT, description TEXT, category TEXT,
  tags_json TEXT, weight INTEGER, updated_at INTEGER
  -- no content_plain
);
```

```ts
// in sync-content.ts after content_index INSERT:
await db.prepare(
  `INSERT INTO content_search (id, rowid, title, description, category, tags_json, weight, updated_at)
   VALUES (?,?,?,?,?,?,?,?) ON CONFLICT(id) DO UPDATE SET ...`
).bind(...).run();
```

Search reads from `content_search`; `get_article` reads from `content_index`. Shipped: `agent-zone@6761675`.

## Tier 3 — Platform features, research-gated

Mostly $0 incremental but each move requires reading platform docs, sometimes a token-scope refresh, and at least one of them (T3-C) is API-only (no `wrangler.jsonc` knob). After Tier 3-A and 3-C: same WNAM-client latency, but non-WNAM callers stop paying ocean RTT. After Tier 3-B: opt-in semantic search at <$0.10/mo.

### T3-A: Smart Placement

One line in `wrangler.jsonc`. Free on all Workers plans, including Free tier.

```jsonc
{
  "name": "agent-zone-api",
  "main": "src/index.ts",
  "placement": { "mode": "smart" }
}
```

CF places Worker invocations near the binding (D1 primary, in most cases) instead of near the client, when its model predicts that's faster. Verify via the `cf-placement: local-<COLO>` or `cf-placement: remote-<COLO>` response header. It does NOT help callers already in the same region as your D1 primary — those calls were already optimal. It helps everyone else, and only after a few hours of "learning" traffic.

### T3-C: D1 read replicas via Sessions API

D1 read replicas are enabled per-database, not per-binding. Hit the REST API to flip the mode (the wrangler CLI did not expose this knob as of ship time):

```bash
curl -X PATCH "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/d1/database/$DB_ID" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"read_replication":{"mode":"auto"}}'
```

CF auto-creates 6 regional replicas (ENAM, WNAM, WEUR, EEUR, APAC, OC). In the Worker, use the Sessions API instead of raw `env.DB.prepare`:

```ts
const session = env.DB.withSession("first-unconstrained");
const rows = await session.prepare(SEARCH_SQL).bind(q).all();
```

`first-unconstrained` reads from the nearest replica and accepts up to ~60s of read-after-write lag. For an endpoint where data changes hours apart (deploy-time only), that's safe and you skip bookmark threading entirely. For write-followed-by-read paths, pass the bookmark from the prior write.

Token requirement: your CF API token needs `D1:Edit` scope. The default Pages-deploy token does not.

### T3-B: Vectorize semantic search (opt-in)

Don't replace FTS5 — add a `?engine=vector` flag and let consumers A/B. FTS5 is fine for most queries and free; vector cost is real per-query.

```ts
if (url.searchParams.get("engine") === "vector") {
  const queryEmbedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
    text: [qNorm],
  });
  const matches = await env.VECTORIZE.query(queryEmbedding.data[0], {
    topK: limit,
    returnMetadata: "all",
  });
  return jsonResponse(matches);
}
```

Seeding 456 articles with bge-base embeddings: ~$0.03 one-time. Steady-state at agent-zone traffic: <$0.10/mo. Vectorize IDs are capped at 64 bytes (UTF-8) — use a short hash, not the full slug. Store the full slug + title in metadata so you can roundtrip back to `content_index` for a full render. Shipped: `agent-zone@5c500f2` (T3-A + T3-C), `agent-zone@69a9e89` (T3-B).

## What to measure before each tier

- **Pre-Tier-1**: per-request latency distribution. Tight band → Tier 1 is structural and helps. Outlier-dominated → bump TTL first, defer the rest.
- **Pre-Tier-2**: cache-hit rate after Tier 1. If hits are >80% of traffic and cold-path latency is acceptable, Tier 2 is premature optimization.
- **Pre-Tier-3**: caller geography (parse `cf-connecting-ip` ranges or use Workers Analytics country breakdown) and confirm Workers Paid tier. Replicas + Smart Placement mostly benefit non-primary-region callers; if 95% of your traffic is WNAM and your D1 is WNAM, skip 3-A and 3-C.

## What NOT to do

- Reach for Tier 3 first. The structural latency at low traffic lives in Tier 1. A 50-row response with 160KB of unwanted `content_plain` is not fixed by Smart Placement.
- Expect KV cache-warming to work cluster-wide. KV writes do not propagate to edge colos; each colo populates on first read. Plan for cold misses per-region.
- Use SHA-256 for ETags. The 50x slowdown vs FNV-1a is not worth the collision resistance you don't need.
- Over-engineer measurement. CF GraphQL Analytics + a per-request `request_log` row cover 95% of optimization decisions. Don't add OpenTelemetry, span propagation, or a third-party tracing vendor to make a search endpoint faster.
- Mix bm25 column weights without checking FTS5 column order. The weights are positional; get them wrong and you boost `slug` instead of `title`.

## Bottom line / decision framework

If you're staring at a slow Cloudflare search endpoint right now, start with Tier 1 — it is the only tier whose per-hour ROI is guaranteed regardless of traffic shape, and it shipped a 48% improvement in one PR on the reference workload. Move to Tier 2 only after Tier 1 is live and cold-path latency is still your top complaint; bm25 weighting and ETags pay off but they're a smaller delta. Save Tier 3 for after you've confirmed Workers Paid tier AND that your callers span regions — Smart Placement and replicas help non-primary-region callers, and Vectorize is worth it only when keyword recall is provably failing. The methodology is "cheap → app → platform", and skipping the cheap layer is the most expensive mistake.