---
title: "FTS5 vs Cloudflare Vectorize: A/B Results on When Keyword Beats Semantic Search"
description: "Side-by-side ranking comparison across 5 query patterns on the same 456-article corpus. FTS5 wins on exact-keyword queries; Vectorize wins on conceptual/thematic queries. Concrete pattern for shipping both via an opt-in ?engine=vector flag."
url: https://agent-zone.ai/knowledge/serverless/fts5-vs-vectorize-when-each-wins/
section: knowledge
date: 2026-05-20
categories: ["serverless"]
tags: ["fts5","vectorize","cloudflare","semantic-search","full-text-search","bm25","embeddings","bge-base-en","search-relevance"]
skills: ["search-engine-selection","embedding-pipeline-design","ab-testing-search-relevance"]
tools: ["fts5","sqlite","vectorize","workers-ai","cloudflare-workers","d1"]
levels: ["intermediate","advanced"]
word_count: 1627
formats:
  json: https://agent-zone.ai/knowledge/serverless/fts5-vs-vectorize-when-each-wins/index.json
  html: https://agent-zone.ai/knowledge/serverless/fts5-vs-vectorize-when-each-wins/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=FTS5+vs+Cloudflare+Vectorize%3A+A%2FB+Results+on+When+Keyword+Beats+Semantic+Search
---


# FTS5 vs Cloudflare Vectorize

The "FTS5 vs vectors" debate is usually hand-wavy. Both sides cite plausible reasons, neither runs the same queries through both engines on the same corpus, and the conclusion is whichever one the author shipped. With identical data and identical queries you can measure exactly where each wins.

The result: FTS5 and Vectorize have non-overlapping strengths. The right answer for most knowledge-base workloads is to ship both behind an opt-in flag, not to pick one. This page covers the measurements, the cost math, and the dual-engine pattern.

## TL;DR — pattern-match before reading

- Exact keyword, technical proper noun, slug match → **FTS5**
- Conceptual query, synonym tolerance, intent match → **Vectorize**
- Same top hit, different order → tie; FTS5 is cheaper and faster
- Ship both behind `?engine=vector`; default stays on FTS5 until you can measure that conceptual queries dominate
- Embed `title + description` (not full body); cost ~$0.003 to seed 456 articles
- Store minimal metadata `{id, section}` in the vector index; fetch the rest from D1 by id

## The methodology

Same 456-article infrastructure-knowledge corpus. Same 5 query patterns. Same Worker. Dual-write: every article that lands in `content_index` is also embedded via Workers AI `bge-base-en-v1.5` and upserted into Vectorize. The Worker exposes `/api/v1/knowledge/search?q=...` on FTS5 by default and `?q=...&engine=vector` for the semantic path. Top-3 results compared side by side.

This beats opinion-based debates because the only variable is the search engine itself: how it indexes and how it ranks. The data, the query strings, the chunking, and the surface area are all held constant. Reference shipped commit: `agent-zone@69a9e89` (T3-B Vectorize semantic search opt-in).
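The side-by-side comparison can be made mechanical by scoring each query on top-k overlap. The following is a minimal sketch, not code from the referenced commit; `compareTopK`, the `Comparison` type, and its field names are hypothetical.

```ts
// Hypothetical helper: compare the top-k ids two engines return for one query.
type Comparison = {
  sameTopHit: boolean; // both engines agree on rank 1
  overlap: number;     // ids present in both top-k lists
  sameSet: boolean;    // identical ids, order ignored (assumes unique ids)
};

function compareTopK(fts: string[], vec: string[], k = 3): Comparison {
  const a = fts.slice(0, k);
  const b = vec.slice(0, k);
  const inB = new Set(b);
  const overlap = a.filter((id) => inB.has(id)).length;
  return {
    sameTopHit: a.length > 0 && a[0] === b[0],
    overlap,
    sameSet: a.length === b.length && overlap === a.length,
  };
}
```

A row where both engines return the same three articles in a different order comes out as `sameSet: true, sameTopHit: false`, which is the "tie" case in the TL;DR.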

## The numbers

| Query | FTS5 top 3 | Vectorize top 3 (cosine) | Winner |
|---|---|---|---|
| `kubernetes networking` | GKE Networking; Terraform Networking Patterns; EKS Networking | Multi-Cluster Kubernetes (0.809); Debugging K8s Network Connectivity (0.783); Running 7 Helm Services (0.766) | Vector — more synthesis |
| `gitops continuous deployment` | GitOps and IaC; Choosing a GitOps Tool; ArgoCD on Minikube | Choosing a GitOps Tool (0.738); ArgoCD on Minikube (0.732); GitOps and IaC (0.718) | Tie — same set, different order |
| `rate limiting` | Rate Limiting Implementation Patterns; Securing K8s Ingress; API Gateway Patterns | Rate Limiting Implementation Patterns (0.740); Cost-Per-Pass for Autonomous Agents (0.652); Securing K8s Ingress (0.637) | FTS — direct keyword |
| `secret rotation` | Secrets Rotation Patterns; Secrets Management in Minikube; Secrets in CI/CD | Secrets Rotation Patterns (0.741); Secrets Management in Minikube (0.694); Secrets Management Decision Framework (0.656) | Tie on #1; vector broader |
| `observability metrics` | Pipeline Observability; OpenTelemetry for K8s; Observability Stack Troubleshooting | Setting Up Full Observability from Scratch (0.796); Pipeline Observability (0.773); Real User Monitoring (0.737) | Vector — better intent match |

Stack: `bge-base-en-v1.5` embeddings (768 dim), cosine metric, Vectorize REST upsert. FTS5 with custom bm25 weights (title 5x, description 3x, body 1x, facets 2x). Embed source: title + description only — 10x faster and cheaper than full-body embedding, and the hand-curated summary is the strongest single semantic signal in the corpus.
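The parenthesised scores in the table are cosine similarities between the query embedding and each article embedding. Vectorize computes them server-side; the sketch below only illustrates the metric, and `cosineSimilarity` is a hypothetical name rather than part of any Cloudflare API.

```ts
// Cosine similarity between two equal-length vectors.
// Illustrative only: Vectorize does this inside the index.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the score depends only on direction, not magnitude, a value such as 0.74 is meaningful relative to other scores from the same model, not as an absolute relevance threshold.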

## Where FTS5 wins

Exact-keyword queries. Technical proper nouns. Slug matches. Anything where the user (or agent) already knows the term and wants matches that contain it. FTS5's bm25 is a precision instrument when the query terms exist in the documents; vectors smear that precision into a similarity gradient that can rank a synonymous-but-tangential article above the exact-match one.

`rate limiting` is the canonical case in the table above. FTS5 puts two articles that contain the literal phrase in slots 2-3; Vectorize puts an autonomous-agent cost article in slot 2 because the embedding model decided "rate limiting" is semantically near "cost-per-pass". For an agent dispatching against a backlog item titled "implement rate limiting on /api/v1", that is the wrong ranking.

The query shape that runs the FTS5 path:

```sql
SELECT cs.id, cs.title, cs.url,
  snippet(content_fts, -1, '<mark>', '</mark>', '...', 32) as snippet
FROM content_fts
JOIN content_search cs ON cs.id = content_fts.id
WHERE content_fts MATCH ?
ORDER BY bm25(content_fts, 1.0, 5.0, 3.0, 1.0, 2.0, 2.0, 1.5, 1.5)
LIMIT 10;
```

Latency: 5-15ms on D1, no external call, no embedding round-trip. The bm25 weights are tunable per-column and the snippet output drops straight into a UI or an agent's tool result.
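The value bound to `MATCH` is parsed as FTS5 query syntax, so raw user input containing characters such as `-`, `:`, or an unbalanced quote can fail with a syntax error. One way to guard against that, sketched here under the assumption that plain all-terms matching is what you want, is to quote every whitespace-separated token as an FTS5 string. `toFtsMatch` is a hypothetical helper, not part of the shipped Worker.

```ts
// Hypothetical helper: turn free-text input into a safe FTS5 MATCH expression.
// Each token becomes a quoted FTS5 string; embedded double quotes are doubled.
// Whitespace-separated strings are implicitly ANDed by FTS5.
function toFtsMatch(q: string): string {
  return q
    .trim()
    .split(/\s+/)
    .filter((t) => t.length > 0)
    .map((t) => `"${t.replace(/"/g, '""')}"`)
    .join(" ");
}
```

The trade-off is that operators such as `OR` and `NOT` typed by the user become literal terms. An empty result string should short-circuit to an empty result set in the caller rather than being sent to `MATCH`.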

## Where Vectorize wins

Conceptual queries. Synonyms. Intent-based search. "Find me articles related to X" patterns. The `kubernetes networking` row is the canonical case: FTS5 returns three cloud-vendor-specific networking articles because they match the query terms verbatim. Vectorize returns multi-cluster, debugging, and helm-services articles. Their titles match the query loosely at best ("K8s Network Connectivity", "Running 7 Helm Services"), yet all three are what a human or agent actually wants when they ask the question.

The embedding pipeline:

```ts
// Workers AI embed: `text` takes an array, so several articles can share one call
const resp = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
  text: [`${article.title}\n${article.description}`],
});
const vector = resp.data[0]; // 768-dim float array

// Upsert into Vectorize with minimal metadata
await env.VECTORIZE.upsert([{
  id: article.id,
  values: vector,
  metadata: { id: article.id, section: article.section },
}]);
```

Query:

```ts
const qvec = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [q] });
const matches = await env.VECTORIZE.query(qvec.data[0], { topK: 10 });
const ids = matches.matches.map(m => m.id);
// Hydrate from D1 by id
const { results } = await env.DB
  .prepare(`SELECT id, title, url, description FROM content_index WHERE id IN (${ids.map(() => "?").join(",")})`)
  .bind(...ids)
  .all();
// IN (...) does not preserve order, so restore the Vectorize ranking
const byId = new Map(results.map(r => [r.id, r]));
const rows = ids.map(id => byId.get(id)).filter(Boolean);
```

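Latency: 50-150ms cold path (embedding call + vector query + D1 hydrate). That is roughly 10x the 5-15ms FTS5 path, because every vector query pays for the embedding round-trip and the index lookup before it touches D1.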

## The ship-both pattern

Default stays on FTS5 (cheaper, faster, already proven). Vector is opt-in via a query-string flag. One handler, one cache key namespace, identical response shape:

```ts
export async function searchKnowledge(req: Request, env: Env) {
  const url = new URL(req.url);
  const q = url.searchParams.get("q") ?? "";
  const engine = url.searchParams.get("engine") ?? "fts";

  const results = engine === "vector"
    ? await searchVector(q, env)
    : await searchFts(q, env);

  return Response.json({ engine, query: q, results });
}
```

The two paths return the same `{id, title, url, snippet, score}` shape so the caller doesn't branch. Cache keys include the engine: `search:v1:fts:q=...` vs `search:v1:vec:q=...`. Telemetry tags the engine on every request so you can A/B real traffic later without a code change.
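A sketch of how the engine-scoped cache keys might be built. The `search:v1:fts` and `search:v1:vec` prefixes come from the text above; the query normalisation (trim, lowercase, collapsed whitespace), the URL-encoding, and the names `parseEngine` and `cacheKey` are assumptions for illustration.

```ts
type Engine = "fts" | "vector";

// Anything other than an explicit "vector" falls back to the FTS5 default.
function parseEngine(raw: string | null): Engine {
  return raw === "vector" ? "vector" : "fts";
}

// Hypothetical key builder. Lowercasing assumes neither engine treats
// case as significant for your corpus; drop it if that does not hold.
function cacheKey(engine: Engine, q: string): string {
  const normalized = q.trim().toLowerCase().replace(/\s+/g, " ");
  const tag = engine === "vector" ? "vec" : "fts";
  return `search:v1:${tag}:q=${encodeURIComponent(normalized)}`;
}
```

Normalising before keying means `Rate  Limiting` and `rate limiting` share one cache entry per engine, while the engine tag keeps the two result sets from overwriting each other.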

## What to embed

Title + description only. Not the full body. Three reasons:

1. **Cost.** 456 articles × ~100 tokens × $0.067/1M = ~$0.003 one-time seed. Full bodies are ~1000+ tokens each — $0.03+ to seed and 10x the recurring cost on update. Either is cheap, but title+description is the floor.
2. **Signal density.** The description is the hand-curated one-line summary of what the article is about. That is exactly the signal you want in an embedding. Full-body chunks dilute it with implementation detail, code blocks, and references.
3. **Complexity, not latency.** Query-side embedding is a single ~20-token call regardless of document size, so full-body embedding would not slow queries; it only raises upsert cost. The real tax is downstream: full bodies need a chunking strategy, and title+description sidesteps it entirely.

The results table above used title+description only and Vector still won the two conceptual queries cleanly. If you can win on the cheaper signal, do that first; chunked-body embedding is complexity to take on once the cheaper signal stops winning, not something to adopt by default.

## Vectorize metadata shape

Store the minimum in the vector index. `{id, section}` is enough. Everything else — title, description, URL, tags, dates — stays in D1 / your `content_index` table. The query flow:

1. Embed the query string (Workers AI, ~20 tokens, ~30ms).
2. Vectorize returns top-N ids + cosine scores + the minimal metadata you upserted.
3. Hydrate the full rows from D1 by id in one `SELECT ... WHERE id IN (...)`.

This keeps Vectorize storage tight (which keeps query latency tight), keeps your source-of-truth in one place (D1), and makes the response shape identical to FTS5 mode — same fields, same hydration query. No metadata sync drift, no double-write race on article updates.

## Cost reality at 1K queries/day

| Item | Math | Cost |
|---|---|---|
| Query embedding | 1K × 20 tokens × $0.067/1M × 30 days | ~$0.04/mo |
| Vector queries | 30K × 768 dims = 23M dims/mo (included on the Cloudflare paid plan: 50M) | $0 |
| Index storage | 456 vectors × 768 dim, well inside free tier | $0 |
| Index seed (one-time) | 456 × 100 tokens × $0.067/1M | ~$0.003 |
| **Total recurring** | | **<$0.10/mo** |
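
The arithmetic behind the table, as a small sketch you can rerun with your own corpus size and traffic. The price, token counts, and query volume are the figures from the table; the function and constant names are hypothetical.

```ts
// Dollar cost of embedding `tokens` tokens at `pricePerMillion` USD per 1M tokens.
function embeddingCost(tokens: number, pricePerMillion: number): number {
  return (tokens / 1_000_000) * pricePerMillion;
}

const PRICE_PER_MILLION = 0.067; // USD per 1M tokens, as used in the table

// One-time seed: 456 articles at ~100 tokens each.
const seedCost = embeddingCost(456 * 100, PRICE_PER_MILLION); // ≈ 0.0031

// Recurring: 1K queries/day at ~20 tokens each, over 30 days.
const monthlyQueryCost = embeddingCost(1_000 * 20 * 30, PRICE_PER_MILLION); // ≈ 0.0402

// Vector dimensions queried per month: 30K queries × 768 dims.
const queriedDims = 1_000 * 30 * 768; // 23,040,000
```

Scaling traffic 10x to 10K queries/day puts the embedding line at about $0.40/mo, so the conclusion holds well beyond the table's assumptions.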

Vectorize with Workers AI embeddings is not the cost story people warn you about. At any knowledge-base scale that fits in a single Worker, the dollars are noise. The real constraints are latency (every vector query adds an embedding round-trip) and operational complexity (a second index to keep in sync with the first).

## What NOT to use Vectorize for

Pure-keyword queries where FTS5 is already perfect. The `rate limiting` row is the worst case — vectors actively make the ranking worse by promoting a tangential cost article above an exact-match implementation guide.

Latency-critical paths. The ~100ms embedding+query overhead is fine for a search box; it is not fine for an autocomplete dropdown or any hot-path call in an agent's tool loop. If your latency budget is under 50ms, stay on FTS5.

Anywhere relevance quality isn't the actual bottleneck. If users aren't complaining about FTS5 ranking, vectors solve a problem you don't have, while adding embedding cost, a second index to maintain, and a code path to monitor.

## When to default to Vector

Only after you can measure that either (a) vector relevance is consistently better on your real query mix, not on a 5-query A/B table, or (b) your users' queries are mostly conceptual rather than keyword. Until then, FTS5 is the default and Vector is opt-in. The cost of getting the default wrong is silently worse search for the majority of queries while one demo-quality conceptual query looks great.

The opt-in flag lets you collect this data without committing. Log `engine` per query and the click-through (or for agents, which result was actually fetched as a follow-up). If `?engine=vector` traffic has higher fetch-through rates than the default after a few weeks, switch the default.
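A minimal sketch of the aggregation this implies, assuming each logged search is reduced to an event recording the engine and whether any result was fetched afterwards. The `SearchEvent` shape and `fetchThroughByEngine` are hypothetical; the real log schema is not specified here.

```ts
// Hypothetical log record: one per search request.
type SearchEvent = { engine: "fts" | "vector"; fetched: boolean };

// Fraction of searches per engine that led to a follow-up fetch.
// Engines with no traffic report 0 rather than NaN.
function fetchThroughByEngine(
  events: SearchEvent[],
): Record<"fts" | "vector", number> {
  const totals = { fts: 0, vector: 0 };
  const hits = { fts: 0, vector: 0 };
  for (const e of events) {
    totals[e.engine] += 1;
    if (e.fetched) hits[e.engine] += 1;
  }
  return {
    fts: totals.fts === 0 ? 0 : hits.fts / totals.fts,
    vector: totals.vector === 0 ? 0 : hits.vector / totals.vector,
  };
}
```

Opt-in traffic is self-selected, so callers who pass `?engine=vector` may already be asking conceptual questions. Treat a gap between the two rates as a signal to investigate rather than as proof on its own.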

## Decision checklist

- **Does the query contain a literal term the user expects in the result?** Use FTS5. Vectors will smear precision.
- **Does the query express an intent or concept without naming specific terms?** Use Vectorize. FTS5 will miss synonyms entirely.
- **Don't know which kind of query it is?** Ship both behind a flag, default to FTS5, instrument both paths, and let real traffic decide.

