---
title: "Cloudflare Vectorize Id 64-Byte Limit: The Hash-with-Metadata-Roundtrip Pattern"
description: "Vectorize caps vector ids at 64 bytes (not chars). The fix is SHA-256 hex hashing with the original id preserved in metadata so query results round-trip back to your source-of-truth table. Includes the exact partial-failure mode and a one-shot orphan cleanup endpoint."
url: https://agent-zone.ai/knowledge/serverless/vectorize-id-64-byte-limit-hash-pattern/
section: knowledge
date: 2026-05-20
categories: ["serverless"]
tags: ["cloudflare","vectorize","embeddings","data-modeling","production-gotcha","id-strategy"]
skills: ["vectorize-index-design","embedding-pipeline-development","production-debugging"]
tools: ["vectorize","cloudflare-workers","workers-ai","typescript"]
levels: ["intermediate"]
word_count: 826
formats:
  json: https://agent-zone.ai/knowledge/serverless/vectorize-id-64-byte-limit-hash-pattern/index.json
  html: https://agent-zone.ai/knowledge/serverless/vectorize-id-64-byte-limit-hash-pattern/?format=html
  api: https://api.agent-zone.ai/api/v1/knowledge/search?q=Cloudflare+Vectorize+Id+64-Byte+Limit%3A+The+Hash-with-Metadata-Roundtrip+Pattern
---


# Cloudflare Vectorize Id 64-Byte Limit

Cloudflare Vectorize caps vector ids at **64 BYTES**, not 64 characters. The naive `if id.length <= 64` skip-hashing check passes Unicode through and then fails at upsert time. The right pattern is unconditional SHA-256 hex hashing with the original id stored in metadata so query results round-trip back to your source-of-truth row.

## TL;DR

- The limit is **64 bytes**, not 64 chars. Multibyte UTF-8 hits it sooner than ASCII.
- Always hash the id. Never branch on length.
- Put the original id in `metadata.id`. Resolve back at query time.
- A single oversized id fails the WHOLE batch. There is no partial success.

## The error

```
VECTOR_UPSERT_ERROR (code = 40008): id too long; max is 64 bytes, got 67 bytes
```

This is a 4xx-class refusal at the upsert API. One bad id in an `env.VECTOR.upsert([...])` batch rejects every vector in the call; it is not partial-success-with-warnings. If you batch 100 vectors and one has a 67-byte id, none of the 100 land, and the loss goes unnoticed unless the caller surfaces the error.
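Because a rejected batch takes every vector with it, it helps to record which batches failed rather than let the error disappear. A minimal sketch, assuming the binding's `upsert` rejects its promise when the API refuses a batch; `UpsertTarget`, `BatchFailure`, and `upsertInBatches` are illustrative names, not part of the Vectorize API:

```ts
// Minimal structural type for the one method this sketch needs (an assumption;
// the real binding type comes from @cloudflare/workers-types).
interface UpsertTarget<V> {
  upsert(vectors: V[]): Promise<unknown>;
}

interface BatchFailure {
  ids: string[]; // every id in the rejected batch, not only the offending one
  error: string;
}

// Upserts in batches and collects the batches that were rejected, so an
// all-or-nothing failure is reported to the caller instead of being lost.
async function upsertInBatches<V extends { id: string }>(
  index: UpsertTarget<V>,
  vectors: V[],
  batchSize = 100,
): Promise<BatchFailure[]> {
  const failures: BatchFailure[] = [];
  for (let i = 0; i < vectors.length; i += batchSize) {
    const batch = vectors.slice(i, i + batchSize);
    try {
      await index.upsert(batch);
    } catch (err) {
      failures.push({
        ids: batch.map((v) => v.id),
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return failures;
}
```

A non-empty return value means those ids are missing from the index and need a retry once the offending id is fixed.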

## The wrong "fix"

```ts
// BROKEN — String.length counts UTF-16 code units, not bytes
async function vectorId(id: string): Promise<string> {
  if (id.length <= 64) return id;
  // ... hash ...
}
```

Why it breaks:

- 64 CJK chars in UTF-8 = up to 192 bytes. Passes the `length` check, fails the upsert.
- Emoji and combining marks: same story. An emoji outside the Basic Multilingual Plane is 2 UTF-16 code units (a surrogate pair) but 4 UTF-8 bytes, so `length` still undercounts.
- Two id formats now coexist in your index. Migrate the threshold later and you create orphans and duplicates.
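The gap between the two counts is easy to see directly. A minimal sketch, where `utf8ByteLength` is a hypothetical helper name and the sample ids are made up for illustration:

```ts
// Hypothetical helper: the UTF-8 byte length of a string, which is the unit
// the 64-byte limit is measured in.
function utf8ByteLength(s: string): number {
  return new TextEncoder().encode(s).byteLength;
}

const ascii = "a".repeat(64); // 1 code unit, 1 byte per character
const cjk = "漢".repeat(64); // BMP character: 1 code unit, 3 UTF-8 bytes
const emoji = "😀".repeat(32); // surrogate pair: 2 code units, 4 UTF-8 bytes

for (const id of [ascii, cjk, emoji]) {
  // All three report length 64; only the ASCII id is actually 64 bytes.
  console.log(id.length, utf8ByteLength(id));
}
```

All three ids pass `id.length <= 64`, yet only the ASCII one fits the byte limit.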

## The right fix — always hash

```ts
async function vectorId(id: string): Promise<string> {
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(id),
  );
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
  // 64 hex chars = 64 ASCII bytes, always within the limit.
}
```

Deterministic. ASCII-only output. No multibyte trap. Costs ~5µs per id on a Worker — irrelevant next to the embedding call.

## The metadata round-trip

Vectorize accepts a `metadata` blob per vector. Put the original id there so query results can find the source row in your D1 (or whatever) table:

```ts
const vectors = await Promise.all(rows.map(async (r, j) => ({
  id: await vectorId(r.id),
  values: embeddings[j],
  metadata: { id: r.id, section: r.section },
})));
await env.VECTOR.upsert(vectors);
```

At query time, resolve back to the original id:

```ts
const result = await env.VECTOR.query(qEmbedding, {
  topK: 30,
  returnMetadata: true,
});
const ids = result.matches.map(
  (m) => ((m.metadata as { id?: string })?.id) ?? m.id,
);
// Now SELECT * FROM content WHERE id IN (...) by these ids.
```

The `?? m.id` fallback covers vectors written under any earlier scheme where you didn't yet store the original.
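The final lookup can be built with bound parameters rather than string concatenation. A minimal sketch, assuming the `content` table and `id` column from the comment in the query snippet; `buildIdLookup` is a hypothetical helper name:

```ts
// Hypothetical helper: builds a parameterized IN (...) lookup for the
// original ids recovered from metadata.
function buildIdLookup(ids: string[]): { sql: string; params: string[] } {
  // Drop repeats so each row is requested once.
  const unique = Array.from(new Set(ids));
  if (unique.length === 0) {
    // An empty IN () list is not portable SQL, so return a query that
    // matches nothing instead.
    return { sql: "SELECT * FROM content WHERE 0", params: [] };
  }
  const placeholders = unique.map(() => "?").join(", ");
  return {
    sql: `SELECT * FROM content WHERE id IN (${placeholders})`,
    params: unique,
  };
}

// Usage with a D1 binding (assumed here to be env.DB):
// const { sql, params } = buildIdLookup(ids);
// const rows = await env.DB.prepare(sql).bind(...params).all();
```

The rows come back in table order, not score order, so re-sort them against the `ids` array if ranking matters.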

## Dedup if you migrated

If you previously used "original slug when short, hash when long" and switched to always-hash, the same article may exist twice in the index: once under its slug, once under its hash. Dedup at query time by `metadata.id`:

```ts
const seen = new Map<string, number>();
for (const m of result.matches) {
  const id = ((m.metadata as { id?: string })?.id) ?? m.id;
  if (!seen.has(id)) seen.set(id, m.score);
}
```

The best-scoring copy wins: Vectorize returns matches ordered best-first, so the first occurrence of each id is the one kept.

## Cleanup orphans (one-shot)

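To physically evict the old-scheme ids, call `deleteByIds` with the list of originals that were ≤64 chars under the old branch. SQLite's `LENGTH()` counts characters while `String.length` counts UTF-16 code units, so for ids containing surrogate pairs the query can select a few ids that the old branch actually hashed; those raw ids were never written to the index: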

```ts
const orphanIds = (await env.DB.prepare(
  "SELECT id FROM content_search WHERE LENGTH(id) <= 64"
).all()).results.map((r) => r.id as string);

for (const batch of chunk(orphanIds, 100)) {
  await env.VECTOR.deleteByIds(batch);
}
```
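The cleanup snippet calls a `chunk` helper that the article does not define. A minimal version, offered as an assumption about what it does rather than the original implementation:

```ts
// Splits an array into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

The last slice may be shorter than `size`, and an empty input yields no slices, so the delete loop simply does not run.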

Vectorize is eventually consistent. `vectorCount` from `describe()` may lag the delete by several minutes. Don't gate your deploy on the count returning to the expected value within seconds.

## Why 64 bytes

The docs don't justify it. Plausible reasons: LevelDB-style key sizing in the index storage layer, parity with other Cloudflare KV-like products (Workers KV keys max out at 512 bytes; Vectorize is tighter), or page-alignment of the id column in the underlying store. It is not configurable. It is not negotiable. Build for it.

## Reference

Implemented in `agent-zone@69a9e89` — admin reindex endpoint hashes ids, preserves originals in metadata, includes the cleanup helper. Of 456 article slugs, 7 exceeded 64 chars. The first deploy used the broken `id.length <= 64` skip and silently dropped those 7. The second deploy with always-hash captured all 456.

## Common Mistakes

**Trusting `String.length`** as a byte count. It is a UTF-16 code-unit count. Use `new TextEncoder().encode(s).byteLength` if you ever need a real byte length — but for vector ids, just hash unconditionally and skip the question.

**Forgetting `returnMetadata: true`** on query. Without it, `m.metadata` is `undefined` and your round-trip silently falls through to the hash. Your search results "work" but every id is a 64-char hex string instead of your slug.

**Recording the embedding model only at the index level.** If you rotate models, you need to know which vectors came from which model. Add `model: "@cf/baai/bge-base-en-v1.5"` to each vector's metadata too, alongside the id.

**Assuming partial-success on batch upsert.** One 65-byte id in a 100-vector batch rejects all 100. Validate (or hash) every id before the batch leaves the Worker.

If you see code 40008 in production, this is the pattern.

