Cloudflare Vectorize Id 64-Byte Limit: The Hash-with-Metadata-Roundtrip Pattern

Cloudflare Vectorize Id 64-Byte Limit

Cloudflare Vectorize caps vector ids at 64 BYTES, not 64 characters. The naive if id.length <= 64 skip-hashing check counts characters rather than bytes, so multibyte Unicode ids pass it and then fail at upsert time. The right pattern is unconditional SHA-256 hex hashing, with the original id stored in metadata so query results round-trip back to your source-of-truth row.

TL;DR

  • The limit is 64 bytes, not 64 chars. Multibyte UTF-8 hits it sooner than ASCII.
  • Always hash the id. Never branch on length.
  • Put the original id in metadata.id. Resolve back at query time.
  • A single oversized id fails the WHOLE batch. There are no partial-success semantics.
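The bullets above can be sketched in a few lines. This is a minimal Python illustration, not Vectorize client code: `hashed_id`, `to_vector`, and `resolve_ids` are hypothetical helper names, and the record layout (`id`, `values`, `metadata`) is an assumption about what the upsert call takes. A SHA-256 hex digest is always 64 ASCII characters, so it is exactly 64 bytes whatever the input looks like.

```python
import hashlib

MAX_ID_BYTES = 64  # the limit is counted in bytes, not characters


def hashed_id(original_id: str) -> str:
    """Return the SHA-256 hex digest of the id: always 64 ASCII chars = 64 bytes."""
    return hashlib.sha256(original_id.encode("utf-8")).hexdigest()


def to_vector(original_id: str, values: list[float]) -> dict:
    """Build an upsert record (assumed shape). Hashes unconditionally and
    keeps the original id in metadata for the round trip."""
    return {
        "id": hashed_id(original_id),
        "values": values,
        "metadata": {"id": original_id},
    }


def resolve_ids(matches: list[dict]) -> list[str]:
    """Map query matches back to source-of-truth ids via metadata.id."""
    return [m["metadata"]["id"] for m in matches]


# 40 characters but 80 bytes in UTF-8: a character-count check lets it through.
unicode_id = "\u00e9" * 40
assert len(unicode_id) <= 64
assert len(unicode_id.encode("utf-8")) > MAX_ID_BYTES

vec = to_vector(unicode_id, [0.1, 0.2, 0.3])
assert len(vec["id"].encode("utf-8")) == MAX_ID_BYTES
assert resolve_ids([vec]) == [unicode_id]
```

Because the hash is applied to every id, short ASCII ids and long multibyte ids take the same code path, and there is no length branch to get wrong.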

The error

VECTOR_UPSERT_ERROR (code = 40008): id too long; max is 64 bytes, got 67 bytes

This is a 4xx-class refusal at the upsert API. One bad id in a vectorize.upsert([...]) batch rejects every vector in the call; there is no partial success with warnings. If you batch 100 vectors and one has a 67-byte id, none of the 100 land.
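Since one oversized id rejects the whole call, a cheap pre-flight check can catch it before the batch is sent. A minimal sketch, assuming each record is a dict with an `id` key; `oversized_ids` is a hypothetical helper, not part of any Vectorize client.

```python
MAX_ID_BYTES = 64


def oversized_ids(batch: list[dict], limit: int = MAX_ID_BYTES) -> list[str]:
    """Return every id in the batch whose UTF-8 encoding exceeds the byte limit."""
    return [v["id"] for v in batch if len(v["id"].encode("utf-8")) > limit]


batch = [{"id": "a" * 64}, {"id": "a" * 67}, {"id": "\u00e9" * 33}]
bad = oversized_ids(batch)
if bad:
    # Refuse locally instead of letting one id sink every other vector in the call.
    print(f"{len(bad)} id(s) over {MAX_ID_BYTES} bytes; not sending batch")
```

With unconditional hashing in place this check should never fire, which makes it a useful assertion rather than a code path.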

Cost-Per-Pass, Not Cost-Per-Call: The Right Metric for Autonomous Agent Routing

Cost-Per-Pass, Not Cost-Per-Call

Practitioners price LLMs by the per-token rate on the provider’s pricing page. For autonomous agents, that number is misleading. Two layers of indirection sit between the per-token rate and the cost you actually pay to get work done: variable prompt sizes turn per-token into per-call, and variable pass rates turn per-call into per-pass. Each layer can invert the ranking.

For autonomous fleets where failed attempts trigger reviewer cycles, retries, and reputational drag, cost-per-pass is the only metric that ranks models correctly. This article shows how to compute it, when it dominates, and where the cheapest-per-token model becomes the most expensive in production.
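The two layers of indirection can be written out as arithmetic. The sketch below is an illustration under stated assumptions: the `ModelStats` fields and every number in it are hypothetical, and cost-per-pass is modelled simply as cost-per-call divided by pass rate (independent retries, no reviewer or retry overhead).

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    input_rate: float    # USD per million input tokens
    output_rate: float   # USD per million output tokens
    input_tokens: int    # average prompt size per call
    output_tokens: int   # average completion size per call
    pass_rate: float     # fraction of attempts that pass, in (0, 1]


def cost_per_call(m: ModelStats) -> float:
    """Per-token rates weighted by how many tokens a call actually uses."""
    return (m.input_tokens * m.input_rate + m.output_tokens * m.output_rate) / 1_000_000


def cost_per_pass(m: ModelStats) -> float:
    """Expected spend per successful attempt (1 / pass_rate attempts on average)."""
    return cost_per_call(m) / m.pass_rate


# Hypothetical numbers, chosen only to show the ranking flip.
cheap = ModelStats("cheap", 0.30, 1.00, 20_000, 2_000, 0.20)
pricey = ModelStats("pricey", 1.00, 4.00, 8_000, 1_000, 0.90)

for m in (cheap, pricey):
    print(f"{m.name}: per-call ${cost_per_call(m):.4f}, per-pass ${cost_per_pass(m):.4f}")
```

In this made-up pair the cheaper model wins on per-token rate and on per-call cost, yet loses on cost-per-pass because four out of five of its attempts fail.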

DeepSeek V4 Operational Quirks: Pro vs Flash, Reasoning Echo, and the Discount Cliff

DeepSeek V4 Operational Quirks

DeepSeek V4 ships two models behind one OpenAI-compatible API: V4-Pro (reasoning) at $1.74/M input / $3.48/M output and V4-Flash (chat) at $0.28/M input / $1.10/M output. Until 2026-05-31 V4-Pro carries a 75% discount, putting it at $0.435/M input — cheap enough to use as a heavy-tier coding model. After that, the cost steps up 4×.
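The discount arithmetic for the V4-Pro input rate can be checked directly. A small sketch using the figures quoted above; `Decimal` is used only to keep the cents exact, and the variable names are illustrative.

```python
from decimal import Decimal

LIST_INPUT_RATE = Decimal("1.74")  # V4-Pro, USD per million input tokens
DISCOUNT = Decimal("0.75")         # 75% off until 2026-05-31

# Price paid while the discount is active.
discounted = LIST_INPUT_RATE * (1 - DISCOUNT)

# How much the input rate multiplies by when the discount ends.
step_up = LIST_INPUT_RATE / discounted

assert discounted == Decimal("0.435")
assert step_up == 4
```

Any budget sized against the discounted rate therefore needs a 4× allowance on V4-Pro input spend once the discount expires.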

The two models live on the same endpoint but want very different things. V4-Pro behaves like a reasoning model (thin prompts, reasoning_content echo required, tool_choice restrictions). V4-Flash behaves like a chat model (rich prompts win dramatically; rejects nothing). Confuse them and your matrix lights up red.

Docker-in-Docker on Jenkins: Why Postgres Tests Can't Reach localhost (And How to Fix It)

Docker-in-Docker on Jenkins: Postgres Tests Can’t Reach localhost

A Jenkins job runs docker run -d -p 5432:5432 postgres:17-alpine and gets back a container ID. The next step is psql -h localhost -p 5432 -U postgres and it returns Connection refused. The retry loop tries 30 times and gives up. The test job fails with “could not connect to server”.

If you’ve added longer waits, switched to --network host, or rewritten the test script to launch its own postgres container, none of that will help. The problem is the network model: Jenkins running in a Kubernetes pod uses the host’s docker socket to launch SIBLING containers. Those siblings live on the host’s docker bridge network, not in Jenkins’s pod network namespace. localhost from inside Jenkins is the pod’s loopback; the published port is on the host’s interface.

FTS5 vs Cloudflare Vectorize: A/B Results on When Keyword Beats Semantic Search

FTS5 vs Cloudflare Vectorize

The “FTS5 vs vectors” debate is usually hand-wavy. Both sides cite plausible reasons, neither runs the same queries through both engines on the same corpus, and the conclusion is whichever one the author shipped. With identical data and identical queries you can measure exactly where each wins.

The result: FTS5 and Vectorize have non-overlapping strengths. The right answer for most knowledge-base workloads is to “ship both” behind an opt-in flag rather than pick one. This page presents the measurements, the cost math, and the dual-engine pattern.
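One way to picture "both engines behind a flag" is a small dispatcher. This is a sketch under assumptions, not the page's implementation: `make_search`, the `use_semantic` flag name, and the choice of keyword search as the default are all hypothetical, and the two engines are stand-in callables rather than real FTS5 or Vectorize clients.

```python
from typing import Callable

SearchFn = Callable[[str], list[str]]


def make_search(keyword: SearchFn, semantic: SearchFn) -> Callable[..., list[str]]:
    """Route each query to one engine. Semantic search is opt-in here;
    keyword search is the assumed default."""

    def search(query: str, use_semantic: bool = False) -> list[str]:
        engine = semantic if use_semantic else keyword
        return engine(query)

    return search


# Stand-in engines so the routing can be exercised without any backend.
def fake_fts5(query: str) -> list[str]:
    return [f"fts5:{query}"]


def fake_vectorize(query: str) -> list[str]:
    return [f"vectorize:{query}"]


search = make_search(fake_fts5, fake_vectorize)
print(search("error code 40008"))                      # keyword path
print(search("why did my upsert fail", use_semantic=True))  # semantic path
```

Keeping the flag per query lets a caller choose the engine that suits that query, which is the point of shipping both.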

LLM Adapter Audit Checklist: 10 Bugs That Hide in OpenAI-Compatible Providers

LLM Adapter Audit Checklist

When you wrap an OpenAI-compatible LLM provider (Moonshot, DeepSeek, xAI, Together, Fireworks, OpenRouter, vLLM, anything else that exposes POST /v1/chat/completions) in a Go HTTP client, the same ten bug classes show up. They all silently degrade or break the agent — none of them crash loudly. Each was observed in production across at least one of xAI, DeepSeek, or Moonshot during a two-week audit period.

This checklist is the audit. Run it against any new adapter before shipping. Each entry is Symptom → Cause → Fix with a code shape you can grep your repo for.

Moonshot Kimi K2.6 Operational Quirks: What Breaks in Production

Moonshot Kimi K2.6 Operational Quirks

Kimi K2.6 is one of the cheapest competent reasoning models — $0.95/M input cache-miss, $0.16/M cache-hit, $4.00/M output, 256K context. It is also one of the most opinionated. Half of what works on OpenAI breaks here, and the failures are silent: empty content, mid-reasoning truncation, 400 errors that don’t mention the actual problem, and a cache key parameter that makes cost go up instead of down.

OFAT Matrix LLM Tuning: A Methodology for Picking Sampling Params, Tool Configs, and Prompts Without Guessing

OFAT Matrix LLM Tuning

When a new provider or model lands and you have to decide what temperature, max_tokens, tool_choice, prompt-shape, and turn budget to ship in production, the default is to pick by hunch. Read the model card, copy a partner adapter’s defaults, ship. A week later you find out reasoning_effort=high doubled cost for no quality gain, max_tokens=2048 silently truncated half your tier-3 runs, and the “prompt-rich” pattern you copied from grok-4.3 actively hurts kimi.
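A one-factor-at-a-time matrix is easy to generate: start from a baseline config and change exactly one parameter per run. The sketch below assumes configs are plain dicts; `ofat_matrix` is a hypothetical helper, and the parameter values are placeholders rather than recommendations.

```python
def ofat_matrix(baseline: dict, factors: dict[str, list]) -> list[dict]:
    """Return the baseline run plus one run per alternative value,
    changing exactly one factor at a time."""
    runs = [dict(baseline)]
    for name, values in factors.items():
        for value in values:
            if value == baseline[name]:
                continue  # already covered by the baseline run
            runs.append({**baseline, name: value})
    return runs


# Placeholder values for illustration only.
baseline = {"temperature": 0.2, "max_tokens": 2048, "tool_choice": "auto"}
factors = {
    "temperature": [0.0, 0.2, 0.7],
    "max_tokens": [2048, 8192],
    "tool_choice": ["auto", "required"],
}

runs = ofat_matrix(baseline, factors)
for run in runs:
    print(run)
```

For these three factors the OFAT matrix is 5 runs, against 12 for the full grid (3 × 2 × 2), and every non-baseline run is attributable to a single change.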

Reasoning-Model Tuning Asymmetry: Why Thin Prompts Beat Rich Prompts (and When They Don't)

Reasoning-Model Tuning Asymmetry

Practitioners assume “better prompt = better output”. For one model class, that assumption is correct. For the other, the same prompt makes things measurably worse. This article documents the asymmetry, names the dividing line, and gives you a 4-cell test to confirm it on your own canary before you commit to a prompt.

The asymmetry is empirical, not theoretical. It shows up cleanly across four independent OFAT (one-factor-at-a-time) matrices run between 2026-05-18 and 2026-05-20: sonnet POC, grok matrix v1+v2, deepseek matrix v1, kimi matrix v1.

Stateful vs Stateless Agent Daemons: A-Mode /loop vs C-Mode cron

Stateful vs Stateless Agent Daemons

Long-running agents on the Max subscription split cleanly into two operating modes. A-mode keeps a single /loop session alive across cycles, accumulating in-session context that gets cleared once a day. C-mode wraps claude -p in a bash sleep loop; every cycle is a fresh process with zero carryover. Both run forever in tmux. Both cost $0 of Anthropic API spend (the subscription pays). They behave very differently per cycle.