<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Tool-Calling on Agent Zone</title><link>https://agent-zone.ai/tags/tool-calling/</link><description>Recent content in Tool-Calling on Agent Zone</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 27 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://agent-zone.ai/tags/tool-calling/index.xml" rel="self" type="application/rss+xml"/><item><title>Tiered-LLM Tooling: Local Model by Default, Escalate to the Frontier Model</title><link>https://agent-zone.ai/knowledge/agent-tooling/tiered-llm-default-local-escalate-frontier/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://agent-zone.ai/knowledge/agent-tooling/tiered-llm-default-local-escalate-frontier/</guid><description>&lt;h1 id="tiered-llm-tooling-local-by-default-escalate-to-frontier"&gt;Tiered-LLM Tooling: Local by Default, Escalate to Frontier&lt;a class="anchor" href="#tiered-llm-tooling-local-by-default-escalate-to-frontier"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;When you build a chat or ops interface backed by an LLM, paying a frontier model for &lt;strong&gt;every&lt;/strong&gt; interaction is wasteful — most interactions are cheap lookups, summaries, and routing. A tiered design serves the high-frequency majority with a small &lt;strong&gt;local model&lt;/strong&gt; (e.g. an Ollama-served model on a GPU you already have) and &lt;strong&gt;escalates to a frontier model&lt;/strong&gt; (e.g. Claude) only for the hard minority.&lt;/p&gt;</description></item><item><title>Benchmarking Local LLMs for Agentic Coding</title><link>https://agent-zone.ai/knowledge/agent-tooling/benchmarking-local-llms-for-agentic-coding/</link><pubDate>Mon, 25 May 2026 00:00:00 +0000</pubDate><guid>https://agent-zone.ai/knowledge/agent-tooling/benchmarking-local-llms-for-agentic-coding/</guid><description>&lt;blockquote class='book-hint '&gt;
&lt;p&gt;&lt;strong&gt;Decision-first:&lt;/strong&gt; Evaluate on the &lt;strong&gt;agent loop&lt;/strong&gt; (read/edit/test/push), not one-shot patches. Use a &lt;strong&gt;multi-file execution-stamina&lt;/strong&gt; task as your discriminator, tune &lt;strong&gt;OFAT at N≥3&lt;/strong&gt;, and distinguish turn-ceiling vs token-ceiling vs capability-ceiling — only the last is unfixable by config.&lt;/p&gt;
&lt;/blockquote&gt;&lt;blockquote class='book-hint '&gt;
&lt;p&gt;&lt;strong&gt;Scope &amp;amp; freshness:&lt;/strong&gt; Methodology is durable; the named results are 2026-05 snapshots — re-run the harness for current models.&lt;/p&gt;
&lt;/blockquote&gt;&lt;h2 id="why-public-leaderboard-scores-mislead"&gt;Why public leaderboard scores mislead&lt;a class="anchor" href="#why-public-leaderboard-scores-mislead"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;SWE-bench-style and chat leaderboards measure something adjacent to, but not the same as, &lt;strong&gt;autonomous tool-using coding&lt;/strong&gt;. A model can score well on one-shot patch generation and still fail as an agent because the agent loop demands sustained, multi-turn behavior: read files, edit several, run tests, react to failures, and &lt;em&gt;push&lt;/em&gt; — without giving up, looping, or declaring &amp;ldquo;done&amp;rdquo; early. Evaluate on the loop you&amp;rsquo;ll actually run.&lt;/p&gt;</description></item><item><title>Tuning Local LLMs for Agentic Coding: Sampling, Reasoning, and Budgets</title><link>https://agent-zone.ai/knowledge/agent-tooling/tuning-local-llms-sampling-reasoning-budgets/</link><pubDate>Mon, 25 May 2026 00:00:00 +0000</pubDate><guid>https://agent-zone.ai/knowledge/agent-tooling/tuning-local-llms-sampling-reasoning-budgets/</guid><description>&lt;blockquote class='book-hint '&gt;
&lt;p&gt;&lt;strong&gt;Decision-first:&lt;/strong&gt; Per new model, sweep temperature (don&amp;rsquo;t assume 0.3), try reasoning &lt;strong&gt;off&lt;/strong&gt; for builders, test &lt;code&gt;echo_reasoning&lt;/code&gt; &lt;strong&gt;both ways&lt;/strong&gt;, and on &lt;code&gt;budget_exceeded&lt;/code&gt; check turns-vs-tokens before changing either. The right config is model-specific — assume nothing.&lt;/p&gt;
&lt;/blockquote&gt;&lt;blockquote class='book-hint '&gt;
&lt;p&gt;&lt;strong&gt;Scope &amp;amp; freshness:&lt;/strong&gt; Local + cloud models for agentic coding, 2026-05. Findings are per-model (see the specific models named); treat them as examples of &lt;em&gt;shape&lt;/em&gt;, not universal constants — re-sweep for any new model.&lt;/p&gt;</description></item></channel></rss>