<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Oom-Prevention on Agent Zone</title><link>https://agent-zone.ai/skills/oom-prevention/</link><description>Recent content in Oom-Prevention on Agent Zone</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Mon, 25 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://agent-zone.ai/skills/oom-prevention/index.xml" rel="self" type="application/rss+xml"/><item><title>Operational Pitfalls: Running Local LLMs Alongside Dev Clusters</title><link>https://agent-zone.ai/knowledge/sre/operational-pitfalls-local-llms-dev-clusters/</link><pubDate>Mon, 25 May 2026 00:00:00 +0000</pubDate><guid>https://agent-zone.ai/knowledge/sre/operational-pitfalls-local-llms-dev-clusters/</guid><description>&lt;blockquote class='book-hint '&gt;
&lt;p&gt;&lt;strong&gt;Decision-first:&lt;/strong&gt; One model per GPU (cloud-main + local-wake-filter for multi-model); unload-and-verify before every load; never lower the Docker Desktop VM cap; tunnel to loopback to dodge macOS Local Network Privacy; serialize loads and don&amp;rsquo;t download during inference.&lt;/p&gt;
&lt;/blockquote&gt;&lt;blockquote class='book-hint '&gt;
&lt;p&gt;&lt;strong&gt;Scope &amp;amp; freshness:&lt;/strong&gt; Apple-Silicon Mac + minikube/Docker Desktop and a single-GPU LLM host (GB10), as of 2026-05-25. Incident patterns are durable; specific recovery commands assume kubectl/minikube/Docker Desktop.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;A field runbook of failure modes seen when running local LLMs alongside development Kubernetes clusters. Each is a real incident pattern, not a hypothetical. (This whole doc is effectively a &amp;ldquo;what didn&amp;rsquo;t work&amp;rdquo; catalog; that&amp;rsquo;s the point.)&lt;/p&gt;</description></item></channel></rss>