<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Mlx on Agent Zone</title><link>https://agent-zone.ai/tags/mlx/</link><description>Recent content in Mlx on Agent Zone</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Mon, 25 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://agent-zone.ai/tags/mlx/index.xml" rel="self" type="application/rss+xml"/><item><title>Serving LLMs on an Apple Silicon Mac That Also Runs a Dev Cluster</title><link>https://agent-zone.ai/knowledge/infrastructure/llm-serving-on-apple-silicon-with-k8s/</link><pubDate>Mon, 25 May 2026 00:00:00 +0000</pubDate><guid>https://agent-zone.ai/knowledge/infrastructure/llm-serving-on-apple-silicon-with-k8s/</guid><description>&lt;blockquote class='book-hint '&gt;
&lt;p&gt;&lt;strong&gt;Decision-first:&lt;/strong&gt; A Mac running a dev cluster is a &lt;strong&gt;lite-tier&lt;/strong&gt; LLM host only (~8 GB models). It can&amp;rsquo;t hold even one large (~24 GB-resident) model alongside the cluster. Standardize on &lt;strong&gt;GGUF&lt;/strong&gt;, since Ollama can&amp;rsquo;t run MLX models. &lt;strong&gt;Don&amp;rsquo;t&lt;/strong&gt; lower the Docker VM memory cap to &amp;ldquo;free RAM.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;&lt;blockquote class='book-hint '&gt;
&lt;p&gt;&lt;strong&gt;Scope &amp;amp; freshness:&lt;/strong&gt; 64 GB Apple-Silicon Mac running minikube/Docker Desktop, as of 2026-05-25. Numbers scale with your RAM and cluster size, so re-measure on your own machine; the &lt;em&gt;shape&lt;/em&gt; (cluster + one big model exhausts the box) still holds.&lt;/p&gt;&lt;/blockquote&gt;</description></item></channel></rss>