Serving LLMs on an Apple Silicon Mac That Also Runs a Dev Cluster

May 25, 2026

Mac-Llm-Hosting, Unified-Memory-Budgeting, Runtime-Selection

Apple-Silicon, Macos, Local-Llm, Ollama, Mlx, Gguf, Docker-Desktop, Minikube, Unified-Memory, Metal

Ollama, Lm-Studio, Docker-Desktop, Minikube

Decision-first: A Mac that also runs a dev cluster is a lite-tier LLM host only (models of roughly 8 GB). It can’t hold even one large model (~24 GB resident) alongside the cluster. Standardize on GGUF, since Ollama can’t run MLX models, and don’t lower the Docker VM cap to “free RAM.”

Scope & freshness: 64 GB Apple-Silicon Mac running minikube/Docker Desktop, as of 2026-05-25. The numbers scale with your RAM and cluster size, so re-measure on your own machine; the shape (cluster + one big model exhausts the box) holds.
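
The budgeting argument above can be sketched as simple arithmetic. This is a minimal illustration, not a measurement: the 64 GB total and the ~8 GB / ~24 GB model sizes come from the text, while the Docker VM cap (32 GB) and the macOS-plus-apps overhead (12 GB) are placeholder assumptions chosen only to show the shape. The `MemoryBudget` class and its field names are hypothetical; substitute your own measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryBudget:
    """Unified-memory budget for a Mac that hosts both a cluster and an LLM."""

    total_gb: float           # physical unified memory (64 GB in the text)
    docker_vm_cap_gb: float   # ASSUMPTION: placeholder Docker Desktop VM cap
    host_overhead_gb: float   # ASSUMPTION: placeholder for macOS + desktop apps

    def headroom_gb(self) -> float:
        # Memory left for a model once the cluster VM and the host are accounted for.
        return self.total_gb - self.docker_vm_cap_gb - self.host_overhead_gb

    def fits(self, model_resident_gb: float) -> bool:
        # A model fits only if its resident size is within the remaining headroom.
        return model_resident_gb <= self.headroom_gb()


# Placeholder values; re-measure on your own machine.
budget = MemoryBudget(total_gb=64, docker_vm_cap_gb=32, host_overhead_gb=12)

if __name__ == "__main__":
    print(f"headroom: {budget.headroom_gb():.0f} GB")
    for label, resident_gb in [("lite model", 8), ("large model", 24)]:
        verdict = "fits" if budget.fits(resident_gb) else "does not fit"
        print(f"{label} ({resident_gb} GB resident): {verdict}")
```

Under these assumed figures the lite model fits and the large one does not, which matches the decision above. A different VM cap or overhead moves the threshold, so the verdict is only as good as the numbers you measure.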