Local-Llm-Deployment

Running Local LLMs on the NVIDIA GB10 (DGX Spark / ASUS Ascent GX10)

May 25, 2026

Local-Llm-Deployment, Gpu-Memory-Sizing, Model-Runtime-Selection, Moe-Model-Selection

Gb10, Dgx-Spark, Asus-Ascent-Gx10, Local-Llm, Lm-Studio, Llama-Cpp, Gguf, Unified-Memory, Moe, Grace-Blackwell, Dcgm

Lm-Studio, Lms, Llama.cpp, Dcgm-Exporter, Ssh

Decision-first: On a GB10, pick MoE models with a low active-parameter count (A3B-class, about 3B active parameters per token), serve them as GGUF (not MLX) via LM Studio, and run one model at a time behind an OOM guard. Monitor the GPU via DCGM, but read the model's memory footprint from system RAM, because no framebuffer metrics are available here. Dense 70B models are unusably slow (~2-3 tok/s).

Scope & freshness: GB10 / Grace-Blackwell, 128 GB unified memory, DCGM 4.5.3 with a 580-class driver, as of 2026-05-25. Re-check the DCGM profiling and framebuffer gaps after a driver/DCGM upgrade (driver ≥585).
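
The OOM guard in the decision summary can be as simple as a pre-load check against system RAM, since that is where the model footprint shows up on unified memory. Below is a minimal sketch, not the guard actually used here. It assumes a Linux host exposing MemAvailable in /proc/meminfo. The function names and the 16 GiB headroom are hypothetical illustration values.

```python
"""Pre-load memory guard for a unified-memory box (illustrative sketch)."""

GIB = 1024 ** 3


def parse_mem_available_bytes(meminfo_text: str) -> int:
    """Return the MemAvailable value from /proc/meminfo-style text, in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:   104857600 kB" (kB means KiB here)
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found in meminfo text")


def can_load(model_bytes: int, available_bytes: int,
             headroom_bytes: int = 16 * GIB) -> bool:
    """True if the model plus headroom fits in currently available RAM.

    headroom_bytes is an assumed safety margin for KV cache, the runtime,
    and the rest of the system; tune it for your workload.
    """
    return model_bytes + headroom_bytes <= available_bytes


def read_meminfo(path: str = "/proc/meminfo") -> str:
    """Read the kernel's memory summary (Linux only)."""
    with open(path) as f:
        return f.read()
```

To use it, call can_load(model_file_size, parse_mem_available_bytes(read_meminfo())) before asking the runtime to load a model, and refuse the load when it returns False. The GGUF file size is only a rough lower bound on the loaded footprint, so keep the headroom generous.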