GPU and Host Monitoring Across Mac and Linux/GB10 in One Prometheus

May 25, 2026

Heterogeneous-Host-Monitoring, Scrapeconfig-Authoring, Cross-Os-Promql, Gpu-Telemetry

Prometheus, Grafana, Node-Exporter, Dcgm, Gpu-Monitoring, Macos, Darwin, Scrapeconfig, Kube-Prometheus, Local-Llm

Prometheus, Grafana, Node-Exporter, Dcgm-Exporter, Kube-Prometheus-Stack

Decision-first: macOS and Linux node_exporter expose different metric names, so write per-OS memory/disk expressions. The stock node dashboard hides Darwin on purpose. Scrape external hosts via ScrapeConfig + relabel job/instance. On a GB10, there are no GPU framebuffer or profiling metrics, so read model footprint from system RAM.

Scope & freshness: kube-prometheus-stack + node_exporter + DCGM, macOS + Linux/GB10, as of 2026-05-25. Re-check the GB10 DCGM gaps after a DCGM/driver bump.

Running Local LLMs on the NVIDIA GB10 (DGX Spark / ASUS Ascent GX10)

May 25, 2026

Infrastructure

Intermediate, Advanced

Local-Llm-Deployment, Gpu-Memory-Sizing, Model-Runtime-Selection, Moe-Model-Selection

Gb10, Dgx-Spark, Asus-Ascent-Gx10, Local-Llm, Lm-Studio, Llama-Cpp, Gguf, Unified-Memory, Moe, Grace-Blackwell, Dcgm

Lm-Studio, Lms, Llama.cpp, Dcgm-Exporter, Ssh

Decision-first: On a GB10, pick low-active MoE models (A3B-class), serve GGUF (not MLX) via LM Studio, run one model at a time behind an OOM guard, and monitor GPU via DCGM but read the model footprint from system RAM (no framebuffer metrics). Dense 70B is unusable (~2-3 tok/s).

Scope & freshness: GB10 / Grace-Blackwell, 128 GB unified, DCGM 4.5.3 + driver 580-class, as of 2026-05-25. Re-check the DCGM profiling/framebuffer gaps after a driver/DCGM bump (≥585).

GPU and ML Workloads on Kubernetes: Scheduling, Sharing, and Monitoring

February 22, 2026

Kubernetes

Intermediate

Gpu-Scheduling, Ml-Infrastructure, Resource-Management, Workload-Isolation, Gpu-Monitoring

Gpu, Nvidia, Machine-Learning, Device-Plugin, Mig, Time-Slicing, Mps, Cuda, Node-Affinity, Taints, Dcgm

Kubectl, Nvidia-Smi, Helm, Dcgm-Exporter, Prometheus, Grafana

GPU and ML Workloads on Kubernetes

Running GPU workloads on Kubernetes requires hardware-aware scheduling that the default scheduler does not provide out of the box. GPUs are expensive – an NVIDIA A100 node costs $3-12/hour on cloud providers – so efficient utilization matters far more than with CPU workloads. This article covers the full stack from device plugin installation through GPU sharing and monitoring.

The NVIDIA Device Plugin

Kubernetes has no native understanding of GPUs. The NVIDIA device plugin bridges that gap by exposing GPUs as a schedulable resource (nvidia.com/gpu). Without it, the scheduler has no idea which nodes have GPUs or how many are available.
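To make the nvidia.com/gpu resource concrete, here is a minimal sketch of how a workload might request it once the device plugin is running. It builds a Pod manifest as a plain dictionary and prints it as JSON, which kubectl accepts alongside YAML. The function name gpu_pod_manifest, the pod name, and the container image tag are illustrative assumptions, not taken from the article; substitute an image available in your registry. GPUs are requested under limits, and the count must be a whole number.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Return a Pod manifest (as a dict) that requests `gpus` NVIDIA GPUs.

    Hypothetical helper for illustration only. It assumes the NVIDIA
    device plugin is installed, so nodes advertise `nvidia.com/gpu`.
    """
    if not isinstance(gpus, int) or gpus < 1:
        # Extended resources such as GPUs cannot be requested fractionally.
        raise ValueError("gpus must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # nvidia-smi lists the GPUs visible inside the container.
                    "command": ["nvidia-smi"],
                    # GPUs go under limits; the scheduler only places the
                    # pod on a node with this many unallocated GPUs.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # The image tag below is an assumed placeholder.
    manifest = gpu_pod_manifest(
        "gpu-smoke-test", "nvidia/cuda:12.4.1-base-ubuntu22.04"
    )
    print(json.dumps(manifest, indent=2))
```

If this were saved as a script (say, a hypothetical gpu_pod.py), its output could be piped to `kubectl apply -f -`. On a cluster without the device plugin, such a pod would stay Pending because no node advertises the nvidia.com/gpu resource.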