Ollama Setup and Model Management: Installation, Model Selection, Memory Management, and ARM64 Native

Ollama Setup and Model Management#

Ollama turns running local LLMs into a single command. It handles model downloads, quantization, and GPU memory allocation, and it exposes a REST API that any application can call. No Python environments, no CUDA driver debugging, no manual GGUF file management.
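
To make the single-command claim concrete, this is the entire workflow for a first interactive session; llama3.2 is just an example model name from the Ollama library, and any other model tag works the same way:

# Downloads the model on first use, then opens an interactive chat
ollama run llama3.2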

Installation#

# macOS
brew install ollama

# Linux (official installer)
curl -fsSL https://ollama.com/install.sh | sh

# Or run as a Docker container
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Start the Ollama server:
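
ollama serve

On a Homebrew install you can also run it as a background service with brew services start ollama, and the Linux installer typically registers a systemd service, so the server may already be running. Once it is up (it listens on port 11434 by default), you can exercise the REST API directly. A minimal smoke test, again using llama3.2 as a stand-in for whatever model you have pulled:

# Pull a model, then request a single non-streaming completion over HTTP
ollama pull llama3.2
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'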

Minikube Setup, Drivers, and Resource Configuration#

Minikube runs a single-node Kubernetes cluster on your local machine. The difference between a minikube setup that feels like a toy and one that behaves like production comes down to three choices: the driver, the resource allocation, and the Kubernetes version. Get these wrong and you spend more time fighting the tool than using it.
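
All three choices surface as flags on minikube start. A sketch of what a production-like local cluster might look like; the CPU count, memory size, and version string are placeholders to adjust for your machine and the cluster you are mirroring:

minikube start \
  --driver=docker \
  --cpus=4 \
  --memory=8192 \
  --kubernetes-version=v1.31.0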

Installation#

On macOS with Homebrew:

brew install minikube

On Linux via direct download:
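
# Assumes an x86-64 host; substitute minikube-linux-arm64 on ARM machines
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube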

Minikube with Docker Driver on Apple Silicon

Why the Docker Driver on ARM64#

On Apple Silicon (M1/M2/M3/M4), the driver you choose for Minikube determines whether your containers run natively or through emulation. The Docker driver runs containers directly on the host's ARM64 architecture, with zero emulation overhead.
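
A quick way to confirm you are getting a native node rather than an emulated one is to check the architecture Kubernetes reports for it; the node name minikube assumes the default single-node profile:

minikube start --driver=docker
kubectl get node minikube -o jsonpath='{.status.nodeInfo.architecture}'   # expect: arm64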

This matters because QEMU user-mode emulation, which kicks in when you try to run amd64 images on ARM64, cannot reliably execute Go binaries. The specific failure is a crash in lfstack.push, deep in Go’s runtime memory management. This is not a fixable application bug — it is a fundamental incompatibility between QEMU’s user-mode emulation and Go’s lock-free stack implementation.
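
The practical consequence is that on an ARM64 cluster you want images that actually publish an arm64 variant rather than relying on emulated amd64 ones. One way to check before deploying, using alpine purely as an example image (recent Docker versions support manifest inspection out of the box):

# List the architectures a multi-arch image publishes
docker manifest inspect alpine:latest | grep architecture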