# Ollama Setup and Model Management
Ollama turns running local LLMs into a single command. It handles model downloads, quantization, GPU memory allocation, and exposes a REST API that any application can call. No Python environments, no CUDA driver debugging, no manual GGUF file management.
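As a preview of that API: once the server is running on its default port (11434), any HTTP client can request a completion. The sketch below uses `curl` against the `/api/generate` endpoint and assumes a model named `llama3.2` has already been pulled; swap in whatever model you actually have.

```bash
# Ask the local Ollama server for a completion.
# Assumes `ollama pull llama3.2` has been run and the server is listening on 11434.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```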
## Installation
```bash
# macOS
brew install ollama

# Linux (official installer)
curl -fsSL https://ollama.com/install.sh | sh

# Or run as a Docker container
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

Start the Ollama server: