Running AI Locally — The Complete Guide
Private, free, offline AI on your own hardware — setup to advanced use
Contents
Why run AI locally?
Three compelling reasons to run models on your own hardware:
Privacy: Your data never leaves your machine. Sensitive business documents, personal health data, client information — none of it goes to external servers.
Cost: Zero API costs. Run unlimited queries at zero marginal cost. At high usage, local models pay for themselves in weeks.
Offline: Works without internet. Useful for travel, areas with unreliable connectivity, or air-gapped environments.
The trade-off: local models are smaller and less capable than frontier models (GPT-4o, Claude 3.5). For many tasks — summarisation, code help, writing — the difference is minimal. For frontier reasoning tasks, you still need cloud APIs.
Ollama — the easiest way to start
Ollama is the simplest way to run models locally. One-command install, one-command model download.
```bash # Install from ollama.com, then: ollama run llama3.2 # Meta's 3B model — fast, runs on any laptop ollama run mistral # 7B — better quality, needs 8GB RAM ollama run llama3.3 # 70B — excellent quality, needs 48GB RAM ollama run phi4 # Microsoft's 14B — great for coding ollama run gemma3 # Google's 12B — very capable ```
Ollama also runs an OpenAI-compatible API on localhost:11434 — meaning any app built for OpenAI can point to Ollama instead with one URL change.
What hardware do you actually need?
MacBook (M1/M2/M3): The best local AI experience. Apple Silicon uses unified memory — an M2 Pro with 16GB runs Llama 3.2 3B perfectly and Mistral 7B smoothly. An M2 Max with 32GB runs 13B models. M3 Max with 64GB handles 70B models.
Windows / Linux with Nvidia GPU: 8GB VRAM → 7B models (Mistral, Llama 3.2 8B). 16GB VRAM → 13B models. 24GB VRAM → 34B models. Without GPU, CPU inference works but is 5-10x slower.
Minimum for useful local AI: Any laptop with 8GB RAM can run 3B-7B models. They are genuinely useful for writing assistance, summarisation, and code help. Not as capable as GPT-4o but free and private.
The 80/20 rule: A Llama 3.2 3B model running locally handles 80% of daily AI tasks. Use cloud APIs for the 20% that needs frontier reasoning.
Open WebUI — the missing browser interface
Ollama gives you a terminal. Open WebUI gives you a ChatGPT-like browser interface for your local models — conversation history, file uploads, model switching, image generation, and more.
```bash # Requires Docker docker run -d -p 3000:8080 \ --add-host=host.docker.internal:host-gateway \ -v open-webui:/app/backend/data \ --name open-webui \ ghcr.io/open-webui/open-webui:main
# Open: http://localhost:3000 ```
Open WebUI connects to Ollama automatically. You can also add OpenAI and Anthropic API keys — switching between local and cloud models from the same interface.
For teams: deploy Open WebUI on an internal server. Everyone in the company gets AI access through a browser, using your local models, without any cloud subscription.