💻

Running AI Locally — The Complete Guide

Private, free, offline AI on your own hardware — setup to advanced use

Intermediate 8 min read

1. Why run AI locally?
2. Ollama — the easiest way to start
3. What hardware do you actually need?
4. Open WebUI — the missing browser interface

Why run AI locally?

Three compelling reasons to run models on your own hardware:

Privacy: Your data never leaves your machine. Sensitive business documents, personal health data, client information — none of it goes to external servers.

Cost: Zero API costs. Run unlimited queries at zero marginal cost. At high usage, local models pay for themselves in weeks.

Offline: Works without internet. Useful for travel, areas with unreliable connectivity, or air-gapped environments.

The trade-off: local models are smaller and less capable than frontier models (GPT-4o, Claude 3.5). For many tasks — summarisation, code help, writing — the difference is minimal. For frontier reasoning tasks, you still need cloud APIs.

Ollama — the easiest way to start

Ollama is the simplest way to run models locally. One-command install, one-command model download.

```bash # Install from ollama.com, then: ollama run llama3.2 # Meta's 3B model — fast, runs on any laptop ollama run mistral # 7B — better quality, needs 8GB RAM ollama run llama3.3 # 70B — excellent quality, needs 48GB RAM ollama run phi4 # Microsoft's 14B — great for coding ollama run gemma3 # Google's 12B — very capable ```

Ollama also runs an OpenAI-compatible API on localhost:11434 — meaning any app built for OpenAI can point to Ollama instead with one URL change.

What hardware do you actually need?

MacBook (M1/M2/M3): The best local AI experience. Apple Silicon uses unified memory — an M2 Pro with 16GB runs Llama 3.2 3B perfectly and Mistral 7B smoothly. An M2 Max with 32GB runs 13B models. M3 Max with 64GB handles 70B models.

Windows / Linux with Nvidia GPU: 8GB VRAM → 7B models (Mistral, Llama 3.2 8B). 16GB VRAM → 13B models. 24GB VRAM → 34B models. Without GPU, CPU inference works but is 5-10x slower.

Minimum for useful local AI: Any laptop with 8GB RAM can run 3B-7B models. They are genuinely useful for writing assistance, summarisation, and code help. Not as capable as GPT-4o but free and private.

The 80/20 rule: A Llama 3.2 3B model running locally handles 80% of daily AI tasks. Use cloud APIs for the 20% that needs frontier reasoning.

Open WebUI — the missing browser interface

Ollama gives you a terminal. Open WebUI gives you a ChatGPT-like browser interface for your local models — conversation history, file uploads, model switching, image generation, and more.

```bash # Requires Docker docker run -d -p 3000:8080 \ --add-host=host.docker.internal:host-gateway \ -v open-webui:/app/backend/data \ --name open-webui \ ghcr.io/open-webui/open-webui:main

# Open: http://localhost:3000 ```

Open WebUI connects to Ollama automatically. You can also add OpenAI and Anthropic API keys — switching between local and cloud models from the same interface.

For teams: deploy Open WebUI on an internal server. Everyone in the company gets AI access through a browser, using your local models, without any cloud subscription.

Keep learning

🧠

Running AI Locally — The Complete Guide

Contents

Why run AI locally?

Ollama — the easiest way to start

What hardware do you actually need?

Open WebUI — the missing browser interface

Keep learning

Large Language Models (LLMs)

RAG — Retrieval Augmented Generation

AI Agents