RAG — Retrieval Augmented Generation

How AI reads your documents and gives accurate, sourced answers

Intermediate 6 min read

1. The problem RAG solves
2. How RAG works — step by step
3. Real-world example
4. When to use RAG vs fine-tuning
5. Tools to build RAG yourself

The problem RAG solves

Standard LLMs only know what was in their training data. That data stops at a knowledge cutoff and never included your private files. Ask GPT-4 about your company's internal policy document and it cannot answer accurately. Ask it about last week's news and it may confidently make up an answer (hallucinate).

RAG (Retrieval Augmented Generation) addresses this by letting the model look things up before answering: an open-book exam instead of a closed-book one.

How RAG works — step by step

1. Indexing: Your documents (PDFs, websites, databases) are split into chunks, and an embedding model converts each chunk into a list of numbers (an embedding) that represents its meaning. The embeddings, along with the original text, are stored in a vector database.

2. Query: A user asks a question. The question is converted into an embedding with the same embedding model, so it can be compared with the stored chunks.

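3. Retrieval: The vector database returns the chunks whose embeddings are most similar to the question's embedding (commonly measured with cosine similarity). This is semantic search: it matches on meaning rather than exact keywords.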

4. Augmentation: The retrieved chunks are inserted into the LLM's prompt as context, alongside the user's question.

5. Generation: The LLM answers using both its training knowledge and the retrieved text. Because each chunk is stored with a record of where it came from, the answer can cite its sources.
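To make the five steps concrete, here is a minimal sketch in plain Python. It assumes a toy "embedding" that simply counts words (a real system would call an embedding model and store dense vectors in a vector database), and every function name is hypothetical. It stops once the prompt is built, because the final LLM call depends on which model you use.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a count of each lowercase word.
    A real embedding model returns a dense vector that captures meaning."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# 1. Indexing: keep each chunk next to its embedding (a stand-in for a vector database).
def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    return [(chunk, embed(chunk)) for chunk in chunks]


# 2-3. Query and retrieval: embed the question the same way, return the k closest chunks.
def retrieve(question: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# 4. Augmentation: number the retrieved chunks so the model can cite them.
def build_prompt(question: str, chunks: list[str]) -> str:
    sources = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer using only the sources below and cite them like [1].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


chunks = [
    "Employees may work remotely up to three days per week.",
    "Expense reports must be submitted within 30 days.",
    "Remote work requests need manager approval.",
]
index = build_index(chunks)
question = "How many days can I work remotely?"
prompt = build_prompt(question, retrieve(question, index))
# 5. Generation: `prompt` would now be sent to an LLM; that call is not shown here.
print(prompt)
```

The toy embedding also shows why real embeddings matter: it treats "remote" and "remotely" as unrelated words, while an embedding model would place them close together.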

Real-world example

Imagine you have 500 pages of legal contracts. A standard LLM has never seen your specific contracts, and 500 pages may not fit in its context window; even when they do, sending every page with every question is slow and expensive.

With RAG, all contracts are indexed once. You ask "What are the termination clauses in the Reliance contract?" The system retrieves the relevant pages and passes them to the LLM, which answers from those pages and cites them.
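Indexing 500 pages starts with splitting them, and citations only work if every chunk remembers where it came from. A minimal sketch, assuming fixed windows of words with a small overlap (real pipelines often split on sentences, paragraphs, or tokens instead); the `Chunk` class, `chunk_page`, `cite`, the default sizes, and the sample page text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # which document the chunk came from
    page: int    # page number, kept so the answer can cite it


def chunk_page(text: str, source: str, page: int, size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split one page into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(Chunk(" ".join(words[start:start + size]), source, page))
        if start + size >= len(words):
            break  # this window already reached the end of the page
    return chunks


def cite(chunk: Chunk) -> str:
    return f"{chunk.source}, p. {chunk.page}"


# Hypothetical sample text; a real pipeline would first extract each page's text from the PDF.
page_text = "Either party may end this agreement by giving ninety days written notice to the other party"
chunks = chunk_page(page_text, source="Reliance contract", page=12, size=8, overlap=2)
for chunk in chunks:
    print(f"[{cite(chunk)}] {chunk.text}")
```

The overlap means a sentence cut at a window boundary keeps some of its context in the next chunk, at the cost of storing a few words twice.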

Tools like Perplexity (searches the web), NotebookLM (reads your documents), and many custom enterprise AI chatbots follow this same basic pattern.

When to use RAG vs fine-tuning

Use RAG when: You need the AI to access specific, frequently updated, or private documents. RAG is dynamic: add new documents to the index and the system can use them right away, with no retraining.

Use fine-tuning when: You want the model to adopt a specific style, personality, or specialized skill baked in. Fine-tuning retrains the model itself, which needs training data and compute, so it is comparatively expensive and slow to update.

Rule of thumb: For knowledge (facts, documents), use RAG. For behaviour (writing style, output format, specialized skills), use fine-tuning.
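The "RAG is dynamic" point fits in a few lines. A minimal sketch, assuming a hypothetical in-memory index and a toy relevance score (the number of distinct words a document shares with the question) in place of real embeddings: teaching the system a new fact takes one `add` call, and no model is changed.

```python
import re
from typing import Optional


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class InMemoryIndex:
    """Hypothetical minimal index; a real system would store embeddings in a vector database."""

    def __init__(self) -> None:
        self.docs: list[str] = []

    def add(self, text: str) -> None:
        # Indexing one new document is the whole "update"; no model is retrained.
        self.docs.append(text)

    def search(self, question: str) -> Optional[str]:
        # Toy relevance score: how many distinct words the document shares with the question.
        q = words(question)
        return max(self.docs, key=lambda doc: len(q & words(doc)), default=None)


index = InMemoryIndex()
index.add("The office is closed on public holidays.")
question = "How long is parental leave?"
before = index.search(question)  # only one document exists, so it wins despite being irrelevant
index.add("Parental leave is 16 weeks for all employees.")
after = index.search(question)  # the new document is retrievable immediately
print(before)
print(after)
```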

Tools to build RAG yourself

LlamaIndex: A widely used framework focused on building RAG pipelines. Handles indexing, retrieval, and query logic.

LangChain: Broader AI framework with RAG support. More complex but very flexible.

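Supabase + pgvector: pgvector is an open-source PostgreSQL extension that adds vector search. Supabase is an open-source Postgres platform with pgvector available, usable on a free hosted tier or self-hosted.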

Pinecone: Managed vector database. Easy to start, scales well.

Weaviate / Qdrant: Open-source vector databases for self-hosting.

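For a simple start: LlamaIndex with a local Llama model and a local embedding model can run entirely on your laptop. By default LlamaIndex calls OpenAI's hosted models, so both need to be swapped for local ones.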

Keep learning

Large Language Models (LLMs)

AI Agents

Prompt Engineering