Fine-tuning vs RAG vs Prompting
Choosing the right approach — when to use what and why
Three ways to customise AI
There are three main approaches to making an AI model do what you specifically need:
Prompting: Tell the model what to do in the prompt. Zero cost, immediate, but limited and not persistent.
RAG: Give the model access to specific documents at query time. Dynamic, updatable, great for knowledge.
Fine-tuning: Train the model further on your specific data. Changes the model itself. Expensive but powerful for style and behaviour.
Decision framework
Use prompting when: The task is clear, the model already knows how to do it, and you can express the requirements in a prompt. Start here — always.
Use RAG when: The model needs to know specific, private, or frequently changing information. Company documents, internal knowledge bases, real-time data.
Use fine-tuning when: You need consistent tone/style the model doesn't naturally have, specialised skills (medical, legal domain expertise), or you want to compress long system prompts into the model.
The typical path: Prompt engineer first → add RAG if knowledge is lacking → fine-tune only if prompt + RAG aren't enough.
Fine-tuning in practice
Fine-tuning requires labelled training examples (input-output pairs). A minimum of 50-100 examples to see improvement; 1,000+ for significant changes.
Cost: OpenAI fine-tuning of GPT-4o mini costs roughly $0.003/1K tokens for training. A 1,000-example dataset might cost $5-20 to train.
Good use cases: Customer service bot with your company's tone, domain-specific classification, formatting outputs in a very specific structure.
Bad use cases: Adding new knowledge the model doesn't have (use RAG), or fixing factual errors (the model will still hallucinate).