📊

Tokens & Context Windows

Understanding AI's memory limits and how to work within them

Beginner 4 min read

1. What is a token?
2. Context windows explained
3. Practical implications

What is a token?

AI models don't process text character by character or word by word — they use tokens. A token is roughly ¾ of a word in English.

"Hello" = 1 token. "Hello, world!" = 3 tokens. "I am using AItheGuru.in" = 6 tokens.

Why does this matter? Models are priced per token (input + output), and they have a maximum number of tokens they can handle in one conversation.

Context windows explained

The context window is how much text a model can "see" at once — like working memory. Everything in the conversation (your messages + the AI's responses) counts against this limit.

GPT-4o: 128K tokens ≈ 96,000 words ≈ a 300-page book Claude 3.5: 200K tokens ≈ 150,000 words ≈ a 500-page book Gemini 1.5 Pro: 1M tokens ≈ 750,000 words ≈ 4 novels

Once you hit the limit, the model starts "forgetting" older parts of the conversation.

Practical implications

For long documents: Claude or Gemini with large context windows handle full books or codebases. GPT-4o handles most business documents comfortably.

For cost: Longer contexts cost more. If you're building an app with an LLM backend, manage context size carefully.

For conversations: Long chats degrade in quality as older context is compressed or dropped. For complex projects, start fresh conversations for distinct tasks.

Token counting tip: Use platform.openai.com/tokenizer to see exactly how many tokens your text uses.

Keep learning

🧠

Tokens & Context Windows

Contents

What is a token?

Context windows explained

Practical implications

Keep learning

Large Language Models (LLMs)

RAG — Retrieval Augmented Generation

AI Agents