Vector Databases
The storage layer that makes RAG and semantic search possible
Why regular databases cannot do semantic search
A regular SQL database can tell you: "find all rows where column = value." It matches literal values (or, with LIKE and full-text search, literal words), not meaning.
But "find me documents about retirement planning" cannot be answered by exact matching — the word "retirement" might not appear in the most relevant documents. You need semantic matching: find documents with similar meaning.
This requires comparing meaning numerically — which requires embeddings (vectors). And searching through millions of vectors efficiently requires a specialised database designed for exactly that operation.
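To make "comparing meaning numerically" concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are hand-written stand-ins for embeddings (an assumption for illustration only; a real embedding model would produce them, usually with hundreds or thousands of dimensions).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, hand-written for illustration.
query = [0.9, 0.1, 0.0]        # "retirement planning"
pension_doc = [0.8, 0.3, 0.1]  # "How to build a pension pot" (shares no keywords with the query)
pizza_doc = [0.0, 0.2, 0.9]    # "Wood-fired pizza recipes"

sim_pension = cosine_similarity(query, pension_doc)
sim_pizza = cosine_similarity(query, pizza_doc)
print(f"pension doc: {sim_pension:.3f}")
print(f"pizza doc:   {sim_pizza:.3f}")
```

The pension document scores far higher than the pizza document even though it never contains the word "retirement". That is the property exact matching cannot provide.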
How vector databases work
Every piece of content (document, image, product description) is converted to a vector (list of numbers) using an embedding model. These vectors are stored in the database alongside the original content.
At query time:
1. Your question is also converted to a vector.
2. The database finds the stored vectors closest to your query vector (cosine similarity or L2 distance).
3. It returns the matching content.
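The query-time steps above can be sketched as a tiny in-memory store that scans every vector. `TinyVectorStore` is a hypothetical class written for this example, not the API of any real product, and the vectors are hand-written rather than produced by an embedding model.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class TinyVectorStore:
    """Hypothetical in-memory store: keeps (vector, content) pairs and scans them all."""

    def __init__(self):
        self.items = []  # list of (vector, content) pairs

    def add(self, vector, content):
        # The vector is stored alongside the original content.
        self.items.append((vector, content))

    def query(self, query_vector, k=2):
        # Step 2: score every stored vector against the query vector.
        scored = ((cosine_similarity(query_vector, vec), content)
                  for vec, content in self.items)
        # Step 3: return the k best-scoring items, most similar first.
        return heapq.nlargest(k, scored, key=lambda pair: pair[0])

store = TinyVectorStore()
store.add([0.8, 0.3, 0.1], "How to build a pension pot")
store.add([0.7, 0.1, 0.3], "Index funds for long-term savers")
store.add([0.0, 0.2, 0.9], "Wood-fired pizza recipes")

# Step 1 would call an embedding model; here the query vector is hand-written.
results = store.query([0.9, 0.1, 0.0], k=2)
for score, content in results:
    print(f"{score:.3f}  {content}")
```

This brute-force scan compares the query against every stored vector, so its cost grows linearly with the collection. That is fine for a few thousand items and is the reason larger collections need an index.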
The key challenge is doing this efficiently across millions or billions of vectors. Algorithms like HNSW (Hierarchical Navigable Small World) and IVF (Inverted File Index) create index structures that make approximate nearest-neighbour search fast enough for production.
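As a rough illustration of the IVF idea, the sketch below groups vectors by their nearest centroid and, at query time, scans only the buckets closest to the query. `TinyIVFIndex` and its `nprobe` parameter are hypothetical names for this example. The centroids are hard-coded here as a simplifying assumption, whereas production IVF indexes typically learn them from the data (for example with k-means).

```python
import math
import random

class TinyIVFIndex:
    """Hypothetical IVF-style index: vectors are grouped by their nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = [[] for _ in centroids]  # one inverted list per centroid

    def _closest_centroids(self, vector, n):
        # Rank centroid ids by L2 distance to the vector and keep the n nearest.
        ranked = sorted(range(len(self.centroids)),
                        key=lambda i: math.dist(vector, self.centroids[i]))
        return ranked[:n]

    def add(self, vector, content):
        bucket_id = self._closest_centroids(vector, 1)[0]
        self.buckets[bucket_id].append((vector, content))

    def search(self, query, k=1, nprobe=1):
        # Only scan the nprobe buckets whose centroids are nearest the query.
        candidates = []
        for bucket_id in self._closest_centroids(query, nprobe):
            candidates.extend(self.buckets[bucket_id])
        candidates.sort(key=lambda item: math.dist(query, item[0]))
        return [content for _, content in candidates[:k]], len(candidates)

random.seed(0)
points = [[random.random(), random.random()] for _ in range(1000)]

# Assumption: four hand-picked centroids, one per quadrant of the unit square.
index = TinyIVFIndex(centroids=[[0.25, 0.25], [0.25, 0.75], [0.75, 0.25], [0.75, 0.75]])
for i, point in enumerate(points):
    index.add(point, f"doc-{i}")

query = [0.2, 0.2]
approx_hits, scanned = index.search(query, k=1, nprobe=1)
exact_hits, scanned_all = index.search(query, k=1, nprobe=4)
print(f"nprobe=1 scanned {scanned} vectors, found {approx_hits}")
print(f"nprobe=4 scanned {scanned_all} vectors, found {exact_hits}")
```

With `nprobe=1` only one bucket is scanned, which is where the speed-up comes from. The search is approximate because a true neighbour sitting just across a bucket boundary can be missed; raising `nprobe` trades speed for recall. HNSW takes a different, graph-based approach that is not shown here.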
Choosing a vector database
ChromaDB: Best for prototyping and small projects. Can run in-process (no server needed). Free and open-source. Practical up to the low millions of vectors.
Pinecone: Managed cloud service. Easy to start, scales well, good Python SDK. Paid beyond the free tier. Good for production apps.
Qdrant: Open-source, self-hostable, written in Rust for performance. Best balance of scale and control. Free to self-host.
pgvector: PostgreSQL extension, available on managed Postgres hosts such as Supabase. If you already use Postgres, this avoids a separate service. Good for moderate scale.
Weaviate / Milvus: Enterprise-grade, billion-scale. Significant operational complexity.
Rule of thumb: Start with ChromaDB for prototyping. Graduate to Qdrant or pgvector for production. Choose Pinecone if you would rather not run the infrastructure yourself.