Building a RAG Pipeline from Scratch — pgvector, Embeddings, and Retrieval That Actually Works
How I built a production retrieval system using pgvector, chunking strategies, and embedding models without reaching for a framework.
Most RAG tutorials skip the hard parts: chunking strategy, embedding model choice, and what to do when retrieval fails silently. This is what I learned building a pipeline for real.
Why pgvector over Pinecone
For Unirift, I needed vector search to stay inside the self-hosted stack. pgvector on Postgres meant one less service to run, transactions that cover relational and vector data together, and no vendor lock-in.
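To make the transactional point concrete, here is a minimal sketch of writing a document row and its chunk embeddings in one transaction. It assumes a psycopg 3 connection (for conn.transaction() and conn.execute()). The documents and chunks tables, their columns, and the helper names are hypothetical, and 1536 is the default output dimension of text-embedding-3-small.

```python
# Hypothetical schema for illustration; table and column names are assumptions.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
    id    bigserial PRIMARY KEY,
    title text NOT NULL
);

CREATE TABLE IF NOT EXISTS chunks (
    id          bigserial PRIMARY KEY,
    document_id bigint NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    content     text NOT NULL,
    embedding   vector(1536) NOT NULL  -- must match the embedding model's dimension
);
"""


def to_vector_literal(embedding):
    """Format floats in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def store_document(conn, title, chunks):
    """Insert one document and its (content, embedding) chunks atomically.

    `conn` is assumed to be a psycopg 3 connection. If any insert fails,
    the transaction block rolls back the document row and all chunk rows.
    """
    with conn.transaction():
        doc_id = conn.execute(
            "INSERT INTO documents (title) VALUES (%s) RETURNING id",
            (title,),
        ).fetchone()[0]
        for content, embedding in chunks:
            conn.execute(
                "INSERT INTO chunks (document_id, content, embedding) "
                "VALUES (%s, %s, %s::vector)",
                (doc_id, content, to_vector_literal(embedding)),
            )
    return doc_id
```

With a separate vector service, the same write would take two calls to two systems, plus some way to reconcile them when one of the calls fails.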
Chunking strategy
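Fixed-size chunks with overlap work for most content. The real win is metadata: store the source, section heading, and timestamp with every chunk. Retrieval quality jumps when you can filter on that metadata before the vector search, because the similarity ranking then only has to choose among chunks from the right source or time window.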
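A rough sketch of both ideas follows: a fixed-size chunker that carries metadata on every chunk, and a query that filters on that metadata before ranking by distance. The assumptions are mine, not the original implementation: windows are measured in characters rather than tokens, the 800/100 defaults are arbitrary, the Chunk class and function names are hypothetical, and the SQL presumes a chunks table with source, heading, created_at, and embedding columns plus psycopg-style named parameters.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Chunk:
    text: str
    source: str        # e.g. a file path or URL
    heading: str       # the section heading the text came from
    created_at: datetime


def chunk_text(text, source, heading, size=800, overlap=100, created_at=None):
    """Split text into character windows of `size` that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    created_at = created_at or datetime.now(timezone.utc)
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(Chunk(text[start:start + size], source, heading, created_at))
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# The metadata filter sits in WHERE; matching rows are then ordered by
# cosine distance, which is pgvector's <=> operator.
FILTERED_SEARCH_SQL = """
SELECT content, source, heading
FROM chunks
WHERE source = %(source)s
  AND created_at >= %(since)s
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT %(k)s
"""
```

For example, chunk_text("abcdefghij", "notes.md", "Intro", size=4, overlap=1) yields three chunks whose text is "abcd", "defg", and "ghij". If you later add an approximate index (HNSW or IVFFlat), pgvector applies the WHERE clause after the index scan, so a selective filter can return fewer than k rows. That case is worth testing.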
Embedding model choice
I tested OpenAI's text-embedding-3-small and Gemini's embedding model. In my tests, Gemini performed better on longer passages; text-embedding-3-small won on speed and cost for shorter technical notes. I abstracted the choice behind a VectorStore interface so swapping is a config change.
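One way the "config change" idea could look, isolating only the embedding part of such an interface. Every name here (Embedder, HashEmbedder, make_embedder, the config keys) is hypothetical and not taken from the original code. The OpenAI adapter uses the v1+ Python SDK's embeddings.create call, and a Gemini adapter would register in the same way.

```python
import hashlib
from typing import Dict, List, Protocol, Sequence


class Embedder(Protocol):
    """Anything that turns texts into fixed-length vectors."""
    dimension: int

    def embed(self, texts: Sequence[str]) -> List[List[float]]: ...


class HashEmbedder:
    """Deterministic stand-in for tests; its vectors carry no semantic meaning."""

    def __init__(self, dimension: int = 8):
        if not 1 <= dimension <= 32:  # a SHA-256 digest has 32 bytes
            raise ValueError("dimension must be between 1 and 32")
        self.dimension = dimension

    def embed(self, texts):
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            vectors.append([b / 255.0 for b in digest[: self.dimension]])
        return vectors


class OpenAIEmbedder:
    """Adapter for the OpenAI Python SDK (v1+); requires `pip install openai`."""

    def __init__(self, model: str = "text-embedding-3-small", dimension: int = 1536):
        from openai import OpenAI  # lazy import so other providers don't need it
        self._client = OpenAI()    # reads OPENAI_API_KEY from the environment
        self.model = model
        self.dimension = dimension

    def embed(self, texts):
        response = self._client.embeddings.create(model=self.model, input=list(texts))
        return [item.embedding for item in response.data]


# Registry keyed by the config value (hypothetical key names).
EMBEDDERS: Dict[str, type] = {"hash": HashEmbedder, "openai": OpenAIEmbedder}


def make_embedder(config: dict) -> Embedder:
    """Build the embedder named in config["embedding_provider"]."""
    name = config["embedding_provider"]
    if name not in EMBEDDERS:
        raise ValueError(f"unknown embedding provider: {name!r}")
    return EMBEDDERS[name](**config.get("embedding_options", {}))
```

Calling make_embedder({"embedding_provider": "hash"}) gives a test double with the same embed() shape as the real adapters. One caveat to plan for: vectors from different models are not comparable and often differ in dimension, so switching providers also means re-embedding stored chunks and matching the column's vector size.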