Building a RAG Pipeline from Scratch — pgvector, Embeddings, and Retrieval That Actually Works

How I built a production retrieval system using pgvector, chunking strategies, and embedding models without reaching for a framework.

Most RAG tutorials skip the hard parts: chunking strategy, embedding model choice, and what to do when retrieval fails silently. This is what I learned building a pipeline for real.

Why pgvector over Pinecone

For Unirift I needed vector search to stay inside the self-hosted stack. pgvector on Postgres meant one less service, transactions across relational and vector data, and zero vendor lock-in.
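
To make the "relational and vector data in one place" point concrete, here is a minimal sketch of what such a table and query could look like. The table name, column names, and helper functions are hypothetical, not the actual Unirift schema. The 1536 dimension assumes text-embedding-3-small at its default output size. The `<=>` operator is pgvector's cosine distance.

```python
# Hypothetical schema: table and column names are illustrative only.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id         bigserial PRIMARY KEY,
    source     text NOT NULL,
    heading    text,
    created_at timestamptz NOT NULL DEFAULT now(),
    content    text NOT NULL,
    embedding  vector(1536)  -- assumed; must match the embedding model's output size
);
"""


def to_vector_literal(embedding):
    """Format numbers in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_search_query(embedding, source=None, limit=5):
    """Return (sql, params) for a cosine-distance search with an optional source filter."""
    where = ""
    params = []
    if source is not None:
        # Ordinary relational filter, applied in the same statement as the vector ordering.
        where = "WHERE source = %s "
        params.append(source)
    sql = (
        "SELECT id, source, heading, content "
        "FROM chunks "
        + where
        + "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    params.extend([to_vector_literal(embedding), limit])
    return sql, params
```

The `%s` placeholders follow the style used by drivers such as psycopg, where the pair would be passed as `cursor.execute(sql, params)`. Because the chunk text, its metadata, and its embedding sit in one row, an insert or delete touches all three inside a single transaction.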

Chunking strategy

Fixed-size chunks with overlap work for most content. The real win is metadata: store the source, section heading, and timestamp with every chunk. Retrieval quality improves noticeably when you can narrow the candidates with a metadata filter before the vector search runs.
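
As a rough illustration of fixed-size chunking with overlap and per-chunk metadata, here is a small sketch. It splits on characters rather than tokens to stay dependency-free, and the `Chunk` fields, function name, and default sizes are assumptions chosen for the example, not tuned values.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Chunk:
    text: str
    source: str           # e.g. file path or URL the text came from
    heading: str          # section heading the chunk belongs to
    created_at: datetime  # when the chunk was produced
    start: int            # character offset into the original text


def chunk_text(text, source, heading, size=800, overlap=100):
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    now = datetime.now(timezone.utc)
    step = size - overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(Chunk(text[start:start + size], source, heading, now, start))
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", "notes.md", "Intro", size=4, overlap=1)` yields the windows `abcd`, `defg`, and `ghij`, each sharing one character with its neighbor and each carrying the same source and heading.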

Embedding model choice

I tested OpenAI's text-embedding-3-small and Gemini's embedding model. In my tests, Gemini did better on longer passages; text-embedding-3-small won on speed and cost for shorter technical notes. I abstracted the choice behind a VectorStore interface, so swapping models is a config change.
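
The real VectorStore interface isn't shown in this post, so the following is only a guess at the general shape: a store that takes any object with an `embed` method, plus a factory that picks the embedder from config. Every name here (`Embedder`, `InMemoryVectorStore`, `build_store`, the `embedding_model` key) is hypothetical, and the two embedders are toy stand-ins rather than real model clients.

```python
import math
from typing import Protocol


class Embedder(Protocol):
    def embed(self, text: str) -> list[float]: ...


class LengthEmbedder:
    """Toy stand-in for a real model client: encodes text length."""

    def embed(self, text: str) -> list[float]:
        return [float(len(text)), 1.0]


class VowelEmbedder:
    """Second toy stand-in: encodes vowel and non-vowel counts."""

    def embed(self, text: str) -> list[float]:
        vowels = sum(ch in "aeiou" for ch in text.lower())
        return [float(vowels), float(len(text) - vowels)]


# Config value -> embedder class. A real registry would map to API clients.
EMBEDDERS = {"length": LengthEmbedder, "vowel": VowelEmbedder}


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Minimal store: embeds on write, ranks by cosine similarity on read."""

    def __init__(self, embedder: Embedder):
        self._embedder = embedder
        self._rows = []  # list of (text, vector) pairs

    def add(self, text: str) -> None:
        self._rows.append((text, self._embedder.embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = self._embedder.embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_store(config: dict) -> InMemoryVectorStore:
    """Pick the embedder named in config; raises KeyError for unknown names."""
    return InMemoryVectorStore(EMBEDDERS[config["embedding_model"]]())
```

With this shape, `build_store({"embedding_model": "vowel"})` and `build_store({"embedding_model": "length"})` return stores that behave identically from the caller's side. One caveat applies to any such swap: vectors from different models are not comparable, so chunks already stored need to be re-embedded with the new model.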