$EngineeringAtlas

Vector Databases, Explained for Backend Engineers

What embeddings are, why approximate nearest-neighbor search needs special indexes, and when you actually need a vector database versus a pgvector column.

Amit Kumar Singh · 3 min read

The Problem

You want to build semantic search or retrieval-augmented generation: given a user's question, find the most relevant documents by meaning, not keywords. A LIKE query won't cut it — "how do I reset my password" should match a doc titled "account recovery steps." This is similarity search over embeddings, and it needs different machinery.

Why It Matters

RAG and semantic search are now standard features. The retrieval layer underneath them is a vector similarity problem. The naive approach, comparing the query against every stored vector, costs O(n·d) per query for n vectors of d dimensions. With a million 1536-dimensional vectors, that is roughly 1.5 billion multiply-adds for every search, which is too slow to run on every request. Understanding the index is what separates a demo from production.

Core Concepts

An embedding is a fixed-length array of floats that represents the meaning of text, produced by a model. Similar meanings produce vectors that are close together, usually measured by cosine similarity.
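To make "close together" concrete, here is a small sketch using pgvector's cosine distance operator, <=>, which returns 1 minus the cosine similarity. It assumes the pgvector extension is installed and uses toy two-dimensional vectors in place of real embeddings.

```sql
-- Assumes pgvector is installed; 2-D literals stand in for real embeddings.
CREATE EXTENSION IF NOT EXISTS vector;

-- Cosine distance depends only on direction, not length:
-- 0 = same direction, 1 = orthogonal (unrelated), 2 = opposite.
SELECT
  '[1,2]'::vector <=> '[2,4]'::vector  AS same_direction,  -- [2,4] is [1,2] doubled
  '[1,0]'::vector <=> '[0,1]'::vector  AS orthogonal,
  '[1,0]'::vector <=> '[-1,0]'::vector AS opposite;

-- Cosine similarity is 1 minus cosine distance.
SELECT 1 - ('[1,2]'::vector <=> '[2,3]'::vector) AS cosine_similarity;
```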

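Finding the closest vectors exactly is expensive, so vector databases use approximate nearest neighbor (ANN) search. The most common index is HNSW (Hierarchical Navigable Small World): a layered graph where a search starts in a sparse top layer and greedily moves toward the query through progressively denser layers below. It finds close neighbors in roughly logarithmic time, trading a little recall (the fraction of the true nearest neighbors it actually returns) for a lot of speed.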

Implementation

You don't always need a dedicated database. Postgres with pgvector is often enough:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id      bigserial PRIMARY KEY,
  content text,
  embedding vector(1536)        -- dimension matches your embedding model
);

CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops);

Querying orders rows by their distance to the query embedding, passed in as the bind parameter $1:

SELECT id, content
FROM documents
ORDER BY embedding <=> $1   -- <=> is cosine distance
LIMIT 5;

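The <=> operator computes cosine distance, so ORDER BY ... LIMIT 5 returns the five nearest neighbors. Because <=> matches the index's vector_cosine_ops operator class, the planner can answer the query from the HNSW index instead of scanning the whole table.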
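The query above needs a real 1536-dimensional embedding for $1. A minimal runnable sketch of the same pattern follows, assuming a hypothetical toy_docs table with three-dimensional, hand-picked vectors. It also shows pgvector's other distance operators, <-> (Euclidean) and <#> (negative inner product); each one needs an index built with its own operator class.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical toy table: vector(3) stands in for vector(1536),
-- and the values are hand-picked rather than produced by a model.
CREATE TABLE toy_docs (
  id        bigserial PRIMARY KEY,
  content   text,
  embedding vector(3)
);

INSERT INTO toy_docs (content, embedding) VALUES
  ('account recovery steps', '[0.9,0.1,0]'),
  ('billing FAQ',            '[0.1,0.9,0]'),
  ('release notes',          '[0,0.2,0.9]');

-- Same operator class as the article's index, so ORDER BY ... <=> can use it.
-- Ordering by <-> would need vector_l2_ops; ordering by <#> would need vector_ip_ops.
CREATE INDEX ON toy_docs USING hnsw (embedding vector_cosine_ops);

-- The three distance operators side by side for one query vector.
SELECT content,
       embedding <=> '[0.8,0.2,0.1]' AS cosine_distance,
       embedding <-> '[0.8,0.2,0.1]' AS l2_distance,
       embedding <#> '[0.8,0.2,0.1]' AS negative_inner_product
FROM toy_docs;

-- Nearest neighbors by cosine distance, smallest distance first.
-- A 3-row table will likely be scanned sequentially; the index pays off at scale.
SELECT id, content
FROM toy_docs
ORDER BY embedding <=> '[0.8,0.2,0.1]'
LIMIT 2;
```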

Common Mistakes

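  • Mismatched dimensions. The column dimension must equal the embedding model's output size; pgvector rejects vectors of any other length. Switching models means re-embedding everything, and a model with a different output size also means altering the column.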
  • Comparing across models. Embeddings from different models live in different spaces and aren't comparable. Pick one and stay consistent.
  • Forgetting the index. Without an ANN index, every query is a full scan. It works on a thousand rows and falls over at a million.
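The dimension and index mistakes are both easy to check for. Here is a minimal sketch, assuming a hypothetical dim_demo table where vector(3) stands in for vector(1536) and vector(4) stands in for a new model's output size. Keeping each model's vectors in a separate column also guards against comparing across models, since old and new vectors never share a column.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table: vector(3) stands in for vector(1536).
CREATE TABLE dim_demo (
  id        bigserial PRIMARY KEY,
  embedding vector(3)
);

-- Accepted: exactly three values.
INSERT INTO dim_demo (embedding) VALUES ('[0.1,0.2,0.3]');

-- Rejected: pgvector raises a dimension-mismatch error, so it is commented out here.
-- INSERT INTO dim_demo (embedding) VALUES ('[0.1,0.2]');

-- Sanity check on what is actually stored.
SELECT id, vector_dims(embedding) AS dims FROM dim_demo;

-- Switching models: add a column sized for the new model (4 is a placeholder),
-- re-embed every row into it from the application, index it, then move queries over.
ALTER TABLE dim_demo ADD COLUMN embedding_v2 vector(4);
CREATE INDEX dim_demo_v2_hnsw ON dim_demo USING hnsw (embedding_v2 vector_cosine_ops);

-- Confirm the index is usable: look for an Index Scan on dim_demo_v2_hnsw rather
-- than a Seq Scan (on a table this small the planner may still choose a scan).
EXPLAIN SELECT id FROM dim_demo ORDER BY embedding_v2 <=> '[0.1,0.2,0.3,0.4]' LIMIT 5;
```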

Production Considerations

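ANN is a recall/speed trade-off. HNSW has three main knobs. m (the maximum number of connections per node) and ef_construction (the candidate list size while building) are fixed when the index is built and control graph quality; ef_search (the candidate list size at query time, hnsw.ef_search in pgvector) controls how much of the graph each query explores. Raising m and ef_construction improves recall at the cost of slower builds and a larger index; raising ef_search improves recall at the cost of query latency. Lowering them does the reverse. Measure recall against an exact brute-force baseline on a sample of queries so you know what you're trading away.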
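A minimal sketch of setting these knobs in pgvector and measuring recall for one query follows, assuming a hypothetical tune_demo table filled with random three-dimensional vectors. The exact baseline comes from disabling index scans for a single transaction, which makes the planner fall back to a full scan plus sort. On real data you would repeat this over many sample queries and average the results. Whether the approximate query actually uses the index is the planner's decision, which EXPLAIN can confirm.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical sample table: 10,000 random 3-D vectors stand in for real embeddings.
CREATE TABLE tune_demo (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO tune_demo (embedding)
SELECT ARRAY[random(), random(), random()]::vector
FROM generate_series(1, 10000);

-- Build-time parameters, shown at pgvector's defaults (m = 16, ef_construction = 64).
CREATE INDEX tune_demo_hnsw ON tune_demo
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
ANALYZE tune_demo;

-- Approximate top 10, searching wider than the default ef_search of 40.
-- SET LOCAL limits the setting to this transaction.
BEGIN;
SET LOCAL hnsw.ef_search = 100;
CREATE TEMP TABLE approx_top AS
  SELECT id FROM tune_demo ORDER BY embedding <=> '[0.5,0.2,0.9]' LIMIT 10;
COMMIT;

-- Exact top 10: with index scans disabled, the planner scans and sorts every row.
BEGIN;
SET LOCAL enable_indexscan = off;
CREATE TEMP TABLE exact_top AS
  SELECT id FROM tune_demo ORDER BY embedding <=> '[0.5,0.2,0.9]' LIMIT 10;
COMMIT;

-- recall@10 for this query: the share of true neighbors the index returned.
SELECT count(*)::float / 10 AS recall_at_10
FROM approx_top JOIN exact_top USING (id);
```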

Security

Apply your normal authorization after retrieval, or filter within the query. Vector search will happily return a chunk the current user isn't allowed to see — relevance is not permission. Combine the similarity search with a metadata filter on tenant or document ownership.
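Here is a minimal sketch of filtering within the query. It assumes a hypothetical tenant_docs table whose tenant_id column records ownership; any ownership or ACL column would play the same role.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical multi-tenant table; tenant_id is an assumed ownership column.
CREATE TABLE tenant_docs (
  id        bigserial PRIMARY KEY,
  tenant_id bigint NOT NULL,
  content   text,
  embedding vector(3)
);
CREATE INDEX ON tenant_docs USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON tenant_docs (tenant_id);

-- Two tenants own near-identical chunks.
INSERT INTO tenant_docs (tenant_id, content, embedding) VALUES
  (1, 'tenant 1: account recovery steps', '[0.9,0.1,0]'),
  (2, 'tenant 2: account recovery steps', '[0.9,0.1,0]'),
  (1, 'tenant 1: billing FAQ',            '[0.1,0.9,0]');

-- The tenant filter is part of the same query, so tenant 2's chunk is never
-- returned to tenant 1, however similar it is.
SELECT id, content
FROM tenant_docs
WHERE tenant_id = 1
ORDER BY embedding <=> '[0.8,0.2,0.1]'
LIMIT 5;
```

One caveat is worth checking against the pgvector documentation for your version. With an HNSW index, the WHERE clause is applied to the candidates the index returns, so a highly selective filter can yield fewer rows than the LIMIT. Recent pgvector releases add iterative index scans (hnsw.iterative_scan) to address this, and a partial HNSW index for a large tenant is another option.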

Performance

Reach for a dedicated vector database (Qdrant, Weaviate, Milvus) when you're past tens of millions of vectors, need high write throughput, or want native metadata filtering at scale. Below that, pgvector keeps your vectors next to your relational data and leaves you with one fewer system to operate.

Summary

Vector search powers semantic retrieval by comparing embeddings with approximate nearest-neighbor indexes like HNSW. Start with pgvector if you're already on Postgres, match your dimensions to your model, always build the index, and enforce authorization within the query or immediately after retrieval. Graduate to a dedicated vector database only when scale genuinely demands it.


// written by

Amit Kumar Singh

Software engineer writing about backend systems, cloud, and the realities of running code in production.
