Vector Databases, Explained for Backend Engineers

The Problem

You want to build semantic search or retrieval-augmented generation: given a user's question, find the most relevant documents by meaning, not keywords. A LIKE query won't cut it — "how do I reset my password" should match a doc titled "account recovery steps." This is similarity search over embeddings, and it needs different machinery.

Why It Matters

RAG and semantic search are now standard features. The retrieval layer underneath them is a vector similarity problem, and the naive approach, comparing a query against every stored vector, costs O(n) distance computations per query, each one over the full vector dimension. At millions of vectors, that usually blows an interactive latency budget. Understanding the index is what separates a demo from production.

Core Concepts

An embedding is a fixed-length array of floats that represents the meaning of text, produced by a model. Similar meanings produce vectors that are close together, usually measured by cosine similarity.
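To make "close together" concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors and their names are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, not output from any real model).
password_reset = [0.9, 0.1, 0.0]
account_recovery = [0.8, 0.2, 0.1]
pizza_recipe = [0.0, 0.1, 0.9]

# The related pair should score much higher than the unrelated pair.
print(cosine_similarity(password_reset, account_recovery))
print(cosine_similarity(password_reset, pizza_recipe))
```

Cosine distance, which is what the queries later in this article order by, is simply 1 minus this value.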

Finding the closest vectors exactly is expensive, so vector databases use approximate nearest neighbor (ANN) search. The most common index is HNSW (Hierarchical Navigable Small World), a graph you can traverse to find close neighbors in roughly logarithmic time, trading a little recall for a lot of speed.
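For contrast, this is roughly what exact search looks like: score every stored vector and keep the k best. It is a sketch of the brute-force baseline that ANN indexes approximate, not of HNSW itself, and the corpus and document ids are invented for the example.

```python
import heapq
import math

def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def exact_top_k(query, vectors, k):
    """Return the ids of the k nearest vectors, nearest first.

    Every stored vector is scored, so the cost grows linearly with the corpus.
    """
    scored = ((cosine_distance(query, vec), doc_id) for doc_id, vec in vectors.items())
    return [doc_id for _, doc_id in heapq.nsmallest(k, scored)]

# Hypothetical toy corpus: id -> embedding.
corpus = {
    "account-recovery": [0.8, 0.2, 0.1],
    "billing-faq": [0.3, 0.9, 0.1],
    "pizza-recipe": [0.0, 0.1, 0.9],
}

print(exact_top_k([0.9, 0.1, 0.0], corpus, k=2))
```

This is fine for small corpora and is also what you would use as the ground truth when measuring how much recall an approximate index gives up.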

Implementation

You don't always need a dedicated database. Postgres with pgvector is often enough:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id      bigserial PRIMARY KEY,
  content text,
  embedding vector(1536)        -- dimension matches your embedding model
);

CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops);

Querying is an ordering by distance:

SELECT id, content
FROM documents
ORDER BY embedding <=> $1   -- <=> is cosine distance
LIMIT 5;

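The <=> operator computes cosine distance (1 minus cosine similarity). ORDER BY with LIMIT is what returns the nearest neighbors, and the HNSW index lets Postgres answer that without scanning every row.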
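From application code, the query vector has to reach Postgres in a form pgvector can parse. One option is to send it as text in pgvector's bracketed format and cast it to vector in the query. The sketch below assumes a driver that uses %s placeholders (psycopg does); the helper names are hypothetical, and pgvector also ships driver adapters you may prefer.

```python
def to_pgvector_literal(embedding):
    """Format a sequence of numbers as pgvector text input, e.g. '[0.25,-1.0,3.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

# Same query as above, parameterized for a %s-style driver.
# The ::vector cast tells Postgres how to interpret the text parameter.
NEAREST_DOCS_SQL = """
SELECT id, content
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s
"""

def nearest_docs_params(query_embedding, limit=5):
    """Build the parameter tuple that goes with NEAREST_DOCS_SQL."""
    return (to_pgvector_literal(query_embedding), limit)

print(nearest_docs_params([0.25, -1.0, 3.0], limit=5))
```

With an open cursor, usage would look something like cur.execute(NEAREST_DOCS_SQL, nearest_docs_params(query_embedding)). Keeping the vector as a bound parameter rather than interpolating it into the SQL string avoids injection problems.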

Common Mistakes

Mismatched dimensions. The column dimension must equal the embedding model's output. Switching models usually means re-embedding everything.
Comparing across models. Embeddings from different models live in different spaces and aren't comparable. Pick one and stay consistent.
Forgetting the index. Without an ANN index, every query is a full scan. It works on a thousand rows and falls over at a million.
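The first two mistakes are cheap to catch before a vector ever reaches the database. Here is a small guard you could run at write and query time. IndexSpec, validate_embedding, and the model name are all hypothetical and only illustrate the idea of pinning one model and one dimension per index.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexSpec:
    model: str      # the one embedding model used for every stored row
    dimension: int  # must equal N in the column type vector(N)

def validate_embedding(embedding, model, spec):
    """Reject vectors that come from another model or have the wrong length."""
    if model != spec.model:
        raise ValueError(
            f"embedding came from {model!r}, but the index holds {spec.model!r} vectors"
        )
    if len(embedding) != spec.dimension:
        raise ValueError(
            f"expected {spec.dimension} dimensions, got {len(embedding)}"
        )
    return embedding

# Hypothetical model name; 1536 matches the vector(1536) column defined earlier.
SPEC = IndexSpec(model="example-embed-v1", dimension=1536)

validate_embedding([0.0] * 1536, "example-embed-v1", SPEC)  # passes silently
```

Storing the model name alongside each row, or per table, is one way to make the check possible after a migration.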

Production Considerations

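ANN is a recall/speed trade-off. In HNSW, m and ef_construction are set when the index is built and control how densely connected the graph is; ef_search (hnsw.ef_search in pgvector) is set at query time and controls how many candidates each search explores. Raising any of them generally improves recall. The cost is build time and memory for the first two, and query latency for ef_search. Measure recall against an exact brute-force baseline on a sample so you know what you're trading away.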
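Measuring recall is mostly bookkeeping: for each sample query, compare the ids the index returned with the ids an exact scan returned. A minimal sketch, assuming you already have both result lists per query (the ids below are invented):

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k results that the approximate search also returned."""
    if not exact_ids:
        raise ValueError("exact result list must not be empty")
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

def mean_recall(result_pairs):
    """Average recall over (exact_ids, approx_ids) pairs, one pair per sample query."""
    scores = [recall_at_k(exact, approx) for exact, approx in result_pairs]
    return sum(scores) / len(scores)

# Two hypothetical sample queries with k = 5.
samples = [
    ([1, 2, 3, 4, 5], [1, 2, 3, 5, 9]),  # the index missed id 4
    ([7, 8, 9, 10, 11], [7, 8, 9, 10, 11]),  # the index matched the exact scan
]

print(mean_recall(samples))
```

Re-run the same sample after changing index or search parameters, and track latency next to recall so you can see both sides of the trade.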

Security

Apply your normal authorization after retrieval, or filter within the query. Vector search will happily return a chunk the current user isn't allowed to see — relevance is not permission. Combine the similarity search with a metadata filter on tenant or document ownership.
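If you enforce permissions after retrieval, the check can be as small as the sketch below. Hit and authorized_hits are hypothetical names, and the tenant model is an assumption; substitute whatever ownership rule your application already uses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    doc_id: int
    tenant_id: str
    distance: float

def authorized_hits(hits, tenant_id, limit):
    """Keep only hits owned by the caller's tenant, preserving relevance order."""
    allowed = [hit for hit in hits if hit.tenant_id == tenant_id]
    return allowed[:limit]

# Hypothetical search results, already ordered by distance.
results = [
    Hit(doc_id=1, tenant_id="acme", distance=0.05),
    Hit(doc_id=2, tenant_id="globex", distance=0.07),  # relevant, but not acme's
    Hit(doc_id=3, tenant_id="acme", distance=0.11),
]

print(authorized_hits(results, tenant_id="acme", limit=2))
```

One caveat with filtering afterwards: you can end up with fewer results than you asked for, so either fetch more candidates than you need or push the tenant condition into the query's WHERE clause.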

Performance

Reach for a dedicated vector database (Qdrant, Weaviate, Milvus) when you're past tens of millions of vectors, need high write throughput, or want native metadata filtering at scale. Below that, pgvector keeps your vectors next to your relational data and leaves you with one fewer system to operate.

Summary

Vector search powers semantic retrieval by comparing embeddings with approximate nearest-neighbor indexes like HNSW. Start with pgvector if you're already on Postgres, match your dimensions to your model, always build the index, and enforce authorization after retrieval. Graduate to a dedicated vector database only when scale genuinely demands it.

Amit Kumar Singh
