tutorial 2025-03-21 12 min read

From SQL to Vector Search: A Developer's Guide

Understand vector search through the lens of SQL. Learn how similarity queries differ from exact-match queries, how ANN indexes work, and how to choose between pgvector, Pinecone, and Weaviate.

vector search pgvector ANN semantic search database embeddings

The Fundamental Difference

SQL is optimized for exact matches:

-- Find user with id = 42
SELECT * FROM users WHERE id = 42;

-- Find all orders with status = 'shipped'
SELECT * FROM orders WHERE status = 'shipped';

Vector search is optimized for similarity:

-- Find the 10 products most similar to product 42
SELECT * FROM products
ORDER BY embedding <-> (SELECT embedding FROM products WHERE id = 42)
LIMIT 10;

The <-> operator computes L2 distance between vectors. The query asks: "find me things that are close in embedding space" — which means "semantically similar."
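To make "close in embedding space" concrete, here is the same distance computed in plain Python over toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions; the values are made up for illustration):

```python
import math

def l2_distance(a: list[float], b: list[float]) -> float:
    # The same computation as pgvector's <-> operator
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 3-d "embeddings" -- invented values, not real model output
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
invoice = [0.0, 0.2, 0.9]

print(l2_distance(cat, kitten))   # small: semantically close
print(l2_distance(cat, invoice))  # large: semantically far
```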

Why You Can't Do Semantic Search with SQL Indexes

Traditional database indexes (B-trees, hash indexes) enable exact and range lookups in O(log n). They don't help with nearest-neighbor queries in high-dimensional space.

The naive approach — compute distance from query to every row — is O(n * d) where d is the vector dimension. For 10 million 1536-dimensional vectors: ~15 billion floating-point operations per query. Too slow.

The solution: Approximate Nearest Neighbor (ANN) indexes. They sacrifice a small amount of accuracy for orders-of-magnitude speedup.
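For intuition on the naive cost, the exact scan is a one-liner in NumPy. The sizes below are scaled down (100k x 128 instead of 10M x 1536) so it runs in a moment, but the work still grows linearly with both n and d:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100_000, 128  # scaled-down stand-ins for 10M x 1536
vectors = rng.standard_normal((n, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

# Exact k-NN: one distance per row -- O(n * d) work per query
dists = np.linalg.norm(vectors - query, axis=1)
top10 = np.argsort(dists)[:10]
```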

How ANN Indexes Work (Intuition)

HNSW (Hierarchical Navigable Small World)

The most widely used ANN algorithm. Think of it like a skip list for similarity:

Level 2 (sparse):  [A] -------- [E] -------- [I]
                    |            |            |
Level 1 (medium):  [A] --- [C] - [E] --- [G] - [I]
                    |  |    |    |  |    |    |  |
Level 0 (dense):   [A][B] [C][D][E][F] [G][H][I][J]

To find nearest neighbor of query Q:

  1. Enter at top level, greedily move to closest node
  2. Drop to next level at current position
  3. Repeat until bottom layer
  4. Search neighborhood at bottom layer

Time complexity: O(log n) average. Recall: ~95-99% at common settings.
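The descent can be sketched in a few lines. The graph below hand-codes the three layers from the diagram over 1-D points, a toy stand-in for high-dimensional embeddings; a real HNSW index learns its links during insertion rather than using a fixed topology:

```python
# Nodes are 1-D points; distance is absolute difference
points = {c: i for i, c in enumerate("ABCDEFGHIJ")}  # A=0 ... J=9

layers = [  # top (sparse) to bottom (dense), as adjacency lists
    {"A": ["E"], "E": ["A", "I"], "I": ["E"]},
    {"A": ["C"], "C": ["A", "E"], "E": ["C", "G"], "G": ["E", "I"], "I": ["G"]},
    {c: [x for x in "ABCDEFGHIJ" if abs(points[x] - points[c]) == 1]
     for c in "ABCDEFGHIJ"},
]

def greedy_search(q: float, entry: str = "A") -> str:
    node = entry
    for layer in layers:           # steps 2-3: descend layer by layer
        improved = True
        while improved:            # step 1: greedily move toward q
            improved = False
            for nb in layer[node]:
                if abs(points[nb] - q) < abs(points[node] - q):
                    node, improved = nb, True
    return node                    # step 4: best node on the bottom layer

print(greedy_search(6.4))  # -> "G" (point 6 is nearest to 6.4)
```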

IVF (Inverted File Index)

Used in FAISS:

  1. Cluster all vectors into K clusters during indexing
  2. At query time, find the M nearest cluster centroids
  3. Only search within those M clusters

Faster but requires more tuning (K, M) and does worse on highly non-uniform distributions.
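The three steps map directly onto a minimal NumPy sketch: plain k-means stands in for step 1 (FAISS does this during training), and K, M, and the data sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 32)).astype(np.float32)

# Step 1 -- indexing: partition vectors into K cells via a few k-means passes
K, M = 64, 4
centroids = vectors[rng.choice(len(vectors), K, replace=False)]
for _ in range(5):
    assign = np.linalg.norm(vectors[:, None] - centroids[None], axis=2).argmin(1)
    for k in range(K):
        if (assign == k).any():
            centroids[k] = vectors[assign == k].mean(0)
# Final cell assignment against the trained centroids
assign = np.linalg.norm(vectors[:, None] - centroids[None], axis=2).argmin(1)

def ivf_search(query: np.ndarray, top_k: int = 10) -> np.ndarray:
    # Step 2 -- find the M nearest cell centroids to the query ...
    cells = np.linalg.norm(centroids - query, axis=1).argsort()[:M]
    # Step 3 -- ... and brute-force only the vectors in those cells
    ids = np.flatnonzero(np.isin(assign, cells))
    d = np.linalg.norm(vectors[ids] - query, axis=1)
    return ids[np.argsort(d)[:top_k]]
```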

pgvector: Vector Search in PostgreSQL

If you already use Postgres, pgvector adds vector types and indexes:

-- Install extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Add vector column to existing table
ALTER TABLE documents ADD COLUMN embedding vector(1536);

-- Create HNSW index for fast approximate search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Insert documents with embeddings
INSERT INTO documents (content, embedding)
VALUES ('The cat sat on the mat', '[0.1, -0.3, 0.7, ...]');

-- Semantic search
SELECT id, content,
       1 - (embedding <=> '[0.1, -0.2, ...]'::vector) AS cosine_similarity
FROM documents
ORDER BY embedding <=> '[0.1, -0.2, ...]'::vector
LIMIT 10;

Distance operators:

  • <-> — L2 (Euclidean) distance
  • <=> — Cosine distance
  • <#> — Negative inner product
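These correspond to standard formulas, which a few lines of NumPy make explicit. The inner product is negated so that the smallest value is the best match, which lets ascending-order index scans work; and for unit-normalized embeddings, cosine and L2 produce the same ranking. The vectors here are toy values:

```python
import numpy as np

a = np.array([0.1, -0.3, 0.7])
b = np.array([0.2, -0.1, 0.6])

l2 = np.linalg.norm(a - b)                                          # <->
cosine_dist = 1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))   # <=>
neg_inner = -(a @ b)                                                # <#>
```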

pgvector with Python

import psycopg2
from pgvector.psycopg2 import register_vector
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Connect
conn = psycopg2.connect("postgresql://user:pass@localhost/db")
register_vector(conn)
cursor = conn.cursor()

# Index documents
def index_documents(docs: list[dict]):
    embeddings = model.encode([d["content"] for d in docs])

    # register_vector lets us pass NumPy arrays straight into vector columns
    cursor.executemany(
        "INSERT INTO documents (id, content, embedding) VALUES (%s, %s, %s)",
        [(d["id"], d["content"], emb) for d, emb in zip(docs, embeddings)]
    )
    conn.commit()

# Search
def semantic_search(query: str, top_k: int = 10) -> list[dict]:
    query_embedding = model.encode(query)

    cursor.execute("""
        SELECT id, content, 1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (query_embedding, query_embedding, top_k))

    return [
        {"id": row[0], "content": row[1], "similarity": float(row[2])}
        for row in cursor.fetchall()
    ]

Hybrid Search: Combining Keyword and Semantic

Pure semantic search misses exact keyword matches. Pure keyword search (BM25) misses paraphrases. Hybrid search combines both:

-- Hybrid search: BM25 + vector similarity
-- Reciprocal Rank Fusion (RRF) for combining scores
WITH semantic AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS rank
    FROM documents
    ORDER BY embedding <=> $1
    LIMIT 50
),
keyword AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank(content_tsvector, query) DESC) AS rank
    FROM documents, to_tsquery('english', $2) query
    WHERE content_tsvector @@ query
    ORDER BY rank
    LIMIT 50
)
SELECT
    COALESCE(s.id, k.id) AS id,
    1.0 / (60 + COALESCE(s.rank, 1000)) + 1.0 / (60 + COALESCE(k.rank, 1000)) AS rrf_score
FROM semantic s
FULL OUTER JOIN keyword k ON s.id = k.id
ORDER BY rrf_score DESC
LIMIT 10;
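If you prefer to fuse results in application code rather than SQL, the same RRF formula is a few lines of Python. The id lists below are hypothetical stand-ins for the ranked results of the two queries above:

```python
def rrf_fuse(semantic_ids: list[str], keyword_ids: list[str], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: score(d) = sum over result lists of 1 / (k + rank)
    scores: dict[str, float] = {}
    for ids in (semantic_ids, keyword_ids):
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_fuse(["d3", "d1", "d7"], ["d1", "d9", "d3"])
print(merged[0])  # -> "d1": appears in both lists with the best combined ranks
```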

Dedicated Vector Databases

For large-scale vector search, dedicated databases offer more features:

Pinecone

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create index
pc.create_index(
    name="documents",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("documents")

# Upsert vectors (Pinecone IDs must be strings)
vectors = [(str(doc["id"]), embedding.tolist(), {"content": doc["content"]})
           for doc, embedding in zip(docs, embeddings)]
index.upsert(vectors=vectors)

# Query
results = index.query(
    vector=query_embedding.tolist(),
    top_k=10,
    include_metadata=True
)

Choosing Your Vector Store

  • pgvector — you already use PostgreSQL; small-to-medium scale (up to tens of millions of vectors)
  • FAISS — local or batch use; you want full control over the index
  • Pinecone — fully managed; you don't want to operate infrastructure
  • Weaviate — you need hybrid search plus a GraphQL API
  • Qdrant — open source; good filtering and payload storage
  • Milvus — very large scale (1B+ vectors); self-hosted

For most production applications under 50M vectors, pgvector is sufficient and keeps your stack simpler. Graduate to a dedicated store when you hit scaling or feature limitations.

Index Tuning

-- HNSW parameters
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (
    m = 16,              -- Max connections per layer (higher = better recall, more memory)
    ef_construction = 64 -- Search width during construction (higher = better index, slower build)
);

-- At query time
SET hnsw.ef_search = 100;  -- Higher = better recall, slower query

# FAISS IVF tuning
import faiss

dimension = 1536  # Must match your embedding model's output size
nlist = 1000      # Number of clusters (√N is a good starting point)
quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFFlat(quantizer, dimension, nlist)

index.train(training_vectors)  # IVF requires training step
index.add(all_vectors)

index.nprobe = 50  # Check 50 clusters per query (higher = better recall, slower)

Recall vs. latency tradeoff: most applications target 95%+ recall at <50ms latency. Test with your actual data distribution.
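Recall is measured against exact brute-force results; a small helper makes the metric concrete. The id lists here are made up for illustration:

```python
def recall_at_k(approx_ids: list[int], exact_ids: list[int], k: int = 10) -> float:
    # Fraction of the true top-k that the ANN index actually returned
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# Exact top-10 from a brute-force scan vs. a hypothetical ANN result
exact = [4, 17, 2, 99, 5, 23, 8, 41, 3, 77]
approx = [4, 17, 2, 99, 23, 5, 8, 41, 3, 12]   # one miss (12 instead of 77)
print(recall_at_k(approx, exact))  # -> 0.9
```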


For a deeper dive into production search systems, see our guide to search in the AI era.

Want to Go Deeper?

This article is part of our comprehensive curriculum on building ML systems at scale. Explore our full courses for hands-on learning.