Why Vector Databases Matter
As LLM-powered applications move to production, vector databases have become the backbone of retrieval-augmented generation (RAG), semantic search, and recommendation systems. Choosing the wrong one—or misconfiguring the right one—can turn a working prototype into an unscalable mess.
This guide covers the four options you'll actually encounter in production: Pinecone, Weaviate, Qdrant, and pgvector.
The Core Problem: ANN at Scale
All vector databases solve the same fundamental problem: given a query vector, efficiently find the k most similar vectors in a collection of millions or billions of records.
Exact nearest-neighbor search compares the query against every stored vector, which costs O(n·d) per query for n vectors of dimension d and is too slow at scale. Every serious vector database therefore uses an approximate nearest neighbor (ANN) index. The two dominant families are:
- HNSW (Hierarchical Navigable Small World): Graph-based, high recall, fast queries, expensive to build and store
- IVF (Inverted File Index): Cluster-based, faster to build, slightly lower recall at the same memory budget
Understanding this distinction explains most of the performance tradeoffs you'll encounter.
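To make the exact-versus-approximate tradeoff concrete, here is a minimal pure-Python sketch that compares a brute-force scan with an IVF-style search that only probes a few clusters. It is an illustration under simplifying assumptions, not how any of the databases below implement their indexes: the centroids are sampled at random rather than trained with k-means, and all function names are hypothetical.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def exact_top_k(query, vectors, k):
    """Brute force: score every vector, O(n*d) work per query."""
    order = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    return order[:k]

def build_ivf(vectors, centroids):
    """Assign each vector to its most similar centroid (one inverted list per centroid)."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        best = max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
        lists[best].append(i)
    return lists

def ivf_top_k(query, vectors, centroids, lists, k, nprobe):
    """Approximate: only score vectors that live in the nprobe closest clusters."""
    probe = sorted(range(len(centroids)), key=lambda c: cosine(query, centroids[c]), reverse=True)[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    candidates.sort(key=lambda i: cosine(query, vectors[i]), reverse=True)
    return candidates[:k]

# Toy data: 500 random 8-dimensional vectors, 10 randomly chosen centroids (assumption).
random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(8)] for _ in range(500)]
centroids = random.sample(vectors, 10)
lists = build_ivf(vectors, centroids)
query = [random.gauss(0, 1) for _ in range(8)]

exact = exact_top_k(query, vectors, 5)
approx = ivf_top_k(query, vectors, centroids, lists, 5, nprobe=3)
recall = len(set(exact) & set(approx)) / len(exact)
print(f"recall@5 with nprobe=3: {recall:.2f}")
```

Raising `nprobe` scans more clusters, which moves recall toward 1.0 at the cost of more comparisons per query. HNSW exposes the same kind of knob through `ef_search`.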
Option 1: Pinecone
Pinecone is a fully managed, serverless vector database. You don't manage infrastructure—you create an index and call an API.
When to use Pinecone
- Startup or team without dedicated infrastructure engineers
- Need to be in production within a day
- Budget isn't the primary constraint
Key limits to know
Serverless (recommended tier):
- Max vector dimensions: 20,000
- Max metadata per vector: 40 KB
- Max index size: limited by spend, not by a hard ceiling
Pod-based (legacy):
- p1.x1: 1M vectors of dim-768 per pod
- s1.x1: 5M vectors of dim-768 per pod
Upsert and query pattern
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="YOUR_KEY")
# Create index
pc.create_index(
    name="documents",
    dimension=1536,  # text-embedding-3-small
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index("documents")
# Upsert vectors with metadata
# (embedding_1, embedding_2, and query_embedding are placeholders for 1536-dim float lists)
vectors = [
    ("doc-001", embedding_1, {"text": "...", "source": "wiki"}),
    ("doc-002", embedding_2, {"text": "...", "source": "internal"}),
]
index.upsert(vectors=vectors, namespace="v1")
# Query
results = index.query(
    vector=query_embedding,
    top_k=10,
    include_metadata=True,
    filter={"source": {"$eq": "wiki"}},
    namespace="v1",
)
Production gotchas
- Namespace isolation is critical for multi-tenant apps—don't skip it
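- Benchmark highly selective metadata filters separately: Pinecone documents its filtering as single-stage (applied during the search rather than after it), but latency and recall can still shift when only a small fraction of vectors match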
- Batch upserts of 100 vectors perform dramatically better than single-vector upserts
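The batching advice can be sketched as a small helper that splits any iterable of vectors into fixed-size chunks and issues one request per chunk. The helper names `batched` and `upsert_in_batches` are hypothetical, and the upsert call is passed in as a function so the sketch runs without a Pinecone account; with the index from the example above you would pass something like `lambda chunk: index.upsert(vectors=chunk, namespace="v1")`.

```python
from itertools import islice

def batched(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def upsert_in_batches(upsert_fn, vectors, batch_size=100):
    """Call upsert_fn once per batch and return the number of requests made."""
    requests = 0
    for chunk in batched(vectors, batch_size):
        upsert_fn(chunk)
        requests += 1
    return requests

# Demo with a stand-in for index.upsert that just records what it was sent.
sent = []
fake_vectors = [("doc-%03d" % i, [0.0], {"source": "wiki"}) for i in range(250)]
n_requests = upsert_in_batches(sent.append, fake_vectors, batch_size=100)
print(n_requests, [len(chunk) for chunk in sent])
```

The right batch size depends on vector dimension and metadata size, so treat 100 as a starting point rather than a fixed rule.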
Option 2: Weaviate
Weaviate is an open-source vector database with native multi-modal support and a GraphQL API. It can run self-hosted or on Weaviate Cloud.
Architecture overview
Weaviate organizes data into collections (called classes in older versions), which play a role similar to tables. Each collection can have its own vectorizer module (e.g., OpenAI, Cohere, a local model) or accept pre-computed vectors.
import weaviate
from weaviate.classes.config import Configure, Property, DataType
client = weaviate.connect_to_local()
# Create collection
client.collections.create(
    name="Document",
    vectorizer_config=Configure.Vectorizer.none(),  # we'll provide vectors
    properties=[
        Property(name="text", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
        Property(name="created_at", data_type=DataType.DATE),
    ],
)
collection = client.collections.get("Document")
# Batch insert (documents is a placeholder iterable of objects with .text, .source, .embedding)
with collection.batch.dynamic() as batch:
    for doc in documents:
        batch.add_object(
            properties={"text": doc.text, "source": doc.source},
            vector=doc.embedding,
        )
# Hybrid search (BM25 + vector)
# No vectorizer is configured, so the query vector has to be supplied explicitly
results = collection.query.hybrid(
    query="machine learning infrastructure",
    vector=query_embedding,  # placeholder: embedding of the query text
    alpha=0.75,  # 0 = pure BM25, 1 = pure vector
    limit=10,
)
client.close()  # release the connection when done
Why Weaviate stands out
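Hybrid search is first-class. Combining sparse (BM25) and dense (vector) retrieval in a single query often beats pure vector search on knowledge-base workloads, with typical gains in the 5–15% range on NDCG@10. The size of the improvement depends on the corpus and the queries, so measure it on your own data.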
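To build intuition for what the `alpha` parameter does, the sketch below blends two score sets after min-max normalization. This is a simplified illustration of score-based fusion and is not Weaviate's actual implementation; the function names and the sample scores are made up for the example.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_fuse(bm25_scores, vector_scores, alpha=0.75):
    """Blend keyword and vector scores; alpha=0 is pure BM25, alpha=1 is pure vector."""
    bm25 = min_max_normalize(bm25_scores)
    vec = min_max_normalize(vector_scores)
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * bm25.get(doc_id, 0.0)
        for doc_id in set(bm25) | set(vec)
    }
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(fused, key=lambda d: (-fused[d], d))

# Hypothetical scores: "a" only matches keywords, "d" only matches semantically.
bm25_scores = {"a": 12.0, "b": 3.0, "c": 7.5}
vector_scores = {"b": 0.91, "c": 0.88, "d": 0.40}

print(hybrid_fuse(bm25_scores, vector_scores, alpha=0.75))  # ['c', 'b', 'a', 'd']
print(hybrid_fuse(bm25_scores, vector_scores, alpha=0.0))   # ['a', 'c', 'b', 'd']
```

Document "c" ranks first at `alpha=0.75` because it scores reasonably well on both signals, which is the behavior that makes hybrid search attractive for knowledge bases.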
When to use Weaviate
- Need hybrid search out of the box
- Multi-modal data (text + images)
- Want self-hosted with a managed option available
Option 3: Qdrant
Qdrant is a high-performance, open-source vector search engine written in Rust. It's gaining adoption for latency-critical applications.
Payload filtering done right
Qdrant's killer feature is payload-filtered search: filters are applied during ANN traversal, not after. This means filtered queries are nearly as fast as unfiltered ones.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct,
    Filter, FieldCondition, MatchValue,
)
client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
# Upsert (embedding and query_embedding are placeholders for 1536-dim float lists)
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=embedding, payload={"source": "wiki", "year": 2024}),
    ],
)
# Filtered search: filter applied DURING HNSW traversal
# Note: recent qdrant-client releases deprecate search() in favor of query_points()
results = client.search(
    collection_name="docs",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="wiki"))]
    ),
    limit=10,
)
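Why the placement of the filter matters can be shown with a brute-force stand-in. The sketch below is not Qdrant's algorithm (its HNSW traversal is considerably more involved); it only contrasts "rank first, filter afterwards" with "rank only what matches". The function names and the toy scores are assumptions made for the illustration.

```python
def post_filter_search(scored, matches, k, oversample=1):
    """Take the top k*oversample candidates first, then drop the ones that fail the filter."""
    candidates = sorted(scored, key=scored.get, reverse=True)[: k * oversample]
    return [doc_id for doc_id in candidates if matches(doc_id)][:k]

def filter_aware_search(scored, matches, k):
    """Rank only the points that pass the filter, so k results come back whenever k exist."""
    eligible = [doc_id for doc_id in scored if matches(doc_id)]
    return sorted(eligible, key=scored.get, reverse=True)[:k]

# 1,000 points whose similarity falls with the id; only 2% of them pass the filter.
scored = {i: 1 - i / 1000 for i in range(1000)}
is_wiki = lambda doc_id: doc_id % 50 == 49

post = post_filter_search(scored, is_wiki, k=10)
post_oversampled = post_filter_search(scored, is_wiki, k=10, oversample=10)
aware = filter_aware_search(scored, is_wiki, k=10)
print(len(post), len(post_oversampled), len(aware))  # 0 2 10
```

With a 2% filter, post-filtering returns nothing at `k=10` and only two results even with 10x oversampling, while the filter-aware variant returns the full ten.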
When to use Qdrant
- Highly selective metadata filters on large collections
- Latency-critical workloads (p99 under 20ms)
- On-prem or air-gapped environments
Option 4: pgvector
pgvector is a PostgreSQL extension. If you're already on Postgres, it's the lowest-friction path to vector search.
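-- Enable the extension (it must already be installed on the server)
CREATE EXTENSION IF NOT EXISTS vector;
-- Create table
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT,
    source TEXT,
    embedding VECTOR(1536)
);
-- HNSW index (faster queries, more memory)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
-- Query: <=> is cosine distance, so 1 - distance gives cosine similarity
-- Note: the WHERE clause is applied after the index scan, so a selective filter can return fewer than 10 rows
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE source = 'wiki'
ORDER BY embedding <=> $1::vector
LIMIT 10;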
The transactional advantage
pgvector shines when you need ACID semantics across vector and relational data. Updating a document and its embedding atomically is trivial in Postgres. In a dedicated vector DB, you'd need to coordinate two separate writes.
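The atomic-update pattern looks roughly like the sketch below. To keep it runnable anywhere, it uses Python's built-in `sqlite3` as a stand-in for Postgres and stores the embedding as JSON text; both are assumptions for illustration only. With a real Postgres driver the placeholders and the vector column type would differ, but the transaction shape is the same.

```python
import json
import sqlite3

def update_document(conn, doc_id, content, embedding):
    """Update text and embedding in one transaction: both change or neither does."""
    payload = json.dumps(embedding) if embedding is not None else None
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("UPDATE documents SET content = ? WHERE id = ?", (content, doc_id))
        conn.execute("UPDATE documents SET embedding = ? WHERE id = ?", (payload, doc_id))

# Stand-in schema: embedding is NOT NULL, so a missing embedding is rejected.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, content TEXT, embedding TEXT NOT NULL)"
)
conn.execute("INSERT INTO documents VALUES (1, 'old text', '[0.1, 0.2]')")
conn.commit()

update_document(conn, 1, "new text", [0.3, 0.4])  # both columns change together

try:
    update_document(conn, 1, "newer text", None)  # second write fails...
except sqlite3.IntegrityError:
    pass  # ...so the first write in that transaction is rolled back too

row = conn.execute("SELECT content, embedding FROM documents WHERE id = 1").fetchone()
print(row)  # ('new text', '[0.3, 0.4]')
```

The failed call leaves the row exactly as the last successful transaction wrote it, so the text and its embedding never drift apart.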
Performance ceiling
pgvector's HNSW index tops out around 5–10M vectors before query latency degrades noticeably. Beyond that, you're looking at partitioning or a dedicated vector database.
When to use pgvector
- Existing Postgres infrastructure
- Fewer than roughly 5M vectors
- Need transactional consistency between vector and relational data
- Want to minimize operational complexity
Head-to-Head Comparison
| Dimension | Pinecone | Weaviate | Qdrant | pgvector |
|---|---|---|---|---|
| Operational overhead | None (managed) | Low–Medium | Low | None (if on Postgres) |
| Filtered search | Single-stage (during search) | Pre-filter (allow list) | During ANN | Post-ANN |
| Hybrid search | Yes (sparse-dense vectors) | Yes (native) | Yes | Via RRF manually |
| Scale ceiling | Effectively unlimited | ~100M+ | ~100M+ | ~5–10M |
| Latency (p99 @1M vecs) | ~30ms | ~20ms | ~10ms | ~15ms |
| Open source | No | Yes | Yes | Yes |
| Best fit | Managed simplicity | Hybrid search | Filter-heavy workloads | Postgres shops |
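The manual RRF option listed for pgvector in the hybrid-search row refers to reciprocal rank fusion: run a full-text query and a vector query separately, then merge the two ranked id lists. A minimal sketch follows, assuming the commonly used constant k=60; the function name and the sample ids are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank of d in that list)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result ids from a full-text query and a vector query on the same table.
keyword_hits = [3, 1, 7]
vector_hits = [1, 5, 3]

print(reciprocal_rank_fusion([keyword_hits, vector_hits]))  # [1, 3, 5, 7]
```

RRF uses only rank positions, so it needs no score normalization between BM25-style scores and vector distances, which is why it is a popular choice for do-it-yourself hybrid search.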
Production Configuration Checklist
Regardless of which database you choose:
- Set ef_search / nprobe: Higher values improve recall but increase latency. Start at ef_search=100 and tune
- Pre-filter selectivity: If > 50% of vectors match a filter, post-ANN filtering is fine; below 10%, you need Qdrant-style during-traversal filtering
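- Dimension reduction: If using OpenAI's text-embedding-3-large (3072 dims), consider truncating to 1536 or 512 dims for roughly 2x or 6x storage savings, respectively, typically with minimal recall loss; re-normalize truncated vectors, or request the shorter size through the embeddings API's dimensions parameter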
- Monitoring: Track p99 query latency and recall@k over time, since both tend to degrade as the index grows
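Two of the checklist items reduce to a few lines of arithmetic. The sketch below shows one way to truncate and re-normalize an embedding, estimate raw vector storage, and compute recall@k against exact results. All function names are hypothetical, and the storage figure assumes float32 values and ignores index overhead.

```python
import math

def truncate_and_normalize(embedding, dims):
    """Keep the first `dims` components and rescale the result to unit length."""
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

def raw_vector_bytes(n_vectors, dims, bytes_per_value=4):
    """Storage for the vectors alone (float32 by default), excluding index overhead."""
    return n_vectors * dims * bytes_per_value

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k that the ANN index actually returned."""
    truth = set(exact_ids[:k])
    return len(truth & set(approx_ids[:k])) / len(truth) if truth else 0.0

print(truncate_and_normalize([3.0, 4.0, 12.0], 2))                      # [0.6, 0.8]
print(raw_vector_bytes(1_000_000, 3072) / raw_vector_bytes(1_000_000, 512))  # 6.0
print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4))                     # 0.75
```

For monitoring, the exact ids would come from a periodic brute-force search over a sample of real queries, compared with what the production index returned for the same queries.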
Building a RAG system on top of your vector store? Read our guide on RAG Systems at Scale.