Vector Databases (pgvector, Pinecone, Milvus)

Vector databases store high-dimensional embeddings and perform approximate nearest neighbor (ANN) search to find semantically similar items. They power AI-driven features like semantic search, recommendation engines, and retrieval-augmented generation (RAG) by finding items that are close in meaning, not just matching on keywords.

Overview

Vector databases are the fastest-growing category of data infrastructure, driven by the explosion of AI applications that need to find semantically similar items rather than exact matches. Traditional databases search by exact key match (key-value stores), keyword match (search engines), or structured predicates (relational databases). Vector databases search by meaning -- given a query embedding (a high-dimensional numerical vector representing the semantic content of text, images, audio, or any other data), find the items whose embeddings are closest in vector space. This enables use cases that are impossible with traditional databases: finding documents that mean the same thing as a query even if they use different words, recommending products similar to ones a user has liked, or retrieving relevant context for a large language model (RAG).

The core technical challenge in vector search is scale in high-dimensional space. With typical embedding sizes of 768-1536 dimensions, the curse of dimensionality makes classic space-partitioning indexes (such as k-d trees) ineffective, so exact nearest neighbor search falls back to comparing the query vector against every vector in the database. That costs O(n) distance computations per query and is impractical for large datasets. Approximate Nearest Neighbor (ANN) algorithms solve this by building index structures that trade a small amount of accuracy (recall) for dramatically faster search. HNSW (Hierarchical Navigable Small World) builds a multi-layer graph where each vector is connected to its neighbors; search starts from an entry point in the sparse top layer and greedily moves closer to the query vector at each step, descending a layer whenever it can no longer improve. IVF (Inverted File Index) partitions vectors into clusters using k-means, then searches only the clusters whose centroids are nearest to the query. ScaNN (Scalable Nearest Neighbors) combines partitioning with vector quantization for extreme scale. Each algorithm offers different trade-offs between index build time, memory usage, query latency, and recall (the percentage of true nearest neighbors found).
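To make the exact-versus-approximate trade-off concrete, here is a minimal pure-Python sketch of brute-force search next to a toy IVF index. It is an illustration under stated assumptions, not how any particular engine implements IVF: the data is random, the k-means is deliberately simple, and all function names are hypothetical.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_search(vectors, query, k):
    """Brute force: compare the query against every vector (O(n) distances)."""
    order = sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))
    return order[:k]


def train_centroids(vectors, n_clusters, iterations=10, seed=0):
    """A few rounds of plain k-means (the IVF 'training' step)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(iterations):
        buckets = [[] for _ in range(n_clusters)]
        for v in vectors:
            nearest = min(range(n_clusters), key=lambda c: l2(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster went empty
                dim = len(members[0])
                centroids[c] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids


def build_ivf(vectors, centroids):
    """Inverted lists: centroid index -> ids of the vectors assigned to it."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(i)
    return lists


def ivf_search(vectors, centroids, lists, query, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: l2(centroids[c], query))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    candidates.sort(key=lambda i: l2(vectors[i], query))
    return candidates[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
query = [rng.gauss(0, 1) for _ in range(8)]

centroids = train_centroids(vectors, n_clusters=10)
lists = build_ivf(vectors, centroids)
truth = exact_search(vectors, query, k=10)

for nprobe in (1, 3, 10):
    found = ivf_search(vectors, centroids, lists, query, k=10, nprobe=nprobe)
    print(nprobe, recall_at_k(found, truth))
```

Probing all 10 clusters scans every vector, so recall is 1.0 at the cost of a full scan. Smaller nprobe values scan fewer vectors and may miss true neighbors that fell into an unprobed cluster.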

The 2026 vector database landscape has stratified into three tiers. Embedded extensions like pgvector bring vector search directly into PostgreSQL, enabling hybrid queries that combine traditional SQL filters with vector similarity in a single transactional database -- ideal for applications with moderate vector workloads (under 10 million vectors) that want to avoid operational complexity. Managed services like Pinecone provide a serverless API optimized purely for vector operations, handling index management, scaling, and updates -- ideal for teams that want to deploy vector search quickly without infrastructure expertise. Purpose-built engines like Milvus and Weaviate are designed from the ground up for billion-scale vector search, offering fine-grained control over index types, quantization, and hybrid search -- ideal for large-scale production deployments where performance tuning is critical.

Hybrid search -- combining keyword (BM25) and vector (ANN) search in a single query -- has become the standard production pattern because neither approach alone is sufficient. Keyword search excels at finding exact matches (product SKUs, error codes, proper nouns) but misses semantic similarity. Vector search captures meaning but can miss important keywords, especially for out-of-distribution queries. Reciprocal Rank Fusion (RRF) or cross-encoder reranking merge results from both approaches, providing better retrieval quality than either alone. Metadata filtering (pre-filtering vectors by structured attributes before ANN search) is critical for production use cases where you need 'semantically similar items in this product category under $50' rather than just 'semantically similar items.'
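Reciprocal Rank Fusion is short enough to sketch directly. The version below assumes the common formulation in which each document scores the sum of 1 / (k + rank) across the lists it appears in, with k = 60 as a conventional default. The document ids and the function name are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Only ranks are used, so the raw BM25 and
    vector-similarity scores never need to be put on a common scale.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["sku-123", "doc-a", "doc-b"]    # hypothetical keyword results
vector_hits = ["doc-a", "doc-c", "sku-123"]  # hypothetical ANN results

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['doc-a', 'sku-123', 'doc-c', 'doc-b']
```

Documents that appear in both lists (doc-a and sku-123) rise above documents that rank well in only one.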

Key Points

1. Embeddings are fixed-length numerical vectors (typically 768-1536 dimensions) that represent the semantic meaning of data. Items that are similar in meaning have embeddings that are close in vector space, measured by cosine similarity, Euclidean distance (L2), or dot product.
2. HNSW (Hierarchical Navigable Small World) is the most popular ANN index, providing high recall (typically 95-99% when tuned) with roughly logarithmic query time. It builds a multi-layer navigable graph where upper layers provide fast coarse navigation and lower layers provide precise local search. Trade-off: high memory usage (stores all vectors + graph edges in memory).
3. IVF (Inverted File Index) partitions vectors into clusters and searches only the nearest clusters (the nprobe parameter controls how many). It typically uses less memory than HNSW because it stores no graph edges, and some implementations keep only the centroids in memory with the vectors on disk, but it requires an expensive upfront training step to build cluster centroids.
4. Hybrid search combining keyword (BM25) and vector (ANN) retrieval outperforms either approach alone. Reciprocal Rank Fusion (RRF) merges ranked lists from both methods without requiring score calibration. This is the standard pattern for production RAG systems.
5. pgvector brings vector search into PostgreSQL, enabling single-query hybrid operations like 'SELECT * FROM products WHERE category = $1 ORDER BY embedding <=> $2 LIMIT 10' -- combining relational filters with vector similarity in one transactional database.
6. Metadata filtering (pre-filtering vectors by structured attributes before ANN search) is essential for production use cases. Without it, ANN search returns globally similar items rather than items matching business constraints like category, price range, or availability.
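The three similarity measures named in the first key point behave differently when vectors have different lengths. The sketch below uses tiny hand-picked 2-dimensional vectors (purely illustrative; real embeddings have hundreds of dimensions) to show that cosine similarity ignores magnitude while L2 distance and dot product do not.

```python
import math


def dot(a, b):
    """Dot product: grows with both alignment and vector length."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance: 0 means identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between the vectors: 1 means same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


a = [3.0, 4.0]
b = [6.0, 8.0]   # same direction as a, twice the length
c = [4.0, -3.0]  # perpendicular to a

print(cosine_similarity(a, b), l2_distance(a, b), dot(a, b))
print(cosine_similarity(a, c), l2_distance(a, c), dot(a, c))

# For unit-length vectors the measures agree on ranking, because
# ||u - w||^2 = 2 - 2 * cos(u, w).
u, w = normalize(a), normalize(c)
print(l2_distance(u, w) ** 2, 2 - 2 * cosine_similarity(u, w))
```

This is why many pipelines normalize embeddings at write time: once every vector has unit length, the choice among the three metrics no longer changes the ranking.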

Simple Example

The Music Recommendation Analogy

Imagine every song in a music library is represented as a point in a multi-dimensional space where the dimensions are tempo, energy, danceability, mood, and genre similarity. Two songs that sound alike will be close together in this space, even if they have different titles, artists, or languages. When you say 'find me songs similar to this one,' the vector database finds the nearest points in this space -- songs that share similar tempo, energy, and mood. It does not match on keywords like song title or lyrics; it matches on the overall vibe of the music. That is the difference between keyword search ('find songs with the word love') and vector search ('find songs that feel like this one').

Real-World Examples

Notion AI

Notion uses pgvector within PostgreSQL to power its AI workspace search. When a user asks a question in natural language, the query is converted to an embedding using an embedding model, and pgvector finds the most semantically relevant pages, databases, and blocks across the user's workspace. By using pgvector inside PostgreSQL, Notion combines vector similarity search with standard SQL filtering (workspace permissions, page access controls) in a single query, avoiding the complexity of a separate vector database.

Spotify

Spotify uses approximate nearest neighbor search (their custom Annoy library, now complemented by ScaNN) to power music and podcast recommendations. Each track and user is represented as an embedding in a shared vector space. Finding similar tracks is a nearest-neighbor query: given a track embedding, find the 50 closest track embeddings. The system processes billions of recommendation queries per day, requiring index structures that fit in memory across distributed serving nodes.

Shopify

Shopify uses Milvus for product similarity search across millions of products in their merchant stores. When a shopper views a product, Milvus finds visually and semantically similar products by querying the product's image and text embeddings. Metadata filtering restricts results to the same merchant's store and available inventory. The system handles burst traffic during flash sales, with Milvus's GPU-accelerated indexing enabling rapid re-indexing as product catalogs change.

Trade-Offs

Aspect	Description
Recall vs Query Latency	ANN algorithms sacrifice exact results for speed. HNSW's ef_search parameter controls this trade-off: higher values explore more of the graph, which might raise recall from about 90% to 99% while increasing latency from roughly 1 ms to 10 ms (illustrative figures; actual numbers depend on the dataset and hardware). Applications must define their acceptable recall floor: for recommendations 90% may be fine, while legal document retrieval may require 99% or more.
Index Build Time vs Query Performance	HNSW indexes are expensive to build (O(n * log(n)) time, hours for large datasets) but provide fast queries. IVF indexes require expensive k-means clustering during training but enable faster partial-dataset scans. Flat indexes require no build time but O(n) query time. The choice depends on how frequently the index is rebuilt and how large the dataset is.
Embedded (pgvector) vs Dedicated Vector Database	pgvector avoids operational complexity by running inside PostgreSQL, providing transactional consistency and SQL-based filtering. However, it is limited in scale (practical ceiling around 10-50 million vectors), lacks GPU acceleration, and has fewer index tuning options. Dedicated vector databases (Pinecone, Milvus) scale to billions of vectors but require a separate service, data synchronization pipeline, and operational expertise.
Memory vs Disk Trade-off	HNSW requires all vectors and graph edges in memory, consuming significant RAM (a 1M-vector dataset with 1536 dimensions uses ~6 GB for vectors alone, plus graph overhead). Product Quantization (PQ) compresses vectors by 4-8x at the cost of lower recall. Disk-based indexes (DiskANN) trade latency for memory savings, keeping only a compact graph in memory while vectors reside on SSD.
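The memory figures in the table can be checked with back-of-the-envelope arithmetic. The sketch below assumes float32 storage (4 bytes per value), 4-byte neighbor ids, an HNSW setting of M = 16 with up to 2*M links per vector on the bottom layer, and a hypothetical PQ configuration. Real engines add their own overhead, so treat the results as rough estimates.

```python
def raw_vector_bytes(n_vectors, dims, bytes_per_value=4):
    """Storage for uncompressed vectors; 4 bytes per value assumes float32."""
    return n_vectors * dims * bytes_per_value


def hnsw_layer0_link_bytes(n_vectors, m=16, bytes_per_link=4):
    """Rough graph overhead, assuming up to 2*m neighbor ids per vector on
    the bottom layer. Upper layers are ignored; real engines differ."""
    return n_vectors * 2 * m * bytes_per_link


def pq_code_bytes(n_vectors, n_subvectors, bits_per_code=8):
    """Product-quantized storage: one small code per sub-vector
    (codebooks excluded)."""
    return n_vectors * n_subvectors * bits_per_code // 8


n, dims = 1_000_000, 1536
raw = raw_vector_bytes(n, dims)
links = hnsw_layer0_link_bytes(n)
int8 = raw_vector_bytes(n, dims, bytes_per_value=1)  # scalar quantization
pq = pq_code_bytes(n, n_subvectors=192)  # hypothetical: 8 dims per sub-vector

print(f"float32 vectors: {raw / 1e9:.2f} GB")
print(f"HNSW links (m=16, layer 0 only): {links / 1e9:.2f} GB")
print(f"int8 vectors: {int8 / 1e9:.2f} GB ({raw / int8:.0f}x smaller)")
print(f"PQ codes: {pq / 1e9:.2f} GB ({raw / pq:.0f}x smaller)")
```

The first line reproduces the roughly 6 GB figure for 1M vectors at 1536 dimensions. The compression ratio depends entirely on the quantization settings chosen, which is why it should be benchmarked against recall on the actual dataset.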

Case Study

Notion AI -- pgvector for Workspace Semantic Search

Scenario

Notion needed to add AI-powered search across users' workspaces -- pages, databases, comments, and embedded content. Users expected to ask natural language questions ('what was the Q4 revenue target?') and get relevant results even when the exact keywords did not appear on any page. The system needed to respect workspace permissions and integrate with Notion's existing PostgreSQL infrastructure. Building and operating a separate vector database would have added unwanted infrastructure complexity.

Solution

Notion adopted pgvector as an extension within their existing PostgreSQL deployment. Each page and block is embedded using an embedding model at write time, and the embedding is stored in a vector column alongside the content. Search queries are embedded on the fly, and pgvector's HNSW index finds the nearest pages in vector space. Crucially, the query combines vector similarity with standard SQL WHERE clauses for workspace permissions, page sharing settings, and content type filters -- something that would require a separate filtering step with a standalone vector database.

Outcome

Notion AI search achieved sub-200ms latency for semantic queries across workspaces containing millions of pages. Using pgvector eliminated the need for a separate vector database service, reducing operational complexity and keeping the data model within a single transactional system. Permission enforcement through SQL WHERE clauses guaranteed that users never see results from pages they do not have access to, a critical security requirement that is harder to implement correctly with external vector databases.

Common Mistakes

⚠ Skipping hybrid search in RAG systems. Pure vector search misses exact keyword matches (product IDs, error codes, proper nouns) that are critical for precision. Always combine vector similarity with keyword (BM25) search using Reciprocal Rank Fusion or a cross-encoder reranker for production retrieval pipelines.
⚠ Choosing too many or too few dimensions for embeddings. Using a 1536-dimension model when 384 dimensions would suffice wastes memory and compute. Using too few dimensions loses semantic information. Benchmark recall on your specific dataset with different embedding models before committing.
⚠ Ignoring index maintenance costs. HNSW supports incremental inserts, but heavy deletes and updates can degrade graph quality over time, and IVF centroids go stale as the data distribution drifts away from the one they were trained on. Plan for periodic re-indexing, measure recall after large data changes, and prefer indexes designed for streaming updates if your dataset changes frequently.
⚠ Not filtering by metadata before vector search. Without metadata pre-filtering, ANN search returns globally similar items that may violate business constraints (wrong category, out of stock, wrong language). Always apply structured filters to reduce the candidate set before or during the ANN search.
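The last mistake is easy to demonstrate. The sketch below compares filtering after a top-k search with filtering before it, using brute-force distance in place of a real ANN index. The tiny catalog and all names are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


catalog = [
    {"id": "h1", "category": "hats", "in_stock": True, "embedding": [0.99, 0.00]},
    {"id": "h2", "category": "hats", "in_stock": True, "embedding": [0.95, 0.05]},
    {"id": "s1", "category": "shoes", "in_stock": True, "embedding": [0.90, 0.10]},
    {"id": "s2", "category": "shoes", "in_stock": True, "embedding": [0.50, 0.50]},
    {"id": "s3", "category": "shoes", "in_stock": False, "embedding": [0.92, 0.02]},
]


def wanted(item):
    """Business constraint: in-stock shoes only."""
    return item["category"] == "shoes" and item["in_stock"]


def post_filter_search(items, query, k, predicate):
    """Take the global top-k first, then drop items that fail the predicate."""
    top = sorted(items, key=lambda it: l2(it["embedding"], query))[:k]
    return [it["id"] for it in top if predicate(it)]


def pre_filter_search(items, query, k, predicate):
    """Restrict to matching items first, then rank the survivors by distance."""
    allowed = [it for it in items if predicate(it)]
    allowed.sort(key=lambda it: l2(it["embedding"], query))
    return [it["id"] for it in allowed[:k]]


query = [1.0, 0.0]
print(post_filter_search(catalog, query, k=2, predicate=wanted))  # []
print(pre_filter_search(catalog, query, k=2, predicate=wanted))   # ['s1', 's2']
```

Post-filtering returns nothing here because the two globally nearest items are both hats. Pre-filtering still returns two valid results, which is the behavior a product page needs.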

Related Concepts

Search Engines, Relational Databases, Latency Numbers Every Engineer Should Know, Horizontal vs Vertical Scaling, Cache-Aside Pattern

Test Your Understanding

1. What problem does Approximate Nearest Neighbor (ANN) search solve that exact nearest neighbor search cannot?

2. Why is hybrid search (combining keyword + vector search) recommended for production RAG systems?

3. What is a key advantage of using pgvector inside PostgreSQL compared to a dedicated vector database?

Deeper Reading