Vector databases store high-dimensional embeddings and perform approximate nearest neighbor (ANN) search to find semantically similar items. They power AI-driven features like semantic search, recommendation engines, and retrieval-augmented generation (RAG) by finding items that are close in meaning, not just matching on keywords.
Vector databases are among the fastest-growing categories of data infrastructure, driven by the explosion of AI applications that need to find semantically similar items rather than exact matches. Traditional databases search by exact key match (key-value stores), keyword match (search engines), or structured predicates (relational databases). Vector databases search by meaning -- given a query embedding (a high-dimensional numerical vector representing the semantic content of text, images, audio, or any other data), find the items whose embeddings are closest in vector space. This enables use cases that are impractical with traditional databases: finding documents that mean the same thing as a query even if they use different words, recommending products similar to ones a user has liked, or retrieving relevant context for a large language model (RAG).
The core technical challenge in vector search is the curse of dimensionality: in high-dimensional space (typically 768-1536 dimensions for modern embedding models), exact index structures such as k-d trees lose their pruning power, so exact nearest neighbor search degrades to comparing the query vector against every vector in the database. That brute-force scan costs O(n * d) per query for n vectors of dimension d, which is impractical for large datasets under tight latency budgets. Approximate Nearest Neighbor (ANN) algorithms solve this by building index structures that trade a small amount of accuracy (recall) for dramatically faster search. HNSW (Hierarchical Navigable Small World) builds a multi-layer graph where each vector is connected to its neighbors; search starts from an entry point in the sparse top layer and greedily moves closer to the query vector at each step, descending layer by layer to the dense bottom layer. IVF (Inverted File Index) partitions vectors into clusters using k-means, then searches only the clusters whose centroids are nearest to the query. ScaNN (Scalable Nearest Neighbors) combines partitioning with vector quantization for extreme scale. Each algorithm offers different trade-offs between index build time, memory usage, query latency, and recall (the percentage of true nearest neighbors found).
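The IVF idea and the recall metric described above can be sketched in a few lines of pure Python. This is a minimal illustration, not how any production engine implements it: the function names (`build_ivf`, `ivf_search`, `recall_at_k`), the random data, and the cluster counts are all assumptions chosen for the example.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_search(vectors, query, k):
    """Brute-force baseline: score every vector, O(n * d) per query."""
    order = sorted(range(len(vectors)), key=lambda i: dist2(vectors[i], query))
    return order[:k]


def build_ivf(vectors, n_clusters, iterations=5, seed=0):
    """Partition vectors with a few rounds of plain k-means."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    lists = []
    for _ in range(iterations):
        # Assignment step: each vector joins the list of its nearest centroid.
        lists = [[] for _ in range(n_clusters)]
        for i, v in enumerate(vectors):
            nearest = min(range(n_clusters), key=lambda c: dist2(v, centroids[c]))
            lists[nearest].append(i)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(lists):
            if members:  # keep the old centroid if a cluster went empty
                dim = len(centroids[c])
                centroids[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    return centroids, lists


def ivf_search(vectors, centroids, lists, query, k, nprobe):
    """Scan only the nprobe inverted lists whose centroids are closest to the query."""
    by_distance = sorted(range(len(centroids)), key=lambda c: dist2(query, centroids[c]))
    candidates = [i for c in by_distance[:nprobe] for i in lists[c]]
    candidates.sort(key=lambda i: dist2(vectors[i], query))
    return candidates[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Synthetic data, purely for illustration.
rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
query = [rng.gauss(0, 1) for _ in range(8)]

centroids, lists = build_ivf(data, n_clusters=10)
truth = exact_search(data, query, k=10)
for nprobe in (1, 3, 10):
    found = ivf_search(data, centroids, lists, query, k=10, nprobe=nprobe)
    print(f"nprobe={nprobe}: recall@10 = {recall_at_k(found, truth):.2f}")
```

Probing more clusters scans more candidates, so recall can only stay level or improve as `nprobe` grows; probing every cluster is equivalent to the exact scan. That is the accuracy-for-speed dial that ANN indexes expose.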
The 2026 vector database landscape has stratified into three tiers. Embedded extensions like pgvector bring vector search directly into PostgreSQL, enabling hybrid queries that combine traditional SQL filters with vector similarity in a single transactional database -- ideal for applications with moderate vector workloads (under 10 million vectors) that want to avoid operational complexity. Managed services like Pinecone provide a serverless API optimized purely for vector operations, handling index management, scaling, and updates -- ideal for teams that want to deploy vector search quickly without infrastructure expertise. Purpose-built engines like Milvus and Weaviate are designed from the ground up for billion-scale vector search, offering fine-grained control over index types, quantization, and hybrid search -- ideal for large-scale production deployments where performance tuning is critical.
Hybrid search -- combining keyword (BM25) and vector (ANN) search in a single query -- has become the standard production pattern because neither approach alone is sufficient. Keyword search excels at finding exact matches (product SKUs, error codes, proper nouns) but misses semantic similarity. Vector search captures meaning but can miss important keywords, especially for out-of-distribution queries. The two result lists are merged with a fusion method such as Reciprocal Rank Fusion (RRF), optionally followed by cross-encoder reranking of the merged candidates, providing better retrieval quality than either approach alone. Metadata filtering (pre-filtering vectors by structured attributes before ANN search) is critical for production use cases where you need 'semantically similar items in this product category under $50' rather than just 'semantically similar items.'
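Reciprocal Rank Fusion is simple enough to show in full. The sketch below assumes each retriever returns an ordered list of document IDs; the IDs and the two hit lists are hypothetical, and `k=60` is the constant commonly used for RRF rather than a value taken from this article.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["sku-123", "doc-a", "doc-b"]    # hypothetical keyword results
vector_hits = ["doc-b", "sku-123", "doc-c"]  # hypothetical ANN results

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['sku-123', 'doc-b', 'doc-a', 'doc-c']
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and vector distances, which live on unrelated scales. Documents found by both retrievers rise to the top.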
The Music Recommendation Analogy
Imagine every song in a music library is represented as a point in a multi-dimensional space where the dimensions are tempo, energy, danceability, mood, and genre similarity. Two songs that sound alike will be close together in this space, even if they have different titles, artists, or languages. When you say 'find me songs similar to this one,' the vector database finds the nearest points in this space -- songs that share similar tempo, energy, and mood. It does not match on keywords like song title or lyrics; it matches on the overall vibe of the music. That is the difference between keyword search ('find songs with the word love') and vector search ('find songs that feel like this one').
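To make the analogy concrete, here is a toy version with four made-up songs. The titles, the four features, and their 0-to-1 scaling are all invented for illustration; real embeddings have hundreds of learned dimensions rather than hand-picked ones.

```python
import math

# Hypothetical feature vectors: (tempo, energy, danceability, mood), each scaled to 0..1.
songs = {
    "upbeat_pop_a": (0.80, 0.90, 0.85, 0.90),
    "upbeat_pop_b": (0.78, 0.85, 0.90, 0.85),
    "slow_ballad": (0.30, 0.20, 0.25, 0.40),
    "ambient_piece": (0.20, 0.10, 0.10, 0.50),
}


def most_similar(title, library, k=2):
    """Return the k other songs closest to `title` by Euclidean distance."""
    target = library[title]
    others = [t for t in library if t != title]
    return sorted(others, key=lambda t: math.dist(library[t], target))[:k]


print(most_similar("upbeat_pop_a", songs))  # ['upbeat_pop_b', 'slow_ballad']
```

The titles never enter the comparison; only the positions in feature space do.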
Notion AI
Notion uses pgvector within PostgreSQL to power its AI workspace search. When a user asks a question in natural language, the query is converted to an embedding by an embedding model, and pgvector finds the most semantically relevant pages, databases, and blocks across the user's workspace. By using pgvector inside PostgreSQL, Notion combines vector similarity search with standard SQL filtering (workspace permissions, page access controls) in a single query, avoiding the complexity of a separate vector database.
Spotify
Spotify uses approximate nearest neighbor search (originally its open-source Annoy library, since succeeded by Voyager, its HNSW-based library) to power music and podcast recommendations. Each track and user is represented as an embedding in a shared vector space. Finding similar tracks is a nearest-neighbor query: given a track embedding, find the 50 closest track embeddings. The system processes billions of recommendation queries per day, requiring index structures that fit in memory across distributed serving nodes.
Shopify
Shopify uses Milvus for product similarity search across millions of products in their merchant stores. When a shopper views a product, Milvus finds visually and semantically similar products by querying the product's image and text embeddings. Metadata filtering restricts results to the same merchant's store and available inventory. The system handles burst traffic during flash sales, with Milvus's GPU-accelerated indexing enabling rapid re-indexing as product catalogs change.
| Aspect | Description |
|---|---|
| Recall vs Query Latency | ANN algorithms sacrifice exact results for speed. HNSW's ef_search parameter controls this trade-off: higher values search more of the graph, for example improving recall from roughly 90% to 99% while increasing latency from about 1ms to 10ms (illustrative figures; actual numbers depend on the dataset and hardware). Applications must define their acceptable recall floor -- for recommendations 90% is fine, for legal document retrieval 99%+ may be required. |
| Index Build Time vs Query Performance | HNSW indexes are expensive to build (O(n * log(n)) time, hours for large datasets) but provide fast queries. IVF indexes require expensive k-means clustering during training but enable faster partial-dataset scans. Flat indexes require no build time but O(n) query time. The choice depends on how frequently the index is rebuilt and how large the dataset is. |
| Embedded (pgvector) vs Dedicated Vector Database | pgvector avoids operational complexity by running inside PostgreSQL, providing transactional consistency and SQL-based filtering. However, it is limited in scale (practical ceiling around 10-50 million vectors), lacks GPU acceleration, and has fewer index tuning options. Dedicated vector databases (Pinecone, Milvus) scale to billions of vectors but require a separate service, data synchronization pipeline, and operational expertise. |
| Memory vs Disk Trade-off | HNSW requires all vectors and graph edges in memory, consuming significant RAM (a 1M-vector dataset with 1536 float32 dimensions uses ~6 GB for vectors alone, plus graph overhead). Product Quantization (PQ) compresses vectors by 4-8x or more, depending on the code size, at the cost of lower recall. Disk-based indexes (DiskANN) trade latency for memory savings, keeping only a compact graph in memory while vectors reside on SSD. |
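The memory figure in the last row is straightforward arithmetic, sketched below. The vector size assumes 4-byte float32 components. The graph estimate is a rough assumption (about 2 * M four-byte links per vector on the base layer, with M = 16, ignoring upper layers and allocator overhead), not a figure from any particular engine.

```python
def vector_memory_bytes(n_vectors, dims, bytes_per_component=4):
    """Raw storage for the vectors themselves (float32 by default)."""
    return n_vectors * dims * bytes_per_component


def hnsw_link_bytes(n_vectors, m=16, bytes_per_link=4):
    """Rough graph overhead, assuming ~2*M links per vector on the base layer."""
    return n_vectors * 2 * m * bytes_per_link


raw = vector_memory_bytes(1_000_000, 1536)           # float32 vectors
links = hnsw_link_bytes(1_000_000)                   # assumed graph overhead
int8 = vector_memory_bytes(1_000_000, 1536, bytes_per_component=1)

print(f"float32 vectors: {raw / 1e9:.2f} GB")        # 6.14 GB
print(f"graph links (assumed M=16): {links / 1e9:.2f} GB")
print(f"int8 vectors: {int8 / 1e9:.2f} GB ({raw // int8}x smaller)")
```

Running the same calculation with your own vector count and dimensionality is a quick way to check whether an in-memory index fits the hardware before choosing between HNSW, quantization, and a disk-based index.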
Notion AI -- pgvector for Workspace Semantic Search
Scenario
Notion needed to add AI-powered search across users' workspaces -- pages, databases, comments, and embedded content. Users expected to ask natural language questions ('what was the Q4 revenue target?') and get relevant results even when the exact keywords did not appear on any page. The system needed to respect workspace permissions and integrate with Notion's existing PostgreSQL infrastructure. Building and operating a separate vector database was an undesirable increase in infrastructure complexity.
Solution
Notion adopted pgvector as an extension within their existing PostgreSQL deployment. Each page and block is embedded by an embedding model at write time, and the embedding is stored in a vector column alongside the content. Search queries are embedded on the fly, and pgvector's HNSW index finds the nearest pages in vector space. Crucially, the query combines vector similarity with standard SQL WHERE clauses for workspace permissions, page sharing settings, and content type filters -- something that, with a standalone vector database, would require either syncing permission data into the vector store as metadata or running a separate filtering step.
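A combined query of the kind described above might look like the following sketch. The table and column names (`blocks`, `block_permissions`, `embedding`, `workspace_id`) are hypothetical and are not Notion's schema. The parts that come from pgvector itself are the `<=>` cosine-distance operator, the `::vector` cast, and the bracketed text format for vector values. The code only builds the SQL and its parameters; it assumes you would pass them to a PostgreSQL driver such as psycopg.

```python
def to_vector_literal(embedding):
    """Format a sequence of numbers in pgvector's text form, e.g. '[0.1,0.2,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Hypothetical schema: blocks(id, title, workspace_id, embedding vector(1536))
# and block_permissions(block_id, user_id).
SEARCH_SQL = """
SELECT id, title
FROM blocks
WHERE workspace_id = %s
  AND id IN (SELECT block_id FROM block_permissions WHERE user_id = %s)
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def build_search(workspace_id, user_id, query_embedding, limit=10):
    """Return (sql, params) for a permission-filtered similarity search."""
    params = (workspace_id, user_id, to_vector_literal(query_embedding), limit)
    return SEARCH_SQL, params


sql, params = build_search("ws-1", "user-7", [0.1, 0.2, 1], limit=5)
print(params[2])  # [0.1,0.2,1.0]
# With a driver such as psycopg: cursor.execute(sql, params)
```

One caveat worth checking against the pgvector documentation for your version: with an approximate index, filter conditions are applied after the index scan, so a very restrictive WHERE clause can return fewer rows than the LIMIT asks for.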
Outcome
Notion AI search achieved sub-200ms latency for semantic queries across workspaces containing millions of pages. Using pgvector eliminated the need for a separate vector database service, reducing operational complexity and keeping the data model within a single transactional system. Permission enforcement through SQL WHERE clauses guaranteed that users never see results from pages they do not have access to, a critical security requirement that is harder to implement correctly with external vector databases.
1. What problem does Approximate Nearest Neighbor (ANN) search solve that exact nearest neighbor search cannot?
2. Why is hybrid search (combining keyword + vector search) recommended for production RAG systems?
3. What is a key advantage of using pgvector inside PostgreSQL compared to a dedicated vector database?