Hierarchical Navigable Small World Graphs

When I first came across this algorithm I was intrigued. What did it mean? Working/playing with vector database like Milvus, Pinecone, Weaviate, or Vespa, I saw HNSW mentioned in the index settings.

I discovered this algorithm, Hierarchical Navigable Small World graphs, is the reason modern AI search can find semantically similar items in billions of vectors within a few milliseconds.

In this post, we’ll break down:

The problem HNSW solves
How it works under the hood
Why it’s so fast
Practical considerations in vector databases

The Problem: Nearest Neighbor Search at Scale

Vector search is about finding the nearest neighbors to a query vector in high-dimensional space.

Naïve approach:

Compare the query to all N vectors in your dataset
Compute a similarity metric (cosine, dot product, Euclidean etc)
Return the top-k
Complexity: It’s impossible for billions of vectors

Need an index structure that reduces comparisons while keeping accuracy high → Approximate Nearest Neighbor (ANN).

HNSW: A Graph-Based ANN Structure

HNSW builds a multi-layer graph where:

Nodes = data points (vectors)
Edges = connections to “neighbor” nodes in vector space

Key idea:
Instead of exhaustively checking every vector, walk the graph from a high-level overview down to fine-grained neighborhoods.

The Hierarchical Structure

HNSW organizes data into layers:

Top layers: Sparse graph, long-range connections, few nodes
Bottom layer: Dense graph, local neighborhood connections

When searching:

Start at the top layer at a random entry point
Greedily move to the neighbor that is closest to the query
Drop down a layer and repeat until reaching layer 0
At layer 0, do a localized search in a candidate set to refine results

This navigable small-world property means you jump across large areas quickly, then zoom in.

Building the Graph

When inserting a vector:

Assign it a random maximum layer (probabilistic skip-list style)
Insert it into each layer up to its max
Connect it to M nearest existing nodes in that layer (parameter M)
Maintain bidirectional edges for navigability

Parameters:

M: number of bi-directional connections per node (graph connectivity)
efConstruction: size of candidate list during insertion (higher = more accurate but slower build)

Searching the Graph

Parameters:

efSearch: size of candidate list during search (higher = better recall, slower)

Steps:

Start at top layer with entry point
Greedy search for the closest node in that layer
Move down one layer, use previous closest node as new entry
At bottom layer (layer 0), explore neighbors until efSearch candidates are evaluated
Return top-k nearest

Time Complexity:

In practice: hops
Each hop only checks a small fraction of points

Why It’s Fast

Logarithmic traversal: Layers reduce search space exponentially
Small-world property: Long links allow quick “teleports” across vector space
Local refinement: Only a small set of points are compared in detail
Cache efficiency: Neighbor lists fit in CPU cache, minimizing memory latency

Example in Practice

# Milvus index config:
index_params = { 
  "index_type": "HNSW", 
  "metric_type": "cosine", 
  "params": {"M": 16, "efConstruction": 200}
} 
collection.create_index("embedding", index_params)

# Search config
search_params = {
  "metric_type": "cosine", 
  "params": {"ef": 64}
} 
results = collection.search([query_vec], "embedding", search_params, limit=10)

# Milvus index config:
index_params = { 
  "index_type": "HNSW", 
  "metric_type": "cosine", 
  "params": {"M": 16, "efConstruction": 200}
} 
collection.create_index("embedding", index_params)

# Search config
search_params = {
  "metric_type": "cosine", 
  "params": {"ef": 64}
} 
results = collection.search([query_vec], "embedding", search_params, limit=10)

M=16: 16 connections per node
efConstruction=200: Higher build-time accuracy
ef=64: Good balance between recall and latency in search

Trade-Offs

Memory usage: HNSW keeps all vectors + graph connections in RAM
Build time: Higher accuracy means slower indexing
Updates: Insertion is possible but frequent deletes can be costly (graph rebalancing)
Dimensionality curse: Still suffers at extreme dimensions (>1,000) unless vectors are compressed (e.g., PQ, OPQ)

Beyond HNSW

While HNSW dominates in open-source vector DBs, alternatives exist:

IVF-PQ (Inverted File + Product Quantization) – great for disk-based large-scale
ScaNN – optimized for TPU/GPU
Annoy – Spotify’s tree-based approach, low memory

Many production systems use hybrid strategies — e.g., IVF for partitioning + HNSW for local search.

Closing Thoughts

HNSW is the workhorse behind modern semantic search — powering recommendations, RAG pipelines, and image search across billions of vectors.
It’s not perfect (memory-heavy, CPU-bound), but for in-memory high-recall search, it’s unmatched.

The next wave will likely be disk-backed hybrid HNSW + compression, making it feasible to handle trillions of vectors without sacrificing latency.

Hierarchical Navigable Small World Graphs