The dominant marketing position in 2022 was that vector search would replace keyword search. Lexical retrieval was an artefact of the pre-LLM era. Embeddings captured meaning; BM25 captured surface form; meaning won. By 2024, every serious retrieval system on earth was hybrid. Pure vector lost. It’s worth understanding why, because the same mistake gets made every few years at a different layer of the stack.
Credit where due. Semantic search works for the case it was designed for: paraphrased natural-language queries against natural-language documents. “How do I cancel my subscription” finds a help article titled “Terminating your membership” without anyone hand-curating a synonym dictionary. That was a real step change. The decade of work in lexical retrieval before that had been incremental: better stemmers, better stopword lists, better tokenisers, smarter BM25 parameters. None of it solved the synonym problem. Embeddings did, almost as a side effect.
So when the dedicated vector databases shipped in 2021–22 and the demos were genuinely impressive, the enthusiasm wasn’t silly. The pitch was “semantic search is here, the old way is dead.” First half true. Second half wasn’t.
Pure vector search is bad at exact matching. Product codes. Error codes. Account numbers. Version strings. Anything where the surface form is the meaning. Search for ORA-00942 and a pure vector system happily hands you back results about other Oracle errors that share a “topic” but not the specific code you needed. The compression argument applies: an embedding is a lossy summary of the text, the summary has thrown away the surface form, and the surface form was what mattered.
It’s also bad at rare terms. If a word appears once in the corpus, its embedding reflects the model’s prior more than the corpus distribution. BM25 handles both cases trivially because it’s an inverted index over actual terms. Exact lookups hit. Rare terms get scored higher precisely because they’re rare. The old technology kept working in exactly the places the new technology was weakest.
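To make the rare-term point concrete, here is a minimal sketch of textbook BM25 over a toy corpus. The corpus, the whitespace tokeniser, and the parameter values `k1=1.2` and `b=0.75` are illustrative assumptions, not a description of any particular engine.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a textbook BM25 formula."""
    # Naive tokeniser (assumption): lowercase and split on whitespace.
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    avg_len = sum(len(toks) for toks in tokenised) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenised for term in set(toks))

    scores = []
    for toks in tokenised:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no exact term match, no contribution
            # Rarer terms (low document frequency) get a larger IDF weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus, invented for illustration.
docs = [
    "ORA-00942 table or view does not exist",
    "ORA-01017 invalid username or password",
    "ORA-12154 could not resolve the connect identifier",
]

exact = bm25_scores("ORA-00942", docs)
common = bm25_scores("or", docs)
print(exact)                 # only the first document scores above zero
print(exact[0] > common[0])  # the rare code outweighs the common word "or"
```

The exact code matches one document and nothing else, and because it appears in only one document its IDF weight is higher than that of a word shared across several.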
And the new technology was weak in places nobody bothered to advertise. Out-of-domain queries (where the vocabulary and style of the query differ from what the embedding model was trained on) degrade badly. Multilingual mismatch is a quiet killer. Domain-specific jargon embeds poorly until you fine-tune. Numerical queries (“documents from after 2020 mentioning Q3 revenue”) are not what embeddings are for at all.
The pattern that won is conceptually simple. Run BM25 and vector search in parallel. Each produces a ranked list. Then combine them.
The most common combination method is reciprocal rank fusion, which is so simple you can write it on a napkin:
    def rrf(rankings, k=60):
        """
        rankings: list of ranked lists, each a list of doc_ids
        k: smoothing constant (60 is the conventional default)
        """
        scores = {}
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] = scores.get(doc_id, 0) + 1.0 / (k + rank)
        return sorted(scores.items(), key=lambda x: -x[1])
That’s the whole algorithm. No tuning, no normalisation across incompatible score ranges, no learned weights. RRF works embarrassingly well as a baseline. Most teams who go “we need something more sophisticated” later quietly come back to RRF after their learned fusion model becomes a maintenance burden.
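As a usage sketch, the snippet below runs two retrievers in parallel and fuses their output with the same RRF function, repeated here so the example is self-contained. `bm25_search`, `vector_search`, and `hybrid_search` are hypothetical stand-ins that return canned results; in a real system they would call your lexical and vector indexes.

```python
from concurrent.futures import ThreadPoolExecutor

def rrf(rankings, k=60):
    """Reciprocal rank fusion over a list of ranked doc_id lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda x: -x[1])

# Hypothetical retrievers with canned results, for illustration only.
def bm25_search(query):
    return ["doc_a", "doc_b", "doc_c"]

def vector_search(query):
    return ["doc_c", "doc_a", "doc_d"]

def hybrid_search(query, retrievers, k=60):
    """Run every retriever concurrently, then fuse the ranked lists."""
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        # pool.map preserves the order of the retrievers list.
        rankings = list(pool.map(lambda retrieve: retrieve(query), retrievers))
    return rrf(rankings, k=k)

fused = hybrid_search("cancel subscription", [bm25_search, vector_search])
for doc_id, score in fused:
    print(doc_id, round(score, 4))
# order: doc_a, doc_c, doc_b, doc_d
```

Documents that appear in both lists rise to the top, even though neither retriever's raw scores were ever compared directly. Only ranks are used, which is why no score normalisation is needed.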
The more sophisticated alternative is to pass both candidate sets to a cross-encoder reranker, which reads each (query, document) pair together and scores its relevance directly, instead of comparing two independently computed embeddings. It usually ranks better than RRF, but it is more expensive at query time because the model runs once per candidate.
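The shape of that reranking step can be sketched without committing to a model. In the snippet below, `rerank` and `toy_score` are hypothetical names: `toy_score` is a crude word-overlap placeholder standing in for a real cross-encoder, which would be a trained model scoring each (query, document) pair. Only the control flow is the point.

```python
def rerank(query, candidates, score_fn, top_n=10):
    """Re-score (doc_id, text) candidates pairwise and keep the best top_n."""
    scored = [(doc_id, score_fn(query, text)) for doc_id, text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

def toy_score(query, text):
    """Placeholder scorer (assumption): fraction of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)

# The merged, deduplicated candidates from both retrievers (invented data).
candidates = [
    ("doc_a", "Terminating your membership"),
    ("doc_b", "How to cancel your subscription today"),
    ("doc_c", "Billing cycles explained"),
]

top = rerank("cancel your subscription", candidates, toy_score, top_n=2)
print(top)  # doc_b first, then doc_a
```

The cost profile is visible in the structure: `score_fn` is called once per candidate, so the candidate list is normally kept short before this step.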
Every credible search system now does some version of hybrid. The list, briefly:
- Elasticsearch: the retriever abstraction lets you compose BM25 and vector queries in one request.
- Postgres: vectors via pgvector, lexical via tsvector, combined in one SQL query.
- Snowflake: VECTOR_COSINE_SIMILARITY plus Cortex Search for hybrid.

The engines that started lexical and bolted on vectors (Elasticsearch, Postgres) carry decades of lexical retrieval maturity. The engines that started pure vector and added BM25 (Pinecone, Weaviate) are still catching up on the lexical side. If you have to choose today, bias toward an engine that takes both retrieval modes seriously.
Why did the “keyword search is dead” story last as long as it did? Two reasons. The first is that it is a punchier pitch than “we blend two retrieval techniques.” A new category always needs a foil. The second is more uncomfortable: the vendors who built pure vector products had a commercial interest in not telling you their product alone was insufficient. By 2023, the customers who’d shipped pure vector were quietly bolting BM25 back on. The vendors quietly added hybrid modes. The narrative caught up to the engineering about eighteen months later.
This pattern repeats itself. Pure neural beats statistical NLP, until it doesn’t and hybrid wins. Pure ML beats hand-engineered features, until hybrid wins. Pure agentic systems beat handcrafted pipelines, until hybrid wins. The old methods are old because they work. Combining the new with the old is almost always better than replacing one with the other.
If you’re starting a retrieval system today, the defensible default is:
That recipe will outperform any pure-vector setup on any non-toy corpus I’ve worked with. It’s also cheaper to operate, because BM25 is essentially free compared to dense retrieval.
Pure vector lost. Long live the boring old inverted index, doing the work nobody wants to credit it for.
Next up – the reranker, the bit of the retrieval pipeline nobody talks about that’s doing more of the work than everything else combined.