Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124

Looking for a specific word, or possibly a phrase in a book then you’d use the books index. What if you wanted to find that same word or phrase in a library full of books. To do that, you need an inverted index. It’s one of the oldest and most important data structures in information retrieval, so fundamental that search engines, full-text databases, and log analytics systems could not exist without it.
An inverted index is deceptively simple: instead of mapping documents to words, it maps words to documents. This inversion makes it possible to answer queries like “find all documents containing the word database” in milliseconds, even across billions of records.
In this post I’ll explore the history of inverted indexes, how they’re implemented, examples of databases that use them, and why they remain a cornerstone of search and analytics.
The idea of inverted indexing dates back to the early 20th century in library science. Librarians maintained “back-of-the-book indexes” and concordances of words, mapping them to page numbers.

The first formal use of inverted indexes in computing emerged in the 1950s–60s, when Hans Peter Luhn at IBM and other pioneers of information retrieval began building “term to document” lists for automated searching. This work directly influenced the development of IR (information retrieval) systems like SMART in the 1960s.
By the 1990s, inverted indexes were powering AltaVista and then Google, scaling from thousands of documents to billions of web pages.
An inverted index flips the normal relationship:
Suppose we have three documents:
Doc1: “data modeling is powerful”
Doc2: “data governance is essential”
Doc3: “governance and modeling”
Forward Index
Doc1 → [data, modeling, is, powerful]
Doc2 → [data, governance, is, essential]
Doc3 → [governance, and, modeling]
Inverted Index
data → [Doc1, Doc2]
modeling → [Doc1, Doc3]
governance → [Doc2, Doc3]
is → [Doc1, Doc2]
powerful → [Doc1]
essential → [Doc2] and → [Doc3]
Now, if we query “governance AND data”, we intersect [Doc1, Doc2] with [Doc2, Doc3], yielding Doc2.
from collections import defaultdict
docs = { 1: "data modeling is powerful", 2: "data governance is essential", 3: "governance and modeling" } <
inverted_index = defaultdict(list)
for doc_id, text in docs.items():
for word in text.split(): inverted_index[word].append(doc_id)
print(dict(inverted_index))This is the essence of search engines—though production systems add tokenization, stemming, stop-word removal, compression, and ranking.
GIN (Generalized Inverted Index) for full-text search.Some vector search systems combine inverted indexes with approximate nearest neighbor techniques to mix keyword and semantic search.
Databases often balance real-time indexing (for logs) vs batch indexing (for web crawls)
The actual history of the inverted index can be traced further back. The inverted index is essentially a digital concordance. Medieval monks compiled concordances of the Bible, mapping every word to its verses. By the 20th century, librarians formalized back-of-book indexes. In computing, the same concept scaled to the web: Larry Page and Sergey Brin’s early PageRank + inverted index made Google vastly faster than competitors like Yahoo’s directory model.
Alternatives include signature files, suffix arrays, and tries, but inverted indexes remain dominant.
CREATE TABLE articles (id serial, content text);
-- Create a GIN index on text
CREATE INDEX content_idx ON articles USING GIN (to_tsvector('english', content));
-- Query for documents containing both 'data' and 'governance'
SELECT * FROM articles WHERE to_tsvector('english', content) @@ to_tsquery('data & governance')Here PostgreSQL uses its inverted index to resolve the query in milliseconds. See PostgreSQL: Documentation: 18: 65.4. GIN Indexes
The inverted index is one of the most enduring and impactful data structures in computer science. From medieval concordances to modern search engines, the idea remains the same: invert the mapping, and suddenly search becomes tractable.
Every time you hit Ctrl+F, query logs in Elasticsearch, or search a product catalog, you’re standing on a century of innovation built on the inverted index.
You must be logged in to post a comment.