Category: Research

The Future of Data Engineering

Is the answer AI? Ummm… not yet (correct at time of writing). Data engineering today looks remarkably different from five years ago. The role that emerged to build Hadoop clusters and write MapReduce jobs has evolved into something unrecognizable from its…

Compression Codecs

Memory is cheap, but it ain’t free. In the world of modern data engineering, compression is everywhere. It’s in your Parquet files, your Kafka messages, your database storage engine, your .txt file and your API responses. Yet despite its ubiquity,…

Gap and Island Problems

Writing this blog reminded me that I need a holiday. Anyway, there’s a class of data problem that shows up all over the place: On the surface, these seem like different problems. But they’re all the same problem in disguise:…

What are B-Trees?

When storing and retrieving large volumes of ordered data, the goal is to keep search, insert, and delete operations efficient while the structure stays balanced. B‑trees (the “B” is often read as “balanced”) do exactly this: maintain sorted keys, keep all leaves at the same depth, and…

Spatial Databases

Spatial databases are designed to store, query, and manipulate data that represents objects in space — from cities and roads to oceans and underground utility lines. Unlike traditional databases that handle purely numeric or text data, spatial databases deal with…

From Tape to Terabytes

We don’t really think about where the data is physically stored. In the digital world, storage is the unsung hero. Data scientists, AI models and analytics tools often take centre stage, but none of them work without a place to…

Hierarchical Navigable Small World Graphs

When I first came across this algorithm I was intrigued. What did it mean? Working/playing with vector databases like Milvus, Pinecone, Weaviate, or Vespa, I saw HNSW mentioned in the index settings. I discovered this algorithm, Hierarchical Navigable Small World…

Quantum & Data Security – Oh Sh#t!

The encryption protecting your data today relies on mathematical problems that are practically impossible for classical computers to solve. Factoring large numbers, computing discrete logarithms, and similar hard problems form the foundation of RSA, ECC, and Diffie-Hellman, the cryptographic algorithms…

David Hilbert’s Curve

Another stroll back in time here to talk about the way we order data and how it affects how efficiently we can store, retrieve, and analyze it. Sorting a single-dimensional list is straightforward, but real-world data is often multi-dimensional –…

Vector Databases & Triple Stores

The AI boom has brought a new class of databases into the spotlight: vector databases. But in the background, triple stores have been quietly powering knowledge graphs and the Semantic Web for over two decades. At one of my previous…