In the vast landscape of data engineering and computational systems, Directed Acyclic Graphs (DAGs) quietly orchestrate the flow of tasks, computations, and dependencies. From compiler optimizations to machine learning pipelines, DAGs are everywhere (including a film with Brad Pitt, for…

Nothing. Ok, I should probably write a bit more than that. Nothing replaces SQL outright. Instead, three complementary layers are emerging around SQL. SQL will likely be with us forever, but it’s interesting to understand what alternatives exist and why.…

Let’s look at some hypothetical examples you may have actually witnessed with your engineering teams. We have one server, but probably need two. Let’s use Kubernetes. We need to count the number of messages. Let’s use Redis. We want to…

Every generation of data tooling has its keystone. In the 1980s, relational databases defined the foundation. In the 2000s, Hadoop represented a seismic shift in scale. Today, in the cloud-first era of the modern data stack, one tool stands out…

For over a decade (I had to check this as it made me feel old) Pandas has been the go-to Python library for data analysis. Its DataFrame API has shaped how millions of analysts, scientists, and engineers work with tabular…

The last five years have seen a number of open data table formats vying for position, including Apache Iceberg, Delta Lake, and Apache Hudi. By mid-2025, the winner is clear: Apache Iceberg. This is not just a technical victory…

The term data product has become ubiquitous in modern data organizations, but its meaning often remains fuzzy. Teams talk about building data products, while creating the same old dashboards, reports, and datasets they’ve always built. Is this new Excel spreadsheet…

DataFrames dominate modern data analysis. If I had a £ for every time I typed… import pandas as pd. Whether it’s Pandas/Polars in Python, Spark DataFrames, or R’s original implementation, the abstraction has become the default for manipulating tabular…

Data contracts are becoming the backbone of modern data architectures. As organisations shift from ad-hoc pipelines to product-oriented data ecosystems, they need guarantees: that data will arrive on time, in the right shape, and with the expected semantics. This is…

Databricks (like Snowflake) doesn’t rely on traditional B-trees, because it’s built on a cloud-native, columnar, distributed file architecture. It avoids B-trees entirely because the cost of maintaining per-row index structures would destroy the scalability benefits of its append-only, distributed Parquet…