Kappa Architecture

The Kappa Architecture has emerged as a compelling alternative to the more complex Lambda Architecture. By treating all data as continuous streams and using a unified processing pipeline, it simplifies…

Apache Airflow

Bit of a departure from the usual data-related posts, but I think this one is important. Since its inception at Airbnb in 2014, Apache Airflow has become a cornerstone…

Data Vault 2.0

While dimensional modeling has dominated data warehousing for decades, another approach has been quietly gaining traction in enterprises dealing with complex, rapidly changing data landscapes. Data Vault 2.0 represents a…

Data Fabric

My personal experience with Data Fabric has been somewhat limited, as no organisation I’ve worked at has embraced this pattern. Speaking to my peers, I do see it being adopted…

Which DataFrame?

In modern data architecture, clarifying the distinction between file formats (like Avro, Parquet) and table formats (like Iceberg, Delta Lake), as well as interoperability layers (XTable), is critical. Your choice…

Anchor Modeling

Ever heard of this? Well, if you’re reading this site then you probably have. Anchor Modeling is an entity-centric, normalized data modeling technique built to handle change over time in…

Model Context Protocol

Generative AI has opened up a whole new world of opportunity. It is now foundational for business strategy, and adoption has found its way into many homes via OpenAI…

Normal Forms

Normalization (and its inverse, denormalization) is one of the foundational concepts in relational database design. It’s a systematic process of organizing data to reduce redundancy, eliminate anomalies, and ensure data…