Category: Databases

DuckLake

There are some great names in software engineering and this is most definitely one of them. The lakehouse architecture promised to combine the best of data warehouses and data lakes. In practice, it delivered quite a lot of complexity. We…

Keys Are The Key

Not going to win any awards for this blog post title. But keys really are a critical part of a database design and are often misunderstood, ignored and used in the wrong way. Keys are just attributes like everything other…

The 6 Pillars

Not a Wu-Tang Clan song. This is about Data Quality. Every organization claims to want high-quality data, but when pressed to define what that means, the conversation becomes vague. They want “clean data” or “accurate data” or “reliable data” –…

What’s an Inverted Index?

If you’re looking for a specific word, or possibly a phrase, in a book, you’d use the book’s index. What if you wanted to find that same word or phrase in a library full of books? To do that, you need…

Graph Databases

Most databases are designed around storing things. Relationships between these things exist but are treated as secondary concerns, represented through foreign keys and join tables that the database supports. Queries that traverse relationships deeply become unwieldy, requiring multiple self-joins that…

Why Delta Lake

Data lakes promised everything. Store all your data in one place, in any format, ready for any workload. The reality was horrific. Data lakes became data swamps, filled with inconsistent data, failed jobs leaving partial writes, no way to roll…

Data Masking Strategies

Your production database contains millions of customer records with real names, addresses, credit card numbers, social security numbers, and medical histories. Your developers need realistic data to test new features. Your analytics team needs representative datasets to validate models. Your…

Master Data Management

Ask five different systems in your organization for information about customer number 12345, and you’ll likely get five different answers. The CRM shows one address, the order management system shows another, billing has a third, and the data warehouse has…

Centipede Schema

I only recently discovered this pattern for data model design. If you’ve heard of the centipede schema previously, then it’s probably as a warning rather than a recommendation. This rare dimensional modeling pattern represents what happens when the normalization philosophy…

Serverless Databases

The term serverless is somewhat misleading because servers obviously exist somewhere. This is not magic. For decades, deploying a database meant answering impossible questions. How much capacity do you need? How many CPU cores, how much memory, how much disk?…