Just Build the Physical Models?

Honestly, we will do the conceptual and logical models later. Well, maybe. Probably.

Yeah, we won’t.

Data modelling is an art, but not everyone is an artist. We entrust this highly specialised domain to engineers with (sometimes) little training and just hope for the best. Delivery trumps design every time and we end up with hidden technical debt. But what debt? We delivered on time and it works in <insert DB of choice>, right?

You did. And it does. But when I look at the models I see numeric fields marked as strings and flags marked as integers. No primary keys. Why are there no labels? Why do all the field descriptions say “to be added later”? What does the field U_HOS_B mean? These are the fundamental components of a data model. They are not optional.

What they have shipped is likely a ticking time bomb. Good luck to the next team that wants to use these models. We may be lucky: the new team might have someone who intuitively knows what the column U_HOS_B means (that’s a real column name, by the way…I still have no idea what it does). If your database has one table, if it is never updated and everything always works, then cool. But that’s not a world most of us operate in.

It’s not good enough. What happens when we want to ship a Snowflake physical model to Databricks? We copy the already poor physical model and hand-write it all again. Why? Because we don’t have a logical model. If we did, we would simply forward-engineer it to a Databricks physical model using a data modelling tool. I talk about logical models in a separate post.
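
To make the forward-engineering idea concrete, here is a minimal sketch in Python. It is not how any particular modelling tool works: the `Entity` and `Attribute` classes, the `forward_engineer` function and the type mappings are all hypothetical and heavily simplified, and the generated DDL should be checked against each vendor's documentation before use. The point is only that one logical definition can produce both physical models.

```python
from dataclasses import dataclass

# Hypothetical, simplified mapping from logical types to physical types.
# A real tool covers far more types, precision rules and edge cases.
TYPE_MAP = {
    "snowflake": {"integer": "NUMBER(38,0)", "decimal": "NUMBER(18,2)",
                  "text": "VARCHAR", "boolean": "BOOLEAN", "date": "DATE"},
    "databricks": {"integer": "BIGINT", "decimal": "DECIMAL(18,2)",
                   "text": "STRING", "boolean": "BOOLEAN", "date": "DATE"},
}


@dataclass(frozen=True)
class Attribute:
    name: str
    logical_type: str
    description: str  # assumed to contain no quote characters
    is_key: bool = False


@dataclass(frozen=True)
class Entity:
    name: str
    attributes: tuple


def forward_engineer(entity: Entity, dialect: str) -> str:
    """Render one logical entity as a CREATE TABLE statement for a dialect."""
    types = TYPE_MAP[dialect]
    lines = []
    for attr in entity.attributes:
        not_null = " NOT NULL" if attr.is_key else ""
        lines.append(
            f"  {attr.name} {types[attr.logical_type]}{not_null}"
            f" COMMENT '{attr.description}'"
        )
    keys = [a.name for a in entity.attributes if a.is_key]
    if keys:
        lines.append(f"  PRIMARY KEY ({', '.join(keys)})")
    body = ",\n".join(lines)
    return f"CREATE TABLE {entity.name} (\n{body}\n);"


# One logical definition, written once.
employee = Entity(
    name="employee",
    attributes=(
        Attribute("employee_id", "integer", "Surrogate key for the employee", True),
        Attribute("full_name", "text", "Name as shown on the contract"),
        Attribute("is_active", "boolean", "True while the person is employed"),
        Attribute("annual_salary", "decimal", "Gross annual salary"),
    ),
)

snowflake_ddl = forward_engineer(employee, "snowflake")
databricks_ddl = forward_engineer(employee, "databricks")
print(snowflake_ddl)
print(databricks_ddl)
```

The descriptions, keys and types live in the logical model, so they travel with it to every target instead of being retyped (or dropped) each time.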

Your organisation needs discipline. Your organisation needs standards and policies. You need to stop the deployment of poor quality data models. Someone needs to have the power to say “no”!

In the age of AI, semantics are everything. If all the AI model has to go on is a table with 200 fields, all with labels that say “description to be added later”, then good luck and well done. You’ve just successfully weaponised chaos.

Fundamentally it all comes down to having the wrong people in the role and having management that just don’t see the problem. In the same way that longevity apparently makes people good managers, being able to write a few lines of Java makes someone a good data modeller.

How did we end up here? Well, there are a few reasons. The elephant in the room is the role of the architect. I have been all flavours of architect over the years: enterprise, solution, technical and now informational. And we need architects, yes, but first and foremost we need pragmatic architects. The best advice I was given years ago was:

…to be credible as an architect you need one foot in the ivory tower and one in the trenches

You should have a vision and a goal. But you need to bring the engineers and modellers along with you. Using words like ontology, semantics and taxonomy is great, but if you are the only one who understands the terms, then the message is already lost. It all comes down to a simple, shared understanding. In financial services the majority of work deals with tabular data of varying shapes and sizes. We know how to model this. We’ve known since the 1970s.

I used the words conceptual and logical models at the top of this post. I always assumed that everyone working in technology knows what they mean, why they exist and what we use them for. This is absolutely not the case. But in a room full of your peers, people just pretend to understand. You can’t assume anything.

What can we do?

Get everyone in the tent with a shared understanding.

  • Education: Don’t assume everyone understands data modelling. They don’t. They may even actively resist these initiatives. Demonstrate value, give focused training and win hearts and minds.
  • Standards and Guidelines: This is not a wall of text on a Confluence site. It’s a domain in its own right. Use UX teams to design ways to showcase this and make it compelling. Videos, infographics etc.
  • Tooling: Miro isn’t a data modelling tool. Neither is PowerPoint, or MS Paint. Modern SaaS tools like SqlDBM can accelerate your data modelling journey. Enforce standards in the tool wherever possible.
  • Checklists: Formal architecture review boards are necessary. But they can stifle innovation and kill momentum. Give every data modeller a quick checklist and add gates that mean all data models have to be reviewed before release.
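
A checklist gate does not have to be heavyweight. The sketch below, in Python, shows roughly what an automated pre-release check could look like for the problems described earlier in this post: no primary key, placeholder descriptions, flags stored as integers and numbers stored as strings. Everything in it is an assumption for illustration: the `Table` and `Column` classes, the `review` function, and especially the naming conventions (`is_`/`has_` prefixes for flags, `_amount`/`_count` suffixes for numbers), which your own standards would replace.

```python
from dataclasses import dataclass, field

# Descriptions that count as "not written yet" (hypothetical list).
PLACEHOLDERS = {"", "tbd", "todo", "to be added later",
                "description to be added later"}
STRING_TYPES = {"VARCHAR", "STRING", "TEXT", "CHAR"}


@dataclass
class Column:
    name: str
    data_type: str
    description: str = ""
    is_primary_key: bool = False


@dataclass
class Table:
    name: str
    columns: list = field(default_factory=list)


def review(table: Table) -> list:
    """Return a list of findings; an empty list means the table passes."""
    findings = []
    if not any(c.is_primary_key for c in table.columns):
        findings.append(f"{table.name}: no primary key")
    for c in table.columns:
        col = f"{table.name}.{c.name}"
        dtype = c.data_type.upper()
        if c.description.strip().lower() in PLACEHOLDERS:
            findings.append(f"{col}: missing or placeholder description")
        # Assumed convention: flag columns are prefixed is_ or has_.
        if c.name.lower().startswith(("is_", "has_")) and dtype != "BOOLEAN":
            findings.append(f"{col}: flag stored as {dtype}, expected BOOLEAN")
        # Assumed convention: numeric columns end in _amount or _count.
        if c.name.lower().endswith(("_amount", "_count")) and dtype in STRING_TYPES:
            findings.append(f"{col}: numeric value stored as {dtype}")
    return findings


legacy = Table("legacy_table", [
    Column("U_HOS_B", "VARCHAR", "to be added later"),
    Column("is_active", "INTEGER", "True while the record is current"),
    Column("total_amount", "VARCHAR", "Total billed amount"),
])

findings = review(legacy)
for finding in findings:
    print(finding)
```

Wired into a release pipeline, a non-empty list of findings is the gate: the model goes back to its author with a specific list of things to fix, and nobody has to wait for a review board to say “no”.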

For other examples, please see the blog I wrote on data literacy.
