Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124

The word semantic is rather heavily used (often incorrectly) when discussing data models. The word semantic itself is an adjective relating to meaning in language or logic. When we think about its use in data models, we think of the meta-data associated with a particular entity or table. If you are building databases, you are likely already using a semantic model as the meaning is baked into the table names, relationships, descriptions, constraints, cardinality etc.
Here is a really simple example:
CREATE TABLE person (
id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
age INT CHECK (age >= 0),
country_id INT REFERENCES countries(id)
)Here we have created a semantic model with multiple dimensions:
So, what’s more to know? Well, if we are using just DDL then we are somewhat limited by how much semantic richness we can embedd into our models. Data Contracts get us a step further as we can embedd further dimensions into the yaml contract definitions, as described here:
The best way to think about semantic models is to abstract away from the technical space that answers questions like “what type is this field?” to adding business meaning, by answering questions like “what does this actually mean to our business?” or “how do we calculate quarterly revenue?”. This is where semantic models come in. A semantic model extends basic metadata by adding layers of business meaning, calculation logic, and contextual understanding. It’s metadata that knows not just the structure of your data, but the significance of it. Here is an example using Snowflake Semantic Views:
First we have the usual database DDL:
-- Raw data tables
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
order_amount DECIMAL(10,2),
order_date DATE,
status VARCHAR(20)
);
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
country VARCHAR(50),
segment VARCHAR(20)
);
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50)
);That we then augment with a rich semantic layers.
name: "Sales Analytics"
description: "Semantic model for sales and revenue analysis"
tables:
- name: orders
description: "Customer orders and transactions"
base_table:
database: sales_db
schema: public
table: orders
dimensions:
- name: order_date
synonyms: ["date", "order day", "purchase date"]
description: "Date when the order was placed"
data_type: date
- name: status
synonyms: ["order status", "state"]
description: "Current status of the order"
data_type: varchar
- name: customers
description: "Customer information"
base_table:
database: sales_db
schema: public
table: customers
dimensions:
- name: customer_name
synonyms: ["customer", "client name"]
description: "Name of the customer"
data_type: varchar
- name: country
synonyms: ["location", "region"]
description: "Customer's country"
data_type: varchar
- name: segment
synonyms: ["customer segment", "customer type"]
description: "Customer segment classification"
data_type: varchar
- name: products
description: "Product catalog"
base_table:
database: sales_db
schema: public
table: products
dimensions:
- name: category
synonyms: ["product category", "product type"]
description: "Product category"
data_type: varchar
# Define relationships
relationships:
- name: orders_to_customers
left_table: orders
left_column: customer_id
right_table: customers
right_column: customer_id
relationship_type: many_to_one
- name: orders_to_products
left_table: orders
left_column: product_id
right_table: products
right_column: product_id
relationship_type: many_to_one
# Define business metrics
metrics:
- name: revenue
description: "Total revenue from completed orders"
synonyms: ["sales", "total sales", "income"]
definition: SUM(order_amount)
filters:
- status = 'completed'
- name: order_count
description: "Number of orders"
synonyms: ["number of orders", "order volume"]
definition: COUNT(order_id)
- name: average_order_value
description: "Average value per order"
synonyms: ["AOV", "avg order size"]
definition: AVG(order_amount)
filters:
- status = 'completed'
# Time grains for aggregation
time_dimensions:
- name: order_date
time_grains: [day, week, month, quarter, year]What’s really powerful is that we are now able to introduce other dimensions like synonymns. You sales department may refer to a field as “Order Data” whilst procurement may refer to it as “Purchase Date”. The semantic model caters for this. In addition, we can now encode business-friendly metrics users can immediately recognise, such as “Average value per order” which is defined as the AVG(order_amount) where status = ‘completed’. The semantic model means business users get immediate vaule from the models without having to learn any SQL. Data models are the digital footprint of the business and it’s vital the business can use them.
And all of this is important for the reasons described. But where it becomes vital is how this semantic layer can inform AI. Without semantics or meaning, AI will hallucinate and guess the true meaning of the data model, and be confident and convincing when it does. But without meaning, what insight would any method gain from a column of numbers named MT_SN. Is it Market Sum, Maintenance Snap? With semantics we give the AI the information it needs and we now have meaning embedded, rather than guesswork and pretence.