Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124

If you’ve ever worked with a data warehouse or business intelligence tool, then you’ve probably encountered a star schema. If you haven’t, then you might want to look into it…
This elegant data modeling pattern has become the de-facto standard for analytical databases, and for good reason.
A star schema is a database design pattern that organizes data into two types of tables: a central fact table surrounded by dimension tables. When visualized, this structure resembles a star, hence the name.
The fact table sits at the center and contains the measurable, quantitative data about your business; things like sales amounts, quantities, or durations.
The dimension tables radiate outward from this center, providing the descriptive context needed to analyze those facts – information about products, customers, dates, and locations.
That’s all there is to it.

The fact table stores the metrics you want to analyze and contains foreign keys that link to dimension tables. Each row typically represents a specific business event or transaction.
For example, a sales fact table might contain columns like sales_amount, quantity_sold, discount_applied, along with foreign keys like product_id, customer_id, store_id, and date_id.
Dimension tables provide the who, what, when, where, and why of your data. They’re typically denormalized, meaning they contain redundant data to make queries simpler and faster.
A customer dimension might include customer_id, customer_name, email, city, state, country, and customer_segment all in one table even though this creates some data redundancy.
Here’s a simple normalized database for tracking product sales:

And a typical analytic query would come in form:
-- Query requiring multiple joins
SELECT
p.department,
p.category,
c.state,
EXTRACT(YEAR FROM s.sale_date) as year,
EXTRACT(MONTH FROM s.sale_date) as month,
COUNT(*) as transaction_count,
SUM(s.quantity) as total_quantity,
SUM(s.quantity * s.unit_price - s.discount_amount) as total_revenue
FROM sales s
JOIN customers c ON s.customer_id = c.customer_id
JOIN products p ON s.product_id = p.product_id
WHERE s.sale_date >= '2024-01-01'
GROUP BY p.department, p.category, c.state,
EXTRACT(YEAR FROM s.sale_date), EXTRACT(MONTH FROM s.sale_date)
ORDER BY total_revenue DESC;Now let’s transform this into a Star Schema.
First create the dimensional tables:

And introduce the Fact table (with some additional columns added for denormalisation purposes):

Now let’s calculate the monthly sales trend and look at the differences between the relational and star schema approaches:
-- relational
SELECT
EXTRACT(YEAR FROM s.sale_date) as year,
EXTRACT(MONTH FROM s.sale_date) as month,
TO_CHAR(s.sale_date, 'Month') as month_name,
SUM(s.quantity * s.unit_price - s.discount_amount) as revenue
FROM public.sales s
WHERE s.sale_date >= '2023-01-01'
GROUP BY EXTRACT(YEAR FROM s.sale_date),
EXTRACT(MONTH FROM s.sale_date),
TO_CHAR(s.sale_date, 'Month')
ORDER BY year, month;
-- star schema
SELECT
d.year,
d.month_number,
d.month_name,
SUM(f.net_amount) as revenue
FROM analytics.fact_sales f
JOIN analytics.dim_date d ON f.date_key = d.date_key
WHERE d.year >= 2023
GROUP BY d.year, d.month_number, d.month_name
ORDER BY d.year, d.month_number;The star schema transformation from just 3 tables provides:
Even with a simple 3-table relational model, the star schema provides significant benefits for analytical workloads.
The beauty of a star schema lies in its intuitive structure. Business users can understand it without deep technical knowledge. Want to see sales by product category? Join the fact table to the product dimension. Need to analyze by region? Join to the location dimension. It’s straightforward.
Star schemas are optimized for read-heavy analytical workloads. Most queries require only a single join between the fact table and each relevant dimension. Modern database engines and BI tools are highly optimized for this pattern.
Nearly every business intelligence tool on the market is built with star schemas in mind. Tableau, Power BI, Looker, and others work seamlessly with this structure, automatically recognizing the relationships and enabling drag-and-drop analytics.

Need to add new dimensions? You can extend a star schema without disrupting existing queries or reports. This makes it remarkably adaptable as business requirements evolve.
One challenge with dimension tables is handling changes over time. If a customer moves to a new city, do you overwrite the old address or keep history? Slowly Changing Dimensions (SCD) provide strategies for this, with Type 2 (keeping full history) being most common in star schemas.
The date dimension deserves special mention. Rather than storing just a date value, a proper date dimension includes pre-calculated attributes like day_of_week, is_holiday, fiscal_quarter, and week_number. This eliminates complex date math in queries and standardizes time-based analysis across the organization.
Sometimes a dimension doesn’t warrant its own table. Order numbers or invoice IDs often live directly in the fact table as degenerate dimensions—dimension keys without corresponding dimension tables.
The main alternative to a star schema is a snowflake schema, where dimension tables are normalized into multiple related tables. While this reduces data redundancy, it requires more joins and adds complexity.
In practice, star schemas win out in most scenarios. Storage is cheap, and the performance and simplicity benefits of denormalization outweigh the modest increase in data volume. Some organizations use a hybrid approach, normalizing only the largest dimensions.
Keep fact tables lean. Store only numeric measures and foreign keys. Descriptive attributes belong in dimensions.
Denormalize dimensions generously. Don’t be afraid of redundancy in dimension tables. The query simplicity is worth it.
Use surrogate keys. Create artificial keys for dimensions rather than using natural business keys. This provides stability when source system keys change.
Build conformed dimensions. When multiple fact tables share dimensions like customer or product, use the same dimension table for consistency across analyses.
Pre-aggregate when needed. For very large fact tables, create aggregate fact tables at higher grain levels to improve query performance.
With cloud data warehouses like Snowflake, BigQuery, and Redshift, the economics of star schemas have become even more favorable. Columnar storage and massive parallel processing make scanning large denormalized tables incredibly efficient.
Some organizations are even moving toward “one big table” approaches that take denormalization to the extreme. However, the star schema remains the sweet spot for most use cases—providing excellent performance while maintaining logical organization and manageability.
If you’re building a new data warehouse, start with star schemas. Begin with your most important business process, identify the key metrics you want to analyze, and build outward from there. Keep it simple, stay disciplined about separating facts from dimensions, and you’ll have a solid foundation for analytics that can grow with your organization.
The star schema has endured for decades because it solves the right problems in the right way. It’s not the fanciest modeling approach, but it’s the one that actually works in production—and that’s what matters.
You must be logged in to post a comment.