A look into the importance of using the star schema when building dimensional models in tools such as Power BI.
You may have heard the term star schema when it comes to Business Intelligence, Data Warehouse Modelling, and Microsoft Power BI best practices. But what is the star schema method, and why is it important in data modeling?
Raising awareness of this methodology is beneficial for business users, as data modeling is moving out of the hands of IT professionals. Self-Service BI is becoming more prevalent and general business users have become more involved in data model development.
As these users continue to use tools such as Power BI, they will eventually run into challenges from not following good modeling practices. Understanding and maintaining best practices will positively impact their data reporting in the long run, and ultimately, define a robust data model from a weak one.
Though the name itself is quite abstract, the concept is simple. Introduced in 1996 by Ralph Kimball, the star schema methodology was initially designed to be used when building data warehouses.
Over the past 30 years, it has evolved to become the design used for dimensional modeling by business users and report developers across a multitude of industries. This has enabled businesses to make the best use of their data and truly understand their business processes.
What is Dimensional Modeling?
Dimensional modeling is a logical design technique used to structure data so that it is intuitive to business users and can deliver fast query performance. The star schema is the most known type of dimensional modeling.
What makes the star schema stand out is its simplicity and ability to be understood by business users.
The star schema works by dividing data into measurements and the “who, what, where, when, why, and how” descriptive context. Broadly, these two groups are facts and dimensions.
By doing this, the star schema methodology allows the business user to restructure their transactional database into smaller tables that are easier to fit together. Fact tables are then linked to their associated dimension tables with primary or foreign key relationships.
An example of this would be a quick grocery store purchase.
The amount you spent and how many items you bought would be considered a fact, but what you bought, when you bought it, and the specific grocery store’s location would all be considered dimensions.
Once these two groups have been established, we can connect them by the unique transaction number associated with your specific purchase.
This example is showcased in the model below.
An important note is that each fact, or measurement, will be associated with multiple dimensions. This is what forms the star shape, the fact in the center, and dimensions drawing out around it.
Dimensions relating to the grocery store, the products you bought, and descriptions about you as their customer will be carefully separated into its table with its attributes.
How has the Star Schema Methodology evolved?
The development of star schemas has fundamentally changed over time. When it comes to applying the technique to Data Warehousing, it takes on a much more structured approach, requiring both the logical and the physical design. This includes extracting, transforming, and loading the data into its separate facts and dimensions.
Traditionally this process could take months to develop.
However, in today’s modern world, developing the star schema through tools such as Power Query in Power BI is easily accessible. In very short order, business users can simply import transactional data and restructure it into the facts and dimensions following these basic design principles.
Is the Star Schema still relevant 30 years later?
The answer is simple – Yes! Core business needs and how business users consume data have remained the same. And to that end, dimensional modeling has come full circle and it is now time to get back to basics.
Easy to Understand
One of the main reasons a star schema is the widely accepted best practice for structuring data is because of its simplicity. The method is much easier for the business user to understand than the typical source system models.
By having the information grouped into logical business categories, businesspeople can easily walk through the design and make sense of it. In fact, users of Excel PivotTables have been indirectly using star schemas for years.
This also allows the business to participate in the data modeling discussion. The data essentially reflects the complexity of the business and its processes.
Fast Query Performance
The second significant benefit is the speed at which the user can retrieve the information they need.
The predictable framework of the star schema can allow the database to make strong assumptions about its data and aid the performance. The star schema focuses on querying per fact, rather than querying all interrelated tables at once.
This means it targets only the relevant data you are looking for to provide the answers.
Agility when using the star Schema method is another one of its key benefits. Modeling each fact and dimension, group by group allows you to always add to what you are working on.
Users may start with one fact, and as their project progresses, to gain more knowledge on the business and they can easily add to their data warehouse or model.
This is the same for removing or backtracking during modeling since each business process is grouped on its own, making changes can be done quickly and efficiently.
The star schema remains relevant no matter the size of your data, although small datasets are the most common when it comes to star schema modeling.
The accessibility to simply query the data into facts and dimensions is intuitive and time-efficient. While Big Data solution tends to rely on a Data Warehouse, in which the star schema model will be built into, and physically loaded to hold the data.
Finally, it is important to mention the Cloud Data Warehouse, essentially the unlimited dataset combining any type of sources into one Data Warehouse without designing a star schema. Even though performance is not an issue here, when it comes to developing reports for the business, these models will require the use of star schemas to bridge the gap between the data and the end-users.
What is the Star Schema’s Role in Power BI?
As it relates to Power BI, the star schema plays an important role as articulated by the Power BI team. In fact, a properly designed star schema will help in both performance and predictability of a data model.
To this end, Marco Russo and Alberto Ferrari published a great article shortly thereafter titled The Importance of star schemas in Power BI which speaks to many of the above benefits.
The design’s ability to answer unknown questions throughout a project is one of the driving forces the model is strongly recommended. Once business requirements are set, anything outside of scope can be brought into the model at any point in time.
Meaning if the end-user is looking for a fact that is not in the model, bringing it in will be possible. Though it will require multiple steps and take time, including connecting to the data source and establishing its relationship with the rest of the model.
When it comes to dealing with some long-lived collections of data and using that data in lots of contexts such as marketing analytics, customer support, performance, etc. The business user can really benefit from the star schema method that has served data warehouse use cases well.
With enterprise use cases, they rely on the guarantee of data quality, data provenance, lineage, and governance. By having these stable, maintained, and rigorously designed structures to handle those collections of data over time, are what bring the business one step closer to the single source of truth.
Stay Up to Date with Our Latest Blogs
Subscribe to our mailing list to be notified when a new article is released.