Sections
- Introduction
- The Famous Type 1 and Type 2
- Type 3 and Multiple
- One More Weapon Type 6
- Two Analytic Possibilities
- Conclusion
Introduction
The warehouse responds to changes in source data through a process known as slowly changing dimensions. This data down the pipeline will become part of the dimensional model. Some different levels or techniques will help to keep a record of the historical data and be able to analyze it.
For instance, when a Product is reassigned to a different Product Group due to a correction in the source system or a customer address update when they move.
This blog will review a special technique that records historic changes in Dimensions when there is a need to be able to analyze using new and old values. An important part of the star schema design is the identification of slow change processing rules for dimensions. For a given source change, the correct response is dictated by the analytical requirements.
There are cases when conflicting business requirements appear in the handling of change data. A group of users requires a Type 1 response, while a second group requires a Type 2 response. The basic SCD techniques are not able to cope with this type of requirement and thatโs when hybrid methodologies come into play. The drawback is that becomes more complex both for the ETL process and for final users.
Our primary focus in this blog is SCD Hybrid or Type 6 which is a combination of Type 1, Type 2, and Type 3. The number 6 comes from 1 * 2 * 3 = 6 or 1 + 2 + 3 = 6. However, we will give a quick review of the basic most broad techniques Types 1, 2, and 3 having a more holistic understanding of the hybrid response.
The Famous Type 1 and Type 2
Real-Time Updates SCD Type 1: This method focuses on simplicity. Any changes in dimension attributes overwrite existing values without preserving historical information. While efficient, SCD Type 1 cannot track historical changes over time.
Our example is a retail business that receives orders from big Companies. For the first iteration, the requirement is that even though values in the source system change it is not necessary to preserve history this will create a Type 1 response.
One of our customers recently updated their company Name, as we can see in the first picture for the โInitial Stateโ Customer = Chris Sorensen has a company named โIteration Insightsโ.
New state with the new Company Name โIteration Insights Incโ, all history of the data element is lost.
In this case, there is no trace that the column ever contained a different Company Name. This can lead to confusion as you can see the facts are retroactively altered.
Note: The Surrogate Key CustomerKey = 100 did not change in the Fact Orders. The update overwrote only a descriptive value in the CustomerDimension table.
Historical Tracking SCD Type 2: A more robust solution, SCD Type 2 employs versioning techniques such as surrogate keys and effective date ranges to capture and preserve historical changes in dimensional data. This ensures a comprehensive historical record but introduces the challenge of managing a growing number of records over time.
In our Initial State, Chris Sorensen has been managing the Company since 2015.
Note: For simplicity to visualize the Fact Table there is 1 order from Iteration Insights with Quantity = 9
However, after a restructuring, Jarid McKenzie became the new President. In September 2020 the source systems recorded the change, and the historic change was registered as Type 2 response.
As presented any previous records were not changed, this response preserved the context for facts that were associated with the old value, while permitting new facts to be associated with the new value.
Note: A new surrogate key = 106 was created so new orders will be recorded with a new Key as well.
For more information about SCD Type 2 check out my blog: How to Implement Slowly Changing Dimensions(SCD) Type 2 Using Delta Table
Type 3 and Multiple
Limited Historical Snapshots SCD Type 3: Balancing between simplicity and history tracking, SCD Type 3 maintains a limited history by keeping both the current and previous versions of select attributes. This compromise enables a degree of historical analysis without the extensive record-keeping of SCD Type 2.
In this case, 2 attributes are modeled one for the current value and one for the previous value. When a change happens, both columns are updated, and no rows are added. Business Users can use either column in their analysis and the process is repeatable, the solution can be expanded if multiple versions of the changed value are required.
At the initial state, the Iteration Insights offices are located in the South Region of the Province.
The company move offices to the North due to a new lease in a new building.
Note: Even though Iteration Insights moved offices from South to North the old record and new record Orders are using the same Surrogate Key
SCD Type 3 Multiple
What happens when there are multiple historical changes for a dimension?
Iteration Insights found a new partner that owns a building in the West area, so the system records a new update for the current Year in 2024. Previous year’s columns are named with the year number, current year is managed in the data model as RegionCurrent.
Note: New and old orders maintain the same surrogate key
One More Weapon Type 6
SCD Type 6 hybrid, implies a methodology that combines features from various traditional SCD types (1 through 3) to create a more flexible and adaptive solution for handling slowly changing dimensions.
As we know from our fictitious example โIteration Insights โchanges their name from time to time. When gathering requirements from the user they reply saying that they always use the current company name for their analysis.
That is a Type 1 response which is common if there are analytic comparisons between the current year versus last year. In the second round of requirements gathering the user mentions that sometimes they need to know the company name that was in effect at the time of an Order for statutory reports. As we analyze a Type 1 will not associate historical company name with each order.
The next response Type 2 unfortunately is not compatible with the initial requirement to use the current name to analyze all history.
Our next technique to resolve the complication is Type 3, with this one we can capture two versions of the attribute. Unfortunately, neither version of the record can be used to segregate facts recorded before the change from facts recorded after the change. One or the other value can be used to study all facts, but it is not possible to know which version was in effect at the time of any individual transaction.
The last resource is a hybrid response one that is designed to allow for Type 1 and Type 2 and provide both separate dimension columns. One implementation is designed as Type 1 and will be used to group all facts under the latest value. The other will be designed as Type 2 and will be used to group facts with the historic values.
Initial State Iteration Insights has 2 orders, one for a quantity of 9 and the other for 3
In April 2016 the company changed its name to โIteration Insights Incโ and generated a new order for a quantity of 2.
Note:
- Step 1: Old row(s) updated with new CompanyNameCurrent
- Step 2: Update ExpireDate with the date of change and flag the Current Column as โExpiredโ
- Step 3: Row is added with the new name in both positions. The Current column is flagged as โCurrentโ, and a new Surrogate Key (Customer Key) is created.
As we can see previous Fact records maintain the same CustomerKey = 100, however, new, and future records will use the new SK = 106 for the new company name.
There is a last Company Name change in June 2023 and the system generates a new update.
The same steps are applied and a new SK = 107 is generated. Only new fact transactions will use the newest key generated.
Surrogate Keys and Flags give us the flexibility that we were looking for, now it is possible to study facts between versions and know exactly the time those dimensional records were in place.
Two Analytic Possibilities
To expose the previous logic, I created the CustomerDimension and the FacOrders table.
With the hybrid response we have 2 analytical possibilities.
- We can Sum up the total order quantity with the Current most up-to-date โCompany Nameโ
- The second option is capable of analyzing Order quantity for the previous version of โCompany Nameโ when they were in effect.
Note : The โCurrentโ flag from SCD Type 2 is essential for correct analysis, otherwise the final output will respond with erroneous data.
Conclusion
All in all, the concept of SCD Type 6 Hybrid represents a dynamic and adaptable approach to handling slowly changing dimensions in the world of data warehousing. By mixing elements from SCD Types 1, 2, and 3, this concept aims to provide a versatile solution that is useful in the diverse nature of dimensional data.
Although, SCD Type 6 Hybrid is not widely implemented in data warehouse scenarios. This technique will come to rescue your data when Type 1, Type 2, and Type 3 struggle to deliver the desired historical analysis.
When finding conflicting requirements with business users this will likely follow a revision of SCD Type 6 or as Kimball and Ross call it Type1/2 hybrid.