16:48 11/10/2023 Understanding Slowly Changing Dimensions (SCD) in Data Warehousing | by Mainak Das | Python in Plain English
Understanding Slowly Changing
Dimensions (SCD) in Data
Warehousing
Exploring Different SCD Types, Their Advantages, Disadvantages, and
Applications.
Mainak Das
Published in Python in Plain English · 7 min read · Feb 25
Photo by Victor Grabarczyk on Unsplash
Slowly changing dimensions (SCD) are an important concept in data
warehousing: they describe how to manage the history of dimension data that
changes slowly over time. There are different types of SCDs, each with its own
characteristics, advantages, and disadvantages.
In this article, we will explore the different types of SCDs, their definitions,
advantages, disadvantages, and applications. We will also provide sample
Python code to help you understand how to implement SCDs in your data
warehousing project.
What are Slowly Changing Dimensions (SCD)?
In data warehousing, slowly changing dimensions are dimensions that
change over time, but at a slow pace. These dimensions typically store
historical data about an entity, such as a customer, product, or location.
Slowly changing dimensions are important for tracking changes in the data
over time, and for making accurate reports and analyses.
There are three main types of slowly changing dimensions: Type 1, Type 2,
and Type 3.
SCD Type 1
SCD Type 1, also known as the overwrite method, updates the dimension
table with the latest data without maintaining the history. It simply
overwrites the existing data with the new data. This method is suitable for
situations where historical data is not required.
Advantages of Type 1 SCD:
Simple and easy to implement
Requires less storage space
Suitable for data that does not require historical information
Disadvantages of Type 1 SCD:
Historical data is lost
Does not support reporting and analysis of historical data
Applications of SCD Type 1:
It can be used in Customer Address data where the customer’s old
address is replaced with the new address, and the history of the old
address is not stored.
It can be used in Employee Salary data: when an employee’s salary is
updated, the old salary is replaced with the new salary, and the history of
the old salary is not maintained.
It can be used in Inventory data: when the stock of an item is updated, the
new stock level replaces the old stock level, and the history of the
changes is not stored.
Example of SCD Type 1
Image By Me
Explanation of SCD Type 1:
Suppose you have a table with columns ID, Name, and Address. You receive
updated data from your source, where there is a change in the address of one
or more customers.
In SCD Type 1, you simply update the existing record with the new address
without keeping a history of previous addresses.
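The overwrite described above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name apply_scd_type1 and the sample customers are assumptions made for the example, and the table is modelled as a list of dictionaries with the ID, Name, and Address columns mentioned above.

```python
def apply_scd_type1(dimension, incoming):
    """Overwrite matching rows in place and insert unseen IDs (SCD Type 1)."""
    # Index the existing rows by ID so each incoming row is a simple lookup.
    by_id = {row["ID"]: row for row in dimension}
    for new_row in incoming:
        existing = by_id.get(new_row["ID"])
        if existing is None:
            # Unseen ID: insert it as a new row.
            row = dict(new_row)
            dimension.append(row)
            by_id[row["ID"]] = row
        else:
            # Known ID: overwrite the attributes; the old values are lost.
            existing.update(new_row)
    return dimension


# Hypothetical sample data, for illustration only.
customers = [
    {"ID": 1, "Name": "Alice", "Address": "12 Oak Street"},
    {"ID": 2, "Name": "Bob", "Address": "34 Pine Road"},
]
updates = [
    {"ID": 2, "Name": "Bob", "Address": "56 Elm Avenue"},
    {"ID": 3, "Name": "Carol", "Address": "78 Maple Lane"},
]
apply_scd_type1(customers, updates)
```

After the call, Bob's row holds only the new address, and nothing in the table records that he ever lived anywhere else.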
SCD Type 2
Type 2 SCD, also known as the historical tracking method, maintains the
history of changes in the dimension table by creating a new record for each
change. It adds a new record with a new primary key (typically a surrogate
key, since the entity’s own ID stays the same) every time a change
occurs. This method is suitable for situations where historical data is
required for reporting and analysis.
Advantages of SCD Type 2:
Supports reporting and analysis of historical data
No data is lost
Accurately tracks changes over time
Disadvantages of SCD Type 2:
Requires more storage space
Can be complex to implement
Applications of SCD Type 2:
It can be used in sales data, where you can track the profit or sales of a
particular item over time to help you analyse your sales.
It can be used in customer or order data, where you can keep track of
a person’s previous orders and recommend items based on them.
It can be used in expense data, where you can track your daily expenses
and analyse them.
Example of SCD Type 2
Image By me
Explanation
Suppose you have the same data as earlier, but you want to implement SCD
Type 2 on it. Here you use two or three extra columns to track the
history of your data.
Here I have added Start Date and End Date columns to our incoming data.
Whenever a record is active, i.e. the incoming record is unchanged and
matches your existing record, the End Date is Null.
Initially, the two records were in the active state, so the End Date was Null.
Later, with the new data, we notice that they are different from the existing
data and that there is a new record as well.
So we close those active records by updating the End Date with the current
date, and we add three new entries to our existing table: the two updated
versions and the one new record.
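The close-and-insert steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name apply_scd_type2, the TRACKED_COLUMNS constant, the column names Start_Date and End_Date, the sample customers, and the initial start date are all hypothetical. The load date of 2022-02-21 is inferred from the date used in the note that follows.

```python
from datetime import date

# Columns whose changes should create a new version (an assumption).
TRACKED_COLUMNS = ("Name", "Address")


def apply_scd_type2(dimension, incoming, today):
    """Close changed active rows and append new versions (SCD Type 2)."""
    # Only rows whose End_Date is None are active.
    active = {row["ID"]: row for row in dimension if row["End_Date"] is None}
    for new_row in incoming:
        current = active.get(new_row["ID"])
        if current is not None:
            if all(current[col] == new_row[col] for col in TRACKED_COLUMNS):
                continue  # Unchanged: the active row stays open.
            current["End_Date"] = today  # Changed: close the old version.
        # New ID or changed values: append a fresh active row.
        new_version = {**new_row, "Start_Date": today, "End_Date": None}
        dimension.append(new_version)
        active[new_row["ID"]] = new_version
    return dimension


# Hypothetical sample data: two active rows, then two changes and one new ID.
customer_dim = [
    {"ID": 1, "Name": "Alice", "Address": "12 Oak Street",
     "Start_Date": date(2022, 1, 1), "End_Date": None},
    {"ID": 2, "Name": "Bob", "Address": "34 Pine Road",
     "Start_Date": date(2022, 1, 1), "End_Date": None},
]
incoming = [
    {"ID": 1, "Name": "Alice", "Address": "90 Birch Court"},
    {"ID": 2, "Name": "Bob", "Address": "56 Elm Avenue"},
    {"ID": 3, "Name": "Carol", "Address": "78 Maple Lane"},
]
apply_scd_type2(customer_dim, incoming, today=date(2022, 2, 21))
```

The table ends up with five rows: the two original rows, now closed, and three active rows.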
NOTE: We can also use a column such as flag / active with 0 or 1 values
to denote whether a record is active or closed, along with the dates. Also, for
the End Date you can use current_day-1 based on your design and
requirements; in this case that will be 2022-02-20. Sometimes this approach
makes it easier to handle the same date appearing in both Start Date and
End Date.
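As a small illustration of the note, the snippet below computes an End Date of current_day-1 and pairs it with an active flag. The load date of 2022-02-21 is inferred from the date quoted in the note. The Is_Active column name and the earlier start date are assumptions made for the example.

```python
from datetime import date, timedelta

load_date = date(2022, 2, 21)  # assumed load date, implied by the note

# Close the old version one day before the new version starts,
# so the two validity ranges never share a date.
end_date = load_date - timedelta(days=1)

# Hypothetical rows: Is_Active mirrors whether End_Date is set.
closed_row = {"ID": 1, "Start_Date": date(2022, 1, 1),
              "End_Date": end_date, "Is_Active": 0}
active_row = {"ID": 1, "Start_Date": load_date,
              "End_Date": None, "Is_Active": 1}

print(end_date.isoformat())  # 2022-02-20
```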
From my personal experience, this is worth emphasising again:
To fetch active records, query: select … where end_date is null
To fetch closed records, query: select … where end_date is not null
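To try the two filters, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table name customer_dim, its columns, and the sample rows are assumptions made for the example. Only the end_date is null / is not null conditions come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_dim (id INTEGER, name TEXT, address TEXT, "
    "start_date TEXT, end_date TEXT)"
)
# Hypothetical rows: one closed version and two active versions.
conn.executemany(
    "INSERT INTO customer_dim VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Alice", "12 Oak Street", "2022-01-01", "2022-02-21"),
        (1, "Alice", "90 Birch Court", "2022-02-21", None),
        (2, "Bob", "34 Pine Road", "2022-01-01", None),
    ],
)

# Active records: the End Date has not been set yet.
active = conn.execute(
    "SELECT id, address FROM customer_dim WHERE end_date IS NULL ORDER BY id"
).fetchall()

# Closed records: the End Date has been filled in.
closed = conn.execute(
    "SELECT id, address FROM customer_dim WHERE end_date IS NOT NULL ORDER BY id"
).fetchall()
```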
SCD Type 3
SCD Type 3 is a technique
used in data warehousing to capture historical data while limiting the
amount of storage space required. It maintains only a limited
history of changes, usually only the current value and the previous value, in
the form of column versioning rather than row versioning like SCD Type 2.
Advantages of SCD Type 3:
It requires less storage space compared to SCD Type 2.
It is useful when we want to track only limited historical changes for
specific columns.
It enables fast queries as there is limited history to scan.
Disadvantages of SCD Type 3:
It does not provide a complete history of changes as it only tracks limited
changes.
It may not be suitable for tracking changes for columns that may require
full history.
Applications of SCD Type 3:
It is useful in cases where the user needs to know only the current value
and previous value of specific columns.
It is commonly used in retail sales or financial reporting where the
current and previous value of a product or service is important.
It is also used in cases where there is limited space available for storing
historical data.
Example
Image By Me
Explanation
Here, taking the same data as before, we have one new record and two
updated records in our source data. So after ingesting, we update the
Prev_Address field with the customer’s old address, and the
Address field takes the new address that was ingested. For the new record,
as there is no previous entry with the same primary key, it is inserted as is,
leaving Prev_Address as Null.
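The column-versioning update described above can be sketched like this. It is a minimal illustration: the function name apply_scd_type3 and the sample customers are assumptions, while the Address and Prev_Address column names follow the explanation.

```python
def apply_scd_type3(dimension, incoming):
    """Keep the current and the previous address in the same row (SCD Type 3)."""
    by_id = {row["ID"]: row for row in dimension}
    for new_row in incoming:
        existing = by_id.get(new_row["ID"])
        if existing is None:
            # New ID: insert as is, with no previous address.
            row = {**new_row, "Prev_Address": None}
            dimension.append(row)
            by_id[row["ID"]] = row
        elif existing["Address"] != new_row["Address"]:
            # Changed: shift the old address into Prev_Address, then overwrite.
            existing["Prev_Address"] = existing["Address"]
            existing["Address"] = new_row["Address"]
    return dimension


# Hypothetical sample data: two existing customers, two changes, one new ID.
customers = [
    {"ID": 1, "Name": "Alice", "Address": "12 Oak Street", "Prev_Address": None},
    {"ID": 2, "Name": "Bob", "Address": "34 Pine Road", "Prev_Address": None},
]
incoming = [
    {"ID": 1, "Name": "Alice", "Address": "90 Birch Court"},
    {"ID": 2, "Name": "Bob", "Address": "56 Elm Avenue"},
    {"ID": 3, "Name": "Carol", "Address": "78 Maple Lane"},
]
apply_scd_type3(customers, incoming)
```

Note that a second address change for the same customer would overwrite Prev_Address, which is why the history kept by this type is limited.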
Extras
There is also SCD Type 0, where none of the fields is updated ever. A good
example of this will be a reference table of some sort.
There is also SCD Type 4, commonly implemented with a separate history table,
which combines aspects of both SCD Type 1 and SCD Type 2. In this type, the
history of the slowly changing attributes is kept in a separate table, which is
linked to the main table via a foreign key. This allows for both the original
and updated data to be retained, while still maintaining a small footprint for
the main table.
Suppose we have a product table with attributes such as id, name,
description, and price. If we want to maintain the history of changes to the
price attribute, we can use SCD Type 4. In this case, we create a new table to
store the historical data, with columns such as id, price, start date, and end
date. Whenever there is a change to the price attribute, we insert a new record
into the historical table with the new price and start date and update the
current record in the product table with the new price.
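The history-table approach from the product example above might look like the sketch below. The function name change_price, the sample product, and the dates are hypothetical. Closing the previously open history record by filling in its end date is also an assumption, since the text only mentions inserting the new record.

```python
from datetime import date


def change_price(products, price_history, product_id, new_price, today):
    """Record a price change in the history table and update the product."""
    product = products[product_id]
    if product["price"] == new_price:
        return  # Nothing changed, so no history record is needed.
    # Assumption: close the currently open history record for this product.
    for record in price_history:
        if record["id"] == product_id and record["end_date"] is None:
            record["end_date"] = today
    # Insert the new price into the history table with its start date.
    price_history.append(
        {"id": product_id, "price": new_price,
         "start_date": today, "end_date": None}
    )
    # The main table keeps only the current price.
    product["price"] = new_price


# Hypothetical sample data, for illustration only.
products = {
    101: {"name": "Kettle", "description": "1.7 litre electric kettle",
          "price": 25.0},
}
price_history = [
    {"id": 101, "price": 25.0, "start_date": date(2023, 1, 1), "end_date": None},
]
change_price(products, price_history, 101, 29.0, today=date(2023, 6, 1))
```

The product table stays one row per product, while the history table grows by one row per price change.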
SCD Type 6 uses a hybrid approach that combines Type 1, Type 2, and Type 3
(1+2+3=6). The data includes columns for both current and historical data,
as well as a column that tracks the current version of a record. This allows
for both current and historical data to be stored in the same row, with the
current version easily accessible. This type of SCD is useful when a business
wants to see both current and historical data in the same report.
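One possible reading of this hybrid is sketched below. Every version row carries the value that was in effect for that version and the current value, plus a flag marking the current version. The function name apply_scd_type6, the column names Historical_Address, Current_Address, and Is_Current, and the sample data are all assumptions made for illustration.

```python
from datetime import date


def apply_scd_type6(dimension, customer_id, new_address, today):
    """Add a new version row and refresh the current value on all versions."""
    for row in dimension:
        if row["ID"] != customer_id:
            continue
        # Type 1 part: overwrite the current value on every version.
        row["Current_Address"] = new_address
        if row["Is_Current"]:
            # Type 2 part: close the version that was active until now.
            row["End_Date"] = today
            row["Is_Current"] = False
    # Type 3 part: historical and current values sit side by side in one row.
    dimension.append({
        "ID": customer_id,
        "Historical_Address": new_address,  # value in effect for this version
        "Current_Address": new_address,
        "Start_Date": today,
        "End_Date": None,
        "Is_Current": True,
    })
    return dimension


# Hypothetical sample data: one customer with a single active version.
customer_dim = [
    {"ID": 2, "Historical_Address": "34 Pine Road",
     "Current_Address": "34 Pine Road",
     "Start_Date": date(2022, 1, 1), "End_Date": None, "Is_Current": True},
]
apply_scd_type6(customer_dim, 2, "56 Elm Avenue", today=date(2022, 2, 21))
```

A report can then read the old and the current address from the same row, without joining versions together.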
SCD Type 7 uses a similar approach to Type 6, but instead of including a
column for the current version of a record, it creates a separate table to store
all historical data. The main table only includes columns for the current
data. This allows for better performance when querying the current data,
while still providing access to historical data through the separate table. This
type of SCD is useful when historical data needs to be accessed less
frequently than current data.
The most commonly used SCD types are SCD Type 1, SCD Type 2, and SCD
Type 3.
Conclusion
SCDs are important for maintaining the integrity and accuracy of your data
over time. Choosing the right SCD type depends on the needs of your data
and the application it is being used for. Understanding the differences
between the SCD types and their advantages and disadvantages can help you
make informed decisions about your data modelling strategy.
Thank you for reading till the end. You can follow me here on Medium for
more updates.