
SCD Types in Data Warehousing Explained

The document discusses slowly changing dimensions (SCD) in data warehousing. There are three main types of SCD - Type 1, Type 2, and Type 3. Type 1 overwrites data without keeping history. Type 2 creates new records to track changes over time. Type 3 maintains the current and previous value of some columns to limit storage while keeping some history. Each type has advantages and use cases depending on needs for historical data storage and retrieval.

16:48 11/10/2023 Understanding Slowly Changing Dimensions (SCD) in Data Warehousing | by Mainak Das | Python in Plain English


Understanding Slowly Changing Dimensions (SCD) in Data Warehousing
Exploring Different SCD Types, Their Advantages, Disadvantages, and
Applications.

Mainak Das


Published in Python in Plain English · 7 min read · Feb 25


Photo by Victor Grabarczyk on Unsplash

Slowly changing dimensions (SCD) is an important concept in data
warehousing that deals with managing the history of dimension data that
changes slowly over time. There are different types of SCDs, each with its
own characteristics, advantages, and disadvantages.

In this article, we will explore the different types of SCDs, their definitions,
advantages, disadvantages, and applications. We will also provide sample
Python code to help you understand how to implement SCDs in your data
warehousing project.

What are Slowly Changing Dimensions (SCD)?


In data warehousing, slowly changing dimensions are dimensions that
change over time, but at a slow pace. These dimensions typically store
historical data about an entity, such as a customer, product, or location.
Slowly changing dimensions are important for tracking changes in the data
over time, and for making accurate reports and analyses.

There are three main types of slowly changing dimensions: Type 1, Type 2,
and Type 3.

SCD Type 1
SCD Type 1, also known as the overwrite method, updates the dimension
table with the latest data without maintaining the history. It simply
overwrites the existing data with the new data. This method is suitable for
situations where historical data is not required.

Advantages of Type 1 SCD:


Simple and easy to implement

Requires less storage space

Suitable for data that does not require historical information

Disadvantages of Type 1 SCD:


Historical data is lost

Does not support reporting and analysis of historical data

Applications of SCD Type 1:


It can be used in Customer Address data where the customer’s old
address is replaced with the new address, and the history of the old
address is not stored.

It can be used in Employee Salary data: when an employee’s salary is
updated, the old salary is replaced with the new salary, and the history of
the old salary is not maintained.

It can be used in Inventory data when the stock of an item is updated, the
new stock level replaces the old stock level, and the history of the
changes is not stored.

Example of SCD Type 1

Image By Me

Explanation of SCD Type 1:

Suppose you have a table with columns ID, Name, and Address. You receive
updated data from your source, where there is a change in the address of
one or more customers.

In SCD Type 1, you simply update the existing record with the new address
without keeping a history of previous addresses.
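
The overwrite described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name apply_scd_type1 and the sample rows are made up for this example, and it assumes the dimension fits in memory as a list of dictionaries keyed by ID.

```python
def apply_scd_type1(dimension, incoming):
    """Overwrite existing rows and add new ones, keyed by ID (no history kept)."""
    # Index the current dimension rows by ID (copies, so the input is untouched).
    by_id = {row["ID"]: dict(row) for row in dimension}
    for row in incoming:
        # An existing ID is overwritten in place; an unseen ID is simply added.
        by_id[row["ID"]] = dict(row)
    return list(by_id.values())


# Hypothetical sample data, for illustration only.
dimension = [
    {"ID": 1, "Name": "Asha", "Address": "Kolkata"},
    {"ID": 2, "Name": "Ravi", "Address": "Delhi"},
]
incoming = [
    {"ID": 2, "Name": "Ravi", "Address": "Mumbai"},  # changed address
    {"ID": 3, "Name": "Meera", "Address": "Pune"},   # new customer
]

updated = apply_scd_type1(dimension, incoming)
for row in updated:
    print(row)
```

After the call, the row for ID 2 holds only the new address; the old one is gone, which is exactly the trade-off of Type 1.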

SCD Type 2
SCD Type 2, also known as the historical tracking method, maintains the
history of changes in the dimension table by creating a new record for each
change. It adds a new record with a new (surrogate) primary key every time a
change occurs. This method is suitable for situations where historical data
is required for reporting and analysis.

Advantages of SCD Type 2:


Supports reporting and analysis of historical data

No data is lost

Accurately tracks changes over time

Disadvantages of SCD Type 2:


Requires more storage space

Can be complex to implement


Applications of SCD Type 2:

It can be used in sales data, where you can track the profit or sales of a
particular item over time to help you analyse your sales.

It can be used in customer or order data, where you can keep track of a
person's previous orders and recommend items based on them.

It can be used in expense data, where you can track and analyse your daily
expenses.

Example of SCD Type 2


Image By me

Explanation
Suppose you have the same data as earlier, but you want to implement SCD
Type 2 on it. Here you use two or three extra columns to track the history
of your data.

Here I have added Start Date and End Date to our incoming data. Whenever a
record is active, i.e. the incoming record is unchanged and matches your
existing record, the End Date is Null.

Initially, the two records were in the active state, so their End Date was
Null. Later, with the new data, we notice that both differ from the existing
data, and there is a new record as well.

So we close those two active records by setting their End Date to the
current date, and we add three new entries to our existing table.

NOTE: We can also use a column such as flag / active, with 0 or 1 values,
to denote whether a record is active or closed, along with the dates. Also,
for the End Date you can use current_day-1, depending on your design and
requirements; in this case that will be 2022-02-20. Sometimes this approach
makes it easier to handle the same date appearing in both Start Date and
End Date.

From my personal experience, I cannot help but emphasise this again:

To fetch active records, query: select …. where end_date is null

To fetch closed records, query: select …. where end_date is not null
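
The close-and-insert steps above, together with the two lookups, might look roughly like this in plain Python. It is a sketch under stated assumptions: the names apply_scd_type2, active_records and closed_records, the Key column used as a surrogate key, and the sample rows are all hypothetical, and None stands in for a Null End Date.

```python
from datetime import date


def apply_scd_type2(dimension, incoming, today):
    """Close changed active rows and append new versions; returns a new list."""
    result = [dict(row) for row in dimension]
    # Map each ID to its currently active row (End_Date is None).
    active = {row["ID"]: row for row in result if row["End_Date"] is None}
    next_key = max((row["Key"] for row in result), default=0) + 1

    for new in incoming:
        current = active.get(new["ID"])
        if current is not None:
            if (current["Name"], current["Address"]) == (new["Name"], new["Address"]):
                continue  # unchanged: the row stays active, End_Date stays None
            current["End_Date"] = today  # close the old version
        # Add the new version (or a brand-new ID) as the active row.
        new_row = {"Key": next_key, "ID": new["ID"], "Name": new["Name"],
                   "Address": new["Address"], "Start_Date": today, "End_Date": None}
        result.append(new_row)
        active[new["ID"]] = new_row
        next_key += 1
    return result


def active_records(dimension):
    # Same idea as: select ... where end_date is null
    return [row for row in dimension if row["End_Date"] is None]


def closed_records(dimension):
    # Same idea as: select ... where end_date is not null
    return [row for row in dimension if row["End_Date"] is not None]


# Hypothetical sample data: two active rows, then two changes and one new ID.
dimension = [
    {"Key": 1, "ID": 1, "Name": "Asha", "Address": "Kolkata",
     "Start_Date": date(2022, 1, 1), "End_Date": None},
    {"Key": 2, "ID": 2, "Name": "Ravi", "Address": "Delhi",
     "Start_Date": date(2022, 1, 1), "End_Date": None},
]
incoming = [
    {"ID": 1, "Name": "Asha", "Address": "Howrah"},
    {"ID": 2, "Name": "Ravi", "Address": "Mumbai"},
    {"ID": 3, "Name": "Meera", "Address": "Pune"},
]

result = apply_scd_type2(dimension, incoming, today=date(2022, 2, 21))
print(len(active_records(result)), "active,", len(closed_records(result)), "closed")
```

To use the current_day-1 convention from the note, one could pass today minus one day when closing the old row; that variation is left out here to keep the sketch short.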



SCD Type 3
SCD Type 3 is a technique used in data warehousing to capture historical
data while limiting the amount of storage space required. It maintains only
a limited history of changes, usually only the current value and the
previous value, in the form of column versioning rather than the row
versioning used by SCD Type 2.

Advantages of SCD Type 3:


It requires less storage space compared to SCD Type 2.

It is useful when we want to track only limited historical changes for
specific columns.

It enables fast queries as there is limited history to scan.

Disadvantages of SCD Type 3:


It does not provide a complete history of changes as it only tracks limited
changes.

It may not be suitable for tracking changes for columns that may require
full history.

Applications of SCD Type 3:


It is useful in cases where the user needs to know only the current value
and previous value of specific columns.

It is commonly used in retail sales or financial reporting where the
current and previous value of a product or service is important.

It is also used in cases where there is limited space available for storing
historical data.


Example

Image By Me

Explanation
Here, taking the same data as before, we have one new record and two
updated records in our source data. After ingesting, we move each customer's
old address into the Prev_Address field, and the newly ingested address
takes its place in the Address field. For the new record, as there is no
previous entry with the same primary key, it is inserted as it is, leaving
Prev_Address as Null.
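
The column-versioning step above can be sketched as follows. This is only an illustration: apply_scd_type3 and the sample rows are hypothetical, only the Address column is versioned, and None stands in for Null.

```python
def apply_scd_type3(dimension, incoming):
    """Keep the current and previous address in two columns of the same row."""
    by_id = {row["ID"]: dict(row) for row in dimension}
    for new in incoming:
        current = by_id.get(new["ID"])
        if current is None:
            # New record: no previous entry, so Prev_Address stays None.
            by_id[new["ID"]] = {"ID": new["ID"], "Name": new["Name"],
                                "Address": new["Address"], "Prev_Address": None}
        elif current["Address"] != new["Address"]:
            # Updated record: shift the old address aside, then overwrite.
            current["Prev_Address"] = current["Address"]
            current["Address"] = new["Address"]
    return list(by_id.values())


# Hypothetical sample data: two updated records and one new record.
dimension = [
    {"ID": 1, "Name": "Asha", "Address": "Kolkata", "Prev_Address": None},
    {"ID": 2, "Name": "Ravi", "Address": "Delhi", "Prev_Address": None},
]
incoming = [
    {"ID": 1, "Name": "Asha", "Address": "Howrah"},
    {"ID": 2, "Name": "Ravi", "Address": "Mumbai"},
    {"ID": 3, "Name": "Meera", "Address": "Pune"},
]

updated = apply_scd_type3(dimension, incoming)
for row in updated:
    print(row)
```

Note that a second change to the same customer would overwrite Prev_Address, so only one step of history survives.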


Extras
There is also SCD Type 0, where none of the fields is ever updated. A good
example of this would be a reference table of some sort.

There is also SCD Type 4, also known as a hybrid SCD, which combines
aspects of both SCD Type 1 and SCD Type 2. In this type, the slowly changing
dimension is kept in a separate table, which is linked to the main table via a
foreign key. This allows for both the original and updated data to be
retained, while still maintaining a small footprint for the main table.

Suppose we have a product table with attributes such as id, name,
description, and price. If we want to maintain the history of changes to the
price attribute, we can use SCD Type 5. In this case, we create a new table to
store the historical data, with columns such as id, price, start date, and end
date. Whenever there is a change to the price attribute, we insert a new record
into the historical table with the new price and start date and update the
current record in the product table with the new price.
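
The product price example above can be sketched with two in-memory tables. The names change_price, products and price_history, and the sample values, are hypothetical. The sketch also assumes that the previous history row is closed by filling in its end date when a new price arrives, which the text implies through the end date column but does not spell out.

```python
from datetime import date


def change_price(products, price_history, product_id, new_price, today):
    """Record a price change in the history table and update the product row."""
    product = products[product_id]
    if product["price"] == new_price:
        return  # nothing changed, nothing to record
    # Assumption: close the currently open history row for this product.
    for row in price_history:
        if row["id"] == product_id and row["end_date"] is None:
            row["end_date"] = today
    # Insert the new price with its start date into the history table.
    price_history.append({"id": product_id, "price": new_price,
                          "start_date": today, "end_date": None})
    # The product table itself only ever holds the current price.
    product["price"] = new_price


# Hypothetical sample data.
products = {1: {"id": 1, "name": "Widget", "description": "Sample item", "price": 10.0}}
price_history = [{"id": 1, "price": 10.0,
                  "start_date": date(2023, 1, 1), "end_date": None}]

change_price(products, price_history, 1, 12.5, today=date(2023, 6, 1))
print(products[1]["price"], len(price_history))
```

Queries for the current price touch only the small product table, while the full price history is available from price_history when needed.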

SCD Type 6 uses a hybrid approach that combines Type 1, Type 2, and Type 3
(1+2+3=6). The data includes columns for both current and historical data,
as well as a column that tracks the current version of a record. This allows
for both current and historical data to be stored in the same row, with the
current version easily accessible. This type of SCD is useful when a business
wants to see both current and historical data in the same report.
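
One possible reading of this hybrid is sketched below. It is an assumption-laden illustration rather than a reference implementation: the function apply_scd_type6 and the columns Current_Address and Is_Current are hypothetical names. Each change adds a new row (as in Type 2), every version of the customer gets its Current_Address overwritten (as in Type 1), and each row still carries both the address valid for that version and the current one (as in Type 3).

```python
from datetime import date


def apply_scd_type6(dimension, customer_id, new_address, today):
    """Add a new version row and refresh Current_Address on all versions."""
    rows = [dict(row) for row in dimension]
    versions = [row for row in rows if row["ID"] == customer_id]
    active = next((row for row in versions if row["Is_Current"]), None)
    if active is not None and active["Address"] == new_address:
        return rows  # no change
    if active is not None:
        # Type 2 part: close the active version.
        active["End_Date"] = today
        active["Is_Current"] = False
    for row in versions:
        # Type 1 part: overwrite the current value on every old version.
        row["Current_Address"] = new_address
    # The new row holds the version's own address and the current address.
    rows.append({"ID": customer_id, "Address": new_address,
                 "Current_Address": new_address, "Start_Date": today,
                 "End_Date": None, "Is_Current": True})
    return rows


# Hypothetical sample data.
dimension = [{"ID": 1, "Address": "Kolkata", "Current_Address": "Kolkata",
              "Start_Date": date(2022, 1, 1), "End_Date": None, "Is_Current": True}]

rows = apply_scd_type6(dimension, 1, "Howrah", today=date(2022, 2, 21))
for row in rows:
    print(row)
```

A report can then show the address that applied at the time (Address) next to the latest one (Current_Address) without a self-join.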

SCD Type 7 uses a similar approach to Type 6, but instead of including a
column for the current version of a record, it creates a separate table to store
all historical data. The main table only includes columns for the current
data. This allows for better performance when querying the current data,
while still providing access to historical data through the separate table. This
type of SCD is useful when historical data needs to be accessed less
frequently than current data.

The most commonly used SCD types are SCD Type 1, SCD Type 2, and SCD
Type 3.

Conclusion
SCDs are important for maintaining the integrity and accuracy of your data
over time. Choosing the right SCD type depends on the needs of your data
and the application it is being used for. Understanding the differences
between the SCD types and their advantages and disadvantages can help you
make informed decisions about your data modelling strategy.

Thank you for reading till the end. You can follow me here on Medium for
more updates.
