
Slowly Changing Dimension

• In a data warehouse, there are two types of tables:


--Dimension table
--Fact table
• Fact tables change regularly as new information is collected, whereas
dimension tables change much less often.
• Although dimension tables are typically static lists, most dimension
tables do change over time. For example, we might need to update
an inventory dimension once a month to reflect new or changed part
numbers. Because these changes are small compared with changes in
fact tables, these dimensions are known as slowly growing or slowly
changing dimensions.
• Slowly changing dimensions are dimension tables that have slowly
increasing dimension data and updates to existing dimensions.
When updating existing dimensions, we decide whether to keep all
historical dimension data, no historical data, or just the current and
previous versions of dimension data.
Using the Slowly Changing Dimensions Wizard
• The Slowly Changing Dimensions Wizard creates mappings to load slowly changing dimension tables:

• Type 1 Dimension mapping. Loads a slowly changing dimension table by inserting new dimensions and
overwriting existing dimensions. Use this mapping when you do not want a history of previous dimension data.

• Type 2 Dimension/Version Data mapping. Loads a slowly changing dimension table by inserting new and
changed dimensions using a version number and incremented primary key to track changes. Use this mapping
when you want to keep a full history of dimension data and to track the progression of changes.

• Type 2 Dimension/Flag Current mapping. Loads a slowly changing dimension table by inserting new and
changed dimensions using a flag to mark current dimension data and an incremented primary key to track
changes. Use this mapping when you want to keep a full history of dimension data, tracking the progression of
changes while flagging only the current dimension.

• Type 2 Dimension/Effective Date Range mapping. Loads a slowly changing dimension table by inserting new
and changed dimensions using a date range to define current dimension data. Use this mapping when you want
to keep a full history of dimension data, tracking changes with an effective date range.

• Type 3 Dimension mapping. Loads a slowly changing dimension table by inserting new dimensions and
updating values in existing dimensions. Use this mapping when you want to keep the current and previous
dimension values in the dimension table.
SCD Overview:

Slowly Changing Dimensions Mapping | Target Table | History Data | Handling
Type 1 Dimension | Slowly Changing Dimension | None | Inserts new dimensions. Overwrites existing dimensions with changed dimensions.
Type 2 Dimension/Version Data | Slowly Changing Dimension | Full | Inserts new and changed dimensions. Creates a version number and increments the primary key to track the changes.
Type 2 Dimension/Flag Current | Slowly Changing Dimension | Full | Inserts new and changed dimensions. Flags the current version and increments the primary key to track the changes.
Type 2 Dimension/Effective Date Range | Slowly Changing Dimension | Full | Inserts new and changed dimensions. Creates an effective date range to track changes.
Type 3 Dimension | Slowly Changing Dimension | Partial | Inserts new dimensions. Updates changed values in existing dimensions. Optionally uses the load date to track changes.
Creating a Type 1 Dimension Mapping
• The Type 1 Dimension mapping filters source rows based on user-defined
comparisons and inserts only those found to be new dimensions into the
target.
• Rows containing changes to existing dimensions are updated in the
target by overwriting the existing dimension. In the Type 1
Dimension mapping, all rows contain current dimension data.

• Handling Keys:
• When we use the Type 1 Dimension option, the Designer creates an
additional column in the mapping target, PM_PRIMARYKEY. In this
column, the Integration Service generates a primary key for each
row written to the target, incrementing new key values by 1.
Understanding the SCD 1 Mapping
The Type 1 Dimension mapping performs the following tasks:
• Selects all rows.
• Caches the existing target as a lookup table.
• Compares logical key columns in the source against corresponding columns in the target lookup table.
• Compares source columns against corresponding target columns if key columns match.
• Flags new rows and changed rows.
• Creates two data flows: one for new rows, one for changed rows.
• Generates a primary key for new rows.
• Inserts new rows into the target.
• Updates changed rows in the target, overwriting existing rows.
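The task list above can be sketched as a small in-memory model. This is a hypothetical illustration, not code generated by the wizard: it assumes the target is a Python dict keyed by the logical key ITEM, that STYLES is the only compared column, and that the function name load_type1 and the next_key parameter are placeholders.

```python
# Hypothetical in-memory sketch of the Type 1 Dimension flow.
# Assumptions: ITEM is the logical key, STYLES is the only compared column,
# and the target is a dict mapping ITEM -> row dict.

def load_type1(target, source, next_key):
    lookup = dict(target)  # cache the existing target as a lookup table
    new_rows, changed_rows = [], []
    for row in source:
        existing = lookup.get(row["ITEM"])
        if existing is None:
            new_rows.append(row)            # flag as new
        elif existing["STYLES"] != row["STYLES"]:
            changed_rows.append(row)        # flag as changed
        # unchanged rows are filtered out
    # Data flow 1: generate a key (incremented by 1) and insert new rows.
    for row in new_rows:
        target[row["ITEM"]] = {"PM_PRIMARYKEY": next_key, **row}
        next_key += 1
    # Data flow 2: overwrite changed rows, keeping their existing key.
    for row in changed_rows:
        target[row["ITEM"]] = {**lookup[row["ITEM"]], **row}
    return next_key
```

Running it twice with STYLES 5 and then 12 leaves a single Sandal row with key 65000 and STYLES 12, as in the data example: no history survives the overwrite.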
Data Example:
Data after first run:
PM_PRIMARYKEY   ITEM     STYLES
65000           Sandal   5

Data after second run:
PM_PRIMARYKEY   ITEM     STYLES
65000           Sandal   12
Creating a Type 2 Dimension/Version Data Mapping
• The Type 2 Dimension/Version Data mapping filters source rows based on user-defined
comparisons and inserts both new and changed dimensions into the target. Changes are tracked
in the target table by versioning the primary key and creating a version number for each
dimension in the table.
• In the Type 2 Dimension/Version Data target, the current version of a dimension has the highest
version number and the highest incremented primary key of the dimension.

• When we use this option, the Designer creates two additional fields in the target:
--PM_PRIMARYKEY. The Integration Service generates a primary key for each row
written to the target.
--PM_VERSION_NUMBER. The Integration Service generates a version number for
each row written to the target.

• Handling Keys:
• In a Type 2 Dimension/Version Data mapping, the Integration Service generates a new primary
key value for each new dimension it inserts into the target. An Expression transformation
increments key values by 1,000 for new dimensions.
• When updating a dimension, the Integration Service increments the existing primary key by 1.
Understanding the SCD 2 Mapping Using Versioning
• The Type 2 Dimension/Version Data mapping performs the following tasks:
• Selects all rows.
• Caches the existing target as a lookup table.
• Compares logical key columns in the source against corresponding columns in the target lookup table.
• Compares source columns against corresponding target columns if key columns match.
• Flags new rows and changed rows.
• Creates two data flows: one for new rows, one for changed rows.
• Generates a primary key and version number for new rows.
• Inserts new rows into the target.
• Increments the primary key and version number for changed rows.
• Inserts changed rows into the target.
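The versioning flow can be sketched the same way. This is a hypothetical in-memory model: ITEM is assumed to be the logical key, STYLES the only compared column, and new_seq is an assumed counter of dimensions already loaded (the slides do not name one). The 1,000 step for new dimensions and the +1 step for changed rows follow the key-handling rules described earlier in this section.

```python
# Hypothetical in-memory sketch of the Type 2 / Version Data flow.
# Assumptions: ITEM is the logical key, STYLES is the only compared column,
# and new_seq counts dimensions already loaded (not named in the slides).

def load_type2_version(target, source, new_seq):
    # Lookup cache: the highest-version row for each logical key.
    lookup = {}
    for r in target:
        cur = lookup.get(r["ITEM"])
        if cur is None or r["PM_VERSION_NUMBER"] > cur["PM_VERSION_NUMBER"]:
            lookup[r["ITEM"]] = r
    for row in source:
        cur = lookup.get(row["ITEM"])
        if cur is None:
            # New dimension: key steps by 1,000, version starts at 0.
            new_seq += 1
            target.append({"PM_PRIMARYKEY": new_seq * 1000, **row,
                           "PM_VERSION_NUMBER": 0})
        elif cur["STYLES"] != row["STYLES"]:
            # Changed dimension: insert a new row with key + 1 and version + 1.
            # Earlier versions are never updated.
            target.append({"PM_PRIMARYKEY": cur["PM_PRIMARYKEY"] + 1, **row,
                           "PM_VERSION_NUMBER": cur["PM_VERSION_NUMBER"] + 1})
    return new_seq
```

With 1,000 keys reserved per dimension, a dimension can take 999 changes before its next key would reach the block of the following dimension.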
Data Example:
Data after first run:
PM_PRIMARYKEY   ITEM     STYLES   PM_VERSION_NUMBER
65000           Sandal   5        0

Data after three runs:
PM_PRIMARYKEY   ITEM     STYLES   PM_VERSION_NUMBER
65000           Sandal   5        0
65001           Sandal   12       1
65002           Sandal   17       2
Creating a Type 2 Dimension/Flag Current Mapping
• The Type 2 Dimension/Flag Current mapping filters source rows based on user-defined
comparisons and inserts both new and changed dimensions into the target. Changes are tracked
in the target table by flagging the current version of each dimension and versioning the primary
key.
• In the Type 2 Dimension/Flag Current target, the current version of a dimension has a current
flag set to 1 and the highest incremented primary key.

• When we use this option, the Designer creates two additional fields in the target:
--PM_CURRENT_FLAG. The Integration Service flags the current row “1” and all
previous versions “0.”
--PM_PRIMARYKEY. The Integration Service generates a primary key for each row
written to the target.

• Handling Keys:
• The Integration Service generates a current flag of 1 for each row written to the target. This flag
indicates the dimension is new or newly updated. If the row is an update to an existing
dimension, the Integration Service resets the existing dimension current flag to 0.
• As a result, all current versions of a dimension appear in the target with a current flag of 1. All
previous versions have a current flag of 0.
Understanding the SCD 2 Mapping Using Current Flag
The Type 2 Dimension/Flag Current mapping performs the following tasks:
• Selects all rows.
• Caches the existing target as a lookup table.
• Compares logical key columns in the source against corresponding columns in the target lookup table.
• Compares source columns against corresponding target columns if key columns match.
• Flags new rows and changed rows.
• Creates two data flows: one for new rows, one for changed rows.
• Generates a primary key and current flag for new rows.
• Inserts new rows into the target.
• Increments the existing primary key and sets the current flag for changed rows.
• Inserts changed rows into the target.
• Updates existing versions of the changed rows in the target, resetting the current flag to indicate the row is no
longer current.
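A minimal sketch of the flag-current flow, again as a hypothetical in-memory model. Assumptions: ITEM is the logical key, STYLES the only compared column, and new-dimension keys advance by key_step, a placeholder that defaults to 1,000 to mirror the Version Data mapping (this section does not state the step). The lookup keeps only rows flagged 1, and the in-place assignment to cur stands in for the update flow that resets the old flag.

```python
# Hypothetical in-memory sketch of the Type 2 / Flag Current flow.
# Assumptions: ITEM is the logical key, STYLES is the only compared column,
# and key_step (new-dimension key spacing) is not taken from the slides.

def load_type2_flag(target, source, next_key, key_step=1000):
    # Lookup cache: the row currently flagged 1 for each logical key.
    lookup = {r["ITEM"]: r for r in target if r["PM_CURRENT_FLAG"] == 1}
    for row in source:
        cur = lookup.get(row["ITEM"])
        if cur is None:
            # New dimension: fresh key, current flag 1.
            target.append({"PM_PRIMARYKEY": next_key, **row, "PM_CURRENT_FLAG": 1})
            next_key += key_step
        elif cur["STYLES"] != row["STYLES"]:
            # Update flow: the existing version is no longer current.
            cur["PM_CURRENT_FLAG"] = 0
            # Insert flow: changed row gets existing key + 1 and flag 1.
            target.append({"PM_PRIMARYKEY": cur["PM_PRIMARYKEY"] + 1, **row,
                           "PM_CURRENT_FLAG": 1})
    return next_key
```

After three runs with STYLES 5, 12 and 17, only the row with key 65002 carries flag 1, which matches the data example.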
Data Example:
Data after first run:
PM_PRIMARYKEY   ITEM     STYLES   PM_CURRENT_FLAG
65000           Sandal   5        1

Data after three runs:
PM_PRIMARYKEY   ITEM     STYLES   PM_CURRENT_FLAG
65000           Sandal   5        0
65001           Sandal   12       0
65002           Sandal   17       1
Creating a Type 2 Dimension/Effective Date Range Mapping
• The Type 2 Dimension/Effective Date Range mapping filters source rows based on user-defined comparisons and
inserts both new and changed dimensions into the target. Changes are tracked in the target table by maintaining
an effective date range for each version of each dimension in the target.
• In the Type 2 Dimension/Effective Date Range target, the current version of a dimension has a begin date and no
corresponding end date.

• When we use this option, the Designer creates three additional fields in the target:
--PM_BEGIN_DATE. For each new and changed dimension written to the target, the Integration
Service uses the system date to indicate the start of the effective date range for the dimension.
--PM_END_DATE. For each dimension being updated, the Integration Service uses the system date
to indicate the end of the effective date range for the dimension.
--PM_PRIMARYKEY. The Integration Service generates a primary key for each row
written to the target.

• Handling Keys:
• The Integration Service generates a begin date for each new and changed dimension it inserts into the target,
using the current system date. The end date for these dimensions is NULL.
• Each time the Integration Service inserts a changed dimension, it updates the previous version of the dimension
in the target, using the current system date to fill the previously null end date column.
• As a result, all current dimension data in the Type 2 Dimension/Effective Date Range target have null values in
the PM_END_DATE column. All previous versions of dimension data have a system date in PM_END_DATE to
indicate the end of the effective date range for each version.
Understanding the SCD 2 Mapping Using Effective Date Range
The Type 2 Dimension/Effective Date Range mapping performs the following tasks:
• Selects all rows.
• Caches the existing target as a lookup table.
• Compares logical key columns in the source against corresponding columns in the target lookup table.
• Compares source columns against corresponding target columns if key columns match.
• Flags new rows and changed rows.
• Creates three data flows: one for new rows, one for changed rows, one for updating existing rows.
• Generates a primary key and beginning of the effective date range for new rows.
• Inserts new rows into the target.
• Generates a primary key and beginning of the effective date range for changed rows.
• Inserts changed rows into the target.
• Updates existing versions of the changed rows in the target, generating the end of the effective date range to
indicate the row is no longer current.
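The three data flows can be sketched as follows. This is a hypothetical in-memory model: ITEM is assumed to be the logical key and STYLES the only compared column, run_date is passed in instead of reading the system date so the run dates can match the data example, and keys come from a simple +1 counter (an assumption), so the generated key for the changed row will differ from the 65283 shown in the slide.

```python
from datetime import date

# Hypothetical in-memory sketch of the Type 2 / Effective Date Range flow.
# Assumptions: ITEM is the logical key, STYLES is the only compared column,
# keys come from a simple +1 counter, and run_date stands in for the system date.

def load_type2_dates(target, source, next_key, run_date: date):
    # Lookup cache: the open version (no end date) for each logical key.
    lookup = {r["ITEM"]: r for r in target if r["PM_END_DATE"] is None}
    for row in source:
        cur = lookup.get(row["ITEM"])
        if cur is not None and cur["STYLES"] == row["STYLES"]:
            continue  # unchanged: nothing to write
        if cur is not None:
            # Update flow: close the previous version with the run date.
            cur["PM_END_DATE"] = run_date
        # Insert flow (new or changed): new key, open-ended date range.
        target.append({"PM_PRIMARYKEY": next_key, **row,
                       "PM_BEGIN_DATE": run_date, "PM_END_DATE": None})
        next_key += 1
    return next_key
```

Selecting the current data then reduces to filtering rows whose PM_END_DATE is None.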
Data Example:
Data after first run:
PM_PRIMARYKEY   ITEM     STYLES   PM_BEGIN_DATE   PM_END_DATE
65000           Sandal   5        9/1/98

Data after second run:
PM_PRIMARYKEY   ITEM     STYLES   PM_BEGIN_DATE   PM_END_DATE
65000           Sandal   5        9/1/98          9/1/99
65283           Sandal   12       9/1/99


Creating a Type 3 Dimension Mapping
• The Type 3 Dimension mapping filters source rows based on user-defined comparisons and
inserts only those found to be new dimensions into the target. Rows containing changes to existing
dimensions are updated in the target. When updating an existing dimension, the Integration
Service saves existing data in different columns of the same row and replaces the existing data
with the updates. The Integration Service optionally enters the system date as a timestamp for
each row it inserts or updates.
• In the Type 3 Dimension target, each dimension contains current dimension data.

• When we use this option, the Designer creates two additional fields in the target:
--PM_PRIMARYKEY. The Integration Service generates a primary key for each row
written to the target.
--PM_PREV_COLNAME. The Designer generates a previous column corresponding
to each column for which you want historical data. The Integration Service keeps the previous
version of dimension data in these columns.

• Handling Keys:
• In the Type 3 Dimension mapping, the Integration Service generates a primary key value for each
new row written to the target, incrementing key values by one. Updated rows retain their original
key values.
Understanding the SCD 3 Mapping
The Type 3 Dimension mapping performs the following tasks:
• Selects all rows.
• Caches the existing target as a lookup table.
• Compares logical key columns in the source against corresponding columns in the target lookup table.
• Compares source columns against corresponding target columns if key columns match.
• Flags new rows and changed rows.
• Creates two data flows: one for new rows, one for updating changed rows.
• Generates a primary key and optionally notes the effective date for new rows.
• Inserts new rows into the target.
• Writes previous values for each changed row into previous columns and replaces previous values with updated
values.
• Optionally uses the system date to note the effective date for inserted and updated values.
• Updates changed rows in the target.
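A minimal sketch of the Type 3 flow as a hypothetical in-memory model. Assumptions: ITEM is the logical key, only STYLES keeps a previous value (in PM_PREV_STYLES, following the PM_PREV_COLNAME pattern), and EFFECT_DATE is a placeholder name for the optional timestamp column, which the slides do not name.

```python
from datetime import date
from typing import Optional

# Hypothetical in-memory sketch of the Type 3 flow. Assumptions: ITEM is the
# logical key, only STYLES keeps a previous value (PM_PREV_STYLES), and the
# optional timestamp column is given the placeholder name EFFECT_DATE.

def load_type3(target, source, next_key, run_date: Optional[date] = None):
    lookup = dict(target)  # cache existing rows by logical key
    for row in source:
        cur = lookup.get(row["ITEM"])
        if cur is None:
            # New dimension: key increments by one, no previous value yet.
            new_row = {"PM_PRIMARYKEY": next_key, **row, "PM_PREV_STYLES": None}
            if run_date is not None:
                new_row["EFFECT_DATE"] = run_date
            target[row["ITEM"]] = new_row
            next_key += 1
        elif cur["STYLES"] != row["STYLES"]:
            # Changed dimension: same row and key; shift current value to previous.
            cur["PM_PREV_STYLES"] = cur["STYLES"]
            cur["STYLES"] = row["STYLES"]
            if run_date is not None:
                cur["EFFECT_DATE"] = run_date
    return next_key
```

Because only one previous column exists per tracked column, a third change would overwrite PM_PREV_STYLES, which is why the overview table lists Type 3 history as partial.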
Data Example:
Data after first run:
PM_PRIMARYKEY   ITEM     STYLES   PM_PREV_STYLES
65000           Sandal   5

Data after second run:
PM_PRIMARYKEY   ITEM     STYLES   PM_PREV_STYLES
65000           Sandal   12       5
Thank You….
