
Slowly changing dimension - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/Slowly_changing_dimension

Dimension is a term in data management and data warehousing that refers to a logical grouping of data, such as
geographical location, customer information, or product information. Slowly Changing Dimensions (SCDs) are
dimensions whose data changes slowly.

For example, you may have a Dimension in your database that tracks the sales records of your company's
salespeople. Creating sales reports seems simple enough, until a salesperson is transferred from one regional office to
another. How do you record such a change in your sales Dimension?

You could sum or average the sales by salesperson, but using that to compare the performance of salespeople might
give misleading information. If the salesperson who was transferred used to work in a hot market where sales were
easy, and now works in a market where sales are infrequent, her totals will look much stronger than those of the other
salespeople in her new region, even if they are just as good. Alternatively, you could create a second salesperson
record and treat the transferred person as a new salesperson, but that creates problems of its own.

Dealing with these issues involves SCD management methodologies referred to as Types 0 through 6. Type 6 SCDs are
also sometimes called Hybrid SCDs.

Contents
1 Type 0
2 Type 1
3 Type 2
4 Type 3
5 Type 4
6 Type 6 / Hybrid
7 Type 6 / Hybrid - Alternative Implementation
8 Combining Types
9 See also
10 Notes
11 References

Type 0
The Type 0 method is a passive approach to managing dimension value changes, in which no action is taken. Values
remain as they were at the time the dimension record was first entered. History may be preserved under Type 0 in
certain circumstances, but higher-order SCD types are usually employed when history preservation must be
guaranteed; Type 0 provides little or no control over how a slowly changing dimension is managed.

The most common slowly changing dimensions are Types 1, 2, and 3.

Type 1
The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all. This is
most appropriate when correcting certain types of data errors, such as the spelling of a name (assuming you will never
need to know how it was previously misspelled).

Another example is a database table that keeps supplier information:

Supplier_key  Supplier_Name                Supplier_State
001           Phlogistical Supply Company  CA

Now imagine that this supplier moves their headquarters to Illinois. The updated table would simply overwrite this
record:

Supplier_key  Supplier_Name                Supplier_State
001           Phlogistical Supply Company  IL

The obvious disadvantage of this method of managing SCDs is that no historical record is kept in the data
warehouse. You can't tell whether your suppliers are tending to move to the Midwest, for example. The advantage is
that Type 1 dimensions are very easy to maintain.
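
The overwrite described above can be sketched in a few lines. This is only an illustration, assuming an in-memory
SQLite database, lower-case column names, and a hypothetical helper named apply_type1_change; none of these come
from the article.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE supplier ("
    " supplier_key TEXT PRIMARY KEY,"
    " supplier_name TEXT,"
    " supplier_state TEXT)"
)
conn.execute(
    "INSERT INTO supplier VALUES ('001', 'Phlogistical Supply Company', 'CA')"
)


def apply_type1_change(conn, supplier_key, new_state):
    """Hypothetical helper: overwrite the state in place; the old value is lost."""
    conn.execute(
        "UPDATE supplier SET supplier_state = ? WHERE supplier_key = ?",
        (new_state, supplier_key),
    )


apply_type1_change(conn, "001", "IL")
rows = conn.execute("SELECT * FROM supplier").fetchall()
```

After the call, the table holds a single row for supplier 001 and nothing records that the state was ever CA.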

Type 2
The Type 2 method tracks historical data by creating multiple records in the dimensional tables with separate keys.
With Type 2, we have unlimited history preservation as a new record is inserted each time a change is made.

In the same example, if the supplier moves to Illinois, the table would look like this:

Supplier_key  Supplier_Code  Supplier_Name                Supplier_State  Version
001           ABC            Phlogistical Supply Company  CA              0
002           ABC            Phlogistical Supply Company  IL              1

Another popular method for tuple versioning is to add effective date columns.

Supplier_key  Supplier_Code  Supplier_Name                Supplier_State  Start_Date   End_Date
001           ABC            Phlogistical Supply Company  CA              01-Jan-2000  21-Dec-2004
002           ABC            Phlogistical Supply Company  IL              22-Dec-2004

Null End_Date signifies current tuple version. In some cases, a standardized surrogate high date (e.g. 9999-12-31)
may be used as an end date, so that the field can be included in an index.
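
A minimal sketch of the effective-date variant follows, assuming an in-memory SQLite database, dates stored as
ISO-8601 text, an auto-assigned integer surrogate key, and a hypothetical helper named apply_type2_change. It closes
the current row on the day before the change and inserts a new row under a new key.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE supplier (
        supplier_key   INTEGER PRIMARY KEY,  -- surrogate key, one per version
        supplier_code  TEXT,                 -- natural key shared by all versions
        supplier_name  TEXT,
        supplier_state TEXT,
        start_date     TEXT,                 -- assumption: ISO-8601 text dates
        end_date       TEXT                  -- NULL marks the current version
    )
    """
)
conn.execute(
    "INSERT INTO supplier VALUES "
    "(1, 'ABC', 'Phlogistical Supply Company', 'CA', '2000-01-01', NULL)"
)


def apply_type2_change(conn, supplier_code, new_state, change_date):
    """Hypothetical helper: end-date the current row, then insert a new version."""
    old_key, name = conn.execute(
        "SELECT supplier_key, supplier_name FROM supplier "
        "WHERE supplier_code = ? AND end_date IS NULL",
        (supplier_code,),
    ).fetchone()
    # The old version stays valid through the day before the change.
    last_day = (date.fromisoformat(change_date) - timedelta(days=1)).isoformat()
    conn.execute(
        "UPDATE supplier SET end_date = ? WHERE supplier_key = ?",
        (last_day, old_key),
    )
    # Omitting supplier_key lets SQLite assign the next surrogate key.
    conn.execute(
        "INSERT INTO supplier (supplier_code, supplier_name, supplier_state,"
        " start_date, end_date) VALUES (?, ?, ?, ?, NULL)",
        (supplier_code, name, new_state, change_date),
    )


apply_type2_change(conn, "ABC", "IL", "2004-12-22")
history = conn.execute(
    "SELECT supplier_key, supplier_state, start_date, end_date "
    "FROM supplier ORDER BY supplier_key"
).fetchall()
```

Both versions remain queryable: the CA row ends on 2004-12-21 and the IL row, with a null end date, is current.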

Transactions that reference this surrogate key (Supplier_key) are then permanently bound to the time slices
defined by each row in the slowly changing dimension table. If retrospective changes are made to the contents
of the dimension, or if a new set of attributes is added to the dimension (for example a Sales Rep column) with
effective dates different from those already defined, the existing transactions may need to be
updated to reflect the new situation. This can be an expensive database operation, so Type 2 SCDs are not a good
choice if the dimensional model is subject to change.

Type 3
The Type 3 method tracks changes using separate columns. Whereas Type 2 preserves unlimited history, Type 3
preserves limited history, bounded by the number of columns designated for storing historical data.
Whereas the table structure in Type 1 and Type 2 is very similar, Type 3 adds columns to the
table:

Supplier_key  Supplier_Name                Original_Supplier_State  Effective_Date  Current_Supplier_State
001           Phlogistical Supply Company  CA                       22-Dec-2004     IL

Note that this record cannot track all historical changes: if a supplier moves twice, only the original and current
states are kept and the intermediate state is lost.
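
The limitation can be seen in a short sketch. It assumes an in-memory SQLite database, ISO-8601 text dates, a
hypothetical helper named apply_type3_change, and a made-up second move to NY in 2009 that is not part of the
article's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE supplier (
        supplier_key            TEXT PRIMARY KEY,
        supplier_name           TEXT,
        original_supplier_state TEXT,
        effective_date          TEXT,
        current_supplier_state  TEXT
    )
    """
)
# Assumption: before any move, original and current state are the same.
conn.execute(
    "INSERT INTO supplier VALUES "
    "('001', 'Phlogistical Supply Company', 'CA', NULL, 'CA')"
)


def apply_type3_change(conn, supplier_key, new_state, effective_date):
    """Hypothetical helper: update only the 'current' column and its date."""
    conn.execute(
        "UPDATE supplier SET current_supplier_state = ?, effective_date = ? "
        "WHERE supplier_key = ?",
        (new_state, effective_date, supplier_key),
    )


apply_type3_change(conn, "001", "IL", "2004-12-22")
after_first_move = conn.execute(
    "SELECT original_supplier_state, effective_date, current_supplier_state "
    "FROM supplier"
).fetchone()

apply_type3_change(conn, "001", "NY", "2009-03-01")  # hypothetical second move
after_second_move = conn.execute(
    "SELECT original_supplier_state, effective_date, current_supplier_state "
    "FROM supplier"
).fetchone()
```

After the second call the row reads CA as the original state and NY as the current one; the stay in IL is no longer
recorded anywhere.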

Type 4
The Type 4 method is usually just referred to as using "history tables", where one table keeps the current data, and an
additional table is used to keep a record of some or all changes.

Following the example above, the original table might be called Supplier and the history table might be called
Supplier_History.

Supplier
Supplier_key  Supplier_Name                Supplier_State
001           Phlogistical Supply Company  IL

Supplier_History
Supplier_key  Supplier_Name                Supplier_State  Create_Date
001           Phlogistical Supply Company  CA              22-Dec-2004

This method resembles how database audit tables and change data capture techniques function.
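
One way to maintain the pair of tables is to archive the current row and overwrite it in the same transaction. The
sketch below assumes an in-memory SQLite database, ISO-8601 text dates, and a hypothetical helper named
apply_type4_change; a trigger or a change data capture tool could do the same job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE supplier (
        supplier_key   TEXT PRIMARY KEY,
        supplier_name  TEXT,
        supplier_state TEXT
    );
    CREATE TABLE supplier_history (
        supplier_key   TEXT,
        supplier_name  TEXT,
        supplier_state TEXT,
        create_date    TEXT   -- assumption: date the history row was written
    );
    INSERT INTO supplier VALUES ('001', 'Phlogistical Supply Company', 'CA');
    """
)


def apply_type4_change(conn, supplier_key, new_state, change_date):
    """Hypothetical helper: copy the current row to history, then overwrite it."""
    with conn:  # one transaction: both statements commit, or neither does
        conn.execute(
            "INSERT INTO supplier_history "
            "SELECT supplier_key, supplier_name, supplier_state, ? "
            "FROM supplier WHERE supplier_key = ?",
            (change_date, supplier_key),
        )
        conn.execute(
            "UPDATE supplier SET supplier_state = ? WHERE supplier_key = ?",
            (new_state, supplier_key),
        )


apply_type4_change(conn, "001", "IL", "2004-12-22")
current_rows = conn.execute("SELECT * FROM supplier").fetchall()
history_rows = conn.execute("SELECT * FROM supplier_history").fetchall()
```

Queries that need only current data read Supplier and never touch the history table.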

Type 6 / Hybrid
The Type 6 method combines the approaches of Types 1, 2, and 3 (1 + 2 + 3 = 6). One possible explanation
of the origin of the term is that it was coined by Ralph Kimball during a conversation with Stephen Pace from
Kalido, although it has also been referred to by Tom Haughey [1]. It is not frequently used because it has the
potential to complicate end-user access, but it has some advantages over the other approaches, especially when
techniques are employed to mitigate the downstream complexity.

The approach is to use a Type 1 slowly changing dimension, but to add a pair of date columns indicating
the date range over which a particular row in the dimension applies, and a flag indicating whether the record is the
current record.

This approach has a number of advantages:

The user can choose to query using the current values of the dimension table by restricting the rows in the
dimension table with a filter that selects only current values.

Alternatively, the user can use the "as at the time of the transaction" values by using one of the date fields on the
transaction as a constraint on the dimension table.

If there are a number of date columns on the transaction (e.g. Order Date, Shipping Date, Confirmation Date),
then the user can choose which date to analyze the fact data by - something not possible using other
approaches.

This is how the Supplier table would look using Type 6 Slowly Changing Dimensions:

Supplier
Row_Key  Supplier_key  Supplier_code  Supplier_Name                Supplier_State  Start_Date   End_Date     Current_Indicator
1        001           ABC001         Phlogistical Supply Company  CA              22-Dec-2004  15-Jan-2007  N
2        001           ABC001         Phlogistical Supply Company  IL              16-Jan-2007  1-Jan-2099   Y

Alternative implementations of Type 6 can include a blank end date. Other approaches include using a Revision
Number instead of a Row Key.
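
How such a table might be maintained when the supplier moves is sketched below. The sketch assumes an in-memory
SQLite database, ISO-8601 text dates, the 2099-01-01 high date from the table above, and a hypothetical helper named
move_supplier; the article itself does not prescribe a load procedure.

```python
import sqlite3
from datetime import date, timedelta

HIGH_DATE = "2099-01-01"  # stand-in end date carried by the current row

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE supplier (
        row_key           INTEGER PRIMARY KEY,
        supplier_key      TEXT,
        supplier_code     TEXT,
        supplier_name     TEXT,
        supplier_state    TEXT,
        start_date        TEXT,
        end_date          TEXT,
        current_indicator TEXT
    )
    """
)
conn.execute(
    "INSERT INTO supplier VALUES (1, '001', 'ABC001', "
    "'Phlogistical Supply Company', 'CA', '2004-12-22', ?, 'Y')",
    (HIGH_DATE,),
)


def move_supplier(conn, supplier_key, new_state, change_date):
    """Hypothetical helper: close the current row and add a new current row."""
    last_day = (date.fromisoformat(change_date) - timedelta(days=1)).isoformat()
    with conn:  # commit all three statements together
        code, name = conn.execute(
            "SELECT supplier_code, supplier_name FROM supplier "
            "WHERE supplier_key = ? AND current_indicator = 'Y'",
            (supplier_key,),
        ).fetchone()
        conn.execute(
            "UPDATE supplier SET end_date = ?, current_indicator = 'N' "
            "WHERE supplier_key = ? AND current_indicator = 'Y'",
            (last_day, supplier_key),
        )
        conn.execute(
            "INSERT INTO supplier (supplier_key, supplier_code, supplier_name, "
            "supplier_state, start_date, end_date, current_indicator) "
            "VALUES (?, ?, ?, ?, ?, ?, 'Y')",
            (supplier_key, code, name, new_state, change_date, HIGH_DATE),
        )


move_supplier(conn, "001", "IL", "2007-01-16")
versions = conn.execute(
    "SELECT row_key, supplier_state, start_date, end_date, current_indicator "
    "FROM supplier ORDER BY row_key"
).fetchall()
```

The Supplier_key stays 001 on both rows, so a fact row can reach either version through the same key plus a date
filter.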

Note: Transactions should carry the Supplier Key as the foreign key (even though this is not a join to a unique
column) and should be joined in combination with a date filter. The join should not be to the primary key of the
table (Row Key). A transaction could also optionally carry one or more Row Keys (to allow use of Type 2 SCD in
addition to Type 6), depending on which of the transaction's date columns the user wishes to analyze by, but note
the performance issues with this approach described in the section on Type 2 SCD.

Note: The Current_Indicator and Row_Key columns are optional. They are not required for analysis, but can
simplify querying and management respectively.

Note: Ideally, a fact should have only a single event date, representing the date on which the transaction occurred.
If there are multiple dates, consider splitting the fact into its most granular events if possible.

Example SQL: In the following example, we wish to query the following Delivery transactions:

Delivery
Delivery_Key  Supplier_key  Quantity  Product  Delivery_Date  Order_Date
1             001           132       Bags     22-Dec-2006    15-Oct-2006
2             001           324       Chairs   15-Jan-2007    1-Jan-2007

To query the star schema using the historic reference data as of the date of the delivery, the query looks like this:

SELECT
    delivery.quantity,
    supplier.supplier_name,
    supplier.supplier_state
FROM delivery
INNER JOIN supplier
    ON delivery.supplier_key = supplier.supplier_key
-- start_date is inclusive, end_date is exclusive
WHERE delivery.delivery_date >= supplier.start_date
  AND delivery.delivery_date < supplier.end_date;


To query using the order date, the SQL simply needs to be changed to reference the Order_Date.
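
The effect of switching the date column can be tried out with the sketch below. Several things in it are
assumptions rather than part of the article: an in-memory SQLite database, ISO-8601 text dates, end dates stored
exclusively (the CA row ends on 2007-01-16, the day the IL row starts), a hypothetical third delivery ordered before
the move and delivered after it, and a hypothetical helper named states_as_of.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE supplier (
        row_key INTEGER PRIMARY KEY, supplier_key TEXT, supplier_state TEXT,
        start_date TEXT, end_date TEXT
    );
    CREATE TABLE delivery (
        delivery_key INTEGER PRIMARY KEY, supplier_key TEXT, quantity INTEGER,
        product TEXT, delivery_date TEXT, order_date TEXT
    );
    -- Assumption: half-open ranges, so one row's end equals the next row's start.
    INSERT INTO supplier VALUES
        (1, '001', 'CA', '2004-12-22', '2007-01-16'),
        (2, '001', 'IL', '2007-01-16', '2099-01-01');
    INSERT INTO delivery VALUES
        (1, '001', 132, 'Bags',   '2006-12-22', '2006-10-15'),
        (2, '001', 324, 'Chairs', '2007-01-15', '2007-01-01'),
        (3, '001', 50,  'Desks',  '2007-01-20', '2007-01-10');  -- hypothetical
    """
)

ALLOWED_DATE_COLUMNS = {"delivery_date", "order_date"}


def states_as_of(conn, date_column):
    """Hypothetical helper: supplier state as of the chosen fact date column."""
    if date_column not in ALLOWED_DATE_COLUMNS:
        raise ValueError(f"unsupported date column: {date_column}")
    sql = f"""
        SELECT delivery.delivery_key, supplier.supplier_state
        FROM delivery
        INNER JOIN supplier
            ON delivery.supplier_key = supplier.supplier_key
        WHERE delivery.{date_column} >= supplier.start_date
          AND delivery.{date_column} < supplier.end_date
        ORDER BY delivery.delivery_key
    """
    return conn.execute(sql).fetchall()


by_delivery_date = states_as_of(conn, "delivery_date")
by_order_date = states_as_of(conn, "order_date")
```

Only the third delivery differs between the two results: it was ordered while the supplier was in CA and delivered
after the move to IL.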

To query the star schema using the current reference data, the query looks like this:

SELECT
    delivery.quantity,
    supplier.supplier_name,
    supplier.supplier_state
FROM delivery
INNER JOIN supplier
    ON delivery.supplier_key = supplier.supplier_key
WHERE supplier.start_date <= CURRENT_TIMESTAMP
  AND supplier.end_date > CURRENT_TIMESTAMP;

Notes

CURRENT_TIMESTAMP is an ISO SQL standard function. Vendor-specific equivalents include NOW() in
PostgreSQL and MySQL, GETDATE() in SQL Server, and SYSDATE in Oracle.

Caution - If the WHERE clause restricting the rows in the dimension table is not present, then the query will
potentially return duplicate rows and give the wrong answers, so this technique should be used with care.

Some Business Intelligence tools do not handle generating complex joins such as this well.

The ETL processes needed to create this table also need to be carefully designed to ensure that there are no
overlaps in the time periods for each distinct item of reference data.

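The >= and < comparisons treat each row's date range as half-open (start inclusive, end exclusive). No time period
is omitted only if each row's End_Date equals the Start_Date of the row that follows it. If end dates are stored
inclusively, as in the Supplier table above where 15-Jan-2007 is followed by 16-Jan-2007, the upper bound must be
<= instead.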

Often a view is created over the table to filter out rows or columns. This simplifies joins to the table
if only the current rows are needed for certain queries. The view can be materialized as a physical table if
storage space is not a problem; many modern DBMSs can create such a materialized view and keep it up to date
automatically.
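
Two of the notes above, the overlap check an ETL process needs and the view over current rows, can be sketched
together. The sketch assumes an in-memory SQLite database, ISO-8601 text dates, half-open date ranges, a view named
supplier_current, a helper named overlapping_rows, and a deliberately bad NV row; all of these are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE supplier (
        row_key INTEGER PRIMARY KEY, supplier_key TEXT, supplier_state TEXT,
        start_date TEXT, end_date TEXT, current_indicator TEXT
    );
    INSERT INTO supplier VALUES
        (1, '001', 'CA', '2004-12-22', '2007-01-16', 'N'),
        (2, '001', 'IL', '2007-01-16', '2099-01-01', 'Y');
    -- Current rows only, so everyday queries can join without a date filter.
    CREATE VIEW supplier_current AS
        SELECT * FROM supplier WHERE current_indicator = 'Y';
    """
)

# Two half-open ranges overlap when each one starts before the other ends.
OVERLAP_CHECK = """
    SELECT a.row_key, b.row_key
    FROM supplier AS a
    JOIN supplier AS b
        ON a.supplier_key = b.supplier_key
       AND a.row_key < b.row_key
       AND a.start_date < b.end_date
       AND b.start_date < a.end_date
    ORDER BY a.row_key, b.row_key
"""


def overlapping_rows(conn):
    """Hypothetical helper: pairs of row keys whose date ranges overlap."""
    return conn.execute(OVERLAP_CHECK).fetchall()


current_rows = conn.execute("SELECT * FROM supplier_current").fetchall()
clean_result = overlapping_rows(conn)

# Hypothetical bad load: a row whose range cuts across both existing rows.
conn.execute(
    "INSERT INTO supplier VALUES (3, '001', 'NV', '2006-06-01', '2007-03-01', 'N')"
)
bad_result = overlapping_rows(conn)
```

An ETL job could run a check like this after each load and fail the batch if it returns any rows.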

Type 6 / Hybrid - Alternative Implementation


One of the drawbacks of the above implementation is that the many-to-many relationship between the dimension and
the fact table is never resolved in the data model. It cannot be resolved at either the logical or the physical
data model level. As designed, it is resolved only at runtime, at the report level, when the end user fills in the
'As At Date' parameter. The consequence is that no referential integrity can be enforced at the RDBMS level between
the fact and the dimension table. A variant of SCD Type 6 exists that offers all the advantages of SCD Type 6
without this inconvenience.

This variant is based on a dimension table primary key made up of a surrogate key and a version number (rather than a
start date). The version number of the current dimension record is always 0. Before a dimension record is
updated, version 0 is copied to a new record (version n+1, where n is the highest existing version number), and
version 0 can then be updated.

Supplier Dimension
Version_Number  Supplier_key  Supplier_code  Supplier_Name                Supplier_State  Start_Date   End_Date
1               001           ABC001         Phlogistical Supply Company  CA              22-Dec-2004  15-Jan-2007
0               001           ABC001         Phlogistical Supply Company  IL              15-Jan-2007
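
The copy-then-update step of this variant can be sketched as follows, assuming an in-memory SQLite database,
ISO-8601 text dates, a null end date on the current record, and a hypothetical helper named update_current.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE supplier (
        supplier_key   TEXT,
        version_number INTEGER,   -- 0 is always the current record
        supplier_code  TEXT,
        supplier_name  TEXT,
        supplier_state TEXT,
        start_date     TEXT,
        end_date       TEXT,
        PRIMARY KEY (supplier_key, version_number)
    )
    """
)
conn.execute(
    "INSERT INTO supplier VALUES ('001', 0, 'ABC001', "
    "'Phlogistical Supply Company', 'CA', '2004-12-22', NULL)"
)


def update_current(conn, supplier_key, new_state, change_date):
    """Hypothetical helper: archive version 0 as version n+1, then update version 0."""
    with conn:
        (max_version,) = conn.execute(
            "SELECT MAX(version_number) FROM supplier WHERE supplier_key = ?",
            (supplier_key,),
        ).fetchone()
        # Copy the current record under the next free version number and close it.
        conn.execute(
            "INSERT INTO supplier "
            "SELECT supplier_key, ?, supplier_code, supplier_name, "
            "supplier_state, start_date, ? "
            "FROM supplier WHERE supplier_key = ? AND version_number = 0",
            (max_version + 1, change_date, supplier_key),
        )
        # Version 0 now takes the new values and stays the current record.
        conn.execute(
            "UPDATE supplier SET supplier_state = ?, start_date = ?, end_date = NULL "
            "WHERE supplier_key = ? AND version_number = 0",
            (new_state, change_date, supplier_key),
        )


update_current(conn, "001", "IL", "2007-01-15")
versions = conn.execute(
    "SELECT version_number, supplier_state, start_date, end_date "
    "FROM supplier ORDER BY version_number"
).fetchall()
```

Because the current record keeps version 0, fact rows that point at (Supplier_key, 0) never need to be rewritten
when the dimension changes.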

When a fact is added to the fact table, it can only reference the current dimension record. Consequently, all
fact records will have a version number equal to 0.

Fact Delivery
Delivery_Key  Supplier_key  Supplier_version_number  Quantity  Product  Delivery_Date  Order_Date
1             001           0                        132       Bags     22-Dec-2006    15-Oct-2006
2             001           0                        324       Chairs   15-Jan-2007    1-Jan-2007

The advantage of this SCD Type 6 implementation is that the many-to-many relationship between the fact and the
dimension is resolved, with priority given to the as-is version (version number = 0).

Another advantage is that the RDBMS can enforce referential integrity between the fact table and the dimension
table. Finally, when using a design tool such as ERwin or PowerDesigner, fact tables are linked to dimension tables
in the physical model. The dimension surrogate key plus the dimension version number form a compound primary key,
which can be stored in the fact table even though the version number will always be 0.
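
The referential integrity argument can be checked with a small sketch. It assumes SQLite, where foreign keys are
enforced only after PRAGMA foreign_keys = ON, and uses a hypothetical helper named insert_delivery and a made-up
supplier key 999 that has no dimension row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE supplier (
        supplier_key   TEXT,
        version_number INTEGER,
        supplier_state TEXT,
        PRIMARY KEY (supplier_key, version_number)
    );
    CREATE TABLE delivery (
        delivery_key            INTEGER PRIMARY KEY,
        supplier_key            TEXT,
        supplier_version_number INTEGER,
        quantity                INTEGER,
        product                 TEXT,
        FOREIGN KEY (supplier_key, supplier_version_number)
            REFERENCES supplier (supplier_key, version_number)
    );
    INSERT INTO supplier VALUES ('001', 0, 'IL'), ('001', 1, 'CA');
    INSERT INTO delivery VALUES (1, '001', 0, 132, 'Bags');
    """
)


def insert_delivery(conn, delivery_key, supplier_key, version, quantity, product):
    """Hypothetical helper: return True if the row is accepted, False if rejected."""
    try:
        with conn:  # rolls back automatically if the insert fails
            conn.execute(
                "INSERT INTO delivery VALUES (?, ?, ?, ?, ?)",
                (delivery_key, supplier_key, version, quantity, product),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = insert_delivery(conn, 2, "001", 0, 324, "Chairs")
rejected = insert_delivery(conn, 3, "999", 0, 10, "Desks")  # no such supplier
```

The second insert is refused by the database itself, which is the kind of enforcement the date-range join in the
first implementation cannot provide.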

In the implementation using the Start date in the primary key, all the tables in the Physical Data Model will look like
standalone tables and no integrity constraint can be enforced by the RDBMS (even though it is not always desirable).

Note: Advocating a column in a database table that is always 0 is illogical. There are two options to
correct this example: either remove the column, or stop rewriting the dimension table to keep the current record at
version 0 and simply increment the version number instead. It is important to retain which dimension record was
current when inserting extra rows into the fact table.
