
Slowly Changing Dimensions

A slowly changing dimension (SCD) is a dimension whose attribute values change occasionally and
unpredictably rather than at regular, periodic intervals. Changed data in dimension tables can be
handled in different ways, as explained below.

You can choose the SCD type individually for every attribute in a dimension table, depending on how
changes to that attribute should be handled.

(i) Type 1 SCD


 • In Type 1, when the value of a dimension attribute changes, the existing value is overwritten
with the new value; this is a plain update.
 • Old data is not maintained for historical reference.
 • Past reports cannot be regenerated because the old data no longer exists.
 • Easy to maintain.
 • Fact rows keep their keys, but any aggregates or summaries built on the overwritten attribute
must be recomputed.
Example of Type 1 SCD:
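
The overwrite behaviour described above can be sketched with Python's built-in `sqlite3` module. The `dim_customer` table, its columns and the `apply_type1` helper are hypothetical names chosen for illustration; they do not come from the original example.

```python
import sqlite3

def apply_type1(conn, customer_id, new_city):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    conn.execute(
        "UPDATE dim_customer SET city = ? WHERE customer_id = ?",
        (new_city, customer_id),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dim_customer (
           customer_key INTEGER PRIMARY KEY,  -- surrogate key
           customer_id  TEXT,                 -- natural (business) key
           city         TEXT
       )"""
)
conn.execute("INSERT INTO dim_customer (customer_id, city) VALUES ('C001', 'Chicago')")

apply_type1(conn, "C001", "Boston")
type1_rows = conn.execute(
    "SELECT customer_key, customer_id, city FROM dim_customer"
).fetchall()
print(type1_rows)  # still a single row; 'Chicago' is no longer stored anywhere
```

The surrogate key does not change, which is why the fact rows themselves need no update.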

(ii) Type 2 SCD


 • In Type 2, when the value of a dimension attribute changes, a new row is inserted with the
modified values and its own surrogate key; the old row's data is left unchanged.
 • Fact rows that already reference the old record keep the old surrogate key. Only facts loaded
after the change reference the new surrogate key.
 • As a result, existing fact table rows do not have to be changed.
 • The old row is no longer treated as current after the change, but it stays available for
historical reporting.
 • In Type 2, we can track all the changes that happen to the dimension attributes.
 • There is no limit on the amount of historical data that can be stored.
 • Adding attributes such as changed date, effective date-time, end date-time, reason for the
change and a current flag to each row is optional. They become important if the business wants to
know, for example, how many changes were made during a certain time period.
Example of Type 2 SCD:
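
A minimal sketch of Type 2 handling is given here, assuming the usual convention that existing fact rows stay attached to the dimension row that was current when they were recorded. All table names, column names and the `apply_type2` helper are hypothetical.

```python
import sqlite3

def apply_type2(conn, customer_id, new_city, change_date):
    """Type 2: expire the current row, then insert a new row with a new surrogate key."""
    conn.execute(
        "UPDATE dim_customer SET end_date = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    cursor = conn.execute(
        "INSERT INTO dim_customer (customer_id, city, effective_date, end_date, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_city, change_date),
    )
    return cursor.lastrowid  # surrogate key of the new current row

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dim_customer (
           customer_key   INTEGER PRIMARY KEY,  -- surrogate key
           customer_id    TEXT,                 -- natural (business) key
           city           TEXT,
           effective_date TEXT,
           end_date       TEXT,
           is_current     INTEGER
       )"""
)
conn.execute("CREATE TABLE fact_sales (customer_key INTEGER, amount REAL)")
conn.execute(
    "INSERT INTO dim_customer (customer_id, city, effective_date, end_date, is_current) "
    "VALUES ('C001', 'Chicago', '2023-01-01', NULL, 1)"
)
conn.execute("INSERT INTO fact_sales VALUES (1, 100.0)")  # sale made while in Chicago

new_key = apply_type2(conn, "C001", "Boston", "2024-06-01")
conn.execute("INSERT INTO fact_sales VALUES (?, 250.0)", (new_key,))  # sale after the move

sales_by_city = conn.execute(
    "SELECT d.city, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_customer d ON d.customer_key = f.customer_key "
    "GROUP BY d.city ORDER BY d.city"
).fetchall()
print(sales_by_city)  # each sale is reported under the city that was valid at the time
```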

(iii) Type 3 SCD


 • In Type 3, when the value of a dimension attribute changes, the new value is stored but the old
value remains available as an alternative.
 • Instead of adding a new row for every change, a new column is added if it does not already
exist.
 • The old value is moved into that added column and the primary attribute is overwritten with the
changed value, as in Type 1.
 • The amount of history is limited by the number of added columns.
 • The impact on fact tables is high.
Example of Type 3 SCD:
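
The following sketch illustrates Type 3 with a single "previous value" column. The `dim_customer` table, the `previous_city` column and the `apply_type3` helper are hypothetical names used only for this illustration.

```python
import sqlite3

def apply_type3(conn, customer_id, new_city):
    """Type 3: move the current value into the 'previous' column, then overwrite it."""
    # SQLite evaluates the right-hand sides against the old row,
    # so previous_city receives the value city had before this update.
    conn.execute(
        "UPDATE dim_customer SET previous_city = city, city = ? WHERE customer_id = ?",
        (new_city, customer_id),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    "customer_key INTEGER PRIMARY KEY, customer_id TEXT, city TEXT, previous_city TEXT)"
)
conn.execute("INSERT INTO dim_customer (customer_id, city) VALUES ('C001', 'Chicago')")

apply_type3(conn, "C001", "Boston")
apply_type3(conn, "C001", "Denver")

type3_row = conn.execute(
    "SELECT city, previous_city FROM dim_customer WHERE customer_id = 'C001'"
).fetchone()
print(type3_row)  # only the latest and the one-before value survive; 'Chicago' is gone
```

With one extra column only one earlier value can be kept, which is the storage limit mentioned in the list above.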

(iv) Type 4 SCD


 • In Type 4, the current data is stored in one table.
 • All historical data is maintained in a separate table.
Example of Type 4 SCD:
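
As a rough sketch of the two-table layout, the snippet below archives the outgoing value before overwriting it. The names `dim_customer`, `dim_customer_history`, `archived_on` and `apply_type4` are assumptions made for the example.

```python
import sqlite3

def apply_type4(conn, customer_id, new_city, change_date):
    """Type 4: copy the current row to the history table, then overwrite it."""
    conn.execute(
        "INSERT INTO dim_customer_history (customer_id, city, archived_on) "
        "SELECT customer_id, city, ? FROM dim_customer WHERE customer_id = ?",
        (change_date, customer_id),
    )
    conn.execute(
        "UPDATE dim_customer SET city = ? WHERE customer_id = ?",
        (new_city, customer_id),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id TEXT PRIMARY KEY, city TEXT)")
conn.execute(
    "CREATE TABLE dim_customer_history (customer_id TEXT, city TEXT, archived_on TEXT)"
)
conn.execute("INSERT INTO dim_customer VALUES ('C001', 'Chicago')")

apply_type4(conn, "C001", "Boston", "2024-06-01")

current_rows = conn.execute("SELECT customer_id, city FROM dim_customer").fetchall()
history_rows = conn.execute(
    "SELECT customer_id, city, archived_on FROM dim_customer_history"
).fetchall()
print(current_rows, history_rows)
```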

(v) Type 6 SCD


 • A dimension table can also combine SCD Types 1, 2 and 3. This is known as a Type 6 (or hybrid)
slowly changing dimension.
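
One common way to combine the three types is sketched below, under the assumption that each row carries both a historical value (fixed per row, as in Type 2) and a current value (overwritten on every row, as in Type 1), with the two side-by-side columns playing the Type 3 role. The schema and the `apply_type6` helper are hypothetical.

```python
import sqlite3

def apply_type6(conn, customer_id, new_city, change_date):
    """Hybrid: new row per change (Type 2) plus an overwritten current_city (Type 1)."""
    # Type 2 part: expire the row that is current right now.
    conn.execute(
        "UPDATE dim_customer SET end_date = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    # Type 1 part: every existing row of this customer learns the newest value.
    conn.execute(
        "UPDATE dim_customer SET current_city = ? WHERE customer_id = ?",
        (new_city, customer_id),
    )
    # Type 2 part: the new current row; historical_city is the value valid for this row.
    conn.execute(
        "INSERT INTO dim_customer "
        "(customer_id, historical_city, current_city, effective_date, end_date, is_current) "
        "VALUES (?, ?, ?, ?, NULL, 1)",
        (customer_id, new_city, new_city, change_date),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dim_customer (
           customer_key    INTEGER PRIMARY KEY,
           customer_id     TEXT,
           historical_city TEXT,
           current_city    TEXT,
           effective_date  TEXT,
           end_date        TEXT,
           is_current      INTEGER
       )"""
)
conn.execute(
    "INSERT INTO dim_customer "
    "(customer_id, historical_city, current_city, effective_date, end_date, is_current) "
    "VALUES ('C001', 'Chicago', 'Chicago', '2023-01-01', NULL, 1)"
)

apply_type6(conn, "C001", "Boston", "2024-06-01")

type6_rows = conn.execute(
    "SELECT customer_key, historical_city, current_city, is_current "
    "FROM dim_customer ORDER BY customer_key"
).fetchall()
print(type6_rows)
```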

Fact Tables
Fact tables store quantitative measurements that are used for calculations. These values are what
business reports display. In contrast to the mostly textual attributes of dimension tables, the
columns of a fact table are predominantly numeric.

Fact tables are deep whereas dimension tables are wide: a fact table has many rows and relatively
few columns. A primary key is defined on the fact table to identify each row uniquely. In a fact
table this primary key is usually a composite key.

If the composite key is missing in a fact table and two records hold the same data, it is very
hard to tell the records apart and to relate them to the data in the dimension tables.

Hence, if no suitable unique key exists to serve as the composite key, it is good practice to
generate a sequence number for each fact table record. Another alternative is to form a
concatenated primary key by concatenating, row by row, the primary keys of all the referenced
dimension tables.

A single fact table can be surrounded by multiple dimension tables. Through the foreign keys held
in the fact table, the context (descriptive data) of each measured value can be looked up in the
dimension tables. With suitable queries, users can drill down and roll up efficiently.

The lowest level of detail that is stored in a fact table is known as its granularity (grain). The
finer the grain, the more dimension tables the fact table is typically associated with, because
the smallest measurement value needs more dimensions to describe it.

In a dimensional model, a fact table expresses the many-to-many relationships among its dimensions,
while each fact row relates to exactly one row in each dimension table (many-to-one).

An example of a Sales Fact Table:
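
A small sales star schema can be sketched as follows. The tables, columns and sample figures are invented for illustration; the composite primary key is built from the dimension keys, and the two queries show a roll-up and a drill-down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,               -- measure
        amount      REAL,                  -- measure
        PRIMARY KEY (date_key, product_key) -- composite key made of dimension keys
    );
    """
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20240105, "2024-01-05", "2024-01"), (20240210, "2024-02-10", "2024-02")],
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Pen", "Stationery"), (2, "Notebook", "Stationery"), (3, "Lamp", "Home")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(20240105, 1, 10, 15.0), (20240105, 2, 4, 20.0),
     (20240210, 3, 1, 30.0), (20240210, 1, 2, 3.0)],
)

# Roll up: total amount per product category.
by_category = conn.execute(
    "SELECT p.category, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_product p ON p.product_key = f.product_key "
    "GROUP BY p.category ORDER BY p.category"
).fetchall()

# Drill down: the same totals split by month.
by_category_month = conn.execute(
    "SELECT p.category, d.month, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_product p ON p.product_key = f.product_key "
    "JOIN dim_date d ON d.date_key = f.date_key "
    "GROUP BY p.category, d.month ORDER BY p.category, d.month"
).fetchall()
print(by_category)
print(by_category_month)
```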

Load Plan For Fact Tables


You can load fact table data efficiently by considering the following points:

#1) Drop And Restore Indexes


Indexes on fact tables boost performance when querying the data, but they slow down data loads
considerably. Hence, before loading a large volume of data into a fact table, first drop all the
indexes on that table, then load the data, and finally recreate the indexes.
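
The drop, load, recreate sequence can be sketched like this with SQLite. The index name, the table layout and the `load_with_index_rebuild` helper are assumptions for the example; a production warehouse would use its own DDL and bulk loader.

```python
import sqlite3

INDEX_DDL = "CREATE INDEX idx_fact_sales_date ON fact_sales (date_key)"

def load_with_index_rebuild(conn, rows):
    """Drop the index, load the rows, then rebuild the index once at the end."""
    conn.execute("DROP INDEX IF EXISTS idx_fact_sales_date")
    conn.executemany("INSERT INTO fact_sales (date_key, amount) VALUES (?, ?)", rows)
    conn.execute(INDEX_DDL)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (date_key INTEGER, amount REAL)")
conn.execute(INDEX_DDL)

rows = [(20240101 + i % 28, float(i)) for i in range(1000)]  # made-up sample data
load_with_index_rebuild(conn, rows)

row_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
index_names = [
    r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
print(row_count, index_names)
```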

#2) Separate Inserts From Updates


Do not mix insert and update records while loading into a fact table. If there are only a few
updates, process inserts and updates separately. If there are many updates, it is advisable to
truncate and reload the fact table for quicker results.
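
A minimal sketch of this decision is shown below. The `order_id` key, the helper names and in particular the 50% threshold are assumptions chosen for illustration; the right cutoff depends on the system.

```python
def split_batch(incoming, existing_keys):
    """Separate rows whose key is new (inserts) from rows already loaded (updates)."""
    inserts = [row for row in incoming if row["order_id"] not in existing_keys]
    updates = [row for row in incoming if row["order_id"] in existing_keys]
    return inserts, updates

def choose_strategy(inserts, updates, reload_threshold=0.5):
    """Pick a load strategy; the 0.5 threshold is an arbitrary illustrative cutoff."""
    total = len(inserts) + len(updates)
    if total and len(updates) / total > reload_threshold:
        return "truncate_and_reload"
    return "separate_inserts_and_updates"

existing_keys = {1, 2, 3}                 # keys already present in the fact table
incoming = [
    {"order_id": 3, "amount": 35.0},      # already loaded, so this is an update
    {"order_id": 4, "amount": 40.0},      # new, so this is an insert
    {"order_id": 5, "amount": 50.0},      # new, so this is an insert
]

inserts, updates = split_batch(incoming, existing_keys)
strategy = choose_strategy(inserts, updates)
print(len(inserts), len(updates), strategy)
```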

#3) Partitioning
Physically partition a fact table into smaller tables for better query performance on large
volumes of fact data. Only the DBAs and the ETL team need to be aware of the partitions on facts.

For example, you can partition a table by month, quarter, year, etc. A query then reads only the
relevant partitions instead of scanning the entire table.
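
The idea can be illustrated conceptually in plain Python. Real databases provide partitioning natively; the dictionary of month partitions and the helper names below are only a stand-in to show why a query touches less data.

```python
from collections import defaultdict

def partition_by_month(rows):
    """Route each row to a partition named after its 'YYYY-MM' month."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["sale_date"][:7]].append(row)
    return partitions

def total_for_month(partitions, month):
    """Read only the partition for the requested month, not every row."""
    return sum(row["amount"] for row in partitions.get(month, []))

rows = [
    {"sale_date": "2024-01-05", "amount": 15.0},
    {"sale_date": "2024-01-20", "amount": 20.0},
    {"sale_date": "2024-02-10", "amount": 30.0},
]
partitions = partition_by_month(rows)
january_total = total_for_month(partitions, "2024-01")
print(sorted(partitions), january_total)
```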

#4) Load In Parallel
Partitions on fact tables are also beneficial when loading large volumes of data. To do this, first
break the data logically into separate data files, then run the ETL jobs that load these logical
portions in parallel.
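
A rough sketch of running one job per logical data file is given below, using `concurrent.futures` from the standard library. The `load_chunk` function is a hypothetical stand-in for a real ETL job: it only parses its file and returns the rows, and the CSV contents are made up.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

def load_chunk(name, csv_text):
    """Stand-in for one ETL job: parse one logical data file and return its rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["sale_date"], float(r["amount"])) for r in reader]
    return name, rows

# One logical data file per month partition (in-memory strings instead of real files).
chunks = {
    "2024-01": "sale_date,amount\n2024-01-05,15.0\n2024-01-20,20.0\n",
    "2024-02": "sale_date,amount\n2024-02-10,30.0\n",
}

with ThreadPoolExecutor(max_workers=2) as pool:
    # Each (name, csv_text) pair is handled by its own worker.
    loaded = dict(pool.map(lambda item: load_chunk(*item), chunks.items()))

total_rows = sum(len(rows) for rows in loaded.values())
print(total_rows, sorted(loaded))
```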

#5) Bulk Load Utility


Unlike a transactional RDBMS workload, an ETL system does not need to maintain rollback logs
explicitly for mid-transaction failures. Large volumes of data are loaded into fact tables with
“bulk loads” rather than “SQL inserts”. If a load fails, the entire data set can easily be
reloaded, or the bulk load can continue from where it left off.

#6) Deleting A Fact Record


Deleting a fact table record happens only if the business explicitly requires it. If some fact
table data no longer exists in the source systems, that data can be deleted either physically or
logically.

 • Physical delete: Unwanted records are removed from the fact table permanently.
 • Logical delete: A new column, such as ‘deleted’ of Bit or Boolean type, is added to the fact
table. It acts as a flag that marks the deleted records. You must make sure that queries on the
fact table do not select the records flagged as deleted.
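
Both kinds of delete can be sketched as follows. The `deleted` flag follows the description above; the `fact_sales_active` view is an assumption added here as one way to keep flagged rows out of queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales ("
    "order_id INTEGER PRIMARY KEY, amount REAL, deleted INTEGER NOT NULL DEFAULT 0)"
)
# Hypothetical view so that reports never see logically deleted rows.
conn.execute(
    "CREATE VIEW fact_sales_active AS "
    "SELECT order_id, amount FROM fact_sales WHERE deleted = 0"
)
conn.executemany(
    "INSERT INTO fact_sales (order_id, amount) VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0)],
)

conn.execute("DELETE FROM fact_sales WHERE order_id = 1")              # physical delete
conn.execute("UPDATE fact_sales SET deleted = 1 WHERE order_id = 2")   # logical delete

stored_rows = conn.execute(
    "SELECT order_id, deleted FROM fact_sales ORDER BY order_id"
).fetchall()
visible_rows = conn.execute("SELECT order_id, amount FROM fact_sales_active").fetchall()
print(stored_rows, visible_rows)
```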

#7) Sequence For Updates And Deletes In A Fact Table
When data has to be updated, update the dimension tables first, then the surrogate keys in the
lookup table if necessary, and only after that the corresponding fact table rows. Deletion happens
in the reverse order: deleting the unwanted data from the fact tables first makes it easy to delete
the linked unwanted data from the dimension tables.
Follow this sequence in both cases, because dimension tables and fact tables must maintain
referential integrity at all times.
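
The ordering rule can be demonstrated with SQLite's foreign-key enforcement. The two tables and the `try_sql` helper are hypothetical; the sketch simply shows which order the database accepts when referential integrity is enforced.

```python
import sqlite3

def try_sql(conn, sql):
    """Run one statement; return True on success, False if integrity is violated."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE fact_sales (
        product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
        amount      REAL
    );
    """
)

# Loading: the fact row is rejected until its dimension row exists.
fact_before_dim = try_sql(conn, "INSERT INTO fact_sales VALUES (1, 9.5)")
conn.execute("INSERT INTO dim_product VALUES (1, 'Pen')")
fact_after_dim = try_sql(conn, "INSERT INTO fact_sales VALUES (1, 9.5)")

# Deleting: the dimension row can go only after the facts that reference it.
dim_deleted_first = try_sql(conn, "DELETE FROM dim_product WHERE product_key = 1")
conn.execute("DELETE FROM fact_sales WHERE product_key = 1")
dim_deleted_last = try_sql(conn, "DELETE FROM dim_product WHERE product_key = 1")

print(fact_before_dim, fact_after_dim, dim_deleted_first, dim_deleted_last)
```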

Types Of Facts
Based on the behavior of their data, fact tables are categorized as transaction fact tables,
periodic snapshot fact tables, and accumulating snapshot fact tables. The three types have
different characteristics and different data load strategies.

#1) Transaction Fact Tables


As the name indicates, transaction fact tables store transaction-level data for each event that
happens. This kind of data is easy to analyze at the fact table level itself, but for further
analysis you can also refer to the associated dimensions.

For example, every sale (or) purchase happening from a marketing website should be loaded into a
transaction fact table.
An example of a Transaction Fact Table is shown below.

#2) Periodic Snapshot Fact Tables


As the name indicates, data in a periodic snapshot fact table is stored in the form of snapshots
(pictures) taken at regular intervals, such as every day, week, month or quarter, depending on the
business needs.

A snapshot therefore always holds aggregated data, which makes snapshot fact tables more complex
than transaction fact tables. For example, revenue performance report data can be stored in
snapshot fact tables for easy reference.
An example of a Periodic Snapshot Fact Table is shown below.
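
The relationship between the two types can be sketched by aggregating a transaction fact table into a monthly snapshot. All table and column names, the sample rows and the `take_monthly_snapshot` helper are hypothetical.

```python
import sqlite3

def take_monthly_snapshot(conn, month):
    """Aggregate one month of transactions into the periodic snapshot table."""
    conn.execute(
        "INSERT INTO fact_sales_monthly (month, product_key, total_amount, sale_count) "
        "SELECT substr(sale_date, 1, 7), product_key, SUM(amount), COUNT(*) "
        "FROM fact_sales_txn WHERE substr(sale_date, 1, 7) = ? "
        "GROUP BY substr(sale_date, 1, 7), product_key",
        (month,),
    )

conn = sqlite3.connect(":memory:")
# Transaction fact table: one row per sale event.
conn.execute("CREATE TABLE fact_sales_txn (sale_date TEXT, product_key INTEGER, amount REAL)")
# Periodic snapshot fact table: one row per product per month.
conn.execute(
    "CREATE TABLE fact_sales_monthly ("
    "month TEXT, product_key INTEGER, total_amount REAL, sale_count INTEGER, "
    "PRIMARY KEY (month, product_key))"
)
conn.executemany(
    "INSERT INTO fact_sales_txn VALUES (?, ?, ?)",
    [("2024-01-05", 1, 15.0), ("2024-01-20", 1, 20.0),
     ("2024-01-21", 2, 5.0), ("2024-02-10", 1, 30.0)],
)

take_monthly_snapshot(conn, "2024-01")

snapshot_rows = conn.execute(
    "SELECT month, product_key, total_amount, sale_count "
    "FROM fact_sales_monthly ORDER BY product_key"
).fetchall()
print(snapshot_rows)
```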

#3) Accumulating Snapshot Fact Tables
Accumulating snapshot fact tables let you store data for the entire lifetime of a product. They
act as a combination of the two types above, where data can be inserted by any event at any time
as a snapshot.

In this type, additional date columns and the data of each row are updated with every milestone
of that product.

An example of an Accumulating Snapshot Fact Table.
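
A minimal sketch, assuming an order whose lifecycle has three milestones (ordered, shipped, delivered): one row is inserted when the order is placed and the same row is updated as each milestone is reached. The table, the milestone names and the `record_milestone` helper are invented for the example.

```python
import sqlite3

# Maps a milestone name to the date column it fills (hypothetical milestones).
MILESTONE_COLUMNS = {"shipped": "shipped_date", "delivered": "delivered_date"}

def record_milestone(conn, order_id, milestone, date):
    """Update the existing row in place when a milestone is reached."""
    column = MILESTONE_COLUMNS[milestone]  # whitelist lookup keeps the column name safe
    conn.execute(
        f"UPDATE fact_order_lifecycle SET {column} = ? WHERE order_id = ?",
        (date, order_id),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_order_lifecycle ("
    "order_id INTEGER PRIMARY KEY, ordered_date TEXT, shipped_date TEXT, delivered_date TEXT)"
)
conn.execute("INSERT INTO fact_order_lifecycle (order_id, ordered_date) VALUES (1, '2024-03-01')")

record_milestone(conn, 1, "shipped", "2024-03-03")
after_shipping = conn.execute("SELECT * FROM fact_order_lifecycle").fetchone()

record_milestone(conn, 1, "delivered", "2024-03-06")
after_delivery = conn.execute("SELECT * FROM fact_order_lifecycle").fetchone()
print(after_shipping, after_delivery)
```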


In addition to the above three types, here are a few other types of fact tables:
#4) Factless Fact Tables: A fact is a collection of measures, whereas a factless fact table
captures only events or conditions that do not contain any measures. A factless fact table is
mainly used to track a system. The data in these tables can be analyzed and used for reporting.
For example, you can look up which employees took leave in a year and what type of leave they
took. Including such measure-less details in a regular fact table would only increase its size.
An example of a Factless Fact Table is shown below.
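
The employee-leave case can be sketched as a table that holds only dimension keys. The table name, keys and sample rows are made up; note that the analysis counts rows because there is no measure column to sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Factless fact table: only foreign keys to dimensions, no numeric measures.
conn.execute(
    "CREATE TABLE fact_employee_leave ("
    "date_key INTEGER, employee_key INTEGER, leave_type_key INTEGER)"
)
conn.executemany(
    "INSERT INTO fact_employee_leave VALUES (?, ?, ?)",
    [(20240304, 1, 1), (20240305, 1, 1), (20240412, 2, 2), (20240520, 1, 2)],
)

# Days of leave per employee and leave type: the row count is the only "measure".
leave_days = conn.execute(
    "SELECT employee_key, leave_type_key, COUNT(*) FROM fact_employee_leave "
    "GROUP BY employee_key, leave_type_key ORDER BY employee_key, leave_type_key"
).fetchall()
print(leave_days)
```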

#5) Conformed Fact Tables: A conformed fact is a fact that can be referred to in the same way in
every data mart it is related to.

Specifications Of A Fact Table
Given below are the specifications of a fact table.
 • Fact name: A string that briefly describes the purpose of the fact table.
 • Business process: The business need to be fulfilled by that fact table.
 • Questions: A list of business questions that will be answered by that fact table.
 • Grain: The lowest level of detail associated with that fact table's data.
 • Dimensions: A list of all the dimension tables associated with that fact table.
 • Measures: The calculated values stored in the fact table.
 • Load frequency: The time interval at which data is loaded into the fact table.
 • Initial rows: The initial data populated in the fact table for the first time.

Example Of Dimensional Data Modeling
You can get an idea of how dimension tables and fact tables can be designed for a system by looking
at the dimensional data modeling diagram below for sales and orders.
