A slowly changing dimension (SCD) is a dimension whose data changes slowly and at unpredictable
times rather than at regular intervals. Modified data in dimension tables can be handled in different
ways, as explained below.
You can choose the SCD type individually for each attribute in a dimension table, so that each
attribute responds to a change in its own way.
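As a minimal sketch of choosing an SCD type per attribute, the Python snippet below assumes a hypothetical customer dimension in which `email` is treated as Type 1 (overwrite in place) and `city` as Type 2 (expire the current row and add a new version). The column names, the `SCD_TYPE_BY_ATTRIBUTE` mapping, and the `apply_change` helper are illustrative assumptions, not taken from the text above.

```python
from datetime import date

# Hypothetical per-attribute policy: 1 = overwrite, 2 = keep history in a new row.
SCD_TYPE_BY_ATTRIBUTE = {"email": 1, "city": 2}

def apply_change(dimension, natural_key, attribute, new_value, change_date):
    """Apply one attribute change to the current row of a dimension member."""
    current = next(
        row for row in dimension
        if row["customer_id"] == natural_key and row["is_current"]
    )
    if current[attribute] == new_value:
        return  # nothing changed
    if SCD_TYPE_BY_ATTRIBUTE[attribute] == 1:
        # Type 1: overwrite in place; the old value is lost.
        current[attribute] = new_value
    else:
        # Type 2: expire the current row and append a new version.
        current["is_current"] = False
        current["valid_to"] = change_date
        new_row = dict(current)
        new_row.update({
            "surrogate_key": max(r["surrogate_key"] for r in dimension) + 1,
            attribute: new_value,
            "valid_from": change_date,
            "valid_to": None,
            "is_current": True,
        })
        dimension.append(new_row)

customer_dim = [{
    "surrogate_key": 1, "customer_id": "C100", "email": "a@example.com",
    "city": "Pune", "valid_from": date(2020, 1, 1), "valid_to": None,
    "is_current": True,
}]

apply_change(customer_dim, "C100", "email", "b@example.com", date(2021, 5, 1))
apply_change(customer_dim, "C100", "city", "Mumbai", date(2021, 6, 1))
```

After both calls the dimension holds two rows for customer C100: the email change left the row count unchanged, while the city change added a second, current version.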
Fact tables are deep whereas dimension tables are wide, because fact tables have more rows and fewer
columns. The primary key of a fact table exists mainly to identify each row uniquely. In a fact table
it is usually a composite key made up of the foreign keys to the dimension tables.
If the composite key is missing from a fact table and two records hold the same data, it is very hard
to tell the records apart and to relate them to the dimension tables.
Hence, if no proper unique key exists as the composite key, it is good practice to generate a sequence
number for each fact table record. An alternative is to form a concatenated primary key, built row by
row by concatenating the primary keys of all the referenced dimension tables.
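The two key options can be sketched in Python as follows. The fact rows, the column names, and the `concatenated_key` helper are hypothetical and only illustrate the idea.

```python
from itertools import count

# Hypothetical fact rows, each holding foreign keys to three dimension tables.
sales_rows = [
    {"date_key": 20240101, "product_key": 7, "store_key": 3, "amount": 120.0},
    {"date_key": 20240101, "product_key": 7, "store_key": 3, "amount": 120.0},
]

# Option 1: a generated sequence number identifies every row, even duplicates.
sequence = count(start=1)
for row in sales_rows:
    row["sales_id"] = next(sequence)

# Option 2: a concatenated key built from the referenced dimension keys.
def concatenated_key(row):
    return f'{row["date_key"]}-{row["product_key"]}-{row["store_key"]}'

keys = [concatenated_key(row) for row in sales_rows]
```

In this sketch both rows produce the same concatenated key, so concatenation is only sufficient when the dimension keys fully determine a row; the sequence number distinguishes the rows regardless.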
A single fact table can be surrounded by multiple dimension tables. Through the foreign keys in the
fact table, the context (descriptive data) of each measured value can be looked up in the dimension
tables. Users can then drill down and roll up efficiently through queries.
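A minimal sketch of roll up and drill down, using Python's built-in `sqlite3` module with an in-memory database. The `dim_product` and `fact_sales` tables and their columns are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, category TEXT, product_name TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);
INSERT INTO dim_product VALUES
    (1, 'Books', 'Novel'), (2, 'Books', 'Atlas'), (3, 'Toys', 'Kite');
INSERT INTO fact_sales VALUES (1, 10.0), (1, 15.0), (2, 30.0), (3, 5.0);
""")

# Roll up: aggregate the measure to the coarser category level.
rollup = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

# Drill down: break each category into its individual products.
drilldown = conn.execute("""
    SELECT d.category, d.product_name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d ON d.product_key = f.product_key
    GROUP BY d.category, d.product_name
    ORDER BY d.category, d.product_name
""").fetchall()
```

The foreign key in the fact table is what lets the same measure be grouped at either level of the dimension.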
The lowest level of detail that can be stored in a fact table is known as its granularity (grain). The
finer the grain of a fact table, the more dimension tables it tends to be associated with; that is,
the smallest measurement values need more dimension tables to describe them.
In a dimensional model, a fact table resolves the many-to-many relationships between dimensions: each
fact row relates to one row in each dimension table through a many-to-one foreign key.
#3) Partitioning
Partition a large fact table physically into smaller tables for better query performance on bulk fact
data. No one except the DBAs and the ETL team needs to be aware of the partitions on facts.
For example, you can partition a table by month, by quarter, or by year. At query time, only the
relevant partitions are scanned instead of the entire table.
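The effect of partition pruning can be sketched in plain Python. Real databases implement partitioning natively; here a dictionary keyed by (year, month) stands in for the physical partitions, and all names and rows are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows.
fact_rows = [
    {"sale_date": date(2024, 1, 5), "amount": 100.0},
    {"sale_date": date(2024, 1, 20), "amount": 50.0},
    {"sale_date": date(2024, 2, 3), "amount": 75.0},
]

# Month-wise partitions: each (year, month) key holds its own small "table".
partitions = defaultdict(list)
for row in fact_rows:
    partitions[(row["sale_date"].year, row["sale_date"].month)].append(row)

def monthly_total(year, month):
    """Scan only the partition for the requested month."""
    scanned = partitions.get((year, month), [])
    return sum(r["amount"] for r in scanned), len(scanned)

total, rows_scanned = monthly_total(2024, 1)
```

The query for January touches two rows rather than all three, which is the saving that partitioning aims for on much larger tables.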
#4) Load In Parallel
Now that we have an idea of partitions on fact tables, note that they are also beneficial when loading
huge volumes of data into facts. To do this, first break the data logically into separate data files,
then run the ETL jobs to load all these logical portions of data in parallel.
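A hedged sketch of the parallel load using `concurrent.futures` from the Python standard library. The in-memory `data_portions` dictionary stands in for the separate data files, and `load_portion` stands in for an ETL job; both are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical logical portions; in practice these would be separate data files.
data_portions = {
    "2024-01": [("2024-01-05", 100.0), ("2024-01-20", 50.0)],
    "2024-02": [("2024-02-03", 75.0)],
    "2024-03": [("2024-03-11", 20.0), ("2024-03-12", 30.0)],
}

def load_portion(item):
    """Stand-in for one ETL job: transform one portion into fact rows."""
    partition_name, records = item
    loaded = [{"sale_date": d, "amount": a} for d, a in records]
    return partition_name, loaded

# Each portion is loaded by its own worker and lands in its own partition.
with ThreadPoolExecutor(max_workers=3) as pool:
    fact_partitions = dict(pool.map(load_portion, data_portions.items()))

rows_loaded = sum(len(rows) for rows in fact_partitions.values())
```

Because each portion maps to exactly one partition, the workers never write to the same target, which is what makes the parallel load safe in this sketch.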
Physical delete: Unwanted records are removed from the fact table permanently.
Logical delete: A new column such as ‘deleted’, of Bit or Boolean type, is added to the
fact table. It acts as a flag that marks the deleted records. You must ensure that
the deleted records are excluded whenever you query the fact table data.
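Both kinds of delete can be sketched with Python's built-in `sqlite3` module. The `fact_sales` table and its columns are hypothetical; SQLite has no dedicated Boolean type, so the flag is stored as an integer here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (
    sales_id INTEGER PRIMARY KEY,
    amount   REAL,
    deleted  INTEGER NOT NULL DEFAULT 0  -- logical-delete flag
);
INSERT INTO fact_sales (sales_id, amount) VALUES (1, 10.0), (2, 20.0), (3, 30.0);
""")

# Physical delete: the row is removed permanently.
conn.execute("DELETE FROM fact_sales WHERE sales_id = ?", (3,))

# Logical delete: the row stays, only the flag changes.
conn.execute("UPDATE fact_sales SET deleted = 1 WHERE sales_id = ?", (2,))

# Every query must filter out the flagged rows.
active_total = conn.execute(
    "SELECT SUM(amount) FROM fact_sales WHERE deleted = 0"
).fetchone()[0]
stored_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

Two rows remain stored, but only one contributes to the total, which shows why the filter on the flag cannot be forgotten.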
#7) Sequence For Updates And Deletes In A Fact Table
When data has to be updated, update the dimension tables first, then the surrogate keys in the lookup
table if necessary, and finally the respective fact table rows. Deletion happens in reverse, because
deleting the unwanted data from the fact tables first makes it easy to delete the linked unwanted
data from the dimension tables.
Follow this sequence in both cases, because dimension tables and fact tables must maintain
referential integrity at all times.
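The ordering can be demonstrated with `sqlite3` and foreign key enforcement switched on. The tables are hypothetical, and the lookup-table step for surrogate keys is left out of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales (
    sales_id    INTEGER PRIMARY KEY,
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    amount      REAL
);
""")

# Load/update order: dimension row first, then the fact row that references it.
conn.execute("INSERT INTO dim_product VALUES (1, 'Kite')")
conn.execute("INSERT INTO fact_sales VALUES (100, 1, 5.0)")

# Deleting the dimension row first would orphan the fact row, so it is rejected.
try:
    conn.execute("DELETE FROM dim_product WHERE product_key = 1")
    dimension_first_failed = False
except sqlite3.IntegrityError:
    dimension_first_failed = True

# Delete order: fact rows first, then the dimension row.
conn.execute("DELETE FROM fact_sales WHERE product_key = 1")
conn.execute("DELETE FROM dim_product WHERE product_key = 1")
remaining = conn.execute("SELECT COUNT(*) FROM dim_product").fetchone()[0]
```

With the constraint enforced, the wrong order fails immediately, while the reverse order completes cleanly.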
Types Of Facts
Based on the behavior of their data, fact tables are categorized as transaction fact tables, snapshot
fact tables, and accumulating snapshot fact tables. Each of the three types has different
characteristics and a different data load strategy.
For example, every sale or purchase made on a marketing website should be loaded into a
transaction fact table.
An example of a Transaction Fact Table is shown below.
A snapshot fact table therefore always holds aggregated data, which makes snapshot facts more complex
than transaction fact tables. For example, revenue performance report data can be stored in snapshot
fact tables for easy reference.
An example of a Periodic Snapshot Fact Table is shown below.
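As a minimal sketch of how such a snapshot could be derived, the Python snippet below aggregates hypothetical transaction rows into one row per month and product. The data, the column names, and the choice of a monthly period are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical transaction fact rows: one row per sale.
transactions = [
    {"sale_date": "2024-01-05", "product_key": 1, "amount": 100.0},
    {"sale_date": "2024-01-20", "product_key": 1, "amount": 50.0},
    {"sale_date": "2024-01-22", "product_key": 2, "amount": 40.0},
    {"sale_date": "2024-02-03", "product_key": 1, "amount": 75.0},
]

# Aggregate to the snapshot grain: one row per (month, product).
totals = defaultdict(float)
for t in transactions:
    month = t["sale_date"][:7]  # 'YYYY-MM'
    totals[(month, t["product_key"])] += t["amount"]

monthly_snapshot = [
    {"month": month, "product_key": product_key, "total_amount": total}
    for (month, product_key), total in sorted(totals.items())
]
```

Four transaction rows collapse into three snapshot rows, each summarizing one product for one period.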
#3) Accumulating Snapshot Fact Tables
Accumulating snapshot fact tables let you store data for the entire lifetime of a product. They act
as a combination of the two types above: data can be inserted by any event at any time, as a
snapshot.
In this type, additional date columns and the data in each row are updated with every milestone of
that product.
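A hedged sketch of the row-update pattern, assuming a hypothetical order that passes through three milestones (ordered, shipped, delivered). The milestone names, the `record_milestone` helper, and the derived `days_to_deliver` measure are illustrative assumptions.

```python
from datetime import date

# Hypothetical milestone date columns carried by every row.
MILESTONE_COLUMNS = ("order_date", "ship_date", "delivery_date")

order_fact = {}  # one row per order, keyed by order_id

def record_milestone(order_id, milestone, when):
    """Insert the row on the first event, then update it for later milestones."""
    if milestone not in MILESTONE_COLUMNS:
        raise ValueError(f"unknown milestone: {milestone}")
    row = order_fact.setdefault(
        order_id, {"order_id": order_id, **dict.fromkeys(MILESTONE_COLUMNS)}
    )
    row[milestone] = when
    # Derived measure, filled in once both ends of the process are known.
    if row["order_date"] and row["delivery_date"]:
        row["days_to_deliver"] = (row["delivery_date"] - row["order_date"]).days

record_milestone(501, "order_date", date(2024, 3, 1))
record_milestone(501, "ship_date", date(2024, 3, 3))
record_milestone(501, "delivery_date", date(2024, 3, 6))
```

Unlike a transaction fact table, the three events update a single row instead of adding three rows.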
#5) Conformed Fact Tables: A conformed fact is a fact that can be referenced in the same way from
every data mart it is related to.
Specifications Of A Fact Table
Given below are the specifications of a Fact Table.
Fact name: A string that briefly describes the functionality of the fact table.
Business process: Describes the business need fulfilled by that fact table.
Questions: Lists the business questions that will be answered by that fact
table.
Grain: Indicates the lowest level of detail associated with that fact table data.
Dimensions: Lists all the dimension tables associated with that fact table.
Measures: The calculated values stored in the fact table.
Load frequency: Represents the time intervals at which data is loaded into the fact table.
Initial rows: Refers to the initial data populated in the fact table for the first time.
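The specification items above could be captured as a small Python data structure. The `FactTableSpec` class, its field types, and the sample values are hypothetical; in particular, recording the initial rows as a row count is an assumption.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FactTableSpec:
    """One field per specification item listed above."""
    fact_name: str
    business_process: str
    questions: List[str]
    grain: str
    dimensions: List[str]
    measures: List[str]
    load_frequency: str
    initial_rows: int  # assumed here to be a row count

# Hypothetical specification for a sales fact table.
sales_spec = FactTableSpec(
    fact_name="fact_sales",
    business_process="Track product sales",
    questions=["What is the revenue per product per month?"],
    grain="One row per product per sale",
    dimensions=["dim_date", "dim_product", "dim_store"],
    measures=["quantity", "amount"],
    load_frequency="daily",
    initial_rows=0,
)
```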
Example Of Dimensional Data Modeling
You can get an idea of how dimension tables and fact tables can be designed for a system by looking
at the dimensional data modeling diagram below for sales and orders.