Dimensional Modeling: Corporate Technology Solutions

Corporate Technology Solutions
Dimensional Modeling
An Overview of Dimensional Modeling by Haitham Salawdeh
Table of Contents:
INTRODUCTION:
BASIC CONCEPTS AND TERMINOLOGY:
TYPES OF FACT TABLES:
DIMENSION TABLES
SLOWLY CHANGING DIMENSIONS:
DO NOTHING:
ROW CENTRIC VERSIONING:
COLUMN CENTRIC VERSIONING:
HYBRID APPROACH:
OTHER CONSIDERATIONS:
STEPS TO A DM:
SOME DOS AND DON'TS:
REFERENCES:
Haitham Salawdeh
Page 2
June, 2009
Introduction:
Multi-Dimensional Modeling (DM) stands in contrast to the normalized model (NM) in
many ways. First and foremost, a structure modeled dimensionally is easier to navigate
and understand. This is a plus, especially when the user of the model is not familiar with
database technologies and tools. Next, the audience of a normalized model is generally
a developer or someone capable of navigating normalized structures effectively.
Lastly, the emphasis of a normalized model is on reducing data redundancy and making
sure a datum is updated once and in one place.
When ease of user navigation and performance are the primary concerns, data
normalization becomes an obstacle. DM is generally used in the context of data
warehousing. In this context the data warehouse is loaded once and accessed many
times. NM, on the other hand, is well suited for systems where inserts, updates, and
deletes happen frequently.
This overview is meant to introduce the concepts and language of Multi-Dimensional
Modeling (DM). I will not spend a lot of time developing the motivation for DM or
warehousing.
Basic Concepts and Terminology:

A dimensional model is also called a star schema. A star is a model that has a central
table, called the fact table, surrounded by dimension tables. Figure 1 shows an example of a
star where the fact is the Transaction table and the dimensions are Account and Transaction
Date. The non-key values of a fact table are called measures. Measures are numerical
values, and they can be additive or semi-additive. An example of an additive measure
is the transaction amount in the Transaction fact table depicted in Figure 1.
The amounts can be added over a period to produce a meaningful total transaction
amount. A semi-additive fact, on the other hand, would be the market value in a
holdings fact table of a fund company. Market values can be added across holdings on a
single day, but adding them across the days of a month is not meaningful. However, it is
possible to report an average market value instead. In Figure 2 the BalanceAmount field
is semi-additive.
Figure 1: A transaction fact table for a bank.
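The difference between additive and semi-additive measures can be sketched in Python. This is a minimal illustration that assumes hypothetical in-memory rows rather than real tables. Transaction amounts are summed freely. Daily balances are summed across accounts on one day and averaged, not summed, across days.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transaction facts: (account, day, amount). Amount is additive.
transactions = [
    ("A1", 1, 100.0),
    ("A1", 2, -40.0),
    ("A2", 2, 250.0),
]

# Hypothetical daily balance facts: (account, day, balance). Balance is semi-additive.
balances = [
    ("A1", 1, 100.0), ("A2", 1, 0.0),
    ("A1", 2, 60.0), ("A2", 2, 250.0),
]

# Additive: summing amounts across accounts and days gives a meaningful total.
month_total = sum(amount for _, _, amount in transactions)

# Semi-additive: summing balances across accounts on a single day is meaningful...
bank_balance_by_day = defaultdict(float)
for _, day, balance in balances:
    bank_balance_by_day[day] += balance

# ...but across days we average instead (summing the days would give a meaningless 410.0).
avg_bank_balance = mean(bank_balance_by_day.values())

print(month_total)                # 310.0
print(dict(bank_balance_by_day))  # {1: 100.0, 2: 310.0}
print(avg_bank_balance)           # 205.0
```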

Sometimes portions of a star are normalized. The design is then said to be
snowflaked. An example of a snowflake design is a security dimension table that has a
relationship to an industry classification table. To undo the snowflake, the
relationships generating it have to be flattened. In this example, the industry
assignment would be made part of the security table. Later we will see a
snowflake that cannot be avoided. Figure 2 illustrates such a situation, where the
relationship between accounts and customers cannot be flattened and needs to remain
normalized.
Figure 2: A simple balance fact of a bank and its dimensions.
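The following minimal sketch flattens the security/industry snowflake described above, using Python's built-in sqlite3 module. The table names (industry, security_snowflake, security_dim), the column names, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Snowflaked form (hypothetical): the security table references a separate industry table.
cur.executescript("""
CREATE TABLE industry (industry_key INTEGER PRIMARY KEY, industry_name TEXT);
CREATE TABLE security_snowflake (
    security_key INTEGER PRIMARY KEY,
    cusip TEXT,
    security_name TEXT,
    industry_key INTEGER REFERENCES industry(industry_key)
);
INSERT INTO industry VALUES (1, 'Banking'), (2, 'Software');
INSERT INTO security_snowflake VALUES
    (10, 'CUSIP-A', 'Example Bank Corp', 1),
    (11, 'CUSIP-B', 'Example Soft Inc', 2);
""")

# Flattened form: the industry assignment becomes a column of the security dimension.
cur.executescript("""
CREATE TABLE security_dim AS
SELECT s.security_key, s.cusip, s.security_name, i.industry_name
FROM security_snowflake AS s
JOIN industry AS i ON i.industry_key = s.industry_key;
""")

rows = cur.execute(
    "SELECT security_key, industry_name FROM security_dim ORDER BY security_key"
).fetchall()
print(rows)  # [(10, 'Banking'), (11, 'Software')]
```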

When modeling a business process, one should reuse what has been modeled previously.
Other modeling sessions might have defined dimensions and facts that can be reused.
This notion is known as conforming dimensions. When developing a warehouse
iteratively, this idea becomes paramount. The use of a dimension is conforming if we use
an already defined dimension or a subset of it. Conforming dimensions coupled with
conforming facts are the foundation of the warehouse bus architecture advocated by
Ralph Kimball.
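Conforming dimensions make it possible to combine results from different fact tables. The sketch below shows a "drill-across" query under a few assumptions: the table names are hypothetical, and the date key uses YYYYMMDD form. Each fact table is summarized separately by an attribute of the shared date dimension, and the two summaries are then joined on that attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One conforming date dimension shared by two hypothetical fact tables.
conn.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, calendar_month TEXT);
CREATE TABLE transaction_fact (date_key INTEGER REFERENCES date_dim, amount REAL);
CREATE TABLE balance_fact (date_key INTEGER REFERENCES date_dim, balance REAL);
INSERT INTO date_dim VALUES (20090601, '2009-06'), (20090602, '2009-06');
INSERT INTO transaction_fact VALUES (20090601, 100.0), (20090602, 50.0);
INSERT INTO balance_fact VALUES (20090601, 1000.0), (20090602, 1050.0);
""")

# Summarize each fact by the shared attribute, then join the summaries.
query = """
WITH tx AS (
    SELECT d.calendar_month, SUM(t.amount) AS total_amount
    FROM transaction_fact t JOIN date_dim d ON d.date_key = t.date_key
    GROUP BY d.calendar_month
),
bal AS (
    SELECT d.calendar_month, AVG(b.balance) AS avg_balance
    FROM balance_fact b JOIN date_dim d ON d.date_key = b.date_key
    GROUP BY d.calendar_month
)
SELECT tx.calendar_month, tx.total_amount, bal.avg_balance
FROM tx JOIN bal ON bal.calendar_month = tx.calendar_month
"""
result = conn.execute(query).fetchall()
print(result)  # [('2009-06', 150.0, 1025.0)]
```

The join on calendar_month only works because both facts use the same date dimension. If each fact table had its own differently defined month attribute, the results could not be combined reliably.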
Types of Fact Tables:

In addition to differing in the additivity of their measures, fact tables come in four
flavors: transactional, periodic, accumulative, and factless. First, a fact table can be a
transactional snapshot. A transactional snapshot fact table records business events at
the point in time at which each one occurred. An example of a transactional snapshot
would be a table capturing all order details of a retail website. Figure 1 shows another
example of a transactional fact table.
While a transactional snapshot represents a point in time, a periodic snapshot
represents a predefined interval or period, recording the state of the business once per
interval. A daily balance fact table of a bank, as depicted in Figure 2, is an example of a
periodic snapshot.
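To contrast the two flavors, the following sketch derives a daily periodic snapshot for one account from sparse transactional facts. The account key, the day numbers, and the amounts are all hypothetical. The point is that the snapshot has a row for every day, including days with no transactions.

```python
from itertools import accumulate

# Hypothetical transactional facts for account "A1": day -> net amount posted that day.
# Days without activity simply have no entry.
daily_net = {1: 100.0, 3: -30.0, 4: 20.0}

# The periodic (daily) snapshot has exactly one row per account per day,
# including days with no transactions; the balance carries forward.
days = range(1, 6)
running = list(accumulate(daily_net.get(day, 0.0) for day in days))
snapshot = [("A1", day, balance) for day, balance in zip(days, running)]

print(running)  # [100.0, 100.0, 70.0, 90.0, 90.0]
```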
Furthermore, an accumulative snapshot represents business activities over a time
period. An accumulative fact could represent the fulfillment process of a mutual fund
company. A row in that table captures a customer's first contact date, then the
literature send date, and finally the account open date. Because the same row is updated
as each milestone is reached, this is one of the few times a fact table is updated in a data
warehouse. Otherwise, a fact table row is not updated after it is loaded.
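Below is a minimal sketch of an accumulative snapshot row for the fulfillment example. The FulfillmentFact record, the MILESTONES tuple, and the record_milestone helper are hypothetical. The row is inserted at the first milestone and then updated in place as later milestone dates arrive.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class FulfillmentFact:
    """One row of a hypothetical accumulative fulfillment fact table."""
    customer_key: int
    first_contact_date: Optional[date] = None
    literature_sent_date: Optional[date] = None
    account_open_date: Optional[date] = None


MILESTONES = ("first_contact_date", "literature_sent_date", "account_open_date")
facts: Dict[int, FulfillmentFact] = {}


def record_milestone(customer_key: int, milestone: str, when: date) -> None:
    """Insert the row at the first milestone, then update the same row in place."""
    if milestone not in MILESTONES:
        raise ValueError(f"unknown milestone: {milestone}")
    row = facts.setdefault(customer_key, FulfillmentFact(customer_key))
    setattr(row, milestone, when)


record_milestone(42, "first_contact_date", date(2009, 6, 1))
record_milestone(42, "literature_sent_date", date(2009, 6, 3))
record_milestone(42, "account_open_date", date(2009, 6, 15))
print(facts[42])
```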
Finally, a fact table does not have to contain measures. It can simply be a
bridge between dimensions. In that case the fact table is said to be factless.
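A factless fact table can be sketched as a table that holds only dimension keys. The promotion-offer table below is a hypothetical example and does not come from this overview. Questions against it are answered by counting rows rather than by summing measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical factless fact table: foreign keys only, no measures.
# Each row records that a promotion was offered to an account on a date.
conn.executescript("""
CREATE TABLE promotion_offer_fact (
    date_key INTEGER,
    account_key INTEGER,
    promotion_key INTEGER,
    PRIMARY KEY (date_key, account_key, promotion_key)
);
INSERT INTO promotion_offer_fact VALUES
    (20090601, 1, 7), (20090601, 2, 7), (20090602, 1, 8);
""")

# Counting rows answers "how many offers were made per promotion?"
offers_per_promotion = conn.execute("""
    SELECT promotion_key, COUNT(*) FROM promotion_offer_fact
    GROUP BY promotion_key ORDER BY promotion_key
""").fetchall()
print(offers_per_promotion)  # [(7, 2), (8, 1)]
```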
Dimension Tables:
The dimensions define a structure around the facts. It is imperative that all dimensions
be denormalized to ease the navigation of the model. In some cases dimensions have
many-to-many relationships that cannot be flattened. It is also possible that the
many-to-many relationship is between a fact and a dimension. It is appropriate then to
have a multi-valued dimension to implement the relationship. A multi-valued dimension is
simply a bridge table between the entities involved in the many-to-many relationship.
Figure 2 shows a multi-valued dimension table depicting account ownership. An account can
have multiple relationships to customers, including primary, secondary, and custodian. In
addition, the number of secondary customers for an account might be unbounded. It is
best in such situations to normalize the dimensional model, as we have done here.
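The account ownership bridge can be sketched with sqlite3 as follows. The table names, the ownership_role column, and the sample data are hypothetical stand-ins for the structures in Figure 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical bridge (multi-valued dimension) between accounts and customers.
conn.executescript("""
CREATE TABLE account_dim (account_key INTEGER PRIMARY KEY, account_number TEXT);
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE account_customer_bridge (
    account_key INTEGER REFERENCES account_dim,
    customer_key INTEGER REFERENCES customer_dim,
    ownership_role TEXT,  -- e.g. 'primary', 'secondary', 'custodian'
    PRIMARY KEY (account_key, customer_key)
);
CREATE TABLE balance_fact (date_key INTEGER, account_key INTEGER, balance_amount REAL);

INSERT INTO account_dim VALUES (1, 'ACC-1'), (2, 'ACC-2');
INSERT INTO customer_dim VALUES (100, 'Alice'), (101, 'Bob');
INSERT INTO account_customer_bridge VALUES
    (1, 100, 'primary'), (1, 101, 'secondary'), (2, 101, 'primary');
INSERT INTO balance_fact VALUES (20090601, 1, 500.0), (20090601, 2, 200.0);
""")

# Balances reachable by each customer through any ownership role.
rows = conn.execute("""
    SELECT c.customer_name, SUM(f.balance_amount)
    FROM balance_fact f
    JOIN account_customer_bridge b ON b.account_key = f.account_key
    JOIN customer_dim c ON c.customer_key = b.customer_key
    WHERE f.date_key = 20090601
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()
print(rows)  # [('Alice', 500.0), ('Bob', 700.0)]
```

Account 1 is reached through two customers, so adding the per-customer totals together would count its balance twice. Queries that go through a bridge table need to account for this.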
Furthermore, when conforming dimensions we may need to use the same dimension under
different names. An example is a transaction fact table that refers to both a settlement
date and a transaction date. The two dates should refer to a single conforming date
dimension. One way to do that is to introduce views on top of the date dimension for the
settlement date and the transaction date. In this case the date dimension is said to be
role-playing. For example, in Figure 1 the TransactionDate dimension can be a view
over the Date dimension depicted in Figure 2.
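Role-playing views can be sketched in SQL, run here through Python's sqlite3 module. The view and column names are hypothetical. Both views read the same physical date_dim table but expose role-specific column names, so the fact table can join to the date dimension twice without ambiguity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical, conforming date dimension (hypothetical columns).
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, calendar_date TEXT, day_of_week TEXT);
INSERT INTO date_dim VALUES
    (20090601, '2009-06-01', 'Monday'),
    (20090604, '2009-06-04', 'Thursday');

-- Role-playing views give the same dimension role-specific column names.
CREATE VIEW transaction_date_dim AS
SELECT date_key AS transaction_date_key, calendar_date AS transaction_date,
       day_of_week AS transaction_day_of_week
FROM date_dim;
CREATE VIEW settlement_date_dim AS
SELECT date_key AS settlement_date_key, calendar_date AS settlement_date,
       day_of_week AS settlement_day_of_week
FROM date_dim;

CREATE TABLE transaction_fact (
    transaction_date_key INTEGER, settlement_date_key INTEGER, amount REAL
);
INSERT INTO transaction_fact VALUES (20090601, 20090604, 1000.0);
""")

row = conn.execute("""
    SELECT t.transaction_day_of_week, s.settlement_day_of_week, f.amount
    FROM transaction_fact f
    JOIN transaction_date_dim t ON t.transaction_date_key = f.transaction_date_key
    JOIN settlement_date_dim s ON s.settlement_date_key = f.settlement_date_key
""").fetchone()
print(row)  # ('Monday', 'Thursday', 1000.0)
```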
In addition, we may be left with some attributes that do not fit in any of the
extracted dimensions and don't group well together. In that case it is not
recommended to keep these attributes in the fact table. Instead, they can be pulled
out into a junk dimension. Figure 3 shows how two indicators were grouped into a junk
dimension. The TransactionIndicators dimension will have four rows, one for each
combination of the two indicators' possible values. As we add more indicators and
multi-valued columns, this dimension will grow. At some point we might need to split a
junk dimension if its columns are highly unrelated.
Figure 3: Some indicators grouped as part of a junk dimension.
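A junk dimension can be generated as the Cartesian product of the indicators' possible values. The indicator names is_online and is_reversal are hypothetical, because Figure 3's actual columns are not reproduced in this text. With two Y/N indicators, the result has four rows.

```python
from itertools import product

# Hypothetical Y/N indicators that did not fit any other dimension.
indicators = {
    "is_online": ("Y", "N"),
    "is_reversal": ("Y", "N"),
}

# One junk-dimension row per combination of indicator values,
# each identified by its own surrogate key starting at 1.
names = list(indicators)
junk_dim = {
    key: dict(zip(names, combo))
    for key, combo in enumerate(product(*indicators.values()), start=1)
}

# During loading, a fact row's indicator values are mapped to a single junk key.
junk_key_by_values = {tuple(row.values()): key for key, row in junk_dim.items()}

print(len(junk_dim))                   # 4
print(junk_key_by_values[("N", "Y")])  # 3
```

Adding a third Y/N indicator would double the dimension to eight rows. This growth is the one the text warns about as more indicators are added.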
Slowly Changing Dimensions:

The fact tables in a data warehouse rarely see an update. They are generally appended
to, or truncated and reloaded. The dimension tables, on the other hand, do need to
represent change, and in some cases the change happens rapidly. This issue is known
as a slowly changing dimension (SCD). There are a few common ways to deal with SCDs.
Do Nothing:
In the data modeling literature, this is known as type 1 SCD. In this scenario you simply
overwrite the old values with the new ones, and history is lost forever. If there is no
requirement to keep history, this might be the approach to take. However, it is
important to understand that this approach comes at a cost: all cubes built on the
dimension will have to be rebuilt. Otherwise, reports and cubes will have dead paths.
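A type 1 change is a plain overwrite. In this minimal sketch, the dictionary-based security dimension and the apply_type1 helper are hypothetical.

```python
# Hypothetical security dimension keyed by surrogate key.
security_dim = {
    10: {"cusip": "CUSIP-OLD", "security_name": "Example Fund"},
}


def apply_type1(dim, surrogate_key, changes):
    """Type 1 SCD: overwrite attributes in place; prior values are not kept anywhere."""
    dim[surrogate_key].update(changes)


apply_type1(security_dim, 10, {"security_name": "Example Fund (Renamed)"})
print(security_dim[10])  # {'cusip': 'CUSIP-OLD', 'security_name': 'Example Fund (Renamed)'}
```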
Row Centric Versioning:

This is known as type 2 SCD. In type 2, for each change in a dimension we generate a
new row. In the security dimension mentioned above, we would add a new dimension row
for a security whose CUSIP changes. This allows us to represent history accurately.
New holdings will point to the new dimension row, and old holdings will continue to point
to the old one. The drawback of this approach is that it does not allow us to associate the
old facts with the new values. Depending on the requirements, this could be
unacceptable.
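A type 2 change can be sketched as expiring the current row and appending a new version with a new surrogate key. Several pieces are assumptions made for illustration: the security_id column (an assumed durable source identifier), the effective_from and is_current columns, and the apply_type2 helper. The overview itself only requires that a new row be generated.

```python
from datetime import date

# Hypothetical type 2 security dimension. security_id is an assumed durable
# source identifier; security_key is the warehouse surrogate key.
security_dim = [
    {"security_key": 1, "security_id": "S-100", "cusip": "CUSIP-OLD",
     "effective_from": date(2008, 1, 1), "is_current": True},
]


def apply_type2(dim, security_id, changes, as_of):
    """Type 2 SCD: expire the current row and append a new version; return its key."""
    current = next(r for r in dim if r["security_id"] == security_id and r["is_current"])
    new_key = max(r["security_key"] for r in dim) + 1
    current["is_current"] = False
    dim.append({**current, **changes, "security_key": new_key,
                "effective_from": as_of, "is_current": True})
    return new_key


new_key = apply_type2(security_dim, "S-100", {"cusip": "CUSIP-NEW"}, date(2009, 6, 1))

# Old holdings keep pointing at the old row; new holdings are loaded with the new key.
holdings = [
    {"holding_date": date(2009, 5, 29), "security_key": 1, "market_value": 100.0},
    {"holding_date": date(2009, 6, 1), "security_key": new_key, "market_value": 105.0},
]
print(new_key)  # 2
```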
Column Centric Versioning:

This is known as type 3 SCD. When using type 3, a designer identifies the columns that
need to be tracked and over how many changes. Each tracked column is then duplicated
into additional columns named by some sequence scheme. In our security dimension
example, if CUSIP is the column to be tracked, then we need to decide how many changes
to track. If we only want to keep track of a single change event, we would create a new
column and call it previous CUSIP.
The issue with this approach is apparent: the change tracking is built into the
structure of the table and cannot be revised easily. If it is important to track many
change events, this approach is difficult to manage.
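The following sketch shows a type 3 change that tracks a single prior CUSIP. The column names and the apply_type3 helper are hypothetical. The docstring notes the limitation discussed above: a second change discards the oldest value.

```python
# Hypothetical type 3 security dimension tracking a single CUSIP change.
security_dim = {
    1: {"cusip": "CUSIP-OLD", "previous_cusip": None},
}


def apply_type3(dim, surrogate_key, new_cusip):
    """Type 3 SCD: shift the current value into the 'previous' column, then overwrite.

    Only one prior value survives; a further change discards the oldest value.
    """
    row = dim[surrogate_key]
    row["previous_cusip"] = row["cusip"]
    row["cusip"] = new_cusip


apply_type3(security_dim, 1, "CUSIP-NEW")
print(security_dim[1])  # {'cusip': 'CUSIP-NEW', 'previous_cusip': 'CUSIP-OLD'}
```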
Hybrid Approach:
I personally like to use type 2 SCD to capture the change and type 3 to chain the
dimension rows together. In the CUSIP change example earlier, I would create a new
security dimension row with the new values and add a new column to the security
dimension called previous security. This column points to the security row that just
changed. Although it is hard for a user to navigate the chain, and although I otherwise
refrain from using recursion, I feel this approach meets many requirements. It might
require IT or some technical users to help predefine navigation paths for less
experienced users.
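The hybrid approach can be sketched with sqlite3. Each CUSIP change inserts a new row, and a previous_security_key column (a hypothetical name for the "previous security" column) links it to the row it replaced. A recursive common table expression is one way to predefine a navigation path for users. This relies on WITH RECURSIVE, which SQLite has supported since version 3.8.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical hybrid security dimension: each CUSIP change adds a new row (type 2)
# and previous_security_key links it to the row it replaced (type 3 style chaining).
conn.executescript("""
CREATE TABLE security_dim (
    security_key INTEGER PRIMARY KEY,
    cusip TEXT,
    previous_security_key INTEGER REFERENCES security_dim(security_key)
);
INSERT INTO security_dim VALUES
    (1, 'CUSIP-A', NULL),
    (2, 'CUSIP-B', 1),
    (3, 'CUSIP-C', 2);
""")

# Walk the chain from a given row back to the original one, newest first.
chain = conn.execute("""
    WITH RECURSIVE lineage(security_key, cusip, previous_security_key, depth) AS (
        SELECT security_key, cusip, previous_security_key, 0
        FROM security_dim WHERE security_key = ?
        UNION ALL
        SELECT s.security_key, s.cusip, s.previous_security_key, l.depth + 1
        FROM security_dim s JOIN lineage l ON s.security_key = l.previous_security_key
    )
    SELECT security_key, cusip FROM lineage ORDER BY depth
""", (3,)).fetchall()
print(chain)  # [(3, 'CUSIP-C'), (2, 'CUSIP-B'), (1, 'CUSIP-A')]
```

Wrapping such a query in a view is one way for technical staff to hand less experienced users a ready-made path instead of asking them to write recursion themselves.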
Other Considerations:
In some dimensions, some fields change much more frequently than others. It is
acceptable to break out the rapidly changing fields from the others. However,
the split has to be done in a way that makes sense to the business users of the model,
who are not aware of the technical motivation for the split.
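One way to break out rapidly changing fields is to move them into their own small dimension and carry both keys on each fact row. The attribute names (risk_profile, contact_preference) and the status_key_for helper are hypothetical. With this split, a change in status reuses or adds a status row and does not create a new version of the customer row.

```python
# Hypothetical split: stable customer attributes stay in customer_dim, while
# rapidly changing ones move to customer_status_dim. Each fact row carries both keys.
customer_dim = {1: {"customer_name": "Alice", "home_branch": "Downtown"}}
customer_status_dim = {}  # (risk_profile, contact_preference) -> status_key


def status_key_for(risk_profile, contact_preference):
    """Reuse an existing status row or add one; customer_dim is never versioned here."""
    combo = (risk_profile, contact_preference)
    if combo not in customer_status_dim:
        customer_status_dim[combo] = len(customer_status_dim) + 1
    return customer_status_dim[combo]


balance_facts = [
    {"date_key": 20090601, "customer_key": 1,
     "status_key": status_key_for("low", "email"), "balance": 500.0},
    {"date_key": 20090602, "customer_key": 1,
     "status_key": status_key_for("high", "email"), "balance": 480.0},
    {"date_key": 20090603, "customer_key": 1,
     "status_key": status_key_for("low", "email"), "balance": 510.0},
]
print([f["status_key"] for f in balance_facts])  # [1, 2, 1]
```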
Steps to a DM:
Now that we have covered the terminology of DM, how do we create a dimensional
model? The first step in creating a DM is to decide which business process(es) to
model. As part of that, the business requirements need to be gathered and the
available data needs to be understood.
The next step is to decide on the grain of the model. The best and most flexible
approach is to go for the lowest available grain. By doing this you give the users of your
model a way to summarize data in an ad hoc manner. It also allows your users to drill
from a summary view down to supporting details. In addition, keep all the measures of a
fact table at the same grain. An example of a grain declaration when modeling holdings
for a fund company would state, "We will model account holdings per day."
The third step is to define the dimensions. In the above example the dimensions for a
holding will include portfolio, holding date and security.
Finally, we derive the facts described by the dimensions. The facts in the above example
will include holding market value and holding cost basis.
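The holdings example can be sketched as a fact table whose primary key matches the declared grain, so that a second row for the same portfolio, date, and security is rejected. The table name, column names, and sample values are hypothetical. The final query shows the kind of ad hoc summary that storing the lowest grain allows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical holdings fact at the declared grain: one row per portfolio,
# per holding date, per security. The primary key enforces that grain.
conn.executescript("""
CREATE TABLE holding_fact (
    portfolio_key INTEGER,
    holding_date_key INTEGER,
    security_key INTEGER,
    market_value REAL,
    cost_basis REAL,
    PRIMARY KEY (portfolio_key, holding_date_key, security_key)
);
INSERT INTO holding_fact VALUES
    (1, 20090601, 10, 1500.0, 1200.0),
    (1, 20090601, 11, 500.0, 600.0),
    (2, 20090601, 10, 300.0, 250.0);
""")

# A second row at the same grain violates the primary key and is rejected.
try:
    conn.execute("INSERT INTO holding_fact VALUES (1, 20090601, 10, 1.0, 1.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Summarize ad hoc from the lowest grain: totals per portfolio for one day.
by_portfolio = conn.execute("""
    SELECT portfolio_key, SUM(market_value), SUM(cost_basis)
    FROM holding_fact WHERE holding_date_key = 20090601
    GROUP BY portfolio_key ORDER BY portfolio_key
""").fetchall()
print(duplicate_rejected, by_portfolio)  # True [(1, 2000.0, 1800.0), (2, 300.0, 250.0)]
```

The query sums market value across securities within a single day, which is valid for a semi-additive measure. Summing it across days would not be.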
Some Dos and Don'ts:

The use of surrogate (non-natural) keys is highly encouraged when designing a
warehouse. This isolates the warehouse from changes that happen to the natural
key. It can also improve performance, depending on the data type of the natural key; for
example, a compact integer key is typically cheaper to join on than a long character key.
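Surrogate key assignment can be sketched as a lookup from natural keys to compact integers. The SurrogateKeyMap class and the account number formats are hypothetical. The rekey method shows how the map absorbs a change to a natural key while facts keep their surrogate key.

```python
from itertools import count


class SurrogateKeyMap:
    """Hypothetical map from natural keys to compact integer surrogate keys."""

    def __init__(self):
        self._keys = {}
        self._next_key = count(1)

    def key_for(self, natural_key):
        # Reuse the existing surrogate key, or assign the next integer.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next_key)
        return self._keys[natural_key]

    def rekey(self, old_natural_key, new_natural_key):
        # A natural-key change in the source touches only the map;
        # fact rows keep referring to the same surrogate key.
        self._keys[new_natural_key] = self._keys.pop(old_natural_key)


accounts = SurrogateKeyMap()
k1 = accounts.key_for("BR01-000123456789")
k2 = accounts.key_for("BR02-000987654321")
accounts.rekey("BR01-000123456789", "BR09-000123456789")
print(k1, k2, accounts.key_for("BR09-000123456789"))  # 1 2 1
```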
In addition, avoid the temptation of normalization. Many application developers and
modelers cannot resist the temptation of snowflaking. If you are one of those people,
stop it! Help your users and don't give in to your vices. Having said that, snowflaked
models are sometimes necessary and unavoidable. Multi-valued dimensions are an
example of when snowflaking is acceptable and indeed necessary.
Finally, dimension and fact conformance is a must in a successful warehouse
implementation. If you are to guarantee that your warehouse is expandable, you have to
enforce fact and dimension conformance.
References:
Ralph Kimball and Margy Ross (2002). The Data Warehouse Toolkit. Second Edition.
New York: Wiley Computer Publishing.