Dimensional Modeling: Corporate Technology Solutions
An Overview of Dimensional Modeling by Haitham Salawdeh
Table of Contents:
INTRODUCTION:
DIMENSION TABLES
DO NOTHING:
ROW CENTRIC VERSIONING:
COLUMN CENTRIC VERSIONING:
HYBRID APPROACH:
OTHER CONSIDERATIONS:
STEPS TO A DM
REFERENCES:
Haitham Salawdeh
Page 2
June, 2009
Introduction:
Multi-Dimensional Modeling (DM) stands in contrast to the normalized model (NM) in
many ways. First and foremost, a structure that is modeled dimensionally is easier to
navigate and understand. This is a plus, especially when the user of the model is not
familiar with database technologies and tools. By contrast, the audience of a normalized
model is generally a developer or someone capable of navigating the normalized
structures effectively. Lastly, the emphasis of a normalized model is on reducing data
redundancy and making sure a datum is updated once and in one place.
When ease of user navigation and performance are the primary concerns, data
normalization becomes an obstacle. DM is generally used in the context of data
warehousing, where the data warehouse is loaded once and accessed many times.
NM, on the other hand, is well suited for systems where updates, inserts and
deletes happen frequently.
This Overview is meant to introduce the concepts and language of Multi-Dimensional
Modeling (DM). I will not spend a lot of time developing the motivation for DM or
Warehousing.
While a transactional snapshot represents a point in time, a periodic snapshot
represents a pre-defined interval or a period. A daily balance fact table of a bank as
depicted in Figure 2 is an example of a periodic snapshot.
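A periodic snapshot can be derived from transaction-level data. The following is a minimal sketch, not the author's method: the account identifier, the sample transactions and the function name `daily_balance_snapshot` are hypothetical, and the balance is assumed to be zero before the first day.

```python
from datetime import date, timedelta

# Hypothetical transaction-grain facts: (account, transaction date, signed amount).
transactions = [
    ("ACC-1", date(2009, 6, 1), 100.0),
    ("ACC-1", date(2009, 6, 1), -30.0),
    ("ACC-1", date(2009, 6, 3), 50.0),
]


def daily_balance_snapshot(rows, account, start, end):
    """Return one (day, end-of-day balance) row per day in [start, end].

    Assumes the balance is zero before `start`.
    """
    snapshot = []
    balance = 0.0
    day = start
    while day <= end:
        # Apply every transaction that happened on this day.
        balance += sum(amt for acc, d, amt in rows if acc == account and d == day)
        # A row is emitted even on days without transactions.
        snapshot.append((day, balance))
        day += timedelta(days=1)
    return snapshot


snapshot = daily_balance_snapshot(
    transactions, "ACC-1", date(2009, 6, 1), date(2009, 6, 3)
)
```

Note that the snapshot carries a row for June 2 even though no transaction occurred that day, which is what distinguishes it from the transaction-grain data.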
Dimension Tables:
The dimensions define a structure around the facts. It is imperative that all dimensions
be denormalized to ease the navigation of the model. In some cases dimensions have
many-to-many relationships that cannot be flattened. It is also possible that the
many-to-many relationship is between a fact and a dimension. It is then appropriate to
use a multi-valued dimension to implement the relationship. A multi-valued dimension is
simply a bridge table between the entities involved in the many-to-many relationship.
Figure 2 shows a multi-valued dimension table depicting account ownership. An account
can have multiple relationships to customers, including primary, secondary and
custodian. In addition, the number of secondary customers for an account might not be
bounded. In such situations it is best to normalize the dimensional model, as we have
done here.
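As an illustration of the bridge table idea, here is a minimal sketch using an in-memory SQLite database. The table and column names (`Customer`, `Account`, `AccountOwnership`, `role`) and the sample rows are assumptions made for this example and are not taken from Figure 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Account (account_key INTEGER PRIMARY KEY, account_number TEXT);

-- Bridge (multi-valued dimension): one row per account/customer/role,
-- so an account can have any number of secondary customers.
CREATE TABLE AccountOwnership (
    account_key  INTEGER REFERENCES Account(account_key),
    customer_key INTEGER REFERENCES Customer(customer_key),
    role TEXT CHECK (role IN ('primary', 'secondary', 'custodian')),
    PRIMARY KEY (account_key, customer_key, role)
);

INSERT INTO Customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO Account VALUES (10, 'A-100');
INSERT INTO AccountOwnership VALUES
    (10, 1, 'primary'), (10, 2, 'secondary'), (10, 3, 'secondary');
""")

# Resolve all customers related to account 10 through the bridge.
owners = conn.execute("""
    SELECT c.name, o.role
    FROM AccountOwnership o
    JOIN Customer c ON c.customer_key = o.customer_key
    WHERE o.account_key = 10
    ORDER BY c.name
""").fetchall()
```

Adding another secondary customer is a new bridge row, not a new column, which is why the unbounded case does not force a schema change.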
Furthermore, when attempting to conform dimensions we are faced with using the same
dimension under different names. An example is a transaction fact table that refers to
both a settlement date and a transaction date. Both dates should refer to a single
conformed date dimension. One way to do that is to introduce views on top of the date
dimension, one for the settlement date and one for the transaction date. In this case
the date dimension is said to be role-playing. For example, in Figure 1 the
TransactionDate dimension can be a view over the Date dimension depicted in Figure 2.
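A role-playing date dimension might be sketched as follows with SQLite views. The physical table name `DateDim`, the column names and the `TransactionFact` layout are assumptions for illustration; only the TransactionDate view name comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical, conformed date dimension.
CREATE TABLE DateDim (date_key INTEGER PRIMARY KEY, calendar_date TEXT);

-- Two roles exposed as views, each with role-specific column names.
CREATE VIEW TransactionDate AS
    SELECT date_key AS transaction_date_key,
           calendar_date AS transaction_date
    FROM DateDim;
CREATE VIEW SettlementDate AS
    SELECT date_key AS settlement_date_key,
           calendar_date AS settlement_date
    FROM DateDim;

CREATE TABLE TransactionFact (
    transaction_date_key INTEGER,
    settlement_date_key  INTEGER,
    amount REAL
);

INSERT INTO DateDim VALUES (20090601, '2009-06-01'), (20090604, '2009-06-04');
INSERT INTO TransactionFact VALUES (20090601, 20090604, 250.0);
""")

# The fact joins to the same dimension twice, once per role.
rows = conn.execute("""
    SELECT t.transaction_date, s.settlement_date, f.amount
    FROM TransactionFact f
    JOIN TransactionDate t ON t.transaction_date_key = f.transaction_date_key
    JOIN SettlementDate  s ON s.settlement_date_key  = f.settlement_date_key
""").fetchall()
```

Because both views read from one table, any change to the date dimension is automatically reflected in every role.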
In addition, it is possible to be left with some attributes that do not fit in any of the
extracted dimensions and don't group well together. In that case it is not
recommended to keep these attributes in the fact table. Instead, they can be pulled
out into a Junk Dimension. Figure 3 shows how 2 indicators were grouped into a junk
dimension. The TransactionIndicators dimension will have 4 rows, one for each
combination of the 2 indicators' possible values. As we add more indicators and
multi-valued columns this dimension will grow. At some point we might need to split a
junk dimension if the collection of columns is highly unrelated.
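The row count of a junk dimension is the product of the number of values each column can take. The sketch below builds such a dimension; the indicator names `is_reversal` and `is_online` and the helper `build_junk_dimension` are hypothetical and do not come from Figure 3.

```python
from itertools import product

# Hypothetical indicators and their possible values.
indicators = {
    "is_reversal": ("Y", "N"),
    "is_online": ("Y", "N"),
}


def build_junk_dimension(indicator_values):
    """Return one row per combination of values, with a surrogate key."""
    names = list(indicator_values)
    value_lists = (indicator_values[name] for name in names)
    rows = []
    for key, combo in enumerate(product(*value_lists), start=1):
        row = {"indicator_key": key}
        row.update(zip(names, combo))
        rows.append(row)
    return rows


# Two indicators with two values each give 2 * 2 = 4 rows.
transaction_indicators = build_junk_dimension(indicators)
```

Adding a third column with three possible values would raise the row count to 12, which shows how quickly the dimension grows as columns are added.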
Do Nothing:
In the data modeling literature, this is known as a type 1 SCD (slowly changing
dimension). In this scenario you simply overwrite the old values with the new ones.
History is then lost forever. If there is no requirement to keep history, this might be
the approach to take. However, it is important to understand that this approach does
come at a cost: all cubes will have to be rebuilt. Otherwise, reports and cubes will
have dead paths.
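A type 1 change amounts to an in-place update of the dimension row. The sketch below assumes a hypothetical `Security` dimension with made-up identifier values; it only illustrates that the old value is no longer recoverable afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Security (security_key INTEGER PRIMARY KEY, cusip TEXT, name TEXT);
INSERT INTO Security VALUES (1, 'OLD000001', 'Example Corp');
""")


def overwrite_cusip(conn, security_key, new_cusip):
    """Type 1 change: overwrite the attribute in place, keeping no history."""
    conn.execute(
        "UPDATE Security SET cusip = ? WHERE security_key = ?",
        (new_cusip, security_key),
    )


overwrite_cusip(conn, 1, "NEW000001")
rows = conn.execute("SELECT security_key, cusip FROM Security").fetchall()
```

The surrogate key is unchanged, so existing facts still join to the row, but they now report the new value for every period, including past ones.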
for a security if it were to change CUSIP. This allows us to represent history accurately.
New holdings will point to the new dimension row and old holdings will continue to
point to the old one. The drawback of this approach is that it does not allow us to
associate the old facts with the new values. Depending on the requirements, this could
be unacceptable.
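The behavior described above, where a change produces a new dimension row and facts keep pointing at the row that was current when they were loaded, can be sketched as follows. The `Security` and `Holding` tables, their columns and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Security (security_key INTEGER PRIMARY KEY, cusip TEXT, name TEXT);
CREATE TABLE Holding (holding_date TEXT, security_key INTEGER, market_value REAL);
INSERT INTO Security VALUES (1, 'OLD000001', 'Example Corp');
INSERT INTO Holding VALUES ('2009-06-01', 1, 1000.0);
""")


def add_security_version(conn, name, new_cusip):
    """Insert a new dimension row for the changed security; return its key."""
    cur = conn.execute(
        "INSERT INTO Security (cusip, name) VALUES (?, ?)", (new_cusip, name)
    )
    return cur.lastrowid


new_key = add_security_version(conn, "Example Corp", "NEW000001")

# Holdings loaded after the change point to the new row.
conn.execute("INSERT INTO Holding VALUES ('2009-06-02', ?, 1010.0)", (new_key,))

history = conn.execute("""
    SELECT h.holding_date, s.cusip
    FROM Holding h
    JOIN Security s ON s.security_key = h.security_key
    ORDER BY h.holding_date
""").fetchall()
```

Nothing in this schema links row 1 to row 2, which is the drawback the text mentions: old facts cannot be reported under the new values.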
Hybrid Approach:
I personally like to use type 2 SCD to capture the change and type 3 to chain the
dimension rows together. In the case of the CUSIP change example earlier, I would
create a new security dimension row with the new values and add a new column to the
security dimension called previous security. This column will point to the security row
that just changed. While it is hard for a user to navigate the chain, and while I
otherwise refrain from using recursion, I feel this approach meets many requirements. It
might require IT or some technical users to assist in predefining navigation paths for
less experienced users.
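One way a predefined navigation path over the previous security column could look is sketched below. The column name `previous_security_key`, the helper `security_lineage` and the sample rows are hypothetical; the chain is walked in Python here, although a recursive query would be another option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Security (
    security_key INTEGER PRIMARY KEY,
    cusip TEXT,
    name TEXT,
    -- Points to the row this one replaced; NULL for the first version.
    previous_security_key INTEGER REFERENCES Security(security_key)
);
INSERT INTO Security VALUES (1, 'OLD000001', 'Example Corp', NULL);
INSERT INTO Security VALUES (2, 'MID000001', 'Example Corp', 1);
INSERT INTO Security VALUES (3, 'NEW000001', 'Example Corp', 2);
""")


def security_lineage(conn, security_key):
    """Follow the previous-security chain; return CUSIPs, newest first.

    Assumes every key in the chain exists in the Security table.
    """
    lineage = []
    while security_key is not None:
        cusip, security_key = conn.execute(
            "SELECT cusip, previous_security_key FROM Security "
            "WHERE security_key = ?",
            (security_key,),
        ).fetchone()
        lineage.append(cusip)
    return lineage


lineage = security_lineage(conn, 3)
```

Wrapping the traversal in a helper like this keeps the recursion out of sight for less experienced users, who only need to supply the current security key.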
Other considerations:
In some dimensions, some fields change much more frequently than others. It is
acceptable to break out the fields that change more rapidly from the others. However,
the breakup has to be done in a way that makes sense to the business user of the model,
who is not aware of the technical motivation for the split.
Steps to a DM:
Now that we have talked about the terminology of DM, how do we create a dimensional
model? The initial step in creating a DM is to decide which business process(es) to
model. As part of that, the business requirements need to be gathered and the
available data needs to be understood.
The next step is to decide on the grain of the model. The best and most flexible
approach is to go for the lowest available grain. By doing this you give the user of your
model a way to summarize data in an ad-hoc manner. It also allows your users to drill
from a summary view to the supporting details. In addition, keep all the measures of a
fact table at the same grain. An example of a grain declaration when modeling holdings
for a fund company would state, "We will model account holdings per day."
The third step is to define the dimensions. In the above example the dimensions for a
holding will include portfolio, holding date and security.
Finally, we derive the facts described by the dimensions. The facts in the above example
will include holding market value and holding cost basis.
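The steps above can be sketched as a small star schema for the holdings example. This is an illustration under assumptions: the table names, key columns and sample values are hypothetical, and the grain is taken to be one row per portfolio, security and day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: portfolio, holding date and security.
CREATE TABLE Portfolio (portfolio_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE HoldingDate (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE Security (security_key INTEGER PRIMARY KEY, cusip TEXT);

-- Fact: one row per portfolio, per security, per day (the declared grain).
-- Both measures are stored at that same grain.
CREATE TABLE HoldingFact (
    portfolio_key INTEGER REFERENCES Portfolio(portfolio_key),
    date_key      INTEGER REFERENCES HoldingDate(date_key),
    security_key  INTEGER REFERENCES Security(security_key),
    market_value  REAL,
    cost_basis    REAL,
    PRIMARY KEY (portfolio_key, date_key, security_key)
);

INSERT INTO Portfolio VALUES (1, 'Growth Fund');
INSERT INTO HoldingDate VALUES (20090601, '2009-06-01');
INSERT INTO Security VALUES (1, 'AAA000001'), (2, 'BBB000002');
INSERT INTO HoldingFact VALUES
    (1, 20090601, 1, 1000.0, 900.0),
    (1, 20090601, 2,  500.0, 550.0);
""")

# Because the fact is stored at the lowest grain, users can summarize ad hoc,
# here rolling security-level holdings up to portfolio per day.
summary = conn.execute("""
    SELECT p.name, d.calendar_date,
           SUM(f.market_value), SUM(f.cost_basis)
    FROM HoldingFact f
    JOIN Portfolio p   ON p.portfolio_key = f.portfolio_key
    JOIN HoldingDate d ON d.date_key = f.date_key
    GROUP BY p.name, d.calendar_date
""").fetchall()
```

The same fact table also supports drilling from the portfolio total back down to the individual security rows, since no detail was discarded at load time.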
References:
Ralph Kimball and Margy Ross (2002). The Data Warehouse Toolkit. Second Edition.
New York: Wiley Computer Publishing.