
Dimensions:

The dimension tables are where the attributes of the dimensions of the business are stored. The
best attributes are textual and discrete and are used to constrain the fact table. Each of these
textual descriptions helps to describe a member of the respective dimension.
They are the entry points into the fact tables. They determine the grain of the fact table.
They serve as the primary source of query constraints, groupings and report labels/row
headers.
They are relatively shallow in terms of rows but wide, with many large columns.
They are not usually time dependent.
They often contain hierarchical relationships.
Robust dimension attributes deliver analytic slicing and dicing capabilities.
Dimension tables are de-normalized.
Examples of dimensions: Employee, Time, Product, Customer etc.

Dimension Keys:
Dimensional modeling proposes that the dimension keys should be surrogate keys. Surrogate
keys are integers assigned sequentially as needed to populate a dimension.
They are also known as meaningless keys, integer keys, artificial keys, synthetic keys etc.

Every join between dimension tables and fact tables in a data warehouse environment should be
based on surrogate keys, not natural keys. The primary benefit of surrogate keys is that they
buffer the data warehouse environment from operational changes. They also avoid the adverse
impact on performance of joining on composite natural keys.

Avoid smart keys, natural keys or production keys.

Keys where you can tell something about the record just by looking at the key are called smart
keys.
With surrogate keys, the data warehouse team is able to maintain control over the environment
without being affected by the operational rules for generating, updating, deleting, recycling and
reusing production keys. Examples: multiple sources using the same keys, production systems
reusing the same values after a data purge, systems with differently formatted keys being added
at a later stage etc.
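
As a minimal sketch of sequential surrogate key assignment, the Python below maps each natural key to the next free integer the first time it is seen. The class name, the source system labels ("CRM", "ERP") and the production key "C-1001" are hypothetical and only illustrate the case of multiple sources using the same key.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Assigns sequential integer keys to natural keys as they are first seen."""

    def __init__(self, start=1):
        self._next_key = count(start)
        self._key_map = {}  # natural key -> surrogate key

    def key_for(self, natural_key):
        # Reuse the existing surrogate key, or assign the next integer.
        if natural_key not in self._key_map:
            self._key_map[natural_key] = next(self._next_key)
        return self._key_map[natural_key]


customer_keys = SurrogateKeyGenerator()
# Hypothetical (source system, production key) pairs.
first = customer_keys.key_for(("CRM", "C-1001"))
second = customer_keys.key_for(("ERP", "C-1001"))  # same production key, other source
again = customer_keys.key_for(("CRM", "C-1001"))   # already known, key is reused
```

Because the surrogate key carries no meaning, two sources that happen to issue the same production key still receive distinct dimension keys.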



Slowly Changing Dimensions (SCD):

In the real world, dimensions and their descriptions, though relatively constant, evolve over
time: employees come and go, they are promoted, salaries change etc. The term slowly changing
dimensions refers to the variation in dimensional attributes over time. The word slowly in this
context might seem incorrect, but in general, when compared to a measure in a fact table, changes
to dimensional data occur slowly.

We need a strategy to deal with these attribute changes over time. When we encounter a
slowly changing dimension, we must make one of the following three fundamental choices.
Each choice results in a different degree of tracking changes over time.

Type One (Overwriting History): A Type One change overwrites an existing dimensional attribute
with new information. For example, when a customer's name changes, the new name overwrites the
old name, and the value for the old version is lost. A Type One change updates only the attribute,
doesn't insert new records, and affects no keys. It is easy to implement but does not maintain any
history of prior attribute values.
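
A Type One change can be sketched as an in-place update. The dictionary layout, the column names and the sample customer below are assumptions made for illustration only.

```python
# Hypothetical customer dimension keyed by surrogate key.
customer_dim = {
    1: {"customer_id": "C-1001", "name": "Jane Smith", "city": "Austin"},
}


def apply_type1_change(dimension, surrogate_key, attribute, new_value):
    """Overwrite the attribute in place: no new row is added and no key changes."""
    dimension[surrogate_key][attribute] = new_value


apply_type1_change(customer_dim, 1, "name", "Jane Jones")
```

After the call the dimension still has a single row for the customer, and the old name can no longer be recovered from it.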

Type Two (Preserving History): A Type Two change creates an additional dimension record at the
time of the change with the new attribute values, thereby segmenting history very accurately
between the old description and the new description. Implementing Type Two changes within a data
warehouse might require significant analysis and development. Type Two changes partition
history across time more accurately than the other types. However, because Type Two changes add
records, they can significantly increase the database's size.
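
One common way to implement a Type Two change is sketched below. The validity dates, the is_current flag and the open-ended date marker are assumptions that the text does not prescribe; they are one of several possible bookkeeping schemes.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed marker for "still current"

# Hypothetical employee dimension with one current row.
employee_dim = [
    {"employee_key": 1, "employee_id": "E-7", "territory": "East",
     "valid_from": date(2020, 1, 1), "valid_to": OPEN_END, "is_current": True},
]


def apply_type2_change(dimension, employee_id, attribute, new_value, change_date):
    """Expire the current row and append a new row with a new surrogate key."""
    current = next(row for row in dimension
                   if row["employee_id"] == employee_id and row["is_current"])
    if current[attribute] == new_value:
        return current["employee_key"]  # nothing changed, keep the current version

    new_row = dict(current)  # copy all attributes, then apply the change
    new_row[attribute] = new_value
    new_row["employee_key"] = max(row["employee_key"] for row in dimension) + 1
    new_row["valid_from"] = change_date

    # Close the old version; valid_to is treated as exclusive here.
    current["valid_to"] = change_date
    current["is_current"] = False

    dimension.append(new_row)
    return new_row["employee_key"]


new_key = apply_type2_change(employee_dim, "E-7", "territory", "West", date(2023, 6, 1))
```

Fact rows loaded before the change keep pointing at employee_key 1 and rows loaded afterwards use the new key, which is how history ends up partitioned.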

Type Three (Preserving a Version of History): A Type Three change creates new current fields
within the original dimension row to record the new attribute values while keeping the original
attribute values as well. History can then be described both forward and backward from the change,
either in terms of the original attribute values or in terms of the current attribute values. You
usually implement Type Three changes only if you have a limited need to preserve and accurately
describe history, such as when someone gets married and you need to retain the previous name.
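
A Type Three change can be sketched with a pair of columns in the same row. The column names original_name and current_name and the sample customer are hypothetical.

```python
# Hypothetical customer dimension with an "original" and a "current" column.
customer_dim = {
    1: {"customer_id": "C-1001",
        "original_name": "Jane Smith",
        "current_name": "Jane Smith"},
}


def apply_type3_change(dimension, customer_key, new_name):
    """Record the new value in the current column; the original column is untouched."""
    dimension[customer_key]["current_name"] = new_name


apply_type3_change(customer_dim, 1, "Jane Jones")
```

The row count and the key stay the same, and reports can use either column. Only one earlier version is retained, which is why this approach suits a limited need for history.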

Hybrid Type: As an alternative, you can implement a mix of Type One and Type Two changes at
the attribute level by implementing Type Two changes only for attributes whose historical values
are important when you're slicing and dicing. For example, users might not need to know an
individual's previous name if a name change occurs, so a Type One change would suffice. Users
might want the system to show only the person's current name. However, if the company
reassigns sales territories, users might need to track who sold what, at what time, and in what
territory, necessitating a Type Two change.
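
The attribute-level mix can be sketched as a policy table that selects the change type per attribute. The SCD_POLICY mapping, the column names and the decision to overwrite the name on every version of the member are assumptions for illustration.

```python
# Assumed policy: 1 = overwrite (Type One), 2 = add a new row (Type Two).
SCD_POLICY = {"name": 1, "territory": 2}

# Hypothetical sales representative dimension.
rep_dim = [
    {"rep_key": 1, "rep_id": "R-1", "name": "Ann Lee",
     "territory": "East", "is_current": True},
]


def apply_change(rows, rep_id, attribute, new_value):
    member_rows = [row for row in rows if row["rep_id"] == rep_id]
    if SCD_POLICY[attribute] == 1:
        # Type One: overwrite on every version so only the current value shows.
        for row in member_rows:
            row[attribute] = new_value
        return
    # Type Two: copy the current row, apply the change, expire the old row.
    current = next(row for row in member_rows if row["is_current"])
    new_row = dict(current)
    new_row[attribute] = new_value
    new_row["rep_key"] = max(row["rep_key"] for row in rows) + 1
    current["is_current"] = False
    rows.append(new_row)


apply_change(rep_dim, "R-1", "territory", "West")  # history kept
apply_change(rep_dim, "R-1", "name", "Ann Park")   # history overwritten
```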







Rapid Changing Dimensions (RCD):

In rapidly changing dimensions, the dimension attribute values change rapidly over time.
Note that there is no yardstick for telling whether a dimension is slowly or rapidly changing;
this is based on the judgment of the data modeler. Also, an SCD may become an RCD over time or
vice versa. For RCDs, the design followed depends on the size of the dimension.

Small dimensions: The same techniques as for slowly changing dimensions may be applied.

Large dimensions: The best approach for efficiently browsing and tracking changes of key
attributes in really huge dimensions is to break off one or more mini dimensions from the
dimension table, each consisting of a small clump of attributes that are constrained to a
limited number of values (for example, by banding continuous values into ranges).
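
A mini dimension can be sketched as the set of all combinations of a few banded attributes. The age and income bands, the key names and the sample fact row below are hypothetical.

```python
from itertools import product

# Assumed bands; each attribute is limited to a handful of values.
AGE_BANDS = ["18-29", "30-49", "50+"]
INCOME_BANDS = ["low", "medium", "high"]

# One mini-dimension row per combination, keyed by a surrogate integer.
demographics_dim = {
    key: {"age_band": age, "income_band": income}
    for key, (age, income) in enumerate(product(AGE_BANDS, INCOME_BANDS), start=1)
}

# Reverse lookup used at load time to find the key for a combination.
demographics_lookup = {
    (row["age_band"], row["income_band"]): key
    for key, row in demographics_dim.items()
}

# The fact row carries both the customer key and the mini-dimension key.
fact_row = {
    "customer_key": 42,
    "demographics_key": demographics_lookup[("30-49", "high")],
    "sales_amount": 19.99,
}
```

When a customer moves to another band, later fact rows simply reference a different demographics key, and the large customer dimension itself receives no new rows.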

Degenerate Dimensions:

A degenerate dimension is represented by a dimension key attribute with no corresponding
dimension table. Degenerate dimensions usually occur in line item-oriented fact table designs.

Many dimensional designs revolve around some kind of control document like an order,
an invoice, a bill of lading, or a ticket. Usually these control documents are a kind of container
with one or more line items inside. A very natural grain for a fact table in these cases is the
individual line item. In other words, a fact table record is a line item.
Once the dimensions have been chosen, the attributes associated with the order number
automatically go over to these chosen dimensions, e.g. Product, Customer, Time etc.

At the end of the design, the order number is sitting by itself, without any attributes. We call this
a degenerate dimension. The degenerate dimension key should be the actual production order
number and should sit in the fact table without a join to anything. There is no point in making a
dimension table because it would not contain anything.
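
The sketch below shows a line-item fact table in which the order number sits in the fact row and is used for grouping without any join. The column names and the order numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical line-item grain fact rows; order_number has no dimension table.
sales_fact = [
    {"order_number": "SO-1001", "product_key": 1, "customer_key": 10,
     "date_key": 20240105, "sales_amount": 40.0},
    {"order_number": "SO-1001", "product_key": 2, "customer_key": 10,
     "date_key": 20240105, "sales_amount": 15.5},
    {"order_number": "SO-1002", "product_key": 1, "customer_key": 11,
     "date_key": 20240106, "sales_amount": 20.0},
]

# Group line items back into their control document using the degenerate key.
order_totals = defaultdict(float)
line_counts = defaultdict(int)
for row in sales_fact:
    order_totals[row["order_number"]] += row["sales_amount"]
    line_counts[row["order_number"]] += 1
```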

Junk Dimensions:

A junk dimension is a convenient grouping of typically low-cardinality flags and indicators. By
creating an abstract dimension, we remove the flags from the fact table while placing them into a
useful dimensional framework.

Sometimes after carving out all the dimensions some flags or text attributes are left over in the
fact table but do not belong to any of the dimension tables. When a number of miscellaneous
flags and text attributes exist, the following design alternatives should be avoided:

Leaving the flags and attributes unchanged in the fact table record
Making each flag and attribute into its own separate dimension
Stripping out all of these flags and attributes from the design

A better alternative is to create a junk dimension.
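
As a rough sketch, a junk dimension can be populated with one row per combination of flags actually encountered in the source data. The flag names (payment_type, is_gift, is_rush) and the sample rows are hypothetical.

```python
# Junk dimension: one row per observed combination of flags.
junk_dim = {}  # (payment_type, is_gift, is_rush) -> junk_key


def junk_key_for(payment_type, is_gift, is_rush):
    """Return the key for this combination, adding a row the first time it is seen."""
    combo = (payment_type, is_gift, is_rush)
    if combo not in junk_dim:
        junk_dim[combo] = len(junk_dim) + 1
    return junk_dim[combo]


# Hypothetical source transactions carrying miscellaneous flags.
source_rows = [
    {"payment_type": "card", "is_gift": False, "is_rush": True, "amount": 10.0},
    {"payment_type": "cash", "is_gift": False, "is_rush": False, "amount": 5.0},
    {"payment_type": "card", "is_gift": False, "is_rush": True, "amount": 7.5},
]

# The fact rows keep only the single junk key instead of the individual flags.
fact_rows = [
    {"junk_key": junk_key_for(r["payment_type"], r["is_gift"], r["is_rush"]),
     "amount": r["amount"]}
    for r in source_rows
]
```

An alternative is to pre-build every possible combination up front; building rows as combinations appear keeps the table small when many combinations never occur.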

Conformed Dimensions:

Conformed dimensions can be used to analyze facts from two or more data marts. For example,
shipping and sales data marts both require a customer dimension and a time dimension.
If they're the same dimensions, then you have conformed dimensions, allowing you to extract
and manipulate facts relating to a particular customer from both marts, answering questions such
as whether late shipments have affected sales to that customer.
By adding a marketing data mart to analyze product promotions, with conformed customer and
time dimensions, you're able to analyze the effects of a particular product promotion on sales.
(Analyzing facts from more than one fact table in this way is termed drilling across.)

The same conformed dimensions (in this case, the time and customer dimensions) have meaning in
the context of three independently developed data marts. These dimensions become enterprise
property and can be used later in other marts as the enterprise data warehouse evolves.
Conformed dimensions have consistent definitions regardless of where they are used. This
allows a single query to be run across multiple tables, data marts and data warehouses.
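
Drilling across can be sketched as aggregating each fact table separately by the shared customer key and then lining the results up on the conformed dimension. The customer names, column names and figures below are made up for illustration.

```python
from collections import defaultdict

# Conformed customer dimension shared by both marts (hypothetical rows).
customer_dim = {1: "Acme Corp", 2: "Globex"}

sales_fact = [
    {"customer_key": 1, "sales_amount": 100.0},
    {"customer_key": 2, "sales_amount": 50.0},
    {"customer_key": 1, "sales_amount": 25.0},
]
shipping_fact = [
    {"customer_key": 1, "days_late": 3},
    {"customer_key": 2, "days_late": 0},
    {"customer_key": 1, "days_late": 0},
]

# Step 1: summarize each fact table on its own, by the conformed key.
sales_by_customer = defaultdict(float)
for row in sales_fact:
    sales_by_customer[row["customer_key"]] += row["sales_amount"]

late_by_customer = defaultdict(int)
for row in shipping_fact:
    if row["days_late"] > 0:
        late_by_customer[row["customer_key"]] += 1

# Step 2: merge the two summaries on the shared dimension.
report = {
    name: {"sales": sales_by_customer[key], "late_shipments": late_by_customer[key]}
    for key, name in customer_dim.items()
}
```

The merge only works because both marts use the same customer keys with the same meaning, which is exactly what conforming the dimension guarantees.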

Facts:

The fact table is at the center of a star schema and holds the primary measurement
data. It contains the actual numerical measurements that the business is interested in.
Fact tables express the many-to-many relationships between dimensions.
A fact table typically has two types of columns: those that contain measures and those
that are foreign keys to dimension tables. Some key features of a fact table are:
Multi-part key, i.e. a composite key with one foreign key for each dimension.
Time is always a part of the key.
Usually numeric. Keys are surrogate integers and the measures are numeric.
Typically additive.
Granularity refers to the level of detail of the data in the fact table. The lowest level of
granularity is referred to as atomic data. The granularity is determined by the grain, which is
the meaning of a single record in the fact table. The granularity also determines how far you can
drill down without returning to the base transaction system data. The finer the grain,
the more records will be present in the fact table. We must make sure that the grain is
fine enough to support our decision support needs.

Fact Types
Additive Facts
Additive facts are the measurements in a fact table that can be added across all
dimensions, e.g. discrete numerical measures of activity such as quantity sold or sales dollars.
Semi-Additive Facts
Numeric facts that can be added across some dimensions in a fact table but not across
others, e.g. inventory levels and account balances cannot be added along the time dimension but
can be averaged usefully over the time dimension.
Non-Additive Facts
Facts that cannot logically be added across rows, e.g. a measurement of room temperature.
If numeric, they usually must be combined in a computation with other facts before being
added across rows. If non-numeric, they can only be used in constraints, counts or groupings.
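
The difference between adding a semi-additive fact across products and across time can be sketched with a small inventory snapshot. The dates, products and quantities are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily inventory snapshot: on_hand is a semi-additive fact.
inventory_fact = [
    {"date": "2024-01-01", "product": "widget", "on_hand": 100},
    {"date": "2024-01-01", "product": "gadget", "on_hand": 40},
    {"date": "2024-01-02", "product": "widget", "on_hand": 80},
    {"date": "2024-01-02", "product": "gadget", "on_hand": 50},
]

# Valid: add across the product dimension within a single day.
on_hand_by_date = defaultdict(int)
for row in inventory_fact:
    on_hand_by_date[row["date"]] += row["on_hand"]

# Across the time dimension, average the daily totals instead of summing them.
average_on_hand = mean(list(on_hand_by_date.values()))

# Summing across time counts the same stock once per day and is not meaningful.
misleading_total = sum(row["on_hand"] for row in inventory_fact)
```

Here the average of 135 units describes the typical stock level, while the total of 270 units never existed in the warehouse on any day.
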
Factless Fact Table
A fact table that has no facts but captures certain many-to-many relationships between the
dimension keys. Most often used to represent events or provide coverage information that
does not appear in other fact tables.
e.g.,
1. Track student attendance at a college.
2. A promotion coverage fact table to answer questions like "Which products were on promotion
but didn't sell?", which are not captured by the sales fact table.
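
The promotion coverage question can be sketched as a set difference between a factless coverage table and the sales fact table. The key values and table layouts below are hypothetical.

```python
# Factless coverage table: one row per (date_key, product_key, promotion_key),
# with no measures at all.
promotion_coverage_fact = {
    (20240105, 1, 7),
    (20240105, 2, 7),
    (20240105, 3, 7),
}

# Sales fact rows exist only for products that actually sold.
sales_fact = [
    {"date_key": 20240105, "product_key": 1, "quantity": 4},
    {"date_key": 20240105, "product_key": 3, "quantity": 1},
]

promoted = {(date_key, product_key)
            for (date_key, product_key, _promotion_key) in promotion_coverage_fact}
sold = {(row["date_key"], row["product_key"]) for row in sales_fact}

# Products that were on promotion on a given day but recorded no sales.
promoted_but_unsold = sorted(promoted - sold)
```

The sales fact table alone cannot answer this question because it holds no row for an event that did not happen.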
