
DW DESIGN COMPONENTS:

GRANULARITY
• Granularity is the extent to which a system is broken down into small parts, either the system itself or its description or observation. It is the extent to which a larger entity is subdivided.

• For example, a yard broken into inches has finer granularity than a yard broken into feet.


DATA GRANULARITY

• The granularity of data refers to the size in which data fields are sub-divided. For example, a postal
address can be recorded, with coarse granularity, as a single field:

• address = 200 2nd Ave. South #358, St. Petersburg, FL 33701-4313 USA or

• with fine granularity, as multiple fields:

• street address = 200 2nd Ave. South #358

• city = St. Petersburg

• postal code = FL 33701-4313

• country = USA
OR EVEN FINER GRANULARITY:

• street = 2nd Ave. South

• address number = 200

• suite/apartment number = #358

• city = St. Petersburg

• state = FL

• postal-code = 33701

• postal-code-add-on = 4313

• country = USA
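
The difference between the coarse and fine versions of the address above can be sketched in Python. This is only an illustration: the class names `CoarseAddress` and `FineAddress` are assumptions, and the field values are the ones from the example.

```python
from dataclasses import dataclass


@dataclass
class CoarseAddress:
    address: str  # the whole address in one field


@dataclass
class FineAddress:
    address_number: str
    street: str
    suite: str
    city: str
    state: str
    postal_code: str
    postal_code_add_on: str
    country: str


coarse = CoarseAddress("200 2nd Ave. South #358, St. Petersburg, FL 33701-4313 USA")
fine = FineAddress("200", "2nd Ave. South", "#358", "St. Petersburg",
                   "FL", "33701", "4313", "USA")

# Fine granularity: "is this address in Florida?" is a plain field comparison.
is_florida_fine = fine.state == "FL"

# Coarse granularity: the same question needs fragile text matching.
is_florida_coarse = ", FL " in coarse.address
```

Both checks succeed here, but only the fine-grained one keeps working if the address text is formatted differently.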
• Data granularity in a data warehouse refers to the level of detail. The lower the level of detail, the finer the data granularity. Of course, if you want to keep data at the lowest level of detail, you have to store a lot of data in the data warehouse.

• In a data warehouse, therefore, it is efficient to keep data summarized at different levels. Depending on the query, you can then go to the particular level of detail and satisfy the query.
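
As a rough sketch of keeping data at several levels of detail, the Python below rolls hypothetical daily sales rows up to two coarser summaries. The sample rows and the helper name `summarize` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical detail rows at the finest grain: (date, product, units_sold)
daily_sales = [
    ("2024-01-05", "widget", 3),
    ("2024-01-05", "gadget", 1),
    ("2024-01-20", "widget", 2),
    ("2024-02-02", "widget", 4),
]


def summarize(rows, key_fn):
    """Sum units_sold after mapping each row to a coarser key."""
    totals = defaultdict(int)
    for date, product, units in rows:
        totals[key_fn(date, product)] += units
    return dict(totals)


# Coarser level: one total per (month, product)
by_month_product = summarize(daily_sales, lambda d, p: (d[:7], p))

# Coarsest level: one total per month
by_month = summarize(daily_sales, lambda d, p: d[:7])
```

A query about monthly totals can be answered from `by_month` without reading the daily rows, while a query about a single day still needs the detail level.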


FIGURE BELOW SHOWS EXAMPLES OF DATA GRANULARITY IN A
TYPICAL DATA WAREHOUSE.
FACT TABLES. WHAT ARE THEY?

• In data warehousing, a Fact table consists of the measurements, metrics or facts of a business
process.

• It is located at the center of a star schema or a snowflake schema, surrounded by dimension tables.

• The primary key of a fact table is usually a composite key that is made up of all of its foreign
keys.

• Fact tables contain the content of the data warehouse and store different types of measures: additive, non-additive, and semi-additive.
FACT TABLES CONTINUED…
• A fact table is the primary table in a dimensional model, where the numerical performance measurements of the business are stored. We use the term fact to represent a business measure.

• We can imagine standing in the marketplace watching products being sold and writing down the quantity sold and dollar sales amount each day for each product in each store.

• A measurement is taken at the intersection of all the dimensions (day, product, and store). This list of dimensions defines the grain of the fact table and tells us what the scope of the measurement is.

• The most useful facts are numeric and additive, such as dollar sales amount.
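
The marketplace example can be sketched with Python's built-in `sqlite3` module. The table and column names (`sales_fact`, `day_key`, and so on) and the sample rows are assumptions. The composite primary key enforces the grain of one row per day, product, and store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_fact (
        day_key       INTEGER NOT NULL,
        product_key   INTEGER NOT NULL,
        store_key     INTEGER NOT NULL,
        quantity_sold INTEGER NOT NULL,
        dollar_sales  REAL    NOT NULL,
        PRIMARY KEY (day_key, product_key, store_key)  -- the grain
    )
""")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
    [
        (20240101, 1, 10, 5, 50.0),
        (20240101, 2, 10, 2, 30.0),
        (20240102, 1, 10, 1, 10.0),
        (20240102, 1, 11, 4, 40.0),
    ],
)

# Additive fact: dollar_sales can be summed across days and stores.
sales_by_product = conn.execute(
    "SELECT product_key, SUM(dollar_sales) FROM sales_fact "
    "GROUP BY product_key ORDER BY product_key"
).fetchall()

# A second measurement at the same intersection violates the grain.
try:
    conn.execute("INSERT INTO sales_fact VALUES (20240101, 1, 10, 9, 90.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```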
ILLUSTRATION
DIMENSION TABLES

• Dimension tables are integral companions to a fact table. The dimension tables contain the textual descriptors of the business.

• In a well-designed dimensional model, dimension tables have many columns or attributes. These attributes describe the rows in the dimension table.

• Each dimension is defined by its single primary key, designated by the PK notation, which serves as the basis for referential integrity with any given fact table to which it is joined.
• Dimension attributes serve as the primary source of query constraints, groupings, and report labels. In a query or report request, attributes are identified as the "by" words.

• For example, when a user states that he or she wants to see dollar sales by week by brand, week and brand must be available as dimension attributes.

• Dimension table attributes play a vital role in the data warehouse. Since they are the source of virtually all interesting constraints and report labels, they are key to making the data warehouse usable and understandable.
SAMPLE DIMENSION TABLE.
BRINGING TOGETHER FACTS AND
DIMENSIONS: FACT AND DIMENSION TABLES IN A
DIMENSIONAL MODEL
DIMENSIONAL MODELING (DM)
• Dimensional modeling (DM) refers to a logical design technique often used for data warehouses. It seeks to present the data in a standard framework that allows high-performance access.

• It differs from entity-relationship (E-R) design: E-R modeling aims to eliminate data redundancy through normalization, whereas data warehouses (DWs) are generally not normalized.


DIMENSIONAL MODEL

• Every dimensional model is composed of one table with a multi-part key, called the fact table, and a set of smaller tables called dimension tables.

• Each dimension table has a single-part primary key that corresponds to exactly one component of the multi-part key in the fact table.


DIMENSION TABLES

• These represent business objects or subjects; they could be equated to the entities in E-R models.

• Dimensions have attributes in the same way entities have properties. These form the columns of the dimension tables.

• The data stored is descriptive.


EXAMPLE 1 OF A DIMENSIONAL
MODEL
DIMENSIONAL MODEL

• Business process being modeled is the sales process.

• The fact table represents sales facts, that is, the amount sold, units sold, and cost.

• Dimension tables store the data that describes a sale.


CONFORMED DIMENSION

• In data warehousing, a conformed dimension is a dimension which has the same meaning to every fact table it relates to.

• Conformed dimensions allow facts and measures to be categorized and described in the same way across multiple fact tables and data marts, ensuring consistent reporting across the enterprise.
JUNK DIMENSION:
• A junk dimension is a collection of random transactional codes or text attributes that are unrelated to
any particular dimension.

• The junk dimension is simply a structure that provides a convenient place to store the junk attributes.

E.g.: Assume that we have a gender dimension and a marital status dimension. In the fact table we need to maintain two keys referring to these dimensions.

• Instead, create a junk dimension that holds all the combinations of gender and marital status (cross join the gender and marital status tables to create the junk table). Now we need to maintain only one key in the fact table.
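
The cross join described above can be sketched in Python with `itertools.product`. The attribute values, the surrogate key numbering, and the names `junk_dimension` and `junk_key_lookup` are assumptions for illustration.

```python
from itertools import product

genders = ["F", "M"]
marital_statuses = ["Single", "Married", "Divorced"]

# Cross join: one junk-dimension row per (gender, marital status) combination,
# each with its own surrogate key starting at 1.
junk_dimension = {
    key: {"gender": g, "marital_status": m}
    for key, (g, m) in enumerate(product(genders, marital_statuses), start=1)
}

# Reverse lookup used when loading the fact table.
junk_key_lookup = {
    (row["gender"], row["marital_status"]): key
    for key, row in junk_dimension.items()
}

# The fact row carries one junk key instead of two separate dimension keys.
fact_row = {
    "customer_key": 42,
    "junk_key": junk_key_lookup[("F", "Married")],
    "amount": 19.99,
}
```

With two genders and three marital statuses the junk dimension has six rows, so it stays small even though it covers every combination.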
DEGENERATE DIMENSION:

• According to Ralph Kimball, in a data warehouse, a degenerate dimension is a

dimension key in the fact table that does not have its own dimension table, because all

the interesting attributes have been placed in analytic dimensions.
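
One common illustration of a degenerate dimension is an order number kept directly in the fact table. The sketch below uses Python's built-in `sqlite3` module; the table `order_line_fact`, its columns, and the sample rows are assumptions, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_line_fact (
        order_number  TEXT    NOT NULL,  -- degenerate dimension: no order table exists
        product_key   INTEGER NOT NULL,
        quantity      INTEGER NOT NULL,
        dollar_amount REAL    NOT NULL
    );
    INSERT INTO order_line_fact VALUES
        ('SO-1001', 1, 2, 20.0),
        ('SO-1001', 2, 1, 5.0),
        ('SO-1002', 1, 1, 10.0);
""")

# The order number still groups line items even without a dimension table.
lines_per_order = conn.execute(
    "SELECT order_number, COUNT(*), SUM(dollar_amount) "
    "FROM order_line_fact GROUP BY order_number ORDER BY order_number"
).fetchall()
```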


ROLE-PLAYING DIMENSION:

• Dimensions which are often used for multiple purposes within the same database are called role-playing dimensions. For example, a date dimension can be used for "date of sale", as well as "date of delivery" or "date of hire".
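
A role-playing date dimension can be sketched by joining the same table twice under different aliases. The example uses Python's built-in `sqlite3` module; the table names `date_dim` and `order_fact`, the columns, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim (
        date_key  INTEGER PRIMARY KEY,
        full_date TEXT,
        month     TEXT
    );
    CREATE TABLE order_fact (
        order_id          INTEGER,
        sale_date_key     INTEGER REFERENCES date_dim(date_key),
        delivery_date_key INTEGER REFERENCES date_dim(date_key),
        amount            REAL
    );
    INSERT INTO date_dim VALUES (1, '2024-01-30', 'January'),
                                (2, '2024-02-03', 'February');
    INSERT INTO order_fact VALUES (100, 1, 2, 75.0);
""")

# One physical date dimension plays two roles through two aliases.
row = conn.execute("""
    SELECT sale_date.month, delivery_date.month
    FROM order_fact AS f
    JOIN date_dim AS sale_date     ON f.sale_date_key = sale_date.date_key
    JOIN date_dim AS delivery_date ON f.delivery_date_key = delivery_date.date_key
""").fetchone()
```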


DW SCHEMAS

• The schema is a logical description of the entire database. It includes the name and description of records of all record types, including all associated data items and aggregates.

• Like a database, a data warehouse also requires a schema. A database uses the relational model, whereas a data warehouse uses the star, snowflake, or fact constellation schema.


• There are two main types of dimensional schemas:

• The star schema

• The snowflake schema

• The word schema comes from the Greek word skhēma, which means shape or, more generally, plan.

• The schema represented in the diagram is the star schema.


STAR SCHEMA

• In a star schema, each dimension is represented by only one dimension table.

• This dimension table contains the set of attributes.

• The following diagram shows the sales data of a company with respect to four dimensions, namely time, item, branch, and location.


EXAMPLE OF STAR SCHEMA
Sales Fact Table
• Foreign keys: time_key, item_key, branch_key, location_key
• Measures: units_sold, dollars_sold, avg_sales

Dimension tables
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city, province_or_state, country
• There is a fact table at the centre.

• This fact table contains the keys to each of the four dimensions.

• The fact table also contains the measures, namely dollars sold and units sold.

Note: Each dimension has only one dimension table, and each table holds a set of attributes.

• For example, the location dimension table contains the attribute set {location_key, street, city, province_or_state, country}.
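
The star schema described above can be sketched with Python's built-in `sqlite3` module. The `_dim` suffix on table names, the column types, and all sample rows are assumptions; the column names follow the slides, and `avg_sales` is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables (the _dim suffix is an assumption of this sketch)
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER,
        day_of_the_week TEXT, month INTEGER, quarter TEXT, year INTEGER);
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT,
        brand TEXT, type TEXT, supplier_type TEXT);
    CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT,
        branch_type TEXT);
    CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT,
        city TEXT, province_or_state TEXT, country TEXT);

    -- Fact table at the centre: one foreign key per dimension plus measures
    CREATE TABLE sales_fact (
        time_key     INTEGER REFERENCES time_dim(time_key),
        item_key     INTEGER REFERENCES item_dim(item_key),
        branch_key   INTEGER REFERENCES branch_dim(branch_key),
        location_key INTEGER REFERENCES location_dim(location_key),
        units_sold   INTEGER,
        dollars_sold REAL,
        PRIMARY KEY (time_key, item_key, branch_key, location_key)
    );

    -- Made-up sample rows
    INSERT INTO time_dim VALUES (1, 15, 'Monday', 1, 'Q1', 2024),
                                (2, 20, 'Saturday', 4, 'Q2', 2024);
    INSERT INTO item_dim VALUES (1, 'Laptop', 'Acme', 'computer', 'wholesale'),
                                (2, 'Phone', 'Zeta', 'mobile', 'retail');
    INSERT INTO branch_dim VALUES (1, 'Downtown', 'store');
    INSERT INTO location_dim VALUES
        (1, '1 Main St', 'Toronto', 'Ontario', 'Canada'),
        (2, '9 King St', 'Vancouver', 'British Columbia', 'Canada');
    INSERT INTO sales_fact VALUES (1, 1, 1, 1, 2, 2000.0),
                                  (1, 2, 1, 1, 3, 1500.0),
                                  (2, 1, 1, 2, 1, 1000.0),
                                  (2, 2, 1, 2, 4, 2000.0);
""")

# Typical star join: constrain and group by dimension attributes, sum the facts.
dollars_by_quarter_city = conn.execute("""
    SELECT t.quarter, l.city, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t     ON f.time_key = t.time_key
    JOIN location_dim AS l ON f.location_key = l.location_key
    GROUP BY t.quarter, l.city
    ORDER BY t.quarter, l.city
""").fetchall()
```

Every dimension is reached with a single join from the fact table, which is what gives the schema its star shape.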
SNOWFLAKE SCHEMA

• In a snowflake schema, some dimension tables are normalized.

• The normalization splits up the data into additional tables.

• For example, the item dimension table of the star schema is normalized and split into two dimension tables, namely the item table and the supplier table.


EXAMPLE OF SNOWFLAKE SCHEMA
Sales Fact Table
• Foreign keys: time_key, item_key, branch_key, location_key
• Measures: units_sold, dollars_sold, avg_sales

Dimension tables
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_key
• supplier: supplier_key, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city_key
• city: city_key, city, province_or_state, country
• The item dimension table now contains the attributes item_key, item_name, type, brand, and supplier_key.

• The supplier key is linked to the supplier dimension table.

• The supplier dimension table contains the attributes supplier_key and supplier_type.

Note: Because of normalization in the snowflake schema, redundancy is reduced, so the schema becomes easier to maintain and saves storage space.
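
The item and supplier split can be sketched with Python's built-in `sqlite3` module. The column names follow the slides; the column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: supplier attributes live in their own table
    CREATE TABLE supplier (
        supplier_key  INTEGER PRIMARY KEY,
        supplier_type TEXT
    );
    CREATE TABLE item (
        item_key     INTEGER PRIMARY KEY,
        item_name    TEXT,
        brand        TEXT,
        type         TEXT,
        supplier_key INTEGER REFERENCES supplier(supplier_key)
    );
    INSERT INTO supplier VALUES (1, 'wholesale'), (2, 'retail');
    INSERT INTO item VALUES (1, 'Laptop', 'Acme', 'computer', 1),
                            (2, 'Tablet', 'Acme', 'computer', 1),
                            (3, 'Phone',  'Zeta', 'mobile',   2);
""")

# Reaching supplier_type now costs one extra join compared with the star schema.
items_with_supplier = conn.execute("""
    SELECT i.item_name, s.supplier_type
    FROM item AS i
    JOIN supplier AS s ON i.supplier_key = s.supplier_key
    ORDER BY i.item_key
""").fetchall()

# 'wholesale' is stored once even though two items use it.
supplier_rows = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]
```

The trade-off is visible in the query: less repeated data in the dimension, at the price of an additional join.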


THE FIRST DIFFERENCE:
NORMALIZATION

• As mentioned, normalization is a key difference between star and snowflake schemas.

• Snowflake schemas use less space to store dimension tables. This is because, as a rule, a normalized database produces far fewer redundant records.


FACT CONSTELLATION SCHEMA

• In a fact constellation schema there are multiple fact tables. This schema is also known as a galaxy schema.

• The following diagram has two fact tables, namely sales and shipping.
EXAMPLE OF FACT CONSTELLATION
Sales Fact Table
• Foreign keys: time_key, item_key, branch_key, location_key
• Measures: units_sold, dollars_sold, avg_sales

Shipping Fact Table
• Foreign keys: time_key, item_key, shipper_key, from_location, to_location
• Measures: dollars_cost, units_shipped

Dimension tables
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city, province_or_state, country
• shipper: shipper_key, shipper_name, location_key, shipper_type
• The sales fact table is the same as that in the star schema.

• The shipping fact table has five dimension keys, namely item_key, time_key, shipper_key, from_location, and to_location.

• The shipping fact table also contains two measures, namely dollars_cost and units_shipped.

• Dimension tables can also be shared between fact tables. For example, the time, item, and location dimension tables are shared between the sales and shipping fact tables.
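
A shared dimension lets the two fact tables be compared side by side. The sketch below uses Python's built-in `sqlite3` module with a cut-down version of the schema: only the item dimension is kept, and the sample rows are assumptions. Each fact table is aggregated separately before joining, so rows from one fact table do not multiply rows from the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
    CREATE TABLE sales_fact (
        time_key INTEGER, item_key INTEGER,
        units_sold INTEGER, dollars_sold REAL
    );
    CREATE TABLE shipping_fact (
        time_key INTEGER, item_key INTEGER, shipper_key INTEGER,
        units_shipped INTEGER, dollars_cost REAL
    );
    INSERT INTO item VALUES (1, 'Laptop'), (2, 'Phone');
    INSERT INTO sales_fact VALUES (1, 1, 2, 2000.0),
                                  (2, 1, 1, 1000.0),
                                  (1, 2, 3, 1500.0);
    INSERT INTO shipping_fact VALUES (1, 1, 7, 3, 60.0),
                                     (1, 2, 7, 2, 20.0);
""")

# Aggregate each fact table to the shared item grain, then join on item_key.
sold_vs_shipped = conn.execute("""
    SELECT i.item_name, s.units_sold, sh.units_shipped
    FROM item AS i
    JOIN (SELECT item_key, SUM(units_sold) AS units_sold
          FROM sales_fact GROUP BY item_key) AS s
      ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS units_shipped
          FROM shipping_fact GROUP BY item_key) AS sh
      ON sh.item_key = i.item_key
    ORDER BY i.item_key
""").fetchall()
```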
WHAT IS A FACT TABLE?

• A fact table is a table that consists of the measurements, metrics, or facts of a business process.

• These measurable facts are used to assess business value and to forecast future business. The different types of facts are explained in detail below.


TYPES OF FACTS
• Additive: Additive facts are facts that can be summed up through all of the dimensions

in the fact table.

• Semi-Additive: Semi-additive facts are facts that can be summed up for some of the

dimensions in the fact table, but not the others.

• Non-Additive: Non-additive facts are facts that cannot be summed up for any of the

dimensions present in the fact table.


ADDITIVE FACTS
• Additive facts are facts that can be summed up through all of the dimensions in the fact table. A sales fact is a good example of an additive fact.
SEMI ADDITIVE FACTS

• Semi-additive facts are facts that can be summed up for some of the dimensions in

the fact table, but not the others.

E.g.: A daily balances fact can be summed up through the customer dimension but not through the time dimension.
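
The daily balance example can be sketched in plain Python. The customers, dates, and amounts are made up for illustration.

```python
# Hypothetical daily balance facts: (date, customer, balance)
balances = [
    ("2024-01-01", "alice", 100.0),
    ("2024-01-01", "bob", 50.0),
    ("2024-01-02", "alice", 120.0),
    ("2024-01-02", "bob", 50.0),
]

# Meaningful: sum across customers for a single day (total money held that day).
total_on_jan_2 = sum(b for d, c, b in balances if d == "2024-01-02")

# Not meaningful: summing one customer's balance across days.
# Alice never held this amount at any point in time.
alice_summed_over_time = sum(b for d, c, b in balances if c == "alice")

# Across time, use something other than a sum, such as the latest or average balance.
alice_rows = sorted((d, b) for d, c, b in balances if c == "alice")
alice_latest = alice_rows[-1][1]
alice_average = sum(b for _, b in alice_rows) / len(alice_rows)
```

Here the per-day total is 170.0, while the 220.0 obtained by summing Alice's balances over time corresponds to no real balance; her latest balance is 120.0 and her average is 110.0.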


NON ADDITIVE FACTS

Non-additive facts are facts that cannot be summed up for any of the dimensions

present in the fact table.

Profit margins are non-additive. If a department has two employees, and one

employee has sold an item with a 55% profit margin and the other has sold an

item with a 45% profit margin, the profit margin for the department is not 100%.
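
The profit margin example can be worked through in Python. The slides give only the two margins, so the revenue and profit amounts below are assumptions chosen to produce 55% and 45%.

```python
# Hypothetical sales behind the two margins (amounts are assumptions)
sales = [
    {"employee": "A", "revenue": 100.0, "profit": 55.0},   # 55% margin
    {"employee": "B", "revenue": 300.0, "profit": 135.0},  # 45% margin
]

margins = [s["profit"] / s["revenue"] for s in sales]

# Wrong: adding the ratios gives 100%, which is not a margin of anything.
sum_of_margins = sum(margins)

# Right: add the additive components first, then divide.
total_profit = sum(s["profit"] for s in sales)
total_revenue = sum(s["revenue"] for s in sales)
department_margin = total_profit / total_revenue
```

With these assumed amounts the department margin is 190 / 400 = 47.5%. A common approach is therefore to store the additive parts (profit and revenue) in the fact table and compute the ratio at query time.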
FACTLESS FACT TABLES

• A factless fact table is a fact table that does not contain any facts. It contains only dimension keys and captures events at the information level that are not included in calculations. It simply records that an event happened over a period.
• A factless fact table captures the many-to-many relationships between dimensions, but contains no numeric or textual facts. Such tables are often used to record events or coverage information. Common examples of factless fact tables include:

• Identifying product promotion events (to determine promoted products that didn’t
sell)

• Tracking student attendance or registration events

• Tracking insurance-related accident events

• Identifying building, facility, and equipment schedules for a hospital or university


FACTLESS FACT TABLES FOR
EVENTS
• The first type of factless fact table is a table that records an event. Many event-tracking tables in dimensional data warehouses turn out to be factless. Sometimes there seem to be no facts associated with an important business process.

• Events or activities occur that you wish to track, but you find no measurements. In
situations like this, build a standard transaction-grained fact table that contains no
facts.
• For example, consider a factless fact table named FACT_LEAVE.

• This fact table is used to capture the leave taken by an employee.

• Whenever an employee takes leave, a record is created with the dimensions.

• Using FACT_LEAVE we can answer many questions, such as:

• Number of leaves taken by an employee

• The type of leave an employee takes

• Details of the employee who took leave
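
A minimal sketch of FACT_LEAVE with Python's built-in `sqlite3` module follows. The dimension tables, column names, and sample rows are assumptions, since the slides name only the fact table. The table has no measure columns, so questions are answered by counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_dim (employee_key INTEGER PRIMARY KEY, employee_name TEXT);
    CREATE TABLE leave_type_dim (leave_type_key INTEGER PRIMARY KEY, leave_type TEXT);

    -- Factless fact table: dimension keys only, one row per leave day
    CREATE TABLE fact_leave (
        employee_key   INTEGER REFERENCES employee_dim(employee_key),
        leave_type_key INTEGER REFERENCES leave_type_dim(leave_type_key),
        date_key       INTEGER
    );

    INSERT INTO employee_dim VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO leave_type_dim VALUES (1, 'sick'), (2, 'annual');
    INSERT INTO fact_leave VALUES (1, 1, 20240105),
                                  (1, 2, 20240212),
                                  (1, 2, 20240213),
                                  (2, 1, 20240301);
""")

# Number of leave days taken by each employee
leaves_per_employee = conn.execute("""
    SELECT e.employee_name, COUNT(*)
    FROM fact_leave AS f
    JOIN employee_dim AS e ON f.employee_key = e.employee_key
    GROUP BY e.employee_name
    ORDER BY e.employee_name
""").fetchall()

# Types of leave taken by one employee (employee_key 1)
leave_types_for_asha = conn.execute("""
    SELECT lt.leave_type, COUNT(*)
    FROM fact_leave AS f
    JOIN leave_type_dim AS lt ON f.leave_type_key = lt.leave_type_key
    WHERE f.employee_key = 1
    GROUP BY lt.leave_type
    ORDER BY lt.leave_type
""").fetchall()
```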


FACTLESS FACT TABLES FOR
CONDITIONS
Factless fact tables are also used to model conditions or other important relationships

among dimensions. In these cases, there are no clear transactions or events.

• They are used to support negative analysis reports, for example a store that did not sell a product for a given period. To produce such a report, you need a fact table that captures all the possible combinations. You can then figure out what is missing.

• For example, fact_promo gives information about the products that have promotions but still did not sell.

• This fact table answers the questions below:

• Which products have promotions.

• Which products have promotions and did sell.

• Which products have promotions but did not sell.

• This kind of factless fact table is used to track conditions, coverage, or eligibility.
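
The negative analysis can be sketched in Python as a set difference between the coverage table and the sales fact table. The key layout `(product_key, store_key, week_key)` and the sample keys are assumptions for illustration.

```python
# Coverage rows from a hypothetical fact_promo table:
# every (product_key, store_key, week_key) that was on promotion.
fact_promo = {
    (1, 10, 202401),
    (2, 10, 202401),
    (3, 10, 202401),
}

# The same keys taken from the sales fact table: what actually sold.
sales_keys = {
    (1, 10, 202401),
    (3, 10, 202401),
    (4, 10, 202401),  # sold without a promotion
}

promoted = fact_promo
promoted_and_sold = fact_promo & sales_keys
promoted_not_sold = fact_promo - sales_keys
```

The sales fact table alone cannot answer the third question, because an unsold product leaves no sales row. The coverage table supplies the full list of promoted combinations to subtract from.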
