
12/5/2023

Data Science for Economics and Business

DATABASE MANAGEMENT SYSTEM

Lecture Agenda

• Design process
• Dim tables
• Fact tables

SQL for Data Analytics

Lecture 06: Design a dimensional model

Dimensional Modeling

Section 1 Design Process

Four steps to build a dimensional model

Step 1: Select the Business Process
Step 2: Declare the Grain
Step 3: Identify the Dimensions
Step 4: Identify the Facts

Business Process in Dimensional Model

The first step in the design is to decide what business process to model by
combining an understanding of the business requirements with an understanding
of the available source data.
▪ Business processes are the operational activities performed by your
organization.
▪ Business process events generate or capture performance metrics that
translate into facts in a fact table.
▪ Most fact tables focus on the results of a single business process.

-> Choosing the process is important because it defines a specific design target
and allows the grain, dimensions, and facts to be declared.

Grain

Declaring the grain means specifying exactly what an individual fact table row
represents. It provides the answer to the question, ‘How do you describe a single
row in the fact table?’.

For example:
▪ One row per bank account each month.
▪ One row per scan of an individual product on a customer’s sales transaction.

-> Declaring the grain is a critical step that can’t be taken lightly.

Dimensions for Descriptive Context

▪ Dimensions provide the “who, what, where, when, why, and how” context
surrounding a business process event.
▪ A dimension should be single valued when associated with a given fact row.
-> Dimension tables are the “SOUL” of the data model

Facts for Measurements

▪ Facts are the measurements that result from a business process event and are
almost always numeric
▪ A single fact table row has a one-to-one relationship to a measurement event
as described by the fact table’s grain

Use Case

Imagine you work in the headquarters of a large grocery chain. The business
has 100 grocery stores spread across five states. Each store has a full
complement of departments, including grocery, frozen foods, dairy, meat,
produce, and bakery. Each store has approximately 60,000 individual products,
called stock keeping units (SKUs), on its shelves.

Sample cash register receipt


Step 1: Select the Business Process

Management wants to better understand customer purchases as captured by the
POS system. Which business process should you model?

Point-of-sale (POS) system

Step 2: Declare the Grain

After the business process has been identified, the design team faces a serious
decision about the granularity: what level of data detail should be made
available in the dimensional model?

In our case study, the most granular data is an individual product on a POS
transaction.
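To make this grain concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (fact_sales, transaction_id, product_key, sales_quantity) and the sample rows are illustrative assumptions, not part of the lecture. A composite primary key is one possible way to enforce "one row per product per POS transaction".

```python
import sqlite3

# Hypothetical schema: names and sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_sales (
        transaction_id INTEGER NOT NULL,  -- POS receipt number
        product_key    INTEGER NOT NULL,  -- product scanned on that receipt
        sales_quantity INTEGER NOT NULL,
        PRIMARY KEY (transaction_id, product_key)  -- grain: one row per product per transaction
    )
""")

conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1001, 1, 2), (1001, 2, 1), (1002, 1, 5)],
)

# A second row for the same product on the same transaction violates the grain.
try:
    conn.execute("INSERT INTO fact_sales VALUES (1001, 1, 3)")
    grain_violated = False
except sqlite3.IntegrityError:
    grain_violated = True

row_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(grain_violated, row_count)
```

The rejected insert shows why the grain must be declared before anything else: it decides what counts as a duplicate row.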

Step 3: Identify the Dimensions

After the grain of the fact table has been chosen, the choice of dimensions is
straightforward.
You can ask whether other dimensions can be attributed to the POS
measurements, such as the date of the sale, the store where the sale occurred,
the promotion under which the product is sold, the cashier who handled the
sale, and potentially the method of payment.
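The dimensions listed above can be sketched as a small star schema. This is only an illustration under assumed names: dim_date, dim_store, dim_product, and fact_sales, along with every column and sample value, are hypothetical. Only three of the candidate dimensions are shown to keep the example short.

```python
import sqlite3

# Hypothetical star schema; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT);
    CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT, state TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, department TEXT);
    CREATE TABLE fact_sales (
        date_key       INTEGER REFERENCES dim_date(date_key),
        store_key      INTEGER REFERENCES dim_store(store_key),
        product_key    INTEGER REFERENCES dim_product(product_key),
        sales_quantity INTEGER
    );
    INSERT INTO dim_date    VALUES (20231205, '2023-12-05');
    INSERT INTO dim_store   VALUES (1, 'Store A', 'NY'), (2, 'Store B', 'NJ');
    INSERT INTO dim_product VALUES (1, 'Milk 1L', 'Dairy'), (2, 'Bagel', 'Bakery');
    INSERT INTO fact_sales  VALUES (20231205, 1, 1, 2), (20231205, 1, 2, 6), (20231205, 2, 1, 3);
""")

# Each fact row points to exactly one row in every dimension,
# so dimension attributes can be used to group and filter the facts.
rows = conn.execute("""
    SELECT s.store_name, p.department, SUM(f.sales_quantity)
    FROM fact_sales f
    JOIN dim_store s   ON s.store_key = f.store_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY s.store_name, p.department
    ORDER BY s.store_name, p.department
""").fetchall()
print(rows)
```

The store and department labels in the result come from the dimension tables, which is the "who, what, where" context that dimensions add to the measurements.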

Step 4: Identify the Facts

The fourth and final step in the design is to make a careful determination of
which facts will appear in the fact table.
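As a rough illustration of this step, the sketch below adds a few numeric facts to a POS fact table. The specific facts (sales_quantity, unit_price_cents, extended_sales_cents) and all values are assumptions chosen for the example. The lecture does not list which facts to use. Amounts are stored as integer cents to avoid floating-point rounding.

```python
import sqlite3

# Hypothetical facts for the POS case study; names and values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (
        transaction_id       INTEGER,
        product_key          INTEGER,
        sales_quantity       INTEGER,  -- numeric fact
        unit_price_cents     INTEGER,  -- numeric fact, stored in cents
        extended_sales_cents INTEGER   -- sales_quantity * unit_price_cents
    );
    INSERT INTO fact_sales VALUES
        (1001, 1, 2, 150, 300),
        (1001, 2, 1, 499, 499),
        (1002, 1, 5, 150, 750);
""")

# Facts must be true to the grain: each value describes one product on one transaction.
total_quantity, total_sales_cents = conn.execute(
    "SELECT SUM(sales_quantity), SUM(extended_sales_cents) FROM fact_sales"
).fetchone()
print(total_quantity, total_sales_cents)
```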

Dimensional Modeling

Section 2 Basic Fact Table Techniques

Basic Fact Table Techniques

▪ Fact table structure
▪ Types of fact measures:
o Additive
o Semi-additive
o Non-additive
▪ Types of fact tables:
o Transaction fact table
o Periodic snapshot fact table
o Accumulating snapshot fact table
o Factless fact table
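The difference between additive and semi-additive measures can be sketched with a periodic snapshot table at the grain "one row per bank account each month". The table fact_account_month, its columns, and the numbers are hypothetical and only serve the illustration.

```python
import sqlite3

# Hypothetical monthly account snapshot; names and numbers are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_account_month (
        month_key   INTEGER,  -- e.g. 202311
        account_key INTEGER,
        deposits    INTEGER,  -- additive: can be summed across every dimension
        balance     INTEGER   -- semi-additive: can be summed across accounts, not across months
    );
    INSERT INTO fact_account_month VALUES
        (202311, 1, 100, 1000),
        (202311, 2,  50,  500),
        (202312, 1, 200, 1200),
        (202312, 2,   0,  500);
""")

# Additive fact: summing over all rows is meaningful.
total_deposits = conn.execute(
    "SELECT SUM(deposits) FROM fact_account_month"
).fetchone()[0]

# Semi-additive fact: sum across accounts within one month only.
december_balance = conn.execute(
    "SELECT SUM(balance) FROM fact_account_month WHERE month_key = 202312"
).fetchone()[0]

# Summing balances across months double-counts the same money.
misleading_total = conn.execute(
    "SELECT SUM(balance) FROM fact_account_month"
).fetchone()[0]
print(total_deposits, december_balance, misleading_total)
```

Non-additive measures, such as ratios or unit prices, cannot be meaningfully summed along any dimension. A common approach is to store their additive components and compute the ratio after aggregating.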

Factless Fact Tables

▪ A factless fact table does not include any measure columns
▪ It stores only the relationships between dimensions -> Good practice for
defining many-to-many relationships between dimensions
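A minimal sketch of a factless fact table, assuming a hypothetical promotion-coverage scenario for the grocery chain. The names fact_promotion_coverage, dim_product, and dim_promotion and the sample rows are illustrative and not from the lecture.

```python
import sqlite3

# Hypothetical factless fact table; names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product   (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_promotion (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);
    -- Only dimension keys, no measure columns: each row records that a
    -- product was covered by a promotion (a many-to-many relationship).
    CREATE TABLE fact_promotion_coverage (
        product_key   INTEGER REFERENCES dim_product(product_key),
        promotion_key INTEGER REFERENCES dim_promotion(promotion_key),
        PRIMARY KEY (product_key, promotion_key)
    );
    INSERT INTO dim_product   VALUES (1, 'Milk 1L'), (2, 'Bagel');
    INSERT INTO dim_promotion VALUES (10, 'Holiday Flyer'), (20, 'Loyalty Coupon');
    INSERT INTO fact_promotion_coverage VALUES (1, 10), (1, 20), (2, 10);
""")

# With no measures to sum, analysis is done by counting rows.
rows = conn.execute("""
    SELECT pr.promotion_name, COUNT(*)
    FROM fact_promotion_coverage f
    JOIN dim_promotion pr ON pr.promotion_key = f.promotion_key
    GROUP BY pr.promotion_name
    ORDER BY pr.promotion_name
""").fetchall()
print(rows)
```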

Dimensional Modeling

Section 3 Basic Dim Table Techniques

Basic Dimension Table Techniques

▪ Dimension table structure
▪ Surrogate key in dimension table
▪ Types of dimension:
o Snowflake dimensions
o Role-playing dimensions
o (optional, self-study) Slowly changing dimensions
o (optional, self-study) Junk dimensions
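As a small illustration of a surrogate key, the sketch below keeps a warehouse-assigned integer key alongside the natural key from the source system. The dim_product layout and the SKU codes are assumptions for the example. In SQLite, an INTEGER PRIMARY KEY column is filled in automatically when it is omitted from the insert.

```python
import sqlite3

# Hypothetical product dimension; names and SKU values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,  -- surrogate key, assigned by the warehouse
        sku          TEXT NOT NULL,        -- natural key from the source system
        product_name TEXT,
        department   TEXT
    )
""")

# Omitting product_key lets SQLite assign the next integer automatically.
conn.executemany(
    "INSERT INTO dim_product (sku, product_name, department) VALUES (?, ?, ?)",
    [("SKU-00042", "Milk 1L", "Dairy"), ("SKU-00917", "Bagel", "Bakery")],
)

rows = conn.execute(
    "SELECT product_key, sku FROM dim_product ORDER BY product_key"
).fetchall()
print(rows)
```

Fact tables would then reference product_key rather than sku, so a change in the source system's codes does not ripple into the facts.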

Snowflake dimensions

▪ A snowflake dimension is a set of normalized tables for a single business
entity

Should we use Snowflake dimensions?

▪ Longer relationship filter propagation chains -> Less efficient than filters
applied within a single table
▪ More tables in the Fields pane -> Less intuitive experience
▪ Impossible to create a hierarchy that spans the tables -> Cannot use the
drill-down feature
▪ In Power BI, more tables need to be loaded -> Less efficient from a storage
and performance perspective

De-normalize Snowflake dimensions!
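One possible way to de-normalize a snowflake dimension is to join its normalized tables into a single flat table. The sketch below assumes a hypothetical product -> subcategory -> category chain. None of these table or column names come from the lecture.

```python
import sqlite3

# Hypothetical snowflaked product dimension; names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category    (category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE subcategory (subcategory_key INTEGER PRIMARY KEY, subcategory_name TEXT,
                              category_key INTEGER REFERENCES category(category_key));
    CREATE TABLE product     (product_key INTEGER PRIMARY KEY, product_name TEXT,
                              subcategory_key INTEGER REFERENCES subcategory(subcategory_key));
    INSERT INTO category    VALUES (1, 'Food');
    INSERT INTO subcategory VALUES (1, 'Dairy', 1), (2, 'Bakery', 1);
    INSERT INTO product     VALUES (1, 'Milk 1L', 1), (2, 'Bagel', 2);

    -- De-normalize: join the three tables into one flat dimension table.
    CREATE TABLE dim_product AS
    SELECT p.product_key, p.product_name, s.subcategory_name, c.category_name
    FROM product p
    JOIN subcategory s ON s.subcategory_key = p.subcategory_key
    JOIN category c    ON c.category_key = s.category_key;
""")

rows = conn.execute(
    "SELECT * FROM dim_product ORDER BY product_key"
).fetchall()
print(rows)
```

With category, subcategory, and product in one table, a single relationship to the fact table is enough, and a category > subcategory > product hierarchy can be defined on that one table.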

Role-Playing Dimensions

• A role-playing dimension is a dimension that can filter related facts differently
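A common example of a role-playing dimension is a date table that is joined to the same fact table through different key columns. The sketch below assumes a hypothetical fact_orders table with an order date and a ship date. The names, the helper amount_by_month, and the data are illustrative only.

```python
import sqlite3

# Hypothetical order fact with two date roles; names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month_name TEXT);
    CREATE TABLE fact_orders (
        order_date_key INTEGER REFERENCES dim_date(date_key),
        ship_date_key  INTEGER REFERENCES dim_date(date_key),
        order_amount   INTEGER
    );
    INSERT INTO dim_date VALUES (20231130, 'November'), (20231205, 'December');
    INSERT INTO fact_orders VALUES
        (20231130, 20231130, 100),
        (20231130, 20231205, 250),
        (20231205, 20231205, 400);
""")

def amount_by_month(role_column):
    # The same dim_date table plays a different role depending on the join column.
    # role_column is a trusted, hard-coded column name in this sketch.
    query = f"""
        SELECT d.month_name, SUM(f.order_amount)
        FROM fact_orders f
        JOIN dim_date d ON d.date_key = f.{role_column}
        GROUP BY d.date_key, d.month_name
        ORDER BY d.date_key
    """
    return conn.execute(query).fetchall()

by_order_date = amount_by_month("order_date_key")
by_ship_date = amount_by_month("ship_date_key")
print(by_order_date, by_ship_date)
```

The same facts produce different monthly totals depending on which role the date dimension plays, which is what "filter related facts differently" means in practice.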

SQL for Data Analytics

It’s time for your questions

THANK YOU !
