
Logical Design in Data Warehouse
Sufia Adha Putri
sufia@ub.ac.id

Fakultas Ilmu Komputer Universitas Brawijaya


Data Warehouse & Data Mining - Logical Design in Data Warehouse
Fakultas Ilmu Komputer Universitas Brawijaya
Objectives

• Logical Versus Physical Design in Data Warehouses


• Creating a Logical Design
• Data Warehousing Schemas
• Data Warehousing Objects

Logical Versus Physical Design in Data Warehouses
• By this stage you have defined the business requirements, agreed upon the scope of your
application, and created a conceptual design.
• Now you need to translate your requirements into a system deliverable.
• To do so, you create the logical and physical design for the data warehouse.
• You then define:
• The specific data content
• Relationships within and between groups of data
• The system environment supporting your data warehouse
• The data transformations required
• The frequency with which data is refreshed
Data Warehouse Schema Architecture
• A schema is a collection of database objects, including tables, views,
indexes, and synonyms
• Most data warehouses use a dimensional model
• The model of your source data and the requirements of your users help
you design the data warehouse schema
• Data Warehouse Schema :
• Star Schema
• Snowflake Schema
• Fact constellation schema

Star Schema
• What is a star schema?
• The star schema architecture is the simplest data warehouse schema.
• It is called a star schema because the diagram resembles a star, with points
radiating from a centre.
• The centre of the star consists of the fact table, and the points of the star are the
dimension tables.
• Usually the fact tables in a star schema are in third normal form (3NF),
whereas dimension tables are de-normalized.
• Although the star schema is the simplest architecture, it is the most
commonly used nowadays and is recommended by Oracle.
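The layout described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (fact_sales, dim_product, dim_store), the columns, and the sample rows are assumptions, not taken from the slides.

```python
import sqlite3

# In-memory database holding one fact table and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name   TEXT, city     TEXT);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    quantity    INTEGER,
    amount      REAL
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Notebook', 'Stationery'), (3, 'Mug', 'Kitchen');
INSERT INTO dim_store   VALUES (1, 'Store A', 'Malang'), (2, 'Store B', 'Surabaya');
INSERT INTO fact_sales  VALUES (1, 1, 10, 20.0), (2, 1, 5, 25.0), (3, 2, 2, 30.0), (1, 2, 4, 8.0);
""")


def sales_by_category(connection):
    """Total sales amount per product category: one join from fact to dimension."""
    return connection.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
```

Because the category is stored directly in the de-normalized dimension table, the query needs a single join per dimension.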

Figure: Star Schema

The main characteristics of star schema:
• Simple structure → easy-to-understand schema
• High query effectiveness → small number of tables to join
• Relatively long time to load data into dimension tables → de-normalization
and redundant data can make the tables large
• The most commonly used schema in data warehouse implementations
→ widely supported by a large number of business intelligence tools

Design Concepts in Star Schemas
• Here we touch on some of the key terms used in star schemas. This is by no
means a full set, but is intended to highlight some of the areas worth your
consideration. This section contains the following topics:
• Data Grain
• Working with Multiple Star Schemas
• Conformed Dimensions
• Conformed Facts
• Surrogate Keys
• Degenerate Dimensions
• Junk Dimensions
• Embedded Hierarchy
• Factless Fact Tables
• Slowly Changing Dimensions

Data Grain
• One of the most important tasks when designing your model is to consider the level
of detail it will provide, referred to as the grain of the data.
• Consider a sales schema:
• Will the grain be very fine, storing every single item purchased by each customer?
• Or will it be a coarse grain, storing only the daily totals of sales for each product at each store?
• In modern data warehousing there is a strong emphasis on providing the finest grain
data possible, because this allows for maximum analytic power.
• Dimensional modeling experts generally recommend that each fact table store just
one grain level.
• Presenting fact data in single-grain tables supports more reliable querying and table
maintenance, because there is no ambiguity about the scope of any row in a fact
table
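The two grains in the sales example can be compared with a small Python sketch. The sample rows and the helper name daily_totals are hypothetical; the point is that a coarse grain can always be derived from a fine one, but not the other way around.

```python
from collections import defaultdict

# Fine grain: one row per item purchased by a customer (made-up sample data).
transactions = [
    {"date": "2024-01-01", "store": "S1", "product": "Pen", "customer": "C1", "amount": 2.0},
    {"date": "2024-01-01", "store": "S1", "product": "Pen", "customer": "C2", "amount": 3.0},
    {"date": "2024-01-01", "store": "S1", "product": "Mug", "customer": "C1", "amount": 5.0},
    {"date": "2024-01-02", "store": "S1", "product": "Pen", "customer": "C3", "amount": 2.0},
]


def daily_totals(rows):
    """Coarse grain: one total per (date, store, product); customer detail is lost."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["date"], row["store"], row["product"])] += row["amount"]
    return dict(totals)


coarse = daily_totals(transactions)
```

Once only the daily totals are stored, a question such as "which customers bought pens on 1 January?" can no longer be answered, which is why the finest grain is usually preferred.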
Working with Multiple Star Schemas
• Because the star schema design approach is intended to chunk data
into distinct processes, you need reliable and performant ways to
traverse the schemas when queries span multiple schemas.
• One term for this ability is a data warehouse bus architecture.
• A data warehouse bus architecture can be achieved with conformed
dimensions and conformed facts

Conformed Dimensions
• Conformed dimensions are dimensions that are designed
identically across the various star schemas.
• Conformed dimensions use the same values, column names and data
types consistently across multiple stars.
• The conformed dimensions do not have to contain the same number
of rows in each schema's copy of the dimension table, as long as the
rows in the shorter tables are a true subset of the larger tables.
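The "true subset" condition can be checked mechanically. The following Python sketch assumes each copy of a dimension is a list of tuples with identical column order; the function name is_conformed_subset and the sample rows are illustrative only.

```python
def is_conformed_subset(smaller, larger):
    """Return True if every row of `smaller` appears unchanged in `larger`."""
    return set(smaller) <= set(larger)


# Hypothetical product dimension as used by two different star schemas.
product_dim_sales = [(1, "Pen", "Stationery"), (2, "Notebook", "Stationery"), (3, "Mug", "Kitchen")]
product_dim_returns = [(1, "Pen", "Stationery"), (3, "Mug", "Kitchen")]

# Same key, but a different category value: this copy is NOT conformed.
product_dim_broken = [(1, "Pen", "Office")]
```

A shorter copy passes the check only when its keys and attribute values match the larger table exactly, which is what makes queries across the two stars reliable.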

Conformed Facts
• If the fact columns in multiple fact tables have exactly the same
meaning, then they are considered conformed facts.
• Such facts can be used together reliably in calculations even though
they are from different tables.
• Conformed facts should have the same column names to indicate
their conformed status.
• Facts that are not conformed should always have different names to
highlight their different meanings

Surrogate Keys
• Surrogate or artificial keys, usually sequential integers, are
recommended for dimension tables.
• By using surrogate keys, the data is insulated from operational
changes.
• Also, compact integer keys may allow for better performance than
large and complex alphanumeric keys.
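One possible way to hand out surrogate keys during loading is sketched below in Python. The class name SurrogateKeyGenerator and the operational keys such as "CUST-00A7" are hypothetical.

```python
class SurrogateKeyGenerator:
    """Maps operational (natural) keys to sequential integer surrogate keys."""

    def __init__(self, start=1):
        self._next = start
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key if this natural key was seen before.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = self._next
            self._next += 1
        return self._by_natural_key[natural_key]


keys = SurrogateKeyGenerator()
first = keys.key_for("CUST-00A7")
second = keys.key_for("CUST-00B2")
again = keys.key_for("CUST-00A7")
```

The fact table stores only the compact integers, so a change in the format of the operational key affects the lookup mapping and not the warehouse rows.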

Degenerate Dimensions
• Degenerate dimensions are dimension columns in fact tables that do
not join to a dimension table.
• They are typically items such as order numbers and invoice numbers

Junk Dimensions
• Junk dimensions are abstract dimension tables used to hold text
lookup values for flags and codes in fact tables.
• These dimensions are referred to as junk, not because they have low
value, but because they hold an assortment of columns for
convenience, analogous to the idea of a "junk drawer" in your home.
• The number of distinct values (cardinality) of each column in a junk
dimension table is typically small

Junk Dimensions
• Let's look at an example. Assuming that we have the following fact table:

• In this example, TXN_CODE, COUPON_IND, and PREPAY_IND are all indicator fields.
• In this existing format, each one of them is a dimension.
• Using the junk dimension principle, we can combine them into a single junk dimension
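A junk dimension for these three indicators can be sketched as the set of all value combinations, each given one key. The indicator names come from the example above; the possible values ("CASH", "CARD", "Y", "N") are assumptions made for illustration.

```python
from itertools import product

# Assumed value domains for the three indicator fields.
TXN_CODE = ["CASH", "CARD"]
COUPON_IND = ["Y", "N"]
PREPAY_IND = ["Y", "N"]


def build_junk_dimension():
    """One row per combination of indicator values, keyed by a small integer."""
    combos = product(TXN_CODE, COUPON_IND, PREPAY_IND)
    return {combo: key for key, combo in enumerate(combos, start=1)}


junk_dim = build_junk_dimension()

# The fact row then stores a single junk key instead of three indicator columns.
junk_key = junk_dim[("CARD", "N", "N")]
```

With low-cardinality columns the junk dimension stays tiny (here 2 × 2 × 2 = 8 rows) while the fact table loses two columns.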

Embedded Hierarchy
• Classic dimensional modeling with star schemas advocates that each table
contain data at a single grain.
• However, there are situations where designers choose to have multiple grains in a
table, and these commonly represent a rollup hierarchy.
• A single sales fact table, for instance, might contain both transaction-level data,
then a day-level rollup by product, then a month-level rollup by product.
• In such cases, the fact table will need to contain a level column indicating the
hierarchy level applying to each row, and queries against the table will need to
include a level predicate
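The need for a level predicate can be seen in a small sqlite3 sketch. The table layout, the column name grain_level, and the level codes 'TXN', 'DAY' and 'MONTH' are assumptions chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (
    grain_level TEXT,   -- 'TXN', 'DAY' or 'MONTH'
    period      TEXT,
    product     TEXT,
    amount      REAL
);
INSERT INTO fact_sales VALUES
    ('TXN',   '2024-01-05', 'Pen', 10.0),
    ('TXN',   '2024-01-05', 'Pen', 15.0),
    ('DAY',   '2024-01-05', 'Pen', 25.0),
    ('MONTH', '2024-01',    'Pen', 25.0);
""")


def total_amount(connection, level=None):
    """Sum the amounts, optionally restricted to one hierarchy level."""
    if level is None:
        sql, params = "SELECT SUM(amount) FROM fact_sales", ()
    else:
        sql, params = "SELECT SUM(amount) FROM fact_sales WHERE grain_level = ?", (level,)
    return connection.execute(sql, params).fetchone()[0]
```

Without the predicate the same sales are counted once per level; with it, each level reports the same correct total.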

Factless Fact Tables
• A factless fact table is a fact table that does not have any measures

• Factless fact tables offer the most flexibility in data warehouse design. For example, one can
easily answer the following questions with this factless fact table:
• How many students attended a particular class on a particular day?
• How many classes on average does a student attend on a given day?
• Without using a factless fact table, we will need two separate fact tables to answer the above two
questions. With the above factless fact table, it becomes the only fact table that's needed.
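Both questions can be answered by counting rows, as the following Python sketch suggests. It assumes the factless fact table holds one row per (student, class, date) attendance event; the sample rows and function names are made up.

```python
from collections import Counter

# Factless fact table: only dimension keys, no measures (hypothetical data).
attendance = [
    ("s1", "math", "2024-03-01"),
    ("s2", "math", "2024-03-01"),
    ("s1", "physics", "2024-03-01"),
    ("s3", "math", "2024-03-02"),
]


def students_in_class(rows, class_id, day):
    """How many students attended a particular class on a particular day?"""
    return sum(1 for student, cls, date in rows if cls == class_id and date == day)


def average_classes_per_student(rows, day):
    """How many classes on average does a student attend on a given day?"""
    per_student = Counter(student for student, cls, date in rows if date == day)
    return sum(per_student.values()) / len(per_student) if per_student else 0.0
```

The "measure" in both cases is simply the existence of a row, so no numeric fact column is required.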

Slowly Changing Dimensions
• The "Slowly Changing Dimension" problem is a common one particular to data warehousing. In a
nutshell, it applies to cases where an attribute of a record varies over time.
• For example, Christina is a customer of ABC Inc. She first lived in Chicago, Illinois,
so the original entry in the customer lookup table is the following record:
Customer Key    Name         State
1001            Christina    Illinois

• She later moved to Los Angeles, California, in January 2003. How should ABC Inc. now
modify its customer table to reflect this change? This is the "Slowly Changing Dimension"
problem.

Slowly Changing Dimensions
• There are in general three ways to solve this type of problem,
and they are categorized as follows:
• Type 1: The new record replaces the original record. No trace of
the old record exists.
• Type 2: A new record is added into the customer dimension
table. Therefore, the customer is treated essentially as two
people.
• Type 3: The original record is modified to reflect the change.

Type 1 Slowly Changing Dimensions
• In Type 1 Slowly Changing Dimension, the new information simply overwrites the original
information. In other words, no history is kept.
• In our example, recall we originally have the following table:
Customer Key    Name         State
1001            Christina    Illinois

• After Christina moved from Illinois to California, the new information replaces the original record,
and we have the following table:
Customer Key    Name         State
1001            Christina    California
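A Type 1 change amounts to an in-place overwrite. The Python sketch below models the customer table as a dictionary keyed by customer key; the function name apply_type1 is hypothetical.

```python
# Customer dimension before the move, keyed by customer key.
customers = {1001: {"name": "Christina", "state": "Illinois"}}


def apply_type1(dimension, customer_key, new_state):
    """Type 1: overwrite the attribute in place; the old value is not kept."""
    dimension[customer_key]["state"] = new_state


apply_type1(customers, 1001, "California")
```

After the call there is still exactly one row for key 1001, and nothing in the table records that the state was ever Illinois.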

Type 1 Slowly Changing Dimensions
• Advantages: This is the easiest way to handle the Slowly Changing Dimension
problem, since there is no need to keep track of the old information.
• Disadvantages: All history is lost. By applying this methodology, it is not possible
to trace back in history. For example, in this case, the company would not be able
to know that Christina lived in Illinois before.
• Usage: About 50% of the time.
• When to use Type 1: Type 1 slowly changing dimension should be used when it is
not necessary for the data warehouse to keep track of historical changes.

Type 2 Slowly Changing Dimension
• In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information.
Therefore, both the original and the new record will be present. The new record gets its own primary key.
• In our example, recall we originally have the following table:
Customer Key    Name         State
1001            Christina    Illinois

• After Christina moved from Illinois to California, we add the new information as a new row into the table:
Customer Key    Name         State
1001            Christina    Illinois
1005            Christina    California
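A Type 2 change can be sketched in Python as appending a row with a new key. The customer_id column (an operational identifier linking the two rows) and the is_current flag are assumptions added for this sketch; they do not appear in the table above, although similar housekeeping columns are common in practice.

```python
# Customer dimension before the move. customer_id and is_current are
# illustrative additions, not columns from the example table.
customer_dim = [
    {"customer_key": 1001, "customer_id": "C-CHRISTINA", "name": "Christina",
     "state": "Illinois", "is_current": True},
]


def apply_type2(rows, customer_id, new_state, new_key):
    """Type 2: keep the old row, add a new row with its own primary key."""
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["is_current"] = False
            new_row = dict(row, customer_key=new_key, state=new_state, is_current=True)
            rows.append(new_row)
            return new_row
    raise KeyError(customer_id)


new_row = apply_type2(customer_dim, "C-CHRISTINA", "California", new_key=1005)
```

Facts loaded before the move keep pointing at key 1001 (Illinois), while later facts use key 1005 (California), so history is preserved.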

Type 2 Slowly Changing Dimension
• Advantages: This allows us to accurately keep all historical information.
• Disadvantages:
• This will cause the size of the table to grow fast. In cases where the number of rows for the
table is very high to start with, storage and performance can become a concern.
• This necessarily complicates the ETL process.
• Usage: About 50% of the time.
• When to use Type 2: Type 2 slowly changing dimension should be used when it is
necessary for the data warehouse to track historical changes.

Type 3 Slowly Changing Dimension
• In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular
attribute of interest, one indicating the original value, and one indicating the current value. There
will also be a column that indicates when the current value becomes active.
• In our example, recall we originally have the following table:
Customer Key    Name         State
1001            Christina    Illinois
• To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:
Customer Key, Name, Original State, Current State, Effective Date
• After Christina moved from Illinois to California, the original information gets updated, and we
have the following table (assuming the effective date of change is January 15, 2003):
Customer Key    Name         Original State    Current State    Effective Date
1001            Christina    Illinois          California       15-JAN-2003
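The Type 3 layout can be sketched in Python as a single row with an original and a current column. The field names mirror the columns listed above; the function name apply_type3 and the initial None for the effective date are assumptions.

```python
from datetime import date

# One row per customer, with separate original and current state columns.
customer = {
    "customer_key": 1001,
    "name": "Christina",
    "original_state": "Illinois",
    "current_state": "Illinois",
    "effective_date": None,
}


def apply_type3(row, new_state, effective):
    """Type 3: update the current value and its effective date in place."""
    row["current_state"] = new_state
    row["effective_date"] = effective


apply_type3(customer, "California", date(2003, 1, 15))
```

The row count never grows, but a further call to apply_type3 would overwrite the current state again, so only the original and the latest value survive.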

Type 3 Slowly Changing Dimension
• Advantages:
• This does not increase the size of the table, since new information is updated.
• This allows us to keep some part of history.
• Disadvantages: Type 3 will not be able to keep all history where an attribute is
changed more than once. For example, if Christina later moves to Texas on
December 15, 2003, the California information will be lost.
• Usage: Type 3 is rarely used in actual practice.
• When to use Type 3: Type 3 slowly changing dimension should only be used
when it is necessary for the data warehouse to track historical changes, and when
such changes will only occur a finite number of times.

Snowflake Schema
• Snowflake schema…?
• “Snowflaking” is a method of normalizing the dimension tables in a STAR
schema
• The snowflake schema architecture is a more complex variation of the star
schema used in a data warehouse, because the tables which describe the
dimensions are normalized
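As a rough illustration of snowflaking, the sqlite3 sketch below moves the product category out of the product dimension into its own table. All table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category attribute is normalized out of dim_product into dim_category.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Kitchen');
INSERT INTO dim_product  VALUES (1, 'Pen', 1), (2, 'Notebook', 1), (3, 'Mug', 2);
INSERT INTO fact_sales   VALUES (1, 20.0), (2, 25.0), (3, 30.0), (1, 8.0);
""")


def sales_by_category(connection):
    """Total sales per category: two joins are needed to reach the category name."""
    return connection.execute("""
        SELECT c.category_name, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product  p ON p.product_key  = f.product_key
        JOIN dim_category c ON c.category_key = p.category_key
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()
```

Each category name is now stored once, at the cost of an additional join in every query that groups by category.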

Figure: Snowflake Schema

Advantages and Disadvantages of Snowflake Schema
• Advantages:
• Small savings in storage space
• Normalized structures are easier to update and maintain
• Disadvantages:
• The schema is less intuitive, and end-users are put off by the complexity
• Browsing through the contents is difficult
• Degraded query performance because of additional joins
• Snowflaking is not generally recommended in a data warehouse
environment. Query performance is the highest priority in a
data warehouse, and snowflaking hampers it.
Fact Constellation Schema
• Fact Constellation Schema?
• For each star schema it is possible to construct a fact constellation schema (for
example, by splitting the original star schema into several star schemas, each of
which describes facts at a different level of the dimension hierarchies).
• The fact constellation architecture contains multiple fact tables that share
many dimension tables.

Figure: Fact Constellation Schema

Data Warehousing Objects
• The two types of objects commonly used in dimensional data warehouse
schemas :
• Fact tables…?
• Dimension tables…?

Data Warehousing Objects: Fact Tables
• Fact tables are the large tables in your data warehouse schema that store business
measurements.
• Fact tables typically contain facts and foreign keys to the dimension tables.
• Fact tables represent data, usually numeric and additive, that can be analysed and examined.
• A fact table contains either detail-level facts or facts that have been aggregated
• Examples include sales, cost, and profit
• Requirements of Fact Tables
• You must define a fact table for each star schema. From a modeling standpoint, the primary key
of the fact table is usually a composite key that is made up of all of its foreign keys
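The composite key can be sketched with sqlite3 as follows. The three foreign key columns and the helper insert_fact are assumptions for illustration; the dimension tables themselves are omitted to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE fact_sales (
    time_key     INTEGER NOT NULL,
    product_key  INTEGER NOT NULL,
    customer_key INTEGER NOT NULL,
    amount       REAL,
    -- The primary key is the combination of all foreign key columns.
    PRIMARY KEY (time_key, product_key, customer_key)
)
""")


def insert_fact(connection, time_key, product_key, customer_key, amount):
    """Insert one fact row; return False if the composite key already exists."""
    try:
        connection.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (time_key, product_key, customer_key, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Two rows may share any one or two of the keys, but the full combination must be unique, which matches the idea that each fact row describes one event at the chosen grain.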

Data Warehousing Objects: Dimension Tables
• Dimension tables, also known as lookup or reference tables, contain the relatively
static data in the data warehouse.
• Dimension tables store the information you normally use to constrain queries.
• Dimension tables are usually textual and descriptive and you can use them as the
row headers of the result set
• Examples are customers or products
• A dimension is a structure, often composed of one or more hierarchies, that
categorizes data
• Dimension data is typically collected at the lowest level of detail and then aggregated
into higher level totals that are more useful for analysis. These natural rollups or
aggregations within a dimension table are called hierarchies

Hierarchies
• Hierarchies are logical structures that use ordered levels to organize data.
• A hierarchy can be used to define data aggregation.
• For example, in a time dimension, a hierarchy might aggregate data from the month
level to the quarter level to the year level. A hierarchy can also be used to define a
navigational drill path and to establish a family structure.
• Levels : A level represents a position in a hierarchy. For example, a time
dimension might have a hierarchy that represents data at the month, quarter, and
year levels
• Level Relationships: Level relationships specify top-to-bottom ordering of levels
from most general (the root) to most specific information. They define the
parent-child relationship between the levels in a hierarchy
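The month, quarter and year levels of the time dimension can be rolled up with a short Python sketch. The sample figures and the helper names month_to_quarter and roll_up are hypothetical.

```python
from collections import defaultdict

# Sales at the lowest level of the hierarchy: (year, month) -> amount.
monthly_sales = {(2024, 1): 100, (2024, 2): 150, (2024, 4): 200, (2025, 1): 50}


def month_to_quarter(year, month):
    """Parent of a month in the hierarchy: its (year, quarter)."""
    return (year, (month - 1) // 3 + 1)


def roll_up(values, parent_of):
    """Aggregate child-level totals into their parent level."""
    totals = defaultdict(int)
    for child, amount in values.items():
        totals[parent_of(child)] += amount
    return dict(totals)


quarterly_sales = roll_up(monthly_sales, lambda ym: month_to_quarter(*ym))
yearly_sales = roll_up(quarterly_sales, lambda yq: yq[0])
```

Each roll_up call follows one parent-child level relationship, so the same function serves both the month-to-quarter and the quarter-to-year step.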
Figure: Typical Dimension Hierarchy

Data Warehousing Objects: Unique Identifiers
• Unique identifiers are specified for one distinct record in a dimension
table.
• Artificial unique identifiers are often used to avoid the potential
problem of unique identifiers changing.
• Unique identifiers are represented with the # character. For example,
#customer_id

Data Warehousing Objects: Relationships
• Relationships guarantee business integrity.
• An example is that if a business sells something, there is obviously a customer
and a product. Designing a relationship between the sales information in the
fact table and the dimension tables products and customers enforces the
business rules in databases
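One way to see such a relationship enforced is with foreign keys in sqlite3. The table and column names below are assumptions; note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales (
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    amount      REAL
);
INSERT INTO customers VALUES (1, 'Christina');
INSERT INTO products  VALUES (1, 'Pen');
""")


def try_insert_sale(connection, customer_id, product_id, amount):
    """Insert a sale; return False if it refers to a missing customer or product."""
    try:
        connection.execute(
            "INSERT INTO sales VALUES (?, ?, ?)", (customer_id, product_id, amount)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A sale that names an unknown customer is rejected by the database itself, so the rule "every sale has a customer and a product" does not depend on the loading code being correct.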

Example of Data Warehousing Objects and Their Relationships
• Figure 2–3 illustrates a common example of a sales fact table and dimension tables customers, products, promotions,
times, and channels.
