
Data Warehouse Concepts & Modelling

Introduction

Symbiosis International University

By: Dr. Tamal Mondal


By the end of this lecture, students will understand:
• Background of Data Warehousing
• Need for Data Warehousing
CESD

• Definition of Data Warehousing


• Online Transaction Processing (OLTP) and Online Analytical
Processing (OLAP)
• OLAP vs OLTP
• Data Warehouse Architecture
• Data Warehouse Schemas and Components
• Data Warehouse Operations (OLAP Operations)
Background
• Data warehouses generalize and consolidate data in multidimensional space.

• The construction of data warehouses involves data cleaning, data integration, and data transformation, and can be viewed as an important preparation step in data mining.

• Data warehouses provide online analytical processing (OLAP) tools for interactive analysis of multidimensional data of varied granularities, which facilitates effective data generalization and data mining.

• Other data mining functions, such as association, classification, prediction, and clustering, can be integrated with OLAP operations to enhance interactive mining of knowledge at multiple levels of abstraction.
Background
• OLAP tools typically use a data cube, a multidimensional data model, to provide flexible access to summarized data.

• Data lakes, as enterprise information infrastructure, collect extensive data in enterprises and integrate metadata so that data exploration can be conducted effectively.

OLAP, data cubes, and data lakes are the essential features of Data Warehousing.
Why Data Warehousing?
• Operational decisions vs. strategic decisions
• Data types in terms of decision types

[Figure: OLTP queries vs. OLAP queries]
Why Data Warehousing?
• In an operational DB, organizations often record the details of customer transactions in one table, the information about customers in another table, the particulars about product suppliers in a third table, and so on.

• Operational DBs make individual business processes such as supply, storage, and purchasing efficient.

• However, they are limited in providing a critical analytical view over the historical, current, and future states of the business process.

• Data warehousing provides architectures and tools to systematically organize, understand, and use data to make strategic decisions.
Definition
• General definition: A data warehouse is a data repository that is dedicated to analysis and is maintained separately from an organization’s operational databases.

• According to William H. Inmon: A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision making process.
• Subject-oriented: A data warehouse is organized around major subjects, often identified at the enterprise or department level, such as customer, supplier, product, and sales.
• Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and online transaction records.
• Time-variant: Data are stored to provide information from a historic perspective (e.g., the past 5–10 years). Every key structure in a data warehouse contains, either implicitly or explicitly, a time element.
• Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Because of this separation, data are loaded and then read, rather than updated by day-to-day transactions.
OLTP and OLAP
• OLTP systems cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting.

• OLAP systems serve the diverse needs of different users who analyze data to gain business insights and make decisions about the future.
OLTP vs OLAP
• Users and system orientation: An OLTP system is transaction-oriented and is used for operation execution by clerks and clients. An OLAP system is business insight-oriented and is used for data summarization and analysis by knowledge workers, including managers, executives, and analysts.

• Data contents: An OLTP system manages current data that are typically too detailed to be easily used for business decision making. An OLAP system manages large amounts of historic data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity, such as weekly, monthly, or annually.

• Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star model or a snowflake model.
OLTP vs OLAP
• View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historic data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization.

• Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions, such as transferring an amount from one account to another. However, accesses to OLAP systems are mostly read-only operations (because most data warehouses store historic rather than up-to-date information).
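The two access patterns can be contrasted with a minimal sketch. It assumes an in-memory SQLite database; the tables `account` and `sales_history` and the function `transfer` are hypothetical names chosen for illustration, not part of the lecture material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 300)])
conn.execute("CREATE TABLE sales_history (year INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales_history VALUES (?, ?)",
    [(2021, 120), (2021, 80), (2022, 150)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """OLTP-style access: a short, atomic read-write transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )


transfer(conn, 1, 2, 100)
balances = dict(conn.execute("SELECT id, balance FROM account"))

# OLAP-style access: a read-only aggregation over historic data.
yearly_totals = conn.execute(
    "SELECT year, SUM(amount) FROM sales_history GROUP BY year ORDER BY year"
).fetchall()
```

The transfer touches two rows and must succeed or fail as a unit, while the aggregate query scans many rows and changes nothing.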
Architecture of Data Warehouse
• The bottom level is a warehouse database server, typically a mainstream database system such as a relational database or a key-value store. Back-end tools and extraction/transformation/loading (ETL) utilities are used to clean, transform, and load data into this level.

• The middle level is an OLAP server, typically implemented using either a relational OLAP (ROLAP) or a multidimensional OLAP (MOLAP) model.

• The top level is a front-end client layer, which contains tools for querying, reporting, visualization, analysis, and/or data mining, such as trend analysis and prediction.
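The back-end ETL step at the bottom level can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two sources (`db_rows`, `flat_file`), the cleaning rules, and the `sales` table. Real ETL utilities are far more elaborate.

```python
import csv
import io
import sqlite3

# Hypothetical sources: rows from an operational database and a flat file.
db_rows = [
    {"customer": " Alice ", "amount": "120"},
    {"customer": "BOB", "amount": None},  # missing value, dropped during cleaning
]
flat_file = "customer,amount\ncarol,75\nalice,30\n"


def extract():
    """Extract: gather raw records from both sources."""
    yield from db_rows
    yield from csv.DictReader(io.StringIO(flat_file))


def clean_and_transform(records):
    """Clean and transform: drop incomplete rows, normalize names, cast amounts."""
    for rec in records:
        if not rec["amount"]:
            continue
        yield rec["customer"].strip().title(), int(rec["amount"])


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, clean_and_transform(extract()))
totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
```

After loading, the two "Alice" records from different sources are integrated under one name, and the incomplete record is absent.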
Architecture of Data Warehouse
[Figure: Enterprise Data Warehouse and Data Marts]


Architecture of Data Warehouse
[Figure: Data Marts and Data Cleaning]
Architecture of Data Warehouse
[Figure: Data Cleaning Steps]
Architecture of Data Warehouse
[Figure: Data Integration]
Data Warehouse Schemas & Components
• Data Cube:
• Data warehouses and OLAP tools are based on multidimensional data models, which view data in the form of a data cube.

• A data cube allows data to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts.

• In the simplest multidimensional data model, a dimension table can be built for each dimension. For example, a dimension table for item may contain the attributes item name, brand, and type.

• The fact table contains the names of the facts, or measures, as well as (foreign) keys referencing each of the related dimension tables.

• For example, for the subject sales in a company (fact), the possible perspectives may include time, item, branch, and location (dimensions).
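The relationship between a fact table and its dimension tables can be sketched as follows. This is a simplified illustration using an in-memory SQLite database; the table names (`dim_item`, `dim_location`, `fact_sales`), the sample rows, and the choice to keep time as a plain `quarter` column instead of a separate dimension table are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: one per dimension, holding descriptive attributes.
    CREATE TABLE dim_item (
        item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, item_type TEXT
    );
    CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: measures plus foreign keys to the dimension tables.
    CREATE TABLE fact_sales (
        item_key INTEGER REFERENCES dim_item(item_key),
        location_key INTEGER REFERENCES dim_location(location_key),
        quarter TEXT,
        dollars_sold INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO dim_item VALUES (?, ?, ?, ?)",
    [(1, "Laptop X", "Acme", "computer"), (2, "Phone Y", "Zenith", "phone")],
)
conn.executemany("INSERT INTO dim_location VALUES (?, ?)", [(1, "Pune"), (2, "Delhi")])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 1, "Q1", 900), (1, 2, "Q1", 600), (2, 1, "Q2", 400)],
)
conn.commit()

# View the sales measure from the perspective of the item dimension (brand).
sales_by_brand = conn.execute(
    """
    SELECT i.brand, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_item AS i ON f.item_key = i.item_key
    GROUP BY i.brand
    ORDER BY i.brand
    """
).fetchall()
```

Joining the fact table to a different dimension table (for example `dim_location`) would give the same measure from another perspective.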
Data Warehouse Schemas & Components
[Figure: Data Cube]
Data Warehouse Schemas & Components
[Figure: Schemas]
Data Warehouse Schemas & Components
[Figure: Concept Hierarchies]
Data Warehouse Operations (OLAP Operations)
• Slice: A subset of the cube corresponding to a single value for one or more dimensions. For example, selecting on one dimension of a 3-D cube gives a 2-D output.
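A slice can be sketched on a toy cube. The cube below is a hypothetical dictionary keyed by (quarter, item, city); the data and the function name `slice_cube` are assumptions for illustration only.

```python
# Hypothetical 3-D cube: (quarter, item, city) -> units sold.
cube = {
    ("Q1", "laptop", "Pune"): 10,
    ("Q1", "phone", "Pune"): 25,
    ("Q1", "laptop", "Delhi"): 7,
    ("Q2", "laptop", "Pune"): 12,
    ("Q2", "phone", "Delhi"): 30,
}


def slice_cube(cube, quarter):
    """Fix the time dimension to one value; the result is keyed by (item, city)."""
    return {
        (item, city): units
        for (q, item, city), units in cube.items()
        if q == quarter
    }


q1_slice = slice_cube(cube, "Q1")
```

The time dimension disappears from the keys, so the 3-D cube becomes a 2-D table of item by city for the chosen quarter.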
Data Warehouse Operations (OLAP Operations)
• Dice: Defines a subcube by performing a selection on two or more dimensions. For example, selecting on two dimensions of a 3-D cube gives a smaller 3-D cube as output.
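A dice can be sketched in the same way. The cube, its values, and the function name `dice` are hypothetical; the selection here is on the quarter and item dimensions, leaving the city dimension unrestricted.

```python
# Hypothetical 3-D cube: (quarter, item, city) -> units sold.
cube = {
    ("Q1", "laptop", "Pune"): 10,
    ("Q1", "phone", "Pune"): 25,
    ("Q1", "laptop", "Delhi"): 7,
    ("Q2", "laptop", "Pune"): 12,
    ("Q2", "phone", "Delhi"): 30,
    ("Q3", "laptop", "Pune"): 9,
}


def dice(cube, quarters, items):
    """Select on two dimensions; keys keep all three dimensions."""
    return {
        key: units
        for key, units in cube.items()
        if key[0] in quarters and key[1] in items
    }


subcube = dice(cube, quarters={"Q1", "Q2"}, items={"laptop"})
```

Unlike a slice, no dimension is dropped: every key in the result is still a (quarter, item, city) triple.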
Data Warehouse Operations (OLAP Operations)
• Roll Up: Aggregates data along a dimension of the data cube, either by climbing a concept hierarchy or by removing a dimension. This is like zooming out.
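Roll up can be sketched as aggregation along the time dimension. The month-to-quarter mapping, the sample cube, and the function name `roll_up` are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical concept hierarchy on the time dimension: month -> quarter.
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

# Hypothetical 2-D cube at month level: (month, item) -> units sold.
cube = {
    ("Jan", "laptop"): 4,
    ("Feb", "laptop"): 6,
    ("Mar", "phone"): 9,
    ("Apr", "laptop"): 5,
}


def roll_up(cube, hierarchy):
    """Aggregate the measure from month level up to quarter level."""
    totals = defaultdict(int)
    for (month, item), units in cube.items():
        totals[(hierarchy[month], item)] += units
    return dict(totals)


by_quarter = roll_up(cube, month_to_quarter)
```

The January and February laptop cells merge into a single Q1 cell, so the result has fewer, coarser cells than the input.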
Data Warehouse Operations (OLAP Operations)
• Drill Down: The reverse of roll up. It moves from summarized data to more detailed data, like zooming in.
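Drill down can be sketched as moving from a quarterly summary back to the monthly detail it was built from. The data and the function name `drill_down` are hypothetical, and the sketch assumes the detailed records are still stored, since a summary alone cannot be expanded.

```python
# Hypothetical detailed data: (quarter, month) -> units sold.
detail = {
    ("Q1", "Jan"): 4,
    ("Q1", "Feb"): 6,
    ("Q1", "Mar"): 9,
    ("Q2", "Apr"): 5,
}

# Summarized (rolled-up) view at quarter level.
summary = {}
for (quarter, month), units in detail.items():
    summary[quarter] = summary.get(quarter, 0) + units


def drill_down(detail, quarter):
    """Expand one quarter of the summary into its monthly figures."""
    return {month: units for (q, month), units in detail.items() if q == quarter}


q1_months = drill_down(detail, "Q1")
```

The monthly figures returned for a quarter add up to that quarter's value in the summary.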
Thank You