Lec01 Introduction To DataWarehouse
Introduction
The construction of data warehouses involves data cleaning, data integration, and data
CESD
Data warehouses provide online analytical processing (OLAP) tools for interactive
analysis of multidimensional data of varied granularities, which facilitates effective
data generalization and data mining.
OLAP, data cubes, and data lakes are the essential features of Data Warehousing.
Why Data Warehousing?
Operational Decision vs Strategic Decision
Data Types in terms of Decision Types
In an operational DB, organizations often record
the details of customer transactions in a table,
the information about customers in another
General Definition: In general, a data warehouse refers to a data repository that is specific for analysis and is maintained separately from
an organization’s operational databases.
According to William H. Inmon, a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data
in support of management’s decision-making process.
Subject-oriented: A data warehouse is organized around major subjects, often identified at the enterprise or department level, such
as customer, supplier, product, and sales.
Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat
files, and online transaction records.
Time-variant: Data are stored to provide information from a historic perspective (e.g., the past 5–10 years). Every key structure in a
data warehouse contains, either implicitly or explicitly, a time element.
Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the
operational environment.
OLTP and OLAP
OLTP systems cover most of the day-to-
day operations of an organization, such as
purchasing, inventory, manufacturing,
OLTP vs OLAP
Users and system orientation: An OLTP system is transaction-oriented and is used
for operation execution by clerks and clients. An OLAP system is business insight-
oriented and is used for data summarization and analysis by knowledge workers,
including managers, executives, and analysts.
Data contents: An OLTP system manages current data that are typically too detailed
to be easily used for business decision making. An OLAP system manages large
amounts of historic data, provides facilities for summarization and aggregation, and
stores and manages information at different levels of granularity, such as weekly,
monthly, and annual summaries.
View: An OLTP system focuses mainly on the current data within an enterprise or
department, without referring to historic data or data in different organizations. In
contrast, an OLAP system often spans multiple versions of a database schema, due to
the evolutionary process of an organization.
Access patterns: The access patterns of an OLTP system consist mainly of short,
atomic transactions, such as transferring an amount from one account to another.
However, accesses to OLAP systems are mostly read-only operations (because most
data warehouses store historic rather than up-to-date information).
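The contrast in access patterns can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `account` table, the `transfer` and `total_balance` helpers, and the balances are hypothetical and not part of the lecture.

```python
import sqlite3

# Hypothetical in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 300)])
conn.commit()

def transfer(conn, src, dst, amount):
    """OLTP-style access: a short, atomic write transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))

def total_balance(conn):
    """OLAP-style access: a read-only aggregate over many rows."""
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]

transfer(conn, 1, 2, 100)
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

The transfer touches two rows and either fully succeeds or is rolled back, while the aggregate only reads and summarizes.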
Architecture of Data Warehouse
The bottom level is a warehouse database
server that is typically a mainstream database
system, such as a relational database or a key-
value store. Back-end tools and data
extraction/transformation/loading (ETL)
Data Cleaning Steps:
Data Integration:
Data Warehouse Schemas & Components
Data Cube:
Data warehouses and OLAP tools are based on multidimensional data models, which view
data in the form of a data cube.
A data cube allows data to be modeled and viewed in multiple dimensions. It is defined
by dimensions and facts.
In the simplest multidimensional data model, a dimension table can be built for each
dimension. For example, a dimension table for item may contain the attributes item name,
brand and type.
A fact table stores the names of the facts, or measures, as well as (foreign) keys referencing
each of the related dimension tables.
For example, for the subject sales in a company (fact), the possible perspectives may include
time, item, branch, and location (dimensions).
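As a rough sketch of how a fact table references a dimension table, the following uses Python's built-in sqlite3 module. The table names `item_dim` and `sales_fact`, the `units_sold` measure, and all rows are assumptions made for illustration. Only the item dimension is given its own table, to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_dim (            -- dimension table for item
    item_key  INTEGER PRIMARY KEY,
    item_name TEXT,
    brand     TEXT,
    type      TEXT
);
CREATE TABLE sales_fact (          -- fact table: keys plus a measure
    time_key     INTEGER,
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER,
    location_key INTEGER,
    units_sold   INTEGER           -- hypothetical measure
);
""")
conn.executemany(
    "INSERT INTO item_dim VALUES (?, ?, ?, ?)",
    [(1, "Laptop A", "Acme", "computer"),
     (2, "Phone B", "Acme", "phone"),
     (3, "Phone C", "Zeta", "phone")])
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 5), (1, 2, 1, 1, 7), (2, 3, 1, 2, 4), (2, 2, 2, 2, 3)])

# Join the fact table to the dimension table and aggregate the measure
# by a dimension attribute.
units_by_type = dict(conn.execute("""
    SELECT d.type, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN item_dim AS d ON f.item_key = d.item_key
    GROUP BY d.type
"""))
```

The fact table holds only keys and the measure; descriptive attributes such as brand and type live in the dimension table and are reached through the join.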
Schemas:
Concept Hierarchies:
Data Warehouse Operations (OLAP Operations)
Slice: A subset of the cube corresponding to a single value for one or more dimensions.
For example, selecting a single value on one dimension of a 3D cube yields a 2D output.
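A minimal sketch of a slice, assuming the cube is held as a plain Python dictionary keyed by (quarter, item, city). The data values and the helper name `slice_cube` are hypothetical.

```python
# Hypothetical 3D sales cube: (quarter, item, city) -> units sold
cube = {
    ("Q1", "phone", "Toronto"): 10,
    ("Q1", "phone", "Vancouver"): 4,
    ("Q1", "laptop", "Toronto"): 6,
    ("Q2", "phone", "Toronto"): 8,
    ("Q2", "laptop", "Vancouver"): 3,
}

def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the key."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }

# Slice on quarter = "Q1": the result is 2D, keyed by (item, city).
q1 = slice_cube(cube, axis=0, value="Q1")
```

Because the quarter is fixed, it no longer appears in the keys of `q1`, which is what reduces the 3D cube to a 2D table.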
Dice: This operation defines a subcube by performing a selection on two or more
dimensions, i.e., a selection on two dimensions of a 3D cube yields a smaller 3D cube as output.
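A dice can be sketched in the same dictionary style. Here the selection restricts two dimensions while every cell keeps all three coordinates. The data and the helper name `dice_cube` are assumptions for illustration.

```python
# Hypothetical 3D sales cube: (quarter, item, city) -> units sold
cube = {
    ("Q1", "phone", "Toronto"): 10,
    ("Q1", "laptop", "Vancouver"): 6,
    ("Q2", "phone", "Toronto"): 8,
    ("Q2", "laptop", "Toronto"): 5,
    ("Q3", "phone", "Toronto"): 7,
    ("Q3", "laptop", "Vancouver"): 2,
}

def dice_cube(cube, allowed):
    """Keep cells whose coordinate on each listed axis is in the allowed set."""
    return {
        key: measure
        for key, measure in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }

# Select quarter in {Q1, Q2} (axis 0) and city in {Toronto} (axis 2).
sub_cube = dice_cube(cube, {0: {"Q1", "Q2"}, 2: {"Toronto"}})
```

Unlike a slice, no dimension is dropped: the keys of `sub_cube` are still (quarter, item, city) triples, only fewer of them.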
Roll Up: Aggregation on a dimension of a data cube. Zooming out.
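One way to picture a roll up is climbing a concept hierarchy on one dimension, for example from city to country. The hierarchy, the data, and the helper name `roll_up` below are hypothetical.

```python
from collections import defaultdict

# Hypothetical concept hierarchy for the location dimension.
city_to_country = {"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"}

# Hypothetical 2D cube: (quarter, city) -> units sold
cube = {
    ("Q1", "Toronto"): 10,
    ("Q1", "Vancouver"): 6,
    ("Q1", "Chicago"): 9,
    ("Q2", "Toronto"): 8,
    ("Q2", "Chicago"): 4,
}

def roll_up(cube, hierarchy):
    """Aggregate the city level up to the country level by summing."""
    rolled = defaultdict(int)
    for (quarter, city), units in cube.items():
        rolled[(quarter, hierarchy[city])] += units
    return dict(rolled)

by_country = roll_up(cube, city_to_country)
```

The result has fewer, coarser cells: the two Canadian cities in Q1 collapse into a single (Q1, Canada) total.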
Drill Down: The reverse of roll up. It moves from summarized data to more detailed data. Zooming in.
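A small sketch of drilling down from a quarter-level total to the month-level cells behind it. It assumes the warehouse still holds the detailed data. The values and the helper name `drill_down` are hypothetical.

```python
# Hypothetical detailed data: (quarter, month) -> units sold
detail = {
    ("Q1", "Jan"): 3,
    ("Q1", "Feb"): 4,
    ("Q1", "Mar"): 5,
    ("Q2", "Apr"): 6,
    ("Q2", "May"): 1,
}

def drill_down(detail, quarter):
    """Return the month-level cells behind one quarter-level total."""
    return {
        month: units
        for (q, month), units in detail.items()
        if q == quarter
    }

q1_months = drill_down(detail, "Q1")
q1_total = sum(q1_months.values())  # the summarized value the months add up to
```

Drilling down is only possible because the finer-grained cells are stored; the summary alone could not be split back into months.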
Thank You