
FACULTY: COMPUTING & SWE

Introduction to Data Mining and Warehousing
Amin T (Asst. Prof). (2022/23)
CHAPTER ONE: DATA WAREHOUSE

05/03/2023 Amin T. (Asst. Prof) 2


What is Data warehouse?
Suppose ABC Pvt Ltd is a company with branches in Addis Ababa,
Hawassa, Arba Minch and Dire Dawa. The Sales Manager wants a
quarterly sales report. Each branch has a separate operational system.

[Figure: the Sales Manager asks for "sales per item type per branch for the first quarter" from the separate systems at A.A, Hawassa, Arba M. and D.D.]



Continued ...
• Solution:
– Extract sales information from each database.
– Store the information in a common repository at a single site.

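[Figure: sales data from A.A, Hawassa, Arba M. and D.D. is stored in a common data repository; the Sales Manager uses query and analysis tools on the repository to produce the report.]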
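The two-step solution above can be sketched with Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions, not a production design: each branch's operational system is simulated by a plain list, and the branch data, the table name `sales`, and the helpers `build_repository` and `quarterly_report` are hypothetical.

```python
import sqlite3

# Hypothetical per-branch operational data: (item_type, month, amount_etb).
# In practice each branch would run its own separate database.
branch_systems = {
    "Addis Ababa": [("Bread", 1, 120.0), ("Cheese", 2, 80.0), ("Bread", 5, 60.0)],
    "Hawassa": [("Bread", 3, 45.0), ("Cheese", 1, 30.0)],
    "Arba Minch": [("Cheese", 2, 25.0)],
    "Dire Dawa": [("Bread", 4, 70.0)],
}


def build_repository(systems):
    """Extract sales from every branch and store them in one common repository."""
    repo = sqlite3.connect(":memory:")
    repo.execute("CREATE TABLE sales (branch TEXT, item_type TEXT, month INTEGER, amount REAL)")
    for branch, rows in systems.items():
        repo.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            [(branch, item, month, amount) for item, month, amount in rows],
        )
    return repo


def quarterly_report(repo, quarter):
    """Sales per item type per branch for one quarter (months 3q-2 to 3q)."""
    first, last = 3 * quarter - 2, 3 * quarter
    return repo.execute(
        "SELECT branch, item_type, SUM(amount) FROM sales "
        "WHERE month BETWEEN ? AND ? GROUP BY branch, item_type "
        "ORDER BY branch, item_type",
        (first, last),
    ).fetchall()


repo = build_repository(branch_systems)
for row in quarterly_report(repo, 1):
    print(row)
```

Once the data sits in one place, the manager's report becomes a single query instead of four separate requests to the branches.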



• A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of
data in support of management’s decision-making process.

• It is generally used for research and decision support.



Continued …
• Subject Oriented:
– A data warehouse is organized around major subjects such as sales, product and
customer.
[Figure: warehouse data for the subject Customer, e.g. customer data (1985-1987, 1986-1989, 1988-1990) and customer activity detail (1985-1987, 1990-1991)]



Continued …
• Integrated:
– A data warehouse is constructed by integrating multiple heterogeneous sources.
– Data preprocessing (such as data cleaning and data integration) is applied to ensure consistency.



Continued …
• Time Variant:
– It provides information from a historical perspective (e.g., the past 5 to 10 years).
– A data warehouse stores historical data.



Continued …
• Non- volatile:
– Data, once recorded, cannot be updated.
– A data warehouse requires only two operations:
– Initial loading of data
– Access of data
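The non-volatile property can be illustrated with a toy in-memory table that offers only the two operations named above: load and access. This is a hypothetical sketch (the `Record` and `NonVolatileStore` names are invented for illustration); because each load adds a dated snapshot rather than overwriting the old one, it also shows the time-variant property.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded record cannot be changed afterwards
class Record:
    customer: str
    balance: float
    load_date: date  # when this snapshot entered the warehouse


class NonVolatileStore:
    """Toy warehouse table offering only the two operations: load and access."""

    def __init__(self):
        self._rows = []

    def load(self, records):
        # Loading only appends new snapshots; existing rows are never overwritten.
        self._rows.extend(records)

    def access(self, customer):
        # Read-only access: an immutable tuple with the customer's full history.
        return tuple(r for r in self._rows if r.customer == customer)


store = NonVolatileStore()
store.load([Record("C1", 100.0, date(2022, 1, 31))])  # initial load
store.load([Record("C1", 150.0, date(2022, 2, 28))])  # later load: new snapshot, old one kept
for snapshot in store.access("C1"):
    print(snapshot.load_date, snapshot.balance)
```

There is deliberately no `update` or `delete` method: a changed balance arrives as a new row, so the history of the subject is preserved.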



Need/importance of a data warehouse
• Source of information for report generation
• Increases the quality and flexibility of enterprise analysis
• Ability to maintain better customer relationships
• Source for data analysis and data mining
• More cost-effective decision making and policy formulation



Data Warehouse for Decision Support & OLAP

• Using information technology to help the knowledge worker make faster and better
decisions, e.g.:
– Which product promotions have the biggest impact on revenue?
– How did the share price of software companies correlate with profits over the
last 10 years?



Continued …
• Decision Support
– Used to manage and control the business
– Data is historical or point-in-time
– Optimized for inquiry rather than update
– Use of the system is loosely defined and can be ad-hoc
– Used by managers and end-users to understand the business and
make judgements



Continued …
• Data mining works with warehouse data.

• Data warehousing provides the enterprise with a memory.

• Data mining provides the enterprise with intelligence.



Continued …
• Data warehouse as one step in the process of data mining




• Features of Data warehouse
– It is separate from Operational Database.
– Integrates data from heterogeneous systems.
– Stores a huge amount of data, more historical than current.
– Does not require data to be highly accurate.
– Queries are generally complex.
– The goal is to execute statistical queries and provide results that can influence decision
making in favor of the enterprise.
– These systems are therefore called Online Analytical Processing (OLAP) systems.




• Why have a separate Warehouse?
– Data warehouse queries are often complex and present a general form of the
data. In contrast, OLTP systems require high concurrency, reliability and locking,
which provide good performance for short and simple queries.
– An operational database query allows both read and modify operations, while
an OLAP query needs only read-only access to the stored data.
– An operational database maintains current data. On the other hand, a data
warehouse maintains historical data.
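The contrast between the two workloads can be sketched with Python's `sqlite3` module: a short read/write transaction on current data versus a read-only aggregate over history. This is an illustrative assumption, not how a real warehouse is deployed; both stores are in-memory SQLite databases with hypothetical tables, and SQLite's `query_only` pragma stands in for the warehouse's read-only access.

```python
import sqlite3

# Operational (OLTP) database: current state, short read/write transactions.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
oltp.execute("INSERT INTO account VALUES (1, 500.0)")
with oltp:  # short transaction: commits on success, rolls back on error
    oltp.execute("UPDATE account SET balance = balance - 50 WHERE id = 1")

# Warehouse: historical snapshots, read with aggregate queries only.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE balance_history (id INTEGER, year INTEGER, balance REAL)")
dw.executemany("INSERT INTO balance_history VALUES (?, ?, ?)",
               [(1, 2020, 300.0), (1, 2021, 400.0), (1, 2022, 500.0)])
dw.commit()
dw.execute("PRAGMA query_only = ON")  # from now on, any write is rejected

current = oltp.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
avg_balance = dw.execute(
    "SELECT AVG(balance) FROM balance_history WHERE id = 1").fetchone()[0]
print(current, avg_balance)
```

Keeping the two stores separate means the long analytical query never competes with the locking that the short OLTP update needs.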



Application Areas

Industry                 Application
Finance                  Credit card analysis
Insurance                Claims and fraud analysis
Telecommunication        Call record analysis
Transport                Logistics management
Consumer goods           Promotion analysis
Data service providers   Value-added data
Utilities                Power usage analysis



3-Tier Data Warehouse Architecture
• Top tier: front-end tools (… trend analysis, prediction, classification and so on)
• Middle tier: OLAP server
– Implemented either as ROLAP or MOLAP
– Supports and operates on multidimensional data structures
– ROLAP: Relational Online Analytical Processing
– MOLAP: Multidimensional OLAP
• Bottom tier: data warehouse server
– Almost always a relational DBMS, rarely flat files



Data warehouse architecture



Data warehouse model & Types
• Example



Data Warehouse Models
• The enterprise warehouse
– An enterprise warehouse collects all of the information about subjects spanning
the entire organization.
– It provides corporate-wide data integration.
– It typically contains detailed data as well as summarized data.
• The data mart
– A data mart contains a subset of corporate-wide data.
– Its scope is confined to specific selected subjects.
– It can be either a dependent or an independent data mart.
• The virtual warehouse
– A virtual warehouse is a set of views over operational databases.
– It is easy to build but requires excess capacity on operational database servers.
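The difference between a virtual warehouse (views only) and physically stored data (an enterprise warehouse and a dependent data mart taken from it) can be sketched in one SQLite database. Everything here is hypothetical: a single table `orders` stands in for the operational databases, and `dw_orders` and `mart_sales` are invented names.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical operational table, standing in for the operational databases.
db.execute("CREATE TABLE orders (region TEXT, subject TEXT, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("A.A", "sales", 100.0), ("A.A", "hr", 20.0),
    ("Hawassa", "sales", 40.0), ("Hawassa", "sales", 60.0),
])

# Virtual warehouse: only a view, so every query runs on the operational data.
db.execute("""CREATE VIEW v_sales_by_region AS
              SELECT region, SUM(amount) AS total
              FROM orders WHERE subject = 'sales' GROUP BY region""")

# Enterprise warehouse: a physical copy covering all subjects.
db.execute("CREATE TABLE dw_orders AS SELECT * FROM orders")

# Dependent data mart: a subset (one subject) taken from the enterprise warehouse.
db.execute("""CREATE TABLE mart_sales AS
              SELECT region, amount FROM dw_orders WHERE subject = 'sales'""")

# A new operational row is visible through the view at once,
# but not in the physical copies until they are refreshed.
db.execute("INSERT INTO orders VALUES ('A.A', 'sales', 5.0)")
view_rows = db.execute(
    "SELECT region, total FROM v_sales_by_region ORDER BY region").fetchall()
mart_count = db.execute("SELECT COUNT(*) FROM mart_sales").fetchone()[0]
dw_count = db.execute("SELECT COUNT(*) FROM dw_orders").fetchone()[0]
print(view_rows, mart_count, dw_count)
```

The view is cheap to create but every query against it loads the operational server, which is the "excess capacity" cost mentioned above.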
Data warehouse and OLAP Technology
Data Warehouse Back-End Tools and Utilities
• Data extraction,
– which typically gathers data from multiple, heterogeneous, and external
sources.
• Data cleaning,
– which detects errors in the data and rectifies them when possible
• Data transformation,
– which converts data from legacy or host format to warehouse format.
• Load,
– which sorts, summarizes, consolidates, computes views, checks integrity, and
builds indices and partitions
• Refresh,
– which propagates the updates from the data sources to the warehouse.
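The back-end functions listed above can be chained in a small Python sketch. The sources, their formats and all function names are hypothetical; for simplicity, transformation is applied before cleaning here so that cleaning only has to handle one (the warehouse) format, and "load" is reduced to sorting and summarizing into totals per (branch, item).

```python
from collections import defaultdict

# Hypothetical sources in two different formats.
source_a = [  # newer system: dicts, amounts stored as text
    {"branch": "A.A", "item": "Cheese", "amount": "7.95"},
    {"branch": "A.A", "item": "", "amount": "3.00"},  # error: missing item
]
source_b = [  # legacy system: tuples, lower-case names
    ("hawassa", "cheese", 15.9),
    ("hawassa", "swiss rolls", -1.0),  # error: negative amount
]


def extract(a_rows, b_rows):
    """Gather rows from every source, tagging each with its origin."""
    return [("a", r) for r in a_rows] + [("b", r) for r in b_rows]


def transform(tagged):
    """Convert each source format to the warehouse format (branch, item, amount)."""
    out = []
    for src, row in tagged:
        if src == "a":
            out.append((row["branch"], row["item"], float(row["amount"])))
        else:
            branch, item, amount = row
            out.append((branch.title(), item.title(), float(amount)))
    return out


def clean(rows):
    """Detect errors; rectify what is fixable and drop what is not."""
    cleaned = []
    for branch, item, amount in rows:
        branch, item = branch.strip(), item.strip()  # fixable: stray whitespace
        if item and amount > 0:                      # unfixable rows are skipped
            cleaned.append((branch, item, amount))
    return cleaned


def load(rows, warehouse=None):
    """Sort and summarize rows into totals per (branch, item)."""
    if warehouse is None:
        warehouse = defaultdict(float)
    for branch, item, amount in sorted(rows):
        warehouse[(branch, item)] += amount
    return warehouse


def refresh(warehouse, new_a, new_b):
    """Propagate new source rows into the existing warehouse."""
    return load(clean(transform(extract(new_a, new_b))), warehouse)


warehouse = load(clean(transform(extract(source_a, source_b))))
refresh(warehouse, [], [("hawassa", "cheese", 4.1)])
print(dict(warehouse))
```

A real loader would also check integrity and build indices and partitions; those steps are left out of this sketch.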
Continued …
• Data warehouses and OLAP tools are based on a multidimensional data model. This
model views data in the form of a data cube.
• Methods for the efficient implementation of a data warehouse system:
– Efficient Computation of Data Cubes

Report: the number of items sold (U) and the income (ETB) in each region for each product over time.

Region    Product       Jan (ETB, U)   Feb (ETB, U)   Mar (ETB, U)
A.A       Wheat Bread   7.44, 3
A.A       Cheese        7.95, 3        42.40, 16      15.90, 6
Hawassa   Wheat Bread   7.44, 3
Hawassa   Cheese        7.95, 3
Hawassa   Swiss Rolls   7.32, 4        16.47, 9       27.45, 15

[Figure: the report viewed as a data cube in the data warehouse, with the dimensions Product, Region and Time]
A data cube consists of a lattice of cuboids. Each cuboid corresponds to a different
degree of summarization of the given multidimensional data.
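Computing the full lattice of cuboids means running one group-by for every subset of the dimensions: with n dimensions and no concept hierarchies there are 2^n cuboids (with L_i hierarchy levels on dimension i the count grows to the product of (L_i + 1)), which is the source of the curse of dimensionality listed under cube operations. The sketch below uses unit counts modeled on the report above; placing the single-month entries in January is an assumption, and `cuboid` and `full_cube` are hypothetical helper names.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "product", "month")

facts = [  # (region, product, month, units_sold)
    ("A.A", "Wheat Bread", "Jan", 3),
    ("A.A", "Cheese", "Jan", 3), ("A.A", "Cheese", "Feb", 16), ("A.A", "Cheese", "Mar", 6),
    ("Hawassa", "Wheat Bread", "Jan", 3),
    ("Hawassa", "Cheese", "Jan", 3),
    ("Hawassa", "Swiss Rolls", "Jan", 4), ("Hawassa", "Swiss Rolls", "Feb", 9),
    ("Hawassa", "Swiss Rolls", "Mar", 15),
]


def cuboid(facts, dims):
    """One cuboid: total units grouped by the chosen subset of dimensions."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def full_cube(facts):
    """Every cuboid in the lattice: all 2**n subsets of the n dimensions."""
    return {dims: cuboid(facts, dims)
            for k in range(len(DIMENSIONS) + 1)
            for dims in combinations(DIMENSIONS, k)}


cube = full_cube(facts)
print(len(cube))           # 8 cuboids for 3 dimensions
print(cube[()])            # apex cuboid: {(): 62}
print(cube[("region",)])   # {('A.A',): 28, ('Hawassa',): 34}
```

The empty subset `()` is the apex cuboid (the grand total), and the full subset is the base cuboid holding the original facts.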
Continued …
• The compute cube operator and the curse of dimensionality
– Cube operations
– Roll-up: also known as "consolidation" or "aggregation".
– It can be done by
– Reducing dimensions
– Climbing up the concept hierarchy, e.g. Product (e.g. Toaster) → Sub Category (e.g. Kitchen Appliance) → Category (e.g. Electrical Appliance)
[Figure: roll-up of a cube over locations AM, DD and AA and quarters Q1 to Q4, from the products Air Conditioner, Television, Toaster and Boiler to Home Appliance and Kitchen Appliance]
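Both ways of rolling up can be sketched over a small dictionary cube keyed by (location, quarter, product). The unit counts are hypothetical, and so is the product-to-sub-category mapping (which products belong to Home Appliance versus Kitchen Appliance is an assumption based on the figure labels).

```python
from collections import defaultdict

# Hypothetical concept hierarchy: product -> sub category (assignments assumed).
SUBCATEGORY = {
    "Toaster": "Kitchen Appliance", "Boiler": "Kitchen Appliance",
    "Air Conditioner": "Home Appliance", "Television": "Home Appliance",
}

# Hypothetical cube: (location, quarter, product) -> units sold.
sales = {
    ("AA", "Q1", "Toaster"): 10, ("AA", "Q1", "Boiler"): 5,
    ("AA", "Q1", "Television"): 7, ("DD", "Q1", "Toaster"): 4,
    ("DD", "Q2", "Air Conditioner"): 2,
}


def roll_up_hierarchy(cube):
    """Climb the product hierarchy: aggregate products into sub categories."""
    out = defaultdict(int)
    for (loc, qtr, product), units in cube.items():
        out[(loc, qtr, SUBCATEGORY[product])] += units
    return dict(out)


def roll_up_drop(cube, axis):
    """Reduce dimensions: sum away the dimension at position `axis`."""
    out = defaultdict(int)
    for key, units in cube.items():
        out[key[:axis] + key[axis + 1:]] += units
    return dict(out)


print(roll_up_hierarchy(sales))
print(roll_up_drop(sales, 0))  # drop the Location dimension
```

In both cases the result has fewer, larger cells than the input; only the direction of summarization differs.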
Continued …
• Drill down
– In drill-down, data is broken into smaller, more detailed parts. It is the opposite of the roll-up process.
– It can be done by
– Moving down the concept hierarchy, e.g. Category (e.g. Electrical Appliance) → Sub Category (e.g. Kitchen Appliance) → Product (e.g. Toaster)
– Increasing (adding) a dimension
[Figure: drill-down of a cube over locations AM, DD and AA and quarters Q1 to Q4, from Home Appliance and Kitchen Appliance to the products Air Conditioner, Television, Toaster and Boiler]
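Drill-down only works if finer-grained data is available, so the sketch below keeps hypothetical base data at the month level and derives coarser or finer views from it. It shows both routes from the slide: moving down the Time hierarchy (quarter to month) and adding a dimension (Location). The `view` and `quarter` helpers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical base data at the finest grain: (location, month, product) -> units.
base = {
    ("AA", 1, "Toaster"): 4, ("AA", 2, "Toaster"): 6,
    ("AA", 4, "Toaster"): 3, ("DD", 1, "Boiler"): 5,
}


def quarter(month):
    """Time hierarchy: month -> quarter."""
    return f"Q{(month - 1) // 3 + 1}"


def view(base, time_level="quarter", by_location=False):
    """Aggregate base data to a time level, optionally keeping Location."""
    out = defaultdict(int)
    for (loc, month, product), units in base.items():
        t = quarter(month) if time_level == "quarter" else month
        key = (loc, t, product) if by_location else (t, product)
        out[key] += units
    return dict(out)


coarse = view(base)                          # (quarter, product)
finer_time = view(base, time_level="month")  # drill down the Time hierarchy
extra_dim = view(base, by_location=True)     # drill down by adding Location
print(coarse)
print(finer_time)
print(extra_dim)
```

Each drill-down view has at least as many cells as the coarse view, because totals are split back into the parts they were summed from.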
Continue …
• Slice
– Here, one dimension is fixed to a single value, and a new
sub-cube is created.
– For example, the dimension Time is sliced with Q1 as the
filter.
– A new cube is created altogether.
[Figure: slice operation on the cube with Time = Q1]
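A slice can be sketched as a filter on one key position of a dictionary cube. The cube contents and the `slice_cube` helper are hypothetical; dropping the fixed dimension from the result keys is one common convention, chosen here because every remaining cell would carry the same value (Q1) for it.

```python
# Hypothetical cube: (location, quarter, item_type) -> units sold.
cube = {
    ("AA", "Q1", "Books"): 5, ("AA", "Q2", "Books"): 3,
    ("DD", "Q1", "PC"): 2, ("DD", "Q2", "Mobile"): 8,
}

TIME = 1  # position of the Time dimension in each key


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value; the result has one dimension fewer."""
    return {key[:axis] + key[axis + 1:]: units
            for key, units in cube.items() if key[axis] == value}


q1 = slice_cube(cube, TIME, "Q1")
print(q1)  # {('AA', 'Books'): 5, ('DD', 'PC'): 2}
```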



Continued …
• Dice:
– This operation is similar to a slice.
– The difference is that in dice you select values on 2 or more
dimensions, which results in the creation of a sub-
cube.
[Figure: dice operation on a cube whose item types are Books, Clothes, PC and Mobile]
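A dice can be sketched as a selection on several dimensions at once, each with its own set of allowed values. The cube contents and the `dice` helper are hypothetical; unlike the slice, the dimensions are kept because each may still contain more than one value.

```python
# Hypothetical cube: (location, quarter, item_type) -> units sold.
cube = {
    ("AA", "Q1", "Books"): 5, ("AA", "Q1", "Clothes"): 4,
    ("AA", "Q3", "Books"): 6, ("AM", "Q1", "Books"): 1,
    ("DD", "Q2", "Mobile"): 8, ("DD", "Q2", "PC"): 2,
}


def dice(cube, criteria):
    """Keep cells whose value on every listed axis is in that axis's allowed set."""
    return {key: units for key, units in cube.items()
            if all(key[axis] in allowed for axis, allowed in criteria.items())}


sub = dice(cube, {0: {"AA", "DD"}, 1: {"Q1", "Q2"}, 2: {"Books", "Mobile"}})
print(sub)  # {('AA', 'Q1', 'Books'): 5, ('DD', 'Q2', 'Mobile'): 8}
```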



Continued …
• Pivot:
– In pivot, you rotate the data axes to provide
an alternative presentation of the data.
– In the example on the slide, the pivot is based
on item types.
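A pivot changes only the presentation, not the values. The sketch below swaps the two key positions of a hypothetical 2-D view (location by item type) and prints it as a grid before and after; `pivot` and `render` are invented helper names.

```python
# Hypothetical 2-D view: (location, item_type) -> units sold.
view = {("AA", "Books"): 5, ("AA", "PC"): 2, ("DD", "Books"): 1, ("DD", "Mobile"): 8}


def pivot(table):
    """Rotate the axes: the row dimension becomes the column dimension and vice versa."""
    return {(col, row): units for (row, col), units in table.items()}


def render(table):
    """Format a 2-D view as a tab-separated grid (rows = first key part)."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    lines = ["\t".join([""] + cols)]
    for r in rows:
        lines.append("\t".join([r] + [str(table.get((r, c), "")) for c in cols]))
    return "\n".join(lines)


print(render(view))         # locations as rows, item types as columns
print(render(pivot(view)))  # item types as rows, locations as columns
```

Pivoting twice returns the original view, which confirms that the operation is a pure rotation.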



From data warehouse to data mining
• Data warehousing is the process of constructing and using data warehouses.
• The construction of a data warehouse requires data cleaning, data integration, and data
consolidation.
• Many organizations use this information to support business decision-making activities, including:
– increasing customer focus, which includes the analysis of customer
buying patterns (such as buying preference, buying time, budget cycles, and
appetites for spending);
– repositioning products and managing product portfolios by comparing the
performance of sales by quarter, by year, and by geographic region in order
to fine-tune production strategies;
– analyzing operations and looking for sources of profit; and
– managing customer relationships, making environmental corrections, and
managing the cost of corporate assets.
Continued



Thank You
