In the name of Allah, the Most Gracious, the Most Merciful

Data Warehousing & Mining
(CS-422)
Credit Hours: 3 + 1

By

Engr. Shakeel Ahmed Shaikh


Lecturer
Email: shakeel.sheikh@buetk.edu.pk

Department of CSE&S
Balochistan University of Engineering and Technology
Khuzdar
DWH Course Details

Course Title: Data Warehousing & Mining (CS-422)
Credit Hours: 3 (Theory) + 1 (Lab)

Prerequisites: Database Management Systems 

Online Technical Support
• Online Access Point:
– MS Teams (shakeel.sheikh@buetk.edu.pk)
– WhatsApp (+923337145585)
– Email (shakeel.sheikh@buetk.edu.pk)
– Timing: 7pm to 11pm
• Technical Support/Helpdesk:
– Mr. Farhan Maqsi
• Email (Farhan.ahmed@buetk.edu.pk)
• Contact# (03337827525)
– Mr. Israr Raisani
• Email (Israr.Raisani@buetk.edu.pk)
• Contact# (03337860190)

Course Objectives

“To enable the students to understand different features / issues in data warehousing and its designing and data mining concepts”

Course Learning Outcomes (CLOs)
S.No | Course Learning Outcome (CLO) | Program Learning Outcome (PLO)
1 | Understand the functionality of the various components of data warehousing architecture, and describe what data mining is and how it can be employed and applied to solve real problems. | PLO-1 (Engineering Knowledge)
2 | Illustrate different models of OLAP and learn the dimensional modeling technique for building a data warehouse. | PLO-1 (Engineering Knowledge)
3 | Analyze different techniques and algorithms for data mining. | PLO-4 (Investigation)
Lab CLO
4 | Demonstrate different data mining techniques for the discovery of relevant knowledge from large datasets using a suitable tool. | PLO-5
Course Contents
• Data Warehouse: Basic concepts, operational DBMS vs. data warehouse. Data warehouse characteristics, architecture and components. Data modeling, schema design, star and snowflake schemas. OLAP and OLTP. ROLAP, MOLAP and HOLAP.
• Data Mining: Introduction, KDD process. Data extraction and preprocessing.
• Classification and Prediction: Basic concepts and classification algorithms: decision trees, Naïve Bayes classifier, K-nearest neighbor.
• Clustering Analysis: Clustering overview, clustering algorithms: K-Means, hierarchical clustering.
• Association Rules: Basic concepts and methods.
• Data Mining Trends and Research Frontiers: Cloud data warehousing; web, spatial and temporal data mining.
• Data Mining Tools & Applications: RapidMiner / Weka

Assessment Mechanism
Assessment tools | Weightage
Two Assignments / Two Quizzes | 10%
Practical Viva and Practical Journal | 20%
Mid Examination | 20%
Final Examination | 50%

Recommended Books

1. Data Mining: Concepts and Techniques, J. Han, J. Pei, M. Kamber, Elsevier, 3rd Edition, 2011.
2. Introduction to Data Mining, Pang-Ning Tan, Vipin Kumar, Michael Steinbach, Pearson Education.
3. Data Mining, V. Pudi, P. R. Krishna, Oxford University Press.
4. Mastering Data Warehouse Design, Claudia Imhoff.
5. Building the Data Warehouse (Second Edition), W. H. Inmon, John Wiley & Sons Inc., NY.

Referenced Books

1. Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach and Vipin Kumar
2. Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals by Paulraj Ponniah

Lecture # 1
Introduction & Background

Outline

• Why Data Warehousing?
• The Need for a Data Warehouse
• Historical Overview
• Data Warehouse Definition
• Bill Inmon’s View of a DWH
• Reading Materials

Why Data Warehousing?
• The world economy has moved from the industrial age into an information-driven knowledge economy.
– The information age is characterized by computer technology, modern communication technology and Internet technology.
– Governments around the globe have realized the potential of information as a “multifactor” in the development of their economy, which not only creates wealth for society but also affects the future of the country.
– Thus, many countries have placed modern information technology in their strategic plans.

The Need for a Data Warehouse

“Drowning in data and starving for information”
“Knowledge is power, Intelligence is absolute power!”

The Need for a Data Warehouse
[Figure: pyramid with DATA at the base, then INFORMATION, KNOWLEDGE and INTELLIGENCE, rising to POWER ($) at the top]

Historical overview
• 1960: Master files & reports
• 1965: Lots of master files!
• 1970: Direct access memory & DBMS
• 1975: Online high-performance transaction processing
• 1980: PCs and 4GL technology (MIS/DSS)
• 1985 & 1990: Extract programs, extract processing, the legacy system’s web

Why a Data Warehouse (DWH)?
• Data recording and storage are growing.

• History is an excellent predictor of the future.

• A data warehouse gives a total view of the organization.

Why a Data Warehouse (DWH)?
• Intelligent decision support is required for decision-making.
– Consider a bank that is losing customers for unknown reasons.
– It is therefore critical to understand which customers have left and why they have left.
– Intelligent decision support goes further: it gives you the ability to predict, going forward in time, which customers will leave the bank.
– We will cover this in the course using data mining techniques such as clustering, classification and regression analysis.
– This is another example of using historical data to predict the future.
Reason-1: Why a Data Warehouse?
• Data sets are growing.

How much data is that?

Size | Bytes | Roughly equivalent to
1 MB | 2^20 or 10^6 bytes | A small novel; one 3½-inch disk
1 GB | 2^30 or 10^9 bytes | Paper reams that could fill the back of a pickup van
1 TB | 2^40 or 10^12 bytes | 50,000 trees chopped, converted into paper and printed
2 PB | 1 PB = 2^50 or 10^15 bytes | Academic research libraries across the U.S.
5 EB | 1 EB = 2^60 or 10^18 bytes | All words ever spoken by human beings
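
Each unit above is given as both a power of two and a power of ten. The two figures are close but not equal, and the gap widens with every step up. A minimal Python sketch of that arithmetic (the function names are illustrative, not from the course material):

```python
# Binary (power-of-two) and decimal (power-of-ten) byte counts per unit.
UNITS = ["MB", "GB", "TB", "PB", "EB"]

def unit_sizes(unit):
    """Return (binary_bytes, decimal_bytes) for a unit such as 'GB'."""
    step = UNITS.index(unit) + 2      # MB is the 2nd step: 1024**2 vs 1000**2
    return 2 ** (10 * step), 10 ** (3 * step)

def gap_percent(unit):
    """How much larger the binary figure is than the decimal one, in percent."""
    binary, decimal = unit_sizes(unit)
    return (binary - decimal) / decimal * 100

for unit in UNITS:
    binary, decimal = unit_sizes(unit)
    print(f"1 {unit}: {binary} vs {decimal} bytes, gap {gap_percent(unit):.2f}%")
```

The gap is under 5% at the megabyte level and above 15% at the exabyte level, so it matters which convention a storage figure uses.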

Reason-1: Why a Data Warehouse?
• The size of data sets is going up.
• The cost of data storage is coming down.
– The amount of data the average business collects and stores is doubling every year.
• A few examples:
– Wal-Mart: 24 TB
– France Telecom: ~100 TB
– CERN: up to 20 PB by 2006
– Stanford Linear Accelerator Center (SLAC): 500 TB
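
To get a feel for what "doubling every year" implies, here is a rough Python sketch. It assumes a constant doubling period, which is a simplification, and the starting sizes are used purely as illustrations, not as forecasts for any organization.

```python
import math

def projected_size(start_tb, years, doubling_period_years=1.0):
    """Size in TB after `years`, assuming steady doubling (an assumption)."""
    return start_tb * 2 ** (years / doubling_period_years)

def years_to_reach(start_tb, target_tb, doubling_period_years=1.0):
    """Years needed to grow from start_tb to target_tb at the same rate."""
    return math.log2(target_tb / start_tb) * doubling_period_years

# Illustrative only: a 24 TB store that doubles yearly for five years.
print(projected_size(24, 5), "TB after 5 years")
# Illustrative only: time for 1 TB to grow a thousandfold (1024 TB).
print(years_to_reach(1, 1024), "years to grow from 1 TB to 1024 TB")
```

Under this assumption, growth by a factor of about a thousand takes only ten doublings, which is why storage planning for a warehouse cannot assume today's volumes.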

What is a Data Warehouse?

A complete repository of historical corporate data extracted from transaction systems that is available for ad-hoc access by knowledge workers.

What is a Data Warehouse?
Complete repository
History
Transaction System
Ad-Hoc access
Knowledge workers

What is a Data Warehouse?
Transaction System
– Management Information System (MIS)
– The source could also be typed sheets (which are NOT a transaction system)

Ad-Hoc access
– Does not have a certain access pattern.
– Queries not known in advance.
– Difficult to write SQL in advance.

Knowledge workers
– Typically NOT IT literate (Executives, Analysts, Managers).
– NOT clerical workers.
– Decision makers.
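
Ad-hoc access can be illustrated with a small Python sketch using the standard `sqlite3` module. The `sales_history` table, its columns and its rows are hypothetical. The point is only that the history is loaded once, while the questions are written later and differ each time.

```python
import sqlite3

# Hypothetical historical data, loaded once.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_history "
    "(sale_date TEXT, region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_history VALUES (?, ?, ?, ?)",
    [
        ("2022-03-01", "North", "Laptop", 1000.0),
        ("2022-07-15", "South", "Laptop", 1500.0),
        ("2023-01-20", "North", "Phone", 700.0),
        ("2023-02-11", "South", "Phone", 300.0),
        ("2023-05-05", "North", "Laptop", 1200.0),
    ],
)

def ad_hoc(sql, params=()):
    """Run whatever question the analyst has just thought of."""
    return conn.execute(sql, params).fetchall()

# Question 1, not known when the data was loaded: revenue by region.
by_region = ad_hoc(
    "SELECT region, SUM(amount) FROM sales_history "
    "GROUP BY region ORDER BY region"
)

# Question 2, a different cut of the same history: revenue by year and product.
by_year_product = ad_hoc(
    "SELECT substr(sale_date, 1, 4), product, SUM(amount) "
    "FROM sales_history "
    "GROUP BY substr(sale_date, 1, 4), product ORDER BY 1, 2"
)

print(by_region)
print(by_year_product)
```

Because neither query was known in advance, the store cannot be tuned for one fixed access pattern, which is the difficulty the slide points to.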
Bill Inmon’s View of Data Warehousing

• Subject-oriented
• Integrated
• Time-variant
• Non-volatile

Bill Inmon’s View of Data Warehousing
• Subject-oriented: The goal of data in the data warehouse is to improve decision making, planning and control of the major subjects of the enterprise, such as customers, products and regions.
• Integrated: The data in the data warehouse is loaded from different sources that store the data in different formats and focus on different aspects of the subject. The data has to be checked, cleansed and transformed into a unified format to allow easy and fast access.
• Time-variant: Every record in the data warehouse has some form of time variance associated with it. In an OLTP system, the contents change over time, i.e. they are updated in place, such as a bank account balance or a mobile phone balance. In a warehouse, the moment the data is loaded usually becomes its timestamp.
• Non-volatile: Unlike in OLTP systems, data inserted into the data warehouse is neither changed nor removed. The only exceptions are when false or incorrect data has been inserted erroneously, or when the capacity of the data warehouse is exceeded and archiving becomes necessary.
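
The four properties can be sketched in a few lines of Python. Everything here is hypothetical: the two source systems, their field names and the `Warehouse` class are assumptions made for illustration, not part of Inmon's definition. For simplicity each row carries the date supplied by its source, rather than the load moment mentioned above.

```python
from dataclasses import dataclass
from datetime import date, datetime

@dataclass(frozen=True)          # frozen: a loaded row cannot be modified
class CustomerFact:
    customer_id: str
    balance: float
    as_of: date                  # time-variant: every row is dated
    source: str

def from_branch_system(row):
    """Hypothetical source A: DD/MM/YYYY dates, balances with commas."""
    return CustomerFact(
        customer_id=row["cust_no"].strip().upper(),
        balance=float(row["bal"].replace(",", "")),
        as_of=datetime.strptime(row["date"], "%d/%m/%Y").date(),
        source="branch",
    )

def from_mobile_app(row):
    """Hypothetical source B: ISO dates, balances in minor currency units."""
    return CustomerFact(
        customer_id=row["customerId"].strip().upper(),
        balance=row["balance_minor"] / 100,
        as_of=date.fromisoformat(row["day"]),
        source="mobile",
    )

class Warehouse:
    """Append-only store: rows are added, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, facts):
        self._rows.extend(facts)

    def history(self, customer_id):
        """Subject-oriented view: everything known about one customer."""
        rows = [r for r in self._rows if r.customer_id == customer_id]
        return sorted(rows, key=lambda r: r.as_of)

wh = Warehouse()
wh.load([from_branch_system(
    {"cust_no": " c-17", "bal": "12,500.50", "date": "31/01/2024"})])
wh.load([from_mobile_app(
    {"customerId": "C-17", "balance_minor": 990000, "day": "2024-02-29"})])

balances = [(r.as_of.isoformat(), r.balance) for r in wh.history("C-17")]
print(balances)
```

An OLTP system would overwrite the customer's balance in place. Here the second load adds a new dated row, so both balances remain available for analysis.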

Reading Materials

• Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals by Paulraj Ponniah
• W. H. Inmon, Building the Data Warehouse (Second Edition), John Wiley & Sons Inc., NY.
