
Enterprise Systems

CT107-3-2 and VD1

Data Warehousing & Data Mining
Topic & Structure of The Lesson

• Data Warehouse
• Benefits of Data Warehouse
• Characteristics of a Data Warehouse
• Data Mining
• Kinds of Data That Can Be Mined and
What Can Be Discovered

CT107-3.-2 Enterprise Systems Data Warehousing & Data Mining Slide ‹2› of 9
Learning Outcomes

• To understand what a Data Warehouse and
Data Mining are.
• To differentiate between a Data Warehouse
and a Database.
• To explain the benefits and characteristics
of a Data Warehouse.
• To know what types of data can be mined
and what can be discovered.



Key Terms You Must Be Able To Use
• If you have mastered this topic, you should be able to use the following
terms correctly in your assignments and exams:

Data Warehouse
Data Mining
Type of data
Discovered

Data Warehouse

• A repository (a type of database) in which
data are organized so that they can be
readily analyzed using methods such as:
– Data mining
– Decision support
– Querying



Data Warehouse Vs. Regular Database
• Similarity
– Consists of data tables (files), primary and other
keys, and query capabilities.

• Main Difference
– Database: designed and optimized to store data.
– Data Warehouse: designed and optimized to respond to
analysis questions that are critical for a business.



The Need For Data Warehousing
• To resolve inconsistent decision support data.
• To improve reporting applications or
better understand the business.
• Purpose: so that the organization can respond
quickly and flexibly to market changes
and opportunities.



Benefits of Data Warehouse

• Marketing and sales
– Use a Data Warehouse for product
introductions, product information access,
marketing program effectiveness, and
product line profitability.
– Use the data to maximize per-customer
profitability.



Benefits of Data Warehouse

• Pricing and contracts


– Use the data to calculate cost accurately to
optimize pricing of a contract.
– Accurate cost data for the contract is
important to ensure the price is not too high
or low.



Benefits of Data Warehouse

• Sales performance
– Use the data to determine sales profitability
and productivity for all territories and regions;
can obtain and analyze results by geography,
product, sales group, or individual.
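
The kind of analysis described above can be sketched as a simple aggregation. This is a minimal illustration only: the records, field names and figures below are hypothetical, and a real warehouse would normally answer such questions with queries over far larger tables.

```python
from collections import defaultdict

# Hypothetical sales records; field names and figures are illustrative only.
SALES = [
    {"region": "North", "product": "A", "profit": 120.0},
    {"region": "North", "product": "B", "profit": 80.0},
    {"region": "South", "product": "A", "profit": 50.0},
    {"region": "South", "product": "B", "profit": 150.0},
]


def profit_by(records, dimension):
    """Sum profit for each distinct value of the chosen dimension."""
    totals = defaultdict(float)
    for row in records:
        totals[row[dimension]] += row["profit"]
    return dict(totals)
```

Calling `profit_by(SALES, "region")` or `profit_by(SALES, "product")` analyzes the same data by geography or by product without changing the stored records.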



Strategic Uses of Data Warehousing
Industry | Functional Areas of Use | Strategic Use
Airline | Operations and Marketing | Crew assignment, aircraft deployment, mix of fares, analysis of route profitability, frequent-flyer program promotions
Apparel | Distribution and Marketing | Merchandising, inventory replenishment
Banking | Product Development, Operations and Marketing | Customer service, trend analysis, product and service promotions, reduction of IS expenses
Government | Operations | Reporting crime areas, homeland security
Retail chain | Distribution and Marketing | Trend analysis, buying pattern analysis, pricing policy, inventory control, sales promotion



Characteristics of a Data Warehouse
• Organization
– Data are organized by subject (e.g. by
customer, vendor, product, price level, and
region).
– Contains only information relevant for
decision support.



Characteristics of a Data Warehouse
• Consistency
– Data are coded in a consistent manner.
– E.g. gender data are always encoded as “m” and “f”.

• Time variant
– Data are kept for many years so they can be
used for identifying trends, forecasting, and
making comparisons over time.
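
The Consistency point above can be illustrated with a small normalization step applied while loading data. This is only a sketch: the source codes in the mapping (such as "male" or "1") are assumptions about what different source systems might use, not values taken from the slides.

```python
# Hypothetical mapping from source-specific codes to the warehouse codes.
# Which raw values each source system uses is an assumption for illustration.
GENDER_CODES = {
    "m": "m", "male": "m", "1": "m",
    "f": "f", "female": "f", "0": "f",
}


def normalize_gender(raw):
    """Return 'm' or 'f' for a known source code, or None if unrecognized."""
    key = str(raw).strip().lower()
    return GENDER_CODES.get(key)
```

Returning `None` for unknown codes leaves the decision about bad input to the loading process rather than guessing a value.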



Characteristics of a Data Warehouse
• Nonvolatile
– Once data are entered into the warehouse,
they are not updated.

• Relational
– Uses a relational structure.



Characteristics of a Data Warehouse
• Client/server
– Uses the client/server architecture mainly to
provide the end user with easy access to its
data.

• Web-based
– Designed to provide an efficient computing
environment for Web-based applications.



Characteristics of a Data Warehouse
• Integration
– Data from various sources are integrated.

• Real-time
– It is possible to arrange for real-time
capabilities.



Purpose of Data Mining

• The wide availability of huge amounts of data
and the imminent need to turn such data into
useful information and knowledge.

• The information and knowledge gained can
be used for applications ranging from
business management, production control and
marketing analysis to engineering design and
scientific exploration.
Data Mining

• Data mining helps end users to extract useful
information from a large database, such as a
Data Warehouse.

• Data mining is the nontrivial extraction of implicit,
previously unknown and potentially useful
information from the data warehouse.

• Data mining is also referred to as knowledge
discovery in databases (KDD).
Data Mining

Data mining can also be described as:

• The exploration and analysis of large
quantities of data in order to discover
meaningful patterns and rules.

• The goal is to allow a corporation to improve
its marketing, sales, and customer support
operations through a better understanding of
its customers.
What Kind of Data Can Be Mined
• Flat files
– The most common data source.
– Simple data files in text or binary format.
– Data in these files can be transactions,
time-series data, or scientific measurements.
• Relational database
– A set of tables with columns and rows.
– Each relational table corresponds to either
an object or a relationship.
What Kind of Data Can Be Mined
• Data warehouse
– A repository of data collected from multiple
data sources.
– Gives the option to analyze data from
different sources under the same roof.
• Transaction database
– A set of records representing transactions,
each with a time stamp.



What Kind of Data Can Be Mined
• Multimedia database
– Includes video, images, audio, and text media.
– Can be stored in extended object-relational
or object-oriented databases, or simply on a
file system.
– Mining requires computer vision, computer
graphics, image interpretation, and natural
language processing methodologies.



What Kind of Data Can Be Mined
• Spatial database
– In addition to the usual data, stores geographical
information such as maps and global or regional
positioning.
• Time-series database
– Time-related data such as stock market data or
logged activities.
– Usually has a continuous flow of new data
coming in.
– Often needs real-time analysis, which is challenging.
What Kind of Data Can Be Mined
• World Wide Web
– The most heterogeneous and dynamic
repository.
– Has a large number of authors and publishers.
– Accessed by a massive number of users.
– Organized as interconnected documents containing
text, audio, video, raw data, and applications.



What Can Be Discovered?

• Characterization
• Discrimination
• Association analysis
• Classification
• Prediction
• Clustering
• Outlier analysis
• Evolution and deviation analysis
Characterization

• Summarizes the general features of objects in
a target class and produces characteristic
rules.
• The data relevant to a user-specified class are
normally retrieved by a database query and run
through a summarization module to extract the
essence of the data at different levels of
abstraction.
• E.g. customers with a minimum spending of
RM500 per month.
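
A minimal sketch of characterization, using the RM500 example above: select the target class, then summarize its general features. The customer records, field names and the choice of summary statistics are hypothetical.

```python
from statistics import mean

# Hypothetical customer records; all values are illustrative.
CUSTOMERS = [
    {"name": "Aini", "age": 34, "monthly_spend": 600.0},
    {"name": "Ben", "age": 45, "monthly_spend": 300.0},
    {"name": "Chong", "age": 29, "monthly_spend": 500.0},
    {"name": "Devi", "age": 51, "monthly_spend": 1000.0},
]


def characterize(records, min_spend=500.0):
    """Summarize the target class: customers spending at least min_spend."""
    # Step 1: retrieve the data relevant to the user-specified class.
    target = [r for r in records if r["monthly_spend"] >= min_spend]
    # Step 2: summarize general features of that class.
    # Note: mean() raises StatisticsError if no record qualifies.
    return {
        "count": len(target),
        "mean_age": mean(r["age"] for r in target),
        "mean_spend": mean(r["monthly_spend"] for r in target),
    }
```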
Discrimination

• Discrimination produces what are known as
discriminant rules.
• It compares the general features of
objects between two classes:

–Target Class
–Contrasting Class
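
The comparison between a target class and a contrasting class can be sketched as follows. The records, the class definition (monthly spend of at least 500) and the compared feature are all hypothetical choices made for illustration.

```python
from statistics import mean

# Hypothetical customer records; all values are illustrative.
RECORDS = [
    {"age": 34, "monthly_spend": 600.0},
    {"age": 29, "monthly_spend": 500.0},
    {"age": 51, "monthly_spend": 1000.0},
    {"age": 45, "monthly_spend": 300.0},
    {"age": 25, "monthly_spend": 100.0},
]


def discriminate(records, in_target, feature):
    """Compare the mean of one feature between target and contrasting classes."""
    target = [r[feature] for r in records if in_target(r)]
    contrast = [r[feature] for r in records if not in_target(r)]
    return {"target_mean": mean(target), "contrast_mean": mean(contrast)}


def is_high_spender(record):
    """Assumed definition of the target class for this example."""
    return record["monthly_spend"] >= 500.0
```

Here everything outside the target class is treated as the contrasting class; in practice the contrasting class may be chosen separately.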



Association Analysis

• Produces association rules.
• Commonly used for market basket analysis.
• It studies the frequency of items occurring
together in transactional databases, based on:
– Support: identifies the frequent item sets.
– Confidence: the conditional probability that an
item appears in a transaction when another
item appears.
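
Support and confidence can be computed directly from a list of transactions. The sketch below assumes a tiny hypothetical set of market baskets; it shows only the two measures, not a full frequent-itemset search.

```python
# Hypothetical market-basket transactions, each a set of items.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(both) / support(antecedent).

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)
```

For the rule "bread implies milk", `support({"bread", "milk"}, TRANSACTIONS)` measures how often the two items occur together, and `confidence({"bread"}, {"milk"}, TRANSACTIONS)` measures how often milk appears given that bread appears.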



Classification

• The organization of data into given classes.
• Also known as supervised classification.
– Uses given class labels to order the objects
in the data collection.
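
One simple way to illustrate supervised classification is a nearest-neighbour rule: a new object receives the label of the most similar labelled example. The slides do not name a specific algorithm, so this is just one possible technique, and the training data (age, monthly spend) and labels are hypothetical.

```python
# Hypothetical labelled examples: ((age, monthly_spend), class_label).
TRAINING = [
    ((25, 200.0), "low"),
    ((30, 250.0), "low"),
    ((45, 900.0), "high"),
    ((50, 1000.0), "high"),
]


def classify_1nn(training, point):
    """Return the label of the training example closest to point."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(training, key=lambda example: sq_dist(example[0], point))
    return label
```

The class labels are supplied in advance with the training data, which is what makes the task supervised.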



Prediction

• Has attracted considerable attention given the
potential implications of successful forecasting in
a business context.
• Uses a large number of past values to estimate
probable future values.
• Two major types of predictions:
– To predict some unavailable data values or
pending trends.
– To predict a class label for some data.
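
Predicting a pending trend from past values can be sketched with a straight-line fit. Least-squares linear regression is an assumption here, chosen as a simple stand-in; the slides do not prescribe a forecasting method.

```python
def predict_next(values):
    """Fit y = a + b*t by least squares (t = 0, 1, ...) and return y at t = len(values)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two past values")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Slope = covariance(t, y) / variance(t).
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept + slope * n
```

The second type of prediction, predicting a class label, follows the same pattern as classification.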



Clustering

• Also known as unsupervised classification.
• Organization of data into classes whose
labels are not known in advance.
• Principles:
– Intra Class Similarity: Maximizing the similarity
between objects in the same class.
– Inter Class Similarity: Minimizing the similarity
between objects of different classes.
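
The two principles above can be illustrated with a basic k-means procedure on one-dimensional data. k-means is one common clustering technique, used here as an assumption; the starting centers are supplied by the caller.

```python
def kmeans_1d(values, centers, iterations=10):
    """Basic k-means on numbers; returns (final centers, final clusters)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters
```

No class labels are given; the groups emerge from the data, which is why clustering is called unsupervised.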



Outlier Analysis

• Outliers are data elements that cannot be
grouped in a given class or cluster.
• Also known as exceptions or surprises.
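
A rough sketch of outlier detection follows. It uses a distance-from-the-mean rule (values more than a chosen number of standard deviations away) as a simple stand-in for "cannot be grouped"; both the rule and the default threshold are assumptions, not the method from the slides.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return values farther than threshold standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]
```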



Evolution and Deviation Analysis
• The study of time-related data that change
over time.
• Evolution analysis models:
– Evolutionary trends in data, which allow
characterizing, comparing, classifying or clustering of
time-related data.
• Deviation analysis
– Considers differences between measured values and
expected values, and attempts to find the cause of
the deviations from the anticipated value.
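
The first half of deviation analysis, finding where measured values differ from expected values, can be sketched as below. The tolerance parameter is an assumption added for illustration; finding the cause of each deviation is a separate step that this sketch does not attempt.

```python
def deviations(measured, expected, tolerance):
    """Return (index, difference) pairs where |measured - expected| exceeds tolerance."""
    return [
        (i, m - e)
        for i, (m, e) in enumerate(zip(measured, expected))
        if abs(m - e) > tolerance
    ]
```

For example, with measured values `[100, 105, 130, 98]`, an expected value of 100 in every period and a tolerance of 10, only the third period is flagged.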



Quick Review Question

• What is the main difference between a
Data Warehouse and a Database?
• What can be discovered from the mined
data?

Summary of Main Teaching Points

• A Data Warehouse is a repository (a type of
database) in which data are organized so that
they can be readily analyzed.
• Data mining helps end users to extract
useful information from the Data Warehouse.

Question and Answer Session

Q&A

What we will cover next

• Organisational Knowledge & Knowledge
Management

