
Topic 4

Data Mining for Business Intelligence

Part 4: Prediction Pattern


ISP642 – BUSINESS INTELLIGENCE
Learning Objectives
 Learn about association pattern
Data Mining Patterns: An Overview
 Data mining
 Prediction: supervised learning method
 Association: unsupervised learning method
 Clustering: unsupervised learning method
Data Mining Pattern: Prediction
Data Mining | Learning Method | Popular Algorithms
Prediction | Supervised | Classification and Regression Trees, ANN, SVM, Genetic Algorithms
Classification | Supervised | Decision trees, ANN/MLP, SVM, Rough sets, Genetic Algorithms
Regression | Supervised | Linear/Nonlinear Regression, Regression trees, ANN/MLP, SVM
Association | Unsupervised | Apriori, OneR, ZeroR, Eclat
Link analysis | Unsupervised | Expectation Maximization, Apriori Algorithm, Graph-based Matching
Sequence analysis | Unsupervised | Apriori Algorithm, FP-Growth technique
Clustering | Unsupervised | K-means, ANN/SOM
Outlier analysis | Unsupervised | K-means, Expectation Maximization (EM)


Pattern: Predictions
 Tell the nature of future occurrences of certain events based
on what has happened in the past
 Predictive analysis
 Predicting the winner of the Super Bowl
 Forecasting the absolute temperature of a particular day
 Forecasting the next best product
 Stored data is used to locate data in predetermined groups.
 For example,
 a restaurant chain could mine customer purchase data to determine when
customers visit and what they typically order.
 This information could be used to increase traffic by having daily specials.
Predictive Analysis
Data Mining Pattern: Predictions
 Two classes
 Classification
 Regression
Classification in Prediction Mining
 Used to predict group membership for data instances.
 For example
 You use classification to predict whether the weather on a
particular day will be “sunny”, “rainy” or “cloudy”
 Assign an object to a certain class based on its
similarity to previous examples of other objects
 Can be done with reference to the original data or based
on a model of that data
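Classification "with reference to original data" can be illustrated with a nearest-neighbour vote. The sketch below is only an illustration: the training examples, the two attributes (temperature and humidity) and the function name `classify` are hypothetical and do not come from the slides.

```python
import math
from collections import Counter

# Hypothetical training examples: (temperature in C, humidity in %) -> weather label.
TRAINING = [
    ((30.0, 40.0), "sunny"),
    ((32.0, 35.0), "sunny"),
    ((22.0, 90.0), "rainy"),
    ((20.0, 95.0), "rainy"),
    ((24.0, 70.0), "cloudy"),
    ((25.0, 65.0), "cloudy"),
]


def classify(point, examples=TRAINING, k=3):
    """Return the majority label among the k stored examples closest to point."""
    nearest = sorted(examples, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(classify((31.0, 38.0)))
```

No model is built here; every prediction consults the stored examples directly. A model-based classifier would instead summarise the examples once and then predict from that summary.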
Classification Techniques
Decision tree analysis
 A good automatic rule discovery technique
 Produces a set of branching decisions that end in a
classification
 Commonly used in operations research
 to help identify a strategy most likely to reach a goal.
 Often, a decision tree is used in parallel with
a probability model as a selection model algorithm.
Decision Tree Analysis
Example: Decision Tree Analysis
 Determine whether or not to play tennis
Answer
 If the outlook is overcast,
 then we should definitely play tennis
 If it is rainy,
 we should only play tennis if the wind is weak
 If it is sunny,
 then we should play tennis only if the humidity is
normal
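The branching rules above can be written directly as code. This is a minimal sketch; the function name `play_tennis` and the string values used for each attribute are assumptions chosen for illustration.

```python
def play_tennis(outlook, humidity, wind):
    """Follow the branches of the tennis decision tree and return True to play."""
    if outlook == "overcast":
        return True                  # overcast: always play
    if outlook == "rainy":
        return wind == "weak"        # rainy: play only when the wind is weak
    if outlook == "sunny":
        return humidity == "normal"  # sunny: play only when humidity is normal
    raise ValueError(f"unknown outlook: {outlook!r}")


if __name__ == "__main__":
    print(play_tennis("sunny", "high", "weak"))
```

Each `if` corresponds to one branch from the root of the tree, and each `return` is a leaf.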
Bayesian Classifier
 Uses probability theory to model the training
set
 Assumes the attributes are independent of one another given the class
 Produces a model for each class
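As a rough sketch of these three points, the code below counts attribute values per class and multiplies the resulting probabilities. The toy weather rows, the Laplace (+1) smoothing and the names `train` and `predict` are assumptions added for illustration.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count classes and, for each class, the values seen for every attribute."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, attribute index) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the highest prior times product of likelihoods."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed P(value | class); attributes are treated as independent.
            score *= (value_counts[(label, i)][value] + 1) / (n + len(values_seen[i]))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training set: (outlook, humidity) -> play?
ROWS = [("sunny", "high"), ("sunny", "normal"), ("overcast", "high"),
        ("rainy", "normal"), ("rainy", "high"), ("overcast", "normal")]
LABELS = ["no", "yes", "yes", "yes", "no", "yes"]
MODEL = train(ROWS, LABELS)
```

The per-class counts play the role of "a model for each class", and multiplying one factor per attribute is where the independence assumption enters.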
Support Vector Machine
 A group of supervised learning methods that can be applied to
classification or regression.
 Represents an extension to nonlinear models of the generalized
portrait algorithm developed by Vladimir Vapnik.
 The SVM algorithm is based on the statistical learning theory
introduced by Vladimir Vapnik and Alexey Chervonenkis.
 Shows empirically good performance, with successful
applications in many fields
(bioinformatics, text, image recognition).
Support Vector Machine
 A relatively new classification method for both linear and
nonlinear data
 Uses a nonlinear mapping to transform the original training
data into a higher dimension
 In the new dimension, it searches for the linear optimal
separating hyperplane (i.e., “decision boundary”)
 With an appropriate nonlinear mapping to a sufficiently high
dimension, data from two classes can always be separated by
a hyperplane
 SVM finds this hyperplane using support vectors (“essential”
training tuples) and margins (defined by the support vectors)
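The idea of mapping data to a higher dimension so that a hyperplane can separate it can be shown on a tiny hand-made example. This is not an SVM solver: the data, the mapping `phi` and the helper `fit_threshold` are hypothetical, and the boundary is placed by inspection rather than by the optimisation a real SVM performs.

```python
# Hypothetical 1-D training data. Class -1 lies between two groups of class +1,
# so no single threshold on x can separate the classes.
DATA = [(-3.0, 1), (-2.5, 1), (-1.0, -1), (0.0, -1), (1.0, -1), (2.5, 1), (3.0, 1)]


def phi(x):
    """Nonlinear mapping from the 1-D input to a 2-D feature space (x, x^2)."""
    return (x, x * x)


def fit_threshold(data):
    """Place the boundary x^2 = b midway between the closest points of each class.

    Only valid for this illustration, where the classes separate along x^2.
    """
    inner = max(phi(x)[1] for x, y in data if y == -1)
    outer = min(phi(x)[1] for x, y in data if y == 1)
    if inner >= outer:
        raise ValueError("classes are not separable along the x^2 feature")
    b = (inner + outer) / 2
    margin = (outer - inner) / 2
    # The points that touch the margin play the role of support vectors.
    support = [x for x, _ in data if phi(x)[1] in (inner, outer)]
    return b, margin, support


def predict(x, b):
    """Classify by which side of the boundary the mapped point falls on."""
    return 1 if phi(x)[1] > b else -1


B, MARGIN, SUPPORT = fit_threshold(DATA)
```

In the mapped space the boundary is a straight line, while in the original 1-D space it corresponds to two thresholds. Only the points nearest the boundary determine where it sits.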
Software Tools for SVM
Example of SVM software:
 LibSVM (C++)
 SVMLight (C)
Complete machine learning toolboxes such as:
 Torch (C++)
 Spider (Matlab)
 Weka (Java)
Artificial Neural Network (ANN)
 Consists of an interconnected group of artificial
neurons, and it processes information using a
connectionist approach to computation
 Artificial neural networks (ANNs) are
computational models inspired by animals'
central nervous systems (in particular the brain), and
are used to estimate or approximate functions that
can depend on a large number of inputs and are
generally unknown.
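To make "interconnected artificial neurons" concrete, here is a minimal sketch of a two-layer network with hand-picked weights that computes XOR. The weights, the step activation and the function names are assumptions for illustration; in practice the weights are learned from data rather than set by hand.

```python
def step(z):
    """Threshold activation: the neuron fires (1) when its net input is positive."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1, x2):
    """Two hidden neurons feed one output neuron; weights are set by hand."""
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)    # fires if at least one input is 1
    h_and = neuron((x1, x2), (1.0, 1.0), -1.5)   # fires only if both inputs are 1
    return neuron((h_or, h_and), (1.0, -1.0), -0.5)  # OR and not AND


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_network(a, b))
```

A single neuron cannot represent XOR, but connecting three of them can, which is the sense in which the computation lives in the connections.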
Where is ANN used?
 Pattern, sound and speech recognition
 Identification of military targets
 Adaptive and robotic control
 Identification of explosives in passenger suitcases
 Electrical and thermal load prediction
 Weather and market trends forecasting
 Prediction of mineral exploration sites
 Analysis of electromyography and other medical signatures
Artificial Immune Network (AIN)
 Adaptive systems inspired by theoretical
immunology and observed immune functions,
principles and models, which are applied to
complex problem domains
 Use the natural immune system as a
metaphor for solving computational problems.


Role of the Immune System
 Protect our bodies from pathogens and viruses
 Primary immune response
 Launch a response to invading pathogens
 Secondary immune response
 Remember past encounters
 Faster response the second time around
Basic Immune Models & Algorithms
 Negative Selection Algorithms
 Clonal Selection Algorithm
 Immune Network Models
 Somatic Hypermutation
Rough Sets
 Proposed by Pawlak (1982); an extension of
classical set theory for dealing with imprecise
knowledge.
 It employs indiscernibility relations to evaluate to
what extent two objects are identical or similar.
 Using the relations, we can construct:
 Lower approximation: consists of all instances
which surely belong to the concept/class
 Upper approximation: contains all instances which
possibly belong to the concept.
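The two approximations can be computed by grouping objects that are indiscernible on the chosen attributes. The sketch below assumes a small hypothetical patient table and a concept "has flu"; the function name `approximations` and all data values are illustrative only.

```python
from collections import defaultdict


def approximations(objects, attributes, concept):
    """Return (lower, upper) approximations of concept.

    objects:    dict mapping object name -> dict of attribute values
    attributes: attribute names used for the indiscernibility relation
    concept:    set of object names that belong to the target class
    """
    # Objects with identical values on all chosen attributes are indiscernible.
    classes = defaultdict(set)
    for name, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(name)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= concept:   # every member is in the concept: surely belongs
            lower |= block
        if block & concept:    # at least one member is in the concept: possibly belongs
            upper |= block
    return lower, upper


# Hypothetical patient table.
PATIENTS = {
    "p1": {"headache": "yes", "temperature": "high"},
    "p2": {"headache": "yes", "temperature": "high"},
    "p3": {"headache": "yes", "temperature": "normal"},
    "p4": {"headache": "yes", "temperature": "normal"},
    "p5": {"headache": "no", "temperature": "normal"},
    "p6": {"headache": "no", "temperature": "high"},
}
FLU = {"p1", "p2", "p3", "p6"}
LOWER, UPPER = approximations(PATIENTS, ["headache", "temperature"], FLU)
```

Patients p3 and p4 have identical attribute values but different outcomes, so they fall in the upper approximation only. The gap between the two sets is the boundary region where the data cannot decide.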
Rough Sets
 It offers mathematical tools for discovering
hidden patterns in data by identifying
partial and total dependencies in
data.
 It also enables work with null or missing values.
 Rough sets can be used separately but usually they
are used together with other methods such as
fuzzy sets, statistical methods, genetic algorithms,
etc.
Genetic Algorithm
 Uses an analog of natural evolution to build
directed-search mechanisms for classifying data
samples
 The algorithm starts with a set of solutions called a
population
 Solutions from one population are taken and used to
form a new population, with the aim that the new one is better.
 Derived from Darwin’s principle of survival of the
fittest in natural genetics.
Implementation of Genetic Algorithm
 Create initial population of chromosomes / individuals
 Evaluate fitness of individuals
 Finished all generations in the genetic algorithm / stopping criteria met?
 No: select the individuals, apply genetic operators, then evaluate fitness again
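The loop above can be sketched on a toy problem. The example below maximises the number of 1 bits in a chromosome; the fitness function, truncation selection, single-point crossover, bit-flip mutation, elitism and all parameter values are assumptions chosen for illustration, not details given in the slides.

```python
import random


def fitness(chromosome):
    """Toy fitness: the number of 1 bits in the chromosome."""
    return sum(chromosome)


def run_ga(length=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Create the initial population of random bit-string chromosomes.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []  # best fitness found in each evaluated generation

    for _ in range(generations):
        # Evaluate fitness of individuals.
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Stopping criterion: a perfect chromosome has been found.
        if fitness(ranked[0]) == length:
            break
        # Select the individuals: keep the fitter half as parents.
        parents = ranked[: pop_size // 2]
        next_population = [ranked[0][:]]  # elitism: carry over the best unchanged
        # Apply genetic operators: single-point crossover, then bit-flip mutation.
        while len(next_population) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, length - 1)
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population

    return max(population, key=fitness), history


if __name__ == "__main__":
    best, history = run_ga()
    print(fitness(best), history)
```

Because the best individual is copied into each new population, the best fitness never decreases from one generation to the next.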
Implementation of Genetic Algorithm
Evolution at the gene level
Evolution due to genetic crossover
Evolution due to genetic mutation
Result: Accuracy Estimation
 In classification problems, the primary source for
accuracy estimation is the confusion matrix
Confusion matrix:
Predicted Class | True Class: Positive | True Class: Negative
Positive | True Positive Count (TP) | False Positive Count (FP)
Negative | False Negative Count (FN) | True Negative Count (TN)

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
$\text{True Positive Rate} = \frac{TP}{TP + FN}$
$\text{True Negative Rate} = \frac{TN}{TN + FP}$
$\text{Precision} = \frac{TP}{TP + FP}$
$\text{Recall} = \frac{TP}{TP + FN}$
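The measures above follow directly from the four confusion-matrix counts. The sketch below uses a hypothetical function name `metrics` and made-up example counts.

```python
def metrics(tp, fp, fn, tn):
    """Compute the accuracy measures from the four confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Hypothetical counts for illustration only.
EXAMPLE = metrics(tp=40, fp=10, fn=20, tn=30)
```

Recall and true positive rate share the same formula, so they always take the same value. The sketch does not guard against a zero denominator, which occurs when a class has no examples or nothing is predicted positive.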

© Pearson Education, 2014


END OF PART 4
