
Chapter 7. Classification and Prediction

What is classification? What is prediction?

Issues regarding classification and prediction

Classification by decision tree induction

Bayesian Classification

Classification by backpropagation

Classification based on concepts from

association rule mining

Other Classification Methods

Prediction

Classification accuracy

Summary


Classification vs. Prediction

Classification:

predicts categorical class labels

constructs a model based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data

Prediction:

models continuous-valued functions, i.e., predicts unknown or missing values

Typical Applications

credit approval

target marketing

medical diagnosis

Classification—A Two-Step Process

Model construction: describing a set of predetermined classes

Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute

The set of tuples used for model construction: training set

The model is represented as classification rules, decision trees, or mathematical formulae

Model usage: for classifying future or unknown objects

Estimate accuracy of the model

Accuracy rate is the percentage of test set samples that are correctly classified by the model

Test set is independent of training set, otherwise over-fitting will occur

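As a minimal sketch of the two-step process, the Python fragment below builds a decision-tree model on a training set and estimates its accuracy on an independent test set. It assumes scikit-learn is available; the tiny feature encoding of the tenure example that follows is hypothetical.

# Step 1: model construction; Step 2: model usage on held-out data.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical encoding of the tenure table: (years, rank_code),
# where rank_code is 0 = Assistant Prof, 1 = Associate Prof, 2 = Professor.
X = [[3, 0], [7, 0], [2, 2], [7, 1], [6, 0], [3, 1]]
y = ["no", "yes", "yes", "yes", "no", "no"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)   # model construction
print(accuracy_score(y_test, model.predict(X_test)))     # accuracy rate on the test set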

Classification Process (1): Model Construction

Training data:

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

The classification algorithm produces the classifier (model), e.g.:

IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction

Testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4). Tenured? The classifier answers 'yes'.

Supervised vs. Unsupervised Learning

Supervised learning (classification)

Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations

New data is classified based on the training set

The class labels of training data are unknown

Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data


Issues regarding classification and prediction (1): Data Preparation

Data cleaning

Preprocess data in order to reduce noise and handle missing values

Relevance analysis (feature selection)

Remove the irrelevant or redundant attributes

Data transformation

Generalize and/or normalize data

Issues regarding classification and prediction (2): Evaluating Classification Methods

Predictive accuracy

Speed and scalability

time to construct the model

time to use the model

Robustness

handling noise and missing values

Scalability

efficiency in disk-resident databases

Interpretability:

understanding and insight provided by the model

Goodness of rules

decision tree size

compactness of classification rules


Classification by Decision Tree Induction

Decision tree

A flow-chart-like tree structure: internal nodes denote tests on attributes, branches represent outcomes of the tests, and leaf nodes represent class labels or class distributions

Tree construction

At start, all the training examples are at the root

Partition examples recursively based on selected attributes

Tree pruning

Identify and remove branches that reflect noise or outliers

Use of decision tree: Classifying an unknown sample

Test the attribute values of the sample against the decision tree

Training Dataset

This follows an example from Quinlan's ID3:

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no

Output: A Decision Tree for “buys_computer”

age?
  <=30  → student?
            no  → no
            yes → yes
  31…40 → yes
  >40   → credit_rating?
            excellent → no
            fair      → yes

Algorithm for Decision Tree Induction

Basic algorithm (a greedy algorithm)

Tree is constructed in a top-down recursive divide-and-conquer manner

At start, all the training examples are at the root

Attributes are categorical (if continuous-valued, they are discretized in advance)

Examples are partitioned recursively based on selected attributes

Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)

Conditions for stopping partitioning

All samples for a given node belong to the same class

There are no remaining attributes for further partitioning (majority voting is employed to classify the leaf)

There are no samples left

Attribute Selection Measure

Information gain (ID3/C4.5)

All attributes are assumed to be categorical

Can be modified for continuous-valued attributes

Gini index (IBM IntelligentMiner)

All attributes are assumed continuous-valued

May need other tools, such as clustering, to get the possible split values

Can be modified for categorical attributes

Information Gain (ID3/C4.5)

Select the attribute with the highest information gain

Assume there are two classes, P and N

Let the set of examples S contain p elements of class P and n elements of class N

The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as

I(p, n) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}

Information Gain in Decision Tree Induction

Assume that using attribute A, a set S will be partitioned into sets {S_1, S_2, …, S_v}

If S_i contains p_i examples of P and n_i examples of N, the entropy, or the expected information needed to classify objects in all subtrees S_i, is

E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} I(p_i, n_i)

The encoding information that would be gained by branching on A:

Gain(A) = I(p, n) - E(A)

Attribute Selection by Information Gain Computation

Class P: buys_computer = “yes”; Class N: buys_computer = “no”

I(p, n) = I(9, 5) = 0.940

Compute the entropy for age:

age    p_i  n_i  I(p_i, n_i)
<=30   2    3    0.971
30…40  4    0    0
>40    3    2    0.971

E(age) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.69

Hence Gain(age) = I(p, n) - E(age) = 0.25

Similarly:

Gain(income) = 0.029

Gain(student) = 0.151

Gain(credit_rating) = 0.048
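The same computation as a short Python sketch; the (p_i, n_i) counts are hardcoded from the table above.

from math import log2

def info(p, n):
    # I(p, n): expected information to classify a set with p P-examples and n N-examples.
    total = p + n
    return sum(-c / total * log2(c / total) for c in (p, n) if c)

partitions = {"<=30": (2, 3), "30...40": (4, 0), ">40": (3, 2)}   # age -> (p_i, n_i)
p, n = 9, 5                                                       # class totals

e_age = sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in partitions.values())
print(round(info(p, n), 3))            # 0.94
print(round(e_age, 3))                 # 0.694 (0.69 on the slide)
print(round(info(p, n) - e_age, 3))    # Gain(age) = 0.247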

Gini Index (IBM IntelligentMiner)

If a data set T contains examples from n classes, the gini index gini(T) is defined as

gini(T) = 1 - \sum_{j=1}^{n} p_j^2

where p_j is the relative frequency of class j in T.

If a data set T is split into two subsets T_1 and T_2 with sizes N_1 and N_2 respectively, the gini index of the split data is defined as

gini_{split}(T) = \frac{N_1}{N} gini(T_1) + \frac{N_2}{N} gini(T_2)

The attribute that provides the smallest gini_split(T) is chosen to split the node (one needs to enumerate all possible splitting points for each attribute).

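A sketch of the gini computation for the same kind of two-class node; the example class counts below are hypothetical.

def gini(counts):
    # gini(T) = 1 - sum_j p_j^2, with p_j the relative frequency of class j in T.
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def gini_split(counts1, counts2):
    # Weighted gini of splitting T into subsets T1 and T2.
    n1, n2 = sum(counts1), sum(counts2)
    n = n1 + n2
    return n1 / n * gini(counts1) + n2 / n * gini(counts2)

print(gini([9, 5]))                 # impurity of the unsplit node
print(gini_split([2, 3], [7, 2]))   # one candidate split; smaller is better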

Extracting Classification Rules from Trees

Represent the knowledge in the form of IF-THEN rules

One rule is created for each path from the root to a leaf

Each attribute-value pair along a path forms a conjunction

The leaf node holds the class prediction

Rules are easier for humans to understand

Example

IF age = “<=30” AND student = “no” THEN buys_computer = “no”

IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”

IF age = “31…40” THEN buys_computer = “yes”

Avoid Overfitting in Classification

The generated tree may overfit the training data

Too many branches, some may reflect anomalies due to noise or outliers

The result is poor accuracy for unseen samples

Two approaches to avoid overfitting

Prepruning: Halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold

Difficult to choose an appropriate threshold

Postpruning: Remove branches from a “fully grown” tree; get a sequence of progressively pruned trees

Use a set of data different from the training data to decide which is the “best pruned tree”

Approaches to Determine the Final Tree Size

Separate training (2/3) and testing (1/3) sets

Use cross validation, e.g., 10-fold cross validation

Use all the data for training

but apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node may improve the entire distribution

Use the minimum description length (MDL) principle

halting growth of the tree when the encoding is minimized

Enhancements to basic decision tree induction

Allow for continuous-valued attributes

Dynamically define new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals

Handle missing attribute values

Assign the most common value of the attribute

Attribute construction

Create new attributes based on existing ones

This reduces fragmentation, repetition, and replication

Classification in Large Databases

Classification: a classical problem extensively studied by statisticians and machine learning researchers

Scalability: Classifying data sets with millions of examples and hundreds of attributes with reasonable speed

Why decision tree induction in data mining?

relatively faster learning speed (than other classification methods)

convertible to simple and easy to understand classification rules

can use SQL queries for accessing databases

Scalable Decision Tree Induction Methods in Data Mining Studies

SLIQ (EDBT’96 — Mehta et al.)

builds an index for each attribute, and only the class list and the current attribute list reside in memory

SPRINT (VLDB’96 — J. Shafer et al.)

constructs an attribute list data structure

RainForest (VLDB’98 — Gehrke, Ramakrishnan & Ganti)

separates the scalability aspects from the criteria that determine the quality of the tree

Data Cube-Based Decision-Tree Induction

Integration of generalization with decision-tree induction (Kamber et al.’97)

Classification at primitive concept levels

E.g., precise temperature, humidity, outlook, etc.

Low-level concepts, scattered classes, bushy classification-trees

Semantic interpretation problems

Relevance analysis at multi-levels

Presentation of Classification Results


Bayesian Classification: Why?

Probabilistic learning: Calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems

Incremental: Each training example can incrementally increase/decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.

Probabilistic prediction: Predict multiple hypotheses, weighted by their probabilities

Standard: Even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured

Bayesian Theorem

Given training data D, the posterior probability of a hypothesis h, P(h|D), follows the Bayes theorem:

P(h|D) = \frac{P(D|h) P(h)}{P(D)}

MAP (maximum a posteriori) hypothesis:

h_{MAP} \equiv \arg\max_{h \in H} P(h|D) = \arg\max_{h \in H} P(D|h) P(h)

Practical difficulty: requires initial knowledge of many probabilities, significant computational cost

Naïve Bayes Classifier (I)

A simplified assumption: attributes are conditionally independent:

P(C_j \mid V) \propto P(C_j) \prod_{i=1}^{n} P(v_i \mid C_j)

Greatly reduces the computation cost; only count the class distribution.

Naive Bayesian Classifier (II)

Given a training set, we can compute the probabilities:

Outlook   P    N     Humidity  P    N
sunny     2/9  3/5   high      3/9  4/5
overcast  4/9  0     normal    6/9  1/5
rain      3/9  2/5

Temperature  P    N     Windy  P    N
hot          2/9  2/5   true   3/9  3/5
mild         4/9  2/5   false  6/9  2/5
cool         3/9  1/5

Bayesian classification

The classification problem may be formalized using a-posteriori probabilities:

P(C|X) = probability that the sample tuple X = <x1, …, xk> is of class C

E.g., P(class = N | outlook = sunny, windy = true, …)

Idea: assign to sample X the class label C such that P(C|X) is maximal


Estimating a-posteriori probabilities

Bayes theorem:

P(C|X) = P(X|C)·P(C) / P(X)

P(X) is constant for all classes

P(C) = relative freq of class C samples

C such that P(C|X) is maximum =

C such that P(X|C)·P(C) is maximum

Problem: computing P(X|C) is infeasible!


Naïve Bayesian Classification

Naïve assumption: attribute independence

P(x1,…,xk|C) = P(x1|C) · … · P(xk|C)

If the i-th attribute is categorical:

P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C

If the i-th attribute is continuous:

P(xi|C) is estimated through a Gaussian density function

Computationally easy in both cases


Play-tennis example: estimating P(x_i|C)

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

P(p) = 9/14, P(n) = 5/14

outlook: P(sunny|p) = 2/9, P(sunny|n) = 3/5; P(overcast|p) = 4/9, P(overcast|n) = 0; P(rain|p) = 3/9, P(rain|n) = 2/5

temperature: P(hot|p) = 2/9, P(hot|n) = 2/5; P(mild|p) = 4/9, P(mild|n) = 2/5; P(cool|p) = 3/9, P(cool|n) = 1/5

humidity: P(high|p) = 3/9, P(high|n) = 4/5; P(normal|p) = 6/9, P(normal|n) = 1/5

windy: P(true|p) = 3/9, P(true|n) = 3/5; P(false|p) = 6/9, P(false|n) = 2/5

Play-tennis example: classifying X

An unseen sample X = <rain, hot, high, false>

P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582

P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286

Sample X is classified in class n (don’t play)
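The same classification as a Python sketch, with the priors and conditional probabilities hardcoded from the frequency tables above.

# Classify X = <rain, hot, high, false> with the naive Bayes estimates.
priors = {"p": 9 / 14, "n": 5 / 14}
cond = {
    "p": {"rain": 3 / 9, "hot": 2 / 9, "high": 3 / 9, "false": 6 / 9},
    "n": {"rain": 2 / 5, "hot": 2 / 5, "high": 4 / 5, "false": 2 / 5},
}

x = ["rain", "hot", "high", "false"]
for c in ("p", "n"):
    score = priors[c]
    for v in x:
        score *= cond[c][v]        # P(C) times the product of P(x_i | C)
    print(c, round(score, 6))      # p: 0.010582, n: 0.018286 -> class n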

The independence hypothesis…

… yields optimal classifiers when satisfied

… but is seldom satisfied in practice, as attributes (variables) are often correlated

Attempts to overcome this limitation:

Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes

Decision trees, which reason on one attribute at a time, considering the most important attributes first

Bayesian Belief Networks (I)

(Network: FamilyHistory and Smoker are parents of LungCancer and Emphysema; LungCancer points to PositiveXRay and Dyspnea.)

The conditional probability table for the variable LungCancer:

      (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC    0.8      0.5       0.7       0.1
~LC   0.2      0.5       0.3       0.9

Bayesian Belief Networks (II)

A Bayesian belief network allows a subset of the variables to be conditionally independent

A graphical model of causal relationships

Several cases of learning Bayesian belief networks

Given both network structure and all the variables: easy

Given network structure but only some variables

When the network structure is not known in advance



Neural Networks

Advantages

prediction accuracy is generally high

robust: works when training examples contain errors

output may be discrete, real-valued, or a vector of several discrete or real-valued attributes

fast evaluation of the learned target function

Criticism

long training time

difficult to understand the learned function (weights)

A Neuron

(Inputs x_0, x_1, …, x_n, with weights w_0, w_1, …, w_n, feed a weighted sum with bias -\mu_k, followed by a nonlinear activation function f that produces the output y.)

The n-dimensional input vector x is mapped into variable y by means of the scalar product and a nonlinear function mapping:

y = f\left(\sum_{i=0}^{n} w_i x_i - \mu_k\right)

Network Training

The ultimate objective of training: obtain a set of weights that makes almost all the tuples in the training data classified correctly

Steps

Initialize weights with random values

Feed the input tuples into the network one by one

For each unit

Compute the net input to the unit as a linear combination of all the inputs to the unit

Compute the output value using the activation function

Compute the error; update the weights and the bias

Multi-Layer Perceptron

The input vector x_i feeds the input nodes; weights w_ij connect them to the hidden nodes, which in turn feed the output nodes and the output vector.

Net input to unit j: I_j = \sum_i w_{ij} O_i + \theta_j

Output of unit j: O_j = \frac{1}{1 + e^{-I_j}}

Error at an output node: Err_j = O_j (1 - O_j)(T_j - O_j)

Error at a hidden node: Err_j = O_j (1 - O_j) \sum_k Err_k w_{jk}

Weight update: w_{ij} = w_{ij} + (l)\, Err_j O_i

Bias update: \theta_j = \theta_j + (l)\, Err_j
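A minimal sketch of these update rules for a single sigmoid unit; the two-input example, the learning rate l, and the initial weights are all illustrative.

import math

def sigmoid(i):
    return 1.0 / (1.0 + math.exp(-i))

x = [0.5, 0.9]       # inputs O_i
w = [0.2, -0.4]      # weights w_ij
theta = 0.1          # bias theta_j
l = 0.5              # learning rate
target = 1.0         # desired output T_j

i_j = sum(wi * oi for wi, oi in zip(w, x)) + theta   # I_j = sum_i w_ij O_i + theta_j
o_j = sigmoid(i_j)                                   # O_j = 1 / (1 + e^(-I_j))
err_j = o_j * (1 - o_j) * (target - o_j)             # error at an output node

w = [wi + l * err_j * oi for wi, oi in zip(w, x)]    # w_ij = w_ij + (l) Err_j O_i
theta += l * err_j                                   # theta_j = theta_j + (l) Err_j
print(o_j, err_j, w, theta)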

Network Pruning and Rule Extraction

Network pruning

A fully connected network will be hard to articulate

N input nodes, h hidden nodes and m output nodes lead to h(m+N) weights

Pruning: Remove some of the links without affecting the classification accuracy of the network

Extracting rules from a trained network

Discretize activation values; replace individual activation values by the cluster average, maintaining the network accuracy

Enumerate the output from the discretized activation values to find rules between activation values and output

Find the relationship between the input and activation values



Association-Based Classification

Several methods for association-based classification

ARCS: Quantitative association mining and clustering of association rules (Lent et al’97)

It beats C4.5 in (mainly) scalability and also accuracy

Associative classification (Liu et al’98)

It mines high support and high confidence rules in the form of “cond_set => y”, where y is a class label

CAEP (Classification by aggregating emerging patterns) (Dong et al’99)

Emerging patterns (EPs): itemsets whose support increases significantly from one class to another

Mine EPs based on minimum support and growth rate



Other Classification Methods

k-nearest neighbor classifier and case-based reasoning

Genetic algorithm

Rough set approach

Fuzzy set approaches

Instance-Based Methods

Instance-based learning:

Store training examples and delay the processing (“lazy evaluation”) until a new instance must be classified

Typical approaches

k-nearest neighbor approach

Instances represented as points in a Euclidean space

Locally weighted regression

Case-based reasoning

Uses symbolic representations and knowledge-based inference

The k-Nearest Neighbor Algorithm

All instances correspond to points in the n-D space

The nearest neighbors are defined in terms of Euclidean distance

The target function could be discrete- or real-valued

For discrete-valued functions, k-NN returns the most common value among the k training examples nearest to x_q

Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples
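A sketch of the discrete-valued case; the training points and labels below are made up.

# k-NN: majority vote among the k training points nearest to the query.
from collections import Counter
from math import dist   # Euclidean distance (Python 3.8+)

def knn_classify(xq, examples, k=3):
    # examples: list of (point, label) pairs.
    nearest = sorted(examples, key=lambda e: dist(xq, e[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), "-"), ((1, 0), "-"), ((0, 1), "+"), ((2, 2), "+"), ((3, 2), "+")]
print(knn_classify((2, 1), train, k=3))   # '+'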

Discussion on the k-NN Algorithm

The k-NN algorithm for continuous-valued target functions

Calculate the mean values of the k nearest neighbors

Distance-weighted nearest neighbor algorithm

Weight the contribution of each of the k neighbors according to their distance to the query point x_q, giving greater weight to closer neighbors:

w \equiv \frac{1}{d(x_q, x_i)^2}

Similarly, for real-valued target functions

Robust to noisy data by averaging the k nearest neighbors

Curse of dimensionality: distance between neighbors could be dominated by irrelevant attributes

Case-Based Reasoning

Also uses lazy evaluation and analyzes similar instances

Difference: Instances are not “points in a Euclidean space”

Example: Water faucet problem in CADET (Sycara et al’92)

Methodology

Instances represented by rich symbolic descriptions

Multiple retrieved cases may be combined

Research issues

Indexing based on syntactic similarity measures

Remarks on Lazy vs. Eager Learning

Instance-based learning: lazy evaluation

Decision-tree and Bayesian classification: eager evaluation

Key differences

Lazy method may consider the query instance x_q when deciding how to generalize beyond the training data D

Eager method cannot, since it has already chosen a global approximation when seeing the query

Efficiency: Lazy - less time training but more time predicting

Accuracy

Lazy method effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function

Eager method must commit to a single hypothesis that covers the entire instance space

Genetic Algorithms

GA: based on an analogy to biological evolution

Each rule is represented by a string of bits

e.g., IF A1 AND NOT A2 THEN C2 can be encoded as 100

Based on the notion of survival of the fittest, a new population is formed to consist of the fittest rules and their offsprings

The fitness of a rule is represented by its classification accuracy on a set of training examples

Offsprings are generated by crossover and mutation, as in the sketch below
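A sketch of the two operators on bit-string rules; the encoding and the mutation rate are illustrative.

import random

def crossover(a, b):
    # Single-point crossover of two equal-length bit strings.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(rule, rate=0.05):
    # Flip each bit independently with probability `rate`.
    return "".join(bit if random.random() > rate else "10"[int(bit)] for bit in rule)

parents = ["100", "011"]             # e.g., "100" encodes IF A1 AND NOT A2 THEN C2
child1, child2 = crossover(*parents)
print(mutate(child1), mutate(child2))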

Rough Set Approach

Rough sets are used to approximately or “roughly” define equivalent classes

A rough set for a given class C is approximated by two sets: a lower approximation (certain to be in C) and an upper approximation (cannot be described as not belonging to C)

Finding the minimal subsets (reducts) of attributes (for feature reduction) is NP-hard, but a discernibility matrix is used to reduce the computation intensity

Fuzzy Set Approaches

Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (such as using a fuzzy membership graph)

Attribute values are converted to fuzzy values

For a given new sample, more than one fuzzy value may apply

Each applicable rule contributes a vote for membership in the categories

Typically, the truth values for each predicted category are summed


What Is Prediction?

First, construct a model

Second, use model to predict unknown value

Major method for prediction is regression

Linear and multiple regression

Non-linear regression

Prediction is different from classification

Classification refers to predicting categorical class labels

Prediction models continuous-valued functions


Predictive Modeling in Databases

Predictive modeling: Predict data values or construct generalized linear models based on the database data

One can only predict value ranges or category distributions

Method outline:

Minimal generalization

Attribute relevance analysis

Generalized linear model construction

Prediction

Determine the major factors which influence the prediction

Data relevance analysis: uncertainty measurement, entropy analysis, expert judgement, etc.


Regression Analysis and Log-Linear Models in Prediction

Linear regression: Y = α + β X

Two parameters, α and β, specify the line; they are estimated by applying the least squares criterion to the known values of Y and X

Multiple regression: Y = b_0 + b_1 X_1 + b_2 X_2

Many nonlinear functions can be transformed into the above

Log-linear models:

The multi-way table of joint probabilities is approximated by a product of lower-order tables

Probability: p(a, b, c, d) = \alpha_{ab} \beta_{ac} \chi_{ad} \delta_{bcd}

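A sketch of estimating α and β by least squares; the sample points below are made up.

# Least-squares fit of Y = alpha + beta * X.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 3.8, 5.2, 5.9]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
den = sum((x - mean_x) ** 2 for x in xs)
beta = num / den                    # slope
alpha = mean_y - beta * mean_x      # intercept
print(alpha, beta)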

Locally Weighted Regression

Construct an explicit approximation to f over a local region surrounding query instance x_q

Locally weighted linear regression:

The target function f is approximated near x_q using the linear function:

\hat{f}(x) = w_0 + w_1 a_1(x) + \cdots + w_n a_n(x)

Minimize the squared error over the k nearest neighbors, with a distance-decreasing weight K:

E(x_q) \equiv \frac{1}{2} \sum_{x \in \text{k nearest neighbors of } x_q} \left(f(x) - \hat{f}(x)\right)^2 K(d(x_q, x))

The gradient descent training rule:

\Delta w_j \equiv \eta \sum_{x \in \text{k nearest neighbors of } x_q} K(d(x_q, x)) \left(f(x) - \hat{f}(x)\right) a_j(x)

In most cases, the target function is approximated by a constant, linear, or quadratic function.


Prediction: Numerical Data

Prediction: Categorical Data


Classification Accuracy: Estimating Error Rates

Partition: Training-and-testing

use two independent data sets, e.g., training set (2/3), test set (1/3)

used for data sets with a large number of samples

Cross-validation

divide the data set into k subsamples

use k−1 subsamples as training data and one sub-sample as test data: k-fold cross-validation

for data sets with moderate size

Bootstrapping (leave-one-out)

for data sets with small size
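A sketch of the k-fold scheme; train_and_score stands in for any routine that builds a model on the training folds and returns its accuracy on the test fold.

def k_fold_accuracy(data, train_and_score, k=10):
    # data: list of labeled samples; returns mean accuracy over the k folds.
    folds = [data[i::k] for i in range(k)]            # k roughly equal subsamples
    scores = []
    for i in range(k):
        test = folds[i]                               # one sub-sample as test data
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        scores.append(train_and_score(train, test))
    return sum(scores) / k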

Boosting and Bagging

Boosting increases classification accuracy

Applicable to decision trees or Bayesian classifiers

Learn a series of classifiers, where each classifier in the series pays more attention to the examples misclassified by its predecessor

Boosting requires only linear time and constant space


Boosting Technique (II) — Algorithm

Assign every example an equal weight 1/N

For t = 1, 2, …, T do

Obtain a hypothesis (classifier) h(t) under w(t)

Calculate the error of h(t) and re-weight the examples based on the error

Normalize w(t+1) to sum to 1

Output a weighted sum of all the hypotheses, with each hypothesis weighted according to its accuracy on the training set; a sketch of this loop follows

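The slide does not pin down the re-weighting formula, so this sketch uses AdaBoost-style updates as one concrete choice; weak_learner is a placeholder for any classifier trained under the weights w.

import math

def boost(examples, labels, weak_learner, T=10):
    # Returns a list of (hypothesis, vote_weight) pairs.
    n = len(examples)
    w = [1.0 / n] * n                            # every example starts at weight 1/N
    ensemble = []
    for _ in range(T):
        h = weak_learner(examples, labels, w)    # hypothesis h(t) under w(t)
        err = sum(wi for wi, x, y in zip(w, examples, labels) if h(x) != y)
        if err == 0 or err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this hypothesis
        w = [wi * math.exp(-alpha if h(x) == y else alpha)
             for wi, x, y in zip(w, examples, labels)]
        total = sum(w)
        w = [wi / total for wi in w]             # normalize w(t+1) to sum to 1
        ensemble.append((h, alpha))
    return ensemble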


Summary

Classification is an extensively studied problem (mainly in statistics, machine learning & neural networks)

Classification is probably one of the most widely used data mining techniques, with a lot of extensions

Scalability is still an important issue for database applications; thus combining classification with database techniques should be a promising topic

Research directions: classification of non-relational data, e.g., text, spatial, multimedia, etc.

