V1-CH-6-Classification and Prediction

CHAPTER-VI
Classification & Prediction
Prof. Anil Prajapati
Index
What is Classification and Prediction?
Issues
Classification methods
• Decision tree
• Bayesian
• Rule based
• CART
• Neural network
Prediction methods
• Linear and nonlinear regression
• Logistic regression
• Data mining tools
What is Classification and Prediction?
• Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends.
• Classification models predict categorical class labels.
• Prediction models predict continuous-valued functions.
Cont..
Example of classification:
• A bank loan officer wants to analyze the data to learn which loan applicants are risky and which are safe.
• A marketing manager at a company needs to analyze whether a customer with a given profile will buy a new computer.
• In both of the above examples, a model or classifier is constructed to predict categorical labels. These labels are "risky" or "safe" for the loan application data and "yes" or "no" for the marketing data.
• Teachers classify the students into classes A, B, C, and D.
Cont..
Example of prediction:
• The marketing manager of Amazon needs to predict how much a given customer will spend during a sale at the company.
• In this example, we need to predict a numeric value. Therefore, the data analysis task is an example of numeric prediction.
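To make the idea of numeric prediction concrete, here is a minimal sketch that fits a least-squares line in plain Python. The customer figures and the names `fit_line` and `predict_spend` are invented for illustration and do not come from the slides.

```python
# Hypothetical training data: (purchases per year, amount spent in a sale).
history = [(2, 40.0), (4, 80.0), (6, 120.0), (8, 160.0)]

def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(history)

def predict_spend(purchases):
    """Return a continuous value, not a class label."""
    return slope * purchases + intercept

print(predict_spend(5))
```

The output is a number on a continuous scale, which is what separates this task from classification.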
Typical applications
• Credit/loan approval
• Medical diagnosis: whether a tumor is cancerous or benign
• Fraud detection: whether a transaction is fraudulent
• Web page categorization: which category a page belongs to
What is Supervised & Unsupervised Learning?
• Supervised learning (classification)
  • Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations.
  • New data is classified based on the training set.
• Unsupervised learning (clustering)
  • The class labels of the training data are unknown.
  • Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
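The difference can be sketched with the same measurements, once with labels and once without. This is only an illustration: the income figures are invented, and the nearest-centroid classifier and the simple one-dimensional two-means clustering are stand-ins chosen for brevity, not methods named in the slides.

```python
# Hypothetical 1-D measurements (e.g. applicant income in thousands).
labeled = [(20, "risky"), (25, "risky"), (60, "safe"), (70, "safe")]
unlabeled = [20, 25, 60, 70]

def nearest_centroid_classifier(training):
    """Supervised: class labels guide the model (one centroid per label)."""
    groups = {}
    for value, label in training:
        groups.setdefault(label, []).append(value)
    centroids = {label: sum(v) / len(v) for label, v in groups.items()}
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))

def two_means(values, iterations=10):
    """Unsupervised: no labels; look for two clusters (simple 1-D k-means)."""
    c1, c2 = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return g1, g2

classify = nearest_centroid_classifier(labeled)
print(classify(30))          # a class label learned from the labeled data
print(two_means(unlabeled))  # two groups found without any labels
```

The supervised model returns a named class, while the clustering step only reports that two groups exist and leaves their meaning to the analyst.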
How does classification work?
• Consider the bank loan application that we discussed previously.
• The data classification process includes two steps:
1. Building the classifier or model
2. Using the classifier for classification
Building the Classifier or Model
• This step is the learning step, or learning phase.
• The classification algorithm builds the classifier.
• The classifier is built from the training set, which is made up of database tuples and their associated class labels.
• Each tuple in the training set is assumed to belong to a predefined category or class.
• These tuples can also be referred to as samples, objects, or data points.
Using the Classifier for Classification
• In this step, the classifier is used for classification.
• Here, test data is used to estimate the accuracy of the classification rules.
• The classification rules can be applied to new data tuples if the accuracy is considered acceptable.
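The two steps can be sketched end to end as follows. Everything here is an assumption made for illustration: the loan tuples are invented, the classifier is a single income threshold, and the 0.75 acceptance level is arbitrary.

```python
# Hypothetical loan tuples: (income in thousands, class label).
training_set = [(15, "risky"), (22, "risky"), (30, "risky"),
                (45, "safe"), (60, "safe"), (80, "safe")]
test_set = [(18, "risky"), (35, "risky"), (50, "safe"), (75, "safe")]

def build_classifier(training):
    """Step 1 (learning): pick the income threshold with the fewest training errors."""
    best_threshold, best_errors = None, len(training) + 1
    for threshold, _ in training:
        errors = sum(
            1 for income, label in training
            if ("safe" if income >= threshold else "risky") != label
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return lambda income: "safe" if income >= best_threshold else "risky"

def accuracy(classifier, labeled_tuples):
    """Step 2: estimate accuracy on test tuples that were not used for learning."""
    correct = sum(1 for income, label in labeled_tuples if classifier(income) == label)
    return correct / len(labeled_tuples)

classifier = build_classifier(training_set)
test_accuracy = accuracy(classifier, test_set)
print(test_accuracy)

# Apply the classifier to new tuples only if the accuracy is acceptable.
if test_accuracy >= 0.75:  # arbitrary acceptance level for this sketch
    print(classifier(55))
```

Keeping the test tuples separate from the training tuples is the point of the design: accuracy measured on the training set itself would be overly optimistic.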
Issues of Classification and Prediction
Data cleaning:
Preprocess the data to remove noise and handle missing values.
Relevance analysis:
Remove irrelevant and redundant attributes.
Data transformation:
Generalize the data to higher-level concepts using concept hierarchies.
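The three preparation steps above can be sketched as small functions. The tuples, the attribute names, the choice of mean imputation, and the age cut-offs (30 and 40) are all assumptions made for this example.

```python
# Hypothetical raw tuples; None marks a missing value.
raw = [
    {"id": 1, "age": 25, "income": 30, "income_copy": 30},
    {"id": 2, "age": None, "income": 52, "income_copy": 52},
    {"id": 3, "age": 47, "income": 75, "income_copy": 75},
]

def clean(rows, attribute):
    """Data cleaning: replace missing values with the attribute mean."""
    known = [r[attribute] for r in rows if r[attribute] is not None]
    mean = sum(known) / len(known)
    return [{**r, attribute: mean if r[attribute] is None else r[attribute]} for r in rows]

def drop_attributes(rows, names):
    """Relevance analysis: remove irrelevant or redundant attributes."""
    return [{k: v for k, v in r.items() if k not in names} for r in rows]

def generalize_age(rows):
    """Data transformation: map numeric age to a higher-level concept."""
    def level(age):
        if age <= 30:
            return "youth"
        return "middle_aged" if age <= 40 else "senior"
    return [{**r, "age": level(r["age"])} for r in rows]

prepared = generalize_age(drop_attributes(clean(raw, "age"), {"id", "income_copy"}))
print(prepared)
```

Here `id` is treated as irrelevant and `income_copy` as redundant; in practice, deciding which attributes to drop requires its own analysis.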
Comparison of Classification and Prediction Methods
• Accuracy − The accuracy of a classifier refers to its ability to predict the class label correctly. The accuracy of a predictor refers to how well it can guess the value of the predicted attribute for new data.
• Speed − This refers to the computational cost of generating and using the classifier or predictor.
• Robustness − This refers to the ability of the classifier or predictor to make correct predictions from noisy data.
• Scalability − This refers to the ability to construct the classifier or predictor efficiently given a large amount of data.
• Interpretability − This refers to the extent to which the classifier or predictor can be understood.
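Of these criteria, accuracy is the easiest to quantify. A minimal sketch, assuming the fraction of correct labels for a classifier and mean absolute error for a predictor (one common choice; the slides do not name a specific error measure):

```python
def classifier_accuracy(predicted_labels, true_labels):
    """Fraction of tuples whose class label is predicted correctly."""
    matches = sum(1 for p, t in zip(predicted_labels, true_labels) if p == t)
    return matches / len(true_labels)

def mean_absolute_error(predicted_values, true_values):
    """Average distance between predicted and actual numeric values."""
    total = sum(abs(p - t) for p, t in zip(predicted_values, true_values))
    return total / len(true_values)

# Hypothetical predictions compared with the known answers.
print(classifier_accuracy(["yes", "no", "yes", "yes"], ["yes", "no", "no", "yes"]))
print(mean_absolute_error([100.0, 250.0], [110.0, 240.0]))
```

For a classifier, higher is better; for the error measure of a predictor, lower is better.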
Classification and Prediction methods
Decision Tree
• A decision tree is a structure that includes a root node, branches, and leaf nodes.
• Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label.
• For example, a decision tree for the concept buy_computer indicates whether a customer at a company is likely to buy a computer.
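The structure described above maps naturally onto nested dictionaries. The tree figure from the slides is not reproduced here, so the tree below follows the widely used textbook version of the buy_computer example and should be read as an assumption, as should the attribute names and values.

```python
# Internal node: {"attribute": name, "branches": {outcome: subtree}}.
# Leaf node: a class label string.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {"attribute": "student", "branches": {"no": "no", "yes": "yes"}},
        "middle_aged": "yes",
        "senior": {"attribute": "credit_rating",
                   "branches": {"fair": "yes", "excellent": "no"}},
    },
}

def classify(node, tuple_):
    """Follow the branch matching each test until a leaf (class label) is reached."""
    while isinstance(node, dict):
        outcome = tuple_[node["attribute"]]
        node = node["branches"][outcome]
    return node

# Hypothetical customer tuple.
customer = {"age": "youth", "student": "yes", "credit_rating": "fair"}
print(classify(tree, customer))
```

Classification is a single walk from the root to a leaf, which is why using a decision tree is fast.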
Benefits of decision tree
• It does not require any domain knowledge.
• It is easy to comprehend.
• The learning and classification steps of a decision tree are simple and fast.
Requirements of classification
Sufficient data: Enough training data should be provided.
Attribute-value description: Each object/case must be expressible in terms of a fixed collection of properties or attributes (e.g., pass, fail).
Predefined/target classes: The target function has discrete output values (Boolean or multiclass).
Terms for Decision Tree
1. Entropy
2. Information Gain
3. Gini Index
Entropy
It measures how much information we don't know (uncertainty).
If the sample data are homogeneous (all the same class), the entropy is zero. If the sample is divided equally between two classes, the entropy is one.
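Both boundary cases can be checked with a short function. This sketch assumes the usual Shannon definition with base-2 logarithms, i.e. the sum over classes of -p * log2(p), where p is the proportion of each class; the label lists are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * log2(c / total) for c in counts.values())

print(entropy(["yes", "yes", "yes", "yes"]))  # homogeneous sample
print(entropy(["yes", "yes", "no", "no"]))    # evenly divided sample
```

Samples that are mixed but not evenly divided give values between these two extremes.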
Information Gain
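As a sketch, assuming the standard definition: the information gain of an attribute is the entropy of the class labels before the split minus the weighted average entropy of the partitions after splitting on that attribute. The tuples and attribute names below are hypothetical and are not the sample data from the slides.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    before = entropy([r[target] for r in rows])
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[target])
    after = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return before - after

# Hypothetical tuples for illustration only.
rows = [
    {"student": "yes", "credit": "fair", "buys": "yes"},
    {"student": "yes", "credit": "excellent", "buys": "yes"},
    {"student": "no", "credit": "fair", "buys": "no"},
    {"student": "no", "credit": "excellent", "buys": "no"},
]
print(information_gain(rows, "student", "buys"))
print(information_gain(rows, "credit", "buys"))
```

In this toy data, `student` separates the classes perfectly while `credit` tells us nothing, so a tree builder that picks the attribute with the highest gain would split on `student` first.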
Gini Index
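A minimal sketch, assuming the standard definition used by CART: the Gini index of a sample is one minus the sum of the squared class proportions, and a split is scored by the size-weighted Gini index of its partitions. The label lists are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_split(partitions):
    """Weighted Gini index of a split, given one label list per branch."""
    n = sum(len(p) for p in partitions)
    return sum(len(p) / n * gini(p) for p in partitions)

print(gini(["yes", "yes", "yes", "yes"]))          # pure sample
print(gini(["yes", "yes", "no", "no"]))            # evenly mixed sample
print(gini_split([["yes", "yes"], ["no", "no"]]))  # split into two pure branches
```

Like entropy, the Gini index is zero for a pure sample; for two classes it peaks at 0.5 rather than 1.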
Example: Sample Data
Calculate Information Gain
Divided based on Age
I and E for Age <=30
Final Tree
Classification rule for decision tree
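Each path from the root to a leaf can be read as one IF-THEN rule. The sketch below shows this on an assumed tree: because the final tree from the slides is not reproduced here, it uses the common textbook version of the buy_computer example, and the function name `extract_rules` is illustrative.

```python
# Assumed tree: internal nodes are dicts, leaves are class labels.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {"attribute": "student", "branches": {"no": "no", "yes": "yes"}},
        "middle_aged": "yes",
        "senior": {"attribute": "credit_rating",
                   "branches": {"fair": "yes", "excellent": "no"}},
    },
}

def extract_rules(node, conditions=()):
    """Return one IF-THEN rule per root-to-leaf path."""
    if not isinstance(node, dict):  # leaf: emit the accumulated conditions
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {antecedent} THEN buy_computer = {node}"]
    rules = []
    for outcome, subtree in node["branches"].items():
        rules += extract_rules(subtree, conditions + ((node["attribute"], outcome),))
    return rules

for rule in extract_rules(tree):
    print(rule)
```

The tree has five leaves, so it yields five rules, and the rules are mutually exclusive because each tuple follows exactly one path.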