
CHAPTER-VI

Classification & Prediction

Prof. Anil Prajapati

Index

What is Classification and Prediction?
Issues
Classification methods
• Decision tree
• Bayesian
• Rule-based
• CART
• Neural network
Prediction methods
• Linear and nonlinear regression
• Logistic regression
• Data mining tools

What is Classification and Prediction?

• Classification and prediction are two forms of data analysis used to extract models that describe important data classes or predict future data trends.
• Classification models predict categorical class labels.
• Prediction models predict continuous-valued functions.

Cont..

Example of classification:
• A bank loan officer wants to analyze the data to determine which loan applicants are risky and which are safe.
• A marketing manager at a company needs to analyze whether a customer with a given profile will buy a new computer.
• In both examples, a model or classifier is constructed to predict categorical labels: "risky" or "safe" for the loan application data, and "yes" or "no" for the marketing data.
• A teacher classifies the students of a class into grades A, B, C, or D.

Cont..

Example of prediction:
• The marketing manager of Amazon needs to predict how much a given customer will spend during a sale at the company.
• Here the goal is to predict a numeric value, so the data analysis task is an example of numeric prediction.
Typical applications
• Credit/loan approval
• Medical diagnosis: whether a tumor is cancerous or benign
• Fraud detection: whether a transaction is fraudulent
• Web page categorization: which category a page belongs to

What is Supervised & Unsupervised Learning?

• Supervised learning (classification)
  • Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation
  • New data is classified based on the training set
• Unsupervised learning (clustering)
  • The class labels of the training data are unknown
  • Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
How does classification work?

• Consider the bank loan application discussed previously.
• The data classification process includes two steps:
1. Building the classifier or model
2. Using the classifier for classification

Building the Classifier or Model

• This step is the learning step, or learning phase.
• The classification algorithm builds the classifier.
• The classifier is built from a training set made up of database tuples and their associated class labels.
• Each tuple in the training set is assumed to belong to a predefined category or class.
• These tuples can also be referred to as samples, objects, or data points.

Using the Classifier for Classification

• In this step, the classifier is used for classification.
• Test data is used to estimate the accuracy of the classification rules.
• If the accuracy is considered acceptable, the classification rules can be applied to new data tuples.
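
The two steps can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the loan tuples, the single `income_level` attribute, and the majority-class rule used as the "classifier" are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical loan tuples: (income_level, class_label). The values are made up.
training_set = [
    ("low", "risky"), ("low", "risky"), ("low", "safe"),
    ("medium", "safe"), ("medium", "safe"), ("medium", "risky"),
    ("high", "safe"), ("high", "safe"),
]
test_set = [("low", "risky"), ("medium", "safe"), ("high", "safe"), ("low", "safe")]


def build_classifier(tuples):
    """Step 1 (learning): map each attribute value to its majority class."""
    counts = defaultdict(Counter)
    for value, label in tuples:
        counts[value][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def accuracy(classifier, tuples):
    """Step 2 (classification): fraction of test tuples labeled correctly."""
    correct = sum(1 for value, label in tuples if classifier.get(value) == label)
    return correct / len(tuples)


classifier = build_classifier(training_set)
test_accuracy = accuracy(classifier, test_set)
print(classifier)
print(f"accuracy on test set: {test_accuracy:.2f}")
```

The test tuples are kept separate from the training tuples, so the accuracy estimate reflects how the rules behave on data the classifier has not seen.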

Issues of Classification and Prediction

Data cleaning:
Pre-process the data to remove the noise and
handle missing values
Relevance analysis:
Remove irrelevant and redundant attributes
Data transformation:
Generalize the data to higher-level concepts using
concept hierarchies
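
The three preparation steps can be sketched as follows. The records, the choice of mean imputation for missing values, the decision to treat `id` as irrelevant, and the age bands in `generalize_age` are illustrative assumptions, not part of the slides.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 25, "income": 30000},
    {"id": 2, "age": None, "income": 52000},
    {"id": 3, "age": 47, "income": None},
    {"id": 4, "age": 63, "income": 41000},
]


def fill_missing(rows, attribute):
    """Data cleaning: replace missing values with the attribute mean."""
    known = [r[attribute] for r in rows if r[attribute] is not None]
    fill = mean(known)
    return [{**r, attribute: fill if r[attribute] is None else r[attribute]} for r in rows]


def drop_attributes(rows, irrelevant):
    """Relevance analysis: remove attributes judged irrelevant."""
    return [{k: v for k, v in r.items() if k not in irrelevant} for r in rows]


def generalize_age(age):
    """Data transformation: map a numeric age to a higher-level concept."""
    if age <= 30:
        return "youth"
    if age <= 40:
        return "middle_aged"
    return "senior"


cleaned = fill_missing(fill_missing(records, "age"), "income")
cleaned = drop_attributes(cleaned, {"id"})
prepared = [{**r, "age": generalize_age(r["age"])} for r in cleaned]
print(prepared)
```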

Comparison of Classification and Prediction methods
• Accuracy: The accuracy of a classifier refers to its ability to predict the class label correctly. The accuracy of a predictor refers to how well it can guess the value of the predicted attribute for new data.
• Speed: The computational cost of generating and using the classifier or predictor.
• Robustness: The ability of the classifier or predictor to make correct predictions from noisy data.

• Scalability: The ability to construct the classifier or predictor efficiently given a large amount of data.
• Interpretability: The extent to which the classifier or predictor can be understood.
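
Of these criteria, accuracy is the easiest to make concrete. The sketch below contrasts the two notions: the fraction of correct labels for a classifier, and an error measure for a numeric predictor. Mean absolute error is used here as one common choice; the slides do not name a specific measure, and the sample values are made up.

```python
def classification_accuracy(true_labels, predicted_labels):
    """Fraction of class labels predicted correctly."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def mean_absolute_error(true_values, predicted_values):
    """Average distance between predicted and actual numeric values."""
    total = sum(abs(t - p) for t, p in zip(true_values, predicted_values))
    return total / len(true_values)


# Hypothetical outputs of a classifier and of a numeric predictor.
acc = classification_accuracy(
    ["safe", "risky", "safe", "safe"],
    ["safe", "risky", "risky", "safe"],
)
mae = mean_absolute_error([100.0, 250.0, 80.0], [110.0, 240.0, 95.0])
print(acc, mae)
```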

Classification and Prediction methods

Decision Tree

• A decision tree is a structure that includes a root node, branches, and leaf nodes.
• Each internal node denotes a test on an attribute, each branch denotes an outcome of the test, and each leaf node holds a class label.
• The example decision tree is for the concept buy_computer, which indicates whether a customer at a company is likely to buy a computer.
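
The structure described above can be represented as nested dictionaries, with classification as a walk from the root to a leaf. The tree below is a hypothetical stand-in for the buy_computer example: the attributes `age`, `student`, and `credit_rating` and their branch values are assumptions chosen for illustration.

```python
# Hypothetical buy_computer tree. Internal nodes are dicts holding the tested
# attribute and one branch per outcome; leaves are class labels.
tree = {
    "attribute": "age",
    "branches": {
        "<=30": {"attribute": "student", "branches": {"yes": "yes", "no": "no"}},
        "31..40": "yes",
        ">40": {
            "attribute": "credit_rating",
            "branches": {"fair": "yes", "excellent": "no"},
        },
    },
}


def classify(node, sample):
    """Follow the branch matching each attribute test until a leaf is reached."""
    while isinstance(node, dict):
        value = sample[node["attribute"]]
        node = node["branches"][value]
    return node


label = classify(tree, {"age": "<=30", "student": "yes", "credit_rating": "fair"})
print(label)
```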

Benefits of decision tree

• It does not require any domain knowledge.
• It is easy to comprehend.
• The learning and classification steps of a decision tree are simple and fast.

Requirements of classification

Sufficient data: Enough training data should be provided.

Attribute-value description: Each object or case must be expressible in terms of a fixed collection of properties or attributes (e.g., pass, fail).

Predefined/target classes: The target function has discrete output values (Boolean or multiclass).

Terms for Decision Tree

1. Entropy
2. Information Gain
3. Gini Index

Entropy

Entropy measures how much information we do not know, that is, the uncertainty in the data.
If the sample data are homogeneous (all of the same class), the entropy is zero. If the sample is divided equally between two classes, the entropy is one.
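
Both boundary cases can be checked with a short sketch, assuming the usual base-2 definition H(S) = -sum(p_i * log2(p_i)) over the class proportions p_i. The code writes each term as p_i * log2(1 / p_i), which is the same quantity, and the label lists are made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    # p * log2(1 / p) equals -p * log2(p); n counts one class, so p = n / total.
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


print(entropy(["yes", "yes", "yes", "yes"]))  # homogeneous sample
print(entropy(["yes", "yes", "no", "no"]))    # equal two-class split
```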
Information Gain
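
Information gain is commonly defined as the entropy of the class labels before a split minus the weighted entropy of the partitions produced by splitting on an attribute. The sketch below assumes that definition; the rows and the attribute names `student`, `income`, and `buys` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attribute]].append(row[target])
    before = entropy([row[target] for row in rows])
    after = sum(len(part) / len(rows) * entropy(part) for part in partitions.values())
    return before - after


# Hypothetical rows: "student" separates the classes perfectly, "income" not at all.
rows = [
    {"student": "yes", "income": "high", "buys": "yes"},
    {"student": "yes", "income": "low", "buys": "yes"},
    {"student": "no", "income": "high", "buys": "no"},
    {"student": "no", "income": "low", "buys": "no"},
]
print(information_gain(rows, "student", "buys"))
print(information_gain(rows, "income", "buys"))
```

The attribute with the highest gain is the one a tree-building algorithm of this kind would test first.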

Gini Index
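
The Gini index is commonly defined as 1 - sum(p_i^2) over the class proportions p_i, and a split is scored by the size-weighted Gini index of its partitions. A minimal sketch under that assumption, with made-up labels:

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def gini_split(partitions):
    """Size-weighted Gini impurity of the partitions produced by a split."""
    total = sum(len(part) for part in partitions)
    return sum(len(part) / total * gini_index(part) for part in partitions)


print(gini_index(["yes", "yes", "yes", "yes"]))         # pure sample
print(gini_index(["yes", "yes", "no", "no"]))           # equal two-class split
print(gini_split([["yes", "yes"], ["no", "no"]]))       # split into pure partitions
```

A lower value means a purer partition, so a split with the smallest weighted Gini index is preferred.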

Example: Sample Data

Calculate Information Gain

Divided based on Age

I and E for Age <=30

Final Tree

Classification rule for decision tree
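
Classification rules are typically read off a decision tree by writing one IF-THEN rule per path from the root to a leaf. The sketch below assumes a hypothetical buy_computer tree stored as nested dictionaries; the attributes, branch values, and rule wording are illustrative.

```python
# Hypothetical buy_computer tree: dicts are attribute tests, strings are class labels.
tree = {
    "attribute": "age",
    "branches": {
        "<=30": {"attribute": "student", "branches": {"yes": "yes", "no": "no"}},
        "31..40": "yes",
        ">40": {
            "attribute": "credit_rating",
            "branches": {"fair": "yes", "excellent": "no"},
        },
    },
}


def extract_rules(node, conditions=()):
    """Return one IF-THEN rule string per root-to-leaf path."""
    if not isinstance(node, dict):
        antecedent = " AND ".join(f'{a} = "{v}"' for a, v in conditions)
        return [f'IF {antecedent} THEN buy_computer = "{node}"']
    rules = []
    for value, child in node["branches"].items():
        rules.extend(extract_rules(child, conditions + ((node["attribute"], value),)))
    return rules


rules = extract_rules(tree)
for rule in rules:
    print(rule)
```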
