CHAPTER – 6
Concepts of Classification and Prediction

Department of CE/IT
DATA MINING AND INFORMATION RETRIEVAL (01CE07070)

Basic Concepts - Classification

A bank loans officer needs analysis of her data to learn which loan applicants are “safe” and which are “risky” for the bank.

A marketing manager at AllElectronics needs data analysis to help guess whether a customer with a given profile will buy a new computer.

A medical researcher wants to analyze breast cancer data to predict which one of three specific treatments a patient should receive.

In each of these examples, the data analysis task is classification, where a model or classifier is constructed to predict class (categorical) labels, such as “safe” or “risky” for the loan application data; “yes” or “no” for the marketing data; or “treatment A,” “treatment B,” or “treatment C” for the medical data.

Suppose that the marketing manager wants to predict how much a given customer will spend during a sale at AllElectronics. This data analysis task is an example of numeric prediction, where the model predicts a continuous value rather than a class label.

Data classification is a two-step process:
1. Learning step (where a classification model is constructed)
2. Classification step (where the model is used to predict class labels for given data).
The process is shown for the loan application data.
[Figure: two panels, "Learning" and "Classification", for the loan application data]

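The two steps can be sketched in Python. This is only a minimal illustration, not the method shown on the slide: the training tuples, the `income` attribute, and the one-rule learner (`learn`, `classify`) are all hypothetical.

```python
from collections import Counter, defaultdict

def learn(training_tuples, attribute):
    """Learning step: map each value of one attribute to its majority class."""
    counts = defaultdict(Counter)
    for row, label in training_tuples:
        counts[row[attribute]][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

def classify(model, row, attribute, default="risky"):
    """Classification step: look up the class label for a new tuple."""
    return model.get(row[attribute], default)

# Hypothetical loan application data: (attributes, class label).
training = [
    ({"income": "low"}, "risky"),
    ({"income": "low"}, "risky"),
    ({"income": "high"}, "safe"),
    ({"income": "high"}, "safe"),
    ({"income": "medium"}, "safe"),
]

model = learn(training, "income")                            # step 1
prediction = classify(model, {"income": "high"}, "income")   # step 2
```

In practice the model built in the learning step would be checked on tuples that were not used for training before it is applied to new data.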
Classification techniques
1. Decision Tree
2. Naïve Bayes
3. Rule-based classifier
4. Model evaluation and selection
5. Bagging
6. Boosting
1. Decision Tree
A decision tree is a flowchart-like tree structure:
Each internal node (nonleaf node) denotes a test on an attribute.
Each branch represents an outcome of the test.
Each leaf node (or terminal node) holds a class label.
The topmost node in a tree is the root node.

age?
  <=30: student?
    no: no
    yes: yes
  31..40: yes
  >40: credit rating?
    excellent: no
    fair: yes
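The example tree above can be written as a nested structure and walked from the root to a leaf. This is a minimal sketch; the dictionary layout and the `classify` helper are assumptions made for illustration, while the tests and outcomes come from the tree itself.

```python
# Internal nodes are dicts holding an attribute test and its branches;
# leaves are plain strings holding the class label.
tree = {
    "test": "age",
    "branches": {
        "<=30": {"test": "student", "branches": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40": {
            "test": "credit rating",
            "branches": {"excellent": "no", "fair": "yes"},
        },
    },
}

def classify(node, tuple_):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(node, dict):
        outcome = tuple_[node["test"]]
        node = node["branches"][outcome]
    return node

label = classify(tree, {"age": "<=30", "student": "yes"})
```

Each call traces exactly one path from the root node to a leaf, and the leaf's class label is the prediction.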
1.a. Decision Tree Induction
During the late 1970s and early 1980s, J. Ross Quinlan, a researcher in machine learning, developed a decision tree algorithm known as ID3.

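Three possibilities for partitioning tuples based on the splitting criterion, where A is the splitting attribute:
(a) A is discrete-valued: one branch is grown for each known value of A.
(b) A is continuous-valued: two branches are grown, for A <= split_point and A > split_point.
(c) A is discrete-valued and a binary tree must be produced: the test is of the form A ∈ S_A, where S_A is the splitting subset for A.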


1.b. Attribute Selection
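ID3 selects the splitting attribute by information gain, Gain(A) = Info(D) - Info_A(D), where Info is the entropy of the class labels. The sketch below computes it for a tiny dataset; the four tuples and the function names are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Info(D): expected number of bits needed to identify a class label."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Gain(A) = Info(D) minus the weighted entropy after splitting on A."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    expected = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - expected

# Hypothetical tuples: the attribute separates the two classes perfectly.
rows = [{"student": "no"}, {"student": "no"}, {"student": "yes"}, {"student": "yes"}]
labels = ["no", "no", "yes", "yes"]

gain = information_gain(rows, labels, "student")
```

Here Info(D) is 1 bit and both partitions are pure, so the gain equals the full 1 bit. The attribute with the highest gain is the one chosen for the split.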
1.c. Gain Ratio
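Gain ratio normalizes information gain by the split information of the attribute, GainRatio(A) = Gain(A) / SplitInfo(A), which penalizes attributes with many distinct values. A minimal sketch follows, assuming hypothetical data in which an identifier-like attribute competes with a two-valued one.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Entropy of a list of discrete values, in bits."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())

def gain_ratio(attribute_values, labels):
    """Information gain divided by the split information of the attribute."""
    total = len(labels)
    partitions = {}
    for value, label in zip(attribute_values, labels):
        partitions.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(
        len(p) / total * entropy(p) for p in partitions.values()
    )
    # SplitInfo(A) is the entropy of the attribute's own value distribution.
    split_info = entropy(attribute_values)
    return gain / split_info if split_info > 0 else 0.0

labels = ["no", "no", "yes", "yes"]  # hypothetical class labels
ratio_student = gain_ratio(["no", "no", "yes", "yes"], labels)
ratio_id = gain_ratio(["t1", "t2", "t3", "t4"], labels)
```

Both attributes have an information gain of 1 bit on this data, but the identifier-like attribute has a split information of 2 bits, so its gain ratio is half that of the two-valued attribute.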
1.d. Gini Index
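The Gini index measures impurity as Gini(D) = 1 - Σ p_i², and a binary split is scored by the weighted Gini index of its two partitions. The sketch below works through one small case; the labels and the particular split are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def gini_split(left_labels, right_labels):
    """Weighted Gini index of a binary split into two partitions."""
    total = len(left_labels) + len(right_labels)
    return (len(left_labels) / total) * gini(left_labels) + (
        len(right_labels) / total
    ) * gini(right_labels)

labels = ["yes", "yes", "yes", "no"]                   # hypothetical class labels
impurity = gini(labels)                                # 1 - (0.75^2 + 0.25^2) = 0.375
weighted = gini_split(["yes", "yes"], ["yes", "no"])   # 0.5 * 0 + 0.5 * 0.5 = 0.25
reduction = impurity - weighted                        # 0.125
```

The split that gives the largest reduction in impurity is preferred.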
1.e. Comparing Attribute Selection Measures
2. Naïve Bayes Classifier
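A naïve Bayes classifier picks the class C that maximizes P(C) · Π P(x_k | C), assuming the attributes are conditionally independent given the class. The following is a minimal sketch for categorical attributes; the five training tuples, the attribute names, and the `train` and `predict` helpers are hypothetical, and no smoothing is applied to zero counts.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute) -> value counts
    for row, label in zip(rows, labels):
        for attribute, value in row.items():
            value_counts[(label, attribute)][value] += 1
    return class_counts, value_counts

def predict(model, row):
    """Return the class with the highest P(C) * product of P(x_k | C)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(C)
        for attribute, value in row.items():
            score *= value_counts[(label, attribute)][value] / count  # P(x_k | C)
        scores[label] = score
    return max(scores, key=scores.get), scores

# Hypothetical training data.
rows = [
    {"student": "yes", "credit": "fair"},
    {"student": "yes", "credit": "excellent"},
    {"student": "no", "credit": "fair"},
    {"student": "no", "credit": "excellent"},
    {"student": "no", "credit": "fair"},
]
labels = ["yes", "yes", "yes", "no", "no"]

model = train(rows, labels)
label, scores = predict(model, {"student": "yes", "credit": "fair"})
```

For the query tuple, the score for "yes" is (3/5)(2/3)(2/3) and the score for "no" is zero because no "no" tuple has student = yes. A single zero count cancels the whole product, which is why a correction such as Laplace smoothing is commonly added.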
3. Rule-Based Classification
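A rule-based classifier represents the model as IF-THEN rules. One way to obtain rules is to read one rule off each root-to-leaf path of a decision tree. The sketch below does this by hand for the example tree in the Decision Tree section; the list-of-pairs representation and the `classify` helper are assumptions made for illustration.

```python
# Each rule is (antecedent, consequent): IF all attribute tests hold THEN class.
rules = [
    ({"age": "<=30", "student": "no"}, "no"),
    ({"age": "<=30", "student": "yes"}, "yes"),
    ({"age": "31..40"}, "yes"),
    ({"age": ">40", "credit rating": "excellent"}, "no"),
    ({"age": ">40", "credit rating": "fair"}, "yes"),
]

def classify(rules, tuple_, default=None):
    """Return the consequent of the first rule whose antecedent is satisfied."""
    for antecedent, consequent in rules:
        if all(tuple_.get(attr) == value for attr, value in antecedent.items()):
            return consequent
    return default  # no rule covers the tuple

label = classify(rules, {"age": ">40", "credit rating": "fair"})
```

Because the rules come from a tree, they are mutually exclusive and cover every path. Rules learned in other ways may overlap, in which case the rule ordering and the default class decide the outcome.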
4. Model Evaluation and Selection
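Classifier evaluation usually starts from the confusion matrix counts (true and false positives and negatives), from which accuracy, precision, and recall follow. A minimal sketch, assuming hypothetical actual and predicted labels for eight test tuples:

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Return (TP, TN, FP, FN) for a two-class problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall from the confusion matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical test-set labels.
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
predicted = ["yes", "no", "yes", "no", "yes", "no", "no", "yes"]

counts = confusion_counts(actual, predicted)
result = metrics(*counts)
```

Accuracy alone can mislead when one class is rare, which is why precision and recall are reported alongside it.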
5. Techniques to Improve Classification Accuracy: Ensemble Methods
5. Bagging
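Bagging trains each model on a bootstrap sample (drawn with replacement from the training data) and combines the models by majority vote. The sketch below is a minimal illustration; the one-rule base learner, the six training tuples, and the choice of 11 models are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def train_one_rule(sample, attribute):
    """Base learner: predict the majority class for each attribute value."""
    counts = defaultdict(Counter)
    overall = Counter()
    for row, label in sample:
        counts[row[attribute]][label] += 1
        overall[label] += 1
    default = overall.most_common(1)[0][0]  # used for unseen values
    table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    return lambda row: table.get(row[attribute], default)

def bagging(data, attribute, k, seed=0):
    """Train k models, each on a bootstrap sample the size of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(k):
        bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
        models.append(train_one_rule(bootstrap, attribute))
    return models

def vote(models, row):
    """Combine the individual predictions by majority vote."""
    return Counter(model(row) for model in models).most_common(1)[0][0]

# Hypothetical training data: (attributes, class label).
data = [({"student": "yes"}, "yes")] * 3 + [({"student": "no"}, "no")] * 3

models = bagging(data, "student", k=11)
prediction = vote(models, {"student": "yes"})
```

An odd number of models avoids tied votes in a two-class problem. Individual models may err when their bootstrap sample is unrepresentative, and the vote is meant to smooth out such errors.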
6. Boosting
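Boosting trains models in sequence and reweights the training tuples so that later models focus on tuples that earlier ones misclassified. The slide names no specific algorithm, so the sketch below assumes AdaBoost and shows a single round: the weights of correctly classified tuples are multiplied by error / (1 - error), all weights are normalized, and the model's vote weight is log((1 - error) / error). The function name and the four-tuple example are hypothetical.

```python
from math import log

def adaboost_round(weights, correct):
    """One AdaBoost-style round: return updated tuple weights and vote weight.

    weights: current tuple weights, summing to 1.
    correct: True where the current model classified the tuple correctly.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    if error == 0 or error > 0.5:
        raise ValueError("model is perfect or too weak; resample and retrain")
    factor = error / (1 - error)
    # Shrink the weights of correctly classified tuples, then renormalize,
    # which raises the relative weight of the misclassified ones.
    updated = [w * factor if ok else w for w, ok in zip(weights, correct)]
    total = sum(updated)
    new_weights = [w / total for w in updated]
    vote_weight = log((1 - error) / error)
    return new_weights, vote_weight

weights = [0.25, 0.25, 0.25, 0.25]     # equal initial weights
correct = [True, True, True, False]    # hypothetical: last tuple misclassified

new_weights, vote_weight = adaboost_round(weights, correct)
```

After this round the misclassified tuple carries half of the total weight, so the next model is pushed to classify it correctly. At prediction time each model's vote is scaled by its vote weight.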
Thank you

MEFGI
