Prof. Anil Prajapati
Index
What is Classification and Prediction?
Example of classification:
A bank loan officer wants to analyze the data in order to
know which loan applicants are risky and which are safe.
A marketing manager at a company needs to analyze
whether a customer with a given profile will buy a new
computer.
In both of the above examples, a model or classifier is
constructed to predict categorical labels. These labels
are risky or safe for the loan application data and yes or
no for the marketing data.
Similarly, a teacher classifies students into classes A, B, C, and D.
Example of prediction:
The marketing manager of Amazon needs to predict how
much a given customer will spend during a sale at the
company.
In this example, the goal is to predict a numeric
value. Therefore the data analysis task is an example of
numeric prediction.
Typical applications
Credit/loan approval
Medical diagnosis: whether a tumor is cancerous or benign
Fraud detection: whether a transaction is fraudulent
Web page categorization: which category a page belongs to
What is Supervised & Unsupervised Learning?
Building the Classifier or Model
Using Classifier for classification
Issues of Classification and Prediction
Data cleaning:
Pre-process the data to remove the noise and
handle missing values
Relevance analysis:
Remove irrelevant and redundant attributes
Data Transformation:
Generalize the data to higher level concepts using
concept hierarchies
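The three preparation steps above can be sketched in plain Python. This is only an illustration: the records, the field names (customer_id, age, income), the choice of mean imputation, and the age cut-offs used as a concept hierarchy are all hypothetical and do not come from these slides.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"customer_id": 1, "age": 25, "income": 30000},
    {"customer_id": 2, "age": 45, "income": None},
    {"customer_id": 3, "age": 62, "income": 50000},
]

def clean(rows):
    """Data cleaning: fill a missing income with the mean of the known incomes."""
    known = [r["income"] for r in rows if r["income"] is not None]
    fill = mean(known)
    return [{**r, "income": fill if r["income"] is None else r["income"]} for r in rows]

def select_relevant(rows, irrelevant=("customer_id",)):
    """Relevance analysis: drop attributes assumed to carry no predictive value."""
    return [{k: v for k, v in r.items() if k not in irrelevant} for r in rows]

def generalize_age(age):
    """Data transformation: map a numeric age to a higher-level concept.

    The cut-offs are assumptions chosen for illustration only.
    """
    if age <= 30:
        return "youth"
    if age <= 40:
        return "middle_aged"
    return "senior"

def transform(rows):
    """Replace each numeric age with its concept-hierarchy label."""
    return [{**r, "age": generalize_age(r["age"])} for r in rows]

prepared = transform(select_relevant(clean(records)))
for row in prepared:
    print(row)
```

Each step is a separate function so that any one of them can be swapped out, for example replacing mean imputation with a different strategy for missing values.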
Comparison of Classification and
Prediction methods
Accuracy − The accuracy of a classifier refers to its
ability to predict the class label correctly. The
accuracy of a predictor refers to how well it can
guess the value of the predicted attribute for new
data.
Speed − This refers to the computational cost of
generating and using the classifier or predictor.
Robustness − This refers to the ability of the classifier
or predictor to make correct predictions from
noisy data.
Scalability − This refers to the ability to
construct the classifier or predictor efficiently
given large amounts of data.
Interpretability − This refers to the extent to which
the classifier or predictor can be understood.
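The accuracy criterion can be made concrete with a small sketch. For a classifier, accuracy is the fraction of class labels predicted correctly. For a numeric predictor, the slides do not name a specific measure, so mean absolute error is used here as one common choice. All labels and amounts below are hypothetical.

```python
def classifier_accuracy(actual, predicted):
    """Fraction of examples whose predicted class label matches the actual one."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted numeric values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical loan labels (classification).
labels_true = ["safe", "risky", "safe", "safe"]
labels_pred = ["safe", "risky", "risky", "safe"]
acc = classifier_accuracy(labels_true, labels_pred)

# Hypothetical customer spending amounts (numeric prediction).
spend_true = [120.0, 80.0, 200.0]
spend_pred = [100.0, 90.0, 230.0]
mae = mean_absolute_error(spend_true, spend_pred)

print(f"classifier accuracy: {acc}")
print(f"predictor mean absolute error: {mae}")
```

Note the direction of each measure: a higher accuracy is better for the classifier, while a lower error is better for the predictor.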
Classification and Prediction methods
Decision Tree
Benefits of decision tree
Requirements of classification
Terms for Decision Tree
1. Entropy
2. Information Gain
3. Gini Index
Entropy
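As a minimal sketch, the entropy of a set of class labels is the standard quantity: the sum over all classes of -p * log2(p), where p is the fraction of examples in that class. The function name and the sample labels below are illustrative, not taken from the slides.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy in bits of a list of class labels: sum of -p * log2(p)."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

mixed = entropy(["yes", "yes", "no", "no"])   # evenly split between two classes
pure = entropy(["yes", "yes", "yes", "yes"])  # a single class only

print(mixed, pure)
```

An evenly split two-class set has the maximum entropy of 1 bit, and a set containing a single class has entropy 0. This is why a decision tree prefers splits that produce purer partitions.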
Gini Index
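The Gini index of a set of class labels can be sketched with its standard definition: 1 minus the sum of the squared class fractions. The function name and the sample labels are illustrative assumptions.

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity of a list of class labels: 1 - sum of p**2 over classes."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())

mixed_gini = gini_index(["yes", "yes", "no", "no"])   # evenly split between two classes
pure_gini = gini_index(["yes", "yes", "yes", "yes"])  # a single class only

print(mixed_gini, pure_gini)
```

Like entropy, the Gini index is 0 for a pure set. For two classes it peaks at 0.5 when the set is evenly split.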
Example: Sample Data
Calculate Information Gain
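A minimal sketch of the calculation: the information gain of an attribute is the entropy of the whole data set minus the weighted average entropy of the partitions produced by splitting on that attribute. The sample table of this deck is not reproduced here, so the six rows, the age bands, and the buys label below are hypothetical stand-ins.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the full set minus the weighted entropy after splitting on attribute."""
    parent = entropy([r[target] for r in rows])
    partitions = defaultdict(list)
    for r in rows:
        partitions[r[attribute]].append(r[target])
    expected = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return parent - expected

# Hypothetical rows, NOT the sample data from the slides.
rows = [
    {"age": "<=30", "buys": "no"},
    {"age": "<=30", "buys": "no"},
    {"age": "31..40", "buys": "yes"},
    {"age": "31..40", "buys": "yes"},
    {"age": ">40", "buys": "yes"},
    {"age": ">40", "buys": "no"},
]

gain_age = information_gain(rows, "age", "buys")
print(f"Gain(age) = {gain_age:.4f}")
```

In this hypothetical data the full set has entropy 1 bit, two of the three age partitions are pure, and the third is evenly split, so the gain is 1 - 1/3. The attribute with the highest gain would be chosen as the split at that node.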
Divided based on Age
I and E for Age <=30
I and E for Age <=30
Final Tree
Classification rule for decision tree
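One common way to read classification rules off a decision tree is to write one IF-THEN rule per path from the root to a leaf. The sketch below assumes a tree stored as nested dictionaries. The tree itself, its attribute names, and its class labels are hypothetical and are not the final tree from these slides.

```python
# Hypothetical tree: an internal node is a dict, a leaf is a class label string.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"yes": "yes", "no": "no"},
        },
        "senior": "yes",
    },
}

def extract_rules(node, conditions=()):
    """Return one IF-THEN rule string for every root-to-leaf path."""
    if not isinstance(node, dict):  # reached a leaf holding the class label
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {antecedent} THEN class = {node}"]
    rules = []
    for value, child in node["branches"].items():
        test = (node["attribute"], value)
        rules.extend(extract_rules(child, conditions + (test,)))
    return rules

rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

Each attribute test along a path becomes one condition in the rule's IF part, and the leaf supplies the predicted class.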