
Decision Tree

Dr. Alekh Gour


What is a decision tree?
Customer ID   Age   Gender   Marital Status   No. of Credit Cards   Profitability
C-0001000     36    Male     Married          1                     Profitable
C-0001002     32    Male     Single           3                     Unprofitable
C-0001003     38    Male     Married          2                     Profitable
C-0001004     40    Male     Single           1                     Unprofitable
C-0001005     44    Male     Married          0                     Profitable
C-0001006     56    Female   Married          0                     Profitable
C-0001007     58    Female   Single           1                     Unprofitable
C-0001008     30    Female   Single           2                     Profitable
C-0001009     28    Female   Married          1                     Unprofitable
C-0001010     26    Female   Married          0                     Unprofitable
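A decision tree classifies records by repeatedly partitioning the data on attribute values until each branch is (nearly) pure in the target class. As a minimal sketch in plain Python (the helper `split_counts` and the `CUSTOMERS` list are illustrative names, not part of the slides), one such split on the customer table partitions the rows by Marital Status and counts Profitable and Unprofitable customers in each branch:

```python
from collections import Counter, defaultdict

# The ten customers from the table: (customer_id, age, gender,
# marital_status, n_credit_cards, profitability). Illustrative encoding.
CUSTOMERS = [
    ("C-0001000", 36, "Male", "Married", 1, "Profitable"),
    ("C-0001002", 32, "Male", "Single", 3, "Unprofitable"),
    ("C-0001003", 38, "Male", "Married", 2, "Profitable"),
    ("C-0001004", 40, "Male", "Single", 1, "Unprofitable"),
    ("C-0001005", 44, "Male", "Married", 0, "Profitable"),
    ("C-0001006", 56, "Female", "Married", 0, "Profitable"),
    ("C-0001007", 58, "Female", "Single", 1, "Unprofitable"),
    ("C-0001008", 30, "Female", "Single", 2, "Profitable"),
    ("C-0001009", 28, "Female", "Married", 1, "Unprofitable"),
    ("C-0001010", 26, "Female", "Married", 0, "Unprofitable"),
]

MARITAL_STATUS = 3  # column index of marital_status in each tuple


def split_counts(rows, column):
    """Partition rows on one attribute and count the class labels in each branch."""
    branches = defaultdict(Counter)
    for row in rows:
        branches[row[column]][row[-1]] += 1  # last field is the class label
    return {value: dict(counts) for value, counts in branches.items()}


# Married: 4 Profitable, 2 Unprofitable; Single: 1 Profitable, 3 Unprofitable
print(split_counts(CUSTOMERS, MARITAL_STATUS))
```

Neither branch is pure, so Marital Status separates the classes only partially; a tree would keep splitting each branch on the remaining attributes.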
Decision Tree Classification

age           income   student   credit_rating   buys_product
Youth         high     no        fair            no
Youth         high     no        excellent       no
Middle_Aged   high     no        fair            yes
Senior        medium   no        fair            yes
Senior        low      yes       fair            yes
Senior        low      yes       excellent       no
Middle_Aged   low      yes       excellent       yes
Youth         medium   no        fair            no
Youth         low      yes       fair            yes
Senior        medium   yes       fair            yes
Youth         medium   yes       excellent       yes
Middle_Aged   medium   no        excellent       yes
Middle_Aged   high     yes       fair            yes
Senior        medium   no        excellent       no

Decision tree for this table (counts are buys_product = yes / no; fractions give each branch's share of its parent's tuples):

Customer (14 tuples: 9 yes, 5 no)
    Age = Youth (5/14): 2 yes, 3 no
        Student = Y (2/5): 2 yes, 0 no
        Student = N (3/5): 0 yes, 3 no
    Age = Middle-aged (4/14): 4 yes, 0 no
    Age = Senior (5/14): 3 yes, 2 no
        credit_rating = fair (3/5): 3 yes, 0 no
        credit_rating = excellent (2/5): 0 yes, 2 no
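The tree can be read as a set of nested if/else rules. A minimal sketch, assuming the attribute values are spelled exactly as in the table (`predict`, `RAW`, and `DATA` are hypothetical names), that encodes the tree and checks it against the 14 training tuples:

```python
# The 14 training tuples: age, income, student, credit_rating, buys_product
RAW = """
Youth high no fair no
Youth high no excellent no
Middle_Aged high no fair yes
Senior medium no fair yes
Senior low yes fair yes
Senior low yes excellent no
Middle_Aged low yes excellent yes
Youth medium no fair no
Youth low yes fair yes
Senior medium yes fair yes
Youth medium yes excellent yes
Middle_Aged medium no excellent yes
Middle_Aged high yes fair yes
Senior medium no excellent no
"""
DATA = [tuple(line.split()) for line in RAW.strip().splitlines()]


def predict(age, student, credit_rating):
    """Follow the tree: split on age, then on student (Youth) or credit_rating (Senior)."""
    if age == "Middle_Aged":
        return "yes"  # pure leaf: 4 yes, 0 no
    if age == "Youth":
        return "yes" if student == "yes" else "no"
    if age == "Senior":
        return "yes" if credit_rating == "fair" else "no"
    raise ValueError(f"age value not seen in training data: {age!r}")


correct = sum(
    predict(age, student, credit) == label
    for age, _income, student, credit, label in DATA
)
print(f"{correct}/{len(DATA)} training tuples classified correctly")  # 14/14
```

Income never appears in the tree, so it is ignored at prediction time. Every leaf is pure, which is why all 14 training tuples are reproduced.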
Decision Tree Classification: Splitting Criteria

The three splitting criteria, each with the classic algorithm that uses it:
Information Gain (ID3)
Gain Ratio (C4.5)
Gini Index (CART)
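Gain ratio, the C4.5 criterion, normalises information gain by the split information $\mathrm{SplitInfo}_A(D) = -\sum_{j} \frac{|D_j|}{|D|}\log_2\frac{|D_j|}{|D|}$, which penalises attributes with many distinct values. A minimal sketch under that standard definition, using the 14-tuple buys_product table (`gain_ratio`, `entropy`, and `RAW` are illustrative names):

```python
import math
from collections import Counter, defaultdict

COLUMNS = ("age", "income", "student", "credit_rating")
RAW = """
Youth high no fair no
Youth high no excellent no
Middle_Aged high no fair yes
Senior medium no fair yes
Senior low yes fair yes
Senior low yes excellent no
Middle_Aged low yes excellent yes
Youth medium no fair no
Youth low yes fair yes
Senior medium yes fair yes
Youth medium yes excellent yes
Middle_Aged medium no excellent yes
Middle_Aged high yes fair yes
Senior medium no excellent no
"""
DATA = [tuple(line.split()) for line in RAW.strip().splitlines()]


def entropy(values):
    """Entropy in bits of a list of labels (or attribute values)."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, col):
    """C4.5-style gain ratio: Gain(D_A) / SplitInfo_A(D)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[col]].append(row[-1])  # last field is buys_product
    weighted = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    gain = entropy([row[-1] for row in rows]) - weighted
    # SplitInfo_A(D) is the entropy of the attribute's own value distribution.
    split_info = entropy([row[col] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


ratios = {name: gain_ratio(DATA, i) for i, name in enumerate(COLUMNS)}
for name, ratio in ratios.items():
    print(f"{name:14s} {ratio:.3f}")
print("best by gain ratio:", max(ratios, key=ratios.get))
```

On this data age scores about 0.156 and student about 0.152, so age is still selected, but by a much narrower margin than under plain information gain: age has three values (split information about 1.577) while student has two (split information 1.0).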


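Decision Tree Classification: Entropy

Entropy is a measure of the randomness, or unpredictability, of the class labels in a dataset. It is 0 when every tuple belongs to the same class and reaches its maximum when all classes are equally likely (1 bit for two classes).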
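A minimal sketch of entropy in bits, $-\sum_i p_i \log_2 p_i$, computed from class counts (the function name `entropy` is illustrative). With the 9 yes and 5 no labels of buys_product in the table, it gives the 0.940 used for Entropy(D):

```python
import math


def entropy(class_counts):
    """Entropy in bits: -sum(p_i * log2(p_i)) over classes with a nonzero count."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count:  # 0 * log2(0) is taken as 0
            p = count / total
            result -= p * math.log2(p)
    return result


print(round(entropy([9, 5]), 3))  # 0.94 (9 yes / 5 no in buys_product)
print(entropy([4, 0]))            # 0.0  (a pure node)
print(entropy([7, 7]))            # 1.0  (an even two-class split)
```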
Decision Tree Classification: Information Gain

$$\mathrm{Gain}(D_A) = \mathrm{Entropy}(D) - \mathrm{Entropy}(D_A)$$

$$\mathrm{Entropy}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$$

$$\mathrm{Entropy}(D_A) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Entropy}(D_j)$$

Here $p_i$ is the fraction of tuples in $D$ that belong to class $i$ (of $m$ classes), and $D_1, \dots, D_v$ are the partitions of $D$ produced by the $v$ distinct values of attribute $A$.
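Applying these formulas to the buys_product table: $D$ has 9 yes and 5 no tuples, and age splits it into Youth (2 yes, 3 no), Middle_Aged (4 yes, 0 no, so entropy 0), and Senior (3 yes, 2 no). A worked derivation of the age figures:

```latex
\begin{aligned}
\mathrm{Entropy}(D) &= -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} = 0.940 \\
\mathrm{Entropy}(D_{\text{age}}) &= \tfrac{5}{14}\Bigl(-\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5}\Bigr)
  + \tfrac{4}{14}(0)
  + \tfrac{5}{14}\Bigl(-\tfrac{3}{5}\log_2\tfrac{3}{5} - \tfrac{2}{5}\log_2\tfrac{2}{5}\Bigr) \\
&= \tfrac{5}{14}(0.971) + \tfrac{4}{14}(0) + \tfrac{5}{14}(0.971) = 0.694 \\
\mathrm{Gain}(D_{\text{age}}) &= 0.940 - 0.694 = 0.246
\end{aligned}
```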
Decision Tree Classification: Information Gain

$$\mathrm{Gain}(D_{\text{age}}) = \mathrm{Entropy}(D) - \mathrm{Entropy}(D_{\text{age}}) = 0.940 - 0.694 = 0.246$$

$$\mathrm{Gain}(D_{\text{income}}) = \mathrm{Entropy}(D) - \mathrm{Entropy}(D_{\text{income}}) = 0.940 - 0.911 = 0.029$$

$$\mathrm{Gain}(D_{\text{student}}) = \mathrm{Entropy}(D) - \mathrm{Entropy}(D_{\text{student}}) = 0.940 - 0.789 = 0.151$$

$$\mathrm{Gain}(D_{\text{credit\_rating}}) = \mathrm{Entropy}(D) - \mathrm{Entropy}(D_{\text{credit\_rating}}) = 0.940 - 0.892 = 0.048$$

Age has the highest information gain, so it is chosen as the splitting attribute at the root.
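The four gains can be recomputed from the raw table with a short sketch in plain Python (`info_gain`, `entropy`, and `RAW` are illustrative names). It picks the attribute with the largest gain as the root split:

```python
import math
from collections import Counter, defaultdict

COLUMNS = ("age", "income", "student", "credit_rating")
RAW = """
Youth high no fair no
Youth high no excellent no
Middle_Aged high no fair yes
Senior medium no fair yes
Senior low yes fair yes
Senior low yes excellent no
Middle_Aged low yes excellent yes
Youth medium no fair no
Youth low yes fair yes
Senior medium yes fair yes
Youth medium yes excellent yes
Middle_Aged medium no excellent yes
Middle_Aged high yes fair yes
Senior medium no excellent no
"""
DATA = [tuple(line.split()) for line in RAW.strip().splitlines()]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(rows, col):
    """Gain(D_A) = Entropy(D) - weighted entropy of the partitions induced by column `col`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[col]].append(row[-1])  # last field is buys_product
    weighted = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy([row[-1] for row in rows]) - weighted


gains = {name: info_gain(DATA, i) for i, name in enumerate(COLUMNS)}
for name, gain in gains.items():
    print(f"{name:14s} {gain:.3f}")
best = max(gains, key=gains.get)
print("root split:", best)
```

Without intermediate rounding this reports 0.247 for age and 0.152 for student; the figures 0.246 and 0.151 come from subtracting entropies already rounded to three decimals. The ranking, and the choice of age, is unchanged.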
Decision Tree Classification: Gini Index

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^2$$

$$\mathrm{Gini}_A(D) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$

Here $p_i$ is the fraction of tuples in $D$ belonging to class $i$, and $D_1$, $D_2$ are the two partitions produced by a binary split on attribute $A$.
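Worked on the buys_product table: $D$ has 9 yes and 5 no tuples; student = yes covers 7 tuples (6 yes, 1 no) and student = no covers 7 (3 yes, 4 no):

```latex
\begin{aligned}
\mathrm{Gini}(D) &= 1 - \left(\tfrac{9}{14}\right)^2 - \left(\tfrac{5}{14}\right)^2 = 0.459 \\
\mathrm{Gini}_{\text{student}}(D) &= \tfrac{7}{14}\left[1 - \left(\tfrac{6}{7}\right)^2 - \left(\tfrac{1}{7}\right)^2\right]
  + \tfrac{7}{14}\left[1 - \left(\tfrac{3}{7}\right)^2 - \left(\tfrac{4}{7}\right)^2\right] \\
&= \tfrac{7}{14}(0.245) + \tfrac{7}{14}(0.490) = 0.367
\end{aligned}
```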
Decision Tree Classification: Gini Index

$$\Delta\mathrm{Gini}(A) = \mathrm{Gini}(D) - \mathrm{Gini}_A(D)$$

$$\Delta\mathrm{Gini}(\text{age}) = 0.459 - 0.357 = 0.102 \quad \text{split: } \{\text{youth, senior}\} \text{ or } \{\text{middle\_aged}\}$$

$$\Delta\mathrm{Gini}(\text{income}) = 0.459 - 0.443 = 0.016 \quad \text{split: } \{\text{low, medium}\} \text{ or } \{\text{high}\}$$

$$\Delta\mathrm{Gini}(\text{student}) = 0.459 - 0.367 = 0.092$$

$$\Delta\mathrm{Gini}(\text{credit\_rating}) = 0.459 - 0.429 = 0.030$$

For age, the {youth, senior} partition holds 10 tuples (5 yes, 5 no; Gini 0.5) and {middle_aged} holds 4 (all yes; Gini 0), so $\mathrm{Gini}_{\text{age}}(D) = \tfrac{10}{14}(0.5) + \tfrac{4}{14}(0) = 0.357$. Age therefore gives the largest reduction in impurity and is chosen as the splitting attribute.
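CART uses binary splits only, so for a multi-valued attribute every way of dividing its values into two groups has to be tried. A sketch in plain Python (`best_binary_split`, `gini`, and `RAW` are illustrative names) that enumerates those groupings for each attribute and keeps the one with the lowest $\mathrm{Gini}_A(D)$:

```python
import itertools
from collections import Counter

COLUMNS = ("age", "income", "student", "credit_rating")
RAW = """
Youth high no fair no
Youth high no excellent no
Middle_Aged high no fair yes
Senior medium no fair yes
Senior low yes fair yes
Senior low yes excellent no
Middle_Aged low yes excellent yes
Youth medium no fair no
Youth low yes fair yes
Senior medium yes fair yes
Youth medium yes excellent yes
Middle_Aged medium no excellent yes
Middle_Aged high yes fair yes
Senior medium no excellent no
"""
DATA = [tuple(line.split()) for line in RAW.strip().splitlines()]


def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_binary_split(rows, col):
    """Try every two-way grouping of column `col`'s values; return (lowest Gini_A(D), subset)."""
    values = sorted({row[col] for row in rows})
    best = None
    # Each grouping appears twice (a subset and its complement); both give the same score.
    for size in range(1, len(values)):
        for subset in itertools.combinations(values, size):
            left = [row[-1] for row in rows if row[col] in subset]
            right = [row[-1] for row in rows if row[col] not in subset]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or weighted < best[0]:
                best = (weighted, set(subset))
    return best


gini_d = gini([row[-1] for row in DATA])
for i, name in enumerate(COLUMNS):
    weighted, subset = best_binary_split(DATA, i)
    print(f"{name:14s} Gini_A(D)={weighted:.3f}  reduction={gini_d - weighted:.3f}  one side={sorted(subset)}")
```

It reports $\mathrm{Gini}_A(D)$ of about 0.357 for age ({middle_aged} against {youth, senior}), 0.443 for income ({high} against {low, medium}), 0.367 for student, and 0.429 for credit_rating. Unrounded, the credit_rating reduction is 0.031 rather than 0.030. The number of groupings grows exponentially with the number of distinct values, which is harmless for three-valued attributes like these.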
Case Study Competition
Your client is a large MNC with 9 broad verticals across the organisation. One of the problems your client faces is identifying the right people for promotion (only for the manager position and below) and preparing them in time. The current process is as follows:

They first identify a set of employees based on recommendations/past performance.

The selected employees go through a separate training and evaluation program for each vertical. These programs are based on the skills required in each vertical.

At the end of the program, employees are promoted based on various factors, such as training performance and KPI completion (only employees who have completed more than 60% of their KPIs are considered).

Under this process, final promotions are announced only after the evaluation, which delays the employees' transition to their new roles. Hence, the company wants to design a model that helps identify eligible candidates at a particular checkpoint, so that it can expedite the entire promotion cycle.
