
Q3: Use ID3

The dataset has two categorical classes (SENIOR, JUNIOR), where 4 out of 8 records are "SENIOR" and 4 out of 8 are "JUNIOR".
The complete entropy of the dataset is H(S) = -P(SENIOR)*log2(P(SENIOR)) - P(JUNIOR)*log2(P(JUNIOR)).
H(S) = -4/8*log2(4/8) - 4/8*log2(4/8) = 1/2 + 1/2 = 1
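As a quick sanity check, a minimal Python sketch of the entropy formula (the `entropy` helper is hypothetical, not part of the question):

```python
from math import log2

def entropy(counts):
    """Shannon entropy of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

print(entropy([4, 4]))  # 4 SENIOR vs 4 JUNIOR -> 1.0
```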
For each attribute of the dataset:
The first attribute is Department, with categorical values Sales, System, Marketing.
H(D=Sales) = -1/3*log2(1/3) - 2/3*log2(2/3) = 0.918
H(D=System) = -2/4*log2(2/4) - 2/4*log2(2/4) = 1
H(D=Marketing) = -1*log2(1) = 0
Average entropy information for Department:
I(Department) = 3/8*0.918 + 4/8*1 + 1/8*0 = 0.844

Information Gain = H(S) - I(Department) = 1 - 0.844 = 0.156

=========================================================
Second attribute: Age
(31-35: 3 records | 26-30: 2 | 21-25: 1 | 41-45: 1 | 36-40: 1)
1- H(Age=31-35) = -2/3*log2(2/3) - 1/3*log2(1/3) = 0.918
2- H(Age=26-30) = -2/2*log2(2/2) - 0 = 0
3- H(Age=21-25) = 0
4- H(Age=41-45) = 0
5- H(Age=36-40) = 0

Average entropy information for Age:
I(Age) = 3/8*0.918 + 0 + 0 + 0 + 0 = 0.344
Information Gain = H(S) - I(Age) = 1 - 0.344 = 0.656

=========================================================
Third attribute: Salary
(46-50K: 4 records | 26-30K: 1 | 31-35K: 1 | 66-70K: 2)
1- H(Salary=46-50K) = -2/4*log2(2/4) - 2/4*log2(2/4) = 1
2- H(Salary=26-30K) = -1*log2(1) - 0 = 0
3- H(Salary=31-35K) = -1*log2(1) = 0
4- H(Salary=66-70K) = -2/2*log2(2/2) = 0
Average entropy information for Salary:
I(Salary) = 4/8*1 + 0 + 0 + 0 = 0.5
Information Gain = H(S) - I(Salary) = 1 - 0.5 = 0.5
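All three gains can be double-checked with a short Python sketch. The eight records below are reconstructed from the counts used above (an assumption about the exact rows; the working only gives the counts):

```python
from math import log2

# Eight records reconstructed from the counts above (an assumption):
# (department, age, salary, class)
DATA = [
    ("Sales",     "31-35", "46-50K", "SENIOR"),
    ("Sales",     "26-30", "26-30K", "JUNIOR"),
    ("Sales",     "31-35", "31-35K", "JUNIOR"),
    ("System",    "21-25", "46-50K", "JUNIOR"),
    ("System",    "31-35", "66-70K", "SENIOR"),
    ("System",    "26-30", "46-50K", "JUNIOR"),
    ("System",    "41-45", "66-70K", "SENIOR"),
    ("Marketing", "36-40", "46-50K", "SENIOR"),
]

def entropy(rows):
    """Entropy of the class column (last field) of a list of records."""
    h = 0.0
    for label in set(r[-1] for r in rows):
        p = sum(1 for r in rows if r[-1] == label) / len(rows)
        h -= p * log2(p)
    return h

def info_gain(rows, col):
    """H(S) minus the weighted average entropy of the split on column col."""
    avg = 0.0
    for value in set(r[col] for r in rows):
        subset = [r for r in rows if r[col] == value]
        avg += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - avg

for name, col in [("Department", 0), ("Age", 1), ("Salary", 2)]:
    print(name, round(info_gain(DATA, col), 3))
# Department 0.156, Age 0.656, Salary 0.5
```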
=========================================================

For drawing the tree, we choose the attribute with the highest information gain: the second attribute, Age, with information gain = 0.656.

Age
|-- 21-25 -> JUNIOR
|-- 26-30 -> JUNIOR
|-- 31-35 -> ?
|-- 41-45 -> SENIOR
|-- 36-40 -> SENIOR

Again, at the second level:

Complete entropy of the 31-35 branch:
H(31-35) = -2/3*log2(2/3) - 1/3*log2(1/3) = 0.918
=========================================================

First attribute: Department, with categorical values Sales, System in this branch

1- H(31-35, Dep=Sales) = -1/2*log2(1/2) - 1/2*log2(1/2) = 1
2- H(31-35, Dep=System) = -1*log2(1) = 0
Average entropy information for Department:
I(31-35, Department) = P(Sales)*H(31-35, Dep=Sales) + P(System)*H(31-35, Dep=System) = 2/3*1 + 1/3*0 = 0.6667

Information Gain = H(31-35) - I(31-35, Department) = 0.9183 - 0.6667 = 0.2516 ≈ 0.252

=========================================================

Second attribute: Salary, with values 46-50K, 31-35K, 66-70K in this branch
1- H(31-35, Sal=46-50K) = -1/1*log2(1/1) = 0
2- H(31-35, Sal=31-35K) = 0
3- H(31-35, Sal=66-70K) = 0
Average entropy information for Salary:
I(31-35, Salary) = 0 + 0 + 0 = 0
Information Gain = H(31-35) - I(31-35, Salary) = 0.918 - 0 = 0.918
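The branch-level numbers can be checked the same way. A compact, self-contained sketch using only the three records of the 31-35 branch (reconstructed, as before):

```python
from math import log2

# The three records in the age 31-35 branch (reconstructed, as before):
# (department, salary, class)
BRANCH = [
    ("Sales",  "46-50K", "SENIOR"),
    ("Sales",  "31-35K", "JUNIOR"),
    ("System", "66-70K", "SENIOR"),
]

def entropy(rows):
    """Entropy of the class column (last field)."""
    h = 0.0
    for label in set(r[-1] for r in rows):
        p = sum(1 for r in rows if r[-1] == label) / len(rows)
        h -= p * log2(p)
    return h

def info_gain(rows, col):
    """Entropy of rows minus the weighted entropy of the split on col."""
    avg = 0.0
    for value in set(r[col] for r in rows):
        subset = [r for r in rows if r[col] == value]
        avg += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - avg

print(round(entropy(BRANCH), 3))       # 0.918
print(round(info_gain(BRANCH, 0), 3))  # Department: 0.252
print(round(info_gain(BRANCH, 1), 3))  # Salary: 0.918
```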

We choose the attribute with the highest information gain: the Salary attribute.
Age
|-- 21-25 -> JUNIOR
|-- 26-30 -> JUNIOR
|-- 31-35 -> Salary
|              |-- 46-50K -> SENIOR
|              |-- 31-35K -> JUNIOR
|              |-- 66-70K -> SENIOR
|-- 41-45 -> SENIOR
|-- 36-40 -> SENIOR
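The finished tree can also be written down programmatically. A small sketch with a hypothetical `classify` walker, just to illustrate the structure:

```python
# The finished tree: internal nodes are (attribute, branches) pairs,
# leaves are class labels.
TREE = ("Age", {
    "21-25": "JUNIOR",
    "26-30": "JUNIOR",
    "36-40": "SENIOR",
    "41-45": "SENIOR",
    "31-35": ("Salary", {
        "46-50K": "SENIOR",
        "31-35K": "JUNIOR",
        "66-70K": "SENIOR",
    }),
})

def classify(node, record):
    """Walk the tree; record maps attribute name -> value."""
    while not isinstance(node, str):
        attr, branches = node
        node = branches[record[attr]]
    return node

print(classify(TREE, {"Age": "31-35", "Salary": "31-35K"}))  # JUNIOR
```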

=====================================================================

Q3 ----- B
X = (Department = System, Age = 26-30, Salary = 46-50K)
Senior = 4 records, Junior = 4 records, so P(Senior) = P(Junior) = 4/8 = 0.5

- P(System | Senior) = 2/4 = 0.5
- P(26-30 | Senior) = 0/4 = 0
- P(46-50K | Senior) = 2/4 = 0.5
- P(X | Senior) = 0.5 * 0 * 0.5 = 0

- P(System | Junior) = 2/4 = 0.5
- P(26-30 | Junior) = 2/4 = 0.5
- P(46-50K | Junior) = 2/4 = 0.5
- P(X | Junior) = 0.5 * 0.5 * 0.5 = 0.125


- Since the priors are equal, P(X | Junior)*P(Junior) = 0.125*0.5 = 0.0625 > P(X | Senior)*P(Senior) = 0, so X is classified as JUNIOR.
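For completeness, a minimal naive Bayes sketch over the same eight reconstructed records (the same assumption as in the ID3 sketch) reproduces these numbers:

```python
# Naive Bayes over the eight reconstructed records (an assumption):
# (department, age, salary, class)
DATA = [
    ("Sales",     "31-35", "46-50K", "SENIOR"),
    ("Sales",     "26-30", "26-30K", "JUNIOR"),
    ("Sales",     "31-35", "31-35K", "JUNIOR"),
    ("System",    "21-25", "46-50K", "JUNIOR"),
    ("System",    "31-35", "66-70K", "SENIOR"),
    ("System",    "26-30", "46-50K", "JUNIOR"),
    ("System",    "41-45", "66-70K", "SENIOR"),
    ("Marketing", "36-40", "46-50K", "SENIOR"),
]

def score(label, x):
    """P(X | label) * P(label) under the naive independence assumption."""
    rows = [r for r in DATA if r[-1] == label]
    p = len(rows) / len(DATA)                          # prior
    for col, value in enumerate(x):
        p *= sum(1 for r in rows if r[col] == value) / len(rows)
    return p

x = ("System", "26-30", "46-50K")
for label in ("SENIOR", "JUNIOR"):
    print(label, score(label, x))
# SENIOR 0.0, JUNIOR 0.0625  ->  X is classified as JUNIOR
```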
