
Bayes decision rule

If we know the conditional probability P(X | Y), we can determine the appropriate class by using Bayes' rule:
$$P(y_i \mid x) = \frac{P(x \mid y_i)\,P(y_i)}{P(x)} \overset{\text{def}}{=} q_i(x)$$
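As a minimal sketch of the rule above, the snippet below computes q_i(x) for every class from the class-conditional likelihoods and the priors, then picks the class with the largest posterior. The function name `posterior` and all numbers are hypothetical and chosen only for illustration; P(x) is obtained by summing P(x | y_i) P(y_i) over the classes.

```python
def posterior(likelihoods, priors):
    """Return q_i(x) = P(y_i | x) for every class i via Bayes' rule.

    likelihoods[i] = P(x | y_i), priors[i] = P(y_i).
    """
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # P(x) = sum_i P(x | y_i) P(y_i)
    return [j / evidence for j in joint]


# Hypothetical likelihoods and priors, for illustration only.
q = posterior(likelihoods=[0.02, 0.08], priors=[0.75, 0.25])

# Choose the class whose posterior q_i(x) is largest.
best_class = max(range(len(q)), key=q.__getitem__)
print(q, best_class)
```

Because P(x) is the same for every class, the chosen class would not change if the division by `evidence` were skipped; it is kept here so that the values are proper probabilities.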

But how do we determine p(X|Y)?

Computing p(x|y)
Consider a dataset with 16 attributes (let's assume they are all binary). How many values do we need to know to fully determine p(x|y)?
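One way to answer the question: for each class, p(x|y) is a distribution over all 2^16 joint attribute configurations, and since the probabilities must sum to one, 2^16 - 1 of them are free. The sketch below does this arithmetic; the helper name `full_table_size` is hypothetical, and the default of two classes is an assumption (a binary class label).

```python
def full_table_size(n_binary_attrs, n_classes=2):
    """Free values needed to specify p(x | y) for every class y."""
    configs = 2 ** n_binary_attrs   # joint settings of the binary attributes
    free_per_class = configs - 1    # probabilities sum to 1, so one is implied
    return free_per_class * n_classes


print(full_table_size(16, n_classes=1))  # values per class
print(full_table_size(16))               # values for two classes (assumed)
```

The count doubles with every additional binary attribute, which is why the full table quickly becomes impractical to estimate.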

age  employment        education     edunum  marital                job                relation       race         gender  hours_worked  country         wealth
39   State_gov         Bachelors     13      Never_married          Adm_clerical       Not_in_family  White        Male    40            United_States   poor
51   Self_emp_not_inc  Bachelors     13      Married                Exec_managerial    Husband        White        Male    13            United_States   poor
39   Private           HS_grad       9       Divorced               Handlers_cleaners  Not_in_family  White        Male    40            United_States   poor
54   Private           11th          7       Married                Handlers_cleaners  Husband        Black        Male    40            United_States   poor
28   Private           Bachelors     13      Married                Prof_specialty     Wife           Black        Female  40            Cuba            poor
38   Private           Masters       14      Married                Exec_managerial    Wife           White        Female  40            United_States   poor
50   Private           9th           5       Married_spouse_absent  Other_service      Not_in_family  Black        Female  16            Jamaica         poor
52   Self_emp_not_inc  HS_grad       9       Married                Exec_managerial    Husband        White        Male    45            United_States   rich
31   Private           Masters       14      Never_married          Prof_specialty     Not_in_family  White        Female  50            United_States   rich
42   Private           Bachelors     13      Married                Exec_managerial    Husband        White        Male    40            United_States   rich
37   Private           Some_college  10      Married                Exec_managerial    Husband        Black        Male    80            United_States   rich
30   State_gov         Bachelors     13      Married                Prof_specialty     Husband        Asian        Male    40            India           rich
24   Private           Bachelors     13      Never_married          Adm_clerical       Own_child      White        Female  30            United_States   poor
33   Private           Assoc_acdm    12      Never_married          Sales              Not_in_family  Black        Male    50            United_States   poor
41   Private           Assoc_voc     11      Married                Craft_repair       Husband        Asian        Male    40            *MissingValue*  rich
34   Private           7th_8th       4       Married                Transport_moving   Husband        Amer_Indian  Male    45            Mexico          poor
26   Self_emp_not_inc  HS_grad       9       Never_married          Farming_fishing    Own_child      White        Male    35            United_States   poor
33   Private           HS_grad       9       Never_married          Machine_op_inspct  Unmarried      White        Male    40            United_States   poor
38   Private           11th          7       Married                Sales              Husband        White        Male    50            United_States   poor
44   Self_emp_not_inc  Masters       14      Divorced               Exec_managerial    Unmarried      White        Female  45            United_States   rich
41   Private           Doctorate     16      Married                Prof_specialty     Husband        White        Male    60            United_States   rich
:    :                 :             :       :                      :                  :              :            :       :             :               :

Learning the values for the full conditional probability table would require enormous amounts of data.
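To get a rough feel for how much data the full conditional probability table demands, the sketch below draws random records over 16 binary attributes and measures what fraction of the 2^16 possible configurations appears at least once. It assumes every attribute value is uniformly random and independent, which real data is not, so treat it as an optimistic illustration rather than a property of the dataset above. The function name `coverage` and the sample sizes are hypothetical.

```python
import random


def coverage(n_samples, n_attrs, seed=0):
    """Fraction of the 2**n_attrs configurations seen at least once."""
    rng = random.Random(seed)
    # Each record is encoded as an n_attrs-bit integer, one bit per attribute.
    seen = {rng.getrandbits(n_attrs) for _ in range(n_samples)}
    return len(seen) / 2 ** n_attrs


for n in (100, 1_000, 100_000):
    print(n, round(coverage(n, 16), 4))
```

A configuration that never appears gets no estimate at all from counting, and one that appears only once or twice gets a very unreliable one, so covering the table well requires many times more records than there are cells.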
