A,D,K,M,Z CS 4870: Homework #4
Problem 1
a. From the question:
\[ P(R) = \frac{1}{2}, \qquad P(B) = \frac{1}{2}, \qquad P(H \mid R) = \frac{3}{5}, \qquad P(H \mid B) = \frac{7}{10} \]
By the law of total probability:
\[ P(H) = P(H \mid R)\,P(R) + P(H \mid B)\,P(B) = \frac{3}{5} \cdot \frac{1}{2} + \frac{7}{10} \cdot \frac{1}{2} = \frac{6}{20} + \frac{7}{20} = \frac{13}{20} \]
By Bayes' rule:
\[ P(R \mid H) = \frac{P(H \mid R)\,P(R)}{P(H)} = \frac{3}{5} \cdot \frac{1}{2} \cdot \frac{20}{13} = \frac{6}{13} \]
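As a sanity check, the total-probability and Bayes'-rule computations above can be reproduced with exact rational arithmetic (a quick sketch, not part of the original solution; variable names are ours):

```python
from fractions import Fraction as F

# Priors and likelihoods from part (a)
p_R, p_B = F(1, 2), F(1, 2)   # hat colors
p_H_given_R = F(3, 5)         # P(heads | red hat)
p_H_given_B = F(7, 10)        # P(heads | blue hat)

# Law of total probability
p_H = p_H_given_R * p_R + p_H_given_B * p_B

# Bayes' rule
p_R_given_H = p_H_given_R * p_R / p_H

print(p_H)          # 13/20
print(p_R_given_H)  # 6/13
```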
b. We make the naive Bayes assumption. By Bayes' rule (notice the $\frac{1}{2}$ prior terms cancel):
\[ P(HHTH \mid R) = \frac{3}{5} \cdot \frac{3}{10} \cdot \frac{1}{2} \cdot \frac{4}{5} = \frac{9}{125} \]
\[ P(HHTH \mid B) = \frac{7}{10} \cdot \frac{1}{5} \cdot \frac{9}{10} \cdot \frac{2}{5} = \frac{63}{1250} \]
\[ P(R \mid HHTH) = \frac{P(HHTH \mid R)}{P(HHTH \mid R) + P(HHTH \mid B)} = \frac{10}{17} \]
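The naive Bayes product and posterior above can be verified the same way. Each entry below is the probability of the observed outcome of that flip given the hat color, read off from the computation above (a sketch only):

```python
from fractions import Fraction as F
from math import prod

# Per-flip factors for the observed sequence HHTH, given each hat color
lik_R = prod([F(3, 5), F(3, 10), F(1, 2), F(4, 5)])   # P(HHTH | R)
lik_B = prod([F(7, 10), F(1, 5), F(9, 10), F(2, 5)])  # P(HHTH | B)

# Equal priors (1/2 each) cancel in the posterior
post_R = lik_R / (lik_R + lik_B)

print(lik_R, lik_B, post_R)  # 9/125 63/1250 10/17
```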
c. After examining the data table (probabilities are $P(\text{[coin] is heads} \mid \text{hat color})$):
\[ P(HHTH \mid R) = \frac{3}{4} \cdot \frac{7}{8} \cdot \frac{1}{2} \cdot \frac{1}{8} = \frac{21}{512} \]
\[ P(HHTH \mid B) = \frac{1}{10} \cdot \frac{3}{10} \cdot \frac{1}{10} \cdot \frac{4}{10} = \frac{3}{2500} \]
d. $X$ = the data (heads/tails) and $y$ = the hat color (red/blue). The naive Bayes assumption holds because the different coins are physically independent of each other; hence the features can reasonably be assumed to be conditionally independent given the hat color.
Problem 2
a.
\[ P(\text{Ham} \mid 0,0,1) = \frac{P(0,0,1 \mid \text{Ham})\,P(\text{Ham})}{P(0,0,1)} = \frac{0}{5} = 0 \]
b.
• Collecting more emails would help with our predictions because a larger data sample would give us more realistic probability estimates.
• Extracting more features for each email would allow us to classify each email more accurately.
• Duplicating emails with uncommon features would not help; it changes the distribution of the emails.
• Making stronger assumptions is helpful: assuming our features are independent of each other would be more realistic for our data.
c.
\[ P(1,0,1 \mid \text{Ham}) = P(\text{bacon}=1 \mid \text{Ham})\,P(\text{ip}=0 \mid \text{Ham})\,P(\text{mispell}=1 \mid \text{Ham}) = 1 \cdot \frac{2}{5} \cdot \frac{3}{5} = \frac{6}{25} \]
d.
\[ P(\text{bacon}=1 \mid \text{Spam}) = \frac{1}{10}, \qquad P(\text{ip}=1 \mid \text{Spam}) = \frac{3}{10}, \qquad P(\text{mispell}=1 \mid \text{Spam}) = \frac{7}{10} \]
\[ P(\text{bacon}=1 \mid \text{Ham}) = \frac{5}{5}, \qquad P(\text{ip}=1 \mid \text{Ham}) = \frac{3}{5}, \qquad P(\text{mispell}=1 \mid \text{Ham}) = \frac{3}{5} \]
\[ P(\text{Spam}) = \frac{2}{3}, \qquad P(\text{Ham}) = \frac{1}{3} \]
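The part (c) likelihood $P(1,0,1 \mid \text{Ham}) = 1 \cdot \frac{2}{5} \cdot \frac{3}{5} = \frac{6}{25}$ can be recomputed directly from these maximum-likelihood estimates (a sketch with hypothetical variable names; exact rational arithmetic via `fractions`):

```python
from fractions import Fraction as F

# P(feature = 1 | Ham), maximum-likelihood estimates from part (d)
ham = {"bacon": F(5, 5), "ip": F(3, 5), "mispell": F(3, 5)}

# P(bacon=1, ip=0, mispell=1 | Ham) under the naive Bayes assumption;
# the ip=0 factor is the complement 1 - P(ip=1 | Ham)
p = ham["bacon"] * (1 - ham["ip"]) * ham["mispell"]
print(p)  # 6/25
```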
e.
\[ P(\text{bacon}=1 \mid \text{Spam}) = \frac{5}{18}, \qquad P(\text{ip}=1 \mid \text{Spam}) = \frac{7}{18}, \qquad P(\text{mispell}=1 \mid \text{Spam}) = \frac{11}{18} \]
\[ P(\text{bacon}=1 \mid \text{Ham}) = \frac{9}{13}, \qquad P(\text{ip}=1 \mid \text{Ham}) = \frac{7}{13}, \qquad P(\text{mispell}=1 \mid \text{Ham}) = \frac{7}{13} \]
\[ P(\text{Spam}) = \frac{18}{31}, \qquad P(\text{Ham}) = \frac{13}{31} \]
\[ P(1,0,1 \mid \text{Ham}) = P(\text{bacon}=1 \mid \text{Ham})\,P(\text{ip}=0 \mid \text{Ham})\,P(\text{mispell}=1 \mid \text{Ham}) = \frac{9}{13} \cdot \frac{6}{13} \cdot \frac{7}{13} \approx 0.172 \]
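The smoothed value of $P(1,0,1 \mid \text{Ham})$ can likewise be verified with exact rational arithmetic (a sketch only; the dictionary holds the smoothed $P(\text{feature}=1 \mid \text{Ham})$ estimates from part (e)):

```python
from fractions import Fraction as F

# Smoothed P(feature = 1 | Ham) estimates from part (e)
ham = {"bacon": F(9, 13), "ip": F(7, 13), "mispell": F(7, 13)}

# P(bacon=1, ip=0, mispell=1 | Ham); ip=0 uses the complement 1 - 7/13 = 6/13
p = ham["bacon"] * (1 - ham["ip"]) * ham["mispell"]
print(p, float(p))  # 378/2197, about 0.172
```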
Problem 3
1.
\begin{align*}
p(y=1 \mid x)
&= \frac{\prod_{\alpha=1}^{d} p([x]_\alpha \mid y=1)\, p(y=1)}{p(x)} \\
&= \frac{\prod_{\alpha=1}^{d} p([x]_\alpha \mid y=1)\, p(y=1)}{p(x \mid y=1)\,p(y=1) + p(x \mid y=0)\,p(y=0)}
  && \text{(sum rule)} \\
&= \frac{\prod_{\alpha=1}^{d} p([x]_\alpha \mid y=1)\, p(y=1)}{\prod_{\alpha=1}^{d} p([x]_\alpha \mid y=1)\, p(y=1) + \prod_{\alpha=1}^{d} p([x]_\alpha \mid y=0)\, p(y=0)}
  && \text{(naive Bayes assumption, product rule)} \\
&= \frac{1}{1 + \exp\!\left(\log \dfrac{\prod_{\alpha=1}^{d} p([x]_\alpha \mid y=0)\, p(y=0)}{\prod_{\alpha=1}^{d} p([x]_\alpha \mid y=1)\, p(y=1)}\right)} \\
&= \frac{1}{1 + \exp\!\left(-\log \dfrac{\prod_{\alpha=1}^{d} p([x]_\alpha \mid y=1)\, p(y=1)}{\prod_{\alpha=1}^{d} p([x]_\alpha \mid y=0)\, p(y=0)}\right)}
\end{align*}
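The last two steps are just the algebraic identity $\frac{a}{a+b} = \frac{1}{1+\exp(-\log(a/b))}$, where $a$ and $b$ stand for the class-1 and class-0 joint scores $\prod_\alpha p([x]_\alpha \mid y)\,p(y)$. A quick numeric sanity check (the values of $a$ and $b$ are made up):

```python
import math

# a, b: any positive class-1 and class-0 joint scores
a, b = 0.036, 0.0504

posterior = a / (a + b)
sigmoid_form = 1.0 / (1.0 + math.exp(-math.log(a / b)))

# The two expressions agree up to floating-point rounding
print(abs(posterior - sigmoid_form) < 1e-12)  # True
```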
3. Define $\vec{w}$ and $b$ as follows:
\[ w_\alpha = [w]_\alpha = \frac{\mu_{\alpha 1} - \mu_{\alpha 0}}{\sigma_\alpha^2} \]
\[ b = \log \frac{p(y=1)}{p(y=0)} - \sum_{\alpha=1}^{d} \frac{\mu_{\alpha 1}^2 - \mu_{\alpha 0}^2}{2\sigma_\alpha^2} \]
Then, given that $p([x]_\alpha \mid y) = \frac{1}{\sqrt{2\pi}\,\sigma_\alpha} \exp\!\left(-\frac{(x_\alpha - \mu_{\alpha y})^2}{2\sigma_\alpha^2}\right)$,
\begin{align*}
h(\vec{x}) = 1
&\iff \frac{P(y=1 \mid \vec{x})}{P(y=0 \mid \vec{x})} > 1 \\
&\iff \frac{\prod_{\alpha=1}^{d} p([x]_\alpha \mid y=1)\, p(y=1)}{\prod_{\alpha=1}^{d} p([x]_\alpha \mid y=0)\, p(y=0)} > 1 \\
&\iff \log \frac{\prod_{\alpha=1}^{d} p([x]_\alpha \mid y=1)}{\prod_{\alpha=1}^{d} p([x]_\alpha \mid y=0)} + \log \frac{p(y=1)}{p(y=0)} > 0 \\
&\iff \log \frac{\prod_{\alpha=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_\alpha} \exp\!\left(-\frac{(x_\alpha - \mu_{\alpha 1})^2}{2\sigma_\alpha^2}\right)}{\prod_{\alpha=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_\alpha} \exp\!\left(-\frac{(x_\alpha - \mu_{\alpha 0})^2}{2\sigma_\alpha^2}\right)} + \log \frac{p(y=1)}{p(y=0)} > 0 \\
&\iff \log \frac{\exp\!\left(-\sum_{\alpha=1}^{d} \frac{(x_\alpha - \mu_{\alpha 1})^2}{2\sigma_\alpha^2}\right)}{\exp\!\left(-\sum_{\alpha=1}^{d} \frac{(x_\alpha - \mu_{\alpha 0})^2}{2\sigma_\alpha^2}\right)} + \log \frac{p(y=1)}{p(y=0)} > 0 \\
&\iff \log \exp\!\left(-\sum_{\alpha=1}^{d} \frac{(x_\alpha - \mu_{\alpha 1})^2 - (x_\alpha - \mu_{\alpha 0})^2}{2\sigma_\alpha^2}\right) + \log \frac{p(y=1)}{p(y=0)} > 0 \\
&\iff -\sum_{\alpha=1}^{d} \frac{(x_\alpha - \mu_{\alpha 1})^2 - (x_\alpha - \mu_{\alpha 0})^2}{2\sigma_\alpha^2} + \log \frac{p(y=1)}{p(y=0)} > 0 \\
&\iff -\sum_{\alpha=1}^{d} \frac{-2x_\alpha\mu_{\alpha 1} + \mu_{\alpha 1}^2 + 2x_\alpha\mu_{\alpha 0} - \mu_{\alpha 0}^2}{2\sigma_\alpha^2} + \log \frac{p(y=1)}{p(y=0)} > 0 \\
&\iff \sum_{\alpha=1}^{d} \frac{\mu_{\alpha 1} - \mu_{\alpha 0}}{\sigma_\alpha^2} \cdot x_\alpha - \sum_{\alpha=1}^{d} \frac{\mu_{\alpha 1}^2 - \mu_{\alpha 0}^2}{2\sigma_\alpha^2} + \log \frac{p(y=1)}{p(y=0)} > 0 \\
&\iff \sum_{\alpha=1}^{d} w_\alpha x_\alpha + \log \frac{p(y=1)}{p(y=0)} - \sum_{\alpha=1}^{d} \frac{\mu_{\alpha 1}^2 - \mu_{\alpha 0}^2}{2\sigma_\alpha^2} > 0 \\
&\iff \vec{w} \cdot \vec{x} + b > 0
\end{align*}
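The closed forms for $w_\alpha$ and $b$ can be checked numerically: $\vec{w} \cdot \vec{x} + b$ should equal the difference of the two Gaussian naive Bayes log-joints (the shared $\frac{1}{\sqrt{2\pi}\,\sigma_\alpha}$ normalizers cancel). All parameter values below are made up for illustration; a sketch, not part of the solution:

```python
import math
import random

random.seed(0)

d = 3
mu1 = [0.5, -1.0, 2.0]    # class-1 means mu_a1
mu0 = [-0.3, 0.4, 1.0]    # class-0 means mu_a0
sigma2 = [1.0, 0.5, 2.0]  # shared per-feature variances sigma_a^2
p1, p0 = 0.6, 0.4         # class priors

# w and b exactly as defined above
w = [(mu1[a] - mu0[a]) / sigma2[a] for a in range(d)]
b = math.log(p1 / p0) - sum((mu1[a] ** 2 - mu0[a] ** 2) / (2 * sigma2[a])
                            for a in range(d))

def log_joint(x, mu, prior):
    # log p(y) + sum_a log N(x_a; mu_a, sigma_a^2), dropping the
    # normalizing constants, which are the same for both classes
    return math.log(prior) - sum((x[a] - mu[a]) ** 2 / (2 * sigma2[a])
                                 for a in range(d))

for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(d)]
    linear = sum(w[a] * x[a] for a in range(d)) + b
    direct = log_joint(x, mu1, p1) - log_joint(x, mu0, p0)
    assert abs(linear - direct) < 1e-9

print("linear rule matches direct log-joint comparison")
```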