o Conditional probability is the probability of an event occurring, given that another event (by assumption,
presumption, assertion or evidence) has already occurred.
P(A|B) = P(A ∩ B) / P(B);  P(B) ≠ 0
P(A|B) = probability of occurrence of event A given that event B has already occurred
P(A∩B) = probability of occurrence of A AND B
P(B) = independent probability of B
o Example: If two fair dice are rolled, compute the probability that the face-up value of the first die is 2, given that
their sum is no greater than 5.
▪ Event: rolling 2 dice. Sample space = 36 equally likely outcomes.
▪ Probability D1 = 2: the row D1 = 2 of the sum table below contains 6 of the 36 outcomes.

Sum table (D1 + D2):
         D2=1  D2=2  D2=3  D2=4  D2=5  D2=6
  D1=1     2     3     4     5     6     7
  D1=2     3     4     5     6     7     8
  D1=3     4     5     6     7     8     9
  D1=4     5     6     7     8     9    10
  D1=5     6     7     8     9    10    11
  D1=6     7     8     9    10    11    12

P(D1 = 2) = 6/36 = 1/6
▪ Probability D1 + D2 ≤ 5: exactly 10 of the 36 outcomes have a sum of at most 5, namely
(1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (2,3), (3,1), (3,2), (4,1).

P(D1 + D2 ≤ 5) = 10/36
▪ Probability D1 = 2 given that D1 + D2 ≤ 5: of the 10 outcomes with sum ≤ 5, exactly 3 have D1 = 2,
namely (2,1), (2,2), (2,3).

P(D1 = 2 | D1 + D2 ≤ 5) = 3/10
▪ P(A|B) = P(A ∩ B) / P(B) = (3/36) / (10/36) = (3/36) ∗ (36/10) = 3/10
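The dice example above can be checked by brute-force enumeration of the sample space; a minimal sketch:

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

b = [(d1, d2) for d1, d2 in outcomes if d1 + d2 <= 5]     # event B: sum <= 5
a_and_b = [(d1, d2) for d1, d2 in b if d1 == 2]           # A ∩ B: D1 = 2 and sum <= 5

p_b = Fraction(len(b), len(outcomes))                     # 10/36
p_a_given_b = Fraction(len(a_and_b), len(outcomes)) / p_b # (3/36)/(10/36) = 3/10

print(p_b, p_a_given_b)
```

Counting outcomes directly reproduces the conditional probability of 3/10 from the tables.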
o Independent events occur when one event remains unaffected by the occurrence of the other. E.g., if
we flip two separate coins, the occurrence of heads or tails on one coin is independent of the
outcome on the other.
▪ 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) ∗ 𝑃(𝐵)
▪ P(A|B) = P(A ∩ B) / P(B) = P(A) ∗ P(B) / P(B) = P(A)
▪ P(B|A) = P(B ∩ A) / P(A) = P(A) ∗ P(B) / P(A) = P(B)
• Bayes Theorem
By conditional probability,
P(A|B) = Posterior = probability of A given B = P(A ∩ B) / P(B)
P(B|A) = Likelihood = probability of B given A = P(B ∩ A) / P(A)
Since P(A ∩ B) = P(B ∩ A),
P(A|B) ∗ P(B) = P(B|A) ∗ P(A)
P(A|B) = P(B|A) ∗ P(A) / P(B);  P(B) ≠ 0
▪ P(B) = independent probability of B
▪ P(A) = independent probability of A
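The rearrangement above can be wrapped in a small helper; the input numbers below are hypothetical, chosen only to illustrate the formula:

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Posterior P(A|B) = P(B|A) * P(A) / P(B), with P(B) != 0."""
    if p_b == 0:
        raise ValueError("P(B) must be nonzero")
    return p_b_given_a * p_a / p_b

# Hypothetical values: likelihood P(B|A) = 3/10, prior P(A) = 1/2, evidence P(B) = 1/4.
posterior = bayes(Fraction(3, 10), Fraction(1, 2), Fraction(1, 4))
print(posterior)  # (3/10 * 1/2) / (1/4) = 3/5
```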
Now we apply the naive assumption to Bayes' theorem: independence among the features. This lets us
split the evidence into independent parts.
For independent variables, P(A ∩ B) = P(A) ∗ P(B), so

P(y | X) = [P(x1|y) P(x2|y) P(x3|y) … P(xn-1|y) P(xn|y) ∗ P(y)] / [P(x1) P(x2) P(x3) … P(xn-1) P(xn)]

P(y | X) = P(y) ∗ ∏ i=1..n P(xi|y) / [P(x1) P(x2) P(x3) … P(xn-1) P(xn)]
P(New | Sports) ≈ (2/11) ∗ (1/11) ∗ (0/11) ∗ (2/11) ∗ (3/5) = 0
P(New | NonSports) ≈ (1/9) ∗ (0/9) ∗ (0/9) ∗ (1/9) ∗ (2/5) = 0
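The two scores above can be computed directly from the likelihood terms and class priors quoted in the example (the word counts themselves come from the training data assumed by the notes):

```python
from fractions import Fraction

# Likelihood terms P(word | class) quoted in the example above.
sports = [Fraction(2, 11), Fraction(1, 11), Fraction(0, 11), Fraction(2, 11)]
non_sports = [Fraction(1, 9), Fraction(0, 9), Fraction(0, 9), Fraction(1, 9)]

def score(likelihoods, prior):
    """Unnormalised Naive Bayes score: P(y) * prod_i P(x_i | y)."""
    s = prior
    for p in likelihoods:
        s *= p
    return s

print(score(sports, Fraction(3, 5)))      # 0: one unseen word zeroes the whole product
print(score(non_sports, Fraction(2, 5)))  # 0: same problem for the other class
```

This makes the failure mode explicit: a single zero-count word forces both class scores to zero, which is what Laplace smoothing fixes next.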
• Laplace/Additive Smoothing
o The query word "Close" doesn't exist in our training data, so one likelihood term's numerator, and
hence the whole product, becomes zero.
o With heavy smoothing there are equal chances that the query belongs to either category; this equal
probability is calculated as below.
o The majority of the time α = 1; as α increases, each likelihood approaches 0.5, e.g. with α = 10000:

P(wi | y = 1) = (2 + 10000) / (1000 + 10000 ∗ 2) ≈ 1/2 = P(wi | y = 0)
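A sketch of the smoothed estimate, assuming the additive form (count + α) / (total + α ∗ k) with the example's numbers (count = 2, total = 1000, k = 2):

```python
from fractions import Fraction

def smoothed(count, total, k, alpha):
    """Laplace/additive smoothing: (count + alpha) / (total + alpha * k)."""
    return Fraction(count + alpha, total + alpha * k)

# With alpha = 1 the estimate stays close to the raw frequency...
print(float(smoothed(2, 1000, 2, 1)))
# ...while a huge alpha pushes every likelihood toward 1/k = 1/2, as in the example.
print(float(smoothed(2, 1000, 2, 10000)))
```

Large α drowns out the observed counts, so every word's likelihood converges to the uniform value 1/k; in practice α = 1 keeps the estimate close to the data while still avoiding zeros.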