
2 Consider a medical diagnosis problem in which there are two alternative hypotheses: 1. that the patient has a particular form of cancer (+) and 2. that the patient does not (-). A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, .008 of the entire population have this cancer. Determine whether the patient has cancer or not using the MAP hypothesis.

P(cancer) = .008
P(!cancer) = .992 = 1 - P(cancer)
P(+|cancer) = .98
P(-|cancer) = .02 = 1 - P(+|cancer)
P(-|!cancer) = .97
P(+|!cancer) = .03 = 1 - P(-|!cancer)

(Recall that the exclamation sign "!" means "not".)

Explanation:

We want to calculate the probability of cancer given that the test result is positive, or in probability notation, P(cancer|+). However, we are not given P(cancer|+) directly. This is only an inconvenience, though, since we can calculate P(cancer|+) using Bayes rule. Bayes rule states that:

P(cancer|+) = P(+|cancer) x P(cancer) / P(+).

We are given both P(+|cancer) and P(cancer), so calculating the numerator is straightforward. We don't have the denominator directly, but we can calculate it easily using the law of total probability, or "marginalization" as it was called in lecture. The law of total probability says that the probability of testing positive is:

P(+) = P(+|cancer) x P(cancer) + P(+|!cancer) x P(!cancer).

So now just plug and chug, since we have all the required numbers:

P(cancer|+)
= P(+|cancer) x P(cancer) / ( P(+|cancer) x P(cancer) + P(+|!cancer) x P(!cancer) )
= .98 x .008 / ( .98 x .008 + .03 x .992 )
= .00784 / ( .00784 + .02976 )
= .00784 / .0376
≈ .21

Since P(+|!cancer) x P(!cancer) = .02976 is larger than P(+|cancer) x P(cancer) = .00784, the MAP hypothesis is !cancer, so the patient is classified as not having cancer.

So even though the patient tested positive, the probability that he has cancer is only 21%. How can we make sense of this, especially since the cancer test is quite accurate? The answer lies in the fact that the prior probability of having the cancer is extremely low (0.008). What Bayes rule is telling us here is that because the prior is so low, we need a lot of evidence to convince us that the patient really has cancer. One test result is simply not enough.
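
The arithmetic above is easy to check in a few lines of Python. The following is only a quick sketch re-implementing the numbers from this problem; the variable names are mine, not part of the original solution.

p_cancer = 0.008                         # prior P(cancer)
p_not_cancer = 1 - p_cancer              # P(!cancer) = 0.992
p_pos_given_cancer = 0.98                # P(+|cancer), test sensitivity
p_pos_given_not_cancer = 0.03            # P(+|!cancer) = 1 - specificity (0.97)

# Unnormalized MAP scores: P(+|h) * P(h) for each hypothesis h
score_cancer = p_pos_given_cancer * p_cancer              # 0.00784
score_not_cancer = p_pos_given_not_cancer * p_not_cancer  # 0.02976

# Law of total probability: P(+) is the sum of the two scores
p_pos = score_cancer + score_not_cancer                   # 0.0376

# Bayes rule: posterior probability of cancer given a positive test
p_cancer_given_pos = score_cancer / p_pos
print(round(p_cancer_given_pos, 2))                       # 0.21
print("MAP hypothesis:", "cancer" if score_cancer > score_not_cancer else "not cancer")

Running it prints 0.21 and "not cancer", matching the hand calculation above.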

Part B
1 Explain Bayes' theorem. How is it useful?

Bayes' theorem describes how to update the probability of a hypothesis when new evidence becomes available. It states that

P(A|B) = P(B|A) x P(A) / P(B),

that is, the posterior probability of A given B equals the likelihood of B under A, times the prior probability of A, divided by the overall probability of B.

Applications of the theorem are widespread and are not limited to any one field. As an example, Bayes' theorem can be used to determine the accuracy of medical test results by taking into consideration how likely any given person is to have a disease and the general accuracy of the test. Bayes' theorem relies on incorporating prior probability distributions in order to generate posterior probabilities.

Prior probability, in Bayesian statistical inference, is the probability of an event before new data is collected. This is the best rational assessment of the probability of an outcome based on the current knowledge before an experiment is performed. Posterior probability is the revised probability of an event occurring after taking new information into consideration. The posterior probability is calculated by updating the prior probability using Bayes' theorem. In statistical terms, the posterior probability is the probability of event A occurring given that event B has occurred.
2 Compare L1 and L2 regularization.

1. L1 penalizes the sum of the absolute values of the weights; L2 penalizes the sum of the squared weights.
2. L1 has a sparse solution; L2 has a non-sparse solution.
3. L1 gives multiple solutions; L2 has only one solution.
4. L1 has built-in feature selection; L2 performs no feature selection.
5. L1 is robust to outliers; L2 is not robust to outliers.
6. L1 generates simple and interpretable models; L2 gives more accurate predictions when the output variable is a function of all the input variables.
7. L1 is unable to learn complex data patterns; L2 is able to learn complex data patterns.
8. L1 is computationally inefficient over non-sparse conditions; L2 is computationally efficient because it has an analytical solution.
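
As a minimal sketch of point 1 (this code is my own illustration, not part of the original answer), the two penalty terms can be written directly in NumPy:

import numpy as np

def l1_penalty(weights, lam):
    # L1 regularization: lambda times the sum of absolute weight values
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam):
    # L2 regularization: lambda times the sum of squared weight values
    return lam * np.sum(weights ** 2)

w = np.array([0.0, -2.0, 3.0])   # example weight vector, made up for illustration
print(l1_penalty(w, 0.1))        # 0.1 * (0 + 2 + 3) = 0.5
print(l2_penalty(w, 0.1))        # 0.1 * (0 + 4 + 9) = 1.3

The absolute-value term has a kink at zero, which is what pushes some weights exactly to zero and gives L1 its sparse solutions and built-in feature selection (points 2 and 4), while the smooth squared term only shrinks weights toward zero.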

3 Explain Bayesian belief network and conditional independence with example

https://www.javatpoint.com/bayesian-belief-network-in-artificial-intelligence

https://towardsdatascience.com/conditional-independence-the-backbone-of-bayesian-networks-
85710f1b35b
4 What is the difference between probability and likelihood?

https://medium.com/swlh/probability-vs-likelihood-cdac534bf523
5 Explain prior probability, likelihood, and marginal likelihood in the context of the Naïve Bayes algorithm.

Prior probability, in Bayesian statistical inference, is the probability of an event before new data is collected. This is the best rational assessment of the probability of an outcome based on the current knowledge before an experiment is performed.

The prior probability of an event will be revised as new data or information becomes available, to produce a more accurate measure of a potential outcome. That revised probability becomes the posterior probability and is calculated using Bayes' theorem. In statistical terms, the posterior probability is the probability of event A occurring given that event B has occurred.

For example, three acres of land have the labels A, B, and C. One acre has reserves of oil below its surface, while the other two do not. The prior probability of oil being found on acre C is one third, or 0.333. But if a drilling test is conducted on acre B, and the results indicate that no oil is present at that location, then the posterior probability of oil being found on acre A or acre C becomes 0.5, as each remaining acre has one chance out of two.

In statistics, a marginal likelihood function, or integrated likelihood, is a likelihood function in which some parameter variables have been marginalized. In the context of Bayesian statistics, it may also be referred to as the evidence or model evidence.
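
To connect the three terms, here is a small sketch showing how a naive-Bayes-style computation combines prior, likelihood, and marginal likelihood for a single observation. The feature probabilities below are invented purely for illustration and are not part of the original answer.

# Toy example: classify an email as spam / not spam from two binary features
# ("free" present, "meeting" present), with made-up probabilities.

p_spam, p_ham = 0.3, 0.7                         # prior probabilities of the classes

p_free_given_spam, p_free_given_ham = 0.60, 0.05       # class-conditional likelihoods
p_meeting_given_spam, p_meeting_given_ham = 0.10, 0.40

# Observation x: the email contains "free" but not "meeting".
# Naive Bayes assumption: features are conditionally independent given the class.
lik_spam = p_free_given_spam * (1 - p_meeting_given_spam)   # P(x | spam) = 0.54
lik_ham = p_free_given_ham * (1 - p_meeting_given_ham)      # P(x | ham)  = 0.03

# Marginal likelihood (evidence): P(x) summed over both classes
marginal = lik_spam * p_spam + lik_ham * p_ham              # 0.183

# Posterior = likelihood * prior / marginal likelihood
p_spam_given_x = lik_spam * p_spam / marginal
print(round(p_spam_given_x, 3))                             # ~0.885

The marginal likelihood in the denominator is what normalizes the class scores so that the posterior probabilities sum to one.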
6 Explain Bayesian belief networks. Where are they used?

Same as 3b

7 Describe prior probability and likelihood in the Naive Bayes algorithm.

Same as 5b

8 How do you classify text using Bayes Theorem?

https://medium.com/analytics-vidhya/naive-bayes-classifier-for-text-classification-556fabaf252b

9 Explain the general causes of overfitting and underfitting. What steps will you take to avoid overfitting and underfitting?

Underfitting:
A statistical model or a machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data (it's just like trying to fit undersized pants!). Underfitting destroys the accuracy of our machine learning model. Its occurrence simply means that the model does not fit the data well enough. It usually happens when we have too little data to build an accurate model, or when we try to fit a linear model to non-linear data. In such cases the model is too simple to capture the structure of the data, and it will probably make a lot of wrong predictions. Underfitting can be avoided by using a more expressive model and by adding more informative features.
Techniques to reduce underfitting:
1. Increase model complexity.
2. Increase the number of features, performing feature engineering.
3. Remove noise from the data.
4. Increase the number of epochs or increase the duration of training to get better results.

Overfitting:
A statistical model is said to be overfitted when it fits the training data too closely (just like fitting ourselves in oversized pants!). When a model learns the training set in too much detail, it starts learning from the noise and inaccurate entries in the data set, and it then fails to categorize new data correctly because of all that detail and noise. Overfitting is most common with non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models. A solution to avoid overfitting is to use a linear algorithm if we have linear data, or to limit parameters such as the maximal depth if we are using decision trees.
Techniques to reduce overfitting (see the sketch after this list):
1. Increase the training data.
2. Reduce model complexity.
3. Early stopping during the training phase (keep an eye on the validation loss and stop training as soon as it begins to increase).
4. Ridge regularization and Lasso regularization.
5. Use dropout for neural networks to tackle overfitting.
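
A quick way to see both failure modes on toy data is to fit polynomials of different degrees to noisy non-linear data and compare training and test error. The sketch below is my own illustration using NumPy's polyfit, not part of the original answer; degree 1 typically underfits and degree 10 typically overfits here.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)   # noisy non-linear data

# Hold out every other point so we can detect overfitting on unseen data
x_train, x_test = x[::2], x[1::2]
y_train, y_test = y[::2], y[1::2]

for degree in (1, 4, 10):                       # underfit, reasonable, overfit
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(train_mse, 3), round(test_mse, 3))

# Degree 1: both errors are high (underfitting).
# Degree 10: training error is very low but test error is larger (overfitting).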

10 Describe the average squared difference between the classifier's predicted output and the actual output.

This quantity is the mean squared error (MSE). For n examples with actual outputs y_i and predicted outputs p_i, MSE = (1/n) * sum_i (y_i - p_i)^2. It is commonly used both as a training loss and as a measure of how far the predictions are from the targets.
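
A tiny NumPy version of the same formula (the arrays are made-up examples):

import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])    # actual outputs (illustrative)
y_pred = np.array([0.9, 0.2, 0.8, 0.4])    # classifier predictions (illustrative)

mse = np.mean((y_true - y_pred) ** 2)      # average squared difference
print(mse)                                 # 0.1125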

11 Explain the probability of a hypothesis before the presentation of evidence.

This is the prior probability P(h) of the hypothesis: the degree of belief assigned to h before any evidence or training data is observed (see question 5 above).

12 What are the uses of Bayes classifier

https://www.javatpoint.com/machine-learning-naive-bayes-classifier

13 Explain hypothesis with example

What is a Hypothesis?
A hypothesis is an assumption that is made on the basis of some evidence. It is the initial point of any investigation that translates the research questions into a prediction. It includes components like variables, population and the relation between the variables. A research hypothesis is a hypothesis that is used to test the relationship between two or more variables.

Characteristics of Hypothesis
Following are the characteristics of hypothesis:

• The hypothesis should be clear and precise to consider it to be reliable.
• If the hypothesis is a relational hypothesis, then it should state the relationship between variables.
• The hypothesis must be specific and should have scope for conducting more tests.
• The way of explanation of the hypothesis must be very simple, and it should also be understood that the simplicity of the hypothesis is not related to its significance.

Examples of Hypothesis
Following are the examples of hypothesis based on their types:

• Consumption of sugary drinks every day leads to obesity is an example of a simple hypothesis.
• All lilies have the same number of petals is an example of a null hypothesis.
• If a person gets 7 hours of sleep, then he will feel less fatigue than if he sleeps less.
14 What are the advantages of the naïve Bayes classifier

Same as 12b

15 Explain Bayesian belief network and conditional independence with example

Same as 3b
16 Explain the concept of EM Algorithm.

https://www.geeksforgeeks.org/ml-expectation-maximization-algorithm/

17 Discuss what are gaussian mixtures.

https://www.geeksforgeeks.org/gaussian-mixture-model/

18 What is an optimal classifier

https://svivek.com/teaching/lectures/slides/prob-learning/bayes-optimal-classifier.pdf

19 Explain different classifiers

https://monkeylearn.com/blog/what-is-a-classifier/

20 How do you classify text using Bayes theorem

Same as part b 8
