
2 Consider a medical diagnosis problem in which there are two alternative hypotheses: 1. that the patient has a particular form of cancer (+) and 2. that the patient does not (-). A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, .008 of the entire population have this cancer. Determine whether the patient has cancer or not using the MAP hypothesis.

P(cancer) = .008
P(!cancer) = .992 = 1 - P(cancer)
P(+|cancer) = .98
P(-|cancer) = .02 = 1 - P(+|cancer)
P(-|!cancer) = .97
P(+|!cancer) = .03 = 1 - P(-|!cancer)

(Recall that the exclamation sign "!" means "not".)

Explanation:

We want to calculate the probability of cancer given that the test result is positive, or in probability notation, P(cancer|+). However, we are not given P(cancer|+) directly. This is only an inconvenience, though, since we can calculate P(cancer|+) using Bayes rule. Bayes rule states that:

P(cancer|+) = P(+|cancer) x P(cancer) / P(+).

We are given both P(+|cancer) and P(cancer), so calculating the numerator is straightforward. We don't have the denominator directly, but we can calculate it easily using the law of total probability, or "marginalization" as it was called in lecture. The law of total probability says that the probability of testing positive is:

P(+) = P(+|cancer) x P(cancer) + P(+|!cancer) x P(!cancer).

So now just plug and chug, since we have all the required numbers:

P(cancer|+)
= P(+|cancer) x P(cancer) / ( P(+|cancer) x P(cancer) + P(+|!cancer) x P(!cancer) )
= .98 x .008 / ( .98 x .008 + .03 x .992 )
= .00784 / ( .00784 + .02976 )
= .00784 / .0376
≈ .21

Since P(+|!cancer) x P(!cancer) = .02976 is larger than P(+|cancer) x P(cancer) = .00784, the MAP hypothesis is !cancer, so the patient is classified as not having cancer.

So even though the patient tested positive, the probability that he has cancer is only 21%. How can we make sense of this, especially since the cancer test is quite accurate? The answer lies in the fact that the prior probability of having the cancer is extremely low (0.008). What Bayes rule is telling us here is that because the prior is so low, we need a lot of evidence to convince us that the patient really has cancer. One test result is simply not enough.
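
The arithmetic above is easy to check in a few lines of Python. The following is only a quick sketch re-implementing the numbers from this problem; the variable names are mine, not part of the original solution.

p_cancer = 0.008                         # prior P(cancer)
p_not_cancer = 1 - p_cancer              # P(!cancer) = 0.992
p_pos_given_cancer = 0.98                # P(+|cancer), test sensitivity
p_pos_given_not_cancer = 0.03            # P(+|!cancer) = 1 - specificity (0.97)

# Unnormalized MAP scores: P(+|h) * P(h) for each hypothesis h
score_cancer = p_pos_given_cancer * p_cancer              # 0.00784
score_not_cancer = p_pos_given_not_cancer * p_not_cancer  # 0.02976

# Law of total probability: P(+) is the sum of the two scores
p_pos = score_cancer + score_not_cancer                   # 0.0376

# Bayes rule: posterior probability of cancer given a positive test
p_cancer_given_pos = score_cancer / p_pos
print(round(p_cancer_given_pos, 2))                       # 0.21
print("MAP hypothesis:", "cancer" if score_cancer > score_not_cancer else "not cancer")

Running it prints 0.21 and "not cancer", matching the hand calculation above.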

Part B
1 Explain Bayes' theorem. How is it useful?

Bayes' theorem describes how to update the probability of a hypothesis when new evidence becomes available. It states that

P(A|B) = P(B|A) x P(A) / P(B),

that is, the posterior probability of A given B equals the likelihood of B under A, times the prior probability of A, divided by the overall probability of B.

Applications of the theorem are widespread and are not limited to any one field. As an example, Bayes' theorem can be used to determine the accuracy of medical test results by taking into consideration how likely any given person is to have a disease and the general accuracy of the test. Bayes' theorem relies on incorporating prior probability distributions in order to generate posterior probabilities.

Prior probability, in Bayesian statistical inference, is the probability of an event before new data is collected. This is the best rational assessment of the probability of an outcome based on the current knowledge before an experiment is performed. Posterior probability is the revised probability of an event occurring after taking new information into consideration. The posterior probability is calculated by updating the prior probability using Bayes' theorem. In statistical terms, the posterior probability is the probability of event A occurring given that event B has occurred.
2 Compare L1 and L2 regularization.

1. L1 penalizes the sum of the absolute values of the weights; L2 penalizes the sum of the squared weights.
2. L1 has a sparse solution; L2 has a non-sparse solution.
3. L1 gives multiple solutions; L2 has only one solution.
4. L1 has built-in feature selection; L2 performs no feature selection.
5. L1 is robust to outliers; L2 is not robust to outliers.
6. L1 generates simple and interpretable models; L2 gives more accurate predictions when the output variable is a function of all the input variables.
7. L1 is unable to learn complex data patterns; L2 is able to learn complex data patterns.
8. L1 is computationally inefficient over non-sparse conditions; L2 is computationally efficient because it has an analytical solution.
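
As a minimal sketch of point 1 (this code is my own illustration, not part of the original answer), the two penalty terms can be written directly in NumPy:

import numpy as np

def l1_penalty(weights, lam):
    # L1 regularization: lambda times the sum of absolute weight values
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam):
    # L2 regularization: lambda times the sum of squared weight values
    return lam * np.sum(weights ** 2)

w = np.array([0.0, -2.0, 3.0])   # example weight vector, made up for illustration
print(l1_penalty(w, 0.1))        # 0.1 * (0 + 2 + 3) = 0.5
print(l2_penalty(w, 0.1))        # 0.1 * (0 + 4 + 9) = 1.3

The absolute-value term has a kink at zero, which is what pushes some weights exactly to zero and gives L1 its sparse solutions and built-in feature selection (points 2 and 4), while the smooth squared term only shrinks weights toward zero.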

3 Explain Bayesian belief network and conditional independence with example

https://www.javatpoint.com/bayesian-belief-network-in-artificial-intelligence

https://towardsdatascience.com/conditional-independence-the-backbone-of-bayesian-networks-
85710f1b35b
4 What is the difference between probability and likelihood?

https://medium.com/swlh/probability-vs-likelihood-cdac534bf523
5 Explain prior probability, likelihood, and marginal likelihood in the context of the Naïve Bayes algorithm.

Prior probability, in Bayesian statistical inference, is the probability of an event before new data is collected. This is the best rational assessment of the probability of an outcome based on the current knowledge before an experiment is performed.

The prior probability of an event will be revised as new data or information becomes available, to produce a more accurate measure of a potential outcome. That revised probability becomes the posterior probability and is calculated using Bayes' theorem. In statistical terms, the posterior probability is the probability of event A occurring given that event B has occurred.

For example, three acres of land have the labels A, B, and C. One acre has reserves of oil below its surface, while the other two do not. The prior probability of oil being found on acre C is one third, or 0.333. But if a drilling test is conducted on acre B, and the results indicate that no oil is present at that location, then the posterior probability of oil being found on acre A or acre C becomes 0.5, as each remaining acre has one chance out of two.

In statistics, a marginal likelihood function, or integrated likelihood, is a likelihood function in which some parameter variables have been marginalized. In the context of Bayesian statistics, it may also be referred to as the evidence or model evidence.
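
To connect the three terms, here is a small sketch showing how a naive-Bayes-style computation combines prior, likelihood, and marginal likelihood for a single observation. The feature probabilities below are invented purely for illustration and are not part of the original answer.

# Toy example: classify an email as spam / not spam from two binary features
# ("free" present, "meeting" present), with made-up probabilities.

p_spam, p_ham = 0.3, 0.7                         # prior probabilities of the classes

p_free_given_spam, p_free_given_ham = 0.60, 0.05       # class-conditional likelihoods
p_meeting_given_spam, p_meeting_given_ham = 0.10, 0.40

# Observation x: the email contains "free" but not "meeting".
# Naive Bayes assumption: features are conditionally independent given the class.
lik_spam = p_free_given_spam * (1 - p_meeting_given_spam)   # P(x | spam) = 0.54
lik_ham = p_free_given_ham * (1 - p_meeting_given_ham)      # P(x | ham)  = 0.03

# Marginal likelihood (evidence): P(x) summed over both classes
marginal = lik_spam * p_spam + lik_ham * p_ham              # 0.183

# Posterior = likelihood * prior / marginal likelihood
p_spam_given_x = lik_spam * p_spam / marginal
print(round(p_spam_given_x, 3))                             # ~0.885

The marginal likelihood in the denominator is what normalizes the class scores so that the posterior probabilities sum to one.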
6 Explain Bayesian belief networks. Where are they used?

Same as 3b

7 Describe prior probability and likelihood in the Naive Bayes algorithm.

Same as 5b

8 How do you classify text using Bayes Theorem?

https://medium.com/analytics-vidhya/naive-bayes-classifier-for-text-classification-556fabaf252b

9 Explain the general causes of overfitting and underfitting. What steps will you take to avoid overfitting and underfitting?

Underfitting:
A statistical model or a machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data (it's just like trying to fit undersized pants!). Underfitting destroys the accuracy of our machine learning model. Its occurrence simply means that the model does not fit the data well enough. It usually happens when we have too little data to build an accurate model, or when we try to fit a linear model to non-linear data. In such cases the model is too simple to capture the structure of the data, and it will probably make a lot of wrong predictions. Underfitting can be avoided by using a more expressive model and by adding more informative features.
Techniques to reduce underfitting:
1. Increase model complexity.
2. Increase the number of features, performing feature engineering.
3. Remove noise from the data.
4. Increase the number of epochs or increase the duration of training to get better results.

Overfitting:
A statistical model is said to be overfitted when it fits the training data too closely (just like fitting ourselves in oversized pants!). When a model learns the training set in too much detail, it starts learning from the noise and inaccurate entries in the data set, and it then fails to categorize new data correctly because of all that detail and noise. Overfitting is most common with non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models. A solution to avoid overfitting is to use a linear algorithm if we have linear data, or to limit parameters such as the maximal depth if we are using decision trees.
Techniques to reduce overfitting (see the sketch after this list):
1. Increase the training data.
2. Reduce model complexity.
3. Early stopping during the training phase (keep an eye on the validation loss and stop training as soon as it begins to increase).
4. Ridge regularization and Lasso regularization.
5. Use dropout for neural networks to tackle overfitting.
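
A quick way to see both failure modes on toy data is to fit polynomials of different degrees to noisy non-linear data and compare training and test error. The sketch below is my own illustration using NumPy's polyfit, not part of the original answer; degree 1 typically underfits and degree 10 typically overfits here.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)   # noisy non-linear data

# Hold out every other point so we can detect overfitting on unseen data
x_train, x_test = x[::2], x[1::2]
y_train, y_test = y[::2], y[1::2]

for degree in (1, 4, 10):                       # underfit, reasonable, overfit
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(train_mse, 3), round(test_mse, 3))

# Degree 1: both errors are high (underfitting).
# Degree 10: training error is very low but test error is larger (overfitting).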

10 Describe the average squared difference between the classifier's predicted output and the actual output.

This quantity is the mean squared error (MSE). For n examples with actual outputs y_i and predicted outputs p_i, MSE = (1/n) * sum_i (y_i - p_i)^2. It is commonly used both as a training loss and as a measure of how far the predictions are from the targets.
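
A tiny NumPy version of the same formula (the arrays are made-up examples):

import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])    # actual outputs (illustrative)
y_pred = np.array([0.9, 0.2, 0.8, 0.4])    # classifier predictions (illustrative)

mse = np.mean((y_true - y_pred) ** 2)      # average squared difference
print(mse)                                 # 0.1125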

11 Explain the probability of a hypothesis before the presentation of evidence.

This is the prior probability P(h) of the hypothesis: the degree of belief assigned to h before any evidence or training data is observed (see question 5 above).

12 What are the uses of Bayes classifier

https://www.javatpoint.com/machine-learning-naive-bayes-classifier

13 Explain hypothesis with example

What is a Hypothesis?
A hypothesis is an assumption that is made on the basis of some evidence. It is the initial point of any investigation that translates the research questions into a prediction. It includes components like variables, population and the relation between the variables. A research hypothesis is a hypothesis that is used to test the relationship between two or more variables.

Characteristics of Hypothesis
Following are the characteristics of hypothesis:

• The hypothesis should be clear and precise to consider it to be reliable.
• If the hypothesis is a relational hypothesis, then it should state the relationship between variables.
• The hypothesis must be specific and should have scope for conducting more tests.
• The way of explanation of the hypothesis must be very simple, and it should also be understood that the simplicity of the hypothesis is not related to its significance.

Examples of Hypothesis
Following are the examples of hypothesis based on their types:

• Consumption of sugary drinks every day leads to obesity is an example of a simple hypothesis.
• All lilies have the same number of petals is an example of a null hypothesis.
• If a person gets 7 hours of sleep, then he will feel less fatigue than if he sleeps less.
14 What are the advantages of the naïve Bayes classifier

Same as 12b

15 Explain Bayesian belief network and conditional independence with example

Same as 3b
16 Explain the concept of EM Algorithm.

https://www.geeksforgeeks.org/ml-expectation-maximization-algorithm/

17 Discuss what are gaussian mixtures.

https://www.geeksforgeeks.org/gaussian-mixture-model/

18 What is an optimal classifier

https://svivek.com/teaching/lectures/slides/prob-learning/bayes-optimal-classifier.pdf

19 Explain different classifiers

https://monkeylearn.com/blog/what-is-a-classifier/

20 How do you classify text using Bayes theorem

Same as part b 8
