
Machine Learning

Module 5:
Probability
Faculty: Santosh Chapaneri

Why Probability for ML?


• Probability theory can be applied to any situation involving uncertainty.

• In ML, uncertainty arises in many forms, e.g. arriving at the best prediction of the future given past data, arriving at the best model based on given data, or arriving at a confidence level when predicting a future outcome from past data.

• In ML, we train the system using training data and expect the ML algorithm to capture the behaviour of the larger set of actual data.

• If we have observations on a subset of events, called a ‘sample’, then there will be some uncertainty in attributing the sample results to the whole set, or population.

santosh.chapaneri@ieee.org

Probability – Properties


Probability – Properties

(marginal)


Probability – Conditional
• P(A | B) = the probability of event A given that event B happened.

• P(A | B) is the probability measure of the event A after observing the occurrence of event B.

• Two events are called independent if and only if P(A ∩ B) = P(A) P(B) (or equivalently, P(A | B) = P(A)).

• Therefore, independence is equivalent to saying that observing B has no effect on the probability of A.


Probability – Conditional
• Q1: In a toy-making shop, the automated machine produces a few defective pieces. It is observed that in a lot of 1,000 toy parts, 25 are defective. If two random samples are selected for testing without replacement from the lot, calculate the probability that both samples are defective.
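A quick numeric check of Q1 (a sketch; the fractions follow directly from the problem statement):

```python
# Q1: both of two samples drawn without replacement are defective.
# First draw: 25 defective out of 1000; second draw: 24 out of the 999 remaining.
p_both = (25 / 1000) * (24 / 999)
print(round(p_both, 6))  # ≈ 0.000601
```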


Probability – Bayes’ Rule


• Posterior = Likelihood x Prior / Marginal
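In symbols, the rule on this slide reads (standard form, reconstructed since the slide’s equation was not extracted):

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)},
\qquad
P(B) = \sum_i P(B \mid A_i)\, P(A_i)
```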


Probability – Bayes’ Rule


• Q2: Suppose a new home HIV test has 95% sensitivity and 98% specificity, and is to be used in a population of size 100,000 with an HIV prevalence of 1/1000. What is the probability of a person being truly positive given that the person tests positive?
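A minimal sketch of the Q2 computation via Bayes’ rule (the numbers follow from the stated sensitivity, specificity, and prevalence):

```python
# Q2: P(truly HIV-positive | test positive).
sens, spec, prev = 0.95, 0.98, 1 / 1000
# Total probability of a positive test: true positives + false positives.
p_pos = sens * prev + (1 - spec) * (1 - prev)
ppv = sens * prev / p_pos  # Bayes' rule
print(round(ppv, 4))  # ≈ 0.0454
```

Of the 100,000 people, about 100 are infected (95 of them test positive), while roughly 1,998 of the uninfected also test positive, so only about 4.5% of positive tests are true positives.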


Probability – Bayes’ Rule


• Q3: The probability that an email with the sender-name words ‘mass’ and ‘bulk’ is spam = 0.8.

• Probability of a false alarm = 0.1 (marked as spam even if the email is not spam).

• Prior knowledge: only 0.4% of the total emails received are spam.

• Let x be the event of being marked as spam when the sender name has the words ‘mass’ or ‘bulk’, and y be the event of a mail really being spam.

• Compute p(y = 1 | x = 1).
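A sketch of the Q3 computation, reading the bullets above as p(x=1 | y=1) = 0.8, p(x=1 | y=0) = 0.1, and p(y=1) = 0.004:

```python
p_x_given_spam = 0.8   # p(x=1 | y=1): marked spam given it really is spam
p_x_given_ham = 0.1    # p(x=1 | y=0): false alarm rate
p_spam = 0.004         # prior p(y=1)

# Total probability of being marked as spam.
p_x = p_x_given_spam * p_spam + p_x_given_ham * (1 - p_spam)
p_spam_given_x = p_x_given_spam * p_spam / p_x  # Bayes' rule: p(y=1 | x=1)
print(round(p_spam_given_x, 4))  # ≈ 0.0311
```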


Probability – Bayes’ Rule


• Q4: Consider a woman in her 40s who decides to have a mammogram, a medical test for breast cancer. Suppose you are told the test has a sensitivity of 80%, the prior probability of having breast cancer is 0.4%, and the false positive rate is 10%. If the test is positive, what is the probability that she has cancer?
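Q4 has the same structure; a minimal sketch with the stated numbers:

```python
sens = 0.80    # P(test positive | cancer)
prior = 0.004  # P(cancer)
fpr = 0.10     # P(test positive | no cancer)

p_pos = sens * prior + fpr * (1 - prior)  # total probability of a positive test
p_cancer = sens * prior / p_pos           # Bayes' rule: P(cancer | positive)
print(round(p_cancer, 4))  # ≈ 0.0311
```

Despite the positive test, the probability is only about 3%: the disease is rare, so false positives dominate.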


Probability – Random Variables


• Consider an experiment in which we flip 10 coins, and we want to know the number of coins that come up heads. Here, the elements of the sample space are length-10 sequences of H and T.

• In practice, we usually do not care about the probability of obtaining any particular sequence of heads and tails. Instead, we usually care about real-valued functions of outcomes, such as the number of heads that appear among our 10 tosses, or the length of the longest run of tails.

• These functions are known as random variables.


Probability – Random Variables


• Suppose that X(ω) is the number of heads which occur in a sequence of tosses ω. Given that 10 coins are tossed, X(ω) can take only a finite number of values, so it is known as a discrete random variable.

• Here, the probability of the set associated with a random variable X taking on some specific value k is P(X = k) = P({ω : X(ω) = k}).

• Suppose that X(ω) is a random variable indicating the amount of time it takes for a radioactive particle to decay. In this case, X(ω) takes on an infinite number of possible values, so it is called a continuous random variable.

• We denote the probability that X takes on a value between two real constants a and b (where a < b) as P(a ≤ X ≤ b) = P({ω : a ≤ X(ω) ≤ b}).


Random Variables – CDF


• Cumulative Distribution Function
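The slide’s formulas were not extracted; the standard definition and properties are:

```latex
F_X(x) \triangleq P(X \le x), \qquad
0 \le F_X(x) \le 1, \qquad
F_X \ \text{is non-decreasing}, \qquad
\lim_{x \to -\infty} F_X(x) = 0, \qquad
\lim_{x \to \infty} F_X(x) = 1
```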


Random Variables – PMF


• Probability Mass Function
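Again reconstructing the standard definition (the slide’s equations were not extracted):

```latex
p_X(k) \triangleq P(X = k), \qquad
0 \le p_X(k) \le 1, \qquad
\sum_k p_X(k) = 1
```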


Random Variables – PDF


• Probability Density Function
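Reconstructing the standard definition (the slide’s equations were not extracted):

```latex
f_X(x) \triangleq \frac{dF_X(x)}{dx}, \qquad
P(a \le X \le b) = \int_a^b f_X(x)\, dx, \qquad
\int_{-\infty}^{\infty} f_X(x)\, dx = 1
```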


Random Variables – Expectation


• Mean of random variable

• Intuitively, the expectation of g(X) can be thought of as a “weighted average” of the values that g(x) can take on for different values of x, where the weights are given by p(x) or f(x).
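In symbols (standard forms, since the slide’s equations were not extracted):

```latex
E[g(X)] = \sum_x g(x)\, p(x) \quad \text{(discrete)}, \qquad
E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx \quad \text{(continuous)}
```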


Random Variables – Variance


• The variance of a random variable X is a measure of how concentrated the distribution of X is around its mean.
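In symbols (standard form, since the slide’s equation was not extracted):

```latex
\mathrm{Var}[X] \triangleq E\big[(X - E[X])^2\big] = E[X^2] - (E[X])^2
```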


Random Variables – Variance


• Q5: Calculate the mean and the variance of the uniform random variable X with PDF
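The PDF on the slide was not extracted; assuming X ~ U[a, b] with f(x) = 1/(b − a) on [a, b], the worked answer is:

```latex
E[X] = \int_a^b \frac{x}{b-a}\, dx = \frac{a+b}{2}, \qquad
E[X^2] = \int_a^b \frac{x^2}{b-a}\, dx = \frac{a^2 + ab + b^2}{3},
```
```latex
\mathrm{Var}[X] = E[X^2] - (E[X])^2
= \frac{a^2 + ab + b^2}{3} - \frac{(a+b)^2}{4}
= \frac{(b-a)^2}{12}
```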


Discrete Random Variables


Continuous Random Variables


PDF/CDF of Random Variables


Statistics of Random Variables


Multiple Random Variables


(Joint CDF)

(Marginal CDF)


Multiple Random Variables


(Joint PMF)

(Marginal PMF)

(Joint PDF)

(Marginal PDF)
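The marginalization formulas (standard forms; the slide’s equations were not extracted):

```latex
p_X(x) = \sum_y p_{X,Y}(x, y), \qquad
f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy
```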


Multiple Random Variables


Two random variables X and Y are independent if F_{X,Y}(x, y) = F_X(x) F_Y(y) for all x and y; equivalently, their joint PMF or PDF factorizes into the product of the marginals.


Covariance of Random Variables

When Cov[X, Y ] = 0, we say that X and Y are uncorrelated.
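In symbols (standard form, since the slide’s equation was not extracted):

```latex
\mathrm{Cov}[X, Y] \triangleq E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]\,E[Y]
```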


Central Limit Theorem


• The central limit theorem tells us that the sum of a set of random variables, which is itself a random variable, has a distribution that becomes Gaussian as the number of terms in the sum increases.
• Consider N IID variables, each U[0, 1], and find the distribution of their average. For large N, this distribution tends to a Gaussian.
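The U[0, 1] example can be checked numerically (a sketch using NumPy; the choice of 100,000 repetitions is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50                                         # terms in each average
avgs = rng.random((100_000, N)).mean(axis=1)   # 100,000 sample averages

# CLT prediction: approximately Gaussian with mean 1/2 and variance 1/(12N).
print(round(avgs.mean(), 3), round(avgs.var(), 5))
```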


Sampling Distributions
• An important application of statistics in machine learning is how to draw a conclusion about a set or population based on the probability model of random samples of the set.

• E.g. based on the malignancy sample test results of some random tumor cases, we want to estimate the proportion of all tumors which are malignant, and thus advise the doctors on the requirement or non-requirement of biopsy on each tumor case.

• Different random samples may give different estimates.

• If we can get some knowledge about the variability of all possible estimates derived from the random samples, then we should be able to arrive at reasonable conclusions.

Sampling Distributions
• Population is the finite set of objects being investigated.

• Random sample refers to a sample of objects drawn from a population in a way that every member of the population has the same chance of being chosen.

• Sampling distribution refers to the probability distribution of a random variable defined in a space of random samples.

• Sampling with Replacement: while choosing samples from the population, if each object chosen is returned to the population before the next object is chosen, then it is called sampling with replacement. In this case, repetitions are allowed. The number of ordered samples of size n from a population of size N is N^n, since each object can be repeated. Probability of each sample = 1/N^n.


Sampling Distributions
• Sampling with Replacement: choose a random sample of 2 patients from a population of 3 patients {A, B, C}, with replacement allowed. There are 9 such ordered pairs: (A, A), (A, B), (A, C), (B, A), (B, B), (B, C), (C, A), (C, B), (C, C). The number of random samples of 2 from the population of 3 is 3² = 9.

• Sampling without Replacement: if we do not return the object being chosen to the population before choosing the next object, then the unordered subset is called a sample without replacement. The number of such samples of size n that can be drawn from a population of size N is C(N, n) = N! / (n! (N − n)!).
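The two counting rules can be checked against the patients example (a sketch using the standard library):

```python
from itertools import combinations, product

population = ["A", "B", "C"]
with_repl = list(product(population, repeat=2))    # ordered pairs, replacement allowed
without_repl = list(combinations(population, 2))   # unordered pairs, no replacement
print(len(with_repl), len(without_repl))  # 9 3  (i.e. N**n and C(N, n))
```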


Sampling Distributions – Mean and Var


• Let X be a random variable with mean μ and standard deviation σ in a population of size N.

• A random sample of size n drawn from the population will generate n values x1, x2, …, xn for X.

• When samples are drawn with replacement, these values are independent of each other and can be considered as values of n independent random variables X1, X2, …, Xn, each having mean μ and variance σ².

• The sample mean is X̄ = (x1 + x2 + … + xn) / n.

• When samples are drawn without replacement, these values are not independent of each other.


Hypothesis Testing
• A hypothesis is a statement about one or more populations.

• It is usually concerned with the parameters of the population, e.g. a hospital administrator may want to test the hypothesis that the average length of stay of patients admitted to the hospital is 5 days.

• Null hypothesis H0: the hypothesis to be tested.

• Alternative hypothesis H1: a statement of what we believe is true if our sample data cause us to reject the null hypothesis.

• The level of significance α is the probability of rejecting H0 when H0 is in fact true (a Type I error).


Hypothesis Testing


Monte Carlo Approximation


• Finding the distribution of a function of random variables is, in practical situations, difficult to compute using the change of variables formula.

• Monte Carlo approximation provides a simple but powerful alternative to this.

• Generate S samples x1, …, xS from the distribution.

• Given the samples, we can approximate the distribution of f(X) by using the empirical distribution of {f(xs)} for s = 1, …, S.

• We can use Monte Carlo to approximate the expected value of any function of a random variable: we simply draw samples, and then compute the arithmetic mean of the function applied to the samples.
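A minimal Monte Carlo sketch (the target distribution N(0, 1) and the function f(x) = x² are illustrative choices, not from the slide):

```python
import numpy as np

rng = np.random.default_rng(42)
S = 1_000_000
x = rng.normal(0.0, 1.0, S)   # S samples from p(x) = N(0, 1)

# E[f(X)] ≈ (1/S) * sum_s f(x_s); here f(x) = x**2, whose true value E[X²] = 1.
estimate = (x ** 2).mean()
print(round(estimate, 2))  # ≈ 1.0
```

The error of the estimate shrinks like 1/√S, so a million samples give roughly three digits of accuracy here.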
