UNIT - 2
Probability, Random Variables and Probability Distributions:
Probability - Basic Definitions, Bayesian versus Frequentist Approach, Compound Events, Rules of Probability, Advanced Probability - Bayes Theorem, Applications.
Random Variables - Types of Random Variables - Discrete and Continuous, Probability Mass Function, Probability Density Function; Probability Distributions - Discrete Distributions - Binomial, Poisson, Continuous Distributions, Examples and Applications of Binomial and Poisson Distributions in Solving Business Problems.
Probability - Introduction
• People use the term probability many times each day.
• For example, a physician may say that a patient has a 50-50 chance of surviving a certain operation. Another physician may say that she is 95% certain that a patient has a particular disease.
Introduction to Probability
• We use probability to define the chances of the occurrence of an event.
Basic definitions:
• A procedure is an act that leads to a result. For example, throwing a die or visiting a website.
• An event is a collection of the outcomes of a procedure, such as getting heads on a coin flip or leaving a website after only 4 seconds.
• A simple event is an outcome of a procedure that cannot be broken down further. For example, rolling two dice can be broken down into two simpler procedures: rolling die 1 and rolling die 2.
• The sample space of a procedure is the set of all possible simple events.
– For example, an experiment is performed, in which a coin is flipped
three times in succession.
– What is the size of the sample space for this experiment?
• The answer is eight, because the results could be any one of the
possibilities in the following sample space:
{HHH, HHT, HTT, HTH, TTT, TTH, THH, THT}.
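The sample space above can also be enumerated programmatically. This is a minimal sketch using Python's standard `itertools.product`; the variable name `sample_space` is chosen here for illustration.

```python
from itertools import product

# Each flip is H or T; three flips in succession give ordered triples.
sample_space = ["".join(flips) for flips in product("HT", repeat=3)]

print(sample_space)
print(len(sample_space))  # size of the sample space
```

The same call with a different `repeat` value would list the sample space for any number of flips.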
• Probability: The probability of an event represents the frequency, or chance, that
the event will happen.
• For notation, if A is an event, P(A) is the probability of the occurrence of the event.
• We can define the actual probability of an event, A, as follows:
P(A) = (number of ways A occurs) / (size of the sample space)
• Let's now pretend that our universe involves a research study on humans, and the
event ‘A’ is people in that study who have cancer.
• If our study has 100 people and A has 25 people, the probability of A or P(A) is
25/100.
Note: The maximum probability of any event is 1.

Definition of Probability
Probability - The likelihood of occurrence of an event
• The ratio of the number of favorable outcomes to the total number of outcomes of an experiment.
Probability(Event) = Number of Favourable Outcomes / Total Number of Outcomes = x/n
• The probability value always lies between 0 and 1: 0 ≤ P(E) ≤ 1

Terms in Probability:
Term | Definition | Example
Experiment or Trial | A series of actions where the outcomes are always uncertain. | Tossing a coin, choosing a card from a deck of cards, throwing a die.
Event | The outcome of an experiment. | Getting heads while tossing a coin.
Outcome | Possible result of an experiment. | T (tail) is a possible outcome when a coin is tossed.
Sample Space | The set of all the possible outcomes of an experiment. | 1. Tossing a coin: Sample Space (S) = {H,T}. 2. Rolling a die: Sample Space (S) = {1,2,3,4,5,6}.
Sample Point | One of the possible results. | In a deck of cards, the 4 of hearts is a sample point; the queen of clubs is a sample point.
Favorable Outcome | An outcome that has produced the desired result or expected event. | In rolling two dice, the favorable outcomes for getting a sum of 4 on the two dice are (1,3), (2,2), and (3,1).
Bayesian versus Frequentist
• When it comes to calculating probabilities in practice, two approaches are considered: the Frequentist approach and the Bayesian approach.
Frequentist approach:
• In a Frequentist approach, the probability of an event is calculated through experimentation. It uses the past in order to predict the future chance of an event.
• The basic formula is as follows:
P(A) ≈ (number of times A occurred) / (number of times the procedure was repeated)
• Here we observe several instances of the procedure and count the number of times A was satisfied. The ratio of these two numbers is an approximation of the probability.
Bayesian Approach:
• The Bayesian approach differs by dictating that probabilities must be discerned
(determined) using theoretical means.
• Using the Bayes approach, we would have to think a bit more critically about
events and why they occur.
• The important part of the Frequentist approach is the relative frequency.
• The relative frequency of an event is how often an event occurs divided by the
total number of observations.
• Example – marketing stats
• Let's say that you are interested in ascertaining how often a person who visits your
website is likely to return on a later date. This is sometimes called the rate of
repeat visitors.
• We can calculate the relative frequency.
– In this case, we can take the visitor logs and calculate the relative frequency of event A (repeat visitors).
– Let's say, of the 1,458 unique visitors in the past week, 452 were repeat visitors.
– We can calculate this as follows:
– P(A) ≈ RF(A) = 452/1458 ≈ 0.31
• So, about 31% of your visitors are repeat visitors.
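The repeat-visitor calculation can be checked in a couple of lines. This is only a sketch with the two counts from the example typed in by hand; in practice they would come from the visitor logs, and the variable names are illustrative.

```python
# Counts taken from the example above
unique_visitors = 1458
repeat_visitors = 452

# Relative frequency of event A (repeat visitors)
relative_frequency = repeat_visitors / unique_visitors
print(f"RF(A) = {relative_frequency:.2f}")
```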

Frequentist Approach
The law of large numbers
• It states that if we repeat a procedure over and over, the relative frequency
probability will approach the actual probability.
Example: Pick the average number between 1 and 10.
• Let's design the experiment to be as follows:
• Python will choose n random numbers between 1 and 10 and find their average.
• We will repeat this experiment several times using a larger n each time, and then
we will graph the outcome.
• The steps are as follows:
1. Pick a random number between 1 and 10 and find the average.
2. Pick two random numbers between 1 and 10 and find their average.
3. Pick three random numbers between 1 and 10 and find their average.
4. Pick 10,000 random numbers between 1 and 10 and find their average.
5. Graph the results.
The law of large numbers - Example
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

results = []
for n in range(1, 10000):
    # choose n random integers from 1 to 9 (high=10 is exclusive)
    nums = np.random.randint(low=1, high=10, size=n)
    mean = nums.mean()       # find the average of these numbers
    results.append(mean)     # add the average to a running list
print(len(results))          # number of sample averages collected
df = pd.DataFrame({'means': results})
print(df.head())
print(df.tail())
df.plot(title='Law of Large Numbers')
plt.xlabel("Number of throws in sample")
plt.ylabel("Average Of Sample")
plt.show()
• Observation: As we increase the sample size, the sample average approaches the actual average of 5 (the code draws integers from 1 to 9, because high=10 is exclusive).
Why Bayes?
Because Bayes answers the questions we really care about.
Pr(I have disease | test +) [Bayesian] vs Pr(test + | disease) [frequentist]
Pr(A better than B | data) [Bayesian] vs Pr(extreme data | A=B) [frequentist]
Bayes is natural (vs interpreting a CI or a P-value)

You are waiting on a subway platform for a train that is known to run on a regular schedule, only you don't know how much time is scheduled to pass between train arrivals, nor how long it's been since the last train departed.
As more time passes, do you
(a) grow more confident that the train will arrive soon, since its eventual arrival can only be getting closer, not further away, or
(b) grow less confident that the train will arrive soon, since the longer you wait, the more likely it seems that either the scheduled arrival times are far apart or else that you happened to arrive just after the last train left – or both?
If you choose (a), you're thinking like a frequentist.
If you choose (b), you're thinking like a Bayesian.

Compound events
• A compound event is any event that combines two or more simple events.
Given events A and B:
• The probability that A and B occur is P(A ∩ B) = P(A and B)
• The probability that either A or B occurs is P(A∪ B) = P(A or B)
• Let's say that our Universe is 100 people who showed up for an experiment, in
which a new test for cancer is being developed:
• Here, the red circle, A, represents 25 people who actually have cancer.
• Using the relative frequency approach, we can say that
– P(A) = number of people with cancer/number of people in study,
– that is, 25/100 = ¼ = .25.
This means that there is a 25% chance that someone has cancer.
• A second event, B, contains the people for whom the test was positive (it claimed that they had cancer).
• Let's say that this is 30 people.
– So, P(B) = 30/100 = 3/10 = .3.
– This means that there is a 30% chance that the test said positive for any given person.
• These are two separate events, but they interact with each other. Namely, they
might intersect or have people in common, as shown here:
• A intersect B, or A ∩ B, are the people who actually have cancer (A) and for whom the test was positive (B). Let's say that's 20 people.
• That is, the test said positive for 20 people who do have cancer, as shown here:
• This means that P(A and B) = 20/100 = 1/5 = .2 = 20%.
• Suppose we want the probability that someone has cancer or the test came back positive.
– This is the union of the two events, namely, the sum of 5 (cancer only), 20 (both), and 10 (positive test only), which is 35.
– So, 35/100 people either have cancer or had a positive test outcome.
– That means, P(A or B) = 35/100 = .35 = 35%.
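The intersection and union counts can be reproduced with Python sets. The person IDs below are hypothetical, chosen only so that 25 people have cancer, 30 test positive, and 20 are in both groups, as in the example.

```python
# Hypothetical IDs: people 0-24 have cancer (A); people 5-34 tested positive (B)
universe = set(range(100))
A = set(range(0, 25))
B = set(range(5, 35))

def prob(event):
    """Relative frequency of an event within the 100-person universe."""
    return len(event) / len(universe)

print("P(A)       =", prob(A))
print("P(B)       =", prob(B))
print("P(A and B) =", prob(A & B))   # intersection
print("P(A or B)  =", prob(A | B))   # union
```

Set operators `&` and `|` map directly onto ∩ and ∪, which is why sets are a convenient way to check small examples like this one.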
• We have people in the following four different classes:
• Pink: This refers to the people who have cancer and had a negative test
outcome
• Purple (A intersect B): These people have cancer and had a positive test
outcome
• Blue: This refers to the people with no cancer and a positive test outcome
• White: This refers to the people with no cancer and a negative test outcome
• So, effectively, the only times the test was accurate were in the white and purple regions.
• In the blue and pink regions, the test was incorrect.
Conditional Probability
• Select an arbitrary person from this study of 100 people. Assume that their test result was positive.
• What is the probability of them actually having cancer?
– So the event B has already taken place: their test came back positive.
• The question now is: what is the probability that they have cancer (event A), given this information?
• This is called the conditional probability of A given B, or P(A|B).
• Conditional Probability: the probability of an event given that another event has already happened.
• We can think of conditional probability as changing the relevant universe.
• P(A|B) (probability of A given B) is a way of saying, given that the entire universe is now B, what is the probability of A?
– This is also known as transforming the sample space.
Our universe is now B, and we are concerned with AB (A and B) inside of B.

• The formula can be given as follows:
P(A|B) = P(A and B) / P(B)
= (20/100) / (30/100)
= 20/30
≈ .67
= 67%
• There is a 67% chance that if a test result came back positive, that person had cancer.
• In reality, this is the main probability that the experimenters want. They want to
know how good the test is at predicting cancer.
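The conditional probability can be computed exactly with Python's standard `fractions` module. This is a small sketch using the counts from the cancer study; the variable names are illustrative.

```python
from fractions import Fraction

p_b = Fraction(30, 100)         # P(B): test came back positive
p_a_and_b = Fraction(20, 100)   # P(A and B): has cancer and tested positive

# P(A|B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b, "≈", round(float(p_a_given_b), 3))
```

Using exact fractions avoids the rounding question of whether 20/30 should be written as .66 or .67.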
The Rules of Probability
• These rules help us calculate compound probabilities with ease.
The addition rule
• The addition rule is used to calculate the probability of either or events.
• To calculate
– P(A ∪ B) = P(A or B), we use the following formula:
– P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
• To get the union of the two events, we add together the areas of the two circles in the universe.
• We subtract P(A and B) because when we add the two circles, we are counting the area of intersection twice, as shown in the following diagram:
• If A is the event that someone has cancer, and B is that the test result was positive,
we have:
P(A or B) = P(A) + P(B) – P(A and B)
= .25 + .30 - .2 = .35
• This was calculated before visually in the diagram.
Addition Rule of Probability
Addition Rule: If A and B are two events in a probability experiment, then the probability that either one of the events will occur is:
P(A or B) = P(A) + P(B) − P(A and B)
Venn diagram representation: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
• P(A ∪ B) – Probability that either A or B happens
• P(A) – Probability of Event A
• P(B) – Probability of Event B
• P(A ∩ B) – Probability of A and B happening together
Mutually Exclusive Events: Events are mutually exclusive or disjoint if they cannot occur at the same time.
• For mutually exclusive events A and B, P(A ∩ B) = 0
Example: On a six-sided die, each side has a number between 1 and 6. What is the probability of throwing 3 or 4?
The chance of rolling either 3 or 4 is: 1/6 + 1/6 = 2/6 = 1/3
Example:
• If a single card is drawn from a regular pack of cards, what is the probability that the card is either a queen or a spade?
Solution:
Let X be the event of picking a queen and Y be the event of picking a spade.
P(X) = 4/52
P(Y) = 13/52
The two events are not mutually exclusive, as there is one favorable outcome in which the card is both a queen and a spade (the queen of spades).
P(X and Y) = 1/52
Based on the addition rule:
P(X or Y) = 4/52 + 13/52 − 1/52 = 16/52
P(X or Y) = 4/13
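The card example can be verified by building the deck and counting. This is a sketch; the rank and suit labels are arbitrary strings chosen for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))            # 52 (rank, suit) pairs

queens = {card for card in deck if card[0] == "Q"}
spades = {card for card in deck if card[1] == "spades"}

# Direct count of the union versus the addition rule
direct = Fraction(len(queens | spades), len(deck))
by_rule = (Fraction(len(queens), len(deck))
           + Fraction(len(spades), len(deck))
           - Fraction(len(queens & spades), len(deck)))

print(direct, by_rule)
```

Both routes should agree; if the overlap term were left out, the queen of spades would be counted twice.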
• Mutual exclusivity
– We say that two events are mutually exclusive if they cannot occur at the
same time.
– This means that A ∩ B = ∅, that is, the intersection of the events is the empty set.
– When this happens, P(A∩B) = P(A and B) = 0.
– If two events are mutually exclusive, then:
• P(A ∪ B) = P(A or B)
= P(A) + P(B) − P(A ∩ B)
= P(A) + P(B)
This makes the addition rule much easier.

• Examples of mutually exclusive events:
• Today is Saturday and today is Wednesday
• I failed Econ 101 and I passed Econ 101
• None of these events can occur simultaneously.
The multiplication rule
• The multiplication rule is used to calculate the probability of and events.
• To calculate
– P(A ∩ B) = P(A and B), we use the following formula:
– P(A ∩ B) = P(A and B) = P(A) · P(B|A)
• Why do we use B|A instead of B? This is because it is possible that B depends on A.

• In the cancer trial example, let's find P(A and B). Of the 25 people with cancer, 20 tested positive, so P(B|A) = 20/25 = .8.
• The equation will be as follows:
P(A ∩ B) = P(A and B)
= P(A) · P(B|A)
= .25 * .8
= .2
= 20%
• This was calculated before visually.
• For example, of a randomly selected set of 10 people, 6 have iPhones and 4 have Androids. What is the probability that if I randomly select two people, they both will have iPhones?
• This example can be retold using event spaces, as follows:
• 2 EVENTS:
– A: the event that I choose a person with an iPhone first
– B: the event that I choose a person with an iPhone second
• P(A and B): P(choose a person with an iPhone and then another person with an iPhone)
• So use the P(A and B) = P(A) · P(B|A) formula.
• P(A): 6 out of 10 people have iPhones, so P(A) = 6/10 = 3/5 = 0.6.
• But we only have 9 people left to choose our second person from, because one was taken away.
– So in our new transformed sample space, we have 9 people in total, 5 with iPhones and 4 with Androids, making P(B|A) = 5/9 ≈ .555.
– So, the probability of choosing two people with iPhones is 0.6 * 0.555 = 0.333 = 33%.
– So there is a 1/3 chance of choosing two people with iPhones out of 10.
• The conditional probability is very important in the multiplication rule.
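The iPhone example can be cross-checked two ways: with the multiplication rule, and by counting every possible pair of people. This is a sketch; the list `people` is a made-up stand-in for the 10 people in the example.

```python
from fractions import Fraction
from itertools import combinations

# Stand-in population: 6 iPhone owners and 4 Android owners
people = ["iPhone"] * 6 + ["Android"] * 4

# Multiplication rule: P(A and B) = P(A) * P(B|A)
p_rule = Fraction(6, 10) * Fraction(5, 9)

# Brute force: count the pairs in which both people have iPhones
pairs = list(combinations(range(len(people)), 2))
both_iphone = [p for p in pairs
               if people[p[0]] == "iPhone" and people[p[1]] == "iPhone"]
p_count = Fraction(len(both_iphone), len(pairs))

print(p_rule, p_count)
```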
Independence
• Two events are independent if one event does not affect the outcome of the other, that is, P(B|A) = P(B) and P(A|B) = P(A).
• If two events are independent, then:
P(A ∩ B) = P(A) · P(B|A) = P(A) · P(B)
Example: Flip a coin and get heads, then flip another coin and get tails.
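The two-coin example can be used to check the independence condition by enumeration. This is a sketch assuming both coins are fair, so the four ordered outcomes are equally likely; the helper `prob` is defined here for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))      # (first coin, second coin)

A = {o for o in outcomes if o[0] == "H"}      # first coin shows heads
B = {o for o in outcomes if o[1] == "T"}      # second coin shows tails

def prob(event):
    return Fraction(len(event), len(outcomes))

# Independence: P(A and B) equals P(A) * P(B)
print(prob(A & B), prob(A) * prob(B))
print(prob(A & B) == prob(A) * prob(B))
```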
Complementary events
• The complement of A is the opposite or negation of A.
• If A is an event, Ā represents the complement of A.
• For example, if A is the event where someone has cancer, Ā is the event where someone is cancer free.
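Complementary events:
Example: here P(2) = 1/36 and P(3) = 2/36 are the probabilities that two dice sum to 2 and to 3, so A is the event that the sum is greater than 3.
P(A) = 1 – (P(2) + P(3))
= 1 – (1/36 + 2/36)
= 1 – (3/36)
= 33/36
≈ .92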
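The complement calculation can be confirmed by listing all 36 rolls. This sketch assumes that P(2) and P(3) above refer to the sum of two fair dice, which is what the values 1/36 and 2/36 suggest, so that A is the event "the sum is greater than 3".

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # 36 equally likely rolls

# Complement route: 1 - P(sum is 2 or 3)
p_sum_2_or_3 = Fraction(sum(1 for a, b in rolls if a + b <= 3), len(rolls))
p_a_complement_rule = 1 - p_sum_2_or_3

# Direct route: count rolls whose sum is greater than 3
p_a_direct = Fraction(sum(1 for a, b in rolls if a + b > 3), len(rolls))

print(p_a_complement_rule, p_a_direct, round(float(p_a_direct), 2))
```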
Difference Between Mutually Exclusive and
Independent Events
• Mutually exclusive events are two events that cannot occur at the same time, whereas independent events are events where one remains unaffected by the occurrence of the other.
• An example of mutually exclusive events: when a coin is tossed, there are two events that can occur, either it will be a head or a tail. Hence, both the events here are mutually exclusive. But if we take two separate coins and flip them, then the occurrences of Head or Tail on the two coins are independent of each other.
Mutually exclusive events | Independent events
When two events cannot occur simultaneously, they are termed mutually exclusive events. | When the occurrence of one event does not control the happening of the other event, it is termed an independent event.
The mathematical formula for mutually exclusive events can be represented as P(X and Y) = 0 | The mathematical formula for independent events can be defined as P(X and Y) = P(X) P(Y)
The sets will not overlap in the case of mutually exclusive events. | The sets will overlap in the case of independent events.
Binary Classifier and Confusion Matrix
Binary classifier: Trying to predict from only two options: have cancer or no
cancer.
• When we are dealing with binary classifiers, we can draw confusion matrices, which are 2 x 2 matrices that comprise all four possible outcomes of our experiment.
Example:
• Let's try some different numbers. Let's say 165 people walked in for the study. So, our n (sample size) is 165 people. All 165 people are given the test, and whether they actually have cancer is established through various other means.
• The confusion matrix shows us the results of this experiment:
• The true positives are the tests correctly predicting positive (cancer) == 100
• The true negatives are the tests correctly predicting negative (no cancer) == 50
• The false positives are the tests incorrectly predicting positive (cancer) == 10
• The false negatives are the tests incorrectly predicting negative (no cancer) == 5
• The first two classes indicate where the test was correct or true.
• The last two classes indicate where the test was incorrect or false.
• False positives are sometimes called a Type I error

• False negatives are called a Type II error.
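The four counts can be arranged and summarized in a few lines. This is a sketch using only the numbers listed above; the variable names and the "fraction correct" summary are illustrative additions, not part of the original slide.

```python
# Counts from the example: true/false positives and negatives
tp, tn, fp, fn = 100, 50, 10, 5

n = tp + tn + fp + fn                 # total people in the study
fraction_correct = (tp + tn) / n      # test was right (first two classes)
fraction_type_1 = fp / n              # false positives (Type I errors)
fraction_type_2 = fn / n              # false negatives (Type II errors)

print("n =", n)
print("correct:", round(fraction_correct, 3))
print("Type I: ", round(fraction_type_1, 3))
print("Type II:", round(fraction_type_2, 3))
```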
Probability Examples
Try it yourself
• Use the chart below showing the gender and their class in a school for
30 students in a Maths class.
• Find the solution for the given problems:
What is the probability that a randomly selected student is a female?

•1/2
•7/13
•8/15
• What is the probability that a randomly selected student is a senior?
• 1/2
• 9/16
• 17/30
• What is the probability that a randomly selected student is a female and a senior?
• 1/4
• 3/10
• 9/16
• What is the probability that a randomly selected student is a female or a senior?
• 3/4
• 4/5
• 11/10
• What is the probability that a randomly selected female student is a senior?
• 1/2
• 9/17
• 9/16
• What is the probability that a randomly selected senior
student is a female?
• 1/2
• 9/17
• 9/16
• Suppose that two students are randomly selected from the class without replacement. What is the probability that both students are female?
• 1/4
• 8/29
• 64/225
Summary
• Set notation to denote probabilities: When events A and B exist in the same universe, we can use intersections and unions to represent them happening at the same time or to represent one happening versus the other.
• Applied the Frequentist approach, and expressed the basics of experimentation and using probability to predict outcomes.
Advanced Probability
• Advanced Probability
– To explore more complicated theorems of probability and how we can
use them in a predictive capacity.
– Bayes theorem and random variables, give rise to common machine
learning algorithms, such as the Naïve Bayes algorithm
Concepts:
• Exhaustive events
• Bayes theorem
• Basic prediction rules
• Random variables
Collectively exhaustive events

• When given a set of two or more events, if at least one of the events must occur, then such a set of events is said to be collectively exhaustive.
Consider the following examples:
• Given a set of events {temperature < 60, temperature > 90}, these events are not collectively exhaustive because there is a third option that is not given in this set of events: the temperature could be between 60 and 90.
– However, they are mutually exclusive because both cannot happen at the same time.
• In a die roll, the set of events of rolling a {1, 2, 3, 4, 5, or 6} is collectively exhaustive because these are the only possible events, and at least one of them must happen.
Bayesian Approach:
• When applying Bayes, the following three things are considered along with
how they all interact with each other:
• A prior distribution
• A posterior distribution
• A likelihood
• Basically, we are concerned with finding the posterior. That's the thing we want to know.
• We have a prior probability, or what we naively think about a hypothesis, and then we have a posterior probability, which is what we think about a hypothesis, given some data.
Bayes theorem
• Bayes theorem is the big result of Bayesian inference.
• Consider the following (Already known):
• P(A) = The probability that event A occurs
• P(A|B) = The probability that A occurs, given that B occurred
• P(A, B) = The probability that A and B both occur
• P(A, B) = P(A) * P(B|A) [The probability that A and B occur is the probability that A occurs times the probability that B occurs, given that A already occurred]
• It's from that last point that Bayes theorem takes its meaning.
We know that:
P(A, B) = P(A) * P(B|A)
P(B, A) = P(B) * P(A|B)
P(A, B) = P(B, A)
So:
P(B) * P(A|B) = P(A) * P(B|A)
• Dividing both sides by P(B) gives Bayes theorem:
P(A|B) = P(A) * P(B|A) / P(B)
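As a sanity check, the relation P(B) * P(A|B) = P(A) * P(B|A) can be rearranged for P(A|B) and applied to the earlier cancer study numbers (25 with cancer, 30 positive tests, 20 in both). This is a sketch with exact fractions; it should reproduce the 20/30 found earlier by transforming the sample space.

```python
from fractions import Fraction

p_a = Fraction(25, 100)          # P(A): has cancer
p_b = Fraction(30, 100)          # P(B): test positive
p_b_given_a = Fraction(20, 25)   # P(B|A): positive among those with cancer

# Rearranged: P(A|B) = P(A) * P(B|A) / P(B)
p_a_given_b = p_a * p_b_given_a / p_b
print(p_a_given_b)
```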
Bayes using the terms hypothesis and data:

• Suppose H = your hypothesis about the given data and D = the data given.
• Bayes can be interpreted as trying to figure out P(H|D) (the probability that
our hypothesis is correct, given the data).
To use terminology from before:
• P(H) - the probability of the hypothesis before we observe the data, called
the prior probability or just prior
• P(H|D) - what we want to compute, the probability of the hypothesis after
we observe the data, called the posterior
• P(D|H) - the probability of the data under the given hypothesis, called the
likelihood
• P(D) - the probability of the data under any hypothesis (the normalizing constant)
• Putting these together: P(H|D) = P(D|H) * P(H) / P(D)
Bayes Theorem
• Applications: Bayes theorem shows up in a lot of applications, usually
when we need to make fast decisions based on data and probability. Most
recommendation engines, such as Netflix's, use some elements of
Bayesian updating.
Example – Titanic Data
• A very famous dataset involves looking at the survivors of the sinking of
the Titanic in 1912. We will use an application of probability in order to
figure out if there were any demographic features that showed a
relationship to passenger survival.
• Mainly, we are curious to see if we can isolate any features of our dataset
that can tell us more about the types of people who were likely to survive
this disaster.
• Each row represents a single passenger on the ship, and, for now, we are
looking at two specific features: the gender of the individual and whether
or not they survived.
• For example, the first row represents a man who did not survive while the fourth row (with index 3) represents a woman who did survive.
Python code – survival probabilities of passengers by gender
import pandas as pd

titanic = pd.read_csv('C:/Users/Nithya/Desktop/titanic.csv')  # read in a csv
titanic = titanic[['Sex', 'Survived']]  # keep the Sex and Survived columns
print(titanic.head())
num_rows = float(titanic.shape[0])  # total number of passengers
print(num_rows)
p_survived = (titanic.Survived == 1).sum() / num_rows  # P(survived)
print(p_survived)
p_notsurvived = 1 - p_survived  # complement rule
print(p_notsurvived)
p_male = (titanic.Sex == "male").sum() / num_rows  # P(male)
print(p_male)
p_female = 1 - p_male  # == .35
print(p_female)
number_of_women = titanic[titanic.Sex == 'female'].shape[0]
print(number_of_women)
women_who_lived = titanic[(titanic.Sex == 'female') & (titanic.Survived == 1)].shape[0]
print(women_who_lived)
# conditional probability P(survived | woman)
p_survived_given_woman = women_who_lived / number_of_women
print(p_survived_given_woman)
Medical test example:
Suppose a test is 95% accurate when a disease is present and 97% accurate when the disease is absent.
Suppose that 1% of the population has the disease.
What is P(have the disease | test +)?
P(dis | test+) = P(dis) P(test+ | dis) / [ P(dis) P(test+ | dis) + P(¬dis) P(test+ | ¬dis) ]
= (0.01)(0.95) / [ (0.01)(0.95) + (0.99)(0.03) ]
= 0.0095 / (0.0095 + 0.0297)
≈ 0.24
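The medical test calculation can be wrapped in a small function. This is a sketch; the function name `posterior` and the parameter names `sensitivity` (accuracy when the disease is present) and `specificity` (accuracy when it is absent) are labels introduced here, not terms from the example.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | test+) via Bayes theorem."""
    true_pos = prior * sensitivity                # P(dis) * P(test+ | dis)
    false_pos = (1 - prior) * (1 - specificity)   # P(not dis) * P(test+ | not dis)
    return true_pos / (true_pos + false_pos)

p = posterior(prior=0.01, sensitivity=0.95, specificity=0.97)
print(round(p, 2))
```

Trying a larger prior, such as 0.10, shows how strongly the answer depends on how common the disease is.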
Typical statistics problem: There is a parameter, θ, that we want to estimate, and we have some data.
Traditional (frequentist) methods: Study and describe P(data | θ). If the data are unlikely for a given θ, then state “that value of θ is not supported by the data.” (A hypothesis test asks whether a particular value of θ might be correct; a CI presents a range of plausible values.)
Bayesian methods: Describe the distribution P(θ | data).
A frequentist thinks of θ as fixed (but unknown) while a Bayesian thinks of θ as a random variable that has a distribution.
Bayesian reasoning is natural and easy to think about. It is becoming much more commonly used.
Review of Simple Probability
• The probability of a simple event is the ratio of the number of favorable outcomes for the event to the total number of possible outcomes.
• The probability of an event a can be expressed as:
P(a) = (number of favorable outcomes) / (total number of possible outcomes)
Find Outcomes of simple events
• For Simple Events – count the outcomes
• Examples:
One Die- 6 outcomes
One coin- 2 outcomes
One deck of cards- 52 outcomes; 4 Aces, 12 face cards, and 36 non-face cards (i.e., 2-10)
Basic Probabilities
• What is the probability of drawing a black ace from a deck of cards?
• 2/52 = .0385 or 3.85%
Compound Events
• Events that cannot occur at the same time are called mutually exclusive.
• Suppose you want to find the probability of rolling a 2 or a 4 on a die: P(2 or 4).
• Since a die cannot show both a 2 and a 4 at the same time, the events are mutually exclusive.
Compound Mutually Exclusive
Addition Rule of Probability: Two or More Events
• P(A or B) = P(A) + P(B) – P(A and B)
• Example: What is the probability of drawing an Ace of Spades or a 2 from a shuffled deck of cards?
• Solution: the events are mutually exclusive, so 1/52 + 4/52 = 5/52 = .0962 = 9.62%
• What is the probability of drawing an Ace OR a Spade from a deck?
• Solution: Ace = 4/52 and Spade = 13/52, minus 1/52 for the Ace of Spades, which is counted twice: 4/52 + 13/52 – 1/52 = 16/52 = .3077 or 30.77%.
Additive Rule… more challenging problem
First: Create Your Sample Space
  1     2     3     4     5     6
1 (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
2 (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
3 (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
4 (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
5 (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
6 (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

Now solve the
problem….
Probability of Compound events
P(jack, tails) = (4/52)(1/2) = 4/104 ≈ 0.04 = 4%
Compound Event Notations
Compound Events
• When the outcome of one event does not affect the outcome of a second event, these are called independent events.
• The probability of two independent events both occurring is found by multiplying the probability of the first event by the probability of the second event: P(A and B) = P(A) · P(B). If the events are not independent, the second factor is replaced by P(B|A), the probability of B given that A has already occurred.
Example Problems to TRY!!
• What is the probability of drawing an Ace
and a King from a deck of cards?
• What is the probability of drawing an Ace
OR a King from a deck of cards?
• What is the probability of drawing a 2 OR a
Heart from a deck of cards?
• Find the probability of rolling 3 consecutive 3s with a fair die.
Random Variables
• Discrete Random Variables
• Binomial
• Geometric
• Poisson
• Continuous Random Variables
Definition of a Random Variable
• Random variable
– Assigns a numerical value to each outcome of a particular experiment
– It might take on multiple values depending on the environment
• The main distinction between an ordinary variable and a random variable is the fact that a random variable's value may change depending on the situation
• Each value that a random variable might take on is associated with a probability
• A random variable uses real numerical values to describe a probabilistic event
Random Variables
• Probability distribution of a random variable
– which gives the variable's possible values and their probabilities
Example:
• X = the outcome of a dice roll
• Y = the revenue earned by a company this year
• Z = the score of an applicant on an interview coding quiz (0-100%)
• A random variable is a function that maps each outcome in the sample space (the set of all possible outcomes) to a numerical value; its probability distribution then assigns each value a probability (between 0 and 1).
Random Variables
Discrete random variables
• A discrete random variable only takes on a countable number of possible values.
• Probability Mass Function (PMF) - used to describe a discrete random variable
– P(X = x) = PMF
• Example: for a die roll, P(X = 1) = 1/6 and P(X = 5) = 1/6.
Example - Probability distribution of a discrete random variable
Suppose a variable X can take the values 1, 2, 3, or 4. The probabilities
associated with each outcome are described by the following data:
Outcome 1 2 3 4
Probability 0.1 0.3 0.4 0.2
• The probability that X is equal to 2 or 3 is the sum of the two probabilities:
P(X = 2 or X = 3) = P(X = 2) + P(X = 3)
= 0.3 + 0.4 = 0.7
• Similarly, the probability that X is greater than 1 is equal to
P(X > 1) = 1 - P(X <= 1)
= 1 - 0.1
= 0.9 (by the complement rule)
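A probability mass function for a small discrete variable can be stored as a dictionary from value to probability. This is a sketch of the example above; the name `pmf` is illustrative.

```python
import math

# PMF from the example: value -> probability
pmf = {1: 0.1, 2: 0.3, 3: 0.4, 4: 0.2}

# A valid PMF sums to 1 (compare with a tolerance because of floating point)
assert math.isclose(sum(pmf.values()), 1.0)

p_2_or_3 = pmf[2] + pmf[3]          # P(X = 2 or X = 3)
p_greater_than_1 = 1 - pmf[1]       # complement rule: 1 - P(X <= 1)

print(round(p_2_or_3, 2), round(p_greater_than_1, 2))
```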
Random Variables
Discrete random variables
Properties of Random variables:
– Expected value and the variance.
• The expected value of a random variable defines the mean

value of a long run of repeated samples of the random
variable.
• This is sometimes called the mean of the variable.

Expectation of a random variable – Mean
• The expected value (or mean) of X, where X is a discrete random variable, is a weighted average of the possible values that X can take, each value being weighted according to the probability of that event occurring. The expected value of X is usually written as E(X) or μ.
E(X) = Σ x · P(X = x)
• So the expected value is the sum of: [(each of the possible outcomes) × (the probability of the outcome occurring)].
• In more concrete terms, the expectation is what you would expect the outcome of an experiment to be on average.
Mean of a Discrete Random Variable
• The mean of the discrete random variable X is also called the expected value of X. In notation, the expected value of X is denoted by E(X). Use the following formula to compute the mean of a discrete random variable.
E(X) = μx = Σ [ xi * P(xi) ]
• where xi is the value of the random variable for outcome i, μx is the mean of random variable X, and P(xi) is the probability that the random variable takes the value xi.
Expectation of a random variable – Mean
• To calculate the Expected Value:
– multiply each value by its probability
– sum them up
• Example: Find expected value of discrete random variable x.
x 1 2 3 4 5 6
p 0.1 0.1 0.1 0.1 0.1 0.5
• Solution:
x 1 2 3 4 5 6
p 0.1 0.1 0.1 0.1 0.1 0.5
xp 0.1 0.2 0.3 0.4 0.5 3
• μ = Σxp = 0.1+0.2+0.3+0.4+0.5+3 = 4.5

The expected value is 4.5
Example: In a little league softball game, each player went to bat 4 times. The number of hits made by each player is described by the following probability distribution. What is the mean of the probability distribution?
Number of hits, x | Probability, P(x)
0 | 0.10
1 | 0.20
2 | 0.30
3 | 0.25
4 | 0.15
• Solution
E(X) = Σ [ xi * P(xi) ]
E(X) = 0*0.10 + 1*0.20 + 2*0.30 + 3*0.25 + 4*0.15
= 2.15
Example: What is the expected value when we roll a fair die?
• Solution:
• There are six possible outcomes: 1, 2, 3, 4, 5, 6. Each of these has a
probability of 1/6 of occurring. Let X represent the outcome of the
experiment.
• Therefore P(X = 1) = 1/6 (this means that the probability that the
outcome of the experiment is 1 is 1/6)
P(X = 2) = 1/6 (the probability that you throw a 2 is 1/6)
• E(X) = 1×P(X = 1) + 2×P(X = 2) + 3×P(X = 3) + 4×P(X=4) + 5×P(X=5) +
6×P(X=6)
• Therefore E(X) = 1/6 + 2/6 + 3/6 + 4/6 + 5/6 + 6/6 = 7/2
• So the expectation is 3.5 .
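As a quick check of the fair-die calculation, the sketch below repeats the sum with exact fractions, so the result comes out as 7/2 rather than a rounded decimal. The variable names are illustrative only.

```python
from fractions import Fraction

# Each face of a fair die has probability 1/6.
faces = range(1, 7)
p_face = Fraction(1, 6)

# E(X) = 1*P(X=1) + 2*P(X=2) + ... + 6*P(X=6)
expectation = sum(face * p_face for face in faces)

print(expectation)         # exact value as a fraction
print(float(expectation))  # the same value as a decimal
```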
Expectation of a random variable – Variance
• The variance of a discrete random variable X measures the spread, or
variability, of the distribution, and is defined by Var(X) = Σx²p − μ²
• The variance of a random variable tells us something about the spread of
the possible values of the variable. For a discrete random variable X, the
variance of X is written as Var(X).
Var(X) = E[(X − μ)²], where μ is the expected value E(X)
• This can also be written as:
Var(X) = E(X²) − μ²
• The standard deviation of X is the square root of Var(X).

Expectation of a random variable – Mean, Variance
To calculate the Variance:
• square each value and multiply by its probability
• sum them up to get Σx²p
• then subtract the square of the expected value, μ²
• Example: Find the variance and standard deviation.
x 1 2 3 4 5 6
p 0.1 0.1 0.1 0.1 0.1 0.5
• Solution:
x 1 2 3 4 5 6
p 0.1 0.1 0.1 0.1 0.1 0.5
x²p 0.1 0.4 0.9 1.6 2.5 18
Σx²p = 0.1+0.4+0.9+1.6+2.5+18 = 23.5
Var(X) = Σx²p − μ² = 23.5 − 4.5² = 23.5 − 20.25 = 3.25 (μ = 4.5 from the expected value example)
The variance is 3.25
σ = √Var(X) = √3.25 ≈ 1.803
The standard deviation is 1.803
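The variance recipe above (sum of x squared times p, minus the squared mean) can be sketched as follows. The helper name mean_var_sd is hypothetical and the code assumes the probabilities already sum to 1.

```python
import math


def mean_var_sd(values, probs):
    """Return (mean, variance, standard deviation) of a discrete random variable."""
    mu = sum(x * p for x, p in zip(values, probs))       # E(X)
    ex2 = sum(x * x * p for x, p in zip(values, probs))  # E(X^2), i.e. sum of x^2 * p
    var = ex2 - mu ** 2                                  # Var(X) = E(X^2) - mu^2
    return mu, var, math.sqrt(var)


x = [1, 2, 3, 4, 5, 6]
p = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]

mu, var, sd = mean_var_sd(x, p)
print(round(mu, 2), round(var, 2), round(sd, 3))
```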
Expectation of a random variable – Mean, Variance
Example:
x 1 2 3 4
p(x) .10 .30 .40 .20
1. Find E(X), the expectation of X.
2. Find var(X) and sd(X), the variance and standard deviation of X.
Answer:
E(X) = (.10)(1) + (.30)(2) + (.40)(3) + (.20)(4) = 2.7.
E(X²) = (.10)(1) + (.30)(4) + (.40)(9) + (.20)(16) = 8.1, so
Var(X) = E(X²) − [E(X)]² = 8.1 − 7.29 = .81.
sd(X) = √Var(X) = 0.9.
Expectation of a random variable – Mean, Variance
Example: A project is scored on a success scale X from 0 (complete failure) to 4 (great
success). The project has a 2% chance of failing completely and a 26% chance of
being a great success; the full distribution is P(X = 0) = 0.02, P(X = 1) = 0.07,
P(X = 2) = 0.25, P(X = 3) = 0.40, P(X = 4) = 0.26. Calculate the expected value of
success and its variation. Also find the chance that the project will have a success
score of 3 or higher.
Solution:
E[X] = 0(0.02) + 1(0.07) + 2(0.25) + 3(0.4) + 4(0.26) = 2.81
So, the manager can expect a success score of about 2.81 for this project.
E[X²] = 0(0.02) + 1(0.07) + 4(0.25) + 9(0.4) + 16(0.26) = 8.83
Variance = V[X] = E[X²] − (E[X])² = 8.83 − 2.81² ≈ 0.93
Standard deviation = √V[X] ≈ 0.97
We could say that the project will have an expected score of 2.81 plus or minus 0.97
(one standard deviation), meaning that we can expect something between about 1.84 and 3.78.
To find the chance of a success score of 3 or higher:
P(X >= 3) = P(X = 3) + P(X = 4) = 0.40 + 0.26 = 0.66 = 66%
• This means that we have a 66% chance that the project will rate as either a 3 or a 4.
• Another way to calculate this is to use the complement, as shown here:
P(X >= 3) = 1 – P(X < 3)
P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) = 0.02 + 0.07 + 0.25 = 0.34
1 – P(X < 3) = 1 – 0.34 = 0.66 = P(X >= 3)
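The project example can be reproduced with a short sketch. It takes the probabilities from the E[X] calculation above, uses the definition Var(X) = E[(X − μ)²], and computes P(X >= 3) both directly and through the complement. The spread in the score's own units is the standard deviation, the square root of the variance. Variable names are illustrative.

```python
import math

scores = [0, 1, 2, 3, 4]
probs = [0.02, 0.07, 0.25, 0.40, 0.26]

# Expected value: sum of score * probability.
mean = sum(x * p for x, p in zip(scores, probs))

# Variance from the definition E[(X - mean)^2]; standard deviation is its square root.
variance = sum((x - mean) ** 2 * p for x, p in zip(scores, probs))
sd = math.sqrt(variance)

# P(X >= 3) computed directly and via the complement 1 - P(X < 3).
p_at_least_3 = sum(p for x, p in zip(scores, probs) if x >= 3)
p_complement = 1 - sum(p for x, p in zip(scores, probs) if x < 3)

print(round(mean, 2), round(variance, 2), round(sd, 2))
print(round(p_at_least_3, 2), round(p_complement, 2))
```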
Types of Discrete Random Variables:
• Binomial
• Geometric
• Poisson
Binomial random variables:
A binomial setting has the following four conditions:
• Each trial has exactly two possible outcomes: success or failure
• The trials are independent: the outcome of one trial cannot affect the outcome of another
• The number of trials is fixed in advance (a fixed sample size)
• The chance of success on each trial is always the same value, p
• A binomial random variable is a discrete random variable, X, that counts
the number of successes in a binomial setting. The parameters are n = the
number of trials and p = the chance of success of each trial.
Binomial Distribution - PMF
Probability Mass Function - The binomial distribution formula is:
b(x; n, P) = nCx * P^x * (1 – P)^(n – x)
Where:
b = binomial probability
x = total number of “successes” (for example passes, or heads)
P = probability of success on an individual trial
n = number of trials
nCx = n! / (x! (n – x)!), the number of ways to choose x successes from n trials
or
• A binomial random variable, X, is written as X∼Bin(n,p)
• The probability mass function is given as
P(X = x) = nCx * p^x * (1 – p)^(n – x), for x = 0, 1, ..., n
Binomial Distribution - Graph
b(x; n, P) = nCx * P^x * (1 – P)^(n – x)
Binomial Distribution for n=10, p=0.3
x    p(x)
0    0.028248
1    0.121061
2    0.233474
3    0.266828
4    0.200121
5    0.102919
6    0.036757
7    0.009002
8    0.001447
9    0.000138
10   5.9E-06
[Column chart: binomial distribution for n=10, p=0.3; horizontal axis: value of x (0 to 10); vertical axis: P(x)]
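The table for n=10, p=0.3 can be regenerated from the binomial formula. The sketch below uses math.comb from the Python standard library for nCx; the function name binomial_pmf is an illustrative choice.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, 0.3
table = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(table):
    print(x, f"{prob:.6f}")
```

The probabilities over x = 0, ..., n should sum to 1, which is a useful sanity check on any PMF table.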
Binomial random variables – Example Problems:
The probability mass function (PMF) for a binomial random variable:
P(X = x) = nCx * p^x * q^(n – x), where
– p = probability of success in one trial
– q = probability of failure in one trial (q = 1 – p)
– n = number of trials
– x = number of successes in n trials
Example – restaurant openings
• A new restaurant in a town has a 20% chance of surviving its first year. If 14
restaurants open this year, find the probability that exactly four restaurants survive
their first year of being open to the public.
Binomial random variables:
Solution:
First, we should verify that this is a binomial setting:
• The possible outcomes are either success or failure (the restaurants either survive
or not)
• The outcomes of trials cannot affect the outcome of another trial (assume that the
opening of one restaurant doesn't affect another restaurant's opening and survival)
• The number of trials is fixed (14 restaurants opened)
• The chance of success of each trial must always be p (we assume that it is always
20%)
The two parameters are n = 14 and p = 0.2.
P(X = 4) = 14C4 * (0.2)^4 * (0.8)^10 = 1001 * 0.0016 * 0.1074 ≈ 0.172
So, there is about a 17% chance that exactly 4 of these restaurants will survive.

Binomial random variables
Example – blood types
• A family has a 25% chance of having a child with type O blood. What is the
chance that 3 of their 5 kids have type O blood?
Solution:
• Let X = the number of children with type O blood with n = 5 and p = 0.25, as shown
here:
P(X = 3) = 5C3 * (0.25)^3 * (0.75)^2 = 10 * (0.25)^3 * (0.75)^2 ≈ 0.088
• We can calculate this probability for the values of 0, 1, 2, 3, 4, and 5 to get a sense
of the probability distribution:
x          0       1       2       3       4       5
P(X = x)   0.2373  0.3955  0.2637  0.0879  0.0146  0.0010
• From here, we can calculate an expected value and the variance of this variable:
E(X) = np = 5(0.25) = 1.25
Var(X) = np(1 – p) = 5(0.25)(0.75) = 0.9375
• So, this family can expect to have probably 1 or 2 kids with type O blood!
Binomial random variables
Example – blood types
• What if we want to know the probability that at least 3 of their kids have
type O blood?
Solution:
• To know the probability that at least three of their kids have type O blood,
we can use the following formula for discrete random variables:
P(X >= 3) = P(X = 3) + P(X = 4) + P(X = 5) = 0.0879 + 0.0146 + 0.0010 ≈ 0.1035
• So, there is about a 10% chance that at least three of their kids have type O
blood.
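The blood type example (n = 5, p = 0.25) can be worked through end to end in a few lines. This sketch builds the full distribution, computes the mean and variance from their definitions, and sums the tail for "at least 3". For a binomial variable the mean and variance should agree with np and np(1 − p). Names are illustrative.

```python
from math import comb

n, p = 5, 0.25

# Full distribution: P(X = x) for x = 0..5.
pmf = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

# Mean and variance from the definitions for a discrete random variable.
mean = sum(x * px for x, px in pmf.items())
variance = sum((x - mean) ** 2 * px for x, px in pmf.items())

# "At least 3" means X = 3, 4 or 5.
p_at_least_3 = sum(px for x, px in pmf.items() if x >= 3)

for x, px in pmf.items():
    print(x, f"{px:.4f}")
print(mean, variance, round(p_at_least_3, 4))
```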
Binomial random variable Applications:
• A binomial random variable is a discrete random variable that
counts the number of successes in a binomial setting.
• It is used in a wide variety of data-driven experiments, such as
counting the number of people who will sign up for a
website given a chance of conversion, or even, at a simple
level, predicting stock price movements given a chance of
decline.
More Binomial Problems
Binomial…
• There are 6 trials, so n=6. We are concerned with
only 2 of them being color blind, so x will be 2. The probability of a
‘success’ (being color blind) is .09, and the probability of a failure (not
being color blind) is 1–.09 = .91
• p=.09
• q=.91
• n=6
• x=2
Binomial Distribution Problems
The probability of getting a question correct (a
success) is 1/5 or .20. The probability of not
getting a question correct (a failure) is 4/5 or .80.
Binomial
p=.20
q= .80
n=18
x=5
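The slides list the parameters for these two problems but not the final answers. Assuming each question asks for the probability of exactly x successes in n trials, the values could be computed as below. The function name and the variable names are hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Color blindness problem: n = 6, x = 2, p = 0.09 (assumed to ask for exactly 2).
color_blind = binomial_pmf(2, 6, 0.09)

# Quiz problem: n = 18, x = 5, p = 0.20 (assumed to ask for exactly 5 correct).
quiz = binomial_pmf(5, 18, 0.20)

print(f"{color_blind:.4f}")
print(f"{quiz:.4f}")
```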
2. Geometric random variables
• It is actually quite similar to the binomial random variable in that we are
concerned with a setting in which a single event is occurring over and over.
• The major difference is that we are not fixing the sample size.
Four conditions:
• The possible outcomes are either success or failure
• The outcomes of trials cannot affect the outcome of another trial
• The number of trials was not set
• The chance of success of each trial must always be p
Note: These are the exact same conditions as a binomial variable, except the third
condition.
• A geometric random variable is a discrete random variable, X, that counts the
number of trials needed to obtain one success.
• The parameters are p = the chance of success of each trial and (1 − p) = the chance
of failure of each trial. The formula for the PMF is as follows:
P(X = x) = (1 − p)^(x − 1) * p, for x = 1, 2, 3, ...
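A minimal sketch of the geometric PMF follows: x − 1 failures followed by one success. The success probability p = 0.2 used here is a made-up value for illustration, not one from the slides.

```python
def geometric_pmf(x, p):
    """P(X = x): the first success occurs on trial x (x = 1, 2, 3, ...)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return (1 - p) ** (x - 1) * p


p = 0.2  # hypothetical chance of success on each trial

# Probability that the first success happens on the third trial: 0.8 * 0.8 * 0.2.
first_on_third = geometric_pmf(3, p)

# The probabilities over x = 1, 2, 3, ... should add up to (almost exactly) 1.
total = sum(geometric_pmf(x, p) for x in range(1, 500))

print(round(first_on_third, 3), round(total, 6))
```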
3. Poisson random variables
• This is used when an event that we wish to model has a small probability of
happening and we wish to count the number of times that the event occurs
in a certain time frame.
• If we have an idea of the average number of occurrences, μ, over a specific period
of time, given from past instances, then the Poisson random variable, denoted by
X ∼ Poi(μ), counts the total number of occurrences of the event during that given
time period.
Examples of Poisson random variables:
• Finding the probability of having a certain number of visitors on your site within an
hour, knowing the past performance of the site
• Estimating the number of car crashes at an intersection based on past police reports
• If we let X = the number of events in a given interval, and the average number of
events per interval is λ (the same quantity as μ above), then the probability of
observing x events in a given interval is given by the following formula:
P(X = x) = (e^(−λ) * λ^x) / x!, for x = 0, 1, 2, ...
3. Poisson random variables
Example – call center:
The number of calls arriving at your call center follows a Poisson distribution
at the rate of 5 calls/hour. What is the probability that exactly six calls will
come in between 10 and 11 p.m.?
Solution:
• Let X be the number of calls that arrive between 10 and 11 p.m. This is our
Poisson random variable with mean λ = 5.
• The mean is 5 because we are using 5 as our previous expected value of
the number of calls to come in at this time.
P(X = 6) = (e^(−5) * 5^6) / 6!
= 0.146
• This means that there is about a 14.6% chance that exactly six calls will
come between 10 and 11 p.m.
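The call center calculation can be checked with a few lines of Python, using the Poisson formula P(X = x) = e^(−λ) λ^x / x!. The function name poisson_pmf is an illustrative choice.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)


# Rate of 5 calls/hour; probability of exactly 6 calls in one hour.
p_six_calls = poisson_pmf(6, 5)
print(round(p_six_calls, 3))
```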
Random variables
Continuous random variables
• A continuous random variable can take on an infinite number of possible
values
• It can take all possible values between certain limits. Continuous random
variables are usually measurements.
• It can take integer as well as fractional values.
• Examples include the height, weight, and age of a person, the amount of sugar
in an orange, the time required to run a mile, and the distance between two
cities.
• Probability for a continuous random variable is not assigned to specific values.
Instead, it is assigned to intervals of values, and is represented by the area under
a curve (in advanced mathematics, this is known as an integral).
Random variables
Continuous random variables
Consider the following examples of continuous variables:
• The length of a sales representative's phone call (not the number of calls)
• The actual amount of oil in a drum marked 20 gallons (not the number of oil
drums)
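• If X is a continuous random variable, then there is a function, f(x), such that for any
constants a and b:
P(a ≤ X ≤ b) = ∫ f(x) dx, integrated from x = a to x = b
• The function f(x) is the probability density function (PDF) of X.
• The most important continuous distribution is the standard normal distribution.
• The PDF of this distribution is as follows:
f(x) = (1 / √(2π)) * e^(−x² / 2)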
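To make "probability as area under a curve" concrete, the sketch below approximates P(a ≤ X ≤ b) for the standard normal distribution by numerical integration. It assumes the standard normal PDF f(x) = e^(−x²/2) / √(2π) and uses a simple trapezoidal rule. The function names are hypothetical, and math.erf is used only as an independent check.

```python
import math


def standard_normal_pdf(x):
    """PDF of the standard normal distribution (mean 0, standard deviation 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def area_under_pdf(pdf, a, b, steps=10_000):
    """Approximate P(a <= X <= b) as the area under pdf, using the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h


# Probability that a standard normal value falls within one unit of the mean.
p_within_one = area_under_pdf(standard_normal_pdf, -1, 1)

# Independent check through the error function: P(-1 <= Z <= 1) = erf(1 / sqrt(2)).
exact = math.erf(1 / math.sqrt(2))

print(round(p_within_one, 4), round(exact, 4))
```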
Continuous Random Variables
Example of Continuous Random Variables
• Example: Metal Cylinder Production
– Suppose that the random variable is the
diameter of a randomly chosen cylinder
manufactured by the company. Since this
random variable can take any value between
49.5 and 50.5, it is a continuous random
variable.
Examples
• A coin is tossed 10 times. What is the probability of getting exactly 6
heads?
Binomial variable formula: nCx * P^x * (1 – P)^(n – x)
The number of trials (n) is 10
The probability of success (“tossing a heads”) is 0.5 (so 1 – P = 0.5)
x=6
P(x=6) = 10C6 * 0.5^6 * 0.5^4 = 210 * 0.015625 * 0.0625 = 0.205078125
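The exact answer above can also be approached the frequentist way, by repeating the experiment many times and counting how often exactly 6 heads appear. The sketch below is a simulation under stated assumptions: a fixed seed for repeatability and 100,000 repetitions, both arbitrary choices. The estimate should land close to 0.205 but will not match it exactly.

```python
import random

random.seed(42)   # fixed seed so the run is repeatable (arbitrary choice)
trials = 100_000  # number of simulated experiments (arbitrary choice)

hits = 0
for _ in range(trials):
    # One experiment: toss a fair coin 10 times and count the heads.
    heads = sum(random.random() < 0.5 for _ in range(10))
    if heads == 6:
        hits += 1

# Relative frequency of "exactly 6 heads" across all experiments.
estimate = hits / trials
print(estimate)
```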

Examples
Problem:
10 coins are tossed simultaneously where the probability of getting a head
for each coin is 0.6.
Find the probability of getting exactly 4 heads.
Solution:
Probability of getting a head, p = 0.6
Probability of not getting a head, q = 1 - p = 1 - 0.6 = 0.4
Probability of getting 4 heads out of 10,
P(X = 4) = 10C4 * (0.6)^4 * (0.4)^6 = 210 * 0.1296 * 0.004096
= 0.111476736
Examples
Problem 3:
In an exam, 10 multiple choice questions are asked where only one
out of four answers is correct.
Find the probability of getting exactly 5 out of 10 questions correct in an
answer sheet.
Solution:
Probability of getting an answer correct, p = 1/4 = 0.25
Probability of getting an answer incorrect, q = 1 - p = 1 - 0.25 = 0.75
Probability of getting 5 answers correct,
P(X = 5) = 10C5 * (0.25)^5 * (0.75)^5 = 252 * 0.0009765625 * 0.2373046875
= 0.05839920044
Example:
The average sales score by a company is 1.5/day. What is the probability that this company achieves more
than 2 scores in a given day?
Solution:
• It follows a Poisson distribution with mean λ = 1.5.
• For P(x>2), infinitely many values (3, 4, 5, ...) would have to be considered.
• So, we consider x<=2 and use the complement.
P(x=0) = e^(-1.5) (1.5)^0 / 0! = (0.2231)(1)/1 = 0.2231
P(x=1) = e^(-1.5) (1.5)^1 / 1! = 0.2231 * 1.5 = 0.3347
P(x=2) = e^(-1.5) (1.5)^2 / 2! = (0.2231 * 2.25)/2 = 0.251
P(x>2) = 1 - P(x<=2)
= 1 – [ P(x=0) + P(x=1) + P(x=2) ]
= 1 – [ 0.2231 + 0.3347 + 0.251 ]
= 1 - 0.8088
= 0.1912
So, there is a 19.12% chance that the company achieves more than 2 scores.
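The complement approach used in this solution can be sketched directly: sum the Poisson probabilities for x = 0, 1, 2 and subtract from 1. Variable names are illustrative. Because the code does not round the intermediate values, its result may differ from the hand calculation in the last decimal place.

```python
from math import exp, factorial

lam = 1.5  # average sales per day

# P(X <= 2) = P(X = 0) + P(X = 1) + P(X = 2)
p_at_most_2 = sum(exp(-lam) * lam ** x / factorial(x) for x in range(3))

# P(X > 2) is the complement of P(X <= 2).
p_more_than_2 = 1 - p_at_most_2

print(round(p_at_most_2, 4), round(p_more_than_2, 4))
```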
Module 2 - Questions
Vectors and Matrices, sets
1. What is a vector, and how do you define one in Python with NumPy?
2. How do you perform vector arithmetic such as addition, subtraction, multiplication and
division?
3. How do you perform additional operations such as dot product and multiplication with a
scalar?
4. Why is data represented as a ‘vector’ in data science? What is the use of vectors in
data science?
5. List various symbols associated with basic arithmetic operations of vectors in data
science.
6. Define set and set theory. Illustrate set operations with suitable examples. (Any
data will be given to show the operations in Python)
1. Differentiate Bayesian versus frequentist approaches
2. Suppose a test is 95% accurate when a disease is present and 97% accurate when
the disease is absent. Suppose that 1% of the population has the disease. What is P
(have the disease | test +)?
3. Let's say that we are interested in ascertaining how often a person who visits a
particular website is likely to return on a later date. There were 1,458 unique visitors
in the past week, of whom 452 were repeat visitors. Calculate the relative frequency of
repeat visitors.
Probability
1. Derive the rules of probability in detail with an illustration.
2. Derive Bayes’ theorem for two events (Hypothesis and Data). Also describe any
two of its applications.
3. Let's say 165 people walked in for the study. All 165 people are given the test and
asked if they have cancer (provided through various other means). 50 people were
predicted to have no cancer and did not have it, 100 people were predicted to have
cancer and actually did have it. Formulate the confusion matrix to show the results
of this experiment.
4. Differentiate Type-I and Type-II errors with an example.
Note: Refer to the problems based on probability and the Bayes approach.

Random variables
1. List the types of discrete random variables. Identify the conditions in which these
random variables are appropriate and describe them with necessary examples.
2. Explain the properties of random variables. (A problem on the expectation and variance
of a random variable will be asked.)
3. Differentiate discrete random variable and continuous random variable with an
example.
4. Explain the binomial random variable with its function. (Problems also.)
5. Differentiate discrete and continuous random variables. Give examples.
Note: Refer to all problems related to all the types of random variables.
