
from scipy import stats
import numpy as np
import statistics

# Summary statistics for a dictionary of scores
data = {"A": 90, "B": 86, "C": 70, "D": 95, "E": 95, "F": 95, "G": 95}
values = list(data.values())
print("Mean")
print(np.mean(values))
print("Median")
print(np.median(values))
print("Mode")
print(statistics.mode(values))
print("Standard Deviation")
print(np.std(values))
print("Variance")
print(np.var(values))
print("Interquartile Range (IQR)")
print(stats.iqr(values))

Probability
The likelihood of the occurrence of an event is known as probability.
The probability of any event lies between 0 and 1.
Formula

Probability(P) = \dfrac{\text{No. of favorable outcomes}}{\text{Total no. of outcomes}} = \dfrac{n(E)}{n(S)}

Scenario

Consider a scenario where three coins are tossed simultaneously. What is the
probability of getting at least one head?

Solution

Sample Space

S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}

n(S)=8
Event

E = getting at least one head

E = {HHH, HHT, HTH, THH, HTT, THT, TTH}

n(E)=7
Probability

P(\text{at least one head}) = \dfrac{n(E)}{n(S)} = \dfrac{7}{8}
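
This result can be checked in Python by enumerating the sample space directly (a minimal sketch using only the standard library):

from itertools import product

# All 2^3 equally likely outcomes of tossing three coins
sample_space = list(product("HT", repeat=3))
# Event: at least one head
event = [outcome for outcome in sample_space if "H" in outcome]
print(len(event), "/", len(sample_space))  # 7 / 8
print(len(event) / len(sample_space))      # 0.875
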
Types of Events
Simple Event
Any event containing a single element of a sample space.

Compound Event
Any event containing two or more elements of a sample space.

Dependent Event
If the occurrence of an event is influenced by another event, it is called a
'Dependent Event'.

Independent Event
If the occurrence of an event is not influenced by another event, it is called an
'Independent Event'.

Exhaustive Events
A set of events that together cover the entire sample space.

Mutually Exclusive
Two events are said to be mutually exclusive events when both cannot occur at the
same time.

Example:
While driving a car, the steering wheel cannot be turned left and right at the same
time. Therefore, the events turning left and turning right are considered to be
mutually exclusive.

Properties of Probability
Property 1

Probability value lies between 0 and 1.

0 \leq P(A) \leq 1

Property 2

The probability of an impossible event is 0.

P(\emptyset) = 0

Property 3

The probability of a certain event is 1.

P(S) = 1

Property 4

The sum of the probabilities of an event and its complement is 1.

P(A) + P(\overline{A}) = 1

Types of Probability

Joint Probability


Joint Probability is a measure of two events happening at the same time, and can
only be applied to situations where more than one observation can occur at the same
time.

P(A \text{ and } B) or P(A \cap B)

Question:

In a deck of 52 cards, find the probability of drawing a card that is red in
color and carries the number 6.

Solution:

P(\text{Six and Red}) = \dfrac{2}{52} = \dfrac{1}{26}

Explanation:

There are two red sixes in a deck of 52 cards: the 6 of hearts and the
6 of diamonds.

Conditional Probability:

The probability of an event A, given that another event B has already occurred.

P(A|B) or P(A, given B)

Question:

Given that you draw a red card from a deck of 52 cards, what is the probability
that it is a card with number 6?

Solution:

P(Six \mid Red) = \dfrac{2}{26} = \dfrac{1}{13}
Explanation:

Among the 26 red cards, two carry the number 6. Therefore, the probability is
\dfrac{2}{26}.
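
Both card probabilities can be verified by enumerating a deck in Python (a minimal sketch; the rank and suit labels are illustrative):

from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]  # hearts and diamonds are red
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

red = [card for card in deck if card[1] in ("hearts", "diamonds")]
six_and_red = [card for card in red if card[0] == "6"]

print(len(six_and_red) / len(deck))  # joint:       P(Six and Red) = 2/52
print(len(six_and_red) / len(red))   # conditional: P(Six | Red)   = 2/26
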

Addition Rule
Mutually Exclusive Events
If A and B are mutually exclusive events,

P(A \cup B) = P(A) + P(B)

Non-mutually Exclusive Events

If A and B are non-mutually exclusive events,

P(A \cup B) = P(A) + P(B) - P(A \cap B)

Multiplication Rule
Independent Events

If A and B are two Independent Events,

P(A \cap B) = P(A)\,P(B)

Dependent Events

If A and B are two Dependent Events,

P(A \cap B) = P(A)\,P(B|A)

Bayes Theorem
Bayes theorem can be derived from the multiplication and addition rules of
probability.

From the product rule,

P(X \cap Y) = P(X|Y)\,P(Y)

P(Y \cap X) = P(Y|X)\,P(X)

Since P(X \cap Y) = P(Y \cap X), it follows that

P(Y|X) = \dfrac{P(X|Y)\,P(Y)}{P(X)}

where

P(X) = P(X \cap Y) + P(X \cap Y^{c}) (Addition Rule)
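
A small numeric sketch of the theorem; the probability values below are made-up for illustration:

# Hypothetical prior and likelihoods (illustrative values)
p_y = 0.01            # P(Y)
p_x_given_y = 0.95    # P(X|Y)
p_x_given_yc = 0.05   # P(X|Y^c)

# Addition rule: P(X) = P(X ∩ Y) + P(X ∩ Y^c)
p_x = p_x_given_y * p_y + p_x_given_yc * (1 - p_y)

# Bayes theorem: P(Y|X) = P(X|Y) P(Y) / P(X)
print(p_x_given_y * p_y / p_x)  # ≈ 0.161
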

Random Variables
A random variable assigns a numerical value to each outcome of a statistical
experiment.

Represented by X.

Example:
Consider the experiment of tossing a coin, where the result could either be heads
or tails.

Assume that the values are:

Heads = 0
Tails = 1
X will be the random variable denoting the outcome of tossing the coin.

X = \{0, 1\}

Types of Random Variables


Discrete Random Variables

A variable that can take only a countable number of values.


Example:

Consider the experiment of tossing many coins and counting the number of times
you get heads.

Explanation

Let X be the random variable denoting the number of times you get heads.

X \in \{0, 1, 2, \ldots\}

The number of heads can be any non-negative integer. It cannot be a
fractional value (e.g., 2.5 heads).

Continuous Random Variable

A continuous random variable can take any value within an interval; it can
take an infinite number of values.
Example:

The Fire department expects its employees to weigh between 150 and 250 pounds.

Explanation:

Let X be the random variable denoting the weight of the employees.

X = \{150, 150.1, 150.2, \ldots, 250\}

X can be any value in the interval [150, 250].

Probability Mass Function
Probability Distribution
List of all the possible outcomes of a random variable, along with their
corresponding probability values.

Probability Mass Function


When a probability function is used to describe a discrete probability
distribution, it is called a Probability Mass Function (PMF).

Definition
Let X be a discrete random variable with range R_X = \{x_1, x_2, x_3, \ldots\}
(finite or countably infinite).

The function

P_X(x) = \begin{cases} P(X = x) & x \in R_X \\ 0 & \text{otherwise} \end{cases}

is called the probability mass function (PMF) of X.

Properties of Probability Mass Function


Properties
By definition, a PMF is a probability measure, so it satisfies all the general
properties of probability as well as the following PMF-specific properties:

0 \leq P_X(x) \leq 1 for all x

\sum_{x \in R_X} P_X(x) = 1

For any set A \subset R_X, \; P(X \in A) = \sum_{x \in A} P_X(x)
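
For instance, the PMF of a fair six-sided die assigns 1/6 to each face; a minimal sketch checking the three properties above:

# PMF of a fair six-sided die
pmf = {x: 1 / 6 for x in range(1, 7)}

# Property 1: 0 <= P_X(x) <= 1 for every x
assert all(0 <= p <= 1 for p in pmf.values())
# Property 2: the probabilities sum to 1
assert abs(sum(pmf.values()) - 1) < 1e-12
# Property 3: P(X in A) for A = {2, 4, 6}
print(sum(pmf[x] for x in {2, 4, 6}))  # 0.5
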

Cumulative Distribution Function
Discrete Random Variable
The Cumulative Distribution Function (CDF) of a discrete random variable X is
the function given by

F_X(x) = P(X \leq x)

Continuous Random Variable


The CDF of a continuous random variable X can be expressed as the integral of
its probability density function f_X:

F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt

Here the upper limit of integration is the point x at which the CDF is evaluated.
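
As an illustration, the CDF of the standard normal distribution matches the numerical integral of its PDF (a sketch using scipy):

from scipy import stats
from scipy.integrate import quad

x = 1.0
print(stats.norm.cdf(x))  # CDF evaluated directly, ≈ 0.8413
integral, _ = quad(stats.norm.pdf, -float("inf"), x)
print(integral)           # integral of the PDF up to x, ≈ 0.8413
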

Probability Density Function (PDF)


When a probability function is used to describe a discrete probability
distribution, it is called PMF.

The PDF is the continuous counterpart of the PMF.

Definition

If X is a continuous random variable with probability density function f(x), then

P(a \le X \le b) = \int_a^b f(x)\,dx

where,

[a, b] : the interval in which X lies

P(a \le X \le b) : the probability that X takes a value within this interval

f(x)\,dx : the probability associated with an infinitesimally small interval of width dx

Properties of PDF
Property 1

Consider a continuous random variable X which takes values in the range [a, b].
The area under the density curve over this interval gives the probability:

P(a \leq X \leq b) = \int_a^b f(x)\,dx

Property 2

The value of the probability density function is always non-negative.

\forall x, \; f(x) \geq 0

Property 3

The area covered by the density curve is equal to 1.

\int_{-\infty}^{\infty} f(x)\,dx = 1
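
These properties can be checked numerically for a concrete density such as the standard normal (a sketch using scipy):

import numpy as np
from scipy import stats
from scipy.integrate import quad

# Property 2: the density is non-negative everywhere sampled
assert (stats.norm.pdf(np.linspace(-5, 5, 101)) >= 0).all()

# Property 3: the total area under the density curve is 1
total, _ = quad(stats.norm.pdf, -np.inf, np.inf)
print(total)  # ≈ 1.0

# Property 1: P(-1 <= X <= 1) as the area over [-1, 1]
area, _ = quad(stats.norm.pdf, -1, 1)
print(area)   # ≈ 0.6827
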

Central Limit Theorem


The Central Limit Theorem (CLT) states that the sampling distribution of a sample
mean approaches a normal distribution, as the sample size gets larger.

Consider a population with mean \mu and standard deviation \sigma. If you take
sufficiently large random samples from the population with replacement, then
the distribution of the sample means will be approximately normal.

This holds true regardless of whether the source population is normal or
skewed, provided the sample size is sufficiently large (usually n > 30).
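
A quick simulation illustrates the theorem: sample means drawn from a skewed exponential population cluster around the population mean with standard deviation \sigma/\sqrt{n} (a sketch using numpy):

import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 10_000  # sample size n > 30, many repeated samples

# 10,000 samples of size 50 from a skewed exponential population (mean 1, sd 1)
sample_means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

print(sample_means.mean())  # ≈ 1.0 (population mean)
print(sample_means.std())   # ≈ 1 / sqrt(50) ≈ 0.141
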

Expected Value - Discrete


The expected value is calculated by multiplying each of the possible outcomes in
the sample space with the likelihood of their occurrence, and then summing up all
the values.

Discrete

Let X be a discrete random variable with range R_X = \{x_1, x_2, x_3, \ldots\}
(finite). The expected value of X can be obtained as:

E[X] = \sum_{x_k \in R_X} x_k\, P(X = x_k), \qquad k = 1, 2, 3, \ldots

Notations

EX = E[X] = E(X) = \mu_X

Expected Value - Continuous


Definition
Let X be a continuous random variable with range (-\infty, \infty). The
expected value of X can be obtained as:

E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx
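
Both definitions can be evaluated directly; a sketch computing E[X] for a fair die (discrete) and for the standard normal distribution (continuous):

from scipy import stats
from scipy.integrate import quad

# Discrete: E[X] = sum of x * P(X = x) over the range of X
print(sum(x * (1 / 6) for x in range(1, 7)))  # 3.5 for a fair die

# Continuous: E[X] = integral of x * f_X(x) over the real line
e_x, _ = quad(lambda x: x * stats.norm.pdf(x), -float("inf"), float("inf"))
print(e_x)  # ≈ 0.0 for the standard normal
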

Variance
The variance of any random variable can be obtained by using:

Var(X) = E\big[(X - \mu_X)^2\big] = E[X^2] - (E[X])^2

Continuous

The variance of any continuous random variable can be obtained by using:

Var(X) = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - \Big(\int_{-\infty}^{\infty} x f_X(x)\,dx\Big)^2

Discrete

The variance of any discrete random variable can be obtained by using:

Var(X) = \sum x^2\, P(X = x) - \Big(\sum x\, P(X = x)\Big)^2


Magic Formula
\forall a, b \in \mathbb{R}

Variance

Var(aX + b) = a^2\, Var(X)

Expected Value

E[aX + b] = a\, E[X] + b
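
Both identities are easy to confirm by simulation (a sketch using numpy, with arbitrary values a = 3 and b = 5):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1_000_000)
a, b = 3, 5

print((a * x + b).mean(), a * x.mean() + b)  # E[aX + b] = a E[X] + b
print((a * x + b).var(), a ** 2 * x.var())   # Var(aX + b) = a^2 Var(X)
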

Discrete Distribution Function


The following are the most commonly used Discrete Distribution Functions:

Binomial probability distribution


Poisson probability distribution
Hypergeometric probability distribution
Multinomial probability distribution
Negative binomial distribution

Properties of Binomial Experiment


The experiment consists of n repeated trials.
Each trial can result in only one of two possible outcomes.
The probability of success, denoted by P, is the same for every trial.
The trials are independent.
Notation
x: The number of successes that result from the binomial experiment.
n: The number of trials.

P: The probability of success of an individual trial.

Q: The probability of failure of an individual trial (= 1 - P).

Binomial Distribution
Binomial Distribution is defined as:

The frequency distribution of the possible number of successful outcomes in a
given number of trials, in each of which there is the same probability of
success.

Formula
If a binomial experiment consists of n trials, and results in x successes, and if
the probability of success on an individual trial is P, the binomial probability
is:

b(x; n, P) = \dbinom{n}{x} P^x (1 - P)^{n - x}

Mean and Variance

Mean (\mu) = nP

Variance (\sigma^2) = nPQ

Binomial Example
Question:

80 % of people who purchase pet insurance are women. If 9 pet insurance owners are
randomly selected, find the probability that precisely 6 are women.

Solution:

from scipy import stats

# n = 9 trials, p = 0.80 success probability, k = 6 successes
probability = stats.binom.pmf(6, 9, 0.80)
print(probability)
Result:

0.17616076799999994

Properties of Poisson Experiment


Results in outcomes that can be classified as successes or failures.
The average number of successes (μ) that occurs in a specified region is known.

Poisson Notations
e: A constant which has a value that is approximately 2.71828.
μ: The mean number of successes that occur in a specified region.

x: The actual number of successes that occur in a specified region.

Poisson Distribution Function


A Poisson distribution function gives the probability of a given number of
events occurring in a specific period of time, given the average number of
times the event occurs in that time span.

A Poisson random variable denotes the number of successes that result from a
Poisson experiment. The probability distribution of a Poisson random variable is
called a Poisson distribution function.

Formula
Suppose a Poisson experiment is conducted in which the average number of successes
within a given region is μ, then, the Poisson probability is:

P(x; \mu) = \dfrac{e^{-\mu}\,\mu^x}{x!}

Poisson - Example
Question:

If the number of vehicles that pass through a junction on a busy road is at an
average rate of 300 per hour, find the probability that no vehicle passes in a
given minute.

Python Code

from scipy import stats

# 300 vehicles per hour = 5 per minute on average
averagepass = 300 / 60
probability = stats.poisson.pmf(0, averagepass)
print(probability)
Result

0.006737946999085467

Continuous Distribution Function


The following are the most commonly used Continuous Distribution Functions:

Normal distribution
Uniform distribution
Chi-squared distribution
F distribution
t distribution

Naive Bayes Theorem


Mathematically, the Bayes theorem is:

P(y|X) = \dfrac{P(X|y)\,P(y)}{P(X)}
where,

P(y|X): posterior probability of class y (target) given predictor X (attributes).
P(y): prior probability of the class.
P(X): prior probability of the predictor.
P(X|y): probability of the predictor given the class.

Other Notations
P(y) is also called the Class Probability.
P(X|y) is called the Conditional Probability.

Fundamental
The fundamental assumption is that each feature makes an independent and equal
contribution to the outcome.

Note
The Bayes classifier assumes that the effect made by one predictor value on a given
class variable is independent of the other predictor variable values. This
assumption is called Conditional Independence.

Expansion

X: predictor variables, X = \{x_1, x_2, x_3, \ldots, x_n\}

y: target variable (class variable)

The equation can be expanded as:

P(y|x_1, x_2, \ldots, x_n) = \dfrac{P(x_1|y)\,P(x_2|y) \cdots P(x_n|y)\,P(y)}{P(X)}

Types of Naive Bayes


Multinomial Naive Bayes

Used in document classification problems.

Predictors are the frequencies of the words present in a specific document.
Handles multiple classes.
Bernoulli Naive Bayes

Predictors are boolean variables, either Yes or No.

Gaussian Naive Bayes

Predictors take only continuous values, not discrete.

Values are assumed to be sampled from a Gaussian (Normal) distribution.
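
All three variants are available in scikit-learn (assuming it is installed); a minimal Gaussian Naive Bayes sketch on made-up continuous data:

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy continuous features and two classes (illustrative values)
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])

model = GaussianNB().fit(X, y)
print(model.predict([[1.1, 2.0]]))        # predicted class
print(model.predict_proba([[1.1, 2.0]]))  # posterior P(y|X)
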

Hypothesis Testing
Statistical Hypothesis

Assumption about a population parameter.

Hypothesis Testing

Formal procedures used by statisticians to decide whether to reject the null
hypothesis or fail to reject it.

Hypothesis testing involves testing an assumption to check whether the
inference drawn from the sample data holds for the entire population.

Types of Hypotheses
Types of Statistical Hypotheses
Null Hypothesis

Denoted by H_0.

States that there is no significant difference between a set of variables.


Alternative Hypothesis

Denoted by H_1 or H_a.

States that there is a significant difference between a set of variables.

Types of Tests
One-tailed

Region of rejection is only on one side of the sampling distribution.

Two-tailed

Region of rejection is on both sides of the sampling distribution.

Decision Rules
Decision rules are formulated during the analysis phase to decide whether to
reject the null hypothesis.

Methods of Formulation

P-Value
Region of Acceptance

Decision Errors
The process of hypotheses testing may result in errors.

Types of Errors
Type 1 Error
Type 2 Error

Decision Errors
Type 1 Error
Occurs when a researcher rejects a null hypothesis when it is true.
Significance level: the probability of committing a Type 1 error.
Denoted by \alpha.
Type 2 Error
Occurs when a researcher fails to reject a null hypothesis that is false.
The probability of committing a Type 2 error is denoted by \beta.
Power of test: the probability of not committing a Type 2 error (1 - \beta).

Decision Rules
P-Value
The strength of the evidence in support of the null hypothesis is measured by
the P-value.

Reject the null hypothesis if

P-value < significance level

Region of Acceptance
A range of values: if the test statistic falls within this range, the null
hypothesis is not rejected.
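
As an illustration of the p-value rule, a one-sample t-test of a mean (a sketch using scipy; the data and significance level are illustrative):

from scipy import stats

data = [5.1, 4.9, 5.3, 5.2, 4.8, 5.0, 5.4, 5.1]
# H0: the population mean equals 5.0 (two-tailed test)
t_stat, p_value = stats.ttest_1samp(data, popmean=5.0)

alpha = 0.05
if p_value < alpha:
    print('Reject the null hypothesis')
else:
    print('Fail to reject the null hypothesis')
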

Applications
Applications of the General Hypothesis Testing procedure:

Mean
Difference between means
Difference between paired means
Goodness of fit
Homogeneity
Independence
Proportions
Difference between proportions
Regression slope

Mean
Hypothesis

The following table shows the set of Hypothesis statements:

Each makes a statement about how the population mean \mu is related to a
specified pre-defined value M.

Null Hypothesis | Alternative Hypothesis | Tail
\mu = M | \mu \neq M | 2
\mu \geq M | \mu < M | 1
\mu \leq M | \mu > M | 1
Difference between Means
Hypothesis

The following table shows the set of Hypothesis statements:

Each makes a statement about how the difference d between the mean of one
population, \mu_1, and the mean of another population, \mu_2, relates to a
hypothesized value.

Null Hypothesis | Alternative Hypothesis | Tail
\mu_1 - \mu_2 = d | \mu_1 - \mu_2 \neq d | 2
\mu_1 - \mu_2 \geq d | \mu_1 - \mu_2 < d | 1
\mu_1 - \mu_2 \leq d | \mu_1 - \mu_2 > d | 1
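
Such a hypothesis (with d = 0) can be tested with a two-sample t-test (a sketch using scipy; the samples are illustrative):

from scipy import stats

group1 = [12.1, 11.8, 12.4, 12.0, 11.9]
group2 = [11.2, 11.5, 11.0, 11.4, 11.3]

# H0: mu1 - mu2 = 0 (two-tailed test)
t_stat, p_value = stats.ttest_ind(group1, group2)
print(t_stat, p_value)
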

Difference Between Paired Means


Hypothesis

The following table shows the set of Hypothesis statements:

Each makes a statement about how the true difference in population values
\mu_d is related to a hypothesized value D.

Null Hypothesis | Alternative Hypothesis | Tail
\mu_d = D | \mu_d \neq D | 2
\mu_d \geq D | \mu_d < D | 1
\mu_d \leq D | \mu_d > D | 1


Chi-squared Test
The Chi-squared test helps in determining whether there is a significant difference
between expected frequencies and observed frequencies, in one or more categories.
Represented by \chi^2.

Chi-squared Statistics
The Chi-squared statistic is represented by:

\chi^2_c = \sum \dfrac{(O_i - E_i)^2}{E_i}
Where,

c : Degrees of Freedom

O: Observed Values

E: Expected Values

Chi-squared Tests - Types


There are two types of Chi-squared tests:
Goodness of Fit
Test for Independence
Goodness of Fit

A statistical hypothesis test to see how well sample data fits a hypothesized
population distribution.
Determines how well the sample represents the population.
Test for Independence

Determines whether two categorical variables are related.

Chi-squared Statistic Value

Smaller statistic value: the observed counts are close to the expected counts,
suggesting no relationship between the category variables.

Larger statistic value: the observed counts deviate substantially from the
expected counts, suggesting a relationship between the category variables.

Chi-squared Test: Coding
The following code shows the steps to test hypotheses by using Chi-squared Test:

from scipy.stats import chi2_contingency
from scipy.stats import chi2

table = [[30, 10], [15, 25], [15, 5]]
print(table)
stat, p, dof, expected = chi2_contingency(table)
print('Expected Values')
print(expected)
print('Chi-Square Statistic = %.3f' % stat)
print('Degree of Freedom =%d' % dof)
print('P value = %.3f' % p)
# Interpret the test statistic against the critical value
prob = 0.95
critical = chi2.ppf(prob, dof)
if abs(stat) >= critical:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')
# Interpret the p-value against the significance level
alpha = 1.0 - prob
print('Significance level =%.3f, P-Value=%.3f' % (alpha, p))
if p <= alpha:
    print('Dependent : Reject the null Hypothesis')
else:
    print('Independent : Fail to reject the null Hypothesis')

Result
Output

[[30, 10], [15, 25], [15, 5]]


-----------------------
Expected Values
[[24. 16.]
[24. 16.]
[12. 8.]]
-------------------------------------
Chi-Square Statistic = 14.062
Degree of Freedom =2
P value = 0.001
-------------------------------------
Significance level =0.050, P-Value=0.001
Dependent : Reject the null Hypothesis

Course Summary
The following is a summary of the topics covered in this course:

Statistics
Probability and Essentials
Rules of Probability
Random Variables
Expected Value and Variance
Distribution Function
Hypothesis Testing
