
from scipy import stats
import numpy as np
import statistics

# Summary statistics for a dictionary of scores
data = {"A": 90, "B": 86, "C": 70, "D": 95, "E": 95, "F": 95, "G": 95}
values = list(data.values())
print("Mean")
print(np.mean(values))
print("Median")
print(np.median(values))
print("Mode")
print(statistics.mode(values))
print("Standard Deviation")
print(np.std(values))
print("Variance")
print(np.var(values))
print("Interquartile Range (IQR)")
print(stats.iqr(values))

Probability
The likelihood of the occurrence of an event is known as probability.
The probability of any event lies between 0 and 1.
Formula

Probability(P) = \dfrac{\text{No. of favorable outcomes}}{\text{Total no. of outcomes}} = \dfrac{n(E)}{n(S)}

Scenario

Consider a scenario where three coins are tossed simultaneously. What is the
probability of getting at least one head?

Solution

Sample Space

S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}

n(S)=8
Event

E = getting at least one head

E = {HHH, HHT, HTH, THH, HTT, THT, TTH}

n(E)=7
Probability

P(\text{at least one head}) = \dfrac{n(E)}{n(S)} = \dfrac{7}{8}
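
This result can be checked in Python by enumerating the sample space directly (a minimal sketch using only the standard library):

from itertools import product

# All 2^3 equally likely outcomes of tossing three coins
sample_space = list(product("HT", repeat=3))
# Event: at least one head
event = [outcome for outcome in sample_space if "H" in outcome]
print(len(event), "/", len(sample_space))  # 7 / 8
print(len(event) / len(sample_space))      # 0.875
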
Types of Events
Simple Event
Any event containing a single element of a sample space.

Compound Event
Any event containing two or more elements of a sample space.

Dependent Event
If the occurrence of an event is influenced by another event, it is called a
'Dependent Event'.

Independent Event
If the occurrence of an event is not influenced by another event, it is called an
'Independent Event'.

Exhaustive Events
A set of events that together cover the entire sample space.

Mutually Exclusive
Two events are said to be mutually exclusive events when both cannot occur at the
same time.

Example:
While driving a car, the steering wheel cannot be turned left and right at the same
time. Therefore, the events turning left and turning right are considered to be
mutually exclusive.

Properties of Probability
Property 1

Probability value lies between 0 and 1.

0 \leq P(A) \leq 1

Property 2

The probability of an impossible event is 0.

P(\emptyset) = 0

Property 3

The probability of a certain event is 1.

P(S) = 1

Property 4

The sum of the probabilities of an event and its complement is 1.

P(A) + P(\overline{A}) = 1

Types of Probability

Joint Probability


Joint Probability is a measure of two events happening at the same time, and can
only be applied to situations where more than one observation can occur at the same
time.

P(A \text{ and } B) or P(A \cap B)

Question:

In a deck of 52 cards, find the probability of drawing a card that is red in
color and carries the number 6.

Solution:

P(\text{Six and Red}) = \dfrac{2}{52} = \dfrac{1}{26}

Explanation:

There are two red sixes in a deck of 52 cards: the 6 of hearts and the
6 of diamonds.

Conditional Probability:

The probability of an event A, given that another event B has already occurred.

P(A|B) or P(A, given B)

Question:

Given that you draw a red card from a deck of 52 cards, what is the probability
that it is a card with number 6?

Solution:

P(Six \mid Red) = \dfrac{2}{26} = \dfrac{1}{13}
Explanation:

Among the 26 red cards, two carry the number 6. Therefore, the probability is
\dfrac{2}{26}.
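
Both card probabilities can be verified by enumerating a deck in Python (a minimal sketch; the rank and suit labels are illustrative):

from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]  # hearts and diamonds are red
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

red = [card for card in deck if card[1] in ("hearts", "diamonds")]
six_and_red = [card for card in red if card[0] == "6"]

print(len(six_and_red) / len(deck))  # joint:       P(Six and Red) = 2/52
print(len(six_and_red) / len(red))   # conditional: P(Six | Red)   = 2/26
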

Addition Rule
Mutually Exclusive Events
If A and B are mutually exclusive events,

P(A \cup B) = P(A) + P(B)

Non-mutually Exclusive Events

If A and B are non-mutually exclusive events,

P(A \cup B) = P(A) + P(B) - P(A \cap B)

Multiplication Rule
Independent Events

If A and B are two Independent Events,

P(A \cap B) = P(A)\,P(B)

Dependent Events

If A and B are two Dependent Events,

P(A \cap B) = P(A)\,P(B|A)

Bayes Theorem
Bayes theorem can be derived from the multiplication and addition rules of
probability.

From the product rule,

P(X \cap Y) = P(X|Y)\,P(Y)

P(Y \cap X) = P(Y|X)\,P(X)

Since P(X \cap Y) = P(Y \cap X), it follows that

P(Y|X) = \dfrac{P(X|Y)\,P(Y)}{P(X)}

where

P(X) = P(X \cap Y) + P(X \cap Y^{c}) (Addition Rule)
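
A small numeric sketch of the theorem; the probability values below are made-up for illustration:

# Hypothetical prior and likelihoods (illustrative values)
p_y = 0.01            # P(Y)
p_x_given_y = 0.95    # P(X|Y)
p_x_given_yc = 0.05   # P(X|Y^c)

# Addition rule: P(X) = P(X ∩ Y) + P(X ∩ Y^c)
p_x = p_x_given_y * p_y + p_x_given_yc * (1 - p_y)

# Bayes theorem: P(Y|X) = P(X|Y) P(Y) / P(X)
print(p_x_given_y * p_y / p_x)  # ≈ 0.161
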

Random Variables
A random variable assigns a numerical value to each outcome of a statistical
experiment.

Represented by X.

Example:
Consider the experiment of tossing a coin, where the result could either be heads
or tails.

Assume that the values are:

Heads = 0
Tails = 1
X will be the random variable denoting the outcome of tossing the coin.

X = \{0, 1\}

Types of Random Variables


Discrete Random Variables

A variable that can take only a countable number of values.


Example:

Consider the experiment of tossing many coins and counting the number of times
you get heads.

Explanation

Let X be the random variable denoting the number of times you get heads.

X \in \{0, 1, 2, \ldots\}

The number of heads can be any non-negative integer. It cannot be a
fractional value (e.g., 2.5 heads).

Continuous Random Variable

A continuous random variable can take any value within an interval; it can
take an infinite number of values.
Example:

The Fire department expects its employees to weigh between 150 and 250 pounds.

Explanation:

Let X be the random variable denoting the weight of the employees.

X = \{150, 150.1, 150.2, \ldots, 250\}

X can be any value in the interval [150, 250].

Probability Mass Function
Probability Distribution
List of all the possible outcomes of a random variable, along with their
corresponding probability values.

Probability Mass Function


When a probability function is used to describe a discrete probability
distribution, it is called a Probability Mass Function (PMF).

Definition
Let X be a discrete random variable with range R_X = \{x_1, x_2, x_3, \ldots\}
(finite or countably infinite).

The function

P_X(x) = \begin{cases} P(X = x) & x \in R_X \\ 0 & \text{otherwise} \end{cases}

is called the probability mass function (PMF) of X.

Properties of Probability Mass Function


Properties
By definition, a PMF is a probability measure, so it satisfies all the general
properties of probability as well as the following PMF-specific properties:

0 \leq P_X(x) \leq 1 for all x

\sum_{x \in R_X} P_X(x) = 1

For any set A \subset R_X, \; P(X \in A) = \sum_{x \in A} P_X(x)
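
For instance, the PMF of a fair six-sided die assigns 1/6 to each face; a minimal sketch checking the three properties above:

# PMF of a fair six-sided die
pmf = {x: 1 / 6 for x in range(1, 7)}

# Property 1: 0 <= P_X(x) <= 1 for every x
assert all(0 <= p <= 1 for p in pmf.values())
# Property 2: the probabilities sum to 1
assert abs(sum(pmf.values()) - 1) < 1e-12
# Property 3: P(X in A) for A = {2, 4, 6}
print(sum(pmf[x] for x in {2, 4, 6}))  # 0.5
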

Cumulative Distribution Function
Discrete Random Variable
The Cumulative Distribution Function (CDF) of a discrete random variable X is
the function given by

F_X(x) = P(X \leq x)

Continuous Random Variable


The CDF of a continuous random variable X can be expressed as the integral of
its probability density function f_X:

F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt

Here the upper limit of integration is the point x at which the CDF is evaluated.
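
As an illustration, the CDF of the standard normal distribution matches the numerical integral of its PDF (a sketch using scipy):

from scipy import stats
from scipy.integrate import quad

x = 1.0
print(stats.norm.cdf(x))  # CDF evaluated directly, ≈ 0.8413
integral, _ = quad(stats.norm.pdf, -float("inf"), x)
print(integral)           # integral of the PDF up to x, ≈ 0.8413
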

Probability Density Function (PDF)


When a probability function is used to describe a discrete probability
distribution, it is called PMF.

The PDF is the continuous counterpart of the PMF.

Definition

If X is a continuous random variable with probability density function f(x), then

P(a \le X \le b) = \int_a^b f(x)\,dx

where,

[a, b] : the interval in which X lies

P(a \le X \le b) : the probability that X takes a value within this interval

f(x)\,dx : the probability associated with an infinitesimally small interval of width dx

Properties of PDF
Property 1

Consider a continuous random variable X which takes values in the range [a, b].
The area under the density curve over this interval gives the probability:

P(a \leq X \leq b) = \int_a^b f(x)\,dx

Property 2

The value of the probability density function is always non-negative.

\forall x, \; f(x) \geq 0

Property 3

The area covered by the density curve is equal to 1.

\int_{-\infty}^{\infty} f(x)\,dx = 1
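
These properties can be checked numerically for a concrete density such as the standard normal (a sketch using scipy):

import numpy as np
from scipy import stats
from scipy.integrate import quad

# Property 2: the density is non-negative everywhere sampled
assert (stats.norm.pdf(np.linspace(-5, 5, 101)) >= 0).all()

# Property 3: the total area under the density curve is 1
total, _ = quad(stats.norm.pdf, -np.inf, np.inf)
print(total)  # ≈ 1.0

# Property 1: P(-1 <= X <= 1) as the area over [-1, 1]
area, _ = quad(stats.norm.pdf, -1, 1)
print(area)   # ≈ 0.6827
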

Central Limit Theorem


The Central Limit Theorem (CLT) states that the sampling distribution of a sample
mean approaches a normal distribution, as the sample size gets larger.

Consider a population with mean \mu and standard deviation \sigma. If you take
sufficiently large random samples from the population with replacement, then
the distribution of the sample means will be approximately normal.

This holds true regardless of whether the source population is normal or
skewed, provided the sample size is sufficiently large (usually n > 30).
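
A quick simulation illustrates the theorem: sample means drawn from a skewed exponential population cluster around the population mean with standard deviation \sigma/\sqrt{n} (a sketch using numpy):

import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 10_000  # sample size n > 30, many repeated samples

# 10,000 samples of size 50 from a skewed exponential population (mean 1, sd 1)
sample_means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

print(sample_means.mean())  # ≈ 1.0 (population mean)
print(sample_means.std())   # ≈ 1 / sqrt(50) ≈ 0.141
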

Expected Value - Discrete


The expected value is calculated by multiplying each of the possible outcomes in
the sample space with the likelihood of their occurrence, and then summing up all
the values.

Discrete

Let X be a discrete random variable with range R_X = \{x_1, x_2, x_3, \ldots\}
(finite). The expected value of X can be obtained as:

E[X] = \sum_{x_k \in R_X} x_k\, P(X = x_k), \qquad k = 1, 2, 3, \ldots

Notations

EX = E[X] = E(X) = \mu_X

Expected Value - Continuous


Definition
Let X be a continuous random variable with range (-\infty, \infty). The
expected value of X can be obtained as:

E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx
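
Both definitions can be evaluated directly; a sketch computing E[X] for a fair die (discrete) and for the standard normal distribution (continuous):

from scipy import stats
from scipy.integrate import quad

# Discrete: E[X] = sum of x * P(X = x) over the range of X
print(sum(x * (1 / 6) for x in range(1, 7)))  # 3.5 for a fair die

# Continuous: E[X] = integral of x * f_X(x) over the real line
e_x, _ = quad(lambda x: x * stats.norm.pdf(x), -float("inf"), float("inf"))
print(e_x)  # ≈ 0.0 for the standard normal
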

Variance
The variance of any random variable can be obtained by using:

Var(X) = E\big[(X - \mu_X)^2\big] = E[X^2] - (E[X])^2

Continuous

The variance of any continuous random variable can be obtained by using:

Var(X) = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - \Big(\int_{-\infty}^{\infty} x f_X(x)\,dx\Big)^2

Discrete

The variance of any discrete random variable can be obtained by using:

Var(X) = \sum x^2\, P(X = x) - \Big(\sum x\, P(X = x)\Big)^2


Magic Formula
\forall a, b \in \mathbb{R}

Variance

Var(aX + b) = a^2\, Var(X)

Expected Value

E[aX + b] = a\, E[X] + b
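
Both identities are easy to confirm by simulation (a sketch using numpy, with arbitrary values a = 3 and b = 5):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1_000_000)
a, b = 3, 5

print((a * x + b).mean(), a * x.mean() + b)  # E[aX + b] = a E[X] + b
print((a * x + b).var(), a ** 2 * x.var())   # Var(aX + b) = a^2 Var(X)
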

Discrete Distribution Function


The following are the most commonly used Discrete Distribution Functions:

Binomial probability distribution


Poisson probability distribution
Hypergeometric probability distribution
Multinomial probability distribution
Negative binomial distribution

Properties of Binomial Experiment


The experiment consists of n repeated trials.
Each trial can result in only one of two possible outcomes.
The probability of success, denoted by P, is the same for every trial.
The trials are independent.
Notation
x: The number of successes that result from the binomial experiment.
n: The number of trials.

P: The probability of success of an individual trial.

Q: The probability of failure of an individual trial (= 1 - P).

Binomial Distribution
Binomial Distribution is defined as:

The frequency distribution of the possible number of successful outcomes in a
given number of trials, in each of which there is the same probability of
success.

Formula
If a binomial experiment consists of n trials, and results in x successes, and if
the probability of success on an individual trial is P, the binomial probability
is:

b(x; n, P) = \dbinom{n}{x} P^x (1 - P)^{n - x}

Mean and Variance

Mean (\mu) = nP

Variance (\sigma^2) = nPQ

Binomial Example
Question:

80 % of people who purchase pet insurance are women. If 9 pet insurance owners are
randomly selected, find the probability that precisely 6 are women.

Solution:

from scipy import stats

# n = 9 trials, p = 0.80 success probability, k = 6 successes
probability = stats.binom.pmf(6, 9, 0.80)
print(probability)
Result:

0.17616076799999994

Properties of Poisson Experiment


Results in outcomes that can be classified as successes or failures.
The average number of successes (μ) that occurs in a specified region is known.

Poisson Notations
e: A constant which has a value that is approximately 2.71828.
μ: The mean number of successes that occur in a specified region.

x: The actual number of successes that occur in a specified region.

Poisson Distribution Function


A Poisson distribution function gives the probability of a given number of
events occurring in a specific period of time, given the average number of
times the event occurs in that time span.

A Poisson random variable denotes the number of successes that result from a
Poisson experiment. The probability distribution of a Poisson random variable is
called a Poisson distribution function.

Formula
Suppose a Poisson experiment is conducted in which the average number of successes
within a given region is μ, then, the Poisson probability is:

P(x; \mu) = \dfrac{e^{-\mu}\,\mu^x}{x!}

Poisson - Example
Question:

If the number of vehicles that pass through a junction on a busy road is at an
average rate of 300 per hour, find the probability that no vehicle passes in a
given minute.

Python Code

from scipy import stats

# 300 vehicles per hour = 5 per minute on average
averagepass = 300 / 60
probability = stats.poisson.pmf(0, averagepass)
print(probability)
Result

0.006737946999085467

Continuous Distribution Function


The following are the most commonly used Continuous Distribution Functions:

Normal distribution
Uniform distribution
Chi-squared distribution
F distribution
t distribution

Naive Bayes Theorem


Mathematically, the Bayes theorem is:

P(y|X) = \dfrac{P(X|y)\,P(y)}{P(X)}
where,

P(y|X): posterior probability of class y (target) given predictor X (attributes).
P(y): prior probability of the class.
P(X): prior probability of the predictor.
P(X|y): probability of the predictor given the class.

Other Notations
P(y) is also called the Class Probability.
P(X|y) is called the Conditional Probability.

Fundamental
The fundamental assumption is that each feature makes an independent and equal
contribution to the outcome.

Note
The Bayes classifier assumes that the effect made by one predictor value on a given
class variable is independent of the other predictor variable values. This
assumption is called Conditional Independence.

Expansion

X: predictor variables, X = \{x_1, x_2, x_3, \ldots, x_n\}

y: target variable (class variable)

The equation can be expanded as:

P(y|x_1, x_2, \ldots, x_n) = \dfrac{P(x_1|y)\,P(x_2|y) \cdots P(x_n|y)\,P(y)}{P(X)}

Types of Naive Bayes


Multinomial Naive Bayes

Used in document classification problems.

Predictors are the frequencies of the words present in a specific document.
Handles multiple classes.
Bernoulli Naive Bayes

Predictors are boolean variables, either Yes or No.

Gaussian Naive Bayes

Predictors take only continuous values, not discrete.

Values are assumed to be sampled from a Gaussian (Normal) distribution.
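
All three variants are available in scikit-learn (assuming it is installed); a minimal Gaussian Naive Bayes sketch on made-up continuous data:

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy continuous features and two classes (illustrative values)
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])

model = GaussianNB().fit(X, y)
print(model.predict([[1.1, 2.0]]))        # predicted class
print(model.predict_proba([[1.1, 2.0]]))  # posterior P(y|X)
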

Hypothesis Testing
Statistical Hypothesis

Assumption about a population parameter.

Hypothesis Testing

Formal procedures used by statisticians to decide whether to reject the null
hypothesis or fail to reject it.

Hypothesis testing involves testing an assumption to check whether the
inference drawn from the sample data holds for the entire population.

Types of Hypotheses
Types of Statistical Hypotheses
Null Hypothesis

Denoted by H_0.

States that there is no significant difference between a set of variables.


Alternative Hypothesis

Denoted by H_1 or H_a.

States that there is a significant difference between a set of variables.

Types of Tests
One-tailed

Region of rejection is only on one side of the sampling distribution.

Two-tailed

Region of rejection is on both sides of the sampling distribution.

Decision Rules
Decision rules are formulated during the analysis phase to decide whether to
reject the null hypothesis.

Methods of Formulation

P-Value
Region of Acceptance

Decision Errors
The process of hypotheses testing may result in errors.

Types of Errors
Type 1 Error
Type 2 Error

Decision Errors
Type 1 Error
Occurs when a researcher rejects a null hypothesis when it is true.
Significance level: the probability of committing a Type 1 error.
Denoted by \alpha.
Type 2 Error
Occurs when a researcher fails to reject a null hypothesis that is false.
The probability of committing a Type 2 error is denoted by \beta.
Power of test: the probability of not committing a Type 2 error (1 - \beta).

Decision Rules
P-Value
The strength of the evidence in support of the null hypothesis is measured by
the P-value.

Reject the null hypothesis if

P-value < significance level

Region of Acceptance
A range of values: if the test statistic falls within this range, the null
hypothesis is not rejected.
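
As an illustration of the p-value rule, a one-sample t-test of a mean (a sketch using scipy; the data and significance level are illustrative):

from scipy import stats

data = [5.1, 4.9, 5.3, 5.2, 4.8, 5.0, 5.4, 5.1]
# H0: the population mean equals 5.0 (two-tailed test)
t_stat, p_value = stats.ttest_1samp(data, popmean=5.0)

alpha = 0.05
if p_value < alpha:
    print('Reject the null hypothesis')
else:
    print('Fail to reject the null hypothesis')
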

Applications
Applications of the General Hypothesis Testing procedure:

Mean
Difference between means
Difference between paired means
Goodness of fit
Homogeneity
Independence
Proportions
Difference between proportions
Regression slope

Mean
Hypothesis

The following table shows the set of Hypothesis statements:

Each makes a statement about how the population mean \mu is related to a
specified pre-defined value M.

Null Hypothesis | Alternative Hypothesis | Tail
\mu = M | \mu \neq M | 2
\mu \geq M | \mu < M | 1
\mu \leq M | \mu > M | 1
Difference between Means
Hypothesis

The following table shows the set of Hypothesis statements:

Each makes a statement about how the difference d between the mean of one
population, \mu_1, and the mean of another population, \mu_2, relates to a
hypothesized value.

Null Hypothesis | Alternative Hypothesis | Tail
\mu_1 - \mu_2 = d | \mu_1 - \mu_2 \neq d | 2
\mu_1 - \mu_2 \geq d | \mu_1 - \mu_2 < d | 1
\mu_1 - \mu_2 \leq d | \mu_1 - \mu_2 > d | 1
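
Such a hypothesis (with d = 0) can be tested with a two-sample t-test (a sketch using scipy; the samples are illustrative):

from scipy import stats

group1 = [12.1, 11.8, 12.4, 12.0, 11.9]
group2 = [11.2, 11.5, 11.0, 11.4, 11.3]

# H0: mu1 - mu2 = 0 (two-tailed test)
t_stat, p_value = stats.ttest_ind(group1, group2)
print(t_stat, p_value)
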

Difference Between Paired Means


Hypothesis

The following table shows the set of Hypothesis statements:

Each makes a statement about how the true difference in population values
\mu_d is related to a hypothesized value D.

Null Hypothesis | Alternative Hypothesis | Tail
\mu_d = D | \mu_d \neq D | 2
\mu_d \geq D | \mu_d < D | 1
\mu_d \leq D | \mu_d > D | 1


Chi-squared Test
The Chi-squared test helps in determining whether there is a significant difference
between expected frequencies and observed frequencies, in one or more categories.
Represented by \chi^2.

Chi-squared Statistics
The Chi-squared statistic is represented by:

\chi^2_c = \sum \dfrac{(O_i - E_i)^2}{E_i}
Where,

c : Degrees of Freedom

O: Observed Values

E: Expected Values

Chi-squared Tests - Types


There are two types of Chi-squared tests:
Goodness of Fit
Test for Independence
Goodness of Fit

A statistical hypothesis test to see how well sample data fits a hypothesized
population distribution.
Determines how well the sample represents the population.
Test for Independence

Determines whether two categorical variables are related.

Chi-squared Statistic Value

Smaller statistic value: the observed counts are close to the expected counts,
suggesting no relationship between the category variables.

Larger statistic value: the observed counts deviate substantially from the
expected counts, suggesting a relationship between the category variables.

Chi-squared Test: Coding
The following code shows the steps to test hypotheses by using Chi-squared Test:

from scipy.stats import chi2_contingency
from scipy.stats import chi2

table = [[30, 10], [15, 25], [15, 5]]
print(table)
stat, p, dof, expected = chi2_contingency(table)
print('Expected Values')
print(expected)
print('Chi-Square Statistic = %.3f' % stat)
print('Degree of Freedom =%d' % dof)
print('P value = %.3f' % p)
# Interpret the test statistic against the critical value
prob = 0.95
critical = chi2.ppf(prob, dof)
if abs(stat) >= critical:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')
# Interpret the p-value against the significance level
alpha = 1.0 - prob
print('Significance level =%.3f, P-Value=%.3f' % (alpha, p))
if p <= alpha:
    print('Dependent : Reject the null Hypothesis')
else:
    print('Independent : Fail to reject the null Hypothesis')

Result
Output

[[30, 10], [15, 25], [15, 5]]


-----------------------
Expected Values
[[24. 16.]
[24. 16.]
[12. 8.]]
-------------------------------------
Chi-Square Statistic = 14.062
Degree of Freedom =2
P value = 0.001
-------------------------------------
Significance level =0.050, P-Value=0.001
Dependent : Reject the null Hypothesis

Course Summary
The following is a summary of the topics covered in this course:

Statistics
Probability and Essentials
Rules of Probability
Random Variables
Expected Value and Variance
Distribution Function
Hypothesis Testing
