```python
import numpy as np
import statistics
from scipy import stats

# Scores keyed by label
data = {"A": 90, "B": 86, "C": 70, "D": 95, "E": 95, "F": 95, "G": 95}
values = list(data.values())
print("Mean")
print(np.mean(values))
print("Median")
print(np.median(values))
print("Mode")
print(statistics.mode(values))
print("Standard Deviation")  # population standard deviation (ddof=0)
print(np.std(values))
print("Variance")  # population variance (ddof=0)
print(np.var(values))
print("Interquartile Range")
print(stats.iqr(values))
```
Probability
The likelihood of the occurrence of an event is known as probability.
The value of probability of an event lies between 0 and 1.
Formula
P(E)=\dfrac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}=\dfrac{n(E)}{n(S)}
Scenario
Consider a scenario where three coins are tossed simultaneously. What is the
probability of at least one being heads?
Solution
Sample Space
n(S)=8
Event
n(E)=7
Probability
P(E)=\dfrac{n(E)}{n(S)}=\dfrac{7}{8}
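The three-coin scenario can be checked by brute-force enumeration. The following is a minimal sketch; the variable names (`sample_space`, `event`) are illustrative choices rather than anything defined in the course material.

```python
from fractions import Fraction
from itertools import product

# Enumerate every outcome of tossing three coins: ('H','H','H'), ('H','H','T'), ...
sample_space = list(product("HT", repeat=3))

# Event E: at least one coin shows heads
event = [outcome for outcome in sample_space if "H" in outcome]

n_S = len(sample_space)
n_E = len(event)
probability = Fraction(n_E, n_S)

print(n_S, n_E, probability)  # 8 7 7/8
```

The only outcome excluded from the event is all tails, which is why n(E) is one less than n(S).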
Types of Events
Simple Event
Any event containing a single element of a sample space.
Compound Event
Any event containing two or more elements of a sample space.
Dependent Event
If the occurrence of an event is influenced by another event, it is called a
'Dependent Event'.
Independent Event
If the occurrence of an event is not influenced by another event, it is called an
'Independent Event'.
Exhaustive Events
A set of events that together cover the entire sample space.
Mutually Exclusive
Two events are said to be mutually exclusive events when both cannot occur at the
same time.
Example:
While driving a car, the steering wheel cannot be turned left and right at the same
time. Therefore, the events turning left and turning right are considered to be
mutually exclusive.
Properties of Probability
Property 1
Property 2
P(\emptyset)=0
Property 3
P(S)=1
Property 4
P(E)+P(\bar{E})=1
Types of Probability
Question:
In a deck of 52 cards, find the probability of a card that is red in color, and
contains the number 6.
Solution:
P(\text{red and } 6)=\dfrac{2}{52}=\dfrac{1}{26}
Explanation:
There are two red sixes in a deck of 52 cards: the 6 of hearts and the
6 of diamonds.
Conditional Probability:
Question:
Given that you draw a red card from a deck of 52 cards, what is the probability
that it is a card with number 6?
Solution:
P(6 \mid \text{red})=\dfrac{2}{26}=\dfrac{1}{13}
Explanation:
Among the 26 red cards, there are two cards with number 6. Therefore, the
probability is \frac{2}{26}.
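Both card questions can be verified by building the deck explicitly. This is a small sketch; the rank and suit labels are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
red_suits = {"hearts", "diamonds"}

deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

red_cards = [card for card in deck if card[1] in red_suits]
red_sixes = [card for card in red_cards if card[0] == "6"]

# Red AND six, out of the whole deck
joint = Fraction(len(red_sixes), len(deck))
# Six GIVEN red, out of the red cards only
conditional = Fraction(len(red_sixes), len(red_cards))

print(joint, conditional)  # 1/26 1/13
```

The numerator is the same in both cases; only the reference set in the denominator changes, from the full deck to the red cards.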
Addition Rule
Mutually Exclusive Events
If A and B are mutually exclusive events,
P(A \cup B)=P(A)+P(B)
Multiplication Rule
Independent Events
P(A \cap B)=P(A)\,P(B)
Dependent Events
P(A \cap B)=P(A)\,P(B \mid A)
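The addition and multiplication rules can be checked numerically. The sketch below uses two dice and a two-card draw as illustrative experiments; these examples and the helper `prob` are assumptions and do not come from the course material.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def prob(event):
    """Probability of an event given as a predicate over (first, second)."""
    favorable = [r for r in rolls if event(r)]
    return Fraction(len(favorable), len(rolls))

# Mutually exclusive events: the sum cannot be 2 and 12 at once
p_sum_2 = prob(lambda r: sum(r) == 2)
p_sum_12 = prob(lambda r: sum(r) == 12)
p_either = prob(lambda r: sum(r) in (2, 12))

# Independent events: the first die does not influence the second
p_first_even = prob(lambda r: r[0] % 2 == 0)
p_second_six = prob(lambda r: r[1] == 6)
p_both = prob(lambda r: r[0] % 2 == 0 and r[1] == 6)

# Dependent events: two aces drawn without replacement, P(A) * P(B | A)
p_two_aces = Fraction(4, 52) * Fraction(3, 51)

print(p_either == p_sum_2 + p_sum_12)          # True
print(p_both == p_first_even * p_second_six)   # True
print(p_two_aces)                              # 1/221
```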
Bayes Theorem
From the multiplication and addition probability rules, Bayes theorem can be
formed.
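then
P(Y|X) = \dfrac{P(X|Y)\,P(Y)}{P(X)}
where
P(Y|X) is the probability of Y given that X has occurred,
P(X|Y) is the probability of X given that Y has occurred,
P(Y) and P(X) are the probabilities of Y and X on their own, with P(X) \neq 0.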
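As a worked check of Bayes theorem, the sketch below reuses a standard 52-card deck, taking Y as "the card is a six" and X as "the card is red". The choice of events is an assumption made for illustration.

```python
from fractions import Fraction

# Y: the card is a six; X: the card is red (standard 52-card deck)
p_y = Fraction(4, 52)          # P(Y): four sixes in the deck
p_x = Fraction(26, 52)         # P(X): 26 red cards
p_x_given_y = Fraction(2, 4)   # P(X|Y): two of the four sixes are red

p_y_given_x = p_x_given_y * p_y / p_x
print(p_y_given_x)  # 1/13
```

The result agrees with counting directly: two sixes among the 26 red cards.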
Random Variables
A random variable assigns a numerical value to each outcome of a statistical
experiment.
Represented by X.
Example:
Consider the experiment of tossing a coin, where the result could either be heads
or tails.
Heads = 0
Tails = 1
X will be the random variable denoting the outcome of tossing the coin.
X=\big\{0,1\big\}
Consider the experiment of tossing many coins, and count the number of times you
get heads.
Explanation
Let X be the random variable denoting the number of times you get heads.
The number of times you get heads can be any non-negative integer in [0, \infty).
It cannot take a fractional value (2.5 heads).
A continuous random variable can take any value within an interval such as
[0, \infty).
It can take an uncountably infinite number of values.
Example:
The Fire department expects its employees to weigh between 150 and 250 pounds.
Explanation:
Definition
Let X be a discrete random variable with range R_X=\big\{x_1,x_2,x_3,\ldots\big\}
(finite or countably infinite).
The function
P_X(x)=\begin{cases} P(X=x) & x \in R_X \\ 0 & \text{otherwise} \end{cases}
is called the probability mass function (PMF) of X. It satisfies
\sum_{x \in R_X} P_X(x)=1
For any set A \subset R_X, P(X \in A)=\sum_{x \in A} P_X(x)
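A PMF can be tabulated directly for a small experiment. The sketch below assumes X is the number of heads in three fair coin tosses, so R_X = {0, 1, 2, 3}; the function name `P_X` mirrors the notation above, and the set A = {2, 3} is an arbitrary illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))
counts = Counter(outcome.count("H") for outcome in outcomes)

# P_X(x) = P(X = x) for x in R_X = {0, 1, 2, 3}
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

def P_X(x):
    """PMF: P(X = x) inside the range R_X, 0 otherwise."""
    return pmf.get(x, Fraction(0))

total = sum(pmf.values())                       # sums to 1 over R_X
p_at_least_two = sum(P_X(x) for x in {2, 3})    # P(X in A) with A = {2, 3}

print(pmf, total, p_at_least_two)
```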
F_X(x)=\int_{-\infty}^{x} f_X(t)\,dt
Definition
where,
P(a \le X \le b) : probability that some value x lies within this interval
d_x=b-a
Properties of PDF
Property 1
Consider a continuous random variable X that takes a value in the range [a,b].
To find the area under the curve over the specified interval, the following
formula is used:
P(a \le X \le b)=\int_{a}^{b} f_X(x)\,dx
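The area under a PDF over an interval can be approximated numerically. The sketch below assumes, purely for illustration, that weights follow a normal distribution with mean 200 and standard deviation 20; the course material does not specify a distribution. It uses `scipy.integrate.quad` and compares the result with the difference of CDF values.

```python
from scipy import integrate, stats

# Hypothetical model (an assumption, not from the text): weights ~ Normal(200, 20)
weights = stats.norm(loc=200, scale=20)
a, b = 150, 250

# Area under the PDF between a and b
area, _abs_err = integrate.quad(weights.pdf, a, b)

# The same probability from the CDF: F(b) - F(a)
from_cdf = weights.cdf(b) - weights.cdf(a)

print(area, from_cdf)
```

Both numbers describe the same quantity, P(a ≤ X ≤ b), so they should agree up to numerical integration error.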
Property 2
Property 3
The Central Limit Theorem states that if you have a population with
Mean \mu
Standard Deviation \sigma
and take sufficiently large random samples from the population with replacement, then
the distribution of the sample mean will be approximately normally distributed.
This will hold true regardless of whether the source population is normal or
skewed, provided the sample size is sufficiently large (usually n > 30).
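A quick simulation can make this behavior concrete. The sketch below assumes a skewed exponential population with mean 2 (and therefore standard deviation 2); the population, sample size, and seed are illustrative choices. Drawing independent values from the distribution plays the role of sampling with replacement.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Skewed source population: exponential with mean 2 (so its std is also 2)
population_mean = 2.0
population_std = 2.0
n = 50                 # sample size, above the usual n > 30 guideline
num_samples = 10_000   # number of independent samples drawn

samples = rng.exponential(scale=population_mean, size=(num_samples, n))
sample_means = samples.mean(axis=1)

print(sample_means.mean())            # close to the population mean
print(sample_means.std())             # close to population_std / sqrt(n)
print(population_std / np.sqrt(n))
```

A histogram of `sample_means` would look roughly bell-shaped even though the underlying population is strongly right-skewed.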
Discrete
E[X]=\sum_{k} x_k\,P(X=x_k), \quad k=1,2,3,\ldots
Notations
EX=E[X]=E(X)=\mu_X
Variance
The variance of any random variable can be obtained by using:
Var(X)=E\big[(X-\mu_X)^2\big]=E[X^2]-(E[X])^2
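The expected value and both forms of the variance can be computed exactly for a small discrete distribution. The sketch below assumes a fair six-sided die as the example; exact fractions are used so the two variance expressions can be compared without rounding.

```python
from fractions import Fraction

# Fair six-sided die: x_k = 1..6, each with probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())                      # E[X]
var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())     # E[(X - mu)^2]
var_short = sum(x**2 * p for x, p in pmf.items()) - mean**2    # E[X^2] - (E[X])^2

print(mean, var_def, var_short)  # 7/2 35/12 35/12
```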
Continuous
Discrete
Magic Formula
\forall a,b \in \mathbb{R}
Variance
Var(aX+b)=a^2\,Var(X)
Expected Value
E[aX+b]=a\,E[X]+b
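These two identities for a linear transformation aX+b can be confirmed on a concrete distribution. The sketch below assumes a fair die and the arbitrary constants a = 3, b = 5; the helpers `expectation` and `variance` are illustrative names.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die, as an illustration
a, b = 3, 5                                     # arbitrary constants

def expectation(g):
    """E[g(X)] for the discrete distribution above."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(g):
    """Var(g(X)) computed as E[g(X)^2] - (E[g(X)])^2."""
    return expectation(lambda x: g(x) ** 2) - expectation(g) ** 2

lhs_mean = expectation(lambda x: a * x + b)
lhs_var = variance(lambda x: a * x + b)

print(lhs_mean == a * expectation(lambda x: x) + b)   # True
print(lhs_var == a**2 * variance(lambda x: x))        # True
```

The shift b moves the mean but drops out of the variance, while the scale a enters the variance squared.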
Binomial Distribution
Binomial Distribution is defined as:
Formula
If a binomial experiment consists of n trials and results in x successes, and
the probability of success on an individual trial is p, the binomial probability
is:
P(X=x)=\binom{n}{x}\,p^x\,(1-p)^{n-x}
Mean (\mu) = np
Variance (\sigma^2) = npq, where q = 1 - p
Binomial Example
Question:
80 % of people who purchase pet insurance are women. If 9 pet insurance owners are
randomly selected, find the probability that precisely 6 are women.
Solution:
# n = 9 trials, p = 0.80 probability of success, k = 6 successes
from scipy import stats
n, p, k = 9, 0.80, 6
probability = stats.binom.pmf(k, n, p)
print(probability)
Result:
0.17616076799999994
Poisson Notations
e: A constant which has a value that is approximately 2.71828.
μ: The mean number of successes that occur in a specified region.
A Poisson random variable denotes the number of successes that result from a
Poisson experiment. The probability distribution of a Poisson random variable is
called a Poisson distribution function.
Formula
Suppose a Poisson experiment is conducted in which the average number of successes
within a given region is μ, then, the Poisson probability is:
P(x;\mu)=\dfrac{e^{-\mu}\,\mu^x}{x!}
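The Poisson formula can be evaluated by hand and compared with `scipy.stats.poisson.pmf`. In the sketch below, the values μ = 3 and x = 2 are hypothetical and chosen only to illustrate the calculation.

```python
import math
from scipy import stats

mu = 3   # hypothetical mean number of successes in the region
x = 2    # hypothetical number of successes of interest

# Direct evaluation of e^(-mu) * mu^x / x!
by_formula = math.exp(-mu) * mu**x / math.factorial(x)

# The same probability from SciPy's Poisson distribution
by_scipy = stats.poisson.pmf(x, mu)

print(by_formula, by_scipy)
```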
Poisson - Example
Question:
Python Code
0.006737946999085467
Normal distribution
Uniform distribution
Chi-squared distribution
F distribution
t distribution
P(y|X)=\dfrac{P(X|y)\,P(y)}{P(X)}
where,
Other Notations
P(y) is also called Class Probability.
P(X|y) is called Conditional Probability.
Fundamental
The fundamental assumption is that each feature makes an:
Independent
Equal
contribution to the outcome.
Note
The Bayes classifier assumes that the effect made by one predictor value on a given
class variable is independent of the other predictor variable values. This
assumption is called Conditional Independence.
Expansion
P(y|x_1,x_2,\ldots,x_n)=\dfrac{P(x_1|y)\,P(x_2|y)\cdots P(x_n|y)\,P(y)}{P(X)}
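The expansion can be traced with a tiny numeric example. In the sketch below, the class names, priors, and likelihoods are all hypothetical numbers invented for illustration; it shows the arithmetic of the formula, not a full classifier.

```python
import math

# Hypothetical class priors P(y) and per-feature likelihoods P(x_i | y)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [0.8, 0.3],   # P(x_1 | spam), P(x_2 | spam)
    "ham": [0.1, 0.5],    # P(x_1 | ham),  P(x_2 | ham)
}

# Numerator of the expansion: P(x_1|y) * ... * P(x_n|y) * P(y)
numerators = {y: math.prod(likelihoods[y]) * priors[y] for y in priors}

# P(X) is the same for every class; here it is the sum of the numerators
evidence = sum(numerators.values())
posteriors = {y: num / evidence for y, num in numerators.items()}

predicted = max(posteriors, key=posteriors.get)
print(posteriors, predicted)
```

Because P(X) is shared by all classes, the predicted class can be found from the numerators alone; dividing by P(X) only rescales them into probabilities.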
Hypothesis Testing
Statistical Hypothesis
Hypothesis Testing
Types of Hypotheses
Types of Statistical Hypotheses
Null Hypothesis
Denoted by H_0
Alternative Hypothesis
Denoted by H_1 or H_a
Types of Tests
One-tailed
Two-tailed
Decision Rules
Decision rules are formulated during the analysis phase to reject the null
hypothesis.
Methods of Formulation
P-Value
Region of Acceptance
Decision Errors
The process of hypotheses testing may result in errors.
Types of Errors
Type 1 Error
Type 2 Error
Decision Errors
Type 1 Error
Occurs when a researcher rejects a null hypothesis when it is true.
Significance level: the probability of committing a Type 1 error.
Denoted by \alpha
Type 2 Error
Occurs when a researcher fails to reject a null hypothesis that is false.
Power of test: the probability of not committing a Type 2 error, equal to 1-\beta.
The probability of committing a Type 2 error is denoted by \beta
Decision Rules
P-Value
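The P-value measures the strength of evidence regarding the null hypothesis: it is
the probability, assuming the null hypothesis is true, of observing a test statistic
at least as extreme as the one obtained. A small P-value is evidence against the
null hypothesis.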
Region of Acceptance
Range of values.
If the test statistic falls within the range, the null hypothesis is not rejected.
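Both decision rules can be applied to the same test. The sketch below assumes a two-tailed one-sample t-test with a significance level of 0.05; the sample values and the hypothesized mean of 50 are hypothetical data made up for illustration.

```python
from scipy import stats

# Hypothetical sample and hypothesized population mean (H0: mu = 50)
sample = [51.2, 49.8, 52.5, 50.9, 53.1, 48.7, 52.0, 51.6, 50.3, 52.8]
hypothesized_mean = 50
alpha = 0.05  # significance level: accepted probability of a Type 1 error

# Two-tailed one-sample t-test
result = stats.ttest_1samp(sample, popmean=hypothesized_mean)

# P-value rule: reject H0 when the P-value is below alpha
reject_by_p_value = result.pvalue < alpha

# Region-of-acceptance rule: reject H0 when |t| falls outside the
# interval bounded by the critical values
df = len(sample) - 1
critical = stats.t.ppf(1 - alpha / 2, df)
reject_by_region = abs(result.statistic) > critical

print(result.statistic, result.pvalue, reject_by_p_value, reject_by_region)
```

For a given test and significance level, the two rules lead to the same decision; they are two ways of expressing the same comparison.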
Applications
Applications of the General Hypothesis Testing procedure:
Mean
Difference between means
Difference between paired means
Goodness of fit
Homogeneity
Independence
Proportions
Difference between proportions
Regression slope
Mean
Hypothesis
Each makes a statement about how the population mean \mu is related to a specified
pre-defined value M.
Difference between means
Each makes a statement about the difference d between the mean of one population
\mu_1, and the mean of another population \mu_2.
Difference between paired means
Each makes a statement about how the true difference in population values \mu_d
is related to a hypothesized value D.
Reference
The following links will help you in getting in-depth knowledge of Hypothesis
Testing:
Goodness of fit
Homogeneity
Independence
Regression Slope
Proportion
Difference between Proportion
Chi-squared Test
The Chi-squared test helps in determining whether there is a significant difference
between expected frequencies and observed frequencies, in one or more categories.
Represented by \chi^2
Chi-squared Statistics
Chi-squared Statistics is represented by:
\chi^2_c=\sum\dfrac{(O_i-E_i)^2}{E_i}
Where,
c : Degrees of Freedom
O: Observed Values
E: Expected Values
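The statistic can be computed directly from the formula and cross-checked against `scipy.stats.chisquare`. The sketch below assumes hypothetical counts from 120 rolls of a die, with a fair die as the expected distribution; the data are invented for illustration.

```python
from scipy import stats

# Hypothetical counts from 120 rolls of a die; a fair die expects 20 per face
observed = [18, 22, 16, 25, 19, 20]
expected = [20] * 6

# Statistic from the formula: sum of (O_i - E_i)^2 / E_i
chi2_manual = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# SciPy computes the same statistic plus a P-value (df = categories - 1)
chi2_scipy, p_value = stats.chisquare(f_obs=observed, f_exp=expected)

print(chi2_manual, chi2_scipy, p_value)
```

A large P-value here would mean the observed counts are consistent with the expected frequencies, so the null hypothesis of a fair die would not be rejected.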
Statistical hypothesis test to see how well sample data fits a distribution from a
population with normal distribution.
Determines how well the sample represents the population.
Test for Independence
Chi-square
Coding
The following code shows the steps to test hypotheses by using Chi-squared Test:
Result
Output
Course Summary
The following is a summary of the topics covered in this course:
Statistics
Probability and Essentials
Rules of Probability
Random Variables
Expected Value and Variance
Distribution Function
Hypothesis Testing