Stats Formula Sheet
Conditional Probability
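Derivation of conditional probability using the multiplicative rule:
$$P(A \text{ and } B) = P(A) \cdot P(B \mid A)$$
Dividing both sides by $P(A)$:
$$\frac{P(A \text{ and } B)}{P(A)} = \frac{P(A) \cdot P(B \mid A)}{P(A)}$$
$$\frac{P(A \text{ and } B)}{P(A)} = P(B \mid A)$$
$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$$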
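As a quick numerical check of the formula above, here is a minimal Python sketch. The function name `conditional_probability` and the card-drawing example are illustrative assumptions, not part of the original sheet.

```python
def conditional_probability(p_a_and_b: float, p_a: float) -> float:
    """Return P(B|A) = P(A and B) / P(A)."""
    if p_a <= 0:
        raise ValueError("P(A) must be positive")
    return p_a_and_b / p_a


# Hypothetical example: drawing two aces in a row without replacement.
p_a = 4 / 52                      # P(A): the first card is an ace
p_a_and_b = (4 / 52) * (3 / 51)   # P(A and B): both cards are aces
p_b_given_a = conditional_probability(p_a_and_b, p_a)
print(round(p_b_given_a, 4))      # 3/51, about 0.0588
```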
Bayes Theorem
$$P(A_k \mid B) = \frac{P(A_k) \cdot P(B \mid A_k)}{P(A_1) \cdot P(B \mid A_1) + P(A_2) \cdot P(B \mid A_2) + \dots + P(A_n) \cdot P(B \mid A_n)}$$
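The theorem can be sketched in Python as follows. The function name `bayes_posterior`, the 0-based index `k`, and the three-machine example are assumptions made for illustration only.

```python
def bayes_posterior(priors, likelihoods, k):
    """Return P(A_k | B) given priors P(A_i) and likelihoods P(B | A_i).

    k is a 0-based index into the two lists (an assumption of this sketch).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Denominator: P(A_1)P(B|A_1) + ... + P(A_n)P(B|A_n)
    total = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[k] * likelihoods[k] / total


# Hypothetical example: three machines make 50%, 30% and 20% of all items,
# with defect rates of 1%, 2% and 3%. B = "the item is defective".
priors = [0.5, 0.3, 0.2]
likelihoods = [0.01, 0.02, 0.03]
posterior_machine_3 = bayes_posterior(priors, likelihoods, 2)
print(round(posterior_machine_3, 4))  # 0.006 / 0.017, about 0.3529
```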
Binomial Distribution
$$P(X = x) = {}^{n}C_{x} \, p^{x} q^{n-x}$$ (where $q = 1 - p$)
Mean: 𝐄(𝐗) = 𝛍 = 𝐧𝐩
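A minimal Python sketch of the binomial probability and mean, using only the standard library. The function names are illustrative assumptions.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return P(X = x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)


def binomial_mean(n: int, p: float) -> float:
    """Return E(X) = n * p."""
    return n * p


# Hypothetical example: exactly 3 heads in 10 fair coin tosses.
print(binomial_pmf(3, 10, 0.5))  # 120 / 1024 = 0.1171875
print(binomial_mean(10, 0.5))    # 5.0
```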
Poisson Distribution
Used for the number of events occurring in a fixed interval (usually an interval of time).
$$P(X = x) = \frac{e^{-\mu} \mu^{x}}{x!}$$
Mean: E(X) = μ
Variance: V(X) = σ² = μ
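The Poisson formula can be checked numerically with a short Python sketch. The function name `poisson_pmf` and the value μ = 3 are assumptions chosen for illustration; the loop confirms that the mean and variance both come out close to μ.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Return P(X = x) = e^(-mu) * mu^x / x!."""
    return exp(-mu) * mu**x / factorial(x)


# Hypothetical example: on average 3 events per interval.
mu = 3.0
xs = range(60)  # the tail beyond 60 is negligible for mu = 3
mean = sum(x * poisson_pmf(x, mu) for x in xs)
variance = sum((x - mean) ** 2 * poisson_pmf(x, mu) for x in xs)
print(round(mean, 6), round(variance, 6))  # both approximately 3.0
```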
Standard Normal Distribution
$$z = \frac{x - \mu}{\sigma}$$
Mean: E(Z) = 0
Variance: V(Z) = 1
If the V(X) of the binomial (npq) is greater than 5, the binomial distribution can be approximated by the normal distribution, using the formula:
$$z = \frac{x - np}{\sqrt{npq}}$$
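A minimal Python sketch of the normal approximation to the binomial. The function name `binomial_z` and the example values are assumptions; the sketch applies the formula as written, without a continuity correction.

```python
from math import sqrt
from statistics import NormalDist


def binomial_z(x: float, n: int, p: float) -> float:
    """Return z = (x - np) / sqrt(npq) for a binomial with parameters n and p."""
    q = 1 - p
    return (x - n * p) / sqrt(n * p * q)


# Hypothetical example: n = 100 trials, p = 0.5, so npq = 25 > 5.
z = binomial_z(60, 100, 0.5)
print(z)                                # 2.0
print(round(NormalDist().cdf(z), 4))    # approximate P(X <= 60), about 0.9772
```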
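Variance of the sample mean: $$S^2 = \frac{\sigma^2}{n}$$
Standard deviation of the sample mean: $$s = \frac{\sigma}{\sqrt{n}}$$
To find z-values for this distribution, use the formula:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$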
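The z-value for a sample mean can be sketched in Python as below. The function names and the example numbers are illustrative assumptions; `x_bar` stands for the sample mean.

```python
from math import sqrt


def standard_error(sigma: float, n: int) -> float:
    """Return sigma / sqrt(n), the standard deviation of the sample mean."""
    return sigma / sqrt(n)


def z_for_sample_mean(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """Return z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / standard_error(sigma, n)


# Hypothetical example: population mean 50, sigma 10, sample of 25 with mean 52.
print(standard_error(10, 25))             # 2.0
print(z_for_sample_mean(52, 50, 10, 25))  # 1.0
```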
T-Distribution
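This is used when the sample size is small and the population standard deviation is unknown.
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
where $\bar{x}$ is the sample mean, $\mu$ is the population mean and $s$ is the sample standard deviation.
Sample mean: $$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} x_i$$ (the mean of the t-distribution itself is equal to 0)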
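A minimal Python sketch of the t-value calculation. The function name `t_value` and the data are hypothetical, and the sketch assumes the sample standard deviation uses the n - 1 denominator, which is what `statistics.stdev` computes.

```python
from math import sqrt
from statistics import mean, stdev


def t_value(sample, mu: float) -> float:
    """Return t = (x_bar - mu) / (s / sqrt(n)) for a sample and a hypothesised mean."""
    n = len(sample)
    x_bar = mean(sample)   # sample mean
    s = stdev(sample)      # sample standard deviation (n - 1 denominator, assumed)
    return (x_bar - mu) / (s / sqrt(n))


# Hypothetical sample of 8 observations, tested against a population mean of 4.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(t_value(sample, 4), 4))  # about 1.3229
```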
Variance:
Measures of Central Tendency (Ungrouped Data)
$$\text{Average (Mean)} = \frac{\text{sum of all values}}{\text{number of values}}$$
Median:
1. The middle value when a variable’s values are ranked in order; the point that divides a distribution into
two equal halves.
2. When data are listed in order, the median is the point at which 50% of the cases are above and 50%
below it.
3. When two values lie at this central point (that is, when the number of values is even), the median is the average of those two values.
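The mean and median rules above can be sketched in Python as follows. The function names and data are illustrative assumptions; the standard library's `statistics.mean` and `statistics.median` would give the same results.

```python
def ungrouped_mean(values):
    """Return the sum of all values divided by the number of values."""
    return sum(values) / len(values)


def ungrouped_median(values):
    """Return the middle value of the ranked data (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(ungrouped_mean([1, 2, 3, 4]))    # 2.5
print(ungrouped_median([3, 1, 2]))     # 2
print(ungrouped_median([4, 1, 3, 2]))  # 2.5
```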
Measures of Central Tendency (Grouped Data)
Mode:
$$\text{Mode} = l + \frac{f_m - f_1}{2 f_m - f_1 - f_2} \times h$$
Where l = lower class boundary of the modal class, fm = frequency of the modal class, f1 = frequency of the class preceding the modal class, f2 = frequency of the class following the modal class, and h = length of the class interval of the modal class.
Median:
$$\text{Median} = l + \frac{h}{f} \left( \frac{n}{2} - c \right)$$
Where l = lower class boundary of the median class (i.e. the class for which the cumulative frequency is just in excess of n/2), h = class interval size of the median class, f = frequency of the median class, n = the total number of observations, and c = cumulative frequency of the class preceding the median class.
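A minimal Python sketch of the grouped-data mode and median, using the symbols defined above. It assumes the standard textbook formulas Mode = l + (fm - f1) / (2fm - f1 - f2) × h and Median = l + (h / f)(n/2 - c); the function names and the frequency table are hypothetical.

```python
def grouped_mode(l: float, h: float, fm: float, f1: float, f2: float) -> float:
    """Return l + (fm - f1) / (2*fm - f1 - f2) * h (standard formula, assumed)."""
    return l + (fm - f1) / (2 * fm - f1 - f2) * h


def grouped_median(l: float, h: float, f: float, n: float, c: float) -> float:
    """Return l + (h / f) * (n / 2 - c)."""
    return l + (h / f) * (n / 2 - c)


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5.
# n = 30, cumulative frequencies 5, 13, 25, 30, so 20-30 is both the modal
# class and the median class (first cumulative frequency above n/2 = 15).
print(round(grouped_mode(l=20, h=10, fm=12, f1=8, f2=5), 3))    # about 23.636
print(round(grouped_median(l=20, h=10, f=12, n=30, c=13), 3))   # about 21.667
```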
Empirical Relation between Mode, Median and Mean: Mode = 3(Median) – 2(Mean)
General Linear Model
Degrees of Freedom = N – 1
The model function is written as: $y = B_0 + B_1 x_1 + B_2 x_2 + \dots + B_n x_n$
Sum of Squares: Sum of Squares Explained + Sum of Squares Residual = Total Sum of Squares
$$R^2 = \frac{\text{Sum of Squares Explained}}{\text{Total Sum of Squares}}$$
$$\text{Mean Squares} = \frac{\text{Sum of Squares}}{\text{Degrees of Freedom}}$$
$$\text{Mean Explained Sum of Squares} = \frac{\text{Explained Sum of Squares}}{\text{Explained Degrees of Freedom}}$$
$$\text{Mean Residual Sum of Squares} = \frac{\text{Residual Sum of Squares}}{\text{Residual Degrees of Freedom}}$$
$$\text{F-Value} = \frac{\text{Mean Square of Explained}}{\text{Mean Square of Residual}}$$
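The sum-of-squares relations above can be tied together in a short Python sketch. The function name `anova_summary`, the dictionary keys and the input numbers are assumptions for illustration, not output from a real model.

```python
def anova_summary(ss_explained, ss_residual, df_explained, df_residual):
    """Return R^2, mean squares and F-value from sums of squares and degrees of freedom."""
    ss_total = ss_explained + ss_residual        # explained + residual = total
    ms_explained = ss_explained / df_explained   # mean explained sum of squares
    ms_residual = ss_residual / df_residual      # mean residual sum of squares
    return {
        "ss_total": ss_total,
        "r_squared": ss_explained / ss_total,
        "ms_explained": ms_explained,
        "ms_residual": ms_residual,
        "f_value": ms_explained / ms_residual,
    }


# Hypothetical values: SS explained = 80, SS residual = 20, df = 2 and 10.
summary = anova_summary(80, 20, 2, 10)
print(summary["r_squared"], summary["f_value"])  # 0.8 20.0
```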
Model for a Covariate: $$f = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} + c$$ ($a_1$ is always your intercept)
Confidence Intervals:
1. Lower Limit = Value of the coefficient – (1.96 x Standard Error of the Coefficient)
2. Upper Limit = Value of the coefficient + (1.96 x Standard Error of the Coefficient)
The 1.96 value is the z-value for a 95% confidence level (a two-tailed significance level of 0.05); use the z-value that corresponds to the significance level stated in the question.
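A minimal Python sketch of the confidence limits for a coefficient. The function name `confidence_interval` and the coefficient values are hypothetical; the sketch assumes a two-tailed interval and derives the z-value from the standard normal distribution, which gives about 1.96 for a 95% confidence level.

```python
from statistics import NormalDist


def confidence_interval(coefficient: float, standard_error: float, confidence: float = 0.95):
    """Return (lower limit, upper limit) = coefficient -/+ z * standard error."""
    # Two-tailed z-value: about 1.96 when confidence = 0.95 (assumed two-tailed).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return coefficient - z * standard_error, coefficient + z * standard_error


# Hypothetical coefficient of 2.5 with a standard error of 0.5.
lower, upper = confidence_interval(2.5, 0.5)
print(round(lower, 2), round(upper, 2))  # 1.52 3.48
```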
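First Quartile: $$Q_1 = l + \frac{h}{f} \left( \frac{n}{4} - c \right)$$
Second Quartile: $$Q_2 = l + \frac{h}{f} \left( \frac{2n}{4} - c \right) = l + \frac{h}{f} \left( \frac{n}{2} - c \right)$$
Third Quartile: $$Q_3 = l + \frac{h}{f} \left( \frac{3n}{4} - c \right)$$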
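The three quartile formulas share one pattern, l + (h / f)(kn/4 - c) for k = 1, 2, 3, which can be sketched in Python as below. The function name `grouped_quartile` and the frequency table are hypothetical; l, h, f and c are taken from the class that contains the kn/4-th observation.

```python
def grouped_quartile(k: int, l: float, h: float, f: float, n: float, c: float) -> float:
    """Return Q_k = l + (h / f) * (k * n / 4 - c) for k = 1, 2 or 3."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    return l + (h / f) * (k * n / 4 - c)


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5.
# n = 30, cumulative frequencies 5, 13, 25, 30.
q1 = grouped_quartile(1, l=10, h=10, f=8, n=30, c=5)    # n/4 = 7.5 falls in 10-20
q2 = grouped_quartile(2, l=20, h=10, f=12, n=30, c=13)  # 2n/4 = 15 falls in 20-30
q3 = grouped_quartile(3, l=20, h=10, f=12, n=30, c=13)  # 3n/4 = 22.5 falls in 20-30
print(q1, round(q2, 3), round(q3, 3))  # 13.125, about 21.667, about 27.917
```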
Hypothesis Testing
You need the z-value and the significance level (α) to decide whether to reject the null hypothesis.
Formula for calculating the z-value: $$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
If p-value < α value, then the results are significant, and you reject H0 in favour of H1 (α is usually 0.05). If p-value > α value, the results are not significant, and you fail to reject H0.
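A minimal Python sketch of a z-test, using the usual convention that a result is significant when the p-value is below α. The function name `z_test`, the two-tailed p-value and the example numbers are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar: float, mu: float, sigma: float, n: int, alpha: float = 0.05):
    """Return (z, p_value, reject_h0) for a two-tailed z-test (two-tailed is assumed)."""
    z = (x_bar - mu) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_h0 = p_value < alpha  # significant result: reject H0
    return z, p_value, reject_h0


# Hypothetical example: H0 says mu = 50; a sample of 25 has mean 52, sigma = 5.
z, p_value, reject_h0 = z_test(52, 50, 5, 25)
print(z, round(p_value, 4), reject_h0)  # 2.0 0.0455 True
```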