
EC2303 Formula Sheet

Descriptive statistics:
Population mean, variance and coefficient of variation:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i ; \quad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 ; \quad CV = \frac{\sigma}{\mu} \times 100\%$$

Population covariance and correlation:


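$$\mathrm{cov}(x, y) = \sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y) ; \quad \rho_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$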

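Sample mean, variance and coefficient of variation:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ; \quad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 ; \quad CV = \frac{s}{\bar{x}} \times 100\%$$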

Sample covariance and correlation:


$$\mathrm{cov}(x, y) = s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) ; \quad r_{xy} = \frac{\mathrm{cov}(x, y)}{s_x s_y} = \frac{s_{xy}}{s_x s_y}$$
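The sample formulas above can be checked numerically. The following is a minimal Python sketch using only the standard library; the function name `sample_summary` and the data values are illustrative assumptions, not part of the formula sheet.

```python
import math

def sample_summary(x, y):
    """Sample statistics for paired data, using the n - 1 divisor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    return {
        "x_bar": x_bar,
        "s2_x": s2_x,
        "cv_x": math.sqrt(s2_x) / x_bar * 100,  # coefficient of variation, in percent
        "s_xy": s_xy,
        "r_xy": s_xy / (math.sqrt(s2_x) * math.sqrt(s2_y)),
    }

# Made-up illustrative data (an assumption, not from the formula sheet)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
summary = sample_summary(x, y)
print(summary)
```

Replacing the divisor `n - 1` with `n` (and the sample means with the population means) gives the population versions.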

Introduction to probability:
Permutation rule formula:
$$P_x^n = n(n-1)(n-2)\cdots(n-x+1) = \frac{n!}{(n-x)!}$$
Where 𝑥! = 𝑥(𝑥 − 1)(𝑥 − 2) … (1)
Combination rule formula:
$$C_x^n = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x(x-1)\cdots(1)} = \frac{n!}{x!\,(n-x)!} = \frac{P_x^n}{x!}$$
Where 𝑥! = 𝑥(𝑥 − 1)(𝑥 − 2) … (1)
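As a quick check of the two counting rules, here is a small Python sketch built directly on the factorial definitions. The function names `permutations` and `combinations` are chosen for illustration only.

```python
import math

def permutations(n, x):
    """Number of ordered selections of x items from n: n! / (n - x)!."""
    return math.factorial(n) // math.factorial(n - x)

def combinations(n, x):
    """Number of unordered selections of x items from n: P(n, x) / x!."""
    return permutations(n, x) // math.factorial(x)

print(permutations(5, 2))  # 20
print(combinations(5, 2))  # 10
```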
Addition rule:
𝑃(𝐴) + 𝑃(𝐵) = 𝑃(𝐴 ∪ 𝐵) + 𝑃(𝐴 ∩ 𝐵)
Conditional probability:
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
Multiplication rule:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴|𝐵)𝑃(𝐵) = 𝑃(𝐵|𝐴)𝑃(𝐴)
Law of total probability:
$$P(A) = P(A|B_1)P(B_1) + P(A|B_2)P(B_2) + \ldots + P(A|B_k)P(B_k)$$
where 𝐵1 , 𝐵2 , … , 𝐵𝑘 are mutually exclusive and collectively exhaustive events.
Bayes’ theorem:
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$
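The law of total probability and Bayes' theorem are often used together: the first gives the denominator, the second reverses the conditioning. The sketch below assumes a hypothetical setting with two machines B1 and B2 (mutually exclusive and collectively exhaustive) and A = "item is defective"; all numbers are invented for illustration. Note that the roles of A and B are swapped relative to the formula above, since the quantity of interest here is P(B_j|A).

```python
# Hypothetical inputs (assumptions for illustration only)
prior = {"B1": 0.6, "B2": 0.4}          # P(B_j)
likelihood = {"B1": 0.02, "B2": 0.05}   # P(A | B_j)

# Law of total probability: P(A) = sum_j P(A | B_j) P(B_j)
p_a = sum(likelihood[b] * prior[b] for b in prior)

# Bayes' theorem: P(B_j | A) = P(A | B_j) P(B_j) / P(A)
posterior = {b: likelihood[b] * prior[b] / p_a for b in prior}

print(p_a)
print(posterior)
```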

Discrete random variable distribution:


Probability distribution function
𝑃(𝑥) = 𝑃(𝑋 = 𝑥) for all values of 𝑥
Cumulative probability distribution

$$F(x_m) = P(X \le x_m) = \sum_{x \le x_m} P(x)$$

Mean and variance:

$$E[X] = \mu = \sum_{x} x P(x) ; \quad \sigma^2 = E[(X - \mu)^2] = \sum_{x} (x - \mu)^2 P(x)$$

Let 𝑌 = 𝑔(𝑋) be any function of 𝑋:

$$E[Y] = E[g(X)] = \sum_{x} g(x) P(x)$$

Bernoulli distribution
𝑃(𝑋 = 1) = 𝑃 ; 𝑃(𝑋 = 0) = 1 − 𝑃
𝑀𝑒𝑎𝑛: 𝜇 = 𝑃; 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒: 𝜎 2 = 𝑃(1 − 𝑃)
Binomial distribution
$$P(x) = \frac{n!}{x!\,(n-x)!} P^x (1-P)^{n-x}$$
𝑀𝑒𝑎𝑛: 𝜇 = 𝑛𝑃; 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒: 𝜎 2 = 𝑛𝑃(1 − 𝑃)
Poisson distribution
$$P(x) = \frac{e^{-\lambda}\lambda^x}{x!} \quad \text{for } x = 0, 1, 2, \ldots$$
𝑀𝑒𝑎𝑛: 𝜇 = 𝜆 𝑎𝑛𝑑 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒: 𝜎 2 = 𝜆
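The binomial and Poisson probability functions translate directly into code. This is a minimal sketch using the Python standard library; the function names and the parameter values in the example are illustrative assumptions.

```python
import math

def binomial_pmf(x, n, p):
    """P(x) = n! / (x! (n - x)!) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(x) = exp(-lam) * lam^x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Illustrative check: the binomial mean computed from the pmf should equal n * p
n, p = 10, 0.3
binom_mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
print(binom_mean)            # close to n * p = 3.0
print(poisson_pmf(0, 2.0))   # equals exp(-2)
```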

Continuous random variable distribution:


Probability of 𝑋 between 𝑎 and 𝑏:
$$P(a < X < b) = \int_{a}^{b} f(x)\,dx$$

Cumulative distribution function:
$$F(x) = P(X \le x) = P(X < x) = \int_{x_{min}}^{x} f(u)\,du$$

Mean and variance:


$$\mu = E[X] = \int_{x_{min}}^{x_{max}} x f(x)\,dx ; \quad \sigma^2 = E[(X - \mu)^2] = \int_{x_{min}}^{x_{max}} (x - \mu)^2 f(x)\,dx$$

Uniform distribution
$$f(x) = \frac{1}{b-a} ; \quad F(x) = \frac{x-a}{b-a} \quad \text{for any } a \le x \le b$$
$$\text{Mean: } \mu = \frac{a+b}{2} ; \quad \text{variance: } \sigma^2 = \frac{(b-a)^2}{12}$$
Normal distribution
$$\text{For } X \sim N(\mu, \sigma^2), \quad f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / 2\sigma^2}$$
Standard normal distribution:
$$\text{For } Z \sim N(0, 1), \quad f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2 / 2}$$
Exponential distribution:
𝑓(𝑡) = 𝜆𝑒 −𝜆𝑡 ; 𝐹(𝑡) = 1 − 𝑒 −𝜆𝑡 for t > 0
$$\text{Mean: } \mu = \frac{1}{\lambda} \text{ and variance: } \sigma^2 = \frac{1}{\lambda^2}$$
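For continuous distributions, interval probabilities follow from the cumulative distribution function as P(a < X < b) = F(b) − F(a). The sketch below illustrates this for the uniform, exponential and normal cases. It uses `statistics.NormalDist` from the Python standard library for the normal CDF; the helper names and all parameter values are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def uniform_cdf(x, a, b):
    """F(x) = (x - a) / (b - a) for a <= x <= b."""
    return (x - a) / (b - a)

def exponential_cdf(t, lam):
    """F(t) = 1 - exp(-lam * t) for t > 0."""
    return 1.0 - math.exp(-lam * t)

# P(2 < X < 7) for X uniform on [0, 10]
p_uniform = uniform_cdf(7, 0, 10) - uniform_cdf(2, 0, 10)

# P(1 < T < 3) for T exponential with rate 0.5
p_expon = exponential_cdf(3, 0.5) - exponential_cdf(1, 0.5)

# P(85 < X < 130) for X ~ N(100, 15^2)
normal = NormalDist(mu=100, sigma=15)
p_normal = normal.cdf(130) - normal.cdf(85)

print(p_uniform, p_expon, p_normal)
```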
Joint Probability Distribution:
Joint probability distribution:
𝑃(𝑥, 𝑦) = 𝑃(𝑋 = 𝑥 ∩ 𝑌 = 𝑦)
Marginal probability distributions

$$P(x) = \sum_{y} P(x, y) \quad \text{and} \quad P(y) = \sum_{x} P(x, y)$$

Conditional probability distribution:
$$P(y|x) = \frac{P(x, y)}{P(x)} ; \quad P(x|y) = \frac{P(x, y)}{P(y)}$$
Conditional mean

$$\mu_{Y|X} = E[Y|X] = \sum_{y} y\,P(y|X)$$

Conditional variance
$$\sigma^2_{Y|X} = E\left[(Y - \mu_{Y|X})^2 \,|\, X\right] = \sum_{y} (y - \mu_{Y|X})^2 P(y|X)$$

Covariance

$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{y}\sum_{x} (x - \mu_X)(y - \mu_Y) P(x, y)$$

Correlation
$$\rho = \mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
General rules:
𝐸[𝑋 + 𝑌] = 𝜇𝑋 + 𝜇𝑌
𝐸[𝑋 − 𝑌] = 𝜇𝑋 − 𝜇𝑌
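$$\mathrm{Var}(X + Y) = \sigma_X^2 + \sigma_Y^2 + 2\,\mathrm{Cov}(X, Y)$$
$$\mathrm{Var}(X - Y) = \sigma_X^2 + \sigma_Y^2 - 2\,\mathrm{Cov}(X, Y)$$
$$\mathrm{Var}(X + Y + Z) = \sigma_X^2 + \sigma_Y^2 + \sigma_Z^2 + 2\,\mathrm{Cov}(X, Y) + 2\,\mathrm{Cov}(X, Z) + 2\,\mathrm{Cov}(Y, Z)$$
$$\mathrm{Var}(X - Y + Z) = \sigma_X^2 + \sigma_Y^2 + \sigma_Z^2 - 2\,\mathrm{Cov}(X, Y) + 2\,\mathrm{Cov}(X, Z) - 2\,\mathrm{Cov}(Y, Z)$$
$$\mathrm{Var}(X - Y - Z) = \sigma_X^2 + \sigma_Y^2 + \sigma_Z^2 - 2\,\mathrm{Cov}(X, Y) - 2\,\mathrm{Cov}(X, Z) + 2\,\mathrm{Cov}(Y, Z)$$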
𝐶𝑜𝑣(𝑋, 𝑋) = 𝑉𝑎𝑟(𝑋)
𝐶𝑜𝑣(𝑋 + 𝑌, 𝑍) = 𝐶𝑜𝑣(𝑋, 𝑍) + 𝐶𝑜𝑣(𝑌, 𝑍)
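The rule for Var(X + Y) can be verified on a small joint distribution by computing the variance of the sum directly and comparing it with the right-hand side. The joint probabilities below are made up for illustration, and the helper `expect` is a hypothetical name.

```python
# Hypothetical joint distribution P(x, y) on {0, 1} x {0, 1} (illustrative values)
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def expect(g):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * P(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))

# Left-hand side: variance of X + Y computed directly from the joint distribution
var_sum_direct = expect(lambda x, y: (x + y - mu_x - mu_y) ** 2)
# Right-hand side: the general rule
var_sum_rule = var_x + var_y + 2 * cov_xy

print(var_sum_direct, var_sum_rule)
```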

Sampling Distribution of Sample Mean:


Denote 𝑋̅ as the sample mean of a random sample from a population with mean 𝜇 and
variance 𝜎 2 .
Mean of sample mean:
𝜇𝑋̅ = 𝜇
Variance of sample mean if the population size is infinitely large:
$$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$$
Variance of sample mean if the population size is finite (i.e., $n/N > 0.05$):
$$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$$
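The two variance formulas differ only by the finite population correction factor (N − n)/(N − 1). A minimal Python sketch, with a hypothetical function name and made-up inputs:

```python
def var_sample_mean(sigma2, n, N=None):
    """Variance of the sample mean; pass N to apply the finite population correction."""
    variance = sigma2 / n
    if N is not None:
        variance *= (N - n) / (N - 1)
    return variance

# Illustrative values: population variance 25, sample size 100
print(var_sample_mean(25, 100))          # infinite population: 0.25
print(var_sample_mean(25, 100, N=1000))  # finite population of 1000: smaller than 0.25
```

The correction factor is below 1 whenever n > 1, so sampling a sizeable share of a finite population reduces the variance of the sample mean.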

Central Limit Theorem: Let $X_1, X_2, \ldots, X_n$ be a sample of $n$ independent random variables
with an arbitrary distribution with mean $\mu$ and variance $\sigma^2$, and let $\bar{X}$ be the sample mean. As
$n$ gets larger, the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
approaches the standard normal distribution.

Sampling Distribution of Sample Proportion:


Denote $\hat{p}$ as the sample proportion of a random sample from a population with
proportion $P$.
Mean of sample proportion:
𝜇𝑝̂ = 𝑃
Variance of sample proportion:

$$\sigma^2_{\hat{p}} = \frac{P(1 - P)}{n}$$

Central Limit Theorem: Let $X_1, X_2, \ldots, X_n$ be a sample of $n$ independent random variables
with an arbitrary distribution with population proportion $P$, and let $\hat{p}$ be the sample proportion.
As $n$ gets larger, the distribution of
$$Z = \frac{\hat{p} - P}{\sigma_{\hat{p}}}$$
approaches the standard normal distribution.

Sampling Distribution of Sample Variance:


Denote $s^2$ as the sample variance of a random sample from a population with mean $\mu$ and
variance $\sigma^2$.
Mean of sample variance:
𝐸(𝑠 2 ) = 𝜎 2
If the population distribution is normal, then
$$\frac{(n - 1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$
that is, it has a $\chi^2$ distribution with $n - 1$ degrees of freedom.
If the population distribution is normal, variance of sample variance:
$$\mathrm{Var}(s^2) = \frac{2\sigma^4}{n - 1}$$
Confidence Interval:
Confidence Level    90%     95%     98%     99%
𝜶                   0.10    0.05    0.02    0.01
𝒛𝜶/𝟐                1.645   1.96    2.33    2.58

If the population is normal or the sample size is large, and the population variance $\sigma^2$ is known, the
$100(1 - \alpha)\%$ confidence interval for the population mean is:
$$\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
If the population is normal or the sample size is large, and the population variance $\sigma^2$ is unknown, the
$100(1 - \alpha)\%$ confidence interval for the population mean is:
$$\bar{X} \pm t_{n-1,\alpha/2} \frac{s}{\sqrt{n}}$$
If the sample size is large, the $100(1 - \alpha)\%$ confidence interval for the
population proportion is:
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

If the population is normal, the $100(1 - \alpha)\%$ confidence interval for the population variance is:
$$\frac{(n - 1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{n-1,1-\alpha/2}}$$
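The z-based intervals for the mean (known variance) and for the proportion can be sketched in Python as follows. The critical value is obtained from `statistics.NormalDist().inv_cdf` rather than from the table; the function names and the input values are illustrative assumptions. The t and chi-square intervals are not covered here because the standard library has no quantile function for those distributions.

```python
import math
from statistics import NormalDist

def z_critical(alpha):
    """Upper alpha/2 critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_ci_known_sigma(x_bar, sigma, n, alpha=0.05):
    """x_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    me = z_critical(alpha) * sigma / math.sqrt(n)
    return x_bar - me, x_bar + me

def proportion_ci(p_hat, n, alpha=0.05):
    """p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    me = z_critical(alpha) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me

# Made-up inputs for illustration
print(mean_ci_known_sigma(x_bar=50, sigma=10, n=100))  # roughly (48.04, 51.96)
print(proportion_ci(p_hat=0.4, n=600))                 # roughly (0.361, 0.439)
```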

If the population size is infinitely large and the margin of error for the population mean is required to be $ME$,
the sample size $n$ is determined by
$$n = \frac{z^2_{\alpha/2}\,\sigma^2}{ME^2}$$
If the population size is finite and the margin of error for the population mean is required to be $ME$, the sample
size $n$ is determined by
$$n = \frac{n_0 N}{n_0 + N - 1}$$
where $n_0 = \dfrac{z^2_{\alpha/2}\,\sigma^2}{ME^2}$.

If the margin of error for the population proportion is required to be $ME$, the sample size $n$ is
determined by
$$n = \frac{0.25\,z^2_{\alpha/2}}{ME^2}$$
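The three sample-size formulas can be sketched as below. Rounding up to the next whole observation with `math.ceil` is an assumption added here (the formulas themselves return a real number); the function names and inputs are also illustrative. The critical value `z` is passed in directly, for example 1.96 for a 95% confidence level.

```python
import math

def n_for_mean(z, sigma, me, N=None):
    """Sample size for a target margin of error on the mean.

    With N given, applies the finite-population formula n0 * N / (n0 + N - 1).
    """
    n0 = z ** 2 * sigma ** 2 / me ** 2
    if N is not None:
        n0 = n0 * N / (n0 + N - 1)
    return math.ceil(n0)  # rounding up is an added convention, not part of the formula

def n_for_proportion(z, me):
    """Sample size for a target margin of error on a proportion."""
    return math.ceil(0.25 * z ** 2 / me ** 2)

# Made-up inputs for illustration
print(n_for_mean(1.96, sigma=10, me=2))         # 97
print(n_for_mean(1.96, sigma=10, me=2, N=500))  # 81
print(n_for_proportion(1.96, me=0.03))          # 1068
```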
Hypothesis Test:
One-sided
Significance Level   10%     5%      1%
𝜶                    0.10    0.05    0.01
𝒛𝜶                   1.28    1.645   2.33

Two-sided
Significance Level   10%     5%      1%
𝜶                    0.10    0.05    0.01
𝒛𝜶/𝟐                 1.645   1.96    2.58

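If the population variance $\sigma^2$ is known, reject the null hypothesis $H_0: \mu = \mu_0$ at significance level
$\alpha$,
against the one-sided alternative $H_1: \mu > \mu_0$ if
$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > z_{\alpha}$$
against the one-sided alternative $H_1: \mu < \mu_0$ if
$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < -z_{\alpha}$$
against the two-sided alternative $H_1: \mu \ne \mu_0$ if
$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > z_{\alpha/2} \quad \text{or} \quad z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < -z_{\alpha/2}$$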
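The known-variance decision rules can be collected into one small function. This is a sketch under stated assumptions: the function name `z_test`, the `alternative` labels and the example inputs are hypothetical, and critical values come from `statistics.NormalDist().inv_cdf` instead of the table.

```python
import math
from statistics import NormalDist

def z_test(x_bar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Return (z, reject) for H0: mu = mu0 with known population sigma."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    if alternative == "greater":        # H1: mu > mu0
        reject = z > NormalDist().inv_cdf(1 - alpha)
    elif alternative == "less":         # H1: mu < mu0
        reject = z < -NormalDist().inv_cdf(1 - alpha)
    else:                               # H1: mu != mu0
        crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = z > crit or z < -crit
    return z, reject

# Made-up inputs for illustration: z = (52 - 50) / (10 / sqrt(100)) = 2.0
print(z_test(52, 50, sigma=10, n=100, alpha=0.05))                          # (2.0, True)
print(z_test(52, 50, sigma=10, n=100, alpha=0.01))                          # (2.0, False)
print(z_test(52, 50, sigma=10, n=100, alpha=0.05, alternative="greater"))   # (2.0, True)
```

The unknown-variance case has the same structure with $s$ in place of $\sigma$ and $t_{n-1}$ critical values in place of $z$ values.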

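If the population variance $\sigma^2$ is unknown, reject the null hypothesis $H_0: \mu = \mu_0$ at significance
level $\alpha$,
against the one-sided alternative $H_1: \mu > \mu_0$ if
$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} > t_{n-1,\alpha}$$
against the one-sided alternative $H_1: \mu < \mu_0$ if
$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} < -t_{n-1,\alpha}$$
against the two-sided alternative $H_1: \mu \ne \mu_0$ if
$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} > t_{n-1,\alpha/2} \quad \text{or} \quad t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} < -t_{n-1,\alpha/2}$$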

Univariate Regression (Least Square Regression):


Population regression model:
𝑦𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 + 𝜖𝑖
where 𝜖𝑖 is the population error.
Sample regression model:
𝑦𝑖 = 𝑦̂𝑖 + 𝑒𝑖 = 𝑏0 + 𝑏1 𝑥𝑖 + 𝑒𝑖
where 𝑒𝑖 is the regression residual.
Slope coefficient
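$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\mathrm{cov}(x, y)}{s_x^2} = r\,\frac{s_y}{s_x}$$
where $\mathrm{cov}(x, y) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ and $r = \dfrac{\mathrm{cov}(x, y)}{s_x s_y}$.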

Intercept coefficient
𝑏0 = 𝑦̅ − 𝑏1 𝑥̅
Error/residual sum of squares, SSE, is:
$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Total sum of squares, SST, is
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
Regression sum of squares, SSR, is
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

𝑆𝑆𝑇 = 𝑆𝑆𝑅 + 𝑆𝑆𝐸


Coefficient of determination, R-squared, 𝑅 2
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
Under univariate least squares regression,
𝑟2 = 𝑅2
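The least squares coefficients, the sums of squares and R-squared can be computed together in a few lines. The following Python sketch uses a hypothetical function name `fit_line` and made-up data, and also checks the identities SST = SSR + SSE and r² = R².

```python
def fit_line(x, y):
    """Least squares fit y = b0 + b1 * x with sums of squares and R-squared."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept
    y_hat = [b0 + b1 * xi for xi in x]
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    return {
        "b0": b0, "b1": b1,
        "SSE": sse, "SSR": ssr, "SST": syy,
        "R2": ssr / syy,
        "r2": sxy ** 2 / (sxx * syy),  # squared sample correlation
    }

# Made-up illustrative data (an assumption, not from the formula sheet)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fit = fit_line(x, y)
print(fit)
```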
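If the population error $\epsilon$ follows a normal distribution and the variance of the population error, $\sigma^2$, is
known, $b_1$ follows the normal distribution:
$$b_1 \sim N(\beta_1, \sigma^2_{b_1})$$
where
$$\sigma^2_{b_1} = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2}{(n - 1)s_x^2}$$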
The estimator for the variance of the population error is
$$\hat{\sigma}^2 = s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2}$$
where $s_e$ is called the residual standard error.
If the population error $\epsilon$ follows a normal distribution and the variance of the population error, $\sigma^2$,
is unknown,
$$t = \frac{b_1 - \beta_1}{s_{b_1}} \sim t_{n-2}$$
follows the Student's t-distribution with $n - 2$ degrees of freedom, where
$$s^2_{b_1} = \frac{s_e^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{s_e^2}{(n - 1)s_x^2}$$
The $100(1 - \alpha)\%$ confidence interval for the population slope coefficient $\beta_1$ is
$$b_1 - t_{n-2,\alpha/2}\, s_{b_1} < \beta_1 < b_1 + t_{n-2,\alpha/2}\, s_{b_1}$$
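Reject the null hypothesis $H_0: \beta_1 = \beta^*$ against the alternative $H_1: \beta_1 \ne \beta^*$ if
$$t = \frac{b_1 - \beta^*}{s_{b_1}} > t_{n-2,\alpha/2} \quad \text{or} \quad t = \frac{b_1 - \beta^*}{s_{b_1}} < -t_{n-2,\alpha/2}$$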
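Inference on the slope combines the residual standard error, the standard error of $b_1$, the confidence interval and the two-sided test. The sketch below is a minimal illustration: the function name `slope_inference` and the data are assumptions, and because the Python standard library has no t quantile function, the critical value `t_crit` must be supplied by the caller from a t table (3.182 is the tabulated value for 3 degrees of freedom at $\alpha/2 = 0.025$).

```python
import math

def slope_inference(x, y, beta_star, t_crit):
    """Slope estimate, its standard error, t statistic, CI and two-sided decision.

    t_crit is the critical value t_{n-2, alpha/2}, looked up by the caller.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2_e = sse / (n - 2)               # estimated error variance
    s_b1 = math.sqrt(s2_e / sxx)       # standard error of the slope
    t_stat = (b1 - beta_star) / s_b1
    ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)
    reject = t_stat > t_crit or t_stat < -t_crit
    return {"b1": b1, "s_b1": s_b1, "t": t_stat, "ci": ci, "reject": reject}

# Made-up illustrative data; n = 5, so n - 2 = 3 degrees of freedom
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
result = slope_inference(x, y, beta_star=0.0, t_crit=3.182)
print(result)
```

With these data the t statistic is about 2.12, below the critical value, so the null hypothesis of a zero slope is not rejected at the 5% level; equivalently, the confidence interval contains zero.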