
Contents

CHAPTER 4
CHAPTER 5
CHAPTER 6: DISCRETE DISTRIBUTION
CHAPTER 7: CONTINUOUS DISTRIBUTION
CHAPTER 8: SAMPLING DISTRIBUTION
CHAPTER 9: ONE-SAMPLE HYPOTHESIS TESTS
CHAPTER 10: TWO-SAMPLE TEST
CHAPTER 11: ANALYSIS OF VARIANCE
CHAPTER 12: SIMPLE LINEAR REGRESSION

CHAPTER 4
Distribution’s Shape Statistics
Skewed Left (negative skewness) Mean < Median < Mode
Symmetric 𝑀𝑒𝑎𝑛 ≈ 𝑀𝑒𝑑𝑖𝑎𝑛 ≈ 𝑀𝑜𝑑𝑒
Skewed Right (positive skewness) Mean > Median > Mode

Type of variable Best measure of central tendency

Nominal Mode
Ordinal Median
Interval / Ratio (not skewed) Mean
Interval / Ratio (skewed) Median

Population Mean Sample Mean

E(X) = μ = (Σᵢ₌₁ᴺ xᵢ)/N x̄ = (Σᵢ₌₁ⁿ xᵢ)/n
Population Variance Sample Variance
σ² = Σᵢ₌₁ᴺ (xᵢ − μ)²/N s² = Σᵢ₌₁ⁿ (xᵢ − x̄)²/(n − 1)
 =VAR.S(Data)
Population Standard Deviation Sample Standard Deviation
𝜎 𝑠
 =STDEV.S(Data)
Coefficient of Variation
CV = 100 · σ/μ (population) CV = 100 · s/x̄ (sample)
Mean Absolute Deviation
MAD = E[|X − μ|] MAD = (Σᵢ₌₁ⁿ |xᵢ − x̄|)/n
 =AVEDEV(Data)
Covariance
σxy = E[(X − μx)(Y − μy)] = E[XY] − μx·μy sxy = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ)/(n − 1)
Correlation Coefficient
ρxy = σxy/(σx·σy) rxy = sxy/(sx·sy) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / [ √(Σᵢ₌₁ⁿ (xᵢ − x̄)²) · √(Σᵢ₌₁ⁿ (yᵢ − ȳ)²) ]

Geometric Mean (denoted G) G = (x₁·x₂·…·xₙ)^(1/n) (all xᵢ > 0)


Growth Rates GR = (xₙ/x₁)^(1/(n−1)) − 1
Range = 𝑥𝑚𝑎𝑥 − 𝑥𝑚𝑖𝑛
Midrange = (xₘₐₓ + xₘᵢₙ)/2
Trimmed Mean (e.g., a 5% trimmed mean trims 5% from each tail) = TRIMMEAN(Data,10%)
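The Chapter 4 sample statistics above can be sketched in plain Python. This is a minimal illustration, not part of the original sheet; `sample_stats` is a hypothetical helper name. It mirrors Excel's VAR.S, STDEV.S, and AVEDEV:

```python
# Minimal sketch of the Chapter 4 sample statistics (illustrative only).
from math import sqrt

def sample_stats(data):
    n = len(data)
    mean = sum(data) / n
    var_s = sum((x - mean) ** 2 for x in data) / (n - 1)  # =VAR.S(Data)
    std_s = sqrt(var_s)                                    # =STDEV.S(Data)
    cv = 100 * std_s / mean                                # coefficient of variation
    mad = sum(abs(x - mean) for x in data) / n             # =AVEDEV(Data)
    return mean, var_s, std_s, cv, mad

mean, var_s, std_s, cv, mad = sample_stats([2, 4, 4, 4, 5, 5, 7, 9])
```

Note that the sample variance divides by n − 1 while the mean absolute deviation divides by n, exactly as in the formulas above.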

CHAPTER 5
Complement of an event P(A) + P(A’) = 1
Union of two events (A or B) 𝐴∪𝐵
Intersection of two events (A and B) 𝐴∩𝐵
General Law of Addition: 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)
Mutually exclusive events If 𝐴 ∩ 𝐵 = ∅ then 𝑃(𝐴 ∩ 𝐵) = 0
In this case: 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵)
Collectively Exhaustive Events Together the events cover the entire sample space (at least one must occur)
Conditional Probability P(A|B) = P(A ∩ B)/P(B) for P(B) > 0
Independent events P(A₁ ∩ A₂ ∩ … ∩ Aₙ) = P(A₁)P(A₂)…P(Aₙ)
[when P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A)]
Bayes’s Theorem P(B|A) = P(A|B)P(B)/P(A)
= P(A|B)P(B) / [P(A|B)P(B) + P(A|B′)P(B′)]
Permutation nPr = n!/(n − r)!
Combination nCr = n!/[r!(n − r)!]
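The counting formulas and the two-form Bayes identity above can be checked numerically. This is a sketch, not from the sheet; `bayes` is an assumed helper name, and `math.perm`/`math.comb` are exact integer versions of nPr and nCr:

```python
# Sketch: nPr / nCr and the total-probability form of Bayes' theorem.
from math import perm, comb

def bayes(p_b, p_a_given_b, p_a_given_not_b):
    # P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B')P(B')]
    num = p_a_given_b * p_b
    return num / (num + p_a_given_not_b * (1 - p_b))

n_p_r = perm(5, 2)   # 5!/(5-2)!
n_c_r = comb(5, 2)   # 5!/(2!·3!)
# e.g. a rare condition with P(B)=0.01, a test with P(A|B)=0.99, P(A|B')=0.05
posterior = bayes(0.01, 0.99, 0.05)
```

With the illustrative numbers above, the posterior works out to 0.0099/0.0594, i.e. only about one in six positives is a true positive.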

CHAPTER 6: DISCRETE DISTRIBUTION
E(X) = μ = Σᵢ₌₁ᴺ xᵢ·P(xᵢ)
Var(X) = σ² = Σᵢ₌₁ᴺ (xᵢ − μ)²·P(xᵢ)
Uniform Distribution
Parameters a =lower limit
b = upper limit
PDF P(X = x) = 1/(b − a + 1)
CDF P(X ≤ x) = (x − a + 1)/(b − a + 1)
Domain x = a, a+1, a+2, …, b-1, b
Mean = (a + b)/2
Standard Deviation = √( ([(b − a) + 1]² − 1) / 12 )
Random data generation in Excel =RANDBETWEEN(a,b)
Comments Useful as a benchmark, to generate random
integers for sampling, or in simulation
models; always symmetric.

Binomial Distribution
Parameter n = number of trials
𝜋 = probability of success
PDF P(X = x) = [n! / (x!(n − x)!)] · π^x (1 − π)^(n−x)
Excel*PDF = BINOM.DIST(𝑥, 𝑛, 𝜋, 0)
Excel*CDF = BINOM.DIST(𝑥, 𝑛, 𝜋, 1)
Domain x = 0, 1, 2,…n
Mean 𝑛𝜋
Standard Deviation √𝑛𝜋(1 − 𝜋)
Random data generation in Excel = BINOM.INV(𝑛, 𝜋, 𝑅𝐴𝑁𝐷( ))
Comments Skewed right if 𝜋 < .5, skewed left if 𝜋 > .5
and symmetric if 𝜋 = .5

Poisson Distribution
Parameter 𝜆 = mean arrivals per unit of time or space
PDF P(X = x) = λ^x e^(−λ) / x!
Excel*PDF = POISSON.DIST(𝑥, 𝜆, 0)
Excel*CDF = POISSON.DIST(𝑥, 𝜆, 1)
Domain x = 0, 1, 2,…(no obvious upper limit)
Mean 𝜆
Standard Deviation √𝜆

Comments Always right-skewed, but less so for larger
𝜆

Hypergeometric Distribution
Parameter N = number of items in the population
n = sample size
s = number of successes in the population
PDF P(X = x) = [ₛCₓ · ₍N₋s₎C₍n₋x₎] / (NCn)
Excel*PDF =HYPGEOM.DIST(x,n,s,N,0)
Domain 𝑚𝑎𝑥(0, 𝑛 − 𝑁 + 𝑠) ≤ 𝑋 ≤ 𝑚𝑖𝑛(𝑠, 𝑛)
Mean 𝑛𝜋 𝑤ℎ𝑒𝑟𝑒 𝜋 = 𝑠/𝑁
Standard Deviation √(nπ(1 − π)) · √( (N − n)/(N − 1) )
Comments Similar to the binomial, but sampling without
replacement from a finite population. It can
be approximated by a binomial with π = s/N
if n/N < 0.05, and is symmetric if s/N = 0.5

Geometric Distribution
Parameters 𝜋 = probability of success
PDF 𝑃(𝑋 = 𝑥) = 𝜋(1 − 𝜋)𝑥−1
CDF 𝑃(𝑋 ≤ 𝑥) = 1 − (1 − 𝜋)𝑥
Domain x = 1, 2, …
Mean = 1/π
Standard Deviation = √( (1 − π)/π² )
Random data generation in Excel =1+INT(LN(1-RAND()/LN(1-𝜋))
Comments Describes the number of trials before the
first success. Highly skewed.
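The discrete PDFs in this chapter can be evaluated directly from their formulas. A minimal Python sketch (the function names are illustrative, not from the sheet), mirroring the Excel calls BINOM.DIST, POISSON.DIST, and HYPGEOM.DIST with cumulative = 0:

```python
# Sketch: discrete PMFs from their closed-form definitions.
from math import comb, exp, factorial

def binom_pmf(x, n, pi):          # = BINOM.DIST(x, n, pi, 0)
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

def poisson_pmf(x, lam):          # = POISSON.DIST(x, lam, 0)
    return lam**x * exp(-lam) / factorial(x)

def hypergeom_pmf(x, s, n, N):    # = HYPGEOM.DIST(x, n, s, N, 0)
    return comb(s, x) * comb(N - s, n - x) / comb(N, n)
```

For example, binom_pmf(2, 4, 0.5) gives C(4,2)·0.5⁴ = 0.375, and hypergeom_pmf(1, 2, 2, 4) gives C(2,1)·C(2,1)/C(4,2) = 2/3.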

CHAPTER 7: CONTINUOUS DISTRIBUTION
Quartile to CDF
Q1 F(z) = 0.25
Q2 F(z) = 0.5
Q3 F(z) = 0.75
Q4 F(z) = 1
Standard error (of the mean): SE = σ/√n
Uniform Distribution
Parameters a = lower limit
b = upper limit
PDF f(x) = 1/(b − a)
CDF P(X ≤ x) = (x − a)/(b − a)
Domain 𝑎≤𝑥≤𝑏
Mean (a + b)/2
Standard Deviation √( (b − a)²/12 )
Shape Symmetric with no mode
Random data in Excel =a+(b-a)*RAND()
Comments Used as a conservative what-if benchmark
and in simulation

Normal Distribution
Parameters 𝜇 = population mean
𝜎 = population standard deviation
PDF f(x) = [1/(σ√(2π))] · e^(−(1/2)·((x−μ)/σ)²)
Domain −∞ < 𝑥 < +∞
Mean 𝜇
Standard Deviation 𝜎
Shape Symmetric, mesokurtic, and bell-shaped
PDF in *Excel =NORM.DIST(𝑥, 𝜇, 𝜎, 0)
CDF in *Excel =NORM.DIST(𝑥, 𝜇, 𝜎, 1)
Random data generation in *Excel =NORM.INV(RAND(), 𝜇, 𝜎)

Standard Normal Distribution
Parameters 𝜇 = population mean
𝜎 = population standard deviation
PDF f(z) = [1/√(2π)] · e^(−z²/2), where z = (x − μ)/σ
Domain −∞ < z < +∞
Mean 0
Standard Deviation 1
Shape Symmetric, mesokurtic, and bell-shaped
CDF in *Excel =NORM.S.DIST(z,1)
Random data in *Excel =NORM.S.INV(RAND())
Comment No simple formula exists for the normal CDF, so we
use a CDF table
How to find CDF:
1. Find z → P(X₁ ≤ X ≤ X₂) = P(Z₁ ≤ Z ≤ Z₂) = F(Z₂) − F(Z₁)
2. Look up the CDF table {use for P(Z ≤ z) or P(Z < z) to find F(z)}
How to find x (Inverse Normal):
1. Given F(z), look up z, then convert back: x = zσ + μ

Inverse Normal table


Percentile z
95th (highest 5%) 1.645
90th (highest 10%) 1.282
75th (highest 25%) 0.675
25th (lowest 25%) -0.675
10th (lowest 10%) -1.282
5th (lowest 5%) -1.645

Normal Approximation to the Binomial


A data set is large enough (use the approximation) when:
n > 20
nπ > 5
n(1 − π) > 5
Standardize 𝜇 = nπ
σ = √(nπ(1 − π))
→ z and solve like a standard normal
distribution
When approximating the binomial, apply the continuity correction: P(X = x) ≈ P(x − 0.5 < X < x + 0.5)

Normal Approximation to the Poisson


A data set large when: 𝜆 𝑙𝑎𝑟𝑔𝑒
Standardize 𝜇=𝜆
𝜎 = √𝜆
→ 𝑧 and solve like standard normal
distribution

Exponential Distribution (continuous Poisson)
Parameter 𝜆 = mean arrivals per unit of time or space
(same as Poisson)
PDF 𝑓(𝑥) = 𝜆𝑒 −𝜆𝑥
CDF 𝑃(𝑋 ≤ 𝑥) = 1 − 𝑒 −𝜆𝑥
𝑃(𝑋 > 𝑥) = 𝑒 −𝜆𝑥

Domain 𝑥≥0
Mean 1/𝜆 (mean time between events)
Standard Deviation 1/𝜆
Shape Always right-skewed
CDF in *Excel =EXPON.DIST(𝑥, 𝜆, 1)
Random data in Excel =-LN(RAND())/𝜆
Comments Waiting time is exponential when arrivals
follow a Poisson model. Often 1/λ (the mean
time between events) is given rather than λ
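The continuous CDFs of this chapter can be computed from the standard library alone, using the error function for the normal case. A small sketch (the helper names `norm_cdf`/`expon_cdf` are assumed), mirroring NORM.DIST(…,1) and EXPON.DIST(…,1):

```python
# Sketch: normal and exponential CDFs via math.erf / math.exp.
from math import erf, sqrt, exp

def norm_cdf(x, mu=0.0, sigma=1.0):   # = NORM.DIST(x, mu, sigma, 1)
    z = (x - mu) / (sigma * sqrt(2))
    return 0.5 * (1 + erf(z))

def expon_cdf(x, lam):                # = EXPON.DIST(x, lam, 1)
    return 1 - exp(-lam * x)
```

As a sanity check against the inverse-normal table above, norm_cdf(1.645) ≈ 0.95 and norm_cdf(1.96) ≈ 0.975.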

CHAPTER 8: SAMPLING DISTRIBUTION
Central Limit Theorem: even if the population of X is not normal, the sample mean x̄ will be
approximately normal as long as the sample size is large enough.

Population shape: Normal 𝑥̅ : always normal


Population shape: Symmetric 𝑛 ≥ 15 : almost normal
Population shape: Unknown 𝑛 ≥ 30 : almost normal
Note μx̄ = μ
σx̄ = σ/√n

90% Interval 95% Interval 99% Interval

μ ± 1.645·σ/√n μ ± 1.96·σ/√n μ ± 2.576·σ/√n

Interval
Sample means [μ − z·σ/√n, μ + z·σ/√n]
Population [μ − zσ, μ + zσ]

Confidence Interval
zx̄ = (x̄ − μx̄)/(σx/√n); σx̄ = σx/√n (standard error); μx̄ = μ
1/ 𝝈𝒙 is known
(1 − α) CI = x̄ ± z(1−α/2) · σx̄ = x̄ ± z(1−α/2) · σx/√n
(point estimate) (margin of error)
F(z) = 1 − α/2 → CDF table → z
Margin of error e = z(1−α/2) · σx/√n
→ n = [z(1−α/2)]² · σx² / e²
2/ 𝝈𝒙 is not known
Sample Standard Deviation
sx = √( Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1) )
Degrees of freedom: d.f. = n − 1 (larger d.f. → approaches the normal dist.)
Substitute 𝒔𝒙 for 𝝈𝒙: tx = (x̄ − μx)/(sx/√n) ~ t(n−1)
*EXCEL: =T.INV.2T(𝛼, d.f.)
(1 − α) CI = x̄ ± t(1−α/2) · sx/√n
(point estimate) (width)
F(t) = 1 − α/2 → t table (vertical axis is d.f. = n − 1) → t

9
2. CI for 𝝅 (if p is unknown, use 𝒑 = 𝟎.𝟓)
σp = √( π(1 − π)/n )
Sample proportion p = x/n
Shape Symmetric if: π = 0.5
Closer to symmetric as n increases
Rule of Thumb p = x/n assumed normal if:
nπ ≥ 10
n(1 − π) ≥ 10
(1 − α) CI = p ± z(1−α/2) · √( p(1 − p)/n )
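The known-σ confidence interval above can be computed with `statistics.NormalDist` from the standard library. A sketch under the Case 1 assumptions (σ known); the helper name is illustrative:

```python
# Sketch: (1 - alpha) CI for the mean when sigma is known.
from math import sqrt
from statistics import NormalDist

def ci_mean_known_sigma(xbar, sigma, n, alpha=0.05):
    # xbar ± z(1-alpha/2) · sigma/sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    e = z * sigma / sqrt(n)          # margin of error
    return xbar - e, xbar + e

# e.g. xbar = 50, sigma = 10, n = 100 → margin ≈ 1.96 · 10/10
lo, hi = ci_mean_known_sigma(50, 10, 100)
```

Solving the margin-of-error equation for n, as in the sheet, just inverts the same expression: n = (z·σ/e)².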

CHAPTER 9: ONE-SAMPLE HYPOTHESIS TESTS
(*** 𝜶 in this chapter differs from the 𝜶 in CHAP 7)

If n increases, α and β both decrease
(larger samples reduce both types of error)
*Note terms:
level of significance 𝛼
Left-tailed test Two-tailed test Right-tailed test
𝐻0 : 𝜇 ≥ 𝜇0 𝐻0 : 𝜇 = 𝜇0 𝐻0 : 𝜇 ≤ 𝜇0
𝐻1 : 𝜇 < 𝜇0 𝐻1 : 𝜇 ≠ 𝜇0 𝐻1 : 𝜇 > 𝜇0

One-sided / tailed test (shown for right-tailed; for left-tailed, reverse the z and t conditions)
Null hypothesis 𝐻0: 𝜇 ≤ 𝜇0 (right-tailed) 𝐻0: 𝜇 ≥ 𝜇0 (left-tailed)
Alternative hypothesis 𝐻1: 𝜇 > 𝜇0 (right-tailed) 𝐻1: 𝜇 < 𝜇0 (left-tailed)
Decide on the significance level 𝜶
Type I error reject 𝐻0 | 𝐻0 is true False positive
P(type I error)= 𝛼
Type II error don’t reject 𝐻0 | 𝐻0 is false False negative
P(type II error)= 𝛽
Power Correctly reject 𝐻0 | 𝐻0 is false
P(power)= 1 − 𝛽 Sensitivity
Compute test statistic z_calc = (x̄ − μ₀)/(σx/√n) or t_calc = (x̄ − μ₀)/(sx/√n)
𝐹(𝑧1−𝛼 ) = 1 − 𝛼 → 𝐶𝐷𝐹 𝑡𝑎𝑏𝑙𝑒 → 𝑐 = ⋯
*Excel: 𝑐𝑟𝑖𝑔ℎ𝑡−𝑡𝑎𝑖𝑙𝑒𝑑 =norm.s.inv(1-𝛼)
𝑐𝑙𝑒𝑓𝑡−𝑡𝑎𝑖𝑙𝑒𝑑 =norm.s.inv(𝛼)
𝐹(𝑡1−𝛼 ) = 1 − 𝛼 → 𝑡 𝑡𝑎𝑏𝑙𝑒 → 𝑐 = ⋯
*Excel: 𝒄𝒓𝒊𝒈𝒉𝒕−𝒕𝒂𝒊𝒍𝒆𝒅 =t.inv(1 − 𝛼, deg_𝑓𝑟𝑒𝑒𝑑𝑜𝑚)
𝒄𝒍𝒆𝒇𝒕−𝒕𝒂𝒊𝒍𝒆𝒅 =t.inv(𝛼, deg_𝑓𝑟𝑒𝑒𝑑𝑜𝑚)
𝒛𝒄𝒂𝒍𝒄 < 𝒄 𝒐𝒓 𝒕𝒄𝒂𝒍𝒄 < 𝒄 do not reject 𝐻0 at 𝛼% level

𝒛𝒄𝒂𝒍𝒄 > 𝒄 𝒐𝒓 𝒕𝒄𝒂𝒍𝒄 > 𝒄 reject 𝐻0 at 𝛼% level

𝒑 − 𝒗𝒂𝒍𝒖𝒆 Left tailed: 𝑝𝑣𝑎𝑙𝑢𝑒 = 𝑐𝑑𝑓(𝑧𝑐𝑎𝑙𝑐 ) / = 𝑐𝑑𝑓(𝑡𝑐𝑎𝑙𝑐 )


Right tailed: 𝑝𝑣𝑎𝑙𝑢𝑒 = 1 − 𝑐𝑑𝑓(𝑧𝑐𝑎𝑙𝑐 )=1 − 𝑐𝑑𝑓(𝑡𝑐𝑎𝑙𝑐 )

*Excel: 𝒑𝒗𝒂𝒍𝒖𝒆(𝒍𝒆𝒇𝒕−𝒕𝒂𝒊𝒍𝒆𝒅) = norm.s.dist(𝑧𝑐𝑎𝑙𝑐 , 1)
𝒑𝒗𝒂𝒍𝒖𝒆(𝒓𝒊𝒈𝒉𝒕−𝒕𝒂𝒊𝒍𝒆𝒅) = 1-norm.s.dist(𝑧𝑐𝑎𝑙𝑐 , 1)
𝒑𝒗𝒂𝒍𝒖𝒆(𝒍𝒆𝒇𝒕−𝒕𝒂𝒊𝒍𝒆𝒅) = t.dist(𝑡𝑐𝑎𝑙𝑐 , 𝑑. 𝑓. ,1)
𝒑𝒗𝒂𝒍𝒖𝒆(𝒓𝒊𝒈𝒉𝒕−𝒕𝒂𝒊𝒍𝒆𝒅) = 1- t.dist(𝑡𝑐𝑎𝑙𝑐 , 𝑑. 𝑓, 1)

Two-sided / tailed test


Null hypothesis 𝐻0 : 𝜇 = 𝜇0
Alternative hypothesis 𝐻1 : 𝜇 ≠ 𝜇0
Decide on the significance level 𝜶
Type I error reject 𝐻0 | 𝐻0 is true False positive
P(type I error)= 𝛼
Type II error don’t reject 𝐻0 | 𝐻0 is false False negative
P(type II error)= 𝛽
Power Correctly reject 𝐻0 | 𝐻0 is false
P(power)= 1 − 𝛽 Sensitivity
Compute test statistic z_calc = (x̄ − μ₀)/(σx/√n) or t_calc = (x̄ − μ₀)/(sx/√n)
F(z(1−α/2)) = 1 − α/2 → CDF table → c = ±…
*Excel: =norm.s.inv(α/2) [take the absolute value, then apply ± on both tails]
F(t(1−α/2)) = 1 − α/2 → t table → c = ±…
*Excel: =t.inv.2t(𝛼, d.f.)
|𝒛𝒄𝒂𝒍𝒄 | < 𝒄 𝒐𝒓 |𝒕𝒄𝒂𝒍𝒄 | < 𝒄 do not reject 𝐻0 at 𝛼% level

|𝒛𝒄𝒂𝒍𝒄 | > 𝒄 𝒐𝒓 |𝒕𝒄𝒂𝒍𝒄 | > 𝒄 reject 𝐻0 at 𝛼% level

𝒑 − 𝒗𝒂𝒍𝒖𝒆 Two tailed:

p-value = 2·P(Z > |z_calc|)
p-value = 2·P(T > |t_calc|)

*Excel: p-value = 2*(1 − norm.s.dist(|z_calc|, 1))
p-value = t.dist.2t(|t_calc|, d.f.) (t.dist.2t is already two-tailed)

Left-tailed test Two-tailed test Right-tailed test


𝐻0 : 𝜋 ≥ 𝜋0 𝐻0 : 𝜋 = 𝜋0 𝐻0 : 𝜋 ≤ 𝜋0
𝐻1 : 𝜋 < 𝜋0 𝐻1 : 𝜋 ≠ 𝜋0 𝐻1 : 𝜋 > 𝜋0

TESTING A PROPORTION
Sample proportion p = x/n
z_calc = (p − π₀)/σp, where σp = √( π₀(1 − π₀)/n )
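The proportion test above can be sketched end to end in Python (two-tailed form; `prop_test` is a hypothetical name, not from the sheet):

```python
# Sketch: one-sample z test for a proportion, two-tailed p-value.
from math import sqrt
from statistics import NormalDist

def prop_test(x, n, pi0):
    p = x / n
    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)      # z_calc
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # 2·P(Z > |z_calc|)
    return z, p_value

# e.g. 60 successes in 100 trials against H0: pi = 0.5
z, p_value = prop_test(60, 100, 0.5)
```

Here z_calc = 0.1/0.05 = 2.0, so at α = 0.05 the two-tailed test rejects H₀.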

CHAPTER 10: TWO-SAMPLE TEST
Left-tailed test Two-tailed test Right-tailed test
𝑯𝟎 : 𝝁 𝟏 − 𝝁 𝟐 ≥ 𝑫𝟎 𝑯𝟎 : 𝝁𝟏 − 𝝁𝟐 = 𝑫𝟎 𝑯𝟎 : 𝝁𝟏 − 𝝁𝟐 ≤ 𝑫𝟎
𝑯𝟏 : 𝝁 𝟏 − 𝝁 𝟐 < 𝑫𝟎 𝑯𝟏 : 𝝁𝟏 − 𝝁𝟐 ≠ 𝑫𝟎 𝑯𝟏 : 𝝁𝟏 − 𝝁𝟐 > 𝑫𝟎

Comparing two means


Variances known (σ₁², σ₂² given):
z_calc = (x̄₁ − x̄₂) / √( σ₁²/n₁ + σ₂²/n₂ )
Variances unknown, assumed equal (most common case):
t_calc = (x̄₁ − x̄₂) / √( sp²/n₁ + sp²/n₂ )
Variances unknown, unequal (less common):
if n₁, n₂ ≥ 30: z_calc = (x̄₁ − x̄₂) / √( s₁²/n₁ + s₂²/n₂ )
if n₁, n₂ < 30: t_calc = (x̄₁ − x̄₂) / √( s₁²/n₁ + s₂²/n₂ )

Pooled variance (equal-variances case):
sp² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2)
d.f. = n₁ + n₂ − 2
CI: (x̄₁ − x̄₂) ± t(α/2) · √( sp²(1/n₁ + 1/n₂) )
Unequal-variances case (do not pool the variances):
Welch’s adjusted d.f. = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]
or the quick conservative choice d.f. = min(n₁ − 1, n₂ − 1)
CI: (x̄₁ − x̄₂) ± t(α/2) · √( s₁²/n₁ + s₂²/n₂ )
*Excel Megastat → Hypothesis tests → Compare two independent groups → Summary → (Type)
[a, x̄₁, s₁, n₁]
[b, x̄₂, s₂, n₂]

Comparing two means: Pairs Samples


Obs 1 2 3 … n
Sample 1 𝑥11 𝑥21 𝑥31 … 𝑥𝑛1
Sample 2 𝑥12 𝑥22 𝑥32 … 𝑥𝑛2
Difference 𝑑1 = 𝑥11 − 𝑥12 𝑑2 = 𝑥21 − 𝑥22 𝑑3 = 𝑥31 − 𝑥32 … 𝑑𝑛 = 𝑥𝑛1 − 𝑥𝑛2
Left-tailed test Two-tailed test Right-tailed test
𝐻0 : 𝜇𝑑 ≥ 𝜇𝑑0 𝐻0 : 𝜇𝑑 = 𝜇𝑑0 𝐻0 : 𝜇𝑑 ≤ 𝜇𝑑0
𝐻1 : 𝜇𝑑 < 𝜇𝑑0 𝐻1 : 𝜇𝑑 ≠ 𝜇𝑑0 𝐻1 : 𝜇𝑑 > 𝜇𝑑0
Mean of n differences d̄ = (Σᵢ₌₁ⁿ dᵢ)/n
Std.dev sd = √( Σᵢ₌₁ⁿ (dᵢ − d̄)² / (n − 1) )

Test statistic t_calc = (d̄ − μd)/(sd/√n)
CI: d̄ ± t(α/2) · sd/√n

Comparing two proportion


𝒑𝟏 = x₁/n₁, 𝒑𝟐 = x₂/n₂
Left-tailed test Two-tailed test Right-tailed test
𝐻0: 𝜋1 − 𝜋2 ≥ 0 𝐻0: 𝜋1 − 𝜋2 = 0 𝐻0: 𝜋1 − 𝜋2 ≤ 0
𝐻1: 𝜋1 − 𝜋2 < 0 𝐻1: 𝜋1 − 𝜋2 ≠ 0 𝐻1: 𝜋1 − 𝜋2 > 0
z_calc = (p₁ − p₂) / √( pc(1 − pc)(1/n₁ + 1/n₂) )
Pooled proportion (𝒑̄ or 𝒑𝒄):
pc = (x₁ + x₂) / (n₁ + n₂)
CI: (p₁ − p₂) ± z(α/2) · √( p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂ )

Comparing two variances


We may wish to test: 𝐻0 : 𝜎12 = 𝜎22
Against one of:
𝐻𝑎 : 𝜎12 > 𝜎22
𝐻𝑎 : 𝜎12 < 𝜎22
𝐻𝑎 : 𝜎12 ≠ 𝜎22
F_calc = s₁²/s₂² (label the samples so that F_calc > 1, i.e., put the larger variance in the numerator)


Critical value
F_R = F(df₁, df₂) (*watch the α level: use α/2 for a two-tailed test)
F_L = 1/F(df₂, df₁)
df₁ = n₁ − 1
df₂ = n₂ − 1

One-tailed Two-tailed

*Excel:
F_R = F.INV.RT(𝛼, df₁, df₂) F_R.2T = F.INV.RT(𝛼/2, df₁, df₂)
F_L.2T = 1 / F.INV.RT(𝛼/2, df₁, df₂)
If F_calc > 1: two-tailed 𝒑𝒗𝒂𝒍𝒖𝒆 = 2*F.DIST.RT(F_calc, df₁, df₂)
If F_calc < 1: two-tailed 𝒑𝒗𝒂𝒍𝒖𝒆 = 2*F.DIST(F_calc, df₁, df₂, 1)
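The pooled-variance two-sample t statistic above reduces to a few lines. A sketch (illustrative helper, not the sheet's Megastat workflow):

```python
# Sketch: pooled-variance (equal-variances) two-sample t statistic.
from math import sqrt

def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    # sp^2 = [(n1-1)s1^2 + (n2-1)s2^2] / (n1 + n2 - 2)
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (xbar1 - xbar2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2            # statistic and d.f.

# e.g. two groups of 16 with means 10 and 9, both with s = 2
t, df = pooled_t(10, 2, 16, 9, 2, 16)
```

Here sp² = 4, so t = 1/√(4·(1/16 + 1/16)) = √2 with 30 degrees of freedom.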

CHAPTER 11: ANALYSIS OF VARIANCE
ANOVA: analysis of variance
One-factor ANOVA (Completely randomized model)
Data in columns
𝑻𝟏 𝑻𝟐 … 𝑻𝒄
y₁₁ y₁₂ … y₁c
y₂₁ y₂₂ … y₂c
y₃₁ y₃₂ … y₃c
… … … …
n₁ obs. n₂ obs. … n_c obs.
ȳ₁ ȳ₂ … ȳ_c
𝐻0 : 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑐 ( all the treatment means are equal)
𝐻1 : not all the means are equal (at least one pair of treatment means differ)

One-factor ANOVA as a Linear Model


𝑦𝑖𝑗 = 𝜇 + 𝑇𝑗 + 𝜀𝑖𝑗
𝜇: 𝑚𝑒𝑎𝑛 𝑗 = 1,2, … , 𝑐
𝑇𝑗 : 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 𝑒𝑓𝑓𝑒𝑐𝑡 𝑖 = 1,2, … , 𝑛𝑗
𝜀𝑖𝑗 : 𝑟𝑎𝑛𝑑𝑜𝑚 𝑒𝑟𝑟𝑜𝑟
𝐻0 : 𝑇1 = 𝑇2 = ⋯ = 𝑇𝑐 = 0
𝐻1 : 𝑁𝑜𝑡 𝑎𝑙𝑙 𝑇𝑗 𝑎𝑟𝑒 0
Mean of each group (treatment mean) ȳⱼ = (1/nⱼ) Σᵢ yᵢⱼ
Overall sample mean (grand mean) ȳ = (1/n) Σⱼ Σᵢ yᵢⱼ = (1/n) Σⱼ nⱼ·ȳⱼ
Partitioned Sum of Squares (j = 1…c groups, i = 1…nⱼ observations per group):
Σⱼ Σᵢ (yᵢⱼ − ȳ)² = Σⱼ nⱼ(ȳⱼ − ȳ)² + Σⱼ Σᵢ (yᵢⱼ − ȳⱼ)²
Or in words:
SST = SSB + SSE
Sum of Squares Total = Sum of Squares Between Treatments [Explained by Treatments] + Sum of Squares Within Treatments [Unexplained Random Error]

Source of Variation Sum of Squares Degrees of Freedom Mean Square F Statistic
Treatment (between groups) SSB = Σⱼ nⱼ(ȳⱼ − ȳ)² c − 1 MSB = SSB/(c − 1) F = MSB/MSE
Error (within groups) SSE = Σⱼ Σᵢ (yᵢⱼ − ȳⱼ)² n − c MSE = SSE/(n − c)
Total SST = Σⱼ Σᵢ (yᵢⱼ − ȳ)² n − 1

*Excel:
Data Analysis → Anova: Single Factor
Step 1: State the Hypotheses
𝐻0 : 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑐
𝐻1: not all the means are equal (at least one is different)
Step 2: State the Decision Rule
Find 𝑐 = ⋯ and 𝑛 = ⋯
Then calculate the d.f
Numerator: 𝑑𝑓1 = 𝑐 − 1 = ⋯
Denominator: 𝑑𝑓2 = 𝑛 − 𝑐 = ⋯
Then find 𝐹𝑅 or 𝐹𝐿
𝐹𝑅 = 𝑓. 𝑖𝑛𝑣. 𝑟𝑡(𝛼, 𝑑𝑓1 , 𝑑𝑓2 )
Step 3: Perform the Calculation
Type in Excel like this → Data Analysis →
Anova: Single Factor
Or
type like this → Megastat → Analysis of
Variance → One-Factor
Step 4: Make decision
Option 1: compare 𝐹𝑐𝑎𝑙𝑐 (in Excel is F) and 𝐹𝑐𝑟𝑖𝑡𝑖𝑐𝑎𝑙 (in Excel is F crit)
In right-tailed: 𝐹𝑐𝑎𝑙𝑐 > 𝐹𝑐𝑟𝑖𝑡 → 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0
Option 2: compare 𝑝𝑣𝑎𝑙𝑢𝑒
In right-tailed: 𝑝𝑣𝑎𝑙𝑢𝑒 < 𝛼 → 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0

Tukey’s Test
(to determine exactly which group mean is different)
Always is a two-tailed test
𝐻0 : 𝜇𝑗 = 𝜇𝑘
𝐻1 : 𝜇𝑗 ≠ 𝜇𝑘
Test statistic Option 1:
T_calc = |ȳⱼ − ȳₖ| / √( MSE·(1/nⱼ + 1/nₖ) )

Option 2:
T_calc = |x̄ⱼ − x̄ₖ| / √( sp²/nⱼ + sp²/nₖ )
where sp² = [(n₁ − 1)s₁² + (n₂ − 1)s₂² + … + (n_c − 1)s_c²] / [(n₁ − 1) + (n₂ − 1) + … + (n_c − 1)]
Critical value 𝑇𝑐𝑟𝑖𝑡 = 𝑇𝑐,𝑛−𝑐 = 𝑇𝑑𝑓1 ,𝑑𝑓2
𝑐: 𝑔𝑟𝑜𝑢𝑝
𝑛: 𝑜𝑣𝑒𝑟𝑎𝑙𝑙 𝑠𝑎𝑚𝑝𝑙𝑒

𝑁𝑢𝑚𝑒𝑟𝑎𝑡𝑜𝑟: 𝑑𝑓1 = 𝑐
𝐷𝑒𝑛𝑜𝑚𝑖𝑛𝑎𝑡𝑜𝑟: 𝑑𝑓2 = 𝑛 − 𝑐
𝑇𝑐𝑎𝑙𝑐 > 𝑇𝑐,𝑛−𝑐 → 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0
𝑇𝑐𝑎𝑙𝑐 < 𝑇𝑐,𝑛−𝑐 → 𝐷𝑜 𝑛𝑜𝑡 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0

Hartley’s Test
Always is a two-tailed test
𝐻0 : 𝜎12 = 𝜎22 = ⋯ = 𝜎𝑐2 (equal variance)
𝐻1 : 𝑇ℎ𝑒 𝜎𝑗2 𝑎𝑟𝑒 𝑛𝑜𝑡 𝑎𝑙𝑙 𝑒𝑞𝑢𝑎𝑙 (unequal variance)
Test statistic H_calc = s²max / s²min
Critical value 𝐻𝑐𝑟𝑖𝑡𝑖𝑐𝑎𝑙 = 𝐻𝑑𝑓1 ,𝑑𝑓2
n: total number of obs
Numerator: df₁ = c
Denominator: df₂ = n/c − 1
𝐻𝑐𝑎𝑙𝑐 > 𝐻𝑑𝑓1 ,𝑑𝑓2 → 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0
𝐻𝑐𝑎𝑙𝑐 < 𝐻𝑑𝑓1 ,𝑑𝑓2 → 𝐷𝑜 𝑛𝑜𝑡 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0
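The one-factor ANOVA table above reduces to a few sums over the groups. A minimal sketch (the helper name is hypothetical, not from the sheet):

```python
# Sketch: one-factor (completely randomized) ANOVA from the SS formulas.
def one_factor_anova(groups):
    n = sum(len(g) for g in groups)                 # total observations
    c = len(groups)                                 # number of treatments
    grand = sum(sum(g) for g in groups) / n         # grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    sse = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups)
    msb, mse = ssb / (c - 1), sse / (n - c)
    return ssb, sse, msb / mse                      # SSB, SSE, F = MSB/MSE

ssb, sse, F = one_factor_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

With these three groups (means 2, 3, 4, grand mean 3), SSB = SSE = 6, so F = (6/2)/(6/6) = 3 on (2, 6) degrees of freedom.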

CHAPTER 12: SIMPLE LINEAR REGRESSION
(Rewind) Correlation relationship:
- measures the association (linear relationship) between 2 variables
- only concerned with the strength of the relationship
- does NOT imply cause and effect
- shown with a scatter plot
(In this chapter) Regression analysis:
- predict the value of a dependent variable based on the value of at least one independent variable
- explain the impact on the dependent variable from changes in an independent variable
Dependent variable (Y) Independent variable (X)
we wish to predict or explain the variable we use to explain the dependent one
[Scatter-plot figures: type of relationship; strength of relationships]

Population equation y = β₀ + β₁x + ε

Simple linear regression equation ŷ = b₀ + b₁x

Least Squares Criterion min Σ(yᵢ − ŷᵢ)² = min Σ(yᵢ − (b₀ + b₁xᵢ))²
(minimize the sum of the squared differences between 𝑌 and 𝑌̂)
b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
b₀ = ȳ − b₁x̄

Analysis of Variance: Overall Fit
Σᵢ₌₁ⁿ (yᵢ − ȳ)² = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² + Σᵢ₌₁ⁿ (ŷᵢ − ȳ)²
or in words:
SST = SSE + SSR
(total variation around the mean) = (unexplained or error variation) + (variation explained by the regression)

F Statistic for Overall Fit


Source of Variation Sum of Squares df Mean Square F Excel p-value
Regression (explained) SSR = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)² 1 MSR = SSR/1 F_calc = MSR/MSE =F.DIST.RT(F_calc, 1, n − 2)
Residual (unexplained) SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² n − 2 MSE = SSE/(n − 2)
Total SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)² n − 1

Coefficient of Determination (R-squared) r² = SSR/SST
r² is the proportion of variation explained (0 ≤ r² ≤ 1)
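The least-squares slope and intercept formulas translate directly into code. A sketch (illustrative helper name, not from the sheet):

```python
# Sketch: least-squares fit of y-hat = b0 + b1*x from the normal equations.
def least_squares(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    # b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
         / sum((x - xbar) ** 2 for x in xs)
    b0 = ybar - b1 * xbar
    return b0, b1

# points lying exactly on the line y = 2x
b0, b1 = least_squares([1, 2, 3, 4], [2, 4, 6, 8])
```

For perfectly linear data like this, SSE = 0 and therefore r² = SSR/SST = 1.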
