SB8 Formula
CHAPTER 4
CHAPTER 5
CHAPTER 6: DISCRETE DISTRIBUTION
CHAPTER 7: CONTINUOUS DISTRIBUTION
CHAPTER 8: SAMPLING DISTRIBUTION
CHAPTER 9: ONE-SAMPLE HYPOTHESIS TESTS
CHAPTER 10: TWO-SAMPLE TEST
CHAPTER 11: ANALYSIS OF VARIANCE
CHAPTER 12: SIMPLE LINEAR REGRESSION
CHAPTER 4
Distribution's shape               Statistics (typical ordering)
Skewed left (negative skewness)    Mean < Median < Mode
Symmetric                          Mean ≈ Median ≈ Mode
Skewed right (positive skewness)   Mean > Median > Mode
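A minimal Python sketch of the mean/median comparison in the table above, using only the standard library. The data set and the helper `describe_shape` are hypothetical, and comparing the mean with the median is only a rule of thumb, not a formal skewness test.

```python
from statistics import mean, median, mode

def describe_shape(data, tol=1e-9):
    """Rough skewness check: compare the mean with the median (rule of thumb only)."""
    m, med = mean(data), median(data)
    if abs(m - med) <= tol:
        return "roughly symmetric"
    return "skewed right" if m > med else "skewed left"

sample = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]           # hypothetical data with a long right tail
print(mean(sample), median(sample), mode(sample))  # 4.3 3.0 2
print(describe_shape(sample))                      # skewed right
```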
CHAPTER 5
Complement of an event  $P(A) + P(A') = 1$
Union of two events (A or B)  $A \cup B$
Intersection of two events (A and B)  $A \cap B$
General Law of Addition: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Mutually exclusive events  If $A \cap B = \emptyset$ then $P(A \cap B) = 0$
In this case: $P(A \cup B) = P(A) + P(B)$
Collectively exhaustive events  At least one of the events must occur; for two events, $P(A \cup B) = 1$
Conditional probability  $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ for $P(B) > 0$
Independent events  $P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1)P(A_2)\cdots P(A_n)$
[equivalently, $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A)$]
Bayes's Theorem  $P(B \mid A) = \frac{P(A \mid B)P(B)}{P(A)} = \frac{P(A \mid B)P(B)}{P(A \mid B)P(B) + P(A \mid B')P(B')}$
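A small Python sketch of Bayes's Theorem with the expanded denominator (law of total probability over $B$ and $B'$). The screening numbers and the function name `bayes_posterior` are hypothetical, chosen only to show the arithmetic.

```python
def bayes_posterior(p_a_given_b, p_b, p_a_given_not_b):
    """P(B|A) via Bayes's theorem, with P(A) expanded over B and B'."""
    p_not_b = 1 - p_b
    p_a = p_a_given_b * p_b + p_a_given_not_b * p_not_b   # denominator P(A)
    return p_a_given_b * p_b / p_a

# Hypothetical screening example: B = "has condition", A = "test is positive"
posterior = bayes_posterior(p_a_given_b=0.95, p_b=0.01, p_a_given_not_b=0.05)
print(round(posterior, 4))  # 0.161
```

Even with a 95% detection rate, the low prior $P(B) = 0.01$ keeps $P(B \mid A)$ near 16% in this made-up case.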
Permutation  ${}_nP_r = \frac{n!}{(n-r)!}$
Combination  ${}_nC_r = \frac{n!}{r!(n-r)!}$
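The two counting formulas translate directly into factorials; a minimal sketch (the helper names `n_p_r` and `n_c_r` are hypothetical) that also cross-checks against Python's built-in `math.perm` and `math.comb` (Python 3.8+):

```python
from math import comb, factorial, perm

def n_p_r(n, r):
    """nPr = n! / (n - r)!  (ordered arrangements)"""
    return factorial(n) // factorial(n - r)

def n_c_r(n, r):
    """nCr = n! / (r! (n - r)!)  (unordered selections)"""
    return factorial(n) // (factorial(r) * factorial(n - r))

print(n_p_r(5, 2), n_c_r(5, 2))   # 20 10
assert n_p_r(5, 2) == perm(5, 2) and n_c_r(5, 2) == comb(5, 2)
```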
CHAPTER 6: DISCRETE DISTRIBUTION
$E(X) = \mu = \sum_{i=1}^{N} x_i P(x_i)$
$Var(X) = \sigma^2 = \sum_{i=1}^{N} (x_i - \mu)^2 P(x_i)$
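A minimal sketch of $E(X)$ and $Var(X)$ for a discrete distribution stored as a `{x: P(x)}` dictionary (a representation assumed here for illustration). Exact fractions are used so the fair-die result can be compared with the discrete uniform formula below ($a = 1$, $b = 6$ gives $\sigma^2 = (6^2 - 1)/12 = 35/12$).

```python
from fractions import Fraction

def expected_value(dist):
    """E(X) = sum of x * P(x) over a dict {x: P(x)}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}   # fair six-sided die
print(expected_value(die), variance(die))          # 7/2 35/12
```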
Uniform Distribution
Parameters a = lower limit
b = upper limit
PDF $P(X = x) = \frac{1}{b-a+1}$
CDF $P(X \le x) = \frac{x-a+1}{b-a+1}$
Domain x = a, a+1, a+2, …, b−1, b
Mean $\frac{a+b}{2}$
Standard Deviation $\sqrt{\frac{[(b-a)+1]^2 - 1}{12}}$
Random data generation in Excel =RANDBETWEEN(a,b)
Comments Useful as a benchmark, to generate random
integers for sampling, or in simulation
models, always symmetric.
Binomial Distribution
Parameters n = number of trials
π = probability of success
PDF $P(X = x) = \frac{n!}{x!(n-x)!}\pi^x(1-\pi)^{n-x}$
Excel PDF =BINOM.DIST(x, n, π, 0)
Excel CDF =BINOM.DIST(x, n, π, 1)
Domain x = 0, 1, 2, …, n
Mean $n\pi$
Standard Deviation $\sqrt{n\pi(1-\pi)}$
Random data generation in Excel =BINOM.INV(n, π, RAND())
Comments Skewed right if π < .5, skewed left if π > .5,
and symmetric if π = .5
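A minimal Python counterpart to the BINOM.DIST formulas, built from the PDF above with `math.comb`; the helper names and the n = 10, π = 0.5 example are hypothetical.

```python
from math import comb, sqrt

def binom_pmf(x, n, pi):
    """P(X = x) = nCx * pi^x * (1 - pi)^(n - x); like BINOM.DIST(x, n, pi, 0)."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

def binom_cdf(x, n, pi):
    """P(X <= x); like BINOM.DIST(x, n, pi, 1)."""
    return sum(binom_pmf(k, n, pi) for k in range(x + 1))

n, pi = 10, 0.5
print(round(binom_pmf(5, n, pi), 4))                   # 0.2461
print(round(binom_cdf(5, n, pi), 4))                   # 0.623
print(n * pi, round(sqrt(n * pi * (1 - pi)), 4))       # 5.0 1.5811
```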
Poisson Distribution
Parameter λ = mean arrivals per unit of time or space
PDF $P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$
Excel PDF =POISSON.DIST(x, λ, 0)
Excel CDF =POISSON.DIST(x, λ, 1)
Domain x = 0, 1, 2, … (no obvious upper limit)
Mean λ
Standard Deviation $\sqrt{\lambda}$
Comments Always right-skewed, but less so for larger λ
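A short sketch of the Poisson PDF and CDF in Python, mirroring POISSON.DIST; the rate λ = 2 per minute is a hypothetical example.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x!; like POISSON.DIST(x, lam, 0)."""
    return lam**x * exp(-lam) / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x); like POISSON.DIST(x, lam, 1)."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 2.0  # hypothetical: 2 arrivals per minute on average
print(round(poisson_pmf(0, lam), 4))  # 0.1353
print(round(poisson_cdf(2, lam), 4))  # 0.6767
```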
Hypergeometric Distribution
Parameters N = number of items in the population
n = sample size
s = number of successes in the population
PDF $P(X = x) = \frac{{}_sC_x \cdot {}_{N-s}C_{n-x}}{{}_NC_n}$
Excel PDF =HYPGEOM.DIST(x, n, s, N, 0)
Domain $\max(0, n - N + s) \le X \le \min(s, n)$
Mean $n\pi$ where $\pi = s/N$
Standard Deviation $\sqrt{n\pi(1-\pi)}\sqrt{\frac{N-n}{N-1}}$
Comments Similar to the binomial, but sampling without replacement from a finite population. It can be approximated by a binomial with $\pi = s/N$ if $n/N < 0.05$, and it is symmetric if $s/N = 0.5$.
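A minimal sketch that evaluates the hypergeometric PDF over its domain and checks the mean and standard deviation formulas numerically. The lot sizes are hypothetical, and `hypergeom_pmf` is an illustrative helper, not a library function.

```python
from math import comb, sqrt

def hypergeom_pmf(x, N, n, s):
    """P(X = x) = sCx * (N-s)C(n-x) / NCn, i.e. sampling without replacement."""
    return comb(s, x) * comb(N - s, n - x) / comb(N, n)

N, n, s = 50, 5, 10          # hypothetical lot: 50 items, 10 defective, sample of 5
pi = s / N
support = range(max(0, n - N + s), min(s, n) + 1)       # the Domain row above
mean = sum(x * hypergeom_pmf(x, N, n, s) for x in support)
sd = sqrt(n * pi * (1 - pi)) * sqrt((N - n) / (N - 1))  # finite population correction
print(round(mean, 6), n * pi)   # 1.0 1.0
print(round(sd, 4))             # 0.8571
```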
Geometric Distribution
Parameters π = probability of success
PDF $P(X = x) = \pi(1-\pi)^{x-1}$
CDF $P(X \le x) = 1 - (1-\pi)^x$
Domain x = 1, 2, …
Mean $\frac{1}{\pi}$
Standard Deviation $\sqrt{\frac{1-\pi}{\pi^2}}$
Random data generation in Excel =1+INT(LN(1-RAND())/LN(1-π))
Comments Describes the number of trials until the first success (counting the successful trial). Highly skewed.
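A Python sketch of the geometric PDF, CDF, and the inverse-CDF random draw that the Excel formula implements. The helper names are hypothetical; `1 - rng.random()` plays the role of `1-RAND()`.

```python
import random
from math import log

def geom_pmf(x, pi):
    """P(X = x) = pi * (1 - pi)^(x - 1), x = 1, 2, ..."""
    return pi * (1 - pi) ** (x - 1)

def geom_cdf(x, pi):
    """P(X <= x) = 1 - (1 - pi)^x"""
    return 1 - (1 - pi) ** x

def geom_random(pi, rng=random):
    """Inverse-CDF draw, mirroring =1+INT(LN(1-RAND())/LN(1-pi))."""
    return 1 + int(log(1 - rng.random()) / log(1 - pi))

pi = 0.2
print(round(geom_cdf(3, pi), 3))  # 0.488
```

Averaging many draws of `geom_random(0.2)` should land near the mean $1/\pi = 5$.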
CHAPTER 7: CONTINUOUS DISTRIBUTION
Quartile to CDF
Q1 F(z) = 0.25
Q2 F(z) = 0.5
Q3 F(z) = 0.75
Q4 F(z) = 1
Standard error (of the mean): $SE = \frac{\sigma}{\sqrt{n}}$
Uniform Distribution
Parameters a = lower limit
b = upper limit
PDF $f(x) = \frac{1}{b-a}$
CDF $P(X \le x) = \frac{x-a}{b-a}$
Domain $a \le x \le b$
Mean $\frac{a+b}{2}$
Standard Deviation $\sqrt{\frac{(b-a)^2}{12}}$
Shape Symmetric with no mode
Random data in Excel =a+(b-a)*RAND()
Comments Used as a conservative what-if benchmark
and in simulation
Normal Distribution
Parameters 𝜇 = population mean
𝜎 = population standard deviation
PDF $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
Domain $-\infty < x < +\infty$
Mean μ
Standard Deviation σ
Shape Symmetric, mesokurtic, and bell-shaped
PDF in *Excel =NORM.DIST(x, μ, σ, 0)
CDF in *Excel =NORM.DIST(x, μ, σ, 1)
Random data generation in *Excel =NORM.INV(RAND(), μ, σ)
Standard Normal Distribution
Parameters 𝜇 = population mean
𝜎 = population standard deviation
PDF $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$ where $z = \frac{x-\mu}{\sigma}$
Domain $-\infty < z < +\infty$
Mean 0
Standard Deviation 1
Shape Symmetric, mesokurtic, and bell-shaped
CDF in *Excel =NORM.S.DIST(z,1)
Random data in *Excel =NORM.S.INV(RAND())
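Comment There is no simple formula for the normal CDF, so we use a CDF table.
How to find the CDF:
1. Find z: $P(x_1 \le X \le x_2) = P(z_1 \le Z \le z_2) = F(z_2) - F(z_1)$, where $z_i = (x_i - \mu)/\sigma$
2. Look up the CDF table (it gives $F(z) = P(Z \le z)$, which is the same as $P(Z < z)$)
How to find x (inverse normal):
1. Given $F(z)$, look up z in the table, then convert back with $x = \mu + z\sigma$, so that $P(X < \mu + z\sigma) = F(z)$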
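Python's `statistics.NormalDist` (3.8+) can stand in for the CDF table and for NORM.DIST / NORM.S.INV. A minimal sketch of both procedures above, assuming hypothetical values μ = 100 and σ = 15:

```python
from statistics import NormalDist

mu, sigma = 100, 15                  # hypothetical population parameters
X = NormalDist(mu, sigma)
Z = NormalDist()                     # standard normal: mean 0, sd 1

# P(85 <= X <= 115) = P(-1 <= Z <= 1) = F(1) - F(-1)
z1, z2 = (85 - mu) / sigma, (115 - mu) / sigma
print(round(Z.cdf(z2) - Z.cdf(z1), 4))   # 0.6827
print(round(X.cdf(115) - X.cdf(85), 4))  # 0.6827 (same result without standardizing)

# Inverse normal: F(z) = 0.95 -> z -> x = mu + z * sigma
z = Z.inv_cdf(0.95)
print(round(z, 4), round(mu + z * sigma, 2))  # 1.6449 124.67
```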
Exponential Distribution (continuous Poisson)
Parameter 𝜆 = mean arrivals per unit of time or space
(same as Poisson)
PDF $f(x) = \lambda e^{-\lambda x}$
CDF $P(X \le x) = 1 - e^{-\lambda x}$
$P(X > x) = e^{-\lambda x}$
Domain $x \ge 0$
Mean 1/λ (mean time between events)
Standard Deviation 1/λ
Shape Always right-skewed
CDF in *Excel =EXPON.DIST(x, λ, 1)
Random data in Excel =-LN(RAND())/λ
Comments Waiting time is exponential when arrivals follow a Poisson model. Often 1/λ (mean time between events) is given rather than λ.
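A minimal sketch of the exponential CDF and of inverse-CDF sampling, the same idea as `=-LN(RAND())/λ`. The rate λ = 4 per hour and the helper names are hypothetical.

```python
import random
from math import exp, log

def expon_cdf(x, lam):
    """P(X <= x) = 1 - e^(-lam * x); like EXPON.DIST(x, lam, 1)."""
    return 1 - exp(-lam * x)

def expon_random(lam, rng=random):
    """Inverse-CDF draw -ln(U)/lam, mirroring =-LN(RAND())/lam."""
    u = 1 - rng.random()          # shift [0, 1) to (0, 1] so log(u) is defined
    return -log(u) / lam

lam = 4.0   # hypothetical: 4 arrivals per hour, so mean wait = 1/lam = 0.25 h
print(round(expon_cdf(0.5, lam), 4))  # 0.8647
```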
CHAPTER 8: SAMPLING DISTRIBUTION
Central Limit Theorem: even if the population of X is not normal, the sample mean x̄ computed from it will be approximately normally distributed as long as the sample size is large enough.
Interval
Sample means $\left[\mu - z\frac{\sigma}{\sqrt{n}},\ \mu + z\frac{\sigma}{\sqrt{n}}\right]$
Population $[\mu - z\sigma,\ \mu + z\sigma]$
Confidence Interval
$z_{\bar{x}} = \frac{\bar{x} - \mu_x}{\sigma_x/\sqrt{n}}$   $\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}$ (standard error)   $\mu_{\bar{x}} = \mu$ (mean of the sample means)
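1. When $\sigma_x$ is known
$(1-\alpha)$ CI $= \bar{x} \pm z_{1-\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{1-\alpha/2}\frac{\sigma_x}{\sqrt{n}}$
(point estimate) ± (margin of error)
$F(z) = 1 - \frac{\alpha}{2}$ → CDF table → z
Margin of error $e = z_{1-\alpha/2}\frac{\sigma_x}{\sqrt{n}}$ → $n = \frac{(z_{1-\alpha/2})^2\,\sigma_x^2}{e^2}$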
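A minimal sketch of the known-σ interval and the sample-size formula, using `statistics.NormalDist` in place of the CDF table. The sample values are hypothetical, and rounding n up to the next integer is the usual convention (the formula itself gives a non-integer).

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """(1 - alpha) CI for mu with sigma known: xbar ± z_{1-alpha/2} * sigma/sqrt(n)."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # F(z) = 1 - alpha/2
    e = z * sigma / sqrt(n)                   # margin of error
    return xbar - e, xbar + e

def required_n(sigma, e, conf=0.95):
    """n = (z_{1-alpha/2})^2 * sigma^2 / e^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / e) ** 2)

lo, hi = z_interval(xbar=50.0, sigma=8.0, n=64)   # hypothetical sample
print(round(lo, 2), round(hi, 2))                  # 48.04 51.96
print(required_n(sigma=8.0, e=1.0))                # 246
```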
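2. When $\sigma_x$ is not known
Sample Standard Deviation $s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$
Degrees of freedom: d.f. = n − 1 (the larger the d.f., the closer the t distribution is to the normal distribution)
Substitute $s_x$ for $\sigma_x$: $t_x = \frac{\bar{x} - \mu_x}{s_x/\sqrt{n}} \sim t_{n-1}$
*EXCEL: =T.INV.2T(α, d.f.)
$(1-\alpha)$ CI $= \bar{x} \pm t_{1-\alpha/2}\frac{s_x}{\sqrt{n}}$
(point estimate) ± (margin of error)
$F(t) = 1 - \frac{\alpha}{2}$ → t table (row is d.f. = n − 1) → t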
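A minimal sketch of the t interval. Python's standard library has no t quantile function, so the critical value is passed in, here 2.262 from a t table for 95% confidence and d.f. = 9 (what `=T.INV.2T(0.05, 9)` returns, rounded). The data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_crit):
    """xbar ± t * s/sqrt(n); t_crit must match d.f. = n - 1."""
    n = len(data)
    xbar, s = mean(data), stdev(data)   # stdev uses the n - 1 divisor
    e = t_crit * s / sqrt(n)            # margin of error
    return xbar - e, xbar + e

sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]   # hypothetical data, n = 10
lo, hi = t_interval(sample, t_crit=2.262)            # t table: 95%, d.f. = 9
print(round(lo, 3), round(hi, 3))                    # 12.369 14.631
```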
2. CI for π (if p is not known, use p = 0.5, the most conservative value)
$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$
Sample proportion $p = x/n$
Shape Symmetric if π = 0.5; closer to symmetric as n increases
Rule of Thumb $p = x/n$ may be assumed normal if:
$n\pi \ge 10$
$n(1-\pi) \ge 10$
$(1-\alpha)$ CI $= p \pm z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}$
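A minimal sketch of the proportion interval. Because π is unknown when building the interval, this sketch assumes the rule of thumb is checked with p in place of π; the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, conf=0.95):
    """p ± z_{1-alpha/2} * sqrt(p(1-p)/n), after a normality rule-of-thumb check."""
    p = x / n
    if n * p < 10 or n * (1 - p) < 10:   # assumption: p stands in for pi here
        raise ValueError("normal approximation questionable: need np >= 10 and n(1-p) >= 10")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    e = z * sqrt(p * (1 - p) / n)
    return p - e, p + e

lo, hi = proportion_interval(x=120, n=400)   # hypothetical: 120 successes out of 400
print(round(lo, 4), round(hi, 4))             # 0.2551 0.3449
```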
CHAPTER 9: ONE-SAMPLE HYPOTHESIS TESTS
(*** Note: α in this chapter is not used the same way as α in Chapter 8)
If n increases, α and β can both be reduced at the same time
(a larger sample means fewer errors)
*Note terms:
level of significance 𝛼
Left-tailed test Two-tailed test Right-tailed test
𝐻0 : 𝜇 ≥ 𝜇0 𝐻0 : 𝜇 = 𝜇0 𝐻0 : 𝜇 ≤ 𝜇0
𝐻1 : 𝜇 < 𝜇0 𝐻1 : 𝜇 ≠ 𝜇0 𝐻1 : 𝜇 > 𝜇0
One-sided / one-tailed test (written for a right-tailed test; for a left-tailed test, reverse the conditions on z and t)
Null hypothesis $H_0: \mu \le \mu_0$ (right-tailed)   $H_0: \mu \ge \mu_0$ (left-tailed)
Alternative hypothesis $H_1: \mu > \mu_0$ (right-tailed)   $H_1: \mu < \mu_0$ (left-tailed)
Decide on the significance level α
Type I error reject 𝐻0 | 𝐻0 is true False positive
P(type I error)= 𝛼
Type II error don’t reject 𝐻0 | 𝐻0 is false False negative
P(type II error)= 𝛽
Power Correctly reject $H_0$ | $H_0$ is false
Power = 1 − β Sensitivity
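Compute test statistic $z_{calc} = \frac{\bar{x} - \mu_0}{\sigma_x/\sqrt{n}}$ or $t_{calc} = \frac{\bar{x} - \mu_0}{s_x/\sqrt{n}}$
$F(z_{1-\alpha}) = 1 - \alpha$ → CDF table → c = ⋯
*Excel: $c_{right-tailed}$ =NORM.S.INV(1-α)
$c_{left-tailed}$ =NORM.S.INV(α)
$F(t_{1-\alpha}) = 1 - \alpha$ → t table → c = ⋯
*Excel: $c_{right-tailed}$ =T.INV(1-α, deg_freedom)
$c_{left-tailed}$ =T.INV(α, deg_freedom)
Right-tailed: if $z_{calc} > c$ (or $t_{calc} > c$), reject $H_0$; if $z_{calc} < c$ (or $t_{calc} < c$), do not reject $H_0$ at significance level α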
*Excel: $p_{value}$ (left-tailed) =NORM.S.DIST(z_calc, 1)
$p_{value}$ (right-tailed) =1-NORM.S.DIST(z_calc, 1)
$p_{value}$ (left-tailed) =T.DIST(t_calc, d.f., 1)
$p_{value}$ (right-tailed) =1-T.DIST(t_calc, d.f., 1)
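A minimal sketch of the one-sample z test with p-values computed the same way as the NORM.S.DIST formulas above. The function name, the `tail` argument, and the numbers are hypothetical; the two-tailed branch uses the standard doubling of the one-tail area.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05, tail="right"):
    """One-sample z test for a mean with known sigma; returns (z_calc, p_value, reject)."""
    z_calc = (xbar - mu0) / (sigma / sqrt(n))
    F = NormalDist().cdf
    if tail == "right":
        p_value = 1 - F(z_calc)              # like =1-NORM.S.DIST(z_calc, 1)
    elif tail == "left":
        p_value = F(z_calc)                  # like =NORM.S.DIST(z_calc, 1)
    else:                                    # two-tailed
        p_value = 2 * (1 - F(abs(z_calc)))
    return z_calc, p_value, p_value < alpha

# Hypothetical: H0: mu <= 50 vs H1: mu > 50, xbar = 52, sigma = 8, n = 64
z_calc, p, reject = z_test(52, 50, 8, 64)
print(z_calc, round(p, 4), reject)   # 2.0 0.0228 True
```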
TESTING A PROPORTION
$z_{calc} = \frac{p - \pi_0}{\sigma_p} = \frac{p - \pi_0}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}}$ where $p = \frac{x}{n}$
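The proportion test statistic in a few lines of Python; a sketch with hypothetical counts (60 successes in 100 trials against $\pi_0 = 0.5$), where the right-tailed p-value follows the same rule as in the one-sample mean test.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(x, n, pi0):
    """z_calc = (p - pi0) / sqrt(pi0 (1 - pi0) / n), where p = x/n."""
    p = x / n
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)

z = proportion_z(x=60, n=100, pi0=0.5)                  # hypothetical sample
print(round(z, 3), round(1 - NormalDist().cdf(z), 4))   # 2.0 0.0228 (right-tailed p-value)
```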
CHAPTER 10: TWO-SAMPLE TEST
Left-tailed test   Two-tailed test   Right-tailed test
$H_0: \mu_1 - \mu_2 \ge D_0$   $H_0: \mu_1 - \mu_2 = D_0$   $H_0: \mu_1 - \mu_2 \le D_0$
$H_1: \mu_1 - \mu_2 < D_0$   $H_1: \mu_1 - \mu_2 \ne D_0$   $H_1: \mu_1 - \mu_2 > D_0$
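Test statistic (paired samples, d = paired difference) $t_{calc} = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$ with d.f. = n − 1
Confidence interval $\bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n}}$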
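A minimal sketch of the paired t statistic. It assumes d = after − before (the direction is a choice, not fixed by the formula) and uses hypothetical data; compare `t_calc` against the t table with d.f. = n − 1.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after, mu_d=0.0):
    """t_calc = (dbar - mu_d) / (s_d / sqrt(n)) on differences d = after - before."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    return (mean(d) - mu_d) / (stdev(d) / sqrt(n)), n - 1   # (t_calc, d.f.)

before = [10, 12, 9, 11, 13]   # hypothetical paired observations
after = [12, 13, 11, 12, 15]
t_calc, df = paired_t(before, after)
print(round(t_calc, 3), df)    # 6.532 4
```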
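F test for two variances (ex: two-tailed)
Test statistic $F_{calc} = \frac{s_1^2}{s_2^2}$
Critical values $F_R = F_{df_1, df_2}$ (note: use α/2 in each tail for a two-tailed test)
$F_L = \frac{1}{F_{df_2, df_1}}$
$df_1 = n_1 - 1$
$df_2 = n_2 - 1$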
*Excel (one-tailed): $F_R$ =F.INV.RT(α, df1, df2)
*Excel (two-tailed): $F_{R,2T}$ =F.INV.RT(α/2, df1, df2)
$F_{L,2T}$ =1/F.INV.RT(α/2, df2, df1)
If $F_{calc} > 1$: two-tailed $p_{value}$ =2*F.DIST.RT(F_calc, df1, df2)
If $F_{calc} < 1$: two-tailed $p_{value}$ =2*F.DIST(F_calc, df1, df2, 1)
CHAPTER 11: ANALYSIS OF VARIANCE
ANOVA: analysis of variance
One-factor ANOVA (Completely randomized model)
Data in columns
Treatment   T1        T2        …   Tc
            y11       y12       …   y1c
            y21       y22       …   y2c
            y31       y32       …   y3c
            …         …         …   …
            etc.      etc.      …   etc.
            n1 obs.   n2 obs.   …   nc obs.
Mean        ȳ1        ȳ2        …   ȳc
𝐻0 : 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑐 ( all the treatment means are equal)
𝐻1 : not all the means are equal (at least one pair of treatment means differ)
*Excel:
Data Analysis → Anova: Single Factor
Step 1: State the Hypotheses
𝐻0 : 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑐
$H_1$: not all the means are equal (at least one is different)
Step 2: State the Decision Rule
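Find c = ⋯ (number of groups) and n = ⋯ (total number of observations)
Then calculate the d.f.
Numerator: $df_1 = c - 1$ = ⋯
Denominator: $df_2 = n - c$ = ⋯
Then find the right-tail critical value $F_R$ (the ANOVA F test is right-tailed)
$F_R$ =F.INV.RT(α, df1, df2)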
Step 3: Perform the Calculation
In Excel: Data Analysis → Anova: Single Factor
Or in MegaStat: Analysis of Variance → One-Factor
Step 4: Make a decision
Option 1: compare $F_{calc}$ (labeled F in Excel) with $F_{critical}$ (labeled F crit in Excel)
Right-tailed: $F_{calc} > F_{crit}$ → Reject $H_0$
Option 2: compare the p-value with α
Right-tailed: p-value < α → Reject $H_0$
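What Excel's "Anova: Single Factor" computes can be sketched in a few lines: the between-treatment and within-treatment (error) sums of squares, their mean squares, and $F_{calc}$. The data and the helper name are hypothetical; the standard library has no F distribution, so $F_R$ still comes from the table or `F.INV.RT`.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F_calc, df1, df2) for a one-factor ANOVA on a list of samples."""
    c = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([y for g in groups for y in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)   # error (SSE)
    df1, df2 = c - 1, n - c
    return (ss_between / df1) / (ss_within / df2), df1, df2

groups = [[18, 20, 22], [24, 26, 25], [19, 21, 20]]   # hypothetical data, c = 3
F_calc, df1, df2 = one_way_anova(groups)
print(round(F_calc, 3), df1, df2)   # 12.5 2 6
```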
Tukey's Test
(to determine exactly which pairs of group means differ)
Always a two-tailed test
$H_0: \mu_j = \mu_k$
$H_1: \mu_j \ne \mu_k$
Test statistic
Option 1: $T_{calc} = \frac{|\bar{y}_j - \bar{y}_k|}{\sqrt{MSE\left(\frac{1}{n_j} + \frac{1}{n_k}\right)}}$
Option 2: $T_{calc} = \frac{|\bar{x}_j - \bar{x}_k|}{\sqrt{\frac{s_p^2}{n_j} + \frac{s_p^2}{n_k}}}$
where $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2 + \cdots + (n_c-1)s_c^2}{(n_1-1) + (n_2-1) + \cdots + (n_c-1)}$
Critical value $T_{crit} = T_{c,n-c} = T_{df_1, df_2}$
c: number of groups
n: overall sample size
Numerator: $df_1 = c$
Denominator: $df_2 = n - c$
$T_{calc} > T_{c,n-c}$ → Reject $H_0$
$T_{calc} < T_{c,n-c}$ → Do not reject $H_0$
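Hartley's Test
The hypotheses are two-sided, but only the right tail is used: reject $H_0$ when $H_{calc}$ is too large
$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_c^2$ (equal variances)
$H_1$: the $\sigma_j^2$ are not all equal (unequal variances)
Test statistic $H_{calc} = \frac{s_{max}^2}{s_{min}^2}$
Critical value $H_{critical} = H_{df_1, df_2}$
n: total number of observations
Numerator: $df_1 = c$
Denominator: $df_2 = \frac{n}{c} - 1$
$H_{calc} > H_{df_1, df_2}$ → Reject $H_0$
$H_{calc} < H_{df_1, df_2}$ → Do not reject $H_0$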
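A minimal sketch of $H_{calc}$ and the table degrees of freedom. It assumes equal group sizes (so n/c is an integer) and reuses hypothetical data; the critical value itself still comes from a Hartley table.

```python
from statistics import variance

def hartley_h(groups):
    """H_calc = s_max^2 / s_min^2, plus table d.f. (df1 = c, df2 = n/c - 1)."""
    variances = [variance(g) for g in groups]   # sample variances (n - 1 divisor)
    c = len(groups)
    n = sum(len(g) for g in groups)
    return max(variances) / min(variances), c, n // c - 1

H_calc, df1, df2 = hartley_h([[18, 20, 22], [24, 26, 25], [19, 21, 20]])
print(H_calc, df1, df2)   # 4.0 3 2
```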
CHAPTER 12: SIMPLE LINEAR REGRESSION
(Rewind) Correlation:
- measures the association (linear relationship) between two variables
- is concerned only with the strength of the relationship
- does NOT imply cause and effect
- is shown with a scatter plot
(In this chapter) Regression analysis:
- predicts the value of a dependent variable based on the value of at least one independent variable
- explains the impact on the dependent variable of changes in an independent variable
Dependent variable (Y): the variable we wish to predict or explain
Independent variable (X): the variable we use to explain the dependent one
[Figure: scatter plots illustrating the type of relationship and the strength of relationships]
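Population equation  $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Simple linear regression equation  $\hat{y}_i = b_0 + b_1 x_i$
Least Squares Criterion (minimize the sum of the squared differences between $Y$ and $\hat{Y}$):
$\min \sum (y_i - \hat{y}_i)^2 = \min \sum \big(y_i - (b_0 + b_1 x_i)\big)^2$
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
$b_0 = \bar{y} - b_1 \bar{x}$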
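The two least-squares formulas can be checked by hand with a short Python sketch; the five data points and the helper name `least_squares` are hypothetical.

```python
from statistics import mean

def least_squares(x, y):
    """b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2);  b0 = ybar - b1 * xbar."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(round(b0, 4), round(b1, 4))   # 2.2 0.6
```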
Analysis of Variance: Overall Fit
$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$
or in words:
SST = SSE + SSR
(total variation around the mean) = (unexplained or error variation) + (variation explained by the regression)