DEFINITION
Linear regression: the best-fitting linear relationship between
Y (dependent variable, the response) and
X (independent variable, the regressor)
Y=a+bX
Where:
a = intercept
b = slope
ESTIMATION OF REGRESSION LINE/THE
FITTED REGRESSION LINE
Y=a+bX
The best regression line is the one with the
best estimates of “a” and “b” (the fitted line).
The fitted regression line is close to the true
regression line when a large amount of data is
involved. The fitted line is determined by the
estimated regression coefficients “a” and “b”.
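The claim that the fitted line approaches the true line as the data grow can be checked with a small simulation. This is only a sketch: the "true" coefficients (`true_a`, `true_b`), the noise level and the uniform range of X are hypothetical values chosen for illustration, and the estimates use the usual least-squares closed form.

```python
import random

def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

def simulate(n, true_a=2.0, true_b=0.5, noise_sd=1.0, seed=0):
    """Draw n noisy points around a hypothetical true line and refit it."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [true_a + true_b * x + rng.gauss(0.0, noise_sd) for x in xs]
    return fit_line(xs, ys)

# Larger samples should generally give estimates closer to (2.0, 0.5).
for n in (10, 100, 10_000):
    a, b = simulate(n)
    print(f"n={n:>6}: a={a:.3f}, b={b:.3f}")
```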
[Figure: fitted regression line compared with the true regression line]
FITTED REGRESSION LINE
\[
\frac{\partial SSE}{\partial b} = -2 \sum_{i=1}^{n} (y_i - a - b x_i)\, x_i = 0
\]
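The condition on b above comes from minimizing the error sum of squares. As a sketch of the surrounding derivation, assuming SSE is the sum of squared residuals of the line Y = a + bX, the companion condition for a is obtained the same way:

```latex
\[
SSE = \sum_{i=1}^{n} (y_i - a - b x_i)^2
\]
\[
\frac{\partial SSE}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0
\]
```

Setting both partial derivatives to zero and dividing out the factor −2 leaves two linear equations in a and b, known as the normal equations.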
ESTIMATE COEFFICIENT WITH LEAST
SQUARE METHOD
\[
n a + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i
\]
\[
a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i
\]
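\[
b = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}
\]
\[
a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i}{n}
\]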
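The closed-form estimates can be computed directly from the five sums involved. The sketch below is one possible implementation; the function name `least_squares_fit` and the five data points are made up for illustration and do not come from the slides.

```python
import math

def least_squares_fit(xs, ys):
    """Return (a, b) for y = a + b*x using the least-squares closed form."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_y - b * sum_x) / n
    return a, b

# Illustrative data (hypothetical).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
a, b = least_squares_fit(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")  # prints "a = 0.22, b = 1.96"

# The estimates should satisfy both normal equations.
n = len(xs)
assert math.isclose(n * a + b * sum(xs), sum(ys))
assert math.isclose(a * sum(xs) + b * sum(x * x for x in xs),
                    sum(x * y for x, y in zip(xs, ys)))
```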
THE ERROR SUM OF SQUARES (SSE)
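SSE adds up the squared vertical distances between each observed y and the value the line predicts for it. A minimal sketch, assuming SSE is the sum of (y_i − a − b x_i)² over all points; the data are hypothetical, and a = 0.22, b = 1.96 are the least-squares estimates for them.

```python
def error_sum_of_squares(xs, ys, a, b):
    """Sum of squared residuals of the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Illustrative data (hypothetical).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

sse_fit = error_sum_of_squares(xs, ys, a=0.22, b=1.96)   # least-squares line
sse_other = error_sum_of_squares(xs, ys, a=0.0, b=2.0)   # some other line
print(f"SSE (fitted line) = {sse_fit:.3f}")   # prints 0.044
print(f"SSE (other line)  = {sse_other:.3f}")  # prints 0.110
```

The least-squares line gives the smaller SSE of the two, which is what "best" means in this method.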
A MEASURE OF QUALITY OF FIT:
COEFFICIENT OF DETERMINATION
\[
R^2 = 1 - \frac{SSE}{SST}
\]
Where:
\[
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2
\]
SST is the total corrected sum of squares.
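As a worked illustration of the coefficient of determination, the sketch below computes SSE, SST and their ratio for a given line. The function name `r_squared` and the data are hypothetical; a = 0.22 and b = 1.96 are the least-squares estimates for these points.

```python
def r_squared(xs, ys, a, b):
    """Coefficient of determination, 1 - SSE/SST, for the line y = a + b*x."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    if sst == 0:
        raise ValueError("all y values are equal; R^2 is undefined")
    return 1 - sse / sst

# Illustrative data (hypothetical).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
r2 = r_squared(xs, ys, a=0.22, b=1.96)
print(f"R^2 = {r2:.4f}")  # prints "R^2 = 0.9989"
```

A value close to 1 means the line accounts for almost all of the variation in y around its mean.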
ILLUSTRATION OF R²
CORRELATION COEFFICIENT
\[
r = b \sqrt{\frac{S_{xx}}{S_{yy}}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
\]
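The two forms of r can be checked against each other numerically. This sketch assumes the standard definitions, which the slides do not spell out: S_xx and S_yy are the sums of squared deviations of x and y from their means, and S_xy is the sum of cross-products of those deviations. The data are hypothetical.

```python
import math

def corrected_sums(xs, ys):
    """Return (Sxx, Syy, Sxy) computed around the sample means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxx, syy, sxy

# Illustrative data (hypothetical).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

sxx, syy, sxy = corrected_sums(xs, ys)
b = sxy / sxx                          # least-squares slope
r_direct = sxy / math.sqrt(sxx * syy)  # r = Sxy / sqrt(Sxx * Syy)
r_from_slope = b * math.sqrt(sxx / syy)  # r = b * sqrt(Sxx / Syy)
print(f"r = {r_direct:.4f}")  # prints "r = 0.9994"
```

For a simple linear regression, r squared equals the coefficient of determination R².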
[Figure legend: fitted model Y_i = a + bX_i; real data]
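CONFIDENCE INTERVAL FOR β1
A 100(1−α)% confidence interval for β1 in the regression
line \(\mu_{Y|x} = \beta_0 + \beta_1 x\) is
\[
b - t_{\alpha/2} \frac{s}{\sqrt{S_{xx}}} < \beta_1 < b + t_{\alpha/2} \frac{s}{\sqrt{S_{xx}}}
\]
where b is the least-squares slope, \(s^2 = SSE/(n-2)\), and \(t_{\alpha/2}\) is the value of the t-distribution with n − 2 degrees of freedom.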
\[
\sum_{i=1}^{n} x_i = 41086
\]
Degrees of freedom: n − 2 = 33 − 2 = 31
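A confidence interval for the slope can be sketched numerically. The code assumes the standard interval b ± t·s/√S_xx with s² = SSE/(n − 2), and it takes the critical value `t_crit` as an input read from a t table with n − 2 degrees of freedom. The function name and the five data points are hypothetical; they are not the 33-observation data set referred to above.

```python
import math

def slope_confidence_interval(xs, ys, t_crit):
    """Interval b -/+ t_crit * s / sqrt(Sxx) for the slope of y = a + b*x."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points so that n - 2 > 0")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # estimate of the error standard deviation
    half_width = t_crit * s / math.sqrt(sxx)
    return b - half_width, b + half_width

# Illustrative data (hypothetical): n = 5, so n - 2 = 3 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
# 3.182 is the tabulated t value for a 95% interval with 3 degrees of freedom.
lower, upper = slope_confidence_interval(xs, ys, t_crit=3.182)
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

With 31 degrees of freedom, as in the example above, the same function would be called with the tabulated value for 31 degrees of freedom instead.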