
SIMPLE LINEAR REGRESSION
Ade Y.P., Ph.D
DEFINITION
Linear regression finds the best linear relationship between
Y (the dependent variable, or response) and
X (the independent variable, or regressor):
Y = a + bX
where:
a = intercept
b = slope
ESTIMATION OF REGRESSION LINE/THE
FITTED REGRESSION LINE
Y=a+bX
The best regression line is the one built from the best
estimates of the coefficients "a" and "b"; this is the fitted line.
The fitted regression line gets closer to the true
regression line as the amount of data grows.
[Figure: the fitted regression line compared with the true regression line]
FITTED REGRESSION LINE

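Residual: the error of the fitted model.

Given real data $(x_i, y_i)$ with $i = 1, 2, \dots, n$,
the fitted model $Y = a + bX$ yields the predicted points
$(x_i, Y_i)$ with $Y_i = a + b x_i$ and $i = 1, 2, \dots, n$.
The error of fit (residual) at point $i$ is $E_i = y_i - Y_i$.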
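As a small illustration of the residual definition, the sketch below computes $E_i = y_i - Y_i$ for a trial line. The data and the trial values of `a` and `b` are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from Table 11.1.

```python
# Hypothetical data and a trial line, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.0, 8.1, 9.9]
a, b = 0.0, 2.0  # trial intercept and slope (not fitted values)

# Fitted values Y_i = a + b * x_i
fitted = [a + b * xi for xi in x]

# Residuals E_i = y_i - Y_i
residuals = [yi - Yi for yi, Yi in zip(y, fitted)]

for xi, yi, Yi, Ei in zip(x, y, fitted, residuals):
    print(f"x={xi}  y={yi:.2f}  Y={Yi:.2f}  E={Ei:+.2f}")
```

A positive residual means the observed point lies above the line; a negative one means it lies below.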
ESTIMATE COEFFICIENT WITH LEAST
SQUARE METHOD
The sum of squares of the errors about the regression line is
denoted by SSE.
$$SSE = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} (y_i - Y_i)^2$$
$$SSE = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
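The SSE can be treated as a function of the candidate coefficients. The sketch below, using the same kind of hypothetical data as before, evaluates it for two trial lines; the function name `sse` is an assumption made for this example.

```python
def sse(a, b, x, y):
    """Sum of squared errors of the line Y = a + b*X over the data (x, y)."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.0, 8.1, 9.9]

sse_close = sse(0.0, 2.0, x, y)  # a line that tracks the data closely
sse_poor = sse(1.0, 1.5, x, y)   # a visibly worse line

print(f"SSE for a=0.0, b=2.0: {sse_close:.4f}")
print(f"SSE for a=1.0, b=1.5: {sse_poor:.4f}")
```

The least squares method looks for the pair (a, b) that makes this quantity as small as possible.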
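SSE is minimized by setting its partial derivatives with respect to $a$ and $b$ to zero:
$$\frac{\partial SSE}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0$$
$$\frac{\partial SSE}{\partial b} = -2 \sum_{i=1}^{n} (y_i - a - b x_i)\, x_i = 0$$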
Rearranging the two conditions gives the normal equations:
$$n a + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$
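The two normal equations form a 2×2 linear system in a and b. One way to solve it, shown here as a sketch, is Cramer's rule. The function name and the data are hypothetical.

```python
def solve_normal_equations(x, y):
    """Solve  n*a + Sx*b = Sy  and  Sx*a + Sx2*b = Sxy  by Cramer's rule."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    det = n * sum_x2 - sum_x * sum_x  # determinant of the coefficient matrix
    if det == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.0, 8.1, 9.9]
a, b = solve_normal_equations(x, y)
print(f"a = {a:.4f}, b = {b:.4f}")
```

The determinant is zero only when every x value is the same, in which case no unique line exists.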
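Solving the normal equations for $b$ and $a$:
$$b = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}$$
$$a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i}{n}$$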
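The closed-form expressions translate directly into code: first the slope, then the intercept from the slope. The sketch below also checks the two first-order conditions (the residuals, and the residuals weighted by x, should each sum to approximately zero). The names and data are hypothetical.

```python
def fit_line(x, y):
    """Least squares estimates (a, b) for Y = a + b*X from the closed-form formulas."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.0, 8.1, 9.9]
a, b = fit_line(x, y)

# First-order conditions: both sums should vanish up to rounding error.
residuals = [yi - a - b * xi for xi, yi in zip(x, y)]
sum_res = sum(residuals)
sum_res_x = sum(e * xi for e, xi in zip(residuals, x))

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"sum of residuals = {sum_res:.2e}, sum of residuals*x = {sum_res_x:.2e}")
```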
THE ERROR SUM OF SQUARES (SSE)
A MEASURE OF QUALITY OF FIT:
COEFFICIENT OF DETERMINATION

$$R^2 = 1 - \frac{SSE}{SST}$$
where:
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
SST is the total corrected sum of squares, and $\bar{y}$ is the mean of the observed $y_i$.
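A short sketch of the coefficient of determination, comparing the least squares line with a worse trial line on hypothetical data. The coefficients 0.24 and 1.94 are the least squares estimates for this particular toy data set; all names and values are assumptions made for the example.

```python
def r_squared(x, y, a, b):
    """R^2 = 1 - SSE/SST for the line Y = a + b*X."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.0, 8.1, 9.9]

r2_fitted = r_squared(x, y, 0.24, 1.94)  # least squares line for this data
r2_trial = r_squared(x, y, 1.0, 1.5)     # an arbitrary worse line

print(f"R^2 of fitted line: {r2_fitted:.6f}")
print(f"R^2 of trial line:  {r2_trial:.6f}")
```

A value near 1 means the line accounts for almost all of the variation in y around its mean.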
ILLUSTRATION OF $R^2$
CORRELATION COEFFICIENT

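$$r = b \sqrt{\frac{S_{xx}}{S_{yy}}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
where $S_{xx} = \sum (x_i - \bar{x})^2$, $S_{yy} = \sum (y_i - \bar{y})^2$ and $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$.

$r \approx \pm 1$: good linear relationship between $x$ and $y$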
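The two forms of the correlation coefficient can be checked against each other numerically. The sketch below uses the summary values quoted in this deck's example (from Table 11.1); the variable names are assumptions.

```python
import math

# Summary statistics quoted in the deck's example (Table 11.1).
s_xx = 4152.18
s_yy = 3713.88
s_xy = 3752.09
b1 = 0.903643  # fitted slope from the same example

# r from the slope, and r from the sums of squares directly.
r_from_slope = b1 * math.sqrt(s_xx / s_yy)
r_from_sums = s_xy / math.sqrt(s_xx * s_yy)

print(f"r (from slope) = {r_from_slope:.4f}")
print(f"r (from sums)  = {r_from_sums:.4f}")
print(f"r squared      = {r_from_sums ** 2:.4f}")
```

In simple linear regression the square of r equals the coefficient of determination, so the last line is a way to obtain $R^2$ from summary statistics alone.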


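VARIANCE ($\sigma^2$) ESTIMATOR

$$s^2 = \frac{SSE}{n-2} = \frac{\sum_{i=1}^{n} (y_i - Y_i)^2}{n-2}$$
where:
$Y_i = a + b x_i$ is the fitted model
$y_i$ is the real data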
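A sketch of the variance estimate, assuming the usual unbiased estimator $s^2 = SSE/(n-2)$ and the standard identity $SSE = S_{yy} - b_1 S_{xy}$ for the least squares line. The inputs are the summary values quoted in this deck's example; the variable names are assumptions.

```python
import math

# Summary statistics quoted in the deck's example (Table 11.1).
n = 33
s_yy = 3713.88
s_xy = 3752.09
b1 = 0.903643

# Assumed identity for the least squares line: SSE = Syy - b1 * Sxy
sse = s_yy - b1 * s_xy

# Assumed unbiased estimator of sigma^2, with n - 2 degrees of freedom
s2 = sse / (n - 2)
s = math.sqrt(s2)

print(f"SSE = {sse:.4f}")
print(f"s^2 = {s2:.4f}")
print(f"s   = {s:.4f}")
```

The divisor is n − 2 rather than n because two parameters, the intercept and the slope, were estimated from the data.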
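CONFIDENCE INTERVAL FOR $\beta_1$
A $100(1-\alpha)\%$ confidence interval for $\beta_1$ in the regression
line $\mu_{Y|x} = \beta_0 + \beta_1 x$ is
$$b_1 - t_{\alpha/2} \frac{s}{\sqrt{S_{xx}}} < \beta_1 < b_1 + t_{\alpha/2} \frac{s}{\sqrt{S_{xx}}}$$
where $b_1$ is the fitted slope (written $b$ earlier) and $t_{\alpha/2}$ is the value of the $t$-distribution with $n-2$
degrees of freedom.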
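A sketch of the slope interval, assuming the standard form $b_1 \pm t_{\alpha/2}\, s / \sqrt{S_{xx}}$. The summary values come from this deck's example. The critical value `t_crit = 2.0395` is an assumption (an approximate value for 31 degrees of freedom and 0.025 in the upper tail); a value read from a printed table may differ slightly. The function name is hypothetical.

```python
import math

def slope_interval(b1, s, s_xx, t_crit):
    """Confidence interval for beta_1: b1 +/- t_crit * s / sqrt(Sxx)."""
    half_width = t_crit * s / math.sqrt(s_xx)
    return b1 - half_width, b1 + half_width

# Summary statistics quoted in the deck's example (Table 11.1).
n = 33
s_xx, s_yy, s_xy = 4152.18, 3713.88, 3752.09
b1 = 0.903643

# s from the assumed estimator s^2 = (Syy - b1*Sxy) / (n - 2)
s = math.sqrt((s_yy - b1 * s_xy) / (n - 2))

t_crit = 2.0395  # assumed approximate t value for 31 df, 95% two-sided
lo, hi = slope_interval(b1, s, s_xx, t_crit)
print(f"{lo:.4f} < beta_1 < {hi:.4f}")
```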
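CONFIDENCE INTERVAL FOR $\beta_0$
A $100(1-\alpha)\%$ confidence interval for $\beta_0$ in the regression
line $\mu_{Y|x} = \beta_0 + \beta_1 x$ is
$$b_0 - t_{\alpha/2} \frac{s}{\sqrt{n S_{xx}}} \sqrt{\sum_{i=1}^{n} x_i^2} < \beta_0 < b_0 + t_{\alpha/2} \frac{s}{\sqrt{n S_{xx}}} \sqrt{\sum_{i=1}^{n} x_i^2}$$
where $b_0$ is the fitted intercept (written $a$ earlier) and $t_{\alpha/2}$ is the value of the $t$-distribution with $n-2$
degrees of freedom.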
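A sketch of the intercept interval, assuming the standard form $b_0 \pm t_{\alpha/2}\, s \sqrt{\sum x_i^2} / \sqrt{n S_{xx}}$. Two inputs are assumptions: the value 41086 from the deck's example is taken to be $\sum x_i^2$, and `t_crit = 2.0395` is an approximate critical value for 31 degrees of freedom. The function name is hypothetical.

```python
import math

def intercept_interval(b0, s, n, s_xx, sum_x2, t_crit):
    """Confidence interval for beta_0: b0 +/- t_crit * s * sqrt(sum x^2) / sqrt(n*Sxx)."""
    half_width = t_crit * s * math.sqrt(sum_x2) / math.sqrt(n * s_xx)
    return b0 - half_width, b0 + half_width

# Summary statistics quoted in the deck's example (Table 11.1).
n = 33
s_xx, s_yy, s_xy = 4152.18, 3713.88, 3752.09
b0, b1 = 3.829633, 0.903643
sum_x2 = 41086  # assumed to be the sum of x_i squared

# s from the assumed estimator s^2 = (Syy - b1*Sxy) / (n - 2)
s = math.sqrt((s_yy - b1 * s_xy) / (n - 2))

t_crit = 2.0395  # assumed approximate t value for 31 df, 95% two-sided
lo, hi = intercept_interval(b0, s, n, s_xx, sum_x2, t_crit)
print(f"{lo:.4f} < beta_0 < {hi:.4f}")
```

The interval for the intercept is much wider than the one for the slope here, which is typical when the x values lie far from zero.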
EXAMPLE
From Table 11.1
$S_{xx} = 4152.18$
$S_{yy} = 3713.88$
$S_{xy} = 3752.09$
$b_1 = 0.903643$
$b_0 = 3.829633$
$n = 33$
$\sum_{i=1}^{n} x_i^2 = 41086$

Find 95% confidence intervals for $\beta_1$ and $\beta_0$ in the
regression line $\mu_{Y|x} = \beta_0 + \beta_1 x$.
Compute $s^2$ and $s$
Find $\alpha$ and $t_{\alpha/2}$
Compute the intervals
$\alpha = 100\% - 95\% = 1 - 0.95 = 0.05$
and $t_{\alpha/2} = t_{0.05/2} = t_{0.025}$

$\nu = n - 2 = 33 - 2 = 31$

Check the $t$-distribution table.

Determine the interval using the table entry for
an upper-tail area of $0.025$ and $\nu = 31$.
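When a printed table is not at hand, the critical value can be approximated numerically. The sketch below is one possible approach, not the method used in the slides: it integrates the Student t density with Simpson's rule and finds the quantile by bisection, using only the Python standard library. All function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Value t such that the upper-tail area beyond t is alpha/2 (bisection)."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05
df = 33 - 2
t_crit = t_critical(alpha, df)
print(f"t critical value for alpha/2 = {alpha / 2}, df = {df}: {t_crit:.4f}")
```

Printed tables often list only selected degrees of freedom, so a value read from a table for a nearby row can differ from this one in the third decimal place.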
