
109-1 Biostatistics II, Lab Session

R : Linear Regression

呂冠臻
2020.10.22
Contents

• Diagnosis of regression model (residual: $e_i := Y_i - \hat{Y}_i$)
• Recall simple linear regression model
• Residual analysis
• Multiple linear regression
• Dummy variable
• Interaction
Diagnosis of regression model
Recall
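• Data : $\{(X_i, Y_i)\}_{i=1}^{n}$
• Model : $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where the random part (error) $\varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, $i = 1, \dots, n$
• Estimation method : Least Squares Estimation (LSE)
  Minimize the sum of squared errors: $S(\beta_0, \beta_1) = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$
  $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \dfrac{S_{XY}}{S_{XX}}, \quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$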
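The closed-form estimates can be checked numerically against R's lm(). A minimal sketch, assuming R's built-in cars dataset (speed as X, dist as Y) as a stand-in, since the lab's liver data file is not part of this handout:

```r
# Stand-in data: built-in `cars` dataset (X = speed, Y = dist), not the lab data
x <- cars$speed
y <- cars$dist

# Closed-form least squares estimates
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
b1_hat <- Sxy / Sxx
b0_hat <- mean(y) - b1_hat * mean(x)

# lm() minimizes the same S(beta0, beta1), so the estimates should match
fit <- lm(y ~ x)
print(c(b0_hat = b0_hat, b1_hat = b1_hat))
print(coef(fit))
```

Both print statements should show the same intercept and slope, up to floating-point rounding.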

Diagnosis of regression model
Recall
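• Model : $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, $i = 1, \dots, n$
• Error : $\varepsilon_i = Y_i - \beta_0 - \beta_1 X_i$
• Residual : $e_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i$
• Use $e_i$ to estimate $\varepsilon_i$ and $\sigma^2$:
  $\hat{\sigma}^2 = \dfrac{1}{n-2} \sum_{i=1}^{n} e_i^2 = \text{MSE}$, with $E[\hat{\sigma}^2] = \sigma^2$
  (the divisor is $n - 2$ because two parameters, $\beta_0$ and $\beta_1$, are estimated)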
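The estimator $\hat{\sigma}^2$ can be computed by hand from the residuals and compared with what lm() reports. A sketch on the same stand-in data (built-in cars, not the lab data); summary(fit)$sigma is lm()'s residual standard error, i.e. $\hat{\sigma}$:

```r
# Stand-in data: built-in `cars` dataset, not the lab data
fit <- lm(dist ~ speed, data = cars)

e <- residuals(fit)                       # e_i = Y_i - Yhat_i
n <- nrow(cars)
sigma2_hat <- sum(e^2) / (n - 2)          # MSE with n - 2 degrees of freedom

print(sigma2_hat)
print(summary(fit)$sigma^2)               # residual standard error, squared
print(deviance(fit) / df.residual(fit))   # RSS / (n - 2) via accessor functions
```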
Diagnosis of regression model
Assumptions
• Linearity : The relationship between X and the mean of Y is linear
  • $\varepsilon \perp E[Y \mid X] = \hat{Y}$
• Homoscedasticity : The variance of the residuals is the same for any value of X
  • $\varepsilon \perp i$ (constant variance: $\sigma^2 \perp$ sampling mechanism)
• Independence : Observations are independent of each other
  • $\varepsilon \perp X$
• Normality : For any fixed value of X, Y is normally distributed
  • Normality of $\varepsilon$
Diagnosis of regression model
Residual plot
• X-axis : $i$, $X_i$, or $\hat{Y}_i$
• Y-axis : $e_i$
• $\varepsilon \perp E[Y \mid X] = \hat{Y}$
• $\varepsilon \perp i$
• $\varepsilon \perp X$
• Normality of $\varepsilon$
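One property worth keeping in mind when reading these plots: when the model has an intercept, least squares forces the residuals to sum to zero and to be orthogonal to both $X_i$ and $\hat{Y}_i$. A residual plot therefore never shows a linear trend; it is used to spot curvature, changing spread, or outliers. A quick numerical check, assuming R's built-in cars data as a stand-in for the lab data:

```r
# Stand-in data: built-in `cars` dataset, not the lab data
fit <- lm(dist ~ speed, data = cars)
e <- residuals(fit)

# These hold by construction of least squares (up to floating-point error)
print(sum(e))                   # residuals sum to zero
print(sum(e * cars$speed))      # orthogonal to X
print(sum(e * fitted(fit)))     # orthogonal to Yhat
print(cor(e, fitted(fit)))      # hence zero sample correlation with Yhat
```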

https://condor.depaul.edu/sjost/it223/documents/regress.htm
Diagnosis of regression model
Scatter plot for residual
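• Build a linear regression to examine whether the variation in SGOT can be predicted or explained by SGPT, and perform model diagnostics on it.
• $\varepsilon \perp E[Y \mid X] = \hat{Y}$
• $\varepsilon \perp i$
• $\varepsilon \perp X$
• Normality of $\varepsilon$
• Use $e_i$ to estimate $\varepsilon_i$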
Diagnosis of regression model
Scatter plot for residual
• Present the diagnosis result
  • 3 scatter plots : $(\hat{Y}_i, e_i)$, $(X_i, e_i)$, and $(i, e_i)$
  • Q-Q plot : check normality of $\varepsilon$
• R code
  • par(mfrow=c(n, m)) : arrange plots in an n-row by m-column grid, filled row by row
  • par(mfcol=c(n, m)) : same grid, filled column by column
https://bookdown.org/ndphillips/YaRrr/arranging-plots-with-parmfrow-and-layout.html
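A minimal sketch of the three residual scatter plots plus the Q-Q plot in one 2 x 2 figure, assuming the built-in cars data as a stand-in (speed as X, dist as Y); with the lab data, substitute the SGOT-on-SGPT model. Axis labels and panel titles are illustrative choices:

```r
# Stand-in data: built-in `cars` dataset (X = speed, Y = dist), not the lab data
fit <- lm(dist ~ speed, data = cars)
e <- residuals(fit)

old_par <- par(mfrow = c(2, 2))   # 2 x 2 grid, filled row by row

plot(fitted(fit), e, xlab = "Fitted value", ylab = "Residual", main = "(Yhat_i, e_i)")
abline(h = 0, lty = 2)
plot(cars$speed, e, xlab = "X (speed)", ylab = "Residual", main = "(X_i, e_i)")
abline(h = 0, lty = 2)
plot(seq_along(e), e, xlab = "Index i", ylab = "Residual", main = "(i, e_i)")
abline(h = 0, lty = 2)
qqnorm(e, main = "Normal Q-Q plot of residuals")   # check normality of epsilon
qqline(e)

par(old_par)                      # restore the previous layout
```

Saving the value returned by par() and passing it back afterwards keeps the grid setting from leaking into later plots.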
Diagnosis of regression model
Scatter plot for residual

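Your turn!

• Build a linear regression to examine whether the variation in SGOT can be predicted or explained by SGPT. For this question, first remove the observations whose SGPT is greater than the third quartile of SGPT (437 subjects remain), then perform model diagnostics.

• Hint : Use summary() to get the 3rd quartile, then use which().
• Do not evaluate this question by eye.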
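The hint can be sketched as follows. Because the lab's liver data file is not included here, this assumes R's built-in cars data, with speed standing in for SGPT and dist for SGOT; the object names (q3, keep, cars_sub, fit_sub) are hypothetical:

```r
# Stand-in data: built-in `cars`; `speed` plays the role of SGPT (hypothetical)
s <- summary(cars$speed)
q3 <- s[["3rd Qu."]]                        # 3rd quartile taken from summary()
print(q3)
print(unname(quantile(cars$speed, 0.75)))   # same value via quantile()

keep <- which(cars$speed <= q3)             # drop rows with values above Q3
cars_sub <- cars[keep, ]
print(nrow(cars_sub))

fit_sub <- lm(dist ~ speed, data = cars_sub)  # refit, then rerun the diagnostics
print(coef(fit_sub))
```

Reading the cut-off from summary() instead of typing the number in keeps the filter correct if the data change.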

Diagnosis of regression model
Results of exercise

Multiple linear regression model
Motivation
• Data : $\{(X_{i1}, \dots, X_{ip}, Y_i)\}_{i=1}^{n}$
• Model : $Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} + \varepsilon_i$, where the random part (error) $\varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, $i = 1, \dots, n$
• Estimation method : Least Squares Estimation (LSE)
• Motivation : there may be many characteristics (covariates) for a single observation
  • Multiple covariates may improve the estimation of the regression
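In matrix form, with design matrix $\mathbf{X}$ (a column of ones followed by the $p$ covariates), the least squares estimate solves the normal equations $\mathbf{X}^\top \mathbf{X} \hat{\boldsymbol{\beta}} = \mathbf{X}^\top \mathbf{Y}$. A sketch using R's built-in mtcars data as a stand-in (mpg regressed on wt and hp, so $p = 2$), compared against lm():

```r
# Stand-in data: built-in `mtcars`; Y = mpg, covariates wt and hp (p = 2)
X <- cbind(1, mtcars$wt, mtcars$hp)   # design matrix with an intercept column
y <- mtcars$mpg

# Least squares: solve (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))
print(drop(beta_hat))

fit <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(fit))
```

solve(A, b) is used rather than forming the inverse explicitly; lm() itself uses a QR decomposition, so the two agree up to floating-point error.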

Linear Regression
Example : Indian Liver Patient Data
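• Examine whether the variation in DB can be predicted or explained by Gender and SGPT, and test whether the prediction is statistically significant (α = 0.05).
  $H_0: \beta_{SGOT} = 0$ vs. $H_A: \beta_{SGOT} \neq 0$
  $H_0: \beta_{Gender} = 0$ vs. $H_A: \beta_{Gender} \neq 0$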

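• p-value < 0.05, which is statistically significant, so we reject the null hypothesis.
  After adjusting for differences in SGOT, on average the male group's DB is 0.5247 units higher than the female group's, and this difference is statistically significant.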
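R encodes a two-level factor such as Gender as a single 0/1 dummy variable, with the first level (alphabetical by default, here "Female") as the reference, so the Gender coefficient is the SGOT-adjusted mean difference of males minus females. A sketch on simulated data, not the Indian Liver Patient data: the column names mirror the slide, but every number below is made up for illustration:

```r
# Simulated data (hypothetical values, NOT the Indian Liver Patient data)
set.seed(109)
n <- 200
sim <- data.frame(
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  SGOT   = runif(n, 10, 200)
)
# Hypothetical truth: at the same SGOT, male DB is 0.5 units higher
sim$DB <- 0.2 + 0.01 * sim$SGOT + 0.5 * (sim$Gender == "Male") + rnorm(n, sd = 0.3)

fit <- lm(DB ~ Gender + SGOT, data = sim)
print(head(model.matrix(fit)))      # GenderMale = 1 for males, 0 for females
print(summary(fit)$coefficients)    # estimates, SEs, t values, p-values
```

The Pr(>|t|) column gives the p-values for the two-sided tests $H_0: \beta = 0$ on each coefficient.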
Linear Regression
Interaction term
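• Examine whether the variation in DB can be predicted or explained by Gender, SGPT, and their interaction, and test whether the prediction is statistically significant (α = 0.05).
  $Y_i = \beta_0 + \beta_{SGOT} X_{SGOT,i} + \beta_{Sex} X_{Sex,i} + \beta_{interact} X_{Sex,i} X_{SGOT,i} + \varepsilon_i$

• Why do we need an interaction term in our multiple linear regression?

• How do we interpret it?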
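Assuming $X_{Sex,i} = 1$ for males and $0$ for females, the model gives two lines: females have slope $\beta_{SGOT}$, males have slope $\beta_{SGOT} + \beta_{interact}$, so $\beta_{interact}$ is the difference in SGOT slopes between the sexes. A sketch on simulated data (hypothetical numbers, not the lab data) showing that this fully interacted model reproduces separate per-group regressions:

```r
# Simulated data (hypothetical values, NOT the Indian Liver Patient data)
set.seed(2020)
n <- 200
sim <- data.frame(
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  SGOT   = runif(n, 10, 200)
)
male <- as.numeric(sim$Gender == "Male")
# Hypothetical truth: SGOT slope 0.010 for females, 0.010 + 0.015 for males
sim$DB <- 0.2 + 0.010 * sim$SGOT + 0.3 * male + 0.015 * male * sim$SGOT +
  rnorm(n, sd = 0.3)

fit <- lm(DB ~ SGOT * Gender, data = sim)   # SGOT + Gender + SGOT:Gender
b <- coef(fit)
slope_female <- b[["SGOT"]]
slope_male   <- b[["SGOT"]] + b[["SGOT:GenderMale"]]
print(c(female = slope_female, male = slope_male))

# Same lines as fitting each group separately
fit_f <- lm(DB ~ SGOT, data = subset(sim, Gender == "Female"))
fit_m <- lm(DB ~ SGOT, data = subset(sim, Gender == "Male"))
print(coef(fit_f))
print(coef(fit_m))
```

Without the interaction term, the model forces both groups to share one SGOT slope and only lets the intercepts differ.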
Linear Regression
Why do we need an interaction term?

[Figure: two panels, "Female" and "Male"]
