
AMS 572 Presentation

CH 10 Simple Linear Regression


Introduction
Example:

David Beckham: 1.83m Brad Pitt: 1.83m George Bush :1.81m


Victoria Beckham: 1.68m Angelina Jolie: 1.70m Laura Bush: ?

● To predict height of the wife in a couple, based on the husband’s height


Response (outcome or dependent) variable (Y): height of the wife
Predictor (explanatory or independent) variable (X): height of the husband

Regression analysis:
● Regression analysis is a statistical methodology to estimate the relationship of a response variable to a set of predictor variables.

● When there is just one predictor variable, we use simple linear regression. When there are two or more predictor variables, we use multiple linear regression.

● When it is not clear which variable represents a response and which a predictor, correlation analysis is used to study the strength of the relationship.
History:
● The earliest form of linear regression was the method of least squares, published by Legendre in 1805 and by Gauss in 1809.
● The method was extended by Francis Galton in the 19th century to describe a biological phenomenon.
● This work was extended by Karl Pearson and Udny Yule to a more general statistical context around the turn of the 20th century.

A Probabilistic Model

Specific settings of the predictor variable: x_1, x_2, ..., x_n
Corresponding values of the response variable: y_1, y_2, ..., y_n

ASSUME: y_i is the observed value of the random variable Y_i, which depends on x_i according to

Y_i = \beta_0 + \beta_1 x_i + \epsilon_i   (i = 1, 2, ..., n)   (10.1)

where \epsilon_i is a random error with E(\epsilon_i) = 0 and Var(\epsilon_i) = \sigma^2. Thus

\mu_i = E(Y_i) = \beta_0 + \beta_1 x_i   (10.2)

is the unknown mean of Y_i. The true regression line has unknown intercept \beta_0 and unknown slope \beta_1.

4 BASIC ASSUMPTIONS:
1. Y_i is a linear function of x_i.
2. The Y_i (equivalently the \epsilon_i) have a common variance \sigma^2, the same for all values of x.
3. The errors \epsilon_i are normally distributed.
4. The errors \epsilon_i are independent.
Comments:
1. "Linear" refers not to the form in x but to the parameters: the model is linear in \beta_0 and \beta_1.
   Example: E(Y) = \beta_0 + \beta_1 \log x is still a linear model; set x' = \log x.

2. The predictor variable need not be set at predetermined fixed values; it can be random along with Y.
   Example: heights and weights of children.
   Height (X) is given; Weight (Y) is to be predicted. The model then describes the conditional expectation of Y given X = x:

   E(Y | X = x) = \beta_0 + \beta_1 x.

10.2 Fitting the Simple Linear Regression Model

10.2.1 Least Squares (LS) Fit
Example 10.1 (Tire Tread Wear vs. Mileage: Scatter Plot)
[Figure: scatter plot of groove depth vs. mileage]

Fit a straight line y = \beta_0 + \beta_1 x to the data points (x_i, y_i), i = 1, 2, ..., n. The sum of squared vertical deviations from the line is

Q = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2.

The "best" fitting straight line in the sense of minimizing Q gives the LS estimates.

One way to find the LS estimates \hat{\beta}_0 and \hat{\beta}_1 is to take partial derivatives:

\frac{\partial Q}{\partial \beta_0} = -2 \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]

\frac{\partial Q}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i [y_i - (\beta_0 + \beta_1 x_i)]

Setting these partial derivatives equal to zero and simplifying, we get the normal equations

\beta_0 n + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i

\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i
Solving these equations, we get

\hat{\beta}_0 = \frac{(\sum_{i=1}^{n} x_i^2)(\sum_{i=1}^{n} y_i) - (\sum_{i=1}^{n} x_i)(\sum_{i=1}^{n} x_i y_i)}{n \sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}

\hat{\beta}_1 = \frac{n \sum_{i=1}^{n} x_i y_i - (\sum_{i=1}^{n} x_i)(\sum_{i=1}^{n} y_i)}{n \sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}
To simplify these expressions, we introduce

S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}(\sum_{i=1}^{n} x_i)(\sum_{i=1}^{n} y_i)

S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}(\sum_{i=1}^{n} x_i)^2

S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}(\sum_{i=1}^{n} y_i)^2

so that

\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}},   \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.

The equation \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x is known as the least squares line, which is an estimate of the true regression line.
Example 10.2 (Tire Tread Wear vs. Mileage: LS Line Fit)
Find the equation of the LS line for the tire tread wear data from Table 10.1. We have

\sum x_i = 144,  \sum y_i = 2197.32,  \sum x_i^2 = 3264,  \sum y_i^2 = 589,887.08,  \sum x_i y_i = 28,167.72

and n = 9. From these we calculate

\bar{x} = 16,  \bar{y} = 244.15,

S_{xy} = \sum x_i y_i - \frac{1}{n}(\sum x_i)(\sum y_i) = 28,167.72 - \frac{1}{9}(144)(2197.32) = -6989.40

S_{xx} = \sum x_i^2 - \frac{1}{n}(\sum x_i)^2 = 3264 - \frac{1}{9}(144)^2 = 960.

The slope and intercept estimates are

\hat{\beta}_1 = \frac{-6989.40}{960} = -7.281  and  \hat{\beta}_0 = 244.15 - (-7.281)(16) = 360.64.

Therefore, the equation of the LS line is

\hat{y} = 360.64 - 7.281x.

Conclusion: there is a loss of 7.281 mils in the tire groove depth for every 1000 miles of driving.

Given a particular mileage, say x = 25, we can find
\hat{y} = 360.64 - 7.281(25) = 178.62 mils,
which means the mean groove depth for all tires driven for 25,000 miles is estimated to be 178.62 mils.
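
A minimal SAS sketch of this fit (not part of the original slides; the data set and variable names match the tire/x/y names used in the diagnostics code later in this deck, and the ninth y value, 150.33, is inferred from the column totals above):

data tire;
  input x y @@;            /* x = mileage (in 1000 miles), y = groove depth (in mils) */
  datalines;
0 394.33   4 329.50   8 291.00  12 255.17  16 229.33
20 204.83  24 179.00  28 163.83  32 150.33
;
run;

proc reg data=tire;
  model y = x;             /* least squares fit; expect intercept near 360.64, slope near -7.281 */
run;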
10.2.2 Goodness of Fit of the LS Line
Coefficient of Determination and Correlation

The fitted values are \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i (i = 1, 2, ..., n), and the residuals

e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)   (i = 1, 2, ..., n)

are used to evaluate the goodness of fit of the LS line.

SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + 2\sum_{i=1}^{n} (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}),

where the first term on the right is SSR, the second is SSE, and the cross-product term equals 0. We therefore have

SST = SSR + SSE.

The ratio

r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}

Note: total sum of squares (SST), regression sum of squares (SSR), error sum of squares (SSE).

r^2 is called the coefficient of determination; 0 \le r^2 \le 1.


Example 10.3 (Tire Tread Wear vs. Mileage: Coefficient of Determination and Correlation)

For the tire tread wear data, calculate r^2 and r using the results from Example 10.2. We have

SST = S_{yy} = \sum y_i^2 - \frac{1}{n}(\sum y_i)^2 = 589,887.08 - \frac{1}{9}(2197.32)^2 = 53,418.73.

Next calculate SSR = SST - SSE = 53,418.73 - 2531.53 = 50,887.20. Therefore

r^2 = \frac{50,887.20}{53,418.73} = 0.953  and  r = -\sqrt{0.953} = -0.976,

where the sign of r follows from the sign of \hat{\beta}_1 = -7.281. Since 95.3% of the variation in tread wear is accounted for by linear regression on mileage, the relationship between the two is strongly linear with a negative slope.
10.2.3 Estimation of \sigma^2

An unbiased estimate of \sigma^2 is given by

s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n-2} = \frac{SSE}{n-2}.

Example 10.4 (Tire Tread Wear vs. Mileage: Estimate of \sigma^2)
Find the estimate of \sigma^2 for the tread wear data using the results from Example 10.3. We have SSE = 2531.53 and n - 2 = 7, therefore

s^2 = \frac{2531.53}{7} = 361.65,

which has 7 d.f. The estimate of \sigma is s = \sqrt{361.65} = 19.02 mils.
Statistical Inference on \beta_0 and \beta_1

Point estimators: \hat{\beta}_0, \hat{\beta}_1

Sampling distributions of \hat{\beta}_0 and \hat{\beta}_1:

\hat{\beta}_0 \sim N\left(\beta_0, \sigma^2 \frac{\sum x_i^2}{n S_{xx}}\right),   SE(\hat{\beta}_0) = s\sqrt{\frac{\sum x_i^2}{n S_{xx}}}

\hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right),   SE(\hat{\beta}_1) = \frac{s}{\sqrt{S_{xx}}}

For the mathematical derivations, please refer to the textbook, p. 331.
Statistical Inference on \beta_0 and \beta_1, Cont'd

Pivotal quantities (P.Q.'s):

\frac{\hat{\beta}_0 - \beta_0}{SE(\hat{\beta}_0)} \sim t_{n-2},   \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \sim t_{n-2}

100(1 - \alpha)% CI's:

\hat{\beta}_0 \pm t_{n-2, \alpha/2} SE(\hat{\beta}_0),   \hat{\beta}_1 \pm t_{n-2, \alpha/2} SE(\hat{\beta}_1)
Statistical Inference on \beta_0 and \beta_1, Cont'd

Hypothesis test:
H_0: \beta_1 = \beta_1^0  vs.  H_a: \beta_1 \ne \beta_1^0    (in particular, H_0: \beta_1 = 0 vs. H_a: \beta_1 \ne 0)

-- Test statistic:

t_0 = \frac{\hat{\beta}_1 - \beta_1^0}{SE(\hat{\beta}_1)}    (t_0 = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} for \beta_1^0 = 0)

-- At the significance level \alpha, we reject H_0 in favor of H_a iff |t_0| > t_{n-2, \alpha/2}.

-- This test can be used to show whether there is a linear relationship between x and y.
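
As a quick numerical illustration (not on the original slide; it reuses the values from Examples 10.2 and 10.4):

SE(\hat{\beta}_1) = \frac{s}{\sqrt{S_{xx}}} = \frac{19.02}{\sqrt{960}} = 0.614,   t_0 = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{-7.281}{0.614} = -11.86.

Since |t_0| = 11.86 > t_{7, .025} = 2.365, H_0: \beta_1 = 0 is rejected at \alpha = .05; a 95% CI for \beta_1 is -7.281 \pm 2.365(0.614) = (-8.73, -5.83). Note that t_0^2 \approx 140.7, matching the F statistic in the ANOVA table below.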
Analysis of Variance (ANOVA)

Mean square:
-- a sum of squares divided by its d.f.

MSR = \frac{SSR}{1},   MSE = \frac{SSE}{n-2}

F = \frac{MSR}{MSE} = \frac{SSR}{s^2} = \frac{\hat{\beta}_1^2 S_{xx}}{s^2} = \left(\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}}\right)^2 = \left(\frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}\right)^2 = t^2 \sim F_{1, n-2}  under H_0.
Analysis of Variance (ANOVA), Cont'd

ANOVA Table:

Source of Variation   Sum of Squares (SS)   Degrees of Freedom (d.f.)   Mean Square (MS)     F
Regression            SSR                   1                           MSR = SSR/1          F = MSR/MSE
Error                 SSE                   n-2                         MSE = SSE/(n-2)
Total                 SST                   n-1

Example (tire tread wear data):

Source       SS          d.f.   MS          F
Regression   50,887.20   1      50,887.20   140.71
Error        2,531.53    7      361.65
Total        53,418.73   8
10.4 Regression Diagnostics

10.4.1 Checking for Model Assumptions

 Checking for Linearity


 Checking for Constant Variance
 Checking for Normality
 Checking for Independence
Checking for Linearity

X_i = Mileage,  Y_i = Groove Depth,  \hat{Y}_i = fitted value = \hat{\beta}_0 + \hat{\beta}_1 x_i,  e_i = residual = Y_i - \hat{Y}_i

 i   X_i    Y_i      \hat{Y}_i    e_i
 1    0    394.33    360.64      33.69
 2    4    329.50    331.51      -2.01
 3    8    291.00    302.39     -11.39
 4   12    255.17    273.27     -18.10
 5   16    229.33    244.15     -14.82
 6   20    204.83    215.02     -10.19
 7   24    179.00    185.90      -6.90
 8   28    163.83    156.78       7.05
 9   32    150.33    127.65      22.68

[Figure: scatterplot of e_i vs. X_i]
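
A minimal SAS sketch (using the tire data set defined earlier; not part of the original slides) for computing the residuals and drawing this residual-vs-mileage plot:

proc reg data=tire;
  model y = x;
  output out=resid p=yhat r=e;   /* save fitted values and residuals */
run;

proc sgplot data=resid;
  scatter x=x y=e;               /* residuals vs. mileage */
  refline 0 / axis=y;            /* horizontal reference line at e = 0 */
run;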
Checking for Normality

[Figure: normal probability plot of the residuals. Mean = 3.95E-16, StDev = 17.79, N = 9, Anderson-Darling statistic = 0.514, p-value = 0.138.]
Checking for Constant Variance

[Figures: plots of residuals vs. fitted values — one in which Var(Y) is not constant, and a sample residual plot in which Var(Y) is constant.]
Checking for Independence

● Does not generally apply to the simple linear regression model
● Applies mainly to time series data
10.4.2 Checking for Outliers & Influential Observations

● What is an outlier?
● Why checking for outliers is important
● Mathematical definition
● How to deal with them

10.4.2-A. Introduction
Recall the box-and-whiskers plot (Chapter 4):
● A (mild) outlier is defined as any observation that lies outside of Q1 - 1.5*IQR and Q3 + 1.5*IQR (interquartile range, IQR = Q3 - Q1)
● An (extreme) outlier lies outside of Q1 - 3*IQR and Q3 + 3*IQR
● In short: an observation "far away" from the rest of the data
10.4.2-B. Why are outliers a problem?

● They may indicate a sample peculiarity, a data entry error, or some other problem;
● The regression coefficient estimates that minimize the sum of squares for error (SSE) are very sensitive to outliers >> bias or distortion of the estimates;
● Any statistical test based on sample means and variances can be distorted in the presence of outliers >> distortion of p-values;
● Faulty conclusions.

Example (estimators not sensitive to outliers are said to be robust):

                   Sorted data      Median   Mean   Variance   95% CI for mean
Real data          1 3 5 9 12       5        6.0    20.6       [0.45, 11.55]
Data with error    1 3 5 9 120      5        27.6   2676.8     [-36.63, 91.83]
10.4.2-C. Mathematical Definition

● Outlier
The standardized residual is given by

e_i^* = \frac{e_i}{s\sqrt{1 - h_{ii}}}

where h_{ii} is the leverage of the i-th observation (defined on the next slide). If |e_i^*| > 2, then the corresponding observation may be regarded as an outlier.

Example (Tire Tread Wear vs. Mileage):

 i      1      2      3      4      5      6      7     8     9
 e_i*   2.25  -0.12  -0.66  -1.02  -0.83  -0.57  -0.40  0.43  1.51

● STUDENTIZED RESIDUAL: a type of standardized residual calculated with the current observation deleted from the analysis.
● The LS fit can be excessively influenced by an observation that is not necessarily an outlier as defined above.
10.4.2-C. Mathematical Definition

● Influential Observation
An observation with an extreme x-value, y-value, or both. For simple linear regression, the leverage of the i-th observation is

h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}.

● On average h_{ii} is (k+1)/n (k = number of predictors); regard any h_{ii} > 2(k+1)/n as high leverage;
● If x_i deviates greatly from the mean \bar{x}, then h_{ii} is large;
● The standardized residual will be large for a high-leverage observation;
● Influence can be thought of as the product of leverage and outlierness.

Example (an observation that is influential/high leverage, but not an outlier):
[Figures: example 1 shows the fitted line with and without the influential observation; example 2 shows the corresponding scatter plot and residual plot.]


10.4.2-C. SAS code for the examples

SAS code
proc reg data=tire;
  model y=x;
  /* save studentized residuals, leverages, Cook's D, and DFFITS */
  output out=resid rstudent=r h=lev cookd=cd dffits=dffit;
proc print data=resid;
  /* print observations whose diagnostics exceed the chosen cutoffs */
  where abs(r)>=2 or lev>(4/9) or cd>(4/9) or abs(dffit)>(2*sqrt(1/9));
run;

SAS output
10.4.2-D. How to deal with
Outliers & Influential
Observations

 Investigate (Data errors? Rare events? Can be


corrected?)
 Ways to accommodate outliers
 Non Parametric Methods (robust to outliers)
 Data Transformations
 Deletion (or report model results both with and
without the outliers or influential observations to
see how much they change)
10.4.3 Data Transformations

Reasons:
● To achieve linearity
● To achieve homogeneity of variance
● To achieve normality or symmetry about the regression equation

Types of transformation:
● Linearizing transformation: a transformation of the response variable, the predictor variable, or both, which produces an approximately linear relationship between the variables.
● Variance-stabilizing transformation: a transformation applied when the constant variance assumption is violated.
Method of Linearizing Transformation

● Use a mathematical operation, e.g. square root, power, log, exponential, etc.
● Only one variable needs to be transformed in simple linear regression. Which one, the predictor or the response? Why?

e.g. We take an exponential model, Y = \alpha e^{-\beta x}, which is linearized by taking logs:

\log Y = \log\alpha - \beta x.

Plot of Residual vs xi & xi from the exponential fit


^ 40 Variable
Y= ei (original)
ei with transformation
^ ^ 30
Xi Yi log Yi exp (logYi) Ei
20
0 394.33 5.926 374.64 19.69

Residual
10
4 329.50 5.807 332.58 -3.08
8 291.00 5.688 295.24 -4.24 0

12 255.17 5.569 262.09 -6.92 -10

16 229.33 5.450 232.67 -3.34


-20
20 204.83 5.331 206.54 -1.71 0 5 10 15 20 25 30 35
xi
24 179.00 5.211 183.36 -4.36
Normal Probability Plot of ei and ei with transformation
99
Variable
ei
95 ei with transformation
Mean StDev N AD P
90 3.947460E-16 17.79 9 0.514 0.138
0.3256 8.142 9 0.912 0.011
80
70
28 163.83 5.092 162.77 1.06
Percent

60
50
40
30
20

10

1
-40 -30 -20 -10 0 10 20 30 40 50
Data
Method of Variance-Stabilizing Transformation

Delta method: a two-term Taylor-series approximation gives

Var(h(Y)) \approx [h'(\mu)]^2 g^2(\mu),   where Var(Y) = g^2(\mu) and \mu = E(Y).

1. Set [h'(\mu)]^2 g^2(\mu) = constant;
2. so h'(\mu) = \frac{1}{g(\mu)};
3. hence h(\mu) = \int \frac{d\mu}{g(\mu)}, i.e. h(y) = \int \frac{dy}{g(y)}.

e.g. Var(Y) = c^2\mu^2, where c > 0, so g(\mu) = c\mu \Leftrightarrow g(y) = cy, and

h(y) = \int \frac{dy}{cy} = \frac{1}{c}\int \frac{dy}{y} = \frac{1}{c}\log(y).

Therefore the logarithmic transformation stabilizes the variance in this case.
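
A second standard case, added here only as an illustration of the same recipe (it is not on the original slide): if Var(Y) = c^2\mu (variance proportional to the mean, as for count-type data), then g(y) = c\sqrt{y} and

h(y) = \int \frac{dy}{c\sqrt{y}} = \frac{2}{c}\sqrt{y},

so the square-root transformation stabilizes the variance.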
Correlation Analysis

● Correlation: a measurement of how closely two variables share a linear relationship.

\rho = corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}}

● Useful when it is not possible to determine which variable is the predictor and which is the response.
● Health vs. wealth: which is the predictor? Which is the response?
Statistical Inference on the Correlation Coefficient ρ

● We can derive a test on the correlation coefficient in the same way that we have been doing in class.
● Assumptions: X, Y are from the bivariate normal distribution.
● Start with the point estimator: R, the sample estimate of the population correlation coefficient ρ:

R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}

● Get the pivotal quantity. The distribution of R is quite complicated, so we transform the point estimator into a p.q.:

T = \frac{R\sqrt{n-2}}{\sqrt{1 - R^2}}

● Do we know everything about the p.q.? Yes: T ~ t_{n-2} under H_0: ρ = 0.
Bivariate Normal Distribution

● pdf:

f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(x-\mu_1)^2}{\sigma_1^2} - 2\rho\frac{(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2} \right] \right\}

● Properties:
  - \mu_1, \mu_2: means of X, Y
  - \sigma_1^2, \sigma_2^2: variances of X, Y
  - \rho: the correlation coefficient between X, Y
Derivation of T

Are these equivalent?

t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}  \stackrel{?}{=}  \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}

Substitute:

r = \hat{\beta}_1 \frac{s_x}{s_y} = \hat{\beta}_1 \sqrt{\frac{S_{xx}}{S_{yy}}} = \hat{\beta}_1 \sqrt{\frac{S_{xx}}{SST}},   1 - r^2 = \frac{SSE}{SST} = \frac{(n-2)s^2}{SST}

Then:

t = \hat{\beta}_1 \sqrt{\frac{S_{xx}(n-2)SST}{SST(n-2)s^2}} = \frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}

Yes, they are equivalent. Therefore we can use t as a statistic for testing against the null hypothesis H_0: \beta_1 = 0; equivalently, we can test against H_0: \rho = 0.
Exact Statistical Inference on ρ

Test:
● H_0: ρ = 0, H_a: ρ ≠ 0
● Test statistic:

t_0 = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}

● Reject H_0 if |t_0| > t_{n-2, \alpha/2}

Example (from textbook):
A researcher wants to determine if two test instruments give similar results. The two test instruments are administered to a sample of 15 students. The correlation coefficient between the two sets of scores is found to be 0.7. Is this correlation statistically significant at the .01 level?

H_0: ρ = 0, H_a: ρ ≠ 0

t_0 = \frac{0.7\sqrt{15-2}}{\sqrt{1-0.7^2}} = 3.534

For \alpha = .01, 3.534 = t_0 > t_{13, .005} = 3.012  ▲ Reject H_0
Approximate Statistical Inference on ρ

● There is no exact method of testing ρ vs. an arbitrary ρ_0:
  - the distribution of R is very complicated
  - T ~ t_{n-2} only when ρ = 0
● To test ρ vs. an arbitrary ρ_0, use Fisher's normal approximation:

\tanh^{-1}(R) = \frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) \approx N\left(\frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right), \frac{1}{n-3}\right)

● Transform the sample estimate:

\hat{\psi} = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right);   under H_0: \rho = \rho_0,   \hat{\psi} \approx N\left(\frac{1}{2}\ln\left(\frac{1+\rho_0}{1-\rho_0}\right), \frac{1}{n-3}\right)
Approximate Statistical Inference on ρ

● Test: H_0: \rho = \rho_0 vs. H_1: \rho \ne \rho_0, equivalently

H_0: \psi = \psi_0 = \frac{1}{2}\ln\left(\frac{1+\rho_0}{1-\rho_0}\right) vs. H_1: \psi \ne \psi_0

● Sample estimate: \hat{\psi} = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)

● Z statistic: z_0 = \sqrt{n-3}\,(\hat{\psi} - \psi_0); reject H_0 if |z_0| > z_{\alpha/2}

● CI:

\hat{\psi} - \frac{z_{\alpha/2}}{\sqrt{n-3}} \le \psi \le \hat{\psi} + \frac{z_{\alpha/2}}{\sqrt{n-3}}

and back-transforming (with \psi_l and \psi_u the lower and upper limits) gives a CI for ρ:

\frac{e^{2\psi_l}-1}{e^{2\psi_l}+1} \le \rho \le \frac{e^{2\psi_u}-1}{e^{2\psi_u}+1}
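
As a worked illustration (my calculation, reusing r = 0.7 and n = 15 from the earlier example with α = .05; not on the original slide):

\hat{\psi} = \frac{1}{2}\ln\left(\frac{1.7}{0.3}\right) = 0.867,   \frac{z_{.025}}{\sqrt{n-3}} = \frac{1.96}{\sqrt{12}} = 0.566

CI for ψ: 0.867 ± 0.566 = (0.301, 1.433), and back-transforming gives the 95% CI for ρ:

\left(\frac{e^{0.602}-1}{e^{0.602}+1}, \frac{e^{2.866}-1}{e^{2.866}+1}\right) \approx (0.29, 0.89).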
Approximate Statistical Inference on ρ
using SAS

[The SAS code and output shown on the original slide are not reproduced in this extract.]
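
A minimal SAS sketch that performs Fisher's z inference (the data set scores and the variables test1 and test2 are hypothetical names for the two instruments' scores):

proc corr data=scores fisher(biasadj=no);
  var test1 test2;   /* Fisher's z CI and test of rho = 0; add rho0=value inside fisher(...) to test another null value */
run;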
Pitfalls of Regression and
Correlation Analysis
 Correlation and causation
 Ticks cause good health
 Coincidental data
 Sun spots and republicans
 Lurking variables
 Church, suicide, population
 Restricted range
 Local, global linearity
Summary

● Probabilistic model for linear regression: Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, with model assumptions: linearity, constant variance, normality, independence.
● Least squares (LS) fit: minimize Q = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2 to obtain the LS estimates \hat{\beta}_0 and \hat{\beta}_1.
● Statistical inference on \beta_0 and \beta_1:
  \hat{\beta}_0 \sim N\left(\beta_0, \sigma^2\frac{\sum x_i^2}{n S_{xx}}\right),   \hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right),
  with confidence intervals \hat{\beta}_{0\,or\,1} \pm t_{n-2, \alpha/2}\, SE(\hat{\beta}_{0\,or\,1}).
● Goodness of fit: coefficient of determination r^2 = SSR/SST; sample correlation coefficient r.
● Confidence interval and prediction interval: a prediction interval for a future observation Y^* at x = x^* is
  \hat{Y}^* \pm t_{n-2, \alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}.
● Regression diagnostics: outliers? influential observations? data transformations?
● Correlation analysis: t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}},   \hat{\psi} = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right).

Thank You and Any Questions?
