18 views

Uploaded by ODURODUA

- 2010-05+STA2020F+test+2
- PredInt.xls
- AP Statistics Practice Exam (2)
- ADD Postmortem Interval
- Kiwifruit and apricot ﬁrmness measurement.pdf
- Impact of Training and Development, training design and on the job training onEmployee’s commitment in banking sector of Pakistan
- A Project On
- Notas de clase - Regresión Lineal
- Religious Roles in Fertility Behaviour among the Residents of Akinyele Local Government, Oyo State, Nigeria
- The Housing Bubble and the GDP_LV10117
- Egger and Lassmann (2012)_The Language Effect in International Trade a Meta-Analysis
- Hatting h 2013
- Some determinants of social and environmental disclosures in New Zealand companies
- Chap9
- Multiple Regression - D. Boduszek
- Regression Assignment Inst
- Quiz 4 Review Questions With Answerss
- 117-29
- Lecture 20
- 06a2.pdf

You are on page 1of 75

2.2-2.3 Parameter Estimation

2.4 Properties of Estimators

2.5 Inference

2.6 Prediction

2.7 Analysis of Variance

2.8 Regression Through the Origin

2.9 Related Models

1

2.1 The Model

Measurement of y (response) changes in a linear fashion with a

setting of the variable x (predictor):

y =

0

+

1

x

linear relation

+

noise

The linear relation is deterministic (non-random).

The noise or error is random.

Noise accounts for the variability of the observations about the

straight line.

No noise relation is deterministic.

Increased noise increased variability.

2

Experiment with this simulation program:

simple.sim <-

function(intercept=0,slope=1,x=seq(1,10),sigma=1){

noise <- rnorm(length(x),sd = sigma)

y <- intercept + slope

*

x + noise

title1 <- paste("sigma = ", sigma)

plot(x,y,pch=16,main=title1)

abline(intercept,slope,col=4,lwd=2)

}

> source("simple.sim.R")

> simple.sim(sigma=.01)

> simple.sim(sigma=.1)

> simple.sim(sigma=1)

> simple.sim(sigma=10)

3

Simulation Examples

2 4 6 8 10

2

4

6

8

1

0

sigma = 0.01

x

y

2 4 6 8 10

2

4

6

8

1

0

sigma = 0.1

x

y

2 4 6 8 10

2

4

6

8

1

0

sigma = 1

x

y

2 4 6 8 10

0

1

0

2

0

3

0

sigma = 10

x

y

4

The Setup

Assumptions:

1. E[y|x] =

0

+

1

x.

2. Var(y|x) = Var(

0

+

1

x +|x) =

2

.

Data: Suppose data y

1

, y

2

, . . . , y

n

are obtained at settings

x

1

, x

2

, . . . , x

n

, respectively. Then the model on the data is

y

i

=

0

+

1

x

i

+

i

(

i

i.i.d. N(0,

2

) and E[y

i

|x

i

] =

0

+

1

x

i

.)

Either

1. the xs are xed values and measured without error

(controlled experiment)

OR

2. the analysis is conditional on the observed values of x

(observational study).

5

2.2-2.3 Parameter Estimation, Fitted Values and Residuals

1. Maximum Likelihood Estimation

Distributional assumptions are required

2. Least Squares Estimation

Distributional assumptions are not required

6

2.2.1 Maximum Likelihood Estimation

Normal assumption is required:

f(y

i

|x

i

) =

1

2

e

1

2

2

(y

i

1

x

i

)

2

Likelihood: L(

0

,

1

, ) =

f(y

i

|x

i

)

n

e

1

2

2

n

i=1

(y

i

1

x

i

)

2

Maximize with respect to

0

,

1

, and

2

.

(

0

,

1

): Equivalent to minimizing

n

i=1

(y

i

1

x

i

)

2

(i.e. Least-squares)

2

: SSE/n, biased.

7

2.2.2 Least Squares Estimation

Assumptions:

1. E[

i

] = 0

2. Var(

i

) =

2

3.

i

s are independent.

Note that normality is not required.

8

Method

Minimize

S(

0

,

1

) =

n

i=1

(y

i

1

x

i

)

2

with respect to the parameters or regression coefcients

0

and

1

:

0

and

1

.

Justication: We want the tted line to pass as close to all of the

points as possible.

Aim: small Residuals (observed - tted response values):

e

i

= y

i

1

x

i

9

Look at the following plots:

> source("roller2.plot")

> roller2.plot(a=14,b=0)

> roller2.plot(a=2,b=2)

> roller2.plot(a=12, b=1)

> roller2.plot(a=-2,b=2.67)

10

0 2 4 6 8 10 12

0

2

0

4

0

6

0

Roller weight (t)

D

e

p

r

e

s

s

i

o

n

i

n

l

a

w

n

(

m

m

)

Fitted values

Data values

+ve residual

ve residual

a=14, b=0

0 2 4 6 8 10 12

0

2

0

4

0

6

0

Roller weight (t)

D

e

p

r

e

s

s

i

o

n

i

n

l

a

w

n

(

m

m

)

Fitted values

Data values

+ve residual

ve residual

a=2, b=2

0 2 4 6 8 10 12

0

2

0

4

0

6

0

Roller weight (t)

D

e

p

r

e

s

s

i

o

n

i

n

l

a

w

n

(

m

m

)

Fitted values

Data values

+ve residual

ve residual

a=12, b=1

0 2 4 6 8 10 12

0

2

0

4

0

6

0

Roller weight (t)

D

e

p

r

e

s

s

i

o

n

i

n

l

a

w

n

(

m

m

)

Fitted values

Data values

+ve residual

ve residual

a=2, b=2.67

The rst three lines above do not pass as close to the plotted

points as the fourth, even though the sum of the residuals is

about the same in all four cases.

Negative residuals cancel out positive residuals.

The Key: minimize squared residuals source("roller3.plot.R")

> roller3.plot(14,0); roller3.plot(2,2); roller3.plot(12,1);

> roller3.plot(a=-2,b=2.67) # small SS

0 2 4 6 8 10 12

0

4

0

8

0

1

2

0

Roller weight (t)

D

e

p

r

e

s

s

i

o

n

i

n

l

a

w

n

(

m

m

)

Fitted values

Data values

+ve residual

ve residual

a=14, b=0

0 2 4 6 8 10 12

0

4

0

8

0

1

2

0

Roller weight (t)

D

e

p

r

e

s

s

i

o

n

i

n

l

a

w

n

(

m

m

)

Fitted values

Data values

+ve residual

ve residual

a=2, b=2

0 2 4 6 8 10 12

0

4

0

8

0

1

2

0

Roller weight (t)

D

e

p

r

e

s

s

i

o

n

i

n

l

a

w

n

(

m

m

)

Fitted values

Data values

+ve residual

ve residual

a=12, b=1

0 2 4 6 8 10 12

0

4

0

8

0

1

2

0

Roller weight (t)

D

e

p

r

e

s

s

i

o

n

i

n

l

a

w

n

(

m

m

)

Fitted values

Data values

+ve residual ve residual

a=2, b=2.67

11

Unbiased estimate of

2

1

n # parameters estimated

n

i=1

e

2

i

=

1

n 2

n

i=1

e

2

i

* n observations n degrees of freedom

* 2 degrees of freedom are required to estimate the parameters

* the residuals retain n 2 degrees of freedom

12

Alternative Viewpoint

* y = (y

1

, y

2

, . . . , y

n

) is a vector in n-dimensional space.

(n degrees of freedom)

* The tted values y

i

=

0

+

1

x

i

also form a vector in

n-dimensional space:

y = ( y

1

, y

2

, . . . , y

n

)

(2 degrees of freedom)

* Least-squares seeks to minimize the distance between y and

y.

* The distance between n-dimensional vectors u and v is the

square root of

n

i=1

(u

i

v

i

)

2

* Thus, the squared distance between y and

y is

n

i=1

(y

i

y

i

)

2

=

n

i=1

e

2

i

13

Regression Coefcient Estimators

The minimizers of

S(

0

,

1

) =

n

i=1

(y

i

1

x

i

)

2

are

0

= y

1

x

and

1

=

S

xy

S

xx

where

S

xy

=

n

i=1

(x

i

x)(y

i

y)

and

S

xx

=

n

i=1

(x

i

x)

2

14

HomeMade R Estimators

> ls.est <-

function (data)

{

x <- data[,1]

y <- data[,2]

xbar <- mean(x)

ybar <- mean(y)

Sxy <- S(x,y,xbar,ybar)

Sxx <- S(x,x,xbar,xbar)

b <- Sxy/Sxx

a <- ybar - xbar

*

b

list(a = a, b = b, data=data)

}

> S <-

function (x,y,xbar,ybar)

{

sum((x-xbar)

*

(y-ybar))

}

15

Calculator Formulas and Other Properties

S

xy

=

n

i=1

x

i

y

i

n

i=1

x

i

)(

n

i=1

y

i

)

n

S

xx

=

n

i=1

x

2

i

(

n

i=1

x

i

)

2

n

Linearity Property:

S

xy

=

n

i=1

y

i

(x

i

x)

so

1

is linear in the y

i

s.

Gauss-Markov Theorem: Among linear unbiased estimators for

1

and

0

,

1

and

0

are best (i.e. they have smallest variance).

Exercise: Find the expected value and variance of

y

1

y

x

1

x

. Is it

unbiased for

1

? Is there a linear unbiased estimator with

smaller variance?

16

Residuals

i

= e

i

= y

i

1

x

i

= y

i

y

i

= y

i

y

1

(x

i

x)

In R:

> res

function (ls.object)

{

a <- ls.object$a

b <- ls.object$b

x <- ls.object$data[,1]

y <- ls.object$data[,2]

resids <- y - a - b

*

x

resids

}

17

2.3.1 Consequences of Least-Squares

1. e

i

= y

i

y

1

(x

i

x)

(follows from intercept formula)

2.

n

i=1

e

i

= 0 (follows from 1.)

3.

n

i=1

y

i

=

n

i=1

y

i

(follows from 2.)

4. The regression line passes through the centroid ( x, y) (follows

from intercept formula)

5.

x

i

e

i

= 0

(set partial derivative of S(

0

,

1

) wrt

1

to 0)

6.

y

i

e

i

= 0

(follows from 2. and 5.)

18

2.3.2 Estimation of

2

The residual sum of squares or error sum of squares is given by

SSE =

n

i=1

e

2

i

= S

yy

1

S

xy

Note: SSE = SS

Res

and S

yy

= SS

T

An unbiased estimator for the error variance is

2

= MSE =

SSE

n 2

=

_

S

yy

1

S

xy

_

n 2

Note: MSE = MS

Res

= Residual Standard Error =

MSE

19

Example

roller data

weight depression

1 1.9 2

2 3.1 1

3 3.3 5

4 4.8 5

5 5.3 20

6 6.1 20

7 6.4 23

8 7.6 10

9 9.8 30

10 12.4 25

20

Hand Calculation

(y = depression, x = weight)

10

i=1

x

i

= 1.9 +3.1 + 12.4 = 60.7

x =

60.7

10

= 6.07

10

i=1

y

i

= 2 +1 + 25 = 141

y =

141

10

= 14.1

10

i=1

x

2

i

= 1.9

2

+3.1

2

+ +12.4

2

= 461

10

i=1

y

2

i

= 4 +1 +25 + +625 = 3009

10

i=1

x

i

y

i

= (1.9)(2) + (12.4)(25) = 1103

S

xx

= 461

(60.7)

2

10

= 92.6

21

S

xy

= 1103

(60.7)(141)

10

= 247

S

yy

= 3009

(141)

2

10

= 1021

1

=

S

xy

S

xx

=

247

92.6

= 2.67

0

= y

1

x = 14.1 2.67(6.07) = 2.11

2

=

1

n2

(S

yy

1

S

xy

)

=

1

8

(1021 2.67(247)) = 45.2 = MSE

Example Summary: the tted regression line relating depression

(y) to weight (x) is

y = 2.11 +2.67x

The error variance is estimated as MSE = 45.2.

R commands (home-made version)

> roller.obj <- ls.est(roller)

> roller.obj[-3] # intercept and slope estimate

$a

[1] -2.09

$b

[1] 2.67

> res(roller.obj) # residuals

[1] -0.98 -5.18 -1.71 -5.71 7.95

[6] 5.82 8.02 -8.18 5.95 -5.98

> sum(res(roller.obj)2)/8 # error variance (MSE)

[1] 45.4

22

R commands (built-in version)

> attach(roller)

> roller.lm <- lm(depression weight)

> summary(roller.lm)

Call:

lm(formula = depression weight)

Residuals:

Min 1Q Median 3Q Max

-8.180 -5.580 -1.346 5.920 8.020

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -2.0871 4.7543 -0.439 0.67227

weight 2.6667 0.7002 3.808 0.00518

23

Residual standard error: 6.735 on 8 degrees of freedom

Multiple R-squared: 0.6445, Adjusted R-squared: 0.6001

F-statistic: 14.5 on 1 and 8 DF, p-value: 0.005175

> detach(roller)

or

> roller.lm <- lm(depression weight, data = roller)

> summary(roller.lm)

Using Extractor Functions

Partial Output:

> coef(roller.lm)

(Intercept) weight

-2.09 2.67

> summary(roller.lm)$sigma

[1] 6.74

From the output,

the slope estimate is

1

= 2.67.

the intercept estimate is

0

= 2.09.

the Residual standard error is the square root of the MSE:

6.74

2

= 45.4.

24

Other R commands

tted values: predict(roller.lm)

residuals: resid(roller.lm)

diagnostic plots: plot(roller.lm)

(these include a plot of the residuals against the tted values

and a normal probability plot of the residuals)

Also plot(roller); abline(roller.lm)

(this gives a plot of the data with the tted line overlaid)

25

2.4: Properties of Least Squares Estimates

E[

1

] =

1

E[

0

] =

0

Var(

1

) =

2

S

xx

Var(

0

) =

2

_

1

n

+

x

2

S

xx

_

26

Standard Error Estimators

Var(

1

) =

MSE

S

xx

so the standard error (s.e.) of

1

is estimated by

MSE

S

xx

roller e.g.: MSE = 45.2, S

xx

= 92.6

s.e. of

1

is

_

45.2/92.6 = .699

Var(

0

) = MSE

_

1

n

+

x

2

S

xx

_

so the standard error (s.e.) of

0

is estimated by

_

MSE

_

1

n

+

x

2

S

xx

_

roller e.g.:

MSE = 45.2, S

xx

= 92.6, x = 6.07, n = 10

s.e. of

0

is

45.2

_

1

10

+

6.07

2

92.6

2

_

= 4.74

27

Distributions of

1

and

0

y

i

is N(

0

+

1

x

i

,

2

), and

1

=

n

i=1

a

i

y

i

(a

i

=

x

i

x

S

xx

)

1

is N(

1

,

2

S

xx

).

Also,

SSE

2

is

2

n2

(independent of

1

) so

1

_

MSE/S

xx

t

n2

0

=

n

i=1

c

i

y

i

(c

i

=

1

n

x(x

i

x)

S

xx

)

0

is N(

0

,

2

_

1

n

+

x

2

S

xx

_

) and

MSE

_

1

n

+

x

2

S

xx

_

t

n2

28

2.5 Inferences about the Regression Parameters

Tests

Condence Intervals

for

1

, and

0

+

1

x

0

29

2.5.1 Inference for

1

H

0

:

1

=

10

vs. H

1

:

1

=

10

Under H

0

,

t

0

=

10

_

MSE/S

xx

has a t-distribution on n 2 degrees of freedom.

p-value = P(|t

n2

| > t

0

)

e.g. testing signicance of regression for roller:

H

0

:

1

= 0 vs. H

1

:

1

= 0

t

0

=

2.67 0

.699

= 3.82

p-value = P(|t

8

| > 3.82) = 2(1 P(t

8

< 3.82)) = .00509

R command: 2

*

(1-pt(3.82,8))

30

(1 ) Condence Intervals

slope:

1

t

n2,/2

s.e.

or

1

t

n2,/2

_

MSE/S

xx

roller e.g. (95% condence interval)

2.67 t

8,.025

(.699)

or

2.67 2.31(.699) = 2.67 1.61

R command: qt(.975, 8)

intercept:

0

t

n2,/2

s.e.

e.g. 2.11 2.31(4.74) = 2.11 10.9

31

2.5.2 Condence Interval for Mean Response

E[y|x

0

] =

0

+

1

x

0

where x

0

is a possible value that the predictor could take.

E[y|x

0

] =

0

+

1

x

0

is a point estimate of the mean response at

that value.

To nd a 1 condence interval for E[y|x

0

], we need the

variance of y

0

=

E[y|x

0

]:

Var( y

0

) = Var(

0

+

1

x

0

)

0

=

n

i=1

c

i

y

i

1

x

0

=

n

i=1

b

i

x

0

y

i

32

Therefore,

y

0

=

n

i=1

(c

i

+b

i

x

0

)y

i

so

Var( y

0

) =

n

i=1

(c

i

+b

i

x

0

)

2

Var(y

i

)

=

2

n

i=1

(c

i

+b

i

x

0

)

2

=

2

n

i=1

(

1

n

+

(x

0

x)(x

i

x)

S

xx

)

2

=

2

(

1

n

+

(x

0

x)

2

S

xx

)

Var( y

0

) = MSE(

1

n

+

(x

0

x)

2

S

xx

)

The condence interval is then

y

0

t

n2,/2

_

MSE(

1

n

+

(x

0

x)

2

S

xx

)

e.g. Compute a 95% condence interval for the expected

depression when the weight is 5 tonnes.

x

0

= 5, y

0

= 2.11 +2.67(5) = 11.2

n = 10, x = 6.07, S

xx

= 92.6, MSE = 45.2

Interval:

11.2 t

8,.025

45.2(

1

10

+

(5 6.07)

2

92.6

)

= 11.2 2.31

= (5.99, 16.41)

R code:

> predict(roller.lm,

newdata=data.frame(weight<-5),

interval="confidence")

fit lwr upr

[1,] 11.2 6.04 16.5

Ex. Write your own R function to compute this interval.

2.6: Predicted Responses

If a new observation were to be taken at x

0

, we would predict it

to be y

0

=

0

+

1

x

0

+ where is independent noise.

y

0

=

0

+

1

x

0

is a point prediction of the response at that

value.

To nd a 1 prediction interval for y

0

, we need the variance of

y

0

y

0

:

Var(y

0

y

0

) = Var(

0

+

1

x

0

+

1

x

0

)

= Var(

1

x

0

) +Var()

=

2

(

1

n

+

(x

0

x)

2

S

xx

) +

2

33

=

2

(1 +

1

n

+

(x

0

x)

2

S

xx

)

Var(y

0

y

0

) = MSE(1 +

1

n

+

(x

0

x)

2

S

xx

)

The prediction interval is then

y

0

t

n2,/2

_

MSE(1 +

1

n

+

(x

0

x)

2

S

xx

)

e.g. Compute a 95% prediction interval for the depression for a

single new observation where the weight is 5 tonnes.

x

0

= 5, y

0

= 2.11 +2.67(5) = 11.2

n = 10, x = 6.07, S

xx

= 92.6, MSE = 45.2

Interval:

11.2 t

8,.025

45.2(1 +

1

10

+

(5 6.07)

2

92.6

)

= 11.2 2.31

R code for Prediction Intervals

> predict(roller.lm,

newdata=data.frame(weight<-5),

interval="prediction")

fit lwr upr

[1,] 11.2 -5.13 27.6

Write your own R function to produce this interval.

34

Degrees of Freedom

A random sample of size n coming from a normal population

with mean and variance

2

has n degrees of freedom:

Y

1

, . . . , Y

n

.

Each linearly independent restriction reduces the number of

degrees of freedom by 1.

n

i=1

(Y

i

)

2

2

has a

2

(n)

distribution.

n

i=1

(Y

i

Y )

2

2

has a

2

(n1)

distribution. (Calculating

Y imposes

one linear restriction.)

n

i=1

(Y

i

Y

i

)

2

2

has a

2

(n2)

distribution. (Calculating

0

and

1

imposes two linearly independent restrictions.)

Given the x

i

s, there are 2 degrees of freedom in the quantity

0

+

1

x

i

.

Given x

i

and

Y , there is one degree of freedom:

y

i

Y =

1

(x

i

x).

35

2.7: Analysis of Variance: Breaking down (analyzing) Variation

The variation in the data (responses) is summarized by

S

yy

=

n

i=1

(y

i

y)

2

= TSS

(Total sum of squares)

2 sources of variation in the responses:

1. variation due to the straight line relationship with the

predictor

2. deviation from the line (noise)

y

i

y

deviation from data center

=

y

i

y

i

residual

+

y

i

y

difference: line and center

y

i

y = e

i

+ y

i

y

36

so

S

yy

=

(y

i

y)

2

=

(e

i

+ y

i

y)

2

=

e

2

i

+

( y

i

y)

2

since

e

i

y

i

= 0 and y

e

i

= 0

S

yy

= SSE +

( y

i

y)

2

= SSE +SS

R

The last term is the regression sum of squares.

Relation between SS

R

and

1

We saw earlier that

SSE = S

yy

1

S

xy

Therefore,

SS

R

=

1

S

xy

=

1

2

S

xx

Note that, for a given set of xs SS

R

depends only on

1

.

MS

R

= SS

R

/d.f. = SS

R

/1

(1 degree of freedom for slope parameter)

37

Expected Sums of Squares

E[SS

R

] = E[S

xx

1

2

]

= S

xx

_

Var(

1

) +(E[

1

])

2

_

= S

xx

_

2

S

xx

+

2

1

_

=

2

+

2

1

S

xx

= E[MS

R

]

Therefore, if

1

= 0, then SS

R

is an unbiased estimator for

2

.

Development of E[MSE]:

E[S

yy

] = E[

(y

i

y)

2

]

= E[

y

2

i

n y

2

] =

E[y

2

i

] nE[ y

2

]

38

Development of E[MSE] (contd)

Consider the 2 terms on RHS, separately:

1. term:

E[y

2

i

] = Var(y

i

) +(E[y

i

])

2

=

2

+(

0

+

1

x

i

)

2

E[y

2

i

] = n

2

+n

2

0

+2n

0

1

x +

2

1

x

2

i

2. term:

E[ y

2

] = Var( y) +(E[ y])

2

=

2

/n +(

0

+

1

x)

2

nE[ y

2

] =

2

+n

2

0

+2n

0

1

x +n

2

1

x

2

39

Development of E[MSE] (contd)

E[S

yy

] = (n 1)

2

+

2

1

(x

i

x)

2

= E[SS

T

]

E[SSE] = E[S

yy

] E[SS

R

]

= (n 1)

2

+

2

1

(x

i

x)

2

(

2

+

2

1

S

xx

)

= (n 2)

2

E[MSE] = E[SSE/(n 2)] =

2

40

Another approach to testing H

0

:

1

= 0

Under the null hypothesis, both MSE and MS

R

estimate

2

.

Under the alternative, only MSE estimates

2

.

E[MS

R

] =

2

+

2

1

S

xx

>

2

A reasonable test is

F

0

=

MS

R

MSE

F

1,n2

Large F

0

evidence against H

0

.

Note t

2

= F

1,

so this is really the same test as

t

2

0

=

_

1

_

MSE/S

x

x

_

_

2

=

1

2

S

xx

MSE

=

MS

R

MSE

41

The ANOVA table

Source df SS MS F

Reg. 1

1

2

S

xx

1

2

S

xx

MS

R

MSE

Error n 2 S

yy

1

2

S

xx

SSE/(n 2)

Total n-1 S

yy

roller data example:

> anova(roller.lm) # R code

Analysis of Variance Table

Response: depression

Df Sum Sq Mean Sq F value Pr(>F)

weight 1 658 658 14.5 0.0052

Residuals 8 363 45

(recall that the t-statistic for testing

1

= 0 had been 3.81 =

14.5)

Ex. Write an R function to compute these ANOVA quantities.

42

Condence Interval for

2

SSE

2

2

n2

so

P(

2

n2,1/2

SSE

2

2

n2,/2

) = 1

so

P(

SSE

2

n2,/2

2

SSE

2

n2,1/2

) = 1

e.g. roller data:

SSE = 363

2

8,.975

= 2.18

(R code: 1-qchisq(8, .025))

2

8,.025

= 17.5

(363/17.5, 363/2.18) = (20.7, 166.5)

43

2.7.1 R

2

- Coefcient of Determination

R

2

= is the fraction of the response variability explained by the

regression:

R

2

=

SS

R

S

yy

0 R

2

1. Values near 1 imply that most of the variability is

explained by the regression.

roller data:

SS

R

= 658 and S

yy

= 1021

so

R

2

=

658

1021

= .644

44

R output

> summary(roller.lm)

...

Multiple R-Squared: 0.644, ...

Ex. Write an R function which computes R

2

.

Another interpretation:

E[R

2

]

.

=

E[SS

R

]

E[S

yy

]

=

2

1

S

xx

+

2

(n 1)

2

+

2

1

S

xx

=

2

1

S

xx

n1

+

2

n1

2

+

2

1

S

xx

n1

.

=

2

1

S

xx

(n1)

2

+

2

1

S

xx

(n1)

for large n. (Note: this differs from the textbook.)

45

Properties of R

2

Thus, R

2

increases as

1. S

xx

increases (xs become more spread out)

2.

2

decreases

Cautions

1. R

2

does not measure the magnitude of the regression slope.

2. R

2

does not measure the appropriateness of the linear model.

3. A large value of R

2

does not imply that the regression model

will be an accurate predictor.

46

Hazards of Regression

Extrapolation: predicting y values outside the range of observed

x values. There is no guarantee that a future response would

behave in the same linear manner outside the observed range.

e.g. Consider an experiment with a spring. The spring is

stretched to several different lengths x (in cm) and the restoring

force F (in Newtons) is measured:

x F

3 5.1

4 6.2

5 7.9

6 9.5

> spring.lm <- lm(Fx - 1,data=spring)

> summary(spring.lm)

Coefficients:

47

Estimate Std. Error t value Pr(>|t|)

x 1.5884 0.0232 68.6 6.8e-06

The tted model relating F to x is

F = 1.58x

Can we predict the restoring force for the spring, if it has been

extended to a length of 15 cm?

High leverage observations: x values at the extremes of the

range have more inuence on the slope of the regression than

observations near the middle of the range.

Outliers can distort the regression line. Outliers may be

incorrectly recorded OR may be an indication that the linear

relation or constant variance assumption is incorrect.

A regression relationship does not mean that there is a

cause-and-effect relationship. e.g. The following data give the

number of lawyers and number of homicides in a given year for a

number of towns:

no. lawyers no. homicides

1 0

2 0

7 2

10 5

12 6

14 6

15 7

18 8

Note that the number of homicides increases with the number of

lawyers. Does this mean that in order to reduce the number of

homicides, one should reduce the number of lawyers?

Beware of nonsense relationships. e.g. It is possible to show

that the area of some lakes in Manitoba is related to elevation.

Do you think there is a real reason for this? Or is the apparent

relation just a result of chance?

2.9 - Regression through the Origin

intercept = 0

y

i

=

1

x

i

+

Max. Likelihood and L-S

minimize

n

i=1

(y

i

1

x

i

)

2

1

=

x

i

y

i

x

2

i

e

i

= y

i

1

x

i

SSE =

e

2

i

Max. Likelihood

2

=

SSE

n

48

Unbiased Estimator:

2

= MSE =

SSE

n 1

Properties of

1

:

E[

1

] =

1

Var(

1

) =

2

x

2

i

1 C.I. for

1

:

1

t

n1,/2

_

MSE

x

2

i

1 C.I. for E[y|x

0

]:

y

0

t

n1,/2

_

MSEx

2

0

x

2

i

since

Var(

1

x

0

) =

2

x

2

i

x

2

0

1 P.I. for y, given x

0

:

y

0

t

n1,/2

_

MSE(1 +

x

2

0

x

2

i

)

R code:

> roller.lm <- lm(depressionweight - 1,

data=roller)

> summary(roller.lm)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

weight 2.392 0.299 7.99 2.2e-05

Residual standard error: 6.43

on 9 degrees of freedom

Multiple R-Squared: 0.876,

Adjusted R-squared: 0.863

F-statistic: 63.9 on 1 and 9 DF,

p-value: 2.23e-005

> predict(roller.lm,

newdata=data.frame(weight<-5),

interval="prediction")

fit lwr upr

[1,] 12.0 -2.97 26.9

Ch. 2.9.2 Correlation

Bivariate Normal Distribution:

f(x, y) =

1

2

1

2

_

1

2

e

1

2(1

2

)

(Y

2

2XY +X

2

)

where

X =

x

1

1

and

Y =

y

2

2

correlation coefcient:

=

12

2

1

and

2

1

are the mean and variance of x

2

and

2

2

are the mean and variance of y

12

= E[(x

1

)(y

2

)], the covariance

49

Conditional distribution of y given x

f(y|x) =

1

2

1.2

e

1

2

_

y

0

1

x

1.2

_

2

where

0

=

2

1

=

2

2

1.2

=

2

2

(1

2

)

50

An Explanation

:

Suppose

y =

0

+

1

x +

where and x are independent N(0,

2

) and N(

1

,

2

1

) random

variables.

Dene

2

= E[y] and

2

2

= Var(y):

2

=

0

+

1

2

2

=

2

1

2

1

+

2

Dene

12

= Cov(x

1

, y

2

):

12

= E[(x

1

)(y

2

)]

= E[(x

1

)(

1

x +

1

1

)]

=

1

E[(x

1

)

2

] +E[(x

1

)]

=

1

2

1

51

Dene =

12

2

:

=

1

2

1

2

=

1

2

Therefore,

1

=

2

and

0

=

2

1

=

2

y|

x=x

=

0

+

1

x +

must be normal with mean

E[y|x = x] =

0

+

1

x

=

2

+

1

(x

1

)

and variance

Var(y|x = x) =

2

1.2

=

2

=

2

2

2

1

2

1

=

2

2

2

2

2

1

2

1

=

2

2

(1

2

)

Estimation

Maximum Likelihood Estimation (using bivariate normal) gives

0

= y

1

x

1

=

S

xy

S

xx

r = =

S

xy

_

S

xx

S

yy

Note that

r =

S

xx

S

yy

so

r

2

=

1

2

S

xx

S

yy

= R

2

52

i.e. the coefcient of determination = square of correlation

coefcient

Testing H

0

: = 0 vs. H

1

: = 0

Condl approach (equiv. to testing

1

= 0):

t

0

=

1

_

MSE/S

xx

=

1

_

S

yy

1

2

S

xx

(n2)S

xx

=

1

2

(n 2)

S

yy

S

xx

1

2

=

_

(n 2)r

2

1 r

2

where we have used

1

2

= r

2

S

yy

/S

xx

.

53

The above statistic is a t statistic on n 2 degrees of freedom

conclude = 0 if pvalue

P(|t

n2

| > |t

0

|) is small.

Testing H

0

: =

0

Z =

1

2

log

1 +r

1 r

has an approximate normal distribution with mean

Z

=

1

2

log

1 +

1

and variance

2

Z

=

1

n 3

for large n.

Thus,

Z

0

=

log

1+r

1r

log

1+

0

1

0

2

_

1/(n 3)

has an approximate standard normal distribution when the null

hypothesis is true.

54

Condence Interval for

Condence interval for

1

2

log

1+

1

:

Z z

/2

_

1/(n 3)

Find endpoints (l, u) of this condence interval and solve for :

(

e

2l

1

1 +e

2l

,

e

2u

1

1 +e

2u

)

55

R code for fossum example

Find 95% condence interval for the correlation between total

length and head length:

> source("fossum.R")

> attach(fossum)

> n <- length(totlngth) # sample size

> r <- cor(totlngth,hdlngth)# correlation est.

> zci <- .5

*

log((1+r)/(1-r)) +

qnorm(c(.025,.975))

*

sqrt(1/(n-3))

> ci <- (exp(2

*

zci)-1)/(1+exp(2

*

zci))

> ci # transformed conf. interval

> detach(fossum)

[1] 0.62 0.83 # conf. interval for true

# correlation

56

- 2010-05+STA2020F+test+2Uploaded byAnn Kim
- PredInt.xlsUploaded byKristi Rogers
- AP Statistics Practice Exam (2)Uploaded byGuy Cecelski
- ADD Postmortem IntervalUploaded byboomerzoomer
- Kiwifruit and apricot ﬁrmness measurement.pdfUploaded byntdien923
- Impact of Training and Development, training design and on the job training onEmployee’s commitment in banking sector of PakistanUploaded byinventionjournals
- A Project OnUploaded byMuzammil Shaikh
- Notas de clase - Regresión LinealUploaded byDiegoFernandoGonzálezLarrote
- Religious Roles in Fertility Behaviour among the Residents of Akinyele Local Government, Oyo State, NigeriaUploaded byTI Journals Publishing
- The Housing Bubble and the GDP_LV10117Uploaded byRicharnellia-RichieRichBattiest-Collins
- Egger and Lassmann (2012)_The Language Effect in International Trade a Meta-AnalysisUploaded byTan Jiunn Woei
- Hatting h 2013Uploaded byAyban Wan
- Some determinants of social and environmental disclosures in New Zealand companiesUploaded byIgotmypoint
- Chap9Uploaded byvivekverma390
- Multiple Regression - D. BoduszekUploaded byrebela29
- Regression Assignment InstUploaded byAshok Nk
- Quiz 4 Review Questions With AnswerssUploaded bySteven Nguyen
- 117-29Uploaded bylee7717
- Lecture 20Uploaded byAnonymous 0qR2NHz9D
- 06a2.pdfUploaded byPETER
- PSQ Reviewer 1Uploaded byJan Michael Abarquez
- Lecture 07Uploaded byAshwani Kumar
- a2 Tiratira Ched Cktr Med Lcs & TeUploaded byCarlo Magno
- Heteros Kedas Tic No StUploaded byjebotipasmaterusupak
- econometricUploaded bySilvio Paula
- Kuliah 2 - Regresi OLS (Lanjutan)Uploaded byMuhammad Syah
- RegressUploaded byAlex Caceros
- hw3 4.0 (1)Uploaded bySahil Bhagat
- Data AssignmUploaded byAtul Bhaisare
- A Sticky OptimizationUploaded bySalvador Bernal Galván

- 2012_851 ESMA MiFID Supervisory Briefing Appropriateness and Execution-OnlyUploaded byalefbiondo
- 4-Fiber MSP Ring ProtectionUploaded byganjuvivek85
- Simple and Effective Antennas for Amateur Radio OperatorsUploaded byToto W. Juniarto
- BOIS Getting StartedUploaded bysubhamay
- CaseStudies-RoboticArcWeldingUploaded byKoshy John
- Chapter 7_Tarbela Reservoir Sediment Flushing ProjetUploaded byMirza Waqar Baig
- Ministry of Absorption Health Services BookletUploaded byEnglishAccessibility
- Sample(JohnLocke)Uploaded byRené Ríos
- Ancient Economic ThoughtUploaded byBernardokpe
- attacksUploaded byShubham Bansal
- OTN TechnologyUploaded byAlejandro Ramos
- A Practical Guide to Wig ModelsUploaded bydogbreath27
- DIA4ED2150102ENUploaded byVictor Pugliese
- Bikol Reporter January 6 - 12, 2019 IssueUploaded byBikol Reporter
- Goitia vs. a 35 PHIL 252 Nov. 2, 1916Uploaded byShe Oh
- MetroNID_ReleaseNote_5-2-1-2_20110421Uploaded byRiverdry Charlie
- Thesis-IUploaded bypoonam_2rasal2
- MPower 3000 Data SheetUploaded byLNLINFO
- Trutax Income Tax ConsultingUploaded byTru Tax
- Tech Ethernet v1r0c0Uploaded byMAX PAYNE
- Bearing failures and their causes. Fallas en Rodamientos y Sus Causas SKF (en Inglés)Uploaded byTeckelino
- ChocolateUploaded byMoHit Poddar
- MIMMSUploaded byMiguel Orozco Luna
- Code of Ethics for Professional Teachers Deped NCRUploaded byJonas Reduta Cabacungan
- E-challan Ccmt ChallanUploaded bySingh KD
- Michigan Court Proposed Rule Change For Indigent FilingsUploaded byBeverly Tran
- Competent Leadership Progress TrackerUploaded byArea33Toastmasters
- 2013-11-25 Program (Draft) - Jeju WorkshopUploaded byChikirin Grønlund
- 1208-CAP-XX-Charging-a-Supercapacitor-from-a-Solar-Cell.pdfUploaded bymehralsmensch
- Launceston CBD Building Height and Massing StudyUploaded byThe Examiner