
2016S_STAT5102_L1 | Department of Statistics, The Chinese University of Hong Kong

I. Simple Linear Regression

In this chapter, we shall cover

• Relations between two variables
• Scatter plots
• Univariate regression model
• Assumptions
• Estimation and method of least squares
• Inferences concerning β1
• Inferences concerning β0
• Estimation of the mean of the response variable for a given level of X
• Prediction of a new observation
• Analysis of variance approach to regression analysis
• Measures of linear association between X and Y

Relations between variables

• Functional relation between two variables
  Independent (predictor, explanatory) variable
  Dependent (response) variable

• Statistical relation between two variables
Scatter plots
• A scatter plot (or scatter diagram) can be used to show the relationship between two variables.

[Figure: scatter plots illustrating linear and curvilinear relationships]
Types of Relations

[Figure: scatter plots illustrating strong and weak relationships]
Types of Relations (Cont’d)

[Figure: scatter plot illustrating no relationship]
Historical Origins of Regression Models

• Regression was first introduced by Sir Francis Galton in the late 19th century.

• Galton had studied the relation between the heights of parents and children and noted that the heights of children of both tall and short parents appeared to "revert" or "regress" to the mean of the group.

[Photo: Sir Francis Galton, geographer, meteorologist, tropical explorer, founder of differential psychology, inventor of fingerprint identification, pioneer of statistical correlation and regression …]
Basic concepts
• A tendency of the response variable Y to vary with the predictor variable X in a systematic fashion.

• A scattering of points around the curve of statistical relationship.

• There is a probability distribution of Y for each level of X.

• The means of these probability distributions vary in some systematic fashion with X.
Simple Linear Regression Models

Yi = β0 + β1Xi + εi

  Yi = dependent (response) variable
  Xi = independent (predictor) variable
  β0 = Y-intercept
  β1 = slope coefficient
  εi = random error term

β0 + β1Xi is the linear component; εi is the random error component.

Assumptions:
• E(εi) = 0
• Var(εi) = σ²
• Cov(εi, εj) = 0 for i ≠ j
Model Assumptions

• Normality of Error
  Error values (ε) are normally distributed for any given value of X.
• Homoscedasticity
  The probability distribution of the errors has constant variance.
• Independence of Errors
  Error values are statistically independent.

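To make the assumptions concrete, here is a minimal Python sketch that generates data satisfying the normal error model. The parameter values (β0 = 10, β1 = 2, σ = 3) are illustrative choices, not values from these slides:

```python
import numpy as np

# Illustrative (hypothetical) parameter values, not from the slides
beta0, beta1, sigma, n = 10.0, 2.0, 3.0, 50

rng = np.random.default_rng(0)
x = rng.uniform(0, 20, size=n)                   # fixed-design X, treated as known constants
eps = rng.normal(loc=0.0, scale=sigma, size=n)   # E(eps)=0, Var(eps)=sigma^2, independent draws
y = beta0 + beta1 * x + eps                      # Y_i = beta0 + beta1*X_i + eps_i
```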
Simple Linear Regression Equation

The simple linear regression equation provides an estimate of the population regression line:

Ŷi = b0 + b1Xi

where Ŷi is the estimated (or predicted) value of Y for observation i, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and Xi is the value of X for observation i.
Slopes and the Intercept

• b0 is the estimated average value of Y when the value of X is zero.

• b1 is the estimated change in the average value of Y as a result of a one-unit change in X.
Example

• A real estate agent wishes to examine the relation between the selling price of a home and its size (measured in square feet).

• A random sample of 10 houses is selected.
  Dependent variable (Y) = house price in $'000s
  Independent variable (X) = square feet
Sample Data for House Price Model
House Price in $'000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
Graphical Presentation

House price model: scatter plot

[Figure: scatter plot of house price ($1000s) against square feet]
Graphical Presentation

House price model: scatter plot and regression line

[Figure: scatter plot of house price ($1000s) against square feet with the fitted regression line; intercept = 98.248, slope = 0.10977]

house price = 98.24833 + 0.10977 (square feet)
Interpretation of the Intercept b0

house price = 98.24833 + 0.10977 (square feet)

• b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values).

• Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet.
Interpretation of the Slope Coefficient b1

house price = 98.24833 + 0.10977 (square feet)

• b1 measures the estimated change in the average value of Y as a result of a one-unit change in X.

• Here, b1 = 0.10977 tells us that the average value of a house increases by 0.10977 × ($1000) = $109.77 for each additional square foot of size.
Predictions using the Fitted Model
To predict the price for a house with 2,000 square feet, we write:

house price = 98.25 + 0.1098 (sq. ft.)
            = 98.25 + 0.1098 × (2000)
            = 317.85

The predicted price for a house with 2,000 square feet is 317.85 ($1,000s) = $317,850.
Interpolation vs. Extrapolation
• When using a regression model for prediction, only predict within the relevant range of the data (the relevant range for interpolation).

[Figure: scatter plot of house price ($1000s) against square feet, marking the relevant range for interpolation; do not try to extrapolate beyond the range of observed X's]
Estimation (Method of Least Squares)

• b0 and b1 are obtained by choosing the values that minimise the sum of the squared differences between Yi and Ŷi:

min Σ (Yi − Ŷi)² = min Σ (Yi − (b0 + b1Xi))²,

where the minimum is taken over b0 and b1.
Estimation (Method of Least Squares)

• In fact, b0 and b1 have nice closed-form solutions:

b1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²   (all sums over i = 1, …, n)

b0 = Ȳ − b1 X̄
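As a check on these formulas, the following Python sketch (numpy assumed) applies the closed-form solutions to the 10-house sample from earlier in the chapter; it reproduces the fitted line house price = 98.24833 + 0.10977 (square feet):

```python
import numpy as np

# House-price sample from the earlier slides: Y in $'000s, X in square feet
x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))

b1 = Sxy / Sxx          # ~= 0.10977
b0 = ybar - b1 * xbar   # ~= 98.248

# Prediction at X_h = 2000 square feet (~= 317.78; the slides round b1
# to 0.1098 and report 317.85)
y_hat_2000 = b0 + b1 * 2000
print(b0, b1, y_hat_2000)
```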
Estimation of the error term variance σ²

• The estimator of σ² is

S² = MSE = SSE / (n − 2) = Σ (Yi − Ŷi)² / (n − 2) = Σ ei² / (n − 2)

• Note that S² is an unbiased estimator of σ².
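Continuing the same sketch, S² = MSE is computed from the residuals of the fitted line (numpy assumed; the house data are reused for concreteness):

```python
import numpy as np

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

e = y - (b0 + b1 * x)    # residuals e_i = Y_i - Yhat_i
SSE = np.sum(e ** 2)
MSE = SSE / (n - 2)      # S^2: unbiased estimator of sigma^2 (df = n - 2)
s = np.sqrt(MSE)
print(SSE, MSE, s)
```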
Method of Maximum Likelihood
Under the normal error model:

• MLE of b1 = LSE of b1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²   (unbiased)
• MLE of b0 = LSE of b0 = Ȳ − b1 X̄   (unbiased)
• The MLE of σ² is SSE / n = ((n − 2)/n) S²
• MLE of σ² < unbiased estimator of σ² [biased downward, but asymptotically unbiased]
Example

The director of admissions of a small college selected 120 students at random from the new freshman class in a study to determine whether a student's grade point average (GPA) at the end of the freshman year (Y) can be predicted from the ACT test score (X). The data are given on the disk.
Regression output from SAS

Parameter Estimates

Variable   Label           DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept  Intercept       1   2.11405             0.32089         6.59     <.0001
ACT        ACT test score  1   0.03883             0.01277         3.04     0.0029

b0 = 2.11405, b1 = 0.03883, Ŷ = 2.11405 + 0.03883X
A Scatter Plot with Regression Line

[Figure: scatter plot of freshman GPA against ACT score with the fitted regression line]
The Slope

• Obtain a point estimate of the mean freshman GPA for students with ACT test score X = 30.

  Answer: 3.27895

• What is the point estimate of the change in the mean response when the entrance test score increases by one point?

  Answer: 0.03883
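Both answers follow directly from the fitted equation; a one-line check in Python, using only the SAS estimates above:

```python
b0, b1 = 2.11405, 0.03883   # SAS parameter estimates from the previous slide
print(b0 + b1 * 30)         # mean freshman GPA at ACT = 30 -> 3.27895
print(b1)                   # change in mean response per extra ACT point -> 0.03883
```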
More on Inference

Yi = β0 + β1Xi + εi

Assumptions:

• Xi are known constants, i = 1, …, n (fixed design)
• εi ~ N(0, σ²), i.i.d.

Therefore, Yi ~ N(β0 + β1Xi, σ²), independent.
Distribution of b1

b1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)² = Σ ki Yi,

where ki = (Xi − X̄) / Σ (Xj − X̄)² and all sums run over the n observations.
Distribution of b1

Hence, we can deduce that

b1 ~ N(β1, σ²/Sxx), where Sxx = Σ (Xi − X̄)².

It follows that

(b1 − β1) / s{b1} ~ t(n − 2), with s²{b1} = MSE / Sxx.

Proof will be discussed in class.
Testing (Two-sided test of β1)

H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (a linear relationship exists between X and Y)

Test statistic: T = b1 / s{b1}

Decision rule: Reject H0 if |T| > t(α/2, n − 2)
Two-sided Confidence Intervals for β1

Two-sided 100(1 − α)% C.I. for β1:

(b1 − t(α/2, n − 2) s{b1},  b1 + t(α/2, n − 2) s{b1})
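A sketch of the test and the confidence interval in Python (scipy assumed; the house data from earlier stand in for a generic sample):

```python
import numpy as np
from scipy import stats

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
MSE = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)
s_b1 = np.sqrt(MSE / Sxx)                       # s{b1}

T = b1 / s_b1                                   # test statistic for H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(T), df=n - 2)      # two-sided p-value

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)   # 95% C.I. for beta1
print(T, p_value, ci)
```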
One-sided Test for β1

H0: β1 ≤ k
H1: β1 > k

Test statistic: T = (b1 − k) / s{b1}

Decision rule: Reject H0 if T > t(α, n − 2)
One-sided Tests for β1

H0: β1 ≥ k
H1: β1 < k

Test statistic: T = (b1 − k) / s{b1}

Decision rule: Reject H0 if T < −t(α, n − 2)
Distribution of b0

Recall that b0 = Ȳ − b1 X̄.

Hence, we can write

b0 ~ N(β0, σ² [1/n + X̄² / Σ (Xi − X̄)²])

and

(b0 − β0) / s{b0} ~ t(n − 2).

Proof will be discussed in class.
Two-sided Tests for β0

H0: β0 = k
H1: β0 ≠ k

Test statistic: T = (b0 − k) / s{b0}

Decision rule: Reject H0 if |T| > t(α/2, n − 2)
Two-sided Confidence Interval for β0

Two-sided 100(1 − α)% C.I. for β0:

(b0 − t(α/2, n − 2) s{b0},  b0 + t(α/2, n − 2) s{b0})
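The analogous computation for β0 uses s²{b0} = MSE (1/n + X̄²/Sxx); a sketch on the same house data:

```python
import numpy as np
from scipy import stats

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
MSE = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

s_b0 = np.sqrt(MSE * (1.0 / n + x.mean() ** 2 / Sxx))   # s{b0}
t_crit = stats.t.ppf(0.975, df=n - 2)
print(b0 - t_crit * s_b0, b0 + t_crit * s_b0)           # 95% C.I. for beta0
```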
One-sided Tests for β0

H0: β0 ≤ k
H1: β0 > k

Test statistic: T = (b0 − k) / s{b0}

Decision rule: Reject H0 if T > t(α, n − 2)
One-sided Test for β0

H0: β0 ≥ k
H1: β0 < k

Test statistic: T = (b0 − k) / s{b0}

Decision rule: Reject H0 if T < −t(α, n − 2)
Estimation of the mean of the response
variable for a given level of X
Example
Suppose you want to develop a model to predict selling
price of homes based on assessed value. A sample of 30
recently sold single-family houses in a small city is
selected to study the relationship between selling price
(Y, in $000) and assessed value (X, in $000). The houses
in the city had been reassessed at full value 1 year prior
to the study.

• Estimate the average selling price for houses with an assessed value of $70,000.
Estimation of the mean of the response
variable for a given level of X
Let Xh denote the level of X for which we wish to estimate the mean response. (Note: given Xh, the mean response under the model is E(Yh) = β0 + β1Xh.)

The point estimator is Ŷh = b0 + b1Xh.

Distribution of E(Yh) − Ŷh:

E(Yh) − Ŷh ~ N(0, σ² [1/n + (Xh − X̄)² / Σ (Xi − X̄)²])
Confidence Interval for E(Yh)

Two-sided 100(1 − α)% C.I. for E(Yh):

Ŷh ± t(α/2, n − 2) · s · √( 1/n + (Xh − X̄)² / Σ (Xi − X̄)² )
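A sketch of this interval at a given Xh (house data again, with Xh = 2000 square feet as an illustrative level):

```python
import numpy as np
from scipy import stats

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

x_h = 2000.0
y_hat_h = b0 + b1 * x_h
se_mean = s * np.sqrt(1.0 / n + (x_h - x.mean()) ** 2 / Sxx)   # s{Yhat_h}
t_crit = stats.t.ppf(0.975, df=n - 2)
print(y_hat_h - t_crit * se_mean, y_hat_h + t_crit * se_mean)  # 95% C.I. for E(Y_h)
```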
Prediction of a new observation Yh(new)

Example
Suppose you want to develop a model to predict selling
price of homes based on assessed value. A sample of 30
recently sold single-family houses in a small city is
selected to study the relationship between selling price (Y,
in $000) and assessed value (X, in $000). The houses in the
city had been reassessed at full value 1 year prior to the
study.

• Estimate the selling price of an individual house with an assessed value of $70,000.
Prediction of a new observation Yh(new)

We predict Yh(new), the new observation corresponding to a given level Xh of the predictor variable, by Ŷh:

Ŷh = b0 + b1Xh

Distribution of Yh(new) − Ŷh:

Yh(new) − Ŷh ~ N(0, σ² [1 + 1/n + (Xh − X̄)² / Σ (Xi − X̄)²])
Confidence Interval for Yh(new)

Two-sided 100(1 − α)% C.I. for Yh(new):

Ŷh ± t(α/2, n − 2) · s · √( 1 + 1/n + (Xh − X̄)² / Σ (Xi − X̄)² )
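The prediction interval differs from the mean-response interval only through the extra "1 +" inside the square root, which accounts for the variability of the new observation itself. A sketch, same setup as before:

```python
import numpy as np
from scipy import stats

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

x_h = 2000.0
y_hat_h = b0 + b1 * x_h
# extra "1 +" relative to the mean-response interval
se_pred = s * np.sqrt(1.0 + 1.0 / n + (x_h - x.mean()) ** 2 / Sxx)
t_crit = stats.t.ppf(0.975, df=n - 2)
print(y_hat_h - t_crit * se_pred, y_hat_h + t_crit * se_pred)   # 95% interval for Y_h(new)
```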
ANOVA and Regression Analysis

• Partitioning of Total Sum of Squares
• Mean Squares
• Analysis of Variance (ANOVA) Table
ANOVA and Regression

• The difference between the observed value of the response variable and the mean value of the response variable is called the total deviation and is equal to Yi − Ȳ.

• The difference between the predicted value of the response variable and the mean value of the response variable is called the explained deviation and is equal to Ŷi − Ȳ.
ANOVA and Regression

• The difference between the observed value of the response variable and the predicted value of the response variable is called the unexplained deviation and is equal to Yi − Ŷi.
ANOVA and Regression

Total Variation = Unexplained Variation + Explained Variation

Σ (Yi − Ȳ)² = Σ (Yi − Ŷi)² + Σ (Ŷi − Ȳ)², i.e., SSTO = SSE + SSR.
ANOVA and Regression

Total Variation = Unexplained Variation + Explained Variation

1 = Unexplained Variation / Total Variation + Explained Variation / Total Variation

Explained Variation / Total Variation = 1 − Unexplained Variation / Total Variation
ANOVA and Regression

ANOVA Table

Source      Sum of Squares (SS)  Degrees of freedom (df)  Mean squares (MS)  F
Regression  SSR                  1                        MSR = SSR/1        MSR/MSE
Error       SSE                  n − 2                    MSE = SSE/(n − 2)
Total       SST                  n − 1

H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (a linear relationship does exist between X and Y)

Test statistic: F = MSR/MSE

Rejection rule: reject the null hypothesis if F > F(α, 1, n − 2)
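A sketch that assembles the table and the F test in Python (scipy assumed; house data for concreteness). Note that in simple linear regression, F equals the square of the slope t statistic:

```python
import numpy as np
from scipy import stats

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)       # total variation
SSE = np.sum((y - y_hat) ** 2)          # unexplained variation
SSR = np.sum((y_hat - y.mean()) ** 2)   # explained variation; SST = SSR + SSE

MSR, MSE = SSR / 1, SSE / (n - 2)
F = MSR / MSE
F_crit = stats.f.ppf(0.95, dfn=1, dfd=n - 2)   # alpha = 0.05
print(F, F_crit, F > F_crit)            # reject H0: beta1 = 0 if F > F_crit
```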
Measures of Linear Association
between X and Y
• Coefficient of Determination R²

a) R² = SSR / SSTO = 1 − SSE / SSTO
b) 0 ≤ R² ≤ 1

• Coefficient of Correlation

a) r = ±√R²
b) A plus or minus sign is attached to this measure according to whether the slope of the fitted regression line is positive or negative.
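Finally, a sketch computing both measures on the house data (by my own calculation this gives R² of roughly 0.58; treat the value as a check, not a slide result):

```python
import numpy as np

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SSTO = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
R2 = 1 - SSE / SSTO               # coefficient of determination
r = np.sign(b1) * np.sqrt(R2)     # correlation, sign taken from the slope
print(R2, r)
```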
