
2016S_STAT5102_L1 | Department of Statistics, The Chinese University of Hong Kong

I. Simple Linear Regression

In this chapter, we shall cover

• Relations between two variables
• Scatter plots
• Univariate regression model
• Assumptions
• Estimation and method of least squares
• Inferences concerning β1
• Inferences concerning β0
• Estimation of the mean of the response variable for a given level of X
• Prediction of a new observation
• Analysis of variance approach to regression analysis
• Measures of linear association between X and Y

Relations between variables

• Functional relation between two variables
  Independent (predictor, explanatory) variable
  Dependent (response) variable

• Statistical relation between two variables
Scatter plots
• A scatter plot (or scatter diagram) can be used to show the relationship between two variables.

[Figure: scatter plots illustrating linear and curvilinear relationships]
Types of Relations

[Figure: scatter plots illustrating strong and weak relationships]
Types of Relations (Cont’d)

[Figure: scatter plot illustrating no relationship]
Historical Origins of Regression Models

• Regression was first introduced by Sir Francis Galton in the late 19th century.

• Galton had studied the relation between the heights of parents and children and noted that the heights of children of both tall and short parents appeared to "revert" or "regress" to the mean of the group.

[Photo: Sir Francis Galton, geographer, meteorologist, tropical explorer, founder of differential psychology, inventor of fingerprint identification, pioneer of statistical correlation and regression …]
Basic concepts
• A tendency of the response variable Y to vary with the predictor variable X in a systematic fashion.

• A scattering of points around the curve of statistical relationship.

• There is a probability distribution of Y for each level of X.

• The means of these probability distributions vary in some systematic fashion with X.
Simple Linear Regression Models

Yi = β0 + β1Xi + εi

  Yi = dependent (response) variable
  Xi = independent (predictor) variable
  β0 = Y-intercept
  β1 = slope coefficient
  εi = random error term

β0 + β1Xi is the linear component; εi is the random error component.

Assumptions:
• E(εi) = 0
• Var(εi) = σ²
• Cov(εi, εj) = 0 for i ≠ j
Model Assumptions

• Normality of Error
  Error values (ε) are normally distributed for any given value of X.
• Homoscedasticity
  The probability distribution of the errors has constant variance.
• Independence of Errors
  Error values are statistically independent.

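To make the assumptions concrete, here is a minimal Python sketch that generates data satisfying the normal error model. The parameter values (β0 = 10, β1 = 2, σ = 3) are illustrative choices, not values from these slides:

```python
import numpy as np

# Illustrative (hypothetical) parameter values, not from the slides
beta0, beta1, sigma, n = 10.0, 2.0, 3.0, 50

rng = np.random.default_rng(0)
x = rng.uniform(0, 20, size=n)                   # fixed-design X, treated as known constants
eps = rng.normal(loc=0.0, scale=sigma, size=n)   # E(eps)=0, Var(eps)=sigma^2, independent draws
y = beta0 + beta1 * x + eps                      # Y_i = beta0 + beta1*X_i + eps_i
```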
Simple Linear Regression Equation

The simple linear regression equation provides an estimate of the population regression line:

Ŷi = b0 + b1Xi

where Ŷi is the estimated (or predicted) value of Y for observation i, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and Xi is the value of X for observation i.
Slopes and the Intercept

• b0 is the estimated average value of Y when the value of X is zero.

• b1 is the estimated change in the average value of Y as a result of a one-unit change in X.
Example

• A real estate agent wishes to examine the relation between the selling price of a home and its size (measured in square feet).

• A random sample of 10 houses is selected.
  Dependent variable (Y) = house price in $'000s
  Independent variable (X) = square feet
Sample Data for House Price Model
House Price in $'000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
Graphical Presentation

House price model: scatter plot

[Figure: scatter plot of house price ($1000s) against square feet]
Graphical Presentation

House price model: scatter plot and regression line

[Figure: scatter plot of house price ($1000s) against square feet with the fitted regression line; intercept = 98.248, slope = 0.10977]

house price = 98.24833 + 0.10977 (square feet)
Interpretation of the Intercept b0

house price = 98.24833 + 0.10977 (square feet)

• b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values).

• Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet.
Interpretation of the Slope Coefficient b1

house price = 98.24833 + 0.10977 (square feet)

• b1 measures the estimated change in the average value of Y as a result of a one-unit change in X.

• Here, b1 = 0.10977 tells us that the average value of a house increases by 0.10977 × ($1000) = $109.77 for each additional square foot of size.
Predictions using the Fitted Model
To predict the price for a house with 2,000 square feet, we write:

house price = 98.25 + 0.1098 (sq. ft.)
            = 98.25 + 0.1098 × (2000)
            = 317.85

The predicted price for a house with 2,000 square feet is 317.85 ($1,000s) = $317,850.
Interpolation vs. Extrapolation
• When using a regression model for prediction, only predict within the relevant range of the data (the relevant range for interpolation).

[Figure: scatter plot of house price ($1000s) against square feet, marking the relevant range for interpolation; do not try to extrapolate beyond the range of observed X's]
Estimation (Method of Least Squares)

• b0 and b1 are obtained by choosing the values that minimise the sum of the squared differences between Yi and Ŷi:

min Σ (Yi − Ŷi)² = min Σ (Yi − (b0 + b1Xi))²,

where the minimum is taken over b0 and b1.
Estimation (Method of Least Squares)

• In fact, b0 and b1 have nice closed-form solutions:

b1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²   (all sums over i = 1, …, n)

b0 = Ȳ − b1 X̄
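As a check on these formulas, the following Python sketch (numpy assumed) applies the closed-form solutions to the 10-house sample from earlier in the chapter; it reproduces the fitted line house price = 98.24833 + 0.10977 (square feet):

```python
import numpy as np

# House-price sample from the earlier slides: Y in $'000s, X in square feet
x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))

b1 = Sxy / Sxx          # ~= 0.10977
b0 = ybar - b1 * xbar   # ~= 98.248

# Prediction at X_h = 2000 square feet (~= 317.78; the slides round b1
# to 0.1098 and report 317.85)
y_hat_2000 = b0 + b1 * 2000
print(b0, b1, y_hat_2000)
```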
Estimation of the error term variance σ²

• The estimator of σ² is

S² = MSE = SSE / (n − 2) = Σ (Yi − Ŷi)² / (n − 2) = Σ ei² / (n − 2)

• Note that S² is an unbiased estimator of σ².
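Continuing the same sketch, S² = MSE is computed from the residuals of the fitted line (numpy assumed; the house data are reused for concreteness):

```python
import numpy as np

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

e = y - (b0 + b1 * x)    # residuals e_i = Y_i - Yhat_i
SSE = np.sum(e ** 2)
MSE = SSE / (n - 2)      # S^2: unbiased estimator of sigma^2 (df = n - 2)
s = np.sqrt(MSE)
print(SSE, MSE, s)
```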
Method of Maximum Likelihood
Under the normal error model:

• MLE of b1 = LSE of b1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²   (unbiased)
• MLE of b0 = LSE of b0 = Ȳ − b1 X̄   (unbiased)
• The MLE of σ² is SSE / n = ((n − 2)/n) S²
• MLE of σ² < unbiased estimator of σ² [biased downward, but asymptotically unbiased]
Example

The director of admissions of a small college selected 120 students at random from the new freshman class in a study to determine whether a student's grade point average (GPA) at the end of the freshman year (Y) can be predicted from the ACT test score (X). The data are given on the disk.
Regression output from SAS

Parameter Estimates

Variable   Label           DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept  Intercept       1   2.11405             0.32089         6.59     <.0001
ACT        ACT test score  1   0.03883             0.01277         3.04     0.0029

b0 = 2.11405, b1 = 0.03883, Ŷ = 2.11405 + 0.03883X
A Scatter Plot with Regression Line

[Figure: scatter plot of freshman GPA against ACT score with the fitted regression line]
The Slope

• Obtain a point estimate of the mean freshman GPA for students with ACT test score X = 30.

  Answer: 3.27895

• What is the point estimate of the change in the mean response when the entrance test score increases by one point?

  Answer: 0.03883
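Both answers follow directly from the fitted equation; a one-line check in Python, using only the SAS estimates above:

```python
b0, b1 = 2.11405, 0.03883   # SAS parameter estimates from the previous slide
print(b0 + b1 * 30)         # mean freshman GPA at ACT = 30 -> 3.27895
print(b1)                   # change in mean response per extra ACT point -> 0.03883
```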
More on Inference

Yi = β0 + β1Xi + εi

Assumptions:

• Xi are known constants, i = 1, …, n (fixed design)
• εi ~ N(0, σ²), i.i.d.

Therefore, Yi ~ N(β0 + β1Xi, σ²), independent.
Distribution of b1

b1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)² = Σ ki Yi,

where ki = (Xi − X̄) / Σ (Xj − X̄)² and all sums run over the n observations.
Distribution of b1

Hence, we can deduce that

b1 ~ N(β1, σ²/Sxx), where Sxx = Σ (Xi − X̄)².

It follows that

(b1 − β1) / s{b1} ~ t(n − 2), with s²{b1} = MSE / Sxx.

Proof will be discussed in class.
Testing (Two-sided test of β1)

H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (a linear relationship exists between X and Y)

Test statistic: T = b1 / s{b1}

Decision rule: Reject H0 if |T| > t(α/2, n − 2)
Two-sided Confidence Intervals for β1

Two-sided 100(1 − α)% C.I. for β1:

(b1 − t(α/2, n − 2) s{b1},  b1 + t(α/2, n − 2) s{b1})
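A sketch of the test and the confidence interval in Python (scipy assumed; the house data from earlier stand in for a generic sample):

```python
import numpy as np
from scipy import stats

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
MSE = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)
s_b1 = np.sqrt(MSE / Sxx)                       # s{b1}

T = b1 / s_b1                                   # test statistic for H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(T), df=n - 2)      # two-sided p-value

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)   # 95% C.I. for beta1
print(T, p_value, ci)
```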
One-sided Test for β1

H0: β1 ≤ k
H1: β1 > k

Test statistic: T = (b1 − k) / s{b1}

Decision rule: Reject H0 if T > t(α, n − 2)
One-sided Tests for β1

H0: β1 ≥ k
H1: β1 < k

Test statistic: T = (b1 − k) / s{b1}

Decision rule: Reject H0 if T < −t(α, n − 2)
Distribution of b0

Recall that b0 = Ȳ − b1 X̄.

Hence, we can write

b0 ~ N(β0, σ² [1/n + X̄² / Σ (Xi − X̄)²])

and

(b0 − β0) / s{b0} ~ t(n − 2).

Proof will be discussed in class.
Two-sided Tests for β0

H0: β0 = k
H1: β0 ≠ k

Test statistic: T = (b0 − k) / s{b0}

Decision rule: Reject H0 if |T| > t(α/2, n − 2)
Two-sided Confidence Interval for β0

Two-sided 100(1 − α)% C.I. for β0:

(b0 − t(α/2, n − 2) s{b0},  b0 + t(α/2, n − 2) s{b0})
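The analogous computation for β0 uses s²{b0} = MSE (1/n + X̄²/Sxx); a sketch on the same house data:

```python
import numpy as np
from scipy import stats

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
MSE = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

s_b0 = np.sqrt(MSE * (1.0 / n + x.mean() ** 2 / Sxx))   # s{b0}
t_crit = stats.t.ppf(0.975, df=n - 2)
print(b0 - t_crit * s_b0, b0 + t_crit * s_b0)           # 95% C.I. for beta0
```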
One-sided Tests for β0

H0: β0 ≤ k
H1: β0 > k

Test statistic: T = (b0 − k) / s{b0}

Decision rule: Reject H0 if T > t(α, n − 2)
One-sided Test for β0

H0: β0 ≥ k
H1: β0 < k

Test statistic: T = (b0 − k) / s{b0}

Decision rule: Reject H0 if T < −t(α, n − 2)
Estimation of the mean of the response
variable for a given level of X
Example
Suppose you want to develop a model to predict selling
price of homes based on assessed value. A sample of 30
recently sold single-family houses in a small city is
selected to study the relationship between selling price
(Y, in $000) and assessed value (X, in $000). The houses
in the city had been reassessed at full value 1 year prior
to the study.

• Estimate the average selling price for houses with an assessed value of $70,000.
Estimation of the mean of the response
variable for a given level of X
Let Xh denote the level of X for which we wish to estimate the mean response. (Note: given Xh, the mean response under the model is E(Yh) = β0 + β1Xh.)

The point estimator is Ŷh = b0 + b1Xh.

Distribution of E(Yh) − Ŷh:

E(Yh) − Ŷh ~ N(0, σ² [1/n + (Xh − X̄)² / Σ (Xi − X̄)²])
Confidence Interval for E(Yh)

Two-sided 100(1 − α)% C.I. for E(Yh):

Ŷh ± t(α/2, n − 2) · s · √( 1/n + (Xh − X̄)² / Σ (Xi − X̄)² )
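A sketch of this interval at a given Xh (house data again, with Xh = 2000 square feet as an illustrative level):

```python
import numpy as np
from scipy import stats

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

x_h = 2000.0
y_hat_h = b0 + b1 * x_h
se_mean = s * np.sqrt(1.0 / n + (x_h - x.mean()) ** 2 / Sxx)   # s{Yhat_h}
t_crit = stats.t.ppf(0.975, df=n - 2)
print(y_hat_h - t_crit * se_mean, y_hat_h + t_crit * se_mean)  # 95% C.I. for E(Y_h)
```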
Prediction of a new observation Yh(new)

Example
Suppose you want to develop a model to predict selling
price of homes based on assessed value. A sample of 30
recently sold single-family houses in a small city is
selected to study the relationship between selling price (Y,
in $000) and assessed value (X, in $000). The houses in the
city had been reassessed at full value 1 year prior to the
study.

• Estimate the selling price of an individual house with an assessed value of $70,000.
Prediction of a new observation Yh(new)

We predict Yh(new), the new observation corresponding to a given level Xh of the predictor variable, by Ŷh:

Ŷh = b0 + b1Xh

Distribution of Yh(new) − Ŷh:

Yh(new) − Ŷh ~ N(0, σ² [1 + 1/n + (Xh − X̄)² / Σ (Xi − X̄)²])
Confidence Interval for Yh(new)

Two-sided 100(1 − α)% C.I. for Yh(new):

Ŷh ± t(α/2, n − 2) · s · √( 1 + 1/n + (Xh − X̄)² / Σ (Xi − X̄)² )
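The prediction interval differs from the mean-response interval only through the extra "1 +" inside the square root, which accounts for the variability of the new observation itself. A sketch, same setup as before:

```python
import numpy as np
from scipy import stats

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

x_h = 2000.0
y_hat_h = b0 + b1 * x_h
# extra "1 +" relative to the mean-response interval
se_pred = s * np.sqrt(1.0 + 1.0 / n + (x_h - x.mean()) ** 2 / Sxx)
t_crit = stats.t.ppf(0.975, df=n - 2)
print(y_hat_h - t_crit * se_pred, y_hat_h + t_crit * se_pred)   # 95% interval for Y_h(new)
```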
ANOVA and Regression Analysis

• Partitioning of Total Sum of Squares
• Mean Squares
• Analysis of Variance (ANOVA) Table
ANOVA and Regression

• The difference between the observed value of the response variable and the mean value of the response variable is called the total deviation and is equal to Yi − Ȳ.

• The difference between the predicted value of the response variable and the mean value of the response variable is called the explained deviation and is equal to Ŷi − Ȳ.
ANOVA and Regression

• The difference between the observed value of the response variable and the predicted value of the response variable is called the unexplained deviation and is equal to Yi − Ŷi.
ANOVA and Regression

Total Variation = Unexplained Variation + Explained Variation

Σ (Yi − Ȳ)² = Σ (Yi − Ŷi)² + Σ (Ŷi − Ȳ)², i.e., SSTO = SSE + SSR.
ANOVA and Regression

Total Variation = Unexplained Variation + Explained Variation

1 = Unexplained Variation / Total Variation + Explained Variation / Total Variation

Explained Variation / Total Variation = 1 − Unexplained Variation / Total Variation
ANOVA and Regression

ANOVA Table

Source      Sum of Squares (SS)  Degrees of freedom (df)  Mean squares (MS)  F
Regression  SSR                  1                        MSR = SSR/1        MSR/MSE
Error       SSE                  n − 2                    MSE = SSE/(n − 2)
Total       SST                  n − 1

H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (a linear relationship does exist between X and Y)

Test statistic: F = MSR/MSE

Rejection rule: reject the null hypothesis if F > F(α, 1, n − 2)
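A sketch that assembles the table and the F test in Python (scipy assumed; house data for concreteness). Note that in simple linear regression, F equals the square of the slope t statistic:

```python
import numpy as np
from scipy import stats

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)       # total variation
SSE = np.sum((y - y_hat) ** 2)          # unexplained variation
SSR = np.sum((y_hat - y.mean()) ** 2)   # explained variation; SST = SSR + SSE

MSR, MSE = SSR / 1, SSE / (n - 2)
F = MSR / MSE
F_crit = stats.f.ppf(0.95, dfn=1, dfd=n - 2)   # alpha = 0.05
print(F, F_crit, F > F_crit)            # reject H0: beta1 = 0 if F > F_crit
```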
Measures of Linear Association
between X and Y
• Coefficient of Determination R²

a) R² = SSR / SSTO = 1 − SSE / SSTO
b) 0 ≤ R² ≤ 1

• Coefficient of Correlation

a) r = ±√R²
b) A plus or minus sign is attached to this measure according to whether the slope of the fitted regression line is positive or negative.
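Finally, a sketch computing both measures on the house data (by my own calculation this gives R² of roughly 0.58; treat the value as a check, not a slide result):

```python
import numpy as np

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700], float)
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255], float)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SSTO = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
R2 = 1 - SSE / SSTO               # coefficient of determination
r = np.sign(b1) * np.sqrt(R2)     # correlation, sign taken from the slope
print(R2, r)
```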
