THE UNIVERSITY OF HONG KONG

DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE


STAT2301_3600 Linear Statistical Analysis (Semester 1, 2014/2015)
Example Class 1
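Notations

\[ S_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \]

\[ S_{YY} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 \]

\[ S_{XY} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} \]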
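As a quick numerical check that the deviation form and the shortcut form of each sum agree, here is a minimal Python sketch. The function names and the four data points are hypothetical and chosen only for illustration.

```python
def sums_from_deviations(x, y):
    """Return (S_XX, S_YY, S_XY) using deviations about the sample means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xx, s_yy, s_xy


def sums_from_shortcut(x, y):
    """Return (S_XX, S_YY, S_XY) using the raw sums minus n times the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    s_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    return s_xx, s_yy, s_xy


# Hypothetical data, not taken from the handout.
x = [1.0, 2.0, 4.0, 5.0]
y = [2.0, 3.0, 7.0, 8.0]
dev = sums_from_deviations(x, y)
short = sums_from_shortcut(x, y)
print(dev)
print(short)
```

The shortcut form needs only one pass over the data, but with large values it can lose precision through cancellation, so the deviation form is often the safer choice in floating-point arithmetic.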

Scatter Plot / Scatter Diagram / Scatter Graph

A graphical approach to display the values of two variables.

[Figure: scatter plot with the dependent variable on the vertical axis and the independent variable on the horizontal axis]

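Sample correlation coefficient r

An indicator to measure the linear association between two variables.

\[ r = \hat{\rho} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_X \hat{\sigma}_Y} = \frac{\dfrac{S_{XY}}{n-1}}{\sqrt{\dfrac{S_{XX}}{n-1} \cdot \dfrac{S_{YY}}{n-1}}} = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} \]

$-1 \le r \le 1$
Scale-independent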
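The scale-independence property can be illustrated numerically. The sketch below computes r from the sums defined in the Notations section and then recomputes it after changing the units of both variables. The function name and the data are hypothetical, and the rescaling uses positive scale factors (a negative factor would flip the sign of r).

```python
import math


def sample_correlation(x, y):
    """Sample correlation r = S_XY / sqrt(S_XX * S_YY)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data, not taken from the handout.
x = [1.0, 2.0, 4.0, 5.0]
y = [2.0, 3.0, 7.0, 8.0]
r = sample_correlation(x, y)

# Change of units: multiply x by 100, and rescale and shift y.
x_rescaled = [100.0 * xi for xi in x]
y_rescaled = [0.5 * yi + 3.0 for yi in y]
r_rescaled = sample_correlation(x_rescaled, y_rescaled)

print(r, r_rescaled)
```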

Simple Linear Regression Model

To study the linear relationship between an explanatory variable (independent variable / predictor variable / regressor) and a response variable (dependent variable / predicted variable) based on a sample (of size $n$) collected.

Consider a sample of observations of the form $(x_i, y_i)$, $i = 1, \ldots, n$, where $x$ is the explanatory variable and $y$ is the response variable.

Assumptions of simple linear regression model:
$x_1, \ldots, x_n$ are nonrandom constants,
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with $\varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$,
The responses $y_1, \ldots, y_n$ are independent.

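$\hat{\beta}_0 = b_0$ and $\hat{\beta}_1 = b_1$ are estimators of $\beta_0$ and $\beta_1$.

Fitted value: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
Residual: $e_i = y_i - \hat{y}_i$

The least squares estimates (also the MLE) of $\beta_0$ and $\beta_1$ are

\[ \hat{\beta}_1 = b_1 = \frac{S_{XY}}{S_{XX}} \]

\[ \hat{\beta}_0 = b_0 = \bar{y} - b_1 \bar{x} \]

$\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased estimators for $\beta_0$ and $\beta_1$ respectively.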
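The estimates, fitted values and residuals can be computed directly from these formulas. The following is a minimal Python sketch. The function name `fit_simple_linear_regression` and the data are hypothetical, chosen so that the arithmetic is easy to follow by hand.

```python
def fit_simple_linear_regression(x, y):
    """Return (b0, b1) with b1 = S_XY / S_XX and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, not taken from the handout.
x = [1.0, 2.0, 4.0, 5.0]
y = [2.0, 3.0, 7.0, 8.0]
b0, b1 = fit_simple_linear_regression(x, y)

# Fitted values and residuals, as defined above.
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(b0, b1)
print(residuals)
```

A useful sanity check on any least squares fit with an intercept is that the residuals sum to zero, up to rounding error.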

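\[ \operatorname{Var}(\hat{\beta}_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{XX}} \right) \]

\[ \operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{XX}} \]

\[ \operatorname{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\bar{x}\,\sigma^2}{S_{XX}} \]

Mean square error MSE is the estimate of $\sigma^2$, i.e.

\[ \hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}, \]

Hence

\[ \text{estimate of } \operatorname{Var}(\hat{\beta}_1) = \frac{s^2}{S_{XX}} \]

\[ \text{estimate of } \operatorname{Var}(\hat{\beta}_0) = s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{XX}} \right) \]

\[ \text{estimate of } \operatorname{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\bar{x}\, s^2}{S_{XX}} \]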
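The plug-in estimates can be sketched in Python as follows, using the standard results that the variance of the intercept estimator is sigma^2 (1/n + x_bar^2 / S_XX), that of the slope estimator is sigma^2 / S_XX, and their covariance is -x_bar sigma^2 / S_XX, with sigma^2 replaced by the MSE. The function name, dictionary keys and data are hypothetical. Standard errors are the square roots of the estimated variances.

```python
import math


def variance_estimates(x, y):
    """Estimate sigma^2 by the MSE and plug it into the variance formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals, divided by n - 2 degrees of freedom.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    return {
        "s2": s2,
        "var_b0": s2 * (1 / n + x_bar ** 2 / s_xx),
        "var_b1": s2 / s_xx,
        "cov_b0_b1": -x_bar * s2 / s_xx,
    }


# Hypothetical data, not taken from the handout.
est = variance_estimates([1.0, 2.0, 4.0, 5.0], [2.0, 3.0, 7.0, 8.0])
se_b0 = math.sqrt(est["var_b0"])
se_b1 = math.sqrt(est["var_b1"])
print(est)
print(se_b0, se_b1)
```

Note that the estimated covariance is negative whenever the sample mean of x is positive: overestimating the slope then tends to go with underestimating the intercept.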

Example 1.1

Show that when the line $y = \beta x$, which passes through the origin, is fitted to the data $(x_i, y_i)$, $i = 1, 2, \ldots, n$, the least squares estimate of $\beta$ is

\[ \hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}. \]
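One possible route to this result is to minimize the sum of squared errors directly. The sketch below writes the slope of the line through the origin as beta and introduces the symbol Q for the least squares criterion (both are notational assumptions made here), and it assumes that not all of the x_i are zero.

```latex
\begin{align*}
Q(\beta) &= \sum_{i=1}^{n} (y_i - \beta x_i)^2, \\
\frac{dQ}{d\beta} &= -2 \sum_{i=1}^{n} x_i (y_i - \beta x_i) = 0
  \;\Longrightarrow\; \hat{\beta} \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i, \\
\hat{\beta} &= \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}, \\
\frac{d^2 Q}{d\beta^2} &= 2 \sum_{i=1}^{n} x_i^2 > 0
  \quad \text{(so the stationary point is a minimum).}
\end{align*}
```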

Example 1.2

Consider the following data on the number of hours that 10 persons studied for a French test and
their scores on the test:
Hours studied (x)    Test score (y)
        4                  31
        9                  58
       10                  65
       14                  73
        4                  37
        7                  44
       12                  60
       22                  91
        1                  21
       17                  84
The scatter plot of test score against hours studied is:

[Figure: scatter plot of Test score (Y) against Hours studied (X)]


a) Does a linear relationship appear reasonable?


b) Compute $S_{XX}$, $S_{YY}$ and $S_{XY}$.
c) Estimate the correlation coefficient between hours studied ($x$) and test score ($y$).
d) Write down an appropriate model according to the data given and state all the model assumptions.
e) Fit a regression line of $y$ on $x$.

f) Predict the test score of a person who studied 22.5 hours for the test. Is the prediction reliable?
g) Given: $\hat{\sigma}^2 = s^2 = 27.8848$. Find the standard errors for $b_0$ and $b_1$.

h) Estimate the covariance of $b_0$ and $b_1$.
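The numerical parts of this example can be checked with a short Python sketch that applies the formulas from the earlier sections to the ten data pairs in the table. The variable names are hypothetical. It is a computational aid, not a model answer: parts (a) and (d) and the reliability question in (f) still call for a written argument. For (f), note that 22.5 hours lies just outside the observed range of x (1 to 22 hours), so the prediction involves extrapolation.

```python
import math

# Data from the table in Example 1.2.
hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]
scores = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# (b) Sums of squares and cross-products via the shortcut formulas.
s_xx = sum(x * x for x in hours) - n * x_bar ** 2                   # 376
s_yy = sum(y * y for y in scores) - n * y_bar ** 2                  # 4752.4
s_xy = sum(x * y for x, y in zip(hours, scores)) - n * x_bar * y_bar  # 1305

# (c) Sample correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# (e) Least squares estimates.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# (f) Point prediction at x = 22.5 hours.
prediction = b0 + b1 * 22.5

# (g) MSE computed from the residuals, then the standard errors.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, scores))
s2 = sse / (n - 2)
se_b0 = math.sqrt(s2 * (1 / n + x_bar ** 2 / s_xx))
se_b1 = math.sqrt(s2 / s_xx)

# (h) Estimated covariance of b0 and b1.
cov_b0_b1 = -x_bar * s2 / s_xx

print(s_xx, s_yy, s_xy)
print(r, b0, b1, prediction)
print(s2, se_b0, se_b1, cov_b0_b1)
```

The value of s2 computed from the residuals agrees with the 27.8848 given in part (g) to four decimal places.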

Example 1.3
A sample of $n$ boys and $n$ girls is taken from a secondary school and their heights are measured. Let $y_1, y_2, \ldots, y_n$ denote the heights of the $n$ girls, and $y_{n+1}, y_{n+2}, \ldots, y_{2n}$ those of the $n$ boys, respectively. It is believed that the random quantities $y_i$ satisfy $y_i = \alpha + \beta x_i + \varepsilon_i$, $\varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$, $i = 1, \ldots, 2n$, where $\alpha$, $\beta$ are unknown parameters and the covariates $x_i$ are defined by

\[ x_i = \begin{cases} -1, & i = 1, 2, \ldots, n \\ +1, & i = n+1, \ldots, 2n \end{cases} \]

Find the least squares estimators of $\alpha$ and $\beta$.
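A numerical check can help confirm the algebra here. Write the intercept and slope as alpha and beta (symbol names assumed for this sketch), with the covariate equal to -1 for girls and +1 for boys. Then the mean of the covariate is 0 and S_XX = 2n, so the general formulas from the Simple Linear Regression section reduce to alpha_hat = (girls' mean + boys' mean) / 2 and beta_hat = (boys' mean - girls' mean) / 2. The heights below are hypothetical values used only to compare the two computations.

```python
# Hypothetical heights in cm, not taken from the handout.
girls = [152.0, 158.0, 160.0]
boys = [165.0, 170.0, 172.0]

n = len(girls)
y = girls + boys
x = [-1.0] * n + [1.0] * n  # covariate: -1 for girls, +1 for boys

# General least squares formulas: slope = S_XY / S_XX, intercept = y_bar - slope * x_bar.
x_bar = sum(x) / (2 * n)
y_bar = sum(y) / (2 * n)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Closed forms implied by x_bar = 0 and S_XX = 2n.
girls_mean = sum(girls) / n
boys_mean = sum(boys) / n
alpha_closed = (girls_mean + boys_mean) / 2
beta_closed = (boys_mean - girls_mean) / 2

print(alpha_hat, alpha_closed)
print(beta_hat, beta_closed)
```

Under this coding the intercept estimates the overall mean height and the slope estimates half the difference between the boys' and girls' mean heights.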