Original Title: Example Class One

STAT2301_3600 Linear Statistical Analysis (Semester 1, 2014/2015)

Example Class 1

Notations

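\[
S_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
\]

\[
S_{YY} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2
\]

\[
S_{XY} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}
\]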
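As a quick check on the notation, the sketch below computes the three corrected sums both from their definitions and from the computational (shortcut) forms. The data values and the function names are illustrative assumptions, not part of the handout.

```python
# Illustrative data only (not from the handout).
x = [1.0, 2.0, 4.0, 5.0]
y = [2.0, 3.0, 7.0, 8.0]


def mean(values):
    return sum(values) / len(values)


def s_definition(u, v):
    """Sum of cross-products of deviations from the means."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


def s_shortcut(u, v):
    """Computational form: sum(u_i * v_i) - n * u_bar * v_bar."""
    n = len(u)
    return sum(ui * vi for ui, vi in zip(u, v)) - n * mean(u) * mean(v)


# S_XX and S_YY are the special cases where both arguments coincide.
S_XX = s_definition(x, x)
S_YY = s_definition(y, y)
S_XY = s_definition(x, y)

print(S_XX, S_YY, S_XY)
print(s_shortcut(x, x), s_shortcut(y, y), s_shortcut(x, y))
```

The shortcut forms need only running totals of the data, which is why they are convenient for hand calculation.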

Scatter plot

A graphical approach to display the values of two variables, with the dependent variable on the vertical axis and the independent variable on the horizontal axis.

Sample correlation coefficient r

An indicator to measure the linear association between two variables

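\[
r = \hat{\rho} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_X \hat{\sigma}_Y}
= \frac{\dfrac{S_{XY}}{n-1}}{\sqrt{\dfrac{S_{XX}}{n-1} \cdot \dfrac{S_{YY}}{n-1}}}
= \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}}
\]

$-1 \le r \le 1$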

Scale-independent
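The scale-independence property can be illustrated numerically. The sketch below uses made-up data and a hypothetical helper `corr`; it rescales x by a positive factor and applies a positive linear change of units to y, then recomputes r.

```python
import math


def corr(x, y):
    """Sample correlation coefficient r = S_XY / sqrt(S_XX * S_YY)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative data only (not from the handout).
x = [1.0, 2.0, 4.0, 5.0]
y = [2.0, 3.0, 7.0, 8.0]

r = corr(x, y)

# Change of units: x multiplied by 60, y mapped to 10 + 0.5 * y.
x_rescaled = [60.0 * xi for xi in x]
y_rescaled = [10.0 + 0.5 * yi for yi in y]
r_rescaled = corr(x_rescaled, y_rescaled)

print(r, r_rescaled)
```

The two values agree up to floating-point rounding because the scale factors cancel between the numerator and the denominator. A negative scale factor on one variable would flip the sign of r.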

Simple linear regression studies the linear relationship between an explanatory variable (independent variable / predictor variable / regressor) and a response variable (dependent variable / predicted variable), based on a collected sample of size $n$.

Suppose the sample is observed in the form $(x_i, y_i)$, $i = 1, \dots, n$.

Assumptions of simple linear regression model:

$x_1, \dots, x_n$ are nonrandom constants,

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with $\varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$,

The responses $y_1, \dots, y_n$ are independent.

Fitted value $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

Residual $e_i = y_i - \hat{y}_i$

The least squares estimates (also the MLE) of $\beta_0$ and $\beta_1$ are

\[
\hat{\beta}_1 = b_1 = \frac{S_{XY}}{S_{XX}}
\]

\[
\hat{\beta}_0 = b_0 = \bar{y} - b_1 \bar{x}
\]
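For reference, these estimates can be obtained by minimising the residual sum of squares. The following is a brief sketch of that standard argument; the symbol $Q$ is introduced here only for illustration.

\[
\begin{aligned}
Q(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial Q}{\partial b_0} &= -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
\;\Rightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\
\frac{\partial Q}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0
\;\Rightarrow\; \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} = b_1 \left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right)
\;\Rightarrow\; b_1 = \frac{S_{XY}}{S_{XX}}
\end{aligned}
\]

The second implication uses the expression for $b_0$ from the first equation, and requires $S_{XX} > 0$, that is, the $x_i$ are not all equal.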

$\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased estimators for $\beta_0$ and $\beta_1$ respectively.

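\[
\mathrm{Var}(\hat{\beta}_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{XX}} \right)
\]

\[
\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{XX}}
\]

\[
\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\bar{x}\,\sigma^2}{S_{XX}}
\]

\[
s^2 = \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}
\]

Hence

\[
\text{estimate of } \mathrm{Var}(\hat{\beta}_1) = \frac{s^2}{S_{XX}}
\]

\[
\text{estimate of } \mathrm{Var}(\hat{\beta}_0) = s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{XX}} \right)
\]

\[
\text{estimate of } \mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\bar{x}\, s^2}{S_{XX}}
\]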
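A small simulation can make the unbiasedness and variance results concrete. The sketch below repeatedly generates responses from the model for a fixed design and compares the empirical moments of the estimates with the standard formulas $\sigma^2(1/n + \bar{x}^2/S_{XX})$, $\sigma^2/S_{XX}$ and $-\bar{x}\sigma^2/S_{XX}$. The design points, parameter values, seed and number of replications are all assumptions chosen for illustration.

```python
import random

# Hypothetical design and parameter values, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = len(x)
x_bar = sum(x) / n
S_XX = sum((xi - x_bar) ** 2 for xi in x)


def fit(y):
    """Return the least squares estimates (b0, b1) for responses y."""
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / S_XX
    return y_bar - b1 * x_bar, b1


def sample_mean(v):
    return sum(v) / len(v)


def sample_cov(u, v):
    u_bar, v_bar = sample_mean(u), sample_mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (len(u) - 1)


rng = random.Random(0)  # fixed seed so the run is reproducible
b0s, b1s = [], []
for _ in range(20000):
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    b0, b1 = fit(y)
    b0s.append(b0)
    b1s.append(b1)

# Theoretical values from the variance and covariance formulas.
theory_var_b0 = sigma ** 2 * (1 / n + x_bar ** 2 / S_XX)
theory_var_b1 = sigma ** 2 / S_XX
theory_cov = -x_bar * sigma ** 2 / S_XX

print("mean b0, b1:", sample_mean(b0s), sample_mean(b1s))
print("Var(b0):", sample_cov(b0s, b0s), "theory:", theory_var_b0)
print("Var(b1):", sample_cov(b1s, b1s), "theory:", theory_var_b1)
print("Cov(b0, b1):", sample_cov(b0s, b1s), "theory:", theory_cov)
```

The empirical values should be close to the theoretical ones but will not match exactly, since they are based on a finite number of replications.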

Example 1.1

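Show that when the line $y = \beta x$, which passes through the origin, is fitted to the data $(x_i, y_i)$, $i = 1, \dots, n$, the least squares estimate of $\beta$ is

\[
\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.
\]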
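One possible line of argument, sketched under the assumption that the fitted line is $y = \beta x$ and that the $x_i$ are not all zero, is to minimise the sum of squared deviations directly. The symbol $Q$ is introduced here only for illustration.

\[
\begin{aligned}
Q(\beta) &= \sum_{i=1}^{n} (y_i - \beta x_i)^2 \\
\frac{dQ}{d\beta} &= -2 \sum_{i=1}^{n} x_i (y_i - \beta x_i) = 0
\;\Rightarrow\; \sum_{i=1}^{n} x_i y_i = \beta \sum_{i=1}^{n} x_i^2
\;\Rightarrow\; \hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} \\
\frac{d^2 Q}{d\beta^2} &= 2 \sum_{i=1}^{n} x_i^2 > 0
\end{aligned}
\]

The positive second derivative confirms that the stationary point is a minimum.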

Example 1.2

Consider the following data on the number of hours that 10 persons studied for a French test and

their scores on the test:

Hours studied (x)    4   9  10  14   4   7  12  22   1  17

Test score (y)      31  58  65  73  37  44  60  91  21  84

The scatter plot of test score against hours studied is:

[Figure: scatter plot with test score (y) on the vertical axis (0 to 100) and hours studied (X) on the horizontal axis (0 to 25).]

b) Compute $S_{XX}$, $S_{YY}$ and $S_{XY}$.

c) Estimate the correlation coefficient between hours studied ($x$) and test score ($y$).

d) Write down an appropriate model according to the data given and state all the model assumptions.

e) Fit a regression line of $y$ on $x$.

f) Predict the test score of a person who studied 22.5 hours for the test. Is the prediction reliable?

g) Given: $s^2 = \hat{\sigma}^2 = 27.8848$. Find the standard errors for $b_0$ and $b_1$.
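The numerical parts of this example can be checked with a short script. The sketch below applies the standard simple linear regression formulas to the data in the table; the variable names are chosen here for illustration, and the script is a cross-check rather than a model answer.

```python
import math

# Data from Example 1.2: hours studied (x) and test score (y) for 10 persons.
x = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]
y = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# (b) Corrected sums of squares and cross-products (computational forms).
S_XX = sum(xi * xi for xi in x) - n * x_bar ** 2
S_YY = sum(yi * yi for yi in y) - n * y_bar ** 2
S_XY = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar

# (c) Sample correlation coefficient.
r = S_XY / math.sqrt(S_XX * S_YY)

# (e) Least squares estimates of the slope and intercept.
b1 = S_XY / S_XX
b0 = y_bar - b1 * x_bar

# (f) Point prediction at x = 22.5 hours.
y_pred = b0 + b1 * 22.5

# (g) Residual mean square s^2 = SSE / (n - 2) and the standard errors.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)
se_b1 = math.sqrt(s2 / S_XX)
se_b0 = math.sqrt(s2 * (1 / n + x_bar ** 2 / S_XX))

print("S_XX, S_YY, S_XY:", S_XX, S_YY, S_XY)
print("r:", r)
print("b0, b1:", b0, b1)
print("prediction at 22.5 hours:", y_pred)
print("s^2, se(b0), se(b1):", s2, se_b0, se_b1)
```

The residual mean square computed this way should agree with the value given in part g) up to rounding. For part f), note that 22.5 hours lies just beyond the largest observed value of x (22 hours), so the prediction involves a mild extrapolation of the fitted line.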

Example 1.3

A sample of $n$ boys and $n$ girls is taken from a secondary school and their heights are measured. Let $y_1, y_2, \dots, y_n$ denote the heights of the $n$ girls, and $y_{n+1}, y_{n+2}, \dots, y_{2n}$ those of the $n$ boys, respectively. It is believed that the random quantities $y_i$ satisfy $y_i = \alpha + \beta x_i + \varepsilon_i$, $\varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$, $i = 1, \dots, 2n$, where $\alpha$, $\beta$ are unknown parameters and the covariates $x_i$ are defined by

\[
x_i =
\begin{cases}
-1, & i = 1, 2, \dots, n \\
+1, & i = n+1, \dots, 2n
\end{cases}
\]

Find the least squares estimators of $\alpha$ and $\beta$.
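A possible route to the answer, sketched under the assumption that the covariate is coded $x_i = -1$ for the girls and $x_i = +1$ for the boys, is to evaluate the general formulas $\hat{\beta} = S_{XY}/S_{XX}$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ for this design. The group means $\bar{y}_G$ (girls) and $\bar{y}_B$ (boys) are notation introduced here for illustration.

\[
\begin{aligned}
\bar{x} &= \frac{1}{2n} \left( n(-1) + n(+1) \right) = 0, \qquad
S_{XX} = \sum_{i=1}^{2n} x_i^2 - 2n\bar{x}^2 = 2n \\
S_{XY} &= \sum_{i=1}^{2n} x_i y_i - 2n\bar{x}\bar{y}
= \sum_{i=n+1}^{2n} y_i - \sum_{i=1}^{n} y_i = n(\bar{y}_B - \bar{y}_G) \\
\hat{\beta} &= \frac{S_{XY}}{S_{XX}} = \frac{\bar{y}_B - \bar{y}_G}{2}, \qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \bar{y} = \frac{\bar{y}_G + \bar{y}_B}{2}
\end{aligned}
\]

Under this coding, $\hat{\alpha}$ is the overall mean height and $\hat{\beta}$ is half the difference between the boys' and girls' mean heights.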

