(Figure: scatter plot with weight (kg) on the vertical axis, showing a fitted line through the observed points.)

Linear regression is the process of finding the equation of the line that is closest to all points observed.
Example 1:
A patient is given a drip feed containing a particular chemical and its concentration in his blood is
measured, in suitable units, at one hour intervals. The doctors believe that a linear relationship will
exist between the variables.
Time, x (hours)  | 0   | 1   | 2   | 3   | 4   | 5    | 6
Concentration, y | 2.4 | 4.3 | 5.0 | 6.9 | 9.1 | 11.4 | 13.5
There is a formula which gives the equation of the line of best fit.
** The statistical equation of the simple linear regression line, when only the response variable Y is random, is:

    Y = β₀ + β₁x + ε   (or in terms of each point: Yᵢ = β₀ + β₁xᵢ + εᵢ)

Here β₀ is called the intercept, β₁ the regression slope, ε is the random error with mean 0, x is the regressor (independent variable), and Y the response variable (dependent variable).
** The least squares regression line is obtained by finding the values of β₀ and β₁ (denoted in the solutions as β̂₀ and β̂₁) that will minimize the sum of the squared vertical distances from all points to the line:

    d = Σᵢ (yᵢ − ŷᵢ)² = Σᵢ (yᵢ − β₀ − β₁xᵢ)²

The solutions are found by solving the equations: ∂d/∂β₀ = 0 and ∂d/∂β₁ = 0.
** The equation of the fitted least squares regression line is Ŷ = β̂₀ + β̂₁x (or in terms of each point: Ŷᵢ = β̂₀ + β̂₁xᵢ). ----- For simplicity of notation, many books denote the fitted regression equation as Ŷ = b₀ + b₁x (* you can see that for some examples, we will use this simpler notation.)

where

    β̂₁ = Sxy/Sxx   and   β̂₀ = ȳ − β̂₁x̄ .

Notations:

    Sxy = Σxy − (Σx)(Σy)/n = Σ(xᵢ − x̄)(yᵢ − ȳ) ;   Sxx = Σx² − (Σx)²/n = Σ(xᵢ − x̄)²

x̄ and ȳ are the mean values of x and y respectively.
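To make the Sxy/Sxx formulas concrete, here is a minimal Python sketch (a helper of our own, not part of the notes) that fits the line for any paired data:

```python
def least_squares_line(x, y):
    """Fit y = b0 + b1*x by least squares using the Sxy/Sxx formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sxy = sum_xy - sum_x * sum_y / n      # Sxy = Σxy − (Σx)(Σy)/n
    sxx = sum_x2 - sum_x ** 2 / n         # Sxx = Σx² − (Σx)²/n
    b1 = sxy / sxx                        # slope: β̂₁ = Sxy/Sxx
    b0 = sum_y / n - b1 * sum_x / n       # intercept: β̂₀ = ȳ − β̂₁x̄
    return b0, b1
```

On data that lie exactly on a line, the fit recovers that line; on noisy data it returns the least squares coefficients.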
Note 1: Please notice that in finding the least squares regression line, we do not need to assume any distribution for the random errors εᵢ. However, for statistical inference on the model parameters (β₀ and β₁), it is assumed in our class that the errors have the following three properties:
• Normally distributed errors
• Homoscedasticity (constant error variance, Var(εᵢ) = σ², for Y at all levels of X)
• Independent errors (usually checked when data are collected over time or space)

***The above three properties can be summarized as: εᵢ ~ i.i.d. N(0, σ²), i = 1, …, n
Note 2: Please notice that least squares regression is only suitable when the random errors exist in the dependent variable Y only. If the regressor X is also random, the model is then referred to as the Errors in Variables (EIV) regression. One can find a good summary of EIV regression in Section 12.2 of the book "Statistical Inference" (2nd edition) by George Casella and Roger Berger.
For the drip-feed data, n = 7, Σx = 21, Σy = 52.6, Σxy = 209.4, Σx² = 91, x̄ = 3, ȳ = 7.514.

    Sxy = Σxy − (Σx)(Σy)/n = 209.4 − (21)(52.6)/7 = 51.6

    Sxx = Σx² − (Σx)²/n = 91 − 21²/7 = 28

So

    β̂₁ = Sxy/Sxx = 51.6/28 = 1.843   and   β̂₀ = ȳ − β̂₁x̄ = 7.514 − 1.843 × 3 = 1.985 .
So the equation of the regression line is ŷ = 1.985 + 1.843x.
To work out the concentration after 3.5 hours: ŷ = 1.985 + 1.843 × 3.5 = 8.44 (3sf)
If you want to find how long it would be before the concentration reaches 8 units, we substitute ŷ = 8
into the regression equation:
8 = 1.985 + 1.843x
Solving this we get: x = 3.26 hours
Note: It would not be sensible to predict the concentration after 8 hours from this equation – we don’t
know whether the relationship will continue to be linear. The process of trying to predict a value from
outside the range of your data is called extrapolation.
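The Example 1 arithmetic can be checked directly in Python (a verification sketch; the variable names are ours):

```python
x = [0, 1, 2, 3, 4, 5, 6]                  # time (hours)
y = [2.4, 4.3, 5.0, 6.9, 9.1, 11.4, 13.5]  # concentration
n = len(x)
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n  # 51.6
sxx = sum(a * a for a in x) - sum(x) ** 2 / n                 # 28
b1 = sxy / sxx                       # slope ≈ 1.843
b0 = sum(y) / n - b1 * sum(x) / n    # intercept ≈ 1.986 (the notes' 1.985 uses pre-rounded values)
pred_35 = b0 + b1 * 3.5              # concentration after 3.5 hours ≈ 8.44
x_for_8 = (8 - b0) / b1              # time to reach 8 units ≈ 3.26 hours
```

The same code extrapolated to x = 8 would happily return a number, which is exactly why the extrapolation warning above matters: the formula has no knowledge of whether linearity continues.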
Example 2:
The heights and weights of a sample of 11 students are:
Height, h (m)  | 1.36 | 1.47 | 1.54 | 1.56 | 1.59 | 1.63 | 1.66 | 1.67 | 1.69 | 1.74 | 1.81
Weight, w (kg) | 52   | 50   | 67   | 62   | 69   | 74   | 59   | 87   | 77   | 73   | 67

[ n = 11, Σh = 17.72, Σh² = 28.705, Σw = 737, Σw² = 50571, Σhw = 1196.1 ]
a) Calculate the regression line of w on h.
b) Use the regression line to estimate the weight of someone whose height is 1.6m.
Note: Both height and weight are referred to as random variables – their values could not have been
predicted before the data were collected. If the sampling were repeated again, different values would
be obtained for the heights and weights.
Solution:
a) We begin by finding the mean of each variable:

    h̄ = Σh/n = 17.72/11 = 1.6109…

    w̄ = Σw/n = 737/11 = 67

Next we find the sums of squares:

    Shh = Σh² − (Σh)²/n = 28.705 − 17.72²/11 = 0.1597

    Sww = Σw² − (Σw)²/n = 50571 − 737²/11 = 1192

    Shw = Σhw − (Σh)(Σw)/n = 1196.1 − (17.72)(737)/11 = 8.86

The equation of the least squares regression line is

    ŵ = β̂₀ + β̂₁h

where

    β̂₁ = Shw/Shh = 8.86/0.1597 = 55.5

and

    β̂₀ = w̄ − β̂₁h̄ = 67 − 55.5 × 1.6109 = -22.4

So the equation of the regression line of w on h is:

ŵ = -22.4 + 55.5h

b) When h = 1.6, ŵ = -22.4 + 55.5 × 1.6 = 66.4, so the estimated weight is approximately 66.4 kg.
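Both parts can likewise be verified in Python from the summary statistics given for this example (a sketch; variable names are ours):

```python
# Summary statistics for the height/weight sample
n, sum_h, sum_h2 = 11, 17.72, 28.705
sum_w, sum_hw = 737, 1196.1
shh = sum_h2 - sum_h ** 2 / n      # Shh ≈ 0.1597
shw = sum_hw - sum_h * sum_w / n   # Shw ≈ 8.86
b1 = shw / shh                     # slope ≈ 55.5
b0 = sum_w / n - b1 * sum_h / n    # intercept ≈ -22.4
w_at_160 = b0 + b1 * 1.6           # part b): estimated weight ≈ 66.4 kg
```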
Error sum of squares (Unexplained Variation):

    SSE = Σ(Yᵢ − Ŷᵢ)²
Coefficients of Determination and Correlation
The sample correlation is denoted by r and is closely related to the coefficient of determination as follows:

    r = sign(β̂₁)·√R² ,   −1 ≤ r ≤ 1

    r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²] = Sxy/√(Sxx·Syy) = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}
The corresponding population correlation between Y and X is denoted by ρ and defined by:

    ρ = Cov(X, Y) / (σ_X · σ_Y)

Therefore one can see that in the population correlation definition, both X and Y are assumed to be random. When the joint distribution of X and Y is bivariate normal, one can perform the following t-test to test whether the population correlation is zero:
• Hypotheses
H0: ρ = 0 (no correlation)
HA: ρ ≠ 0 (correlation exists)
• Test statistic

    t₀ = r / √[(1 − r²)/(n − 2)]  ~  t(n−2)  under H₀
Note: One can show that this t-test is indeed the same as the t-test for testing the regression slope (H₀: β₁ = 0) shown in the following section.
Note: The sample correlation is not an unbiased estimator of the population correlation. You can
study this and other properties from the wiki site:
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
Example 3: The following example tabulates the relation between trunk diameter and tree height.

Tree Height, y | Trunk Diameter, x | xy         | y²          | x²
35             | 8                 | 280        | 1225        | 64
49             | 9                 | 441        | 2401        | 81
27             | 7                 | 189        | 729         | 49
33             | 6                 | 198        | 1089        | 36
60             | 13                | 780        | 3600        | 169
21             | 7                 | 147        | 441         | 49
45             | 11                | 495        | 2025        | 121
51             | 12                | 612        | 2601        | 144
Σy = 321       | Σx = 73           | Σxy = 3142 | Σy² = 14111 | Σx² = 713
Scatter plot: (figure showing tree height, y, plotted against trunk diameter, x)
    r = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}
      = [8(3142) − (73)(321)] / √{[8(713) − (73)²][8(14111) − (321)²]}
      = 0.886
r = 0.886 → relatively strong positive linear association between x and y

    t₀ = r / √[(1 − r²)/(n − 2)] = 0.886 / √[(1 − 0.886²)/(8 − 2)] = 4.68

At the significance level α = 0.05, we reject the null hypothesis because |t₀| = 4.68 > t(6, 0.025) = 2.447, and conclude that there is a linear relationship between tree height and trunk diameter.
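The calculation of r and the t statistic can be reproduced in Python (a verification sketch; with unrounded intermediate values t₀ comes out near 4.69, while the 4.68 above uses r rounded to 0.886):

```python
import math

# Column totals from the tree height / trunk diameter table
n = 8
sum_x, sum_y = 73, 321
sum_xy, sum_x2, sum_y2 = 3142, 713, 14111

num = n * sum_xy - sum_x * sum_y
den = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = num / den                                # sample correlation ≈ 0.886
t0 = r / math.sqrt((1 - r ** 2) / (n - 2))   # test statistic for H0: ρ = 0
```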
data tree;
   input height trunk;
   datalines;
35 8
49 9
27 7
33 6
60 13
21 7
45 11
51 12
;
run;

proc corr data=tree;   /* computes the Pearson correlation between height and trunk */
   var height trunk;
run;
Note: See the following website for more examples and interpretations of the output – plus how
to draw the scatter plot (proc Gplot) in SAS: http://www.ats.ucla.edu/stat/sas/output/corr.htm
The estimate of the error standard deviation is

    σ̂ = s = √[SSE/(n − 2)]
Simple Linear Regression:
3. Inferences Concerning the Slope
Objectives: measures of variation, the goodness-of-fit measure, and the correlation coefficient

t-test
A test used to determine whether the population slope parameter (β₁) is equal to a pre-determined value (often, but not necessarily, 0). Tests can be one-sided (pre-determined direction) or two-sided (either direction).
2-sided t-test:
• Test statistic:

    t₀ = (b₁ − β₁⁽⁰⁾) / s_b₁

where β₁⁽⁰⁾ is the hypothesized value of the slope and

    s_b₁ = sε/√Sxx = sε/√[Σ(x − x̄)²] = sε/√[Σx² − (Σx)²/n]
F-test
A test based directly on sums of squares that tests the specific hypothesis of whether the slope parameter is 0 (2-sided). The book describes the general case of k predictor variables; for simple linear regression, k = 1.

    H₀: β₁ = 0    H_A: β₁ ≠ 0

    Test statistic:  F_obs = MSR/MSE = (SSR/k) / (SSE/(n − k − 1))

    Rejection region:  F_obs ≥ F(α; k, n−k−1)
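A small self-contained sketch of the F computation (the data here are hypothetical, invented only to illustrate the steps; for k = 1 the F statistic equals the square of the slope's t statistic, which the last lines check):

```python
# Hypothetical small dataset, roughly y ≈ 2x
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, k = len(x), 1

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x)
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

sst = sum((b - ybar) ** 2 for b in y)                          # total SS
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))      # error SS
ssr = sst - sse                                                # regression SS
f_obs = (ssr / k) / (sse / (n - k - 1))                        # F statistic

s2 = sse / (n - k - 1)                       # MSE = sε²
t_slope = b1 / (s2 / sxx) ** 0.5             # slope t statistic
# For simple linear regression, f_obs == t_slope**2
```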
Confidence interval for β₁:
• If the entire interval is positive, conclude β₁ > 0 (positive association)
• If the interval contains 0, do not reject β₁ = 0 (no association)
• If the entire interval is negative, conclude β₁ < 0 (negative association)
Example 4: A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet). A random sample of 10 houses is selected.
– Dependent variable (y) = house price in $1000s
– Independent variable (x) = square feet
House Price in $1000s (y) Square Feet (x)
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
ANOVA
             df   SS           MS           F         Significance F
Regression    1   18934.9348   18934.9348   11.0848   0.01039
Residual      8   13665.5652    1708.1957
Total         9   32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet    0.10977        0.03297         3.32938   0.01039     0.03374     0.18580
The estimated regression equation is

    estimated house price = 98.24833 + 0.10977 × (square feet)

• b₁ measures the estimated change in the average value of Y as a result of a one-unit change in X
  – Here, b₁ = 0.10977 tells us that the average value of a house increases by 0.10977 × ($1000) = $109.77, on average, for each additional square foot of size
    R² = SSR/SST = 18934.9348/32600.5000 = 0.58082

This means that 58.08% of the variation in house prices is explained by variation in square feet.
    sε = 41.33032

The standard error (the estimated standard deviation of the random error) is also given in the output above.

Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.74 and $185.80 per square foot of house size.
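The full Example 4 fit can be reproduced from the raw data in plain Python (a verification sketch; variable names are ours):

```python
# House data: price ($1000s) vs size (square feet)
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft  = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
n = len(price)

xbar, ybar = sum(sqft) / n, sum(price) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(sqft, price))
sxx = sum((x - xbar) ** 2 for x in sqft)
b1 = sxy / sxx             # slope ≈ 0.10977 ($109.77 per extra square foot)
b0 = ybar - b1 * xbar      # intercept ≈ 98.248

sst = sum((y - ybar) ** 2 for y in price)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
r2 = 1 - sse / sst         # coefficient of determination ≈ 0.581
```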
Example 5 (SAS): What is the relationship between Mother's Estriol level and Birthweight using the following data?

Estriol (mg/24h) | Birthweight (g/1000)
1                | 1
2                | 1
3                | 2
4                | 2
5                | 4
Finance Application: Market Model
• One of the most important applications of linear regression is the market model.
• It is assumed that the rate of return on a stock (R) is linearly related to the rate of return on the overall market.
R = β0 + β1Rm +ε
R: Rate of return on a particular stock
Rm: Rate of return on some major stock index
β1: The beta coefficient measures how sensitive the stock’s rate of return is to changes in the
level of the overall market.
Example: Here we estimate the market model for Nortel, a stock traded on the Toronto Stock Exchange. The data consist of the monthly percentage return for Nortel and the monthly percentage return for all the stocks (the TSE index).
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.560079
R Square            0.313688
Adjusted R Square   0.301855
Standard Error      0.063123
Observations        60
ANOVA
df SS MS F Significance F
Regression 1 0.10563 0.10563 26.50969 3.27E-06
Residual 58 0.231105 0.003985
Total 59 0.336734
            Coefficients   Standard Error   t Stat     P-value
Intercept   0.012818       0.008223         1.558903   0.12446
TSE         0.887691       0.172409         5.148756   3.27E-06
TSE (estimated regression slope): This is a measure of the stock's market-related risk. In this sample, for each 1% increase in the TSE return, the average increase in Nortel's return is 0.8877%.
R Square (R²): This is a measure of the proportion of the total risk embedded in the Nortel stock that is market-related. Specifically, 31.37% of the variation in Nortel's returns is explained by the variation in the TSE's returns.
Linear Regression in Matrix Form
Data:

    (y₁, x₁₁, x₁₂, …, x₁,p₋₁), (y₂, x₂₁, x₂₂, …, x₂,p₋₁), …, (yₙ, xₙ₁, xₙ₂, …, xₙ,p₋₁).

The multiple linear regression model in scalar form is

    yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + … + βₚ₋₁xᵢ,p₋₁ + εᵢ ,   εᵢ ~ N(0, σ²) ,  i = 1, …, n.
The above linear regression can also be represented in vector/matrix form. Let

        ⎡ y₁ ⎤       ⎡ 1  x₁₁  …  x₁,p₋₁ ⎤       ⎡ ε₁ ⎤       ⎡ β₀   ⎤
    y = ⎢ y₂ ⎥,  X = ⎢ 1  x₂₁  …  x₂,p₋₁ ⎥,  ε = ⎢ ε₂ ⎥,  β = ⎢ β₁   ⎥
        ⎢ ⋮  ⎥       ⎢ ⋮    ⋮        ⋮   ⎥       ⎢ ⋮  ⎥       ⎢  ⋮   ⎥
        ⎣ yₙ ⎦       ⎣ 1  xₙ₁  …  xₙ,p₋₁ ⎦       ⎣ εₙ ⎦       ⎣ βₚ₋₁ ⎦

Then, stacking the n scalar equations yᵢ = β₀ + β₁xᵢ₁ + … + βₚ₋₁xᵢ,p₋₁ + εᵢ,

    ⎡ y₁ ⎤   ⎡ β₀ + β₁x₁₁ + … + βₚ₋₁x₁,p₋₁ + ε₁ ⎤
    ⎢ y₂ ⎥ = ⎢ β₀ + β₁x₂₁ + … + βₚ₋₁x₂,p₋₁ + ε₂ ⎥ ,   i.e.   y = Xβ + ε .
    ⎢ ⋮  ⎥   ⎢                ⋮                  ⎥
    ⎣ yₙ ⎦   ⎣ β₀ + β₁xₙ₁ + … + βₚ₋₁xₙ,p₋₁ + εₙ ⎦
Estimation:
Least squares method:
The least squares method finds the estimate of β that minimizes the sum of squared residuals,

    S(β) = S(β₀, β₁, …, βₚ₋₁) = Σᵢ₌₁ⁿ εᵢ² = ε₁² + ε₂² + … + εₙ² = εᵗε = (y − Xβ)ᵗ(y − Xβ)

since ε = y − Xβ. Expanding S(β) yields

    S(β) = (y − Xβ)ᵗ(y − Xβ) = (yᵗ − βᵗXᵗ)(y − Xβ)
         = yᵗy − yᵗXβ − βᵗXᵗy + βᵗXᵗXβ
         = yᵗy − 2βᵗXᵗy + βᵗXᵗXβ

(The last step uses yᵗXβ = βᵗXᵗy, since each is a scalar and one is the transpose of the other.)
Note: For two matrices A and B, (AB)ᵗ = BᵗAᵗ and (A⁻¹)ᵗ = (Aᵗ)⁻¹.
Similar to the procedure for finding the minimum of a function in calculus, the least squares estimate b can be found by solving the equation based on the first derivative of S(β):

    ∂S(β)   ⎡ ∂S(β)/∂β₀   ⎤
    ───── = ⎢ ∂S(β)/∂β₁   ⎥ = ∂(yᵗy − 2βᵗXᵗy + βᵗXᵗXβ)/∂β = −2Xᵗy + 2XᵗXβ = 0
     ∂β     ⎢     ⋮       ⎥
            ⎣ ∂S(β)/∂βₚ₋₁ ⎦

which gives the normal equations

    XᵗXβ = Xᵗy

so that

        ⎡ b₀   ⎤
    b = ⎢ b₁   ⎥ = (XᵗX)⁻¹Xᵗy
        ⎢  ⋮   ⎥
        ⎣ bₚ₋₁ ⎦
The fitted regression equation is

    ŷ = b₀ + b₁x₁ + b₂x₂ + … + bₚ₋₁xₚ₋₁ .

The fitted values (in vector form):  ŷ = [ŷ₁, ŷ₂, …, ŷₙ]ᵗ = Xb
The residuals (in vector form):  e = [e₁, e₂, …, eₙ]ᵗ = y − ŷ = y − Xb
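The matrix formulas translate directly into code. The following sketch (assuming NumPy is available; the data are simulated purely for illustration) solves the normal equations and checks that the residuals sum to zero when an intercept column is present:

```python
import numpy as np

# Simulated data: y depends linearly on one predictor, plus noise
rng = np.random.default_rng(0)
x = np.arange(10.0)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, size=10)

X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept column
b = np.linalg.solve(X.T @ X, X.T @ y)      # solves XᵗXb = Xᵗy (normal equations)
y_hat = X @ b                              # fitted values ŷ = Xb
e = y - y_hat                              # residuals e = y − Xb
# Because X contains a column of 1's, sum(e) is zero up to rounding error
```

Using `np.linalg.solve` on the normal equations mirrors the derivation above; in production code `np.linalg.lstsq` is numerically preferable, but the normal-equations route matches the notes.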
Note:
(i) ∂(βᵗa)/∂β = ∂(Σᵢ₌₁ᵖ βᵢ₋₁aᵢ)/∂β = a, where β = [β₀, β₁, …, βₚ₋₁]ᵗ and a = [a₁, a₂, …, aₚ]ᵗ is any constant vector.
(ii) ∂(βᵗAβ)/∂β = ∂(Σᵢ₌₁ᵖ Σⱼ₌₁ᵖ aᵢⱼβᵢβⱼ)/∂β = 2Aβ, where A is any symmetric p×p matrix.
Note: Since

    (XᵗX)ᵗ = Xᵗ(Xᵗ)ᵗ = XᵗX ,

XᵗX is a symmetric matrix. Also,

    [(XᵗX)⁻¹]ᵗ = [(XᵗX)ᵗ]⁻¹ = (XᵗX)⁻¹ ,

so (XᵗX)⁻¹ is a symmetric matrix.
Therefore, if there is an intercept, then the first column of X is the vector of 1's, [1, 1, …, 1]ᵗ. The normal equations give Xᵗe = Xᵗ(y − Xb) = Xᵗy − XᵗXb = 0, and reading off the first component (the row of 1's),

    [1  1  …  1] · e = Σᵢ₌₁ⁿ eᵢ = 0 .

Note: for a linear regression model without the intercept, Σᵢ₌₁ⁿ eᵢ might not be equal to 0.
Let Z = [Z₁, Z₂, …, Zₙ]ᵗ be a random vector, and let A be a constant matrix and C a constant vector. The mean vector of Z is

    E(Z) = [E(Z₁), E(Z₂), …, E(Zₙ)]ᵗ

and the variance-covariance matrix of Z is

           ⎡ cov(Z₁,Z₁)  cov(Z₁,Z₂)  …  cov(Z₁,Zₙ) ⎤   ⎡ Var(Z₁)     cov(Z₁,Z₂)  …  cov(Z₁,Zₙ) ⎤
    V(Z) = ⎢ cov(Z₂,Z₁)  cov(Z₂,Z₂)  …  cov(Z₂,Zₙ) ⎥ = ⎢ cov(Z₂,Z₁)  Var(Z₂)     …  cov(Z₂,Zₙ) ⎥ .
           ⎢     ⋮            ⋮               ⋮     ⎥   ⎢     ⋮           ⋮               ⋮     ⎥
           ⎣ cov(Zₙ,Z₁)  cov(Zₙ,Z₂)  …  cov(Zₙ,Zₙ) ⎦   ⎣ cov(Zₙ,Z₁)  cov(Zₙ,Z₂)  …  Var(Zₙ)    ⎦

Then
(a) E(AZ) = A·E(Z) and E(Z + C) = E(Z) + C.
(b) V(AZ) = A·V(Z)·Aᵗ and V(Z + C) = V(Z).

Note: For the regression errors, E(ε) = 0 and V(ε) = σ²I.
The properties of the least squares estimate:

1. b is unbiased:

    E(b) = [E(b₀), E(b₁), …, E(bₚ₋₁)]ᵗ = [β₀, β₁, …, βₚ₋₁]ᵗ = β

2. The variance-covariance matrix of the least squares estimate b is

           ⎡ Var(b₀)       cov(b₀,b₁)    …  cov(b₀,bₚ₋₁) ⎤
    V(b) = ⎢ cov(b₁,b₀)    Var(b₁)       …  cov(b₁,bₚ₋₁) ⎥ = σ²(XᵗX)⁻¹
           ⎢     ⋮             ⋮                 ⋮        ⎥
           ⎣ cov(bₚ₋₁,b₀)  cov(bₚ₋₁,b₁)  …  Var(bₚ₋₁)    ⎦
[Derivation:]

    E(b) = E[(XᵗX)⁻¹Xᵗy] = (XᵗX)⁻¹XᵗE(y) = (XᵗX)⁻¹XᵗXβ = β

since

    E(y) = E(Xβ + ε) = Xβ + E(ε) = Xβ + 0 = Xβ .

Also,

    V(b) = V[(XᵗX)⁻¹Xᵗy] = (XᵗX)⁻¹Xᵗ · V(y) · [(XᵗX)⁻¹Xᵗ]ᵗ
         = (XᵗX)⁻¹Xᵗ (σ²I) X(XᵗX)⁻¹
         = σ²(XᵗX)⁻¹(XᵗX)(XᵗX)⁻¹
         = σ²(XᵗX)⁻¹

since

    V(y) = V(Xβ + ε) = V(ε) = σ²I .
Example 6:
Heller Company manufactures lawn mowers and related lawn equipment. The managers believe the
quantity of lawn mowers sold depends on the price of the mower and the price of a competitor’s
mower. We have the following data:
Competitor's Price (xᵢ₁) | Heller's Price (xᵢ₂) | Quantity sold (yᵢ)
120                      | 100                  | 102
140                      | 110                  | 100
190                      |  90                  | 120
130                      | 150                  |  77
155                      | 210                  |  46
175                      | 150                  |  93
125                      | 250                  |  26
145                      | 270                  |  69
180                      | 300                  |  65
150                      | 250                  |  85
The regression model for the above data is

    yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + εᵢ .

The data in matrix form are

        ⎡ y₁  ⎤   ⎡ 102 ⎤       ⎡ 1  x₁,₁   x₁,₂  ⎤   ⎡ 1  120  100 ⎤       ⎡ β₀ ⎤
    y = ⎢ y₂  ⎥ = ⎢ 100 ⎥,  X = ⎢ 1  x₂,₁   x₂,₂  ⎥ = ⎢ 1  140  110 ⎥,  β = ⎢ β₁ ⎥ .
        ⎢  ⋮  ⎥   ⎢  ⋮  ⎥       ⎢ ⋮    ⋮      ⋮   ⎥   ⎢ ⋮   ⋮    ⋮  ⎥       ⎣ β₂ ⎦
        ⎣ y₁₀ ⎦   ⎣ 85  ⎦       ⎣ 1  x₁₀,₁  x₁₀,₂ ⎦   ⎣ 1  150  250 ⎦

The least squares estimate b is

    b = [b₀, b₁, b₂]ᵗ = (XᵗX)⁻¹Xᵗy ,  with b₀ = 66.518 and b₁ = 0.414 .

Suppose now we want to predict the quantity sold in a city where Heller prices its mower at $160 and the competitor prices its mower at $170. The predicted quantity sold is

    ŷ = b₀ + b₁(170) + b₂(160) .
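The Example 6 numbers can be verified numerically (a sketch assuming NumPy; only b₀ = 66.518 and b₁ = 0.414 are checked against the values quoted above, while b₂ and the final prediction are produced by the code):

```python
import numpy as np

x1 = [120, 140, 190, 130, 155, 175, 125, 145, 180, 150]  # competitor's price
x2 = [100, 110, 90, 150, 210, 150, 250, 270, 300, 250]   # Heller's price
y  = [102, 100, 120, 77, 46, 93, 26, 69, 65, 85]         # quantity sold

X = np.column_stack([np.ones(10), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ np.array(y, dtype=float))
# b[0] ≈ 66.518 and b[1] ≈ 0.414, matching the values quoted in the notes

# Prediction at competitor's price $170, Heller's price $160
pred = b @ np.array([1.0, 170.0, 160.0])
```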
Example 7:
We show how to use the matrix approach to obtain the least squares estimate and its expected value and variance. Let

    yᵢ = β₀ + β₁xᵢ + εᵢ ,  i = 1, …, n .

Then

        ⎡ y₁ ⎤       ⎡ 1  x₁ ⎤       ⎡ β₀ ⎤       ⎡ ε₁ ⎤
    y = ⎢ y₂ ⎥,  X = ⎢ 1  x₂ ⎥,  β = ⎣ β₁ ⎦,  ε = ⎢ ε₂ ⎥ .
        ⎢ ⋮  ⎥       ⎢ ⋮   ⋮ ⎥                    ⎢ ⋮  ⎥
        ⎣ yₙ ⎦       ⎣ 1  xₙ ⎦                    ⎣ εₙ ⎦
Thus,

    XᵗX = ⎡ n    Σxᵢ  ⎤ ,   Xᵗy = ⎡ Σyᵢ   ⎤
          ⎣ Σxᵢ  Σxᵢ² ⎦          ⎣ Σxᵢyᵢ ⎦

and

    (XᵗX)⁻¹ =       1        ⎡ Σxᵢ²  −Σxᵢ ⎤  =   1   ⎡ (Σxᵢ²)/n  −x̄ ⎤
              ─────────────  ⎣ −Σxᵢ   n   ⎦    ───── ⎣ −x̄         1 ⎦ ,
              nΣxᵢ² − (Σxᵢ)²                    Sxx

(note that nΣxᵢ² − (Σxᵢ)² = n(Σxᵢ² − nx̄²) = n·Sxx), since for a 2×2 matrix

    ⎡ a  b ⎤⁻¹       1     ⎡ d  −b ⎤
    ⎣ c  d ⎦   =  ───────  ⎣ −c  a ⎦ .
                  ad − bc
Therefore,

    b = (XᵗX)⁻¹Xᵗy =  1   ⎡ (Σxᵢ²)/n  −x̄ ⎤ ⎡ Σyᵢ   ⎤ =  1   ⎡ ȳΣxᵢ² − x̄Σxᵢyᵢ ⎤
                     ───  ⎣ −x̄         1 ⎦ ⎣ Σxᵢyᵢ ⎦   ───  ⎣ Σxᵢyᵢ − x̄Σyᵢ   ⎦ .
                     Sxx                               Sxx

For the second component,

    Σxᵢyᵢ − x̄Σyᵢ = Σxᵢyᵢ − nx̄ȳ = Sxy ,  so  b₁ = Sxy/Sxx .

For the first component,

    (1/Sxx)(ȳΣxᵢ² − x̄Σxᵢyᵢ) = (1/Sxx)[ȳ(Sxx + nx̄²) − x̄(Sxy + nx̄ȳ)] = (1/Sxx)(ȳSxx − x̄Sxy) = ȳ − b₁x̄ .

Hence

    b = ⎡ b₀ ⎤ = ⎡ ȳ − b₁x̄ ⎤ ,  with  b₁ = Sxy/Sxx ,
        ⎣ b₁ ⎦   ⎣ Sxy/Sxx ⎦

in agreement with the scalar formulas.
Also,

    E(b) = ⎡ E(b₀) ⎤ = ⎡ β₀ ⎤
           ⎣ E(b₁) ⎦   ⎣ β₁ ⎦

and

    V(b) = ⎡ Var(b₀)     cov(b₀,b₁) ⎤ = σ² ⎡ Σxᵢ²/(n·Sxx)   −x̄/Sxx ⎤ .
           ⎣ cov(b₀,b₁)  Var(b₁)    ⎦      ⎣ −x̄/Sxx          1/Sxx  ⎦
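As a closing check, the 2×2 closed forms in Example 7 can be compared against direct matrix computation, reusing the Example 1 drip-feed data (a sketch assuming NumPy):

```python
import numpy as np

x = np.array([0., 1., 2., 3., 4., 5., 6.])
y = np.array([2.4, 4.3, 5.0, 6.9, 9.1, 11.4, 13.5])
n = len(x)

# Matrix route: b = (XᵗX)⁻¹Xᵗy
X = np.column_stack([np.ones(n), x])
b_matrix = np.linalg.solve(X.T @ X, X.T @ y)

# Scalar route: b1 = Sxy/Sxx, b0 = ȳ − b1·x̄
sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = sxy / sxx
b0 = y.mean() - b1 * x.mean()

# (XᵗX)⁻¹ should match the closed form (1/Sxx)·[[Σxᵢ²/n, −x̄], [−x̄, 1]]
xtx_inv = np.linalg.inv(X.T @ X)
closed = np.array([[np.sum(x ** 2) / (n * sxx), -x.mean() / sxx],
                   [-x.mean() / sxx,             1 / sxx]])
```

Both routes give the same coefficients, and the inverse matrix matches the closed form, which is exactly what the derivation above asserts.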
Acknowledgement: In compiling these lecture notes, we have revised some materials from the following websites:
www.schoolworkout.co.uk/documents/s1/Regression.doc
www.stat.ufl.edu/~winner/mar5621/mar5621
www.fordham.edu/economics/Vinod/correl-regr.
www2.thu.edu.tw/~wenwei/Courses/regression/ch6.1.doc
www.msu.edu/~fuw/teaching/Fu_Ch11_linear_regression.ppt
www.stanford.edu/class/msande247s/kchap17.ppt
Quiz 3. Due Tuesday at the beginning of the lecture (*Yes, I will definitely collect*)