a) Definition - Regression
• Regression is concerned with describing and evaluating the relationship between a given variable (the dependent variable, y) and one or more other variables (the independent variables, the x's).
• Note that there can be many x variables, but we will limit ourselves to the case where there is only one x variable to start with. In our set-up, there is only one y variable.
a) Definition - Regression is different from correlation
• The correlation between two variables measures the degree of linear association
between them.
• If we say y and x are correlated, it means that we are treating y and x in a completely
symmetrical way.
– It is not implied that changes in x cause changes in y, or indeed that changes in y cause
changes in x.
– It is simply stated that there is evidence for a linear relationship between the two variables,
and that movements in the two are on average related to an extent given by the correlation
coefficient.
• In regression, we treat the dependent variable (y) and the independent variable(s) (x’s)
very differently. The y variable is assumed to be random or “stochastic” in some way,
i.e., to have a probability distribution. The x variables are, however, assumed to have
fixed (“non-stochastic”) values in repeated samples.
– Regression as a tool is more flexible and more powerful than correlation.
a) Simple regression
• For simplicity, say k = 1. This is the situation where y depends on only one
x variable.
• Suppose we wrote the relationship as an exact straight line, yt = α + βxt. Is this realistic? No, since it corresponds to the case where the model fits the data perfectly, i.e., all of the data points lie exactly on a straight line. We therefore add a random disturbance term, ut: yt = α + βxt + ut.
• We observe data for xt, but since yt also depends on ut, we must be specific about how the ut are generated. ut is not observable, so rather than guessing each value of ut, we make some reasonable assumptions about the shape of the distribution of each ut.
• We usually make the following set of assumptions about the ut's (the unobservable error terms).
b) The assumptions underlying the CLRM
1. E(ut) = 0 – the errors have zero mean.
2. var(ut) = σ² < ∞ – the variance of the errors is constant and finite over all values of xt.
3. cov(ui, uj) = 0 – the errors are linearly independent of one another.
4. cov(ut, xt) = 0 – there is no relationship between the error and the corresponding x variate.
• An alternative assumption to 4., which is slightly stronger, is that the xt's are non-stochastic or fixed in repeated samples; values of x are either controllable or fully predictable. Violation of this assumption creates problems such as "errors in variables" and "autoregression".
• Additional Assumption
5. ut is normally distributed.
– Violation of this assumption renders the usual tests of significance for the estimated parameters, e.g., the t-test, inapplicable.
c) OLS estimation - Estimator or estimate?
• Estimators are the formulae used to calculate the coefficients; estimates are the actual numerical values obtained for the coefficients from a given sample.
• The PRF is a description of the model that is thought to be generating the actual data and the true relationship between the variables (i.e., the true values of α and β).
• PRF = DGP (the population regression function is the data generating process).
• The PRF is what is really wanted, but all that is ever available is the SRF.
c) Ordinary least squares (OLS)
• The most common method used to fit a line to the data is known as OLS.
• What we actually do is take each distance and square it (i.e. take the area of each of the
squares in the diagram) and minimise the total sum of the squares (hence least squares).
c) Ordinary least squares (OLS): Actual and fitted value
[Figure: scatter diagram showing, at a given xi, the actual value yi, the fitted value ŷi on the regression line, and the residual ûi between them]
• So we minimise û1² + û2² + û3² + û4² + û5², or minimise Σt ût² (t = 1, ..., 5). This is known as the residual sum of squares (RSS) or the sum of squared residuals.
• But what was ût ? It was the difference between the actual point and
the line, yt - ŷt .
• So minimising Σt (yt − ŷt)² is equivalent to minimising Σt ût².
• Letting α̂ and β̂ denote the values of α and β selected by minimising the RSS, respectively, the equation for the fitted line is given by:
ŷt = α̂ + β̂xt
• Let L denote the RSS, which is also known as a loss function:
L = Σt (yt − ŷt)² = Σt (yt − α̂ − β̂xt)²
• L is minimised with respect to (w.r.t.) α̂ and β̂, to find the values of α and β which minimise the residual sum of squares and give the line that is closest to the data.
• So L is differentiated w.r.t. α̂ and β̂, setting the first derivatives to zero.
• The coefficient estimators for the slope and the intercept are given by:
β̂ = (Σt xt yt − T x̄ȳ) / (Σt xt² − T x̄²)   and   α̂ = ȳ − β̂x̄
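For reference, the differentiation step runs as follows (a standard derivation sketch; the algebra is not reproduced in the original slides):

$$\frac{\partial L}{\partial \hat{\alpha}} = -2\sum_t \left(y_t - \hat{\alpha} - \hat{\beta} x_t\right) = 0 \quad\Rightarrow\quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

$$\frac{\partial L}{\partial \hat{\beta}} = -2\sum_t x_t \left(y_t - \hat{\alpha} - \hat{\beta} x_t\right) = 0 \quad\Rightarrow\quad \sum_t x_t y_t = \hat{\alpha}\sum_t x_t + \hat{\beta}\sum_t x_t^2$$

Substituting $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ and $\sum_t x_t = T\bar{x}$ into the second equation and rearranging gives $\hat{\beta} = \left(\sum_t x_t y_t - T\bar{x}\bar{y}\right) / \left(\sum_t x_t^2 - T\bar{x}^2\right)$.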
c) How OLS works
β̂ = (Σt xt yt − T x̄ȳ) / (Σt xt² − T x̄²)   and   α̂ = ȳ − β̂x̄
• Both equations state that, given only the sets of observations xt and yt, it is always possible to calculate the values of the two parameters, α̂ and β̂, that best fit the set of data.
• The first formula can also be written, more intuitively, as:
β̂ = Σt (xt − x̄)(yt − ȳ) / Σt (xt − x̄)²
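As a numerical sanity check, both forms of the slope formula can be computed directly from paired observations; a minimal sketch in Python (function and variable names are illustrative, not from the notes):

```python
import numpy as np

def ols_slope_intercept(x, y):
    """OLS estimates via the two equivalent slope formulas."""
    T = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    # Form 1: (sum x_t*y_t - T*xbar*ybar) / (sum x_t^2 - T*xbar^2)
    beta = (np.sum(x * y) - T * x_bar * y_bar) / (np.sum(x**2) - T * x_bar**2)
    # Form 2: sum (x_t - xbar)(y_t - ybar) / sum (x_t - xbar)^2
    beta_alt = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar)**2)
    assert np.isclose(beta, beta_alt)  # the two forms are algebraically identical
    alpha = y_bar - beta * x_bar       # intercept from alpha_hat = ybar - beta_hat*xbar
    return alpha, beta
```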
• Suppose that we have the following data on the excess returns on a fund
manager’s portfolio (“fund XXX”) together with the excess returns on a
market index:
Year, t   Excess return = rXXX,t − rft   Excess return on market index = rmt − rft
   1               17.8                                  13.7
   2               39.0                                  23.2
   3               12.8                                   6.9
   4               24.2                                  16.8
   5               17.2                                  12.3
• We have some intuition that the beta on this fund is positive, and we
therefore want to find whether there appears to be a relationship between
x and y given the data that we have. The first stage would be to form a
scatter plot of the two variables.
Graph (scatter diagram)
[Figure: scatter plot of the excess return on fund XXX (vertical axis, 0 to 45) against the excess return on the market portfolio (horizontal axis, 0 to 25)]
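Such a scatter plot can be produced in a few lines; a sketch using matplotlib with the five observations from the table above:

```python
import matplotlib.pyplot as plt

x = [13.7, 23.2, 6.9, 16.8, 12.3]   # excess return on market index
y = [17.8, 39.0, 12.8, 24.2, 17.2]  # excess return on fund XXX

plt.scatter(x, y)
plt.xlabel("Excess return on market portfolio")
plt.ylabel("Excess return on fund XXX")
plt.show()
```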
What do we use α̂ and β̂ for?
• Plugging the five observations into the formulae for α̂ and β̂ leads to the estimates
α̂ = −1.74 and β̂ = 1.64. We would write the fitted line as:
ŷt = −1.74 + 1.64xt
– xt = market risk premium
• Question: If an analyst tells you that she expects the market to yield a return
20% higher than the risk-free rate next year, what would you expect the return
on fund XXX to be?
• Solution: We can say that the expected value of y = "−1.74 + 1.64 × value of x", so plug x = 20 into the equation: ŷ = −1.74 + 1.64 × 20 = 31.06.
– Thus, for a given expected market risk premium of 20%, fund XXX would be expected to earn an excess over the risk-free rate of approximately 31%.
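The estimates and the prediction can be verified with the slope/intercept formulas from earlier (a sketch; the exact values differ slightly from the rounded −1.74 and 1.64):

```python
import numpy as np

x = np.array([13.7, 23.2, 6.9, 16.8, 12.3])   # market excess returns
y = np.array([17.8, 39.0, 12.8, 24.2, 17.2])  # fund XXX excess returns

T = len(x)
beta_hat = (np.sum(x * y) - T * x.mean() * y.mean()) / (np.sum(x**2) - T * x.mean()**2)
alpha_hat = y.mean() - beta_hat * x.mean()
print(round(alpha_hat, 2), round(beta_hat, 2))  # -1.74 1.64

# Prediction for an expected market risk premium of 20%:
print(alpha_hat + beta_hat * 20)                # ~31.1, i.e. approximately 31%
```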
What do we use α̂ and β̂ for?
ŷt = −1.74 + 1.64xt
• If x increases by 1 unit, y will be expected, everything else being equal, to increase by 1.64 units.
• If β̂ had been negative, a rise in x would on average cause a fall in y.
• α̂, the intercept coefficient estimate, is interpreted as the value that would be taken by the dependent variable y if the independent variable x took a value of zero.
• Suppose that β̂ = 1.64, x is measured in per cent and y is measured in thousands of US dollars. Then, if x rises by 1%, y will be expected to rise on average by $1.64 thousand (or $1,640).
d) Properties of OLS regression line
1. The OLS regression line passes through the point of means (x̄, ȳ) [Eq. (6) and (7) (see ols-proof)]:
b₁ = (Σi Yi)/n − b₂ (Σi Xi)/n   (6)
b₁ = Ȳ − b₂X̄   (7)
∴ ȳ = b₁ + b₂x̄
2. ei have zero covariance with the sample xi values, and also with ŷi.
• Cov(xi, ei) = (1/n) Σ(xi − x̄)(ei − ē)
= (1/n) Σ(xi − x̄)ei   (∵ ē = 0)
= (1/n) Σ xi ei − x̄ (1/n) Σ ei
= (1/n) Σ xi ei   (∵ Σ ei = 0)
• From the first-order condition [Eq. (5) (see ols-proof)]:
∂SSE/∂b₂ = −2 Σi Xi(Yi − b₁ − b₂Xi) = 0   (5)
i.e., Σi Xi ei = 0   [∵ Yi − b₁ − b₂Xi = Yi − Ŷi = ei]
∴ Cov(xi, ei) = (1/n) Σ xi ei = 0
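Both properties are easy to confirm numerically; a sketch reusing the fund XXX data from above (any data set would do):

```python
import numpy as np

x = np.array([13.7, 23.2, 6.9, 16.8, 12.3])
y = np.array([17.8, 39.0, 12.8, 24.2, 17.2])

n = len(x)
b2 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
b1 = y.mean() - b2 * x.mean()
e = y - (b1 + b2 * x)                            # OLS residuals

print(np.isclose(e.sum(), 0.0))                  # sum of residuals is zero
print(np.isclose(np.sum(x * e), 0.0))            # sum x_i*e_i = 0, so Cov(x, e) = 0
print(np.isclose(b1 + b2 * x.mean(), y.mean()))  # line passes through (xbar, ybar)
```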
a) Asymptotic unbiasedness:
• β̂ is an asymptotically unbiased estimator of β if:
lim(n→∞) E(β̂) = β
An estimator β̂ that is biased becomes unbiased as the sample size approaches infinity. If an estimator is unbiased, it is also asymptotically unbiased.
• The least squares estimators α̂ and β̂ are unbiased. That is, E(α̂) = α and E(β̂) = β.
• Thus, on average, the estimated values will be equal to the true values.
• To prove this also requires the assumptions that cov(ut, xt) = 0 and E(ut) = 0.
• Unbiasedness is a stronger condition than consistency, since it holds for small as well as large samples.
e) Properties of estimators
2. Large sample or asymptotic properties
• These properties describe how an estimator behaves as the sample size grows large and approaches infinity.
b) Consistency
• If increasing the sample size reduces both the bias and the variance of the estimator, and both continue to shrink until they reach zero as n → ∞, the estimator is said to be consistent.
• β̂ is a consistent estimator if:
lim(n→∞) E(β̂ − β) = 0
and
lim(n→∞) Var(β̂) = 0
• The least squares estimators α̂ and β̂ are consistent.
• That is, the estimates will converge to their true values as the sample size increases to infinity.
• The assumptions E(xtut) = 0 and E(ut) = 0 are needed to prove this.
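Consistency can be illustrated with a small simulation (a sketch under assumed values α = 1, β = 0.5 and i.i.d. normal errors; none of these numbers come from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 1.0, 0.5                # assumed true parameter values

for n in (50, 500, 5000, 50000):
    x = rng.normal(10.0, 2.0, n)
    u = rng.normal(0.0, 1.0, n)       # errors satisfying E(u)=0 and E(xu)=0
    y = alpha + beta * x + u
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    print(n, round(b, 4))             # beta_hat settles ever closer to 0.5
```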
f) Linear and non-linear models
• In order to use OLS, we need a model which is linear in the parameters ( and ). It does
not necessarily have to be linear in the variables (y and x).
• Linear in the parameters means that the parameters are not multiplied together, divided,
squared or cubed etc.
• Models that are not linear in the variables can often be made to take a linear form by applying a suitable transformation or manipulation, e.g., the exponential regression model:
Yt = A Xt^β e^(ut)  ⇒  ln Yt = ln(A) + β ln Xt + ut
– Taking logarithms of both sides, applying the laws of logs and rearranging the right-hand side (RHS).
• Then let ln(A) = α, yt = ln Yt and xt = ln Xt:
yt = α + βxt + ut
• This is known as the exponential regression model. Y varies according to some exponent
(power) function of X.
• Here, the coefficients can be interpreted as elasticities.
• Thus a coefficient estimate of 1.2 for β̂ in both equations is interpreted as stating that 'a rise in X of 1% will lead on average, everything else being equal, to a rise in Y of 1.2%'.
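A sketch of the transformation in practice, on simulated data with an assumed A = 2 and elasticity β = 1.2 (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A, beta = 2.0, 1.2
X = rng.uniform(1.0, 100.0, 1000)
u = rng.normal(0.0, 0.1, 1000)
Y = A * X**beta * np.exp(u)           # Y_t = A * X_t^beta * e^(u_t)

x, y = np.log(X), np.log(Y)           # the model is linear in the parameters after logging
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
a = y.mean() - b * x.mean()
print(round(np.exp(a), 2), round(b, 2))  # close to A = 2.0 and the elasticity 1.2
```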
g) Precision and standard errors
• Any set of regression estimates α̂ and β̂ is specific to the sample used in its estimation; different samples of xt and yt will produce different values of the OLS estimates.
• Recall that the estimators of α and β from the sample data (α̂ and β̂) are given by:
β̂ = (Σt xt yt − T x̄ȳ) / (Σt xt² − T x̄²)   and   α̂ = ȳ − β̂x̄
• The standard errors of these estimates are given by:
SE(α̂) = s √[ Σ xt² / (T Σ(xt − x̄)²) ] = s √[ Σ xt² / (T(Σ xt² − T x̄²)) ]
SE(β̂) = s √[ 1 / Σ(xt − x̄)² ] = s √[ 1 / (Σ xt² − T x̄²) ]
where s is the standard error of the equation / regression.
Precision and Standard Errors
• It is worth noting that the standard errors give only a general indication of the
likely accuracy of the regression parameters.
• They do not show how accurate a particular set of coefficient estimates is.
• If the standard errors are small, it shows that the coefficients are likely to be
precise on average, not how precise they are for this particular sample.
• s, the SE of the regression, is estimated by:
s = √[ Σ ût² / (T − 2) ]
where Σ ût² is the residual sum of squares and T is the sample size (s² is an unbiased estimator of the error variance).
• s is also known as the standard error of the regression or the standard error of the
estimate.
– Everything else being equal, the smaller this quantity is, the closer is the fit of the line to the
actual data.
Some Comments on the Standard Error Estimators
1. Both SE(α̂) and SE(β̂) depend on s² (or s). s² is the estimate of the error variance.
The greater the variance s2, then the more dispersed the errors are about their mean
value and therefore the more dispersed y will be about its mean value.
The larger this quantity is, the more dispersed are the residuals, and so the greater is
the uncertainty in the model.
If s2 is large, the data points are collectively a long way away from the line.
2. The sum of the squares of x about their mean appears in both formulae.
Σ(xt − x̄)² appears in the denominators.
The larger this sum of squares, the smaller the coefficient variances.
Some Comments on the Standard Error Estimators (cont’d)
[Figure: two scatter plots of y against x with fitted lines; in one the x values are widely dispersed about their mean, in the other they are tightly clustered, illustrating how the spread of x affects the precision of the fitted line]
3. The larger the sample size, T, the smaller will be the coefficient variances. T appears explicitly in SE(α̂) and implicitly in SE(β̂).
T appears implicitly since the sum Σ(xt − x̄)² runs from t = 1 to T.
4. The term Σ xt² appears only in SE(α̂), and thus affects only the precision of the intercept estimate.
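Points 2 and 3 can be seen directly from the formula for SE(β̂); a sketch comparing clustered against dispersed x values (simulated, with s fixed at 1 for illustration):

```python
import numpy as np

def se_beta(x, s=1.0):
    """SE(beta_hat) = s * sqrt(1 / sum (x_t - xbar)^2), for a given s."""
    return s * np.sqrt(1.0 / np.sum((x - x.mean())**2))

print(se_beta(np.linspace(9.9, 10.1, 50)))  # x clustered near its mean: large SE
print(se_beta(np.linspace(0.0, 20.0, 50)))  # same T, dispersed x: much smaller SE
print(se_beta(np.linspace(0.0, 20.0, 500))) # larger T shrinks the SE further
```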
Example: Calculating the Parameters and Standard Errors
• Calculations, using T = 22, x̄ = 416.5, ȳ = 86.65, Σ xt yt = 830102, Σ xt² = 3919654 and Σ ût² = 130.6:
β̂ = (Σ xt yt − T x̄ȳ) / (Σ xt² − T x̄²) = (830102 − 22 × 416.5 × 86.65) / (3919654 − 22 × 416.5²) = 0.35
α̂ = ȳ − β̂x̄ = 86.65 − 0.35 × 416.5 = −59.12
• SE(regression): s = √[ Σ ût² / (T − 2) ] = √(130.6 / 20) = 2.55
SE(α̂) = s √[ Σ xt² / (T(Σ xt² − T x̄²)) ] = 2.55 × √[ 3919654 / (22 × (3919654 − 22 × 416.5²)) ] = 3.35
SE(β̂) = s √[ 1 / (Σ xt² − T x̄²) ] = 2.55 × √[ 1 / (3919654 − 22 × 416.5²) ] = 0.0079
• We now write the results as
ŷt = −59.12 + 0.35xt
        (3.35)    (0.0079)
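The arithmetic above can be replicated from the quoted summary statistics alone; a sketch (the notes round s to 2.55 before computing the standard errors, which is followed here):

```python
import numpy as np

# Summary statistics quoted in the notes
T = 22
sum_xy, sum_x2 = 830102.0, 3919654.0
x_bar, y_bar = 416.5, 86.65
rss = 130.6                                 # residual sum of squares

ssx = sum_x2 - T * x_bar**2                 # sum of squares of x about its mean
beta = (sum_xy - T * x_bar * y_bar) / ssx
print(round(beta, 2))                       # 0.35

print(y_bar - 0.35 * x_bar)                 # -59.125 -> the notes' alpha_hat of -59.12

s = np.sqrt(rss / (T - 2))
print(s)                                    # 2.5554..., quoted as 2.55

print(2.55 * np.sqrt(sum_x2 / (T * ssx)))   # 3.3495 -> SE(alpha_hat) = 3.35
print(2.55 * np.sqrt(1.0 / ssx))            # 0.0079 -> SE(beta_hat)
```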