a) Definition - Regression
• Regression is concerned with describing and evaluating the relationship between a given variable (the dependent variable, y) and one or more other variables (the independent variables, the x's).
• Note that there can be many x variables, but we will limit ourselves to the case where there is only one x variable to start with. In our set-up, there is only one y variable.
a) Definition - Regression is different from correlation
• The correlation between two variables measures the degree of linear association
between them.
• If we say y and x are correlated, it means that we are treating y and x in a completely
symmetrical way.
– It is not implied that changes in x cause changes in y, or indeed that changes in y cause
changes in x.
– It is simply stated that there is evidence for a linear relationship between the two variables,
and that movements in the two are on average related to an extent given by the correlation
coefficient.
• In regression, we treat the dependent variable (y) and the independent variable(s) (x’s)
very differently. The y variable is assumed to be random or “stochastic” in some way,
i.e., to have a probability distribution. The x variables are, however, assumed to have
fixed (“non-stochastic”) values in repeated samples.
– Regression as a tool is more flexible and more powerful than correlation.
a) Simple regression
• For simplicity, say k = 1. This is the situation where y depends on only one
x variable.
• Suppose we wrote the relationship as an exact straight line, yt = α + βxt. Is this realistic? No, since it corresponds to the case where the model fits the data perfectly, i.e., all of the data points lie exactly on a straight line. We therefore add a random disturbance term, ut: yt = α + βxt + ut.
• We observe data for xt, but since yt also depends on ut, we must be specific about how the ut are generated. ut is not observable, so rather than guessing each value of ut, we make some reasonable assumptions about the shape of the distribution of each ut.
• We usually make the following set of assumptions about the ut's (the unobservable error terms).
b) The assumptions underlying the CLRM
1. E(ut) = 0 – the errors have zero mean.
2. var(ut) = σ² < ∞ – the variance of the errors is constant and finite over all values of xt.
3. cov(ui, uj) = 0 – the errors are linearly independent of one another.
4. cov(ut, xt) = 0 – there is no relationship between the error and the corresponding x variate.
• An alternative assumption to 4., which is slightly stronger, is that the xt's are non-stochastic or fixed in repeated samples; values of x are either controllable or fully predictable. Violation of this assumption creates problems such as "errors in variables" and "autoregression".
• Additional Assumption
5. ut is normally distributed.
– Violation of this assumption renders the usual tests of significance for the estimated parameters, e.g., the t-test, inapplicable.
c) OLS estimation - Estimator or estimate?
• Estimators are the formulae used to calculate the coefficients; estimates are the actual numerical values obtained for the coefficients from a given sample.
• The PRF is a description of the model that is thought to be generating the actual data and the true relationship between the variables (i.e., the true values of α and β).
• PRF = DGP (the population regression function is the data generating process).
• The PRF is what is really wanted, but all that is ever available is the SRF.
c) Ordinary least squares (OLS)
• The most common method used to fit a line to the data is known as OLS.
• What we actually do is take each distance and square it (i.e. take the area of each of the
squares in the diagram) and minimise the total sum of the squares (hence least squares).
c) Ordinary least squares (OLS): Actual and fitted value
[Figure: scatter diagram showing, at a given xi, the actual value yi, the fitted value ŷi on the regression line, and the residual ûi between them]
• So we minimise û1² + û2² + û3² + û4² + û5², or minimise Σt ût² (t = 1, ..., 5). This is known as the residual sum of squares (RSS) or the sum of squared residuals.
• But what was ût ? It was the difference between the actual point and
the line, yt - ŷt .
• So minimising Σt (yt − ŷt)² is equivalent to minimising Σt ût².
• Letting α̂ and β̂ denote the values of α and β selected by minimising the RSS, respectively, the equation for the fitted line is given by:
ŷt = α̂ + β̂xt
• Let L denote the RSS, which is also known as a loss function:
L = Σt (yt − ŷt)² = Σt (yt − α̂ − β̂xt)²
• L is minimised with respect to (w.r.t.) α̂ and β̂, to find the values of α and β which minimise the residual sum of squares and give the line that is closest to the data.
• So L is differentiated w.r.t. α̂ and β̂, setting the first derivatives to zero.
• The coefficient estimators for the slope and the intercept are given by:
β̂ = (Σt xt yt − T x̄ȳ) / (Σt xt² − T x̄²)   and   α̂ = ȳ − β̂x̄
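For reference, the differentiation step runs as follows (a standard derivation sketch; the algebra is not reproduced in the original slides):

$$\frac{\partial L}{\partial \hat{\alpha}} = -2\sum_t \left(y_t - \hat{\alpha} - \hat{\beta} x_t\right) = 0 \quad\Rightarrow\quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

$$\frac{\partial L}{\partial \hat{\beta}} = -2\sum_t x_t \left(y_t - \hat{\alpha} - \hat{\beta} x_t\right) = 0 \quad\Rightarrow\quad \sum_t x_t y_t = \hat{\alpha}\sum_t x_t + \hat{\beta}\sum_t x_t^2$$

Substituting $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ and $\sum_t x_t = T\bar{x}$ into the second equation and rearranging gives $\hat{\beta} = \left(\sum_t x_t y_t - T\bar{x}\bar{y}\right) / \left(\sum_t x_t^2 - T\bar{x}^2\right)$.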
c) How OLS works
β̂ = (Σt xt yt − T x̄ȳ) / (Σt xt² − T x̄²)   and   α̂ = ȳ − β̂x̄
• Both equations state that, given only the sets of observations xt and yt, it is always possible to calculate the values of the two parameters, α̂ and β̂, that best fit the set of data.
• The first formula can also be written, more intuitively, as:
β̂ = Σt (xt − x̄)(yt − ȳ) / Σt (xt − x̄)²
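As a numerical sanity check, both forms of the slope formula can be computed directly from paired observations; a minimal sketch in Python (function and variable names are illustrative, not from the notes):

```python
import numpy as np

def ols_slope_intercept(x, y):
    """OLS estimates via the two equivalent slope formulas."""
    T = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    # Form 1: (sum x_t*y_t - T*xbar*ybar) / (sum x_t^2 - T*xbar^2)
    beta = (np.sum(x * y) - T * x_bar * y_bar) / (np.sum(x**2) - T * x_bar**2)
    # Form 2: sum (x_t - xbar)(y_t - ybar) / sum (x_t - xbar)^2
    beta_alt = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar)**2)
    assert np.isclose(beta, beta_alt)  # the two forms are algebraically identical
    alpha = y_bar - beta * x_bar       # intercept from alpha_hat = ybar - beta_hat*xbar
    return alpha, beta
```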
• Suppose that we have the following data on the excess returns on a fund
manager’s portfolio (“fund XXX”) together with the excess returns on a
market index:
Year, t   Excess return = rXXX,t − rft   Excess return on market index = rmt − rft
   1               17.8                                  13.7
   2               39.0                                  23.2
   3               12.8                                   6.9
   4               24.2                                  16.8
   5               17.2                                  12.3
• We have some intuition that the beta on this fund is positive, and we
therefore want to find whether there appears to be a relationship between
x and y given the data that we have. The first stage would be to form a
scatter plot of the two variables.
Graph (scatter diagram)
[Figure: scatter plot of the excess return on fund XXX (vertical axis, 0 to 45) against the excess return on the market portfolio (horizontal axis, 0 to 25)]
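Such a scatter plot can be produced in a few lines; a sketch using matplotlib with the five observations from the table above:

```python
import matplotlib.pyplot as plt

x = [13.7, 23.2, 6.9, 16.8, 12.3]   # excess return on market index
y = [17.8, 39.0, 12.8, 24.2, 17.2]  # excess return on fund XXX

plt.scatter(x, y)
plt.xlabel("Excess return on market portfolio")
plt.ylabel("Excess return on fund XXX")
plt.show()
```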
What do we use α̂ and β̂ for?
• Plugging the five observations into the formulae for α̂ and β̂ leads to the estimates
α̂ = −1.74 and β̂ = 1.64. We would write the fitted line as:
ŷt = −1.74 + 1.64xt
– xt = market risk premium
• Question: If an analyst tells you that she expects the market to yield a return
20% higher than the risk-free rate next year, what would you expect the return
on fund XXX to be?
• Solution: We can say that the expected value of y = "−1.74 + 1.64 × value of x", so plug x = 20 into the equation: ŷ = −1.74 + 1.64 × 20 = 31.06.
– Thus, for a given expected market risk premium of 20%, fund XXX would be expected to earn an excess over the risk-free rate of approximately 31%.
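The estimates and the prediction can be verified with the slope/intercept formulas from earlier (a sketch; the exact values differ slightly from the rounded −1.74 and 1.64):

```python
import numpy as np

x = np.array([13.7, 23.2, 6.9, 16.8, 12.3])   # market excess returns
y = np.array([17.8, 39.0, 12.8, 24.2, 17.2])  # fund XXX excess returns

T = len(x)
beta_hat = (np.sum(x * y) - T * x.mean() * y.mean()) / (np.sum(x**2) - T * x.mean()**2)
alpha_hat = y.mean() - beta_hat * x.mean()
print(round(alpha_hat, 2), round(beta_hat, 2))  # -1.74 1.64

# Prediction for an expected market risk premium of 20%:
print(alpha_hat + beta_hat * 20)                # ~31.1, i.e. approximately 31%
```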
What do we use α̂ and β̂ for?
ŷt = −1.74 + 1.64xt
• If x increases by 1 unit, y will be expected, everything else being equal, to increase by 1.64 units.
• If β̂ had been negative, a rise in x would on average cause a fall in y.
• α̂, the intercept coefficient estimate, is interpreted as the value that would be taken by the dependent variable y if the independent variable x took a value of zero.
• Suppose that β̂ = 1.64, x is measured in per cent and y is measured in thousands of US dollars. Then, if x rises by 1%, y will be expected to rise on average by $1.64 thousand (or $1,640).
d) Properties of OLS regression line
1. The OLS regression line passes through the point of means (x̄, ȳ) [Eq. (6) and (7) (see ols-proof)]:
b₁ = (Σi Yi)/n − b₂ (Σi Xi)/n   (6)
b₁ = Ȳ − b₂X̄   (7)
∴ ȳ = b₁ + b₂x̄
2. ei have zero covariance with the sample xi values, and also with ŷi.
• Cov(xi, ei) = (1/n) Σ(xi − x̄)(ei − ē)
= (1/n) Σ(xi − x̄)ei   (∵ ē = 0)
= (1/n) Σ xi ei − x̄ (1/n) Σ ei
= (1/n) Σ xi ei   (∵ Σ ei = 0)
• From the first-order condition [Eq. (5) (see ols-proof)]:
∂SSE/∂b₂ = −2 Σi Xi(Yi − b₁ − b₂Xi) = 0   (5)
i.e., Σi Xi ei = 0   [∵ Yi − b₁ − b₂Xi = Yi − Ŷi = ei]
∴ Cov(xi, ei) = (1/n) Σ xi ei = 0
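Both properties are easy to confirm numerically; a sketch reusing the fund XXX data from above (any data set would do):

```python
import numpy as np

x = np.array([13.7, 23.2, 6.9, 16.8, 12.3])
y = np.array([17.8, 39.0, 12.8, 24.2, 17.2])

n = len(x)
b2 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
b1 = y.mean() - b2 * x.mean()
e = y - (b1 + b2 * x)                            # OLS residuals

print(np.isclose(e.sum(), 0.0))                  # sum of residuals is zero
print(np.isclose(np.sum(x * e), 0.0))            # sum x_i*e_i = 0, so Cov(x, e) = 0
print(np.isclose(b1 + b2 * x.mean(), y.mean()))  # line passes through (xbar, ybar)
```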
a) Asymptotic unbiasedness:
• β̂ is an asymptotically unbiased estimator of β if:
lim(n→∞) E(β̂) = β
An estimator β̂ that is biased becomes unbiased as the sample size approaches infinity. If an estimator is unbiased, it is also asymptotically unbiased.
• The least squares estimators α̂ and β̂ are unbiased. That is, E(α̂) = α and E(β̂) = β.
• Thus, on average, the estimated values will be equal to the true values.
• To prove this also requires the assumptions that cov(ut, xt) = 0 and E(ut) = 0.
• Unbiasedness is a stronger condition than consistency, since it holds for small as well as large samples.
e) Properties of estimators
2. Large sample or asymptotic properties
• These properties describe how an estimator behaves as the sample size grows large and approaches infinity.
b) Consistency
• If increasing the sample size reduces both the bias and the variance of the estimator, and both continue to shrink until they reach zero as n → ∞, the estimator is said to be consistent.
• β̂ is a consistent estimator if:
lim(n→∞) E(β̂ − β) = 0
and
lim(n→∞) Var(β̂) = 0
• The least squares estimators α̂ and β̂ are consistent.
• That is, the estimates will converge to their true values as the sample size increases to infinity.
• The assumptions E(xtut) = 0 and E(ut) = 0 are needed to prove this.
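Consistency can be illustrated with a small simulation (a sketch under assumed values α = 1, β = 0.5 and i.i.d. normal errors; none of these numbers come from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 1.0, 0.5                # assumed true parameter values

for n in (50, 500, 5000, 50000):
    x = rng.normal(10.0, 2.0, n)
    u = rng.normal(0.0, 1.0, n)       # errors satisfying E(u)=0 and E(xu)=0
    y = alpha + beta * x + u
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    print(n, round(b, 4))             # beta_hat settles ever closer to 0.5
```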
f) Linear and non-linear models
• In order to use OLS, we need a model which is linear in the parameters ( and ). It does
not necessarily have to be linear in the variables (y and x).
• Linear in the parameters means that the parameters are not multiplied together, divided,
squared or cubed etc.
• Models that are not linear in the variables can often be made to take a linear form by applying a suitable transformation or manipulation, e.g., the exponential regression model:
Yt = A Xt^β e^(ut)  ⇒  ln Yt = ln(A) + β ln Xt + ut
– Taking logarithms of both sides, applying the laws of logs and rearranging the right-hand side (RHS).
• Then let ln(A) = α, yt = ln Yt and xt = ln Xt:
yt = α + βxt + ut
• This is known as the exponential regression model. Y varies according to some exponent
(power) function of X.
• Here, the coefficients can be interpreted as elasticities.
• Thus a coefficient estimate of 1.2 for β̂ in both equations is interpreted as stating that 'a rise in X of 1% will lead on average, everything else being equal, to a rise in Y of 1.2%'.
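A sketch of the transformation in practice, on simulated data with an assumed A = 2 and elasticity β = 1.2 (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A, beta = 2.0, 1.2
X = rng.uniform(1.0, 100.0, 1000)
u = rng.normal(0.0, 0.1, 1000)
Y = A * X**beta * np.exp(u)           # Y_t = A * X_t^beta * e^(u_t)

x, y = np.log(X), np.log(Y)           # the model is linear in the parameters after logging
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
a = y.mean() - b * x.mean()
print(round(np.exp(a), 2), round(b, 2))  # close to A = 2.0 and the elasticity 1.2
```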
g) Precision and standard errors
• Any set of regression estimates α̂ and β̂ is specific to the sample used in its estimation; different samples of xt and yt will produce different values of the OLS estimates.
• Recall that the estimators of α and β from the sample data (α̂ and β̂) are given by:
β̂ = (Σt xt yt − T x̄ȳ) / (Σt xt² − T x̄²)   and   α̂ = ȳ − β̂x̄
• The standard errors of these estimates are given by:
SE(α̂) = s √[ Σ xt² / (T Σ(xt − x̄)²) ] = s √[ Σ xt² / (T(Σ xt² − T x̄²)) ]
SE(β̂) = s √[ 1 / Σ(xt − x̄)² ] = s √[ 1 / (Σ xt² − T x̄²) ]
where s is the standard error of the equation / regression.
Precision and Standard Errors
• It is worth noting that the standard errors give only a general indication of the
likely accuracy of the regression parameters.
• They do not show how accurate a particular set of coefficient estimates is.
• If the standard errors are small, it shows that the coefficients are likely to be
precise on average, not how precise they are for this particular sample.
• s, the SE of the regression, is estimated by:
s = √[ Σ ût² / (T − 2) ]
where Σ ût² is the residual sum of squares and T is the sample size (s² is an unbiased estimator of the error variance).
• s is also known as the standard error of the regression or the standard error of the
estimate.
– Everything else being equal, the smaller this quantity is, the closer is the fit of the line to the
actual data.
Some Comments on the Standard Error Estimators
1. Both SE(α̂) and SE(β̂) depend on s² (or s). s² is the estimate of the error variance.
The greater the variance s2, then the more dispersed the errors are about their mean
value and therefore the more dispersed y will be about its mean value.
The larger this quantity is, the more dispersed are the residuals, and so the greater is
the uncertainty in the model.
If s2 is large, the data points are collectively a long way away from the line.
2. The sum of the squares of x about their mean appears in both formulae.
Σ(xt − x̄)² appears in the denominators.
The larger this sum of squares, the smaller the coefficient variances.
Some Comments on the Standard Error Estimators (cont’d)
[Figure: two scatter plots of y against x with fitted lines; in one the x values are widely dispersed about their mean, in the other they are tightly clustered, illustrating how the spread of x affects the precision of the fitted line]
3. The larger the sample size, T, the smaller will be the coefficient variances. T appears explicitly in SE(α̂) and implicitly in SE(β̂).
T appears implicitly since the sum Σ(xt − x̄)² runs from t = 1 to T.
4. The term Σ xt² appears only in SE(α̂), and thus affects only the precision of the intercept estimate.
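Points 2 and 3 can be seen directly from the formula for SE(β̂); a sketch comparing clustered against dispersed x values (simulated, with s fixed at 1 for illustration):

```python
import numpy as np

def se_beta(x, s=1.0):
    """SE(beta_hat) = s * sqrt(1 / sum (x_t - xbar)^2), for a given s."""
    return s * np.sqrt(1.0 / np.sum((x - x.mean())**2))

print(se_beta(np.linspace(9.9, 10.1, 50)))  # x clustered near its mean: large SE
print(se_beta(np.linspace(0.0, 20.0, 50)))  # same T, dispersed x: much smaller SE
print(se_beta(np.linspace(0.0, 20.0, 500))) # larger T shrinks the SE further
```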
Example: Calculating the Parameters and Standard Errors
• Calculations, using T = 22, x̄ = 416.5, ȳ = 86.65, Σ xt yt = 830102, Σ xt² = 3919654 and Σ ût² = 130.6:
β̂ = (Σ xt yt − T x̄ȳ) / (Σ xt² − T x̄²) = (830102 − 22 × 416.5 × 86.65) / (3919654 − 22 × 416.5²) = 0.35
α̂ = ȳ − β̂x̄ = 86.65 − 0.35 × 416.5 = −59.12
• SE(regression): s = √[ Σ ût² / (T − 2) ] = √(130.6 / 20) = 2.55
SE(α̂) = s √[ Σ xt² / (T(Σ xt² − T x̄²)) ] = 2.55 × √[ 3919654 / (22 × (3919654 − 22 × 416.5²)) ] = 3.35
SE(β̂) = s √[ 1 / (Σ xt² − T x̄²) ] = 2.55 × √[ 1 / (3919654 − 22 × 416.5²) ] = 0.0079
• We now write the results as
ŷt = −59.12 + 0.35xt
        (3.35)    (0.0079)
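The arithmetic above can be replicated from the quoted summary statistics alone; a sketch (the notes round s to 2.55 before computing the standard errors, which is followed here):

```python
import numpy as np

# Summary statistics quoted in the notes
T = 22
sum_xy, sum_x2 = 830102.0, 3919654.0
x_bar, y_bar = 416.5, 86.65
rss = 130.6                                 # residual sum of squares

ssx = sum_x2 - T * x_bar**2                 # sum of squares of x about its mean
beta = (sum_xy - T * x_bar * y_bar) / ssx
print(round(beta, 2))                       # 0.35

print(y_bar - 0.35 * x_bar)                 # -59.125 -> the notes' alpha_hat of -59.12

s = np.sqrt(rss / (T - 2))
print(s)                                    # 2.5554..., quoted as 2.55

print(2.55 * np.sqrt(sum_x2 / (T * ssx)))   # 3.3495 -> SE(alpha_hat) = 3.35
print(2.55 * np.sqrt(1.0 / ssx))            # 0.0079 -> SE(beta_hat)
```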