
AE6207 QUANTITATIVE METHODS

SIMPLE LINEAR REGRESSION MODEL


Trimester 2, 2021 - 2022

Introduction

In applied economics, we are often interested in the causal relationships between variables.

Consider first the simple case of two variables, for example, expenditure on vacation (Y) and income (X).

Economic theory may postulate that the expected vacation expenditure bears a linear relationship to income. This can be represented as

    E(Y|X) = α + βX    (1)

For the ith household, this expectation gives

    E(Y_i|X_i) = α + βX_i    (2)

The actual vacation expenditure of the ith household is Y_i, so the discrepancy between the actual and expected expenditures is denoted by u_i, where

    u_i = Y_i − E(Y_i|X_i) = Y_i − α − βX_i    (3)

The discrepancy term, commonly referred to as the disturbance term, u_i, represents the net influence of factors other than income on vacation expenditure. These factors might include such things as the number and ages of household members, accumulated savings, and so forth.

Taking the conditional expectation of u_i, we have

    E(u_i|X_i) = E(Y_i − α − βX_i | X_i) = 0    (4)

Similarly, the conditional variance of u_i, denoted by σ²_{y|x_i}, may well vary with X_i. For the moment we assume that the conditional variance is constant and independent of income.

Finally, we assume that the disturbances are independent of each other.

Bringing these assumptions together gives

    E(u_i) = 0, ∀i
    var(u_i) = σ², ∀i    (5)
    cov(u_i, u_j) = E(u_i u_j) = 0, i ≠ j

Adding the assumption of normality, these assumptions can be concisely stated as u_i ∼ iid N(0, σ²).

It follows from E(u_i) = 0 and the fixed regressor assumption that

    E(X_i u_j) = X_i E(u_j) = 0, ∀i, j    (6)

Now suppose the available data come in time series form and that

    X_t = aggregate real disposable personal income in year t
    Y_t = aggregate real vacation expenditure in year t

where t = 1, 2, ..., T.

The series X_t is no longer a set of sample values from the distribution of all N incomes in any year: it is the actual sum of all incomes in each year.

In that case the conditional distribution f(Y|X) can still be given a probabilistic formulation. To see this reasoning, return to the cross-section formulation and introduce the time subscript. Thus

    Y_it = α + βX_it + u_it    (7)

where Y_it = real vacation expenditure by the ith household in year t, and X_it = real disposable income of the ith household in year t.

Assuming (implausibly) that α and β are the same for all households, aggregating over all N households in equation (7), we have

    Σ_i Y_it = Nα + β Σ_i X_it + Σ_i u_it

or, absorbing the factor N into the intercept,

    Y_t = α + βX_t + u_t    (8)

The stochastic disturbance term u_t is assumed to be iid(0, ω²).
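As a concrete illustration of this set-up, the following sketch (added here for exposition; it is not part of the original derivation) generates data satisfying assumptions (5) while holding the X values fixed. It assumes numpy is available; the parameter values and seed are made up:

```python
# Minimal simulation of the two-variable model Y_i = alpha + beta*X_i + u_i
# with u_i ~ iid N(0, sigma^2) and a fixed regressor X.
import numpy as np

rng = np.random.default_rng(42)      # illustrative seed
alpha, beta, sigma = 2.0, 0.5, 1.0   # hypothetical parameter values
n = 100

X = np.linspace(10, 50, n)           # "fixed regressor": the same X in every sample
u = rng.normal(0.0, sigma, size=n)   # E(u)=0, var(u)=sigma^2, cov(u_i,u_j)=0 for i!=j
Y = alpha + beta * X + u             # conditional mean E(Y|X) plus disturbance
```

Later snippets in these notes reuse these illustrative arrays.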
Estimation

Whether the sample data are of cross section or time series form, the simplest version of the two-variable model is Y_i = α + βX_i + u_i, with u_i being iid(0, σ²).

There are three parameters to be estimated in the model, namely α, β and σ².

One popular estimation method is known as ordinary least squares (OLS) estimation.

Let the estimators of α and β be a and b, respectively. The estimated equation for E(Y_i|X_i) is then

    Ŷ_i = a + bX_i    (9)

and the error of estimation (the residual), e_i, is

    e_i = Y_i − Ŷ_i = Y_i − a − bX_i    (10)

The least squares principle is to find a and b that minimize the residual sum of squares:

    min_{a,b} RSS = Σ e_i² = Σ (Y_i − a − bX_i)²    (11)

The necessary conditions are

    ∂(Σ e_i²)/∂a = −2 Σ (Y_i − a − bX_i) = 0    (12)

    ∂(Σ e_i²)/∂b = −2 Σ X_i(Y_i − a − bX_i) = 0    (13)

Summing the terms in equations (12) and (13), the first-order conditions can be written as

    Σ Y_i = na + b Σ X_i    (14)

    Σ X_iY_i = a Σ X_i + b Σ X_i²    (15)

Equation (14) gives

    a = Ȳ − bX̄    (16)

and substituting this into equation (15) gives

    b = (Σ X_iY_i − a Σ X_i) / Σ X_i²
      = (Σ X_iY_i − (Ȳ − bX̄) Σ X_i) / Σ X_i²

so that

    b Σ X_i² = Σ X_iY_i − nX̄Ȳ + b nX̄²

    b (Σ X_i² − nX̄²) = Σ X_iY_i − nX̄Ȳ

    b = (Σ X_iY_i − nX̄Ȳ) / (Σ X_i² − nX̄²)    (17)

Equation (17) can also be expressed as

    b = Σ x_iy_i / Σ x_i²    (18)

where x_i = X_i − X̄ and y_i = Y_i − Ȳ.

Equation (18) can be further expressed as

    b = (Σ x_iy_i / n) / (Σ x_i² / n)
      = [ (Σ x_iy_i / n) / (√(Σ x_i²/n) √(Σ y_i²/n)) ] · [ √(Σ y_i²/n) / √(Σ x_i²/n) ]
      = r (s_y / s_x)    (19)

where r is the sample correlation coefficient between X and Y, and s_x and s_y are the sample standard deviations of X and Y.

The least squares line has three important properties:
1. It minimizes the sum of squared residuals.
2. It passes through the point (X̄, Ȳ).
3. The least-squares residuals have zero correlation in the sample with the values of X.

To estimate σ², one possibility is the sample variance of the least squares residuals. However, for reasons explained below, the usual estimator is

    s² = Σ e_i² / (n − 2)    (20)
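The formulas above translate directly into code. A minimal sketch (illustrative only; it reuses the simulated X and Y from the earlier snippet, and the helper name ols_two_variable is our own):

```python
import numpy as np

def ols_two_variable(X, Y):
    """Two-variable OLS for Y_i = alpha + beta*X_i + u_i; returns (a, b, e, s2)."""
    x = X - X.mean()                      # deviations x_i = X_i - Xbar
    y = Y - Y.mean()                      # deviations y_i = Y_i - Ybar
    b = (x * y).sum() / (x ** 2).sum()    # slope, eq (18)
    a = Y.mean() - b * X.mean()           # intercept, eq (16)
    e = Y - (a + b * X)                   # residuals, eq (10)
    s2 = (e ** 2).sum() / (len(Y) - 2)    # disturbance variance estimate, eq (20)
    return a, b, e, s2

a, b, e, s2 = ols_two_variable(X, Y)      # X, Y: the simulated data from earlier
```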

Decomposition of the TSS

Using equations (10) and (16), we can express the residuals in terms of x and y:

    e_i = y_i − b x_i

Squaring both sides and summing over the sample observations gives

    Σ e_i² = Σ y_i² − 2b Σ x_iy_i + b² Σ x_i²

Substitution by equation (18) gives

    Σ y_i² = b² Σ x_i² + Σ e_i²
           = b Σ x_iy_i + Σ e_i²
           = r² Σ y_i² + Σ e_i²    (21)

This decomposition of the sum of squares is usually written as

    TSS = ESS + RSS    (22)

where
TSS = total sum of squared deviations in the Y variable,
ESS = explained sum of squares from the regression of Y on X,
RSS = residual, or unexplained, sum of squares from the regression of Y on X.

Equation (21) can be rearranged to give

    r² = 1 − RSS/TSS = ESS/TSS    (23)

r² may be interpreted as the proportion of the Y variation attributable to the linear regression on X.

Inference in the Two-Variable Regression

α, β and σ² are parameters of the linear regression model. They can be estimated from random samples drawn from the population of interest using the least-squares approach.

Two important questions relating to these estimators are:
1. What are the properties of these estimators?
2. How may these estimators be used to make inferences about α and β?

The answers to both these questions depend on the sampling distribution of the least-squares estimators.

In repeated samples where we hold the X values constant, the Y values will vary because of the stochastic disturbance term u.

The varying Y values with fixed X values in repeated samples give varying sample values of the least-squares estimates a, b and s².

It can be shown that a and b are unbiased for α and β.

From equation (18), note that b can be expressed as

    b = Σ w_i y_i    (24)

where

    w_i = x_i / Σ x_i²    (25)

Note that the weights have the following properties:

    Σ w_i = 0,   Σ w_i² = 1/Σ x_i²,   Σ w_i x_i = Σ w_i X_i = 1    (26)

It then follows that

    b = Σ w_i y_i = Σ w_i Y_i    (27)

(since Σ w_i Ȳ = 0 by the first property in (26)). Note that the least-squares estimator b is a linear combination of the Y values. Substituting for Y in equation (27) gives

    b = α Σ w_i + β Σ w_i X_i + Σ w_i u_i
      = β + Σ w_i u_i    (28)

Taking expectations, we then have

    E(b) = β    (29)

Hence, the least-squares estimator b is unbiased for β.
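The decomposition and r² can be checked numerically (a sketch continuing the illustrative snippets above):

```python
import numpy as np

x = X - X.mean()
y = Y - Y.mean()
TSS = (y ** 2).sum()                   # total sum of squared deviations in Y
RSS = (e ** 2).sum()                   # residual (unexplained) sum of squares
ESS = TSS - RSS                        # explained sum of squares, eq (22)
r2 = ESS / TSS                         # coefficient of determination, eq (23)
assert np.isclose(ESS, b ** 2 * (x ** 2).sum())   # eq (21): ESS = b^2 * sum(x_i^2)
```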
From equation (28),

    V(b) = E[(b − β)²] = E[(Σ w_i u_i)²] = Σ w_i² E(u_i²) = σ² Σ w_i² = σ² / Σ x_i²    (30)

(the cross-product terms vanish because the disturbances are uncorrelated).

In a similar way, it can be shown that

    E(a) = α    (31)

    V(a) = σ² (1/n + X̄²/Σ x_i²)    (32)

The two estimators, a and b, are generally correlated, with covariance given by

    Cov(a, b) = −σ² X̄ / Σ x_i²    (33)

The covariance vanishes if X̄ = 0.

The least-squares estimators are linear combinations of the Y variable and they are unbiased. They are said to belong to the class of linear unbiased estimators.

In addition, they can be shown to have the smallest variance in this class. That is, for any linear unbiased estimator

    b* = Σ c_i Y_i

it can be shown that

    V(b*) = V(b) + σ² Σ (c_i − w_i)²

Since Σ (c_i − w_i)² ≥ 0, V(b*) ≥ V(b). Equality holds only when c_i = w_i for all i, that is, when b* = b.

The least squares estimator thus has minimum variance in the class of linear unbiased estimators and is said to be a best linear unbiased estimator, or BLUE.
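These sampling results can be illustrated by Monte Carlo (a sketch only, reusing the made-up parameters alpha, beta, sigma and the fixed X from the first snippet): holding X fixed and redrawing the disturbances, the empirical mean and variance of b should approximate equations (29) and (30):

```python
import numpy as np

rng = np.random.default_rng(0)
reps = 20_000
x = X - X.mean()
b_draws = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, sigma, size=len(X))   # fresh disturbances, same fixed X
    Yr = alpha + beta * X + u
    yr = Yr - Yr.mean()
    b_draws[r] = (x * yr).sum() / (x ** 2).sum()

print(b_draws.mean(), beta)                          # E(b) = beta, eq (29)
print(b_draws.var(), sigma ** 2 / (x ** 2).sum())    # V(b) = sigma^2/sum(x_i^2), eq (30)
```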
Inference Procedures

The results established so far require the assumption that u_i ∼ iid(0, σ²).

If it is further assumed that u_i ∼ iid N(0, σ²), then we have the sampling distributions of the least squares estimators as

    b ∼ N(β, σ²/Σ x_i²)    (34)

    a ∼ N(α, σ² (1/n + X̄²/Σ x_i²))    (35)

The square roots of the variances of a and b are often referred to as the standard errors of a and b, respectively.

If σ² is known, we can use these results to form confidence intervals and test hypotheses about α and β.

For example, a 95% confidence interval for β is

    b ± 1.96 σ/√(Σ x_i²)    (36)

and the test statistic for testing H0: β = β0 is

    (b − β0) / (σ/√(Σ x_i²)) = (b − β0) / s.e.(b)    (37)

Comparing this test statistic value with the critical value from the standard normal distribution at the appropriate level of significance will lead to either rejecting or not rejecting the null hypothesis.

When σ² is unknown, the above inferential procedures are not feasible. We need two further results to derive an operational procedure.

It can be shown that

    Σ e_i² / σ² ∼ χ²_{n−2}    (38)

and that Σ e_i² is distributed independently of a and b.

Then

    [(b − β0)/(σ/√(Σ x_i²))] / √[Σ e_i²/((n − 2)σ²)] = (b − β0)/(s/√(Σ x_i²)) ∼ t_{n−2}    (39)

where s = √(Σ e_i²/(n − 2)) is an estimator of σ.

Note that equation (39) has the same structure as that of equation (37), the only difference being that the unknown σ is replaced by its estimate s.

Using equation (39), the 95% confidence interval for β is

    b ± t_{n−2,0.975} s/√(Σ x_i²)    (40)

and the test statistic for testing H0: β = β0 is

    (b − β0)/(s/√(Σ x_i²))    (41)

H0: β = β0 will be rejected at the 100α% level of significance if

    |(b − β0)/(s/√(Σ x_i²))| > t_{n−2,1−α/2}    (42)

Similarly, tests on the intercept are based on the t-distribution:

    (a − α) / (s √(1/n + X̄²/Σ x_i²)) ∼ t_{n−2}    (43)

Thus, a (1−α)100% confidence interval for α is given by

    a ± t_{n−2,1−α/2} s √(1/n + X̄²/Σ x_i²)    (44)

The hypothesis H0: α = α0 will be rejected at the 100α% level of significance if

    |(a − α0) / (s √(1/n + X̄²/Σ x_i²))| > t_{n−2,1−α/2}    (45)

Tests on σ² may be derived from the result stated in equation (38). Using that result, we can write

    P(χ²_{n−2,α/2} < (n − 2)s²/σ² < χ²_{n−2,1−α/2}) = 1 − α,   0 < α < 1    (46)

Hence, the (1−α)100% confidence interval for σ² is

    [ (n − 2)s² / χ²_{n−2,1−α/2} ,  (n − 2)s² / χ²_{n−2,α/2} ]    (47)

The hypothesis H0: σ² = σ0² will be rejected at the 100α% level of significance if

    (n − 2)s²/σ0² < χ²_{n−2,α/2}   or   (n − 2)s²/σ0² > χ²_{n−2,1−α/2}    (48)
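A sketch of these t-based procedures (illustrative; scipy is assumed for the t quantiles, and the variables continue from the earlier snippets):

```python
import numpy as np
from scipy import stats

n = len(Y)
x = X - X.mean()
s = np.sqrt(s2)                                             # from eq (20)
se_b = s / np.sqrt((x ** 2).sum())                          # estimated s.e.(b)
se_a = s * np.sqrt(1 / n + X.mean() ** 2 / (x ** 2).sum())  # estimated s.e.(a)

tcrit = stats.t.ppf(0.975, df=n - 2)              # t_{n-2, 0.975}
ci_beta = (b - tcrit * se_b, b + tcrit * se_b)    # 95% interval for beta, eq (40)
ci_alpha = (a - tcrit * se_a, a + tcrit * se_a)   # 95% interval for alpha, eq (44)
t_beta = b / se_b                                 # eq (41) with beta0 = 0
reject = abs(t_beta) > tcrit                      # decision rule (42)
```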

Analysis of Variance

Equation (37) offers one way to test for the significance of X (H0: β = 0).

This hypothesis may also be tested in an analysis of variance framework.

From the sampling distribution of b, we have

    (b − β) / (σ/√(Σ x_i²)) ∼ N(0, 1)   ⇒   (b − β)² Σ x_i² / σ² ∼ χ²_1    (49)

Also, from equation (38) we have Σ e_i²/σ² ∼ χ²_{n−2}. Further, it can be shown that Σ e_i² is distributed independently of b.

Dividing these two chi-square random variables by their respective degrees of freedom and forming the ratio of the resulting expressions gives a random variable with a known F-distribution:

    F = (b − β)² Σ x_i² / [Σ e_i²/(n − 2)] ∼ F_{1,n−2}    (50)

Note that the unknown σ² is absent from this statistic.

To test the hypothesis H0: β = 0, the test statistic is obtained from equation (50) when we set β = 0:

    F = b² Σ x_i² / [Σ e_i²/(n − 2)] ∼ F_{1,n−2}    (51)

By referring to the decomposition of the sum of squares in equation (21), the test statistic is seen to be

    F = (ESS/1) / (RSS/(n − 2))    (52)

An ANOVA table can be set out to represent equation (52):

    Source of Variation   SS               DoF     Mean Squares
    (1)                   (2)              (3)     (4)
    X                     ESS = b²Σ x_i²   1       ESS/1
    Residual              RSS = Σ e_i²     n − 2   RSS/(n − 2)
    Total                 TSS = Σ y_i²     n − 1   TSS/(n − 1)

The F-statistic in equation (52) is the ratio of the mean square due to X to the residual mean square. The latter may be regarded as a measure of the "noise" in the system, and thus an X effect is only detected if it is greater than the inherent noise level.

The significance of X is thus tested by examining whether the sample F exceeds the appropriate critical value of F taken from the upper tail of the F-distribution.

The test procedure is then to reject H0: β = 0 at the α level of significance if

    F = (ESS/1)/(RSS/(n − 2)) > F_{1,n−2;1−α}    (53)

Using equation (21) again, another equivalent way of presenting the test statistic is

    F = r²(n − 2) / (1 − r²)    (54)

Taking the square root of equation (54) gives a t-statistic:

    t = r √(n − 2) / √(1 − r²)    (55)

Either statistic in equations (54) and (55) may also be used to test the hypothesis H0: β = 0.
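The F-test and the identities linking equations (51)-(55) can be verified numerically (a sketch continuing the earlier illustrative snippets; scipy supplies the F distribution):

```python
import numpy as np
from scipy import stats

RSS = (e ** 2).sum()
ESS = (b ** 2) * (x ** 2).sum()
F = (ESS / 1) / (RSS / (n - 2))                  # eq (52), identical to eq (51)
r2 = ESS / (ESS + RSS)
assert np.isclose(F, r2 * (n - 2) / (1 - r2))    # eq (54)
assert np.isclose(np.sqrt(F), abs(t_beta))       # eq (55): F = t^2 when beta0 = 0
reject = F > stats.f.ppf(0.95, 1, n - 2)         # rule (53) at the 5% level
```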

Prediction

Having estimated the linear relationship between Y and X, we are often interested in using it to predict Y for a given value of X.

For example, if X = X0, what would be the Y value corresponding to it?

X0 may be a value within the range of the sample observations, or it may lie outside this range. In either case, we assume the linear relationship continues to hold within or without the sample range.

Alternatively, given an observed pair of values (X0, Y0), we may be interested to ask whether these values could have come from the same population as the sample data.

Prediction may be in the form of a point or an interval prediction.

Point prediction suffers from the weakness of bearing no information on the magnitude of the error of prediction.

The point prediction is given by

    Ŷ0 = a + bX0 = Ȳ + b x0    (56)

where x0 = X0 − X̄. The actual value of Y in the prediction period is

    Y0 = α + βX0 + u0    (57)

The average value of Y taken over the sample observations is

    Ȳ = α + βX̄ + ū    (58)

Subtracting equation (58) from equation (57) gives

    Y0 = Ȳ + β x0 + u0 − ū    (59)

The prediction error is defined as

    e0 = Y0 − Ŷ0 = −(b − β) x0 + u0 − ū    (60)

Note that E(e0) = 0, so that Ŷ0 is a linear unbiased predictor of Y0.

The variance of e0 can be shown to be

    V(e0) = σ² (1 + 1/n + x0²/Σ x_i²)    (61)

Note from equation (60) that e0 is a linear combination of normally distributed variables (b, u0 and ū). Thus it is also normally distributed, and so

    e0 / (σ √(1 + 1/n + x0²/Σ x_i²)) ∼ N(0, 1)    (62)

Replacing the unknown σ by its estimate s then gives

    e0 / (s √(1 + 1/n + x0²/Σ x_i²)) ∼ t_{n−2}    (63)

i.e.

    (Y0 − Ŷ0) / (s √(1 + 1/n + x0²/Σ x_i²)) ∼ t_{n−2}    (64)

A (1−α)100% confidence interval for Y0 is then

    Ŷ0 ± t_{n−2,1−α/2} s √(1 + 1/n + x0²/Σ x_i²)    (65)
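A sketch of the interval prediction in equation (65); the value X0 = 30.0 is an arbitrary illustration, and the other names continue from the earlier snippets:

```python
import numpy as np
from scipy import stats

X0 = 30.0                                    # hypothetical new X value
x0 = X0 - X.mean()
Y0_hat = a + b * X0                          # point prediction, eq (56)
se0 = s * np.sqrt(1 + 1 / n + x0 ** 2 / (x ** 2).sum())   # sqrt of eq (61), s for sigma
tcrit = stats.t.ppf(0.975, df=n - 2)
interval_Y0 = (Y0_hat - tcrit * se0, Y0_hat + tcrit * se0)   # 95% interval, eq (65)
```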
Even if we know α and β with certainty, there is an inherent element of uncertainty in predicting Y0, owing to the random drawing u0 that occurs in the prediction period. If our interest centres on predicting the mean value of Y0, that is,

    E(Y0) = α + βX0    (66)

the uncertainty related to u0 is eliminated. Following the same analysis as above, a (1−α)100% confidence interval for E(Y0) is then

    Ŷ0 ± t_{n−2,1−α/2} s √(1/n + x0²/Σ x_i²)    (67)

Note from equations (65) and (67) that the width of the interval increases symmetrically the further X0 is from the sample mean X̄.

Time As a Regressor

Many economic variables increase or decrease with time. A linear trend relationship would then be modeled as

    Y = α + βT + u    (68)

where T indicates time.

The T variable can be specified in many ways, but each specification requires one to define the origin from which time is measured and the unit of measurement that is used.

For example, if we had annual observations on some variable for the years from 1980 to 1992 (n = 13 years), possible specifications of the T variable would be

    T = 1980, 1981, ..., 1992
    T = 1, 2, 3, ..., 13
    T = −6, −5, ..., 5, 6

In all three cases the unit of measurement is a year. For the second and third cases, T = 0 corresponds to 1979 and 1986, respectively.

Taking first differences of Eq. (68) gives

    ΔY_t = β + (u_t − u_{t−1})

Ignoring the disturbances, the implication of Eq. (68) is that the series increases (decreases) by a constant amount each period.

For an increasing series (β > 0), this implies a decreasing growth rate, and for a decreasing series (β < 0), the specification gives an increasing decline rate.

For a series with an underlying constant growth rate, whether positive or negative, Eq. (68) is then an inappropriate specification.

The appropriate specification expresses the logarithm of the series as a linear function of time. This can be seen as follows.

If a series grows at a rate r per time period, starting from an initial value Y0, the accumulated value after t periods will be

    Y_t = Y0 (1 + r)^t    (69)

However, if the growth rate is r/m per 1/m of a time period, the accumulated value after t periods will then be

    Y_t = Y0 (1 + r/m)^{mt}    (70)

If we allow m → ∞, it can be shown that lim_{m→∞} (1 + r/m)^{mt} = e^{rt}, in which case the accumulated sum will be

    Y_t = Y0 e^{rt}    (71)

Taking the logarithm gives

    ln Y_t = α + rt    (72)

where α = ln Y0.

Thus, relating the logarithm of Y_t to t gives the coefficient r as the constant continuous rate of change per unit time.

For t = 1, Eq. (72) gives

    ln Y1 = ln Y0 + r
    r = ln Y1 − ln Y0    (73)

Thus the difference of the logarithms of Y_t one period apart gives the constant continuously compounded rate of change per unit time.

The equivalent discrete rate of change per unit time, g, can be obtained as

    g = e^r − 1    (74)
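A small illustration of fitting the constant-growth specification (72) and recovering the discrete rate via (74); the series below is fabricated purely for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(1, 21)                            # 20 periods, illustrative
Yt = 100.0 * 1.05 ** t * np.exp(rng.normal(0, 0.02, t.size))   # ~5% growth plus noise

lnY = np.log(Yt)
td, ld = t - t.mean(), lnY - lnY.mean()
r_hat = (td * ld).sum() / (td ** 2).sum()       # slope of ln(Y) on t, eq (72)
g_hat = np.exp(r_hat) - 1.0                     # discrete growth rate, eq (74)
print(r_hat, g_hat)                             # near ln(1.05) = 0.0488 and 0.05
```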
Transformations of Variables

The procedures of the simple linear model above can be applied to nonlinear relations between the dependent variable and the regressor if the nonlinearity can be linearized by an appropriate transformation.

For example, the growth equation employed a log transformation of the dependent variable.

Log-Log Transformation

Many important econometric applications involve the logs of both variables. The relevant functional specification is

    Y = AX^β
    ln Y = α + β ln X    (75)

where α = ln A.

The elasticity of Y with respect to X is defined as the percentage change in Y per one percent change in X. Mathematically, it is expressed as

    ε = (dY/dX)(X/Y) = βAX^{β−1} (X/Y) = βAX^β / Y = β

Eq. (75) shows that the slope of a log-log specification is the elasticity. This equation therefore specifies a constant elasticity formulation.

Such specifications frequently appear in applied work. Different values of β imply different functional relationships between Y and X.

Semilog Transformations

The general formulation is

    ln Y = α + βX + u    (76)

The coefficient β represents the proportionate change in Y per unit change in X, as can be seen from

    d ln Y / dX = (1/Y)(dY/dX) = β    (77)

The semilog specification

    Y = α + β ln X    (78)

can be used to study the relationship between a class of expenditure, Y, and income, X.

A certain threshold level of income (X = e^{−α/β}, at which Y = 0) is needed before anything is spent on this commodity.

Expenditure then increases monotonically with income, but at a diminishing rate.

The marginal propensity (β/X) to consume this good declines with increasing income, and the elasticity (β/Y) also declines as income increases.

Reciprocal Transformations

Reciprocal transformations are useful in modeling situations where there are asymptotes for one or both variables. In general,

    (Y − α1)(X − α2) = α3    (79)

describes a rectangular hyperbola with asymptotes at Y = α1 and X = α2.

Eq. (79) may be expressed as

    Y = α1 + α3/(X − α2)    (80)

Adding a disturbance term to Eq. (80) will then result in a regression that is nonlinear in the α's.

There are two special cases of Eq. (80) where linearizing transformations are available. Setting α2 = 0 gives

    Y = α + β (1/X)    (81)

where α = α1 and β = α3.

Alternatively, setting α1 = 0 gives

    1/Y = α + βX    (82)

where α = −α2/α3 and β = 1/α3.

These special cases have been used to study the Phillips curves that relate wage or price change to the unemployment rate, and cross-section expenditure functions.
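Each linearized form can be estimated with the same two-variable OLS routine applied to transformed data. A sketch using the illustrative ols_two_variable helper and the simulated (positive) X and Y from earlier as stand-in data:

```python
import numpy as np

# Log-log, eq (75): regress ln(Y) on ln(X); the slope estimates the elasticity.
a_ll, b_ll, _, _ = ols_two_variable(np.log(X), np.log(Y))

# Reciprocal regressor, eq (81): regress Y on 1/X.
a_rx, b_rx, _, _ = ols_two_variable(1.0 / X, Y)

# Reciprocal dependent variable, eq (82): regress 1/Y on X.
a_ry, b_ry, _, _ = ols_two_variable(X, 1.0 / Y)
```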
Lagged Dependent Variable as Regressor

Time series observations often display trends such that successive values tend to be fairly close together.

One way of modeling such behaviour is by means of an autoregression. The simplest autoregressive scheme is

    Y_t = α + βY_{t−1} + u_t    (83)

This is called a first-order autoregressive scheme and is frequently denoted by AR(1).

For this model the properties of the least squares estimators and the associated inference procedures derived above are not strictly applicable, even though we continue to make the same assumptions about the disturbance term as in Eq. (5).

The reason is that Eq. (83) violates the fixed regressor assumption used in the derivation of the sampling distributions of the least squares estimators.

To see this point, substitute successively for the lagged Y term on the rhs of Eq. (83) to obtain

    Y1 = α + βY0 + u1
    Y2 = α + β(α + βY0 + u1) + u2 = α(1 + β) + β²Y0 + (u2 + βu1)
    ⋮
    Y_t = α(1 + β + β² + ⋯ + β^{t−1}) + β^t Y0 + u_t + βu_{t−1} + β²u_{t−2} + ⋯ + β^{t−1}u1    (84)

Now multiply Eq. (84) successively by u_t, u_{t−1}, u_{t−2}, etc. and take expectations. The result is

    E(Y_t u_t) = σ²
    E(Y_t u_{t−1}) = βσ²    (85)
    E(Y_t u_{t−2}) = β²σ²
    ⋮

Thus, Y_t is correlated with the current and all previous disturbances but uncorrelated with all future disturbances.

It follows that the regressor Y_{t−1} is uncorrelated with the current disturbance u_t but is correlated with all previous disturbances.

Thus the full set of zero covariances in Eq. (6) does not hold when the regressor is a lagged value of the dependent variable.

Least squares estimators are now biased, and the exact, finite-sample results derived above are no longer valid.

However, inference procedures based on least squares estimation of Eq. (83) can be given an asymptotic justification.

An Introduction to Asymptotics

Asymptotic theory derives results relating to the sampling distribution of a statistic as the sample size increases to infinity.

Two main concepts related to this theory are convergence in probability and convergence in distribution.

Convergence in Probability

Consider drawing a random sample of size n from a population with unknown pdf which possesses a finite mean μ and finite variance σ².

Let x̄_n be the sample mean, where the subscript n indicates the sample size on which the mean is based.

The question we ask is how a variable such as x̄_n and its pdf behave as n → ∞.

The x's are iid(μ, σ²) by assumption. It follows directly that

    E(x̄_n) = μ   and   V(x̄_n) = σ²/n

Thus x̄_n is an unbiased estimator for any sample size, and the variance tends to zero as n increases indefinitely.

It is then intuitively clear that the distribution of x̄_n, whatever its precise form, becomes more and more concentrated in the neighbourhood of μ as n increases.

Formally, if one defines a neighbourhood around μ as μ ± ε, the expression

    P(μ − ε < x̄_n < μ + ε) = P(|x̄_n − μ| < ε)

indicates the probability that x̄_n lies in the specified interval.

The interval may be made arbitrarily small by a suitable choice of ε.

Since V(x̄_n) decreases monotonically with increasing n, there exists a number n* and a δ (0 < δ < 1) such that for all n > n*
    P(|x̄_n − μ| < ε) > 1 − δ    (86)

The random variable x̄_n is then said to converge in probability to the constant μ. An equivalent statement is

    lim_{n→∞} P(|x̄_n − μ| < ε) = 1    (87)

In words, the probability of x̄_n lying in an arbitrarily small interval about μ can be made as close to unity as we desire by letting n become sufficiently large.

A shorthand way of writing Eq. (87) is

    plim x̄_n = μ    (88)

where plim stands for "probability limit".

The sample mean is then said to be a consistent estimator of μ.

The process is called convergence in probability.

In this example the estimator is unbiased for all sample sizes. Suppose that we have another estimator x̃_n of μ such that

    E(x̃_n) = μ + c/n

where c is some constant. The estimator is biased in finite samples, but

    lim_{n→∞} E(x̃_n) = μ

Provided V(x̃_n) → 0 as n → ∞, x̃_n is also a consistent estimator for μ.

This case is an example of convergence in mean square, which occurs when the limit of the expected value of the estimator is the parameter of interest, and the limit of the variance of the estimator is zero.

Convergence in mean square is a sufficient condition for consistency.

An extremely useful feature of probability limits is the ease with which the probability limits of functions of random variables may be obtained. For example, if we assume that a_n and b_n possess probability limits, then

    plim(a_n b_n) = plim(a_n) · plim(b_n)    (89)

    plim(a_n / b_n) = plim(a_n) / plim(b_n)    (90)

Convergence in Distribution

The next question we ask is how the pdf of x̄_n behaves with increasing n.

The form of the distribution is unknown, since the mean is a linear combination of x's whose distribution is assumed unknown.

However, since the variance goes to zero in the limit, the distribution collapses on μ.

The distribution is then said to be degenerate.

One then seeks an alternative statistic, some function of x̄_n, whose distribution will not degenerate.

A suitable alternative statistic is √n(x̄_n − μ)/σ, which has zero mean and unit variance.

The basic Central Limit Theorem states

    lim_{n→∞} P( √n(x̄_n − μ)/σ ≤ y ) = ∫_{−∞}^{y} (1/√(2π)) e^{−z²/2} dz    (91)

This is a remarkable and powerful result. Whatever the form of f(x), the limiting distribution of the statistic √n(x̄_n − μ)/σ is standard normal.

The process is labeled convergence in distribution, and an alternative way of expressing Eq. (91) is

    √n x̄_n →d N(√n μ, σ²)    (92)

to be read, "√n x̄_n tends in distribution to a normal variable with mean √n μ and variance σ²."

In practice the objective is to use x̄_n to make inferences about μ. This is done by taking the limiting form as an approximation for the unknown distribution of x̄_n. The relevant statement is

    x̄_n ∼ᵃ N(μ, σ²/n)    (93)

to be read, "x̄_n is asymptotically normally distributed with mean μ and variance σ²/n".

The unknown σ² can be replaced by the sample variance, s², which is a consistent estimator, and Eq. (93) used for inferences about μ.
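Both modes of convergence can be illustrated by simulation (a sketch; the chi-square population, sample sizes and seed are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
mu, sigma2 = 3.0, 6.0                      # mean and variance of a chi2(3) population
for n in (10, 100, 1_000):
    xbar = rng.chisquare(df=3, size=(2_000, n)).mean(axis=1)
    print(n, xbar.mean(), xbar.var())      # mean near mu; variance shrinks like sigma2/n
z = np.sqrt(1_000) * (xbar - mu) / np.sqrt(sigma2)   # standardized means, n = 1000
print(stats.kstest(z, "norm").pvalue)      # typically large: close to N(0,1), eq (91)
```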
The Autoregressive Equation

The autoregressive Eq. (83) may be estimated by the least squares formulae above.

If the estimated coefficients are a and b, it has been shown by Mann and Wald that √n(a − α) and √n(b − β) have a bivariate normal limiting distribution with zero mean and finite variances and covariances.

Thus the least squares estimators are consistent for α and β.

Moreover, the limiting variances and covariance may be consistently estimated by the least-squares formulae above.

Consequently, least-squares estimation of the autoregressive model has an asymptotic justification rather than exact, finite-sample validity.
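A Monte Carlo sketch of these points for the AR(1) model (83): the least-squares slope is biased in small samples, but the bias shrinks as n grows (all parameter values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(11)
alpha, beta = 1.0, 0.6                      # hypothetical AR(1) parameters

def ls_ar1(n):
    y = np.zeros(n + 1)
    for t in range(1, n + 1):               # generate Y_t = alpha + beta*Y_{t-1} + u_t
        y[t] = alpha + beta * y[t - 1] + rng.normal()
    Ylag, Ycur = y[:-1], y[1:]
    x = Ylag - Ylag.mean()
    return (x * (Ycur - Ycur.mean())).sum() / (x ** 2).sum()   # least-squares slope

for n in (25, 100, 1_000):
    b_draws = np.array([ls_ar1(n) for _ in range(2_000)])
    print(n, b_draws.mean() - beta)         # finite-sample bias, shrinking with n
```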
Tutorial Exercise No. 4

Question 1
Show the following properties of the weights w_i:

    Σ w_i = 0,   Σ w_i² = 1/Σ x_i²,   Σ w_i x_i = Σ w_i X_i = 1

where w_i = x_i / Σ x_i².

Question 2
Show that E(a) = α.

Question 3
Show that the hypothesis H0: β = 0 can also be tested with the statistic

    F = r²(n − 2) / (1 − r²)

Question 4
Show that cov(b, ū) = 0.

Question 5
Show that V(e0) = σ²(1 + 1/n + x0²/Σ x_i²).

Question 6
The file data_journals contains a small data set that provides information on the number of library subscriptions to economic journals in the year 2000. We are interested in the relation between the demand for economics journals and their price. A suitable measure of the price of journals is the price per citation. A scatterplot of ln(subs) against ln(price/citation) clearly shows that the number of subscriptions is decreasing with price.

[Figure: scatterplot of log(subs) vs log(price/cit)]

(a) Estimate a linear regression of ln(subs) on ln(price/citations).
(b) Estimate σ², the variance of the disturbance term.
(c) Provide a decomposition of the TSS into ESS and RSS. Determine what proportion of the variation in the logarithm of the number of subscriptions is 'explained' by the variation in the logarithm of the price per citation.
(d) Compute the t test statistics for testing H0: α = 0 and H0: β = 0.
(e) Verify that the F-test statistic for testing H0: β = 0 is the square of the corresponding t test statistic.
(f) For ln(price/citations) = 3.0, provide a point and 95% confidence interval estimate of E[ln(subs)].
(g) Find the 95% confidence interval estimate of ln(subs) when ln(price/citations) = 3.0.

Question 7
Fit a constant growth curve to the accompanying data, using two different specifications of the time variable.

    Year    Marijuana crop (10,000 tons)
    1985     38.1
    1986     80.0
    1987    170.4
    1988    354.5
    1989    744.4

Question 8
Show that log Y = α + β log X + u gives the same estimate of β whether logs are taken to base 10 or to base e. Is this true of the estimate of α? Do your conclusions need modification if log X is replaced by t (a time trend variable)?
Question 9
Discuss briefly the advantages and disadvantages of the relation

    v_i = α + β log v_0

as a representation of an Engel curve, where v_i is expenditure per person on commodity i and v_0 is income per week. Fit such a curve to the following data, and from your results estimate the income elasticity at an income of $500 per week. Does it make any difference to your estimate of the income elasticity if the logarithms of v_0 are taken to base 10 or base e?

    ($'00 per week)
    v_i    0.8   1.2   1.5   1.8   2.2   2.3   2.6   3.1
    v_0    1.7   2.7   3.6   4.6   5.7   6.7   8.1  12.0

Question 10
A response rate Y to a stimulus X is modeled by the function

    100/(100 − Y) = α + β/X

where Y is measured in percentage terms. Outline the properties of this function. Fit the function to the accompanying data.

    X    3    7   12   17   25   35   45   55   70  120
    Y   86   79   76   69   65   62   52   51   51   48
