◼ Exercises
1. Let $X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$.
a. The normal equations are given by (3-12), $X'e = 0$ (we drop the minus sign), hence for each of the columns of $X$, $x_k$, we know that $x_k'e = 0$. This implies that $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} x_i e_i = 0$.
b. Use $\sum_{i=1}^{n} e_i = 0$ to conclude from the first normal equation that $a = \bar{y} - b\bar{x}$.
c. We know that $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} x_i e_i = 0$. It follows then that $\sum_{i=1}^{n} (x_i - \bar{x}) e_i = 0$ because $\sum_{i=1}^{n} \bar{x} e_i = \bar{x} \sum_{i=1}^{n} e_i = 0$. Substitute for $e_i$ to obtain $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - a - b x_i) = 0$, or $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y} - b(x_i - \bar{x})) = 0$. Then, $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = b \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})$, so
$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$
d. The first derivative vector of $e'e$ is $-2X'e$. (The normal equations.) The second derivative matrix is $\partial^2(e'e)/\partial b\,\partial b' = 2X'X$. We need to show that this matrix is positive definite. The diagonal elements are $2n$ and $2\sum_{i=1}^{n} x_i^2$, which are clearly both positive. The determinant is
$$(2n)\left(2\sum_{i=1}^{n} x_i^2\right) - \left(2\sum_{i=1}^{n} x_i\right)^2 = 4n\sum_{i=1}^{n} x_i^2 - 4(n\bar{x})^2 = 4n\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right] = 4n\sum_{i=1}^{n} (x_i - \bar{x})^2,$$
which is positive as long as the $x_i$ are not all equal. Note that a much simpler proof appears after (3-6).
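As a quick numerical check of parts a-d, here is a minimal numpy sketch; the simulated data, seed, and coefficient values are illustrative only and not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)   # illustrative data

X = np.column_stack([np.ones(n), x])     # rows are (1, x_i)
ab = np.linalg.solve(X.T @ X, X.T @ y)   # (a, b) solving the normal equations
e = y - X @ ab                           # least squares residuals

# Part a: X'e = 0, i.e., the sums of e_i and of x_i * e_i are both zero.
assert np.allclose(X.T @ e, 0.0)

# Parts b and c: closed-form intercept and slope match the solved coefficients.
b = ((x - x.mean()) @ (y - y.mean())) / ((x - x.mean()) @ (x - x.mean()))
a = y.mean() - b * x.mean()
assert np.allclose([a, b], ab)

# Part d: 2X'X is positive definite (all eigenvalues strictly positive).
assert np.all(np.linalg.eigvalsh(2 * X.T @ X) > 0)
```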
4. What is the result of the matrix product M1M where M1 is defined in (3-19) and M is defined in (3-14)?
$$M_1 M = (I - X_1(X_1'X_1)^{-1}X_1')(I - X(X'X)^{-1}X') = M - X_1(X_1'X_1)^{-1}X_1'M$$
There is no need to multiply out the second term. Since $M$ is symmetric, $X_1'M = (MX_1)'$. Each column of $MX_1$ is the vector of residuals in the regression of the corresponding column of $X_1$ on all of the columns in $X$. Since each such column is one of the columns in $X$, this regression provides a perfect fit, so the residuals are zero. Thus, $MX_1$ is a matrix of zeroes, which implies that $M_1 M = M$.
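The identity $M_1M = M$ is easy to confirm numerically; the following is a minimal sketch with arbitrary simulated regressors (the dimensions and seed are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
n, k1, k2 = 40, 2, 3
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
X = np.hstack([X1, X2])                  # X contains every column of X1

# Residual makers: M annihilates X, M1 annihilates X1.
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)

assert np.allclose(M @ X1, 0.0)          # regressing X1 on X fits perfectly
assert np.allclose(M1 @ M, M)            # hence M1 M = M
```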
5. The original X matrix has $n$ rows. We add an additional row, $x_s'$. The new $y$ vector likewise has an additional element. Thus,
$$X_{n,s} = \begin{bmatrix} X_n \\ x_s' \end{bmatrix} \quad\text{and}\quad y_{n,s} = \begin{bmatrix} y_n \\ y_s \end{bmatrix}.$$
The new coefficient vector is $b_{n,s} = (X_{n,s}'X_{n,s})^{-1}(X_{n,s}'y_{n,s})$. The matrix is $X_{n,s}'X_{n,s} = X_n'X_n + x_s x_s'$. To invert this, use (A-66):
$$(X_{n,s}'X_{n,s})^{-1} = (X_n'X_n)^{-1} - \frac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}\,(X_n'X_n)^{-1}x_s x_s'(X_n'X_n)^{-1}.$$
The vector is $X_{n,s}'y_{n,s} = X_n'y_n + x_s y_s$. Multiply out the four terms to get
$$\begin{aligned}
(X_{n,s}'X_{n,s})^{-1}(X_{n,s}'y_{n,s})
&= b_n - \tfrac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}(X_n'X_n)^{-1}x_s x_s' b_n + (X_n'X_n)^{-1}x_s y_s - \tfrac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}(X_n'X_n)^{-1}x_s x_s'(X_n'X_n)^{-1}x_s y_s \\
&= b_n + (X_n'X_n)^{-1}x_s y_s - \tfrac{x_s'(X_n'X_n)^{-1}x_s}{1 + x_s'(X_n'X_n)^{-1}x_s}(X_n'X_n)^{-1}x_s y_s - \tfrac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}(X_n'X_n)^{-1}x_s x_s' b_n \\
&= b_n + \left[1 - \tfrac{x_s'(X_n'X_n)^{-1}x_s}{1 + x_s'(X_n'X_n)^{-1}x_s}\right](X_n'X_n)^{-1}x_s y_s - \tfrac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}(X_n'X_n)^{-1}x_s x_s' b_n \\
&= b_n + \tfrac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}(X_n'X_n)^{-1}x_s y_s - \tfrac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}(X_n'X_n)^{-1}x_s x_s' b_n \\
&= b_n + \tfrac{1}{1 + x_s'(X_n'X_n)^{-1}x_s}\,(X_n'X_n)^{-1}x_s\,(y_s - x_s' b_n).
\end{aligned}$$
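The updating formula can be verified against a brute-force refit on all $n + 1$ observations; a minimal numpy sketch, with illustrative simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 3
Xn = rng.normal(size=(n, k))
yn = rng.normal(size=n)
xs = rng.normal(size=k)                  # the added row x_s
ys = rng.normal()                        # the added element y_s

XtX_inv = np.linalg.inv(Xn.T @ Xn)
bn = XtX_inv @ Xn.T @ yn                 # coefficients from the first n rows

# b_{n,s} = b_n + (X'X)^{-1} x_s (y_s - x_s'b_n) / (1 + x_s'(X'X)^{-1} x_s)
b_update = bn + (XtX_inv @ xs) * (ys - xs @ bn) / (1.0 + xs @ XtX_inv @ xs)

# Recompute from scratch using all n + 1 observations.
Xns = np.vstack([Xn, xs])
yns = np.append(yn, ys)
b_full = np.linalg.solve(Xns.T @ Xns, Xns.T @ yns)

assert np.allclose(b_update, b_full)
```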
6. Define the data matrix as follows:
$$X = \begin{bmatrix} \mathbf{i} & \mathbf{x} & \mathbf{0} \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} X_1 & X_2 \end{bmatrix}, \qquad X_2 = \begin{bmatrix} \mathbf{0} \\ 1 \end{bmatrix}, \qquad y = \begin{bmatrix} y_o \\ y_m \end{bmatrix}.$$
(The subscripts on the parts of y refer to the “observed” and “missing” rows of X.)
We will use Frisch-Waugh to obtain the first two columns of the least squares coefficient vector:
$$b_1 = (X_1'M_2X_1)^{-1}(X_1'M_2 y).$$
Multiplying it out, we find that $M_2$ is an identity matrix, save for the last diagonal element, which is equal to 0.
$$X_1'M_2X_1 = X_1'X_1 - X_1'\begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0}' & 1 \end{bmatrix} X_1.$$
This just drops the last observation. $X_1'M_2 y$ is computed likewise.
Thus, the coefficients on the first two columns are the same as if $y_o$ had been linearly regressed on the observed rows of $X_1$.
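A minimal sketch of this result, assuming an illustrative data set in which the last observation's $x$ is filled with 0 and absorbed by the dummy column:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 25
x = rng.normal(size=n)
y_o = 1.0 + 0.8 * x + rng.normal(size=n)
y_m = rng.normal()                       # the extra observation

# X1 = [i, x] with the missing x filled by 0; X2 is a dummy equal to 1
# only in the last row, which absorbs that observation entirely.
X1 = np.column_stack([np.ones(n + 1), np.append(x, 0.0)])
X2 = np.append(np.zeros(n), 1.0).reshape(-1, 1)
X = np.hstack([X1, X2])
y = np.append(y_o, y_m)

b_full = np.linalg.lstsq(X, y, rcond=None)[0]        # regression with the dummy
b_obs = np.linalg.lstsq(X1[:n], y_o, rcond=None)[0]  # last observation dropped

assert np.allclose(b_full[:2], b_obs)                # first two coefficients agree
```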