
The Linear Regression Model in Matrix Form

This appendix derives various results for ordinary least squares estimation of the multiple linear
regression model using matrix notation and matrix algebra.

E.1 The Model and Ordinary Least Squares Estimation


Throughout this appendix, we use the t subscript to index observations and an n to denote the
sample size. It is useful to write the multiple linear regression model with k parameters as
follows:

    yt = β1 + β2xt2 + β3xt3 + … + βkxtk + ut,    t = 1, 2, …, n,

where yt is the dependent variable for observation t, and xtj, j = 2, 3, …, k, are the independent variables. If we define the 1 x k vector xt = (1, xt2, …, xtk) and the k x 1 parameter vector β = (β1, β2, …, βk)', each observation can be written compactly as yt = xtβ + ut.

[Some authors prefer to define xt as a column vector, in which case xt is replaced with xt' throughout. Mathematically, it makes more sense to define it as a row vector.]

We can write the model in full matrix notation by appropriately defining data vectors and matrices. Let y denote the n x 1 vector of observations on the dependent variable, and let X denote the n x k data matrix whose tth row is xt. Finally, let u be the n x 1 vector of unobservable disturbances. Then, we can write the model for all n observations in matrix notation:

    y = Xβ + u.

Estimation of β proceeds by minimizing the sum of squared residuals. Define the sum of squared residuals function for any possible k x 1 parameter vector b as

    SSR(b) = Σt (yt − xtb)²,

where the sum is over t = 1, 2, …, n. The k x 1 vector of OLS estimates, β̂, minimizes SSR(b) over all possible k x 1 vectors b, and so it satisfies the first order condition ∂SSR(β̂)/∂b = 0. This is equivalent to

    Σt xt'(yt − xtβ̂) = 0,

a set of k equations, one for each element of β̂. (We have divided by -2 and taken the transpose.) We want to write these in matrix form to make them more useful. Using the formula for partitioned multiplication in Appendix D, we see that the condition above is equivalent to

    X'(y − Xβ̂) = 0

or

    (X'X)β̂ = X'y.
It can be shown that this system of equations always has at least one solution. Multiple solutions do not help us, however, as we are looking for a unique set of OLS estimates given our data set. Assuming that the k x k symmetric matrix X'X is nonsingular, we can premultiply both sides by (X'X)^(-1) to solve for the OLS estimator:

    β̂ = (X'X)^(-1)X'y.

The assumption that X’X is invertible is equivalent to the assumption that rank(X) = k, which
means that the columns of X must be linearly independent. This is the matrix version of multiple
linear regression assumption 4 in Chapter 3.
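
As a quick numerical illustration of this point (a sketch only, using simulated data and illustrative variable names), the following Python/NumPy snippet shows that an exact linear dependence among the columns of X reduces rank(X) below k, so that X'X is (numerically) singular:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    x2 = rng.normal(size=n)
    x3 = 2.0 * x2                              # exact linear dependence: x3 = 2 * x2
    X = np.column_stack([np.ones(n), x2, x3])  # n x 3 data matrix with dependent columns

    print(np.linalg.matrix_rank(X))            # 2, not k = 3
    print(np.linalg.cond(X.T @ X))             # huge condition number: X'X is effectively singular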

It is tempting to simplify the formula for β̂ as follows:

    β̂ = (X'X)^(-1)X'y = X^(-1)(X')^(-1)X'y = X^(-1)y.

The flaw in this reasoning is that X is usually not a square matrix, and so it cannot be inverted. In other words, we cannot write (X'X)^(-1) = X^(-1)(X')^(-1) unless n = k, a case that virtually never arises in practice.
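
As a numerical illustration (a sketch only, with simulated data and illustrative names such as beta_true), the following Python/NumPy snippet computes β̂ by solving the normal equations (X'X)β̂ = X'y directly and checks the answer against NumPy's least squares routine:

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 100, 3
    beta_true = np.array([1.0, 0.5, -2.0])               # hypothetical parameter vector
    X = np.column_stack([np.ones(n),                      # first column of ones (intercept)
                         rng.normal(size=(n, 2))])
    y = X @ beta_true + rng.normal(size=n)                # y = X beta + u

    # Solve (X'X) beta_hat = X'y; solving the system is preferred to inverting X'X explicitly.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta_hat)

    beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)    # NumPy's own least squares solver
    print(np.allclose(beta_hat, beta_lstsq))              # True

Solving the linear system rather than forming (X'X)^(-1) explicitly is the standard, numerically stable choice.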

Define the n x 1 vector of OLS residuals as û = y − Xβ̂. From the first order condition and the definition of û, we can see that the first order condition for β̂ is the same as

    X'û = 0.

Because the first column of X consists entirely of ones, this implies that the OLS residuals always sum to zero when an intercept is included in the equation and that the sample covariance between each independent variable and the OLS residuals is zero.

The sum of squared residuals can be written as

    SSR = û'û = (y − Xβ̂)'(y − Xβ̂).

Using matrix notation, we can also show that the total sum of squares is equal to the explained sum of squares plus the sum of squared residuals: SST = SSE + SSR.
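
These residual properties are easy to verify numerically. The following sketch (again with simulated data and illustrative names) checks that X'û = 0, that the residuals sum to zero, and that SST = SSE + SSR:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ beta_hat                      # OLS residuals

    print(np.allclose(X.T @ u_hat, 0.0))          # X'u_hat = 0
    print(np.isclose(u_hat.sum(), 0.0))           # residuals sum to zero (intercept included)

    SST = np.sum((y - y.mean()) ** 2)             # total sum of squares
    SSE = np.sum((X @ beta_hat - y.mean()) ** 2)  # explained sum of squares
    SSR = np.sum(u_hat ** 2)                      # sum of squared residuals
    print(np.isclose(SST, SSE + SSR))             # SST = SSE + SSR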

E.2 Finite Sample Properties of OLS

1. Linear in parameters

The model can be written as y = Xβ + u, where y is an observed n x 1 vector, X is an observed n x k matrix, and u is an n x 1 vector of unobserved errors or disturbances.

2. Zero conditional mean

Conditional on the entire matrix X, each error ut has zero mean: E(ut|X) = 0, t = 1, 2, …, n. Equivalently, E(u|X) = 0.

3. No perfect collinearity

The matrix X has rank k; that is, the columns of X are linearly independent. This is a careful statement of the assumption that rules out linear dependencies among the explanatory variables. Under this assumption, X'X is nonsingular, and so β̂ is unique and can be written as β̂ = (X'X)^(-1)X'y.

Unbiasedness of the OLS estimator

Under assumptions 1 through 3, the OLS estimator β̂ is unbiased for β, conditional on X. Substituting y = Xβ + u into the formula for β̂ gives

    β̂ = (X'X)^(-1)X'y = (X'X)^(-1)X'(Xβ + u) = β + (X'X)^(-1)X'u.

Taking expectations conditional on X and using the zero conditional mean assumption,

    E(β̂|X) = β + (X'X)^(-1)X'E(u|X) = β,

so we have seen that it is unbiased.
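
A small Monte Carlo experiment (purely illustrative; the sample size, parameter values, and number of replications are arbitrary choices) makes unbiasedness concrete: averaging β̂ across many samples drawn with X held fixed recovers β closely.

    import numpy as np

    rng = np.random.default_rng(3)
    n, k, reps = 50, 3, 5000
    beta = np.array([1.0, 0.5, -2.0])
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # X held fixed across replications
    A = np.linalg.solve(X.T @ X, X.T)                           # A = (X'X)^(-1) X'

    estimates = np.empty((reps, k))
    for r in range(reps):
        u = rng.normal(size=n)             # fresh disturbances each replication, E(u|X) = 0
        estimates[r] = A @ (X @ beta + u)  # beta_hat = (X'X)^(-1) X' y

    print(estimates.mean(axis=0))          # close to [1.0, 0.5, -2.0]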

To obtain the simplest form of the variance-covariance matrix of β̂, we impose the assumptions of homoskedasticity and no serial correlation.

4. Homoskedasticity and no serial correlation

Var(ut|X) = σ² for t = 1, 2, …, n, and Cov(ut, us|X) = 0 for all t ≠ s; in matrix form, Var(u|X) = σ²In. The variance of ut cannot depend on any element of X, and the variance must be constant across observations t. The no serial correlation part says that the errors cannot be correlated across observations. We often say that u has a scalar variance-covariance matrix when this assumption holds. We can now derive the variance-covariance matrix of the OLS estimator.

Because β̂ − β = (X'X)^(-1)X'u and, conditional on X, (X'X)^(-1)X' is nonrandom,

    Var(β̂|X) = (X'X)^(-1)X'[Var(u|X)]X(X'X)^(-1) = (X'X)^(-1)X'(σ²In)X(X'X)^(-1) = σ²(X'X)^(-1).

This means that the variance of β̂j (conditional on X) is obtained by multiplying σ² by the jth diagonal element of (X'X)^(-1). The equation also tells us how to obtain the covariance between any two OLS estimates: multiply σ² by the appropriate off-diagonal element of (X'X)^(-1).
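
The following sketch (simulated data, illustrative names) estimates σ² by σ̂² = SSR/(n − k), forms the estimated variance-covariance matrix σ̂²(X'X)^(-1), and takes the square roots of its diagonal elements to obtain the usual OLS standard errors:

    import numpy as np

    rng = np.random.default_rng(4)
    n, k = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    u_hat = y - X @ beta_hat

    sigma2_hat = (u_hat @ u_hat) / (n - k)   # sigma^2 hat = SSR / (n - k)
    vcov = sigma2_hat * XtX_inv              # estimated Var(beta_hat | X)
    se = np.sqrt(np.diag(vcov))              # standard errors of the individual estimates
    print(beta_hat)
    print(se)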

The Gauss-Markov Theorem

Based on the above assumptions 1 through 4 (the Gauss-Markov assumptions), it is true that the OLS estimator β̂ is the best linear unbiased estimator (BLUE) of β: for any other estimator β̃ that is linear in y and unbiased conditional on X, Var(β̃|X) − Var(β̂|X) is a positive semi-definite matrix.

Unbiasedness of σ̂²

Based on the above assumptions 1 through 4, the estimator σ̂² = SSR/(n − k) = û'û/(n − k) is unbiased for σ²: E(σ̂²|X) = σ².

Proof sketch: write û = y − Xβ̂ = Mu, where M = In − X(X'X)^(-1)X' is symmetric and idempotent with tr(M) = n − k. Then E(û'û|X) = E(u'Mu|X) = σ²tr(M) = σ²(n − k), and dividing by n − k gives E(σ̂²|X) = σ².
E.3 Statistical Inference

When we add the final classical linear model assumption, the vector β̂ has a multivariate normal distribution conditional on X, which leads to the t and F distributions.

Assumption 5: Normality of errors: Conditional on X, the ut are independent and identically distributed as Normal(0, σ²). Equivalently, given X, u is distributed as multivariate normal with mean zero and variance-covariance matrix σ²In: u|X ~ Normal(0, σ²In). Here u is the vector of differences between the yt and their conditional means, u = y − Xβ.

Under Assumption MLR.5, each ut is independent of the explanatory variables for all t. In a time
series setting, this is essentially the strict exogeneity assumption.

Theorem (Normality of β̂): Under the classical linear model Assumptions MLR.1 through MLR.5, β̂ conditional on X is distributed as multivariate normal with mean β and variance-covariance matrix σ²(X'X)^(-1):

    β̂|X ~ Normal(β, σ²(X'X)^(-1)).

This theorem (the normality of the distribution of β̂) is the basis for statistical inference involving β̂. We can also derive the chi-square, t, and F distributions of the usual test statistics from it.

Normal distribution: conditional on X, each individual estimate is normally distributed,

    β̂j|X ~ Normal(βj, σ²cjj),

where cjj is the jth diagonal element of (X'X)^(-1).

Z (standard normal distribution): standardizing gives

    Z = (β̂j − βj)/sd(β̂j) ~ Normal(0, 1),    where sd(β̂j) = σ√cjj.

Chi-square distribution: we can write σ̂²/σ² = [(n − k)σ̂²/σ²]/(n − k); the ratio is unchanged because we have multiplied and divided by the same number, n − k. Here n − k and σ² are constants, but σ̂² is a random variable. Since (n − k)σ̂² = û'û is a sum of squares of normally distributed errors (conditional on X), it can be shown that

    (n − k)σ̂²/σ² ~ χ²(n − k),

a chi-square distribution with n − k degrees of freedom.

t-distribution: a standard normal random variable divided by the square root of an independent chi-square random variable over its degrees of freedom has a t distribution with those degrees of freedom. Since dividing the numerator and the denominator by sd(β̂j) only multiplies the ratio by 1, we can write

    (β̂j − βj)/se(β̂j) = [(β̂j − βj)/sd(β̂j)] / [se(β̂j)/sd(β̂j)] = Z / √[((n − k)σ̂²/σ²)/(n − k)],

where se(β̂j) = σ̂√cjj. The numerator is standard normal and the denominator is the square root of a chi-square random variable divided by its degrees of freedom, so

    (β̂j − βj)/se(β̂j) ~ t(n − k).
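
In practice, the t statistic for testing H0: βj = 0 is simply β̂j/se(β̂j). The following sketch (simulated data, illustrative names; SciPy is assumed to be available) computes the t statistics and two-sided p-values using the t distribution with n − k degrees of freedom:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    n, k = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    u_hat = y - X @ beta_hat
    sigma2_hat = (u_hat @ u_hat) / (n - k)
    se = np.sqrt(sigma2_hat * np.diag(XtX_inv))

    t_stats = beta_hat / se                               # t statistics for H0: beta_j = 0
    p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - k)  # two-sided p-values, t(n - k)
    print(t_stats)
    print(p_values)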
