
Econometrics II, Part I: The Classical Linear Regression Model (CLRM) and the OLS Estimator


$y = (y_1, y_2, \ldots, y_n)'$: column vector containing the $n$ sample observations on the dependent variable $y$. (1)

$x_k = (x_{1k}, x_{2k}, \ldots, x_{nk})'$: column vector containing the $n$ sample observations on the independent variable $x_k$, with $k = 1, 2, \ldots, K$. (2)

$X = (x_1 \; x_2 \; \cdots \; x_K)$: $n \times K$ data matrix containing the $n$ sample observations on the $K$ independent variables. (3) Usually the vector $x_1$ is assumed to be a column of 1s (constant).

$\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$: column vector containing the $n$ disturbances. (4)

Assumptions of the CLRM

Assumption 1: linearity
Observed data are generated by the following linear model:
$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_K x_{iK} + \varepsilon_i = x_i'\beta + \varepsilon_i, \quad i = 1, 2, \ldots, n.$ (5)

The $K$ unknown parameters of the model can be collected in a column vector, $\beta = (\beta_1, \beta_2, \ldots, \beta_K)'$, and the model can be rewritten in compact form:
$y = x_1\beta_1 + x_2\beta_2 + \ldots + x_K\beta_K + \varepsilon = X\beta + \varepsilon$ (6)
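The compact form $y = X\beta + \varepsilon$ can be illustrated numerically. Below is a minimal sketch in Python/NumPy; the sample size, coefficient values, and error distribution are invented for the example and are not part of the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 3

# n x K data matrix; the first column is a column of 1s (the constant).
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

beta = np.array([1.0, 2.0, -0.5])   # hypothetical "true" coefficients
eps = rng.normal(size=n)            # disturbances, one per observation

y = X @ beta + eps                  # the model in compact form (6)

print(X.shape, y.shape)             # (100, 3) (100,)
```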


Assumption 2: the strict exogeneity assumption

The expected value of each disturbance $\varepsilon_i$, conditional on all observations, is zero:
$E(\varepsilon_i \mid X) = 0, \quad i = 1, 2, \ldots, n.$
In compact form: $E(\varepsilon \mid X) = 0$ (9)


First implication of strict exogeneity: the unconditional mean is also zero. In fact, by the Law of Total Expectations:
$E(\varepsilon_i) = E_X[E(\varepsilon_i \mid X)] = 0, \quad i = 1, 2, \ldots, n$ (10)

Second implication of strict exogeneity: the regressors are orthogonal to the error term for all observations:
$E(\varepsilon_i x_{jk}) = 0, \quad i, j = 1, 2, \ldots, n; \; k = 1, 2, \ldots, K$


Third implication of strict exogeneity: the orthogonality conditions are equivalent to zero-correlation conditions:
$\mathrm{Cov}(x_{jk}, \varepsilon_i) = E(x_{jk}\varepsilon_i) - E(x_{jk})E(\varepsilon_i) = 0$


Assumption 3: no (perfect) multicollinearity

The rank of the $n \times K$ matrix $X$ is $K$ with probability 1. This implies that $X$ has full column rank: the columns of $X$ are linearly independent and there are at least $K$ observations ($n \geq K$).
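The full-column-rank condition is easy to check directly. A sketch (NumPy, with invented data) showing that duplicating a regressor produces perfect multicollinearity and a rank deficiency:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

# Under Assumption 3, rank(X) = K, so X'X is invertible.
print(np.linalg.matrix_rank(X))        # 3

# Perfect multicollinearity: append an exact copy of a column.
X_bad = np.column_stack([X, X[:, 1]])  # 4 columns, but only 3 are independent
print(np.linalg.matrix_rank(X_bad))    # still 3, not 4
```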

Assumption 4: spherical error variance

Homoskedasticity assumption: the conditional second moment of each disturbance $\varepsilon_i$ is constant:
$E(\varepsilon_i^2 \mid X) = \sigma^2, \quad i = 1, \ldots, n$


No-correlation assumption: the conditional second cross-moment between $\varepsilon_i$ and $\varepsilon_j$ is zero for all $i \neq j$:
$E(\varepsilon_i \varepsilon_j \mid X) = 0 \quad \text{for } i \neq j$ (14)

Since
$\mathrm{Var}(\varepsilon_i \mid X) = E(\varepsilon_i^2 \mid X) - [E(\varepsilon_i \mid X)]^2 = E(\varepsilon_i^2 \mid X) = \sigma^2$ (15)
and
$\mathrm{Cov}(\varepsilon_i, \varepsilon_j \mid X) = E(\varepsilon_i \varepsilon_j \mid X) - E(\varepsilon_i \mid X)E(\varepsilon_j \mid X) = E(\varepsilon_i \varepsilon_j \mid X) = 0$ (16)
the two assumptions can be written as:
$E(\varepsilon\varepsilon' \mid X) = \mathrm{Var}(\varepsilon \mid X) = \sigma^2 I$ (17)
where $I$ is an $n \times n$ identity matrix.

Introduction to Least Squares Method

The parameters $\beta$ of the linear regression model and the common variance of the error terms, $\sigma^2$, are unknown quantities. By using available sample data, estimation methods provide estimates of these unknown quantities. Although we do not observe the vector of disturbances, $\varepsilon = y - X\beta$, we can compute the vector of residuals implied by any hypothetical value $\tilde\beta$ of $\beta$:
$\tilde\varepsilon = y - X\tilde\beta$ (18)
Fitting criterion: the least squares method chooses the value of $\tilde\beta$ which minimizes the sum of squared residuals.

Given an arbitrary choice for the coefficient vector, $\tilde\beta$, the minimization problem is to choose $\tilde\beta$ to minimize the function $S(\tilde\beta)$, where:
$S(\tilde\beta) = (y - X\tilde\beta)'(y - X\tilde\beta) = \sum_i (y_i - x_i'\tilde\beta)^2 = \sum_i \tilde\varepsilon_i^2$ (19)
The set of first order conditions is:
$\frac{\partial S(\tilde\beta)}{\partial \tilde\beta} = -2X'y + 2X'X\tilde\beta = 0$ (20)

Derivation of the OLS Estimator



Let $b$ be the solution. Then $b$ satisfies the least squares normal equations:
$X'Xb = X'y$ (23)
If the inverse of $X'X$ exists (which follows from the full column rank assumption) the solution is:
$b = (X'X)^{-1}X'y$ (24)
and the least squares residuals can be written as:
$e = y - Xb$ (25)
For this solution to minimize the sum of squares,
$\frac{\partial^2 S(\tilde\beta)}{\partial\tilde\beta\,\partial\tilde\beta'} = 2X'X$
must be a positive definite matrix (which holds because $X$ has full column rank).

The normal equations $X'Xb = X'y$ imply that
$X'(y - Xb) = X'e = 0$ (26)
Therefore, for every column $x_k$ of $X$:
$x_k'e = 0$ (27)
and, if the first column of $X$ is a column of 1s:
$x_1'e = 0$ (28)
$\sum_i e_i = 0$ (29)
In words, the least squares residuals sum to zero.
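The normal equations and the zero-sum property of the residuals can be verified numerically. A sketch in NumPy, with simulated data and invented coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Solve the normal equations X'Xb = X'y (solve() is preferable to an explicit inverse).
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                       # least squares residuals

# X'e = 0: every regressor is orthogonal to the residuals, and since the
# first column of X is a column of 1s, the residuals sum to zero.
print(np.max(np.abs(X.T @ e)))      # ~0 up to floating point
print(e.sum())                      # ~0
```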

An Alternative Expression for the OLS Estimator

$b = (X'X)^{-1}X'y$ can be rewritten as:
$b = (X'X)^{-1}X'y = \left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}X'y\right) = S_{XX}^{-1}s_{XY}$ (30)
where
$S_{XX} = \frac{1}{n}X'X = \frac{1}{n}\sum_i x_i x_i'$ (31)
$s_{XY} = \frac{1}{n}X'y = \frac{1}{n}\sum_i x_i y_i$ (32)
Intuition: $S_{XX}$ and $s_{XY}$ can be thought of as sample averages of $x_i x_i'$ and $x_i y_i$ respectively. This form is utilized in large sample theory.
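That the sample-average form gives the same estimate can be checked directly (NumPy sketch with simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

b_direct = np.linalg.solve(X.T @ X, X.T @ y)   # b = (X'X)^{-1} X'y

# Sample-average form: the two 1/n factors cancel, so the estimate is identical.
S_XX = (X.T @ X) / n
s_XY = (X.T @ y) / n
b_avg = np.linalg.solve(S_XX, s_XY)

print(np.allclose(b_direct, b_avg))            # True
```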

More Concepts and Algebra

Vector of fitted values:
$\hat y = Xb$ (33)
Vector of residuals:
$e = y - Xb = y - \hat y$ (34)

Projection matrix:
$P = X(X'X)^{-1}X'$ (35)
Annihilator matrix:
$M = I - P$ (36)
Both $P$ and $M$ are $n \times n$, symmetric and idempotent, with
$PX = X, \quad MX = 0$ (37)

Sum of Squared Residuals (RSS):
$RSS = e'e = \varepsilon'M\varepsilon$ (38)
In fact,
$e = y - Xb = y - X(X'X)^{-1}X'y = My = M(X\beta + \varepsilon) = M\varepsilon$ (39)
and, after squaring both terms, we obtain:
$e'e = \varepsilon'M'M\varepsilon = \varepsilon'M\varepsilon$ (40)
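The properties of $P$ and $M$ are easy to verify numerically (NumPy sketch; the small simulated design matrix is invented for the example):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection matrix
M = np.eye(n) - P                      # annihilator matrix

# Symmetric and idempotent; PX = X, MX = 0.
print(np.allclose(P, P.T), np.allclose(P @ P, P))  # True True
print(np.allclose(M @ X, 0))                       # True

# Residuals via M, and RSS = e'e = y'My (M symmetric and idempotent).
e = M @ y
print(np.isclose(e @ e, y @ M @ y))                # True
```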

Estimate of the Variance of the Error Term: the OLS estimate of $\sigma^2$, denoted $s^2$, is the sum of squared residuals divided by $n - K$:
$s^2 = \frac{e'e}{n - K}$ (41)
Standard Error of the Regression (SER): the square root of $s^2$, $s$, is called the standard error of the regression.
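With simulated data whose true error standard deviation is known, $s$ can be seen to estimate it (NumPy sketch; $\sigma = 2$ is an invented value for the simulation):

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 500, 3
sigma = 2.0                          # true error s.d. (known only in a simulation)
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, -1.5]) + rng.normal(scale=sigma, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

s2 = (e @ e) / (n - K)               # sum of squared residuals over n - K
s = np.sqrt(s2)                      # standard error of the regression (SER)
print(s)                             # should be close to sigma = 2.0
```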


Sampling Error:
$b - \beta = (X'X)^{-1}X'y - \beta = (X'X)^{-1}X'(X\beta + \varepsilon) - \beta = (X'X)^{-1}X'\varepsilon$ (42)

Goodness of Fit and Analysis of Variance

The least squares method chooses the vector of coefficients to minimize the sum of squared residuals. However, the fact that this sum is minimized does not tell us whether it is big or small. To circumvent this limitation it is necessary to compare the sum of squared residuals with an appropriate benchmark. This benchmark is given by the sum of squared deviations of the dependent variable from its sample mean (TSS, Total Sum of Squares).

Starting from
$y - \bar y\iota = M^0 y$ (43)
where $\iota$ is a column vector of 1s, we obtain
$\sum_i (y_i - \bar y)^2 = (M^0 y)'(M^0 y) = y'M^{0\prime}M^0 y = y'M^0 y$ (44)
where $M^0$ is an $n \times n$ symmetric idempotent matrix that transforms observations into deviations from sample means. Its diagonal elements are all $1 - \frac{1}{n}$ and its off-diagonal elements are $-\frac{1}{n}$.
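The demeaning matrix $M^0$ can be built and checked directly (NumPy sketch with a small invented vector):

```python
import numpy as np

n = 5
# M0 = I - (1/n) * J: diagonal elements 1 - 1/n, off-diagonal elements -1/n.
M0 = np.eye(n) - np.ones((n, n)) / n

y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])   # sample mean is 5
print(M0 @ y)                             # deviations from the mean: [-4. -2.  0.  2.  4.]

# M0 is symmetric and idempotent.
print(np.allclose(M0, M0.T), np.allclose(M0 @ M0, M0))  # True True
```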

Derivation of the coefficient of determination. Starting from
$y = \hat y + e$ (45)
after subtracting $\bar y\iota$ from both sides, we obtain
$y - \bar y\iota = \hat y - \bar y\iota + e$ (46)
that can be rewritten as
$M^0 y = M^0\hat y + e = M^0 Xb + e$ (47)
(since the residuals sum to zero, $M^0 e = e$). The total sum of squares is
$y'M^0y = (M^0Xb + e)'(M^0Xb + e)$ (48)
which, because the cross terms vanish ($X'e = 0$), can be rewritten as:
$y'M^0y = b'X'M^0Xb + e'e$ (49)

The total sum of squares (TSS) can be decomposed in two parts, measuring respectively the proportion of the TSS that is accounted for by variation in the regressors (ESS) and the proportion that is not (RSS):
$TSS = ESS + RSS$ (50)
The standard measure of the goodness of fit of a regression is simply the ratio between ESS and TSS. This measure, called the coefficient of determination, $R^2$, is bounded between zero and one:
$R^2 = \frac{ESS}{TSS} = \frac{b'X'M^0Xb}{y'M^0y} = 1 - \frac{e'e}{y'M^0y}$ (51)



Problems with the use of the coefficient of determination: it never decreases when an additional explanatory variable is added to the regression. For this reason alternative measures have been proposed, including the so-called adjusted $R^2$ (adjusted for the degrees of freedom), which is computed as follows:
$\bar R^2 = 1 - \frac{e'e/(n - K)}{y'M^0y/(n - 1)}$ (52)
This adjusted measure can decrease if the contribution of the additional variable to the fit of the regression is relatively low. If the constant term is not included in the model, the coefficient of determination is not bounded between zero and one and can indeed turn negative.
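The never-decreasing behaviour of $R^2$ is easy to demonstrate by adding a pure-noise regressor (NumPy sketch; all data are simulated for the example):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def r2_stats(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit with a constant term."""
    n, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    tss = ((y - y.mean()) ** 2).sum()          # y'M0y
    r2 = 1 - (e @ e) / tss
    r2_adj = 1 - ((e @ e) / (n - K)) / (tss / (n - 1))
    return r2, r2_adj

r2_small, adj_small = r2_stats(X, y)

# Append a regressor that is pure noise, unrelated to y.
X_big = np.column_stack([X, rng.normal(size=n)])
r2_big, adj_big = r2_stats(X_big, y)

print(r2_big >= r2_small)   # True: R^2 never decreases when a regressor is added
```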