
# Generalized Least Squares

Simon Jackman
Stanford University
Consider the linear regression model, $y = X\beta + u$, where $y$ is an n-by-1 vector of observations
on a dependent variable, $X$ is an n-by-k matrix of independent variables of full column rank, $\beta$
is a k-by-1 vector of parameters to be estimated, and $u$ is an n-by-1 vector of disturbances. Via
the Gauss-Markov Theorem, if
A1: $E(u \mid X) = 0$ (i.e., the disturbances have conditional mean zero), and
A2: $E(uu' \mid X) = \sigma^2 \Omega$, where $\Omega = I_n$, an n-by-n identity matrix (i.e., conditional on $X$, the disturbances are independent and identically distributed, or "iid", with conditional variance $\sigma^2$),
then the ordinary least squares estimator $\hat\beta_{OLS} = (X'X)^{-1}X'y$, with variance-covariance matrix
$V(\hat\beta_{OLS}) = \sigma^2(X'X)^{-1}$, is:

(1) the best linear unbiased estimator (BLUE) of $\beta$, in the sense of having
the smallest sampling variability in the class of linear unbiased estimators; and

(2) a consistent estimator of $\beta$ (i.e., $\lim_{n \to \infty} \Pr[|\hat\beta_{OLS} - \beta| < \epsilon] = 1$ for any $\epsilon > 0$, or $\operatorname{plim} \hat\beta_{OLS} = \beta$).
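As a minimal sketch of the two OLS formulas above, the following pure-Python code computes two quantities:

- $\hat\beta_{OLS} = (X'X)^{-1}X'y$;
- the usual estimate $\hat\sigma^2(X'X)^{-1}$ of its variance-covariance matrix, with $\hat\sigma^2 = \hat u'\hat u/(n-k)$.

The helper names `solve`, `inverse` and `ols` are hypothetical illustrations, not functions from any particular library.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # choose the pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z


def inverse(A):
    """Invert A column by column: column j of A^{-1} solves A z = e_j."""
    n = len(A)
    cols = [solve(A, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def ols(X, y):
    """Return (beta_hat, V_hat), where V_hat = s^2 (X'X)^{-1} and s^2 = SSR / (n - k)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[t][i] * y[t] for t in range(n)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [y[t] - sum(X[t][i] * beta[i] for i in range(k)) for t in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)
    V = [[s2 * v for v in row] for row in inverse(xtx)]
    return beta, V


# Usage: an intercept column plus one regressor.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1.0, 3.5, 4.5, 7.0]
beta, V = ols(X, y)
print(beta)  # approximately [1.15, 1.9]
```

Solving the normal equations directly keeps the sketch close to the formula. Numerical libraries usually prefer QR-based least squares for better conditioning.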
If A2 fails to hold (i.e., $\Omega$ is a positive definite matrix but not equal to $I_n$), then $\hat\beta_{OLS}$ remains
unbiased and consistent, but is no longer "best". Relying on $\hat\beta_{OLS}$ when A2 doesn't hold
risks faulty inferences. Without A2, $\hat\sigma^2(X'X)^{-1}$ is a biased and inconsistent estimator of $V(\hat\beta_{OLS})$.
The estimated standard errors for $\hat\beta_{OLS}$ are therefore wrong, invalidating inferences and
the results of hypothesis tests. Assumption A2 often fails to hold in practice, e.g.:

(1) when pooling across disparate units generates disturbances with different conditional variances
(heteroskedasticity); or

(2) when an analysis of time series data generates disturbances that are not
conditionally independent (serially correlated disturbances).
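Both claims above follow from the standard expression $\hat\beta_{OLS} - \beta = (X'X)^{-1}X'u$ (a textbook derivation, added here for illustration).

- Unbiasedness survives, because $E(\hat\beta_{OLS} - \beta \mid X) = (X'X)^{-1}X'E(u \mid X) = 0$ regardless of $\Omega$.
- The variance changes. Under the more general $E(uu' \mid X) = \sigma^2\Omega$:

$$
V(\hat\beta_{OLS} \mid X) = (X'X)^{-1}X'\,E(uu' \mid X)\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}X'\Omega X(X'X)^{-1}.
$$

This collapses to $\sigma^2(X'X)^{-1}$ when $\Omega = I_n$, but not in general. An estimate of $\sigma^2(X'X)^{-1}$ is therefore aimed at the wrong target.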
When A2 does not hold, it may be possible to implement a generalized least squares (GLS)
estimator that is BLUE (at least asymptotically). For instance, suppose the researcher knows the
exact form of the departure from A2 (i.e., the researcher knows $\Omega$). Then the GLS estimator
$\hat\beta_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$ is BLUE, with variance-covariance matrix $\sigma^2(X'\Omega^{-1}X)^{-1}$. Note that when
A2 holds, $\Omega = I_n$ and $\hat\beta_{GLS} = \hat\beta_{OLS}$ (i.e., OLS is a special case of the more general estimator).
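A minimal pure-Python sketch of GLS with a known $\Omega$ follows. The function `gls` is a hypothetical name. It returns $\hat\beta_{GLS}$ together with $(X'\Omega^{-1}X)^{-1}$, which must be multiplied by $\sigma^2$ to give $V(\hat\beta_{GLS})$. The usage checks two cases:

- $\Omega = I_n$ reproduces OLS.
- A diagonal $\Omega$ in which one observation has four times the variance down-weights that observation.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z


def inverse(A):
    """Invert A column by column: column j of A^{-1} solves A z = e_j."""
    n = len(A)
    cols = [solve(A, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def gls(X, y, Omega):
    """GLS with known Omega: beta = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y.

    Also returns (X' Omega^{-1} X)^{-1}; multiply by sigma^2 to get V(beta_GLS).
    """
    n, k = len(X), len(X[0])
    Oi = inverse(Omega)
    # W = Omega^{-1} X (n-by-k); Omega is symmetric, so W'y = X' Omega^{-1} y.
    W = [[sum(Oi[t][s] * X[s][j] for s in range(n)) for j in range(k)] for t in range(n)]
    xtwx = [[sum(X[t][i] * W[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    xtwy = [sum(W[t][i] * y[t] for t in range(n)) for i in range(k)]
    return solve(xtwx, xtwy), inverse(xtwx)


X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1.0, 3.5, 4.5, 7.0]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b_ols_equiv, _ = gls(X, y, identity)      # Omega = I_n reproduces OLS
omega_het = [[0.0] * 4 for _ in range(4)]
for t, w in enumerate([1.0, 1.0, 1.0, 4.0]):
    omega_het[t][t] = w                    # the 4th observation has 4x the variance
b_gls, cov_unscaled = gls(X, y, omega_het)
```

Forming $\Omega^{-1}$ explicitly is fine for a toy example. In practice one exploits the structure of $\Omega$ (diagonal, banded, block-diagonal) rather than inverting an n-by-n matrix.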
Typically, researchers who suspect that assumption A2 does not hold do not possess
exact knowledge of $\Omega$. This makes $\hat\beta_{GLS}$ non-operational, and an estimated or feasible
generalized least squares (FGLS) estimator is used instead. FGLS estimators are often implemented
in multiple steps:

(1) an OLS analysis to yield estimated residuals $\hat u$;

(2) analysis of the $\hat u$ to form an estimate of $\Omega$, denoted $\hat\Omega$; and

(3) computing the FGLS estimator $\hat\beta_{FGLS} = (X'\hat\Omega^{-1}X)^{-1}X'\hat\Omega^{-1}y$.

Step (3) is often performed by noting that $\hat\Omega$ can be decomposed as $\hat\Omega = P^{-1}(P')^{-1}$, so that $\hat\Omega^{-1} = P'P$. Then $\hat\beta_{FGLS}$ is
obtained by running the weighted least squares (WLS) regression of $y^* = Py$ on $X^* = PX$; i.e.,
$\hat\beta_{FGLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^*$.
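The three steps and the transformation in step (3) can be sketched in pure Python under one specific, assumed form of heteroskedasticity, chosen purely for illustration. The sketch assumes $\Omega$ is diagonal with $\omega_t = \exp(z_t'\alpha)$ for observed variables $z_t$. Under that assumption:

- step (2) regresses $\log \hat u_t^2$ on $z_t$;
- with $P = \mathrm{diag}(1/\sqrt{\hat\omega_t})$ we have $P'P = \hat\Omega^{-1}$, so OLS on the transformed data is the FGLS estimator.

The function names and the variance model are hypothetical.

```python
import math


def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z


def ols_coef(X, y):
    """OLS coefficients (X'X)^{-1} X'y."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[t][i] * y[t] for t in range(n)) for i in range(k)]
    return solve(xtx, xty)


def fgls_multiplicative(X, y, Z):
    """Three-step FGLS assuming Var(u_t | X) is proportional to exp(z_t' alpha) (hypothetical model)."""
    n, k = len(X), len(X[0])
    # Step 1: OLS residuals.
    b_ols = ols_coef(X, y)
    u = [y[t] - sum(X[t][i] * b_ols[i] for i in range(k)) for t in range(n)]
    # Step 2: regress log(u_t^2) on z_t; exponentiated fitted values form diag(Omega_hat).
    # The 1e-12 floor only guards against log(0) for an exactly zero residual.
    a = ols_coef(Z, [math.log(max(e * e, 1e-12)) for e in u])
    omega = [math.exp(sum(Z[t][j] * a[j] for j in range(len(a)))) for t in range(n)]
    # Step 3: P = diag(1 / sqrt(omega_t)) satisfies P'P = Omega_hat^{-1};
    # OLS of y* = P y on X* = P X gives the FGLS estimator.
    p = [1.0 / math.sqrt(w) for w in omega]
    Xs = [[p[t] * X[t][i] for i in range(k)] for t in range(n)]
    ys = [p[t] * y[t] for t in range(n)]
    return ols_coef(Xs, ys), omega


xs = [1, 2, 3, 4, 5, 6, 7, 8]
X = [[1.0, float(x)] for x in xs]
Z = [[1.0, float(x)] for x in xs]          # variance assumed to depend on x
y = [3.1, 4.8, 7.3, 8.6, 11.9, 12.1, 16.5, 15.2]
b_fgls, omega_hat = fgls_multiplicative(X, y, Z)
```

The intercept of the auxiliary regression in step (2) is a biased estimate of the log-variance scale. This does not affect $\hat\beta_{FGLS}$, because multiplying $\hat\Omega$ by any positive constant leaves $(X'\hat\Omega^{-1}X)^{-1}X'\hat\Omega^{-1}y$ unchanged.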
The properties of FGLS estimators vary depending on two things:

- the form of $\Omega$ (i.e., the nature of the departure from the conditional iid assumption in A2);
- the quality of $\hat\Omega$.

They therefore cannot be neatly summarized. Finite-sample properties of FGLS estimators are often derived
case by case via Monte Carlo experiments. In fact, it is possible to find cases where $\hat\beta_{OLS}$ is
more efficient than $\hat\beta_{FGLS}$, say, when the violation of A2 is mild (e.g., Rao and Griliches, 1969;
Chipman, 1979). Asymptotic results are more plentiful. They usually rely on showing that
$\hat\beta_{FGLS}$ and $\hat\beta_{GLS}$ are asymptotically equivalent, so that $\hat\beta_{FGLS}$ is a consistent and asymptotically
efficient estimator of $\beta$: e.g., Judge et al. (1980, 117-8), Amemiya (1985, 186-222).
In social-science settings, FGLS is most commonly encountered in dealing with simple
forms of residual autocorrelation and heteroskedasticity. For instance, the popular
Cochrane-Orcutt (1949) and Prais-Winsten (1954) procedures for AR(1) disturbances yield
FGLS estimators. The use of FGLS to deal with heteroskedasticity appears in many econometrics
texts: Judge et al. (1980, 128-145) and Amemiya (1985, 198-207) provide rigorous
treatments of FGLS estimators of $\beta$ for commonly encountered forms of heteroskedasticity.
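A minimal sketch of a two-step Prais-Winsten FGLS estimator for AR(1) disturbances follows, in pure Python with hypothetical function names. It works in three stages:

- Estimate $\rho$ from the OLS residuals by regressing $\hat u_t$ on $\hat u_{t-1}$. This is one common choice among several estimators of $\rho$.
- Apply the transformation $P$: the first observation is scaled by $\sqrt{1-\hat\rho^2}$ and the remaining observations are quasi-differenced.
- Run OLS on the transformed data.

Cochrane-Orcutt instead drops the first transformed observation, and both procedures are often iterated until $\hat\rho$ converges; this sketch does neither. The simulated data are illustrative only.

```python
import math
import random


def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z


def ols_coef(X, y):
    """OLS coefficients (X'X)^{-1} X'y."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[t][i] * y[t] for t in range(n)) for i in range(k)]
    return solve(xtx, xty)


def pw_transform(v, rho):
    """Apply the Prais-Winsten matrix P to a series: scale obs 1, quasi-difference the rest."""
    return [math.sqrt(1.0 - rho ** 2) * v[0]] + [v[t] - rho * v[t - 1] for t in range(1, len(v))]


def prais_winsten(X, y):
    """Two-step (non-iterated) Prais-Winsten FGLS for AR(1) disturbances."""
    n, k = len(X), len(X[0])
    b = ols_coef(X, y)                                    # step 1: OLS residuals
    u = [y[t] - sum(X[t][i] * b[i] for i in range(k)) for t in range(n)]
    # step 2: rho_hat from a no-intercept regression of u_t on u_{t-1}
    rho = sum(u[t] * u[t - 1] for t in range(1, n)) / sum(u[t - 1] ** 2 for t in range(1, n))
    if abs(rho) >= 1.0:
        raise ValueError("rho_hat outside (-1, 1); Prais-Winsten transform undefined")
    # step 3: OLS of y* = P y on X* = P X
    ys = pw_transform(y, rho)
    cols = [pw_transform([X[t][i] for t in range(n)], rho) for i in range(k)]
    Xs = [[cols[i][t] for i in range(k)] for t in range(n)]
    return ols_coef(Xs, ys), rho


# Simulated data: y_t = 1 + 2 x_t + u_t with u_t = 0.7 u_{t-1} + e_t.
rng = random.Random(1)
n, rho_true = 200, 0.7
u = [rng.gauss(0, 1) / math.sqrt(1 - rho_true ** 2)]    # stationary starting value
for t in range(1, n):
    u.append(rho_true * u[-1] + rng.gauss(0, 1))
X = [[1.0, t / 20] for t in range(n)]
y = [1.0 + 2.0 * X[t][1] + u[t] for t in range(n)]
b_fgls, rho_hat = prais_winsten(X, y)
```

The choice of $P$ works because of the following fact. If $\Omega_{ts} = \rho^{|t-s|}/(1-\rho^2)$, the AR(1) covariance up to the scale $\sigma^2$, then $P\Omega P' = I_n$. Equivalently $\Omega = P^{-1}(P')^{-1}$, as required in step (3) above.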
Group-wise heteroskedasticity arises from pooling across disparate units, yielding disturbances
that are conditionally iid within groups of observations. This model is actually a
special case of Zellner's (1962) seemingly unrelated regression (SUR) model. FGLS applied
to Zellner's SUR model results in a cross-equation weighting scheme that is also used in
the third stage of three-stage least squares estimators (Zellner and Theil, 1962).
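For group-wise heteroskedasticity, the three FGLS steps reduce to the following (a pure-Python sketch with hypothetical names and simulated data):

- run OLS;
- estimate each group's variance $\sigma_g^2$ by the within-group mean of the squared residuals;
- run WLS with weights $1/\hat\sigma_g^2$, since $\hat\Omega^{-1}$ is then diagonal with those entries.

```python
import random


def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z


def wls_coef(X, y, w):
    """Weighted least squares: minimise sum_t w_t (y_t - x_t' b)^2."""
    n, k = len(X), len(X[0])
    xtwx = [[sum(w[t] * X[t][i] * X[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    xtwy = [sum(w[t] * X[t][i] * y[t] for t in range(n)) for i in range(k)]
    return solve(xtwx, xtwy)


def groupwise_fgls(X, y, groups):
    """Two-step FGLS when Var(u_t | X) = sigma_g^2 for every observation t in group g."""
    n, k = len(X), len(X[0])
    b = wls_coef(X, y, [1.0] * n)            # step 1: OLS (equal weights)
    u = [y[t] - sum(X[t][i] * b[i] for i in range(k)) for t in range(n)]
    sig2 = {}                                # step 2: within-group mean squared residual
    for g in set(groups):
        idx = [t for t in range(n) if groups[t] == g]
        sig2[g] = sum(u[t] ** 2 for t in idx) / len(idx)
    w = [1.0 / sig2[groups[t]] for t in range(n)]   # step 3: weights 1 / sigma_g^2
    return wls_coef(X, y, w), sig2


# Two groups of 100 observations, disturbance s.d. 1 and 5 respectively.
rng = random.Random(7)
X, y, groups = [], [], []
for g, sd in ((0, 1.0), (1, 5.0)):
    for _ in range(100):
        x = rng.uniform(0, 10)
        X.append([1.0, x])
        y.append(1.0 + 2.0 * x + rng.gauss(0, sd))
        groups.append(g)
b_fgls, sig2_hat = groupwise_fgls(X, y, groups)
```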
FGLS is also used to estimate models for panel data with unit- and/or period-specific
heterogeneity not captured by the independent variables: e.g., $y_{it} = x_{it}'\beta + \mu_i + \gamma_t + \epsilon_{it}$, where
$\mu_i$ and $\gamma_t$ are error components (or random effects) specific to units $i$ and time periods $t$,
respectively. The composite error term $u_{it} = \mu_i + \gamma_t + \epsilon_{it}$ is generally not iid. Its
variance-covariance matrix $\Omega$ is, however, determined by the variances of the error components, and can be
estimated by replacing those variances with their estimates. Provided that $E(u_{it} \mid x_{it}) = 0$ and $E(X'\Omega^{-1}X)$ is of full rank, FGLS provides a
consistent estimator of $\beta$. See Wooldridge (2002, Chapter 10).
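To make the structure of $\Omega$ concrete, consider the simpler one-way case. The following assumptions are added here for illustration only:

- there are no period effects ($\gamma_t = 0$);
- the panel is balanced, with $N$ units observed for $T$ periods each;
- $\mu_i$ and $\epsilon_{it}$ are mutually independent, independent across units, and have variances $\sigma_\mu^2$ and $\sigma_\epsilon^2$.

Ordering observations by unit and then by period, and stacking unit $i$'s disturbances into $u_i$, gives

$$
E(u_i u_i' \mid X) = \sigma_\epsilon^2 I_T + \sigma_\mu^2 \iota_T \iota_T', \qquad
E(uu' \mid X) = I_N \otimes \left(\sigma_\epsilon^2 I_T + \sigma_\mu^2 \iota_T \iota_T'\right),
$$

where $\iota_T$ is a T-by-1 vector of ones. Each diagonal element is $\sigma_\mu^2 + \sigma_\epsilon^2$, and each within-unit off-diagonal element is $\sigma_\mu^2$. FGLS therefore needs only estimates of the two variance components.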
Finally, FGLS estimators can often be obtained as maximum likelihood estimates (MLEs),
generating consistent estimates of $\beta$ in "one shot". The iid assumption (A2) often
greatly simplifies deriving and computing MLEs. However, the more common (and simpler)
departures from assumption A2 are often captured by a small number of auxiliary parameters
that determine the content and structure of $\Omega$. Consider, for example, the relatively simple form of $\Omega$
implied by first-order residual autocorrelation, group-wise heteroskedasticity, or simple error-components
structures. Thus the parameters of substantive interest ($\beta$) and the parameters
describing the assumed error process can be estimated simultaneously via MLE, yielding
estimates that are asymptotically equivalent to FGLS estimates.
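A minimal sketch of the "one shot" MLE for AR(1) disturbances follows. It assumes Gaussian innovations, which the text above does not specify. The approach profiles out $\beta$ and $\sigma^2$:

- For fixed $\rho$, $\hat\beta(\rho)$ is the Prais-Winsten regression and $\hat\sigma^2(\rho) = SSR(\rho)/n$.
- This leaves, up to a constant, the concentrated log-likelihood $\ell(\rho) = -\tfrac{n}{2}\log\hat\sigma^2(\rho) + \tfrac{1}{2}\log(1-\rho^2)$. The second term is the log-Jacobian of the transformation.
- $\ell(\rho)$ is maximised by a crude grid search, an illustrative choice rather than a recommended optimiser.

Names and simulated data are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z


def ols_fit(X, y):
    """Return (beta_hat, SSR) for an OLS regression of y on X."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[t][i] * y[t] for t in range(n)) for i in range(k)]
    beta = solve(xtx, xty)
    ssr = sum((y[t] - sum(X[t][i] * beta[i] for i in range(k))) ** 2 for t in range(n))
    return beta, ssr


def pw_transform(v, rho):
    """Apply the Prais-Winsten matrix P to a series: scale obs 1, quasi-difference the rest."""
    return [math.sqrt(1.0 - rho ** 2) * v[0]] + [v[t] - rho * v[t - 1] for t in range(1, len(v))]


def ar1_mle(X, y, grid_step=0.01):
    """Exact Gaussian AR(1) MLE via a grid search over rho in [-0.99, 0.99]."""
    n, k = len(X), len(X[0])
    steps = int(round(0.99 / grid_step))
    best = None
    for m in range(-steps, steps + 1):
        rho = m * grid_step
        ys = pw_transform(y, rho)
        cols = [pw_transform([X[t][i] for t in range(n)], rho) for i in range(k)]
        Xs = [[cols[i][t] for i in range(k)] for t in range(n)]
        beta, ssr = ols_fit(Xs, ys)
        # concentrated log-likelihood (constants dropped)
        loglik = -0.5 * n * math.log(ssr / n) + 0.5 * math.log(1.0 - rho ** 2)
        if best is None or loglik > best[0]:
            best = (loglik, rho, beta, ssr / n)
    _, rho, beta, sigma2 = best
    return beta, rho, sigma2


rng = random.Random(1)
n, rho_true = 200, 0.7
u = [rng.gauss(0, 1) / math.sqrt(1 - rho_true ** 2)]
for t in range(1, n):
    u.append(rho_true * u[-1] + rng.gauss(0, 1))
X = [[1.0, t / 20] for t in range(n)]
y = [1.0 + 2.0 * X[t][1] + u[t] for t in range(n)]
beta_ml, rho_ml, sigma2_ml = ar1_mle(X, y)
```

Unlike the two-step FGLS sketch, $\beta$, $\rho$ and $\sigma^2$ are chosen jointly here. Differences between the two sets of estimates should shrink as $n$ grows.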

References
Amemiya, Takeshi. 1985. Advanced Econometrics. Cambridge: Harvard University Press.

Chipman, J.S. 1979. "Efficiency of Least Squares Estimation of Linear Trend When Residuals Are Autocorrelated." Econometrica 47:115–128.

Cochrane, D. and G.H. Orcutt. 1949. "Application of Least Squares Regression to Relationships Containing Auto-Correlated Error Terms." Journal of the American Statistical Association 44:32–61.

Judge, George G., William E. Griffiths, R. Carter Hill and Tsoung-Chao Lee. 1980. The Theory and Practice of Econometrics. Wiley Series in Probability and Mathematical Statistics. New York: Wiley.

Prais, S. J. and C. B. Winsten. 1954. Trend Estimators and Serial Correlation. Chicago: Cowles Commission.

Rao, Potluri and Zvi Griliches. 1969. "Small-Sample Properties of Several Two-Stage Regression Methods in the Context of Auto-Correlated Errors." Journal of the American Statistical Association 64:253–272.

Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross-Section and Panel Data. Cambridge, Massachusetts: MIT Press.

Zellner, A. 1962. "An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias." Journal of the American Statistical Association 57:348–368.

Zellner, A. and H. Theil. 1962. "Three-Stage Least Squares: Simultaneous Estimation of Simultaneous Equations." Econometrica 30:54–78.