
Seemingly Unrelated Regressions

Hyungsik Roger Moon
Department of Economics
University of Southern California
moonr@usc.edu

Benoit Perron
Département de sciences économiques, CIREQ, and CIRANO
Université de Montréal
benoit.perron@umontreal.ca

July 2006

Abstract
This article considers the seemingly unrelated regression (SUR) model first analyzed by Zellner (1962). We describe
estimators used in the basic model as well as recent extensions.


Seemingly unrelated regressions

A seemingly unrelated regression (SUR) system comprises several individual relationships that are linked by the fact that their disturbances are correlated. Such models have found many applications. For example, demand functions can be estimated for different households (or household types) for a given commodity. The correlation among the equation disturbances could come from several sources such as correlated shocks to household income. Alternatively, one could model the demand of a household for different commodities, but adding-up constraints lead to restrictions on the parameters of different equations in this case. On the other hand, equations explaining some phenomenon in different cities, states, countries, firms or industries provide a natural application as these various entities are likely to be subject to spillovers from economy-wide or worldwide shocks.

There are two main motivations for use of SUR. The first one is to gain efficiency in estimation by combining information on different equations. The second motivation is to impose and/or test restrictions that involve parameters in different equations. Zellner (1962) provided the seminal work in this area, and a thorough treatment is available in the book by Srivastava and Giles (1987). A recent survey can be found in Fiebig (2001).

This chapter selectively overviews the SUR model, some of the estimators used in such systems and their properties, and several extensions of the basic SUR model. We adopt a Classical perspective, although much Bayesian analysis has been done with this model (including Zellner's contributions).

Basic linear SUR model

Suppose that $y_{it}$ is a dependent variable, $x_{it} = (1, x_{it,1}, x_{it,2}, \ldots, x_{it,K_i-1})'$ is a $K_i$-vector of explanatory variables for observational unit $i$, and $u_{it}$ is an unobservable error term, where the double index $it$ denotes the $t$th observation of the $i$th equation in the system. Often $t$ denotes time and we will refer to this as the time dimension, but in some applications, $t$ could have other interpretations, for example as a location in space. A classical linear SUR model is a system of linear regression equations,

$$
\begin{aligned}
y_{1t} &= \beta_1' x_{1t} + u_{1t}, \\
&\;\;\vdots \\
y_{Nt} &= \beta_N' x_{Nt} + u_{Nt},
\end{aligned}
$$

where $i = 1, \ldots, N$, and $t = 1, \ldots, T$. Denote $L = K_1 + \cdots + K_N$. Further simplification in notation can be accomplished by stacking the observations either in the $t$ dimension or for each $i$. For example, if we stack for each observation $t$, let $Y_t = [y_{1t}, \ldots, y_{Nt}]'$, $\tilde{X}_t = \mathrm{diag}(x_{1t}, x_{2t}, \ldots, x_{Nt})$, a block-diagonal matrix with $x_{1t}, \ldots, x_{Nt}$ on its diagonal, $U_t = [u_{1t}, \ldots, u_{Nt}]'$, and $\beta = [\beta_1', \ldots, \beta_N']'$. Then,

$$
Y_t = \tilde{X}_t' \beta + U_t. \tag{1}
$$
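As a small illustration of the stacking in (1), consider a hypothetical system with $N = 2$ equations, $K_1 = 2$ and $K_2 = 3$ regressors, so that $L = 5$. These dimensions are chosen here only for illustration and do not come from the article. In that case $\tilde{X}_t$ is a $5 \times 2$ block-diagonal matrix and the stacked form reproduces the two equations row by row:

$$
\tilde{X}_t = \begin{pmatrix} x_{1t} & 0 \\ 0 & x_{2t} \end{pmatrix},
\qquad
\tilde{X}_t' \beta = \begin{pmatrix} x_{1t}' & 0 \\ 0 & x_{2t}' \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
= \begin{pmatrix} \beta_1' x_{1t} \\ \beta_2' x_{2t} \end{pmatrix},
$$

so that $Y_t = \tilde{X}_t' \beta + U_t$ is exactly $y_{1t} = \beta_1' x_{1t} + u_{1t}$ and $y_{2t} = \beta_2' x_{2t} + u_{2t}$.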

Another way to present the SUR model is to write it in a form of a multivariate regression with parameter restrictions. For this, define $X_t = [x_{1t}', x_{2t}', \ldots, x_{Nt}']'$ and $A(\beta) = \mathrm{diag}(\beta_1, \ldots, \beta_N)$ to be a $(L \times N)$ block diagonal coefficient matrix. Then, the SUR model in (1) can be rewritten as

$$
Y_t = A(\beta)' X_t + U_t, \tag{2}
$$

and the coefficient $A(\beta)$ satisfies

$$
\mathrm{vec}(A(\beta)) = G\beta, \tag{3}
$$

for some $(NL \times L)$ full rank matrix $G$. In the special case where $K_1 = \cdots = K_N = K$, we have $G = \mathrm{diag}(i_1, \ldots, i_N) \otimes I_K$, where $i_j$ denotes the $j$th column of the $N \times N$ identity matrix $I_N$.

Assumption:
In the classical linear SUR model, we assume that for each $i = 1, \ldots, N$, $x_i = [x_{i1}, \ldots, x_{iT}]'$ is of full rank $K_i$, and that conditional on all the regressors $X' = [X_1, \ldots, X_T]$, the errors $U_t$ are iid over time with mean zero and homoskedastic variance $\Sigma = E(U_t U_t' \mid X)$. Furthermore, we assume that $\Sigma$ is positive definite and denote by $\sigma_{ij}$ the $(i,j)$th element of $\Sigma$, that is, $\sigma_{ij} = E(u_{it} u_{jt} \mid X)$. Under this assumption, the covariance matrix of the entire vector of disturbances $U' = [U_1, \ldots, U_T]$ is given by

$$
E\left[\mathrm{vec}(U)\,(\mathrm{vec}(U))'\right] = \Sigma \otimes I_T.
$$

Estimation of β:
In this section we summarize four estimators of $\beta$ that have been widely used in applications of the classical linear SUR. Other estimators (such as Bayes, empirical Bayes, or shrinkage estimators) have also been proposed. Interested readers should refer to Srivastava and Giles (1987) and Fiebig (2001).

1. Ordinary least squares (OLS) estimator:
The first estimator of $\beta$ is the ordinary least squares (OLS) estimator of $Y_t$ on regressor $\tilde{X}_t$,

$$
\hat{\beta}_{OLS} = \left( \sum_{t=1}^{T} \tilde{X}_t \tilde{X}_t' \right)^{-1} \sum_{t=1}^{T} \tilde{X}_t Y_t.
$$

This is just the vector that stacks the equation-by-equation OLS estimators, $\hat{\beta}_{OLS} = (\hat{\beta}_{1,OLS}', \ldots, \hat{\beta}_{N,OLS}')'$, where

$$
\hat{\beta}_{i,OLS} = \left( \sum_{t=1}^{T} x_{it} x_{it}' \right)^{-1} \sum_{t=1}^{T} x_{it} y_{it}.
$$
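The restriction (3) and the stacked OLS formula can be checked numerically. The following is a minimal sketch, assuming simulated data with hypothetical sizes `N`, `K`, `T` and an arbitrary positive definite `Sigma`. None of these values come from the article, and the variable names are introduced only for this illustration. It builds $G$ for the equal-$K$ case, verifies $\mathrm{vec}(A(\beta)) = G\beta$, and confirms that system OLS coincides with equation-by-equation OLS.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 3, 2, 50          # hypothetical sizes: equations, regressors per equation, observations

# Regressors x_it (first entry is the constant) and coefficients beta_i for each equation i.
x = rng.normal(size=(N, T, K))
x[:, :, 0] = 1.0
beta_blocks = rng.normal(size=(N, K))
beta = beta_blocks.reshape(-1)              # stacked L-vector, L = N*K

# A(beta): (L x N) block-diagonal matrix with beta_i in the i-th diagonal block.
A = np.zeros((N * K, N))
for i in range(N):
    A[i * K:(i + 1) * K, i] = beta_blocks[i]

# G = diag(i_1, ..., i_N) kron I_K, where i_j is the j-th column of I_N.
I_N = np.eye(N)
D = np.zeros((N * N, N))
for j in range(N):
    D[j * N:(j + 1) * N, j] = I_N[:, j]
G = np.kron(D, np.eye(K))                   # shape (N*L, L)

vec_A = A.reshape(-1, order="F")            # vec() stacks the columns of A
restriction_holds = np.allclose(vec_A, G @ beta)

# Correlated errors and dependent variables y_it = beta_i' x_it + u_it.
Sigma = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]])
U = rng.multivariate_normal(np.zeros(N), Sigma, size=T)    # (T, N)
Y = np.einsum("itk,ik->ti", x, beta_blocks) + U            # (T, N)

def X_tilde(t):
    """Block-diagonal (L x N) regressor matrix for observation t."""
    Xt = np.zeros((N * K, N))
    for i in range(N):
        Xt[i * K:(i + 1) * K, i] = x[i, t]
    return Xt

# System OLS: (sum_t Xtilde_t Xtilde_t')^{-1} sum_t Xtilde_t Y_t.
Sxx = sum(X_tilde(t) @ X_tilde(t).T for t in range(T))
Sxy = sum(X_tilde(t) @ Y[t] for t in range(T))
beta_ols_system = np.linalg.solve(Sxx, Sxy)

# Equation-by-equation OLS, stacked.
beta_ols_by_eq = np.concatenate(
    [np.linalg.lstsq(x[i], Y[:, i], rcond=None)[0] for i in range(N)]
)
same_ols = np.allclose(beta_ols_system, beta_ols_by_eq)
print(restriction_holds, same_ols)
```

The two estimates agree because $\sum_t \tilde{X}_t \tilde{X}_t'$ is block diagonal, so the system normal equations separate by equation.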

2. Generalized least squares (GLS) and feasible GLS (FGLS) estimator:
When the system covariance matrix $\Sigma$ is known, the GLS estimator of $\beta$ is

$$
\hat{\beta}_{GLS} = \left( \sum_{t=1}^{T} \tilde{X}_t \Sigma^{-1} \tilde{X}_t' \right)^{-1} \sum_{t=1}^{T} \tilde{X}_t \Sigma^{-1} Y_t.
$$

When the covariance matrix $\Sigma$ is unknown, a feasible GLS (FGLS) estimator is defined by replacing the unknown $\Sigma$ with a consistent estimate. A widely used estimator of $\Sigma$ is $\hat{\Sigma} = (\hat{\sigma}_{ij})$, where

$$
\hat{\sigma}_{ij} = \frac{1}{T} \sum_{t=1}^{T} \hat{e}_{it} \hat{e}_{jt}
$$

and $\hat{e}_{kt}$ is the OLS residual of the $k$th equation, that is, $\hat{e}_{kt} = y_{kt} - \hat{\beta}_{k,OLS}' x_{kt}$, $k = i, j$. Then

$$
\hat{\beta}_{FGLS} = \left( \sum_{t=1}^{T} \tilde{X}_t \hat{\Sigma}^{-1} \tilde{X}_t' \right)^{-1} \sum_{t=1}^{T} \tilde{X}_t \hat{\Sigma}^{-1} Y_t.
$$

The FGLS estimator is a two-step estimator where OLS is used in the first step to obtain residuals $\hat{e}_{kt}$ and an estimator of $\Sigma$. The second step computes $\hat{\beta}_{FGLS}$ based on the estimated $\hat{\Sigma}$ in the first step. This estimator is sometimes referred to as the restricted estimator as opposed to the unrestricted estimator proposed by Zellner that uses the residuals from an OLS regression of (2) without imposing the coefficient restrictions (3), i.e. from regressing each regressand on all distinct regressors in the system.

3. Gaussian quasi-maximum likelihood estimator (QMLE):
The Gaussian log-likelihood function is

$$
L(\beta, \Sigma) = \mathrm{const} - \frac{T}{2} \log \det \Sigma - \frac{1}{2} \sum_{t=1}^{T} \left( Y_t - \tilde{X}_t' \beta \right)' \Sigma^{-1} \left( Y_t - \tilde{X}_t' \beta \right),
$$

or equivalently,

$$
L(\beta, \Sigma) = \mathrm{const} - \frac{T}{2} \log \det \Sigma - \frac{1}{2} \sum_{t=1}^{T} \left( Y_t - A(\beta)' X_t \right)' \Sigma^{-1} \left( Y_t - A(\beta)' X_t \right),
$$

where $A(\beta)$ denotes the coefficient $A$ in (2) with the linear restriction of (3), and the QMLE $(\hat{\beta}_{QMLE}, \hat{\Sigma}_{QMLE})$ maximizes $L(\beta, \Sigma)$. When the vector $U_t$ has a normal distribution, this estimator is the maximum likelihood estimator.

4. Minimum distance (MD) estimator:
The idea of the MD estimator is to obtain an estimator of the unrestricted coefficient $A$ in (2), $\hat{A}$, and then obtain an estimator of $\beta$ by minimizing the distance between $\hat{A}$ and $\beta$ in (3). For this, assume that $T > L$ and that the whole regressor matrix $X$ has full rank $L$. When $\hat{A}$ is the OLS estimator of $A(\beta)$, that is,

$$
\hat{A} = \left( \sum_{t=1}^{T} X_t X_t' \right)^{-1} \sum_{t=1}^{T} X_t Y_t',
$$

the optimal MD estimator $\hat{\beta}_{MD}$ minimizes the optimal MD objective function

$$
Q_{MD}(\beta) = \left( \mathrm{vec}(\hat{A}) - G\beta \right)' \left( \hat{\Sigma}^{-1} \otimes \sum_{t=1}^{T} X_t X_t' \right) \left( \mathrm{vec}(\hat{A}) - G\beta \right).
$$

In this case, we have

$$
\hat{\beta}_{MD} = \left( G' \left( \hat{\Sigma}^{-1} \otimes \sum_{t=1}^{T} X_t X_t' \right) G \right)^{-1} G' \left( \hat{\Sigma}^{-1} \otimes \sum_{t=1}^{T} X_t X_t' \right) \mathrm{vec}(\hat{A}).
$$
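The two-step FGLS procedure and the MD formula above can be sketched in a few lines. This is an illustrative sketch only, under stated assumptions: the data are simulated, the sizes `N`, `T`, `K` and the true parameters are hypothetical, and the regressors are drawn without intercepts so that the stacked regressor matrix has full rank $L$ as the MD estimator requires. It computes $\hat{\Sigma}$ from OLS residuals, then $\hat{\beta}_{FGLS}$, and then $\hat{\beta}_{MD}$ with the same $\hat{\Sigma}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 2, 200                      # hypothetical: two equations, 200 observations
K = [2, 3]                         # hypothetical K_1, K_2
L = sum(K)
off = np.concatenate([[0], np.cumsum(K)])   # block boundaries in the stacked beta

# Assumption: no intercepts, so the stacked regressor matrix X has full rank L.
x = [rng.normal(size=(T, k)) for k in K]
beta_true = [np.array([1.0, 2.0]), np.array([-1.0, 0.5, 0.3])]
Sigma_true = np.array([[1.0, 0.8], [0.8, 1.0]])
U = rng.multivariate_normal(np.zeros(N), Sigma_true, size=T)
Y = np.column_stack([x[i] @ beta_true[i] for i in range(N)]) + U    # (T, N)

# Step 1: equation-by-equation OLS residuals and Sigma_hat = (1/T) sum_t e_t e_t'.
b_ols = [np.linalg.lstsq(x[i], Y[:, i], rcond=None)[0] for i in range(N)]
E = np.column_stack([Y[:, i] - x[i] @ b_ols[i] for i in range(N)])
Sigma_hat = E.T @ E / T
S_inv = np.linalg.inv(Sigma_hat)

# Step 2: FGLS. Block (i, j) of sum_t Xtilde_t S^{-1} Xtilde_t' is s^{ij} * x_i' x_j.
lhs = np.zeros((L, L))
rhs = np.zeros(L)
for i in range(N):
    for j in range(N):
        lhs[off[i]:off[i + 1], off[j]:off[j + 1]] = S_inv[i, j] * (x[i].T @ x[j])
        rhs[off[i]:off[i + 1]] += S_inv[i, j] * (x[i].T @ Y[:, j])
beta_fgls = np.linalg.solve(lhs, rhs)

# MD: unrestricted OLS of Y_t on all regressors, then impose vec(A) = G beta.
X = np.hstack(x)                                   # (T, L)
Sxx = X.T @ X
A_hat = np.linalg.solve(Sxx, X.T @ Y)              # (L, N)
G = np.zeros((N * L, L))
for i in range(N):
    for r in range(off[i], off[i + 1]):
        G[i * L + r, r] = 1.0                      # beta_i sits in column i of A
W = np.kron(S_inv, Sxx)                            # MD weight built from the same Sigma_hat
vec_A = A_hat.reshape(-1, order="F")               # vec() stacks the columns of A_hat
beta_md = np.linalg.solve(G.T @ W @ G, G.T @ W @ vec_A)

fgls_equals_md = np.allclose(beta_fgls, beta_md)
print(beta_fgls, fgls_equals_md)
```

The loop over blocks avoids forming $\tilde{X}_t$ for every $t$; it relies on the fact that the $(i,j)$ block of $\sum_t \tilde{X}_t \hat{\Sigma}^{-1} \tilde{X}_t'$ equals the $(i,j)$ element of $\hat{\Sigma}^{-1}$ times $\sum_t x_{it} x_{jt}'$.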

Relationship among the estimators:
Some of the above estimators are tightly linked. For example, if we use the same consistent estimator $\hat{\Sigma}$, the FGLS and the MD estimators above are identical, that is, $\hat{\beta}_{FGLS} = \hat{\beta}_{MD}$. Also, if we use the QMLE estimator of $\Sigma$, $\hat{\Sigma}_{QMLE}$, in place of $\hat{\Sigma}$, $\hat{\beta}_{QMLE}$ is identical to $\hat{\beta}_{FGLS}$ and to $\hat{\beta}_{MD}$.

Distribution of the estimators:
In the literature on the classical linear SUR, the FGLS estimator $\hat{\beta}_{FGLS}$ is often called the SUR estimator (SURE). The usual asymptotic analysis of the SURE is carried out when the dimension of index $t$, $T$, increases to infinity with the dimension of index $i$, $N$, kept fixed. For asymptotic theories for large $N$, $T$, one can refer to Phillips and Moon (1999). Under regularity conditions, the asymptotic distributions as $T \to \infty$ of the aforementioned estimators are:

$$
\sqrt{T} \left( \hat{\beta}_{OLS} - \beta \right) \Rightarrow N\left( 0, \left[ E\left( \tilde{X}_t \tilde{X}_t' \right) \right]^{-1} E\left( \tilde{X}_t \Sigma \tilde{X}_t' \right) \left[ E\left( \tilde{X}_t \tilde{X}_t' \right) \right]^{-1} \right),
$$

and

$$
\sqrt{T} \left( \hat{\beta}_{GLS} - \beta \right),\ \sqrt{T} \left( \hat{\beta}_{FGLS} - \beta \right),\ \sqrt{T} \left( \hat{\beta}_{MD} - \beta \right) \Rightarrow N\left( 0, \left[ E\left( \tilde{X}_t \Sigma^{-1} \tilde{X}_t' \right) \right]^{-1} \right) \equiv N\left( 0, \left( G' \left( \Sigma^{-1} \otimes E\left( X_t X_t' \right) \right) G \right)^{-1} \right).
$$

It is straightforward to show that the SUR estimator using the information in the system is more efficient (has a smaller variance) than the estimator of the individual equations. By the Gauss-Markov theorem, the GLS estimator $\hat{\beta}_{GLS}$ is more efficient than the OLS estimator $\hat{\beta}_{OLS}$ when the system errors are correlated across equations. However, this efficiency gain disappears in some special cases described in Kruskal's theorem (Kruskal, 1968). A well-known special case of this theorem is when the regressors in each equation are the same. For other cases, readers can refer to Chapter 14 of Greene (2003) and Davidson and MacKinnon (1993, pp. 294-295). The efficiency gain relative to OLS tends to be larger when the correlation across equations is larger and when the correlation among regressors in different equations is smaller.

Note also that efficient estimators propagate misspecification and inconsistencies across equations. For example, if any equation is misspecified (for example some relevant variable has been omitted), then the entire vector $\beta$ will be inconsistently estimated by the efficient methods. In this sense, equation-by-equation OLS provides some degree of robustness since it is not affected by misspecification in other equations in the system.

Using the above distributional results, it is straightforward to construct statistics to test general nonlinear hypotheses.
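The identical-regressor special case of Kruskal's theorem mentioned above can be verified directly. The following is a short derivation sketch, where the common regressor vector $x_t$ (so that $x_{it} = x_t$ for all $i$) and the shorthand $S_{xx} = \sum_{t=1}^{T} x_t x_t'$ are notation introduced here for illustration. With identical regressors, $\tilde{X}_t = I_N \otimes x_t$, and therefore

$$
\sum_{t=1}^{T} \tilde{X}_t \Sigma^{-1} \tilde{X}_t' = \sum_{t=1}^{T} (I_N \otimes x_t)\, \Sigma^{-1} (I_N \otimes x_t') = \Sigma^{-1} \otimes S_{xx},
\qquad
\sum_{t=1}^{T} \tilde{X}_t \Sigma^{-1} Y_t = \sum_{t=1}^{T} \left( \Sigma^{-1} \otimes x_t \right) Y_t.
$$

Substituting into the GLS formula gives

$$
\hat{\beta}_{GLS} = \left( \Sigma \otimes S_{xx}^{-1} \right) \sum_{t=1}^{T} \left( \Sigma^{-1} \otimes x_t \right) Y_t = \sum_{t=1}^{T} \left( I_N \otimes S_{xx}^{-1} x_t \right) Y_t = \left( I_N \otimes S_{xx} \right)^{-1} \sum_{t=1}^{T} (I_N \otimes x_t)\, Y_t = \hat{\beta}_{OLS},
$$

so $\Sigma$ cancels and GLS (and FGLS with any invertible $\hat{\Sigma}$) coincides with OLS in this case.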

Finite sample properties of SURE have been studied extensively either analytically in some restrictive cases (e.g. Zellner, 1963, 1972, Kakwani, 1967), by asymptotic expansions (e.g. Phillips, 1977 and Srivastava and Maekawa, 1995) or by simulation (e.g. Kmenta and Gilbert, 1968). Most work has focused on the two-equation case. The above approximations appear to be good descriptions of the finite-sample behavior of the estimators analyzed when the number of observations, $T$, is large relative to the number of equations, $N$. In particular, efficient methods provide an efficiency gain in cases where the correlation among disturbances across equations is high and when correlation among regressors across equations is low. Non-normality of disturbances has also been found to deteriorate the quality of the above approximations. Bootstrap methods have also been proposed to remedy these documented departures from normality and improve the size of tests.

Extensions

In this section we discuss several extensions of the classical linear SUR model where the assumption on the error terms is no longer satisfied.

Autocorrelation and heteroskedasticity:
As in standard univariate models, non-spherical disturbances can be accommodated by either modelling the residuals or computing robust covariance matrices. One could define the equivalent of White (in the case of heteroskedasticity) or HAC (in the case of serial correlation) standard errors to conduct inference with the OLS estimator as in the single-equation framework. For efficiency in estimation, some parametric assumption on the disturbance process is often imposed (see Greene, 2003). For example, in the case of heteroskedasticity, Hodgson, Linton, and Vorkink (2002) propose an adaptive estimator that is efficient under the assumption that the errors follow an elliptical symmetric distribution that includes the normal as a special case. In addition to standard dynamic effects, serial correlation can arise in this environment due to the presence of individual effects (see Baltagi, 1980). An intermediate approach is to use a restricted (or parametric) covariance matrix to try to capture some efficiency gains in estimation, and then use a nonparametric heteroskedasticity and autocorrelation (HAC) consistent estimator of the covariance matrix to do inference. This two-tier approach (dubbed quasi-FGLS) has been suggested by Creel and Farell (1996).

Endogenous regressors:
When the regressor $X_t$ in the SUR model is correlated with the error term $U_t$, one needs instrumental variables (IVs), say, $Z_t = [z_{1t}', \ldots, z_{Nt}']'$, to estimate $\beta$. We suppose that the IVs satisfy the usual rank condition. The GMM estimator (or the IV estimator), then, utilizes the moment condition $E[\mathrm{vec}(Z_t U_t')] = 0$.

The optimal GMM estimator $\hat{\beta}_{GMM}$ is derived by minimizing the GMM objective function with the optimal choice of weighting matrix given by $\left( \hat{\Sigma} \otimes \sum_{t=1}^{T} Z_t Z_t' \right)^{-1}$,

$$
Q_{GMM}(\beta) = \left[ \sum_{t=1}^{T} \mathrm{vec}\left\{ Z_t \left( Y_t - A(\beta)' X_t \right)' \right\} \right]' \left( \hat{\Sigma} \otimes \sum_{t=1}^{T} Z_t Z_t' \right)^{-1} \left[ \sum_{t=1}^{T} \mathrm{vec}\left\{ Z_t \left( Y_t - A(\beta)' X_t \right)' \right\} \right].
$$

Then, we have

$$
\hat{\beta}_{GMM} = \left\{ G' \left( \hat{\Sigma}^{-1} \otimes \left( \sum_{t=1}^{T} X_t Z_t' \right) \left( \sum_{t=1}^{T} Z_t Z_t' \right)^{-1} \left( \sum_{t=1}^{T} Z_t X_t' \right) \right) G \right\}^{-1}
\times G' \left( \hat{\Sigma}^{-1} \otimes \left( \sum_{t=1}^{T} X_t Z_t' \right) \left( \sum_{t=1}^{T} Z_t Z_t' \right)^{-1} \left( \sum_{t=1}^{T} Z_t X_t' \right) \right) \mathrm{vec}\left( \hat{A}_{2SLS} \right),
$$

where

$$
\hat{A}_{2SLS} = \left\{ \left( \sum_{t=1}^{T} X_t Z_t' \right) \left( \sum_{t=1}^{T} Z_t Z_t' \right)^{-1} \left( \sum_{t=1}^{T} Z_t X_t' \right) \right\}^{-1} \left( \sum_{t=1}^{T} X_t Z_t' \right) \left( \sum_{t=1}^{T} Z_t Z_t' \right)^{-1} \left( \sum_{t=1}^{T} Z_t Y_t' \right)
$$

is the two-stage least squares estimator of $A(\beta)$. When $X_t$ is exogenous, so that $X_t = Z_t$, the GMM objective function $Q_{GMM}(\beta)$ and minimum distance objective function $Q_{MD}(\beta)$ are identical, and we conclude that $\hat{\beta}_{GMM} = \hat{\beta}_{MD}$.

Vector autoregressions:
When the index $t$ in the SUR model denotes time and the regressors $x_{it}$ include lagged dependent variables, the classical linear SUR model becomes a vector autoregression model (VAR) with exclusion restrictions. In this case, the regressors $X$ are no longer strictly exogenous, and the assumption in the previous section is violated. A special case is when the order of the lagged dependent variables is one. In this case, for $\{y_{it}\}_t$ to be stationary, it is necessary that the absolute value of the coefficient of $y_{it-1}$ is less than one. If the coefficient of $y_{it-1}$ is one, $\{y_{it}\}_t$ is nonstationary. Nonstationary SUR VAR models have been used in developing tests for unit roots and cointegration in panels with cross-sectional dependence, see for example Chang (2004), Groen and Kleibergen (2003) and Larsson, Lyhagen, and Lothgren (2001).

Seemingly unrelated cointegration regressions:
When the non-constant regressors in $X_t$ are integrated nonstationary variables but the errors in $U_t$ are stationary, we call model (1) (or equivalently (2)) a seemingly unrelated cointegration regression model, see Park and Ogaki (1991), Moon (1999), Mark et al. (2005), and Moon and Perron (2004). These papers showed that for efficient estimation of $\beta$, an estimator of the long-run variance of $U_t$, not of the contemporaneous covariance $\Sigma$ as in the previous section, should be used in FGLS. In addition, some modification of the regression is necessary when the integrated regressors and the stationary errors are correlated. Empirical applications in the main references include tests for purchasing power parity, the relation between national saving and investment, and tests of the forward rate unbiasedness hypothesis.
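To make the distinction between the contemporaneous covariance and the long-run variance of $U_t$ concrete, the following is a minimal sketch of a kernel estimator of the long-run variance. It assumes a Bartlett kernel with a user-chosen bandwidth (a Newey-West type estimator). This choice, the function name `long_run_variance`, and the simulated AR(1) errors are assumptions made for illustration; the papers cited above may use different estimators.

```python
import numpy as np

def long_run_variance(U, bandwidth):
    """Bartlett-kernel estimate of the long-run variance of U_t.

    U: (T, N) array of residuals; bandwidth: number of lags M >= 0.
    With bandwidth = 0 this reduces to the contemporaneous covariance.
    """
    U = np.asarray(U, dtype=float)
    T = U.shape[0]
    U = U - U.mean(axis=0)                     # demean each series
    omega = U.T @ U / T                        # lag-0 term
    for lag in range(1, bandwidth + 1):
        gamma = U[lag:].T @ U[:-lag] / T       # sample autocovariance of U_t and U_{t-lag}
        weight = 1.0 - lag / (bandwidth + 1)   # Bartlett weight
        omega += weight * (gamma + gamma.T)
    return omega

# Hypothetical example: serially correlated AR(1) errors in two equations.
rng = np.random.default_rng(2)
T, N, rho = 500, 2, 0.5
eps = rng.normal(size=(T, N))
U = np.zeros((T, N))
for t in range(1, T):
    U[t] = rho * U[t - 1] + eps[t]

sigma_hat = long_run_variance(U, bandwidth=0)   # contemporaneous covariance estimate
omega_hat = long_run_variance(U, bandwidth=8)   # long-run variance estimate
print(np.diag(sigma_hat), np.diag(omega_hat))
```

With positively autocorrelated errors the long-run variance exceeds the contemporaneous variance, which is why plugging $\hat{\Sigma}$ into FGLS is not the efficient choice in the cointegration setting described above.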

Nonlinear SUR (NSUR):
An NSUR model assumes that the conditional mean of $y_{it}$ given $x_{it}$ is nonlinear, say $h_i(\beta, x_{it})$, that is, $y_{it} = h_i(\beta, x_{it}) + u_{it}$. Defining $H(\beta, X_t) = (h_1(\beta, x_{1t}), \ldots, h_N(\beta, x_{Nt}))'$, we write the NSUR model in a multivariate nonlinear regression form,

$$
Y_t = H(\beta, X_t) + U_t.
$$

In this case, we may estimate $\beta$ using (quasi) MLE assuming that $Y_t$ are Gaussian conditioned on $X_t$ or GMM utilizing the moment condition that $E[g(X_t) U_t'] = 0$ for any measurable transformation $g$ of $X_t$.

References

[1] Baltagi, B. (1980): On Seemingly Unrelated Regressions with Error Components, Econometrica, 48, 1547-1552.
[2] Chang, Y. (2004): Bootstrap Unit Root Tests in Panels with Cross-Sectional Dependency, Journal of Econometrics, 120, 263-293.
[3] Creel, M. and M. Farell (1996): SUR Estimation of Multiple Time-Series Models with Heteroskedasticity and Serial Correlation of Unknown Form, Economics Letters, 53, 239-245.
[4] Davidson, R. and J. MacKinnon (1993): Estimation and Inference in Econometrics, Oxford University Press.
[5] Fiebig, D. G. (2001): Seemingly Unrelated Regression, in Baltagi, B. eds, A Companion to Theoretical Econometrics, Blackwell Publishers, 101-121.
[6] Greene, W. (2003): Econometric Analysis, 5th ed., Prentice Hall, New Jersey.
[7] Groen, J. and F. Kleibergen (2003): Likelihood-Based Cointegration Analysis in Panels of Vector Error Correction Models, Journal of Business and Economic Statistics, 21, 295-318.
[8] Hodgson, D., O. Linton, and K. Vorkink (2002): Testing the Capital Asset Pricing Model Efficiently under Elliptical Symmetry: A Semiparametric Approach, Journal of Applied Econometrics, 17, 619-639.
[9] Larsson, R., J. Lyhagen, and M. Lothgren (2001): Likelihood-based Cointegration Tests in Heterogeneous Panels, Econometrics Journal, 4, 109-142.
[10] Kakwani, N. C. (1967): The Unbiasedness of Zellner's Seemingly Unrelated Regression Equations Estimator, Journal of the American Statistical Association, 62, 141-142.
[11] Kmenta, J. and R. F. Gilbert (1968): Small Sample Properties of Alternative Estimators for Seemingly Unrelated Regressions, Journal of the American Statistical Association, 63, 1180-1200.
[12] Kruskal, W. (1968): When are Gauss-Markov and Least Squares Estimators Identical?, Annals of Mathematical Statistics, 39, 70-75.

[13] Mark, N., M. Ogaki, and D. Sul (2005): Dynamic Seemingly Unrelated Cointegrating Regressions, Review of Economic Studies, 72, 797-820.
[14] Moon, H.R. (1999): A Note on Fully-Modified Estimation of Seemingly Unrelated Regressions Models with Integrated Regressors, Economics Letters, 65, 25-31.
[15] Moon, H.R. and B. Perron (2004): Efficient Estimation of SUR Cointegration Regression Model and Testing for Purchasing Power Parity, Econometric Reviews, 23, 293-323.
[16] Park, J. and M. Ogaki (1991): Seemingly Unrelated Canonical Cointegrating Regressions, University of Rochester working paper 280.
[17] Phillips, P.C.B. (1977): Finite sample distribution of Zellner's SURE, Journal of Econometrics, 6, 147-164.
[18] Srivastava, V. K. and D. E. A. Giles (1987): Seemingly Unrelated Regression Equations Models, New York: Marcel Dekker Inc.
[19] Srivastava, V. K. and K. Maekawa (1995): Efficiency Properties of Feasible Generalized Least Squares Estimators in SURE Models under Non-normal Disturbances, Journal of Econometrics, 66, 99-121.
[20] Zellner, A. (1962): An Efficient Method of Estimating Seemingly Unrelated Regression Equations and Tests of Aggregation Bias, Journal of the American Statistical Association, 57, 348-368.
[21] Zellner, A. (1963): Estimators for Seemingly Unrelated Regression Equations: Some Finite Sample Results, Journal of the American Statistical Association, 58, 977-992.
[22] Zellner, A. (1972): Corrigenda, Journal of the American Statistical Association, 67, 255.