Proceedings of the American Control Conference Arlington, VA June 25-27, 2001

A Framework for Subspace Identification Methods
Ruijie Shi and John F. MacGregor, Dept. of Chemical Engineering, McMaster University, Hamilton, ON L8S 4L7, Canada. Email: shir@mcmaster.ca, macgreg@mcmaster.ca

Abstract
Similarities and differences among various subspace identification methods (MOESP, N4SID and CVA) are examined by putting them in a general regression framework. Subspace identification methods consist of three steps: estimating the predictable subspace for multiple future steps, then extracting state variables from this subspace and finally fitting the estimated states to a state space model. The major differences among these subspace identification methods lie in the regression or projection methods used in the first step to remove the effect of the future inputs on the future outputs and thereby estimate the predictable subspace, and in the latent variable methods used in the second step to extract estimates of the states. This paper compares the existing methods and proposes some new variations by examining them in a common framework involving linear regression and latent variable estimation. Limitations of the various methods become apparent when examined in this manner. Simulations are included to illustrate the ideas discussed.

1. Introduction

Subspace identification methods (SIMs) have become quite popular in recent years. The key idea in SIMs is to estimate the state variables or the extended observability matrix directly from the input and output data. The most influential methods are CVA (Canonical Variate Analysis, Larimore, 1990), MOESP (Multivariable Output Error State space, Verhaegen and Dewilde, 1992) and N4SID (Numerical Subspace State-Space System IDentification, Van Overschee and De Moor, 1994). These methods are so different in their algorithms that it is hard to bring them together and gain more insight into the essential ideas and the connections among them. Some effort has nevertheless been made to contrast these methods. Viberg (1995) gave an overview of SIMs, classified them into realization-based and direct types, and pointed out the different ways of obtaining the system matrices via the estimated states or the extended observability matrix. Van Overschee and De Moor (1995) gave a unifying theorem based on a lower order approximation of an oblique projection, in which the different methods are viewed as different choices of the row and column weighting matrices for a reduced rank oblique projection. The basic structure and idea of their theorem come from casting these methods into the N4SID algorithm, and it focuses on the algorithms instead of the concepts and ideas behind these methods.

In this paper, SIMs are compared by casting them into a general statistical regression framework. The fundamental similarities and differences among these SIMs are clearly shown in this statistical framework. All the discussion in this paper is limited to the open loop case of linear time invariant (LTI) systems. In the next section, a general framework for SIMs is set up. The following two sections then discuss its major parts and how the various methods fit into the framework. A simulation example follows to illustrate the key points, and the last section provides conclusions and some application guidelines.

2. General Statistical Framework for SIMs

2.1 Data Relationships in the Multi-step State-space Representation

A combined deterministic-stochastic linear system can be represented in the following state space form:

x_{k+1} = A x_k + B u_k + w_k    (1)
y_k = C x_k + D u_k + N w_k + v_k    (2)

where the outputs y_k, inputs u_k and state variables x_k are of dimensions l, m and n respectively, and the stochastic variables w_k and v_k are of appropriate dimensions and un-correlated with each other. In order to capture the dynamics, SIMs relate multiple steps of past data to multiple steps of future data. For an arbitrary time point k, taken as the current time, the past p steps of the inputs form a vector u_p, and the current and future f steps of the inputs form a vector u_f; similar symbols are used for the output and noise variables (some algorithms assume p = f):


u_p = [u_{k-p}^T, u_{k-p+1}^T, ..., u_{k-1}^T]^T        u_f = [u_k^T, u_{k+1}^T, ..., u_{k+f-1}^T]^T

y_p = [y_{k-p}^T, y_{k-p+1}^T, ..., y_{k-1}^T]^T        y_f = [y_k^T, y_{k+1}^T, ..., y_{k+f-1}^T]^T



For convenience, all the possible u_p for different k are collected in the columns of a matrix Up, which is the past input data set; similar notations are used for Yp, Uf, Yf, Wp, Vp, Wf and Vf, and all the possible x_k are collected in the columns of Xk. The relationships between these data sets and the state variables are analyzed in the following multi-step state-space representation, which serves as a general environment for discussing SIMs and their framework.

System states can be defined as "the minimum amount of information about the past history of a system which is required to predict the future motion" (Åström, 1970). Based on equations (1), (2) and the above notation, the following multi-step state-space model for the current states and for the past and future output data can be obtained (X_{k-p} is the initial state sequence):

Xk = A^p X_{k-p} + Ωp Up + Ωsp Wp    (3)
Yp = Γp X_{k-p} + Hp Up + Hsp Wp + Vp    (4)
Yf = Γf Xk + Hf Uf + Hsf Wf + Vf    (5)

where the extended controllability matrices are Ωp = [A^{p-1}B, A^{p-2}B, ..., AB, B] and Ωsp = [A^{p-1}, A^{p-2}, ..., A, I], the extended observability matrix is Γf = [C^T, (CA)^T, (CA^2)^T, ..., (CA^{f-1})^T]^T, and the lower block triangular Toeplitz matrices Hf and Hsf are:

Hf =
[ D           0           0           ...  0
  CB          D           0           ...  0
  CAB         CB          D           ...  0
  ...         ...         ...         ...  ...
  CA^{f-2}B   CA^{f-3}B   CA^{f-4}B   ...  D ]

Hsf =
[ N          0          0          ...  0
  C          N          0          ...  0
  CA         C          N          ...  0
  ...        ...        ...        ...  ...
  CA^{f-2}   CA^{f-3}   CA^{f-4}   ...  N ]

Γf and Hf show the effects of the current states and of the future inputs on the future outputs, respectively. Substituting X_{k-p} from (4) into (3) gives (Γp+ is the pseudo-inverse of Γp):

Xk = A^p Γp+ Yp + (Ωp - A^p Γp+ Hp) Up + (Ωsp - A^p Γp+ Hsp) Wp - A^p Γp+ Vp    (6)

That is, the current state sequence Xk (and therefore ΓfXk) is a linear combination of the past data; equation (6) summarizes the information in the past history that is necessary to predict the future outputs in (5). Substituting (6) into (5) yields a linear relationship between the future outputs and the past data as well as the future inputs:

Yf = Γf A^p Γp+ Yp + Γf (Ωp - A^p Γp+ Hp) Up + Hf Uf + Γf (Ωsp - A^p Γp+ Hsp) Wp - Γf A^p Γp+ Vp + Hsf Wf + Vf    (7)

The terms involving the past data form the basis for ΓfXk. Only ΓfXk is predictable from the past data set; the future noise terms are unpredictable. ΓfXk is the free evolution of the current outputs (with no future inputs); it is the part of the future output space in (5) that can be estimated from the data relationships, and this predictable subspace is the fundamental basis on which SIMs estimate the state sequence Xk or the observability matrix Γf. HfUf is the effect of the future inputs and can be removed if Hf is known or estimated. With auto-correlated inputs, HfUf is correlated with the past data, and part of it could therefore be calculated from the past data if the input auto-correlation remains unchanged; however, it is not part of the causal effect to be modeled in system identification, and it should not be taken into account for the prediction of Yf based on the past data. The input auto-correlation may thus cause difficulty in the estimation of the predictable subspace and the state variables.
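To make the data arrangement concrete, the matrices Up, Yp, Uf and Yf can be built directly from sampled data. Below is a minimal numpy sketch (our own illustration; the function name and layout are assumptions, not the paper's code): each column stacks the p past or f future samples belonging to one current time k.

```python
import numpy as np

def block_hankel_data(u, y, p, f):
    """Collect the stacked vectors u_p, y_p, u_f, y_f for every admissible
    current time k into the columns of Up, Yp, Uf, Yf.

    u, y : (T, m) and (T, l) arrays of input/output samples.
    p, f : past and future horizons.
    Returns Up (m*p, N), Yp (l*p, N), Uf (m*f, N), Yf (l*f, N),
    where N = T - p - f + 1, one column per current time k.
    """
    T = len(u)
    N = T - p - f + 1
    ks = np.arange(p, p + N)                    # admissible current times k
    # the column for time k stacks samples k-p ... k-1 (past) or k ... k+f-1 (future)
    Up = np.hstack([u[k - p:k].reshape(-1, 1) for k in ks])
    Yp = np.hstack([y[k - p:k].reshape(-1, 1) for k in ks])
    Uf = np.hstack([u[k:k + f].reshape(-1, 1) for k in ks])
    Yf = np.hstack([y[k:k + f].reshape(-1, 1) for k in ks])
    return Up, Yp, Uf, Yf
```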
2.2 General Statistical Framework for SIMs

Each SIM looks quite different from the others in its concepts, computational tools and interpretation. CVA uses CCA (Canonical Correlation Analysis) to estimate the state variables (called memory) and fits them to the state space model; it is interpreted in terms of the maximum likelihood principle. The original MOESP does a QR decomposition on [Uf; Yf] and then an SVD on part of the R matrix; part of the singular vector matrix is taken as Γf, based on which the A and C matrices are estimated, while B and D are estimated through a LS fitting. N4SID projects Yf onto [Yp, Up, Uf] and does an SVD on the part corresponding to the past data; the right singular vectors are taken as the estimated state variables and fitted to the state space model, and the method is interpreted in the concept of non-stationary Kalman filters. Viewed through the detailed algorithms (refer to the original papers), the differences between these SIMs seem so large that it is hard to find the similarities between them. However, if the basic ideas behind these methods are scrutinized from the viewpoint of statistical regression, these methods are found to be very similar and to follow the same framework, which consists of three steps:
i) estimate the predictable subspace ΓfXk by a linear regression method;
ii) extract state variables from the estimated subspace by a latent variable method;
iii) fit the estimated states to a state space model.
The major differences among SIMs are in the first two steps; the third step is the same.
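Read as an algorithm, the three steps can be sketched end-to-end. The following is a deliberately plain illustration using the simplest possible choice in each step (LS regression, PCA, LS fitting); it is not any one published algorithm, and it omits the weighting matrices and numerical refinements that distinguish MOESP, N4SID and CVA.

```python
import numpy as np

def sim_skeleton(Up, Yp, Uf, Yf, n, l, m):
    """One pass through the three-step framework with the simplest choices."""
    # (i) regress Yf on [Yp; Up; Uf]; the part explained by the past data
    #     estimates the predictable subspace Gamma_f * Xk (cf. eq. (7))
    Z = np.vstack([Yp, Up, Uf])
    coef = Yf @ Z.T @ np.linalg.pinv(Z @ Z.T)
    past = np.vstack([Yp, Up])
    pred = coef[:, :past.shape[0]] @ past

    # (ii) extract n latent variables (here principal components) as states
    U, s, Vt = np.linalg.svd(pred, full_matrices=False)
    Xhat = np.diag(s[:n]) @ Vt[:n]              # n x N state sequence

    # (iii) fit the state space model by two LS regressions:
    #       x_{k+1} ~ [x_k; u_k] -> A, B      y_k ~ [x_k; u_k] -> C, D
    X0, X1 = Xhat[:, :-1], Xhat[:, 1:]
    U0 = Uf[:m, :-1]                            # u_k block of Uf
    Y0 = Yf[:l, :-1]                            # y_k block of Yf
    R = np.vstack([X0, U0])
    AB = X1 @ R.T @ np.linalg.pinv(R @ R.T)
    CD = Y0 @ R.T @ np.linalg.pinv(R @ R.T)
    return AB[:, :n], AB[:, n:], CD[:, :n], CD[:, n:]   # A, B, C, D
```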

3. Estimation of the Predictable Subspace

3.1 Linear Regression for Hf to Estimate ΓfXk

In SIMs, the predictable subspace ΓfXk must first be estimated in order to provide a basis for estimating the states Xk or the Γf matrix. From (6), ΓfXk in (5) can be estimated by a linear combination of the past inputs (Up) and past outputs (Yp). The central problem is how to remove the future input effects HfUf from Yf in (5) in order to obtain a better estimate of the predictable subspace ΓfXk. The coefficient matrix Hf is unknown and needs to be estimated. A basic assumption for an unbiased result is that the future inputs are un-correlated with the noise terms in (7).

The true Hf is a lower block triangular matrix, consisting of the first f impulse weights on the lower diagonals (SISO) or impulse weight blocks on the block lower diagonals (MIMO). These features (or requirements on Hf) are very informative; however, most algorithms do not make full use of them.

3.2 Methods Used to Estimate Hf

Different algorithms use different methods to estimate Hf from the input and output data sets, but they all belong to the linear regression method. There are quite a few ways to do this task.

1. Regression of Yf against Uf (MOESP). Since Hf is the coefficient matrix showing the effects of Uf on Yf, it is natural to try to get Hf by directly performing a LS regression of Yf against Uf as in (5). The estimate, say Ĥf, will also include the effect of the state variables ΓfXk in this case. Once Hf is estimated, the predictable subspace is estimated as Yf - ĤfUf. This method gives an unbiased result only when the inputs are white noise signals; if the input sequences are auto-correlated, it regresses part of the state effect away and gives a biased result for the predictable subspace. The original MOESP uses this method to estimate Hf, and the predictable subspace, implicitly via a QR decomposition on [Uf; Yf]. The original MOESP algorithm extracts Γf from the estimated subspace; here MOESP is analyzed based on estimated states that come from exactly the same subspace as Γf (also refer to Van Overschee and De Moor, 1995). The estimated subspace includes all the future noise, which can be removed by projection onto the past data; see the next section for more discussion.

2. Regression of Yf against [Yp, Up, Uf] (N4SID). Based on (6), it is a natural choice to regress Yf against [Yp, Up, Uf]. Here the regression coefficient for Uf, i.e. the third block of the regression coefficient matrix, is an estimate of Hf (Ĥf), and the part corresponding to the past data is an estimate of the predictable subspace, which is equivalent to projecting Yf - ĤfUf onto the past data. Some subspace identification methods, such as N4SID, thus do the estimation of Hf and the projection onto the past data sets in one step. This estimate will have a slight bias if the input signals are auto-correlated, because of the correlation between the past outputs and the past noise terms in (7); in most cases, however, the bias is less than the unpredictable future noise, and an SVD on this subspace gives an asymptotically unbiased estimate of Γf. This is the method used in N4SID to estimate Hf and the predictable subspace.
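A hedged numpy sketch of these two regression routes (our illustration, not the original MOESP or N4SID code; both functions assume the data matrices of Section 2):

```python
import numpy as np

def hf_direct(Yf, Uf):
    """Method 1 (MOESP-style): LS regression of Yf on Uf alone.
    Unbiased only for white-noise inputs."""
    Hf = Yf @ Uf.T @ np.linalg.pinv(Uf @ Uf.T)
    return Hf, Yf - Hf @ Uf                     # Hf estimate, predictable subspace

def hf_joint(Yf, Yp, Up, Uf):
    """Method 2 (N4SID-style): LS regression of Yf on [Yp; Up; Uf].
    The Uf block of the coefficients estimates Hf, and the past-data part
    estimates the predictable subspace in the same step."""
    Z = np.vstack([Yp, Up, Uf])
    coef = Yf @ Z.T @ np.linalg.pinv(Z @ Z.T)
    n_past = Yp.shape[0] + Up.shape[0]
    Hf = coef[:, n_past:]                       # third block: coefficient of Uf
    pred = coef[:, :n_past] @ np.vstack([Yp, Up])
    return Hf, pred
```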
3. Regression-out method. Uf can be regressed out of both sides of (7) by projecting onto the orthogonal complement of Uf, i.e. by post-multiplying both sides by Pufo = I - Uf^T (Uf Uf^T)^{-1} Uf. This removes the Uf term from the equation, and the coefficient matrices for the past data in (7) can then be obtained by regressing YfPufo against PioPufo (Pio = [Yp; Up]). This projection procedure may induce some error: if the inputs are auto-correlated, part of the state effect is removed together with Uf, and the estimation of Xk will be biased. The result turns out to be equivalent to that from N4SID, and the approach was implied in Van Overschee and De Moor (1995). Another similar approach is to regress the past data Pio out of both sides of (7) (projecting onto the orthogonal complement of Pio, i.e. post-multiplying by Ppo = I - Pio^T (Pio Pio^T)^{-1} Pio) for the estimation of Hf; the PO-MOESP (past output MOESP, 1994) gives similar results.

4. Constructing Hf from the impulse weights of an ARX model (CVA). The structure of Hf implies that it can be constructed from the first f impulse weight blocks, the first of which can be obtained by regressing y_k against u_k (if D ≠ 0). These impulse weight blocks can be estimated from a simple model, such as an ARX or FIR model, and the predictable subspace is then estimated as Yf - ĤfUf. This is the method some CVA algorithms use to estimate Hf and the predictable subspace.

5. Instrumental variable method. If there is a variable that is correlated with Uf but has no correlation with Xk or the future noise, an unbiased Hf can be estimated by the instrumental variable (IV) method based on (5). The key problem is the correlation between Uf and Xk, which arises from auto-correlation of the input sequence in the open loop case: Uf correlates with Xk through its correlation with Up. The part of Uf that has no correlation with the past data therefore has no correlation with Xk, and it can be constructed by regressing Up out of Uf and taking the residual as the IV. Once Hf is estimated, the predictable subspace can easily be estimated. The method has also been applied to the CVA method (refer to Van Overschee and De Moor, 1995; Carette, 2000).

All these estimation methods are variants of linear regression; they differ only in their choice of independent and dependent variables. The estimation accuracy (bias and variance) of each method depends on the input signal, the true model structure, the signal to noise ratio (SNR), and the degree to which the known features of Hf are utilized.
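Method 4 can be sketched for the SISO case as follows; the ARX orders and the impulse-weight recursion shown here are our illustrative assumptions, not a prescription from the CVA literature.

```python
import numpy as np

def hf_from_arx(u, y, na=7, nb=7, f=7):
    """Method 4 (CVA-style, SISO): estimate the first f impulse weights from a
    high-order ARX model and build the lower triangular Toeplitz Hf from them."""
    T = len(y)
    k0 = max(na, nb)
    # regressors [y_{k-1}..y_{k-na}, u_k, u_{k-1}..u_{k-nb}] for each k
    rows = [np.concatenate([y[k-1::-1][:na], u[k::-1][:nb+1]]) for k in range(k0, T)]
    Phi = np.asarray(rows)
    theta = np.linalg.lstsq(Phi, y[k0:], rcond=None)[0]
    a, b = theta[:na], theta[na:]
    # impulse weights by the ARX recursion: h_j = b_j + sum_i a_i h_{j-i}
    h = np.zeros(f)
    for j in range(f):
        h[j] = (b[j] if j <= nb else 0.0) \
               + sum(a[i-1] * h[j-i] for i in range(1, min(j, na) + 1))
    # lower triangular Toeplitz Hf: h_0 on the diagonal, h_1 just below, ...
    Hf = np.zeros((f, f))
    for j in range(f):
        Hf += np.diag(np.full(f - j, h[j]), -j)
    return Hf
```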

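Method 5 admits a similarly short sketch. Following the argument above, the instrument is taken as the residual of Uf after regressing out Up (again our own illustrative code):

```python
import numpy as np

def hf_iv(Yf, Uf, Up):
    """Method 5: instrumental variable estimate of Hf. The instrument is the
    part of Uf uncorrelated with the past (the residual of Uf after regressing
    out Up), so it is uncorrelated with Xk and the noise terms in (5)."""
    Z = Uf - (Uf @ Up.T) @ np.linalg.pinv(Up @ Up.T) @ Up   # IV: Uf with Up removed
    Hf = (Yf @ Z.T) @ np.linalg.pinv(Uf @ Z.T)              # IV normal equations
    return Hf
```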
4. Estimation of State Variables

4.1 Latent Variable Methods for State Estimation

The predictable subspace estimated by the linear regression methods of the last section is a high-dimensional space (of dimension much greater than the system order n) consisting of highly correlated variables. If there were no estimation error, this subspace would be only of rank n, and any n independent variables in the data set, or their linear combinations, could be taken as state variables. In fact, the estimation error generally makes the space full rank. A direct choice of any n variables would have a large estimation error and lose the useful information in all the other variables. Extracting only n linear combinations from this highly correlated high-dimensional space, while keeping as much information as possible, is the most desirable approach. This is exactly the general goal and the situation for which latent variable methods were developed, and latent variable methods are therefore employed in all SIMs as the methodology for estimating the state variables from the predictable subspace.

Latent variables (LVs) are linear combinations of the original (manifest) variables that optimize a specific objective, and there are a variety of latent variable methods based on different optimization objectives. Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Reduced Rank Analysis (RRA) maximize variance, covariance, correlation and predictable variance respectively (for details refer to Burnham et al., 1996). Different SIMs employ different LVMs, or use them in different ways, to estimate the state variables.
4.2 Methods Used for State Estimation

1. PCA (MOESP and N4SID). Both N4SID and MOESP extract Xk by doing PCA on the estimated predictable subspace, which is essentially an SVD procedure, and the PCs are taken as the estimated states. This implies the assumptions that ΓfXk has a larger variation than that of the estimation error, and that the two parts are uncorrelated. The first assumption is well satisfied if the signal to noise ratio is large; the second assumption is essential for the unbiasedness of the estimation, and it is not satisfied in the case of auto-correlated inputs.

The state-based MOESP (in the original algorithm) directly uses PCA (SVD) on the estimated predictable subspace Yf - ĤfUf, where Ĥf is obtained by directly regressing Yf onto Uf. If the inputs are auto-correlated, the result will be biased. This estimated predictable subspace also includes all the future noise, which is not predicted by the past data, so the PCA results have large estimation errors and no guarantee of predictability. The PO-MOESP applies PCA to the projection of the estimated predictable subspace onto part of the past data space; the future noise is thereby removed, and the result is generally improved.

N4SID applies PCA (SVD) on the part of the projection Yf/[Yp, Up, Uf] corresponding to the past data. This result can be viewed as the projection of the predictable subspace Yf - ĤfUf onto the past data, which removes the future noise and assures predictability, ensuring that the first n PCs are the estimated state variables. The best predictability in N4SID is clearly in the sense of the total predictable variance of Yf - ĤfUf, and the estimation error and bias of Xk from N4SID are therefore very small in general.

2. CCA (CVA). CVA applies CCA on Pio = [Yp; Up] and Yf - ĤfUf, and the first n latent variables (the canonical variates, CVs) from the past data set, which are highly correlated with the true states, are taken as the estimates of Xk. By selecting the canonical variates with the largest correlations as states, it can be shown that this method is equivalent to performing RRA on the past data and Yf - ĤfUf, and the result is proven to give unbiased state variable estimates (for proofs, see Shi, 2001).

CCA can also be applied to the results of projecting the future inputs Uf out of both the past data and the future outputs, i.e., to PioPufo and YfPufo. For auto-correlated inputs, the direct canonical variates J1 PioPufo are obviously biased estimates of Xk; the coefficient matrix J1 should instead be applied to the original past data to get the state estimates, J1 Pio, although these estimates are no longer orthogonal.
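The CCA step itself can be sketched as follows, assuming the data matrices defined earlier; the whitening-plus-SVD computation shown is one standard way to obtain the canonical variates, not necessarily the implementation used in any particular CVA package.

```python
import numpy as np

def cca_states(Pio, Pred, n, reg=1e-8):
    """CCA between the past data Pio = [Yp; Up] and an estimate of the
    predictable subspace (e.g. Yf - Hf_hat @ Uf). The first n canonical
    variates of the past data are taken as the state estimates J1 @ Pio."""
    N = Pio.shape[1]
    Sxx = Pio @ Pio.T / N + reg * np.eye(Pio.shape[0])
    Syy = Pred @ Pred.T / N + reg * np.eye(Pred.shape[0])
    Sxy = Pio @ Pred.T / N
    # whiten both blocks, then SVD the whitened cross-covariance
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(M)
    J1 = U[:, :n].T @ np.linalg.inv(Lx)         # canonical weights for the past data
    return J1 @ Pio, s[:n]                      # state estimates, canonical corrs
```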

3. Other possible methods. Other LVMs, such as PLS and RRA, are possible choices for state extraction. RRA should provide estimates of the states based on the same objective as N4SID, that is, states which maximize the variance explained in the predictable subspace. In effect, it should give numerically improved estimates, since it directly obtains the n LVs that explain the greatest variance in the predictable subspace, rather than the two-step procedure of first performing an ill-conditioned least squares (oblique projection) followed by PCA/SVD (see Shi, 2001). Since the objective of PLS is to model the variance in the past data set and the predictable subspace as well as their correlation, it will try to provide state vectors for the joint input/output space (refer to Shi and MacGregor, 2000); if the estimated states are used for fitting the state space model, it will generally not provide minimal order state models.

5. Simulation Example

In this section a simulation example is used to illustrate the points discussed. The example is a simple first order SISO process with AR(1) noise, modeled as:

y_k = [0.2 z^{-1} / (1 - 0.8 z^{-1})] u_k + [1 / (1 - 0.95 z^{-1})] e_k

The input signal is a PRBS signal with switching time period Ts = 5 and magnitude 4.0. 1000 data points are collected with var(e_k) = 1, and the SNR is about 0.93 (in variance). Both the past and future lag steps are taken as 7 for every method.

The different methods for estimating the Hf matrix are applied to the simulation example and compared to the true result. A rough comparison is to take the mean of the elements on each lower diagonal as the estimated impulse weight; the results and the total absolute errors are listed in Table 1. The results obtained by regressing Yf directly onto Uf are clearly the farthest from the true values because of the bias resulting from the strong auto-correlation of the input PRBS signal. The results from the other SIMs are very close to the true values.

Each method can also be compared by plotting its estimated impulse response (Fig. 1) and the corresponding errors (Fig. 2). MOESP, which uses the direct regression Yf/Uf, clearly gives poor results for this example; the other methods give results close to the true values. All SIMs give smooth responses by fitting the LVs to the state equation; the somewhat irregular response from the ARX model is also shown for comparison.

There are many indexes for comparing the estimated states to the true states. One quick index is the set of canonical correlation coefficients between the estimated and the true states (see Table 2), which gives a clear idea of how consistent the two spaces are with each other. Similar conclusions are indicated by the squared multiple correlation R2, which shows how much of the total sum of squares of the true states can be explained by the estimated states (the two states are scaled to unit variance). MOESP gives a poor result for this example; the states estimated by the other methods are much closer to the true states.

The CCA results based on PioPufo and YfPufo, and those based on Pio and Yf - HfUf (using the true Hf), are also compared. The estimated states from the former are not linear combinations of those estimated by the latter, and the coefficient matrix J1 from the former is different from that of the latter (too large to show), but they are very close. However, the data set YfPufo has a worse SNR than Yf - ĤfUf in general, since part of the state signal is removed by regressing Uf out while the noise is kept intact.

State estimation by PLS gives relatively degraded results compared with CCA or RRA based on the same predictable subspace estimated via ARX. The result from SIM-ARX-PLS has a large error, but it can be improved to match the others by using more LVs (refer to Shi and MacGregor, 2000).
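The data for this example can be regenerated approximately with the sketch below (our code: the PRBS is approximated by random level switching every Ts samples, and the seed is arbitrary).

```python
import numpy as np

def generate_example(T=1000, Ts=5, mag=4.0, seed=0):
    """Data for the example: y_k = 0.2 z^-1/(1 - 0.8 z^-1) u_k
    + 1/(1 - 0.95 z^-1) e_k, with a PRBS-like input switching every Ts steps."""
    rng = np.random.default_rng(seed)
    # PRBS approximation: random +/- mag levels, each held for Ts samples
    u = np.repeat(mag * rng.choice([-1.0, 1.0], size=T // Ts + 1), Ts)[:T]
    e = rng.standard_normal(T)                  # var(e_k) = 1
    x = d = 0.0
    y = np.zeros(T)
    for k in range(T):
        d = 0.95 * d + e[k]                     # AR(1) disturbance 1/(1 - 0.95 z^-1)
        y[k] = x + d                            # deterministic part plus noise
        x = 0.8 * x + 0.2 * u[k]                # 0.2 z^-1 / (1 - 0.8 z^-1)
    return u, y
```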
6. Conclusions

Although SIMs are quite different in their concepts and algorithms, they follow the same statistical framework set up here: (1) use of a linear regression method to estimate Hf and the predictable subspace; (2) use of a latent variable method for estimation of a minimal set of state variables; and (3) fitting of the estimated states to a state space model. By discussing the SIMs in this framework, their similarities and differences can be clearly seen. The framework also reveals possible new methods and new combinations of existing approaches, such as use of the IV method for the estimation of Hf, and use of other latent variable methods such as RRA and PLS for state estimation. Combining the methods used for predictable subspace estimation with the methods used for state variable estimation can lead to a whole set of different subspace identification methods.

References

Åström, K.J., Introduction to Stochastic Control Theory, Academic Press, 1970.
Burnham, A.J., R. Viveros and J.F. MacGregor, Frameworks for Latent Variable Multivariate Regression, Journal of Chemometrics, V10, pp. 31-45, 1996.
Carette, P., notes for CVA, personal communication, 2000.
Larimore, W.E., Canonical Variate Analysis in Identification, Filtering and Adaptive Control, Proc. 29th IEEE Conference on Decision and Control, Honolulu, Hawaii, 1990.
Larimore, W.E., Optimal Reduced Rank Modeling, Prediction, Monitoring and Control Using Canonical Variate Analysis, preprints of ADCHEM, Banff, Canada, 1997.
Ljung, L. and T. McKelvey, Subspace Identification from Closed Loop Data, Signal Processing, V52, 1996.
Shi, R. and J.F. MacGregor, Modeling of Dynamic Systems Using Latent Variable and Subspace Methods, Journal of Chemometrics, V14, pp. 423-439, 2000.
Shi, R., Modeling of Dynamic Systems Using Latent Variable and Subspace Methods, Ph.D. thesis, McMaster University, Hamilton, ON, Canada, 2001.
Van Overschee, P. and B. De Moor, N4SID: Subspace Algorithms for the Identification of Combined Deterministic-Stochastic Systems, Automatica, V30, No. 1, pp. 75-93, 1994.
Van Overschee, P. and B. De Moor, A Unifying Theorem for Three Subspace System Identification Algorithms, Automatica, V31, No. 12, pp. 1853-1864, 1995.
Verhaegen, M. and P. Dewilde, Subspace Model Identification, Part 1: The Output-error State-space Model Identification Class of Algorithms, International Journal of Control, V56, pp. 1187-1210, 1992.
Verhaegen, M., Identification of the Deterministic Part of MIMO State Space Models Given in Innovations Form from Input-output Data, Automatica, V30, No. 1, pp. 61-74, 1994.
Viberg, M., Subspace-based Methods for the Identification of Linear Time-invariant Systems, Automatica, V31, No. 12, pp. 1835-1851, 1995.

Table 1: Impulse weights w1-w7 in the estimated Hf and the total absolute error for each estimation method (true values, direct Yf/Uf regression, Yf/[Pio, Uf] regression, IV and ARX). The true weights of the example process are 0.2, 0.16, 0.128, 0.1024, 0.0819, 0.0655 and 0.0524; the Yf/Uf column shows by far the largest total error.

Table 2: Canonical correlations (1st CC, 2nd CC) and squared multiple correlation R2 between the estimated and the true states for MOESP, N4SID, CVA, SIM-ARX-CCA, SIM-ARX-PLS, SIM-ARX-RRA and SIM-IV-ARX; each column lists the method used for the predictable subspace (Yf/Uf, Yf/[Pio, Uf] or ARX) and for the state estimation (PCA, CCA, PLS or RRA). Consistent with the discussion in the text, MOESP shows noticeably lower values than the other methods.

Fig. 1 Impulse responses from the SIMs (true response vs. models from MOESP, N4SID and CVA; x-axis: time in sampling intervals)

Fig. 2 Error on the impulse responses (modeling results by SIMs)