
The Cointegrated VAR Model: Econometric Methodology and Macroeconomic Applications

Katarina Juselius

July 20, 2003

Chapter 1 Introduction
Economists frequently formulate an economically well-specified model as the empirical model and apply statistical methods to estimate its parameters. In contrast, statisticians might formulate a statistically well-specified model for the data and analyze the statistical model to answer the economic questions of interest. In the first case, statistics are used passively as a tool to get some desired estimates; in the second case, the statistical model is taken seriously and used actively as a means of analyzing the underlying generating process of the phenomenon in question.

The general principle of analyzing statistical models instead of applying methods can be traced back to R.A. Fisher. It was introduced into econometrics by Haavelmo (1944) [hereafter Haavelmo] and operationalized and further developed by Hendry and Richard (1983), Hendry (1987), Johansen (1995) and recent followers. Haavelmo's influence on modern econometrics has been discussed, for example, in Hendry, Spanos, and Ericsson (1989) and Anderson (1992).

Few observed macroeconomic variables can be assumed fixed or predetermined a priori. Haavelmo's probability approach to econometrics, therefore, requires a probability formulation of the full process that generated the data. Thus, the statistical model is based on a full system of equations. The computational complexities involved in the solution of such a system were clearly prohibitive at the time of the monograph, when even the solution of a multiple regression problem was a nontrivial task. In today's computerized world it is certainly technically feasible to adopt Haavelmo's guidelines in empirical econometrics. Although the technical difficulties were solved long ago, most papers in empirical macroeconomics do not seem to follow the general principles stated very clearly in the monograph.

In this monograph we will claim that the VAR approach offers a number of advantages as a general framework for addressing empirical questions in (macro)economics without violating Haavelmo's general probability principle. First, we need to discuss a number of important questions raised by Haavelmo and their relevance for recent developments in the econometric analysis of time series. In so doing we will essentially focus on issues related to empirical macroeconomic analysis using readily available aggregated data, and only briefly contrast this situation with a hypothetical case in which the data have been collected by controlled experiments.

The last few decades, in particular the nineties, will probably be remembered as a period when the scientific status of macroeconomics, in particular empirical macroeconomics, was much debated and criticized, both from outside and, increasingly, from inside the economics profession. See, for example, the discussions in Colander and Klamer (1987), Colander and Brenner (1992), and Colander (2001). An article expressing critical views on empirical economics is Summers (1991), in which he discusses what he claims to be the scientific illusion in empirical macroeconomics. He observes that applied econometric work in general has exerted little influence on the development of economic theory and generated little new insight into economic mechanisms. As an illustration he discusses two widely different approaches to applied econometric modelling: (i) the representative agent approach, where the final aim of the empirical analysis is to estimate a few deep parameters characterizing preferences and technology, and (ii) the use of sophisticated statistical techniques, exemplified by VAR models à la Sims, to identify certain parameters on which inference to the underlying economic mechanisms is based.
In neither case does he find that the obtained empirical results can convincingly discriminate between theories aimed at explaining a macroeconomic reality which is infinitely more rich and complicated than the highly simplified empirical models. Therefore, he concludes that a less formal examination of empirical observations, the so-called stylized facts (usually given as correlations, mean growth rates, etc.), has generally resulted in more fruitful economic research. This is a very pessimistic view of the usefulness of formal econometric modelling, which has not yet been properly met, nor officially discussed by the profession. The aim of this book is to challenge this view, claiming that part of the reason why empirical results often appear unconvincing is a failure to follow the principles laid out by Haavelmo (1944).


The formal link between economic theory and empirical modelling lies in the field of statistical inference. The focus here is on statistical aspects of the proposed VAR methodology, while at the same time stressing applicability in the field of macroeconomic models. Therefore, throughout the text the statistical concepts are interpreted in terms of relevant economic entities. The idea is to define a different class of stylized facts which are statistically well founded and much richer than the conventional graphs, correlations and mean growth rates often referred to in discussions of stylized facts.

In this chapter we will revisit Haavelmo's monograph as a background for the discussion of the reasons for the scientific illusion in empirical macroeconomics, and ask whether it can be explained by a general failure to follow the principles expressed in the monograph. Section 1.1 provides a historical overview, and Section 1.2 discusses the choice of a theoretical model. Sections 1.3-1.5 discuss three issues from Haavelmo that often seem to have been overlooked in empirical macroeconomics: (i) the link between theoretical, true and observed variables, (ii) the distinction between testing a hypothesis and testing a theory, and (iii) the formulation of an adequate design of experiment in econometrics, relating it to a design by controlled experiments and by passive observation. Section 1.6, finally, introduces the empirical problem which is to be used as an illustration throughout the book.

1.1

A historical overview

To be included!

1.2

On the choice of economic models

This section discusses the important link between economic theory and the empirical model. In order to make the discussion more concrete, we will illustrate the ideas with an example taken from the monetary sector of the economy. In particular we will focus on the aggregate demand for money relation, one of the most analyzed relations in empirical macroeconomics. Before selecting a theoretical model describing the demand for money as a function of some hypothetical variables, we will first discuss the reasons why it is interesting to investigate such a relation.

The empirical interest in money demand relations stems from basic macroeconomic theory postulating that the inflation rate is directly related to the expansion in the (appropriately defined) supply of money at a rate greater than that warranted by the growth of the real productive potential of the economy. The policy implication is that the aggregate supply of money should be controlled in order to control the inflation rate. The optimal control of money, however, requires knowledge of the non-inflationary level of aggregate demand for money at each point in time, defined as the level of money stock, m, at which there is no tendency for the inflation rate to increase or decrease. Thus, on a practical level, the reasoning is based on the assumption that there exists a stable aggregate demand-for-money relation, m = f(x), that can be estimated.

Given this background, what can be learned from the available economic theories about the form of such a relation, and which are the crucial determinants? There are three distinct motives for holding money. The transactions motive is related to the need to hold cash for handling everyday transactions. The precautionary motive is related to the need to hold money to be able to meet unforeseen expenditures. Finally, the speculative motive is related to agents' wishes to hold money as part of their portfolio. Since all three motives are likely to affect agents' needs to hold money, let the initial assumption be that m/p = f(y^r, c), saying that real money holdings, m/p, are a function of the level of real income, y^r (assumed to determine the volume of transactions and precautionary money), and the cost of holding money, c. Further assumptions of optimizing behavior are needed in order to derive a formal model for agents' willingness to hold money balances.
Among the available theories, two different approaches can be distinguished: (i) theories treating money as a medium of exchange for transaction purposes, so that optimizing behavior means minimizing a derived cost function, and (ii) theories treating money as a good producing utility, so that optimizing behavior means maximizing the utility function. For expository purposes, only the first approach will be discussed here, specifically the theoretical model suggested by Baumol (1952), which is still frequently referred to in this context. The model is strongly influenced by inventory theory and has the following basic features. Over a certain time period [t1, t2], the agent will pay out T units of money in a steady stream. Two different costs are involved: the opportunity cost of the foregone investment, measured by the interest rate r, and the so-called brokerage cost, b. The latter should be assumed to cover all kinds of costs in connection with a cash withdrawal. It is also assumed that liquid money does not yield interest.

The optimal value of cash withdrawn from investment can now be found as:

C = √(2bT/r)     (1.1)

so that the cost-minimizing agent will demand cash in proportion to the square root of the value of his transactions. The average holding of cash under these assumptions is C/2. Taking logarithms of (1.1) gives a transactions elasticity of 0.5 and an interest elasticity of -0.5. These have been the prior hypotheses of many empirical investigations based on aggregated data, and supporting estimates have been found, for instance, in Baba, Hendry, and Starr (1992).

If this theoretical model is tested against data, more precise statements of what is meant by the theoretical concepts C, b, T and r need to be made. This is no straightforward task, as expressed by Haavelmo, p. 4:

"When considering a theoretical set-up, involving certain variables and certain mathematical relations, it is common to ask about the actual meaning of this and that variable. But this question has no sense within a theoretical model. And if the question applies to reality it has no precise answer. It is one thing to build a theoretical model, it is another thing to give rules for choosing the facts to which the theoretical model is to be applied. It is one thing to choose the model from the field of mathematics, it is another thing to classify and measure objects of real life."

As a means to clarify this difficult issue, Haavelmo introduces the concepts of true and theoretical variables as opposed to observable variables, and proposes that one should try to define how the variables should be measured in an ideal situation. This will be briefly discussed in the next section.
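Equation (1.1) and its implied elasticities can be checked numerically. The following is a minimal sketch (not from the text; the parameter values and function names are purely illustrative), computing the Baumol square-root rule and verifying that the log elasticities with respect to T and r are 0.5 and -0.5:

```python
import math

def baumol_cash(b, T, r):
    """Optimal cash withdrawal C = sqrt(2bT/r), equation (1.1)."""
    return math.sqrt(2 * b * T / r)

def log_elasticity(f, x, eps=1e-6):
    """Numerical elasticity d log f / d log x via a small relative step."""
    return (math.log(f(x * (1 + eps))) - math.log(f(x))) / math.log(1 + eps)

# Illustrative parameter values only.
b, T, r = 2.0, 1000.0, 0.05

e_T = log_elasticity(lambda T_: baumol_cash(b, T_, r), T)  # transactions elasticity
e_r = log_elasticity(lambda r_: baumol_cash(b, T, r_), r)  # interest elasticity

print(round(e_T, 3), round(e_r, 3))  # → 0.5 -0.5
```

Because log C = 0.5(log 2b + log T - log r), both elasticities are exact, not merely approximate, for any multiplicative perturbation.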

1.3

Theoretical, true and observable variables

In order to operationalize a theoretical concept, one has to make precise statements about how to measure the corresponding theoretical variable, i.e. to give the theoretical variable a precise meaning. This is expressed in Haavelmo, p. 5, as:

"We may express the difference [between the true and the theoretical variables] by saying that the true variables (or time functions) represent our ideal as to accurate measurements of reality as it is in fact, while the variables defined in theory are the true measurements that we should make if reality were actually in accordance with our theoretical model."


Say, for example, that a careful analysis of the above example shows that the true measurements are the average holdings of cash and demand deposits by private persons in private banks, postal banks or similar institutions (building societies, etc.), measured at closing time on each trading day of a month. The theoretical variable C as defined by Baumol's model would then correspond to the true measurements given that (i) no interest is paid on this liquid money, (ii) transactions are paid out in a steady stream over successive periods, (iii) the brokerage cost b and the interest rate r are unambiguously defined, (iv) no cash and demand deposits are held for speculative or precautionary motives, and so on. Needless to say, the available measurements from official statistics are very far from the definitions of the true variables. Even if it were possible to obtain measurements satisfying the above definition of the true measurements, it seems obvious that these would not correspond to the theoretical variables¹.

Nevertheless, if the purpose of the empirical investigation is to test a theory, a prerequisite for valid inference about the theoretical model is a close correspondence between the observed variables and the true variables, or in the words of Haavelmo:

"It is then natural to adopt the convention that a theory is called true or false according as the hypotheses implied are true or false, when tested against the data chosen as the true variables. Then we may speak interchangeably about testing hypotheses or testing theories."

The unit root tests of GDP series used to discriminate between different real growth theories are a good example of misleading inference in this respect. Much of the criticism expressed by Summers may well be related to situations in which valid inference would require a much closer correspondence between observed and true variables.
Even if data collected by passive observation do not generally qualify for testing deep theoretical models, most empirical macroeconomic models are nonetheless based on the officially collected data, simply because of a genuine interest in the macroeconomic data as such. Since data seldom speak by themselves, theoretical arguments are needed to understand the variation in these data in spite of the weak correspondence between the theoretical and the observed variables. Nevertheless, the link between the theoretical and the empirical model is rather ambiguous in this case, and the interpretation of the empirical results in terms of the theoretical arguments is not straightforward. This leads to the second issue to be discussed here, i.e. the distinction between testing a hypothesis and testing a theory. This monograph will claim that, based on macroeconomic data, it is only possible to test theoretical hypotheses, not a theory model as such. The frequent failure to separate the two might explain a great deal of Summers' critique.

¹ It should be pointed out that Baumol does not claim any such correspondence. In fact, he gives a long list of reasons why this cannot be the case.

1.4

Testing a theory as opposed to a hypothesis

When the arguments of the theory do not apply directly to the empirical model, one compromise is to be less ambitious about testing theories and instead concentrate on testing specific hypotheses derived from the theoretical model. For example, the hypotheses that the elasticity of the transactions demand for cash is e_c = 0.5 and that of the interest rate is e_r = -0.5 have frequently been tested within empirical models that do not include all aspects of Baumol's theoretical model. Other popular hypotheses that have been widely tested include long-run price and income homogeneity, the sign of first derivatives, zero restrictions, and the direction of causality. Sophisticated statistical techniques have often been used in this context. Since, according to Summers among others, the outcomes of these exercises have generally not been considered convincing or interesting enough to change economists' views, there seems to be a need for a critical appraisal of econometric practice in this context.

There are several explanations why empirical results are often considered unpersuasive. First, not enough care is taken to ensure that the specification of empirical models mimics the general characteristics of the data. If the empirical model is based on a valid probability formulation of the data, then statistical hypothesis testing is straightforward, and valid procedures can be derived by analyzing the likelihood function. In this case, inferences about the specified hypotheses are valid, though valid inference about the theory as such depends on the strength of the correspondence between the true and the theoretical variables. Second, not enough attention is paid to the crucial role of the ceteris paribus "everything else constant" assumption when the necessary information set is specified. Third, the role of rational expectations as a behavioral assumption in empirical models has been very unconvincing! Fourth, the available measurements do not correspond closely enough to the true values of the theoretical variables. For example, in many applications it is assumed that certain (linearly transformed) VAR residuals can be interpreted as structural shocks. This would require the VAR residuals to be invariant to changes in the information set, which is not likely to be the case in practical applications.

These issues will be further discussed in the next chapter and related to some general principles for VAR modelling with nonstationary data. But first, the concept of a design of experiment in econometrics, as discussed by Haavelmo, has to be introduced.

1.5

Experimental design in macroeconomics

As discussed above, the link between the true variables suggested by the theory and the actual measurements taken from official statistics is, in most cases, too weak to justify valid inference from the empirical model to the theory. Even in ideal cases, when the official definition of the aggregated variable, say liquid money stock M1, corresponds quite closely to the true measurements, there are usually measurement problems. High-quality, reasonably long aggregated series are difficult to obtain because definitions change, new components have entered the aggregates, and new regulations have changed the behavior. The end result is the best set of measurements in the circumstances, but still quite far from the true measurements of an ideal situation. This problem is discussed in terms of a design of experiment in Haavelmo, p. 14:

"... If economists would describe the experiments they have in mind when they construct the theories, they would see that the experiments they have in mind may be grouped into two different classes, namely, (1) experiments that we should like to make to see if certain real economic phenomena, when artificially isolated from other influences, would verify certain hypotheses, and (2) the stream of experiments that nature is steadily turning out from his own enormous laboratory, and which we merely watch as passive observers. ... In the first case we can make agreements or disagreements between theory and facts depend on two things: the facts we choose to consider and our theory about them. ... In the second case we can only try to adjust our theories to reality as it appears before us. And what is the meaning of a design of experiment in this case. It is this: We try to choose a theory and a design of experiments to go with it, in such a way that the resulting data would be those which we get by passive observation of reality. And to the extent that we succeed in doing so, we become masters of reality by passive agreement."

Since the primary interest here is in the case of passive observations, we will restrict ourselves to this case. What seems to be needed is a set of assumptions general enough to ensure a statistically valid description of typical macroeconomic data, and a common modelling strategy that allows questions of interest to be investigated in a consistent framework. Chapter 3 will discuss under which conditions the VAR model can work as a reasonable formalization of a design of experiment for data obtained by passive observation.

Controlled experiments are not usually possible within a single macroeconomy, and the only possibility to test hypothetical relationships is to wait for new data which were not used to generate the hypothesis. Another possibility is to rely on the experiments provided by other countries or regions that differ in various respects with regard to the investigated economic problem. For instance, if the question of interest is whether an expansion of the money supply generally tends to increase the inflation rate, it seems advisable to examine this using similar data from countries that differ in terms of the pursued economic policy. In the rest of the book we will discuss the applicability of the cointegrated VAR model as a common modelling strategy.
We will argue that this model, in spite of its simplicity, offers a potential richness in the specification of economically meaningful short- and long-run structures and components, such as steady-state relations and common trends, interaction and feedback effects. Even more importantly, in its unrestricted general form the VAR model is essentially only a reformulation of the covariances of the data. Provided that these have remained approximately constant over the sample period, the VAR can be considered a convenient summary of the stylized facts of the data. To the extent that the true economic model underlying behavior satisfies a first-order linear approximation (see Hendry and Richard, 1983), one can test economic hypotheses expressed as the number of autonomous permanent shocks, steady-state behavior, feedback and interaction effects, etc. within a statistically valid framework. This is essentially the general-to-specific approach described in Hendry and Mizon (1993), subsequently evaluated in Hoover and Perez (1999) and recently implemented as an automatic selection procedure in PcGets (Hendry and Krolzig, 2003). The final empirical model should, in the ideal case, be structural both in the economic and the statistical sense of the word.

The VAR procedure is less pretentious about the prior role of a theoretical economic model, but it avoids the lack of empirical relevance the theory-based approach has often been criticized for. Since the starting point is the time-series structure of the chosen data, it is often advantageous to search for structures against the background of not just one but a variety of possibly relevant theories. In this sense the suggested approach is a combination of inductive and deductive inference.
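The idea that a few common stochastic trends drive the nonstationarity of the individual series, while certain linear combinations (steady-state relations) remain stationary, can be illustrated with a small simulation. This sketch is not from the text; the data-generating process, weights and sample size are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two I(1) series driven by one common stochastic trend:
#   x1 = trend + noise,  x2 = 0.5*trend + noise,
# so the combination x1 - 2*x2 eliminates the trend and is stationary,
# i.e. (1, -2) is a cointegrating vector.
T = 2000
trend = np.cumsum(rng.normal(size=T))      # random walk = common stochastic trend
x1 = trend + rng.normal(size=T)
x2 = 0.5 * trend + rng.normal(size=T)

spread = x1 - 2.0 * x2

# The levels wander (sample variance grows with T),
# while the spread fluctuates around a constant level.
print(np.var(x1), np.var(spread))
```

The sample variance of each level series is dominated by the random-walk component, whereas the spread's variance stays of the order of the stationary noise, which is the essence of a cointegrating (steady-state) relation.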

1.6

On the choice of empirical example

Throughout the book we will illustrate the methodological arguments with empirical analyses of Danish money demand data. These data have been extensively analyzed by myself and Søren Johansen, both as an inspiration for developing new test procedures and as a means to understand the potential use of the cointegrated VAR model. See, for example, Johansen and Juselius (1990) and Juselius (1993a).

Given the above discussion, one should ask whether there is a close enough correspondence between the observed variables and the true versus theoretical variables, for example as given by the Baumol model. The answer is clearly no. A detailed examination of the measurements reveals that essentially all the usual problems plague the data. For instance, new components have entered the aggregate, banking technology has changed, the time interval between the measurements is too broad, and so on. From these observed variables, it would be very hard to justify inference from the empirical analysis to a specific transactions demand-for-money theory. The main reason why these variables were selected is simply that they were being used by the Central Bank of Denmark as a basis for its policy analysis. In that sense, the data have been collected because of an interest in the variables for their own sake. This choice does not exclude the possibility that these variables are closely related to the theoretical variables, nor that the empirical investigation might suggest other, more relevant measurements for policy analysis. For example, LaCour (1993) demonstrated that long-run price homogeneity was more strongly accepted when a weighted average of components of different liquidity was used as a proxy for the transactions demand for money.

Looking in the rear-view mirror, after having applied the cointegrated VAR model to many other data sets, it is obvious that we were very fortunate to begin our first cointegration attempts using this data set. It turned out to be one of the rare data sets describing long-run relationships which have remained remarkably stable over the last few decades. However, it should be pointed out that this was true only after having econometrically accounted for a regime shift in 1983 as a consequence of deregulating capital movements in Denmark. In this sense the Danish data were also able to illustrate a most important finding: the sensitivity of the VAR approach to empirical shifts in growth rates and in equilibrium means.

Many of the theoretical advances were directly influenced by empirical analyses of this data set. In the first applied paper (Johansen and Juselius, 1990), a vector of real money, real income, and two interest rates was analyzed and only one cointegration relation was found. However, this was based on an implicit assumption of long-run price homogeneity. The need to test this properly resulted in the development of the I(2) analysis (Johansen, 1992, 1995, 1997). The latter approach demonstrated that the original specification in real variables was misspecified without including the inflation rate. Juselius (1993) tested the long-run price homogeneity assumption in the I(2) model and found that it was accepted. Re-estimating the model with the inflation rate correctly included as a new variable showed that the inflation rate was a crucial determinant in the system, which strongly affected the interpretation of the steady-state relations and the dynamic feedback effects within the system.
Including the inflation rate in the vector produced one more cointegration relation (a relation between the two interest rates and inflation) in addition to the previously found money demand relation. When the number of potentially interesting variables is large, so is the number of cointegrating relations. In this case, it is often a tremendously difficult task to identify them: the number of possible combinations is simply too large. If, instead, more information is gradually added to the analysis, it is possible to build on previous results and, thus, not to lose track so easily.

The idea of gradually increasing the data vector of the VAR model, building on previously found results, was also influenced by empirical analyses of the Danish data. Juselius (1992a) added the loan rate to the previously used information set and found one additional cointegration relation between the loan rate and the deposit rate. This approach was then further developed in Juselius (1992b), where three different sectors of the macroeconomy were analyzed separately and then combined into one model. Thus, contrary to the general-to-specific approach of the statistical modelling process (i.e. of imposing more and more restrictions on the unrestricted VAR), it appeared more advantageous to follow the principle of specific-to-general in the choice of information set.

The representation and statistical analysis of the I(2) model meant a major step forward in the empirical understanding of the mechanisms determining nominal growth rates in Denmark. Juselius (1993) analyzed common trends in both the I(2) and I(1) models, and found that the stochastic trend component in nominal interest rates seemed to have generated price inflation. This was clearly against conventional wisdom, which predicted the link to be the other way around. Needless to say, such a result could not convince economists as long as it stood alone. Therefore, a similar design was used in a number of other studies (ref.) based on data from various European countries (with essentially the same conclusions!). The experience of looking at various economies characterized by different policies through the same spectacles generated the idea that the VAR approach could possibly be used as a proxy for a designed experiment in a situation where only data by passive observation were available.

Today's academics, living under increasingly strong pressure to publish or perish, can seldom afford to spend time writing computer software. It is no coincidence that the most influential works in econometrics in the last decades were those which combined theoretical results with the development of the necessary computer software.
To meet the demand for empirical applicability, all test and estimation procedures discussed in this book are readily implemented as user-friendly menu-driven programs in, for example, CATS for RATS (Hansen and Juselius, 1995) or in PcGive (Doornik and Hendry, 2002).

An urge to understand more fully why this approach frequently produced results that seemed to add a question mark to conventional theories and beliefs was the motivation for writing the next chapter. It focuses on what could be called the economists' approach, as opposed to the statisticians' approach, to macroeconomic modelling, distinguishing between models and relations in economics and econometrics. The aim is to propose a framework for discussing the probability approach to econometrics as contrasted with more traditional methods. It draws heavily on Juselius (1999).

Chapter 2 Models and Relations in Economics and Econometrics


In Chapter 1 we discussed Haavelmo's probability approach to empirical macroeconomics and the need to formulate a statistical model based on a stochastic formulation of the process that has generated the chosen data. Because most macroeconomic data exhibit strong time dependence, it is natural to formulate the empirical model in terms of time-dependent stochastic processes.

The organization of this chapter is as follows: Section 2.1 discusses in general terms the VAR approach as contrasted with a theory-based model approach. Section 2.2 briefly considers the treatment of inflation and monetary policy in Romer (1996), with special reference to the equilibrium in the money market. Section 2.3 discusses informally some empirical and theoretical implications of unit roots in the data. Section 2.4 addresses more formally a stochastic formulation based on a decomposition of the data into trends, cycles, and irregular components. Section 2.5 gives an empirical motivation for treating the stochastic trend in nominal prices as I(2), and Section 2.6 as I(1).

2.1

The VAR approach and theory-based models

The vector autoregressive (VAR) process based on Gaussian errors has frequently been a popular choice as a description of macroeconomic time series data. There are many reasons for this: the VAR model is flexible, easy to estimate, and it usually gives a good fit to macroeconomic data. However, the possibility of combining long-run and short-run information in the data by exploiting the cointegration property is probably the most important reason why the VAR model continues to receive the interest of both econometricians and applied economists.

Theory-based economic models have traditionally been developed as nonstochastic mathematical entities and applied to empirical data by adding a stochastic error process to the mathematical model¹. As an example of this approach we will use the macroeconomic treatment in "Inflation and Monetary Policy", Chapter 9 in D. Romer (1996): Advanced Macroeconomics. From an econometric point of view the two approaches are fundamentally different: one starts from an explicit stochastic formulation of all data and then reduces the general statistical (dynamic) model by imposing testable restrictions on the parameters; the other starts from a mathematical (static) formulation of a theoretical model and then expands the model by adding stochastic components. For a detailed methodological discussion of the two approaches, see for example Gilbert (1986), Hendry (1995), Juselius (1993), and Pagan (1987).

Unfortunately, the two approaches have been shown to produce very different results, and hence different conclusions, even when applied to identical data. From a scientific point of view this is not satisfactory. Therefore, we will here attempt to bridge the gap between the two views by starting from some typical questions of theoretical interest and then showing how one would answer these questions based on a statistical analysis of the VAR model. Because the latter by construction is bigger than the theory model, the empirical analysis not only answers a specific theoretical question, but also gives additional insight into the macroeconomic problem.
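The claim that the unrestricted VAR is easy to estimate reflects the fact that its parameters can be obtained equation by equation by ordinary least squares. A minimal simulation sketch (not from the text; the coefficient matrix and sample size are illustrative assumptions) of fitting an unrestricted bivariate VAR(1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a stable bivariate VAR(1): x_t = A x_{t-1} + e_t
A = np.array([[0.7, 0.1],
              [0.0, 0.9]])
T = 5000
x = np.zeros((T, 2))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.normal(size=2)

# The unrestricted VAR is estimated equation by equation with OLS;
# the regressors are the lagged values of all variables in the system.
Y, X = x[1:], x[:-1]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T

print(np.round(A_hat, 2))  # close to the true A for a large sample
```

In practice one would add a constant (and possibly deterministic terms), choose the lag length by information criteria, and check the residuals; the point here is only that the unrestricted form requires nothing beyond least squares.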
A theory model can be simplified by the ceteris paribus assumption "everything else unchanged", whereas a statistically well-specified empirical model has to address the theoretical problem in the context of everything else changing. By embedding the theory model in a broader empirical framework, the analysis of the statistically based model can provide evidence of possible pitfalls in macroeconomic reasoning. In this sense the VAR analysis can be useful for generating new hypotheses, or for suggesting modifications of too narrowly specified theoretical models. As a convincing illustration see Hoffman (2001?).

Throughout the book we will illustrate a variety of econometric problems by addressing questions of empirical relevance based on an analysis of monetary inflation and the transmission mechanisms of monetary policy. These questions have been motivated by many empirical VAR analyses of money, prices, income, and interest rates and include questions such as: How effective is monetary policy when based on changes in money stock or changes in interest rates? What is the effect of expanding money supply on prices in the short run? In the medium run? In the long run? Is an empirically stable demand-for-money relation a prerequisite for monetary policy control to be effective? How strong is the direct (indirect) relationship between a monetary policy instrument and price inflation?

Based on the VAR formulation we will demonstrate that every empirical statement can, and should, be checked for its consistency with all previous empirical and theoretical statements. This is in contrast to many empirical investigations, where inference relies on many untested assumptions using test procedures that only make sense in isolation, but not in the full context of the empirical model.

¹ Dynamic general equilibrium models

2.2 Inflation and money growth

A fundamental proposition in macroeconomic theory is that growth in money supply in excess of real productive growth is the cause of inflation, at least in the long run. Here we will briefly consider some conventional ideas underlying this belief as described in Chapter 9 of Romer (1996).

The well-known diagram illustrating the intersection of aggregate demand and aggregate supply provides the framework for identifying potential sources of inflation as shocks shifting either aggregate demand upwards or aggregate supply to the left. See the upper panel of Figure 2.1. As examples of aggregate supply shocks that shift the AS curve to the left Romer (1996) mentions: negative technology shocks, downward shifts in labor supply, and upwardly skewed relative-cost shocks. As examples of aggregate demand shocks that shift the AD curve to the right he mentions: increases in money stock, downward shifts in money demand, and increases in government purchases. Since all these types of shocks, and many others, occur quite frequently, there are many factors that potentially can affect inflation. Some of these shocks may only influence inflation temporarily and are, therefore, less important than shocks with a permanent effect on inflation. Among the latter, economists usually emphasize changes in money supply as the crucial inflationary source. The economic intuition behind this is that other factors are limited in scope, whereas money in principle is unlimited in supply. More formally, the reasoning is based on money demand and supply and the condition for equilibrium in the money market:

M/P = L(R, Y^r),   L_R < 0, L_Y > 0,   (2.1)

where M is the money stock, P is the price level, R the nominal interest rate, Y^r real income, and L(·) the demand for real money balances. Based on the equilibrium condition, i.e. no changes in any of the variables, Romer (1996) concludes that the price level is determined by:

P = M/L(R, Y^r).   (2.2)

The equilibrium condition (2.1) and, hence, (2.2) is a static concept that can be thought of as a hypothetical relation between money and prices for fixed income and interest rate. The underlying comparative static analysis investigates the effect on one variable, say price, of changing another variable, say money supply, with the purpose of deriving the new equilibrium position after the change. Thus, the focus is on the hypothetical effect of a change in one variable (M) on another variable (P), when the additional variables (R and Y^r) are exogenously given and everything else is taken account of by the ceteris paribus assumption.

However, when time is introduced, the ceteris paribus assumption and the assumption of fixed exogenous variables become much more questionable. Neither interest rates nor real income have been fixed or controlled in most periods subject to empirical analysis. Therefore, in empirical macroeconomic analysis all variables (including the ceteris paribus ones) are more or less continuously subject to shocks, some of which permanently change the previous equilibrium condition. In this sense an equilibrium position is always related to a specific time point in empirical modelling. Hence, the static equilibrium concept has to be replaced by a dynamic concept, for instance a steady-state

position. For an equilibrium relation time is irrelevant, whereas a steady-state relation without a time index is meaningless. In a typical macroeconomic system new disturbances push the variables away from steady state, but the economic adjustment forces pull them back towards a new steady-state position.

To illustrate the idea one can use an analogy from physics and think of the economy as a system of balls connected by springs. When left alone the system will be in equilibrium, but pushing a ball will bring the system away from equilibrium. Because all balls are connected, the shock will influence the whole system, but after a while the effect will die out and the system is back in equilibrium. In the economy, the balls would correspond to the economic variables and the springs to the transmission mechanisms that describe how economic shocks are transmitted through the system. As we know, the economy is not a static entity. Instead of saying that the economy is in equilibrium it is more appropriate to use the word steady state. Hence, we need to replace the above picture with a system where the balls are moving with some controlled speed; pushing a ball will change that speed and influence all the other balls. Left alone, the system will return to the controlled state, i.e. the steady state. However, in real applications the adjustment back to steady state is disturbed by new shocks and the system essentially never comes to rest. Therefore, we will not be able to observe a steady-state position, and the empirical investigation has to account for the stochastic properties of the variables as well as the theoretical equilibrium relationship between them. See the lower panel of Figure 2.1 for an illustration of a stochastic steady-state relation.

In (2.1) the money market equilibrium is an exact mathematical expression and it is straightforward to invert it to determine prices, as is done in (2.2).
The observations from a typical macroeconomic system are adequately described by a stochastic vector time series process. But in stochastic systems, inversion of (2.1) is no longer guaranteed (see for instance Hendry and Ericsson, 1991). If the inverted (2.1) is estimated as a regression model it is likely to produce misleading conclusions. The observed money stock can be demand or supply determined, or both, but it is not necessarily a measurement of a long-run steady-state position. This raises the question whether it is possible to empirically identify and estimate the underlying theoretical relations. For instance, if central banks are able to effectively control money stock, then observed money holdings are likely to be supply determined and the demand for money has to adjust to the supplied quantities. This is likely to be the case in trade and capital

Figure 2.1: An equilibrium position of the AD and AS curve (upper panel) and deviations from an estimated money demand relation for Denmark: (m − p − y)_t − 14.1(R_m − R_b)_t, 1975-1995 (lower panel).

regulated economies or, possibly, in economies with flexible exchange rates, whereas in open deregulated economies with fixed exchange rates central banks would not in general be able to control money stock. In the latter case one would expect observed money stock to be demand determined.

Under the assumption that the money demand relation can be empirically identified, the statistical estimation problem has to be addressed. Because macroeconomic variables are generally found to be nonstationary, standard regression methods are no longer feasible from an econometric point of view. Cointegration analysis specifically addresses the nonstationarity problem and is, therefore, a feasible solution in this respect. The empirical counterpart of (2.1) (with the opportunity cost of holding money, R = R_b − R_m) can be written as a cointegrating relation, i.e.:

ln(M/PY^r)_t − L(R_b − R_m)_t = v_t,   (2.3)

where vt is a stationary process measuring the deviation from the steady-state


position at time t. The stationarity of v_t implies that whenever the system has been shocked it will adjust back to equilibrium. This is illustrated in Figure 2.1 (lower panel) by the graph of the deviations from an estimated money demand relation based on Danish data, with the opportunity cost of holding money measured by R_b − R_m (Juselius, 1998b). Note the large equilibrium error around 1983, a result of removing restrictions on capital movements, and the consequent adjustment back to steady state.

However, empirical investigation of (2.3) based on cointegration analysis poses several additional problems. Although in a theoretical exercise it is straightforward to keep some of the variables fixed (the exogenous variables), in an empirical model none of the variables in (2.1), i.e. money, prices, income, or interest rates, can be assumed fixed (i.e. controlled). The stochastic nature of all variables implies that the equilibrium adjustment can take place in money, prices, income, or interest rates. Therefore, the equilibrium deviation v_t is not necessarily due to a money supply shock at time t, but can originate from a change in any of the variables. Hence, one should be cautious about interpreting a coefficient in a cointegrating relation as in the conventional regression context, which is based on the assumption of fixed regressors. In multivariate cointegration analysis all variables are stochastic, and a shock to one variable is transmitted to all other variables via the dynamics of the system until the system has found its new equilibrium position.

The empirical investigation of the above questions raises several econometric questions: What is the meaning of a shock and how do we measure it econometrically? How do we distinguish empirically between the long run, the medium run, and the short run? Given the measurements, can the parameter estimates be given an economically meaningful interpretation?
These questions will be discussed in more detail in the subsequent sections.
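To see why a relation like (2.3) can hold as a stationary relation even though every variable in it is nonstationary, consider the following simulation sketch. It is purely illustrative (the variable names, coefficients, and noise scales are our own assumptions, not estimates from the book): two I(1) series share one common stochastic trend, so their difference is a mean-reverting "equilibrium error" of the v_t type.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
trend = np.cumsum(rng.standard_normal(T))   # common stochastic trend: cumulated shocks

m = trend + 0.3 * rng.standard_normal(T)    # log real money: I(1)
y = trend + 0.3 * rng.standard_normal(T)    # log real income: I(1), shares the trend

v = m - y                                   # velocity-type relation, cf. (2.3)

def ac1(x):
    """First-order autocorrelation: a crude persistence measure."""
    x = x - x.mean()
    return (x[1:] @ x[:-1]) / (x @ x)

print(f"AC(1) of m: {ac1(m):.2f}")   # close to 1: nonstationary
print(f"AC(1) of v: {ac1(v):.2f}")   # close to 0: mean-reverting deviation v_t
```

Each series wanders without mean reversion, yet the linear combination v eliminates the common trend and fluctuates around zero, which is exactly the cointegration property exploited throughout the book.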

2.3 The time dependence of macroeconomic data

As advocated above, the strong time dependence of macroeconomic data suggests a statistical formulation based on stochastic processes. In this context it is useful to distinguish between: stationary variables with a short time dependence and nonstationary variables with a long time dependence.


In practice, it is useful to classify variables exhibiting a high degree of time persistence (insignificant mean reversion) as nonstationary and variables exhibiting a significant tendency to mean reversion as stationary. However, it is important to stress that stationarity/nonstationarity or, alternatively, the order of integration of a variable, is in general not a property of an economic variable but a convenient statistical approximation used to distinguish between the short-run, medium-run, and long-run variation in the data. We will illustrate this with a few examples involving money, prices, income, and interest rates.

Most countries have exhibited periods of high and low inflation, lasting sometimes a decade or even more, after which the inflation rate has returned to its mean level. If inflation crosses its mean level, say, ten times, the econometric analysis will find significant mean reversion and, hence, conclude that the inflation rate is stationary. For this to happen we might need up to a hundred years of observations. The time path of, for example, quarterly European inflation over the last few decades will cover a high-inflation period in the seventies and the beginning of the eighties and a low-inflation period from the mid-eighties until the present date. Crossing the mean level a few times is not enough to obtain statistically significant mean reversion, and the econometric analysis will show that inflation should be treated as a nonstationary variable. This is illustrated in Figure 2.2, where yearly observations of the Danish inflation rate have been graphed for 1901-1992 (upper panel), 1945-1992 (middle panel), and 1975-1992 (lower panel). The first two time series of inflation rates look mean-reverting (though not to zero mean inflation), whereas significant mean reversion would not be found for the last section of the series.

That inflation is considered stationary in one study and nonstationary in another, where the latter is based, say, on a sub-sample of the former, might seem contradictory. This need not be so, unless a unit root process is given a structural economic interpretation. There are many arguments in favor of considering a unit root (a stochastic trend) as a convenient econometric approximation rather than as a deep structural parameter. For instance, if the time perspective of our study is the macroeconomic behavior in the medium run, then most macroeconomic variables exhibit considerable inertia, consistent with nonstationary rather than stationary behavior. Because inflation, for example, would not appear to be statistically different from a nonstationary variable, treating it as a stationary variable is likely to invalidate the statistical analysis and, therefore, lead to wrong economic conclusions.
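The dependence of "significant mean reversion" on the sample length can be made concrete with a small Monte Carlo sketch. This is our own illustration, not material from the book: it uses a simplified Dickey-Fuller-type t-ratio and the approximate 5% critical value −2.86 for the constant-only case, and shows that the same persistent but stationary AR(1) process (ρ = 0.8) is often indistinguishable from a unit root in a short sample, but almost never in a long one.

```python
import numpy as np

def df_tstat(y):
    """Crude Dickey-Fuller t-statistic (constant only): regress
    dy_t on demeaned y_{t-1} and return the t-ratio on the slope."""
    y = y - y.mean()
    dy, ylag = np.diff(y), y[:-1]
    b = (ylag @ dy) / (ylag @ ylag)
    resid = dy - b * ylag
    s2 = (resid @ resid) / (len(dy) - 1)
    return b / np.sqrt(s2 / (ylag @ ylag))

def ar1_path(rho, T, rng):
    """Simulate y_t = rho * y_{t-1} + eps_t from y_0 = 0."""
    y = np.zeros(T)
    eps = rng.standard_normal(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + eps[t]
    return y

rng = np.random.default_rng(42)
CRIT = -2.86        # approximate 5% critical value, DF test with constant
RHO = 0.8           # stationary but persistent "inflation" process
reps = 200

reject = {}
for T in (40, 400):
    reject[T] = sum(df_tstat(ar1_path(RHO, T, rng)) < CRIT for _ in range(reps))
    print(f"T={T:3d}: unit root rejected in {reject[T]}/{reps} replications")
```

With 40 observations the unit root is rejected only part of the time; with 400 observations it is rejected almost always, mirroring the contrast between the sub-sample and the century-long Danish series.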

Figure 2.2: Yearly Danish inflation 1901-92 (upper panel), 1945-92 (middle panel), and 1975-92 (lower panel).

On the other hand, treating inflation as a nonstationary variable gives us the opportunity to find out which other variable(s) have exhibited a similar stochastic trend by exploiting the cointegration property. This will be discussed at some length in Section 2.5, where we will demonstrate that the unit root property of economic variables is very useful for the empirical analysis of long- and medium-run macroeconomic behavior. When the time perspective of our study is the long historical macroeconomic movements, inflation as well as interest rates are likely to show significant mean reversion and, hence, can be treated as stationary variables.

Finally, to illustrate that the same type of stochastic process is able to adequately describe the data, independently of whether one takes a close-up or a long-distance look, we have graphed the Danish bond rate in levels and differences in Figure 2.3, based on a sample of 95 quarterly observations (1972:1-1995:3), 95 monthly observations (1987:11-1995:9), and 95 daily observations (1.5.95-25.9.95). The daily sample corresponds to the little hump

Figure 2.3: Average Danish bond rates in levels and differences, based on daily observations, 1.5.95-25.9.95 (upper panels), monthly observations, Nov 1987 - Sept 1995 (middle panels), and quarterly observations, 1972:1-1995:3 (lower panels).

at the end of the quarterly time series. It would be considered a small stationary blip from a quarterly perspective, whereas from a daily perspective it is nonstationary, showing no significant mean reversion. Altogether, the three time series look very similar from a stochastic point of view. Thus, econometrically it is convenient to let the definition of the long run and the short run or, alternatively, the very long run, the medium long run, and the short run, depend on the time perspective of the study. From an economic point of view the question remains in what sense a unit root process can be given a structural interpretation.

2.4 A stochastic formulation

To illustrate the above questions we will consider a conventional decomposition of a typical macroeconomic variable into a trend, T, a cycle, C, and an irregular component, I:

X = T × C × I.

Instead of treating the trend component as deterministic, as is usually done in conventional analysis, we will allow the trend to be both deterministic, T_d, and stochastic, T_s, i.e. T = T_s × T_d, and the cyclical component to be of long duration, say 6-10 years, C_l, and of shorter duration, say 3-5 years, C_s, i.e. C = C_l × C_s. The reason for distinguishing between short and long cycles is that a long/short cycle can be treated as either nonstationary or stationary depending on the time perspective of the study. As an illustration of long cycles that have been found nonstationary by the statistical analysis (Juselius, 1998b), see the graph of trend-adjusted real income in Figure 2.4, middle panel. An additive formulation is obtained by taking logarithms:

x = (t_s + t_d) + (c_l + c_s) + i,   (2.4)

where lower-case letters indicate a logarithmic transformation. In the subsequent chapters the stochastic time dependence of the variables will be of primary interest, but the linear time trend will also be important as a measure of the average linear growth rates usually present in economic data.

To give the economic intuition for the subsequent multivariate cointegration analysis of money demand / money supply relations, we will illustrate the ideas in Sections 4.1 and 4.2 using the time series vector x_t = [m, p, y^r, R_m, R_b]_t, t = 1, ..., T, where m is a measure of money stock, p the price level, y^r real income, R_m the own interest rate on money stock, and R_b the interest rate on bonds. All variables are treated as stochastic and, hence, from a statistical point of view need to be modelled, independently of whether they are considered endogenous or exogenous in the economic model. To illustrate the ideas we will assume two autonomous shocks, u_1 and u_2, where for simplicity u_1 is a nominal shock causing a permanent shift in the AD curve and u_2 is a real shock causing a permanent shift in the AS curve. This is consistent with the aggregate supply (AS) - aggregate demand (AD) framework.

To clarify the connection between the econometric analysis and the economic interpretation, we will first assume that the empirical analysis is based

on a quarterly model covering, say, a few decades, and then on a yearly model covering, say, a hundred years. In the first case, when the perspective of the study is the medium run, we will argue that prices should generally be treated as I(2), whereas in the latter case, when the perspective is the very long run, prices can sometimes be approximated as a strongly autocorrelated I(1) process.

Figure 2.4 illustrates different stochastic trends in the Danish quarterly data set. The stochastic I(2) trend in the upper panel corresponds to trend-adjusted prices and the stochastic I(1) trend in the middle panel corresponds to trend-adjusted real income. The lower panel is a graph of ∆p_t and describes the stochastic I(1) trend in the inflation rate, which is equivalent to the differenced I(2) trend. Note, however, that inflation has been positive over the full sample, meaning that the price level contains a linear deterministic trend. Thus, a nonzero sample average of inflation, ∆p ≠ 0, is consistent with a linear trend in the price level, p_t.
Figure 2.4: Stochastic trends in Danish prices, real income and inflation, based on quarterly data 1975:1-1994:4. [Panels: stochastic I(2) trend in prices; stochastic I(1) trend in real income; stochastic I(1) trend in inflation.]

The concept of a common stochastic trend, or a driving force, requires a further distinction between:

- an unanticipated shock with a permanent effect (a disturbance to the system with a long-lasting effect);

- an unanticipated shock with a transitory effect (a disturbance to the system with a short duration).

To give the non-expert reader a more intuitive understanding of the meaning of a stochastic trend of first or second order, we will assume that the inflation rate, π_t, follows a random walk:

π_t = π_{t−1} + ε_t = ε_t/(1 − L) + π_0,   t = 1, ..., T,   (2.5)

where ε_t = ε_{p,t} + ε_{s,t} consists of a permanent shock, ε_{p,t}, and a transitory shock, ε_{s,t}.² By integrating the shocks, starting from an initial value of the inflation rate, π_0, we get:

π_t = ε_{p,t} + ε_{p,t−1} + ... + ε_{p,1} + ε_{s,t} + ε_{s,t−1} + ... + ε_{s,1} + π_0
    = Σ_{i=1}^t ε_{p,i} + Σ_{i=1}^t ε_{s,i} + π_0
    ≈ Σ_{i=1}^t ε_{p,i} + ε_{s,t} + π_0.

A permanent shock is by definition a shock that has a lasting effect on the level of inflation, such as a permanent increase in government expenditure, whereas the effect of a transitory shock disappears either during the next period or over the next few periods. For simplicity we assume that only the former case is relevant here. An example of a transitory price shock is a value added tax imposed in one period and removed the next. In the latter case prices increase temporarily, but return to their previous level after the removal. Therefore, a transitory shock can be described as a shock that occurs a second time in the series, but then with opposite sign. Hence, a transitory shock disappears in cumulation, whereas a permanent shock has a long-lasting effect on the level. In the summation of the shocks ε_i:

π_t = Σ_{i=1}^t ε_i + π_0,   (2.6)

only the permanent shocks will remain, and we call Σ_{i=1}^t ε_i a stochastic trend. The difference between a linear stochastic trend and a deterministic trend is that

² Note that in this case an ARIMA(0,1,1) model would give a more appropriate specification, provided ε_p and ε_s are white noise processes.
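The claim that a transitory shock disappears in cumulation while a permanent one does not can be verified directly. The sketch below is a minimal numerical illustration (ours, not the book's): a random walk is built from a permanent-shock series plus a one-off transitory shock that is reversed with opposite sign in the following period.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
eps_p = rng.standard_normal(T)           # permanent shocks
eps_s = np.zeros(T)                      # transitory shock: +1 at t = 100,
eps_s[100], eps_s[101] = 1.0, -1.0       # reversed with opposite sign at t = 101

# inflation as a random walk driven by both shock types, pi_0 = 0, cf. (2.5)-(2.6)
pi = np.cumsum(eps_p + eps_s)
pi_perm_only = np.cumsum(eps_p)

# the transitory shock moves the level at t = 100 only ...
print(pi[100] - pi_perm_only[100])       # the +1 transitory effect

# ... and leaves no trace once it has been reversed:
print(np.allclose(pi[101:], pi_perm_only[101:]))   # True

# the price level integrates inflation once more -> a second-order trend, cf. (2.7)
p = np.cumsum(pi)
```

The cumulated series with and without the transitory shock coincide from t = 101 onwards: only the cumulated permanent shocks survive as the stochastic trend.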

the increments of a stochastic trend change randomly, whereas those of a deterministic trend are constant over time. The former is illustrated in the middle and lower panels of Figure 2.4. A representation of prices instead of inflation is obtained by integrating (2.6) once, i.e.:

p_t = Σ_{s=1}^t π_s + p_0 = Σ_{s=1}^t Σ_{i=1}^s ε_i + π_0 t + p_0.   (2.7)

It appears that inflation being I(1) with a nonzero mean corresponds to prices being I(2) with linear trends. The stochastic I(2) trend is illustrated in the upper panel of Figure 2.4.

The question whether inflation rates should be treated as I(1) or I(0) has been subject to much debate. Figure 2.2 illustrated that the Danish inflation rate measured over the last decades was probably best approximated by a nonstationary process, whereas measured over a century by a stationary, though strongly autocorrelated, process. For a description of the latter case, (2.5) should be replaced by:

π_t = ρπ_{t−1} + ε_t = ε_t/(1 − ρL) + ρ^t π_0,   t = 1, ..., T,   (2.8)

which becomes:

π_t = Σ_{i=1}^t ρ^{t−i} ε_i + ρ^t π_0 = ε_t + ρε_{t−1} + ... + ρ^{t−1} ε_1 + ρ^t π_0,   (2.9)

where the autoregressive parameter ρ is less than, but close to, one. In this case prices would be represented by:

p_t = Σ_{s=1}^t π_s + p_0 = Σ_{s=1}^t Σ_{i=1}^s ρ^{s−i} ε_i + π_0 t + p_0
    = (1 − ρ)^{−1} Σ_{i=1}^t ε_i − (1 − ρ)^{−1} Σ_{i=1}^t ρ^{t−i} ε_i + π_0 t + p_0,   (2.10)

i.e. by a strongly autoregressive first-order stochastic trend and a deterministic linear trend.


The difference between (2.6) and (2.9) is only a matter of approximation. In the first case the parameter ρ is approximated by unity, because the sample period is too short for the estimate to be statistically different from one. In the second case the sample period contains enough turning points for ρ to be significantly different from one. Econometrically it is preferable to treat a long business cycle component spanning, say, 10 years as an I(1) process when the sample period is less than, say, 20 years. In this sense the difference between the long cyclical component c_l and t_s in (2.4) is that t_s is a true unit root process (ρ = 1), whereas c_l is a near unit root process (ρ ≈ 1) that needs a very long sample to be distinguished from a true unit root process. We will argue below that, unless a unit root is given a structural interpretation, the choice of one representation or the other is as such not very important, as long as there is consistency between the economic analysis and the choice. However, from an econometric point of view the choice between the two representations is usually crucial for the whole empirical analysis and should, therefore, be based on all available information.

Nevertheless, the distinction between a permanent (long-lasting) and a transitory shock is fundamental for the empirical interpretation of integration and cointegration results. The statement that inflation, ∆p_t, is I(1) is consistent with inflationary shocks being strongly persistent. Statistically this is expressed in (2.6) as inflation having a first-order stochastic trend, defined as the cumulative sum of all previous shocks from the starting date. Whether inflation should be considered I(1) or I(0) has been much debated, often based on a structural (economic) interpretation of a unit root. We argue here that the order of integration should be based on statistical, rather than economic, arguments.
If ρ is not significantly different from one (for example because the sample period is short), but we nevertheless treat inflation as I(0), then the statistical inference will sooner or later produce logically inconsistent results. However, the fact that a small value of ρ, say 0.80, is often not significantly different from one in a small sample, whereas a much higher value of ρ, say 0.98, can differ significantly from one in a long sample, is likely to cause semantic problems when using "long run" and "short run" to describe integration and cointegration properties. For example, a price variable could easily be considered I(2) based on a short sample, but I(1) based on a longer period. Because of the sample split, inference on the cointegration and integration properties will be based on relatively short samples, leading to the above interpretational problems.

When interpreting the subsequent results we will use the concept of a long-run relation to mean a cointegrating relation between I(1) or I(2) variables, as defined above. We will talk about short-run adjustment when a stationary variable is significantly related to a cointegration relation or to another stationary variable. A necessary condition for a long-run relation to be empirically relevant in the model is short-run adjustment in at least one of the system equations³. Based on this definition, non-cointegrating relations incorrectly included in the model will eventually drop out as future observations become available: a stationary variable cannot significantly adjust to a nonstationary variable. Because a cointegrating relation does not necessarily correspond to an interpretable economic relation, we make a further distinction between the statistical concept of a long-run relation and the economic concept of a steady-state relation.
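The small-sample indistinguishability of ρ = 1 and ρ close to one can be sketched numerically. The following Monte Carlo illustration is our own (the sample sizes and ρ = 0.98 are assumptions chosen for the example): it estimates ρ by OLS on simulated near-unit-root data and compares the sampling spread across short and long samples.

```python
import numpy as np

rng = np.random.default_rng(7)

def ols_rho(y):
    """OLS estimate of rho in y_t = rho * y_{t-1} + eps_t."""
    return (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])

def ar1_path(rho, T):
    y = np.zeros(T)
    eps = rng.standard_normal(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + eps[t]
    return y

RHO_TRUE = 0.98   # near unit root
results = {}
for T in (50, 1000):
    results[T] = np.array([ols_rho(ar1_path(RHO_TRUE, T)) for _ in range(500)])
    print(f"T={T:4d}: mean rho_hat = {results[T].mean():.3f}, "
          f"sd = {results[T].std():.3f}")
```

In the short sample the estimates are both biased downwards and widely dispersed, so ρ = 1 cannot be ruled out; in the long sample the distribution tightens well below one, so the near unit root becomes distinguishable from a true unit root.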

2.5 Scenario Analyses: treating prices as I(2)

In this section we will assume that the long-run stochastic trend t_s in (2.4) can be described by the twice cumulated nominal (AD) shocks, ΣΣu_{1i}, as in (2.7), and the long cyclical component c_l by the once cumulated nominal shocks, Σu_{1i}, and the once cumulated real (AS) shocks, Σu_{2i}. This representation gives us the possibility of distinguishing between the long-run stochastic trend component in prices, ΣΣu_{1i}, the medium-run stochastic trend in price inflation, Σu_{1i}, and the medium-run stochastic trend in real activity, Σu_{2i}.

As an illustration of how the econometric analysis is influenced by the above assumptions we will consider the following decomposition of the data vector:

    [ m_t    ]   [ c_11 ]                [ d_11  d_12 ]                 [ g_1 ]
    [ p_t    ]   [ c_21 ]                [ d_21  d_22 ]  [ Σu_{1i} ]    [ g_2 ]
    [ y^r_t  ] = [  0   ] [ΣΣu_{1i}]  +  [ d_31  d_32 ]  [ Σu_{2i} ] +  [ g_3 ] [t] + stat. comp.   (2.11)
    [ R_m,t  ]   [  0   ]                [ d_41  d_42 ]                 [  0  ]
    [ R_b,t  ]   [  0   ]                [ d_51  d_52 ]                 [  0  ]

³ When data are I(2) we have integration and cointegration at different levels and the concepts need to be modified accordingly. The need to distinguish between these concepts is moderate here, and we refer to Juselius (1999b) for a more formal treatment.

The deterministic trend component, t_d = t, accounts for linear growth in nominal money and prices as well as in real income. Generally, when {g_1 ≠ 0, g_2 ≠ 0, g_3 ≠ 0}, the real income growth rate and the nominal growth rates are nonzero, consistent with stylized facts in most industrialized countries. If g_3 = 0 and d_31 = 0 in (2.11), then Σu_{2i} is likely to describe the long-run real growth in the economy, i.e. a structural unit root process as discussed in the many papers on stochastic versus deterministic real growth models. See, for instance, King, Plosser, Stock and Watson (1991). If g_3 ≠ 0, then the linear time trend is likely to capture the long-run trend and Σu_{2i} will describe the medium-run deviations from this trend, i.e. the long business cycles. The trend-adjusted real income variable in the middle panel of Figure 2.4 illustrates such long business cycles. For a further discussion, see Rubin (1998). The first case explicitly assumes that the average real growth rate is zero, whereas the latter case does not. Whether one includes a linear trend in (2.11) or not influences, therefore, the possibility of interpreting the second stochastic trend, Σu_{2i}, as a long-run structural trend.

Conditions for long-run price homogeneity: Let us now take a closer look at the trend components of m_t and p_t in (2.11):

m_t = c_11 ΣΣu_{1i} + d_11 Σu_{1i} + d_12 Σu_{2i} + g_1 t + stat. comp.,
p_t = c_21 ΣΣu_{1i} + d_21 Σu_{1i} + d_22 Σu_{2i} + g_2 t + stat. comp.

If (c_11, c_21) ≠ 0, then {m_t, p_t} ~ I(2). If, in addition, c_11 = c_21, then

m_t − p_t = (d_11 − d_21) Σu_{1i} + (d_12 − d_22) Σu_{2i} + (g_1 − g_2) t + stat. comp.

is at most I(1). If {(d_11 ≠ d_21), (d_12 ≠ d_22)}, then m_t and p_t are cointegrating from I(2) to I(1), i.e. they are CI(2,1). If, in addition, (g_1 ≠ g_2), then real money stock grows around a linear trend.

The case (m_t − p_t) ~ I(1) implies long-run price homogeneity and is a testable hypothesis. Money stock and prices are moving together in the long run, but not necessarily in the medium run (over the business cycle). Long-run and medium-run price homogeneity requires {c_11 = c_21 and d_11 = d_21}, i.e. the nominal (AD) shocks u_1t affect nominal money and prices in the same way both in the long run and in the medium run. Because the real stochastic trend Σu_{2i} is likely to enter m_t but not necessarily p_t, testing long- and medium-run price homogeneity jointly is not equivalent to testing (m_t − p_t) ~ I(0). Therefore, the joint hypothesis is not as straightforward to test as long-run price homogeneity alone. Note that (m_t − p_t) ~ I(1) implies (∆m_t − ∆p_t) ~ I(0), i.e. long-run price homogeneity implies cointegration between price inflation and money growth. If this is the case, then the stochastic trend in inflation can equally well be measured by the stochastic trend in the growth of money stock.

Assuming long-run price homogeneity: In the following we will assume long-run price homogeneity, i.e. c_11 = c_21, and discuss various cases where medium-run price homogeneity is either present or absent. In this case it is convenient to transform the nominal data vector into real money (m − p) and the inflation rate ∆p (or equivalently ∆m)⁴:

    [ m_t − p_t ]   [ d_11 − d_21   d_12 − d_22 ]                 [ g_1 − g_2 ]
    [ ∆p_t      ]   [ c_21               0      ]   [ Σu_{1i} ]   [     0     ]
    [ y^r_t     ] = [ d_31            d_32      ]   [ Σu_{2i} ] + [    g_3    ] [t] + ...   (2.12)
    [ R_m,t     ]   [ d_41            d_42      ]                 [     0     ]
    [ R_b,t     ]   [ d_51            d_52      ]                 [     0     ]

In (2.12) all variables are at most I(1). The inflation rate (measured by Δp_t or Δm_t) is only affected by the once-cumulated AD trend, Σu1i, but all the other variables can in principle be affected by both stochastic trends, Σu1i and Σu2i.

The case trend-adjusted (m_t − p_t) ∼ I(0) requires that both d11 = d21 and d12 = d22, which is not very likely from an economic point of view. A priori, one would expect the real stochastic trend Σu2i to influence money stock (by increasing the transactions, precautionary and speculative demands for money) but not the price level, i.e. that d12 ≠ 0 and d22 = 0.

⁴ Chapters 14 and 15 will discuss this nominal-to-real transformation in more detail.

2.5. THE I(2) SCENARIO

The case (m_t − p_t − y_t^r) ∼ I(0), i.e. money velocity of circulation is a stationary variable, requires that d11 − d21 − d31 = 0, d12 − d22 − d32 = 0 and g1 − g2 − g3 = 0. If d11 = d21 (i.e. medium-run price homogeneity), d22 = 0 (the real stochastic trend does not affect prices), d31 = 0 (medium-run price growth does not affect real income), and d12 = d32, then m_t − p_t − y_t^r ∼ I(0). In this case real money stock and real aggregate income share one common trend, the real stochastic trend Σu2i. The stationarity of money velocity, implying common movements in money, prices, and income, is then not inconsistent with the conventional monetarist assumption, as stated by Friedman (1970), that inflation always and everywhere is a monetary problem. This case, (m_t − p_t − y_t^r) ∼ I(0), has generally found little empirical support (Juselius 1996, 1998b, 2000; Juselius and Toro 1999). As an illustration, see the graph of money velocity in the upper panel of Figure 5. We will now turn to the more realistic assumption of money velocity being I(1).

The case (m_t − p_t − y_t^r) ∼ I(1) implies that either (d11 − d21 − d31) ≠ 0 or (d12 − d22 − d32) ≠ 0. It suggests that the two common stochastic trends affect the level of real money stock and real income differently. A few examples illustrate this:

Example 1: Inflation is cointegrating with velocity, i.e.:

m_t − p_t − y_t^r + b1 Δp_t ∼ I(0),   (2.13)

or alternatively

(m_t − p_t − y_t^r) + b2 Δm_t ∼ I(0).

Under the previous assumptions that d31 = d22 = 0 and d12 = d32, the I(0) assumption of (2.9) implies that d11 − d21 = −b1 c21. If b1 > 0, then (2.13) can be interpreted as a money demand relation, where the opportunity cost of holding money relative to real stock, as measured by Δp_t, is a determinant of money velocity. On the other hand, if b1 < 0 (or b2 > 0), then inflation adjusts to excess money, though, if |b1| > 1, with some time lag. In this case it is not possible to interpret (2.13) as a money demand relation.
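Because (2.13) is a cointegrating relation, the coefficient b1 can be recovered superconsistently from a static regression of velocity on inflation. A simulated sketch (the data-generating values below are hypothetical, not estimates from the Danish data):

```python
import numpy as np

rng = np.random.default_rng(11)
T = 2000
b1 = 0.8                                   # "true" coefficient, hypothetical
infl = np.cumsum(rng.standard_normal(T))   # inflation: I(1) in the I(2) scenario
velocity = -b1 * infl + 0.5 * rng.standard_normal(T)  # m-p-y: stationary deviation

# OLS of velocity on inflation: the slope estimates -b1
X = np.column_stack([np.ones(T), infl])
coef, *_ = np.linalg.lstsq(X, velocity, rcond=None)
b1_hat = -coef[1]
```

A positive b1_hat supports the money-demand reading of (2.13); a negative one would instead point to inflation adjusting to excess money.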


Example 2: The interest rate spread and velocity are cointegrating, i.e.:
(m_t − p_t − y_t^r) − b3 (R_m − R_b)_t ∼ I(0).   (2.14)

Because (R_m − R_b)_t ∼ I(1), either (d41 − d51) ≠ 0, or (d42 − d52) ≠ 0, or both. In either case the stochastic trend in the spread has to cointegrate with the stochastic trend in velocity for (2.14) to hold. If b3 > 0, then (2.14) can be interpreted as a money demand relation in which the opportunity cost of holding money relative to bonds is a determinant of agents' desired money holdings. On the other hand, if b3 < 0, then the money demand interpretation is no longer possible, and (2.14) could instead be a central bank policy rule. The middle panel of Figure 5 shows the interest spread between the Danish 10-year bond rate and the deposit rate, and the lower panel the linear combination (2.14) with b3 = 14. It is notable how well the nonstationary behavior of money velocity and the spread cancels in the linear money demand relation. From the perspective of monetary policy, a nonstationary spread suggests that the short-term central bank interest rate can be used as an instrument to influence money demand. A stationary spread, on the other hand, signals fast adjustment between the two interest rates, such that changing the short interest rate only changes the spread in the very short run and, hence, leaves money demand essentially unchanged.

In a model explaining monetary transmission mechanisms, the determination of real interest rates is likely to play an important role. The Fisher parity predicts that real interest rates are constant, i.e.
R_t = E_t(Δ_m p_{t+m})/m + R_0 = E_t(p_{t+m} − p_t)/m + R_0 = (1/m) E_t Σ_{i=1}^{m} Δp_{t+i} + R_0,   (2.15)

where R_0 is a constant real interest rate and E_t(Δ_m p_{t+m})/m is the expected value at time t of inflation over the period until maturity at t + m. If (Δ_m p_{t+m} − E_t Δ_m p_{t+m}) ∼ I(0), then the predictions do not deviate from the actual realizations by more than a stationary error. If, in addition, (Δp_t − (Δ_m p_{t+m})/m) ∼ I(0), then R_t − Δp_t is stationary. From (2.15) it appears that if (R_m − Δp) ∼ I(0) and (R_b − Δp) ∼ I(0), then d42 = d52 = 0. Also, if d42 = d52 = 0, then R_m and R_b must be cointegrating, with (R_m − b4 R_b)_t ∼ I(0) and b4 = 1 when d41 = d51.



Figure 2.5: Money velocity (upper panel), the interest rate spread (middle panel), and money demand (lower panel) for Danish data.

In this sense stationary real interest rates are both econometrically and economically consistent with the spread and the velocity being stationary. This corresponds to the situation where real income and real money stock share the common AS trend, Σu2i, and inflation and the two nominal interest rates share the AD trend, Σu1i. This case can be formulated as a restricted version of (2.12):

[m_t − p_t, Δp_t, y_t^r, R_m,t, R_b,t]′ = [0, d12; c21, 0; 0, d12; c21, 0; c21, 0] [Σu1i, Σu2i]′ + ...   (2.16)

Though appealing from a theory point of view, (2.16) has not found much empirical support. Instead, real interest rates, interest rate spreads, and money velocity have frequently been found to be nonstationary. This


suggests the presence of real and nominal interaction effects, at least over the horizon of a long business cycle. By modifying some of the assumptions underlying the Fisher parity, the nonstationarity of real interest rates and the interest rate spread can be justified. For example, Goldberg and Frydman (2002) show that imperfect knowledge expectations (instead of rational expectations) are likely to generate an I(1) trend in the interest rate spread. Also, if agents systematically mispredict the future inflation rate, we would expect (Δ_m p_{t+m} − E_t Δ_m p_{t+m}) ∼ I(1) and, hence, R_t − Δp_t ∼ I(1). In this case one would also expect E_t{(Δ_b p_{t+b})/b − (Δ_m p_{t+m})/m} ∼ I(1), and (R_m − R_b)_t ∼ I(1) would be consistent with the predictions from the expectations hypothesis (or the Fisher parity).

2.6

Scenario Analyses: treating prices as I(1)

In this case the relevant root in (2.9) is less than one, implying that inflation is stationary albeit strongly autocorrelated. The representation of the vector process becomes:

[m_t, p_t, y_t^r, R_m,t, R_b,t]′ = [c11, d12; c21, d22; 0, d32; 0, d42; 0, d52] [Σu1i, Σu2i]′ + [g1, g2, g3, 0, 0]′ t + stat. comp.   (2.17)

Money and prices are represented by:

m_t = c11 Σu1i + d12 Σu2i + g1 t + stat. comp.
p_t = c21 Σu1i + d22 Σu2i + g2 t + stat. comp.

If c11 = c21 there is long-run price homogeneity, but (m_t − p_t) ∼ I(1) unless (d12 − d22) = 0. If d12 ≠ 0 and d22 = 0, then

m_t − p_t = d12 Σu2i + (g1 − g2) t + stat. comp.

If d12 = d32, then (m_t − p_t − y_t^r) ∼ I(0). From {m_t, p_t} ∼ I(1) it follows that {Δp_t, Δm_t} ∼ I(0), and real interest rates cannot be stationary unless d42 = d52 = 0.


Hence, a consequence of treating prices as I(1) is that nominal interest rates should be treated as I(0), unless one is prepared a priori to exclude the possibility of stationary real interest rates. As discussed above, the inflation rate and the interest rates then have to cross their mean path fairly frequently to obtain statistically significant mean-reversion. The restricted version of (2.17) given below is economically as well as econometrically consistent, but is usually only relevant in the analysis of long historical data sets:

[m_t, p_t, y_t^r, R_m,t, R_b,t]′ = [c11, d12; c21, 0; 0, d12; 0, 0; 0, 0] [Σu1i, Σu2i]′ + [g1, g2, g3, 0, 0]′ t + stat. comp.   (2.18)

2.7

Concluding remarks

This chapter focused on the decomposition of a nonstationary time-series process into stochastic and deterministic trends as well as cycles and other stationary components. For the case when some of the variables contain common stochastic trends, Sections 4 and 5 showed that the latter can be canceled by taking linear combinations of the former, and that these linear combinations can potentially be interpreted as economic steady-state relations. All this was done in a purely descriptive manner, without specifying a statistical model consistent with these features. That is the purpose of the next chapter, which discusses the properties of aggregated time-series data and points out under which assumptions these data will produce the VAR model.

Chapter 3 The Probability Approach in Econometrics and the VAR


This chapter will (i) define the basic characteristics of a single time series and a vector process, (ii) derive the VAR model under certain simplifying assumptions on the vector process, (iii) discuss the dynamic properties of the VAR model, and (iv) illustrate the concepts with a data set of quarterly observations covering 1975:1-1993:4 on money, income, inflation and two interest rates from Denmark. The aim is to discuss under which simplifying assumptions on the vector time-series process the VAR model can be used as an adequate summary description of the information in the sample data.

3.1

A single time series process

To begin with we will look at a single variable observed over consecutive time points and discuss its time-series properties. Let x_{s,t}, s = 1, ..., S, t = 1, ..., T describe S realizations of a variable x over T time periods. When S > 1 this could, for example, describe a variable in a study based on panel data, or it could describe a simulation study of a time-series process x_t in which the number of replications is S. Here we will focus on the case when S = 1, i.e. when there is just one realization (x_1, ..., x_T) on the index set T. Since we have just one realization of the random variable x_t, we cannot make inference about the shape of the distribution or its parameter values without making simplifying assumptions. We illustrate the difficulties with two simple examples in Figures 3.1 and 3.2.

Figure 3.1. E(x_t) = μ, Var(x_t) = σ², t = 1, ..., 6

Figure 3.2. E(x_t) = μ_t, Var(x_t) = σ², t = 1, ..., 6

In the two examples, the line connecting the realizations x_t produces the graph of the time series. In Figure 3.1 we have assumed that the distribution, the mean value and the variance are the same for each x_t, t = 1, ..., T. In Figure 3.2 the distribution and the variance are identical, but the mean varies with t. Note that the observed time graph is the same


in both cases, illustrating the fact that we often need rather long time series to be able to statistically distinguish between different hypotheses in time-series models. To be able to make statistical inference we need:

(i) a probability model for x_t, for example the normal model;
(ii) a sampling model for x_t, for example dependent or independent drawings.

For the normal distribution, the first two moments around the mean are sufficient to describe the variation in the data. Without simplifying assumptions on the time-series process we have the general formulation for t = 1, ..., T:

E(x_t) = μ_t
Var(x_t) = E(x_t − μ_t)² = σ_{t.0}
Cov(x_t, x_{t−h}) = E[(x_t − μ_t)(x_{t−h} − μ_{t−h})] = σ_{t.h},  h = ..., −1, 1, ...

Stacking the realizations in the vector x = (x_1, x_2, ..., x_T)′ gives

E[x] = (μ_1, μ_2, ..., μ_T)′ = μ

Cov[x] = E[(x − μ)(x − μ)′] = [σ_{1.0}, σ_{2.1}, σ_{3.2}, ..., σ_{T.T−1}; σ_{2.1}, σ_{2.0}, σ_{3.1}, ..., σ_{T.T−2}; ...; σ_{T.T−1}, σ_{T.T−2}, σ_{T.T−3}, ..., σ_{T.0}] = Σ

(rows separated by semicolons), so that x ∼ N(μ, Σ).
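Under the constancy assumptions introduced next, these moments can be estimated from a single realization. A minimal sketch of the sample mean and sample autocovariance function for a simulated iid series (the series itself is, of course, hypothetical):

```python
import numpy as np

def sample_autocov(x, h):
    """Sample autocovariance at lag h >= 0, centered at the full-sample mean."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    return np.mean((x[h:] - xbar) * (x[: len(x) - h] - xbar))

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)     # iid N(0, 1): gamma_0 = 1, gamma_h = 0 for h > 0
gamma0 = sample_autocov(x, 0)     # close to 1
gamma1 = sample_autocov(x, 1)     # close to 0
```

For an iid series all autocovariances beyond lag 0 should be statistically indistinguishable from zero, which is the diagonal covariance matrix case discussed below.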


Because there is just one realization of the process at each time t, there is not enough information to make statistical inference about the underlying functional form of the distribution of each x_t, t ∈ T, and we have to make simplifying assumptions to ensure that the number of parameters describing the process is smaller than the number of observations available. A typical assumption in time-series models is that each x_t has the same distribution and that the functional form is approximately normal. Furthermore, given the normal distribution, it is frequently assumed that the mean is the same, i.e. E(x_t) = μ for t = 1, ..., T, and that the variance is the same, i.e. E(x_t − μ)² = σ² for t = 1, ..., T. We use the following notation to describe (i) time-varying mean and variance, (ii) time-varying mean and constant variance, (iii) constant mean and variance:

(i) x_t ∼ N(μ_t, σ²_t), t = 1, ..., T
(ii) x_t ∼ N(μ_t, σ²), t = 1, ..., T
(iii) x_t ∼ N(μ, σ²), t = 1, ..., T

For a time-series process, time dependence is an additional problem that has to be addressed. Consecutive realizations cannot usually be considered independent drawings from the same underlying stochastic process. For the normal distribution the time dependence between x_t and x_{t−h}, h = ..., −1, 0, 1, ... and t = 1, ..., T, can be described by the covariance function. A simplifying assumption in this context is that the covariance function is a function of h, but not of t, i.e. σ_{t.h} = σ_h for t = 1, ..., T. If x_t has constant mean and variance and in addition σ_h = 0 for all h = 1, ..., T, then Σ is a diagonal matrix and x_t is independent of x_{t−h} for h = 1, ..., t. In this case we say that:

x_t ∼ Niid(μ, σ²),

where Niid is shorthand for Normally, identically, independently distributed.

3.2

A vector process

We will now move on to the more interesting case where we observe a vector of p variables. In this case we additionally need to discuss covariances between the variables at time t, as well as covariances between the variables at time t and at time t − h. The


covariances contain information about static and dynamic relationships between the variables which we would like to uncover using econometrics. For notational simplicity x_t will here be used to denote both a random variable and its realization. Consider the p × 1 dimensional vector x_t:

x_t = (x_{1,t}, x_{2,t}, ..., x_{p,t})′, t = 1, ..., T.

We introduce the following notation for the case when no simplifying assumptions have been made:

E[x_t] = (μ_{1,t}, μ_{2,t}, ..., μ_{p,t})′ = μ_t, t = 1, ..., T,

Cov[x_t, x_{t−h}] = [σ_{11.h}, σ_{12.h}, ..., σ_{1p.h}; σ_{21.h}, σ_{22.h}, ..., σ_{2p.h}; ...; σ_{p1.h}, σ_{p2.h}, ..., σ_{pp.h}] = Σ_{t.h}.

We will now assume that the same distribution applies for all x_t and that it is approximately normal, i.e. x_t ∼ N(μ_t, Σ_t). Under the normality assumption the first two moments around the mean (central moments) are sufficient to describe the variation in the data. We introduce the notation:

Z = (x_1′, x_2′, ..., x_T′)′, E[Z] = (μ_1′, μ_2′, ..., μ_T′)′ = μ,   (3.1)

where Z is a pT × 1 vector. The covariance matrix is given by

E[(Z − μ)(Z − μ)′] = [Σ_{1.0}, Σ′_{2.1}, ..., Σ′_{T.T−1}; Σ_{2.1}, Σ_{2.0}, ..., Σ′_{T.T−2}; ...; Σ_{T.T−1}, Σ_{T.T−2}, ..., Σ_{T.0}] = Σ   (Tp × Tp)

42

CHAPTER 3. THE PROBABILITY APPROACH

where Σ_{t.h} = Cov(x_t, x_{t−h}) = E(x_t − μ_t)(x_{t−h} − μ_{t−h})′. The above notation provides a completely general description of a multivariate normal vector time-series process. Since there are far more parameters than observations available for estimation, it has no meaning from a practical point of view. Therefore, we have to make simplifying assumptions to reduce the number of parameters. Empirical models are typically based on the following assumptions:

Σ_{t.h} = Σ_h, for all t ∈ T, h = ..., −1, 0, 1, ...
μ_t = μ, for all t ∈ T.

These two assumptions are needed to secure parameter constancy in the VAR model to be defined in Section 3.4. When the assumptions are satisfied we can write the mean and the covariances of the data matrix in the simplified form:

E[Z] = (μ′, μ′, ..., μ′)′, Cov[Z] = [Σ_0, Σ′_1, Σ′_2, ..., Σ′_{T−1}; Σ_1, Σ_0, Σ′_1, ..., Σ′_{T−2}; ...; Σ_{T−1}, ..., Σ_1, Σ_0]

The above two assumptions for infinite T define a weakly stationary process:

Definition 1. Let {x_t} be a stochastic process (an ordered series of random variables) for t = ..., −1, 0, 1, 2, .... If

E[x_t] = μ < ∞ for all t,
E[x_t − μ]² = σ² < ∞ for all t,
E[(x_t − μ)(x_{t+h} − μ)] = σ_{.h} < ∞ for all t and h = 1, 2, ...,

then {x_t} is said to be weakly stationary. Strict stationarity requires that the distribution of (x_{t_1}, ..., x_{t_k}) is the same as that of (x_{t_1+h}, ..., x_{t_k+h}) for h = ..., −1, 1, 2, ....
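A weakly stationary AR(1) process illustrates the definition: with x_t = ρx_{t−1} + ε_t and |ρ| < 1, the mean, the variance γ_0 = σ²/(1 − ρ²) and the autocovariances γ_h = ρ^h γ_0 are all constant over t. A simulation sketch (ρ and σ are hypothetical values):

```python
import numpy as np

rho, sigma = 0.7, 1.0
rng = np.random.default_rng(1)
T = 200_000
eps = sigma * rng.standard_normal(T)

x = np.empty(T)
x[0] = eps[0] / np.sqrt(1 - rho**2)   # start in the stationary distribution
for t in range(1, T):
    x[t] = rho * x[t - 1] + eps[t]

gamma0_theory = sigma**2 / (1 - rho**2)   # stationary variance
gamma1_theory = rho * gamma0_theory       # lag-1 autocovariance
xc = x - x.mean()
gamma0_hat = np.mean(xc**2)
gamma1_hat = np.mean(xc[1:] * xc[:-1])
```

With a long sample the moment estimates settle close to their time-invariant theoretical values, which is exactly what weak stationarity buys us.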


An illustration:
The data set introduced here will be used throughout the book to illustrate the many questions, and their empirical answers, that can be asked within the cointegrated VAR model. As a matter of fact, the development of many of the subsequent cointegration procedures was more or less forced upon us as a result of the empirical analyses performed on this data set. The data vector is defined by [m_t^r, y_t^r, Δp_t, R_m,t, R_b,t], t = 1975:1, ..., 1993:4, where

m_t^r = m_t − p_t is a measurement of real money stock at time t, where m_t is nominal M3 and p_t is the implicit deflator of the gross national expenditure (GNE),
y_t^r is real gross national expenditure (GNE),
Δp_t is the quarterly inflation rate measured by the implicit GNE deflator,
R_m,t is the average deposit rate, as a proxy for the interest yield on M3,
R_b,t is the 10-year government bond rate.

Figures 3.3 and 3.4 show the graphs of the data in levels and in first differences.


Figure 3.3. The Danish data in levels.




Figure 3.4. The Danish data in first differences.

A visual inspection reveals that neither the assumption of a constant mean nor that of a constant variance seems appropriate for the levels of the variables, whereas the differenced variables look more satisfactory in this respect. If the marginal processes are normal, then the observations should lie symmetrically on both sides of the mean. This seems approximately to be the case for real money, but for real income, the inflation rate, and the two interest rates there are some outlier observations. The question is whether these observations are too far away from the mean to be considered realizations from a normal distribution. At this stage it is a good idea to have a look in the economic calendar to find out whether the outlier observations can be related to some significant economic interventions or reforms. The outlier observations in real income and the inflation rate appear at the same date and seem related to a temporary removal of the value added tax in 1975:4. Denmark had experienced stagnating domestic demand in the period after the first oil shock, and to boost aggregate activity the government decided to remove VAT for one quarter and gradually put it back again over the next two quarters. The outlier observation in the bond rate is related to the lifting of previous restrictions on capital movements and the start of the hard EMS in 1983, whereas the outliers in the deposit rate are related to Central Bank interventions. Furthermore, we note that the


variability of the inflation rate in Figure 3.3 looks higher in the first part of the sample. Thus, the assumption of constant variance does not seem to be satisfied in this particular case. These are realistic examples that point at the need to include additional information on interventions and institutional reforms in the empirical model analysis. This can be done by including new variables measuring the effect of institutional reforms or, if such variables are not available, by using dummy variables as a proxy for the change in institutions. At the start of the empirical analysis it is not always possible to know whether an intervention was strong enough to produce an extraordinary effect or not. Essentially every single month, quarter, and year is subject to some kind of political intervention; most of them have a minor impact on the data and the model. Thus, if an ordinary intervention does not stick out as an outlier, it will be treated as a random shock for practical reasons. Major interventions, like removing restrictions on capital movements, joining the EMS, etc., are likely to have a much more fundamental impact on economic behavior and, hence, need to be included in the systematic part of the model. Ignoring this problem is likely to seriously bias all estimates of our model and result in invalid inference.

It is always a good idea to start with a visual inspection of the data and their time-series properties as a first check of the assumptions of the VAR model. Based on the graphs we can get a first impression of whether x_{i,t} looks stationary with constant mean and variance, or whether this is the case for Δx_{i,t}. If the answer is negative to the first question, but positive to the next one, we can solve the problem by respecifying the VAR in error-correction form, as will be demonstrated in the next chapter. If the answer is negative to both questions, it is often a good idea to check the economic calendar to find out whether any significant departure from the constant mean and constant variance coincides with specific reforms or interventions. The next step is then to include this information in the model and find out whether the intervention or reform caused a permanent or a transitory effect, whether it was additive to the model, or whether it fundamentally changed the parameters of the model. In the latter case the intervention is likely to have caused a regime shift, and the model would need to be re-specified allowing for the shift in the structure. The econometric modelling of intervention effects will be discussed in more detail in Chapter 6.


3.3

Sequential decomposition of the likelihood function

The purpose of this section is to demonstrate (i) that the joint likelihood function P(X; θ) can be sequentially decomposed into T conditional probabilities P(x_t | x_{t−1}, ..., x_1, X_0; θ), and (ii) that the conditional process (x_t | x_{t−1}, ..., x_1, X_0) has a parameterization that corresponds to the vector autoregressive model. First we review the simple multiplicative rule for calculating joint probabilities, and the formulas for calculating the conditional and marginal mean and variance of a multivariate normal vector Y.

Repetition:
***********************************************
An illustration of the multiplicative rule for probability calculations based on four dependent events A, B, C, and D:

P(A ∩ B ∩ C ∩ D) = P(A | B ∩ C ∩ D) P(B ∩ C ∩ D)
= P(A | B ∩ C ∩ D) P(B | C ∩ D) P(C ∩ D)
= P(A | B ∩ C ∩ D) P(B | C ∩ D) P(C | D) P(D)

Note that a multiplicative formulation has been achieved for the conditional events, even if the events themselves are not independent. The general principle of the multiplicative rule will be applied in the derivation of conditional and marginal distributions. Consider first two normally distributed random variables y_{1,t} and y_{2,t} with the joint distribution Y ∼ N(m, S):

Y = (y_{1,t}, y_{2,t})′, E[Y] = (m_1, m_2)′, Cov[Y] = [s_{11}, s_{12}; s_{21}, s_{22}] = S.

The marginal distributions of y_{1,t} and y_{2,t} are given by

y_{1,t} ∼ N(m_1, s_{11}), y_{2,t} ∼ N(m_2, s_{22}).

The conditional distribution of y_{1,t} | y_{2,t} is given by

(y_{1,t} | y_{2,t}) ∼ N(m_{1.2}, s_{11.2}),

where

m_{1.2} = m_1 + s_{12} s_{22}^{-1} (y_{2,t} − m_2) = (m_1 − s_{12} s_{22}^{-1} m_2) + s_{12} s_{22}^{-1} y_{2,t} = β_0 + β_1 y_{2,t}   (3.2)

and

s_{11.2} = s_{11} − s_{12} s_{22}^{-1} s_{21}.   (3.3)

The joint distribution of Y can now be expressed as the product of the conditional and the marginal distribution:

P(y_{1,t}, y_{2,t}; θ) = P(y_{1,t} | y_{2,t}; θ_1) P(y_{2,t}; θ_2),   (3.4)

i.e. the joint distribution equals the conditional distribution times the marginal distribution.

**************************************************
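The conditional-mean and conditional-variance formulas (3.2)-(3.3) are easy to verify numerically. A sketch with hypothetical values for m and S:

```python
import numpy as np

m = np.array([1.0, 2.0])            # E[y1], E[y2]
S = np.array([[2.0, 0.8],           # s11  s12
              [0.8, 1.0]])          # s21  s22

beta1 = S[0, 1] / S[1, 1]           # s12 * s22^{-1} = 0.8
beta0 = m[0] - beta1 * m[1]         # m1 - s12 s22^{-1} m2 = -0.6
s11_2 = S[0, 0] - beta1 * S[1, 0]   # s11 - s12 s22^{-1} s21 = 1.36

cond_mean_at_3 = beta0 + beta1 * 3.0   # E[y1 | y2 = 3] = 1.8
```

The slope beta1 is exactly the coefficient of a population regression of y1 on y2, which is the fact exploited in the VAR derivation of the next section.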

3.4

Deriving the VAR¹

The empirical analysis begins with the data matrix X = [x_1, ..., x_T]′, where x_t is a (p × 1) vector of variables. Under the assumption that the observed data X are a realization of a stochastic process, we can express the joint probability of X given the initial values X_0 and the parameter value θ describing the stochastic process:

P(X | X_0; θ) = P(x_1, x_2, ..., x_T | X_0; θ).

For a given probability function, maximum likelihood estimates can be found by maximizing the likelihood function. Here we will restrict the discussion to the multivariate normal distribution. To express the joint probability of X | X_0 it is convenient to use the stacked process Z′ = (x_1′, x_2′, x_3′, ..., x_T′) ∼ N_{Tp}(μ, Σ) defined in (3.1) instead of the (T × p) data matrix X.

¹ This section draws heavily on Hendry and Richard (1983).


The stacked mean μ is Tp × 1 and Σ is Tp × Tp, so there are far more parameters than observations without simplifying assumptions. But even if we impose simplifying restrictions on the mean and the covariances of the process, they are not very informative as such. Therefore, to obtain more interpretable results, we will decompose the joint process into a conditional process and a marginal process, and then sequentially repeat the decomposition for the marginal process:

P(x_1, x_2, x_3, ..., x_T | X_0; θ)
= P(x_T | x_{T−1}, ..., x_1, X_0; θ) P(x_{T−1}, x_{T−2}, ..., x_1 | X_0; θ)
= ⋯
= ∏_{t=1}^{T} P(x_t | X⁰_{t−1}; θ),   (3.5)

where X⁰_{t−1} = [x_{t−1}, x_{t−2}, ..., x_1, X_0]. The VAR model is essentially a description of the conditional process

{x_t | X⁰_{t−1}} ∼ NID_p(μ_t, Ω).

It is now possible to see how μ_t and Ω are related to μ and Σ by using the rules (3.2)-(3.3) given in Section 3.3 for calculating the mean and the variance of a conditional distribution. We first decompose the data into two sets, the vector x_t and the conditioning set X⁰_{t−1}. Using the notation of Section 3.3 we write

y_{1,t} = x_t, y_{2,t} = (x_{t−1}′, x_{t−2}′, ..., x_1′)′,

E[y_{1,t}] = E[x_t] = μ_1, E[y_{2,t}] = (E[x_{t−1}]′, E[x_{t−2}]′, ..., E[x_1]′)′ = μ_2,

with the covariance matrix of (y_{1,t}′, y_{2,t}′)′ partitioned conformably as

[Σ_{11}, Σ_{12}; Σ_{21}, Σ_{22}].

We can now derive the parameters of the conditional model:

(x_t | X⁰_{t−1}) ∼ N(μ_{1.2}, Σ_{11.2}),

where

μ_{1.2} = μ_1 + Σ_{12} Σ_{22}^{-1} (X⁰_{t−1} − μ_2)   (3.6)

and

Σ_{11.2} = Σ_{11} − Σ_{12} Σ_{22}^{-1} Σ_{21}.   (3.7)

The difference between the observed value of the process and its conditional mean is denoted ε_t:

x_t − μ_t = ε_t.

Inserting the expression for the conditional mean gives:

x_t = μ_1 + Σ_{12} Σ_{22}^{-1} (X⁰_{t−1} − μ_2) + ε_t
x_t = μ_1 − Σ_{12} Σ_{22}^{-1} μ_2 + Σ_{12} Σ_{22}^{-1} X⁰_{t−1} + ε_t.

Using the notation Π_0 = μ_1 − Σ_{12} Σ_{22}^{-1} μ_2 and [Π_1, Π_2, ..., Π_{T−1}] = Σ_{12} Σ_{22}^{-1}, and assuming that Π_{k+1} = Π_{k+2} = ⋯ = Π_{T−1} = 0, we arrive at the k'th order vector autoregressive model:

x_t = Π_0 + Π_1 x_{t−1} + ... + Π_k x_{t−k} + ε_t, t = 1, ..., T,   (3.8)

where ε_t is Niid_p(0, Ω) and x_0, ..., x_{−k+1} are assumed fixed. If the assumption that X = [x_1, x_2, ..., x_T] is multivariate normal (μ, Σ) is correct, then it follows that (3.8):

is linear in the parameters,
has constant parameters, and
has normally distributed errors ε_t.

Note that the constancy of the parameters depends on the constancy of the covariance matrices Σ_{12} and Σ_{22}. If any of them changes as a result of a reform or intervention during the sample, both the intercept, Π_0, and the slope coefficients Π_1, ..., Π_k are likely to change.
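The derivation implies that the Π matrices can be estimated equation by equation with OLS. A simulation sketch for a bivariate VAR(1) with hypothetical coefficients (not the Danish estimates):

```python
import numpy as np

rng = np.random.default_rng(3)
T, p = 5000, 2
Pi0 = np.array([0.1, -0.2])
Pi1 = np.array([[0.5, 0.1],
                [0.0, 0.3]])          # eigenvalues 0.5 and 0.3: stationary

x = np.zeros((T, p))
for t in range(1, T):
    x[t] = Pi0 + Pi1 @ x[t - 1] + rng.standard_normal(p)

# OLS of x_t on a constant and x_{t-1}, equation by equation
X = np.column_stack([np.ones(T - 1), x[:-1]])
coef, *_ = np.linalg.lstsq(X, x[1:], rcond=None)
Pi0_hat, Pi1_hat = coef[0], coef[1:].T
```

With a well-specified model the estimates recover the data-generating Π_0 and Π_1 up to sampling error.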


3.5

Interpreting the VAR model

We have shown that the VAR model is essentially a reformulation of the covariances of the data. The question is whether it can be interpreted in terms of rational economic behavior and, if so, whether it could be used as a design of experiment when data are collected by passive observation. The idea, drawing on Hendry and Richard (1983), is to interpret the conditional mean μ_t of the VAR model,

μ_t = E_{t−1}(x_t | x_{t−1}, ..., x_{t−k}) = Π_1 x_{t−1} + ... + Π_k x_{t−k},   (3.9)

as describing agents' plans at time t − 1 given the available information X⁰_{t−1} = [x_{t−1}, ..., x_{t−k}]. According to the assumptions of the VAR model, the difference between the mean and the actual realization is a white-noise process:

x_t − μ_t = ε_t, ε_t ∼ Niid_p(0, Ω).   (3.10)

Thus, the Niid(0, Ω) assumption in (3.10) is consistent with economic agents that are rational in the sense that they do not make systematic forecast errors when they make plans for time t based on the available information at time t − 1. For example, a VAR model with autocorrelated and/or heteroscedastic residuals would describe agents that do not use the information in the data as efficiently as possible: they could do better by exploiting the systematic variation left in the residuals, thereby improving their expectations about the future. Checking the assumptions of the model, i.e. checking the white-noise requirement on the residuals, is therefore crucial not only for correct statistical inference, but also for the economic interpretation of the model as a rough description of the behavior of rational agents. As an illustration, Figure 3.5 shows the graphs of the (0,1)-standardized residuals from the VAR(2) model based on the Danish data.



Figure 3.5. The graphs of the residuals from the VAR(2) model for real money stock, real income, inflation, the deposit rate and the bond rate.

The residuals do not look too bad, though a few outlier observations can be found, approximately corresponding to the interventions and reforms discussed above. Unfortunately, in many economic applications the multivariate normality assumption is seldom satisfied for the VAR in its simplest form (3.8). Since, in general, statistical inference is valid only to the extent that the assumptions of the underlying methods are satisfied, this is potentially a serious problem. Therefore, we have to ask whether it is possible to modify the baseline VAR model (3.8) so that it preserves its attractiveness as a convenient description of the basic properties of the data, while at the same time yielding valid inference. Simulation studies have shown that valid statistical inference is sensitive to the violation of some of the assumptions, such as parameter non-constancy, autocorrelated residuals (the higher the autocorrelation, the worse) and skewed residuals, while quite robust to others, like excess kurtosis and residual heteroscedasticity.


This will be discussed in more detail in Chapter 5. Whatever the case, direct or indirect testing of the assumptions is crucial for the success of the empirical application. As soon as we understand the reasons why the model fails to satisfy the assumptions, it is often possible to modify the model so that in the end we can start from a statistically well-behaved model. Important tools in this context are:

the use of intervention dummies to account for significant political or institutional events during the sample,
conditioning on weakly exogenous variables,
checking the measurements of the chosen variables,
changing the sample period to avoid a fundamental regime shift, or splitting the sample into more homogeneous periods.

How to use these tools will be further discussed in the subsequent chapters.
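One simple direct check of the white-noise requirement is the residual autocorrelation function: values far outside roughly ±2/√T signal left-over systematic variation. A minimal sketch (illustrative only, not the book's own diagnostic suite):

```python
import numpy as np

def residual_acf(resid, max_lag=4):
    """Autocorrelations r_1, ..., r_max_lag of a residual series."""
    r = np.asarray(resid, dtype=float) - np.mean(resid)
    denom = np.sum(r**2)
    return np.array([np.sum(r[h:] * r[:-h]) / denom for h in range(1, max_lag + 1)])

rng = np.random.default_rng(5)
white = rng.standard_normal(1000)        # what well-specified residuals look like
acf_white = residual_acf(white)          # all close to zero
acf_rw = residual_acf(np.cumsum(white))  # a random walk: lag-1 value near one
```

Applied to each column of the estimated VAR residuals, this gives a quick first impression before running formal misspecification tests.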

3.6

The dynamic properties of the VAR process

The dynamic properties of the process can be investigated by calculating the roots of the VAR process (3.8). It is convenient to formulate the VAR as a polynomial in the lag operator L, where L^i x_t = x_{t−i}:

(I − Π_1 L − ... − Π_k L^k) x_t = ΦD_t + ε_t,
Π(L) x_t = ΦD_t + ε_t,   (3.11)

where the model has been extended to contain D_t, a vector of deterministic components such as a constant, seasonal dummies and intervention dummies. The autoregressive formulation is useful for expressing hypotheses on economic behavior, whereas the moving-average representation is useful when examining the properties of the process. When the process is stationary, the latter representation can be found directly by inverting the VAR model, so that x_t, t = 1, ..., T, is expressed as a function of past and present shocks ε_{t−j}, j = 0, 1, ..., initial values X_0, and possible deterministic components D_t:


x_t = Π^{-1}(L)(Φ D_t + ε_t)   (3.12)
    = |Π(L)|^{-1} Π^a(L)(Φ D_t + ε_t)   (3.13)
    = (I + C_1 L + C_2 L^2 + ...)(Φ D_t + ε_t),   (3.14)

where |Π(L)| = det(Π(L)) and Π^a(L) is the adjoint matrix of Π(L). Johansen (1995), Chapter 2, gives a recursive formula for C_j = f(Π_1, ..., Π_k) when the VAR process is stationary. When the VAR process is nonstationary, Π(L) is non-invertible and the C_j matrices have to be derived under the assumption of reduced rank. This case will be discussed in Chapter 5.

3.6.1 The roots of the characteristic function

To calculate the roots of the VAR process we consider first the characteristic polynomial:

Π(z) = I - Π_1 z - ... - Π_k z^k,  (Π(z))^{-1} = |Π(z)|^{-1} Π^a(z).

We will first illustrate that the roots of |Π(z)| = 0 summarize important information about the dynamics and the stability of the process. We assume a stationary two-dimensional VAR(2) model:

(I - Π_1 L - Π_2 L^2)x_t = Φ D_t + ε_t.

The characteristic function is:

Π(z) = I - [π_{1.11}  π_{1.12}; π_{1.21}  π_{1.22}] z - [π_{2.11}  π_{2.12}; π_{2.21}  π_{2.22}] z^2

     = [1 - π_{1.11}z - π_{2.11}z^2,  -π_{1.12}z - π_{2.12}z^2;
        -π_{1.21}z - π_{2.21}z^2,  1 - π_{1.22}z - π_{2.22}z^2]

and

|Π(z)| = (1 - π_{1.11}z - π_{2.11}z^2)(1 - π_{1.22}z - π_{2.22}z^2) - (π_{1.12}z + π_{2.12}z^2)(π_{1.21}z + π_{2.21}z^2)
       = 1 - a_1 z - a_2 z^2 - a_3 z^3 - a_4 z^4
       = (1 - ρ_1 z)(1 - ρ_2 z)(1 - ρ_3 z)(1 - ρ_4 z),

i.e. the determinant is a fourth order polynomial in z which gives us four characteristic roots, z_1 = 1/ρ_1, ..., z_4 = 1/ρ_4, when solving |Π(z)| = 0. The characteristic roots contain useful information about the dynamic behavior of the process. This can be illustrated using the simple two-dimensional VAR(2) model:

x_t = Π^a(L)(Φ D_t + ε_t) / [(1 - ρ_1 L)(1 - ρ_2 L)(1 - ρ_3 L)(1 - ρ_4 L)],  t = 1, ..., T.

As an example of the dynamic behavior of the process we expand the first root component:

(1 - ρ_1 L)^{-1}(ε_t + Φ D_t) = (1 + ρ_1 L + ρ_1^2 L^2 + ...)(ε_t + Φ D_t).

Thus, each shock ε_t will dynamically affect both present and future values of the variables in x_t. Note that this holds true also for any dummy variable included in D_t. The persistence of the effect depends on the magnitude of |ρ_1|: the larger, the stronger the persistence. It is noteworthy that already the simple two-dimensional VAR(2) model can generate a very rich dynamic pattern in the variables x_t as a result of the multiplicity of the roots and the additional dynamics given by the Π^a(L) matrix. A real root ρ_j will generate exponentially declining behavior, whereas a complex pair of roots ρ_j = ρ_re ± iρ_im will generate exponentially declining cyclical behavior. If a real root lies on the unit circle (ρ = 1 or ρ = -1) it will generate nonstationary behavior, i.e. a stochastic trend in x_t. If the modulus of a complex root is one, it corresponds to nonstationary seasonal behavior. An example of the latter is the simple fourth order difference model for quarterly data:

(1 - L^4)x_t = (1 - L)(1 + L)(1 + L^2)x_t = ε_t.

We set the characteristic polynomial to zero


(1 - z)(1 + z)(1 + z^2) = 0

and find the characteristic roots: z_1 = 1, z_2 = -1, z_3 = i, z_4 = -i.
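As a numerical illustration (the coefficient matrices below are hypothetical illustrative values, not estimates from the book), the four characteristic roots of a two-dimensional VAR(2) can be computed by recovering the quartic |Π(z)| from a few evaluation points and solving for its roots:

```python
import numpy as np

# Hypothetical 2x2 lag coefficient matrices (illustrative values only)
Pi1 = np.array([[0.5, 0.1],
                [0.2, 0.3]])
Pi2 = np.array([[0.2, 0.0],
                [0.0, 0.1]])

# |Pi(z)| = det(I - Pi1 z - Pi2 z^2) is a 4th-order polynomial in z.
# Five evaluation points determine a quartic exactly; then solve for roots.
z_pts = np.linspace(-2.0, 2.0, 5)
det_vals = [np.linalg.det(np.eye(2) - Pi1 * z - Pi2 * z**2) for z in z_pts]
coefs = np.polyfit(z_pts, det_vals, 4)   # polynomial coefficients, highest power first
roots = np.roots(coefs)                  # the four characteristic roots z_1..z_4

print(np.sort_complex(roots))
print(np.abs(roots) > 1)                 # all True here => stationary process
```

With these (stationary) values all four roots lie outside the unit circle, in line with the stability condition discussed below.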

3.6.2 Calculating the roots using the companion matrix

We will here demonstrate that the roots of the process can be conveniently calculated by first reformulating the VAR(k) model into the companion AR(1) form and then solving an eigenvalue problem. In this case the eigenvalue solution gives the roots directly as ρ_1, ..., ρ_pk instead of the inverses ρ_1^{-1}, ..., ρ_pk^{-1} obtained by solving the characteristic function. To distinguish between the two cases we call the latter characteristic roots and the former eigenvalue roots. The eigenvalue roots are calculated by transforming the VAR(k) model into an AR(1) model based on the companion form. For simplicity we assume k = 2 when we illustrate the procedure. First we rewrite the VAR(2) model in the AR(1) form:

[x_t; x_{t-1}] = [Π_1  Π_2; I  0][x_{t-1}; x_{t-2}] + [ε_t; 0],

or more compactly:

x̃_t = Π̃ x̃_{t-1} + ε̃_t,  Π̃ = [Π_1  Π_2; I  0].

The roots ρ of the matrix Π̃ can be found by solving the eigenvalue problem

Π̃ V = ρ V,

where V = [v_1; v_2] is a kp × 1 vector, i.e.

[Π_1  Π_2; I  0][v_1; v_2] = ρ [v_1; v_2],

i.e.

ρ v_1 = Π_1 v_1 + Π_2 v_2,
ρ v_2 = v_1.

The solution can be found from:

ρ v_1 = Π_1 v_1 + Π_2 (v_1/ρ),
v_1 = Π_1 (v_1/ρ) + Π_2 (v_1/ρ^2),

i.e. the eigenvalues ρ of Π̃ are the pk roots of the second order polynomial:

|I ρ^2 - Π_1 ρ - Π_2| = 0, or |I - Π_1 z - Π_2 z^2| = 0,

where z = ρ^{-1}. Note that the roots ρ_i of the companion matrix are the inverses of the roots of the characteristic polynomial. Thus, the solution to |Π(z)| = 0 gives the stationary roots outside the unit circle, whereas the solution to |Π̃ - ρI| = 0 gives the stationary roots inside the unit circle. To summarize: if the roots of |Π(z)| = 0 are all outside the unit circle (or alternatively if the eigenvalues of the companion matrix are all inside the unit circle) then {x_t} is stationary; if the roots are outside or on the unit circle (alternatively if the eigenvalues are inside or on the unit circle) then {x_t} is nonstationary; if any of the roots are inside the unit circle (alternatively if the eigenvalues are outside the unit circle) then {x_t} is explosive.
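The companion-matrix route can be sketched numerically; the Π_1, Π_2 values below are hypothetical, and the eigenvalues of Π̃ are compared with the inverses of the characteristic roots of |Π(z)| = 0:

```python
import numpy as np

# Hypothetical 2x2 lag matrices (illustrative values only)
Pi1 = np.array([[0.5, 0.1],
                [0.2, 0.3]])
Pi2 = np.array([[0.2, 0.0],
                [0.0, 0.1]])
p = Pi1.shape[0]

# Companion (AR(1)) form: [x_t; x_{t-1}] = A_tilde [x_{t-1}; x_{t-2}] + [eps_t; 0]
A_tilde = np.block([[Pi1, Pi2],
                    [np.eye(p), np.zeros((p, p))]])

rho = np.linalg.eigvals(A_tilde)     # eigenvalue roots rho_1..rho_pk
print(np.abs(rho))                   # all moduli < 1 here => stationary

# Inverses of the characteristic roots of |Pi(z)| = 0 give the same moduli
z_pts = np.linspace(-2.0, 2.0, 5)
det_vals = [np.linalg.det(np.eye(p) - Pi1 * z - Pi2 * z**2) for z in z_pts]
z_roots = np.roots(np.polyfit(z_pts, det_vals, 4))
print(np.sort(np.abs(1 / z_roots)), np.sort(np.abs(rho)))
```

The eigenvalue computation avoids forming the determinant polynomial explicitly, which is why software typically reports companion-matrix eigenvalues.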

Table 3.1: The roots of the VAR(2) model

 real   complex  modulus
 0.99    0.00     0.99
 0.75    0.10     0.76
 0.75   -0.10     0.76
 0.69   -0.33     0.76
 0.69    0.33     0.76
-0.29   -0.40     0.49
-0.29    0.40     0.49
-0.35    0.00     0.35
 0.11   -0.28     0.30
 0.11    0.28     0.30

3.6.3 Illustration

Table 3.1 reports the roots of the VAR(2) model for the Danish data and Figure 3.6 shows them in the unit circle.

There are two real roots: one is almost on the unit circle, the other is a negative root (which is likely to be the result of Δp_t being to some extent over-differenced). All remaining roots come in complex pairs.

[Plot: the eigenvalues of the companion matrix in the unit circle]

Figure 3.6. The pk = 10 roots of the VAR(2) model for the Danish data.

3.7 Concluding remarks

The aim of this chapter was to describe a design of experiment that may have generated data by passive observation for which the VAR model is an appropriate description. On an aggregated level, economic agents were assumed rational in the sense that they learn by past experience and adjust their behavior accordingly, so that their plans do not systematically deviate from actual realizations. Thus, the design of experiment consistent with the Niid assumption on the residuals relies on the assumption that agents make plans based on conditional expectations using the information set {x_{t-1}, D_t}, so that the residual (the unexpected component given the chosen information set) behaves as a normal innovation process. In this framework, the success of the empirical analysis relies crucially on the choice of a sufficient and relevant information set, an appropriate sample period, and the skillfulness of the investigator in extracting economically interesting results from this information. The purpose of the next chapter is to discuss estimation of the unrestricted VAR and some diagnostic tools which can be used when assessing the appropriateness of the chosen model.

Chapter 4 Estimation, Specification and Tests in the Unrestricted VAR


The probability approach in econometrics requires an explicit probability formulation of the empirical model so that a fully specified statistical model can be derived and checked against the data. Assume that we have derived an estimator under the assumption of multivariate normality as demonstrated in the previous chapter. We then take the model to the data and obtain model estimates derived under this assumption. If the multivariate normality assumption is correct, the residuals should not deviate significantly from the Niid assumption. If they do not pass the tests, for example because they are autocorrelated or heteroscedastic, or because the distribution is skewed or leptokurtic, then the estimators may no longer have optimal properties and cannot be considered full information maximum likelihood (FIML) estimators. The obtained parameter estimates (based on an incorrectly derived estimator) may not have any meaning and, since we do not know their true properties, inference is likely to be hazardous. However, some assumptions are more crucial for the properties of the estimates than others. Therefore, when reporting the various misspecification tests below we will discuss robustness properties against modest violations of the assumptions. Nevertheless, if we are going to claim that our conclusions are based on FIML inference, then we also have to demonstrate that our model is capable of mirroring the full information of the data in a satisfactory way. Before being able to test the assumptions we need to estimate the model, and Section 4.1 derives the ML estimator under the null of correct model


specification. Section 4.2 discusses different parametrizations of the unrestricted VAR model and illustrates the estimates based on the Danish data. Section 4.3 briefly reports some frequently used misspecification tests.

4.1 Likelihood based estimation in the unrestricted VAR

Under the assumption that the parameters {Π_1, Π_2, ..., Π_k, Ω} in the VAR model of Chapter 3 are unrestricted, it can be shown that the simple OLS estimator is identical to the FIML estimator. When the data contain unit roots we need to derive the likelihood estimator subject to reduced rank restrictions. Chapter 7 will give a detailed discussion of how to solve this problem. To simplify notation we rewrite (4.7) in compact form:

x_t = B'Z_t + ε_t,  t = 1, ..., T,  ε_t ~ N_p(0, Ω),   (4.1)

where B' = [Π_1, Π_2, ..., Π_k], Z_t' = [x_{t-1}', x_{t-2}', ..., x_{t-k}'] and the initial values X_0 = [x_0', x_{-1}', ..., x_{-k+1}'] are given. For simplicity we assume D_t = 0. We need to derive the equations for estimating B and Ω, which can be done by finding the expressions for B and Ω for which the first order derivatives of the likelihood function are equal to zero. We consider first the log likelihood function:

ln L(B, Ω; X) = -(Tp/2) ln(2π) - (T/2) ln|Ω| - (1/2) Σ_{t=1}^T (x_t - B'Z_t)' Ω^{-1} (x_t - B'Z_t),

and calculate ∂ ln L/∂B = 0, which gives

Σ_{t=1}^T x_t Z_t' = B̂' Σ_{t=1}^T Z_t Z_t',

so that the FIML estimator for B is:

B̂' = (Σ_{t=1}^T x_t Z_t')(Σ_{t=1}^T Z_t Z_t')^{-1} = M_xZ M_ZZ^{-1}.   (4.2)

Next we calculate ∂ ln L/∂Ω = 0, which gives the estimator of Ω:

Ω̂ = T^{-1} Σ_{t=1}^T (x_t - B̂'Z_t)(x_t - B̂'Z_t)' = T^{-1} Σ_{t=1}^T ε̂_t ε̂_t'.   (4.3)

The ML estimators (4.2) and (4.3) are identical to the corresponding OLS estimators. We can now find the maximal value of the (log) likelihood function for the ML estimates B̂ and Ω̂:

ln L_max = -(Tp/2) ln(2π) - (T/2) ln|Ω̂| - (1/2) Σ_{t=1}^T (x_t - B̂'Z_t)' Ω̂^{-1} (x_t - B̂'Z_t).

We will now show that ln L_max = -(T/2) ln|Ω̂| + constant terms. Consider first:

(x_t - B̂'Z_t)' Ω̂^{-1} (x_t - B̂'Z_t) = ε̂_t' Ω̂^{-1} ε̂_t = u_{1t}^2 + ... + u_{pt}^2,

where u_t = Ω̂^{-1/2} ε̂_t. Let:

U = diag(u_{1t}^2, u_{2t}^2, ..., u_{pt}^2).

It follows that trace(U) = u_{1t}^2 + ... + u_{pt}^2. Using the rule trace(ABC) = trace(CAB) we have that:

trace[(x_t - B̂'Z_t)' Ω̂^{-1} (x_t - B̂'Z_t)] = trace[(x_t - B̂'Z_t)(x_t - B̂'Z_t)' Ω̂^{-1}],

so that trace(ε̂_t' Ω̂^{-1} ε̂_t) = trace(ε̂_t ε̂_t' Ω̂^{-1}). Summing over T we obtain:

trace{T^{-1} Σ_{t=1}^T (x_t - B̂'Z_t)(x_t - B̂'Z_t)' Ω̂^{-1}} = trace{T^{-1} Σ_{t=1}^T ε̂_t ε̂_t' Ω̂^{-1}} = trace{Ω̂ Ω̂^{-1}} = trace{I_p} = p,

and

ln L_max = -(T/2) ln|Ω̂| - (Tp/2) - (Tp/2) ln(2π),

i.e. apart from some constant terms, the maximum of the log likelihood function is proportional to the log determinant of the residual covariance matrix Ω̂:

ln L_max = -(T/2) ln|Ω̂| + constant terms.
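These closed-form results are easy to verify numerically. The sketch below (simulated data with hypothetical coefficient matrices, not the book's estimates) computes B̂' = M_xZ M_ZZ^{-1} and Ω̂ by OLS and checks that the quadratic form Σ_t ε̂_t' Ω̂^{-1} ε̂_t collapses to Tp:

```python
import numpy as np

rng = np.random.default_rng(42)
T, p, k = 200, 2, 2
Pi1 = np.array([[0.5, 0.1], [0.2, 0.3]])   # hypothetical lag matrices
Pi2 = np.array([[0.2, 0.0], [0.0, 0.1]])

# Simulate a stationary VAR(2)
x = np.zeros((T + k, p))
for t in range(k, T + k):
    x[t] = Pi1 @ x[t-1] + Pi2 @ x[t-2] + rng.standard_normal(p)

# OLS = FIML: B' = M_xZ M_ZZ^{-1}, Omega = T^{-1} sum eps eps'
X = x[k:]                                  # T x p observations
Z = np.hstack([x[k-1:-1], x[k-2:-2]])      # T x pk stacked regressors Z_t
B = (X.T @ Z / T) @ np.linalg.inv(Z.T @ Z / T)   # p x pk, = [Pi1_hat, Pi2_hat]
E = X - Z @ B.T                            # residuals eps_hat_t
Omega = E.T @ E / T

# Trace result: sum_t eps_t' Omega^{-1} eps_t = T * p (up to rounding)
quad = np.einsum('ti,ij,tj->', E, np.linalg.inv(Omega), E)

# Maximised log likelihood: -T/2 ln|Omega| - Tp/2 - Tp/2 ln(2*pi)
logL = -0.5 * T * np.linalg.slogdet(Omega)[1] - 0.5 * T * p * (1 + np.log(2 * np.pi))
print(B[:, :p])       # close to Pi1
print(quad, T * p)    # equal up to rounding
```

Because the quadratic form always equals Tp at the optimum, the maximised likelihood depends on the data only through ln|Ω̂|, which is why |Ω̂| is the natural basis for the lag-length and misspecification tests discussed later.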
This result will be used in many of the test procedures discussed below and in the derivation of the maximum likelihood estimator for the cointegrated VAR model in Chapter 7. To be able to test hypotheses on B we have to find the distribution of the estimates B̂. We will use the simple VAR(2) model to discuss the asymptotic distribution of B̂ under the assumption of stationarity of the process x_t. Consider the estimation error of the VAR coefficients:

B̂' - B' = [Π̂_1, Π̂_2] - [Π_1, Π_2].   (4.4)

First we denote the covariance matrices between x_{t-1} and x_{t-2}:

Σ = [Σ_11  Σ_12; Σ_21  Σ_22] = [Var(x_{t-1})  Cov(x_{t-1}, x_{t-2}); Cov(x_{t-2}, x_{t-1})  Var(x_{t-2})].

Under the stationarity assumption the distribution of (4.4) has the following asymptotic property:

T^{1/2}(B̂ - B) → N(0, Σ^{-1} ⊗ Ω),   (4.5)

where Σ^{-1} ⊗ Ω is the asymptotic covariance matrix of the suitably vectorized coefficients.

To see how the distribution of the unrestricted VAR estimates relates to the corresponding results for the standard regression model, we digress briefly to the latter.

Repetition:
**********************************
The distribution of the linear regression model estimate:
y = Xβ + ε, ε ~ Niid(0, σ²I),
β̂ = (X'X)^{-1}X'y, β̂ - β = (X'X)^{-1}X'ε,
Var(β̂) = σ²(X'X)^{-1}.
***********************************

Thus, the VAR results are similar to the linear regression model, except that the design matrix X'X of the latter is replaced by the 2p × 2p covariance matrix Σ. Note that the asymptotic distribution of the linear regression model coefficients is based on the assumption that the design matrix T^{-1}X'X converges in probability to A, where A is a constant matrix. When the data have unit roots this is no longer the case, and the design matrix, when normalized differently, will instead converge towards a matrix of Brownian motions. This will be discussed further in Chapter 8. Assume now that we would like to test the significance of a single coefficient, for example the first element π_{1,11} of Π_1. We define two design vectors e' = [1, 0, 0, ..., 0] and c' = [1, 0, 0, ..., 0], where e is p × 1 and c is 2p × 1, so that e'B'c = π_{1,11}. Using (4.5) we can find the test statistic for the null hypothesis π_{1,11} = 0, which has a Normal(0,1) distribution. This can be generalized to testing any coefficient in B by appropriately choosing the vectors e and c:

T^{1/2} e'B̂'c (c'Σ^{-1}c · e'Ωe)^{-1/2} ~ N(0, 1).   (4.6)
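A numerical sketch of this t-ratio on simulated data (hypothetical coefficient values; the OLS steps repeat the formulas of Section 4.1) can look as follows:

```python
import numpy as np

rng = np.random.default_rng(42)
T, p = 200, 2
Pi1 = np.array([[0.5, 0.1], [0.2, 0.3]])   # hypothetical true values
Pi2 = np.array([[0.2, 0.0], [0.0, 0.1]])
x = np.zeros((T + 2, p))
for t in range(2, T + 2):
    x[t] = Pi1 @ x[t-1] + Pi2 @ x[t-2] + rng.standard_normal(p)

X, Z = x[2:], np.hstack([x[1:-1], x[0:-2]])
MZZ_inv = np.linalg.inv(Z.T @ Z / T)       # estimate of Sigma^{-1}
B = (X.T @ Z / T) @ MZZ_inv                # OLS/FIML estimates [Pi1_hat, Pi2_hat]
E = X - Z @ B.T
Omega = E.T @ E / T

# t-ratio for pi_{1,11} = B[0, 0]; Var(pi_hat) = Omega_11 * (Sigma^{-1})_11 / T
se = np.sqrt(Omega[0, 0] * MZZ_inv[0, 0] / T)
t_ratio = B[0, 0] / se
print(t_ratio)     # compare with the N(0,1) 5% critical value, 1.96
```

When the process contains a root close to the unit circle, as in the Danish data, this N(0,1) comparison is no longer reliable, which is the point made in Section 4.1.1 below.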


4.1.1 The estimates of the unrestricted VAR(2) for the Danish data

The unrestricted VAR(2) model was estimated based on the following assumptions:

x_t = Π_1 x_{t-1} + Π_2 x_{t-2} + Φ D_t + ε_t,  t = 1, ..., T,  ε_t ~ N_p(0, Ω),   (4.7)

where D_t contains three centered seasonal dummies and a constant. The estimates reported in Table 4.1 are calculated in GiveWin by running an OLS regression equation by equation. As discussed above, these estimates are ML estimates as long as no restrictions have been imposed on the VAR model. To increase readability we have omitted standard errors of estimates and t-ratios. Instead, coefficients with a t-ratio greater than 1.9 have been given in bold face. Since Chapter 3 found at least one characteristic root very close to the unit circle in the unrestricted VAR, x_t is not likely to be stationary. This implies that the design matrix, normalized differently, will no longer converge towards a constant matrix in the limit but, instead, towards a matrix of Brownian motions. In this case the t-ratios are more likely to follow the Dickey-Fuller distribution and should, therefore, not be interpreted as Student's t.

The estimated system, with x_t = [m^r_t, y^r_t, Δp_t, R_m,t, R_b,t]', is x_t = Π̂_1 x_{t-1} + Π̂_2 x_{t-2} + Φ̂ D_t + ε̂_t, where:

Π̂_1 =
[0.59  0.37  0.11  2.59  7.22]
[0.11  0.96  0.23  2.42  1.13]
[0.03  0.04  0.52  0.38  0.29]
[0.01  0.00  0.00  0.87  0.30]
[0.03  0.05  0.01  0.03  1.27]

Π̂_2 =
[0.21  0.15  0.07  3.41  2.22]
[0.01  0.25  0.07  0.07  2.24]
[0.04  0.09  0.22  1.83  0.43]
[0.01  0.01  0.01  0.13  0.25]
[0.02  0.04  0.01  0.03  0.40]

with residual standard deviations σ̂_ε = (0.0267, 0.0169, 0.0145, 0.0013, 0.0017) and residual correlations (lower triangle):

[1.0]
[0.25  1.0]
[0.20  0.19  1.0]
[0.09  0.05  0.02  1.0]
[0.24  0.09  0.09  0.32  1.0]


Log(L_max) = 1973.1, log|Ω̂| = -51.24, R²(LR) = 0.9998, R²(LM) = 0.64, F-test on all regressors: F(50,272) = 32.5, where R²(LR) and R²(LM) will be explained in Section 4.3. An inspection of the estimated coefficients reveals more significant coefficients at lag 1 than at lag 2. Most of the coefficients with large t-ratios are on the diagonal, implying that the variables are highly autoregressive. Only the inflation rate, Δp_t, has a negative autoregressive coefficient, which is a result of the imposed difference operator¹. The bond rate seems to be quite important for most of the variables in this system. The residual correlations between equations are generally moderately sized and suggest that the current effects are not likely to be very important in this case. The log likelihood value and log|Ω̂| are only informative when compared to another model specification. For example, based on the VAR(1) model we obtained log|Ω̂| = -50.19 > -51.24. The R²(LR) is almost 1.0, which does not imply

that we have explained all variation in the data but, instead, that R² is an incorrect measure when the variables are trending, as they are in the present case. See also the discussion in Section 4.3.2. Finally, we report F-tests on the significance of single regressors. The tests are distributed as F(5,59):

m^r_{t-1}: 4.8, y^r_{t-1}: 14.1, Δp_{t-1}: 4.8, R_m,t-1: 10.3, R_b,t-1: 22.7,
m^r_{t-2}: 2.8, y^r_{t-2}: 3.7, Δp_{t-2}: 1.0, R_m,t-2: 0.9, R_b,t-2: 2.9.

We note that the second lag of inflation and the deposit rate could altogether be omitted from the system.

4.2 Three different ECM-representations

The unrestricted VAR model can be given different parametrizations without imposing any binding restrictions on the model parameters, i.e. without changing the value of the likelihood function. The so-called vector (equilibrium)-error-correction model gives a convenient reformulation of (4.7) in terms of differences, lagged differences, and levels of the process. There are several advantages of this formulation:
¹For example, if p_t = 0.5p_{t-1} + 0.5p_{t-2} + ε_t, then Δp_t = -0.5Δp_{t-1} + ε_t.

1. The multicollinearity effect, which is typically strongly present in time-series data, is significantly reduced in the error-correction form. Differences are much more orthogonal than the levels of variables.

2. All information about long-run effects is summarized in the levels matrix Π, which can therefore be given special attention when solving the problem of cointegration.

3. The interpretation of the estimates is much more intuitive, as the coefficients can be naturally classified into short-run effects and long-run effects.

4. We are generally interested in understanding why, say, the inflation rate changed from the previous to the present period. The ECM formulation answers this question directly.

We will now discuss three different versions of the VAR(k) model represented in the general error-correction form:

Δx_t = Γ_1^(m) Δx_{t-1} + Γ_2^(m) Δx_{t-2} + ... + Γ_{k-1}^(m) Δx_{t-k+1} + Π x_{t-m} + Φ D_t + ε_t,   (4.8)

where m is an integer value between 1 and k. Note that the value of the likelihood function does not change even if we change the value of m. For the Danish data we will assume the lag length k = 2 and report the unrestricted parameter estimates for that choice. Additionally, the value of the log likelihood function, some multivariate R² measures, and F-tests of the significance of the regressors will be reported. The purpose is to illustrate how different the estimates can look although it is exactly the same model that has been estimated in all three cases.

4.2.1 The ECM formulation with m = 1

The VAR(2) model is specified as:

Δx_t = Γ_1^(1) Δx_{t-1} + Π x_{t-1} + Φ D_t + ε_t,   (4.9)

where Π = -(I - Π_1 - Π_2) and Γ_1^(1) = -Π_2. In (4.9) the lagged levels matrix Π has been placed at time t-1.
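The mapping between the VAR and ECM parameters can be verified numerically; the Π_1, Π_2 below are hypothetical illustrative values:

```python
import numpy as np

# Hypothetical VAR(2) coefficients (illustrative values only)
Pi1 = np.array([[0.5, 0.1], [0.2, 0.3]])
Pi2 = np.array([[0.2, 0.0], [0.0, 0.1]])
I = np.eye(2)

# ECM (m = 1) reparametrisation: dx_t = Gamma1 dx_{t-1} + Pi x_{t-1} + eps_t
Gamma1 = -Pi2
Pi = -(I - Pi1 - Pi2)

# Both forms give identical one-step predictions for any x_{t-1}, x_{t-2}
rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(2), rng.standard_normal(2)
var_pred = Pi1 @ x1 + Pi2 @ x2
ecm_pred = x1 + Gamma1 @ (x1 - x2) + Pi @ x1
print(np.allclose(var_pred, ecm_pred))   # True
```

This makes concrete why the likelihood value cannot change under the reparametrisation: the fitted values, and hence the residuals, are identical.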


The estimated coefficients reported below show that most of the significant coefficients are now in the lagged levels matrix Π, whereas only four out of 25 coefficients in the Γ_1^(1) matrix seem significant. Among the former, two are on the diagonal and the remaining two describe the big change in inverse velocity, as a result of a reallocation of money holdings when restrictions on capital movements were lifted in 1983, which coincided with a huge drop in the bond rate. In Chapter 5 we will re-estimate the VAR model accounting for this intervention. Altogether, estimating the model exclusively in differences, i.e. setting Πx_{t-1} = 0, would not deliver many interesting results. But including Πx_{t-1} in the model raises the question of how to handle the nonstationarity problem. Since a stationary process cannot be equal to a nonstationary process, the estimation results can only make sense if Πx_{t-1} defines stationary linear combinations of the variables. This can be seen by noting that the first row of Πx_{t-1} can be reformulated as:

-0.20(m^r_{t-1} - 1.1y^r_{t-1} + 0.9Δp_{t-1} - 30.0R_m,t-1 + 25.0R_b,t-1).

If the linear combination in the bracket defines a stationary variable then all parts of the first equation in the system would be stationary and, therefore, balanced. This is, in a nutshell, what cointegration analysis does: it identifies stationary linear combinations between nonstationary variables so that an I(1) model can be reformulated exclusively in stationary variables. Our task is to give the stationary linear combinations an economically meaningful interpretation by imposing relevant identifying or over-identifying restrictions on the coefficients. For example, the above relation may be interpreted as the deviation of observed money holdings from a steady-state money demand relation, m^r_{t-1} - m^r*_{t-1}, where

m^r*_t = 1.1y^r_t - 0.9Δp_t + 30.0R_m,t - 25.0R_b,t.

We might now like to test whether real income has a unit coefficient, whether the coefficient on inflation is zero, and whether the interest rate coefficients are equal with opposite signs. How to do this will be discussed in Chapter 9.


The estimated model, with Δx_t = [Δm^r_t, Δy^r_t, Δ²p_t, ΔR_m,t, ΔR_b,t]', is Δx_t = Γ̂_1^(1) Δx_{t-1} + Π̂ x_{t-1} + Φ̂ D_t + ε̂_t, where:

Γ̂_1^(1) =
[0.20  0.15  0.07  3.41  2.22]
[0.01  0.25  0.07  0.07  2.24]
[0.04  0.08  0.22  1.83  0.43]
[0.00  0.01  0.00  0.26  0.05]
[0.02  0.04  0.01  0.03  0.40]

Π̂ =
[0.20  0.22  0.18  6.01  5.00]
[0.10  0.29  0.31  2.35  1.11]
[0.01  0.12  1.74  1.46  0.14]
[0.00  0.01  0.00  0.26  0.05]
[0.01  0.01  0.02  0.00  0.13]

Log(L_max) = 1973.1, log|Ω̂| = -51.2, trace correlation = 0.54, R²(LR) = 0.96, R²(LM) = 0.42, F-test on all regressors: F(50,272) = 5.4. The trace correlation coefficient will be described in Section 4.3.2. Note that Log(L_max) and log|Ω̂| are exactly the same as for the unrestricted VAR in the previous section, demonstrating that from a likelihood point of view the models are identical. Because the residuals are identical in all the ECM representations, all residual tests and information criteria are identical, whereas tests of the significance of single variables need not be (and often are not). For example, the single F-tests of the lagged variables in differences, distributed as F(5,59), are very different when compared to the two subsequent specifications, whereas the test values for the lagged variables in levels are identical. This illustrates that the Π matrix is invariant to linear transformations of the VAR system, but not the Γ_1^(m) matrices, which depend on how we choose m.

Δm^r_{t-1}: 2.8, Δy^r_{t-1}: 3.7, Δ²p_{t-1}: 1.0, ΔR_m,t-1: 0.9, ΔR_b,t-1: 2.9,
m^r_{t-1}: 3.7, y^r_{t-1}: 4.1, Δp_{t-1}: 16.0, R_m,t-1: 4.9, R_b,t-1: 4.5.

4.2.2 The ECM formulation with m = 2

The VAR(2) model is now specified so that Π is placed at x_{t-2}:

Δx_t = Γ_1^(2) Δx_{t-1} + Π x_{t-2} + Φ D_t + ε_t,   (4.10)


with Π = -(I - Π_1 - Π_2) and Γ_1^(2) = -(I - Π_1). Thus, the Π matrix remains unchanged, but not the Γ_1 matrix. The latter now measures the cumulative long-run effect, whereas Γ_1^(1) in (4.9) describes pure transitory effects measured by the lagged changes of the variables. While the explanatory power is identical for the two model versions, the estimated coefficients and their p-values can vary considerably. Usually many more significant coefficients are obtained with formulation (4.10) than with (4.9). We note that the number of significant coefficients in Γ̂_1^(2) is larger than in Γ̂_1^(1), but that the Π̂ matrix is unchanged. Thus, many significant coefficients do not necessarily imply high explanatory power, but may as well be a consequence of the parameterization of the model. It illustrates that the interpretation of the estimated coefficients in dynamic models is less straightforward than in static regression models. The estimates are:

Γ̂_1^(2) =
[0.40  0.36  0.11  2.59  7.22]
[0.11  0.04  0.23  2.42  1.13]
[0.03  0.04  1.52  0.38  0.29]
[0.01  0.00  0.00  0.13  0.30]
[0.03  0.05  0.01  0.03  0.27]

Π̂ =
[0.20  0.22  0.18  6.01  5.00]
[0.10  0.29  0.31  2.35  1.11]
[0.01  0.12  1.74  1.46  0.14]
[0.00  0.01  0.00  0.26  0.05]
[0.01  0.01  0.02  0.00  0.13]

Log(L_max) = 1973.1, log|Ω̂| = -51.2, R²(LR) = 0.96, R²(LM) = 0.42, F-test on all regressors: F(50,272) = 5.4. F-tests on single regressors:

Δm^r_{t-1}: 6.7, Δy^r_{t-1}: 5.3, Δ²p_{t-1}: 33.9, ΔR_m,t-1: 1.2, ΔR_b,t-1: 4.9,
m^r_{t-2}: 3.7, y^r_{t-2}: 4.1, Δp_{t-2}: 16.0, R_m,t-2: 4.9, R_b,t-2: 4.5.

We note that the single F-tests, F(5,59), on the lagged variables in differences have changed. For example, Δ²p_{t-1} is now highly significant, whereas it was completely insignificant in the case m = 1.


4.2.3 ECM-representation in acceleration rates, changes and levels

Another convenient formulation of the VAR model is in second order differences (acceleration rates), changes, and levels:

Δ²x_t = Ψ_1.1 Δ²x_{t-1} + ... + Ψ_1.k-2 Δ²x_{t-k+2} + Γ Δx_{t-1} + Π x_{t-2} + Φ D_t + ε_t,   (4.11)

where Γ = -(I - Γ_1 - ... - Γ_{k-1}), Π = -(I - Π_1 - ... - Π_k) as before, and the Ψ_1.i are functions of Π_{i+2}, ..., Π_k. Chapter 15 will show that this formulation is particularly useful when x_t contains I(2) variables, but it is in general a convenient representation when the sample contains periods of rapid change, so that acceleration rates (in addition to growth rates) become relevant determinants of agents' behavior. The VAR(2) model for the Danish data now becomes:

Δ²x_t = Γ Δx_{t-1} + Π x_{t-2} + Φ D_t + ε_t,   (4.12)

where Γ = -(I - Γ_1) and Π = -(I - Π_1 - Π_2). At first sight the estimates of the Γ matrix reported below look very different from the previous case. A second look shows that the coefficients are identical except that a constant factor of -1 has been added to the diagonal elements. Thus, the significance of the diagonal elements is only a consequence of applying the difference operator once more to x_t. Therefore, it may be more meaningful to test whether the diagonal elements are significantly different from -1 (or from -2 for the inflation rate) than from zero. The estimated model, with Δ²x_t = [Δ²m^r_t, Δ²y^r_t, Δ²p_t, Δ²R_m,t, Δ²R_b,t]', is Δ²x_t = Γ̂ Δx_{t-1} + Π̂ x_{t-2} + Φ̂ D_t + ε̂_t, where:

Γ̂ =
[1.40  0.36  0.11  2.59  7.22]
[0.11  1.04  0.23  2.42  1.13]
[0.03  0.04  2.52  0.38  0.29]
[0.01  0.00  0.00  1.13  0.30]
[0.03  0.05  0.01  0.03  0.73]

Π̂ =
[0.20  0.22  0.18  6.01  5.00]
[0.10  0.29  0.31  2.35  1.11]
[0.01  0.12  1.74  1.46  0.14]
[0.00  0.01  0.00  0.26  0.05]
[0.01  0.01  0.02  0.00  0.13]

Log(L_max) = 1973.1, log|Ω̂| = -51.2, R²(LR) = 0.999, R²(LM) = 0.65, F-test on all regressors: F(50,272) = 11.5. F-tests on single regressors:


Δm^r_{t-1}: 34.5, Δy^r_{t-1}: 24.9, Δ²p_{t-1}: 33.9, ΔR_m,t-1: 18.3, ΔR_b,t-1: 20.3,
m^r_{t-2}: 3.7, y^r_{t-2}: 4.1, Δp_{t-2}: 16.0, R_m,t-2: 4.9, R_b,t-2: 4.5.

The F-tests on the significance of the first five regressors have now obtained very large values, which is just an artifact of the Δ² transformation, and they do not really say much about how important the lagged t-1 variables are for explaining the variation in Δ²x_t.

4.2.4 The relationship between the different VAR formulations

We will now evaluate the above model formulations using the characteristic function based on the slightly more general VAR(3) model:

x_t = Π_1 x_{t-1} + Π_2 x_{t-2} + Π_3 x_{t-3} + Φ D_t + ε_t   (4.13)

with the characteristic function:

Π(z) = I - Π_1 z - Π_2 z² - Π_3 z³.

The ECM form of (4.13), with m = 1, is:

Δx_t = Γ_1^(1) Δx_{t-1} + Γ_2^(1) Δx_{t-2} + Π x_{t-1} + Φ D_t + ε_t   (4.14)

and the characteristic function:

Π(z) = I(1 - z) - Γ_1^(1)(1 - z)z - Γ_2^(1)(1 - z)z² - Πz
     = I - (I + Γ_1^(1) + Π)z - (Γ_2^(1) - Γ_1^(1))z² + Γ_2^(1) z³.

The relation between the parameters of (4.13) and (4.14) can now be found as:

Γ_1^(1) = -(Π_2 + Π_3),  Γ_2^(1) = -Π_3,  Π = -(I - Π_1 - Π_2 - Π_3).

The ECM form of (4.13), with m = 3, is:

Δx_t = Γ_1^(3) Δx_{t-1} + Γ_2^(3) Δx_{t-2} + Π x_{t-3} + Φ D_t + ε_t   (4.15)

and the characteristic function:

Π(z) = I(1 - z) - Γ_1^(3)(1 - z)z - Γ_2^(3)(1 - z)z² - Πz³
     = I - (I + Γ_1^(3))z - (Γ_2^(3) - Γ_1^(3))z² + (Γ_2^(3) - Π)z³.

The relationship between (4.13) and (4.15) is:

Γ_1^(3) = -(I - Π_1),  Γ_2^(3) = -(I - Π_1 - Π_2),  Π = -(I - Π_1 - Π_2 - Π_3).

In both cases the Π matrix is unchanged, but the Γ_i^(m) depend on the chosen lag m of x_{t-m} in the model.
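Both parameter mappings can be checked numerically for arbitrary (hypothetical) VAR(3) coefficient matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 2
I = np.eye(p)
# Arbitrary small VAR(3) lag matrices (hypothetical values)
Pi1, Pi2, Pi3 = (0.1 * rng.standard_normal((p, p)) for _ in range(3))

# ECM with m = 1
G1_1 = -(Pi2 + Pi3)
G2_1 = -Pi3
Pi = -(I - Pi1 - Pi2 - Pi3)

# ECM with m = 3
G1_3 = -(I - Pi1)
G2_3 = -(I - Pi1 - Pi2)

# All three forms give the same one-step prediction for any x_{t-1}, x_{t-2}, x_{t-3}
x1, x2, x3 = rng.standard_normal((3, p))
var_form = Pi1 @ x1 + Pi2 @ x2 + Pi3 @ x3
ecm_m1 = x1 + G1_1 @ (x1 - x2) + G2_1 @ (x2 - x3) + Pi @ x1
ecm_m3 = x1 + G1_3 @ (x1 - x2) + G2_3 @ (x2 - x3) + Pi @ x3
print(np.allclose(var_form, ecm_m1), np.allclose(var_form, ecm_m3))  # True True
```

Since the identities hold for arbitrary coefficient values, they confirm that the choice of m only redistributes the dynamics between the Γ_i^(m) matrices while leaving Π, and the likelihood, unchanged.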

4.3 Misspecification tests

After the model has been estimated, the multivariate normality assumption underlying the VAR model can (and should) be checked against the data using the residuals ε̂_t. In the subsequent sections we will briefly discuss some of the test procedures and information criteria contained in CATS in RATS (Hansen and Juselius, 1995) and in GiveWin (Doornik and Hendry, 2002).

4.3.1 Specification checking

It is always useful to begin the specification checking with a graphical analysis. Quite often the graphs reveal specification problems that the tests fail to


discover. Figures 4.1-4.5 show the fitted and actual values of x_{i,t} (panel a), the empirical distribution compared to the normal (panel b), the residuals (panel c), and the autocorrelogram of order 20 (panel d). Figure 4.6 shows all the autocorrelations for the full system. The diagonal autocorrelograms are defined by Corr(ε̂_{i,t}, ε̂_{i,t-h}), i = 1, ..., 5, h = 1, ..., 18, i.e. they are the same as in Figures 4.1-4.5, panel d, whereas the off-diagonal diagrams define the cross-autocorrelograms Corr(ε̂_{i,t}, ε̂_{j,t-h}), i ≠ j.
Figure 4.1. Graphs of residuals from the money stock equation.
Figure 4.2. Graphs of residuals from the income equation.
Figure 4.3. Graphs of residuals from the inflation equation.
Figure 4.4. Graphs of residuals from the deposit rate equation.
Figure 4.5. Graphs of residuals from the bond rate equation.

76

CHAPTER 4. ESTIMATION AND SPECIFICATION


Cross- and autocorrelograms of the residuals
DMO DMO DFY DDIFPY DIDE DIBO

DFY

DDIFPY

DIDE

DIBO

Lags 1 to 19

Figure 4.6. Cross- and autocorrelograms of the full system.

The graphs help us to spot the big value-added residual in the income equation and the big deregulation residual in the bond rate equation. But even if the graphical analysis can be a powerful tool to detect problems in model specification, it cannot replace formal misspecification tests. The following multivariate and univariate residual tests will be discussed and illustrated based on the Danish data: an LR test and three information criteria for the choice of lag length, a trace correlation statistic, a multivariate LM test for first and fourth order residual autocorrelation, the multivariate Doornik-Hansen test for normality, a univariate ARCH test and a univariate Jarque-Bera normality test. Because the VAR estimates are more sensitive to deviations from normality due to skewness than to excess kurtosis, we also report these measures.

4.3.2 Residual correlations and information criteria

The VAR model is often called a reduced form model because it describes the variation in $x_t$ as a function of lagged values of the process, but not of current values. This means that all information about current effects in the data is stored in the residual covariance matrix $\Omega$. Because correlations (standardized covariances) are easier to interpret, most software programs (inclusive CATS and PcFiml) report correlations instead of covariances. The correlation coefficients are calculated as follows:

$$\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}}, \quad i,j = 1, \ldots, p. \qquad (4.16)$$

When the correlation coefficients and the residual variances (or residual standard deviations) are given, it is thus straightforward to derive the corresponding covariances. The estimated standardized residual covariance matrix for the Danish data is reported in Section 4.1.1. The residual standard errors, $\hat\sigma_i = \sqrt{\hat\sigma_{ii}}$, $i = 1, \ldots, p$, are reported below:

       m^r      y^r      Δ²p      R_m      R_b
       0.0241   0.0153   0.0132   0.0012   0.0016

Note that the residual standard errors, multiplied by 100, can be interpreted as a percentage error in this case because the variables are in logarithmic changes. When assessing the adequacy of the VAR specification we frequently make use of the maximal likelihood value given by:

$$-2/T \ln L_{max} = \ln|\hat\Omega| + \text{constant terms.}$$

For example, when determining the truncation lag k of the VAR model one can use the likelihood ratio test procedure

$$-2\ln Q(H_k/H_{k+1}) = T\left(\ln|\hat\Omega_k| - \ln|\hat\Omega_{k+1}|\right)$$

where $H_k$ is the null hypothesis that k lags are sufficient and $H_{k+1}$ is the alternative hypothesis that the VAR model needs k + 1 lags. Because the LR test is testing a p × p matrix to be zero, the test statistic $-2\ln Q$ is asymptotically distributed as $\chi^2$ with $p^2$ degrees of freedom. The LR test of k = 1 versus k = 2 for the Danish data becomes: $-2\ln Q(H_1/H_2) = 77(-50.2 + 51.2) = 77$, which is distributed as $\chi^2(25)$ under the null of no significant coefficients at lag 2 in the VAR model. The 95% quantile of the $\chi^2(25)$ distribution is approximately 37.7 and the null is therefore rejected.
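The LR lag-determination step above can be sketched numerically. The function below is a minimal illustration (not CATS or PcFiml output); the log-determinants and sample size fed in are the values quoted above for the Danish data.

```python
def lr_lag_test(T, logdet_k, logdet_k1, p):
    """LR test of H_k (k lags sufficient) against H_{k+1}:
    -2 ln Q = T (ln|Omega_k| - ln|Omega_{k+1}|), asymptotically chi^2(p^2)."""
    stat = T * (logdet_k - logdet_k1)
    df = p ** 2
    return stat, df

# Values quoted above for the Danish data (T = 77, p = 5):
stat, df = lr_lag_test(T=77, logdet_k=-50.2, logdet_k1=-51.2, p=5)
# stat is about 77 with df = 25; since the 95% quantile of chi^2(25) is
# about 37.7, the null that one lag is sufficient is rejected.
```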


There are various other test procedures for the determination of the lag length and we will briefly discuss three of them: the Akaike (AIC), the Schwarz (SC) and the Hannan-Quinn (HQ) information criteria. They are defined by:

$$AIC = \ln|\hat\Omega| + p^2 k\,\frac{2}{T}, \qquad (4.17)$$

$$SC = \ln|\hat\Omega| + p^2 k\,\frac{\ln T}{T}, \qquad (4.18)$$

$$HQ = \ln|\hat\Omega| + p^2 k\,\frac{2\ln\ln T}{T}. \qquad (4.19)$$

All of them are based on the maximal value of the likelihood function with an additional penalizing factor related to the number of estimated parameters. The suggested criteria differ regarding the strength of the penalty associated with the increase in model parameters as a result of adding more lags. The idea is to calculate the criterion for different values of k and then choose the value of k that corresponds to the smallest value. When using these criteria for the choice of lag length it is important to remember that they are valid under the assumption of a correctly specified model. If there are other problems with the model, such as regime shifts and non-constant parameters, then these should be accounted for prior to choosing the lag length. Section 4.1.1 showed that many of the coefficients at lag two were insignificant in the Danish data and we may consult the information criteria to find out whether a VAR(1) would be appropriate. We report the Schwarz and Hannan-Quinn criteria for lags 1, 2 and 3 below:

                  k=1       k=2       k=3
  Schwarz        -47.68    -47.30    -46.53
  Hannan-Quinn   -48.50    -48.58    -48.28

The SC criterion suggests k = 1 and the HQ criterion suggests k = 2. Because the suggested information criteria are based on different penalties for the number of estimated coefficients they need not produce the same answer, and often do not. Checking the other misspecification tests for k = 1 showed that all of them became much worse compared to k = 2. However, the graphical analysis of the previous chapter suggested the possibility of a regime shift in the model which has not yet been tested for. Before this is done the specification tests remain tentative. At this stage we continue with the VAR(2) model.
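The three criteria in (4.17)-(4.19) are easy to compute once $\ln|\hat\Omega(k)|$ is available for each candidate k; the log-determinant values used below are hypothetical placeholders.

```python
import math

def info_criteria(logdet_omega, T, p, k):
    """AIC, SC and HQ: ln|Omega_hat| plus a penalty on the p^2 * k lag coefficients."""
    n = p ** 2 * k
    aic = logdet_omega + n * 2.0 / T
    sc = logdet_omega + n * math.log(T) / T
    hq = logdet_omega + n * 2.0 * math.log(math.log(T)) / T
    return aic, sc, hq

# Hypothetical ln|Omega_hat(k)| values; the preferred k minimizes each criterion.
for k, logdet in [(1, -50.2), (2, -51.2), (3, -51.5)]:
    print(k, info_criteria(logdet, T=77, p=5, k=k))
```

Note how SC penalizes extra lags more heavily than HQ for moderate T, which is why the two criteria can disagree as they do for the Danish data.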


In the VAR model we can calculate an overall measure of goodness of fit, which is similar to the conventional R² in the linear regression model. In CATS this is called the trace correlation:

$$\text{Trace correlation} = 1 - \mathrm{trace}\{\hat\Omega\,V(\Delta x_t)^{-1}\}/p,$$

where $V(\Delta x_t)$ is the covariance matrix of $\Delta x_t$. In CATS it is calculated using the Rats instruction VCV. It can be roughly interpreted as an average R² of all the VAR equations. For the Danish data it is 0.54. PcFiml calculates two alternative measures called R²(LR) and R²(LM) in the VAR model, which are defined in the PcFiml manual, Section 10.8. Finally, R² for each equation is calculated as $R_i^2 = 1 - \hat\sigma_{ii}/V(\Delta x_t)_{ii}$, $i = 1, \ldots, p$. For the Danish data the estimated $R_i^2$ for the models in ECM form is:

       m^r    y^r    Δ²p    R_m    R_b
       0.75   0.36   0.75   0.49   0.44

Note that the R² values are completely misleading when calculated for the unrestricted VAR(2) in Section 4.1.1, because the dependent variable in that case is a nonstationary, trending variable. The R² compares the model's ability to explain the variation in the dependent variable against the baseline of a constant mean, i.e. $R^2 = 1 - \sum_t \hat\varepsilon_{i,t}^2/\sum_t (x_{i,t} - \bar x_i)^2$. When $x_{i,t}$ is a nonstationary variable, the baseline hypothesis of a constant mean is no longer appropriate. Essentially any variable will do better in this respect, and the random walk hypothesis should replace the constant mean as the baseline hypothesis.
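A minimal sketch of the trace correlation, assuming the definition $1 - \mathrm{trace}\{\hat\Omega\,V(\Delta x_t)^{-1}\}/p$; the sanity checks at the end use artificial data rather than the Danish series.

```python
import numpy as np

def trace_correlation(resid, dx):
    """1 - trace(Omega_hat @ inv(Cov(dx))) / p: a rough multivariate
    analogue of an average R^2 across the VAR equations."""
    p = dx.shape[1]
    omega = np.cov(resid, rowvar=False, bias=True)
    v = np.cov(dx, rowvar=False, bias=True)
    return 1.0 - np.trace(omega @ np.linalg.inv(v)) / p

rng = np.random.default_rng(0)
dx = rng.standard_normal((100, 5))
# A model explaining nothing (residuals = data) gives 0; a perfect fit gives 1.
print(trace_correlation(dx, dx), trace_correlation(np.zeros_like(dx), dx))
```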

4.3.3 Tests of residual autocorrelation


The Ljung-Box test of residual autocorrelation is given by:

$$\text{Ljung-Box} = T(T+2)\sum_{h=1}^{[T/4]}(T-h)^{-1}\,\mathrm{trace}\!\left(\hat\Sigma_h'\hat\Sigma_0^{-1}\hat\Sigma_h\hat\Sigma_0^{-1}\right) \qquad (4.20)$$

where $\hat\Sigma_h = T^{-1}\sum_{t=h+1}^{T}\hat\varepsilon_t\hat\varepsilon_{t-h}'$ and the residuals are from the estimated VAR model. The Ljung-Box test is considered to be approximately distributed as $\chi^2$ with $p^2([T/4] - k + 1) - p^2$ degrees of freedom. (See Ljung & Box (1978)


and Hosking (1980).) For the Danish data the test becomes $\chi^2(425) = 438.8$ (p-value = 0.31). The LM tests for first and fourth order autocorrelation are calculated using an auxiliary regression as proposed in Godfrey (1988, Chapter 5). The covariance matrix of the residuals in the auxiliary model is calculated as:

$$T\,\hat\Omega(j) = \hat\varepsilon'\hat\varepsilon - \hat\varepsilon'[\hat\varepsilon_{\mathrm{Lag}\,j},\,x]\left([\hat\varepsilon_{\mathrm{Lag}\,j},\,x]'[\hat\varepsilon_{\mathrm{Lag}\,j},\,x]\right)^{-1}[\hat\varepsilon_{\mathrm{Lag}\,j},\,x]'\hat\varepsilon, \quad j = 1, 4,$$

where $x' = [x_1, x_2, \ldots, x_T]$ and $\hat\varepsilon_{\mathrm{Lag}\,j}' = [\hat\varepsilon_{1-j}, \hat\varepsilon_{2-j}, \ldots, \hat\varepsilon_{T-j}]$. The first j missing values are set equal to 0. The LM test is calculated as Wilks' ratio test with a small-sample correction (see Anderson (1984, Section 8.5.2) or Rao (1973, Section 8c.5)):

$$LM(j) = -\left(T - p(k+1) - \tfrac{1}{2}\right)\ln\!\left(\frac{|\hat\Omega(j)|}{|\hat\Omega|}\right). \qquad (4.21)$$

The test is asymptotically $\chi^2$-distributed with $p^2$ degrees of freedom. This is a fairly important test, partly because the whole VAR philosophy is based on the idea of decomposing the variation in the data into a systematic part describing all the dynamics and an unsystematic random part. If the test suggests that there are significant autocorrelations left in the model, agents' plans based on the conditional VAR expectations would have deviated systematically from actual realizations. The properties of the estimators are also sensitive to significant autocorrelations, the bigger the worse. The test statistics for the Danish data became LM(1): $\chi^2(25) = 17.4$ (p-value = 0.86) and LM(4): $\chi^2(25) = 40.2$ (p-value = 0.03).
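The portmanteau statistic (4.20) and its degrees of freedom can be sketched as follows; this is an illustrative implementation, not the CATS one, and it is applied here to artificial white noise of the Danish sample dimensions.

```python
import numpy as np

def ljung_box(resid, k):
    """Multivariate Ljung-Box statistic (4.20):
    T(T+2) * sum_{h=1}^{[T/4]} (T-h)^{-1} trace(S_h' S_0^{-1} S_h S_0^{-1}),
    with S_h = T^{-1} sum_t e_t e_{t-h}'.  Approximately chi^2 with
    p^2([T/4] - k + 1) - p^2 degrees of freedom."""
    T, p = resid.shape
    H = T // 4
    e = resid - resid.mean(axis=0)
    S0 = e.T @ e / T
    S0inv = np.linalg.inv(S0)
    stat = 0.0
    for h in range(1, H + 1):
        Sh = e[h:].T @ e[:-h] / T
        stat += np.trace(Sh.T @ S0inv @ Sh @ S0inv) / (T - h)
    stat *= T * (T + 2)
    df = p ** 2 * (H - k + 1) - p ** 2
    return stat, df

rng = np.random.default_rng(1)
stat, df = ljung_box(rng.standard_normal((77, 5)), k=2)
print(stat, df)  # df = 425 for T = 77, p = 5, k = 2, as in the text
```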

4.3.4 Tests of residual heteroscedasticity

In CATS the number of lags in the test for ARCH is equal to the number of lags in the model, and the test statistic is calculated as $(T-k)\cdot R^2$, where $R^2$ is from the auxiliary regression

$$\hat\varepsilon_{it}^2 = \gamma_0 + \sum_{j=1}^{k}\gamma_j\hat\varepsilon_{i,t-j}^2 + \text{error}.$$


The residuals from each equation are tested individually for ARCH effects. For the Danish data the tests became:

       m^r   y^r   Δ²p   R_m   R_b
       2.7   4.0   3.5   6.7   0.1

Only the residuals from the deposit rate equation were borderline significant. However, simulation studies have demonstrated that the (cointegrated) VAR estimates are robust against moderate ARCH effects in the residuals.
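The univariate ARCH test is a short auxiliary regression; a minimal sketch under the description above (hypothetical residuals, k = 2 lags as in the VAR(2)):

```python
import numpy as np

def arch_test(resid_i, k):
    """Univariate ARCH test: (T - k) * R^2 from regressing e_t^2 on a
    constant and e_{t-1}^2, ..., e_{t-k}^2; approximately chi^2(k)."""
    e2 = resid_i ** 2
    y = e2[k:]
    X = np.column_stack([np.ones(len(y))]
                        + [e2[k - j:len(e2) - j] for j in range(1, k + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    r2 = 1.0 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return (len(resid_i) - k) * r2

rng = np.random.default_rng(2)
stat = arch_test(rng.standard_normal(77), k=2)
print(stat)  # small for homoscedastic residuals
```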

4.3.5 Normality tests

The test for normality used in CATS is based on Doornik & Hansen (1994). Because the test is not very well known we explain the calculations in more detail. The multivariate test for normality is the sum of p univariate tests based on system residuals, where the system residuals are defined as

$$u_t = V\Lambda^{-1/2}V'\,\mathrm{diag}(\hat\sigma_1^2, \ldots, \hat\sigma_p^2)^{-1/2}(\hat\varepsilon_t - \bar{\hat\varepsilon}), \qquad (4.22)$$

where $\Lambda$ is a diagonal matrix of eigenvalues of the correlation matrix of the (original) residuals and V are the eigenvectors, so that (4.22) is a principal components decomposition of the standardized residuals. The system residuals, which are invariant to affine transformations of each variable and reordering of the system, are uncorrelated by construction and thereby independent under the assumption of normality. Testing for normality is done by testing each of the system residual series, and the univariate tests are based on the skewness and kurtosis estimates of the residuals with small sample corrections as proposed in Shenton & Bowman (1977), and a modification of that test as proposed in Doornik & Hansen (1994). The skewness and kurtosis of each series are calculated as

$$\text{skewness}_i = \sqrt{b_{1i}} = T^{-1}\sum_{t=1}^{T}u_{it}^3, \qquad (4.23)$$

$$\text{kurtosis}_i = b_{2i} = T^{-1}\sum_{t=1}^{T}u_{it}^4, \qquad (4.24)$$

for $i = 1, \ldots, p$. With the small sample approximations we obtain 2p independent standard normal variables, $z_{1i}$ and $z_{2i}$ say, based on the transformed skewness and kurtosis, which are squared and summed to a multivariate omnibus test for normality, so that

$$\text{Normality test} = \sum_{i=1}^{p}\left(z_{1i}^2 + z_{2i}^2\right) \qquad (4.25)$$


is approximately $\chi^2$-distributed with 2p degrees of freedom. For the Danish data the test became $\chi^2(10) = 18.7$ (p-value = 0.04). Thus, normality is borderline rejected. To be able to find out where in the system the non-normality is most pronounced we also need to calculate the univariate Jarque-Bera test for normality, distributed as $\chi^2(2)$:

$$\text{Jarq.Bera} = T(\text{skewness})^2/6 + T(\text{kurtosis} - 3)^2/24 \sim \chi^2(2)$$

where skewness and kurtosis are defined below:

$$\text{mean}(\hat\varepsilon_{i,t}) = T^{-1}\sum_{t=1}^{T}\hat\varepsilon_{i,t} = 0, \qquad (4.26)$$

$$\text{std.dev}(\hat\varepsilon_{i,t}) = \hat\sigma_i, \qquad (4.27)$$

$$\text{skewness}(\hat\varepsilon_{i,t}) = T^{-1}\sum_{t=1}^{T}\left(\frac{\hat\varepsilon_{i,t}}{\hat\sigma_i}\right)^3, \qquad (4.28)$$

$$\text{kurtosis}(\hat\varepsilon_{i,t}) = T^{-1}\sum_{t=1}^{T}\left(\frac{\hat\varepsilon_{i,t}}{\hat\sigma_i}\right)^4. \qquad (4.29)$$

The Jarque-Bera test of normality is based on the null hypothesis of normally distributed errors, under which the skewness estimate is approximately N(0, 6/T) and the kurtosis estimate approximately N(3, 24/T). Thus, the variance of the skewness is smaller than the variance of the kurtosis, which means that the normality test is more easily rejected when the empirical distribution is skewed (often because of outliers in the VAR model) than when it is leptokurtic (thick tails or too many small residuals close to the mean). Note also that the Jarque-Bera test is $\chi^2$-distributed only asymptotically. This means that the small sample behavior may not follow a $\chi^2$ distribution very closely. For the Danish data the test results are reported in Table 4.1. We find that normality is rejected primarily because of non-normality in the two interest rate equations. This is due to the big deregulation outlier in 1983 for the bond rate and excess kurtosis for both interest rates.

Table 4.1: Specification tests for the unrestricted VAR(2) model.

  Univariate normality tests for:
                   m^r     y^r     Δ²p     R_m     R_b
    Jarq.Bera(2)   1.4     1.6     2.2     6.2     5.9
    Skewness       0.32   -0.20    0.27   -0.05   -0.40
    Kurtosis       3.00    3.28    3.39    4.02    4.07
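The Jarque-Bera statistic follows directly from the moment definitions (4.26)-(4.29); a small sketch, applied here to artificial Gaussian residuals rather than the Danish data:

```python
import numpy as np

def jarque_bera(resid_i):
    """Jarque-Bera normality test: T*skew^2/6 + T*(kurt - 3)^2/24 ~ chi^2(2)."""
    e = resid_i - resid_i.mean()
    T = len(e)
    s = e.std()  # ddof=0, matching the T^{-1} moment definitions
    skew = np.mean((e / s) ** 3)
    kurt = np.mean((e / s) ** 4)
    return T * skew ** 2 / 6 + T * (kurt - 3) ** 2 / 24, skew, kurt

rng = np.random.default_rng(3)
jb, skew, kurt = jarque_bera(rng.standard_normal(1000))
print(jb, skew, kurt)  # small JB, skewness near 0, kurtosis near 3
```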

Altogether the formal misspecification tests have confirmed the findings based on the graphical inspection: multivariate normality is borderline rejected due to non-normality in the two interest rates; there is some seasonal autocorrelation left in the residuals; the deposit rate exhibits moderate ARCH effects.

4.4 Concluding remarks

All parameters of the VAR models (4.7)-(4.11) of Chapter 3 were unrestricted and we demonstrated that OLS equation by equation produced ML estimates. In this chapter the estimates of these models showed that the unrestricted VAR models are heavily over-parametrized. This is consistent with the discussion in Chapter 3 which showed that the VAR model is essentially just a convenient reformulation of the covariances of the data, some of which are very small and statistically insignificant. The generality of the VAR formulation has a cost: adding one variable to a p-dimensional VAR(k) system introduces (2p+1)k new parameters. When the sample is small, typically 50-100 observations in quarterly macroeconomic models, adding more variables can easily become prohibitive. However, in some cases there can be a trade-off between the number of variables in the system, p, and the number of lags, k, needed to obtain uncorrelated residuals. Since an extra lag corresponds to p × p additional parameters, one can in some cases reduce the total number of VAR parameters by adding a relevant variable to the model. By imposing statistically acceptable restrictions on the VAR model we hope to uncover meaningful economic models with interpretable coefficients. Such restrictions are, for example, reduced rank restrictions, zero parameter restrictions and other linear or nonlinear parameter restrictions. This will be the focus of the remaining chapters of this book. Even if the VAR(2) model seemed to provide a fair description of the


information in the data, the tests and the graphs suggested some scope for improvements. In Chapter 6 we will discuss the important role of deterministic components as a means to improve model specification.

Chapter 5 The Cointegrated VAR Model


The purpose of this chapter is to introduce the nonstationary VAR model and show that the nonstationarity can be accounted for by a reduced rank condition on the long-run matrix $\Pi = \alpha\beta'$. Section 5.1 defines the concepts of integration and cointegration and Section 5.2 gives an intuitive interpretation of the reduced rank, r, as the number of stationary long-run relations based on the unrestricted VAR estimates of the Danish data. Section 5.3 discusses the interpretation of the number of unit roots, p − r, as the number of common driving trends and shows how they can be related to the VAR model by inverting the AR lag polynomial. Based on the simple VAR(1) model Section 5.4 demonstrates how the parameters of the MA representation are related to the AR parameters. Finally, Section 5.5 concludes the chapter with a discussion of the cointegrated VAR model as a general framework within which one can describe economic behavior in terms of pulling forces towards equilibrium, generating stationary behavior, and pushing forces away from equilibrium, generating nonstationary behavior.

5.1 Defining integration and cointegration

We will now show that the presence of unit roots in the unrestricted VAR model corresponds to nonstationary stochastic behavior which can be accounted for by a reduced rank (r < p) restriction of the long-run levels matrix $\Pi = \alpha\beta'$. Johansen (1996), Chapter 3, provides a mathematically precise definition of the order of integration and cointegration. Here we only reproduce the basic definitions:


Definition 1: $x_t$ is integrated of order d if $x_t$ has the representation $(1-L)^d x_t = C(L)\varepsilon_t$, where $C(1) \neq 0$ and $\varepsilon_t \sim Niid(0, \sigma^2)$.

Definition 2: The I(d) process $x_t$ is called cointegrated CI(d, b) with cointegrating vector $\beta \neq 0$ if $\beta' x_t$ is I(d − b), b = 1, ..., d, d = 1, ....

Cointegration implies that certain linear combinations of the variables of the vector process are integrated of lower order than the process itself. As already discussed informally in Chapter 2, cointegrated variables are driven by the same persistent shocks. Thus, if the non-stationarity of one variable corresponds to the non-stationarity of another variable, then there exists a linear combination between them that becomes stationary. Another way of expressing this is that when two or several variables have common stochastic (and deterministic) trends, they will show a tendency to move together in the long run. Such cointegrated relations, $\beta' x_t$, can often be interpreted as long-run economic steady-state relations and are, therefore, of considerable economic interest. In the next section we will give an intuitive account of such relations and how one can find them in the long-run matrix $\Pi$.
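The definitions can be illustrated by simulating two I(1) series that share one common stochastic trend; the loading 0.5 and the cointegrating vector $\beta' = [1, -2]$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 10_000
trend = np.cumsum(rng.standard_normal(T))   # common stochastic trend
x1 = trend + rng.standard_normal(T)         # I(1)
x2 = 0.5 * trend + rng.standard_normal(T)   # I(1), driven by the same trend
# beta' = [1, -2] eliminates the trend, so beta'x_t is stationary, i.e. I(0):
spread = x1 - 2 * x2
print(np.var(x1), np.var(spread))  # x1 wanders; the spread stays bounded
```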

5.2 An intuitive interpretation of Π = αβ′

Within the VAR model the cointegration hypothesis can be formulated as a reduced rank restriction on the $\Pi$ matrix defined in the previous chapter, Section 4.5. Below we reproduce the VAR(2) model in ECM form with m = 1:

$$\Delta x_t = \Gamma_1\Delta x_{t-1} + \Pi x_{t-1} + \mu + \varepsilon_t \qquad (5.1)$$

and give the estimate of the unrestricted $\Pi$ matrix for the Danish data (keeping in mind that it was invariant to the choice of model formulation):

$$\Delta x_t = \begin{bmatrix} -0.20 & 0.22 & -0.18 & 6.01 & -5.00 \\ 0.10 & -0.29 & 0.31 & -2.35 & 1.11 \\ 0.01 & 0.12 & -1.74 & 1.46 & 0.14 \\ 0.00 & 0.01 & 0.00 & -0.26 & 0.05 \\ 0.01 & 0.01 & 0.02 & 0.00 & -0.13 \end{bmatrix}\begin{bmatrix} m^r_{t-1} \\ y^r_{t-1} \\ \Delta p_{t-1} \\ R_{m,t-1} \\ R_{b,t-1} \end{bmatrix} + \cdots$$

Coefficients with significant |t|-ratios are given in bold face.


If $x_t \sim I(1)$, then $\Delta x_t \sim I(0)$, implying that $\Pi$ cannot have full rank as this would lead to a logical inconsistency in (5.1). This can be seen by considering $\Pi = I$ as a simple full rank matrix. In this case each equation would define a stationary variable $\Delta x_t$ to be equal to a nonstationary variable, $x_{t-1}$, plus some lagged stationary variables $\Gamma_1\Delta x_{t-1}$ and a stationary error term. Hence, either $\Pi = 0$, or $\Pi$ must have reduced rank: $\Pi = \alpha\beta'$, where $\alpha$ is a p × r matrix and $\beta$ is a p × r matrix, r < p. Thus, under the I(1) hypothesis the cointegrated VAR model is given by:

$$\Delta x_t = \Gamma_1\Delta x_{t-1} + \cdots + \Gamma_{k-1}\Delta x_{t-k+1} + \alpha\beta' x_{t-1} + \mu + \varepsilon_t \qquad (5.2)$$

where $\beta' x_{t-1}$ is an r × 1 vector of stationary cointegration relations. Under the hypothesis that $x_t \sim I(1)$, all stochastic components are stationary in model (5.2) and the system is now logically consistent. In Chapter 4, Section 3, we showed that the first row of $\Pi$ was a linear combination of the five variables which could be given a tentative interpretation as a (stationary) deviation from a long-run money demand relation, i.e. as an equilibrium error. We will now examine the implications of this for the full VAR model, assuming that r = 1, i.e. that among the five variables there exists only one stationary relation, the money demand relation. Using roughly the estimated coefficients of the Danish data, $\hat\alpha_{11} = -0.2$ and $\hat\beta_1' = [1, -1, 0.9, -25.0, 25.0]$, we can approximately reproduce the first row of $\Pi$ as $\hat\alpha_{11}\hat\beta_1'$. If, for simplicity, we assume that $\Gamma_1 = 0$, the cointegrated VAR model can now be written as:

$$\begin{bmatrix} \Delta m^r_t \\ \Delta y^r_t \\ \Delta^2 p_t \\ \Delta R_{m,t} \\ \Delta R_{b,t} \end{bmatrix} = \begin{bmatrix} -0.2 \\ a_{21} \\ a_{31} \\ a_{41} \\ a_{51} \end{bmatrix}\{m^r - y^r + 0.9\Delta p - 25(R_m - R_b)\}_{t-1} + \mu + \varepsilon_t \qquad (5.3)$$

The question is now how to choose the estimates of $a_{21}, \ldots, a_{51}$ so that the relevant information contained in $\Pi$ is preserved. An inspection of the unrestricted $\Pi$ matrix shows that the coefficients of the second row corresponding


to real income might (with some good will) be considered proportional to $\hat\beta_1' x_t$ with $a_{21} \approx 0.10$. (Whether this is so is a testable hypothesis which will be discussed formally in Chapter 9.) None of the remaining rows seems to have coefficients even vaguely proportional to $\hat\beta_1' x_{t-1}$, and $a_{31} = a_{41} = a_{51} = 0$ seems the only possible choice. Thus for r = 1 the best representation of $\Pi = \alpha\beta'$ seems to be:

$$\begin{bmatrix} \Delta m^r_t \\ \Delta y^r_t \\ \Delta^2 p_t \\ \Delta R_{m,t} \\ \Delta R_{b,t} \end{bmatrix} = \begin{bmatrix} -0.2 \\ 0.1 \\ 0.0 \\ 0.0 \\ 0.0 \end{bmatrix}\{m^r - y^r + 0.9\Delta p - 25(R_m - R_b)\}_{t-1} + \mu + \varepsilon_t \qquad (5.4)$$

Model (5.4) represents an economy where the deviation between agents' actual money holdings, $m^r_{t-1}$, and their desired demand for money, $m^{r*}_{t-1} = \{y^r - 0.9\Delta p + 25(R_m - R_b)\}_{t-1}$, determines the real money stock, $m^r_t$, and the aggregate level of expenditure, $y^r_t$. When actual money holdings are above or below the long-run desired level, agents make gradual adjustments of their money holdings (and of their expenditure level) until the level of money stock is back at the steady-state level. Because $a_{21} > 0$, real expenditure goes up when there is excess money in the economy. Excess money has no short-run or long-run impact on inflation, nor on the two interest rates. In this simple empirical model it can be shown (see Johansen and Juselius, 2003) that the central bank would only be able to influence the level of aggregate expenditure by changing money supply, but this would have no effect on the inflation rate. Thus, the specific values of the coefficients $\alpha$ and $\beta$ can have important implications for whether a chosen policy is effective or not. Under the assumption that $a_{21} = 0$, model (5.4) would be equivalent to the single equation error-correction model. Such models have been widely used to estimate money demand relations based on the assumption that a stable relation is a prerequisite for monetary policy control. Needless to say, the implications of (5.4) as discussed above suggest that the assumption r = 1 is too simplistic² to be relevant as an analytical tool for monetary policy decisions. I will argue that one reason why this is necessarily so is

² At this point we will disregard the possibility of extending the information set and how this may change the interpretation of the model.
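The reduced rank restriction can be checked numerically: forming $\Pi = \alpha\beta'$ from the $\alpha$ and $\beta$ used in (5.4) gives a 5 × 5 matrix of rank one.

```python
import numpy as np

alpha = np.array([[-0.2], [0.1], [0.0], [0.0], [0.0]])
beta = np.array([[1.0], [-1.0], [0.9], [-25.0], [25.0]])
Pi = alpha @ beta.T  # 5 x 5 long-run matrix implied by r = 1
print(np.linalg.matrix_rank(Pi))  # rank 1: one relation, p - r = 4 unit roots
```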


that r = 1 presumes p − r = 4 common stochastic trends. Given the present information set this seems too many to be consistent with most theoretical assumptions underlying inflation and monetary policy control. However, it is only by formulating a model for the full system that we can examine the implications of this seemingly innocent assumption. The choice of r = 1 forced us to impose many restrictions on $\Pi$, such as the proportionality of the first two rows and the zeros of the remaining rows. Thus, it does not allow for enough flexibility in the description of the feed-back dynamics of the long-run structure. For example, assume that the coefficient of the inflation rate, 0.9, in $\hat\beta_1' x_t$ is in fact zero, supported by its very small 't'-ratio in the first row of $\Pi$. With r = 1, $\Delta p$ would have to be excluded altogether from the cointegration space, which would leave out the variable of primary interest for monetary policy control. From an econometric point of view such a choice would be contradicted by the significant 't'-ratio of inflation in the third row of $\Pi$. We will now assume that r = 2 and see how this choice can give us more flexibility. The question is how to choose the second relation $\beta_2' x_{t-1}$ so that the two cointegration relations together approximately describe the structure of the $\Pi$ matrix. An inspection of the first two rows of the unrestricted $\Pi$ shows that the proportionality assumption is probably not valid. Therefore, we can instead let the second cointegration relation describe an IS type of relation between real income, real money stock and the deposit rate, with coefficients approximately consistent with the second row of the $\Pi$ matrix. The r = 2 system can now be represented as:

$$\begin{bmatrix} \Delta m^r_t \\ \Delta y^r_t \\ \Delta^2 p_t \\ \Delta R_{m,t} \\ \Delta R_{b,t} \end{bmatrix} = \begin{bmatrix} -0.2 & 0 \\ 0 & -0.3 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} \{(m^r - y^r) - 25(R_m - R_b)\}_{t-1} \\ \{y^r - 0.3 m^r + 7 R_d\}_{t-1} \end{bmatrix} + \mu + \varepsilon_t \qquad (5.5)$$

Although (5.5) can approximately reproduce the relevant information of the first two rows of $\Pi$, it excludes the inflation rate from the long-run relations, which is inconsistent with its highly significant coefficient in the third row of $\Pi$. One possibility is that the inflation rate is in fact stationary by itself and, therefore, is a cointegration vector by itself. In this case the rank of $\Pi$ would


have to be increased to r = 3 and the model would become:

$$\begin{bmatrix} \Delta m^r_t \\ \Delta y^r_t \\ \Delta^2 p_t \\ \Delta R_{m,t} \\ \Delta R_{b,t} \end{bmatrix} = \begin{bmatrix} -0.2 & 0 & 0 \\ 0 & -0.3 & 0 \\ 0 & 0 & -1.74 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} \{(m^r - y^r) - 25(R_m - R_b)\}_{t-1} \\ \{y^r - 0.3 m^r + 7 R_d\}_{t-1} \\ \Delta p_{t-1} \end{bmatrix} + \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \\ \mu_5 \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \\ \varepsilon_{4,t} \\ \varepsilon_{5,t} \end{bmatrix} \qquad (5.6)$$

Model (5.6) is now able to roughly reproduce the data information in $\Pi x_{t-1}$, with the exception of the significant deposit rate in the fourth row of $\Pi$. The above discussion served the purpose of illustrating that the choice of $\alpha$ and $\beta$ has to reproduce the statistical information in the $\Pi$ matrix. No testing was performed and the proposed decompositions were, therefore, completely hypothetical. Chapters 8 and 9 will discuss likelihood based test procedures for a wide variety of hypotheses including the above mentioned. However, the choice of $\alpha$ and $\beta$ should also ideally describe an interpretable economic structure and provide some empirical insight on the appropriateness of the underlying economic model. The tentatively proposed cointegration relations given above are quite different from the monetary theory consistent cointegration relations discussed in Chapter 2 and reproduced below. Since Chapter 2 did not discuss a hypothetical adjustment structure for the cointegration relations, the coefficients reported below have been chosen to be roughly consistent with the predictions from monetarist theory models.

$$\begin{bmatrix} \Delta m^r_t \\ \Delta y^r_t \\ \Delta^2 p_t \\ \Delta R_{m,t} \\ \Delta R_{b,t} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & 0 \\ 0 & 0 & a_{23} \\ a_{31} & 0 & 0 \\ a_{41} & 0 & 0 \\ a_{51} & a_{52} & a_{53} \end{bmatrix}\begin{bmatrix} (m - p - y^r)_{t-1} \\ (R_m - R_b)_{t-1} \\ (R_b - \Delta p)_{t-1} \end{bmatrix} + \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \\ \mu_5 \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \\ \varepsilon_{4,t} \\ \varepsilon_{5,t} \end{bmatrix} \qquad (5.7)$$


It is noteworthy that $a_{11} = -0.20$ and $a_{12} \approx 5$ would approximately reproduce the money demand relation of (5.6). Thus, a row in the $\Pi$ matrix can also be found as a linear combination of several cointegration relations. In this sense the illustration of the three cointegration relations in (5.6) is likely to be too simplified. Most empirical models would require a much more complicated cointegration structure, and without the help of sophisticated test procedures it would be utterly hard to uncover these structures. This will be the topic of Chapters 9-13.

5.3 Common trends and the moving average representation

Chapter 3 showed that the stationary VAR model could be directly inverted into the moving average form. When the VAR model contains unit roots the autoregressive lag polynomial becomes non-invertible. The purpose of this section is to demonstrate how we can find the moving average representation in this case. For simplicity of notation we focus on the simple VAR(2) model:

$$\Pi(L)x_t = (I - \Pi_1 L - \Pi_2 L^2)x_t = \mu + \varepsilon_t$$

We assume now that the characteristic polynomial $|\Pi(z)| = |I - \Pi_1 z - \Pi_2 z^2|$ contains a unit root. In this case the determinant $|\Pi(1)| = 0$ and $\Pi(z)$ cannot be inverted for z = 1. Therefore, we need to factorize out the unit root component of the lag polynomial of the VAR model. First we move the matrix lag polynomial to the right hand side of the equation for $x_t$:

$$\Pi(L)x_t = \mu + \varepsilon_t \quad \Rightarrow \quad x_t = \Pi^{-1}(L)(\mu + \varepsilon_t) \qquad (5.8)$$

Since $\Pi^{-1}(L) = \Pi^{adj}(L)/\det(\Pi(L))$, where $\Pi^{adj}(L)$ is the adjoint matrix, and $\det(\Pi(L))$ is a polynomial in L, we multiply both sides of (5.8) with the difference operator (1 − L) so that the non-invertible unit root is cancelled out:

$$(1-L)x_t = \Pi^{-1}(L)(1-L)(\varepsilon_t + \mu) = C(L)(\varepsilon_t + \mu) = (C_0 + C_1 L + C_2 L^2 + \cdots)(\varepsilon_t + \mu)$$


where C(L) is now a stationary lag polynomial and hence invertible, $C(z) = C_0 + C_1 z + C_2 z^2 + \cdots$. The characteristic function of C(L) is now expanded using a Taylor expansion, evaluated for z = 1, and reformulated as:

$$C(L) = C(1) + C^*(L)(1 - L). \qquad (5.9)$$

By inserting (5.9) we get:

$$(1 - L)x_t = \{C + C^*(L)(1 - L)\}(\varepsilon_t + \mu)$$

where C = C(1). This equation can be written as:

$$x_s = x_{s-1} + C(\varepsilon_s + \mu) + C^*(L)(\varepsilon_s - \varepsilon_{s-1}) = x_{s-1} + C(\varepsilon_s + \mu) + Y_s - Y_{s-1},$$

where $Y_s$ is a short-hand notation for the stationary part of the process, $C^*(L)\varepsilon_s$. Summing for s = 1, ..., t we get:

$$x_t = x_0 + C\sum_{s=1}^{t}(\varepsilon_s + \mu) + Y_t - Y_0 = C\sum_{s=1}^{t}(\varepsilon_s + \mu) + Y_t + \tilde x_0.$$

Thus, $\tilde x_0$ contains both the initial value, $x_0$, of the process $x_t$ and the initial value of the short-run dynamics, $C^*(L)\varepsilon_0$. We can now formulate the VAR model in moving average form as:

$$x_t = C\sum_{i=1}^{t}\varepsilon_i + C\mu t + C^*(L)\varepsilon_t + \tilde x_0, \qquad (5.10)$$

showing that $x_t$ contains stochastic trends $C\sum_{i=1}^{t}\varepsilon_i$, a linear deterministic trend $C\mu t$, stationary stochastic components $C^*(L)\varepsilon_t$, and initial values. Thus the VAR model is capable of reproducing a similar trend-cycle-irregular decomposition of the vector process as was informally discussed in Chapter 2.


5.4 From the AR to the MA representation³

We showed above that the $C_i$ matrices are functions of the $\Pi_i$ matrices. We will now illustrate how one can find the $C_i$ matrices when $\alpha$ and $\beta$ are known for the simple VAR(1) model. Johansen (1996), Chapter 4, provides a detailed and lengthy derivation of the results for the general VAR(k) model. The interested reader is referred to the discussion there. Although the detailed derivation of the result for the VAR(k) model is more complicated, the general principle can be understood using the VAR(1) model:

$$\Delta x_t = \alpha\beta' x_{t-1} + \mu + \varepsilon_t, \quad t = 1, \ldots, T, \qquad (5.11)$$

with initial value $x_0$. We consider now the orthogonal complements $\alpha_\perp$ and $\beta_\perp$, each of full rank and dimension p × (p − r), so that $\alpha'\alpha_\perp = 0$, $\beta'\beta_\perp = 0$ and $\mathrm{rank}(\alpha, \alpha_\perp) = \mathrm{rank}(\beta, \beta_\perp) = p$. As will be discussed in Chapter 12 the matrix $\alpha_\perp$ is not uniquely defined without imposing identifying restrictions, but the results here apply for any choice of $\alpha_\perp$. We will now make use of the following relationship between $\alpha$, $\beta$, $\alpha_\perp$, $\beta_\perp$:

$$\alpha(\beta'\alpha)^{-1}\beta' + \beta_\perp(\alpha_\perp'\beta_\perp)^{-1}\alpha_\perp' = I. \qquad (5.12)$$

Using (5.12) we can decompose any vector v in $R^p$ into a vector $v_1 \in sp(\alpha)$ and a vector $v_2 \in sp(\beta_\perp)$. We now apply the result to the p-dimensional vector $x_t$:

$$x_t = \alpha(\beta'\alpha)^{-1}\beta' x_t + \beta_\perp(\alpha_\perp'\beta_\perp)^{-1}\alpha_\perp' x_t. \qquad (5.13)$$

Thus, $x_t$ can be expressed as a linear combination of the cointegration relations $\beta' x_t$ and of $\alpha_\perp' x_t$, and the next step is to express these as functions of initial values and the errors $(\varepsilon_t, \varepsilon_{t-1}, \ldots)$. We first pre-multiply equation (5.11) with $\beta'$ and then solve for $\beta' x_t$ to get the equation

$$\beta' x_t = (I + \beta'\alpha)\beta' x_{t-1} + \beta'\mu + \beta'\varepsilon_t.$$

³ This section relies heavily on SJ, Chapter 3.


The eigenvalues of the matrix $(I + \beta'\alpha)$ are inside the unit circle when the r-dimensional process $\beta' x_t$ is stationary, and it is then straightforward to represent $\beta' x_t$ as a function of $\varepsilon_i$, i = 1, ..., t, and the constant $\mu$:

$$\beta' x_t = \sum_{i=0}^{\infty}(I + \beta'\alpha)^i\beta'(\varepsilon_{t-i} + \mu). \qquad (5.14)$$

An expression for $\alpha_\perp' x_t$ is found by pre-multiplying (5.11) with $\alpha_\perp'$, which gives the equation $\alpha_\perp'\Delta x_t = \alpha_\perp'\varepsilon_t + \alpha_\perp'\mu$, with the solution:

$$\alpha_\perp' x_t = \alpha_\perp' x_0 + \sum_{i=1}^{t}\alpha_\perp'(\varepsilon_i + \mu). \qquad (5.15)$$

Inserting (5.14) and (5.15) into (5.13) we obtain the following result:

$$\begin{aligned} x_t &= \beta_\perp(\alpha_\perp'\beta_\perp)^{-1}\left\{\alpha_\perp' x_0 + \sum_{i=1}^{t}\alpha_\perp'(\varepsilon_i + \mu)\right\} + \alpha(\beta'\alpha)^{-1}\sum_{i=0}^{\infty}(I + \beta'\alpha)^i\beta'(\varepsilon_{t-i} + \mu) \\ &= C\sum_{i=1}^{t}\varepsilon_i + C\mu t + Cx_0 + \alpha(\beta'\alpha)^{-1}\sum_{i=0}^{\infty}(I + \beta'\alpha)^i\beta'(\varepsilon_{t-i} + \mu) \\ &= C\sum_{i=1}^{t}\varepsilon_i + \tau_1 t + \tau_0 + Y_t, \end{aligned} \qquad (5.16)$$

where $C = \beta_\perp(\alpha_\perp'\beta_\perp)^{-1}\alpha_\perp'$, $\tau_1 = C\mu$ measures the slope of a linear trend in $x_t$, $\tau_0 = Cx_0$ depends on initial values, and $Y_t$ is a stationary process. By expressing the VAR(k) model in the companion form it is possible to derive the results for the more general case using the same principle as above. This will not be done here but is left for the interested reader to do as an exercise. The main result relates the C matrix to all the parameters of the VAR(k) model:

$$C = \beta_\perp(\alpha_\perp'\Gamma\beta_\perp)^{-1}\alpha_\perp' \qquad (5.17)$$


where $\Gamma = I - \Gamma_1 - \cdots - \Gamma_{k-1}$ (see Johansen (1996), Chapter 4). Thus, (5.16) with $\Gamma = I$ is a special case of (5.17). By expressing (5.17) as:

$$C = \tilde\beta_\perp\alpha_\perp'$$

where $\tilde\beta_\perp = \beta_\perp(\alpha_\perp'\Gamma\beta_\perp)^{-1}$, it is easy to see that the decomposition of the C matrix is similar to the one of the $\Pi$ matrix, except that in the AR representation $\beta$ determines the common long-run relations and $\alpha$ the loadings, whereas in the moving average representation $\alpha_\perp'$ determines the common stochastic trends and $\tilde\beta_\perp$ their loadings. Thus, (5.17) together with (5.16) show that the non-stationarity in the process $x_t$ originates from the cumulative sum of the p − r combinations $\alpha_\perp'\sum_{i=1}^{t}\varepsilon_i$, leading to the following definition:

Definition 3: The common driving trends are the variables $\alpha_\perp'\sum_{i=1}^{t}\varepsilon_i$.
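The long-run impact matrix (5.17) can be illustrated numerically. This is a sketch under illustrative assumptions: the orthogonal-complement construction via the SVD and the small bivariate $\alpha$, $\beta$ are not from the Danish data, and the checks verify the defining properties $C\alpha = 0$ and $\beta'C = 0$.

```python
import numpy as np

def perp(m):
    """Orthogonal complement: an orthonormal basis for the null space of m'."""
    _, _, vt = np.linalg.svd(m.T)
    return vt[m.shape[1]:].T

def c_matrix(alpha, beta, gamma):
    """Long-run impact matrix C = beta_perp (alpha_perp' Gamma beta_perp)^{-1} alpha_perp' (5.17)."""
    a_perp, b_perp = perp(alpha), perp(beta)
    return b_perp @ np.linalg.inv(a_perp.T @ gamma @ b_perp) @ a_perp.T

# Hypothetical bivariate system with one cointegration relation and Gamma = I:
alpha = np.array([[-0.2], [0.1]])
beta = np.array([[1.0], [-1.0]])
C = c_matrix(alpha, beta, np.eye(2))
# C alpha = 0: shocks in sp(alpha) have no long-run impact;
# beta' C = 0: the cointegration relations contain no stochastic trend.
print(np.round(C @ alpha, 12), np.round(beta.T @ C, 12))
```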

All the remaining chapters will use results based on both the AR and the MA representation. It is, therefore, important to have a good intuition for the meaning of the orthogonal complements. We will illustrate this with a few simple examples. Representation (5.3): r = 1 corresponds to p − r = 4, so that $\alpha_\perp$ and $\beta_\perp$ are 5 × 4 matrices. For $\hat\alpha' = [-0.2, 0.1, 0.0, 0.0, 0.0]$ we can find $\hat\alpha_\perp$ as:

$$\hat\alpha_\perp = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad \hat\alpha_\perp'\sum_{i=1}^{t}\hat\varepsilon_i = \begin{bmatrix} \sum_{i=1}^{t}(\hat\varepsilon_{1,i} + 2\hat\varepsilon_{2,i}) \\ \sum_{i=1}^{t}\hat\varepsilon_{3,i} \\ \sum_{i=1}^{t}\hat\varepsilon_{4,i} \\ \sum_{i=1}^{t}\hat\varepsilon_{5,i} \end{bmatrix} \qquad (5.18)$$

It is easy to verify that $\hat\alpha_\perp'\hat\alpha = 0$. We also note that a zero row in $\alpha$ corresponds to a unit vector in $\alpha_\perp$. Such a variable is called a weakly exogenous variable, which will be further discussed in Chapter 9. Interpreting $\hat\alpha_\perp'\sum_{i=1}^{t}\hat\varepsilon_i$ as an estimate of the p − r common stochastic trends shows that the first one is a weighted sum of cumulated shocks to real money stock and real income, whereas the next three stochastic trends are equal to the cumulated shocks to inflation, the deposit rate and the bond rate, respectively.


CHAPTER 5. THE COINTEGRATED VAR MODEL

Note that a linear combination of common stochastic trends is also a stochastic trend. Thus, (5.18) is just one of infinitely many representations. We will now similarly find $\beta_{\perp}$ corresponding to $\beta' = [1,\, -1,\, 0,\, -25,\, 25]$, where for simplicity we have set the coefficient to inflation to zero. One possible choice is:

$\beta_{\perp} = \begin{pmatrix} 1 & 0 & 0 & -25 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}$   (5.19)

for which $\beta'\beta_{\perp} = 0$. To be able to interpret $\beta_{\perp}$ as the loadings of the common stochastic trends we would first need to post-multiply by the component $(\alpha_{\perp}'\Gamma\beta_{\perp})^{-1}$. This will be done in subsequent chapters. We leave it to the reader to find the orthogonal complements $\alpha_{\perp}$ and $\beta_{\perp}$ for (5.5) and (5.6).

5.5 Pulling and pushing forces

The simple error-correction model (5.11) illustrates on the one hand how the process is pulled towards the steady state, defined by $\beta'x_t - E(\beta'x_t) = 0$, with the force $\alpha$, which is activated as soon as the process is out of steady state, i.e. as soon as $\beta'x_t \neq E(\beta'x_t)$. The common trends representation (5.16) illustrates on the other hand how the variables move in a nonstationary manner described by the common driving trends $\alpha_{\perp}'\sum_{i=1}^{t}\epsilon_i$. In this sense the AR and the MA representations are two sides of the same coin: the pulling and the pushing forces of the system.⁴

Figure 5.1 illustrates these forces for a simple bivariate system with $x_t' = [m_t, y_t]$, where $m_t$ is money stock and $y_t$ is income. If the steady-state position corresponds, for example, to a constant money velocity, $m_t - y_t$ constant, so that $\beta' = [1, -1]$, then the attractor set is $\beta_{\perp} = [1, 1]$. In the figure this is indicated by the $45^{\circ}$ line showing that $m_t = y_t$ in steady state. If $\beta'x_t = (m_t - y_t) - E(\beta'x_t) \neq 0$, then the adjustment coefficients $\alpha$ will force the process back towards the attractor set with a speed of adjustment that depends on the length of $\alpha$ and the size of the equilibrium error $\beta'x_t$. The common trend, measured by $\alpha_{\perp}'\sum_{i=1}^{t}\epsilon_i$, has pushed money and income along the line defined by $\beta_{\perp}$, the attractor set. Thus, positive shocks to the system will push the process higher up along $\beta_{\perp}$, whereas negative shocks will move it down.

[Figure 5.1: The process $x_t' = [m_t, y_t]$ is pushed along the attractor set $sp(\beta_{\perp})$ by the common trends and pulled towards the attractor set by the adjustment coefficients.]

The steady-state position $\beta'x_t - E(\beta'x_t) = 0$ describes a system at rest. At these (long-run) equilibrium points there is no economic adjustment force (incentive) to change the system to a new position. But when new (exogenous) shocks hit the system, causing $\beta'x_t \neq E(\beta'x_t)$, the adjustment forces are activated and pull the process back towards $E(\beta'x_t)$. Note, however, that the steady-state relations, $\beta'x_t - E(\beta'x_t)$, should not be interpreted to mean that these relations are satisfied in the limit as $t \to \infty$. The steady-state position is something that exists at all time points as a rest point towards which the process is drawn when being pushed away. It should be emphasized that the picture is, strictly speaking, only valid for model (5.11), where the short-term dynamics have been left out. When there is short-run adjustment dynamics in the lagged differences of the process, the situation is more complicated and the simple intuition behind the pulling and pushing forces in the above picture can be misleading. For example, Hansen and Johansen (1998), Exercise ??, give an example where $\alpha_{\perp}'x_t$ is stationary, which shows that the common trends $\alpha_{\perp}'\sum_{i=1}^{t}\epsilon_i$ cannot generally be replaced by $\alpha_{\perp}'x_t$.

⁴This section relies heavily on Johansen (1996), Chapter 3.

Hansen and Johansen (1998) also discuss a model with overshooting:

$\Delta x_{1,t} = -\tfrac{1}{2}(x_{1,t-1} - x_{2,t-1}) + \epsilon_{1,t},$
$\Delta x_{2,t} = -\tfrac{1}{4}(x_{1,t-1} - x_{2,t-1}) + \epsilon_{2,t},$

i.e. a model where the variable $x_{2,t}$ does not error correct with a coefficient of plausible sign, despite the fact that the variables are cointegrated. In this model $x_{2,t}$ is not error correcting to the equilibrium error $(x_{1,t-1} - x_{2,t-1})$, but is instead pushing the process further away from steady state. But since $x_{1,t}$ is also reacting to the same equilibrium error with a larger error-correction coefficient, $|-0.50| > |-0.25|$, the process is nevertheless stable.
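A short simulation illustrates the point (the code and the innovations are illustrative additions, not from Hansen and Johansen (1998)): even though $x_{2,t}$ pushes away from equilibrium, the gap $x_{1,t} - x_{2,t}$ follows a stationary AR(1) with coefficient $1 - 0.50 + 0.25 = 0.75$:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 10_000
x = np.zeros((T, 2))
for t in range(1, T):
    z = x[t-1, 0] - x[t-1, 1]                               # equilibrium error
    x[t, 0] = x[t-1, 0] - 0.50 * z + rng.standard_normal()  # error-corrects
    x[t, 1] = x[t-1, 1] - 0.25 * z + rng.standard_normal()  # overshoots

z = x[:, 0] - x[:, 1]           # follows z_t = 0.75 z_{t-1} + eta_t
print(z.var())                  # finite, close to 2 / (1 - 0.75**2) ~ 4.57
```

The levels $x_{1,t}$ and $x_{2,t}$ themselves are still I(1); only the combination $x_{1,t} - x_{2,t}$ is pulled back.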

5.6 Concluding discussion

This chapter has shown that the notion of common trends, $\alpha_{\perp}'\sum_{i=1}^{t}\epsilon_i$, and the notion of cointegrating relations, $\beta'x_t$, are two sides of the same coin, as are the loading coefficients, $\tilde{\beta}_{\perp}$, and the adjustment coefficients, $\alpha$. Although we can, of course, choose the representation we prefer, there is, nonetheless, one aspect in which the two concepts differ. A cointegration relation is invariant to changes in the information set, whereas this is not necessarily the case with a common trend. If cointegration holds between a set of variables, then the same cointegration relation will still be found in a larger set of variables. This will be discussed in greater detail in Chapter 10. An unanticipated disturbance, $\epsilon_t$, defining the common trend $\alpha_{\perp}'\sum_{i=1}^{t}\epsilon_i$, is only unanticipated for the chosen information set. Unless the latter is complete in the sense of comprising all relevant variables, the residual, $\epsilon_t$, will necessarily contain the effect of omitted variables. Thus an unanticipated $\epsilon_t$ based on a smaller system need no longer be so in a larger system. Generally, both $\epsilon_t$ and $\alpha_{\perp}$ change when the information set changes, implying that the definition of a common trend is not invariant to changes in the information set; see Hendry (1995). This will be further discussed in Chapter 11.

Chapter 6 Deterministic Components in the I(1) Model


The purpose of this chapter is to discuss the interpretation of fixed effects, such as a constant, deterministic trends, and intervention dummies, and to show how they affect the mean of the differenced process, $E(\Delta x_t)$, and the mean of the equilibrium error process, $E(\beta'x_t)$. Section 6.1 first illustrates the dual role of the constant term and the linear trend in a simple dynamic regression model. Section 6.2 then extends the discussion to the more complicated case of the VAR model. Section 6.3 discusses five cases of different restrictions imposed on the trend and the constant in the VAR model. Section 6.4 derives the MA representation when there is a trend and a constant in the VAR model. Section 6.5 discusses the role of three different types of dummy variables in a simple dynamic regression model and Section 6.6 extends the discussion to the VAR model. Section 6.7 provides an empirical illustration.

6.1 A trend and a constant in a simple dynamic regression model

Most economists are familiar with the standard regression model and how to interpret its coefficients. When dynamics are introduced, inference and interpretation change fundamentally and interpretational mistakes are easy to make. We will use a simple univariate model to demonstrate how the interpretation of a linear time trend and a constant term is crucially related to the dynamics of the model, in particular to whether the dynamics contain a unit root or not. We consider the following simple regression model for $y_t$ containing a linear trend and a constant:

$y_t = \gamma_1 t + u_t + \gamma_0, \quad t = 1, ..., T$   (6.1)

where the residual $u_t$ is a first-order autoregressive process:

$u_t = \dfrac{\epsilon_t}{1 - \rho L}$   (6.2)

and $u_0$ is assumed fixed. Note that assumption (6.2) implies that (6.1) is a common factor model. As demonstrated below, such a model imposes nonlinear restrictions on the parameters of the AR model and is, therefore, a special case of the general autoregressive model. Nonetheless, (6.1)-(6.2) serves the purpose of providing a pedagogical illustration of the dual roles of deterministic components in dynamic models. It is useful to see how the constant $\gamma_0$ is related to the initial value of $y_t$. Using (6.1) we have that $y_0 = \gamma_0 + u_0$. Since an economic variable is usually given in logs, the level contains information about the unit of measurement (the log of 100,000 euro, say). Therefore, the value of $\gamma_0$ is generally dominated by $y_0$. For practical purposes $\gamma_0 \approx y_0$, and in the discussion below we will set $\gamma_0 = y_0$ to emphasize the role of the measurements on the constant in a dynamic regression model. By substituting (6.2) in (6.1) we get:

$y_t = \gamma_1 t + \dfrac{\epsilon_t}{1 - \rho L} + y_0$   (6.3)

and by multiplying through with $(1 - \rho L)$:

$(1 - \rho L)y_t = (1 - \rho L)\gamma_1 t + (1 - \rho L)y_0 + \epsilon_t.$   (6.4)

Rewriting (6.4) using $Lx_t = x_{t-1}$ we get:

$y_t = \rho y_{t-1} + \gamma_1(1 - \rho)t + \rho\gamma_1 + (1 - \rho)y_0 + \epsilon_t.$   (6.5)

It is easy to see that the static regression model (6.1) is equivalent to the following dynamic regression model:

$y_t = b_1 y_{t-1} + b_2 t + b_0 + \epsilon_t$   (6.6)

with

$b_1 = \rho$
$b_2 = \gamma_1(1 - \rho)$   (6.7)
$b_0 = \rho\gamma_1 + (1 - \rho)y_0.$

We consider the following four cases:

Case 1. $\rho = 1$ and $\gamma_1 \neq 0$. It follows from (6.5) that $\Delta y_t = \gamma_1 + \epsilon_t$, for $t = 1, ..., T$, i.e. the random walk with drift model. Note that $E(\Delta y_t) = \gamma_1 \neq 0$ is equivalent to $y_t$ having a linear trend, $\gamma_1 t$.

Case 2. $\rho = 1$ and $\gamma_1 = 0$. It follows from (6.5) that $\Delta y_t = \epsilon_t$, for $t = 1, ..., T$, i.e. the pure random walk model. In this case $E(\Delta y_t) = 0$ and $y_t$ contains no linear trend.

Case 3. $|\rho| < 1$ and $\gamma_1 \neq 0$ gives (6.6), i.e. $y_t$ is stationary around its mean $Ey_t = a_1 t + a_0$. We will now show that $a_1 = \gamma_1$ and $a_0 = y_0$:

$Ey_t = \rho Ey_{t-1} + \gamma_1(1 - \rho)t + \rho\gamma_1 + (1 - \rho)y_0$
$a_1 t + a_0 = \rho(a_1(t-1) + a_0) + \gamma_1(1 - \rho)t + \rho\gamma_1 + (1 - \rho)y_0$
$a_1(1 - \rho)t + a_0(1 - \rho) + \rho a_1 = \gamma_1(1 - \rho)t + \rho\gamma_1 + (1 - \rho)y_0$

Hence:

$a_1(1 - \rho)t = \gamma_1(1 - \rho)t \;\Rightarrow\; a_1 = \gamma_1$

and:

$a_0(1 - \rho) + \rho\gamma_1 = \rho\gamma_1 + (1 - \rho)y_0 \;\Rightarrow\; a_0 = y_0.$

Thus, one should note that the coefficients in a dynamic regression model have to be interpreted with caution. For example, $b_2$ in (6.6) is not an estimate of the trend slope in $y_t$ and $b_0$ is not an estimate of $\gamma_0$.

Case 4. $|\rho| < 1$ and $\gamma_1 = 0$ gives $y_t = \rho y_{t-1} + (1 - \rho)y_0 + \epsilon_t$, where $Ey_t = y_0$, i.e. the stationary autoregressive model with a constant term.

To summarize:

- in the static regression model (6.1) the constant term essentially accounts for the unit of measurement of $y_t$,
- in the dynamic regression model (6.5) the constant term is a weighted average of the growth rate $\gamma_1$ and the initial value $y_0$,
- in the differenced model ($\rho = 1$) the constant term measures only the growth rate, $\gamma_1$.
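Case 3 is easy to reproduce by simulation (the parameter values below are hypothetical, chosen for illustration): data generated from the dynamic form (6.5)-(6.7) with $\rho = 0.8$ and $\gamma_1 = 0.5$ have a fitted trend slope close to $\gamma_1$, even though the regression coefficient on $t$ is only $b_2 = \gamma_1(1-\rho) = 0.1$:

```python
import numpy as np

rho, gamma1, y0 = 0.8, 0.5, 10.0
b1, b2 = rho, gamma1 * (1 - rho)
b0 = rho * gamma1 + (1 - rho) * y0          # as in (6.7)

rng = np.random.default_rng(1)
T = 500
y = np.zeros(T + 1)
y[0] = y0
for t in range(1, T + 1):                   # dynamic model (6.6)
    y[t] = b1 * y[t-1] + b2 * t + b0 + rng.standard_normal()

trend = np.arange(1, T + 1)
slope = np.polyfit(trend, y[1:], 1)[0]      # fitted trend in the level of y_t
print(b2, round(slope, 2))                  # 0.1 vs. roughly 0.5 (= gamma1)
```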

6.2 A trend and a constant in the VAR

The above results were derived for the univariate model. We will now demonstrate that one can, with some modifications, apply a similar interpretation to the deterministic components in the multivariate model. A characteristic feature of the error-correction formulation given below is the inclusion of both differences and levels in the same model, allowing us to investigate short-run as well as long-run effects in the data. When two variables share the same stochastic trend we showed in the previous chapter that it is possible to find a linear combination that cancels the trend, i.e. a cointegration relation. But many economic variables typically exhibit linear deterministic growth (at least locally over the sample period) in addition to stochastic growth. Statistically it is not always straightforward to distinguish between the two, especially over short sample periods. Sometimes it is preferable to approximate the trend behavior with a stochastic trend, sometimes it is better to model it with a deterministic trend, and in most cases we need a combination of the two. We call a variable which contains only a deterministic trend, but no stochastic trend, a trend-stationary variable. In the cointegrated VAR model this case can be handled by adding a trend to the cointegration space. In other cases a linear combination of the variables removes the stochastic trend but not the deterministic trend, and we need to add a linear trend to the cointegration relation to achieve stationarity. We call such a relation a trend-stationary cointegration relation. The basic idea can be illustrated with a simple VAR(1) model containing a constant, $\mu_0$, and a trend, $\mu_1 t$. For notational simplicity all short-run dynamic effects, $\Gamma_i$, have been set to zero. Thus we consider the following model in AR form:

$\Delta x_t = \alpha\beta'x_{t-1} + \mu_0 + \mu_1 t + \epsilon_t$   (6.8)

and in the MA form:

$x_t = C\sum_{i=1}^{t}(\epsilon_i + \mu_0 + \mu_1 i) + \sum_{i=0}^{\infty} C_i^{*}(\epsilon_{t-i} + \mu_0 + \mu_1(t-i))$   (6.9)

Because $\Delta x_t$ and $\beta'x_{t-1}$ are stationary we can express (6.8) as:

$\Delta x_t - E(\Delta x_t) = \alpha(\beta'x_{t-1} - E(\beta'x_{t-1})) + \epsilon_t$   (6.10)

where

$E(\Delta x_t) = \alpha E(\beta'x_{t-1}) + \mu_0 + \mu_1 t$   (6.11)

and

$E(\beta'\Delta x_t) = \beta'\alpha E(\beta'x_{t-1}) + \beta'\mu_0 + \beta'\mu_1 t$
$E(\beta'x_t) = (I + \beta'\alpha)E(\beta'x_{t-1}) + \beta'\mu_0 + \beta'\mu_1 t$   (6.12)

The two $(p \times 1)$ vectors $\mu_0$ and $\mu_1$ can always be decomposed into two new vectors, so that one of them belongs to the $\alpha$ space, i.e. to the cointegration relations, and the other to the orthogonal space spanned by $\alpha_{\perp}$. Since a vector can be decomposed in many different ways we need some principle for doing it. Generally we would like an equilibrium error to have mean zero, and one possibility is to decompose $\mu_0$ and $\mu_1$ so that the equilibrium error, corrected for its deterministic components, has mean zero. To achieve this when $\mu_1 \neq 0$ is quite involved, and we will focus here on the case ($\mu_0 \neq 0$, $\mu_1 = 0$), i.e. a constant term but no linear trend in the VAR model. In this case we can use the equality:

$\alpha(\alpha'\alpha)^{-1}\alpha' + \alpha_{\perp}(\alpha_{\perp}'\alpha_{\perp})^{-1}\alpha_{\perp}' = I$   (6.13)

to decompose the vector $\mu_0$ into two new vectors:

$\mu_0 = \alpha(\alpha'\alpha)^{-1}\alpha'\mu_0 + \alpha_{\perp}(\alpha_{\perp}'\alpha_{\perp})^{-1}\alpha_{\perp}'\mu_0 = \alpha\beta_0 + \alpha_{\perp}\gamma_0$   (6.14)

where $\beta_0 = (\alpha'\alpha)^{-1}\alpha'\mu_0$ and $\gamma_0 = (\alpha_{\perp}'\alpha_{\perp})^{-1}\alpha_{\perp}'\mu_0$. We will now show that $E(\beta'x_t + \beta_0) = 0$ in (6.8) when $\mu_1 = 0$. In this case (6.9) becomes:

$x_t = C\sum_{i=1}^{t}(\epsilon_i + \mu_0) + \sum_{i=0}^{\infty} C_i^{*}(\epsilon_{t-i} + \mu_0)$

and

$E(\Delta x_t) = C\mu_0, \qquad E(\beta'x_t) = \beta'\sum_{i=0}^{\infty} C_i^{*}\mu_0.$   (6.15)

By noting that $E(\beta'\Delta x_t) = 0$ in (6.12) (because $E(\beta'x_t)$ is constant) we obtain the following expression for $E(\beta'x_t)$:

$E(\beta'x_t) = -(\beta'\alpha)^{-1}\beta'\mu_0 = \beta'\sum_{i=0}^{\infty} C_i^{*}\mu_0.$   (6.16)

Inserting (6.15) and (6.16) into (6.10) and noting that $C = \beta_{\perp}(\alpha_{\perp}'\beta_{\perp})^{-1}\alpha_{\perp}'$ gives:

$\Delta x_t - \alpha_{\perp}(\alpha_{\perp}'\alpha_{\perp})^{-1}\alpha_{\perp}'\mu_0 = \alpha\beta'x_{t-1} + \alpha(\alpha'\alpha)^{-1}\alpha'\mu_0 + \epsilon_t$
$\Delta x_t - \alpha_{\perp}\gamma_0 = \alpha(\beta'x_{t-1} + \beta_0) + \epsilon_t$   (6.17)

Because the l.h.s. of (6.17) has a zero mean, as has $\epsilon_t$, $E(\beta'x_{t-1} + \beta_0) = 0$, and we have shown that the decomposition (6.13) satisfies the criterion $E(\beta'x_t + \beta_0) = 0$. Thus, the motivation for choosing the decomposition (6.13) is that when $\mu_1 = 0$ and $\Gamma_1 = 0$ we have $E(\Delta x_t) = \alpha_{\perp}\gamma_0$ and, hence, $E(\beta'x_t + \beta_0) = 0$. When $\Gamma_1, ..., \Gamma_{k-1} \neq 0$ (but $\mu_1 = 0$) we can apply a slightly different decomposition:

$C\Gamma + (I - C\Gamma) = \beta_{\perp}(\alpha_{\perp}'\Gamma\beta_{\perp})^{-1}\alpha_{\perp}'\Gamma + (I - \beta_{\perp}(\alpha_{\perp}'\Gamma\beta_{\perp})^{-1}\alpha_{\perp}'\Gamma) = I$   (6.18)

where $\Gamma = I - \Gamma_1 - \dots - \Gamma_{k-1}$. However, the derivation of the mean value of $\beta'x_t$ becomes more complicated in the general case.
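The decomposition (6.13)-(6.14) amounts to a pair of complementary orthogonal projections, which can be verified numerically. The sketch below uses a hypothetical $\alpha$, $\alpha_{\perp}$ and $\mu_0$, chosen purely for illustration:

```python
import numpy as np

alpha = np.array([[-0.3],
                  [ 0.1],
                  [ 0.0]])                       # p = 3, r = 1
alpha_perp = np.array([[1.0, 0.0],
                       [3.0, 0.0],
                       [0.0, 1.0]])              # alpha' alpha_perp = 0
mu0 = np.array([0.4, -0.2, 0.7])

# (6.13): the two projections sum to the identity
P_a  = alpha @ np.linalg.inv(alpha.T @ alpha) @ alpha.T
P_ap = alpha_perp @ np.linalg.inv(alpha_perp.T @ alpha_perp) @ alpha_perp.T
print(np.allclose(P_a + P_ap, np.eye(3)))        # True

# (6.14): mu0 = alpha beta0 + alpha_perp gamma0
beta0  = np.linalg.inv(alpha.T @ alpha) @ alpha.T @ mu0
gamma0 = np.linalg.inv(alpha_perp.T @ alpha_perp) @ alpha_perp.T @ mu0
print(np.allclose(alpha @ beta0 + alpha_perp @ gamma0, mu0))   # True
```

Here $\beta_0$ is the part of the constant that shifts the equilibrium mean and $\gamma_0$ the part that generates drift, exactly the split exploited in the five cases of the next section.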


When $\mu_1 \neq 0$ we would need a different, more complicated, decomposition of $\mu_0$ and $\mu_1$ to achieve that the equilibrium error has a zero mean¹. In any case a similar logic is used for the decomposition of the constant and the trend into the space spanned by $\alpha$ and $\alpha_{\perp}$:

$\mu_0 = \alpha\beta_0 + \alpha_{\perp}\gamma_0$
$\mu_1 = \alpha\beta_1 + \alpha_{\perp}\gamma_1$   (6.19)

By substituting (6.19) in (6.8) we get:

$\Delta x_t = \alpha\beta'x_{t-1} + \alpha\beta_0 + \alpha\beta_1 t + \alpha_{\perp}\gamma_0 + \alpha_{\perp}\gamma_1 t + \epsilon_t,$   (6.20)

and by rearranging, (6.20) can be written as:

$\Delta x_t = \alpha[\beta', \beta_0, \beta_1](x_{t-1}', 1, t)' + \alpha_{\perp}\gamma_0 + \alpha_{\perp}\gamma_1 t + \epsilon_t.$

Thus, (??) can be reformulated as:

$\Delta x_t = \alpha\tilde{\beta}'\tilde{x}_{t-1} + \alpha_{\perp}\gamma_0 + \alpha_{\perp}\gamma_1 t + \epsilon_t,$   (6.21)

where $\tilde{\beta}' = [\beta', \beta_0, \beta_1]$ and $\tilde{x}_{t-1} = (x_{t-1}', 1, t)'$. The $\alpha_{\perp}$ components can be interpreted from the equation:

$E(\Delta x_t) = \alpha_{\perp}\gamma_0 + \alpha_{\perp}\gamma_1 t,$   (6.22)

i.e. $\gamma_0 \neq 0$ implies linear growth in at least some of the variables, as demonstrated in Case 1 of Section 6.1, and $\gamma_1 \neq 0$ implies quadratic trends in the variables. From the above discussion it appears that the constant and the deterministic trend play a dual role in the cointegration model, and we need to be able to distinguish between the part that belongs to the cointegration relations and the part that belongs to the differences.

¹If, nevertheless, (6.13) is used to obtain $(\beta_0, \beta_1, \gamma_0, \gamma_1)$, the consequence is that $E(\beta'x_{t-1} + \beta_0 + \beta_1 t) \neq 0$, but, in general, the deviation from a zero mean is very small.


6.3 Five cases

In empirical work we generally do not know from the outset whether there are linear trends in some of the variables, or whether they cancel in the cointegrating relations or not. We will demonstrate in Chapter 9 that such hypotheses can be expressed as testable linear restrictions on the econometric model. The five different models discussed below arise from imposing different restrictions on the deterministic components in (??).

Case 1. $\mu_1 = \mu_0 = 0$. This case corresponds to a model with no deterministic components in the data, i.e. $E(\Delta x_t) = 0$ and $E(\beta'x_t) = 0$, implying that the intercept of every cointegrating relation is zero. As demonstrated in the previous section an intercept is generally needed to account for the initial level of the measurements, $X_0$, and only in the exceptional case when the measurements start from zero, or when the measurements cancel in the cointegrating relations, can a zero restriction be justified.

Case 2. $\mu_1 = 0$, $\gamma_0 = 0$ but $\beta_0 \neq 0$, i.e. the constant term is restricted to lie in the cointegrating relations. In this case, there are no linear trends in the data, consistent with $E(\Delta x_t) = 0$ in (6.22). The only deterministic component in the model is the intercept of the cointegrating relations, implying that the equilibrium mean is different from zero.

Case 3. $\mu_1 = 0$, and the constant term $\mu_0$ is unrestricted, i.e. no linear trends in the VAR model (6.8), but linear trends in the data (6.9). In this case, there is no trend in the cointegration space, but $E(\Delta x_t) = \alpha_{\perp}\gamma_0 \neq 0$ is consistent with linear trends in the variables, and since $\beta_1 = 0$, these trends cancel in the cointegrating relations. It appears that $\mu_0 \neq 0$ implies both linear trends in the data and a non-zero intercept in the cointegration relations.

Case 4. $\gamma_1 = 0$, but $\beta_0$, $\gamma_0$, $\beta_1$ are unrestricted, i.e. the trend is restricted to appear only in the cointegrating relations, but the constant is unrestricted in the model. When $\gamma_1$ is restricted to zero we allow linear, but no quadratic, trends in the data. As illustrated in the previous section, $E(\Delta x_t) = \alpha_{\perp}\gamma_0 \neq 0$ implies a linear trend in the level of $x_t$. When, in addition, $\beta_1 \neq 0$, the linear trends in the variables do not cancel in the cointegrating relations, i.e. the model contains trend-stationary variables or trend-stationary cointegrating relations. Therefore, the hypothesis that a variable is trend-stationary, for example that the output gap is stationary, can be tested in this model.


Case 5. No restrictions on $\mu_0$, $\mu_1$, i.e. trend and constant are unrestricted in the model. With unrestricted parameters, the model is consistent with linear trends in the differenced series $\Delta x_t$ and, thus, quadratic trends in $x_t$. Although quadratic trends may sometimes improve the fit within the sample, forecasting outside the sample is likely to produce implausible results. Instead it seems preferable to find out what has caused this approximately quadratic growth and, if possible, include more appropriate information in the model (for example, population growth or the proportion of old/young people in the population). These are only a few examples showing that the role of the deterministic and stochastic components in the cointegrated VAR is quite complicated. Not only is a correct specification important for the model estimates and their interpretation, but the asymptotic distribution of the rank test also depends on the specification of these components. This will be further discussed in Chapter 8.
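The practical difference between Case 2 and Case 3 can be made concrete by simulation. In the sketch below (all parameter values are made up for illustration) the constant is $\mu_0 = \alpha\beta_0$ in the first run and acquires an additional $\alpha_{\perp}\gamma_0$ component in the second; only the second run produces a linear trend (non-zero mean growth) in the data:

```python
import numpy as np

alpha = np.array([[-0.2], [0.1]])
beta  = np.array([[1.0], [-1.0]])
alpha_perp = np.array([1.0, 2.0])           # alpha' alpha_perp = 0

def simulate(mu0, T=20_000, seed=7):
    rng = np.random.default_rng(seed)
    x = np.zeros((T, 2))
    for t in range(1, T):
        ecm = alpha @ (beta.T @ x[t-1])     # alpha * beta' x_{t-1}
        x[t] = x[t-1] + ecm.ravel() + mu0 + rng.standard_normal(2)
    return x

mu_case2 = 0.5 * alpha.ravel()              # mu0 = alpha beta0: no drift
mu_case3 = mu_case2 + 0.05 * alpha_perp     # adds alpha_perp gamma0: drift

drift2 = np.diff(simulate(mu_case2), axis=0).mean(axis=0)
drift3 = np.diff(simulate(mu_case3), axis=0).mean(axis=0)
print(drift2)   # approx. [0, 0]: no linear trend in the data
print(drift3)   # clearly non-zero: linear trends that cancel in beta'x
```

In the second run the trends are the same in both variables and therefore cancel in $\beta'x_t = x_{1,t} - x_{2,t}$, as described under Case 3.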

6.4 The MA representation with deterministic components

For simplicity we will here focus on the derivation of the MA representation when the VAR contains an unrestricted constant and a linear trend. It is straightforward to generalize to the other cases. Chapter 5 showed that the MA form of the VAR model (??) can be obtained by inverting:

$\Delta x_t = C(L)(\epsilon_t + \mu_0 + \mu_1 t) = [C(1) + C^{*}(L)(1 - L)](\epsilon_t + \mu_0 + \mu_1 t)$   (6.23)

and summing:

$x_t = C(1)\dfrac{(\epsilon_t + \mu_0 + \mu_1 t)}{(1 - L)} + C^{*}(L)(\epsilon_t + \mu_0 + \mu_1 t) + \tilde{X}_0$   (6.24)

where $\tilde{X}_0$ contains the effect of the initial values, defined so that $\beta'\tilde{X}_0 = 0$. As before:

$C(1) = C = \beta_{\perp}(\alpha_{\perp}'\beta_{\perp})^{-1}\alpha_{\perp}'.$   (6.25)

By summing and rearranging, (6.24) can be written as:

$x_t = \underbrace{C\mu_0 t + 0.5C\mu_1 t^2 + 0.5C\mu_1 t + C^{*}(L)\mu_1 t + C^{*}(1)\mu_0}_{\text{determ. comp.}} + \underbrace{C\sum_{i=1}^{t}\epsilon_i + C^{*}(L)\epsilon_t}_{\text{stoch. comp.}} + \tilde{X}_0, \quad t = 1, ..., T.$   (6.26)

Substituting (6.25) in (6.26) we get:

$x_t = \beta_{\perp}(\alpha_{\perp}'\beta_{\perp})^{-1}\alpha_{\perp}'\{\mu_0 t + 0.5\mu_1 t + 0.5\mu_1 t^2\} + C^{*}(L)\mu_1 t + C\sum_{i=1}^{t}\epsilon_i + C^{*}(L)\epsilon_t + C^{*}(1)\mu_0 + \tilde{X}_0$   (6.27)

Substituting (6.14) in (6.27) and focusing on the linear and quadratic trend components we get:

$\alpha_{\perp}'\mu_0 t = \underbrace{\alpha_{\perp}'\alpha\beta_0 t}_{=0} + \alpha_{\perp}'\alpha_{\perp}\gamma_0 t$

and

$0.5\alpha_{\perp}'\mu_1 t = 0.5(\underbrace{\alpha_{\perp}'\alpha\beta_1 t}_{=0} + \alpha_{\perp}'\alpha_{\perp}\gamma_1 t)$
$0.5\alpha_{\perp}'\mu_1 t^2 = 0.5(\underbrace{\alpha_{\perp}'\alpha\beta_1 t^2}_{=0} + \alpha_{\perp}'\alpha_{\perp}\gamma_1 t^2),$

and the MA representation can be written as:

$x_t = \beta_{\perp}(\alpha_{\perp}'\beta_{\perp})^{-1}\alpha_{\perp}'\alpha_{\perp}\{\gamma_0 t + 0.5\gamma_1 t + 0.5\gamma_1 t^2\} + C^{*}(L)\mu_1 t + C^{*}(1)\mu_0 + C\sum_{i=1}^{t}\epsilon_i + C^{*}(L)\epsilon_t + \tilde{X}_0.$   (6.28)

Thus, (6.28) shows that linear trends in the variables can originate from three different sources in the VAR model:

1. the component $(C^{*}(L)\mu_1 t)$ of the unrestricted linear trend $\mu_1 t$,
2. the component $(\gamma_1 t)$ of the unrestricted linear trend $\mu_1 t$,
3. the component $(\gamma_0 t)$ of the unrestricted constant term $\mu_0$.

We write (6.28) in a more compact form:

$x_t = C\{\tau_1 t + \tau_2 t^2\} + C\sum_{i=1}^{t}\epsilon_i + C^{*}(L)\epsilon_t + \tilde{X}_0,$   (6.29)

where $\tau_1$ and $\tau_2$ can be derived from (6.28).

6.5 Dummy variables in a simple regression model

Similarly as for the trend and the constant, we first consider a simple regression model for $y_t$ containing three different types of dummy variables, $Ds_t$, $Dp_t$, and $Dtr_t$:

$y_t = \phi_s Ds_t + \phi_p Dp_t + \phi_{tr} Dtr_t + u_t + y_0, \quad t = 1, ..., T$   (6.30)

where $Ds_t$ is a mean-shift dummy (...,0,0,0,1,1,1,...), $Dp_t$ is a permanent intervention dummy (...,0,0,1,0,0,...) and $Dtr_t$ is a transitory shock dummy (...,0,0,1,-1,0,0,...), and the residual $u_t$ is a first-order autoregressive process:

$u_t = \dfrac{\epsilon_t}{1 - \rho L}$   (6.31)

By substituting (6.31) in (6.30) we get:

$(1 - \rho L)y_t = \phi_s(1 - \rho L)Ds_t + \phi_p(1 - \rho L)Dp_t + \phi_{tr}(1 - \rho L)Dtr_t + (1 - \rho)y_0 + \epsilon_t,$   (6.32)

$y_t = \rho y_{t-1} + \phi_s Ds_t - \rho\phi_s Ds_{t-1} + \phi_p Dp_t - \rho\phi_p Dp_{t-1} + \phi_{tr} Dtr_t - \rho\phi_{tr} Dtr_{t-1} + (1 - \rho)y_0 + \epsilon_t$
$\quad = b_1 y_{t-1} + b_2 Ds_t + b_3 Ds_{t-1} + b_4 Dp_t + b_5 Dp_{t-1} + b_6 Dtr_t + b_7 Dtr_{t-1} + b_0 + \epsilon_t$   (6.33)

Thus, the static regression model (6.30) with autoregressive errors corresponds to a dynamic model with lagged dummy variables. Note that the effects of the dummy variables can equivalently be formulated as:

$\phi_s Ds_t - \rho\phi_s Ds_{t-1} = \rho\phi_s \Delta Ds_t + \phi_s(1 - \rho)Ds_t$   (6.34)
$\phi_p Dp_t - \rho\phi_p Dp_{t-1} = \rho\phi_p \Delta Dp_t + \phi_p(1 - \rho)Dp_t$   (6.35)
$\phi_{tr} Dtr_t - \rho\phi_{tr} Dtr_{t-1} = \rho\phi_{tr} \Delta Dtr_t + \phi_{tr}(1 - \rho)Dtr_t$   (6.36)

where $\Delta Ds_t$ now becomes an impulse dummy (...,0,0,1,0,0,...) describing a permanent intervention, $\Delta Dp_t$ becomes a transitory blip dummy (...,0,0,1,-1,0,0,...), and $\Delta Dtr_t$ a double transitory blip dummy (...,0,0,1,-2,1,0,0,...). Hence, (6.32) can be reformulated as:

$\Delta y_t = -(1 - \rho)y_{t-1} + \phi_s(1 - \rho)Ds_t + [\rho\phi_s + \phi_p(1 - \rho)]Dp_t + [\rho\phi_p + \phi_{tr}(1 - \rho)]Dtr_t + \rho\phi_{tr}\Delta Dtr_t + (1 - \rho)y_0 + \epsilon_t$   (6.37)

When $\rho = 1$, (6.37) becomes:

$\Delta y_t = \phi_s \Delta Ds_t + \phi_p \Delta Dp_t + \phi_{tr} \Delta Dtr_t + \epsilon_t, \quad t = 2, ..., T,$
$\quad\;\, = \phi_s Dp_t + \phi_p Dtr_t + \phi_{tr} \Delta Dtr_t + \epsilon_t,$

i.e. a shift in the level of a variable becomes a blip in the differenced variable, a permanent blip in the levels becomes a transitory blip in the differences, and finally a transitory blip in the levels becomes a double transitory blip in the differences.
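The translation between dummy types in levels and differences is purely mechanical and can be checked directly (the intervention date below is arbitrary):

```python
import numpy as np

T, t0 = 12, 5                                   # arbitrary intervention date
Ds  = (np.arange(T) >= t0).astype(float)        # mean shift    ...,0,1,1,1,...
Dp  = (np.arange(T) == t0).astype(float)        # permanent blip ...,0,1,0,...
Dtr = Dp.copy()
Dtr[t0 + 1] = -1.0                              # transitory blip ...,1,-1,...

print(np.diff(Ds))    # impulse: a level shift becomes a blip in differences
print(np.diff(Dp))    # 1,-1: a permanent blip becomes a transitory blip
print(np.diff(Dtr))   # 1,-2,1: a transitory blip becomes a double blip
```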

6.6 Dummy variables and the VAR

Significant interventions and reforms frequently show up as extraordinarily large (non-normal) shocks in the VAR analysis, thus violating the normality assumption. We will argue strongly below that it would be a mistake to treat them exclusively as a statistical nuisance to be remedied by appropriately correcting the observations. First we will illustrate the need for intervention dummies and how to model them based on the analysis of the Danish data, then we will discuss more formally how they influence the dynamics of the


VAR model, and finally discuss their significance for the empirical model estimates. A graphical analysis of the residuals based on the unrestricted VAR(2) model for the Danish data showed that the temporary removal of the VAT in 1975 had a very strong impact on real aggregate expenditure² and that the removal of restrictions on capital movements in 1983 caused the yearly long-term bond rate to fall by approximately 10% from a level of ca. 25%, which was internationally high. The huge drop in the interest rate was associated with a similarly large increase in aggregate money stock as a result of the new possibility for foreigners to hold Danish kroner. It is often much easier to recognize a possible outlier observation in $\Delta x_t$ than in the levels $x_t$. Thus, a first tentative decision to model a political intervention, for example with a permanent or a transitory blip dummy, can be made by examining the differenced process, as illustrated below. The effect on the levels of the process will be discussed subsequently. The first intervention is transitory in the sense that the VAT was removed for one quarter and gradually restored again over two quarters. Nonetheless, it was clearly meant to have a permanent effect on real aggregate income, and we might hypothetically expect both a transitory and a permanent effect. The removal of VAT is also likely to influence prices in various ways. For example, if we measure prices by a CPI index, then a change in VAT is likely to show up as a roughly proportional effect. But it is also possible that some producers, in the expectation of a high demand for their products, will increase prices to some extent. In the present illustration we are using the implicit price deflator of GNE as a measure of prices, so the effects are likely to be much smaller than for the CPI.
The transitory effect of the VAT intervention should be modelled by a transitory blip dummy, for example $Dtr_t = [0, 0, 0, ..., 0, 1, -0.5, -0.5, 0, 0, ..., 0]$, whereas a permanent effect should be modelled by a blip dummy, for example $Dp_t = [0, 0, 0, ..., 0, 1, 0, 0, ..., 0]$. The statistical analysis of the VAR model can then be used to find out whether the VAT intervention effects were indeed significant and, if so, in which equations. The second intervention is the result of permanently removing restrictions in the capital market and is, therefore, first of all likely to have permanent effects on the system. However, markets sometimes overreact and we often
²The purpose of the intervention was precisely to boost domestic demand to avoid a depression in the aftermath of the first oil shock.


see both a permanent effect and a transitory effect as a result of a significant reform or intervention in the market. The latter would show up as a shock in one period, followed shortly afterwards by a similar shock of opposite sign making the variable return to its previous level. Such transitory shocks are quite common, in particular for high-frequency data. If they are of minor significance and, therefore, not explicitly modelled, they are likely to produce some negative residual autocorrelation in the VAR model. But, contrary to permanent shocks, they will disappear in cumulation and, therefore, have no effect on the stochastic trends defined as the cumulative sum of all previous shocks. Using dummies to account for extraordinary mean shifts, permanent blips, and transitory shocks, the CVAR model is reformulated as:

$\Delta x_t = \Gamma_1\Delta x_{t-1} + \alpha\beta'x_{t-1} + \Phi_s Ds_t + \Phi_p Dp_t + \Phi_{tr} Dtr_t + \mu_0 + \epsilon_t,$
$\epsilon_t \sim Niid(0, \Omega), \quad t = 1, ..., T$   (6.38)

where $Ds_t$ is a $d_1 \times 1$ vector of mean-shift dummy variables (...,0,0,0,1,1,1,...), $Dp_t$ is a $d_2 \times 1$ vector of permanent blip dummy variables (...,0,0,1,0,0,...) and $Dtr_t$ is a $d_3 \times 1$ vector of transitory shock dummy variables (...,0,0,1,-1,0,0,...). Because the VAR model contains both differences and levels of the variables, the role of dummy variables (and other deterministic terms) is more complicated than in the usual regression model (Hendry and Juselius, 2001, Johansen, Nielsen, and Mosconi, 2001). An unrestricted mean-shift dummy accounts for a mean shift in $\Delta x_t$ and cumulates to a broken trend in $x_t$. An unrestricted permanent blip dummy accounts for a large blip (impulse) in $\Delta x_t$ and cumulates to a level shift in $x_t$. An unrestricted transitory blip dummy accounts for two consecutive blips of opposite signs in $\Delta x_t$ and cumulates to a single blip in $x_t$. To understand the role of the dummies in the CVAR model we use (6.13) to partition the dummy effects into an $\alpha$ and an $\alpha_{\perp}$ component:

$\Phi_s = \alpha\delta_0 + \alpha_{\perp}\delta_1,$   (6.39)

$\Phi_p = \alpha\zeta_0 + \alpha_{\perp}\zeta_1,$   (6.40)

$\Phi_{tr} = \alpha\xi_0 + \alpha_{\perp}\xi_1,$   (6.41)

where $\delta_0 = (\alpha'\alpha)^{-1}\alpha'\Phi_s$ and $\delta_1 = (\alpha_{\perp}'\alpha_{\perp})^{-1}\alpha_{\perp}'\Phi_s$, and $\zeta_0$, $\zeta_1$, $\xi_0$, $\xi_1$ are similarly defined. It is now possible to investigate the dynamic effects of the dummies from the moving average representation of the model, which defines the variables $x_t$ as a function of $\epsilon_i$, $i = 1, ..., t$, the dummy variables $Ds_t$, $Dp_t$ and $Dtr_t$, and the initial values, $\tilde{X}_0$. For model (6.38) it is given by:

$x_t = C\sum_{i=1}^{t-1}\epsilon_i + C\Phi_s\sum_{i=1}^{t-1}Ds_i + C\Phi_p\sum_{i=1}^{t-1}Dp_i + C\Phi_{tr}\sum_{i=1}^{t-1}Dtr_i + C^{*}(L)(\epsilon_t + \mu_0 + \Phi_s Ds_t + \Phi_p Dp_t + \Phi_{tr} Dtr_t) + \tilde{X}_0$   (6.42)

where, as before,

$C = \beta_{\perp}(\alpha_{\perp}'\Gamma\beta_{\perp})^{-1}\alpha_{\perp}'$   (6.43)

and $C^{*}(L)$ is an infinite polynomial in the lag operator $L$. The first summation in (6.42) gives the common stochastic trends generated by the ordinary shocks to the system, the second summation generates a broken linear trend in $x_t$, the third summation a shift in the level of the variables associated with the permanent extraordinarily large shock, and the fourth summation a blip in the variables. It appears from (6.43) that only the $\alpha_{\perp}$ components of (6.39)-(6.41) will enter with a nonzero coefficient in the summation of the dummy components in (6.42), whereas the $\alpha$ components will have a zero coefficient and, hence, disappear. Thus, dummy variables which are restricted to be in the cointegration relations do not cumulate in $x_t$. Hence, to avoid a broken linear trend in the data, $\delta_1 = 0$ has to be imposed in (6.39). The second part of (6.39), $\delta_0$, is restricted to lie in the $\alpha$ space, i.e. in the cointegration relations. Thus, $\delta_0 \neq 0$ describes a mean shift in $\beta'x_t$ as a result of mean shifts in the variables that do not cancel in a cointegration relation. A mean shift in a variable $x_{j,t}$ implies a permanent blip in $\Delta x_{j,t}$ and, hence, $\delta_0 \neq 0$ implies $\Phi_p \neq 0$. If $\Phi_s = 0$, then there is neither a broken trend in the variables nor a mean shift in the cointegration relations; but if $\Phi_p \neq 0$, then $\zeta_1 \neq 0$ describes level shifts in the variables that cancel in $\beta'x_t$, whereas $\zeta_0 \neq 0$ describes a blip in $\beta'x_t$. A blip in the variables $x_t$ implies a transitory shock in $\Delta x_t$. Thus, $\zeta_0 \neq 0$ is consistent with $\Phi_{tr} \neq 0$, and describes a situation where the blips in the levels of $x_t$ generated by transitory shocks to $\Delta x_t$ do not cancel in $\beta'x_t$. To avoid adding more dummy components we assume here that $\xi_0 = 0$.

Thus, (6.42) shows that a large shock at time $t$, accounted for by the dummies $Dp_t$ or $Dtr_t$, will influence the variables with the same dynamics as an ordinary shock unless the dummies enter the model with lags. Thus, if a dummy variable needs a lag in the model we will consider the corresponding intervention shock to be inherently different from the ordinary shocks, whereas if the dummy is needed only once, at the day of the news, we will consider it a big, but nevertheless ordinary, shock³. Thus, we need to make a distinction between extraordinary intervention shocks with a permanent effect, for example as a result of central bank or government interventions, and ordinary large shocks, for example as a result of market (over)reaction to various news. Conceptually we can distinguish between:

- ordinary (normally distributed) random shocks,
- (extra)ordinary large permanent random shocks ($|\epsilon_{i,t}| > 3.3\sigma_{\epsilon}$) described by a blip dummy without lags,
- intervention shocks (large permanent shocks, $|\epsilon_{i,t}| > 3.3\sigma_{\epsilon}$, related to a well-defined intervention) described by a blip dummy with lags,
- transitory large shocks and outliers (typing mistakes, etc.) described by a +/- blip dummy.

The occurrence of transitory shocks in the model, whether large or small, will produce some (usually small) residual autocorrelation in the model and, hence, violate the independence assumption of the VAR model. Because transitory shocks appear unsystematically, this problem cannot be solved by increasing the lag length of the VAR or by including a moving average term in the error process. To some extent it can be accounted for by the inclusion of transitory intervention dummies in the model.
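The cumulation effects behind (6.42) can be checked in the same mechanical way as the differencing effects of Section 6.5 (the intervention date is again arbitrary): an impulse cumulates to a level shift, a transitory pair cancels in cumulation, and a mean shift cumulates to a broken linear trend:

```python
import numpy as np

T, t0 = 12, 4
impulse = np.zeros(T); impulse[t0] = 1.0            # permanent blip dummy
transitory = np.zeros(T); transitory[[t0, t0 + 1]] = 1.0, -1.0
shift = (np.arange(T) >= t0).astype(float)          # mean-shift dummy

print(np.cumsum(impulse))     # level shift: 0,...,0,1,1,1,...
print(np.cumsum(transitory))  # single blip only: transitory shocks vanish
print(np.cumsum(shift))       # broken linear trend: 0,...,0,1,2,3,...
```

This is why only the $\alpha_{\perp}$ components of the dummy effects, the parts that enter the summations in (6.42), can generate level shifts or broken trends in $x_t$.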
But only the very large transitory shocks will generally be accounted for by dummies, and the empirical model is, therefore, likely to exhibit some minor autocorrelation in the residuals. Similar arguments as were made for the trend and the constant term in the VAR model can be made for intervention dummies. The intervention may have influenced several variables in such a way that the intervention effect is cancelled in a cointegration relation. Alternatively, the intervention may only have affected one of the variables (or several variables, but not proportionally with $\beta$), so that the effect does not disappear in a cointegration relation.

Table 1 not yet finished. To be worked out later.

³A similar distinction is between additive outliers, i.e. extraordinary shocks that are not subject to the VAR dynamics (a typical example is a typing error in the data), and innovational outliers, which after they have occurred are subject to the VAR dynamics.

6.7 An illustrative example

Consistent with the discussion above, we will now re-estimate the Danish VAR model allowing for a trend restricted to lie in the cointegration space; a shift dummy Ds83_t (= 1 for t = 1983:1, ..., 1993:4, 0 otherwise), also restricted to lie in the cointegration space; an unrestricted transitory blip dummy Dtr75_t (= 1 for t = 1975:4, −0.5 for 1976:1 and 1976:2, 0 otherwise); and a permanent blip dummy Dp83_t (= 1 for 1983:1, 0 otherwise). Dp83_t and its lagged value will be included in the model. In addition, three centered seasonal dummies are defined by Dq_t' = [Dq1_t, Dq2_t, Dq3_t], where Dqi_t = 0.75 in quarter i and −0.25 in quarters i+1, i+2, i+3. The advantage of using centered seasonal dummies is that Σ_{j=1}^{T} Dqi_j = 0 in samples covering complete years. The VAR model to be estimated is given by:

Δx_t = Γ1 Δx_{t-1} + αβ'x_{t-1} + αρ1 t + αδ0 Ds83_t + Φp.1 Dp83_t + Φp.2 Dp83_{t-1} + Φtr Dtr75_t + Φq'Dq_t + μ0 + ε_t
     = Γ1 Δx_{t-1} + αβ̃'x̃_{t-1} + Φp.1 Dp83_t + Φp.2 Dp83_{t-1} + Φtr Dtr75_t + Φq'Dq_t + μ0 + ε_t,

where β̃' = [β', ρ1, δ0] and x̃'_{t-1} = [x'_{t-1}, t, Ds83_t].
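A minimal sketch of how these dummy variables can be constructed (the sample range 1974:1-1993:4 is an assumption inferred from the figures; pandas supplies the quarterly calendar):

```python
import numpy as np
import pandas as pd

# quarterly sample, assumed to run 1974:1-1993:4 as in the Danish data set
idx = pd.period_range("1974Q1", "1993Q4", freq="Q")
T = len(idx)

# shift dummy: 1 from 1983:1 onwards
Ds83 = (idx >= pd.Period("1983Q1")).astype(float)
# permanent blip dummy: 1 in 1983:1 only
Dp83 = (idx == pd.Period("1983Q1")).astype(float)
# transitory blip dummy: 1 in 1975:4, -0.5 in 1976:1 and 1976:2
Dtr75 = np.zeros(T)
Dtr75[idx == pd.Period("1975Q4")] = 1.0
Dtr75[(idx == pd.Period("1976Q1")) | (idx == pd.Period("1976Q2"))] = -0.5

# centered seasonal dummies: 0.75 in quarter i, -0.25 in the other three quarters
Dq = np.stack([(idx.quarter == i).astype(float) - 0.25 for i in (1, 2, 3)], axis=1)

# over complete years each centered seasonal dummy sums to zero,
# and a transitory dummy sums to zero by construction
assert np.allclose(Dq.sum(axis=0), 0.0)
assert Dtr75.sum() == 0.0
```

The assertions confirm the two properties used in the text: centered seasonals do not contribute to the mean over complete years, and the transitory dummy has no permanent effect on the level.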


CHAPTER 6. DETERMINISTIC COMPONENTS

Table 6.1: Specification tests for the unrestricted VAR(2) model with dummies.

Multivariate tests:
  Ljung-Box                 χ²(425) = 450.6   p-val. 0.19
  Residual autocorr. LM1    χ²(25)  =  29.3   p-val. 0.25
  Residual autocorr. LM4    χ²(25)  =  31.8   p-val. 0.16
  Normality: LM             χ²(10)  =  17.2   p-val. 0.07

Univariate tests:
                 m^r     y^r     Δp      Rm      Rb
  ARCH(2)        5.0     1.9     2.2     4.1     0.2
  Jarq.Bera(2)   1.1     2.1     1.8    10.3     1.9
  Skewness       0.22   -0.37    0.26    0.12   -0.37
  Kurtosis       3.10    2.92    3.29    4.50    3.04

The 5 largest roots of the characteristic polynomial:
  Real      0.87    0.87    0.66    0.66    0.58
  Complex   0.02   -0.02    0.31   -0.31    0.00
  Modulus   0.88    0.88    0.73    0.73    0.58

The properties of the residuals from the estimated VAR model with dummies have now changed compared to the unrestricted model of Chapter 4. We will first have a look at the misspecification tests reported in Table 6.1. It appears that the model specification has improved to some extent. For example, the test for fourth order autocorrelation is no longer significant and the residuals from the bond rate equation now pass the normality test. However, the excess kurtosis of the residuals from the deposit rate is now further away from normality and the Jarque-Bera test clearly rejects normality. This is probably due to the fact that the calculated standard errors of the equations are now smaller and, hence, any deviations from normality will now be measured with a more precise yardstick. We note that instead of one root almost on the unit circle we now have two fairly large complex roots in which the complex part is very small. The question whether they correspond to approximate unit roots will be tested in Chapter 8.

The estimated model is:

[Δm^r_t, Δy^r_t, Δ²p_t, ΔRm_t, ΔRb_t]' = Γ̂1 Δx_{t-1} + α̂β̃'x̃_{t-1} + Φ̂p.1 Dp83_t + Φ̂p.2 Dp83_{t-1} + Φ̂tr Dtr75_t + Φ̂q'Dq_t + μ̂0 + ε̂_t.

[The full coefficient display — Γ̂1, the combined effects α̂β̃', and the dummy and seasonal coefficients with their standard errors, together with the residual correlation matrix — is not legible in this copy. The estimated residual standard deviations were σ̂_ε = (0.0218, 0.0143, 0.0130, 0.0011, 0.0014).]

Log(Lmax) = 1973.1, log|Ω̂| = −52.0, trace correlation = 0.61.

We notice that although the estimates of Π̂ and Γ̂1 have not changed much compared to the no-dummy VAR, the parameters of the implicit money demand relation in the first row of the Π̂ matrix now seem more reasonable. For example, the interest rate elasticities are now more moderately sized. The shift dummy is significant in the money stock and deposit rate equations, which is consistent with our prior hypotheses. The linear trend seems to be important in the income and inflation rate equations. The transitory VAT dummy is significant only in the income equation, and the 1983 blip dummy is significant in the money stock and the bond rate equations. The lagged 1983 dummy is not significant in any of the equations, and we conclude that the 1983 shock can be considered a large, but ordinary, shock. Most of the seasonal effects are quite insignificant. The residual correlations are almost unchanged, but the estimated standard errors have decreased quite substantially.

6.8 Conclusions

The normality assumption on ε_t is frequently not satisfied in empirical VAR models without accounting for the reforms and interventions that have produced extraordinarily large residuals. There are several possibilities:

1. The linear relationship of the VAR model does not hold for large shocks: the market reacts differently to ordinary and extraordinary shocks.

2. The linear relationship of the VAR model holds approximately, but the properties of the VAR estimates are sensitive to the presence of extraordinarily large shocks. Ordinary and extraordinary shocks are drawn from different distributions.

3. The estimates of the VAR model are generally robust to deviations from normality.

It is almost impossible to know beforehand which of the three cases is closest to the truth in a specific empirical application. It depends very much on such aspects as the length of the sample, how frequently the outliers occur, and whether positive and negative outliers occur relatively symmetrically, and so on. The best advice is to take them seriously. Neglecting the outlier problem is likely to produce unreliable results.

Chapter 7 Estimation in the I(1) Model


We assume here that the empirical VAR model describes the data satisfactorily (i.e. is data congruent), so that we can proceed by testing the reduced rank condition of Π under the assumption that Π and Γ = I − Γ1 − ... − Γ_{k-1} satisfy the I(1) condition, i.e. there are no I(2) components in the model. Section 7.1 demonstrates how the short-run dynamics can be concentrated out from the general VAR model, Section 7.2 derives the ML estimator of β and α, Section 7.3 shows how to normalize the cointegration vectors, Section 7.4 discusses the uniqueness of the unrestricted estimates, and Section 7.5 illustrates the estimation procedure.

7.1 Concentrating the general VAR-model

The I(1) condition can be stated as:

Π = αβ',   (7.1)

where α and β are p × r matrices. If r = p, then x_t is stationary and standard inference applies. If r = 0, then x_t is nonstationary and it is not possible to obtain stationary relations between the levels of the variables by linear combinations. We say that the variables do not have any common stochastic trends and, hence, cannot move together in the long run. In this case the VAR model in levels becomes a VAR model in differences (without loss of long-run information) and, since Δx_t ~ I(0), standard inference applies. If p > r > 0, then x_t ~ I(1) and there exist r directions in which the process can be made stationary by linear combinations. These are the cointegrating relations, and the question is whether they can be given an interpretation as economic steady-state relations. We consider now a VAR(k) model in ECM form with Π = αβ̃':

Δx_t = Γ1 Δx_{t-1} + ... + Γ_{k-1} Δx_{t-k+1} + αβ̃'x̃_{t-1} + ΦD_t + ε_t,   (7.2)

where t = 1, ..., T, x̃_{t-1} is p1 × 1, p1 = p + m, m is the number of deterministic components, such as the constant or the trend, and the initial values x1, ..., xk are assumed fixed. In (7.2) the time series process x_t depends on the lagged values x_{t-1}, ..., x_{t-k}. When we estimate the model based on a finite sample, it is useful to condition on the first k observations X0 = {x1, ..., xk}, i.e. to treat them as fixed known parameters. This is particularly the case when the data are nonstationary, simply because it is not meaningful to include the marginal probability of a nonstationary variable in the likelihood function. Note that when choosing the sample period, it is important to make sure that the first observations are not too far away from equilibrium. Otherwise the first initial values might generate explosive roots in the data. We use the following shorthand notation:

Z0t = Δx_t
Z1t = x̃_{t-1}
Z2t = [Δx'_{t-1}, Δx'_{t-2}, ..., Δx'_{t-k+1}, D'_t]',

and write (7.2) in the more compact form

Z0t = αβ̃'Z1t + ΨZ2t + ε_t,

where Ψ = [Γ1, Γ2, ..., Γ_{k-1}, Φ]. We will now concentrate out the short-run transitory effects, Z2t, to obtain a cleaner long-run adjustment model. To explain the idea of concentration, which is used in many different situations in econometrics, we will first illustrate its use in a multiple regression model.

A digression:
***********************************
It is well known (the Frisch-Waugh theorem) that the OLS estimate of β2.1 in the linear regression model

y_t = β1.2 x1t + β2.1 x2t + ε_t

can be obtained in two steps:

1.a. Regress y_t on x1t, obtaining the residual u1t from y_t = b̂1 x1t + u1t.
1.b. Regress x2t on x1t, obtaining the residual u2t from x2t = b̂2 x1t + u2t.
2. Regress u1t on u2t to obtain the estimate of β2.1, i.e.:

u1t = β2.1 u2t + error.

Hence, we first concentrate out the effect of x1t on both y_t and x2t, and then regress the "cleaned" y_t, i.e. u1t, on the "cleaned" x2t, i.e. u2t.
*************************************************
We now use the same idea on the VAR model. First we define the auxiliary regressions:

Z0t = B01 Z2t + R0t
Z1t = B02 Z2t + R1t   (7.3)

where B̂01 = M02 M22^{-1} and B̂02 = M12 M22^{-1} are OLS estimates and Mij = Σ_t (Zit Z'jt)/T. Thus, the Mij are the empirical counterparts of the covariance matrices Σij discussed in Chapter 3. The following scheme shows how these are defined for the VAR(k) model:

              Δx_t    x̃_{t-1}   [Δx_{t-1}, ..., Δx_{t-k+1}, D_t]
  Δx_t        M00     M01       M02
  x̃_{t-1}     M10     M11       M12
  [Δx_{t-1}, ..., D_t]  M20     M21       M22
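The two-step logic of the digression can be verified numerically. The sketch below uses invented data: x1 plays the role of the variables to be concentrated out (Z2), x2 the role of the regressor of interest, and the assertion checks that the one-step and two-step estimates of the coefficient on x2 coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
x1 = rng.normal(size=(T, 2))                                   # to be concentrated out (Z2)
x2 = x1 @ np.array([[0.5], [-0.3]]) + rng.normal(size=(T, 1))  # regressor of interest (Z1)
y = 2.0 * x2 + x1 @ np.array([[1.0], [0.5]]) + 0.1 * rng.normal(size=(T, 1))

def ols(Y, X):
    """OLS coefficient matrix B in Y = X B + U."""
    return np.linalg.lstsq(X, Y, rcond=None)[0]

# one step: coefficient on x2 in the full regression y = x1 b1 + x2 b2 + e
b_full = ols(y, np.hstack([x1, x2]))[2, 0]

# two steps (Frisch-Waugh): concentrate x1 out of y and x2, then residual on residual
u1 = y - x1 @ ols(y, x1)        # "cleaned" y   (R0 in the VAR notation)
u2 = x2 - x1 @ ols(x2, x1)      # "cleaned" x2  (R1 in the VAR notation)
b_two_step = ols(u1, u2)[0, 0]

assert np.isclose(b_full, b_two_step)
```

The same algebra applied to Z0t, Z1t, Z2t is exactly what produces R0t and R1t in (7.3).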

The concentrated model

R0t = αβ̃'R1t + error   (7.4)

is important for understanding both the statistical and economic properties of the VAR model. In the form (7.4) we have transformed the original "messy" VAR, containing short-run adjustment and intervention effects, into the "baby model" form, in which the adjustment exclusively takes place towards the long-run steady-state relations. This means that we have transformed the "dirty" empirical model not only into a nice statistical model but also into a more interpretable economic form.

7.2 Derivation of the ML estimator

Consider the concentrated model (7.4):

R0t = αβ'R1t + ε_t, t = 1, ..., T,

where ε_t ~ Np(0, Ω). The ML estimator is derived in two steps. First we assume that β is known and derive an estimator of α under the assumption that β'R1t is a known variable. Then we insert α = α̂(β) in the expression for the maximum of the likelihood function, so that it becomes a function of β, but not of α. We then find the value of β that maximizes the likelihood function. When we have found the ML estimator β̂ we can then find α̂ = α̂(β̂).

Step 1. The ML estimator of α given β corresponds to the standard LS estimator. It can be derived by post-multiplying (7.4) with R'1t and dropping the error term:

R0t R'1t = αβ'R1t R'1t.

Summing over t and dividing by T gives:

S01 = α̂β'S11,

where Sij = T^{-1} Σ_t Rit R'jt = Mij − Mi2 M22^{-1} M2j. It is now easy to derive the least squares estimator of α as a function of β:

α̂(β) = S01 β(β'S11 β)^{-1}   (7.5)

Step 2. Given the assumption of multivariate normality, the maximum of the likelihood function of (7.4) is, for fixed α and β, determined by the determinant of the error covariance matrix:

L_max^{-2/T}(α, β) = |Ω(α, β)| × constant terms,

where

Ω(α, β) = T^{-1} Σ (R0t − αβ'R1t)(R0t − αβ'R1t)'   (7.6)
        = T^{-1} (Σ R0t R'0t − Σ R0t R'1t βα' − αβ' Σ R1t R'0t + αβ' Σ R1t R'1t βα')
        = S00 − S01 βα' − αβ'S10 + αβ'S11 βα'   (7.7)

By substituting (7.5) in (7.7) we can express the error covariance matrix as a function exclusively of β:

Ω(β) = S00 − S01 β(β'S11 β)^{-1} β'S10 − S01 β(β'S11 β)^{-1} β'S10
       + S01 β(β'S11 β)^{-1} [β'S11 β(β'S11 β)^{-1}] β'S10,

where the factor in square brackets equals I. Hence:

Ω(β) = S00 − S01 β(β'S11 β)^{-1} β'S10   (7.8)
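Both the collapse of (7.7) into (7.8) at α = α̂(β), and the determinant factorization used in the next step of the derivation, can be checked numerically; the R0, R1 and β below are random stand-ins with no econometric content:

```python
import numpy as np

rng = np.random.default_rng(2)
T, p, r = 300, 4, 2
R0 = rng.normal(size=(T, p))
R1 = rng.normal(size=(T, p))
beta = rng.normal(size=(p, r))        # an arbitrary candidate beta

S00 = R0.T @ R0 / T
S11 = R1.T @ R1 / T
S01 = R0.T @ R1 / T
S10 = S01.T

Bi = np.linalg.inv(beta.T @ S11 @ beta)
alpha = S01 @ beta @ Bi               # alpha_hat(beta), eq. (7.5)

# (7.7) evaluated at alpha_hat(beta) ...
omega_77 = (S00 - S01 @ beta @ alpha.T - alpha @ beta.T @ S10
            + alpha @ (beta.T @ S11 @ beta) @ alpha.T)
# ... collapses to the concentrated form (7.8)
omega_78 = S00 - S01 @ beta @ Bi @ beta.T @ S10
assert np.allclose(omega_77, omega_78)

# determinant factorization: |Omega(beta)| = |S00| |beta'(S11 - S10 S00^{-1} S01) beta| / |beta' S11 beta|
lhs = np.linalg.det(omega_78)
rhs = (np.linalg.det(S00)
       * np.linalg.det(beta.T @ (S11 - S10 @ np.linalg.inv(S00) @ S01) @ beta)
       / np.linalg.det(beta.T @ S11 @ beta))
assert np.isclose(lhs, rhs)
```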

The ML estimator of β is given by the estimate that minimizes |Ω(β)|. To derive the estimator we use the following result for a partitioned matrix:

| A   B |
| B'  C | = |A| |C − B'A^{-1}B| = |C| |A − BC^{-1}B'|   (7.9)

where A and C are nonsingular square matrices. Now substitute

A = S00
C = β'S11 β
B = S01 β

in (7.9), resulting in:

|S00| |β'S11 β − β'S10 S00^{-1} S01 β| = |β'S11 β| |S00 − S01 β(β'S11 β)^{-1} β'S10|,

where the last factor is |Ω(β)|. Hence,

|Ω(β)| = |S00| |β'(S11 − S10 S00^{-1} S01)β| / |β'S11 β|.

Using the result that the function

f(X) = |X'MX| / |X'NX|   (7.10)

is minimized by solving the eigenvalue problem |λN − M| = 0, we can obtain a solution for β that minimizes |Ω(β)|. We first substitute

M = S11 − S10 S00^{-1} S01
N = S11
X = β

in (7.10) to formulate the eigenvalue problem, the solution of which gives the estimates of β:

|λ̃S11 − S11 + S10 S00^{-1} S01| = 0,

or, equivalently, with λ = 1 − λ̃:

|λS11 − S10 S00^{-1} S01| = 0   (7.11)

The solution gives p eigenvalues λ1, ..., λp and p eigenvectors v1, ..., vp, and we can now express the determinant of the residual covariance matrix as:

|Ω̂| = |S00| ∏_{i=1}^{p} (1 − λi)   (7.12)

The cointegration vectors v̂_i'x_t are not yet normalized on a variable, and in the next section we will discuss how to choose an appropriate normalization for each vector. The normalized vectors will be called β̂_i to distinguish them from the non-normalized vectors. Note that the relations β̂_i'x_t are ordered according to λ1 > ... > λp > 0, and that the magnitude of λi is a measure of the "stationarity" of the corresponding β̂_i'x_t. The next chapter will discuss how to classify the p relations into r stationary relations corresponding to the r largest eigenvalues and p − r nonstationary relations corresponding to the p − r smallest eigenvalues.
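The whole estimation step — forming S00, S11, S01 from the concentrated data and solving (7.11) as a symmetric eigenproblem via a Cholesky factorization of S11 — can be sketched in numpy. The simulated data-generating process (one cointegrating relation in a three-variable system), the seed and all parameter values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
T, p = 500, 3
beta_true = np.array([1.0, -1.0, 0.0])    # one cointegrating vector (invented DGP)
alpha_true = np.array([-0.2, 0.1, 0.0])   # adjustment coefficients

x = np.zeros((T, p))
for t in range(1, T):
    x[t] = x[t - 1] + alpha_true * (beta_true @ x[t - 1]) + rng.normal(scale=0.5, size=p)

# concentrated data for a VAR(1): only the mean is concentrated out
R0 = np.diff(x, axis=0); R0 = R0 - R0.mean(axis=0)
R1 = x[:-1] - x[:-1].mean(axis=0)
Tn = len(R0)
S00, S11, S01 = R0.T @ R0 / Tn, R1.T @ R1 / Tn, R0.T @ R1 / Tn
S10 = S01.T

# solve |lambda*S11 - S10 S00^{-1} S01| = 0: with S11 = L L' this becomes a
# symmetric eigenproblem for L^{-1} S10 S00^{-1} S01 L'^{-1}
L = np.linalg.cholesky(S11)
M = np.linalg.solve(L, S10 @ np.linalg.solve(S00, S01)) @ np.linalg.inv(L).T
lam, W = np.linalg.eigh(M)                 # ascending order
lam, W = lam[::-1], W[:, ::-1]             # reorder: lambda_1 > ... > lambda_p
V = np.linalg.solve(L.T, W)                # eigenvectors, normalized so V' S11 V = I

# (7.12): with full rank, |Omega_hat| = |S00| * prod(1 - lambda_i)
omega = S00 - S01 @ V @ V.T @ S10
assert np.isclose(np.linalg.det(omega), np.linalg.det(S00) * np.prod(1 - lam))
assert np.all(lam > -1e-10) and np.all(lam < 1)
```

The last assertion reflects the fact that the λi are squared (canonical) correlations and therefore lie between 0 and 1.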

7.3 Normalization

To be able to interpret a cointegration relation as a relation primarily associated with a particular economic variable, we need to normalize it by setting the coefficient of that variable to unity. This is similar to what we do in a regression model, x1t = θ0 + θ1 x2t + θ3 x3t + u_t, when we choose one of the variables, x1t, to be the dependent variable, i.e. to have a unitary coefficient. In a regression model with stochastic variables it might happen that we choose the wrong variable to be the dependent variable. This would be the case if it turns out that the regression x2t = θ̃0 + θ̃1 x1t + θ̃3 x3t + ũ_t gives more interpretable coefficient estimates and improved statistical properties.

In an analogous manner, the choice of normalization of a cointegrating relation should make sense economically as well as statistically. For example, normalizing on an insignificant or irrelevant coefficient does not make sense; normalizing on money stock, say, in a relation describing real income seems a bad choice. There is, however, an important difference between a regression model and a cointegration relation. Normalizing on either x1t or x2t in the regression model generally changes the estimates of the regression coefficients, whereas in a cointegration relation the ratios between the coefficients are the same independently of the chosen normalization. In this sense the coefficient estimates in a cointegration relation are more canonical.
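The invariance of coefficient ratios (and of the contribution w_i v_i' to Π̂) under normalization is a one-line computation. Below, the eigenvector is the second row of Table 7.1, while the weights w are invented numbers — only the algebra matters here:

```python
import numpy as np

# non-normalized eigenvector v (second row of Table 7.1) and a vector of
# loadings w (one per equation; these numbers are made up for illustration)
v = np.array([-25.2, 25.5, 17.2, 327.1, -350.9, 3.0, 0.0])
w = np.array([0.01, -0.02, 0.003, 0.001, -0.004])

beta = v / v[0]        # normalize on the first variable: beta_i = v_i / v_ij
alpha = w * v[0]       # compensating scaling of the weights: alpha_i = w_i * v_ij

# coefficient ratios are unchanged by the normalization ...
assert np.isclose(v[1] / v[3], beta[1] / beta[3])
# ... and so is the contribution of this vector to Pi: w v' = alpha beta'
assert np.allclose(np.outer(w, v), np.outer(alpha, beta))
```

In a regression model, by contrast, swapping the dependent variable changes the estimated coefficients, not just their scale.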

7.4 The uniqueness of the unrestricted estimates

For a given choice of the number of stationary cointegrating relations, r, the Johansen procedure gives the maximum likelihood estimates of the unrestricted cointegrating relations β̂'x_t. How to determine r will be discussed in the next chapter. In Chapter 6 we gave these estimates a first tentative interpretation in terms of underlying steady-state relations. In this section we will discuss whether such an interpretation is at all meaningful. The unrestricted estimates of α and β are calculated given the following conditions:

1. Stationarity, i.e. β̂'x_t ~ I(0).
2. Conditional independence of the β̂_j'x_t, i.e. β̂'S11 β̂ = I, where S11 was defined at the beginning of this chapter.
3. The ordering given by the maximal conditional correlation with the stationary part of the process, Δx_t.

Given the above criteria, the unrestricted cointegrating relations are uniquely determined and possibly meaningful if the criteria can be considered relevant. Because it happens quite frequently that the unrestricted cointegration relations are interpretable, it may be of some interest to discuss whether the three conditions (or rather the last two, since stationarity is clearly mandatory) are reasonable from an economic point of view. The conditional independence condition is a consequence of the chosen eigenvalue normalization, β̂'S11 β̂ = I, as a result of analyzing the likelihood

function. It is a purely statistical condition and is arbitrary in the sense that we could have chosen another normalization. Because the conditional orthogonality condition surprisingly often seems to produce economically interpretable relations, it is tempting to look for some regularity in macroeconomic behavior which could be associated with this purely statistical condition. If the empirical problem concerns macroeconomic behavior in a market where equilibrating forces are allowed to work without binding restrictions, at least in the long run, one would generally expect two types of agents with disparate goals interacting in such a way that equilibrium is restored once it has been violated. These can be demanders versus suppliers, producers versus consumers, employers versus employees, etc. Thus, if we choose a sufficiently rich set of variables for the VAR analysis, economic theory would generally suggest at least two (but often more) stationary long-run relationships. The question is whether we should expect them to be conditionally independent. A somewhat heuristic guess is that the conditional independence may produce empirically interpretable relations when the VAR model contains sufficiently many variables to identify these hypothetical long-run relations. For example, to be able to empirically identify a long-run demand and a supply relation among the unrestricted cointegration relations, there should be at least one variable which strongly influences the demand behavior but is unrelated to the supply behavior, and vice versa. This is, of course, the basic idea behind identification, which will be discussed in great detail in Chapter 10, where we discuss how to obtain unique estimates without the need to impose the condition of conditional independence. The third statistical criterion, the maximal correlation with the stationary part of the process, does not seem easily interpretable as a meaningful economic criterion.
Therefore, even if a direct interpretation of the unrestricted cointegration vectors is sometimes possible, the results should be considered indicative rather than conclusive, and cannot replace formal testing of structural hypotheses.

7.5 An illustration

Solving the eigenvalue problem (7.11) for the Danish data produced the five eigenvalues and the corresponding eigenvectors reported in the first part of Table 7.1. The eigenvectors are calculated based on the normalization v̂'S11 v̂ = I. The ordering is based on the magnitude of λi, so that the first relation v̂1'x_t is most strongly correlated with the stationary part of the process. The squared canonical correlation coefficient λ1 = 0.58 corresponds to a correlation coefficient of √0.58 = 0.76. We note that the last eigenvalue λ5 is quite close to zero. The question is when the value of λi is small enough not to be significantly different from zero. The next chapter will deal with this important and difficult issue.

For each eigenvector v_i there is a corresponding vector of weights (loadings) w_i = S01 v_i satisfying Σ_{i=1}^{p} w_i v_i' = Π̂. The coefficients of the eigenvectors v_i reported in Table 7.1 are generally quite large and the coefficients of the weights w_i correspondingly small. Without an adequate normalization it is hard to see what they mean, and the first task is to normalize¹ the eigenvectors by an element v_ij as follows:

β_i = v_i v_ij^{-1}, i = 1, ..., p,
α_i = w_i v_ij, i = 1, ..., p.

To distinguish between the non-normalized and the normalized vectors, we use the notation v_i and w_i for the former and β_i and α_i for the latter. The normalized vectors are reported in the middle part of Table 7.1. The first vector has been normalized on Δp, the second on m^r, the third on Rm, the fourth on y^r, and finally the fifth on Rb. The choice of normalization element can be made arbitrarily, but to be able to interpret the results one needs to normalize on a variable that is representative of the relation. For example, normalizing on Δp in the first relation means that the first relation should, in some vague sense, describe a relation for the inflation rate (with significant equilibrium correction in the inflation rate equation). The first choice of normalization is tentative by nature and will often change as a result of a more detailed inspection of the results. The estimated unrestricted cointegration vectors may or may not make economic sense. As already mentioned, they are uniquely defined based on (1) the ordering of the λi and (2) the choice of eigenvector normalization v̂'S11 v̂ = I, both of which are statistical criteria without any obvious economic interpretation. Nevertheless, we will illustrate below that a careful inspection of the first estimation results can often be crucial for a successful completion of the empirical exercise.
¹We will further discuss the normalization of cointegration vectors in Section 9.1.

Table 7.1: Estimated eigenvalues, eigenvectors, and loadings for the Danish data

Non-normalized eigenvectors v_i' (columns: m^r, y^r, Δp, Rm, Rb, Ds83, trend):
  λ1 = 0.58   v1':   -1.5    -7.2  -122.7   199.5   -82.6    0.3   -0.0
  λ2 = 0.36   v2':  -25.2    25.5    17.2   327.1  -350.9    3.0    0.0
  λ3 = 0.26   v3':    4.6     2.7    24.2   556.9  -208.8   -2.9    0.0
  λ4 = 0.12   v4':  -13.3    40.7     5.1    68.2   -64.6   -1.7    0.0
  λ5 = 0.06   v5':  -10.2    19.9     3.8   183.7  -339.9   -3.8   -0.0

Normalized eigenvectors β_i':
  β1':   0.01    0.06    1.00    -1.63     0.67   -0.00    0.00
  β2':   1.00   -1.01   -0.68   -12.97    13.92   -0.12   -0.00
  β3':   0.01    0.00    0.04     1.00    -0.37   -0.01    0.00
  β4':  -0.20    0.60    0.07     1.00    -0.95   -0.03    0.00
  β5':   0.03   -0.06   -0.01    -0.54     1.00    0.01    0.00

[The remaining panels of the table — the weights α̂_i to the eigenvectors and the combined effects Π̂ = α̂β̂', with t-values in parentheses — are not legible in this copy.]
To be able to discriminate between significant and less significant α_i coefficients, we have reported the least squares standard errors of the estimates. Note, however, that the t-values are distributed as Student's t only if the corresponding β̂_i'x_t is stationary. If this is not the case, then a Dickey-Fuller type distribution is probably more appropriate. Based on the estimated coefficients we note that (i) the first relation is significantly adjusting only in the inflation rate equation, (ii) the second relation only in the money stock equation, (iii) the third relation only in the deposit rate equation, (iv) the fourth relation only in the real income equation, but with a t-value which would hardly be significant based on a Dickey-Fuller distribution, and (v) the last relation is not really important in any equation. Since a stationary variable, Δx_t, cannot be significantly explained by a nonstationary variable, this is a sign of nonstationarity of the last two eigenvectors. The graphs in Figures 7.1-7.5 of β̂_i'x_t and β̂_i'R1t, where R1t is derived from the concentrated model (7.4), seem to support this interpretation. The finding that the cointegration vectors are significant in just one equation each is a fortunate (and atypical) situation. In this case we already know from the outset of the cointegration analysis that three of the variables, real money, inflation and the deposit rate, are equilibrium error correcting, whereas the remaining two, real income and the bond rate, are not. This is supported by the finding of no significant coefficients in the fifth row of the Π̂ matrix, corresponding to the bond rate equation, and only a significant trend coefficient in the second row, corresponding to the income equation. As a zero row of α is the condition for weak exogeneity with respect to the long-run parameters β, this tentative finding suggests that real income and the bond rate are weakly exogenous in this model, an issue that will be further discussed in Chapter 9.
Because each cointegration relation β̂_i'x_t was found to be important in just one equation, we will tentatively try to interpret them as potential steady-state relations in the respective equations. The first relation seems approximately to describe a relation between the real deposit rate and the interest rate spread:

(Rm − Δp) = 0.6(Rb − Rm) + ...   (7.13)

It was found to be important in the inflation rate equation, where the sign of the adjustment coefficient suggests equilibrium error correction. Hence, the inflation rate seems to have adjusted upwards when the real short-term interest rate has been above 0.6 times the long-short interest rate spread.

The second relation resembles a typical money demand relation, where the opportunity cost of holding money is measured by the spread between the bond rate and the deposit rate:

m^r = y^r − 13.5(Rb − Rm) + ...   (7.14)

Only money stock is significantly adjusting to this relation, with a negative sign of the coefficient α12. This suggests that money stock is equilibrium error correcting to agents' demand for money. The third relation seems to describe an interest rate relation:

Rm = 0.4Rb + ...   (7.15)

Only the deposit rate seems to be significantly adjusting to this relation. Again the coefficient suggests that it is equilibrium error correcting. The fourth relation, which is probably nonstationary, seems most important for the real income equation. Normalizing on real income gives the following relation:

y^r = 0.3m^r − 1.4(Rm − Rb) + ...   (7.16)

which resembles an IS-curve relationship with positive real money effects. The last relation seems definitely nonstationary and we will not attempt to interpret it.

It has been debated whether it is at all meaningful even tentatively to interpret the estimated eigenvectors; in some cases the unrestricted relations are clearly not interpretable, and it does not make sense to attempt to do so. But surprisingly often the first unrestricted estimates give a rough picture of the basic long-run information in the data. The latter can subsequently be used to facilitate the identification of an acceptable structure of cointegration relations, as argued below.

If we assume that the cointegration rank is three in the above example, then the first three eigenvectors (7.13)-(7.15) define stationary relations (provided that the coefficients we tentatively set to zero were in fact zero). This can be formally tested based on the LR test procedure discussed in Chapter 9 and, if accepted, we have identified three tentatively interpretable cointegration vectors spanning the cointegration space.

Assume now that the economic model we had in mind contained two steady-state relations, a money demand relation and an aggregate income relation, but, due to a ceteris paribus assumption, no prior relation for the inflation rate and the interest rates. How should we proceed after this first inspection of the results? In my view, already at this stage we need to adjust our intuition of how the economic and the empirical model work together. One possibility is to go back to the economic model and see whether it is possible to understand the long-run weak exogeneity of the real income variable and the long-term bond rate, and whether it is possible to make inflation and the deposit rate enter the model. In some cases the economic model needs only minor modifications; in other cases it needs more fundamental changes. Another possibility is to reconsider the choice of variables and find out whether an extended empirical model would be more consistent with the chosen economic model. For example, in the above example we found that the fourth relation, though probably nonstationary, exhibited coefficients which resembled an income relation. This could suggest that there is an important omitted I(1) variable, for example the real exchange rate, which is needed for the income relation to become stationary. Thus, a first tentative inspection of the empirical results might at an early stage of the analysis suggest how to modify either your empirical or your economic model. This is in my view one way of translating Haavelmo's .... The other alternative, which is to force your economic model onto the data, i.e. squeezing the reality into all-too-small-size clothes, is a too frustrating experience which all too often makes the desperate researcher choose solutions which are not scientifically justified.

The last part of Table 7.1 reports the estimates of the unrestricted Π̂ based on full rank. First note that β̂_i'x̃_t, i = 1, ..., p, defines a stationary relation only when the corresponding α̂_i ≠ 0, where α and β̃ are p × r and (p + m) × r matrices and m is the number of deterministic components restricted to the cointegration space. Therefore, the t-values in brackets cannot be interpreted as Student's t, because some of the β̂'x_t relations are nonstationary.

Figure 7.1. The first cointegration relation β̂1'x_t (upper panel) and β̂1'R1t corrected for short-run effects (lower panel).

Figure 7.2. The second cointegration relation β̂2'x_t (upper panel) and β̂2'R1t corrected for short-run effects (lower panel).

Figure 7.3. The third cointegration relation β̂3'x_t (upper panel) and β̂3'R1t corrected for short-run effects (lower panel).

Figure 7.4. The fourth cointegration relation β̂4'x_t (upper panel) and β̂4'R1t corrected for short-run effects (lower panel).

Figure 7.5. The fifth cointegration relation β̂5'x_t (upper panel) and β̂5'R1t corrected for short-run effects (lower panel).

Chapter 8 Determination of Cointegration Rank


Section 8.1 gives the basic results for the derivation of the likelihood ratio test of the cointegration rank and discusses whether there is an optimal sequence for these tests. Section 8.2 discusses the derivation of the asymptotic tables and how the presence of deterministic components influences them. Section 8.3 discusses the difficult choice of the cointegration rank in a practical situation, Section 8.4 provides an empirical illustration, and Section 8.5 reports on some diagnostic tools for checking parameter constancy and illustrates with an analysis of the Danish data.

8.1 The LR test for cointegration rank

The LR test for the cointegration rank r is based on the VAR model in the R-form (??), where all short-run dynamics, dummies and other deterministic components have been concentrated out. Using (??) and (??) we can write the log likelihood function as:
−2lnL(θ̂) = T ln|S00| + T Σ_{i=1}^{p} ln(1 − λ̂i),   (8.1)

where the eigenvalues λ̂i are found by solving the determinant equation

|λS11 − S10 S00⁻¹ S01| = 0,   (8.2)

giving the solutions (λ̂1, λ̂2, ..., λ̂p). The eigenvalues λ̂i can be interpreted as the squared canonical correlations between a linear combination of the levels, β̂i'R1t, and a linear combination of the differences, R0t. In this sense the magnitude of λi is an indication of how strongly the linear relation β̂i'R1t is correlated with the stationary part of the process R0t. Another way of expressing this is by noticing that diag(λ̂1, ..., λ̂r) = β̂'S10 S00⁻¹ S01 β̂, i.e. λ̂i is related to the estimated αi. When λi = 0 the linear combination βi'xt is nonstationary and there is no equilibrium correction, i.e. αi = 0. The statistical problem is to derive a test procedure to discriminate between those λi, i = 1, ..., r, which correspond to stationary relations and those λi, i = r + 1, ..., p, which correspond to nonstationary relations. Because λ̂i = 0, i = r + 1, ..., p, does not change the likelihood function, the maximum is exclusively a function of the non-zero eigenvalues:

L_max^{−2/T} = |S00| Π_{i=1}^{r} (1 − λ̂i).   (8.3)
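As a computational aside (not part of the original text), the eigenvalue problem in (8.2) reduces to an ordinary symmetric eigenvalue problem via a Cholesky factorization of S11; a minimal sketch in Python, with illustrative variable names:

```python
import numpy as np

def reduced_rank_eigenvalues(R0, R1):
    """Solve |lambda*S11 - S10 S00^(-1) S01| = 0 for the eigenvalues
    (squared canonical correlations) of the concentrated model.
    R0, R1 are (T x p) arrays of concentrated residuals."""
    T = R0.shape[0]
    S00 = R0.T @ R0 / T
    S01 = R0.T @ R1 / T            # note S10 = S01.T
    S11 = R1.T @ R1 / T
    # Reduce to a symmetric problem: with S11 = L L', the roots are the
    # eigenvalues of L^(-1) S10 S00^(-1) S01 L'^(-1).
    L = np.linalg.cholesky(S11)
    Li = np.linalg.inv(L)
    M = Li @ S01.T @ np.linalg.solve(S00, S01) @ Li.T
    lam = np.linalg.eigvalsh(M)[::-1]     # descending order
    return np.clip(lam, 0.0, 1.0)
```

Since the λ̂i are squared canonical correlations, each lies in [0, 1]; if R0 and R1 are perfectly correlated the routine returns all ones.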

Based on (8.3) it is straightforward to derive a likelihood ratio test for the determination of the cointegration rank r, which involves the following hypotheses:

H(p): rank = p, i.e. no unit roots, xt is stationary,
H(r): rank = r, i.e. p − r unit roots, r cointegration relations, xt is nonstationary.

The LR test, the so-called trace test, is found as:

−2lnQ(H(r)|H(p)) = T ln{ [|S00|(1 − λ̂1)(1 − λ̂2)⋯(1 − λ̂r)] / [|S00|(1 − λ̂1)(1 − λ̂2)⋯(1 − λ̂r)⋯(1 − λ̂p)] }

τ_{p−r} = −T [ln(1 − λ̂_{r+1}) + ⋯ + ln(1 − λ̂p)].   (8.4)

As an illustration consider a VAR with p = 5 variables, in which we test the hypothesis H2: r = 2, i.e. p − r = 3, against H5: r = 5, i.e. p − r = 0. The test value is calculated as:

−2lnQ(H2|H5) = T ln{ [|S00|(1 − λ̂1)(1 − λ̂2)] / [|S00|(1 − λ̂1)(1 − λ̂2)(1 − λ̂3)(1 − λ̂4)(1 − λ̂5)] }

τ3 = −T {ln(1 − λ̂3) + ln(1 − λ̂4) + ln(1 − λ̂5)},
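The arithmetic of (8.4) is easy to sketch in code; the eigenvalues and sample size below are invented for illustration, not taken from the text:

```python
import numpy as np

def trace_statistic(eigenvalues, T, r):
    """LR trace test of H(r) against H(p):
    tau_{p-r} = -T * sum_{i=r+1}^{p} ln(1 - lambda_i).
    `eigenvalues` must be sorted in descending order."""
    lam = np.asarray(eigenvalues, dtype=float)
    return -T * np.sum(np.log(1.0 - lam[r:]))

# Hypothetical eigenvalues for a p = 5 system with T = 100 observations:
lam = [0.40, 0.25, 0.10, 0.05, 0.02]
tau = trace_statistic(lam, T=100, r=2)   # tests 3 unit roots; approx. 17.7
```

The statistic is then compared with the critical value for p − r = 3 unit roots from the appropriate asymptotic table.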


i.e. the LR test is a test of λ3 = λ4 = λ5 = 0, corresponding to three unit roots in the model. If this hypothesis is correct then the test statistic should be small when compared to some critical value derived under the assumption that λ3 = λ4 = λ5 = 0. Note, however, that H2, i.e. (λ3 = λ4 = λ5 = 0), is correctly accepted also when λ2 = 0 or λ2 = λ1 = 0. Therefore, if H2 is accepted, we conclude that there are at least 3 unit roots and, hence, at most two stationary relations.

Assume that we have a prior hypothesis of the correct number of common trends p − r*, i.e. r* cointegrating relations. We could then calculate the test statistic τ_{p−r*} using (8.4) and compare it with the appropriate critical value C_{p−r*}, to be discussed in the next section. If τ_{p−r*} > C_{p−r*}, we reject the hypothesis of p − r* unit roots (common trends) in the model, and conclude that they are fewer than assumed. If τ_{p−r*} < C_{p−r*}, we accept the hypothesis of at least p − r* unit roots in the model, but conclude there may be more. Hence, the trace test (8.4) does not give us the exact number of unit roots p − r (or cointegration relations r). It only tells us whether p − r < p − r* (r > r*) when τ_{p−r*} > C_{p−r*}, or, alternatively, p − r ≥ p − r* (r ≤ r*) when τ_{p−r*} ≤ C_{p−r*}. Therefore, to estimate the value of r we have to perform a sequence of tests. The question is whether this sequence should run from top to bottom, i.e. {r = 0, p unit roots}, {r = 1, p − 1 unit roots}, ..., {r = p, 0 unit roots}, or the other way around.

The asymptotic tables are determined so that when H(r) is true then P_{p−r}(τ_{p−r} ≤ C95%(p − r)) = 95%, where τ_{p−r} is given by:

τ_{p−r} = −2lnQ(H(r)|H(p)) = −T Σ_{i=r+1}^{p} ln(1 − λ̂i).   (8.5)

We discuss first the top→bottom procedure and then compare it with the bottom→top procedure, based on a simple example where p = 3. Applying the top→bottom trace test procedure can hypothetically produce four different choices of the cointegration rank r̂ and, hence, of the number of unit roots (p − r̂):

{p − r̂ = 3, r̂ = 0} when {τ3 ≤ C3},
{p − r̂ = 2, r̂ = 1} when {τ3 > C3, τ2 ≤ C2},
{p − r̂ = 1, r̂ = 2} when {τ3 > C3, τ2 > C2, τ1 ≤ C1},
{p − r̂ = 0, r̂ = 3} when {τ3 > C3, τ2 > C2, τ1 > C1}.


We will now illustrate the properties of the top→bottom sequence by investigating P_{r*}(r̂ = i), i = 0, ..., 3, when the true value is r* = 1, i.e. p − r* = 2, and the size of the test is 5%. First, P_{r*}(r̂ = 0) = P_{r*}(τ3 ≤ C3), where

τ3 = −T {ln(1 − λ̂1) + ln(1 − λ̂2) + ln(1 − λ̂3)}.

For λ1 > 0, we have that −T ln(1 − λ̂1) → ∞ when T → ∞. Thus, P_{r*}(τ3 ≤ C3) → 0 asymptotically. The next value, p − r̂ = 2, corresponds to r = 1, the true cointegration number, and P_{r*}(τ2 ≤ C2) → 0.95 in accordance with the way the critical tables have been constructed. Thus, asymptotically P_{r*}(p − r̂ = 1) ≤ 0.05 and P_{r*}(p − r̂ = 0) ≤ 5%. We summarize:

P1(p − r̂ = 3, r̂ = 0) = P1(τ3 ≤ C3) → 0
P1(p − r̂ = 2, r̂ = 1) = P1(τ3 > C3, τ2 ≤ C2) → 0.95
P1(p − r̂ = 1, r̂ = 2) = P1(τ3 > C3, τ2 > C2, τ1 ≤ C1) → 0.05
P1(p − r̂ = 0, r̂ = 3) = P1(τ3 > C3, τ2 > C2, τ1 > C1) → p0 < 0.05

Thus, by applying the top→bottom procedure we will asymptotically accept the correct value of r in 95% of all cases, which is exactly what we would like a 5% test procedure to do.

We will now similarly investigate the bottom→top procedure. For p = 3 there are the following four different choices of the cointegration rank:

{p − r̂ = 0, r̂ = 3} when {τ1 > C1},
{p − r̂ = 1, r̂ = 2} when {τ1 ≤ C1, τ2 > C2},
{p − r̂ = 2, r̂ = 1} when {τ1 ≤ C1, τ2 ≤ C2, τ3 > C3},
{p − r̂ = 3, r̂ = 0} when {τ1 ≤ C1, τ2 ≤ C2, τ3 ≤ C3}.

In this case we start by testing p − r = 0, i.e. stationarity, and if rejected continue with p − r = 1, and so on until first acceptance:

P1(p − r̂ = 0, r̂ = 3) = P(τ1 > C1) → p0 < 0.05
P1(p − r̂ = 1, r̂ = 2) = P(τ1 ≤ C1, τ2 > C2) → 0.05
P1(p − r̂ = 2, r̂ = 1) = P(τ1 ≤ C1, τ2 ≤ C2, τ3 > C3) → 0.95
P1(p − r̂ = 3, r̂ = 0) = P(τ1 ≤ C1, τ2 ≤ C2, τ3 ≤ C3) → 0


In this case the probability of wrongly accepting r̂ = 2 or r̂ = 3 is 0.05 + p0 > 0.05. Hence, the probability of choosing the correct value r = 1 is smaller than 0.95. The reason why the top→bottom procedure is asymptotically more correct is that the probability of incorrectly accepting r̂ < r is asymptotically zero, whereas the probability of incorrectly accepting r̂ > r in the bottom→top procedure is generally greater than the chosen p-value.

8.2 The asymptotic tables and the deterministic components

The distribution of the likelihood ratio test statistic (8.4) is nonstandard and has been determined by simulation for the asymptotic case. The asymptotic distributions depend on the deterministic terms in the VAR model, as shown in Johansen (1995c) where a detailed treatment can be found. Here we will only give the intuition for how the distributions have been derived and how they are affected by deterministic components in the VAR model.

The LR test statistic of the hypothesis H(r) against H(p) is given by (8.4). Under the null of p − r unit roots the last p − r eigenvectors vi, i = r + 1, ..., p, should behave like random walks. Therefore, if the null hypothesis is correct then the calculated trace test statistic, −T Σ_{i=r+1}^{p} ln(1 − λ̂i), should not deviate significantly from the simulated test values C. However, it can be shown that the asymptotic distribution of −2lnLR{H(r)|H(p)} is the same as that of −2lnLR{H(0)|H(p − r)}. Therefore, the asymptotic tables have been simulated for (p − r)-dimensional VAR models where r = 0. Under the null hypothesis of p − r unit roots, the following approximation can be used:

−T Σ_{i=1}^{p−r} ln(1 − λ̂i) ≈ T Σ_{i=1}^{p−r} λ̂i = T trace(S11⁻¹ S10 S00⁻¹ S01).   (8.6)

Without loss of generality we let S00 = I and hence:

Σ_{i=1}^{p−r} λ̂i = trace(S11⁻¹ S10 S01) = trace(S01 S11⁻¹ S10).   (8.7)
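The simulation recipe behind the tables can be sketched in a few lines of Python. This is a hedged illustration only: the dimension, sample length and number of replications below are chosen small for speed, not the long-sample, many-replication design behind the published tables, and S00 is estimated rather than fixed at I:

```python
import numpy as np

def simulate_trace_distribution(p_minus_r, T=200, n_rep=500, seed=42):
    """Monte Carlo sketch of the asymptotic trace distribution for Case 1
    (no deterministic terms): generate a (p-r)-dimensional random walk,
    form S00, S01, S11 from R0t = eps_t and R1t = x_{t-1}, and compute
    T * trace(S00^(-1) S01 S11^(-1) S10) as in (8.6)-(8.7)."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_rep)
    for j in range(n_rep):
        eps = rng.standard_normal((T, p_minus_r))       # R0t
        x = np.cumsum(eps, axis=0)
        R1 = np.vstack([np.zeros(p_minus_r), x[:-1]])   # R1t = x_{t-1}
        S00 = eps.T @ eps / T
        S01 = eps.T @ R1 / T
        S11 = R1.T @ R1 / T
        stats[j] = T * np.trace(np.linalg.solve(S00, S01)
                                @ np.linalg.solve(S11, S01.T))
    return stats

stats = simulate_trace_distribution(p_minus_r=3)
quantiles = np.quantile(stats, [0.50, 0.90, 0.95])
```

The empirical quantiles of `stats` play the role of the tabulated critical values C for p − r = 3 unit roots in the no-deterministics case.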

Thus, the idea behind the asymptotic tables is to simulate the distribution of (8.7) by first generating a (p − r)-dimensional random walk process of acceptable length and then replicating this process a large number of times. In the following we will discuss how to simulate the asymptotic tables for five different assumptions on the deterministic terms in the model. They are all sub-models of the following model:

Δxt = αβ'xt−1 + μ0 + μ1t + εt   (8.8)

where xt is (p × 1),

μ0 = αβ0 + γ0   (8.9)

and

μ1 = αβ1 + γ1.   (8.10)

Because the trace test is exclusively related to the nonstationary directions of the model we need not consider the stationary components of the model when deriving the asymptotic distributions. Therefore, the tables are simulated under the assumption that there are no short-run adjustment effects, i.e. Γ1 = 0, ..., Γk−1 = 0, and that r = 0, i.e. there is no equilibrium correction in the simulated models. Altogether the tables have been simulated for p − r = 1, ..., 12 unit roots and for five different assumptions on the deterministic components. Under the assumption that r = 0 in (8.8) we have that μ0 = γ0 and μ1 = γ1 in (8.9) and (8.10). Thus, the process xt can be represented as:

xt = Σ_{i=1}^{t} εi + γ0 t + ½ γ1 t(t + 1) + x0.   (8.11)

If μ0 and μ1 are unrestricted in (8.8), all nonstationary directions of the process contain stochastic trends as well as linear and quadratic deterministic


time trends. If there are linear but no quadratic trends in the data, then γ1 = 0. If there are no linear trends at all in the data, γ1 = 0 and γ0 = 0. Because the asymptotic distributions change depending on whether μ0 and μ1 are restricted or unrestricted in the model, a correct specification of the deterministic components in the VAR model is crucial for correct inference.

The asymptotic tables reported in Johansen (1996) and reproduced in the Appendix have been derived from 6000 replications of p-dimensional random walk processes, Δyt = εt, p = 1, ..., 12, with εt ~ Np(0, I), t = 1, ..., 400. As discussed in Chapter 7, the reduced rank regression is based on the covariance matrices Sij, i, j = 0, 1, reproduced below:

S11 = T⁻¹ Σ_{t=1}^{T} R1,t R1,t'
S01 = T⁻¹ Σ_{t=1}^{T} R0,t R1,t'   (8.12)
S00 = T⁻¹ Σ_{t=1}^{T} R0,t R0,t'

where R1,t = xt−1 − b̂10 − b̂11 t and R0,t = Δxt − b̂00 − b̂01 t are the residuals in the auxiliary regressions of xt−1 and Δxt in (8.8). Because the simulated VAR model does not contain any short-run effects, xt−1 and Δxt will only be corrected in those versions of the model which contain an unrestricted trend and/or a constant.

We will now show how to simulate the asymptotic tables using an example where p − r = 3. The following five VAR models with different assumptions on the deterministic terms have been simulated:

Case 1. μ0 = 0, μ1 = 0. This corresponds to the VAR model:

Δxt = αβ'xt−1 + εt,

which under the assumption that α = 0 gives:

xt = Σ_{i=1}^{t} εi + x0.

There are no deterministic terms in this model and R1,t and R0,t are specified as:

R1,t = (Σ_{i=1}^{t−1} ε1i, Σ_{i=1}^{t−1} ε2i, Σ_{i=1}^{t−1} ε3i)' and R0,t = (ε1t, ε2t, ε3t)', t = 1, ..., T.

Case 2. μ0 ≠ 0, with γ0 = 0 and β0 ≠ 0, μ1 = 0. This corresponds to the VAR model:

Δxt = α(β', β0)(xt−1', 1)' + εt,

which under the assumption that α = 0 gives:

xt = Σ_{i=1}^{t} εi + x0.

There are no linear trends in the model, nor in the data, but the cointegrating relations have an intercept term, so:

R1,t = (Σ_{i=1}^{t−1} ε1i, Σ_{i=1}^{t−1} ε2i, Σ_{i=1}^{t−1} ε3i, 1)' and R0,t = (ε1t, ε2t, ε3t)', t = 1, ..., T.

Case 3. μ1 = 0, μ0 unrestricted. This corresponds to the VAR model:

Δxt = αβ'xt−1 + μ0 + εt,

with the vectors of corrected residuals R1,t = xt−1 − b̂10 and R0,t = Δxt − b̂00. Under the assumption that α = 0 and μ0 ≠ 0 the process xt is given by:

xt = Σ_{i=1}^{t} εi + μ0 t + x0.   (8.13)

There are no linear trends in the VAR model but, because of the unrestricted constant μ0, the data contain linear trends. In the auxiliary regression of xt−1 on the unrestricted constant in the VAR model, the constant x0 in (8.13) cancels and R1,t will contain the stochastic trends Σ_{i=1}^{t−1} εi and a linear trend t, but no constant. Since a linear time trend asymptotically dominates a stochastic trend, the (p − r) = 3 nonstationary directions of the process are decomposed into (p − r − 1) = 2 directions that contain the corrected stochastic trends and one direction that contains the linear trend. The tables are based on:

R1,t = ((Σ_{i=1}^{t−1} ε1i) | 1, (Σ_{i=1}^{t−1} ε2i) | 1, t | 1)' and R0,t = (ε1t, ε2t, ε3t)', t = 1, ..., T,

where "| 1" denotes correction for a constant.

Case 4. μ1 ≠ 0, with γ1 = 0 and β1 ≠ 0, μ0 unrestricted. This corresponds to the VAR model:

Δxt = α(β', β1)(xt−1', t)' + μ0 + εt,

with the vectors of corrected residuals given by R1,t = (xt−1', t)' − b̂10 and R0,t = Δxt − b̂00. Under the assumption that α = 0 the process xt becomes:

xt = Σ_{i=1}^{t} εi + μ0 t + x0.

In this case we have allowed for a linear trend both in the data and in the cointegration relations, but have restricted the quadratic trend to be zero. Because the constant x0 cancels in the regression of xt−1 on a constant, R1,t contains three corrected stochastic trends and a corrected linear trend, but no constant:

R1,t = (Σ_{i=1}^{t−1} ε1i | 1, Σ_{i=1}^{t−1} ε2i | 1, Σ_{i=1}^{t−1} ε3i | 1, t | 1)' and R0,t = (ε1t, ε2t, ε3t)', t = 1, ..., T.


Case 5. μ0, μ1 unrestricted. This corresponds to the VAR model:

Δxt = αβ'xt−1 + μ0 + μ1 t + εt,

with the vectors of corrected residuals R1,t = xt−1 − b̂10 − b̂11 t and R0,t = Δxt − b̂00 − b̂01 t. Under the assumption that α = 0 and γ1 ≠ 0 the process xt becomes:

xt = Σ_{i=1}^{t} εi + γ0 t + ½ γ1 t(t + 1) + x0.   (8.14)

This model allows for linear and quadratic trends in the data as well as linear trends in the cointegrating relations. Because the constant and the linear trend in (8.14) cancel in the regression of xt−1 on the unrestricted constant and the linear trend, only the stochastic trends, Σ_{i=1}^{t−1} εi, and the quadratic trend, t², are left in R1,t. Furthermore, because a quadratic time trend asymptotically dominates a linear stochastic trend, the (p − r) = 3 nonstationary directions of the process are decomposed into (p − r − 1) = 2 directions which contain the corrected stochastic trends and one direction which contains the quadratic trend:

R1,t = ((Σ_{i=1}^{t−1} ε1i) | 1, t, (Σ_{i=1}^{t−1} ε2i) | 1, t, t² | 1, t)' and R0,t = (ε1t, ε2t, ε3t)', t = 1, ..., T.

As an example, let us consider a VAR model with an unrestricted constant, in which we would like to test the hypothesis of p − r = 3 unit roots. This case corresponds to the third row of Table A.3, and the first test value corresponds to the 50% quantile of the 5000 test statistics calculated from the simulated VAR model with p − r = 3 unit roots and an unrestricted constant. Thus, in 2500 cases the trace test statistic was smaller than 18.65 and in 4750 cases smaller than 29.38, the 95% quantile. We have shown that the asymptotic distributions depend on whether there is a constant and/or a trend in the VAR model and whether they are unrestricted or not. However, other deterministic components, such as intervention dummies, are also likely to influence the shape of the distributions.
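The corrections that distinguish the five cases are ordinary auxiliary regressions. As a hedged sketch (Case 3, with illustrative helper and variable names not taken from the text), the simulated R1,t and R0,t with an unrestricted constant can be produced as follows:

```python
import numpy as np

def correct_for(deterministics, series):
    """Residuals of `series` (T x k) after OLS on the deterministic
    terms (T x m) -- the auxiliary regressions behind R0t and R1t."""
    coef, *_ = np.linalg.lstsq(deterministics, series, rcond=None)
    return series - deterministics @ coef

rng = np.random.default_rng(1)
T = 200
eps = rng.standard_normal((T, 3))
x = np.cumsum(eps, axis=0)          # pure random walk, alpha = 0
const = np.ones((T, 1))
# Case 3: both x_{t-1} and dx_t are corrected for an unrestricted constant
R1 = correct_for(const, np.vstack([np.zeros(3), x[:-1]]))
R0 = correct_for(const, eps)
```

By construction both corrected series have exactly zero sample means, which is what "| 1" denotes in the Case 3 and Case 4 vectors above.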


In particular, care should be taken when a deterministic component generates trending behavior in the levels of the data. A typical example is an unrestricted shift dummy (..., 0, 0, 0, 1, 1, 1, ...), which cumulates to a broken linear trend in the data. A detailed discussion of this case can be found in Johansen, Nielsen, and Mosconi (2000), Juselius (2000), and Doornik, Hendry, and Nielsen (1998).

Because the asymptotic distributions for the rank test depend on the deterministic components in the model and on whether these are restricted or unrestricted, the rank and the specification of the deterministic components have to be determined jointly. Alternatively, the deterministic components have to be removed from the model prior to testing. Nielsen and Rahbek (1998) have demonstrated that a test procedure based on a model formulation that allows a deterministic variable, Dt, to be in the cointegration relations and its difference, ΔDt, to be in the VAR equations gives similarity in the test procedure. Assume, for example, that the data contain a linear trend t so that E[Δxt] = γ0 ≠ 0. In this case we need to include an unrestricted constant term μ0 = αβ0 + γ0 in the VAR model (c.f. (??) and the discussion in the previous chapter) to account for the linear growth in the data. However, a linear trend in the variables need not cancel in the cointegrating relations and we need to allow for the possibility of trend-stationary cointegration relations. This is achieved by allowing the linear trend to enter the cointegrating relations, i.e. β1 ≠ 0 in (??). Thus, to achieve similarity in the test procedure the linear trend t has to be restricted to the cointegration relations and a constant term (i.e. the difference of t) has to be unrestricted in the model.

To summarize: Given linear trends in the data, case 4 (see also Chapter 6, Section 3) is generally the best specification to start with, unless we have a strong prior that the linear trends cancel in the cointegration relations. This is because case 4 allows for trends both in the stationary and nonstationary directions of the model and, hence, gives similarity in the test procedure. When the rank has been determined it is always possible to test the hypothesis β1 = 0 as a linear hypothesis on the cointegrating relations. This will be illustrated in the next chapter. Given no linear trends in the data, case 2 is the appropriate specification (unless, exceptionally, the cointegration relations can be assumed to have a zero mean).


8.3 The cointegration rank: a difficult and crucial choice

The cointegration rank divides the data into r relations towards which the process is adjusting and p − r relations which are pushing the process. The former will be given an interpretation as equilibrium errors (deviations from steady state) and the latter as common driving trends in the system. Hence, the choice of r will influence all subsequent econometric analysis and will be crucial for the conclusions we draw on our economic hypotheses.

In the previous section we showed that the asymptotic distributions depend on the deterministic components in the VAR model and that the stationary short-run effects Γ1Δxt−1 + ... + Γk−1Δxt−k+1 do not matter asymptotically. In small samples these effects are in most cases important. Johansen (2002) demonstrated that the closer the VAR model is to the I(2) boundary, the more important are the short-run dynamic effects. In many cases the proper solution is to use bootstrap methods to determine the critical values (ref.). The idea is to simulate tables for models mimicking the short-run dynamics of the empirical model.

Unfortunately there is no clear answer to the question of how many observations we need for the asymptotic results to hold sufficiently well. Whether the sample is small or big is not exclusively a function of the number of observations available in the sample but also of the information in the data. If the data are very informative about a hypothetical long-run relation (β'xt, i.e. the equilibrium error, crosses its mean line several times over the sample period), then we might have good test properties even if the sample period is relatively short. If the estimated eigenvalues are empirically informative in the sense of being either very high or very low, the trace test is likely to perform well. Note, however, that a high value of λi can also be an indication of a small number of observations relative to the number of estimated parameters.
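The bootstrap idea mentioned above can be sketched under strong simplifying assumptions. In the illustration below the fitted null model is taken as given, the residuals are resampled i.i.d., and `trace_stat_fn` — a user-supplied routine computing the trace statistic from a simulated sample — is hypothetical; a real application would follow a formally justified bootstrap design:

```python
import numpy as np

def bootstrap_trace_quantile(alpha, beta, Gamma1, resid, x0, trace_stat_fn,
                             n_boot=199, q=0.95, seed=7):
    """Bootstrap a critical value for the trace test by simulating the
    estimated null model  dx_t = alpha beta' x_{t-1} + Gamma1 dx_{t-1} + e_t
    with e_t drawn i.i.d. from the fitted residuals, thereby mimicking the
    short-run dynamics of the empirical model."""
    rng = np.random.default_rng(seed)
    T, p = resid.shape
    stats = np.empty(n_boot)
    for b in range(n_boot):
        e = resid[rng.integers(0, T, size=T)]   # resample residual rows
        x = np.empty((T + 2, p))
        x[0] = x[1] = x0
        for t in range(2, T + 2):
            dx = (alpha @ (beta.T @ x[t - 1])
                  + Gamma1 @ (x[t - 1] - x[t - 2]) + e[t - 2])
            x[t] = x[t - 1] + dx
        stats[b] = trace_stat_fn(x[2:])
    return np.quantile(stats, q)
```

The returned quantile replaces the asymptotic critical value when deciding on the rank.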
If some of the estimated eigenvalues are in the region where it is hard to discriminate between significant and insignificant eigenvalues, the trace test will usually have low power for near unit root alternatives. Low power of the test is a sign that the data are not very informative about the cointegration rank. Thus, we may have a problem both with the size and with the power of the test when determining the rank. In the ideal case we would like the probability of rejecting a correct null hypothesis (r = r*) to be small and the probability


of accepting a correct alternative hypothesis (r ≠ r*) to be high for relevant hypotheses in the near unit root region. For example, if the adjustment back to equilibrium is very slow, then the correct hypothesis would be a stationary but near unit root.

Many simulation studies have demonstrated that the asymptotic distributions can be poor approximations to the true distributions when the sample size is small, resulting in substantial size and power distortions. Small sample corrections have been developed in Johansen (2002). For moderately sized samples (50-70), typical of many empirical models in economics, these corrections can be substantial. While applying a small sample correction to the trace test statistic leads to a more correct size, it does not solve the power problem. In some cases the size of the test and the power against alternative hypotheses close to the unit circle are almost of the same magnitude. In such cases a 5% test procedure will reject r = r* incorrectly in 5% of all the cases where r* is the true value. It will also incorrectly accept r = r* in, say, 90% of the cases when the true value of r is greater than r*. This is particularly worrying when the null of a unit root is not a natural economic hypothesis.

As discussed above, the trace test is based on a sequence of tests. In the top→bottom case we test the hypothesis of p unit roots and, if rejected, continue until first acceptance of p − r unit roots. Whether we choose the top→bottom or the bottom→top sequence, the test procedure is essentially based on the principle of no prior economic knowledge regarding the rank r. This is in many cases difficult to justify. For example, in the monetary model of Chapter 2 we demonstrated that the hypothesis (r = 3, p − r = 2) was a priori consistent with two types of autonomous shocks, one shifting the aggregate demand curve and the other the aggregate supply curve. We also discussed that for a more regulated economy the hypothesis (r = 2, p − r = 3) might be preferable a priori as a result of very slow market adjustment.

An alternative procedure is, therefore, to test a given prior economic hypothesis, say p − r = 2, using the trace test and, if accepted, continue with this assumption unless the data strongly suggest the presence of additional unit roots. The latter can be investigated in a number of ways. For example, we can test the significance of the adjustment coefficients αi,r, i = 1, ..., p, of the rth cointegrating vector. If all αi,r coefficients have small t-ratios, then including the rth cointegrating relation in the model would not improve the explanatory power of the model but, more likely, would invalidate subsequent inference. But if the choice of r incorrectly includes a nonstationary relation


among the cointegrating relations, then one of the roots of the characteristic polynomial of the model will correspond to a unit root or a near unit root and, thus, be large. If either of these cases occurs, then the cointegration rank should be reduced. Note, however, that additional unit roots in the characteristic polynomial can be the result of I(2) components in the data. Reducing the rank in this case will not solve the problem, as will be further discussed in Chapters 14 and 15.

Note also that the cointegration rank is not in general equivalent to the number of theoretical equilibrium relations derived from an economic model. For example, in the monetary model of Chapter 2 there was one equilibrium relation, the money demand relation (??). As demonstrated in Chapter 2, Section 4, the monetary model would be consistent with r = 3 cointegrating relations (and not one) in a VAR model with real money, real income, inflation and two interest rates (instead of just one as in Romer's example). The prior assumption that r = 1 has been incorrectly imposed in many empirical applications of money demand data. Thus, cointegration between variables is a statistical property of the data that only exceptionally can be given a direct interpretation as an economic equilibrium relation. The reason for this is that a theoretically meaningful relation can be (and often is) a weighted sum of several irreducible cointegration relations (Davidson, 2001). However, these relations contain invaluable information about common stochastic trends between sets of variables. In the next chapter we will illustrate that this can be used to assess the hypothetical scenario of the economic problem as proposed in Chapter 2.

To summarize: When assessing the appropriateness of the asymptotic tables for determining the cointegration rank we need to consider not only the sample size but also the short-run dynamics. Because the power of the trace test can be very low for alternative hypotheses in the neighborhood of the unit circle, it is advisable to use as much additional information as possible. For example, we can check:

1. The characteristic roots of the model: if the (r + 1)th cointegration vector is nonstationary and is wrongly included in the model, then the largest characteristic root will be close to the unit circle.

2. The t-values of the α-coefficients of the (r + 1)th cointegration vector: if all of them are small, say less than 3.0, then one would not gain a lot by including the (r + 1)th vector as a cointegrating relation in the model.


3. The recursive graphs of the trace statistic for r = 1, 2, ..., p: since the components −Tj ln(1 − λ̂i), j = T1, ..., T, grow linearly over time when λi ≠ 0, the recursively calculated components of the trace statistic should grow linearly for all i = 1, ..., r, but stay constant for i = r + 1, ..., p.

4. The graphs of the cointegrating relations: if the graph of a supposedly stationary cointegration relation reveals distinctly nonstationary behavior, one should reconsider the choice of r, or find out whether the model specification is in fact incorrect, for example whether the data are I(2) instead of I(1).

5. The economic interpretability of the results.

The above criteria will now be illustrated based on the Danish data.
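Criterion 3 can be automated. The rough sketch below uses the simplest possible concentration (a VAR(1) with no deterministic terms) purely for illustration; a real application would recompute the eigenvalues from the full concentrated model at each recursive sample:

```python
import numpy as np

def recursive_trace_components(x, t0):
    """Recursively calculated components -t1*ln(1 - lambda_i(t1)) of the
    trace statistic for expanding samples t1 = t0, ..., T-1, using the
    concentration R0t = dx_t, R1t = x_{t-1}; a diagnostic sketch only."""
    dx, lag = np.diff(x, axis=0), x[:-1]
    out = []
    for t1 in range(t0, len(dx) + 1):
        R0, R1 = dx[:t1], lag[:t1]
        S00 = R0.T @ R0 / t1
        S01 = R0.T @ R1 / t1
        S11 = R1.T @ R1 / t1
        L = np.linalg.cholesky(S11)
        Li = np.linalg.inv(L)
        lam = np.linalg.eigvalsh(Li @ S01.T @ np.linalg.solve(S00, S01)
                                 @ Li.T)[::-1]
        lam = np.clip(lam, 0.0, 1.0 - 1e-12)
        out.append(-t1 * np.log(1.0 - lam))
    return np.array(out)    # one row per t1, one column per eigenvalue
```

Plotting the columns of the returned array against t1 reproduces the kind of graph used in criterion 3: components for stationary relations should grow roughly linearly, those belonging to unit roots should stay flat.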

8.4 An illustration based on the Danish data

In Table 8.1 we report the estimated eigenvalues λ̂i, the components of the trace test, −T ln(1 − λ̂i), the trace test itself, Trace(i) = −T Σ_{j=i}^{p} ln(1 − λ̂j), and C.95, the 95% quantiles from the asymptotic Table 15.4 in Johansen (1995). The trace test appears to reject 5 unit roots (140.4 > 87.0) and 4 unit roots (71.7 > 62.6), but not 3 unit roots (39.0 < 42.2). Based on the trace test we would, therefore, choose r = 2. As demonstrated in Chapter 2, our prior economic hypothesis was r = 3, assuming two driving trends: one nominal and one real stochastic trend. The sample size is 79 and the asymptotic tables may not be very precise approximations in this case. Furthermore, the power of the test might be low for the third eigenvector with the eigenvalue λ̂3 = 0.27. This value is in the borderline region where it is hard to know whether the corresponding eigenvector should be considered stationary or nonstationary. Therefore, before choosing the cointegration rank it is useful to examine all five sources of additional information suggested at the end of Section 8.3.

The largest characteristic root is 0.77 for r = 2 and 0.83 for r = 3, a moderate difference. Whether the root 0.83 corresponds to a unit root or not is difficult to know, and we will collect more information by inspecting the significance of the adjustment coefficients αi,2 and αi,3, i = 1, ..., p. Table 7.1 reported the Π matrix decomposed into all five α and β vectors. We


Table 8.1: The trace test of the cointegration rank and the moduli of the five largest characteristic roots of the model

λ̂i     p−r   −T ln(1−λ̂i)   Trace   C.95  |  Modulus: 5 largest roots
                                           |  r=5    r=4    r=3    r=2
0.59    5     68.7           140.4   87.0  |  0.88   1.0    1.0    1.0
0.35    4     32.7           71.7    62.6  |  0.88   0.87   1.0    1.0
0.27    3     24.0           39.0    42.2  |  0.74   0.74   0.83   1.0
0.12    2     10.1           15.0    25.5  |  0.55   0.74   0.66   0.77
0.06    1     5.0            5.0     12.4  |  0.44   0.56   0.66   0.43
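The top→bottom decision rule applied to the trace values in Table 8.1 can be replicated mechanically:

```python
# Trace statistics and 95% quantiles from Table 8.1 (p = 5, Danish data)
trace = [140.4, 71.7, 39.0, 15.0, 5.0]   # testing r = 0, 1, 2, 3, 4
c95   = [87.0, 62.6, 42.2, 25.5, 12.4]

def select_rank(trace, c95):
    """First acceptance in the top-to-bottom sequence r = 0, 1, ..."""
    for r, (tau, c) in enumerate(zip(trace, c95)):
        if tau < c:
            return r
    return len(trace)   # every test rejected: xt stationary

r_hat = select_rank(trace, c95)   # -> 2
```

As in the text, the first two hypotheses are rejected (140.4 > 87.0, 71.7 > 62.6) and the third accepted (39.0 < 42.2), so the mechanical rule picks r = 2.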

notice that the choice r = 2 would exclude from the model the relation β̂3'xt describing a relation between the two interest rates. The adjustment coefficient α̂4,3 is highly significant, implying that the deposit rate is strongly adjusting to β̂3'xt. In Figure 8.1 the graph of the third relation suggests mean-reverting behavior in spite of some evidence of drift. Finally, the graph of the recursively calculated trace test for the third component, given in Figure 8.2, exhibits linear growth over time, consistent with λ3 being different from zero. Altogether we have found strong evidence supporting our economic prior r = 3. However, before trusting this choice we need to assess the constancy of the parameters of the VAR model. In the next section we will discuss some recursive procedures which have been developed to detect possible sources of parameter nonconstancy in the VAR model.


Figure 8.1. The graphs of the third cointegration relation: upper panel based on β3'xt and lower panel based on β3'R1t.

8.5 Recursive tests of constancy

In Chapter 4 we applied a number of residual misspecification tests to the VAR model. Though the empirical model seemed to pass these tests sufficiently well to continue the analysis, it is entirely possible that the model suffers from parameter nonconstancy. The purpose of this section is to provide a number of diagnostic tests to check for this important feature of the model.

8.5.1 Recursively calculated trace tests



Figure 8.2. The recursively calculated components of the trace statistic, scaled by the 90% quantile of the asymptotic distributions.

The expression for the trace test in (8.6) is used repeatedly in the recursive estimation of the VAR model. Figure 8.2 illustrates the recursively calculated components. To increase readability, the trace statistic is scaled by the 90% quantile of the appropriate asymptotic distribution. Note, however, that the scaling by the critical values can have the consequence that the lines cross each other. Note also that if the VAR model contains exogenous variables or dummy variables, then the applied 90% quantiles may no longer be appropriate and CATS will insert a warning in the picture: "The critical values are not valid". The upper panel is based on recursive estimation of the full model, whereas the lower panel is based on recursive estimation of the R-form, in which the short-run effects have been concentrated out as follows: based on the full sample, R0t and R1t are determined once and for all by the auxiliary regressions (??)


in Chapter 7. The recursions are then calculated as if R0t = αβ'R1t + εt were the true model. In this sense any parameter instability in the short-run coefficients will have been averaged out in the lower panel. All trace components exhibit (to some extent) trending behavior, consistent with non-zero λi, i = 1, ..., p. However, the smallest eigenvalue test component in the upper panel is hardly growing and is, therefore, likely to correspond to a unit root or a near unit root.

8.5.2 The recursively calculated log-likelihood


[Figure 8.3 about here: recursively calculated $-\ln(\det(S_{00}))$, $-\sum_i \ln(1-\lambda_i)$, and $-2/T$ times the log-likelihood, for the full model Z(t) and the R-form R(t), 1983-1993.]

Figure 8.3. The recursively calculated log-likelihood based on the full model and the R-form.

The log-likelihood value is calculated as:

$$-2/t_1 \ln(L(r)) = \ln|S_{00,(t_1)}| + \sum_{i=1}^{r}\ln(1-\hat{\lambda}_{i,(t_1)}), \qquad t_1 = T_0,\ldots,T, \tag{8.15}$$

and the 95% confidence bound is calculated as $\pm 2\sqrt{2p}/t_1$. Figure 8.3 illustrates.


It appears that the calculated log-likelihood lies within the 95% confidence bands for $t_1$ = 1983:1-1994:3. Note also that the R-form is more stable than the full model. This is a typical outcome, because some of the short-run coefficients in the full model are likely to be unstable over time.
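Equation (8.15) is easy to compute directly from the product-moment matrices. A minimal sketch for a VAR(1) — so that $R_{0t} = \Delta x_t$ and $R_{1t} = x_{t-1}$ with no short-run dynamics to concentrate out — on simulated data; all series and sample choices are illustrative assumptions.

```python
# Recursive -2/t1 log-likelihood, eq. (8.15):
#   ln|S00,(t1)| + sum_{i=1}^{r} ln(1 - lambda_i,(t1)).
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
T, p, r = 150, 3, 1
x = np.cumsum(rng.standard_normal((T, p)), axis=0)

def loglik_value(x, r):
    r0, r1 = np.diff(x, axis=0), x[:-1]        # R0t, R1t in the VAR(1) case
    t = len(r0)
    s00, s11, s01 = r0.T @ r0 / t, r1.T @ r1 / t, r0.T @ r1 / t
    # lambda_i solve |lam*S11 - S10 S00^{-1} S01| = 0; eigh returns ascending
    lam = eigh(s01.T @ np.linalg.solve(s00, s01), s11, eigvals_only=True)[::-1]
    sign, logdet = np.linalg.slogdet(s00)
    return logdet + np.log(1.0 - lam[:r]).sum()

vals = [loglik_value(x[:t1], r) for t1 in range(80, T + 1)]
```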

8.5.3 Recursively calculated prediction tests


[Figure 8.4 about here: recursively calculated 1-step prediction test, upper panel Z(t) (full model), lower panel R(t) (R-form model), 1983-1993.]

Figure 8.4. Recursively calculated one-step ahead prediction errors of the system. The upper panel is for the full model and the lower panel for the R-form model.

The one-step-ahead prediction test is based on the hypothesis that the vector process $x_{t_1}$ is generated by the same cointegrated process that has generated $x_1, \ldots, x_{t_1-1}$, $t_1 = T_0 + 1, \ldots, T$ (see Lütkepohl (1991), Section 4.6). The one-step-ahead prediction error for the system is calculated

as:

$$f_{t_1} = \Delta x_{t_1} - \sum_{j=1}^{k-1}\hat{\Gamma}_{j,(t_1-1)}\Delta x_{t_1-j} - \hat{\alpha}_{(t_1-1)}\hat{\beta}'_{(t_1-1)}x_{t_1-1} - \hat{\mu}_{0,(t_1-1)} - \hat{\Phi}_{(t_1-1)}D_{t_1}, \qquad t_1 = T_0+1,\ldots,T, \tag{8.16}$$

and the test statistic as:

$$T(t_1) = \left(\frac{d_1+r}{t_1}+1\right)^{-1} f'_{(t_1)}\,\hat{\Omega}^{-1}_{(t_1-1)}\,f_{(t_1)}, \qquad t_1 = T_0+1,\ldots,T, \tag{8.17}$$

where $d_1 = k - 1 + d$ and $d$ is the number of dummy variables in the model. Under the null, $T(t_1)$ is asymptotically distributed as $\chi^2$ with $p$ degrees of freedom. Figure 8.4 illustrates. A prediction error larger than two standard errors is indicated with a long vertical line. Note that the prediction errors from the full model are generally much larger than those from the R-form model. This is largely an artifact of the R-form model predicting exclusively the long-run components of the model, in contrast to the full model, which has to predict both the long-run and the short-run components. The test of one-step-ahead prediction errors for the individual variables $x_{i,t_1}$ is given by:

$$T_i(t_1) = \left(\frac{d_1+r}{t_1}+1\right)^{-1} f^2_{i,t_1}/\hat{\omega}_{ii,(t_1-1)}, \qquad t_1 = T_0+1,\ldots,T, \tag{8.18}$$

and is asymptotically distributed as $\chi^2(1)$. Also in this case we can choose between predictions from the full model, illustrated in Figure 8.5, and from the R-form model, illustrated in Figure 8.6. Note the large prediction errors of the bond rate and the deposit rate at 1983:1 based on the full model. The R-form model does not show a similar prediction failure because this effect has been concentrated out by the dummy variable $Dp83_t$.


[Figure 8.5 about here: one-step ahead prediction errors for each variable, panels DMO, DIDE, DFY, DIBO, and DDIFPY, full model, 1983-1993.]

Figure 8.5. One-step ahead prediction errors of each variable of the system based on the full model. A vertical line indicates a prediction error larger than two standard errors.


[Figure 8.6 about here: one-step ahead prediction errors for each variable, panels DMO_R, DIDE_R, DFY_R, DIBO_R, and DDIFPY_R, R-form model, 1983-1993.]

Figure 8.6. One-step ahead prediction errors of each variable of the system based on the R-form model. A vertical line indicates a prediction error larger than two standard errors.

The time paths of the recursively calculated $r$ largest eigenvalues show the estimated eigenvalues from the unrestricted VAR model (8.22). The standard error of the estimate $\hat{\lambda}_{i,(t_1)}$ is calculated as:

$$\mathrm{s.e.}(\hat{\lambda}_i) = \sqrt{T^{-1}\,4(1-\hat{\lambda}_i)^2\Big(\hat{\lambda}_i + \sum_{h=1}^{T-1}3(1-h/T)^2\big(r_{i,u}(h)^2 - r_{i,uv}(h)^2\big)\Big)}, \tag{8.19}$$

where $r_{i,u}(h) = T^{-1}\sum_{t=h}^{T} u_{i,t}u_{i,t-h}$,

$r_{i,uv}(h) = T^{-1}\sum_{t=h}^{T} u_{i,t}v_{i,t-h}$,

for

$$u_{i,t} = \hat{\lambda}_i^{-1/2}\,\hat{\alpha}_i' S_{00}^{-1} R_{0t}, \qquad v_{i,t} = \big(\hat{\lambda}_i(1-\hat{\lambda}_i)\big)^{-1/2}\,\hat{\alpha}_i' S_{00}^{-1}\big(R_{0t} - \hat{\alpha}\hat{\beta}' R_{1t}\big).$$
For further details, see Hansen and Johansen (1999). Figure 8.7 illustrates the recursively calculated $\hat{\lambda}_i$, $i = 1, \ldots, 3$, together with the 95% confidence bands. It appears that the estimated eigenvalues stay within the bands in all periods. This is a quite typical outcome, which suggests that the power of these tests to detect instability might be quite low.

[Figure 8.7 about here: recursively calculated eigenvalues, panels lambda1, lambda2, lambda3, each with 95% confidence bands, 1983-1993.]

Figure 8.7. Recursively calculated $\hat{\lambda}_i$, $i = 1, \ldots, 3$, together with 95% confidence bands.

The test of constancy of $\beta$ is a test of the hypothesis:

$$H_{\beta}: \tilde{\beta} \in \mathrm{sp}(\beta_{(t_1)}), \qquad t_1 = T_0,\ldots,T,$$

in which $\tilde{\beta}$ is a known matrix. The test statistic is given by:

$$-2\ln(Q(H_{\beta} \mid \beta_{(t_1)})) = t_1\sum_{i=1}^{r}\Big[\ln(1-\tilde{\lambda}_{i,(t_1)}) - \ln(1-\hat{\lambda}_{i,(t_1)})\Big], \qquad t_1 = T_0,\ldots,T, \tag{8.20}$$

where the $\tilde{\lambda}_{i,(t_1)}$ are the solutions of:

$$|\rho\,\tilde{\beta}' S_{11,(t_1)}\tilde{\beta} - \tilde{\beta}' S_{10,(t_1)} S_{00,(t_1)}^{-1} S_{01,(t_1)}\tilde{\beta}| = 0, \qquad t_1 = T_0,\ldots,T, \tag{8.21}$$

and the $\hat{\lambda}_{i,(t_1)}$ are the $r$ largest eigenvalues in the solution of the unrestricted eigenvalue problem:

$$|\lambda S_{11,(t_1)} - S_{10,(t_1)} S_{00,(t_1)}^{-1} S_{01,(t_1)}| = 0, \qquad t_1 = T_0,\ldots,T. \tag{8.22}$$

The test statistic (8.20) is asymptotically distributed as $\chi^2$ with $(p_1-r)r$ degrees of freedom (Hansen and Johansen, 1993). Figure 8.8 illustrates the case where $\tilde{\beta}$ is the full sample estimate, and Figure 8.9 the case where $\tilde{\beta}$ is the estimate based on the sample 1974:2-1987:1. To increase readability, the test values have been scaled by the 95% quantiles of the $\chi^2$ distribution, so a value larger than 1.0 means that the test rejects constancy. The solid line is based on the full model, the dotted line on the R-form model. In Figure 8.8 the dotted line shows that the full-sample $\tilde{\beta}$ would have been accepted in all periods 1974:2-1983:1+j, j = 1, ..., 47, when the short-run effects had been corrected for, whereas the solid line shows that this would not have been the case in the first few years after 1983. Based on Figure 8.9 the conclusion is not as clear. Here we have chosen a specific sub-sample, 1974:2-1987:1, as a reference point. Based on the R-form model (the dotted line) we could approximately accept constancy of $\beta$ over the full period, though at the end of the sample the test is close to the critical value. Based on the full model (solid line), constancy of $\beta$ is less strongly supported. This serves as an illustration that the test procedures can give quite different results depending on the questions we ask. It is often useful to check the sensitivity of the stability tests to the choice of reference value $\tilde{\beta}$ using different sample periods.
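The machinery of (8.20)-(8.22) is straightforward to sketch: fix $\tilde{\beta}$ at its full-sample estimate, then on each expanding sub-sample solve both the unrestricted problem and the problem with $\tilde{\beta}$ imposed. A VAR(1) sketch on simulated data, with all data choices illustrative assumptions:

```python
# Recursive test of a known beta, eq. (8.20):
#   -2 ln Q = t1 * sum_i [ln(1 - lam~_i) - ln(1 - lam^_i)].
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
T, p, r = 200, 3, 1
x = np.cumsum(rng.standard_normal((T, p)), axis=0)

def moments(x):
    r0, r1 = np.diff(x, axis=0), x[:-1]
    t = len(r0)
    return r0.T @ r0 / t, r1.T @ r1 / t, r0.T @ r1 / t, t

def eigs(s00, s11, s01, h):
    # descending solutions of |lam * h'S11h - h'S10 S00^{-1} S01 h| = 0
    a = h.T @ s01.T @ np.linalg.solve(s00, s01) @ h
    return eigh(a, h.T @ s11 @ h, eigvals_only=True)[::-1]

# full-sample beta~: eigenvectors of the unrestricted problem (8.22)
s00, s11, s01, _ = moments(x)
w, v = eigh(s01.T @ np.linalg.solve(s00, s01), s11)
beta_tilde = v[:, ::-1][:, :r]                  # r vectors, largest eigenvalues

tests = []
for t1 in range(100, T + 1):
    s00, s11, s01, t = moments(x[:t1])
    lam  = eigs(s00, s11, s01, np.eye(p))[:r]   # unrestricted, (8.22)
    lamt = eigs(s00, s11, s01, beta_tilde)[:r]  # beta~ imposed, (8.21)
    tests.append(t * (np.log(1 - lamt).sum() - np.log(1 - lam).sum()))
# each entry is asymptotically chi-square((p - r)*r) under constancy of beta
```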



[Figure 8.8 about here: "Test of known beta eq. to beta(t)", lines BETA_Z (full model) and BETA_R (R-form), scaled so that 1 is the 5% significance level, 1983-1993.]

Figure 8.8. Recursively calculated tests of $\tilde{\beta} \in \mathrm{sp}(\beta_{(t_1)})$, where $\tilde{\beta}$ is the full sample estimate.
[Figure 8.9 about here: "Test of known beta eq. to beta(t)", lines BETA_Z and BETA_R, scaled so that 1 is the 5% significance level, 1983-1993.]

Figure 8.9. Recursively calculated tests of $\tilde{\beta} \in \mathrm{sp}(\beta_{(t_1)})$, where $\tilde{\beta}$ is estimated on the sub-sample 1974:2-1987:1.

Chapter 9 Testing restrictions on β and α


In Section 7.4 we discussed the eigenvector decomposition of the long-run matrix $\Pi$ and interpreted the unrestricted estimates as a convenient description of the information given by the covariances of the data. In Chapter 8 we discussed how to determine the cointegration rank, which separates the eigenvectors into $r$ stationary and $p-r$ nonstationary directions. The purpose of this chapter is to discuss a number of test procedures by which we can test various restrictions on the $r$ stationary cointegrating relations. While the final aim is to test and impose over-identifying structural restrictions on the long-run structure $\beta$ and on the adjustment coefficients $\alpha$, the restrictions discussed here are not identifying by themselves. Nonetheless, they imply binding restrictions on $\Pi = \alpha\beta'$ and, hence, are testable. By systematic application of these tests one will gain invaluable information on the time-series properties of the long-run relations, which will facilitate identification of the long-run and short-run structure to be discussed in Chapters 10-14. This chapter will discuss how to test for stationarity of cointegration relations subject to the following types of restrictions on $\beta$ and $\alpha$: (i) the same restrictions on all cointegrating relations, (ii) all coefficients known in some of the relations, (iii) only some coefficients known in a cointegrating relation, and (iv) zero restrictions on rows of $\alpha$. The organization is as follows: Section 9.1 discusses how to formulate hypotheses as restrictions on the parameter matrices, Section 9.2 how to test the same restriction on all $\beta_i$, Section 9.3 how to test a known $\beta_i$, Section 9.4 how to test some restrictions on a cointegration vector, Section 9.5 how to test long-run weak exogeneity as a row restriction on $\alpha$, and Section 9.6 interprets the results in terms of the scenario analysis of Chapter 2.


9.1 Formulating hypotheses as restrictions on β

Hypotheses on the cointegration vectors can be formulated in two alternative ways: either by specifying the $s_i$ free parameters in each $\beta_i$ vector, or by specifying the $m_i$ restrictions on each vector. We will consider both cases for a general formulation of restrictions on the $(p_1 \times r)$ matrix $\beta$, where $p_1$ is the dimension of $x_{t-1}$ in the VAR model. We first specify the constrained cointegration vectors $\beta_i^c$ in terms of the $s_i$ free parameters:

$$\beta^c = (\beta_1^c,\ldots,\beta_r^c) = (H_1\varphi_1,\ldots,H_r\varphi_r), \tag{9.1}$$

where $\varphi_i$ is an $(s_i \times 1)$ coefficient vector, $H_i$ is a $(p_1 \times s_i)$ design matrix, and $i = 1,\ldots,r$. In this case we use the design matrices to determine the $s_i$ free parameters in each cointegration vector. In the other case we specify restriction matrices $R_i$ $(p_1 \times m_i)$ which define the $m_i$ restrictions on $\beta_i$:

$$R_1'\beta_1 = 0, \quad \ldots, \quad R_r'\beta_r = 0.$$

As an illustration, consider the following hypothetical specification of $\beta'x_t$ for $r = 3$, where $x_t = [m_t^r, y_t^r, \Delta p_t, R_{m,t}, R_{b,t}, Ds83_t]'$:

$$\begin{aligned}
\beta_1'x_t &= m_t^r - y_t^r - b_1(R_{m,t} - R_{b,t}) - b_2\,Ds83_t \\
\beta_2'x_t &= y_t^r - b_3(\Delta p_t - R_{b,t}) \\
\beta_3'x_t &= (R_{m,t} - R_{b,t}) + b_4\,Ds83_t
\end{aligned}$$

The first cointegration relation has three free parameters ($s_1 = 3$), corresponding to three restrictions ($m_1 = 3$), and the second and the third relation have two free parameters ($s_2 = s_3 = 2$), corresponding to four restrictions ($m_2 = m_3 = 4$). The three restricted vectors $H_i\varphi_i$ take the following form:


$$\beta_1 = H_1\varphi_1 = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \varphi_{11} \\ \varphi_{12} \\ \varphi_{13} \end{bmatrix}, \quad
\beta_2 = H_2\varphi_2 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & -1 \\ 0 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} \varphi_{21} \\ \varphi_{22} \end{bmatrix}, \quad
\beta_3 = H_3\varphi_3 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ -1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \varphi_{31} \\ \varphi_{32} \end{bmatrix}.$$

It appears that, for example, $\varphi_{11} = \beta_{11} = -\beta_{12}$, $\varphi_{12} = -\beta_{14} = \beta_{15}$, and $\varphi_{13} = -\beta_{16}$ in the first relation. After normalization on real money in the first relation, the normalized coefficients become $b_1 = \varphi_{12}/\varphi_{11}$, $b_2 = \varphi_{13}/\varphi_{11}$, and similarly for the remaining relations. We will now formulate the above relations as restrictions on the $\beta_i$ coefficients. They are as follows:


$$R_1'\beta_1 = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} \beta_{11} \\ \beta_{12} \\ \beta_{13} \\ \beta_{14} \\ \beta_{15} \\ \beta_{16} \end{bmatrix} = 0, \quad
R_2'\beta_2 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \beta_{21} \\ \beta_{22} \\ \beta_{23} \\ \beta_{24} \\ \beta_{25} \\ \beta_{26} \end{bmatrix} = 0,$$

$$R_3'\beta_3 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} \beta_{31} \\ \beta_{32} \\ \beta_{33} \\ \beta_{34} \\ \beta_{35} \\ \beta_{36} \end{bmatrix} = 0.$$

Note that $R_i = H_{\perp,i}$, i.e. $R_i'H_i = 0$. After the rank has been determined, all tests are about stationarity: the null hypothesis is now that a restricted linear combination $\beta_i^{c\prime}x_t$ of the vector process is stationary. This is in contrast to the rank test, where the null hypothesis was a unit root. Under the assumption that the rank was correctly chosen, the matrix $\Pi = \alpha\beta'$, where $\alpha$ and $\beta$ are of rank $r$, describes the $r$ stationary directions $\beta$. Hence, testing restrictions on $\beta$ is the same as asking whether a restricted vector $\beta_i^c$ lies in the stationarity space spanned by $\beta$. When a test rejects, it means that the restricted vector points outside the stationarity space.
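The two formulations are linked by $R_i = H_{\perp,i}$: given either matrix, the other can be computed as a null-space basis. A small check for the first relation of the example ($m_1 = 3$ restrictions on a $p_1 = 6$ vector, leaving $s_1 = 3$ free parameters); note that `null_space` returns an orthonormal basis, which spans the same space as the $H_1$ displayed above but need not equal it column by column.

```python
import numpy as np
from scipy.linalg import null_space

# rows of R1' are the m1 = 3 restrictions R1'beta_1 = 0 from the text
R1T = np.array([[1., 1., 0., 0., 0., 0.],
                [0., 0., 1., 0., 0., 0.],
                [0., 0., 0., 1., 1., 0.]])

H1 = null_space(R1T)            # a p1 x s1 = 6 x 3 basis for the free directions
assert H1.shape == (6, 3)
assert np.allclose(R1T @ H1, 0.0)   # R1'H1 = 0, i.e. R1 = H1_perp

# any beta_1 = H1 @ phi automatically satisfies the restrictions
beta1 = H1 @ np.array([0.7, -1.2, 0.4])
assert np.allclose(R1T @ beta1, 0.0)
```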

9.2 Same restriction on all β

Before imposing specific restrictions on each $\beta_i$, it is often useful to test general restrictions on all relations. Typical examples are tests of long-run


exclusion of a variable, i.e. a zero row restriction on $\beta$. If accepted, the variable is not needed in the cointegration relations and can be omitted altogether from the cointegration space. Another example is the test of long-run price homogeneity between some (or all) of the variables. For example, we may want to test whether nominal money and prices are long-run homogeneous in all cointegration relations. If accepted, we can re-specify the long-run relations directly in real money (and, as will be shown in Chapter 14, a nominal growth rate), thus simplifying the long-run structure. These are testable hypotheses but, since they impose the same restriction on all cointegration relations, they are not identifying, as will be shown in the next chapter. All $H_i$ (or $R_i$), $i = 1,\ldots,r$, are identical, and we can formulate the hypothesis as:

$$H_c(r): \beta^c = (H\varphi_1,\ldots,H\varphi_r) = H\varphi, \tag{9.2}$$

where $\beta^c$ is $p_1 \times r$, $H$ is $p_1 \times s$, $\varphi$ is $s \times r$, and $s$ is the number of unrestricted coefficients in each vector. The hypothesis $H_c(r)$ is tested against $H(r)$: $\beta$ unrestricted, i.e. we test the following restricted model:

$$\Delta x_t = \alpha\varphi'H'x_{t-1} + \sum_{i=1}^{k-1}\Gamma_i\Delta x_{t-i} + \varepsilon_t \tag{9.3}$$

The hypothesis (9.2) generally implies a transformation of the data vector to $H'x_t$, as can be illustrated by the following example. Assume that we wish to test the hypothesis of long-run proportionality between $m^r$ and $y^r$ in all cointegration relations, as specified by the following design matrix $H$:

$$H' = \begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad \varphi = \begin{bmatrix} \varphi_{11} & \varphi_{12} & \varphi_{13} \\ \varphi_{21} & \varphi_{22} & \varphi_{23} \\ \varphi_{31} & \varphi_{32} & \varphi_{33} \\ \varphi_{41} & \varphi_{42} & \varphi_{43} \\ \varphi_{51} & \varphi_{52} & \varphi_{53} \end{bmatrix}.$$

The transformed data vector becomes:

$$H'x_t = \begin{bmatrix} (m^r - y^r)_t \\ \Delta p_t \\ R_{m,t} \\ R_{b,t} \\ Ds83_t \end{bmatrix}.$$

If this restriction is accepted, all cointegrating relations can be expressed as functions of the liquidity ratio $(m^r - y^r)_t$. Thus, the restricted $\beta$ is now $(5 \times 3)$ instead of $(6 \times 3)$. The likelihood ratio test procedure is derived by calculating the ratio between the value of the likelihood function in the restricted and the unrestricted model. The maximum likelihood of the unrestricted model is:

$$L_{max}^{-2/T}(H(r)) = |S_{00}|\prod_{i=1}^{r}(1-\hat{\lambda}_i).$$

It appears from (9.3) that we can find the restricted ML estimator similarly as for the unrestricted model, i.e. by solving the reduced rank regression of $\Delta x_t$ on $H'x_{t-1}$, corrected for the short-run dynamics:

$$|\lambda H'S_{11}H - H'S_{10}S_{00}^{-1}S_{01}H| = 0.$$

This gives the eigenvalues $\hat{\lambda}_1^c > \hat{\lambda}_2^c > \ldots > \hat{\lambda}_s^c$ and the eigenvectors $\hat{v}_1^c, \hat{v}_2^c, \ldots, \hat{v}_s^c$. Note that when $H$ is $p_1 \times s$, the restricted model will have $s$ eigenvalues, i.e. fewer than the $p_1$ eigenvalues of the unrestricted model, and, hence, $m \leq p_1 - r$. In the money demand example with $r = 3$ and $p_1 - r = 3$ we can impose at most three identical restrictions on the cointegrating relations. We could, for example, in addition to the liquidity ratio, impose the interest rate spread in all relations as well as long-run exclusion of the shift dummy $Ds83_t$, resulting in the transformed data vector $H'x_t = [(m^r - y^r), \Delta p, (R_m - R_b)]'$. Imposing one more restriction would make the dimension of the transformed vector equal to 2, which would violate the condition $r = 3$. The ML estimator of $\beta^c$ is found by choosing $\hat{\varphi} = [\hat{v}_1^c, \hat{v}_2^c, \ldots, \hat{v}_r^c]$ and then transforming the vectors back to $\hat{\beta}^c = H\hat{\varphi}$. Note that restrictions of this type are not identifying, and the estimates of the constrained cointegration relations are, therefore, based on the normalization condition and the ordering of the eigenvalues, as discussed in Section 7.4.

The LR test statistic is calculated as:

$$\Lambda^{2/T} = \left(\frac{L_{max}(H_c(r))}{L_{max}(H(r))}\right)^{2/T} = \frac{|S_{00}|(1-\hat{\lambda}_1^c)(1-\hat{\lambda}_2^c)\cdots(1-\hat{\lambda}_r^c)}{|S_{00}|(1-\hat{\lambda}_1)(1-\hat{\lambda}_2)\cdots(1-\hat{\lambda}_r)},$$

i.e.

$$-2\ln\Lambda = T\big\{\ln(1-\hat{\lambda}_1^c) - \ln(1-\hat{\lambda}_1) + \ldots + \ln(1-\hat{\lambda}_r^c) - \ln(1-\hat{\lambda}_r)\big\}.$$

It is asymptotically distributed as $\chi^2(\nu)$, where $\nu = rm$. There are $rm$ degrees of freedom because we have imposed $m$ restrictions on each of the $r$ cointegration vectors. If the eigenvalues change significantly when we impose the restrictions $H'x_t$, the test will reject. Section 8.3 gave an interpretation of the eigenvalues $\lambda_i$ as squared correlation coefficients between a linear combination of the stationary part of the vector process and a linear combination of the non-stationary part. Thus, if $\lambda_i^c$ becomes very small as a result of imposing the restrictions $R_i$, it is a sign of nonstationarity of the restricted cointegration relation. Another way of expressing this is again to notice that $\mathrm{diag}(\hat{\lambda}_1,\ldots,\hat{\lambda}_r) = \hat{\alpha}'S_{00}^{-1}\hat{\alpha} = \hat{\beta}'S_{10}S_{00}^{-1}S_{01}\hat{\beta}$, i.e. $\hat{\lambda}_i$ is related to the estimated $\alpha_i$. Therefore, rejection of the hypothesis $\beta = H\varphi$ implies that at least one of the restricted relations no longer defines a mean-reverting relation and, thus, will have insignificant $\alpha$ coefficients.
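The whole procedure — restricted and unrestricted eigenvalue problems plus the LR statistic — fits in a few lines for the VAR(1) case ($R_{0t} = \Delta x_t$, $R_{1t} = x_{t-1}$). The data and the particular restriction are illustrative assumptions:

```python
# LR test of the same restriction beta = H*phi on all cointegration relations:
#   -2 ln Lambda = T * sum_i [ln(1 - lam_i^c) - ln(1 - lam_i)],  ~ chi2(r*m).
import numpy as np
from scipy.linalg import eigh, null_space

rng = np.random.default_rng(1)
T, p, r = 200, 4, 2
x = np.cumsum(rng.standard_normal((T, p)), axis=0)

r0, r1 = np.diff(x, axis=0), x[:-1]
t = len(r0)
s00, s11, s01 = r0.T @ r0 / t, r1.T @ r1 / t, r0.T @ r1 / t

def cc_eigs(h):
    # descending solutions of |lam * h'S11h - h'S10 S00^{-1} S01 h| = 0
    a = h.T @ s01.T @ np.linalg.solve(s00, s01) @ h
    return eigh(a, h.T @ s11 @ h, eigvals_only=True)[::-1]

# m = 1 restriction in every vector: the coefficients on x1 and x2 sum to
# zero, so both enter all relations only through the difference x1 - x2
H = null_space(np.array([[1., 1., 0., 0.]]))

lam  = cc_eigs(np.eye(p))[:r]        # unrestricted eigenvalues
lamc = cc_eigs(H)[:r]                # restricted eigenvalues
lr = t * (np.log(1 - lamc).sum() - np.log(1 - lam).sum())
# lr >= 0, asymptotically chi-square with r*m = 2 degrees of freedom
```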

9.2.1 Illustrations

The test of $\beta = H\varphi$ can be calculated in CATS by choosing the option Restrictions on subsets of the β vectors. The program first prompts for the number of different groups. When the same restriction is imposed on all vectors, the answer is [1]. The program then asks for the number of restrictions [m]. In Chapter 6 we discussed the role of the deterministic components in the model and showed that if there are linear trends in the data, the most general formulation is to include a linear trend inside the cointegration relations and an unrestricted constant in the VAR equations. As discussed in Chapter 8, this formulation delivers similar inference based on the trace test for cointegration rank. After the rank is determined, we first test whether the


linear trend is needed in the cointegration relations, i.e. whether we can impose a zero restriction on the trend coefficient in all cointegrating relations.

Example 1: A test of long-run exclusion of a linear trend in the cointegration relations for $x_t' = [m_t^r, y_t^r, \Delta p_t, R_{m,t}, R_{b,t}, Ds83_t, t]$:

$$H_1: \beta^c = H\varphi \quad \text{or} \quad R'\beta = 0,$$

where

$$H = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad \varphi = \begin{bmatrix} \varphi_{11} & \varphi_{12} & \varphi_{13} \\ \varphi_{21} & \varphi_{22} & \varphi_{23} \\ \varphi_{31} & \varphi_{32} & \varphi_{33} \\ \varphi_{41} & \varphi_{42} & \varphi_{43} \\ \varphi_{51} & \varphi_{52} & \varphi_{53} \\ \varphi_{61} & \varphi_{62} & \varphi_{63} \end{bmatrix},$$

and

$$R' = [0, 0, 0, 0, 0, 0, 1].$$

The design matrix $H$ is specified so that the last row, corresponding to the trend, is equal to zero, and the restriction matrix $R$ so that it makes the last coefficient row of $\beta$ equal to zero. With few restrictions, the $R$ formulation is more parsimonious than the $H$ formulation. Table 9.1 reports the estimates of $\beta^c$ and $\lambda_i^c$ in the restricted model. It appears that the $\lambda_i^c$ hardly change compared to the $\lambda_i$. The same is true for $\beta^c$ compared to the unrestricted $\beta$ reported in Table 7.1 in Chapter 7. Not surprisingly, the three zero restrictions on the trend can be accepted with a p-value of 0.61, and we conclude that the appropriate specification is Case 3 of Section 6.3, i.e. no trend in the cointegration relations, unrestricted constant in the model. When testing the hypothesis of long-run exclusion of variables which are strongly correlated, i.e. collinear, we sometimes find that long-run exclusion is accepted even if it subsequently turns out that the variable in question was very important in at least some of the long-run relations. Thus, some caution is needed with this test, in particular if the variables are strongly collinear. This could be the case, for example, if two variables are strongly trending.


Example 2: A test of long-run exclusion of the shift dummy $Ds83_t$ in the cointegration relations for $x_t' = [m_t^r, y_t^r, \Delta p_t, R_{m,t}, R_{b,t}, Ds83_t, t]$. At around this date the graphs of the variables in levels and differences reported in Section 3.2 showed a major shift in the level of money stock, a less pronounced shift in the level of real aggregate demand, and a big blip in the differences of the bond rate. A blip in $\Delta x_t$ corresponds to a shift in the level of $x_t$. Because it is often hard to know whether the mean shift will cancel in a cointegration relation, it is often useful to include the shift dummy in the cointegration relations and then test its significance. The specification of the $H$ and $R$ matrices is similar to that in the test for long-run exclusion of the trend and will, therefore, not be reported. The hypothesis of long-run exclusion of the shift dummy $Ds83_t$ was rejected with a p-value of 0.01. The restricted estimates are reported under $H_2$ in Table 9.1. Comparing the estimates under $H_2$ to the estimates under $H_1$ (which are close to the unrestricted estimates), it appears that $\beta_1^c$ has hardly changed (implying that the shift dummy was not needed in this relation), whereas this is not the case for $\beta_2^c$ and $\beta_3^c$. This implies that the estimates of the money demand relation and of the interest rate relation are sensitive to whether we include the shift dummy or not: the income coefficient in $\beta_2^c$ is no longer close to -1, as it was when the shift dummy was allowed to enter the relation. Moreover, the interest rate coefficients have almost doubled and one of them has changed sign. Thus, the consequence of deregulating capital movements in 1983 seems to have been a shift in money stock to a higher level without a corresponding shift in the level of real income.

If the shift to the new (deregulated) equilibrium level of money stock is not appropriately accounted for by the shift dummy, then the econometric procedure will account for the extraordinary increase in money stock by increasing the estimates of the real income and interest rate coefficients. The deregulation of capital movements in 1983 also affected the Danish interest rates, in particular the long-term bond rate. As a result of the increased foreign demand for Danish bonds, the previously very high yield on bonds dropped dramatically and, after a period of increased volatility, settled down at a much lower level where it has stayed ever since. Figure 3.3 in Section 3.2 illustrates this graphically. The effect was a marked decrease in the interest rate spread and a similar gradual stabilization at a much lower level in the new regime. Figure 2.5 in Chapter 2 illustrates this graphically. Without accounting for this shift in the equilibrium level between the interest rates


(the risk premium), the relationship between the two interest rates becomes negative, as $\beta_3^c$ demonstrates. This example serves as an illustration of the importance of appropriately accounting for major reforms and interventions in order not to bias the coefficient estimates of the economic steady-state relations.

Example 3: A test of long-run homogeneity between $m^r$ and $y^r$ in all cointegrating relations for $x_t' = [m_t^r, y_t^r, \Delta p_t, R_{m,t}, R_{b,t}, Ds83_t, t]$. The hypothesis $H_3$ can be accepted with a p-value of 0.13. The design matrices $H$ and $R$ have the following form:

$$H = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad R' = [1, 1, 0, 0, 0, 0, 0].$$

The restricted estimates are reported in Table 9.1 under $H_3$ and show that $\beta^c$ does not differ very much from the unrestricted values. The coefficients on real money and real income are very small except in the second relation, suggesting that they enter significantly in only one of the three cointegration relations. Thus, two of the acceptable long-run homogeneity restrictions are describing a (0, 0) relationship.

Example 4: A test of the interest rate spread in all cointegration relations. The hypothesis $H_4$ was strongly rejected with a p-value of 0.00. The $H$ and $R$ matrices are similar to those of the previous example and will not be reported. The restricted estimates under $H_4$ are reported in Table 9.1. It appears that the spread restriction is violated in the first and the third restricted cointegration relations. The violation of the spread restriction in the first relation suggests that inflation is not exclusively related to the interest rate spread, but also to the level of nominal interest rates. The violation of the spread in the third relation suggests that the two interest rates are not homogeneously related to each other.

Example 5: A joint test of $H_1$ and $H_3$. It appeared that the zero restriction on the trend and the homogeneity restriction on real money and income were each acceptable. The joint hypothesis $H_5$ was accepted with a p-value of 0.20. Note that the joint test is not the sum of the individual tests, because the latter are not independent. It frequently happens that the joint


Table 9.1: Estimated eigenvalues, eigenvectors, and loadings for the Danish data

    λ_i            m^r     y^r     Δp      R_m     R_b    Ds83   trend
    H1: β_7 = 0, χ²(3) = 1.80, p-value = 0.61
    0.58  β^c_1:   0.03    0.05    1.00   -1.63    0.81  -0.00    0.0
    0.36  β^c_2:   1.00   -1.05   -0.83  -13.9    14.2   -0.15    0.0
    0.26  β^c_3:   0.01    0.00    0.04    1.00   -0.37  -0.00    0.0
    H2: β_6 = 0, χ²(3) = 11.69, p-value = 0.01
    0.58  β^c_1:   0.01    0.06    1.00   -1.66    0.67    0.0    0.00
    0.36  β^c_2:   1.00   -1.34   -1.21   27.7    21.5     0.0   -0.00
    0.26  β^c_3:   0.12   -0.16    0.05    1.00    0.86    0.0   -0.00
    H3: β_1 = -β_2, χ²(3) = 5.71, p-value = 0.13
    0.58  β^c_1:   0.06   -0.06    1.00   -2.69    1.46   0.01    0.00
    0.36  β^c_2:   1.00   -1.00   -0.86  -13.2    14.0   -0.15    0.00
    0.26  β^c_3:   0.02   -0.02    0.05    1.00   -0.33  -0.00    0.00
    H4: β_4 = -β_5, χ²(3) = 15.33, p-value = 0.00
    0.58  β^c_1:   0.03    0.05    1.00   -0.54    0.54  -0.00    0.00
    0.36  β^c_2:   1.00    1.04   -0.80  -14.5    14.5   -0.12   -0.00
    0.26  β^c_3:  -0.02    0.01    0.01    1.00   -1.00  -0.01    0.00
    H5: β_1 = -β_2 and β_7 = 0, χ²(6) = 8.54, p-value = 0.20
    0.58  β^c_1:   0.06   -0.06    1.00   -2.69    1.46   0.01    0.0
    0.36  β^c_2:   1.00   -1.00   -0.86  -13.2    14.0   -0.15    0.0
    0.26  β^c_3:   0.02   -0.02    0.05    1.00   -0.33  -0.00    0.0

test rejects, though the individual tests are accepted. The design matrices $H$ and $R$ are formulated as:

$$H = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad R' = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$

We have given several examples of restrictions that were acceptable and restrictions that were not. In the latter cases it was quite straightforward to


see why a restriction violated the information in the data. This was partly because the unrestricted cointegration relations could be roughly interpreted as plausible steady-state relations, c.f. the discussion in Section 6.7. In most cases it is not so obvious why a restriction violates the information in the data. To be able to re-specify the hypothesis in a more adequate way, it is crucial to understand why a restriction is rejected. A useful tool in this context is to compare the unrestricted $\Pi$ with the restricted $\Pi^c$ matrix. If the restrictions are acceptable, then the constrained $\Pi^c = \alpha^c\beta^{c\prime} \approx \alpha\beta' = \Pi$, i.e. the constrained cointegration relations and the corresponding weights should approximately reproduce the unrestricted $\Pi$. If the restrictions are not acceptable, then $\Pi^c = \alpha^c\beta^{c\prime} \neq \alpha\beta' = \Pi$. To illustrate how to locate the reason for a test failure, Table 9.2 compares $\Pi$ with $\Pi^c$, where the latter is calculated under the hypothesis $H_4$ in Table 9.1. It is now easy to see that imposing the interest rate spread on all cointegrating relations hardly changes the rows of $\Pi^c$ for real money, real income, and the bond rate, but does change the rows for inflation and the deposit rate. In the inflation equation the significant effect of the deposit rate in $\Pi$ has now disappeared, and in the deposit rate equation the deposit rate is no longer related to the bond rate. Therefore, if we had started the analysis with the transformed vector $\tilde{x}_t' = [m^r, y^r, \Delta p, (R_m - R_b)]$, important feedback information from changes in nominal interest rates on the system would have been lost. Such transformations are often suggested by the underlying theory model and, therefore, frequently employed in empirical models without first testing for data admissibility. The consequence is, however, that potentially important information in the original variables is lost.

9.3 Some β assumed known

This test is useful when we want to test whether a hypothetical known vector is stationary. For example, we might be interested in whether the real interest rate, defined as $R - \Delta p$, is stationary, whether $\Delta p$ is stationary by itself, and whether the income velocity of money, $m^r - y^r$, is stationary. To formulate this hypothesis it is convenient to decompose the $r$ cointegrating relations into $s$ known vectors $b$ (in most cases $s = 1$) and $r - s$ unrestricted vectors $\varphi$:


[Table 9.2 about here.]

Table 9.2: Comparing the combined effects of unrestricted and restricted cointegration relations for r = 3: $\Pi = \hat{\alpha}\hat{\beta}'$ (UR) versus $\Pi^c = \hat{\alpha}^c\hat{\beta}^{c\prime}$ (R) under $H_4$. Rows are the equations for $\Delta m_t^r$, $\Delta y_t^r$, $\Delta^2 p_t$, $\Delta R_{m,t}$, and $\Delta R_{b,t}$, each reported unrestricted (UR) and restricted (R); columns are the combined coefficients on $m^r$, $y^r$, $\Delta p$, $R_m$, $R_b$, $Ds83$, and trend, with t-values in parentheses.
178

CHAPTER 9. TESTING RESTRICTIONS

$$H_c(r): \beta^c = (b, \varphi), \tag{9.4}$$

where $b$ is a $p_1 \times s$ matrix of known vectors and $\varphi$ is a $p_1 \times (r-s)$ matrix of unrestricted vectors. We partition $\alpha$ accordingly:

$$\alpha = (\alpha_1, \alpha_2), \tag{9.5}$$

where $\alpha_1$ contains the adjustment coefficients to $b'x_{t-1}$ and $\alpha_2$ those to $\varphi'x_{t-1}$. The cointegrated VAR model can now be written as:

$$\Delta x_t = \alpha_1 b'x_{t-1} + \alpha_2\varphi'x_{t-1} + \Gamma_1\Delta x_{t-1} + \Phi D_t + \varepsilon_t. \tag{9.6}$$

The unrestricted estimates $\hat{\alpha}$ and $\hat{\beta}$ are first obtained from the concentrated model:

$$R_{0t} = \alpha\beta'R_{1t} + \varepsilon_t \tag{9.7}$$

defined in (??). Inserting (9.4) and (9.5) in (9.7) we get:

$$R_{0t} = \alpha_1 b'R_{1t} + \alpha_2\varphi'R_{1t} + \varepsilon_t. \tag{9.8}$$

Since the variable $b'R_{1t}$ is known and stationary under $H_c$, it can be concentrated out from (9.8):

$$R_{0,t} = B_1 b'R_{1,t} + R_{0.b,t} \quad \text{and} \quad R_{1,t} = B_2 b'R_{1,t} + R_{1.b,t}.$$

The new concentrated model is given as:

$$R_{0.b,t} = \alpha_2\varphi'R_{1.b,t} + \varepsilon_t. \tag{9.9}$$


The remaining task is now to derive the ML estimator of (φ, α₁, α₂). Two complications have to be solved. First, if φ happens to lie in the space spanned by b, then the model becomes singular. To avoid this we can force φ to lie in the space spanned by b_⊥. This can be achieved by reformulating model (9.9) for φ = b_⊥ψ:

R_{0.b,t} = α₂ψ′b_⊥′R_{1.b,t} + ε_t

Second, because the maximum of the likelihood function is given by:

L_max^{−2/T} = |S_{00.b}| ∏_{i=1}^{r−s} (1 − λ_i^c)

and |S_{00.b}| ≠ |S_{00}|, the determinants do not cancel in the LR test. This can be circumvented by solving an auxiliary eigenvalue problem:

|ρS_{00} − S_{01}b(b′S_{11}b)⁻¹b′S_{10}| = 0

from which we get the eigenvalues:

ρ₁ > ... > ρ_s > ρ_{s+1} = ... = ρ_p = 0

If s = 1, as is usually the case in practical applications, only ρ₁ > 0. It can be shown that

|S_{00.b}| = |S_{00}| ∏_{i=1}^{s} (1 − ρ_i)

Therefore,

L_max^{−2/T}(H_c(r)) = |S_{00}| ∏_{i=1}^{s} (1 − ρ_i) ∏_{i=1}^{r−s} (1 − λ_i^c)

and the LR test procedure

Λ^{−2/T} = L_max^{−2/T}(H_c(r)) / L_max^{−2/T}(H(r))

gives us the LR test statistic:

−2 ln Λ = T{ln(1 − ρ₁) + ... + ln(1 − ρ_s) + ln(1 − λ₁^c) + ... + ln(1 − λ_{r−s}^c) − ln(1 − λ̂₁) − ... − ln(1 − λ̂_r)}

which is asymptotically distributed as χ² with (p₁ − r)s degrees of freedom.
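The whole test thus reduces to two eigenvalue problems and a sum of log terms. The numpy fragment below sketches the calculation; the moment matrices and the eigenvalues of the unrestricted and restricted problems are taken as given, and the function name is made up for illustration:

```python
import numpy as np

def lr_known_beta(S00, S01, S11, b, lam_hat, lam_c, T, p1, r):
    """LR test of H_c: beta = (b, phi) with b known -- a minimal sketch.

    lam_hat : the r largest unrestricted eigenvalues
    lam_c   : the r - s eigenvalues lambda_i^c from the restricted problem
    """
    s = b.shape[1]
    # auxiliary problem |rho*S00 - S01 b (b'S11 b)^{-1} b'S10| = 0
    M = S01 @ b @ np.linalg.inv(b.T @ S11 @ b) @ b.T @ S01.T
    rho = np.sort(np.linalg.eigvals(np.linalg.solve(S00, M)).real)[::-1][:s]
    stat = T * (np.sum(np.log(1 - rho)) + np.sum(np.log(1 - lam_c))
                - np.sum(np.log(1 - lam_hat)))
    return stat, (p1 - r) * s    # test statistic and degrees of freedom
```

Only the s largest eigenvalues of the auxiliary problem are nonzero, exactly as in the derivation above.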

9.3.1 Illustrations

This test can be calculated in CATS by choosing the same option as in the previous test: Restrictions on subsets of the β vectors. The program first prompts you to input the number of different groups. If there is one known vector to test, then the answer is [2], i.e. one group containing the known vector and the second group containing the remaining r − 1 vectors. The program then prompts for the number of vectors in group 1 [1] and for the number of restrictions [p1−1]. Then you need to type in the restrictions defining the known vector and save the restrictions. Finally, the program will ask for the number of vectors in the second group [r−1] and the number of restrictions [0]. As an illustration of the test procedure we will ask whether the inflation rate, the real interest rates, and the interest rate spread are stationary by themselves. Note that the null is stationarity in this case. First, the test that Δp ~ I(0) is formulated as:

H₆: β^c = (b, φ), where b′ = (0, 0, 1, 0, 0, 0, 0)

i.e. b is a unit vector that picks up the inflation rate. The remaining r − 1 = 2 vectors are unrestricted and described by the matrix φ of dimension p₁ × (r − 1) = 7 × 2. The coefficients φ_ij are uniquely determined based on the ordering of the eigenvalues and the normalization φ′S_{11.b}φ = I. In most cases the φ_ij are not of particular interest and there is no need to interpret the estimated


coefficients. Only when b is found to be stationary can an inspection of the remaining relations sometimes be helpful in tentatively identifying the complete cointegration structure. The results of testing the four stationarity hypotheses are presented in the first part of Table 9.3. H₆ tests the stationarity of the inflation rate, H₇ of the real deposit rate, H₈ of the real bond rate, and H₉ of the interest rate spread. All of them are rejected. Note, however, that the test results are not independent of the choice of r. If a conservative value of r is chosen (small r), then it will be more difficult to accept the stationarity hypothesis than when r is big. We note, however, that the rejection might be related to the zero restriction on the shift dummy Ds83_t. For example, if the interest rate spread is stationary around one level before 1983 and another level after that date, then H₉ would probably be rejected as a consequence of imposing a zero restriction on Ds83_t. If this is the case, it would be more relevant to ask whether (R_m − R_b − b₁Ds83_t) ~ I(0) rather than (R_m − R_b) ~ I(0), where b₁Ds83_t is the estimated shift in the level of the interest rate spread as a result of the deregulation of capital movements. In the next section we will discuss a test procedure for the situation when at least one of the coefficients is not known and, therefore, has to be estimated.

9.4 Only some coefficients are known

Here we will test restrictions on one (or a few) of the cointegration vectors assuming that some of the coefficients are known and some have to be estimated. In the general case we formulate the hypothesis as β = {H₁φ₁, H₂φ₂, ..., H_rφ_r}, where H_i is a (p₁ × s_i) design matrix, i = 1, ..., r. For simplicity we assume in the following that the r cointegration relations are divided into two groups containing r₁ and r₂ vectors each. Here we will focus on the special case r₁ = 1, r₂ = r − 1 and H₂ = I, and leave the more general specification to the next chapter. This case is useful for asking questions about the stationarity of a single hypothetical cointegrating relation while leaving the remaining r − 1 relations unrestricted. For example, this test would answer the question whether there exists any stationary combination between the nominal interest rate R and inflation Δp, i.e. whether (R − ωΔp) is stationary for some value of ω. The test procedure will find the value that produces the most stationary relation and calculate the p-value associated with the null hypothesis. The latter is expressed as:

β^c = (β₁^c, φ₂) = (H₁φ₁, φ₂)    (9.10)

where H₁ is a known design matrix of dimension p₁ × s₁, φ₁ is an s₁ × 1 vector and φ₂ is a p₁ × (r − 1) matrix of unrestricted coefficients. Again we partition α so that it corresponds to the partitioning of β^c: α = (α₁, α₂). The concentrated model can be written as:

R_{0t} = α₁φ₁′H₁′R_{1t} + α₂φ₂′R_{1t} + ε_t

The estimation problem is now more complicated because neither φ₁′H₁′R_{1t} nor φ₂′R_{1t} is known, so they cannot be concentrated out. Different algorithms for solving nonlinear estimation problems can be used. CATS uses a switching algorithm described below:

1. Estimate an initial value of β₁^c = H₁φ̂₁.

2. For a fixed value of β₁^c = β̂₁, estimate α₂ and φ₂ by reduced rank regression of R_{0.β̂₁,t} on R_{1.β̂₁,t}, where R_{0.β̂₁,t} and R_{1.β̂₁,t} are corrected for β̂₁′R_{1t}. This defines β̂₂ and L_max^{2/T}(β̂₁).

3. For a fixed value of β₂ = β̂₂, estimate α₁ and φ₁ by reduced rank regression of R_{0.β̂₂,t} on H₁′R_{1.β̂₂,t}, where R_{0.β̂₂,t} and H₁′R_{1.β̂₂,t} are corrected for β̂₂′R_{1t}. This defines β̂₁ = H₁φ̂₁ and L_max^{2/T}(β̂₂).

4. Repeat the steps, always using the last obtained values of β̂_i, until the values of the maximized likelihood function converge. In CATS the criterion is |L_max^{2/T}(β̂₁) − L_max^{2/T}(β̂₂)| ≤ 0.000001 for the algorithm to stop. A maximum of 200 iterations is set by CATS.

The eigenvalue problem for fixed β₁ = β̂₁ is given by:

|λS_{11.β̂₁} − S_{10.β̂₁}(S_{00.β̂₁})⁻¹S_{01.β̂₁}| = 0

or for fixed β₂ = β̂₂ by:

|λH₁′S_{11.β̂₂}H₁ − H₁′S_{10.β̂₂}(S_{00.β̂₂})⁻¹S_{01.β̂₂}H₁| = 0

and the maximum of the likelihood function by:

L_max^{−2/T}(H_c) = |S_{00}| (1 − λ̃₁)...(1 − λ̃_r)

where λ̃_i are the eigenvalues obtained after convergence of the likelihood function. Using the LR procedure, the hypothesis (9.10) can be tested by calculating the test statistic:

−2 ln Λ = T{ln(1 − λ̃₁) + ... + ln(1 − λ̃_r) − ln(1 − λ̂₁) − ... − ln(1 − λ̂_r)},    (9.11)

which is asymptotically χ² distributed with the degrees of freedom given by:

ν = (m₁ − r + 1) = (p₁ − r) − (s₁ − 1)
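The switching steps can be mimicked in a few lines of numpy. The sketch below is not the CATS implementation: it uses pseudo-inverses to sidestep the singularities that the analytical treatment handles exactly, and all names are illustrative:

```python
import numpy as np

def rrr(R0, R1, rank):
    """Reduced rank regression: eigenvectors/eigenvalues of the Johansen problem."""
    T = R0.shape[0]
    S00 = R0.T @ R0 / T; S11 = R1.T @ R1 / T; S01 = R0.T @ R1 / T
    A = np.linalg.pinv(S11) @ S01.T @ np.linalg.solve(S00, S01)
    lam, V = np.linalg.eig(A)
    order = np.argsort(-lam.real)[:rank]
    return V.real[:, order], lam.real[order]

def partial_out(Y, Z):
    """Residuals of Y after OLS on Z (concentrating out Z)."""
    return Y - Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]

def switching(R0, R1, H1, r, max_iter=200, tol=1e-6):
    """beta = (H1*phi1, phi2): alternate between the two RRR steps."""
    phi1, _ = rrr(R0, R1 @ H1, 1)                     # step 1: initial value
    crit_old = np.inf
    for _ in range(max_iter):
        b1 = H1 @ phi1
        # step 2: fix beta1, estimate phi2 after correcting for b1'R1t
        phi2, lam2 = rrr(partial_out(R0, R1 @ b1), partial_out(R1, R1 @ b1), r - 1)
        # step 3: fix beta2 = phi2, estimate phi1 in the restricted directions
        phi1, lam1 = rrr(partial_out(R0, R1 @ phi2),
                         partial_out(R1 @ H1, R1 @ phi2), 1)
        crit = np.sum(np.log(1 - lam1)) + np.sum(np.log(1 - lam2))
        if abs(crit - crit_old) < tol:                # CATS: 1e-6, max 200 iterations
            break
        crit_old = crit
    return np.hstack([H1 @ phi1, phi2])
```

The log-eigenvalue criterion plays the role of the likelihood measure L_max^{2/T} in the convergence check; a production implementation would also normalize and scale the vectors as CATS does.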

9.4.1 Illustrations

This test can be performed in CATS by choosing the same option as in the previous test: Restrictions on subsets of the β vectors. The program first prompts you to input the number of different groups, usually [2], i.e. one group containing the restricted vector(s) and the second group the remaining vectors. The program prompts for the number of vectors in group 1 [1] and for the number of restrictions [m]. Then CATS will ask you to type in the restrictions of the design matrix H, or alternatively the restriction matrix R, defining the restricted vector, and save the restrictions. Finally, the program will ask for the number of vectors in the second group [r−1] and the number of restrictions [0]. We will now illustrate this type of tests:

β^c = {H₁φ₁, φ₂}    (9.12)

Table 9.3: Testing the stationarity of simple relations

         m^r    y^r    Δp      R_m     R_b     Ds83     trend    χ²(ν)     p-val
Tests of a known vector
H6        0      0      1       0       0       0        0       20.0(4)    0.00
H7        0      0      1      -1       0       0        0       16.5(4)    0.00
H8        0      0      1       0      -1       0        0       12.7(4)    0.01
H9        0      0      0       1      -1       0        0       20.1(4)    0.00
Tests of a trend-stationary relation
H10       1      0      0       0       0      -0.57     0.003    8.9(2)    0.01
H11       0      1      0       0       0      -0.28     0.003   12.6(0)    0.01
Tests of velocity relations
H12       1     -1      0       0       0      -0.39     0.002    8.5(2)    0.01
H13       0.04  -0.04   1.0     0       0      -0.01     0.000    4.4(1)    0.04
H14       1     -1      0     -14.5    14.5    -0.113   -0.001    0.02(1)   0.88
H15       1     -1      0     -14.1    14.1    -0.145    0        0.65(2)   0.72
Tests of real income relations
H16       0      1     11.7     0       0      -0.053    0.002    0.39(1)   0.53
H17       0      1     15.3   -15.3     0      -0.105    0.002    2.27(1)   0.13
H18       0      1     17.7     0     -17.7    -0.438    0.006    9.32(1)   0.00
Tests of inflation, real interest rates and the spread
H19       0      0      1       0       0       0.015    0        7.4(3)    0.06
H20       0      0      1      -1       0       0.009    0        7.8(3)    0.05
H21       0      0      1       0      -1      -0.001    0       12.6(3)    0.01
H22       0      0      0       1      -1      -0.001    0       14.3(3)    0.00
Tests of combinations of interest rates and inflation rate
H23       0      0      1      -0.45    0       0.012    0        7.1(2)    0.03
H24       0      0      1       0      -0.20    0.011    0        6.8(2)    0.03
H25       0      0      0       1      -0.45   -0.001    0        0.9(2)    0.64
Tests of homogeneity between inflation and the interest rates
H26       0      0      1.0    -0.25   -0.75    0        0       12.1(3)    0.01
H27       0      0      1.0    -1.66    0.66    0.015    0        5.8(2)    0.06
H28       0     -0.04   1.0    -1.56    0.56    0       -0.00     0.02(1)   0.89


where we impose restrictions on just one of the vectors and leave the remaining ones unrestricted. This test is indispensable for identifying irreducible sets of cointegrated variables (Davidson, 2001), our so-called building blocks. Rejecting the stationarity of the hypothetical relation φ₁′H₁′x_t implies that it does not qualify as a steady-state relation in the final long-run structure. Therefore, systematically testing the stationarity of all possible relationships is an indispensable help in spotting potentially relevant cointegration relations in the structure of identified long-run steady-state relations. The test of a fully identified cointegration structure will be discussed in more detail in the next chapter.

Trend-stationarity of real money and real income is tested as the hypotheses H₁₀ and H₁₁. Neither of them is accepted. Note also that the estimated trend coefficient implies a negatively sloped trend in both real money and income!

The next group of tests, H₁₂−H₁₅, are about (inverse) velocity relations. The trend-stationarity of the liquidity ratio, H₁₂, is clearly rejected (see also Figure 2.5 in Chapter 2). It appears from H₁₃ that the liquidity ratio and inflation do not appear to be cointegrated, while H₁₄ shows strong cointegration between the liquidity ratio and the interest rate spread. To investigate whether the trend is significant, we have imposed a zero restriction on the trend in H₁₅. The difference between the two test values, 0.65 − 0.02 = 0.63, is approximately χ²(1) and, hence, the effect is not significant and the trend can be set to zero in H₁₄.

The hypotheses H₁₆−H₁₈ are tests about real income relations. Real income and inflation appear cointegrated in H₁₆, but with implausible coefficients. This is probably a consequence of a declining inflation and a modestly growing real income in this period. Though found stationary, we would not consider this relation to be a serious candidate for an economic steady-state relation.
The same is true for H₁₇, which cannot be interpreted as an IS relation, and for H₁₈, the stationarity of which is rejected.

The tests in the next group of hypotheses, H₁₉−H₂₂, are similar to the tests in the first part of the table. The difference is that we here allow the shift dummy Ds83_t to enter the relations. However, none of the variables becomes convincingly stationary and we conclude that inflation, the real interest rates and the spread are all nonstationary even when we allow for a mean shift in 1983:1.

The hypotheses H₂₃−H₂₅ are similar to H₂₀−H₂₂, except that the unitary restriction on the bond rate coefficient is now relaxed. We note that the two interest rates appear cointegrated with a coefficient of 0.45. The low value of the coefficient can possibly be explained by the deposit rate being an average of the interest rates on all the components of M3, some of which yielded no interest, while others quite a high interest.

Finally, the last group tests homogeneity between the inflation rate and the two interest rates. The design matrices in H₂₆ are specified as (the restrictions set the coefficients on m^r, y^r, Ds83 and the trend to zero and impose homogeneity between Δp, R_m and R_b):

H = [ 0  0 ]        R = [ 1 0 0 0 0 ]
    [ 0  0 ]            [ 0 1 0 0 0 ]
    [ 1  0 ]            [ 0 0 1 0 0 ]
    [ 0  1 ]            [ 0 0 1 0 0 ]
    [-1 -1 ]            [ 0 0 1 0 0 ]
    [ 0  0 ]            [ 0 0 0 1 0 ]
    [ 0  0 ]            [ 0 0 0 0 1 ]

Because H₂₆ is rejected, we ask in H₂₇ whether the results would improve by allowing for a mean shift in 1983:1. Stationarity did improve, but not convincingly so. By inspecting the α^c and comparing it with the unrestricted α̂ (not reported here), we found that imposing H₂₇ on the data caused real income to become insignificant in the inflation equation. Hence, by adding real income (and the trend) to H₂₇ we obtained H₂₈, which was strongly accepted. To summarize: this exercise has identified the following three possible candidates for a long-run structure: H₁₅, H₂₅, and H₂₈. These will be tested jointly in Chapter 10.
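The design matrix H and the restriction matrix R are linked by H = R_⊥, i.e. the columns of H span the null space of R′ (any basis of that null space will do). For the homogeneity hypothesis H₂₆, with variable ordering (m^r, y^r, Δp, R_m, R_b, Ds83, trend), this can be checked mechanically — a sketch:

```python
import numpy as np

# R'beta = 0 for H26: zero coefficients on real money, real income,
# Ds83 and the trend, plus homogeneity of inflation and the two rates.
R = np.array([[1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 0, 1]], dtype=float)

def orthogonal_complement(R):
    """H = R_perp via the SVD: left singular vectors beyond rank(R)."""
    U, _, _ = np.linalg.svd(R, full_matrices=True)
    return U[:, np.linalg.matrix_rank(R):]

H = orthogonal_complement(R)        # 7 x 2: two free parameters, as in H26
print(np.allclose(R.T @ H, 0))      # R'H = 0, so beta = H*phi satisfies R'beta = 0
```

The SVD returns an orthonormal basis rather than the unit-coefficient basis one would type into CATS, but both span the same restricted space.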

9.5 Long-run weak exogeneity: restrictions on α

The hypothesis that a variable has influenced the long-run stochastic path of the other variables of the system, while at the same time not having been influenced by them, is called the hypothesis of no levels feedback, or long-run weak exogeneity when the parameters of interest are β. We test the following hypothesis on α:

H_α^c(r): α = Hψ    (9.13)


where α is p × r, H is a p × s matrix, ψ is an s × r matrix of nonzero coefficients, and s ≥ r. (Compare this formulation with the hypothesis of the same restriction on all the cointegration vectors, i.e. β = Hφ.) As with tests on β, we can express the restriction (9.13) in the equivalent form:

H_α^c(r): R′α = 0    (9.14)

where R = H_⊥. Since the R matrix is often of much smaller dimension than the H matrix, the default in CATS is to input the R matrix. The condition s ≥ r implies that the number of non-zero rows in α must not be smaller than r. This is because a variable that has a zero row in α does not adjust to the long-run relations and, hence, can be considered as a common driving trend in the system. Since there can be at most (p − r) common trends, the number of zero-row restrictions can at most be equal to (p − r). The hypothesis (9.13) can be expressed as:

H₀: α = Hψ = [ α₁^c ]
             [  0   ]

Under H_α^c(r):

Δx_t = Γ₁Δx_{t−1} + Hψβ′x_{t−1} + μ₀ + ΦD_t + ε_t.    (9.15)

We consider first the concentrated model:

R_{0t} = αβ′R_{1t} + ε_t

Under H₀:

R_{0t} = Hψβ′R_{1t} + ε_t

with ε_t ~ NIID(0, Ω) and

Ω = [ Ω₁₁  Ω₁₂ ]
    [ Ω₂₁  Ω₂₂ ]

where Ω₁₂ is the covariance matrix between the errors from the (p − m) endogenous variables Δx_{1,t} and the errors from the m weakly exogenous variables Δx_{2,t}. The weak exogeneity hypothesis can be tested with an LR test procedure described in Johansen and Juselius (1990). The idea behind the derivation of the test procedure is to partition the equation system (9.15) into the Δx_{1,t} and the Δx_{2,t} equations. Formally this can be done by multiplying (9.15) with H̄′ and H_⊥′ respectively. To obtain simpler results we use the normalized matrix H̄ = H(H′H)⁻¹ instead of H (because H̄′H = (H′H)⁻¹H′H = I), i.e.:

H̄′R_{0t} = ψβ′R_{1t} + H̄′ε_t
H_⊥′R_{0t} = H_⊥′ε_t    (9.16)

For the baby model this would correspond to:

Δx_{1t} = α₁^c β′x_{t−1} + ε_{1t}
Δx_{2t} = ε_{2t}

The next step is to formulate the joint model (9.16) as a conditional and a marginal model, using results for the multivariate normal distribution:

H̄′R_{0t} = ψβ′R_{1t} + ωH_⊥′R_{0t} − ωH_⊥′ε_t + H̄′ε_t
H_⊥′R_{0t} = H_⊥′ε_t    (9.17)

where ω = Ω₁₂Ω₂₂⁻¹. For the baby model this would correspond to:

Δx_{1t} = α₁^c β′x_{t−1} + ωΔx_{2t} − ωε_{2t} + ε_{1t}
Δx_{2t} = ε_{2t}

i.e.

Δx_{1t} = α₁^c β′x_{t−1} + ωΔx_{2t} + ε_{1.2t}
Δx_{2t} = ε_{2t}

where ε_{1.2t} = ε_{1t} − ωε_{2t} and Cov(ε_{1.2t}, ε_{2t}) = 0. All relevant information about α and β is now in the Δx_{1t} equation, and we can solve the eigenvalue problem entirely based on that equation. Since Δx_{2t} is stationary, we can correct R_{0,t} (Δx_{1t}) and R_{1,t} (x_{t−1}) for this variable based on the auxiliary regressions:

R_{0t} = B₁Δx_{2t} + R_{0.H⊥t}

and

R_{1t} = B₂Δx_{2t} + R_{1.H⊥t}

so that:

R_{0.H⊥t} = ψβ′R_{1.H⊥t} + u_t    (9.18)

The usual eigenvalue problem is based on (9.18) and the solution delivers p − m eigenvalues λ̃_i. The r largest are used in the LR test, which is given by:

−2 ln Λ(H_α^c(r) | H(r)) = T Σ_{i=1}^{r} {ln(1 − λ̃_i) − ln(1 − λ̂_i)}

It is asymptotically distributed as χ²(ν), where ν = r·m and m is the number of weakly exogenous variables.
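The test can be sketched as follows: compute the unrestricted eigenvalues, partial the residuals of the endogenous equations out on the differenced candidate exogenous variables, re-solve the eigenvalue problem and compare. The numpy fragment below is a rough illustration, not the CATS code; in particular it uses the residuals of the exogenous equations in place of Δx_{2t}:

```python
import numpy as np

def partial_out(Y, Z):
    """Residuals of Y after OLS on Z."""
    return Y - Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]

def eigvals(R0, R1):
    """Eigenvalues of the reduced rank regression of R0 on R1."""
    T = R0.shape[0]
    S00 = R0.T @ R0 / T; S11 = R1.T @ R1 / T; S01 = R0.T @ R1 / T
    A = np.linalg.pinv(S11) @ S01.T @ np.linalg.solve(S00, S01)
    return np.sort(np.linalg.eigvals(A).real)[::-1]

def weak_exogeneity_lr(R0, R1, exog_idx, r):
    """LR test of zero rows in alpha (long-run weak exogeneity) -- a sketch."""
    T = R0.shape[0]
    lam_hat = eigvals(R0, R1)[:r]                     # unrestricted eigenvalues
    keep = [i for i in range(R0.shape[1]) if i not in exog_idx]
    dx2 = R0[:, exog_idx]                             # proxy for the Delta-x2 terms
    lam_til = eigvals(partial_out(R0[:, keep], dx2),  # restricted: p - m eigenvalues
                      partial_out(R1, dx2))[:r]
    stat = T * np.sum(np.log(1 - lam_til) - np.log(1 - lam_hat))
    return stat, r * len(exog_idx)                    # statistic and df = r*m
```

With r = 3 and one candidate exogenous variable this reproduces the χ²(3) degrees of freedom used in the empirical illustration below.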

9.5.1 Empirical illustration

We will now test the hypothesis that the bond rate is weakly exogenous for the long-run parameters β in the Danish money demand data, i.e. that α₅₁ = α₅₂ = α₅₃ = 0. We have p = 5, r = 3, m = 1, s = 4. In the system for (Δm^r_t, Δy^r_t, Δ²p_t, ΔR_{m,t}, ΔR_{b,t})′, the hypothesis restricts α = Hψ^c with

H = [ 1 0 0 0 ]
    [ 0 1 0 0 ]
    [ 0 0 1 0 ]
    [ 0 0 0 1 ]
    [ 0 0 0 0 ]

so that the restricted adjustment matrix becomes

α^c = [ α₁₁^c  α₁₂^c  α₁₃^c ]
      [   ⋮      ⋮      ⋮   ]
      [ α₄₁^c  α₄₂^c  α₄₃^c ]
      [   0      0      0   ]

i.e. none of the three cointegration relations β₁′x_{t−1}, β₂′x_{t−1}, β₃′x_{t−1} enters the bond rate equation.

Table 9.4: Tests of long-run weak exogeneity

    r    m^r    y^r     Δp     R_m    R_b    ν    χ²₀.₉₅(ν)
    1    0.4    0.2    29.8    0.5    0.6    1       3.8
    2   10.9    1.0    40.3    0.6    0.6    2       6.0
 →  3   19.2    2.4    50.4   11.1    0.7    3       7.8  ←
    4   23.8    6.6    54.7   15.4    1.1    4       9.5

The specification of the zero-row restriction on the adjustment coefficients of the bond rate is given by the R matrix (which is the form used in CATS):

R′ = [0, 0, 0, 0, 1].

The test statistic, distributed as χ²(3), became 0.7 and the weak exogeneity of the bond rate is clearly accepted. In Table 9.4 we have reported the weak exogeneity test for all variables and for all possible values of r. The row corresponding to our preferred choice r = 3 is indicated by two arrows. The remaining test results for other choices of r serve the purpose of sensitivity analyses. If, instead, we had chosen r = 2 (remember that based on the rank test we could as well have chosen this value), the two interest rates would have become weakly exogenous, implying that each of them would have acted as an independent common driving trend in this system, which a priori would not seem very plausible. On the other hand, if we had chosen r = 1 (as has often been the case in many empirical models of money demand), then only inflation would have been non-exogenous and all information about agents' adjustment towards a money-demand relation would have been lost. This is because the first unrestricted cointegration relation was significant exclusively in the inflation equation (see Table 7.1). Furthermore, the test results in Table 9.4 show that the weak exogeneity result for y^r and R_b is robust to the choice of r, and we might ask whether they are jointly exogenous. Because the weak exogeneity tests of single variables are not independent, the joint hypothesis of two or several variables can be (and often is) rejected even if the latter are individually accepted as weakly exogenous. The design matrix for the joint test is specified as:

R′ = [ 0 0 0 0 1 ]
     [ 0 1 0 0 0 ]

and the test statistic, distributed as χ²(6), became 3.21 with a p-value of 0.78. Hence, we can safely accept both variables to be weakly exogenous in this system. Since m = 2, which corresponds to the number of common trends, p − r, this is the maximum number of weakly exogenous variables in this case.

Long-run weak exogeneity of real income and the bond rate implies that valid inference on β can be obtained from the three-dimensional system describing Δm^r, Δ²p, and ΔR_m conditional on Δy^r and ΔR_b (Johansen, 1992; Hendry and Richard, 1983). The argument is based on a partitioning of the joint density into the conditional and marginal densities:

D(Δx_t | x_{t−1}, Δx_{t−1}, Ds83_{t−1}, D_t; θ) =
  D(Δx_{1t} | Δx_{2t}, x_{t−1}, Δx_{t−1}, Ds83_{t−1}, D_t; θ₁) · D(Δx_{2t} | Δx_{t−1}, D_t; θ₂)    (9.19)
where x_t′ = [x_{1t}′, x_{2t}′], x_{1t}′ = [m^r_t, Δp_t, R_{m,t}], x_{2t}′ = [y^r_t, R_{b,t}], D_t = [D75.4, Ds83_t, Ds83_{t−1}, Dp92.4], and θ₁, θ₂ are variation free and only θ₁ contains the long-run parameters of interest β. In this case the cointegrated VAR(2) model

Δx_t = Γ₁Δx_{t−1} + αβ′x_{t−1} + ΦD_t + ε_t

is equivalent to

Δx_{1t} = ωΔx_{2t} + Γ̃_{1.1}Δx_{t−1} + α₁β′x_{t−1} + Φ̃₁D_t + ε̃_{1t}
Δx_{2t} = Γ_{1.2}Δx_{t−1} + Φ₂D_t + ε_{2t}

where Cov(ε̃_{1.2t}, ε_{2t}) = 0, ω is (3 × 2), Γ̃_{1.1} (3 × 5), α₁ (3 × 3), Φ̃₁ (3 × 5), Γ_{1.2} (2 × 5), and Φ₂ (2 × 5).

When m zero-row restrictions on α are accepted (m ≤ (p − r)), we can partition the p equations into (p − m) equations which exhibit levels feedback, and m equations with no levels feedback. We say that the m variables are weakly exogenous when the parameters of interest are β. Because the


m weakly exogenous variables do not contain information about the long-run parameters β, we can obtain a fully efficient estimate of β from the (p − m) equations conditional on the marginal models of the m weakly exogenous variables. This gives the condition for when a partial model can be used to estimate β without losing information. More formally: let {x_t} = {x_{1,t}, x_{2,t}}, where x_{2,t} is weakly exogenous when β is the parameter of interest; then a fully efficient estimate of β can be obtained from the partial model:

Δx_{1,t} = A₀Δx_{2,t} + Γ̃₁₁Δx_{t−1} + α₁β′x_{t−1} + μ̃₀ + Φ̃₁D_t + ε̃_{1t}.    (9.20)

Thus, to know whether we can estimate β from a partial system, we first need to estimate the full system and test α₂ = 0 in that system. After having estimated the full system, the question is why we would bother to re-estimate a partial system. There are two reasons:

1. It is sometimes a priori very likely that weak exogeneity holds. For example, it is almost sure that the US bond rate has an influence on the Danish economy, while it is almost as sure that the Danish economy is irrelevant for the US bond rate. Hence, testing may not be necessary and we can estimate a partial system conditional on the US bond rate. When the number of potentially relevant variables to include in the VAR model is large, it can be useful to impose weak exogeneity restrictions from the outset.

2. By conditioning on weakly exogenous variables one can often achieve a partial system which has more stable parameters than the full system. This is often the case when the marginal model of the weakly exogenous variable has non-constant parameters or exhibits non-linearities in the parameters.

Finally, finding weakly exogenous variables is often helpful in the identification of the common driving trends, as will be illustrated in Chapter 13. Thus the hypothesis α₂ = 0 is of interest in its own right, as it corresponds to the hypothesis of no long-run feedback.

9.6 Revisiting the scenario analysis

The choice of r = 3 is consistent with the discussion in Chapter 2, where we hypothetically assumed two autonomous common trends, Σ_{i=1}^t u_{1i} and Σ_{i=1}^t u_{2i}, as the driving forces in the small monetary system. We also noted in Chapter 2 that if nominal and real shocks are separated in the sense of nominal shocks exhibiting no real effects (at least in the long run), we would expect to find the following relations to be stationary cointegrating relations: (m − p − y^r) ~ I(0), (R_b − R_m) ~ I(0), (R_b − Δp) ~ I(0). In the next chapter we will demonstrate that each of them imposes two overidentifying restrictions on the cointegrating space, i.e. a total of six testable restrictions, which together would completely determine the cointegration space for x_t = ((m − p)_t, y^r_t, Δp_t, R_{m,t}, R_{b,t})′:

β′x_t = [ 1 -1  0  0  0 ]
        [ 0  0  0  1 -1 ] x_t
        [ 0  0  1  0 -1 ]

However, none of the above relations were found to be stationary, suggesting that there have been permanent interaction effects between the nominal and the real side of the economy (at least over the business cycle horizon). We found strong empirical support for the stationarity of the following relations¹:

{(m − p − y^r) + 14(R_b − R_m)} ~ I(0),
{R_m − 0.45R_b} ~ I(0),
{(Δp − R_m) − 0.56(R_m − R_b) − 0.04(y^r − b₄t)} ~ I(0).

The first relation shows that there is empirical support for a Danish money demand relation of the type discussed in Romer (1996). However, it is a linear combination of two nonstationary relations rather than a linear combination of two stationary relations. The finding that (m^r − y^r)_t ~ I(1) and (R_b − R_m)_t ~ I(1), but {(m^r − y^r) + 14(R_b − R_m)}_t ~ I(0), implies that the stochastic trend in money velocity is the same as the stochastic trend in the spread. We will now take a closer look at the time series implications of this result.

The fact that (m^r − y^r)_t ~ I(1) can mean either that m^r and y^r contain different stochastic trends, or that m^r and y^r have been affected by the same stochastic trend but not in the proportion one to one. In the first case no value of b in the linear combination m^r − by^r would produce stationarity between m^r and y^r, for example because y^r had only been affected by cumulated real shocks, whereas m^r by both real and nominal shocks. The hypothesis (m^r − by^r) ~ I(0) is a testable hypothesis, which was rejected by the data. Thus, money velocity seems to have been affected by the nominal stochastic trend, but possibly also by the real stochastic trend in this period. Which of the two cases is empirically correct can be inferred by examining the time series properties of the two interest rates. The cointegration results showed that even if (R_m − R_b)_t ~ I(1), there existed a stationary linear combination (R_m − 0.45R_b)_t ~ I(0). Hence, the two interest rates do share a common trend, though not in the proportion one to one. This trend is much more likely to describe cumulated nominal rather than real shocks to the system. Since (m^r − y^r) and (R_m − R_b) were found to be cointegrating, money velocity must have been affected by the nominal stochastic trend.

Chapter 2 pointed out that it is not irrelevant for the effectiveness of monetary policy whether we get one result or the other. As an example, let us consider the simple case when the central bank increases its interest rate in order to curb excess aggregate demand in the economy (and hence inflationary pressure). This is based on the assumption that the interest rate shock will first influence all market interest rates from the short to the long end, then lower the demand for investment, and finally bring inflation down.

¹ Similar results have been found in many other empirical applications based on monetary transmission models for Germany, Italy and Spain in the post Bretton Woods period. See for example Juselius (1998a, 1998b), Juselius (2000), and Juselius and Toro (2003).
If (R_m − R_b) ~ I(0), then the shocks will be transmitted one to one and the central bank may very well be successful in bringing inflation down by increasing interest rates. If (R_m − R_b) ~ I(1), then the shocks will not be transmitted one to one and it is less obvious that the central bank is able to bring inflation down by increasing its own interest rates. Whether this is the case or not depends on the remaining cointegration properties and the dynamics of the system (Johansen and Juselius, 2003). For example, it depends on whether excess aggregate demand is related to the real interest rate, whether the latter is stationary or not, whether excess aggregate demand is related to inflation or not, and so on.

Because the implications of monetary policy are likely to differ depending on the cointegration properties between a policy instrument variable, intermediate targets and a goal variable, it is important to have reliable information about these relationships. In periods of deregulation or changes in regimes the latter are likely to change. Therefore, cointegration properties and changes in them are likely to provide valuable information about the consequences of shifting from one regime to another. See for example Juselius (1998a).

Chapter 10 Identification of the Long-Run Structure

When the empirical model is estimated with data that are nonstationary in levels, we need to discuss two different identification problems: identification of the long-run structure, i.e. of the cointegration relations, and identification of the short-run structure, i.e. of the equations. The former is about imposing long-run economic structure on the unrestricted cointegration relations; the latter is about imposing short-run dynamic adjustment structure on the equations for the differenced process. In this chapter we will primarily discuss identification of the long-run relations and leave the short-run adjustment structure to the next chapter.

The organization of this chapter is the following: Section 10.1 discusses the cointegrated VAR model for first order integrated data in reduced and structural form. The parameters of the model are partitioned into the short-run and the long-run parameters, and it is shown that the analysis of the long-run structure can be performed in either representation. Section 10.2 discusses the condition for identification in terms of restrictions on the long-run structure. A general result for formal identification in a statistical model is given, and empirical identification is defined. Section 10.3 discusses how to calculate degrees of freedom when testing over-identifying restrictions. Section 10.4 discusses just-identified restrictions of the long-run structure and provides two illustrations in which economic identification can be addressed. Section 10.5 does the same for an over-identified structure and Section 10.6 for a nonidentified structure. Section 10.7 illustrates some recursive procedures to check for parameter constancy and Section 10.8 concludes.


10.1 Identification when data are nonstationary

To illustrate the difference between the two identification problems it is useful to consider the cointegrated VAR model both in the so-called reduced form and the structural form and discuss in which aspects they differ. First, consider the usual reduced form representation:

Δx_t = Γ₁Δx_{t−1} + αβ′x_{t−1} + ΦD_t + ε_t,    ε_t ~ N_p(0, Ω)    (10.1)

and then pre-multiply (10.1) with a nonsingular p × p matrix A₀ to obtain the so-called structural form representation (10.2):

A₀Δx_t = A₁Δx_{t−1} + aβ′x_{t−1} + Φ̃D_t + v_t,    v_t ~ N_p(0, Σ).    (10.2)

At this stage we assume that the reduced form parameters θ_RF = {Γ₁, α, β, Φ, Ω} and the structural form parameters θ_SF = {A₀, A₁, a, β, Φ̃, Σ} are unrestricted. To distinguish between parameters of the long-run and the short-run structure, we partition θ_RF = {θ_RF^S, θ_RF^L}, where θ_RF^S = {Γ₁, α, Φ, Ω} and θ_RF^L = {β}, and θ_SF = {θ_SF^S, θ_SF^L}, where θ_SF^S = {A₀, A₁, a, Φ̃, Σ} and θ_SF^L = {β}. The relation between θ_RF^S and θ_SF^S is given by:

Γ₁ = A₀⁻¹A₁,  α = A₀⁻¹a,  ε_t = A₀⁻¹v_t,  Φ = A₀⁻¹Φ̃,  Ω = A₀⁻¹Σ(A₀⁻¹)′.

The short-run parameters of the reduced form, θ_RF^S, are uniquely defined, whereas those of the structural form, θ_SF^S, are not without imposing p(p − 1) just-identifying restrictions. Although the long-run parameters β are uniquely defined based on the normalization of the eigenvalue problem, this need not coincide with an economic identification, and in general we need to impose r(r − 1) just-identifying restrictions also on β. Because the long-run parameters remain unaltered under linear transformations of the VAR model, β is the same in both forms and identification of the long-run structure can be done in either the reduced form or the structural form. This gives the rationale for identifying the long-run and the short-run structure as two separate statistical problems, though from an economic point of view they are interrelated. Therefore, we can discuss the statistical problem of how to test structural hypotheses on the long-run structure {β} before


addressing structural hypotheses on the short-run structure {A₀, A₁, a, Φ̃, Σ}. From a practical point of view this is invaluable, as the joint identification of the long- and short-run structure is likely to be immensely difficult. The identification process starts with the identification of β in the reduced form and proceeds to the identification of the short-run structure, keeping the identified β̂ fixed.

To understand all aspects of identification it is useful to distinguish between identification in three different meanings:

- generic (formal) identification, which is related to a statistical model,
- empirical (statistical) identification, which is related to the actual estimated parameter values, and
- economic identification, which is related to the economic interpretability of the estimated coefficients of a formally and empirically identified model.

For identification to be empirically useful, all three conditions for identification have to be satisfied in the empirical problem, which as a crucial part involves the choice of data.
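The invariance of β under premultiplication of the system by a nonsingular A₀, noted above, is easy to verify numerically; a toy example with random numbers:

```python
import numpy as np

rng = np.random.default_rng(2)
p, r = 4, 2
alpha = rng.standard_normal((p, r))
beta = rng.standard_normal((p, r))
Pi_rf = alpha @ beta.T                 # reduced-form long-run matrix Pi = alpha*beta'

A0 = rng.standard_normal((p, p))       # some nonsingular structural rotation
a = A0 @ alpha                         # structural-form adjustment coefficients
Pi_sf = a @ beta.T                     # structural-form long-run matrix

# alpha changes (a = A0*alpha) but the cointegration space sp(beta) -- the row
# space of Pi -- is the same in both forms:
assert np.allclose(np.linalg.inv(A0) @ Pi_sf, Pi_rf)
assert np.linalg.matrix_rank(np.hstack([Pi_rf.T, Pi_sf.T])) == r
```

Only the adjustment side is rotated; the space spanned by β is untouched, which is why the long-run structure can be identified in either representation.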

10.2 Identifying restrictions¹

In order to identify the long-run structure we have to impose restrictions on each of the cointegrating relations. As before, R_i denotes a p₁ × m_i restriction matrix and H_i = R_{i⊥} a p₁ × s_i design matrix (m_i + s_i = p₁), so that H_i is defined by R_i′H_i = 0. Thus, there are m_i restrictions and consequently s_i parameters to be estimated in the i'th relation. The cointegrating relations are assumed to satisfy the restrictions R_i′β_i = 0, or equivalently β_i = H_iφ_i for some s_i-vector φ_i, that is:

β = (H₁φ₁, ..., H_rφ_r),    (10.3)

where the matrices H₁, ..., H_r express linear hypotheses to be tested against the data. Note that the linear restrictions do not specify any normalization of the vectors φ_i. In the previous chapter we gave several examples of design matrices H_i. The idea is to choose H_i so that (10.3) identifies the cointegrating relations. The well-known rank condition expresses that the first cointegration relation, say, is identified if

rank(R₁′β₂, ..., R₁′β_r) = rank(R₁′H₂φ₂, ..., R₁′H_rφ_r) = r − 1.    (10.4)

¹ This section relies strongly on Johansen and Juselius (1995).

This implies that no linear combination of β2, ..., βr can produce a vector that looks like the coefficients of the first relation, i.e. satisfies the restrictions defining the first relation. Note, however, that in order to check the rank condition (10.4) we need to know the coefficients φi, i = 1, ..., r, but in order to estimate the coefficients we need to know whether the restrictions are identifying. Most software programs check the rank condition prior to estimation by first giving the coefficients some arbitrary numbers. If the rank condition is satisfied, estimation can proceed. One can, however, avoid the arbitrary coefficients and explicitly check the rank condition based on the known matrices Ri and Hi. Johansen (1992b) gives the following condition for a set of restrictions to be identifying: The set of restrictions is formally identifying if for all i and k = 1, ..., r − 1 and any set of indices 1 ≤ i1 < ... < ik ≤ r not containing i it holds that

rank(Ri'Hi1, ..., Ri'Hik) ≥ k.   (10.5)

As an example we consider r = 2, where (10.5) reduces to the condition:

ri.j = rank(Ri'Hj) ≥ 1, i ≠ j.

If r = 3 the conditions to be satisfied are:

ri.j = rank(Ri'Hj) ≥ 1, i ≠ j,
ri.jm = rank(Ri'(Hj, Hm)) ≥ 2, i, j, m different.

The value of ri.jm can be determined by finding the eigenvalues of the symmetric matrix (Hj, Hm)'(I − Hi(Hi'Hi)^{-1}Hi')(Hj, Hm), using the identity:

ri.jm = rank(Ri'(Hj, Hm)) = rank{(Hj, Hm)'(I − Hi(Hi'Hi)^{-1}Hi')(Hj, Hm)}.   (10.6)
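Condition (10.5) can be verified directly from the design matrices. The following Python/NumPy sketch (the function names are ours, not those of CATS or any other cointegration package) computes the rank indices via the identity (10.6):

```python
import numpy as np
from itertools import combinations

def rank_index(Hi, H_others):
    """r_{i.jk...}: rank of R_i'(H_j, H_k, ...), computed without forming R_i,
    using the identity (10.6): project (H_j, H_k, ...) on sp(H_i)-orthogonal."""
    p = Hi.shape[0]
    Mi = np.eye(p) - Hi @ np.linalg.solve(Hi.T @ Hi, Hi.T)  # projection on sp(Hi)^perp
    G = np.column_stack(H_others)
    return np.linalg.matrix_rank(G.T @ Mi @ G)

def is_formally_identifying(Hs):
    """Johansen's condition (10.5): for every i and every set of k indices
    not containing i, the corresponding rank index must be at least k."""
    r = len(Hs)
    for i in range(r):
        for k in range(1, r):
            for idx in combinations([j for j in range(r) if j != i], k):
                if rank_index(Hs[i], [Hs[j] for j in idx]) < k:
                    return False
    return True
```

For instance, when sp(H3) is contained in sp(H1), the check fails because R1'H3 = 0, which is exactly the situation discussed in Example 5 below.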


Thus the usual rank condition (10.4) requires knowledge of the (not yet estimated) parameters, whereas condition (10.5) is a property of the known design matrices. In the empirical applications below we will illustrate how to check the rank conditions using (10.6).

It is useful to distinguish between just-identifying restrictions and over-identifying restrictions. The former can be achieved by linear combinations of the relations (equations) and, hence, do not change the likelihood function, whereas the latter constrain the parameter space and, hence, change the likelihood function. For identifying restrictions it holds that rank(Ri') ≥ r − 1; if equality holds the i'th relation is exactly identified, and if strict inequality holds the i'th relation is over-identified. The system is exactly identified if rank(Ri') = r − 1 for all i, and over-identified if it is identified and rank(Ri') > r − 1 for at least one i.

As a general procedure it is useful to start with a just-identified system and then impose further restrictions if the estimated parameters indicate that a further reduction of the statistical model is possible. For example, one would generally prefer to set insignificant coefficients in the just-identified model to zero. Such restrictions constrain the parameter space and, thus, are testable. However, they may, but need not, be over-identifying. The reason is that when accepting further restrictions the rank condition (10.5) need not be satisfied, so that the more restricted model is no longer identified. For example, if the rank condition (10.5) is satisfied under the condition that a certain coefficient is nonzero, but the true value is in fact zero, then the rank condition in the true model is not satisfied. Generally the parameter values of the generically identified model are not known but have to be estimated. If the true coefficient is zero, then the estimate will in general not be significantly different from zero, and restricting it to zero will in such a case violate the rank condition. Thus, although the original statistical model is formally identifying, the economic model is not empirically identified. This can be formalized as: An economic model specified by the parameter value θ, say, is formally identified if θ is contained in the parameter space specified by identifying restrictions. It is empirically identified if θ is not contained in any non-identified sub-model.


In a formally identified model the parameters can be estimated subject to the restrictions by the iterative procedure discussed in Chapter 9, Section 4. The results can now be generalized to the case where we impose (identifying) restrictions on all cointegrating vectors. We consider the equilibrium error-correction term of (10.1) and write it as:

αβ'x_{t-1} = α1 β1'x_{t-1} + ... + αr βr'x_{t-1} = α1 φ1'H1'x_{t-1} + ... + αr φr'Hr'x_{t-1}.

The hypothesis on β is expressed as:

β = (β1, ..., βr) = (H1 φ1, ..., Hr φr),

where Hi, i = 1, ..., r, are known design matrices of dimension p1 × si, and φi are si × 1 vectors of unrestricted coefficients. Again we partition α so that it corresponds to the partitioning of β: α = (α1, ..., αr). The concentrated model can be written as:

R0t = α1 φ1'H1'R1t + ... + αr φr'Hr'R1t + εt.

The estimation problem can be solved by extending the switching algorithm, introduced in Section 9.3, as described below:

1. Estimate initial values β1 = β̂1, ..., β(r-1) = β̂(r-1).
2. For fixed values β1 = β̂1, ..., β(r-1) = β̂(r-1), estimate αr and φr by reduced rank regression of R0t on Hr'R1t corrected for β̂1'R1t, ..., β̂(r-1)'R1t. This defines β̂r = Hr φ̂r and Lmax^{-2/T}(β̂1, ..., β̂(r-1)).
3. For fixed values β2 = β̂2, ..., β(r-1) = β̂(r-1), βr = β̂r, estimate α1 and φ1 by reduced rank regression of R0t on H1'R1t corrected for β̂2'R1t, ..., β̂r'R1t. This defines β̂1 = H1 φ̂1 and Lmax^{-2/T}(β̂2, ..., β̂r).
4. Repeat the steps using the last obtained values of β̂i until the value of the maximized likelihood function has converged.
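The steps above can be sketched numerically. The following Python/NumPy code is a stylized illustration of the switching idea (our own naming and simplifications, not the CATS implementation): one reduced rank regression step per relation, switching until the estimates stabilize.

```python
import numpy as np

def rrr_step(R0, R1, H, B_fixed):
    """Reduced rank regression of R0t on H'R1t, corrected for the fixed
    relations B_fixed'R1t; returns beta_i = H*phi_i and the eigenvalue."""
    Z = R1 @ H                                   # T x s_i restricted levels
    Y = R0
    if B_fixed is not None and B_fixed.size:
        W = R1 @ B_fixed                         # T x (r-1) fixed relations
        YZ = np.column_stack([Y, Z])
        YZ = YZ - W @ np.linalg.lstsq(W, YZ, rcond=None)[0]  # partial out W
        Y, Z = YZ[:, :R0.shape[1]], YZ[:, R0.shape[1]:]
    T = R0.shape[0]
    S00 = Y.T @ Y / T; S01 = Y.T @ Z / T; S11 = Z.T @ Z / T
    # eigenvalue problem |rho*S11 - S10 S00^{-1} S01| = 0
    M = np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))
    lam, V = np.linalg.eig(M)
    k = np.argmax(lam.real)
    return H @ V[:, k].real, lam[k].real

def switching(R0, R1, Hs, max_iter=200, tol=1e-9):
    """Update each beta_i in turn, holding the others fixed, until the
    estimates (and hence the likelihood) have converged."""
    betas = [H[:, [0]].astype(float) for H in Hs]  # crude initial values
    for _ in range(max_iter):
        change = 0.0
        for i in range(len(Hs)):
            others = [b for j, b in enumerate(betas) if j != i]
            B_fixed = np.column_stack(others) if others else None
            b, _ = rrr_step(R0, R1, Hs[i], B_fixed)
            b = b / np.linalg.norm(b)
            if float(b @ betas[i].ravel()) < 0:
                b = -b                             # fix the sign for comparison
            change = max(change, np.linalg.norm(b - betas[i].ravel()))
            betas[i] = b[:, None]
        if change < tol:
            break
    return np.column_stack(betas)
```

By construction each β̂i lies in sp(Hi), so the restrictions Ri'βi = 0 hold at every iteration.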


The eigenvalue problem for fixed β̃1 = {β1 = β̂1, ..., β(r-1) = β̂(r-1)} is given by:

|ρ Hr'S11.1 Hr − Hr'S10.1 (S00.1)^{-1} S01.1 Hr| = 0,

and for fixed β̃2 = {β2 = β̂2, ..., βr = β̂r} by:

|ρ H1'S11.2 H1 − H1'S10.2 (S00.2)^{-1} S01.2 H1| = 0,

and so on, where Sij.1 and Sij.2 denote the product moment matrices corrected for the relations kept fixed. The maximum of the likelihood function is given by:

Lmax^{-2/T}(Hc) = |S00| (1 − λ̃1)...(1 − λ̃r),

where the λ̃i are the eigenvalues obtained from applying the switching algorithm until convergence of the likelihood function. Using the LR procedure, the hypothesis (10.3) can be tested by calculating the test statistic:

−2 ln Λ = T{ln(1 − λ̃1) + ... + ln(1 − λ̃r) − ln(1 − λ̂1) − ... − ln(1 − λ̂r)},   (10.7)

which, under the assumption that all Hi are identifying, is asymptotically χ²-distributed with degrees of freedom given by:

ν = Σ(i=1..r) (mi − r + 1) = Σ(i=1..r) {(p1 − r) − (si − 1)}.

How to calculate the degrees of freedom will be discussed in detail in the next section.

To summarize: For fixed values of β2, ..., βr, or φ2, ..., φr, we can find the ML estimate of β1 by performing a reduced rank regression of Δxt on H1'x_{t-1} corrected for all the stationary and deterministic terms, that is, β2'x_{t-1}, ..., βr'x_{t-1}, Δx_{t-1} and Dt. This determines the estimate of φ1 and, hence, β̂1 = H1 φ̂1. In the next step we keep the values β̂1, β3, ..., βr fixed and perform a reduced rank regression of Δxt on H2'x_{t-1} corrected for all


stationary and deterministic terms. This determines φ̂2. By applying the algorithm until the likelihood function has converged to its maximum we can find the maximum likelihood estimates of β subject to the identifying restrictions.

The speed of convergence of the switching algorithm depends very much on how we choose the initial values of β. For example, the unrestricted estimates of β are not in general the best choice, because the unrestricted eigenvectors need not correspond to the ordering given by H1, ..., Hr and, thus, can be very poor initial values. Instead, the linear combination of the unrestricted estimates β̂ which is as close as possible to sp(Hi), i = 1, ..., r, is clearly preferable as a starting value for βi. These can be found by solving the eigenvalue problem:

|λ Hi'Hi − Hi'β̂β̂'Hi| = 0,

for the eigenvalues λ1 > λ2 > ... and eigenvectors v1, v2, ..., and choosing as initial value for βi the first eigenvector, i.e. Hi v1. This choice of initial values has the extra advantage that for exactly identified equations no iterations are needed.
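The choice of initial values can be illustrated in a few lines of Python/NumPy (a sketch under our reading of the eigenvalue problem above):

```python
import numpy as np

def initial_value(Hi, beta_hat):
    """Solve |lambda*Hi'Hi - Hi' beta_hat beta_hat' Hi| = 0 and return
    Hi v1, with v1 the eigenvector of the largest eigenvalue: the linear
    combination of the unrestricted estimates closest to sp(Hi)."""
    A = Hi.T @ beta_hat @ beta_hat.T @ Hi
    B = Hi.T @ Hi
    lam, V = np.linalg.eig(np.linalg.solve(B, A))  # generalized problem via B^{-1}A
    v1 = V[:, np.argmax(lam.real)].real
    return Hi @ v1
```

The returned vector automatically satisfies the restrictions, since it lies in sp(Hi).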

10.3 Formulation of identifying hypotheses and degrees of freedom

When testing restrictions imposed on the cointegration relations using readily available software packages, the degrees of freedom are usually provided by the program. However, some hypotheses impose restrictions which are quite complicated, and the standard formula for calculating the degrees of freedom may no longer be applicable. CATS will in such cases suggest a number and ask if you agree, or simply prompt for the degrees of freedom. It is, therefore, important to understand the logic behind the calculation of the degrees of freedom. Furthermore, even if most software packages check identification using some generic values for the model parameters and inform the user when identification is violated, we need to understand why identification failed in order to respecify the restrictions.

The following example illustrates how to calculate the degrees of freedom when we have imposed over-identifying restrictions on three cointegrating relations in a VAR analysis of the Danish data (m^r_t, y^r_t, Δp_t, Rm,t, Rb,t). The


first relation expresses that (m^r_t − y^r_t) and (Rm,t − Rb,t) are cointegrated, so they are driven by the same stochastic trend; the second relation that y^r_t, Δp_t and Rb,t are cointegrated, so they share two common trends; and the third relation that Rm,t and Rb,t are cointegrated, so they share one common trend.

The first step is to examine whether the restricted structure defined by βc = {H1 φ1, ..., Hr φr} satisfies the rank and order conditions for identification. We will here illustrate how this can be done analytically using condition (10.5), based on the following example where, for simplicity, we disregard any deterministic components in the cointegration relations. With xt = (m^r_t, y^r_t, Δp_t, Rm,t, Rb,t)', the relations βc'xt are given by:

β1c' = (β11, −β11, 0, β12, −β12),
β2c' = (0, β21, β22, 0, β23),          (10.8)
β3c' = (0, 0, 0, β31, β32).

As discussed in Chapter 7, the parameters (β11, β12), (β21, β22, β23) and (β31, β32) are defined up to a factor of proportionality, and it is possible to normalize on one element in each vector without changing the likelihood:

β1c' = (1, −1, 0, β12/β11, −β12/β11),
β2c' = (0, 1, β22/β21, 0, β23/β21),    (10.9)
β3c' = (0, 0, 0, 1, β32/β31).

When normalizing βic by dividing through with a non-zero element βij_c, the corresponding αic vector is multiplied by the same element. Thus, normalization does not change Π = αc βc', and we can in general choose whether to normalize or not. However, when the long-run structure is identified, normalization becomes more important. This is because it is only possible to obtain standard errors of β̂ij when each cointegration vector is properly normalized. In this case it is convenient to express βic = hi + Hi φi, where φi is now (si − 1) × 1, hi is a vector in sp(Hi) defining the chosen normalization, and sp(hi, Hi) = sp(Hi) (see Johansen, 1995, p. 76).

As discussed in Chapter 9.1, hypotheses on the cointegration structure can be formulated either by specifying the number of free parameters φi, i.e. βc = (H1 φ1, ..., Hr φr), or the number of restrictions mi, i.e. R'β = 0. When we discuss identification it is convenient to choose the former formulation,


i.e. to express the hypotheses in terms of the number of free parameters. For each cointegration vector βic we have to make a distinction between the normalized coefficient and the remaining si − 1 free coefficients φi. As an illustration we express the above structure (10.9) using βic = hi + Hi φi:

βc = {h1 + H1 φ1, h2 + H2 φ2, h3 + H3 φ3},   (10.10)

where, with xt = (m^r_t, y^r_t, Δp_t, Rm,t, Rb,t)',

h1 = (1, −1, 0, 0, 0)',  H1 = (0, 0, 0, 1, −1)',                       φ1 = [φ12],
h2 = (0, 1, 0, 0, 0)',   H2 = [(0, 0, 1, 0, 0)', (0, 0, 0, 0, 1)'],   φ2 = (φ22, φ23)',
h3 = (0, 0, 0, 1, 0)',   H3 = (0, 0, 0, 0, 1)',                        φ3 = [φ32].
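The structure (10.10) can be set up directly in Python/NumPy (a sketch; the variable ordering follows (10.8)-(10.9) and the value φ12 = 0.5 is purely illustrative):

```python
import numpy as np

e = np.eye(5)                             # x_t = (m_r, y_r, Dp, Rm, Rb)'
h1 = e[:, 0] - e[:, 1]                    # normalization: m_r - y_r
H1 = (e[:, 3] - e[:, 4])[:, None]         # free direction: Rm - Rb
h2 = e[:, 1]                              # normalization on y_r
H2 = np.column_stack([e[:, 2], e[:, 4]])  # free: Dp and Rb
h3 = e[:, 3]                              # normalization on Rm
H3 = e[:, 4][:, None]                     # free: Rb

def beta_i(h, H, phi):
    """beta_i = h_i + H_i phi_i as in (10.10)."""
    return h + H @ np.atleast_1d(phi)

b1 = beta_i(h1, H1, 0.5)                  # -> (1, -1, 0, 0.5, -0.5)'
```

Here the normalization is built into hi, so the free coefficients φi are the only quantities left to estimate.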

For given estimates of βc, the estimates of αc are given by the formula in Chapter 7:

α̂c = S01 β̂c (β̂c'S11 β̂c)^{-1},   (10.11)

and the standard errors of β̂ij are calculated as:

σ(β̂ij) = sqrt( T^{-1} diag[ H {H'(α̂c'Ω̂^{-1}α̂c ⊗ S11)H}^{-1} H' ]_ij ),   (10.12)

where

H = [ H1  0   ...  0  ]
    [ 0   H2  ...  0  ]
    [ .   .   ...  .  ]
    [ 0   0   ...  Hr ]

and the elements diag[·]_ij are defined by the ordering i = 1, ..., r and j = 1, ..., p1. The standard errors of the corresponding α̂ij coefficients are calculated as:

σ(α̂ij) = sqrt( T^{-1} Ω̂_ii [ β̂c(β̂c'S11β̂c)^{-1}β̂c' ]_jj ).   (10.13)

We will show below that it is always possible to impose r − 1 just-identifying restrictions on β by linear manipulations of the unrestricted cointegration vectors. Such pseudo restrictions do not change the value of the likelihood function and, thus, no testing is involved in this case. Note that pseudo restrictions are just-identifying only if there are exactly r − 1 of them and they satisfy condition (10.5). Additional restrictions on the structure change the value of the likelihood function and, thus, are testable. Such restrictions are over-identifying if they satisfy (10.5); otherwise they are non-identifying, though nevertheless real, testable restrictions. Given that the restrictions satisfy (10.5), the degrees of freedom can be calculated from the following formula:

ν = Σ_i (mi − (r − 1)).

Consider si, the number of free coefficients in βic, and mi = p1 − si, the total number of restrictions on vector βic. The degrees of freedom in the above example are then calculated as follows:

        si       mi       r − 1    mi − (r − 1)
β1      s1 = 2   m1 = 3   2        1
β2      s2 = 3   m2 = 2   2        0
β3      s3 = 2   m3 = 3   2        1

so the degrees of freedom are ν = 2. Note, however, that some of the restrictions may not be identifying (for example the same restriction on all cointegration relations), but are nevertheless testable restrictions. Whatever the case, the restrictions that are identifying must as a minimum satisfy the condition for just identification.
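The calculation is trivial to automate. A minimal Python sketch for the example above (p1 = 5 here, since the deterministic components are disregarded):

```python
# Degrees of freedom nu = sum_i (m_i - (r - 1)) for the structure (10.8):
r, p1 = 3, 5
s = [2, 3, 2]                          # free coefficients per relation
m = [p1 - si for si in s]              # restrictions per relation
nu = sum(mi - (r - 1) for mi in m)     # degrees of freedom of the LR test
```

With s = (2, 3, 2) this gives m = (3, 2, 3) and ν = 1 + 0 + 1 = 2, as in the table.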


10.4 Just-identifying restrictions

In general we can always transform the long-run matrix Π = αβ' by a nonsingular r × r matrix Q in the following way:

Π = αQ^{-1}Qβ' = α̃β̃',

where β̃' = Qβ' and α̃ = αQ^{-1}. We will now demonstrate how to choose the matrix Q so that it imposes r − 1 just-identifying restrictions on each βi. As an example, we consider Q = β1'^{-1}, where β1 is an (r × r) nonsingular matrix defined by β' = [β1', β2']. In this case

β̃' = β1'^{-1}β' = β1'^{-1}[β1', β2'] = [I, B̃'],

where I is the (r × r) unit matrix and B̃' = β1'^{-1}β2' is an r × (p − r) matrix of full rank. For example, assume that β is (5 × 3):

β' = [ β11 β21 β31 β41 β51 ]        β̃' = [ 1 0 0 b̃41 b̃51 ]
     [ β12 β22 β32 β42 β52 ] ,            [ 0 1 0 b̃42 b̃52 ]
     [ β13 β23 β33 β43 β53 ]              [ 0 0 1 b̃43 b̃53 ]

We notice that the choice Q = β1'^{-1} in our example has in fact imposed two zero restrictions and one normalization on each cointegration relation. These just-identifying restrictions have transformed β to the long-run 'reduced form'. Thus, the above example for xt = [x1t', x2t']', where x1t = [x1t, x2t, x3t]' and x2t = [x4t, x5t]', would describe an economic application where the three variables in x1t are endogenous and the two in x2t are exogenous. Furthermore, if α2 = 0, then β'xt does not appear in the equations for Δx2,t, and x2,t is weakly exogenous for β. In this case efficient inference on the long-run relations can be conducted in the conditional model of Δx1,t given Δx2,t, as discussed in the previous chapter; see also Johansen (1992a). When 'endogenous' and 'exogenous' are given an economic interpretation, this corresponds to the triangular representation suggested by Phillips (1990). Note that the triangular representation requires α2 = 0, which is a testable hypothesis, and it is straightforward to test whether this is a good description of the data.

Example 1: We consider here a just-identified structure describing the long-run reduced form, assuming real money, inflation, and the short-term

Table 10.1: Two just-identified long-run structures, HS1 and HS2: the estimated β̂ coefficients of (m^r, y^r, Δp, Rm, Rb, D83) for each of the three relations (t-values in parentheses), with the corresponding α̂ coefficients in the lower part of the table.

interest rate to be endogenous and real income and the bond rate to be exogenous, corresponding to the following restrictions on β:

HS1: β = (H1 φ1, H2 φ2, H3 φ3),

where, with the variables ordered as (m^r, y^r, Δp, Rm, Rb, Ds83):

H1' = [ 0 1 0 0 0 0 ]    H2' = [ 1 0 0 0 0 0 ]    H3' = [ 0 1 0 0 0 0 ]
      [ 0 0 1 0 0 0 ]          [ 0 1 0 0 0 0 ]          [ 0 0 0 1 0 0 ]
      [ 0 0 0 0 1 0 ]          [ 0 0 0 0 1 0 ]          [ 0 0 0 0 1 0 ]
      [ 0 0 0 0 0 1 ]          [ 0 0 0 0 0 1 ]          [ 0 0 0 0 0 1 ]

i.e. H1 picks up the inflation rate, H2 real money, and H3 the short-term interest rate, while the two exogenous variables and the shift dummy Ds83 enter all


three relations. The estimates are reported in Table 10.1. The first relation is similar to the real income relation H16 in Table 9.3, noting that the coefficient to the bond rate (and the shift dummy) is not significantly different from zero. The second relation is approximately money velocity as a function of the long-term bond rate, except that the income coefficient is not as close to unity as we have seen before. The third relation almost replicates H25 in Table 9.2 (noting that real income is not significant). There is no testing in this case, since the r − 1 = 2 restrictions on each relation have been obtained by linear combinations of the unrestricted relations, i.e. by rotating the cointegration space.

The α̂ coefficients are reported in the lower part of Table 10.1. Only coefficients with |t-value| > 1.6 have been reported. Because no real restrictions have been imposed, the matrix Π̂ = α̂β̂' = α̂c β̂c', i.e. it is exactly the same as for the unrestricted model. It is now easy to see that a linear combination of β̂2c'xt and β̂3c'xt will replicate the money demand relation H15 in Table 9.2. Hence, the money demand relation, m^r − y^r − 14.1(Rm − Rb) − 0.145Ds83, is in fact a linear combination of two stationary relations. Similarly, a linear combination of β̂1c'xt and β̂3c'xt will replicate the inflation relation H28 in Table 9.2.

Example 2. Here we give an example of both zero and non-zero (homogeneity) just-identifying restrictions:

HS2: β = (H1 φ1, H2 φ2, H3 φ3),

where, with the same variable ordering:

H1' = [ 0 1 0 0  0 0 ]    H2' = [ 1 -1 0 0 0 0 ]    H3' = [ 0 1 0 0 0 0 ]
      [ 0 0 1 0  0 0 ]          [ 0  0 0 1 0 0 ]          [ 0 0 1 0 0 0 ]
      [ 0 0 0 1 -1 0 ]          [ 0  0 0 0 1 0 ]          [ 0 0 0 1 0 0 ]
      [ 0 0 0 0  0 1 ]          [ 0  0 0 0 0 1 ]          [ 0 0 0 0 0 1 ]

In this case we check that the restrictions are in fact just-identifying by calculating the rank condition (10.5). Just identification corresponds to rank(Ri'Hj) = 1 for i, j = 1, 2, 3 and j ≠ i, and rank(Ri'(Hj, Hk)) = 2 for i, j, k different. The rank indices reported in Table 10.3, column HS2,


Table 10.2: Two over-identified long-run structures for the Danish data, HS3 and HS4: the estimated β̂ coefficients of (m^r, y^r, Δp, Rm, Rb, D83) for each of the three relations (t-values in parentheses), with the corresponding α̂ coefficients in the lower part of the table.

confirm that the above conditions are indeed met. The estimates of α and β are given in the right-hand side of Table 10.1. It appears that the second relation is now the money demand relation, whereas β1 + β3 becomes the first relation and β1 − β3 the third relation under HS1. These two examples serve to illustrate that in general one can find several identified structures by rotating the cointegration space. Table 10.1 showed that two of the estimated coefficients in the just-identified structures were insignificant. In the next section we will discuss how to impose over-identifying restrictions on β and how to choose between different long-run structures.

10.5 Over-identifying restrictions

We will consider two over-identified structures based on the money demand data for p1 = 6 and r = 3. Formal identification requires that rank(Ri'Hj) ≥ 1 for i, j = 1, 2, 3 and j ≠ i, and that rank(Ri'(Hj, Hk)) ≥ 2 for i, j, k

Table 10.3: Checking the rank conditions: the rank indices ri.j = rank(Ri'Hj) and ri.jk = rank(Ri'(Hj, Hk)) for the structures HS2, HS3, HS4 and HS5.

different. The rank indices are given in Table 10.3, columns HS3 and HS4, where the i.j elements should be at least 1 and the i.jk elements at least 2 for generic identification. The degrees of freedom in the test of over-identifying restrictions are calculated as ν = Σ_i (mi − r + 1), where mi is the number of restrictions on βi.

Example 3. We first consider an over-identified structure defined by:

HS3: β = (H1 φ1, H2 φ2, H3 φ3),

where, with the variables ordered as (m^r, y^r, Δp, Rm, Rb, Ds83):

H1' = [ 0 1 0 0 0 0 ]    H2' = [ 1 0 0 0 0 0 ]    H3' = [ 0 0 0 1 0 0 ]
      [ 0 0 1 0 0 0 ]          [ 0 1 0 0 0 0 ]          [ 0 0 0 0 1 0 ]
                               [ 0 0 0 0 1 0 ]
                               [ 0 0 0 0 0 1 ]

i.e. structure HS1 with the insignificant coefficients set to zero. The estimates are given in Table 10.2. The degrees of freedom are ν = Σ(i=1..r) (mi − (r − 1)) = (4 − 2) + (2 − 2) + (4 − 2) = 2 + 0 + 2 = 4, and the test statistic became χ²(4) = 1.79 with a p-value of 0.77. Thus the structure can clearly be accepted.

Example 4. The next example consists of the joint testing of the three stationary relations H28, H15 and H25 reported in Table 9.2. They are reported


under HS4 in Table 10.2. The degrees of freedom are ν = Σ(i=1..r) (mi − (r − 1)) = 1 + 1 + 2 = 4, and the test statistic became χ²(4) = 1.60 with a p-value of 0.81. Thus, both HS3 and HS4 are acceptable long-run structures. HS3 has more the character of 'building blocks' (corresponding to the irreducible cointegration vectors in Davidson, 2001) which, when weighted by the corresponding αij, become more interpretable from an economic point of view. The relations of HS4 describe directly interpretable steady-state relations and, as demonstrated above, are in fact linear combinations of the building blocks in HS3.

Does it matter whether we choose one representation or the other? In the present example the answer is: probably not really. Most economists would probably prefer a structural representation that mimics as closely as possible the hypothetical steady-state relations. From a statistical point of view the building-blocks representation is in many cases preferable, for two different but related reasons. First, when r is large relative to p − r, it becomes increasingly difficult to identify r meaningful steady-state relations, given that at least r − 1 restrictions have to be imposed on each relation. Second, when a steady-state relation is a direct combination of two stationary building blocks, such as (m^r − y^r) − b1(Rm − Rb) where (m^r − y^r) ~ I(0) and (Rm − Rb) ~ I(0), then b1 is no longer a coefficient between two nonstationary variables, and the super-consistency result for estimated cointegration coefficients no longer holds. In fact, b1 now has the meaning of a coefficient combining two stationary cointegration relations. Nevertheless, the money demand relation in HS4 is a linear combination of two nonstationary relations, (m^r − y^r) ~ I(1) and (Rm − Rb) ~ I(1), as demonstrated in Table 9.2. We demonstrated above that the same money demand relation can also be obtained by combining two stationary building blocks. Similar arguments hold for the first relation in HS4. Because of the specific restrictions imposed, the βij coefficients of HS4 are all coefficients between nonstationary variables. Hence, the calculated standard errors are based on correct distributional assumptions in both HS3 and HS4.
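The point that a stationary steady-state relation can arise as a cointegrating combination of two nonstationary components can be illustrated by a stylized simulation (all magnitudes below are assumed for illustration; only the coefficient 13.8 is taken from the text):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 20_000
trend = np.cumsum(rng.standard_normal(T))              # one common stochastic trend
money_gap = 0.138 * trend + rng.standard_normal(T)     # (m_r - y_r): I(1)
spread = 0.01 * trend + 0.1 * rng.standard_normal(T)   # (Rm - Rb): I(1)
money_demand = money_gap - 13.8 * spread               # the trend cancels: I(0)
```

Both components wander with the common trend, but the combination with coefficient 13.8 removes it, leaving a stationary money demand relation.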

10.6 Lack of identification

Example 5: We now consider the structure defined by:

HS5: β = (H1 φ1, H2 φ2, H3 φ3),

where, with the variables ordered as (m^r, y^r, Δp, Rm, Rb, Ds83):

H1' = [ 0 1 0 0 0 0 ]    H2' = [ 1 -1 0 0  0 0 ]    H3' = [ 0 0 0 1 0 0 ]
      [ 0 0 1 0 0 0 ]          [ 0  0 0 1 -1 0 ]          [ 0 0 0 0 1 0 ]
      [ 0 0 0 1 0 0 ]          [ 0  0 0 0  0 1 ]
      [ 0 0 0 0 1 0 ]

i.e. structure HS4 without imposing the homogeneity restriction on β1. The rank conditions in Table 10.3 show that relation 1 is not identified relative to relation 3. The reason is that the above structure cannot be distinguished from {β1c + ωβ3c, β2c, β3c}, ω ∈ R. We say that H3 is a subset of H1. But the homogeneity restriction of HS4 identifies β1 uniquely, in the sense that ωβ3c can no longer be added without violating it. Thus, identification can be restored either by imposing the homogeneity restriction as in HS4, or by setting one of the two interest rates to zero in H1 φ1.

Thus, the model specified by the restrictions in HS5 is not identifying in the sense defined here, implying that the four parameters β11, β12, β13 and β14 cannot be estimated uniquely without further restrictions. Another way of expressing this is that one of the interest rates can be removed from β1 by adding a linear combination of β3. For example, β1 − (0.10/0.36)β3 removes the bond rate from β1. Therefore, in this set-up we can only estimate the impact of a linear combination of the interest rates in the first relation. Nevertheless, though the restrictions in Table 10.4 are not identifying, they are genuine restrictions on the parameter space, and the model can be tested by a likelihood ratio test; in this case, however, we need to tell CATS how many degrees of freedom there are in the test. The test statistic became 2.07 with ν = 1 + 1 + 2 = 4 degrees of freedom, and the estimated relations clearly define stationary relations in the cointegration space.
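The argument can be checked numerically with the β̂ values from Table 10.4 (variable ordering (m^r, y^r, Δp, Rm, Rb, Ds83)):

```python
import numpy as np

b1 = np.array([0.0, 0.08, 1.0, -0.04, 0.10, 0.0])  # first relation under HS5
b3 = np.array([0.0, 0.0, 0.0, 1.0, 0.36, 0.0])     # third relation under HS5
b1_tilde = b1 - (0.10 / 0.36) * b3                 # add a multiple of beta_3
```

The bond rate drops out of the transformed first relation, so only a linear combination of the two interest rates is estimable there.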

10.7 Recursive tests of α and β

The parameter constancy of the identified model can be examined by plotting the recursively calculated estimates β̂c(t1) and α̂c(t1) for t1 = T0, ..., T. The standard errors are calculated as indicated in (10.12) and (10.13).

Table 10.4: An unidentified long-run structure for the Danish data (HS5): β̂ coefficients

        β̂1      β̂2      β̂3
m^r     0        1.0      0
y^r     0.08    -1.0      0
Δp      1.0      0        0
Rm     -0.04   -12.9      1.0
Rb      0.10    12.9      0.36
D83     0        0.15     0
0.01 0.36
-0.90 -1.08 -1.26 -1.44 -1.62 -1.80 -1.98 -2.16

IDE

1.00 0.75 0.50 0.25 0.00 -0.25 -0.50 -0.75 -1.00 83 84 85 86

MO = 0.00

87

88

89

90

91

92

93

83

84

85

86

87

88

89

90

91

92

93

0.14

FY

1.2 1.0 0.8

IBO

0.12

0.10

0.6 0.4 0.2

0.08

0.06 0.0 0.04 83 84 85 86 87 88 89 90 91 92 93 -0.2 83 84 85 86 87 88 89 90 91 92 93

2.00 1.75 1.50 1.25 1.00 0.75 0.50 0.25 0.00 83 84 85 86

DIFPY = 1.00

1.00 0.75 0.50 0.25 0.00 -0.25 -0.50 -0.75 -1.00

D83 = 0.00

87

88

89

90

91

92

93

83

84

85

86

87

88

89

90

91

92

93

Figure 10.1. Recursively calculated coecients of 1 (t1 ) for t1 =1983:2,...,1993:4.

Figure 10.2. Recursively calculated coefficients of α̂1(t1) for t1 = 1983:2, ..., 1993:4 (panels: DMO, DIDE, DFY, DIBO, DDIFPY).


Figure 10.3. Recursively calculated coefficients of β̂2(t1) for t1 = 1983:2, ..., 1993:4.

Figure 10.4. Recursively calculated coefficients of α̂2(t1) for t1 = 1983:2, ..., 1993:4.


Figure 10.5. Recursively calculated coefficients of β̂3(t1) for t1 = 1983:2, ..., 1993:4.

Figure 10.6. Recursively calculated coefficients of α̂3(t1) for t1 = 1983:2, ..., 1993:4.

We note that the estimated long-run coefficients have remained remarkably stable ever since 1983:2.

10.8 Concluding discussions

This chapter has demonstrated that an empirically stable money demand relation of the type discussed in Romer (1996) can be found as a linear combination, with weights (1.0, −13.8), of the two stationary cointegration relations (m^r − y^r + 8.6Rb − 0.2Ds83)_t and (Rm − 0.4Rb)_t, or, alternatively, as a cointegration relation with coefficients (1.0, −13.8) between the two nonstationary processes (m^r − y^r − 0.2Ds83)_t and (Rm − Rb)_t. Given that the VAR model is a good description of the information in the data, we not only obtain full information maximum likelihood estimates of the money demand parameters (which in general have optimal properties) but, as already discussed in Chapter 9, we also gain information about the number of autonomous shocks, i.e. the common stochastic trends driving the system.
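The quoted combination can be verified numerically (variable ordering (m^r, y^r, Δp, Rm, Rb, Ds83); the weight −13.8 is the one that cancels Rm, and the sign is inferred from the resulting coefficients):

```python
import numpy as np

v1 = np.array([1.0, -1.0, 0.0, 0.0, 8.6, -0.2])  # m_r - y_r + 8.6*Rb - 0.2*Ds83
v2 = np.array([0.0, 0.0, 0.0, 1.0, -0.4, 0.0])   # Rm - 0.4*Rb
money_demand = v1 - 13.8 * v2                    # ~ m_r - y_r - 13.8*(Rm - Rb) - 0.2*Ds83
```

The resulting bond rate coefficient, 8.6 + 13.8 × 0.4 ≈ 14.1, matches the money demand relation quoted in Section 10.4.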


Furthermore, by embedding the money demand relation (as well as the other stationary relations) in a dynamic equilibrium error-correction system, it is possible to gain empirical insight into the short-run dynamics of the adjustment and feedback behavior. This will be the topic of the next chapter, in which we will ask questions like: Is it possible, by expanding money supply in excess of the real productivity level of the economy, to permanently increase real income, or is the final effect only an increase in the inflation rate? Should money supply or the interest rates be used as monetary instruments? Is the money stock endogenously or exogenously determined? Is the level of market interest rates determined by the central bank, by the financial market, or by both? What are the empirical consequences of either case?

Chapter 11

Identification of the Short-Run Structure


In the previous chapter we gave the arguments for why it is possible to treat identification as two separate, though interdependent, problems. Essentially all the results discussed in Section 10.1 also apply to the short-run structure of the model and will not be repeated here. This means that an identified short-run adjustment structure should satisfy the conditions for generic, empirical and economic identification, similarly to the long-run structure. However, the residual covariance matrix plays an important role in the identification of the short-run structure, whereas the long-run covariance matrix of the cointegrating relations was not part of the identification process. In this important respect the two identification problems differ from each other.

In this chapter we will discuss identification of a short-run structure in which simultaneous current effects, as well as short-run adjustment effects to lagged changes of the variables and to previous equilibrium errors, are allowed to be part of the model. However, we will not impose identifying restrictions on the residuals, such as orthogonality restrictions, except trivially in the triangular-form model. Identifying restrictions on the residuals will be discussed in Chapter 13, where we will also discuss how to distinguish between temporary and permanent errors in a structural VAR model.

The organization of this chapter is as follows: Section 11.1 discusses how to impose identifying restrictions on the short-run adjustment dynamics; Section 11.2 the difficult problem of interpreting linear functions of VAR residuals as structural shocks; and Section 11.3 empirical and economic identification, with some examples of economic questions related to the short-run adjustment structure. Sections 11.4, 11.5 and 11.6 illustrate short-run identification in three different model representations. Section 11.7 discusses identification of the short-run structure in a partial model and illustrates it with the Danish data. Finally, Section 11.8 concludes with a discussion of the economic plausibility of the estimated results, based on generically and empirically identified models for the Danish data.

11.1 Formulating identifying restrictions

As already discussed, the identification of the short-run structure is very much facilitated by keeping the properly identified cointegrating relations fixed at their estimated values, i.e. treating β̂′x_t as predetermined stationary regressors similarly as Δx_{t−1}. The statistical justification is that the estimates β̂ of the long-run parameters are super consistent, i.e. the speed of convergence toward the true value β is proportional to T as T → ∞, whereas the convergence of the estimates of the short-run adjustment parameters is proportional to √T. The cointegrated VAR model is a reduced form model in the short-run dynamics in the sense that potentially important current (simultaneous) effects are not explicitly modeled but are left in the residuals. Thus, large off-diagonal elements of the residual covariance matrix can be a sign of significant current effects between the system variables. In some cases the residual covariances are small and of minor importance and can be disregarded altogether. Since the reduced form is always generically identified, all further restrictions on the short-run structure are then overidentifying. While a simplification search in the reduced form VAR model is quite simple, this is generally not the case when the covariance matrix is part of the identification process. When discussing identification of the short-run adjustment parameters we will assume that the cointegration relations have been properly identified in the first step of the identification scheme. For simplicity we disregard dummy variables at this stage. The discussion of how to impose identifying restrictions on the short-run parameters will then be based on a given identified β̂ = β, i.e. for:
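The two-step logic above, fixing the super-consistent long-run estimates first and then estimating the short-run parameters conditional on them, can be sketched in a few lines. This is an illustrative simulation, not the Danish data; the variable names and the true β = (1, −1)′ are assumptions made for the example.

```python
# Two-step sketch: treat beta'x_{t-1} as a predetermined stationary
# regressor and estimate the short-run adjustment coefficients by OLS.
# Simulated bivariate system; all numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
T = 500

# x2 is a random walk; x1 error-corrects towards x2 with speed -0.3,
# so the true cointegrating vector is beta = (1, -1)'.
x2 = np.cumsum(rng.normal(size=T))
x1 = np.empty(T)
x1[0] = x2[0]
for t in range(1, T):
    x1[t] = x1[t-1] - 0.3 * (x1[t-1] - x2[t-1]) + rng.normal()
X = np.column_stack([x1, x2])

# Step 1: fix beta at its (here: known) value; in practice one would use
# the reduced-rank (Johansen) estimate, which is super consistent.
beta = np.array([1.0, -1.0])
ecm = X @ beta                          # stationary equilibrium error

# Step 2: OLS of dx_t on a constant and ecm_{t-1}, equation by equation.
dX = np.diff(X, axis=0)
Z = np.column_stack([np.ones(T - 1), ecm[:-1]])
alpha_hat = np.linalg.lstsq(Z, dX, rcond=None)[0][1]
print(alpha_hat)   # close to the true adjustment vector (-0.3, 0)
```

The point of the sketch is only that, with β̂ fixed, the remaining short-run parameters are estimable by standard least squares regression.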


\[
A_0 \Delta x_t = A_0 \Gamma_1 \Delta x_{t-1} + A_0 \alpha \hat\beta' x_{t-1} + A_0 \mu_0 + A_0 \varepsilon_t, \qquad \varepsilon_t \sim N_p(0, \Omega), \tag{11.1}
\]

or equivalently:

\[
A_0 \Delta x_t = A_1 \Delta x_{t-1} + a \hat\beta' x_{t-1} + \mu_{0,a} + v_t, \qquad v_t \sim N_p(0, \Sigma), \tag{11.3}
\]

where A1 = A0Γ1, a = A0α, μ0,a = A0μ0, and vt = A0εt. Multiplying the VAR model by a nonsingular (p × p) matrix A0 does not change the likelihood function, but introduces p(p−1) new parameters (assuming that the diagonal elements of A0 are ones and that the residual covariance matrix is unrestricted). Thus, we need to impose at least p(p−1) just-identifying restrictions on (11.3) to obtain a unique solution. The structural vector error correction model (11.3) can be written in a more compact form, similar to the one used for the cointegration structure:

\[
\tilde A_0 X_t = \mu_{0,a} + v_t. \tag{11.4}
\]

where Ã0 = (A0, −A1, −a) and X_t′ = (Δx_t′, Δx_{t−1}′, x_{t−1}′β̂) is a stationary process. When checking for generic identification of the short-run structure (11.4) we can use the same rank condition given in Chapter 10 for the long-run structure. Identifying restrictions on the rows of Ã0 (we assume here that the constant term is not part of the identification process) can, as before, be formulated by the design matrices Hi: Ã0 = (H1φ1, ..., Hpφp)′. Note that when the model contains dummy variables we can, in general, choose whether to include them in the identification process or not. In the latter case, the dummy variables will be unrestricted in all equations, whereas in the former case they will be included in the vector X_t and, thus, have to satisfy the usual conditions for identification. Either of the two ways of calculating the identification rank indices described in the previous chapter can be used. To find out whether the restrictions defining the model are identifying, one can either check the rank conditions in (??) based on (??) or use some generic parameter values.
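The parameter count above can be verified mechanically: premultiplication by a nonsingular A0 with ones on the diagonal leaves the likelihood unchanged while mapping the residual covariance Ω into A0ΩA0′, and contributes exactly p(p−1) new off-diagonal parameters. A small numerical sketch, with all numbers invented for illustration:

```python
# Check: v_t = A0 * eps_t has covariance A0 * Omega * A0', and a unit-
# diagonal A0 carries p(p-1) free off-diagonal parameters, which is why
# at least p(p-1) just-identifying restrictions are needed.
import numpy as np

rng = np.random.default_rng(1)
p, T = 4, 100_000

B = rng.normal(size=(p, p))
Omega = B @ B.T                           # reduced-form covariance
eps = rng.normal(size=(T, p)) @ B.T       # residuals with cov ~ Omega

# Unit diagonal, free off-diagonals: p*p - p = p(p-1) new parameters.
A0 = np.eye(p) + 0.3 * (np.ones((p, p)) - np.eye(p)) * rng.normal(size=(p, p))
n_new = A0.size - p

v = eps @ A0.T                            # "structural" residuals
Sigma_v = np.cov(v, rowvar=False)
print(n_new)                              # p(p-1) = 12
print(np.abs(Sigma_v - A0 @ Omega @ A0.T).max())   # small (sampling error)
```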

Given that the short-run structure is generically identified, estimation can in principle be carried out by the switching algorithm of the eigenvalue routine described in Chapter 10. Since the dimension of the equation system is generally larger than the dimension of the cointegration space, the switching algorithm can be a little cumbersome and it is common to apply other maximization algorithms. See, for example, the selection of different estimation algorithms in GiveWin (Doornik and Hendry, 2003).

11.2 Interpreting shocks

Because different choices of A0 lead to different estimates of the residuals, the question of how to define a shock is important. The theoretical concept of a shock, and its decomposition into an anticipated and unanticipated part, has a straightforward correspondence in the VAR model as a change of a variable, Δx_t, and its decomposition into an explained part, the conditional expectation E_{t−1}{Δx_t | X_{t−1}}, and an unexplained part, the residual ε_t. The requirement for ε_t to be a correct measure of an unanticipated (autonomous) shock is that the conditional expectation E_{t−1}{Δx_t | x_{t−1}} correctly describes how agents form their expectations. For example, if agents form model-based rational expectations from a model that is different from the VAR model, then the conditional expectation would no longer be an adequate description of the anticipated part of the shock Δx_t. However, in this case the VAR model would not be a good description of the information in the data. Theories also require shocks to be structural, implying that they are, in some sense, objective, meaningful, or absolute. With the reservation that the word "structural" has been used to cover a wide variety of meanings, we will here assume that it describes a shock, the effect of which is (1) unanticipated (novelty), (2) unique (e.g. a shock hitting money stock alone), and (3) invariant (no additional explanation by increasing the information set). The novelty of a shock depends on the credibility of the expectations formation, i.e. whether ε_t = Δx_t − E_{t−1}{Δx_t | g(X_{t−1})} is a correct measure of the unanticipated change in x. The uniqueness can be achieved econometrically by choosing A0 so that the covariance matrix becomes diagonal. For example, as will be demonstrated in Section 11.5, by postulating a causal ordering among the variables of the system one can trivially achieve uncorrelated residuals. In general, the VAR residuals can be orthogonalized in many different ways, and whether the orthogonalized residuals v_t


can be given an economic interpretation as a unique structural shock depends crucially on the plausibility of the identifying assumptions. Some of them are just-identifying and cannot be tested against the data. Thus, different schools will claim structural explanations for differently derived estimates based on the same data. The invariance of a structural shock is probably the most crucial requirement, as it implies that an empirically estimated structural shock should not change when the information set is increased. Theory models defining deep structural parameters are always based on many simplifying assumptions, including numerous ceteris paribus assumptions. In empirical models the ceteris paribus assumptions should preferably be accounted for by conditioning on the ceteris paribus variables. Since essentially all macroeconomic systems are stochastic and highly interdependent, the inclusion of additional ceteris paribus variables in the model is likely to change the VAR residuals and, hence, the estimated shocks. Therefore, though derived from sophisticated theoretical models, structural interpretability of estimated shocks seems hard to justify. In the words of Haavelmo (1943), there is no close association between the true shocks defined by the theory model and the measured shocks based on the estimated VAR residuals. A structural shock seems to be a theoretical concept with little or fragile empirical content in macro-econometric modelling. In the remaining part of this chapter we will discuss economic identification in the broad sense of answering some questions of economic relevance, and not in the narrow sense of identifying deep structural parameters.
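The non-uniqueness of orthogonalized shocks is easy to demonstrate: the same residual covariance matrix can be diagonalized by many different matrices A0, each implying a different candidate series of "structural" shocks. A minimal sketch with an assumed two-variable system:

```python
# Two Cholesky orthogonalizations of the same residuals, under two
# different variable orderings: both deliver uncorrelated unit-variance
# shocks, yet the shock series themselves differ.
import numpy as np

rng = np.random.default_rng(2)
T = 10_000
B = np.array([[1.0, 0.6],
              [0.4, 1.0]])
eps = rng.normal(size=(T, 2)) @ B.T       # correlated VAR residuals
Sigma = np.cov(eps, rowvar=False)

L1 = np.linalg.cholesky(Sigma)            # factor under ordering (x1, x2)
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])                # permutation: swap the variables
L2 = P @ np.linalg.cholesky(P @ Sigma @ P) @ P   # factor under (x2, x1)

v1 = eps @ np.linalg.inv(L1).T            # shocks under ordering 1
v2 = eps @ np.linalg.inv(L2).T            # shocks under ordering 2

print(np.cov(v1, rowvar=False).round(3))  # identity matrix
print(np.cov(v2, rowvar=False).round(3))  # identity matrix
print(np.abs(v1 - v2).max() > 0.1)        # True: different shock series
```

Both v1 and v2 are valid orthogonalizations of the same data; nothing in the likelihood distinguishes them, which is exactly why the just-identifying choice cannot be tested.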

11.3 Which economic questions?

Macroeconomic theory is usually quite informative about prior economic hypotheses relevant for the long-run structure, whereas much less seems to be known about the short-run adjustment mechanisms. Thus, the identification of the short-run structure often has the character of a data analysis aiming at a parsimonious parameterization rather than testing well-specified economic hypotheses. Nevertheless, the simplification search should preferably be guided by some relevant questions that the empirical analysis should provide an answer to.

From (11.3) we note that the data variation can be decomposed into a systematic (anticipated) part and an unsystematic (unanticipated) part. The systematic part explains the change from t−1 to t of the system variables, Δx_{i,t}, i = 1, ..., p, as a result of:

1. current (anticipated) changes, Δx_{j,t}, j ≠ i, in the system variables,

2. previous changes of the system variables, Δx_{i,t−m}, i = 1, ..., p, m = 1, ..., k−1,

3. deviations from previous long-run equilibrium states, β_i′x_{t−1}, i = 1, ..., r, and

4. extraordinary events, D_t, such as reforms and interventions.

Thus, this type of empirical model allows for the possibility that agents react on (i) previous equilibrium errors in the long-run steady-state relations, (ii) current and lagged changes in the determinants, and (iii) extraordinary events. In contrast, most theoretical models make prior assumptions on how agents are supposed to react under optimizing behavior given some ceteris paribus assumptions. Based on these assumptions, the model may predict, for example, instantaneous or partial adjustment behavior towards equilibrium states. In this sense the specification of economic models is more precise with respect to the postulated behavior and its economic consequences. On the other hand, these consequences are only empirically relevant given that the postulated behavior is (at least approximately) correct. The specification of an empirical model like (11.3) is less precise in terms of an underlying theoretical model, but far more flexible in terms of actual macroeconomic behavior, which is influenced by the wider circumstances that generated the data, i.e. by the ceteris paribus assumptions of the economic model. In the ideal case the empirical model should allow for all relevant aspects of (possibly competing) theoretical models as testable hypotheses, but also add to the realism of the empirical analysis by conditioning on relevant ceteris paribus variables.
For example, the question of instantaneous or partial adjustment behavior in the domestic money market can be specified as hypotheses on the adjustment coefficients a_ij in the empirical model. A surprising outcome of a test might then be associated with the ceteris paribus assumptions of the theoretical model, such as a constant real exchange rate


or no risk aversion in the capital markets, just to mention a few of the possibly very crucial ones. We will illustrate these ideas by recalling that the economic motivation for doing an empirical analysis of the Danish money demand data was the discussion in Chapter 2 of inflation and monetary policy. The latter was strongly influenced by the discussion in Romer (1996), Chapter 9, in which the economic model was based on an assumption of equilibrium behavior in the money market. The interest in empirical money demand relations is motivated by the idea that if the latter are known and empirically stable, then central banks would be able to supply exactly the amount of money satisfying agents' demand for money. Based on the empirical results reported in the preceding chapters we were able to conclude that Danish money market behavior in the post Bretton Woods period has been characterized by short-run equilibrium error-correction behavior. By allowing for dynamic adjustment towards long-run steady states it is now possible to address a number of additional questions which are directly or indirectly related to the effectiveness of monetary policy. For example: Is money stock adjusting to a long-run money demand or supply relation? Has monetary policy been more effective when based on changes in money stock or changes in interest rates? Does the effect of expanding money supply on prices differ in the short run, in the medium run, or in the long run? Is an empirically stable demand-for-money relation a prerequisite for monetary policy to be effective for inflation control? How strong is the direct (indirect) relationship between a monetary policy instrument and price inflation? As demonstrated above, the short-run adjustment coefficients are not in general invariant to the choice of A0, which is why the answers to the above questions are crucially related to the identification issue.
In some cases the empirical answers can be very sensitive to the choice of identification scheme; in other cases the results are more robust.

To illustrate how one can empirically investigate the above questions we will assume that the causal chain model below is an adequately identified representation of the short-run structure of the Danish data. For simplicity of notation we will here assume that A1 = 0, i.e. that there is no short-run dynamic adjustment to lagged changes of the process, and that the long-run structure can be described by the simple relations discussed in Chapter 2. Generalization to more realistic specifications, such as the empirically identified relations of Table 10.2, should be straightforward. We consider the following model specification:

\[
\begin{bmatrix}
1 & -a^0_{12} & -a^0_{13} & -a^0_{14} & -a^0_{15} \\
0 & 1 & -a^0_{23} & -a^0_{24} & -a^0_{25} \\
0 & 0 & 1 & -a^0_{34} & -a^0_{35} \\
0 & 0 & 0 & 1 & -a^0_{45} \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
\Delta m^r_t \\ \Delta^2 p_t \\ \Delta R_{m,t} \\ \Delta y^r_t \\ \Delta R_{b,t}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33} \\
a_{41} & a_{42} & a_{43} \\
a_{51} & a_{52} & a_{53}
\end{bmatrix}
\begin{bmatrix}
(m^r - y^r)_{t-1} \\ (R_b - R_m)_{t-1} \\ (\Delta p - R_m)_{t-1}
\end{bmatrix}
+
\begin{bmatrix}
\mu_{0,1} \\ \mu_{0,2} \\ \mu_{0,3} \\ \mu_{0,4} \\ \mu_{0,5}
\end{bmatrix}
+
\begin{bmatrix}
v_{m,t} \\ v_{\Delta p,t} \\ v_{Rm,t} \\ v_{y,t} \\ v_{Rb,t}
\end{bmatrix}
\tag{11.5}
\]

To simplify the discussion we will focus solely on the money and inflation equations, albeit acknowledging that a complete answer should be based on the full system analysis:

\[
\begin{aligned}
\Delta m^r_t &= a^0_{12}\Delta^2 p_t + a^0_{13}\Delta R_{m,t} + a^0_{14}\Delta y^r_t + a^0_{15}\Delta R_{b,t} \\
&\quad + a_{11}(m^r - y^r)_{t-1} + a_{12}(R_b - R_m)_{t-1} + a_{13}(\Delta p - R_m)_{t-1} + \mu_{0,1} + v_{m,t}, \\
\Delta^2 p_t &= a^0_{23}\Delta R_{m,t} + a^0_{24}\Delta y^r_t + a^0_{25}\Delta R_{b,t} \\
&\quad + a_{21}(m^r - y^r)_{t-1} + a_{22}(R_b - R_m)_{t-1} + a_{23}(\Delta p - R_m)_{t-1} + \mu_{0,2} + v_{\Delta p,t}.
\end{aligned}
\]

Question 1. Is money stock adjusting to money demand or supply?


If a11 < 0, a12 > 0, and a13 > 0, then the empirical evidence is in favor of money holdings adjusting to a long-run money demand relation. The latter can be derived from the money stock equation as follows:

\[
\begin{aligned}
m^r &= y^r + (a_{12}/a_{11})(R_b - R_m) + (a_{13}/a_{11})(\Delta p - R_m) \\
    &= y^r - \omega_1 (R_b - R_m) - \omega_2 (\Delta p - R_m),
\end{aligned}
\tag{11.6}
\]

where ω1 = −a12/a11 > 0 and ω2 = −a13/a11 > 0.
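To fix ideas, the mapping from adjustment coefficients into the long-run coefficients of (11.6) is simple arithmetic. The numbers below are invented for illustration and are not estimates from the Danish data:

```python
# Implied long-run money demand coefficients per relation (11.6):
# omega_1 = -a12/a11, omega_2 = -a13/a11.  Hypothetical values only.
a11, a12, a13 = -0.25, 0.40, 0.30      # a11 < 0, a12 > 0, a13 > 0

omega1 = -a12 / a11                    # coefficient on (Rb - Rm)
omega2 = -a13 / a11                    # coefficient on (Dp - Rm)
print(omega1, omega2)                  # 1.6 1.2
```

With a11 < 0, a12 > 0 and a13 > 0 both ω-coefficients come out positive, which is what qualifies the relation as an interpretable money demand relation.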

Relation (11.6) corresponds to the aggregate money demand relation discussed in a static equilibrium framework by Romer, with the difference that (11.6) is now embedded in a dynamic adjustment framework. On the other hand, the case a11 < 0, a12 = 0, and a13 = 0 could be consistent with a situation where the central bank has controlled money stock so that money velocity is stationary around a constant level.

Question 2. Is an empirically stable money demand relation a prerequisite for central banks to be able to control the inflation rate?

This hypothesis is based on essentially three arguments:

2.1 There exists an empirically stable demand-for-money relation.

2.2 Central banks can influence the demanded quantity of money.

2.3 Deviations from this relation cause inflation.

The empirical requirement for a stable money demand relation is that {a11 < 0, a12 > 0, and a13 > 0} and that the estimates are empirically stable. In this case money stock is endogenously determined by agents' demand for money and it is not obvious that money stock can be used as a monetary instrument by the central bank. Thus, given the previous result (i.e. a11 < 0, a12 > 0, and a13 > 0), central banks cannot directly control money stock. Nevertheless, controlling money stock indirectly might still be possible by changing the short-term interest rate. But a12 > 0 and (Rb − Rm) ~ I(0) imply that a change in the short-term interest rate will transmit through the system in a way that leaves the spread basically unchanged. Therefore, even if a stable demand-for-money relation has been found, it may, nevertheless, be difficult to control money stock. Finally, a21 > 0 would be consistent with the situation where deviations from the empirically stable money demand relation influence the inflation rate in the short run. Under the assumption that agents can obtain their desired

level of money (i.e. no credit restrictions), it seems likely that excess money in the economy reflects excess supply rather than excess private demand. Excess money would then generally be the result of the central bank issuing more money than demanded, i.e. of monetization of government debt. In the next sections we will illustrate the above discussion using the Danish data, based on the following identification schemes:

1. imposing restrictions on the short-run parameters when A0 = I,

2. imposing (just-identifying) zero restrictions on the off-diagonal elements of Σ,

3. imposing general restrictions on A0 without imposing restrictions on Σ,

4. re-specifying the full system model as a partial model based on weak exogeneity test results.

11.4 Overidentifying restrictions on the reduced form

The estimates of the unrestricted VAR model were given in Table 7.1. We note that: (i) the short-run structure is over-parameterized with many insignificant coefficients, (ii) the adjustment coefficients α_ij seem to provide the bulk of the explanatory power, (iii) the signs and the magnitudes of the α_ij coefficients seem reasonable, but the coefficients of the lagged changes of the process are more difficult to interpret, (iv) some of the correlations of the (0,1) standardized residuals are quite large, possibly because of important current effects. For the Danish data we have no strong prior hypotheses about the short-run structure, and imposing overidentifying restrictions has more the character of a simplification search than of stringent economic identification. The guiding principle is the plausibility of the results, in particular plausible estimates of the equilibrium correction coefficients, a. Many of the short-run adjustment coefficients in the unrestricted reduced form VAR discussed in Chapter 4 were statistically insignificant and the first step is to test whether they can be set to zero without violating the


joint test of over-identifying restrictions. First we remove the insignificant lagged variables Δ²p_{t−1} and ΔR_{m,t−1} from the system (altogether 10 coefficients) based on an F-test. The dummy variables, except for Ds83_t, were only marginally significant and were also left out (altogether 15 coefficients). In this trimmed system 20 additional zero restrictions were accepted based on an LR test of over-identifying restrictions, χ²(20) = 19.7, with a p-value of 0.54. The estimates of the remaining coefficients reported in Table 11.1 are all significant at the 5% level, albeit some only borderline so. The original VAR contained 50 autoregressive coefficients + 20 seasonal coefficients, 20 dummy variable coefficients and 15 parameters in the residual covariance matrix. Of these only 16 remained significant, which is a substantial reduction. We note that the equilibrium correction coefficients a_ij are highly significant, demonstrating the major loss of information usually associated with VAR models specified in differences. All current effects are accounted for by the residual covariance matrix at the bottom of Table 11.1. Most of the correlations are relatively small, but the residuals from the money stock equation and the real income equation are correlated with a coefficient of 0.33 and the residuals from the short-term interest rate and the bond rate equations with a coefficient of 0.40. There are at least three explanations for why the residuals from a VAR model are likely to be correlated:

1. Causal chain effects get blurred when the data are temporally aggregated; the more aggregated the data, the higher the correlations. Assume, for example, that the bond rate changes the day after a change in the short-term interest rate (as a result of a central bank intervention, say), and that the central bank changes the short-term interest rate as a result of a market shock to the long-term bond rate, but only after a week. In this case we would be able to identify the causal links correctly based on daily data.
However, when we aggregate the data over time, the information about these links gets mixed up.

2. Expectations are inadequately modelled by the VAR model. Chapter 3 showed that the VAR model is consistent with agents making plans based on the expectation E_{t−1}(Δx_t | Δx_{t−1}, ..., Δx_{t−k+1}, β′x_{t−1}). Assume now that agents have forward-looking expectations with respect to the variable x_{1t}, so that E_{t−1}(x_{1t}) = x^e_{1t}, and that they use this value when making plans: E_{t−1}(Δx_{j,t} | x^e_{1t}, Δx_{t−1}, ..., Δx_{t−k+1}, β′x_{t−1}), j ≠ 1. If

x^e_{1t} = x_{1t}, i.e. the expectation is exactly correct, then the reduced form VAR residuals from the equation for Δx_{1t} would be correlated with the residuals from the equations in which x^e_{1t} was important.

3. Omitted variables effects. In most cases the VAR model contains only a subset of (the most important) variables needed to explain the economic problem. This is because the VAR model is very powerful for the analysis of a small system, but the identification of cointegration relations becomes increasingly difficult when the dimension of the vector process gets larger. There are two important implications of omitted variables: (1) the estimated short-run adjustment coefficients are likely to change when we increase the number of variables in the model (provided that the new variables are not orthogonal to the old variables), (2) the residuals generally become smaller when increasing the dimension of the VAR. Thus, the residuals of a small VAR model are likely to contain omitted variables effects, and the residuals will be correlated if the left-out variables are important for several of the variables in the small VAR.

The last point makes it very hard to argue that the residuals could possibly be a measure of autonomous errors (structural shocks). Therefore, a large residual correlation coefficient does not necessarily imply a structural simultaneous effect, nor does it imply incorrectly specified expectations. With this in mind we will now attempt to identify some of the current effects in the model as if they were a result of either point 1 or 2 above.
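Point 1 can be illustrated by a small simulation: a clean one-period causal chain at the daily frequency leaves the daily VAR residuals uncorrelated, but fitting the same VAR to data aggregated into five-day "weeks" produces clearly correlated residuals. All numbers are illustrative assumptions:

```python
# Temporal aggregation blurs a causal chain: x2 reacts to x1 with a
# one-day lag, so daily VAR(1) residuals are uncorrelated, while the
# residuals of a VAR(1) fitted to 5-day averages are not.
import numpy as np

rng = np.random.default_rng(3)
n_days = 50_000
e1 = rng.normal(size=n_days)
e2 = rng.normal(size=n_days)

x1 = np.zeros(n_days)
x2 = np.zeros(n_days)
for t in range(1, n_days):
    x1[t] = 0.8 * x1[t-1] + e1[t]
    x2[t] = 0.8 * x2[t-1] + 0.5 * x1[t-1] + e2[t]   # one-day causal lag

def var1_resid_corr(y):
    """Fit a VAR(1) by OLS (no constant) and return the residual correlation."""
    Y, Z = y[1:], y[:-1]
    B = np.linalg.lstsq(Z, Y, rcond=None)[0]
    u = Y - Z @ B
    return np.corrcoef(u, rowvar=False)[0, 1]

daily = np.column_stack([x1, x2])
weekly = daily.reshape(-1, 5, 2).mean(axis=1)       # aggregate to weeks

c_daily = var1_resid_corr(daily)
c_weekly = var1_resid_corr(weekly)
print(round(c_daily, 2))    # essentially zero
print(round(c_weekly, 2))   # clearly positive
```

At the daily frequency the residuals recover the independent shocks; after aggregation, part of the within-week transmission from x1 to x2 ends up in the contemporaneous residual covariance.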

11.5 The VAR in triangular form

Assume that we want to estimate the VAR model with uncorrelated residuals. The most convenient way of achieving this is by choosing A0 in (11.3) to be an upper triangular matrix, i.e. by pre-multiplying the VAR model with the inverse of the Choleski decomposition of the covariance matrix Σ. In this case the likelihood function is decomposed into p independent, sequentially decomposed likelihoods, as demonstrated in Section 2.4:

\[
P(\Delta x_{1,t}, \ldots, \Delta x_{p,t} \mid \Delta x_{t-1}, \hat\beta' x_{t-1};\, \theta) = \prod_{i=1}^{p} P(\Delta x_{i,t} \mid \Delta x_{i+1,t}, \ldots, \Delta x_{p,t}, \Delta x_{t-1}, \hat\beta' x_{t-1};\, \theta_i)
\]


Table 11.1: A parsimonious parametrization of the cointegrated VAR model. [Coefficient estimates with t-values in parentheses, residual standard deviations, and standardized off-diagonal residual correlations; the column layout of the original table could not be recovered from this extraction.]
We find p conditional expectations:

\[
E(\Delta x_{i,t} \mid \Delta x_{i+1,t}, \ldots, \Delta x_{p,t}, \Delta x_{t-1}, \hat\beta' x_{t-1}) = \sum_{j=i}^{p-1} a_{0,j+1} \Delta x_{j+1,t} + A_1 \Delta x_{t-1} + a \hat\beta' x_{t-1}, \qquad i = 1, \ldots, p,
\]

where a_{0,i} is a row vector defined by the last p−i elements of the ith row of A0. Because the residuals are uncorrelated in this system, the coefficients can be estimated fully efficiently by OLS equation by equation. In this case the OLS estimates are equivalent to FIML estimates. The triangular system is based on a specific ordering of the p variables and, thus, on an underlying assumption of a causal chain. Since different orderings will, in general, produce different results, and the choice of ordering is subjective, there is an inherent arbitrariness in this type of model. It is, therefore, important to check the sensitivity of the results to various orderings of the causal chains. Here we first investigate the consequences of different orderings by performing an OLS regression for each variable as if it had been at the end of the causal chain. The results are reported in Table 11.2, where the coefficients of each row are given by the conditional expectation E(Δx_{i,t} | Δx_{j,t} (j ≠ i), Δx_{t−1}, β̂′x_{t−1}, D_t), i = 1, ..., 5. By this exercise we can find out how the reduced form model estimates would have changed for equation Δx_{i,t}, i = 1, ..., 5, if we had included current changes of all the other variables in the system as regressors. The results show that Δm^r_t and Δy^r_t as well as ΔR_b,t and ΔR_m,t exhibited significant simultaneous effects and that ΔR_m,t was equilibrium correcting to the third cointegration relation but ΔR_b,t was not. In Chapter 10 we showed that both y^r_t and R_b,t are weakly exogenous for the long-run parameters β. However, the short-run adjustment parameters are not invariant to transformations by a non-singular matrix A0, as demonstrated in (11.1). Thus, previously insignificant adjustment coefficients α_ij in the reduced form model might become significant in the transformed model.
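The claim that the triangular system can be estimated fully efficiently by OLS rests on the transformed residuals being uncorrelated. With A0 chosen as the inverse Choleski factor of the residual covariance this holds by construction, as a small sketch shows (the covariance matrix below is an invented example; whether the factor is upper or lower triangular depends only on how the variables are ordered):

```python
# With A0 the inverse Choleski factor of Sigma, the transformed
# residuals v_t = A0 * eps_t have a diagonal (here: identity) covariance,
# so each equation can be estimated separately by OLS.
import numpy as np

Sigma = np.array([[1.00, 0.40, 0.10],
                  [0.40, 1.20, 0.30],
                  [0.10, 0.30, 0.90]])    # assumed residual covariance

L = np.linalg.cholesky(Sigma)             # Sigma = L L'
A0 = np.linalg.inv(L)                     # triangular transformation

Sigma_v = A0 @ Sigma @ A0.T               # covariance of v_t = A0 eps_t
print(np.allclose(Sigma_v, np.eye(3)))    # True: uncorrelated residuals
```

Rescaling the rows of A0 to have ones on the diagonal gives the unit-diagonal triangular form used in Table 11.3 without affecting the diagonality of the residual covariance.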
Independently of the chosen ordering, no such changes in equilibrium correction effects were found in the equations for Δy^r_t and ΔR_b,t, demonstrating the robustness of the previous results regarding the lack of long-run feedback for different model specifications. This is an important reason for placing Δy^r_t and ΔR_b,t prior to ΔR_m,t, Δ²p_t, and Δm^r_t in the causal chain. Finally, because the primary purpose of this study is to investigate the role of money


Table 11.2: The fully specified conditional expectations equation by equation. [Coefficient estimates with t-values in parentheses; the column layout of the original table could not be recovered from this extraction.]
and inflation in this system, Δ²p_t and Δm^r_t are placed at the end of the chain, prior to ΔR_m,t. Inflation did not exhibit any significant effects from current changes in the other variables and was, hence, placed prior to Δm^r_t. Based on the above arguments, the estimates of the triangular system in Table 11.3 are based on the following ordering:

\[
\Delta R_{b,t} \rightarrow \Delta y^r_t \rightarrow \Delta R_{m,t} \rightarrow \Delta^2 p_t \rightarrow \Delta m^r_t. \tag{11.7}
\]

The upper triangular representation of the current effects in Table 11.3 corresponds to the Choleski decomposition of the residual covariance matrix when the system variables are ordered as in (11.7). To increase readability, coefficients with an absolute t-value > 1.9 are in bold face. It appears that the transformation by A0 does not change the VAR model estimates very much, indicating that the current effects are not

Table 11.3: The short-run adjustment structure in triangular form. [Estimates of A0′, A1′, and a′ with t-values in parentheses; the column layout of the original table could not be recovered from this extraction.]

very important, i.e. points 1 and 2 above do not seem to be of much empirical importance in this case.

11.6 Imposing general restrictions on A0

Many of the estimates of the triangular system reported in Table 11.3 were insignificant and it would seem natural to restrict them to zero. We will next demonstrate how to impose general restrictions on the matrices A0, A1, a, and μ0,a while leaving Σ unrestricted. Though statistical significance is not necessarily the same as economic significance, the former is important for two reasons: (i) our model should preferably describe empirically relevant aspects of the economic problem for the chosen period, (ii) leaving insignificant coefficients in the model may introduce near singularity in the model if generic identification is violated when restricting the insignificant coefficients to zero. In the following we will demonstrate two different model structures obtained by imposing as many zero restrictions as possible on (A0, A1, a, and μ0,a) while at the same time keeping the residual correlations in Σ as small as possible. Short-run structure I, reported in Table 11.4, allows simultaneous effects of Δm^r_t and Δy^r_t and of ΔR_m,t and ΔR_b,t while imposing overidentifying restrictions on the remaining model parameters. The Δ²p_t equation is in the reduced form and, therefore, identified by construction. The remaining equations contain simultaneous effects and therefore need to be checked for generic and empirical identification. The zero restriction on ecm2_{t−1} in the equation for Δy^r_t and the corresponding nonzero coefficient in the equation for Δm^r_t are identifying, both generically and empirically (because of the highly significant coefficient of ecm2_{t−1} in the Δm^r_t equation). Similarly, the significant coefficients of Δy^r_{t−1} and ΔR_{b,t−1} in the equation for Δy^r_t and the corresponding zero coefficients in the equation for Δm^r_t are identifying. The calculated rank indices (??) reported in Table 11.5 confirm the visual inspection: r1.4 = 2, implying that eq. 1 is identified w.r.t. eq. 4 with one overidentifying restriction, and r4.1 = 3, implying that eq. 4 is identified w.r.t. eq. 1 with two overidentifying restrictions.
The significant coefficients of Δm^r_{t−1} and Ds83_t in the equation for ΔR_b,t and the corresponding zero coefficients in the equation for ΔR_m,t are identifying the latter w.r.t. the former. This is an example where a strongly signif-

Table 11.4: An overidentified simultaneous short-run adjustment structure I. [Estimates of A0′, A1′, and a′ with t-values in parentheses, together with residual standard deviations and standardized off-diagonal residual correlations; the column layout of the original table could not be recovered from this extraction.]

11.6. GENERAL RESTRICTIONS

237

icant (and economically interpretable) dummy variable can contain valuable identifying information. Similarly, the signicant coecients of ecm2t1 and ecm3t1 in the Rm,t equation and the corresponding zero coecient in the r Rb,t equation are identifying w.r.t. Rm,t , whereas the coecients of yt1 and Rb,t1 are not identifying, since they enter similarly in both equations. The calculated rank indices (??) reported in Table 11.5 conrms the visual inspection: r3.5 = 2 implying that eq. 3 is identied w.r.t. eq. 5 with one overidentifying restriction. The same is true for eq.5 w.r.t. eq.3. Altogether 17 overidentifying restrictions have been imposed on the shortrun structure I. The LR test, distributed as 2 (17), produced a test statistic of 16.0 and the restrictions were accepted with a p-value of 0.52. The residual correlations reported at the bottom of Table 11.4 are quite small and insignicant, though not zero as in the triangular representation in Table 11.3. However, the estimated simultaneous eects are generally not very signicant, suggesting that the identifying information is weak in this model. This is particularly so for the coecients of the bond rate and the deposit rate. Therefore, short-run structure II, reported in Table 11.6, sets the insignicant current eect of R b,t in the equation for R m,t to zero and imposes additionally a zero restriction on R b,t1 in the equation for R b,t . As a result the residual correlations changed to some extent, but not signicantly so as can be inferred from the test of the 20 overidentifying restrictions which were accepted based on a test value of 22.5 (p-value 0.34). Based on the estimated results we notice that: 1. real money stock is exclusively equilibrium error correcting to long-run money demand, 2. the eect of an increase in real aggregate demand on money stock might be smaller in the short run (0.7) than in the long run (1.0), 3. 
ination is exclusively equilibrium error correcting to the homogeneous ination-interest rate relation, 4. short-term interest is essentially equilibrium error correcting to the long-term bond rate and exhibits a (small) negative eect from excess liquidity, 5. real aggregate demand shows a negative short-run eects from increases in the long-term bond rate,

Table 11.5: Checking the rank conditions of short-run structure I. (The table reports the rank indices r_{i.j}, r_{i.jk}, r_{i.jkm}, and r_{i.jkmn} for each equation against every set of the other equations; the multi-column layout could not be recovered from the extraction. The indices cited in the text are r_{1.4} = 2, r_{4.1} = 3, r_{3.5} = 2, and r_{5.3} = 2.)

Table 11.6: An overidentified simultaneous short-run adjustment structure II. (Estimated coefficients of A0', A1', Ds83_t, and a', with t-values in parentheses; off-diagonal elements of the residual covariance matrix standardized. The layout of the table could not be recovered from the extraction.)

6. the bond rate exhibits a small negative short-run effect from an increase in liquidity and a positive short-run effect from an increase in aggregate demand,

7. neither real aggregate demand nor the long-term bond rate exhibits any long-run equilibrium correction effects.

The last point can probably be explained by the present information set not being sufficiently large to give a satisfactory explanation of all five variables. For example, the long-term bond rate would probably be equilibrium error correcting to the German bond rate of similar maturity, and real aggregate demand to real exchange rates, terms of trade and foreign aggregate demand, if these variables were included in the analysis. Chapter 16 will discuss a procedure for extending the model to include such effects.
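The p-value arithmetic of the two LR tests above can be reproduced in a few lines. The χ² survival function below is a self-contained stand-in for a statistics-library routine, written out so the sketch has no dependencies:

```python
import math

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), computed from the series expansion
    of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = 0.5 * df, 0.5 * x
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total, n = term, 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return 1.0 - total

# LR test of the 17 overidentifying restrictions in structure I
p1 = chi2_sf(16.0, 17)   # ~0.52, as reported in the text
# LR test of the 20 overidentifying restrictions in structure II
p2 = chi2_sf(22.5, 20)
```

The first call reproduces the reported p-value of about 0.52; the second comes out close to 0.31, slightly below the 0.34 quoted in the text, presumably because of rounding in the reported test statistic.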

11.7 A partial system

In this section we re-estimate the model as a partial model for real money stock, inflation and the short-term interest rate, conditional on the bond rate and real aggregate demand, which were found to be weakly exogenous for the long-run parameters β. Note, however, that the latter does not imply weak exogeneity with respect to the short-run adjustment parameters. Therefore, if the parameters of interest are the short-run adjustment parameters, then neither ΔRb,t nor Δy^r_t is weakly exogenous for these parameters, as demonstrated by the estimation results in Tables 10.3 and 10.4. But, because the motivation for the empirical study is in most cases an interest in the dynamic feed-back effects w.r.t. the long-run structure, establishing long-run weak exogeneity for a variable is often used as a justification for performing the model analysis conditional on such a variable.

As already discussed in Chapter 9.5, to find out whether weak exogeneity holds either for the long-run parameters β or for the short-run adjustment parameters, we need to estimate the full system of equations. The question is why one would like to continue the analysis in a partial model, considering that we have already estimated the full system. However, in some cases there are clear advantages to a partial system analysis. Assume, for example, that a weakly exogenous variable (w.r.t. β) has been subject to many


interventions and that current changes of this variable have significantly affected the other variables in the system. Conditioning on such a variable is likely to reduce the need for intervention dummies in the model, and this might improve the stability of the parameter estimates in the conditional model. Furthermore, the residual variance in the conditional model is often significantly reduced, thus improving the precision of the statistical inference as compared to the full model.

In some cases the exogeneity status of a variable is so obvious that testing might not be needed and one can directly estimate a partial model. For example, economic activity in the USA is likely to influence the Danish economy, but whatever happens in the Danish economy is not likely to influence the US economy. In more doubtful cases the procedure suggested in Harbo, Johansen, and Rahbek (1998) can be used as a safeguard against conditioning on variables which are not weakly exogenous for the long-run parameters.

Table 11.7 reports the estimates of the partial model for Δm^r_t, Δ²p_t, and ΔRm,t conditional on ΔRb,t and Δy^r_t. The estimated results are similar to the estimates of the full system reported in Tables 11.4 and 11.6. Thus, the empirical conclusions would be fairly robust whether based on the full or the partial system. The parameter estimates (particularly the current effects) have become slightly more significant in the present model. As mentioned above, the need for dummy variables may change in the partial model. This was the case here: all dummy variables became insignificant after conditioning on ΔRb,t and Δy^r_t. Finally, the residual standard error of the deposit rate equation decreased to some extent compared to the full system estimate:

                   Δm^r     Δ²p      ΔRm
Full system:       0.0233   0.0147   0.00130
Partial system:    0.0232   0.0147   0.00117

11.8 Economic identification

We will now make an attempt to answer the questions raised in Section 11.2 based on the above empirical results. First we notice that the basic results on the short-run adjustment structure remained essentially unaltered in all the different representations reported in Tables 11.1-11.5. In particular, the ECM terms in the parsimonious reduced-form model reported in Table 11.1 hardly changed at all when allowing for simultaneous effects in the model.

Table 11.7: The estimates of a partial system for money, the inflation rate and the deposit rate. (Coefficients of A0', A1' and a', with t-values in parentheses, and residual standard deviations and correlations at the bottom. The layout of the table could not be recovered from the extraction.)

The latter were essentially found between changes in real money stock and real income, and between changes in the short- and the long-term interest rate. Including the change in current real income in the real money stock equation seemed to improve the model specification. Including current interest rate changes improved the dynamic specification somewhat, but the static long-run solutions remained unaltered. Altogether, current and lagged changes of the process do not seem very crucial for the interpretation of the results, and we can focus on the equilibrium error correction results when answering the questions of Section 11.2.

Question 1. Is money stock adjusting to money demand or supply? The result that real money holdings have been exclusively adjusting to a long-run money demand relation was very robust in all representations. This seems to indicate either that the central bank has willingly supplied the


demanded money, or that agents have been able to satisfy their desired level of money holdings independently of the central bank. The access to credit outside Denmark as a result of the deregulation of capital movements might suggest the latter case.

Question 2. Given the empirically stable money demand relation found in the Danish data, can inflation be effectively controlled by the central bank? If the level of (the broad measure of) money stock is endogenously determined by agents' demand for money, then the central bank can indirectly influence the level of the demanded quantity by influencing its determinants, i.e. the cost of holding money. According to the estimated money-demand relation this can be achieved by changing the short-term interest rate. But a change in the short-term interest rate is likely to change money demand only to the extent that it changes the spread (Rm - Rb). Although we found that (Rm - Rb) ~ I(1) in the money demand relation, we also found that the money demand relation was a linear combination of two stationary relations, (m^r - y^r + b1 Rb + b2 Ds83t) and (Rm - 0.4Rb). Hence, it seems likely that the long-term movements in the demand for money are primarily influenced by the level of the long-term bond rate, which was found to be weakly exogenous and probably not controllable by the central bank. Since the modified spread was found to be stationary, an increase in the central bank interest rate is likely to transmit through the system in a way that leaves this component basically unchanged. Therefore, the present empirical evidence seems to suggest that the central bank would not have been able to effectively control M3.

Question 3. Does excess money, defined here as the deviation from the long-run money demand relation, cause inflation? If this were the case, ecm2 should have a positive coefficient in the inflation equation. No such evidence is found in any of the short-run structures.
The triangular form in Table 10.2 reports a positive, but very small (0.04), coefficient of ecm2. However, this insignificant effect is canceled by a negative coefficient of -0.04 on lagged changes in money stock. Thus, there is no evidence that inflation has been caused by monetary expansion in this period, at least not in the short run.

Question 4. What is the effect of expanding money supply on prices in the long run? Is money causing prices or prices causing money? This question will be addressed in the common trends model in the next chapter.

Chapter 12 Identification of Common Trends


Sections 5.2 and 6.4 demonstrated the duality between the AR representation, describing the adjustment dynamics, α, towards the long-run relations, β'xt, and the MA representation, describing the common driving trends, α⊥'Σεi, and their weights, β̃⊥. Generally, the identification problem in the common trends case is similar to that of the long-run relations in the sense that one can choose a normalization and (p - r - 1) restrictions without changing the value of the likelihood function, whereas additional restrictions are overidentifying and, hence, testable.

In this chapter we will discuss how to impose restrictions on the underlying common trends without attaching a structural meaning to the estimated shocks. This will be done in the next chapter, where we will discuss restrictions on the VAR model that aim at identifying the p - r permanent and r transitory shocks.

The organization of this chapter is as follows: Section 12.1 discusses the common trends decomposition based on the VAR model, Section 12.2 focuses on some special cases, Section 12.3 illustrates the ideas using the Danish data, and Section 12.4 discusses whether the results are economically identified using the scenario analysis of Chapter 2.


12.1 The common trends representation

Chapters 10 and 11 demonstrated that the VAR residuals are not invariant to linear transformations. This was shown by pre-multiplying the VAR model

Δxt = Γ1 Δxt-1 + αβ'xt-1 + μ0 + μ1 t + εt,   εt ~ Np(0, Σ)   (12.1)

with a (p × p) non-singular matrix A0:

A0 Δxt = A1 Δxt-1 + a β'xt-1 + μ0,a + μ1,a t + vt,   vt ~ Np(0, A0 Σ A0')   (12.2)

where A1 = A0 Γ1, a = A0 α, μ0,a = A0 μ0, μ1,a = A0 μ1, and vt = A0 εt.

The moving average representation of a VAR model with linear, but no quadratic, trends is given by:

xt = C Σ_{i=1}^{t} εi + t C μ0 + C*(L)(εt + μ0 + μ1 t),   (12.3)

where

C = β⊥ (α⊥' Γ β⊥)^{-1} α⊥'.   (12.4)

It is useful to express the C matrix as a product of two matrices (similarly to Π = αβ'):

C = β̃⊥ α⊥',  or, alternatively,  C = β⊥ α̃⊥',   (12.5)

where β̃⊥ = β⊥ (α⊥' Γ β⊥)^{-1} and α̃⊥' = (α⊥' Γ β⊥)^{-1} α⊥'. CATS uses the former formulation. Note that the matrices β̃⊥ and α⊥ can be directly calculated for given estimates of α, β, and Γ based on (12.4). Thus, the common stochastic trends and their weights can be calculated either from the unrestricted estimates α̂, β̂, or from restricted estimates α̂c, β̂c. When choosing the moving average option


in CATS, the program uses the latest estimates of α and β as a basis for the calculations.

It appears from (12.5) that the p × (p - r) matrix β̃⊥ (alternatively β̃⊥c) can be given an interpretation as the coefficients of the p - r common stochastic trends α⊥'Σεi (alternatively α⊥c'Σεi) of the variables xt. In the last section of this chapter we will relate this decomposition to the more intuitive discussion of common stochastic trends of Chapter 2.

The decomposition C = β̃⊥α⊥' resembles the decomposition Π = αβ', but with the important difference that β̃⊥ is a function not only of β, but also of α and Γ. Similarly as for α and β, one can transform β̃⊥ and α⊥ by a non-singular (p - r) × (p - r) matrix Q:

C = β̃⊥ Q Q^{-1} α⊥' = β̃⊥c α⊥c',   (12.6)

without changing the value of the likelihood function. Thus, the Q transformation leads to just-identified common trends for which no testing is involved. Additional restrictions on β̃⊥ and α⊥ would constrain the likelihood function and, hence, be testable. The next section will discuss a few special cases of testable restrictions on C which can be expressed as testable restrictions on α and β, so that new test procedures need not be derived.
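A minimal sketch of the computation behind (12.4)-(12.5). The numbers describe a made-up three-variable system with one cointegration relation (not the Danish data); the point is that C = β⊥(α⊥'Γβ⊥)^{-1}α⊥' is directly computable from α, β and Γ, and satisfies β'C = 0 and Cα = 0 — the cointegration relations carry no stochastic trend, and shocks feeding in through α have no long-run impact.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Illustrative system: p = 3 variables, r = 1 relation, so p - r = 2 trends.
beta = [[1.0], [-1.0], [0.0]]                       # x1 - x2 stationary
alpha = [[-0.5], [0.0], [0.0]]                      # only x1 error-corrects
beta_perp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]    # beta' beta_perp = 0
alpha_perp = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # alpha' alpha_perp = 0
Gamma = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# C = beta_perp (alpha_perp' Gamma beta_perp)^(-1) alpha_perp'  -- eq. (12.4)
inner = matmul(matmul(transpose(alpha_perp), Gamma), beta_perp)
C = matmul(matmul(beta_perp, inv2(inner)), transpose(alpha_perp))
```

With these inputs C works out to [[0,1,0],[0,1,0],[0,0,1]]: only cumulated shocks to x2 and x3 have long-run impact, and x1 inherits the x2 trend through the cointegration relation.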

12.2 Some special cases

We will now discuss a few special cases of restrictions on α and β which imply some interesting restrictions on α⊥ and β̃⊥.

Case 1: Long-run homogeneity in β. (The illustrative β and β̃⊥ matrices could not be recovered from the extraction.)

Case 2: A stationary variable in β, i.e. a unit vector in β. This implies a zero row in the C matrix.

Case 3: A column of α proportional to a unit vector. This implies a zero column in the C matrix.

Case 4: A row of α equal to zero. In this case the corresponding variable is weakly exogenous and its cumulated residuals correspond to a common driving trend. In addition, if all the rows of Γi, i = 1, ..., k - 1, corresponding to the weakly exogenous variable are zero, then the latter will appear as a unit row vector in the C matrix.

12.3 Illustrations

For the Danish data we found r = 3 and, hence, p - r = 2 common trends. We will report three different cases of the estimated common trends representation:

1. based on the unrestricted α̂ and β̂ from the VAR model,

Table 12.1: The unrestricted VAR representations. (Upper part: the common trends representation, α̂⊥,1, α̂⊥,2, β̃⊥,1 and β̃⊥,2, with residual standard deviations in brackets: m^r 0.024, Δp 0.015, Rm 0.001, y^r 0.018, Rb 0.002. Lower part: the Ĉ matrix, with t-ratios in brackets. The layout of the table could not be recovered from the extraction.)

2. based on a restricted α̂ (the zero row restrictions of real income and the bond rate), but unrestricted β̂,

3. based on β̂ restricted to the structure HS.4 of Table 10.3 and α̂ restricted as in 2.

The estimates α̂⊥,1, α̂⊥,2, β̃⊥,1, and β̃⊥,2 in Table 12.1 are based on a first tentative normalization on the largest coefficient. We note that the largest coefficient in α̂⊥,2 corresponds to the weakly exogenous bond rate, whereas in α̂⊥,1 it corresponds to the short-term interest rate and not to the weakly exogenous real income. Thus, by normalizing on Rm we have in fact normalized on an insignificant coefficient. This is a reminder that a large coefficient does not necessarily imply a statistically significant coefficient. Unless the residuals are standardized, the magnitude of a coefficient is not very informative. To improve interpretability we have, therefore, reported the residual standard deviations in brackets underneath the residuals in


the upper part of Table 12.1. It appears that even if the coefficient of Rm is 5 times larger than the coefficient of y^r, the residual standard error of the latter is 18 times larger than that of the former. Figure 12.1 (to be included) shows the graphs of the cumulated VAR residuals equation by equation, and Figure 12.2 (to be included) the two unrestricted common trends defined by:

û1t = α̂⊥,1' Σ_{i=1}^{t} ε̂i,   û2t = α̂⊥,2' Σ_{i=1}^{t} ε̂i,

where α̂⊥,1 and α̂⊥,2 are given by the estimates in Table 12.1 and ε̂i' = [ε̂mr, ε̂Δp, ε̂Rm, ε̂yr, ε̂Rb]i, i = 1974:3, ..., 1993:3.

As already mentioned, the common trends estimates of Table 12.1 are not unique in the sense that we can impose (p - r - 1) = 1 identifying restriction on each vector without changing the likelihood function. Assume, for example, that we would like to impose a zero restriction on the bond rate residual and a unitary coefficient on the real income residual in α̂⊥,1, and a zero restriction on the real income residual and a unitary coefficient on the bond rate residual in α̂⊥,2. This can be achieved by pre-multiplying α̂⊥' by the inverse of a transformation matrix Q. (The estimated Q and the resulting restricted vectors α̂⊥c' could not be recovered from the extraction.) By post-multiplying β̃⊥ by Q we get the corresponding loadings matrix β̃⊥c.

Table 12.2: Three representations of the moving average form. (Representation 1 is the unrestricted case of Table 12.1; the table reports representations 2 and 3, i.e. α̂ restricted by the two weak exogeneity zero rows, and additionally β̂ restricted to HS.4. In both cases the first three columns of the Ĉ matrix are zero, so only the y^r (α̂⊥,1) and Rb (α̂⊥,2) columns are reported, with t-ratios in brackets; the Case 3 estimates are reproduced in the text below. The layout of the table could not be recovered from the extraction.)

Note that this can also be achieved by simple row manipulations of the unrestricted vectors, for example: α̂⊥,1c = (α̂⊥,1' + 0.54 α̂⊥,2')/(0.33) = [0.12, 0.09, 3.91, 1.0, 0.0].

In the lower part of Table 12.1 we report the Ĉ matrix for the unrestricted VAR. The columns of ε̂mr, ε̂Δp and ε̂Rm contain no significant coefficients, which is consistent with the columns of α being proportional to unit vectors (as indeed seemed to be the case for HS.4 in Chapter 10). The results suggest that there is no significant long-run impact of disturbances to real money stock, the short-term interest rate and the inflation rate on any of the variables of this system. Note that the first two are probably closely related to unanticipated shocks to the monetary policy instruments, and the third is the goal variable of monetary policy.

In representation 2, reported in Table 12.2, we have imposed the two zero row restrictions on α corresponding to the weak exogeneity of real income and the bond rate. The corresponding two common trends, Σ_{i=1}^{t} ε̂yr,i and Σ_{i=1}^{t} ε̂Rb,i, are now overidentified with three overidentifying restrictions each (the equivalent of the six degrees of freedom in the joint exogeneity test in Chapter 9). Note that the two exogeneity restrictions completely


determine the common trends in our empirical model (recalling that p - r = 2). Consequently, two rows of the α matrix are zero in this case and the remaining three rows can be represented as:

            [ α11  0    0   ]
            [ 0    α22  0   ] [ β1'xt-1 ]
  αβ'xt-1 = [ 0    0    α33 ] [ β2'xt-1 ]
            [ 0    0    0   ] [ β3'xt-1 ]
            [ 0    0    0   ]

Thus, when there are exactly p - r weakly exogenous variables, the α⊥ vectors can be represented as p - r unit vectors, resulting in r zero columns in the C matrix. In addition to the weak exogeneity restrictions on α, representation 3 has imposed the structure HS.4 on β. Since the first three columns of the Ĉ matrix are equal to zero, Table 12.2 only reports the last two columns. In this case they are equivalent to β̃⊥,1 and β̃⊥,2.

A comparison of the columns for y^r and Rb, i.e. of β̃⊥,1 and β̃⊥,2, in the three cases reveals that the unrestricted estimates have not changed much as a result of imposing the six restrictions on α and the four restrictions on β. This is because these restrictions were accepted with very high p-values. When this is not the case, the estimates of the Ĉ matrix may differ considerably depending on whether they are based on unrestricted or restricted α and β.

While the long-run weak exogeneity of the bond rate and real income implies that their cumulated residuals can be considered common stochastic trends, it does not necessarily imply that these two variables are the common trends. For this to be the case we need the further condition on the Γi matrices that the rows associated with the weakly exogenous variables have to be zero. In such a case the equation for the variable xj,t becomes Δxj,t = εj,t, so that xj,t = Σ_{i=1}^{t} εj,i, i.e. the common stochastic trend coincides with the variable itself. Table 12.2 shows that the bond rate is significantly affected by both of the stochastic trends. The same is true for the real aggregate income variable, though less significantly so. The reason for this can be found in Table 11.1, which shows that both variables exhibit significant effects from lagged changes of the process.

Chapter 5 demonstrated that cointegration and common trends are two sides of the same coin. We will now exploit this feature to learn more about


the generating mechanisms underlying the cointegration structure HS.4. We reproduce the common trends estimates of Case 3 in Table 12.2:

  [ m^r ]   [  0.82  -16.0  ]
  [ Δp  ]   [ -0.10    0.44 ] [ Σε̂yr,i ]   stationary and
  [ Rm  ] = [  0.02    0.55 ] [ Σε̂Rb,i ] + deterministic
  [ y^r ]   [  1.27   -3.49 ]               components
  [ Rb  ]   [  0.05    1.45 ]

where ε̂yr,t is a measure of u1t and ε̂Rb,t a measure of u2t. In Chapter 9 we found that the liquidity ratio, the interest rate spread, and the short- and long-term real interest rates were nonstationary. Using the above results we can now express these as linear functions of the two stochastic trends:

  m^r - y^r = (0.82 - 1.27)Σε̂yr,i + (-16.0 + 3.5)Σε̂Rb,i = -0.45 Σε̂yr,i - 12.50 Σε̂Rb,i
  Rm - Rb   = (0.02 - 0.05)Σε̂yr,i + (0.55 - 1.45)Σε̂Rb,i = -0.03 Σε̂yr,i - 0.90 Σε̂Rb,i
  Rm - Δp   = (0.02 + 0.10)Σε̂yr,i + (0.55 - 0.44)Σε̂Rb,i = 0.12 Σε̂yr,i + 0.11 Σε̂Rb,i
  Rb - Δp   = (0.05 + 0.10)Σε̂yr,i + (1.45 - 0.44)Σε̂Rb,i = 0.15 Σε̂yr,i + 1.01 Σε̂Rb,i
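The decompositions above are simple bookkeeping with the Case 3 loadings and can be checked mechanically. The sketch below uses the signs as reconstructed in this text:

```python
# Loadings of the two common trends (cumulated eps_yr and eps_Rb) on each
# variable, from Table 12.2, Case 3, signs as reconstructed in the text.
load = {
    "mr": (0.82, -16.0),
    "Dp": (-0.10, 0.44),
    "Rm": (0.02, 0.55),
    "yr": (1.27, -3.49),
    "Rb": (0.05, 1.45),
}

def combo(terms):
    """Trend content of a linear combination given as {variable: weight}."""
    g1 = sum(w * load[v][0] for v, w in terms.items())
    g2 = sum(w * load[v][1] for v, w in terms.items())
    return round(g1, 2), round(g2, 2)

liquidity = combo({"mr": 1, "yr": -1})    # m^r - y^r: (-0.45, -12.51)
spread = combo({"Rm": 1, "Rb": -1})       # Rm - Rb:   (-0.03, -0.90)
real_short = combo({"Rm": 1, "Dp": -1})   # Rm - Dp:   (0.12, 0.11)
real_long = combo({"Rb": 1, "Dp": -1})    # Rb - Dp:   (0.15, 1.01)
```

Each pair matches the coefficients in the display above (the -12.51 appears as -12.50 in the text because the yr loading -3.49 is rounded to -3.5 there).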

The nonstationarity of the liquidity ratio seems primarily to derive from cumulated shocks to the bond rate. The level of real money balances seems to have been strongly influenced by the latter trend, whereas the level of real aggregate income much less so. Furthermore, cumulated shocks to real aggregate income might have influenced real money balances and real income differently (though the difference between 0.82 and 1.27 may or may not be statistically significant). Thus, the results seem to suggest that Danish money demand has been more interest rate elastic than aggregate demand. The nonstationarity of the interest rate spread is related to both stochastic trends. Both the short- and the long-term interest rate have been affected by the two shocks, but the bond rate more strongly so. The magnitude of the nonstationarity of the short-term and the long-term real interest rate appears to differ in the sense that the real short-term interest rate is closer to stationarity than the real long-term rate. This is primarily because the real bond rate has been more strongly affected by the two stochastic trends than the real deposit rate. Furthermore, it is interesting to notice that the first stochastic trend, describing shocks to real income, has had a positive impact on both interest rates but a negative one on the


inflation rate, whereas the second trend, describing shocks to the bond rate, has had a positive impact on the inflation rate and the two interest rates. It seems plausible that the nonstationarity of real interest rates in this period is related to the real income stochastic trend and its effect on the inflation rate. Altogether, the results indicate that the relationship between excess aggregate demand and the inflation rate has not followed conventional mechanisms in the present sample period.

Based on the above results it is now straightforward to examine the stationary cointegration relations defined by HS.4:

  (m^r - y^r) - 13.8(Rm - Rb)            = -0.06 Σε̂yr,i - 0.03 Σε̂Rb,i
  Rm - 0.4Rb                             = -0.03 Σε̂Rb,i
  (Rm - Δp) + 0.5(Rm - Rb) - 0.08y^r     = -0.06 Σε̂yr,i - 0.01 Σε̂Rb,i

The reason why the stochastic trends do not cancel exactly is that the cointegration relations are derived under the hypothesis H24 but without imposing weak exogeneity restrictions on α, whereas the common trends representation is based on restricted α and β. The stationarity of the empirical money demand relation is a result of the interest rate spread being influenced by the stochastic trends in the same proportion as the liquidity ratio. The stationarity of the interest rate relation has been achieved by combining the two interest rates in the same proportion as the two stochastic trends enter the variables. The stationarity of the third relation is achieved by combining a homogeneous relation between the inflation rate and the two interest rates with the real income variable. It appears that the homogeneous inflation-interest rate relation (i.e. H27 in Table 9.3) does not cancel the two stochastic trends, of which the real trend is probably the most significant. Adding a small fraction of real income is enough to counterbalance the two stochastic trends so that stationarity is strongly improved in the extended relation (H28 in Table 9.3).

12.4 Economic identification

We have now obtained estimates of the common trends representation that can be used to evaluate the empirical content of the real money representation


discussed in Chapter 2. The theoretical model predicted that nominal growth derives from money expansion in excess of real productive growth in the economy, and real growth from productivity shocks to the economy, i.e. the two autonomous shocks were assumed to originate solely from money stock and from real income, respectively. The empirical model was consistent with finding two autonomous permanent disturbances which cumulate to two common driving trends. One of these seemed consistent with the hypothetical real income trend, whereas the other was clearly not related to excess monetary expansion in the domestic market. Instead, it was generated by permanent shocks to the long-term bond rate and, thus, seemed more related to financial behavior in the long-term capital market.

To facilitate the discussion of the cointegration implications of the empirical results for the real money representation, we reproduce both the theoretical representation and the estimates from Table 12.2, Case 3, i.e. with the weak exogeneity restrictions imposed on α and the restrictions HS.4 on β. To improve comparability, u1,t denotes an autonomous real disturbance and u2,t a nominal disturbance.

  [ m^r ]   [ d12  0   ]
  [ Δp  ]   [ 0    c21 ] [ Σu1i ]   stationary and
  [ y^r ] = [ d12  0   ] [ Σu2i ] + deterministic   (12.7)
  [ Rm  ]   [ 0    c21 ]             components
  [ Rb  ]   [ 0    c21 ]

  [ m^r ]   [  0.82  -16.0  ]
  [ Δp  ]   [ -0.10    0.44 ] [ Σu1i ]   stationary and
  [ y^r ] = [  1.27   -3.49 ] [ Σu2i ] + deterministic   (12.8)
  [ Rm  ]   [  0.02    0.55 ]             components
  [ Rb  ]   [  0.05    1.45 ]

The cointegration implications of the theory model (12.7) are that velocity, (m^r - y^r), the interest rate spread, (Rm - Rb), and the real interest rates, (Rm - Δp) and (Rb - Δp), should all be stationary. As demonstrated above, this was not the case here, and we will now discuss why. According to (12.7) the real trend should influence real money stock and real income with the same coefficient. The estimated coefficients are 0.82 for money stock and 1.27 for real income; both are positive and not too far away


from unity, but may, nevertheless, deviate too much from each other. The theory model predicts that the nominal I(1) trend should not inuence real money nor real income. It appears that the nominal trend (dened here as the cumulated shocks to the long-term bond rate) has indeed an insignicant eect on the real income variable, but a very signicant one on real money stock. Therefore, some of discrepancy between the assumed monetary model and the empirical evidence can be related to the strong impact of the longterm interest rate in the money demand relation (conventional monetary models would assume the demand for money to be interest rate inelastic). The Fisher parity predicts that nominal interest rates are only inuenced by the stochastic nominal trend and the expectations hypothesis predicts that the latter inuences the interest rates with equal coecients. The empirical results show that both stochastic trends inuence the two interest rates in a similar way though not in the proportion one to one. Instead the weights are approximately in the proportion 0.4 to 1.0 consistent with (Rm 0.4Rb ) I(0) (probably because we used average yields on M3 money stock). Finally, but not least importantly, ination rate is signicantly aected by both trends which is against the econometric assumption of Chapter 2 that only the nominal trend should inuence ination. Furthermore, the effect from the real trend on ination is highly signicant and negative. The latter seems counter-intuitive, at least in terms of a conventional Phillips curve relationship. The eect from the nominal trend (shocks to the bond rate) is positive, possibly reecting the self-fullling eect of long-term inationary expectations on price ination. The nding of two stochastic trends in ination rate could also be consistent with prices containing two stochastic I(2) trends instead of one. However, the estimated results of the I(2) analysis in Chapter 14 clearly supports the existence of one I(2) trend. 
Therefore, to achieve econometrically consistent results we need to redefine the common nominal trend as a linear combination of the shocks to real expenditure and to the nominal bond rate. Transforming (12.8) with the transformation matrix

\[
Q = \begin{pmatrix} 1.0 & 0.0 \\ 0.23 & 1.0 \end{pmatrix}
\]

we achieve the following common trends representation:


\[
\begin{pmatrix} m^r \\ \Delta p \\ y^r \\ R_m \\ R_b \end{pmatrix} =
\begin{pmatrix} 2.86 & -16.0 \\ 0.0 & 0.44 \\ 0.47 & -3.49 \\ 0.15 & 0.55 \\ 0.38 & 1.45 \end{pmatrix}
\begin{pmatrix} \sum \hat{u}_{1i} \\ \sum \hat{u}_{2i} \end{pmatrix}
+ \text{stationary and deterministic components} \tag{12.9}
\]

in which inflation is affected by a single common stochastic trend defined by u_{2i} = ε_{Rb,i} + 0.23ε_{yr,i}, with u_{1i} = ε_{yr,i} as before. Note that the two common trends are no longer uncorrelated as they were in (12.8), ε_{yr,t} and ε_{Rb,t} being correlated with a coefficient of 0.03 (see Table 11.5). Though the coefficient of the nominal trend in y^r is insignificant, we can similarly apply the following transformation to (12.9):

\[
Q = \begin{pmatrix} 1.0 & 7.45 \\ 0.0 & 1.0 \end{pmatrix}
\]

and obtain the representation:

\[
\begin{pmatrix} m^r \\ \Delta p \\ y^r \\ R_m \\ R_b \end{pmatrix} =
\begin{pmatrix} 2.86 & 5.31 \\ 0.0 & 0.44 \\ 0.47 & 0.0 \\ 0.15 & 1.67 \\ 0.38 & 4.28 \end{pmatrix}
\begin{pmatrix} \sum u_{1i} \\ \sum u_{2i} \end{pmatrix}
+ \text{stationary and deterministic components} \tag{12.10}
\]

In (12.10) we have identified the common trends by imposing restrictions, in (12.8) on \tilde{\beta}_\perp, and in (12.9) on both \tilde{\beta}_\perp and \tilde{\alpha}_\perp. Given the discussion in Section 12.1 about interpreting shocks we conclude that the definition of u_{1t} and u_{2t} satisfies the requirement of uniqueness and, possibly, novelty, but fails on the econometric condition that only the nominal stochastic trend should influence inflation. In (12.9) and (12.10) the econometric condition is satisfied, but at the sacrifice of the uniqueness condition. Furthermore, the economic implications of the results became less plausible in (12.9) and (12.10). For example, the condition that real money stock and real aggregate demand should be affected similarly by the real stochastic trend is now lost. This seems to be related to the fact that the cumulated shocks to real aggregate demand were more significant, and negatively so, in price inflation than were shocks to the bond rate. Moreover, the finding that the nominal stochastic

trend did not arise from shocks to money stock was very strong and significant. Thus, the results illustrate how fragile a structural interpretation of VAR residuals can be. The logic of the conventional monetary model seems to be inconsistent with the logic of the econometric analysis and, thus, points to a need to reconsider the theoretical basis for understanding inflationary mechanisms in this period. Whether or not such an empirical claim can be forcefully brought forward depends on the reliability of the empirical results, i.e. whether or not the structuring of the economic reality using the cointegrated VAR model is empirically convincing. However, as long as a first-order linear approximation of the underlying economic structure provides an adequate description of the empirical reality, the VAR model is essentially a convenient summary of the covariances of the data (see Chapter 3). Provided that further reductions (simplifications) of the model are based on valid procedures for statistical inference, the final empirical results should essentially reflect the information in the covariances of the data. Altogether, the empirical evidence based on the present sample does not seem to support the conventional monetary model as represented by (12.7). In particular, the finding that shocks to the long-term bond rate, instead of the money stock, were an important driving force suggests that financial deregulation and increased globalization may have played a more crucial role for nominal growth in the domestic economy than the actions of the central bank. By comparing the theoretical model with the corresponding empirical results we may get some understanding of why the theory model failed. Thus, in the ideal case the empirical analysis might suggest possible directions for modifying the theoretical model. Alternatively, it might suggest how to modify the empirical model (for example by adding more data) to make it more consistent with the theory.
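The observational equivalence behind this fragility can be checked numerically: any nonsingular rotation Q of the common trends, compensated in the loadings, leaves the moving-average representation of the data unchanged, which is why extra identifying restrictions are needed. A minimal numpy sketch; the loading matrix uses the point estimates quoted for (12.9), with signs as reconstructed here, and the shock values are made up:

```python
import numpy as np

# Loadings of the two common trends in representation (12.9); the rotation Q
# below is the first transformation in the text (u2* = 0.23*u1 + u2).
C = np.array([[2.86, -16.0],
              [0.00,   0.44],
              [0.47,  -3.49],
              [0.15,   0.55],
              [0.38,   1.45]])
Q = np.array([[1.00, 0.00],
              [0.23, 1.00]])

u = np.array([0.7, -1.2])  # hypothetical cumulated shocks

# x = C u = (C Q^{-1})(Q u): rotating the trends while compensating the
# loadings reproduces exactly the same data.
x_original = C @ u
x_rotated = (C @ np.linalg.inv(Q)) @ (Q @ u)

assert np.allclose(x_original, x_rotated)
```

Any choice among such rotations must therefore be defended on economic, not statistical, grounds.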
In either case the analysis points forward, which is why we believe the VAR methodology has the potential of being a progressive research paradigm. We will return to this question in the last chapter. Up to this point we have not explicitly identified the estimated shocks by making them uncorrelated, or by distinguishing between permanent and transitory shocks. This will be the topic of the next chapter.

Chapter 13 Identification of a Structural VAR


In the previous chapter no attempt was made to identify the transitory shocks, i.e. the shocks which do not cumulate to stochastic trends. This is the purpose of this chapter. There is, however, an important difference: the identification of the long-run structure is formulated as r linear hypotheses on the variables of the system, whereas the identification of the common trends is formulated as p − r linear hypotheses on the shocks to the variables/equations. While we do not in general need to discuss what a variable is, the empirical definition of a (structural) shock is more arbitrary. The previous chapter argued that this is particularly so when the shock has to be estimated from the VAR residuals, which are seldom invariant to extensions of the information set. Therefore, the definition of a structural shock, and how to identify it based on the VAR residuals, plays an important role in the identification of the common trends.

13.1 Transitory and permanent shocks

The VAR model:
\[
\Delta x_t = \Gamma_1 \Delta x_{t-1} + \alpha\beta' x_{t-1} + \mu_0 + \varepsilon_t, \qquad \varepsilon_t \sim N_p(0, \Omega).
\]

The vector equilibrium-correction model with simultaneous effects:
\[
A_0 \Delta x_t = A_1 \Delta x_{t-1} + a\beta' x_{t-1} + a_0 + v_t, \qquad v_t \sim N_p(0, A_0 \Omega A_0'),
\]
where A_1 = A_0Γ_1, a = A_0α, a_0 = A_0μ_0, and v_t = A_0ε_t.

The MA representation of the VAR model, assuming no linear trends in the data:
\[
x_t = C \sum_{i=1}^{t} \varepsilon_i + C^*(L)(\varepsilon_t + \mu_0), \tag{13.1}
\]
where
\[
C = \beta_\perp(\alpha_\perp'\Gamma\beta_\perp)^{-1}\alpha_\perp' \tag{13.2}
\]
\[
\phantom{C} = \tilde{\beta}_\perp \alpha_\perp'. \tag{13.3}
\]

The SVAR model: the SVAR model differs from the equilibrium-correction model in the following sense:

1. the VAR residuals are assumed to be related to some underlying structural shocks which are linearly independent;
2. the p structural shocks are divided into (p − r) permanent and r transitory shocks.

Most common trends models achieve this by assuming that

1. transitory shocks have no long-run impact on the variables in the system (i.e. the transitory shocks define zero columns in the C matrix);
2. permanent shocks have a significant long-run impact on the variables of the system.

We consider now a structural representation as defined by the matrix B (which is similar to the matrix A_0 of the equilibrium-correction model, except that B does not assume a normalization), associating the structural shocks u_t with the VAR residuals:
\[
u_t = B\varepsilon_t \tag{13.4}
\]


or, alternatively, associating the VAR residuals with the underlying structural shocks:
\[
\varepsilon_t = B^{-1}u_t \tag{13.5}
\]
We can now reformulate (13.1) using (13.5):
\[
x_t = \tilde{\beta}_\perp \alpha_\perp' B^{-1} \sum_{i=1}^{t} u_i + C^*(L)(B^{-1}u_t + \mu_0). \tag{13.6}
\]

We need to choose B so that the usual assumptions underlying a structural interpretation are satisfied:

1. a distinction between transitory and permanent shocks, i.e. u_t = [u_s, u_l] = [u_{s,1}, ..., u_{s,r}, u_{l,1}, ..., u_{l,p−r}];
2. the transitory shocks [u_{s,1}, ..., u_{s,r}] have no long-run impact on the variables of the system, whereas the permanent shocks [u_{l,1}, ..., u_{l,p−r}] have;
3. E(u_t u_t') = I_p, i.e. all structural shocks are linearly independent, or, alternatively,
4. E(u_{s,t}u_{s,t}') = I_r and E(u_{l,t}u_{l,t}') = I_{p−r}.

Identification: by adding the relation (13.4) to the unrestricted VAR model we have introduced p × p (= 25) additional parameters, so that we need to impose as many restrictions to achieve just identification. The orthogonality condition 3 implies that p(p + 1)/2 (= 15) of the parameters in B^{-1} have been fixed. Condition 2 restricts (p − r) × r (= 6) additional parameters. Thus, in our example we need to impose 4 additional restrictions to achieve complete uniqueness of the structural shocks.

We will first choose B so that conditions 1, 2, and 3 are satisfied.

1. Orthogonality between the transitory shocks u_{s,t} and the permanent shocks u_{l,t} can be achieved by choosing:
\[
u_{s,t} = \alpha'\Omega^{-1}\varepsilon_t, \qquad u_{l,t} = \alpha_\perp'\varepsilon_t.
\]

2. Orthogonality within the two groups can be achieved by choosing:
\[
B = \begin{pmatrix} (\alpha'\Omega^{-1}\alpha)^{-1/2}\alpha'\Omega^{-1} \\ (\alpha_\perp'\Omega\alpha_\perp)^{-1/2}\alpha_\perp' \end{pmatrix}
\]
which corresponds to:
\[
B^{-1} = \begin{pmatrix} \alpha(\alpha'\Omega^{-1}\alpha)^{-1/2} & \Omega\alpha_\perp(\alpha_\perp'\Omega\alpha_\perp)^{-1/2} \end{pmatrix}.
\]

We now relax the assumption that permanent and transitory shocks are uncorrelated, while maintaining the orthogonality assumption within the groups. Thus, we still need to distinguish between permanent and transitory shocks, but the transitory shocks, say, can be correlated with the permanent shocks. This can be achieved by the following choice:
\[
B = \begin{pmatrix} (\bar\alpha'\Omega\bar\alpha)^{-1/2}\bar\alpha' \\ (\bar\alpha_\perp'\Omega\bar\alpha_\perp)^{-1/2}\bar\alpha_\perp' \end{pmatrix},
\]
where \bar\alpha = \alpha(\alpha'\alpha)^{-1} and \bar\alpha_\perp = \alpha_\perp(\alpha_\perp'\alpha_\perp)^{-1}, so that \bar\alpha'ε_t define the transitory shocks and \bar\alpha_\perp'ε_t define the permanent shocks. Orthogonality of the transitory shocks implies that E(u_{s,t}u_{s,t}') = I, which can be achieved by:
\[
u_{s,t} = (\bar\alpha'\Omega\bar\alpha)^{-1/2}\bar\alpha'\varepsilon_t.
\]
Orthogonality of the permanent shocks implies that E(u_{l,t}u_{l,t}') = I, which can be achieved by:
\[
u_{l,t} = (\bar\alpha_\perp'\Omega\bar\alpha_\perp)^{-1/2}\bar\alpha_\perp'\varepsilon_t.
\]
The uniqueness can be achieved econometrically by choosing A_0 so that the covariance matrix becomes diagonal and by appropriately restricting α. For empirical applications see Mellander, Vredin and Warne (1992),


Hansen and Warne (200?), and Coenen and Vega (2000). Whether the resulting estimate v_t can be given an economic interpretation as a unique structural shock depends crucially on the plausibility of the identifying assumptions. Some of them are just-identifying and, thus, cannot be tested against the data.
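The first choice of B above is easy to verify numerically. The sketch below uses illustrative α and Ω (not estimates from the Danish data) and checks that the resulting p = 5 structural shocks u_t = Bε_t are mutually orthogonal with unit variance, so that the orthogonality condition indeed fixes p(p + 1)/2 = 15 of the p² = 25 free parameters:

```python
import numpy as np

def inv_sqrt(M):
    """Inverse symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** -0.5) @ V.T

rng = np.random.default_rng(0)
p, r = 5, 3
alpha = rng.standard_normal((p, r))        # illustrative adjustment matrix
A = rng.standard_normal((p, p))
Omega = A @ A.T + p * np.eye(p)            # a valid residual covariance

# alpha_perp: orthonormal basis for the orthogonal complement of alpha.
alpha_perp = np.linalg.svd(alpha)[0][:, r:]

Oinv = np.linalg.inv(Omega)
B_trans = inv_sqrt(alpha.T @ Oinv @ alpha) @ alpha.T @ Oinv           # r transitory rows
B_perm = inv_sqrt(alpha_perp.T @ Omega @ alpha_perp) @ alpha_perp.T   # p-r permanent rows
B = np.vstack([B_trans, B_perm])

# E(u u') = B Omega B' = I_p: the structural shocks are orthonormal.
assert np.allclose(B @ Omega @ B.T, np.eye(p))
```

The off-diagonal blocks vanish because α_⊥'Ω Ω^{-1}α = α_⊥'α = 0, which is the orthogonality argument used in the text.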

Chapter 14 I(2) Symptoms in I(1) Models


The purpose of this chapter is to give a soft introduction to the I(2) model so that the reader has a good intuition for the basic ideas prior to the analysis of the formal I(2) model in the next chapter. Generally, there are few applications of the cointegrated VAR model to I(2) data. The reason for this is that the existence of I(2) data and, hence, the relevance of the I(2) model has often been disregarded from the outset based on economic arguments. However, as argued in Chapter 2, the unit root property is a statistical concept which in general should not be translated into a property of an economic problem. Therefore, to give a double unit root a structural economic interpretation as a very long-run relationship in the data is generally not warranted. For example, the intuition that I(2) trends can be found over very long time periods is at odds with the fact that significant mean reversion is more likely to be found in large than in small samples. Thus, the hypothesis of a double unit root is often hard to reject even in moderately sized samples when the adjustment behavior is sluggish. Thus, while the distinction between the I(1) and the I(2) model is theoretically sharp, empirically it is often much more diffuse. The statistical analysis of the I(2) model is quite involved, and the move from the fairly well-known I(1) world to the more complex I(2) analysis may easily seem prohibitive. The aim of this chapter and the next is to convince the reader that the I(2) analysis, though possibly demanding, is well worth the effort. We will argue here that the I(2) analysis offers a wealth of largely unexploited possibilities for an improved understanding of empirical problems where acceleration rates matter, for example, the determination of the inflation rate.


The distinction between an I(2) variable and a trend-adjusted I(1) variable is often diffuse, particularly in small samples. Therefore, Section 14.1 gives a brief informal introduction to the role of deterministic and stochastic components in models with nominal variables. Most econometric software packages contain a routine for cointegration analysis based on the I(1) model, but only a few include tests and estimation procedures for the I(2) model. Section 14.2, therefore, discusses the estimation of the I(1) model when the data are I(2) and the typical symptoms in the I(1) model signalling I(2) problems. Section 14.3 provides an intuitive approach to the I(2) model based on the definition of the so-called R-model of Chapter 7, and gives some examples of questions which can be adequately asked and tested based on the I(1) model even if the data are I(2). Section 14.4 discusses under which circumstances the data can be transformed to I(1) variables without loss of information. Section 14.5 concludes.

14.1 Stochastic and deterministic components in nominal models

When analyzing nominal instead of real variables, for example m_t and p_t instead of (m − p)_t, we need to reconsider the role of the stochastic and deterministic components and how they enter the model. However, before doing that one would probably first like to know whether the data are I(2) or not. Therefore, most empirical applications in which any of the variables might contain a double unit root report the results of some univariate Dickey-Fuller type tests applied to each variable separately. We will, however, not discuss the univariate test procedures here, because the next chapter will strongly argue that univariate tests of individual variables cannot (and should not) replace the multivariate I(2) test procedure. Thus, we will leave the formal I(2) testing to the next chapter and, instead, take a look at the graphs of the data in levels and differences as a first step in the analysis. Because an I(2) variable typically exhibits very smooth behavior, which can be difficult to distinguish from an I(1) variable with a linear trend, the differenced data are often more informative about potential I(2) behavior than the data in levels. It is often the case that an I(2) variable can be approximated by a trend-adjusted I(1) variable when it is observed over shorter periods. Since the slope coefficient of a linear trend in x_t corresponds to an average growth rate of x_t (the mean of Δx_t), the graph of the latter

Figure 14.1: The levels of the variables for the Danish nominal data.

often suggests whether there is significant mean reversion in the differences or not. In many cases it may be possible to avoid the I(2) analysis by allowing for sufficiently many mean shifts in Δx_t (i.e. broken linear trends in x_t), even if the growth rate seems to be drifting off in a nonstationary manner. For example, nominal money stock and prices in Figure 14.1, panels a and b, exhibit smooth trending behavior over the whole sample period, but with a change in the slope at around 1982-83. Consistent with this, the growth rates (in panels c and d) seem to fluctuate around a higher mean value up to 1983 and a lower value thereafter. Hypothetically, trend-adjusted nominal money and prices could be found to be empirically I(1) when allowing for different growth rates in the two regimes, but to be I(2) when assuming constant linear growth rates. In the former case we would have chosen to model the shift from a high-inflation to a low-inflation regime deterministically, in the latter case stochastically. Does the choice matter or not? It depends! In the Danish data we detected an extraordinarily large shock in the bond rate and real money stock at 1983:1, which approximately coincided with a change in inflationary regimes. From an econometric point of view this shock violated the normality assumption of the VAR model and, hence, was accounted for by a blip dummy. The latter cumulates to a shift in the levels of the bond rate and the levels of real

Figure 14.2: The differences of the variables for the Danish nominal data.

money stock. Chapter 7 demonstrated that an unrestricted step dummy in the VAR model corresponds to a broken linear trend in the data. Thus, by econometrically accounting for the extraordinarily large shock in the data we have in fact chosen to model the shift deterministically. Nonetheless, Table 9.3 showed that the stationarity of the Danish inflation rate was rejected (though borderline so) even when corrected for a shift in the mean at 1983:1. In order not to violate the multivariate normality assumption of the VAR model, big interventions and reforms often need to be modelled deterministically, and one can ask whether this is good or bad. One could on the one hand argue that if the behavioral shift was properly anticipated by the economic agents, then a model derived under the assumption of such forward-looking behavior should be able to account for these large changes in the data. For example, if model-based forward-looking expectations adequately describe agents' behavior, then the linear VAR model should exhibit signs of non-constancy and we should preferably move to a nonlinear model analysis. If on the other hand the VAR model provides a good description of the data, then the magnitude of the shock was probably unanticipated and we should treat it as a large innovation outlier. When there are large shocks in the data, though not large enough to violate the normality assumption, one might nevertheless prefer to allow for a mean shift in the differences to avoid the I(2) analysis. In this case we treat some shocks as deterministic, though this is strictly not required by the econometric analysis. Such a choice should, therefore, be justified by economic arguments. For example, one could argue that a shift dummy/broken linear trend is a proxy for an omitted variable which, if included, would have made the trend/dummy variable superfluous in the model. To use broken linear trends and step dummies exclusively to avoid the I(2) analysis does not seem reasonable.
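The cumulation argument above can be verified directly: an impulse (blip) dummy in the equations for Δx_t cumulates to a level shift in x_t, and a step dummy cumulates to a broken linear trend. A small sketch, with an arbitrary intervention date:

```python
import numpy as np

T, t0 = 100, 40                      # sample length, intervention date
blip = np.zeros(T); blip[t0] = 1.0   # impulse dummy in the Delta-x equations
step = np.zeros(T); step[t0:] = 1.0  # step (shift) dummy

level_shift = np.cumsum(blip)        # blip dummy cumulates to a level shift
broken_trend = np.cumsum(step)       # step dummy cumulates to a broken trend

assert np.array_equal(level_shift, step)   # shift in the levels of x_t
assert broken_trend[t0 - 1] == 0.0
assert broken_trend[-1] == float(T - t0)   # linear segment after t0
```

This is why a blip dummy accounting for 1983:1 implies a deterministic level shift in the levels of the series, as described in the text.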

14.2 Estimating an I(1) model with I(2) data

Chapter 9 found that the inflation rate was empirically I(1) based on the data vector x_t' = [m^r, y^r, Δp, R_m, R_b], where m^r = m − p. This implies that prices are I(2) and, therefore, that nominal money stock is likely to be I(2) as well. Chapter 2 demonstrated that we need to reconsider the role of the stochastic trends when the VAR analysis is based on nominal money and prices, x_t' = [m, p, y^r, R_m, R_b]. However, in this case we also need to reconsider the role of the deterministic components. The empirical analyses of Chapter 9 showed that the real money stock and the real income variable contained a linear trend and a level shift at around 1983. The latter did not seem to have generated a broken linear trend in the two variables. The question is whether we need to reconsider the possibility of a broken linear trend in the nominal variables. As a matter of fact, nominal money and prices might very well contain a broken linear trend even though (m − p) showed no such evidence. This would be the case if m and p contain the same (broken) trend, as it would then cancel in the real transformation. The graphs of nominal money stock and prices in Figure 14.1 suggest that the slope of the linear trend might indeed have changed at around 1983. Based on the nominal VAR analysis it is possible to test the hypothesis that there are broken linear trends in the data and that they cancel in m − p. To illustrate this possibility we respecify the VAR model so that it is consistent with broken linear trends both in the data and in the cointegration relations:

\[
\Delta x_t = \Gamma_1 \Delta x_{t-1} + \tilde{\Pi}\tilde{x}_{t-2} + \Phi D_{p,t} + \gamma_{01} + \gamma_{02}D_{s,t} + \varepsilon_t, \qquad \varepsilon_t \sim N_p(0, \Omega),\ t = 1, ..., T, \tag{14.1}
\]

where \tilde{\Pi} = [\Pi, \gamma_{11}, \gamma_{12}] and \tilde{x}_t' = [x_t', t, tD_{s,t}]; \Phi, \gamma_{01}, and \gamma_{02} are unrestricted, and the broken linear trend is restricted to lie in the cointegration relations to avoid quadratic trends in the data. When x_t ∈ I(2) and, hence, Δx_t ∈ I(1), the reduced rank restriction on the Π matrix is not sufficient to get rid of all (near) unit roots in the model. Because the process Δx_t also contains unit roots, we need to impose another reduced rank restriction on the matrix Γ = I − Γ_1. This will be formally discussed in the next chapter. Note, however, that even if the rank of Π = αβ' has been correctly determined, there will remain additional unit roots in the VAR model when x_t is I(2). Therefore, a straightforward way of finding out whether there is an additional (near) unit root in the model is to calculate the roots of the vector process (as described in Chapter 3) after the rank r has been determined. If there remain one or several large roots in the model for any reasonable choice of r, then it is a clear sign of I(2) behavior in at least some of the variables. Because the additional unit root(s) belong to the difference matrix Γ = I − Γ_1, lowering the value of r does not remove the additional unit roots associated with the I(2) components in the data. In the Danish nominal money model there are altogether p × k = 5 × 2 = 10 roots in the characteristic polynomial. Since the specification of the deterministic components is likely to influence any inference regarding possible I(2) components in the VAR model, we will first estimate the Danish nominal money model allowing for broken linear trends both in the data and in the cointegration relations, and then without allowing for such trends. The characteristic roots of model (14.1), allowing for broken linear trends in the data and in the cointegration relations, are reported below:

VAR(p): 0.98, 0.87, 0.74, 0.74, 0.58, 0.58, 0.45, 0.31, 0.19, 0.19
r = 3:  1.0, 1.0, 0.90, 0.67, 0.67, 0.65, 0.44, 0.30, 0.08, 0.08
r = 2:  1.0, 1.0, 1.0, 0.78, 0.61, 0.61, 0.45, 0.26, 0.22, 0.17

There appear to be two large roots in the unrestricted model, consistent with r = 3. When imposing this rank restriction, the model contains two unit roots plus an additional fairly large root (0.90). When imposing three unit roots a quite large root (0.78) remains in the model, though it is now considerably smaller compared to r = 3. Thus, if the rank is three, as argued in Chapter 8, the results might suggest a double unit root in the data. Nevertheless, the evidence of I(2) behavior is not very strong, a conclusion that will be confirmed by the graphs of β'x̃_t and β'R_{1t} given below. For the model version with no broken trends and Ds83_t restricted to the cointegration space, the moduli of the roots are reported below:

VAR(p): 0.99, 0.90, 0.74, 0.72, 0.72, 0.56, 0.51, 0.32, 0.24, 0.24
r = 3:  1.0, 1.0, 0.92, 0.79, 0.65, 0.65, 0.51, 0.34, 0.23, 0.23
r = 2:  1.0, 1.0, 1.0, 0.78, 0.78, 0.47, 0.46, 0.46, 0.33, 0.15

Compared to the model with broken linear trends the results are almost unchanged. Thus, the stochastic movements in the data do not seem to be particularly well approximated by the introduction of broken linear trends in the model.
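The roots reported in the tables above are computed as the eigenvalues of the companion matrix of the estimated VAR, as described in Chapter 3; the tables show their moduli. A sketch of the computation for a VAR(2) with p = 5, using illustrative coefficient matrices rather than the Danish estimates:

```python
import numpy as np

p = 5
rng = np.random.default_rng(1)
# Illustrative VAR(2) coefficients: x_t = A1 x_{t-1} + A2 x_{t-2} + e_t.
A1 = 0.6 * np.eye(p) + 0.05 * rng.standard_normal((p, p))
A2 = 0.3 * np.eye(p) + 0.05 * rng.standard_normal((p, p))

# Companion form: stack [x_t', x_{t-1}']' so the dynamics become first order.
companion = np.block([[A1, A2],
                      [np.eye(p), np.zeros((p, p))]])

# p * k = 10 roots; moduli close to one that remain after imposing a
# reasonable rank r signal I(2) behavior in at least some of the variables.
moduli = np.sort(np.abs(np.linalg.eigvals(companion)))[::-1]

assert moduli.shape == (2 * p,)
```

In practice the same calculation would be applied to the rank-restricted estimates for each candidate r, as in the tables above.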

14.3 An intuitive approach

One might ask whether it makes sense at all to estimate the I(1) model in this case. We will argue below that one can test a number of hypotheses based on the I(1) procedure, even if x_t is I(2), without losing much precision, but that the interpretation of the results has to be modified compared to the I(1) case. The intuition for this can be seen from the so-called R-model discussed in Section 7.1. We reproduce the basic definitions of the R-model:

\[
R_{0t} = \alpha\beta' R_{1t} + \varepsilon_t, \tag{14.2}
\]

where R_{0t} and R_{1t} are found by first concentrating out the lagged short-run effects, Δx_{t−1}, and the intervention effects, D_{p,t} and D_{s,t}, in model (14.1):

\[
\Delta x_t = \hat{B}_1 \Delta x_{t-1} + \hat{\Phi} D_{p,t} + \hat{\gamma}_{01} + \hat{\gamma}_{02}D_{s,t} + R_{0t} \tag{14.3}
\]
and
\[
\tilde{x}_{t-2} = \hat{B}_2 \Delta x_{t-1} + \hat{\Phi} D_{p,t} + \hat{\gamma}_{01} + \hat{\gamma}_{02}D_{s,t} + R_{1t}. \tag{14.4}
\]


When x_t ∈ I(2), both Δx_t and Δx_{t−1} contain a common I(1) trend, which is cancelled when regressing one on the other as is done in (14.3). Thus, R_{0t} ∈ I(0) even if Δx_t ∈ I(1). On the other hand, the I(2) trend in x̃_{t−2} cannot be cancelled by regression on the I(1) variable Δx_{t−1} as is done in (14.4). Thus, R_{1t} ∈ I(2). Because R_{0t} ∈ I(0) and ε_t ∈ I(0), equation (14.2) can only be true if β'R_{1t} ∈ I(0) or α = 0. Thus, the linear combination β'R_{1t} transforms the process from I(2) to I(0), and we say that the estimate of β is super-super consistent. The connection between β'x̃_{t−2} and β'R_{1t} can be seen by inserting (14.4) into (14.2):

\[
\begin{aligned}
R_{0t} &= \alpha\beta'(\tilde{x}_{t-2} - \hat{B}_2 \Delta x_{t-1}) + \varepsilon_t \\
&= \alpha(\beta'\tilde{x}_{t-2} - \beta'\hat{B}_2 \Delta x_{t-1}) + \varepsilon_t \\
&= \alpha(\beta'\tilde{x}_{t-2} - \omega' \Delta x_{t-1}) + \varepsilon_t,
\end{aligned} \tag{14.5}
\]

where ω' = β'B̂_2. Thus, the stationary relations β'R_{1t} consist of two components, β'x̃_{t−2} and ω'Δx_{t−1}. For the cointegration relations β_i'R_{1t}, i = 1, ..., r, to be stationary there are two possibilities: (i) either ω_i = 0 and β_i'x̃_{t−2} ∈ I(0), or (ii) ω_i'Δx_{t−1} ∈ I(1) and β_i'x̃_{t−2} ∈ I(1) cointegrate to produce the stationary relation β_i'R_{1t} ∈ I(0). In the first case we talk about directly stationary relations, in the second case about polynomially cointegrated relations. Here we consider β'x̃_t ∈ I(1) without separating between the two cases, albeit recognizing that some of the cointegration relations β'x̃_t may be stationary by themselves. The next chapter will discuss more formally how to distinguish between the two cases. To conclude: when x_t ∈ I(2) we have that β_i'x̃_t ∈ I(1) and β_i'R_{1t} ∈ I(0) for at least one i, i = 1, ..., r. It is a strong indication of double unit roots in the data when the graph of at least one of the cointegration relations, β_i'x̃_t, exhibits nonstationary behavior while β_i'R_{1t} looks stationary. This gives a powerful diagnostic for detecting I(2) behavior in the VAR model. The graphs of the cointegration relations based on model (14.1) with no broken trend in the cointegration relations are given in Figures 14.3-14.5. The upper panels show the relations, β_i'x̃_t, and the lower panels the cointegration relations corrected for short-run dynamics, β_i'R_{1t}.
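The concentration step (14.3)-(14.4) amounts to two auxiliary regressions on the lagged differences, after which the diagnostic compares β'x̃_{t−2} with β'R_{1t}. A minimal sketch on simulated data; the deterministic terms and dummies of (14.1) are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(2)
T, p = 200, 3
# Double-cumulated noise gives an I(2)-like vector process.
x = np.cumsum(np.cumsum(rng.standard_normal((T, p)), axis=0), axis=0)

dx = np.diff(x, axis=0)
y0 = dx[1:]     # dx_t
z = dx[:-1]     # dx_{t-1}, the regressor to concentrate out
y1 = x[:-2]     # x_{t-2}

# (14.3) and (14.4): keep the residuals from regressing on dx_{t-1}.
R0 = y0 - z @ np.linalg.lstsq(z, y0, rcond=None)[0]
R1 = y1 - z @ np.linalg.lstsq(z, y1, rcond=None)[0]

assert R0.shape == R1.shape == (T - 2, p)
assert np.allclose(z.T @ R0, 0.0, atol=1e-6)  # residuals orthogonal to dx_{t-1}
```

R0 here plays the role of R_{0t} and R1 that of R_{1t}; in the empirical analysis one would then plot β'x̃_{t−2} against β'R_{1t} as in Figures 14.3-14.5.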

Figure 14.3: The graphs of β_1'x̃_t (upper panel) and β_1'R_{1t} (lower panel).
Figure 14.4: The graphs of β_2'x̃_t (upper panel) and β_2'R_{1t} (lower panel).

Figure 14.5: The graphs of β_3'x̃_t (upper panel) and β_3'R_{1t} (lower panel).

As can be seen, the graphs of β'x̃_t and β'R_{1t} are quite different, supporting the interpretation above that there might be a double unit root in the data.

14.4 Transforming I(2) data to I(1)

Chapter 2 demonstrated that (m − p) ∈ I(1) when both nominal money and prices contain the same I(2) trend (with the same coefficients). All the previous empirical analyses using real money/the inflation rate were econometrically valid under the implicit assumption that the stochastic I(2) trend had been cancelled by the nominal-to-real transformation. We will now test the hypothesis that all cointegration relations satisfy long-run price homogeneity using the standard I(1) procedure. In the next chapter we will address the nominal-to-real transformation more formally and show that long-run price homogeneity of the cointegration relations is a necessary, but not sufficient, condition for this transformation. Chapter 2 discussed the case where the I(2) trends exclusively affected nominal money and prices. Under the assumption of long-run price homogeneity we demonstrated that a transformation of the two nominal variables,


m and p, to real money, m − p, and price inflation, Δp, removed the I(2) trend without loss of information, i.e.

\[
\begin{pmatrix} m_t \\ p_t \\ y^r_t \\ R_{m,t} \\ R_{b,t} \end{pmatrix} \in I(2)
\quad\longrightarrow\quad
\begin{pmatrix} (m-p)_t \\ \Delta p_t \\ y^r_t \\ R_{m,t} \\ R_{b,t} \end{pmatrix} \in I(1)
\]

if (m_t, p_t) ∈ I(2), (y^r_t, R_{m,t}, R_{b,t}) ∈ I(1), and m_t and p_t cointegrate (1, −1) from I(2) to I(1). In this case the VAR analysis based on the nominal or the real data gives essentially the same results, with the exception that a VAR(k) model based on the real vector will have one more lag of prices (p_{t−k−1}) compared to a VAR(k) in nominal variables. Thus, long-run price homogeneity is a very important property when analyzing models based on real transformations. For the Danish data the test of long-run price homogeneity in the nominal model with no (broken) linear trends in the data or in the cointegration relations was accepted based on a χ²(3) = 3.38 and a p-value of 0.34. However, when broken linear trends were allowed in the model, long-run price homogeneity was not accepted. To be added! We will now examine the results of the nominal analysis when β is unrestricted and when long-run price homogeneity has been imposed, and compare the results with the empirical results from the real money analysis of Chapter 9. Table 14.1 reports the estimates of the β relations and the corresponding α for the Danish money data in nominal values. The results of the nominal and the real analysis differ primarily with respect to inflation being absent from the cointegration space in the former case but present in the latter. As shown in Table 8.2, inflation was only present in one of the long-run relations of the preferred structure H4, (Δp − R_m) − 0.5(R_m − R_b) + 0.1y^r, i.e. in a relation describing a homogeneous relation between inflation and the two interest rates as a function of real aggregate income. The other two long-run relations described a relationship between (1) velocity and the interest rate spread and (2) the short- and the long-term interest rate. We have imposed three over-identifying restrictions (similar to the ones of HS4 in Table 10.2) on the β vectors of Table 14.1 to make the nominal results as comparable as possible with the results of the real analysis.
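The nominal-to-real transformation can be illustrated on simulated data: when m_t and p_t load on the same I(2) trend, the transformation to (m − p)_t and Δp_t removes the double unit root. A sketch using simulated series, not the Danish data:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
nominal_trend = np.cumsum(np.cumsum(rng.standard_normal(T)))  # common I(2) trend

# Money and prices share the same I(2) trend (long-run price homogeneity),
# each plus its own I(1) component.
m = nominal_trend + np.cumsum(rng.standard_normal(T))
p = nominal_trend + np.cumsum(rng.standard_normal(T))

real_money = m - p       # the I(2) trend cancels: at most I(1)
inflation = np.diff(p)   # differencing also removes one unit root

# The I(2) trend dominates the nominal levels, so their sample variance is
# far larger than that of the transformed, at most I(1), series.
assert np.var(real_money) < np.var(m)
assert np.var(inflation) < np.var(p)
```

If the two series loaded on the I(2) trend with different coefficients, m − p would remain I(2), which is why long-run price homogeneity is the key condition for the transformation.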
We note that β_2'x̃_t approximately reproduces the money demand



Table 14.1: Two just-identified long-run structures. The table reports, for the variables m, p, y^r, R_m, R_b and the shift dummy D83, the unrestricted and the generically identified β̂ vectors together with the corresponding α̂ coefficients (t-values in parentheses).

relation, and β_3'x̃_t the interest rate relation of H4. The homogeneous inflation-interest rate relation can be reproduced from the second equation in Table 14.1 as:

\[
\begin{aligned}
\Delta p_t &= 0.64(0.2y^r + 1.0R_m - 0.35R_b) + 1.25(R_m - 0.45R_b) \qquad &(14.6)\\
&= 0.12y^r + 1.9R_m - 0.8R_b, \qquad &(14.7)
\end{aligned}
\]

i.e. it corresponds approximately to the following cointegration relation: (Δp − R_m) − 0.8(R_m − R_b) + 0.1y^r. Except that the coefficient on the spread is slightly higher in (14.6), the nominal money analysis reproduces the results of the real money analysis remarkably well. The difference between the two versions of the model is basically whether inflation should be treated as stationary or nonstationary. In the real model, where inflation is included in the cointegration space, it should correspond to a unit vector if stationary. Since Table 9.3 indicated that this was not so, the econometric analysis will benefit from treating nominal prices as I(2).

14.5 Concluding remarks

We have argued here that, both econometrically and economically, there is potentially a lot to be gained from using the econometrically rich structure of the I(2) analysis. There are at least three straightforward ways of checking for the possibility of double unit roots in the data:

1. the graphs of the data in levels and differences;
2. the graphs of the cointegration relations β'x̃_t compared to β'R_{1t};
3. the characteristic roots of the model for reasonable choices of the cointegration rank.

When long-run price homogeneity is not present we may still be able to transform the data, but with some violation of the I(1) properties. However, the analysis based on the I(1) model will then imply some loss of information.

Contents

0.1 Preface  vii

1 Introduction  1
  1.1 A historical overview  3
  1.2 On the choice of economic models  3
  1.3 Theoretical, true and observable variables  5
  1.4 Testing a theory as opposed to a hypothesis  7
  1.5 Experimental design in macroeconomics  8
  1.6 On the choice of empirical example  10

2 Models and Relations  13
  2.1 Inflation and money growth  15
  2.2 The time dependence of macroeconomic data  19
  2.3 A stochastic formulation  22
  2.4 Scenario Analyses: treating prices as I(2)  28
  2.5 Scenario Analyses: treating prices as I(1)  34
  2.6 Concluding remarks  35

3 The Probability Approach  37
  3.1 A single time series process  37
  3.2 A vector process  40
  3.3 Sequential decomposition of the likelihood function  46
  3.4 Deriving the VAR  47
  3.5 Interpreting the VAR model  50
  3.6 The dynamic properties of the VAR process  53
    3.6.1 The roots of the characteristic function  53
    3.6.2 Calculating the roots using the companion matrix  55
    3.6.3 Illustration  57
  3.7 Concluding remarks  58

4 Estimation and Specification  59
  4.1 Likelihood based estimation in the unrestricted VAR  60
    4.1.1 The estimates of the unrestricted VAR(2) for the Danish data  63
  4.2 Three different ECM-representations  65
    4.2.1 The ECM formulation with m = 1  66
    4.2.2 The ECM formulation with m = 2  68
    4.2.3 ECM-representation in acceleration rates, changes and levels  70
    4.2.4 The relationship between the different VAR formulations  71
  4.3 Misspecification tests  72
    4.3.1 Specification checking  72
    4.3.2 Residual correlations and information criteria  76
    4.3.3 Tests of residual autocorrelation  79
    4.3.4 Tests of residual heteroscedasticity  80
    4.3.5 Normality tests  81
  4.4 Concluding remarks  83

5 The Cointegrated VAR Model  85
  5.1 Integration and cointegration  85
  5.2 An intuitive interpretation of Π = αβ′  86
  5.3 Common trends  91
  5.4 From AR to MA  93
  5.5 Pulling and pushing forces  96
  5.6 Concluding discussion  98

6 Deterministic Components  99
  6.1 A dynamic regression model  99
  6.2 A trend and a constant in the VAR  102
  6.3 Five cases  106
  6.4 The MA representation  107
  6.5 Dummy variables in a simple regression model  109
  6.6 Dummy variables and the VAR  110
  6.7 An illustrative example  115
  6.8 Conclusions  118

7 Estimation in the I(1) Model  119
  7.1 Concentrating the general VAR-model  119
  7.2 Derivation of the ML estimator  122
  7.3 Normalization  125
  7.4 The uniqueness of the unrestricted estimates  126
  7.5 An illustration  127

8 Cointegration Rank  139
  8.1 The trace test  139
  8.2 The asymptotic tables  143
  8.3 Choosing the rank  150
  8.4 An illustration  153
  8.5 Recursive tests of constancy  155
    8.5.1 Recursively calculated trace tests  156
    8.5.2 The recursively calculated log-likelihood  157
    8.5.3 Recursively calculated prediction tests  158

9 Testing restrictions  165
  9.1 Formulating hypotheses  166
  9.2 Same restriction  168
    9.2.1 Illustrations  171
  9.3 Some β assumed known  177
    9.3.1 Illustrations  179
  9.4 Some coefficients known  181
    9.4.1 Illustrations  184
  9.5 Long-run weak exogeneity  186
    9.5.1 Empirical illustration  189
  9.6 Revisiting the scenario analysis  192

10 Identification of the Long-Run Structure  195
  10.1 Identification  196
  10.2 Identifying restrictions  197
  10.3 Formulating identifying hypotheses  202
  10.4 Just-identifying restrictions  206
  10.5 Over-identifying restrictions  209
  10.6 Lack of identification  211
  10.7 Recursive tests of α and β  212
  10.8 Concluding discussions  216

11 Identification of the Short-Run Structure  219
  11.1 Formulating identifying restrictions  220
  11.2 Interpreting shocks  222
  11.3 Which economic questions?  223
  11.4 Reduced form restrictions  228
  11.5 The VAR in triangular form  231
  11.6 General restrictions  234
  11.7 A partial system  239
  11.8 Economic identification  241

12 Identification of Common trends  245
  12.1 The common trends representation  246
  12.2 Some special cases  247
  12.3 Illustrations  248
  12.4 Economic identification  254

13 Identification of a Structural VAR  259
  13.1 Transitory and permanent shocks  259

14 I(2) Symptoms  265
  14.1 Stochastic and deterministic components in nominal models  266
  14.2 Estimating an I(1) model with I(2) data  268
  14.3 An intuitive account  270
  14.4 Transforming I(2) data to I(1)  273
  14.5 Concluding remarks  276

15 The I(2) Model  277
  15.1 Introducing the I(2) model  277
  15.2 Defining the I(2) model  279
  15.3 Deterministic components in the I(2) model  281
  15.4 The two-step procedure  282
  15.5 The full information maximum likelihood procedure  287
  15.6 Price relations in the long run and the medium run  290
  15.7 The nominal to real transformation  292

16 On the Econometric Approach  295
  16.1 A concluding discussion  296
  16.2 General-to-Specific and Specific-to-General  299
