Giorgio Pauletto
November 1995 Department of Econometrics University of Geneva
Contents
1 Introduction

2 A Review of Solution Techniques
   2.1 LU Factorization
      2.1.1 Pivoting
      2.1.2 Computational Complexity
      2.1.3 Practical Implementation
   2.2 QR Factorization
      2.2.1 Computational Complexity
      2.2.2 Practical Implementation
   2.3 Direct Methods for Sparse Matrices
      2.3.1 Data Structures and Storage Schemes
      2.3.2 Fill-in in Sparse LU
      2.3.3 Computational Complexity
      2.3.4 Practical Implementation
   2.4 Stationary Iterative Methods
      2.4.1 Jacobi Method
      2.4.2 Gauss-Seidel Method
      2.4.3 Successive Overrelaxation Method
      2.4.4 Fast Gauss-Seidel Method
      2.4.5 Block Iterative Methods
      2.4.6 Convergence
      2.4.7 Computational Complexity
   2.5 Nonstationary Iterative Methods
      2.5.1 Conjugate Gradient
      2.5.2 Preconditioning
      2.5.3 Conjugate Gradient Normal Equations
      2.5.4 Generalized Minimal Residual
      2.5.5 BiConjugate Gradient Method
      2.5.6 BiConjugate Gradient Stabilized Method
      2.5.7 Implementation of Nonstationary Iterative Methods
   2.6 Newton Methods
      2.6.1 Computational Complexity
      2.6.2 Convergence
   2.7 Finite Difference Newton Method
      2.7.1 Convergence of the Finite Difference Newton Method
   2.8 Simplified Newton Method
      2.8.1 Convergence of the Simplified Newton Method
   2.9 Quasi-Newton Methods
   2.10 Nonlinear First-order Methods
      2.10.1 Convergence
   2.11 Solution by Minimization
   2.12 Globally Convergent Methods
      2.12.1 Linesearch
      2.12.2 Model-trust Region
   2.13 Stopping Criteria and Scaling

3 Solution of Large Macroeconometric Models
   3.1 Blocktriangular Decomposition of the Jacobian Matrix
   3.2 Orderings of the Jacobian Matrix
      3.2.1 The Logical Framework of the Algorithm
      3.2.2 Practical Considerations
   3.3 Point Methods versus Block Methods
      3.3.1 The Problem
      3.3.2 Discussion of the Block Method
      3.3.3 Ordering and Convergence for First-order Iterations
   3.4 Essential Feedback Vertex Sets and the Newton Method

4 Model Simulation on Parallel Computers
   4.1 Introduction to Parallel Computing
      4.1.1 A Taxonomy for Parallel Computers
      4.1.2 Communication Tasks
      4.1.3 Synchronization Issues
      4.1.4 Speedup and Efficiency of an Algorithm
   4.2 Model Simulation Experiences
      4.2.1 Econometric Models and Solution Algorithms
      4.2.2 Parallelization Potential for Solution Algorithms
      4.2.3 Practical Results

5 Rational Expectations Models
   5.1 Introduction
      5.1.1 Formulation of RE Models
      5.1.2 Uniqueness and Stability Issues
   5.2 The Model MULTIMOD
      5.2.1 Overview of the Model
      5.2.2 Equations of a Country Model
      5.2.3 Structure of the Complete Model
   5.3 Solution Techniques for RE Models
      5.3.1 Extended Path
      5.3.2 Stacked-time Approach
      5.3.3 Block Iterative Methods
      5.3.4 Newton Methods

A Appendix
   A.1 Finite Precision Arithmetic
   A.2 Condition of a Problem
   A.3 Complexity of Algorithms
List of Tables
4.1 Complexity of communication tasks on a linear array and a hypercube with p processors.
4.2 Execution times of Gauss-Seidel and Jacobi algorithms.
4.3 Execution time on CM2 and Sun ELC.
4.4 Execution time on Sun ELC and CM2 for the Newton-like algorithm.
5.1 Labels for the zones/countries considered in MULTIMOD.
5.2 Spectral radii for point and block Gauss-Seidel.
5.3 Operation count in Mflops for Newton combined with SGE and MATLAB's sparse solver, and Gauss-Seidel.
5.4 Average number of Mflops for BiCGSTAB.
5.5 Average number of Mflops for QMR.
5.6 Average number of Mflops for GMRES(m).
5.7 Average number of Mflops for MATLAB's sparse LU.
List of Figures
2.1 A one-dimensional function F(x) with a unique zero and its corresponding function f(x) with multiple local minima.
2.2 The quadratic model ĝ(ω) built to determine the minimum ω̂.
3.1 Block-recursive pattern of a Jacobian matrix.
3.2 Sparsity pattern of the reordered Jacobian matrix.
3.3 Situations considered for the transformations.
3.4 Tree T = (S, U).
3.5 Numerical example showing the structure is not sufficient.
4.1 Shared memory system.
4.2 Distributed memory system.
4.3 Ring.
4.4 Linear array.
4.5 Mesh.
4.6 Torus.
4.7 Hypercubes.
4.8 Complete graph.
4.9 Long communication delays between two processors.
4.10 Large differences in the workload of two processors.
4.11 Original and ordered Jacobian matrix and corresponding DAG.
4.12 Block-recursive pattern of the model's Jacobian matrix.
4.13 Matrix L for the Gauss-Seidel algorithm.
5.1 Linkages of the country models in the complete version of MULTIMOD.
5.2 Incidence matrix of D in MULTIMOD.
5.3 Incidence matrices E3 to E1, D and A1 to A5.
5.4 Alignment of data in memory.
5.5 Elapsed time for 4 processors and for a single processor.
5.6 Relation between r and κ2 in submodel for Japan for MULTIMOD.
5.7 Scheduling of operations for the solution of the linear system as computed on page 110.
5.8 Incidence matrix of the stacked system for T = 10.
Acknowledgements
This thesis is the result of my research at the Department of Econometrics of the University of Geneva, Switzerland.

First and foremost, I wish to express my deepest gratitude to Professor Manfred Gilli, my thesis supervisor, for his constant support and help. He has shown a great deal of patience, availability and humane qualities beyond his professional competence.

I would like to thank Professor Andrew Hughes Hallett for accepting to read and evaluate this work. His research also made me discover and take interest in the field of simulation of large macroeconometric models.

I am also grateful to Professor Fabrizio Carlevaro for accepting the presidency of the jury, and also reading my thesis. Moreover, I thank Professor Jean-Philippe Vial and Professor Gerhard Wanner for being part of the jury and evaluating my work.

I am happy to be able to show my gratitude to my colleagues and friends of the Department of Econometrics for creating a pleasant and enjoyable working environment. David Miceli provided constant help and kind understanding during all the stages of my research. I am grateful to Pascale Mignon for helping me proofread my text.

Finally, I wish to thank my parents for their kindness and encouragement, without which I could never have achieved my goals.

Geneva, November 1995.
Chapter 1
Introduction
The purpose of this book is to present the available methodologies for the solution of large-scale macroeconometric models. This work reviews classical solution methods and introduces more recent techniques, such as parallel computing and nonstationary iterative algorithms. The development of new and more efficient computational techniques has significantly influenced research and practice in macroeconometric modeling. Our aim here is to supply practitioners and researchers with both a general presentation of numerical solution methods and specific discussions about particular problems encountered in the field.

An econometric model is a simplified representation of actual economic phenomena. Real economic behavior is typically represented by a system of algebraic equations. The system involves endogenous variables, which are determined by the system itself, and exogenous variables, which influence but are not determined by the system. The model also contains parameters that we will assume are already estimated by an adequate econometric technique. We may express the econometric model in matrix form for a given period t as

F(y_t, z_t, β) = ε_t ,

where F is a vector of n functions f_i, y_t is a vector of n endogenous variables, z_t is a vector of m exogenous variables, β is a vector of k parameters and ε_t is a vector of n stochastic disturbances with zero mean. In this work, we will concentrate on the solution of the model with respect to the endogenous variables y_t. Hence, we will solve a system such as

F(y_t, z_t) = 0 .   (1.1)
Such a model will be solved period after period for some horizon, generally outside the sample range used for estimation. Therefore, we usually drop the index t. A particular class of models, those containing anticipated variables, is described in Chapter 5. In this case, the solution has to be computed simultaneously for all the periods considered.
Traditionally, in the practice of solving large macroeconometric models, two kinds of solution algorithms have been used. The most popular ones are probably first-order iterative techniques and related methods like Gauss-Seidel. One obvious reason for this is their ease of implementation. Another reason is that their computational complexity is in general quite low, mainly because Gauss-Seidel naturally exploits the sparse structure of the system of equations. The convergence of these methods depends on the particular quantification of the equations and their ordering. Convergence is not guaranteed and its speed is linear.

Newton-type methods constitute a second group of techniques commonly used to solve models. These methods use the information about the derivatives of the equations. The major advantages are then a quadratic convergence, the fact that the equations do not need to be normalized and that the ordering does not influence the convergence rate. The computational cost comprises the evaluation of the derivatives forming the Jacobian matrix and the solution of the linear system. If the linear system is solved using a classical direct method based on LU or QR decomposition, the complexity of the whole method is O(n³). This promises interesting savings in computations if the size n can be reduced. A common technique is then to apply the Newton method only to a subset of equations, for instance the equations formed by the spike variables. This leads to a block method, i.e. a first-order iterative method where only a subsystem of equations is solved with a Newton method. The first block constitutes a recursive system, and the second block (in general much smaller) is solved by a Newton method. However, such a method brings us back to the problem of convergence for the outer loop. Moreover, for macroeconometric models the block of spike variables is in most cases also recursive, which then results in carrying out unnecessary computations.
Thus, the block method tries to take advantage of both the sparse structure of the system under consideration and the desirable convergence properties of Newton-type algorithms. However, as explained above, this approach falls back into the convergence problem existing in the framework of a block method. This suggests that the sparsity should instead be exploited when solving the linear system within the Newton method, which can be achieved by using appropriate sparse techniques.

This work presents methods for the solution of large macroeconometric models. The classical approaches mentioned above are presented with a particular emphasis on the problem of the ordering of the equations. We then look into more recent developments in numerical techniques. The solution of a linear system is a basic task of most solution algorithms for systems of nonlinear equations. Therefore, we pay special attention to the solution of linear systems. A central characteristic of the linear systems arising in macroeconometric modeling is their sparsity. Hence, methods able to take advantage of a sparse structure are of crucial importance.

A more recent set of tools available for the solution of linear equations are nonstationary methods. We explore their performance for a particular class of
models in economics. The last decade has revealed that parallel computation is now practical and has a significant impact on how large-scale computation is performed. This technology is therefore available to solve large numerical problems in economics. A consequence of this trend is that the efficient use of parallel machines may require new algorithm development. We therefore address some practical aspects concerning parallel computation.

A particular class of macroeconometric models are models containing forward-looking variables. Such models naturally give rise to very large systems of equations, the solution of which requires heavy computations. Thus, such models constitute an interesting testing ground for the numerical methods addressed in this research.

This work is organized into five chapters. Chapter 2 reviews solution techniques for linear and nonlinear systems. First, we discuss direct methods with a particular stress on the sparse case. This is followed by the presentation of iterative methods for linear systems, covering both stationary and nonstationary techniques. For the nonlinear case, we concentrate on the Newton method and some of its principal variants. Then, we examine the nonlinear versions of first-order iterative techniques and quasi-Newton methods. The alternative approach of residual minimization and issues about global convergence are also analyzed.

The macroeconometric models we consider are large and sparse, and it is therefore relevant to analyze their logical structure. Chapter 3 introduces a graph-theoretical approach to perform this analysis. We first introduce the method to investigate recursive structures. Then, original techniques are developed to analyze interdependent structures, in particular an algorithm for computing minimal feedback sets. These techniques are used to seek a block decomposition of a model, and we conclude with a comparison of the computational complexity of point methods versus block methods.
Chapter 4 addresses the main issues concerning the type of computer and the solution techniques used in parallel computation. Practical aspects are also examined through the application of parallel techniques to the simulation of a medium-sized macroeconometric model.

In Chapter 5, we present the theoretical framework of rational expectations models. In the first part, we discuss issues concerning the existence and uniqueness of the solution. In the second part, we present a multi-region econometric model with forward-looking variables. Then, different solution techniques are tested on this model.
Chapter 2
A Review of Solution Techniques
This chapter reviews classic and well-established solution techniques for linear and nonlinear systems. First, we discuss direct and iterative methods for linear systems. Some of these methods are fundamental building blocks for many of the techniques for solving nonlinear systems presented later. The topic has been extensively studied and many methods have been analyzed in the scientific computing literature, see e.g. Golub and Van Loan [56], Gill et al. [47], Barrett et al. [8] and Hageman and Young [60]. Second, the nonlinear case is addressed, essentially by presenting methods based on Newton iterations.

Direct methods for solving linear systems of equations are displayed first. Section 2.1 presents the LU factorization, or Gaussian elimination technique, and Section 2.2 describes an orthogonalization decomposition leading to the QR factorization. The cases of dense and sparse systems are then addressed. Other direct methods also exist, such as the Singular Value Decomposition (SVD), which can be used to solve linear systems. Even though this can constitute an interesting and useful approach, we do not resort to it here.

Section 2.4 introduces stationary iterative methods, such as the Jacobi, Gauss-Seidel and SOR techniques, and their convergence characteristics. Nonstationary iterative methods, such as the conjugate gradient, generalized minimal residual and biconjugate gradient, constitute a class of more recently developed techniques and are the topic of Section 2.5.

Section 2.10 presents nonlinear first-order methods that are quite popular in macroeconometric modeling. The topic of Section 2.11 is an alternative approach to the solution of a system of nonlinear equations: the minimization of the residuals' norm. To overcome the nonconvergent behavior of the Newton method in some circumstances, two globally convergent modifications are introduced in Section 2.12.
Finally, we discuss stopping criteria and scaling.
2.1
LU Factorization
For a linear model, finding a vector of solutions amounts to solving for x a system written in matrix form

Ax = b ,   (2.1)

where A is an n × n real matrix and b an n × 1 real vector. System (2.1) can be solved by the Gaussian elimination method, a widely used algorithm, and here we present its application for a dense matrix A with no particular structure. The basic idea of Gaussian elimination is to transform the original system into an equivalent triangular system. Then, we can easily find the solution of such a system. The method is based on the fact that replacing an equation by a linear combination of the others leaves the solution unchanged. First, this idea is applied to get an upper triangular equivalent system. This stage is called the forward elimination of the system. Then, the solution is found by solving the equations in reverse order. This is the back substitution phase.

To describe the process with matrix algebra, we need to define a transformation that will take care of zeroing the elements below the diagonal in a column of matrix A. Let x ∈ Rⁿ be a column vector with x_k ≠ 0. We can define
τ^(k) = [0 ··· 0 τ^(k)_{k+1} ··· τ^(k)_n]′  with  τ^(k)_i = x_i / x_k  for i = k+1, …, n .

Then the matrix M_k = I − τ^(k) e_k′, with e_k being the kth standard basis vector of Rⁿ, represents a Gauss transformation. The vector τ^(k) is called a Gauss vector. By applying M_k to x, we check that we get

\[
M_k x =
\begin{bmatrix}
1      & \cdots & 0                 & 0      & \cdots & 0      \\
\vdots & \ddots & \vdots            & \vdots &        & \vdots \\
0      & \cdots & 1                 & 0      & \cdots & 0      \\
0      & \cdots & -\tau_{k+1}^{(k)} & 1      & \cdots & 0      \\
\vdots &        & \vdots            & \vdots & \ddots & \vdots \\
0      & \cdots & -\tau_{n}^{(k)}   & 0      & \cdots & 1
\end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ x_k \\ x_{k+1} \\ \vdots \\ x_n \end{bmatrix}
=
\begin{bmatrix} x_1 \\ \vdots \\ x_k \\ 0 \\ \vdots \\ 0 \end{bmatrix} .
\]
Practically, applying such a transformation is carried out without explicitly building M_k or resorting to matrix multiplications. For example, in order to multiply M_k by a matrix C of size n × r, we only need to perform an outer product and a matrix subtraction:

M_k C = (I − τ^(k) e_k′) C = C − τ^(k) (e_k′ C) .   (2.2)

The product e_k′ C selects the kth row of C, and the outer product τ^(k) (e_k′ C) is subtracted from C. However, only rows k+1 to n of C have to be updated, as the first k elements in τ^(k) are zeros. We denote by A^(k) the matrix M_k ··· M_1 A, i.e. the matrix A after the kth elimination step.
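The key point of (2.2), that M_k never needs to be formed explicitly, is easy to illustrate. The following sketch (in Python with NumPy, using 0-based indexing rather than the 1-based indexing of the text) applies a Gauss transformation through a single outer-product update; the function name is ours, not part of any library:

```python
import numpy as np

def apply_gauss_transform(C, tau, k):
    """Apply M_k = I - tau e_k' to C, i.e. compute C - tau (e_k' C).

    tau has zeros in its first k+1 entries, so only rows k+1..n-1 change.
    """
    C = C.copy()
    # e_k' C selects row k of C; the outer product touches only rows below k
    C[k + 1:, :] -= np.outer(tau[k + 1:], C[k, :])
    return C

# Build the Gauss vector for x and check that M_k x zeroes x below position k
x = np.array([3.0, 6.0, 9.0])
k = 0
tau = np.zeros(3)
tau[k + 1:] = x[k + 1:] / x[k]          # tau_i = x_i / x_k
y = apply_gauss_transform(x.reshape(-1, 1), tau, k)
# y is now [3, 0, 0]^T
```

The update costs O((n − k) r) operations instead of the O(n² r) of a full matrix product, which is precisely why the transformation is applied this way.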
To triangularize the system, we need to apply n − 1 Gauss transformations, provided that the Gauss vectors can be found. This is true if all the divisors a^(k)_{kk}, called pivots, used to build τ^(k) for k = 1, …, n − 1 are different from zero. If for a real n × n matrix A the process of zeroing the elements below the diagonal is successful, we have

M_{n−1} M_{n−2} ··· M_1 A = U ,

where U is an n × n upper triangular matrix. Using the Sherman-Morrison-Woodbury formula, we can easily find that if M_k = I − τ^(k) e_k′ then M_k^{−1} = I + τ^(k) e_k′, and so defining L = M_1^{−1} M_2^{−1} ··· M_{n−1}^{−1} we can write A = LU.
As each matrix M_k is unit lower triangular, each M_k^{−1} also has this property; therefore, L is unit lower triangular too. By developing the product defining L, we have

L = (I + τ^(1) e_1′)(I + τ^(2) e_2′) ··· (I + τ^(n−1) e_{n−1}′) = I + Σ_{k=1}^{n−1} τ^(k) e_k′ .

So L contains ones on the main diagonal and the vector τ^(k) in the kth column below the diagonal for k = 1, …, n − 1, and we have

\[
L = \begin{bmatrix}
1              &                &        &                    &   \\
\tau_2^{(1)}   & 1              &        &                    &   \\
\tau_3^{(1)}   & \tau_3^{(2)}   & 1      &                    &   \\
\vdots         & \vdots         & \ddots & \ddots             &   \\
\tau_n^{(1)}   & \tau_n^{(2)}   & \cdots & \tau_n^{(n-1)}     & 1
\end{bmatrix} .
\]
By applying Gaussian elimination to A, we have found a factorization of A into a unit lower triangular matrix L and an upper triangular matrix U. The existence and uniqueness conditions as well as the result are summarized in the following theorem.

Theorem 1 A ∈ R^{n×n} has an LU factorization if the first n − 1 leading principal minors are different from 0. If the LU factorization exists and A is nonsingular, then the LU factorization is unique and det(A) = u_{11} ··· u_{nn}.

The proof of this theorem can be found for instance in Golub and Van Loan [56, p. 96]. Once the factorization has been found, we obtain the solution for the system Ax = b by first solving Ly = b by forward substitution and then solving Ux = y by back substitution. Forward substitution for a unit lower triangular matrix is easy to perform. The first equation gives y_1 = b_1 because L contains ones on the diagonal. Substituting y_1 in the second equation gives y_2. Continuing thus, the triangular system Ly = b is solved by substituting all the known y_j to get the next one. Back substitution works similarly, but we start with x_n since U is upper triangular. Proceeding backwards, we get x_i by replacing all the known x_j (j > i) in the ith equation of Ux = y.
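The whole procedure, elimination followed by forward and back substitution, can be sketched in a few lines. This is a minimal illustration of the algorithm described above (NumPy, 0-based indexing, no pivoting, so it assumes all pivots are nonzero); it is not meant as production code:

```python
import numpy as np

def lu_no_pivot(A):
    """LU factorization without pivoting; assumes all pivots are nonzero."""
    A = A.astype(float).copy()
    n = A.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        tau = A[k + 1:, k] / A[k, k]          # Gauss vector entries
        L[k + 1:, k] = tau                    # stored below the diagonal of L
        A[k + 1:, k:] -= np.outer(tau, A[k, k:])
    return L, np.triu(A)

def solve_lu(L, U, b):
    n = len(b)
    y = np.zeros(n)
    for i in range(n):                        # forward substitution (L unit lower)
        y[i] = b[i] - L[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):            # back substitution
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[4.0, 3.0], [6.0, 3.0]])
b = np.array([10.0, 12.0])
L, U = lu_no_pivot(A)
x = solve_lu(L, U, b)                         # x solves A x = b
```

For this small system the result is x = (1, 2), and one can check that L U reproduces A.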
2.1.1
Pivoting
As described above, Gaussian elimination breaks down when a pivot is equal to zero. In such a situation, a simple exchange of the equations leading to a nonzero pivot may get us round the problem. However, the condition that all the pivots be different from zero does not suffice to ensure a numerically reliable result. At this stage, the Gaussian elimination method is still numerically unstable: because of cancellation errors, the process described can lead to catastrophic results. The problem lies in the size of the elements of the Gauss vector τ. If they are too large compared to the elements from which they are subtracted in Equation (2.2), rounding errors may be magnified, thus destroying the numerical accuracy of the computation. To overcome this difficulty, a good strategy is to exchange the rows of the matrix during the process of elimination to ensure that the elements of τ will always be smaller than or equal to one in magnitude. This is achieved by choosing the permutation of the rows so that

|a^(k)_{kk}| = max_{i ≥ k} |a^(k)_{ik}| .   (2.3)
Such an exchange strategy is called partial pivoting and can be formalized in matrix language as follows. Let P_i be a permutation matrix of order n, i.e. the identity matrix with its rows reordered. To ensure that no element in τ is larger than one in absolute value, we permute the rows of A before applying the Gauss transformation. This is done at each step of the Gaussian elimination process, which leads to the following theorem:

Theorem 2 If Gaussian elimination with partial pivoting is used to compute the upper triangularization M_{n−1} P_{n−1} ··· M_1 P_1 A = U, then P A = LU where P = P_{n−1} ··· P_1 and L is a unit lower triangular matrix with |ℓ_{ij}| ≤ 1.

Thus, when solving a linear system Ax = b, we first compute the vector y = M_{n−1} P_{n−1} ··· M_1 P_1 b and then solve Ux = y by back substitution. This method is much more stable and it is very unlikely to encounter catastrophic cancellation problems. The proof of Theorem 2 is given in Golub and Van Loan [56, p. 112]. Going one step further would imply permuting not only the rows but also the columns of A, so that in the kth step of the Gaussian elimination the largest element of the submatrix to be transformed is used as pivot. This strategy is called complete pivoting. However, applying complete pivoting is costly because one needs to search for the largest element in a matrix instead of a vector at each elimination step. In practice, this overhead does not justify the gain one may obtain in stability. Therefore, the algorithm of choice for solving Ax = b, when A has no particular structure, is Gaussian elimination with partial pivoting.
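A hedged sketch of how rule (2.3) is grafted onto the elimination loop follows (again NumPy and 0-based indexing; the function and its return convention are ours, with the permutation given as an index vector p such that A[p] = LU). The example matrix has a tiny leading pivot, exactly the situation where pivoting matters:

```python
import numpy as np

def lu_partial_pivot(A):
    """Return p, L, U with A[p] = L U; the pivot row maximizes |a_ik^(k)|."""
    A = A.astype(float).copy()
    n = A.shape[0]
    p = np.arange(n)
    L = np.zeros((n, n))
    for k in range(n - 1):
        m = k + np.argmax(np.abs(A[k:, k]))   # row with the largest candidate pivot
        A[[k, m]] = A[[m, k]]                 # exchange rows of the work matrix
        L[[k, m], :k] = L[[m, k], :k]         # and of the multipliers found so far
        p[[k, m]] = p[[m, k]]
        tau = A[k + 1:, k] / A[k, k]
        L[k + 1:, k] = tau                    # all |multipliers| <= 1 by construction
        A[k + 1:, k:] -= np.outer(tau, A[k, k:])
    np.fill_diagonal(L, 1.0)
    return p, L, np.triu(A)

A = np.array([[1e-20, 1.0], [1.0, 1.0]])
p, L, U = lu_partial_pivot(A)
# Without pivoting, the tiny pivot 1e-20 would produce a huge multiplier and
# destroy accuracy; with pivoting, no entry of L exceeds 1 in magnitude.
```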
2.1.2
Computational Complexity
The number of elementary arithmetic operations (flops) for the Gaussian elimination is (2/3)n³ − (1/2)n² − (1/6)n, and therefore this method is O(n³).
2.1.3
Practical Implementation
In the case where one is only interested in the solution vector, it is not necessary to build matrix L explicitly. It is possible to directly compute the vector y (the solution of Ly = b) while transforming matrix A into an upper triangular matrix U. Despite the fact that Gaussian elimination seems easy to code, it is certainly not advisable to write our own code. A judicious choice is to rely on carefully tested software, such as the routines in the LAPACK library. These routines are publicly available on NETLIB¹ and are also used by the software MATLAB², which is our main computing environment for the experiments we carried out.
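The same advice carries over to other environments; for instance, NumPy's solver is a thin wrapper over the LAPACK routines the text refers to (a small usage sketch):

```python
import numpy as np

# numpy.linalg.solve calls LAPACK's dgesv, i.e. LU factorization with
# partial pivoting followed by forward and back substitution.
A = np.array([[4.0, 3.0], [6.0, 3.0]])
b = np.array([10.0, 12.0])
x = np.linalg.solve(A, b)        # solves A x = b
```

The point is not the three lines themselves but that the factorization, pivoting and substitutions are delegated to code that has been tested for decades.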
2.2
QR Factorization
The QR factorization is an orthogonalization method that can be applied to square or rectangular matrices. It is usually a key algorithm for computing eigenvalues or least-squares solutions, and it is less often applied to find the solution of a square linear system. Nevertheless, there are at least three reasons (see Golub and Van Loan [56]) why orthogonalization methods, such as QR, might be considered:

• The orthogonal methods have guaranteed numerical stability, which is not the case for Gaussian elimination.
• In case of ill-conditioning, orthogonal methods give an added measure of reliability.
• The flop count tends to exaggerate the Gaussian elimination advantage.³ (Particularly for parallel computers, memory traffic and other overheads tend to reduce this advantage.)

Another advantage that might favor the QR factorization is the possibility of updating the factors Q and R corresponding to a rank one modification of matrix A in O(n²) operations. This is also possible for the LU factorization; however,
1 NETLIB can be accessed through the World Wide Web at http://www.netlib.org/ and collects mathematical software, articles and databases useful for the scientiﬁc community. In Europe the URL is http://www.netlib.no/netlib/master/readme.html or http://elib.zibberlin.de/netlib/master/readme.html . 2 MATLAB High Performance Numeric Computation and Visualization Software is a product and registered trademark of The MathWorks, Inc., Cochituate Place, 24 Prime Park Way, Natick MA 01760, USA. URL: http://www.mathworks.com/ . 3 In the application discussed in Section 4.2.2 we used the QR factorization available in the libraries of the CM2 parallel computer.
the implementation is much simpler with QR; see Gill et al. [47]. Updating techniques will prove particularly useful in the quasi-Newton algorithm presented in Section 2.9. These reasons suggest that QR is probably, especially on parallel devices, a possible alternative to LU for solving square systems.

The QR factorization can be applied to any rectangular matrix, but we will focus on the case of an n × n real matrix A. The goal is to apply to A successive orthogonal transformation matrices H_i, i = 1, 2, …, r, to get an upper triangular matrix R, i.e.

H_r ··· H_1 A = R .

The orthogonal transformations presented in the literature are usually based upon Givens rotations or Householder reflections. The latter choice leads to algorithms involving fewer arithmetic operations and is therefore presented in the following. A Householder transformation is a matrix of the form

H = I − 2ww′  with  w′w = 1 .

Such a matrix is symmetric, orthogonal and its determinant is −1. Geometrically, this matrix represents a reflection with respect to the hyperplane defined by {x | w′x = 0}. By properly choosing the reflection plane, it is possible to zero particular elements in a vector.

Let us partition our matrix A into n column vectors [a_1 ··· a_n]. We first look for a matrix H_1 such that all the elements of H_1 a_1 except the first one are zeros. We define

s_1 = −sign(a_{11}) ‖a_1‖_2
µ_1 = (2s_1² − 2a_{11}s_1)^{−1/2}
u_1 = [(a_{11} − s_1) a_{21} ··· a_{n1}]′
w_1 = µ_1 u_1 .

Actually the sign of s_1 is free, but it is chosen to avoid the catastrophic cancellation that may otherwise appear in computing µ_1. As w_1′w_1 = 1, we can let H_1 = I − 2w_1w_1′ and verify that H_1 a_1 = [s_1 0 ··· 0]′. Computationally, it is more efficient to calculate the product H_1 A in the following manner:

H_1 A = A − 2w_1(w_1′A) = A − 2w_1 [w_1′a_1  w_1′a_2  ···  w_1′a_n] ,

so the ith column of H_1 A is

a_i − 2(w_1′a_i)w_1 = a_i − c_1(u_1′a_i)u_1 ,  with  c_1 = 2µ_1² = (s_1² − s_1 a_{11})^{−1} .

We continue this process in a similar way on the matrix A from which the first row and column have been removed. The vectors w_2 and u_2 will now be of dimension (n − 1) × 1, but we can complete them with zeros to build

H_2 = I − 2 [0  w_2′]′ [0  w_2′] .
After n − 1 steps, we have Hn−1 · · · H2 H1 A = R. As all the matrices Hi are orthogonal, their product is orthogonal too and we get

A = QR ,   with Q = (Hn−1 · · · H1)' = H1 · · · Hn−1 .

In practice, one will neither form the vectors wi nor calculate the matrix Q, as all the information is contained in the vectors ui and the scalars si for i = 1, . . . , n. The possibility of choosing the sign of s1 such that there never is a subtraction in the computation of µ1 is the key to the good numerical behavior of the QR factorization. We notice that the computation of u1 also involves a subtraction. It is possible to permute the column with the largest sum of squares below row i − 1 into column i during the ith step in order to minimize the risk of digit cancellation. This then leads to a factorization

P A = QR ,

where P is a permutation matrix. Using this factorization of matrix A, it is easy to find a solution for the system Ax = b. We first compute y = Q'b and then solve Rx = y by back substitution.
2.2.1 Computational Complexity
The computational complexity of the QR algorithm for a square matrix of order n is (4/3)n³ + O(n²). Hence the method is of O(n³) complexity.
2.2.2 Practical Implementation
Again as for the LU decomposition, the explicit computation of matrix Q is not necessary as we may build vector y during the triangularization process. Only the back substitution phase is needed to get the solution of the linear system Ax = b. As has already been mentioned, the routines for computing a QR factorization (or solving a system via QR) are readily available in LAPACK and are implemented in MATLAB.
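The scheme described above can be sketched in a few lines of NumPy. This is an illustrative sketch only (the function name is an assumption of this example, and the explicit accumulation of Q is done here purely for demonstration; as the text notes, a practical code would store only the vectors ui and scalars si):

```python
import numpy as np

def householder_qr(A):
    """Sketch of QR by Householder reflections H = I - 2ww'.

    At step k, s is chosen with sign opposite to a_kk so that the
    computation of mu involves no cancellation (see text).
    """
    A = A.astype(float).copy()
    n = A.shape[0]
    Q = np.eye(n)
    for k in range(n - 1):
        a = A[k:, k]
        norm_a = np.linalg.norm(a)
        if norm_a == 0.0:
            continue                           # whole trailing column is zero
        s = -norm_a if a[0] >= 0 else norm_a   # s = -sign(a_kk) * ||a||
        u = a.copy()
        u[0] -= s                              # u = [(a_kk - s) a_{k+1,k} ... a_nk]'
        mu = 1.0 / np.sqrt(2 * s * s - 2 * a[0] * s)
        w = mu * u                             # now w'w = 1
        H = np.eye(n)
        H[k:, k:] -= 2.0 * np.outer(w, w)      # H_k = I - 2ww' on the trailing block
        A = H @ A
        Q = Q @ H                              # accumulate Q = H_1 ... H_{n-1} (demo only)
    return Q, A                                # A has been reduced to R
```

Solving Ax = b then amounts to computing y = Q'b followed by back substitution with R.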
2.3 Direct Methods for Sparse Matrices
In many cases, matrix A of the linear system contains numerous zero entries. This is particularly true for linear systems derived from large macroeconometric models. Such a situation may be exploited in order to organize the computations in a way that involves only the nonzero elements. These techniques are known as sparse direct methods (see e.g. Duff et al. [30]) and are crucial for the efficient solution of linear systems in a wide class of practical applications.
2.3.1 Data Structures and Storage Schemes
The interest of considering sparse structures is twofold: first, the information can be stored in a much more compact way; second, the computations may be performed avoiding redundant arithmetic operations involving zeros. These two aspects are somewhat conflicting, as a compact storage scheme may involve more time-consuming addressing operations for performing the computations. However, this conflict vanishes quickly when large problems are considered. In order to define our idea more clearly, let us define the density of a matrix as the ratio between its nonzero entries and its total number of entries. Generally, when the size of the system gets larger, the density of the corresponding matrix decreases. In other words, the larger the problem is, the sparser its structure becomes. Several storage structures exist for the same sparse matrix. There is no single best data structure, since the choice depends both on the data manipulations the computations imply and on the computer architecture and/or language in which these are implemented. The following three data structures are generally used:
• coordinate scheme,
• list of successors (collection of sparse vectors),
• linked list.
The following example best illustrates these storage schemes. We consider the 5 × 5 sparse matrix
A = [  0    −2    0     0    0.5
       0     5    0     7    0
       0     0    1.7   0    6
       3.1   0    0    −0.2  0
       0     0    1.2  −3    0  ] .
Coordinate Scheme
In this case, three arrays are used: two integer arrays for the row and column indices (respectively r and c) and a real array x containing the elements. For our example we have
r :  4    1    2   3    5    2   4     5    1    3
c :  1    2    2   3    3    4   4     4    5    5
x :  3.1  −2   5   1.7  1.2  7   −0.2  −3   0.5  6
Each entry of A is represented by a triplet and corresponds to a column in the table above. Such a storage scheme needs less memory than a full storage if the density of A is less than 1/3. The insertion and deletion of elements are easy to perform, whereas the direct access of elements is relatively complex. Many computations in linear algebra involve successive scans of the columns of a matrix, which is difficult to carry out using this representation.
List of successors (Collection of Sparse Vectors) With this storage scheme, the sparse matrix A is stored as the concatenation of the sparse vectors representing its columns. Each sparse vector consists of a real array containing the nonzero entries and an integer array of corresponding row indices. A second integer array gives the locations in the other arrays of the ﬁrst element in each column. For our matrix A, this representation is
index :  1   2   3   4   5   6
h     :  1   2   4   6   9   11

index :  1    2    3   4    5    6   7     8    9    10
ℓ     :  4    1    2   3    5    2   4     5    1    3
x     :  3.1  −2   5   1.7  1.2  7   −0.2  −3   0.5  6
The integer array h contains, for each column, the starting address of its list of entries in ℓ and x. For instance, the nonzero entries in column 4 of A are stored at positions h(4) = 6 to h(5) − 1 = 9 − 1 = 8 in x. Thus, the entries are x(6) = 7, x(7) = −0.2 and x(8) = −3. The row indices are given by the same locations in array ℓ, i.e. ℓ(6) = 2, ℓ(7) = 4 and ℓ(8) = 5. MATLAB mainly uses this data structure to store its sparse matrices, see Gilbert et al. [44]. The main advantage is that columns can be easily accessed, which is very important for numerical linear algebra algorithms. The disadvantage of such a representation is the difficulty of inserting new entries. This arises for instance when adding a row to another.

Linked List
The third alternative widely used for storing sparse matrices is the linked list. Its particularity is that we define a pointer (named head) to the first entry, and each entry is associated with a pointer to the next entry, or to the null pointer (named 0) for the last entry. If the matrix is stored by columns, we start a new linked list for each column, and therefore we have as many head pointers as there are columns. Each entry is composed of two pieces: the row index and the value of the entry itself. This is represented by the picture:
head 1 → [ 4 | 3.1 ] → 0
. . .
head 5 → [ 1 | 0.5 ] → [ 3 | 6 ] → 0
The structure can be implemented as before with arrays and we get
index :  1   2   3   4   5
head  :  4   5   9   1   7

index :  1   2     3    4    5    6   7    8   9    10
row   :  2   4     5    4    1    2   1    3   3    5
entry :  7   −0.2  −3   3.1  −2   5   0.5  6   1.7  1.2
link  :  2   3     0    0    6    0   8    0   10   0
For instance, to retrieve the elements of column 3, we first read head(3) = 9. Then row(9) = 3 gives the row index, the entry value is entry(9) = 1.7 and the pointer link(9) = 10 gives the address of the next entry. The values row(10) = 5, entry(10) = 1.2 and link(10) = 0 indicate that the element 1.2 is in row 5 and is the last entry of the column. The obvious advantage is the ease with which elements can be inserted and deleted: the pointers are simply updated to reflect the modification. This data structure is close to the list of successors representation, but does not require contiguous storage locations for the entries of the same column. In practice it is often necessary to switch from one representation to another. We can also note that the linked list and the list of successors can similarly be defined rowwise rather than columnwise.
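To make these schemes concrete, the following sketch (Python; the function name is illustrative) builds the list-of-successors representation of the example matrix A used throughout this section; the arrays correspond to h, ℓ and x above, up to the 0-based indexing of the language:

```python
import numpy as np

# the 5x5 example matrix of the text
A = np.array([
    [0.0, -2.0, 0.0,  0.0, 0.5],
    [0.0,  5.0, 0.0,  7.0, 0.0],
    [0.0,  0.0, 1.7,  0.0, 6.0],
    [3.1,  0.0, 0.0, -0.2, 0.0],
    [0.0,  0.0, 1.2, -3.0, 0.0],
])

def to_successor_list(A):
    """List of successors (collection of sparse column vectors).

    h[j] is the position of the first entry of column j in l and x;
    a final extra entry h[n] marks one past the last element.
    """
    h, l, x = [], [], []
    for j in range(A.shape[1]):
        h.append(len(x))
        for i in range(A.shape[0]):
            if A[i, j] != 0.0:
                l.append(i)          # row index of the entry
                x.append(A[i, j])    # value of the entry
    h.append(len(x))
    return h, l, x

h, l, x = to_successor_list(A)
col4_values = x[h[3]:h[4]]           # entries of column 4 of A
col4_rows = l[h[3]:h[4]]             # their row indices (0-based)
```

Shifting every index by one recovers exactly the 1-based tables given in the text (h = 1, 2, 4, 6, 9, 11).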
2.3.2 Fill-in in Sparse LU
Given a storage scheme, one could think of executing a Gaussian elimination as described in Section 2.1. However, by doing so we may discover that the sparsity of our initial matrix A is lost and we may obtain relatively dense matrices L and U. Indeed, depending on the choice of the pivots, the number of entries in L and U may vary. From Equation (2.2), we see that at step k of the Gaussian elimination algorithm, we subtract two matrices in order to zero the elements below the diagonal of the kth column. Depending on the Gauss vector τ(k), the matrix τ(k)ek'C may contain nonzero elements which do not exist in matrix C. This creation of new elements is called fill-in. A crucial problem is then to minimize the fill-in, as the number of operations is proportional to the density of the submatrix to be triangularized. Furthermore, a dense matrix U will result in an expensive back substitution phase. A minimum fill-in may however conflict with the pivoting strategy, i.e. the pivot chosen to minimize the fill-in may not correspond to the element with maximum magnitude among the elements below the kth diagonal as defined by Equation (2.3). A common trade-off to limit the loss of numerical stability of the sparse Gaussian elimination is to accept a pivot element satisfying the following threshold inequality

|akk(k)| ≥ u · max_{i>k} |aik(k)| ,

where the threshold parameter u belongs to (0, 1]. A choice for u suggested by Duff, Erisman and Reid [30] is u = 0.1. This parameter heavily influences the fill-in and hence the complexity of the method.
2.3.3 Computational Complexity
It is not easy to establish an exact operation count for the sparse LU. The count depends on the particular structure of matrix A and on the chosen pivoting strategy. For a good implementation, we may expect a complexity of O(c²n), where c is the average number of elements in a row and n is the order of matrix A.
2.3.4 Practical Implementation
A widely used code for the direct solution of sparse linear systems is the Harwell MA28 code available on NETLIB, see Duﬀ [29]. A new version called MA48 is presented in Duﬀ and Reid [31]. The software MATLAB has its own implementation using partial pivoting and minimumdegree ordering for the columns to reduce ﬁllin, see Gilbert et al. [44] and Gilbert and Peierls [45]. Other direct sparse solvers are also available through NETLIB (e.g. Y12MA, UMFPACK, SuperLU, SPARSE).
2.4 Stationary Iterative Methods
Iterative methods form an important class of solution techniques for solving large systems of equations. They can be an interesting alternative to direct methods because they take into account the sparsity of the system and are moreover easy to implement. Iterative methods may be divided into two classes: stationary and nonstationary. The former rely on information that remains invariant from one iteration to another, whereas the latter modify their search by using the results of previous iterations. In this section, we present stationary iterative methods such as the Jacobi, Gauss-Seidel and SOR techniques.
The solution x* of the system Ax = b can be approximated by replacing A by a simpler nonsingular matrix M and by rewriting the system as

M x = (M − A)x + b .

In order to solve this equivalent system, we may use the following recurrence formula from a chosen starting point x(0):

M x(k+1) = (M − A)x(k) + b ,   k = 0, 1, 2, . . . .   (2.4)

At each step k the system (2.4) has to be solved, but this task can be easy depending on the choice of M. The convergence of the iterates to the solution is not guaranteed. However, if the sequence of iterates {x(k)}k=0,1,2,... converges to a limit x(∞), then we have x(∞) = x*, since relation (2.4) becomes M x(∞) = (M − A)x(∞) + b, that is Ax(∞) = b.
The iterations should be carried out an inﬁnite number of times to reach the solution, but we usually obtain a good approximation of x∗ after a fairly small number of iterations. There is a tradeoﬀ between the ease in computing x(k+1) from (2.4) and the speed of convergence of the stationary iterative method. The simplest choice for M would be to take M = I and the fastest convergence would be obtained by setting M = A. Of course, the choices of M that are of interest to us lie between these two extreme cases. Let us split the original system matrix A into A=L+D+U , where D is the diagonal of matrix A and L and U are the strictly lower and upper triangular parts of A, deﬁned respectively by dii = aii for all i, lij = aij for i > j and uij = aij for i < j.
2.4.1 Jacobi Method
One of the simplest iterative procedures is the Jacobi method, which is found by setting M = D. If we assume that the diagonal elements of A are nonzero, then solving the system Dx(k+1) = c for x(k+1) is easy; otherwise, we need to permute the equations to find such a matrix D. We can note that when the model is normalized, we have D = I and the iterations are further simplified. The sequence of Jacobi iterates is defined in matrix form by

D x(k+1) = −(L + U)x(k) + b ,   k = 0, 1, 2, . . . ,

or by Algorithm 1.

Algorithm 1 Jacobi Method
Given a starting point x(0) ∈ Rⁿ
for k = 0, 1, 2, . . . until convergence
    for i = 1, . . . , n
        xi(k+1) = (bi − Σ_{j≠i} aij xj(k)) / aii
    end
end
In this method, all the entries of the vector x(k+1) are computed using only the entries of x(k) . Hence, two separate vectors must be stored to carry out the iterations.
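Algorithm 1 translates directly into code. The following NumPy sketch (names, tolerance and iteration limit are illustrative) keeps the two vectors just mentioned:

```python
import numpy as np

def jacobi(A, b, tol=1e-10, maxit=1000):
    """Jacobi iteration (Algorithm 1): x_i <- (b_i - sum_{j!=i} a_ij x_j)/a_ii.

    Every component of x^(k+1) is computed from x^(k) only, so the
    previous iterate must be kept in a separate vector.
    """
    d = np.diag(A)
    x = np.zeros(len(b))
    for k in range(maxit):
        x_new = (b - A @ x + d * x) / d   # adding d*x back removes the j = i term
        if np.max(np.abs(x_new - x)) < tol:
            return x_new, k + 1
        x = x_new
    return x, maxit
```

On a diagonally dominant test matrix the iterates converge to the solution of Ax = b.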
2.4.2 Gauss-Seidel Method
In the Gauss-Seidel method (GS), we use the most recently available information to update the iterates. In this case, the ith component of x(k+1) is computed using the (i − 1) first entries of x(k+1) that have already been obtained and the (n − i) remaining entries from x(k).
This process amounts to using M = L + D and leads to the formula

(L + D)x(k+1) = −U x(k) + b ,

or to the following algorithm:

Algorithm 2 Gauss-Seidel Method
Given a starting point x(0) ∈ Rⁿ
for k = 0, 1, 2, . . . until convergence
    for i = 1, . . . , n
        xi(k+1) = (bi − Σ_{j<i} aij xj(k+1) − Σ_{j>i} aij xj(k)) / aii
    end
end
The matrix formulation of the iterations is useful for theoretical purposes, but the actual computation will generally be implemented componentwise as in Algorithm 1 and Algorithm 2.
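Algorithm 2 can be sketched componentwise as follows (NumPy; names and stopping rule are illustrative):

```python
import numpy as np

def gauss_seidel(A, b, tol=1e-10, maxit=1000):
    """Gauss-Seidel iteration (Algorithm 2): each x_i is updated in place,
    so the newest components are used as soon as they are available."""
    n = len(b)
    x = np.zeros(n)
    for k in range(maxit):
        x_old = x.copy()
        for i in range(n):
            s_new = A[i, :i] @ x[:i]           # entries of x^(k+1) already computed
            s_old = A[i, i + 1:] @ x[i + 1:]   # remaining entries of x^(k)
            x[i] = (b[i] - s_new - s_old) / A[i, i]
        if np.max(np.abs(x - x_old)) < tol:
            return x, k + 1
    return x, maxit
```

Because a single vector is overwritten in place, no second copy of the iterate is needed inside the sweep, in contrast with the Jacobi method.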
2.4.3 Successive Overrelaxation Method
A third useful technique, called SOR for Successive Overrelaxation, is very closely related to the Gauss-Seidel method. The update is computed as an extrapolation of the Gauss-Seidel step as follows: let xGS(k+1) denote the (k + 1)st iterate of the GS method; the new iterates can then be written as in the next algorithm.

Algorithm 3 Successive Overrelaxation Method
Given a starting point x(0) ∈ Rⁿ
for k = 0, 1, 2, . . . until convergence
    Compute xGS(k+1) by Algorithm 2
    for i = 1, . . . , n
        xi(k+1) = xi(k) + ω (xGS,i(k+1) − xi(k))
    end
end

The scalar ω is called the relaxation parameter and its optimal value, in order to achieve the fastest convergence, depends on the characteristics of the problem in question. A necessary condition for the method to converge is that ω lies in the interval (0, 2). When ω < 1, the GS step is dampened and this is sometimes referred to as underrelaxation. In matrix form, the SOR iteration is defined by

(ωL + D)x(k+1) = ((1 − ω)D − ωU)x(k) + ωb ,   k = 0, 1, 2, . . . .   (2.5)

When ω is unity, the SOR method collapses to GS.
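The SOR sweep adds only the extrapolation step to the Gauss-Seidel update, as the following sketch shows (NumPy; the choice of ω in the usage below is purely illustrative):

```python
import numpy as np

def sor(A, b, omega, tol=1e-10, maxit=1000):
    """SOR iteration (Algorithm 3 / Equation (2.5)):
    x_i <- x_i + omega * (x_GS,i - x_i), with 0 < omega < 2."""
    n = len(b)
    x = np.zeros(n)
    for k in range(maxit):
        x_old = x.copy()
        for i in range(n):
            gs_i = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
            x[i] += omega * (gs_i - x[i])   # omega = 1 recovers Gauss-Seidel
        if np.max(np.abs(x - x_old)) < tol:
            return x, k + 1
    return x, maxit
```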
2.4.4 Fast Gauss-Seidel Method
The idea of extrapolating the step size to improve the speed of convergence can also be applied to SOR iterates and gives rise to the Fast Gauss-Seidel method (FGS), or Accelerated Overrelaxation method, see Hughes Hallett [68] and Hadjidimos [59]. Let us denote by xSOR(k+1) the (k + 1)st iterate obtained by Algorithm 3; then the FGS iterates are defined by Algorithm 4.

Algorithm 4 FGS Method
Given a starting point x(0) ∈ Rⁿ
for k = 0, 1, 2, . . . until convergence
    Compute xSOR(k+1) by Algorithm 3
    for i = 1, . . . , n
        xi(k+1) = xi(k) + γ (xSOR,i(k+1) − xi(k))
    end
end

This method may be seen as a second-order method, since it uses a SOR iterate as an intermediate step to compute its next guess, and the SOR step itself already uses the information of a GS step. It is easy to see that when γ = 1, we find the SOR method. Like ω in the SOR part, the choice of the value for γ is not straightforward. For some problems, the optimal choice of ω can be explicitly found (this is discussed in Hageman and Young [60]). However, it cannot be determined a priori for general matrices. There is no way of computing the optimal value for γ cheaply, and some authors (e.g. Hughes Hallett [69], Yeyios [103]) offered approximations of γ. However, numerical tests produced variable outcomes: sometimes the approximation gave good convergence rates, sometimes poor ones, see Hughes Hallett [69]. As for the ω parameter, the value of γ is usually chosen by experimentation on the characteristics of the system at hand.
2.4.5 Block Iterative Methods
Certain problems can naturally be decomposed into a set of subproblems with more or less tight linkages.4 In economic analysis, this is particularly true for multicountry macroeconometric models where the diﬀerent country models are linked together by a relatively small number of trade relations for example (see Faust and Tryon [35]). Another such situation is the case of disaggregated multisectorial models where the links between the sectors are relatively weak. In other problems where such a decomposition does not follow from the construction of the system, one may resort to a partition where the subsystems are easier to solve. A block iterative method is then a technique where one iterates over the subsystems. The technique to solve the subsystem is free and not relevant for the
4 The original problem is supposed to be indecomposable in the sense described in Section 3.1.
discussion. Let us suppose the matrix of our system is partitioned in the form
A = [ A11  A12  · · ·  A1N
      A21  A22  · · ·  A2N
       .    .           .
      AN1  AN2  · · ·  ANN ] ,

where the diagonal blocks Aii, i = 1, 2, . . . , N, are square. We define the block diagonal matrix D, the block lower triangular matrix L and the block upper triangular matrix U such that A = D + L + U:

D = [ A11   0   · · ·   0
       0   A22  · · ·   0
       .    .           .
       0    0   · · ·  ANN ] ,

L = [  0    0   · · ·  0
      A21   0   · · ·  0
       .    .          .
      AN1  AN2  · · ·  0 ] ,

U = [ 0  A12  · · ·  A1N
      0   0   · · ·  A2N
      .   .           .
      0   0   · · ·   0 ] .

If we write the problem Ay = b under the same partitioned form, we have

[ A11  · · ·  A1N ] [ y1 ]   [ b1 ]
[  .           .  ] [  . ] = [  . ]
[ AN1  · · ·  ANN ] [ yN ]   [ bN ] ,

or else

Σ_{j=1}^{N} Aij yj = bi ,   i = 1, 2, . . . , N .
Suppose the Aii, i = 1, 2, . . . , N, are nonsingular; then the following solution scheme may be applied:

Algorithm 5 Block Jacobi Method (BJ)
Given a starting point y(0) ∈ Rⁿ
for k = 0, 1, 2, . . . until convergence
    for i = 1, . . . , N
        Solve for yi(k+1):   Aii yi(k+1) = bi − Σ_{j≠i} Aij yj(k)
    end
end

As we only use the information of step k to compute yi(k+1), this scheme is called a block iterative Jacobi method (BJ).
We can certainly use the most recent available information on the y’s for updating y (k+1) and this leads to the block GaussSeidel method (BGS):
Algorithm 6 Block Gauss-Seidel Method (BGS)
Given a starting point y(0) ∈ Rⁿ
for k = 0, 1, 2, . . . until convergence
    for i = 1, . . . , N
        Solve for yi(k+1):   Aii yi(k+1) = bi − Σ_{j=1}^{i−1} Aij yj(k+1) − Σ_{j=i+1}^{N} Aij yj(k)
    end
end
Similarly to the presentation in Section 2.4.3, the SOR option can also be applied as follows:

Algorithm 7 Block Successive Overrelaxation Method (BSOR)
Given a starting point y(0) ∈ Rⁿ
for k = 0, 1, 2, . . . until convergence
    for i = 1, . . . , N
        Solve for yi(k+1):   Aii yi(k+1) = Aii yi(k) + ω (bi − Σ_{j=1}^{i−1} Aij yj(k+1) − Σ_{j=i+1}^{N} Aij yj(k) − Aii yi(k))
    end
end
We assume that the systems Aii yi = ci can be solved by either direct or iterative methods. The interest of such block methods is to oﬀer possibilities of splitting the problem in order to solve one piece at a time. This is useful when the size of the problem is such that it cannot entirely ﬁt in the memory of the computer. Parallel computing also allows taking advantage of a block Jacobi implementation, since diﬀerent processors can simultaneously take care of diﬀerent subproblems and thus speed up the solution process, see Faust and Tryon [35].
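As an illustration of Algorithm 5, the following NumPy sketch solves each diagonal block directly while the off-diagonal blocks use only the previous iterate; describing the partition as a list of (start, end) index pairs is an assumption of this example:

```python
import numpy as np

def block_jacobi(A, b, blocks, tol=1e-10, maxit=500):
    """Block Jacobi (Algorithm 5): solve
    A_ii y_i^(k+1) = b_i - sum_{j!=i} A_ij y_j^(k) for each block i."""
    y = np.zeros(len(b))
    for k in range(maxit):
        y_new = np.empty_like(y)
        for lo, hi in blocks:
            # b_i minus all off-diagonal block products with y^(k)
            rhs = b[lo:hi] - A[lo:hi, :] @ y + A[lo:hi, lo:hi] @ y[lo:hi]
            y_new[lo:hi] = np.linalg.solve(A[lo:hi, lo:hi], rhs)
        if np.max(np.abs(y_new - y)) < tol:
            return y_new, k + 1
        y = y_new
    return y, maxit
```

Since the blocks are independent within one sweep, they could be assigned to different processors, which is the parallel advantage mentioned above.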
2.4.6 Convergence
Let us now study the convergence of the stationary iterative techniques introduced in the last section. The error at iteration k is defined by e(k) = x(k) − x*. Subtracting Equation (2.4) evaluated at x* from the same equation evaluated at x(k), we get

M e(k) = (M − A)e(k−1) .

We can now relate e(k) to e(0) by writing

e(k) = Be(k−1) = B²e(k−2) = · · · = B^k e(0) ,

where B is the matrix M⁻¹(M − A). Clearly, the convergence of {x(k)}k=0,1,2,... to x* depends on the powers of matrix B: if lim_{k→∞} B^k = 0, then lim_{k→∞} x(k) = x*. It is not difficult to show that

lim_{k→∞} B^k = 0   ⇐⇒   |λi| < 1  ∀i ,

where the λi are the eigenvalues of B.
Indeed, if B = PJP⁻¹, where J is the Jordan canonical form of B, then B^k = PJ^kP⁻¹ and lim_{k→∞} B^k = 0 if and only if lim_{k→∞} J^k = 0. The matrix J is formed of Jordan blocks Ji, and the kth power (for k larger than the size n of the block) of such a block is the upper triangular matrix

(Ji)^k = [ λi^k   C(k,1)λi^(k−1)   C(k,2)λi^(k−2)   · · ·   C(k,n−1)λi^(k−n+1)
                  λi^k             C(k,1)λi^(k−1)   · · ·
                                   .                        C(k,2)λi^(k−2)
                                                            C(k,1)λi^(k−1)
                                                            λi^k             ] ,

where C(k,j) denotes the binomial coefficient; therefore the powers of J tend to zero if and only if |λi| < 1 for all i.
We can write the matrices governing the convergence for each stationary iterative method as follows:

BJ  = −D⁻¹(L + U)                   for Jacobi's method,
BGS = −(L + D)⁻¹U                   for Gauss-Seidel,
Bω  = (ωL + D)⁻¹((1 − ω)D − ωU)     for SOR.
Therefore, the speed of convergence of such methods depends on the spectral radius of B, denoted by ρ(B) = maxi |λi|, where λi stands for the ith eigenvalue of matrix B. The FGS method converges for some γ > 0 if the real part of the eigenvalues of the matrix Bω is less than unity. Given that the method converges, i.e. that ρ(B) < 1, the number of iterations is approximately

log ε / log ρ(B) ,

with a convergence criterion⁵ expressed as

max_i  |xi(k) − xi(k−1)| / |xi(k−1)|  <  ε .
Hence, to minimize the number of iterations, we seek a splitting of matrix A and parameters that yield a matrix B with the lowest possible spectral radius. Different row-column permutations of A influence ρ(B) when the GS, SOR and FGS methods are applied, whereas the Jacobi method is invariant to such permutations. These issues are discussed in more detail in Section 3.3. For matrices without special structure, these problems have no practical solution so far.
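For small dense systems, the matrices B can be formed explicitly to inspect ρ(B) before iterating. The sketch below does so in NumPy (illustrative only: for a large sparse system one would never form B explicitly):

```python
import numpy as np

def spectral_radii(A, omega=1.5):
    """Spectral radius of the iteration matrix B for the Jacobi,
    Gauss-Seidel and SOR splittings of A = L + D + U."""
    D = np.diag(np.diag(A))
    L = np.tril(A, -1)
    U = np.triu(A, 1)
    rho = lambda B: np.max(np.abs(np.linalg.eigvals(B)))
    B_J = -np.linalg.solve(D, L + U)                            # Jacobi
    B_GS = -np.linalg.solve(L + D, U)                           # Gauss-Seidel
    B_w = np.linalg.solve(omega * L + D, (1 - omega) * D - omega * U)  # SOR
    return rho(B_J), rho(B_GS), rho(B_w)
```

For the tridiagonal matrix with 4 on the diagonal and −1 next to it, ρ(BGS) = ρ(BJ)², a classical property of consistently ordered matrices, which roughly halves the iteration count predicted by log ε / log ρ(B).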
2.4.7 Computational Complexity
The number of elementary operations for an iteration of Jacobi or GaussSeidel is (2c + 1)n where c is the average number of elements in a row of A. For SOR, the count is (2c + 4)n and for FGS (2c + 7)n. Therefore, iterative methods become competitive with sparse direct methods if the number of iterations K needed to converge is of order c or less.
5 See Section 2.13 for a discussion of stopping criteria.
2.5 Nonstationary Iterative Methods
Nonstationary methods have been developed more recently. Unlike the stationary methods discussed in Section 2.4, they use information that changes from iteration to iteration. These methods are computationally attractive as the operations involved can easily be executed on sparse matrices and require little storage. They also generally show a better convergence speed than stationary iterative methods. Presentations of nonstationary iterative methods can be found for instance in Freund et al. [39], Barrett et al. [8], Axelsson [7] and Kelley [73]. We first have to present some algorithms that solve particular systems, such as symmetric positive definite ones, from which the nonstationary iterative methods for the general linear systems we are interested in were derived.
2.5.1 Conjugate Gradient
The first and perhaps best known of the nonstationary methods is the Conjugate Gradient (CG) method proposed by Hestenes and Stiefel [64]. This technique solves symmetric positive definite systems Ax = b by using only matrix-vector products, inner products and vector updates. The method may also be interpreted as arising from the minimization of the quadratic function

q(x) = (1/2) x'Ax − x'b ,

where A is the symmetric positive definite matrix and b the right-hand side of the system. As the first order conditions for the minimization of q(x) give the original system, the two approaches are equivalent. The idea of the CG method is to update the iterates x(i) in the direction p(i) and to compute the residuals r(i) = b − Ax(i) in such a way as to ensure the largest decrease in terms of the objective function q and, furthermore, that the direction vectors p(i) are A-orthogonal. The largest decrease in q at x(0) is obtained by choosing an update in the direction −∇q(x(0)) = b − Ax(0). We see that the direction of maximum decrease is the residual of x(0), defined by r(0) = b − Ax(0). We can look for the optimum step length in the direction r(0) by solving the line search problem

min_α q(x(0) + αr(0)) .
As the derivative with respect to α is

Dα q(x(0) + αr(0)) = x(0)'Ar(0) + αr(0)'Ar(0) − b'r(0)
                   = (x(0)'A − b')r(0) + αr(0)'Ar(0)
                   = −r(0)'r(0) + αr(0)'Ar(0) ,

the optimal step is

α0 = r(0)'r(0) / (r(0)'Ar(0)) .

The method described up to now is just a steepest descent algorithm with exact line search on q. To avoid the convergence problems which are likely to arise with this technique, it is further imposed that the update directions p(i) be A-orthogonal (or conjugate with respect to A), in other words that we have

p(i)'Ap(j) = 0 ,   i ≠ j .   (2.6)
It is therefore natural to choose a direction p(i) that is closest to r(i−1) and satisfies Equation (2.6). It is possible to show that explicit formulas for such a p(i) can be found, see e.g. Golub and Van Loan [56, pp. 520–523]. These solutions can be expressed in a computationally efficient way involving only one matrix-vector multiplication per iteration. The CG method can be formalized as follows:

Algorithm 8 Conjugate Gradient
Compute r(0) = b − Ax(0) for some initial guess x(0)
for i = 1, 2, . . . until convergence
    ρi−1 = r(i−1)'r(i−1)
    if i = 1 then
        p(1) = r(0)
    else
        βi−1 = ρi−1/ρi−2
        p(i) = r(i−1) + βi−1 p(i−1)
    end
    q(i) = Ap(i)
    αi = ρi−1/(p(i)'q(i))
    x(i) = x(i−1) + αi p(i)
    r(i) = r(i−1) − αi q(i)
end

In the conjugate gradient method, the ith iterate x(i) can be shown to be the vector minimizing (x(i) − x*)'A(x(i) − x*) among all x(i) in the affine subspace x(0) + span{r(0), Ar(0), . . . , A^(i−1)r(0)}. This subspace is called the Krylov subspace.

Convergence of the CG Method
In exact arithmetic, the CG method yields the solution in at most n iterations, see Luenberger [78, p. 248, Theorem 2]. In particular, we have the following relation for the error at the kth CG iteration:

‖x(k) − x*‖2 ≤ 2√κ ((√κ − 1)/(√κ + 1))^k ‖x(0) − x*‖2 ,

where κ = κ2(A), the condition number of A in the two norm. However, in finite precision and with a large κ, the method may fail to converge.
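Algorithm 8 carries over almost line by line to code. The following NumPy sketch (names, tolerance and iteration cap are illustrative) stops on the residual norm:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-12, maxit=None):
    """Conjugate Gradient (Algorithm 8) for symmetric positive definite A.
    Only matrix-vector products, inner products and vector updates are used."""
    n = len(b)
    x = np.zeros(n)
    r = b - A @ x
    p = np.zeros(n)
    rho_prev = 1.0
    maxit = maxit if maxit is not None else n
    for i in range(maxit):
        rho = r @ r                       # rho_{i-1} = r'r
        if np.sqrt(rho) < tol:
            break
        if i == 0:
            p = r.copy()
        else:
            p = r + (rho / rho_prev) * p  # beta_{i-1} = rho_{i-1}/rho_{i-2}
        q = A @ p
        alpha = rho / (p @ q)
        x = x + alpha * p
        r = r - alpha * q
        rho_prev = rho
    return x
```

In exact arithmetic at most n iterations are needed; in floating point the loop is stopped on ‖r‖ instead.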
2.5.2 Preconditioning
As explained above, the convergence speed of the CG method is linked to the condition number of the matrix A. To improve the convergence speed of CG-type methods, the matrix A is often preconditioned, that is transformed into Â = SAS', where S is a nonsingular matrix. The system solved is then Âx̂ = b̂, where x̂ = (S')⁻¹x and b̂ = Sb. The matrix S is chosen so that the condition number of matrix Â is smaller than the condition number of the original matrix A and, hence, the convergence is sped up.
To avoid the explicit computation of Â and the destruction of the sparsity pattern of A, the methods are usually formalized so as to use the original matrix A directly. We can build a preconditioner M = (S'S)⁻¹ and apply the preconditioning step by solving the system M r̃ = r. Since

κ2(SAS') = κ2(S'SA) = κ2(M⁻¹A) ,

we do not actually form M from S but rather directly choose a matrix M. The choice of M is constrained to being a symmetric positive definite matrix. The preconditioned version of the CG method is described in the following algorithm.
Algorithm 9 Preconditioned Conjugate Gradient
Compute r(0) = b − Ax(0) for some initial guess x(0)
for i = 1, 2, . . . until convergence
    Solve M r̃(i−1) = r(i−1)
    ρi−1 = r(i−1)'r̃(i−1)
    if i = 1 then
        p(1) = r̃(0)
    else
        βi−1 = ρi−1/ρi−2
        p(i) = r̃(i−1) + βi−1 p(i−1)
    end
    q(i) = Ap(i)
    αi = ρi−1/(p(i)'q(i))
    x(i) = x(i−1) + αi p(i)
    r(i) = r(i−1) − αi q(i)
end
As the preconditioning speeds up the convergence, the question of how to choose a good preconditioner naturally arises. There are two conflicting goals in the choice of M. First, M should reduce the condition number of the system solved as much as possible; to achieve this, we would like to choose an M as close to matrix A as possible. Second, since the system M r̃ = r has to be solved at each iteration of the algorithm, this system should be as easy as possible to solve. Clearly, the preconditioner will be chosen between the two extreme cases M = A and M = I. When M = I, we obtain the unpreconditioned version of the method, and when M = A, the complete system is solved in the preconditioning step. One possibility is to take M = diag(a11, . . . , ann). This is not useful if the system is normalized, as is sometimes the case for macroeconometric systems. Other preconditioning methods do not explicitly construct M. Some authors, for instance Dubois et al. [28] and Adams [1], suggest taking a given number of steps of an iterative method such as Jacobi. We can note that taking one step of Jacobi amounts to the diagonal scaling M = diag(a11, . . . , ann), as mentioned above.
Another common approach is to perform an incomplete LU factorization (ILU) of matrix A. This method is similar to the LU factorization except that it respects the pattern of nonzero elements of A in the lower triangular part of L and the upper triangular part of U. In other words, we apply the following algorithm:
Algorithm 10 Incomplete LU Factorization
Set L = In (the identity matrix of order n)
for k = 1, . . . , n
    for i = k + 1, . . . , n
        if aik = 0 then lik = 0          (respect the sparsity pattern of A)
        else
            lik = aik/akk
            for j = k + 1, . . . , n
                if aij ≠ 0 then          (respect the sparsity pattern of A)
                    aij = aij − lik akj  (Gaussian elimination)
                end
            end
        end
    end
end
Set U = upper triangular part of A
This factorization can be written as A = LU + R, where R contains the elements that would fill in L and U and is not actually computed. The approximate system LU r̃ = r is then solved using forward and backward substitution in the preconditioning step of the nonstationary method used. A more detailed analysis of preconditioning and other incomplete factorizations may be found in Axelsson [7].
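A dense sketch of Algorithm 10 follows (NumPy; taking the sparsity pattern from the original matrix A is an assumption of this sketch, consistent with the usual ILU(0) variant):

```python
import numpy as np

def ilu0(A):
    """Incomplete LU (Algorithm 10): Gaussian elimination restricted to the
    nonzero pattern of A, so no fill-in is ever created."""
    A = A.astype(float).copy()
    n = A.shape[0]
    pattern = A != 0.0            # sparsity pattern of the original matrix
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            if pattern[i, k]:
                L[i, k] = A[i, k] / A[k, k]
                for j in range(k + 1, n):
                    if pattern[i, j]:
                        A[i, j] -= L[i, k] * A[k, j]
    U = np.triu(A)
    return L, U
```

When A has no zero entries the pattern never suppresses an update, so the result coincides with the exact LU factorization; on a truly sparse matrix, A = LU + R with R carrying the discarded fill-in.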
2.5.3 Conjugate Gradient Normal Equations
In order to deal with nonsymmetric systems, it is necessary either to convert the original system into a symmetric positive definite equivalent one, or to generalize the CG method. The next sections discuss these possibilities. The first approach, and perhaps the easiest, is to transform Ax = b into a symmetric positive definite system by multiplying the original system by A'. As A is assumed to be nonsingular, A'A is symmetric positive definite and the CG algorithm can be applied to A'Ax = A'b. This method is known as the Conjugate Gradient Normal Equation (CGNE) method. A somewhat similar approach is to solve AA'y = b by the CG method and then to compute x = A'y. The difference between the two approaches is discussed in Golub and Ortega [55, pp. 397ff].
Besides the computation of the matrix-matrix and matrix-vector products, these methods have the disadvantage of increasing the condition number of the system solved, since κ2(A'A) = (κ2(A))². This in turn increases the number of iterations of the method, see Barrett et al. [8, p. 16] and Golub and Van Loan [56]. However, since the transformation and the coding are easy to implement, the method might be appealing in certain circumstances.
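The transformation is indeed easy to code, as the following CGNE sketch shows (NumPy; A is touched only through products with A and A', and names and tolerances are illustrative):

```python
import numpy as np

def cgne(A, b, tol=1e-12, maxit=100):
    """CG applied to the normal equations A'Ax = A'b (CGNE) for
    nonsymmetric A.  Note kappa2(A'A) = kappa2(A)^2, hence the
    potentially slower convergence."""
    n = A.shape[1]
    x = np.zeros(n)
    r = A.T @ (b - A @ x)       # residual of the normal equations
    p = r.copy()
    rho = r @ r
    for _ in range(maxit):
        if np.sqrt(rho) < tol:
            break
        q = A.T @ (A @ p)       # never form A'A explicitly
        alpha = rho / (p @ q)
        x = x + alpha * p
        r = r - alpha * q
        rho_new = r @ r
        p = r + (rho_new / rho) * p
        rho = rho_new
    return x
```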
2.5.4 Generalized Minimal Residual
Paige and Saunders [86] proposed a variant of the CG method that minimizes the residual r = b − Ax in the 2-norm. It only requires the system to be symmetric, not positive definite. It can also be extended to unsymmetric systems if some more information is kept from step to step. This method is called GMRES (Generalized Minimal Residual) and was introduced by Saad and Schultz [90].
The difficulty is not to lose the orthogonality property of the direction vectors p(i). To achieve this goal, all previously generated vectors have to be kept in order to build a set of orthogonal directions, using for instance a modified Gram-Schmidt orthogonalization process. However, this method requires the storage and computation of an increasing amount of information. Thus, in practice, the algorithm is very limited because of its prohibitive cost. To overcome these difficulties, the method may be restarted after a chosen number of iterations m: the accumulated information is erased and the current intermediate results are used as a new starting point. The choice of m is critically important for the restarted version of the method, usually referred to as GMRES(m). The pseudocode for this method is given hereafter.
Algorithm 11 Preconditioned GMRES(m)
Choose an initial guess x(0) and initialize the (m + 1) × m matrix H̄m to hij = 0
for k = 1, 2, . . . until convergence
  Solve for r(k−1):  M r(k−1) = b − A x(k−1)
  β = ‖r(k−1)‖₂ ;  v(1) = r(k−1)/β ;  q = m
  for j = 1, . . . , m
    Solve for w:  M w = A v(j)
    for i = 1, . . . , j       (orthonormal basis by modified Gram-Schmidt)
      hij = w′ v(i)
      w = w − hij v(i)
    end
    hj+1,j = ‖w‖₂
    if hj+1,j is sufficiently small then
      q = j
      exit from loop on j
    end
    v(j+1) = w / hj+1,j
  end
  Vm = [v(1) . . . v(q)]
  Use the method given below to compute ym = argmin_y ‖β e1 − H̄m y‖₂
  x(k) = x(k−1) + Vm ym       (update the approximate solution)
end

Apply Givens rotations to triangularize H̄m in order to solve the least-squares problem involving the upper Hessenberg matrix H̄m:
d = β e1       (e1 is [1 0 . . . 0]′)
for i = 1, . . . , q
  Compute the sine and cosine values of the rotation:
  if hii = 0 then
    c = 1 ; s = 0
  else if |hi+1,i| > |hii| then
    t = −hii/hi+1,i ; s = 1/√(1 + t²) ; c = s t
  else
    t = −hi+1,i/hii ; c = 1/√(1 + t²) ; s = c t
  end
  t = c di ; di+1 = −s di ; di = t
  hii = c hii − s hi+1,i ; hi+1,i = 0
  for j = i + 1, . . . , m    (apply rotation to zero the subdiagonal of H̄m)
    t1 = hij ; t2 = hi+1,j
    hij = c t1 − s t2 ; hi+1,j = s t1 + c t2
  end
end
Solve the triangular system H̄m ym = d by back substitution.
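A compact sketch of restarted GMRES(m) in the spirit of Algorithm 11 follows. It is hypothetical simplification: no preconditioner is used (M = I), and the small (m + 1) × m Hessenberg least-squares problem is solved with a dense least-squares routine rather than the Givens rotations of the algorithm, which gives the same step with simpler code.

```python
import numpy as np

def gmres_m(A, b, m=20, tol=1e-10, maxit=50):
    """Restarted GMRES(m) sketch without preconditioning (M = I).
    The Arnoldi basis is built by modified Gram-Schmidt; the small
    least-squares problem min ||beta*e1 - H y|| is solved by lstsq
    instead of Givens rotations."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(maxit):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta < tol:
            break
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        V[:, 0] = r / beta
        q = m
        for j in range(m):
            w = A @ V[:, j]
            for i in range(j + 1):        # modified Gram-Schmidt
                H[i, j] = w @ V[:, i]
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:       # lucky breakdown
                q = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(q + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:q + 1, :q], e1, rcond=None)
        x += V[:, :q] @ y                 # update approximate solution
    return x

# hypothetical small test system
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = gmres_m(A, b, m=3)
```

With m equal to the dimension of the system, the method reduces to full GMRES and converges in a single restart; smaller m trades storage against more restarts, as discussed above.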
Another issue with GMRES is the use of the modified Gram-Schmidt method, which is fast but not very reliable, see Golub and Van Loan [56, p. 219]. For ill-conditioned systems, a Householder orthogonalization process is certainly a better alternative, even if it increases the complexity of the algorithm.
Convergence of GMRES The convergence properties of GMRES(m) are given in the original paper introducing the method, see Saad and Schultz [90]. A necessary and sufficient condition for GMRES(m) to converge appears in the results of more recent research, see Strikwerda and Stodder [95]:

Theorem 3 A necessary and sufficient condition for GMRES(m) to converge is that the set of vectors Vm = {v | v′Aʲv = 0 for 1 ≤ j ≤ m} contains only the vector 0.

Specifically, it follows that for a symmetric or skew-symmetric matrix A, GMRES(2) converges. Another important result stated in [95] is that, if GMRES(m) converges, it does so with a geometric rate of convergence:

Theorem 4 If r(k) is the residual after k steps of GMRES(m), then

  ‖r(k)‖₂² ≤ (1 − ρm)ᵏ ‖r(0)‖₂²

where

  ρm = min_{‖v‖=1} [ Σ_{j=1}^{m} (v(1)′ A v(j))² ] / [ Σ_{j=1}^{m} ‖A v(j)‖₂² ]

and the vectors v(j) are the unit vectors generated by GMRES(m). Similar conditions and rate of convergence estimates are also given for the preconditioned version of GMRES(m).
2.5.5 Bi-Conjugate Gradient Method
The Bi-Conjugate Gradient method (BiCG) takes a different approach, based upon generating two mutually orthogonal sequences of residual vectors {r̃(i)} and {r(j)} and A-orthogonal sequences of direction vectors {p̃(i)} and {p(j)}. The interpretation in terms of the minimization of the residuals r(i) is lost. The updates for the residuals and for the direction vectors are similar to those of the CG method but are performed using not only A but also A′. The scalars αi and βi ensure the bi-orthogonality conditions r̃(i)′r(j) = p̃(i)′Ap(j) = 0 if i ≠ j. The algorithm for the Preconditioned Bi-Conjugate Gradient method is given hereafter.
Algorithm 12 Preconditioned Bi-Conjugate Gradient
Compute r(0) = b − A x(0) for some initial guess x(0)
Set r̃(0) = r(0)
for i = 1, 2, . . . until convergence
  Solve M z(i−1) = r(i−1)
  Solve M′ z̃(i−1) = r̃(i−1)
  ρi−1 = z(i−1)′ r̃(i−1)
  if ρi−1 = 0 then the method fails
  if i = 1 then
    p(i) = z(i−1) ;  p̃(i) = z̃(i−1)
  else
    βi−1 = ρi−1/ρi−2
    p(i) = z(i−1) + βi−1 p(i−1)
    p̃(i) = z̃(i−1) + βi−1 p̃(i−1)
  end
  q(i) = A p(i) ;  q̃(i) = A′ p̃(i)
  αi = ρi−1 / (p̃(i)′ q(i))
  x(i) = x(i−1) + αi p(i)
  r(i) = r(i−1) − αi q(i)
  r̃(i) = r̃(i−1) − αi q̃(i)
end
The disadvantages of the method are the potentially erratic behavior of the norm of the residuals r(i), and unstable behavior if ρi is very small, i.e. when the vectors r̃(i) and r(i) are nearly orthogonal. Another potential breakdown situation arises when p̃(i)′q(i) is zero or close to zero.

Convergence of BiCG The convergence of BiCG may be irregular, but when the norm of the residual is significantly reduced, the method is expected to be comparable to GMRES. The breakdown cases may be avoided by sophisticated strategies, see Barrett et al. [8] and references therein. Few other convergence results are known for this method.
2.5.6 Bi-Conjugate Gradient Stabilized Method
A version of the BiCG method that tries to smooth the convergence was introduced by van der Vorst [99]. This more sophisticated method is called the Bi-Conjugate Gradient Stabilized method (BiCGSTAB) and its algorithm is formalized in the following.
Algorithm 13 Bi-Conjugate Gradient Stabilized
Compute r(0) = b − A x(0) for some initial guess x(0)
Set r̃ = r(0)
for i = 1, 2, . . . until convergence
  ρi−1 = r̃′ r(i−1)
  if ρi−1 = 0 then the method fails
  if i = 1 then
    p(i) = r(i−1)
  else
    βi−1 = (ρi−1/ρi−2)(αi−1/ωi−1)
    p(i) = r(i−1) + βi−1 (p(i−1) − ωi−1 v(i−1))
  end
  Solve M p̂ = p(i)
  v(i) = A p̂
  αi = ρi−1 / (r̃′ v(i))
  s = r(i−1) − αi v(i)
  if ‖s‖ is small enough then
    x(i) = x(i−1) + αi p̂
    stop
  end
  Solve M ŝ = s
  t = A ŝ
  ωi = (t′ s)/(t′ t)
  x(i) = x(i−1) + αi p̂ + ωi ŝ
  r(i) = s − ωi t
  (for continuation it is necessary that ωi ≠ 0)
end
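The following is a minimal, hypothetical sketch of Algorithm 13 with no preconditioner (M = I). Variable names mirror the pseudocode; the test matrix is an invented small nonsymmetric system.

```python
import numpy as np

def bicgstab(A, b, tol=1e-10, maxit=200):
    """Unpreconditioned BiCGSTAB sketch (M = I in Algorithm 13)."""
    n = len(b)
    x = np.zeros(n)
    r = b - A @ x
    r_tld = r.copy()                    # shadow residual r~
    rho_old = alpha = omega = 1.0
    v = p = np.zeros(n)
    for i in range(maxit):
        rho = r_tld @ r
        if rho == 0.0:
            raise RuntimeError("method fails: rho = 0")
        if i == 0:
            p = r.copy()
        else:
            beta = (rho / rho_old) * (alpha / omega)
            p = r + beta * (p - omega * v)
        v = A @ p
        alpha = rho / (r_tld @ v)
        s = r - alpha * v
        if np.linalg.norm(s) < tol:     # early exit: half step suffices
            x += alpha * p
            break
        t = A @ s
        omega = (t @ s) / (t @ t)
        x += alpha * p + omega * s
        r = s - omega * t
        if np.linalg.norm(r) < tol or omega == 0.0:
            break
        rho_old = rho
    return x

# hypothetical nonsymmetric test system
A = np.array([[3.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 2.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])
x = bicgstab(A, b)
```

Note that, as the text points out, only products with A appear, never with A′.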
The method is more costly in terms of required operations than BiCG, but does not involve the transpose of the matrix A in the computations, which can sometimes be an advantage. The other main advantages of BiCGSTAB are that it avoids the irregular convergence pattern of BiCG and usually shows a better convergence speed.
2.5.7 Implementation of Nonstationary Iterative Methods
The codes for conjugate gradient type methods are easy to implement, but the interested user should first check the NETLIB repository. It contains the package SLAP 2.0, which solves large sparse linear systems by preconditioned iterative methods. We used the MATLAB programs distributed with the Templates book by Barrett et al. [8] as a basis and modified this code for our experiments.
2.6 Newton Methods
In this section and the following ones, we present classical methods for the solution of systems of nonlinear equations. The following notation will be used for nonlinear systems. Let F : Rn → Rn represent a multivariable function.
The solution of the nonlinear system then amounts to finding the vector x* ∈ Rn such that

  F(x*) = 0  ⟷  f1(x*) = 0 ,  f2(x*) = 0 ,  . . . ,  fn(x*) = 0 .  (2.7)
We assume F to be continuously differentiable in an open convex set U ⊂ Rn. In the next section, we discuss the classical Newton method, recalling the main results about convergence. We then turn to modifications of the classical method where the exact Jacobian matrix is replaced by some approximation.

The classical Newton method proceeds by iteratively approximating x* by a sequence {x(k)}k=0,1,2,.... Given a point x(k) ∈ Rn and an evaluation of F(x(k)) and of the Jacobian matrix DF(x(k)), we can construct a better approximation of x*, called x(k+1). We may approximate F(x) in the neighborhood of x(k) by an affine function and get

  F(x) ≈ F(x(k)) + DF(x(k))(x − x(k)) .  (2.8)

We can solve this local model to obtain the value of x that satisfies F(x(k)) + DF(x(k))(x − x(k)) = 0, i.e. the point

  x = x(k) − (DF(x(k)))⁻¹ F(x(k)) .  (2.9)

The value for x computed in (2.9) is again used to define a local model such as (2.8). This leads to the iterates

  x(k+1) = x(k) − (DF(x(k)))⁻¹ F(x(k)) .  (2.10)
The algorithm that implements these iterations is called the classical Newton method and is formalized as follows: Algorithm 14 Classical Newton Method
Given F : Rn → Rn continuously differentiable and a starting point x(0) ∈ Rn
for k = 0, 1, 2, . . . until convergence
  Compute DF(x(k))
  Check that DF(x(k)) is sufficiently well conditioned
  Solve for s(k):  DF(x(k)) s(k) = −F(x(k))
  x(k+1) = x(k) + s(k)
end
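Algorithm 14 can be sketched in a few lines; the conditioning check is omitted here for brevity, and the two-equation test system and its root (1, 1) are invented for illustration.

```python
import numpy as np

def newton(F, DF, x0, tol=1e-12, maxit=50):
    """Classical Newton method (Algorithm 14): at each step solve
    DF(x) s = -F(x) and update x <- x + s."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        s = np.linalg.solve(DF(x), -Fx)   # Newton step
        x = x + s
    return x

# hypothetical 2-equation system with root (1, 1)
F  = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])
DF = lambda x: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])
x_star = newton(F, DF, [2.0, 0.5])
```

Starting close enough to the root, the iteration exhibits the quadratic convergence discussed below.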
Geometrically, x(k+1) may be interpreted as the intersection of the n hyperplanes tangent to the functions f1, f2, . . . , fn at x(k) with the hyperplane z = 0. This local solution of the linearized problem is then used as a new starting point for the next guess. The main advantage of the Newton method is its quadratic convergence behavior, provided appropriate conditions stated later are satisfied. However, this technique
may not converge to a solution for starting points outside some neighborhood of the solution. The classical Newton algorithm is also computationally intensive since it requires at every iteration the evaluation of the Jacobian matrix DF (x(k) ) and the solution of a linear system.
2.6.1 Computational Complexity
The computationally expensive steps in the Newton algorithm are the evaluation of the Jacobian matrix and the solution of the linear system. Hence, for a dense Jacobian matrix, the complexity of the latter task determines an order of O(n³) arithmetic operations. If the system of equations is sparse, as is the case in large macroeconometric models, we obtain an O(c²n) complexity (see Section 2.3) for the linear system. An analytical evaluation of the Jacobian matrix will automatically exploit the sparsity of the problem. Particular attention must be paid in the case of a numerical evaluation of DF, as this could introduce an O(n²) operation count. An iterative technique may also be used to approximate the solution of the arising linear system. Such techniques save computational effort, but the number of iterations needed to satisfy a given convergence criterion is not known in advance. Possibly, the number of iterations of the iterative technique may be fixed beforehand.
2.6.2 Convergence
To discuss the convergence of the Newton method, we need the following definition and theorem.

Definition 1 A function G : Rn → Rn×m is said to be Lipschitz continuous on an open set U ⊂ Rn if for all x, y ∈ U there exists a constant γ such that ‖G(y) − G(x)‖ₐ ≤ γ ‖y − x‖_b, where ‖·‖ₐ is a norm on Rn×m and ‖·‖_b a norm on Rn.

The value of the constant γ depends on the norms chosen and on the scale of DF.

Theorem 5 Let F : Rn → Rm be continuously differentiable in the open convex set U ⊂ Rn, x ∈ U, and let DF be Lipschitz continuous at x in the neighborhood U. Then, for any x + p ∈ U,

  ‖F(x + p) − F(x) − DF(x)p‖ ≤ (γ/2) ‖p‖² ,

where γ is the Lipschitz constant.

This theorem gives a bound on how close the affine model F(x) + DF(x)p is to F(x + p). The bound contains the Lipschitz constant γ, which measures the degree of nonlinearity of F. A proof of Theorem 5 can be found in Dennis and Schnabel [26, p. 75] for instance.
The conditions for the convergence of the classical Newton method are then stated in the following theorem.

Theorem 6 If F is continuously differentiable in an open convex set U ⊂ Rn containing x* with F(x*) = 0, DF is Lipschitz continuous in a neighborhood of x* and DF(x*) is nonsingular with ‖DF(x*)⁻¹‖ ≤ β for some β > 0, then the iterates of the classical Newton method satisfy

  ‖x(k+1) − x*‖ ≤ βγ ‖x(k) − x*‖² ,  k = 0, 1, 2, . . . ,

for a starting guess x(0) in a neighborhood of x*.

Two remarks are triggered by this theorem. First, the method converges fast, as the error of step k + 1 is guaranteed to be less than some proportion of the square of the error of step k, provided all the assumptions are satisfied. For this reason, the method is said to be quadratically convergent. We refer to Dennis and Schnabel [26, p. 90] for the proof of the theorem. The original works and further references are cited in Ortega and Rheinboldt [85, p. 316]. The constant βγ gives a bound for the relative nonlinearity of F and is a scale-free measure, since β is an upper bound for the norm of (DF(x*))⁻¹. Theorem 6 therefore tells us that the smaller this measure of relative nonlinearity, the faster the Newton method converges.

The second remark concerns the conditions needed to obtain quadratic convergence. Even if the Lipschitz continuity of DF is verified, the choice of a starting point x(0) lying in a convergent neighborhood of the solution x* may a priori be a difficult problem. For macroeconometric models the starting values can naturally be chosen as the last period's solution, which in many cases is a point not too far from the current solution. Macroeconometric models do not generally show a high level of nonlinearity and, therefore, the Newton method is generally suitable to solve them.
2.7 Finite Difference Newton Method
An alternative to an analytical Jacobian matrix is to replace the exact derivatives by finite difference approximations. Even though nowadays software for symbolic derivation is readily available, there are situations where one might prefer to, or have to, resort to an approach which is easy to implement and only requires function evaluations. A circumstance where finite differences are certainly attractive occurs when the Newton algorithm is implemented on a SIMD computer. Such an example is discussed in Section 4.2.1. We may approximate the partial derivatives in DF(x) by the forward difference formula

  (DF(x))·j ≈ (F(x + hj ej) − F(x)) / hj = J·j ,  j = 1, . . . , n .  (2.11)
The discretization error introduced by this approximation satisfies the bound

  ‖J·j − (DF(x))·j‖₂ ≤ (γ/2) max_j |hj| ,

where F is a function satisfying Theorem 5. This suggests taking hj as small as possible to minimize the discretization error. A central difference approximation for DF(x) can also be used,

  J̄·j = (F(x + hj ej) − F(x − hj ej)) / (2hj) .  (2.12)

The bound of the discretization error is then lowered to (γ/6) max_j hj², at the cost of twice as many function evaluations.

Finally, the choice of hj also has to be discussed in the framework of the numerical accuracy one can obtain on a digital computer. Approximation theory suggests taking hj as small as possible to reduce the discretization error in the approximation of DF. However, since the numerator of (2.11) evaluates function values that are close to each other, a cancellation error might occur, so that the elements Jij may have very few or even no significant digits. According to the theory, e.g. Dennis and Schnabel [26, p. 97], one may choose hj so that F(x + hj ej) differs from F(x) in at least the leftmost half of its significant digits. Assuming that the relative error in computing F(x) is u, defined as in Section A.1, then we would like to have

  |fi(x + hj ej) − fi(x)| / |fi(x)| ≤ √u   ∀ i, j .

The best guess is then hj = √u xj, in order to cope with the different sizes of the elements of x, the discretization error and the cancellation error. In the case of central difference approximations, the choice for hj is modified to hj = u^(2/3) xj. The finite difference Newton algorithm can then be expressed as follows.

Algorithm 15 Finite Difference Newton Method
Given F : Rn → Rn continuously differentiable and a starting point x(0) ∈ Rn
for k = 0, 1, 2, . . . until convergence
  Evaluate J(k) according to (2.11) or (2.12)
  Solve J(k) s(k) = −F(x(k))
  x(k+1) = x(k) + s(k)
end
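The forward difference formula (2.11) with the step choice hj = √u xj can be sketched as follows; the guard used when xj = 0 and the small test function are illustrative assumptions, not part of the text.

```python
import numpy as np

def fd_jacobian(F, x):
    """Forward-difference Jacobian approximation (2.11) with the
    step choice h_j = sqrt(u) * x_j (u = unit roundoff), guarded
    so that h_j is nonzero when x_j = 0."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    u = np.finfo(float).eps
    Fx = F(x)
    J = np.zeros((len(Fx), n))
    for j in range(n):
        h = np.sqrt(u) * (x[j] if x[j] != 0.0 else 1.0)
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (F(x + e) - Fx) / h     # column j of the Jacobian
    return J

# hypothetical test function with known analytic Jacobian
F = lambda x: np.array([x[0]**2 + x[1], np.sin(x[0]) * x[1]])
x = np.array([1.0, 2.0])
J = fd_jacobian(F, x)
J_exact = np.array([[2.0, 1.0], [2.0*np.cos(1.0), np.sin(1.0)]])
```

Each column costs one extra evaluation of F, which is the O(n²) operation count mentioned in Section 2.6.1 when F itself costs O(n).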
2.7.1 Convergence of the Finite Difference Newton Method
When replacing the analytically evaluated Jacobian matrix by a ﬁnite diﬀerence approximation, it can be shown that the convergence of the Newton iterative process remains quadratic if the ﬁnite diﬀerence step size is chosen to satisfy conditions speciﬁed in the following. (Proofs can be found, for instance, in Dennis and Schnabel [26, p. 95] or Ortega and Rheinboldt [85, p. 360].)
If the finite difference step size h(k) is invariant with respect to the iterations k, then the discretized Newton method shows only a linear rate of convergence. (We drop the subscript j for convenience.) If a decreasing sequence h(k) is imposed, i.e. lim_{k→∞} h(k) = 0, the method achieves a superlinear rate of convergence. Furthermore, if one of the following conditions is verified,

  there exist constants c1 and k1 such that |h(k)| ≤ c1 ‖x(k) − x*‖  ∀ k ≥ k1 ,
  there exist constants c2 and k2 such that |h(k)| ≤ c2 ‖F(x(k))‖  ∀ k ≥ k2 ,  (2.13)

then the convergence is quadratic, as in the classical Newton method. The limit condition on the sequence h(k) may be interpreted as an improvement in the accuracy of the approximations of DF as we approach x*. Conditions (2.13) ensure a tight approximation of DF and therefore lead to the quadratic convergence of the method. In practice, however, none of the conditions (2.13) can be tested, as neither x* nor c1 and c2 are known. They are nevertheless important from a theoretical point of view, since they show that for good enough approximations of the Jacobian matrix, the finite difference Newton method behaves as well as the classical Newton method.
2.8 Simplified Newton Method
To avoid the repeated evaluation of the Jacobian matrix DF(x(k)) at each step k, one may reuse the first evaluation DF(x(0)) for all subsequent steps k = 1, 2, . . . . This method is called the simplified Newton method; it is attractive when the level of nonlinearity of F is not too high, since then the Jacobian matrix does not vary too much. Another advantage of this simplification is that the matrix of the linear system solved at each step stays the same, only the right-hand side changing, which leads to significant savings in the computational work. As discussed before, the computationally expensive steps in the Newton method are the evaluation of the Jacobian matrix and the solution of the corresponding linear system. If a direct method is applied in the simplified method, these two steps are carried out only once and, for subsequent iterations, only the forward and back substitution phases are needed.

In the one dimensional case, this technique corresponds to a parallel-chord method. The first chord is taken to be the tangent at the point (x(0), F(x(0))) and for the next iterations this chord is simply shifted in a parallel way. To improve the convergence of this method, DF may occasionally be reevaluated by choosing an integer increasing function p(k) with values in the interval [0, k] and solving the linear system

  DF(x(p(k))) s(k) = −F(x(k)) .

In the extreme case where p(k) = 0 ∀k we have the simplified Newton method and, at the other end, when p(k) = k ∀k we have the classical Newton
2.9 QuasiNewton Methods
35
method. The choice of the function p(k), i.e. the reevaluation scheme, has to be determined experimentally.

Algorithm 16 Simplified Newton Method
Given F : Rn → Rn continuously differentiable and a starting point x(0) ∈ Rn
for k = 0, 1, 2, . . . until convergence
  Compute DF(x(p(k))) if needed
  Solve for s(k):  DF(x(p(k))) s(k) = −F(x(k))
  x(k+1) = x(k) + s(k)
end
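A minimal sketch of the p(k) = 0 case follows. For clarity the linear system is re-solved at each step; in an actual implementation one would, as the text explains, factorize J0 = LU once and reuse only the forward and back substitution phases. The test system and starting point are hypothetical.

```python
import numpy as np

def simplified_newton(F, DF, x0, tol=1e-10, maxit=200):
    """Simplified Newton sketch: the Jacobian is evaluated once at
    x(0) and reused for every step (p(k) = 0)."""
    x = np.asarray(x0, dtype=float)
    J0 = DF(x)                          # evaluated only once
    for _ in range(maxit):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        x = x + np.linalg.solve(J0, -Fx)
    return x

# hypothetical mildly nonlinear system with root (1, 1)
F  = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])
DF = lambda x: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])
x_star = simplified_newton(F, DF, [1.3, 0.8])
```

Convergence is only linear, but each iteration is far cheaper than a full Newton step.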
2.8.1 Convergence of the Simplified Newton Method
This kind of simplification leads to a degradation of the speed of convergence, as the Jacobian matrix is not updated at each step. However, one may note that for some macroeconometric models the nonlinearities are often such that this type of technique proves advantageous compared to the classical Newton iterations, because of the computational savings that can be made.

In the classical Newton method, the direction s(k) = −(DF(x(k)))⁻¹F(x(k)) is a guaranteed descent direction for the function

  f(x) = ½ F(x)′F(x) = ½ ‖F(x)‖₂² ,  (2.14)

since Df(x) = DF(x)′F(x) and

  Df(x(k))′ s(k) = −F(x(k))′ DF(x(k)) (DF(x(k)))⁻¹ F(x(k))
                 = −F(x(k))′ F(x(k)) < 0   for all F(x(k)) ≠ 0 .  (2.15)

In the simplified Newton method the direction of update is

  s(k) = −(DF(x(0)))⁻¹ F(x(k)) ,

which is a descent direction for the function f(x) as long as the matrix (DF(x(0)))⁻¹DF(x(k)) is positive definite. If s(k) is not a descent direction, then the Jacobian matrix has to be reevaluated at x(k) and the method restarted from this point.
2.9 Quasi-Newton Methods
The methods discussed previously did not use an exact evaluation of the Jacobian matrix at each step but resorted to approximations. We will limit our presentation in this section to Broyden's method, which belongs to the class of so-called quasi-Newton methods. Quasi-Newton methods start either with an analytical or with a finite difference evaluation of the Jacobian matrix at the starting point x(0), and therefore
compute x(1) like the classical Newton method does. For the successive steps, DF(x(0))—or an approximation J(0) to it—is updated using (x(0), F(x(0))) and (x(1), F(x(1))). The matrix DF(x(1)) can then be approximated at little additional cost by a secant method. The secant approximation A(1) satisfies the equation

  A(1)(x(1) − x(0)) = F(x(1)) − F(x(0)) .  (2.16)

The matrix A(1) is obviously not uniquely defined by relation (2.16). Broyden [20] introduced a criterion which leads to choosing—at the generic step k—a matrix A(k+1) defined as

  A(k+1) = A(k) + ((y(k) − A(k)s(k)) s(k)′) / (s(k)′s(k)) ,  (2.17)

where

  y(k) = F(x(k+1)) − F(x(k))   and   s(k) = x(k+1) − x(k) .
Broyden’s method updates matrix A(k) by a rank one matrix computed only from the information of the current step and the preceding step. Algorithm 17 QuasiNewton Method using Broyden’s Update
Given F : Rn → Rn continuously differentiable and a starting point x(0) ∈ Rn
Evaluate A(0) by DF(x(0)) or J(0)
for k = 0, 1, 2, . . . until convergence
  Solve for s(k):  A(k) s(k) = −F(x(k))
  x(k+1) = x(k) + s(k)
  y(k) = F(x(k+1)) − F(x(k))
  A(k+1) = A(k) + ((y(k) − A(k)s(k)) s(k)′)/(s(k)′s(k))
end
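Algorithm 17 can be sketched as follows. For clarity the linear system is re-solved at every step; as noted below, a production code would instead update a factorization of A(k) in O(n²) operations. The test system and the exact initial Jacobian are illustrative assumptions.

```python
import numpy as np

def broyden(F, J0, x0, tol=1e-10, maxit=100):
    """Quasi-Newton iteration with Broyden's rank-one update (2.17)."""
    x = np.asarray(x0, dtype=float)
    A = np.asarray(J0, dtype=float).copy()
    Fx = F(x)
    for _ in range(maxit):
        if np.linalg.norm(Fx) < tol:
            break
        s = np.linalg.solve(A, -Fx)
        x_new = x + s
        y = F(x_new) - Fx
        A += np.outer(y - A @ s, s) / (s @ s)   # Broyden rank-one update
        x, Fx = x_new, F(x_new)
    return x

# hypothetical test system with root (1, 1); A(0) = DF(x(0))
F  = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])
x0 = np.array([2.0, 0.5])
x_star = broyden(F, J0=np.array([[4.0, 1.0], [1.0, -1.0]]), x0=x0)
```

Only function values enter the update: no further Jacobian evaluations are needed after A(0).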
Broyden's method may generate sequences of matrices {A(k)}k=0,1,... which do not converge to the Jacobian matrix DF(x*), even though the method produces a sequence {x(k)}k=0,1,... converging to x*. Dennis and Moré [25] have shown that the convergence behavior of the method is superlinear under the same conditions as for Newton-type techniques. The underlying reason for this favorable behavior is that ‖A(k) − DF(x(k))‖ stays sufficiently small.

From a computational standpoint, Broyden's method is particularly attractive, since the solution of the successive linear systems A(k)s(k) = −F(x(k)) can be determined by updating the initial factorization of A(0) for k = 1, 2, . . . . Such an update necessitates O(n²) operations, therefore reducing the original O(n³) cost of a complete refactorization. In practice, the QR factorization update is easier to implement than the LU update, see Gill et al. [47, pp. 125–150]. For sparse systems, however, the advantage of the updating process vanishes. A software reference for Broyden's method is MINPACK by Moré, Garbow and Hillstrom, available on NETLIB.
2.10 Nonlinear First-order Methods
The iterative techniques for the solution of linear systems described in Sections 2.4.1 to 2.4.4 can be extended to nonlinear equations. If we interpret the stationary iterations in Algorithms 1 to 4 as obtaining xi(k+1) as the solution of the ith equation with the other (n − 1) variables held fixed, we may immediately apply the same idea to the nonlinear case.

The first issue is then the existence of a one-to-one mapping between the set of equations {fi, i = 1, . . . , n} and the set of variables {xi, i = 1, . . . , n}. This mapping is also called a matching, and it can be shown that its existence is a necessary condition for the solution to exist, see Gilli [50] or Gilli and Garbely [51]. A matching m must be provided in order to define the variable m(i) that has to be solved from equation i. For the method to make sense, the solution of the ith equation with respect to x_m(i) must exist and be unique. This solution can then be computed using a one dimensional solution algorithm—e.g. a one dimensional Newton method. We can then formulate the nonlinear Jacobi algorithm.

Algorithm 18 Nonlinear Jacobi Method
Given a matching m and a starting point x(0) ∈ Rn
Set up the equations so that m(i) = i for i = 1, . . . , n
for k = 0, 1, 2, . . . until convergence
  for i = 1, . . . , n
    Solve for xi:  fi(x1(k), . . . , xi, . . . , xn(k)) = 0
    and set xi(k+1) = xi
  end
end
The nonlinear Gauss-Seidel method is obtained by modifying the "solve" statement.

Algorithm 19 Nonlinear Gauss-Seidel Method
Given a matching m and a starting point x(0) ∈ Rn
Set up the equations so that m(i) = i for i = 1, . . . , n
for k = 0, 1, 2, . . . until convergence
  for i = 1, . . . , n
    Solve for xi:  fi(x1(k+1), . . . , xi−1(k+1), xi, xi+1(k), . . . , xn(k)) = 0
    and set xi(k+1) = xi
  end
end
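For a normalized system, where each equation is solved explicitly as xi = gi(x), Algorithm 19 reduces to a sweep of immediate updates. The following sketch uses an invented two-equation contraction as test case; the matching is assumed already set up so that m(i) = i.

```python
import math

def nonlinear_gauss_seidel(g, x0, tol=1e-12, maxit=500):
    """Nonlinear Gauss-Seidel for a normalized system x_i = g_i(x):
    each new value is used immediately within the sweep."""
    x = list(x0)
    for _ in range(maxit):
        delta = 0.0
        for i, gi in enumerate(g):
            new = gi(x)
            delta = max(delta, abs(new - x[i]))
            x[i] = new                  # immediate update (Gauss-Seidel)
        if delta < tol:
            break
    return x

# hypothetical normalized 2-equation system (a contraction)
g = [lambda x: 0.5 * math.cos(x[1]),
     lambda x: 0.5 * math.sin(x[0]) + 0.1]
x = nonlinear_gauss_seidel(g, [0.0, 0.0])
```

Replacing `x[i] = new` by a deferred update of the whole vector at the end of the sweep would give the nonlinear Jacobi variant of Algorithm 18.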
In order to keep notation simple, we will assume from now on that the equations and variables have been set up so that we have m(i) = i for i = 1, . . . , n. The nonlinear SOR and FGS6 algorithms are obtained by a straightforward modification of the corresponding linear versions.

If it is possible to isolate xi from fi(x1, . . . , xn) for all i, then we have a normalized system of equations. This is often the case in systems of equations arising in macroeconometric modeling. In such a situation, each variable is isolated as follows,

  xi = gi(x1, . . . , xi−1, xi+1, . . . , xn) ,  i = 1, 2, . . . , n .  (2.18)

The "solve" statement in Algorithm 18 and Algorithm 19 is now dropped, since the solution is given in explicit form.

6 As already mentioned in the linear case, the FGS method should be considered as a second-order method.
2.10.1 Convergence
The matrix form of these nonlinear iterations can be found by linearizing the equations around x(k), which yields A(k)x(k) = b(k), where A(k) = DF(x(k)) and b(k) denotes the constant part of the linearization of F. As the path of the iterates {x(k)}k=0,1,2,... leads to different matrices A(k) and vectors b(k), the nonlinear versions of the iterative methods can no longer be considered stationary methods. Each system A(k)x(k) = b(k) will have a different convergence behavior, not only according to the splitting of A(k) and the updating technique chosen, but also because the values of the elements in the matrix A(k) and the vector b(k) change from one iteration to another. It follows that convergence criteria can only be stated for starting points x(0) within a neighborhood of the solution x*.

Similarly to what has been presented in Section 2.4.6, we can evaluate the matrix B that governs the convergence at x* and state that, if ρ(B) < 1, then the method is likely to converge. The difficulty is that now the eigenvalues, and hence the spectral radius, vary with the solution path. The same is also true for the optimal values of the parameters ω and γ of the SOR and FGS methods.

In such a framework, Hughes Hallett [69] suggests several ways of computing approximate optimal values for γ during the iterations. The simplest form is to take

  γk = (1 ± |xi(k)/xi(k−2)|)⁻¹ ,

the sign being positive if the iterations are cycling and negative if the iterations are monotonic; xi is the element which most violates the convergence criterion. One should also constrain γk to lie in the interval [0, 2], which is a necessary condition for the FGS to converge. To avoid large fluctuations of γk, one may smooth the sequence by the formula

  γ̃k = αk γk + (1 − αk) γ̃k−1 ,

where αk is chosen in the interval [0, 1]. We may note that such strategies can also be applied in the linear case to automatically set the value for γ.
2.11 Solution by Minimization
In the preceding sections, methods for the solution of nonlinear systems of equations have been considered. An alternative way to compute a solution of F(x) = 0 is to minimize the following objective function,

  f(x) = ‖F(x)‖ₐ ,  (2.19)

where ‖·‖ₐ denotes a norm in Rn.
A reason that motivates such an alternative is that it introduces a criterion to decide whether x(k+1) is a better approximation to x* than x(k). As at the solution F(x*) = 0, we would like to compare the vectors F(x(k+1)) and F(x(k)), and to do so we compare their respective norms. What is required7 is that

  ‖F(x(k+1))‖ₐ < ‖F(x(k))‖ₐ ,
which then leads us to the minimization of the objective function (2.19). A convenient choice is the standard euclidian norm, since it permits an analytical development of the problem. The minimization problem then reads

  min_x f(x) = ½ F(x)′F(x) ,  (2.20)
where the factor 1/2 is added for algebraic convenience. Thus methods for the nonlinear least-squares problem, such as Gauss-Newton or Levenberg-Marquardt, can immediately be applied in this framework. Since the system is square and has a solution, we expect a zero residual function f at x*.

In general it is advisable to take advantage of the structure of F to directly approach the solution of F(x) = 0. However, in some circumstances, resorting to the minimization of f(x) constitutes an interesting alternative. This is the case when the nonlinear equations contain numerical inaccuracies preventing F(x) = 0 from having a solution. If the residual f(x) is small, then the minimization approach is certainly preferable.

To devise a minimization algorithm for f(x), we need the gradient Df(x) and the Hessian matrix D²f(x), that is

  Df(x) = DF(x)′F(x)  (2.21)
  D²f(x) = DF(x)′DF(x) + Q(x)  (2.22)

with

  Q(x) = Σ_{i=1}^{n} fi(x) D²fi(x) .

We recall that F(x) = [f1(x) . . . fn(x)]′, that each fi(x) is a function from Rn into R and that each D²fi(x) is therefore the n × n Hessian matrix of fi(x). The Gauss-Newton method approaches the solution by computing a Newton step for the first order conditions of (2.20), Df(x) = 0. At step k the Newton
7 The iterate x(k+1) computed for instance by a classical Newton step does not necessarily satisfy this requirement.
direction s(k) is determined by

  D²f(x(k)) s(k) = −Df(x(k)) .

Replacing (2.21) and (2.22) in the former expression, we get

  (DF(x(k))′DF(x(k)) + Q(x(k))) s(k) = −DF(x(k))′F(x(k)) .  (2.23)

For x(k) sufficiently close to the solution x*, the term Q(x(k)) in the Hessian matrix is negligible and we may obtain the approximate step s_GN(k), called the Gauss-Newton step, from

  DF(x(k))′DF(x(k)) s_GN(k) = −DF(x(k))′F(x(k)) .  (2.24)

Computing s_GN(k) using (2.24) explicitly would require the solution of a symmetric positive definite linear system. With such an approach, the condition of the linear system involving DF(x(k))′DF(x(k)) is squared compared to the following alternative. The system (2.24) constitutes the set of normal equations, and its solution can be obtained by solving

  DF(x(k)) s_GN(k) = −F(x(k))

via a QR factorization. We notice that this development leads to the same step as in the classical Newton method, see Algorithm 14. It is worth mentioning that the Gauss-Newton method does not yield the same iterates as the Newton method applied to a direct minimization of f(x).

The Levenberg-Marquardt method is closely related to Gauss-Newton and to a modification of Newton's method for nonlinear equations that is globally convergent, see Section 2.12. The Levenberg-Marquardt step s_LM(k) is computed as the solution of

  (DF(x(k))′DF(x(k)) + λk I) s_LM(k) = −DF(x(k))′F(x(k)) .

It can be shown that solving this equation for s_LM(k) is equivalent to computing

  s_LM(k) = argmin_s ‖F(x(k)) + DF(x(k)) s‖₂   subject to   ‖s‖₂ ≤ δ .

This method is therefore a trust-region technique, such as presented in Section 2.12. We immediately see that if λk is zero, then s_LM(k) = s_GN(k); whereas when λk becomes very large, s_LM(k) approaches the steepest descent direction for minimizing f at x(k), −Df(x(k)) = −DF(x(k))′F(x(k)).

We may also note that every solution of F(x) = 0 is a solution to problem (2.20). However, the converse is not true, since there may be local minimizers of f(x). Such a situation is illustrated in Figure 2.1 and can be explained by recalling that the gradient of f(x), given by Equation (2.21), may vanish either when F(x) = 0 or when DF(x) is singular.
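The QR-based alternative to the normal equations can be sketched as follows; the helper name and the square test system are hypothetical, and a least-squares solver stands in for an explicit QR factorization.

```python
import numpy as np

def gauss_newton_step(Fx, DFx):
    """Gauss-Newton step: instead of forming the normal equations
    (2.24), which squares the condition number, solve the linear
    least-squares problem DF(x) s = -F(x) directly (QR-based)."""
    s, *_ = np.linalg.lstsq(DFx, -Fx, rcond=None)
    return s

# hypothetical square zero-residual system: the step equals Newton's
F  = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])
DF = lambda x: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])

x = np.array([2.0, 0.5])
for _ in range(20):
    x = x + gauss_newton_step(F(x), DF(x))
```

For this square, zero-residual system the Gauss-Newton iteration reproduces the classical Newton iterates, as noted in the text.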
Figure 2.1: A one dimensional function F(x) with a unique zero and its corresponding function f(x) = F(x)′F(x)/2 with multiple local minima.
2.12 Globally Convergent Methods
We saw in Section 2.6 that the Newton method is quadratically convergent when the starting guess x^(0) is close enough to x^*. An issue with the Newton method is that when x^(0) is not in a convergent neighborhood of x^*, the method may not converge at all. There are different strategies to overcome this difficulty. All of them first consider the Newton step and modify it only if it proves unsatisfactory; this ensures that the quadratic behavior of the Newton method is maintained near the solution. Some of the proposed methods resort to a hybrid strategy, i.e. they switch their search technique to a more robust method when the iterate is not close enough to the solution. It is possible to devise a hybrid method by combining a Newton-like method and a quasi-random search (see e.g. Hickernell and Fang [65]); an alternative is to expand the radius of convergence and try to diminish the computational cost by switching between a Gauss-Seidel and a Newton-like method, as in Hughes Hallett et al. [71]. Other such hybrid methods can be imagined by turning to a more robust, though less rapid, technique when the Newton method does not provide a satisfactory step. The first modification presented below adjusts only the length of the Newton step; the second amounts to building a model-trust region, where we take some combination of the Newton direction for F(x) = 0 and the steepest descent direction for minimizing f(x).
2.12.1 Linesearch
As already mentioned, a criterion to decide whether a Newton step is acceptable is to impose a decrease in f(x). We also know that s^(k) = -(DF(x^(k)))^-1 F(x^(k))
is a descent direction for f(x) since Df(x^(k)) s^(k) < 0, see Equation 2.14 on page 35. The idea is now to adjust the length of s^(k) by a factor ω_k to provide a step ω_k s^(k) that leads to a decrease in f. The simple condition f(x^(k+1)) < f(x^(k)) is not sufficient to ensure that the sequence of iterates {x^(k)}_{k=0,1,2,...} will converge to x^*. The issue is that either the decreases of f could be too small compared to the length of the steps, or the steps could be too small compared to the decrease of f. The problem can be fixed by imposing bounds on the choice of ω. We recall that ω = 1 has to be tried first in order to retain the quadratic convergence of the method. The value of ω is then chosen so as to minimize a model built on the information available. Let us define

    g(ω) = f(x^(k) + ω s^(k)) ,

where s^(k) is the usual Newton step for solving F(x) = 0. We know the values of g(0) = f(x^(k)) = (1/2) F(x^(k))' F(x^(k)) and Dg(0) = Df(x^(k)) s^(k) = F(x^(k))' DF(x^(k)) s^(k).
Since a Newton step is tried first, we also have the value of g(1) = f(x^(k) + s^(k)). If the inequality

    g(1) > g(0) + α Dg(0) ,   α ∈ (0, 0.5)    (2.25)
is satisfied, then the decrease in f is too small and a backtrack along s^(k) is introduced by diminishing ω. A quadratic model of g is built, using the information g(0), g(1) and Dg(0), to find the best approximate ω. The parabola is defined by

    ĝ(ω) = (g(1) - g(0) - Dg(0)) ω² + Dg(0) ω + g(0) ,

and is illustrated in Figure 2.2. The value ω̂ minimizing ĝ(ω) is determined by

    Dĝ(ω̂) = 0   ⟹   ω̂ = -Dg(0) / (2 (g(1) - g(0) - Dg(0))) .
Lower and upper bounds are usually set to constrain ω̂ ∈ [0.1, 0.5] so that very small or too large step values are avoided. If x̂^(k) = x^(k) + ω̂ s^(k) still does not satisfy (2.25), a further backtrack is performed. As we now have a new item of information about g, namely g(ω̂), a cubic fit is carried out. The linesearch can therefore be formalized as follows.
Figure 2.2: The quadratic model ĝ(ω) built to determine the minimum ω̂.
Algorithm 20 Linesearch with backtrack
Choose 0 < α < 0.5 (e.g. α = 10^-4) and 0 < l < u < 1 (e.g. l = 0.1, u = 0.5)
ω_k = 1
while f(x^(k) + ω_k s^(k)) > f(x^(k)) + α ω_k Df(x^(k)) s^(k)
    compute ω̂_k by cubic interpolation (or quadratic interpolation the first time)
    ω_k = ω̂_k constrained to [l ω_k, u ω_k]
end
x^(k+1) = x^(k) + ω_k s^(k)
A detailed version of this algorithm is given in Dennis and Schnabel [26, Algorithm A6.3.1].
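A compact sketch of the backtracking idea in Algorithm 20 might look as follows; for brevity the cubic fit is replaced by repeated quadratic fits (each backtrack reuses the model ĝ built from g(0), Dg(0) and the latest trial value), and the example function arctan(x) is an assumption chosen because the undamped Newton method diverges for it from x^(0) = 3.

```python
import math

def newton_linesearch(F, dF, x, alpha=1e-4, l=0.1, u=0.5, maxit=50):
    """Damped Newton iteration with backtracking linesearch on f = F^2/2."""
    for _ in range(maxit):
        Fx = F(x)
        if abs(Fx) < 1e-12:
            break
        s = -Fx / dF(x)              # Newton step
        f0 = 0.5 * Fx * Fx           # g(0) = f(x)
        dg0 = Fx * dF(x) * s         # Dg(0) = F(x) DF(x) s  (< 0)
        w = 1.0                      # the full step is always tried first
        for _ in range(30):          # backtrack while (2.25) holds
            g_w = 0.5 * F(x + w * s) ** 2
            if g_w <= f0 + alpha * w * dg0:
                break
            # minimizer of the quadratic model through g(0), Dg(0), g(w),
            # clamped to [l*w, u*w] as in Algorithm 20
            w_hat = -dg0 * w * w / (2.0 * (g_w - f0 - dg0 * w))
            w = min(max(w_hat, l * w), u * w)
        x = x + w * s
    return x

root = newton_linesearch(math.atan, lambda x: 1.0 / (1.0 + x * x), 3.0)
```

From x^(0) = 3 the full Newton step overshoots badly; with the backtrack, f decreases at every iteration and the sequence converges to the root x^* = 0, where full steps are again accepted and quadratic convergence takes over.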
2.12.2 Model-trust Region
The second alternative modifies not only the length of the Newton step but also its direction. The Newton step comes from a local model of the nonlinear function f around x^(k). The model-trust region explicitly limits the step length to a region where this local model is reliable. We therefore require s^(k) to lie in such a region by solving

    s^(k) = argmin_s (1/2) ||DF(x^(k)) s + F(x^(k))||²_2   subject to   ||s||_2 ≤ δ_k    (2.26)

for some δ_k > 0. The objective function in (2.26) may also be written

    (1/2) s' DF(x^(k))' DF(x^(k)) s + F(x^(k))' DF(x^(k)) s + (1/2) F(x^(k))' F(x^(k)) ,    (2.27)

and problems arise when the matrix DF(x^(k))' DF(x^(k)) is not safely positive definite, since the step that minimizes (2.27) without the constraint is

    s^(k) = -(DF(x^(k))' DF(x^(k)))^-1 DF(x^(k))' F(x^(k)) .
We can detect that this matrix is close to singular, for instance by checking whether κ₂(DF(x^(k))' DF(x^(k))) ≥ u^(-1/2), where u is defined as the unit roundoff of the computer, see Section A.1. In such a circumstance we decide to perturb the matrix by adding a diagonal matrix to it, and get

    DF(x^(k))' DF(x^(k)) + λ_k I   where   λ_k = √(n u) ||DF(x^(k))' DF(x^(k))||_1 .    (2.28)

This choice of λ_k can be shown to satisfy

    1/(n √u) ≤ κ₂(DF(x^(k))' DF(x^(k)) + λ_k I) - 1 ≤ u^(-1/2) ,

when ||DF(x^(k))|| ≥ u^(1/2). Another theoretical motivation for a perturbation such as (2.28) is the following result:

    lim_{λ→0⁺} (J'J + λI)^-1 J' = J⁺ ,

where J⁺ denotes the Moore-Penrose pseudoinverse of J. This can be shown using the SVD decomposition.
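The limit above can be checked numerically on a tiny example; the sketch below forms (J'J + λI)^-1 J' for an arbitrarily chosen full-column-rank 3×2 matrix and compares it with the pseudoinverse J⁺ = (J'J)^-1 J', using the closed-form inverse of a 2×2 matrix (the matrix and helper name are assumptions for illustration only).

```python
def regularized_solve(J, lam):
    """Return M(lam) = (J'J + lam*I)^{-1} J' for an m x 2 matrix J."""
    a = sum(r[0] * r[0] for r in J) + lam   # (J'J + lam I)[0][0]
    b = sum(r[0] * r[1] for r in J)         # off-diagonal entry
    d = sum(r[1] * r[1] for r in J) + lam   # (J'J + lam I)[1][1]
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]   # 2x2 inverse
    # multiply by J' (result is 2 x m)
    return [[sum(inv[i][k] * r[k] for k in range(2)) for r in J]
            for i in range(2)]

J = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pinv = regularized_solve(J, 0.0)      # J+ = (J'J)^{-1} J' (full column rank)
approx = regularized_solve(J, 1e-9)   # (J'J + lam I)^{-1} J' for small lam
err = max(abs(pinv[i][j] - approx[i][j])
          for i in range(2) for j in range(3))
```

As λ shrinks, the entries of (J'J + λI)^-1 J' approach those of J⁺, while for large λ the matrix behaves like λ^-1 J', i.e. the step direction tends to steepest descent.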
2.13 Stopping Criteria and Scaling
In all the algorithms presented in the preceding sections, we did not specify a precise termination criterion. Almost all the methods would theoretically require an infinite number of iterations to reach the limit of the sequence {x^(k)}_{k=0,1,2,...}. Moreover, even the techniques that should converge in a finite number of steps in exact arithmetic may need a stopping criterion in a finite precision environment. The decision to terminate the algorithm is of crucial importance since it determines which approximation to x^* the chosen method will ultimately produce.

To devise a first stopping criterion, we recall that the solution x^* of our problem must satisfy F(x^*) = 0. As an algorithm produces approximations to x^* and as we use a finite precision representation of numbers, we should test whether F(x^(k)) is sufficiently close to the zero vector. A second way of deciding to stop is to test that two consecutive approximations, for example x^(k) and x^(k-1), are close enough. This leads to considering two kinds of possible stopping criteria.

The first idea is to test ||F(x^(k))|| < ε_F for a given tolerance ε_F > 0, but this test will prove inappropriate. The differences in the scale of both F and x largely influence such a test. If ε_F = 10^-5 and if any x yields an evaluation of F in the range [10^-8, 10^-6], then the method may stop at an arbitrary point. Conversely, if F yields values in the interval [10^-1, 10^2], the algorithm will never satisfy the convergence criterion. The test ||F(x^(k))|| ≤ ε_F will also very probably weight the components of x differently when x is badly scaled, i.e. when the elements in x vary widely in magnitude. The remedy to some of these problems is to scale F and to use the infinity norm. The scaling is done by dividing each component of F by a value d_i
selected so that f_i(x)/d_i is of magnitude 1 for values of x not too close to x^*. Hence, the test

    ||S_F F||_∞ ≤ ε_F   where   S_F = diag(1/d_1, ..., 1/d_n)

should be safe.

The second criterion tests whether the sequence {x^(k)}_{k=0,1,2,...} stabilizes itself sufficiently to stop the algorithm. We might want to perform a test on the relative change between two consecutive iterations such as

    ||x^(k) - x^(k-1)|| ≤ ε_x ||x^(k-1)|| .
To avoid problems when the iterates converge to zero, it is recommended to use the criterion

    max_i r_i ≤ ε_x   with   r_i = |x_i^(k) - x_i^(k-1)| / max{|x_i^(k)|, x̄_i} ,

where x̄_i > 0 is an estimate of the typical magnitude of x_i. The number of expected correct digits in the final value of x^(k) is approximately -log10(ε_x). In the case where a minimization approach is used, one can also devise a test on the gradient of f, see e.g. Dennis and Schnabel [26, p. 160] and Gill et al. [46, p. 306].

The problem of scaling in macroeconometric models is certainly an important issue, and it may be necessary to rescale the model to prevent computational problems. The scale is usually chosen so that all the scaled variables have magnitude 1. We may therefore replace x by x̃ = S_x x, where S_x = diag(1/x̄_1, ..., 1/x̄_n) is a positive diagonal matrix.

To analyze the impact of this change of variables, let us define F̃(x̃) = F(S_x^-1 x̃), so that we get

    DF̃(x̃) = DF(S_x^-1 x̃) S_x^-1 = DF(x) S_x^-1 ,

and the classical Newton step becomes

    s̃ = -(DF̃(x̃))^-1 F̃(x̃) = -(DF(x) S_x^-1)^-1 F(x) = S_x s .

This therefore results in scaling the columns of the Jacobian matrix. We typically expect that such a modification will allow a better numerical behavior by avoiding some of the issues due to large differences of magnitude in the numbers we manipulate. We also know that an appropriate row scaling will improve the condition number of the Jacobian matrix. This is linked to the problem of finding a preconditioner, since we could replace the matrix S_x in the previous development by a general nonsingular matrix approximating the inverse of DF(x).
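The two scaled stopping tests described above can be sketched as follows (the tolerance values and the helper name are assumptions chosen for illustration):

```python
def stopping_tests(F_x, x_new, x_old, d, xbar, eps_F=1e-8, eps_x=1e-8):
    """Scaled residual test ||S_F F||_inf <= eps_F and relative-change test.

    d[i]   : typical magnitude of f_i (defines S_F = diag(1/d_i))
    xbar[i]: typical magnitude of x_i (guards the denominator when the
             iterates converge to zero)
    """
    residual_ok = max(abs(f) / di for f, di in zip(F_x, d)) <= eps_F
    change_ok = max(
        abs(xn - xo) / max(abs(xn), xb)
        for xn, xo, xb in zip(x_new, x_old, xbar)
    ) <= eps_x
    return residual_ok, change_ok

# Badly scaled toy iterate: the second variable is of magnitude 100
r_ok, c_ok = stopping_tests(
    F_x=[3e-9, -5e-10], x_new=[1.0, 250.0], x_old=[1.0, 250.0 + 1e-7],
    d=[1.0, 1.0], xbar=[1.0, 100.0],
)
```

Both tests pass here even though the absolute change in the second component (10^-7) exceeds ε_x, precisely because the change is measured relative to the variable's magnitude.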
Chapter 3
Solution of Large Macroeconometric Models
As already introduced in Chapter 1, we are interested in the solution of a nonlinear macroeconometric model represented by a system of equations of the form

    F(y, z) = 0 ,

where y represents the endogenous variables and z the exogenous variables of the model at hand. The macroeconometric models we study are essentially large and sparse. This allows us to investigate interesting properties that follow from sparse structures. First, we can take advantage of the information given in the structure to solve the model efficiently. An obvious task is to seek the block-triangular decomposition of a model and to take this information into account in the solution process. The result is both a more efficient solution and a significant contribution to a better understanding of the model's functioning. We essentially derive orderings of the equations for first-order iterative solution methods.

The chapter is organized in three sections. In the first section, the model's structure is analyzed using a graph-theoretic approach, which has the advantage of allowing an efficient algorithmic implementation. The second section presents an original algorithm for computing minimal essential sets, which are used for the decomposition of the interdependent blocks of the model. The results of the two previous sections provide the basis for the analysis of a popular technique used to solve large macroeconometric models.
3.1 Block-triangular Decomposition of the Jacobian Matrix
The logical structure of a system of equations is already defined if we know which variable appears in which equation. Hence, the logical structure is not tied to a particular quantification of the model. Its analysis will provide important insights into the functioning of a model, revealing robust properties which are invariant with respect to different quantifications.

The first task consists in examining whether the system of equations can be solved by decomposing the original system into a sequence of interdependent subsystems. In other words, we are looking for a permutation of the model's Jacobian matrix that yields a block-triangular form. This step is particularly important, as macroeconometric models almost always allow for such a decomposition. Many authors have analyzed such sparse structures: some of them, e.g. Duff et al. [30], approach the problem using incidence matrices and permutations, while others, for instance Gilbert et al. [44], Pothen and Fan [88] and Gilli [50], use graph theory. This presentation follows the lines of the latter papers, since we believe that the structural properties often rely on concepts better handled and analyzed with graph theory. A clear discussion of the uses of graph theory in macromodeling can be found in Gilli [49] and Gilli [50].

We first formalize the logical structure as a graph and then use a methodology based on graphs to analyse its properties. Graphs are used to formalize relations existing between elements of a set. The standard notation for a graph G is

    G = (X, A) ,

where X denotes the set of vertices and A the set of arcs of the graph. An arc is a couple of vertices (x_i, x_j) defining an existing relation between the vertex x_i and the vertex x_j. What is needed to define the logical structure of a model is its deterministic part, formally represented by the set of n equations F(y, z) = 0. In order to keep the presentation simpler, we will assume that the model has been normalized, i.e. a different left-hand side variable has been assigned to each equation, so that we can write

    y_i = g_i(y_1, ..., y_{i-1}, y_{i+1}, ..., y_n, z) ,   i = 1, ..., n .

We then see that there is a link going from the variables on the right-hand side to the variable on the left-hand side. Such a link can very naturally be represented by a graph G = (Y, U), where the vertices represent the variables y_i, i = 1, ..., n, and an arc (y_j, y_i) represents the link. The graph of the complete model is obtained by putting together the partial graphs corresponding to all single equations. We now define the adjacency matrix A_G = (a_ij) of our graph G = (Y, U) by

    a_ij = 1 if the arc (y_j, y_i) exists, and a_ij = 0 otherwise.
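The adjacency matrix can be assembled directly from the right-hand-side variable lists of the normalized equations; a small sketch with a hypothetical three-equation model (names and indices are assumptions):

```python
def adjacency_matrix(rhs_vars):
    """a[i][j] = 1 iff y_j appears on the right-hand side of equation i,
    i.e. iff the arc (y_j, y_i) exists in G = (Y, U)."""
    n = len(rhs_vars)
    return [[1 if j in rhs_vars[i] else 0 for j in range(n)]
            for i in range(n)]

# Hypothetical normalized model (0-based indices):
#   y_0 = g_0(y_1, z),  y_1 = g_1(y_0, y_2, z),  y_2 = g_2(z)
A_G = adjacency_matrix([{1}, {0, 2}, set()])
```

Row i of A_G lists the variables entering equation i, so A_G has exactly the nonzero pattern of the model's Jacobian matrix discussed next.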
We may easily verify that the adjacency matrix A_G has the same nonzero pattern as the Jacobian matrix of our model. Thus, the adjacency matrix contains all the information about the existing links between the variables in all equations defining the logical structure of the model. We already noticed that an arc in the graph corresponds to a link in the model. A sequence of arcs from a vertex y_i to a vertex y_j is a path, which corresponds to an indirect link in the model, i.e. a variable y_i has an effect on a variable y_j through the interaction of a set of equations.

A first task now consists in finding the sets of simultaneous equations of the model, which correspond to the irreducible diagonal blocks in the block-recursive decomposition; in the graph, these correspond to the strong components. A strong component is a largest possible set of interdependent vertices, i.e. a set where for any ordered pair of vertices (y_i, y_j) there exist a path from y_i to y_j and a path from y_j to y_i. The strong components define a unique partition of the equations of the model into sets of simultaneous equations. The algorithms that find the strong components of a graph are standard and can be found in textbooks on algorithmic graph theory or computer algorithms, see e.g. Sedgewick [93, p. 482] or Aho et al. [2, p. 193].

Once the strong components are identified, we need to know in what order they appear in the block-triangular Jacobian matrix. A technique consists in resorting to the reduced graph, the vertices of which are the strong components of the original graph, and where there is an arc between two such vertices if there is at least one arc between the vertices of the corresponding strong components. The reduced graph is without circuits, i.e. there are no interdependencies, and therefore it is possible to number the vertices in such a way that no arcs go from higher numbered vertices to lower numbered vertices. This ordering of the corresponding strong components in the Jacobian matrix then exhibits a block-triangular pattern. We may mention that Tarjan's algorithm already identifies the strong components in the order described above.

The solution of the complete model can now be performed by considering the sequence of submodels corresponding to the strong components. The effort put into finding the block-triangular form is negligible compared to the complexity of solving the model. Moreover, the increased knowledge of which parts of the model depend on which others is very helpful in simulation exercises. The usual pattern of the block-triangular Jacobian matrix corresponding to macroeconometric models exhibits a single large interdependent block, which is both preceded and followed by recursive equations. This pattern is illustrated in Figure 3.1.
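A standard way to obtain the strong components, and with them the block-triangular ordering, is Tarjan's algorithm; a compact sketch follows (the example graph is a hypothetical five-equation model, not from the text):

```python
def tarjan_scc(adj):
    """Return the strong components of the directed graph adj (dict:
    vertex -> iterable of successors).  Tarjan's algorithm emits the
    components in reverse topological order of the reduced graph, so
    reversing the list gives an ordering consistent with a
    block-lower-triangular Jacobian."""
    index, low, stack, on_stack, sccs = {}, {}, [], set(), []
    counter = [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in adj.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a strong component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in list(adj):
        if v not in index:
            strongconnect(v)
    return sccs

# {y1, y2} is one interdependent block feeding the block {y3, y4, y5}
adj = {1: [2], 2: [1, 3], 3: [4], 4: [5], 5: [3]}
blocks = [sorted(c) for c in tarjan_scc(adj)]
```

For this graph the two strong components {1, 2} and {3, 4, 5} are found in a single pass over the arcs, illustrating that the cost of the decomposition is negligible relative to solving the model.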
3.2 Orderings of the Jacobian Matrix
Having found the block-triangular decomposition, we now turn our attention to the analysis of the structure of an indecomposable submodel. Therefore, the Jacobian matrices considered in the following are indecomposable and the
corresponding graph is a strong component.

Figure 3.1: Block-recursive pattern of a Jacobian matrix.

Let us consider a directed graph G = (V, A) with n vertices and the corresponding set C = {c_1, ..., c_p} of all elementary circuits of G. An essential set S of vertices is a subset of V which covers the set C, where each circuit is considered as the set of vertices it contains. According to Guardabassi [57], a minimal essential set is an essential set of minimum cardinality; finding one can be seen as a problem of minimum cover, or as a problem of minimum transversal of a hypergraph with edges defined by the set C. For our Jacobian matrix, the essential set will enable us to find orderings such that the sets S and F in the matrices shown in Figure 3.2 are minimal. The set S corresponds to a subset of variables and the set F to a subset of feedbacks (entries above the diagonal of the Jacobian matrix). The set S is also called the essential feedback vertex set and the set F the essential feedback arc set.
Figure 3.2: Sparsity pattern of the reordered Jacobian matrix.

Such minimum covers are an important tool in the study of large scale interdependent systems. A technique often used to understand and to solve such complex systems consists in representing them as a directed graph which can be made feedback-free by removing the vertices belonging to an essential set.¹
¹ See, for instance, Garbely and Gilli [42] and Gilli and Rossier [54], where some aspects of the algorithm presented here have been discussed.
However, in the theory of complexity, the problem of finding minimal essential sets, also referred to as the feedback-vertex-set problem, is known to be NP-complete² and we cannot hope to obtain a solution for all graphs. The problem is not a new one in graph theory and system analysis, where several heuristic and nonheuristic methods have been suggested in the literature. See, for instance, Steward [94], Van der Giessen [98], Reid [89], Bodin [16], Nepomiastchy et al. [83] and Don and Gallo [27] for heuristic algorithms, and Guardabassi [58], Cheung and Kuh [22] and Bhat and Kinariwala [14] for nonheuristic methods. The main feature of the algorithm presented here is that it gives all optimal solutions in reasonable time for graphs corresponding in size and complexity to the commonly used large-scale macroeconomic models. The efficiency of the algorithm is obtained mainly by generating only a subset of the set of all elementary circuits, from which the minimal covers are then computed iteratively by considering one circuit at a time. Section 3.2.1 describes the iterative procedure, which uses Boolean properties of minimal monomials, then introduces the appropriate transformations necessary to reduce the size of the graph, and finally presents an algorithm for the generation of the subset of circuits necessary to compute the covers. Since any minimal essential set of G is given by the union of minimal essential sets of its strong components, we will assume, without loss of generality, that G has only one strong component.
3.2.1 The Logical Framework of the Algorithm
The particularity of the algorithm consists in the combination of three points: a procedure which computes covers iteratively by considering one circuit at a time, transformations likely to reduce the size of the graph, and an algorithm which generates only a particular small subset of elementary circuits.

Iterative Construction of Covers

Let us first consider an elementary circuit c_i of G as the following set of vertices

    c_i = ∪_{j ∈ C_i} {v_j} ,   i = 1, ..., p ,    (3.1)

where C_i is the index set of the vertices belonging to circuit c_i. Such a circuit c_i can also be written as a sum of symbols representing the vertices, i.e.

    Σ_{j ∈ C_i} v_j ,   i = 1, ..., p .    (3.2)
What we are looking for are covers, i.e. sets of vertices such that at least one vertex of the cover belongs to each circuit c_i, i = 1, ..., p. Therefore, we introduce the product
² Proofs of NP-completeness are given in Karp [72], Aho et al. [2, pp. 378-384], Garey and Johnson [43, p. 192] and Even [32, pp. 223-224].
of all circuits represented symbolically in (3.2):

    Π_{i=1}^{p} ( Σ_{j ∈ C_i} v_j ) ,    (3.3)

which can be developed into a sum of K monomials of the form

    Σ_{k=1}^{K} ( Π_{j ∈ M_k} v_j ) ,    (3.4)

where M_k is the set of indices of the vertices in the k-th monomial. To each monomial Π_{j ∈ M_k} v_j corresponds the set ∪_{j ∈ M_k} {v_j} of vertices, which covers the set C of all elementary circuits. Minimal covers are then obtained by considering the vertices v_i as Boolean variables and applying the following two properties as simplification rules, where a and b are Boolean variables:

• Idempotence:  a + a = a   and   a · a = a
• Absorption:   a + a · b = a   and   a · (a + b) = a .
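A brute-force sketch of developing the product (3.3) one circuit at a time, with idempotence handled by representing monomials as sets and absorption applied after every step (the three circuits used are a hypothetical example):

```python
def minimal_covers(circuits):
    """Develop the product of circuit sums one circuit at a time,
    applying idempotence (monomials stored as vertex sets) and
    absorption (drop any monomial strictly containing another);
    return the covers of minimum cardinality."""
    covers = {frozenset([v]) for v in circuits[0]}
    for c in circuits[1:]:
        new = set()
        for e in covers:
            if e & c:                   # e already covers the new circuit
                new.add(e)
            else:                       # extend e by each vertex of the circuit
                for v in c:
                    new.add(e | {v})
        covers = {e for e in new if not any(f < e for f in new)}  # absorption
    m = min(len(e) for e in covers)
    return sorted(sorted(e) for e in covers if len(e) == m)

# Three elementary circuits over the vertices {1, 2, 3}
covers = minimal_covers([{1, 2}, {2, 3}, {1, 3}])
```

For these circuits every pair of vertices is a minimal cover, so the product (1+2)(2+3)(1+3) simplifies to the three monomials of cardinality 2.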
After using idempotence to simplify all monomials, the minimal covers will be given by the sets of vertices corresponding to the monomials with minimum cardinality of M_k. We will now use the fact that the development of the expression (3.3) can be carried out iteratively, by considering one circuit at a time. Step r in this development is then:

    Π_{j=1}^{r} ( Σ_{i ∈ C_j} v_i ) · Σ_{i ∈ C_{r+1}} v_i ,   r = 1, ..., p - 1 .    (3.5)
Considering the set of covers E obtained in step r - 1, we will construct the set of covers E* which also accounts for the new circuit c_{r+1}. Denoting now by C_{r+1} the set of vertices forming circuit c_{r+1}, we partition the set of vertices V and the set of covers E as follows:

    V_1 = {v | v ∈ C_{r+1} and v ∈ e for some e ∈ E}    (3.6)
    V_2 = C_{r+1} - V_1                                 (3.7)
    E_1 = {e | e ∈ E and e ∩ C_{r+1} ≠ ∅}               (3.8)
    E_2 = E - E_1                                       (3.9)
with V1 as the vertices of the new circuit cr+1 that are already covered, V2 as those which are not covered, E1 as the satisfactory covers and E2 as those covers which must be extended. This partition is illustrated by means of a graph where vertices represent the sets and where there is an edge if the corresponding sets have common elements.
[Diagram: the sets V_1 and V_2 partition C_{r+1}, the sets E_1 and E_2 partition E, and V_3 = V - V_1 - V_2; edges join sets with common elements.]
V_3 is the set given by V - V_1 - V_2, and the dotted edge means that those sets may have common elements. Let us now discuss the four possibilities of combining an element of V_1, V_2 with an element of E_1, E_2, respectively:

1. for v ∈ V_1 and e_k ∈ E_1, we have by definition e_k ∪ {v} = e_k, which implies E_1 ⊆ E*;

2. for v ∈ V_1 and e_i ∈ E_2, we have e_i ∪ {v} ∈ E* under the constraint e_k ⊄ e_i ∪ {v} for all e_k ∈ E_1;

3. for v ∈ V_2 and e_k ∈ E_1, we have e_k ⊂ e_k ∪ {v}, implying e_k ∪ {v} ∉ E*;

4. for v ∈ V_2 and e_i ∈ E_2, we have e_i ∪ {v} ∈ E*, since such covers cannot be eliminated by idempotence or absorption.

The new set of covers E* we seek is then

    E* = E_1 ∪ A ∪ B    (3.10)
where the covers of set A are those defined in point 4 and the covers of set B are those defined in point 2. Set A can be computed automatically, whereas the construction of the elements of set B necessitates the verification of the condition

    e_k ⊄ e_i ∪ {v}   for all e_k ∈ E_1, e_i ∈ E_2, v ∈ V_1 .    (3.11)

For a given vertex v ∈ V_1, the verification of (3.11) can be limited to sets e_k verifying

    v ∈ e_k   and   1 < card(e_k) ≤ card(e_i) .

Since e_k ∈ E_1 and e_i ∈ E_2 are covers verifying e_k ⊄ e_i, it follows that

    v ∉ e_k ⟹ e_k ⊄ e_i ∪ {v} .
And for sets e_k with cardinality 1, we verify

    e_k ≠ {v} ⟹ e_k ⊄ e_i ∪ {v} .

Condensation of the Graph

In many cases, the size of the original graph can be reduced by appropriate transformations. Subsequently, we present transformations which reduce the size of G while preserving the minimal covers. Considering the graph G = (V, A), we define three transformations of G:
• Transformation 1: Let vi ∈ V be a vertex verifying a single outgoing arc (vi , vj ). Transform predecessors of vi into predecessors of vj and remove vertex vi from G. • Transformation 2: Let vi ∈ V be a vertex verifying a single ingoing arc (vj , vi ). Transform successors of vi into successors of vj and remove vertex vi from G. • Transformation 3: Let vi ∈ V be a vertex verifying an arc of the form (vi , vi ). Store vertex vi and remove vertex vi from G. Repeat these transformations in any order as long as possible. The transformed graph will then be called the condensed graph of G. Such a condensed graph is not necessarily connected. The situations described in the transformations 1 to 3 are illustrated in Figure 3.3.
Figure 3.3: Situations considered for the transformations.
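The three transformations can be sketched as repeated local reductions on a successor-set representation of the graph; this is only a hedged sketch, and the example graph (a single circuit 1 → 2 → 3 → 1) is hypothetical:

```python
def condense(succ):
    """Apply transformations 1-3 until none applies.

    succ: dict vertex -> set of successors (modified in place).
    Returns (succ, stored), where stored collects the vertices removed
    by transformation 3, which belong to every minimal cover."""
    def drop(v):
        succ.pop(v)
        for s in succ.values():
            s.discard(v)

    stored = []
    changed = True
    while changed:
        changed = False
        for v in list(succ):
            if v in succ[v]:                    # transformation 3: self-loop
                stored.append(v)
                drop(v)
                changed = True
                break
            preds = [u for u in succ if v in succ[u]]
            if len(succ[v]) == 1:               # transformation 1: one out-arc
                (j,) = succ[v]
                for u in preds:                 # predecessors of v now reach j
                    succ[u].add(j)
                drop(v)
                changed = True
                break
            if len(preds) == 1:                 # transformation 2: one in-arc
                succ[preds[0]] |= succ[v]       # successors of v now follow u
                drop(v)
                changed = True
                break
    return succ, stored

# A single elementary circuit 1 -> 2 -> 3 -> 1 condenses completely,
# storing exactly one vertex: any single vertex covers the only circuit.
succ, stored = condense({1: {2}, 2: {3}, 3: {1}})
```

Each reduction eventually creates a self-loop whenever a circuit has shrunk to a single vertex, which transformation 3 then stores, exactly as Proposition 1 below requires.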
Proposition 1 The union of the set of vertices stored in transformation 3 with the set of vertices constituting a minimal cover for the circuits of the condensed graph is also a minimal cover for the original graph G.

Proof. In case 1 and in case 2, every circuit containing vertex v_i must also contain vertex v_j, and therefore vertex v_i can be excluded from covers. Obviously, the vertices eliminated from the graph in transformation 3 must belong to every cover. The loop of any such given vertex v_i absorbs all circuits containing v_i, and therefore vertex v_i can be removed. □

Generation of Circuits

According to the idempotence and absorption rules, it is obvious that it is unnecessary to generate all the elementary circuits of G, since a great number of them will be absorbed by smaller subcircuits. Given a circuit c_i as defined in (3.1), we will say that circuit c_i absorbs circuit c_j if and only if c_i ⊂ c_j.

Proposition 2 Given the set of covers E corresponding to r circuits c_j, j = 1, ..., r, let us consider an additional circuit c_{r+1} such that there exists a circuit c_j, j ∈ {1, ..., r}, verifying c_j ⊂ c_{r+1}. Then, the set of covers E* corresponding to the r + 1 circuits c_j, j = 1, ..., r + 1, is E.
Proof. By definition, any element e_i ∈ E contains a vertex of circuit c_j. Therefore, as c_j ⊂ c_{r+1}, the partition defined in (3.6)-(3.9) verifies c_j ⊂ V_1 ⟹ E_1 = E ⟹ E_2 = ∅ and E* = E_1 = E. □

We will now discuss an efficient algorithm for the enumeration of only those circuits which are not absorbed. This point is the most important, as we can only expect efficiency if we avoid generating most of the set of all elementary circuits, which, of course, is of explosive cardinality. In order to explore the circuits of the condensed graph systematically, we first consider the circuits containing a given vertex v_1, then the circuits containing vertex v_2 in the subgraph with vertex set V - {v_1}, and so on. This corresponds to the following partition of the set of circuits C:
C = \bigcup_{i=1}^{n-1} C_{v_i} ,   (3.12)
where C_{v_i} denotes the set of all elementary circuits containing vertex v_i in the subgraph with vertex set V − {v_1, . . . , v_{i−1}}. It is obvious that C_{v_i} ∩ C_{v_j} = ∅ for i ≠ j and that some sets C_{v_i} will be empty. Without loss of generality, let us start with the set of circuits C_{v_1}. The following definition characterizes a subset of circuits of C_{v_1} which are not absorbed.

Definition 2 The circuit of length k + 1 defined by the sequence of vertices [v_1, x_1, . . . , x_k, v_1] is a chordless circuit if G contains neither arcs of the form (x_i, x_j) for j − i > 1, nor the arc (x_i, v_1) for i ≠ k. Such arcs, if they exist, are called chords.

In order to seek the chordless circuits containing the arc (v_1, v_k), let us consider the directed tree T = (S, U) with root v_1 and vertex set S = Adj(v_1) ∪ Adj(v_k). The tree T enables the definition of a subset

A_T = A_T^V ∪ A_T^S   (3.13)

of arcs of the graph G = (V, A). The set

A_T^V = {(x_i, x_j) | x_i ∈ V − S and x_j ∈ S}   (3.14)
contains arcs going from vertices not in the tree to vertices in the tree. The set

A_T^S = {(x_i, x_j) | x_i, x_j ∈ S and (x_i, x_j) ∉ U}   (3.15)
contains the cross, back and forward arcs in the tree. The tree T is shown in Figure 3.4, where the arcs belonging to A_T are drawn in dotted lines. The set R_{v_i} denotes the adjacency set Adj(v_i) restricted to vertices not yet in the tree.

Proposition 3 A chordless circuit containing the arc (v_1, v_k) cannot contain arcs (v_i, v_j) ∈ A_T.

Proof. The arc (v_1, v_k) constitutes a chord for all paths [v_1, . . . , v_k]. The arc (v_k, v_i), v_i ∈ R_{v_k}, constitutes a chord for all paths [v_k, . . . , v_i]. □
Figure 3.4: Tree T = (S, U), with root v_1, levels 1 and 2, and the restricted adjacency set R_{v_k}.

From Proposition 3, it follows that all ingoing arcs to the set of vertices adjacent to vertex v_1 can be ignored in the search algorithm for the circuits in C_{v_1}. For the circuits containing the arc (v_1, v_k), the same reasoning can be repeated, i.e. all ingoing arcs to the set of vertices adjacent to vertex v_k can be ignored. Continuing this procedure leads to the recursive algorithm given hereafter.
Algorithm 21 Chordless circuits
Input: The adjacency sets Adj(v_i) of graph G = (V, A).
Output: All chordless circuits containing vertex v_1.
begin
  initialize: k = 1; circuit(1) = v_1; R_{v_1} = Adj(v_1); S = Adj(v_1);
  chordless(R_{v_1});
end
chordless(R_v):
1. for all i ∈ R_v do
     k = k + 1; circuit(k) = i; R_i = ∅;
2.   if any j ∈ Adj(i) and j ∈ circuit(2 : k) then goto 5;
3.   for all j ∈ Adj(i) and j ∉ S do
       if j = circuit(1) then
         output circuit(n), n = 1 : k; goto 5;
       end
       S = S ∪ {j}; R_i = R_i ∪ {j};
     end
4.   chordless(R_i);
5.   k = k − 1;
   end
6. S = S − R_v;
end
Algorithm 21 then constructs recursively the directed tree T = (S, U ), which
evolves during the search. The loop in line 1 goes over all vertices of a given level. The test in line 2 detects a chordless circuit not containing vertex v_1. Such a circuit is not reported, as it will be detected while searching circuits belonging to some C_{v_i} ≠ C_{v_1}. For vertex v_k, the loop in line 3 constructs R_{v_k}, the set of restricted successors, and expands the tree. The recursive call in line 4 explores the next level. In line 5, we replace the last vertex in the explored path by the next vertex v_{k+1} in the level. Finally, in line 6, when all vertices of a given level have been explored, we remove the vertices of the set R_{v_k} from the tree.
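For checking the output of Algorithm 21 on small graphs, the definition of a chordless circuit can also be implemented directly by brute force. The following sketch (the function names are ours, not from the text) enumerates all elementary circuits through v_1 and keeps those without chords; it is exponential and meant only for validation, whereas Algorithm 21 avoids generating the absorbed circuits in the first place.

```python
def circuits_through(arcs, v1):
    """All elementary circuits through v1, as vertex lists [v1, x1, ..., xk]."""
    succ = {}
    for (u, w) in arcs:
        succ.setdefault(u, []).append(w)
    found = []

    def extend(path):
        for w in succ.get(path[-1], []):
            if w == v1:
                found.append(list(path))       # circuit closes back at v1
            elif w not in path:
                path.append(w)
                extend(path)
                path.pop()

    extend([v1])
    return found

def is_chordless(arcs, cyc):
    """True if the only arcs among the circuit's vertices are the circuit arcs."""
    n = len(cyc)
    circuit_arcs = {(cyc[i], cyc[(i + 1) % n]) for i in range(n)}
    return all((u, w) in circuit_arcs
               for u in cyc for w in cyc if (u, w) in arcs)

# Example: the arc (1, 3) is a chord of [1, 2, 3], so only [1, 3] survives.
arcs = {(1, 2), (2, 3), (3, 1), (1, 3)}
chordless = [c for c in circuits_through(arcs, 1) if is_chordless(arcs, c)]
```

On the example graph the absorbed circuit [1, 2, 3] is discarded, illustrating the absorption rule of Proposition 2.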
3.2.2 Practical Considerations
The algorithm which generates the chordless circuits certainly remains non-polynomial, and the maximum size of graph one can handle depends on its particular structure. For non-reducible graphs, i.e. graphs to which none of the transformations described in Section 3.2.1 apply, we found experimentally that an arc density of about 0.2 characterizes the structures that are the most difficult to explore. The algorithm handles such non-reducible graphs with up to about 100 vertices. For applications such as those encountered in large scale systems with feedback, this corresponds to much larger problems, i.e. models with 200 to 400 interdependent variables, because, in practice, the corresponding graphs are always condensable. This, at least, is the case for almost all macroeconomic models and, to the best of our knowledge, it has not yet been possible to compute minimal essential sets for such large models.
3.3 Point Methods versus Block Methods
Two types of methods are commonly used for the numerical solution of macroeconometric models: nonlinear first-order iterative techniques and Newton-like methods. To be efficient, both methods have to take into account the sparsity of the Jacobian matrix. For first-order iterations, one tries to put the Jacobian matrix into a quasi-triangular form, whereas for Newton-like methods it is interesting to reorder the equations so as to minimize the dimension of the simultaneous block, i.e. the essential set S, to which the Newton algorithm is then applied. In practice, this involves the computation of a feedback arc set for the first method and of a set of spike variables for the second, as discussed in the previous section. The second method can be considered a block method, as it combines a Newton technique for the set of spike variables with first-order iterations for the remaining variables. Whereas Newton methods applied to the complete system are insensitive to different orderings of the equations, the performance of the block method will vary for different sets of spike variables with the same cardinality. Block methods solve subsets of equations with an inner loop and execute an outer loop for the complete system. If the size of the subsystems reduces to a single equation, we have a point method. The Gauss-Seidel method, as explained
in Algorithm 2, is a point method. We will show that, due to the particular structure of most macroeconomic models, the block method is not likely to constitute an optimal strategy. We also discuss the convergence of first-order iterative techniques with respect to orderings corresponding to different feedback arc sets, leaving, however, the question of the optimal ordering open.
3.3.1 The Problem
We have seen in Section 2.10 that for a normalized system of equations the generic iteration k of the point Gauss-Seidel method is written as

x_i^{k+1} = g_i(x_1^{k+1}, . . . , x_{i−1}^{k+1}, x_{i+1}^k, . . . , x_n^k) ,   i = 1, . . . , n .   (3.16)
It is then obvious that (3.16) could be solved within a single iteration if the entries in the Jacobian matrix of the normalized equations corresponding to x_{i+1}^k, . . . , x_n^k, for each equation i = 1, . . . , n, were zero.³ Therefore, it is often argued that an optimal ordering for first-order iterative methods should yield a quasi-triangular Jacobian matrix, i.e. one where the number of nonzero entries in the upper triangular part is minimal. Such a set of entries corresponds to a minimum feedback arc set, as discussed earlier. For a given set F, the Jacobian matrix can then be ordered as shown in panel (b) of Figure 3.2. The complexity of Newton-like algorithms is O(n³), which promises interesting savings in computation if n can be reduced. Therefore, various authors, e.g. Becker and Rustem [11], Don and Gallo [27] and Nepomiastchy and Ravelli [82], suggest a reordering of the Jacobian matrix as shown in panel (a) of Figure 3.2. The equations can therefore be partitioned into two sets

x_R = g_R(x_R; x_S)   (3.17)
f_S(x_S; x_R) = 0 ,   (3.18)
where x_S are the variables defining the feedback vertex set S (spike variables). Given an initial value for the variables x_S, the solution for the variables x_R is obtained by solving the equations g_R recursively. The variables x_R are then exogenous for the much smaller subsystem f_S, which is solved by means of a Newton-like method. These two steps of the block method are repeated until convergence.
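The two-step scheme can be sketched for an invented toy model (the model, coefficients and helper names below are illustrative only, not from the text): x_1 and x_2 form the recursive part g_R, and x_3 is a single spike variable solved by a scalar Newton step with a finite-difference derivative.

```python
def block_solve(g_R, f_S, x_S, tol=1e-10, max_outer=100):
    """Sketch of the block method (3.17)-(3.18) for a single spike variable:
    forward recursion for x_R, then a scalar Newton solve of f_S(x_S; x_R) = 0."""
    x_R = g_R(x_S)
    for _ in range(max_outer):
        x_R = g_R(x_S)                       # step 1: solve (3.17) recursively
        for _ in range(50):                  # step 2: Newton on the spike equation
            f = f_S(x_S, x_R)
            df = (f_S(x_S + 1e-7, x_R) - f) / 1e-7   # finite-difference derivative
            x_S -= f / df
            if abs(f) < tol:
                break
        # outer convergence: the recursive part no longer moves
        if max(abs(a - b) for a, b in zip(g_R(x_S), x_R)) < tol:
            return x_R, x_S
    return x_R, x_S

# Invented toy model: x1, x2 recursive given the spike variable x3.
def g_R(x3):
    x1 = 0.5 * x3 + 1.0
    x2 = 0.3 * x1
    return (x1, x2)

def f_S(x3, x_R):
    return x3 - 0.2 * x_R[1] - 0.4

x_R, x_S = block_solve(g_R, f_S, x_S=0.0)   # exact solution: x3 = 0.46/0.97
```

In this linear toy example each Newton solve is exact, so the outer loop alone drives the fixed-point error to zero.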
3.3.2 Discussion of the Block Method
The advantages and drawbacks of first-order iterative methods and Newton-like techniques have been extensively discussed in the literature. Recently, it has been shown clearly by Hughes Hallett [70] that a comparison of the theoretical performance of these two types of solution techniques is not possible. As already mentioned, the solution of the original system, after the introduction of the decomposition into the subsystems (3.17) and (3.18), is obtained by means
³ Which then corresponds to a lower triangular matrix.
of a first-order iterative method, combined with an embedded Newton-like technique for the subsystem f_S. Thus, the solution not only requires convergence for the subsystem f_S, but also convergence of the successive steps over g_R and f_S. Nevertheless, compared with the complexity of solving the original system by means of Newton-like methods, such a decomposition will almost always be preferable. The following question then arises: would it be interesting to solve the subsystem f_S with a first-order iterative technique as well? In order to discuss this question, we establish the operation count for solving the system in both cases. Using a Newton method for the subsystem f_S, the approximate operation count is
k^F_{g_R} ( p n_R + k^N_{f_S} (2/3) n_S^3 )   (3.19)
where k^F_{g_R} is the number of iterations over g_R and f_S, and k^N_{f_S} is the number of iterations needed to solve the embedded subsystem f_S. Clearly, the values of k^F_{g_R} and k^N_{f_S} are unknown prior to the solution, but we know that k^N_{f_S} can be, at best, equal to 2. By solving subsystem f_S with a first-order iterative technique, we obtain the following operation count:

k^F_{g_R} ( p n_R + k^F_{f_S} p n_S )   (3.20)
where k^F_{f_S} is the number of iterations needed to solve the embedded subsystem f_S. Taking k^N_{f_S} equal to 2 enables us to establish, from (3.19) and (3.20), the inequality

k^F_{f_S} < (4/3) n_S^2 / p   (3.21)
which characterizes situations where first-order iterative techniques are always preferable. It might now be interesting to investigate whether a decomposition of the Jacobian matrix is still preferable in such a case. The operation count for solving f with a first-order method in k^F_f iterations is obviously

k^F_f p n .   (3.22)
Then, using (3.22) and (3.20), we obtain the following inequality

k^F_f < k^F_{g_R} (n_R + k^F_{f_S} n_S) / n   (3.23)
which characterizes situations for which a first-order method for solving f involves less computation than the resolution of the decomposed system (3.17) and (3.18). The fraction in expression (3.23) is obviously greater than or equal to one. The analysis of the structure of the Jacobian matrices corresponding to most commonly used macroeconomic models shows that the subsystem f_S corresponding to a minimum feedback vertex set is almost always recursive. Thus, k^F_{f_S} is equal to one, k^F_{g_R} and k^F_f are identical, and it is therefore not necessary to formalize the solution of subsystem f_S as a separate step.
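The comparison of the counts (3.19), (3.20) and (3.22) can be made concrete. The sketch below evaluates them for invented problem sizes (p, n_R, n_S and the iteration counts are illustrative only) and checks inequality (3.21):

```python
def newton_block_count(k_gR, p, nR, nS, k_fS_N):
    """(3.19): outer first-order loop with an embedded Newton solve of f_S."""
    return k_gR * (p * nR + k_fS_N * (2.0 / 3.0) * nS**3)

def firstorder_block_count(k_gR, p, nR, nS, k_fS_F):
    """(3.20): outer loop with an embedded first-order solve of f_S."""
    return k_gR * (p * nR + k_fS_F * p * nS)

# Hypothetical sizes: n_R = 50 recursive equations, n_S = 9 spikes,
# p = 10 nonzeros per equation, 5 outer iterations.
p, nR, nS, k_gR = 10, 50, 9, 5
threshold = (4.0 / 3.0) * nS**2 / p          # inequality (3.21): here 10.8
k_fS_F = 10                                  # below the threshold
c_newton = newton_block_count(k_gR, p, nR, nS, k_fS_N=2)
c_first = firstorder_block_count(k_gR, p, nR, nS, k_fS_F)
```

With these numbers, k^F_{f_S} lies below the threshold of (3.21) and the first-order inner solve indeed needs fewer operations than the Newton inner solve.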
3.3.3 Ordering and Convergence for First-order Iterations
The previous section clearly shows the interest of first-order iterative methods for solving macroeconomic models. It is well known that the ordering of the equations is crucial for the convergence of first-order iterations. The ordering we have considered so far is the result of a minimization of the cardinality n_S of the feedback vertex set S. However, such an ordering is, in general, not optimal. Hereafter, we introduce the notation used to study the convergence of first-order iterative methods in relation to the ordering of the equations. A linear approximation of the normalized system is given by

x = Ax + b ,   (3.24)
with A = ∂g/∂x, the Jacobian matrix evaluated at the solution x*, and b representing exogenous and lagged variables. Splitting A into A = L + U, with L and U respectively a strictly lower and an upper triangular matrix, system (3.24) can then be written as (I − L)x = Ux + b. Choosing Gauss-Seidel first-order iterations (see Section 2.10), the k-th step in the solution process of our system is
(I − L)x^{(k+1)} = Ux^{(k)} + b ,   k = 0, 1, 2, . . .   (3.25)
where x^{(0)} is an initial guess for x. It is well known that the convergence of (3.25) to the solution x* can be investigated on the error equation (I − L)e^{(k+1)} = Ue^{(k)} ,
with e^{(k)} = x* − x^{(k)} the error at iteration k. Setting B = (I − L)^{−1}U and relating the error e^{(k)} to the original error e^{(0)}, we get

e^{(k)} = B^k e^{(0)} ,   (3.26)
which shows that the necessary and sufficient condition for the error to converge to zero is that B^k converges to the zero matrix, as presented in Section 2.4.6. This is guaranteed if all eigenvalues λ_i of matrix B verify |λ_i| < 1. The convergence then clearly depends upon the particular ordering of the equations. If the Jacobian matrix A is a lower triangular matrix, then the algorithm converges within one iteration. This suggests choosing a permutation of A so that U is as "small" as possible. Usually, the "magnitude" of matrix U is defined in terms of its number of nonzero elements. The essential feedback arc sets defined in Section 3.2 define such "small" matrices U. However, we will show that such a criterion for the choice of matrix U is not necessarily optimal for convergence. In general, there are several feedback arc sets with minimum cardinality, and the question of which one to choose arises. Without a theoretical framework from which to decide on such a choice, we resorted to an empirical investigation of a small macroeconomic model.⁴ The size n of the Jacobian matrix of this model is 28 and there exist 76 minimal feedback arc sets⁵ ranging from cardinality 9 to cardinality 15. We solved the
⁴ The City University Business School (CUBS) model of the UK economy [13].
⁵ These sets have been computed with the program Causor [48].
CUBS model using all the orderings corresponding to these 76 essential feedback arc sets. We observed that convergence was achieved, for a given period, in fewer than 20 iterations for 70% of the orderings. One ordering needed 250 iterations to converge and, surprisingly, 8 orderings did not converge at all. Moreover, we did not observe any relation between the cardinality of the essential feedback arc set and the number of iterations necessary to converge to the solution. Among other criteria, we tried to weigh matrix U by means of the sum of its squared elements. Once again, we came to the conclusion that there is no relationship between the weight of U and the number of iterations. In order to characterize the matrix U which achieves the fastest convergence, we chose to systematically explore all possible orderings of a four-equation linear system. We calculated λ_max = max_i |λ_i| for each matrix B corresponding to the orderings. We observed that the n! possible orderings produce at most (n − 1)! distinct values of λ_max. Again, we verified that neither the number of nonzero elements in matrix U nor their values are related to the values of λ_max.
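The ordering experiment is easy to reproduce in spirit. For an invented four-equation normalized system x = Ax + b (the coefficients below are illustrative; they are neither the CUBS model nor the system of Figure 3.5), the following sketch builds B = (I − L)^{−1}U for every ordering and records λ_max:

```python
import numpy as np
from itertools import permutations

# Hypothetical normalized Jacobian A (zero diagonal) of a 4-equation system.
A = np.array([[ 0.0,  0.5, 0.0, -0.65],
              [ 0.6,  0.0, 0.0,  0.0 ],
              [-0.4, -1.3, 0.0,  0.0 ],
              [ 0.0,  0.0, 2.0,  0.0 ]])

def lambda_max(A, order):
    """Spectral radius of the Gauss-Seidel matrix B = (I - L)^{-1} U
    after reordering equations and variables symmetrically."""
    P = np.eye(len(A))[list(order)]
    Ap = P @ A @ P.T
    L, U = np.tril(Ap, -1), np.triu(Ap, 1)
    B = np.linalg.solve(np.eye(len(A)) - L, U)
    return max(abs(np.linalg.eigvals(B)))

values = {o: lambda_max(A, o) for o in permutations(range(4))}
# The 4! orderings yield only a few distinct values of lambda_max,
# unrelated to the number of nonzeros in U.
```

Running this on different coefficient sets shows how λ_max, and hence convergence, varies with the ordering even when the sparsity pattern of U does not.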
[Figure 3.5 here: the graph of the four linear equations and the matrices A − I corresponding to the orderings [1, 2, 3, 4] and [3, 1, 2, 4], with coefficients .5, .6, −.4, −1.3, 2, −.65 and a.]
Figure 3.5: Numerical example showing that the structure is not sufficient.

Figure 3.5 displays the graph representing the four linear equations and the matrices A − I corresponding to two particular orderings, [1, 2, 3, 4] and [3, 1, 2, 4], of these equations. If the value of the coefficient a is .7, then the ordering [1, 2, 3, 4], corresponding to a matrix U with a single element, has the smallest possible λ_max = .56. If we take the equations in the order [3, 1, 2, 4], we get λ_max = 1.38. Setting the value of the coefficient a to −.7 produces the opposite: the ordering [3, 1, 2, 4], which does not minimize the number of nonzero elements in matrix U, gives λ_max = .69, whereas the ordering [1, 2, 3, 4] gives λ_max = 1.52. This example clearly shows that the structure of the equations is not in itself sufficient information for deciding on a good ordering of the equations. Having analyzed the causal structure of many macroeconomic models, we have observed that the subset of equations corresponding to a feedback vertex set (spike variables) is almost always recursive. For models of this type, it is certainly not advisable to use a Newton-type technique to solve the submodel within the block method. Obviously, in such a case, the block method is equivalent to a first-order iterative technique applied to the complete system. The crucial question is then whether a better ordering than the one derived from the block method exists.
3.4 Essential Feedback Vertex Sets and the Newton Method
In general, the convergence of Newton-type methods is not sensitive to different orderings of the Jacobian matrix. However, from a computational point of view it can be interesting to reorder the model so that the Jacobian matrix shows the pattern displayed in panel (a) of Figure 3.2. This pattern is obtained by computing an essential feedback vertex set, as explained in Section 3.2. One way to exploit this ordering is to condense the system into one of the same size as S. This has been suggested by Don and Gallo [27], who approximate the Jacobian matrix of the condensed system numerically and apply a Newton method to the latter. The advantage comes from the reduction of the size of the system to be solved. We note that this amounts to computing the solution of a different system, which has the same solution as the original one. Another way to take advantage of the particular pattern generated by the set S is that the LU factorization of the Jacobian matrix needed in the Newton step can easily be computed. Since the model is assumed to be normalized, the entries on the diagonal of the Jacobian matrix are ones. The columns not belonging to S remain unchanged in the matrices L and U, and only the columns of L and U corresponding to the set S must be computed. We may note that neither approach requires the set S to be an essential feedback vertex set. However, the smaller the cardinality of S, the larger the computational savings.
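The claim about the LU factors can be checked numerically. For a unit-diagonal Jacobian ordered as in panel (a) of Figure 3.2, with the leading n_R × n_R block lower triangular, a pivot-free LU factorization leaves the first n_R columns of L equal to the corresponding columns of the Jacobian and the leading block of U equal to the identity, so only the n_S spike columns require work. A sketch with an invented 4 × 4 example (n_R = 3, n_S = 1):

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle LU factorization without pivoting (L unit lower triangular)."""
    n = len(A)
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, :] -= L[i, k] * U[k, :]
    return L, np.triu(U)

# Unit-diagonal Jacobian: leading 3x3 block lower triangular, last column = spike.
J = np.array([[1.0, 0.0, 0.0, 0.7],
              [0.3, 1.0, 0.0, 0.2],
              [0.1, 0.5, 1.0, 0.4],
              [0.6, 0.8, 0.9, 1.0]])
L, U = lu_nopivot(J)
nR = 3
# L[:, :nR] coincides with the corresponding columns of J, and
# U[:nR, :nR] is the identity: only the spike column had to be computed.
```

The multipliers eliminate against zero upper entries in the recursive block, which is why those columns pass through the factorization unchanged.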
Chapter 4
Model Simulation on Parallel Computers
The simulation of large macroeconometric models may be considered a solved problem, given the performance of the computers available at present. This is probably true if we run single simulations of a model. However, when it comes to solving a model repeatedly a large number of times, as is the case for stochastic simulation, optimal control or the evaluation of forecast errors, the time necessary to execute this task on a computer can become excessively long. Another situation in which we need even more efficient solution procedures arises when we want to explore the behavior of a model with respect to continuous changes in the parameters or exogenous variables. Parallel computers may be a way of achieving this goal. However, in order to be efficient, these computing devices require the code to be specifically structured. Examples of the use of vector and parallel computers to solve economic problems can be found in Nagurney [81], Amman [3], Ando et al. [4], Bianchi et al. [15] and Petersen and Cividini [87]. In the first section, the fundamental terminology and generally accepted concepts of parallel computing are presented. Among these, we find a taxonomy of hardware, the interconnection network, synchronization and communication issues, and some performance measures such as speedup and efficiency. In the second section, we report simulation experiments with a medium-sized macroeconometric model. Practical issues of the implementation of parallel algorithms on a CM2 massively parallel computer are also presented.
4.1 Introduction to Parallel Computing
Parallel computation can be deﬁned as the situation where several processors simultaneously execute programs and cooperate to solve a given problem. There are many issues that arise when considering parallel numerical computation. In our discussion, we focus on the case where processors are located in one
computer and communicate reliably and predictably. The distributed computation scheme where, for instance, several workstations are linked through a network is not considered, even though similar issues arise regarding the implementation of the methods in such a framework. There are important distinctions to be made when considering parallel computing, the first being the number and the type of processors of the parallel machine. Some parallel computers contain several thousand processors and are called massively parallel. Others contain fewer processing elements (up to a few hundred) that are more powerful and can execute more complicated tasks. This kind of computing system is usually called a coarse-grained parallel computer. Second, the global control of the processors' behavior can be more or less tight. For instance, certain parallel computers are controlled at each step by a front-end computer which sends the instructions and/or data to every processor of the system. In other cases, there may be no such precise global control, but just a processor distributing tasks to every processor at the beginning of the computation and gathering the results at the end. A third distinction is made at the execution level. The operations can be synchronous or asynchronous. In a synchronous model, there usually are phases during which the processors carry out instructions independently of the others. The communication can only take place at the end of a phase, thereby ensuring that the next phase starts at each processor with updated information. Clearly, the time a processor waits for its data, as well as the time for synchronizing the process, may create an overhead. An alternative is to allow an asynchronous execution where there is no constraint to wait for information at a given point.
The exchange of information can then take place at any point in time, and old information is purged if it has not been used before new information is made available. In this model of execution, it is much harder to ensure the convergence of the numerical algorithms carried out. The development of such algorithms is a difficult task, since a priori we have no clear way of checking precisely what will happen during the execution.
4.1.1 A Taxonomy for Parallel Computers
Parallel computers can be classified using Flynn's taxonomy, which is based upon the levels of parallelism in the data and the instructions. These different classes are presented hereafter. A typical serial computer is a Single Instruction Single Data (SISD) machine, since it processes one item of data and one operation at a time. When several processors are present, we can carry out the same instruction on different data sets. Thus we get the Single Instruction Multiple Data (SIMD) category, also called data parallel. In this case, the control mechanism is present at each step, since every processor is told to execute the same instruction. We are also in the presence of a synchronous execution, because all the processors carry out their job in unison and none of them can execute more instructions than another.
Symmetrically, Multiple Instruction Single Data (MISD) computers would process a single stream of data by performing different instructions simultaneously. Finally, the Multiple Instruction Multiple Data (MIMD) category is the most general, as it allows an asynchronous execution of different instructions on different data. It is of course possible to constrain the execution to be synchronous by blocking the processes and letting them wait for all others to reach the same stage. Nowadays, the MIMD and SIMD categories seem to give way to a new class named SPMD, or Same Program Multiple Data. In this scheme, the system is controlled by a single program and combines the ease of use of SIMD with the flexibility of MIMD. Each processor executes a SIMD program on its data but is not constrained to a totally synchronous execution with the other processors. The synchronization can actually take place only when processors need to communicate some information to each other. We may note that this can be achieved by programming a MIMD machine in a specific way. Thus, SPMD is not usually considered a new category of parallel machines but may be viewed as a programming style.

Network Structures

We essentially find two kinds of memory systems in parallel computers: first, a shared memory system, where the main memory is a global resource available to all processors. This is illustrated in Figure 4.1, where M stands for memory modules and P for processors. A drawback of such a system is the difficulty of managing simultaneous write instructions issued by distinct processors.
Figure 4.1: Shared memory system.

A second memory system is local memory, where each processor has its own memory; this is also called a distributed memory system and is illustrated in Figure 4.2. In the following, we focus our discussion on distributed memory systems.

Figure 4.2: Distributed memory system.

The topology of the interconnection between the processing units is a hardware characteristic. The various ways the processors can be connected determine how the communication takes place in the system. This, in turn, may influence the choice of particular algorithms. The interconnection network linking the processors also plays a critical part in the communication time. We now describe some possible architectures for connecting the processing elements together. The communication network must try to balance two conflicting goals: on one hand, a short distance between any pair of processors, in order to minimize communication costs; on the other hand, a low number of connections, because of physical and construction cost constraints. The following three points can be put forward:

• A first characteristic of a network is its diameter, defined as the maximum distance between any pair of processors. The distance between two processors is measured as the minimum number of links between them. This determines the time an item of information takes to travel from one processor to another.

• A second characteristic is the flexibility of the network, i.e. the possibility of mapping a given topology into another that may better fit the problem or the algorithm.

• The network topology should also be scalable, in order to allow other processing units to be added to extend the processing capabilities.

The simplest network is the linear array, where the processors are aligned and connected only with their immediate neighbors. This scheme tends to be used in distributed computing rather than in parallel computing, but it is useful as a worst-case benchmark for evaluating a parallel algorithm. Figure 4.3 illustrates such a linear array.
Figure 4.3: Linear Array.
This network can be improved by connecting the first and last processors of the linear array, thus creating a ring; see Figure 4.4. For a ring, the communication times are at best halved compared to the linear array.
Figure 4.4: Ring.

Another possibility is to arrange the processors on a mesh, that is, a grid in dimension 2. The definition of the mesh can be extended to dimension d by stating that only neighboring processors along the axes are connected. Figure 4.5 shows a mesh of 6 processors in dimension 2.
Figure 4.5: Mesh.

In a torus network, the maximum distance is approximately halved compared to a mesh, by adding connections between the processors along the edges to wrap the mesh around. Figure 4.6 illustrates a torus based on the mesh of Figure 4.5.
Figure 4.6: Torus.

Another important way of interconnecting the system is the hypercube topology. A hypercube of dimension d can be constructed recursively from 2 hypercubes of dimension d−1 by connecting their corresponding processors; a hypercube of dimension 0 is a single processor. Figure 4.7 shows hypercubes up to dimension 4. Each of the p = 2^d processors in a d-cube therefore has log2 p = d connections, and the maximum distance between two processors is also log2 p. The last topology we consider here is the complete graph, where every processor is connected to every other, see Figure 4.8. Even if this topology is the best in terms of maximum distance, it is unfortunately not feasible for a large number of processors, since the number of connections grows quadratically.
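These properties of the hypercube are easy to verify with its usual bit-pattern labeling: number the 2^d processors 0, ..., 2^d − 1; two processors are connected if and only if their labels differ in exactly one bit, so the distance between two nodes is the Hamming distance of their labels. A minimal sketch (function names are ours, for illustration only):

```python
def hypercube_neighbors(node: int, d: int) -> list[int]:
    """Neighbors of `node` in a d-cube: flip each of the d bits in turn."""
    return [node ^ (1 << k) for k in range(d)]

def hypercube_distance(a: int, b: int) -> int:
    """Distance between two nodes = number of differing bits (Hamming distance)."""
    return bin(a ^ b).count("1")

# In a 3-cube, every node has 3 = log2(8) neighbors, and the maximum
# distance (e.g. between nodes 0 and 7) is also 3.
```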
4.1 Introduction to Parallel Computing

Figure 4.7: Hypercubes.
Figure 4.8: Complete graph.
4.1.2 Communication Tasks
The new problem introduced by multiple processing elements is communication. The total time used to solve a problem is now the sum of the time spent computing and the time spent communicating: T = Tcomp + Tcomm. We cannot approach parallel computing without taking communication issues into account, since these may prevent us from achieving improvements over serial computing. When analyzing and developing parallel algorithms, we would like to keep the computation to communication ratio, i.e. Tcomp/Tcomm, as large as possible. This seems quite obvious, but it is not easy to achieve, since a larger number of processors tends to reduce the computation time while increasing the communication time (this is also known as the minmax problem). This tradeoff challenges the programmer to find a balance between these two conflicting goals. The complexity measures (defined in Appendix A.3) allow us to evaluate the
time and communication complexities of an algorithm without caring about the exact time spent on a particular computer in a particular environment. These measures are functions of the size of the problem that the algorithm solves. We now describe a small set of communication tasks that play an important role in a large number of algorithms. The basic communication functions implement send and receive operations, but one quickly realizes that some standard communication tasks of a higher level are needed. These are usually optimized for a given architecture and are provided by the compiler manufacturer.

Single Node Broadcast. The first standard communication task is the single node broadcast. In this case, we want to send the same packet of information from a given processor, also called node, to all other processors.

Multinode Broadcast. An immediate generalization is the multinode broadcast, where each node simultaneously sends the same information to all others. Typically, this operation takes place in iterative methods when a part of the problem is solved by a node, which then sends the results of its computation to all other processors before starting the next iteration.

Single Node and Multinode Accumulation. The dual problems of the single node and multinode broadcast are respectively the single node and multinode accumulation. Here, the packets are sent from every node to a given node, and these packets can be combined along the path of the communication. The simplest example of such a task is the addition of numbers computed at each processor: the partial sums of these numbers are combined at the various nodes to finally yield the total sum at the given accumulation node. The multinode accumulation task is performed by carrying out a single node accumulation at each node simultaneously.

Single Node Scatter. The single node scatter operation is accomplished by sending different packets from a node to every other node. This is not to be confused with a single node broadcast, since different information is dispatched to different processors.

Single Node Gather. As in the previous cases, there is a dual communication task, called the single node gather, that collects different items of information at a given node from every other node.
Problem                  Linear Array   Hypercube
Single Node Broadcast    Θ(p)           Θ(log p)
Single Node Scatter      Θ(p)           Θ(p/ log p)
Multinode Broadcast      Θ(p)           Θ(p/ log p)
Total Exchange           Θ(p^2)         Θ(p)

Table 4.1: Complexity of communication tasks on a linear array and a hypercube with p processors.

Total Exchange. Finally, the total exchange communication task involves sending different packets from each node to every other node.

Complexity of Communication Tasks. All these communication tasks are linked through a hierarchy. The most general and most difficult problem is the total exchange. The multinode accumulation and multinode broadcast tasks are special cases of the total exchange; they are simpler to carry out and are ranked second in the hierarchy. Then come the single node scatter and gather, which are again simpler to perform. Finally, the single node broadcast and accumulation tasks are fourth in terms of communication complexity. The ranking in the hierarchy remains the same whatever topology is used for interconnecting the processors. Of course, the complexity of the communication operations changes according to the connections between the processors. Table 4.1 gives the complexity of the communication tasks with respect to the interconnection network used.
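The Θ(log p) bound for the single node broadcast on a hypercube is achieved by recursive doubling: at step k, every node that already holds the packet forwards it along dimension k, doubling the set of informed nodes. A small simulation of this idea (a sketch of ours, not an implementation from the text):

```python
def broadcast_steps(d: int, source: int = 0) -> int:
    """Simulate a single node broadcast on a d-cube by recursive doubling;
    return the number of communication steps until all 2^d nodes are reached."""
    have = {source}                        # nodes that hold the packet
    steps = 0
    for k in range(d):                     # one communication step per dimension
        have |= {node ^ (1 << k) for node in have}
        steps += 1
    assert len(have) == 2 ** d             # every processor received the packet
    return steps

# An 8-processor hypercube (d = 3) is covered in 3 = log2(8) steps.
```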
4.1.3 Synchronization Issues
As has already been mentioned, a parallel algorithm can carry out tasks synchronously or asynchronously. A synchronous behavior is obtained by setting synchronization points in the code, i.e. statements that instruct the processors to wait until all of them have reached this point. In the case where there exists a global control unit, the synchronization is done automatically. Another technique is local synchronization. If the kind of information a processor needs at a certain point in the execution is known in advance, the processor can continue its execution as soon as it has received this information. It is not necessary for a processor to know whether any information it has sent has been received, so there is no need to wait for confirmation from other processors. The problem with a synchronous algorithm is that slow communication between the processors can be detrimental to the whole method. As depicted in Figure 4.9, long communication delays may lead to excessively large total execution times. The idle periods are dashed in the figure. Another frequent problem is that a heavier workload for a processor may degrade
Figure 4.9: Long communication delays between two processors.

the whole execution. Figure 4.10 shows that this happens even though the communication speed is fast enough.

Figure 4.10: Large differences in the workload of two processors.

The communication penalty and the overall execution time of many algorithms can often be substantially reduced by means of an asynchronous implementation. For synchronous algorithms, we know a priori in which sequence the statements are executed. In contrast, for an asynchronous algorithm the sequence of computations can differ from one execution to another, thus leading to differences between executions.
4.1.4 Speedup and Efficiency of an Algorithm
In order to compare serial algorithms with parallel algorithms, we need to recall a few definitions. The speedup for a parallel implementation of an algorithm using p processors and solving a problem of size n is generally defined as

S_p(n) = T^*(n) / T_p(n) ,

where T_p(n) is the time needed for parallel execution with p processors and T^*(n) is the optimal time for serial execution. As this optimal time is generally unknown, T^*(n) is replaced by T_1(n), the time required by a single processor to execute the particular parallel algorithm. An ideal situation is characterized by S_p(n) = p. The efficiency of an algorithm is then defined by the ratio

E_p(n) = S_p(n) / p = T_1(n) / (p T_p(n)) ,

which ranges from 0 to 1 and measures the fraction of time a processor is not standing idle in the execution process.
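These definitions translate directly into code; a minimal sketch with hypothetical timings (the functions are ours, for illustration):

```python
def speedup(t1: float, tp: float) -> float:
    """S_p(n) = T_1(n) / T_p(n), using the serial time of the parallel algorithm."""
    return t1 / tp

def efficiency(t1: float, tp: float, p: int) -> float:
    """E_p(n) = S_p(n) / p, which ranges from 0 to 1."""
    return speedup(t1, tp) / p

# E.g. 12 time units serially and 4 time units on 5 processors gives
# a speedup of 3 and an efficiency of 0.6.
```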
4.2 Model Simulation Experiences
In this section, we present a practical experience with solution algorithms executed in a SIMD environment. These results have been published in Gilli and Pauletto [53], and the presentation closely follows the original paper.
4.2.1 Econometric Models and Solution Algorithms
The econometric models we consider here for solution are represented by a system of n linear and nonlinear equations

F(y, z) = 0  ⟺  f_i(y, z) = 0 ,  i = 1, 2, . . . , n ,   (4.1)

where F: R^n × R^m → R^n is differentiable in the neighborhood of the solution y* ∈ R^n, and z ∈ R^m are the lagged and the exogenous variables. In practice, the Jacobian matrix DF = ∂F/∂y of an econometric model can often be put into a block-recursive form, as shown in Figure 3.1, where the dark shadings indicate interdependent blocks and the light shadings the existence of nonzero elements. The solution of the model then concerns only the interdependent submodels. The approach presented hereafter applies both to the solution of macroeconometric models and to any large system of nonlinear equations having a structure similar to the one just described. Essentially, two types of well-known methods are commonly used for the numerical solution of such systems: first-order iterative techniques and Newton-like methods. These algorithms have already been introduced as Algorithm 18 and Algorithm 19 for Jacobi and Gauss-Seidel, and Algorithm 14 for Newton. Hereafter, they are presented again with slightly different layouts, as we will deal with a normalized system of equations.

First-order Iterative Techniques. The main first-order iterative techniques are the Gauss-Seidel and the Jacobi iterations. For these methods, we consider the normalized model as shown in Equation (2.18), i.e.

y_i = g_i(y_1, . . . , y_{i−1}, y_{i+1}, . . . , y_n, z) ,  i = 1, . . . , n .

In the Jacobi case, the generic iteration k can be written

y_i^{(k+1)} = g_i(y_1^{(k)}, . . . , y_{i−1}^{(k)}, y_{i+1}^{(k)}, . . . , y_n^{(k)}, z) ,  i = 1, . . . , n ,   (4.2)

and in the Gauss-Seidel case the generic iteration k uses the i − 1 updated components of the vector y^{(k+1)} as soon as they are available, i.e.

y_i^{(k+1)} = g_i(y_1^{(k+1)}, . . . , y_{i−1}^{(k+1)}, y_{i+1}^{(k)}, . . . , y_n^{(k)}, z) ,  i = 1, . . . , n .   (4.3)
In order to be operational, the algorithm must also specify a termination criterion for the iterations. These are stopped when the changes in the solution vector y^{(k+1)} become small enough, for instance when the following condition is verified:

ε_i = |y_i^{(k)} − y_i^{(k−1)}| / (|y_i^{(k−1)}| + 1) < η ,  i = 1, 2, . . . , n ,
where η is a given tolerance. This criterion is similar to the one introduced in Section 2.13 for appropriately scaled variables. First-order iterative algorithms can then be summarized in the three statements given in Algorithm 22. The only difference in the code between the Jacobi and the Gauss-Seidel algorithms appears in Statement 2.

Algorithm 22 First-order Iterative Method
     do while ( not converged )
1.     y0 = y1
2.     Evaluate all equations
3.     not converged = any( |y1 − y0| / (|y0| + 1) > η )
     end do
Jacobi uses different arrays for y1 and for y0, i.e.

y_i^1 = g_i(y_1^0, . . . , y_{i−1}^0, y_{i+1}^0, . . . , y_n^0, z) ,

whereas Gauss-Seidel overwrites y0 with the computed values for y1, and therefore the same array is used in the expression

y_i^1 = g_i(y_1^1, . . . , y_{i−1}^1, y_{i+1}^0, . . . , y_n^0, z) .
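The array-handling difference between the two methods can be made concrete with a small sketch of a first-order iterative solver for a linear normalized system y = B y + c, a toy stand-in for the equations g_i (the function and its names are ours, not from the original CM Fortran code):

```python
def first_order_solve(B, c, gauss_seidel=True, eta=1e-8, max_iter=1000):
    """Jacobi / Gauss-Seidel iterations for the normalized system y = B y + c.

    Jacobi reads only the old array y0, while Gauss-Seidel overwrites the
    array in place, so updated components are used as soon as available.
    """
    n = len(c)
    y1 = [0.0] * n
    for _ in range(max_iter):
        y0 = y1[:]                          # statement 1: y0 = y1
        src = y1 if gauss_seidel else y0    # GS reads the array being updated
        for i in range(n):                  # statement 2: evaluate all equations
            y1[i] = sum(B[i][j] * src[j] for j in range(n)) + c[i]
        # statement 3: convergence test with the scaled tolerance
        if all(abs(y1[i] - y0[i]) / (abs(y0[i]) + 1) <= eta for i in range(n)):
            return y1
    raise RuntimeError("no convergence")
```

For a contracting system both variants reach the same fixed point; Gauss-Seidel typically needs fewer iterations.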
Obviously, the logical variable "not converged" and the array y1 have to be initialized before entering the loop of the algorithm.

Newton-like Methods. When solving the system with Newton-like methods, the general step k in the iterative process can be written as in Equation (2.10), i.e.

y^{(k+1)} = y^{(k)} − (DF(y^{(k)}, z))^{−1} F(y^{(k)}, z) .   (4.4)

Since the Jacobian matrix DF of an econometric model is large but very sparse, Newton-like and iterative methods are often combined into a hybrid method (see for instance Section 3.3). This consists in applying the Newton algorithm to a subset of variables only, the feedback or spike variables. Two types of problems occur at each iteration (4.4) when solving a model with a Newton-like method:

(a) the evaluation of the Jacobian matrix DF(y^{(k)}, z);

(b) the solution of the linear system involving (DF(y^{(k)}, z))^{−1} F(y^{(k)}, z).
The Jacobian matrix does not have to be evaluated analytically: in most algorithms, the partial derivatives of h_i(y) are approximated by quotients of differences, as given in Equation (2.11). The seven statements given in Algorithm 23 schematize the Newton-like method; we use the same initialization as for Algorithm 22.

Algorithm 23 Newton Method
     do while ( not converged )
1.     y0 = y1
2.     X = h I + y0 ι'   (matrix X contains elements x_ij)
3.     for i = 1 : n, evaluate a_ij = f_i(x_.j, z), j = 1 : n and f_i(y0, z)
4.     J = (A − F(y0, z) ι')/h
5.     solve J s = −F(y0, z)
6.     y1 = y0 + s
7.     not converged = any( |y1 − y0| / (|y0| + 1) > η )
     end do
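Algorithm 23 approximates the Jacobian column by column with forward differences, J_.j ≈ (F(y + h e_j) − F(y))/h, and then takes a Newton step. A dense, serial sketch of this scheme (ours, with no claim to reproduce the actual code; the small `solve` helper is an illustrative Gaussian elimination):

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))  # pivot row
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            m = A[r][k] / A[k][k]
            for col in range(k, n):
                A[r][col] -= m * A[k][col]
            b[r] -= m * b[k]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (b[k] - sum(A[k][c] * x[c] for c in range(k + 1, n))) / A[k][k]
    return x

def newton_fd(F, y, h=1e-7, eta=1e-8, max_iter=50):
    """Newton's method with a forward-difference Jacobian (dense, serial)."""
    n = len(y)
    for _ in range(max_iter):
        f0 = F(y)
        # finite-difference Jacobian: n extra evaluations of F, one per column
        J = [[0.0] * n for _ in range(n)]
        for j in range(n):
            yh = y[:]
            yh[j] += h
            fj = F(yh)
            for i in range(n):
                J[i][j] = (fj[i] - f0[i]) / h
        s = solve(J, [-v for v in f0])      # Newton step: J s = -F(y)
        y_new = [y[i] + s[i] for i in range(n)]
        if all(abs(y_new[i] - y[i]) / (abs(y[i]) + 1) <= eta for i in range(n)):
            return y_new
        y = y_new
    raise RuntimeError("no convergence")
```

Note that each column of the finite-difference Jacobian is independent of the others, which is precisely what the data-parallel evaluation discussed below exploits.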
4.2.2 Parallelization Potential for Solution Algorithms
The opportunities for parallelization of solution algorithms depend upon the type of computer used, the particular algorithm selected to solve the model, and the kind of use of the model. We essentially distinguish between situations where we produce one solution at a time and situations where we want to solve a same model for a large number of different data sets. In order to compare serial algorithms with parallel algorithms, we will use the concepts of speedup and efficiency introduced in Section 4.1.4.

MIMD Computer. A MIMD (Multiple Instructions Multiple Data) computer possesses up to several hundred fairly powerful processors, which communicate efficiently and are each able to execute a different program.

The Jacobi algorithm. First-order iterative methods present a natural setup for parallel execution. This is particularly the case for the Jacobi method, where the computation of the n equations in Statement 2 of Algorithm 22 consists of n different and independent tasks, which can be executed on n processors at the same time. If we consider that the solution of one equation takes one time unit, the speedup for a parallel execution of the Jacobi method is T_1(n)/T_n(n) = n and the efficiency is 1, provided that we have n processors at our disposal. This potential certainly looks very appealing if parallel execution of the Jacobi algorithm is compared to a serial execution. In practice, however, Gauss-Seidel iterations are often much more attractive than the Jacobi method, which, in general, converges very slowly. On the other hand, the advantage of the Jacobi method is that its convergence does not depend on particular orderings of the
equations.

The Gauss-Seidel algorithm. In the case of Gauss-Seidel iterations, the system of equations (4.3) defines a causal structure among the variables y_i^{k+1}, i = 1, . . . , n. Indeed, each equation i defines a set of causal relations going from the right-hand side variables y_j^{k+1}, j = 1, . . . , i−1, to the left-hand side variable y_i^{k+1}. This causal structure can be formalized by means of a graph G = (Y^{k+1}, A), where the set of vertices Y^{k+1} = {y_1^{k+1}, . . . , y_n^{k+1}} represents the variables and A is the set of arcs. An arc y_j^{k+1} → y_i^{k+1} exists if the variable y_j^{k+1} appears in the right-hand side of the equation explaining y_i^{k+1}. The way the equations (4.3) are written¹ results in a graph G without circuits, i.e. a directed acyclic graph (DAG). This implies the existence of a hierarchy among the vertices: they can be partitioned into a sequence of sets, called levels, where arcs go only from lower numbered levels to higher numbered levels and where there are no arcs between vertices in a same level. As a consequence, all variables in a level can be updated in parallel. If we denote by q the number of levels existing in the DAG, the speedup for a parallel execution of a single iteration is S_p(n) = n/q and the efficiency is n/(pq).

Different orderings can yield different DAGs, and one might look for an ordering which minimizes the number of levels² in order to achieve the highest possible speedup. However, such an ordering can result in slower convergence (a larger number of iterations), and the question of an optimal ordering certainly remains open.

The construction of the DAG can best be illustrated by means of a small system of 12 equations, the incidence matrix of whose Jacobian matrix is shown on the left-hand side of Figure 4.11. We then consider a decomposition L + U of this incidence matrix, where L is lower triangular and U is upper triangular. The matrix on the right-hand side of Figure 4.11 has an ordering which minimizes³ the elements in U. Matrix L then corresponds to the incidence matrix of our directed acyclic graph G presented before, and the hierarchical ordering of the vertices into levels is also shown in Figure 4.11. We can notice the absence of arcs between vertices in a same level; therefore, the updating of the equations corresponding to these vertices constitutes different tasks which can all be executed in parallel. According to the number of vertices in the largest level, which is level 3, the speedup for this example, when using 5 processors, is S_5(12) = T_1(12)/T_5(12) = 12/4 = 3, with efficiency 12/(5 · 4) = 0.6. This definition neglects the time needed to communicate the results from the processors which update the equations in a level to the processors which need this information to update the equations in the next level. Therefore, the speedup one can expect in practice will be inferior.
¹ A variable y_j^{k+1} in the right-hand side of the equation explaining y_i^{k+1} always verifies i > j.
² This would correspond to a minimum coloring of G.
³ In general, such an ordering achieves a good convergence of the Gauss-Seidel iterations, see Section 3.3.
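The partition of the DAG vertices into levels can be computed by a longest-path sweep: the level of a vertex is one more than the maximum level of its predecessors (with level 1 for vertices without predecessors). A minimal sketch, assuming the Gauss-Seidel ordering guarantees j < i for every arc j → i so the recursion terminates (the function is ours, for illustration):

```python
def dag_levels(n, arcs):
    """Partition vertices 1..n of a DAG into levels.

    `arcs` is a list of pairs (j, i) meaning j -> i, i.e. variable j appears
    in the right-hand side of the equation explaining variable i.  All
    vertices within a level can be updated in parallel, so the speedup per
    iteration is n divided by the number of levels q.
    """
    preds = {i: [] for i in range(1, n + 1)}
    for j, i in arcs:
        preds[i].append(j)

    level = {}
    def lev(i):
        # level = 1 + longest chain of predecessors; 1 if no predecessors
        if i not in level:
            level[i] = 1 + max((lev(j) for j in preds[i]), default=0)
        return level[i]

    q = max(lev(i) for i in range(1, n + 1))
    return [[i for i in range(1, n + 1) if level[i] == k] for k in range(1, q + 1)]
```

For instance, a 4-equation system with arcs 1→2, 1→3, 2→4, 3→4 yields the three levels {1}, {2, 3}, {4}, so equations 2 and 3 can be updated simultaneously.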
Figure 4.11: Original and ordered Jacobian matrix and corresponding DAG.

SIMD Computer. A SIMD (Single Instruction Multiple Data) computer usually has a large number of processors (several thousand), which all execute the same code on different data sets stored in each processor's memory. SIMDs are therefore data parallel processing machines. The central locus of control is a serial computer called the front end. Neighboring processors can communicate data efficiently, but general interprocessor communication is associated with a significant loss of performance.

Single Solution. When solving a model for a single period with the Jacobi or the Gauss-Seidel algorithm, there are no opportunities for executing the same code with different data sets. Only the Newton-like method offers opportunities for data parallel processing in this situation. This concerns the evaluation of the Jacobian matrix and the solution of the linear system.⁴ We notice that the computations involved in approximating the Jacobian matrix are all independent. In particular, the elements of the matrices of order n in Statements 2 and 4 of Algorithm 23 can be evaluated in a single step with a speedup of n². For a given row i of the Jacobian matrix, we need to evaluate the function h_i for n + 1 different data sets (Statement 3 of Algorithm 23). Such a row can then be processed in parallel with a speedup of n + 1.
⁴ We do not discuss here the fine grain parallelization for the solution of the linear system, for which we used code from the CMSSL library for QR factorization, see [97].
Repeated Solution of a Same Model. For large models, the parallel implementation of the solution of a model can produce considerable speedup. However, these techniques really become attractive if a same model has to be solved repeatedly for different data sets. In econometric analysis, this is the case for stochastic simulation, optimal control, evaluation of forecast errors and linear analysis of nonlinear models. The extension of the solution techniques to the case where we solve the same model many times is immediate. For first-order iterative methods, equations (4.3) of the Gauss-Seidel method, for instance, become

y_{ir}^{(k+1)} = g_i(y_{1r}^{(k+1)}, . . . , y_{i−1,r}^{(k+1)}, y_{i+1,r}^{(k)}, . . . , y_{nr}^{(k)}, Z) ,  i = 1, . . . , n ,  r = 1, . . . , p ,   (4.5)
where the subscript r accounts for the different data sets. One equation g_i can then be computed at the same time for all p data sets.⁵ For the Jacobi method, we proceed in the same way. In a similar fashion, the Newton-like solution can be carried out simultaneously for all p different data sets. The p Jacobian matrices are represented by a 3-dimensional array J, where the element J_ijr represents the derivative of equation h_i with respect to variable y_j, evaluated for the r-th data set. Once again, the matrix J(i, 1 : n, 1 : p) can be computed at the same time for all elements. For the solution of the linear system, the aforementioned software from the CMSSL library can also exploit the situation where all the p linear problems are presented at once.
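The data-parallel idea of (4.5) — one equation, all p data sets at once — can be mimicked with array operations, applying each g_i elementwise across the index r. A toy sketch of ours (not the CM Fortran code), where on a SIMD machine the loop over r would run as a single parallel step:

```python
def gauss_seidel_multi(g, y, z):
    """One Gauss-Seidel sweep over n equations for p data sets at once.

    y[i] is a length-p list holding variable i for every data set r;
    g[i](y, z, r) evaluates the normalized equation g_i for data set r.
    The loop over i is sequential (Gauss-Seidel), but for each equation
    the same instruction is applied to all p data sets simultaneously.
    """
    n, p = len(y), len(y[0])
    for i in range(n):
        # data parallelism: identical code, p different data sets
        y[i] = [g[i](y, z, r) for r in range(p)]
    return y
```

Because updated components y[0..i−1] are read back within the same sweep, this is exactly the Gauss-Seidel recursion (4.5), replicated across the r dimension.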
4.2.3 Practical Results
In order to evaluate the interest of parallel computing in the econometric practice of model simulation, we decided to solve a real-world medium-sized nonlinear model. Our evaluation not only focuses on speedup but also comments on the programming needed to implement the solution procedures on a parallel computer. We used the Connection Machine CM2, a so-called massively parallel computer, equipped with 8192 processors. The programming language used is CM Fortran. The performances are compared against serial execution on a Sun ELC Sparcstation. The model we solved is a macroeconometric model of the Japanese economy, developed at the University of Tsukuba. It is a dynamic annual model, constituted of 98 linear and nonlinear equations and 53 exogenous variables. The dynamics are generated by the presence of endogenous variables lagged over one and two periods. The Jacobian matrix of the model, put into its block-recursive form, has the pattern common to most macroeconometric models, i.e. a large fraction of the variables are in one interdependent block preceded and followed by recursive equations. In our case, the interdependent block contains 77 variables. Six variables are defined recursively before the block, followed by 15 variables which do not feed back on the block. Figure 4.12 shows the block-recursive pattern of the Jacobian matrix, where big dots represent the nonzero elements and small dots correspond to zero entries.
⁵ Best performance will be obtained if the number of data sets equals the number of available processors.
Figure 4.12: Block-recursive pattern of the model's Jacobian matrix.
The model's parameters have been estimated on data going back to 1972, and we solved the model for the 10 periods from 1973 to 1982, for which both the Jacobi and Gauss-Seidel methods converged. The average number of iterations needed for the interdependent block to converge and the total execution time on a Sun ELC Sparcstation, which is rated at 24 Mips and 3.5 MFlops, are reported in Table 4.2.
Method          Average iterations   Execution time (seconds)
Jacobi                 511                    4.5
Gauss-Seidel            20                    0.18

Table 4.2: Execution times of the Gauss-Seidel and Jacobi algorithms.
Execution on a MIMD Computer

In the following, the theoretical performance of a parallelization of the solution algorithms is presented. According to the definition given above, we assume that every step in the algorithms needs one time unit, independently of whether it is executed on a single- or a multiple-processor machine. This, of course, neglects not only the difference existing between the performance of processors
but also the communication time between the processors. From the Jacobian matrix in Figure 4.12 we can count respectively 5, 1, 77, 10, 2, 2 and 1 variables in the seven levels from top to bottom. As the average number of iterations for the interdependent block is 511 for the Jacobi algorithm, a one-period solution of the model requires on average 5 + 1 + 511 ∗ 77 + 10 + 2 + 2 + 1 = 39368 equation evaluations, which, on a serial machine, are executed in 39368 time units. On a MIMD computer, this task could be executed with 77 processors in 1 + 1 + 511 + 1 + 1 + 1 + 1 = 517 time units, which results in a speedup of 39368/517 = 76.1 and an efficiency of 76.1/77 = .99. For the Gauss-Seidel algorithm, the average number of equations to solve for one period is 5 + 1 + 20 ∗ 77 + 10 + 2 + 2 + 1 = 1561, which then requires 1561 time units for serial execution. The order in which the equations are solved has been established by choosing a decomposition L + U of the Jacobian matrix which minimizes the number of nonzero columns in matrix U. Figure 4.13 reproduces the incidence matrix of L, which shows 18 levels and where big and small dots have the same meaning as in Figure 4.12.
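The operation counts above can be reproduced with a short script. This is our own sketch, not code from the thesis; the function name and structure are illustrative.

```python
# Sketch: serial evaluations vs. parallel time steps for a level
# decomposition, using the level sizes of Figure 4.12 quoted in the text.

def op_counts(levels, inner_iters, inner_level):
    """levels: number of equations per recursive level;
    inner_iters: iterations of the interdependent block;
    inner_level: index of the interdependent block in `levels`."""
    serial = sum(n if i != inner_level else inner_iters * n
                 for i, n in enumerate(levels))
    # On a MIMD machine each level costs one time unit per iteration,
    # using as many processors as the largest level requires.
    parallel = sum(1 if i != inner_level else inner_iters
                   for i, n in enumerate(levels))
    return serial, parallel

levels = [5, 1, 77, 10, 2, 2, 1]              # from the Jacobian in Figure 4.12
serial, parallel = op_counts(levels, 511, 2)  # Jacobi: 511 iterations
print(serial, parallel)                       # 39368 and 517, as in the text
print(round(serial / parallel, 1))            # speedup 76.1
```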
Figure 4.13: Matrix L for the Gauss-Seidel algorithm.

The maximum number of variables in a level of this incidence matrix is 8. The solution of the model on a MIMD computer then requires 1 + 1 + 20 ∗ 18 + 1 + 1 + 1 + 1 = 366 time units, giving a speedup of 1561/366 = 4.2 and an efficiency of 4.2/8 = .52. In serial execution, the Gauss-Seidel is 25 times faster than the Jacobi, but when these two algorithms are executed on a MIMD computer, this factor drops to 1.4. We also see that, for this case, the parallel Jacobi method is 3 times faster
than the serial Gauss-Seidel method, which means that we might be able to solve problems with the Jacobi method in approximately the same amount of time as with a serial Gauss-Seidel method. If the model's size increases, the solution on MIMD computers becomes more and more attractive. For the Jacobi method, the optimum is obviously reached if the number of equations in a level equals the number of available processors. For the Gauss-Seidel method, we observe that the ratio between the size of matrix L and the number of its levels has a tendency to increase for larger models. We computed this ratio for two large macroeconometric models, MPS and RDX2.⁶ For the MPS model, the largest interdependent block is of size 268 and has 28 levels, giving a ratio of 268/28 = 9.6. For the RDX2 model, this ratio is 252/40 = 6.3.

Execution on a SIMD Computer

The situation where we repeatedly solve the same model is ideal for the CM-2 SIMD computer. Due to the array-processing features implemented in CM FORTRAN, which naturally map onto the data parallel architecture⁷ of the CM-2, it is very easy to obtain an efficient parallelization of the steps in our algorithms which concern the evaluation of the model's equations, as is the case for Statement 2 in Figure 22 for the first-order algorithms. Let us consider one of the equations of the model, which, coded in FORTRAN for serial execution, looks like

   y(8) = exp(b0 + b1*y(15) + b3*log(y(12)) + b4*log(y(14)))

If we want to perform p independent solutions of our equation, we define the vector y to be a two-dimensional array y(n,p), where the second dimension corresponds to the p different data sets.
As CM FORTRAN treats arrays as objects, the p evaluations of the equation given above are simply coded as

   y(8,:) = exp(b0 + b1*y(15,:) + b3*log(y(12,:)) + b4*log(y(14,:)))

and the computations to evaluate the p components of y(8,:) are then executed on p different processors at once.⁸ To instruct the compiler that we want the p different data sets, corresponding to the columns of matrix y, in the memories of p different processors, we use a compiler directive.⁹ Repeated solutions of the model have then been experienced in a sensitivity analysis, where we shocked the values of some of the exogenous variables and observed the different forecasts. The empirical distribution of the forecasts then
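In plain Python, the same data-parallel idea can be mimicked by mapping the expression over the p columns. This is a sketch with made-up coefficients; on the CM-2 the mapping is done in hardware, with one virtual processor per column.

```python
# Sketch (not thesis code): an n-by-p array y, where the equation for y(8)
# is applied elementwise across the p data sets. The coefficients b0, b1,
# b3, b4 and the helper `row` are illustrative assumptions.
import math

n, p = 20, 4
b0, b1, b3, b4 = 0.1, 0.2, 0.3, 0.4
# y[i] holds the p copies of variable i+1 (1-based indexing as in the text)
y = [[1.0] * p for _ in range(n)]

def row(i):                      # hypothetical 1-based access helper
    return y[i - 1]

# The serial version evaluates one data set; the data-parallel version
# maps the same expression over all p columns "at once".
row(8)[:] = [math.exp(b0 + b1 * y15 + b3 * math.log(y12) + b4 * math.log(y14))
             for y15, y12, y14 in zip(row(15), row(12), row(14))]
print(len(row(8)))               # p results computed in one statement
```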
⁶ See Helliwell et al. [63] and Brayton and Mauskopf [19].
⁷ The essence of the CM system is that it stores array elements in the memories of separate processors and operates on multiple elements at once. This is called data parallel processing. For instance, consider an n × m array B and the statement B=exp(B). A serial computer would execute this statement in nm steps. The CM machine, in contrast, provides a virtual processor for each of the nm data elements and each processor needs to perform only one computation.
⁸ The colon indicates that the second dimension runs over all columns.
⁹ The statement layout y(:serial,:news) instructs the compiler to lay out the second dimension of array y across the processors.
provides an estimate of their standard deviation. The best results, in terms of speedup, are obtained if the number of repeated solutions equals the number of processors available in the computer.¹⁰ We therefore generated 8192 different sets of exogenous variables and produced the corresponding forecasts. The time needed to solve the model 8192 times for the ten time periods with the Gauss-Seidel algorithm is reported in Table 4.3, where we also give the time spent executing the different statements.
Statements                                          CM-2 (seconds)   Sun ELC (seconds)   Speedup
1. y0 = y1                                               1.22              100
2. Evaluate all equations                                5.44              599
3. not converged = any(|y1 − y0|/(|y0| + 1) > η)        13.14              366
Total time                                              22.2              1109              50
Modified algorithm                                      12.7               863              68
Table 4.3: Execution times on the CM-2 and the Sun ELC.

At this point, the ratio of the total execution times gives us a speedup of approximately 50, which, as we will see, can be further improved. We observe that the execution of Statement 3 on the CM-2 takes more than twice the time needed to evaluate all equations. One of the reasons is that this statement involves communication between the processors. Statements 1 and 3 together, which in the algorithm serve exclusively to check for convergence, need approximately as much time as two iterations over all equations. We therefore suggest a modified algorithm, where the first test for convergence takes place after k1 iterations and all subsequent tests every k2 iterations.¹¹

Algorithm 24 Modified First-order Iterative Methods
do i = 1 : k1 , Evaluate all equations, enddo
do while ( not converged )
   1.  y0 = y1
   2.  do i = 1 : k2 , Evaluate all equations, enddo
   3.  not converged = any( |y1 − y0| / (|y0| + 1) > η )
enddo
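A minimal Python sketch of this modified scheme, applied to a small illustrative linear system; the two equations, tolerance and parameter values are ours, not the model's.

```python
# Sketch of the modified first-order iterative method: k1 sweeps before
# the first convergence test, then a test every k2 sweeps.

def sweep(y):
    # One Gauss-Seidel iteration for  y1 = 0.5*y2 + 1,  y2 = 0.25*y1 + 1
    y[0] = 0.5 * y[1] + 1.0
    y[1] = 0.25 * y[0] + 1.0

def solve(k1, k2, eta=1e-10):
    y = [0.0, 0.0]
    for _ in range(k1):                 # k1 iterations before the first test
        sweep(y)
    converged = False
    while not converged:
        y0 = y[:]                       # 1. save the previous iterate
        for _ in range(k2):             # 2. k2 sweeps between tests
            sweep(y)
        # 3. convergence criterion of the algorithm
        converged = all(abs(a - b) / (abs(b) + 1) <= eta
                        for a, b in zip(y, y0))
    return y

y = solve(k1=5, k2=2)
print(y)    # fixed point of the two test equations
```

Testing for convergence only every k2 sweeps trades a few possibly superfluous iterations against the cost of the test itself, which is the point of the modification.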
According to the results reported in the last row of Table 4.3, such a strategy increases the performance of the execution on both machines, but not in equal proportions, which leads to a speedup of 68. We also solved the model with the Newton-like algorithm on the Sun and on the CM-2. We recall, as already mentioned in Section 4.2.1, that for sparse Jacobian matrices like ours, the Newton-like method is, in general, only applied to the
¹⁰ If the data set is larger than the set of physical processors, each processor processes more than one data set consecutively.
¹¹ The parameters k1 and k2 depend, of course, on the particular problem and can be guessed in preliminary executions. For our application, we chose k1 = 20 and k2 = 2 for the execution on the CM-2, and k1 = 20, k2 = 1 for the serial execution.
subset of spike variables. However, the cardinality of the set of spike variables for the interdependent block is five, and a comparison of the execution times for such a small problem would be meaningless. Therefore, we solved the complete interdependent block without any decomposition, which certainly is not a good strategy.¹² The execution times for the main steps of the Newton-like algorithm needed to solve the model for ten periods are reported in Table 4.4.
Statements                                                             Sun (seconds)   CM-2 (seconds)
2. X = hI + ι y_0′                                                          0.27             0.23
3. for i = 1 : n, evaluate a_{ij} = f_i(x_{.j}, z), j = 1 : n,
   and f_i(y_0, z)                                                          2.1              122
4. J = (A − ι F(y_0, z)′)/h                                                 1.1              0.6
5. solve J s = F(y_0, z)                                                   19.7               12

Table 4.4: Execution times on the Sun ELC and the CM-2 for the Newton-like algorithm.
To achieve numerical stability in the evaluation of the Jacobian matrix, Statements 2, 3 and 4 have to be computed in double precision. Unfortunately, the CM-2 hardware is not designed to execute such operations efficiently: the evaluation of an arithmetic expression in double precision is about 60 times slower than the same evaluation in single precision. This inconvenience does not apply to the CM-200 model. By dividing the execution time for Statement 3 by a factor of 60, we get approximately the execution time for the Sun. Since the Sun processor is about 80 times faster than a processor on the CM-2, and since we have to compute 78 columns in parallel, we have just about reached the point beyond which the CM-200 would be faster. Statements 2 and 4 operate on n² elements in parallel and therefore their execution is faster on the CM-2. From these results, we conclude that the CM-2 is certainly not suited for data parallel processing in double precision. However, with the CM-200 model a significant speedup will be obtained if the size of the model to be solved exceeds 80.
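The finite-difference Newton step behind Statements 2-5 can be sketched as follows. The two-equation test system, the step size h and all helper names are illustrative assumptions, not the thesis code: each column j of the Jacobian is approximated by (F(y0 + h e_j) − F(y0))/h, and the step solves J s = F(y0).

```python
# Sketch of one Newton-like step with a forward-difference Jacobian.

def F(y):   # illustrative 2-equation system, not from the model
    return [y[0] ** 2 + y[1] - 3.0,
            y[0] + y[1] ** 2 - 5.0]

def newton_step(F, y0, h=1e-7):
    n = len(y0)
    f0 = F(y0)
    # Forward-difference Jacobian: one perturbed evaluation per column
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x = y0[:]
        x[j] += h
        fj = F(x)
        for i in range(n):
            J[i][j] = (fj[i] - f0[i]) / h
    # Solve J s = f0 by Gaussian elimination with partial pivoting
    a = [J[i][:] + [f0[i]] for i in range(n)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[piv] = a[piv], a[k]
        for r in range(k + 1, n):
            m = a[r][k] / a[k][k]
            for c in range(k, n + 1):
                a[r][c] -= m * a[k][c]
    s = [0.0] * n
    for k in range(n - 1, -1, -1):
        s[k] = (a[k][n] - sum(a[k][c] * s[c]
                              for c in range(k + 1, n))) / a[k][k]
    return [yi - si for yi, si in zip(y0, s)]

y = [1.0, 1.0]
for _ in range(20):
    y = newton_step(F, y)
print(y)    # converges to a root of F
```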
¹² This problem has, for instance, been discussed in Gilli et al. [52]. The execution time for the Newton method can therefore not be compared with the execution time for the first-order iterative algorithms.
Chapter 5
Rational Expectations Models
Nowadays, many large-scale models explicitly include forward expectation variables that allow for a better accordance with the underlying economic theory and also provide a response to the Lucas critique. These ongoing efforts have given rise to numerous models currently in use in various countries. Among others, we may mention MULTIMOD (Masson et al. [79]), used by the International Monetary Fund, and MX3 (Gagnon [41]), used by the Federal Reserve Board in Washington; model QPM (Armstrong et al. [5]) from the Bank of Canada; model Quest (Brandsma [18]), constructed and maintained by the European Commission; and models MSG and NIGEM, analyzed by the Macro Modelling Bureau at the University of Warwick. A major technical issue introduced by forward-looking variables is that the system of equations to be solved becomes much larger than in the case of conventional models. Solving rational expectations models often constitutes a challenge and therefore provides an ideal testing ground for the solution techniques discussed earlier.
5.1 Introduction
Before the more recent efforts to explicitly model expectations, macroeconomic model builders took the forward-looking behavior of agents into account by including distributed lags of the variables in their models. This actually comes down to supposing that individuals predict a future or current variable by looking only at past values of the same variable. In practice, this explains economic behavior relatively well as long as the way people form their expectations does not change. These ideas have been explored by many economists, which has led to the so-called "adaptive hypothesis." This theory states that the individual's expectations react to the difference between actual values and expected values of the variable in question. If the level of the variable is not the same as the forecast,
the agents use their forecasting error to revise their expectation of the next period's forecast. This setting can be summarized in the following equation¹

   x^e_t − x^e_{t−1} = λ(x_{t−1} − x^e_{t−1}) ,   with 0 < λ ≤ 1 ,

where x^e_t is the forecasted, non-observable value of variable x at period t. By rearranging terms we get

   x^e_t = λ x_{t−1} + (1 − λ) x^e_{t−1} ,

i.e. the forecasted value for the current period is a weighted average of the true value of x at period t − 1 and the forecasted value at period t − 1. Repeatedly substituting lagged values of x^e in this expression yields

   x^e_t = λ Σ_{i=1}^{∞} (1 − λ)^{i−1} x_{t−i} .
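The adaptive-expectations recursion and its distributed-lag form can be checked against each other numerically; the series and the value of λ below are illustrative.

```python
# Sketch: the recursion x_e[t] = lam*x[t-1] + (1-lam)*x_e[t-1] and the
# geometric distributed-lag form give the same forecast (starting the
# recursion at 0 matches an all-zero pre-sample).

lam = 0.4
x = [1.0, 2.0, 1.5, 3.0, 2.5, 2.0, 4.0, 3.5]   # observed series

# Recursive form
xe = 0.0
for xt in x:
    xe = lam * xt + (1 - lam) * xe

# Distributed-lag form: lam * sum over past observations, newest first,
# with geometrically declining weights (1-lam)^i
dl = lam * sum((1 - lam) ** i * xlag
               for i, xlag in enumerate(reversed(x)))

print(abs(xe - dl) < 1e-12)    # the two forms coincide
```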
The speed at which the agents adjust is determined by the parameter λ. In this framework, individuals constantly underpredict the value of the variable whenever the variable constantly increases. This behavior is labelled "not rational" since agents make systematic errors in their expected level of x. It is therefore argued that individuals would not behave that way, since they can do better. Like many other economists, Lucas [77] criticized this assumption and proposed models consistent with the underlying idea that agents do their best with the information they have. The "rational expectations hypothesis" is an alternative which takes such criticisms into account. We assume that individuals use all information efficiently and do not systematically make errors in their predictions. Agents use the knowledge they have to form views about the variables entering their model and produce expectations of the variables they want to use to make their decisions. This role of expectations in economic behavior has been recognized for a long time. The explicit use of the term "rational expectations" goes back to the work of Muth in 1961, as reprinted in Lucas and Sargent [75]. Rational expectations are the true mathematical expectations of future and current variables conditional on the information set available to the agents. This information set at date t − 1 contains all the data on the variables of the model up to the end of period t − 1, as well as the relevant economic model itself. An implication of the rational expectations hypothesis is that the policies the government may try to impose are ineffective, since agents forecast the change in the variables using the same information basis. Even in the short term, systematic interventions of monetary or fiscal policy only affect the general level of prices, but not real output or employment.
A criticism often made against this hypothesis is that obtaining the information is costly, and therefore not all information may be used by the agents. Moreover, not all agents may build a good model to reach their forecasts. An answer to these points is that even if not all individuals are well informed and produce good predictions, some individuals will be. An arbitrage process can therefore
¹ The model is sometimes expressed as x^e_t − x^e_{t−1} = λ(x_t − x^e_{t−1}). In this case, the expectation formation is somewhat different, but the conclusions are similar for our purposes.
take place, and the agents who have the correct information can make profits by selling the information to the ill-informed ones. This process will lead the economy to the rational expectations equilibrium. Fair [33] summarizes the main characteristics of rational expectations (RE) models, such as those of Lucas [76], Sargent [91] and Barro [9], in the following three points:

• the assumption that expectations are rational, i.e. consistent with the model;

• the assumption that information is imperfect regarding the current state of the economy;

• the postulation of an aggregate supply equation in which aggregate supply is a function of exogenous terms plus the difference between the actual and the expected price level.

These characteristics imply that government actions have an influence on real output only if these actions are unanticipated. The second assumption allows for an effect of government actions on aggregate supply, since the difference between the actual and expected price levels may be influenced. A systematic policy, however, is anticipated by the agents and cannot affect aggregate supply. One of the key conclusions of new-classical authors is that existing macroeconometric models were not able to provide guidance for economic policy. This critique has led economists to incorporate expectations in the relationships of small macroeconomic models, as in Sargent [92] and Taylor [96]. By now this idea has been adopted as a part of new-classical economics. The introduction of the rational expectations hypothesis opened new research topics in econometrics that have given rise to important contributions. According to Beenstock [12, p. 141], the main issues of the RE hypothesis are the following:

• Positive economics, i.e. what are the effects of policy interventions if rational expectations are assumed?

• Normative economics, i.e. given that expectations are rational, ought the government to take policy actions?

• Hypothesis testing, i.e.
does evidence in empirical studies support the rational expectations hypothesis? Are such models superior to others which do not include this hypothesis?

• Techniques, i.e. how does one solve a model with rational expectations? What numerical issues are faced? How can we efficiently solve large models with forward-looking variables?

These problems, particularly the first three, have been addressed by many authors in the literature. New procedures for the estimation of the model's parameters were set forth in McCallum [80], Wallis [101], Hansen and Sargent [62], Wickens [102], Fair and
Taylor [34] and Nijman and Palm [84]. The presence of forward-looking variables also creates a need for new solution techniques for simulating such models; see for example Fair and Taylor [34], Hall [61], Fisher and Hughes Hallett [37], Laffargue [74] and Boucekkine [17]. In the next section, we introduce a formulation of RE models that will be used to analyze the structure of such models. A second section discusses issues of uniqueness and stability of the solutions.
5.1.1 Formulation of RE Models
Using the conventional notation (see for instance Fisher [36]), a dynamic model with rational expectations is written

   f_i(y_t, y_{t−1}, . . . , y_{t−r}, y_{t+1|t−1}, . . . , y_{t+h|t−1}, z_t) = 0 ,   i = 1, . . . , n ,   (5.1)
where y_{t+j|t−1} is the expectation of y_{t+j} conditional on the information available at the end of period t − 1, and z_t represents the exogenous and random variables. For consistent expectations, the forward expectations y_{t+j|t−1} have to coincide with the next period's forecast when solving the model conditional on the information available at the end of period t − 1. These expectations are therefore linked forward in time, and solving model (5.1) for each y_t conditional on some start period 0 requires each y_{t+j|0}, for j = 1, 2, . . . , T − t, and terminal conditions y_{T+j|0}, j = 1, . . . , h. We will now consider that the system of equations contains one lag and one lead, that is, we set r = 1 and h = 1 in (5.1) so as to simplify notation. In order to discuss the structure of our system of equations, it is also convenient to resort to a linearization. System (5.1) therefore becomes

   D^t y_t + E^{t−1} y_{t−1} + A^{t+1} y_{t+1|t−1} = z_t ,   (5.2)

where D^t = ∂f/∂y_t, E^{t−1} = −∂f/∂y_{t−1} and A^{t+1} = −∂f/∂y_{t+1|t−1}. Stacking up the system for period t + 1 to period t + T we get

   | E^{t+0}  D^{t+1}  A^{t+2}                                   | | y_{t+0}   |   | z_{t+1}   |
   |          E^{t+1}  D^{t+2}  A^{t+3}                          | | y_{t+1}   |   | z_{t+2}   |
   |                     .        .        .                     | |    .      | = |    .      |   (5.3)
   |                E^{t+T−2}  D^{t+T−1}  A^{t+T}                | | y_{t+T}   |   | z_{t+T−1} |
   |                           E^{t+T−1}  D^{t+T}  A^{t+T+1}     | | y_{t+T+1} |   | z_{t+T}   |

and a consistent expectations solution to (5.3) is then obtained by solving

   J y = b ,
where J is the T × T block matrix in Equation (5.3) without its first and last block columns, y = [y_{t+1} . . . y_{t+T}]′, and

   b = [z_{t+1} . . . z_{t+T}]′ + [−E^{t+0} y_{t+0}  0 . . . 0]′ + [0 . . . 0  −A^{t+T+1} y_{t+T+1}]′

are the stacked vectors of endogenous, respectively exogenous, variables, which contain the initial condition y_{t+0} and the terminal condition y_{t+T+1}.
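The stacked-system approach can be sketched for a scalar toy model; the coefficients, horizon and boundary values below are illustrative, not taken from the text.

```python
# Sketch: building and solving the stacked system J y = b for a scalar
# (n = 1) linearized model  D*y[t] + E*y[t-1] + A*y[t+1] = z[t].

def solve_stacked(D, E, A, z, y_init, y_term):
    T = len(z)
    # J: D on the diagonal, E below, A above (1x1 "blocks")
    J = [[0.0] * T for _ in range(T)]
    for t in range(T):
        J[t][t] = D
        if t > 0:
            J[t][t - 1] = E
        if t < T - 1:
            J[t][t + 1] = A
    # b absorbs the initial and terminal conditions
    b = z[:]
    b[0] -= E * y_init
    b[-1] -= A * y_term
    # Gaussian elimination (no pivoting; fine for this dominant diagonal)
    for k in range(T - 1):
        m = J[k + 1][k] / J[k][k]
        for c in range(T):
            J[k + 1][c] -= m * J[k][c]
        b[k + 1] -= m * b[k]
    y = [0.0] * T
    for k in range(T - 1, -1, -1):
        s = sum(J[k][c] * y[c] for c in range(k + 1, T))
        y[k] = (b[k] - s) / J[k][k]
    return y

y = solve_stacked(D=1.0, E=-0.5, A=-0.3, z=[1.0] * 5, y_init=0.0, y_term=0.0)
print(y)    # path satisfying the stacked equations and boundary conditions
```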
5.1.2 Uniqueness and Stability Issues
To study the dynamic properties of RE models, we begin with the standard presentation of conventional models containing only lagged variables. The structural form of the general linear dynamic model is usually written as

   B(L) y_t + C(L) x_t = u_t ,

where B(L) and C(L) are matrices of polynomials in the lag operator L and u_t is a vector of error terms. The matrices are respectively of size n × n and n × g, where n is the number of endogenous variables and g the number of exogenous variables, and are defined as

   B(L) = B_0 + B_1 L + · · · + B_r L^r ,   C(L) = C_0 + C_1 L + · · · + C_s L^s .

The model is generally subject to some normalization rule, i.e. diag(B_0) = [1 1 . . . 1]. The dynamics of the model are stable if the determinantal polynomial B(z) = det(Σ_{j=0}^{r} B_j z^j) has all its roots λ_i, i = 1, 2, . . . , nr, outside the unit circle. The endogenous variables y_t can be expressed as

   y_t = −B(L)^{−1} C(L) x_t + B(L)^{−1} u_t .   (5.4)
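The stability condition (all roots of the determinantal polynomial outside the unit circle) can be checked numerically for a small case. The sketch below uses a scalar AR(2) lag polynomial with coefficients of our own choosing; for degree two the roots come from the quadratic formula.

```python
# Sketch: stability check for a scalar lag polynomial
# B(z) = b0 + b1*z + b2*z^2 via its roots.
import cmath

def roots_quadratic(b0, b1, b2):
    """Roots of b0 + b1*z + b2*z**2."""
    disc = cmath.sqrt(b1 * b1 - 4 * b2 * b0)
    return [(-b1 + disc) / (2 * b2), (-b1 - disc) / (2 * b2)]

def is_stable(roots):
    # Stable dynamics: every root of B(z) lies outside the unit circle
    return all(abs(r) > 1.0 for r in roots)

# y_t = 1.1*y_{t-1} - 0.3*y_{t-2} + ...  gives  B(z) = 1 - 1.1 z + 0.3 z^2,
# with roots 2 and 5/3, both outside the unit circle
print(is_stable(roots_quadratic(1.0, -1.1, 0.3)))    # True
```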
Assuming the stability of the system, the distributed lag on the exogenous variables x_t is nonexplosive. If there exists at least one root with modulus less than unity, the solution for y_t is explosive. Wallis [100] writes Equation (5.4) using |B(L)|, the determinant of the matrix polynomial B(L), and b(L), the adjoint matrix of B(L):

   y_t = − (b(L)/|B(L)|) C(L) x_t + (b(L)/|B(L)|) u_t .   (5.5)

To study stability, let us write the determinantal polynomial in factored form

   B(z) = Σ_{i=0}^{nr} β_i z^i = β_{nr} Π_{i=1}^{nr} (z − λ_i) ,   (5.6)

where λ_i, i = 1, 2, . . . , nr, are the roots of B(z) and β_{nr} is the coefficient of the term z^{nr}. Assuming β_{nr} ≠ 0, the polynomial defined in (5.6) has the same roots as the polynomial

   Π_{i=1}^{nr} (z − λ_i) .   (5.7)
When β_0 ≠ 0, none of the λ_i, i = 1, . . . , nr, can be zero. Therefore, we may express a polynomial with the same roots as (5.7) by the following expression

   Π_{i=1}^{nr} (1 − z/λ_i) = Π_{i=1}^{nr} (1 − µ_i z) ,   with µ_i = 1/λ_i .   (5.8)

Assuming that the model is stable, i.e. that |λ_i| > 1, i = 1, 2, . . . , nr, we have that |µ_i| < 1, i = 1, 2, . . . , nr. We now consider the expression 1/B(z) that arises in (5.5) and develop the expansion in partial fractions of the ratio of polynomials 1/Π(1 − µ_i z) (the numerator being the polynomial '1'), which has the same roots as 1/B(z):

   1 / Π_{i=1}^{nr} (1 − µ_i z) = Σ_{i=1}^{nr} c_i / (1 − µ_i z) .   (5.9)
For a stable root, say |λ| > 1 (|µ| < 1), the corresponding term of the right-hand side of (5.9) may be expanded into a polynomial of infinite length

   1/(1 − µz) = 1 + µz + µ²z² + · · ·   (5.10)
             = 1 + z/λ + z²/λ² + · · · .   (5.11)
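The convergent expansion for a stable root can be checked numerically; this short sketch, with illustrative values of µ and z, compares a truncated sum of (5.10) with the closed form.

```python
# Sketch: for |mu*z| < 1 the geometric expansion of 1/(1 - mu*z)
# converges; a truncated sum already matches the closed form closely.

mu, z = 0.5, 0.9            # illustrative values with |mu*z| < 1
exact = 1.0 / (1.0 - mu * z)
series = sum((mu * z) ** k for k in range(60))
print(abs(exact - series))  # truncation error is negligible
```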
This corresponds to an infinite series in the lags of x_t in expression (5.5). In the case of an unstable root, |λ| < 1 (|µ| > 1), the expansion (5.11) is not defined, but we can use an alternative expansion as in [36, p. 75]

   1/(1 − µz) = −µ^{−1}z^{−1} / (1 − µ^{−1}z^{−1})   (5.12)
             = −(λz^{−1} + λ²z^{−2} + · · ·) .   (5.13)
In this formula, we get an infinite series of distributed leads in terms of Equation (5.5) by defining L^{−1} x_t = F x_t = x_{t+1}. Therefore, the expansion depends on the future path of x_t. This expansion will allow us to find conditions for stability in models with forward-looking variables. Consider the model

   B_0 y_t + Ã(F) y_{t+1|t−1} = C_0 x_t + u_t ,   (5.14)
where for simplicity no polynomial lag is applied to the current endogenous and exogenous variables. As previously, y_{t+1|t−1} denotes the conditional expectation of variable y_{t+1} given the information set available at the end of period t − 1. The matrix polynomial in the forward operator F is defined as

   Ã(F) = Ã_0 + Ã_1 F + · · · + Ã_h F^h ,

and has dimensions n × n. The forward operator affects the dating of the variable but not the dating of the information set, so that F y_{t+1|t−1} = y_{t+2|t−1}.
When solving the model for consistent expectations, we set y_{t+i|t−1} to the solution of the model, i.e. y_{t+i}. In this case, we may rewrite model (5.14) as

   A(F) y_t = C_0 x_t + u_t ,   (5.15)
with A(F) = B_0 + Ã(F). The stability of the model is therefore governed by the roots γ_i of the polynomial |A(z)|. When |γ_i| > 1 for all i, the model is stable and we can use an expansion similar to (5.11) to get an infinite distributed lead on the exogenous term x_t. On the other hand, if there exist unstable roots |γ_i| < 1 for some i, we may resort to expansion (5.13), which yields an infinite distributed lag. As shown in Fisher [36, p. 76], we need |γ_i| > 1 for all i in order to get a unique stable solution. To solve (5.15) for y_{t+i}, i = 1, 2, . . . , T, we must choose values for the terminal conditions y_{t+T+j}, j = 1, 2, . . . , h. The solution path selected for y_{t+i}, i = 1, 2, . . . , T, depends on the values of these terminal conditions; different conditions on the y_{t+T+j} generate different solution paths. If there exists an unstable root |γ_i| < 1 for some i, then part of the solution only depends on lagged values of x_t. One may therefore freely choose some terminal conditions by selecting values of x_{t+T+j}, j = 1, . . . , h, without changing the solution path. This would hence allow for multiple stable trajectories. We will therefore usually require the model to be stable in order to obtain a unique stable path. The general model includes lags and leads of the endogenous variables and may be written

   B(L) y_t + A(F) y_{t+1|t−1} + C(L) x_t = u_t ,

or equivalently

   D(L, F) y_t + C(L) x_t = u_t ,

where D(L, F) = B(L) + A(F), and we suppose the expectations to be consistent with the model solution. Recalling that L = F^{−1} in our notation, the stability conditions depend on the roots of the determinantal polynomial |D(z, z^{−1})|. To obtain these roots, we resort to a factorization of the matrix D(z, z^{−1}). When there is no zero root, we may write

   D(z, z^{−1}) = U(z) W V(z^{−1}) ,

where U(z) is a matrix polynomial in z, W is a nonsingular matrix and V(z^{−1}) is a matrix polynomial in z^{−1}. The roots of D(z, z^{−1}) are those of U(z) and V(z^{−1}).
Since we assumed that there exists no zero root, we know that if V(z^{−1}) = v_0 + v_1 z^{−1} + · · · + v_h z^{−h} has roots δ_i, i = 1, . . . , h, then Ṽ(z) = v_h + v_{h−1} z + · · · + v_0 z^h has roots 1/δ_i, i = 1, . . . , h. The usual stability condition is that U(z) and Ṽ(z) must have roots outside the unit circle. This is therefore equivalent to saying that U(z) has roots in modulus greater than unity, whereas V(z^{−1}) has roots in modulus less than unity. With these conditions, the model must have as many unstable roots as there are expectation terms, i.e. h. In this case, we can define infinite distributed lags and leads that allow the selection of a unique stable path for y_{t+j}, j = 1, 2, . . . , T, given the terminal conditions.
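The root condition above is easy to check numerically for a scalar lead polynomial. The following sketch (the polynomial coefficients are illustrative, not taken from any model in this chapter) tests whether all roots γ_i lie outside the unit circle:

```python
import numpy as np

def lead_polynomial_roots(coeffs):
    """Roots of a(z) = a_0 + a_1 z + ... + a_h z^h.

    numpy.roots expects coefficients ordered from the highest
    degree down, hence the reversal."""
    return np.roots(np.asarray(coeffs)[::-1])

# Illustrative polynomial a(z) = 1 - 0.5 z, with a single root at z = 2
gamma = lead_polynomial_roots([1.0, -0.5])
unique_stable = bool(np.all(np.abs(gamma) > 1.0))  # |gamma_i| > 1 for all i
```

For a multivariate model the same idea applies to the roots of the determinantal polynomial rather than to a scalar polynomial.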
5.2 The Model MULTIMOD
In this section, we present the model we used in our solution experiments. First, a general overview of the model is given in Section 5.2.1; then, the equations that compose an industrialized country are presented in Section 5.2.2. Section 5.2.3 brieﬂy introduces the structure of the complete model.
5.2.1 Overview of the Model
MULTIMOD (Masson et al. [79]) is a multi-region econometric model developed by the International Monetary Fund in Washington. The model is available upon request and is therefore widely used for academic research. MULTIMOD is a forward-looking dynamic annual model that describes the economic behavior of the whole world, decomposed into eight industrial zones and the rest of the world. The industrial zones correspond to the G7 and a zone called “small industrial countries” (SI), which collects the rest of the OECD. The rest of the world comprises two developing zones, i.e. high-income oil exporters (HO) and other developing countries (DC). The model mainly distinguishes three goods: oil, primary commodities and manufactures. The short-run behavior of the agents is described by error correction mechanisms with respect to their long-run theory-based equilibrium. Forward-looking behavior is modeled in real wealth, interest rates and in the price level determination.

The specification of the models for industrialized countries is the same and consists of 51 equations each. These equations explain aggregate demand, taxes and government expenditures, money and interest rates, prices and supply, and international balances and accounts. Consumption is based on the assumption that households maximize the discounted utility of current and future consumption subject to their budget constraint. Consumption is assumed to be influenced by changes in disposable income, wealth and real interest rates. The demand for capital follows Tobin's theory, i.e. the change in the net capital stock is determined by the gap between the value of existing capital and its replacement cost. Conventional import and export functions depending upon price and demand are used to model trade; this makes the simulation of a single country model meaningful. Trade flows for industrial countries are disaggregated into oil, manufactured goods and primary commodities.
Government intervention is represented by public expenditures, money supply, bonds supply and taxes. Like most macroeconomic models, MULTIMOD describes the LM curve by a conventional demand for money balances and a money supply consisting of a reaction function of the short-term interest rate to a nominal money target, except for France, Italy and the smaller industrial countries, where it is assumed that Central Banks move short-term interest rates to limit movements of their exchange rates with respect to the Deutsche Mark. The aggregate supply side of the model is represented by an inflation equation containing the expected inflation rate. A set of equations covers the current account balance, the net international asset or liability position and the determination of exchange rates. As already mentioned, the rest of the world is aggregated into two regions, i.e.
capital exporting countries (mainly represented by high-income oil exporters) and other developing countries. The main difference between the developing country model and the industrial country models is that the former is finance constrained, and therefore its imports and its domestic investment depend on its ability to service an additional debt. The output of the developing region is disaggregated into one composite manufactured good, oil, primary commodities and one nontradable good. The group of high-income oil exporters does not face constraints on its balance of payments financing and has a different structure, as oil exports constitute a large fraction of its GNP.

Many equations are estimated on pooled data from 1965 to 1987. As a result, the parameters in many behavioral equations are constrained to be the same across the regions, except for the constant term. The MULTIMOD model is designed for evaluating effects of economic policies and other changes in the economic environment around a reference path given exogenously. Thus, the model is not designed to provide so-called baseline forecasts. The model is calibrated through residuals in regression equations to satisfy a baseline solution.
5.2.2 Equations of a Country Model
Hereafter, we reproduce the set of 51 equations defining the country model of Japan. The country models for the other industrial zones (CA, FR, GR, IT, UK, US, SI) all have the same structure. Variables are preceded by a two-letter label indicating the country they belong to. These labels are listed in Table 5.1. The model for capital exporting countries (HO) contains 10 equations, and the one for the other developing countries 33. A final set of 14 equations describes the links between the country models.
Label  Country          Label  Country           Label  Country
CA     Canada           JA     Japan             HO     Capital Exporting
FR     France           UK     United Kingdom    DC     Other Developing
GR     Germany          US     United States
IT     Italy            SI     Small Industrial
Table 5.1: Labels for the zones/countries considered in MULTIMOD.

Each country's coefficients are identified by a one-letter prefix that is the first letter of the country name (except for the United Kingdom, where the symbol E is used). The notation DEL(n:y) defines the nth difference of variable y, e.g. DEL(1:y) = y − y(−1).
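The DEL operator can be sketched as a repeated first difference (a minimal stand-in; the data are illustrative):

```python
import numpy as np

def DEL(n, y):
    """n-th difference of the series y: DEL(1:y) = y - y(-1),
    DEL(2:y) = DEL(1:y) - DEL(1:y)(-1), and so on.
    The returned array is shorter than y by n observations."""
    d = np.asarray(y, dtype=float)
    for _ in range(n):
        d = d[1:] - d[:-1]
    return d

y = [100.0, 103.0, 107.0, 112.0]
d1 = DEL(1, y)   # first differences
d2 = DEL(2, y)   # second differences
```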
JC:     DEL(1:LOG(JA_C)) = JC0 + JC1*LOG(JA_W(1)/JA_C(1)) + JC2*JA_RLR
        + JC3*DEL(1:LOG(JA_YD)) + JC4*JA_DEM3 + JC5*DUM80 + RES_JA_C
JCOIL:  DEL(1:LOG(JA_COIL)) = JCOIL0 + JCOIL1*DEL(1:LOG(JA_GDP))
        + JCOIL2*DEL(1:LOG(POIL/JA_PGNP)) + JCOIL3*LOG(POIL(1)/JA_PGNP(1))
        + JCOIL4*LOG(JA_GDP(1)/JA_COIL(1)) + RES_JA_COIL
JK:     DEL(1:LOG(JA_K)) = JK0 + JK1*LOG(JA_WK/JA_K(1))
        + JK2*LOG(JA_WK(1)/JA_K(2)) + RES_JA_K
JINV:   JA_INVEST = DEL(1:JA_K) + JA_DELTA*JA_K(1) + RES_JA_INVEST
JXM:    DEL(1:LOG(JA_XM)) = JXM0 + JXM1*DEL(1:JA_REER) + JXM2*DEL(1:LOG(JA_FA))
        + JXM3*LOG(JA_XM(1)/JA_FA(1)) + JXM4*JA_REER(1) + JXM5*TME
        + JXM6*TME**2 + RES_JA_XM
JXA:    JA_XMA = JA_XM + T1*(WTRADER - TRDER)
JXT:    JA_XT = JA_XMA + JA_XOIL
JIM:    DEL(1:LOG(JA_IM)) = JIM0 + JIM1*DEL(1:JIM7*LOG(JA_A)
        + (1 - JIM7)*LOG(JA_XMA)) + JIM2*DEL(1:LOG(JA_PIMA/JA_PGNPNO))
        + JIM3*LOG(JA_PIMA(1)/JA_PGNPNO(1)) + JIM4*(UIM7*LOG(JA_A(1))
        + (1 - UIM7)*LOG(JA_XMA(1)) - LOG(JA_IM(1))) + JIM5*TME
        + JIM6*TME**2 + RES_JA_IM
JIOIL:  JA_IOIL = JA_COIL + JA_XOIL - JA_PRODOIL + RES_JA_IOIL
JIC:    DEL(1:LOG(JA_ICOM)) = JIC0 + JIC2*DEL(1:LOG(PCOM/JA_ER/JA_PGNP))
        + JIC1*DEL(1:LOG(JA_GDP)) + JIC3*LOG(PCOM(1)/JA_ER(1)/JA_PGNP(1))
        + JIC4*LOG(JA_GDP(1)) + JIC5*LOG(JA_ICOM(1)) + RES_JA_ICOM
JIT:    JA_IT = JA_IM + JA_IOIL + JA_ICOM
JA:     JA_A = JA_C + JA_INVEST + JA_G + RES_JA_A
JGDP:   JA_GDP = JA_A + JA_XT - JA_IT + RES_JA_GDP
JGNP:   JA_GNP = JA_GDP + JA_R*(JA_NFA(1) + JA_NFAADJ(1))/JA_PGNP + RES_JA_GNP
JW:     JA_W = JA_WH + JA_WK + (JA_M + JA_B + JA_NFA/JA_ER)/JA_P
JWH:    JA_WH = JA_WH(1)/(1 + URBAR + 0.035)
        + ((1 - JA_BETA)*JA_GDP*JA_PGNP - JA_TAXH)/JA_P + URPREM*JA_WK
JWK:    JA_WK = JA_WK(1)/(1 + JA_RSR + (JA_K/JA_K(1) - 1))
        + (JA_BETA*JA_GDP*JA_PGNP - JA_TAXK)/JA_P - (JA_DELTA + URPREM)*JA_WK
JYD:    JA_YD = (JA_GDP*JA_PGNP - JA_TAX)/JA_P - JA_DELTA*JA_K(1)
JGE:    JA_GE = JA_P*JA_G + JA_R*JA_B(1) + JA_GEXOG
JTAX:   JA_TAX = JA_TRATE*(JA_PGNP*JA_GNP - JA_DELTA*JA_K(1)*JA_P
        + JA_R*JA_B(1) + RES_JA_TAX*JA_PGNP)
JTAXK:  JA_TAXK = UDUMCT*JA_BETA*JA_TAX
        + (1 - UDUMCT)*JA_CTREFF*JA_BETA*JA_GDP*JA_PGNP
JTAXH:  JA_TAXH = JA_TAX - JA_TAXK
JTRATE: DEL(1:JA_TRATE) = JA_DUM*(TAU1*((JA_B - JA_BT)/(JA_GNP*JA_PGNP))
        + TAU2*(DEL(1:JA_B - JA_BT)/(JA_GNP*JA_PGNP)))
        + TAU3*(JA_TRATEBAR(1) - JA_TRATE(1)) + RES_JA_TRATE
JB:     DEL(1:JA_B) + DEL(1:JA_M) = JA_R*JA_B(1)
        + (JA_P*JA_G - JA_TAX + JA_GEXOG) + RES_JA_B*JA_P
JGDEF:  JA_GDEF = DEL(1:JA_B + JA_M)
JM:     LOG(JA_M/JA_P) = JM0 + JM1*LOG(JA_A) + JM2*JA_RS
        + JM4*LOG(JA_M(1)/JA_P(1)) + RES_JA_M
JRS:    DEL(1:JA_RS) - JR3*(JA_RS(1) - JA_RS(1)) = JR1*(LOG(JA_MT/JA_M)/JM2)
        + RES_JA_RS
JRL:    JA_RL/100 = ((1 + JA_RS/100)*(1 + JA_RS(1)/100)*(1 + JA_RS(2)/100)
        *(1 + JA_RS(3)/100)*(1 + JA_RS(4)/100))**0.2 - 1 + RES_JA_RL
JR:     JA_R = 0.5*JA_RS(1)/100 + 0.5*(JA_RL(3) + JA_RL(2) + JA_RL(1))/3/100
JRLR:   JA_RLR = (1 + JA_RL/100)/(JA_P(5)/JA_P)**0.2 - 1
JRSR:   JA_RSR = (1 + JA_RS/100)/(JA_P(1)/JA_P) - 1
JRR:    JA_RR = (0.8*JA_RS(1) + 0.2*(JA_RL(3) + JA_RL(2) + JA_RL(1))/3)/100
JPGNP:  DEL(1:LOG(JA_PGNPNO)) = DEL(1:LOG(JA_PGNPNO(1))) + JP1*(JA_CU/100 - 1)
        + JP2*DEL(1:LOG(JA_P/JA_PGNPNO))
        - JP3*DEL(1:LOG(JA_PGNPNO(1)/JA_PGNPNO(1))) + RES_JA_PGNP
JPNO:   JA_PGNPNO = (JA_GDP*JA_PGNP - JA_PRODOIL*POIL)/(JA_GDP - JA_PRODOIL)
JP:     JA_PGNP = (JA_P*JA_A + JA_XT*JA_PXT - JA_IT*JA_PIT)/JA_GDP
        + RES_JA_P*JA_PGNP
JPFM:   LOG(JA_PFM) = 0.5*(W11*LOG(JA_ER/UE87) - LOG(JA_ER/UE87)
        + W12*LOG(JA_PXM*JA_ER/JE87) + L12*LOG(JA_PGNPNO*JA_ER/JE87)
        + W13*LOG(GR_PXM*GR_ER/GE87) + L13*LOG(GR_PGNPNO*GR_ER/GE87)
        + W14*LOG(CA_PXM*CA_ER/CE87) + L14*LOG(CA_PGNPNO*CA_ER/CE87)
        + W15*LOG(FR_PXM*FR_ER/FE87) + L15*LOG(FR_PGNPNO*FR_ER/FE87)
        + W16*LOG(IT_PXM*IT_ER/IE87) + L16*LOG(IT_PGNPNO*IT_ER/IE87)
        + W17*LOG(UK_PXM*UK_ER/EE87) + L17*LOG(UK_PGNPNO*UK_ER/EE87)
        + W18*LOG(SI_PXM*SI_ER/SE87) + L18*LOG(SI_PGNPNO*SI_ER/SE87)
        + W19*LOG(RW_PXM*RW_ER/RE87) + L19*LOG(DC_PGNP*RW_ER/RE87))
JPXM:   DEL(1:LOG(JA_PXM)) = JPXM0 + JPXM1*DEL(1:LOG(JA_PGNPNO))
        + (1 - JPXM1)*DEL(1:LOG(JA_PFM))
        + JPXM2*LOG(JA_PGNPNO(1)/JA_PXM(1)) + RES_JA_PXM
JPXT:   JA_PXT = (JA_XMA*JA_PXM + POIL*JA_XOIL)/JA_XT
JPIM:   JA_PIM = (S11*JA_PXM + S21*JA_PXM*JA_ER/JE87 + S31*GR_PXM*GR_ER/GE87
        + S41*CA_PXM*CA_ER/CE87 +
        S51*FR_PXM*FR_ER/FE87 + S61*IT_PXM*IT_ER/IE87 + S71*UK_PXM*UK_ER/EE87
        + S81*SI_PXM*SI_ER/SE87 + S91*RW_PXM*RW_ER/RE87)/(JA_ER/UE87)
        *(1 + RES_JA_PIM)
JPIMA:  JA_PIMA = JA_PIM + T1*(WTRADE - TRDE)/JA_IM
JPIT:   JA_PIT = (JA_IM*JA_PIMA + JA_IOIL*POIL + JA_ICOM*PCOM)/JA_IT
JYCAP:  JA_YCAP = JY87*(UBETA*(JA_K/JK87)**(-JRHO)
        + (1 - JBETA)*((1 + JPROD)**(TME - 21)
        *(1 + RES_JA_YCAP/(1 - JBETA))*JA_LF/JL87)**(-JRHO))**((-1)/JRHO)
JBETA:  JA_BETA = JBETA*(JA_YCAP/JA_K/(JY87/JK87))**JRHO
JLF:    JA_LF = JA_POP*JA_PART/(1 + JA_DEM3)
JCU:    JA_CU = 100*JA_GDP/JA_YCAP
JNFA:   DEL(1:JA_NFA) = (JA_XT*JA_PXT - JA_IT*JA_PIT)*JA_ER
        + JA_R*(JA_NFA(1) + JA_NFAADJ(1)) + RES_JA_NFA
JTB:    JA_TB = JA_XT*JA_PXT - JA_IT*JA_PIT
JCAB:   JA_CURBAL = DEL(1:JA_NFA)
JER:    1 + US_RS/100 = (1 + JA_RS/100)*(JA_ER(1)/JA_ER) + RES_JA_ER
JREER:  JA_REER = LOG(JA_PXM) - LOG(JA_PFM)
JMERM:  JA_MERM = EXP(V12*LOG(JA_ER/JE87) + V13*LOG(GR_ER/GE87)
        + V14*LOG(CA_ER/CE87) + V15*LOG(FR_ER/FE87) + V16*LOG(IT_ER/IE87)
        + V17*LOG(UK_ER/EE87) + V18*LOG(SI_ER/SE87))
JFA:    JA_FA = (JA_A*UE87)**L11*(JA_A*JE87)**L12*(GR_A*GE87)**L13
        *(CA_A*CE87)**L14*(FR_A*FE87)**L15*(IT_A*IE87)**L16
        *(UK_A*EE87)**L17*(SI_A*SE87)**L18*((HO_A + DC_A)*RE87)**L19/UE87
5.2.3 Structure of the Complete Model
As already described in Section 5.2.1, the country models are linked together by trade equations. This results in a 466-equation model. The linkages between the country models are schematized in Figure 5.1.

[Figure 5.1 displays here the graph of bilateral linkages between the zones CA, FR, GR, IT, JA, UK, US, SI, HO and DC.]

Figure 5.1: Linkages of the country models in the complete version of MULTIMOD.

The incidence matrix of matrix D, corresponding to the Jacobian matrix of current endogenous variables, is given in Figure 5.2.
5.3 Solution Techniques for RE Models
In the following, we first present the classical approach to solving models with forward-looking variables, which essentially amounts to the Fair–Taylor extended path algorithm given in Section 5.3.1.
Figure 5.2: Incidence matrix of D in MULTIMOD (nz = 2260).

An alternative approach, presented in Section 5.3.2, consists in stacking up the equations for a given number of time periods. In such a case, it is necessary to explicitly specify terminal conditions for the system. This system can then be solved either with a block iterative method or with a Newton-like method. The computation of the Newton step requires the solution of a linear system, for which we consider different solution techniques in Section 5.3.4. Section 5.3.3 introduces block iterative schemes and possibilities for parallelization of the algorithms.
5.3.1 Extended Path
Fair and Taylor's extended path method (EP) [34] constitutes the first operational approach to the estimation and solution of dynamic nonlinear rational expectation models. The method does not require the setting of terminal conditions, as explained in Section 5.1.1. In the case where terminal conditions are known, the method does not need type III iterations and the correct answer is found once the type II iterations have converged. The method can be described as follows:

1. Choose an integer k as an initial guess for the number of periods beyond the horizon h for which expectations need to be computed to obtain a solution within a prescribed tolerance. Choose initial values for y_{t+1|t−1}, . . . , y_{t+2h+k|t−1}.

2. Solve the model dynamically to obtain new values for the variables y_{t+1|t−1}, . . . , y_{t+h+k|t−1}. Fair and Taylor suggest using Gauss–Seidel iterations to solve the model. (Type I iterations.)

3. Check the vectors y_{t+1|t−1}, . . . , y_{t+h+k|t−1} for convergence. If any of these values have not converged, go to step 2. (Type II iterations.)
4. Check the set of vectors y_{t+1|t−1}, . . . , y_{t+h+k|t−1} for convergence with the same set that most recently reached this step. If the convergence criterion is not satisfied, extend the solution horizon by setting k to k + 1 and go to step 2. (Type III iterations.)

5. Use the computed values of y_{t+1|t−1}, . . . , y_{t+h+k|t−1} to solve the model for period t.

The method treats endogenous leads as predetermined, using initial guesses, and solves the model period by period over some given horizon. This solution produces new values for the forward-looking variables (leads), and the process is repeated until convergence of the system. Fair and Taylor call these iterations “type II iterations” to distinguish them from the standard “type I iterations” needed to solve the nonlinear model within each time period. The type II iterations generate future paths of the expected endogenous variables. Finally, in a “type III iteration”, the horizon is augmented until this extension does not affect the solution within the time range of interest.

Algorithm 25 Fair–Taylor Extended Path Method
Choose k and initial values y^i_{t+r}, r = 1, 2, . . . , k + 2h, i = I, II, III
repeat until [y^III_t y^III_{t+1} . . . y^III_{t+h}] converged
    repeat until [y^II_t y^II_{t+1} . . . y^II_{t+h+k}] converged
        for i = t, t + 1, . . . , t + h + k
            repeat until y^I_i converged
            set y^II_i = y^I_i
        end
    end
    set [y^III_t . . . y^III_{t+h}] = [y^II_t . . . y^II_{t+h}]
    k = k + 1
end
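The three nested iteration levels can be sketched on a toy scalar model y_t = a y_{t−1} + b y_{t+1} + c x (the coefficients a, b, c and the constant exogenous x are illustrative, not MULTIMOD quantities). Type I iterations are trivial here because each period contains a single equation:

```python
import numpy as np

def extended_path(a, b, c, x, y0, h, k=5, tol=1e-8):
    """Sketch of the Fair-Taylor EP method for y_t = a*y(t-1) + b*y(t+1) + c*x."""
    prev = None
    for _ in range(50):                       # type III: extend the horizon
        T = h + k
        y = np.zeros(T + 2)                   # y[0] initial, y[T+1] terminal guess
        y[0] = y0
        for _ in range(10_000):               # type II: iterate on the whole path
            y_old = y.copy()
            for t in range(1, T + 1):         # type I: solve period t (one equation)
                y[t] = a * y[t - 1] + b * y[t + 1] + c * x
            if np.max(np.abs(y - y_old)) < tol:
                break
        sol = y[1:h + 1].copy()               # solution over the periods of interest
        if prev is not None and np.max(np.abs(sol - prev)) < tol:
            return sol                        # horizon extension no longer matters
        prev, k = sol, k + 1
    return sol

path = extended_path(a=0.5, b=0.2, c=1.0, x=1.0, y0=0.0, h=3)
```

With |a| + |b| < 1 the inner sweeps converge, and the influence of the arbitrary terminal guess dies out geometrically as k grows, which is exactly what the type III test detects.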
5.3.2 Stacked-time Approach
An alternative approach consists in stacking up the equations for successive time periods and considering the solution of the system written in Equation (5.3). According to what has been said in Section 3.1, we first begin to analyze the structure of the incidence matrix of J, which is

        [ D  A            ]
        [ E  D  A         ]
    J = [    E  D  .      ]
        [       .  .   A  ]
        [          E   D  ]

where we have dropped the time indices, as the incidence matrices of E^{t+j}, D^{t+j} and A^{t+j}, j = 1, . . . , T, are invariant with respect to j. As matrix D, and particularly matrices E and A, are very sparse, it is likely that matrix J can be rearranged into a block-triangular form J∗. We know that, as a consequence, it would then be possible to solve parts of the model recursively.
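The stacked matrix can be assembled from the one-period blocks with Kronecker products. The sketch below uses small dense stand-in blocks for illustration (a sparse storage scheme would be used in practice):

```python
import numpy as np

def stack_jacobian(E, D, A, T):
    """Block-tridiagonal stacked Jacobian: D on the diagonal blocks,
    E on the subdiagonal blocks, A on the superdiagonal blocks."""
    return (np.kron(np.eye(T), D)
            + np.kron(np.eye(T, k=-1), E)
            + np.kron(np.eye(T, k=1), A))

n, T = 3, 4
D = 2.0 * np.eye(n)      # illustrative one-period blocks
E = -1.0 * np.eye(n)
A = -0.5 * np.eye(n)
J = stack_jacobian(E, D, A, T)   # (n*T) x (n*T)
```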
When rearranging matrix J, we do not want to destroy the regular pattern of J, where the same set of equations is repeated in the same order for the successive periods. We therefore consider the incidence matrix D∗ of the sum of the three matrices E + D + A and then seek its block-triangular decomposition

         [ D∗11               ]
    D∗ = [  ...   ...         ]        (5.16)
         [ D∗p1   ...   D∗pp  ]

Matrix D∗ corresponds to the structure of a system where all endogenous variables have been transformed into current variables by removing lags and leads. Therefore, if matrix D∗ can be put into a block-triangular form by a permutation of its rows and columns as shown in (5.16), where all submatrices D∗ii are indecomposable, there also exists a block-recursive decomposition J∗ of matrix J, and the solution of system (5.3) can be obtained by solving the sequence of p subproblems in J∗ y∗ = b∗, where y∗ and b∗ have been rearranged according to J∗. The variables solved in subproblem i − 1 will then be exogenous in subproblem i, for i = 2, . . . , p.

Let us illustrate this with a small system of 13 equations containing forward-looking variables, for which we give the matrices E, D and A already partitioned according to the block-triangular pattern of matrix D.
[The incidence matrices E, D and A of the 13-equation example are displayed here, with rows and columns ordered 11, 4, 7 | 6, 1, 12, 5, 9, 10, 13, 3, 2 | 8 according to the block-triangular pattern of D.]

The sum of these three matrices defines D∗, which has the following block-triangular pattern:

[The incidence pattern of D∗ is displayed here, with the same ordering of rows and columns.]
With respect to D, the change in the structure of D∗ consists in variables 6 and 2 being now in an interdependent block. The pattern of D∗ shows that the solution of our system of equations can be decomposed into a sequence of three subproblems.
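The block-triangular decomposition used above can be computed mechanically: the indecomposable blocks of an incidence matrix are the strongly connected components of its dependency graph. A sketch on a small 5-variable stand-in (not the 13-equation example):

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

# incidence[i, j] = 1 when equation i involves variable j
incidence = np.array([[1, 0, 0, 0, 0],
                      [1, 1, 1, 0, 0],
                      [0, 1, 1, 0, 0],
                      [0, 0, 1, 1, 1],
                      [0, 0, 0, 1, 1]])

# Strongly connected components give the interdependent blocks
n_blocks, labels = connected_components(incidence, directed=True,
                                        connection="strong")
```

Permuting rows and columns so that variables with the same label are adjacent, and ordering the blocks topologically, yields the block-triangular pattern of D∗.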
Matrix D∗ defines the matrices E11, D11, E22, D22 and A22. Matrices A11, E33 and A33 are empty, and matrix D33 consists of a single element.

[The incidence patterns of E11, D11, E22, D22 and A22 are displayed here.]

Stacking the system, for instance, for three periods gives the block-triangular matrix

         [ J∗11               ]
    J∗ = [   ·    J∗22        ]
         [   ·      ·   J∗33  ]

[The incidence pattern of the stacked matrix J∗ for three periods is displayed here.]
As we have seen in the illustration, the subproblems one has to consider in the p successive steps are defined by the partition of the original matrices E, D and A according to the p blocks of matrix D∗:

        [ E11            ]        [ D11            ]        [ A11            ]
    E = [ ...   ...      ]    D = [ ...   ...      ]    A = [ ...   ...      ]        (5.17)
        [ Ep1   ...  Epp ]        [ Dp1   ...  Dpp ]        [ Ap1   ...  App ]
and the subsystem to solve at step i is defined by the matrix

           [  Dii  −Aii                     ]
           [ −Eii   Dii  −Aii               ]
    J∗ii = [         ..    ..    ..         ]        (5.18)
           [        −Eii   Dii  −Aii        ]
           [              −Eii   Dii        ]
It is clear that for the partition given in (5.17), some of the submatrices Dii are likely to be decomposable and some of the matrices Eii and Aii will be zero. As a consequence, matrix J∗ii can have different kinds of patterns according to whether the matrices Eii and Aii are zero or not. These situations are commented on hereafter.

If matrix Aii ≡ 0, we have to solve a block-recursive system, which corresponds to the solution of a conventional dynamic model (the case of J∗11 in our example). In practice, this corresponds to a separable submodel in which there are no forward-looking mechanisms. As a special case, the system can be completely recursive if Dii is of order one, or a sequence of independent equations or blocks if Eii ≡ 0. This is the case of J∗33 in our example.
In the case where matrix Aii ≢ 0 and Eii ≢ 0, as for J∗22 in our example, we have to solve an interdependent system. In practice, we often observe that a large part of the equations is contained in one block Dii for which the corresponding matrix Aii is nonzero, and one might think that little is gained with such a decomposition. However, even if the size of the block-recursive part is relatively small compared to the rest of the model, this decomposition is nevertheless useful, as in some problems the number of periods T for which we have to stack the model can be in the order of hundreds. The savings in terms of the size of the interdependent system to be solved may therefore not be negligible.
From this point onwards in our work, we will consider the solution of an indecomposable system of equations defined by J∗ii.
5.3.3 Block Iterative Methods
We now consider the solution of a given system J∗ii and denote by y = [y1 . . . yT] the array corresponding to the stacked vectors for T periods. In the following, we will use dots to indicate subarrays; for instance, y·t designates the t-th column of array y.

A block iterative method for solving an indecomposable J∗ii consists in executing K loops over the equations of each period. For K = 1, we have a point method; for an arbitrary K, the algorithm is formalized in Algorithm 26.
Algorithm 26 Incomplete Inner Loops
repeat until convergence
    for t = 1 to T
        for k = 1 to K
            y^0_{·t} = y^1_{·t}
            for i = 1 to n
                evaluate y^1_{it}
            end
        end
    end
end
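For a linear stacked system E y_{t−1} + D y_t + A y_{t+1} = b_t, the scheme of Algorithm 26 can be sketched as follows. The blocks and right-hand side are illustrative stand-ins, and a Gauss–Seidel splitting D = M + N with M the lower triangle of D is assumed:

```python
import numpy as np

def incomplete_inner_loops(E, D, A, b, T, K, tol=1e-10, max_outer=10_000):
    """K Gauss-Seidel sweeps per period, inside an outer loop over periods."""
    n = D.shape[0]
    y = np.zeros((n, T + 2))                 # columns 0 and T+1 hold boundary values
    M = np.tril(D)                           # Gauss-Seidel splitting D = M + N
    N = D - M
    for _ in range(max_outer):               # repeat until convergence
        y_old = y.copy()
        for t in range(1, T + 1):            # loop over the T periods
            rhs = b[:, t - 1] - E @ y[:, t - 1] - A @ y[:, t + 1]
            for _ in range(K):               # K incomplete inner iterations
                y[:, t] = np.linalg.solve(M, rhs - N @ y[:, t])
        if np.max(np.abs(y - y_old)) < tol:
            break
    return y[:, 1:T + 1]

D = np.array([[4.0, 1.0], [1.0, 4.0]])       # diagonally dominant toy blocks
E = 0.5 * np.eye(2)
A = 0.5 * np.eye(2)
b = np.ones((2, 5))
y = incomplete_inner_loops(E, D, A, b, T=5, K=2)
```

Setting K large enough reproduces the "complete inner loops" variant discussed next.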
Completely solving the equations for each period constitutes a particular case of this algorithm. In such a situation, we execute the loop over k until the
convergence of the equations of the particular period is reached, or equivalently set K to a correspondingly large value. We may mention that for block methods, algorithms other than Gauss–Seidel can be used for solving a given block t. We notice that this technique is identical to the Fair–Taylor method without type III iterations, i.e. if the size of the system is kept fixed and the solution horizon is not extended.

As already discussed in Section 2.4.6, the convergence of nonlinear first-order iterations has to be investigated on the linearized equations. To present the analysis of the convergence of these block iterative methods, it is convenient to introduce the following notation. When solving Cx = b, the k-th iteration of a first-order iterative method is generally written as

    M x^{k+1} = N x^k + b ,

where matrices M and N correspond to a splitting of matrix C as introduced in Section 2.4. The iteration k of the inner loop for period t is then defined by
    M y_t^{k+1} = −N y_t^k − E y_{t−1}^{k+1} − A y_{t+1}^k ,        (5.19)

where matrices M and N correspond to the chosen splitting of matrix Dii. When solving for a single period, variables related to periods other than t are considered exogenous, and the matrix which governs the convergence in this case is M^{−1}N.

Let us now write the matrix that governs the convergence when solving the system for all T periods, where we iterate exactly K times for each of the systems concerning the different periods. To do this, we write y_t^{k+1} in (5.19) as a function of y_t^0, which gives

    y_t^{(K)} = (M^{−1}N)^K y_t^0 − H^{(K)} M^{−1} E y_{t−1} − H^{(K)} M^{−1} A y_{t+1} ,

where

    H^{(K)} = I + Σ_{i=1}^{K−1} (M^{−1}N)^i .

Putting these equations together for all periods, we obtain the following matrix:

    [ (M^{−1}N)^K     H^{(K)}M^{−1}A                                    ]
    [ H^{(K)}M^{−1}E  (M^{−1}N)^K     H^{(K)}M^{−1}A                    ]
    [                 ...             ...             ...               ]        (5.20)
    [                 H^{(K)}M^{−1}E  (M^{−1}N)^K     H^{(K)}M^{−1}A    ]
    [                                 H^{(K)}M^{−1}E  (M^{−1}N)^K       ]

In the special case where K = 1, we have H^{(1)} = I and we iterate over the stacked model. In the case where K is extended until each inner loop is completely solved, we have (M^{−1}N)^K ≈ 0 and H^{(K)} reaches its limit.
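The spectral radius of the matrix in (5.20) can be checked numerically for small blocks. The sketch below builds the stacked matrix for a Gauss–Seidel splitting and confirms that it contracts for a diagonally dominant example; all blocks are illustrative stand-ins, not MULTIMOD matrices, and the sign of the off-diagonal blocks does not affect the spectral radius:

```python
import numpy as np

def stacked_iteration_matrix(E, D, A, T, K):
    """Matrix (5.20): (M^-1 N)^K on the diagonal blocks, H(K) M^-1 E below,
    H(K) M^-1 A above, for the splitting D = M + N with M lower triangular."""
    M = np.tril(D)
    N = D - M
    G = np.linalg.solve(M, -N)                                  # one inner step
    H = sum(np.linalg.matrix_power(G, i) for i in range(K))     # I + ... + G^(K-1)
    ME = H @ np.linalg.solve(M, E)
    MA = H @ np.linalg.solve(M, A)
    return (np.kron(np.eye(T), np.linalg.matrix_power(G, K))
            + np.kron(np.eye(T, k=-1), ME)
            + np.kron(np.eye(T, k=1), MA))

def spectral_radius(B):
    return float(np.max(np.abs(np.linalg.eigvals(B))))

D = np.array([[4.0, 1.0], [1.0, 4.0]])
E = 0.5 * np.eye(2)
A = 0.5 * np.eye(2)
rho1 = spectral_radius(stacked_iteration_matrix(E, D, A, T=10, K=1))
rho4 = spectral_radius(stacked_iteration_matrix(E, D, A, T=10, K=4))
```

Tabulating such spectral radii for different T is exactly the kind of diagnostic reported for MULTIMOD in Table 5.2 below.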
Serial Implementation

Fundamentally, our problem consists in solving a system as given in (5.3). The extended path method increases the system period by period until the solution stabilizes within a given range of periods. The other strategy fixes the size of the stacked model as well as the terminal conditions in (5.3) and solves this system at once. As has already been mentioned, system (5.3) has a particular pattern, since the logical structure of the matrices E^{t+j}, D^{t+j} and A^{t+j}, j = 1, . . . , T, is invariant with respect to j. In the following, we will take advantage of this particularity of the system.

A first and obvious task is to analyze the structure of system (5.3) in order to rearrange the equations into a block-recursive form. This can be done without destroying the regular pattern of matrix J, as shown in Section 5.3.2.

We will now solve the country model for Japan in MULTIMOD. We recall that all country models of the seven industrialized countries and the model covering the small industrial countries are identical in their structure; the same analysis therefore holds for all these submodels. The model of our experiment has a maximum lag of three and lead variables up to five periods. We therefore have to solve a system the structure of which is given by
    [ D^{t+1}   A1^{t+2}  A2^{t+3}  A3^{t+4}  A4^{t+5}  A5^{t+6}                                  ]
    [ E1^{t+1}  D^{t+2}   A1^{t+3}  A2^{t+4}  A3^{t+5}  A4^{t+6}  A5^{t+7}                        ]
    [ E2^{t+1}  E1^{t+2}  D^{t+3}   A1^{t+4}  A2^{t+5}  A3^{t+6}  A4^{t+7}  A5^{t+8}              ]
    [ E3^{t+1}  E2^{t+2}  E1^{t+3}  D^{t+4}   A1^{t+5}  A2^{t+6}  A3^{t+7}  A4^{t+8}  A5^{t+9}    ]
    [           E3^{t+2}  E2^{t+3}  E1^{t+4}  D^{t+5}   A1^{t+6}  A2^{t+7}  A3^{t+8}  A4^{t+9} ...]
    [                        ...       ...       ...       ...       ...                          ]
    [                            E3^{t+T−3}  E2^{t+T−2}  E1^{t+T−1}  D^{t+T}                      ]
The size of the submatrices is 45 × 45, and the corresponding incidence matrices are given in Figure 5.3. As already mentioned in the introduction of this section, we consider two fundamental types of solution algorithms: first-order iterative techniques and Newton-like methods. For first-order iterative techniques, we distinguish point methods and block methods. The Fair–Taylor algorithm typically suggests a block iterative Gauss–Seidel method.
5.3 Solution Techniques for RE Models
Figure 5.3: Incidence matrices E_3 to E_1, D and A_1 to A_5.

We first tried to execute the Fair–Taylor method using Gauss–Seidel for the inner loop. For this particular model, the method immediately fails, as Gauss–Seidel does not converge in the inner loop: the model is provided in a form for which the spectral radius governing the convergence is around 64000. We know that different normalizations and orderings of the equations change this spectral radius; however, it is very difficult to find the changes that bring this radius below unity (see Section 3.3.3). We used heuristic methods, such as threshold accepting, to find a normalization and ordering with a spectral radius of 0.62, which is excellent. This then allowed us to apply Gauss–Seidel in the inner loop of the Fair–Taylor method. For the reordered and renormalized model, it is interesting to look at the spectral radii of the matrices which govern the convergence of the first-order iterative point and block methods. Table 5.2 gives these values for different T, the matrices being evaluated at the solution.
T           1        2       10       15       20
point GS    0.6221   0.7903   1.2810   1.3706   1.4334
block GS    0.6221   0.2336   0.7437   0.8483   0.8910
Table 5.2: Spectral radii for point and block Gauss–Seidel.

First, these results reveal that a point Gauss–Seidel method will not converge for relevant values of T. Block Gauss–Seidel is likely to converge; however, we
observe that this convergence slows down as T increases. Using the SOR or FGS method with ω < 1, which amounts to damping the iterates, provides a possibility of overcoming this problem.

Parallel Implementation

The opportunities for parallelizing solution algorithms depend on the type of computer used and on the particular algorithm selected to solve the model. We distinguish between two types of parallel computers: single instruction multiple data (SIMD) and multiple instruction multiple data (MIMD) computers. We first discuss the potential for data-parallel processing and then the possibilities of executing different tasks in parallel. In this section, the tasks are split at the level of the execution of a single equation.

Data Parallelism. The solution of RE models with first-order iterative techniques offers large opportunities for data-parallel processing. These opportunities are of two types. In the case of a single solution of a model, the same equations have to be solved for all the stacked periods. This means that one equation has to be computed for different data, a task which can be performed in parallel as in Algorithm 27.

Algorithm 27 Period Parallel
    while not converged
1.      y^0 = y^1
2.      for i = 1 to n
3.          for t = 1 to T in parallel do
4.              Evaluate y_{it}^1
            end
        end
    end
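The period-parallel evaluation of Algorithm 27 can be sketched with NumPy array operations standing in for the data-parallel hardware; the contractive model below (y = Ay + b, one right-hand side per period) is purely illustrative:

```python
import numpy as np

# Sketch of Algorithm 27: a Gauss-Seidel sweep where each equation i is
# evaluated for all T stacked periods at once.  On a data-parallel
# machine the vectorized update over t would run on T processors.
rng = np.random.default_rng(0)
n, T = 5, 8
A = rng.uniform(-1, 1, (n, n))
np.fill_diagonal(A, 0.0)
A *= 0.3 / np.abs(A).sum(axis=1, keepdims=True)   # make iteration contractive
b = rng.uniform(-1, 1, (n, T))

y = np.zeros((n, T))
for _ in range(200):                  # while not converged (capped here)
    y_old = y.copy()                  # Statement 1: keep previous iterate
    for i in range(n):                # Statement 2: loop over equations
        # Statements 3-4: evaluate equation i for t = 1..T "in parallel"
        y[i, :] = A[i, :] @ y + b[i, :]
    if np.max(np.abs(y - y_old)) < 1e-12:
        break

y_exact = np.linalg.solve(np.eye(n) - A, b)       # reference solution
```

The vectorized inner update is exactly the operation that array-processing languages map onto the T processors.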
To find a rational expectations solution, which uses the mathematical expectation of the variables, or in the case of stochastic simulation or sensitivity analysis, we have to solve the same model repeatedly for different data sets. This gives an additional possibility for data-parallel processing: an equation is then computed simultaneously for a data set which constitutes a three-dimensional array. The first dimension is for the n variables, the second for the T different time periods, and the third runs over the S different simulations. Algorithm 28 presents such a possibility.
Algorithm 28 Parallel Simulations
    while not converged
1.      y^0 = y^1
2.      for i = 1 to n
3.          for t = 1 to T and s = 1 to S in parallel do
4.              Evaluate y_{its}^1
            end
        end
    end
With the array-processing features, as implemented for instance in High Performance Fortran [66], it is very easy to obtain an efficient parallelization of the steps that concern the evaluation of the model's equations in the above algorithms. According to the definitions given in Section 4.1.4, the speedup for such an implementation is theoretically TS with an efficiency of one, provided that TS processors are available. Statement 1 and the computation of the stopping criterion can also be executed in parallel.

Task Parallelism. We already saw in Section 4.2.2 that all equations can be solved in parallel with the Jacobi algorithm, and that for the Gauss–Seidel algorithm it is possible to execute sets of equations in parallel. Clearly, as mentioned above, in a stacked model the same equation is repeated for all periods; such equations also fit the data-parallel execution model just described. It therefore appears that both kinds of parallelism are present in the solution of a RE model. To take advantage of the data parallelism and the control (or task) parallelism, one needs a MIMD computer. In the following, we present execution models that solve different equations in parallel. For Jacobi, this parallelism extends over all equations, whereas for Gauss–Seidel it applies only to subsets of equations. Contrary to what has been done before, the two methods will now be presented separately. For the Jacobi method, we then have Algorithm 29.

Algorithm 29 Equation Parallel Jacobi
    while not converged
1.      y^0 = y^1
2.      for i = 1 to n in parallel do
3.          for t = 1 to T and s = 1 to S in parallel do
4.              Evaluate y_{its}^1 using y_{··s}^0
            end
        end
    end
Statement 2 corresponds to a control parallelism and Statement 3 to a data parallelism. The theoretical speedup for this algorithm on a machine with nTS
processors is nTS. For a decomposition of the equations into q levels L_1, …, L_q, the Gauss–Seidel method is shown in Algorithm 30.

Algorithm 30 Equation-Parallel Gauss–Seidel
    while not converged
1.      y^0 = y^1
2.      for j = 1 to q
3.          for i ∈ L_j in parallel do
4.              for t = 1 to T and s = 1 to S in parallel do
5.                  Evaluate y_{its}^1 using y_{··s}^1
                end
            end
        end
    end
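A minimal sketch of the level decomposition, assuming a hypothetical dependency structure in which each level depends only on lower levels, so that the updates within a level are independent (and, for this purely recursive structure, one sweep is exact):

```python
import numpy as np

# Sketch of Algorithm 30: equations are decomposed into levels L_1..L_q.
# Within a level the equations do not depend on one another, so they can
# be evaluated in parallel (here: one vectorized NumPy update per level).
# The dependency structure below is hypothetical.
levels = [[0, 1], [2, 3], [4, 5]]               # q = 3 levels
n = 6
rng = np.random.default_rng(1)
A = np.zeros((n, n))
for k, lev in enumerate(levels[1:], start=1):
    prev = [i for l in levels[:k] for i in l]
    for i in lev:                               # equations of level k depend
        A[i, prev] = rng.uniform(-1, 1, len(prev))  # only on lower levels
b = rng.uniform(-1, 1, n)

y = b.copy()                                    # level 1 has no dependencies
for lev in levels[1:]:                          # Statement 2: over levels
    # Statements 3-5: all equations of the level evaluated at once
    y[lev] = A[lev, :] @ y + b[lev]

y_exact = np.linalg.solve(np.eye(n) - A, b)     # reference solution
```

For a non-recursive structure the sweep would sit inside the outer convergence loop, exactly as in Algorithm 30.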
Statement 3 corresponds to a control parallelism and Statement 4 to a data parallelism. The theoretical speedup for the Gauss–Seidel algorithm using (max_j |L_j|) × TS processors is nTS/q. In the Jacobi algorithm, Statement 1 updates the array of computed values, which are then used in Statement 4. In the Gauss–Seidel algorithm, Statement 5 overwrites the same array with the computed values; the update in Statement 1 is only needed in order to be able to check for convergence.

Incomplete Inner Loops. When executing an incomplete inner loop, we have to make sure we update the variables for a period t in separate arrays because, during the execution of the K iterations for a given period t, variables belonging to periods other than t must remain unchanged. As the computations are made in parallel for all T periods, the data has to be replicated in separate arrays. Figure 5.4 illustrates how the data has to be aligned in the computer's memory. The incomplete inner loop technique with Jacobi is shown in Algorithm 31.

Algorithm 31 Equation-Parallel Jacobi with Incomplete Inner Loops
    while not converged
1.      y^0 = y^1
2.      for t = 1 to T and s = 1 to S in parallel do  y_{··ts} = y_{··s}^1  end
3.      for k = 1 to K
4.          for i = 1 to n in parallel do
5.              for t = 1 to T and s = 1 to S in parallel do
6.                  Evaluate y_{its}^1 using y_{··ts}
                end
            end
7.          for t = 1 to T and s = 1 to S in parallel do  y_{·tts} = y_{·ts}^1  end
        end
    end
In this algorithm, y^1 and y are different arrays: the former contains the updates for the n × T × S computed variables, and the latter is an array with an additional
dimension.

Figure 5.4: Alignment of data in memory.

Statement 2 performs the replication of the data: holding constant the fourth dimension, which stands for the independent repetitions of the solution of the model, Statement 2 builds a cube by spreading a two-dimensional data array into the third dimension. In Statement 6, we update the equations using these separate data sets, and Statement 7 updates the diagonal y_{·tt}, t = 1, …, T, of the cube.

Algorithm 32 Equation-Parallel Gauss–Seidel with Incomplete Inner Loops
    while not converged
1.      y^0 = y^1
2.      for t = 1 to T and s = 1 to S in parallel do  y_{··ts} = y_{··s}^1  end
3.      for k = 1 to K
4.          for j = 1 to q
5.              for i ∈ L_j in parallel do
6.                  for t = 1 to T and s = 1 to S in parallel do
7.                      Evaluate y_{itts} using y_{··ts}
                    end
                end
            end
        end
8.      for t = 1 to T and s = 1 to S in parallel do  y_{·ts}^1 = y_{·tts}  end
    end
Apart from the fact that only equations of the same level are updated in parallel, the only difference between the Jacobi and the Gauss–Seidel algorithm resides in the fact that we use the same array for the updated variables and for the variables of the previous iteration (Statement 7).
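The replication and diagonal update can be sketched as follows (S = 1 for brevity); the dynamic model, with a hypothetical within-period simultaneous part W, a lag part B and a lead part C, is purely illustrative:

```python
import numpy as np

# Sketch of the replication scheme behind Algorithm 31, with S = 1 and a
# hypothetical stacked model y_t = W y_t + B y_{t-1} + C y_{t+1} + b_t
# (zero initial and terminal conditions).  Replica t may only update the
# period-t variables; all other periods stay frozen between the K
# incomplete inner iterations.
rng = np.random.default_rng(2)
n, T, K = 2, 6, 3
W = 0.15 * rng.uniform(-1, 1, (n, n))   # within-period simultaneity
B = 0.15 * rng.uniform(-1, 1, (n, n))   # lag coefficients
C = 0.10 * rng.uniform(-1, 1, (n, n))   # lead coefficients
b = rng.uniform(-1, 1, (n, T))

def eval_period(t, Y):
    """One Jacobi update of the period-t equations from data set Y."""
    prev = Y[:, t - 1] if t > 0 else np.zeros(n)
    nxt = Y[:, t + 1] if t < T - 1 else np.zeros(n)
    return W @ Y[:, t] + B @ prev + C @ nxt + b[:, t]

y = np.zeros((n, T))
for _ in range(300):
    y_old = y.copy()
    cube = np.repeat(y[:, :, None], T, axis=2)   # Statement 2: replicate
    for _k in range(K):                          # Statement 3: K inner loops
        for t in range(T):                       # replicas, parallel in theory
            cube[:, t, t] = eval_period(t, cube[:, :, t])  # Statements 4-7
    y = cube[:, np.arange(T), np.arange(T)]      # gather the diagonal
    if np.max(np.abs(y - y_old)) < 1e-12:
        break

# Exact solution of the full stacked linear system, for comparison.
M = np.eye(n * T)
for t in range(T):
    sl = slice(t * n, (t + 1) * n)
    M[sl, sl] -= W
    if t > 0:
        M[sl, (t - 1) * n : t * n] = -B
    if t < T - 1:
        M[sl, (t + 1) * n : (t + 2) * n] = -C
y_exact = np.linalg.solve(M, b.T.ravel()).reshape(T, n).T
```

Because each replica writes only its own diagonal column, the T replicas are independent between outer iterations, which is precisely what permits the parallel execution.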
The parallel version of Jacobi (Algorithm 29) and the algorithm which executes incomplete inner loops (Algorithm 31) have the same convergence behavior as the corresponding serial algorithm (Algorithm 26). This is because Jacobi updates the variables at the end of the iteration, and therefore the order in which the equations are evaluated does not matter. Clearly, the spectral radius varies with the number T of stacked models. The algorithm with incomplete inner loops updates the data after having executed the iterations for all the stacked models; this update between the stacked models corresponds to a Jacobi scheme. The convergence of this algorithm is then defined by the spectral radius of a matrix similar to the one given in (5.20).

Application to MULTIMOD

For this experiment, we only considered the Japan country model in isolation and solved it for 50 periods, which results in a system of 51 × 50 = 2550 equations. According to what has been discussed in Section 5.3.2 and Section 3.1, it is possible to decompose this system into three recursive systems. The first contains 1 × 50 = 50 equations and the last 5 × 50 = 250 equations that are all independent; the nontrivial stacked system to solve has 45 × 50 = 2250 equations. In the previous section, we discussed in detail several possibilities for solving such a system in parallel at equation level. It rapidly became obvious that such a parallelization does not suit the SPMD execution model of the IBM SP1, since this approach does not work efficiently in such an environment for our "medium grain" parallelization. We therefore chose to parallelize at model level, which means that we consider finding the solution of the different models related to the different periods t = 1, …, T as the basic tasks to be executed in parallel. These tasks correspond to the solution of systems of equations for which the structure of the Jacobian matrices is given in Figure 5.3.
As a consequence, the algorithm becomes Algorithm 33.

Algorithm 33 Model-Parallel Gauss–Seidel
    while not converged
1.      y^0 = y^1
2.      for t = 1 to T and s = 1 to S in parallel do  y_{··ts} = y_{··s}^1  end
3.      for t = 1 to T in parallel do
4.          for s = 1 to S  solve model for y_{·tts}  end
        end
5.      for t = 1 to T and s = 1 to S in parallel do  y_{·ts}^1 = y_{·tts}  end
    end
where each model in Statement 4 is solved on a particular processor with a conventional serial point Gauss–Seidel method. Solving the system for T = 50 and S = 1 on 4 processors, we obtained a
surprising result: the execution time with four processors was larger than the execution time with one processor. To understand this result, we have to analyze in more detail the time needed to execute the different statements of Algorithm 33. Figure 5.5 monitors the elapsed time to solve the problem for S = 1 and S = 100, with four processors and with a single processor. The sequence of solid segments represents the time spent executing Statement 4 in Algorithm 33. The intervals between these segments correspond to the time spent executing the other statements, which essentially concerns the communication between processors. In particular, Statement 2 broadcasts the new initial values to the processors and Statement 5 gathers the computed results.
(Two panels, S = 1 and S = 100: elapsed time in seconds on processors 0 to 3 and on a single processor.)
Figure 5.5: Elapsed time for 4 processors and for a single processor.

From Figure 5.5 it appears that, in the case S = 1, the communication overhead with four processors is roughly more than three times the time spent on computation. For S = 100, the communication time remains constant, while the computing time increases proportionally. Hence, as the communication time is now relatively small compared to the computing time, we achieve a speedup of almost four. This experiment confirms the crucial trade-off existing between the level of parallelization and the corresponding intensity of communication. In our case, a parallelization of the solution algorithm at equation level proved far too fine-grained; gains over a single-processor execution start only at model level, where the repeated solutions are executed serially on the different processors.
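The trade-off can be made concrete with a back-of-the-envelope model; the timings below are hypothetical round numbers chosen to mimic the qualitative behavior just described, not measurements:

```python
# Hypothetical model of the communication/computation trade-off: a fixed
# communication overhead t_comm per outer iteration, and a computation
# time growing linearly with the number of simulations S, spread over
# p processors.
def speedup(S, p, t_comm=0.6, t_comp_per_sim=0.2):
    serial = S * t_comp_per_sim          # single-processor time
    parallel = t_comm + serial / p       # 4-processor time incl. overhead
    return serial / parallel

s1 = speedup(1, 4)      # communication dominates: slower than 1 processor
s100 = speedup(100, 4)  # communication amortized: speedup close to p = 4
```

With these illustrative numbers, s1 ≈ 0.31 < 1 while s100 ≈ 3.6, reproducing the qualitative pattern of Figure 5.5.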
5.3.4
Newton Methods
As already mentioned in Section 2.6.1, the computationally expensive steps of the Newton method are the evaluation of the Jacobian matrix and the solution of the linear system. However, if we look at these problems more closely, we may discover that interesting savings can be obtained. First, if we resort to an analytical calculation of the Jacobian matrix, we automatically exploit the sparse structure, since only the nonzero derivatives of the matrices E_i, D and A_j, i = 1, …, r, j = 1, …, h, will be computed. Second, the linear systems in successive Newton steps all have an identical logical structure. This fact can be exploited to avoid a complete refactorization of the system's matrix from scratch. Third, macroeconometric models are in general used for policy simulations or forecasting, which can be done in a deterministic way or, better, by means of stochastic simulations. This implies that one has to solve many times a system of equations whose logical structure does not change. As a consequence, some intermediate results of the computations in the solution algorithm remain identical for all simulations; these results are therefore computed only once and their computational cost can be considered an overhead. The following three sections present methodologies and applications of techniques that make the most of the features just discussed. In the next section, we discuss a sparse elimination algorithm using a graph-theoretical framework. The linear system shows a multiple block-diagonal pattern that is then utilized to set up a block LU factorization. Nonstationary iterative methods are suited for the large and sparse systems we deal with, and some experiments with our application are also given. The numerical experiments are again carried out on the MULTIMOD model presented in Section 5.2.
Sparse Elimination

Although efficient methods for the solution of sparse linear systems are by now available (see Section 2.3.4), these methods are not widely used in economic modeling. In the following, we present a method for solving linear systems which only considers the nonzero entries of the coefficient matrix. The linear system is solved by substituting one variable after another in the equations. This technique is equivalent to a Gaussian elimination where the order of the pivots is the order in which the variables are substituted; we will refer to it as sparse Gaussian elimination (SGE). In order to efficiently exploit the sparsity of the system of equations in this substitution process, we represent the linear system of the Newton algorithm, say Ay = b, by means of an oriented graph. The adjacency matrix of this graph corresponds to the incidence matrix of our Jacobian matrix J. The arcs of the graph are given values: for an arc going from vertex i to vertex j, the value is the element J_{ji} of the Jacobian matrix. The substitution of variables in the equations corresponds to an elimination of vertices in the graph.
Similar to what happens when substituting variables in a system of equations, the sum of the direct and indirect links between the remaining vertices has to remain unchanged. In other words, the flow existing between the vertices before and after the elimination has to remain the same.

Elimination of Vertices. Let us now formalize the elimination of a vertex k. We denote by P_k the set of predecessors and by S_k the set of successors of vertex k. For all vertices i ∈ P_k and j ∈ S_k, we create a new arc i → j, the value of which is u_s = u_r u_v, corresponding to the path i → k → j with arc values u_r and u_v. Then vertex k and all its ingoing and outgoing arcs are dropped.

If, before the elimination of vertex k, the graph contains an arc i → j of value w with i ∈ P_k and j ∈ S_k, then the new graph will contain parallel arcs. We replace these parallel arcs by a single arc, the value of which is the sum of the values of the two parallel arcs.
Elimination of Loops. If the sets P_k and S_k contain the same vertex i, the elimination of vertex k creates an arc which appears as a loop over vertex i in the new graph. To illustrate how a loop can be removed, consider the simple case of a path i → k → j with arc values u_r and u_v, together with a loop of value u_l on vertex k. In terms of our equations this corresponds to
\[
y_k = u_r\, y_i + u_l\, y_k , \qquad y_j = u_v\, y_k ,
\]
and substituting y_k gives us
\[
y_j = \frac{u_r u_v}{1 - u_l}\, y_i ,
\]
from which we deduce that the elimination of the loop corresponds to one of the following two equivalent modifications of the graph: either replace the outgoing arc value u_v by u_v/(1 − u_l), or replace the incoming arc value u_r by u_r/(1 − u_l).
We now describe the complete elimination process for the linear system Ay = b, where A is a real matrix of order n and G_A the associated graph. The elimination proceeds in n − 1 steps; at step k, we denote by A^{(k)} the coefficient matrix of the linear system after substitution of k variables, and G_{A^{(k)}} is the corresponding graph. In G_{A^{(n−1)}}, the variable y_n depends only upon exogenous elements, and its solution can therefore be obtained immediately. Similarly, given y_n, we can solve for variable y_{n−1} in G_{A^{(n−2)}}. This is continued until all variables are known. Hence, before eliminating variable y_k, we store the equation defining y_k. In the
graph G_{A^{(k−1)}}, this corresponds to the partial graph defined by the arcs ingoing vertex k. At this point, it may help to illustrate the procedure by means of a small example constituted by a system of four equations

f1 :  f1(y1, y3, y4) = 0
f2 :  y2 = g2(y1, z1)
f3 :  y3 = g3(y1, y4, z2)
f4 :  f4(y2, y4) = 0

with Jacobian matrices
\[
\frac{\partial F}{\partial y} =
\begin{bmatrix}
-1 & 0 & u_8 & u_6 \\
u_5 & -1 & 0 & 0 \\
u_7 & 0 & -1 & u_3 \\
0 & u_2 & 0 & -1
\end{bmatrix},
\qquad
\frac{\partial F}{\partial z} =
\begin{bmatrix}
0 & 0 \\
u_1 & 0 \\
0 & u_4 \\
0 & 0
\end{bmatrix},
\]
which is supposed to represent the mix of explicit and implicit relations one frequently encounters in real macroeconometric models.
The arcs of the associated graph G_A and their values are: z1 → y2 (u1), z2 → y3 (u4), y2 → y4 (u2), y4 → y3 (u3), y1 → y2 (u5), y4 → y1 (u6), y1 → y3 (u7) and y3 → y1 (u8).
• Elimination of vertex y1 (P_{y1} = {y3, y4}, S_{y1} = {y2, y3}): the stored equation is y1 = u6 y4 + u8 y3, and the new arc values are u9 = u8 u5 (y3 → y2), u10 = u8 u7 (loop on y3), u11 = u6 u5 (y4 → y2) and u12 = u3 + u6 u7 (y4 → y3, merged with the existing arc u3).
• Elimination of vertex y2 (P_{y2} = {z1, y3, y4}, S_{y2} = {y4}): the stored equation is y2 = u1 z1 + u11 y4 + u9 y3, and the new arc values are u13 = u1 u2 (z1 → y4), u14 = u9 u2 (y3 → y4) and u15 = u11 u2 (loop on y4).
• Elimination of vertex y3 (P_{y3} = {z2, y4}, S_{y3} = {y4}): vertex y3 carries the loop u10, which is removed by setting c1 = 1 − u10 and scaling the outgoing arc, u16 = u14/c1. The stored equation is y3 = (u4 z2 + u12 y4)/c1, and the new arc values are u17 = u4 u16 (z2 → y4) and u18 = u15 + u12 u16 (loop on y4, merged with u15).
• Solution for y4: c2 = 1 − u18 and y4 = (u13 z1 + u17 z2)/c2.
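As an illustrative sketch, the elimination procedure can be coded for a system in the normalized form y = Wy + c, where an arc j → i carries the value W_{ij}; loops are absorbed by scaling the incoming arcs, one of the two equivalent modifications described above. The numeric values of u_1, …, u_8, z_1 and z_2 are arbitrary:

```python
import numpy as np

# Sparse Gaussian elimination by vertex elimination on the graph of
# y = W y + c.  Eliminating vertex k adds W[j,i] += W[j,k]*W[k,i] for
# every predecessor i and successor j; a loop W[k,k] is absorbed first
# by scaling the incoming arcs (and the constant) by 1/(1 - W[k,k]).
def sparse_gauss_eliminate(W, c, order):
    W, c = W.astype(float).copy(), c.astype(float).copy()
    stored = []                                 # equations for back subst.
    active = list(order)
    for k in order[:-1]:
        if W[k, k] != 0.0:                      # remove the loop on k
            f = 1.0 / (1.0 - W[k, k])
            W[k, :] *= f; c[k] *= f; W[k, k] = 0.0
        stored.append((k, W[k, :].copy(), c[k]))   # y_k = W[k,:] y + c[k]
        active.remove(k)
        for j in active:                        # successors of k
            if W[j, k] != 0.0:
                W[j, :] += W[j, k] * W[k, :]    # new / merged parallel arcs
                c[j] += W[j, k] * c[k]
                W[j, k] = 0.0
    last = order[-1]
    if W[last, last] != 0.0:                    # only a loop remains
        c[last] /= (1.0 - W[last, last])
    y = np.zeros(len(c))
    y[last] = c[last]                           # depends only on exogenous data
    for k, row, ck in reversed(stored):         # back substitution
        y[k] = row @ y + ck
    return y

# The four-equation example above, with arbitrary numeric values.
u1, u2, u3, u4, u5, u6, u7, u8 = 0.5, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7, 0.2
z1, z2 = 1.0, 2.0
W = np.array([[0.0, 0.0, u8, u6],
              [u5, 0.0, 0.0, 0.0],
              [u7, 0.0, 0.0, u3],
              [0.0, u2, 0.0, 0.0]])
c = np.array([0.0, u1 * z1, u4 * z2, 0.0])

y_markowitz = sparse_gauss_eliminate(W, c, order=[1, 2, 3, 0])  # 2nd ordering
y_first = sparse_gauss_eliminate(W, c, order=[0, 1, 2, 3])      # 1st ordering
y_ref = np.linalg.solve(np.eye(4) - W, c)                       # dense check
```

Both elimination orders reproduce the dense solution; the order [y2, y3, y4, y1] is the one selected by the Markowitz-style minimum d⁻d⁺ criterion in the text.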
Given the expressions for the forward substitutions and the backward substitutions, the linear system of our example can be solved executing 29 elementary arithmetic operations.

Minimizing the Fill-in. It is well known that, for a sparse Gaussian elimination, the choice of the variables to be substituted (the pivots) is also conditioned by the need to avoid an excessive loss of sparsity. From the description of the vertex elimination procedure, we see that an upper bound for the number of new arcs generated is given by d_k^− d_k^+, the product of the indegree and the outdegree of vertex k. We therefore select a vertex k such that d_k^− d_k^+ is minimum over all remaining vertices in the graph.²

We reconsider the previous example in order to illustrate the effect of the choice of the order in which the variables are eliminated.

• Elimination of vertex y2 :
(P_{y2} = {z1, y1}, S_{y2} = {y4}): the stored equation is y2 = u5 y1 + u1 z1, and the new arc values are u9 = u1 u2 (z1 → y4) and u10 = u5 u2 (y1 → y4).
• Elimination of vertex y3 (P_{y3} = {z2, y1, y4}, S_{y3} = {y1}): the stored equation is y3 = u7 y1 + u3 y4 + u4 z2, and the new arc values are u11 = u4 u8 (z2 → y1), u12 = u7 u8 (loop on y1) and u13 = u3 u8 + u6 (y4 → y1, merged with the existing arc u6).
• Elimination of vertex y4 (P_{y4} = {z1, y1}, S_{y4} = {y1}): the stored equation is y4 = u10 y1 + u9 z1, and the new arc values are u14 = u9 u13 (z1 → y1) and u15 = u10 u13 + u12 (loop on y1, merged with u12).
• Solution for y1: c1 = 1 − u15 and y1 = (u14 z1 + u11 z2)/c1.
² This is known as the Markowitz criterion; see Duff et al. [30, p. 128].
For this order of vertices, the forward substitutions and the backward substitutions necessitate only 24 elementary arithmetic operations.

Condition of the Linear System. In order to achieve a good numerical solution of a linear system, two aspects are of crucial importance. The first is the condition of the problem: if the problem is not reasonably well conditioned, there is little hope of obtaining a satisfactory solution. The second concerns the method used, which should be numerically stable; this means, roughly, that the errors due to the floating point computations are not excessively magnified. In the following, we suggest practical guidelines which may help to enhance the condition of a linear system. Recall that the linear system we want to solve is given by J^{(k)} s = b. We choose to associate a graph to this linear system; to do so, it is necessary to select a normalization of the matrix J^{(k)}. Such a normalization corresponds to a particular row scaling of the original linear system. This transformation modifies the condition of the linear system, and our goal is to find a normalization for which the condition is likely to be good. We recall that the condition κ of a matrix A in the frame of the solution of the linear system Ax = b is defined as
\[
\kappa_p(A) = \|A\|_p \, \|A^{-1}\|_p \;\ge\; 1 ,
\]
and when κ(A) is large the matrix is said to be ill-conditioned. The value of κ_p varies with the choice of the norm p, but this does not significantly affect the order of magnitude of the condition number. In the following, p is assumed to be ∞ unless stated otherwise. Hence, we have κ(A) = ‖A‖ ‖A^{−1}‖ with
\[
\|A\| = \max_{i=1,\dots,n} \sum_{j=1}^{n} |a_{ij}| .
\]
Row scaling can significantly reduce the condition of a matrix. We therefore look for a scaling matrix D for which κ(D^{−1}A) ≪ κ(A). From a theoretical point of view, the problem of finding a D minimizing κ(D^{−1}A) has been solved for the infinity norm (see Bauer [10]); however, the result cannot be used in practice. Our concern is to suggest a practical solution to this problem by providing an efficient heuristic. We are certainly aware that the problem of row scaling cannot, so far, be solved automatically for an arbitrary matrix. A common technique consists in choosing D such that each row in D^{−1}A has approximately the same ∞-norm, see Golub and Van Loan [56, p. 125]. As we want to represent our system by a graph, for which we need a normalization, the set of possible scaling matrices is finite. The idea about rows
having approximately the same ∞-norm will guide us in choosing a particular normalization. We therefore introduce a measure for the dispersion of the ∞-norms of the rows of a matrix. Recall that for a vector x ∈ R^n, the ∞-norm is defined as
\[
\|x\|_\infty = \max_{i=1,\dots,n} |x_i| .
\]
We compute a vector m with
\[
m_i = \|a_{i\cdot}\|_\infty , \qquad i = 1,\dots,n ,
\]
the ∞-norm of row i of matrix A. A measure for the relative range of the elements of m is given by the ratio max(m)/min(m). Since we are only interested in the order of magnitude of this ratio, we take the logarithm to obtain
\[
r = \log(\max(m)) - \log(\min(m)) .
\]
Due to the normalization, matrix A is such that each row contains an element of value one, and therefore min(m) ≥ 1 and r ≥ 0. In order to explore the relation between κ and r in a practical situation, we took the Jacobian matrix of the MULTIMOD country model for Japan. The Jacobian matrix is of size 40 with 134 nonzero elements, admitting 44707 different normalizations. For each normalization, r and κ_2 have been computed. In Figure 5.6, r is plotted against the log of κ_2 for each normalization.
Figure 5.6: Relation between r and κ_2 in the submodel for Japan of MULTIMOD.

This figure shows a clear relationship between r and κ_2, which suggests a criterion for the selection of a normalization corresponding to an appropriate scaling of our matrix.
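A minimal numerical sketch of this scaling idea, using a hypothetical ill-scaled matrix and the row equilibration mentioned earlier:

```python
import numpy as np

# Illustration of the scaling heuristic on a hypothetical ill-scaled
# matrix: the row-norm dispersion r = log(max m) - log(min m) and the
# condition number, before and after the equilibration D^{-1} A with
# D = diag(m), which gives every row an infinity-norm of one.
A = np.array([[1.0, 2.0, 0.0],
              [1e6, 3e6, 1e6],
              [0.0, 4.0, 1.0]])

def dispersion(M):
    m = np.abs(M).max(axis=1)            # m_i = ||a_i.||_inf
    return np.log10(m.max()) - np.log10(m.min())

D = np.diag(np.abs(A).max(axis=1))
As = np.linalg.solve(D, A)               # D^{-1} A

r_before, r_after = dispersion(A), dispersion(As)
k_before = np.linalg.cond(A, np.inf)
k_after = np.linalg.cond(As, np.inf)     # orders of magnitude smaller
```

For this matrix the equilibration drives r to zero and reduces the ∞-norm condition number by several orders of magnitude, the same qualitative pattern as in Figure 5.6.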
The selection of the normalization is then performed by seeking a matching in a bipartite graph³ that optimizes a criterion built upon the values of the edges, which in turn correspond to the entries of the Jacobian matrix of the model.

Selection of Pivots. We know that the choice of the pivots in a Gaussian elimination is determinant for numerical stability. For the SGE method, a partial pivoting strategy would require resorting once more to the bipartite graph of the Jacobian matrix: in this case, for a given vertex, we choose new adjacent matchings such that the edge belonging to the new matching is of maximum magnitude. A strategy for achieving a reasonable fill-in is to apply a threshold, as explained in Section 2.3.2. The Markowitz criterion is easy to implement in order to select candidate vertices defining the edge in the new matching, since this criterion only depends on the degrees of the vertices. Table 5.3 summarizes the operation counts for a Gauss–Seidel method and for a Newton method using two different sparse linear solvers. The Japan model is solved for a single period, and the counts correspond to the total number of operations for the respective methods to converge. We recall that for the GS method we needed to find a new normalization and ordering of the equations, which constitutes a difficult task. The two sparse Newton methods are similar in their computational complexity; the advantage of the SGE method is that it reuses the symbolic factorization of the Jacobian matrix between successive iterations.
Statement          2      3      4   Total
Newton MATLAB     1.5   11.5    3.3   16.3
Newton SGE        1.5   11.5    1.7   14.7
GS                 —      —     9.7    9.7
Table 5.3: Operation count in Mflops for Newton combined with SGE and MATLAB's sparse solver, and for Gauss-Seidel.

The total operation count clearly favors the sparse Newton method, as the difficult task of appropriately renormalizing and reordering the equations is then not required.

Multiple Block-diagonal LU. Matrix S in (5.3.3) is a block partitioned matrix and the LU factorization of such a matrix can be performed at block level. Such a factorization is called a block LU factorization. To take advantage of the block-diagonal structure of matrix S, the block LU factorization can be adapted to a multiple block-diagonal matrix with r lower blocks and h upper blocks (see Golub and Van Loan [56, p. 171]).
3 The association of a bipartite graph to the Jacobian matrix can be found in Gilli [50, p. 100].
The algorithm consists in factorizing matrix S given in (5.3.3) on page 99, where r = 3 and h = 5, into S = LU:
\[
L = \begin{pmatrix}
I & & & & & \\
F_1^{t+1} & I & & & & \\
F_2^{t+1} & F_1^{t+2} & I & & & \\
F_3^{t+1} & F_2^{t+2} & F_1^{t+3} & I & & \\
 & F_3^{t+2} & F_2^{t+3} & F_1^{t+4} & I & \\
 & & \ddots & \ddots & \ddots & \ddots
\end{pmatrix},
\]
\[
U = \begin{pmatrix}
U^{t+1} & G_1^{t+2} & G_2^{t+3} & G_3^{t+4} & G_4^{t+5} & G_5^{t+6} & & \\
 & U^{t+2} & G_1^{t+3} & G_2^{t+4} & G_3^{t+5} & G_4^{t+6} & G_5^{t+7} & \\
 & & U^{t+3} & G_1^{t+4} & G_2^{t+5} & G_3^{t+6} & G_4^{t+7} & G_5^{t+8} \\
 & & & \ddots & \ddots & \ddots & \ddots & \ddots
\end{pmatrix}.
\]
The submatrices in L and U can be determined recursively as done in Algorithm 34.

Algorithm 34 Block LU for Multiple Block-diagonal Matrices

1. for k = 1 to T
2.    for j = min(k − 1, r) down to 1
3.       solve $F_j^{t+k-j}\, U^{t+k-j} = E_j^{t+k-j} - \sum_{i=j+1}^{\min(k-1,\,j+h)} F_i^{t+k-i}\, G_{i-j}^{t+k-j}$ for $F_j^{t+k-j}$
      end
4.    $U^{t+k} = D^{t+k} - \sum_{i=1}^{\min(k-1,\,h)} F_i^{t+k-i}\, G_i^{t+k}$
5.    for j = 1 to min(T − k, h)
6.       $G_j^{t+k+j} = A_j^{t+k+j} - \sum_{i=1}^{\min(k-1,\,h-j)} F_i^{t+k-i}\, G_{i+j}^{t+k+j}$
      end
   end
In Algorithm 34, the loops of Statements 2 and 5 limit the computations to the bandwidth formed by the block-diagonals. The same goes for the limits of the sums in Statements 3, 4 and 6. In Statement 3, matrix $F_j^{t+k-j}$ is the solution of a linear system, and matrices $U^{t+k}$ and $G_j^{t+k+j}$, in Statements 4 and 6 respectively, are computed as sums and products of known matrices. After the computation of matrices L and U, the solution y can be obtained via block forward and back substitution; this is done in Algorithm 35.
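For illustration, the recursion of Algorithm 34 may be sketched in a few lines of Python. This is a minimal dense-block sketch: the block size, T, r and h are arbitrary illustrative values, the blocks are stored as numpy arrays, sparsity is not exploited, and terms with $F_i = 0$ for $i > r$ are simply skipped through the loop bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, r, h = 3, 8, 2, 3            # block size, periods, lower/upper bandwidths

# Blocks of the stacked matrix S: diagonal D, lower E_j, upper A_j.
D = [np.eye(n) * 10 + rng.standard_normal((n, n)) for _ in range(T)]
E = [[rng.standard_normal((n, n)) for _ in range(r)] for _ in range(T)]
A = [[rng.standard_normal((n, n)) for _ in range(h)] for _ in range(T)]

# Factor blocks: F[k][j-1] is the block of L in row k, column k-j;
# U_[k] is the diagonal block of U; G[k][j-1] sits in row k, column k+j.
F = [[None] * r for _ in range(T)]
G = [[None] * h for _ in range(T)]
U_ = [None] * T

for k in range(T):                                    # Statement 1
    for j in range(min(k, r), 0, -1):                 # Statement 2
        rhs = E[k][j - 1].copy()                      # Statement 3
        for i in range(j + 1, min(k, r, j + h) + 1):
            rhs -= F[k][i - 1] @ G[k - i][i - j - 1]
        # solve F @ U_[k-j] = rhs for F by transposing the system
        F[k][j - 1] = np.linalg.solve(U_[k - j].T, rhs.T).T
    U_[k] = D[k].copy()                               # Statement 4
    for i in range(1, min(k, r, h) + 1):
        U_[k] -= F[k][i - 1] @ G[k - i][i - 1]
    for j in range(1, min(T - 1 - k, h) + 1):         # Statements 5-6
        G[k][j - 1] = A[k][j - 1].copy()
        for i in range(1, min(k, r, h - j) + 1):
            G[k][j - 1] -= F[k][i - 1] @ G[k - i][i + j - 1]
```

Assembling L and U densely from these blocks and multiplying them reproduces the stacked matrix S, which is a convenient check of the recursion.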
Algorithm 35 Block Forward and Back Substitution
1. $c^{t+1} = b^{t+1}$
2. for k = 2 to T
3.    $c^{t+k} = b^{t+k} - \sum_{i=1}^{\min(k-1,\,r)} F_i^{t+k-i}\, c^{t+k-i}$
   end
4. solve $U^{t+T} y^{t+T} = c^{t+T}$ for $y^{t+T}$
5. for k = T − 1 down to 1
6.    solve $U^{t+k} y^{t+k} = c^{t+k} - \sum_{i=1}^{\min(T-k,\,h)} G_i^{t+k+i}\, y^{t+k+i}$ for $y^{t+k}$
   end
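Algorithm 35 translates directly into a pair of block loops. The numpy sketch below is illustrative only: instead of the output of an actual factorization, it uses hypothetical random factor blocks F, U and G, and checks the result against a dense solve.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, r, h = 2, 6, 2, 3            # block size, periods, lower/upper bandwidths

# Hypothetical factor blocks: F below the diagonal of L, U_ and G in U.
F  = [[rng.standard_normal((n, n)) for _ in range(min(k, r))] for k in range(T)]
U_ = [np.eye(n) * 4 + rng.standard_normal((n, n)) for _ in range(T)]
G  = [[rng.standard_normal((n, n)) for _ in range(min(T - 1 - k, h))] for k in range(T)]
b  = [rng.standard_normal(n) for _ in range(T)]

# Forward substitution (Statements 1-3): solve L c = b.
c = [b[0].copy()]
for k in range(1, T):
    ck = b[k].copy()
    for i in range(1, min(k, r) + 1):
        ck -= F[k][i - 1] @ c[k - i]
    c.append(ck)

# Back substitution (Statements 4-6): solve U y = c.
y = [None] * T
y[T - 1] = np.linalg.solve(U_[T - 1], c[T - 1])
for k in range(T - 2, -1, -1):
    rhs = c[k].copy()
    for i in range(1, min(T - 1 - k, h) + 1):
        rhs -= G[k][i - 1] @ y[k + i]
    y[k] = np.linalg.solve(U_[k], rhs)
```

Only small linear systems of the block size need to be solved in the back substitution, which is what makes the block approach attractive for stacked models.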
Statements 1, 2 and 3 carry out the forward substitution and Statements 4, 5 and 6 perform the back substitution. The F matrices in Statement 3 are already available after the computation of the loop in Statement 2 of Algorithm 34. Therefore, the forward substitution could be done during the block LU factorization.

From a numerical point of view, block LU does not guarantee the numerical stability of the method, even if Gaussian elimination with pivoting is used to solve the subsystems. The safest method consists in solving the Newton step in the stacked system with a sparse linear solver using pivoting. This procedure is implemented in portable TROLL [67], which uses MA28, see Duff et al. [30].

Structure of the Submatrices in L and U. To a certain extent, block LU already takes advantage of the sparsity of the system, as the zero submatrices are not considered in the computations. However, we want to go further and exploit the invariance of the incidence matrices, i.e. their structure. As the matrices $E_i^{t+k}$, $D^{t+k}$ and $A_j^{t+k}$ each have the same sparse structure for all k, it follows that the structure of the matrices $F_i^{t+k}$, $U^{t+k}$ and $G_j^{t+k}$ is also sparse and predictable. Indeed, if we execute the block LU Algorithm 34, we observe that the matrices $F_i^{t+k}$, $U^{t+k}$ and $G_j^{t+k}$ involve identical computations for min(r, h) < k < T − min(r, h). These computations involve sparse matrices and may therefore be expressed as sums of paths in the graphs corresponding to these matrices. The computations involving structurally identical matrices can be performed without repeating the steps used to analyze the structure of these matrices.

Parallelization. The block LU algorithm proceeds sequentially to compute the different submatrices in L and U. The same goes for the block forward and back substitution.
In these procedures, the information necessary to execute a given statement is always linked to the result of the immediately preceding statements, which means that there are no immediate and obvious possibilities for parallelizing the algorithm. On the contrary, the matrix computations within a given statement offer appealing opportunities for parallel computation. If, for instance, one uses the
SGE method suggested in Section 5.3.4 for the solution of the linear system, the implementation of a data parallel execution model to perform efficiently repeated independent solutions of the model turns out to be straightforward. To identify task parallelism, we can analyze the structure of the operations defined by the algebraic expressions discussed above. In order to do this, we seek sets of expressions that can be evaluated independently. We illustrate this for the solution of the linear system presented on page 110. The identification of the parallel tasks can be performed efficiently by resorting to a representation of the expressions by means of a graph, as shown in Figure 5.7.
Figure 5.7: Scheduling of operations for the solution of the linear system as computed on page 110. We see that the solution can be computed in five steps, each of which consists of one to five parallel tasks.⁴
The elimination of a vertex $y_k^t$ (corresponding to variable $y_k^t$) in the SGE algorithm described in Section 5.3.4 allows interesting developments in a stacked model. Let us consider a vertex $y_k^t$ for which there exists no vertex $y_j^s \in P_{y_k^t} \cup S_{y_k^t}$ for any j and $s \ne t$; in other words, all the predecessors and successors of $y_k^t$ belong to the same period t. The operations for the elimination of such a vertex are independent and identical for all T variables $y_k^t$, t = 1, …, T. Hence, the computation of the arcs defining the new graph resulting from the elimination of such a vertex is executed only once and then evaluated (possibly in parallel) for the T different periods. For the elimination of vertices $y_k^t$ with predecessors and successors in period t + 1, it is again possible to split the model into independent pieces covering periods [t, t+1], [t+2, t+3], etc., for which the elimination is again performed independently. This process can be continued until all vertices are eliminated.
Nonstationary Iterative Methods

This section reports results of numerical experiments with different nonstationary iterative solvers applied to find the step in the Newton method. We recall that the system is now stacked and has been decomposed into $J^*$ according to the decomposition technique explained earlier. The size of the
⁴This same approach has been applied for the parallel execution of Gauss-Seidel iterations in Section 4.2.2.
nontrivial system to be solved is T × 413, where T is the number of times the model is stacked. Figure 5.8 shows the pattern of the stacked model for T = 10.
Figure 5.8: Incidence matrix of the stacked system for T = 10.

Among the nonstationary solvers suggested in the literature for nonsymmetric problems, we chose to experiment with BiCGSTAB, QMR and GMRES(m). The QMR method (Quasi-Minimal Residual) introduced by Freund and Nachtigal [40] was not presented in Section 2.5, since the method is prone to failures if implemented without sophisticated look-ahead procedures. We tried a version of QMR without look-ahead strategies for our application. BiCGSTAB, proposed by van der Vorst [99], is also designed to solve large and sparse nonsymmetric linear systems and usually displays robust behavior at a small computational cost. Finally, we chose to experiment with the behavior in our framework of GMRES(m), originally presented by Saad and Schultz [90].

For all these methods, it is known that preconditioning can greatly influence the convergence. Therefore, following some authors (e.g. Concus et al. [24], Axelsson [6] and Bruaset [21]), we applied a preconditioner based upon the block structure of our problem. The block preconditioner we used is built on the LU factorization of the first block of our stacked system. If we dropped the leads and lags of the model, the Jacobian matrix would be block diagonal, i.e.
\[
\begin{pmatrix}
D^{t+1} & & & \\
 & D^{t+2} & & \\
 & & \ddots & \\
 & & & D^{t+T}
\end{pmatrix} .
\]
The tradeoff between the cost of applying the preconditioner and the expected gain in convergence speed can be improved by using the same matrix D along
the diagonal. This simplification is reasonable when the matrices display little change in their structure and values. We therefore selected $D^{t+1}$ for the diagonal block, computed its sparse LU factorization with partial pivoting and used it to perform the preconditioning steps in the iterative methods. Since the factorization is stored, only the forward and back substitutions are carried out when applying this block preconditioner. The values in the Jacobian matrix change at each step of the Newton method, and therefore a new LU factorization of $D^{t+1}$ is computed at each iteration. A cheaper alternative would have been to keep the factorization fixed during the whole solution process.

For our experiment, we shocked the variable of Canada's government expenditures by 1% of the Canadian GDP for the first year of simulation. We report the results of the linear solvers in the classical Newton method. The figures reported are the average number of Mflops (millions of floating point operations) used to solve the linear system of size T × 413 arising in the Newton iteration. The number of Newton steps needed to converge is 2 for T less than 20 and 3 for T = 30.

Table 5.4 presents the figures for the solver BiCGSTAB. The column labeled "size" contains the number of equations in the systems solved, and the one named "nnz" shows the number of nonzero entries in the corresponding matrices. To keep the other tables more compact, this information is not repeated in them.
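A sketch of this block preconditioning idea using scipy is given below. The stacked matrix is a small random stand-in, not the actual model Jacobian: the LU factorization of a single diagonal block is computed once with scipy's SuperLU interface, and applying the preconditioner then costs only forward and back substitutions, repeated period by period inside BiCGSTAB.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(2)
n, T = 50, 8                         # block size and number of stacked periods

# A stand-in stacked Jacobian: block tridiagonal, dominant diagonal block.
Db = sp.eye(n) * 10 + sp.random(n, n, density=0.10, random_state=2)
Lo = sp.random(n, n, density=0.05, random_state=3)
Up = sp.random(n, n, density=0.05, random_state=4)
J = (sp.kron(sp.eye(T), Db)
     + sp.kron(sp.eye(T, k=-1), Lo)
     + sp.kron(sp.eye(T, k=1), Up)).tocsr()
b = rng.standard_normal(n * T)

# LU of the single diagonal block, reused for every period: applying the
# preconditioner costs only T pairs of forward/back substitutions.
lu = spla.splu(Db.tocsc())
def apply_prec(v):
    return np.concatenate([lu.solve(v[i * n:(i + 1) * n]) for i in range(T)])
M = spla.LinearOperator((n * T, n * T), matvec=apply_prec)

x, info = spla.bicgstab(J, b, M=M)   # info == 0 signals convergence
```

Keeping the factorization fixed across Newton steps, as mentioned above, would amount to computing `lu` once and reusing it for every new Jacobian.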
                                tolerance
 T     size      nnz     10^-4    10^-8    10^-12
 7     2891    17201        53       71        85
 8     3304    19750        67       90       105
10     4130    24848       100      140       165
15     6195    37593       230      320       385
20     8260    50338       400      555       670
30    12390    75821       970     1366      1600
50    20650   126783         ∗        ∗         ∗

The symbol ∗ indicates a failure to converge.
Table 5.4: Average number of Mflops for BiCGSTAB.

We remark that the increase in the number of flops is less than proportional to the increase in the logarithm of the tolerance criterion. This seems to indicate that, for our case, the rate of convergence of BiCGSTAB is, as usually expected, more than linear. The work to solve the linear system increases with the size of the set of equations; doubling the number of equations leads to approximately a fourfold increase in the number of flops.

Table 5.5 summarizes the results obtained with the QMR method. We chose to report only the flop count corresponding to a solution with a tolerance of 10^-4. As for BiCGSTAB, the numbers reported are the average Mflops counts of the successive linear solutions arising in the Newton steps. The increase in floating point operations is again roughly quadratic in the size of the problem.
 T     Mflops (tol = 10^-4)
 7        91
 8       120
10       190
15       460
20       855
30      2100
50      9900∗

∗ Average of the first two Newton steps; failure to converge in the third step.
Table 5.5: Average number of Mflops for QMR.

The QMR method seems, however, about twice as expensive as BiCGSTAB for the same tolerance level. The computational burden of QMR consists of about 14 level-1 BLAS and 4 level-2 BLAS operations per iteration, whereas BiCGSTAB uses 10 level-1 BLAS and 4 level-2 BLAS operations. This apparently indicates a better convergence behavior of BiCGSTAB than of QMR.

Table 5.6 presents a summary of the results obtained with the GMRES(m) technique. As with the previous methods, we observe the expected superlinear convergence behavior. Another interesting feature of GMRES(m) is the possibility of tuning the restart parameter m. We know that the storage requirements increase with m, and that the larger m becomes, the more likely the method is to converge, see [90, p. 867]. Each iteration uses approximately 2m + 2 level-1 BLAS and 2 level-2 BLAS operations. To confirm that convergence takes place for sufficiently large m, we ran a simulation of our model with T = 50, tol = 10^-4 and m = 50; in this case, the solver converged with an average count of 9900 Mflops. It is also interesting to notice that long restarts, i.e. large values of m, do not in general generate much heavier computations, and that the improved convergence may even reduce the global computational cost. An operation count per iteration is given in [90], which clearly shows this feature of GMRES(m). Even though this last method is not cheaper than BiCGSTAB in terms of flops, the possibility of overcoming nonconvergent cases by using larger values of m certainly favors GMRES(m).

Finally, we used the sparse LU solver provided in MATLAB. For general nonsymmetric matrices, this method reorders the columns according to their minimum degree in order to limit the fill-in. On the other hand, a sparse partial pivoting technique proposed by Gilbert and Peierls [45] is used to prevent losses in the stability of the method.
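The influence of the restart parameter m discussed above can be observed directly with scipy's GMRES implementation on a small random system; the matrix below is illustrative only, and `restart` is scipy's name for m.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres

rng = np.random.default_rng(3)
n = 400
A = (sp.eye(n) * 3 + sp.random(n, n, density=0.01, random_state=5)).tocsr()
b = rng.standard_normal(n)

# Larger m keeps a longer Krylov basis: more storage per restart cycle,
# but usually fewer cycles -- and a better chance of converging at all.
for m in (5, 20, 50):
    x, info = gmres(A, b, restart=m)
    print(m, info, np.linalg.norm(A @ x - b))
```

On a well-conditioned system such as this one all three runs converge; on harder problems, as noted in the text, only the larger values of m may succeed.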
In Table 5.7, we present some results obtained with the method we have just described.
              m = 10                    m = 20                    m = 30
             tolerance                 tolerance                 tolerance
 T     10^-4   10^-8  10^-12     10^-4   10^-8  10^-12     10^-4   10^-8  10^-12
 7        76     130     170        74     107     135        76     109     145
 8       117     170     220       103     125     190       110     150     200
10       185     275     355       175     250     320       165     240     310
15       415     600     785       430     620     810       460     660     855
20       725    1060    1350       705     990    1300       770    1085    1450
30      2566    3466    4666      2100    2900    3766      2166    3000    3833
50         ∗       ∗       ∗         ∗       ∗       ∗         ∗       ∗       ∗

The symbol ∗ indicates a failure to converge.
Table 5.6: Average number of Mflops for GMRES(m).

The direct solver obtains an excellent error norm for the computed solution, which in general is less than the machine precision of our hardware (i.e. about 2·10^-16). A drawback, however, is the steep increase in the number of arithmetic operations when the size of the system increases. In our situation, this result favors the nonstationary iterative solvers for systems larger than T = 10, i.e. 4130 equations with about 25000 nonzero elements. Another major disadvantage of the sparse solver is that memory requirements became a constraint so quickly that we were not able to experiment with matrices of order larger than approximately 38000 with 40 Mbytes of memory. We may mention, however, that no special tuning of the parameters of the sparse method was performed for our experiments. A careful control of such parameters would probably yield better performance.

The recent nonstationary iterative methods proposed in the scientific computing literature are certainly an alternative to sparse direct methods for solving large and sparse linear systems such as the ones arising with forward looking macroeconomic models. Sparse direct methods, however, have the advantage of allowing the monitoring of the stability of the process and the reuse of the structural information, for instance, over several Newton steps.
 T     Mflops    MBytes
 7        91        —
 8       160        —
10       295      13.1
15       730      19.8
20      1800      40.2
30      5133      89.3
50         ∗         ∗
The symbol ∗ indicates that the memory capacity has been exceeded.
Table 5.7: Average number of Mﬂops for MATLAB’s sparse LU.
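The analogous computation is available in scipy's SuperLU interface; the sketch below uses a random sparse system for illustration, with COLAMD playing the role of MATLAB's column minimum degree ordering and partial pivoting applied on the rows.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

rng = np.random.default_rng(4)
n = 500
A = (sp.eye(n) * 5 + sp.random(n, n, density=0.01, random_state=6)).tocsc()
b = rng.standard_normal(n)

# Column reordering to limit fill-in, row pivoting for stability.
lu = splu(A, permc_spec='COLAMD')
x = lu.solve(b)

fill = (lu.L.nnz + lu.U.nnz) / A.nnz     # fill-in ratio of the factors
print(fill, np.linalg.norm(A @ x - b))
```

The fill-in ratio makes the memory drawback discussed above tangible: the factors hold more nonzeros than the original matrix, and this ratio grows with the size and structure of the system.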
Appendix A
The following pages provide background material on computation in ﬁnite precision and an introduction to computational complexity—two issues directly relevant to the discussions of numerical algorithms.
A.1 Finite Precision Arithmetic
The representation of numbers on a digital computer is very different from the one we usually deal with. Modern computer hardware can only represent a finite subset of the real numbers. Therefore, when a real number is entered in a computer, a representation error generally appears. The effects of finite precision arithmetic are thoroughly discussed in Forsythe et al. [38] and Gill et al. [47], among others.

We may explain what occurs by first stating that the internal representation of a real number is a floating point number. This representation is characterized by the number base β, the precision t and the exponent range [L, U], all four being integers. The set of all floating point numbers F is
\[
F = \{\, f \mid f = \pm . d_1 d_2 \ldots d_t \times \beta^e ,\ 0 \le d_i < \beta ,\ i = 1, \ldots, t ,\ d_1 \ne 0 ,\ L \le e \le U \,\} \cup \{0\} .
\]
Nowadays, the standard base is β = 2, whereas the other integers t, L, U vary according to the hardware; the number $.d_1 d_2 \ldots d_t$ is called the mantissa. The magnitudes of the largest and smallest representable numbers are
\[
M = \beta^U (1 - \beta^{-t}) \quad \text{for the largest}, \qquad m = \beta^{L-1} \quad \text{for the smallest}.
\]
Therefore, when we input x ∈ R into a computer, it is replaced by fl(x), the closest number to x in F. The term "closest" means the nearest number in F (rounded away from zero in case of a tie) when rounded arithmetic is used, and the nearest number in F such that |fl(x)| ≤ |x| when chopped arithmetic is used.
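For IEEE double precision (β = 2, t = 53), the formulas for M and m can be checked directly; the mapping of the IEEE exponent range onto [L, U] used in the comments is our assumption about how the text's conventions translate.

```python
import numpy as np

# IEEE double precision: beta = 2, t = 53, with U = 1024 and L = -1021
# in the conventions of the text (assumed mapping).
info = np.finfo(np.float64)
M = 2.0**1023 * (2 - 2.0**-52)       # beta^U (1 - beta^-t) = 2^1024 (1 - 2^-53)
m = 2.0**-1022                       # beta^(L-1)

print(M == info.max, m == info.tiny)  # True True
```

Both formulas reproduce numpy's reported machine limits exactly, since the products above involve only exactly representable factors.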
If |x| > M or 0 < |x| < m, then an arithmetic fault occurs, which, most of the time, implies the termination of the program.

When computations are made, further errors are introduced. If we denote by op one of the four arithmetic operations +, −, ×, ÷, then (a op b) is represented internally as fl(a op b). To show which relative error is introduced, we first note that fl(x) = x(1 + ε), where |ε| < u and u is the unit roundoff defined as
\[
u = \begin{cases}
\tfrac{1}{2}\, \beta^{1-t} & \text{for rounded arithmetic}, \\
\beta^{1-t} & \text{for chopped arithmetic}.
\end{cases}
\]
Hence, fl(x op y) = (x op y)(1 + ε) with |ε| < u, and therefore the relative error, if (x op y) ≠ 0, satisfies
\[
\frac{| \mathrm{fl}(x \ \mathrm{op}\ y) - (x \ \mathrm{op}\ y) |}{| x \ \mathrm{op}\ y |} \le u .
\]
Thus we see that an error corresponds to each arithmetic operation. This error is not only the result of a rounding in the computation itself, but also of the inexact representation of the arguments. Even if one does one's best not to accumulate such errors, they are the most likely source of problems when doing arithmetic computations with a computer.

The most important danger is catastrophic cancellation, which leads to a complete loss of correct digits. When close floating point numbers are subtracted, the number of significant digits may be small or even nonexistent. This is due to close numbers carrying many identical digits in the first (leftmost) positions of the mantissa. The difference then cancels these digits and the renormalized mantissa contains very few significant digits. For instance, on a computer with β = 10 and t = 4, we find that
\[
\mathrm{fl}(\mathrm{fl}(10^{-4} + 1) - 1) = \mathrm{fl}(1 - 1) = 0 .
\]
We may notice that the exact answer can be found by associating the terms differently:
\[
\mathrm{fl}(10^{-4} + \mathrm{fl}(1 - 1)) = \mathrm{fl}(10^{-4} + 0) = 10^{-4} ,
\]
which shows that floating point computations are, in general, not associative. Without careful control, such situations can lead to a disastrous degradation of the result. The main goal is to build algorithms that are not only fast, but above all reliable in their numerical accuracy.
A.2 Condition of a Problem
The condition of a problem reflects the sensitivity of its exact solution with respect to changes in the data. If small changes in the data lead to large changes in the solution, the problem is said to be ill-conditioned.
The condition number of a problem measures the maximum possible change in the solution relative to the change in the data. In our context of finite precision computations, the condition of a problem becomes important since, when we input data into a computer, representation errors generally lead to storing slightly different numbers than the original ones. Moreover, the linear systems we deal with most of the time solve approximate or linearized problems, and we would like to ensure that small approximation errors will not lead to drastic changes in the solution.

If we consider the problem of solving a linear system of equations, the condition can be formalized as follows. Let us consider a linear system Ax = b with a nonsingular matrix A. We want to determine the change in the solution x*, given a change in b or in A. If b is perturbed by Δb and the corresponding perturbation of x* is Δx_b, so that the equation A(x* + Δx_b) = b + Δb is satisfied, we then have A Δx_b = Δb. Taking norms in Δx_b = A⁻¹ Δb and in A x* = b, we get
\[
\| \Delta x_b \| \le \| A^{-1} \| \, \| \Delta b \| \quad \text{and} \quad \| A \| \, \| x^* \| \ge \| b \| ,
\]
so that
\[
\frac{\| \Delta x_b \|}{\| x^* \|} \le \| A^{-1} \| \, \| A \| \, \frac{\| \Delta b \|}{\| b \|} . \tag{A.1}
\]
Similarly, perturbing A by ΔA and letting Δx_A be such that (A + ΔA)(x* + Δx_A) = b, we find
\[
\Delta x_A = -A^{-1} \, \Delta A \, (x^* + \Delta x_A) .
\]
Taking norms and rewriting the expression, we finally get
\[
\frac{\| \Delta x_A \|}{\| x^* \| + \| \Delta x_A \|} \le \| A^{-1} \| \, \| A \| \, \frac{\| \Delta A \|}{\| A \|} . \tag{A.2}
\]
We see that both expressions (A.1) and (A.2) bound the relative change in the solution in terms of the relative change in the data. Both contain the factor
\[
\kappa(A) = \| A^{-1} \| \, \| A \| ,
\]
called the condition number of A. This number can be interpreted as the ratio of the maximum stretch the linear operator A has on vectors over the minimum stretch of A. This follows from the definition of matrix norms:
\[
\| A \| = \max_{v \ne 0} \frac{\| A v \|}{\| v \|} , \qquad
\| A^{-1} \| = \frac{1}{\displaystyle \min_{v \ne 0} \frac{\| A v \|}{\| v \|}} .
\]
We note that κ(A) depends on the norm used. With this interpretation, it is clear that κ(A) must be greater than or equal to 1, and the closer matrix A is to singularity, the greater κ(A) becomes. In the limiting case where A is singular, the minimum stretch is zero and the condition number is defined to be infinite. It is certainly not practical to compute the condition number of A via the formula ‖A⁻¹‖ ‖A‖; this number can be estimated by other procedures when the 1-norm is used. A classical reference is Cline et al. [23] and the LAPACK library.
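Both the growth of κ(A) and the bound (A.1) can be observed numerically; Hilbert matrices serve here as a standard family of ill-conditioned examples.

```python
import numpy as np

def hilbert(k):
    """The k-by-k Hilbert matrix, a classical ill-conditioned example."""
    i = np.arange(k)
    return 1.0 / (1.0 + i[:, None] + i[None, :])

for k in (4, 8, 12):
    print(k, np.linalg.cond(hilbert(k)))     # kappa_2 explodes with k

# Bound (A.1): a relative perturbation of b of size 1e-10 can move the
# solution by up to kappa(A) times as much.
A = hilbert(8)
b = A @ np.ones(8)
db = 1e-10 * np.linalg.norm(b) * np.eye(8)[0]
x = np.linalg.solve(A, b)
xp = np.linalg.solve(A, b + db)
print(np.linalg.norm(xp - x) / np.linalg.norm(x))
```

The observed relative change stays within the bound κ(A) · ‖Δb‖/‖b‖, while being many orders of magnitude larger than the perturbation itself.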
A.3 Complexity of Algorithms
The analysis of the computational complexity of algorithms is a very sophisticated and difficult topic in computer science. Our goal is simply to present some terminology and distinctions that are of interest in our work.

The solution of most problems can be approached using different algorithms. Therefore, it is natural to compare their performance in order to find the most efficient method. In its broad sense, the efficiency of an algorithm takes into account all the computing resources needed for its execution. Usually, for our purposes, the crucial resource will be the computing time. However, there are other aspects of importance, such as the amount of memory needed (space complexity) and the reliability of an algorithm. Sometimes a simpler implementation can be preferred to a more sophisticated one for which it becomes difficult to assess the correctness of the code.

Keeping these caveats in mind, it is nonetheless very informative to calculate the time requirement of an algorithm. The techniques presented in the following deal with numerical algorithms, i.e. algorithms where the largest part of the time is devoted to arithmetic operations. With serial computers, there is an almost proportional relationship between the number of floating point operations (additions, subtractions, multiplications and divisions) and the running time of an algorithm. Since this time is very specific to the computer used, the quantity of interest is the number of flops (floating point operations) used.

The time requirements of an algorithm are conveniently expressed in terms of the size of the problem. In a broad sense, this size usually represents the number of items describing the problem or a quantity that reflects it. For a general square matrix A with n rows and n columns, a natural size would be its order n.
We will focus on the time complexity function, which expresses, for a given algorithm, the largest amount of time needed to solve a problem as a function of its size. We are interested in the leading terms of the complexity function, so it is useful to define a notation.

Definition 3 Let f and g be two functions f, g : D → R, D ⊆ R.

1. f(x) = O(g(x)) if and only if there exist a constant a > 0 and an x_a such that for every x ≥ x_a we have f(x) ≤ a g(x),

2. f(x) = Ω(g(x)) if and only if there exist a constant b > 0 and an x_b such that for every x ≥ x_b we have f(x) ≥ b g(x),
3. f(x) = Θ(g(x)) if and only if f(x) = O(g(x)) and f(x) = Ω(g(x)).

Hence, when an algorithm is said to be O(g(n)), it means that the running time to solve a problem of size n ≥ n₀ is less than c₁ g(n) + c₂, where c₁ > 0 and c₂ are constants depending on the computer and the problem, but not on n. Some simple examples of time complexity using the O(·) notation are useful for subsequent developments:

O(n) Linear complexity. For instance, computing a dot product of two vectors of size n is O(n).

O(n²) Quadratic complexity generally arises in algorithms processing all pairs of data input; typically the code shows two nested loops. Adding two n × n matrices or multiplying an n × n matrix by an n × 1 vector is O(n²).

O(n³) Cubic complexity may appear in triple nested loops. For instance, the product of two full n × n matrices involves n² dot products and is therefore of cubic complexity.

O(bⁿ) Exponential complexity (b > 1). Algorithms proceeding by exhaustive search usually have this exploding time complexity.

Common low-level tasks of numerical linear algebra are extensively used in many higher-level packages. The efficiency of these basic tasks is essential to provide good performance to numerical linear algebra routines. These components are called BLAS, for Basic Linear Algebra Subroutines. They have been grouped in categories according to the computational and space complexity they involve:

• Level 1 BLAS are vector-vector operations involving O(n) operations on O(n) data,
• Level 2 BLAS are matrix-vector operations involving O(n²) operations on O(n²) data,
• Level 3 BLAS are matrix-matrix operations involving O(n³) operations on O(n²) data.

Levels in BLAS routines are independent in the sense that Level 3 routines do not make calls to Level 2 routines, and Level 2 routines do not make calls to Level 1 routines. Since these operations should be as efficient as possible, different versions are optimized for different computers.
This leads to a large portability of the code using such routines, without losing performance on each particular machine.

There are other measures for the number of operations involved in an algorithm. We presented here the worst-case analysis, i.e. the maximum number of operations needed to execute the algorithm. Another approach would be to determine the average number of operations after assuming a probability distribution for the characteristics of the input data. This kind of analysis is not relevant for the class of algorithms presented later and is therefore not further developed.
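The O(n), O(n²) and O(n³) examples above correspond to simple flop-count formulas, which the three BLAS levels mirror; counting one multiplication per term and the corresponding additions gives:

```python
# Flop counts for the three standard kernels: dot product (Level 1),
# matrix-vector product (Level 2) and matrix-matrix product (Level 3).
def flops_dot(n):    return 2 * n - 1            # n mults, n - 1 adds
def flops_matvec(n): return n * (2 * n - 1)      # n dot products
def flops_matmul(n): return n * n * (2 * n - 1)  # n^2 dot products

# Doubling n roughly doubles, quadruples and octuples the work.
for n in (100, 200):
    print(n, flops_dot(n), flops_matvec(n), flops_matmul(n))
```

The leading terms 2n, 2n² and 2n³ are the quantities hidden behind the O(·) notation, and they explain why Level 3 operations dominate the running time of dense linear algebra codes.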
An important distinction is made between polynomial time algorithms, whose time complexity is O(p(n)) for some polynomial function p(n), and nondeterministic polynomial time algorithms. The class of problems solvable in polynomial time is denoted by P; the problems solvable by nondeterministic polynomial time algorithms form the class NP. For an exact definition and a clear exposition of the classes P and NP, see Even [32] and Garey and Johnson [43]. Clearly, we cannot expect nonpolynomial algorithms to solve problems efficiently. However, they are sometimes applicable for small values of n. In a few cases, as the bound is a worst-case complexity, some nonpolynomial algorithms behave quite well in an average-case complexity analysis (as, for instance, the simplex method or the branch-and-bound method).
Bibliography
[1] L. Adams. M-Step Preconditioned Conjugate Gradient Methods. SIAM J. Sci. Stat. Comput., 6:452–463, 1985.

[2] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA, 1974.

[3] H. M. Amman. Nonlinear Control Simulation on a Vector Machine. Parallel Computing, 10:123–127, 1989.

[4] A. Ando, P. Beaumont, and M. Ando. Efficiency of the CYBER 205 for Stochastic Simulation of a Simultaneous, Nonlinear, Dynamic Econometric Model. Internat. J. Supercomput. Appl., 1(4):54–81, 1987.

[5] J. Armstrong, R. Black, D. Laxton, and D. Rose. A Robust Method for Simulating Forward-Looking Models. The Bank of Canada's New Quarterly Projection Model, Part 2. Technical Report 73, Bank of Canada, Ottawa, Canada, 1995.

[6] O. Axelsson. Incomplete Block Matrix Factorization Preconditioning Methods. The Ultimate Answer? J. Comput. Appl. Math., 12:3–18, 1985.

[7] O. Axelsson. Iterative Solution Methods. Oxford University Press, Oxford, UK, 1994.

[8] R. Barrett et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, Philadelphia, PA, 1994.

[9] R. J. Barro. Rational Expectations and the Role of Monetary Policy. Journal of Monetary Economics, 2:1–32, 1976.

[10] F. L. Bauer. Optimally Scaled Matrices. Numer. Math., 5:73–87, 1963.

[11] R. Becker and B. Rustem. Algorithms for Solving Nonlinear Models. PROPE Discussion Paper 119, Imperial College, London, 1991.

[12] M. Beenstock. A Neoclassical Analysis of Macroeconomic Policy. Cambridge University Press, London, 1980.

[13] M. Beenstock, A. Dalziel, P. Lewington, and P. Warburton. A Macroeconomic Model of Aggregate Supply and Demand for the UK. Economic Modelling, 3:242–268, 1986.

[14] K. V. Bhat and B. Kinariwala. Optimum Tearing in Large Systems and Minimum Feedback Cutsets of a Digraph. Journal of the Franklin Institute, 307(2):71–154, 1979.

[15] C. Bianchi, G. Bruno, and A. Cividini. Analysis of Large Scale Econometric Models Using Supercomputer Techniques. Comput. Sci. Econ. Management, 5:271–281, 1992.

[16] L. Bodin. Recursive Fix-Point Estimation, Theory and Applications. Selected Publications of the Department of Statistics. University of Uppsala, Uppsala, Sweden, 1974.
[17] R. Boucekkine. An Alternative Methodology for Solving Nonlinear Forward-looking Models. Journal of Economic Dynamics and Control, 19:711–734, 1995.

[18] A. S. Brandsma. The Quest Model of the European Community. In S. Ichimura, editor, Econometric Models of Asian-Pacific Countries. Springer-Verlag, Tokyo, 1994.

[19] F. Brayton and E. Mauskopf. The Federal Reserve Board MPS Quarterly Econometric Model of the U.S. Economy. Econom. Modelling, 2(3):170–292, 1985.

[20] C. G. Broyden. A Class of Methods for Solving Nonlinear Simultaneous Equations. Mathematics of Computation, 19:577–593, 1965.

[21] A. M. Bruaset. Efficient Solutions of Linear Equations Arising in a Nonlinear Economic Model. In M. Gilli, editor, Computational Economics: Models, Methods and Econometrics, Advances in Computational Economics. Kluwer Academic Press, Boston, MA, 1995.

[22] L. K. Cheung and E. S. Kuh. The Bordered Triangular Matrix and Minimum Essential Sets of a Digraph. IEEE Transactions on Circuits and Systems, 21(1):633–639, 1974.

[23] A. K. Cline, C. B. Moler, G. W. Stewart, and J. H. Wilkinson. An Estimate for the Condition Number of a Matrix. SIAM J. Numer. Anal., 16:368–375, 1979.

[24] P. Concus, G. Golub, and G. Meurant. Block Preconditioning for the Conjugate Gradient Method. SIAM J. Sci. Stat. Comput., 6:220–252, 1985.

[25] J. E. Dennis, Jr. and J. J. Moré. A Characterization of Superlinear Convergence and its Application to Quasi-Newton Methods. Mathematics of Computation, 28:549–560, 1974.

[26] J. E. Dennis, Jr. and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Series in Computational Mathematics. Prentice-Hall, Englewood Cliffs, NJ, 1983.

[27] H. Don and G. M. Gallo. Solving Large Sparse Systems of Equations in Econometric Models. Journal of Forecasting, 6:167–180, 1987.

[28] P. Dubois, A. Greenbaum, and G. Rodrigue. Approximating the Inverse of a Matrix for Use in Iterative Algorithms on Vector Processors. Computing, 22:257–268, 1979.
[29] I. S. Duff. MA28 – A Set of FORTRAN Subroutines for Sparse Unsymmetric Linear Equations. Technical Report AERE R8730, HMSO, London, 1977.

[30] I. S. Duff, A. M. Erisman, and J. K. Reid. Direct Methods for Sparse Matrices. Oxford Science Publications, New York, 1986.

[31] I. S. Duff and J. K. Reid. The Design of MA48, a Code for the Direct Solution of Sparse Unsymmetric Linear Systems of Equations. Technical Report RAL-TR-95-039, Computer and Information Systems Department, Rutherford Appleton Laboratory, Oxfordshire, August 1995.

[32] S. Even. Graph Algorithms. Computer Science Press, Rockville, MD, 1979.

[33] R. C. Fair. Specification, Estimation and Analysis of Macroeconometric Models. Harvard University Press, Cambridge, MA, 1984.

[34] R. C. Fair and J. B. Taylor. Solution and Maximum Likelihood Estimation of Dynamic Nonlinear Rational Expectations Models. Econometrica, 51(4):1169–1185, 1983.

[35] J. Faust and R. Tryon. A Distributed Block Approach to Solving Near-Block-Diagonal Systems with an Application to a Large Macroeconometric Model. In M. Gilli, editor, Computational Economics: Models, Methods and Econometrics, Advances in Computational Economics. Kluwer Academic Press, Boston, MA, 1995.
[36] P. Fisher. Rational Expectations in Macroeconomic Models. Kluwer Academic Publishers, Dordrecht, 1992.

[37] P. G. Fisher and A. J. Hughes Hallett. An Efficient Solution Strategy for Solving Dynamic Nonlinear Rational Expectations Models. Journal of Economic Dynamics and Control, 12:635–657, 1988.

[38] G. E. Forsythe, M. A. Malcolm, and C. B. Moler. Computer Methods for Mathematical Computations. Prentice-Hall, Englewood Cliffs, NJ, 1977.

[39] R. W. Freund, G. H. Golub, and N. M. Nachtigal. Iterative Solution of Linear Systems. Acta Numerica, pages 1–44, 1991.

[40] R. W. Freund and N. M. Nachtigal. QMR: A Quasi-Minimal Residual Method for Non-Hermitian Linear Systems. Numer. Math., 60:315–339, 1991.

[41] J. Gagnon. A Forward-Looking Multicountry Model for Policy Analysis: MX3. Journal of Economic and Financial Computing, 1:331–361, 1991.

[42] M. Garbely and M. Gilli. Two Approaches in Reading Model Interdependencies. In J.-P. Ancot, editor, Analysing the Structure of Econometric Models, pages 15–33. Martinus Nijhoff, The Hague, 1984.

[43] M. R. Garey and D. S. Johnson. Computers and Intractability, A Guide to the Theory of NP-Completeness. W. H. Freeman and Co., San Francisco, 1979.

[44] J. R. Gilbert, C. B. Moler, and R. Schreiber. Sparse Matrices in MATLAB: Design and Implementation. SIAM J. Matrix Anal. Appl., 13:333–356, 1992.

[45] J. R. Gilbert and T. Peierls. Sparse Partial Pivoting in Time Proportional to Arithmetic Operations. SIAM J. Sci. Statist. Comput., 9:862–874, 1988.

[46] P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic Press, London, 1981.

[47] P. E. Gill, W. Murray, and M. H. Wright. Numerical Linear Algebra and Optimization. Advanced Book Program. Addison-Wesley, Redwood City, CA, 1991.

[48] M. Gilli. CAUSOR – A Program for the Analysis of Recursive and Interdependent Causal Structures. Technical Report 84.03, Department of Econometrics, University of Geneva, 1984.

[49] M. Gilli. Causal Ordering and Beyond. International Economic Review, 33(4):957–971, 1992.

[50] M. Gilli. Graph-Theory Based Tools in the Practice of Macroeconometric Modeling. In S. K. Kuipers, L. Schoonbeek, and E. Sterken, editors, Methods and Applications of Economic Dynamics, Contributions to Economic Analysis. North Holland, Amsterdam, 1995.

[51] M. Gilli and M. Garbely. Matching, Covers, and Jacobian Matrices. Journal of Economic Dynamics and Control, 20:1541–1556, 1996.

[52] M. Gilli, M. Garbely, and G. Pauletto. Equation Reordering for Iterative Processes – A Comment. Computer Science in Economics and Management, 5:147–153, 1992.

[53] M. Gilli and G. Pauletto. Econometric Model Simulation on Parallel Computers. International Journal of Supercomputer Applications, 7:254–264, 1993.

[54] M. Gilli and E. Rossier. Understanding Complex Systems. Automatica, 17(4):647–652, 1981.

[55] G. H. Golub and J. M. Ortega. Scientific Computing: An Introduction with Parallel Computing. Academic Press, San Diego, CA, 1993.

[56] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 1989.
[57] G. Guardabassi. A Note on Minimal Essential Sets. IEEE Transactions on Circuit Theory, 18:557–560, 1971.

[58] G. Guardabassi. An Indirect Method for Minimal Essential Sets. IEEE Transactions on Circuits and Systems, 21(1):14–17, 1974.

[59] A. Hadjidimos. Accelerated Overrelaxation Method. Mathematics of Computation, 32:149–157, 1978.

[60] L. A. Hageman and D. M. Young. Applied Iterative Methods. Computer Science and Applied Mathematics. Academic Press, Orlando, FL, 1981.

[61] S. G. Hall. On the Solution of Large Economic Models with Consistent Expectations. Bulletin of Economic Research, 37:157–161, 1985.

[62] L. P. Hansen and T. J. Sargent. Formulating and Estimating Dynamic Linear Rational Expectations Models. Journal of Economic Dynamics and Control, 2:7–46, 1980.

[63] J. Helliwell et al. The Structure of RDX2 – Part 1 and 2. Staff Research Studies 7, Bank of Canada, Ottawa, Canada, 1971.

[64] M. R. Hestenes and E. Stiefel. Methods of Conjugate Gradients for Solving Linear Systems. J. Res. Nat. Bur. Stand., 49:409–436, 1952.

[65] F. J. Hickernell and K. T. Fang. Combining Quasirandom Search and Newton-Like Methods for Nonlinear Equations. Technical Report MATH-037, Department of Mathematics, Hong Kong Baptist College, 1993.

[66] High Performance Fortran Forum, Houston, TX. High Performance Fortran Language Specification. Version 0.4, 1992.

[67] P. Hollinger and L. Spivakovsky. Portable TROLL 0.95. Intex Solutions, Inc., 35 Highland Circle, Needham, MA 02194, Preliminary Draft edition, May 1995.

[68] A. J. Hughes Hallett. Multiparameter Extrapolation and Deflation Methods for Solving Equation Systems. International Journal of Mathematics and Mathematical Sciences, 7:793–802, 1984.

[69] A. J. Hughes Hallett. Techniques Which Accelerate the Convergence of First Order Iterations Automatically. Linear Algebra and Applications, 68:115–130, 1985.

[70] A. J. Hughes Hallett. A Note on the Difficulty of Comparing Iterative Processes with Differing Rates of Convergence. Comput. Sci. Econ. Management, 3:273–279, 1990.

[71] A. J. Hughes Hallett, Y. Ma, and Y. P. Ying. Hybrid Algorithms with Automatic Switching for Solving Nonlinear Equations Systems in Economics. Computational Economics, forthcoming 1995.

[72] R. M. Karp. Reducibility Among Combinatorial Problems. In R. E. Miller and J. W. Thatcher, editors, Complexity of Computer Computations, pages 85–104. Plenum Press, New York, 1972.

[73] C. T. Kelley. Iterative Methods for Linear and Nonlinear Equations. Frontiers in Applied Mathematics. SIAM, Philadelphia, PA, 1995.

[74] J.-P. Laffargue. Résolution d'un modèle macroéconométrique avec anticipations rationnelles. Annales d'Economie et Statistique, 17:97–119, 1990.

[75] R. E. Lucas and T. J. Sargent, editors. Rational Expectations and Econometric Practice. George Allen & Unwin, London, 1981.

[76] R. E. Lucas, Jr. Some International Evidence on Output-Inflation Tradeoffs. American Economic Review, 63:326–334, 1973.
[77] R. E. Lucas, Jr. Econometric Policy Evaluation: A Critique. In K. Brunner and A. H. Meltzer, editors, The Phillips Curve and Labor Markets, volume 1 of Supplementary Series to the Journal of Monetary Economics, pages 19–46. North Holland, 1976.

[78] D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, Reading, MA, second edition, 1989.

[79] P. Masson, S. Symanski, and G. Meredith. MULTIMOD Mark II: A Revised and Extended Model. Occasional Paper 71, International Monetary Fund, Washington D.C., July 1990.

[80] B. T. McCallum. Rational Expectations and the Estimation of Econometric Models: An Alternative Procedure. International Economic Review, 17:484–490, 1976.

[81] A. Nagurney. Parallel Computation. In H. M. Amman, D. Kendrick, and J. Rust, editors, Handbook of Computational Economics. North Holland, Amsterdam, forthcoming 1995.

[82] P. Nepomiastchy and A. Ravelli. Adapted Methods for Solving and Optimizing Quasi-Triangular Econometric Models. Annals of Economic and Social Measurement, 6:555–562, 1978.

[83] P. Nepomiastchy, A. Ravelli, and F. Rechenmann. An Automatic Method to Get an Econometric Model in Quasi-triangular Form. Technical Report 313, INRIA, 1978.

[84] T. Nijman and F. Palm. Generalized Least Squares Estimation of Linear Models Containing Rational Future Expectations. International Economic Review, 32:383–389, 1991.

[85] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York, 1970.

[86] C. C. Paige and M. A. Saunders. Solution of Sparse Indefinite Systems of Linear Equations. SIAM J. Numer. Anal., 12:617–629, 1975.

[87] C. E. Petersen and A. Cividini. Vectorization and Econometric Model Simulation. Comput. Sci. Econ. Management, 2:103–117, 1989.

[88] A. Pothen and C. Fan. Computing the Block Triangular Form of a Sparse Matrix. ACM Trans. Math. Softw., 16(4):303–324, 1990.

[89] J. K. Reid, editor. Large Sparse Sets of Linear Equations. Academic Press, London, 1971.

[90] Y. Saad and M. Schultz. GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems. SIAM J. Sci. Stat. Comput., 7:856–869, 1986.

[91] T. J. Sargent. Rational Expectations, the Real Rate of Interest, and the Natural Rate of Unemployment. Brookings Papers on Economic Activity, 2:429–480, 1973.

[92] T. J. Sargent. A Classical Macroeconometric Model for the United States. Journal of Political Economy, 84(2):207–237, 1976.

[93] R. Sedgewick. Algorithms. Addison-Wesley, Reading, MA, 2nd edition, 1983.

[94] D. Steward. Partitioning and Tearing Systems of Equations. SIAM J. Numer. Anal., 7:856–869, 1965.

[95] J. C. Strikwerda and S. C. Stodder. Convergence Results for GMRES(m). Technical report, Department of Computer Sciences, University of Wisconsin, August 1995.

[96] J. B. Taylor. Estimation and Control of a Macroeconometric Model with Rational Expectations. Econometrica, 47(5):1267–1286, 1979.
[97] Thinking Machines Corporation, Cambridge, MA. CMSSL Release Notes for the CM-200. Version 3.00, 1992.

[98] A. A. Van der Giessen. Solving Nonlinear Systems by Computer; A New Method. Statistica Neerlandica, 24(1), 1970.

[99] H. van der Vorst. Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems. SIAM J. Sci. Stat. Comput., 13:631–644, 1992.

[100] K. F. Wallis. Multiple Time Series Analysis and the Final Form of Econometric Models. Econometrica, 45(6):1481–1497, 1977.

[101] K. F. Wallis. Econometric Implications of the Rational Expectations Hypothesis. Econometrica, 48(1):49–73, 1980.

[102] M. R. Wickens. The Estimation of Econometric Models with Rational Expectations. Review of Economic Studies, 49:55–67, 1982.

[103] A. Yeyios. On the Optimisation of an Extrapolation Method. Linear Algebra and Applications, 57:191–203, 1983.