
Symbolic Incorporation of External Procedures into Process

Modeling Environments

John E. Tolsma∗ Jerry A. Clabaugh† Paul I. Barton†

September 24, 2001

Abstract

Despite the widespread availability of sophisticated, user-friendly process modeling environments, the

use of external procedures within these software packages will be necessary for quite some time. This

paper illustrates the importance of properly handling these external procedures and describes an

automated, symbolic approach for incorporating them correctly into an equation-oriented modeling

environment.

Keywords: equation-oriented simulation, automatic differentiation, source code transformation, DAEPACK,

ABACUSS II, open systems, physical property and chemical kinetics libraries.
∗ Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

(jtolsma@mit.edu, jerryc@mit.edu, and pib@mit.edu).

1 Introduction

The value of process modeling within the chemical and biological processing industries today is

indisputable. It is used in an increasing number and variety of applications. For example,

modeling is used in activities such as experimental design, process feasibility studies, process synthesis [1]

and design [2], de-bottlenecking and optimization [3, 4], operator training [5, 6], and real-time

optimization. The benefits of proper modeling include an improved understanding of the process; safer,

cleaner, and more profitable designs and operations; and a reduced time-to-market for new products.

For these reasons alone, process modeling, simulation, and optimization play a crucial role within the

process industries.

Two main approaches exist for process modeling: the sequential (and simultaneous) modular approach

and the equation-oriented approach. In the modular approach, the user constructs the process flowsheet

model, typically with a user-friendly graphical interface, by connecting blocks corresponding to members

of a library of unit operation model subroutines. The simulator then analyzes the flowsheet structure and

solves the problem with an appropriate algorithm. Substantial research has been performed over the past

several decades in flowsheet analysis and solution strategies. The unit operation model libraries typically

contain collections of subroutines implemented in lower level programming languages such as C or Fortran.

These subroutines take as arguments a relatively small number of parameters (e.g., flowrates, composition,

temperature, and pressure of input streams, and unit operation parameters) and return the variables

characterizing a relatively small number of output streams. The implementation of these subroutines typically

contains sophisticated tailored solution strategies to ensure convergence to correct results. Information

moves through the flowsheet model in a similar manner to the material flow through the actual process,

from the output of one unit operation to the input of the next, and so on. This unidirectional flow of

information from inputs to outputs makes the modular simulator ideally suited for steady-state solution

of flowsheets with relatively few recycle streams. Flowsheets containing many recycle streams or design

constraints (e.g., specifications on outputs) are often much more difficult to converge.

In contrast to the modular approach, modern equation-oriented modeling environments attempt to

solve the entire system of equations involved in the process flowsheet model simultaneously. The overall

system model is typically very large (tens to hundreds of thousands of equations and variables), but

extremely sparse. For example, each equation typically involves only five to ten variables. Having the

entire system of equations available simultaneously allows very sophisticated large-scale numerical integration and

optimization codes to be applied to the process flowsheet model. Thus, the equation-oriented approach

is better suited to dynamic calculations (such as dynamic simulation, parametric sensitivity analysis, and

dynamic optimization) and to steady-state optimization. When the entire system of equations is in this

open form, that is, all equations and variables are visible to the numerical algorithms, it is not typically

possible to combine subsets of the model equations with custom solution algorithms as is the case in the

modular approach. Consequently, equation-oriented process modeling environments tend to be far less

robust for steady-state simulations compared to modular environments.

Many equation-oriented process modeling environments provide high-level declarative input languages

which allow the user to develop their own models in a very natural and intuitive way, much as the modeler

would with a pencil and paper using standard mathematical notation. This model prototyping capability

is extremely important from the standpoint of dynamic calculations. To capture the dynamic behavior

of a process properly, very detailed models must be constructed that include descriptions of

vessel geometry and internal arrangement, possible flow reversals and transitions, phase changes, etc.

Since it is obviously not possible to provide a library of unit operation models with sufficient detail for all

processes, the availability of a flexible input language is crucial for dynamic calculations. Furthermore,

most input languages support the ability to describe complex hybrid phenomena such as discrete control

actions, safety interlock systems, and process disturbances. This capability enables activities such as

the development of startup and shutdown procedures [7], disturbance studies, operator training, and

development of batch process recipes.

Another important characteristic of modern equation-oriented modeling environments is the

decoupling between model description and model solution. For example, the same model, written in the input

language of the process simulator, may be used for a variety of computations such as simulation,

parameter estimation, sensitivity analysis, and optimization during the life of the process, from initial process

development to process decommissioning [8]. As experience with the process grows, additional insight

and detail may be incorporated into the model. Hence, the model becomes a repository of knowledge for

the process [9].

The ability to prototype detailed models efficiently is important within many emerging fields where

new and innovative processes require custom models. Some of these fields are biotechnological processes,

biomedical devices, specialty polymer and other advanced materials design, and microprocess systems.

The model development involved in these emerging areas must be performed by experts in the field, who

are often not experts in process modeling and the subsequent calculations. Consequently, the availability

of sophisticated, user-friendly model prototyping tools offers several benefits, such as rapid model

development, model analysis tools for debugging, automated application of advanced numerical algorithms, and

visualization of results.

2 Motivation

Although the input languages of modern equation-oriented modeling environments enable quite general

models to be constructed, there are limitations. For example, the input languages of these modeling

environments are ideal for describing large scale systems of differential-algebraic equations (DAEs) or partial

differential-algebraic equations (PDAEs) where every equation and variable should be made accessible

symbolically to the numerical engine. As alluded to above, this eliminates the possibility of combining

sophisticated solution algorithms for specific portions of the model that are difficult to converge, expensive

to compute, or exhibit multiple solutions. For example, a subset of the model equations may compute

vapor phase partial molar volumes from a cubic equation of state. A specific numerical algorithm should

be applied for this subtask, ensuring the calculation always converges to the correct vapor phase root.

Also, a user may wish to apply an inside-out flash calculation to compute vapor-liquid equilibrium (VLE) robustly. For these

reasons and others, most modern equation-oriented input languages provide the user with the ability to

incorporate external procedures coded in programming languages such as C or Fortran into the overall

process model.

There are a number of reasons why a user would want to use external procedures within a process

model in addition to being able to incorporate custom numerical algorithms. One reason is to make use of

third party libraries such as physical property or chemical kinetics packages, which are usually available

as subroutine or procedure libraries. Another is the fact that many organizations have vast amounts of

legacy codes (typically written in Fortran) that contain proprietary or classified information. Requiring

the user to recode the equations of the existing code in the input language is tedious,

error-prone, and often not possible. Furthermore, many of these existing codes are well tested, validated,

and trusted. Consequently, they should be used “as is” whenever possible.

When model equations are written in the input language of the modeling environment, the simulator

is able to analyze these equations and construct analytical derivatives, sparsity patterns, and essentially

any other symbolic information that may be exploited during the subsequent numerical calculation.

However, with current technology external procedures are generally evaluated as “black-boxes” within

the simulator. That is, given a set of independent variables, the external procedure is called by the

simulator merely to calculate values for the corresponding dependent variables. The implication of this

is that, in general, necessary partial derivatives are approximated using finite differences, sparsity is not

exploited, and any discontinuities in the equations are not handled explicitly. All of these issues may

significantly impact the performance or even success of the numerical calculation. For example, it is well-

known that computing partial derivatives using finite differences is both inaccurate and expensive (if the

full Jacobian matrix is desired and no structural information is available). This is particularly important

during parametric sensitivity analysis and chemical kinetics simulations, where derivative calculation can

contribute significantly to the overall cost of the calculation. Similarly, without knowledge of how the

independent variables influence the dependent variables, the simulator must assume that every dependent

variable is a function of every independent variable. Consequently, the blocks of the overall Jacobian

matrix (i.e., Jacobian matrix of the entire process model) corresponding to external procedures will be

completely dense. This has two implications. First, sparsity is not exploited during LU factorization.

This can substantially affect performance since it is the presence of sparsity that enables equation-oriented

process simulators to solve large scale systems of equations efficiently. Second, even if the problem is

relatively small or matrix-free algorithms are employed during numerical solution, not having the sparsity

pattern for the subsets of equations corresponding to external procedures substantially reduces the utility

of performing structural diagnosis of errors or problems in model formulation. A number of structural

algorithms are applied to the process model by equation-oriented simulators, providing information such

as whether or not the model is well-posed, high index, etc. Equations contained in external procedures

may introduce these problems into the overall process model and by not using the exact sparsity pattern

during the structural analysis, the user may not get appropriate diagnosis of errors, other than failure

of the subsequent numerical calculation. Finally, when external procedures are treated as “black-boxes”

any discontinuities contained in the external code will not be revealed to the numerical algorithm. These

discontinuities are quite common and are the result of nonsmooth intrinsic functions such as MIN, MAX,

and ABS appearing in the code, in addition to the more obvious IF statements. The consequences of

not explicitly handling these discontinuities during numerical integration are well-known [10, 11] and

typical results are inefficient calculations and/or integration failures. The situation is much worse for

parametric sensitivity analysis where failure to handle the discontinuities properly will often lead to

quantitatively and qualitatively incorrect results being computed without any warning to the user [12, 13].

The importance of proper handling of external procedures is probably most compelling with the use of

physical property libraries. Calls to external physical property subroutines can account for as much as

90% of the overall computational cost of a process flowsheet calculation and the discontinuities embedded

in these subroutines are the cause of as much as 90% of the numerical integration failures during dynamic

simulation.
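The accuracy limits of finite differences noted above can be seen in a minimal sketch. Python is used here only for illustration, and the function `f`, the evaluation point, and the step sizes are assumptions, not taken from any property library. A forward difference trades truncation error (large steps) against roundoff error (small steps), so no step size recovers the derivative to full precision. A dense Jacobian with respect to n independent variables also costs n additional function evaluations.

```python
import math

def f(x):
    # Hypothetical smooth "black-box" function standing in for an external procedure.
    return math.exp(x) * math.sin(x)

def dfdx_exact(x):
    # Analytical derivative, the value AD would reproduce to within roundoff.
    return math.exp(x) * (math.sin(x) + math.cos(x))

def forward_difference(func, x, h):
    # One-sided finite-difference approximation of the derivative.
    return (func(x + h) - func(x)) / h

x0 = 1.0
exact = dfdx_exact(x0)
# Absolute error for a large, a moderate, and a very small step size.
errors = {h: abs(forward_difference(f, x0, h) - exact) for h in (1e-2, 1e-8, 1e-14)}
for h, err in errors.items():
    print(f"h = {h:.0e}  absolute error = {err:.3e}")
```

In this sketch the moderate step gives the smallest error, but it is still far from machine precision, which is the behavior the paragraph above describes.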

The difficulty of combining third-party codes properly into process modeling environments has been

recognized for many years. One approach is to incorporate external codes as software components via

technologies such as OMG’s CORBA [14] and Microsoft COM [15]. For example, the CAPE-OPEN

committee [16] has developed interface standards that will enable integration between a wide variety of

process modeling components, provided they share the standardized interfaces and the process modeling

environment implements the necessary component architecture. Unfortunately, an end user desiring to

integrate their external code using this approach will have to implement the necessary interfaces. For

example, the standard interface for the model components used within equation-oriented process modeling

environments, the Equation Set Object (ESO) [17, 16], requires the user to provide numerical values of

the partial derivatives, sparsity patterns, and an explicit form of the state transition network [18, 19]

corresponding to the discontinuous equations in the model. This is not only a burden (e.g., there can

be a combinatorial number of modes in the state transition network), but requires the end user to be

familiar with component-oriented programming and all of its pitfalls [20]. Lastly, although component-

based technologies enable integration between different components, there are communication overheads

that can adversely affect the performance of the resulting calculation.

This paper describes an alternative approach where automated code analysis and code transformation

techniques are used to incorporate external procedures properly and automatically into an equation-oriented

modeling environment. In this approach, the user’s code is analyzed automatically and new code is

generated which is compiled and linked into the process simulator, providing all of the necessary symbolic

information that would otherwise be neglected. In fact, all of the symbolic information that is available

to the numerical algorithms for the subset of equations that are written in the input language of the

simulator also becomes readily available for the equations corresponding to external procedures, provided

source code is present. This enables the modeler to use external procedures with the confidence that

the subsequent calculation will be performed efficiently, robustly, and correctly. Furthermore, these

techniques, which will be described in detail below, enable the user to apply a much broader class of

numerical algorithms to models embedding external codes. For example, the automated code analysis

and code generation techniques can also be used to generate code for evaluating convex relaxations of

nonlinear functions [21] and interval extensions of the process model [22], allowing activities such as

robust solution of nonlinear systems of equations, global optimization, and nonconvex MINLP. Another

advantage of the correct incorporation of external procedures is in applications where speed is crucial (e.g.,

online applications). Most modern process modeling environments employ an interpretive architecture

for evaluating model equations and derivatives. That is, the model equations and some symbolic form

of the partial derivatives are held in computer memory as data structures which are “interpreted” to

provide values. It is well-known that interpreted evaluation can be as much as an order of magnitude

slower than compiled evaluation. Although this does not typically imply the overall calculation is an

order of magnitude slower, the speedup obtained by using optimized, compiled external code can be significant,

particularly in applications such as parametric sensitivity analysis, dynamic optimization, and stochastic

optimization.

The ideas developed in this paper are implemented in two software packages, ABACUSS II and

DAEPACK. The following two sections elaborate on the features of these software packages. This is

followed by example problems illustrating the importance of proper incorporation of external procedures

into an equation-oriented modeling environment.

3 Equation-oriented Modeling Environment

The equation-oriented modeling environment employed to demonstrate the ideas presented in this paper

is ABACUSS II [23]. Similar to other equation-oriented modeling environments, such as gPROMS [9],

Aspen Custom Modeler, DIVA [24], and ASCEND [25], ABACUSS II provides an intuitive, high level

declarative input language with which the user may describe the process model (see syntax manual in

[23] for details). ABACUSS II also allows the user to formulate some (or all) of the process model using

external subroutines or procedures.

The algorithmic (or automatic) differentiation (AD) literature (e.g., [26]) assumes that the code to

be differentiated operates in the following manner. A collection of independent variable values is taken

as arguments to the code, and the code evaluates the values of a collection of dependent variables as

a function of these independent variables. This very general model for the operation of a code is also

adopted for external procedures interfaced to ABACUSS II. If we denote the set of independent variables

by $x \in \mathbb{R}^{n_x}$ and the set of dependent variables by $y \in \mathbb{R}^{n_y}$, a reference to an external procedure within

an ABACUSS II model amounts to inserting the following $n_y$ equations in the overall process model:

$$y = f(x) \tag{1}$$

where $f : \mathbb{R}^{n_x} \to \mathbb{R}^{n_y}$ is the function evaluated by the external code. Note that the number of equations

implied by the external procedure is determined by the number of dependent variables.

We will use the following simple Fortran subroutine that just multiplies two numbers and returns the

result to illustrate the utility of this formulation:

      SUBROUTINE MULT(A,B,C)
C     Returns the product of A and B in C.
      DOUBLE PRECISION A,B,C
      C = A * B
      RETURN
      END

In this example, A and B are the independent variables, and C is the dependent variable. It is first

necessary for the user to communicate this information in the ABACUSS II input language, which is

done with the following code segment:

EXTERNAL MULT(INDEPENDENT,
              INDEPENDENT,
              DEPENDENT) ;
END

This declares that there is an external procedure identified by MULT that has three arguments, the first

two of which are independent variables, and the last being a dependent variable. Since any argument

may be an open dimension array, the actual number of equations implied by the external procedure is

not fixed at this point.

Once the EXTERNAL block has been introduced, it is possible to employ it to define equations within

ABACUSS II MODEL blocks. For example,

MODEL Example1
  VARIABLE
    U,V,W AS NOTYPE
  EQUATION
    MULT(U,V,W) ;
END

introduces the equation

$$w = uv \tag{2}$$

into the process model. On the other hand, the following ABACUSS II input:

MODEL Example2
  VARIABLE
    U,V AS NOTYPE
  EQUATION
    MULT(U^2,V,0) ;
END

introduces the equation

$$u^2 v = 0 \tag{3}$$

into the process model. It should be noted that any of the arguments in the reference to the external

procedure may be expressed as a function of the MODEL variables, including constants as illustrated in

this second example.
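One way to read the two examples above is that each reference to an external procedure contributes a residual of the form (dependent argument) minus f(independent arguments). The sketch below is an illustrative Python stand-in, not ABACUSS II code; the names `mult` and `external_residual` and the numerical values are hypothetical.

```python
def mult(a, b):
    # Python stand-in for the Fortran subroutine MULT: returns C = A * B.
    return a * b

def external_residual(procedure, independents, dependent):
    # Residual of the implied equation: dependent - f(independents) = 0.
    return dependent - procedure(*independents)

# Example1: MULT(U,V,W) implies w - u*v = 0.
u, v, w = 2.0, 3.0, 6.0
r1 = external_residual(mult, (u, v), w)

# Example2: MULT(U^2,V,0) implies 0 - u^2*v = 0, i.e. u^2 v = 0.
u, v = 0.0, 5.0
r2 = external_residual(mult, (u**2, v), 0.0)
```

Both residuals vanish at the chosen points, which is what it means for those values to satisfy equations (2) and (3).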

Often, an external procedure will correspond to the residual evaluator for a system of equations, the

residual being zero at a root of the equations. For example, the subroutine:

      SUBROUTINE RES(NZ,T,Z,ZPRIME,DELTA)
C     Residual evaluator: DELTA holds the residuals of the NZ equations.
      INTEGER NZ
      DOUBLE PRECISION T,Z(NZ),ZPRIME(NZ),DELTA(NZ)
      DELTA(1) = ...
C     etc.
      RETURN
      END

would be introduced by the EXTERNAL block:

EXTERNAL RES(INTEGER,
             INDEPENDENT,
             INDEPENDENT,
             INDEPENDENT,
             DEPENDENT) ;
END

and could be used in the MODEL block:

MODEL Example3
  PARAMETER
    NC AS INTEGER
  VARIABLE
    X AS ARRAY(NC) OF NOTYPE
  EQUATION
    RES(NC,TIME,X,$X,0(1:NC)) ;
END

to introduce the following equations:

$$f(t, x, \dot{x}) = 0 \tag{4}$$

into the overall process model, where $f : \mathbb{R} \times \mathbb{R}^{n_x} \times \mathbb{R}^{n_x} \to \mathbb{R}^{n_x}$. Note that the number of equations

introduced is inferred from the number of dependent variables. In this case, the dependent variables are

the zero vector, so it is necessary to associate a dimensionality with this constant in order to infer the

number of equations. This is denoted by the slice syntax 0(1:NC).

As stated above, a common situation where external procedures are employed is for computing physical

properties. In the example below, a call is made to an external physical property routine to compute the

equilibrium K-values. In the MODEL block in Figure 1, the array equation Y=K*X corresponds to NC

equations, one for each component, and it is equivalent to writing:

DO I := 1 TO NC DO
  Y(I) = K(I) * X(I) ;
END

The next equation is the reference to the external procedure. This line in the input file defines NC

additional relationships of the form $K_i = K_i(T, P, x, y)$, $i = 1, \ldots, N_c$. The ABACUSS II function

SIGMA sums the entries of an array. That is, the equation SIGMA(X) = 1 corresponds to the equation:

$$\sum_{i=1}^{N_c} x_i = 1$$

The external procedure KVAL in Figure 1 has seven arguments. The first is an integer parameter

corresponding to the number of components present (the integer keyword indicates this argument is

EXTERNAL KVAL(integer # Number of components
dependent, # Array of K-values computed in this routine
independent, # Temperature
independent, # Pressure
independent, # Array of liquid mole fractions
independent, # Array of vapor mole fractions
workspace double # Real workspace required by this routine
) ;

MODEL VLE
# ABACUSS II model computing vapor-liquid equilibrium

PARAMETER

NC AS INTEGER

VARIABLE

Temp AS Temperature
Pres AS Pressure
X AS ARRAY(NC) OF MoleFraction
Y AS ARRAY(NC) OF MoleFraction
K AS ARRAY(NC) OF PositiveValues

EQUATION

# VLE
Y = K * X ;

# Physical property calculation in external code


KVAL(NC,K,Temp,Pres,X,Y,100) ;

SIGMA(X) = 1 ;
SIGMA(Y) = 1 ;

END

Figure 1: External declaration and ABACUSS II input file excerpt for the computation of VLE using an
external physical property subroutine.

simply a parameter and not a model variable). The second argument, an array of K-values, is designated

as the dependent variables, that is, these values are computed from the independent variables by the

external procedure. The third through sixth arguments are independent variables corresponding to

temperature (a scalar), pressure (a scalar), and liquid and vapor composition (both arrays). The final

argument enables the user to pass workspace to an external procedure; the user simply specifies the size

of the workspace required by the routine. The external subroutine called has the interface shown below.

      SUBROUTINE KVAL(N,K,T,P,X,Y,W)
C     Interface only; the body of the routine is not shown.
      INTEGER N
      DOUBLE PRECISION K(N),T,P,X(N),Y(N),W(*)
      END

The subroutine computing the K-values may be arbitrarily complex and call any number of additional

subroutines and/or functions. What is important is that given values for the independent variables

temperature, pressure, and the vapor and liquid mole fractions, the code return the corresponding values

for the dependent variables, the K-values.
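The structure of the VLE model in Figure 1 can be sketched by stacking its equations as residuals and counting them against the variables. The following Python fragment is purely illustrative: `kval` is a hypothetical stand-in that returns made-up constant K-values (a real property routine would compute them from temperature, pressure, and composition), and the numerical values of temperature and pressure are arbitrary placeholders.

```python
def kval(nc, temp, pres, x, y):
    # Hypothetical stand-in for the external routine KVAL. The constants are
    # invented for illustration and ignore temp, pres, x and y.
    return [2.0, 0.5][:nc]

def vle_residuals(nc, temp, pres, x, y, k):
    res = []
    res += [y[i] - k[i] * x[i] for i in range(nc)]   # Y = K * X
    k_ext = kval(nc, temp, pres, x, y)
    res += [k[i] - k_ext[i] for i in range(nc)]      # KVAL(NC,K,Temp,Pres,X,Y,100)
    res.append(sum(x) - 1.0)                         # SIGMA(X) = 1
    res.append(sum(y) - 1.0)                         # SIGMA(Y) = 1
    return res

nc = 2
n_variables = 2 + 3 * nc       # Temp, Pres, X(NC), Y(NC), K(NC)
x = [1.0 / 3.0, 2.0 / 3.0]     # point chosen to be consistent with the made-up K-values
k = [2.0, 0.5]
y = [k[i] * x[i] for i in range(nc)]
residuals = vle_residuals(nc, 350.0, 1.0e5, x, y, k)
n_equations = len(residuals)   # 2*NC + 2
degrees_of_freedom = n_variables - n_equations
```

The count shows that the external procedure contributes NC of the 2NC + 2 equations, leaving NC degrees of freedom to be fixed by specifications elsewhere in the model.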

The interface to external procedures used in ABACUSS II models is completely arbitrary, provided

all arguments are native types (e.g., integers, reals, and characters), and the external procedure may be

implemented in any compiled programming language (e.g., C/C++, Fortran, and Pascal). The

implementation inside the external code is also completely arbitrary.

The key novelty in this paper is that, if source code is available for this subroutine, ABACUSS II will

use DAEPACK to automatically construct all of the additional symbolic information it needs in order to

perform the subsequent numerical calculation. DAEPACK is described in detail in the following section.

The remainder of this section outlines the architecture of the ABACUSS II software.

The architecture of ABACUSS II has been designed to enable high levels of flexibility in several

areas, including how the software is interfaced, how the software is deployed and in what environment,

how the process model is described, and what numerical algorithms are applied to the process model.

Input Layer: Microsoft Excel, Matlab, CLI, GUI, developer custom GUI, and C, C++, Fortran, etc. programs
Middle Layer (Embeddable Process Simulator): ABACUSS II (Input Translator and Calculation Executive), available as shared library (.so or .dll), object, or component
Bottom Layer: DAEPACK and other numerical engines, external code, external solvers

Figure 2: Layered architecture of ABACUSS II.

Figure 2 contains a schematic of the three layer architecture of ABACUSS II. The top layer is the

interface level, where the user has access to the functionality of ABACUSS II to perform the desired analyses

and calculations. The ABACUSS II distribution provides both a command-line interface (CLI) and a

graphical user interface (GUI). In addition, ABACUSS II exports a set of documented interfaces that

may be accessed in any number of third-party applications including Microsoft Excel and Matlab. The

exported interfaces also allow the user to readily construct custom GUIs using Microsoft Visual Basic

or Java, for example. Lastly, the full functionality of ABACUSS II is available within programs written

by the user in programming languages such as C, C++, or Fortran. The middle layer of ABACUSS II

is the input translator and calculation executive. This layer implements the interfaces exported to the

top layer. This level includes features such as input file translation, model symbolic analysis, and the

calculation executive. This level also provides a number of visualization tools for examining information

such as model structure and numerical results. This layer is available in several formats which allows

ABACUSS II to be deployed in a number of environments. The software is available as a shared library

(.so in UNIX/Linux and .dll in Windows) and C++ object for using the software locally. The software

may also be executed remotely as a CORBA or DCOM component or using XML-RPC. The bottom layer

provides numerical algorithms and the ability to incorporate external procedures. The external procedures

incorporated may be either portions of an overall model or user-supplied numerical components. The

ability to incorporate external procedures properly, the focus of this paper, is provided by the software

package DAEPACK, described in detail below.

4 Symbolic Incorporation of External Procedures

The symbolic techniques used to incorporate external procedures into an equation-oriented modeling

environment have been implemented in DAEPACK, a general purpose software library

for numerical calculations [27, 28]. DAEPACK is divided into two main sections, one containing a

collection of numerical components for performing calculations such as numerical integration and parametric

sensitivity analysis, and the other containing a set of symbolic components which construct automatically

all of the symbolic information required by the numerical components. DAEPACK provides both the

standard numerical functionality of ABACUSS II and the ability to incorporate external codes properly

into an ABACUSS II model. Currently, the symbolic components of DAEPACK support only Fortran

source code; however, the ideas may be readily extended to any procedural programming language.

The symbolic components of DAEPACK have emerged from ideas originally developed in the

algorithmic, or automatic, differentiation (AD) community. AD is a technique for computing exact derivative

values (to within roundoff error) for functions implemented in the form of codes written in some

programming language. Derivative values are obtained by simply decomposing the computer code into sequences

of elementary operations for which partial derivatives are known symbolically and applying the chain-

rule. It is important to note that AD does not furnish symbolic expressions for the derivatives; rather, it

furnishes values for the partial derivatives of the dependent variables with respect to the independent

variables at any desired values of the latter. How the chain-rule is applied gives rise to several variants of AD, each

of which is appropriate under different circumstances. A full description of AD is beyond the scope of

this paper; excellent descriptions may be found in [26, 29, 30, 31], and chemical

engineering applications are described in [32]. What is important to emphasize, however, is that AD

can be quite efficient, general, and automated. For example, using the appropriate variant of AD, the

cost of evaluating gradients and general vector-Jacobian and Jacobian-vector products is only a small

multiple of the cost of evaluating the underlying function evaluation code, independent of the number of

variables or functions involved. If the full Jacobian is desired, then sparsity may be exploited to reduce

the cost of Jacobian evaluation in a number of ways [33, 34, 27, 35]. AD is also quite general. The

underlying functions need not simply be implemented as a sequence of assignments but can be arbitrarily

complex code including common blocks, loops, IF statements, and complex hierarchies of subroutines

and functions. Moreover, AD is more properly termed algorithmic differentiation (see [26]) because the

code may contain complex iterative solution algorithms (e.g., computing partial molar volumes from an

equation of state using an iterative algorithm). Finally, the application of AD can be completely

automated given relatively little information about the original code. Of course, improved performance (both

in terms of memory usage and computational complexity) can be achieved through careful application of

appropriately selected AD techniques to specific codes.
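The decomposition into elementary operations can be illustrated with a small hand-written sketch. Python is used here only for illustration; DAEPACK generates Fortran, and its actual output is not reproduced here. The function `f` is a hypothetical example. Each statement of the function is paired with a statement that propagates a directional derivative by the chain rule, which yields a Jacobian-vector product for roughly a small constant multiple of the cost of the function itself.

```python
import math

def f(x1, x2):
    # Hypothetical function written as a sequence of elementary operations.
    v1 = x1 * x2
    v2 = math.sin(x1)
    y = v1 + v2
    return y

def f_tangent(x1, x2, dx1, dx2):
    # Same statements, each followed by its chain-rule derivative statement.
    v1 = x1 * x2
    dv1 = dx1 * x2 + x1 * dx2
    v2 = math.sin(x1)
    dv2 = math.cos(x1) * dx1
    y = v1 + v2
    dy = dv1 + dv2
    return y, dy

# Seeding the direction (1, 0) gives dy/dx1; seeding (0, 1) gives dy/dx2.
y, dy_dx1 = f_tangent(1.5, 2.0, 1.0, 0.0)
_, dy_dx2 = f_tangent(1.5, 2.0, 0.0, 1.0)
```

No symbolic expression for the derivative is ever formed; only numerical values are carried alongside the original computation.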

Two main approaches exist for applying AD to computer codes: the operator overloading approach and

the source-to-source translation approach. The former uses the operator overloading capabilities of many

modern programming languages (e.g., C++, Fortran-90, and Pascal-SC) to have the compiler generate

the additional instructions required for computing the derivative values. Some AD implementations

applying this technique are [36, 37]. The latter approach relies on the use of compiler technologies to

generate new code for computing the derivative values. That is, given a code for evaluating a function,

source-to-source AD tools will generate new code for evaluating the derivative values. This new code

is compiled and linked with the application requiring the derivative values. Some AD implementations

employing this approach are [38, 39, 40, 27].
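As a minimal sketch of the operator overloading approach, the following Python fragment defines a hypothetical `Dual` type that carries a value together with one directional derivative, so that an unmodified function evaluates a Jacobian-vector product as a side effect. The names `Dual`, `model`, and `jvp` are assumptions for illustration and do not correspond to any of the cited AD tools.

```python
import math


class Dual:
    """A value together with one directional derivative (forward mode)."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    @staticmethod
    def _lift(other):
        # Constants are treated as having zero derivative.
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        o = Dual._lift(other)
        return Dual(self.val + o.val, self.dot + o.dot)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual._lift(other)
        return Dual(self.val - o.val, self.dot - o.dot)

    def __rsub__(self, other):
        return Dual._lift(other) - self

    def __mul__(self, other):
        o = Dual._lift(other)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)

    __rmul__ = __mul__


def exp(x):
    """Overloaded intrinsic: value and derivative of exp."""
    return Dual(math.exp(x.val), math.exp(x.val) * x.dot)


def model(x):
    """Stand-in for user code: two outputs of two inputs."""
    return [x[0] * x[1] + 3.0 * x[0], exp(x[0]) - x[1] * x[1]]


def jvp(f, x, v):
    """Evaluate f(x) and the Jacobian-vector product J(x) v in one pass."""
    out = f([Dual(xi, vi) for xi, vi in zip(x, v)])
    return [o.val for o in out], [o.dot for o in out]


vals, jv = jvp(model, [0.0, 2.0], [1.0, 0.0])
print(vals, jv)
```

Seeding the direction (1, 0) yields the first column of the Jacobian. A source-to-source tool would instead emit the derivative statements as new code, avoiding the run-time overhead of the overloaded type.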

The symbolic components of DAEPACK extend the source-to-source transformation ideas of AD

described above to generate from a function evaluator code a wide variety of additional codes required

when applying state-of-the-art numerical algorithms to the model. DAEPACK provides a component that

constructs derivative code for evaluating gradients of scalar functions and vector-Jacobian and Jacobian-

vector products of vector functions. In addition, if the model is sparse, DAEPACK can be used to

generate code exploiting this sparsity to compute sparse Jacobian matrices with surprising efficiency (see

Example section below). DAEPACK may also be used to generate code that determines the sparsity

pattern of the code for use by sparse linear solvers, in block triangularization, and structural diagnosis. If

the original code contains nonsmooth intrinsic functions or IF statements, then DAEPACK can generate

new code which allows these discontinuities to be handled properly during numerical integration [41] and

parametric sensitivity analysis [13]. DAEPACK can also be used to generate new codes that evaluate the

interval extension of the original code and convex relaxations of nonlinear functions in the original code.

This enables the user to apply algorithms such as interval Newton/generalized bisection [22] and global

optimization and nonconvex MINLP [21].
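The idea behind sparsity pattern determination can be sketched by propagating, for every intermediate quantity, the set of independent variables it depends on. The sketch below does this with an overloaded type rather than generated code, so it only illustrates the principle; the names `Dep`, `sparsity_pattern`, and `model` are hypothetical and are not part of DAEPACK.

```python
class Dep:
    """Records which independent variables a quantity depends on."""

    def __init__(self, deps=()):
        self.deps = frozenset(deps)

    def _combine(self, other):
        # Constants contribute no dependencies.
        other_deps = other.deps if isinstance(other, Dep) else frozenset()
        return Dep(self.deps | other_deps)

    # Every binary operation merges the dependency sets of its operands.
    __add__ = __radd__ = __sub__ = __rsub__ = _combine
    __mul__ = __rmul__ = __truediv__ = __rtruediv__ = _combine


def sparsity_pattern(f, n):
    """Return, for each output of f, the sorted column indices of its nonzeros."""
    x = [Dep({j}) for j in range(n)]
    return [sorted(out.deps) for out in f(x)]


def model(x):
    """Stand-in for user code: three outputs of four inputs."""
    return [x[0] * x[1], x[2] + 1.0, x[0] - x[3] * x[3]]


pattern = sparsity_pattern(model, 4)
print(pattern)
```

The result is a structural pattern: an entry is reported whenever a dependence is possible, even if the derivative happens to vanish numerically at some point.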

Figure 3 contains a schematic showing the steps performed by DAEPACK to transform the user’s original code into a set of new codes providing a wide variety of information. In this example, the user must simply provide the collection of Fortran source codes defining the external model and a DAEPACK specification file. The specification file contains information such as what code is to be generated, which variables are independent and which are dependent, etc. ABACUSS II can generate this specification file automatically from the ABACUSS II input file containing the declaration of the external code. In this figure, six new Fortran codes are generated from the original code, providing all of the information for performing hybrid parametric sensitivity analysis [13].

[Figure 3 schematic: the Fortran source code for external procedure evaluation (File1.for, containing EXT, SUB1, SUB2, FUNC1, FUNC2, and FUNC3) and the code generation specifications pass through DAEPACK's translation, dependency analysis, code transformation, and code generation steps. The new Fortran source codes for hybrid calculations shown are EXTDL (“locked” model evaluation), EXTDL_SP (sparsity pattern determination), EXTDL_AD (Jacobian matrix evaluation), DISCON_AD (discontinuity function gradient evaluation), and EXTDL_ADT and EXTDL_ADP (derivatives of model residuals and discontinuity functions with respect to time and parameters), together with a discontinuity function evaluation routine.]

Figure 3: Automatic generation of code with DAEPACK.

Figure 4 shows how DAEPACK is used in conjunction with ABACUSS II to incorporate external codes properly into an overall process model. In this example, the ABACUSS II input file contains a reference to an external subroutine EXT, for which the user has source code. Given the location of this source code and a description of the independent and dependent variables (from the declaration of EXT in the ABACUSS II input file), ABACUSS II will call DAEPACK to generate all of the code it needs to perform the subsequent calculation robustly, efficiently, and correctly. The code generated by DAEPACK will be compiled and placed into a shared library which can be dynamically loaded by ABACUSS II prior to executing the calculation.

[Figure 4 schematic: the input file (.ABACUSS), containing the declaration EXTERNAL ext and the model MODEL MYMODEL ... END, is read by ABACUSS II, which passes the external code (.f file containing SUBROUTINE EXT) to DAEPACK. The resulting library (.dll, .so, etc.), hidden from the user, contains the “locked” model evaluator, Jacobian evaluator, sparsity pattern evaluator, discontinuity function evaluator, and “convexified” model evaluator. The steps are labeled (1) to (4) in the figure.]

Figure 4: Symbolic incorporation of external procedures with DAEPACK.

To summarize this section, all of the additional symbolic information required by ABACUSS II for

proper numerical calculation (e.g., sparsity pattern, analytical derivative values, discontinuity informa-

tion, etc.) can be obtained for external code with DAEPACK (provided the source code is available

and written in Fortran). Thus, all equations, regardless of whether they are written in the ABACUSS

II input language or available as external code, are treated in a consistent and correct manner within

ABACUSS II. The following section contains several example problems illustrating the importance of

proper handling of external procedures.

5 Examples

This section contains several example problems illustrating the importance of proper incorporation of

external procedures into an equation-oriented modeling environment.

5.1 Hidden Discontinuities

The first example consists of three (very) small DAEs containing discontinuities. Each of these DAE systems was coded into a Fortran subroutine and incorporated into ABACUSS II in two ways: 1) as a “black-box” and 2) with additional symbolic information provided by DAEPACK. The first example, case A, is:

$$
\dot{x} = 1, \qquad
y = \begin{cases} x^{k_1} & \text{if } x \le 0 \\ x^{k_2} & \text{otherwise} \end{cases}
\tag{5}
$$

See the top diagram in Figure 5. The integration statistics are shown in the first two columns of Table 1. As might be expected from the continuity at the event, there is no significant difference between the two cases, although explicit handling of the event does reduce the number of error test failures (from 13 to 8). The second example, case B, is:

$$
\dot{x} = 1, \qquad
y = \begin{cases} x^{k_1} & \text{if } x \le 0 \\ \alpha x & \text{otherwise} \end{cases}
\tag{6}
$$

See the center diagram in Figure 5. The integration statistics for this example are shown in the middle two columns of Table 1. In this example, there is a significant difference between the case where the external procedure is incorporated as a “black-box” and the case where DAEPACK supplies the additional symbolic information. The third example, case C, is:

$$
\dot{x} = 1, \qquad
y = \begin{cases} x^{k_1} & \text{if } x \le 0 \\ \alpha + x^{k_2} & \text{otherwise} \end{cases}
\tag{7}
$$

See the bottom diagram in Figure 5. In this example, the numerical integration fails at the event when the external procedure is treated as a “black-box”. Although these three DAEs are small, they illustrate the importance of properly handling external procedures containing discontinuities. In the second example, the “black-box” mode required roughly twice as many residual evaluations (135 versus 65) and Jacobian evaluations and LU factorizations (89 versus 48). If these equations were part of a much larger overall model, this additional work could be quite significant. In the third example, if these equations were part of a much larger overall model, they would probably cause a numerical integration failure, and it would be very difficult to isolate the cause of the failure.
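The explicit handling of an event can be sketched for case C as follows: the IF branch is held fixed (“locked”) within each step, a discontinuity function is monitored for a sign change, the event time is located, and only then is the branch switched. Everything specific in this sketch is an assumption for illustration: the parameter values k1 = 2, k2 = 3, and alpha = 1, the explicit Euler step, and the bisection search. It is not the event location algorithm of [41].

```python
def y_locked(x, branch, k1=2, k2=3, alpha=1.0):
    """Case C with the IF branch frozen ('locked') by the integrator."""
    return x**k1 if branch == "x<=0" else alpha + x**k2


def discontinuity(x):
    """A zero crossing of this function marks the event."""
    return x


def integrate(x0=-1.0, t_end=2.0, h=0.3, tol=1e-10):
    t, x, branch = 0.0, x0, "x<=0"
    events, trajectory = [], [(t, x, y_locked(x, branch))]
    while t < t_end - 1e-12:
        step = min(h, t_end - t)
        x_new = x + step * 1.0               # explicit Euler on x' = 1
        if branch == "x<=0" and discontinuity(x_new) > 0.0:
            # Sign change inside the step: bisect for the event time.
            lo, hi = 0.0, step
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if discontinuity(x + mid * 1.0) > 0.0:
                    hi = mid
                else:
                    lo = mid
            step, x_new = hi, x + hi * 1.0   # stop just after the event
            branch = "x>0"                   # switch branch only at the event
            events.append(t + step)
        t, x = t + step, x_new
        trajectory.append((t, x, y_locked(x, branch)))
    return events, trajectory


events, trajectory = integrate()
print(events)
```

With these assumed values the event is located at t = 1, where y jumps by alpha. A “black-box” integrator sees only the jump in the residual and has no discontinuity function with which to locate it.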

Table 1: Integration statistics for three simple discontinuity examples. Steps = number of integration steps performed. RES = number of model residual evaluations required. JAC = number of Jacobian evaluations required. ETF = number of integration error test failures. CTF = number of integration corrector test failures. The Case C “black-box” integration fails at the event.

               Case A                 Case B                 Case C
         Black-box   DAEPACK    Black-box   DAEPACK    Black-box   DAEPACK
Steps       177        159         72         47         fails       44
RES         309        270        135         65         fails       62
JAC          52         50         89         48         fails       38
ETF          13          8         25          4         fails        2
CTF           0          0          0          0         fails        0

5.2 Correct Use of Chemical Kinetics Libraries

The remaining two examples illustrate the importance of properly handling external procedures for physical property and chemical kinetics calculations. These computations are typically very complex, costly, and difficult to code correctly and efficiently. In addition, many high quality codes are available that have been extensively validated and are trusted. Consequently, they should be used with minimal modification whenever possible.

The first example is a model of an adiabatic, constant pressure, perfectly stirred batch reactor. The model consists of the following $N_c + 2$ equations:

$$
\rho \frac{dy_i}{dt} = W_i w_i, \qquad i = 1, \ldots, N_c
\tag{8}
$$

$$
\rho C_p \frac{dT}{dt} = \sum_{k=1}^{N_c} W_k h_k w_k
\tag{9}
$$

$$
\rho = \rho(T, y)
\tag{10}
$$

where $\rho$ is the mass density, $y_i$ is the mass fraction of species $i$, $N_c$ is the number of chemical species, $W_i$ is the molecular weight of species $i$, $T$ is temperature, $C_p$ is the constant pressure heat capacity, $w_i$ is the molar production rate of species $i$ per unit volume, and $h_i$ is the enthalpy of species $i$. The molar production rates, heat capacity, mass density, and enthalpies were computed with external Fortran subroutines from the CHEMKIN-II library [42]. The chemical mechanism for the reaction of oxygen, nitrogen, and n-heptane involves 544 chemical species (i.e., $N_c = 544$) and 2446 reactions. This mechanism was obtained from Curran et al. [43]. Two simulations were performed with this model, one where the external procedures were treated as “black-boxes” and the other where they were incorporated properly using DAEPACK. The numerical integration was performed for a simulated time of 5.0 seconds on a 1.4 GHz PC with 512 MB RAM. The initial conditions were $y_{O_2} = 0.0252$, $y_{N_2} = 0.9734$, $y_{C_7} = 0.0014$, $y_O = y_H = 10^{-16}$, and $T = 800$ K. Table 2 contains the timing information for this example.

Table 2: Timing information for constant pressure batch reactor example. Calculations performed on a 1.4 GHz PC with 512 MB RAM.

Timings in seconds          Black-box    DAEPACK
Overall simulation time       454.8        20.6

Clearly significant improvements are realized by exploiting the additional symbolic information available

when the external procedures are incorporated properly. This benefit is examined in more detail in the

following example.

The next example is a reacting flow simulation. A gaseous mixture of oxygen, nitrogen, and n-heptane is injected into a tubular reactor. In this model, the reactor is assumed to be isothermal and isobaric and the gas is assumed to be ideal. It is also assumed that variations occur only in time and in the axial direction and that diffusion is negligible. These assumptions result in the following system of equations:

$$
\frac{\partial x_i}{\partial t} + \frac{\partial (x_i v_z)}{\partial z} - \frac{RT}{P} w_i = 0, \qquad i = 1, \ldots, N_c - 1
\tag{11}
$$

$$
\frac{\partial v_z}{\partial z} - \frac{RT}{P} \sum_{k=1}^{N_c} w_k = 0
\tag{12}
$$

$$
\sum_{k=1}^{N_c} x_k = 1
\tag{13}
$$

where $x_i(t, z)$ is the mole fraction of species $i$, $N_c$ is the number of species, $t$ is time, $z$ is the axial coordinate, $R$ is the gas constant, $T$ is temperature, $P$ is pressure, $v_z(t, z)$ is the gas velocity in the $z$-direction (all other velocity components are assumed to be negligible), and $w_i(T, P, x(t, z))$ is the molar production rate of species $i$ per unit volume. The PDAE above was discretized using upwind finite differences and coded into an ABACUSS II input file. As in the previous example, the molar production rates, $\{w_k\}_{k=1}^{N_c}$, were computed with external Fortran subroutines from the CHEMKIN-II library and the same n-heptane mechanism was used.
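One way the upwind discretization of the species balance (11) might look is sketched below. The first-order backward difference, the assumption of positive velocity, the treatment of the inlet as the upstream neighbour of the first grid point, and all names (`species_time_derivatives`, `x_in`, `v_in`, `RT_over_P`) are assumptions for illustration; the paper does not give the scheme's details, and the production rates `w` here are plain inputs rather than CHEMKIN-II calls.

```python
def species_time_derivatives(x, v, w, x_in, v_in, dz, RT_over_P):
    """dx_i/dt at each grid point from Eq. (11), first-order upwind in z.

    x[j][i]: mole fraction of species i at grid point j
    v[j]:    axial velocity at grid point j (assumed positive)
    w[j][i]: molar production rate of species i at grid point j
    x_in, v_in: inlet composition and velocity (boundary condition)
    """
    dxdt = []
    for j in range(len(x)):
        # Upstream neighbour: the inlet boundary for the first grid point.
        x_up, v_up = (x_in, v_in) if j == 0 else (x[j - 1], v[j - 1])
        row = []
        for i in range(len(x[j])):
            flux_gradient = (x[j][i] * v[j] - x_up[i] * v_up) / dz
            row.append(-flux_gradient + RT_over_P * w[j][i])
        dxdt.append(row)
    return dxdt


# One grid point, two species, no reaction: the inlet composition
# (0.5, 0.5) displaces the initial composition (0.0, 1.0).
rates = species_time_derivatives(
    x=[[0.0, 1.0]], v=[1.0], w=[[0.0, 0.0]],
    x_in=[0.5, 0.5], v_in=1.0, dz=0.5, RT_over_P=1.0,
)
print(rates)
```

In the equation-oriented setting these expressions would be written as residuals in the input file, with (12) and (13) closing the system.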

Using 10 grid points in the discretization, the overall model consisted of 10,890 variables ($N_c$ mole fractions, $N_c$ molar production rates, and one velocity on each of the ten grid points) and the same number of equations. Figure 6 contains a diagram of the exact sparsity pattern for the unsteady PFR model. The variables (columns of the sparsity pattern) are ordered as follows: the molar production rates on grid point 1, followed by the molar production rates on grid point 2, and so on; then the mole fractions on grid point 1, followed by the mole fractions on grid point 2, and so on; and finally the velocities on each grid point, which occupy the last 10 columns. The equations (rows of the sparsity pattern) are ordered as follows: the calls to the external Fortran code to compute the molar production rates on each grid point, followed by the discretized species balance equations, followed by the discretized velocity relationships, and lastly the mole fraction summation constraints on each grid point. The sparsity pattern for the portions of the overall model corresponding to external procedures was obtained with DAEPACK, and these blocks can be seen in the upper right corner of the sparsity pattern in Figure 6. Note that each of these sub-blocks of the Jacobian matrix is 544 rows by 544 columns but contains only 12,518 nonzero entries. Although not clearly evident due to the limited resolution of the figure, these blocks are approximately 96% sparse. When the external procedures are treated as “black-boxes”, each of these 10 blocks of the overall Jacobian matrix must instead be treated as dense, with 295,936 entries.
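The counts quoted above follow directly from the problem dimensions, as the short check below shows. The variable names are illustrative only.

```python
n_species = 544          # N_c for the n-heptane mechanism
n_grid = 10              # grid points in the discretization
nnz_per_block = 12_518   # structural nonzeros in one external block

# N_c mole fractions, N_c production rates, and one velocity per grid point.
n_variables = n_grid * (2 * n_species + 1)

# Entries in one block if the external procedure is treated as dense.
dense_entries = n_species ** 2

# Fraction of structurally zero entries in one block.
sparsity = 1.0 - nnz_per_block / dense_entries

print(n_variables)                 # 10890
print(dense_entries)               # 295936
print(round(100 * sparsity, 1))    # 95.8
```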

Two numerical integrations were performed with this model using ABACUSS II, one where the external procedure was treated as a “black-box” and another where symbolic information was obtained with DAEPACK. Both simulations were performed at an absolute pressure of 12.5 atmospheres and a temperature of 900 K. The reactor length was five centimeters. Initially, the tubular reactor was filled with pure nitrogen. The boundary condition was a fixed composition and velocity at the inlet of the reactor (mole fractions of O$_2$, N$_2$, and n-heptane equal to 0.0252, 0.9734, and 0.0014, respectively, with trace quantities of oxygen and hydrogen free radicals, at mole fractions of $10^{-16}$ each). The inlet velocity was fixed at 1 cm/s. The numerical integration was performed for a simulated time of 0.1 seconds on a 1.4 GHz PC with 512 MB RAM. Table 3 contains the timing information for this example.

Table 3: Timing information for unsteady PFR example. Calculations performed on a 1.4 GHz PC with 512 MB RAM.

Timings in seconds                  Black-box    DAEPACK
Overall simulation time                8252        280
Single Jacobian evaluation time         160          3.3
Single LU factorization time              2.0        0.1
Single LU backsubstitution time           0.12       0.01

This example clearly illustrates the benefit of properly incorporating external procedures during numerical calculations. Using this additional symbolic information, obtained automatically by ABACUSS II with DAEPACK, the simulation time was reduced from approximately 2.3 hours to 4.7 minutes, a roughly 30-fold speedup. The performance improvement can be attributed to essentially two factors. First, by efficient accumulation of (structurally) nonzero derivative values in the sparse blocks of the Jacobian matrix corresponding to the external code, the overall Jacobian matrix evaluation time was reduced from 160 seconds to 3.3 seconds. In both scenarios, all derivative values other than those associated with the external code were computed in the same manner. Second, by exploiting sparsity in the linear solver, the time for a single LU factorization was reduced from 2 seconds to 0.1 seconds and the time for a single backsubstitution was reduced from 0.12 seconds to 0.01 seconds. Again, note that in the “black-box” case only the sub-blocks associated with the external code are dense; sparsity is exploited for all other portions of the Jacobian matrix in both scenarios.

This example highlights an interesting observation about simulations involving complex physical property or kinetic mechanism calculations. In many numerical integration calculations, the cost of the LU factorization tends to dominate the cost of the overall calculation (which is why most modern numerical integration codes attempt to reduce the number of LU factorizations performed). However, if the residual evaluation is costly, as is the case in the example above, the Jacobian evaluation time can significantly exceed the cost of the linear algebra. Thus, proper incorporation of external procedures can substantially reduce the cost of the calculation even if matrix-free linear solvers are applied.

Also of significance is the amount of memory saved by exploiting sparsity. When the external procedures were treated as “black-boxes”, each dense block of the overall Jacobian matrix corresponding to these equations contained 295,936 entries. Since the matrix is stored in sparse triplet form (i.e., a double precision array containing the values of the Jacobian matrix and two integer arrays containing the row and column indices), the external blocks alone require approximately 47 megabytes to store the derivative information. This compares with only 2 megabytes of storage required for the external procedure blocks when sparsity is exploited.
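The storage and speedup figures can be reproduced with a few lines of arithmetic. The sketch assumes 8-byte double precision values and 4-byte integer indices in the triplet format; the integer size is not stated in the text and is an assumption here.

```python
n_blocks = 10                 # one external block per grid point
dense_entries = 544 * 544     # entries per block when treated as dense
sparse_entries = 12_518       # structural nonzeros per block

# Triplet storage: one double (8 bytes) plus two integer indices
# (assumed to be 4 bytes each) per stored entry.
bytes_per_entry = 8 + 4 + 4

dense_bytes = n_blocks * dense_entries * bytes_per_entry
sparse_bytes = n_blocks * sparse_entries * bytes_per_entry

print(dense_bytes / 1e6)      # about 47.3 megabytes
print(sparse_bytes / 1e6)     # about 2.0 megabytes

# Ratios of the black-box to DAEPACK timings reported for this example.
overall_speedup = 8252 / 280
jacobian_speedup = 160 / 3.3
print(round(overall_speedup, 1), round(jacobian_speedup, 1))
```

Under these assumptions the dense blocks occupy about 47 megabytes and the sparse blocks about 2 megabytes, in agreement with the figures quoted, and the overall speedup is a factor of about 29.5.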

6 Conclusion

This paper describes how source-to-source code transformation techniques can be used to incorporate

external code into an equation-oriented process modeling environment properly. This enables the user

to write complex models described partly in the input language of the process simulator and partly with

new or legacy external codes. By properly handling these external procedures, the user can be confident

that the subsequent calculation will be performed robustly, efficiently, and correctly.

The ideas described in this paper have been implemented with the equation-oriented process simulator

ABACUSS II and numerical and symbolic software library DAEPACK. DAEPACK currently works with

Fortran but can be readily extended to other procedural programming languages. Comparable capabilities

can be achieved with object-oriented languages like C++ using operator overloading features.

Although the focus of this paper is on incorporating external procedures into equation-oriented modeling environments, the techniques described are also useful for incorporating external procedures into modular simulators for steady-state simulation and optimization. In particular, the ability to generate fast and accurate analytical derivative values can often substantially improve the performance of steady-state calculations, especially steady-state optimization.

Acknowledgments

The authors would like to acknowledge support from the EPA Center for Airborne Organics at MIT and

Mitsubishi Chemical Corporation.

References

[1] J. M. Douglas. Conceptual Design of Chemical Processes. McGraw-Hill, New York, 1988.

[2] J. D. Seader. Computer modeling of chemical processes. In AIChE Symposium Series, volume 81.

American Institute of Chemical Engineers, 1985.

[3] A. L. Parker and R. R. Hughes. Approximation programming of chemical processes – 1: Optimization

of FLOWTRAN models. Computers and Chemical Engineering, 5(3):123–133, 1981.

[4] L. T. Biegler and R. R. Hughes. Process optimization: A comparative case study. Computers and

Chemical Engineering, 7(5):645–661, 1983.

[5] S. C. Kassianides. An Integrated System for Computer Based Training of Process Operators. PhD

thesis, University of London, London, U.K., 1991.

[6] S. Mani, S. K. Shoor, and H. S. Pederson. Experience with simulator training for ammonia plant

operators. Plant/Operations Progress, 10:6–10, 1990.

[7] F. A. Perris. The growing importance of dynamic simulation for process engineers. In Dynamic

Simulation in the Process Industries, UMIST Manchester, 1990. IChemE.

[8] J. D. Perkins and G. W. Barton. Modelling and simulation in process operation. In G. V. Reklaitis and

H. D. Spriggs, editors, Foundations of Computer Aided Operations, pages 287–316. Cache-Elsevier,

1987.

[9] Paul Inigo Barton. The Modeling and Simulation of Combined Discrete/Continuous Processes. PhD

thesis, University of London, London, U.K., May 1992.

[10] M. B. Carver. Efficient integration over discontinuities in ordinary differential equation simulations.

Mathematics and Computers in Simulation, XX:190–196, 1978.

[11] J. L. Hay and A. W. J. Griffin. Simulation of discontinuous dynamical systems. In L. Dekker,

G. Savastano, and G. C. Vansteenkiste, editors, Proceedings of the 9th IMACS Conference on Sim-

ulation of Systems, 1979. North-Holland, 1980.

[12] Santos Galán, William F. Feehery, and Paul I. Barton. Parametric sensitivity functions for hybrid

discrete/continuous systems. Applied Numerical Mathematics, 31:17–47, 1999.

[13] John E. Tolsma and Paul I. Barton. Hidden discontinuities and parametric sensitivity calculations.

accepted, SIAM Journal on Scientific Computing, 2001.

[14] OMG. The Common Object Request Broker: Architecture and specifications. Technical Report

Release 2.0 July 1995, Update July 1996, Object Management Group, 1997. Formal document

97-02-25, (http://www.omg.org).

[15] D. Box. Essential COM. Addison–Wesley, Menlo Park, CA, 1998.

[16] CAPE-OPEN Consortium. Conceptual Design Document, December 1997. Adobe Acrobat PDF

document obtainable from http://www.quantsci.co.uk/CAPE-OPEN.

[17] B. L. Braunschweig, C. C. Pantelides, H. I. Britt, and S. Sama. Open software architectures for

process modeling: Current status and future perspectives. AIChE Symposium Series, presented at

FOCAPD ’99, July 1999.

[18] J. G. Pearce. Computer simulation of multi-state systems. In Proc UKSC Conference on Computer

Simulation. IPC Science and Technology Press, 1978.

[19] M. P. Avraam, N. Shah, and C. C. Pantelides. Modelling and optimisation of general hybrid systems

in the continuous time domain. Computers and Chemical Engineering, 22(Suppl.):S221–S228, 1998.

[20] Clemens Szyperski. Component Software: Beyond Object-Oriented Programming. ACM Press, New

York, NY, 1998.

[21] Edward P. Gatzke, John E. Tolsma, and Paul I. Barton. Construction of convex function relaxations

using automated code generation techniques. submitted to Optimization and Engineering, 2001.

[22] R. E. Moore. Methods and Applications of Interval Analysis. SIAM, Philadelphia, 1979.

[23] John E. Tolsma, Jerry Clabaugh, and Paul I. Barton. ABACUSS II: Advanced modeling environment

and embedded process simulator. Technical Report ABACUSS II Web Page, Massachusetts Institute

of Technology, 2000. http://yoric.mit.edu/abacuss2/abacuss2.html.

[24] P. W. Holl, W. Marquardt, and E. D. Gilles. DIVA – A powerful tool for dynamic process simulation.

Computers and Chemical Engineering, 12:421, 1988.

[25] Peter C. Piela. ASCEND: An Object-oriented Computer Environment for Modeling and Analysis.

PhD thesis, Carnegie-Mellon University, Pittsburgh, PA, 1989.

[26] A. Griewank. Evaluating Derivatives: Principles and techniques of algorithmic differentiation. SIAM,

Philadelphia, PA, 2000.

[27] John E. Tolsma and Paul I. Barton. DAEPACK: A combined symbolic and numeric library for

general numerical calculations. Technical Report DAEPACK Web Page, Massachusetts Institute of

Technology, 2000. http://yoric.mit.edu/daepack/daepack.html.

[28] John E. Tolsma and Paul I. Barton. DAEPACK: An open modeling environment for legacy models.

Industrial and Engineering Chemistry Research, 39(6):1826–1839, 2000.

[29] Andreas Griewank. On automatic differentiation. In M. Iri and K. Tanabe, editors, Mathematical

Programming: Recent Developments and Applications, pages 83–108. Kluwer Academic Publishers,

Dordrecht, 1989.

[30] Masao Iri, T. Tsuchiya, and M. Hoshi. Automatic computation of partial derivatives and rounding

error estimates with applications to large-scale systems of nonlinear equations. J. Computational

and Applied Mathematics, 24:365–392, 1988. Original Japanese version appeared in J. Information

Processing, 26 (1985), pp. 1411–1420.

[31] Masao Iri. History of automatic differentiation and rounding estimation. In Andreas Griewank and

George F. Corliss, editors, Automatic Differentiation of Algorithms: Theory, Implementation, and

Application, pages 1–16. SIAM, Philadelphia, Penn., 1991.

[32] John E. Tolsma and Paul I. Barton. On computational differentiation. Computers and Chemical

Engineering, 22(4/5):475–490, 1998.

[33] B. M. Averick, J. J. Moré, C. H. Bischof, A. Carle, and A. Griewank. Computing large sparse

Jacobian matrices using automatic differentiation. SIAM J. Sci. Stat. Comput., 15:285–294, 1994.

[34] C. H. Bischof, P. Khademi, A. Bouaricha, and A. Carle. Efficient computation of gradients and

Jacobians by transparent exploitation of sparsity in automatic differentiation. Optimization Methods

and Software, 7:1–39, 1996.

[35] John E. Tolsma and Paul I. Barton. Efficient calculation of sparse Jacobians. SIAM Journal on

Scientific Computing, 20(6):2282–2296, 1999.

[36] M. C. Bartholomew-Biggs, L. Bartholomew-Biggs, and B. Christianson. Optimization and automatic

differentiation in Ada: Some practical experience. Optimization Methods and Software, 4:47–73, 1994.

[37] A. Griewank, D. Juedes, and J. Utke. ADOL–C: A package for the automatic differentiation of

algorithms written in C/C++. ACM TOMS, 22(2):131–167, 1996.

[38] Christian Bischof, Alan Carle, George Corliss, Andreas Griewank, and Paul Hovland. ADIFOR –

Generating derivative codes from Fortran programs. Scientific Programming, 1(1):11–29, 1992.

[39] N. Rostaing, S. Dalmas, and A. Galligo. Automatic differentiation in Odyssee. Tellus, 45A:558–568,

1993.

[40] R. Giering and T. Kaminski. Recipes for adjoint code construction. ACM Transactions on Mathematical

Software, 24:437–474, 1998.

[41] Taeshin Park and Paul I. Barton. State event location in differential-algebraic models. ACM

Transactions on Modeling and Computer Simulation, 6(2):137–165, 1996.

[42] R. J. Kee, F. M. Rupley, and J. A. Miller. CHEMKIN-II: A FORTRAN chemical kinetics package

for the analysis of gas-phase chemical kinetics. Technical Report SAND89-8009,

Sandia National Laboratories, 1989.

[43] H. J. Curran, P. Gaffuri, W. J. Pitz, and C. K. Westbrook. A comprehensive modeling study of

n-heptane oxidation. Combustion and Flame, 114:149–177, 1998.

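[Figure 5 panel annotations. Top (example A): branches $x^{k_1}$ and $x^{k_2}$, with $k_2 > k_1$. Middle (example B): branches $x^{k_1}$ and $\alpha x$, with $\alpha > 0$ and $k_1 > 1$. Bottom (example C): branches $x^{k_1}$ and $\alpha + x^{k_2}$, with $\alpha > 0$.]

Figure 5: Simple discontinuities: example A (top), example B (middle), example C (bottom).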

Figure 6: Sparsity pattern of the unsteady PFR model.
