Hindawi Publishing Corporation

Scientific Programming
Volume 2015, Article ID 797325, 22 pages
http://dx.doi.org/10.1155/2015/797325

Research Article
Finite Element Assembly Using an Embedded Domain Specific Language

Bart Janssens,1 Támas Bányai,2 Karim Limam,3 and Walter Bosschaerts1
1 Department of Mechanics, Royal Military Academy, Avenue de Renaissance 30, 1000 Brussels, Belgium
2 von Karman Institute for Fluid Dynamics, Chaussée de Waterloo 72, 1640 Rhode-Saint-Genèse, Belgium
3 LaSIE, La Rochelle University, Avenue Michel Crépeau, 17042 La Rochelle Cedex 1, France

Correspondence should be addressed to Bart Janssens; bart@bartjanssens.org

Received 18 November 2013; Revised 22 October 2014; Accepted 19 January 2015

Academic Editor: Bormin Huang

Copyright © 2015 Bart Janssens et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In finite element methods, numerical simulation of the problem requires the generation of a linear system based on an integral
form of a problem. Using C++ meta-programming techniques, a method is developed that allows writing code that stays close to
the mathematical formulation. We explain the specifics of our method, which relies on the Boost.Proto framework to simplify the
evaluation of our language. Some practical examples are elaborated, together with an analysis of the performance. The abstraction
overhead is quantified using benchmarks.

1. Introduction

The application of the finite element method (FEM) requires the discretization of the integral form of the
partial differential equations that govern the physics, thus transforming the problem into a set of algebraic
equations that can be solved numerically. In essence, the resulting numerical model only depends on the
governing equations and the set of basis and test functions. From the point of view of a model developer, it
would therefore be ideal to only need to specify these parameters in a concise form that closely resembles the
mathematical formulation of the problem, without sacrificing computational efficiency.

The current work is part of Coolfluid 3 [1], a C++ framework intended primarily for the solution of fluid
dynamics problems and available as open source under the LGPL v3 license. The system is designed around a
Component class that can provide a dynamic interface through Python scripting and a GUI. Model developers will
typically write a set of components to build a solver for a given set of equations, using functionality
provided by our framework and external libraries. Problem-dependent settings such as the mesh, boundary
conditions, and model parameters can then be controlled through Python or a GUI, though these aspects are
beyond the scope of the current paper. In this context, we provide an Embedded Domain Specific Language (EDSL)
that can be used to implement finite element solver components. It uses a notation similar to the mathematical
formulation of the weak form of a problem, allowing the programmer to focus on the physics rather than coding
details. We assume a typical finite element workflow, where a global linear system is assembled from element
matrix contributions. Our language can also describe boundary conditions and field arithmetic operations.

All language elements consist of standard C++ code, so they easily embed into the components of our framework.
An extension mechanism is available, allowing any developer to implement his own language elements without
touching the library code. The implementation uses the Boost.Proto [2] framework, which builds on template
meta programming techniques for generating efficient code. The actual performance overhead will be quantified
and compared to the global solution time for some applications.

The automatic generation of code based on an intuitive specification of the physical problem is of course a
feature that is highly desirable for any numerical simulation framework. One example of previous work
providing such functionality is OpenFOAM [3], a C++ library primarily intended for fluid mechanics. It allows
for the easy expression of differential equations by providing an embedded tensor manipulation language,
specific to finite volume discretizations. The FEniCS project [4] provides an easy and efficient method for
developing finite element discretizations. The approach differs from ours in that the language is embedded
into Python and compiled using a just-in-time compiler. It also offers the option of automated analytical
evaluation of integrals [5]. Another example is the FEEL++ project [6], which also provides a domain specific
language embedded into C++ using expression templates.

Notwithstanding the existence of all these excellent projects, we decided to implement our own framework for
tight integration with the Coolfluid data structures and to support our workflow of building user-configurable
components. The implementation relies on expression templates, just like FEEL++, but we use Boost.Proto to
handle the expression template generation and to define a grammar for our language. This results in simpler
code and has a limited impact on runtime performance, as we will show. Even though our language is not as
feature-complete as the more established FEEL++ and FEniCS projects, we do believe that the easy integration
of user-defined extensions into the language is a unique feature. The purpose of the current paper is to
present how Proto can be used to construct a language for finite element modeling (based on a simple example
building a mass matrix) and to show the capabilities and performance of the language we developed. Aspects
that are new compared to previous work in the field are the use of Proto as a new way to construct the
language and the possibility to add user-defined terminals to the language.

This paper is structured as follows. In Section 2 the mathematical formulation of the integral form will be
laid out in order to clearly define the class of problems that is supported. Next, in Section 3 the mechanics
for constructing and interpreting the EDSL are explained in detail, based on a simple application. Section 4
contains some application examples and Section 5 a performance analysis. Finally, in Section 6 we present our
conclusions and suggest future work.

2. Finite Element Discretization

In this section, we introduce the notation used for finite element discretizations, starting from the Poisson
problem as an example. A more generic and much more thorough introduction to the finite element method is
given in, for example, [7]. Considering a domain $\Omega$ with boundary $\Gamma$, the differential equations
describing the Poisson problem are

$$\nabla^2 f + g = 0 \quad \text{over } \Omega, \tag{1}$$

$$f = f_0 \quad \text{over } \Gamma. \tag{2}$$

To transform these continuous equations into a discrete problem, the unknown $f$ is interpolated using a
linear combination of $n$ shape functions $N_i(\mathbf{x})$ with the unknown coefficients $f_i$:

$$f \approx \sum_{i=1}^{n} N_i(\mathbf{x}) f_i = \tilde{f}. \tag{3}$$

The shape functions depend only on the spatial coordinates $\mathbf{x}$ and to simplify the notation we will
from here on just write $N$ instead of $N(\mathbf{x})$. The shape functions have a local support and the
interpolation happens on a per-element basis, so $n$ is equal to the number of nodes in an element. The
interpolation can also be written in vector form as $\tilde{f} = \mathbf{N}_f \mathbf{f}_e$. Here,
$\mathbf{N}_f$ is a row vector with the shape functions associated with unknown $f$ and $\mathbf{f}_e$ is the
column vector of the unknown coefficients for the element.

To obtain a linear system for the discrete problem, we need to multiply the equations with a weighting
function and integrate over the domain. Using the Galerkin method (i.e., the weighting functions are the shape
functions) as well as integrating per-element yields the following linear system for the Poisson problem,
using the weak formulation here:

$$\sum_{e=1}^{n_\Omega} \underbrace{\int_{\Omega_e} \nabla \mathbf{N}_f^T \nabla \mathbf{N}_f \, d\Omega_e}_{A_{ff}} \, \mathbf{f}_e = \sum_{e=1}^{n_\Omega} \underbrace{\int_{\Omega_e} \mathbf{N}_g^T \tilde{g} \, d\Omega_e}_{\mathbf{a}_f}. \tag{4}$$

Due to the local support of the shape functions, the integral is written as a sum of integrals over all the
elements. Each element integral results in an element matrix $A_{ff}$ on the left hand side and an element
vector $\mathbf{a}_f$ on the right hand side. Summing up the contributions of all elements that share a given
node assembles a global linear system that has the total number of nodes as dimension.

The indices $ff$ are useful for problems with multiple unknowns. In this case, each unknown can be associated
with its own shape function, and the element matrices and vectors are built up of blocks associated with each
variable and equation.

The matrix assembly procedure is the same for all problems; only the values of the element matrices depend on
the initial equation (1). This observation drives the current work: we will present a convenient language to
write out the element equations and automate the steps that are common for all finite element
discretizations. The code for the Poisson problem is presented in Listing 1, where lines (17) and (18) map
directly to the element matrix and vector, as defined in (4), and lines (20) and (21) represent the assembly
into the global system. Lines (17) and (18) form the core of our language, and the techniques for implementing
this will be explained in detail in the next section. The remainder of Listing 1 will be further explained in
Section 4.1.

From this simple example, the basic requirements for our language can be identified: provide access to the
shape functions and their derivatives, as well as nodal values of the variables; control the evaluation of
integrals; and allow the application of boundary conditions. Finally, we must be able to express the assembly
into global matrices and vectors. The embedding of such a language into C++ is the main topic of the remainder
of this paper.
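As a concrete instance of (4), the following sketch assembles the global system for two-node linear line elements on a uniform 1D mesh with a constant source term. For an element of length h, the integrals then have the closed forms A_ff = (1/h) [[1, -1], [-1, 1]] and a_f = (g h / 2) [1, 1]^T. The names GlobalSystem and assemble_poisson_1d are hypothetical and not part of Coolfluid 3; the dense storage and the omission of boundary conditions are simplifying assumptions made only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense container for the global system (illustration only)
struct GlobalSystem
{
  std::vector<std::vector<double> > matrix;
  std::vector<double> rhs;
};

// Assemble the 1D Poisson system of (4) for nb_elems linear elements of
// length h and a constant source term g. Boundary conditions are not applied.
GlobalSystem assemble_poisson_1d(const std::size_t nb_elems, const double h, const double g)
{
  const std::size_t nb_nodes = nb_elems + 1;
  GlobalSystem sys;
  sys.matrix.assign(nb_nodes, std::vector<double>(nb_nodes, 0.));
  sys.rhs.assign(nb_nodes, 0.);

  // Element matrix: integral of grad(N)^T grad(N), with grad(N) = [-1/h, 1/h]
  const double A_e[2][2] = {{1./h, -1./h}, {-1./h, 1./h}};
  // Element vector: integral of N^T g for constant g
  const double a_e[2] = {0.5*g*h, 0.5*g*h};

  for(std::size_t e = 0; e != nb_elems; ++e)
  {
    // Local node i of element e is global node e + i on this mesh
    for(std::size_t i = 0; i != 2; ++i)
    {
      sys.rhs[e + i] += a_e[i];
      for(std::size_t j = 0; j != 2; ++j)
        sys.matrix[e + i][e + j] += A_e[i][j];
    }
  }
  return sys;
}
```

Interior nodes receive contributions from both neighbouring elements, which is the "summing up" step described above; only the two arrays A_e and a_e would change for a different equation.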

3. Construction of the Language

Our language consists of three levels of implementation, as shown in Figure 1. The top level contains the
language itself, which consists of keywords that the user combines into an expression. Because the language is
embedded, only valid C++ expressions are allowed, but we can define the interpretation freely via operator
overloading. This is the job of the algorithm implementation layer. Here, the language is parsed and
appropriate actions (called semantic actions in language grammar terms) are linked to each part of the input.
Some of the actions may result in calls to a shape function library or matrix operations, so the final layer
contains all external libraries needed to execute these.

[Figure 1: The structure of the program, showing the three different levels of the implementation. Language:
expression. Algorithm implementation: Boost.Proto grammar + transforms, data. External libraries: mesh, shape
function, numerical integration, dense matrix, and sparse matrix libraries.]

The remainder of this section will focus on each layer in more detail, using a simple stand-alone example.
This allows us to easily demonstrate the techniques used in Coolfluid 3 while avoiding the need to explain the
details of our mesh structure and linear algebra interface, which are beyond the scope of this paper. The
example will result in a program that builds the mass matrix for a finite element problem with a single scalar
variable. It is a stand-alone program that depends only on the Boost libraries and on the Eigen library [8]
for matrix operations. The mass matrix $M$ is assembled from the outer product of the weight and shape
functions, that is,

$$M = \sum_{e=1}^{n_\Omega} \int_{\Omega_e} \mathbf{N}^T \mathbf{N} \, d\Omega_e. \tag{5}$$

The shape function has no index referring to a specific variable here, since we only consider a single,
unspecified scalar variable. We want to ensure that the following code evaluates (5):

    for_each_element(mesh, element_quadrature(M += transpose(N) * N));    (6)

Here, for_each_element represents a generic function, taking the finite element mesh structure as first
argument and a Proto expression as second argument. The expression consists here of a call to
element_quadrature to evaluate the integral over an element and the += operator should be interpreted as an
assembly into the mass matrix M. The outer product of the shape function vectors maps directly to
transpose(N)*N. In what follows, we show how we can evaluate this expression.

3.1. The Language Layer. Using the Boost.Proto [2] library, expressions are constructed from so-called
“terminals.” These are C++ objects that are instances of the Proto terminal class, which overloads all of the
C++ operators. Combining different terminals using operators builds an expression that can be evaluated later.
This expression is analogous to the expression templates used in, for example, FEEL++ [6].

[Figure 2: The expression tree for line (19) in Listing 2. The function call operator is denoted by ().]

Listing 2 presents a minimal program that allows compiling the expression for (5). Line (19) contains the
expression, built from the terminals defined at lines (11) to (13). Because all operations are defined on a
Proto terminal, the code compiles. The result of line (19) is an expression tree, retaining the concrete type
of each element in the expression. This is the same kind of data structure that we encounter in any expression
template library, but it is generated automatically by Proto here. Figure 2 shows the expression tree
corresponding to line (19). Leaf nodes in this tree correspond to Proto terminals.

Each function call operator (denoted by ()) has two child nodes: the terminal representing the function itself
(e.g., transpose) and the argument to the function. The terminal that represents the function is used to link
it with the correct code of the function, based on the unique type of that terminal. This implies that
functions can easily be renamed by creating a new terminal of the same type. The binary operators += and *
also have two children, corresponding to the left and right hand side.

So far, the expression on line (19) does nothing. The parsing and the implementation of the appropriate
actions to evaluate this are the subject of the next section.

    (1)  // The unknown function. The first template argument is a constant to distinguish each variable at compile time
    (2)  FieldVariable<0, ScalarField> f("f", solution_tag());
    (3)  // The source term, to be set at runtime using an initial condition
    (4)  FieldVariable<1, ScalarField> g("g", "source_term");
    (5)
    (6)  // Action handling the assembly
    (7)  Handle<ProtoAction> assembly = create_component<ProtoAction>("Assembly");
    (8)  // Set the expression
    (9)  assembly->set_expression(elements_expression
    (10) (
    (11)   mesh::LagrangeP1::CellTypes(), // All first order Lagrange elements
    (12)   group
    (13)   (
    (14)     _A = _0, _a = _0, // The element matrix and RHS vector, initialized to 0
    (15)     element_quadrature // Integration over the element
    (16)     (
    (17)       _A(f) += transpose(nabla(f)) * nabla(f),
    (18)       _a[f] += transpose(N(g))*g
    (19)     ),
    (20)     system_matrix += _A, // Assemble into the global linear system
    (21)     system_rhs += _a
    (22)   )
    (23) ));

Listing 1: Implementation of the assembly procedure for the Poisson problem.

    (1)  #include <boost/proto/proto.hpp>
    (2)
    (3)  using namespace boost;
    (4)
    (5)  // Different types to distinguish terminals at compile time
    (6)  struct shape_func_tag {};
    (7)  struct element_quadrature_tag {};
    (8)  struct transpose_tag {};
    (9)
    (10) // Some statically created terminals
    (11) proto::terminal< shape_func_tag >::type const N = {};
    (12) proto::terminal< element_quadrature_tag >::type const element_quadrature = {};
    (13) proto::terminal< transpose_tag >::type const transpose = {};
    (14)
    (15) int main(void)
    (16) {
    (17)   double* M;
    (18)   // This is a proto expression:
    (19)   element_quadrature(M += transpose(N)*N);
    (20)   return 0;
    (21) }

Listing 2: Generating expressions from Boost.Proto terminals.
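To make the idea of a tree that retains the concrete type of every node more tangible, the following sketch builds a comparable structure by hand, without Proto. It is only an analogy: the templates expr, terminal, function_expr, and multiplies_expr are hypothetical names invented for this illustration, the nodes are stateless (they record types only, not references to their children), and the types that Proto actually generates are different.

```cpp
#include <cassert>
#include <type_traits>

// Tags distinguishing terminals at compile time
struct shape_func_tag {};
struct transpose_tag {};

// CRTP base marking a type as an expression node, so the operator below
// only applies to our own node types
template<typename DerivedT>
struct expr {};

// Node for a function terminal applied to one argument
template<typename FunctionT, typename ArgT>
struct function_expr : expr<function_expr<FunctionT, ArgT> > {};

// Node for a product of two sub-expressions
template<typename LeftT, typename RightT>
struct multiplies_expr : expr<multiplies_expr<LeftT, RightT> > {};

// Leaf node; calling it builds a function node instead of doing any work
template<typename TagT>
struct terminal : expr<terminal<TagT> >
{
  template<typename ArgT>
  function_expr<terminal<TagT>, ArgT> operator()(const expr<ArgT>&) const
  {
    return function_expr<terminal<TagT>, ArgT>();
  }
};

// Multiplying two nodes builds a product node
template<typename LeftT, typename RightT>
multiplies_expr<LeftT, RightT> operator*(const expr<LeftT>&, const expr<RightT>&)
{
  return multiplies_expr<LeftT, RightT>();
}

// Statically created terminals, mirroring lines (11) and (13) of Listing 2
const terminal<shape_func_tag> N = {};
const terminal<transpose_tag> transpose = {};

// The type expected for transpose(N) * N: the whole tree is encoded in it
typedef multiplies_expr<
  function_expr<terminal<transpose_tag>, terminal<shape_func_tag> >,
  terminal<shape_func_tag> > mass_tree_t;
```

Writing `transpose(N) * N` with these definitions computes nothing; it only yields an object of type mass_tree_t. Renaming a function amounts to declaring another terminal with the same tag, since the tag, not the variable name, identifies the node.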

3.2. The Algorithm Implementation Layer. The algorithm implementation layer in Figure 1 takes care of the
interpretation of expressions and the execution of the associated actions. In this section, we take a top-down
approach for evaluating our example expression. To this end, we first define a class (a grammar in Proto
terminology) capable of parsing the expression. The class defined in Listing 3 describes the language that we
use in our example expression. The grammar is defined entirely in a templated type from which fem_grammar
derives. This notation is actually the embedded domain specific language provided by Proto itself for defining
a grammar. The or_ construct allows us to list a series of rules (each a Proto grammar in itself) that are
checked in sequence. The grammar can be used to check the validity of an expression at compile time, using the
proto::matches metafunction, which does not evaluate the expression but simply checks if it matches a given
grammar, taking into account only the when clauses. This functionality can be used to generate compilation
errors that can help in finding errors in expressions, but this is not further discussed in the current work.

    (1)  struct fem_grammar :
    (2)    // Match the following rules in order:
    (3)    proto::or_
    (4)    <
    (5)      // Evaluate shape functions using the eval_shape_func transform
    (6)      proto::when
    (7)      <
    (8)        proto::terminal<shape_func_tag>,
    (9)        eval_shape_func(proto::_data)
    (10)     >,
    (11)     // Evaluate transpose expressions using eval_transpose
    (12)     proto::when
    (13)     <
    (14)       proto::function<proto::terminal<transpose_tag>, fem_grammar >,
    (15)       eval_transpose(fem_grammar(proto::_child1))
    (16)     >,
    (17)     // Evaluate element_quadrature using eval_element_quadrature
    (18)     proto::when
    (19)     <
    (20)       proto::function< proto::terminal<element_quadrature_tag>, proto::plus_assign<proto::terminal<Eigen::MatrixXd>, fem_grammar> >,
    (21)       eval_element_quadrature(proto::_value(proto::_left(proto::_child1)), proto::_right(proto::_child1), proto::_data)
    (22)     >,
    (23)     // On any other expression: perform the default C++ action
    (24)     proto::_default<fem_grammar>
    (25)   >
    (26) {
    (27) };

Listing 3: Grammar capable of parsing a simple expression for interpolating coordinates using a shape function.

The main use of the grammars is the actual evaluation of the expressions. To this end, we can use fem_grammar
as a C++ functor on an expression, in which case the grammar acts as a Proto “transform” (i.e., a kind of
functor), evaluating the expression by executing the semantic actions embedded into the grammar. We will now
describe each rule in the grammar in detail.

First, at line (6), shape function expressions are matched and evaluated. The when statement first describes
what kind of syntax to expect, followed by a second argument (the semantic action) that describes how to
evaluate the expression matching the when clause. In the case of terminals, we can use the type to distinguish
them. The grammar matches this using proto::terminal with the appropriate type on line (8). In (6), shape
functions appear as the terminal N. The second argument in the when clause (line (9)) performs the evaluation.
All the grammar code is inside template parameters, so everything is actually a type. Line (9) is then not
really a function call, but the type of a function that returns something of the type eval_shape_func and
takes proto::_data as argument. This function is never really defined, but Proto uses the type to call an
eval_shape_func functor and pass its arguments. The argument consists of context data here (the “data” block
from Figure 1) and will be discussed in detail later. The implementation of the eval_shape_func functor is
given in Listing 4. To be usable in a Proto grammar, a functor must inherit the proto::callable class. Lines
(4) to (13) are used by Proto to compute the result type of the functor.
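The way the when rules pair a pattern with an action can be mimicked, very roughly, by ordinary overload resolution on the node type: each overload below plays the role of one rule, and the recursive calls correspond to the recursive use of fem_grammar on child expressions. This is a hand-written analogy and not how Proto is implemented. The node types, the evaluate overloads, and context_data are hypothetical names, and plain std::array storage for a two-node element stands in for the Eigen types of the example program.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Minimal stand-ins for expression nodes (hypothetical, not Proto types)
struct shape_func_terminal {};
template<typename ArgT> struct transpose_expr {};
template<typename LeftT, typename RightT> struct multiplies_expr {};

// Fixed-size types for a two-node element
typedef std::array<double, 2> row_vector_t;             // 1 x 2
struct col_vector_t { std::array<double, 2> v; };       // 2 x 1
typedef std::array<std::array<double, 2>, 2> matrix_t;  // 2 x 2

// Context data caching the shape function values, comparable in role to
// the "data" block of Figure 1
struct context_data
{
  row_vector_t shape_func;
};

// Rule 1: a shape function terminal evaluates to the cached values
inline const row_vector_t& evaluate(shape_func_terminal, const context_data& data)
{
  return data.shape_func;
}

// Rule 2: transpose, applied after recursively evaluating the argument
template<typename ArgT>
col_vector_t evaluate(transpose_expr<ArgT>, const context_data& data)
{
  col_vector_t result;
  result.v = evaluate(ArgT(), data);
  return result;
}

// Rule 3: a product of a column and a row vector gives the outer product.
// Other operand combinations fail to compile in this simplified sketch.
template<typename LeftT, typename RightT>
matrix_t evaluate(multiplies_expr<LeftT, RightT>, const context_data& data)
{
  const col_vector_t left = evaluate(LeftT(), data);
  const row_vector_t right = evaluate(RightT(), data);
  matrix_t result;
  for(int i = 0; i != 2; ++i)
    for(int j = 0; j != 2; ++j)
      result[i][j] = left.v[i] * right[j];
  return result;
}
```

Evaluating the type `multiplies_expr<transpose_expr<shape_func_terminal>, shape_func_terminal>` with shape function values 0.25 and 0.75 in the context data yields the 2 x 2 outer product of those values. Unlike this sketch, a Proto grammar keeps the matching rules and the actions in one declarative type and falls back to the default C++ behaviour for anything it does not name explicitly.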

In C++11, it is possible to obtain this result type automatically, so this part of the code can be omitted.
The function itself is defined on lines (15)–(20) and on line (19) we return the shape function values, which
are just cached in the context data. As will be seen later, this cached value is actually computed in the
element_quadrature function.

    (1)  struct eval_shape_func : proto::callable
    (2)  {
    (3)    // C++ result_of declaration
    (4)    template<typename Signature>
    (5)    struct result;
    (6)
    (7)    // C++ result_of implementation
    (8)    template<class ThisT, typename DataT>
    (9)    struct result<ThisT(DataT)>
    (10)   {
    (11)     typedef const typename
    (12)       boost::remove_reference<DataT>::type::element_t::shape_func_t& type;
    (13)   };
    (14)
    (15)   template<typename DataT>
    (16)   const typename DataT::element_t::shape_func_t& operator()(DataT& data) const
    (17)   {
    (18)     // Return a reference to the result, stored in data
    (19)     return data.shape_func;
    (20)   }
    (21) };

Listing 4: Functor to evaluate the shape functions.

The next grammar rule, on line (12) in Listing 3, matches and evaluates the matrix transpose. In the matching
rule on line (14), it is prescribed that a function call using a terminal of type transpose_tag and with as
argument an expression that matches fem_grammar is the correct form of a transpose expression. To evaluate the
argument, we make a recursive call to fem_grammar itself (using _child1 to isolate the function argument in
the expression) and apply the eval_transpose functor to the result. Listing 5 shows the relevant code for
evaluating the transpose. It is a functor that takes any matrix type from the Eigen library [8] as argument
and calls the transpose function on line (6).

    (1) struct eval_transpose : proto::callable
    (2) {
    (3)   template<typename MatT>
    (4)   Eigen::Transpose<MatT> operator()(MatT& mat) const
    (5)   {
    (6)     return mat.transpose();
    (7)   }
    (8) };

Listing 5: Functor to evaluate the matrix transpose.

The rule describing the integration over an element starts on line (18) in Listing 3. On line (20), we impose
that only expressions of the plus_assign type, that is, +=, can be used as an argument to the
element_quadrature function. This expression is then picked apart on line (21) using the left and right Proto
operations, where the value of the left hand side (an Eigen matrix representing the global system matrix), the
right hand side expression, and the context data are passed to the eval_element_quadrature functor. This
functor is defined in Listing 6. On line (13), the numerical integration is performed by looping over the
Gauss points defined for the element. First, the shape function values and Jacobian determinant are computed
and stored in the context data (lines (15) and (16)). Next, the element matrix is updated with the value of
the expression at the Gauss point, multiplied with the appropriate weight and the Jacobian determinant. The
expression itself is evaluated using the call:

    fem_grammar()(expr, 0, data).    (7)

Here, our grammar is default-constructed (i.e., the first ()) and then called using three arguments: the
expression, the state (0 here), and the context data. The state is similar to
the data, but we chose not to use it in the present work and just pass an integer. By supplying the context
data, any shape function evaluations that occur in expr will use the correct value at the current Gauss point
as computed on line (15). This also means that the shape function is evaluated only once, no matter how many
times it occurs in the expression.

In our example, the element_quadrature function also places the result in the global system matrix (i.e., M in
(6)). This happens on lines (21)–(23), where the global node index for every entry in the element matrix is
obtained from the context data and the corresponding global matrix entry is updated.

    (1)  struct eval_element_quadrature : proto::callable
    (2)  {
    (3)    template<typename ExprT, typename DataT>
    (4)    void operator()(Eigen::MatrixXd& mat, const ExprT& expr, DataT& data) const
    (5)    {
    (6)      // Temporary storage for the result from the RHS
    (7)      Eigen::Matrix<double, DataT::element_t::nb_nodes, DataT::element_t::nb_nodes> rhs_result;
    (8)      rhs_result.setZero();
    (9)
    (10)     // Loop over the gauss points
    (11)     const typename DataT::element_t::gauss_points_t gauss_points = DataT::element_t::gauss_points();
    (12)     const typename DataT::element_t::gauss_weights_t gauss_weights = DataT::element_t::gauss_weights();
    (13)     for(int i = 0; i != DataT::element_t::nb_gauss_points; ++i)
    (14)     {
    (15)       data.shape_func = DataT::element_t::shape_function(gauss_points.row(i));
    (16)       data.det_jacobian = DataT::element_t::jacobian_determinant(gauss_points.row(i), data.coord_mat);
    (17)       rhs_result += gauss_weights[i] * data.det_jacobian * fem_grammar()(expr, 0, data);
    (18)     }
    (19)
    (20)     // Place the result in the global matrix
    (21)     for(int i = 0; i != DataT::element_t::nb_nodes; ++i)
    (22)       for(int j = 0; j != DataT::element_t::nb_nodes; ++j)
    (23)         mat(data.node_indices[i], data.node_indices[j]) += rhs_result(i,j);
    (24)   }
    (25) };

Listing 6: Functor to perform numerical integration over an element and assemble the result into the global matrix.

The final rule in the grammar (line (24) in Listing 3) is a fall-back to the default C++ behavior. It tells
Proto to perform the default C++ action when none of the earlier rules are matched, recursively calling
fem_grammar to interpret any subtrees. In the case of our example expression, it ensures that the matrix
product is evaluated using the operators defined in the Eigen library, after evaluating the left and right
operand using fem_grammar.

The definition of the context data used by the functors is given in Listing 7. It holds the knowledge about
the concrete element type and the finite element mesh that we use. We are completely free to define this type
as we see fit. Proto passes it along as a template parameter. In this example, we provide storage for results
such as the shape function values and the node coordinates. The data changes for each element and is updated
by the set_element function that is called by the element looping algorithm (still to be discussed). This
function updates the node coordinates and the mapping between local element node index and global node index.
The entire class is templated on the element type, which allows using high-performance fixed size matrices
from the Eigen library.

Using predefined transforms from the Proto framework and user-defined function objects allows for the creation
of grammars which can call themselves recursively as transforms. This concise domain specific language
describing grammars is a key feature of Proto: it makes it possible to indicate how to evaluate an expression
without resorting to complicated metaprogramming techniques, resulting in code that is easier to read and
maintain. The complexity of the grammar and the expressions it can parse is limited only by the time and
memory available for the compilation.

To obtain a working code, we still need to implement the loop over the elements and generate the data that is
used in expression evaluation. The data is templated on the concrete element type, but as can be seen from
Listing 14 the mesh data lacks this information at compile-time. This is logical, since in a real application
the user loads the mesh, so the software cannot know what concrete type is used at compile time. To select the
element at run time, we need to
generate code for all the element types that are supported by the solver. In this example, we will support
both 1D line elements and 2D triangle elements. The difference in the mesh data lies in the number of columns
in the coordinates and connectivity tables, so we can use that to determine which mesh we are using. By
checking for each supported element type if the mesh matches, we can execute the correct code at runtime. We
use the MPL functor defined in Listing 8 to match the correct element type to the mesh. The functor is
executed for each item in a list of allowed elements (line (37)), resulting in cleaner code than a long list
of if-else statements. Once the element type check at line (13) passes, the data can be constructed and we can
start the loop, updating the data for each element at line (23). The actual expression is executed using the
grammar at line (24), in the same way as in the previously discussed element integration functor (Listing 6,
line (17)). The expression itself remains a template parameter, allowing us to write any kind of Proto
expression in a call to for_each_element and keeping all compile-time information. Both of these properties
are key advantages over a similar system that could be set up using virtual functions.

    (1)  template<typename ElementT>
    (2)  struct dsl_data
    (3)  {
    (4)    // Required by Eigen to store fixed-size data
    (5)    EIGEN_MAKE_ALIGNED_OPERATOR_NEW
    (6)    // The concrete element type
    (7)    typedef ElementT element_t;
    (8)
    (9)    // Construct using the mesh
    (10)   dsl_data(const fem::mesh_data& d) : mesh_data(d)
    (11)   {
    (12)   }
    (13)
    (14)   // Set the current element
    (15)   void set_element(const int e)
    (16)   {
    (17)     for(int i = 0; i != element_t::nb_nodes; ++i)
    (18)     {
    (19)       const int node_idx = mesh_data.connectivity[e][i];
    (20)       node_indices[i] = node_idx;
    (21)       for(int j = 0; j != element_t::dimension; ++j)
    (22)         coord_mat(i,j) = mesh_data.coordinates[node_idx][j];
    (23)     }
    (24)   }
    (25)
    (26)   // Reference to the mesh
    (27)   const fem::mesh_data& mesh_data;
    (28)   // Storage for the coordinates of the current element nodes
    (29)   typename ElementT::coord_mat_t coord_mat;
    (30)   // Global indices of the nodes of the current element
    (31)   int node_indices[element_t::nb_nodes];
    (32)   // Value of the last shape function computation
    (33)   typename element_t::shape_func_t shape_func;
    (34)   // Value of the last Jacobian determinant computation
    (35)   double det_jacobian;
    (36) };

Listing 7: Data passed to the functors.

By generating code for a list of predefined element types, the technique can be seen as a special case of
generative programming [9], where the code to generate is defined through the MPL vector of element types.

In Coolfluid 3, the process is more complicated, since we can have different shape functions for the geometry
and each variable that appears in an expression. This means we must now generate code for every possible
combination of shape functions. We chose to organize the data so that each unknown that appears in an
expression has its own data structure, aggregated into a global data structure for the expression. This
approach makes it possible to have a unified data structure supporting expressions with an arbitrary number of
variables. Another complication is the possibility to mix different element types in one mesh, for example,
triangles and quadrilaterals. We solve this by organizing the mesh into sets that contain only one element
type and then
(1) /// MPL functor to loop over elements
(2) template<typename ExprT>
(3) struct element_looper
(4) {
(5)   element_looper(const fem::mesh_data& m, const ExprT& e): mesh(m), expr(e)
(6)   {
(7)   }
(8)
(9)   template < typename ElemT >
(10)   void operator()(ElemT) const
(11)   {
(12)     // Bail out if the shape function doesn't match
(13)     if(ElemT::dimension != mesh.coordinates.shape()[1] || ElemT::nb_nodes != mesh.connectivity.shape()[1])
(14)       return;
(15)
(16)     // Construct helper data
(17)     dsl_data<ElemT> data(mesh);
(18)
(19)     // Evaluate for all elements
(20)     const int nb_elems = mesh.connectivity.size();
(21)     for(int i = 0; i != nb_elems; ++i)
(22)     {
(23)       data.set_element(i);
(24)       fem_grammar()(expr, 0, data);
(25)     }
(26)   }
(27)
(28)   const fem::mesh_data& mesh;
(29)   const ExprT& expr;
(30) };
(31)
(32) /// Execute the given expression for every element in the mesh
(33) template<typename ExprT>
(34) void for_each_element(const fem::mesh_data& mesh, const ExprT& expr)
(35) {
(36)   // Allowed element types
(37)   typedef mpl::vector2<fem::line1d, fem::triag2d> element_types;
(38)   mpl::for_each<element_types>(element_looper<ExprT>(mesh, expr));
(39) }

Listing 8: The element looping algorithm.

We solve this by organizing the mesh into sets that contain only one element type and then deal with each set in turn. The complete algorithm to create the data and loop over a set of elements of the same type is given in Algorithm 1.

At this point we might wonder what happens if we try to evaluate an expression that makes no sense (e.g., N + transpose). Unfortunately, this often results in a very long list of compiler messages (430 lines for the example), exposing Proto implementation details and long template types. In the current example, it would be easy to fix using compile-time error messages, but for a more complex language grammar this is difficult and we have not yet implemented a satisfactory error handling system. We do intend to handle some common cases, and already errors from Eigen indicating incompatible matrix expressions come through, but they are often obscured by a great number of other errors.

3.3. External Libraries. The example code uses the simplest possible data structures: the mesh simply contains arrays for the coordinates and the element nodes. Listings 13 and 14 show the code for a first order 1D line element of the Lagrange family and the mesh data structure, respectively. The shape functions are simplified to only provide the functionality needed for the examples and they also include a fixed set of Gauss points for numerical integration. In Coolfluid 3, we provide our own mesh data structure, shape function library, and numerical integration framework.

For operations at the element level, the Eigen [8] library is used, since it provides highly optimized routines for small, dense matrices.
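The fixed set of Gauss points attached to each shape function can be illustrated with a dependency-free sketch of the first order line element. The names `line1d_sketch` and `mass_matrix` are hypothetical and not part of the example code or of Coolfluid 3; the sketch replaces the Eigen types with `std::array` and assumes the standard two-point Gauss rule (points at ±1/√3, unit weights).

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical, Eigen-free counterpart of a first order line element
struct line1d_sketch
{
  // Shape function values at mapped coordinate xi in [-1, 1]
  static std::array<double, 2> shape_function(const double xi)
  {
    return {{0.5 * (1. - xi), 0.5 * (1. + xi)}};
  }

  // Jacobian determinant: half the element length
  static double jacobian_determinant(const double x0, const double x1)
  {
    return 0.5 * (x1 - x0);
  }
};

// Element mass matrix (integral of N^T N) in row-major order,
// integrated with an assumed two-point Gauss rule
inline std::array<double, 4> mass_matrix(const double x0, const double x1)
{
  assert(x1 > x0);
  const double gauss_points[2] = {-1. / std::sqrt(3.), 1. / std::sqrt(3.)};
  const double gauss_weights[2] = {1., 1.};
  const double det_jacobian = line1d_sketch::jacobian_determinant(x0, x1);

  std::array<double, 4> result = {{0., 0., 0., 0.}};
  for(int gp = 0; gp != 2; ++gp)
  {
    const std::array<double, 2> n = line1d_sketch::shape_function(gauss_points[gp]);
    for(int i = 0; i != 2; ++i)
      for(int j = 0; j != 2; ++j)
        result[2 * i + j] += gauss_weights[gp] * det_jacobian * n[i] * n[j];
  }
  return result;
}
```

Calling `mass_matrix(0., 2.)` integrates over an element of length 2. Because the integrand is quadratic in the mapped coordinate, two Gauss points are sufficient to integrate it exactly.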

(1) template<typename T> struct user_op {};
(2) struct my_callable {};
(3) // A terminal typed using the above structs
(4) proto::terminal< user_op<my_callable> >::type const my_op={};
(5) // Can be used as a function
(6) my_op(1, 2, 3);

Listing 9: Definition of a terminal allowing user extension.

(1) proto::when
(2) <
(3)   proto::function< proto::terminal< user_op<proto::_> >, proto::vararg<proto::_> >,
(4)   evaluate_user_op(proto::function< proto::_, proto::vararg<fem_grammar> >)
(5) >

Listing 10: Extension of fem_grammar to add user-defined operations.

Finally, sparse matrix operations (required for the solution of the linear system arising from the element-wise assembly) are performed using the Trilinos [10] library.

3.4. User Defined Terminals. If we want to extend the language with new functionality, this currently requires modification to the grammar (Listing 3). Since this is part of the programming framework, it is not something that is supposed to be modified by the user of the language. To provide more flexibility, we allow users to define new terminals that can be used in expressions without modifying the grammar. To the best of our knowledge, this is a unique feature for this kind of embedded language. We will show here how it works, by providing an overview of how this could be added to our example code.

We return to the definition of terminals, tagged using empty classes as in Listing 9. Here, we created a terminal with the template class user_op as type. Its use as a function (line (6)) can be matched using the grammar:

proto::function<proto::terminal< user_op<proto::_> >, proto::vararg<proto::_> >

Here, proto::_ is a wildcard that matches any type and vararg allows the function to have any number of arguments. This kind of construct is exactly what we need to make my_op a user defined terminal, though we still have to provide a way to evaluate the expression. Adding a when clause as in Listing 10 yields the code that we need to add to fem_grammar in Listing 3. This uses fem_grammar to evaluate all function arguments and then calls the evaluate_user_op functor to evaluate the call (not listed here for the sake of brevity). In evaluate_user_op, we use the template argument to user_op (my_callable here) as a functor to evaluate the operator. This allows the user to define how the terminal should be evaluated without ever needing to touch the core grammar, by simply implementing the correct function call operator in the type that is passed to user_op.

In Coolfluid 3, this technique is used to implement specialized code for certain solvers, directly setting entries of the element matrix when analytical expressions are known. Some core functions, such as the gradient and divergence operations, are also defined this way. Listing 11 shows the implementation for the divergence operation. On line (9), we have the function call implementation, where var is expanded into the data associated with the unknown for which we want to compute the divergence. This data is then used to access the gradient matrix (line (12)), much like we computed the shape function value in our previous example (i.e., line (15) in Listing 6). We can easily overload the function signature using additional function call operators. This is used on line (24) to provide an alternative divergence call that does not take a mapped coordinate as an argument but evaluates the divergence at the current quadrature point instead. The terminal is statically created on line (30), using the MakeSFOp metafunction to avoid a long Proto type name.

3.5. Integration into a Framework. So far, we have focused on building a small language, with the for_each_element looping function as entry point. In Coolfluid 3, our EDSL is used to implement reusable solver components, so before showing the concrete examples in Section 4 we have to present a few concepts about our framework and how it interfaces with our EDSL. Each class in Coolfluid 3 is derived from the Component base class, which offers infrastructure to set options at run time and holds an arbitrary number of child components, thus creating a tree. A particular subclass Action implements the Command pattern [11], and we derive a class from that for working with expressions, called ProtoAction. Expressions can then be added to an object tree by creating a ProtoAction and setting its expression using the set_expression function.

(1) // Operator definition
(2) struct DivOp
(3) {
(4)   // The result is a scalar
(5)   typedef Real result_type;
(6)
(7)   // Return the divergence of unknown var, computed at mapped_coords
(8)   template<typename VarT>
(9)   Real operator()(const VarT& var, const typename VarT::MappedCoordsT& mapped_coords)
(10)   {
(11)     // Get the gradient matrix
(12)     const typename VarT::GradientT& nabla = var.nabla(mapped_coords);
(13)     Real result = 0.;
(14)     // Apply each component and return the result
(15)     for(int i = 0; i != VarT::EtypeT::dimensionality; ++i)
(16)     {
(17)       result += nabla.row(i) * var.value().col(i);
(18)     }
(19)     return result;
(20)   }
(21)
(22)   // Divergence at the current quadrature point
(23)   template<typename VarT>
(24)   Real operator()(const VarT& var);
(25) };
(26)
(27) // The terminal to use in the language
(28) // divergence(v, xi): compute the divergence at any mapped coordinate xi
(29) // divergence(v): compute the divergence at current quadrature point
(30) static MakeSFOp<DivOp>::type const divergence = {};

Listing 11: Definition of the Coolfluid 3 divergence operation, using a user-defined terminal.

Require: Mesh with field data for a number of variables
Require: A compile-time list of shape functions that may be used
Require: An expression
Ensure: Construction of context data of the correct type
for all shape functions in the list do
  if the shape function matches the geometry shape function then
    set the geometry shape function
    for all variables do
      for all shape functions compatible with the geometry shape function do
        if the shape function matches the variable shape function then
          set the shape function of the variable
        end if
      end for
    end for
    create context data using the known geometry and variable shape functions (for all variables)
    for all elements do
      execute grammar (expression, context data)
    end for
  end if
end for

Algorithm 1: Element looping algorithm.
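The outer part of Algorithm 1, matching a compile-time list of element types against the run-time mesh, can be sketched without Boost.MPL. The following is a minimal illustration using C++11 variadic templates; all names (`line_tag`, `triag_tag`, `mesh_sketch`, `count_nodes`, `element_list`) are hypothetical, and the matching of per-variable shape functions is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element tags; only the compile-time node count matters here
struct line_tag  { static const std::size_t nb_nodes = 2; };
struct triag_tag { static const std::size_t nb_nodes = 3; };

// Run-time mesh: one connectivity row per element, all of the same type
struct mesh_sketch
{
  std::size_t nb_nodes_per_elem;
  std::vector< std::vector<int> > connectivity;
};

// Stand-in for an expression: counts the nodes it visits
struct count_nodes
{
  std::size_t total;

  template<typename ElemT>
  void evaluate(const std::vector<int>& element_nodes)
  {
    assert(element_nodes.size() == ElemT::nb_nodes);
    total += ElemT::nb_nodes; // the size is known at compile time
  }
};

// Compile-time list of candidate element types
template<typename... ElemTs> struct element_list {};

// End of the list: no candidate matched the mesh
template<typename ExprT>
bool for_each_element(element_list<>, const mesh_sketch&, ExprT&)
{
  return false;
}

// Try the first candidate, then recurse over the remaining ones
template<typename ExprT, typename ElemT, typename... RestT>
bool for_each_element(element_list<ElemT, RestT...>, const mesh_sketch& mesh, ExprT& expr)
{
  if(ElemT::nb_nodes != mesh.nb_nodes_per_elem)
    return for_each_element(element_list<RestT...>(), mesh, expr);

  // The element type is now fixed, so the loop body is generated for ElemT only
  for(std::size_t e = 0; e != mesh.connectivity.size(); ++e)
    expr.template evaluate<ElemT>(mesh.connectivity[e]);
  return true;
}
```

Code is generated for every type in the list, but at run time only the instantiation that matches the mesh executes its loop, so the per-element work never branches on the element type.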

(1) // Terminal that refers to the temperature
(2) FieldVariable<0, ScalarField> T("T", "temperature_field");
(3) // Create a new action, as a child component of parent
(4) Handle<ProtoAction> action = parent.create_component<ProtoAction>("Action");
(5) // Set an expression to loop over nodes
(6) action->set_expression(nodes_expression(T = 288.));
(7) // Execute the action
(8) action->execute();

Listing 12: Loop over nodes, setting a temperature field.

The framework executes it transparently just like any other Action. Expressions in Coolfluid 3 can loop over either elements or nodes, using different grammars. The node expressions are primarily used for setting boundary and initial conditions and to update the solution. We also need a way of accessing the mesh and the unknowns, which we do by making use of FieldVariable terminals. A basic action for looping over nodes could be added to a parent component as in Listing 12, in this case setting the temperature to 288 K. Here, the temperature is defined on line (2), indicating that it is stored as variable T in the field temperature_field. This seems redundant here, but fields can store more than one variable. Furthermore, the parser needs to be able to distinguish each variable at compile time, which is why each variable has a unique number in the first template argument. Each distinct variable that is used in the same expression must have a different number. The second template argument specifies whether we are dealing with a scalar or a vector. We then create the expression component and set its expression on line (6), choosing an expression that loops over nodes and simply assigns the value 288 to the temperature field here. Boundary and initial conditions are provided in a similar fashion, but there the component is parametrized on the field and variable names, so it can be reused in any solver.

3.6. Compatibility with Matrix Expression Templates. At the element level, matrix calculations are delegated to the Eigen library, which uses its own expression templates. As it turns out, such libraries cause problems when embedded into a Proto expression tree. A library like Eigen will, in the case of complex operations, create temporary matrices on the fly. When a temporary object is passed on through the functors that evaluate our expressions we return references to these temporary objects, resulting in memory errors. The problem is common to all modern matrix expression template libraries, and Iglberger et al. [12] review some cases where these temporaries might appear. To handle this issue, we preprocess the expressions using a special grammar: whenever a matrix product is found, the multiplication expression is modified to store a temporary matrix (allocated only once) that can hold the result of the multiplication. The multiplication result is then stored in the Proto expression itself and a reference to it is returned during matrix product evaluation. The preprocessing happens by calling an additional functor in the element looping function, so the whole process is transparent to the user of the expressions and allows writing matrix arithmetic of arbitrary complexity.

4. Application Examples

In this section we work out a few examples to illustrate the mapping between the mathematical formulation and the code for different problems.

4.1. Poisson Problem. The Poisson problem was already used as introductory example in Section 2. Here, we provide some more details on the code from Listing 1 that was skipped in the introduction. First, there is the declaration of the unknown f and the source term g on lines (2) and (4), respectively. This follows the same mechanism as in Listing 12, using a distinct number to distinguish the variables at compile time. On line (7), we create an Action component to hold the expression. Finally, the expression is set on line (9). The elements_expression function indicates that the loop will happen over the elements and is comparable to the for_each_element function in the simplified example. On line (11), the applicable element types are chosen (all elements from the first order Lagrange family here). The actual expression starts on line (12), with a call to group to combine multiple expressions into one. This is an advantage if multiple steps are required in the assembly, since they can be combined into one loop over the elements. The assembly makes use of an element matrix _A and element vector _a, which are set to zero on line (14). An expression of the form _A(f) returns only the rows of the element matrix that refer to the equation for f (i.e., all rows in this case). If we pass a second variable as argument, the returned matrix only contains the columns that refer to it, so _A(f, f) would return a block that only contains the f contributions from the f equation (again, all rows and columns in this case). This notation is convenient in the presence of multiple variables to select only the relevant entries in an element matrix. The size of the element matrix is also determined by looking for blocks like this, so here we only use _A(f) to indicate that the element matrix contains only a single equation for f.

(1) /// 1D Line shape function
(2) struct line1d
(3) {
(4)   static const int nb_nodes = 2; // Number of nodes
(5)   static const int dimension = 1;
(6)
(7)   // Type of the mapped coordinates
(8)   typedef Eigen::Matrix<double, 1, dimension> coord_t;
(9)   // Type of the shape function vector
(10)   typedef Eigen::Matrix<double, 1, nb_nodes> shape_func_t;
(11)   // Type of the coordinates matrix
(12)   typedef Eigen::Matrix<double, nb_nodes, dimension> coord_mat_t;
(13)
(14)   // Compute the shape function vector at mapped coordinate c
(15)   static shape_func_t shape_function(const coord_t& c)
(16)   {
(17)     const double xi = c[0];
(18)     shape_func_t result;
(19)     result[0] = 0.5*(1.-xi);
(20)     result[1] = 0.5*(1.+xi);
(21)     return result;
(22)   }
(23)
(24)   // Compute the jacobian determinant
(25)   static double jacobian_determinant(const coord_t& mapped_coord, const coord_mat_t& node_coords)
(26)   {
(27)     return 0.5*(node_coords[1] - node_coords[0]);
(28)   }
(29)
(30)   static const int nb_gauss_points = 2;
(31)   // Type of the matrix with the Gauss points
(32)   typedef Eigen::Matrix<double, nb_gauss_points, 1> gauss_points_t;
(33)   // The Gauss points for the current shape function (definition omitted)
(34)   static const gauss_points_t gauss_points();
(35)   // Type for the weights
(36)   typedef Eigen::Matrix<double, nb_gauss_points, 1> gauss_weights_t;
(37)   // The Gauss weights (definition omitted)
(38)   static const gauss_weights_t gauss_weights();
(39) };

Listing 13: Element type for first order line elements of the Lagrange family.

The same applies to the element right hand side vector _a using square brackets. The element integral fills these matrices on lines (17) and (18), in a similar fashion to the simple example. The assembly into the global system is a separate step and occurs on lines (20) and (21). The system_matrix and system_rhs terminals keep track of wrapper objects for a Trilinos matrix and vector, using a generic linear solver API that is part of Coolfluid. They are initialized in a base class that provides the functionality for working with expressions and linear systems.

The code in Listing 1 builds a component that assembles the linear system. At run time, it can be combined with other components to add initial and boundary conditions and to solve the linear system, all of which can be controlled from a Python script.

4.2. Navier-Stokes Equations Using Chorin's Method. The Navier-Stokes equations for incompressible flow with velocity vector $\mathbf{u}$, pressure $p$, and kinematic viscosity $\nu$ are

\[ \nabla \cdot \mathbf{u} = 0, \tag{8} \]

\[ \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla) \mathbf{u} + \frac{\nabla p}{\rho} - \nu \nabla^2 \mathbf{u} = 0. \tag{9} \]

We will first solve this system of equations using Chorin's method [13], since this will allow us to easily compare the performance later on with the same example from the FEniCS project. Chorin proposed an iterative time stepping scheme, computing first an auxiliary velocity $\mathbf{u}^{aux}$, followed by the pressure and finally the corrected velocity at the new time step.
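For orientation, the three steps can be summarized in strong form. This is the textbook form of the projection scheme, written here as an assumption that is consistent with the weak forms (10) to (12) (implicit diffusion, explicit advection) and with $p$ read as a kinematic pressure, since the density does not appear in the code for this method; it is not quoted from the original derivation.

```latex
\begin{align*}
\frac{\mathbf{u}^{aux} - \mathbf{u}^{n}}{\Delta t}
  + (\mathbf{u}^{n} \cdot \nabla)\, \mathbf{u}^{n}
  - \nu \nabla^2 \mathbf{u}^{aux} &= 0, \\
\nabla^2 p^{n+1} &= \frac{1}{\Delta t}\, \nabla \cdot \mathbf{u}^{aux}, \\
\mathbf{u}^{n+1} &= \mathbf{u}^{aux} - \Delta t\, \nabla p^{n+1}.
\end{align*}
```

Multiplying each line with the shape functions and integrating over an element (with integration by parts for the Laplacians) leads to one linear system per step, which is why the implementation uses three separate linear system components.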

(1) /// Unstructured mesh data
(2) struct mesh_data
(3) {
(4)   // Construct using a number of nodes and element properties
(5)   mesh_data(const int nb_nodes, const int nb_elems, const int nb_nodes_per_elem, const int dimension):
(6)     coordinates(boost::extents[nb_nodes][dimension]),
(7)     connectivity(boost::extents[nb_elems][nb_nodes_per_elem])
(8)   {
(9)   }
(10)
(11)   // Global coordinates array
(12)   boost::multi_array<double, 2> coordinates;
(13)   // Global connectivity array
(14)   boost::multi_array<int, 2> connectivity;
(15) };

Listing 14: The mesh data structure.

The weak form (written again for a single element) for the $\mathbf{u}^{aux}$ equation is

\[ \Bigl( \frac{1}{\Delta t} \underbrace{\int_{\Omega_e} N_u^T N_u \, d\Omega_e}_{T_{u_i u_i}} + \nu \underbrace{\int_{\Omega_e} \nabla N_u^T \nabla N_u \, d\Omega_e}_{A_{u_i u_i}} \Bigr) (\mathbf{u}_e^{aux})_i = \frac{1}{\Delta t} \int_{\Omega_e} N_u^T \tilde{u}_i^n \, d\Omega_e - \int_{\Omega_e} N_u^T \tilde{\mathbf{u}}^n \nabla N_u (\mathbf{u}_e^n)_i \, d\Omega_e = a_{u_i}. \tag{10} \]

The superscript $n$ indicates the known solution at time step $n$. The index $i$ represents the $i$th component of the velocity here, which is a scalar in the case of the interpolated value $\tilde{u}_i^n$ and a vector of values for each node of the element for the unknowns $(\mathbf{u}_e^{aux})_i$. This means that, for each value of $i$, we insert a square block with dimension equal to the number of nodes into the element matrix and a corresponding segment into the element right hand side vector. We split up the element matrix as indicated by the braces, where $T_{u_i u_i}$ indicates the block corresponding to the rows and columns of component $i$ of the velocity.

The code to build this linear system is presented in Listing 15. To get a stable solution, we will interpolate the velocity and pressure using second and first order shape functions, respectively (the Taylor-Hood element). If we restrict the solver to triangles, we can obtain this using the typedef on line (2). The actual shape function is chosen at run time, but we enforce the use of second order for the velocity here through the third constructor argument for the velocity variable on line (6). We use separate actions for the system matrix and right hand side assembly, since in this case the matrix coefficients do not change with time, so we can run the assembly procedure only once.

The definition of the element matrices _A and _T is provided on lines (19) and (20). The Laplacian is written exactly the same as in the Poisson problem (Section 4.1), but a new aspect is the introduction of the index _i when indexing into the element matrix. It is automatically expanded into a loop over the number of physical dimensions, addressing the diagonal blocks of the element matrices as defined in (10). The same applies for the mass matrix on line (20), and both matrices are combined and assembled into the global system on line (22). The auxiliary_lss component provides access to the terminals related to the linear system and the time. Here, invdt returns a reference to the inverse of the time step, which is set by the user running the simulation.

In the right hand side expression on line (34), the index _i is used again, using u[_i] to get each component of the interpolated velocity vector $\tilde{\mathbf{u}}$. The last term represents the advection, and it requires access to a single component of the nodal values vector $\mathbf{u}_e$. We store the nodal values for a vector variable as a matrix, with each column corresponding to one physical component of the vector. This matrix is obtained using the nodal_values function while individual columns can be addressed using _col. The notation is a bit verbose, but if the user determines this to be a problem it is easy to introduce a user defined terminal for the advection operation.

The next step in Chorin's algorithm calculates the pressure, through the following Poisson problem:

\[ \int_{\Omega_e} \nabla N_p^T \nabla N_p \, d\Omega_e \, \mathbf{p}_e^{n+1} = -\frac{1}{\Delta t} \int_{\Omega_e} N_p^T (\nabla N_u)_i \, d\Omega_e \, (\mathbf{u}_e^{aux})_i. \tag{11} \]

Listing 16 shows the code for this system, again using two assembly actions. For the right hand side assembly on line (22), we used the divergence function that was defined in Listing 11. We could also write this in terms of the nodal values matrix like this:

element_quadrature(_a[p] += transpose(N(p))*nabla(u)[_i]*_col(nodal_values(u), _i))
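The relation between the gradient matrix and the nodal values matrix can be checked on a single linear triangle. The sketch below is a hypothetical, Eigen-free illustration (the names `gradient_t`, `nodal_values_t`, `triangle_gradient`, and `divergence` are not the Coolfluid 3 API): each row of the gradient matrix is multiplied with the matching column of the nodal values, and the results are summed over the physical components.

```cpp
#include <array>
#include <cassert>

// Gradient matrix of the three linear shape functions of a triangle:
// row 0 holds dN/dx, row 1 holds dN/dy (constant over the element)
typedef std::array<std::array<double, 3>, 2> gradient_t;
// One row per node, one column per physical component
typedef std::array<std::array<double, 2>, 3> nodal_values_t;

// Shape function gradients from the node coordinates
inline gradient_t triangle_gradient(const nodal_values_t& x)
{
  // Twice the signed element area
  const double det = (x[1][0] - x[0][0]) * (x[2][1] - x[0][1])
                   - (x[2][0] - x[0][0]) * (x[1][1] - x[0][1]);
  assert(det != 0.);
  gradient_t nabla;
  for(int n = 0; n != 3; ++n)
  {
    const int a = (n + 1) % 3;
    const int b = (n + 2) % 3;
    nabla[0][n] = (x[a][1] - x[b][1]) / det;
    nabla[1][n] = (x[b][0] - x[a][0]) / det;
  }
  return nabla;
}

// Sum over components i of (row i of nabla) times (column i of the nodal values)
inline double divergence(const gradient_t& nabla, const nodal_values_t& u)
{
  double result = 0.;
  for(int i = 0; i != 2; ++i)   // physical component
    for(int n = 0; n != 3; ++n) // node
      result += nabla[i][n] * u[n][i];
  return result;
}
```

For a velocity field that is linear in the coordinates, the divergence obtained this way is exact, which makes the sketch easy to check by hand.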

(1) // Allow a mix of first and second order shape functions
(2) typedef boost::mpl::vector2<mesh::LagrangeP1::Triag2D, mesh::LagrangeP2::Triag2D> LagrangeP1P2;
(3) // Kinematic viscosity, as a user-configurable constant:
(4) PhysicsConstant nu("kinematic_viscosity");
(5) // The velocity
(6) FieldVariable<0, VectorField> u("u", "navier_stokes_u_velocity", "cf3.mesh.LagrangeP2");
(7) // LSSActionUnsteady links with the linear algebra backend and the time tracking
(8) Handle<LSSActionUnsteady> auxiliary_lss =
(9)   create_component<LSSActionUnsteady>("AuxiliaryLSS");
(10) // The matrix assembly
(11) Handle<ProtoAction> auxiliary_mat_assembly =
(12)   auxiliary_lss->create_component<ProtoAction>("MatrixAssembly");
(13) auxiliary_mat_assembly->set_expression(elements_expression(LagrangeP1P2(),
(14)   group
(15)   (
(16)     _A(u) = _0, _T(u) = _0,
(17)     element_quadrature
(18)     (
(19)       _A(u[_i], u[_i]) += transpose(nabla(u))*nabla(u),
(20)       _T(u[_i], u[_i]) += transpose(N(u))*N(u)
(21)     ),
(22)     auxiliary_lss->system_matrix += auxiliary_lss->invdt()*_T + nu*_A
(23)   )));
(24)
(25) // RHS assembly
(26) Handle<ProtoAction> auxiliary_rhs_assembly =
(27)   auxiliary_lss->create_component<ProtoAction>("RHSAssembly");
(28) auxiliary_rhs_assembly->set_expression(elements_expression(LagrangeP1P2(),
(29)   group
(30)   (
(31)     _a[u] = _0,
(32)     element_quadrature
(33)     (
(34)       _a[u[_i]] += auxiliary_lss->invdt() * transpose(N(u))*u[_i] -
(35)         transpose(N(u))*(u*nabla(u))*_col(nodal_values(u), _i)
(36)     ),
(37)     auxiliary_lss->system_rhs += _a
(38)   )));

Listing 15: Code to build the linear system for $\mathbf{u}^{aux}$ in Chorin's method.

On line (23) of Listing 16, we call the lit function on invdt. This is actually a Proto function that constructs a terminal in-place, and it is needed to delay the evaluation of the minus sign, which would otherwise be evaluated right away by C++, resulting in the storage of a copy of the negative inverse timestep at expression creation, rather than a reference to the current value.

The final step of the algorithm updates the velocity, using the gradient of the newly calculated pressure:

\[ \int_{\Omega_e} N_u^T N_u \, d\Omega_e \, (\mathbf{u}_e^{n+1})_i = \int_{\Omega_e} N_u^T \tilde{u}_i^{aux} \, d\Omega_e - \Delta t \int_{\Omega_e} N_u^T (\nabla N_p)_i \, \mathbf{p}_e^{n+1} \, d\Omega_e. \tag{12} \]

The system matrix is the mass matrix, so as seen in Listing 17 we assemble it in its own action. Note the similarity here with the stand-alone example equation (6). The gradient function on line (18) is defined using the user defined function mechanism, and just as is the case with the divergence it can be written using nodal_values as well:

transpose(N(u))*(u[_i] - lit(correction_lss->dt())*(nabla(p)[_i]*nodal_values(p))[0])

The implementation of Chorin's method shows how different systems can be combined to solve a problem with multiple unknowns, each interpolated using a different shape function. The coding of the assembly procedure remains concise and follows the structure of the mathematical equations.

4.3. PSPG/SUPG Stabilized Incompressible Navier-Stokes. As an alternative to the use of the Taylor-Hood element used in the previous example, we can use equal interpolation order for the pressure and velocity, provided that we add a stabilization term to the continuity equation.

(1) // The pressure field, using the default first order shape function
(2) FieldVariable<1, ScalarField> p("Pressure", pressure_lss->solution_tag());
(3) // The linear system manager
(4) Handle<LSSActionUnsteady> pressure_lss = create_component<LSSActionUnsteady>("PressureLSS");
(5) // The assembly action
(6) Handle<ProtoAction> pressure_mat_assembly =
(7)   pressure_lss->create_component<ProtoAction>("MatrixAssembly");
(8) pressure_mat_assembly->set_expression(elements_expression(LagrangeP1(),
(9)   group
(10)   (
(11)     _A(p) = _0,
(12)     element_quadrature(_A(p) += transpose(nabla(p))*nabla(p)),
(13)     pressure_lss->system_matrix += _A
(14)   )));
(15)
(16) Handle<ProtoAction> pressure_rhs_assembly =
(17)   pressure_lss->create_component<ProtoAction>("RHSAssembly");
(18) pressure_rhs_assembly->set_expression(elements_expression(LagrangeP1P2(),
(19)   group
(20)   (
(21)     _a[p] = _0,
(22)     element_quadrature(_a[p] += transpose(N(p))*divergence(u)),
(23)     pressure_lss->system_rhs += -lit(pressure_lss->invdt())*_a
(24)   )));

Listing 16: Pressure Poisson problem for Chorin's method.

We follow the method presented in [14], which adds PSPG and SUPG stabilization, as well as a bulk-viscosity term. We also start from the skew symmetric momentum equation, yielding the following form to replace (9):

\[ \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla) \mathbf{u} + \frac{\mathbf{u} (\nabla \cdot \mathbf{u})}{2} + \frac{\nabla p}{\rho} - \nu \nabla^2 \mathbf{u} = 0. \tag{13} \]

In the absence of stabilization, the discretization of the skew symmetric form preserves the kinetic energy [15]. The weak form of the equations for a single element, after discretization in time, can be written as

\[ \Bigl( \frac{1}{\Delta t} T + \theta A \Bigr) \Delta \mathbf{x}_e = -A \mathbf{x}_e^n. \tag{14} \]

We applied a theta scheme here for the time discretization, solving for the difference between two time steps, $\Delta \mathbf{x}_e$. When using Crank-Nicolson time stepping, the method is second order accurate in both time and space. The vector of unknowns $\mathbf{x}_e^n$ is arranged in blocks for each unknown, that is, first the pressures for all nodes, then the velocities in the $x$ direction, and so on. This results in a corresponding blocked structure for the element matrices $T$ and $A$, where the blocks are given by

\[
\begin{aligned}
A_{pp} &= \int_{\Omega_e} \tau_{PSPG} \frac{1}{\rho} \nabla N_p^T \nabla N_p \, d\Omega_e, \\
A_{p u_i} &= \int_{\Omega_e} \Bigl( \bigl( N_p + \frac{\tau_{PSPG} \tilde{\mathbf{u}}_{adv} \nabla N_p}{2} \bigr)^T (\nabla N_u)_i + \tau_{PSPG} (\nabla N_p)_i^T \tilde{\mathbf{u}}_{adv} \nabla N_u \Bigr) d\Omega_e, \\
A_{u_i p} &= \int_{\Omega_e} \frac{1}{\rho} \bigl( N_u + \tau_{SUPG} \tilde{\mathbf{u}}_{adv} \nabla N_u \bigr)^T (\nabla N_p)_i \, d\Omega_e, \\
A_{u_i u_i} &= \int_{\Omega_e} \Bigl( \nu \nabla N_u^T \nabla N_u + \bigl( N_u + \tau_{SUPG} \tilde{\mathbf{u}}_{adv} \nabla N_u \bigr)^T \tilde{\mathbf{u}}_{adv} \nabla N_u \Bigr) d\Omega_e, \\
A_{u_i u_j} &= \int_{\Omega_e} \Bigl( \tau_{BULK} (\nabla N_u)_i + \frac{1}{2} (\tilde{\mathbf{u}}_{adv})_i \bigl( N_u + \tau_{SUPG} \tilde{\mathbf{u}}_{adv} \nabla N_u \bigr) \Bigr)^T (\nabla N_u)_j \, d\Omega_e, \\
T_{p u_i} &= \int_{\Omega_e} \tau_{PSPG} (\nabla N_p)_i^T N_u \, d\Omega_e, \\
T_{u_i u_i} &= \int_{\Omega_e} \bigl( N_u + \tau_{SUPG} \tilde{\mathbf{u}}_{adv} \nabla N_u \bigr)^T N_u \, d\Omega_e.
\end{aligned}
\tag{15}
\]
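Before turning to the individual blocks, the time discretization in (14) can be illustrated on a scalar model problem. The sketch below assumes the hypothetical equation dx/dt = -a x, so that T reduces to 1 and A to a; the function name `theta_scheme` is illustrative only. Each step solves for the increment between two time levels, exactly as (14) does for the element unknowns.

```cpp
#include <cassert>
#include <cmath>

// Advance dx/dt = -a*x over nb_steps steps with the theta scheme in
// increment form: (1/dt + theta*a) * delta_x = -a * x_n
inline double theta_scheme(const double a, const double x0, const double dt,
                           const int nb_steps, const double theta)
{
  assert(dt > 0. && theta >= 0. && theta <= 1.);
  const double invdt = 1. / dt;
  double x = x0;
  for(int n = 0; n != nb_steps; ++n)
  {
    const double delta_x = -a * x / (invdt + theta * a);
    x += delta_x;
  }
  return x;
}
```

With theta = 0.5 this is the Crank-Nicolson scheme, which is second order accurate in time; theta = 1 gives the first order backward Euler scheme. For the same time step, the Crank-Nicolson result is therefore much closer to the exact solution x0 exp(-a t).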

(1) Handle<LSSActionUnsteady> correction_lss = create_component<LSSActionUnsteady>("CorrectionLSS");
(2)
(3) Handle<ProtoAction> correction_matrix_assembly = correction_lss->create_component<ProtoAction>("MatrixAssembly");
(4) correction_matrix_assembly->set_expression(elements_expression(LagrangeP1P2(),
(5)   group
(6)   (
(7)     _A(u) = _0,
(8)     element_quadrature(_A(u[_i], u[_i]) += transpose(N(u))*N(u)),
(9)     correction_lss->system_matrix += _A
(10)   )));
(11)
(12) Handle<ProtoAction> correction_rhs_assembly = correction_lss->create_component<ProtoAction>("RHSAssembly");
(13) correction_rhs_assembly->set_expression(elements_expression(LagrangeP1P2(),
(14)   group
(15)   (
(16)     _a[u] = _0,
(17)     element_quadrature(_a[u[_i]] +=
(18)       transpose(N(u))*(u[_i] - lit(correction_lss->dt()) * gradient(p)[_i])),
(19)     correction_lss->system_rhs += _a
(20)   )));

Listing 17: The code for the correction step in Chorin's method.

Table 1: System characteristics for the performance tests.

                    Poisson and Chorin    PSPG/SUPG (per node)
CPU(s)              Intel i7-2600         Two Intel Xeon E5520
RAM                 16 GB                 24 GB
Operating system    Fedora 18             CentOS 6.2
Compiler            GCC 4.7.2             GCC 4.8.0
Trilinos version    11.4.1                11.2.3

Each stabilization term is multiplied with a corresponding coefficient $\tau$. Splitting the equation into blocks helps in managing the added complexity due to the stabilization and skew symmetric terms. Block $A_{pp}$ represents the pressure Laplacian that arises from the PSPG stabilization. The SUPG terms in blocks $A_{u_i u_i}$ and $T_{u_i u_i}$ are all written as a modification to the weighting function, adding more weight to upstream nodes. The bulk viscosity and skew symmetric terms in the momentum equation fill the off-diagonal blocks, as indicated by the use of both indices $i$ and $j$.

Equation (14) is assembled into a single linear system using the elements_expression defined in Listing 18 (showing only the part relevant to the assembly itself). Since only one linear system is involved, the code is integrated into the framework in the same way as in Listing 1. The value of the coefficients $\tau$ depends on local element properties [16]. We calculate them using a user-defined terminal compute_tau, passing a reference to a double for each coefficient (line (7)). On line (13), we use indices _i and _j to create a nested loop over the dimension of the problem, filling all $A_{u_i u_j}$ blocks. The right hand side from (14) is built by applying the element matrix $A$ to the current element unknowns $\mathbf{x}_e$, represented by _x. On line (22), we divide by $\theta$ to avoid the use of the $\theta$ scheme on the continuity equation, improving stability. Finally, on line (23) we write the system matrix as in the left hand side of (14).

5. Performance Analysis

In this section we discuss the results of some performance tests, using the application examples from the previous section. Table 1 lists our system characteristics. The PSPG/SUPG tests were run on a cluster with 28 nodes, connected using 1 Gb Ethernet. To avoid any difficulties in installing DOLFIN on the cluster, the tests comparing our results with DOLFIN were run on a separate desktop computer.

5.1. Poisson Problem. Our first performance test concerns the Poisson problem. The element equations (4) can easily be calculated analytically when using linear shape functions, allowing a comparison between manually coded versions, code using a virtual function interface to the shape functions and code generated using our language. Additionally, we compare with a specialized user-defined terminal containing the manually coded version and with DOLFIN [4] from the FEniCS project. Problem (4) is completed with boundary conditions and a source term identical to the Poisson demo case from FEniCS:

\[ g = -6 \text{ over } \Omega, \qquad f = 1 + x^2 + 2y^2 \text{ over } \Gamma. \tag{16} \]

 (1)  assembly->set_expression(elements_expression
 (2)  (
 (3)    AllElementsT(),
 (4)    group
 (5)    (
 (6)      _A = _0, _T = _0,
 (7)      compute_tau(u, nu_eff, u_ref, lit(tau_ps), lit(tau_su), lit(tau_bulk)),
 (8)      element_quadrature
 (9)      (
(10)        _A(p, u[_i]) += transpose(N(p) + tau_ps*u_adv*nabla(p)*0.5) * nabla(u)[_i]
(11)                        + tau_ps * transpose(nabla(p)[_i]) * u_adv*nabla(u),
(12)        _A(p, p) += tau_ps * transpose(nabla(p)) * nabla(p) / rho,
(13)        _A(u[_i], u[_j]) += transpose((tau_bulk + 1/3*nu_eff)*nabla(u)[_i]
(14)                            + 0.5*u_adv[_i]*(N(u) + tau_su*u_adv*nabla(u))) * nabla(u)[_j],
(15)        _A(u[_i], u[_i]) += nu_eff * transpose(nabla(u)) * nabla(u)
(16)                            + transpose(N(u) + tau_su*u_adv*nabla(u)) * u_adv*nabla(u),
(17)        _A(u[_i], p) += transpose(N(u) + tau_su*u_adv*nabla(u)) * nabla(p)[_i] / rho,
(18)        _T(p, u[_i]) += tau_ps * transpose(nabla(p)[_i]) * N(u),
(19)        _T(u[_i], u[_i]) += transpose(N(u) + tau_su*u_adv*nabla(u)) * N(u)
(20)      ),
(21)      system_rhs += -_A * _x,
(22)      _A(p) = _A(p) / theta,
(23)      system_matrix += invdt() * _T + theta * _A
(24)  )));

Listing 18: The assembly of the PSPG/SUPG stabilized incompressible Navier-Stokes equations.

When using linear shape functions, the solution for the discrete system captures the analytical solution up to machine precision at the nodes. As an illustration, the code for the specialized user-defined terminal is presented in Listing 19. The terminal assemble_triags can then be used to directly assemble the linear system as shown on line (41). The manually coded version uses the same algorithm, but here we also loop over elements directly, avoiding the use of Proto entirely.

We first run a test on the unit square, divided into 1000 parts in both the x and y direction. Each square cell is divided into two triangles and first order shape functions from the Lagrange family are used. Table 2 summarizes the results, with labeling as follows: “Proto” is the code using our EDSL (see Listing 1), “Proto specialized” is the user-defined terminal from Listing 19, “Manual” is the manually coded assembly loop, “Virtual” is the code using the virtual function interface to the shape functions as it is available in Coolfluid 3, and finally “DOLFIN” is the code generated by the FEniCS project demo. All timings represent the average of 10 assembly runs. Given the large problem size, 10 runs are representative and the variation between subsequent averaged runs was less than 2%.

Table 2: Linear system assembly times (wall clock time and timing relative to Manual) for the Poisson problem on the unit square, using first order triangle shape functions on a 1000 × 1000 grid.

                      Dummy matrix                 Epetra matrix
                      Wall clock (s)   Relative    Wall clock (s)   Relative
  Proto               0.32             5.93        0.61             1.79
  Proto specialized   0.069            1.27        0.35             1.03
  Manual              0.054            1           0.34             1
  Virtual             2.82             52.2        3.18             9.35
  DOLFIN              0.31             5.74        1.13             3.32

Due to the simplicity of the Poisson problem, the insertion into the global sparse matrix structure can actually be more expensive than the evaluation of the element integrals. We run the benchmark using both the Trilinos backend (using an Epetra CRS matrix) and a “dummy” matrix, which does not store any data, to properly time the assembly procedure. As seen from Table 2, the overhead of the matrix insertion is about 0.3 s in Coolfluid 3 and 0.8 s in DOLFIN, that is, at least of the order of the time it takes to compute the element matrix itself.

When comparing the timings for the dummy matrix, we see that the generic Proto code, which uses second order Gauss quadrature, is more than 5 times slower than the manually coded version. The difference between the specialized and the manual versions is much smaller. Since the specialized code still uses the Proto element looping mechanism, we can conclude that its inherent overhead is small. We confirmed this by profiling the assembly with gperftools (http://gperftools.googlecode.com/), generating the call graphs shown in Figure 3. Each graph starts in the execute method of the relevant Action class. On the left, the generic Proto code is seen to be mostly inlined into the element looper, with only 15% of the time spent in calls to other functions (mostly the shape functions). The large absolute execution time (numbers next to the arrows) is due to the extra matrix operations involved in the second order quadrature.
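The role of the “dummy” matrix in these timings can be illustrated with a minimal sketch. The types DummyMatrix and DenseMatrix, the add_value interface, and the 1D model problem are assumptions made for this illustration only; they are not the Coolfluid 3, Epetra, or DOLFIN interfaces. The assembly loop is templated on the matrix backend, so timing it with the dummy backend measures only the element computations, and the difference with a storing backend gives an estimate of the insertion overhead.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical backend that discards every contribution: timing an assembly
// loop with it measures the element computations only.
struct DummyMatrix {
  std::size_t calls = 0;
  void add_value(std::size_t, std::size_t, double) { ++calls; }
};

// Hypothetical backend that stores the entries in a dense row-major array.
struct DenseMatrix {
  std::size_t size;
  std::size_t calls = 0;
  std::vector<double> data;
  explicit DenseMatrix(std::size_t n) : size(n), data(n * n, 0.0) {}
  void add_value(std::size_t row, std::size_t col, double value) {
    data[row * size + col] += value;
    ++calls;
  }
  double operator()(std::size_t row, std::size_t col) const {
    return data[row * size + col];
  }
};

// Assembly of a 1D Laplacian with linear elements of length h. The element
// matrix is (1/h) * [[1, -1], [-1, 1]]; element e couples nodes e and e + 1.
template <typename MatrixT>
void assemble_1d_laplacian(std::size_t n_elements, double h, MatrixT& matrix) {
  const double k = 1.0 / h;
  const double element_matrix[2][2] = {{k, -k}, {-k, k}};
  for (std::size_t e = 0; e != n_elements; ++e)
    for (std::size_t i = 0; i != 2; ++i)
      for (std::size_t j = 0; j != 2; ++j)
        matrix.add_value(e + i, e + j, element_matrix[i][j]);
}

// Wall clock time in seconds for one assembly with the given backend.
template <typename MatrixT>
double timed_assembly(std::size_t n_elements, double h, MatrixT& matrix) {
  const auto start = std::chrono::steady_clock::now();
  assemble_1d_laplacian(n_elements, h, matrix);
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Calling timed_assembly once with a DummyMatrix and once with a DenseMatrix of n_elements + 1 rows runs the same loop over both backends; only the second call pays for storing the entries.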

Figure 3: Call graphs of optimized code for the Poisson element matrix computation, from left to right: the generic Proto code, the Proto code with user-defined terminal, and the manually coded version. The numbers next to the arrows indicate the absolute execution time, expressed in number of “ticks.” The percentages indicate the time spent in each function relative to the total execution time.

For the specialized function (middle graph), all operations are in the user-defined PoissonTriagAssembly operator, resulting in 10 fewer ticks for the overall execution time. Finally, in the manual version on the right, some operations related to index book-keeping are performed in the loop function, while the rest is executed in the specific implementation of the execute function.

The virtual function implementation performs much worse than any other method. Like Proto, it allows writing generic code for all elements but does so by using virtual function calls and dynamically allocated matrices. This makes the method much slower than the Proto code. A separate test comparing dynamically and statically sized matrices (size 4 × 4) from the Eigen library shows a slowdown by a factor of about 8 for dynamic matrices when running matrix-matrix and matrix-vector multiplications, reinforcing the results from Table 2.

In DOLFIN, the right hand side and the matrix are computed using separate assembly loops. For this particular case, Proto and DOLFIN result in the same performance. This is surprising, since the more advanced integration routines in DOLFIN detect here that first order quadrature is sufficient, while our Proto code assumes the worst case and applies second order integration.

The above observation leads us to perform the test in 3D, using a unit cube with 100 segments in each direction and built up of tetrahedra. The results are shown in Table 3. We only compare the Proto and DOLFIN versions, since this code can be applied in 3D without modification. The effect of the quadrature order is obvious here, with our second order quadrature being almost three times slower than the first order quadrature performed by DOLFIN. To check if this is the only effect, we temporarily forced our code to use first order quadrature, that is, using one quadrature point instead of four. The speedup is as expected, but we do emphasize that it is only obtained after modification of the integration code: we do not have a method for determining the integration order based on the terms appearing in the equations. Instead, our integration system assumes there is a mass matrix term in the equation and proceeds to choose the integration order based on the shape function. If the performance penalty is significant, as is mostly the case with simple problems, it is possible to use a user-defined terminal to override the integration method, or even to avoid numerical integration altogether.

We also include some results for hexahedral elements, where the second order quadrature is necessary. The element matrix dimension is also doubled, resulting in longer computation times for the matrix operations. We see that this is compensated here when inserting into the sparse matrix, due to the lower number of elements (each hexahedron represents 6 tetrahedra).

Table 3: Linear system assembly times (wall clock time and timings relative to the default Proto implementation) for the Poisson problem on the unit cube, using first order tetrahedron or hexahedron shape functions on a 100 × 100 × 100 grid.

                      Dummy matrix                 Epetra matrix
                      Wall clock (s)   Relative    Wall clock (s)   Relative
  Proto, default      3.29             1           5.51             1
  Proto, 1st order    0.81             0.25        2.64             0.48
  Proto, hexahedra    4.60             1.40        5.65             1.03
  DOLFIN              1.22             0.37        5.05             0.92

5.2. Chorin’s Method. In Chorin’s method, there are a total of 6 different assembly loops to be run: one for each system matrix, and one for each right hand side. Even though the matrices only need to be assembled once for the simulation, we present timings here for comparison purposes. The presented timing is the total time spent on assembly, divided by the number of assembles executed. Table 4 summarizes the results. We see that, except for the auxiliary matrix assembly, DOLFIN is faster every time when the “dummy” matrix is used, with a very large discrepancy for the correction matrix assembly. This is due to an optimization in DOLFIN, which recognizes the coefficients of the element matrix can easily be precomputed, avoiding quadrature. Our code does not allow this optimization to happen automatically, although it is of course possible to add a user-defined terminal. In the auxiliary matrix, the same term appears, but here it is divided by Δt, causing DOLFIN to apply quadrature.
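The link between quadrature order, the mass matrix term, and precomputed element matrix coefficients can be made concrete with a small sketch. It is an illustration under stated assumptions, not the integration code of Coolfluid 3 or DOLFIN: it uses a linear triangle rather than the tetrahedra of Table 3, and the names QuadraturePoint, mass_by_quadrature, mass_precomputed, centroid_rule, and midpoint_rule are hypothetical. The one-point rule is not exact for the mass matrix, the three-point edge-midpoint rule is, and the same entries can be written down once as det_jac/24 times a constant matrix, which avoids quadrature altogether.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Point and weight on the reference triangle (weights sum to its area, 1/2).
struct QuadraturePoint { double xi, eta, weight; };

// Linear shape functions on the reference triangle.
inline std::array<double, 3> shape_functions(double xi, double eta) {
  return {{1.0 - xi - eta, xi, eta}};
}

// Mass matrix by numerical integration: M_ij = det_jac * sum_q w_q N_i N_j,
// where det_jac is twice the area of the physical triangle.
template <std::size_t NbPoints>
Mat3 mass_by_quadrature(const std::array<QuadraturePoint, NbPoints>& rule,
                        double det_jac) {
  Mat3 mass = {};
  for (const QuadraturePoint& q : rule) {
    const std::array<double, 3> n = shape_functions(q.xi, q.eta);
    for (std::size_t i = 0; i != 3; ++i)
      for (std::size_t j = 0; j != 3; ++j)
        mass[i][j] += det_jac * q.weight * n[i] * n[j];
  }
  return mass;
}

// The same matrix from precomputed coefficients: det_jac/24 * [2 1 1; 1 2 1; 1 1 2].
inline Mat3 mass_precomputed(double det_jac) {
  Mat3 mass = {};
  for (std::size_t i = 0; i != 3; ++i)
    for (std::size_t j = 0; j != 3; ++j)
      mass[i][j] = det_jac / 24.0 * (i == j ? 2.0 : 1.0);
  return mass;
}

// One-point rule, exact for linear integrands only.
inline std::array<QuadraturePoint, 1> centroid_rule() {
  return {{ {1.0 / 3.0, 1.0 / 3.0, 0.5} }};
}

// Three-point edge-midpoint rule, exact for quadratic integrands.
inline std::array<QuadraturePoint, 3> midpoint_rule() {
  return {{ {0.5, 0.0, 1.0 / 6.0}, {0.5, 0.5, 1.0 / 6.0}, {0.0, 0.5, 1.0 / 6.0} }};
}

// Largest entry-wise difference between two element matrices.
inline double max_abs_difference(const Mat3& a, const Mat3& b) {
  double result = 0.0;
  for (std::size_t i = 0; i != 3; ++i)
    for (std::size_t j = 0; j != 3; ++j)
      result = std::fmax(result, std::fabs(a[i][j] - b[i][j]));
  return result;
}
```

For the stiffness matrix of a linear element the integrand is constant, so the one-point rule already suffices there; it is the product of two shape functions in a mass matrix term that requires the higher order rule.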

The Proto-generated code is currently suboptimal for the assemblies of the right hand sides. This is due to some missed chances for matrix reuse: the advection operation in (10), for example, is calculated once for every component. While this effect is significant when we eliminate the influence of the linear system, it is much less apparent when looking at the results for Epetra matrices and vectors in the last column of Table 4. This leads us to conclude that our performance level is adequate for practical use and the element matrix and vector calculations will not be a dominant factor in the total solution time.

5.3. Channel Flow Simulation. In the last performance test, we take a look at a practical example, using the PSPG/SUPG stabilized Navier-Stokes formulation from Listing 18. The test problem is the flow between two infinite flat plates, that is, a 3D channel flow with two periodic directions. We initialize the flow using a laminar solution with centerline Reynolds number of 11250 with respect to the channel half-height. We apply periodic boundary conditions in the stream- and span-wise directions and a no-slip condition at the walls. The average timings for the first 100 timesteps (initial Courant number: 0.57) are presented in Table 5.

 (1)  // Specialized code for triangles
 (2)  struct PoissonTriagAssembly
 (3)  {
 (4)    typedef void result_type;
 (5)    // Functor that takes: source term f, Linear system lss, element matrix and vector acc
 (6)    template <typename FT, typename LSST>
 (7)    void operator()(const FT& f, LSST& lss, math::LSS::BlockAccumulator& acc) const
 (8)    {
 (9)      typedef mesh::LagrangeP1::Triag2D ElementT;
(10)      // Get the coordinates of the element nodes
(11)      const ElementT::NodesT& nodes = f.support().nodes();
(12)
(13)      // Compute normals
(14)      ElementT::NodesT normals;
(15)      normals(0, XX) = nodes(1, YY) - nodes(2, YY);
(16)      // ...repetitive code omitted
(17)      // Jacobian determinant
(18)      const Real det_jac = normals(2, YY)*normals(1, XX) - normals(1, YY)*normals(2, XX);
(19)      const Real c = 1. / (2.*det_jac);
(20)
(21)      // Indices of the nodes of the current element
(22)      acc.neighbour_indices(f.support().element_connectivity());
(23)
(24)      for(Uint i = 0; i != 3; ++i)
(25)        for(Uint j = 0; j != 3; ++j)
(26)          acc.mat(i, j) = c * (normals(i, XX)*normals(j, XX) + normals(i, YY)*normals(j, YY));
(27)
(28)      // Get the values of the source term
(29)      const Real f0 = f.value()[0];
(30)      // ...f1 and f2
(31)      acc.rhs[0] = (2*f0 + f1 + f2);
(32)      // ...acc.rhs[1] and acc.rhs[2]
(33)      acc.rhs *= det_jac/24.;
(34)      lss.matrix().add_values(acc);
(35)      lss.rhs().add_rhs_values(acc);
(36)    }
(37)  };
(38)  // Create a terminal that can be used as a function in a proto expression
(39)  static MakeSFOp <PoissonTriagAssembly>::type const assemble_triags = {};
(40)  // Usage example:
(41)  assembly->set_expression(elements_expression
(42)  (
(43)    boost::mpl::vector1< mesh::LagrangeP1::Triag2D>(),
(44)    assemble_triags(f, system_matrix, m_block_accumulator)
(45)  ));

Listing 19: Code for the specialized user-defined terminal for the Poisson problem, valid for linear shape functions over a triangle.
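The element computation in Listing 19 can be checked in isolation with a stand-alone sketch that needs neither Coolfluid 3 nor Proto. The types Point2D and ElementSystem and the function poisson_triangle are hypothetical names introduced here. The lines omitted from the listing are filled in under the assumption that the remaining normals follow the same cyclic pattern as normals(0, XX), with the YY component taken as the matching difference of x coordinates; the formulas for det_jac, c, the element matrix, and the right hand side are copied from the listing.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

struct Point2D { double x, y; };

// Element matrix and right hand side of a single linear triangle.
struct ElementSystem {
  std::array<std::array<double, 3>, 3> mat;
  std::array<double, 3> rhs;
};

// nodes: corner coordinates of the triangle; f: source term values at the nodes.
inline ElementSystem poisson_triangle(const std::array<Point2D, 3>& nodes,
                                      const std::array<double, 3>& f) {
  // Scaled normal of the edge opposite to each node. Only normals[0].x appears
  // in the listing; the cyclic pattern for the other entries is an assumption.
  std::array<Point2D, 3> normals;
  for (std::size_t i = 0; i != 3; ++i) {
    const Point2D& a = nodes[(i + 1) % 3];
    const Point2D& b = nodes[(i + 2) % 3];
    normals[i] = {a.y - b.y, b.x - a.x};
  }

  // Jacobian determinant (twice the signed area), as on line (18).
  const double det_jac = normals[2].y * normals[1].x - normals[1].y * normals[2].x;
  const double c = 1.0 / (2.0 * det_jac);

  ElementSystem result;
  for (std::size_t i = 0; i != 3; ++i)
    for (std::size_t j = 0; j != 3; ++j)
      result.mat[i][j] =
          c * (normals[i].x * normals[j].x + normals[i].y * normals[j].y);

  // Right hand side: (2*f_i + f_j + f_k) * det_jac / 24, as on lines (31)-(33).
  const double f_sum = f[0] + f[1] + f[2];
  for (std::size_t i = 0; i != 3; ++i)
    result.rhs[i] = (f[i] + f_sum) * det_jac / 24.0;

  return result;
}
```

For the unit right triangle with nodes (0, 0), (1, 0), and (0, 1) and a unit source term, this gives the element matrix with rows (1, -0.5, -0.5), (-0.5, 0.5, 0), (-0.5, 0, 0.5) and a right hand side of 1/6 per node. A hand-written terminal can be compared against the generic formulation in the same way.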

Table 4: Assembly times for each step in Chorin’s method, compared between our Proto expressions and DOLFIN. Wall times are in seconds. Relative is the DOLFIN timing in multiples of the Proto timing. (Columns: Proto wall time, DOLFIN wall time, and Relative, for the dummy matrix and for the Epetra matrix. Rows: Aux. matrix, Aux. RHS, Corr. matrix, Corr. RHS, p matrix, p RHS.)

Table 5: Assembly and solution times for the coupled PSPG/SUPG stabilized Navier-Stokes equations (Listing 18) on a 3D channel flow with 128 hexahedra (tetrahedralized in the tetra cases) in each direction. (Columns: # CPU, Element, Assembly, Solution, solution/assembly. Rows: Hexa, Tetra, and Tetra specialized, for 32, 64, and 128 CPUs.)

We ran the test using hexahedra and tetrahedra, where the test on tetrahedra also used a specialized code wrapped into a user-defined terminal (“Tetra specialized” in the table). The linear system was solved using the Belos Block GMRES method from Trilinos, preconditioned using ML algebraic multigrid. We tweaked the settings to obtain the fastest possible solution time. We see that the solution of the system takes about 10 times as long as its assembly using our EDSL. This shows that, even for the relatively complicated assembly expressions of (15), our language can be used to assemble the system efficiently. Any further optimization should first focus on the linear system solution before the assembly will become a bottleneck. In this context, it should be noted that the solution of the linear system does not scale as well as the assembly. This may be related to the relatively slow communication (1 Gb Ethernet) between the nodes.

The user-defined code for tetrahedra results in a further speedup by a factor of at least 2. In this case, the code was reused from a previous version of the solver, written only for tetrahedra. A domain specific language can also assist in developing hand-tuned code, however: using the language we can first easily specify the generic formulation and then check the element matrices of manually coded optimizations against the automatically generated matrices.

6. Conclusion and Future Work

We presented a domain specific language for the implementation of finite element solvers, embedded in C++. The language mirrors the mathematical notation faithfully, providing a clean separation between numerics and equations. Our work is set apart from other work in this area by the use of the Boost.Proto library and the possibility to implement user defined terminals. Proto uses concise grammars to describe and extend the functionality of the language, as explained in detail using the stand-alone example. The addition of user defined terminals allows using hand-optimized code when possible, while staying within the automated framework for element looping.

We also analyzed the performance, demonstrating, in our opinion, acceptable abstraction overhead when compared to manual implementations and FEniCS. A large scale test with the PSPG/SUPG method for the incompressible Navier-Stokes equations showed that assembly took up to 10% of the linear system solution time. This makes the assembly only the second largest consumer of CPU time by a large margin. It is our opinion that the sacrifice of some speed in the assembly is acceptable in view of the reduced turnaround time for model development.

Possible directions for future development include changes to the numerical integration framework, to better deduce the required quadrature order. On a more technical level, some parts of the code could be simplified by using new features of the C++11 standard, such as variadic templates and automatic return type deduction. Better error handling can also be looked into. One final interesting possibility is the investigation of expression optimization techniques. Using grammars, it is theoretically possible to scan the expressions for recurring matrix products and calculate each product only once.
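The idea of computing each recurring matrix product only once can be sketched at run time with a small expression tree. This is only an analogue under stated assumptions: the optimization envisaged above would operate on Proto expressions at compile time using grammars, whereas the Node, CachingEvaluator, var, prod, sum, and key names below are hypothetical and use plain 2 × 2 matrices with a string key per product.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <memory>
#include <string>

using Mat2 = std::array<double, 4>;  // row-major 2 x 2 matrix

inline Mat2 multiply(const Mat2& a, const Mat2& b) {
  return {{a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
           a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3]}};
}

inline Mat2 add(const Mat2& a, const Mat2& b) {
  return {{a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]}};
}

// Expression tree node: 'v' = named matrix, '*' = product, '+' = sum.
struct Node {
  char op;
  std::string name;  // only used for 'v' nodes
  std::shared_ptr<Node> lhs, rhs;
};
using NodePtr = std::shared_ptr<Node>;

inline NodePtr var(const std::string& name) {
  return std::make_shared<Node>(Node{'v', name, nullptr, nullptr});
}
inline NodePtr prod(const NodePtr& a, const NodePtr& b) {
  return std::make_shared<Node>(Node{'*', "", a, b});
}
inline NodePtr sum(const NodePtr& a, const NodePtr& b) {
  return std::make_shared<Node>(Node{'+', "", a, b});
}

// Structural key: identical subexpressions map to identical strings.
inline std::string key(const NodePtr& n) {
  if (n->op == 'v') return n->name;
  return "(" + key(n->lhs) + n->op + key(n->rhs) + ")";
}

// Evaluates a tree, caching every product by its structural key.
struct CachingEvaluator {
  std::map<std::string, Mat2> values;         // matrices bound to the names
  std::map<std::string, Mat2> product_cache;  // products computed so far
  int products_computed = 0;

  Mat2 eval(const NodePtr& n) {
    if (n->op == 'v') return values.at(n->name);
    if (n->op == '+') return add(eval(n->lhs), eval(n->rhs));
    const std::string k = key(n);
    const auto found = product_cache.find(k);
    if (found != product_cache.end()) return found->second;
    const Mat2 result = multiply(eval(n->lhs), eval(n->rhs));
    ++products_computed;
    product_cache[k] = result;
    return result;
  }
};
```

Evaluating A*B + (A*B)*C with this evaluator performs two matrix products instead of three, because the second occurrence of A*B is served from the cache.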

Appendix

Code Download Information

All of the code used in this work is available under open source licenses. Coolfluid 3 is licensed under the LGPL version 3 and available from https://github.com/coolfluid/coolfluid3/. Most benchmarks are in the plugins/UFEM/test/demo directory. The code for the stand-alone example can be found in the repository https://github.com/barche/eigen-proto/, where fem_example.cpp contains the complete running program. The test comparing dynamically and statically sized Eigen matrices is at test/math/ptest-eigen-vs-matrixt.cpp. We also had to adapt DOLFIN, adding a “dummy” linear algebra backend to be able to measure without any overhead from the linear system backend. This code can be found at https://bitbucket.org/barche/dolfin/. Finally, the code for benchmarks (including FEniCS tests) used in this paper is in the repository https://github.com/barche/coolfluid3-benchmarks/.

Nomenclature

?Ω: Number of elements in the mesh
Ω: The problem domain
Γ: The problem domain boundary
ν: Kinematic viscosity
f: Source term for the Poisson problem
?: Unknown in the Poisson problem
?: Number of nodes in an element
t: Time
p: Pressure
u: Velocity vector
f?: Vector of values at the element nodes for variable ?
N?: Element shape function vector for variable ?
?̃: Value of unknown ? interpolated by shape functions
a??: Block of the element right hand side vector for the ?th component of the vector variable u
???: Block of the element matrix corresponding to rows for the ?th component of vector variable u and the columns of the ?th component

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors thank the Coolfluid 3 development team for the many hours of work and helpful discussions, without which this work would not have been possible. In particular they thank Tiago Quintino for laying out the basic framework, Willem Deconinck for the work on the mesh structure, and Quentin Gasper for the work on the GUI.
