A Tutorial on NIMROD Physics Kernel Code Development
Carl Sovinec
University of Wisconsin–Madison
Department of Engineering Physics
August 1–3, 2001
Madison, Wisconsin
Contents

Preface

I Framework of NIMROD Kernel
  A Brief Reviews
    A.1 Equations
    A.2 Conditions of Interest
    A.3 Geometric Requirements
  B Spatial Discretization
    B.1 Fourier Representation
    B.2 Finite Element Representation
    B.3 Grid Blocks
    B.4 Boundary Conditions
  C Temporal Discretization
    C.1 Semi-implicit Aspects
    C.2 Predictor-corrector Aspects
    C.3 Dissipative Terms
    C.4 ∇·B Constraint
    C.5 Complete time advance
  D Basic functions of the NIMROD kernel

II FORTRAN Implementation
  A Implementation map
  B Data storage and manipulation
    B.1 Solution Fields
    B.2 Interpolation
    B.3 Vector_types
    B.4 Matrix_types
    B.5 Data Partitioning
    B.6 Seams
  C Finite Element Computations
    C.1 Management Routines
    C.2 Finite_element Module
    C.3 Numerical integration
    C.4 Integrands
    C.5 Surface Computations
    C.6 Dirichlet boundary conditions
    C.7 Regularity conditions
  D Matrix Solution
    D.1 2D matrices
    D.2 3D matrices
  E Start-up Routines
  F Utilities
  G Extrap_mod
  H History Module hist_mod
  I I/O

III Parallelization
  A Parallel communication introduction
  B Grid block decomposition
  C Fourier “layer” decomposition
  D Global-line preconditioning
List of Figures

I.1 continuous linear basis function
I.2 continuous quadratic basis function
I.3 input geom=‘toroidal’ & ‘linear’
I.4 schematic of leap-frog
II.1 Basic Element Example. (n_side=2, n_int=4)
II.2 1-D cubic: extent of α_i in red and α_j in green, integration gives joff = −1
II.3 1-D cubic: extent of α_i in red and β_j in green, note j is an internal node
II.4 1-D cubic: extent of β_i in red and β_j in green, note both i, j are internal nodes
II.5 iv=vertex index, is=segment index, nvert=count of segments and vertices
II.6 3 blocks with corresponding vertices
II.7 Block-border seaming
II.8 4 quad points, 4 elements in a block; dot denotes quadrature position, not node
III.1 Normal decomp: nmodes=2 on il=0 and nmodes=1 on il=1, nf=mx×my=9. Config-space decomp: nphi=2^lphi=8, mpseudo=5 on il=0 and mpseudo=4 on il=1. Parameters are mx=my=3, nlayer=2, lphi=3
Preface
The notes contained in this report were written for a tutorial given to members of the
NIMROD Code Development Team and other users of NIMROD. The presentation was
intended to provide a substantive introduction to the FORTRAN coding used by the kernel, so
that attendees would learn enough to be able to further develop NIMROD or to modify it
for their own applications.
This written version of the tutorial contains considerably more information than what
was given in the three half-day sessions. It is the author’s hope that these notes will serve
as a manual for the kernel for some time. Some of the information will inevitably become
dated as development continues. For example, we are already considering changes to the
time-advance that will alter the predictor-corrector approach discussed in Section C.2. In
addition, the 3.0.3 version described here is not complete at this time, but the only difference
reflected in this report from version 3.0.2 is the advance of temperature instead of pressure
in Section C.5. For those using earlier versions of NIMROD (prior to and including 2.3.4),
note that basis function flexibility was added at 3.0 and data structures and array ordering
are different than what is described in Section B.
If you find errors or have comments, please send them to me at sovinec@engr.wisc.edu.
Chapter I
Framework of NIMROD Kernel
A Brief Reviews
A.1 Equations
NIMROD started from two general-purpose PDE solvers, Proto and Proteus. Though rem-
nants of these solvers are a small minority of the present code, most of NIMROD is modular
and not specific to our applications. Therefore, what NIMROD solves can and does change,
providing flexibility to developers and users.
Nonetheless, the algorithm implemented in NIMROD is tailored to solve fluid-based models of fusion plasmas. The equations are the Maxwell equations without displacement current and the single-fluid form (see [1]) of velocity moments of the electron and ion distribution functions, neglecting terms of order m_e/m_i smaller than other terms.
Maxwell without displacement current:

Ampere’s Law: µ_0 J = ∇×B

Gauss’s Law: ∇·B = 0

Faraday’s Law: ∂B/∂t = −∇×E
Fluid Moments → Single Fluid Form

Quasi-neutrality is assumed for the time and spatial scales of interest, i.e.

• Zn_i ≃ n_e → n
• e(Zn_i − n_e)E is negligible compared to other forces
• However, ∇·E ≠ 0
Continuity:

∂n/∂t + V·∇n = −n∇·V

Center-of-mass Velocity Evolution:

ρ ∂V/∂t + ρV·∇V = J×B − ∇p − ∇·Π

• Π is often set to −ρν∇V
• Kinetic effects, e.g., neoclassical, would appear through Π
Temperature Evolution (3.0.2 & earlier evolve p, p_e):

(n_α/(γ−1)) (∂/∂t + V_α·∇) T_α = −p_α ∇·V_α − Π_α : ∇V_α − ∇·q_α + Q_α

• q_α = −n_α χ_α·∇T_α in the collisional limit
• Q_e includes ηJ²
• α = i, e
• p_α = n_α k T_α
Generalized Ohm’s law (∼ electron velocity moment). The annotations below give the input parameters that activate each term (values of ohms= unless otherwise noted):

E = −V×B + ηJ + (1/ne) J×B + (m_e/ne²) [ ∂J/∂t + ∇·(JV + VJ) − (e/m_e)(∇p_e + ∇·Π_e) ]

• −V×B: ‘mhd’, ‘mhd & hall’, ‘2fl’
• ηJ: elecd > 0
• (1/ne) J×B: ‘hall’, ‘mhd & hall’, ‘2fl’
• ∇·(JV + VJ): advect=‘all’
• ∇p_e: ‘2fl’
• ∇·Π_e: neoclassical
A steady-state part of the solution is separated from the solution fields in NIMROD. For resistive MHD, for example, ∂/∂t → 0 implies

1. ρ_s V_s·∇V_s = J_s × B_s − ∇p_s + ∇·ρ_s ν∇V_s
2. ∇·(n_s V_s) = 0
3. ∇×E_s = ∇×(ηJ_s − V_s × B_s) = 0
4. (n_s/(γ−1)) V_s·∇T_s = −p_s ∇·V_s + ∇·n_s χ·∇T_s + Q_s
Notes:
• An equilibrium (Grad-Shafranov) solution is a solution of 1 (J_s × B_s = ∇p_s) and of 2 – 4.
• Loading an ‘equilibrium’ into NIMROD means that 1 – 4 are assumed. This is appro-
priate and convenient in many cases, but it’s not appropriate when the ‘equilibrium’
already contains expected contributions from electromagnetic activity.
• The steady-state part may be set to 0 or to the vacuum field.
Decomposing each field into steady and evolving parts and canceling steady terms gives (for MHD) [A → A_s + A]:

∂n/∂t + V_s·∇n + V·∇n_s + V·∇n = −n_s ∇·V − n ∇·V_s − n ∇·V

(ρ_s + ρ) ∂V/∂t + ρ [(V_s + V)·∇(V_s + V)] + ρ_s [V_s·∇V + V·∇V_s + V·∇V]
    = J_s×B + J×B_s + J×B − ∇p + ∇·ν [ρ∇V_s + ρ_s∇V + ρ∇V]

∂B/∂t = ∇×(V_s×B + V×B_s + V×B − ηJ)

((n + n_s)/(γ−1)) ∂T/∂t + (n/(γ−1)) [(V_s + V)·∇(T_s + T)] + (n_s/(γ−1)) [V_s·∇T + V·∇T_s + V·∇T]
    = −p_s ∇·V − p ∇·V_s − p ∇·V + ∇·[n_s χ·∇T + nχ·∇T_s + nχ·∇T]
Notes:
• The code further separates χ (and other dissipation coefficients ?) into steady and
evolving parts.
• The motivation is that time-evolving parts of fields can be orders of magnitude smaller
than the steady part, and linear terms tend to cancel, so skipping decompositions would
require tremendous accuracy in the large steady part. It would also require complete
source terms.
• Nonetheless, decomposition adds to computations per step and code complexity.
General Notes:
1. Collisional closures lead to local spatial derivatives only. Kinetic closures lead to
integro-differential equations.
2. The ∂/∂t equations are the basis for the NIMROD time-advance.
A.2 Conditions of Interest
The objective of the project is to simulate the electromagnetic behavior of magnetic confine-
ment fusion plasmas. Realistic conditions make the fluid equations stiff.
• Global magnetic changes occur over the global resistive diffusion time.
• MHD waves propagate 10^4 – 10^10 times faster.
• Topology-changing modes grow on intermediate time scales.
* All effects are present in equations sufficiently complete to model behavior in experi-
ments.
A.3 Geometric Requirements
The ability to accurately model the geometry of specific tokamak experiments has always
been a requirement for the project.
• Poloidal cross section may be complicated.
• Geometric symmetry of the toroidal direction was assumed.
• The Team added:
– Periodic linear geometry
– Simply-connected domains
B Spatial Discretization
B.1 Fourier Representation
A pseudospectral method is used to represent the periodic direction of the domain. A
truncated Fourier series yields a set of coefficients suitable for numerical computation:
A(φ) = A_0 + Σ_{n=1}^{N} [ A_n e^{inφ} + A_n* e^{−inφ} ]

or

A(z) = A_0 + Σ_{n=1}^{N} [ A_n e^{i2πnz/L} + A_n* e^{−i2πnz/L} ]

All physical fields are real functions of space, so we solve for the complex coefficients A_n, n > 0, and the real coefficient A_0.
• Linear computations find A_n(t) (typically) for a single n-value. (Input ‘lin_nmax’, ‘lin_nmodes’)
• Nonlinear simulations use FFTs to compute products on a uniform mesh in φ and z.

Example of a nonlinear product: E = −V×B

• Start with V_n, B_n, 0 ≤ n ≤ N.
• An FFT gives V(mπ/N), B(mπ/N), 0 ≤ m ≤ 2N − 1.
• Multiply (collocation): E(mπ/N) = −V(mπ/N) × B(mπ/N), 0 ≤ m ≤ 2N − 1.
• An FFT returns the Fourier coefficients E_n, 0 ≤ n ≤ N.

Standard FFTs require 2N to be a power of 2.

• The input parameter ‘lphi’ determines 2N: 2N = 2^lphi.
• Collocation is equivalent to spectral convolution if aliasing errors are prevented.
• One can show that setting V_n, B_n, etc. = 0 for n > 2N/3 prevents aliasing from quadratic nonlinearities (products of two functions). [nmodes_total = 2^lphi/3 + 1; see the sketch after this list.]
• Aliasing does not prevent numerical convergence if the expansion converges.
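A minimal sketch of the mode count and truncation implied by these rules (hypothetical array names, not NIMROD code; assumes lphi is a compile-time constant):

program dealias_count
  implicit none
  integer, parameter :: lphi = 3          ! input parameter: 2N = 2**lphi
  integer :: nphi, ncut, nmodes, n
  complex, dimension(0:2**(lphi-1)) :: a  ! coefficients A_n, 0 <= n <= N

  nphi = 2**lphi          ! number of collocation points, 2N
  ncut = nphi/3           ! highest n kept by the 2/3 rule
  nmodes = ncut + 1       ! nmodes_total = 2**lphi/3 + 1 (integer division)

  a = (1.0, 0.0)          ! nonzero coefficients for the demonstration
  do n = 0, nphi/2        ! zero the aliased part of a quadratic product
    if (n > ncut) a(n) = (0.0, 0.0)
  end do

  print *, 'collocation points:', nphi, ' de-aliased modes:', nmodes
end program dealias_count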
As a prelude to the finite element discretization, Fourier expansions:
1. Substituting a truncated series for f (φ) changes the PDE problem into a search for
the best solution on a restricted solution space.
2. Orthogonality of basis functions is used to find a system of discrete equations for the
coefficients.
e.g. Faraday’s law:

∂B(φ)/∂t = −∇×E(φ)

∂/∂t [ B_0 + Σ_{n=1}^{N} B_n e^{inφ} + c.c. ] = −∇× [ E_0 + Σ_{n=1}^{N} E_n e^{inφ} + c.c. ]

Apply ∫ dφ e^{−in′φ} for 0 ≤ n′ ≤ N:

∂B_{n′}/∂t = − [ ∇_pol × E_{n′} + (in′/R) φ̂ × E_{n′} ] ,  0 ≤ n′ ≤ N
B.2 Finite Element Representation
Finite elements achieve discretization in a manner similar to the Fourier series. Both are
basis function expansions that impose a restriction on the solution space; the restricted
spaces become better approximations of the continuous solution space as more basis functions
are used, and the discrete equations describe the evolution of the coefficients of the basis
functions.
The basis functions of finite elements are low-order polynomials over limited regions of
the domain.
α_i(x) =
  { 0,                                  x < x_{i−1}
  { (x − x_{i−1}) / (x_i − x_{i−1}),    x_{i−1} ≤ x ≤ x_i
  { (x_{i+1} − x) / (x_{i+1} − x_i),    x_i ≤ x ≤ x_{i+1}
  { 0,                                  x > x_{i+1}
Figure I.1: continuous linear basis function

Figure I.2: continuous quadratic basis function
α_i(x) =
  { 0,                                                                    x < x_{i−1}
  { (x − x_{i−1})(x − x_{i−1/2}) / [(x_i − x_{i−1})(x_i − x_{i−1/2})],    x_{i−1} ≤ x ≤ x_i
  { (x − x_{i+1})(x − x_{i+1/2}) / [(x_i − x_{i+1})(x_i − x_{i+1/2})],    x_i ≤ x ≤ x_{i+1}
  { 0,                                                                    x > x_{i+1}

β_i(x) =
  { (x − x_{i−1})(x − x_i) / [(x_{i−1/2} − x_{i−1})(x_{i−1/2} − x_i)],    x_{i−1} ≤ x ≤ x_i
  { 0,                                                                    otherwise
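For reference, a small self-contained sketch (not the NIMROD routine) of evaluating the 1D Lagrange basis values of a chosen polynomial degree at a logical position; lagr_quad_bases plays the analogous role in the code, and the element-local restrictions of the α and β functions above are exactly these polynomials.

program lagrange_demo
  implicit none
  real :: alpha(0:2)
  call lagrange_1d(2, 0.25, alpha)    ! quadratic basis values at xi=0.25
  print *, alpha                      ! expect 0.375, 0.75, -0.125
contains
  ! Evaluate the pd+1 Lagrange basis functions of degree pd on the
  ! reference element [0,1] with equally spaced nodes, at position xi.
  subroutine lagrange_1d(pd, xi, alpha)
    integer, intent(in) :: pd
    real, intent(in) :: xi
    real, intent(out) :: alpha(0:pd)
    real :: node(0:pd)
    integer :: i, k
    do i = 0, pd
      node(i) = real(i)/real(pd)      ! equally spaced element nodes
    end do
    do i = 0, pd
      alpha(i) = 1.0
      do k = 0, pd
        if (k /= i) alpha(i) = alpha(i)*(xi - node(k))/(node(i) - node(k))
      end do
    end do
  end subroutine lagrange_1d
end program lagrange_demo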
In a conforming approximation, continuity requirements for the restricted space are the
same as those for the solution space for an analytic variational form of the problem.
• Terms in the weak form must be integrable after any integration-by-parts; step func-
tions are admissible, but delta functions are not.
• Our fluid equations require a continuous solution space in this sense, but derivatives
need to be piecewise continuous only.
Using the restricted space implies that we seek solutions of the form

A(R, Z) = Σ_j A_j α_j(R, Z)

• The j-summation is over the nodes of the space.
• α_j(R, Z) here may represent different polynomials. [α_j and β_j in the 1D quadratic example (Figure I.2)]
• 2D basis functions are formed as products of 1D basis functions for R and Z.
Nonuniform meshing can be handled by a mapping from logical coordinates to physical coordinates. To preserve convergence properties, i.e., the lowest-order polynomials of the physical coordinates, the order of the mapping should be no higher than the polynomial degree of the basis functions [2].
The present version of NIMROD allows the user to choose the polynomial degree of the
basis functions (poly_degree) and the mapping (‘met_spl’).
Numerical analysis shows that a correct application of the finite element approach ties
convergence of the numerical solution to the best possible approximation by the restricted
space. (See [2] for analysis of self-adjoint problems.) Error bounds follow from the error
bounds of a truncated Taylor series expansion.
For scalar fields, we now have enough basis functions. For vector fields, we also need
direction vectors. All recent versions of NIMROD have equations expressed in (R, Z, φ)
coordinates for toroidal geometries and (X, Y, Z) for linear geometries.
Figure I.3: input geom=‘toroidal’ & ‘linear’ (unit vectors for the toroidal (R, Z, φ) and linear (X, Y, Z) coordinate systems)
For toroidal geometries, we must use the fact that ê_R and ê_φ are functions of φ, but we always have ê_i · ê_j = δ_ij.
• Expanding scalars:

S(R, Z, φ) → Σ_j α_j(R, Z) [ S_{j0} + Σ_{n=1}^{N} S_{jn} e^{inφ} + c.c. ]

• Expanding vectors:

V(R, Z, φ) → Σ_j Σ_l α_j(R, Z) ê_l [ V_{jl0} + Σ_{n=1}^{N} V_{jln} e^{inφ} + c.c. ]

* For convenience, ᾱ_{jl} ≡ α_j ê_l; from here on I’ll suppress c.c. and make the n sums run from 0 to N, where N here is nmodes_total − 1.
While our poloidal basis functions do not have orthogonality properties like e^{inφ} and ê_R, their finite extent means that

∫ dR dZ D^s(α_i(R, Z)) D^t(α_j(R, Z)) ≠ 0,

where D^s is a spatial derivative operator of allowable order s (here 0 or 1), only where nodes i and j are in the same element (inclusive of element borders).
Integrands of this form arise from a procedure analogous to that used with Fourier series
alone; we create a weak form of the PDEs by multiplying by a conjugated basis function.
Returning to Faraday’s law for resistive MHD, and using Ẽ to represent the ideal −V×B field (after pseudospectral calculations):

• Expand B(R, Z, φ, t) and Ẽ(R, Z, φ, t).
• Apply ∫ dR dZ dφ ᾱ_{j′l′}(R, Z) e^{−in′φ} for j′ = 1 : (# nodes), l′ ∈ {R, Z, φ}, 0 ≤ n′ ≤ N.
• Integrate by parts to symmetrize the diffusion term and to avoid inadmissible derivatives of our solution space.

∫ dR dZ dφ (ᾱ_{j′l′} e^{−in′φ}) · Σ_{jln} (∂B_{jln}/∂t) ᾱ_{jl} e^{inφ}
  = −∫ dR dZ dφ ∇×(ᾱ_{j′l′} e^{−in′φ}) · [ (η/µ_0) Σ_{jln} B_{jln} ∇×(ᾱ_{jl} e^{inφ}) + Σ_{jln} Ẽ_{jln} ᾱ_{jl} e^{inφ} ]
  + ∮ (ᾱ_{j′l′} e^{−in′φ}) · Σ_{jln} E_{jln} ᾱ_{jl} e^{inφ} × dS
Using orthogonality to simplify, and rearranging the lhs,

Σ_j (∂B_{jl′n′}/∂t) ∫ dR dZ α_{j′} α_j + Σ_{jl} B_{jln′} ∫ dR dZ dφ (η/µ_0) ∇×(ᾱ_{j′l′} e^{−in′φ}) · ∇×(ᾱ_{jl} e^{in′φ})
  = −∫ dR dZ dφ ∇×(ᾱ_{j′l′} e^{−in′φ}) · Σ_{jl} Ẽ_{jln′} ᾱ_{jl} e^{in′φ} + ∮ ᾱ_{j′l′} · Σ_{jl} E_{jln′} ᾱ_{jl} × dS

for all {j′, l′, n′}.
Notes on this weak form of Faraday’s law:
• Surface integrals for inter-element boundaries cancel in conforming approximations.
• The remaining surface integral is at the domain boundary.
• The lhs is written as a matrix-vector product, with the ∫ dR dZ f(ᾱ_{j′l′}) g(ᾱ_{jl}) integrals defining a matrix element for row (j′, l′) and column (j, l). These elements are diagonal in n. (A small assembly sketch follows these notes.)
• {B_{jln}} constitutes the solution vector at an advanced time (see C).
• The rhs is left as a set of integrals over functions of space.
• For self-adjoint operators, such as I + ∆t ∇× (η/µ_0) ∇×, using the same functions for test and basis functions (Galerkin) leads to the same equations as minimizing the corresponding variational form [2], i.e. the Rayleigh-Ritz-Galerkin problem.
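The following self-contained sketch (illustrative only; it is not how NIMROD assembles its matrices, and all names are made up) builds the 1D mass matrix ∫ α_i α_j dx for linear elements on a uniform mesh with two-point Gaussian quadrature, so the row/column matrix-element idea and the element-extent sparsity can be seen concretely.

program mass_matrix_1d
  implicit none
  integer, parameter :: nel = 4                ! number of elements
  real, parameter :: h = 1.0/nel               ! uniform element size
  real :: mass(0:nel,0:nel), xg(2), wg(2), a(2)
  integer :: ie, ig, i, j

  ! two-point Gauss rule on the reference element [0,1]
  xg = (/ 0.5 - 0.5/sqrt(3.0), 0.5 + 0.5/sqrt(3.0) /)
  wg = (/ 0.5, 0.5 /)

  mass = 0.0
  do ie = 1, nel                               ! loop over elements
    do ig = 1, 2                               ! loop over quadrature points
      a(1) = 1.0 - xg(ig)                      ! linear basis values at xg
      a(2) = xg(ig)
      do i = 1, 2
        do j = 1, 2                            ! accumulate local contributions
          mass(ie-2+i, ie-2+j) = mass(ie-2+i, ie-2+j) + wg(ig)*a(i)*a(j)*h
        end do
      end do
    end do
  end do
  ! diagonal entries are 2h/3 in the interior and h/3 at the ends;
  ! nearest-neighbor entries are h/6, and all other couplings are zero.
  print '(5f8.4)', (mass(i,:), i = 0, nel)
end program mass_matrix_1d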
B.3 Grid Blocks
For geometric flexibility and parallel domain decomposition, the poloidal mesh can be con-
structed from multiple grid-blocks. NIMROD has two types of blocks: Logically rectangular
blocks of quadrilateral elements are structured. They are called ‘rblocks’. Unstructured
blocks of triangular elements are called ‘tblocks’. Both may be used in the same simulation.
The kernel requires conformance of elements along block boundaries, but blocks don’t
have to conform.
More notes on weak forms:
• The finite extent of the basis functions leads to sparse matrices
• The mapping for non-uniform meshing has not been expressed explicitly. It entails:
1. ∫ dR dZ → ∫ J dξ dη, where ξ and η are logical coordinates and J is the Jacobian;
2. ∂α/∂R → (∂α/∂ξ)(∂ξ/∂R) + (∂α/∂η)(∂η/∂R), and similarly for Z, where α is a low-order polynomial of ξ and η with uniform meshing in (ξ, η);
3. the integrals are computed numerically with Gaussian quadrature, ∫_0^1 f(ξ) dξ → Σ_i w_i f(ξ_i), where {w_i, ξ_i}, i = 1, ..., n_g, are prescribed by the quadrature method [3].
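The following is a minimal, self-contained sketch of items 1–3 (illustrative only; the corner coordinates and names are made up, and NIMROD's rblock machinery is far more general): a bilinear map from logical (ξ, η) to physical (R, Z), its Jacobian at the Gauss points, and the resulting numerical integral of the constant 1, i.e. the element area.

program mapped_quadrature
  implicit none
  ! corner coordinates (R,Z) of one quadrilateral element, counterclockwise
  real, parameter :: rc(4) = (/ 1.0, 2.0, 2.2, 1.1 /)
  real, parameter :: zc(4) = (/ 0.0, 0.1, 1.0, 0.9 /)
  real :: g(2), w(2), xi, eta, drdxi, drdeta, dzdxi, dzdeta, jac, area
  integer :: i, j

  g = (/ 0.5 - 0.5/sqrt(3.0), 0.5 + 0.5/sqrt(3.0) /)  ! 2-pt Gauss on [0,1]
  w = (/ 0.5, 0.5 /)

  area = 0.0
  do i = 1, 2
    do j = 1, 2
      xi = g(i);  eta = g(j)
      ! derivatives of the bilinear map R(xi,eta), Z(xi,eta)
      drdxi  = (1.0-eta)*(rc(2)-rc(1)) + eta*(rc(3)-rc(4))
      drdeta = (1.0-xi )*(rc(4)-rc(1)) + xi *(rc(3)-rc(2))
      dzdxi  = (1.0-eta)*(zc(2)-zc(1)) + eta*(zc(3)-zc(4))
      dzdeta = (1.0-xi )*(zc(4)-zc(1)) + xi *(zc(3)-zc(2))
      jac = drdxi*dzdeta - drdeta*dzdxi          ! Jacobian of the map
      area = area + w(i)*w(j)*jac                ! integral of 1 over the element
    end do
  end do
  print *, 'element area =', area
end program mapped_quadrature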
B.4 Boundary Conditions
Dirichlet conditions are normally enforced for B_normal and V. If thermal conduction is used, Dirichlet conditions are also imposed on T_α. Boundary conditions can be modified for the needs of particular applications, however.
Formally, Dirichlet conditions are enforced on the restricted solution space (“essential
conditions”). In NIMROD we simply substitute the Dirichlet boundary condition equations
in the linear system at the appropriate rows for the boundary nodes.
Neumann conditions are not set explicitly. The variational aspects enforce Neumann conditions - through the appearance (or non-appearance) of surface integrals - in the limit of infinite spatial resolution ([2] has an excellent description). These “natural” conditions preserve the spatial convergence rate without using ghost cells, as in finite difference or finite volume methods.
C Temporal Discretization
NIMROD uses finite difference methods to discretize time. The solution field is marched in
a sequence of time-steps from initial conditions.
C.1 Semi-implicit Aspects
Without going into an analysis, the semi-implicit method as applied to hyperbolic systems of
PDEs can be described as a leap-frog approach that uses a self-adjoint differential operator
to extend the range of numerical stability to arbitrarily large ∆t. It is the tool we use to
address the stiffness of the fluid model as applied to fusion plasmas. (See [4, 5].)
Though any self-adjoint operator can be used to achieve numerical stability, accuracy
at large ∆t is determined by how well the operator represents the physical behavior of the
system [4]. The operator used in NIMROD to stabilize the MHD advance is the linear
ideal-MHD force operator (no flow) plus a Laplacian with a relatively small coefficient [5].
Like the leap-frog method, the semi-implicit method does not introduce numerical dissi-
pation. Truncation errors are purely dispersive. This aspect makes semi-implicit methods
attractive for stiff, low dissipation systems like plasmas.
NIMROD also uses time-splitting. For example, when the Hall electric field is included
in a computation, B is first advanced with the MHD electric field, then with the Hall electric
field.
B_MHD = B^n − ∆t ∇×E_MHD

B^{n+1} = B_MHD − ∆t ∇×E_Hall
Though it precludes 2nd-order temporal convergence, this splitting reduces errors at large time-step [6]. A separate semi-implicit operator is then used in the Hall advance of B. At present, the operator implemented in NIMROD often gives inaccuracies at ∆t much larger than the explicit stability limit for a predictor/corrector advance of B due to E_Hall. Hall terms must be used with great care until further development is pursued.
Why not an implicit approach?
A true implicit method for nonlinear systems converges on nonlinear terms at advanced
times, requiring nonlinear system solves at each time step. This may lead to ‘the ultimate’
time-advance scheme. However, we have chosen to take advantage of the smallness of the
solution field with respect to steady parts → the dominant stiffness arises in linear terms.
Although the semi-implicit approach eliminates time-step restrictions from wave propa-
gation, the operators lead to linear matrix equations.
• Eigenvalues for the MHD operator are ∼ 1 + ω_k²∆t², where the ω_k are the frequencies of the MHD waves supported by the spatial discretization.
• We have run tokamak simulations accurately at max|ω_k|∆t ∼ 10^4 – 10^5, with min|ω_k| ≈ 0.
• This leads to very ill-conditioned matrices.
• Anisotropic terms use steady + (n = 0) fields → operators are diagonal in n.
C.2 Predictor-corrector Aspects
The semi-implicit approach does not ensure numerical stability for advection, even for the V_s·∇ terms. A predictor-corrector approach can be made numerically stable.
• Nonlinear and linear computations require two solves per equation.
• Synergistic wave-flow effects require having the semi-implicit operator in the predic-
tor step and centering wave-producing terms in the corrector step when there are no
dissipation terms [7]
• NIMROD does not have separate centering parameters for wave and advective terms,
so we set centerings to 1/2 + δ and rely on having a little dissipation.
• Numerical stability still requires ∆t ≤ C ∆x/V, where C is O(1) and depends on the centering. [See input.f in the nimset directory.]
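The role of the centering can be seen in a stripped-down setting. The sketch below (illustrative only; it is not NIMROD's advance) applies a predictor step and a corrector step with centering parameter 1/2 + δ to the model problem dy/dt = iωy, a single-mode stand-in for advection. With the centering at exactly 1/2 the amplitude grows slowly; the small bias δ supplies the bit of dissipation mentioned above, provided ω∆t is not too large.

program pc_centering
  implicit none
  complex, parameter :: ii = (0.0, 1.0)
  complex :: y, ystar, ynew
  real :: omega, dt, theta
  integer :: it

  omega = 1.0            ! advective frequency (model for V_s.grad)
  dt = 0.5               ! omega*dt = 0.5, under the explicit limit
  theta = 0.55           ! centering 1/2 + delta with delta = 0.05
  y = (1.0, 0.0)

  do it = 1, 100
    ystar = y + dt*ii*omega*y                              ! predictor step
    ynew  = y + dt*ii*omega*(theta*ystar + (1.0-theta)*y)  ! corrector step
    y = ynew
  end do
  ! with theta = 1/2 the amplitude grows slowly; theta = 1/2 + delta damps it
  print *, 'amplitude after 100 steps:', abs(y)
end program pc_centering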
C.3 Dissipative Terms
Dissipation terms are coded with time-centerings specified through input parameters. We
typically use fully forward centering.
C.4 ∇· B Constraint
Like many MHD codes capable of using nonuniform, nonorthogonal meshes, NIMROD does not keep ∇·B identically 0. Left uncontrolled, the error will grow until it crashes the simulation. We have used an error-diffusion approach [8] that adds the unphysical diffusive term κ_{∇·B} ∇∇·B to Faraday’s law,

∂B/∂t = −∇×E + κ_{∇·B} ∇∇·B .   (I.1)

Applying ∇· gives

∂(∇·B)/∂t = ∇· κ_{∇·B} ∇(∇·B) ,   (I.2)

so that the error is diffused if the boundary conditions maintain constant ∮ dS·B [9].
C.5 Complete time advance
A complete time-step in version 3.0.3 will solve coefficients of the basis functions used in the following time-discrete equations.

Velocity advance:

[ ρ^n − ∆t f_ν ∇·ρ^n ν∇ − (C_sm ∆t²/4) [B_0 × ∇×∇×(B_0 × I ·) − J_0 × ∇×(B_0 × I ·)]
  − (C_sp ∆t²/4) [∇ γp_0 ∇· + ∇[∇(p_0)·I ·]] − (C_sn ∆t²/4) ∇² ] (∆V)_pass
  = ∆t [ −ρ^n V*·∇V* + J^n×B^n − ∇p^n + ∇·ρ^n ν∇V^n + ∇·Π_neo ]

• For pass = predict, V* = V^n.
• For pass = correct, V* = V^n + f_v (∆V)_predict, and
• V^{n+1} = V^n + (∆V)_correct.

Density advance:

(∆n)_pass = −∆t [ V^{n+1}·∇n* + n* ∇·V^{n+1} ]

• pass = predict → n* = n^n
• pass = correct → n* = n^n + f_n (∆n)_predict
• n^{n+1} = n^n + (∆n)_correct

Note: the f_*'s are centering coefficients, and the C_s*'s are semi-implicit coefficients.

MHD advance of B:

[ 1 + ∇× ( m_e/ne² + ∆t f_η η/µ_0 ) ∇× − ∆t f_{∇·B} κ_{∇·B} ∇∇· ] (∆B)_pass
  = ∆t ∇× [ V^{n+1} × B* − (η/µ_0) ∇×B^n + (1/ne) ∇·Π_e^neo ] + ∆t κ_{∇·B} ∇∇·B^n

• pass = predict → B* = B^n
• pass = correct → B* = B^n + f_b (∆B)_predict
• B_MHD = B^n + (∆B)_correct

Hall advance of B:

[ 1 + ∇× ( m_e/ne² + C_sb (∆t)_b² |B_0|/ne ) ∇× − ∆t f_{∇·B} κ_{∇·B} ∇∇· ] (∆B)_pass
  = −(∆t)_b ∇× [ (1/ne)(J* × B* − ∇p_e^n) + (m_e/ne²) ∇·(J* V^{n+1} + V^{n+1} J*) ] + (∆t)_b κ_{∇·B} ∇∇·B_MHD

• pass = predict → (∆t)_b = ∆t/2, B* = B_MHD
• pass = correct → (∆t)_b = ∆t, B* = B_MHD + (∆B)_predict
• B^{n+1} = B_MHD + (∆B)_correct

Temperature advance:

[ n^{n+1}/(γ−1) − ∆t f_χ ∇· n^{n+1} χ_α ·∇ ] (∆T_α)_pass
  = −∆t [ (n^{n+1}/(γ−1)) V_α^{n+1}·∇T_α* + p_α^n ∇·V_α^{n+1} − ∇· n^{n+1} χ_α ·∇T_α^n ] + Q

• pass = predict → T_α* = T_α^n
• pass = correct → T_α* = T_α^n + f_T (∆T_α)_predict
• T_α^{n+1} = T_α^n + (∆T_α)_correct
• α = e, i
Notes on time-advance in 3.0.3
• Without the Hall time-split, approximate time-centered leap-frogging could be achieved by using (1/2)(n^n + n^{n+1}) in the temperature equation.
Figure I.4: schematic of leap-frog ({B, n, T} at integer time levels n, n+1; V at half-integer levels n−1/2, n+1/2)
• The different centering of the Hall time split is required for numerical stability (see Nebel notes).
• The ∂J/∂t electron inertia term necessarily appears in both time-splits for B when it is included in E.
• Fields used in semi-implicit operators are updated only after changing more than a set level, to reduce unnecessary computation of the matrix elements (which are costly).
D Basic functions of the NIMROD kernel
1. Startup
• Read input parameters
• Initialize parallel communication as needed
• Read grid, equilibrium, and initial fields from dump file
• Allocate storage for work arrays, matrices, etc.
2. Time Advance
(a) Preliminary Computations
• Revise ∆t to satisfy CFL if needed
• Check how much fields have changed since matrices were last updated. Set
flags if matrices need to be recomputed
• Update neoclassical coefficients, voltages, etc. as needed
(b) Advance each equation
• Allocate rhs and solution vector storage
• Extrapolate in time previous solutions to provide an initial guess for the linear
solve(s).
• Update matrix and preconditioner if needed
• Create rhs vector
• Solve linear system(s)
• Reinforce Dirichlet conditions
• Complete regularity conditions if needed
• Update stored solution
• Deallocate temporary storage
3. Output
• Write global, energy, and probe diagnostics to binary and/or text files at desired
frequency in time-step (nhist)
• Write complete dump files at desired frequency in time-step (ndump)
4. Check stop condition
• Problems such as lack of iterative solver convergence can stop simulations at any
point
Chapter II
FORTRAN Implementation
Preliminary remarks
As implied in the Framework section, the NIMROD kernel must complete a variety of
tasks each time step. Modularity has been essential in keeping the code manageable as the
physics model has grown in complexity.
FORTRAN 90 extensions are also an important ingredient. Arrays of data structures pro-
vide a clean way to store similar sets of data that have differing numbers of elements. FORTRAN
90 modules organize different types of data and subprograms. Array syntax leads to cleaner
code. Use of KIND when defining data settles platform-specific issues of precision. ([10] is
recommended.)
A Implementation map
All computationally intensive operations arise during either 1) the finite element calculations to find rhs vectors and matrices, or 2) the solution of the linear systems. Most of the operations
are repeated for each advance, and equation-specific operations occur at only two levels in
a hierarchy of subprograms for 1). The data and operation needs of subprograms at a level
in the hierarchy tend to be similar, so FORTRAN module grouping of subprograms on a level
helps orchestrate the hierarchy.
B Data storage and manipulation
B.1 Solution Fields
The data of a NIMROD simulation are the sets of basis function coefficients for each solution
field plus those for the grid and steady-state fields. This information is stored in FORTRAN
90 structures that are located in the fields module.
Implementation Map of Primary NIMROD Operations

• adv_* routines: manage an advance of one field
• FINITE_ELEMENT (matrix_create, get_rhs): organize f.e. computations
• BOUNDARY: essential conditions
• REGULARITY: regularity conditions
• RBLOCK (TBLOCK) (rblock_make_matrix, rblock_get_rhs): perform numerical integration in rblocks
• INTEGRANDS: perform most equation-specific point-wise computations
• FFT_MOD
• NEOCLASS_INT: perform computations for neoclassical-specific equations
• MATH_TRAN: do routine math operations
• SURFACE (surface_comp_rhs): perform numerical surface integrals
• SURFACE_INTS: perform surface-specific computations
• GENERIC_EVALS: interpolate or find storage for data at quadrature points
• ITER_3D_CG_MOD: perform 3D matrix solves
• ITER_CG_F90: perform real 2D matrix solves
• ITER_CG_COMP: perform complex 2D matrix solves
• EDGE (edge_network): carry out communication across block borders
• VECTOR_TYPE_MOD: perform operations on vector coefficients
• MATRIX_MOD: compute 2D matrix-vector products

NOTE: 1. All caps indicate a module name. 2. Names in parentheses are subroutine names or interfaces to module procedures.
• rb is a 1D pointer array of defined-type rblock_type. At the start of a simulation, it
is dimensioned 1:nrbl, where nrbl is the number of rblocks.
• The type rblock_type (analogy:integer), is defined in module rblock_type_mod. This
is a structure holding information such as the cell-dimensions of the block(mx,my), ar-
rays of basis function information at Gaussian quadrature nodes (alpha,dalpdr,dalpdz),
numerical integration information(wg,xg,yg), and two sets of structures for each solu-
tion field:
1. A lagr_quad_type structure holds the coefficients and information for evaluating
the field at any (ξ, η).
2. An rb_comp_qp_type, just a pointer array, allocated to save the interpolates of
the solution spaces at the numerical quadrature points.
• tb is a 1D pointer array, similar to rb but of type tblock_type_mod, dimensioned
nrbl+1:nbl so that tblocks have numbers distinct from the rblocks.
Before delving into the element-specific structures, let’s backup to the fields module
level. Here, there are also 1D pointer arrays of cvector_type and vector_type for each
solution and steady-state field, respectively. These structures are used to give block-type-
independent access to coefficients. They are allocated 1:nbl. These structures hold pointer
arrays that are used as pointers (as opposed to using them as ALLOCATABLE arrays in struc-
tures). There are pointer assignments from these arrays to the lagr_quad_type structure
arrays that have allocated storage space. (See [10])
In general, the element-specific structures(lagr_quad_type, here) are used in conjunction
with the basis functions (as occurs in getting field values at quadrature point locations). The
element-independent structure(vector_types) are used in linear algebra operations with the
coefficients.
The lagr_quad_types are defined in the lagr_quad_mod module, along with the in-
terpolation routines that use them. [The tri_linear_types are the analogous structures
for triangular elements in the tri_linear module.] The lagr_quad_2D_type is used for
φ-independent fields.
The structures are self-describing, containing array dimensions and character variables
for names. There are also some arrays used for interpolation. The main storage locations
are the fs, fsh, fsv, and fsi arrays.
• fs(iv,ix,iy,im) → coefficients for grid-vertex nodes.
– iv = direction vector component (1:nqty)
– ix = horizontal logical coordinate index (0:mx)
– iy = vertical logical coordinate index (0:my)
– im = Fourier component (“mode”) index (1:nfour)
• fsh(iv,ib,ix,iy,im) → coefficients for horizontal side nodes.
– iv = direction vector component (*:*)
– ib = basis index (1:n_side)
– ix = horizontal logical coordinate index (1:mx)
– iy = vertical logical coordinate index (0:my)
– im = Fourier component (“mode”) index (1:nfour)
• fsv(iv,ib,ix,iy,im) → coefficients for vertical side nodes.
• fsi(iv,ib,ix,iy,im) → coefficients for internal nodes [ib limit is (1:n_int)]
Figure II.1: Basic Element Example. (n_side=2, n_int=4) The figure shows the locations of the fs (grid-vertex), fsh (horizontal-side), fsv (vertical-side), and fsi (interior) coefficients within one element.
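As a rough picture of this layout, the fragment below sketches a simplified complex lagr_quad-like structure (hypothetical names and kinds; it is not NIMROD's actual declaration, which carries additional components such as name strings and interpolation work arrays).

module simple_lagr_quad
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)

  ! A stripped-down analogue of a complex lagr_quad_type: coefficient
  ! arrays for grid-vertex, horizontal-side, vertical-side, and interior
  ! nodes of an mx-by-my block, for nqty components and nfour Fourier modes.
  type :: simple_lagr_quad_type
    integer :: mx, my, nqty, nfour, n_side, n_int
    character(6) :: name
    complex(r8), pointer :: fs(:,:,:,:)    ! (nqty,0:mx,0:my,nfour)
    complex(r8), pointer :: fsh(:,:,:,:,:) ! (nqty,n_side,1:mx,0:my,nfour)
    complex(r8), pointer :: fsv(:,:,:,:,:) ! (nqty,n_side,0:mx,1:my,nfour)
    complex(r8), pointer :: fsi(:,:,:,:,:) ! (nqty,n_int,1:mx,1:my,nfour)
  end type simple_lagr_quad_type

contains

  subroutine simple_lagr_quad_alloc(lq, mx, my, nqty, nfour, pd)
    type(simple_lagr_quad_type), intent(out) :: lq
    integer, intent(in) :: mx, my, nqty, nfour, pd
    lq%mx = mx;  lq%my = my;  lq%nqty = nqty;  lq%nfour = nfour
    lq%n_side = pd - 1                 ! bases per element side
    lq%n_int = (pd - 1)**2             ! bases interior to each element
    allocate(lq%fs(nqty,0:mx,0:my,nfour))
    allocate(lq%fsh(nqty,lq%n_side,mx,0:my,nfour))
    allocate(lq%fsv(nqty,lq%n_side,0:mx,my,nfour))
    allocate(lq%fsi(nqty,lq%n_int,mx,my,nfour))
  end subroutine simple_lagr_quad_alloc

end module simple_lagr_quad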
B.2 Interpolation
The lagr_quad_mod module also contains subroutines to carry out the interpolation of fields
to arbitrary positions. This is a critical aspect of the finite element algorithm, since interpo-
lations are needed to carry out the numerical integrations.
For both real and complex data structures, there are single point evaluation routines
(*_eval) and routines for finding the field at the same offset in every cell of a block
(*_all_eval routines). All routines start by finding α_i|_{(ξ_0,η_0)} [and ∂α_i/∂ξ|_{(ξ_0,η_0)} and ∂α_i/∂η|_{(ξ_0,η_0)} if requested] for the (element-independent) basis function values at the requested offset (ξ_0, η_0). The lagr_quad_bases routine is used for this operation. [It’s also used at startup to find the basis function information at quadrature point locations, stored in the rb%alpha, dalpdr, dalpdz arrays for use as test functions in the finite element computations.] The interpolation routines then multiply basis function values with the respective coefficients.
The data structure, logical positions for the interpolation point, and desired derivative
order are passed into the single-point interpolation routine. Results for the interpolation are
returned in the structure arrays f, fx, and fy.
The all_eval routines need logical offsets (from the lower-left corner of each cell) instead
of logical coordinates, and they need arrays for the returned values and logical derivatives in
each cell.
See routine struct_set_lay in diagnose.f for an example of a single point interpolation
call. See generic_3D_all_eval for an example of an all_eval call.
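The essence of a single-point evaluation is a sum of basis values times coefficients. Below is a bare-bones 2D sketch for bilinear (poly_degree=1) grid-vertex data only (a hypothetical routine; the real *_eval routines also handle side and interior bases, derivatives, complex data, and tblocks).

! Evaluate a bilinear, vertex-based field at logical position (x,y),
! where ix = int(x), iy = int(y) select the cell and (dx,dy) are the
! offsets within it.  f has the shape (nqty,0:mx,0:my); 0 <= x <= mx
! and 0 <= y <= my are assumed.
subroutine bilinear_eval(f, nqty, mx, my, x, y, fval)
  implicit none
  integer, intent(in) :: nqty, mx, my
  real, intent(in) :: f(nqty,0:mx,0:my), x, y
  real, intent(out) :: fval(nqty)
  integer :: ix, iy
  real :: dx, dy

  ix = min(int(x), mx-1)        ! cell indices, clipped at the block edge
  iy = min(int(y), my-1)
  dx = x - ix                   ! offsets in the cell, 0 <= dx,dy <= 1
  dy = y - iy

  fval = (1.0-dx)*(1.0-dy)*f(:,ix  ,iy  ) + dx*(1.0-dy)*f(:,ix+1,iy  ) &
       + (1.0-dx)*dy*f(:,ix  ,iy+1) + dx*dy*f(:,ix+1,iy+1)
end subroutine bilinear_eval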
Interpolation operations are also needed in tblocks. The computations are analogous
but somewhat more complicated due to the unstructured nature of these blocks. I won’t
cover tblocks in any detail, since they are due for a substantial upgrade to allow arbitrary
polynomial degree basis functions, like rblocks. However, development for released versions
must be suitable for simulations with tblocks.
B.3 Vector_types
We often perform linear algebra operations on vectors of coefficients. [Management routines,
iterative solves, etc.] Vector_type structures simplify these operations, putting block and
basis-specific array idiosyncrasies into module routines. The developer writes one call to a
subroutine interface in a loop over all blocks.
Consider an example from the adv_b_iso management routine. There is a loop over
Fourier components that calls two real 2D matrix solves per component. One solve is for Re{B_R}, Re{B_Z}, and −Im{B_φ} for the F-comp, and the other is for Im{B_R}, Im{B_Z}, and Re{B_φ}. After a solve in a predictor step (b_pass=bmhd pre or bhall pre), we need
to create a linear combination of the old field and the predicted solution. At this point,
• vectr (1:nbl) → work space, 2D real vector_type
• sln (1:nbl) → holds linear system solution, also a 2D vector_type
• be (1:nbl) → pointers to B
n
coefficients, 3D (complex) vector_type
• work1 (1:nbl) →pointers to storage that will be B

in corrector step, 3D vector_type
What we want is B

imode
= f
b
B
pre
imode
+ (1 − f
b
)B
n
imode
. So we have
• CALL vector_assign_cvec(vectr(ibl), be(ibl), flag, imode)
– Transfers the flag-indicated components, such as r12mi3 for Re{B_R}, Re{B_Z}, and −Im{B_φ}, for imode to the 2D vector.
• CALL vector_add(sln(ibl), vectr(ibl), v1fac=‘center’, v2fac=‘1-center’)
– Adds ‘1-center’*vectr to ‘center’*sln and saves results in sln.
• CALL cvector_assign_vec(work1(ibl), sln(ibl), flag, imode)
– Transfer linear combination to appropriate vector components and Fourier com-
ponent of work1.
Examining vector_type_mod at the end of the vector_type_mode.f file, vector_add is defined as an interface label to three subroutines that perform the linear combination for each vector_type. Here the interface resolves to vector_add_vec, as determined by the type definitions
of the passed arrays. This subroutine multiplies each coefficient array and sums them.
Note that we did not have to do anything different for tblocks, and coding for coefficients
in different arrays is removed from the calling subroutine.
Going back to the end of the vector_type_mode.f file, note that ‘=’ is overloaded, so we
can write statements like
work1(ibl) = be(ibl) if arrays conform
be(ibl) = 0
which perform the assignment operations component array to component array or scalar to
each component array element.
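The FORTRAN 90 machinery behind such calls is a generic interface plus overloading of assignment. The fragment below is a simplified sketch of the idea only (the real vector_type_mod interfaces many more routines and vector types, and its argument conventions may differ).

module vector_sketch
  implicit none

  type :: rvector_type
    real, pointer :: arr(:,:,:)       ! grid-vertex coefficients only, here
  end type rvector_type

  interface vector_add                ! generic name resolved by argument types
    module procedure vector_add_vec
  end interface

  interface assignment(=)             ! lets us write  vec = 0.
    module procedure vector_assign_real
  end interface

contains

  ! v1 <- v1fac*v1 + v2fac*v2, defaulting to a simple sum
  subroutine vector_add_vec(v1, v2, v1fac, v2fac)
    type(rvector_type), intent(inout) :: v1
    type(rvector_type), intent(in) :: v2
    real, intent(in), optional :: v1fac, v2fac
    real :: f1, f2
    f1 = 1.0;  if (present(v1fac)) f1 = v1fac
    f2 = 1.0;  if (present(v2fac)) f2 = v2fac
    v1%arr = f1*v1%arr + f2*v2%arr
  end subroutine vector_add_vec

  subroutine vector_assign_real(v1, scalar)
    type(rvector_type), intent(inout) :: v1
    real, intent(in) :: scalar
    v1%arr = scalar
  end subroutine vector_assign_real

end module vector_sketch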
B.4 Matrix_types
The matrix_type_mod module has definitions for structures used to store matrix elements
and for structures that store data for preconditioning iterative solves. The structures are
somewhat complicated in that there are levels containing pointer arrays of other structures.
The necessity of such complication arises from varying array sizes associated with different
blocks and basis types.
At the bottom of the structure, we have arrays for matrix elements that will be used
in matrix-vector product routines. Thus, array ordering has been chosen to optimize these
product routines which are called during every iteration of a conjugate gradient solve. Fur-
thermore, each lowest-level array contains all matrix elements that are multiplied with a
grid-vertex, horizontal-side, vertical-side, or cell-interior vector type array (arr, arrh, arrv,
and arri respectively) i.e. all coefficients for a single“basis-type”within a block. The matri-
ces are 2D, so there are no Fourier component indices at this level, and our basis functions do
not produce matrix couplings from the interior of one grid block to the interior of another.
The 6D matrix is unimaginatively named arr. Referencing one element appears as
arr(jq,jxoff,jyoff,iq,ix,iy)
where
iq = ‘quantity’ row index
ix = horizontal-coordinate row index
iy = vertical-coordinate row index
jq = ‘quantity’ column index
jxoff = horizontal-coordinate column offset
jyoff = vertical-coordinate column offset
For grid-vertex to grid-vertex connections, 0 ≤ ix ≤ mx and 0 ≤ iy ≤ my, and iq and jq correspond to the vector index dimension of the vector_type (simply 1:3 for A_R, A_Z, A_φ, or just 1 for a scalar). The structuring of rblocks permits offset storage for the remaining
indices, giving dense storage for a sparse matrix without indirect addressing. The column
coordinates are simply
jx = ix + jxoff
jy = iy + jyoff
For each dimension, vertex to vertex offsets are −1 ≤ j ∗ off ≤ 1, with storage zeroed
out where the offset extends beyond the block border. The offset range is set by the extent
of the corresponding basis functions.
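A sketch of how the offset storage turns the sparse matrix-vector product into simple dense loops, for the scalar, grid-vertex-to-grid-vertex case only (a hypothetical routine; the matvec routines in matrix_mod handle all basis types, vector components, and complex data).

! y = A*x for a scalar field on an mx-by-my rblock using offset storage:
! arr(jxoff,jyoff,ix,iy) multiplies x(ix+jxoff,iy+jyoff).  Offsets that
! would reach beyond the block border are skipped here, equivalent to
! the zeroed storage described above.
subroutine matvec_offset(arr, x, y, mx, my)
  implicit none
  integer, intent(in) :: mx, my
  real, intent(in) :: arr(-1:1,-1:1,0:mx,0:my), x(0:mx,0:my)
  real, intent(out) :: y(0:mx,0:my)
  integer :: ix, iy, jxoff, jyoff

  y = 0.0
  do iy = 0, my
    do ix = 0, mx
      do jyoff = max(-1,-iy), min(1,my-iy)
        do jxoff = max(-1,-ix), min(1,mx-ix)
          y(ix,iy) = y(ix,iy) + arr(jxoff,jyoff,ix,iy)*x(ix+jxoff,iy+jyoff)
        end do
      end do
    end do
  end do
end subroutine matvec_offset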
Figure II.2: 1-D cubic: extent of α_i in red and α_j in green; integration gives joff = −1
Offsets are more limited for basis functions that have their nonzero node in a cell interior.
Integration here gives joff=-1 matrix elements, since cells are numbered from 1, while
vertices are numbered from 0. Thus for grid-vertex-to-cell connections −1 ≤ jxoff ≤
0, while for cell-to-grid-vertex connections 0 ≤ jxoff ≤ 1. Finally, j*off=0 for cell-to-
cell. The ‘outer-product’ nature implies unique offset ranges for grid-vertex, horizontal-side,
vertical-side, and cell-interior centered coefficients.
Though offset requirements are the same, we also need to distinguish different bases of
the same type, such as the two sketched here.
Figure II.3: 1-D cubic: extent of α_i in red and β_j in green; note j is an internal node
Figure II.4: 1-D cubic: extent of β_i in red and β_j in green; note both i, j are internal nodes
For lagr_quad_type storage and vector_type pointers, this is accomplished with an
extra array dimension for side and interior centered data. For matrix-vector multiplication
it is convenient and more efficient to collapse the vector and basis indices into a single index
with the vector index running faster.
Examples:

• Bi-cubic grid vertex to vertical side for P: jq = 1, 1 ≤ iq ≤ 2
• Bi-quartic cell interior to horizontal side for B: 1 ≤ jq ≤ 27, 1 ≤ iq ≤ 9
Storing separate arrays for matrix elements connecting each of the basis types there-
fore lets us have contiguous storage for nonzero elements despite different offset and basis
dimensions.
Instead of giving the arrays 16 different names, we place arr in a structure contained by
a 4 × 4 array mat. One of the matrix element arrays is then referenced as
mat(jb,ib)%arr
where
ib=basis-type row index
jb=basis-type column index
and 1 ≤ ib ≤ 4, 1 ≤ jb ≤ 4. The numeral coding is
1≡grid-vertex type
2≡horizontal-side type
3≡vertical-side type
4≡cell-interior type
Bilinear elements are an exception. There are only grid-vertex bases, so mat is a 1 × 1
array.
This level of the matrix structure also contains self-descriptions, such as nbtype=1 or
4; starting horizontal and vertical coordinates (=0 or 1) for each of the different types;
nb_type(1:4) is equal to the number of bases for each type (poly_degree−1 for types 2-3 and (poly_degree−1)² for type 4); and nq_type is equal to the quantity-index dimension for each type.
All of this is placed in a structure, rbl_mat, so that we can have an array of rbl_mat’s with one array component per grid block. A similar tbl_mat structure is also set at this level, along with self-description.
Collecting all block_mats in the global_matrix_type allows one to define an array of
these structures with one array component for each Fourier component.
This is a 1D array since the only stored matrices are diagonal in the Fourier component
index, i.e. 2D.
matrix_type_mod also contains definitions of structures used to hold data for the pre-
conditioning step of the iterative solves. NIMROD has several options, and each has its own
storage needs.
So far, only matrix_type definitions have been discussed. The storage is located in the
matrix_storage_mod module. For most cases, each equation has its own matrix storage
that holds matrix elements from time-step to time-step with updates as needed. Anisotropic
operators require complex arrays, while isotropic operators use the same real matrix for two
sets of real and imaginary components, as in the B advance example in II.B.3.
The matrix_vector product subroutines are available in the matrix_mod module. Inter-
face matvec allows one to use the same name for different data types. The large number of
low-level routines called by the interfaced matvecs arose from optimization. The operations
were initially written into one subroutine with adjustable offsets and starting indices. What’s
there now is ugly but faster due to having fixed index limits for each basis type resulting in
better compiler optimization.
Note that tblock matrix_vector routines are also kept in matrix_mod. They will undergo
major changes when tblock basis functions are generalized.
The matelim routines serve a related purpose - reducing the number of unknowns in a
linear system prior to calling an iterative solve. This amounts to a direct application of
matrix partitioning. For the linear system equation,

[ A_11  A_12 ] [ x_1 ]   [ b_1 ]
[ A_21  A_22 ] [ x_2 ] = [ b_2 ]        (II.1)
where the A_ij are sub-matrices, the x_j and b_i are the parts of the vectors, and A and A_22 are invertible,

x_2 = A_22^{−1} (b_2 − A_21 x_1) ,        (II.2)

and

( A_11 − A_12 A_22^{−1} A_21 ) x_1 = b_1 − A_12 A_22^{−1} b_2        (II.3)

(the operator in parentheses is the Schur complement)
gives a reduced system. If A_11 is sparse but A_11 − A_12 A_22^{−1} A_21 is not, partitioning may slow the computation. However, if A_11 − A_12 A_22^{−1} A_21 has the same structure as A_11, it can help. For cell-interior to cell-interior sub-matrices, this is true due to the single-cell extent of interior basis functions. Elimination is therefore used for 2D linear systems when poly_degree > 1.
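A tiny numerical illustration of the partitioning on generic dense blocks (not NIMROD's matelim coding): form the Schur complement, solve the reduced system for x_1, and back-substitute for x_2.

program schur_elim
  implicit none
  real :: a11(2,2), a12(2,2), a21(2,2), a22(2,2)
  real :: b1(2), b2(2), x1(2), x2(2), s(2,2)

  ! an arbitrary, well-conditioned partitioned system [A11 A12; A21 A22] x = b
  a11 = reshape((/4.0, 1.0, 1.0, 3.0/), (/2,2/))
  a12 = reshape((/1.0, 0.0, 0.0, 1.0/), (/2,2/))
  a21 = reshape((/0.0, 1.0, 1.0, 0.0/), (/2,2/))
  a22 = reshape((/5.0, 2.0, 2.0, 6.0/), (/2,2/))
  b1 = (/1.0, 2.0/)
  b2 = (/3.0, 4.0/)

  ! reduced system:  (A11 - A12 A22^-1 A21) x1 = b1 - A12 A22^-1 b2
  s = a11 - matmul(a12, matmul(inv2(a22), a21))
  x1 = matmul(inv2(s), b1 - matmul(a12, matmul(inv2(a22), b2)))
  ! back-substitute for the eliminated unknowns:  x2 = A22^-1 (b2 - A21 x1)
  x2 = matmul(inv2(a22), b2 - matmul(a21, x1))

  print *, 'x1 =', x1
  print *, 'x2 =', x2
  ! check: residuals of the original 4x4 system should be near zero
  print *, 'residuals:', matmul(a11,x1)+matmul(a12,x2)-b1, &
                         matmul(a21,x1)+matmul(a22,x2)-b2

contains

  function inv2(a) result(ainv)      ! explicit inverse of a 2x2 matrix
    real, intent(in) :: a(2,2)
    real :: ainv(2,2), det
    det = a(1,1)*a(2,2) - a(1,2)*a(2,1)
    ainv(1,1) =  a(2,2)/det
    ainv(2,2) =  a(1,1)/det
    ainv(1,2) = -a(1,2)/det
    ainv(2,1) = -a(2,1)/det
  end function inv2

end program schur_elim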
Additional notes regarding matrix storage: the stored coefficients are not summed
across block borders. Instead, vectors are summed, so that Ax is found through operations
separated by block, followed by a communication step to sum the product vector across block
borders.
B.5 Data Partitioning
Data partitioning is part of a philosophy of how NIMROD has been - and should continue to
be - developed. NIMROD’s kernel in isolation is a complicated code, due to the objectives
of the project and chosen algorithm. Development isn’t trivial, but it would be far worse
without modularity.
NIMROD’s modularity follows from the separation of tasks described by the implemen-
tation map. However, data partitioning is the linchpin in the scheme. Early versions of
the code had all data storage together. [see the discussion of version 2.1.8 changes in the
kernel’s README file.] While this made it easy to get some parts of the code working, it
encouraged repeated coding with slight modifications instead of taking the time to develop
general purpose tools for each task. Eventually it became too cumbersome, and the changes
made to achieve modularity at 2.1.8 were extensive.
Despite many changes since, the modularity revision has stood the test of time. Changing
terms or adding equations requires relatively minimal amount of coding. But, even major
changes, like generalizing finite element basis functions, are tractable. The scheme will need
to evolve as new strides are taken. [Particle closures come to mind.] However, thought and
planning for modularity and data partitioning reward in the long term.
NOTE
• These comments relate to coding for release. If you are only concerned about one
application, do what is necessary to get a result
• However, in many cases, a little extra effort leads to a lot more usage of development
work.
B.6 Seams
The seam structure holds information needed at the borders of grid blocks, including the
connectivity of adjacent blocks and geometric data and flags for setting boundary conditions.
The format is independent of block type to avoid block-specific coding at the step where
communication occurs.
The seam_storage_mod module holds the seam array, a list of blocks that touch the
boundary of the domain (exblock_list), and seam0. The seam0 structure is of edge_type,
like the seam array itself. seam0 was intended as a functionally equivalent seam for all space
beyond the domain, but the approach hampered parallelization. Instead, it is only used
during initialization to determine which vertices and cell sides lie along the boundary. We
still require nimset to create seam0.
As defined in the edge_type_mod module, an edge_type holds vertex and segment struc-
tures. Separation of the border data into a vertex structure and segment structure arises
from the different connectivity requirements; only two blocks can share a cell side, while
many blocks can share a vertex. The three logical arrays: expoint, excorner, and r0point,
are flags to indicate whether a boundary condition should be applied. Additional arrays
could be added to indicate where different boundary conditions should be applied.
Aside: For flux injection boundary conditions, we have added the logical array applV0(1:2,:).
This logical is set at nimrod start up and is used to determine where to apply injection cur-
rent and the appropriate boundary condition. The first component applies to the vertex and
the second component to the associated segment.
The seam array is dimensioned from one to the number of blocks. For each block the
vertex and segment arrays have the same size, the number of border vertices equals the
Figure II.5: iv=vertex index, is=segment index, nvert=count of segments and vertices
number of border segments, but the starting locations are offset (see Figure II.5). For
rblocks, the ordering is unique.
Both indices proceed counter clockwise around the block. For tblocks, there is no re-
quirement that the scan start at a particular internal vertex number, but the seam indexing
always progresses counter clockwise around its block. (entire domain for seam0)
Within the vertex_type structure, the ptr and ptr2 arrays hold the connectivity infor-
mation. The ptr array is in the original format with connections to seam0. It gets defined
in nimset and is saved in dump files. The ptr2 array is a duplicate of ptr, but references
to seam0 are removed. It is defined during the startup phase of a nimrod simulation and
is not saved to dump files. While ptr2 is the one used during simulations, ptr is the one
that must be defined during pre-processing. Both arrays have two dimensions. The second
index selects a particular connection, and the first index is either: 1 to select (global) block
numbers (running from 1 to nbl_total, not 1 to nbl, see information on parallelization), or
2 to select seam vertex number. (see Figure II.6)
Connections for the one vertex common to all blocks are stored as:
Figure II.6: 3 blocks with corresponding vertices (the vertex shared by all three blocks is iv=5 in block 1, iv=2 in block 2, and iv=11 in block 3)
• seam(1)%vertex(5)
– seam(1)%vertex(5)% ptr(1,1) = 3
– seam(1)%vertex(5)% ptr(2,1) = 11
– seam(1)%vertex(5)% ptr(1,2) = 2
– seam(1)%vertex(5)% ptr(2,2) = 2
• seam(2)%vertex(2)
– seam(2)%vertex(2)%ptr(1,1) = 1
– seam(2)%vertex(2)%ptr(2,1) = 5
– seam(2)%vertex(2)%ptr(1,2) = 3
– seam(2)%vertex(2)%ptr(2,2) = 11
• seam(3)%vertex(11)
– seam(3)%vertex(11)%ptr(1,1) = 2
– seam(3)%vertex(11)%ptr(2,1) = 2
– seam(3)%vertex(11)%ptr(1,2) = 1
– seam(3)%vertex(11)%ptr(2,2) = 5
Notes on ptr:
• It does not hold a self-reference, e.g. there are no references like:
– seam(3)%vertex(11)%ptr(1,3) = 3
– seam(3)%vertex(11)%ptr(2,3) = 11
• The order of the connections is arbitrary
The vertex_type stores interior vertex labels for its block in the intxy array. In rblocks
vertex(iv)%intxy(1:2) saves (ix,iy) for seam index iv. In tblocks, there is only one
block-interior vertex label. intxy(1) holds its value, and intxy(2) is set to 0.
Aside: Though discussion of tblocks has been avoided due to expected revisions, the vec-
tor_type section should have discussed their index ranges for data storage. Triangle vertices
are numbered from 0 to mvert, analogous to the logical coordinates in rblocks. The list is
1D in tblocks, but an extra array dimension is carried so that vector_type pointers can be
used. For vertex information, the extra index follows the rblock vertex index and has the
ranges 0:0. The extra dimension for cell information has the range 1:1, also consistent with
rblock cell numbering.
The various %seam_* arrays in vertex_type are temporary storage locations for the data
that is communicated across block boundaries. The %order array holds a unique prescription
(determined during NIMROD startup) of the summation order for the different contributions
at a border vertex, ensuring identical results for the different blocks. The %ave_factor and
%ave_factor_pre arrays hold weights that are used during the iterative solves.
For vertices along a domain boundary, tang and norm are allocated (1:2) to hold the R and
Z components of unit vectors in the boundary tangent and normal directions, respectively,
with respect to the poloidal plane. These unit vectors are used for setting Dirichlet boundary
conditions. The scalar rgeom is set to R for all vertices in the seam. It is used to find vertices
located at R = 0, where regularity conditions are applied.
The %segment array (belonging to edge_type and of type segment_type) holds similar
information; however, segment communication differs from vertex communication in three ways.
• First, as mentioned above, only two blocks can share a cell side.
• Second, segments are used to communicate off-diagonal matrix elements that extend
along the cell side.
• Third, there may be no element_side_centered data (poly degree = 1), or there
may be many nodes along a given element side.
The 1D ptr array reflects the limited connectivity. Matrix element communication makes use of the intxyp and intxyn internal vertex indices for the previous and next vertices along the seam (intxys holds the internal cell indices). Finally, the tang and norm arrays have an extra dimension corresponding to the different element side nodes if poly_degree > 1.
Before describing the routines that handle block border communication, it is worth noting
what is written to the dump files for each seam. The subroutine dump_write_seam in the
dump module shows that very little is written. Only the ptr, intxy, and excorner arrays
plus descriptor information are saved for the vertices, but seam0 is also written. None of
the segment information is saved. Thus, NIMROD creates much of the vertex and all of
the segment information at startup. This ensures self-consistency among the different seam
structures and with the grid.
The routines called to perform block border communication are located in the edge
module. Central among these routines is edge_network, which invokes serial or parallel op-
erations to accumulate vertex and segment data. The edge_load and edge_unload routines
are used to transfer block vertex-type data to or from the seam storage. There are three sets
of routines for the three different vector types.
A typical example occurs at the end of the get_rhs routine in finite_element_mod.
The code segment starts with
DO ibl=1,nbl
CALL edge_load_carr(rhsdum(ibl),nqty,1_i4,nfour,nside,seam(ibl))
ENDDO
which loads vector component indices 1:nqty (starting location assumed) and Fourier com-
ponent indices 1:nfour (starting location set by the third parameter in the CALL statement)
for the vertex nodes and nside cell-side nodes from the cvector_type rhsdum to seam_cin
arrays in seam. The single ibl loop includes both block types.
The next step is to perform the communication.
CALL edge_network(nqty,nfour,nside,.false.)
There are a few subtleties in the passed parameters. If the operation were for a real
(necessarily 2D) vector_type, nfour would need to be 0 (0_i4 in the statement to match the
dummy argument's kind). If the operation were for a cvector_2D_type, as occurs in
iter_cg_comp, nfour would need to be 1 (1_i4). Finally, the fourth argument is a flag to
perform the load and unload steps within edge_network. If true, it communicates the crhs
and rhs vector types in the computation_pointers module, which is something of an archaic
remnant of the pre-data-partitioning days.
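To make the parameter cases concrete, the calls below sketch how the second argument changes with the type of data being seamed. The surrounding variable names are only illustrative, and the argument list is assumed to match the example call above.

! complex 3D data (cvector_type), as in the get_rhs example above
CALL edge_network(nqty,nfour,nside,.false.)
! real 2D data (vector_type): no Fourier dimension, so pass 0_i4
CALL edge_network(nqty,0_i4,nside,.false.)
! cvector_2D_type data, as in iter_cg_comp: a single Fourier index, so pass 1_i4
CALL edge_network(nqty,1_i4,nside,.false.)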
The final step is to unload the border-summed data.
DO ibl=1,nbl
CALL edge_unload_carr(rhsdum(ibl),nqty,1_i4,nfour,nside,seam(ibl))
ENDDO
The pardata module also has structures for the seams that are used for parallel communi-
cation only. They are accessed indirectly through edge_network and do not appear in the
finite element or linear algebra coding.
C Finite Element Computations
Creating a linear system of equations for the advance of a solution field is a nontrivial task
with 2D finite elements. The FORTRAN modules listed on the left side of the implementation
map (Sec. A) have been developed to minimize the programming burden associated with
adding new terms and equations. In fact, normal physics capability development occurs
at only two levels. Near the bottom of the map, the integrand modules hold coding that
determines what appears in the volume integrals, like those on pg. 11 for Faraday's law. In
many instances, adding terms to existing equations requires modifications at this level only.
The other level subject to frequent development is the management level at the top of the
map. The management routines control which integrand routines are called for setting up
an advance. They also need to call the iterative solver that is appropriate for a particular
linear system. Adding a new equation to the physics model usually requires adding a new
management routine as well as new integrand routines. This section of the tutorial is intended
to provide enough information to guide such development.
C.1 Management Routines
The adv_* routines in file nimrod.f manage the advance of a physical field. They are called
once per advance in the case of linear computations without flow, or they are called twice to
predict then correct the effects of advection. A character pass-flag informs the management
routine which step to take.
The sequence of operations performed by a management routine is outlined in Sec. D.2b.
This section describes more of the details needed to create a management routine.
To use the most efficient linear solver for each equation, there are now three different types
of management routines. The simplest matrices are diagonal in Fourier component and have
decoupling between various real and imaginary vector components. These advances require
two matrix solves (using the same matrix) for each Fourier component. The adv_v_clap
routine is an example; the matrix is the sum of a mass matrix and the discretized Laplacian.
The second type is diagonal in Fourier component, but all real and imaginary vector components
must be solved simultaneously, usually due to anisotropy. The full semi-implicit operator
has such anisotropy, hence adv_v_aniso is this type of management routine. The third type
of management routine deals with coupling among Fourier components, i.e. 3D matrices.
Full use of an evolving number density field requires a 3D matrix solve for velocity, and this
advance is managed by adv_v_3d.
All these management routine types are similar with respect to creating a set of 2D ma-
trices (used only for preconditioning in the 3D matrix systems) and with respect to finding
the rhs vector. The call to matrix_create passes an array of global_matrix_type and an
array of matrix_factor_type, so that operators for each Fourier component can be saved.
Subroutine names for the integrand and essential boundary conditions are the next argu-
ments, followed by a b.c. flag and the 'pass' character variable. If elimination of cell-interior
data is appropriate (see Sec. B.4), matrix elements will be modified within matrix_create,
and interior storage [rbl_mat(ibl)%mat(4,4)%arr] is then used to hold the inverse of the
interior submatrix ($A_{22}^{-1}$).
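For reference, the interior elimination referred to here is standard static condensation. Writing the block system with grid/side coefficients $x_1$ and cell-interior coefficients $x_2$,
\[
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} =
\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}
\;\Longrightarrow\;
\left(A_{11} - A_{12}A_{22}^{-1}A_{21}\right)x_1 = b_1 - A_{12}A_{22}^{-1}b_2 ,
\]
so storing $A_{22}^{-1}$ is what allows the interior data to be recovered after the border solve.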
The get_rhs calls are similar, except the parameter list is (integrand routine name,
cvector_type array, essential b.c. routine name, b.c. flags (2), logical switch for using
a surface integral, surface integrand name, global_matrix_type). The last argument is
optional, and its presence indicates that interior elimination is used. [Use rmat_elim= or
cmat_elim= to signify real or complex matrix types.]
Once the matrix and rhs are formed, there are vector and matrix operations to find the
product of the matrix and the old solution field, and the result is added to the rhs vector
before calling the iterative solve. This step allows us to write integrand routines for the
change of a solution field rather than its new value, but then solve for the new field so that
the relative tolerance of the iterative solver is not applied to the change, since $|\Delta x| \ll |x|$ is
often true.
Summarizing:
• Integrands are written for A and b with $A\,\Delta x = b$ to avoid coding semi-implicit terms
twice.
• Before the iterative solve, add $A\,x^{\mathrm{old}}$ to b.
• The iterative method then solves $A\,x^{\mathrm{new}} = b + A\,x^{\mathrm{old}}$ for the new field.
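The equivalence used in the last two bullets is a one-line consequence of the change of variables:
\[
A\,\Delta x = b, \qquad x^{\mathrm{new}} = x^{\mathrm{old}} + \Delta x
\;\;\Longrightarrow\;\;
A\,x^{\mathrm{new}} = b + A\,x^{\mathrm{old}},
\]
so the solver's relative tolerance is applied to $x^{\mathrm{new}}$ rather than to the small change $\Delta x$.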
The 3D management routines are similar to the 2D management routines. They differ in
the following way:
• Cell-interior elimination is not used.
• Loops over Fourier components are within the iterative solve.
• The iterative solve uses a 'matrix-free' approach, where the full 3D matrix is never
formed. Instead, an rhs integrand name is passed.
See Sec. D for more information on the iterative solves.
After an iterative solve is complete, there are some vector operations, including the com-
pletion of regularity conditions and an enforcement of Dirichlet boundary conditions. The
latter prevents an accumulation of error from the iterative solver, but it is probably super-
fluous. Data is saved (with time-centering in predictor steps) as basis function coefficients
and at quadrature points after interpolating through the *block_qp_update calls.
C.2 Finite_element Module
Routines in the finite_element module also serve managerial functions, but the operations
are those repeated for every field advance, allowing for a few options.
The real_ and comp_matrix_create routines first call the separate numerical integration
routines for rblocks and tblocks, passing the same integrand routine name. Next, they call
the appropriate matelim routine to reduce the number of unknowns when poly_degree> 1.
Then, connections for a degenerate rblock (one with a circular-polar origin vertex) are
collected. Finally, iter_factor finds some type of approximate factorization to use as a
preconditioner for the iterative solve.
The get_rhs routine also starts with block-specific numerical integration, and it may
perform an optional surface integration. The next step is to modify grid-vertex and cell-side
coefficients through cell-interior elimination ($b_1 - A_{12}A_{22}^{-1}b_2$ from p. 53). The block-border
seaming then completes a scatter process (see Figure II.7).
The final steps in get_rhs apply regularity and essential boundary conditions.
Aside on the programming:
• Although the names of the integrand, surface integrand, and boundary condition rou-
tines are passed through matrix_create and get_rhs, finite_element does not use
the integrands, surface_ints, or boundary modules. finite_element just needs
parameter-list information to call any routine out of the three classes. The parameter
list information, including assumed-shape array definitions, is provided by the inter-
face blocks. Note that all integrand routines suitable for comp_matrix_create, for
example, must use the same argument list as that provided by the interface block for
‘integrand’ in comp_matrix_create.
C.3 Numerical integration
Modules at the next level in the finite element hierarchy perform the routine operations
required for numerical integration over volumes and surfaces. The three modules, rblock,
tblock, and surface, are separated to provide a partitioning of the different geometric infor-
mation and integration procedures. This approach would allow us to incorporate additional
block types in a straightforward manner should the need arise.
[Figure II.7: Block-border seaming. A volume integral over a cell contributes to node coefficients at the positions where its basis functions are nonzero; node coefficients along block borders require an edge_network call to get all contributions.]
[Figure II.8: 4 quad points, 4 elements in a block; dot denotes quadrature position, not node.]
The rblock_get_comp_rhs routine serves as an example of the numerical integration
performed in NIMROD. It starts by setting the storage arrays in the passed cvector_type
to 0. It then allocates the temporary array, integrand, which is used to collect contributions
for test functions that are nonzero in a cell, one quadrature point at a time. Our quadrilateral
elements have (poly_degree+1)^2 nonzero test functions in every cell (see Figure II.7 and
observe the number of node positions).
The integration is performed as a loop over quadrature points, using saved data for the
logical offsets from the bottom left corner of each cell, $R$, and $w_i\,J(\xi_i,\eta_i)$. Thus each
iteration of the do-loop finds the contributions for the same respective quadrature point in
all elements of a block.
The call to the dummy name for the passed integrand then finds the equation and
algorithm-specific information from a quadrature point. A blank tblock is passed along
with the rb structure for the block, since the respective tblock integration routine calls the
same integrands and one has to be able to use the same argument list. In fact, if there were
only one type of block, we would not need the temporary integrand storage at all.
The final step in the integration is the scatter into each basis function coefficient storage.
Note that $w_i\,J$ appears here, so in the integral $\int d\xi\,d\eta\,J(\xi,\eta)\,f(\xi,\eta)$, the integrand routine
finds $f(\xi,\eta)$ only.
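As an illustration of this structure only (the actual rblock routines scatter into separate basis-type arrays and carry a tblock argument), the following compilable sketch uses hypothetical array names to show the quadrature loop and the $w_i J$ scatter:

SUBROUTINE sketch_get_rhs(mx,my,ng,nv,wjac,alpha,fval,rhs)
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: mx,my,ng,nv
  REAL(8), INTENT(IN) :: wjac(mx,my,ng)      ! w_i*J(xi_i,eta_i) for each cell
  REAL(8), INTENT(IN) :: alpha(mx,my,nv,ng)  ! test functions at the quad points
  REAL(8), INTENT(IN) :: fval(mx,my,ng)      ! f(xi,eta) from an integrand routine
  REAL(8), INTENT(OUT) :: rhs(mx,my,nv)      ! coefficient storage (single basis list)
  INTEGER :: ig,iv
  rhs=0
  DO ig=1,ng                                 ! same quadrature point in all cells
    DO iv=1,nv                               ! nonzero test functions per cell
      rhs(:,:,iv)=rhs(:,:,iv)+wjac(:,:,ig)*alpha(:,:,iv,ig)*fval(:,:,ig)
    ENDDO
  ENDDO
END SUBROUTINE sketch_get_rhs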
The matrix integration routines are somewhat more complicated, because contributions
have to be sorted for each basis type and offset pair, as described in B.4 [NIMROD uses two
different labeling schemes for the basis function nodes. The vector_type and global_matrix_type
separate basis types (grid vertex, horizontal side, vertical side, cell interior) for efficiency in
linear algebra. In contrast, the integrand routines use a single list for nonzero basis func-
tions in each element. The latter is simpler and more tractable for the block-independent
integrand coding, but each node may have more than one label in this scheme. The matrix
integration routines have to do the work of translating between the two labeling schemes.]
C.4 Integrands
Before delving further into the coding, let’s review what is typically needed for an integrand
computation. The Faraday's law example in Section A.1 suggests the following:
1. Finite element basis functions and their derivatives with respect to R and Z.
2. Wavenumbers for the toroidal direction.
3. Values of solution fields and their derivatives.
4. Vector operations.
5. Pseudospectral computations.
6. Input parameters and global simulation data.
Some of this information is already stored at the quadrature point locations. The basis
function information depends on the grid and the selected solution space. Neither varies
during a simulation, so the basis function information is evaluated and saved in block-structure
arrays as described in B.1. Solution field data is interpolated to the quadrature points and
stored at the end of the equation advances. It is available through the qp_type storage, also
described in B.1. Therefore, generic_alpha_eval and generic_ptr_set routines merely
set pointers to the appropriate storage locations and do not copy the data to new locations.
It is important to remember that integrand computations must not change any values
in the pointer arrays. Doing so will lead to unexpected and confusing consequences in other
integrand routines.
The input and global parameters are satisfied by the F90 USE statement of the modules given
those names. The array keff(1:nmodes) is provided to find the value of the wavenumber
associated with each Fourier component. The factor keff(imode)/bigr is the toroidal
wavenumber for the data at Fourier index imode. There are two things to note:
1. In toroidal geometry, keff holds the n-value, and bigr(:,:) holds R. In linear
geometry, keff holds $k_n = 2\pi n/$per_length and bigr=1.
2. Do not assume that a particular n-value is associated with some value of imode. Lin-
ear computations often have nmodes=1 and keff(1) equal to the input parameter
lin_nmax. Domain decomposition also affects the values and range of keff. This is
why there is a keff array.
The k2ef array holds the square of each value of keff.
Continuing with the Faraday's law example, the first term on the lhs of the second
equation on pg 11 is associated with $\partial\mathbf{B}/\partial t$. The spatial discretization gives the integrand
$\alpha_i\alpha_j$ (recall that $J$ appears at the next higher level in the hierarchy). This 'mass' matrix is so
common that it has its own integrand routine, get_mass. [The matrix elements are stored in
the mass_mat data structure. Operators requiring a mass matrix can have matrix_create
add it to another operator.]
The int array passed to get_mass has 6 dimensions, like int arrays for other matrix
computations. The first two dimensions are direction-vector-component or ‘quantity’ indices.
The first is the column quantity (jq), and the second is the row quantity (iq). The next two
dimensions have range (1:mx,1:my) for the cell indices in a grid-block. The last two indices
are basis function indices (jv,iv) for the $j'$ column basis index and the $j$ row index. For
get_mass, the orthogonality of direction vectors implies that $\hat{e}_l$ does not appear, so iq=jq=1:
the test function $\alpha_{j'}(\xi,\eta)\,\hat{e}_{l'}\,e^{-in'\phi}$ dotted with $\alpha_j(\xi,\eta)\,\hat{e}_l\,e^{in\phi}\,\partial B_{jln}/\partial t$ gives
$\alpha_{j'}\alpha_j$ for all $l'$ and $n'$.
All of the stored matrices are 2D, i.e. diagonal in n, so separate row and column n indices
are not used.
The get_mass subroutine then consists of a dimension check to determine the number
of nonzero finite element basis functions in each cell, a pointer set for $\alpha$, and a double loop over
basis indices. Note that our operators have full storage, regardless of symmetry. This helps
make our matrix-vector products faster, and it implies that the same routines are suitable
for forming nonsymmetric matrices.
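The structure just described can be sketched as below. The array names alpha and int mirror the description above, but the declarations, shapes, and loop bounds are illustrative assumptions rather than the actual NIMROD coding.

SUBROUTINE sketch_get_mass(mx,my,nv,alpha,int)
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: mx,my,nv
  REAL(8), INTENT(IN) :: alpha(mx,my,nv)        ! basis functions at one quad point
  REAL(8), INTENT(OUT) :: int(1,1,mx,my,nv,nv)  ! 6D int array, iq=jq=1 only
  INTEGER :: iv,jv
  ! double loop over basis indices; jv is the column index and iv the row index
  DO iv=1,nv
    DO jv=1,nv
      int(1,1,:,:,jv,iv)=alpha(:,:,jv)*alpha(:,:,iv)
    ENDDO
  ENDDO
END SUBROUTINE sketch_get_mass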
The next term in the lhs (from Sec. B.2) arises from implicit diffusion. The adv_b_iso
management routine passes the curl_de_iso integrand name, so let's examine that routine.
It is more complicated than get_mass, but the general form is similar. The routine is used
for creating the resistive diffusion operator and the semi-implicit operator for Hall terms, so
there is coding for two different coefficients. The character variable integrand_flag, which
is set in the advance subroutine of nimrod.f and saved in the global module, determines
which coefficient to use. Note that calls to generic_ptr_set pass both rblock and tblock
structures for a field. One of them is always a blank structure, but this keeps block-specific
coding out of integrands. Instead, the generic_evals module selects the appropriate data
storage.
After the impedance coefficient is found, the int array is filled. The basis loops appear
in a conditional statement to use or skip the $\nabla\nabla\cdot$ term for the divergence cleaning. We will
skip them in this discussion. Thus we need $\nabla\times(\alpha_{j'}\hat{e}_{l'}e^{-in'\phi})\cdot\nabla\times(\alpha_j\hat{e}_l e^{in\phi})$
only, as on pg 11. To understand what is in the basis function loops, we need to evaluate
$\nabla\times(\alpha_j\hat{e}_l e^{in\phi})$. Though $\hat{e}_l$ is $\hat{e}_R$, $\hat{e}_Z$, or $\hat{e}_\phi$ only, it is straightforward to use a general
direction vector and then restrict attention to the three basis vectors of our coordinate
system.
\[
\nabla\times\left(\alpha_j\hat{e}_l e^{in\phi}\right)
 = \nabla\left(\alpha_j e^{in\phi}\right)\times\hat{e}_l
 + \alpha_j e^{in\phi}\,\nabla\times\hat{e}_l
\]
\[
\nabla\left(\alpha_j e^{in\phi}\right)
 = \left(\frac{\partial\alpha_j}{\partial R}\hat{R}
 + \frac{\partial\alpha_j}{\partial Z}\hat{Z}
 + \frac{in\alpha_j}{R}\hat{\phi}\right)e^{in\phi}
\]
\[
\nabla\left(\alpha_j e^{in\phi}\right)\times\hat{e}_l
 = \left\{\left[\frac{\partial\alpha_j}{\partial Z}(\hat{e}_l)_\phi
 - \frac{in\alpha_j}{R}(\hat{e}_l)_Z\right]\hat{R}
 + \left[\frac{in\alpha_j}{R}(\hat{e}_l)_R
 - \frac{\partial\alpha_j}{\partial R}(\hat{e}_l)_\phi\right]\hat{Z}
 + \left[\frac{\partial\alpha_j}{\partial R}(\hat{e}_l)_Z
 - \frac{\partial\alpha_j}{\partial Z}(\hat{e}_l)_R\right]\hat{\phi}\right\}e^{in\phi}
\]
\[
\nabla\times\hat{e}_l = 0 \quad\text{for } l = R, Z,
\qquad
\nabla\times\hat{e}_l = -\frac{1}{R}\hat{Z} \quad\text{for } l = \phi \text{ and toroidal geometry.}
\]
Thus, the scalar product $\nabla\times(\alpha_{j'}\hat{e}_{l'}e^{-in'\phi})\cdot\nabla\times(\alpha_j\hat{e}_l e^{in\phi})$ is
\[
\left[\frac{\partial\alpha_{j'}}{\partial Z}(\hat{e}_{l'})_\phi + \frac{in'\alpha_{j'}}{R}(\hat{e}_{l'})_Z\right]
\left[\frac{\partial\alpha_{j}}{\partial Z}(\hat{e}_{l})_\phi - \frac{in\alpha_{j}}{R}(\hat{e}_{l})_Z\right]
\]
\[
+\left[-\frac{in'\alpha_{j'}}{R}(\hat{e}_{l'})_R - \left(\frac{\alpha_{j'}}{R} + \frac{\partial\alpha_{j'}}{\partial R}\right)(\hat{e}_{l'})_\phi\right]
\left[\frac{in\alpha_{j}}{R}(\hat{e}_{l})_R - \left(\frac{\alpha_{j}}{R} + \frac{\partial\alpha_{j}}{\partial R}\right)(\hat{e}_{l})_\phi\right]
\]
\[
+\left[\frac{\partial\alpha_{j'}}{\partial R}(\hat{e}_{l'})_Z - \frac{\partial\alpha_{j'}}{\partial Z}(\hat{e}_{l'})_R\right]
\left[\frac{\partial\alpha_{j}}{\partial R}(\hat{e}_{l})_Z - \frac{\partial\alpha_{j}}{\partial Z}(\hat{e}_{l})_R\right]
\]
Finding the appropriate scalar product for a given (jq,iq) pair is then a matter of sub-
stituting $\hat{R}$, $\hat{Z}$, $\hat{\phi}$ for $\hat{e}_{l'}$ with $l' \Rightarrow$ iq = 1, 2, 3, respectively (and likewise for $\hat{e}_l$ and jq).
This simplifies the scalar product greatly for each case. Note how the symmetry of the
operator is preserved by construction.
For (jq,iq) = (1,1), $(\hat{e}_{l'})_R = (\hat{e}_l)_R = 1$ and $(\hat{e}_{l'})_Z = (\hat{e}_{l'})_\phi = (\hat{e}_l)_Z = (\hat{e}_l)_\phi = 0$, so
\[
\texttt{int(1,1,:,:,jv,iv)} = \left(\frac{n^2}{R^2}\,\alpha_{jv}\,\alpha_{iv}
 + \frac{\partial\alpha_{jv}}{\partial Z}\,\frac{\partial\alpha_{iv}}{\partial Z}\right)\times\texttt{ziso},
\]
where ziso is an effective impedance.
Like other isotropic operators, the resulting matrix elements are either purely real or
purely imaginary, and the only imaginary elements are those coupling poloidal and toroidal
vector components. Thus, only real poloidal coefficients are coupled to imaginary toroidal
coefficients and vice versa. Furthermore, the coupling between Re($B_R$, $B_Z$) and Im($B_\phi$) is the
same as that between Im($B_R$, $B_Z$) and Re($B_\phi$), making the operator phase independent. This
lets us solve for these separate groups of components as two real matrix equations, instead
of one complex matrix equation, saving CPU time. An example of the poloidal-toroidal
coupling for real coefficients is
\[
\texttt{int(1,3,:,:,jv,iv)} = \frac{n\,\alpha_{jv}}{R}\left(\frac{\alpha_{iv}}{R}
 + \frac{\partial\alpha_{iv}}{\partial R}\right)\times\texttt{ziso},
\]
which appears through a transpose of the (3,1) element. [Compare with
$-\frac{in\,\alpha_{jv}}{R}\left(\frac{\alpha_{iv}}{R} + \frac{\partial\alpha_{iv}}{\partial R}\right)$
from the scalar product on pg 41.]
Notes on matrix integrands:
• The Fourier component index, jmode, is taken from the global module. It is the
do-loop index set in matrix_create and must not be changed in integrand routines.
• Other vector-differential operations acting on $\alpha_j\hat{e}_l e^{in\phi}$ are needed elsewhere. The
computations are derived in the same manner as $\nabla\times(\alpha_j\hat{e}_l e^{in\phi})$ given above.
• In some cases (semi-implicit operators, for example) the n = 0 component of a field
is needed regardless of the n-value assigned to a processor. Separate real storage is
created and saved on all processors (see Sects. F and C).
The rhs integrands differ from matrix integrands in a few ways:
1. The vector aspect (in a linear algebra sense) of the result implies one set of basis
function indices in the int array.
2. Coefficients for all Fourier components are created in one integrand call.
3. The int array has 5 dimensions, int(iq,:,:,iv,im), where im is the Fourier index
(imode).
4. There are FFTs and pseudospectral operations to generate nonlinear terms.
Returning to Faraday's law as an example, the brhs_mhd routine finds
$\nabla\times(\alpha_{j'}\hat{e}_{l'}e^{-in'\phi})\cdot\mathbf{E}(R,Z,\phi)$,
where E may contain ideal mhd, resistive, and neoclassical contributions. As with
the matrix integrand, E is only needed at the quadrature points.
From the first executable statement, there are a number of pointers set to make various
fields available. The pointers for B are set to the storage for data from the end of the last
time-split, and the math_tran routine math_curl is used to find the corresponding current
density. Then, $\nabla\cdot\mathbf{B}$ is found for error diffusion terms. After that, the be, ber, and bez arrays
are reset to the predicted B for the ideal electric field during corrector steps (indicated by
the integrand_flag). More pointers are then set for neoclassical calculations.
The first nonlinear computation appears after the neoclassical_init select block. The
V and B data are transformed to functions of $\phi$, where the cross product V$\times$B is determined.
The resulting nonlinear ideal E is then transformed back to Fourier components (see Sec. B.1).
Note that fft_nim concatenates the poloidal position indices into one index, so that real_be
has dimensions (1:3,1:mpseudo,1:nphi), for example. The mpseudo parameter can be less
than the number of cells in a block due to domain decomposition (see Sec. C), so one should
not relate this poloidal index with the 2D poloidal indices used with Fourier components.
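The pseudospectral step amounts to a pointwise cross product on arrays shaped like real_be above. The routine below is only a self-contained illustration with hypothetical names; the actual coding uses NIMROD's own math_tran utilities.

SUBROUTINE sketch_vxb(mpseudo,nphi,real_ve,real_be,real_eid)
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: mpseudo,nphi
  REAL(8), INTENT(IN) :: real_ve(3,mpseudo,nphi),real_be(3,mpseudo,nphi)
  REAL(8), INTENT(OUT) :: real_eid(3,mpseudo,nphi)
  ! pointwise ideal electric field -VxB at each poloidal position and toroidal angle
  real_eid(1,:,:)=-(real_ve(2,:,:)*real_be(3,:,:)-real_ve(3,:,:)*real_be(2,:,:))
  real_eid(2,:,:)=-(real_ve(3,:,:)*real_be(1,:,:)-real_ve(1,:,:)*real_be(3,:,:))
  real_eid(3,:,:)=-(real_ve(1,:,:)*real_be(2,:,:)-real_ve(2,:,:)*real_be(1,:,:))
END SUBROUTINE sketch_vxb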
A loop over the Fourier index follows the pseudospectral computations. The linear ideal
E terms are created using the data for the specified steady solution, completing
$-(\mathbf{V}_s\times\mathbf{B} + \mathbf{V}\times\mathbf{B}_s + \mathbf{V}\times\mathbf{B})$ (see Sec. A.1).
Then, the resistive and neoclassical terms are computed.
Near the end of the routine, we encounter the loop over test-function basis function
indices ($j'$, $l'$). The terms are similar to those in curl_de_iso, except
$\nabla\times(\alpha_{j'}\hat{e}_{l'}e^{-in'\phi})$ is replaced by the local E, and there is a sign change for
$-\nabla\times(\alpha_{j'}\hat{e}_{l'}e^{-in'\phi})\cdot\mathbf{E}$.
The integrand routines used during iterative solves of 3D matrices are very similar to
rhs integrand routines. They are used to find the dot product of a 3D matrix and a vector
without forming matrix elements themselves (see Sec. D.2). The only noteworthy difference
from rhs integrands is that the operand vector is used only once. It is therefore interpolated
to the quadrature locations in the integrand routine with a generic_all_eval call, and the
result is left in local arrays. In p_aniso_dot, for example, pres, presr, and presz are
local arrays, not pointers.
C.5 Surface Computations
The surface and surface_int modules function in an analogous manner to the volume
integration and integrand modules, respectively. One important difference is that not all
border segments of a block require surface computation (when there is one). Only those
on the domain boundary have contributions, and the seam data is not carried below the
finite_element level. Thus, the surface integration is called one cell-side at a time from
get_rhs.
At present, 1D basis function information for the cell side is generated on the fly in the
surface integration routines. It is passed to a surface integrand. Further development will
likely be required if more complicated operations are needed in the computations.
C.6 Dirichlet boundary conditions
The boundary module contains routines used for setting Dirichlet (or ‘essential’) boundary
conditions. They work by changing the appropriate rows of a linear system to
\[
(A_{jln,\,jln})\,(x_{jln}) = 0, \qquad\text{(II.4)}
\]
where $j$ is a basis function node along the boundary, and $l$ and $n$ are the direction-vector and
Fourier component indices. In some cases, a linear combination is set to zero. For Dirichlet
conditions on the normal component, for example, the rhs of $A\mathbf{x} = \mathbf{b}$ is modified to
\[
(I - \hat{n}_j\hat{n}_j)\cdot\mathbf{b} \equiv \tilde{\mathbf{b}}, \qquad\text{(II.5)}
\]
where $\hat{n}_j$ is the unit normal direction at boundary node $j$. The matrix becomes
\[
(I - \hat{n}_j\hat{n}_j)\cdot A\cdot(I - \hat{n}_j\hat{n}_j) + \hat{n}_j\hat{n}_j. \qquad\text{(II.6)}
\]
Dotting $\hat{n}_j$ into the modified system gives
\[
x_{jn} = \hat{n}_j\cdot\mathbf{x} = 0. \qquad\text{(II.7)}
\]
Dotting $(I - \hat{n}_j\hat{n}_j)$ into the modified system gives
\[
(I - \hat{n}_j\hat{n}_j)\cdot A\cdot\tilde{\mathbf{x}} = \tilde{\mathbf{b}}. \qquad\text{(II.8)}
\]
The net effect is to apply the boundary condition and to remove $x_{jn}$ from the rest of the
linear system.
Notes:
• Changing boundary node equations is computationally more tractable than changing
the size of the linear system, as implied by finite element theory.
• Removing coefficients from the rest of the system [through $\cdot\,(I - \hat{n}\hat{n})$ on the right side of
A] may seem unnecessary. However, it preserves the symmetric form of the operator,
which is important when using conjugate gradients.
The dirichlet_rhs routine in the boundary module modifies a cvector_type through
operations like $(I - \hat{n}\hat{n})\cdot$. The 'component' character input is a flag describing which compo-
nent(s) should be set to 0. The seam data tang, norm, and intxy(s) for each vertex (segment)
are used to minimize the amount of computation.
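As a purely illustrative sketch of the $(I-\hat{n}\hat{n})\cdot$ operation on the two poloidal components at one boundary node (the names and storage layout here are hypothetical, not the actual dirichlet_rhs coding):

SUBROUTINE sketch_no_normal(nR,nZ,vR,vZ)
  IMPLICIT NONE
  REAL(8), INTENT(IN) :: nR,nZ     ! unit normal components from the seam norm array
  REAL(8), INTENT(INOUT) :: vR,vZ  ! poloidal components of a boundary coefficient
  REAL(8) :: vn
  vn=nR*vR+nZ*vZ                   ! normal projection
  vR=vR-vn*nR                      ! (I - n n).v keeps only the tangential part
  vZ=vZ-vn*nZ
END SUBROUTINE sketch_no_normal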
The dirichlet_op and dirichlet_comp_op routines perform the related matrix opera-
tions described above. Though mathematically simple, the coding is involved, since it has
to address the offset storage scheme and multiple basis types for rblocks.
At present, the kernel always applies the same Dirichlet conditions to all boundary nodes
of a domain. Since the dirichlet_rhs is called one block at a time, different component
input could be provided for different blocks. However, the dirichlet_op routines would
need modification to be consistent with the rhs. It would also be straightforward to make
the ‘component’ information part of the seam structure, so that it could vary along the
boundary independent of the block decomposition.
Since our equations are constructed for the change of a field and not its new value, the
existing routines can be used for inhomogeneous Dirichlet conditions, too. If the conditions
do not change in time, one only needs to comment out the calls to dirichlet_rhs at the end
of the respective management routine and in dirichlet_vals_init. For time-dependent
inhomogeneous conditions, one can adjust the boundary node values from the management
routine level, then proceed with homogeneous conditions in the system for the change in the
field. Some consideration for time-centering the boundary data may be required. However,
the error is no more than O(∆t) in any case, provided that the time rate of change does not
appear explicitly.
C.7 Regularity conditions
When the input parameter geometry is set to ‘tor’, and the left side of the grid has points
along R = 0, the domain is simply connected – cylindrical with axis along Z, not toroidal.
With the cylindrical (R, Z, φ) coordinate system, our Fourier coefficients must satisfy regu-
larity conditions so that fields and their derivatives approach φ-independent values at R = 0.
The regularity conditions may be derived by inserting the $x = R\cos\phi$, $y = R\sin\phi$
transformation and derivatives into the 2D Taylor series expansion in $(x, y)$ about the origin.
It is straightforward to show that the Fourier $\phi$-coefficients of a scalar $S(R,\phi)$ must satisfy
\[
S_n \sim R^{|n|}\,\psi(R^2), \qquad\text{(II.9)}
\]
where $\psi$ is a series of nonnegative powers of its argument. For vectors, the Z-direction
component satisfies the relation for scalars, but
\[
V^R_n,\ V^\phi_n \sim R^{|n-1|}\,\psi(R^2) \quad\text{for } n \ge 0. \qquad\text{(II.10)}
\]
Furthermore, at $R = 0$, phase information for $V^R_1$ and $V^\phi_1$ is redundant. We take $\phi$ to be 0
at $R = 0$ and enforce
\[
V^\phi_1 = iV^R_1. \qquad\text{(II.11)}
\]
The entire functional form of the regularity conditions is not imposed numerically. The
conditions are for the limit of R →0 and would be too restrictive if applied across the finite
extent of a computational cell. Instead, we focus on the leading terms of the expansions.
The routines in the regularity module impose the leading-order behavior in a manner
similar to what is done for Dirichlet boundary conditions. The regular_vec, regular_op,
and regular_comp_op routines modify the linear systems to set $S_n = 0$ at $R = 0$ for $n > 0$
and to set $V^R_n$ and $V^\phi_n$ to 0 for $n = 0$ and $n \ge 2$. The phase relation for $V^R_1$ and $V^\phi_1$ at
$R = 0$ is enforced by the following steps:
1. At $R = 0$ nodes only, change variables to $V^+ = (V^R + iV^\phi)/2$ and $V^- = (V^R - iV^\phi)/2$.
2. Set $V^+ = 0$, i.e. set all elements in a $V^+$ column to 0.
3. To preserve symmetry with nonzero $V^-$ column entries, add $-i$ times the $V^\phi$-row to
the $V^R$-row and set all elements of the $V^\phi$-row to 0 except for the diagonal.
After the linear system is solved, the regular_ave routine is used to set $V^\phi = iV^R$ at the
appropriate nodes.
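The connection between the change of variables and the phase relation is a one-line check:
\[
V^{+} = \tfrac{1}{2}\left(V^{R} + iV^{\phi}\right) = 0
\;\;\Longrightarrow\;\;
V^{\phi} = iV^{R},
\]
so zeroing the $V^{+}$ column imposes (II.11), and regular_ave restores $V^{\phi}$ from the retained $V^{R}$ ($V^{-}$) data after the solve.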
Returning to the power series behavior for $R \to 0$, the leading-order behavior for $S_0$,
$V^R_1$, and $V^\phi_1$ is the vanishing radial derivative. This is like a Neumann boundary condition.
Unfortunately, we cannot rely on a 'natural B.C.' approach because there is no surface with
finite area along an $R = 0$ side of a computational domain. Instead, we add the penalty
matrix,
\[
\int dV\,\frac{w}{R^2}\,\frac{\partial\alpha_{j'}}{\partial R}\,\frac{\partial\alpha_j}{\partial R}, \qquad\text{(II.12)}
\]
saved in the dr_penalty matrix structure, with suitable normalization to matrix rows for
$S_0$, $V^R_1$ and $V^\phi_1$. The weight $w$ is nonzero in elements along $R = 0$ only. Its positive value
penalizes changes leading to $\partial/\partial R \neq 0$ in these elements. It is not diffusive, since it is added to
linear equations for the changes in physical fields. The regular_op and regular_comp_op
routines add this penalty term before addressing the other regularity conditions.
[Looking through the subroutines, it appears that the penalty has been added for $V^R_1$
and $V^\phi_1$ only. We should keep this in mind if a problem arises with $S_0$ (or $V^Z_0$) near $R = 0$.]
D Matrix Solution
At present, NIMROD relies on its own iterative solvers that are coded in FORTRAN 90 like
the rest of the algorithm. They perform the basic conjugate gradient steps on NIMROD's
block-decomposed vector_type structures, and the matrix_type storage arrays have been
arranged to optimize matrix-vector multiplication.
D.1 2D matrices
The iter_cg_f90 and iter_cg_comp modules contain the routines needed to solve symmetric-
positive-definite and Hermitian-positive-definite systems, respectively. The iter_cg module
holds interface blocks to give common names to the real and complex routines. The rest
of the iter_cg.f file has external subroutines that address solver-specific block-border com-
munication operations.
Within the long iter_cg_f90.f and iter_cg_comp.f files are separate modules holding
routines for the different preconditioning schemes, a module for managing partial factorization
(iter_*_fac), and a module for managing routines that solve a system (iter_cg_*).
The basic cg steps are performed by iter_solve_*, and they may be compared with textbook
descriptions of cg, once one understands NIMROD's vector_type manipulation.
Although normal seam communication is used during matrix-vector multiplication,
there are also solver-specific block communication operations. The partial factorization rou-
tines for preconditioning need matrix elements to be summed across block borders (including
off-diagonal elements connecting border nodes), unlike the matrix-vector product routines.
The iter_mat_com routines execute this communication using special functionality in
edge_seg_network for the off-diagonal elements. After the partial factors are found and
stored in the matrix_factor_type structure, border elements of the matrix storage are
restored to their original values by iter_mat_rest.
Other seam-related oddities are the averaging factors for border elements. The scalar
product of two vector_types is computed by iter_dot. The routine is simple, but it has
to account for redundant storage of coefficients along block borders, hence ave_factor. In
the preconditioning step, i.e. the solve of $\tilde{A}z = r$ where $\tilde{A}$ is an approximate $A$ and $r$ is the
residual, results from block-based partial factors (the "direct", "bl_ilu_*", and "bl_diag*"
choices) are averaged at block borders. However, simply multiplying border z-values by
ave_factor and seaming after preconditioning destroys the symmetry of the operation (and
convergence). Instead, we symmetrize the averaging by multiplying border elements of r by
ave_factor_pre ($\sim\sqrt{\texttt{ave\_factor}}$), inverting $\tilde{A}$, then multiplying z by ave_factor_pre before
summing.
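One way to read the symmetrized averaging (an interpretation of the step above, not a transcription of the coding): with $D = \mathrm{diag}(\texttt{ave\_factor\_pre})$, so that $D^2$ approximates the ave_factor border weights, the applied preconditioner is effectively
\[
z \;=\; D\,\tilde{A}^{-1}\,D\,r,
\]
which is symmetric whenever $\tilde{A}^{-1}$ is, whereas multiplying by the full ave_factor on one side only would destroy that symmetry.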
Each preconditioner has its own set of routines and factor storage, except for the global
and block line-Jacobi algorithms, which share 1D matrix routines. The block-direct option
uses LAPACK library routines. The block-incomplete factorization options are complicated but
effective for mid-range condition numbers (arising roughly when $\Delta t_{\text{explicit-limit}} \ll \Delta t \ll \tau_A$).
Neither of these two schemes has been updated to function with higher-order node data. The
block and global line-Jacobi schemes use 1D solves of couplings along logical directions, and
the latter has its own parallel communication (see D). There is also diagonal preconditioning,
which inverts local direction-vector couplings only. This simple scheme is the only one that
functions in both rblocks and tblocks. Computations with both block types and solver $\ne$
'diagonal' will use the specified solver in rblocks and 'diagonal' in tblocks.
NIMROD grids may have periodic rblocks, degenerate points in rblocks, and cell-interior
data may or may not be eliminated. These idiosyncrasies have little effect on the basic
cg steps, but they complicate the preconditioning operations. They also make coupling to
external library solver packages somewhat challenging.
D.2 3D matrices
The conjugate gradient solver for 3D systems is mathematically similar to the 2D solvers.
Computationally, it is quite different. The Fourier representation leads to matrices that are
dense in n-index when φ-dependencies appear in the lhs of an equation. Storage requirements
for such a matrix would have to grow as the number of Fourier components is increased, even
with parallel decomposition. Achieving parallel scaling would be very challenging.
To avoid these issues, we use a 'matrix-free' approach, where the matrix-vector products
needed for cg iteration are formed directly from rhs-type finite element computations. For
example, were the resistivity in our form of Faraday's law on p.? a function of $\phi$, it would
generate off-diagonal-in-$n$ contributions:
\[
\sum_{jln} B_{jln} \int dR\,dZ\,d\phi\;
\frac{\eta(R,Z,\phi)}{\mu_0}\,
\nabla\times\!\left(\bar{\alpha}_{j'l'}e^{-in'\phi}\right)\cdot
\nabla\times\!\left(\bar{\alpha}_{jl}e^{in\phi}\right), \qquad\text{(II.13)}
\]
which may be nonzero for all $n'$.
We can switch the summation and integration orders to arrive at
\[
\int dR\,dZ\,d\phi\;
\nabla\times\!\left(\bar{\alpha}_{j'l'}e^{-in'\phi}\right)\cdot
\frac{\eta(R,Z,\phi)}{\mu_0}
\left[\sum_{jln} B_{jln}\,\nabla\times\!\left(\bar{\alpha}_{jl}e^{in\phi}\right)\right]. \qquad\text{(II.14)}
\]
Identifying the term in brackets as $\mu_0\mathbf{J}(R,Z,\phi)$, the curl of the interpolated and FFT'ed
B, shows how the product can be found as a rhs computation. The $B_{jln}$ data are coeffi-
cients of the operand, but when they are associated with finite-element structures, calls to
generic_all_eval and math_curl create $\mu_0\mathbf{J}_n$. Thus, instead of calling an explicit matrix-
vector product routine, iter_3d_solve calls get_rhs in finite_element, passing a dot-product
integrand name. This integrand routine uses the coefficients of the operand in finite-element
interpolations.
The scheme uses 2D matrix structures to generate partial factors for preconditioning that
do not address $n \to n'$ coupling. It also accepts a separate 2D matrix structure to avoid
repetition of diagonal-in-$n$ operations in the dot-integrand; the product is then the sum of
the finite element result and a 2D matrix-vector multiplication. Further, iter_3d_cg solves
systems for changes in fields directly. The solution field at the beginning of the time split is
passed into iter_3d_cg_solve to scale norms appropriately for the stop condition.
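Schematically (a restatement of the description above, with $A^{2\mathrm{D}}_n$ denoting the optional diagonal-in-$n$ matrix), each product needed by the cg iteration is assembled as
\[
(A\,x)_n \;=\; \big[\text{rhs-type finite element evaluation of the dot integrand applied to }x\big]_n
\;+\; A^{2\mathrm{D}}_n\,x_n ,
\]
so the full 3D matrix is never stored.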
E Start-up Routines
Development of the physics model often requires new *block_type and matrix storage. [The
storage modules and structure types are described in Sec. B.] Allocation and initialization
of storage are primary tasks of the start-up routines located in file nimrod_init.f. Adding
new variables and matrices is usually a straightforward task of identifying an existing similar
data structure and copying and modifying calls to allocation routines.
The variable_alloc routine creates quadrature-point storage with calls to rblock_qp_alloc
and tblock_qp_alloc. There are also lagr_quad_alloc and tri_linear_alloc calls to
create finite-element structures for nonfundamental fields (like J which is computed from
the fundamental B field) and work structures for predicted fields. Finally, variable_alloc
creates vector_type structures used for diagnostic computations. The quadrature_save
routine allocates and initializes quadrature-point storage for the steady-state (or 'equilib-
rium') fields, and it initializes quadrature-point storage for dependent fields.
Additional field initialization information:
• The $\hat{e}_3$ component of the be_eq structures uses the covariant component ($RB_\phi$) in toroidal
geometry, and the $\hat{e}_3$ of ja_eq is the contravariant $J_\phi/R$. The data is converted to
cylindrical components after evaluating at quadrature point locations. The cylindrical
components are saved in their respective qp structures.
• Initialization routines have conditional statements that determine what storage is cre-
ated for different physics model options.
• The pointer_init routine links the vector_type and finite element structures via
pointer assignment. New fields will need to be added to this list too.
• The boundary_vals_init routine enforces the homogeneous Dirichlet boundary con-
ditions on the initial state. Selected calls can be commented out if inhomogeneous
conditions are appropriate.
The matrix_init routine allocates matrix and preconditioner storage structures. Opera-
tions common to all matrix allocations are coded in the *_matrix_init_alloc, iter_fac_alloc,
and *_matrix_fac_degen subroutines. Any new structure allocations can be coded by copy-
ing and modifying existing allocations.
F Utilities
The utilities.f file has subroutines for operations that are not encompassed by
finite_element and normal vector_type operations. The new_dt and ave_field_check
subroutines are particularly important, though not elegantly coded. The new_dt
routine computes the rate of flow through mesh cells to determine if advection should limit
the time step. Velocity vectors are averaged over all nodes in an element, and rates of ad-
vection are found from $|\mathbf{v}\cdot\mathbf{x}_i/|\mathbf{x}_i|^2|$ in each cell, where $\mathbf{x}_i$ is a vector displacement across the
cell in the $i$-th logical coordinate direction. A similar computation is performed for electron
flow when the two-fluid model is used.
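The advective limit described here can be sketched as below; the names, the simple element average, and the omission of any safety factor are hypothetical stand-ins rather than the actual new_dt logic.

SUBROUTINE sketch_advect_dt(vave,dx1,dx2,dt_limit)
  IMPLICIT NONE
  REAL(8), INTENT(IN) :: vave(3)        ! element-averaged velocity
  REAL(8), INTENT(IN) :: dx1(3),dx2(3)  ! displacements across the cell, logical directions 1 and 2
  REAL(8), INTENT(OUT) :: dt_limit
  REAL(8) :: rate1,rate2
  ! advection rates |v.x_i|/|x_i|**2 in each logical direction
  rate1=ABS(DOT_PRODUCT(vave,dx1))/DOT_PRODUCT(dx1,dx1)
  rate2=ABS(DOT_PRODUCT(vave,dx2))/DOT_PRODUCT(dx2,dx2)
  ! the advective time-step limit is the inverse of the fastest rate
  dt_limit=1._8/MAX(rate1,rate2,TINY(1._8))
END SUBROUTINE sketch_advect_dt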
The ave_field_check routine is an important part of our semi-implicit advance. The
'linear' terms in the operators use the steady-state fields and the n = 0 part of the solution.
The coefficient for the isotropic part of the operator is based on the maximum difference be-
tween total pressures and the steady_plus_n=0 pressures, over the $\phi$-direction. Thus, both
the 'linear' and 'nonlinear' parts of the semi-implicit operator change in time. However, com-
puting matrix elements and finding partial factors for preconditioning are computationally
intensive operations. To avoid calling these operations during every time step, we determine
how much the fields have changed since the last matrix update, and we compute new matrices
when the change exceeds a tolerance (ave_change_limit). [Matrices are also recomputed
when $\Delta t$ changes.]
The ave_field_check subroutine uses the vector_type pointers to facilitate the test.
It also considers the grid-vertex data only, assuming it would not remain fixed while other
coefficients change. If the tolerance is exceeded, flags such as b0_changed in the global
module are set to true, and the storage arrays for the n = 0 fields or nonlinear pressures
are updated. Parallel communication is required for domain decomposition of the Fourier
components (see Sec. C).
Before leaving utilities.f, let's take a quick look at find_bb. This routine is used to
find the toroidally symmetric part of the $\hat{b}\hat{b}$ dyad, which is used for the 2D preconditioner
matrix for anisotropic thermal conduction. The computation requires information from
all Fourier components, so it cannot occur in a matrix integrand routine. [Moving the
jmode loop from matrix_create to matrix integrands is neither practical nor preferable.]
Thus, $\hat{b}\hat{b}$ is created and stored at quadrature point locations, consistent with an integrand-
type of computation, but separate from the finite-element hierarchy. The routine uses the
mpi_allreduce command to sum contributions from different Fourier 'layers'.
Routines like find_bb may become common if we find it necessary to use more 3D linear
systems in our advances.
G Extrap_mod
The extrap_mod module holds data and routines for extrapolating solution fields to the new
time level, providing the initial guess for an iterative linear system solve. The amount of
data saved depends on the extrap_order input parameter.
Code development rarely requires modification of these routines, except extrap_init.
This routine decides how much storage is required, and it creates an index for locating the
data for each advance. Therefore, when adding a new equation, increase the dimension of
extrap_q, extrap_nq, and extrap_int and define the new values in extrap_init. Again,
using existing code for examples is helpful.
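As a reminder of what the stored levels are used for (the exact formula and the number of retained levels depend on extrap_order, so this is only an illustration of the idea, not the coding), a linear extrapolation from the two most recent solutions gives the initial guess
\[
x^{\mathrm{guess}} = 2\,x^{k} - x^{k-1},
\]
where $k$ labels the previous advances of that field.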
H History Module hist_mod
The time-dependent data written to XDRAW binary or text files is generated by the routines in
hist_mod. The output from probe_hist is presently just a collection of coefficients from a
single grid vertex. Diagnostics requiring surface or volume integrals (toroidal flux and kinetic
energy, for example) use rblock and tblock numerical integration of the integrands coded
in the diagnostic_ints module.
The numerical integration and integrands differ only slightly from those used in form-
ing systems for advancing the solution. First, the cell_rhs and cell_crhs vector_types
are allocated with poly_degree=0. This allocates a single cell-interior basis and no grid or
side bases. [The *block_get_rhs routines have flexibility for using different basis functions
through a simulation. They ’scatter’ to any vector_type storage provided by the calling rou-
tine.] The diagnostic_ints integrands differ in that there are no $\bar{\alpha}_{j'l'}e^{-in'\phi}$ test functions.
The integrals are over physical densities.
I I/O
Coding new input parameters requires two changes to input.f (in the nimset directory)
and an addition to parallel.f. The first part of the input module defines variables and
default values. A description of the parameters should be provided if other users will have
access to the modified code. A new parameter should be defined near related parameters
in the file. In addition, the parameter must be added to the appropriate namelist in the
read_namelist subroutine, so that it can be read from a nimrod.in file. The change required
in parallel.f is just to add the new parameter to the list in broadcast_input. This sends
the read values from the single processor that reads nimrod.in to all others. Be sure to use
another mpi_bcast or bcast_str call with the same data type as an example.
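For example, a broadcast of a hypothetical real-valued parameter new_param added to broadcast_input might look like the line below; the choice of root process and MPI datatype constant is an assumption shown with the standard MPI Fortran names, not NIMROD's own conventions.

! broadcast the value read on the root process (assumed node 0) to all others
CALL mpi_bcast(new_param,1,mpi_double_precision,0,mpi_comm_world,ierror)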
Changes to dump files are made infrequently to maintain compatibility across code ver-
sions to the greatest extent possible. However, data structures are written and read with
generic subroutines, which makes it easy to add fields. The overall layout of a dump file is
1. global information (dump_write and dump_read)
2. seams (dump_write_seam, dump_read_seam)
3. rblocks (*_write_rblock, *_read_rblock)
4. tblocks (*_write_tblock, *_read_tblock)
Within the rblock and tblock routines are calls to structure-specific subroutines. These
calls can be copied and modified to add new fields. [But don't expect compatibility with
other dump files.]
Other dump notes:
• The read routines allocate data structures to appropriate sizes just before a record is
read.
• All data is written as 8-byte real to improve portability of the file while using FORTRAN
binary I/O.
• Only one processor reads and writes dump files. Data is transferred to other processors
via MPI communication coded in parallel_io.f.
• The NIMROD and NIMSET dump.f files are different to avoid parallel data requirements
in NIMSET. Any changes must be made to both files.
Chapter III
Parallelization
A Parallel communication introduction
For the present and foreseeable future, 'high-performance' computing implies parallel comput-
ing. This fact was readily apparent at the inception of the NIMROD project, so NIMROD
has always been developed as a parallel code. The distributed-memory paradigm and MPI
communication were chosen for portability and efficiency. A well-conceived implementation
(thanks largely to Steve Plimpton of Sandia) keeps most parallel communication in modular
coding, isolated from the physics model coding.
A detailed description of NIMROD's parallel communication is beyond the scope of this
tutorial. However, a basic understanding of the domain decomposition is necessary for
successful development, and all developers will probably write at least one mpi_allreduce
call at some time.
While the idea of dividing a simulation into multiple processes is intuitive, the indepen-
dence of the processes is not. When mpirun is used to launch a parallel simulation, it starts
multiple processes that execute NIMROD. The first statements in the main program estab-
lish communication between the different processes, assign an integer label (node) to each
process, and tell each process the total number of processes. From this information and
that read from the input and dump files, processes determine what part of the simulation to
perform (in parallel_block_init). Each process then performs its part of the simulation
as if it were performing a serial computation, except when it encounters calls to communicate
with other processes. The program itself is responsible for orchestration without an active
director.
Before describing the different types of domain decomposition used in NIMROD, let's
return to the find_bb routine in utilities.f for an MPI example. Each process is finding
$\sum_n \hat{b}_n(R,Z)\hat{b}_n(R,Z)$ for a potentially limited range of n-values and grid blocks. What's
needed in every process is the sum over all n-values, i.e. $\hat{b}\hat{b}$, for each grid block assigned to a
process. This need is met by calling mpi_allreduce within grid-block do loops. Here an
array of data is sent by each process, and the MPI library routine sums the data element by
element with data from other processes that have different Fourier components for the same
block. The resulting array is returned to all processes that participate in the communication.
Then, each process proceeds independently until its next MPI call.
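The communication pattern just described reduces to one collective call per block. The buffer names and sizes below are placeholders, and the call form follows the standard MPI Fortran binding; comm stands for the communicator that groups the processes holding other Fourier components of the same block (see Sec. C).

! sum the local bb contributions over all Fourier layers that own this block
CALL mpi_allreduce(bb_local,bb_sum,nsize,mpi_double_precision, &
                   mpi_sum,comm,ierror)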
B Grid block decomposition
The distributed memory aspect of domain decomposition implies that each processor needs
direct access to a limited amount of the simulation data. For grid block decomposition, this
is just a matter of having each processor choose a subset of the blocks. Within a layer
of Fourier components (described in III.C.), the block subsets are disjoint. Decomposition
of the data itself is facilitated by the FORTRAN 90 data structures. Each process has an rb
and/or tb block_type array but the sum of the dimensions is the number of elements in its
subset of blocks.
Glancing through finite_element.f, one sees do-loops starting from 1 and ending at
nrbl, or starting from nrbl+1 and ending at nbl. Thus, the nrbl and nbl values are the
number of rblocks and rblocks+tblocks in a process. In fact, parallel simulations assign
new block index labels that are local to each process. The rb(ibl)%id and tb(ibl)%id
descriptors can be used to map a processor’s local block index (ibl) to the global index (id)
defined in nimset. parallel_block_init sets up the inverse mapping global2local array
that is stored in pardata. [pardata has comment lines defining the function of each of its
variables.] The global index limits, nrbl_total and nbl_total, are saved in the global
module.
Block to block communication between different processes is handled within the edge_network
and edge_seg_network subroutines. These routines call parallel_seam_comm and related
routines instead of the serial edge_accumulate. In routines performing finite element or
linear algebra computations, cross block communication is invoked with one call, regardless
of parallel decomposition.
For those interested in greater details of the parallel block communication, the
parallel_seam_init routine in parallel.f creates new data structures during start-up. These
send and recv structures are analogous to seam structures, but the array at the highest level
is over processes with which the local process shares block border information.
The parallel_seam_comm routine (and its complex version) uses MPI 'point to point'
communication to exchange information. This means that each process issues separate 'send'
and 'receive' requests to every other process involved in the communication.
The general sequence of steps used in NIMROD for ‘point to point’ communications is
1. Issue an mpi_irecv for every process in the exchange, which indicates readiness to accept
data.
2. Issue an mpi_send to every process involved, which sends data from the local process.
3. Perform some serial work, like seaming among different blocks of the local process, to
avoid wasting CPU time while waiting for data to arrive.
4. Use mpi_waitany to verify that expected data from each participating process has
arrived. Once verification is complete, the process can continue on to other operations.
Caution: calls to nonblocking communication (mpi_isend,mpi_irecv,...) seem to have
difficulty with buffers that are part of F90 structures when the mpich implementation is used.
The difficulty is overcome by passing the first element, e.g.
CALL mpi_irecv(recv(irecv)%data(1),...)
instead of the entire array, %data.
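The four-step pattern above, written with the standard MPI Fortran binding and hypothetical buffer and process-count names, looks roughly like this (declarations of req(:), status(mpi_status_size), and the send/recv structures are omitted):

! post receives first (step 1)
DO ip=1,nsend_proc
  CALL mpi_irecv(recv(ip)%data(1),recv(ip)%count,mpi_double_precision, &
                 recv(ip)%node,0,mpi_comm_world,req(ip),ierror)
ENDDO
! send local border data (step 2)
DO ip=1,nsend_proc
  CALL mpi_send(send(ip)%data(1),send(ip)%count,mpi_double_precision, &
                send(ip)%node,0,mpi_comm_world,ierror)
ENDDO
! ... serial seaming among local blocks here (step 3) ...
! wait for each expected message before continuing (step 4)
DO ip=1,nsend_proc
  CALL mpi_waitany(nsend_proc,req,idone,status,ierror)
ENDDO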
As a prelude to the next section, block border communication only involves processes
assigned to the same Fourier layer. This issue is addressed with the block2proc mapping
array. An example statement,
inode=block2proc(ibl_global)
assigns the node label for the process that owns global block number ibl_global - for the
same layer as the local process - to the variable inode.
Separating processes by some condition (e.g. the same layer) is an important tool for NIM-
ROD due to the two distinct types of decomposition. Process groups are established by
the mpi_comm_split routine. There are two such calls in parallel_block_init. One creates a com-
munication tag grouping all processes in a layer, comm_layer, and the other creates a tag
grouping all processes with the same subset of blocks, but different layers, comm_mode. These
tags are particularly useful for collective communication within the groups of processes. The
find_bb mpi_allreduce was one example. Others appear after scalar products of vectors in
2D iterative solves: different layers solve different systems simultaneously. The sum across
blocks involves processes in the same layer only, hence calls to mpi_allreduce with the
comm_layer tag. Where all processes are involved, or where ‘point to point’ communication
(which references the default node index) is used, the mpi_comm_world tag appears.
C Fourier “layer” decomposition
By this point it’s probably evident that Fourier components are also separated into disjoint
subsets, called layers. The motivation is that a large part of the NIMROD algorithm involves
no interaction among different Fourier components, As envisioned until recently, the matrices
would always be 2D, and Fourier component interaction would occur in explicit pseudo-
spectral operations only. The new3D linear systems preserve layer decomposition by avoiding
explicit computation of matrix elements that are not diagonal in Fourier index.
[Figure III.1: Normal Decomp: nmodes=2 on il=0 and nmodes=1 on il=1, nf=mx×my=9.
Config-space Decomp: nphi=2^lphi=8, mpseudo=5 on il=0 and mpseudo=4 on il=1.
Parameters are mx=my=3, nlayer=2, lphi=3.]
Apart from the basic mpi_allreduce exchanges, communication among layers is required
to multiply functions of toroidal angle (see B.1). Furthermore, the decomposition of the
domain is changed before an "inverse" FFT (going from Fourier coefficients to values at
toroidal positions) and after a "forward" FFT. The scheme lets us use serial FFT routines,
and it maintains a balanced workload among the processes, without redundancy.
For illustrative purposes, consider a poorly balanced choice of lphi=3, $0 \le n \le$ nmodes_total$-1$
with nmodes_total=3, and nlayers=2 (layers must be divisible by nlayers). Since
pseudo-spectral operations are performed one block at a time, we can consider a single-block
problem without loss of generality (see Figure III.1).
Notes:
• The domain swap uses the collective mpi_alltoallv routine.
• Communication is carried out inside fft_nim, isolating it from the integrand routines.
• calls to fft_nim with nr = nf duplicate the entire block’s config-space data on every
layer.
D Global-line preconditioning
That ill-conditioned matrices arise at large $\Delta t$ implies global propagation of perturbations
within a single time advance. Krylov-space iterative methods require very many iterations
to reach convergence in these conditions, unless the preconditioning step provides global "relax-
ation". The most effective method we have found for cg on very ill-conditioned matrices is a
line-Jacobi approach. We actually invert two approximate matrices and average the result.
They are defined by eliminating off-diagonal connections in the logical s-direction for one
approximation and by eliminating in the n-direction for the other. Each approximate system
$\tilde{A}z = r$ is solved by inverting 1D matrices.
To achieve global relaxation, lines are extended across block borders. Factoring and solv-
ing these systems uses a domain swap, not unlike that used before pseudospectral compu-
tations. However, point-to-point communication is called (from parallel_line_* routines)
instead of collective communication. (See our 1999 APS poster, "Recent algorithmic ..." on
the web site.)
Contents
Contents Preface I Framework of NIMROD Kernel A Brief Reviews . . . . . . . . . . . . . . A.1 Equations . . . . . . . . . . . . A.2 Conditions of Interest . . . . . . A.3 Geometric Requirements . . . . B Spatial Discretization . . . . . . . . . . B.1 Fourier Representation . . . . . B.2 Finite Element Representation . B.3 Grid Blocks . . . . . . . . . . . B.4 Boundary Conditions . . . . . . C Temporal Discretization . . . . . . . . C.1 Semi-implicit Aspects . . . . . . C.2 Predictor-corrector Aspects . . C.3 Dissipative Terms . . . . . . . . C.4 · B Constraint . . . . . . . . C.5 Complete time advance . . . . . D Basic functions of the NIMROD kernel . . . . . . . . . . . . . . . . . . . . . . . . i . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i 1 3 3 3 6 6 7 7 8 13 13 13 14 15 15 15 16 18 19 19 19 19 22 23 24 28 29

II FORTRAN Implementation A Implementation map . . . . . . B Data storage and manipulation B.1 Solution Fields . . . . . B.2 Interpolation . . . . . . B.3 Vector_types . . . . . . B.4 Matrix_types . . . . . . B.5 Data Partitioning . . . . B.6 Seams . . . . . . . . . .

ii C Finite Element Computations . . . . C.1 Management Routines . . . . C.2 Finite_element Module . . . C.3 Numerical integration . . . . C.4 Integrands . . . . . . . . . . . C.5 Surface Computations . . . . C.6 Dirichlet boundary conditions C.7 Regularity conditions . . . . . Matrix Solution . . . . . . . . . . . . D.1 2D matrices . . . . . . . . . . D.2 3D matrices . . . . . . . . . . Start-up Routines . . . . . . . . . . . Utilities . . . . . . . . . . . . . . . . Extrap_mod . . . . . . . . . . . . . . History Module hist_mod . . . . . . I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

CONTENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 34 36 36 39 43 43 45 46 46 47 48 49 50 50 50 53 53 54 55 57

D

E F G H I

III Parallelization A Parallel communication introduction B Grid block decomposition . . . . . . C Fourier “layer” decomposition . . . . D Global-line preconditioning . . . . . .

. n_int=4) .3 II. . . . . . . . . . . . . . . . . . . .5 II. . . . . . . . . . . . . . . . . . . . . 1-D Cubic : extent of βi in red and βj in green. . .8 continuous linear basis function . .1 II. . . . . . . . . . . . . . . . dot denotes quadrature position. . . . Block-border seaming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 quad points. . . .6 II.7 II. . . . . . . . . . . mpseudo=5 on il=0 and mpseudo=4 on il=1 Parameters are mx=my=3. . . (n_side=2. . . . . . .4 II. . lphi=3 . . nlayer=2. 56 iii . . . . . . note j is an internal node . . . . . . . . .1 I. . .List of Figures I. . . integration gives joff= −1 1-D Cubic : extent of αi in red and βj in green. input geom=‘toroidal’ & ‘linear’ schematic of leap-frog . . .2 II. . . . .4 II. . . . . . .1 Normal Decomp: nmodes=2 on il=0 and nmodes=1 on il=1. . . . . . 1-D Cubic : extent of αi in red and αj in green. . . . . 9 9 10 17 22 25 26 26 30 31 37 38 Basic Element Example. . . . j are internal nodes iv=vertex index. . . . nf=mx×my=9 Config-space Decomp: nphi=2lphi = 8. . is=segment index. . . . . . . . . . . . . . . not node III. . . . 3 blocks with corresponding vertices . . . . note both i. . . .3 I. . continuous quadratic basis function . . . .2 I. nvert=count of segment and vertices . 4 elements in a block.

iv LIST OF FIGURES .


Our quadrilateral elements have (poly_degree+1)2 nonzero test functions in every cell (see the Figure II. dot denotes quadrature position. It starts by setting the storage arrays in the passed cvector_type to ∅. vertical side. It then allocates the temporary array.38 CHAPTER II. we would not need the temporary integrand storage at all. The rblock_get_comp_rhs routine serves as an example of the numerical integration performed in NIMROD. Thus each iteration of the do-loop finds the contributions for the same respective quadrature point in all elements of a block. The integration is performed as loop over quadrature points. not node tblock.4 [NIMROD uses two different labeling schemes for the basis function nodes. 4 elements in a block. FORTRAN IMPLEMENTATION ig=1 ig=2 Figure II. as described in B. so in the integral dξdηJ (ξ. cell interior) for efficiency in . Note that wi ∗ J appears here. integrand.8: 4 quad points. η)f (ξ. This approach would allow us to incorporate additional block types in a straightforward manner should the need arise. and surface. because contributions have to be sorted for each basis type and offset pair. R. horizontal side. η).7 and observe the number of node positions). using saved data for the logical offsets from the bottom left corner of each cell. The final step in the integration is the scatter into each basis function coefficient storage. The vector_type and global_matrix_type separate basis types (grid vertex. and wi ∗ J (ξi . if there were only one type of block. The matrix integration routines are somewhat more complicated. are separated to provide a partitioning of the different geometric information and integration procedures. In fact. ηi ). The call to the dummy name for the passed integrand then finds the equation and algorithm-specific information from a quadrature point. one quadrature point at a time. which is used to collect contributions for test functions that are nonzero in a cell. η) only. since the respective tblock integration routine calls the same integrands and one has to be able to use the same argument list. the integrand routine finds f(ξ. A blank tblock is passed along with the rb structure for the block.

The factor keff(1:nmodes)/bigr is the toroidal wavenumber for the data at Fourier index imode. Vector operations. Do not assume that a particular n-value is associated with some value of imode. The input and global parameteres are satisfied by the F90 USE statement of modules given those names. In the toroidal geometry. per length 2. Neither vary during a simulation. Domain decomposition also affects the values and range of keff. The matrix integration routines have to do the work of translating between the two labeling schemes. Pseudospectral computations.C. The Faraday’s law example in Section A. 5. . In linear 2πn geometry. so the basis function information is evaluated and saved in block-structure array as described in B. Therefore. Finite element basis functions and their derivatives with respect to R and Z. This is why there is a keff array. let’s review what is typically needed for an integrand computation. The latter is simpler and more tractable for the block-independent integrand coding. Wavenumbers for the toridal direction 3.4 Integrands Before delving further into the coding. holds the n-value. 2. is provided to find the value of the wavenumber associated with each Fourier component. The array keff(1:nmodes). the integrand routines use a single list for nonzero basis functions in each element.1. Values of solution fields and their derivatives. 6. Some of this information is already stored at the quadrature point locations. generic_alpha_eval and generic_ptr_set routines merely set pointers to the appropriate storage locations and do not copy the data to new locations.1. FINITE ELEMENT COMPUTATIONS 39 linear algebra. and bigr(:. Doing so will lead to unexpected and confusing consequences in other integrand routines.:) holds R. It is available through the qp_type storage. Linear computations often have nmodes=1 and keff(1) equal to the input parameter lin_nmax. It is important to remember that integrand computations must not change any values in the pointer arrays.1 suggests the following: 1. keff holds kn = and bigr=1. Solution field data is interpolated to the quadrature pions and stored at the end of the equation advances. 4. There are two things to note: 1. In contrast. but each node may have more than one label in this scheme. also described in B. Input parameters and global simulation data.] C. The basis function information depends on the grid and the selected solution space. keff.

a pointer set for α. like int arrays for other matrix computations. so separate indices for row and column n indices are not used. Continuing with the Faraday’s law example. the first term on the lhs of the second ∂B . One of them is always a blank structure. The first two dimensions are direction-vector-component or ‘quantity’ indices. determines which coefficient to use. the int array is filled. The spatial discretization gives the integrand αi αj equation on pg 11 is associated with ∂t (recall that J appears at the next higher level in the hierarchy). The adv_b_iso management routine passes the curl_de_iso integrand name. and the second is the row quantity (iq).1:my) for the cell indices in a grid-block.iv) for the j’ column basis index and the j row index. The next term in the lhs (from pgB. so lets examine that routine. η)ˆl exp (inφ) ∂tln test function dotted with gives for all l and n αj αj All of the stored matrices are 2D.40 CHAPTER II. The character variable integrand_flag. i. Though el is eR . Operators requiring a mass matrix can have matrix_create add it to another operator. Instead. the orthogonality of direction vectors implies that el does not appear. FORTRAN IMPLEMENTATION The k2ef array holds the square of each value of keff. The next two dimensions have range (1:mx. so there is coding for two different coefficients. regardless of symmetry. we need to evaluate ˆ ˆ ˆ ˆ ˆ × (αj el exp (inφ)). and a double loop over basis indices. To understand what is in the basis function loops. The last two indices are basis function indices (jv. the generic_evals module selects the appropriate data storage. but this keeps block specific coding out of integrands. It is more complicated than get_mass. The routine is used for creating the resistive diffusion operator and the semi-implicit operator for Hall terms. so iq=jq=1: ˆ αj (ξ. The get_mass subroutines then consists of a dimension check to determine the number of nonzero finite elemnt basis function in each cell. For get_mass.] The int array passed to get_mass has 6 dimensions. [The matrix elements are stored in the mass_mat data structure.e. it is straightforward to use a general .f and saved in the global module. as on pg 11. get_mass. η)ˆl exp (−in φ) e ∂Bj e αj (ξ. This helps make our matrix-vector products faster and it implies that the same routines are suitable for forming nonsymmetric matrices. The first is the column quantity (jq). This ‘mass’ matrix is so common that it has its own integrand routine. diagonal in n. Thus we need × (αj ˆl exp (−in φ)) · × (αj el exp (inφ)) e ˆ only. After the impedence coefficient is found. We will skip them in this discussion. Note that calls to generic_ptr_set pass both rblock andtblock structures for a field. but the general form is similar. Note that our operators have full storage. The basis loops appear in a conditional statement to use or skip the · term for the divergence cleaning. that is set in the advance subroutine of nimrod. eZ or eφ only.2) arises from implicit diffusion.

(ˆl )R = (ˆl )R = 1 and (ˆl )Z = (ˆl )φ = (ˆl )Z = (ˆl )φ = 0 so e int(1. 3.:. iq = (1. the scalar product × (αj el exp (−in φ)) · ˆ × (αj el exp (inφ)) is ˆ ∂αj ∂αj in αj inαj (ˆl )φ + e (ˆl )Z e (ˆl )φ − e (ˆl )Z e ∂Z R ∂Z R αj ∂αj inαj inαj (ˆl )R − e + (ˆl )φ e (ˆl )R − e + − R R ∂R R ∂αj ∂αj ∂αj ∂αj + (ˆl )Z − e (ˆl )R e (ˆl )Z − e (ˆl )R e ∂R ∂Z ∂R ∂Z αj ∂α + R ∂R (ˆl )φ e Finding the appropriate scalar product for a given (jq. the coupling between Re(BR. BZ ) and Im(Bφ ) is the same as that between Im(BR . saving CPU time. This simplifies the scalar product ˆ greatly for each case. Note how the symmetry of the operator is preserved by construction. only real poloidal coefficients are coupled to imaginary toroidal coefficients and vice versa.:.iv)= n2 ∂αj v αiv αj vαiv + R ∂Z ∂Z ×ziso where ziso is an effective impedance. making the operator phase independent. φ for el for l ⇒ iq = 1. An example of the poloidal-toroidal coupling for real coefficients is . This lets us solve for these separate groups of components as two real matrix equations.C. e × (αj ˆl exp (inφ)) = (αj exp (inφ)) = ˆ (αj exp (inφ)) × el = (αj exp (inφ)) × ˆl + αj exp (inφ) × (ˆl ) e e ∂α ˆ ∂α ˆ inα ˆ R+ Z+ αφ exp (inφ) ∂R ∂Z R ∂α inα ∂α inα ˆ ˆ (ˆl )φ − e (ˆl )Z R + e (ˆl )R − e (ˆl )φ Z e ∂Z R R ∂R ∂α ∂α ˆ + (ˆl )Z − e (ˆl )R φ exp (inφ) e ∂R ∂Z × (ˆl ) = 0 for l = R. Furthermore. Like other isotropic operators the resulting matrix elements are either purely real or purely imaginary.1. and the only imaginary elements are those coupling poloidal and toroidal vector components. e e e e e For (jq. Thus.iq) pair is then a matter of subˆ ˆ ˆ stituting R. FINITE ELEMENT COMPUTATIONS 41 direction vector and then restrict attention to the three basis vectors of our coordinate system. 2.jv. 1). BZ ) and Re(Bφ ). instead of one complex matrix equation. respectively. Z. Z e 1 ˆ × (ˆl ) = − Z for l = φand toroidal geometry e R Thus.

Coefficients for all Fourier components are created in one integrand call.1:nphi) for example. for example) the n = 0 component of a field is needed regardless of the n-value assigned to a processor. 1) element. is taken from the global module.:. int(iq.jv. It is the do-loop index set in matrix_create and must not be changed in integrand routines. From the first executable statement. and neoclassical contributions. Note that fft_nim concatenates the poloidal position indices into one index.1). ·B is found for error diffusion terms. the be. There are FFTs and pseudospectral operations to generate nonlinear terms. The ˆ computations are derived in the same manner as × (αj el exp (inφ)) given above. there are a number of pointers set to make various fields available. ˆ • In some case (semi-implicit operators.:.iv. The pointers for B are set to the storage for data from the end of the last time-split. and bez arrays are reset to the predicted B for the ideal electric field during corrector steps (indicated by the integrand_flag. The mpseudo parameter can be less . 4. After that. More pointers are then set for neoclassical calculations. E is only needed at the quadrature points. The resulting nonlinear ideal E is then transformed to Fourier components (see Sec B. The int array has 5 dimensions.3. The first nonlinear computation appears after the neoclassical_init select block. The vector aspect (in a linear algebra sense) of the result implies one set of basis function indices in the int array. 2. where E may contain ideal mhd.:. jmode. Z.1:mpseudo.iv)= nα j v R αiv αiv + R ∂R ×ziso inαj v R αiv αiv + R ∂R which appears through a transpose of the (3. • Other vector-differential operations acting on αj el exp (inφ) are needed elsewhere. φ). math_curl is used to find the corresponding current density.:. and the math_tran routine. resistive. Separate real storage is created and saved on all processors (see Sects F& C). FORTRAN IMPLEMENTATION int(1. Then.im). so that real_be has dimensions (1:3. As with the matrix integrand. The V and B data is transformed to functions of φ. where im is the Fourier index (imode). the brhs_mhd routine finds ×(αj el exp (−in φ))· ˆ E(R.ber. [compare with − from the scalar product on pg 41] Notes on matrix integrands: • The Fourier component index. 3. where the cross product. The rhs integrands differ from matrix integrands in a few ways: 1.42 CHAPTER II. V ×B is determined. Returning to Faraday’s law as an example.

l ). we encounter the loop over test-function basis function ˆ indices (j . In some cases. They work by changing the appropriate rows of a linear system to (Aj ln j ln )(xj ln ) = 0.6 Dirichlet boundary conditions The boundary module contains routines used for setting Dirichlet (or ‘essential’) boundary conditions.C. Further development will likely be required if more complicated operations are needed in the computations. A loop over the Fourier index follows the pseudospectral computations. C. Then. completing . except × (αj el exp (−inφ)) is replaced by the local E. so one should not relate the poloidal index with the 2D poloidal indices used with Fourier components. ˆ The integrand routines used during iterative solves of 3D matrices are very similar to rhs integrand routines. C.(Vs × B + (V × Bs + (V × B) (see Sec A. and the interpolate is left in local arrays. The linear ideal E terms are created using the data for the specified steady solution. and there is a sign change for − × (αj el exp (−in φ)) · E. and the seam data is not carried below the finite_element level. the surface integration is called one cell-side at a time from get_rhs. pres.5 Surface Computations The surface and surface_int modules function in an analogous manner to the volume integration and integrand modules. FINITE ELEMENT COMPUTATIONS 43 than the number of cells in a block due to domain decomposition (see Sec C). not pointers. Thus. It is passed to a surface integrand. For Dirichlet conditions on the normal component. for example. At present.1). It is therefore interpolated to the quadrature locations in the integrand routine with a generic_all_eval call.5) . The terms are similar to those in curl_de_iso. One important difference is that not all border segments of a block require surface computation (when there is one). 1D basis function information for the cell side is generated on the fly in the surface integration routines. Only those on the domain boundary have contributions. the resistive and neoclassical terms are computed.4) where j is a basis function node along the boundary. a linear combination is set to zero. They are used to find the dot product of a 3D matrix and a vector without forming matrix elements themselves (see Sec D. The only noteworthy difference with rhs integrands is that the operand vector is used only once. Near the end of the routine. and l and n are the direction-vector and Fourier component indices. respectively.2). for example. and presz are local arrays. the rhs of Ax = b is modified to ˆ ˆ b (I − nj nj ) · b ≡ ˜ (II.presr. In p_aniso_dot. (II.

Notes: • Changing boundary node equations is computationally more tractable than changing the size of the linear system. It would also be straightforward to make the ‘component’ information part of the seam structure. the dirichlet_op routines would need modification to be consistent with the rhs. If the conditions do not change in time. the error is no more than O(∆t) in any case. . • Removing coefficients from the rest of the system [through ·(I − nn) on the right side of ˆˆ A] may seem unnecessary. as implied by finite element theory. However. The dirichlet_op and dirichlet_comp_op routines perform the related matrix operations described above.8) The net effect is to apply the boundary condition and to remove xj n from the rest of the linear system. The ‘component’ character input is a flag describing which compoˆˆ nent(s) should be set to 0. it preserves the symmetric form of the operator. so that it could vary along the boundary independent of the block decomposition.7) (II. the existing routines can be used for inhomogeneous Dirichlet conditions. provided that the time rate of change does not appear explicitly. norm. However. which is important when using conjugate gradients. Since our equations are constructed for the change of a field and not its new value. The dirichlet_rhs routine in the boundary module modifies a cvector_type through operations like (I − nn)·. The seam data tang. The matrix becomes ˆ ˆ ˆ ˆ ˆ ˆ ˆ (I − nj nj ) · A · (I − nj nj ) + nj nj Dotting nj into the modified system gives ˆ ˆ xj n = nj · x = 0 Dotting (I − nj nj ) into the modified system gives ˆ ˆ ˆ ˆ ˜ b (I − nj nj ) · A · x = ˜ (II. the coding is involved. Some consideration for time-centering the boundary data may be required. However. one only needs to comment out the calls to dirichlet_rhs at the end of the respective management routine and in dirichlet_vals_init. different component input could be provided for different blocks. FORTRAN IMPLEMENTATION where nj is the unit normal direction at boundary node j. Since the dirichlet_rhs is called one block at a time. For time-dependent inhomogeneous conditions. too.6) (II.44 CHAPTER II. At present. one can adjust the boundary node values from the management routine level. Though mathematically simple. then proceed with homogeneous conditions in the system for the change in the field. and intxy(s) for each vertex (segment) are used to minimize the amount of computation. the kernel always applies the same Dirichlet conditions to all boundary nodes of a domain. since it has to address the offset storage scheme and multiple basis types for rblocks.

but VRn . the regular_ave routine is used to set Vφ = iV− at the appropriate nodes. At R = 0 nodes only.’ approach because there is no surface with . y = R sin φ transformation and derivatives into the 2D Taylor series expansion in (x. y) about the origin. To preserve symmetry with nonzero V− column entries.C. The regularity conditions may be derived by inserting the x = R cos φ. After the linear system is solved.11) The entire functional form of the regularity conditions is not imposed numerically. Returning to the power series behavior for R → 0. and regular_comp_op routines modify the linear systems to set Sn = 0 at R = 0 for n > 0 and to set VRn and Vφn to 0 for n = 0 and n ≥ 2. (II.7 Regularity conditions When the input parameter geometry is set to ‘tor’. The phase relation for VR1 and Vφ1 at R = 0 is enforced by the following steps: 1.e. φ) coordinate system. phase information for VR1 and Vφ1 is redundant. With the cylindrical (R. VR1 . For vectors. (II. Z. the Z-direction component satisfies the relation for scalars. change variables to V+ = (Vr + iVφ )/2 and V− = (Vr − iVφ )/2. Vφn ∼ R|n−1| ψ(R2 ) for n ≥ 0. Instead. at R = 0. add −i times the Vφ -row to the VR-row and set all elements of the Vφ -row to 0 except for the diagonal. our Fourier coefficients must satisfy regularity conditions so that fields and their derivatives approach φ-independent values at R = 0. The conditions are for the limit of R → 0 and would be too restrictive if applied across the finite extent of a computational cell. set all elements in a V+ column to 0. and Vφ is the vanishing radial derivative. Unfortunately. we focus on the leading terms of the expansions. the leading order behavior for S0 .10) Furthermore.9) where ψ is a series of nonnegative powers of its argument. Set V+ = 0. not toroidal. 3. The regular_vec. The routines in the regularity module impose the leading-order behavior in a manner similar to what is done for Dirichlet boundary conditions. We take φ to be 0 at R = 0 and enforce Vφ1 = iVR1 . we cannot rely on a ‘natural B. 2. and the left side of the grid has points along R = 0. It is straightforward to show that the Fourier φ-coefficients of a scalar S(R. This is like a Neumann boundary condition. FINITE ELEMENT COMPUTATIONS 45 C. i.C. (II. φ) must satisfy Sn(φ) ∼ R|n| ψ(R2 ). the domain is simply connected – cylindrical with axis along Z. regular_op.

w ∂αj ∂αj dV 2 . D.1 2D matrices The iter_cg_f90 and iter_cg_comp modules contain the routines needed to solve symmetricpositive-definite and Hermitian-positive-definite systems. Although normal seam communication is used during matrix_vector multiplication. Within the long iter_cg_f90. once one understands NIMROD’s vector_type manipulation. unlike matrix_vector product routines.f files are separate modules for routines for different preconditioning schemes. We should keep this in mind if a problem arises with S0 (or VZ0 ) near R = 0.12) R ∂R ∂R saved in the dr_penalty matrix structure. FORTRAN IMPLEMENTATION finite area along an R = 0 side of a computational domain.] D Matrix Solution At present. The regular_op and regular_comp_op routines add this penalty term before addressing the other regularity conditions. Instead. and a module for managing routines that solve a system(iter_cg_*). and may be compared with textbook descriptions of cg. there are also solver-specific block communication operations. and the matrix_type storage arrays have been arranged to optimized matrix-vector multiplication. The iter_cg module holds interface blocks to give common names and the real and complex routines. we add the penalty matrix. The iter_mat_com routines execute this communication using special functionality in edge_seg_network for the off-diagonal elements. After the partial factors are found and stored in the matrix_factor_type structure. Its positive value ∂ penalizes changes leading to ∂R = 0 in these elements. The rest of the iter_cg. [Looking through the subroutines. since it is added to linear equations for the changes in physical fields. (II.f file has external subroutines that address solver-specific block-border communication operations. They perform the basic conjugate gradient steps in NIMROD’s block decomposed vector_type structures. The weight w is nonzero in elements along R = 0 only. The basic cg steps are performed by iter_solve_*. NIMROD relies on its own iterative solvers that are coded in FORTRAN 90 like the rest of the algorithm. a module for managing partial factorization(iter_*_fac). with suitable normalization to matrix rows for S0 . The partial factorization routines for preconditioning need matrix elements to be summed across block borders(including off-diagonal elements connecting border nodes). it appears that the penalty has been added for VR1 and Vφ1 only. VR1 and Vφ1 . It is not diffusive. respectively. . border elements of the matric storage are restored to their original values by iter_mat_rest.f and iter_cg_comp.46 CHAPTER II.

To avoid these issues. Achieving parallel scaling would be very challenging. which inverts local direction vector couplings only. They also make coupling to external library solver packages somewhat challenging. and cell-interior data may or may not be eliminated. results from block-based partial-factors(the “direct”. effective for mid-range condition numbers(arising roughly when ∆texplicit−limit Neither of these two schemes has been updated to function with higher-order node data. In ˜ ˜ the preconditioning step. were the resistivity in our form if Faraday’s law on p. and the latter has its own parallel communication(see D). then multiply z by ave_factor_pre before summing. φ) µ0 × (¯j l e−in φ ) · α × (¯j l einφ ) α (II. The block and global line-Jacobi schemes use 1D solves of couplings along logical directions. inverting A. “bl_ilu_*”. D. simply multiplying border z-value by ave_factor and seaming after preconditioning destroys the symmetry of the operation(and convergence). we symmetrize the averaging by multiply border elements of r by √ ˜ ave_factor_pre(∼ ave factor). Z. Storage requirements for such a matrix would have to grow as the number of Fourier components is increased. The Fourier representation leads to matrices that are dense in n-index when φ-dependencies appear in the lhs of an equation. it would generate off-diagonal in n contributions: Bj ln j ln dRdZdφ η(R. but they complicate the preconditioning operations. Computations with both block types and solver = ‘diagonal’ will use the specified solver in rblocks and ‘diagonal’ in tblocks. These idiosyncracies have little effect on the basic cg steps. MATRIX SOLUTION 47 Other seam-related oddities are the averaging factors for border elements. Computationally. The routine is simple. There is also diagonal preconditioning. but it has to account for redundant storage of coefficients along block borders. Each preconditioner has its own set of routines and factor storage. even with parallel decomposition.2 3D matrices The conjugate gradient solver for 3D systems is mathematically similar to the 2D solvers. However.D. hence ave_factor. where the matrix-vector products needed for cg iteration are formed directly from rhs-type finite element computation. Instead. we use a ‘matrix-free’ approach. This simple scheme is the only one has functions in both rblocks and tblocks. except for the global and block line-Jacobi algorithms which share 1D matrix routines. The block-direct option uses Lapack library routines. and “bl_diag*” choices) are averaged at block borders. it is quite different.? a function of φ. the solve of Az = r where A is an approximate A and r is the residual. For example.13) which may be nonzero for all n . degenerate points in rblocks. The scalar product of two vector_type is computed by iter_dot. The block-incomplete factorization options are complicated but ∆t τA ). . NIMROD grids may have periodic rblocks.

The data is converted to ˆ cylindrical components after evaluating at quadrature point locations. Finally. the product is then the sum of the finite element result and a 2D-matrix-vector multiplication. The Bj ln data are coefficients of the operand. Thus. Further.] Allocation and initialization of storage are primary tasks of the start-up routines located in file nimrod_init. iter_3d_cg solves systems for changes in fields directly.14) Identifying the term in brackets as µ0J(R. Adding new variables and matrices is usually a straightforward task of identifying an existing similar data structure and copying and modifying calls to allocation routines. The solution field at the beginning of the time split is passed into iter_3d_cg_solve to scale norms appropriately for the stop condition. [The storage modules and structure types are described in Sec. . iter_3d_solve calls get_rhs in finite element. and the e3 of ja_eq is the contravariant Jφ/R. passing a dot-product integrand name.f. B. but when they are associated with finite-element structures. • Initialization routines have conditional statements that determine what storage is created for different physics model options. The quadrature_save routine allocates and initializes quadrature_point storage for the steady-state (or ’equilibrium’) fields. The variable_alloc routine creates quadrature-point storage with calls to rblock_qp_alloc and tblock_qp_alloc. instead of calling an explicit matrixvector product routine. The scheme uses 2D matrix structures to generate partial factors for preconditioning that do not address n → n coupling. FORTRAN IMPLEMENTATION We can switch the summation and integration orders to arrive at dRdZdφ × (¯j l e−in φ ) · α η(R. E Start-up Routines Development of the physics model often requires new *block_type and matrix storage. φ). calls to generic_all_eval and math_curl create µ0Jn. variable_alloc creates vector_type structures used for diagnostic computations. This integrand routine uses the coefficients of the operand in finite-element interpolations. and it initializes quadrature-point storage for dependent fields. There are also lagr_quad_alloc and tri_linear_alloc calls to create finite-element structures for nonfundamental fields (like J which is computed from the fundamental B field) and work structures for predicted fields. Z. Additional field initialization information: • The e3 component of be_eq structures use the covariant component (RBφ) in toroidal ˆ geometry. The cylindrical components are saved in their respective qp structures. φ) [ µ0 Bj ln j ln × (¯j l einφ )] α (II. Z. shows how the product can be found as a rhs computation.48 CHAPTER II. the curl of the interpolated and FFT’ed B. It also accepts a separate 2D matrix structure to avoid repetition of diagonal in n operations in the dot-integrand.

and rates of advection are found from |v · xi /|xi|2 | in each cell. over the φ-direction. Velocity vectors are averaged over all nodes in an element. and we compute new matrices when the change exceeds a tolerance (ave_change_limit).] The ave_field_check subroutine uses the vector_type pointers to facilitate the test. Any new structure allocations can be coded by copying and modifying existing allocations. we determine how much the fields have changed sicne the last matrix update. where xi is a vector displacement across the cell in the i-th logical coordinate direction. C). The new_dt and ave_field_check subroutines are particularly important. The coefficient for the isotropic part of the operator is based on the maximum difference between total pressures and the steady_plus_n=0 pressures.f. Parallel communication is required for domain decomposition of the Fourier components (see Sec. computing matrix elements and finding partial factors fro preconditioning are computationally intensive operations.F. UTILITIES 49 • The pointer_init routine links the vector_type and finite element structures via pointer assignment. • The boundary_vals_init routine enforces the homogeneous Dirichlet boundary conditions on the initial state.f file has subroutines for operations that are not encompassed by finite _element and normal vector_type operations. New fields will need to be added to this list too. Before leaving utilities. The computation requires information from . which is used for the 2D preconditioner bb matrix for anisotropic thermal conduction. let’s take a quick look at find_bb. though not elegantly coded. It also considers the grid-vertex data only. Teh new_dt routine computes the rate of flow through mesh cells to determine if advection should limit the time step. flags such as b0_changed in the global module are set to true. and *_matrix_fac_degen subroutines. Thus. To avoid calling these operations during every time step. subroutines. assuming it would not remain fixed while other coefficients change. iter_fac_alloc. The matrix_init routine allocates matrix and preconditioner storage structures. However. The ave_field_check routine is an important part of our semi-implicit advance. Operations common to all matrix allocations are coded in the *_matrix_init_alloc. A similar computation is performed for electron flow when the two-fluid model is used. F Utilities The utilities. The ‘linear’ terms in the operators use the steady-state fields and the n = 0 part of the solution. [Matrices are also recomputed when ∆t changes. Selected calls can be commented out if inhomogeneous conditions are appropriate. and the storage arrays for the n = 0 fields or nonlinear pressures are updated. This routine is used to find the toroidally symmetric part of the ˆˆ dyad. both the ‘linear’ and ‘nonlinear’ parts of the semi-implicit operator change in time. If the tolerance is exceeded.

This routine decides how much storage is required. First. except extrap_init. when adding a new equation. so it cannot occur in a matrix integrand routine. ˆˆ is created and stored at quadrature point locations. Code development rarely requires modification of these routines.f. and it creates an index for locating the data for each advance. for example) use rblock and tblock numerical integration of the integrands coded in the diagnostic_ints module.50 CHAPTER II. [Moving the jmode loop from matrix_create to matrix integrands is neither practical nor preferrable. They ’scatter’ to any vector_type storage provided by the calling routine.] Thus. providing the initial guess for an iterative linear system solve. A description of the parameters should be provided if other users will have .] The diagnostic_ints integrands differ in that there are no αj l e−in φ test functions. G Extrap_mod The extrap_mod module holds data and routines for extrapolating solution fields to the new time level. Diagnostics requiring surface or volume integrals (toroidal flux and kinetic energy. increase the dimension fo extrap_q. extrap_nq. consistent with an integrandbb type of computation. The numerical integration and integrands differ only slightly from those used in forming systems for advancing the solution. The routine uses the mpi_allreduce command to sum contributions from different Fourier ’layers’. FORTRAN IMPLEMENTATION all Fourier components. but separate from the finite-element hierarchy. [The *block_get_rhs routines have flexibility for using different basis functions through a simulation. ¯ The integrals are over physical densities. and extrap_int and define the new values in extrap_init. the cell_rhs and cell_crhs vector_types are allocated with poly_degree=0. using existing code for examples is helpful. Routines like find_bb may become common if we find it necessary to use more 3D linear systems in our advances. The first part of the input module defines variables and default values. The amount of data saved depends on the extrap_order input parameter. Again. Therefore. The output from probe_hist is presently just a collection of coefficients from a single grid vertex.f (in the nimset directory and an addition to parallel. This allocates a single cell-interior basis and no grid or side bases. I I/O Coding new input parameters requires two changes to input. H History Module hist_mod The time-dependent data written to XDRAW binary or text files is generated by the routines in hist_mod.

These calls can be copied and modified to add new fields. Data is transferred to other processors via MPI communication coded in parallel_io. so that it can be read from a nimrod.f files are different to avoid parallel data requirements in NIMSET. [But.I. Be sure to use another mpi_bcast or bcase_str with the same data type as an example.in to all others. Changes to dump files are made infrequently to maintain compatibility across code versions to the greatest extent possible. . rblocks (* write rblock. • All data is written as 8-byte real to improve portability of the file while using FORTRAN binary I/O. • Only one processor reads and writes dump files. However. • The NIMROD and NIMSET dump. the parameter must be added to the appropriate namelist in the read_namelist subroutine. Any changes must be made to both files. data structures are written and read with generic subroutines. * read tblock) Within the rblock and tblock routines are calls to structure-specific subroutines. which makes it easy to add fields. The overall layout of a dump file is 1. * read rblock) 4.f is just to add the new parameterto the list in broadcast_input.] Other dump notes: • The read routines allocate data structures to appropriate sizes just before a record is read.f. I/O 51 access to the modified code. In addition. seams (dump_write_seam. tblocks (* write tblock. This sends the read values from the single processor that reads nimrod. A new parameter should be defined near related parameters in the file.in file. global information (dump_write and dump_read) 2. The change required in parallel. don’t expect compatibility with other dump files. dump_read_seam) 3.

52 CHAPTER II. FORTRAN IMPLEMENTATION .

f for an MPI example. The first statements in the main program establish communication between the different processes. A well conceived implementation (thanks largely to Steve Plimpton of Sandia) keeps most parallel communication in modular coding. needed in every process is the sum over all n-values. the independence of the processes is not. processes determine what part of the simulation to perform (in parallel_block_init). i. a basic understanding of the domain decomposition is necessary for successful development. ‘high-performance’ computing implies parallel computing. except when it encounters calls to communicate with other processes. From this information and that read from the input and dump files. Each process then performs its part of the simulation as if it were performing a serial computation. Z) for a potentially limited range of n-values and grid blocks. and all developers will probably write at least one mpi_allreduce call at some time. Z)ˆn (R.e. A detailed description of NIMROD’s parallel communication is beyond the scope of this tutorial. ˆˆ for each grid block assigned to a 53 . lets return to the find_bb routine in utilities. isolated from the physics model coding. so NIMROD has always been developed as a parallel code. The program itself is responsible for orchestration without an active director. However. Each process is finding ˆn(R. Before describing the different types of domain decomposition used in NIMROD. This fact was readily apparent at the inception of the NIMROD project. and tells each process the total number of processes. When mpirun is used to launch a parallel simulation. What’s b b n bb. it starts multiple processes that execute NIMROD.Chapter III Parallelization A Parallel communication introduction For the present and forseeable future. While the idea of dividing a simulation into multiple processes is intuitive. assign an integer label (node) to each process. The distributed memory paradigm and MPI communication were chosen for portability and efficiency.

The rb(ibl)%id and tb(ibl)%id descriptors can be used to map a processor’s local block index (ibl) to the global index (id) defined in nimset. the parallel_seam_init routine in parallel. the nrbl and nbl values are the number of rblocks and rblocks+tblocks in a process.C. each process proceeds independently until its next MPI call. this is just a matter of having each processor choose a subset of the blocks.f creates new data structures during start-up. are saved in the global module. Then. Block to block communication between different processes is handled within the edge_network and edge_seg_network subroutines. In fact. These routines call parallel_seam_comm and related routines instead of the serial edge_accumulate. This need is met by calling mpi_allreduce within grid_block do loops. . regardless of parallel decomposition. or starting from nrbl+1 and ending at nbl. Decomposition of the data itself is facilitated by the FORTRAN 90 data structures. nrbl_total and nbl_total.54 CHAPTER III. the block subsets are disjoint. but the array at the highest level is over processes with which the local process shares block border information. Each process has an rb and/or tb block_type array but the sum of the dimensions is the number of elements in its subset of blocks. The parallel_seam_comm routine (and its complex version) use MPI ‘point to point’ communication to exchange information. parallel simulations assign new block index labels that are local to each process. These send and recv structures are analogous to seam structures. cross block communication is invoked with one call. For grid block decomposition. For those interested in greater details of the parallel block communication. Issue a mpi_irecv to every process in the exchange.[pardata has comment lines defining the function of each of its variables. Glancing through finite_element. In routines performing finite element or linear algebra computations. This means that each process issues separate ‘send’ and ‘receive’ request to every other process involved in the communication. one sees do-loops starting from 1 and ending at nrbl.] The global index limits. The general sequence of steps used in NIMROD for ‘point to point’ communications is 1. Within a layer of Fourier components (described in III. which indicates readiness to accept data. The resulting array is returned to all processes that participate in the communication. PARALLELIZATION process. parallel_block_init sets up the inverse mapping global2local array that is stored in pardata.f. B Grid block decomposition The distributed memory aspect of domain decomposition implies that each processor needs direct access to a limited amount of the simulation data. Thus. Here an array of data is sent by each process. and the MPI library routine sums the data element by element with data from other processes that have difference Fourier components for the same block.).

g.) instead of the entire array. Once verification is complete. and the other creates a tag grouping all processes with the same subset of blocks. . The difficulty is overcome by passing the first element....for the same layer as the local process . One creates a communication tag grouping all processes in a layer. Others appear after scalar products of vectors in 2D iterative solves: different layers solve different systems simultaneously. C Fourier “layer” decomposition By this point it’s probably evident that Fourier components are also separated into disjoint subsets. the matrices would always be 2D. CALL mpi_irecv(recv(irecv)%data(1).. 3. Process groups are established by the mpi_comm_split routine. Caution: calls to nonblocking communication (mpi_isend. As a prelude to the next section.mpi_irecv.to the variable inode. These tags are particularly useful for collective communication within the groups of processes. 4. Separating processes by some condition (e. called layers.. and Fourier component interaction would occur in explicit pseudospectral operations only. Perform some serial work. comm_mode. like seaming among different blocks of the local process to avoid wasting CPU time while waiting for data to arrive.g. An example statement inode=block2proc(ibl_global assigns the node label for the process that owns global block number ibl_global ..) seem to have difficulty with buffers that are part of F90 structures when the mpich implementation is used. The new 3D linear systems preserve layer decomposition by avoiding explicit computation of matrix elements that are not diagonal in Fourier index. There are two in parallel_block_init. block border communication only involves processes assigned to the same Fourier layer. Issue a mpi_send to every process involved which sends data from the local process. The find_bb mpi_allreduce was one example. the mpi_comm_world tag appears. the process can continue on to other operations. comm_layer. same layer) is an important tool for NIMROD due to the two distinct types of decomposition.C. e.. FOURIER “LAYER” DECOMPOSITION 55 2. The motivation is that a large part of the NIMROD algorithm involves no interaction among different Fourier components. but different layers. or where ‘point to point’ communication (which references the default node index) is used. Where all processes are involved. This issue is addressed in the block2proc mapping array. The sum across blocks involves processes in the same layer only. %data. As envisioned until recently... Use mpi_waitany to verify that expected data from each participating process has arrived. hence calls to mpi_allreduce with the comm_layer tag.

isolating it from the integrand routines. the decomposition of the domain is changed before an “inverse” FFT (going from Fourier coefficients to values at toroidal positions ) and after a “forward” FFT. . PARALLELIZATION Configuration!space Decomp. communication among layers is required to multiply functions of toroidal angle (see B. For illustrative purposes. we can consider a single-block problem without loss of generality. • Communication is carried out inside fft_nim. mpseudo=5 on il=0 and mpseudo=4 on il=1 Parameters are mx=my=3. without redundancy. consider a poorly balanced choice of lphi= 3.56 CHAPTER III. Furthermore. and nlayers= 2.1). nlayer=2. The scheme lets us use serial FFT routines. 0 ≤ n ≤nmodes_total−1 with nmodes_total= 3. nf=mx×my=9 Config-space Decomp: nphi=2lphi = 8.1) Notes: • The domain swap uses the collective mpi_alltoallv routine. il=1 il=0 Normal Decomp il=1 n=2 iphi=2 il=0 n=1 iphi=1 n=0 Figure III. lphi=3 Apart from the basic mpi_allreduce exchanges.1: Normal Decomp: nmodes=2 on il=0 and nmodes=1 on il=1. (see Figure III. and it maintains a balanced workload among the processes. (layers must be divisible by nlayers.) Since pseudo-spectral operations are performed on block at a time.

Krylov-space iterative methods require very many iterations to reach convergence in these conditions.D. (see our 1999 APS poster. point-to-point communication is called (from parallel_line_* routines) instead of collective communication. lines are extended across block borders.. Each approximate system ˜ Az = r is solved by inverting 1D matrices.. D Global-line preconditioning That ill-conditional matrices arise at large δt implies global propagation of perturbations within a single time advance.) .” on the web site. However. To achieve global relaxation. GLOBAL-LINE PRECONDITIONING 57 • calls to fft_nim with nr = nf duplicate the entire block’s config-space data on every layer. The most effective method we have found for cg on very ill-conditioned matrices is a line-Jacobi approach. They are defined by eliminating off-diagonal connections in the logical s-direction for one approximation and by eliminating in the n-direction for the other. “Recent algorithmic . Factoring and solving these systems uses a domain swap. unless the precondition step provides global “relaxation”. We actually invert two approximate matrices and average the result. not unlike that used before pseudospectral computations.

58 CHAPTER III. PARALLELIZATION .

Marder. of Docs. A. Phys. 1987. S. c [8] B.. editors. Lionello. J. and J. A new semi-implicit approach for MHD computation. Comput. 1991. 97:444. [2] G. E. Glasser. T. J. Semic implicit magnetohydrodynamic calculations. C. Krall and Trivelpiece. J. H. C. Redwine.. and E.O. A. Z. A. J. 1986. J. S. 1989. and the NIMROD Team. Harned. J. An analysis of the finite element method. U. M. Douglas S.. A. Miki´. Comput. S.Bibliography [1] N. D. Phys. Gianakon. Kruger. Springer. Schnack. Phys. Barnes. J. 195:355–386. Tarditi. Barnes.. 68:48. Stegun.S. Accurate semi-implicit treatment of the Hall effect in MHD c computations. D.: National Bureau of Standards: For sale by the Supt. Handbook of Mathematical Functions. Phys.. Caramana. D. R. Upgrading to Fortran 90. Miki´. A method for incorporating Gauss’ law into electromagnetic PIC codes. Miki´. Nonlinear magnetohydrodynamics simulation using high-order finite elements. Wesley-Cambridge Press. D. Washington. Abramowitz and I. A. Strang and G. 70:330–354. [7] R. A. 1987. J. Fix.P. 1995. R. D. Linker. Phys. 1981. A. [4] D. Lerbinger and J. Z. Principles of Plasma Physics. Plimpton. [6] D. [5] K. Chu. 152:346.. 59 . 83:1. San Francisco Press. [10] C. Nebel. S. 2004. Sovinec. F. Comput. [3] M. Harned and Z. G. 1999.. Luciani. C.. Comput. 1964. 1987. Schnack. J. Comput. Comput. D. [9] C. Phys.

Sign up to vote on this title
UsefulNot useful