This action might not be possible to undo. Are you sure you want to continue?
A Tutorial on NIMROD Physics Kernel
Code Development
Carl Sovinec
University of Wisconsin–Madison
Department of Engineering Physics
August 1–3, 2001
Madison, Wisconsin
Contents
Contents i
Preface 1
I Framework of NIMROD Kernel 3
A Brief Reviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
A.1 Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
A.2 Conditions of Interest . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
A.3 Geometric Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 6
B Spatial Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
B.1 Fourier Representation . . . . . . . . . . . . . . . . . . . . . . . . . . 7
B.2 Finite Element Representation . . . . . . . . . . . . . . . . . . . . . . 8
B.3 Grid Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
B.4 Boundary Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
C Temporal Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
C.1 Semiimplicit Aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
C.2 Predictorcorrector Aspects . . . . . . . . . . . . . . . . . . . . . . . 15
C.3 Dissipative Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
C.4 ∇· B Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
C.5 Complete time advance . . . . . . . . . . . . . . . . . . . . . . . . . . 16
D Basic functions of the NIMROD kernel . . . . . . . . . . . . . . . . . . . . . 18
II FORTRAN Implementation 19
A Implementation map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
B Data storage and manipulation . . . . . . . . . . . . . . . . . . . . . . . . . 19
B.1 Solution Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
B.2 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
B.3 Vector_types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
B.4 Matrix_types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
B.5 Data Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
B.6 Seams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
i
ii CONTENTS
C Finite Element Computations . . . . . . . . . . . . . . . . . . . . . . . . . . 34
C.1 Management Routines . . . . . . . . . . . . . . . . . . . . . . . . . . 34
C.2 Finite_element Module . . . . . . . . . . . . . . . . . . . . . . . . . 36
C.3 Numerical integration . . . . . . . . . . . . . . . . . . . . . . . . . . 36
C.4 Integrands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
C.5 Surface Computations . . . . . . . . . . . . . . . . . . . . . . . . . . 43
C.6 Dirichlet boundary conditions . . . . . . . . . . . . . . . . . . . . . . 43
C.7 Regularity conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
D Matrix Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
D.1 2D matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
D.2 3D matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
E Startup Routines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
F Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
G Extrap_mod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
H History Module hist_mod . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
I I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
III Parallelization 53
A Parallel communication introduction . . . . . . . . . . . . . . . . . . . . . . 53
B Grid block decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
C Fourier “layer” decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . 55
D Globalline preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
List of Figures
I.1 continuous linear basis function . . . . . . . . . . . . . . . . . . . . . . . . . 9
I.2 continuous quadratic basis function . . . . . . . . . . . . . . . . . . . . . . . 9
I.3 input geom=‘toroidal’ & ‘linear’ . . . . . . . . . . . . . . . . . . . . . . 10
I.4 schematic of leapfrog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
II.1 Basic Element Example. (n_side=2, n_int=4) . . . . . . . . . . . . . . . . . 22
II.2 1D Cubic : extent of α
i
in red and α
j
in green, integration gives joff= −1 25
II.3 1D Cubic : extent of α
i
in red and β
j
in green, note j is an internal node . . 26
II.4 1D Cubic : extent of β
i
in red and β
j
in green, note both i, j are internal nodes 26
II.5 iv=vertex index, is=segment index, nvert=count of segment and vertices . 30
II.6 3 blocks with corresponding vertices . . . . . . . . . . . . . . . . . . . . . . . 31
II.7 Blockborder seaming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
II.8 4 quad points, 4 elements in a block, dot denotes quadrature position, not node 38
III.1 Normal Decomp: nmodes=2 on il=0 and nmodes=1 on il=1, nf=mx×my=9
Conﬁgspace Decomp: nphi=2
lphi
= 8, mpseudo=5 on il=0 and mpseudo=4 on
il=1
Parameters are mx=my=3, nlayer=2, lphi=3 . . . . . . . . . . . . . . . . . . 56
iii
iv LIST OF FIGURES
Preface
The notes contained in this report were written for a tutorial given to members of the
NIMROD Code Development Team and other users of NIMROD. The presentation was
intended to provide a substantive introduction to the FORTRAN coding used by the kernel, so
that attendees would learn enough to be able to further develop NIMROD or to modify it
for their own applications.
This written version of the tutorial contains considerably more information than what
was given in the three halfday sessions. It is the author’s hope that these notes will serve
as a manual for the kernel for some time. Some of the information will inevitably become
dated as development continues. For example, we are already considering changes to the
timeadvance that will alter the predictorcorrector approach discussed in Section C.2. In
addition, the 3.0.3 version described here is not complete at this time, but the only diﬀerence
reﬂected in this report from version 3.0.2 is the advance of temperature instead of pressure
in Section C.5. For those using earlier versions of NIMROD (prior to and including 2.3.4),
note that basis function ﬂexibility was added at 3.0 and data structures and array ordering
are diﬀerent than what is described in Section B.
If you ﬁnd errors or have comments, please send them to me at sovinec@engr.wisc.edu.
1
2 LIST OF FIGURES
Chapter I
Framework of NIMROD Kernel
A Brief Reviews
A.1 Equations
NIMROD started from two generalpurpose PDE solvers, Proto and Proteus. Though rem
nants of these solvers are a small minority of the present code, most of NIMROD is modular
and not speciﬁc to our applications. Therefore, what NIMROD solves can and does change,
providing ﬂexibility to developers and users.
Nonetheless, the algorithm implemented in NIMROD is tailored to solve ﬂuidbased mod
els of fusion plasmas. The equations are the Maxwell equations without displacement current
and the single ﬂuid form (see [1]) of velocity moments of the electron and ion distribution
functions, neglecting terms of order m
e
/m
i
smaller than other terms.
Maxwell without displacement current:
Ampere’s Law:
µ
0
J = ∇× B
Gauss’s Law:
∇· B = 0
Faraday’s Law:
∂B
∂t
= −∇× E
Fluid Moments → Single Fluid Form
Quasineutrality is assumed for the time and spatial scales of interest, i.e.
• Zn
i
n
e
→n
3
4 CHAPTER I. FRAMEWORK OF NIMROD KERNEL
• e(Zn
i
− n
e
)E is negligible compared to other forces
• However ∇· E = 0
Continuity:
∂n
∂t
+V· ∇n = −n∇· V
Centerofmass Velocity Evolution:
ρ
∂V
∂t
+ ρV· ∇V = J ×B− ∇p − ∇· Π
• Π often set to ρν∇V
• Kinetic eﬀects, e.g., neoclassical, would appear through Π
Temperature Evolution (3.0.2 & earlier evolve p, p
e
)
n
α
γ − 1
∂
∂t
+V
α
· ∇
T
α
= −p
α
∇· V
α
− Π
α
: ∇V
α
−∇· q
α
+Q
α
• q
α
= −n
α
χ
α
· ∇T
α
in collisional limit
• Q
e
includes ηJ
2
• α = i, e
• p
α
= n
α
kT
α
Generalized Ohm’s (∼ electron velocity moment)(underbrace denotes input parameters, un
less otherwise noted ohms=)
E = −V× B
. .. .
‘mhd’
‘‘mhd & hall’
‘2fl’
+ηJ
....
elecd>0
+
1
ne
J × B
. .. .
‘hall’
‘mhd & hall’
‘2fl’
+
m
e
ne
2
∂J
∂t
+∇· (JV +VJ)
. .. .
advect=‘all’
−
e
m
e
(∇p
e
. .. .
‘2fl’
+∇· Π
. .. .
neoclassical
)
¸
¸
¸
¸
¸
¸
¸
A steadystate part of the solution is separated from the solution ﬁelds in NIMROD. For
resistive MHD for example, ∂/∂t →0 implies
A. BRIEF REVIEWS 5
1. ρ
s
V
s
· ∇V
s
= J
s
× B
s
− ∇p
s
+∇· ρ
s
ν∇V
s
2. ∇· (n
s
V
s
) = 0
3. ∇×E
s
= ∇× (ηJ
s
−V
s
× B
s
) = 0
4.
n
s
γ−1
V
s
· ∇T
s
= −p
s
∇· V
s
+∇· n
s
χ · ∇T
s
+Q
s
Notes:
• An equilibrium (GradShafranov) solution is a solution of 1 (J
s
× B
s
= ∇p
s
); 2 – 4
• Loading an ‘equilibrium’ into NIMROD means that 1 – 4 are assumed. This is appro
priate and convenient in many cases, but it’s not appropriate when the ‘equilibrium’
already contains expected contributions from electromagnetic activity.
• The steadystate part may be set to 0 or to the vacuum ﬁeld.
Decomposing each ﬁeld into steady and evolving parts and canceling steady terms gives
(for MHD)
[A →A
s
+A]
∂n
∂t
+ V
s
· ∇n +V· ∇n
s
+V· ∇n = −n
s
∇· V− n∇· V
s
− n∇· V
(ρ
s
+ ρ)
∂V
∂t
+ ρ [(V
s
+ V) · ∇(V
s
+ V)] + ρ
s
[V
s
· ∇V+ V· ∇V
s
+ V· ∇V]
= J
s
× B+J × B
s
+J × B−∇p +∇· ν [ρ∇V
s
+ ρ
s
∇V + ρ∇V]
∂B
∂t
= ∇×(V
s
×B+ V× B
s
+V× B− ηJ)
(n + n
s
)
γ −1
∂T
∂t
+
n
γ −1
[(V
s
+ V) · ∇(T
s
+T)] +
n
s
γ − 1
[V
s
· ∇T + V· ∇T
s
+V· ∇T]
= −p
s
∇· V− p∇· V
s
− p∇· V
+∇· [n
s
χ · ∇T + nχ · T
s
+nχ · ∇T]
Notes:
6 CHAPTER I. FRAMEWORK OF NIMROD KERNEL
• The code further separates χ (and other dissipation coeﬃcients ?) into steady and
evolving parts.
• The motivation is that timeevolving parts of ﬁelds can be orders of magnitude smaller
than the steady part, and linear terms tend to cancel, so skipping decompositions would
require tremendous accuracy in the large steady part. It would also require complete
source terms.
• Nonetheless, decomposition adds to computations per step and code complexity.
General Notes:
1. Collisional closures lead to local spatial derivatives only. Kinetic closures lead to
integrodiﬀerential equations.
2.
∂
∂t
equations are the basis for the NIMROD timeadvance.
A.2 Conditions of Interest
The objective of the project is to simulate the electromagnetic behavior of magnetic conﬁne
ment fusion plasmas. Realistic conditions make the ﬂuid equations stiﬀ.
• Global magnetic changes occur over the global resistive diﬀusion time.
• MHD waves propagate 10
4
– 10
10
times faster.
• Topologychanging modes grow on intermediate time scales.
* All eﬀects are present in equations suﬃciently complete to model behavior in experi
ments.
A.3 Geometric Requirements
The ability to accurately model the geometry of speciﬁc tokamak experiments has always
been a requirement for the project.
• Poloidal cross section may be complicated.
• Geometric symmetry of the toroidal direction was assumed.
• The Team added:
– Periodic linear geometry
– Simplyconnected domains
B. SPATIAL DISCRETIZATION 7
B Spatial Discretization
B.1 Fourier Representation
A pseudospectral method is used to represent the periodic direction of the domain. A
truncated Fourier series yields a set of coeﬃcients suitable for numerical computation:
A(φ) = A
0
+
N
¸
n=1
A
n
e
inφ
+A
∗
n
e
−inφ
or
A(z) = A
0
+
N
¸
n=1
A
n
e
i
2πn
L
z
+ A
∗
n
e
−i
2πn
L
z
All physical ﬁelds are real functions of space, so we solve for the complex coeﬃcients, A
n
,
n > 0, and the real coeﬃcient A
0
.
• Linear computations ﬁnd A
n
(t) (typically) for a single nvalue. (Input ‘lin_nmax’,
‘lin_nmodes’)
• Nonlinear simulations use FFTs to compute products on a uniform mesh in φ and z.
Example of nonlinear product: E = −V ×B
• Start with V
n
, B
n
, 0 ≤ n ≤ N
• FFT gives
V
m
π
N
, B
m
π
N
, 0 ≤ m ≤ 2N − 1
• Multiply (colocation)
E
m
π
N
= −V
m
π
N
× B
m
π
N
, 0 ≤ m ≤ 2N − 1
• FFT returns Fourier coeﬃcients
E
n
, 0 ≤ n ≤ N
Standard FFTs require 2N to be a power of 2.
• Input parameter ‘lphi’ determines 2N.
2N = 2
lphi
8 CHAPTER I. FRAMEWORK OF NIMROD KERNEL
• Collocation is equivalent to spectral convolution if aliasing errors are prevented.
• One can show that setting V
n
, B
n
, etc. = 0 for n >
2N
3
prevents aliasing from quadratic
nonlinearities (products of two functions).
nmodes total =
2
lphi
3
+ 1
• Aliasing does not prevent numerical convergence if the expansion converges.
As a prelude to the ﬁnite element discretization, Fourier expansions:
1. Substituting a truncated series for f (φ) changes the PDE problem into a search for
the best solution on a restricted solution space.
2. Orthogonality of basis functions is used to ﬁnd a system of discrete equations for the
coeﬃcients.
e.g. Faraday’s law
∂B(φ)
∂t
= −∇×E(φ)
∂
∂t
¸
B
0
+
N
¸
n=1
B
n
e
inφ
+ c.c.
¸
= −∇×
¸
E
0
+
N
¸
n=1
E
n
e
inφ
+ c.c.
¸
apply
dφe
−in
φ
for 0 ≤ n
≤ N
2π
∂B
n
∂t
= −
dφ
∇
pol
× E
n
+
in
ˆ
φ
R
×E
n
, 0 ≤ n
≤ N
B.2 Finite Element Representation
Finite elements achieve discretization in a manner similar to the Fourier series. Both are
basis function expansions that impose a restriction on the solution space, the restricted
spaces become better approximations of the continuous solution space as more basis functions
are used, and the discrete equations describe the evolution of the coeﬃcients of the basis
functions.
The basis functions of ﬁnite elements are loworder polynomials over limited regions of
the domain.
α
i
(x) =
0 x < x
i−1
x−x
i−1
x
i
−x
i−1
x
i−1
≤ x ≤ x
i
x
i+1
−x
x
i+1
−x
i
x
i
≤ x ≤ x
i+1
0 x > x
i+1
B. SPATIAL DISCRETIZATION 9
x
i!1
(x)
i
!
1
x x
i i+1
Figure I.1: continuous linear basis function
x
i!1
(x)
i
!
1
x x
i i+1
x
i!1
(x)
i
"
1
x x
i i+1
Figure I.2: continuous quadratic basis function
α
i
(x) =
0 x < x
i−1
(x−x
i−1
)(x−x
i−
1
2
)
(x
i
−x
i−1
)(x
i
−x
i−
1
2
)
x
i−1
≤ x ≤ x
i
(x−x
i+1
)(x−x
i+
1
2
)
(x
i
−x
i+1
)(x
i
−x
i+
1
2
)
x
i
≤ x ≤ x
i+1
0 x > x
i+1
β
i
(x) =
(x−x
i−1
)(x−x
i
)
(x
i−
1
2
−x
i−1
)(x
i−
1
2
−x
i
)
x
i−1
≤ x ≤ x
i
0 otherwise
In a conforming approximation, continuity requirements for the restricted space are the
same as those for the solution space for an analytic variational form of the problem.
• Terms in the weak form must be integrable after any integrationbyparts; step func
tions are admissible, but delta functions are not.
• Our ﬂuid equations require a continuous solution space in this sense, but derivatives
need to be piecewise continuous only.
Using the restricted space implies that we seek solutions of the form
A(R, Z) =
¸
j
A
j
α
j
(R, Z)
10 CHAPTER I. FRAMEWORK OF NIMROD KERNEL
• The jsummation is over the nodes of the space.
• α
j
(R, Z) here may represent diﬀerent polynomials. [α
j
&β
j
in 1D quadratic example
(Figure I.2)]
• 2D basis functions are formed as products of 1D basis functions for R and Z.
Nonuniform meshing can be handled by a mapping from logical coordinates to physical
coordinates. To preserve convergence properties, i.e. lowest order polynomials of physical
coordinates, the order of the mapping should be no higher than polynomial degree of the
basis functions[2].
The present version of NIMROD allows the user to choose the polynomial degree of the
basis functions (poly_degree) and the mapping (‘met_spl’).
Numerical analysis shows that a correct application of the ﬁnite element approach ties
convergence of the numerical solution to the best possible approximation by the restricted
space. (See [2] for analysis of selfadjoint problems.) Error bounds follow from the error
bounds of a truncated Taylor series expansion.
For scalar ﬁelds, we now have enough basis functions. For vector ﬁelds, we also need
direction vectors. All recent versions of NIMROD have equations expressed in (R, Z, φ)
coordinates for toroidal geometries and (X, Y, Z) for linear geometries.
y
z
x
z
e
e
e
e
e
e
Z
R
#
y
x
z
Figure I.3: input geom=‘toroidal’ & ‘linear’
For toroidal geometries, we must use the fact that ˆ e
R
, ˆ e
φ
are functions of φ, but we always
have ˆ e
i
· ˆe
j
= δ
ij
.
B. SPATIAL DISCRETIZATION 11
• Expanding scalars
S (R, Z, φ) →
¸
j
α
j
(R, Z)
¸
S
j0
+
N
¸
n=1
S
jn
e
inφ
+ c.c.
¸
• Expanding vectors
V(R, Z, φ) →
¸
j
¸
l
α
j
(R, Z) ˆ e
l
¸
V
jl0
+
N
¸
n=1
V
jln
e
inφ
+ c.c.
¸
* For convenience ¯ α
jl
≡ α
j
ˆ e
l
and I’ll suppress c.c. and make n sums from 0 to N,
where N here is nmodes_total−1.
While our poloidal basis functions do not have orthogonality properties like e
inφ
and ˆ e
R
,
their ﬁnite extent means that
dRdZD
s
(α
i
(R, Z)) D
t
(α
j
(R, Z)) = 0,
where D
s
is a spatial derivative operator of allowable order s (here 0 or 1), only where nodes
i and j are in the same element (inclusive of element borders).
Integrands of this form arise from a procedure analogous to that used with Fourier series
alone; we create a weak form of the PDEs by multiplying by a conjugated basis function.
Returning to Faraday’s law for resistive MHD, and using
E to represent the ideal −V×B
ﬁeld (after pseudospectral calculations):
• Expand B(R, Z, φ, t) and
E (R, Z, φ, t)
• Apply
dRdZdφ ¯ α
j
l
(R, Z) e
−in
φ
for j
= 1 : # nodes, l
{R, Z, φ} , 0 ≤ n
≤ N
• Integrate by parts to symmetrize the diﬀusion term and to avoid inadmissible deriva
tives of our solution space.
dRdZdφ
¯ α
j
l
e
−in
φ
·
¸
jln
∂B
jln
∂t
¯ α
jl
e
inφ
= −
dRdZdφ∇ ×
¯ α
j
l
e
−in
φ
·
¸
η
µ
0
¸
jln
B
jln
∇× ¯ α
jl
e
inφ
+
¸
jln
E
jln
¯ α
jl
e
inφ
¸
12 CHAPTER I. FRAMEWORK OF NIMROD KERNEL
+
¯ α
j
l
e
−in
φ
·
¸
jln
E
jln
¯ α
jl
e
inφ
× dS
Using orthogonality to simplify, and rearranging the lhs,
2π
¸
j
∂B
jl
n
∂t
dRdZ α
j
α
j
+
¸
jl
B
jln
dRdZdφ
η
µ
0
∇×
¯ α
j
l
e
−in
φ
· ∇×
¯ α
jl
e
in
φ
= −
dRdZdφ ∇×
¯ α
j
l
e
−in
φ
·
¸
jl
E
jl
¯ α
jl
e
in
φ
+
¯ α
j
l
·
¸
jl
E
jln
¯ α
jl
×dS
for all {j
, l
, n
}
Notes on this weak form of Faraday’s law:
• Surface integrals for interelement boundaries cancel in conforming approximations.
• The remaining surface integral is at the domain boundary.
• The lhs is written as a matrixvector product with the
dRdZ f (¯ α
j
l
) g (¯ α
jl
) integrals
deﬁning a matrix element for row (j
, l
) and column (j, l). These elements are diagonal
in n.
• {B
jln
} constitutes the solution vector at an advanced time (see C).
• The rhs is left as a set of integrals over functions of space.
• For selfadjoint operators, such as I+∆t∇×
η
µ
0
∇×, using the same functions for test and
basis functions (Galerkin) leads to the same equations as minimizing the corresponding
variational [2], i.e. RayleighRitzGalerkin problem.
C. TEMPORAL DISCRETIZATION 13
B.3 Grid Blocks
For geometric ﬂexibility and parallel domain decomposition, the poloidal mesh can be con
structed from multiple gridblocks. NIMROD has two types of blocks: Logically rectangular
blocks of quadrilateral elements are structured. They are called ‘rblocks’. Unstructured
blocks of triangular elements are called ‘tblocks’. Both may be used in the same simulation.
The kernel requires conformance of elements along block boundaries, but blocks don’t
have to conform.
More notes on weak forms:
• The ﬁnite extent of the basis functions leads to sparse matrices
• The mapping for nonuniform meshing has not been expressed explicitly. It entails:
1.
dRdZ →
Jdξdη, where ξ, η are logical coordinates, and J is the Jacobian,
2.
∂α
∂R
→
∂α
∂ξ
∂ξ
∂R
+
∂α
∂η
∂η
∂R
and similarly for Z, where α is a low order polynomial of ξ
and η with uniform meshing in (ξ, η).
3. The integrals are computed numerically with Gaussian quadrature
1
0
f (ξ) dξ →
¸
i
w
i
f (ξ
i
) where {w
i
, ξ
i
}, i = 1, . . . n
g
are prescribed by the quadrature method
[3].
B.4 Boundary Conditions
Dirichlet conditions are normally enforced for B
normal
and V. If thermal conduction is used,
Dirichlet conditions are also imposed on T
α
. Boundary conditions can be modiﬁed for the
needs of particular applications, however.
Formally, Dirichlet conditions are enforced on the restricted solution space (“essential
conditions”). In NIMROD we simply substitute the Dirichlet boundary condition equations
in the linear system at the appropriate rows for the boundary nodes.
Neuman conditions are not set explicitly. The variational aspects enforce Neuman con
ditions  through the appearance (or nonappearance) of surface integrals  in the limit of
inﬁnite spatial resolution ([2] has an excellent description). These “natural” conditions pre
serve the spatial convergence rate without using ghost cells, as in ﬁnite diﬀerence or ﬁnite
volume methods.
C Temporal Discretization
NIMROD uses ﬁnite diﬀerence methods to discretize time. The solution ﬁeld is marched in
a sequence of timesteps from initial conditions.
14 CHAPTER I. FRAMEWORK OF NIMROD KERNEL
C.1 Semiimplicit Aspects
Without going into an analysis, the semiimplicit method as applied to hyperbolic systems of
PDEs can be described as a leapfrog approach that uses a selfadjoint diﬀerential operator
to extend the range of numerical stability to arbitrarily large ∆t. It is the tool we use to
address the stiﬀness of the ﬂuid model as applied to fusion plasmas. (See [4, 5].)
Though any selfadjoint operator can be used to achieve numerical stability, accuracy
at large ∆t is determined by how well the operator represents the physical behavior of the
system [4]. The operator used in NIMROD to stabilize the MHD advance is the linear
idealMHD force operator (no ﬂow) plus a Laplacian with a relatively small coeﬃcient [5].
Like the leapfrog method, the semiimplicit method does not introduce numerical dissi
pation. Truncation errors are purely dispersive. This aspect makes semiimplicit methods
attractive for stiﬀ, low dissipation systems like plasmas.
NIMROD also uses timesplitting. For example, when the Hall electric ﬁeld is included
in a computation, B is ﬁrst advanced with the MHD electric ﬁeld, then with the Hall electric
ﬁeld.
B
MHD
= B
n
−∆t ∇× E
MHD
B
n+1
= B
MHD
−∆t ∇× E
Hall
Though it precludes 2
nd
order temporal convergence, this splitting reduces errors of large
timestep [6]. A separate semiimplicit operator is then used in the Hall advance of B. At
present, the operator implemented in NIMROD often gives inaccuracies at ∆t much larger
than the explicit stability limit for a predictor/corrector advance of B due to E
Hall
. Hall
terms must be used with great care until further development is pursued.
Why not an implicit approach?
A true implicit method for nonlinear systems converges on nonlinear terms at advanced
times, requiring nonlinear system solves at each time step. This may lead to ‘the ultimate’
timeadvance scheme. However, we have chosen to take advantage of the smallness of the
solution ﬁeld with respect to steady parts → the dominant stiﬀness arises in linear terms.
Although the semiimplicit approach eliminates timestep restrictions from wave propa
gation, the operators lead to linear matrix equations.
• Eigenvalues for the MHD operator are ∼ 1 +ω
2
k
∆t
2
where ω
k
are the frequencies of the
MHD waves supported by the spatial discretization.
• We have run tokamak simulations accurately at maxω
k
∆t ∼ 10
4
– 10
5
, and min ω
k

0.
• Leads to very illconditioned matrices.
• Anisotropic terms use steady + (n = 0) ﬁelds → operators are diagonal in n.
C. TEMPORAL DISCRETIZATION 15
C.2 Predictorcorrector Aspects
The semiimplicit approach does not ensure numerical stability for advection, even for V
s
·∇
terms. A predictorcorrector approach can be made numerically stable.
• Nonlinear and linear computations require two solves per equation.
• Synergistic waveﬂow eﬀects require having the semiimplicit operator in the predic
tor step and centering waveproducing terms in the corrector step when there are no
dissipation terms [7]
• NIMROD does not have separate centering parameters for wave and advective terms,
so we set centerings to 1/2 + δ and rely on having a little dissipation.
• Numerical stability still requires ∆t ≤ C
∆x
V
, where C is O(1) and depends on centering.
[See input.f in the nimset directory.]
C.3 Dissipative Terms
Dissipation terms are coded with timecenterings speciﬁed through input parameters. We
typically use fully forward centering.
C.4 ∇· B Constraint
Like many MHD codes capable of using nonuniform, nonorthogonal meshes, ∇ · B is not
identically 0 in NIMROD. Left uncontrolled, the error will grow until it crashes the simula
tion. We have used an errordiﬀusion approach [8] that adds the unphysical diﬀusive term
κ
∇·B
∇∇· B to Faraday’s law.
∂B
∂t
= −∇× E + κ
∇·B
∇∇· B (I.1)
Applying ∇· gives
∂∇· B
∂t
= ∇· κ
∇·B
∇(∇· B) (I.2)
so that the error is diﬀused if the boundary conditions maintain constant
dS · B[9].
16 CHAPTER I. FRAMEWORK OF NIMROD KERNEL
C.5 Complete time advance
A complete timestep in version 3.0.3 will solve coeﬃcients of the basis functions used in the
following timediscrete equations.
ρ
n
−∆tf
ν
∇· ρ
n
ν∇−
C
sm
∆t
2
4
[B
0
× ∇× ∇× (B
0
× I·) − J
0
× ∇×(B
0
×I·)]
−
C
sp
∆t
2
4
[∇γP
0
∇· +∇[∇(p
0
) · I·]] −
C
sn
∆t
2
4
∇
2
(∆V)
pass
= −∆t [ρ
n
V
∗
· ∇V
∗
+ J
n
×B
n
−∇p
n
+∇· ρ
n
ν∇V
n
+∇· Π
neo
]
• For pass = predict, V
∗
= V
n
.
• For pass = correct, V
∗
= V
n
+f
v
(∆V)
predict
, and
• V
n+1
= V
n
+ (∆V)
correct
(∆n)
pass
= −∆t
V
n+1
· ∇n
∗
+n
∗
∇· V
n+1
• pass = predict →n
∗
= n
n
• pass = correct →n
∗
= n
n
+f
n
(∆n)
predict
• n
n+1
= n
n
+ (∆n)
correct
Note: f
∗
’s are centering coeﬃcients, C
s∗
’s are semiimplicit coeﬃcients.
1 + ∇×
m
e
ne
2
+ (∆t)f
η
eta
µ
0
∇×−(∆t)f
∇·B
κ
∇·B
∇∇·
(∆B)
pass
= (∆t)∇×
¸
V
n+1
× B
∗
−
n
µ
0
∇× B
n
+
1
ne
∇· Π
e
neo
+ (∆t)κ
∇·B
∇∇· B
n
• pass = predict →B
∗
= B
n
• pass = correct →B
∗
= B
n
+ f
b
(∆B)
predict
• B
MHD
= B
n
+ ∆B
correct
1 +∇×
m
e
ne
2
+ C
sb
(∆t)
2
b
B
0

ne
∇×−∆tf
∇·B
κ
∇·B
∇∇·
(∆B)
pass
= −(∆t)
b
∇×
¸
1
ne
(J
∗
×B
∗
− ∇p
n
e
) +
m
e
ne
2
∇· (J
∗
V
n+1
+ V
n+1
J
∗
)
+(∆t)
b
κ
∇·B
∇∇· B
MHD
C. TEMPORAL DISCRETIZATION 17
• pass = predict →(∆t)
b
= ∆t/2, B
∗
= B
MHD
• pass = correct →(∆t)
b
= ∆t, B
∗
= B
MHD
+ (∆B)
predict
• B
n+1
= B
MHD
+ (∆B)
correct
n
n+1
γ − 1
+ (∆t)f
χ
∇· n
n+1
χ
α
· ∇
(∆T
α
)
pass
= −∆t
¸
n
n+1
γ −1
V
n+1
α
· ∇T
∗
α
+ p
n
α
∇· V
n+1
α
−∇· n
n+1
χ
α
· ∇T
n
α
+ Q
• pass = predict →T
∗
α
= T
n
α
• pass = correct →T
∗
α
= T
n
α
+f
T
(∆T
α
)
predict
• T
n+1
α
= T
n
α
+ (∆T
α
)
correct
• α = e, i
Notes on timeadvance in 3.0.3
• Without the Hall timesplit, approximate timecentered leapfrogging could be achieved
by using
1
2
n
n
+ n
n+1
in the temperature equation
{B,n,T}
n
{B,n,T}
n!1
V
n!1/2
V
n+1/2
t
Figure I.4: schematic of leapfrog
• The diﬀerent centering of the Hall time split is required for numerical stability (see
Nebel notes).
• the
∂J
∂t
electron inertia term necessarily appears in both timesplits for B when it is
included in E.
• Fields used in semiimplicit operators are updated only after changing more than a set
level to reduce unnecessary compution of the matrix elements (which are costly).
18 CHAPTER I. FRAMEWORK OF NIMROD KERNEL
D Basic functions of the NIMROD kernel
1. Startup
• Read input parameters
• Initialize parallel communication as needed
• Read grid, equilibrium, and initial ﬁelds from dump ﬁle
• Allocate storage for work arrays, matrices, etc.
2. Time Advance
(a) Preliminary Computations
• Revise ∆t to satisfy CFL if needed
• Check how much ﬁelds have changed since matrices were last updated. Set
ﬂags if matrices need to be recomputed
• Update neoclassical coeﬃcients, voltages, etc. as needed
(b) Advance each equation
• Allocate rhs and solution vector storage
• Extrapolate in time previous solutions to provide an initial guess for the linear
solve(s).
• Update matrix and preconditioner if needed
• Create rhs vector
• Solve linear system(s)
• Reinforce Dirichlet conditions
• Complete regularity conditions if needed
• Update stored solution
• Deallocate temporary storage
3. Output
• Write global, energy, and probe diagnostics to binary and/or text ﬁles at desired
frequency in timestep (nhist)
• Write complete dump ﬁles at desired frequency in timestep (ndump)
4. Check stop condition
• Problems such as lack of iterative solver convergence can stop simulations at any
point
Chapter II
FORTRAN Implementation
Preliminary remarks
As implied in the Framework section, the NIMROD kernel must complete a variety of
tasks each time step. Modularity has been essential in keeping the code manageable as the
physics model has grown in complexity.
FORTRAN 90 extensions are also an important ingredient. Arrays of data structures pro
vide a clean way to store similar sets of data that have diﬀering numbers of elements. FORTRAN
90 modules organize diﬀerent types of data and subprograms. Array syntax leads to cleaner
code. Use of KIND when deﬁning data settles platformspeciﬁc issues of precision. ([10] is
recommended.)
A Implementation map
All computationally intensive operations arise during either 1) the ﬁnite element calculations
to ﬁnd rhs vectors and matrices, and 2) solution of the linear systems. Most of the operations
are repeated for each advance, and equationspeciﬁc operations occur at only two levels in
a hierarchy of subprograms for 1). The data and operation needs of subprograms at a level
in the hierarchy tend to be similar, so FORTRAN module grouping of subprograms on a level
helps orchestrate the hierarchy.
B Data storage and manipulation
B.1 Solution Fields
The data of a NIMROD simulation are the sets of basis functions coeﬃcients for each solution
ﬁeld plus those for the grid and steadystate ﬁelds. This information is stored in FORTRAN
90 structures that are located in the fields module.
19
20 CHAPTER II. FORTRAN IMPLEMENTATION
I
m
p
l
e
m
e
n
t
a
t
i
o
n
M
a
p
o
f
P
r
i
m
a
r
y
N
I
M
R
O
D
O
p
e
r
a
t
i
o
n
s
·
a
d
v
*
r
o
u
t
i
n
e
s
m
a
n
a
g
e
a
n
a
d
v
a
n
c
e
o
f
o
n
e
ﬁ
e
l
d
¨
¨
¨%
r
r
r
rj
6 c
6 c
F
I
N
I
T
E
E
L
E
M
E
N
T
·
m
a
t
r
i
x
c
r
e
a
t
e
·
g
e
t
r
h
s
o
r
g
a
n
i
z
e
f
.
e
.
c
o
m
p
u
t
a
t
i
o
n
s
'
'
c
c
E
B
O
U
N
D
A
R
Y
e
s
s
e
n
t
i
a
l
c
o
n
d
i
t
i
o
n
s
R
E
G
U
L
A
R
I
T
Y
R
B
L
O
C
K
(
T
B
L
O
C
K
)
·
r
b
l
o
c
k
m
a
k
e
m
a
t
r
i
x
·
r
b
l
o
c
k
g
e
t
r
h
s
P
e
r
f
o
r
m
n
u
m
e
r
i
c
a
l
i
n
t
e
g
r
a
t
i
o
n
i
n
r
b
l
o
c
k
s
c
c
I
N
T
E
G
R
A
N
D
S
P
e
r
f
o
r
m
m
o
s
t
e
q
u
a

t
i
o
n
s
p
e
c
i
ﬁ
c
p
o
i
n
t

w
i
s
e
c
o
m
p
u
t
a
t
i
o
n
s
c
c
F
F
T
M
O
D
N
E
O
C
L
A
S
S
I
N
T
p
e
r
f
o
r
m
c
o
m
p
u
t
a
t
i
o
n
s
f
o
r
n
e
o
c
l
a
s
s
i
c
a
l

s
p
e
c
i
ﬁ
c
e
q
u
a
t
i
o
n
s
C
c
c
M
A
T
H
T
R
A
N
d
o
a
r
o
u
t
i
n
e
m
a
t
h
o
p
e
r
a
t
i
o
n
S
U
R
F
A
C
E
·
s
u
r
f
a
c
e
c
o
m
p
r
h
s
p
e
r
f
o
r
m
n
u
m
e
r
i
c
a
l
s
u
r
f
a
c
e
i
n
t
e
g
r
a
l
s
c
S
U
R
F
A
C
E
I
N
T
S
p
e
r
f
o
r
m
s
u
r
f
a
c
e

s
p
e
c
i
ﬁ
c
c
o
m
p
u
t
a
t
i
o
n
s
'
¥
§ c
c
G
E
N
E
R
I
C
E
V
A
L
S
i
n
t
e
r
p
o
l
a
t
e
o
r
ﬁ
n
d
s
t
o
r
a
g
e
f
o
r
d
a
t
a
a
t
q
u
a
d
r
a
t
u
r
e
p
o
i
n
t
s
I
T
E
R
3
D
C
G
M
O
D
p
e
r
f
o
r
m
3
D
m
a
t
r
i
x
s
o
l
v
e
s
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g g
¢
¢
¢
¢
¢
¢
¢
¢
¢
¢
I
T
E
R
C
G
F
9
0
p
e
r
f
o
r
m
r
e
a
l
2
D
m
a
t
r
i
x
s
o
l
v
e
s
¨
¨
¨
¨
¨
¨
¨
¨
¨
I
T
E
R
C
G
C
O
M
P
p
e
r
f
o
r
m
c
o
m
p
l
e
x
2
D
m
a
t
r
i
x
s
o
l
v
e
s
E
D
G
E
·
e
d
g
e
n
e
t
w
o
r
k
c
a
r
r
y
o
u
t
c
o
m
m
u
n
i
c
a
t
i
o
n
a
c
r
o
s
s
b
l
o
c
k
b
o
r
d
e
r
s
c
V
E
C
T
O
R
T
Y
P
E
M
O
D
p
e
r
f
o
r
m
o
p
s
o
n
v
e
c
t
o
r
c
o
e
ﬃ
c
i
e
n
t
s
c
M
A
T
R
I
X
M
O
D
c
o
m
p
u
t
e
2
D
m
a
t
r
i
x

v
e
c
t
o
r
p
r
o
d
u
c
t
s
c
N
O
T
E
:
1
.
a
l
l
c
a
p
s
i
n
d
i
c
a
t
e
a
m
o
d
u
l
e
n
a
m
e
.
2
.
‘
·
’
i
n
d
i
c
a
t
e
s
a
s
u
b
r
o
u
t
i
n
e
n
a
m
e
o
r
i
n
t
e
r
f
a
c
e
t
o
m
o
d
u
l
e
p
r
o
c
e
d
u
r
e
B. DATA STORAGE AND MANIPULATION 21
• rb is a 1D pointer array of deﬁnedtype rblock_type. At the start of a simulation, it
is dimensioned 1:nrbl, where nrbl is the number of rblocks.
• The type rblock_type (analogy:integer), is deﬁned in module rblock_type_mod. This
is a structure holding information such as the celldimensions of the block(mx,my), ar
rays of basis function information at Gaussian quadrature nodes (alpha,dalpdr,dalpdz),
numerical integration information(wg,xg,yg), and two sets of structures for each solu
tion ﬁeld:
1. A lagr_quad_type structure holds the coeﬃcients and information for evaluating
the ﬁeld at any (ξ, η).
2. An rb_comp_qp_type, just a pointer array, allocated to save the interpolates of
the solution spaces at the numerical quadrature points.
• tb is a 1D pointer array, similar to rb but of type tblock_type_mod, dimensioned
nrbl+1:nbl so that tblocks have numbers distinct from the rblocks.
Before delving into the elementspeciﬁc structures, let’s backup to the fields module
level. Here, there are also 1D pointer arrays of cvector_type and vector_type for each
solution and steadystate ﬁeld, respectively. These structures are used to give blocktype
independent access to coeﬃcients. They are allocated 1:nbl. These structures hold pointer
arrays that are used as pointers (as opposed to using them as ALLOCATABLE arrays in struc
tures). There are pointer assignments from these arrays to the lagr_quad_type structure
arrays that have allocated storage space. (See [10])
In general, the elementspeciﬁc structures(lagr_quad_type, here) are used in conjunction
with the basis functions (as occurs in getting ﬁeld values at quadrature point locations). The
elementindependent structure(vector_types) are used in linear algebra operations with the
coeﬃcients.
The lagr_quad_types are deﬁned in the lagr_quad_mod module, along with the in
terpolation routines that use them. [The tri_linear_types are the analogous structures
for triangular elements in the tri_linear module.] The lagr_quad_2D_type is used for
φindependent ﬁelds.
The structures are selfdescribing, containing array dimensions and character variables
for names. There are also some arrays used for interpolation. The main storage locations
are the fs, fsh, fsv, and fsi arrays.
• fs(iv,ix,iy,im) → coeﬃcients for gridvertex nodes.
– iv = direction vector component (l:nqty)
– ix = horizontal logical coordinate index (0:mx)
– iy = vertical logical coordinate index (0:my)
22 CHAPTER II. FORTRAN IMPLEMENTATION
– im = Fourier component (“mode”) index (1:nfour)
• fsh(iv,ib,ix,iy,im) → coeﬃcients for horizontal side nodes.
– iv = direction vector component (*:*)
– ib = basis index (1:n_side)
– ix = horizontal logical coordinate index (1:mx)
– iy = vertical logical coordinate index (0:my)
– im = Fourier component (“mode”) index (1:nfour)
• fsv(iv,ib,ix,iy,im) → coeﬃcients for vertical side nodes.
• fsi(iv,ib,ix,iy,im) → coeﬃcients for internal nodes [ib limit is (1:n_int)]
fs(:,ix!1,iy,:)
fs(:,ix,iy!1,:)
fsh(:,1,ix,iy!1,:)
fsh(:,1,ix,iy,:) fsh(:,2,ix,iy,:)
fsh(:,2,ix,iy!1,:)
fs(:,ix,iy,:)
fs(:,ix!1,iy!1,:)
fsi(:,3,ix,iy,:)
fsi(:,1,ix,iy,:)
fsi(:,4,ix,iy,:)
fsv(:,1,ix!1,iy,:)
fsv(:,2,ix!1,iy,:) fsv(:,2,ix,iy,:)
fsv(:,1,ix,iy,:)
fsi(:,2,ix,iy,:)
Figure II.1: Basic Element Example. (n_side=2, n_int=4)
B.2 Interpolation
The lagr_quad_mod module also contains subroutines to carry out the interpolation of ﬁelds
to arbitrary positions. This is a critical aspect of the ﬁnite element algorithm, since interpo
lations are needed to carry out the numerical integrations.
For both real and complex data structures, there are single point evaluation routines
(*_eval) and routines for ﬁnding the ﬁeld at the same oﬀset in every cell of a block
(*_all_eval routines). All routines start by ﬁnding α
i

(ξ
,η
)
[and
∂
∂ξ
α
i

(ξ
,η
)
and
∂
∂η
α
i

(ξ
,η
)
B. DATA STORAGE AND MANIPULATION 23
if requested] for the (element–independent) basis function values at the requested oﬀset
(ξ
, η
). The lagr_quad_bases routine is used for this operation. [It’s also used at startup
to ﬁnd the basis function information at quadrature point locations, stored in the rb%alpha,
dalpdr, dalpdz arrays for use as test functions in the ﬁnite element computations.] The
interpolation routines then multiply basis function values with respective coeﬃcients.
The data structure, logical positions for the interpolation point, and desired derivative
order are passed into the singlepoint interpolation routine. Results for the interpolation are
returned in the structure arrays f, fx, and fy.
The all_eval routines need logical oﬀsets (from the lowerleft corner of each cell) instead
of logical coordinates, and they need arrays for the returned values and logical derivatives in
each cell.
See routine struct_set_lay in diagnose.f for an example of a single point interpolation
call. See generic_3D_all_eval for an example of an all_eval call.
Interpolation operations are also needed in tblocks. The computations are analogous
but somewhat more complicated due to the unstructured nature of these blocks. I won’t
cover tblocks in any detail, since they are due for a substantial upgrade to allow arbitrary
polynomial degree basis functions, like rblocks. However, development for released versions
must be suitable for simulations with tblocks.
B.3 Vector_types
We often perform linear algebra operations on vectors of coeﬃcients. [Management routines,
iterative solves, etc.] Vector_type structures simplify these operations, putting block and
basisspeciﬁc array idiosyncrasies into module routines. The developer writes one call to a
subroutine interface in a loop over all blocks.
Consider an example from the adv_b_iso management routine. There is a loop over
Fourier components that calls two real 2D matrix solves per component. One solves α for
Re{B
R
}, Re{B
Z
}, and −Im{B
φ
} for the Fcomp, and the other solves for Im{B
R
}, Im{B
Z
},
and Re{B
φ
}. After a solve in a predictor step (b_pass=bmhd pre or bhall pre), we need
to create a linear combination of the old ﬁeld and the predicted solution. At this point,
• vectr (1:nbl) → work space, 2D real vector_type
• sln (1:nbl) → holds linear system solution, also a 2D vector_type
• be (1:nbl) → pointers to B
n
coeﬃcients, 3D (complex) vector_type
• work1 (1:nbl) →pointers to storage that will be B
∗
in corrector step, 3D vector_type
What we want is B
∗
imode
= f
b
B
pre
imode
+ (1 − f
b
)B
n
imode
. So we have
• CALL vector_assign_cvec(vectr(ibl), be(ibl), flag, imode)
24 CHAPTER II. FORTRAN IMPLEMENTATION
– Transfers the flagindicated components such as r12mi3 for Re{B
R
}, Re{B
Z
},
and −Im{B
φ
} for imode to the 2D vector.
• CALL vector_add(sln(ibl), vectr(ibl), v1fac=‘center’, v2fac=‘1center’)
– Adds ‘1center’*vectr to ‘center’*sln and saves results in sln.
• CALL cvector_assign_vec(work1(ibl), sln(ibl), flag, imode)
– Transfer linear combination to appropriate vector components and Fourier com
ponent of work1.
Examining vector_type_mod at the end of the vector_type_mode.f ﬁle, ‘vector_add is
deﬁned as an interface label to three subroutines that perform the linear combination for
each vector_type. Here we interface vector_add_vec, as determined by the type deﬁnitions
of the passed arrays. This subroutine multiplies each coeﬃcient array and sums them.
Note that we did not have to do anything diﬀerent for tblocks, and coding for coeﬃcients
in diﬀerent arrays is removed from the calling subroutine.
Going back to the end of the vector_type_mode.f ﬁle, note that ‘=’ is overloaded, so we
can write statements like
work1(ibl) = be(ibl) if arrays conform
be(ibl) = 0
which perform the assignment operations component array to component array or scalar to
each component array element.
B.4 Matrix_types
The matrix_type_mod module has deﬁnitions for structures used to store matrix elements
and for structures that store data for preconditioning iterative solves. The structures are
somewhat complicated in that there are levels containing pointer arrays of other structures.
The necessity of such complication arises from varying array sizes associated with diﬀerent
blocks and basis types.
At the bottom of the structure, we have arrays for matrix elements that will be used
in matrixvector product routines. Thus, array ordering has been chosen to optimize these
product routines which are called during every iteration of a conjugate gradient solve. Fur
thermore, each lowestlevel array contains all matrix elements that are multiplied with a
gridvertex, horizontalside, verticalside, or cellinterior vector type array (arr, arrh, arrv,
and arri respectively) i.e. all coeﬃcients for a single“basistype”within a block. The matri
ces are 2D, so there are no Fourier component indices at this level, and our basis functions do
not produce matrix couplings from the interior of one grid block to the interior of another.
The 6D matrix is unimaginatively named arr. Referencing one element appears as
B. DATA STORAGE AND MANIPULATION 25
arr(jq,jxoff,jyoff,iq,ix,iy)
where
iq = ‘quantity’ row index
ix = horizontalcoordinate row index
iy = verticalcoordinate row index
jq = ‘quantity’ column index
jxoff = horizontalcoordinate column oﬀset
jyoff = verticalcoordinate column oﬀset
For gridvertex to gridvertex connections 0 ≤ ix ≤ mx and 0 ≤ iy ≤ my, iq and jq
correspond to the vector index dimension of the vector type. (simply 1 : 3 for A
R
, A
Z
, A
φ
or just 1 for a scalar). The structuring of rblocks permits oﬀset storage for the remaining
indices, giving dense storage for a sparse matrix without indirect addressing. The column
coordinates are simply
jx = ix + jxoff
jy = iy + jyoff
For each dimension, vertex to vertex oﬀsets are −1 ≤ j ∗ off ≤ 1, with storage zeroed
out where the oﬀset extends beyond the block border. The oﬀset range is set by the extent
of the corresponding basis functions.
i j
Figure II.2: 1D Cubic : extent of α
i
in red and α
j
in green, integration gives joff= −1
Oﬀsets are more limited for basis functions that have their nonzero node in a cell interior.
Integration here gives joff=1 matrix elements, since cells are numbered from 1, while
vertices are numbered from 0. Thus for gridvertextocell connections −1 ≤ jxoff ≤
0, while for celltogridvertex connections 0 ≤ jxoff ≤ 1. Finally, j*off=0 for cellto
cell. The ‘outerproduct’ nature implies unique oﬀset ranges for gridvertex, horizontalside,
verticalside, and cellinterior centered coeﬃcients.
Though oﬀset requirements are the same, we also need to distinguish diﬀerent bases of
the same type, such as the two sketched here.
26 CHAPTER II. FORTRAN IMPLEMENTATION
i j
Figure II.3: 1D Cubic : extent of α
i
in red and β
j
in green, note j is an internal node
j i
Figure II.4: 1D Cubic : extent of β
i
in red and β
j
in green, note both i, j are internal nodes
For lagr_quad_type storage and vector_type pointers, this is accomplished with an
extra array dimension for side and interior centered data. For matrixvector multiplication
it is convenient and more eﬃcient to collapse the vector and basis indices into a single index
with the vector index running faster.
Examples:
• Bicubic grid vertex to vertical side for P
jq=1, 1 ≤ iq ≤ 2
• Biquartic cell interior to horizontal side for
B
1 ≤ jq ≤ 27, 1 ≤ iq ≤ 9
Storing separate arrays for matrix elements connecting each of the basis types there
fore lets us have contiguous storage for nonzero elements despite diﬀerent oﬀset and basis
dimensions.
B. DATA STORAGE AND MANIPULATION 27
Instead of giving the arrays 16 diﬀerent names, we place arr in a structure contained by
a 4 × 4 array mat. One of the matrix element arrays is then referenced as
mat(jb,ib)%arr
where
ib=basistype row index
jb=basistype column index
and 1 ≤ ib ≤ 4, 1 ≤ jb ≤ 4. The numeral coding is
1≡gridvertex type
2≡horizontalvertex type
3≡verticalvertex type
4≡cellinterior type
Bilinear elements are an exception. There are only gridvertex bases, so mat is a 1 × 1
array.
This level of the matrix structure also contains selfdescriptions, such as nbtype=1 or
4; starting horizontal and vertical coordinates (=0 or 1) for each of the diﬀerent types;
nb_type(1:4) is equal to the number of bases for each type (poly_degree1 for types 23
and (poly_degree − 1)
2
for type 4); and nq_type is equal to the quantityindex dimension
for each type.
All of this is placed in a structure, rbl_mat, so that we can have on array of rbl_mat’s
with one component component per grid block. A similar tbl_mat structure is also set at
this level, along with selfdescription.
Collecting all block_mats in the global_matrix_type allows one to deﬁne an array of
these structures with one array component for each Fourier component.
This is a 1D array since the only stored matrices are diagonal in the Fourier component
index, i.e. 2D.
matrix_type_mod also contains deﬁnitions of structures used to hold data for the pre
conditioning step of the iterative solves. NIMROD has several options, and each has its own
storage needs.
So far, only matrix_type deﬁnitions have been discussed. The storage is located in the
matrix_storage_mod module. For most cases, each equation has its own matrix storage
that holds matrix elements from timestep to timestep with updates as needed. Anisotropic
operators require complex arrays, while isotropic operators use the same real matrix for two
sets of real and imaginary components, as in the B advance example in II.B.3.
28 CHAPTER II. FORTRAN IMPLEMENTATION
The matrix_vector product subroutines are available in the matrix_mod module. Inter
face matvec allows one to use the same name for diﬀerent data types. The large number of
lowlevel routines called by the interfaced matvecs arose from optimization. The operations
were initially written into one subroutine with adjustable oﬀsets and starting indices. What’s
there now is ugly but faster due to having ﬁxed index limits for each basis type resulting in
better compiler optimization.
Note that tblock matrix_vector routines are also kept in matrix_mod. They will undergo
major changes when tblock basis functions are generalized.
The matelim routines serve a related purpose  reducing the number of unknowns in a
linear system prior to calling an iterative solve. This amounts to a direct application of
matrix partitioning. For the linear system equation,
A
11
A
12
A
21
A
22
x
1
x
2
=
b
1
b
2
(II.1)
where A
ij
are submatrices, x
j
and b
i
are the parts of vectors, and A and A
22
are invertible,
x2 = A
−1
22
(b
2
− A
21
x
1
) (II.2)
A
11
−A
12
A
−1
22
A
21
. .. .
Schur component
¸
¸
x
1
= b
1
−A
12
A
−1
22
b
2
(II.3)
gives a reduced system. If A
11
is sparse but A
11
−A
12
A
−1
22
A
21
is not, partitioning may slow the
computation. However, if A
11
− A
12
A
−1
22
A
21
has the same structure as A
11
, it can help. For
cellinterior to cellinterior submatrices, this is true due to the singlecell extent of interior
basis functions. Elimination is therefore used for 2D linear systems when poly_degree> 1.
Additional notes regarding matrix storage: the stored coeﬃcients are not summed
across block borders. Instead, vectors are summed, so that Ax is found through operations
separated by block, followed by a communication step to sum the product vector across block
borders.
B.5 Data Partitioning
Data partitioning is part of a philosophy of how NIMROD has been  and should continue to
be  developed. NIMROD’s kernel in isolation is a complicated code, due to the objectives
of the project and chosen algorithm. Development isn’t trivial, but it would be far worse
without modularity.
NIMROD’s modularity follows from the separation of tasks described by the implemen
tation map. However, data partitioning is the linchpin in the scheme. Early versions of
the code had all data storage together. [see the discussion of version 2.1.8 changes in the
kernel’s README ﬁle.]. While this made it easy to get some parts of the code working, it
B. DATA STORAGE AND MANIPULATION 29
encouraged repeated coding with slight modiﬁcations instead of taking the time to develop
general purpose tools for each task. Eventually it became too cumbersome, and the changes
made to achieve modularity at 2.1.8 were extensive.
Despite many changes since, the modularity revision has stood the test of time. Changing
terms or adding equations requires relatively minimal amount of coding. But, even major
changes, like generalizing ﬁnite element basis functions, are tractable. The scheme will need
to evolve as new strides are taken. [Particle closures come to mind.] However, thought and
planning for modularity and data partitioning reward in the long term.
NOTE
• These comments relate to coding for release. If you are only concerned about one
application, do what is necessary to get a result
• However, in many cases, a little extra eﬀort leads to a lot more usage of development
work.
B.6 Seams
The seam structure holds information needed at the borders of grid blocks, including the
connectivity of adjacent blocks and geometric data and ﬂags for setting boundary conditions.
The format is independent of block type to avoid blockspeciﬁc coding at the step where
communication occurs.
The seam_storage_mod module hold the seam array, a list of blocks that touch the
boundary of the domain (exblock_list), and seam0. The seam0 structure is of edge_type,
like the seam array itself. seam0 was intended as a functionally equivalent seam for all space
beyond the domain, but the approach hampered parallelization. Instead, it is only used
during initialization to determine which vertices and cell sides lie along the boundary. We
still require nimset to create seam0.
As deﬁned in the edge_type_mod module, an edge_type holds vertex and segment struc
tures. Separation of the border data into a vertex structure and segment structure arises
from the diﬀerent connectivity requirements; only two blocks can share a cell side, while
many blocks can share a vertex. The three logical arrays: expoint, excorner, and r0point,
are ﬂags to indicate whether a boundary condition should be applied. Additional arrays
could be added to indicate where diﬀerent boundary conditions should be applied.
Aside For ﬂux injection boundary conditions, we have added the logical array applV0(1:2,:).
This logical is set at nimrod start up and is used to determine where to apply injection cur
rent and the appropriate boundary condition. The ﬁrst component applies to the vertex and
the second component to the associated segment.
The seam array is dimensioned from one to the number of blocks. For each block the
vertex and segment arrays have the same size, the number of border vertices equals the
30 CHAPTER II. FORTRAN IMPLEMENTATION
iv=1 iv=2
iv=7
is=1 is=2
iv=nvert
is=nvert
is=7
Figure II.5: iv=vertex index, is=segment index, nvert=count of segment and vertices
number of border segments, but the starting locations are oﬀset (see Figure II.5). For
rblocks, the ordering is unique.
Both indices proceed counter clockwise around the block. For tblocks, there is no re
quirement that the scan start at a particular internal vertex number, but the seam indexing
always progresses counter clockwise around its block. (entire domain for seam0)
Within the vertex_type structure, the ptr and ptr2 arrays hold the connectivity infor
mation. The ptr array is in the original format with connections to seam0. It gets deﬁned
in nimset and is saved in dump ﬁles. The ptr2 array is a duplicate of ptr, but references
to seam0 are removed. It is deﬁned during the startup phase of a nimrod simulation and
is not saved to dump ﬁles. While ptr2 is the one used during simulations, ptr is the one
that must be deﬁned during preprocessing. Both arrays have two dimensions. The second
index selects a particular connection, and the ﬁrst index is either: 1 to select (global) block
numbers (running from 1 to nbl_total, not 1 to nbl, see information on parallelization), or
2 to select seam vertex number. (see Figure II.6)
Connections for the one vertex common to all blocks are stored as:
B. DATA STORAGE AND MANIPULATION 31
iv=1 iv=2
iv=1 iv=2
iv=1 iv=2
iv=4
iv=3
iv=5
block1
block2
block3
iv=11
.
.
.
Figure II.6: 3 blocks with corresponding vertices
• seam(1)%vertex(5)
– seam(1)%vertex(5)% ptr(1,1) = 3
– seam(1)%vertex(5)% ptr(2,1) = 11
– seam(1)%vertex(5)% ptr(1,2) = 2
– seam(1)%vertex(5)% ptr(2,2) = 2
• seam(2)%vertex(2)
– seam(2)%vertex(2)%ptr(1,1) = 1
– seam(2)%vertex(2)%ptr(2,1) = 5
– seam(2)%vertex(2)%ptr(1,2) = 3
– seam(2)%vertex(2)%ptr(2,2) = 11
• seam(3)%vertex(11)
32 CHAPTER II. FORTRAN IMPLEMENTATION
– seam(3)%vertex(11)%ptr(1,1) = 2
– seam(3)%vertex(11)%ptr(2,1) = 2
– seam(3)%vertex(11)%ptr(1,2) = 1
– seam(3)%vertex(11)%ptr(2,2) = 5
Notes on ptr:
• It does not hold a selfreference, e.g. there are no references like:
– seam(3)%vertex(11)%ptr(1,3) = 3
– seam(3)%vertex(11)%ptr(2,3) = 11
• The order of the connections is arbitrary
The vertex_type stores interior vertex labels for its block in the intxy array. In rblocks
vertex(iv)%intxy(1:2) saves (ix,iy) for seam index iv. In tblocks, there is only one
blockinterior vertex label. intxy(1) holds its value, and intxy(2) is set to 0.
Aside: Though discussion of tblocks has been avoided due to expected revisions, the vec
tor_type section should have discussed their index ranges for data storage. Triangle vertices
are numbered from 0 to mvert, analogous to the logical coordinates in rblocks. The list is
1D in tblocks, but an extra array dimension is carried so that vector_type pointers can be
used. For vertex information, the extra index follows the rblock vertex index and has the
ranges 0:0. The extra dimension for cell information has the range 1:1, also consistent with
rblock cell numbering.
The various %seam_* arrays in vertex_type are temporary storage locations for the data
that is communicated across block boundaries. The %order array holds a unique prescription
(determined during NIMROD startup) of the summation order for the diﬀerent contributions
at a border vertex, ensuring identical results for the diﬀerent blocks. The %ave_factor and
%ave_factor_pre arrays hold weights that are used during the iterative solves.
For vertices along a domain boundary, tang and norm are allocated (1:2) to hold the R and
Z components of unit vectors in the boundary tangent and normal directions, respectively,
with respect to the poloidal plane. These unit vectors are used for setting Dirichlet boundary
conditions. The scalar rgeom is set to R for all vertices in the seam. It is used to ﬁnd vertices
located at R = 0, where regularity conditions are applied.
The %segment array (belonging to edge_type and of type segment_type) holds similar
information; however, segment communication diﬀerers from vertex communication three
ways.
• First, as mentioned above, only two blocks can share a cell side.
B. DATA STORAGE AND MANIPULATION 33
• Second, segments are used to communicate oﬀdiagonal matrix elements that extend
along the cell side.
• Third, there may be no element_side_centered data (poly degree = 1), or there
may be many nodes along a given element side.
The 1D ptr array reﬂects the limited connectivity. Matrix element communication makes
use of intxyp and inxyn internal vertex indices for the previous and next vertices along the
seam (intxys holds the internal cell indices). Finally, the tang and norm arrays have n extra
dimension corresponding to the diﬀerent element side nodes if poly degree > 1.
Before describing the routines that handle block border communication, it is worth noting
what is written to the dump ﬁles for each seam. The subroutine dump_write_seam in the
dump module shows that very little is written. Only the ptr, intxy, and excorner arrays
plus descriptor information are saved for the vertices, but seam0 is also written. None of
the segment information is saved. Thus, NIMROD creates much of the vertex and all of
the segment information at startup. This ensures selfconsistency among the diﬀerent seam
structures and with the grid.
The routines called to perform block border communication are located in the edge
module. Central among these routines is edge_network, which invokes serial or parallel op
erations to accumulate vertex and segment data. The edge_load and edge_unload routines
are used to transfer block vertextype data to or from the seam storage. There are three sets
of routines for the three diﬀerent vector types.
A typical example occurs at the end of the get_rhs routine in finite_element_mod.
The code segment starts with
DO ibl=1,nbl
CALL edge_load_carr(rhsdum(ibl),nqty,1_i4,nfour,nside,seam(ibl))
ENDDO
which loads vector component indices 1:nqty (starting location assumed) and Fourier com
ponent indices 1:nfour (starting location set by the third parameter in the CALL statement)
for the vertex nodes and ‘nside cellside nodes from the cvector_type rhsdum to seam_cin
arrays in seam. The single ibl loop includes both block types.
The next step is to perform the communication.
CALL edge_network(nqty,nfour,nside,.false.)
There are a few subtleties in the passed parameters. Were the operation for a real (nec
essarily 2D) vector_type, nfour must be 0 (‘0_i4 in the statement to match the dummy
argument’s kind). Were the operation for a cvector_2D_type, as occurs in iter_cg_comp,
nfour must be 1 (‘1_i4). Finally, the fourth argument is a ﬂag to perform the load and
34 CHAPTER II. FORTRAN IMPLEMENTATION
unload steps within edge_network. If true, it communicates the crhs and rhs vector types
in the computation_pointers module, which is something of an archaic remnant of the
predatapartitioning days.
The ﬁnal step is to unload the bordersummed data.
DO ibl=1,nbl
CALL edge_unload_carr(rhsdum(ibl),nqty,1_i4,nfour,nside,seam(ibl))
ENDDO
The pardata module also has structures for the seams that are used for parallel communi
cation only. They are accessed indirectly through edge_network and do not appear in the
ﬁnite element or linear algebra coding.
C Finite Element Computations
Creating a linear system of equations for the advance of a solution ﬁeld is a nontrivial task
with 2D ﬁnite elements. The FORTRAN modules listed on the left side of the implementation
map (Sec. A) have been developed to minimize the programming burden associated with
adding new terms and equations. In fact, normal physics capability development occurs
at only two levels. Near the bottom of the map, the integrand modules hold coding that
determines what appears in the volume integrals like those on pg11 for Faraday’s law. In
many instances, adding terms to existing equations requires modiﬁcations at this level only.
The other level subject to frequent development is the management level at the top of the
map. The management routines control which integrand routines are called for settingup
an advance. They also need to call the iterative solver that is appropriate for a particular
linear system. Adding a new equation to the physics model usually requires adding a new
management routine as well as new integrand routines. This section of the tutorial is intended
to provide enough information to guide such development.
C.1 Management Routines
The adv_* routines in ﬁle nimrod.f manage the advance of a physical ﬁeld. They are called
once per advance in the case of linear computations without ﬂow, or they are called twice to
predict then correct the eﬀects of advection. A character passﬂag informs the management
routine which step to take.
The sequence of operations performed by a management routine is outlined in Sec. D.2b.
This section describes more of the details needed to create a management routine.
To use the most eﬃcient linear solver for each equation, there are now three diﬀerent types
of management routines. The simplest matrices are diagonal in Fourier component and have
decoupling between various real and imaginary vector components. These advances require
C. FINITE ELEMENT COMPUTATIONS 35
two matrix solves (using the same matrix) for each Fourier component. The adv_v_clap
is an example; the matrix is the sum of a mass matrix and the discretized Laplacian. The
second type is diagonal in Fouries component, but all real and imaginary vector components
must be solved simultaneously, usually due to anisotropy. The full semiimplicit operator
has such anisotropy, hence adv_v_aniso is this type of management routine. The third type
of management routine deals with coupling among Fourier components, i.e. 3D matrices.
Full use of an evolving number density ﬁeld requires a 3D matrix solve for velocity, and this
advance is managed by adv_v_3d.
All these management routine types are similar with respect to creating a set of 2D ma
trices (used only for preconditioning in the 3D matrixsystems) and with respect to ﬁnding
the rhs vector. The call to matrix_create passes an array of global_matrix_type and an
array of matrix_factor_type, so that operators for each Fourier components can be saved.
Subroutine names for the integrand and essential boundary conditions are the next argu
ments, followed by a b.c. ﬂag and the ’pass’ character variable. If elimination of cellinterior
data is appropriate (see Sec. B.4), matrix elements will be modiﬁed within matrix_create,
and interior storage [rbl_mat(ibl)%mat(4,4)%arr] is then used to hold the inverse of the
interior submatrix (A
−1
22
).
The get_rhs calls are similar, except the parameter list is (integrand routine name,
cvector_type array, essential b.c. routine name, b.c. ﬂags (2), logical switch for using
a surface integral, surface integrand name, global_matrix_type). The last argument is
optional and its presence indicates that interior elimination is used. [Use rmat_elim= or
cmat_elim= to signify real or complex matrix types.
Once the matrix and rhs are formed, there are vector and matrix operations to ﬁnd the
product of the matrix and the old solution ﬁeld, and the result is added to the rhs vector
before calling the iterative solve. This step allows us to write integrand routines for the
change of a solution ﬁeld rather than its new value, but then solve for the new ﬁeld so that
the relative tolerance of the iterative solver is not applied to the change, since ∆x x is
often true.
Summarizing:
• Integrands are written for A and b with A∆x = b to avoid coding semiimplicit terms
twice.
• Before the iterative solve, ﬁnd b − Ax
old
.
• The iterative method solves Ax
new
= (b − x
old
).
The 3D management routines are similar to the 2D management routines. They diﬀer in
the following way:
• Cellinterior elimination is not used.
• Loops over Fourier components are within the iterative solve.
36 CHAPTER II. FORTRAN IMPLEMENTATION
• The iterative solve used a ’matrixfree’ approach, where the full 3D matrix is never
formed. Instead, an rhs integrand name is passed.
See Sec. D for more information on the iterative solves.
After an iterative solve is complete, there are some vector operations, including the com
pletion of regularity conditions and an enforcement of Dirichlet boundary conditions. The
latter prevents an accumulation of error from the iterative solver, but it is probably super
ﬂuous. Data is saved (with timecentering in predictor steps) as basis function coeﬃcients
and at quadrature points after interpolating through the *block_qp_update calls.
C.2 Finite_element Module
Routines in the finite_element module also serve managerial functions, but the operations
are those repeated for every ﬁeld advance, allowing for a few options.
The real_ and comp_matrix_create routines ﬁrst call the separate numerical integration
routines for rblocks and tblocks, passing the same integrand routine name. Next, they call
the appropriate matelim routine to reduce the number of unknowns when poly_degree> 1.
Then, connections for a degenerate rblock (one with a circularpolar origin vertex) are
collected. Finally, iter_factor ﬁnds some type of approximate factorization to use as a
preconditioner for the iterative solve.
The get_rhs routine also starts with blockspeciﬁc numerical integration, and it may
perform an optional surface integration. The next step is to modify gridvertex and cellside
coeﬃcients through cellinterior elimination (b
1
− A
12
A
−1
22
b
2
from p.53). The blockborder
seaming then completes a scatter process:
The ﬁnal steps in get_rhs apply regularity and essential boundary conditions.
Aside on the programming:
• Although the names of the integrand, surface integrand, and boundary condition rou
tines are passed through matrix_create and get_rhs, finite_element does not use
the integrands, surface_ints, or boundary modules. finite_element just needs
parameterlist information to call any routine out of the three classes. The parameter
list information, including assumedshape array deﬁnitions, are provided by the inter
face blocks. Note that all integrand routines suitable for comp_matrix_create, for
example, must use the same argument list as that provided by the interface block for
‘integrand’ in comp_matrix_create.
C.3 Numerical integration
Modules at the next level in the ﬁnite element hierarchy perform the routine operations
required for numerical integration over volumes and surfaces. The three modules, rblock,
C. FINITE ELEMENT COMPUTATIONS 37
A volume integral over
this bicubic cell contributes
to node coefficients at the
positions.
A volume integral here
contributes to node
coefficients at the
positions.
Node coefficients along block borders require
an edge_network call to get all contributions.
Figure II.7: Blockborder seaming
38 CHAPTER II. FORTRAN IMPLEMENTATION
ig=1 ig=2
Figure II.8: 4 quad points, 4 elements in a block, dot denotes quadrature position, not node
tblock, and surface, are separated to provide a partitioning of the diﬀerent geometric infor
mation and integration procedures. This approach would allow us to incorporate additional
block types in a straightforward manner should the need arise.
The rblock_get_comp_rhs routine serves as an example of the numerical integration
performed in NIMROD. It starts by setting the storage arrays in the passed cvector_type
to ∅. It then allocates the temporary array, integrand, which is used to collect contributions
for test functions that are nonzero in a cell, one quadrature point at a time. Our quadrilateral
elements have (poly_degree+1)
2
nonzero test functions in every cell (see the Figure II.7 and
observe the number of node positions).
The integration is performed as loop over quadrature points, using saved data for the
logical oﬀsets from the bottom left corner of each cell, R, and w
i
∗ J(ξ
i
, η
i
). Thus each
iteration of the doloop ﬁnds the contributions for the same respective quadrature point in
all elements of a block.
The call to the dummy name for the passed integrand then ﬁnds the equation and
algorithmspeciﬁc information from a quadrature point. A blank tblock is passed along
with the rb structure for the block, since the respective tblock integration routine calls the
same integrands and one has to be able to use the same argument list. In fact, if there were
only one type of block, we would not need the temporary integrand storage at all.
The ﬁnal step in the integration is the scatter into each basis function coeﬃcient storage.
Note that w
i
∗ J appears here, so in the integral
dξdηJ(ξ, η)f(ξ, η), the integrand routine
ﬁnds f(ξ, η) only.
The matrix integration routines are somewhat more complicated, because contributions
have to be sorted for each basis type and oﬀset pair, as described in B.4 [NIMROD uses two
diﬀerent labeling schemes for the basis function nodes. The vector_type and global_matrix_type
separate basis types (grid vertex, horizontal side, vertical side, cell interior) for eﬃciency in
C. FINITE ELEMENT COMPUTATIONS 39
linear algebra. In contrast, the integrand routines use a single list for nonzero basis func
tions in each element. The latter is simpler and more tractable for the blockindependent
integrand coding, but each node may have more than one label in this scheme. The matrix
integration routines have to do the work of translating between the two labeling schemes.]
C.4 Integrands
Before delving further into the coding, let’s review what is typically needed for an integrand
computation. The Faraday’s law example in Section A.1 suggests the following:
1. Finite element basis functions and their derivatives with respect to R and Z.
2. Wavenumbers for the toridal direction
3. Values of solution ﬁelds and their derivatives.
4. Vector operations.
5. Pseudospectral computations.
6. Input parameters and global simulation data.
Some of this information is already stored at the quadrature point locations. The basis
function information depends on the grid and the selected solution space. Neither vary during
a simulation, so the basis function information is evaluated and saved in blockstructure
array as described in B.1. Solution ﬁeld data is interpolated to the quadrature pions and
stored at the end of the equation advances. It is available through the qp_type storage, also
described in B.1. Therefore, generic_alpha_eval and generic_ptr_set routines merely
set pointers to the appropriate storage locations and do not copy the data to new locations.
It is important to remember that integrand computations must not change any values
in the pointer arrays. Doing so will lead to unexpected and confusing consequences in other
integrand routines.
The input and global parameteres are satisﬁed by the F90 USE statement of modules given
those names. The array keff(1:nmodes), is provided to ﬁnd the value of the wavenumber
associated with each Fourier component. The factor keff(1:nmodes)/bigr is the toroidal
wavenumber for the data at Fourier index imode. There are two things to note:
1. In the toroidal geometry, keff, holds the nvalue, and bigr(:,:) holds R. In linear
geometry, keff holds k
n
=
2πn
per length
and bigr=1.
2. Do not assume that a particular nvalue is associated with some value of imode. Lin
ear computations often have nmodes=1 and keff(1) equal to the input parameter
lin_nmax. Domain decomposition also aﬀects the values and range of keff. This is
why there is a keff array.
40 CHAPTER II. FORTRAN IMPLEMENTATION
The k2ef array holds the square of each value of keff.
Continuing with the Faraday’s law example, the ﬁrst term on the lhs of the second
equation on pg 11 is associated with
∂B
∂t
. The spatial discretization gives the integrand α
i
α
j
(recall that J appears at the next higher level in the hierarchy). This ‘mass’ matrix is so
common that it has its own integrand routine, get_mass. [The matrix elements are stored in
the mass_mat data structure. Operators requiring a mass matrix can have matrix_create
add it to another operator.]
The int array passed to get_mass has 6 dimensions, like int arrays for other matrix
computations. The ﬁrst two dimensions are directionvectorcomponent or ‘quantity’ indices.
The ﬁrst is the column quantity (jq), and the second is the row quantity (iq). The next two
dimensions have range (1:mx,1:my) for the cell indices in a gridblock. The last two indices
are basis function indices (jv,iv) for the j’ column basis index and the j row index. For
get_mass, the orthogonality of direction vectors implies that ˆ e
l
does not appear, so iq=jq=1:
α
j
(ξ, η)ˆ e
l
exp (−in
φ) test function dotted with
α
j
(ξ, η)ˆ e
l
exp(inφ)
∂B
jln
∂t
gives
α
j
α
j
for all l
and n
All of the stored matrices are 2D, i.e. diagonal in n, so separate indices for row and column
n indices are not used.
The get_mass subroutines then consists of a dimension check to determine the number
of nonzero ﬁnite elemnt basis function in each cell, a pointer set for α, and a double loop over
basis indices. Note that our operators have full storage, regardless of symmetry. This helps
make our matrixvector products faster and it implies that the same routines are suitable
for forming nonsymmetric matrices.
The next term in the lhs (from pgB.2) arises from implicit diﬀusion. The adv_b_iso
management routine passes the curl_de_iso integrand name, so lets examine that routine.
It is more complicated than get_mass, but the general form is similar. The routine is used
for creating the resistive diﬀusion operator and the semiimplicit operator for Hall terms, so
there is coding for two diﬀerent coeﬃcients. The character variable integrand_flag, that
is set in the advance subroutine of nimrod.f and saved in the global module, determines
which coeﬃcient to use. Note that calls to generic_ptr_set pass both rblock andtblock
structures for a ﬁeld. One of them is always a blank structure, but this keeps block speciﬁc
coding out of integrands. Instead, the generic_evals module selects the appropriate data
storage.
After the impedence coeﬃcient is found, the int array is ﬁlled. The basis loops appear
in a conditional statement to use or skip the ∇∇· term for the divergence cleaning. We will
skip them in this discussion. Thus we need ∇ × (α
j
ˆe
l
exp (−in
φ)) · ∇ × (α
j
ˆ e
l
exp(inφ))
only, as on pg 11. To understand what is in the basis function loops, we need to evaluate
∇× (α
j
ˆ e
l
exp(inφ)). Though ˆ e
l
is ˆ e
R
, ˆ e
Z
or ˆ e
φ
only, it is straightforward to use a general
C. FINITE ELEMENT COMPUTATIONS 41
direction vector and then restrict attention to the three basis vectors of our coordinate
system.
∇× (α
j
ˆe
l
exp(inφ)) = ∇(α
j
exp (inφ)) × ˆe
l
+ α
j
exp (inφ)∇×(ˆ e
l
)
∇(α
j
exp(inφ)) =
∂α
∂R
ˆ
R +
∂α
∂Z
ˆ
Z +
inα
R
α
ˆ
φ
exp(inφ)
∇(α
j
exp (inφ)) × ˆ e
l
=
¸
∂α
∂Z
(ˆ e
l
)
φ
−
inα
R
(ˆ e
l
)
Z
ˆ
R +
inα
R
(ˆ e
l
)
R
−
∂α
∂R
(ˆ e
l
)
φ
ˆ
Z
+
∂α
∂R
(ˆ e
l
)
Z
−
∂α
∂Z
(ˆ e
l
)
R
ˆ
φ
exp(inφ)
∇× (ˆ e
l
) = 0 for l = R, Z
∇× (ˆ e
l
) = −
1
R
ˆ
Z for l = φand toroidal geometry
Thus, the scalar product ∇×(α
j
ˆ e
l
exp (−in
φ)) · ∇× (α
j
ˆ e
l
exp(inφ)) is
∂α
j
∂Z
(ˆ e
l
)
φ
+
in
α
j
R
(ˆ e
l
)
Z
∂α
j
∂Z
(ˆ e
l
)
φ
−
inα
j
R
(ˆ e
l
)
Z
+
−
inα
j
R
(ˆ e
l
)
R
−
α
j
R
+
∂α
j
∂R
(ˆ e
l
)
φ
inα
j
R
(ˆ e
l
)
R
−
α
j
R
+
∂α
∂R
(ˆ e
l
)
φ
+
∂α
j
∂R
(ˆ e
l
)
Z
−
∂α
j
∂Z
(ˆ e
l
)
R
∂α
j
∂R
(ˆ e
l
)
Z
−
∂α
j
∂Z
(ˆ e
l
)
R
Finding the appropriate scalar product for a given (jq,iq) pair is then a matter of sub
stituting
ˆ
R,
ˆ
Z,
ˆ
φ for ˆ e
l
for l
⇒ iq = 1, 2, 3, respectively. This simpliﬁes the scalar product
greatly for each case. Note how the symmetry of the operator is preserved by construction.
For (jq, iq = (1, 1), (ˆ e
l
)
R
= (ˆ e
l
)
R
= 1 and (ˆ e
l
)
Z
= (ˆ e
l
)
φ
= (ˆ e
l
)
Z
= (ˆ e
l
)
φ
= 0 so
int(1,1,:,:,jv,iv)=
n
2
R
α
jv
αiv +
∂α
jv
∂Z
α
iv
∂Z
×ziso
where ziso is an eﬀective impedance.
Like other isotropic operators the resulting matrix elements are either purely real or
purely imaginary, and the only imaginary elements are those coupling poloidal and toroidal
vector components. Thus, only real poloidal coeﬃcients are coupled to imaginary toroidal
coeﬃcients and vice versa. Furthermore, the coupling between Re(B
R
, B
Z
) and Im(B
φ
) is the
same as that between Im(B
R
, B
Z
) and Re(B
φ
), making the operator phase independent. This
lets us solve for these separate groups of components as two real matrix equations, instead
of one complex matrix equation, saving CPU time. An example of the poloidaltoroidal
coupling for real coeﬃcients is
42 CHAPTER II. FORTRAN IMPLEMENTATION
int(1,3,:,:,jv,iv)=
nα
jv
R
αiv
R
+
α
iv
∂R
×ziso
which appears through a transposeof the (3, 1) element. [compare with −
inα
jv
R
αiv
R
+
α
iv
∂R
from the scalar product on pg 41]
Notes on matrix integrands:
• The Fourier component index, jmode, is taken from the global module. It is the
doloop index set in matrix_create and must not be changed in integrand routines.
• Other vectordiﬀerential operations acting on α
j
ˆ e
l
exp(inφ) are needed elsewhere. The
computations are derived in the same manner as ∇× (α
j
ˆ e
l
exp(inφ)) given above.
• In some case (semiimplicit operators, for example) the n = 0 component of a ﬁeld
is needed regardless of the nvalue assigned to a processor. Separate real storage is
created and saved on all processors (see Sects F& C).
The rhs integrands diﬀer from matrix integrands in a few ways:
1. The vector aspect (in a linear algebra sense) of the result implies one set of basis
function indices in the int array.
2. Coeﬃcients for all Fourier components are created in one integrand call.
3. The int array has 5 dimensions, int(iq,:,:,iv,im), where im is the Fourier index
(imode).
4. There are FFTs and pseudospectral operations to generate nonlinear terms.
Returning to Faraday’s law as an example, the brhs_mhd routine ﬁnds ∇×(α
j
ˆ e
l
exp (−in
φ))·
E(R, Z, φ), where E may contain ideal mhd, resistive, and neoclassical contributions. As with
the matrix integrand, E is only needed at the quadrature points.
From the ﬁrst executable statement, there are a number of pointers set to make various
ﬁelds available. The pointers for B are set to the storage for data from the end of the last
timesplit, and the math_tran routine, math_curl is used to ﬁnd the corresponding current
density. Then, ∇·B is found for error diﬀusion terms. After that, the be,ber, and bez arrays
are reset to the predicted B for the ideal electric ﬁeld during corrector steps (indicated by
the integrand_flag. More pointers are then set for neoclassical calculations.
The ﬁrst nonlinear computation appears after the neoclassical_init select block. The
V and Bdata is transformed to functions of φ, where the cross product, V×Bis determined.
The resulting nonlinear ideal E is then transformed to Fourier components (see Sec B.1).
Note that fft_nim concatenates the poloidal position indices into one index, so that real_be
has dimensions (1:3,1:mpseudo,1:nphi) for example. The mpseudo parameter can be less
C. FINITE ELEMENT COMPUTATIONS 43
than the number of cells in a block due to domain decomposition (see Sec C), so one should
not relate the poloidal index with the 2D poloidal indices used with Fourier components.
A loop over the Fourier index follows the pseudospectral computations. The linear ideal
E terms are created using the data for the speciﬁed steady solution, completing  (V
s
×B+
(V× B
s
+ (V×B) (see Sec A.1). Then, the resistive and neoclassical terms are computed.
Near the end of the routine, we encounter the loop over testfunction basis function
indices (j
, l
). The terms are similar to those in curl_de_iso, except ∇×(α
j
ˆ e
l
exp (−inφ))
is replaced by the local E, and there is a sign change for −∇× (α
j
ˆ e
l
exp(−in
φ)) · E.
The integrand routines used during iterative solves of 3D matrices are very similar to
rhs integrand routines. They are used to ﬁnd the dot product of a 3D matrix and a vector
without forming matrix elements themselves (see Sec D.2). The only noteworthy diﬀerence
with rhs integrands is that the operand vector is used only once. It is therefore interpolated
to the quadrature locations in the integrand routine with a generic_all_eval call, and the
interpolate is left in local arrays. In p_aniso_dot, for example, pres,presr, and presz are
local arrays, not pointers.
C.5 Surface Computations
The surface and surface_int modules function in an analogous manner to the volume
integration and integrand modules, respectively. One important diﬀerence is that not all
border segments of a block require surface computation (when there is one). Only those
on the domain boundary have contributions, and the seam data is not carried below the
finite_element level. Thus, the surface integration is called one cellside at a time from
get_rhs.
At present, 1D basis function information for the cell side is generated on the ﬂy in the
surface integration routines. It is passed to a surface integrand. Further development will
likely be required if more complicated operations are needed in the computations.
C.6 Dirichlet boundary conditions
The boundary module contains routines used for setting Dirichlet (or ‘essential’) boundary
conditions. They work by changing the appropriate rows of a linear system to
(A
jln jln
)(x
jln
) = 0, (II.4)
where j is a basis function node along the boundary, and l and n are the directionvector and
Fourier component indices. In some cases, a linear combination is set to zero. For Dirichlet
conditions on the normal component, for example, the rhs of Ax = b is modiﬁed to
(I − ˆ n
j
ˆ n
j
) · b ≡
˜
b (II.5)
44 CHAPTER II. FORTRAN IMPLEMENTATION
where ˆ n
j
is the unit normal direction at boundary node j. The matrix becomes
(I − ˆ n
j
ˆ n
j
) · A · (I − ˆ n
j
ˆ n
j
) + ˆ n
j
ˆ n
j
(II.6)
Dotting ˆ n
j
into the modiﬁed system gives
x
jn
= ˆ n
j
· x = 0 (II.7)
Dotting (I − ˆ n
j
ˆ n
j
) into the modiﬁed system gives
(I − ˆ n
j
ˆ n
j
) · A · ˜ x =
˜
b (II.8)
The net eﬀect is to apply the boundary condition and to remove x
jn
from the rest of the
linear system.
Notes:
• Changing boundary node equations is computationally more tractable than changing
the size of the linear system, as implied by ﬁnite element theory.
• Removing coeﬃcients from the rest of the system [through ·(I −ˆ nˆ n) on the right side of
A] may seem unnecessary. However, it preserves the symmetric form of the operator,
which is important when using conjugate gradients.
The dirichlet_rhs routine in the boundary module modiﬁes a cvector_type through
operations like (I −ˆ nˆ n)·. The ‘component’ character input is a ﬂag describing which compo
nent(s) should be set to 0. The seam data tang, norm, and intxy(s) for each vertex (segment)
are used to minimize the amount of computation.
The dirichlet_op and dirichlet_comp_op routines perform the related matrix opera
tions described above. Though mathematically simple, the coding is involved, since it has
to address the oﬀset storage scheme and multiple basis types for rblocks.
At present, the kernel always applies the same Dirichlet conditions to all boundary nodes
of a domain. Since the dirichlet_rhs is called one block at a time, diﬀerent component
input could be provided for diﬀerent blocks. However, the dirichlet_op routines would
need modiﬁcation to be consistent with the rhs. It would also be straightforward to make
the ‘component’ information part of the seam structure, so that it could vary along the
boundary independent of the block decomposition.
Since our equations are constructed for the change of a ﬁeld and not its new value, the
existing routines can be used for inhomogeneous Dirichlet conditions, too. If the conditions
do not change in time, one only needs to comment out the calls to dirichlet_rhs at the end
of the respective management routine and in dirichlet_vals_init. For timedependent
inhomogeneous conditions, one can adjust the boundary node values from the management
routine level, then proceed with homogeneous conditions in the system for the change in the
ﬁeld. Some consideration for timecentering the boundary data may be required. However,
the error is no more than O(∆t) in any case, provided that the time rate of change does not
appear explicitly.
C. FINITE ELEMENT COMPUTATIONS 45
C.7 Regularity conditions
When the input parameter geometry is set to ‘tor’, and the left side of the grid has points
along R = 0, the domain is simply connected – cylindrical with axis along Z, not toroidal.
With the cylindrical (R, Z, φ) coordinate system, our Fourier coeﬃcients must satisfy regu
larity conditions so that ﬁelds and their derivatives approach φindependent values at R = 0.
The regularity conditions may be derived by inserting the x = Rcos φ, y = Rsin φ
transformation and derivatives into the 2D Taylor series expansion in (x, y) about the origin.
It is straightforward to show that the Fourier φcoeﬃcients of a scalar S(R, φ) must satisfy
S
n
(φ) ∼ R
n
ψ(R
2
), (II.9)
where ψ is a series of nonnegative powers of its argument. For vectors, the Zdirection
component satisﬁes the relation for scalars, but
V
R
n
, V
φ
n
∼ R
n−1
ψ(R
2
) for n ≥ 0. (II.10)
Furthermore, at R = 0, phase information for V
R
1
and V
φ
1
is redundant. We take φ to be 0
at R = 0 and enforce
V
φ
1
= iV
R
1
. (II.11)
The entire functional form of the regularity conditions is not imposed numerically. The
conditions are for the limit of R →0 and would be too restrictive if applied across the ﬁnite
extent of a computational cell. Instead, we focus on the leading terms of the expansions.
The routines in the regularity module impose the leadingorder behavior in a manner
similar to what is done for Dirichlet boundary conditions. The regular_vec, regular_op,
and regular_comp_op routines modify the linear systems to set S
n
= 0 at R = 0 for n > 0
and to set V
R
n
and V
φ
n
to 0 for n = 0 and n ≥ 2. The phase relation for V
R
1
and V
φ
1
at
R = 0 is enforced by the following steps:
1. At R = 0 nodes only, change variables to V
+
= (V
r
+iV
φ
)/2 and V
−
= (V
r
− iV
φ
)/2.
2. Set V
+
= 0, i.e. set all elements in a V
+
column to 0.
3. To preserve symmetry with nonzero V
−
column entries, add −i times the V
φ
row to
the V
R
row and set all elements of the V
φ
row to 0 except for the diagonal.
After the linear system is solved, the regular_ave routine is used to set V
φ
= iV
−
at the
appropriate nodes.
Returning to the power series behavior for R → 0, the leading order behavior for S
0
,
V
R
1
, and V
φ
is the vanishing radial derivative. This is like a Neumann boundary condition.
Unfortunately, we cannot rely on a ‘natural B.C.’ approach because there is no surface with
46 CHAPTER II. FORTRAN IMPLEMENTATION
ﬁnite area along an R = 0 side of a computational domain. Instead, we add the penalty
matrix,
dV
w
R
2
∂α
j
∂R
∂α
j
∂R
, (II.12)
saved in the dr_penalty matrix structure, with suitable normalization to matrix rows for
S
0
, V
R
1
and V
φ
1
. The weight w is nonzero in elements along R = 0 only. Its positive value
penalizes changes leading to
∂
∂R
= 0 in these elements. It is not diﬀusive, since it is added to
linear equations for the changes in physical ﬁelds. The regular_op and regular_comp_op
routines add this penalty term before addressing the other regularity conditions.
[Looking through the subroutines, it appears that the penalty has been added for V
R
1
and V
φ
1
only. We should keep this in mind if a problem arises with S
0
(or V
Z
0
) near R = 0.]
D Matrix Solution
At present, NIMROD relies on its own iterative solvers that are coded in FORTRAN 90 like
the rest of the algorithm. They perform the basic conjugate gradient steps in NIMROD’s
block decomposed vector_type structures, and the matrix_type storage arrays have been
arranged to optimized matrixvector multiplication.
D.1 2D matrices
The iter_cg_f90 and iter_cg_comp modules contain the routines needed to solve symmetric
positivedeﬁnite and Hermitianpositivedeﬁnite systems, respectively. The iter_cg module
holds interface blocks to give common names and the real and complex routines. The rest
of the iter_cg.f ﬁle has external subroutines that address solverspeciﬁc blockborder com
munication operations.
Within the long iter_cg_f90.f and iter_cg_comp.f ﬁles are separate modules for
routines for diﬀerent preconditioning schemes, a module for managing partial factoriza
tion(iter_*_fac), and a module for managing routines that solve a system(iter_cg_*).
The basic cg steps are performed by iter_solve_*, and may be compared with textbook
descriptions of cg, once one understands NIMROD’s vector_type manipulation.
Although normal seam communication is used during matrix_vector multiplication,
there are also solverspeciﬁc block communication operations. The partial factorization rou
tines for preconditioning need matrix elements to be summed across block borders(including
oﬀdiagonal elements connecting border nodes), unlike matrix_vector product routines.
The iter_mat_com routines execute this communication using special functionality in edge_seg_network
for the oﬀdiagonal elements. After the partial factors are found and stored in the ma
trix_factor_type structure, border elements of the matric storage are restored to their
original values by iter_mat_rest.
D. MATRIX SOLUTION 47
Other seamrelated oddities are the averaging factors for border elements. The scalar
product of two vector_type is computed by iter_dot. The routine is simple, but it has
to account for redundant storage of coeﬃcients along block borders, hence ave_factor. In
the preconditioning step, the solve of
˜
Az = r where
˜
A is an approximate A and r is the
residual, results from blockbased partialfactors(the “direct”, “bl_ilu_*”, and “bl_diag*”
choices) are averaged at block borders. However, simply multiplying border zvalue by
ave_factor and seaming after preconditioning destroys the symmetry of the operation(and
convergence). Instead, we symmetrize the averaging by multiply border elements of r by
ave_factor_pre(∼
√
ave factor), inverting
˜
A, then multiply z by ave_factor_pre before
summing.
Each preconditioner has its own set of routines and factor storage, except for the global
and block lineJacobi algorithms which share 1D matrix routines. The blockdirect option
uses Lapack library routines. The blockincomplete factorization options are complicated but
eﬀective for midrange condition numbers(arising roughly when ∆t
explicit−limit
∆t τ
A
).
Neither of these two schemes has been updated to function with higherorder node data. The
block and global lineJacobi schemes use 1D solves of couplings along logical directions, and
the latter has its own parallel communication(see D). There is also diagonal preconditioning,
which inverts local direction vector couplings only. This simple scheme is the only one has
functions in both rblocks and tblocks. Computations with both block types and solver =
‘diagonal’ will use the speciﬁed solver in rblocks and ‘diagonal’ in tblocks.
NIMROD grids may have periodic rblocks, degenerate points in rblocks, and cellinterior
data may or may not be eliminated. These idiosyncracies have little eﬀect on the basic
cg steps, but they complicate the preconditioning operations. They also make coupling to
external library solver packages somewhat challenging.
D.2 3D matrices
The conjugate gradient solver for 3D systems is mathematically similar to the 2D solvers.
Computationally, it is quite diﬀerent. The Fourier representation leads to matrices that are
dense in nindex when φdependencies appear in the lhs of an equation. Storage requirements
for such a matrix would have to grow as the number of Fourier components is increased, even
with parallel decomposition. Achieving parallel scaling would be very challenging.
To avoid these issues, we use a ‘matrixfree’ approach, where the matrixvector products
needed for cg iteration are formed directly from rhstype ﬁnite element computation. For
example, were the resistivity in our form if Faraday’s law on p.? a function of φ, it would
generate oﬀdiagonal in n contributions:
¸
jln
B
jln
dRdZdφ
η(R, Z, φ)
µ
0
∇×(¯ α
j
l
e
−in
φ
) · ∇×(¯ α
jl
e
inφ
) (II.13)
which may be nonzero for all n
.
48 CHAPTER II. FORTRAN IMPLEMENTATION
We can switch the summation and integration orders to arrive at
dRdZdφ∇× (¯ α
j
l
e
−in
φ
) ·
η(R, Z, φ)
µ
0
[
¸
jln
B
jln
∇× (¯ α
jl
e
inφ
)] (II.14)
Identifying the term in brackets as µ
0
J(R, Z, φ), the curl of the interpolated and FFT’ed
B, shows how the product can be found as a rhs computation. The B
jln
data are coeﬃ
cients of the operand, but when they are associated with ﬁniteelement structures, calls to
generic_all_eval and math_curl create µ
0
J
n
. Thus, instead of calling an explicit matrix
vector product routine, iter_3d_solve calls get_rhs in ﬁnite element, passing a dotproduct
integrand name. This integrand routine uses the coeﬃcients of the operand in ﬁniteelement
interpolations.
The scheme uses 2D matrix structures to generate partial factors for preconditioning that
do not address n → n
coupling. It also accepts a separate 2D matrix structure to avoid
repetition of diagonal in n operations in the dotintegrand; the product is then the sum of
the ﬁnite element result and a 2Dmatrixvector multiplication. Further, iter_3d_cg solves
systems for changes in ﬁelds directly. The solution ﬁeld at the beginning of the time split is
passed into iter_3d_cg_solve to scale norms appropriately for the stop condition.
E Startup Routines
Development of the physics model often requires new *block_type and matrix storage. [The
storage modules and structure types are described in Sec. B.] Allocation and initialization
of storage are primary tasks of the startup routines located in ﬁle nimrod_init.f. Adding
new variables and matrices is usually a straightforward task of identifying an existing similar
data structure and copying and modifying calls to allocation routines.
The variable_alloc routine creates quadraturepoint storage with calls to rblock_qp_alloc
and tblock_qp_alloc. There are also lagr_quad_alloc and tri_linear_alloc calls to
create ﬁniteelement structures for nonfundamental ﬁelds (like J which is computed from
the fundamental B ﬁeld) and work structures for predicted ﬁelds. Finally, variable_alloc
creates vector_type structures used for diagnostic computations. The quadrature_save
routine allocates and initializes quadrature_point storage for the steadystate (or ’equilib
rium’) ﬁelds, and it initializes quadraturepoint storage for dependent ﬁelds.
Additional ﬁeld initialization information:
• The ˆ e
3
component of be_eq structures use the covariant component (RB
φ
) in toroidal
geometry, and the ˆ e
3
of ja_eq is the contravariant J
φ
/R. The data is converted to
cylindrical components after evaluating at quadrature point locations. The cylindrical
components are saved in their respective qp structures.
• Initialization routines have conditional statements that determine what storage is cre
ated for diﬀerent physics model options.
F. UTILITIES 49
• The pointer_init routine links the vector_type and ﬁnite element structures via
pointer assignment. New ﬁelds will need to be added to this list too.
• The boundary_vals_init routine enforces the homogeneous Dirichlet boundary con
ditions on the initial state. Selected calls can be commented out if inhomogeneous
conditions are appropriate.
The matrix_init routine allocates matrix and preconditioner storage structures. Opera
tions common to all matrix allocations are coded in the *_matrix_init_alloc, iter_fac_alloc,
and *_matrix_fac_degen subroutines. Any new structure allocations can be coded by copy
ing and modifying existing allocations.
F Utilities
The utilities.f ﬁle has subroutines for operations that are not encompassed by finite
_element and normal vector_type operations. The new_dt and ave_field_check sub
routines are particularly important, though not elegantly coded, subroutines. Teh new_dt
routine computes the rate of ﬂow through mesh cells to determine if advection should limit
the time step. Velocity vectors are averaged over all nodes in an element, and rates of ad
vection are found from v· x
i
/x
i

2
 in each cell, where x
i
is a vector displacement across the
cell in the ith logical coordinate direction. A similar computation is performed for electron
ﬂow when the twoﬂuid model is used.
The ave_field_check routine is an important part of our semiimplicit advance. The
‘linear’ terms in the operators use the steadystate ﬁelds and the n = 0 part of the solution.
The coeﬃcient for the isotropic part of the operator is based on the maximum diﬀerence be
tween total pressures and the steady_plus_n=0 pressures, over the φdirection. Thus, both
the ‘linear’ and ‘nonlinear’ parts of the semiimplicit operator change in time. However, com
puting matrix elements and ﬁnding partial factors fro preconditioning are computationally
intensive operations. To avoid calling these operations during every time step, we determine
how much the ﬁelds have changed sicne the last matrix update, and we compute new matrices
when the change exceeds a tolerance (ave_change_limit). [Matrices are also recomputed
when ∆t changes.]
The ave_field_check subroutine uses the vector_type pointers to facilitate the test.
It also considers the gridvertex data only, assuming it would not remain ﬁxed while other
coeﬃcients change. If the tolerance is exceeded, ﬂags such as b0_changed in the global
module are set to true, and the storage arrays for the n = 0 ﬁelds or nonlinear pressures
are updated. Parallel communication is required for domain decomposition of the Fourier
components (see Sec. C).
Before leaving utilities.f, let’s take a quick look at find_bb. This routine is used to
ﬁnd the toroidally symmetric part of the
ˆ
b
ˆ
b dyad, which is used for the 2D preconditioner
matrix for anisotropic thermal conduction. The computation requires information from
50 CHAPTER II. FORTRAN IMPLEMENTATION
all Fourier components, so it cannot occur in a matrix integrand routine. [Moving the
jmode loop from matrix_create to matrix integrands is neither practical nor preferrable.]
Thus,
ˆ
b
ˆ
b is created and stored at quadrature point locations, consistent with an integrand
type of computation, but separate from the ﬁniteelement hierarchy. The routine uses the
mpi_allreduce command to sum contributions from diﬀerent Fourier ’layers’.
Routines like find_bb may become common if we ﬁnd it necessary to use more 3D linear
systems in our advances.
G Extrap_mod
The extrap_mod module holds data and routines for extrapolating solution ﬁelds to the new
time level, providing the initial guess for an iterative linear system solve. The amount of
data saved depends on the extrap_order input parameter.
Code development rarely requires modiﬁcation of these routines, except extrap_init.
This routine decides how much storage is required, and it creates an index for locating the
data for each advance. Therefore, when adding a new equation, increase the dimension fo
extrap_q, extrap_nq, and extrap_int and deﬁne the new values in extrap_init. Again,
using existing code for examples is helpful.
H History Module hist_mod
The timedependent data written to XDRAW binary or text ﬁles is generated by the routines in
hist_mod. The output from probe_hist is presently just a collection of coeﬃcients from a
single grid vertex. Diagnostics requiring surface or volume integrals (toroidal ﬂux and kinetic
energy, for example) use rblock and tblock numerical integration of the integrands coded
in the diagnostic_ints module.
The numerical integration and integrands diﬀer only slightly from those used in form
ing systems for advancing the solution. First, the cell_rhs and cell_crhs vector_types
are allocated with poly_degree=0. This allocates a single cellinterior basis and no grid or
side bases. [The *block_get_rhs routines have ﬂexibility for using diﬀerent basis functions
through a simulation. They ’scatter’ to any vector_type storage provided by the calling rou
tine.] The diagnostic_ints integrands diﬀer in that there are no ¯ α
j
l
e
−in
φ
test functions.
The integrals are over physical densities.
I I/O
Coding new input parameters requires two changes to input.f (in the nimset directory
and an addition to parallel.f. The ﬁrst part of the input module deﬁnes variables and
default values. A description of the parameters should be provided if other users will have
I. I/O 51
access to the modiﬁed code. A new parameter should be deﬁned near related parameters
in the ﬁle. In addition, the parameter must be added to the appropriate namelist in the
read_namelist subroutine, so that it can be read from a nimrod.in ﬁle. The change required
in parallel.f is just to add the new parameterto the list in broadcast_input. This sends
the read values from the single processor that reads nimrod.in to all others. Be sure to use
another mpi_bcast or bcase_str with the same data type as an example.
Changes to dump ﬁles are made infrequently to maintain compatibility across code ver
sions to the greatest extent possible. However, data structures are written and read with
generic subroutines, which makes it easy to add ﬁelds. The overall layout of a dump ﬁle is
1. global information (dump_write and dump_read)
2. seams (dump_write_seam, dump_read_seam)
3. rblocks (* write rblock, * read rblock)
4. tblocks (* write tblock, * read tblock)
Within the rblock and tblock routines are calls to structurespeciﬁc subroutines. These
calls can be copied and modiﬁed to add new ﬁelds. [But, don’t expect compatibility with
other dump ﬁles.]
Other dump notes:
• The read routines allocate data structures to appropriate sizes just before a record is
read.
• All data is written as 8byte real to improve portability of the ﬁle while using FORTRAN
binary I/O.
• Only one processor reads and writes dump ﬁles. Data is transferred to other processors
via MPI communication coded in parallel_io.f.
• The NIMROD and NIMSET dump.f ﬁles are diﬀerent to avoid parallel data requirements
in NIMSET. Any changes must be made to both ﬁles.
52 CHAPTER II. FORTRAN IMPLEMENTATION
Chapter III
Parallelization
A Parallel communication introduction
For the present and forseeable future, ‘highperformance’ computing implies parallel comput
ing. This fact was readily apparent at the inception of the NIMROD project, so NIMROD
has always been developed as a parallel code. The distributed memory paradigm and MPI
communication were chosen for portability and eﬃciency. A well conceived implementation
(thanks largely to Steve Plimpton of Sandia) keeps most parallel communication in modular
coding, isolated from the physics model coding.
A detailed description of NIMROD’s parallel communication is beyond the scope of this
tutorial. However, a basic understanding of the domain decomposition is necessary for
successful development, and all developers will probably write at least one mpi_allreduce
call at some time.
While the idea of dividing a simulation into multiple processes is intuitive, the indepen
dence of the processes is not. When mpirun is used to launch a parallel simulation, it starts
multiple processes that execute NIMROD. The ﬁrst statements in the main program estab
lish communication between the diﬀerent processes, assign an integer label (node) to each
process, and tells each process the total number of processes. From this information and
that read from the input and dump ﬁles, processes determine what part of the simulation to
perform (in parallel_block_init). Each process then performs its part of the simulation
as if it were performing a serial computation, except when it encounters calls to communicate
with other processes. The program itself is responsible for orchestration without an active
director.
Before describing the diﬀerent types of domain decomposition used in NIMROD, lets
return to the find_bb routine in utilities.f for an MPI example. Each process is ﬁnding
¸
n
ˆ
b
n
(R, Z)
ˆ
b
n
(R, Z) for a potentially limited range of nvalues and grid blocks. What’s
needed in every process is the sum over all nvalues, i.e.
ˆ
b
ˆ
b, for each grid block assigned to a
53
54 CHAPTER III. PARALLELIZATION
process. This need is met by calling mpi_allreduce within grid_block do loops. Here an
array of data is sent by each process, and the MPI library routine sums the data element by
element with data from other processes that have diﬀerence Fourier components for the same
block. The resulting array is returned to all processes that participate in the communication.
Then, each process proceeds independently until its next MPI call.
B Grid block decomposition
The distributed memory aspect of domain decomposition implies that each processor needs
direct access to a limited amount of the simulation data. For grid block decomposition, this
is just a matter of having each processor choose a subset of the blocks. Within a layer
of Fourier components (described in III.C.), the block subsets are disjoint. Decomposition
of the data itself is facilitated by the FORTRAN 90 data structures. Each process has an rb
and/or tb block_type array but the sum of the dimensions is the number of elements in its
subset of blocks.
Glancing through finite_element.f, one sees doloops starting from 1 and ending at
nrbl, or starting from nrbl+1 and ending at nbl. Thus, the nrbl and nbl values are the
number of rblocks and rblocks+tblocks in a process. In fact, parallel simulations assign
new block index labels that are local to each process. The rb(ibl)%id and tb(ibl)%id
descriptors can be used to map a processor’s local block index (ibl) to the global index (id)
deﬁned in nimset. parallel_block_init sets up the inverse mapping global2local array
that is stored in pardata.[pardata has comment lines deﬁning the function of each of its
variables.] The global index limits, nrbl_total and nbl_total, are saved in the global
module.
Block to block communication between diﬀerent processes is handled within the edge_network
and edge_seg_network subroutines. These routines call parallel_seam_comm and related
routines instead of the serial edge_accumulate. In routines performing ﬁnite element or
linear algebra computations, cross block communication is invoked with one call, regardless
of parallel decomposition.
For those interested in greater details of the parallel block communication, the paral
lel_seam_init routine in parallel.f creates new data structures during startup. These
send and recv structures are analogous to seam structures, but the array at the highest level
is over processes with which the local process shares block border information.
The parallel_seam_comm routine (and its complex version) use MPI ‘point to point’
communication to exchange information. This means that each process issues separate ‘send’
and ‘receive’ request to every other process involved in the communication.
The general sequence of steps used in NIMROD for ‘point to point’ communications is
1. Issue a mpi_irecv to every process in the exchange, which indicates readiness to accept
data.
C. FOURIER “LAYER” DECOMPOSITION 55
2. Issue a mpi_send to every process involved which sends data from the local process.
3. Perform some serial work, like seaming among diﬀerent blocks of the local process to
avoid wasting CPU time while waiting for data to arrive.
4. Use mpi_waitany to verify that expected data from each participating process has
arrived. Once veriﬁcation is complete, the process can continue on to other operations.
Caution: calls to nonblocking communication (mpi_isend,mpi_irecv,...) seem to have
diﬃculty with buﬀers that are part of F90 structures when the mpich implementation is used.
The diﬃculty is overcome by passing the ﬁrst element, e.g.
CALL mpi_irecv(recv(irecv)%data(1),...)
instead of the entire array, %data,...
As a prelude to the next section, block border communication only involves processes
assigned to the same Fourier layer. This issue is addressed in the block2proc mapping
array. An example statement
inode=block2proc(ibl_global
assigns the node label for the process that owns global block number ibl_global  for the
same layer as the local process  to the variable inode.
Separating processes by some condition (e.g. same layer) is an important tool for NIM
ROD due to the two distinct types of decomposition. Process groups are established by
the mpi_comm_split routine. There are two in parallel_block_init. One creates a com
munication tag grouping all processes in a layer, comm_layer, and the other creates a tag
grouping all processes with the same subset of blocks, but diﬀerent layers, comm_mode. These
tags are particularly useful for collective communication within the groups of processes. The
find_bb mpi_allreduce was one example. Others appear after scalar products of vectors in
2D iterative solves: diﬀerent layers solve diﬀerent systems simultaneously. The sum across
blocks involves processes in the same layer only, hence calls to mpi_allreduce with the
comm_layer tag. Where all processes are involved, or where ‘point to point’ communication
(which references the default node index) is used, the mpi_comm_world tag appears.
C Fourier “layer” decomposition
By this point it’s probably evident that Fourier components are also separated into disjoint
subsets, called layers. The motivation is that a large part of the NIMROD algorithm involves
no interaction among diﬀerent Fourier components, As envisioned until recently, the matrices
would always be 2D, and Fourier component interaction would occur in explicit pseudo
spectral operations only. The new3D linear systems preserve layer decomposition by avoiding
explicit computation of matrix elements that are not diagonal in Fourier index.
56 CHAPTER III. PARALLELIZATION
il=0
n=0
n=1
n=2
il=1
Normal Decomp
il=0
il=1
iphi=1
iphi=2
Configuration!space Decomp.
Figure III.1: Normal Decomp: nmodes=2 on il=0 and nmodes=1 on il=1, nf=mx×my=9
Conﬁgspace Decomp: nphi=2
lphi
= 8, mpseudo=5 on il=0 and mpseudo=4 on il=1
Parameters are mx=my=3, nlayer=2, lphi=3
Apart from the basic mpi_allreduce exchanges, communication among layers is required
to multiply functions of toroidal angle (see B.1). Furthermore, the decomposition of the
domain is changed before an “inverse” FFT (going from Fourier coeﬃcients to values at
toroidal positions ) and after a “forward” FFT. The scheme lets us use serial FFT routines,
and it maintains a balanced workload among the processes, without redundancy.
For illustrative purposes, consider a poorly balanced choice of lphi= 3, 0 ≤ n ≤nmodes_total−1
with nmodes_total= 3, and nlayers= 2. (layers must be divisible by nlayers.) Since
pseudospectral operations are performed on block at a time, we can consider a singleblock
problem without loss of generality. (see Figure III.1)
Notes:
• The domain swap uses the collective mpi_alltoallv routine.
• Communication is carried out inside fft_nim, isolating it from the integrand routines.
D. GLOBALLINE PRECONDITIONING 57
• calls to fft_nim with nr = nf duplicate the entire block’s conﬁgspace data on every
layer.
D Globalline preconditioning
That illconditional matrices arise at large δt implies global propagation of perturbations
within a single time advance. Krylovspace iterative methods require very many iterations
to reach convergence in these conditions, unless the precondition step provides global “relax
ation”. The most eﬀective method we have found for cg on very illconditioned matrices is a
lineJacobi approach. We actually invert two approximate matrices and average the result.
They are deﬁned by eliminating oﬀdiagonal connections in the logical sdirection for one
approximation and by eliminating in the ndirection for the other. Each approximate system
˜
Az = r is solved by inverting 1D matrices,
To achieve global relaxation, lines are extended across block borders. Factoring and solv
ing these systems uses a domain swap, not unlike that used before pseudospectral compu
tations. However, pointtopoint communication is called (from parallel_line_* routines)
instead of collective communication. (see our 1999 APS poster, “Recent algorithmic ...” on
the web site.)
58 CHAPTER III. PARALLELIZATION
Bibliography
[1] N. A. Krall and Trivelpiece. Principles of Plasma Physics. San Francisco Press, 1986.
[2] G. Strang and G. J. Fix. An analysis of the ﬁnite element method. WesleyCambridge
Press, 1987.
[3] M. Abramowitz and I. A. Stegun, editors. Handbook of Mathematical Functions. Wash
ington, D. C.: National Bureau of Standards: For sale by the Supt. of Docs., U.S.
G.P.O., 1964, 1981.
[4] D. D. Schnack, D. C. Barnes, Z. Miki´c, Douglas S. Harned, and E. J. Caramana. Semi
implicit magnetohydrodynamic calculations. J. Comput. Phys., 70:330–354, 1987.
[5] K. Lerbinger and J. F. Luciani. A new semiimplicit approach for MHD computation.
J. Comput. Phys., 97:444, 1991.
[6] D. S. Harned and Z. Miki´c. Accurate semiimplicit treatment of the Hall eﬀect in MHD
computations. J. Comput. Phys., 83:1, 1989.
[7] R. Lionello, Z. Miki´c, and J. A. Linker. J. Comput. Phys., 152:346, 1999.
[8] B. Marder. A method for incorporating Gauss’ law into electromagnetic PIC codes. J.
Comput. Phys., 68:48, 1987.
[9] C. R. Sovinec, A. H. Glasser, T. A. Gianakon, D. C. Barnes, R. A. Nebel, S. E. Kruger,
D. D. Schnack, S. J. Plimpton, A. Tarditi, M. S. Chu, and the NIMROD Team. Non
linear magnetohydrodynamics simulation using highorder ﬁnite elements. J. Comput.
Phys., 195:355–386, 2004.
[10] C. Redwine. Upgrading to Fortran 90. Springer, 1995.
59
Contents
Contents Preface I Framework of NIMROD Kernel A Brief Reviews . . . . . . . . . . . . . . A.1 Equations . . . . . . . . . . . . A.2 Conditions of Interest . . . . . . A.3 Geometric Requirements . . . . B Spatial Discretization . . . . . . . . . . B.1 Fourier Representation . . . . . B.2 Finite Element Representation . B.3 Grid Blocks . . . . . . . . . . . B.4 Boundary Conditions . . . . . . C Temporal Discretization . . . . . . . . C.1 Semiimplicit Aspects . . . . . . C.2 Predictorcorrector Aspects . . C.3 Dissipative Terms . . . . . . . . C.4 · B Constraint . . . . . . . . C.5 Complete time advance . . . . . D Basic functions of the NIMROD kernel . . . . . . . . . . . . . . . . . . . . . . . . i . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i 1 3 3 3 6 6 7 7 8 13 13 13 14 15 15 15 16 18 19 19 19 19 22 23 24 28 29
II FORTRAN Implementation A Implementation map . . . . . . B Data storage and manipulation B.1 Solution Fields . . . . . B.2 Interpolation . . . . . . B.3 Vector_types . . . . . . B.4 Matrix_types . . . . . . B.5 Data Partitioning . . . . B.6 Seams . . . . . . . . . .
ii C Finite Element Computations . . . . C.1 Management Routines . . . . C.2 Finite_element Module . . . C.3 Numerical integration . . . . C.4 Integrands . . . . . . . . . . . C.5 Surface Computations . . . . C.6 Dirichlet boundary conditions C.7 Regularity conditions . . . . . Matrix Solution . . . . . . . . . . . . D.1 2D matrices . . . . . . . . . . D.2 3D matrices . . . . . . . . . . Startup Routines . . . . . . . . . . . Utilities . . . . . . . . . . . . . . . . Extrap_mod . . . . . . . . . . . . . . History Module hist_mod . . . . . . I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
CONTENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 34 36 36 39 43 43 45 46 46 47 48 49 50 50 50 53 53 54 55 57
D
E F G H I
III Parallelization A Parallel communication introduction B Grid block decomposition . . . . . . C Fourier “layer” decomposition . . . . D Globalline preconditioning . . . . . .
. n_int=4) .3 II. . . . . . . . . . . . . . . . . . . .5 II. . . . . . . . . . . . . . . . . . . . . 1D Cubic : extent of βi in red and βj in green. . .8 continuous linear basis function . .1 II. . . . . . . . . . . . . . . . dot denotes quadrature position. . . . Blockborder seaming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 quad points. . . .6 II.7 II. . . . . . . . . . . mpseudo=5 on il=0 and mpseudo=4 on il=1 Parameters are mx=my=3. . . (n_side=2. . . . . . .4 II. . lphi=3 . . nlayer=2. 56 iii . . . . . . note j is an internal node . . . . . . . . .1 I. . .List of Figures I. . . integration gives joff= −1 1D Cubic : extent of αi in red and βj in green. input geom=‘toroidal’ & ‘linear’ schematic of leapfrog . . .2 II. . . . .4 II. . . . . . .1 Normal Decomp: nmodes=2 on il=0 and nmodes=1 on il=1. . . . . . 1D Cubic : extent of αi in red and αj in green. . . . . 9 9 10 17 22 25 26 26 30 31 37 38 Basic Element Example. . . . j are internal nodes iv=vertex index. . . . nf=mx×my=9 Conﬁgspace Decomp: nphi=2lphi = 8. . is=segment index. . . . . . . . . . . . . . . not node III. . . . 3 blocks with corresponding vertices . . . . note both i. . . .3 I. . continuous quadratic basis function . . . .2 I. nvert=count of segment and vertices . 4 elements in a block.
iv LIST OF FIGURES .
The presentation was intended to provide a substantive introduction to the FORTRAN coding used by the kernel.edu. For those using earlier versions of NIMROD (prior to and including 2.3. 1 .0 and data structures and array ordering are diﬀerent than what is described in Section B. we are already considering changes to the timeadvance that will alter the predictorcorrector approach discussed in Section C.wisc. It is the author’s hope that these notes will serve as a manual for the kernel for some time. For example.5. In addition.0. note that basis function ﬂexibility was added at 3.4).2 is the advance of temperature instead of pressure in Section C. This written version of the tutorial contains considerably more information than what was given in the three halfday sessions. the 3. so that attendees would learn enough to be able to further develop NIMROD or to modify it for their own applications.Preface The notes contained in this report were written for a tutorial given to members of the NIMROD Code Development Team and other users of NIMROD.0.2.3 version described here is not complete at this time. Some of the information will inevitably become dated as development continues. If you ﬁnd errors or have comments. but the only diﬀerence reﬂected in this report from version 3. please send them to me at sovinec@engr.
2 LIST OF FIGURES .
Chapter I Framework of NIMROD Kernel
A
A.1
Brief Reviews
Equations
NIMROD started from two generalpurpose PDE solvers, Proto and Proteus. Though remnants of these solvers are a small minority of the present code, most of NIMROD is modular and not speciﬁc to our applications. Therefore, what NIMROD solves can and does change, providing ﬂexibility to developers and users. Nonetheless, the algorithm implemented in NIMROD is tailored to solve ﬂuidbased models of fusion plasmas. The equations are the Maxwell equations without displacement current and the single ﬂuid form (see [1]) of velocity moments of the electron and ion distribution functions, neglecting terms of order me/mi smaller than other terms. Maxwell without displacement current: Ampere’s Law: µ0 J = Gauss’s Law: ·B= 0 Faraday’s Law: ∂B = − ×E ∂t Fluid Moments → Single Fluid Form Quasineutrality is assumed for the time and spatial scales of interest, i.e. • Zni ne → n 3 ×B
4
CHAPTER I. FRAMEWORK OF NIMROD KERNEL • e(Zni − ne)E is negligible compared to other forces • However ·E=0
Continuity: ∂n +V· ∂t Centerofmass Velocity Evolution: ρ • Π often set to ρν V • Kinetic eﬀects, e.g., neoclassical, would appear through Π Temperature Evolution (3.0.2 & earlier evolve p, pe) nα γ−1 ∂ + Vα · ∂t Tα = −pα · Vα − Πα : Vα − · qα + Qα ∂V + ρV · ∂t n = −n ·V
V = J×B−
p−
·Π
• qα = −nα χα · Tα in collisional limit • Qe includes ηJ 2 • α = i, e • pα = nα kTα Generalized Ohm’s (∼ electron velocity moment)(underbrace denotes input parameters, unless otherwise noted ohms=)
E=
−V × B ‘mhd’ ‘‘mhd & hall’ ‘2fl’
+ηJ
elecd>0
1 + J×B ne ‘hall’ ‘mhd & hall’ ‘2fl’
me + 2 ne
∂J e + ·Π ( pe + · (JV + VJ) − ) ∂t me neoclassical advect=‘all’ ‘2fl’
A steadystate part of the solution is separated from the solution ﬁelds in NIMROD. For resistive MHD for example, ∂/∂t → 0 implies
A. BRIEF REVIEWS 1. ρsVs · 2. 3. 4. Vs = Js × Bs − ps + · ρs ν V s
5
· (ns Vs) = 0 × Es =
ns γ−1 Vs
× (ηJs − Vs × Bs) = 0 Ts = −ps · Vs + · nsχ · Ts + Qs
·
Notes: • An equilibrium (GradShafranov) solution is a solution of 1 (Js × Bs = ps); 2 – 4
• Loading an ‘equilibrium’ into NIMROD means that 1 – 4 are assumed. This is appropriate and convenient in many cases, but it’s not appropriate when the ‘equilibrium’ already contains expected contributions from electromagnetic activity. • The steadystate part may be set to 0 or to the vacuum ﬁeld. Decomposing each ﬁeld into steady and evolving parts and canceling steady terms gives (for MHD) [A → As + A] ∂n + Vs · ∂t n+V· ns + V · n = −ns ·V−n · Vs − n ·V
(ρs + ρ)
∂V + ρ [(Vs + V) · ∂t
(Vs + V)] + ρs [Vs · p+
V+V·
Vs + V ·
V]
= Js × B + J × Bs + J × B − ∂B = ∂t
· ν [ρ Vs + ρs V + ρ V]
× (Vs × B + V × Bs + V × B − ηJ) ns [Vs · γ−1 ·V T]
(n + ns) ∂T n + [(Vs + V) · γ − 1 ∂t γ −1 = −ps + Notes: · [ns χ ·
(Ts + T )] + ·V−p
T +V·
Ts + V ·
T]
· Vs − p
T + nχ · Ts + nχ ·
• Poloidal cross section may be complicated. * All eﬀects are present in equations suﬃciently complete to model behavior in experiments. and linear terms tend to cancel. • Geometric symmetry of the toroidal direction was assumed. 2.2 Conditions of Interest The objective of the project is to simulate the electromagnetic behavior of magnetic conﬁnement fusion plasmas. Kinetic closures lead to integrodiﬀerential equations. • Global magnetic changes occur over the global resistive diﬀusion time. FRAMEWORK OF NIMROD KERNEL • The code further separates χ (and other dissipation coeﬃcients ?) into steady and evolving parts. General Notes: 1. decomposition adds to computations per step and code complexity. Collisional closures lead to local spatial derivatives only. • MHD waves propagate 104 – 1010 times faster. ∂ equations are the basis for the NIMROD timeadvance. It would also require complete source terms.3 Geometric Requirements The ability to accurately model the geometry of speciﬁc tokamak experiments has always been a requirement for the project.6 CHAPTER I. • The motivation is that timeevolving parts of ﬁelds can be orders of magnitude smaller than the steady part. ∂t A. so skipping decompositions would require tremendous accuracy in the large steady part. • Nonetheless. A. • The Team added: – Periodic linear geometry – Simplyconnected domains . • Topologychanging modes grow on intermediate time scales. Realistic conditions make the ﬂuid equations stiﬀ.
An . n > 0. • Linear computations ﬁnd An (t) (typically) for a single nvalue. 0 ≤ m ≤ 2N − 1 N • FFT returns Fourier coeﬃcients En. ‘lin_nmodes’) • Nonlinear simulations use FFTs to compute products on a uniform mesh in φ and z. 0 ≤ n ≤ N • FFT gives V m • Multiply (colocation) E m π N = −V m π N ×B m π . 0 ≤ m ≤ 2N − 1 N π N .1 Spatial Discretization Fourier Representation A pseudospectral method is used to represent the periodic direction of the domain. and the real coeﬃcient A0 . • Input parameter ‘lphi’ determines 2N. (Input ‘lin_nmax’.B. Example of nonlinear product: E = −V × B • Start with Vn. so we solve for the complex coeﬃcients. A truncated Fourier series yields a set of coeﬃcients suitable for numerical computation: N A (φ) = A0 + n=1 An einφ + A∗ e−inφ n or N A (z) = A0 + n=1 An ei 2πn z L + A∗ e−i n 2πn z L All physical ﬁelds are real functions of space. SPATIAL DISCRETIZATION 7 B B. 0 ≤ n ≤ N Standard FFTs require 2N to be a power of 2. B m π . 2N = 2lphi . Bn.
Substituting a truncated series for f (φ) changes the PDE problem into a search for the best solution on a restricted solution space. Orthogonality of basis functions is used to ﬁnd a system of discrete equations for the coeﬃcients.c. • One can show that setting Vn. n=1 dφe−in φ for 0 ≤ n ≤ N pol × En + ˆ in φ × En R . and the discrete equations describe the evolution of the coeﬃcients of the basis functions. FRAMEWORK OF NIMROD KERNEL • Collocation is equivalent to spectral convolution if aliasing errors are prevented. e. = 0 for n > 2N 3 prevents aliasing from quadratic 2lphi 3 nonlinearities (products of two functions). Both are basis function expansions that impose a restriction on the solution space.8 CHAPTER I. 0≤n ≤N B.c. etc. Bn. 2. 0 x−xi−1 0 x < xi−1 xi−1 ≤ x ≤ xi xi ≤ x ≤ xi+1 x > xi+1 αi (x) = xi −xi−1 xi+1 −x xi+1 −xi . Faraday’s law ∂B (φ) = − × E (φ) ∂t ∂ Bneinφ + c.2 Finite Element Representation Finite elements achieve discretization in a manner similar to the Fourier series. The basis functions of ﬁnite elements are loworder polynomials over limited regions of the domain.g. nmodes total = +1 • Aliasing does not prevent numerical convergence if the expansion converges. = − B0 + ∂t n=1 apply 2π ∂Bn =− ∂t dφ N N × E0 + En einφ + c. As a prelude to the ﬁnite element discretization. Fourier expansions: 1. the restricted spaces become better approximations of the continuous solution space as more basis functions are used.
B. Z) = j Aj αj (R. • Terms in the weak form must be integrable after any integrationbyparts.1: continuous linear basis function !i(x) 1 1 "i(x) x i!1 xi x i+1 x i!1 xi x i+1 Figure I. Using the restricted space implies that we seek solutions of the form A (R.2: continuous quadratic basis function 0 (x−x )(x−x 1 ) x < xi−1 i−1 i− (x −x )(x −x 21 ) xi−1 ≤ x ≤ xi i i−1 i 2 (x−xi+1 )(x−xi+ 1 ) 2 2 i− αi(x) = 0 (xi −xi+1 )(xi −xi+ 1 ) xi ≤ x ≤ xi+1 x > xi+1 βi(x) = 0 (x−xi−1 )(x−xi ) (xi− 1 −xi−1 )(xi− 1 −xi ) 2 2 xi−1 ≤ x ≤ xi otherwise In a conforming approximation. but derivatives need to be piecewise continuous only. but delta functions are not. Z) . step functions are admissible. continuity requirements for the restricted space are the same as those for the solution space for an analytic variational form of the problem. • Our ﬂuid equations require a continuous solution space in this sense. SPATIAL DISCRETIZATION !i(x) 1 9 x i!1 xi x i+1 Figure I.
The present version of NIMROD allows the user to choose the polynomial degree of the basis functions (poly_degree) and the mapping (‘met_spl’).10 CHAPTER I. Z. lowest order polynomials of physical coordinates. (See [2] for analysis of selfadjoint problems.e.3: input geom=‘toroidal’ & ‘linear’ For toroidal geometries. we must use the fact that eR.2)] • 2D basis functions are formed as products of 1D basis functions for R and Z. i. For scalar ﬁelds. All recent versions of NIMROD have equations expressed in (R. FRAMEWORK OF NIMROD KERNEL • The jsummation is over the nodes of the space. For vector ﬁelds. z eZ eR e# y ey ex ez x z Figure I. [αj &βj in 1D quadratic example (Figure I. Z) for linear geometries. • αj (R. Y. Numerical analysis shows that a correct application of the ﬁnite element approach ties convergence of the numerical solution to the best possible approximation by the restricted space. ˆ e . Nonuniform meshing can be handled by a mapping from logical coordinates to physical coordinates. we now have enough basis functions. we also need direction vectors. To preserve convergence properties. the order of the mapping should be no higher than polynomial degree of the basis functions[2]. eφ are functions of φ. φ) coordinates for toroidal geometries and (X.) Error bounds follow from the error bounds of a truncated Taylor series expansion. but we always ˆ ˆ have ei · ˆj = δij . Z) here may represent diﬀerent polynomials.
and make n sums from 0 to N. Z. SPATIAL DISCRETIZATION • Expanding scalars N 11 S (R.c. their ﬁnite extent means that dRdZD s (αi (R. φ. 0 ≤ n ≤ N ¯ • Integrate by parts to symmetrize the diﬀusion term and to avoid inadmissible derivatives of our solution space. and using E to represent the ideal −V ×B ﬁeld (after pseudospectral calculations): • Expand B (R. t) • Apply dRdZdφ αj l (R. φ) → αj (R. we create a weak form of the PDEs by multiplying by a conjugated basis function. Z)) D t (αj (R.c. φ) → • Expanding vectors αj (R. only where nodes i and j are in the same element (inclusive of element borders). φ. Z. l {R. Z. φ} . where D s is a spatial derivative operator of allowable order s (here 0 or 1).c. ˆ While our poloidal basis functions do not have orthogonality properties like einφ and eR . Returning to Faraday’s law for resistive MHD. Z) el Vj l0 + ˆ j l n=1 Vj lneinφ + c. Z) e−in φ for j = 1 : # nodes. Z)) = 0. t) and E (R. N V (R. Z) Sj 0 + j n=1 Sj neinφ + c.B. ¯ ˆ where N here is nmodes_total−1. Integrands of this form arise from a procedure analogous to that used with Fourier series alone. Z. Z. * For convenience αj l ≡ αj el and I’ll suppress c. αj l e−in φ · ¯ dRdZdφ ∂Bj ln αj l einφ ¯ ∂t dRdZdφ =− η µ0 Bj ln j ln j ln × αj l e−in φ · ¯ Ej ln αj leinφ ¯ × αj l einφ + ¯ j ln .
• The rhs is left as a set of integrals over functions of space. l ) and column (j.e. • {Bj ln } constitutes the solution vector at an advanced time (see C). using the same functions for test and basis functions (Galerkin) leads to the same equations as minimizing the corresponding variational [2]. • The lhs is written as a matrixvector product with the dRdZ f (¯j l ) g (¯j l ) integrals α α deﬁning a matrix element for row (j . and rearranging the lhs. n } Notes on this weak form of Faraday’s law: • Surface integrals for interelement boundaries cancel in conforming approximations. . These elements are diagonal in n. η • For selfadjoint operators. RayleighRitzGalerkin problem. FRAMEWORK OF NIMROD KERNEL + αj l e−in φ · ¯ j ln Ej ln αj l einφ × dS ¯ Using orthogonality to simplify. l). • The remaining surface integral is at the domain boundary. such as I+∆t × µ0 ×.12 CHAPTER I. l . i. 2π j ∂Bj l n ∂t η µ0 dRdZ αj αj + jl Bj ln dRdZdφ × αj l e−in φ · ¯ × αj lein φ ¯ Ej l αj l ein φ ¯ =− dRdZdφ × αj l e−in φ · ¯ jl + αj l · ¯ jl Ej ln αj l × dS ¯ for all {j .
The solution ﬁeld is marched in a sequence of timesteps from initial conditions. Both may be used in the same simulation. as in ﬁnite diﬀerence or ﬁnite volume methods. In NIMROD we simply substitute the Dirichlet boundary condition equations in the linear system at the appropriate rows for the boundary nodes. ξi }. but blocks don’t have to conform. The integrals are computed numerically with Gaussian quadrature 0 f (ξ) dξ → i wif (ξi ) where {wi . Neuman conditions are not set explicitly. i = 1. . η are logical coordinates. 1 dRdZ → J dξdη. Boundary conditions can be modiﬁed for the needs of particular applications. The variational aspects enforce Neuman conditions . 3.through the appearance (or nonappearance) of surface integrals . ng are prescribed by the quadrature method [3]. Formally. ∂α ∂R ∂ξ ∂η → ∂α ∂R + ∂α ∂R and similarly for Z. Unstructured blocks of triangular elements are called ‘tblocks’. More notes on weak forms: • The ﬁnite extent of the basis functions leads to sparse matrices • The mapping for nonuniform meshing has not been expressed explicitly. It entails: 1. Dirichlet conditions are also imposed on Tα . They are called ‘rblocks’.C. . . NIMROD has two types of blocks: Logically rectangular blocks of quadrilateral elements are structured. TEMPORAL DISCRETIZATION 13 B. If thermal conduction is used. where α is a low order polynomial of ξ ∂ξ ∂η and η with uniform meshing in (ξ. and J is the Jacobian. C Temporal Discretization NIMROD uses ﬁnite diﬀerence methods to discretize time. Dirichlet conditions are enforced on the restricted solution space (“essential conditions”). B.in the limit of inﬁnite spatial resolution ([2] has an excellent description). The kernel requires conformance of elements along block boundaries. 2.4 Boundary Conditions Dirichlet conditions are normally enforced for Bnormal and V. the poloidal mesh can be constructed from multiple gridblocks. . however. η).3 Grid Blocks For geometric ﬂexibility and parallel domain decomposition. where ξ. These “natural” conditions preserve the spatial convergence rate without using ghost cells.
This may lead to ‘the ultimate’ timeadvance scheme. we have chosen to take advantage of the smallness of the solution ﬁeld with respect to steady parts → the dominant stiﬀness arises in linear terms. FRAMEWORK OF NIMROD KERNEL C. Truncation errors are purely dispersive. the semiimplicit method as applied to hyperbolic systems of PDEs can be described as a leapfrog approach that uses a selfadjoint diﬀerential operator to extend the range of numerical stability to arbitrarily large ∆t. • Anisotropic terms use steady + (n = 0) ﬁelds → operators are diagonal in n. the semiimplicit method does not introduce numerical dissipation. Hall terms must be used with great care until further development is pursued. (See [4.1 Semiimplicit Aspects Without going into an analysis. then with the Hall electric ﬁeld. low dissipation systems like plasmas. It is the tool we use to address the stiﬀness of the ﬂuid model as applied to fusion plasmas. BMHD = Bn − ∆t Bn+1 = BMHD − ∆t × EMHD × EHall Though it precludes 2 order temporal convergence. requiring nonlinear system solves at each time step. this splitting reduces errors of large timestep [6]. Why not an implicit approach? A true implicit method for nonlinear systems converges on nonlinear terms at advanced times. the operators lead to linear matrix equations. and min ωk  0. 5].) Though any selfadjoint operator can be used to achieve numerical stability. At present. • Leads to very illconditioned matrices. nd . Like the leapfrog method. when the Hall electric ﬁeld is included in a computation. NIMROD also uses timesplitting. This aspect makes semiimplicit methods attractive for stiﬀ. The operator used in NIMROD to stabilize the MHD advance is the linear idealMHD force operator (no ﬂow) plus a Laplacian with a relatively small coeﬃcient [5]. B is ﬁrst advanced with the MHD electric ﬁeld. • We have run tokamak simulations accurately at max ωk ∆t ∼ 104 – 105. A separate semiimplicit operator is then used in the Hall advance of B. the operator implemented in NIMROD often gives inaccuracies at ∆t much larger than the explicit stability limit for a predictor/corrector advance of B due to EHall .14 CHAPTER I. accuracy at large ∆t is determined by how well the operator represents the physical behavior of the system [4]. For example. • Eigenvalues for the MHD operator are ∼ 1 + ω2∆t2 where ωk are the frequencies of the k MHD waves supported by the spatial discretization. However. Although the semiimplicit approach eliminates timestep restrictions from wave propagation.
f in the nimset directory. where C is O (1) and depends on centering. so we set centerings to 1/2 + δ and rely on having a little dissipation. even for Vs · terms. • Nonlinear and linear computations require two solves per equation.1) ∂ ·B = ∂t ·κ ·B ( · B) (I. We typically use fully forward centering. • Synergistic waveﬂow eﬀects require having the semiimplicit operator in the predictor step and centering waveproducing terms in the corrector step when there are no dissipation terms [7] • NIMROD does not have separate centering parameters for wave and advective terms. A predictorcorrector approach can be made numerically stable. the error will grow until it crashes the simulation.] C. TEMPORAL DISCRETIZATION 15 C. V [See input.4 · B Constraint Like many MHD codes capable of using nonuniform. • Numerical stability still requires ∆t ≤ C ∆x .2) dS · B[9]. κ ·B ∂B =− ∂t Applying · gives ×E+κ ·B ·B (I.C. C. nonorthogonal meshes. so that the error is diﬀused if the boundary conditions maintain constant .3 Dissipative Terms Dissipation terms are coded with timecenterings speciﬁed through input parameters.2 Predictorcorrector Aspects The semiimplicit approach does not ensure numerical stability for advection. · B is not identically 0 in NIMROD. We have used an errordiﬀusion approach [8] that adds the unphysical diﬀusive term · B to Faraday’s law. Left uncontrolled.
V∗ = Vn + fv (∆V)predict . 1+ = (∆t) me eta × −(∆t)f ·B κ ·B + (∆t)fη · (∆B)pass 2 ne µ0 n 1 · Πeneo + (∆t)κ ·B × Vn+1 × B∗ − × Bn + · Bn µ0 ne × n∗ + n∗ · Vn+1 • pass = predict → B∗ = Bn • pass = correct → B∗ = Bn + fb (∆B)predict • BMHD = Bn + ∆Bcorrect 1+ × me B0  × −∆tf ·B κ ·B + Csb (∆t)2 · (∆B)pass b 2 ne ne 1 ∗ me (J × B∗ − pn) + 2 · (J∗ Vn+1 + Vn+1 J∗ ) × e ne ne · BMHD +(∆t)b κ ·B = −(∆t)b . • For pass = correct. Cs∗ ’s are semiimplicit coeﬃcients.0. V∗ = Vn.5 Complete time advance A complete timestep in version 3. FRAMEWORK OF NIMROD KERNEL C.16 CHAPTER I. ρn − ∆tfν · ρn ν Csm ∆t2 [B0 × × × (B0 × I·) − J0 × × (B0 × I·)] 4 Csn∆t2 2 Csp ∆t2 (∆V)pass [ γP0 · + [ (p0 ) · I·]] − − 4 4 = −∆t [ρnV∗ · V∗ + Jn × Bn − pn + · ρn ν Vn + · Πneo ] − • For pass = predict.3 will solve coeﬃcients of the basis functions used in the following timediscrete equations. and • Vn+1 = Vn + (∆V)correct (∆n)pass = −∆t Vn+1 · • pass = predict → n∗ = nn • pass = correct → n∗ = nn + fn (∆n)predict • nn+1 = nn + (∆n)correct Note: f∗ ’s are centering coeﬃcients.
.C. B∗ = BMHD + (∆B)predict • Bn+1 = BMHD + (∆B)correct nn+1 + (∆t)fχ γ−1 ∗ Tα + pn α 17 · nn+1 χα · · nn+1 χα · (∆Tα )pass n Tα + Q = −∆t nn+1 n+1 · V γ −1 α · Vn+1 − α ∗ n • pass = predict → Tα = Tα ∗ n • pass = correct → Tα = Tα + fT (∆Tα )predict n+1 n • Tα = Tα + (∆Tα )correct • α = e.3 • Without the Hall timesplit. i Notes on timeadvance in 3.T} n V n+1/2 t Figure I. TEMPORAL DISCRETIZATION • pass = predict → (∆t)b = ∆t/2.T} n!1 n!1/2 {B. • Fields used in semiimplicit operators are updated only after changing more than a set level to reduce unnecessary compution of the matrix elements (which are costly).4: schematic of leapfrog • The diﬀerent centering of the Hall time split is required for numerical stability (see Nebel notes). approximate timecentered leapfrogging could be achieved 1 n n + nn+1 in the temperature equation by using 2 V {B. B∗ = BMHD • pass = correct → (∆t)b = ∆t.n. ∂J electron inertia term necessarily appears in both timesplits for B when it is • the ∂t included in E.n.0.
Output • Write global. and initial ﬁelds from dump ﬁle • Allocate storage for work arrays. as needed (b) Advance each equation • Allocate rhs and solution vector storage • Extrapolate in time previous solutions to provide an initial guess for the linear solve(s). Startup • Initialize parallel communication as needed • Read grid. • Update matrix and preconditioner if needed • Create rhs vector • Solve linear system(s) • Reinforce Dirichlet conditions • Complete regularity conditions if needed • Update stored solution • Deallocate temporary storage 3. Check stop condition • Problems such as lack of iterative solver convergence can stop simulations at any point . matrices. equilibrium.18 CHAPTER I. Time Advance (a) Preliminary Computations • Revise ∆t to satisfy CFL if needed • Check how much ﬁelds have changed since matrices were last updated. FRAMEWORK OF NIMROD KERNEL D Basic functions of the NIMROD kernel • Read input parameters 1. Set ﬂags if matrices need to be recomputed • Update neoclassical coeﬃcients. energy. voltages. and probe diagnostics to binary and/or text ﬁles at desired frequency in timestep (nhist) • Write complete dump ﬁles at desired frequency in timestep (ndump) 4. etc. etc. 2.
Most of the operations are repeated for each advance. so FORTRAN module grouping of subprograms on a level helps orchestrate the hierarchy. Use of KIND when deﬁning data settles platformspeciﬁc issues of precision.Chapter II FORTRAN Implementation Preliminary remarks As implied in the Framework section. the NIMROD kernel must complete a variety of tasks each time step.) A Implementation map All computationally intensive operations arise during either 1) the ﬁnite element calculations to ﬁnd rhs vectors and matrices. The data and operation needs of subprograms at a level in the hierarchy tend to be similar. and 2) solution of the linear systems. and equationspeciﬁc operations occur at only two levels in a hierarchy of subprograms for 1). 19 . Array syntax leads to cleaner code. Modularity has been essential in keeping the code manageable as the physics model has grown in complexity. FORTRAN 90 modules organize diﬀerent types of data and subprograms. This information is stored in FORTRAN 90 structures that are located in the fields module.1 Data storage and manipulation Solution Fields The data of a NIMROD simulation are the sets of basis functions coeﬃcients for each solution ﬁeld plus those for the grid and steadystate ﬁelds. B B. FORTRAN 90 extensions are also an important ingredient. ([10] is recommended. Arrays of data structures provide a clean way to store similar sets of data that have diﬀering numbers of elements.
2. all caps indicate a module name.20 Implementation Map of Primary NIMROD Operations ·adv * routines manage an advance of one ﬁeld rr rr j 6 6 ¨¨ % ¨ E ¢ gg gg c ¢ g g ¢ ITER CG F90 g g ¢ c g g perform real 2D matrix ¢ c g ¨ g ¢ RBLOCK(TBLOCK) ¨ solves SURFACE g ¢ ¨¨ g ·rblock make matrix c g g ¢ ·surface comp rhs ¨¨ g g ·rblock get rhs ¢ ¨¨ ITER CG COMP perform numerical ¨ g g ¢ Perform numerical integration in g g perform complex 2D matrix surface integrals g g rblocks c g g solves c EDGE c g g INTEGRANDS g g NEOCLASS INT ·edge network g g Perform most equaperform computations carry out communication g g FINITE ELEMENT BOUNDARY ' ·matrix create essential conditions ·get rhs ' organize f. FORTRAN IMPLEMENTATION NOTE:1.e.‘·’ indicates a subroutine name or interface to module procedure . computations REGULARITY ITER 3D CG MOD perform 3D matrix solves tion speciﬁc pointwise computations across block borders c c for neoclassicalspeciﬁc equations c § c c c ¥ FFT MOD C ' SURFACE INTS perform surfacespeciﬁc computations VECTOR TYPE MOD perform ops on vector coeﬃcients c c c MATH TRAN do a routine math operation GENERIC EVALS interpolate or ﬁnd storage for data at quadrature points MATRIX MOD compute 2D matrixvector products CHAPTER II.
dalpdr. A lagr_quad_type structure holds the coeﬃcients and information for evaluating the ﬁeld at any (ξ. This is a structure holding information such as the celldimensions of the block(mx. DATA STORAGE AND MANIPULATION 21 • rb is a 1D pointer array of deﬁnedtype rblock_type. and two sets of structures for each solution ﬁeld: 1. along with the interpolation routines that use them. arrays of basis function information at Gaussian quadrature nodes (alpha. just a pointer array. These structures hold pointer arrays that are used as pointers (as opposed to using them as ALLOCATABLE arrays in structures). An rb_comp_qp_type. The lagr_quad_types are deﬁned in the lagr_quad_mod module. • The type rblock_type (analogy:integer). The main storage locations are the fs. fsh. Here. fsv.xg. They are allocated 1:nbl. • fs(iv. and fsi arrays. is deﬁned in module rblock_type_mod. 2. [The tri_linear_types are the analogous structures for triangular elements in the tri_linear module. The elementindependent structure(vector_types) are used in linear algebra operations with the coeﬃcients. • tb is a 1D pointer array. there are also 1D pointer arrays of cvector_type and vector_type for each solution and steadystate ﬁeld. (See [10]) In general. where nrbl is the number of rblocks.] The lagr_quad_2D_type is used for φindependent ﬁelds. There are pointer assignments from these arrays to the lagr_quad_type structure arrays that have allocated storage space.B.yg). numerical integration information(wg. it is dimensioned 1:nrbl. let’s backup to the fields module level. here) are used in conjunction with the basis functions (as occurs in getting ﬁeld values at quadrature point locations). containing array dimensions and character variables for names. There are also some arrays used for interpolation. These structures are used to give blocktypeindependent access to coeﬃcients.im) → coeﬃcients for gridvertex nodes.iy. η).ix.dalpdz). Before delving into the elementspeciﬁc structures. respectively. the elementspeciﬁc structures(lagr_quad_type. At the start of a simulation.my). allocated to save the interpolates of the solution spaces at the numerical quadrature points. similar to rb but of type tblock_type_mod. dimensioned nrbl+1:nbl so that tblocks have numbers distinct from the rblocks. The structures are selfdescribing. – iv = direction vector component (l:nqty) – ix = horizontal logical coordinate index (0:mx) – iy = vertical logical coordinate index (0:my) .
ix.1.:) fsh(:.ix.ix!1.ix.η ) and αi(ξ .:) fs(:.:) fsh(:. since interpolations are needed to carry out the numerical integrations. For both real and complex data structures.4.2 Interpolation The lagr_quad_mod module also contains subroutines to carry out the interpolation of ﬁelds to arbitrary positions.ix. • fsi(iv.2.iy!1. there are single point evaluation routines (*_eval) and routines for ﬁnding the ﬁeld at the same oﬀset in every cell of a block ∂ ∂ αi (ξ .1.ib.:) fsv(:.2.ix.im) → coeﬃcients for vertical side nodes.iy.η ) (*_all_eval routines).:) fsh(:.iy. This is a critical aspect of the ﬁnite element algorithm.ix.2. All routines start by ﬁnding αi(ξ .iy.ix.iy.iy!1.:) fsi(:.1: Basic Element Example.iy.:) fs(:.ix!1.:) Figure II.2.2.iy.:) fsi(:.:) fsi(:.ix.ix.1.ix.im) → coeﬃcients for internal nodes [ib limit is (1:n_int)] fs(:.iy.:) fsv(:.ix.iy.ix.1.iy.iy.:) fsv(:.:) fsv(:.iy.:) fs(:. (n_side=2.ib.iy!1. n_int=4) B.iy.iy.iy!1. – iv = direction vector component (*:*) – ib = basis index (1:n_side) – ix = horizontal logical coordinate index (1:mx) – iy = vertical logical coordinate index (0:my) – im = Fourier component (“mode”) index (1:nfour) • fsv(iv.iy.ix!1.η ) [and ∂ξ ∂η .ix. FORTRAN IMPLEMENTATION – im = Fourier component (“mode”) index (1:nfour) • fsh(iv.iy.3.im) → coeﬃcients for horizontal side nodes.ix.1.22 CHAPTER II.:) fsh(:.:) fsi(:.ib.ix.ix!1.
dalpdr. I won’t cover tblocks in any detail.] The interpolation routines then multiply basis function values with respective coeﬃcients. So we have • CALL vector_assign_cvec(vectr(ibl). DATA STORAGE AND MANIPULATION 23 if requested] for the (element–independent) basis function values at the requested oﬀset (ξ . stored in the rb%alpha. putting block and basisspeciﬁc array idiosyncrasies into module routines. and Re{Bφ }. we need to create a linear combination of the old ﬁeld and the predicted solution. However. The data structure. development for released versions must be suitable for simulations with tblocks.] Vector_type structures simplify these operations. There is a loop over Fourier components that calls two real 2D matrix solves per component. After a solve in a predictor step (b_pass=bmhd pre or bhall pre). and fy. Interpolation operations are also needed in tblocks. One solves α for Re{BR}. Im{BZ }.f for an example of a single point interpolation call. 3D (complex) vector_type • work1 (1:nbl) → pointers to storage that will be B∗ in corrector step. fx.B. [It’s also used at startup to ﬁnd the basis function information at quadrature point locations. The developer writes one call to a subroutine interface in a loop over all blocks. dalpdz arrays for use as test functions in the ﬁnite element computations. be(ibl). flag. and the other solves for Im{BR}. [Management routines. The lagr_quad_bases routine is used for this operation. The computations are analogous but somewhat more complicated due to the unstructured nature of these blocks. Results for the interpolation are returned in the structure arrays f. and they need arrays for the returned values and logical derivatives in each cell. logical positions for the interpolation point. η ).3 Vector_types We often perform linear algebra operations on vectors of coeﬃcients. and −Im{Bφ} for the Fcomp. See generic_3D_all_eval for an example of an all_eval call. At this point. B. and desired derivative order are passed into the singlepoint interpolation routine. See routine struct_set_lay in diagnose. imode) . etc. Re{BZ }. 3D vector_type pre n What we want is B∗ imode = fb Bimode + (1 − fb )Bimode . iterative solves. 2D real vector_type • sln (1:nbl) → holds linear system solution. The all_eval routines need logical oﬀsets (from the lowerleft corner of each cell) instead of logical coordinates. • vectr (1:nbl) → work space. since they are due for a substantial upgrade to allow arbitrary polynomial degree basis functions. also a 2D vector_type • be (1:nbl) → pointers to Bn coeﬃcients. Consider an example from the adv_b_iso management routine. like rblocks.
Here we interface vector_add_vec.e. so we can write statements like work1(ibl) = be(ibl) be(ibl) = 0 if arrays conform which perform the assignment operations component array to component array or scalar to each component array element. • CALL vector_add(sln(ibl).f ﬁle. The 6D matrix is unimaginatively named arr. Re{BZ }.f ﬁle. Thus. each lowestlevel array contains all matrix elements that are multiplied with a gridvertex. so there are no Fourier component indices at this level. imode) – Transfer linear combination to appropriate vector components and Fourier component of work1. Going back to the end of the vector_type_mode. and arri respectively) i. Referencing one element appears as . arrh.4 Matrix_types The matrix_type_mod module has deﬁnitions for structures used to store matrix elements and for structures that store data for preconditioning iterative solves. verticalside. Examining vector_type_mod at the end of the vector_type_mode. horizontalside. FORTRAN IMPLEMENTATION – Transfers the flagindicated components such as r12mi3 for Re{BR}. we have arrays for matrix elements that will be used in matrixvector product routines. The necessity of such complication arises from varying array sizes associated with diﬀerent blocks and basis types. and our basis functions do not produce matrix couplings from the interior of one grid block to the interior of another. B. • CALL cvector_assign_vec(work1(ibl). vectr(ibl). This subroutine multiplies each coeﬃcient array and sums them. and coding for coeﬃcients in diﬀerent arrays is removed from the calling subroutine. note that ‘=’ is overloaded. The matrices are 2D. Furthermore. as determined by the type deﬁnitions of the passed arrays. all coeﬃcients for a single “basistype” within a block. and −Im{Bφ} for imode to the 2D vector. array ordering has been chosen to optimize these product routines which are called during every iteration of a conjugate gradient solve. or cellinterior vector type array (arr. arrv. Note that we did not have to do anything diﬀerent for tblocks. sln(ibl). v2fac=‘1center’) – Adds ‘1center’*vectr to ‘center’*sln and saves results in sln. At the bottom of the structure. flag. The structures are somewhat complicated in that there are levels containing pointer arrays of other structures.24 CHAPTER II. v1fac=‘center’. ‘vector_add is deﬁned as an interface label to three subroutines that perform the linear combination for each vector_type.
ix.jyoff. Though oﬀset requirements are the same. while vertices are numbered from 0. horizontalside. The oﬀset range is set by the extent of the corresponding basis functions. DATA STORAGE AND MANIPULATION arr(jq. Finally. The ‘outerproduct’ nature implies unique oﬀset ranges for gridvertex. j*off=0 for celltocell.jxoff. The column coordinates are simply jx = ix + jxoff jy = iy + jyoff For each dimension.iy) where iq ix iy jq jxoff jyoff = = = = = = ‘quantity’ row index horizontalcoordinate row index verticalcoordinate row index ‘quantity’ column index horizontalcoordinate column oﬀset verticalcoordinate column oﬀset 25 For gridvertex to gridvertex connections 0 ≤ ix ≤ mx and 0 ≤ iy ≤ my. giving dense storage for a sparse matrix without indirect addressing. with storage zeroed out where the oﬀset extends beyond the block border. integration gives joff= −1 Oﬀsets are more limited for basis functions that have their nonzero node in a cell interior. and cellinterior centered coeﬃcients.iq. such as the two sketched here. since cells are numbered from 1. we also need to distinguish diﬀerent bases of the same type. . vertex to vertex oﬀsets are −1 ≤ j ∗ off ≤ 1. AZ . Integration here gives joff=1 matrix elements. (simply 1 : 3 for AR . The structuring of rblocks permits oﬀset storage for the remaining indices. Aφ or just 1 for a scalar).B.2: 1D Cubic : extent of αi in red and αj in green. iq and jq correspond to the vector index dimension of the vector type. Thus for gridvertextocell connections −1 ≤ jxoff ≤ 0. i j Figure II. verticalside. while for celltogridvertex connections 0 ≤ jxoff ≤ 1.
1 ≤ iq ≤ 9 Storing separate arrays for matrix elements connecting each of the basis types therefore lets us have contiguous storage for nonzero elements despite diﬀerent oﬀset and basis dimensions.26 CHAPTER II. note both i. j are internal nodes For lagr_quad_type storage and vector_type pointers. .4: 1D Cubic : extent of βi in red and βj in green. 1 ≤ iq ≤ 2 • Biquartic cell interior to horizontal side for B 1 ≤ jq ≤ 27.3: 1D Cubic : extent of αi in red and βj in green. note j is an internal node i j Figure II. FORTRAN IMPLEMENTATION i j Figure II. Examples: • Bicubic grid vertex to vertical side for P jq=1. For matrixvector multiplication it is convenient and more eﬃcient to collapse the vector and basis indices into a single index with the vector index running faster. this is accomplished with an extra array dimension for side and interior centered data.
nb_type(1:4) is equal to the number of bases for each type (poly_degree1 for types 23 and (poly_degree − 1)2 for type 4). One of the matrix element arrays is then referenced as mat(jb. All of this is placed in a structure. so that we can have on array of rbl_mat’s with one component component per grid block. while isotropic operators use the same real matrix for two sets of real and imaginary components.B. rbl_mat. The storage is located in the matrix_storage_mod module. DATA STORAGE AND MANIPULATION 27 Instead of giving the arrays 16 diﬀerent names.B. only matrix_type deﬁnitions have been discussed. A similar tbl_mat structure is also set at this level. Collecting all block_mats in the global_matrix_type allows one to deﬁne an array of these structures with one array component for each Fourier component. So far.e. i. The numeral coding is 1≡gridvertex type 2≡horizontalvertex type 3≡verticalvertex type 4≡cellinterior type Bilinear elements are an exception. and nq_type is equal to the quantityindex dimension for each type. . each equation has its own matrix storage that holds matrix elements from timestep to timestep with updates as needed. starting horizontal and vertical coordinates (=0 or 1) for each of the diﬀerent types. This is a 1D array since the only stored matrices are diagonal in the Fourier component index. Anisotropic operators require complex arrays. NIMROD has several options. 2D.3. we place arr in a structure contained by a 4 × 4 array mat. 1 ≤ jb ≤ 4. For most cases. and each has its own storage needs. There are only gridvertex bases.ib)%arr where ib=basistype row index jb=basistype column index and 1 ≤ ib ≤ 4. matrix_type_mod also contains deﬁnitions of structures used to hold data for the preconditioning step of the iterative solves. This level of the matrix structure also contains selfdescriptions. such as nbtype=1 or 4. as in the B advance example in II. along with selfdescription. so mat is a 1 × 1 array.
reducing the number of unknowns in a linear system prior to calling an iterative solve. NIMROD’s kernel in isolation is a complicated code. if A11 − A12 A−1 A21 has the same structure as A11 .developed. The matelim routines serve a related purpose . However. If A11 is sparse but A11 −A12 A−1 A21 is not. The operations were initially written into one subroutine with adjustable oﬀsets and starting indices. it . What’s there now is ugly but faster due to having ﬁxed index limits for each basis type resulting in better compiler optimization.1) where Aij are submatrices. partitioning may slow the 22 computation. due to the objectives of the project and chosen algorithm. B. so that Ax is found through operations separated by block. Elimination is therefore used for 2D linear systems when poly_degree> 1. For 22 cellinterior to cellinterior submatrices. Development isn’t trivial. The large number of lowlevel routines called by the interfaced matvecs arose from optimization. this is true due to the singlecell extent of interior basis functions. A11 A12 A21 A22 x1 x2 = b1 b2 (II. data partitioning is the linchpin in the scheme. vectors are summed. but it would be far worse without modularity. Interface matvec allows one to use the same name for diﬀerent data types. followed by a communication step to sum the product vector across block borders.and should continue to be . Note that tblock matrix_vector routines are also kept in matrix_mod.3) A11 − A12 A−1 A21 x1 = b1 − A12 A−1 b2 22 22 Schur component gives a reduced system.5 Data Partitioning Data partitioning is part of a philosophy of how NIMROD has been . Early versions of the code had all data storage together. They will undergo major changes when tblock basis functions are generalized.28 CHAPTER II. it can help. FORTRAN IMPLEMENTATION The matrix_vector product subroutines are available in the matrix_mod module.2) (II. However. Instead. This amounts to a direct application of matrix partitioning. For the linear system equation.1. NIMROD’s modularity follows from the separation of tasks described by the implementation map. x2 = A−1 (b2 − A21 x1 ) 22 (II.].8 changes in the kernel’s README ﬁle. and A and A22 are invertible. xj and bi are the parts of vectors. [see the discussion of version 2. While this made it easy to get some parts of the code working. Additional notes regarding matrix storage: the stored coeﬃcients are not summed across block borders.
a list of blocks that touch the boundary of the domain (exblock_list).] However. If you are only concerned about one application. and seam0. the modularity revision has stood the test of time. the number of border vertices equals the . but the approach hampered parallelization. only two blocks can share a cell side.6 Seams The seam structure holds information needed at the borders of grid blocks. excorner. As deﬁned in the edge_type_mod module. We still require nimset to create seam0. Separation of the border data into a vertex structure and segment structure arises from the diﬀerent connectivity requirements. while many blocks can share a vertex. This logical is set at nimrod start up and is used to determine where to apply injection current and the appropriate boundary condition. and the changes made to achieve modularity at 2. Additional arrays could be added to indicate where diﬀerent boundary conditions should be applied. like the seam array itself. it is only used during initialization to determine which vertices and cell sides lie along the boundary. The seam array is dimensioned from one to the number of blocks. B. Eventually it became too cumbersome. The ﬁrst component applies to the vertex and the second component to the associated segment. in many cases. are ﬂags to indicate whether a boundary condition should be applied. a little extra eﬀort leads to a lot more usage of development work. thought and planning for modularity and data partitioning reward in the long term. including the connectivity of adjacent blocks and geometric data and ﬂags for setting boundary conditions. are tractable. seam0 was intended as a functionally equivalent seam for all space beyond the domain. we have added the logical array applV0(1:2.:). The scheme will need to evolve as new strides are taken. do what is necessary to get a result • However. DATA STORAGE AND MANIPULATION 29 encouraged repeated coding with slight modiﬁcations instead of taking the time to develop general purpose tools for each task.8 were extensive. The three logical arrays: expoint. even major changes. Instead. and r0point. The format is independent of block type to avoid blockspeciﬁc coding at the step where communication occurs. an edge_type holds vertex and segment structures. Changing terms or adding equations requires relatively minimal amount of coding. Aside For ﬂux injection boundary conditions. The seam_storage_mod module hold the seam array. The seam0 structure is of edge_type. [Particle closures come to mind. For each block the vertex and segment arrays have the same size.B. NOTE • These comments relate to coding for release.1. like generalizing ﬁnite element basis functions. But. Despite many changes since.
For tblocks. (entire domain for seam0) Within the vertex_type structure. or 2 to select seam vertex number.5). The ptr2 array is a duplicate of ptr.30 CHAPTER II. there is no requirement that the scan start at a particular internal vertex number. but references to seam0 are removed. For rblocks. but the starting locations are oﬀset (see Figure II. (see Figure II. Both arrays have two dimensions. is=segment index. ptr is the one that must be deﬁned during preprocessing. but the seam indexing always progresses counter clockwise around its block.5: iv=vertex index. The second index selects a particular connection.6) Connections for the one vertex common to all blocks are stored as: . While ptr2 is the one used during simulations. see information on parallelization). nvert=count of segment and vertices number of border segments. and the ﬁrst index is either: 1 to select (global) block numbers (running from 1 to nbl_total. the ptr and ptr2 arrays hold the connectivity information. The ptr array is in the original format with connections to seam0. Both indices proceed counter clockwise around the block. not 1 to nbl. the ordering is unique. FORTRAN IMPLEMENTATION iv=7 is=7 is=nvert iv=nvert is=1 iv=1 is=2 iv=2 Figure II. It is deﬁned during the startup phase of a nimrod simulation and is not saved to dump ﬁles. It gets deﬁned in nimset and is saved in dump ﬁles.
2) = 2 – seam(1)%vertex(5)% ptr(2. iv=4 iv=3 iv=11 .1) = 3 – seam(1)%vertex(5)% ptr(2..1) = 11 – seam(1)%vertex(5)% ptr(1.1) = 5 – seam(2)%vertex(2)%ptr(1. DATA STORAGE AND MANIPULATION 31 block2 block3 iv=1 iv=2 iv=5 block1 iv=1 iv=1 iv=2 iv=2 Figure II..1) = 1 – seam(2)%vertex(2)%ptr(2.2) = 11 • seam(3)%vertex(11) .2) = 2 • seam(2)%vertex(2) – seam(2)%vertex(2)%ptr(1.B.2) = 3 – seam(2)%vertex(2)%ptr(2.6: 3 blocks with corresponding vertices • seam(1)%vertex(5) – seam(1)%vertex(5)% ptr(1.
In tblocks.g. For vertex information. In rblocks vertex(iv)%intxy(1:2) saves (ix.3) = 3 – seam(3)%vertex(11)%ptr(2. only two blocks can share a cell side. For vertices along a domain boundary.1) = 2 – seam(3)%vertex(11)%ptr(2. ensuring identical results for the diﬀerent blocks. e.2) = 5 Notes on ptr: • It does not hold a selfreference. as mentioned above. the extra index follows the rblock vertex index and has the ranges 0:0. The %ave_factor and %ave_factor_pre arrays hold weights that are used during the iterative solves. It is used to ﬁnd vertices located at R = 0.2) = 1 – seam(3)%vertex(11)%ptr(2. respectively. FORTRAN IMPLEMENTATION – seam(3)%vertex(11)%ptr(1. with respect to the poloidal plane. The %order array holds a unique prescription (determined during NIMROD startup) of the summation order for the diﬀerent contributions at a border vertex. segment communication diﬀerers from vertex communication three ways. The scalar rgeom is set to R for all vertices in the seam. The %segment array (belonging to edge_type and of type segment_type) holds similar information. The various %seam_* arrays in vertex_type are temporary storage locations for the data that is communicated across block boundaries. but an extra array dimension is carried so that vector_type pointers can be used. analogous to the logical coordinates in rblocks.3) = 11 • The order of the connections is arbitrary The vertex_type stores interior vertex labels for its block in the intxy array. the vector_type section should have discussed their index ranges for data storage. Triangle vertices are numbered from 0 to mvert. there is only one blockinterior vertex label.1) = 2 – seam(3)%vertex(11)%ptr(1. there are no references like: – seam(3)%vertex(11)%ptr(1. • First. These unit vectors are used for setting Dirichlet boundary conditions. tang and norm are allocated (1:2) to hold the R and Z components of unit vectors in the boundary tangent and normal directions. however. The list is 1D in tblocks. and intxy(2) is set to 0. also consistent with rblock cell numbering. intxy(1) holds its value. where regularity conditions are applied. .32 CHAPTER II. Aside: Though discussion of tblocks has been avoided due to expected revisions. The extra dimension for cell information has the range 1:1.iy) for seam index iv.
or there may be many nodes along a given element side. None of the segment information is saved. Before describing the routines that handle block border communication.nside. Were the operation for a real (necessarily 2D) vector_type. nfour must be 1 (‘1_i4).nside. This ensures selfconsistency among the diﬀerent seam structures and with the grid. Only the ptr. The next step is to perform the communication. Finally.nfour. but seam0 is also written.1_i4.nfour. The 1D ptr array reﬂects the limited connectivity. there may be no element_side_centered data (poly degree = 1). Thus. which invokes serial or parallel operations to accumulate vertex and segment data. Finally. The routines called to perform block border communication are located in the edge module. CALL edge_network(nqty.false. The code segment starts with DO ibl=1. The edge_load and edge_unload routines are used to transfer block vertextype data to or from the seam storage. the fourth argument is a ﬂag to perform the load and .seam(ibl)) ENDDO which loads vector component indices 1:nqty (starting location assumed) and Fourier component indices 1:nfour (starting location set by the third parameter in the CALL statement) for the vertex nodes and ‘nside cellside nodes from the cvector_type rhsdum to seam_cin arrays in seam.nqty. DATA STORAGE AND MANIPULATION 33 • Second. • Third.nbl CALL edge_load_carr(rhsdum(ibl). The subroutine dump_write_seam in the dump module shows that very little is written.. Were the operation for a cvector_2D_type. it is worth noting what is written to the dump ﬁles for each seam. The single ibl loop includes both block types. intxy. segments are used to communicate oﬀdiagonal matrix elements that extend along the cell side. There are three sets of routines for the three diﬀerent vector types. as occurs in iter_cg_comp.) There are a few subtleties in the passed parameters. nfour must be 0 (‘0_i4 in the statement to match the dummy argument’s kind). and excorner arrays plus descriptor information are saved for the vertices. A typical example occurs at the end of the get_rhs routine in finite_element_mod.B. the tang and norm arrays have n extra dimension corresponding to the diﬀerent element side nodes if poly degree > 1. NIMROD creates much of the vertex and all of the segment information at startup. Central among these routines is edge_network. Matrix element communication makes use of intxyp and inxyn internal vertex indices for the previous and next vertices along the seam (intxys holds the internal cell indices).
Adding a new equation to the physics model usually requires adding a new management routine as well as new integrand routines. The simplest matrices are diagonal in Fourier component and have decoupling between various real and imaginary vector components. The FORTRAN modules listed on the left side of the implementation map (Sec. there are now three diﬀerent types of management routines. This section of the tutorial is intended to provide enough information to guide such development.34 CHAPTER II. C. the integrand modules hold coding that determines what appears in the volume integrals like those on pg11 for Faraday’s law.1 Management Routines The adv_* routines in ﬁle nimrod. The management routines control which integrand routines are called for settingup an advance.nbl CALL edge_unload_carr(rhsdum(ibl). or they are called twice to predict then correct the eﬀects of advection.2b. DO ibl=1.seam(ibl)) ENDDO The pardata module also has structures for the seams that are used for parallel communication only. To use the most eﬃcient linear solver for each equation.1_i4. The ﬁnal step is to unload the bordersummed data. The other level subject to frequent development is the management level at the top of the map. These advances require .nfour.f manage the advance of a physical ﬁeld. The sequence of operations performed by a management routine is outlined in Sec. C Finite Element Computations Creating a linear system of equations for the advance of a solution ﬁeld is a nontrivial task with 2D ﬁnite elements. In fact. They also need to call the iterative solver that is appropriate for a particular linear system. adding terms to existing equations requires modiﬁcations at this level only. normal physics capability development occurs at only two levels. Near the bottom of the map. which is something of an archaic remnant of the predatapartitioning days. it communicates the crhs and rhs vector types in the computation_pointers module. If true. In many instances. They are called once per advance in the case of linear computations without ﬂow.nside. A character passﬂag informs the management routine which step to take. This section describes more of the details needed to create a management routine. They are accessed indirectly through edge_network and do not appear in the ﬁnite element or linear algebra coding. D. FORTRAN IMPLEMENTATION unload steps within edge_network.nqty. A) have been developed to minimize the programming burden associated with adding new terms and equations.
Full use of an evolving number density ﬁeld requires a 3D matrix solve for velocity. They diﬀer in the following way: • Cellinterior elimination is not used. cvector_type array. global_matrix_type). • Before the iterative solve. Once the matrix and rhs are formed. B. matrix elements will be modiﬁed within matrix_create.4)%arr] is then used to hold the inverse of the interior submatrix (A−1 ). ﬂags (2). so that operators for each Fourier components can be saved. This step allows us to write integrand routines for the change of a solution ﬁeld rather than its new value. logical switch for using a surface integral. and this advance is managed by adv_v_3d. b. The second type is diagonal in Fouries component. The adv_v_clap is an example. routine name. followed by a b. there are vector and matrix operations to ﬁnd the product of the matrix and the old solution ﬁeld. i. 3D matrices. the matrix is the sum of a mass matrix and the discretized Laplacian. surface integrand name.4).c. Subroutine names for the integrand and essential boundary conditions are the next arguments.c.c. [Use rmat_elim= or cmat_elim= to signify real or complex matrix types. The call to matrix_create passes an array of global_matrix_type and an array of matrix_factor_type. • The iterative method solves Axnew = (b − xold). FINITE ELEMENT COMPUTATIONS 35 two matrix solves (using the same matrix) for each Fourier component. 22 The get_rhs calls are similar. • Loops over Fourier components are within the iterative solve. Summarizing: • Integrands are written for A and b with A∆x = b to avoid coding semiimplicit terms twice.C. but then solve for the new ﬁeld so that the relative tolerance of the iterative solver is not applied to the change. hence adv_v_aniso is this type of management routine. ﬂag and the ’pass’ character variable. The last argument is optional and its presence indicates that interior elimination is used. ﬁnd b − Axold. essential b. The full semiimplicit operator has such anisotropy. and the result is added to the rhs vector before calling the iterative solve. The 3D management routines are similar to the 2D management routines. usually due to anisotropy. . If elimination of cellinterior data is appropriate (see Sec. but all real and imaginary vector components must be solved simultaneously. except the parameter list is (integrand routine name. and interior storage [rbl_mat(ibl)%mat(4. All these management routine types are similar with respect to creating a set of 2D matrices (used only for preconditioning in the 3D matrixsystems) and with respect to ﬁnding the rhs vector. The third type of management routine deals with coupling among Fourier components.e. since ∆x x is often true.
finite_element just needs parameterlist information to call any routine out of the three classes. Data is saved (with timecentering in predictor steps) as basis function coeﬃcients and at quadrature points after interpolating through the *block_qp_update calls. surface_ints. Instead. Next. or boundary modules. After an iterative solve is complete.3 Numerical integration Modules at the next level in the ﬁnite element hierarchy perform the routine operations required for numerical integration over volumes and surfaces. passing the same integrand routine name. allowing for a few options. See Sec. The three modules. for example. The parameter list information. rblock. . iter_factor ﬁnds some type of approximate factorization to use as a preconditioner for the iterative solve. C. they call the appropriate matelim routine to reduce the number of unknowns when poly_degree> 1. The latter prevents an accumulation of error from the iterative solver. where the full 3D matrix is never formed. including assumedshape array deﬁnitions. surface integrand.53). and boundary condition routines are passed through matrix_create and get_rhs. Finally. The next step is to modify gridvertex and cellside coeﬃcients through cellinterior elimination (b1 − A12 A−1 b2 from p. must use the same argument list as that provided by the interface block for ‘integrand’ in comp_matrix_create. and it may perform an optional surface integration. but the operations are those repeated for every ﬁeld advance.36 CHAPTER II. but it is probably superﬂuous. The blockborder 22 seaming then completes a scatter process: The ﬁnal steps in get_rhs apply regularity and essential boundary conditions. there are some vector operations. Note that all integrand routines suitable for comp_matrix_create. Aside on the programming: • Although the names of the integrand. finite_element does not use the integrands. The real_ and comp_matrix_create routines ﬁrst call the separate numerical integration routines for rblocks and tblocks. connections for a degenerate rblock (one with a circularpolar origin vertex) are collected. The get_rhs routine also starts with blockspeciﬁc numerical integration. an rhs integrand name is passed. are provided by the interface blocks. C.2 Finite_element Module Routines in the finite_element module also serve managerial functions. including the completion of regularity conditions and an enforcement of Dirichlet boundary conditions. FORTRAN IMPLEMENTATION • The iterative solve used a ’matrixfree’ approach. Then. D for more information on the iterative solves.
FINITE ELEMENT COMPUTATIONS 37 A volume integral over this bicubic cell contributes to node coefficients at the positions.C. Figure II. Node coefficients along block borders require an edge_network call to get all contributions. A volume integral here contributes to node coefficients at the positions.7: Blockborder seaming .
Our quadrilateral elements have (poly_degree+1)2 nonzero test functions in every cell (see the Figure II. dot denotes quadrature position. It starts by setting the storage arrays in the passed cvector_type to ∅. vertical side. It then allocates the temporary array.38 CHAPTER II. we would not need the temporary integrand storage at all. The rblock_get_comp_rhs routine serves as an example of the numerical integration performed in NIMROD. Thus each iteration of the doloop ﬁnds the contributions for the same respective quadrature point in all elements of a block. The integration is performed as loop over quadrature points. not node tblock.4 [NIMROD uses two diﬀerent labeling schemes for the basis function nodes. 4 elements in a block. FORTRAN IMPLEMENTATION ig=1 ig=2 Figure II. as described in B. so in the integral dξdηJ (ξ. cell interior) for eﬃciency in . Note that wi ∗ J appears here. integrand.8: 4 quad points. η)f (ξ. This approach would allow us to incorporate additional block types in a straightforward manner should the need arise. and surface. because contributions have to be sorted for each basis type and oﬀset pair. R. horizontal side. η).7 and observe the number of node positions). using saved data for the logical oﬀsets from the bottom left corner of each cell. The ﬁnal step in the integration is the scatter into each basis function coeﬃcient storage. The vector_type and global_matrix_type separate basis types (grid vertex. and wi ∗ J (ξi . if there were only one type of block. The matrix integration routines are somewhat more complicated. are separated to provide a partitioning of the diﬀerent geometric information and integration procedures. In fact. ηi ). The call to the dummy name for the passed integrand then ﬁnds the equation and algorithmspeciﬁc information from a quadrature point. one quadrature point at a time. which is used to collect contributions for test functions that are nonzero in a cell. η) only. since the respective tblock integration routine calls the same integrands and one has to be able to use the same argument list. the integrand routine ﬁnds f(ξ. A blank tblock is passed along with the rb structure for the block.
The factor keff(1:nmodes)/bigr is the toroidal wavenumber for the data at Fourier index imode. Vector operations. Do not assume that a particular nvalue is associated with some value of imode. The input and global parameteres are satisﬁed by the F90 USE statement of modules given those names. In the toroidal geometry. per length 2. Neither vary during a simulation. Domain decomposition also aﬀects the values and range of keff. The matrix integration routines have to do the work of translating between the two labeling schemes. Pseudospectral computations.C. The Faraday’s law example in Section A. 5. . In linear 2πn geometry. so the basis function information is evaluated and saved in blockstructure array as described in B. Therefore. Finite element basis functions and their derivatives with respect to R and Z. This is why there is a keff array. let’s review what is typically needed for an integrand computation. The latter is simpler and more tractable for the blockindependent integrand coding. Wavenumbers for the toridal direction 3.4 Integrands Before delving further into the coding. holds the nvalue. 2. is provided to ﬁnd the value of the wavenumber associated with each Fourier component. The array keff(1:nmodes). the integrand routines use a single list for nonzero basis functions in each element.1. Values of solution ﬁelds and their derivatives. 6. Some of this information is already stored at the quadrature point locations. generic_alpha_eval and generic_ptr_set routines merely set pointers to the appropriate storage locations and do not copy the data to new locations.1. FINITE ELEMENT COMPUTATIONS 39 linear algebra. and bigr(:. Doing so will lead to unexpected and confusing consequences in other integrand routines.:) holds R. It is available through the qp_type storage. Linear computations often have nmodes=1 and keff(1) equal to the input parameter lin_nmax. It is important to remember that integrand computations must not change any values in the pointer arrays.1 suggests the following: 1. keff holds kn = and bigr=1. Solution ﬁeld data is interpolated to the quadrature pions and stored at the end of the equation advances. 4. There are two things to note: 1. In contrast. but each node may have more than one label in this scheme. also described in B. Input parameters and global simulation data.] C. The basis function information depends on the grid and the selected solution space. keff.
a pointer set for α. like int arrays for other matrix computations. so separate indices for row and column n indices are not used. Continuing with the Faraday’s law example. the ﬁrst term on the lhs of the second ∂B . One of them is always a blank structure. The ﬁrst two dimensions are directionvectorcomponent or ‘quantity’ indices. determines which coeﬃcient to use. the int array is ﬁlled. The spatial discretization gives the integrand αi αj equation on pg 11 is associated with ∂t (recall that J appears at the next higher level in the hierarchy). The adv_b_iso management routine passes the curl_de_iso integrand name. and the second is the row quantity (iq).1:my) for the cell indices in a gridblock.iv) for the j’ column basis index and the j row index. The next term in the lhs (from pgB. so lets examine that routine. η)ˆl exp (inφ) ∂tln test function dotted with gives for all l and n αj αj All of the stored matrices are 2D.40 CHAPTER II. The character variable integrand_flag. i. Though el is eR . Operators requiring a mass matrix can have matrix_create add it to another operator. Instead. the orthogonality of direction vectors implies that el does not appear. FORTRAN IMPLEMENTATION The k2ef array holds the square of each value of keff. The next two dimensions have range (1:mx. so there is coding for two diﬀerent coeﬃcients. regardless of symmetry. we need to evaluate ˆ ˆ ˆ ˆ ˆ × (αj el exp (inφ)). and a double loop over basis indices. To understand what is in the basis function loops. The last two indices are basis function indices (jv. the generic_evals module selects the appropriate data storage. but this keeps block speciﬁc coding out of integrands. It is more complicated than get_mass. The routine is used for creating the resistive diﬀusion operator and the semiimplicit operator for Hall terms. so iq=jq=1: ˆ αj (ξ. The get_mass subroutines then consists of a dimension check to determine the number of nonzero ﬁnite elemnt basis function in each cell. For get_mass.] The int array passed to get_mass has 6 dimensions. [The matrix elements are stored in the mass_mat data structure.e. it is straightforward to use a general .f and saved in the global module. as on pg 11. get_mass. η)ˆl exp (−in φ) e ∂Bj e αj (ξ. This helps make our matrixvector products faster and it implies that the same routines are suitable for forming nonsymmetric matrices. The ﬁrst is the column quantity (jq). This ‘mass’ matrix is so common that it has its own integrand routine. diagonal in n. Thus we need × (αj ˆl exp (−in φ)) · × (αj el exp (inφ)) e ˆ only. After the impedence coeﬃcient is found. We will skip them in this discussion. Note that calls to generic_ptr_set pass both rblock andtblock structures for a ﬁeld. but the general form is similar. Note that our operators have full storage. The basis loops appear in a conditional statement to use or skip the · term for the divergence cleaning. that is set in the advance subroutine of nimrod. eZ or eφ only.2) arises from implicit diﬀusion.
(ˆl )R = (ˆl )R = 1 and (ˆl )Z = (ˆl )φ = (ˆl )Z = (ˆl )φ = 0 so e int(1. 3.:. iq = (1. the scalar product × (αj el exp (−in φ)) · ˆ × (αj el exp (inφ)) is ˆ ∂αj ∂αj in αj inαj (ˆl )φ + e (ˆl )Z e (ˆl )φ − e (ˆl )Z e ∂Z R ∂Z R αj ∂αj inαj inαj (ˆl )R − e + (ˆl )φ e (ˆl )R − e + − R R ∂R R ∂αj ∂αj ∂αj ∂αj + (ˆl )Z − e (ˆl )R e (ˆl )Z − e (ˆl )R e ∂R ∂Z ∂R ∂Z αj ∂α + R ∂R (ˆl )φ e Finding the appropriate scalar product for a given (jq. the coupling between Re(BR. BZ ) and Im(Bφ ) is the same as that between Im(BR . saving CPU time. This simpliﬁes the scalar product ˆ greatly for each case. Note how the symmetry of the operator is preserved by construction. only real poloidal coeﬃcients are coupled to imaginary toroidal coeﬃcients and vice versa.:.iv)= n2 ∂αj v αiv αj vαiv + R ∂Z ∂Z ×ziso where ziso is an eﬀective impedance. making the operator phase independent. φ for el for l ⇒ iq = 1. An example of the poloidaltoroidal coupling for real coeﬃcients is . This lets us solve for these separate groups of components as two real matrix equations.C. e × (αj ˆl exp (inφ)) = (αj exp (inφ)) = ˆ (αj exp (inφ)) × el = (αj exp (inφ)) × ˆl + αj exp (inφ) × (ˆl ) e e ∂α ˆ ∂α ˆ inα ˆ R+ Z+ αφ exp (inφ) ∂R ∂Z R ∂α inα ∂α inα ˆ ˆ (ˆl )φ − e (ˆl )Z R + e (ˆl )R − e (ˆl )φ Z e ∂Z R R ∂R ∂α ∂α ˆ + (ˆl )Z − e (ˆl )R φ exp (inφ) e ∂R ∂Z × (ˆl ) = 0 for l = R. Furthermore. Like other isotropic operators the resulting matrix elements are either purely real or purely imaginary.1. and the only imaginary elements are those coupling poloidal and toroidal vector components. e e e e e For (jq. Thus.iq) pair is then a matter of subˆ ˆ ˆ stituting R. FINITE ELEMENT COMPUTATIONS 41 direction vector and then restrict attention to the three basis vectors of our coordinate system. 2.jv. 1). BZ ) and Re(Bφ ). instead of one complex matrix equation. respectively. Z. Z e 1 ˆ × (ˆl ) = − Z for l = φand toroidal geometry e R Thus.
Coeﬃcients for all Fourier components are created in one integrand call.1:nphi) for example. for example) the n = 0 component of a ﬁeld is needed regardless of the nvalue assigned to a processor. 1) element. is taken from the global module.:. int(iq.jv. It is the doloop index set in matrix_create and must not be changed in integrand routines. From the ﬁrst executable statement. and neoclassical contributions. Note that fft_nim concatenates the poloidal position indices into one index.1). ·B is found for error diﬀusion terms. the be. There are FFTs and pseudospectral operations to generate nonlinear terms. The ˆ computations are derived in the same manner as × (αj el exp (inφ)) given above. there are a number of pointers set to make various ﬁelds available. ˆ • In some case (semiimplicit operators.:.iv. The pointers for B are set to the storage for data from the end of the last timesplit. and bez arrays are reset to the predicted B for the ideal electric ﬁeld during corrector steps (indicated by the integrand_flag. The mpseudo parameter can be less . 4. After that. More pointers are then set for neoclassical calculations. E is only needed at the quadrature points. The resulting nonlinear ideal E is then transformed to Fourier components (see Sec B. The int array has 5 dimensions.3. The ﬁrst nonlinear computation appears after the neoclassical_init select block. The vector aspect (in a linear algebra sense) of the result implies one set of basis function indices in the int array. 2. where E may contain ideal mhd.:. jmode. Z.1:mpseudo.iv)= nα j v R αiv αiv + R ∂R ×ziso inαj v R αiv αiv + R ∂R which appears through a transpose of the (3. • Other vectordiﬀerential operations acting on αj el exp (inφ) are needed elsewhere. φ). math_curl is used to ﬁnd the corresponding current density.:. and the math_tran routine. resistive. Separate real storage is created and saved on all processors (see Sects F& C). FORTRAN IMPLEMENTATION int(1. Then.im). so that real_be has dimensions (1:3. As with the matrix integrand. The V and B data is transformed to functions of φ. where im is the Fourier index (imode). the brhs_mhd routine ﬁnds ×(αj el exp (−in φ))· ˆ E(R.ber. [compare with − from the scalar product on pg 41] Notes on matrix integrands: • The Fourier component index. 3. where the cross product. The rhs integrands diﬀer from matrix integrands in a few ways: 1.42 CHAPTER II. V ×B is determined. Returning to Faraday’s law as an example.
l ). we encounter the loop over testfunction basis function ˆ indices (j . In some cases. They work by changing the appropriate rows of a linear system to (Aj ln j ln )(xj ln ) = 0.6 Dirichlet boundary conditions The boundary module contains routines used for setting Dirichlet (or ‘essential’) boundary conditions.C. Further development will likely be required if more complicated operations are needed in the computations. A loop over the Fourier index follows the pseudospectral computations. C. Then. completing . except × (αj el exp (−inφ)) is replaced by the local E. so one should not relate the poloidal index with the 2D poloidal indices used with Fourier components. ˆ The integrand routines used during iterative solves of 3D matrices are very similar to rhs integrand routines. C.(Vs × B + (V × Bs + (V × B) (see Sec A. and the interpolate is left in local arrays. The linear ideal E terms are created using the data for the speciﬁed steady solution. and there is a sign change for − × (αj el exp (−in φ)) · E. and the seam data is not carried below the finite_element level. the surface integration is called one cellside at a time from get_rhs. pres.5 Surface Computations The surface and surface_int modules function in an analogous manner to the volume integration and integrand modules. FINITE ELEMENT COMPUTATIONS 43 than the number of cells in a block due to domain decomposition (see Sec C). not pointers. Thus. It is passed to a surface integrand. For Dirichlet conditions on the normal component. for example. At present.1). It is therefore interpolated to the quadrature locations in the integrand routine with a generic_all_eval call.5) . The terms are similar to those in curl_de_iso. One important diﬀerence is that not all border segments of a block require surface computation (when there is one). 1D basis function information for the cell side is generated on the ﬂy in the surface integration routines. Only those on the domain boundary have contributions. the resistive and neoclassical terms are computed.4) where j is a basis function node along the boundary. a linear combination is set to zero. They are used to ﬁnd the dot product of a 3D matrix and a vector without forming matrix elements themselves (see Sec D. The only noteworthy diﬀerence with rhs integrands is that the operand vector is used only once. Near the end of the routine. and l and n are the directionvector and Fourier component indices. respectively.2). for example. and presz are local arrays. the rhs of Ax = b is modiﬁed to ˆ ˆ b (I − nj nj ) · b ≡ ˜ (II.presr. In p_aniso_dot. (II.
Notes: • Changing boundary node equations is computationally more tractable than changing the size of the linear system. It would also be straightforward to make the ‘component’ information part of the seam structure. the dirichlet_op routines would need modiﬁcation to be consistent with the rhs. If the conditions do not change in time. the error is no more than O(∆t) in any case. . • Removing coeﬃcients from the rest of the system [through ·(I − nn) on the right side of ˆˆ A] may seem unnecessary. as implied by ﬁnite element theory. However. The dirichlet_op and dirichlet_comp_op routines perform the related matrix operations described above.8) The net eﬀect is to apply the boundary condition and to remove xj n from the rest of the linear system. The ‘component’ character input is a ﬂag describing which compoˆˆ nent(s) should be set to 0. it preserves the symmetric form of the operator. so that it could vary along the boundary independent of the block decomposition.7) (II. the existing routines can be used for inhomogeneous Dirichlet conditions. provided that the time rate of change does not appear explicitly. norm. However. which is important when using conjugate gradients. Since our equations are constructed for the change of a ﬁeld and not its new value. The dirichlet_rhs routine in the boundary module modiﬁes a cvector_type through operations like (I − nn)·. The seam data tang. The matrix becomes ˆ ˆ ˆ ˆ ˆ ˆ ˆ (I − nj nj ) · A · (I − nj nj ) + nj nj Dotting nj into the modiﬁed system gives ˆ ˆ xj n = nj · x = 0 Dotting (I − nj nj ) into the modiﬁed system gives ˆ ˆ ˆ ˆ ˜ b (I − nj nj ) · A · x = ˜ (II. the coding is involved. Some consideration for timecentering the boundary data may be required. However. one only needs to comment out the calls to dirichlet_rhs at the end of the respective management routine and in dirichlet_vals_init. diﬀerent component input could be provided for diﬀerent blocks. FORTRAN IMPLEMENTATION where nj is the unit normal direction at boundary node j. Since the dirichlet_rhs is called one block at a time. For timedependent inhomogeneous conditions. too.6) (II.44 CHAPTER II. At present. one can adjust the boundary node values from the management routine level. Though mathematically simple. then proceed with homogeneous conditions in the system for the change in the ﬁeld. and intxy(s) for each vertex (segment) are used to minimize the amount of computation. the kernel always applies the same Dirichlet conditions to all boundary nodes of a domain. since it has to address the oﬀset storage scheme and multiple basis types for rblocks.
but VRn . the regular_ave routine is used to set Vφ = iV− at the appropriate nodes. At R = 0 nodes only.’ approach because there is no surface with . y = R sin φ transformation and derivatives into the 2D Taylor series expansion in (x. y) about the origin. To preserve symmetry with nonzero V− column entries.C. The regularity conditions may be derived by inserting the x = R cos φ. After the linear system is solved.11) The entire functional form of the regularity conditions is not imposed numerically. Returning to the power series behavior for R → 0. and regular_comp_op routines modify the linear systems to set Sn = 0 at R = 0 for n > 0 and to set VRn and Vφn to 0 for n = 0 and n ≥ 2. (II.7 Regularity conditions When the input parameter geometry is set to ‘tor’. The phase relation for VR1 and Vφ1 at R = 0 is enforced by the following steps: 1.e. φ) coordinate system. phase information for VR1 and Vφ1 is redundant. With the cylindrical (R. VR1 . For vectors. (II. Z. the Zdirection component satisﬁes the relation for scalars. change variables to V+ = (Vr + iVφ )/2 and V− = (Vr − iVφ )/2. Vφn ∼ Rn−1 ψ(R2 ) for n ≥ 0. Instead. at R = 0. add −i times the Vφ row to the VRrow and set all elements of the Vφ row to 0 except for the diagonal. our Fourier coeﬃcients must satisfy regularity conditions so that ﬁelds and their derivatives approach φindependent values at R = 0. The conditions are for the limit of R → 0 and would be too restrictive if applied across the ﬁnite extent of a computational cell. set all elements in a V+ column to 0. and Vφ is the vanishing radial derivative. Unfortunately. we focus on the leading terms of the expansions. the leading order behavior for S0 .10) Furthermore.9) where ψ is a series of nonnegative powers of its argument. Set V+ = 0. not toroidal. 3. The regular_vec. The routines in the regularity module impose the leadingorder behavior in a manner similar to what is done for Dirichlet boundary conditions. We take φ to be 0 at R = 0 and enforce Vφ1 = iVR1 . we cannot rely on a ‘natural B. 2. and the left side of the grid has points along R = 0. It is straightforward to show that the Fourier φcoeﬃcients of a scalar S(R. This is like a Neumann boundary condition. FINITE ELEMENT COMPUTATIONS 45 C. i.C. (II. φ) must satisfy Sn(φ) ∼ Rn ψ(R2 ). the domain is simply connected – cylindrical with axis along Z. regular_op.
w ∂αj ∂αj dV 2 . D.1 2D matrices The iter_cg_f90 and iter_cg_comp modules contain the routines needed to solve symmetricpositivedeﬁnite and Hermitianpositivedeﬁnite systems. Although normal seam communication is used during matrix_vector multiplication. Within the long iter_cg_f90. once one understands NIMROD’s vector_type manipulation. unlike matrix_vector product routines.f ﬁles are separate modules for routines for diﬀerent preconditioning schemes. We should keep this in mind if a problem arises with S0 (or VZ0 ) near R = 0.12) R ∂R ∂R saved in the dr_penalty matrix structure. FORTRAN IMPLEMENTATION ﬁnite area along an R = 0 side of a computational domain.] D Matrix Solution At present. The regular_op and regular_comp_op routines add this penalty term before addressing the other regularity conditions. Instead. and a module for managing routines that solve a system(iter_cg_*). and may be compared with textbook descriptions of cg. there are also solverspeciﬁc block communication operations. and the matrix_type storage arrays have been arranged to optimized matrixvector multiplication. The iter_cg module holds interface blocks to give common names and the real and complex routines. we add the penalty matrix. The iter_mat_com routines execute this communication using special functionality in edge_seg_network for the oﬀdiagonal elements. After the partial factors are found and stored in the matrix_factor_type structure. Its positive value ∂ penalizes changes leading to ∂R = 0 in these elements. The rest of the iter_cg. [Looking through the subroutines. since it is added to linear equations for the changes in physical ﬁelds. (II.f ﬁle has external subroutines that address solverspeciﬁc blockborder communication operations. They perform the basic conjugate gradient steps in NIMROD’s block decomposed vector_type structures. The weight w is nonzero in elements along R = 0 only. The basic cg steps are performed by iter_solve_*. NIMROD relies on its own iterative solvers that are coded in FORTRAN 90 like the rest of the algorithm. a module for managing partial factorization(iter_*_fac). with suitable normalization to matrix rows for S0 . The partial factorization routines for preconditioning need matrix elements to be summed across block borders(including oﬀdiagonal elements connecting border nodes). it appears that the penalty has been added for VR1 and Vφ1 only. VR1 and Vφ1 . It is not diﬀusive. respectively. . border elements of the matric storage are restored to their original values by iter_mat_rest.f and iter_cg_comp.46 CHAPTER II.
To avoid these issues. Achieving parallel scaling would be very challenging. which inverts local direction vector couplings only. They also make coupling to external library solver packages somewhat challenging. and cellinterior data may or may not be eliminated. results from blockbased partialfactors(the “direct”. eﬀective for midrange condition numbers(arising roughly when ∆texplicit−limit Neither of these two schemes has been updated to function with higherorder node data. In ˜ ˜ the preconditioning step. were the resistivity in our form if Faraday’s law on p. and the latter has its own parallel communication(see D). then multiply z by ave_factor_pre before summing. φ) µ0 × (¯j l e−in φ ) · α × (¯j l einφ ) α (II. The block and global lineJacobi schemes use 1D solves of couplings along logical directions. inverting A. “bl_ilu_*”. D. simply multiplying border zvalue by ave_factor and seaming after preconditioning destroys the symmetry of the operation(and convergence). we symmetrize the averaging by multiply border elements of r by √ ˜ ave_factor_pre(∼ ave factor). Z. Storage requirements for such a matrix would have to grow as the number of Fourier components is increased. The Fourier representation leads to matrices that are dense in nindex when φdependencies appear in the lhs of an equation. it would generate oﬀdiagonal in n contributions: Bj ln j ln dRdZdφ η(R. but they complicate the preconditioning operations. Computations with both block types and solver = ‘diagonal’ will use the speciﬁed solver in rblocks and ‘diagonal’ in tblocks. These idiosyncracies have little eﬀect on the basic cg steps. MATRIX SOLUTION 47 Other seamrelated oddities are the averaging factors for border elements. Computationally. The routine is simple. There is also diagonal preconditioning. but it has to account for redundant storage of coeﬃcients along block borders. Each preconditioner has its own set of routines and factor storage. even with parallel decomposition.2 3D matrices The conjugate gradient solver for 3D systems is mathematically similar to the 2D solvers. However.D. hence ave_factor. where the matrixvector products needed for cg iteration are formed directly from rhstype ﬁnite element computation. Instead. we use a ‘matrixfree’ approach. This simple scheme is the only one has functions in both rblocks and tblocks. except for the global and block lineJacobi algorithms which share 1D matrix routines. The blockdirect option uses Lapack library routines. and “bl_diag*” choices) are averaged at block borders. it is quite diﬀerent.? a function of φ. the solve of Az = r where A is an approximate A and r is the residual. For example.13) which may be nonzero for all n . degenerate points in rblocks. The scalar product of two vector_type is computed by iter_dot. The blockincomplete factorization options are complicated but ∆t τA ). . NIMROD grids may have periodic rblocks.
The data is converted to ˆ cylindrical components after evaluating at quadrature point locations. Finally. the product is then the sum of the ﬁnite element result and a 2Dmatrixvector multiplication. The Bj ln data are coeﬃcients of the operand. Thus. Further.] Allocation and initialization of storage are primary tasks of the startup routines located in ﬁle nimrod_init. iter_3d_cg solves systems for changes in ﬁelds directly.14) Identifying the term in brackets as µ0J(R. Adding new variables and matrices is usually a straightforward task of identifying an existing similar data structure and copying and modifying calls to allocation routines. The solution ﬁeld at the beginning of the time split is passed into iter_3d_cg_solve to scale norms appropriately for the stop condition. [The storage modules and structure types are described in Sec. . iter_3d_solve calls get_rhs in ﬁnite element. and the e3 of ja_eq is the contravariant Jφ/R. passing a dotproduct integrand name.f. B. but when they are associated with ﬁniteelement structures. • Initialization routines have conditional statements that determine what storage is created for diﬀerent physics model options. The quadrature_save routine allocates and initializes quadrature_point storage for the steadystate (or ’equilibrium’) ﬁelds. The variable_alloc routine creates quadraturepoint storage with calls to rblock_qp_alloc and tblock_qp_alloc. instead of calling an explicit matrixvector product routine. The scheme uses 2D matrix structures to generate partial factors for preconditioning that do not address n → n coupling. FORTRAN IMPLEMENTATION We can switch the summation and integration orders to arrive at dRdZdφ × (¯j l e−in φ ) · α η(R. E Startup Routines Development of the physics model often requires new *block_type and matrix storage. φ). calls to generic_all_eval and math_curl create µ0Jn. variable_alloc creates vector_type structures used for diagnostic computations. This integrand routine uses the coeﬃcients of the operand in ﬁniteelement interpolations. and it initializes quadraturepoint storage for dependent ﬁelds. There are also lagr_quad_alloc and tri_linear_alloc calls to create ﬁniteelement structures for nonfundamental ﬁelds (like J which is computed from the fundamental B ﬁeld) and work structures for predicted ﬁelds. Z. Additional ﬁeld initialization information: • The e3 component of be_eq structures use the covariant component (RBφ) in toroidal ˆ geometry. The cylindrical components are saved in their respective qp structures. φ) [ µ0 Bj ln j ln × (¯j l einφ )] α (II. Z. shows how the product can be found as a rhs computation.48 CHAPTER II. the curl of the interpolated and FFT’ed B. It also accepts a separate 2D matrix structure to avoid repetition of diagonal in n operations in the dotintegrand.
and rates of advection are found from v · xi /xi2  in each cell. over the φdirection. Velocity vectors are averaged over all nodes in an element. and we compute new matrices when the change exceeds a tolerance (ave_change_limit).] The ave_field_check subroutine uses the vector_type pointers to facilitate the test. Any new structure allocations can be coded by copying and modifying existing allocations. we determine how much the ﬁelds have changed sicne the last matrix update. where xi is a vector displacement across the cell in the ith logical coordinate direction. C). The new_dt and ave_field_check subroutines are particularly important. The coeﬃcient for the isotropic part of the operator is based on the maximum diﬀerence between total pressures and the steady_plus_n=0 pressures.f. Parallel communication is required for domain decomposition of the Fourier components (see Sec. computing matrix elements and ﬁnding partial factors fro preconditioning are computationally intensive operations.F. UTILITIES 49 • The pointer_init routine links the vector_type and ﬁnite element structures via pointer assignment. • The boundary_vals_init routine enforces the homogeneous Dirichlet boundary conditions on the initial state.f ﬁle has subroutines for operations that are not encompassed by finite _element and normal vector_type operations. New ﬁelds will need to be added to this list too. Before leaving utilities. The computation requires information from . which is used for the 2D preconditioner bb matrix for anisotropic thermal conduction. let’s take a quick look at find_bb. though not elegantly coded. It also considers the gridvertex data only. Teh new_dt routine computes the rate of ﬂow through mesh cells to determine if advection should limit the time step. ﬂags such as b0_changed in the global module are set to true. and *_matrix_fac_degen subroutines. Thus. To avoid calling these operations during every time step. subroutines. assuming it would not remain ﬁxed while other coeﬃcients change. iter_fac_alloc. The matrix_init routine allocates matrix and preconditioner storage structures. However. The ave_field_check routine is an important part of our semiimplicit advance. Operations common to all matrix allocations are coded in the *_matrix_init_alloc. A similar computation is performed for electron ﬂow when the twoﬂuid model is used. F Utilities The utilities. The ‘linear’ terms in the operators use the steadystate ﬁelds and the n = 0 part of the solution. [Matrices are also recomputed when ∆t changes. Selected calls can be commented out if inhomogeneous conditions are appropriate. and the storage arrays for the n = 0 ﬁelds or nonlinear pressures are updated. This routine is used to ﬁnd the toroidally symmetric part of the ˆˆ dyad. both the ‘linear’ and ‘nonlinear’ parts of the semiimplicit operator change in time. If the tolerance is exceeded.
This routine decides how much storage is required. First. except extrap_init. when adding a new equation. so it cannot occur in a matrix integrand routine. ˆˆ is created and stored at quadrature point locations. Code development rarely requires modiﬁcation of these routines.f. and it creates an index for locating the data for each advance. for example) use rblock and tblock numerical integration of the integrands coded in the diagnostic_ints module.50 CHAPTER II. [Moving the jmode loop from matrix_create to matrix integrands is neither practical nor preferrable. They ’scatter’ to any vector_type storage provided by the calling routine.] Thus. providing the initial guess for an iterative linear system solve. A description of the parameters should be provided if other users will have .] The diagnostic_ints integrands diﬀer in that there are no αj l e−in φ test functions. G Extrap_mod The extrap_mod module holds data and routines for extrapolating solution ﬁelds to the new time level. Diagnostics requiring surface or volume integrals (toroidal ﬂux and kinetic energy. increase the dimension fo extrap_q. extrap_nq. consistent with an integrandbb type of computation. The numerical integration and integrands diﬀer only slightly from those used in forming systems for advancing the solution. The routine uses the mpi_allreduce command to sum contributions from diﬀerent Fourier ’layers’. FORTRAN IMPLEMENTATION all Fourier components. but separate from the ﬁniteelement hierarchy. [The *block_get_rhs routines have ﬂexibility for using diﬀerent basis functions through a simulation. ¯ The integrals are over physical densities. and extrap_int and deﬁne the new values in extrap_init. the cell_rhs and cell_crhs vector_types are allocated with poly_degree=0. using existing code for examples is helpful. Routines like find_bb may become common if we ﬁnd it necessary to use more 3D linear systems in our advances. The ﬁrst part of the input module deﬁnes variables and default values. The amount of data saved depends on the extrap_order input parameter. Again. Therefore. The output from probe_hist is presently just a collection of coeﬃcients from a single grid vertex.f (in the nimset directory and an addition to parallel. This allocates a single cellinterior basis and no grid or side bases. I I/O Coding new input parameters requires two changes to input. H History Module hist_mod The timedependent data written to XDRAW binary or text ﬁles is generated by the routines in hist_mod.
These calls can be copied and modiﬁed to add new ﬁelds. Data is transferred to other processors via MPI communication coded in parallel_io. so that it can be read from a nimrod.f ﬁles are diﬀerent to avoid parallel data requirements in NIMSET. [But.I. Be sure to use another mpi_bcast or bcase_str with the same data type as an example.in to all others. Changes to dump ﬁles are made infrequently to maintain compatibility across code versions to the greatest extent possible. . rblocks (* write rblock. • All data is written as 8byte real to improve portability of the ﬁle while using FORTRAN binary I/O. • Only one processor reads and writes dump ﬁles. However. • The NIMROD and NIMSET dump. the parameter must be added to the appropriate namelist in the read_namelist subroutine. Any changes must be made to both ﬁles. data structures are written and read with generic subroutines. * read tblock) Within the rblock and tblock routines are calls to structurespeciﬁc subroutines. which makes it easy to add ﬁelds. The overall layout of a dump ﬁle is 1. * read rblock) 4.f is just to add the new parameterto the list in broadcast_input.] Other dump notes: • The read routines allocate data structures to appropriate sizes just before a record is read.f. I/O 51 access to the modiﬁed code. In addition. seams (dump_write_seam. tblocks (* write tblock. This sends the read values from the single processor that reads nimrod. A new parameter should be deﬁned near related parameters in the ﬁle.in ﬁle. global information (dump_write and dump_read) 2. The change required in parallel. don’t expect compatibility with other dump ﬁles. dump_read_seam) 3.
52 CHAPTER II. FORTRAN IMPLEMENTATION .
f for an MPI example. The ﬁrst statements in the main program establish communication between the diﬀerent processes. A well conceived implementation (thanks largely to Steve Plimpton of Sandia) keeps most parallel communication in modular coding. needed in every process is the sum over all nvalues. the independence of the processes is not. processes determine what part of the simulation to perform (in parallel_block_init). i. a basic understanding of the domain decomposition is necessary for successful development. ‘highperformance’ computing implies parallel computing. except when it encounters calls to communicate with other processes. From this information and that read from the input and dump ﬁles. Each process then performs its part of the simulation as if it were performing a serial computation. Z) for a potentially limited range of nvalues and grid blocks. and all developers will probably write at least one mpi_allreduce call at some time. Z)ˆn (R.e. A detailed description of NIMROD’s parallel communication is beyond the scope of this tutorial. ˆˆ for each grid block assigned to a 53 . lets return to the find_bb routine in utilities. isolated from the physics model coding. so NIMROD has always been developed as a parallel code. The program itself is responsible for orchestration without an active director. However. Each process is ﬁnding ˆn(R. Before describing the diﬀerent types of domain decomposition used in NIMROD. This fact was readily apparent at the inception of the NIMROD project. and tells each process the total number of processes. When mpirun is used to launch a parallel simulation. What’s b b n bb. it starts multiple processes that execute NIMROD.Chapter III Parallelization A Parallel communication introduction For the present and forseeable future. While the idea of dividing a simulation into multiple processes is intuitive. assign an integer label (node) to each process. The distributed memory paradigm and MPI communication were chosen for portability and eﬃciency.
The rb(ibl)%id and tb(ibl)%id descriptors can be used to map a processor’s local block index (ibl) to the global index (id) deﬁned in nimset. the parallel_seam_init routine in parallel. the nrbl and nbl values are the number of rblocks and rblocks+tblocks in a process.C. each process proceeds independently until its next MPI call. this is just a matter of having each processor choose a subset of the blocks.f creates new data structures during startup. are saved in the global module. Then. Block to block communication between diﬀerent processes is handled within the edge_network and edge_seg_network subroutines. In fact. These routines call parallel_seam_comm and related routines instead of the serial edge_accumulate. This need is met by calling mpi_allreduce within grid_block do loops. . regardless of parallel decomposition. or starting from nrbl+1 and ending at nbl. Decomposition of the data itself is facilitated by the FORTRAN 90 data structures. nrbl_total and nbl_total.54 CHAPTER III. the block subsets are disjoint. but the array at the highest level is over processes with which the local process shares block border information. Each process has an rb and/or tb block_type array but the sum of the dimensions is the number of elements in its subset of blocks. The parallel_seam_comm routine (and its complex version) use MPI ‘point to point’ communication to exchange information. parallel simulations assign new block index labels that are local to each process. These send and recv structures are analogous to seam structures. cross block communication is invoked with one call. For grid block decomposition. For those interested in greater details of the parallel block communication. Issue a mpi_irecv to every process in the exchange.[pardata has comment lines deﬁning the function of each of its variables. Glancing through finite_element. In routines performing ﬁnite element or linear algebra computations. This means that each process issues separate ‘send’ and ‘receive’ request to every other process involved in the communication. one sees doloops starting from 1 and ending at nrbl.] The global index limits. The general sequence of steps used in NIMROD for ‘point to point’ communications is 1. Within a layer of Fourier components (described in III. which indicates readiness to accept data. The resulting array is returned to all processes that participate in the communication. PARALLELIZATION process. parallel_block_init sets up the inverse mapping global2local array that is stored in pardata.f. B Grid block decomposition The distributed memory aspect of domain decomposition implies that each processor needs direct access to a limited amount of the simulation data. Thus. Here an array of data is sent by each process. and the MPI library routine sums the data element by element with data from other processes that have diﬀerence Fourier components for the same block.).
g.) instead of the entire array. Once veriﬁcation is complete. and the other creates a tag grouping all processes with the same subset of blocks. . The diﬃculty is overcome by passing the ﬁrst element....for the same layer as the local process . One creates a communication tag grouping all processes in a layer. Others appear after scalar products of vectors in 2D iterative solves: diﬀerent layers solve diﬀerent systems simultaneously. C Fourier “layer” decomposition By this point it’s probably evident that Fourier components are also separated into disjoint subsets. the matrices would always be 2D. CALL mpi_irecv(recv(irecv)%data(1).. 3. Process groups are established by the mpi_comm_split routine. Caution: calls to nonblocking communication (mpi_isend. As a prelude to the next section.mpi_irecv.to the variable inode. These tags are particularly useful for collective communication within the groups of processes. 4. Separating processes by some condition (e. called layers.. and Fourier component interaction would occur in explicit pseudospectral operations only. Perform some serial work. comm_mode. like seaming among diﬀerent blocks of the local process to avoid wasting CPU time while waiting for data to arrive.g. An example statement inode=block2proc(ibl_global assigns the node label for the process that owns global block number ibl_global ..) seem to have diﬃculty with buﬀers that are part of F90 structures when the mpich implementation is used. The new 3D linear systems preserve layer decomposition by avoiding explicit computation of matrix elements that are not diagonal in Fourier index. There are two in parallel_block_init. block border communication only involves processes assigned to the same Fourier layer. Issue a mpi_send to every process involved which sends data from the local process. The find_bb mpi_allreduce was one example. the mpi_comm_world tag appears. the process can continue on to other operations. comm_layer. same layer) is an important tool for NIMROD due to the two distinct types of decomposition.C. e.. FOURIER “LAYER” DECOMPOSITION 55 2. The motivation is that a large part of the NIMROD algorithm involves no interaction among diﬀerent Fourier components. but diﬀerent layers. or where ‘point to point’ communication (which references the default node index) is used. Where all processes are involved. This issue is addressed in the block2proc mapping array. The sum across blocks involves processes in the same layer only. %data. As envisioned until recently... Use mpi_waitany to verify that expected data from each participating process has arrived. hence calls to mpi_allreduce with the comm_layer tag.
isolating it from the integrand routines. the decomposition of the domain is changed before an “inverse” FFT (going from Fourier coeﬃcients to values at toroidal positions ) and after a “forward” FFT. . PARALLELIZATION Configuration!space Decomp. communication among layers is required to multiply functions of toroidal angle (see B. For illustrative purposes. we can consider a singleblock problem without loss of generality. • Communication is carried out inside fft_nim. mpseudo=5 on il=0 and mpseudo=4 on il=1 Parameters are mx=my=3. without redundancy. consider a poorly balanced choice of lphi= 3.56 CHAPTER III. Furthermore. and nlayers= 2.1). nlayer=2. The scheme lets us use serial FFT routines. 0 ≤ n ≤nmodes_total−1 with nmodes_total= 3. nf=mx×my=9 Conﬁgspace Decomp: nphi=2lphi = 8.1) Notes: • The domain swap uses the collective mpi_alltoallv routine. il=1 il=0 Normal Decomp il=1 n=2 iphi=2 il=0 n=1 iphi=1 n=0 Figure III. lphi=3 Apart from the basic mpi_allreduce exchanges.1: Normal Decomp: nmodes=2 on il=0 and nmodes=1 on il=1. (see Figure III. and it maintains a balanced workload among the processes. (layers must be divisible by nlayers.) Since pseudospectral operations are performed on block at a time.
Krylovspace iterative methods require very many iterations to reach convergence in these conditions.D. (see our 1999 APS poster. pointtopoint communication is called (from parallel_line_* routines) instead of collective communication. lines are extended across block borders.. Each approximate system ˜ Az = r is solved by inverting 1D matrices.. D Globalline preconditioning That illconditional matrices arise at large δt implies global propagation of perturbations within a single time advance.) .” on the web site. However. To achieve global relaxation. GLOBALLINE PRECONDITIONING 57 • calls to fft_nim with nr = nf duplicate the entire block’s conﬁgspace data on every layer. The most eﬀective method we have found for cg on very illconditioned matrices is a lineJacobi approach. They are deﬁned by eliminating oﬀdiagonal connections in the logical sdirection for one approximation and by eliminating in the ndirection for the other. “Recent algorithmic . Factoring and solving these systems uses a domain swap. unless the precondition step provides global “relaxation”. We actually invert two approximate matrices and average the result. not unlike that used before pseudospectral computations.
58 CHAPTER III. PARALLELIZATION .
Marder. of Docs. A. Phys. 1987. S. c [8] B.. editors. Lionello. J. and J. A new semiimplicit approach for MHD computation. Comput. 1991. 97:444. [2] G. E. Glasser. T. J. Semic implicit magnetohydrodynamic calculations. C. Krall and Trivelpiece. J. H. C. Redwine.. and E.O. A. Z. A. J. 1986. J. S. 1989. and the NIMROD Team. Harned. J. An analysis of the ﬁnite element method. U. M. Douglas S.. A. Miki´. Comput. S.Bibliography [1] N. D. Phys. Gianakon. Kruger. Springer. Schnack. Phys. Barnes. J. 195:355–386. Tarditi. Barnes.. 68:48. Stegun.S. Accurate semiimplicit treatment of the Hall eﬀect in MHD c computations. D.: National Bureau of Standards: For sale by the Supt. Handbook of Mathematical Functions. Phys.. Caramana. D. R. Upgrading to Fortran 90. Miki´. A method for incorporating Gauss’ law into electromagnetic PIC codes. Miki´. Nonlinear magnetohydrodynamics simulation using highorder ﬁnite elements. WesleyCambridge Press. D. Washington. Abramowitz and I. A. Strang and G. 70:330–354. [7] R. A. 1987. J. Fix.P. 1995. R. D. Linker. Phys. 1981. A. [4] D. Lerbinger and J. Z. Principles of Plasma Physics. Plimpton. [6] D. [5] K. Chu. 152:346.. 59 . 83:1. San Francisco Press. [10] C. Nebel. S. 2004. Sovinec. F. Comput. [3] M. Harned and Z. G. 1999.. Luciani. C.. Comput. 1964. 1987. Schnack. J. Comput. Comput. D. [9] C. Phys.