Introduction To Numerical Methods For Variational Problems PDF

Introduction to Numerical
Methods for Variational

Problems
Hans Petter Langtangen1,2

Kent-Andre Mardal3,1
1
Center for Biomedical Computing, Simula Research Laboratory
2
Department of Informatics, University of Oslo
3
Department of Mathematics, University of Oslo
This easy-to-read book introduces the basic ideas and technicalities

of least squares, Galerkin, and weighted residual methods for solving
partial differential equations. Special emphasis is put on finite element
methods.
Sep 18, 2016

vi
modern, very powerful tool for solving partial differential equations by

Preface the finite element method, and it was designed to make implementations
very compact for those who are used to the abstract formulation of the
method. Our aim here is different.
A standard reference for FEniCS users have been the excellent text
by Brenner and Scott [4], but for many students with weak formal
mathematical background, the learning curve became unnecessarily steep
using this book to take advantage of FEniCS. The present book grew
out of the need to explain variational formulations in the most intuitive
way so FEniCS users can transform their PDE problem into the proper
formulation for FEniCS programming. We then added material such that
also the details of the most fundamental finite element algorithms could
easily be understood.
The learning outcomes of this book are four-fold
1. understanding various types of variational formulations of PDE prob-
The present book is essentially a book on the finite element method, lems,
although we discuss many other choices of basis functions and other 2. understanding the machinery of finite element algorithms, with an
applications than partial differential equations. The literature on finite emphasis on one-dimensional problems,
elements contains books of many different flavors, ranging from the 3. understanding potential artifacts in simulation results,
abstract view of the method [8, 16, 4, 3] via more intuitive, algorithmic 4. understanding how variational formulations can be used in other
treatment of the subject [], to the structural engineering view on the contexts (polynomial chaos expansions, solving linear systems).
method []. The present book has a very strong algorithmic focus (“how
to compute”), but formulate the method in the abstract form with The exposition is recognized by very explicit mathematics, i.e., we have
variational forms and function spaces. tried to write out all details of the finite element “engine” such that a
One highly valued feature of the finite element method is the prospect reader can calculate a finite element problem by hand. Explaining all
of rigorous analysis provided by the functional analysis framework. In par- details and carrying them out by hand are formidable tasks in two- and
ticular, for elliptic problems (or in general symmetric problems) the theory three-dimensional PDE problems, so we restrict the attention to one
provides error control via sharp estimates and efficient computations space dimension when it comes to detailed calculations. Although we
via multiscale algorithms. In fact, the development of this mathematical imagine that the reader will use FEniCS or other similar software to
theory of finite element methods is one of the highlights of numerical actually solve finite element problems, we strongly believe that successful
analysis. However, within scientific computing the computational engine application of such complex software requires a thorough understanding
of the finite element method is being used far outside what is theoretically of the underlying method, which is best gained by hand calculations of
understood. Obviously, this is a trend that will continue in the future. the steps in the algorithms. Also, hand calculations are indispensable
Hence, it is our aim here provide the reader with the gory details of the for debugging finite element programs: one can run a one-dimensional
implementation in an explicit, user-friendly and simplistic manner to problem, print out intermediate results, and compare with separate hand
lower the threshold of usage and to provide tools for verification and calculations. When the program is fully verified in 1D, ideally the program
debugging that are applicable in general situations. should turned into a 2D/3D simulation simply by switching from a 1D
An important motivation for writing this book was to provide an mesh to the relevant 2D/3D mesh.
intuitive approach to the finite element method, using the mathematical When working with algorithms and hand calculations in the present
language frameworks such as the FEniCS software [14]. FEniCS is a book, we emphasize the usefulness of symbolic computing. Our choice is
© 2016, Hans Petter Langtangen, Kent-Andre Mardal.

Released under CC Attribution 4.0 license
vii viii
the free SymPy package, which is very easy to use for students and which finite element method often denotes the numerical solution by uh . Our
gives a seamless transition from symbolic to numerical computing. Most of mathematical notation is dictated by the natural notation in a computer
the numerical algorithms in this book are summarized as compact SymPy code, so if u is the unknown in the code, we let u be the corresponding
programs. However, symbolic computing certainly has its limitations, quantity in the mathematical description as well. When also the exact
especially when it comes to speed, so their numerical counterparts are solution of the PDE problem is needed, it is usually denoted by ue . Espe-
usually also developed in the text. The reader should be able to write her cially in the chapter on nonlinear problems we introduce notations that
own finite element program for one-dimensional problems, but otherwise are handy in a program and use the same notation in the mathematics
we have no aim to educate the reader to write fast, state-of-the-art finite such that we achieve as close correspondence as possible between the
element programs! mathematics and the code.
Another learning outcome (although not needed to be a successful
Contents. The very first chapter starts with a quick overview of how
FEniCS user) is to understand how the finite element method is a
PDE problems are solved by the finite element method. The next four
special case of more general variational approaches to solving equations.
chapters go into deep detail of the algorithms. We employ a successful
Therefore, we also treat uncertainty quantification via polynomial chaos
idea, pursued by Larson and Bengzon [13] in particular, of first treating
methods and conjugate gradient-like methods for solving linear systems in
finite element approximation before attacking PDE problems. Chapter 2
a way that hopefully gives the reader an understanding of how seemingly
explains approximation of functions in function spaces from a general
very different numerical methods actually are just variants of a common
point of view, where finite element basis functions constitute one example
way of reasoning.
to be explored in Chapter 3. The principles of variational formulations
Many common topics found in finite element books are not present
constitute the subject of Chapter 4. A lot of details of the finite element
in this book. A striking feature is perhaps the presence of the abstract
machinery are met already in the approximation problem in Chapter 3, so
formulation of the finite element method, but without any classical error
when these topics are familiar together with the variational formulations
analysis. The reason is that we have taken a very practical approach
of Chapter 4, it is easier to put the entire finite element method together
to the contents: what does a user need to know to safely apply finite
in Chapter 5 with variational formulations of PDE problems, boundary
element software? A thorough understanding of the errors is obviously
conditions, and the necessary finite element computing algorithms. Our
essential, but the classical error analysis of elliptic problems is of limited
experience is that this pedagogical approach greatly simplifies the learning
practical interest for the practitioner, except for the final results regarding
process for students. Chapter 6 explains how time-dependent problems
convergence rates of different types of finite elements.
are attacked, primarily by using finite difference discretizations in time.
In time-dependent problems, on the other hand, a lot of things can go
Here we develop the corresponding finite difference schemes and analyze
wrong with finite element solutions, but the extensions of the classical
them via Fourier components and numerical dispersion relations. How to
finite element error analysis to time dependency quickly meets limitations
set up variational formulations of systems of PDEs, and in particular the
with respect to explaining typical numerical artifacts. We therefore follow
way systems are treated in FEniCS, is the topic of Chapter 7. Nonlinear
a completely different type of analysis, namely the one often used for
ODE and PDE problems are treated quite comprehensively in Chapter 9.
finite difference methods: insight through numerical dispersion relations
With all the fundamental building blocks from Chapter 2, we have most
and similar results based on exact discrete solutions via Fourier wave
of the elements in place for a quick treatment of the popular polynomial
components. Actually, all the analysis of the quality of finite element
chaos expansion method for uncertainty quantification in Chapter 10.
solutions are in this book done with the aid of techniques for analyzing
This chapter becomes a nice illustration of the power of the theory in
finite difference methods, so a knowledge of finite differences is needed.
the beginning of the book. A similar illustration of the applicability of
This approach also makes it very easy to compare the two methods,
variational thinking appears in Chapter 11 where we construct iterative
which is frequently done throughout the text.
methods for linear systems and derive methods in the Conjugate gradient
The mathematical notation in this text makes deviations from the
family.
literature trends. Especially books on the abstract formulation of the
ix
Supplementary materials. All program and data files referred to in this

book are available from the book’s primary web site: http://hplgit.
github.io/fem-book/doc/web/.
Oslo, September 2016 Hans Petter Langtangen, Kent-Andre Mardal

xii Contents
2.4.2 Lagrange polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Contents 2.4.3 Bernstein polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.5 Approximation properties and convergence rates . . . . . . . . . . 51
2.6 Approximation of functions in higher dimensions . . . . . . . . . . 56
2.6.1 2D basis functions as tensor products of 1D functions 57
2.6.2 Example on polynomial basis in 2D . . . . . . . . . . . . . . . . 58
2.6.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.6.4 Extension to 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3 Function approximation by finite elements . . . . . . . . . . . 71

3.1 Finite element basis functions . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.1.1 Elements and nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.1.2 The basis functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.1.3 Example on quadratic finite element functions . . . . . . . 77
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
3.1.4 Example on linear finite element functions . . . . . . . . . . 78
1 Quick overview of the finite element method . . . . . . . . 1 3.1.5 Example on cubic finite element functions . . . . . . . . . . 80
3.1.6 Calculating the linear system . . . . . . . . . . . . . . . . . . . . . 81
2 Function approximation by global functions . . . . . . . . . 9 3.1.7 Assembly of elementwise computations . . . . . . . . . . . . . 85
3.1.8 Mapping to a reference element . . . . . . . . . . . . . . . . . . . 88
2.1 Approximation of vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.1.9 Example on integration over a reference element . . . . . 90
2.1.1 Approximation of planar vectors . . . . . . . . . . . . . . . . . . . 11
2.1.2 Approximation of general vectors . . . . . . . . . . . . . . . . . . 14 3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.2.1 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
2.2 Approximation principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2.2 Linear system assembly and solution . . . . . . . . . . . . . . . 95
2.2.1 The least squares method . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2.3 Example on computing symbolic approximations . . . . 96
2.2.2 The projection (or Galerkin) method . . . . . . . . . . . . . . . 20
3.2.4 Using interpolation instead of least squares . . . . . . . . . 96
2.2.3 Example on linear approximation . . . . . . . . . . . . . . . . . . 20
3.2.5 Example on computing numerical approximations . . . . 97
2.2.4 Implementation of the least squares method . . . . . . . . . 21
3.2.6 The structure of the coefficient matrix . . . . . . . . . . . . . 98
2.2.5 Perfect approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.7 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
2.2.6 The regression method . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.8 Sparse matrix storage and solution . . . . . . . . . . . . . . . . . 101
2.3 Orthogonal basis functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 Comparison of finite elements and finite differences . . . . . . . . 104
2.3.1 Ill-conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3.1 Finite difference approximation of given functions . . . . 104
2.3.2 Fourier series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.2 Interpretation of a finite element approximation in
2.3.3 Orthogonal basis functions . . . . . . . . . . . . . . . . . . . . . . . . 36
terms of finite difference operators . . . . . . . . . . . . . . . . . 104
2.3.4 Numerical computations . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3.3 Making finite elements behave as finite differences . . . 107
2.4 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4 A generalized element concept . . . . . . . . . . . . . . . . . . . . . . . . . . 109
2.4.1 The interpolation (or collocation) principle . . . . . . . . . 39
3.4.1 Cells, vertices, and degrees of freedom . . . . . . . . . . . . . . 109

Contents xiii xiv Contents
3.4.2 Extended finite element concept . . . . . . . . . . . . . . . . . . . 110 4.3.1 Variable coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
3.4.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 4.3.2 First-order derivative in the equation and boundary
3.4.4 Computing the error of the approximation . . . . . . . . . . 112 condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
3.4.5 Example on cubic Hermite polynomials . . . . . . . . . . . . . 114 4.3.3 Nonlinear coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
3.5 Numerical integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 4.4 Implementation of the algorithms . . . . . . . . . . . . . . . . . . . . . . . 167
3.5.1 Newton-Cotes rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 4.4.1 Extensions of the code for approximation . . . . . . . . . . . 168
3.5.2 Gauss-Legendre rules with optimized points . . . . . . . . . 116 4.4.2 Fallback on numerical methods . . . . . . . . . . . . . . . . . . . . 169
4.4.3 Example with constant right-hand side . . . . . . . . . . . . . 170
3.6 Finite elements in 2D and 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.6.1 Basis functions over triangles in the physical domain . 117 4.5 Approximations may fail: convection-diffusion . . . . . . . . . . . . . 172
3.6.2 Basis functions over triangles in the reference cell . . . . 120
4.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
3.6.3 Affine mapping of the reference cell . . . . . . . . . . . . . . . . 123
3.6.4 Isoparametric mapping of the reference cell . . . . . . . . . 124 5 Variational formulations with finite elements . . . . . . . . 181
3.6.5 Computing integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.1 Computing with finite elements . . . . . . . . . . . . . . . . . . . . . . . . . 181
3.7 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 5.1.1 Finite element mesh and basis functions . . . . . . . . . . . . 182
3.7.1 Example on approximation in 2D using FEniCS . . . . . 126 5.1.2 Computation in the global physical domain . . . . . . . . . 183
3.7.2 Refined code with curve plotting . . . . . . . . . . . . . . . . . . 128 5.1.3 Comparison with a finite difference discretization . . . . 185
3.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 5.1.4 Cellwise computations . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
5.2 Boundary conditions: specified nonzero value . . . . . . . . . . . . . 189
4 Variational formulations with global basis functions . 137
5.2.1 General construction of a boundary function . . . . . . . . 189
4.1 Basic principles for approximating differential equations . . . . 137 5.2.2 Example on computing with a finite element-based
4.1.1 Differential equation models . . . . . . . . . . . . . . . . . . . . . . 138 boundary function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
4.1.2 Simple model problems and their solutions . . . . . . . . . . 139 5.2.3 Modification of the linear system . . . . . . . . . . . . . . . . . . 194
4.1.3 Forming the residual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 5.2.4 Symmetric modification of the linear system . . . . . . . . 197
4.1.4 The least squares method . . . . . . . . . . . . . . . . . . . . . . . . 143 5.2.5 Modification of the element matrix and vector . . . . . . . 198
4.1.5 The Galerkin method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.3 Boundary conditions: specified derivative . . . . . . . . . . . . . . . . . 199
4.1.6 The method of weighted residuals . . . . . . . . . . . . . . . . . 144
5.3.1 The variational formulation . . . . . . . . . . . . . . . . . . . . . . . 199
4.1.7 Test and trial functions . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.3.2 Boundary term vanishes because of the test functions 199
4.1.8 The collocation method . . . . . . . . . . . . . . . . . . . . . . . . . . 146
5.3.3 Boundary term vanishes because of linear system
4.1.9 Examples on using the principles . . . . . . . . . . . . . . . . . . 147
modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
4.1.10 Integration by parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.3.4 Direct computation of the global linear system . . . . . . 201
4.1.11 Boundary function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
5.3.5 Cellwise computations . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
4.2 Computing with global polynomials . . . . . . . . . . . . . . . . . . . . . 155
5.4 Implementation of finite element algorithms . . . . . . . . . . . . . . 203
4.2.1 Computing with Dirichlet and Neumann conditions . . 155
5.4.1 Extensions of the code for approximation . . . . . . . . . . . 203
4.2.2 When the numerical method is exact . . . . . . . . . . . . . . . 157
5.4.2 Utilizing a sparse matrix . . . . . . . . . . . . . . . . . . . . . . . . . 207
4.2.3 Abstract notation for variational formulations . . . . . . . 158
5.4.3 Application to our model problem . . . . . . . . . . . . . . . . . 208
4.2.4 Variational problems and minimization of functionals . 160
5.5 Variational formulations in 2D and 3D . . . . . . . . . . . . . . . . . . . 210
4.3 Examples on variational formulations . . . . . . . . . . . . . . . . . . . . 162
Contents xv xvi Contents
5.5.1 Integration by parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 6.4 Accuracy of the finite element solution . . . . . . . . . . . . . . . . . . . 255
5.5.2 Example on a multi-dimensional variational problem . 211 6.4.1 Illustrating example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
5.5.3 Transformation to a reference cell in 2D and 3D . . . . . 213 6.4.2 Methods of analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
5.5.4 Numerical integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215 6.4.3 Fourier components and dispersion relations . . . . . . . . 257
5.5.5 Convenient formulas for P1 elements in 2D . . . . . . . . . 216 6.4.4 Forward Euler discretization . . . . . . . . . . . . . . . . . . . . . . 259
5.5.6 A glimpse of the mathematical theory of the finite 6.4.5 Backward Euler discretization . . . . . . . . . . . . . . . . . . . . . 260
element method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217 6.4.6 Comparing amplification factors . . . . . . . . . . . . . . . . . . . 261
5.6 Implementation in 2D and 3D via FEniCS . . . . . . . . . . . . . . . 222 6.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
5.6.1 Mathematical problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
5.6.2 Variational formulation . . . . . . . . . . . . . . . . . . . . . . . . . . 223 7 Variational forms for systems of PDEs . . . . . . . . . . . . . . . 267
5.6.3 The FEniCS solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 7.1 Variational forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
5.6.4 Making the mesh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227 7.1.1 Sequence of scalar PDEs formulation . . . . . . . . . . . . . . . 268
5.6.5 Solving a problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228 7.1.2 Vector PDE formulation . . . . . . . . . . . . . . . . . . . . . . . . . . 268
5.7 Convection-diffusion and Petrov-Galerkin methods . . . . . . . . 229 7.2 A worked example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
5.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232 7.3 Identical function spaces for the unknowns . . . . . . . . . . . . . . . 270
5.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234 7.3.1 Variational form of each individual PDE . . . . . . . . . . . . 270
7.3.2 Compound scalar variational form . . . . . . . . . . . . . . . . . 271
6 Time-dependent variational forms . . . . . . . . . . . . . . . . . . . 239 7.3.3 Decoupled linear systems . . . . . . . . . . . . . . . . . . . . . . . . . 272
7.3.4 Coupled linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
6.1 Discretization in time by a Forward Euler scheme . . . . . . . . . 240
6.1.1 Time discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240 7.4 Different function spaces for the unknowns . . . . . . . . . . . . . . . 275
6.1.2 Space discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
7.5 Computations in 1D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
6.1.3 Variational forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
7.5.1 Another example in 1D . . . . . . . . . . . . . . . . . . . . . . . . . . 281
6.1.4 Notation for the solution at recent time levels . . . . . . . 243
6.1.5 Deriving the linear systems . . . . . . . . . . . . . . . . . . . . . . . 244 7.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
6.1.6 Computational algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 245
6.1.7 Example using sinusoidal basis functions . . . . . . . . . . . 246 8 Flexible implementations of boundary conditions . . . . 289
6.1.8 Comparing P1 elements with the finite difference 8.1 Optimization with constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248 8.1.1 Elimination of variables . . . . . . . . . . . . . . . . . . . . . . . . . . 290
6.2 Discretization in time by a Backward Euler scheme . . . . . . . . 249 8.1.2 Lagrange multiplier method . . . . . . . . . . . . . . . . . . . . . . . 290
6.2.1 Time discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249 8.1.3 Penalty method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
6.2.2 Variational forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249 8.2 Optimization of functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
6.2.3 Linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250 8.2.1 Classical calculus of variations . . . . . . . . . . . . . . . . . . . . 293
6.3 Dirichlet boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . 251 8.2.2 Penalty method for optimization with constraints . . . . 295
6.3.1 Boundary function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 8.2.3 Lagrange multiplier method for optimization with
6.3.2 Finite element basis functions . . . . . . . . . . . . . . . . . . . . . 252 constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
6.3.3 Modification of the linear system . . . . . . . . . . . . . . . . . . 252 8.2.4 Example: 1D problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
6.3.4 Example: Oscillating Dirichlet boundary condition . . . 253 8.2.5 Example: adding a constraint in a Neumann problem . 301
Contents xvii xviii Contents
9 Nonlinear problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303 9.7.2 The group finite element method . . . . . . . . . . . . . . . . . . 367
9.7.3 Numerical integration of nonlinear terms by hand . . . . 370
9.1 Introduction of basic concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 303
9.7.4 Discretization of a variable coefficient Laplace term . . 371
9.1.1 Linear versus nonlinear equations . . . . . . . . . . . . . . . . . . 303
9.1.2 A simple model problem . . . . . . . . . . . . . . . . . . . . . . . . . 305 10 Uncertainty quantification and polynomial chaos
9.1.3 Linearization by explicit time discretization . . . . . . . . . 306 expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
9.1.4 Exact solution of nonlinear algebraic equations . . . . . . 307
9.1.5 Linearization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308 10.1 Sample problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
9.1.6 Picard iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309 10.1.1 ODE for decay processes . . . . . . . . . . . . . . . . . . . . . . . . . 376
9.1.7 Linearization by a geometric mean . . . . . . . . . . . . . . . . . 311 10.1.2 The stochastic Poisson equation . . . . . . . . . . . . . . . . . . . 377
9.1.8 Newton’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313 10.2 Basic principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
9.1.9 Relaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 10.2.1 Basic statistical results . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
9.1.10 Implementation and experiments . . . . . . . . . . . . . . . . . . 315 10.2.2 Least-squares methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
9.1.11 Generalization to a general nonlinear ODE . . . . . . . . . . 318 10.2.3 Example: Least squares applied to the decay ODE . . . 382
9.1.12 Systems of ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 10.2.4 Modeling the response . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
9.2 Systems of nonlinear algebraic equations . . . . . . . . . . . . . . . . . 323 10.2.5 Numerical integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
9.2.1 Picard iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324 10.2.6 Stochastic collocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
9.2.2 Newton’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324 10.3 The Chaospy software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
9.2.3 Stopping criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
9.2.4 Example: A nonlinear ODE model from epidemiology 327 10.4 Intrusive polynomial chaos methods . . . . . . . . . . . . . . . . . . . . . 383
9.3 Linearization at the differential equation level . . . . . . . . . . . . . 329 11 Variational methods for linear systems . . . . . . . . . . . . . . 385
9.3.1 Explicit time integration . . . . . . . . . . . . . . . . . . . . . . . . . 330
11.1 Conjugate gradient-like iterative methods . . . . . . . . . . . . . . . . 386
9.3.2 Backward Euler scheme and Picard iteration . . . . . . . . 330
11.1.1 The Galerkin method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
9.3.3 Backward Euler scheme and Newton’s method . . . . . . 331
11.1.2 The least squares method . . . . . . . . . . . . . . . . . . . . . . . . 387
9.3.4 Crank-Nicolson discretization . . . . . . . . . . . . . . . . . . . . . 334
11.1.3 Krylov subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
9.4 1D stationary nonlinear differential equations . . . . . . . . . . . . . 335 11.1.4 Computation of the basis vectors . . . . . . . . . . . . . . . . . . 387
9.4.1 Finite difference discretization . . . . . . . . . . . . . . . . . . . . . 336 11.1.5 Computation of a new solution vector . . . . . . . . . . . . . . 389
9.4.2 Solution of algebraic equations . . . . . . . . . . . . . . . . . . . . 338 11.1.6 Summary of the least squares method . . . . . . . . . . . . . . 390
9.4.3 Galerkin-type discretization . . . . . . . . . . . . . . . . . . . . . . . 343 11.1.7 Truncation and restart . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
9.4.4 Picard iteration defined from the variational form . . . . 344 11.1.8 Summary of the Galerkin method . . . . . . . . . . . . . . . . . 391
9.4.5 Newton’s method defined from the variational form . . 345 11.1.9 A framework based on the error . . . . . . . . . . . . . . . . . . . 392
9.5 Multi-dimensional PDE problems . . . . . . . . . . . . . . . . . . . . . . . 347 11.2 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
9.5.1 Finite element discretization . . . . . . . . . . . . . . . . . . . . . . 348 11.2.1 Motivation and Basic Principles . . . . . . . . . . . . . . . . . . . 393
9.5.2 Finite difference discretization . . . . . . . . . . . . . . . . . . . . . 351 11.2.2 Use of the preconditioning matrix in the iterative
9.5.3 Continuation methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 354 methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
9.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355 11.2.3 Classical iterative methods as preconditioners . . . . . . . 395
11.2.4 Incomplete factorization preconditioners . . . . . . . . . . . . 396
9.7 Symbolic nonlinear finite element equations . . . . . . . . . . . . . . . 366
9.7.1 Finite element basis functions . . . . . . . . . . . . . . . . . . . . . 367 A Useful formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
Contents xix
A.1 Finite difference operator notation . . . . . . . . . . . . . . . . . . . . . . 399

A.2 Truncation errors of finite difference approximations . . . . . . . 400
A.3 Finite differences of exponential functions . . . . . . . . . . . . . . . . 401
A.4 Finite differences of t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
n
402
A.4.1 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
xxii List of Exercises and Problems
Exercise 3.12: Implement 3D computations with global basis

List of Exercises and Problems functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Exercise 3.13: Use Simpson’s rule and P2 elements . . . . . . . . . . . . . 136
Exercise 3.14: Make a 3D code for Lagrange elements of arbitrary
order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Exercise 4.1: Refactor functions into a more general class . . . . . . . 177
Exercise 4.2: Compute the deflection of a cable with sine functions 177
Exercise 4.3: Compute the deflection of a cable with power
functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Exercise 4.4: Check integration by parts . . . . . . . . . . . . . . . . . . . . . . 179
Exercise 5.1: Compute the deflection of a cable with 2 P1 elements 234
Exercise 5.2: Compute the deflection of a cable with 1 P2 element 234
Exercise 5.3: Compute the deflection of a cable with a step load . 234
Exercise 5.4: Compute with a non-uniform mesh . . . . . . . . . . . . . . . 235
Problem 5.5: Solve a 1D finite element problem by hand . . . . . . . . 235
Problem 2.1: Linear algebra refresher . . . . . . . . . . . . . . . . . . . . . . . . 65 Exercise 5.6: Investigate exact finite element solutions . . . . . . . . . . 236
Problem 2.2: Approximate a three-dimensional vector in a plane . 66 Exercise 5.7: Compare finite elements and differences for a radially
Problem 2.3: Approximate a parabola by a sine . . . . . . . . . . . . . . . 66 symmetric Poisson equation . . . . . . . . . . . . . . . . . . . 236
Problem 2.4: Approximate the exponential function by power Exercise 5.8: Compute with variable coefficients and P1 elements
functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 by hand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Problem 2.5: Approximate the sine function by power functions . 67 Exercise 5.9: Solve a 2D Poisson equation using polynomials and
Problem 2.6: Approximate a steep function by sines . . . . . . . . . . . . 68 sines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
Problem 2.7: Approximate a steep function by sines with boundary Exercise 6.1: Analyze a Crank-Nicolson scheme for the diffusion
adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
Exercise 2.8: Fourier series as a least squares approximation . . . . . 69 Problem 7.1: Estimate order of convergence for the Cooling law . 286
Problem 2.9: Approximate a steep function by Lagrange Problem 7.2: Estimate order of convergence for the Cooling law . 287
polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 Problem 9.1: Determine if equations are nonlinear or not . . . . . . . 355
Problem 2.10: Approximate a steep function by Lagrange Exercise 9.2: Derive and investigate a generalized logistic model . 355
polynomials and regression . . . . . . . . . . . . . . . . . . . . 70 Problem 9.3: Experience the behavior of Newton’s method . . . . . . 356
Problem 3.1: Define nodes and elements . . . . . . . . . . . . . . . . . . . . . . 131 Problem 9.4: Compute the Jacobian of a 2 × 2 system . . . . . . . . . . 357
Problem 3.2: Define vertices, cells, and dof maps . . . . . . . . . . . . . . 131 Problem 9.5: Solve nonlinear equations arising from a vibration
Problem 3.3: Construct matrix sparsity patterns . . . . . . . . . . . . . . . 132 ODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
Problem 3.4: Perform symbolic finite element computations . . . . . 132 Exercise 9.6: Find the truncation error of arithmetic mean of
Problem 3.5: Approximate a steep function by P1 and P2 elements 132 products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
Problem 3.6: Approximate a steep function by P3 and P4 elements 132 Problem 9.7: Newton’s method for linear problems . . . . . . . . . . . . . 359
Exercise 3.7: Investigate the approximation error in finite elements 133 Exercise 9.8: Discretize a 1D problem with a nonlinear coefficient 359
Problem 3.8: Approximate a step function by finite elements . . . . 133 Exercise 9.9: Linearize a 1D problem with a nonlinear coefficient 359
Exercise 3.9: 2D approximation with orthogonal functions . . . . . . 134 Problem 9.10: Finite differences for the 1D Bratu problem . . . . . . 360
Exercise 3.10: Use the Trapezoidal rule and P1 elements . . . . . . . . 135 Problem 9.11: Integrate functions of finite element expansions . . . 361
Exercise 3.11: Compare P1 elements and interpolation . . . . . . . . . 135 Problem 9.12: Finite elements for the 1D Bratu problem . . . . . . . . 362

List of Exercises and Problems xxiii
Exercise 9.13: Discretize a nonlinear 1D heat conduction PDE by

finite differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
Exercise 9.14: Use different symbols for different approximations
of the solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
Exercise 9.15: Derive Picard and Newton systems from a
variational form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
Exercise 9.16: Derive algebraic equations for nonlinear 1D heat
conduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
Exercise 9.17: Differentiate a highly nonlinear term . . . . . . . . . . . . 365
Exercise 9.18: Crank-Nicolson for a nonlinear 3D diffusion equation 365
Exercise 9.19: Find the sparsity of the Jacobian . . . . . . . . . . . . . . . 365
Problem 9.20: Investigate a 1D problem with a continuation method 365
2 1 Quick overview of the finite element method
The finite element method is a versatile approach to construct compu-

Quick overview of the finite
1
tational schemes to solve any partial differential equation on any domain
in any dimension. The method may at first glance appear cumbersome
element method and even unnatural as it relies on variational formulations and polynomial
spaces. However, the study of these somewhat abstract concepts pays off
as the finite element method provides a general recipe for efficient and
accurate simulations.
Let us start by outlining the concepts briefly. Consider the following
PDE in 2D:
−∇2 u = −uxx − uyy = f,

equipped with suitable boundary conditions. A finite difference scheme
to solve the current PDE would in the simplest case be described by the
stencil
ui−1,j − 2ui,j + ui+1,j ui,j−1 − 2ui,j + ui,j+1

− − = fi (1.1)
h2 h2
or reordered to the more famous
−ui−1,j − ui,j−1 + 4ui,j − ui+1,j − ui,j+1
= fi . (1.2)
h2
On a structured mesh, the stencil appears natural and is convenient
to implement. However, for a unstructured, “complicated”domain as
shown in Figure 1.1, we would need to be careful when placing points
and evaluating stencils and functions. Both accuracy and efficiency may
easily be sacrificed by a reckless implementation.
In general, a domain like the one represented in Figure 1.1 will be
representation by a triangulation. The finite element method (and the
finite volume method which often is a special case of the finite element
method) is a methodology for creating stencils like (1.2) in a structured
manner that adapt to the underlying triangulation.
The triangulation in Figure 1.1 is a mesh that consists of cells that are
connected and defined in terms of vertices. The fundamental idea of the
finite element method is to construct a procedure to compute a stencil
on a general element and then apply this procedure to each element of
Fig. 1.1 Example on a complicated domain for solving PDEs. the mesh. Let us therefore denote the mesh as Ω while Ωe is the domain
of a generic element such that Ω = ∪e Ωe .
This is exactly the point where the challenges of the finite element
method starts and where we need some new concepts. The basic question

1 Quick overview of the finite element method 3 4 1 Quick overview of the finite element method
Z Z Z
is: How should we create a stencil like (1.2) for a general element and a −∇2 u v dx = ∇u · ∇v dx −
∂u
v dS (1.6)
general PDE that has the maximal accuracy and minimal computational Ωe Ωe ∂Ωe ∂n
complexity at the current triangulation? The two basic building blocks The reasons behind this alternative formulation is rather mathematical
of the finite element method are and will not be a major subject of this book as it is well described
elsewhere. In fact, a precise explanation would need tools from functional
1. the solution is represented in terms of a polynomial expression on the
analysis.
given general element, and
With the above rewrite assuming now that the boundary term vanish
2. a variational formulation of the PDE where element-wise integration
due to boundary conditions (why this is possible will be dealt with in
enables the PDE to be transformed to a stencil.
detail later in this book) the stencil, corresponding to (1.2), is represented
Step 1 is, as will be explained later, conveniently represented both by
implementation-wise and mathematically as a solution Z
∇u · ∇v dx
N
X Ωe
u= ci ψi (x, y), (1.3)
where u is called the trial function, v is called a test function, and Ω
i=0
is an element of a triangulated mesh. The idea of software like FEniCS
where {ci } are the coefficients to be determined (often called the degrees is that this piece of mathematics can be directly expressed in terms of
of freedom) and ψi (x, y) are prescribed polynomials. The next step is Python code as
the variational formulation. This step may seem like a trick of magic or
a cumbersome mathematical exercise at first glance. We take the PDE mesh = Mesh("some_file")
V = FunctionSpace(mesh, "some polynomial")
and multiply by a function v (usually called a the test function) and u = TrialFunction(V)
integrate over an element Ωe and obtain the expression v = TestFunction(V)
a = dot(grad(u), grad(v))*dx
Z Z
−∇2 u v dx = f v dx (1.4) The methodology and code in this example is not tied to a particular
equation, except the formula for a, holding the derivatives of our sample
Ωe Ωe
A perfectly natural question at this point is: Why multiply with a test PDE, but any other PDE terms could be expressed via u, v, grad, and
function v? The simple answer is that there are N + 1 unknowns that other symbolic operators in this line of code. In fact, finite element
needs to be determined in u in (1.3) and for this we need N + 1 equations. packages like FEniCS are typically structured as general toolboxes that
The equations are obtained by using N + 1 different test functions which can be adapted to any PDE as soon as the derivation of variational
when used in (1.5) give rise to N + 1 linearly independent equations. formulations is mastered. The main obstacle here for a novice FEM user
While (1.4) is a variational formulation of our PDE problem, it is not is then to understand the concept of trial functions and test functions
the most common form. It is common to re-write realised in terms of polynomial spaces.
Z
Hence, a finite element formulation (or a weak formulation) of the
−∇2 u v dx (1.5)
Ωe Poisson problem that works on any mesh Ω can be written in terms of
solving the problem:
to weaken the requirement of the polynomial space used for the trial Z Z
function (that here needs to be twice differentiable) and write this term ∇u · ∇vd dx = f v dx .
on its corresponding weak form. That is, the term is rewritten in terms Ω Ω
of first-derivatives only (of both the trial and the test function) with the By varying the trial and test spaces we obtain different stencils, some of
aid of Gauss-Green’s lemma: which will be identical to finite difference schemes on particular meshes.
1 Quick overview of the finite element method 5 6 1 Quick overview of the finite element method
We will now show a complete FEniCS program to illustrate how a typical Z Z

finite element code may be structured ∇u · ∇vd dx = f v dx .
Ω Ω
mesh = Mesh("some_file")
V = FunctionSpace(mesh, "some polynomial")
This statement is assumed valid for all test functions v in some function
u = TrialFunction(V) space V of polynomials. The right-hand side expression is coded in
v = TestFunction(V) FEniCS as
a = dot(grad(u), grad(v))*dx
L = f*v*dx L = f*v*dx
bc = DirichletBC(V, "some_function", "some_domain") and the problem is then solved by the statements
solution = Function(V) # unknown FEM function
solve(a == L, solution, bc) u = Function(V) # unknown FEM function
plot(solution) solve(a == L, u, bc)
While the finite element method is versatile and may be adapted where bc holds information about boundary conditions. This information
to any PDE on any domain in any dimension, the different methods is connected to information about the triangulation, the mesh. Assuming
that are derived by using different trial and test functions may vary u = 0 on the boundary, we can in FEniCS generate a triangular mesh
significantly in terms of accuracy and efficiency. In fact, even though the over a rectangular domain [−1, −1] × [−1, 1] as follows:
process of deriving at a variational formulation is general, a bad choice
mesh = RectangleMesh((-1, -1), (1, 1), nx=10, ny=10)
of polynomial space may in some cases lead to a completely wrong result. bc = DirichletBC(V, 0, ’on_boundary’)
This is particularly the case for complicated PDEs. For this reason, it
is dangerous to regard the method as a black box and not do proper Mathematically, the finite element method transforms our PDE to a
verification of the method for a particular application. sparse linear system. The solve step performs two tasks: construction of
In this book we will put focus on verification in the sense that we the linear system based on the given information about the domain and
provide the reader with explicit calculations as well as demonstrations of its elements, and then solution of the linear system by either an iterative
how such computations can be performed by using symbolic or numerical or direct method.
calculations to control the various parts of the computational framework. We are now in a position to summarize all the parts of a FEniCS
In our view, there are three important tests that should be frequently program that solves the Poisson equation by the finite element method:
employed during verification: from fenics import *
mesh = RectangleMesh((-1, -1), (1, 1), nx=10, ny=10)
1. reducing the model problem to 1D and carefully check the calculations V = FunctionSpace(mesh, ’P’, 2) # quadratic polynomials
involved in the variational formulation on a small 1D mesh bc = DirichletBC(V, 0, ’on_boundary’)
u = TrialFunction(V)
2. perform the calculation involved on one general or random element v = TestFunction(V)
3. test whether convergence is obtained and to what order the method a = dot(grad(u), grad(v))*dx
converge by refining the mesh L = f*v*dx
u = Function(V) # unknown FEM function to be computed
solve(a == L, u, bc)
The two first task here should ideally be performed by independent vtkfile = File(’poisson.pvd’); vtkfile << u # store solution
calculations outside the framework used for the simulations. In our view
sympy is a convenient tool that can be used to assist hand calculations. Solving a different PDE is a matter of changing a and L. We refer to the
So far, we have outlined how the finite element method handles deriva- FEniCS tutorial [12, 11] for lots of examples.
tives in a PDE, but we also had a right-hand side f . This term is Although we assert here that the finite element method is a tool
multiplied by the test function v as well, such that the entire Poisson that can solve any PDE problem on any domain of any complexity, the
equation is transformed to fundamental ideas of the method are in fact even more general. This book
will emphasize this general framework and show how the finite element
1 Quick overview of the finite element method 7
method is actually just one way to choose the expansion functions ψ,

but there are many others, giving rise to other frameworks for other
problems, such as uncertainty quantification via polynomial chaos theory
in Chapter 10. By understanding the general framework of approximations
of the form (1.3), i.e, understanding how ψi are constructed and how
ci are computed, we believe the reader can develop a better intuitive
understanding the finite element method in particular and its foundation
in general.
10 2 Function approximation by global functions
This is the reason why we shall first become familiar with finite element
Function approximation by global
2
approximation before addressing finite element methods for differential
equations.
functions First, we refresh some linear algebra concepts about approximating
vectors in vector spaces. Second, we extend these concepts to approxi-
mating functions in function spaces, using the same principles and the
same notation. We present examples on approximating functions by
global basis functions with support throughout the entire domain. That
is, the functions are in general nonzero on the entire domain. Third,
we introduce the finite element type of basis functions globally. These
basis functions will later, in ref ch:approx:fe, be used with local support
(meaning that each function is nonzero except in a small part of the
domain) to enhance stability and efficiency. We explain all details of
the computational algorithms involving such functions. Four types of
approximation principles are covered: 1) the least squares method, 2) the
Many successful numerical solution methods for differential equations, L2 projection or Galerkin method, 3) interpolation or collocation, and 4)
including the finite element method, aim at approximating the unknown the regression method.
function by a sum
2.1 Approximation of vectors

N
X
u(x) ≈ ci ψi (x), (2.1)
i=0
where ψi (x) are prescribed functions and c0 , . . . , cN are unknown coef- We shall start with introducing two fundamental methods for deter-
ficients to be determined. Solution methods for differential equations mining the coefficients ci in (2.1). These methods will be introduce for
utilizing (2.1) must have a principle for constructing N + 1 equations approximation of vectors. Using vectors in vector spaces to bring across
to determine c0 , . . . , cN . Then there is a machinery regarding the actual the ideas is believed to appear more intuitive to the reader than starting
constructions of the equations for c0 , . . . , cN , in a particular problem. directly with functions in function spaces. The extension from vectors to
Finally, there is a solve phase for computing the solution c0 , . . . , cN of functions will be trivial as soon as the fundamental ideas are understood.
the N + 1 equations. The first method of approximation is called the least squares method
Especially in the finite element method, the machinery for construct- and consists in finding ci such that the difference f − u, measured
ing the discrete equations to be implemented on a computer is quite in a certain norm, is minimized. That is, we aim at finding the best
comprehensive, with many mathematical and implementational details approximation u to f , with the given norm as measure of “distance”. The
entering the scene at the same time. From an ease-of-learning perspective second method is not as intuitive: we find u such that the error f − u
it can therefore be wise to follow an idea of Larson and Bengzon [13] is orthogonal to the space where u lies. This is known as projection, or
and introduce the computational machinery for a trivial equation: u = f . in the context of differential equations, the idea is also well known as
Solving this equation with f given and u on the form (2.1), means that Galerkin’s method. When approximating vectors and functions, the two
we seek an approximation u to f . This approximation problem has the methods are equivalent, but this is no longer the case when applying the
advantage of introducing most of the finite element toolbox, but with- principles to differential equations.
out involving demanding topics related to differential equations (e.g.,
integration by parts, boundary conditions, and coordinate mappings).

2.1 Approximation of vectors 11 12 2 Function approximation by global functions
2.1.1 Approximation of planar vectors We say that ψ0 is a basis vector in the space V . Our aim is to find the
vector
Let f = (3, 5) be a vector in the xy plane and suppose we want to
approximate this vector by a vector aligned in the direction of another u = c0 ψ0 ∈ V (2.3)
vector that is restricted to be aligned with some vector (a, b). Figure 2.1
depicts the situation. This is the simplest approximation problem for which best approximates the given vector f = (3, 5). A reasonable
vectors. Nevertheless, for many readers it will be wise to refresh some criterion for a best approximation could be to minimize the length of the
basic linear algebra by consulting a textbook. Exercise 2.1 suggests difference between the approximate u and the given f . The difference,
specific tasks to regain familiarity with fundamental operations on inner or error e = f − u, has its length given by the norm
product vector spaces. Familiarity with such operations are assumed in 1
the forthcoming text. ||e|| = (e, e) 2 ,
where (e, e) is the inner product of e and itself. The inner product, also
6 called scalar product or dot product, of two vectors u = (u0 , u1 ) and
v = (v0 , v1 ) is defined as
(3,5)
(u, v ) = u0 v0 + u1 v1 . (2.4)
5
Remark. We should point out that we use the notation (·, ·) for two
different things: (a, b) for scalar quantities a and b means the vector
starting in the origin and ending in the point (a, b), while (u, v ) with
vectors u and v means the inner product of these vectors. Since vectors
4
are here written in boldface font there should be no confusion. We

may add that the norm associated with this inner product is the usual
3 Euclidean length of a vector, i.e.,
c0(a,b) q q
kuk = (u, u) = u20 + u21
2
The least squares method. We now want to determine the u that

minimizes ||e||, that is we want to compute the optimal c0 in (2.3). The
1
algebra is simplified if we minimize the square of the norm, ||e||2 = (e, e),
(a,b) instead of the norm itself. Define the function
0 E(c0 ) = (e, e) = (f − c0 ψ0 , f − c0 ψ0 ) . (2.5)

0 1 2 3 4 5 6
Fig. 2.1 Approximation of a two-dimensional vector in a one-dimensional vector space. We can rewrite the expressions of the right-hand side in a more convenient
form for further work:
We introduce the vector space V spanned by the vector ψ0 = (a, b): E(c0 ) = (f , f ) − 2c0 (f , ψ0 ) + c20 (ψ0 , ψ0 ) . (2.6)
V = span {ψ0 } . (2.2) This rewrite results from using the following fundamental rules for inner
product spaces:
and f ). Then we see mathematically that e is orthogonal to any vector

(αu, v ) = α(u, v ), α ∈ R, (2.7) v in the space V and we may express any v ∈ V as v = sψ0 for any
scalar parameter s (recall that two vectors are orthogonal when their
inner product vanishes). Then we calculate the inner product
(u + v , w) = (u, w) + (v , w), (2.8)
(e, sψ0 ) = (f − c0 ψ0 , sψ0 )
(u, v ) = (v , u) . (2.9) = (f , sψ0 ) − (c0 ψ0 , sψ0 )
Minimizing E(c0 ) implies finding c0 such that = s(f , ψ0 ) − sc0 (ψ0 , ψ0 )
(f , ψ0 )
∂E = s(f , ψ0 ) − s (ψ0 , ψ0 )
= 0. (ψ0 , ψ0 )
∂c0
= s ((f , ψ0 ) − (f , ψ0 ))
It turns out that E has one unique minimum and no maximum point.
Now, when differentiating (2.6) with respect to c0 , note that none of the = 0.
inner product expressions depend on c0 , so we simply get Therefore, instead of minimizing the square of the norm, we could demand
∂E that e is orthogonal to any vector in V , which in our present simple case
= −2(f , ψ0 ) + 2c0 (ψ0 , ψ0 ) . (2.10) amounts to a single vector only. This method is known as projection.
∂c0
(The approach can also be referred to as a Galerkin method as explained
Setting the above expression equal to zero and solving for c0 gives
at the end of Section 2.1.2.)
(f , ψ0 ) Mathematically, the projection method is stated by the equation
c0 = , (2.11)
(ψ0 , ψ0 )
(e, v ) = 0, ∀v ∈ V . (2.14)
which in the present case, with ψ0 = (a, b), results in
An arbitrary v ∈ V can be expressed as sψ0 , s ∈ R, and therefore (2.14)
3a + 5b implies
c0 = 2 . (2.12)
a + b2
For later, it is worth mentioning that setting the key equation (2.10) (e, sψ0 ) = s(e, ψ0 ) = 0,
to zero and ordering the terms lead to which means that the error must be orthogonal to the basis vector in the
space V :
(f − c0 ψ0 , ψ0 ) = 0,
or (e, ψ0 ) = 0 or (f − c0 ψ0 , ψ0 ) = 0,
which is what we found in (2.13) from the least squares computations.
(e, ψ0 ) = 0 . (2.13)
This implication of minimizing E is an important result that we shall
make much use of. 2.1.2 Approximation of general vectors
2
The projection method. We shall now show that minimizing ||e|| Let us generalize the vector approximation from the previous section
implies that e is orthogonal to any vector v in the space V . This result is to vectors in spaces with arbitrary dimension. Given some vector f , we
visually quite clear from Figure 2.1 (think of other vectors along the line want to find the best approximation to this vector in the space
(a, b): all of them will lead to a larger distance between the approximation
V = span {ψ0 , . . . , ψN } . 

 0, if p 6= i and q =
6 i,
We assume that the space has dimension N + 1 and that basis vectors ∂

c , if p = i and q = 6 i,
ψ0 , . . . , ψN are linearly independent so that none of them are redundant. cp cq = q
(2.17)
∂ci 
 cp , if p 6= i and q = i,
Any vector u ∈ V can then be written as a linear combination of the 
 2ci , if p = q = i .
basis vectors, i.e.,
Then
N
X
u= cj ψj ,
N X N N N
j=0 ∂ X X X
cp cq (ψp , ψq ) = cp (ψp , ψi )+ cq (ψi , ψq )+2ci (ψi , ψi ) .
where cj ∈ R are scalar coefficients to be determined. ∂ci p=0 q=0 p=0,p6=i q=0,q6=i
The least squares method. Now we want to find c0 , . . . , cN , such that Since each of the two sums is missing the term ci (ψi , ψi ), we may split
u is the best approximation to f in the sense that the distance (error) the very last term in two, to get exactly that “missing” term for each
e = f − u is minimized. Again, we define the squared distance as a sum. This idea allows us to write
function of the free parameters c0 , . . . , cN ,
N X N N
∂ X X
X X cp cq (ψp , ψq ) = 2 ci (ψj , ψi ) . (2.18)
E(c0 , . . . , cN ) = (e, e) = (f − cj ψj , f − cj ψj ) ∂ci p=0 q=0 j=0
j j
It then follows that setting
N
X N X
X N
= (f , f ) − 2 cj (f , ψj ) + cp cq (ψp , ψq ) . (2.15) ∂E
j=0 p=0 q=0 = 0, i = 0, . . . , N,
∂ci
Minimizing this E with respect to the independent variables c0 , . . . , cN implies
is obtained by requiring
N
X
∂E −2(f , ψi ) + 2 ci (ψj , ψi ) = 0, i = 0, . . . , N .
= 0, i = 0, . . . , N . j=0
∂ci
The first term in (2.15) is independent of ci , so its derivative vanishes. Moving the first term to the right-hand side shows that the equation is
The second term in (2.15) is differentiated as follows: actually a linear system for the unknown parameters c0 , . . . , cN :
N
X
N
∂ X Ai,j cj = bi , i = 0, . . . , N, (2.19)
2 cj (f , ψj ) = 2(f , ψi ), (2.16)
∂ci j=0 j=0
since the expression to be differentiated is a sum and only one term, where
ci (f , ψi ), contains ci (this term is linear in ci ). To understand this
differentiation in detail, write out the sum specifically for, e.g, N = 3 Ai,j = (ψi , ψj ), (2.20)
and i = 1.
bi = (ψi , f ) . (2.21)
The last term in (2.15) is more tedious to differentiate. It can be wise
to write out the double sum for N = 1 and perform differentiation with We have changed the order of the two vectors in the inner product
respect to c0 and c1 to see the structure of the expression. Thereafter, according to (2.9):
one can generalize to an arbitrary N and observe that
sum of squares of differences between the components in f and u. We

Ai,j = (ψj , ψi ) = (ψi , ψj ), find u such that this sum of squares is minimized.
The principle (2.22), or the equivalent form (2.23), is known as pro-
simply because the sequence i-j looks more aesthetic.
jection. Almost the same mathematical idea was used by the Russian
The Galerkin or projection method. In analogy with the “one- mathematician Boris Galerkin to solve differential equations, resulting in
dimensional” example in Section 2.1.1, it holds also here in the general what is widely known as Galerkin’s method.
case that minimizing the distance (error) e is equivalent to demanding
that e is orthogonal to all v ∈ V :
2.2 Approximation principles
(e, v ) = 0, ∀v ∈ V . (2.22)
PN
Since any v ∈ V can be written as v = i=0 ci ψi , the statement (2.22) Let V be a function space spanned by a set of basis functions ψ0 , . . . , ψN ,
is equivalent to saying that
N
V = span {ψ0 , . . . , ψN },
X
(e, ci ψi ) = 0, such that any function u ∈ V can be written as a linear combination of
i=0
the basis functions:
for any choice of coefficients c0 , . . . , cN . The latter equation can be
X
rewritten as u= cj ψj . (2.24)
j∈Is
N
X
ci (e, ψi ) = 0 . That is, we consider functions as vectors in a vector space – a so-called
i=0 function space – and we have a finite set of basis functions that span the
If this is to hold for arbitrary values of c0 , . . . , cN , we must require that space just as basis vectors or unit vectors span a vector space.
each term in the sum vanishes, which means that The index set Is is defined as Is = {0, . . . , N } and is from now on
used both for compact notation and for flexibility in the numbering of
(e, ψi ) = 0, i = 0, . . . , N . (2.23) elements in sequences.
For now, in this introduction, we shall look at functions of a single
These N + 1 equations result in the same linear system as (2.19): variable x: u = u(x), ψj = ψj (x), j ∈ Is . Later, we will almost trivially
N N extend the mathematical details to functions of two- or three-dimensional
X X
(f − cj ψj , ψi ) = (f , ψi ) − (ψi , ψj )cj = 0, physical spaces. The approximation (2.24) is typically used to discretize
j=0 j=0 a problem in space. Other methods, most notably finite differences, are
and hence common for time discretization, although the form (2.24) can be used in
time as well.
N
X
(ψi , ψj )cj = (f , ψi ), i = 0, . . . , N .
j=0
2.2.1 The least squares method
So, instead of differentiating the E(c0 , . . . , cN ) function, we could simply
use (2.22) as the principle for determining c0 , . . . , cN , resulting in the Given a function f (x), how can we determine its best approximation
N + 1 equations (2.23). u(x) ∈ V ? A natural starting point is to apply the same reasoning as we
The names least squares method or least squares approximation are did for vectors in Section 2.1.2. That is, we minimize the distance between
natural since the calculations consists of minimizing ||e||2 , and ||e||2 is a u and f . However, this requires a norm for measuring distances, and a
2.2 Approximation principles 19 20 2 Function approximation by global functions
norm is most conveniently defined through an inner product. Viewing a 2.2.2 The projection (or Galerkin) method
function as a vector of infinitely many point values, one for each value
As in Section 2.1.2, the minimization of (e, e) is equivalent to
of x, the inner product of two arbitrary functions f (x) and g(x) could
intuitively be defined as the usual summation of pairwise “components”
(e, v) = 0, ∀v ∈ V . (2.30)
(values), with summation replaced by integration:
Z This is known as a projection of a function f onto the subspace V . We
(f, g) = f (x)g(x) dx . may also call it a Galerkin method for approximating functions. Using
the same reasoning as in (2.22)-(2.23), it follows that (2.30) is equivalent
To fix the integration domain, we let f (x) and ψi (x) be defined for a to
domain Ω ⊂ R. The inner product of two functions f (x) and g(x) is then
Z (e, ψi ) = 0, i ∈ Is . (2.31)
(f, g) = f (x)g(x) dx . (2.25)
Ω Inserting e = f − u in this equation and ordering terms, as in the multi-
The distance between f and any function u ∈ V is simply f − u, and dimensional vector case, we end up with a linear system with a coefficient
the squared norm of this distance is matrix (2.28) and right-hand side vector (2.29).
Whether we work with vectors in the plane, general vectors, or func-
X X
E = (f (x) − cj ψj (x), f (x) − cj ψj (x)) . (2.26) tions in function spaces, the least squares principle and the projection or
j∈Is j∈Is Galerkin method are equivalent.
Note the analogy with (2.15): the given function f plays the role of the
given vector f , and the basis function ψi plays the role of the basis vector
ψi . We can rewrite (2.26), through similar steps as used for the result 2.2.3 Example on linear approximation
(2.15), leading to
Let us apply the theory in the previous section to a simple problem:
given a parabola f (x) = 10(x − 1)2 − 1 for x ∈ Ω = [1, 2], find the best
X X X
E(ci , . . . , cN ) = (f, f ) − 2 cj (f, ψi ) + cp cq (ψp , ψq ) . (2.27) approximation u(x) in the space of all linear functions:
j∈Is p∈Is q∈Is
V = span {1, x} .
Minimizing this function of N + 1 scalar variables {ci }i∈Is , requires
differentiation with respect to ci , for all i ∈ Is . The resulting equations With our notation, ψ0 (x) = 1, ψ1 (x) = x, and N = 1. We seek
are very similar to those we had in the vector case, and we hence end
up with a linear system of the form (2.19), with basically the same u = c0 ψ0 (x) + c1 ψ1 (x) = c0 + c1 x,
expressions: where c0 and c1 are found by solving a 2 × 2 the linear system. The
coefficient matrix has elements
Ai,j = (ψi , ψj ), (2.28)

bi = (f, ψi ) . (2.29)
The only difference from (2.19) is that the inner product is defined in
terms of integration rather than summation.
Z 2
A0,0 = (ψ0 , ψ0 ) = 1 · 1 dx = 1, (2.32)
1 10
Z 2 approximation
A0,1 = (ψ0 , ψ1 ) = 1 · x dx = 3/2, (2.33) exact
1 8
A1,0 = A0,1 = 3/2, (2.34)
Z 2 6
A1,1 = (ψ1 , ψ1 ) = x · x dx = 7/3 . (2.35)
1 4
The corresponding right-hand side is
2
Z 2
b1 = (f, ψ0 ) = (10(x − 1)2 − 1) · 1 dx = 7/3, (2.36) 0
1
Z 2 2
b2 = (f, ψ1 ) = (10(x − 1)2 − 1) · x dx = 13/3 . (2.37)
1
41.0 1.2 1.4 1.6 1.8 2.0 2.2
Solving the linear system results in x
Fig. 2.2 Best approximation of a parabola by a straight line.
c0 = −38/3, c1 = 10, (2.38)
and consequently
is a list of expressions for {ψi }i∈Is , and Omega is a 2-tuple/list holding
38 the limits of the domain Ω:
u(x) = 10x −
. (2.39)
3
import sympy as sym
Figure 2.2 displays the parabola and its best approximation in the space
of all linear functions. def least_squares(f, psi, Omega):
N = len(psi) - 1
A = sym.zeros((N+1, N+1))
b = sym.zeros((N+1, 1))
2.2.4 Implementation of the least squares method x = sym.Symbol(’x’)
for i in range(N+1):
for j in range(i, N+1):
Symbolic integration. The linear system can be computed either sym- A[i,j] = sym.integrate(psi[i]*psi[j],
bolically or numerically (a numerical integration rule is needed in the (x, Omega[0], Omega[1]))
latter case). Let us first compute the system and its solution symbolically, A[j,i] = A[i,j]
b[i,0] = sym.integrate(psi[i]*f, (x, Omega[0], Omega[1]))
i.e., using classical “pen and paper” mathematics with symbols. The c = A.LUsolve(b)
Python package sympy can greatly help with this type of mathematics, # Note: c is a sympy Matrix object, solution is in c[:,0]
and will therefore be frequently used in this text. Some basic familiarity u = 0
for i in range(len(psi)):
with sympy is assumed, typically symbols, integrate, diff, expand, u += c[i,0]*psi[i]
and simplify. Much can be learned by studying the many applications return u, c
of sympy that will be presented. Observe that we exploit the symmetry of the coefficient matrix: only
Below is a function for symbolic computation of the linear system, the upper triangular part is computed. Symbolic integration, also in
where f (x) is given as a sympy expression f involving the symbol x, psi
sympy, is often time consuming, and (roughly) halving the work has # fall back on numerical integration
noticeable effect on the waiting time for the computations to finish. integrand = sym.lambdify([x], integrand)
I = sym.mpmath.quad(integrand, [Omega[0], Omega[1]])
A[i,j] = A[j,i] = I
Notice integrand = psi[i]*f
We remark that the symbols in sympy are created and stored in if symbolic:
I = sym.integrate(integrand, (x, Omega[0], Omega[1]))
a symbol factory that is indexed by the expression used in the if not symbolic or isinstance(I, sym.Integral):
construction and that repeated constructions from the same expres- integrand = sym.lambdify([x], integrand)
sion will not create new objects. The following code illustrates the I = sym.mpmath.quad(integrand, [Omega[0], Omega[1]])
b[i,0] = I
behavior of the symbol factory: c = A.LUsolve(b) # symbolic solve
# c is a sympy Matrix object, numbers are in c[i,0]
>>> from sympy import * c = [sym.simplify(c[i,0]) for i in range(c.shape[0])]
>>> x0 = Symbol("x") u = sum(c[i]*psi[i] for i in range(len(psi)))
>>> x1 = Symbol("x") return u, c
>>> id(x0) ==id(x1)
True The function is found in the file approx1D.py.
>>> a0 = 3.0
>>> a1 = 3.0 Plotting the approximation. Comparing the given f (x) and the ap-
>>> id(a0) ==id(a1)
False proximate u(x) visually is done by the following function, which utilizes
sympy’s lambdify tool to convert a sympy expression to a Python func-
tion for numerical computations:
Fall back on numerical R
integration. Obviously,
R
sympy may fail to def comparison_plot(f, u, Omega, filename=’tmp.pdf’):
x = sym.Symbol(’x’)
successfully integrate Ω ψi ψj dx, and especially Ω f ψi dx, symbolically. f = sym.lambdify([x], f, modules="numpy")
Therefore, we should extend the least_squares function such that it falls u = sym.lambdify([x], u, modules="numpy")
back on numerical integration if the symbolic integration is unsuccessful. resolution = 401 # no of points in plot
xcoor = linspace(Omega[0], Omega[1], resolution)
In the latter case, the returned value from sympy’s integrate function exact = f(xcoor)
is an object of type Integral. We can test on this type and utilize approx = u(xcoor)
the mpmath module in sympy to perform numerical integration of high plot(xcoor, approx)
hold(’on’)
precision. Even when sympy manages to integrate symbolically, it can take plot(xcoor, exact)
an undesirable long time. We therefore include an argument symbolic legend([’approximation’, ’exact’])
that governs whether or not to try symbolic integration. Here is a complete savefig(filename)
and improved version of the previous function least_squares: The modules=’numpy’ argument to lambdify is important if there are
def least_squares(f, psi, Omega, symbolic=True): mathematical functions, such as sin or exp in the symbolic expressions
N = len(psi) - 1 in f or u, and these mathematical functions are to be used with vector
A = sym.zeros((N+1, N+1)) arguments, like xcoor above.
x = sym.Symbol(’x’) Both the least_squares and comparison_plot functions are found
for i in range(N+1): in the file approx1D.py. The comparison_plot function in this file is
for j in range(i, N+1): more advanced and flexible than the simplistic version shown above. The
integrand = psi[i]*psi[j]
if symbolic: file ex_approx1D.py applies the approx1D module to accomplish the
I = sym.integrate(integrand, (x, Omega[0], Omega[1])) forthcoming examples.
if not symbolic or isinstance(I, sym.Integral):
# Could not integrate symbolically,
P
Notice The linear system j Ai,j cj = bi , i ∈ Is , is then
X X
We remind the reader that the code examples can be found in cj Ai,j = dj Ai,j , i ∈ Is ,
a tarball at http://hplgit.github.io/fem-book/doc/web/. The j∈Is j∈Is
following command shows a useful way to search for code
which implies that ci = di for i ∈ Is .
Terminal> find . -name ’*.py’ -exec grep least_squares {} \; -print
Here ’.’ specifies the directory for the search, -name ’*.py’ that
files with suffix *.py should be searched through while 2.2.6 The regression method
-exec grep least_squares {} \; -print So far, the function to be approximated has been known in terms of
means that all lines containing the text least_squares should be a formula f (x). Very often in applications, no formula is known, but
printed to the screen. the function value is known at a set of points. If we use N + 1 basis
functions and know exactly N + 1 function values, we can determine
the coefficients ci by interpolation as explained in Section 2.4.1. The
approximating function will then equal the f values at the points where
the f values are sampled.
2.2.5 Perfect approximation However, one normally has f sampled at a lot of points, here denoted by
x0 , x1 , . . . , xm , and we assume m N . What can we do then to determine
Let us use the code above to recompute the problem from Section 2.2.3
the coefficients? The answer is to find a least squares approximation. The
where we want to approximate a parabola. What happens if we add an
resulting method is called regression and is well known from statistics
element x2 to the basis and test what the best approximation is if V
when fitting a simple (usually polynomial) function to a set of data
is the space of all parabolic functions? The answer is quickly found by
points.
running
Overdetermined equation system. Intuitively, we would demand u to
>>> from approx1D import *
>>> x = sym.Symbol(’x’) equal f at all the data points xi , i0, 1, . . . , m,
>>> f = 10*(x-1)**2-1 X
>>> u, c = least_squares(f=f, psi=[1, x, x**2], Omega=[1, 2]) u(xi ) = cj ψj (xi ) = f (xi ), i = 0, 1, . . . , m . (2.40)
>>> print u
j∈Is
10*x**2 - 20*x + 9
>>> print sym.expand(f) The fundamental problem here is that we have more equations than
10*x**2 - 20*x + 9
unknowns since there are N + 1 unknowns and m + 1 > N + 1 equations.
Now, what if we use ψi (x) = xi for i = 0, 1, . . . , N = 40? The output Such a system of equations is called an overdetermined system. We can
from least_squares gives ci = 0 for i > 2, which means that the method write it in matrix form as
finds the perfect approximation. X
In fact, we have a general result that if f ∈ V , the least squares and Ai,j cj = bi , i = 0, 1, . . . , m, (2.41)
projection/Galerkin methods compute the exact solution u = f . The j∈Is
proof is straightforward: if f ∈ V , f can be expanded in terms of the with coefficient matrix and right-hand side vector given by
P
basis functions, f = j∈Is dj ψj , for some coefficients {dj }j∈Is , and the
right-hand side then has entries
X X Ai,j = ψj (xi ), (2.42)
bi = (f, ψi ) = dj (ψj , ψi ) = dj Ai,j . bi = f (xi ) . (2.43)
j∈Is j∈Is
Note that the matrix is a rectangular (m + 1) × (N + 1) matrix since The equation system (2.45) or (2.46) are known as the normal equations.
i = 0, . . . , m and j = 0, . . . , N . With A as an (m+1)×(N +1) matrix, AT A becomes an (N +1)×(N +1)
The normal equations derived from a least squares principle. The matrix, and AT b becomes a vector of length N + 1. Often, m N , so
least squares method is a common technique for solving overdetermined AT A is much smaller than A.
P
equations systems. Let us write the overdetermined system j∈Is Ai,j cj = Many prefer to write the linear system (2.45) on the standard form
P
bi more compactly in matrix form as Ac = b. Since we have more equations j Bi,j cj = di , i = 0, . . . , NP
. We can easily do so by exchanging
P T
than unknowns, it is (in general) impossible to find a vector c that fulfills the i and k index (i ↔ k), i ATk,i Ai,j = k Ai,k Ak,j , and setting
P T
Ac = b. The best we can do is to make the residual r = b − Ac as small as Bi,j = k Ai,k Ak,j . Similarly, we exchange i and k in the right-hand side
P
possible. That is, we can find c such that it minimizes the norm Euclidean expression and get k ATi,k bk = di . Expressing Bi,j and di in terms of
norm of r: ||r||. The algebra simplifies significantly by minimizing ||r||2 the ψi and xi , using (2.42) and (2.43), we end up with the formulas
instead. This principle corresponds to a least squares method.
P P
The i-th component of r reads ri = bi − j Ai,j cj , so ||r||2 = i ri2 . X X m
X
2
Minimizing ||r|| with respect to the unknowns c0 , . . . , cN implies that Bi,j = ATi,k Ak,j = Ak,i Ak,j = ψi (xk )ψj (xk ), (2.47)
k k k=0
∂ X X m
X
||r||2 = 0, k = 0, . . . , N, (2.44) di = ATi,k bk = Ak,i bk = ψi (xk )f (xk ) (2.48)
∂ck
k k k=0
which leads to
Implementation. The following function defines the matrix entries Bi,j
according to (2.47) and the right-hand side entries di according (2.48).
∂ X 2 X ∂ri X ∂ X X P
ri = 2ri = 2ri (bi − Ai,j cj ) = 2 ri (−Ai,k ) = 0 . Thereafter, it solves the linear system j Bi,j cj = di . The input data
∂ck i i
∂ck i
∂ck j i f and psi hold f (x) and ψi , i = 0, . . . , N , as symbolic expression, but
P since m is thought to be much larger than N , and there are loops from 0
By inserting ri = bi − Ai,j cj in the last expression we get
j to m, we use numerical computing to speed up the computations.
  def regression(f, psi, points):
X X X X X N = len(psi) - 1
bi − Ai,j cj  (−Ai,k ) = − bi Ai,k + ( Ai,j Ai,k )cj = 0 . m = len(points)
i j i j i # Use numpy arrays and numerical computing
B = np.zeros((N+1, N+1))
d = np.zeros(N+1)
Introducing the transpose of A, AT , we know that ATi,j = Aj,i . Therefore, # Wrap psi and f in Python functions rather than expressions
P P
the expression i Ai,j Ai,k can be written as i ATk,i Ai,j and be recognized # so that we can evaluate psi at points[i]
P
as the formula for the matrix-matrix product AT A. Also, i bi Ai,k can x = sym.Symbol(’x’)
P T psi_sym = psi # save symbolic expression
be written i Ak,i bi and recognized as the matrix-vector product AT b. psi = [sym.lambdify([x], psi[i]) for i in range(N+1)]
These observations imply that (2.44) is equivalent to the linear system f = sym.lambdify([x], f)
for j in range(N+1):
X X X X B[i,j] = 0
( ATk,i Ai,j )cj = (AT A)k,j cj = ATk,i bi = (AT b)k , k = 0, . . . , N, for k in range(m+1):
j i j i B[i,j] += psi[i](points[k])*psi[j](points[k])
(2.45) d[i] = 0
for k in range(m+1):
or in matrix form, d[i] += psi[i](points[k])*f(points[k])
c = np.linalg.solve(B, d)
AT Ac = AT b . (2.46) u = sum(c[i]*psi_sym[i] for i in range(N+1))
10 10 10
return u, c
approximation approximation approximation
8 exact 8 exact 8 exact
2 interpolation points 8 interpolation points 32 interpolation points
6 6 6
Example. We repeat the computational example from Section 2.4.1, but 4 4 4
this time with many more points. The parabola f (x) = 10(x − 1)2 − 1 is
2 2 2
0 0 0
to be approximated by a linear function on Ω = [1, 2]. We divide Ω into 2 2 2
m + 2 intervals and use the inner m + 1 points: 4

1.0 1.2 1.4 1.6
x
1.8 2.0 2.2
4
1.0 1.2 1.4 1.6
x
1.8 2.0 2.2
4
1.0 1.2 1.4 1.6
x
1.8 2.0 2.2
Fig. 2.3 Approximation of a parabola by a regression method with varying number of

import sympy as sym points.
f = 10*(x-1)**2 - 1
10 10 10
approximation approximation approximation
exact 8 exact exact
psi = [1, x]
8 8
4 data points 8 data points 32 data points
6
6 6
Omega = [1, 2] 4
4
4
m_values = [2-1, 8-1, 64-1] 2
2
2
# Create m+3 points and use the inner m+1 points

0
0 0
2
for m in m_values: 2 4 2
points = np.linspace(Omega[0], Omega[1], m+3)[1:-1] 4

1.0 1.2 1.4 1.6
x
1.8 2.0 2.2
6
1.0 1.2 1.4 1.6
x
1.8 2.0 2.2
4
1.0 1.2 1.4 1.6
x
1.8 2.0 2.2
u, c = regression(f, psi, points)

Fig. 2.4 Approximation of a parabola with noise by a straight line.
comparison_plot(
f, u, Omega,
filename=’parabola_by_regression_%d’ % (m+1),
points=points, We can fit a parabola instead of a straight line, as done in Figure 2.5.
points_legend=’%d interpolation points’ % (m+1),
legend_loc=’upper left’)
When m becomes large, the fitted parabola and the original parabola
without noise become very close.
Figure 2.3 shows results for m + 1 = 2 (left), m + 1 = 8 (middle), and
m + 1 = 64 (right) data points. The approximating function is not so 14
approximation
12
approximation
10
approximation
sensitive to the number of points as long as they cover a significant part 12
10
exact
4 data points
10 exact
8 data points 8
exact
32 data points
of the domain (the first 2 point approximation puts too much weight
8
6
8
6
on the center, while the 8 point approximation cover almost the entire
6 4
4
4
2
domain and produces a good approximation which is barely improved

2
2
0 0
0
with 64 points): 2
1.0 1.2 1.4 1.6
x
1.8 2.0 2.2
2
1.0 1.2 1.4 1.6
x
1.8 2.0 2.2
2
1.0 1.2 1.4 1.6
x
1.8 2.0 2.2
Fig. 2.5 Approximation of a parabola with noise by a parabola.
u(x) = 10x − 13.2, 2 points

The regression method is not much used for approximating differential
u(x) = 10x − 12.7, 8 points equations or a given function, but is central in uncertainty quantification
u(x) = 10x − 12.7, 64 points methods such as polynomial chaos expansions.
To explicitly make the link to classical regression in statistics, we
consider f = 10(x − 1)2 − 1 + , where is a random, normally distributed The residual: an indirect but computationally cheap mea-
variable. The goal in classical regression is to find the straight line that sure of the error
best fits the data points (in a least squares sense). The only difference When attempting to solve a system Ac = b, we may question how
from the previous setup, is that the f (xi ) values are based on a function far off a vector c0 is. The error is clearly the difference between c
formula, here 10(x − 1)2 − 1, plus normally distributed noise. Figure 2.4 and c0 , e = c − c0 . The vector c0 is obviously the solution of the
shows three sets of data points, along with the original f (x) function problem Ac0 = b0 , where b0 is easily computable as the matrix
without noise, and the straight line that is a least squares approximation
to the data points.
2.3 Orthogonal basis functions 31 32 2 Function approximation by global functions
vector product Ac0 . The residual can be seen as the error of the u(x) = c0 + c1 x + c2 x2 + · · · + c10 x10 . Analytically, we know that cj = 0
input data, b − b0 and is defined for j > 2, but numerically we may get cj 6= 0 for j > 2.
r = b − Ac0 . exact sympy numpy32 numpy64

9 9.62 5.57 8.98
Clearly, the error and the residual are related by -20 -23.39 -7.65 -19.93
10 17.74 -4.50 9.96
0 -9.19 4.13 -0.26
Ae = r. 0 5.25 2.99 0.72
0 0.18 -1.21 -0.93
While the computation of the error requires inversion of A, which 0 -2.48 -0.41 0.73
may be computationally expensive, the residual is easily computable. 0 1.81 -0.013 -0.36
0 -0.66 0.08 0.11
hpl 1: Here something is missing? What is actually the point? 0 0.12 0.04 -0.02
Try to insert a heading in the notice... 0 -0.001 -0.02 0.002
The exact value of cj , j = 0, 1, . . . , 10, appears in the first column while

the other columns correspond to results obtained by three different
methods:
2.3 Orthogonal basis functions
• Column 2: The matrix and vector are converted to the data struc-
ture sympy.mpmath.fp.matrix and the sympy.mpmath.fp.lu_solve
Approximation of a function via orthogonal functions, especially si-
function is used to solve the system.
nusoidal functions or orthogonal polynomials, is a very popular and
• Column 3: The matrix and vector are converted to numpy arrays with
successful approach. The finite element method does not make use of
data type numpy.float32 (single precision floating-point number)
orthogonal functions, but functions that are “almost orthogonal”.
and solved by the numpy.linalg.solve function.
• Column 4: As column 3, but the data type is numpy.float64 (double
precision floating-point number).
2.3.1 Ill-conditioning
We see from the numbers in the table that double precision performs
For basis functions that are not orthogonal the condition number of much better than single precision. Nevertheless, when plotting all these
the matrix may create problems during the solution process due to for solutions the curves cannot be visually distinguished (!). This means that
example round-off errors as will be illustrated in the following. The com- the approximations look perfect, despite the partially very wrong values
putational example in Section 2.2.5 applies the least_squares function of the coefficients.
which invokes symbolic methods to calculate and solve the linear system. Increasing N to 12 makes the numerical solver in numpy abort with
The correct solution c0 = 9, c1 = −20, c2 = 10, ci = 0 for i ≥ 3 is perfectly the message: "matrix is numerically singular". A matrix has to be non-
recovered. singular to be invertible, which is a requirement when solving a linear
Suppose we convert the matrix and right-hand side to floating-point system. Already when the matrix is close to singular, it is ill-conditioned,
arrays and then solve the system using finite-precision arithmetics, which which here implies that the numerical solution algorithms are sensitive
is what one will (almost) always do in real life. This time we get aston- to round-off errors and may produce (very) inaccurate results.
ishing results! Up to about N = 7 we get a solution that is reasonably The reason why the coefficient matrix is nearly singular and ill-
close to the exact one. Increasing N shows that seriously wrong coeffi- conditioned is that our basis functions ψi (x) = xi are nearly linearly
cients are computed. Below is a table showing the solution of the linear dependent for large i. That is, xi and xi+1 are very close for i not very
system arising from approximating a parabola by functions on the form small. This phenomenon is illustrated in Figure 2.6. There are 15 lines
in this figure, but only half of them are visually distinguishable. Almost functions. The functions used in the finite element methods are almost
linearly dependent basis functions give rise to an ill-conditioned and orthogonal, and this property helps to avoid problems with solving matrix
almost singular matrix. This fact can be illustrated by computing the systems. Almost orthogonal is helpful, but not enough when it comes
determinant, which is indeed very close to zero (recall that a zero deter- to partial differential equations, and ill-conditioning of the coefficient
minant implies a singular and non-invertible matrix): 10−65 for N = 10 matrix is a theme when solving large-scale matrix systems arising from
and 10−92 for N = 12. Already for N = 28 the numerical determinant finite element discretizations.
computation returns a plain zero.
2.3.2 Fourier series

18000
A set of sine functions is widely used for approximating functions (note
16000 that the sines are orthogonal with respect to the L2 inner product as
can be easily verified using sympy). Let us take
14000
12000 V = span {sin πx, sin 2πx, . . . , sin(N + 1)πx} .
10000 That is,
8000
ψi (x) = sin((i + 1)πx), i ∈ Is .
6000
An approximation to the parabola f (x) = 10(x−1)2 −1 for x ∈ Ω = [1, 2]
4000 from Section 2.2.3 can then be computed by the least_squares function
from Section 2.2.4:
2000
N = 3
01.0 1.2 1.4 1.6 1.8 2.0 2.2 import sympy as sym
psi = [sym.sin(sym.pi*(i+1)*x) for i in range(N+1)]
Fig. 2.6 The 15 first basis functions xi , i = 0, . . . , 14. f = 10*(x-1)**2 - 1
Omega = [0, 1]
u, c = least_squares(f, psi, Omega)
On the other hand, the double precision numpy solver does run for comparison_plot(f, u, Omega)
N = 100, resulting in answers that are not significantly worse than those P
Figure 2.7 (left) shows the oscillatory approximation of N j=0 cj sin((j +
in the table above, and large powers are associated with small coefficients
1)πx) when N = 3. Changing N to 11 improves the approximation
(e.g., cj < 10−2 for 10 ≤ j ≤ 20 and c < 10−5 for j > 20). Even for
considerably, see Figure 2.7 (right).
N = 100 the approximation still lies on top of the exact curve in a plot
There is an error f (0) − u(0) = 9 at x = 0 in Figure 2.7 regardless of
(!).
how large N is, because all ψi (0) = 0 and hence u(0) = 0. We may help
The conclusion is that visual inspection of the quality of the approx-
the approximation to be correct at x = 0 by seeking
imation may not uncover fundamental numerical problems with the
computations. However, numerical analysts have studied approximations X
u(x) = f (0) + cj ψj (x) . (2.49)
and ill-conditioning for decades, and it is well known that the basis
j∈Is
{1, x, x2 , x3 , . . . , } is a bad basis. The best basis from a matrix condition-
ing point of view is to have orthogonal functions such that (ψi , ψj ) = 0 for However, this adjustment introduces a new problem at x = 1 since we
i=6 j. There are many known sets of orthogonal polynomials and other now get an error f (1) − u(1) = f (1) − 0 = −1 at this point. A more
10 10 10 10
approximation approximation approximation approximation
exact exact exact exact
8 8 8 8
6 6 6 6
4 4 4 4
2 2 2 2
0 0 0 0
20.0 0.2 0.4 0.6 0.8 1.0 20.0 0.2 0.4 0.6 0.8 1.0 20.0 0.2 0.4 0.6 0.8 1.0 20.0 0.2 0.4 0.6 0.8 1.0
x x x x
Fig. 2.7 Best approximation of a parabola by a sum of 3 (left) and 11 (right) sine Fig. 2.8 Best approximation of a parabola by a sum of 3 (left) and 11 (right) sine
functions. functions with a boundary term.
clever adjustment is to replace the f (0) term by a term that is f (0) at 2.3.3 Orthogonal basis functions
x = 0 and f (1) at x = 1. A simple linear combination f (0)(1 − x) + xf (1)
does the job: The choice of sine functions ψi (x) = sin((i + 1)πx) has a great compu-
X tational advantage: on Ω = [0, 1] these basis functions are orthogonal,
u(x) = f (0)(1 − x) + xf (1) + cj ψj (x) . (2.50) implying that Ai,j = 0 if i 6= j. This result is realized by trying
j∈Is
integrate(sin(j*pi*x)*sin(k*pi*x), x, 0, 1)
This adjustment of u alters the linear system slightly. In the general case,
in WolframAlpha √ (avoid i in the integrand as this symbol means the
we set R
imaginary unit −1). Asking WolframAlpha also about 01 sin2 (jπx) dx,
X we find that it equals 1/2. With a diagonal matrix we can easily solve
u(x) = B(x) + cj ψj (x),
for the coefficients by hand:
j∈Is
Z
and the linear system becomes 1
ci = 2 f (x) sin((i + 1)πx) dx, i ∈ Is , (2.51)
X 0
(ψi , ψj )cj = (f − B, ψi ), i ∈ Is . which is nothing but the classical formula for the coefficients of the
j∈Is
Fourier sine series of f (x) on [0, 1]. In fact, when V contains the basic
The calculations can still utilize the least_squares or least_squares_orth functions used in a Fourier series expansion, the approximation method
functions, but solve for u − b: derived in Section 2.2 results in the classical Fourier series for f (x) (see
f0 = 0; f1 = -1
Exercise 2.8 for details).
B = f0*(1-x) + x*f1 With orthogonal basis functions we can make the least_squares
u_sum, c = least_squares_orth(f-b, psi, Omega) function (much) more efficient since we know that the matrix is diagonal
u = B + u_sum
and only the diagonal elements need to be computed:
Figure 2.8 shows the result of the technique for ensuring right boundary
def least_squares_orth(f, psi, Omega):
values. Even 3 sines can now adjust the f (0)(1 − x) + xf (1) term such N = len(psi) - 1
that u approximates the parabola really well, at least visually. A = [0]*(N+1)
b = [0]*(N+1)
A[i] = sym.integrate(psi[i]**2, (x, Omega[0], Omega[1]))
b[i] = sym.integrate(psi[i]*f, (x, Omega[0], Omega[1]))
c = [b[i]/A[i] for i in range(len(b))]
u = 0 Below is such a method. It requires Python functions f(x) and psi(x,i)

for i in range(len(psi)): for f (x) and ψi (x) as input. The output is a mesh function with values
u += c[i]*psi[i]
return u, c u on the mesh with points in the array x. Three numerical integration
methods are offered: scipy.integrate.quad (precision set to 10−8 ),
As mentioned in Section 2.2.4, symbolic integration may fail or take a sympy.mpmath.quad (about machine precision), and a Trapezoidal rule
very long time. It is therefore natural to extend the implementation above based on the points in x (unknown accuracy, but increasing with the
with a version where we can choose between symbolic and numerical number of mesh points in x).
integration and fall back on the latter if the former fails:
def least_squares_numerical(f, psi, N, x,
def least_squares_orth(f, psi, Omega, symbolic=True): integration_method=’scipy’,
N = len(psi) - 1 orthogonal_basis=False):
A = [0]*(N+1) # plain list to hold symbolic expressions import scipy.integrate
b = [0]*(N+1) A = np.zeros((N+1, N+1))
x = sym.Symbol(’x’) b = np.zeros(N+1)
for i in range(N+1): Omega = [x[0], x[-1]]
# Diagonal matrix term dx = x[1] - x[0]
A[i] = sym.integrate(psi[i]**2, (x, Omega[0], Omega[1]))
# Right-hand side term j_limit = i+1 if orthogonal_basis else N+1
integrand = psi[i]*f for j in range(i, j_limit):
if symbolic: print ’(%d,%d)’ % (i, j)
I = sym.integrate(integrand, (x, Omega[0], Omega[1])) if integration_method == ’scipy’:
if not symbolic or isinstance(I, sym.Integral): A_ij = scipy.integrate.quad(
print ’numerical integration of’, integrand lambda x: psi(x,i)*psi(x,j),
integrand = sym.lambdify([x], integrand) Omega[0], Omega[1], epsabs=1E-9, epsrel=1E-9)[0]
I = sym.mpmath.quad(integrand, [Omega[0], Omega[1]]) elif integration_method == ’sympy’:
b[i] = I A_ij = sym.mpmath.quad(
c = [b[i]/A[i] for i in range(len(b))] lambda x: psi(x,i)*psi(x,j),
u = 0 [Omega[0], Omega[1]])
u = sum(c[i,0]*psi[i] for i in range(len(psi))) else:
return u, c values = psi(x,i)*psi(x,j)
A_ij = trapezoidal(values, dx)
This functionR
is found in the file approx1D.py. Observe that we here A[i,j] = A[j,i] = A_ij
assume that Ω ϕ2i dx can always be symbolically computed, which is not
if integration_method == ’scipy’:
an unreasonable assumption when the basis functions are orthogonal, b_i = scipy.integrate.quad(
but there is no guarantee, so an improved version of the function above lambda x: f(x)*psi(x,i), Omega[0], Omega[1],
would implement numerical integration also for the A[i,i] term. epsabs=1E-9, epsrel=1E-9)[0]
elif integration_method == ’sympy’:
b_i = sym.mpmath.quad(
lambda x: f(x)*psi(x,i), [Omega[0], Omega[1]])
else:
2.3.4 Numerical computations values = f(x)*psi(x,i)
b_i = trapezoidal(values, dx)
Sometimes the basis functions ψi and/or the function f have a nature that b[i] = b_i
makes symbolic integration CPU-time consuming or impossible. R
Even
c = b/np.diag(A) if orthogonal_basis else np.linalg.solve(A, b)
though we implemented a fallback on numerical integration of f ϕi dx, u = sum(c[i]*psi(x, i) for i in range(N+1))
considerable time might still be required by sympy just by attempting to return u, c
integrate symbolically. Therefore, it will be handy to have function for
def trapezoidal(values, dx):
fast numerical integration and numerical solution of the linear system. """Integrate values by the Trapezoidal rule (mesh size dx)."""
2.4 Interpolation 39 40 2 Function approximation by global functions
return dx*(np.sum(values) - 0.5*values[0] - 0.5*values[-1]) continuous function u that goes through the f (xi ) points. In this case
the xi points are called interpolation points. When the same approach is
Here is an example on calling the function:
used to approximate differential equations, one usually applies the name
from numpy import linspace, tanh, pi collocation method and xi are known as collocation points.
def psi(x, i):
Given f as a sympy symbolic expression f, {ψi }i∈Is as a list psi, and
return sin((i+1)*x) a set of points {xi }i∈Is as a list or array points, the following Python
function sets up and solves the matrix system for the coefficients {ci }i∈Is :
x = linspace(0, 2*pi, 501)
N = 20 def interpolation(f, psi, points):
u, c = least_squares_numerical(lambda x: tanh(x-pi), psi, N, x, N = len(psi) - 1
orthogonal_basis=True) A = sym.zeros((N+1, N+1))
Remark. The scipy.integrate.quad integrator is usually much faster psi_sym = psi # save symbolic expression
# Turn psi and f into Python functions
than sympy.mpmath.quad. x = sym.Symbol(’x’)
psi = [sym.lambdify([x], psi[i]) for i in range(N+1)]
f = sym.lambdify([x], f)
2.4 Interpolation for j in range(N+1):
A[i,j] = psi[j](points[i])
b[i,0] = f(points[i])
2.4.1 The interpolation (or collocation) principle c = A.LUsolve(b)
# c is a sympy Matrix object, turn to list
The principle of minimizing the distance between u and f is an intuitive c = [sym.simplify(c[i,0]) for i in range(c.shape[0])]
u = sym.simplify(sum(c[i,0]*psi_sym[i] for i in range(N+1)))
way of computing a best approximation u ∈ V to f . However, there are return u, c
other approaches as well. One is to demand that u(xi ) = f (xi ) at some
selected points xi , i ∈ Is : The interpolation function is a part of the approx1D module.
We found it convenient in the above function to turn the expressions
X
u(xi ) = cj ψj (xi ) = f (xi ), i ∈ Is . (2.52) f and psi into ordinary Python functions of x, which can be called
j∈Is with float values in the list points when building the matrix and the
P right-hand side. The alternative is to use the subs method to substitute
We recognize that the equation j cj ψj (xi ) = f (xi ) is actually a linear
the x variable in an expression by an element from the points list. The
system with N + 1 unknown coefficients {cj }j∈Is :
following session illustrates both approaches in a simple setting:
X
Ai,j cj = bi , i ∈ Is , (2.53) >>> import sympy as sym
j∈Is >>> x = sym.Symbol(’x’)
>>> e = x**2 # symbolic expression involving x
with coefficient matrix and right-hand side vector given by >>> p = 0.5 # a value of x
>>> v = e.subs(x, p) # evaluate e for x=p
>>> v
0.250000000000000
Ai,j = ψj (xi ), (2.54) >>> type(v)
bi = f (xi ) . (2.55) sympy.core.numbers.Float
>>> e = lambdify([x], e) # make Python function of e
>>> type(e)
This time the coefficient matrix is not symmetric because ψj (xi ) 6= >>> function
ψi (xj ) in general. The method is often referred to as an interpolation >>> v = e(p) # evaluate e(x) for x=p
method since some point values of f are given (f (xi )) and we fit a >>> v
0.25 2.4.2 Lagrange polynomials

>>> type(v)
float In Section 2.3.2 we explained the advantage of having a diagonal matrix:
A nice feature of the interpolation or collocation method is that it formulas for the coefficients {ci }i∈Is can then be derived by hand. For
avoids computing integrals. However, one has to decide on the location an interpolation (or collocation) method a diagonal matrix implies that
of the xi points. A simple, yet common choice, is to distribute them ψj (xi ) = 0 if i 6= j. One set of basis functions ψi (x) with this property
uniformly throughout the unit interval. is the Lagrange interpolating polynomials, or just Lagrange polynomials.
(Although the functions are named after Lagrange, they were first dis-
Example. Let us illustrate the interpolation method by approximating covered by Waring in 1779, rediscovered by Euler in 1783, and published
our parabola f (x) = 10(x − 1)2 − 1 by a linear function on Ω = [1, 2], by Lagrange in 1795.) Lagrange polynomials are key building blocks in
using two collocation points x0 = 1 + 1/3 and x1 = 1 + 2/3: the finite element method, so familiarity with these polynomials will be
import sympy as sym required anyway.
x = sym.Symbol(’x’) A Lagrange polynomial can be written as
f = 10*(x-1)**2 - 1
psi = [1, x]
Omega = [1, 2] N
Y x − xj x − x0 x − xi−1 x − xi+1 x − xN
points = [1 + sym.Rational(1,3), 1 + sym.Rational(2,3)] ψi (x) = = ··· ··· ,
u, c = interpolation(f, psi, points) x − xj xi − x0 xi − xi−1 xi − xi+1 xi − xN
comparison_plot(f, u, Omega) j=0,j6=i i
(2.56)
The resulting linear system becomes for i ∈ Is . We see from (2.56) that all the ψi functions are polynomials
! ! ! of degree N which have the property
1 4/3 c0 1/9
= (
1 5/3 c1 31/9 1, i = s,
ψi (xs ) = δis , δis = (2.57)
with solution c0 = −119/9 and c1 = 10. Figure 2.9 (left) shows the 0, i 6= s,
resulting approximation u = −119/9 + 10x. We can easily test other when xs is an interpolation (collocation) point. Here we have used the
interpolation points, say x0 = 1 and x1 = 2. This changes the line quite Kronecker delta symbol δis . This property implies that A is a diagonal
significantly, see Figure 2.9 (right). matrix, that is that Ai,j = 0 for i 6= j and Ai,j = 1 when i = j. The
solution of the linear system is then simply
10 10
approximation approximation
ci = f (xi ), (2.58)
exact exact
8 8 i ∈ Is ,
6
6
and
4
4
X X
u(x) = ci ψi (x) = f (xi )ψi (x) . (2.59)
2
2
0
j∈Is j∈Is
2 0
41.0 21.0
We remark however that (2.57) does not necessary imply that the matrix
obtained by the least squares or project methods are diagonal.
1.2 1.4 1.6 1.8 2.0 2.2 1.2 1.4 1.6 1.8 2.0 2.2
x x
Fig. 2.9 Approximation of a parabola by linear functions computed by two interpolation
points: 4/3 and 5/3 (left) versus 1 and 2 (right). The following function computes the Lagrange interpolating poly-
nomial ψi (x) on the unit interval (0,1), given the interpolation points
x0 , . . . , xN in the list or array points:
def Lagrange_polynomial(x, i, points):
p = 1 Approximation of a polynomial. The Galerkin or least squares methods

for k in range(len(points)): lead to an exact approximation if f lies in the space spanned by the basis
if k != i:
p *= (x - points[k])/(points[i] - points[k]) functions. It could be of interest to see how the interpolation method
return p with Lagrange polynomials as basis is able to approximate a polynomial,
The next function computes a complete basis, ψ0 , . . . , ψN , using equidis- e.g., a parabola. Running
tant points throughout Ω: for N in 2, 4, 5, 6, 8, 10, 12:
f = x**2
def Lagrange_polynomials_01(x, N): psi, points = Lagrange_polynomials_01(x, N)
if isinstance(x, sym.Symbol): u = interpolation(f, psi, points)
h = sym.Rational(1, N-1)
else: shows the result that up to N=4 we achieve an exact approximation, and
h = 1.0/(N-1) then round-off errors start to grow, such that N=15 leads to a 15-degree
points = [i*h for i in range(N)]
psi = [Lagrange_polynomial(x, i, points) for i in range(N)] polynomial for u where the coefficients in front of xr for r > 2 are
return psi, points of size 10−5 and smaller. As the matrix is ill-conditioned and we use
floating-point arithmetic, we do not obtain the exact solution. Still, we
When x is an sym.Symbol object, we let the spacing between the inter-
get a solution that is visually identical to the exact solution. The reason
polation points, h, be a sympy rational number, so that we get nice end
is that the ill-conditioning causes the system to have many solutions
results in the formulas for ψi . The other case, when x is a plain Python
very close to the true solution. While we are lucky for N=15 and obtain
float, signifies numerical computing, and then we let h be a floating-
a solution that is visually identical to the true solution, ill-conditioning
point number. Observe that the Lagrange_polynomial function works
may also result in completely wrong results. As we continue with higher
equally well in the symbolic and numerical case - just think of x being
values, N=20 reveals that the procedure is starting to fall apart as the
an sym.Symbol object or a Python float. A little interactive session
approximate solution is around 0.9 at x = 1.0, where it should have been
illustrates the difference between symbolic and numerical computing of
1.0. At N=30 the approximate solution is around 5 · 108 at x = 1.
the basis functions and points:
kam 2: Not sure of this. Need to test. Seems like an unstable variant
>>> import sympy as sym of Lagrange which is somewhat strange when combined with interpolation
>>> x = sym.Symbol(’x’)
which should produce the identity matrix
>>> psi, points = Lagrange_polynomials_01(x, N=2)
>>> points Successful example. Trying out the Lagrange polynomial basis for
[0, 1/2, 1] approximating f (x) = sin 2πx on Ω = [0, 1] with the least squares and
>>> psi
[(1 - x)*(1 - 2*x), 2*x*(2 - 2*x), -x*(1 - 2*x)] the interpolation techniques can be done by
>>> x = 0.5 # numerical computing x = sym.Symbol(’x’)

>>> psi, points = Lagrange_polynomials_01(x, N=2) f = sym.sin(2*sym.pi*x)
>>> points psi, points = Lagrange_polynomials_01(x, N)
[0.0, 0.5, 1.0] Omega=[0, 1]
>>> psi u, c = least_squares(f, psi, Omega)
[-0.0, 1.0, 0.0] comparison_plot(f, u, Omega)
u, c = interpolation(f, psi, points)
That is, when used symbolically, the Lagrange_polynomials_01 func- comparison_plot(f, u, Omega)
tion returns the symbolic expression for the Lagrange functions while Figure 2.10 shows the results. There is a difference between the least
when x is a numerical valued the function returns the value of the basis squares and the interpolation technique but the difference decreases
function evaluate in x. In the present example only the second basis rapidly with Increasing N .
function should be 1 in the mid-point while the others are zero according Less successful example. The next example concerns interpolating
to (2.57). f (x) = |1 − 2x| on Ω = [0, 1] using Lagrange polynomials. Figure 2.11
Least squares approximation by Lagrange polynomials of degree 3 Interpolation by Lagrange polynomials of degree 3
1.0 approximation 1.0 approximation points = [Omega[0] + i*h for i in range(N+1)]
exact exact elif point_distribution == ’Chebyshev’:
0.5 0.5
points = Chebyshev_nodes(Omega[0], Omega[1], N)
psi = [Lagrange_polynomial(x, i, points) for i in range(N+1)]
return psi, points
0.0 0.0
def Chebyshev_nodes(a, b, N):

0.5 0.5
from math import cos, pi
return [0.5*(a+b) + 0.5*(b-a)*cos(float(2*i+1)/(2*N+1))*pi) \
1.0 1.0
for i in range(N+1)]
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
x x
Fig. 2.10 Approximation via least squares (left) and interpolation (right) of a sine All the functions computing Lagrange polynomials listed above are found
function by Lagrange interpolating polynomials of degree 3. in the module file Lagrange.py.
Figure 2.13 shows the improvement of using Chebyshev nodes, com-
pared with the equidistant points in Figure 2.11. The reason for this
shows a peculiar effect: the approximation starts to oscillate more and
improvement is that the corresponding Lagrange polynomials have much
more as N grows. This numerical artifact is not surprising when looking
smaller oscillations, which can be seen by comparing Figure 2.14 (Cheby-
at the individual Lagrange polynomials. Figure 2.12 shows two such
shev points) with Figure 2.12 (equidistant points). Note the different
polynomials, ψ2 (x) and ψ7 (x), both of degree 11 and computed from
scale on the vertical axes in the two figures and also that the Chebyshev
uniformly spaced points xi = i/11, i = 0, . . . , 11, marked with circles.
points tend to cluster more around the element boundaries.
We clearly see the property of Lagrange polynomials: ψ2 (xi ) = 0 and
Another cure for undesired oscillations of higher-degree interpolating
ψ7 (xi ) = 0 for all i, except ψ2 (x2 ) = 1 and ψ7 (x7 ) = 1. The most
polynomials is to use lower-degree Lagrange polynomials on many small
striking feature, however, is the dominating oscillation near the boundary
patches of the domain. This is actually the idea pursued in the finite
where ψ2 > 5 and ψ7 = −10 in some points. The reason is easy to
element method. For instance, linear Lagrange polynomials on [0, 1/2]
understand: since we force the functions to be zero at so many points, a
and [1/2, 1] would yield a perfect approximation to f (x) = |1 − 2x| on
polynomial of high degree is forced to oscillate between the points. The
Ω = [0, 1] since f is piecewise linear.
phenomenon is named Runge’s phenomenon and you can read a more
detailed explanation on Wikipedia.
1.0 Interpolation by Lagrange polynomials of degree 7 2 Interpolation by Lagrange polynomials of degree 14
Remedy for strong oscillations. The oscillations can be reduced by a approximation
exact
approximation
exact
more clever choice of interpolation points, called the Chebyshev nodes:
1
0.8
0
0.6

1 1 2i + 1
1
xi = (a + b) + (b − a) cos pi , i = 0 . . . , N, (2.60) 0.4
2 2 2(N + 1) 2
0.2
3
on the interval Ω = [a, b]. Here is a flexible version of the
0.00.0 40.0
Lagrange_polynomials_01 function above, valid for any interval 0.2 0.4
x
0.6 0.8 1.0 0.2 0.4
x
0.6 0.8 1.0
Fig. 2.11 Interpolation of an absolute value function by Lagrange polynomials and

Ω = [a, b] and with the possibility to generate both uniformly distributed uniformly distributed interpolation points: degree 7 (left) and 14 (right).
points and Chebyshev nodes:
def Lagrange_polynomials(x, N, Omega, point_distribution=’uniform’):
if point_distribution == ’uniform’:
How does the least squares or projection methods work with Lagrange
if isinstance(x, sym.Symbol): polynomials? We can just call the least_squares function, but sympy
h = sym.Rational(Omega[1] - Omega[0], N) has problems integrating the f (x) = |1 − 2x| function times a polynomial,
else:
h = (Omega[1] - Omega[0])/float(N)
so we need to fall back on numerical integration.
6 1.2
ψ2 ψ2
4 ψ7 1.0 ψ7
2 0.8
0 0.6
2 0.4
4 0.2
6 0.0
8 0.2
100.0 0.2 0.4 0.6 0.8 1.0 0.40.0 0.2 0.4 0.6 0.8 1.0
Fig. 2.12 Illustration of the oscillatory behavior of two Lagrange polynomials based on Fig. 2.14 Illustration of the less oscillatory behavior of two Lagrange polynomials based
12 uniformly spaced points (marked by circles). on 12 Chebyshev points (marked by circles).
1.2 Interpolation by Lagrange polynomials of degree 7 1.2 Interpolation by Lagrange polynomials of degree 14
approximation approximation
exact exact
1.0 1.0
integrand = sym.lambdify([x], integrand)
0.8 I = sym.mpmath.quad(integrand, [Omega[0], Omega[1]])
0.8
A[i,j] = A[j,i] = I
0.6
0.6 integrand = psi[i]*f
0.4 I = sym.integrate(integrand, (x, Omega[0], Omega[1]))
0.4
0.2
if isinstance(I, sym.Integral):
0.2 0.0
0.00.0 0.2 0.4 0.6 0.8 1.0 0.20.0 0.2 0.4 0.6 0.8 1.0
b[i,0] = I
x x
c = A.LUsolve(b)
Fig. 2.13 Interpolation of an absolute value function by Lagrange polynomials and c = [sym.simplify(c[i,0]) for i in range(c.shape[0])]
Chebyshev nodes as interpolation points: degree 7 (left) and 14 (right). u = sum(c[i]*psi[i] for i in range(len(psi)))
return u, c
def least_squares(f, psi, Omega):

N = len(psi) - 1
b = sym.zeros((N+1, 1)) 2.4.3 Bernstein polynomials
for i in range(N+1): An alternative to the Taylor and Lagrange families of polynomials are the
for j in range(i, N+1): Bernstein polynomials. These polynomials are popular in visualization
integrand = psi[i]*psi[j]
and we include a presentation of them for completeness. Furthermore,
if isinstance(I, sym.Integral): as we will demonstrate, the choice of basis functions may matter in
# Could not integrate symbolically, fall back terms of accuracy and efficiency. In fact, in finite element methods, a
# on numerical integration with mpmath.quad
Least
1.2 squares approximation by Lagrange polynomials of degree 11
approximation
exact
1.0
0.8
0.6
0.4
0.2
0.0
0.0 0.2 0.4 0.6 0.8 1.0
x
Fig. 2.15 Illustration of an approximation of the absolute value function using the least Fig. 2.16 The nine functions of a Bernstein basis of order 8.
square method .
main challenge, from a numerical analysis point of view, is to determine

appropriate basis functions for a particular purpose or equation.
On the unit interval, the Bernstein polynomials are defined in terms
of powers of x and 1 − x (the barycentric coordinates of the unit interval)
as !
n i
Bi,n = x (1 − x)n−i , i = 0, . . . , n. (2.61)
i
The Figure 2.16 shows the basis functions of a Bernstein basis of order
8. This figure should be compared against Figure 2.17, which shows the
corresponding Lagrange basis of order 8. The Lagrange basis is convenient
because it is a nodal basis, that is; the basis functions are 1 in their
nodal points and zero at all other nodal points as described by (2.57).
However, looking at Figure 2.17 we also notice that the basis function
are oscillatory and have absolute values that are significantly larger than
1 between the nodal points. Consider for instance the basis function
represented by the purple color. It is 1 in x = 0.5 and 0 at all other Fig. 2.17 The nine functions of a Lagrange basis of order 8.
nodal points and hence this basis function represents the value at the
mid-point. However, this function also has strong negative contributions
−2. For the Bernstein basis, all functions are positive and all functions
close to the element boundaries where it takes negative values less than
output values in [0, 1]. Therefore there is no oscillatory behavior. The
2.5 Approximation properties and convergence rates 51 52 2 Function approximation by global functions
main disadvantage of the Bernstein basis is that the basis is not a nodal def convergence_rate_analysis(series_type, func):
basis. In fact, all functions contribute everywhere except x = 0 and N_values = [2, 4, 8, 16]
norms = []
x = 1. cpu_times = []
Both Lagrange and Bernstein polynomials take larger values towards for N in N_values:
the element boundaries than in the middle of the element, but the
psi = series(series_type, N)
Bernstein polynomials always remain less or equal to 1. t0 = time.clock()
We remark that the Bernstein basis is easily extended to polygons in u, c = least_squares_non_verbose(
gauss_bell, psi, Omega, False)
2D and 3D in terms of the barycentric coordinates. For example, consider t1 = time.clock()
the reference triangle in 2D consisting of the faces x = 0, y = 0, and
x + y = 1. The barycentric coordinates are b1 (x, y) = x, b2 (x, y), and error2 = sym.lambdify([x], (func - u)**2)
L2_norm = scipy.integrate.quad(error2, Omega[0], Omega[1])
b3 (x, y) = 1 − x − y and the Bernstein basis functions of order n is of L2_norm = scipy.sqrt(L2_norm)
the form norms.append(L2_norm[0])
cpu_times.append(t1-t0)
n! i j
Bi,j,k = x y (1 − x − y)k , for i + j + k = n . return N_values, norms, cpu_times
i!j!k!
We run the analysis as follows
Omega = [0, 1]
2.5 Approximation properties and convergence rates x = sym.Symbol("x")
gaussian_bell = sym.exp(-(x-0.5)**2) - sym.exp(-0.5**2)
step = sym.Piecewise((1, 0.25 < x), (0, True)) - \
We will now compare the different approximation methods in terms of sym.Piecewise((1, 0.75 < x), (0, True))
accuracy and efficiency. We consider four different series for generating func = gaussian_bell
approximations: Taylor, Lagrange, sinusoidal, and Bernstein. For all
import pylab
families we expect that the approximations improve as we increase the series_types = ["Taylor", "Sinusoidal", "Bernstein", "Lagrange"]
number of basis functions in our representations. We also expect that the for series_type in series_types:
computational complexity increases. Let us therefore try to quantify the N_values, norms, cpu_times = \
convergence_rate_analysis(series_type, func)
accuracy and efficiency of the different methods in terms of the number plt.loglog(N_values, norms)
of basis functions N . In the present example we consider the least square plt.show()
method.
Below we list the numerical error for different N when approximating
Let us consider the approximation of a Gaussian bell function, i.e.,
the Gaussian bell function.
that the exact solution is
N 2 4 8 16
ue = exp(−(x − 0.5)2 ) − exp(−0.52 ) Taylor 9.83e-02 2.63e-03 7.83e-07 3.57e-10
sine 2.70e-03 6.10e-04 1.20e-04 2.17e-05
We remark that ue is zero in x = 0 and x = 1 and that we have chosen Bernstein 2.10e-03 4.45e-05 8.73e-09 4.49e-15
Lagrange 2.10e-03 4.45e-05 8.73e-09 2.45e-12
the bell function because it cannot be expressed as a finite sum of either
polynomials or sines. We may therefore study the behavior as N → ∞. It is quite clear that the different methods have different properties. For
To quantify the behavior of the error as well as the complexity of the example, the Lagrange basis for N = 16 is 145 times more accurate than
computations we compute the approximation with increasing number the Taylor basis. However, Bernstein is actually more than 500 times
of basis functions and time the computations by using time.clock more accurate than the Lagrange basis! The approximations obtained by
(returning the CPU time so far in the program). A code example goes as sines are far behind the polynomial approximations for N > 4.
follows:
The corresponding CPU time of the required computations also vary the sinusoidal basis seems to have a polynomial convergence rate as the
quite a bit: log-log plot is a linear line. The Bernstein, Lagrange, and Taylor methods
appear to have convergence that is faster than polynomial. It is then
N 2 4 8 16
interesting to consider a log plot and see if the behavior is exponential.
Taylor 0.0123 0.0325 0.108 0.441
sine 0.0113 0.0383 0.229 1.107 Figure 2.19 is a log plot. Here, the Bernstein approximation appears to
Bernstein 0.0384 0.1100 0.3368 1.187 be a linear line which suggests that the convergence is exponential.
Lagrange 0.0807 0.3820 2.5233 26.52
Here, the timings are in seconds. The Taylor basis is the most efficient
and is in fact more than 60 times faster than the Lagrange basis for 10-1
N = 16 (with our naive implementation of basic formulas). 10-2
In order to get a more precise idea of how the approximation methods 10-3
behave as N increases, we investigate two simple data models which 10-4
may be used in a regression analysis. These two are the polynomial and 10-5
exponential model defined by 10-6
10-7
10-8
E1 (N ) = α1 N β1 , (2.62)
10-9
E2 (N ) = α2 exp(β2 N ) (2.63) 10-10
10-11 Taylor
Taking the logarithm of (2.62) we obtain Sinusoidal
10-12
Bernstein
10-13 Lagrange
log(E1 (N )) = β1 log(N ) + log(α1 )
10-14 0
10 101 102
Hence, letting x = log(N ) be the independent variable and y =
log(E1 (N )) the dependent one, we simply have the straight line y = ax+b Fig. 2.18 Convergence of least square approximation using basis function in terms of
with a = β1 and b = log(α1 ). Then, we may perform a regression analysis the Taylor, sinusoidal, Bernstein and Lagrange basis in a log-log plot.
as earlier with respect to the basis functions (1, x) and obtain an estimate
of the order of convergence in terms of β1 . For the second model (2.63),
The following program computes the order of convergence for the sines
we take the natural logarithm and obtain
using the polynomial model (2.62) while the Bernstein approximation is
estimates in terms of model (2.63). We avoid to compute estimates for
ln(E2 (N )) = β2 N + ln(α2 )
the Taylor and Lagrange approximations as neither the log-log plot nor
Again, regression analysis provides the means to estimate the convergence, the log plot demonstrated linear behavior.
but here we let x = N be the independent variable, y = ln(E2 (N )), a = β2 hpl 3: Any comment about the regression_with_noise function?
and b = ln(α2 ). To summarize, the polynomial model should have the
N_values = [2, 4, 8, 16, 32]
data around a straight line in a log-log plot, while the exponential model Taylor = [0.0983, 0.00263, 7.83e-07, 3.57e-10]
has its date around a straight line in a log plot with the logarithmic scale Sinusoidal = [0.0027, 0.00061, 0.00012, 2.17e-05]
Bernstein = [0.0021, 4.45e-05, 8.73e-09, 4.49e-15]
on the y axis. Lagrange = [0.0021, 4.45e-05, 8.73e-09, 2.45e-12]
Before we perform the regression analysis, a good rule is to inspect
the behavior visually in log and log-log plots. Figure 2.18 shows a log-log x = sym.Symbol(’x’)
psi = [1, x]
plot of the error with respect to N for the various methods. Clearly,
for completeness we include a log-log plot in Figure 2.20 to illustrate the

polynomial increase in CPU time with respect to N.
102
101
100
10-1
Taylor
10-2 Sinusoidal
Bernstein
Fig. 2.19 Convergence of least square approximation using basis function in terms of Lagrange
the Taylor, sinusoidal, Bernstein and Lagrange basis in a log plot. 10-3 0
10 101 102
u, c = regression_with_noise(log2(Sinusoidal), psi, log2(N_values)) Fig. 2.20 CPU timings of the approximation with the difference basis in a log-log plot.
print "estimated model for sine: %3.2e*N**(%3.2e)" % \
(2**(c[0]), c[1])
The complete code can be found in convergence_rate_local.py.
# Check the numbers estimated by the model by manual inspection
for N in N_values:
print 2**c[0] * N**c[1]
u, c = regression_with_noise(log(Bernstein), psi, N_values) 2.6 Approximation of functions in higher dimensions

print "estimated model for Bernstein: %3.2e*exp(%3.2e*N)" % \
(exp(c[0]), c[1])
All the concepts and algorithms developed for approximation of 1D
# Check the numbers estimated by the model by manual inspection functions f (x) can readily be extended to 2D functions f (x, y) and 3D
for N in N_values: functions f (x, y, z). Basically, the extensions consist of defining basis
print exp(c[0]) * exp(N * c[1])
functions ψi (x, y) or ψi (x, y, z) over some domain Ω, and for the least
The program estimates the sine approximation convergences as 1.43e− squares and Galerkin methods, the integration is done over Ω.
02 · N −2.3 , which means that the convergence is slightly above second As in 1D, the least squares and projection/Galerkin methods lead to
order. The Bernstein approximation on the other hand is 8.01e − 02 · linear systems
exp(−1.92e + 00N ).
The CPU time in this example here would be significantly faster if
the algorithms were implemented in a compiled language like C/C++ or
Fortran and we should not be careful in drawing conclusion about the
efficiency of the different methods based on this example alone. However,
2.6 Approximation of functions in higher dimensions 57 58 2 Function approximation by global functions
X
Ai,j cj = bi , i ∈ Is , arrays of length M and N , respectively, while p = a ⊗ b must be
j∈Is represented by a two-dimensional array of size M × N .
Ai,j = (ψi , ψj ), Tensor products can be used in a variety of contexts.
bi = (f, ψi ),
Given the vector spaces Vx and Vy as defined in (2.65) and (2.66),
where the inner product of two functions f (x, y) and g(x, y) is defined the tensor product space V = Vx ⊗ Vy has a basis formed as the tensor
completely analogously to the 1D case (2.25): product of the basis for Vx and Vy . That is, if {ψi (x)}i∈Ix and {ψi (y)}i∈Iy
Z are basis for Vx and Vy , respectively, the elements in the basis for V
(f, g) = f (x, y)g(x, y)dxdy (2.64) arise from the tensor product: {ψi (x)ψj (y)}i∈Ix ,j∈Iy . The index sets are
Ω
Ix = {0, . . . , Nx } and Iy = {0, . . . , Ny }.
The notation for a basis function in 2D can employ a double index as
2.6.1 2D basis functions as tensor products of 1D functions in
One straightforward way to construct a basis in 2D is to combine 1D ψp,q (x, y) = ψ̂p (x)ψ̂q (y), p ∈ Ix , q ∈ Iy .
basis functions. Say we have the 1D vector space
The expansion for u is then written as a double sum
Vx = span{ψ̂0 (x), . . . , ψ̂Nx (x)} . (2.65) X X
u= cp,q ψp,q (x, y) .
A similar space for a function’s variation in y can be defined, p∈Ix q∈Iy
Alternatively, we may employ a single index,

Vy = span{ψ̂0 (y), . . . , ψ̂Ny (y)} . (2.66)
We can then form 2D basis functions as tensor products of 1D basis ψi (x, y) = ψ̂p (x)ψ̂q (y),
functions. and use the standard form for u,
X
Tensor products u= cj ψj (x, y) .
Given two vectors a = (a0 , . . . , aM ) and b = (b0 , . . . , bN ), their outer
j∈Is
tensor product, also called the dyadic product, is p = a ⊗ b, defined The single index can be expressed in terms of the double index through
through i = p(Ny + 1) + q or i = q(Nx + 1) + p.
pi,j = ai bj , i = 0, . . . , M, j = 0, . . . , N .
In the tensor terminology, a and b are first-order tensors (vectors 2.6.2 Example on polynomial basis in 2D
with one index, also termed rank-1 tensors), and then their outer Let us again consider approximation with the least squares method, but
tensor product is a second-order tensor (matrix with two indices, also now with basis functions in 2D. Suppose we choose ψ̂p (x) = xp , and try
termed rank-2 tensor). The corresponding inner tensor product is the an approximation with Nx = Ny = 1:
P
well-known scalar or dot product of two vectors: p = a·b = N j=0 aj bj .
Now, p is a rank-0 tensor. ψ0,0 = 1, ψ1,0 = x, ψ0,1 = y, ψ1,1 = xy .
Tensors are typically represented by arrays in computer code.
In the above example, a and b are represented by one-dimensional Using a mapping to one index like i = q(Nx + 1) + p, we get
Z Z
ψ0 = 1, ψ1 = x, ψ2 = y, ψ3 = xy . Ly Lx
A0,0 = (ψ0 , ψ0 ) = 1dxdy = Lx Ly ,
0 0
With the specific choice f (x, y) = (1 + x2 )(1 + 2y 2 ) on Ω = [0, Lx ] × Z Z
[0, Ly ], we can perform actual calculations:
Ly Lx L2x Ly
A0,1 = (ψ0 , ψ1 ) = xdxdy = ,
0 0 2
Z Ly Z Lx
Lx L2y
A0,2 = (ψ0 , ψ2 ) = ydxdy = ,
0 0 2
Z Ly Z Lx
L2x L2y
A0,3 = (ψ0 , ψ3 ) = xydxdy = ,
0 0 4
Z Ly Z Lx
L2 Ly
A1,0 = (ψ1 , ψ0 ) = xdxdy = x ,
0 0 2
Z Ly Z Lx 3
L Ly
A1,1 = (ψ1 , ψ1 ) = x2 dxdy = x ,
0 0 3
Z Ly Z Lx
L2x L2y
A1,2 = (ψ1 , ψ2 ) = xydxdy = ,
0 0 4
Z Ly Z Lx 3 2
Lx Ly
A1,3 = (ψ1 , ψ3 ) = x2 ydxdy = ,
0 0 6
Z Ly Z Lx
Lx L2y
A2,0 = (ψ2 , ψ0 ) = ydxdy = ,
0 0 2
Z Ly Z Lx
L2x L2y
A2,1 = (ψ2 , ψ1 ) = xydxdy = ,
0 0 4
Z Ly Z Lx
Lx L3y
A2,2 = (ψ2 , ψ2 ) = y 2 dxdy = ,
0 0 3
Z Ly Z Lx 2 3
Lx Ly
A2,3 = (ψ2 , ψ3 ) = xy 2 dxdy = ,
0 0 6
Z Ly Z Lx 2 2
Lx Ly
A3,0 = (ψ3 , ψ0 ) = xydxdy = ,
0 0 4
Z Ly Z Lx
L3x L2y
A3,1 = (ψ3 , ψ1 ) = x2 ydxdy = ,
0 0 6
Z Ly Z Lx
L2x L3y
A3,2 = (ψ3 , ψ2 ) = xy 2 dxdy = ,
0 0 6
Z Ly Z Lx
L3x L3y
A3,3 = (ψ3 , ψ3 ) = x2 y 2 dxdy = .
0 0 9
The right-hand side vector has the entries
Z Ly Z Lx
b0 = (ψ0 , f ) = 1 · (1 + x2 )(1 + 2y 2 )dxdy 1 1
0 0 Ai,j = Â(x) (y)
p,r Âq,s = Lp+r+1 Lq+s+1 ,
Z Ly Z Lx 2 1 p+r+1 x
q+s+1 y
= (1 + 2y 2 )dy (1 + x2 )dx = (Ly + L3y )(Lx + L3x ) for p, r ∈ Ix and q, s ∈ Iy .
0 0 3 3
Z Ly Z Lx Corresponding reasoning for the right-hand side leads to
b1 = (ψ1 , f ) = x(1 + x2 )(1 + 2y 2 )dxdy
0 0
Z Z Z Z
Ly Lx 2 1 1 Ly Lx
= (1 + 2y 2 )dy x(1 + x2 )dx = (Ly + L3y )( L2x + L4x ) bi = (ψi , f ) = ψi f dxdx
0 0 3 2 4 0 0
Z Ly Z Lx Z Ly Z Lx
b2 = (ψ2 , f ) = y(1 + x2 )(1 + 2y 2 )dxdy = ψ̂p (x)ψ̂q (y)f dxdx
0 0 0 0
Z Z Z Z
Ly Lx 1 1 1 Ly Ly
= y(1 + 2y 2 )dy (1 + x2 )dx = ( L2y + L4y )(Lx + L3x ) = ψ̂q (y)(1 + 2y 2 )dy ψ̂p (x)(1 + x2 )dx
0 0 2 2 3 0 0
Z Ly Z Lx Z Ly Z Ly
b3 = (ψ3 , f ) = xy(1 + x2 )(1 + 2y 2 )dxdy = y q (1 + 2y 2 )dy xp (1 + x2 )dx
0 0 0 0
Z Z
Ly Lx 1 1 1 1 1 2 1 1
= y(1 + 2y 2 )dy x(1 + x2 )dx = ( L2y + L4y )( L2x + L4x ) . =( Lq+1 + Lq+3 )( Lp+1 + Lp+3 )
0 0 2 2 2 4 q+1 y q+3 y p+1 x p+3 x
There is a general pattern in these calculations that we can explore. Choosing Lx = Ly = 2, we have
An arbitrary matrix entry has the formula    308 
4 4 4 4  
 4 16 4 16  9
 140  9 −1
 3 3     
Z Ly Z Lx A= 16  , b =  3 , c =  − 32 43  .
Ai,j = (ψi , ψj ) =  4 4 16
3 3   44 
ψi ψj dxdy 8
Z Z
0 0
Z Z 4 16 16 64
3 3 9 60
Ly Lx Ly Lx
= ψp,q ψr,s dxdy = ψ̂p (x)ψ̂q (y)ψ̂r (x)ψ̂s (y)dxdy Figure 2.21 illustrates the result.
0 0 0 0
Z Ly Z Lx
= ψ̂q (y)ψ̂s (y)dy ψ̂p (x)ψ̂r (x)dx f(x,y) f(x,y)
0 0 45 35
= Â(x) (y) 40 30
p,r Âq,s ,
45 35
40 30
35 35 25 25
30 20
25 30 15
20 20
10
where
15 25
10 5 15
5 20 0
0 -5
10
15
5
Z Lx Z Ly
2
10
2
0
Â(x) Â(y)
1.5 5 1.5
= ψ̂p (x)ψ̂r (x)dx, = ψ̂q (y)ψ̂s (y)dy,

2 2
1 1.5 0 1 1.5 -5
1 1
p,r q,s 0.5 0.5
0 0
0.5 0.5
0 0 0 0
Fig. 2.21 Approximation of a 2D quadratic function (left) by a 2D bilinear function

are matrix entries for one-dimensional approximations. Moreover, i = (right) using the Galerkin or least squares method.
pNx + q and j = sNy + r.
With ψ̂p (x) = xp we have
The calculations can also be done using sympy. The following code
1 1 computes the matrix of our example
Â(x) = Lp+r+1 , Â(y) = Lq+s+1 ,
p,r
p+r+1 x q,s
q+s+1 y
import sympy as sym
and
x, y, Lx, Ly = sym.symbols("x y L_x L_y") Third, we must construct a list of 2D basis functions. Here are two
examples based on tensor products of 1D "Taylor-style" polynomials xi
def integral(integrand):
Ix = sym.integrate(integrand, (x, 0, Lx)) and 1D sine functions sin((i + 1)πx):
I = sym.integrate(Ix, (y, 0, Ly))
return I def taylor(x, y, Nx, Ny):
return [x**i*y**j for i in range(Nx+1) for j in range(Ny+1)]
basis = [1, x, y, x*y]
A = sym.Matrix(sym.zeros([4,4])) def sines(x, y, Nx, Ny):
return [sym.sin(sym.pi*(i+1)*x)*sym.sin(sym.pi*(j+1)*y)
for i in range(len(basis)): for i in range(Nx+1) for j in range(Ny+1)]
psi_i = basis[i]
for j in range(len(basis)):
The complete code appears in approx2D.py.
psi_j = basis[j] The previous hand calculation where a quadratic f was approximated
A[i,j] = integral(psi_i*psi_j) by a bilinear function can be computed symbolically by
We remark that sympy may even write the output in LATEX or C++ >>> from approx2D import *
format. kam 4: place this comment somwehere else? >>> f = (1+x**2)*(1+2*y**2)
>>> psi = taylor(x, y, 1, 1)
>>> Omega = [[0, 2], [0, 2]]
>>> u, c = least_squares(f, psi, Omega)
>>> print u
2.6.3 Implementation 8*x*y - 2*x/3 + 4*y/3 - 1/9
>>> print sym.expand(f)
The least_squares function from Section 2.3.3 and/or the file approx1D. 2*x**2*y**2 + x**2 + 2*y**2 + 1
py can with very small modifications solve 2D approximation prob-
lems. First, let Omega now be a list of the intervals in x and y di- We may continue with adding higher powers to the basis:
rection. For example, Ω = [0, Lx ] × [0, Ly ] can be represented by >>> psi = taylor(x, y, 2, 2)
Omega = [[0, L_x], [0, L_y]]. >>> u, c = least_squares(f, psi, Omega)
>>> print u
Second, the symbolic integration must be extended to 2D: 2*x**2*y**2 + x**2 + 2*y**2 + 1
>>> print u-f
import sympy as sym
0
integrand = psi[i]*psi[j] For Nx ≥ 2 and Ny ≥ 2 we recover the exact function f , as expected,
I = sym.integrate(integrand,
(x, Omega[0][0], Omega[0][1]), since in that case f ∈ V , see Section 2.2.5.
(y, Omega[1][0], Omega[1][1]))
provided integrand is an expression involving the sympy symbols x and

y. The 2D version of numerical integration becomes 2.6.4 Extension to 3D
if isinstance(I, sym.Integral): Extension to 3D is in principle straightforward once the 2D extension
integrand = sym.lambdify([x,y], integrand) is understood. The only major difference is that we need the repeated
I = sym.mpmath.quad(integrand,
[Omega[0][0], Omega[0][1]],
outer tensor product,
[Omega[1][0], Omega[1][1]])
V = Vx ⊗ Vy ⊗ Vz .
The right-hand side integrals are modified in a similar way. (We
(q) (q)
should add that sympy.mpmath.quad is sufficiently fast even in 2D, but In general, given vectors (first-order tensors) a(q) = (a0 , . . . , aNq ), q =
scipy.integrate.nquad is much faster.) 0, . . . , m, the tensor product p = a(0) ⊗ · · · ⊗ am has elements
2.7 Exercises 65 66 2 Function approximation by global functions
(0) (1) (m)

pi0 ,i1 ,...,im = ai1 ai1 · · · aim . d) Check out the topic of inner product spaces. Suggest a possible inner
product for the space of all linear functions of the form ax + b, a, b ∈ R,
The basis functions in 3D are then
defined on some interval Ω = [A, B]. Show that this particular inner
product satisfies the general requirements of an inner product in a vector
ψp,q,r (x, y, z) = ψ̂p (x)ψ̂q (y)ψ̂r (z),
space.
with p ∈ Ix , q ∈ Iy , r ∈ Iz . The expansion of u becomes Filename: linalg1.
X X X
u(x, y, z) = cp,q,r ψp,q,r (x, y, z) .
p∈Ix q∈Iy r∈Iz
Problem 2.2: Approximate a three-dimensional vector in a
A single index can be introduced also here, e.g., i = rNx Ny + qNx + p, plane
P
u = i ci ψi (x, y, z).
Given f = (1, 1, 1) in R3 , find the best approximation vector u in the
plane spanned by the unit vectors (1, 0) and (0, 1). Repeat the calculations
Use of tensor product spaces using the vectors (2, 1) and (1, 2).
Constructing a multi-dimensional space and basis from tensor prod- Filename: vec111_approx.
ucts of 1D spaces is a standard technique when working with global
basis functions. In the world of finite elements, constructing basis
functions by tensor products is much used on quadrilateral and Problem 2.3: Approximate a parabola by a sine
hexahedra cell shapes, but not on triangles and tetrahedra. Also, the
global finite element basis functions are almost exclusively denoted Given the function f (x) = 1 + 2x(1 − x) on Ω = [0, 1], we want to find
by a single index and not by the natural tuple of indices that arises an approximation in the function space
from tensor products.
V = span{1, sin(πx)} .
a) Sketch or plot f (x). Think intuitively how an expansion in terms of
the basis functions of V , ψ0 (x) = 1, ψ1 (x) = sin(πx), can be construction
to yield a best approximation to f . Or phrased differently, see if you can
2.7 Exercises guess the coefficients c0 and c1 in the expansion
Problem 2.1: Linear algebra refresher u(x) = c0 ψ0 + c1 ψ1 = c0 + c1 sin(πx) .

R1
Look up the topic of vector space in your favorite linear algebra book or 2
Compute the L error ||f − u||L2 = ( 0 (f − u)2 dx)1/2 .
search for the term at Wikipedia.
Hint. If you make a mesh function e of the error on some mesh with
a) Prove that vectors in the plane spanned by the vector (a, b) form a uniformly spaced coordinates in the array xc, the integral can be approx-
vector space by showing that all the axioms of a vector space are satisfied. imated as np.sqrt(dx*np.sum(e**2)), where dx=xc[0]-xc[1] is the
b) Prove that all linear functions of the form ax + b constitute a vector mesh spacing and np is an alias for the numpy module in Python.
space, a, b ∈ R. b) Perform the hand calculations for a least squares approximation.
c) Show that all quadratic functions of the form 1 + ax2 + bx do not Filename: parabola_sin.
constitute a vector space.
Problem 2.4: Approximate the exponential function by power Problem 2.6: Approximate a steep function by sines
functions
Find the best approximation of f (x) = tanh(s(x − π)) on [0, 2π] in the
Let V be a function space with basis functions xi , i = 0, 1, . . . , N . Find space V with basis ψi (x) = sin((2i + 1)x), i ∈ Is = {0, . . . , N }. Make a
P
the best approximation to f (x) = exp(−x) on Ω = [0, 8] among all movie showing how u = j∈Is cj ψj (x) approximates f (x) as N grows.
functions in V for N = 2, 4, 6. Illustrate the three approximations in Choose s such that f is steep (s = 20 is appropriate).
three separate plots. Hint 1. One may naively call the least_squares_orth and
Hint. Apply the lest_squares and comparison_plot functions in the comparison_plot from the approx1D module in a loop and extend the
approx1D.py module as these make the exercise easier to solve. basis with one new element in each pass. This approach implies a lot of
Filename: exp_powers. recomputations. A more efficient strategy is to let least_squares_orth
compute with only one basis function at a time and accumulate the
corresponding u in the total solution.
Problem 2.5: Approximate the sine function by power Hint 2. ffmpeg or avconv may skip frames when plot files are combined
functions to a movie. Since there are few files and we want to see each of them,
use convert to make an animated GIF file (-delay 200 is suitable).
In this exercise we want to approximate the sine function by polynomials
Filename: tanh_sines.
of order N + 1. Consider two bases:
Remarks. Approximation of a discontinuous (or steep) f (x) by sines,
results in slow convergence and oscillatory behavior of the approximation
V1 = {x, x3 , x5 , . . . , xN −2 , xN }, close to the abrupt changes in f . This is known as the Gibb’s phenomenon.
V2 = {1, x, x2 , x3 , . . . , xN } .
The basis V1 is motivated by the fact that the Taylor polynomial approx-
Problem 2.7: Approximate a steep function by sines with
imation to the sine function has only odd powers, while V2 is motivated
boundary adjustment
by the assumption that also the even powers could improve the approxi-
mation in a least-squares setting. We study the same approximation problem as in Problem 2.6. Since
Compute the best approximation to f (x) = sin(x) among all functions ψi (0) = ψi (2π) = 0 for all i, u = 0 at the boundary points x = 0
in V1 and V2 on two domains of increasing sizes: Ω1,k = [0, kπ], k = and x = 2π, while f (0) = −1 and f (2π) = 1. This discrepancy at the
2, 3 . . . , 6 and Ω2,k = [−kπ/2, kπ/2], k = 2, 3, 4, 5. Make plots for all boundary can be removed by adding a boundary function B(x):
combinations of V1 , V2 , Ω1 , Ω2 , k = 2, 3, . . . , 6. X
Add a plot of the N -th degree Taylor polynomial approximation of u(x) = B(x) + cj ψj (x),
sin(x) around x = 0. j∈Is
Hint. You can make a loop over V1 and V2 , a loop over Ω1 and Ω2 , where B(x) has the right boundary values: B(xL ) = f (xL ) and B(xR ) =
and a loop over k. Inside the loops, call the functions least_squares f (xR ), with xL = 0 and xR = 2π as the boundary points. A linear choice
and comparison_plot from the approx1D module. N = 7 is a suggested of B(x) is
value.
(xR − x)f (xL ) + (x − xL )f (xR )
Filename: sin_powers. B(x) = .
xR − xL
a) Use the basis ψi (x) = sin((i + 1)x), i ∈ Is = {0, . . . , N } and plot u
and f for N = 16. (It suffices to make plots for even i.)
b) Use the basis from Exercise 2.6, ψi (x) = sin((2i + 1)x), i ∈ Is = Problem 2.9: Approximate a steep function by Lagrange
{0, . . . , N }. (It suffices to make plots for even i.) Observe that the polynomials
approximation converges to a piecewise linear function!
Use interpolation with uniformly distributed points and Chebychev nodes
c) Use the basis ψi (x) = sin(2(i + 1)x), i ∈ Is = {0, . . . , N }, and observe to approximate
that the approximation converges to a piecewise constant function.
Filename: tanh_sines_boundary_term. 1
f (x) = − tanh(s(x − )), x ∈ [0, 1],
2
Remarks. The strange results in b) and c) are due to the choice of
by Lagrange polynomials for s = 5 and s = 20, and N = 3, 7, 11, 15.
basis. In b), ϕi (x) is an odd function around x = π/2 and x = 3π/2. No
Combine 2 × 2 plots of the approximation for the four N values, and
combination of basis functions is able to approximate the flat regions
create such figures for the four combinations of s values and point types.
of f . All basis functions in c) are even around x = π/2 and x = 3π/2,
Filename: tanh_Lagrange.
but odd at x = 0, π, 2π. With all the sines represented, as in a), the
approximation is not constrained with a particular symmetry behavior.
Problem 2.10: Approximate a steep function by Lagrange

polynomials and regression
Exercise 2.8: Fourier series as a least squares approximation
Redo Problem 2.9, but apply a regression method with N -degree Lagrange
a) Given a function f (x) on an interval [0, L], look up the formula for
polynomials and 2N + 1 data points. Recall that Problem 2.9 applies
the coefficients aj and bj in the Fourier series of f :
N + 1 points and the resulting approximation interpolates f at these
1 X∞
2πx
∞
X
2πx
points, while a regression method with more points does not interpolate
f (x) = a0 + aj cos j + bj sin j . f at the data points. Do more points and a regression method help reduce
2 L L
j=1 j=1
the oscillatory behavior of Lagrange polynomial approximations?
b) Let an infinite-dimensional vector space V have the basis functions Filename: tanh_Lagrange_regression.
cos j 2πx 2πx
L for j = 0, 1, . . . , ∞ and sin j L for j = 1, . . . , ∞. Show that
the least squares approximation method from Section 2.2 leads to a
linear system whose solution coincides with the standard formulas for
the coefficients in a Fourier series of f (x) (see also Section 2.3.2).
Hint. You may choose

2π 2π
ψ2i = cos i x , ψ2i+1 = sin i x , (2.67)
L L
for i = 0, 1, . . . , N → ∞.
c) Choose f (x) = H(x − 12 ) on Ω = [0, 1], where H is the Heaviside
function: H(x) = 0 for x < 0, H(x) = 1 for x > 0 and H(0) = 12 . Find
the coefficients aj and bj in the Fourier series for f (x). Plot the sum for
j = 2N + 1, where N = 5 and N = 100.
Filename: Fourier_ls.
72 3 Function approximation by finite elements
Function approximation by finite

elements 3 4
ψ0
ψ1
u =4ψ0 −12 ψ1
2
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0

The purpose of this chapter is to use the ideas from the previous chapter
to approximate functions, but the basis functions are now of finite element Fig. 3.1 A function resulting from a weighted sum of two sine basis functions.
type.
9
3.1 Finite element basis functions u
8
The specific basis functions exemplified in Section 2.2 are in general 7
nonzero on the entire domain Ω, as can be seen in Figure 3.1, where we
6
plot two sinusoidal basis functions ψ0 (x) = sin 21 πx and ψ1 (x) = sin 2πx
together with the sum u(x) = 4ψ0 (x) − 12 ψ1 (x). We shall now turn our 5
attention to basis functions that have compact support, meaning that they
4
are nonzero on a small portion of Ω only. Moreover, we shall restrict the
functions to be piecewise polynomials. This means that the domain is split 3
into subdomains and each basis function is a polynomial on one or more
2
of these subdomains, see Figure 3.2 for a sketch involving locally defined
P ϕ0 ϕ1 ϕ2
hat functions that make u = j cj ψj piecewise linear. At the boundaries 1
between subdomains, one normally just forces continuity of u, so that
00.0
when connecting two polynomials from two subdomains, the derivative 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0
becomes discontinuous. This type of basis functions is fundamental in the
finite element method. (One may wonder why continuity of derivatives is Fig. 3.2 A function resulting from a weighted sum of three local piecewise linear (hat)
functions.
not desired, and it is, but it turns out to be mathematically challenging
in 2D and 3D, and it is not strictly needed.)
We first introduce the concepts of elements and nodes in a simplistic necessary step before treating a wider class of approximations within the
fashion. Later, we shall generalize the concept of an element, which is a

3.1 Finite element basis functions 73 74 3 Function approximation by finite elements
family of finite element methods. The generalization is also compatible Example. On Ω = [0, 1] we may introduce two elements, Ω (0) = [0, 0.4]
with the concepts used in the FEniCS finite element software. and Ω (1) = [0.4, 1]. Furthermore, let us introduce three nodes per element,
equally spaced within each element. Figure 3.4 shows the mesh with
Ne = 2 elements and Nn = 2Ne + 1 = 5 nodes. A node’s coordinate is
3.1.1 Elements and nodes denoted by xi , where i is either a global node number or a local one. In
the latter case we also need to know the element number to uniquely
Let u and f be defined on an interval Ω. We divide Ω into Ne non- define the node.
overlapping subintervals Ω (e) , e = 0, . . . , Ne − 1: The three nodes in element number 0 are x0 = 0, x1 = 0.2, and
x2 = 0.4. The local and global node numbers are here equal. In element
Ω = Ω (0) ∪ · · · ∪ Ω (Ne ) . (3.1) number 1, we have the local nodes x0 = 0.4, x1 = 0.7, and x2 = 1 and
(e)
We shall for now refer to Ω as an element, identified by the unique the corresponding global nodes x2 = 0.4, x3 = 0.7, and x4 = 1. Note that
number e. On each element we introduce a set of points called nodes. the global node x2 = 0.4 is shared by the two elements.
For now we assume that the nodes are uniformly spaced throughout the
element and that the boundary points of the elements are also nodes. The
nodes are given numbers both within an element and in the global domain.
These are referred to as local and global node numbers, respectively. Local
nodes are numbered with an index r = 0, . . . , d, while the Nn global
x
nodes are numbered as i = 0, . . . , Nn − 1. Figure 3.3 shows nodes as small 0 1 2 3 4
circular disks and element boundaries as small vertical lines. Global node Ω(0) Ω(1)
numbers appear under the nodes, but local node numbers are not shown. Fig. 3.4 Finite element mesh with 2 elements and 5 nodes.
Since there are two nodes in each elements, the local nodes are numbered
0 (left) and 1 (right) in each element.
For the purpose of implementation, we introduce two lists or arrays:
nodes for storing the coordinates of the nodes, with the global node
numbers as indices, and elements for holding the global node numbers
in each element. By defining elements as a list of lists, where each sublist
contains the global node numbers of one particular element, the indices
of each sublist will correspond to local node numbers for that element.
x
The nodes and elements lists for the sample mesh above take the form
0 1 2 3 4 5
Ω(0) Ω(1) Ω(2) Ω(3) Ω(4) nodes = [0, 0.2, 0.4, 0.7, 1]
Fig. 3.3 Finite element mesh with 5 elements and 6 nodes. elements = [[0, 1, 2], [2, 3, 4]]
Looking up the coordinate of, e.g., local node number 2 in element 1, is

Nodes and elements uniquely define a finite element mesh, which is our done by nodes[elements[1][2]] (recall that nodes and elements start
discrete representation of the domain in the computations. A common their numbering at 0). The corresponding global node number is 4, so
special case is that of a uniformly partitioned mesh where each element has we could alternatively look up the coordinate as nodes[4].
the same length and the distance between nodes is constant. Figure 3.3 The numbering of elements and nodes does not need to be regular.
shows an example on a uniformly partitioned mesh. The strength of the Figure 3.5 shows and example corresponding to
finite element method (in contrast to the finite difference method) is that nodes = [1.5, 5.5, 4.2, 0.3, 2.2, 3.1]
it is equally easy to work with a non-uniformly partitioned mesh as a elements = [[2, 1], [4, 5], [0, 4], [3, 0], [5, 2]]
uniformly partitioned one.
2.5 over two elements. Also observe that the basis function ϕi is zero at all
2.0 nodes, except at global node number i. We also remark that the shape
1.5 of a basis function over an element is completely determined by the
1.0 coordinates of the local nodes in the element.
0.5
0.0 x
0.5 3 0 4 5 2 1
1.0 Ω(3) Ω(2) Ω(1) Ω(4) Ω(0)
1.5 0 1 2 3 4 5 6 7
Fig. 3.5 Example on irregular numbering of elements and nodes.
1.0
3.1.2 The basis functions 0.8 ϕ2
0.6 ϕ3
0.4
Construction principles. Finite element basis functions are in this text 0.2 ϕ4
0.0
recognized by the notation ϕi (x), where the index (now in the beginning) 0.2
0.0 0.2 0.4 0.6 0.8 1.0
corresponds to a global node number. Since ψi is the symbol for basis
functions in general in this text, the particular choice of finite element
basis functions means that we take ψi = ϕi .
Let i be the global node number corresponding to local node r in
element number e with d + 1 local nodes. We distinguish between internal
nodes in an element and shared nodes. The latter are nodes that are
shared with the neighboring elements. The finite element basis functions
Fig. 3.6 Illustration of the piecewise quadratic basis functions associated with nodes in
ϕi are now defined as follows. an element.
• For an internal node, with global number i and local number r, take
ϕi (x) to be the Lagrange polynomial that is 1 at the local node r and
Properties of ϕi . The construction of basis functions according to the
zero at all other nodes in the element. The degree of the polynomial
principles above lead to two important properties of ϕi (x). First,
is d, according to (2.56). On all other elements, ϕi = 0.
(
• For a shared node, let ϕi be made up of the Lagrange polynomial 1, i = j,
on this element that is 1 at node i, combined with the Lagrange ϕi (xj ) = δij , δij = (3.2)
0, i 6= j,
polynomial over the neighboring element that is also 1 at node i. On
all other elements, ϕi = 0. when xj is a node in the mesh with global node number j. The result
ϕi (xj ) = δij is obtained as the Lagrange polynomials are constructed
A visual impression of three such basis functions is given in Figure 3.6. The to have exactly this property. The property also implies a convenient
domain Ω = [0, 1] is divided into four equal-sized elements, each having interpretation of ci as the value of u at node i. To show this, we expand
three local nodes. The element boundaries are marked by vertical dashed P
u in the usual way as j cj ψj and choose ψi = ϕi :
lines and the nodes by small circles. The function ϕ2 (x) is composed of a
X X
quadratic Lagrange polynomial over element 0 and 1, ϕ3 (x) corresponds u(xi ) = cj ψj (xi ) = cj ϕj (xi ) = ci ϕi (xi ) = ci .
to an internal node in element 1 and is therefore nonzero on this element j∈Is j∈Is
only, while ϕ4 (x) is like ϕ2 (x) composed to two Lagrange polynomials
Because of this interpretation, the coefficient ci is by many named ui or 1. The polynomial that is 1 at local node 1 (global node 3) makes up
Ui . the basis function ϕ3 (x) over this element, with ϕ3 (x) = 0 outside
Second, ϕi (x) is mostly zero throughout the domain: the element.
2. The polynomial that is 1 at local node 0 (global node 2) is the “right
• ϕi (x) 6= 0 only on those elements that contain global node i,
part” of the global basis function ϕ2 (x). The “left part” of ϕ2 (x)
• ϕi (x)ϕj (x) 6= 0 if and only if i and j are global node numbers in the consists of a Lagrange polynomial associated with local node 2 in the
same element. neighboring element Ω (0) = [0, 0.25].
Since Ai,j is the integral of ϕi ϕj it means that most of the elements in 3. Finally, the polynomial that is 1 at local node 2 (global node 4) is
the coefficient matrix will be zero. We will come back to these properties the “left part” of the global basis function ϕ4 (x). The “right part”
and use them actively in computations to save memory and CPU time. comes from the Lagrange polynomial that is 1 at local node 0 in the
In our example so far, each element has d + 1 nodes, resulting in local neighboring element Ω (2) = [0.5, 0.75].
Lagrange polynomials of degree d (according to Section 2.4.2), but it is The specific mathematical form of the polynomials over element 1 is
not a requirement to have the same d value in each element.
given by the formula (2.56):
(x − 0.25)(x − 0.5)
3.1.3 Example on quadratic finite element functions ϕ3 (x) = , x ∈ Ω (1)
(0.375 − 0.25)(0.375 − 0.5)
Let us set up the nodes and elements lists corresponding to the mesh (x − 0.375)(x − 0.5)
implied by Figure 3.6. Figure 3.7 sketches the mesh and the numbering. ϕ2 (x) = , x ∈ Ω (1)
(0.25 − 0.375)(0.25 − 0.5)
We have
(x − 0.25)(x − 0.375)
ϕ4 (x) = , x ∈ Ω (1)
nodes = [0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0] (0.5 − 0.25)(0.5 − 0.25)
elements = [[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 8]]
As mentioned earlier, any global basis function ϕi (x) is zero on elements
that do not contain the node with global node number i. Clearly, the
property (3.2) is easily verified, see for instance that ϕ3 (0.375) = 1 while
ϕ3 (0.25) = 0 and ϕ3 (0.5) = 0.
The other global functions associated with internal nodes, ϕ1 , ϕ5 , and
ϕ7 , are all of the same shape as the drawn ϕ3 in Figure 3.6, while the
global basis functions associated with shared nodes have the same shape
x
as shown ϕ2 and ϕ4 . If the elements were of different length, the basis
0 1 2 3 4 5 6 7 8 functions would be stretched according to the element size and hence be
Ω(0) Ω(1) Ω(2) Ω(3) different.
Fig. 3.7 Sketch of mesh with 4 elements and 3 nodes per element.
Let us explain mathematically how the basis functions are constructed 3.1.4 Example on linear finite element functions
according to the principles. Consider element number 1 in Figure 3.7,
Ω (1) = [0.25, 0.5], with local nodes 0, 1, and 2 corresponding to global Figure 3.8 shows piecewise linear basis functions (d = 1). These are
nodes 2, 3, and 4. The coordinates of these nodes are 0.25, 0.375, and mathematically simpler than the quadratic functions in the previous
0.5, respectively. We define three Lagrange polynomials on this element: section, and one would therefore think that it is easier to understand
the linear functions first. However, linear basis functions do not involve
internal nodes and are therefore a special case of the general situation. Here, xj , j = i − 1, i, i + 1, denotes the coordinate of node j. For elements
That is why we think it is better to understand the construction of of equal length h the formulas can be simplified to
quadratic functions first, which easily generalize to any d > 2, and then 
look at the special case d = 1. 
 0, x < xi−1 ,

 (x − x )/h, xi−1 ≤ x < xi ,
ϕi (x) = i−1
(3.4)

 1 − (x − x i )/h, xi ≤ x < xi+1 ,

 0, x ≥ xi+1
3.1.5 Example on cubic finite element functions

Piecewise cubic basis functions can be defined by introducing four nodes
1.0
per element. Figure 3.9 shows examples on ϕi (x), i = 3, 4, 5, 6, associated
0.8 ϕ1 with element number 1. Note that ϕ4 and ϕ5 are nonzero only on element
0.6
0.4
ϕ2 number 1, while ϕ3 and ϕ6 are made up of Lagrange polynomials on two
0.2 neighboring elements.
0.0
0.0 0.2 0.4 0.6 0.8 1.0
Fig. 3.8 Illustration of the piecewise linear basis functions associated with nodes in an
element. 1.0
0.8
0.6
0.4
0.2
We have the same four elements on Ω = [0, 1]. Now there are no 0.0
0.2
internal nodes in the elements so that all basis functions are associated 0.4
0.0 0.2 0.4 0.6 0.8 1.0
with shared nodes and hence made up of two Lagrange polynomials, one
from each of the two neighboring elements. For example, ϕ1 (x) results
from the Lagrange polynomial in element 0 that is 1 at local node 1 and
0 at local node 0, combined with the Lagrange polynomial in element 1
that is 1 at local node 0 and 0 at local node 1. The other basis functions
are constructed similarly.
Explicit mathematical formulas are needed for ϕi (x) in computations. Fig. 3.9 Illustration of the piecewise cubic basis functions associated with nodes in an
In the piecewise linear case, the formula (2.56) leads to element.


 0, x < xi−1 , We see that all the piecewise linear basis functions have the same

 (x − x
i−1 )/(xi
− xi−1 ), xi−1 ≤ x < xi , “hat” shape. They are naturally referred to as hat functions, also called
ϕi (x) = (3.3)

 1 − (x − xi )/(xi+1 − xi ), xi ≤ x < xi+1 , chapeau functions. The piecewise quadratic functions in Figure 3.6 are

 0, x ≥ xi+1 . seen to be of two types. “Rounded hats” associated with internal nodes
in the elements and some more “sombrero” shaped hats associated with
element boundary nodes. Higher-order basis functions also have hat-like ϕ2 ϕ3
shapes, but the functions have pronounced oscillations in addition, as
illustrated in Figure 3.9.
A common terminology is to speak about linear elements as elements x
with two local nodes associated with piecewise linear basis functions. Sim- 0 1 2 3 4 5
ilarly, quadratic elements and cubic elements refer to piecewise quadratic Ω(0) Ω(1) Ω(2) Ω(3) Ω(4)
Fig. 3.10 Illustration of the piecewise linear basis functions corresponding to global node
or cubic functions over elements with three or four local nodes, re- 2 and 3.
spectively. Alternative names, frequently used in the following, are P1
elements for linear elements, P2 for quadratic elements, and so forth: Pd
signifies degree d of the polynomial basis functions. while ϕ2 (x) has negative slope over [x2 , x3 ] and corresponds to setting
i = 2 in (3.4),
3.1.6 Calculating the linear system ϕ2 (x) = 1 − (x − x2 )/h .

We can now easily integrate,
The elements in the coefficient matrix and right-hand side are given by
Z Z
the formulas (2.28) and (2.29), but now the choice of ψi is ϕi . Consider P1 x3 x − x2 x − x2 h
elements where ϕi (x) is piecewise linear. Nodes and elements numbered A2,3 = ϕ2 ϕ3 dx = 1− dx = .
Ω x2 h h 6
consecutively from left to right in a uniformly partitioned mesh imply
The diagonal entry in the coefficient matrix becomes
the nodes
Z 2 Z 2
xi = ih, i = 0, . . . , Nn − 1, x2 x − x1 x3 x − x2 2h
A2,2 = dx + 1− dx = .
and the elements x1 h x2 h 3
The entry A2,1 has an integral that is geometrically similar to the situation
Ω (i) = [xi , xi+1 ] = [ih, (i + 1)h], i = 0, . . . , Ne − 1 . (3.5) in Figure 3.10, so we get A2,1 = h/6.
We have in this case Ne elements and Nn = Ne + 1 nodes. The parameter Calculating a general row in the matrix. We can now generalize the
N denotes the number of unknowns in the expansion for u, and with calculation of matrix entries to a general row number i. The entry
R
the P1 elements, N = Nn . The domain is Ω = [x0 , xN ]. The formula Ai,i−1 = Ω ϕi ϕi−1 dx involves hat functions as depicted in Figure 3.11.
for ϕi (x) is given by (3.4) and a graphical illustration is provided in Since the integral is geometrically identical to the situation with specific
Figures 3.8 and 3.11. nodes 2 and 3, we realize that Ai,i−1 = Ai,i+1 = h/6 and Ai,i = 2h/3.
Calculating specific matrix entries. Let us calculate the specific matrix However, we can compute the integral directly too:
R
entry A2,3 = Ω ϕ2 ϕ3 dx. Figure 3.10 shows what ϕ2 and ϕ3 look like.
We realize from this figure that the product ϕ2 ϕ3 6= 0 only over element
2, which contains node 2 and 3. The particular formulas for ϕ2 (x) and
ϕ3 (x) on [x2 , x3 ] are found from (3.4). The function ϕ3 has positive slope
over [x2 , x3 ] and corresponds to the interval [xi−1 , xi ] in (3.4). With i = 3
we get
ϕ3 (x) = (x − x2 )/h,
Z
Ai,i−1 = ϕi ϕi−1 dx The general formula for bi , see Figure 3.12, is now easy to set up
ZΩxi−1 Z xi Z xi+1
Z Z Z
= ϕi ϕi−1 dx + ϕi ϕi−1 dx + ϕi ϕi−1 dx xi x − xi−1 x − xi xi+1
xi−2 xi−1 xi bi = ϕi (x)f (x) dx = f (x) dx+ f (x) dx . 1−
| {z } | {z } Ω xi−1 hxi h
ϕi =0 ϕi−1 =0
Z (3.6)
We remark that the above formula applies to internal nodes (living at the
xi x − xi x − xi−1 h
= 1− dx = .
xi−1 h h 6 interface between two elements) and that for the nodes on the boundaries
| {z }| {z }
ϕi (x) ϕi−1 (x) only one integral needs to be computed.
We need a specific f (x) function to compute these integrals. With
The particular formulas for ϕi−1 (x) and ϕi (x) on [xi−1 , xi ] are found f (x) = x(1 − x) and two equal-sized elements in Ω = [0, 1], one gets
from (3.4): ϕi is the linear function with positive slope, corresponding    
to the interval [xi−1 , xi ] in (3.4), while φi−1 has a negative slope so the 210 2−h
h h2 
definition in interval [xi , xi+1 ] in (3.4) must be used. 
A = 1 4 1, b=

 12 − 14h  .
6 12
012 10 − 17h
The solution becomes
ϕi−1 ϕi
h2 5 23 2
c0 =
, c1 = h − h2 , c2 = 2h − h .
6 6 6
x The resulting function
i−2 i−1 i i +1
Fig. 3.11 Illustration of two neighboring linear (hat) functions with general node numbers.
u(x) = c0 ϕ0 (x) + c1 ϕ1 (x) + c2 ϕ2 (x)
is displayed in Figure 3.13 (left). Doubling the number of elements to
The first and last row of the coefficient matrix lead to slightly different four leads to the improved approximation in the right part of Figure 3.13.
integrals:
0.30 0.30
Z Z x1 2 u u
x − x0 h f f
A0,0 = ϕ20 dx = 1− dx = . 0.25 0.25
Ω x0 h 3
0.20 0.20
Similarly, AN,N involves an integral over only one element and hence
equals h/3. 0.15 0.15
0.10 0.10
0.05 0.05
ϕi f(x)
0.000.0 0.000.0
0.2 0.4 0.6 0.8 1.0 0.2 0.4 0.6 0.8 1.0
Fig. 3.13 Least squares approximation of a parabola using 2 (left) and 4 (right) P1
elements.
x
i−2 i−1 i i +1
Fig. 3.12 Right-hand side integral with the product of a basis function and the given
function to approximate.
" #
3.1.7 Assembly of elementwise computations (e)
(e)
Ã0,0 Ã0,1
(e)
Ã = (e) (e) .
Our integral computations so far have been straightforward. However, Ã1,0 Ã1,1
with higher-degree polynomials and in higher dimensions (2D and 3D), while P2 elements have a 3 × 3 element matrix:
integrating in the physical domain gets increasingly complicated. Instead,  
(e) (e) (e)
integrating over one element at a time, and transforming each element to Ã Ã0,1 Ã0,2
 0,0 
a common standardized geometry in a new reference coordinate system, Ã(e) = (e) (e) (e) 
 Ã1,0 Ã1,1 Ã1,2  .
is technically easier. Almost all computer codes employ a finite element (e) (e) (e)
Ã2,0 Ã2,1 Ã2,2
algorithm that calculates the linear system by integrating over one
(e)
element at a time. We shall therefore explain this algorithm next. The Assembly of element matrices. Given the numbers Ãr,s , we should,
amount of details might be overwhelming during a first reading, but once according to (3.7), add the contributions to the global coefficient matrix
all those details are done right, one has a general finite element algorithm by
that can be applied to all sorts of elements, in any space dimension, no
matter how geometrically complicated the domain is. Aq(e,r),q(e,s) := Aq(e,r),q(e,s) + Ã(e)
r,s , r, s ∈ Id . (3.8)
The element matrix. We start by splitting the integral over Ω into a This process of adding in elementwise contributions to the global matrix
sum of contributions from each element: is called finite element assembly or simply assembly.
Z X Z Figure 3.14 illustrates how element matrices for elements with two
(e) (e)
Ai,j = ϕi ϕj dx = Ai,j , Ai,j = ϕi ϕj dx . (3.7) nodes are added into the global matrix. More specifically, the figure shows
Ω (e)
Ω e how the element matrix associated with elements 1 and 2 assembled,
(e)
Now, Ai,j 6= 0, if and only if, i and j are nodes in element e (look assuming that global nodes are numbered from left to right in the domain.
at Figure 3.11 to realize this property, but the result also holds for With regularly numbered P3 elements, where the element matrices have
all types of elements). Introduce i = q(e, r) as the mapping of local size 4 × 4, the assembly of elements 1 and 2 are sketched in Figure 3.15.
node number r in element e to the global node number i. This is just a
short mathematical notation for the expression i=elements[e][r] in a
program. Let r and s be the local node numbers corresponding to the
global node numbers i = q(e, r) and j = q(e, s). With d + 1 nodes per
(e)
element, all the nonzero matrix entries in Ai,j arise from the integrals
involving basis functions with indices corresponding to the global node
numbers in element number e:
Z
ϕq(e,r) ϕq(e,s) dx, r, s = 0, . . . , d .
Ω (e)
These contributions can be collected in a (d + 1) × (d + 1) matrix known
as the element matrix. Let Id = {0, . . . , d} be the valid indices of r and
s. We introduce the notation
Fig. 3.14 Illustration of matrix assembly: regularly numbered P1 elements.
Ã(e) = {Ã(e)
r,s }, r, s ∈ Id ,
for the element matrix. For P1 elements (d = 2) we have
Assembly of irregularly numbered elements and nodes. After assem-
bly of element matrices corresponding to regularly numbered elements
Fig. 3.16 Illustration of matrix assembly: irregularly numbered P1 elements.
(e)
we can collect the d + 1 nonzero contributions bi , for i = q(e, r), r ∈ Id ,
Fig. 3.15 Illustration of matrix assembly: regularly numbered P3 elements.
in an element vector
b̃(e) (e)
r = {b̃r }, r ∈ Id .
and nodes are understood, it is wise to study the assembly process for These contributions are added to the global right-hand side by an assem-
irregularly numbered elements and nodes. Figure 3.5 shows a mesh where bly process similar to that for the element matrices:
the elements array, or q(e, r) mapping in mathematical notation, is
given as bq(e,r) := bq(e,r) + b̃(e) r ∈ Id . (3.10)
r ,
elements = [[2, 1], [4, 5], [0, 4], [3, 0], [5, 2]]
The associated assembly of element matrices 1 and 2 is sketched in

Figure 3.16. 3.1.8 Mapping to a reference element
We have created animations to illustrate the assembly of P1 and P3 Instead of computing the integrals
elements with regular numbering as well as P1 elements with irregular Z
numbering. The reader is encouraged to develop a “geometric” under-
Ã(e)
r,s = ϕq(e,r) (x)ϕq(e,s) (x) dx
standing of how element matrix entries are added to the global matrix. Ω (e)
This understanding is crucial for hand computations with the finite over some element Ω (e) = [xL , xR ] in the physical coordinate system,
element method. it turns out that it is considerably easier and more convenient to map
The element vector. The right-hand side of the linear system is also the element domain [xL , xR ] to a standardized reference element domain
computed elementwise: [−1, 1] and compute all integrals over the same domain [−1, 1]. We have
now introduced xL and xR as the left and right boundary points of
Z X Z an arbitrary element. With a natural, regular numbering of nodes and
(e) (e)
bi = f (x)ϕi (x) dx = bi , bi = f (x)ϕi (x) dx . (3.9) elements from left to right through the domain, we have xL = xe and
Ω (e)
Ω e xR = xe+1 for P1 elements.
(e)
We observe that bi 6= 0 if and only if global node i is a node in element
e (look at Figure 3.12 to realize this property). With d nodes per element
The coordinate transformation. Let X ∈ [−1, 1] be the coordinate in Why reference elements?
the reference element. A linear mapping, also known as an affine mapping,
from X to x can be written The great advantage of using reference elements is that the formulas
for the basis functions, ϕ̃r (X), are the same for all elements and
1 1 independent of the element geometry (length and location in the
x = (xL + xR ) + (xR − xL )X . (3.11)
2 2 mesh). The geometric information is “factored out” in the simple
This relation can alternatively be expressed as mapping formula and the associated det J quantity. Also, the in-
tegration domain is the same for all elements. All these features
1
x = xm + hX, (3.12) contribute to simplify computer codes and make them more general.
2
where we have introduced the element midpoint xm = (xL + xR )/2 and
the element length h = xR − xL . Formulas for local basis functions. The ϕ̃r (x) functions are simply the
Lagrange polynomials defined through the local nodes in the reference
Formulas for the element matrix and vector entries. Integrating on element. For d = 1 and two nodes per element, we have the linear
the reference element is a matter of just changing the integration variable Lagrange polynomials
from x to X. Let
ϕ̃r (X) = ϕq(e,r) (x(X)) (3.13) 1

ϕ̃0 (X) = (1 − X) (3.17)
2
be the basis function associated with local node number r in the reference 1
element. Switching from x to X as integration variable, using the rules ϕ̃1 (X) = (1 + X) (3.18)
2
from calculus, results in
Quadratic polynomials, d = 2, have the formulas
Z Z 1 dx
Ã(e)
r,s = ϕq(e,r) (x)ϕq(e,s) (x) dx = ϕ̃r (X)ϕ̃s (X) dX . (3.14) 1
Ω (e) dX ϕ̃0 (X) = (X − 1)X (3.19)
2
−1
In 2D and 3D, dx is transformed to detJ dX, where J is the Jacobian ϕ̃1 (X) = 1 − X 2 (3.20)
of the mapping from x to X. In 1D, detJ dX = dx/ dX = h/2. To 1
obtain a uniform notation for 1D, 2D, and 3D problems we therefore ϕ̃2 (X) = (X + 1)X (3.21)
2
replace dx/ dX by det J already now. The integration over the reference
element is then written as In general,
Z 1
d
Y X − X(s)
Ã(e)
r,s = ϕ̃r (X)ϕ̃s (X) det J dX . (3.15) ϕ̃r (X) = , (3.22)
−1 s=0,s6=r
X(r) − X(s)
The corresponding formula for the element vector entries becomes where X(0) , . . . , X(d) are the coordinates of the local nodes in the reference
element. These are normally uniformly spaced: X(r) = −1 + 2r/d, r ∈ Id .
Z Z 1
b̃(e)
r = f (x)ϕq(e,r) (x) dx = f (x(X))ϕ̃r (X) det J dX . (3.16)
Ω (e) −1
3.1.9 Example on integration over a reference element
To illustrate the concepts from the previous section in a specific example,
we now consider calculation of the element matrix and vector for a
specific choice of d and f (x). A simple choice is d = 1 (P1 elements) and Integration of lower-degree polynomials above is tedious, and higher-
f (x) = x(1 − x) on Ω = [0, 1]. We have the general expressions (3.15) degree polynomials involve much more algebra, but sympy may help. For
(e) (e)
and (3.16) for Ãr,s and b̃r . Writing these out for the choices (3.17) and example, we can easily calculate (3.23), (3.24), and (3.27) by
(3.18), and using that det J = h/2, we can do the following calculations >>> import sympy as sym
of the element matrix entries: >>> x, x_m, h, X = sym.symbols(’x x_m h X’)
>>> sym.integrate(h/8*(1-X)**2, (X, -1, 1))
h/3
Z 1 >>> sym.integrate(h/8*(1+X)*(1-X), (X, -1, 1))
(e) h
Ã0,0 = ϕ̃0 (X)ϕ̃0 (X) dX h/6
−1 2 >>> x = x_m + h/2*X
Z 1 Z
1 1 h h 1 h >>> b_0 = sym.integrate(h/4*x*(1-x)*(1-X), (X, -1, 1))
= (1 − X) (1 − X) dX = (1 − X)2 dX = , (3.23) >>> print b_0
−1 2 2 2 8 −1 3 -h**3/24 + h**2*x_m/6 - h**2/12 - h*x_m**2/2 + h*x_m/2
Z 1
(e) h
Ã1,0 = ϕ̃1 (X)ϕ̃0 (X) dX
−1 2
Z 1 Z
1 1 h h 1 h
= (1 + X) (1 − X) dX = (1 − X 2 ) dX = , (3.24) 3.2 Implementation
−1 2 2 2 8 −1 6
(e) (e)
Ã0,1 = Ã1,0 , (3.25)
Z 1
Based on the experience from the previous example, it makes sense to
(e) h write some code to automate the analytical integration process for any
Ã1,1 = ϕ̃1 (X)ϕ̃1 (X) dX
−1 2 choice of finite element basis functions. In addition, we can automate
Z 1 Z
1 1 h h 1 h the assembly process and the solution of the linear system. Another
= (1 + X) (1 + X) dX = (1 + X)2 dX = . (3.26)
−1 2 2 2 8 −1 3 advantage is that the code for these purposes document all details of all
steps in the finite element computational machinery. The complete code
The corresponding entries in the element vector becomes using (3.12))
can be found in the module file fe_approx1D.py.
Z 1
(e) h
b̃0 = f (x(X))ϕ̃0 (X) dX
−1 2 3.2.1 Integration
Z 1
1 1 1 h
= (xm + hX)(1 − (xm + hX)) (1 − X) dX First we need a Python function for defining ϕ̃r (X) in terms of a Lagrange
−1 2 2 2 2
1 3 1 2 1 2 1 2 1 polynomial of degree d:
= − h + h xm − h − hxm + hxm (3.27)
24 6 12 2 2 import sympy as sym
Z 1 import numpy as np
(e) h
b̃1 = f (x(X))ϕ̃1 (X) dX
−1 2 def basis(d, point_distribution=’uniform’, symbolic=False):
Z 1
1 1 1 h """
= (xm + hX)(1 − (xm + hX)) (1 + X) dX Return all local basis function phi as functions of the
−1 2 2 2 2 local point X in a 1D element with d+1 nodes.
1 3 1 2 1 2 1 2 1 If symbolic=True, return symbolic expressions, else
= − h − h xm + h − hxm + hxm . (3.28) return Python functions of X.
24 6 12 2 2 point_distribution can be ’uniform’ or ’Chebyshev’.
In the last two expressions we have used the element midpoint xm . """
X = sym.symbols(’X’)
if d == 0:
phi_sym = [1]
3.2 Implementation 93 94 3 Function approximation by finite elements
else: of the element interval Omega_e is used and the final matrix elements
if point_distribution == ’uniform’: are numbers, not symbols. This functionality can be demonstrated:
if symbolic:
# Compute symbolic nodes >>> from fe_approx1D import *
h = sym.Rational(1, d) # node spacing >>> phi = basis(d=1, symbolic=True)
nodes = [2*i*h - 1 for i in range(d+1)] >>> phi
else: [-X/2 + 1/2, X/2 + 1/2]
nodes = np.linspace(-1, 1, d+1) >>> element_matrix(phi, Omega_e=[0.1, 0.2], symbolic=True)
elif point_distribution == ’Chebyshev’: [h/3, h/6]
# Just numeric nodes [h/6, h/3]
nodes = Chebyshev_nodes(-1, 1, d) >>> element_matrix(phi, Omega_e=[0.1, 0.2], symbolic=False)
[0.0333333333333333, 0.0166666666666667]
phi_sym = [Lagrange_polynomial(X, r, nodes) [0.0166666666666667, 0.0333333333333333]
for r in range(d+1)]
# Transform to Python functions The computation of the element vector is done by a similar procedure:
phi_num = [sym.lambdify([X], phi_sym[r], modules=’numpy’)
for r in range(d+1)] def element_vector(f, phi, Omega_e, symbolic=True):
return phi_sym if symbolic else phi_num n = len(phi)
b_e = sym.zeros((n, 1))
def Lagrange_polynomial(x, i, points): # Make f a function of X
p = 1 X = sym.Symbol(’X’)
for k in range(len(points)): if symbolic:
if k != i: h = sym.Symbol(’h’)
p *= (x - points[k])/(points[i] - points[k]) else:
return p h = Omega_e[1] - Omega_e[0]
x = (Omega_e[0] + Omega_e[1])/2 + h/2*X # mapping
Observe how we construct the phi_sym list to be symbolic expressions f = f.subs(’x’, x) # substitute mapping formula for x
for ϕ̃r (X) with X as a Symbol object from sympy. Also note that the detJ = h/2 # dx/dX
for r in range(n):
Lagrange_polynomial function (here simply copied from Section 2.3.2) b_e[r] = sym.integrate(f*phi[r]*detJ, (X, -1, 1))
works with both symbolic and numeric variables. return b_e
Now we can write the function that computes the element matrix with a
Here we need to replace the symbol x in the expression for f by the
list of symbolic expressions for ϕr (phi = basis(d, symbolic=True)):
mapping formula such that f can be integrated in terms of X, cf. the
(e) R1
def element_matrix(phi, Omega_e, symbolic=True): formula b̃r = −1 f (x(X))ϕ̃r (X) h2 dX.
n = len(phi)
A_e = sym.zeros((n, n)) The integration in the element matrix function involves only products
X = sym.Symbol(’X’) of polynomials, which sympy can easily deal with, but for the right-
if symbolic: hand side sympy may face difficulties with certain types of expressions
h = sym.Symbol(’h’)
else: f. The result of the integral is then an Integral object and not a
h = Omega_e[1] - Omega_e[0] number or expression as when symbolic integration is successful. It may
detJ = h/2 # dx/dX therefore be wise to introduce a fall back on numerical integration. The
for r in range(n):
for s in range(r, n): symbolic integration can also spend considerable time before reaching an
A_e[r,s] = sym.integrate(phi[r]*phi[s]*detJ, (X, -1, 1)) unsuccessful conclusion, so we may also introduce a parameter symbolic
A_e[s,r] = A_e[r,s]
return A_e
to turn symbolic integration on and off:
def element_vector(f, phi, Omega_e, symbolic=True):
In the symbolic case (symbolic is True), we introduce the element length ...
as a symbol h in the computations. Otherwise, the real numerical value if symbolic:
I = sym.integrate(f*phi[r]*detJ, (X, -1, 1))
if not symbolic or isinstance(I, sym.Integral):
h = Omega_e[1] - Omega_e[0] # Ensure h is numerical standard numerical solver is called. The symbolic version is suited for
detJ = h/2 small problems only (small N values) since the calculation time becomes
integrand = sym.lambdify([X], f*phi[r]*detJ)
I = sym.mpmath.quad(integrand, [-1, 1]) prohibitively large otherwise. Normally, the symbolic integration will be
b_e[r] = I more time consuming in small problems than the symbolic solution of
... the linear system.
Numerical integration requires that the symbolic integrand is converted
to a plain Python function (integrand) and that the element length h
is a real number. 3.2.3 Example on computing symbolic approximations
We can exemplify the use of assemble on the computational case from
Section 3.1.6 with two P1 elements (linear basis functions) on the domain
3.2.2 Linear system assembly and solution Ω = [0, 1]. Let us first work with a symbolic element length:
The complete algorithm for computing and assembling the elementwise >>> h, x = sym.symbols(’h x’)
contributions takes the following form >>> nodes = [0, h, 2*h]
>>> elements = [[0, 1], [1, 2]]
def assemble(nodes, elements, phi, f, symbolic=True): >>> phi = basis(d=1, symbolic=True)
N_n, N_e = len(nodes), len(elements) >>> f = x*(1-x)
if symbolic: >>> A, b = assemble(nodes, elements, phi, f, symbolic=True)
A = sym.zeros((N_n, N_n)) >>> A
b = sym.zeros((N_n, 1)) # note: (N_n, 1) matrix [h/3, h/6, 0]
else: [h/6, 2*h/3, h/6]
A = np.zeros((N_n, N_n)) [ 0, h/6, h/3]
b = np.zeros(N_n) >>> b
for e in range(N_e): [ h**2/6 - h**3/12]
Omega_e = [nodes[elements[e][0]], nodes[elements[e][-1]]] [ h**2 - 7*h**3/6]
[5*h**2/6 - 17*h**3/12]
A_e = element_matrix(phi, Omega_e, symbolic) >>> c = A.LUsolve(b)
b_e = element_vector(f, phi, Omega_e, symbolic) >>> c
[ h**2/6]
for r in range(len(elements[e])): [12*(7*h**2/12 - 35*h**3/72)/(7*h)]
for s in range(len(elements[e])): [ 7*(4*h**2/7 - 23*h**3/21)/(2*h)]
A[elements[e][r],elements[e][s]] += A_e[r,s]
b[elements[e][r]] += b_e[r]
return A, b
3.2.4 Using interpolation instead of least squares
The nodes and elements variables represent the finite element mesh as
explained earlier. As an alternative to the least squares formulation, we may compute the c
Given the coefficient matrix A and the right-hand side b, we can vector based on the interpolation method from Section 2.4.1, using finite
P
compute the coefficients {cj }j∈Is in the expansion u(x) = j cj ϕj as the element basis functions. Choosing the nodes as interpolation points, the
solution vector c of the linear system: method can be written as
if symbolic: X
c = A.LUsolve(b) u(xi ) = cj ϕj (xi ) = f (xi ), i ∈ Is .
else: j∈Is
c = np.linalg.solve(A, b)
The coefficient matrix Ai,j = ϕj (xi ) becomes the identity matrix because
When A and b are sympy arrays, the solution procedure implied by basis function number j vanishes at all nodes, except node i: ϕj (xi ) = δij .
A.LUsolve is symbolic. Otherwise, A and b are numpy arrays and a Therefore, ci = f (xi ).
The associated sympy calculations are def approximate(f, symbolic=False, d=1, N_e=4, filename=’tmp.pdf’):
>>> fn = sym.lambdify([x], f) which computes a mesh with N_e elements, basis functions of degree d, and
>>> c = [fn(xc) for xc in nodes] approximates a given symbolic expression f by a finite element expansion
>>> c P P
[0, h*(1 - h), 2*h*(1 - 2*h)] u(x) = j cj ϕj (x). When symbolic is False, u(x) = j cj ϕj (x) can be
computed at a (large) number of points and plotted together with f (x).
These expressions are much simpler than those based on least squares or The construction of the pointwise function u from the solution vector
projection in combination with finite element basis functions. However, P
c is done elementwise by evaluating r cr ϕ̃r (X) at a (large) number of
which of the two methods that is most appropriate for a given task is points in each element in the local coordinate system, and the discrete
problem-dependent, so we need both methods in our toolbox. (x, u) values on each element are stored in separate arrays that are finally
concatenated to form a global array for x and for u. The details are
found in the u_glob function in fe_approx1D.py.
3.2.5 Example on computing numerical approximations
The numerical computations corresponding to the symbolic ones in
Section 3.2.3 (still done by sympy and the assemble function) go as 3.2.6 The structure of the coefficient matrix
follows: Let us first see how the global matrix looks like if we assemble symbolic
>>> nodes = [0, 0.5, 1] element matrices, expressed in terms of h, from several elements:
>>> elements = [[0, 1], [1, 2]]
>>> phi = basis(d=1, symbolic=True) >>> d=1; N_e=8; Omega=[0,1] # 8 linear elements on [0,1]
>>> x = sym.Symbol(’x’) >>> phi = basis(d)
>>> f = x*(1-x) >>> f = x*(1-x)
>>> A, b = assemble(nodes, elements, phi, f, symbolic=False) >>> nodes, elements = mesh_symbolic(N_e, d, Omega)
>>> A >>> A, b = assemble(nodes, elements, phi, f, symbolic=True)
[ 0.166666666666667, 0.0833333333333333, 0] >>> A
[0.0833333333333333, 0.333333333333333, 0.0833333333333333] [h/3, h/6, 0, 0, 0, 0, 0, 0, 0]
[ 0, 0.0833333333333333, 0.166666666666667] [h/6, 2*h/3, h/6, 0, 0, 0, 0, 0, 0]
>>> b [ 0, h/6, 2*h/3, h/6, 0, 0, 0, 0, 0]
[ 0.03125] [ 0, 0, h/6, 2*h/3, h/6, 0, 0, 0, 0]
[0.104166666666667] [ 0, 0, 0, h/6, 2*h/3, h/6, 0, 0, 0]
[ 0.03125] [ 0, 0, 0, 0, h/6, 2*h/3, h/6, 0, 0]
>>> c = A.LUsolve(b) [ 0, 0, 0, 0, 0, h/6, 2*h/3, h/6, 0]
>>> c [ 0, 0, 0, 0, 0, 0, h/6, 2*h/3, h/6]
[0.0416666666666666] [ 0, 0, 0, 0, 0, 0, 0, h/6, h/3]
[ 0.291666666666667]
[0.0416666666666666] The reader is encouraged to assemble the element matrices by hand and
verify this result, as this exercise will give a hands-on understanding of
The fe_approx1D module contains functions for generating the nodes what the assembly is about. In general we have a coefficient matrix that
and elements lists for equal-sized elements with any number of nodes is tridiagonal:
per element. The coordinates in nodes can be expressed either through
the element length symbol h (symbolic=True) or by real numbers
(symbolic=False):
nodes, elements = mesh_uniform(N_e=10, d=3, Omega=[0,1],
symbolic=True)
There is also a function

   
2 1 0 ··· ··· ··· ··· ··· 0 4 2 −1 0 0 0 0 0 0
.. 
1 4 1 ...
  2 16 2 0 0 0 0 0 0 
. 




 . .. 
  −1 2 8 2 −1 0 0 0 0 
0 1 4 1 .. . 
 0 0 2 16 2 0 0 0 0 

 
. . . . ..  h  
 .. . . .. .. 0 .  A=  0 0 −1 2 8 2 −1 0 0  (3.32)
  30 
 0 0 0 0 2 16 2 0 0 

h .
A =  ..
 . . .
.. .. .. .. ... . .
.

(3.29)  
6 .   
 0 0 0 0 −1 2 8 2 −1 
 .. . . .. 
  
. 0 1 4 1 . .  0 0 0 0 0 0 2 16 2 
 
 .. .. .. .. ..  0 0 0 0 0 0 −1 2 4
.

. . . . 0 

 .. . .  In general, for i odd we have the nonzeroes
. . 1 4 1
0 ··· ··· ··· ··· ··· 0 1 2
The structure of the right-hand side is more difficult to reveal since it Ai,i−2 = −1, Ai−1,i = 2, Ai,i = 8, Ai+1,i = 2, Ai+2,i = −1,
involves an assembly of elementwise integrals of f (x(X))ϕ̃r (X)h/2, which
multiplied by h/30, and for i even we have the nonzeros
obviously depend on the particular choice of f (x). Numerical integration
can give some insight into the nature of the right-hand side. For this
Ai−1,i = 2, Ai,i = 16, Ai+1,i = 2,
purpose it is easier to look at the integration in x coordinates, which
gives the general formula (3.6). For equal-sized elements of length h, we multiplied by h/30. The rows with odd numbers correspond to nodes at
can apply the Trapezoidal rule at the global node points to arrive at the element boundaries and get contributions from two neighboring ele-
ments in the assembly process, while the even numbered rows correspond
  to internal nodes in the elements where only one element contributes to
1 1 N
X −1
the values in the global matrix.
bi = h  ϕi (x0 )f (x0 ) + ϕi (xN )f (xN ) + ϕi (xj )f (xj ) ,
2 2 j=1
which leads to
3.2.7 Applications
(
1
bi = 2 hf (xi ), i = 0 or i = N,
(3.30) With the aid of the approximate function in the fe_approx1D module we
hf (xi ), 1≤i≤N −1 can easily investigate the quality of various finite element approximations
The reason for this simple formula is just that ϕi is either 0 or 1 at the to some given functions. Figure 3.17 shows how linear and quadratic
nodes and 0 at all but one of them. elements approximate the polynomial f (x) = x(1 − x)8 on Ω = [0, 1],
Going to P2 elements (d=2) leads to the element matrix using equal-sized elements. The results arise from the program
  import sympy as sym
4 2 −1 from fe_approx1D import approximate
h  
A(e) =  2 16 2  (3.31) x = sym.Symbol(’x’)
30
−1 2 4 approximate(f=x*(1-x)**8, symbolic=False, d=1, N_e=4)
approximate(f=x*(1-x)**8, symbolic=False, d=2, N_e=2)
and the following global matrix, assembled here from four elements: approximate(f=x*(1-x)**8, symbolic=False, d=1, N_e=8)
approximate(f=x*(1-x)**8, symbolic=False, d=2, N_e=4)
The quadratic functions are seen to be better than the linear ones for the
same value of N , as we increase N . This observation has some generality:
higher degree is not necessarily better on a coarse mesh, but it is as we Most of the matrix entries Ai,j are zero, because (ϕi , ϕj ) = 0 unless i
refine the mesh and the function is properly resolved. and j are nodes in the same element. In 1D problems, we do not need to
store or compute with these zeros when solving the linear system, but
0.05
u
0.05
u that requires solution methods adapted to the kind of matrices produced
0.04
f
0.04
f
by the finite element approximations.
A matrix whose majority of entries are zeros, is known as a sparse
0.03 0.03
matrix. Utilizing sparsity in software dramatically decreases the storage
0.02 0.02 demands and the CPU-time needed to compute the solution of the
0.01 0.01 linear system. This optimization is not very critical in 1D problems
where modern computers can afford computing with all the zeros in the
0.00 0.00
complete square matrix, but in 2D and especially in 3D, sparse matrices
0.010.0 0.2 0.4 0.6 0.8 1.0 0.010.0 0.2 0.4 0.6 0.8 1.0 are fundamental for feasible finite element computations. One of the
advantageous features of the finite element method is that it produces
0.05
u
0.05
u very sparse matrices. The reason is that the basis functions have local
0.04
f
0.04
f
support such that the product of two basis functions, as typically met in
integrals, is mostly zero.
0.03 0.03
Using a numbering of nodes and elements from left to right over a
0.02 0.02 1D domain, the assembled coefficient matrix has only a few diagonals
0.01 0.01 different from zero. More precisely, 2d + 1 diagonals around the main
diagonal are different from zero, where d is the order of the polynomial.
0.00 0.00
With a different numbering of global nodes, say a random ordering,
0.010.0 0.2 0.4 0.6 0.8 1.0 0.010.0 0.2 0.4 0.6 0.8 1.0 the diagonal structure is lost, but the number of nonzero elements is
Fig. 3.17 Comparison of the finite element approximations: 4 P1 elements with 5 nodes unaltered. Figures 3.18 and 3.19 exemplify sparsity patterns.
(upper left), 2 P2 elements with 5 nodes (upper right), 8 P1 elements with 9 nodes (lower
left), and 4 P2 elements with 9 nodes (lower right).
3.2.8 Sparse matrix storage and solution

Some of the examples in the preceding section took several minutes to com-
pute, even on small meshes consisting of up to eight elements. The main
explanation for slow computations is unsuccessful R
symbolic integration: Fig. 3.18 Matrix sparsity pattern for left-to-right numbering (left) and random numbering
sympy may use a lot of energy on integrals like f (x(X))ϕ̃r (X)h/2 dx (right) of nodes in P1 elements.
before giving up, and the program then resorts to numerical integration.
Codes that can deal with a large number of basis functions and accept The scipy.sparse library supports creation of sparse matrices and
flexible choices of f (x) should compute all integrals numerically and
linear system solution.
replace the matrix objects from sympy by the far more efficient array
objects from numpy. • scipy.sparse.diags for matrix defined via diagonals
There is also another (potential) reason for slow code: the solution • scipy.sparse.dok_matrix for matrix incrementally defined via index
algorithm for the linear system performs much more work than necessary. pairs (i, j)
tion may be more difficult in particular because of boundary conditions.

It also appears that the DOK sparse matrix format available in the
scipy.sparse package is fast enough even for big 1D problems on to-
day’s laptops, so the need for improving efficiency occurs only in 2D
and 3D problems, but then the complexity of the mesh favors the DOK
format.
Fig. 3.19 Matrix sparsity pattern for left-to-right numbering (left) and random numbering
(right) of nodes in P3 elements.
3.3 Comparison of finite elements and finite differences
The dok_matrix object is most convenient for finite element compu- The previous sections on approximating f by a finite element function u
tations. This sparse matrix format is called DOK, which stands for utilize the projection/Galerkin or least squares approaches to minimize
Dictionary Of Keys: the implementation is basically a dictionary (hash) the approximation error. We may, alternatively, use the collocation/in-
with the entry indices (i,j) as keys. terpolation method as described in Section 3.2.4. Here we shall compare
Rather than declaring A = np.zeros((N_n, N_n)), a DOK sparse these three approaches with what one does in the finite difference method
matrix is created by when representing a given function on a mesh.
import scipy.sparse
A = scipy.sparse.dok_matrix((N_n, N_n))
3.3.1 Finite difference approximation of given functions
When there is any need to set or add some matrix entry i,j, just do
Approximating a given function f (x) on a mesh in a finite difference
A[i,j] = entry
# or context will typically just sample f at the mesh points. If ui is the value
A[i,j] += entry of the approximate u at the mesh point xi , we have ui = f (xi ). The
collocation/interpolation method using finite element basis functions
The indexing creates the matrix entry on the fly, and only the nonzero
gives exactly the same representation, as shown Section 3.2.4,
entries in the matrix will be stored.
To solve a system with right-hand side b (one-dimensional numpy
u(xi ) = ci = f (xi ) .
array) with a sparse coefficient matrix A, we must use some kind of
a sparse linear system solver. The safest choice is a method based on How does a finite element Galerkin or least squares approximation
sparse Gaussian elimination. One high-qualify package for this purpose differ from this straightforward interpolation of f ? This is the question
if UMFPACK. It is interfaced from SciPy by to be addressed next. We now limit the scope to P1 elements since this
is the element type that gives formulas closest to those arising in the
import scipy.sparse.linalg
c = scipy.sparse.linalg.spsolve(A.tocsr(), b, use_umfpack=True) finite difference method.
The call A.tocsr() is not strictly needed (a warning is issued otherwise),
but ensures that the solution algorithm can efficiently work with a copy
3.3.2 Interpretation of a finite element approximation in
of the sparse matrix in Compressed Sparse Row (CSR) format.
An advantage of the scipy.sparse.diags matrix over the DOK for-
terms of finite difference operators
mat is that the former allows vectorized assignment to the matrix. Vec- The linear system arising from a Galerkin or least squares approximation
torization is possible for approximation problems when all elements are reads in general
of the same type. However, when solving differential equations, vectoriza-
3.3 Comparison of finite elements and finite differences 105 106 3 Function approximation by finite elements
X
cj (ψi , ψj ) = (f, ψi ), i ∈ Is . To proceed with the right-hand side, we can turn to numerical inte-
j∈Is gration schemes. The Trapezoidal method for (f, ϕi ), based on sampling
In the finite element approximation we choose ψi = ϕi . With ϕi corre- the integrand f ϕi at the node points xi = ih gives
sponding to P1 elements and a uniform mesh of element length h we have
in Section 3.1.6 calculated the matrix with entries (ϕi , ϕj ). Equation Z
1 N
X −1
number i reads (f, ϕi ) = f ϕi dx ≈ h (f (x0 )ϕi (x0 )+f (xN )ϕi (xN ))+h f (xj )ϕi (xj ) .
Ω 2 j=1
h
(ui−1 + 4ui + ui+1 ) = (f, ϕi ) . (3.33) Since ϕi is zero at all these points, except at xi , the Trapezoidal rule
6
The first and last equation, corresponding to i = 0 and i = N are slightly collapses to one term:
different, see Section 3.2.6.
The finite difference counterpart to (3.33) is just ui = fi as explained in (f, ϕi ) ≈ hf (xi ), (3.35)
Section 3.3.1. To easier compare this result to the finite element approach for i = 1, . . . , N − 1, which is the same result as with collocation/inter-
to approximating functions, we can rewrite the left-hand side of (3.33) as polation, and of course the same result as in the finite difference method.
For the end points i = 0 and i = N we get contribution from only one
1
h(ui + (ui−1 − 2ui + ui+1 )) . (3.34) element so
6
Thinking in terms of finite differences, we can write this expression using 1
(f, ϕi ) ≈ hf (xi ), i = 0, i = N . (3.36)
finite difference operator notation: 2
Simpson’s rule with sample points also in the middle of the elements,
h2 at xi+ 1 = (xi + xi+1 )/2, can be written as
[h(u +
Dx Dx u)]i ,
6 2
which is nothing but the standard discretization of (see also Ap-  

Z N −1 N −1
pendix A.1) h̃ X X
g(x) dx ≈ g(x0 ) + 2 g(xj ) + 4 g(xj+ 1 ) + f (x2N ) ,
Ω 3 2
h2 00
j=1 j=0
h(u +
u ).
6 where h̃ = h/2 is the spacing between the sample points. Our inte-
Before interpreting the approximation procedure as solving a differ- grand is g = f ϕi . For all the node points, ϕi (xj ) = δij , and there-
P −1
ential equation, we need to work out what the right-hand side is in the fore N j=1 f (xj )ϕi (xj ) = f (xi ). At the midpoints, ϕi (xi± 12 ) = 1/2 and
context of P1 elements. Since ϕi is the linear function that is 1 at xi and ϕi (xj+ 1 ) = 0 for j > 1 and j < i − 1. Consequently,
zero at all other nodes, only the interval [xi−1 , xi+1 ] contribute to the 2
integral on the right-hand side. This integral is naturally split into two N
X −1
1
parts according to (3.4): f (xj+ 1 )ϕi (xj+ 1 ) = (f (xj− 1 ) + f (xj+ 1 )) .
j=0
2 2 2 2 2
Z xi 1
Z xi+1 1 When 1 ≤ i ≤ N − 1 we then get
(f, ϕi ) = f (x) (x − xi−1 ) dx + f (x)(1 − (x − xi )) dx .
xi−1 h xi h h
(f, ϕi ) ≈
(f 1 + fi + fi+ 1 ) . (3.37)
However, if f is not known we cannot do much else with this expression. It 3 i− 2 2
is clear that many values of f around xi contribute to the right-hand side, This result shows that, with Simpson’s rule, the finite element method
not just the single point value f (xi ) as in the finite difference method. operates with the average of f over three points, while the finite difference
3.3 Comparison of finite elements and finite differences 107 108 3 Function approximation by finite elements
method just applies f at one point. We may interpret this as a "smearing" Computations in physical space. We have already seen that applying
or smoothing of f by the finite element method. the Trapezoidal rule to the right-hand side (f, ϕi ) simply gives f sampled
We can now summarize our findings. With the approximation of (f, ϕi ) at xi . Using the Trapezoidal rule on the matrix entries Ai,j = (ϕi , ϕj )
by the Trapezoidal rule, P1 elements give rise to equations that can be involves a sum X
expressed as a finite difference discretization of ϕi (xk )ϕj (xk ),
k
h2 but ϕi (xk ) = δik and ϕj (xk ) = δjk . The product ϕi ϕj is then differ-
u + u00 = f, u0 (0) = u0 (L) = 0, (3.38)
6 ent from zero only when sampled at xi and i = j. The Trapezoidal
expressed with operator notation as approximation to the integral is then
h2
[u +
Dx Dx u = f ]i . (3.39) (ϕi , ϕj ) ≈ h, i = j,
6
As h → 0, the extra term proportional to u00 goes to zero, and the two and zero if i 6= j. This means that we have obtained a diagonal matrix!
methods converge to the same solution. The first and last diagonal elements, (ϕ0 , ϕ0 ) and (ϕN , ϕN ) get contribu-
With the Simpson’s rule, we may say that we solve tion only from the first and last element, respectively, resulting in the
approximate integral value h/2. The corresponding right-hand side also
h2 has a factor 1/2 for i = 0 and i = N . Therefore, the least squares or
[u + Dx Dx u = f¯]i , (3.40) Galerkin approach with P1 elements and Trapezoidal integration results
6
in
where f¯i means the average 31 (fi−1/2 + fi + fi+1/2 ).
2
The extra term h6 u00 represents a smoothing effect: with just this
ci = fi , i ∈ Is .
term, we would find u by integrating f twice and thereby smooth f
considerably. In addition, the finite element representation of f involves Simpsons’s rule can be used to achieve a similar result for P2 elements,
an average, or a smoothing, of f on the right-hand side of the equation i.e, a diagonal coefficient matrix, but with the previously derived average
system. If f is a noisy function, direct interpolation ui = fi may result of f on the right-hand side.
in a noisy u too, but with a Galerkin or least squares formulation and P1 Elementwise computations. Identical results to those above will arise if
elements, we should expect that u is smoother than f unless h is very we perform elementwise computations. The idea is to use the Trapezoidal
small. rule on the reference element for computing the element matrix and vector.
The interpretation that finite elements tend to smooth the solution is When assembled, the same equations ci = f (xi ) arise. Exercise 3.10
valid in applications far beyond approximation of 1D functions. encourages you to carry out the details.
Terminology. The matrix with entries (ϕi , ϕj ) typically arises from
terms proportional to u in a differential equation where u is the unknown
3.3.3 Making finite elements behave as finite differences
function. This matrix is often called the mass matrix, because in the
With a simple trick, using numerical integration, we can easily produce early days of the finite element method, the matrix arose from the mass
the result ui = fi with the Galerkin or least square formulation with times acceleration term in Newton’s second law of motion. Making the
P1 elements. This is useful in many occasions when we deal with more mass matrix diagonal by, e.g., numerical integration, as demonstrated
difficult differential equations and want the finite element method to above, is a widely used technique and is called mass lumping. In time-
have properties like the finite difference method (solving standard linear dependent problems it can sometimes enhance the numerical accuracy
wave equations is one primary example). and computational efficiency of the finite element method. However, there
are also examples where mass lumping destroys accuracy.
3.4 A generalized element concept 109 110 3 Function approximation by finite elements
3.4 A generalized element concept 3.4.2 Extended finite element concept
So far, finite element computing has employed the nodes and element The concept of a finite element is now
lists together with the definition of the basis functions in the reference el- • a reference cell in a local reference coordinate system;
ement. Suppose we want to introduce a piecewise constant approximation • a set of basis functions ϕ̃i defined on the cell;
with one basis function ϕ̃0 (x) = 1 in the reference element, corresponding • a set of degrees of freedom that uniquely how basis functions from
to a ϕi (x) function that is 1 on element number i and zero on all other different elements are glued together across element interfaces. A
elements. Although we could associate the function value with a node common technique is to choose the basis functions such that ϕ̃i = 1
in the middle of the elements, there are no nodes at the ends, and the for degree of freedom number i that is associated with nodal point xi
previous code snippets will not work because we cannot find the element and ϕ̃i = 0 for all other degrees of freedom. This technique ensures
boundaries from the nodes list. the desired continuity;
In order to get a richer space of finite element approximations, we • a mapping between local and global degree of freedom numbers, here
need to revise the simple node and element concept presented so far called the dof map;
and introduce a more powerful terminology. Much literature employs the • a geometric mapping of the reference cell onto the cell in the physical
definition of node and element introduced in the previous sections so domain.
it is important have this knowledge, besides being a good pedagogical
background from understanding the extended element concept in the There must be a geometric description of a cell. This is trivial in 1D since
following. the cell is an interval and is described by the interval limits, here called
vertices. If the cell is Ω (e) = [xL , xR ], vertex 0 is xL and vertex 1 is xR .
The reference cell in 1D is [−1, 1] in the reference coordinate system X.
3.4.1 Cells, vertices, and degrees of freedom The expansion of u over one cell is often used:
We now introduce cells as the subdomains Ω (e) previously referred to as
X
elements. The cell boundaries are uniquely defined in terms of vertices. u(x) = ũ(X) = cr ϕ̃r (X), x ∈ Ω (e) , X ∈ [−1, 1], (3.41)
This applies to cells in both 1D and higher dimensions. We also define r
a set of degrees of freedom (dof), which are the quantities we aim to
where the sum is taken over the numbers of the degrees of freedom and
compute. The most common type of degree of freedom is the value of
cr is the value of u for degree of freedom number r.
the unknown function u at some point. (For example, we can introduce
Our previous P1, P2, etc., elements are defined by introducing d + 1
nodes as before and say the degrees of freedom are the values of u at
equally spaced nodes in the reference cell, a polynomial space (Pd)
the nodes.) The basis functions are constructed so that they equal unity
containing a complete set of polynomials of order d, and saying that the
for one particular degree of freedom and zero for the rest. This property
P degrees of freedom are the d + 1 function values at these nodes. The basis
ensures that when we evaluate u = j cj ϕj for degree of freedom number
functions must be 1 at one node and 0 at the others, and the Lagrange
i, we get u = ci . Integrals are performed over cells, usually by mapping
polynomials have exactly this property. The nodes can be numbered from
the cell of interest to a reference cell.
left to right with associated degrees of freedom that are numbered in the
With the concepts of cells, vertices, and degrees of freedom we increase
same way. The degree of freedom mapping becomes what was previously
the decoupling of the geometry (cell, vertices) from the space of basis
represented by the elements lists. The cell mapping is the same affine
functions. We will associate different sets of basis functions with a cell. In
mapping (3.11) as before.
1D, all cells are intervals, while in 2D we can have cells that are triangles
with straight sides, or any polygon, or in fact any two-dimensional
geometry. Triangles and quadrilaterals are most common, though. The
popular cell types in 3D are tetrahedra and hexahedra.
3.4.3 Implementation from fe_approx1D_numint import *

Implementationwise, f = x*(1 - x)
N_e = 10
• we replace nodes by vertices; vertices, cells, dof_map = mesh_uniform(N_e, d=3, Omega=[0,1])
phi = [basis(len(dof_map[e])-1) for e in range(N_e)]
• we introduce cells such that cell[e][r] gives the mapping from A, b = assemble(vertices, cells, dof_map, phi, f)
local vertex r in cell e to the global vertex number in vertices; c = np.linalg.solve(A, b)
• we replace elements by dof_map (the contents are the same for Pd # Make very fine mesh and sample u(x) on this mesh for plotting
x_u, u = u_glob(c, vertices, cells, dof_map,
elements). resolution_per_element=51)
plot(x_u, u)
Consider the example from Section 3.1.1 where Ω = [0, 1] is divided into
two cells, Ω (0) = [0, 0.4] and Ω (1) = [0.4, 1], as depicted in Figure 3.4. These steps are offered in the approximate function, which we here
The vertices are [0, 0.4, 1]. Local vertex 0 and 1 are 0 and 0.4 in cell 0 apply to see how well four P0 elements (piecewise constants) can approx-
and 0.4 and 1 in cell 1. A P2 element means that the degrees of freedom imate a parabola:
are the value of u at three equally spaced points (nodes) in each cell. from fe_approx1D_numint import *
The data structures become x=sym.Symbol("x")
for N_e in 4, 8:
vertices = [0, 0.4, 1] approximate(x*(1-x), d=0, N_e=N_e, Omega=[0,1])
cells = [[0, 1], [1, 2]]
dof_map = [[0, 1, 2], [2, 3, 4]] Figure 3.20 shows the result.
kam 5: does not seem to work, something for Yapi.
If we would approximate f by piecewise constants, known as P0
elements, we simply introduce one point or node in an element, preferably
X = 0, and define one degree of freedom, which is the function value at
P0, Ne=4, exact integration P0, Ne=8, exact integration
0.25 0.25
u u
this node. Moreover, we set ϕ̃0 (X) = 1. The cells and vertices arrays
f f
0.2 0.2
remain the same, but dof_map is altered:

0.15 0.15
dof_map = [[0], [1]]

0.1 0.1
We use the cells and vertices lists to retrieve information on the 0.05 0.05
geometry of a cell, while dof_map is the q(e, r) mapping introduced

earlier in the assembly of element matrices and vectors. For example,
0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
the Omega_e variable (representing the cell interval) in previous code Fig. 3.20 Approximation of a parabola by 4 (left) and 8 (right) P0 elements.
snippets must now be computed as
Omega_e = [vertices[cells[e][0], vertices[cells[e][1]]
The assembly is done by

3.4.4 Computing the error of the approximation
A[dof_map[e][r], dof_map[e][s]] += A_e[r,s]
b[dof_map[e][r]] += b_e[r] So far we have focused on computing the coefficients cj in the approx-
P
imation u(x) = j cj ϕj as well as on plotting u and f for visual com-
We will hereafter drop the nodes and elements arrays and
parison. A more quantitative comparison needs to investigate the error
work exclusively with cells, vertices, and dof_map. The module
e(x) = f (x) − u(x). We mostly want a single number to reflect the error
fe_approx1D_numint.py now replaces the module fe_approx1D and
and use a norm for this purpose, usually the L2 norm
offers similar functions that work with the new concepts:
Z 1/2
where C is a constant depending on d and Ω = [0, L], but not on h,
||e||L2 = e2 dx .
Ω and the norm |f (d+1) | is defined through
Since the finite element approximation is defined for all x ∈ Ω, and we are Z !2
interested in how u(x) deviates from f (x) through all the elements, we can (d+1) 2
L dd+1 f
|f | = dx
either integrate analytically or use an accurate numerical approximation. 0 dxd+1
The latter is more convenient as it is a generally feasible and simple
approach. The idea is to sample e(x) at a large number of points in each
element. The function u_glob in the fe_approx1D_numint module does
this for u(x) and returns an array x with coordinates and an array u
with the u values:
3.4.5 Example on cubic Hermite polynomials
x, u = u_glob(c, vertices, cells, dof_map, The finite elements considered so far represent u as piecewise polynomials
resolution_per_element=101) with discontinuous derivatives at the cell boundaries. Sometimes it is
e = f(x) - u desirable to have continuous derivatives. A primary example is the
Let us use the Trapezoidal method to approximate the integral. Because solution of differential equations with fourth-order derivatives where
different elements may have different lengths, the x array may have a standard finite element formulations lead to a need for basis functions
non-uniformly distributed set of coordinates. Also, the u_glob function with continuous first-order derivatives. The most common type of such
works in an element by element fashion such that coordinates at the basis functions in 1D is the so-called cubic Hermite polynomials. The
boundaries between elements appear twice. We therefore need to use construction of such polynomials, as explained next, will further exemplify
a "raw" version of the Trapezoidal rule where we just add up all the the concepts of a cell, vertex, degree of freedom, and dof map.
trapezoids: Given a reference cell [−1, 1], we seek cubic polynomials with the
values of the function and its first-order derivative at X = −1 and X = 1
Z n−1
X 1 as the four degrees of freedom. Let us number the degrees of freedom as
g(x) dx ≈ (g(xj ) + g(xj+1 ))(xj+1 − xj ),
2
Ω j=0 • 0: value of function at X = −1
if x0 , . . . , xn are all the coordinates in x. In vectorized Python code, • 1: value of first derivative at X = −1
• 2: value of function at X = 1
g_x = g(x)
integral = 0.5*np.sum((g_x[:-1] + g_x[1:])*(x[1:] - x[:-1]))
• 3: value of first derivative at X = 1
Computing the L2 norm of the error, here named E, is now achieved by By having the derivatives as unknowns, we ensure that the derivative
of a basis function in two neighboring elements is the same at the node
e2 = e**2
E = np.sqrt(0.5*np.sum((e2[:-1] + e2[1:])*(x[1:] - x[:-1]))
points.
The four basis functions can be written in a general form
3
X
How does the error depend on h and d? ϕ̃i (X) = Ci,j X j ,
Theory and experiments show that the least squares or projection/-
j=0
Galerkin method in combination with Pd elements of equal length with four coefficients Ci,j , j = 0, 1, 2, 3, to be determined for each i. The
h has an error constraints that basis function number i must be 1 for degree of freedom
number i and zero for the other three degrees of freedom, gives four
||e||L2 = C|f (d+1) |hd+1 , (3.42) equations to determine Ci,j for each i. In mathematical detail,
3.5 Numerical integration 115 116 3 Function approximation by finite elements
Z 1
ϕ̃0 (−1) = 1, ϕ̃0 (1) = ϕ̃00 (−1) = ϕ̃0i (1) = 0,
g(X) dX ≈ 2g(0), X̄0 = 0, w0 = 2, (3.49)
ϕ̃01 (−1) = 1, ϕ̃1 (−1) = ϕ̃1 (1) = ϕ̃01 (1) = 0, −1
ϕ̃2 (1) = 1, ϕ̃2 (−1) = ϕ̃02 (−1) = ϕ̃02 (1) = 0, which integrates linear functions exactly.
ϕ̃03 (1) = 1, ϕ̃3 (−1) = ϕ̃03 (−1) = ϕ̃3 (1) = 0 .
These four 4 × 4 linear equations can be solved, yielding the following 3.5.1 Newton-Cotes rules
formulas for the cubic basis functions:
The Newton-Cotes rules are based on a fixed uniform distribution of
3 1 the integration points. The first two formulas in this family are the
ϕ̃0 (X) = 1 − (X + 1)2 + (X + 1)3 (3.43) well-known Trapezoidal rule,
4 4
1
ϕ̃1 (X) = −(X + 1)(1 − (X + 1))2 (3.44) Z
2 1
3 1 g(X) dX ≈ g(−1) + g(1), X̄0 = −1, X̄1 = 1, w0 = w1 = 1,
ϕ̃2 (X) = (X + 1) − (X + 1)3
2
(3.45) −1
4 2 (3.50)
1 1 and Simpson’s rule,
ϕ̃3 (X) = − (X + 1)( (X + 1)2 − (X + 1)) (3.46)
2 2 Z
(3.47) 1 1
g(X) dX ≈ (g(−1) + 4g(0) + g(1)) , (3.51)
−1 3
The construction of the dof map needs a scheme for numbering the
where
global degrees of freedom. A natural left-to-right numbering has the
function value at vertex xi as degree of freedom number 2i and the value
of the derivative at xi as degree of freedom number 2i+1, i = 0, . . . , Ne +1. 1 4
X̄0 = −1, X̄1 = 0, X̄2 = 1, w0 = w2 = , w1 = . (3.52)
3 3
Newton-Cotes rules up to five points is supported in the module file
3.5 Numerical integration numint.py.
For higher accuracy one can divide the reference cell into a set of subin-
Finite element codes usually apply numerical approximations to integrals. tervals and use the rules above on each subinterval. This approach results
Since the integrands in the coefficient matrix often are (lower-order) in composite rules,
R
well-known from basic introductions to numerical
polynomials, integration rules that can integrate polynomials exactly are integration of ab f (x) dx.
popular.
Numerical integration rules can be expressed in a common form,
3.5.2 Gauss-Legendre rules with optimized points
Z 1 M
X
g(X) dX ≈ wj g(X̄j ), (3.48) More accurate rules, for a given M , arise if the location of the integration
points are optimized for polynomial integrands. The Gauss-Legendre
−1 j=0
where X̄j are integration points and wj are integration weights, j = rules (also known as Gauss-Legendre quadrature or Gaussian quadrature)
0, . . . , M . Different rules correspond to different choices of points and constitute one such class of integration methods. Two widely applied
weights. Gauss-Legendre rules in this family have the choice
The very simplest method is the Midpoint rule,
3.6 Finite elements in 2D and 3D 117 118 3 Function approximation by finite elements
1 1
M =1: X̄0 = − √ , X̄1 = √ , w0 = w1 = 1 (3.53)
3 3 2.0
r r
3 3 5 8
M =2: X̄0 = − , X̄0 = 0, X̄2 = , w0 = w 2 = , w1 = .
5 5 9 9
(3.54)
1.5
These rules integrate 3rd and 5th degree polynomials exactly. In general,
an M -point Gauss-Legendre rule integrates a polynomial of degree 2M +
1 exactly. The code numint.py contains a large collection of Gauss- 1.0
Legendre rules.
0.5
3.6 Finite elements in 2D and 3D
Finite element approximation is particularly powerful in 2D and 3D 0.0 0.5 1.0 1.5 2.0
because the method can handle a geometrically complex domain Ω with
ease. The principal idea is, as in 1D, to divide the domain into cells and
Fig. 3.22 Example on 2D P1 elements in a deformed geometry.
use polynomials for approximating a function over a cell. Two popular
cell shapes are triangles and quadrilaterals. It is common to denote
finite elements on triangles and tetrahedrons as P while elements defined
in terms of quadrilaterals and boxes are denoted by Q. Figures 3.21,
3.22, and 3.23 provide examples. P1 elements means linear functions
(a0 + a1 x + a2 y) over triangles, while Q1 elements have bilinear functions
(a0 + a1 x + a2 y + a3 xy) over rectangular cells. Higher-order elements can 1.0
easily be defined. 0.8
0.6
1.0 1.0
0.8 0.8 0.4
0.6 0.6
0.4 0.4 0.2
0.2 0.2
0.00.0 0.5 1.0 1.5 2.0 2.5 3.0 0.00.0 0.5 1.0 1.5 2.0 2.5 3.0
0.00.0 0.5 1.0 1.5 2.0 2.5 3.0
Fig. 3.21 Example on 2D P1 elements.
3.6.1 Basis functions over triangles in the physical domain Fig. 3.23 Example on 2D Q1 elements.
Cells with triangular shape will be in main focus here. With the P1
triangular element, u is a linear function over each cell, as depicted in We give the vertices of the cells global and local numbers as in 1D.
Figure 3.24, with discontinuous derivatives at the cell boundaries. The degrees of freedom in the P1 element are the function values at a
Fig. 3.24 Example on scalar function defined in terms of piecewise linear 2D functions Fig. 3.25 Example on a piecewise linear 2D basis function over a patch of triangles.
defined on triangles.
3.6.2 Basis functions over triangles in the reference cell

set of nodes, which are the three vertices. The basis function ϕi (x, y) is
then 1 at the vertex with global vertex number i and zero at all other As in 1D, we can define the basis functions and the degrees of freedom
vertices. On an element, the three degrees of freedom uniquely determine in a reference cell and then use a mapping from the reference coordinate
the linear basis functions in that element, as usual. The global ϕi (x, y) system to the physical coordinate system. We also need a mapping of
function is then a combination of the linear functions (planar surfaces) local degrees of freedom numbers to global degrees of freedom numbers.
over all the neighboring cells that have vertex number i in common. The reference cell in an (X, Y ) coordinate system has vertices (0, 0),
Figure 3.25 tries to illustrate the shape of such a “pyramid”-like function. (1, 0), and (0, 1), corresponding to local vertex numbers 0, 1, and 2,
respectively. The P1 element has linear functions ϕ̃r (X, Y ) as basis
Element matrices and vectors. As in 1D, we split the integral over
functions, r = 0, 1, 2. Since a linear function ϕ̃r (X, Y ) in 2D is on the
Ω into a sum of integrals over cells. Also as in 1D, ϕi overlaps ϕj (i.e., form Cr,0 + Cr,1 X + Cr,2 Y , and hence has three parameters Cr,0 , Cr,1 ,
ϕi ϕj =
6 0) if and only if i and j are vertices in the same cell. Therefore, and Cr,2 , we need three degrees of freedom. These are in general taken as
the integral of ϕi ϕj over an element is nonzero only when i and j run
the function values at a set of nodes. For the P1 element the set of nodes
over the vertex numbers in the element. These nonzero contributions to is the three vertices. Figure 3.26 displays the geometry of the element
the coefficient matrix are, as in 1D, collected in an element matrix. The
and the location of the nodes.
size of the element matrix becomes 3 × 3 since there are three degrees Requiring ϕ̃r = 1 at node number r and ϕ̃r = 0 at the two other nodes,
of freedom that i and j run over. Again, as in 1D, we number the local
gives three linear equations to determine Cr,0 , Cr,1 , and Cr,2 . The result
vertices in a cell, starting at 0, and add the entries in the element matrix is
into the global system matrix, exactly as in 1D. All details and code
appear below.
Fig. 3.26 2D P1 element.
ϕ̃0 (X, Y ) = 1 − X − Y, (3.55)

ϕ̃1 (X, Y ) = X, (3.56)
ϕ̃2 (X, Y ) = Y (3.57)
Higher-order approximations are obtained by increasing the polynomial

order, adding additional nodes, and letting the degrees of freedom be
function values at the nodes. Figure 3.27 shows the location of the six Fig. 3.28 2D P1, P2, P3, P4, P5, and P6 elements.
nodes in the P2 element.
Fig. 3.29 P1 elements in 1D, 2D, and 3D.
Fig. 3.27 2D P2 element.
A polynomial of degree p in X and Y has np = (p + 1)(p + 2)/2 terms

and hence needs np nodes. The values at the nodes constitute np degrees
of freedom. The location of the nodes for ϕ̃r up to degree 6 is displayed
in Figure 3.28.
The generalization to 3D is straightforward: the reference element is
a tetrahedron with vertices (0, 0, 0), (1, 0, 0), (0, 1, 0), and (0, 0, 1) in a
X, Y, Z reference coordinate system. The P1 element has its degrees of Fig. 3.30 P2 elements in 1D, 2D, and 3D.
freedom as four nodes, which are the four vertices, see Figure 3.29. The
P2 element adds additional nodes along the edges of the cell, yielding a
The interval in 1D, the triangle in 2D, the tetrahedron in 3D, and its
total of 10 nodes and degrees of freedom, see Figure 3.30.
generalizations to higher space dimensions are known as simplex cells (the
geometry) or simplex elements (the geometry, basis functions, degrees of kam 6: Worked example here?
freedom, etc.). The plural forms simplices and simplexes are also a much
used shorter terms when referring to this type of cells or elements. The
side of a simplex is called a face, while the tetrahedron also has edges. 3.6.4 Isoparametric mapping of the reference cell
Acknowledgment. Figures 3.26-3.30 are created by Anders Logg and
Instead of using the P1 basis functions in the mapping (3.58), we may
taken from the FEniCS book: Automated Solution of Differential Equa- use the basis functions of the actual Pd element:
tions by the Finite Element Method, edited by A. Logg, K.-A. Mardal,
and G. N. Wells, published by Springer, 2012. X
x= ϕ̃r (X )xq(e,r) , (3.59)
r
where r runs over all nodes, i.e., all points associated with the degrees of
3.6.3 Affine mapping of the reference cell freedom. This is called an isoparametric mapping. For P1 elements it is
(1) identical to the affine mapping (3.58), but for higher-order elements the
Let ϕ̃r denote the basis functions associated with the P1 element
mapping of the straight or planar faces of the reference cell will result in
in 1D, 2D, or 3D, and let xq(e,r) be the physical coordinates of local
a curved face in the physical coordinate system. For example, when we
vertex number r in cell e. Furthermore, let X be a point in the reference
use the basis functions of the triangular P2 element in 2D in (3.59), the
coordinate system corresponding to the point x in the physical coordinate
straight faces of the reference triangle are mapped onto curved faces of
system. The affine mapping of any X onto x is then defined by
parabolic shape in the physical coordinate system, see Figure 3.32.
X
x= ϕ̃(1)
r (X )xq(e,r) , (3.58)
r x2
where r runs over the local vertex numbers in the cell. The affine mapping X2
essentially stretches, translates, and rotates the triangle. Straight or
planar faces of the reference cell are therefore mapped onto straight or
planar faces in the physical coordinate system. The mapping can be used
for both P1 and higher-order elements, but note that the mapping itself
X1
always applies the P1 basis functions.
x2 x1
X2 local global
Fig. 3.32 Isoparametric mapping of a P2 element.
X1 From (3.58) or (3.59) it is easy to realize that the vertices are correctly
mapped. Consider a vertex with local number s. Then ϕ̃s = 1 at this
vertex and zero at the others. This means that only one term in the sum
x1 is nonzero and x = xq(e,s) , which is the coordinate of this vertex in the
global coordinate system.
local global
Fig. 3.31 Affine mapping of a P1 element.
3.6.5 Computing integrals 3.7.1 Example on approximation in 2D using FEniCS

(e)
Let Ω̃ denote the reference cell and Ω the cell in the physical coordi-
r
Our previous programs for doing 1D approximation by finite element
nate system. The transformation of the integral from the physical to the basis function had a focus on all the small details needed to compute
reference coordinate system reads the solution. When going to 2D and 3D, the basic algorithms are the
same, but the amount of computational details with basis functions,
Z Z reference functions, mappings, numerical integration and so on, becomes
ϕi (x)ϕj (x) dx = ϕ̃i (X )ϕ̃j (X ) det J dX, (3.60) overwhelming because of all the flexibility and choices of elements. For
(e) r
ZΩ ZΩ̃ this purpose, we must, except in the simplest cases with P1 elements, use
ϕi (x)f (x) dx = ϕ̃i (X )f (x(X )) det J dX, (3.61) some well-developed, existing computer library. Here we shall use FEniCS,
Ω (e) Ω̃ r
which is a free, open finite element package for advanced computations.
where dx means the infinitesimal area element dxdy in 2D and dxdydz in The package can be programmed in C++ or Python. How it works is
3D, with a similar definition of dX. The quantity det J is the determinant best illustrated by an example.
of the Jacobian of the mapping x(X ). In 2D,
Mathematical problem. We want to approximate the function f (x) =
" #
∂x ∂x
∂x ∂y ∂x ∂y 2xy − x2 by P1 and P2 elements on [0, 2] × [−1, 1] using a division into
J= ∂X
∂y
∂Y
∂y , det J = − . (3.62) 8 × 8 squares, which are then divided into rectangles and then into
∂X ∂Y ∂Y ∂X
∂X ∂Y
triangles.
With the affine mapping (3.58), det J = 2∆, where ∆ is the area or
The code. Observe that the code employs the basic concepts from 1D,
volume of the cell in the physical coordinate system.
but is capable of using any element in FEniCS on any mesh in any
Remark. Observe that finite elements in 2D and 3D builds on the number of space dimensions (!).
same ideas and concepts as in 1D, but there is simply much more to
compute because the specific mathematical formulas in 2D and 3D are from fenics import *
more complicated and the book keeping with dof maps also gets more def approx(f, V):
complicated. The manual work is tedious, lengthy, and error-prone so """Return Galerkin approximation to f in V."""
automation by the computer is a must.
v = TestFunction(V)
a = u*v*dx
L = f*v*dx
u = Function(V)
3.7 Implementation solve(a == L, u)
return u
Our previous programs for doing 1D approximation by finite element def problem():
basis function had a focus on all the small details needed to compute f = Expression(’2*x[0]*x[1] - pow(x[0], 2)’, degree=2)
the solution. When going to 2D and 3D, the basic algorithms are the mesh = RectangleMesh(Point(0,-1), Point(2,1), 8, 8)
same, but the amount of computational details with basis functions, V1 = FunctionSpace(mesh, ’P’, 1)
reference functions, mappings, numerical integration and so on, becomes u1 = approx(f, V1)
overwhelming because of all the flexibility and choices of elements. For u1.rename(’u1’, ’u1’)
u1_error = errornorm(f, u1, ’L2’)
this purpose, we must, except in the simplest cases with P1 elements, use
some well-developed, existing computer library. Here we shall use FEniCS, V2 = FunctionSpace(mesh, ’P’, 2)
which is a free, open finite element package for advanced computations. u2 = approx(f, V2)
u2.rename(’u2’, ’u2’)
The package can be programmed in C++ or Python. How it works is u2_error = errornorm(f, u2, ’L2’)
best illustrated by an example.
print ’L2 errors: e1=%g, e2=%g’ % (u1_error, u2_error) Integrating SymPy and FEniCS. The definition of f must be expressed
# Simple plotting in C++. This part requires two definitions: one of f and one of Ω, or
plot(f, title=’f’, mesh=mesh)
plot(u1, title=’u1’) more precisely: the mesh (discrete Ω divided into cells). The definition
plot(u2, title=’u2’) of f is here expressed in C++ (it will be compiled for fast evaluation),
where the independent coordinates are given by a C/C++ vector x. This
if __name__ == ’__main__’:
problem() means that x is x[0], y is x[1], and z is x[2]. Moreover, x[0]**2 must
interactive() # Enable plotting be written as pow(x[0], 2) in C/C++.
Figure 3.33 shows the computed u1. The plots of u2 and f are identical Fortunately, we can easily integrate SymPy and Expression objects,
and therefore not shown. The plot itself is not very informative about because SymPy can take a formula and translate it to C/C++ code, and
the approximation quality of P1 elements. The output of errors becomes then we can require a Python code to numerically evaluate the formula.
Here is how we can specify f in SymPy and use it in FEniCS as an
L2 errors: e1=0.0131493, e2=4.93418e-15 Expression object:
>>> import sympy as sym
>>> x, y = sym.symbols(’x[0] x[1]’)
>>> f = 2*x*y - x**2
>>> print f
-x[0]**2 + 2*x[0]*x[1]
1.00
>>> f = sym.printing.ccode(f) # Translate to C code
>>> print f
-1.25 -pow(x[0], 2) + 2*x[0]*x[1]
>>> import fenics as fe
>>> f = fe.Expression(f)
-3.50
Here, the function ccode generates C code and we use x and y as
-5.75 placeholders for x[0] and x[1], which represent the coordinate of a
general point x in any dimension. The output of ccode can then be used
-8.00 directly in Expression.
Fig. 3.33 Plot of the computed approximation using Lagrange elements of second order.
3.7.2 Refined code with curve plotting

Dissection of the code. The function approx is a general solver function Interpolation and projection. The operation with defining a, L, and
for any f andR V . We define the unknown u in the variational form solving for a u is so common that it can be done by single FEniCS
a = a(u, v) = uv dx as a TrialFunction object and the test function v command project:
as a TestFunction object. Then we define the variational form through
u = project(f, V)
the integrand u*v*dx. The linear form L is similarly defined as f*v*dx.
Here, f must be an Expression object in FEniCS, i.e., a formula defined So, there is no need for our approx function!
by its implementation in C++. With a and L defined, we re-define u to If we want to do interpolation (or collocation) instead, we simply do
be a finite element function Function, which is now the unknown scalar
u = interpolate(f, V)
field to be computed by the simple expression solve(a == L, u). We
remark that the above function approx is implemented in FEniCS (in a Plotting the solution along a line. Having u and f available as finite
slightly more general fashion) in the function project. element functions (Function objects), we can easily plot the solution
The problem function applies approx to solve a specific problem. along a line since FEniCS has functionality for evaluating a Function
at arbitrary points inside the domain. For example, here is the code for However, we can now easily compare different type of elements and mesh
plotting u and f along a line x = const or y = const. resolutions:
import numpy as np import fenics as fe
import matplotlib.pyplot as plt import sympy as sym
x, y = sym.symbols(’x[0] x[1]’)
def comparison_plot2D(
u, f, # Function expressions in x and y def problem(f, nx=8, ny=8, degrees=[1,2]):
value=0.5, # x or y equals this value """
variation=’y’, # independent variable Plot u along x=const or y=const for Lagrange elements,
n=100, # no if intervals in plot of given degrees, on a nx times ny mesh. f is a SymPy expression.
tol=1E-8, # tolerance for points inside the domain """
plottitle=’’, # heading in plot f = sym.printing.ccode(f)
filename=’tmp’, # stem of filename f = fe.Expression(f)
): mesh = fe.RectangleMesh(
""" fe.Point(-1, 0), fe.Point(1, 2), 2, 2)
Plot u and f along a line in x or y dir with n intervals for degree in degrees:
and a tolerance of tol for points inside the domain. if degree == 0:
""" # The P0 element is specified like this in FEniCS
v = np.linspace(-1+tol, 1-tol, n+1) V = fe.FunctionSpace(mesh, ’DG’, 0)
# Compute points along specified line: else:
points = np.array([(value, v_) # The Lagrange Pd family of elements, d=1,2,3,...
if variation == ’y’ else (v_, value) V = fe.FunctionSpace(mesh, ’P’, degree)
for v_ in v]) u = fe.project(f, V)
u_values = [u(point) for point in points] # eval. Function u_error = fe.errornorm(f, u, ’L2’)
f_values = [f(point) for point in points] print ’||u-f||=%g’ % u_error, degree
plt.figure() comparison_plot2D(
plt.plot(v, u_values, ’r-’, v, f_values, ’b--’) u, f,
plt.legend([’u’, ’f’], loc=’upper left’) n=50,
if variation == ’y’: value=0.4, variation=’x’,
plt.xlabel(’y’); plt.ylabel(’u, f’) plottitle=’Approximation by P%d elements’ % degree,
else: filename=’approx_fenics_by_P%d’ % degree,
plt.xlabel(’x’); plt.ylabel(’u, f’) tol=1E-3)
plt.title(plottitle) #fe.plot(u, title=’Approx by P%d’ % degree)
plt.savefig(filename + ’.pdf’)
plt.savefig(filename + ’.png’) if __name__ == ’__main__’:
# x and y are global SymPy variables
f = 2*x*y - x**16
Integrating plotting and computations. It is now very easy to give f = 2*x*y - x**2
some graphical impression of the approximations for various kinds of 2D problem(f, nx=2, ny=2, degrees=[0, 1, 2])
elements. Basically, to solve the problem of approximating f = 2xy − x2 plt.show()
on Ω = [−1, 1] × [0, 2] by P2 elements on a 2 × 2 mesh, we want to (We note that this code issues a lot of warnings from the u(point)
integrate the function above with following type of computations: evaluations.)
import fenics as fe We show in Figure 3.34 how f is approximated by P0, P1, and P2
f = fe.Expression(’2*x[0]*x[1] - pow(x[0], 2)’) elements on a very coarse 2 × 2 mesh consisting of 8 cells.
mesh = fe.RectangleMesh(fe.Point(1,-1), fe.Point(2,1), 2, 2)
V = fe.FunctionSpace(mesh, ’P’, 2)
We have also added the result obtained by P2 elements.
u = fe.project(f, V)
err = fe.errornorm(f, u, ’L2’)
print err Questions
3.8 Exercises 131 132 3 Function approximation by finite elements
Problem 3.3: Construct matrix sparsity patterns

0.5 Approximation by P0 elements 0.5 Approximation by P1 elements 0.5 Approximation by P2 elements
u u u
f f f
0.0 0.0 0.0
0.5 0.5 0.5 Problem 3.1 describes a element mesh with a total of five elements,
but with two different element and node orderings. For each of the two
u, f
u, f
u, f
1.0 1.0 1.0
1.5 1.5 1.5 orderings, make a 5 × 5 matrix and fill in the entries that will be nonzero.
2.0
1.0 0.5 0.0
x
0.5 1.0
2.0
1.0 0.5 0.0
x
0.5 1.0
2.0
1.0 0.5 0.0
x
0.5 1.0 Hint. A matrix entry (i, j) is nonzero if i and j are nodes in the same
Fig. 3.34 Comparison of P0, P1, and P2 approximations (left to right) along a line in a element.
2D mesh. Filename: fe_sparsity_pattern.
There are two striking features in the figure: Problem 3.4: Perform symbolic finite element computations
1. The P2 solution is exact. Why? Perform symbolic calculations to find formulas for the coefficient matrix
2. The P1 solution does not seem to be a least squares approxima- and right-hand side when approximating f (x) = sin(x) on Ω = [0, π] by
tion. Why? two P1 elements of size π/2. Solve the system and compare u(π/2) with
the exact value 1.
Filename: fe_sin_P1.
With this code, found in the file approx_fenics.py, we can easily
run lots of experiments with the Lagrange element family. Just write the
Problem 3.5: Approximate a steep function by P1 and P2
SymPy expression and choose the mesh resolution!
elements
Given
3.8 Exercises 1
f (x) = tanh(s(x − ))
2
Problem 3.1: Define nodes and elements use the Galerkin or least squares method with finite elements to find
Consider a domain Ω = [0, 2] divided into the three elements [0, 1], [1, 1.2], an approximate function u(x). Choose s = 20 and try Ne = 4, 8, 16 P1
and [1.2, 2]. elements and Ne = 2, 4, 8 P2 elements. Integrate f ϕi numerically.
For P1 and P2 elements, set up the list of coordinates and nodes Hint. You can automate the computations by calling the approximate
(nodes) and the numbers of the nodes that belong to each element method in the fe_approx1D_numint module.
(elements) in two cases: 1) nodes and elements numbered from left to Filename: fe_tanh_P1P2.
right, and 2) nodes and elements numbered from right to left.
Filename: fe_numberings1.
Problem 3.6: Approximate a steep function by P3 and P4
elements
Problem 3.2: Define vertices, cells, and dof maps a) Solve Problem 3.5 using Ne = 1, 2, 4 P3 and P4 elements.
Repeat Problem 3.1, but define the data structures vertices, cells, b) How will an interpolation method work in this case with the same
and dof_map instead of nodes and elements. number of nodes?
Filename: fe_numberings2. Filename: fe_tanh_P3P4.
Exercise 3.7: Investigate the approximation error in finite (

elements 0 if 0 ≤ x < 1/2,
f (x) =
1 if 1/2 ≤ x ≥ 1/2
The theory (3.42) from Section 3.4.4 predicts that the error in the Pd
approximation of a function should behave as hd+1 , where h is the length by 2, 4, 8, and 16 elements and P1, P2, P3, and P4. Compare approxi-
of the element. Use experiments to verify this asymptotic behavior (i.e., mations visually.
for small enough h). Choose three examples: f (x) = Ae−ωx on [0, 3/ω], Hint. This f can also be expressed in terms of the Heaviside function
√
f (x) = A sin(ωx) on Ω = [0, 2π/ω] for constant A and ω, and f (x) = x H(x): f (x) = H(x − 1/2). Therefore, f can be defined by
on [0, 1].
f = sym.Heaviside(x - sym.Rational(1,2))
Hint 1. Run a series of experiments: (hi , Ei ), i = 0, . . . , m, where Ei is
the L2 norm of the error corresponding to element length hi . Assume an making the approximate function in the fe_approx1D.py module an
error model E = Chr and compute r from two successive experiments: obvious candidate to solve the problem. However, sympy does not handle
symbolic integration with this particular integrand, and the approximate
ri = ln(Ei+1 /Ei )/ ln(hi+1 /hi ), i = 0, . . . , m − 1 . function faces a problem when converting f to a Python function (for
plotting) since Heaviside is not an available function in numpy.
Hopefully, the sequence r0 , . . . , rm−1 converges to the true r, and rm−1
An alternative is to perform hand calculations. This is an instructive
can be taken as an approximation to r. Run such experiments for different
task, but in practice only feasible for few elements and P1 and P2 elements.
d for the different f (x) functions.
It is better to copy the functions element_matrix, element_vector,
Hint 2. The approximate function in fe_approx1D_numint.py is handy assemble, and approximate from the fe_approx1D_numint.py file and
for calculating the numerical solution. This function returns the finite edit these functions such that they can compute approximations with f
element solution as the coefficients {ci }i∈Is . To compute u, use u_glob given as a Python function and not a symbolic expression. Also assume
from the same module. Use the Trapezoidal rule to integrate the L2 that phi computed by the basis function is a Python callable function.
error: Remove all instances of the symbolic variable and associated code.
xc, u = u_glob(c, vertices, cells, dof_map) Filename: fe_Heaviside_P1P2.
e = f_func(xc) - u
L2_error = 0
e2 = e**2
for i in range(len(xc)-1): Exercise 3.9: 2D approximation with orthogonal functions
L2_error += 0.5*(e2[i+1] + e2[i])*(xc[i+1] - xc[i])
L2_error = np.sqrt(L2_error) a) Assume we have basis functions ϕi (x, y) in 2D that are orthogonal
The reason for this Trapezoidal integration is that u_glob returns coor- such that (ϕi , ϕj ) = 0 when i 6= j. The function least_squares in the
dinates xc and corresponding u values where some of the coordinates file approx2D.py will then spend much time on computing off-diagonal
(the cell vertices) coincides, because the solution is computed in one terms in the coefficient matrix that we know are zero. To speed up the
element at a time, using all local nodes. Also note that there are many computations, make a version least_squares_orth that utilizes the
coordinates in xc per cell such that we can accurately compute the error orthogonality among the basis functions.
inside each cell. b) Apply the function to approximate
Filename: Pd_approx_error.
f (x, y) = x(1 − x)y(1 − y)e−x−y
Problem 3.8: Approximate a step function by finite elements on Ω = [0, 1] × [0, 1] via basis functions
Approximate the step function

ϕi (x, y) = sin((p + 1)πx) sin((q + 1)πy), i = q(Nx + 1) + p, d) Repeat the set of experiments in the above point, but use interpola-
tion/collocation based on the node points to compute u(x) (recall that ci
where p = 0, . . . , Nx and q = 0, . . . , Ny . is now simply f (xi )). Compute the error E = ||u − f ||L2 . Which method
Hint. Get ideas from the function least_squares_orth in Section 2.3.3 seems to be most accurate?
and file approx1D.py. Filename: fe_P1_vs_interp.
c) Make a unit test for the least_squares_orth function.

Filename: approx2D_ls_orth.
Exercise 3.12: Implement 3D computations with global basis
functions
Exercise 3.10: Use the Trapezoidal rule and P1 elements Extend the approx2D.py code to 3D applying ideas from Section 2.6.4.
Construct some 3D problem to make a test function for the implementa-
Consider approximation of some f (x) on an interval Ω using the least tion.
squares or Galerkin methods with P1 elements. Derive the element matrix
and vector using the Trapezoidal rule (3.50) for calculating integrals on Hint. Drop symbolic integration since it is in general too slow for 3D prob-
the reference element. Assemble the contributions, assuming a uniform lems. Also use scipy.integrate.nquad instead of sympy.mpmath.quad
cell partitioning, and show that the resulting linear system has the form for numerical integration, since it is much faster.
ci = f (xi ) for i ∈ Is . Filename: approx3D.
Filename: fe_P1_trapez.
Exercise 3.13: Use Simpson’s rule and P2 elements

Exercise 3.11: Compare P1 elements and interpolation Redo Exercise 3.10, but use P2 elements and Simpson’s rule based on
We shall approximate the function sampling the integrands at the nodes in the reference cell.
Filename: fe_P2_simpson.
f (x) = 1 + sin(2πnx), x ∈ Ω = [0, 1],
where n ∈ Z and ≥ 0.
Exercise 3.14: Make a 3D code for Lagrange elements of
a) Plot f (x) for n = 1, 2, 3 and find the wave length of the function. arbitrary order
b) We want to use NP elements per wave length. Show that the number Extend the code from Section 3.7.2 to 3D.
of elements is then nNP .
c) The critical quantity for accuracy is the number of elements per wave
length, not the element size in itself. It therefore suffices to study an f
with just one wave length in Ω = [0, 1]. Set = 0.5.
Run the least squares or projection/Galerkin method for NP =
2, 4, 8, 16, 32. Compute the error E = ||u − f ||L2 .
Hint 1. Use the fe_approx1D_numint module to compute u and use
the technique from Section 3.4.4 to compute the norm of the error.
Hint 2. Read up on the Nyquist–Shannon sampling theorem.
138 4 Variational formulations with global basis functions
The coming sections address at how global basis functions and the
Variational formulations with
4
least squares, Galerkin, and collocation principles can be used to solve
differential equations.
global basis functions
4.1.1 Differential equation models
Let us consider an abstract differential equation for a function u(x) of
one variable, written as
L(u) = 0, x∈Ω. (4.1)

Here are a few examples on possible choices of L(u), of increasing com-
plexity:
d2 u
The finite element method is a very flexible approach for solving partial L(u) = − f (x), (4.2)
dx2
differential equations. Its two most attractive features are the ease of d du

handling domains of complex shape in two and three dimensions and L(u) = α(x) + f (x), (4.3)
dx dx
the ease of using higher-degree polynomials in the approximations. The d

du

latter feature typically leads to errors proportional to hd+1 , where h is L(u) = α(u) − au + f (x), (4.4)
dx dx
the element length and d is the polynomial degree. When the solution d

du

is sufficiently smooth, the ability to use larger d creates methods that L(u) = α(u) + f (u, x) . (4.5)
dx dx
are much more computationally efficient than standard finite difference
methods (and equally efficient finite difference methods are technically Both α(x) and f (x) are considered as specified functions, while a is
much harder to construct). a prescribed parameter. Differential equations corresponding to (4.2)-
However, before we attack finite element methods, with localized basis (4.3) arise in diffusion phenomena, such as stationary (time-independent)
functions, it can be easier from a pedagogical point of view to study transport of heat in solids and flow of viscous fluids between flat plates.
approximations by global functions because the mathematics in this case The form (4.4) arises when transient diffusion or wave phenomena are
gets simpler. discretized in time by finite differences. The equation (4.5) appears in
chemical models when diffusion of a substance is combined with chemical
reactions. Also in biology, (4.5) plays an important role, both for spreading
4.1 Basic principles for approximating differential of species and in models involving generation and propagation of electrical
signals.
equations
Let Ω = [0, L] be the domain in one space dimension. In addition
to the differential equation, u must fulfill boundary conditions at the
The finite element method is usually applied for discretization in space, boundaries of the domain, x = 0 and x = L. When L contains up to
and therefore spatial problems will be our focus in the coming sections. second-order derivatives, as in the examples above, we need one boundary
Extensions to time-dependent problems usually employs finite difference condition at each of the (two) boundary points, here abstractly specified
approximations in time. as

4.1 Basic principles for approximating differential equations 139 140 4 Variational formulations with global basis functions
B0 (u) = 0, x = 0, B1 (u) = 0, x = L (4.6) − u00 (x) = f (x), x ∈ Ω = [0, L], u(0) = 0, u(L) = D . (4.11)
There are three common choices of boundary conditions: A closely related problem with a different boundary condition at x = 0
reads
Bi (u) = u − g, Dirichlet condition (4.7)
du − u00 (x) = f (x), x ∈ Ω = [0, L], u0 (0) = C, u(L) = D . (4.12)
Bi (u) = −α − g, Neumann condition (4.8)
dx
du A third variant has a variable coefficient,
Bi (u) = −α − H(u − g), Robin condition (4.9)
dx
Here, g and H are specified quantities. − (α(x)u0 (x))0 = f (x), x ∈ Ω = [0, L],
u0 (0) = C, u(L) = D .
From now on we shall use ue (x) as symbol for the exact solution, (4.13)
fulfilling The solution u to the model problem (4.11) can be determined as
Z x
L(ue ) = 0, x ∈ Ω, (4.10) u0 (x) = − f (x) + c0 ,
0
while u(x) is our notation for an approximate solution of the differential Z x
equation. u(x) = u0 (x) + c1 ,
0
where c0 and c1 are determined by the boundary conditions such that

Remark on notation
u0 (0) = C and u(L) = D.
In the literature about the finite element method, it is common to use Computing the solution is easily done using sympy. Some common
u as the exact solution and uh as the approximate solution, where h code is defined first:
is a discretization parameter. However, the vast part of the present
import sympy as sym
text is about the approximate solutions, and having a subscript h x, L, C, D, c_0, c_1, = sym.symbols(’x L C D c_0 c_1’)
attached all the time is cumbersome. Of equal importance is the
close correspondence between implementation and mathematics The following function computes the solution symbolically for the model
that we strive to achieve in this text: when it is natural to use u and problem (4.11):
not u_h in code, we let the mathematical notation be dictated by def model1(f, L, D):
the code’s preferred notation. In the relatively few cases where we """Solve -u’’ = f(x), u(0)=0, u(L)=D."""
need to work with the exact solution of the PDE problem we call it # Integrate twice
u_x = - sym.integrate(f, (x, 0, x)) + c_0
ue in mathematics and u_e in the code (the function for computing u = sym.integrate(u_x, (x, 0, x)) + c_1
u_e is named u_exact). # Set up 2 equations from the 2 boundary conditions and solve
# with respect to the integration constants c_0, c_1
r = sym.solve([u.subs(x, 0)-0, # x=0 condition
u.subs(x,L)-D], # x=L condition
[c_0, c_1]) # unknowns
# Substitute the integration constants in the solution
4.1.2 Simple model problems and their solutions u = u.subs(c_0, r[c_0]).subs(c_1, r[c_1])
u = sym.simplify(sym.expand(u))
return u
A common model problem used much in the forthcoming examples is
Calling model1(2, L, D) results in the solution
1
C tan−1 (L) − C tan−1 (x) + D tan−1 (x)
x D + L2 − Lx
u(x) = (4.14) u(x) =
L tan−1 (L)
The model problem (4.12) can be solved by
def model2(f, L, C, D):
"""Solve -u’’ = f(x), u’(0)=C, u(L)=D.""" 4.1.3 Forming the residual
u_x = - sym.integrate(f, (x, 0, x)) + c_0
u = sym.integrate(u_x, (x, 0, x)) + c_1 The fundamental idea is to seek an approximate solution u in some space
r = sym.solve([sym.diff(u,x).subs(x, 0)-C, # x=0 cond. V,
u.subs(x,L)-D], # x=L cond.
[c_0, c_1])
u = u.subs(c_0, r[c_0]).subs(c_1, r[c_1]) V = span{ψ0 (x), . . . , ψN (x)},
u = sym.simplify(sym.expand(u))
return u which means that u can always be expressed as a linear combination of
the basis functions {ψj }j∈Is , with Is as the index set {0, . . . , N }:
to yield
X
u(x) = cj ψj (x) .
u(x) = −x2 + Cx − CL + D + L2 , (4.15) j∈Is
if f (x) = 2. Model (4.13) requires a bit more involved code, The coefficients {cj }j∈Is are unknowns to be computed.
def model3(f, a, L, C, D): (Later, in Section 5.2, we will see that if we specify boundary values
"""Solve -(a*u’)’ = f(x), u(0)=C, u(L)=D.""" of u different from zero, we must look for an approximate solution
P P
au_x = - sym.integrate(f, (x, 0, x)) + c_0 u(x) = B(x)+ j cj ψj (x), where j cj ψj ∈ V and B(x) is some function
u = sym.integrate(au_x/a, (x, 0, x)) + c_1
r = sym.solve([u.subs(x, 0)-C, for incorporating the right boundary values. Because of B(x), u will not
u.subs(x,L)-D], necessarily lie in V . This modification does not imply any difficulties.)
[c_0, c_1]) We need principles for deriving N + 1 equations to determine the N + 1
u = u.subs(c_0, r[c_0]).subs(c_1, r[c_1])
u = sym.simplify(sym.expand(u)) unknowns {ci }i∈Is . When approximating a given function f by u =
P
return u j cj ϕj , a key idea is to minimize the square norm of the approximation
error e = u − f or (equivalently) demand that e is orthogonal to V .
def demo(): Working with e is not so useful here since the approximation error in
f = 2 our case is e = ue − u and ue is unknown. The only general indicator
u = model1(f, L, D)
we have on the quality of the approximate solution is to what degree u
print ’model1:’, u, u.subs(x, 0), u.subs(x, L) P
print sym.latex(u, mode=’plain’) fulfills the differential equation. Inserting u = j cj ψj into L(u) reveals
u = model2(f, L, C, D) that the result is not zero, because u in general is an approximation and
#f = x
#u = model2(f, L, C, D)
not identical to ue . The nonzero result,
print ’model2:’, u, sym.diff(u, x).subs(x, 0), u.subs(x, L) X
print sym.latex(u, mode=’plain’) R = L(u) = L( cj ψj ), (4.16)
u = model3(0, 1+x**2, L, C, D) j
print ’model3:’, u, u.subs(x, 0), u.subs(x, L)
print sym.latex(u, mode=’plain’) is called the residual and measures the error in fulfilling the governing
equation.
if __name__ == ’__main__’:
demo() Various principles for determining {cj }j∈Is try to minimize R in some
sense. Note that R varies with x and the {cj }j∈Is parameters. We may
With f (x) = 0 and α(x) = 1 + x2 we get write this dependence explicitly as
R = R(x; c0 , . . . , cN ) . (4.17) orthogonal to the N + 1 basis functions {ψi } spanning V (and this is the
most convenient way to express (4.22):
Below, we present three principles for making R small: a least squares
method, a projection or Galerkin method, and a collocation or interpola-
(R, ψi ) = 0, i ∈ Is , (4.23)
tion method.
resulting in N + 1 equations for determining {ci }i∈Is .
4.1.4 The least squares method

4.1.6 The method of weighted residuals
The least-squares method aims to find {ci }i∈Is such that the square norm
of the residual A generalization of the Galerkin method is to demand that R is orthogonal
Z to some space W , but not necessarily the same space as V where we
||R|| = (R, R) = R2 dx (4.18) seek the unknown function. This generalization is called the method of
Ω
weighted residuals:
is minimized. By introducing an inner product of two functions f and g
on Ω as (R, v) = 0, ∀v ∈ W . (4.24)
Z
(f, g) = f (x)g(x) dx, (4.19) If {w0 , . . . , wN } is a basis for W , we can equivalently express the method
Ω of weighted residuals as
the least-squares method can be defined as
(R, wi ) = 0, i ∈ Is . (4.25)
min E = (R, R) . (4.20) The result is N + 1 equations for {ci }i∈Is .
c0 ,...,cN
The least-squares method can also be viewed as a weighted residual
Differentiating with respect to the free parameters {ci }i∈Is gives the
method with wi = ∂R/∂ci .
N + 1 equations
Z
∂R ∂R Variational formulation of the continuous problem
2R dx = 0 ⇔ (R, ) = 0, i ∈ Is . (4.21)
Ω ∂ci ∂ci
Statements like (4.22), (4.23), (4.24), or (4.25)) are known as weak
formulations or variational formulations. These equations are in this
4.1.5 The Galerkin method text primarily used for a numerical approximation u ∈ V , where V
is a finite-dimensional space with dimension N + 1. However, we
The least-squares principle is equivalent to demanding the error to be may also let the exact solution ue fulfill a variational formulation
orthogonal to the space V when approximating a function f by u ∈ V . (L(ue ), v) = 0 ∀v ∈ V , but the exact solution lies in general in
With a differential equation we do not know the true error so we must a space with infinite dimensions (because an infinite number of
instead require the residual R to be orthogonal to V . This idea implies parameters are needed to specify the solution). The variational for-
seeking {ci }i∈Is such that mulation for ue in an infinite-dimensional space V is a mathematical
way of stating the problem and acts as an alternative to the usual
(R, v) = 0, ∀v ∈ V . (4.22) (strong) formulation of a differential equation with initial and/or
This is the Galerkin method for differential equations. boundary conditions.
The above abstract statement can be made concrete by choosing a
concrete basis. For example, the statement is equivalent to R being
Much of the literature on finite element methods takes a dif- 4.1.8 The collocation method
ferential equation problem and first transforms it to a variational
The idea of the collocation method is to demand that R vanishes at
formulation in an infinite-dimensional space V , before searching
for an approximate solution in a finite-dimensional subspace of V . N + 1 selected points x0 , . . . , xN in Ω:
However, we prefer the more intuitive approach with an approx-
imate solution u in a finite-dimensional space V inserted in the R(xi ; c0 , . . . , cN ) = 0, i ∈ Is . (4.26)
differential equation, and then the resulting residual is demanded The collocation method can also be viewed as a method of weighted
to be orthogonal to V . residuals with Dirac delta functions as weighting functions. Let δ(x − xi )
be the Dirac delta function centered around x = xi with the properties
that δ(x − xi ) = 0 for x 6= xi and
Z
Remark on terminology f (x)δ(x − xi ) dx = f (xi ), xi ∈ Ω . (4.27)
Ω
The terms weak or variational formulations often refer to a statement
Intuitively, we may think of δ(xR − xi ) as a very peak-shaped function
like (4.22) or (4.24) after integration by parts has been performed
around x = xi with an integral −∞ ∞
δ(x − xi )dx that evaluates to unity.
(the integration by parts technique is explained in Section 4.1.10).
Mathematically, it can be shown that δ(x − xi ) is the limit of a Gaussian
The result after integration by parts is what is obtained after taking
function centered at x = xi with a standard deviation that approaches
the first variation of a minimization problem (see Section 4.2.4).
zero. Using this latter model, we can roughly visualize delta functions
However, in this text we use variational formulation as a common
as done in Figure 4.1. Because of (4.27), we can let wi = δ(x − xi )
term for formulations which, in contrast to the differential equation
be weighting functions in the method of weighted residuals, and (4.25)
R = 0, instead demand that an average of R is zero: (R, v) = 0 for
becomes equivalent to (4.26).
all v in some space.
4.1.7 Test and trial functions

40
In the context of the Galerkin method and the method of weighted
residuals it is common to use the name trial function for the approximate
P
u = j cj ψj . The space containing the trial function is known as the 30
trial space. The function v entering the orthogonality requirement in the
Galerkin method and the method of weighted residuals is called test
w
20
function, and so are the ψi or wi functions that are used as weights in
the inner products with the residual. The space where the test functions
comes from is naturally called the test space. 10
We see that in the method of weighted residuals the test and trial
spaces are different and so are the test and trial functions. In the Galerkin
method the test and trial spaces are the same (so far). 00.0 0.2 0.4 0.6 0.8 1.0
x
Fig. 4.1 Approximation of delta functions by narrow Gaussian functions.

The subdomain collocation method. The idea of this approach is to

demand the integral of R to vanish over N + 1 subdomains Ωi of Ω:
Z R(x; c0 , . . . , cN ) = u00 (x) + f (x),
 
R dx = 0, i ∈ Is . (4.28) d2 X
Ωi = 2 cj ψj (x) + f (x),
dx j∈I
This statement can also be expressed as a weighted residual method s
X
Z = cj ψj00 (x) + f (x) . (4.33)
Rwi dx = 0, i ∈ Is , (4.29) j∈Is
Ω
where wi = 1 for x ∈ Ωi and wi = 0 otherwise. The least squares method. The equations (4.21) in the least squares
method require an expression for ∂R/∂ci . We have
 
4.1.9 Examples on using the principles ∂R ∂ X X ∂cj
= cj ψj00 (x) + f (x) = ψ 00 (x) = ψi00 (x) . (4.34)
Let us now apply global basis functions to illustrate the different principles ∂ci ∂ci j∈I
s j∈I
∂ci j
s
for making the residual R small.
The governing equations for the unknown parameters {cj }j∈Is are then
The model problem. We consider the differential equation problem
X
( cj ψj00 + f, ψi00 ) = 0, i ∈ Is , (4.35)
− u (x) = f (x),
00
x ∈ Ω = [0, L], u(0) = 0, u(L) = 0 . (4.30) j
which can be rearranged as

Basis functions. Our choice of basis functions ψi for V is
X
(ψi00 , ψj00 )cj = −(f, ψi00 ), i ∈ Is . (4.36)
x
ψi (x) = sin (i + 1)π , i ∈ Is . (4.31) j∈Is
L
This is nothing but a linear system
An important property of these functions is that ψi (0) = ψi (L) = 0,
X
which means that the boundary conditions on u are fulfilled: Ai,j cj = bi , i ∈ Is .
X X j∈Is
u(0) = cj ψj (0) = 0, u(L) = cj ψj (L) = 0 .
j j
The entries in the coefficient matrix are given by
Another nice property is that the chosen sine functions are orthogonal
on Ω: Ai,j = (ψi00 , ψj00 )
Z L
x x
( = π 4 (i + 1)2 (j + 1)2 L−4 sin (i + 1)π sin (j + 1)π dx
ZL 1 0 L L
x x 2L i=j
sin (i + 1)π sin (j + 1)π dx = (4.32)
L L 0, i 6= j The orthogonality of the sine functions simplify the coefficient matrix:
0
(
1 −3 4
provided i and j are integers. 2 L π (i + 1)4 i = j
Ai,j = (4.37)
0, i 6= j
The residual. We can readily calculate the following explicit expression
for the residual: The right-hand side reads
Z L (u00 , v) = −(f, v), ∀v ∈ V . (4.41)

x
bi = −(f, ψi00 ) = (i + 1)2 π 2 L−2 f (x) sin (i + 1)π dx (4.38)
0 L This is the variational formulation, based on the Galerkin principle, of our
differential equation. The ∀v ∈ V requirement is equivalent to demanding
Since the coefficient matrix is diagonal we can easily solve for the equation (u00 , v) = −(f, v) to be fulfilled for all basis functions v = ψi ,
2L
Z L
x
i ∈ Is , see (4.22) and (4.23). We therefore have
ci = f (x) sin (i + 1)π dx . (4.39)
π 2 (i + 1)2 0 L X
( cj ψj00 , ψi ) = −(f, ψi ), i ∈ Is . (4.42)
With the special choice of f (x) = 2, the coefficients can be calculated in j∈Is
sympy by This equation can be rearranged to a form that explicitly shows that we
import sympy as sym get a linear system for the unknowns {cj }j∈Is :
i, j = sym.symbols(’i j’, integer=True) X
x, L = sym.symbols(’x L’) (ψi , ψj00 )cj = (f, ψi ), i ∈ Is . (4.43)
f = 2 j∈Is
a = 2*L/(sym.pi**2*(i+1)**2)
c_i = a*sym.integrate(f*sym.sin((i+1)*sym.pi*x/L), (x, 0, L)) For the particular choice of the basis functions (4.31) we get in fact
c_i = simplify(c_i) the same linear system as in the least squares method because ψ 00 =
print c_i −(i + 1)2 π 2 L−2 ψ. Consequently, the solution u(x) becomes identical to
The answer becomes the one produced by the least squares method.
The collocation method. For the collocation method (4.26) we need
L2 (−1)i + 1 to decide upon a set of N + 1 collocation points in Ω. A simple choice
ci = 4 is to use uniformly spaced points: xi = i∆x, where ∆x = L/N in our
π 3 (i3 + 3i2 + 3i + 1)
case (N ≥ 1). However, these points lead to at least two rows in the
Now, 1 + (−1)i = 0 for i odd, so only the coefficients with even index
matrix consisting of zeros (since ψi (x0 ) = 0 and ψi (xN ) = 0), thereby
are nonzero. Introducing i = 2k for k = 0, . . . , N/2 to count the relevant
making the matrix singular and non-invertible. This forces us to choose
indices (for N odd, k goes to (N − 1)/2), we get the solution
some other collocation points, e.g., random points or points uniformly
distributed in the interior of Ω. Demanding the residual to vanish at
8L2
N/2
X x
u(x) = sin (2k + 1)π . (4.40) these points leads, in our model problem (4.30), to the equations
π 3 (2k + 1)3 L
k=0 X
The coefficients decay very fast: c2 = c0 /27, c4 = c0 /125. The solution − cj ψj00 (xi ) = f (xi ), i ∈ Is , (4.44)
will therefore be dominated by the first term, j∈Is
which is seen to be a linear system with entries

8L2 x
u(x) ≈ 3 sin π .
xi

π L Ai,j = −ψj00 (xi ) = (j + 1)2 π 2 L−2 sin (j + 1)π ,
L
The Galerkin method. The Galerkin principle (4.22) applied to (4.30)
consists of inserting our special residual (4.33) in (4.22) in the coefficient matrix and entries bi = 2 for the right-hand side (when
f (x) = 2).
(u00 + f, v) = 0, ∀v ∈ V, The special case of N = 0 can sometimes be of interest. A natural
choice is then the midpoint x0 = L/2 of the domain, resulting in A0,0 =
or −ψ000 (x0 ) = π 2 L−2 , f (x0 ) = 2, and hence c0 = 2L2 /π 2 .
Comparison. In the present model problem, with f (x) = 2, the exact and the method of weighted residuals can, however, be applied together
solution is u(x) = x(L − x), while for N = 0 the Galerkin and least with finite element basis functions if we use integration by parts as a
squares method result in u(x) = 8L2 π −3 sin(πx/L) and the collocation means for transforming a second-order derivative to a first-order one.
method leads to u(x) = 2L2 π −2 sin(πx/L). We can quickly use sympy to Consider the model problem (4.30) and its Galerkin formulation
verify that the maximum error occurs at the midpoint x = L/2 and find
what the errors are. First we set up the error expressions: −(u00 , v) = (f, v) ∀v ∈ V .
>>> import sympy as sym Using integration by parts in the Galerkin method, we can “move” a
>>> # Computing with Dirichlet conditions: -u’’=2 and sines derivative of u onto v:
>>> x, L = sym.symbols(’x L’)
>>> e_Galerkin = x*(L-x) - 8*L**2*sym.pi**(-3)*sym.sin(sym.pi*x/L)
>>> e_colloc = x*(L-x) - 2*L**2*sym.pi**(-2)*sym.sin(sym.pi*x/L) Z L Z L
If the derivative of the errors vanish at x = L/2, the errors reach their u00 (x)v(x) dx = − u0 (x)v 0 (x) dx + [vu0 ]L
0
0 0
maximum values here (the errors vanish at the boundary points). Z L
=− u0 (x)v 0 (x) dx + u0 (L)v(L) − u0 (0)v(0) .
>>> dedx_Galerkin = sym.diff(e_Galerkin, x) 0
>>> dedx_Galerkin.subs(x, L/2) (4.45)
0
>>> dedx_colloc = sym.diff(e_colloc, x) Usually, one integrates the problem at the stage where the u and v
>>> dedx_colloc.subs(x, L/2)
0 functions enter the formulation. Alternatively, but less common, we can
integrate by parts in the expressions for the matrix entries:
Finally, we can compute the maximum error at x = L/2 and evaluate
the expressions numerically with three decimals: Z Z
L L
>>> sym.simplify(e_Galerkin.subs(x, L/2).evalf(n=3)) ψi (x)ψj00 (x) dx = − ψi0 (x)ψj0 (x)dx + [ψi ψj0 ]L
0
-0.00812*L**2 0 0
Z L
>>> sym.simplify(e_colloc.subs(x, L/2).evalf(n=3))
0.0473*L**2 =− ψi0 (x)ψj0 (x) dx + ψi (L)ψj0 (L) − ψi (0)ψj0 (0) .
0
The error in the collocation method is about 6 times larger than the (4.46)
error in the Galerkin or least squares method.
Integration by parts serves to reduce the order of the derivatives and to
make the coefficient matrix symmetric since (ψi0 , ψj0 ) = (ψj0 , ψi0 ). The sym-
metry property depends on the type of terms that enter the differential
4.1.10 Integration by parts equation. As will be seen later in Section 5.3, integration by parts also
A problem arises if we want to apply popular finite element functions to provides a method for implementing boundary conditions involving u0 .
solve our model problem (4.30) by the standard least squares, Galerkin, or With the choice (4.31) of basis functions we see that the “boundary
collocation methods: the piecewise polynomials ψi (x) have discontinuous terms” ψi (L)ψj0 (L) and ψi (0)ψj0 (0) vanish since ψi (0) = ψi (L) = 0. We
derivatives at the cell boundaries which makes it problematic to compute therefore end up with the following alternative Galerkin formulation:
the second-order derivative. This fact actually makes the least squares
and collocation methods less suitable for finite element approximation of −(u00 , v) = (u0 , v 0 ) = (f, v) ∀v ∈ V .
the unknown function. (By rewriting the equation −u00 = f as a system of
Weak form. Since the variational formulation after integration by parts
two first-order equations, u0 = v and −v 0 = f , the least squares method
make weaker demands on the differentiability of u and the basis functions
can be applied. Also, differentiating discontinuous functions can actually
ψi , the resulting integral formulation is referred to as a weak form of the
be handled by distribution theory in mathematics.) The Galerkin method
differential equation problem. The original variational formulation with X

x
second-order derivatives, or the differential equation problem with second- u(x) = D+ cj ψj (x), (4.47)
L
order derivative, is then the strong form, with stronger requirements on j∈I s
the differentiability of the functions. with ψi (0) = ψi (L) = 0.

For differential equations with second-order derivatives, expressed as The particular shape of the B(x) function is not important as long as
variational formulations and solved by finite element methods, we will its boundary values are correct. For example, B(x) = D(x/L)p for any
always perform integration by parts to arrive at expressions involving power p will work fine in the above example. Another choice could be
only first-order derivatives. B(x) = D sin(πx/(2L)).
As a more general example, consider a domain Ω = [a, b] where the
boundary conditions are u(a) = Ua and u(b) = Ub . A class of possible
4.1.11 Boundary function B(x) functions is
So far we have assumed zero Dirichlet boundary conditions, typically Ub − Ua

B(x) = Ua + (x − a)p , p > 0. (4.48)
u(0) = u(L) = 0, and we have demanded that ψi (0) = ψi (L) = 0 for (b − a)p
i ∈ Is . What about a boundary condition like u(L) = D = 6 0? This Real applications will most likely use the simplest version, p = 1, but
P
condition immediately faces a problem: u = j cj ϕj (L) = 0 since all here such a p parameter was included to demonstrate that there are
ϕi (L) = 0. many choices of B(x) in a problem. Fortunately, there is a general,
We remark that we faced exactly the same problem in Section 2.3.2 unique technique for constructing B(x) when we use finite element basis
where we considered Fourier series approximations of functions that functions for V .
where non-zero at the boundaries. We will use the same trick as we did kam 7: in the below, I cannot really find where it is stated that we
earlier to get around this problem. need to adjust the right-hand side as well hpl 8: I don’t understand
A boundary condition of the form u(L) = D can be implemented by what you mean.
demanding that all ψi (L) = 0, but adding a boundary function B(x) with
the right boundary value, B(L) = D, to the expansion for u:
How to deal with nonzero Dirichlet conditions
X
u(x) = B(x) + cj ψj (x) . The general procedure of incorporating Dirichlet boundary condi-
j∈Is tions goes as follows. Let ∂ΩE be the part(s) of the boundary ∂Ω
This u gets the right value at x = L: of the domain Ω where u is specified. Set ψi = 0 at the points in
X
∂ΩE and seek u as
u(L) = B(L) + cj ψj (L) = B(L) = D . X
j∈Is u(x) = B(x) + cj ψj (x), (4.49)
The idea is that for any boundary where u is known we demand ψi to j∈Is
vanish and construct a function B(x) to attain the boundary value of u. where B(x) equals the boundary conditions on u at ∂ΩE .
There are no restrictions on how B(x) varies with x in the interior of the
domain, so this variation needs to be constructed in some way. Exactly
how we decide the variation to be, is not important. Remark. With the B(x) term, u does not in general lie in V =
For example, with u(0) = 0 and u(L) = D, we can choose B(x) = span {ψ0 , . . . , ψN } anymore. Moreover, when a prescribed value of u
xD/L, since this form ensures that B(x) fulfills the boundary conditions: at the boundary, say u(a) = Ua is different from zero, it does not make
B(0) = 0 and B(L) = D. The unknown function is then sought on the sense to say that u lies in a vector space, because this space does not
form obey the requirements of addition and scalar multiplication. For example,
4.2 Computing with global polynomials 155 156 4 Variational formulations with global basis functions
Z 1 Z 1
2u does not lie in the space since its boundary value is 2Ua , which is
Ai,j = (ψj0 , ψi0 ) = ψi0 (x)ψj0 (x) dx = (i + 1)(j + 1)(1 − x)i+j dx
incorrect. It only makes sense to split u in two parts, as done above, and 0 0
P
have the unknown part j cj ψj in a proper function space. (i + 1)(j + 1)
= ,
i+j+1
bi = (2, ψi ) − (D, ψi0 ) − Cψi (0)
4.2 Computing with global polynomials Z 1
= (2ψi (x) − Dψi0 (x)) dx − Cψi (0)
0
Z
The next example uses global polynomials and shows that if our solu- 1
= 2(1 − x)i+1 + D(i + 1)(1 − x)i dx − C
tion, modulo boundary conditions, lies in the space spanned by these 0
polynomials, then the Galerkin method recovers the exact solution. (D − C)(i + 2) + 2 2
= =D−C + .
i+2 i+2
Relevant sympy commands to help calculate these expressions are
4.2.1 Computing with Dirichlet and Neumann conditions
from sympy import *
Let us perform the necessary calculations to solve x, C, D = symbols(’x C D’)
i, j = symbols(’i j’, integer=True, positive=True)
psi_i = (1-x)**(i+1)
−u00 (x) = 2, x ∈ Ω = [0, 1], u0 (0) = C, u(1) = D, psi_j = psi_i.subs(i, j)
integrand = diff(psi_i, x)*diff(psi_j, x)
using a global polynomial basis ψi ∼ xi . The requirements on ψi is that integrand = simplify(integrand)
ψi (1) = 0, because u is specified at x = 1, so a proper set of polynomial A_ij = integrate(integrand, (x, 0, 1))
basis functions can be A_ij = simplify(A_ij)
print ’A_ij:’, A_ij
f = 2
ψi (x) = (1 − x)i+1 , i ∈ Is . b_i = integrate(f*psi_i, (x, 0, 1)) - \
integrate(diff(D*x, x)*diff(psi_i, x), (x, 0, 1)) - \
A suitable B(x) function to handle the boundary condition u(1) = D is C*psi_i.subs(x, 0)
B(x) = Dx. The variational formulation becomes b_i = simplify(b_i)
print ’b_i:’, b_i
(u0 , v 0 ) = (2, v) − Cv(0) ∀v ∈ V . The output becomes

P
From inserting u = B + j cj ψj and choosing v = ψi we get A_ij: (i + 1)*(j + 1)/(i + j + 1)
b_i: ((-C + D)*(i + 2) + 2)/(i + 2)
X
(ψj0 , ψi0 )cj = (2, ψi ) − (B 0 , ψi0 ) − Cψi (0), i ∈ Is . We can now choose some N and form the linear system, say for N = 1:
j∈Is
N = 1
The entries in the linear system are then A = zeros((N+1, N+1))
b = zeros(N+1)
print ’fresh b:’, b
for r in range(N+1):
for s in range(N+1):
A[r,s] = A_ij.subs(i, r).subs(j, s)
b[r,0] = b_i.subs(i, r)
The system becomes

! ! !
1 1 c0 1−C +D Since these are two variational statements in the same space, we can
=
1 4/3 c1 2/3 − C + D subtract them and use the bilinear property of a(·, ·):
The solution (c = A.LUsolve(b)) becomes c0 = 2 − C + D and c1 = −1,
resulting in a(B + E, v) − a(B + F, v) = L(v) − L(v)
a(B + E − (B + F ), v) = 0
u(x) = 1 − x2 + D + C(x − 1), (4.50)
a(E − F ), v) = 0
We can form this u in sympy and check that the differential equation and
the boundary conditions are satisfied: If a(E − F ), v) = 0 for all v in V , then E − F must be zero everywhere
in the domain, i.e., E = F . Or in other words: u = ue . This proves that
u = sum(c[r,0]*psi_i.subs(i, r) for r in range(N+1)) + D*x
print ’u:’, simplify(u) the exact solution is recovered if ue − B lies in V ., i.e., can be expressed
P
print "u’’:", simplify(diff(u, x, x)) as j∈Is dj ψj where {ψj }j∈Is is a basis for V . The method will then
print ’BC x=0:’, simplify(diff(u, x).subs(x, 0)) compute the solution cj = dj , j ∈ Is .
print ’BC x=1:’, simplify(u.subs(x, 1))
The case treated in Section 4.2.1 is of the type where ue − B is a
The output becomes quadratic function that is 0 at x = 1, and therefore (ue − B) ∈ V , and
the method finds the exact solution.
u: C*x - C + D - x**2 + 1
u’’: -2
BC x=0: C
BC x=1: D
4.2.3 Abstract notation for variational formulations
The complete sympy code is found in u_xx_2_CD.py.
The exact solution is found by integrating twice and applying the We have seen that variational formulations end up with a formula in-
boundary conditions, either by hand or using sympy as shown in Sec- volving u and v, such as (u0 , v 0 ) and a formula involving v and known
tion 4.1.2. It appears that the numerical solution coincides with the exact functions, such as (f, v). A widely used notation is to introduce an
one. This result is to be expected because if (ue − B) ∈ V , u = ue , as abstract variational statement written as
proved next.
a(u, v) = L(v) ∀v ∈ V,
where a(u, v) is a so-called bilinear form involving all the terms that
4.2.2 When the numerical method is exact contain both the test and trial function, while L(v) is a linear form
containing all the terms without the trial function. For example, the
We have some variational formulation: find (u − B) ∈ V such that statement
a(u, v) = L(u) ∀v ∈ V . The exact solution also fulfills a(ue , v) = L(v),
Z Z
but normally (ue − B) lies in a much larger (infinite-dimensional) space.
u0 v 0 dx = f v dx or (u0 , v 0 ) = (f, v) ∀v ∈ V
Suppose, nevertheless, that ue − B = E, where E ∈ V . That is, apart Ω Ω
from Dirichlet conditions, ue lies in our finite-dimensional space V which can be written in abstract form: find u such that
we use to compute u. Writing also u on the same form u = B + F , F ∈ V ,
we have a(u, v) = L(v) ∀v ∈ V,
where we have the definitions
a(B + E, v) = L(v) ∀v ∈ V,
a(B + F, v) = L(v) ∀v ∈ V . a(u, v) = (u0 , v 0 ), L(v) = (f, v) .
The term linear means that 4.2.4 Variational problems and minimization of functionals
Example. Many physical problems can be modeled as partial differential
L(α1 v1 + α2 v2 ) = α1 L(v1 ) + α2 L(v2 )
equations and as minimization problems. For example, the deflection
for two test functions v1 and v2 , and scalar parameters α1 and α2 . u(x) of an elastic string subject to a transversal force f (x) is governed
Similarly, the term bilinear means that a(u, v) is linear in both its by the differential equation problem
arguments:
−u00 (x) = f (x), x ∈ (0, L), x(0) = x(L) = 0 .
a(α1 u1 + α2 u2 , v) = α1 a(u1 , v) + α2 a(u2 , v), Equivalently, the deflection u(x) is the function v that minimizes the
a(u, α1 v1 + α2 v2 ) = α1 a(u, v1 ) + α2 a(u, v2 ) . potential energy F (v) in a string,
Z
In nonlinear problems these linearity properties do not hold in general 1 L
F (v) = (v 0 )2 − f v dx .
and the abstract notation is then 2 0
That is, F (u) = minv∈V F (v). The quantity F (v) is called a functional:
F (u; v) = 0 ∀v ∈ V . it takes one or more functions as input and produces a number. Loosely
speaking, we may say that a functional is “a function of functions”.
The matrix system associated with a(u, v) = L(v) can also be written
P Functionals very often involve integral expressions as above.
in an abstract form by inserting v = ψi and u = j cj ψj in a(u, v) = L(v).
A range of physical problems can be formulated either as a differential
Using the linear properties, we get
equation or as a minimization of some functional. Quite often, the
X differential equation arises from Newton’s 2nd law of motion while the
a(ψj , ψi )cj = L(ψi ), i ∈ Is ,
functional expresses a certain kind of energy.
j∈Is
Many traditional applications of the finite element method, especially
which is a linear system in solid mechanics and constructions with beams and plates, start with
X formulating F (v) from physical principles, such as minimization of elastic
Ai,j cj = bi , i ∈ Is , energy, and then proceeds with deriving a(u, v) = L(v), which is the
formulation usually desired in software implementations.
j∈Is
where
The general minimization problem. The relation between a differential
Ai,j = a(ψj , ψi ), bi = L(ψi ) . equation and minimization of a functional can be expressed in a general
mathematical way using our abstract notation for a variational form:
In many problems, a(u, v) is symmetric such that a(ψj , ψi ) = a(ψi , ψj ). a(u, v) = L(v). It can be shown that the variational statement
In those cases the coefficient matrix becomes symmetric, Ai,j = Aj,i , a
property that can simplify solution algorithms for linear systems and a(u, v) = L(v) ∀v ∈ V,
make them more stable. The property also reduces memory requirements
and the computational work. is equivalent to minimizing the functional
The abstract notation a(u, v) = L(v) for linear differential equation 1
problems is much used in the literature and in description of finite element F (v) = a(v, v) − L(v)
2
software (in particular the FEniCS documentation). We shall frequently over all functions v ∈ V . That is,
summarize variational forms using this notation.
F (u) ≤ F (v) ∀v ∈ V .
P
Derivation. kam 9: need that F is convex and a is positive, symmetric, Minimization of the discretized functional. Inserting v = j cj ψj
right? Check turns minimization of F (v) into minimization of a quadratic function of
To see this, we write F (u) ≤ F (η), ∀η ∈ V instead and set η = u + v, the parameters c0 , . . . , cN :
where v ∈ V is an arbitrary function in V . For any given arbitrary v, we X X X
can view F (v) as a function g() and find the extrema of g, which is a F̄ (c0 , . . . , cN ) = a(ψi , ψj )ci cj − L(ψj )cj
function of one variable. We have j∈Is i∈Is j∈Is
of N + 1 parameters.
1
F (η) = F (u + v) = a(u + v, u + v) − L(u + v) . Minimization of F̄ implies
2
From the linearity of a and L we get ∂ F̄
= 0, i ∈ Is .
∂ci
g() = F (u + u) After some algebra one finds
1 X
= a(u + v, u + v) − L(u + v) a(ψi , ψj )cj = L(ψi ), i ∈ Is ,
2
1 1 j∈Is
= a(u, u + v) + a(v, u + u) − L(u) − L(v) which is the same system as the one arising from a(u, v) = L(v).
2 2
1 1 1 1
= a(u, u) + a(u, v) + a(v, u) + 2 a(v, v) − L(u) − L(v) . Calculus of variations. A branch of applied mathematics, called calculus
2 2 2 2 of variations, deals with the technique of minimizing functionals to derive
If we now assume that a is symmetric, a(u, v) = a(v, u), we can write differential equations. The technique involves taking the variation (a
kind of derivative) of functionals, which have given name to terms like
1 1 variational form, variational problem, and variational formulation.
g() = a(u, u) + a(u, v) + 2 a(v, v) − L(u) − L(v) .
2 2
The extrema of g is found by searching for such that g 0 () = 0:
4.3 Examples on variational formulations
g 0 () = a(u, v) − L(v) + a(v, v) = 0 .
This linear equation in has a solution = (a(u, v) − L(u))/a(v, v) if The following sections derive variational formulations for some prototype
a(v, v) > 0. But recall that a(u, v) = L(v) for any v, so we must have differential equations in 1D, and demonstrate how we with ease can
= 0. Since the reasoning above holds for any v ∈ V , the function handle variable coefficients, mixed Dirichlet and Neumann boundary
η = u + v that makes F (η) extreme must have = 0, i.e., η = u, the conditions, first-order derivatives, and nonlinearities.
solution of a(u, v) = L(v) for any v in V .
Looking at g 00 () = a(v, v), we realize that = 0 corresponds to a
unique minimum if a(v, v) > 0.
4.3.1 Variable coefficient
The equivalence of a variational form a(u, v) = L(v) ∀v ∈ V and the
minimization problem F (u) ≤ F (v) ∀v ∈ V requires that 1) a is bilinear Consider the problem
and L is linear, 2) a(u, v) is symmetric: a(u, v) = a(v, u), and 3) that
a(v, v) > 0.
d du
− α(x) = f (x), x ∈ Ω = [0, L], u(0) = C, u(L) = D .
dx dx
(4.51)
4.3 Examples on variational formulations 163 164 4 Variational formulations with global basis functions
There are two new features of this problem compared with previous The boundary term vanishes since v(0) = v(L) = 0. The variational
examples: a variable coefficient α(x) and nonzero Dirichlet conditions at formulation is then
both boundary points. Z Z
Let us first deal with the boundary conditions. We seek du dv
dx =
α(x) f (x)v dx, ∀v ∈ V .
Ω dx dx Ω
X
u(x) = B(x) + cj ψi (x) . The variational formulation can alternatively be written in a more com-
j∈Is pact form:
Since the Dirichlet conditions demand
(αu0 , v 0 ) = (f, v), ∀v ∈ V .
ψi (0) = ψi (L) = 0, i ∈ Is , The corresponding abstract notation reads
the function B(x) must fulfill B(0) = C and B(L) = D. The we are
guaranteed that u(0) = C and u(L) = D. How B varies in between x = 0 a(u, v) = L(v) ∀v ∈ V,
and x = L is not of importance. One possible choice is with
1 a(u, v) = (αu0 , v 0 ), L(v) = (f, v) .
B(x) = C + (D − C)x, P
L We may insert u = B + j cj ψj and v = ψi to derive the linear system:
which follows from (4.48) with p = 1. X
We seek (u − B) ∈ V . As usual, (αB 0 + α cj ψj0 , ψi0 ) = (f, ψi ), i ∈ Is .
j∈Is
V = span{ψ0 , . . . , ψN } . Isolating everything with the cj coefficients on the left-hand side and all
known terms on the right-hand side gives
Note that any v ∈ V has the property v(0) = v(L) = 0.
The residual arises by inserting our u in the differential equation: X
(αψj0 , ψi0 )cj = (f, ψi ) + (α(D − C)L−1 , ψi0 ), i ∈ Is .
j∈Is
d du
R=− α −f. P
dx dx This is nothing but a linear system j Ai,j cj = bi with
Galerkin’s method is
Z
(R, v) = 0, ∀v ∈ V, Ai,j = (αψj0 , ψi0 ) = α(x)ψj0 (x), ψi0 (x) dx,
Ω
Z
or written with explicit integrals, D−C 0
bi = (f, ψi ) + (α(D − C)L−1 , ψi0 ) = f (x)ψi (x) + α(x) ψi (x) dx .
Z Ω L
d du
− α − f v dx = 0, ∀v ∈ V .
Ω dx dx
We proceed with integration by parts to lower the derivative from second 4.3.2 First-order derivative in the equation and boundary
to first order: condition
Z Z L The next problem to formulate in terms of a variational form reads
d du du dv du
− α(x) v dx = α(x) dx − α v .
Ω dx dx Ω dx dx dx 0
− u00 (x) + bu0 (x) = f (x), x ∈ Ω = [0, L], u(0) = C, u0 (L) = E .
(4.52)
4.3 Examples on variational formulations 165 166 4 Variational formulations with global basis functions
The new features are a first-order derivative u0 in the equation and the mathematical formulation, such as forcing all ϕi = 0 on the bound-
boundary condition involving the derivative: u0 (L) = E. Since we have a ary and constructing a B(x), and are therefore known as essential
Dirichlet condition at x = 0, we must force ψi (0) = 0 and use a boundary boundary conditions.
function to take care of the condition u(0) = C. Because there is no
Dirichlet condition on x = L we do not make any requirements to ψi (L). The final variational form reads
The simplest possible choice of B(x) is B(x) = C.
The expansion for u becomes (u0 , v 0 ) + (bu0 , v) = (f, v) + Ev(L), ∀v ∈ V .
X
u=C+ cj ψi (x) . In the abstract notation we have
j∈Is
The variational formulation arises from multiplying the equation by a a(u, v) = L(v) ∀v ∈ V,
test function v ∈ V and integrating over Ω: with the particular formulas
(−u00 + bu0 − f, v) = 0, ∀v ∈ V a(u, v) = (u0 , v 0 ) + (bu0 , v), L(v) = (f, v) + Ev(L) .

P
We apply integration by parts to the u00 v term only. Although we could The associated linear system is derived by inserting u = B + j cj ψj
also integrate u0 v by parts, this is not common. The result becomes and replacing v by ψi for i ∈ Is . Some algebra results in
X
(u0 , v 0 ) + (bu0 , v) = (f, v) + [u0 v]L
0, ∀v ∈ V . ((ψj0 , ψi0 ) + (bψj0 , ψi )) cj = (f, ψi ) + Eψi (L) .
| {z }
j∈Is | {z }
Now, v(0) = 0 so Ai,j bi
Observe that in this problem, the coefficient matrix is not symmetric,

[u0 v]L
0 = u (L)v(L) = Ev(L),
0
because of the term
because u0 (L) = E. Thus, integration by parts allows us to take care of Z Z
the Neumann condition in the boundary term. (bψj0 , ψi ) = bψj0 ψi dx 6= bψi0 ψj dx = (ψi0 , bψj ) .
Ω Ω
Natural and essential boundary conditions

A common mistake is to forget a boundary term like [u0 v]L 4.3.3 Nonlinear coefficient
0 in the
integration by parts. Such a mistake implies that we actually impose Finally, we show that the techniques used above to derive variational
the condition u0 = 0 unless there is a Dirichlet condition (i.e., v = 0) forms apply to nonlinear differential equation problems as well. Here
at that point! This fact has great practical consequences, because is a model problem with a nonlinear coefficient α(u) and a nonlinear
it is easy to forget the boundary term, and that implicitly set a right-hand side f (u):
boundary condition!
Since homogeneous Neumann conditions can be incorporated
without “doing anything” (i.e., omitting the boundary term), and − (α(u)u0 )0 = f (u), x ∈ [0, L], u(0) = 0, u0 (L) = E . (4.53)
non-homogeneous Neumann conditions can just be inserted in the
boundary term, such conditions are known as natural boundary Our space V has basis {ψi }i∈Is , and because of the condition u(0) = 0,
conditions. Dirichlet conditions require more essential steps in the we must require ψi (0) = 0, i ∈ Is .
Galerkin’s method is about inserting the approximate u, multiplying
the differential equation by v ∈ V , and integrate,
4.4 Implementation of the algorithms 167 168 4 Variational formulations with global basis functions
Z Z 4.4.1 Extensions of the code for approximation

d L du L
− α(u) v dx = f (u)v dx ∀v ∈ V . The user must prepare a function integrand_lhs(psi, i, j) for re-
0 dx dx 0
turning the integrand of the integral that contributes to matrix entry
The integration by parts does not differ from the case where we have
(i, j) on the left-hand side. The psi variable is a Python dictionary
α(x) instead of α(u):
holding the basis functions and their derivatives in symbolic form. More
Z L du dv
Z L precisely, psi[q] is a list of
α(u) dx = f (u)v dx + [α(u)vu0 ]L
0 ∀v ∈ V .
0 dx dx 0 dq ψ0 dq ψNn −1
The term α(u(0))v(0)u0 (0) = 0 since v(0). The other term, { q
,..., }.
dx dxq
α(u(L))v(L)u0 (L), is used to impose the other boundary condition Similarly, integrand_rhs(psi, i) returns the integrand for entry num-
u0 (L) = E, resulting in ber i in the right-hand side vector.
Since we also have contributions to the right-hand side vector (and
Z L Z L potentially also the matrix) from boundary terms without any integral,
du dv
α(u) dx = f (u)v dx + α(u(L))v(L)E ∀v ∈ V, we introduce two additional functions, boundary_lhs(psi, i, j) and
0 dx dx 0
boundary_rhs(psi, i) for returning terms in the variational formula-
or alternatively written more compactly as tion that are not to be integrated over the domain Ω. Examples, to be
shown later, will explain in more detail how these user-supplied function
(α(u)u0 , v 0 ) = (f (u), v) + α(u(L))v(L)E ∀v ∈ V . may look like.
Since the problem is nonlinear, we cannot identify a bilinear form a(u, v) The linear system can be computed and solved symbolically by the
and a linear form L(v). An abstract formulation is typically find u such following function:
that import sympy as sym
F (u; v) = 0 ∀v ∈ V, def solver(integrand_lhs, integrand_rhs, psi, Omega,

boundary_lhs=None, boundary_rhs=None):
with N = len(psi[0]) - 1
F (u; v) = (a(u)u0 , v 0 ) − (f (u), v) − a(L)v(L)E . b = sym.zeros((N+1, 1))
P x = sym.Symbol(’x’)
By inserting u = j cj ψj and v = ψi in F (u; v), we get a nonlinear for i in range(N+1):
system of algebraic equations for the unknowns ci , i ∈ Is . Such systems for j in range(i, N+1):
must be solved by constructing a sequence of linear systems whose integrand = integrand_lhs(psi, i, j)
solutions hopefully converge to the solution of the nonlinear system. if boundary_lhs is not None:
Frequently applied methods are Picard iteration and Newton’s method. I += boundary_lhs(psi, i, j)
A[i,j] = A[j,i] = I # assume symmetry
integrand = integrand_rhs(psi, i)
4.4 Implementation of the algorithms if boundary_rhs is not None:
I += boundary_rhs(psi, i)
b[i,0] = I
c = A.LUsolve(b)
Our hand calculations can benefit greatly by symbolic computing, as
u = sum(c[i,0]*psi[0][i] for i in range(len(psi[0])))
shown earlier, so it is natural to extend our approximation programs return u, c
based on sympy to the problem domain of variational formulations.
4.4.2 Fallback on numerical methods integrand = sym.lambdify([x], integrand)

Not surprisingly, symbolic solution of differential equations, discretized if boundary_rhs is not None:
I += boundary_rhs(psi, i)
by a Galerkin or least squares method with global basis functions, is of b[i,0] = I
limited interest beyond the simplest problems, because symbolic inte- c = A.LUsolve(b)
gration might be very time consuming or impossible, not only in sympy u = sum(c[i,0]*psi[0][i] for i in range(len(psi[0])))
return u, c
but also in WolframAlpha (which applies the perhaps most powerful
symbolic integration software available today: Mathematica). Numerical
integration as an option is therefore desirable.
The extended solver function below tries to combine symbolic and 4.4.3 Example with constant right-hand side
numerical integration. The latter can be enforced by the user, or it can be
invoked after a non-successful symbolic integration (being detected by an To demonstrate the code above, we address
Integral object as the result of the integration in sympy). Note that for a
numerical integration, symbolic expressions must be converted to Python −u00 (x) = b, x ∈ Ω = [0, 1], u(0) = 1, u(1) = 0,
functions (using lambdify), and the expressions cannot contain other with b as a (symbolic) constant. A possible basis for the space V is
symbols than x. The real solver routine in the varform1D.py file has ψi (x) = xi+1 (1 − x), i ∈ Is . Note that ψi (0) = ψi (1) = 0 as required by
error checking and meaningful error messages in such cases. The solver the Dirichlet conditions. We need a B(x) function to take care of the
code below is a condensed version of the real one, with the purpose known boundary values of u. Any function B(x) = 1 − xp , p ∈ R, is a
of showing how to automate the Galerkin or least squares method for candidate, and one arbitrary choice from this family is B(x) = 1 − x3 .
solving differential equations in 1D with global basis functions: The unknown function is then written as
kam 10: this point has been made many times already
X
def solver(integrand_lhs, integrand_rhs, psi, Omega, u(x) = B(x) + cj ψj (x) .
boundary_lhs=None, boundary_rhs=None, symbolic=True): j∈Is
N = len(psi[0]) - 1
A = sym.zeros((N+1, N+1)) Let us use the Galerkin method to derive the variational formulation.
b = sym.zeros((N+1, 1)) Multiplying the differential equation by v and integrating by parts yield
for i in range(N+1): Z 1 Z 1
for j in range(i, N+1): u0 v 0 dx = f v dx ∀v ∈ V,
integrand = integrand_lhs(psi, i, j) 0 0
P
if symbolic: and with u = B + j cj ψj we get the linear system
if isinstance(I, sym.Integral):
symbolic = False # force num.int. hereafter X Z 1 Z 1
if not symbolic: ψi0 ψj0 dx cj = (f ψi − B 0 ψi0 ) dx, i ∈ Is . (4.54)
integrand = sym.lambdify([x], integrand) 0 0
j∈Is
if boundary_lhs is not None:
I += boundary_lhs(psi, i, j) The application can be coded as follows with sympy:
A[i,j] = A[j,i] = I
integrand = integrand_rhs(psi, i) import sympy as sym
if symbolic: x, b = sym.symbols(’x b’)
I = sym.integrate(integrand, (x, Omega[0], Omega[1])) f = b
if isinstance(I, sym.Integral): B = 1 - x**3
symbolic = False dBdx = sym.diff(B, x)
if not symbolic:
# Compute basis functions and their derivatives
N = 3 We can play around with the code and test that with f = Kxp , for
psi = {0: [x**(i+1)*(1-x) for i in range(N+1)]} some constants K and p, the solution is a polynomial of degree p + 2,
psi[1] = [sym.diff(psi_i, x) for psi_i in psi[0]]
and N = p + 1 guarantees that the approximate solution is exact.
def integrand_lhs(psi, i, j): Although the symbolic code is capable of integrating many choices of
return psi[1][i]*psi[1][j] f (x), the symbolic expressions for u quickly become lengthy and non-
def integrand_rhs(psi, i): informative, so numerical integration in the code, and hence numerical
return f*psi[0][i] - dBdx*psi[1][i] answers, have the greatest application potential.
Omega = [0, 1]
from varform1D import solver

u_bar, c = solver(integrand_lhs, integrand_rhs, psi, Omega,
4.5 Approximations may fail: convection-diffusion
verbose=True, symbolic=True)
u = B + u_bar In the previous examples we have obtained reasonable approximations
print ’solution u:’, sym.simplify(sym.expand(u))
of the continuous solution with several different approaches. In this
kam 11: does not work hpl 12: We must get it to work. It’s a small chapter we will consider a convection-diffusion equation where many
extension of the code for approximation and very natural to include when methods will fail. The failure is purely numerical and it is often tied to
we have so much material already on sympy for implementing algorithms. the resolution. The current example is perhaps the prime example of
numerical instabilities in the context of numerical solution algorithms
The printout of u reads -b*x**2/2 + b*x/2 - x + 1. Note that ex- for PDEs.
panding u, before simplifying, is necessary in the present case to get a Consider the equation
compact, final expression with sympy. Doing expand before simplify
is a common strategy for simplifying expressions in sympy. However, a −uxx − ux = 0, ∈ (0, 1), (4.55)
non-expanded u might be preferable in other cases - this depends on the u(0) = 1, (4.56)
problem in question. u(1) = 0. (4.57)
The exact solution ue (x) can be derived by some sympy code that
closely follows the examples in Section 4.1.2. The idea is to integrate The problem describes a convection-diffusion problem where the con-
−u00 = b twice and determine the integration constants from the boundary vection is modelled by the first order term −ux and diffusion is described
conditions: by the second order term −uxx . In many applications 1 and the
dominating term is −ux . The sign of −ux is not important, the same
C1, C2 = sym.symbols(’C1 C2’) # integration constants
f1 = sym.integrate(f, x) + C1 problem occurs for ux . The sign only determine the direction of the
f2 = sym.integrate(f1, x) + C2 convection.
# Find C1 and C2 from the boundary conditions u(0)=0, u(1)=1 For = 0, the solution satisfies
s = sym.solve([u_e.subs(x,0) - 1, u_e.subs(x,1) - 0], [C1, C2])
# Form the exact solution Z x
u_e = -f2 + s[C1]*x + s[C2] u(x) − u(1) = (−ux )(− dx) = 0,
print ’analytical solution:’, u_e 1
print ’error:’, sym.simplify(sym.expand(u - u_e))
which means that u(x) = u(1). Clearly only the boundary condition at
The last line prints 0, which is not surprising when ue (x) is a parabola x = 1 is required and the solution is constant throughout the domain.
and our approximate u contains polynomials up to degree 4. It suffices to If 0 < 1 such that the term −ux is dominating, the solution is
have N = 1, i.e., polynomials of degree 2, to recover the exact solution. similar to the solution for = 0 in the interior. However, the second order
term −uxx makes the problem a second order problem and two boundary
conditions are required, one condition at each side. The boundary condi-
4.5 Approximations may fail: convection-diffusion 173 174 4 Variational formulations with global basis functions
tion at x = 0 forces the solution to be zero at that point and this creates is significantly larger than 103 as it was here. In these applications the
a sharp gradient close to x = 0. For this reason, the problem is called a boundary layer is extremely thin and the gradient extremely sharp.
singular perturbation problem as the problem changes fundamentally in In this chapter we will not embark on the fascinating and complex issue
the sense that different boundary conditions are required in the limiting of boundary layer theory but only consider the numerical issues related
case = 0. to this phenomenon. Let us as earlier therefore consider an approximate
The solution of the above problem is solution on the following form
e−x/ − 1 N
X −1
u(x) = . (4.58) u(x) = û(x) + B(x) = cj ψj (x) + B(x) (4.59)
e−1/ − 1
j=1
As earlier {ψj (x)}N

j=1 } are zero at the boundary x = 0 and x = 1 and
−1
the boundary conditions are accounted for by the function B(x). Let
B(x) = c0 (1 − x) + cN x . (4.60)
Then we fixate c0 = 0 and cN = 1 which makes B(x) = x. To determine
j=1 } we consider the homogeneous Dirichlet problem where we solve
−1
{cj }N
for û = u − B. The homogeneous Dirichlet problem reads
−ûxx + ûx = 1, ∈ (0, 1), (4.61)

û(0) = 0, (4.62)
û(1) = 0 .
The Galerkin formulation of (4.62) is obtained as

Z 1
(−û00 + û0 − 1)ψj dx .
0
Integration by parts leads to

Fig. 4.2 Analytical solution to the convection-diffusion problem for varying . Z 1
û0 ψi0 + û0 ψi − 1ψi dx .
0
The solution is plotted in Figure 4.2 for different values of . Clearly, P
as decrease the exponential function represents a sharper and sharper In other words, we need to solve the linear system j Ai,j cj = bi where
gradient. From a physical or engineering point of view, the equation
(4.55) represents the simplest problem involving a common phenomenon Z 1
of boundary layers. Boundary layers are common in all kinds of fluid Ai,j = ψj0 ψi0 + ψj0 ψi dx,
0
flow and is a main problem when discretizing such equations. Boundary Z 1
layers have the characteristics of the solution (4.58), that is; a sharp bi = 1ψj dx .
0
local exponential gradient. In fluid flow the parameter is often related
to the inverse of the Reynolds number which frequently in engineering A sketch of a corresponding code where we also plot the behavior of
the solution with respect to different goes as follows.
4.5 Approximations may fail: convection-diffusion 175 176 4 Variational formulations with global basis functions
import matplotlib.pyplot as plt

N = 8
psi = series(x, series_type, N) # Lagrange, Bernstein, sin, ...
eps_values =[1.0, 0.1, 0.01, 0.001]
for eps in eps_valuess:
A = sym.zeros((N-1), (N-1))
b = sym.zeros((N-1))
for i in range(0, N-1):

integrand = f*psi[i]
b[i,0] = sym.mpmath.quad(integrand, [Omega[0], Omega[1]])
for j in range(0, N-1):
integrand = eps*sym.diff(psi[i], x)*\
sym.diff(psi[j], x) - sym.diff(psi[i], x)*psi[j]
A[i,j] = sym.mpmath.quad(integrand,
[Omega[0], Omega[1]])
c = A.LUsolve(b)
u = sum(c[r,0]*psi[r] for r in range(N-1)) + x
U = sym.lambdify([x], u, modules=’numpy’)
x_ = numpy.arange(Omega[0], Omega[1], 1/((N+1)*100.0)) Fig. 4.4 Solution obtained with Galerkin approximation using Lagrangian polynomials
U_ = U(x_) of order up to 16 for various .
plt.plot(x_, U_)
The numerical solutions for different is shown in Figure 4.3 and 4.4
for N = 8 and N = 16, respectively. From these figures we can make two
observations. The first observation is that the numerical solution contains
non-physical oscillations that grows as decreases. These oscillations are
so strong that for N = 8, the numerical solutions do not resemble the
true solution at all for less than 1/10. The true solution is always in
the interval [0, 1] while the numerical solution has values larger than 2
for = 1/100 and larger than 10 for = 1/1000. The second observation
is that the numerical solutions appear to improve as N increases. While
the numerical solution is outside the interval [0, 1] for less than 1/10
the magnitude of the oscillations clearly has decreased.
We will return to this example later and show examples of techniques
that can be used to improve the approximation. The complete source
code can be found in conv_diff_global.py.
Fig. 4.3 Solution obtained with Galerkin approximation using Lagrangian polynomials
of order up to 8 for various .
4.6 Exercises 177 178 4 Variational formulations with global basis functions
4.6 Exercises x̄ =
x
, ū =
w
,
L/2 wc
Exercise 4.1: Refactor functions into a more general class where wc is a characteristic size of w. Inserted in the problem for w,
Section 4.1.2 lists three functions for computing the analytical solution 4T wc d2 ū
of some simple model problems. There is quite some repetitive code, = ` (= const) .
L2 dx̄2
suggesting that the functions can benefit from being refactored into a A desire is to have u and its derivatives about unity, so choosing wc such
class hierarchy, where the super class solves −(a(x)u0 (x))0 = f (x) and that |d2 ū/dx̄2 | = 1 is an idea. Then wc = 14 `L2 /T , and the problem for
where subclasses define the equations for the boundary conditions in the scaled vertical deflection u becomes
a model. Make a method for returning the residual in the differential
equation and the boundary conditions when the solution is inserted u00 = 1, x ∈ (0, 1), u(0) = 0, u0 (1) = 0 .
in these equations. Create a test function that verifies that all three
residuals vanish for each of the model problems in Section 4.1.2. Also Observe that there are no physical parameters in this scaled problem.
make a method that returns the solution either as sympy expression From now on we have for convenience renamed x to be the scaled quantity
or as a string in LATEX format. Add a fourth subclass for the problem x̄.
−(au0 )0 = f with a Robin boundary condition: a) Find the exact solution for the deflection u.
b) A possible function space is spanned by ψi = sin((2i + 1)πx/2),
u(0) = 0, −u0 (L) = C(u − D) . i = 0, . . . , N . These functions fulfill the necessary condition ψi (0) = 0,
√
Demonstrate the use of this subclass for the case f = 0 and a = 1 + x. but they also fulfill ψi0 (1) = 0 such that both boundary conditions are
P
Filename: uxx_f_sympy_class. fulfilled by the expansion u = j cj ϕj .
Use a Galerkin and a least squares method to find the coefficients cj
P
in u(x) = j cj ψj . Find how fast the coefficients decrease in magnitude
Exercise 4.2: Compute the deflection of a cable with sine by looking at cj /cj−1 . Find the error in the maximum deflection at x = 1
functions when only one basis function is used (N = 0).
Hint. In this case, where the basis functions and their derivatives are
A hanging cable of length L with significant tension T has a deflection
orthogonal, it is easiest to set up the calculations by hand and use sympy
w(x) governed by
to help out with the integrals.
T w00 (x) = `(x), c) Visualize the solutions in b) for N = 0, 1, 20.
where `(x) the vertical load per unit length. The cable is fixed at x = 0 d) The functions in b) were selected such that they fulfill the condition
and x = L so the boundary conditions become w(0) = w(L) = 0. The ψ 0 (1) = 0. However, in the Galerkin method, where we integrate by
deflection w is positive upwards, and ` is positive when it acts downwards. parts, the condition u0 (1) = 0 is incorporated in the variational form.
If we assume a constant load `(x) = const, the solution is expected to This leads to the idea of just choosing a simpler basis, namely “all” sine
be symmetric around x = L/2. For a function w(x) that is symmetric functions ψi = sin((i + 1) πx
2 ). Will the method adjust the coefficient such
around some point x0 , it means that w(x0 − h) = w(x0 + h), and then that the additional functions compared with those in b) get vanishing
w0 (x0 ) = limh→0 (w(x0 + h) − w(x0 − h))/(2h) = 0. We can therefore coefficients? Or will the additional basis functions improve the solution?
utilize symmetry to halve the domain. We then seek w(x) in [0, L/2] Use Galerkin’s method.
with boundary conditions w(0) = 0 and w0 (L/2) = 0. e) Now we drop the symmetry condition at x = 1 and extend the domain
The problem can be scaled by introducing dimensionless variables, to [0, 2] such that it covers the entire (scaled) physical cable. The problem
now reads
4.6 Exercises 179
u00 = 1, x ∈ (0, 2), u(0) = u(2) = 0 .

This time we need basis functions that are zero at x = 0 and x = 2. The
set sin((i + 1) πx
2 ) from d) is a candidate since they vanish x = 2 for any
i. Compute the approximation in this case. Why is this approximation
without the problem that this set of basis functions introduced in d)?
Filename: cable_sin.
Exercise 4.3: Compute the deflection of a cable with power

functions
a) Repeat Exercise 4.2 b), but work with the space
V = span{x, x2 , x3 , x4 , . . .} .
Choose the dimension of V to be 4 and observe that the exact solution
is recovered by the Galerkin method.
Hint. Use the solver function from varform1D.py.
b) What happens if we use a least squares method for this problem with
the basis in a)?
Filename: cable_xn.
Exercise 4.4: Check integration by parts

Consider the Galerkin method for the problem involving u in Exercise 4.2.
Show that the formulas for cj are independent of whether we perform
integration by parts or not.
Filename: cable_integr_by_parts.
182 5 Variational formulations with finite elements
5.1.1 Finite element mesh and basis functions

Variational formulations with finite
elements 5 We introduce a finite element mesh with Ne cells, all with length h, and
number the cells from left to right. Choosing P1 elements, there are two
nodes per cell, and the coordinates of the nodes become
xi = ih, h = L/Ne , i = 0, . . . , Nn − 1 = Ne .
Any node i is associated with a finite element basis function ϕi (x).
When approximating a given function f by a finite element function
u, we expand u using finite element basis functions associated with all
nodes in the mesh. The parameter N , which counts the unknowns from
0 to N , is then equal to Nn − 1 such that the total number of unknowns,
N + 1, is the total number of nodes. However, when solving differential
equations we will often have N < Nn − 1 because of Dirichlet boundary
conditions. The reason is simple: we know what u are at some (here two)
hpl 13: Cannot remember why we need an opening example with global nodes, and the number of unknown parameters is naturally reduced.
polynomials in a chapter about local polynomials. Could it just be the In our case with homogeneous Dirichlet boundary conditions, we do
property of Galerkin recovering the exact solution? Chaperwise it fits not need any boundary function B(x), so we can work with the expansion
better in the previous one... kam 14: have now moved it before abstract X
forms in previous chapter u(x) = cj ψj (x) . (5.1)
We shall now take the ideas from previous chapters and put together j∈Is
such that we can solve PDEs using the flexible finite element basis Because of the boundary conditions, we must demand ψi (0) = ψi (L) = 0,
functions. This is quite a machinery with many details, but the chapter i ∈ Is . When ψi for all i = 0, . . . , N is to be selected among the finite
is mostly an assembly of concepts and details we have already met. element basis functions ϕj , j = 0, . . . , Nn − 1, we have to avoid using ϕj
functions that do not vanish at x0 = 0 and xNn −1 = L. However, all ϕj
vanish at these two nodes for j = 1, . . . , Nn − 2. Only basis functions
5.1 Computing with finite elements associated with the end nodes, ϕ0 and ϕNn −1 , violate the boundary
conditions of our differential equation. Therefore, we select the basis
functions ϕi to be the set of finite element basis functions associated
The purpose of this section is to demonstrate in detail how the finite
with all the interior nodes in the mesh:
element method can then be applied to the model problem
ψi = ϕi+1 , i = 0, . . . , N .
−u00 (x) = 2, x ∈ (0, L), u(0) = u(L) = 0,
The i index runs over all the unknowns ci in the expansion for u, and in
with variational formulation
this case N = Nn − 3.
In the general case, and in particular on domains in higher dimen-
(u0 , v 0 ) = (2, v) ∀v ∈ V .
sions, the nodes are not necessarily numbered from left to right, so we
Any v ∈ V must obey v(0) = v(L) = 0 because of the Dirichlet conditions introduce a mapping from the node numbering, or more precisely the
on u. The variational formulation is derived in Section 4.1.10. degree of freedom numbering, to the numbering of the unknowns in the
final equation system. These unknowns take on the numbers 0, . . . , N .

5.1 Computing with finite elements 183 184 5 Variational formulations with finite elements
Unknown number j in the linear system corresponds to degree of freedom Figure 5.1 shows ϕ02 (x) and ϕ03 (x).
number ν(j), j ∈ Is . We can then write
ψi = ϕν(i) , i = 0, . . . , N . ϕ20 ϕ30

With a regular numbering as in the present example, ν(j) = j + 1,
j = 0, . . . , N = Nn − 3.
x
5.1.2 Computation in the global physical domain 0 1 2 3 4 5
We shall first perform a computation in the x coordinate system because Ω(0) Ω(1) Ω(2) Ω(3) Ω(4)
the integrals can be easily computed here by simple, visual, geometric Fig. 5.1 Illustration of the derivative of piecewise linear basis functions associated with
considerations. This is called a global approach since we work in the x nodes in cell 2.
coordinate system and compute integrals on the global domain [0, L].
The entries in the coefficient matrix and right-hand side are We realize that ϕ0i and ϕ0j has no overlap, and hence their product
vanishes, unless i and j are nodes belonging to the same cell. The only
Z L Z L nonzero contributions to the coefficient matrix are therefore
Ai,j = ψi0 (x)ψj0 (x) dx, bi = 2ψi (x) dx, i, j ∈ Is .
0 0
Z L
Expressed in terms of finite element basis functions ϕi we get the alter- Ai−1,i−2 = ϕ0i (x)ϕ0i−1 (x) dx,
native expressions Z
0
L
Ai−1,i−1 = ϕ0i (x)2 dx,
Z L Z L 0
Z
Ai,j = ϕ0i+1 (x)ϕ0j+1 (x) dx, bi = 2ϕi+1 (x) dx, i, j ∈ Is . L
0 0 Ai−1,i = ϕ0i (x)ϕ0i+1 (x) dx,
0
For the following calculations the subscripts on the finite element basis
for i = 1, . . . , N + 1, but for i = 1, Ai−1,i−2 is not defined, and for
functions are more conveniently written as i and j instead of i + 1 and
i = N + 1, Ai−1,i is not defined.
j + 1, so our notation becomes
From Figure 5.1, we see that ϕ0i−1 (x) and ϕ0i (x) have overlap of one cell
(i−1)
Z L Z L Ω = [xi−1 , xi ] and that their product then is −1/h2 . The integrand
Ai−1,j−1 = ϕ0i (x)ϕ0j (x) dx, bi−1 = 2ϕi (x) dx, is constant and therefore Ai−1,i−2 = −h−2 h = −h−1 . A similar reasoning
0 0
can be applied to Ai−1,i , which also becomes −h−1 . The integral of ϕ0i (x)2
where the i and j indices run as i, j = 1, . . . , N + 1 = Nn − 2.
gets contributions from two cells, Ω (i−1) = [xi−1 , xi ] and Ω (i) = [xi , xi+1 ],
The ϕi (x) function is a hat function with peak at x = xi and a linear
but ϕ0i (x)2 = h−2 in both cells, and the length of the integration interval
variation in [xi−1 , xi ] and [xi , xi+1 ]. The derivative is 1/h to the left of
is 2h so we get Ai−1,i−1 = 2h−1 .
xi and −1/h to the right, or more formally,
The right-hand side involves an integral of 2ϕi (x), i = 1, . . . , Nn − 2,

which is just the area under a hat function of height 1 and width 2h, i.e.,
 0,
 x < xi−1 ,

 h−1 , xi−1 ≤ x < xi , equal to h. Hence, bi−1 = 2h.
ϕ0i (x) = (5.2) To summarize the linear system, we switch from i to i + 1 such that

 −h−1 , xi ≤ x < xi+1 ,

 0, x ≥ xi+1 we can write
which is, in fact, equal to (5.5) if (5.5) is divided by h. Therefore, the

Ai,i−1 = Ai,i+1 = −h−1 , Ai,i = 2h−1 , bi = 2h . finite difference and the finite element method are equivalent in this
simple test problem.
The equation system to be solved only involves the unknowns ci for
Sometimes a finite element method generates the finite difference
i ∈ Is . With our numbering of unknowns and nodes, we have that ci
equations on a uniform mesh, and sometimes the finite element method
equals u(xi+1 ). The complete matrix system then takes the following
generates equations that are different. The differences are modest, but
form:
may influence the numerical quality of the solution significantly, especially
in time-dependent problems. It depends on the problem at hand whether a
    
1 −1 0 · · · · · · · · · · · · · · · 0 c0 2h finite element discretization is more or less accurate than a corresponding
..   ..   .. 
 −1 2 −1 . . . finite difference discretization.

 .  .
    . 
  
 0 −1 2 −1 . . .
 ..   .
 .
  . 
  .. 
 .   .   
 . . .. .. ..   .   . 
 .. . . . .  .   .. 
 0 .  .    5.1.4 Cellwise computations
1 .
 .. .. .. .. .. 
 .
..     . 
.
. . . . . . .   ..
= . 
  .  (5.3)
h
 . Software for finite element computations normally employs the cell by
 . . . 

 ..
  . 
2 −1 . . .. 
  .  cell computational procedure where an element matrix and vector are
 . 0 −1  .   . 
    
 .. .. .. .. ..   ..   ..  calculated for each cell and assembled in the global linear system. Let us
 . . . . . 0   .
   . 

 ..   ..
  
  ..  go through the details of this type of algorithm.
.. .. ..
 . . . . −1   .   .  All integrals are mapped to the local reference coordinate system
0 · · · · · · · · · · · · · · · 0 −1 1 cN 2h X ∈ [−1, 1]. In the present case, the matrix entries contain derivatives
with respect to x,
5.1.3 Comparison with a finite difference discretization (e)

Z Z 1 d d h
Ai−1,j−1 = ϕ0i (x)ϕ0j (x) dx = ϕ̃r (X) ϕ̃s (X) dX,
A typical row in the matrix system (5.3) can be written as Ω (e) −1 dx dx 2
1 2 1 where the global degree of freedom i is related to the local degree of

− ci−1 + ci − ci+1 = 2h . (5.4) freedom r through i = q(e, r). Similarly, j = q(e, s). The local degrees of
h h h
freedom run as r, s = 0, 1 for a P1 element.
Let us introduce the notation uj for the value of u at node j: uj = u(xj ),
P P
since we have the interpretation u(xj ) = j cj ϕ(xj ) = j cj δij = cj . The integral for the element matrix. There are simple formulas for
The unknowns c0 , . . . , cN are u1 , . . . , uNn −2 . Shifting i with i + 1 in (5.4) the basis functions ϕ̃r (X) as functions of X. However, we now need to
and inserting ui = ci−1 , we get find the derivative of ϕ̃r (X) with respect to x. Given
1 2 1 1 1
− ui−1 + ui − ui+1 = 2h, (5.5) ϕ̃0 (X) = (1 − X), ϕ̃1 (X) = (1 + X),
h h h 2 2
A finite difference discretization of −u (x) = 2 by a centered, second-
00 we can easily compute dϕ̃r /dX:
order finite difference approximation u00 (xi ) ≈ [Dx Dx u]i with ∆x = h
dϕ̃0 1 dϕ̃1 1
yields =− , = .
dX 2 dX 2
ui−1 − 2ui + ui+1 From the chain rule,
− = 2, (5.6)
h2
dϕ̃r dϕ̃r dX 2 dϕ̃r therefore becomes a 1×1 matrix and there is only one entry in the element
= = . (5.7)
dx dX dx h dX vector. On cell 0, only ψ0 = ϕ1 is involved, corresponding to integration
The transformed integral is then with ϕ̃1 . On cell Ne , only ψN = ϕNn −2 is involved, corresponding to
Z Z integration with ϕ̃0 . We then get the special end-cell contributions
(e)
1 2 dϕ̃r 2 dϕ̃s h
Ai−1,j−1 = ϕ0i (x)ϕ0j (x) dx = dX . 1
Ω (e) −1 h dX h dX 2 Ã(e) = 1 , b̃(e) = h 1 , (5.9)
The integral for the element vector. The right-hand side is transformed h
according to for e = 0 and e = Ne . In these cells, we have only one degree of freedom,
not two as in the interior cells.
(e)
Z Z 1 h Assembly. The next step is to assemble the contributions from the
bi−1 = 2ϕi (x) dx = 2ϕ̃r (X) dX, i = q(e, r), r = 0, 1 . various cells. The assembly of an element matrix and vector into the
Ω (e) −1 2
global matrix and right-hand side can be expressed as
Detailed calculations of the element matrix and vector. Specifically
for P1 elements we arrive at the following calculations for the element
matrix entries: Aq(e,r),q(e,s) = Aq(e,r),q(e,s) + Ã(e)
r,s , bq(e,r) = bq(e,r) + b̃(e)
r ,
Z for r and s running over all local degrees of freedom in cell e.

(e)
1 2 1 2 1 h 1 To make the assembly algorithm more precise, it is convenient to set
Ã0,0 = − − dX =
−1 h 2 h 2 2 h up Python data structures and a code snippet for carrying out all details
Z
(e)
1 2 1 2 1 h 1 of the algorithm. For a mesh of four equal-sized P1 elements and L = 2
Ã0,1 = − dX = − we have
−1 h 2 h 2 2 h
Z
(e)
1 2 1 2 1 h 1 vertices = [0, 0.5, 1, 1.5, 2]
Ã1,0 = − dX = −
−1 h 2 h 2 2 h cells = [[0, 1], [1, 2], [2, 3], [3, 4]]
Z 1 dof_map = [[0], [0, 1], [1, 2], [2]]
(e) 2 1 2 1 h 1
Ã1,1 = dX =
−1 h 2 h 2 2 h The total number of degrees of freedom is 3, being the function values
at the internal 3 nodes where u is unknown. In cell 0 we have global
The element vector entries become degree of freedom 0, the next cell has u unknown at its two nodes, which
Z become global degrees of freedom 0 and 1, and so forth according to
(e) 1 1 h
b̃0 = 2 (1 − X) dX = h the dof_map list. The mathematical q(e, r) quantity is nothing but the
−1 2 2
Z 1
1 h dof_map list.
(e)
b̃1 = 2 (1 + X) dX = h . Assume all element matrices are stored in a list Ae such that
−1 2 2 (e)
Ae[e][i,j] is Ãi,j . A corresponding list for the element vectors is named
Expressing these entries in matrix and vector notation, we have (e)
be, where be[e][r] is b̃r . A Python code snippet illustrates all details
! ! of the assembly algorithm:
1 1 −1 1
Ã(e) = , b̃(e) = h . (5.8)
h −1 1 1 # A[i,j]: coefficient matrix, b[i]: right-hand side
for e in range(len(Ae)):
Contributions from the first and last cell. The first and last cell for r in range(Ae[e].shape[0]):
for s in range(Ae[e].shape[1]):
involve only one unknown and one basis function because of the Dirichlet A[dof_map[e,r],dof_map[e,s]] += Ae[e][i,j]
boundary conditions at the first and last node. The element matrix b[dof_map[e,r]] += be[e][i,j]
5.2 Boundary conditions: specified nonzero value 189 190 5 Variational formulations with finite elements
P
The general case with N_e P1 elements of length h has It is easy to verify that B(xi ) = j∈Ib Uj ϕj (xi ) = Ui .
The unknown function can then be written as
N_n = N_e + 1
vertices = [i*h for i in range(N_n)] X X
cells = [[e, e+1] for e in range(N_e)] u(x) = Uj ϕj (x) + cj ϕν(j) , (5.11)
dof_map = [[0]] + [[e-1, e] for i in range(1, N_e)] + [[N_n-2]] j∈Ib j∈Is
Carrying out the assembly results in a linear system that is identical where ν(j) maps unknown number j in the equation system to node ν(j),
to (5.3), which is not surprising, since the procedures is mathematically Ib is the set of indices corresponding to basis functions associated with
equivalent to the calculations in the physical domain. nodes where Dirichlet conditions apply, and Is is the set of indices used
So far, our technique for computing the matrix system have assumed to number the unknowns from zero to N . We can easily show that with
that u(0) = u(L) = 0. The next section deals with the extension to this u, a Dirichlet condition u(xk ) = Uk is fulfilled:
nonzero Dirichlet conditions.
X X
u(xk ) = Uj ϕj (xk ) + cj ϕν(j) (xk ) = Uk
j∈Ib
| {z } j∈Is
| {z }
6=0 ⇔ j=k =0, k6∈Is
5.2 Boundary conditions: specified nonzero value Some examples will further clarify the notation. With a regular left-
to-right numbering of nodes in a mesh with P1 elements, and Dirichlet
We have to take special actions to incorporate nonzero Dirichlet condi- conditions at x = 0, we use finite element basis functions associated with
tions, such as u(L) = D, into the computational procedures. The present the nodes 1, 2, . . . , Nn − 1, implying that ν(j) = j + 1, j = 0, . . . , N ,
section outlines alternative, yet mathematically equivalent, methods. where N = Nn − 2. Consider a particular mesh:
5.2.1 General construction of a boundary function

In Section 4.1.11 we introduced a boundary function B(x) to deal with
nonzero Dirichlet boundary conditions for u. The construction of such
a function is not always trivial, especially not in multiple dimensions. x
However, a simple and general constructive idea exists when the basis 0 1 2 3 4 5
functions have the property
(
Ω(0) Ω(1) Ω(2) Ω(3) Ω(4)
1, i = j,
ϕi (xj ) = δij , δij =
0, i 6= j, The expansion associated with this mesh becomes
where xj is a boundary point. Examples on such functions are the
u(x) = U0 ϕ0 (x) + c0 ϕ1 (x) + c1 ϕ2 (x) + · · · + c4 ϕ5 (x) .
Lagrange interpolating polynomials and finite element functions.
Suppose now that u has Dirichlet boundary conditions at nodes with Switching to the more standard case of left-to-right numbering and
numbers i ∈ Ib . For example, Ib = {0, Nn − 1} in a 1D mesh with node boundary conditions u(0) = C, u(L) = D, we have N = Nn − 3 and
numbering from left to right and Dirichlet conditions at the end nodes
i = 0 and i = Nn − 1. Let Ui be the corresponding prescribed values of X
u(xi ). We can then, in general, use u(x) = Cϕ0 + DϕNn −1 + cj ϕj+1
j∈Is
X
B(x) = Uj ϕj (x) . (5.10) = Cϕ0 + DϕNn + c0 ϕ1 + c1 ϕ2 + · · · + cN ϕNn −2 .
j∈Ib
Finite element meshes in non-trivial 2D and 3D geometries usually X

leads to an irregular cell and node numbering. Let us therefore take a u(x) = B(x) + cj ψj (x)
look at an irregular numbering in 1D: j∈Is
in −(u00 , ψi ) = (f, ψi ) and integrating by parts results in a linear system

2.5 with
2.0
Z Z
1.5 L L
Ai,j = ψi0 (x)ψj0 (x) dx, bi = (f (x)ψi (x) − B 0 (x)ψi0 (x)) dx .
1.0 0 0
0.5 We choose ψi = ϕi+1 , i = 0, . . . , N = Nn − 3 if the node numbering is
0.0 x from left to right. (Later we also need the assumption that cells too are
0.5 3 0 4 5 2 1 numbered from left to right.) The boundary function becomes
1.0 Ω(3) Ω(2) Ω(1) Ω(4) Ω(0)
B(x) = Cϕ0 (x) + DϕNn −1 (x) .
1.5 0 1 2 3 4 5 6 7
The expansion for u(x) is
X
Say we in this mesh have Dirichlet conditions on the left-most and u(x) = B(x) + cj ϕj+1 (x) .
right-most node, with numbers 3 and 1, respectively. We can number j∈Is
the unknowns at the interior nodes as we want, e.g., from left to right,
We can write the matrix and right-hand side entries as
resulting in ν(0) = 0, ν(1) = 4, ν(2) = 5, ν(3) = 2. This gives
Z
B(x) = U3 ϕ3 (x) + U1 ϕ1 (x), L
Ai−1,j−1 = ϕ0i (x)ϕ0j (x) dx,
0
and Z L
bi−1 = (f (x)ϕ0i (x) − (Cϕ00 (x) + Dϕ0Nn −1 (x))ϕ0i (x)) dx,
0
3
X
u(x) = B(x) + cj ϕν(j) = U3 ϕ3 + U1 ϕ1 + c0 ϕ0 + c1 ϕ4 + c2 ϕ5 + c3 ϕ2 . for i, j = 1, . . . , N + 1 = Nn − 2. Note that we have here used B 0 =
j=0 Cϕ00 + Dϕ0Nn −1 .
The idea of constructing B, described here, generalizes almost trivially Computations in physical coordinates. Most of the terms in the linear
P
to 2D and 3D problems: B = j∈Ib Uj ϕj , where Ib is the index set system have already been computed so we concentrate on the new con-
R
containing the numbers of all the nodes on the boundaries where Dirichlet tribution from the boundary function. The integral C 0L ϕ00 (x))ϕ0i (x) dx,
values are prescribed. associated with the Dirichlet condition in x = 0, can only get a nonzero
contribution from the first cell, Ω (0) = [x0 , x1 ] since ϕ00 (x) = 0 on all
other cells. Moreover, ϕ00 (x)ϕ0i (x) dx =
6 0 only for i = 0 and i = 1 (but
5.2.2 Example on computing with a finite element-based node i = 0 is excluded from the formulation), since ϕ Ri
= 0 on the first cell
boundary function if i > 1. With a similar reasoning we realize that D 0L ϕ0Nn −1 (x))ϕ0i (x) dx
can only get a nonzero contribution from the last cell. From the explana-
Let us see how the model problem −u00 = 2, u(0) = C, u(L) = D, tions of the calculations in Section 3.1.6 we then find that
is affected by a B(x) to incorporate boundary values. Inserting the
expression
Z Z
L 1 1 1 1 2 dϕ̃1 2 dϕ̃0 h
ϕ00 (x)ϕ01 (x) dx = (− ) · · h = − , 0 =
b̃N f ϕ̃0 − D dX
e
0 h h h −1 h dX h dX 2
Z L Z 1
1 1 1 h 212 1 h 1
ϕ0Nn −1 (x)ϕ0Nn −2 (x) dx = · (− ) · h = − . = 2 ϕ̃0 dX − D (− ) · 2 = h + D .
0 h h h 2 −1 h2h 2 2 h
With these expressions we get The contributions from the B(x) function to the global right-hand side
vector becomes C/h for b0 and D/h for bN , exactly as we computed in
Z L 1
Z L 1 the physical domain.
b0 = f (x)ϕ1 dx − C(− ), bN = f (x)ϕNn −2 dx − D(− ) .
0 h 0 h
Cellwise computations on the reference element. As an equivalent 5.2.3 Modification of the linear system
alternative, we now turn to cellwise computations. The element matrices
and vectors are calculated as in Section 5.1.4, so we concentrate on the From an implementational point of view, there is a convenient alternative
impact of the new term involving B(x). This new term, B 0 = Cϕ00 + to adding the B(x) function and using only the basis functions associated
Dϕ0Nn −1 , vanishes on all cells except for e = 0 and e = Ne . Over the first with nodes where u is truly unknown. Instead of seeking
cell (e = 0) the B 0 (x) function in local coordinates reads X X
u(x) = Uj ϕj (x) + cj ϕν(j) (x), (5.12)
dB 2 dϕ̃0
=C , j∈Ib j∈Is
dx h dX we use the sum over all degrees of freedom, including the known boundary
while over the last cell (e = Ne ) it looks like values:
dB 2 dϕ̃1 X
=D . u(x) = cj ϕj (x) . (5.13)
dx h dX j∈Is
For an arbitrary interior cell, we have the formula
Note that the collections of unknowns {ci }i∈Is in (5.12) and (5.13) are
Z 1 h different. The index set Is = {0, . . . , N } always goes to N , and the
b̃(e) = f (x(X))ϕ̃r (X) dX, number of unknowns is N + 1, but in (5.12) the unknowns correspond
r
−1 2
to nodes where u is not known, while in (5.13) the unknowns cover u
for an entry in the local element vector. In the first cell, the value at
values at all the nodes. So, if the index set Ib contains Nb node numbers
local node 0 is known so only the value at local node 1 is unknown. The
where u is prescribed, we have that N = Nn − Nb in (5.12) and N = Nn
associated element vector entry becomes
in (5.13).
Z The idea is to compute the entries in the linear system as if no Dirichlet
(1)
1 2 dϕ̃0 2 dϕ̃1 h values are prescribed. Afterwards, we modify the linear system to ensure
b̃0 = f ϕ̃1 − C dX
−1 h dX h dX 2 that the known cj values are incorporated.
Z 1
h 2 1 21h 1 A potential problem arises for the boundary term [u0 v]L 0 from the
= 2 ϕ̃1 dX − C (− ) ·2=h+C .
2 −1 h 2 h22 h integration by parts: imagining no Dirichlet conditions means that we no
longer require v = 0 at Dirichlet points, and the boundary term is then
The value at local node 1 in the last cell is known so the element vector nonzero at these points. However, when we modify the linear system, we
here is will erase whatever the contribution from [u0 v]L
0 should be at the Dirichlet
points in the right-hand side of the linear system. We can therefore safely
forget [u0 v]L
0 at any point where a Dirichlet condition applies.
  
Computations in the physical system. Let us redo the computations 1 −1 0 · · · · · · · · · · · · ··· 0 c0  
h
..   ..
in the example in Section 5.2.1. We solve −u00 = 2 with u(0) = 0  −1 2 −1 . . .
   
.  .
    2h 
and u(L) = D. The expressions for Ai,j and bi are the same, but the 
..   .
  
  . 
 0 −1 2 −1 . . .

.   .   .. 
numbering is different as the numbering of unknowns and nodes now   .   
 . . .. .. ..  
 ..   . 
coincide:  .. . . . . 0 .    .. 
  .   
1
 .. .. .. .. .. .. ..   ..
    . 
= . 
Z L Z L . . . . . . .  .   .  (5.14)
Ai,j = ϕ0i (x)ϕ0j (x) dx, bi = f (x)ϕi (x) dx, h
 .
. . ..   ..
   . 
 .   . 
0 0  . 0 −1 2 −1 . .  .   . 
    
for i, j = 0, . . . , N = Nn − 1. The integrals involving basis functions  .. .. .. .. ..   ..   .. 
 . . . . . 0   .
   . 
corresponding to interior mesh nodes, i, j = 1, . . . , Nn − 2, are obviously 
 .. .. .. ..   ..
  
  2h 
the same as before. We concentrate on the contributions from ϕ0 and  . . . . −1   . 
h
ϕNn −1 : 0 · · · · · · · · · · · · · · · 0 −1 1 cN
Incorporation of Dirichlet values can now be done by replacing the

Z Z
L 1
x1
first and last equation by the very simple equations c0 = 0 and cN = D,
A0,0 = (ϕ00 )2
dx = = dx , (ϕ00 )2
0 0 h respectively. Note that the factor 1/h in front of the matrix then requires
Z L Z x1
1 a factor h to be introduce appropriately on the diagonal in the first and
A0,1 = ϕ0 ϕ1 dx =
0 0
ϕ00 ϕ01 dx = − , last row of the matrix.
0 0 h
Z Z
L xNn −1 1
AN,N = (ϕ0N )2 dx = (ϕ0N )2 dx = ,   
0 xNn −2 h h 00 ··· ··· ··· ··· ··· 0 c0  
0
Z Z xN −1 ..   ..
1  −1 2 −1 . . .
L    
.  .   2h 
n
AN,N −1 = ϕ0N ϕ0N −1 dx = ϕ0N ϕ0N −1 dx =− . 
 
  
0 xNn −2 h
 0 −1 2 −1 . . .
 ..   .
 .
  . 
  .. 
 .   .   
 . . .
..   ...
The new terms on the right-hand side are also those involving ϕ0 and . .     . 
 .. . . .. .. 0   .. 
ϕNn −1 : 
1
   
 ..  .
.. 
.. .. .. .. ..    . 
.   ..
. . . . . = .  (5.15)
h .   . 
 .
. . ..   ..
   . 
 .   . 
Z L Z x1  . 0 −1 2 −1 . .  .   . 
b0 = 2ϕ0 (x) dx = 2ϕ0 (x) dx = h, 
 .. .. .. .. ..

  ..
  
  .. 
Z
0 0
Z
 .

. . . . 0   .
   . 
  
L xNn −1  .. . . . . . .   ..   2h 
bN = 2ϕNn −1 dx = 2ϕNn −1 dx = h .  . . . . −1   . 
0 D
0 ··· ··· ··· ··· ··· 0 0 h
xNn −2
cN
The complete matrix system, involving all degrees of freedom, takes
the form Note that because we do not require ϕi (0) = 0 and ϕi (L) = 0, i ∈ Is ,
the boundary term [u0 v]L 0 , in principle, gives contributions u (0)ϕ0 (0)
0
to b0 and u0 (L)ϕN (L) to bN (u0 ϕi vanishes for x = 0 or x = L for

i = 1, . . . , N − 1). Nevertheless, we erase these contributions in b0 and
bN and insert boundary values instead. This argument shows why we
can drop computing [u0 v]L 0 at Dirichlet nodes when we implement the
Dirichlet values by modifying the linear system.
  
5.2.4 Symmetric modification of the linear system h 0 0 ··· ··· ··· ··· ··· 0 c0 
0

. ..   ..
2 −1 . .
   
0 .  .
    2h
The original matrix system (5.3) is symmetric, but the modifications    
 . ..   .   . 
in (5.15) destroy this symmetry. Our described modification will in −1 2 −1 . .  . ..
0 
 .  .
 
  

general destroy an initial symmetry in the matrix system. This is not a .
 .. .. .. .. ..  
 ..   .
.. 
. . . 0 .   .
  
particular computational disadvantage for tridiagonal systems arising in 
1
  
 .. .. .. .. .. .. ..   .. ..
    
. . . . . = (5.16)
1D problems, but may be more serious in 2D and 3D problems when the h . .  .   . 

.
. . ..   .. ..
  
systems are large and exploiting symmetry can be important for halving .   
. 0 −1 2 −1 . .   .   . 

the storage demands and speeding up computations. Methods for solving 
 .. .. .. .. .. 

 ..
 
  .. 
symmetric matrix are also usually more stable and efficient than those .

. . . . 0  .
  
  . 

 .. .. .. ..    .   2h + D/h 
for non-symmetric systems. Therefore, an alternative modification which . . . . 0  ..  
preserves symmetry is attractive. 0 ··· ··· ··· ··· ··· 0 0 h cN
D
One can formulate a general algorithm for incorporating a Dirichlet
condition in a symmetric way. Let ck be a coefficient corresponding to a
known value u(xk ) = Uk . We want to replace equation k in the system
5.2.5 Modification of the element matrix and vector
by ck = Uk , i.e., insert zeroes in row number k in the coefficient matrix,
set 1 on the diagonal, and replace bk by Uk . A symmetry-preserving The modifications of the global linear system can alternatively be done for
modification consists in first subtracting column number k in the coeffi- the element matrix and vector. Let us perform the associated calculations
cient matrix, i.e., Ai,k for i ∈ Is , times the boundary value Uk , from the in the computational example where the element matrix and vector is
right-hand side: bi ← bi − Ai,k Uk , i = 0, . . . , N . Then we put zeroes in given by (5.8). The modifications are needed in cells where one of the
both row number k and column number k in the coefficient matrix, and degrees of freedom is known. In the present example, this means the first
finally set bk = Uk . The steps in algorithmic form becomes and last cell. We compute the element matrix and vector as if there were
no Dirichlet conditions. The boundary term [u0 v]L 0 is simply forgotten
1. bi ← bi − Ai,k Uk for i ∈ Is
at nodes that have Dirichlet conditions because the modification of the
2. Ai,k = Ak,i = 0 for i ∈ Is
element vector will anyway erase the contribution from the boundary
3. Ak,k = 1
term. In the first cell, local degree of freedom number 0 is known and
4. bi = Uk
the modification becomes
This modification goes as follows for the specific linear system written ! !
out in (5.14) in Section 5.2.3. First we subtract the first column in the (0) 1 h0 (0) 0
Ã =A= , b̃ = . (5.17)
coefficient matrix, times the boundary value, from the right-hand side. h −1 1 h
Because c0 = 0, this subtraction has no effect. Then we subtract the In the last cell we set
last column, times the boundary value D, from the right-hand side. This ! !
action results in bN −1 = 2h + D/h and bN = h − 2D/h. Thereafter, we (Ne ) 1 1 −1 (Ne ) h
place zeros in the first and last row and column in the coefficient matrix Ã =A= , b̃ = . (5.18)
h 0 h D
and 1 on the two corresponding diagonal entries. Finally, we set b0 = 0
and bN = D. The result becomes We can also perform the symmetric modification. This operation affects
only the last cell with a nonzero Dirichlet condition. The algorithm is
the same as for the global linear system, resulting in
! !
(Ne ) 1 10 (Ne ) h + D/h
Ã =A= , b̃ = . (5.19)
h 0h D
5.3 Boundary conditions: specified derivative 199 200 5 Variational formulations with finite elements
The reader is encouraged to assemble the element matrices and vectors The basis function that is left out is used in a boundary function B(x)
and check that the result coincides with the system (5.16). instead. With a left-to-right numbering, ψi = ϕi , i = 0, . . . , Nn − 2, and
B(x) = DϕNn −1 :
N =N
Xn −2
5.3 Boundary conditions: specified derivative u(x) = DϕNn −1 (x) + cj ϕj (x) .
j=0
Suppose our model problem −u00 (x) = f (x) features the boundary condi- Inserting this expansion for u in the variational form (5.3.1) leads to
tions u0 (0) = C and u(L) = D. As already indicated in Section 4.3, the the linear system
former condition can be incorporated through the boundary term that
arises from integration by parts. The details of this method will now be !
Z Z
illustrated in the context of finite element basis functions. N
X L L
ϕ0i (x)ϕ0j (x) dx cj = f (x)ϕi (x) − Dϕ0Nn −1 (x)ϕ0i (x) dx−Cϕi (0),
j=0 0 0
(5.20)
5.3.1 The variational formulation for i = 0, . . . , N = Nn − 2.
Starting with the Galerkin method,
Z
5.3.3 Boundary term vanishes because of linear system
L
(u00 (x) + f (x))ψi (x) dx = 0, i ∈ Is ,
0 modifications
integrating u00 ψi by parts results in
We may, as an alternative to the approach in the previous section, use a
Z Z
basis {ψi }i∈Is which contains all the finite element functions on the mesh:
L L
ψi = ϕi , i = 0, . . . , Nn −1 = N . In this case, u0 (L)ψi (L) = u0 (L)ϕi (L) 6= 0
u0 (x)0 ψi0 (x) dx−(u0 (L)ψi (L)−u0 (0)ψi (0)) = f (x)ψi (x) dx, i ∈ Is .
0 0 for the i corresponding to the boundary node at x = L (where ϕi = 1).
The number of this node is i = Nn − 1 = N if a left-to-right numbering
The first boundary term, u0 (L)ψi (L), vanishes because u(L) = D.
of nodes is utilized.
The second boundary term, u0 (0)ψi (0), can be used to implement the
However, even though u0 (L)ϕNn −1 (L) 6= 0, we do not need to compute
condition u0 (0) = C, provided ψi (0) 6= 0 for some i (but with finite
this term. For i < Nn − 1 we realize that ϕi (L) = 0. The only nonzero
elements we fortunately have ψ0 (0) = 1). The variational form of the
contribution to the right-hand side comes from i = N (bN ). Without a
differential equation then becomes
boundary function we must implement the condition u(L) = D by the
Z L Z L equivalent statement cN = D and modify the linear system accordingly.
u0 (x)ϕ0i (x) dx + Cϕi (0) = f (x)ϕi (x) dx, i ∈ Is . This modification will erase the last row and replace bN by another value.
0 0
Any attempt to compute the boundary term u0 (L)ϕNn −1 (L) and store it
in bN will be lost. Therefore, we can safely forget about boundary terms
5.3.2 Boundary term vanishes because of the test functions corresponding to Dirichlet boundary conditions also when we use the
methods from Section 5.2.3 or Section 5.2.4.
At points where u is known we may require ψi to vanish. Here, u(L) = D The expansion for u reads
and then ψi (L) = 0, i ∈ Is . Obviously, the boundary term u0 (L)ψi (L)
X
then vanishes. u(x) = cj ϕj (x) .
The set of basis functions {ψi }i∈Is contains, in this case, all the finite j∈Is
element basis functions on the mesh, except the one that is 1 at x = L.
5.3 Boundary conditions: specified derivative 201 202 5 Variational formulations with finite elements
    
Insertion in the variational form (5.3.1) leads to the linear system 1 −1 0 · · · · · · · · · · · · ··· 0 c0
h−C
..   ..
 −1 2 −1 . . .
   
.  .
 
  2h 
!   
..   . . 
 0 −1 2 −1 . . . .
Z Z    
X
.   . .
 .
L L   
ϕ0i (x)ϕ0j (x) dx cj = (f (x)ϕi (x)) dx − Cϕi (0), i ∈ Is . 
 . .
 
..

0 0  .. . . .. .. ..  
 
 .. .

j∈Is
 . . 0 . 
 
   .


(5.21) 1
 .. .. .. .. .. .. ..   ..
 
  .. 
. . . . . . =
.  . . .
After having computed the system, we replace the last row by cN = D, h
 .
 
.

. . ..   ..
 
 
either straightforwardly as in Section 5.2.3 or in a symmetric fashion  .   .. 
 . 0 −1 2 −1 . .  .
  
as in Section 5.2.4. These modifications can also be performed in the

 .. .. .. .. ..
 

  ..
  .
.


 . . . . . 0  .
 .
   
element matrix and vector for the right-most cell. 
 ..
 
  .. .. 
.. .. ..  
.

 . . . . −1   .
  
0 · · · · · · · · · · · · · · · 0 −1 2 cN 2h − Dh/6
5.3.4 Direct computation of the global linear system (5.22)
Next we consider the technique where we modify the linear system to
We now turn to actual computations with P1 finite elements. The focus incorporate Dirichlet conditions (see Section 5.3.3).RNow N = Nn −1. The
is on how the linear system and the element matrices and vectors are two differences from the case above is that the − 0L DϕNn −1 ϕi dx term
modified by the condition u0 (0) = C. is left out of the right-hand side and an extra last row associated with
Consider first the approach where Dirichlet conditions are incorporated the node xNn −1 = L where the Dirichlet condition applies is appended
by a B(x) function and the known degree of freedom CNn −1 is left out of to the system. This last row is anyway replaced by the condition cN = D
the linear system (see Section 5.3.2). The relevant formula for the linear or this condition can be incorporated in a symmetric fashion. Using the
system is given by (5.20). There are three differences compared to the simplest, former approach gives
extensively computed case where u(0) = 0 in Sections 5.1.2 and 5.1.4.
First, because we do not have a Dirichlet condition at the left boundary,   
we need to extend the linear system (5.3) with an equation associated 1 −1 0 · · · · · ·
··· ··· ··· 0 c0  
..   .. h−C
.
2 −1 . .
  
with the node x0 = 0. According to Section 5.2.3, this extension consists  −1
 .  .
    2h 
  
of including A0,0 = 1/h, A0,1 = −1/h, and b0 = h. For i > 0 we have  . ..   .   . 
 0 −1 2 −1 . . .   .
 .
  .. 
Ai,i = 2/h, Ai−1,i = Ai,i+1 = −1/h. Second, we need to include the 
 . .. .. .. ..   .
 
  . 
 .. . . . 0 .   .   .. 
extra term −Cϕi (0) on the right-hand side. Since all ϕi (0) = 0 for   .   
1
 .. .. .. .. .. .. ..  

 .
  . 
.   ..
i = 1, . . . , N , this term reduces to −Cϕ0 (0) = −C and affects only the . . . . . = .  (5.23)
h .   .  .
first equation (i = 0). We simply add −C to b0 such that b0 = h − C.  . .    . 
 ..

 . .   . 
R
Third, the boundary term − 0L DϕNn −1 (x)ϕi dx must be computed.  .

0 −1 2 −1 . . ..  
 .   . 
  
 .. .. .. .. ..   ..   .. 
Since i = 0, . . . , N = Nn − 2, this integral can only get a nonzero  . . . . . 0    . 
 .
 
  
contribution with i = Nn − 2 over the last cell. The result becomes  .. ..   ..   2h 
 . . −1 2 −1   .  
−Dh/6. The resulting linear system can be summarized in the form D
0 ··· ··· ··· ··· ··· 0 0 h cN
5.3.5 Cellwise computations

Now we compute with one element at a time, working in the reference
coordinate system X ∈ [−1, 1]. We need to see how the u0 (0) = C
5.4 Implementation of finite element algorithms 203 204 5 Variational formulations with finite elements
condition affects the element matrix and vector. The extra term −Cϕi (0) 1. other types of integrands (as implied by the variational formulation)
in the variational formulation only affects the element vector in the first 2. additional boundary terms in the variational formulation for Neumann
cell. On the reference cell, −Cϕi (0) is transformed to −C ϕ̃r (−1), where boundary conditions
r counts local degrees of freedom. We have ϕ̃0 (−1) = 1 and ϕ̃1 (−1) = 0 3. modification of element matrices and vectors due to Dirichlet boundary
(0)
so we are left with the contribution −C ϕ̃0 (−1) = −C to b̃0 : conditions
! ! Point 1 and 2 can be taken care of by letting the user supply functions
(0) 1 11 (0) h−C
Ã =A= , b̃ = . (5.24) defining the integrands and boundary terms on the left- and right-hand
h −1 1 h
side of the equation system:
No other element matrices or vectors are affected by the −Cϕi (0) bound-
ary term. • Integrand on the left-hand side: ilhs(e, phi, r, s, X, x, h)
There are two alternative ways of incorporating the Dirichlet condition. • Integrand on the right-hand side: irhs(e, phi, r, X, x, h)
Following Section 5.3.2, we get a 1 × 1 element matrix in the last cell • Boundary term on the left-hand side: blhs (e, phi, r, s, X, x,
and an element vector with an extra term containing D: h)
• Boundary term on the right-hand side: brhs (e, phi, r, s, X, x,
1 h)
Ã(e) =
1 , b̃(e) = h 1 − D/6 , (5.25)
h
Here, phi is a dictionary where phi[q] holds a list of the derivatives of
Alternatively, we include the degree of freedom at the node with
order q of the basis functions with respect to the physical coordinate
u specified. The element matrix and vector must then be modified to
x. The derivatives are available as Python functions of X. For example,
constrain the c̃1 = cN value at local node r = 1:
phi[0][r](X) means ϕ̃r (X), and phi[1][s](X, h) means dϕ̃s (X)/dx
1
! ! (we refer to the file fe1D.py for details regarding the function basis
11 h
Ã(Ne ) = A = , b̃(Ne ) = . (5.26) that computes the phi dictionary). The r and s arguments in the above
h 0h D
functions correspond to the index in the integrand contribution from an
(e) (e)
integration point to Ãr,s and b̃r . The variables e and h are the current
element number and the length of the cell, respectively. Specific examples
5.4 Implementation of finite element algorithms below will make it clear how to construct these Python functions.
Given a mesh represented by vertices, cells, and dof_map as ex-
At this point, it is sensible to create a program with symbolic calculations plained before, we can write a pseudo Python code to list all the steps in
to perform all the steps in the computational machinery, both for au- the computational algorithm for finite element solution of a differential
tomating the work and for documenting the complete algorithms. As we equation.
have seen, there are quite many details involved with finite element com-
<Declare global matrix and rhs: A, b>
putations and incorporation of boundary conditions. An implementation
will also act as a structured summary of all these details. for e in range(len(cells)):
# Compute element matrix and vector

n = len(dof_map[e]) # no of dofs in this element
5.4.1 Extensions of the code for approximation h = vertices[cells[e][1]] - vertices[cells[e][1]]
<Initialize element matrix and vector: A_e, b_e>
Implementation of the finite element algorithms for differential equations # Integrate over the reference cell
follows closely the algorithm for approximation of functions. The new points, weights = <numerical integration rule>
additional ingredients are for X, w in zip(points, weights):
phi = <basis functions and derivatives at X>
detJ = h/2 A complete function finite_element1D_naive for the 1D finite algo-

dX = detJ*w rithm above, is found in the file fe1D.py. The term “naive” refers to a
x = <affine mapping from X> version of the algorithm where we use a standard dense square matrix as
for r in range(n): global matrix A. The implementation also has a verbose mode for printing
for s in range(n): out the element matrices and vectors as they are computed. Below is
A_e[r,s] += ilhs(e, phi, r, s, X, x, h)*dX
b_e[r] += irhs(e, phi, r, X, x, h)*dX the complete function without the print statements. You should study in
detail since it contains all the steps in the finite element algorithm.
# Add boundary terms
for r in range(n): def finite_element1D_naive(
for s in range(n): vertices, cells, dof_map, # mesh
A_e[r,s] += blhs(e, phi, r, s, X, x)*dX essbc, # essbc[globdof]=value
b_e[r] += brhs(e, phi, r, X, x, h)*dX ilhs, # integrand left-hand side
irhs, # integrand right-hand side
# Incorporate essential boundary conditions blhs=lambda e, phi, r, s, X, x, h: 0,
for r in range(n): brhs=lambda e, phi, r, X, x, h: 0,
global_dof = dof_map[e][r] intrule=’GaussLegendre’, # integration rule class
if global_dof in essbc: verbose=False, # print intermediate results?
# local dof r is subject to an essential condition ):
value = essbc[global_dof] N_e = len(cells)
# Symmetric modification N_n = np.array(dof_map).max() + 1
b_e -= value*A_e[:,r]
A_e[r,:] = 0 A = np.zeros((N_n, N_n))
A_e[:,r] = 0 b = np.zeros(N_n)
A_e[r,r] = 1
b_e[r] = value for e in range(N_e):
Omega_e = [vertices[cells[e][0]], vertices[cells[e][1]]]
# Assemble h = Omega_e[1] - Omega_e[0]
for r in range(n):
for s in range(n): d = len(dof_map[e]) - 1 # Polynomial degree
A[dof_map[e][r], dof_map[e][s]] += A_e[r,s] # Compute all element basis functions and their derivatives
b[dof_map[e][r] += b_e[r] phi = basis(d)
<solve linear system> # Element matrix and vector

n = d+1 # No of dofs per element
Having implemented this function, the user only has supply the ilhs, A_e = np.zeros((n, n))
irhs, blhs, and brhs functions that specify the PDE and boundary b_e = np.zeros(n)
conditions. The rest of the implementation forms a generic computa- # Integrate over the reference cell
tional engine. The big finite element packages Diffpack and FEniCS are if intrule == ’GaussLegendre’:
structured exactly the same way. A sample implementation of ilhs for points, weights = GaussLegendre(d+1)
elif intrule == ’NewtonCotes’:
the 1D Poisson problem is: points, weights = NewtonCotes(d+1)
def integrand_lhs(phi, i, j): for X, w in zip(points, weights):
return phi[1][i]*phi[1][j] detJ = h/2
x = affine_mapping(X, Omega_e)
which returns dϕ̃i (X)/dxdϕ̃j (X)/dx. Reducing the amount of code the dX = detJ*w
user has to supply and making the code as close as possible to the
mathematical formulation makes the software user-friendly and easy to # Compute contribution to element matrix and vector
for r in range(n):
debug. for s in range(n):
A_e[r,s] += ilhs(phi, r, s, X, x, h)*dX
b_e[r] += irhs(phi, r, X, x, h)*dX A[i,j] = term
# Add boundary terms but only the index pairs (i,j) we have used in assignments or in-place
for r in range(n): arithmetics are actually stored. A tailored solution algorithm is needed.
for s in range(n):
A_e[r,s] += blhs(phi, r, s, X, x, h)
The most reliable is sparse Gaussian elimination. SciPy gives access to
b_e[r] += brhs(phi, r, X, x, h) the UMFPACK algorithm for this purpose:
# Incorporate essential boundary conditions import scipy.sparse.linalg
modified = False c = scipy.sparse.linalg.spsolve(A.tocsr(), b, use_umfpack=True)
for r in range(n):
global_dof = dof_map[e][r] The declaration of A and the solve statement are the only changes needed
if global_dof in essbc: in the finite_element1D_naive to utilize sparse matrices. The resulting
# dof r is subject to an essential condition modification is found in the function finite_element1D.
value = essbc[global_dof]
# Symmetric modification
b_e -= value*A_e[:,r]
A_e[r,:] = 0 5.4.3 Application to our model problem
A_e[:,r] = 0
A_e[r,r] = 1 Let us demonstrate the finite element software on
b_e[r] = value
modified = True
−u00 (x) = f (x), x ∈ (0, L), u0 (0) = C, u(L) = D .
# Assemble
for r in range(n): This problem can be analytically solved by the model2 function from
for s in range(n): Section 4.1.2. Let f (x) = x2 . Calling model2(x**2, L, C, D) gives
A[dof_map[e][r], dof_map[e][s]] += A_e[r,s]
b[dof_map[e][r]] += b_e[r]
1 4
u(x) = D + C(x − L) + (L − x4 )
c = np.linalg.solve(A, b) 12
return c, A, b, timing
The variational formulation reads
The timing object is a dictionary holding the CPU spent on computing
A and the CPU time spent on solving the linear system. (We have left (u0 , v) = (x2 , v) − Cv(0) .
out the timing statements.) The entries in the element matrix and vector, which we need to set up
the ilhs, irhs, blhs, and brhs functions, becomes
5.4.2 Utilizing a sparse matrix Z 1 dϕ̃r ϕ̃s

A potential efficiency problem with the finite_element1D_naive func- A(e)
r,s = (det J dX),
−1 dx dx
tion is that it uses dense (N + 1) × (N + 1) matrices, while we know that Z 1
only 2d + 1 diagonals around the main diagonal are different from zero. b(e) = x2 ϕ̃r det J dX − C ϕ̃r (−1)I(e, 0),
Switching to a sparse matrix is very easy. Using the DOK (dictionary of
−1
keys) format, we declare A as where I(e) is an indicator function: I(e, q) = 1 if e = q, otherwise I(e) = 0.
We use this indicator function to formulate that the boundary term Cv(0),
import scipy.sparse
A = scipy.sparse.dok_matrix((N_n, N_n)) which in the local element coordinate system becomes C ϕ̃r (−1), is only
included for the element e = 0.
Assignments or in-place arithmetics are done as for a dense matrix,
The functions for specifying the element matrix and vector entries
A[i,j] += term must contain the integrand, but without the det J dX term as this term
is taken care of by the quadrature loop, and the derivatives dϕ̃r (X)/dx between the nodes. Exercise 5.6 asks you to explore this problem further
with respect to the physical x coordinates are contained in phi[1][r](X), using other m and d values.
computed by the function basis.
def ilhs(e, phi, r, s, X, x, h):
return phi[1][r](X, h)*phi[1][s](X, h) 14
finite elements, d=1
def irhs(e, phi, r, X, x, h): exact
return x**2*phi[0][r](X) 12
def blhs(e, phi, r, s, X, x, h): 10

return 0
def brhs(e, phi, r, X, x, h): 8

return -C*phi[0][r](-1) if e == 0 else 0
We can then make the call to finite_element1D_naive or 6

finite_element1D to solve the problem with two P1 elements:
4
from fe1D import finite_element1D_naive, mesh_uniform
C = 5; D = 2; L = 4
d = 1 2
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0
vertices, cells, dof_map = mesh_uniform(
N_e=2, d=d, Omega=[0,L], symbolic=False) Fig. 5.2 Finite element and exact solution using two cells.
essbc = {}
essbc[dof_map[-1][-1]] = D
c, A, b, timing = finite_element1D(
vertices, cells, dof_map, essbc,
ilhs=ilhs, irhs=irhs, blhs=blhs, brhs=brhs, 5.5 Variational formulations in 2D and 3D
intrule=’GaussLegendre’)
It remains to plot the solution (with high resolution in each element). To The major difference between deriving variational formulations in 2D and
this end, we use the u_glob function imported from fe1D, which imports 3D compared to 1D is the rule for integrating by parts. The cells have
it from fe_approx1D_numit (the u_glob function in fe_approx1D.py shapes different from an interval, so basis functions look a bit different,
works with elements and nodes, while u_glob in fe_approx1D_numint and there is a technical difference in actually calculating the integrals
works with cells, vertices, and dof_map): over cells. Otherwise, going to 2D and 3D is not a big step from 1D. All
u_exact = lambda x: D + C*(x-L) + (1./6)*(L**3 - x**3) the fundamental ideas still apply.
from fe1D import u_glob
x, u, nodes = u_glob(c, cells, vertices, dof_map)
u_e = u_exact(x, C, D, L)
print u_exact(nodes, C, D, L) - c # difference at the nodes 5.5.1 Integration by parts
import matplotlib.pyplot as plt A typical second-order term in a PDE may be written in dimension-
plt.plot(x, u, ’b-’, x, u_e, ’r--’)
plt.legend([’finite elements, d=%d’ %d, ’exact’], loc=’upper left’)
independent notation as
plt.show()
∇2 u or ∇ · (α(x)∇u) .
The result is shown in Figure 5.2. We see that the solution using P1
elements is exact at the nodes, but feature considerable discrepancy The explicit forms in a 2D problem become
5.5 Variational formulations in 2D and 3D 211 212 5 Variational formulations with finite elements
∂2u ∂2u The vector field v and the scalar functions a, α, f , u0 , and g may vary
∇2 u = ∇ · ∇u = + ,
∂x2 ∂y 2 with the spatial coordinate x and must be known.
Such a second-order PDE needs exactly one boundary condition at each
and
∂

∂u

∂

∂u
point of the boundary, so ∂ΩN ∪ ∂ΩD must be the complete boundary
∇ · (a(x)∇u) = α(x, y) + α(x, y) . ∂Ω.
∂x ∂x ∂y ∂y
Assume that the boundary function u0 (x) is defined for all x ∈ Ω.
We shall continue with the latter operator as the former arises from just
The unknown function can then be expanded as
setting α = 1.
X
R u=B+ cj ψj , B = u0 .
The integration by parts formula for ∇ · (α∇) j∈Is
The general rule for integrating by parts is often referred to as As long as any ψj = 0 on ∂ΩD , we realize that u = u0 on ∂ΩD .
Green’s first identity: The variational formula is obtained from Galerkin’s method, which
technically means multiplying the PDE by a test function v and inte-
Z Z Z grating over Ω:
∂u
− ∇·(α(x)∇u)v dx = α(x)∇u·∇v dx− a v ds, (5.27) Z Z Z
Ω Ω ∂Ω ∂n (v · ∇u + βu)v dx = ∇ · (α∇u) dx + f v dx .
Ω Ω Ω
where ∂Ω is the boundary of Ω and ∂u/∂n = n ·∇u is the derivative
The second-order term is integrated by parts, according to the formula
of u in the outward normal Rdirection, n being an outward unit
(5.27):
normal to ∂Ω. The integrals R Ω () dx are area integrals in 2D and
volume integrals in 3D, while ∂Ω () ds is a line integral in 2D and a Z Z Z
∂u
surface integral in 3D. ∇ · (α∇u) v dx = − α∇u · ∇v dx + α v ds .
Ω Ω ∂Ω ∂n
Galerkin’s method therefore leads to
It will be convenient to divide the boundary into two parts:
Z Z Z Z
• ∂ΩN , where we have Neumann conditions −a ∂n ∂u
= g, and ∂u
(v · ∇u + βu)v dx = − α∇u · ∇v dx + α v ds + f v dx .
• ∂ΩD , where we have Dirichlet conditions u = u0 . Ω Ω ∂Ω ∂n Ω
The test functions v are (as usual) required to vanish on ∂ΩD . The boundary term can be developed further by noticing that v 6= 0 only
on ∂ΩN ,
Z Z
∂u ∂u
5.5.2 Example on a multi-dimensional variational problem α v ds = α v ds,
∂Ω ∂n ∂ΩN ∂n
Here is a quite general, stationary, linear PDE arising in many problems: and that on ∂ΩN , we have the condition a ∂n
∂u
= −g, so the term becomes
Z
− gv ds .
v · ∇u + βu = ∇ · (α∇u) + f, x ∈ Ω, (5.28) ∂ΩN
u = u0 , x ∈ ∂ΩD , (5.29) The final variational form is then
∂u
−α = g, x ∈ ∂ΩN . (5.30) Z Z Z Z
∂n
(v · ∇u + βu)v dx = − α∇u · ∇v dx − gv ds + f v dx .
Ω Ω ∂ΩN Ω
Instead of using the integral signs, we may use the inner product dra and box-shapes elements dominate in 3D. The finite element basis
notation: functions ϕi are, as in 1D, polynomials over each cell. The integrals in
the variational formulation are, as in 1D, split into contributions from
(v · ∇u, v) + (βu, v) = −(α∇u, ∇v) − (g, v)N + (f, v) . each cell, and these contributions are calculated by mapping a physical
cell, expressed in physical coordinates x, to a reference cell in a local
The subscript N in (g, v)N is a notation for a line or surface integral over
coordinate system X . This mapping will now be explained in detail.
∂ΩN , while (·, ·) is the area/volume integral over Ω.
We consider an integral of the type
We can derive explicit expressions for the linear system for {cj }j∈Is
that arises from the variational formulation. Inserting the u expansion Z
results in α(x)∇ϕi · ∇ϕj dx, (5.31)
Ω (e)
where the ϕi functions are finite element basis functions in 2D or 3D,
X
((v · ∇ψj , ψi ) + (βψj , ψi ) + (α∇ψj , ∇ψi ))cj = defined in the physical domain. Suppose we want to calculate this inte-
j∈Is gral over a reference cell, denoted by Ω̃ r , in a coordinate system with
coordinates X = (X0 , X1 ) (2D) or X = (X0 , X1 , X2 ) (3D). The map-
(g, ψi )N + (f, ψi ) − (v · ∇u0 , ψi ) + (βu0 , ψi ) + (α∇u0 , ∇ψi ) .
ping between a point X in the reference coordinate system and the
This is a linear system with matrix entries corresponding point x in the physical coordinate system is given by a
vector relation x(X ). The corresponding Jacobian, J, of this mapping
Ai,j = (v · ∇ψj , ψi ) + (βψj , ψi ) + (α∇ψj , ∇ψi ) has entries
∂xj
Ji,j = .
and right-hand side entries ∂Xi
The change of variables requires dx to be replaced by det J dX. The
bi = (g, ψi )N + (f, ψi ) − (v · ∇u0 , ψi ) + (βu0 , ψi ) + (α∇u0 , ∇ψi ), derivatives in the ∇ operator in the variational form are with respect to
x, which we may denote by ∇x . The ϕi (x) functions in the integral are
for i, j ∈ Is . replaced by local basis functions ϕ̃r (X ) so the integral features ∇x ϕ̃r (X ).
In the finite element method, we usually express u0 in terms of basis We readily have ∇X ϕ̃r (X ) from formulas for the basis functions in the
functions and restrict i and j to run over the degrees of freedom that reference cell, but the desired quantity ∇x ϕ̃r (X ) requires some efforts
are not prescribed as Dirichlet conditions. However, we can also keep to compute. All the details are provided below.
all the {cj }j∈Is as unknowns, drop the u0 in the expansion for u, and Let i = q(e, r) and consider two space dimensions. By the chain rule,
incorporate all the known cj values in the linear system. This has been
∂ ϕ̃r ∂ϕi ∂ϕi ∂x ∂ϕi ∂y
explained in detail in the 1D case, and the technique is the same for 2D = = + ,
and 3D problems. ∂X ∂X ∂x ∂X ∂y ∂X
and
∂ ϕ̃r ∂ϕi ∂ϕi ∂x ∂ϕi ∂y
= = + .
∂Y ∂Y ∂x ∂Y ∂y ∂Y
5.5.3 Transformation to a reference cell in 2D and 3D
We can write these two equations as a vector equation
The real power of the finite element method first becomes evident when " # " #"
∂ϕi
#
∂ ϕ̃r ∂x ∂y
we want to solve partial differential equations posed on two- and three- ∂X = ∂X ∂X ∂x
∂ ϕ̃r ∂x ∂y ∂ϕi
dimensional domains of non-trivial geometric shape. As in 1D, the domain ∂Y ∂Y ∂Y ∂y
Ω is divided into Ne non-overlapping cells. The elements have simple
Identifying
shapes: triangles and quadrilaterals are popular in 2D, while tetrahe-
" # " # " #
∂ ϕ̃r ∂x ∂y ∂ϕi
>>> x
∇X ϕ̃r = ∂X
∂ ϕ̃r , J= ∂X
∂x
∂X
∂y , ∇x ϕr = ∂x
∂ϕi , [(0.16666666666666666, 0.16666666666666666),
∂Y ∂Y ∂Y ∂y (0.66666666666666666, 0.16666666666666666),
(0.16666666666666666, 0.66666666666666666)]
we have the relation >>> w
[0.16666666666666666, 0.16666666666666666, 0.16666666666666666]
∇X ϕ̃r = J · ∇x ϕi , Rules with 1, 3, 4, and 7 points over the triangle will exactly integrate
which we can solve with respect to ∇x ϕi : polynomials of degree 1, 2, 3, and 4, respectively. In 3D, rules with 1, 4,
5, and 11 points over the tetrahedron will exactly integrate polynomials
∇x ϕi = J −1 · ∇X ϕ̃r . (5.32) of degree 1, 2, 3, and 4, respectively.
On the reference cell, ϕi (x) = ϕ̃r (X ), so
∇x ϕ̃r (X ) = J −1 (X ) · ∇X ϕ̃r (X ) . (5.33) 5.5.5 Convenient formulas for P1 elements in 2D
This means that we have the following transformation of the integral We shall now provide some formulas for piecewise linear ϕi functions and
in the physical domain to its counterpart over the reference cell: their integrals in the physical coordinate system. These formulas make it
convenient to compute with P1 elements without the need to work in the
Z Z reference coordinate system and deal with mappings and Jacobians. A
α(x)∇x ϕi ·∇x ϕj dx = α(x(X ))(J −1 ·∇X ϕ̃r )·(J −1 ·∇ϕ̃s ) det J dX lot of computational and algorithmic details are hidden by this approach.
Ω (e) Ω̃ r
(5.34) Let Ω (e) be cell number e, and let the three vertices have global
vertex numbers I, J, and K. The corresponding coordinates are (xI , yI ),
(xJ , yJ ), and (xK , yK ). The basis function ϕI over Ω (e) have the explicit
formula
5.5.4 Numerical integration
1
Integrals are normally computed by numerical integration rules. For ϕI (x, y) = ∆ (αI + βI x + γI y) , (5.35)
2
multi-dimensional cells, various families of rules exist. All of them are
R P where
similar to what is shown in 1D: f dx ≈ j wi f (xj ), where wj are
weights and xj are corresponding points.
The file numint.py contains the functions quadrature_for_triangles(n) αI = xJ yK − xK yJ , (5.36)
and quadrature_for_tetrahedra(n), which returns lists of points and
βI = yJ − yK , (5.37)
weights corresponding to integration rules with n points over the
reference triangle with vertices (0, 0), (1, 0), (0, 1), and the reference γI = xK − xJ , , (5.38)
tetrahedron with vertices (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), respectively.
and
For example, the first two rules for integration over a triangle have 1
 
and 3 points: 1 x I yI
 
>>> import numint 2∆ = det  1 xJ yJ  . (5.39)
>>> x, w = numint.quadrature_for_triangles(num_points=1) 1 x K yK
>>> x
[(0.3333333333333333, 0.3333333333333333)] The quantity ∆ is the area of the cell.
>>> w
[0.5]
The following formula is often convenient when computing element
>>> x, w = numint.quadrature_for_triangles(num_points=3) matrices and vectors:
Z Remark. The mathematical theory of finite element methods is primar-

p!q!r! ily developed for to stationary PDE problems of elliptic nature whose
ϕpI ϕqJ ϕrK dxdy = 2∆ . (5.40)
Ω (e) (p + q + r + 2)! solutions are smooth. However, such problems can be solved with the
(Note that the q in this formula is not to be mixed with the q(e, r) desired accuracy by most numerical methods and pose no difficulties.
mapping of degrees of freedom.) Time-dependent problems, on the other hand, easily lead to non-physical
R
As an example, the element matrix entry Ω (e) ϕI ϕJ dx can be com- features in the numerical solutions and therefore requires more care and
puted by setting p = q = 1 and r = 0, when I 6= J, yielding ∆/12, and knowledge by the user. Our focus on the accuracy of the finite element
p = 2 and q = r = 0, when I = J, resulting in ∆/6. We collect these method will of this reason be centered around time-dependent problems,
numbers in a local element matrix: but then we need a different set of tools for the analysis. These tools are
  based on converting finite element equations to finite difference form and
211 studying Fourier wave components.
∆ 
1 2 1 kam 15: This is really two remarks in one and we can split and put
12
112 them in more appropriate place. Time is dealt with later.
R
The common element matrix entry Ω (e) ∇ϕI · ∇ϕJ dx, arising from a Abstract variational forms. To list the main results from the mathemat-
Laplace term ∇2 u, can also easily be computed by the formulas above. ical theory of finite elements, we consider linear PDEs with an abstract
We have variational form
∆2 a(u, v) = L(v) ∀v ∈ V .
∇ϕI · ∇ϕJ = (βI βJ + γI γJ ) = const,
4
This is the discretized problem (as usual in this book) where we seek
so that the element matrix entry becomes 14 ∆3 (βI βJ + γI γJ ).
u ∈ V . The weak formulation of the corresponding continuous problem,
From an implementational point of view, one will work with local
fulfilled by the exact solution ue ∈ Ve is here written as
vertex numbers r = 0, 1, 2, parameterize the coefficients in the basis
functions by r, and look up vertex coordinates through q(e, r).
a(ue , v) = L(v) ∀v ∈ Ve .
Similar formulas exist for integration of P1 elements in 3D.
The space V is finite dimensional (with dimension N + 1), while Ve is
infinite dimensional. Normally The hope is that u → ue as N → ∞ and
5.5.6 A glimpse of the mathematical theory of the finite V → Ve .
element method Example on an abstract variational form and associated spaces. Con-
sider the problem −u00 (x) = f (x) on Ω = [0, 1], with u(0) = 0 and
Almost all books on the finite element method that introduces the
u0 (1) = β. The weak form is
abstract variational problem a(u, v) = L(v) spend considerable pages on
deriving error estimates and other properties of the approximate solution. Z 1 Z 1
The machinery with function spaces and bilinear and linear forms has a(u, v) = u0 v 0 dx, L(v) = f vdx + βv(1) .
0 0
the great advantage that a very large class of PDE problems can be
The space V for the approximate solution u can be chosen in many ways
analyzed in a unified way. This feature is often taken as an advantage of
as previously described. The exact solution ue fulfills a(u, v) = L(v) for all
finite element methods over finite difference and volume methods. Since
v in Ve , and to specify what Ve is, we need to introduce Hilbert spaces. The
there are so many excellent textbooks on the mathematical properties of
Hilbert space L2 (Ω) consists of all functions that are square-integrable
finite element methods [13, 3, 4, 8, 7, 16], this text will not repeat the
on Ω:
theory, but give a glimpse of typical assumptions and general results for
elliptic PDEs.
Z
L2 (Ω) = v 2 dx < ∞ . a(ue , v) = L(v) ∀v ∈ Ve .
(This result is known as the Lax-Milgram Theorem. We remark that
Ω
The space Ve is the space of all functions whose first-order derivative is symmetry is not strictly needed for this theorem.)
also square-integrable:
Stability. The solution ue ∈ Ve obeys the stability estimate
c0
dv
Ve = H01 (Ω) = v ∈ L2 (Ω) | ∈ L2 (Ω), and v(0) = 0 . c2
. ||u|| ≤
dx
Equivalent minimization problem. The solution ue ∈ Ve also fulfills
The requirements of square-integrable zeroth- and first-order derivatives
the minimization problem
are motivated from the formula for a(u, v) where products of the first-
order derivatives are to be integrated on Ω. We remark that it is common 1
that H01 denote the space of H 1 functions that are zero everywhere on min F (v),
F (v) = a(v, v) − L(v) .
v∈Ve 2
the boundary, but here we use it for functions that are zero only at x = 0.
Best approximation principle. The energy norm is defined as
The Sobolev space H01 (Ω) has an inner product
q
Z
du dv ||v||a = a(v, v) .
(u, v)H 1 = (uv + )dx,
Ω dx dx The discrete solution u ∈ V is the best approximation in energy norm,
and associated norm
q ||ue − u||a ≤ ||ue − v||a ∀v ∈ V .
||v||H 1 = (v, v)H 1 . This is quite remarkable: once we have V (i.e., a mesh and a finite
Assumptions. A set of general results builds on the following assump- element), the Galerkin method finds the Rbest approximation in this space.
tions. Let Ve be an infinite-dimensional inner-product space such that In the example above, we have ||v||a = 01 (v 0 )2 dx, so the derivative u0 is
ue ∈ Ve . The space has an associated norm ||v|| (e.g., ||v||H 1 in the closer to u0e than any other possible function in V :
example above with Ve = H01 (Ω)). Z 1 Z 1
1. L(v) is linear in its argument. (u0e − u0 )2 dx ≤ (u0 − v 0 )dx ∀v ∈ V .

0 0
2. a(u, v) is a bilinear in its arguments. Best approximation property in the norm of the space. If ||v|| is the
3. L(v) is bounded (also called continuous) if there exists a positive norm associated with Ve , we have another best approximation property:
constant c0 such that |L(v)| ≤ c0 ||v|| ∀v ∈ Ve .
4. a(u, v) is bounded (or continuous) if there exists a positive constant c1 2
1
c1 such that |a(u, v)| ≤ c1 ||u||||v|| ∀u, v ∈ Ve . ||ue − u|| ≤
c2
||ue − v|| ∀v ∈ V .
5. a(u, v) is elliptic (or coercive) if there exists a positive constant c2
Symmetric, positive definite coefficient matrix. The discrete problem
such that a(v, v) ≥ c2 ||v||2 ∀v ∈ Ve .
a(u, v) = L(v) ∀v ∈ V leads to a linear system Ac = b, where the
6. a(u, v) is symmetric: a(u, v) = a(v, u).
coefficient matrix A is symmetric (AT = A) and positive definite (xT Ax >
Based on the above assumptions, which must be verified in each specific 0 for all vectors x 6= 0). One can then use solution methods that demand
problem, one can derive some general results that are listed below. less storage and that are faster and more reliable than solvers for general
linear systems. One is also guaranteed the existence and uniqueness of
Existence and uniqueness. There exists a unique solution of the prob-
the discrete solution u.
lem: find ue ∈ Ve such that
Equivalent matrix minimization problem. The solution c of the linear We remark that the second estimate strictly speaking requires extra
system Ac = b also solves the minimization problem minw ( 12 wT Aw − bT w smoothness (regularity).
in the vector space RN +1 .
A priori error estimate for the derivative. In our sample problem,
−u00 = f on Ω = [0, 1], u(0) = 0, u0 (1) = β, one can derive the following 5.6 Implementation in 2D and 3D via FEniCS
error estimate for Lagrange finite element approximations of degree s:
From a principle of view, we have seen that variational forms of the
Z 1 21 type: find a(u, v) = L ∀v ∈ V (and even general nonlinear problems
(u0e − u0 )2 dx ≤ Chs ||ue ||H s+1 , F (u; v) = 0), can apply the computational machinery of introduced for
0
the approximation problem u = f . We actually need two extensions only:
where ||u||H s+1 is a norm that integrates the sum of the square of all
derivatives up to order s + 1, C is a constant, and h is the maximum cell 1. specify Dirichlet boundary conditions as part of V
length. The estimate shows that choosing elements with higher-degree 2. incorporate Neumann flux boundary conditions in the variational form
polynomials (large s) requires more smoothness in ue since higher-order
The algorithms are all the same in any space dimension, we only need to
derivatives need to be square-integrable.
choose the element type and associated integration rule. Once we know
A consequence of the error estimate is that u0 → u0e as h → 0, i.e., the
how to compute things in 1D, and made the computer code sufficiently
approximate solution converges to the exact one.
flexible, the method and code should work for any variational form in
The constant C in depends on the shape of triangles in 2D and
any number of space dimensions! This fact is exactly the idea behind the
tetrahedra in 3D: squeezed elements with a small angle lead to a large
FEniCS finite element software.
C, and such deformed elements are not favorable for the accuracy.
Therefore, if we know how to set up an approximation problem in
One can generalize the above estimate to the general problem class
any dimension in FEniCS, and know how to derive variational forms in
a(u, v) = L(v): the error in the derivative is proportional to hs . Note that
higher dimensions, we are (in principle!) very close to solving a PDE
the expression ||ue − u|| in the example is ||ue − u||H 1 so it involves the
problem in FEniCS. Building on the Section 3.7, we shall now solve a
sum of the zeroth and first derivative. The appearance of the derivative
quite general 1D/2D/3D Poisson problem in FEniCS. There is very much
makes the error proportional to hs - if we only look at the solution it
more FEniCS programming than this example, but it illustrates our fact
converges as hs+1 (see below).
that when we go beyond 1D, there is exists software which leverage the
The above estimate is called an a priori estimate because the bound
full power of the finite element method as a method for solving “any”
contains the exact solution, which is not computable. There are also
problem in any number of space dimensions.
a posteriori estimates where the bound involves the approximation u,
which is available in computations.
A priori error estimate for the solution. The finite element solution of 5.6.1 Mathematical problem
our sample problem fulfills
The following model describes the pressure u in the flow around a bore
||ue − u|| ≤ Ch s+1
||ue ||H s+1 , hole of radius a in a porous medium. If the hole is long in the vertical
direction, we can model it by a 2D domain in the cross section.
This estimate shows that the error converges as h2 for P1 elements. An
equivalent finite difference method, see Section 5.1.3, is known to have
an error proportional to h2 , so the above estimate is expected. In general, ∇ · (α∇u) = 0, a < ||x|| < b, (5.41)
the convergence is hs+1 for elements with polynomials of degree s. Note u(x) = Ua , ||x|| = a, (5.42)
that the estimate for u0 is proportional to h raised to one power less. u(x) = Ub ||x|| = b . (5.43)
5.6 Implementation in 2D and 3D via FEniCS 223 224 5 Variational formulations with finite elements
That is, we have a hollow circular 2D domain with inner radius a and
outer radius b. The pressure is known on these two boundaries, so this is
a pure Dirichlet problem.
Symmetry. The first thing we should observe is that the problem is
radially symmetric, so we can change to polar coordinates and obtain a
1D problem in the radial direction:
(rαu0 )0 = 0, u(a) = Ua , u(b) = Ub .

This is not very exciting beyond being able to find an analytical solution
and compute the true error of a finite element approximation.
However, many software packages solve problems in Cartesian coordi-
nates, and FEniCS basically do this, so we want to take advantage of
symmetry in Cartesian coordinates and reformulate the problem in a
smaller domain.
Looking at the domain as a cake with a hole, any piece of the cake
will be a candidate for a reduced-size domain. The solution is symmetric
about any line θ = const in polar coordinates, so at such lines we have the
symmetry boundary condition ∂u/∂n = 0, i.e., a homogeneous Neumann
condition. In Figure 5.3 we have plotted a possible mesh of cells as
triangles, here with dense refinement toward the bore hole, because we Fig. 5.3 Mesh of a hollow cylinder, with refinement and utilizing symmetry.
know the solution will decay most rapidly toward the origin. This mesh
is a piece of the cake with four sides: Dirichlet conditions on the inner
and outer boundary, named ΓDa and ΓDb , and ∂u/∂n = 0 on the two Z
other sides, named ΓN . In this particular example, the arc of the piece ∇ · (α∇u)v dx = 0, ∀v ∈ V
of the cake is 45 degrees, but any value of the arc will work. Ω
Z Z
The boundary problem can then be expressed as ∂u
=− α∇u · ∇v dx + α v ds
Ω ΓN ∂n
Z
∇ · (α∇u) = 0, x ∈ Ω, (5.44) =− α∇u · ∇v dx .
Ω
u(x) = Ua , x ∈ ΓDa , (5.45)
We are left with a problem of the form: find u such that a(u, v) =
u(x) = Ub , x ∈ ΓDb , (5.46)
L(v) ∀v ∈ V , with
∂u
= 0, x ∈ ΓN . (5.47)
∂n Z
a(u, v) = α∇u · ∇v dx, (5.48)
ZΩ
5.6.2 Variational formulation L(v) = 0v dx . (5.49)
Ω
To obtain the variational formulation, we multiply the PDE by a test
function v and integrate the second-order derivatives by part:
We write the integrand as 0v dx even though L = 0, because it is necessary # Define variational problem
in FEniCS to specify L as a linear form and not the number zero. The u = TrialFunction(V)
v = TestFunction(V)
Dirichlet conditions make a nonzero solution. a = alpha*dot(grad(u), grad(v))*dx
L = Constant(0)*v*dx # L = 0*v*dx = 0 does not work...
# Compute solution
5.6.3 The FEniCS solver u = Function(V)
solve(a == L, u, bcs)
Suppose we have a function make_mesh that can make the mesh for us.
f = File("mesh.xml")
All we need to do in the solver and that has not been exemplified before, f << mesh
is to define V with proper Dirichlet conditions. It is easy to do so as
long as the Neumann conditions are zero. Otherwise, we will have to In order to avoid L=0 (L equal to the float zero), we have to tell FEniCS
do a line integral along the boundary and that brings in quite some that is a linear form, so zero must be specified as Constant(0).
concepts in FEniCS about how to mark boundaries [12]. Fortunately, Note that everything is the same as for the approximation problem,
a lot of problems have homogeneous Neumann conditions (thanks to except for the Dirichlet conditions and the formulas for a and L. FEniCS
symmetries!), so the example here can be extended and become useful in has, of course, access to very efficient solution methods, so we could add
many contexts. arguments to the solve call to apply state-of-the-art iterative methods
We have to write functions for testing whether a point is on a Dirichlet and preconditioners for large-scale problems. However, for this little 2D
boundary or not: case a standard sparse Gaussian elimination, as implied by solve(a =
L, u, bcs) is the most efficient and reliable approach.
V = FunctionSpace(mesh, ’P’, degree) Finally, we can save the solution to file for using professional visual-
# Define Dirichlet boundary conditions ization software and, if desired, add a quick plotting using the built-in
from math import sqrt FEniCS tool plot:
def inner(x, on_boundary):
"""Return True if x on r=a with tolerance.""" # Save solution to file in VTK format
r = on_boundary and \ vtkfile = File(filename + ’.pvd’)
abs(sqrt(x[0]**2 + x[1]**2) - x_a) < 1E-2 vtkfile << u
print ’XXXa’, r, x[0], x[1], abs(sqrt(x[0]**2 + x[1]**2) - x_a), on_boundary
return r u.rename(’u’, ’u’); plot(u); plot(mesh)
interactive()
def outer(x, on_boundary):
"""Return True if x on r=b with tolerance.""" (The u.rename call is just for getting a more readable title in the plot.)
r = on_boundary and \ The above statements are collected in a function solver in the file
abs(sqrt(x[0]**2 + x[1]**2) - x_b) < 1E-2
print ’XXXb’, r, x[0], x[1], abs(sqrt(x[0]**2 + x[1]**2) - x_b), on_boundary borehole_fenics.py:
return r
def solver(alpha, # Diffusion coefficient
Note here that we test with a tolerance since the points on the boundary u_a, # Inner pressure
u_b, # Outer pressure
may be subject to rounding errors when making the mesh coordinates. Theta, # Arc size
We then use the DirichletBC object to make different kinds of Dirich- x_a, # Inner boundary
let conditions, here two, and collect them in a list ‘bcs: x_b, # Outer boundary
nr, # Resolution r direction
bc_inner = DirichletBC(V, u_a, inner) nt, # Resolution azimuthal direction
bc_outer = DirichletBC(V, u_b, outer) degree, # Element polynomial degree
bcs = [bc_inner, bc_outer] filename, # Name of VTK file
):
The next step is to define the variational problem and solve it:
y = mesh.coordinates()[:,1]
Be careful with name clashes! s = 3.5
It is easy when coding mathematics to use variable names that
def denser(x, y):
correspond to one-letter names in the mathematics. For example, return [a + (b-a)*((x-a)/(b-a))**s, y]
in the mathematics of this problem there are to a variables: the
radius of the inner boundary and the bilinear form in the variational x_bar, y_bar = denser(x, y)
xy_bar_coor = np.array([x_bar, y_bar]).transpose()
formulation. Using a for the inner boundary in solver does not mesh.coordinates()[:] = xy_bar_coor
work: it is quickly overwritten by the bilinear form. We therefore
have to introduce x_a. Long variable names are to be preferred # Map onto to a "piece of cake"
for safe programming, though short names corresponding to the def cylinder(r, s):
mathematics are nicer... return [r*np.cos(Theta*s), r*np.sin(Theta*s)]
x_hat, y_hat = cylinder(x_bar, y_bar)

xy_hat_coor = np.array([x_hat, y_hat]).transpose()
mesh.coordinates()[:] = xy_hat_coor
return mesh
5.6.4 Making the mesh
We could also have used the mesh tool mshr in FEniCS, but with our
The hardest part of a finite element problem is very often to make the
approach here we have full control of the refinement towards the hole.
mesh. Here the idea is to first make a rectangle, then make the denser
toward the left end, and then bend it to get the form of the part of a
hole.
Let x and y be the coordinates of a vertex in the mesh that is a 5.6.5 Solving a problem
rectangle (0, a) × (0, b). The stretching towards x = a is done by mapping We assume that α is constant. Before solving such a specific problem, it

x−a
s can be wise to scale the problem since it often reduces the amount of
x̄ = a + (b − a) . (5.50) input data in the model. Here, the variation in u is typically |ua − ub | so
b−a
we use that as characteristic pressure. The coordinates may be naturally
A stretching towards x = b is given by scaled by the bore hole radius, so we have new, scaled variables

x − a 1/s u − ua x y
x̄ = a + (b − a) . (5.51) ū = , x̄ = , ȳ = .
b−a ua − ub a a
The parameter s controls the amount of stretching. The code below shows Now, we expect ū ∈ [0, 1], which is a goal of scaling. Inserting this in the
the details of mapping the coordinates of FEniCS mesh. problem gives the PDE
Mapping of a rectangle onto a our geometry is done by
∇2 ū = 0
x̂ = x̄ cos(Θȳ), ŷ = x̄ sin(Θȳ) .
in a domain with inner radius 1 and ū = 0, and outer radius
We are now ready for the Python code that codes these formulas and
a
manipulates the FEniCS mesh: β= ,
b
def make_mesh(Theta, a, b, nr, nt): with ū = 1. Our solver can solve this problem by setting alpha=1, u_a=0,
mesh = RectangleMesh(Point(a, 0), Point(b, 1), nr, nt, ’crossed’)
u_b=0, x_a=1, x_b=beta.
# First make a denser mesh towards r=a
x = mesh.coordinates()[:,0]
5.7 Convection-diffusion and Petrov-Galerkin methods 229 230 5 Variational formulations with finite elements
hpl 16: Show plots from paraview. hpl 17: Do 3D automatically. A Galerkin approximation with a finite element approximation is
P
hpl 18: Do 1D Dirichlet model problem from previous sections (exercise!). obtained by letting u = j∈Is cj ψj (x) and v = ψi (x) which leads to a
P
linear system of equations j Ai,j cj = bi where
Z 1
5.7 Convection-diffusion and Petrov-Galerkin methods Ai,j = µψi0 ψj0 + ψj0 ψi dx,
0
Z 1
bi = 0ψi dx .
Let us now return to the problem in Section 5.5.2 0
Figure ?? shows the finite element solution on a coarse mesh of 10

v · ∇u + βu = ∇ · (α∇u) + f, x ∈ Ω, (5.52) elements for µ = 0.01. Clearly, the finite element method has the same
u = u0 , x ∈ ∂ΩD , (5.53) problem as was observed earlier in global functions in Section 5.5.2.
For finite differences, the cure for these oscillations is upwinding. That
∂u
−α = g, x ∈ ∂ΩN . (5.54) is, instead of using a central difference scheme, we employ the following
∂n difference scheme:
In Section 4.5 we investigated the simplified case in 1D
du 1
(xi ) = [ui+1 − ui ] if v < 0,
−ux − αuxx = 0, dx h
du 1
u(0) = 0, u(1) = 1 . (xi ) = [ui − ui−1 ] if v > 0 .
dx h
and the analytical solution was: Using this scheme, oscillations will disappear as can be seen in Figure ??.
There is a relationship between upwinding and artificial diffusion. If
e−x/α − 1 we discretize ux with a central difference and add diffusion as = h/2∆
ue (x) = .
e−1/α − 1 we get
The approximation with global functions failed when α was signifi-
cantly smaller than v . The computed solution contained non-physical ui+1 − ui−1
oscillations that were orders of magnitude larger than the true solution. central scheme, first order derivative
2h
The approximation did however improve as the number of degrees of h −ui+1 + 2ui − ui−1
freedom increased. + central scheme, second order derivate
2 h2
The variational problem is: Find u ∈ V such that ui − ui−1
= upwind scheme
h
a(u, v) = L(v)∀v
Hence, upwinding is equivalent to adding artificial diffusion with
where = h/2; that is, in both cases we actually solve the problem
Z 1 −(µ + )uxx + vux = f .

a(u, v) = −ux v + µux vx dx,
Z
0 using a central difference scheme.
1
Finite difference upwinding is difficult to express using finite elements
L(v) = 0v dx = 0.
0 methods, but it is closely to adding some kind of diffusion to the scheme.
5.7 Convection-diffusion and Petrov-Galerkin methods 231 232 5 Variational formulations with finite elements
A clever method of adding diffusion is the so-called Petrov-Galerkin Expression("(exp(-x[0]/%e) - 1)/ (exp(-1/%e) - 1)" % \
method. The Galerkin approximation consist of finding u ∈ V such that (1.0e-2, 0.5))
mesh = UnitIntervalMesh(10)
for all v ∈ V the variational formulation is satisfied. V = FunctionSpace(mesh, "CG", 1)
Z v = TestFunction(V)
1
a(u, v) = −ux v + µux vx dx, alpha = Constant(1.0e-2)
0
Z beta = Constant(0.5)
1
f = Constant(0)
L(v) = 0v dx = 0. h = mesh.hmin()
0
v = v - beta*h*v.dx(0) #Petrov-Galerkin
The Petrov-Galerkin method is a seemingly innocent and straightforward
a = (-u.dx(0)*v + alpha*u.dx(0)*v.dx(0) + beta*h*u.dx(0)*v.dx(0))*dx
extension of Galerkin where we want to find u ∈ V such that for all L = f*v*dx
w ∈ W the variational formulation is fulfilled.
bc = DirichletBC(V, u_analytical, boundary)
u = Function(V)
Z 1 solve(a == L, u, bc)
a(u, w) = −ux w + µux wx dx,
0 Notice that the Petrov-Galerkin method is here implemented as v =
Z 1
v - beta*h*v.dx(0).
L(w) = 0w dx = 0.
0
However, W cannot be chosen freely. For instance, to obtain a quadratic

matrix the number of basis functions in V and W must be the same. Let 5.8 Summary
w = v + β v · ∇w.
P
• When approximating f by u = j cj ϕj , the least squares method
and the Galerkin/projection method give the same result. The inter-
Aij = a(ψi , ψj + βv · ∇ψj ) polation/collocation method is simpler and yields different (mostly
Z Z
= µ∇ψi · ∇(ψj + βv · ∇ψj ) dx + v∇ψi · (ψj + βv · ∇ψj ) dx inferior) results.
ZΩ Z Ω • Fourier series expansion can be viewed as a least squares or Galerkin
= µ∇ψi · ∇ψj dx + v · ∇ψi ψj dx approximation procedure with sine and cosine functions.
|Ω {z Ω
} • Basis functions should optimally be orthogonal or almost orthogonal,
Z standard Galerkin
Z
because this gives little round-off errors when solving the linear system,
and the coefficient matrix becomes diagonal or sparse. One way to
+β µ∇ψi · ∇(v · ∇ψj ) dx + β (v · ∇ψi )(v · ∇ψj ) dx
| Ω
{z } | Ω
{z }
obtain almost orthogonal basis functions is to localize the basis by
=0 for linear elements artificial diffusion in v direction making the basis functions that have local support in only a few cells
of a mesh. This is utilized in the finite element method.
The corresponding FEniCS code is
• Finite element basis functions are piecewise polynomials, normally
from fenics import * with discontinuous derivatives at the cell boundaries. The basis func-
import matplotlib.pyplot as plt tions overlap very little, leading to stable numerics and sparse matrices.
def boundary(x): • To use the finite element method for differential equations, we use the
return x[0] < 1E-15 or x[0] > 1.0 - 1E-15 Galerkin method or the method of weighted residuals to arrive at a
variational form. Technically, the differential equation is multiplied
u_analytical = \
by a test function and integrated over the domain. Second-order
5.8 Summary 233 234 5 Variational formulations with finite elements
derivatives are integrated by parts to allow for typical finite element • Neumann conditions, involving prescribed values of the derivative (or
basis functions that have discontinuous derivatives. flux) of u, are incorporated in boundary terms arising from integrating
• The least squares method is not much used for finite element solution terms with second-order derivatives by part. Forgetting to account
of differential equations of second order, because it then involves for the boundary terms implies the condition ∂u/∂n = 0 at parts of
second-order derivatives which cause trouble for basis functions with the boundary where no Dirichlet condition is set.
discontinuous derivatives.
• We have worked with two common finite element terminologies and
associated data structures (both are much used, especially the first
one, while the other is more general): 5.9 Exercises
1. elements, nodes, and mapping between local and global node num-
bers Exercise 5.1: Compute the deflection of a cable with 2 P1
2. an extended element concept consisting of cell, vertices, degrees elements
of freedom, local basis functions, geometry mapping, and mapping Solve the problem for u in Exercise 4.2 using two P1 linear elements.
between local and global degrees of freedom Incorporate the condition u(0) = 0 by two methods: 1) excluding the
• The meaning of the word "element" is multi-fold: the geometry of unknown at x = 0, 2) keeping the unknown at x = 0, but modifying the
a finite element (also known as a cell), the geometry and its basis linear system.
functions, or all information listed under point 2 above. Filename: cable_2P1.
• One normally computes integrals in the finite element method element
by element (cell by cell), either in a local reference coordinate system
or directly in the physical domain. Exercise 5.2: Compute the deflection of a cable with 1 P2
• The advantage of working in the reference coordinate system is that the element
mathematical expressions for the basis functions depend on the element
type only, not the geometry of that element in the physical domain. Solve the problem for u in Exercise 4.2 using one P2 element with
The disadvantage is that a mapping must be used, and derivatives quadratic basis functions.
must be transformed from reference to physical coordinates. Filename: cable_1P2.
• Element contributions to the global linear system are collected in an
element matrix and vector, which must be assembled into the global
system using the degree of freedom mapping (dof_map) or the node Exercise 5.3: Compute the deflection of a cable with a step
numbering mapping (elements), depending on which terminology load
that is used.
• Dirichlet conditions, involving prescribed values of u at the boundary, We consider the deflection of a tension cable as described in Exercise 4.2:
are mathematically taken care of via a boundary function that takes w00 = `, w(0) = w(L) = 0. Now the load is discontinuous:
on the right Dirichlet values, while the basis functions vanish at such (
`1 , x < L/2,
boundaries. The finite element method features a general expression `(x) = x ∈ [0, L] .
`2 , x ≥ L/2
for the boundary function. In software implementations, it is easier
to drop the boundary function and the requirement that the basis This load is not symmetric with respect to the midpoint x = L/2 so the
functions must vanish on Dirichlet boundaries and instead manipulate solution loses its symmetry. Scaling the problem by introducing
the global matrix system (or the element matrix and vector) such
that the Dirichlet values are imposed on the unknown parameters. x̄ =
x
, u=
w
, `¯ =
` − `1
.
L/2 wc `2 − `1
5.9 Exercises 235 236 5 Variational formulations with finite elements
This leads to a scaled problem on [0, 2] (we rename x̄ as x for convenience): b) Derive the variational form of this problem.
c) Introduce a finite element mesh with uniform partitioning. Use P1
(
1, x < 1, elements and compute the element matrix and vector for a general
¯ =
u00 = `(x) x ∈ (0, 1), u(0) = 0, u(2) = 0 .
0, x ≥ 1 element.
d) Incorporate the boundary conditions and assemble the element con-
a) Find the analytical solution of the problem. tributions.
Hint. Integrate the equation separately for x < 1 and x > 1. Use the
e) Identify the resulting linear system as a finite difference discretization
conditions that u and u0 must be continuous at x = 1.
of the differential equation using
b) Use ψi = sin((i + 1) πx2 ), i = 0, . . . , N and the Galerkin method to
P
find an approximate solution u = j cj ψj . Plot how fast the coefficients [D2x u = Dx Dx u]i .
cj tend to zero (on a log scale).
f) Compute the numerical solution and plot it together with the exact
c) Solve the problem with P1 finite elements. Plot the solution for
solution for a mesh with 20 elements and = 10, 1, 0.1, 0.01.
Ne = 2, 4, 8 elements.
Filename: convdiff1D_P1.
Filename: cable_discont_load.
Exercise 5.6: Investigate exact finite element solutions

Exercise 5.4: Compute with a non-uniform mesh
a) Derive the linear system for the problem −u00 = 2 on [0, 1], with Consider
u(0) = 0 and u(1) = 1, using P1 elements and a non-uniform mesh. The
vertices have coordinates x0 = 0 < x1 < · · · < xNn −1 = 1, and the length −u00 (x) = xm , x ∈ (0, L), u0 (0) = C, u(L) = D,
of cell number e is he = xe+1 − xe . where m ≥ 0 is an integer, and L, C, and D are given numbers. Utilize
b) It is of interest to compare the discrete equations for the finite element a mesh with two (non-uniform) elements: Ω (0) = [0, 3] and Ω (0) = [3, 4].
method in a non-uniform mesh with the corresponding discrete equations Plot the exact solution and the finite element solution for polynomial
arising from a finite difference method. Go through the derivation of degree d = 1, 2, 3, 4 and m = 0, 1, 2, 3, 4. Find values of d and m that
the finite difference formula u00 (xi ) ≈ [Dx Dx u]i and modify it to find a make the finite element solution exact at the nodes in the mesh.
natural discretization of u00 (xi ) on a non-uniform mesh. Compare the Hint. Use the mesh_uniform, finite_element1D, and u_glob2 func-
finite element and difference discretizations tions from the fe1D.py module.
Filename: nonuniform_P1. Filename: u_xx_xm_P1to4.
Problem 5.5: Solve a 1D finite element problem by hand Exercise 5.7: Compare finite elements and differences for a
The following scaled 1D problem is a very simple, yet relevant, model radially symmetric Poisson equation
for convective transport in fluids:
We consider the Poisson problem in a disk with radius R with Dirichlet
conditions at the boundary. Given that the solution is radially
p symmetric
u = u ,
0 00
u(0) = 0, u(1) = 1, x ∈ [0, 1] . (5.55)
and hence dependent only on the radial coordinate (r = x2 + y 2 ), we
a) Find the analytical solution to this problem. (Introduce w = u , solve
0
can reduce the problem to a 1D Poisson equation
the first-order differential equation for w(x), and integrate once more.)
5.9 Exercises 237 238 5 Variational formulations with finite elements

1 d du u(x) = α + β(1 + L2 ) tan−1 (x), (5.58)
− r = f (r), r ∈ (0, R), u0 (0) = 0, u(R) = UR . (5.56)
r dr dr
is an exact solution if f (x) = γu.
a) Derive a variational form of (5.56) by integrating over the whole disk, Derive a variational formulation and compute general expressions for
or posed equivalently: use a weighting function 2πrv(r) and integrate r the element matrix and vector in an arbitrary element, using P1 elements
from 0 to R. and a uniform partitioning of [0, L]. The right-hand side integral is
b) Use a uniform mesh partition with P1 elements and show what the challenging and can be computed by a numerical integration rule. The
resulting set of equations becomes. Integrate the matrix entries exact by Trapezoidal rule (3.50) gives particularly simple expressions. Filename:
hand, but use a Trapezoidal rule to integrate the f term. atan1D_P1.
c) Explain that an intuitive finite difference method applied to (5.56)

gives
Exercise 5.9: Solve a 2D Poisson equation using polynomials
1 1 and sines
2
ri+ 1 (ui+1 − ui ) − ri− 1 (ui − ui−1 ) = fi , i = rh .
ri h 2 2
The classical problem of applying a torque to the ends of a rod can be
For i = 0 the factor 1/ri seemingly becomes problematic. One must modeled by a Poisson equation defined in the cross section Ω:
always have u0 (0) = 0, because of the radial symmetry, which implies
u−1 = u1 , if we allow introduction of a fictitious value u−1 . Using this −∇2 u = 2, (x, y) ∈ Ω,
u−1 in the difference equation for i = 0 gives
with u = 0 on ∂Ω. Exactly the same problem arises for the deflection of
a membrane with shape Ω under a constant load.
1 1
For a circular cross section one can readily find an analytical solution.
r 1 (u1 − u0 ) − r− 1 (u0 − u1 ) =
r0 h2 2 2
For a rectangular cross section the analytical approach ends up with a
1 1 sine series. The idea in this exercise is to use a single basis function to
((r0 + r1 )(u1 − u0 ) − (r−1 + r0 )(u0 − u1 )) ≈ 2(u1 − u0 ),
r0 2h2 obtain an approximate answer.
if we use r−1 + r1 ≈ 2r0 . We assume for simplicity that the cross section is the unit square:
Set up the complete set of equations for the finite difference method Ω = [0, 1] × [0, 1].
and compare to the finite element method in case a Trapezoidal rule is a) We consider the basis ψp,q (x, y) = sin((p + 1)πx) sin(qπy), p, q =
used to integrate the f term in the latter method. 0, . . . , n. These basis functions fulfill the Dirichlet condition. Use a
Filename: radial_Poisson1D_P1. Galerkin method and n = 0.
b) The basis function involving sine functions are orthogonal. Use this
property in the Galerkin method to derive the coefficients cp,q in a formula
Exercise 5.8: Compute with variable coefficients and P1 P P
u = p q cp,q ψp,q (x, y).
elements by hand
c) Another possible basis is ψi (x, y) = (x(1−x)y(1−y))i+1 , i = 0, . . . , N .
Consider the problem Use the Galerkin method to compute the solution for N = 0. Which
choice of a single basis function is best, u ∼ x(1 − x)y(1 − y) or u ∼
d du sin(πx) sin(πy)? In order to answer the question, it is necessary to search
− α(x) + γu = f (x), x ∈ Ω = [0, L], u(0) = α, u0 (L) = β .
dx dx the web or the literature for an accurate estimate of the maximum u
(5.57)
value at x = y = 1/2.
We choose α(x) = 1 + x2 . Then
Filename: torsion_sin_xy.
240 6 Time-dependent variational forms
control of the implementation and the computational efficiency in time

Time-dependent variational forms
6
and space.
We shall use a simple diffusion problem to illustrate the basic principles
of how a time-dependent PDE is solved by finite differences in time and
finite elements in space. Of course, instead of finite elements, we may
employ other types of basis functions, such as global polynomials. Our
model problem reads
∂u
= α∇2 u + f (x, t), x ∈ Ω, t ∈ (0, T ], (6.1)
∂t
u(x, 0) = I(x), x ∈ Ω, (6.2)
∂u
= 0, x ∈ ∂Ω, t ∈ (0, T ] . (6.3)
∂n
Here, u(x, t) is the unknown function, α is a constant, and f (x, t) and
The finite element method is normally used for discretization in space. I(x) are given functions. We have assigned the particular boundary
There are three alternative strategies for performing a discretization in condition (6.3) to minimize the details on handling boundary conditions
time: in the finite element method.
1. Use finite differences for time derivatives to arrive at a recursive set of

spatial problems that can be discretized by the finite element method.
2. Discretize in space by finite elements first, and then solve the resulting 6.1 Discretization in time by a Forward Euler scheme
system of ordinary differential equations (ODEs) by some standard
library for ODEs. The discretization strategy is to first apply a simple finite difference
3. Use discontinuous finite elements in the spatial direction separately. scheme in time and derive a recursive set of spatially continuous PDE
problems, one at each time level. For each spatial PDE problem we can
With the first strategy, we discretize in time prior to the space discretiza- set up a variational formulation and employ the finite element method
tion, while the second strategy consists of doing exactly the opposite. for solution.
It should come as no surprise that in many situations these two strate-
gies end up in exactly the same systems to be solved, but this is not
always the case. The third approach reproduces standard finite difference 6.1.1 Time discretization
schemes such as the Backward Euler and the Crank-Nicolson schemes,
but offers an interesting framework for deriving higher-order methods. In We can apply a finite difference method in time to (6.1). First we need
this chapter we shall be concerned with the first strategy, which is the ’a mesh’ in time, here taken as uniform with mesh points tn = n∆t,
most common strategy as it turns the time-dependent PDE problem to a n = 0, 1, . . . , Nt . A Forward Euler scheme consists of sampling (6.1)
sequence of stationary problems for which efficient finite element solution at tn and approximating the time derivative by a forward difference
strategies often are available. The second strategy would naturally employ [Dt+ u]n ≈ (un+1 − un )/∆t. A list of finite difference formulas can be
well-known ODE software, which are available as user-friendly routines found in A.1. This approximation turns (6.1) into a differential equation
in Python. However, these routines are presently not efficient enough for that is discrete in time, but still continuous in space. With a finite
PDE problems in 2D and 3D. The first strategy gives complete hands-on difference operator notation we can write the time-discrete problem as

6.1 Discretization in time by a Forward Euler scheme 241 242 6 Time-dependent variational forms
[Dt+ u = α∇2 u + f ]n , (6.4) N

X
une ≈ un = cnj ψj (x), (6.8)
for n = 1, 2, . . . , Nt − 1. Writing this equation out in detail and isolating j=0
the unknown un+1 on the left-hand side, demonstrates that the time- N
X
discrete problem is a recursive set of problems that are continuous in un+1
e ≈ un+1 = cn+1
j ψj (x) . (6.9)
space: j=0
Note that, as before, N denotes the number of degrees of freedom in the

un+1 = un + ∆t α∇2 un + f (x, tn ) . (6.5)
spatial domain. The number of time points is denoted by Nt . We define
Given u0 = I, we can use (6.5) to compute u1 , u2 , . . . , uNt . a space V spanned by the basis functions {ψi }i∈Is .
More precise notation

6.1.3 Variational forms
For absolute clarity in the various stages of the discretizations,
we introduce ue (x, t) as the exact solution of the space-and time- A Galerkin method or a weighted residual method with weighting func-
continuous partial differential equation (6.1) and une (x) as the time- tions wi can now be formulated. We insert (6.8) and (6.9) in (6.7) to
discrete approximation, arising from the finite difference method in obtain the residual
time (6.4). More precisely, ue fulfills
R = un+1 − un − ∆t α∇2 un + f (x, tn ) .
∂ue
= α∇2 ue + f (x, t), (6.6) The weighted residual principle,
∂t
Z
while un+1
e , with a superscript, is the solution of the time-discrete Rw dx = 0, ∀w ∈ W,
equations Ω
results in
un+1
e = une + ∆t α∇2 une + f (x, tn ) . (6.7)
Z h i
The un+1 quantity is then discretized in space and approximated
e un+1 − un − ∆t α∇2 un + f (x, tn ) w dx = 0, ∀w ∈ W .
by un+1 . Ω
From now on we use the Galerkin method so W = V . Isolating the

unknown un+1 on the left-hand side gives
6.1.2 Space discretization Z Z h i

un+1 v dx = un + ∆t α∇2 un + f (x, tn ) v dx, ∀v ∈ V .
We now introduce a finite element approximation to une and un+1
e in (6.7), Ω Ω
where the coefficients depend on the time level: As usual in spatial finite element problems involving second-order
R
derivatives, we apply integration by parts on the term (∇2 un )v dx:
Z Z Z
∂un
α(∇2 un )v dx = − α∇un · ∇v dx + α v dx .
Ω Ω ∂Ω ∂n
The last term vanishes because we have the Neumann condition ∂un /∂n =
0 for all n. Our discrete problem in space and time then reads
Z Z Z Z (u, v) = (un , v) − ∆t(α∇un , ∇v) + ∆t(f n , v) . (6.14)

un+1 v dx = un v dx−∆t α∇un ·∇v dx+∆t f n v dx, ∀v ∈ V .
Ω Ω Ω Ω
(6.10)
This is the variational formulation of our recursive set of spatial problems. 6.1.5 Deriving the linear systems
In the following, we adopt the previously introduced convention that the
Nonzero Dirichlet boundary conditions unknowns cn+1 are written as cj , while the known cnj from the previous
j
As in stationary problems, we can introduce a boundary function time level is simply written as cnj . To derive the equations for the new
B(x, t) to take care of nonzero Dirichlet conditions: unknown coefficients cj , we insert
N
X N
X
N
X u= cj ψj (x), un = cnj ψj (x)
ue ≈ u = B(x, tn ) +
n n
cnj ψj (x), (6.11) j=0 j=0
in (6.13) or (6.14), let the equation hold for all v = ψi , i = 0, . . . , N , and

j=0
order the terms as matrix-vector products:

N
X
un+1
e ≈ un+1 = B(x, tn+1 ) + cn+1
j ψj (x) . (6.12)
j=0
N
X N
X N
X
(ψi , ψj )cj = (ψi , ψj )cnj −∆t (∇ψi , α∇ψj )cnj +∆t(f n , ψi ), i = 0, . . . , N .
j=0 j=0 j=0
(6.15)
P
6.1.4 Notation for the solution at recent time levels This is a linear system j Ai,j cj = bi with
In a program it is only necessary to have the two variables un+1 and un Ai,j = (ψi , ψj )
at the same time at a given time step. It is therefore unnatural to use
the index n in computer code. Instead a natural variable naming is u for and
un+1 , the new unknown, and u_n for un , the solution at the previous time N N
X X
level. When we have several preceding (already computed) time levels, bi = (ψi , ψj )cnj − ∆t (∇ψi , α∇ψj )cnj + ∆t(f n , ψi ) .
it is natural to number them like u_nm1, u_nm2, u_nm3, etc., backwards j=0 j=0
in time, corresponding to un−1 , un−2 , and un−3 . Essentially, this means
It is instructive and convenient for implementations to write the linear
a one-to-one mapping of notation in mathematics and software, except
system on the form
for un+1 . We shall therefore, to make the distance between mathematics
and code as small as possible, often introduce just u for un+1 in the
M c = M c1 − ∆tKc1 + ∆tf, (6.16)
mathematical notation. Equation (6.10) with this new naming convention
is consequently expressed as where
Z Z Z Z
uv dx = un v dx − ∆t α∇un · ∇v dx + ∆t f n v dx . (6.13)
Ω Ω Ω Ω
This variational form can alternatively be expressed by the inner product

notation:
X
M = {Mi,j }, Mi,j = (ψi , ψj ), i, j ∈ Is , ψj (xi )cnj = I(xi ),
K = {Ki,j }, Ki,j = (∇ψi , α∇ψj ), i, j ∈ Is , j
f = {fi }, fi = (f (x, tn ), ψi ), i ∈ Is , where xi denotes an interpolation point. Projection (or Galerkin’s

c = {ci }, i ∈ Is , method) implies solving a linear system with M as coefficient matrix:
c1 = {cni }, i ∈ Is . X
Mi,j cnj = (I, ψi ), i ∈ Is .
We realize that M is the matrix arising from a term with the zero-
j
th derivative of u, and called the mass matrix, while K is the matrix

arising from a Laplace term ∇2 u. The K matrix is often known as the
stiffness matrix. (The terms mass and stiffness stem from the early days of 6.1.7 Example using sinusoidal basis functions
finite elements when applications to vibrating structures dominated. The Let us go through a computational example and demonstrate the algo-
mass matrix arises from the mass times acceleration term in Newton’s rithm from the previous section. We consider a 1D problem
second law, while the stiffness matrix arises from the elastic forces (the
“stiffness”) in that law. The mass and stiffness matrix appearing in a
diffusion have slightly different mathematical formulas compared to the ∂u ∂2u
= α 2, x ∈ (0, L), t ∈ (0, T ],
classic structure problem.) ∂t ∂x
(6.17)
Remark. The mathematical symbol f has two meanings, either the
u(x, 0) = A cos(πx/L) + B cos(10πx/L), x ∈ [0, L],
function f (x, t) in the PDE or the f vector in the linear system to be
(6.18)
solved at each time level.
∂u
= 0, x = 0, L, t ∈ (0, T ] .
∂x
(6.19)
6.1.6 Computational algorithm
We use a Galerkin method with basis functions
We observe that M and K can be precomputed so that we can avoid
computing the matrix entries at every time level. Instead, some matrix- ψi = cos(iπx/L) .
vector multiplications will produce the linear system to be solved. The
computational algorithm has the following steps: These basis functions fulfill (6.19), which is not a requirement (there
are no Dirichlet conditions in this problem), but helps to make the
1. Compute M and K. approximation good.
2. Initialize u0 by interpolation or projection Since the initial condition (6.18) lies in the space V where we seek the
3. For n = 1, 2, . . . , Nt : approximation, we know that a Galerkin or least squares approximation
a. compute b = M c1 − ∆tKc1 + ∆tf of the initial condition becomes exact. Therefore, the initial condition
b. solve M c = b can be expressed as
c. set c1 = c
cn1 = A, cn10 = B,
In case of finite element basis functions, interpolation of the initial while cni = 0 for i 6= 1, 10.
condition at the nodes means cnj = I(xj ). Otherwise one has to solve the The M and K matrices are easy to compute since the basis functions
linear system are orthogonal on [0, L]. Hence, we only need to compute the diagonal
entries. We get
Z L
6.1.8 Comparing P1 elements with the finite difference
Mi,i = cos2 (ixπ/L) dx,
0 method
which is computed as
We can compute the M and K matrices using P1 elements in 1D. A
>>> import sympy as sym uniform mesh on [0, L] is introduced for this purpose. Since the boundary
>>> x, L = sym.symbols(’x L’) conditions are solely of Neumann type in this sample problem, we have
>>> i = sym.symbols(’i’, integer=True)
>>> sym.integrate(sym.cos(i*x*sym.pi/L)**2, (x,0,L)) no restrictions on the basis functions ψi and can simply choose ψi = ϕi ,
Piecewise((L, Eq(pi*i/L, 0)), (L/2, True)) i = 0, . . . , N = Nn − 1.
which means L if i = 0 and L/2 otherwise. Similarly, the diagonal entries From Section 5.1.2 or 5.1.4 we have that the K matrix is the same
of the K matrix are computed as as we get from the finite difference method: h[Dx Dx u]ni , while from
Section 3.3.2 we know that M can be interpreted as the finite difference
>>> sym.integrate(sym.diff(cos(i*x*sym.pi/L),x)**2, (x,0,L)) approximation h[u + 16 h2 Dx Dx u]ni . The equation system M c = b in the
pi**2*i**2*Piecewise((0, Eq(pi*i/L, 0)), (L/2, True))/L**2
algorithm is therefore equivalent to the finite difference scheme
so
1
[Dt+ (u + h2 Dx Dx u) = αDx Dx u + f ]ni . (6.20)
6
π 2 i2 (More precisely, M c = b divided by h gives the equation above.)
M0,0 = L, Mi,i = L/2, i > 0, K0,0 = 0, Ki,i = , i > 0.
2L
Lumping the mass matrix. As explained in Section 3.3.3, one can turn
The equation system becomes the M matrix into a diagonal matrix diag(h/2, h, . . . , h, h/2) by applying
the Trapezoidal rule for integration. Then there is no need to solve a
linear system at each time level, and the finite element scheme becomes
Lc0 = Lc00 − ∆t · 0 · c00 ,
identical to a standard finite difference method
L L π 2 i2 n
ci = cni − ∆t c , i > 0.
2 2 2L i [Dt+ u = αDx Dx u + f ]ni . (6.21)
The first equation always leads to c0 = 0 since we start with cn1 = 0 for The Trapezoidal integration is not as accurate as exact integration
n = 0. The others imply and introduces an error. Normally, one thinks of any error as an overall
πi 2 n decrease of the accuracy. Nevertheless, errors may cancel each other, and
ci = (1 − ∆t(
) )ci . the error introduced by numerical integration may in certain problems
L
lead to improved overall accuracy in the finite element method. The
With the notation ci for ci at the n-th time level, we can apply the
n
interplay of the errors in the current problem is analyzed in detail in
relation above recursively and get
Section 6.4. The effect of the error is at least not more severe than what
πi 2 n 0 is produced by the finite difference method since both are O(h2 ).
) ) ci .
cni = (1 − ∆t( Making M diagonal is usually referred to as lumping the mass matrix.
L
Since only two of the coefficients are nonzero at time t = 0, we have the There is an alternative method to using an integration rule based on
closed-form discrete solution the node points: one can sum the entries in each row, place the sum on
the diagonal, and set all other entries in the row equal to zero. For P1
elements the methods of lumping the mass matrix give the same result.
π 10π 2 n
uni = A(1 − ∆t( )2 )n cos(πx/L) + B(1 − ∆t( ) ) cos(10πx/L) .
L L
6.2 Discretization in time by a Backward Euler scheme 249 250 6 Time-dependent variational forms
6.2 Discretization in time by a Backward Euler scheme 6.2.3 Linear systems

P P
Inserting u = j cj ψi and un = j cnj ψi , and choosing v to be the basis
6.2.1 Time discretization functions ψi ∈ V , i = 0, . . . , N , together with doing some algebra, lead
The Backward Euler scheme in time applied to our diffusion problem to the following linear system to be solved at each time level:
can be expressed as follows using the finite difference operator notation:
(M + ∆tK)c = M c1 + ∆tf, (6.26)
2
[Dt− u = α∇ u + f (x, t)] . n
where M , K, and f are as in the Forward Euler case. This time we
Here [Dt− u]n
≈ (u − u )/∆t. Written out, and collecting the unknown
n n−1 really have to solve a linear system at each time level. The computational
un on the left-hand side and all the known terms on the right-hand side, algorithm goes as follows.
the time-discrete differential equation becomes 1. Compute M , K, and A = M + ∆tK
2. Initialize u0 by interpolation or projection
un − ∆t α∇2 un + f (x, tn ) = un−1 . (6.22) 3. For n = 1, 2, . . . , Nt :
Equation (6.22) can compute u1 , u2 , . . . , uNt , if we have a start u0 = I a. compute b = M c1 + ∆tf
from the initial condition. However, (6.22) is a partial differential equation b. solve Ac = b
in space and needs a solution method based on discretization in space. c. set c1 = c
For this purpose we use an expansion as in (6.8)-(6.9).
In case of finite element basis functions, interpolation of the initial
condition at the nodes means cnj = I(xj ). Otherwise one has to solve the
P
6.2.2 Variational forms linear system j ψj (xi )cj = I(xi ), where xi denotes an interpolation
point. Projection (or Galerkin’s method) implies solving a linear system
P
Inserting (6.8)-(6.9) in (6.22), multiplying by any v ∈ V (or ψi ∈ V ), with M as coefficient matrix: j Mi,j cnj = (I, ψi ), i ∈ Is .
and integrating by parts, as we did in the Forward Euler case, results in
Finite difference operators corresponding to P1 elements. We know
the variational form
what kind of finite difference operators the M and K matrices correspond
Z Z Z to (after dividing by h), so (6.26) can be interpreted as the following
(un v + ∆tα∇un · ∇v) dx = un−1 v dx + ∆t f n v dx, ∀v ∈ V . finite difference method:
Ω Ω Ω
(6.23) 1
[Dt− (u + h2 Dx Dx u) = αDx Dx u + f ]ni . (6.27)
Expressed with u for the unknown un and un for the previous time level, 6
as we have done before, the variational form becomes The mass matrix M can be lumped, as explained in Section 6.1.8, and
then the linear system arising from the finite element method with P1
Z Z Z elements corresponds to a plain Backward Euler finite difference method
(uv + ∆tα∇u · ∇v) dx = un v dx + ∆t f n v dx, (6.24) for the diffusion equation:
Ω Ω Ω
or with the more compact inner product notation, [Dt− u = αDx Dx u + f ]ni . (6.28)
(u, v) + ∆t(α∇u, ∇v) = (un , v) + ∆t(f n , v) . (6.25)

6.3 Dirichlet boundary conditions 251 252 6 Time-dependent variational forms
6.3 Dirichlet boundary conditions 6.3.2 Finite element basis functions

When using finite elements, each basis function ϕi is associated with
Suppose now that the boundary condition (6.3) is replaced by a mixed
a node xi . We have a collection of nodes {xi }i∈Ib on the boundary
Neumann and Dirichlet condition, ∂ΩD . Suppose Ukn is the known Dirichlet value at xk at time tn (Ukn =
u0 (xk , tn )). The appropriate boundary function is then
u(x, t) = u0 (x, t), x ∈ ∂ΩD , (6.29) X
B(x, tn ) = Ujn ϕj .
∂
−α u(x, t) = g(x, t), x ∈ ∂ΩN . (6.30) j∈Ib
∂n
The unknown coefficients cj are associated with the rest of the nodes,
Using a Forward Euler discretization in time, the variational form at which have numbers ν(i), i ∈ Is = {0, . . . , N }. The basis functions for V
a time level becomes are chosen as ψi = ϕν(i) , i ∈ Is , and all of these vanish at the boundary
nodes as they should. The expansion for un+1 and un become
Z Z Z Z
un+1 v dx = (un −∆tα∇un ·∇v) dx+∆t f v dx−∆t gv ds, ∀v ∈ V . X X
Ω Ω Ω ∂ΩN un = Ujn ϕj + cnj ϕν(j) ,
(6.31) j∈Ib j∈Is
X X
un+1 = Ujn+1 ϕj + cj ϕν(j) .
j∈Ib j∈Is
6.3.1 Boundary function
The equations for the unknown coefficients {cj }j∈Is become
The Dirichlet condition u = u0 at ∂ΩD can be incorporated through a
boundary function B(x) = u0 (x) and demanding that v = 0 at ∂ΩD . 
Z
  
X X Z
The expansion for un is written as  ϕi ϕj dx cj =  (ϕi ϕj − ∆tα∇ϕi · ∇ϕj ) dx cn −
j
X j∈Is Ω j∈Is Ω
un (x) = u0 (x, tn ) + cnj ψj (x) . XZ
j∈Is ϕi ϕj (Ujn+1 − Ujn ) + ∆tα∇ϕi · ∇ϕj Ujn dx
Inserting this expansion in the variational formulation and letting it hold
j∈Ib Ω
Z Z
for all basis functions ψi leads to the linear system + ∆t f ϕi dx − ∆t gϕi ds, i ∈ Is .
Ω ∂ΩN
   
X Z X Z
 ψi ψj dx cn+1
=  (ψi ψj − ∆tα∇ψi · ∇ψj ) dx cn −
j j
j∈Is Ω j∈Is Ω 6.3.3 Modification of the linear system
Z
(u0 (x, tn+1 ) − u0 (x, tn ) + ∆tα∇u0 (x, tn ) · ∇ψi ) dx Instead of introducing a boundary function B we can work with basis
Ω
Z Z
functions associated with all the nodes and incorporate the Dirichlet
conditions by modifying the linear system. Let Is be the index set that
+ ∆t f ψi dx − ∆t gψi ds, i ∈ Is .
counts all the nodes: {0, 1, . . . , N = Nn − 1}. The expansion for un is
P
then j∈Is cnj ϕj and the variational form becomes
Ω ∂ΩN
6.3 Dirichlet boundary conditions 253 254 6 Time-dependent variational forms
   
X Z X Z A physical interpretation may be that u is the temperature deviation
 ϕi ϕj dx cj =  (ϕi ϕj − ∆tα∇ϕi · ∇ϕj ) dx c1,j
from a constant mean temperature in a body Ω that is subject to an
j∈Is Ω j∈Is
ZΩ Z oscillating temperature (e.g., day and night, or seasonal, variations) at
+ ∆t f ϕi dx − ∆t gϕi ds . x = 0.
We use a Backward Euler scheme in time and P1 elements of constant
Ω ∂ΩN
length h in space. Incorporation of the Dirichlet condition at x = 0
R
We introduce the matrices M and K with entries Mi,j = ϕi ϕj dx and through modifying the linear system at each time level means that we
R Ω carry out the computations as explained in Section 6.2 and get a system
Ki,j = α∇ϕi · ∇ϕj dx, respectively. In addition, we define the vectors
Ω R R
(6.26). The M and K matrices computed without paying attention to
c, c1 , and f with entries ci , c1,i , and f ϕi dx − gϕi ds, respectively. Dirichlet boundary conditions become
Ω ∂ΩN
The equation system can then be written as 
2 1 0 ··· ··· ··· ··· ··· 0

.. 
1 4 1 ...

M c = M c1 − ∆tKc1 + ∆tf . (6.32)  . 

0 1 4 1 .. . .. 
When M , K, and f are assembled without paying attention to Dirichlet  . 
. . . . .. 
boundary conditions, we need to replace equation k by ck = Uk for k  .. . . .. .. 0 .
 
corresponding to all boundary nodes (k ∈ Ib ). The modification of M h .
M =  ..
 . . . .
.. .. .. .. .. . .
.

(6.37)
6 . 
consists in setting Mk,j = 0, j ∈ Is , and the Mk,k = 1. Alternatively,  .. . . .. 

a modification that preserves the symmetry of M can be applied. At . 0 1 4 1 . .
 
each time level one forms b = M c1 − ∆tKc1 + ∆tf and sets bk = Ukn+1 ,  .. .. .. .. .. 
.

. . . . 0 
k ∈ Ib , and solves the system M c = b.  .. .. 
In case of a Backward Euler method, the system becomes (6.26). We
. . 1 4 1
can write the system as Ac = b, with A = M + ∆tK and b = M c1 + f . 0 ··· ··· ··· ··· ··· 0 1 2
Both M and K needs to be modified because of Dirichlet boundary and
conditions, but the diagonal entries in K should be set to zero and those  
in M to unity. In this way, for k ∈ Ib we have Ak,k = 1. The right-hand 1 −1 0 · · · · · · · · · · · · · · · 0
.. 
side must read bk = Ukn for k ∈ Ib (assuming the unknown is sought at  −1 2 −1 . . .

 .  
time level tn ).  . .. 
 0 −1 2 −1 . . . 
 
 . . .. .. .. 
 .. . . . . 0 . 
 
6.3.4 Example: Oscillating Dirichlet boundary condition
α .
K =  ..
 . . .
.. .. .. .. .. . . .. 
 (6.38)
h . 
 .. . . .. 

We shall address the one-dimensional initial-boundary value problem  . 0 −1 2 −1 . . 
 
 .. .. .. .. .. 
 .

. . . . 0  
 .. .. 
ut = (αux )x + f, x ∈ Ω = [0, L], t ∈ (0, T ], (6.33)  . . −1 2 −1 
u(x, 0) = 0, x ∈ Ω, (6.34) 0 · · · · · · · · · · · · · · · 0 −1 1
u(0, t) = a sin ωt, t ∈ (0, T ], (6.35) The right-hand side of the variational form contains M c1 since there is
ux (L, t) = 0, t ∈ (0, T ] . (6.36) no source term (f ) and no boundary term from the integration by parts
(ux = 0 at x = L and we compute as if ux = 0 at x = 0 too). We must
6.4 Accuracy of the finite element solution 255 256 6 Time-dependent variational forms
incorporate the Dirichlet boundary condition c0 = a sin ωtn by ensuring • Run CN finite difference and finite element (just use FE code)
that this is the first equation in the linear system. To this end, the first • Run Forward Euler
row in K and M is set to zero, but the diagonal entry M0,0 is set to 1. • Point to problems with non-physical oscillations, should be larger in
The right-hand side is b = M c1 , and we set b0 = a sin ωtn . Note that in finite elements than in finite difference
this approach, N = Nn − 1, and c equals the unknown u at each node in
the mesh. We can write the complete linear system as
kam 20: ensuring that it is the first equation in the system?
6.4.2 Methods of analysis
c0 = a sin ωtn , (6.39) There are three major tools for analyzing the accuracy of time-dependent
finite element problems:
h α h
(ci−1 + 4ci + ci+1 ) + ∆t (−ci−1 + 2ci − ci+1 ) = (c1,i−1 + 4c1,i + c1,i+1 ),
6 h 6 • Truncation error
(6.40) • Finite element error analysis framework
i = 1, . . . , Nn − 2, • Amplification factor analysis
h α h The truncation error is the dominating tool used to analyze finite dif-
(ci−1 + 2ci ) + ∆t (−ci−1 + ci ) = (c1,i−1 + 2c1,i ),
6 h 6 ference schemes, but very seldom (if at all) used in the finite element
(6.41)
literature. This is actually surprising, since finite elements also give rise
i = Nn − 1 . to difference stencils of the same nature mathematical form as finite
difference methods, and there is no reason why we should not use a
The Dirichlet boundary condition can alternatively be implemented
simple tool as the truncation error to establish expected convergence
through a boundary function B(x, t) = a sin ωt ϕ0 (x):
rates of finite element methods.
X Instead, a more rigorous and powerful error analysis framework has
un (x) = a sin ωtn ϕ0 (x) + cj ϕν(j) (x), ν(j) = j + 1 .
been developed for finite element methods. This framework deals primar-
j∈Is
ily with the spatial discretization and is not so superior for space-time
Now, N = Nn − 2 and the c vector contains values of u at nodes problems.
1, 2, . . . , Nn − 1. The right-hand side gets a contribution To explain the numerical artifacts from the previous section, both the
truncation error and the error analysis framework fall too short, and
ZL we have turn to the method based on analyzing amplification factors.
(a(sin ωtn − sin ωtn−1 )ϕ0 ϕi − ∆tαa sin ωtn ∇ϕ0 · ∇ϕi ) dx . (6.42) For wave equations, the counterpart is often referred to as analysis of
0 dispersion relations.
The idea of the method of analyzing amplification factors is to see how
sinusoidal waves are amplified in time. For example, if high frequency
6.4 Accuracy of the finite element solution components are damped much less than the analytical damping, we may
see this as ripples or noise in the solution.
kam 21: new intro Let us address the diffusion equation in 1D, ut = αuxx for x ∈ Ω =
(0, π) and t ∈ (0, T ]. For the case where we have homogeneous Dirichlet
conditions and the initial condition is u(x, 0) = u0 (x), the solution to
6.4.1 Illustrating example the problem can be expressed as
• Two insulated metal pieces

∞ 2 ∆t
X 2 u = Ane eikx , Ae = e−αk . (6.43)
u(x, t) = Bk e−α(k) t sin(kx),
k=1 We remark that Ae is a function of the parameter k, but to avoid to
R
where Bk = Ω u0 sin(kx). This is the well-known Fourier decomposition clutter the notation here we write Ae instead of Ae,k . This convention
of a signal in sine waves (one can also use cosine functions or a combination will be used also for the discrete case.
of sines and cosines). For a given wave sin(kx) with wave length λ = 2π/k, As we will show, many numerical schemes for the diffusion equation
2
this part of the signal will in time develop as e−α(k) t . Smooth signals will also have a similar wave component as solution:
need only a few long waves (Bk decays rapidly with k), while discontinuous
or noisy signals will have an amount of short waves with significant unq = An eikx , (6.44)
amplitude (Bk decays slowly with k). where A is an amplification factor to be calculated by inserting (6.44)
2
The amplification factor is defined as Ae = e−α(k) ∆t and expresses in the discrete equations. Normally A 6= Ae , and the difference in the
how much a wave with frequency k is damped over a time step. The corre- amplification factor is what introduces (visible) numerical errors. To
sponding numerical amplification factor will vary with the discretization compute A, we need explicit expressions for the discrete equations for
method and also discretization parameters in space. {cj }j∈Is in the finite element method. That is, we need to assemble
From the analytical expression for the amplification factor, we see that the linear system and look at a general row in the system. This row
2
e−α(k) is always less than 1. Further, we notice that the amplification can be written as a finite difference scheme, and the analysis of the
factor has a strong dependency on the frequency of the Fourier component. finite element solution is therefore performed in the same way as for
For low frequency components (when k is small), the amplification finite difference methods. Expressing the discrete finite element equations
factor is relatively large although always less than 1. For high frequency as finite difference operators turns out to be very convenient for the
components, when k approaches ∞, the amplification factor goes to 0. calculations.
Hence, high frequency components (rapid changes such as discontinuities We introduce xq = qh, or xq = q∆x, for the node coordinates, to align
or noise) present in the initial condition will be quickly dampened, while the notation with that frequently used in finite difference methods. A
low frequency components stay for a longer time interval. convenient start of the calculations is to establish some results for various
The purpose of this section is to discuss the amplification factor of finite difference operators acting on the wave component
numerical schemes and compare the amplification factor of the scheme
with the known analytical amplification factor. unq = An eikq∆x . (6.45)
The forward Euler scheme (see A.1) is
6.4.3 Fourier components and dispersion relations un+1 − unq
u0q (tn ) ≈ [Dt+ uq ]n =
q
Let us again consider the diffusion equation in 1D, ut = αuxx . To allow ∆t

for general boundary conditions, we include both the sin(kx) and cos(kx), An+1 − An ikq∆x
=
q
e
or for convenience we expand the Fourier series in terms of {eikx }∞
k=−∞ .
∆t
Hence, we perform a separation in terms of the (Fourier) wave component A−1
= An eikq∆x .
∆t
u = eβt+ikx
Similarly, the actions of the most common operators of relevance for the
2
√ model problem at hand are listed below.
where β = −αk and i = −1 is the imaginary unit.
Discretizing in time such that t = n∆t, this exact wave component
can alternatively be written as
A−1 stability criterion is |A| ≤ 1, which here implies A ≤ 1 and A ≥ −1. The
[Dt+ An eikq∆x ]n = An eikq∆x , (6.46)
∆t former is always fulfilled, while the latter leads to
1−A −1
[Dt− An eikq∆x ]n = An eikq∆x , (6.47) sin2 p
∆t 4F ≤ 2.
1 1
1
A 2 − A− 2
1
A−1 1 − 23 sin2 p
[Dt An eikq∆x ]n+ 2 = An+ 2 eikq∆x = An eikq∆x , (6.48)
∆t ∆t The factor sin2 p/(1 − 23 sin2 p) can be plotted for p ∈ [0, π/2], and the
4 k∆x maximum value goes to 3 as p → π/2. The worst case for stability
[Dx Dx An eikq∆x ]q = −An sin2 . (6.49)
∆x2 2 therefore occurs for the shortest possible wave, p = π/2, and the stability
criterion becomes
6.4.4 Forward Euler discretization 1 ∆x2

F ≤ ⇒ ∆t ≤ , (6.52)
6 6α
We insert (6.45) in the Forward Euler scheme with P1 elements in space
which is a factor 1/3 worse than for the standard Forward Euler finite
and f = 0 (note that this type of analysis can only be carried out if
difference method for the diffusion equation, which demands F ≤ 1/2.
f = 0) )
Lumping the mass matrix will, however, recover the finite difference
1 method and therefore imply F ≤ 1/2 for stability. In other words,
[Dt+ (u + h2 Dx Dx u) = αDx Dx u]nq . (6.50) introducing an error in the integration improves the stability by a factor
6
We have (using (6.46) and (6.49) of 3.
A−1 4 k∆x
[Dt+ Dx Dx Aeikx ]nq = [Dt+ A]n [Dx Dx eikx ]q = −An eikp∆x sin2 ( ). 6.4.5 Backward Euler discretization
∆t ∆x 2 2
We can use the same approach of analysis and insert (6.45) in the
The term [Dt+ Aeikx + 16 ∆x2 Dt+ Dx Dx Aeikx ]nq then reduces to
Backward Euler scheme with P1 elements in space and f = 0:
A − 1 1 2A − 1 4 k∆x
− ∆x sin2 ( ), 1
∆t 6 ∆t ∆x2 2 [Dt− (u + h2 Dx Dx u) = αDx Dx u]ni . (6.53)
or 6

A−1 2 Similar calculations as in the Forward Euler case lead to
1 − sin2 (k∆x/2) .
∆t 3
2 2
Introducing p = k∆x/2 and F = α∆t/∆x2 , the complete scheme becomes (1 − A−1 ) 1 − sin p = −4F sin2 p,
3

2 2 and hence
(A − 1) 1 − sin p = −4F sin2 p,
3 !−1
from which we find A to be sin2 p
A = 1 + 4F .
1 − 23 sin2 p
2
sin p
A = 1 − 4F . (6.51) The quantity in the parentheses is always greater than unity, so |A| ≤ 1
1 − 23 sin2 p regardless of the size of F and p. As expected, the Backward Euler
How does this A change the stability criterion compared to the Forward scheme is unconditionally stable.
Euler finite difference scheme and centered differences in space? The
6.4.6 Comparing amplification factors

1.0 Method: BE
It is of interest to compare A and Ae as functions of p for some F
values. Figure 6.2 displays the amplification factors for the Backward
Euler scheme corresponding to a coarse mesh with F = 2 and a mesh at
the stability limit of the Forward Euler scheme in the finite difference 0.8
method, F = 1/2. Figures 6.1 and 6.3 shows how the accuracy increases
with lower F values for both the Forward and Backward Euler schemes,
respectively. The striking fact, however, is that the accuracy of the finite 0.6
element method is significantly less than the finite difference method for
A/Ae
the same value of F . Lumping the mass matrix to recover the numerical
amplification factor A of the finite difference method is therefore a good 0.4
idea in this problem. F=2, FEM
0.2
F=2, FDM
F=1/2, FEM
Method: FE F=1/2, FDM
1.0 exact
0.0
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6
p
Fig. 6.2 Comparison of coarse-mesh amplification factors for Backward Euler discretiza-
0.5 tion of a 1D diffusion equation.
The difference between the exact and the numerical amplification factors
0.0
A/Ae
gives insight into the order of the approximation. Considering for example
the forward Euler scheme, the difference Ae −A, where Ae and A are given
in (6.43) and (6.51) is a complicated expression. However, performing a
F=1/6, FEM Taylor expansion in terms of ∆t using sympy is straightforward:
0.5 F=1/6, FDM
F=1/12, FEM >>> import sympy as sym
F=1/12, FDM >>> k, dt, dx, alpha = sym.symbols("k dt dx alpha")
exact >>> p = k*dx/2
>>> F = alpha*dt/(dx*dx)
1.0
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 >>> Ae = sym.exp(-alpha*k**2*dt) # exact
p >>> Af = 1 - 4*F*sym.sin(p)**2/(1 - 2.0/3.0*sym.sin(p)**2) # FE
>>> (Ae - Af).series(dt, n=2)
Fig. 6.1 Comparison of fine-mesh amplification factors for Forward Euler discretization dt*(-alpha*k**2 + 4*alpha*sin(dx*k/2)**2/
of a 1D diffusion equation. (-0.666666666666667*dx**2*sin(dx*k/2)**2 + dx**2)) + O(dt**2)
Hence, the differences between the numerical and exact amplification

hpl 22: remaining tasks Remaining tasks: factor is first order in time, as expected.
The L2 error of the numerical solution at time step n is
• Taylor expansion of the error in the amplification factor Ae − A
• Taylor expansion of the error e = (Ane − An )eikx
• L2 norm of e
1.0 Method: BE 1.0 Method: CN
0.9
0.5
0.8
0.7
0.0
A/Ae
A/Ae
0.6
0.5 F=1/6, FEM F=2, FEM

F=1/6, FDM 0.5 F=2, FDM
0.4
F=1/12, FEM F=1/2, FEM
F=1/12, FDM F=1/2, FDM
exact exact
0.3 1.0
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6
p p
Fig. 6.3 Comparison of fine-mesh amplification factors for Backward Euler discretization Fig. 6.4 Comparison of coarse-mesh amplification factors for Crank-Nicolson discretiza-
of a 1D diffusion equation. tion of a 1D diffusion equation.
s s
Z 1 Z 1 from numpy import *
ku − ue kL2 = (u − ue )2 dx = ((Ane − An )eikx )2 dx import matplitlib.pyplot as plt
0 0 dxs = [8*10**-2, 4*10**-2, 2*10**-2, 10**-2, 10**-2/2, 10**-2/4, 10**-2/8]
for dx in dxs:
Again this yields a complicated expression for hand-calculations, but the k_max=100
following sympy provides the estimate: k = arange(1, k_max, 1)
dt = 0.5*dx**2
>>> n, i, x = sym.symbols("n i x") alpha = 1
>>> e = (Ae**n - Af**n)*sym.exp(i*k*x) f = dt*(-alpha*k**2 + 4*alpha*sin(dx*k/2)**2/
>>> L2_error_squared = sym.integrate(e**2, (x, 0, 1)) (-0.666666666666667*dx**2*sin(dx*k/2)**2 + dx**2))
>>> sym.sqrt(L2_error_squared.series(dt, n=2))
O(dt) plt.loglog(k, f)
plt.legend(["dx=%3.2e" % dx for dx in dxs], loc="lower left")
We remark here that it is an advantage to take the square-root after the plt.show()
deriving the Taylor-series. Figure ?? shows that there is a polynomial relationship between the
As we saw earlier, the amplification factor varied with k, in particular error of the amplification factor and k, that is Ae − A goes as k 4 . For
with respect to the resolution. Let us therefore consider the error in the well-resolved meshes, ∆x ≤ 0.2k the amplification factor is always less
amplification factor Ae − A at different resolutions for the k = 1 . . . 100 than 1/1000. For the under-resolved meshes, e.g. ∆x = 8k the error of
at mesh sizes that under-resolve and properly resolve the first hundred the amplification factor is even larger than 1.
8 1
components. That is, we vary dx from 100 to 800 . Plotting the error in From the previous analysis of forward Euler scheme, we know that
the amplification factor versus k as follows, the scheme is only stable as long as the stability criterion ∆t ≤ 12 ∆x2
1.0 Method: CN
0.8
0.6
A/Ae
0.4
F=1/6, FEM
0.2
F=1/6, FDM
F=1/12, FEM
F=1/12, FDM
exact
0.0
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6
p
Fig. 6.5 Comparison of fine-mesh amplification factors for Backward Euler discretization Fig. 6.6 Error in the amplification factor versus ∆t .
of a 1D diffusion equation.
6.5 Exercises
is satisfied. Let us therefore consider the error in the amplification
factor with respect to ∆t. Figure 6.6 shows a clear tendency that lower Exercise 6.1: Analyze a Crank-Nicolson scheme for the
frequencies (lower k) are quickly dampened. In fact, the lower frequencies
diffusion equation
will be dampened even though the stability criterion is not satisfied.
However, the stability criterion is important for the high frequency Perform the analysis in Section 6.4 for a 1D diffusion equation ut = αuxx
components of the error. discretized by the Crank-Nicolson scheme in time:
kam 23: I have to admit I not sure this is what we want to show !
un+1 − un 1 ∂un+1 ∂un
=α + 2 ,
∆t 2 ∂x2 ∂x
or written compactly with finite difference operators,
1
[Dt u = αDx Dx ut ]n+ 2 .
(From a strict mathematical point of view, the un and un+1 in these
equations should be replaced by une and un+1
e to indicate that the unknown
is the exact solution of the PDE discretized in time, but not yet in space,
see Section 6.1.) Filename: fe_diffusion.
268 7 Variational forms for systems of PDEs
7.1.1 Sequence of scalar PDEs formulation

Variational forms for systems of
PDEs 7 Let us start with the approach that treats one equation at a time. We
multiply equation number i by some test function v (i) ∈ V (i) and integrate
over the domain:
Z
L(0) (u(0) , . . . , u(m) )v (0) dx = 0, (7.1)
Ω
..
. (7.2)
Z
L(m) (u(0) , . . . , u(m) )v (m) dx = 0 . (7.3)
Ω
Terms with second-order derivatives may be integrated by parts, with

Neumann conditions inserted in boundary integrals. Let
(i) (i)
Many mathematical models involve m + 1 unknown functions governed V (i) = span{ψ0 , . . . , ψNi },
by a system of m + 1 differential equations. In abstract form we may such that
denote the unknowns by u(0) , . . . , u(m) and write the governing equations
as Ni
X (i) (i)
u(i) = B (i) (x) + cj ψj (x),
j=0
L0 (u(0) , . . . , u(m) ) = 0, where B (i) is a boundary function to handle nonzero Dirichlet conditions.
.. Observe that different unknowns may live in different spaces with different
.
basis functions and numbers of degrees of freedom.
Lm (u(0) , . . . , u(m) ) = 0, From the m equations in the variational forms we can derive m coupled
(i)
where Li is some differential operator defining differential equation num- systems of algebraic equations for the Πi=0
m
Ni unknown coefficients cj ,
ber i. j = 0, . . . , Ni , i = 0, . . . , m.
7.1 Variational forms 7.1.2 Vector PDE formulation

The alternative method for deriving a variational form for a system of
There are basically two ways of formulating a variational form for a differential equations introduces a vector of unknown functions
system of differential equations. The first method treats each equation
independently as a scalar equation, while the other method views the u = (u(0) , . . . , u(m) ),
total system as a vector equation with a vector function as unknown.
a vector of test functions
v = (v (0) , . . . , v (m) ),
with

7.2 A worked example 269 270 7 Variational forms for systems of PDEs
The boundary conditions associated with (7.5)-(7.6) are w = 0 on

u, v ∈ V = V (0) × · · · × V (m) . ∂Ω and T = T0 on ∂Ω. Each of the equations (7.5) and (7.6) needs one
condition at each point on the boundary.
With nonzero Dirichlet conditions, we have a vector B = (B (0) , . . . , B (m) )
The system (7.5)-(7.6) arises from fluid flow in a straight pipe, with the
with boundary functions and then it is u − B that lies in V , not u itself.
z axis in the direction of the pipe. The domain Ω is a cross section of the
The governing system of differential equations is written
pipe, w is the velocity in the z direction, µ is the viscosity of the fluid, β
is the pressure gradient along the pipe, T is the temperature, and κ is the
L(u) = 0,
heat conduction coefficient of the fluid. The equation (7.5) comes from
where the Navier-Stokes equations, and (7.6) follows from the energy equation.
The term −µ||∇w||2 models heating of the fluid due to internal friction.
L(u) = (L(0) (u), . . . , L(m) (u)) . Observe that the system (7.5)-(7.6) has only a one-way coupling: T
The variational form is derived by taking the inner product of the vector depends on w, but w does not depend on T , because we can solve (7.5)
of equations and the test function vector: with respect to w and then (7.6) with respect to T . Some may argue that
Z
this is not a real system of PDEs, but just two scalar PDEs. Nevertheless,
the one-way coupling is convenient when comparing different variational
L(u) · v = 0 ∀v ∈ V . (7.4)
Ω forms and different implementations.
Observe that (7.4) is one scalar equation. To derive systems of al-
gebraic equations for the unknown coefficients in the expansions of
the unknown functions, one chooses m linearly independent v vectors 7.3 Identical function spaces for the unknowns
to generate m independent variational forms from (7.4). The particular
choice v = (v (0) , 0, . . . , 0) recovers (7.1), v = (0, . . . , 0, v (m) recovers (7.3), Let us first apply the same function space V for w and T (or more
and
R
v = (0, . . . , 0, v (i) , 0, . . . , 0) recovers the variational form number i, precisely, w ∈ V and T − T0 ∈ V ). With
(i) (i)
ΩL v dx = 0, in (7.1)-(7.3).
V = span{ψ0 (x, y), . . . , ψN (x, y)},
we write
7.2 A worked example
N
X N
X
(w) (T )
w= cj ψj , T = T0 + cj ψj . (7.7)
We now consider a specific system of two partial differential equations in
j=0 j=0
two space dimensions:
Note that w and T in (7.5)-(7.6) denote the exact solution of the PDEs,
while w and T (7.7) are the discrete functions that approximate the exact
µ∇2 w = −β, (7.5) solution. It should be clear from the context whether a symbol means
2
κ∇ T = −µ||∇w|| . 2
(7.6) the exact or approximate solution, but when we need both at the same
time, we use a subscript e to denote the exact solution.
The unknown functions w(x, y) and T (x, y) are defined in a domain Ω,
while µ, β, and κ are given constants. The norm in (7.6) is the standard
Euclidean norm: 7.3.1 Variational form of each individual PDE
||∇w||2 = ∇w · ∇w = wx2 + wy2 . Inserting the expansions (7.7) in the governing PDEs, results in a residual
in each equation,
7.3 Identical function spaces for the unknowns 271 272 7 Variational forms for systems of PDEs
Z Z
(µ∇w · ∇v0 + κ∇T · ∇v1 ) dx = (βv0 + µ∇w · ∇w v1 ) dx, ∀v ∈ V .
Ω Ω
Rw = µ∇ w + β, 2
(7.8) (7.12)
2 2 Choosing v0 = v and v1 = 0 gives the variational form (7.10), while
RT = κ∇ T + µ||∇w|| . (7.9)
v0 = 0 and v1 = v gives (7.11). R
A Galerkin method demands Rw and RT do be orthogonal to V : With the inner product notation, (p, q) = Ω pq dx, we can alternatively
write (7.10) and (7.11) as
Z
Rw v dx = 0 ∀v ∈ V,
ZΩ (µ∇w, ∇v) = (β, v) ∀v ∈ V,
RT v dx = 0 ∀v ∈ V . (κ∇T, ∇v) = (µ∇w · ∇w, v) ∀v ∈ V,
Ω
Because of the Dirichlet conditions, v = 0 on ∂Ω. We integrate the or since µ and κ are considered constant,
Laplace terms by parts and note that the boundary terms vanish since
v = 0 on ∂Ω:
µ(∇w, ∇v) = (β, v) ∀v ∈ V, (7.13)
Z Z κ(∇T, ∇v) = µ(∇w · ∇w, v) ∀v ∈ V . (7.14)
µ∇w · ∇v dx = βv dx ∀v ∈ V, (7.10)
ZΩ ZΩ
Note that the left-hand side of (7.13) is again linear in w, the left-hand
side of (7.14) is linear in T and the nonlinearity of w appears in the
κ∇T · ∇v dx = µ∇w · ∇w v dx ∀v ∈ V . (7.11)
Ω Ω right-hand side of (7.14)
The equation Rw in (7.8) is linear in w, while the equation RT in (7.9)
is linear in T and nonlinear in w.
7.3.3 Decoupled linear systems
(w) (T )
The linear systems governing the coefficients cj and cj , j = 0, . . . , N ,
7.3.2 Compound scalar variational form are derived by inserting the expansions (7.7) in (7.10) and (7.11), and
choosing v = ψi for i = 0, . . . , N . The result becomes
The alternative way of deriving the variational from is to introduce a
test vector function v ∈ V = V × V and take the inner product of v
N
and the residuals, integrated over the domain: X (w) (w) (w)
Ai,j cj = bi , i = 0, . . . , N, (7.15)
Z j=0
(Rw , RT ) · v dx = 0 ∀v ∈ V . N
X (T ) (T ) (T )
= bi , i = 0, . . . , N, (7.16)
Ω
Ai,j cj
With v = (v0 , v1 ) we get j=0
Z (w)
Ai,j = µ(∇ψj , ∇ψi ), (7.17)
(Rw v0 + RT v1 ) dx = 0 ∀v ∈ V .
(w)
Ω bi = (β, ψi ), (7.18)
Integrating the Laplace terms by parts results in (T )
Ai,j = κ(∇ψj , ∇ψi ), (7.19)
(T )
X (w)
X (w)
bi = µ(( cj ∇ψj ) ·( ck ∇ψk ), ψi ) . (7.20)
j k
7.3 Identical function spaces for the unknowns 273 274 7 Variational forms for systems of PDEs
It can also be instructive to write the linear systems using matrices and N
X (w,w) (w)
N
X (w,T ) (T ) (w)
vectors. Define K as the matrix corresponding to the Laplace operator Ai,j cj + Ai,j cj = bi , i = 0, . . . , N, (7.23)
∇2 . That is, Ki,j = (∇ψj , ∇ψi ). Let us introduce the vectors
j=0 j=0
N
X N
X
(T,w) (w) (T,T ) (T ) (T )
Ai,j cj + Ai,j cj = bi , i = 0, . . . , N, (7.24)
(w) (w) (w)
b = (b0 , . . . , bN ), j=0 j=0
(w,w)
(T ) (T ) (T ) Ai,j = µ(∇ψj , ∇ψi ), (7.25)
b = (b0 , . . . , bN ),
(w,T )
(w) (w) Ai,j = 0, (7.26)
c(w) = (c0 , . . . , cN ),
(w)
(T ) (T ) bi = (β, ψi ), (7.27)
c(T ) = (c0 , . . . , cN ) .
(w,T )
Ai,j = µ((∇w− ) · ∇ψj ), ψi ), (7.28)
The system (7.15)-(7.16) can now be expressed in matrix-vector form as (T,T )
Ai,j = κ(∇ψj , ∇ψi ), (7.29)
(T )
bi = 0. (7.30)
µKc(w) = b(w) , (7.21)
κKc(T ) = b(T ) . (7.22) This system can alternatively be written in matrix-vector form as
We can solve the first system for c(w) , and then the right-hand side
b(T ) is known such that we can solve the second system for c(T ) . Hence, µKc(w) = b(w) , (7.31)
the decoupling of the unknowns w and T reduces the system of nonlinear Lc (w)
+ κKc (T )
= 0, (7.32)
PDEs to two linear PDEs.
(w,T )
with L as the matrix from the ∇w− · ∇ operator: Li,j = Ai,j . The
(w,w) (T,T )
matrix K is Ki,j = Ai,j = Ai,j .
7.3.4 Coupled linear systems The matrix-vector equations are often conveniently written in block
form:
Despite the fact that w can be computed first, without knowing T , we
! ! !
shall now pretend that w and T enter a two-way coupling such that we µK 0 c(w) b(w)
need to derive the algebraic equations as one system for all the unknowns = ,
(w) (T ) (w)
L κK c(T ) 0
cj and cj , j = 0, . . . , N . This system is nonlinear in cj because of the
∇w · ∇w product. To remove this nonlinearity, imagine that we introduce Note that in the general case where all unknowns enter all equations,
an iteration method where we replace ∇w · ∇w by ∇w− · ∇w, w− being we have to solve the compound system (7.23)-(7.24) since then we cannot
the w computed in the previous iteration. Then the term ∇w− · ∇w is utilize the special property that (7.15) does not involve T and can be
linear in w since w− is known. The total linear system becomes solved first.
When the viscosity depends on the temperature, the µ∇2 w term
must be replaced by ∇ · (µ(T )∇w), and then T enters the equation
for w. Now we have a two-way coupling since both equations contain
w and T and therefore must be solved simultaneously. The equation
∇ · (µ(T )∇w) = −β is nonlinear, and if some iteration procedure is
invoked, where we use a previously computed T− in the viscosity (µ(T− )),
7.4 Different function spaces for the unknowns 275 276 7 Variational forms for systems of PDEs
the coefficient is known, and the equation involves only one unknown, w. As earlier, we may decoupled the system in terms of two linear PDEs
In that case we are back to the one-way coupled set of PDEs. as we did with (7.15)-(7.16) or linearize the coupled system by intro-
We may also formulate our PDE system as a vector equation. To ducing the previous iterate w− as in (7.23)-(7.24). However, we need to
this end, we introduce the vector of unknowns u = (u(0) , u(1) ), where (w) (T )
distinguish between ψi and ψi , and the range in the sums over j
u(0) = w and u(1) = T . We then have must match the number of degrees of freedom in the spaces V (w) and
! V (T ) . The formulas become
2 −µ−1 β
∇ u= .
−κ µ∇u(0) · ∇u(0)
−1
Nw
X (w) (w) (w)
Ai,j cj = bi , i = 0, . . . , Nw , (7.36)
j=0
7.4 Different function spaces for the unknowns NT
X (T ) (T ) (T )
Ai,j cj = bi , i = 0, . . . , NT , (7.37)
It is easy to generalize the previous formulation to the case where j=0
(w) (w) (w)
w ∈ V (w) and T ∈ V (T ) , where V (w) and V (T ) can be different spaces Ai,j = µ(∇ψj , ∇ψi ), (7.38)
with different numbers of degrees of freedom. For example, we may use (w)
bi =
(w)
(β, ψi ), (7.39)
quadratic basis functions for w and linear for T . Approximation of the
(T ) (T ) (T )
unknowns by different finite element spaces is known as mixed finite Ai,j = κ(∇ψj , ∇ψi ), (7.40)
element methods. Nw
X Nw
X
(T ) (w) (w) (w) (w) (T )
We write bi = µ( cj ∇ψj ) · ( ck ∇ψk ), ψi ) . (7.41)
j=0 k=0
(w)
V (w) = span{ψ0 , . . . , ψNw },
(w) In the case we formulate one compound linear system involving both
(w) (T )
(T ) (T )
cj , j = 0, . . . , Nw , and cj , j = 0, . . . , NT , (7.23)-(7.24) becomes
V (T ) = span{ψ0 , . . . , ψNT } .
The next step is to multiply (7.5) by a test function v (w) ∈ V (w) and Nw
X (w,w) (w)
NT
X (w,T ) (T ) (w)
(7.6) by a v (T ) ∈ V (T ) , integrate by parts and arrive at Ai,j cj + Ai,j cj = bi , i = 0, . . . , Nw , (7.42)
j=0 j=0
Nw
X NT
X
Z Z (T,w) (w) (T,T ) (T ) (T )
Ai,j cj + Ai,j cj = bi , i = 0, . . . , NT , (7.43)
µ∇w · ∇v (w) dx = βv (w) dx ∀v (w) ∈ V (w) , (7.33)
Ω Ω j=0 j=0
Z Z
(w,w) (w) (w)
κ∇T · ∇v (T ) dx = µ∇w · ∇w v (T ) dx ∀v (T ) ∈ V (T ) . (7.34) Ai,j = µ(∇ψj , ∇ψi ), (7.44)
Ω Ω (w,T )
Ai,j = 0, (7.45)
The compound scalar variational formulation applies a test vector (w) (w)
function v = (v (w) , v (T ) ) and reads bi = (β, ψi ), (7.46)
(w,T ) (w) (T )
Ai,j = µ(∇w− · ∇ψj ), ψi ), (7.47)
Z Z
(T,T ) (T ) (T )
(µ∇w · ∇v (w)
+ κ∇T · ∇v (T )
) dx = (βv (w)
+ µ∇w · ∇w v (T )
) dx, Ai,j = κ(∇ψj , ∇ψi ), (7.48)
Ω Ω (T )
(7.35) bi = 0. (7.49)
valid ∀v ∈ V = V (w) × V (T ) .
7.5 Computations in 1D 277 278 7 Variational forms for systems of PDEs
Z
Here, we have again performed a linearization by employing a previous we,x = −
β
dx + Cw ,
iterate w− . The corresponding block form Z
µ
(w)
!
(w)
!
(w)
! we = wx dx + Dw ,
µK 0 c b
= , Z
L κK (T ) c(T ) 0
Te,x = − µwx2 dx + CT ,
(w) (T )
has square and rectangular block matrices: K is Nw × Nw , K is Z
NT × NT , while L is NT × Nw , we = wx dx + DT ,
7.5 Computations in 1D where we determine the constants Cw , Dw , CT , and DT by the boundary

conditions w(0) = w(H) = 0 and T (0) = T (H) = T0 . The calculations
may be performed in sympy as
We can reduce the system (7.5)-(7.6) to one space dimension, which
corresponds to flow in a channel between two flat plates. Alternatively, one import sympy as sym
may consider flow in a circular pipe, introduce cylindrical coordinates, and
x, mu, beta, k, H, C, D, T0 = sym.symbols("x mu beta k H C D T0")
utilize the radial symmetry to reduce the equations to a one-dimensional wx = sym.integrate(-beta/mu, (x, 0, x)) + C
problem in the radial coordinate. The former model becomes w = sym.integrate(wx, x) + D
s = sym.solve([w.subs(x, 0)-0, # x=0 condition
w.subs(x,H)-0], # x=H condition
[C, D]) # unknowns
µwxx = −β, (7.50) w = w.subs(C, s[C]).subs(D, s[D])
w = sym.simplify(sym.expand(w))
κTxx = −µwx2 , (7.51)
Tx = sym.integrate(-mu*sym.diff(w,x)**2, x) + C
while the model in the radial coordinate r reads T = sym.integrate(Tx, x) + D
s = sym.solve([T.subs(x, 0)-T0, # x=0 condition
T.subs(x, H)-T0], # x=H condition
1 d dw [C, D]) # unknowns
µ r = −β, (7.52) T = T.subs(C, s[C]).subs(D, s[D])
r dr dr T = sym.simplify(sym.expand(T))

1 d dT dw 2
κ r = −µ . (7.53) We find that the solutions are
r dr dr dr
βx
The domain for (7.50)-(7.51) is Ω = [0, H], with boundary conditions we (x) = (H − x) ,
2µ
w(0) = w(H) = 0 and T (0) = T (H) = T0 . For (7.52)-(7.53) the domain !
is [0, R] (R being the radius of the pipe) and the boundary conditions β 2 H 3 x H 2 2 H 3 x4
Te (x) = − x + x − + T0 .
are du/dr = dT /dr = 0 for r = 0, u(R) = 0, and T (R) = T0 . µ 24 8 6 12
The exact solutions, we and Te , to (7.50) and (7.51) are computed as
The figure 7.1 shows w computed by the finite element method using
the decoupled approach with P1 elements, that is; implementing (7.15).
The analytical solution we is a quadratic polynomial. The linear finite
elements result in a poor approximation on the coarse meshes, N = 2 and
N = 4, but the approximation improves fast and already at N = 8 the
approximation appears adequate. The figure 7.1 shows the approximation
of T and also here we see that the fourth order polynomial is poorly
approximated at coarse resolution, but that the approximation quickly
improves.
Fig. 7.2 The solution T of (7.51) for κ = H = 1.
a = mu*inner(grad(u), grad(v))*dx
L = -beta*v*dx
Fig. 7.1 The solution w of (7.50) with β = µ = 1 for different mesh resolutions.
w = Function(V)
solve(a == L, w, bc)
The figure 7.1 shows T for different resolutions. The same tendency is T0 = Constant(1)
apparent although the coarse grid solutions are worse for T than for w. kappa = Constant(1)
The solutions at N = 16 and N = 32, however, appear almost identical. bc = DirichletBC(V, T0, boundary)
a = kappa*inner(grad(u), grad(v))*dx
def boundary(x): L = -mu*inner(grad(w), grad(w))*v*dx
return x[0] < DOLFIN_EPS or x[0] > 1.0 - DOLFIN_EPS
T = Function(V)
from dolfin import * solve(a == L, T, bc)
import matplotlib.pyplot as plt
plt.plot(V.dofmap().tabulate_all_coordinates(mesh), T.vector().array())
Ns = [2, 4, 8, 16, 32] plt.hold(True)
for N in Ns: plt.legend(["N=%d"%N for N in Ns], loc="upper left")
mesh = UnitIntervalMesh(N) plt.show()
V = FunctionSpace(mesh, "Lagrange", 1)
u = TrialFunction(V) Most of the FEniCS code should be familiar to the reader, but we re-
v = TestFunction(V) mark that we use the function V.dofmap().tabulate_all_coordinates(mesh)
beta = Constant(1)
to obtain the coordinates of the nodal points. This is a general function
mu = Constant(1) that works for any finite element implemented in FEniCS and also in a
parallel setting.
bc = DirichletBC(V, Constant(0), boundary)
kam 24: do not need extensive description as it should be covered problem where a is on the following form
elsewhere (
The calculations for (7.52) and (7.53) are similar. The sympy code 1 if x ≤ 12 ,
a(x) =
a0 if x > 12 .
import sympy as sym
r, R = sym.symbols("r R") Notice that for such an a(x), the equation (7.54) does not necessarily
rwr = sym.integrate(-(beta/mu)*r, r) + C make sense because we cannot differentiate a(x). Strictly speaking a0 (x)
w = sym.integrate(rwr/r, r) + D would be a Dirac’s delta function in x = 12 , that is; a0 (x) is ∞ at x = 12
s = sym.solve([sym.diff(w,r).subs(r, 0)-0, # r=0 condition
w.subs(r,R)-0], # r=R condition and zero everywhere else.
[C, D]) # unknowns Hand-calculations do however show that we may be able to compute
w = w.subs(C, s[C]).subs(D, s[D]) the solution. Integrating (7.54) yields the expression
w = sym.simplify(sym.expand(w))
rTr = sym.integrate(-mu*sym.diff(w,r)**2*r, r) + C −(au0 ) = C

T = sym.integrate(rTr/r, r) + D
s = sym.solve([sym.diff(T,r).subs(r, 0)-T0, # r=0 condition A trick now is to divide by a(x) on both sides to obtain
T.subs(r, R)-T0], # r=R condition
[C, D]) # unknowns C
T = T.subs(C, s[C]).subs(D, s[D]) −u0 =
T = sym.simplify(sym.expand(T)) a
and we obtain the solutions and hence

C
u(x) = x+D (7.57)
β R2 − r 2 a(x)
w(r) = ,
4µ The boundary conditions demand that u(0) = 0, which means that D = 0
1
and u(1) = 1 fixate C to be a(1) = a0 .
T (r) = R4 β 2 + 64T0 µ − β 2 r4 .
64µ To obtain a variational form of this problem suitable for finite element
kam 25: so how do we do we conclude this? FEniCS simulation in simulation, we transform the problem to a homogeneous Dirichlet problem.
P
3D or FEniCS simulation in 1D with r? Or perhaps both? There are Let the solution be u(x) = B(x) + j∈Is cj ψj (x) where B(x) = x.
some stuff happening with piecewise integration in the presence of the r. kam 26: the right hand side is hard to compute. Perhaps find a
problem with homogeneous BC. Or perhaps just comment that we take
the linear algebra approach. Can take the opportunity to say that the BC
trick requires extra regularity and that we instead do the LA approach.
7.5.1 Another example in 1D The variational problem derived from a standard Galerkin method
Consider the problem reads: Find u such that
Z Z
−(au0 )0 = 0, (7.54) au0 v 0 dx = f vdx
Ω Ω
u(0) = 0, (7.55)
We observe that in the variational problem, the discontinuity of a
u(1) = 1 . (7.56)
does not cause any problem as the differentiation is moved from a (and
P
For any scalar a (larger than 0), we may easily verify that the solution is u0 ) to v by using integration by parts. Letting u = j∈Is cj ψj (x) and
P
u(x) = x. In many applications, such as for example porous media flow v = ψi (x) the corresponding linear system is j Ai,j cj = bi with
or heat conduction, the parameter a contains a jump that represents
the transition from one material to another. Hence, let us consider the
Z
Ai,j = (aψj0 , ψi0 ) = a(x)ψj0 (x)ψi0 (x) dx,
Ω
bi = (f, ψi ) = 0 .
The solution of the problem is shown in Figure 7.3 at different mesh

resolutions. The analytical solution in (7.57) is a piecewise polynomial,
linear for x in [0, 21 ) and ( 12 , 1] and it seems that the numerical strategy
gives a good approximation of the solution.
The flux au0 is often a quantity of interest. Because the flux involves
differentiation with respect to x we do not have an direct access to it
and have to compute it. A natural approach is to take the Galerkin
P
approximation, that is we seek a w ≈ au0 on the form w = j∈Is dj ψj
and require Galerkin orthogonality. In other words, we require that
w − au0 is orthogonal to {ψi }. This is done by solving the linear system
P
j Mi,j dj = bi with
Z
Mi,j = (aψj , ψi ) = a(x)ψj (x)ψi (x) dx, Fig. 7.3 Solution of the Darcy problem with discontinuous coefficient for different number
ZΩ X of elements N .
bi = (au0 , ψi ) = a(x) cj ψj0 (x) dx .
Ω j Z
1 (w) ∂u (w)
wv + v dx = 0, (7.60)
As shown in Figure 7.4, this approach does not produce a good Ω a ∂x
Z
approximation of the flux. ∂w (u)
v dx = 0 . (7.61)
To improve the approximation of the flux, it is common to consider Ω ∂x
an equivalent form of (7.54) where the flux is one of the unknowns. The P P
(u) (w) (u)
equations reads: and letting u = j∈Is cj ψj , w= j∈Is cj ψj , v (u) = ψi , and v (w) =
(w)
ψi , we obtain the following system of linear equations
∂w
= 0, (7.58) " #" # " #
∂x A(w,w) A(w,u) c(w) b(w)
∂u Ac = = (u) = b,
w = −a . (7.59) A(u,w) 0 c(u) b
∂x
A straightforward calculation shows that inserting (7.59) into (7.58) where
yields the equation (7.54). We also note that we have replaced the second
order differential equation with a system of two first order differential
equations.
It is common to swap the order of the equations and also divide
equation (7.59) by a in order to get a symmetric system of equations.
Then variational formulation of the problem, having the two unknowns
w and u and corresponding test functions v (w) and v (u) , becomes
Z
(w,w) 1 (w) (w)
Ai,j = ψj (x)ψi (x) dx i, j = 0 . . . N (w) − 1,
Ω a(x)
Z
(w,u) ∂ (u) (w)
Ai,j = ψj (x)ψi (x) dx i = 0 . . . N (w) − 1, j = 0, . . . N (u) − 1,
Ω ∂x
(u,w) (w,u)
Ai,j = Aj,i ,
(w) (w)
bi = (0, ψi ) = 0,
(u) (u)
bi = (0, ψi ) = 0 .
It is interesting to note that the standard Galerkin formulation of the

problem results in a perfect approximation of u, while the flux −au0 is
badly represented. On the other hand, for the mixed formulation, the flux
is well approximated but u is approximated only to first order yielding
a staircase approximation. These observations naturally suggest that
we should employ P1 approximation of both u and its flux. We should
then get a perfect approximation of both unknowns. This is however
not possible. The linear system we obtain with P1 elements for both
variables is singular. Fig. 7.4 The corresponding flux au0 for the Darcy problem with discontinuous coefficient
for different number of elements N .
This example shows that when we are solving systems of PDEs with
several unknowns, we can not choose the approximation arbitrary. The
polynomial spaces of the different unknowns have to be compatible and 7.6 Exercises
the accuracy of the different unknowns depend on each other. We will
not discuss the reasons for the need of compatibility here as it is rather Problem 7.1: Estimate order of convergence for the Cooling
mathematical and well presented in many books, e.g. [4, 3, 6]. law
kam 27: Not sure why the I don’t get a0 . Need to debug. kam
28: integration by parts done above, but not commented. Also bc, not Consider the 1D Example of the fluid flow in a straight pipe coupled
considered. kam 29: wrong figure to heat conduction in Section 7.5. The example demonstrated fast con-
vergence when using linear elements for both variables w and T . In this
exercise we quantify the order of convergence. That is, we expect that
kw − we kL2 ≤ Cw hβw ,
kT − Te kL2 ≤ CT hβT ,
for some Cw , CT , βw and βT . Assume therefore that
kw − we kL2 = Cw hβw ,
kT − Te kL2 = CT hβT ,
and estimate Cw , CT , βw and βT .

7.6 Exercises 287
Problem 7.2: Estimate order of convergence for the Cooling

law
Repeat Exercise 7.1 with quadratic finite elements for both w and T .
Calculations to be continued...
290 8 Flexible implementations of boundary conditions
∂f ∂f
8
= 0, = 0.
Flexible implementations of ∂x ∂y
These two equations are in general nonlinear and can have many solutions,
boundary conditions one unique solution, or none. In our specific example, there is only one
solution: x = 0, y = 0.
Now we want to optimize f (x, y) under the constraint y = 2 − x. This
means that only f values along the line y = 2 − x are relevant, and we
can imagine we view f (x, y) along this line and want to find the optimum
value.
8.1.1 Elimination of variables

Our f is obviously a function of one variable along the line. Inserting
y = 2 − x in f (x, y) eliminates y and leads to f as function of x alone:
One quickly gets the impression that variational forms can handle only
two types of boundary conditions: essential conditions where the unknown f (x, y = 2 − x) = 4 − 4x + 2x2 .
is prescribed, and natural conditions where flux terms integrated by parts The condition for an optimum is
allow specification of flux conditions. However, it is possible to treat
much more general boundary conditions by adding their weak form. d
(4 − 4x + 2x2 ) = −4 + 4x = 0,
That is, one simply adds
R
the variational formulation of some boundary dx
condition B(u) = 0: ΩB B(u)v dx, where ΩB is some boundary, to the so x = 1 and y = 2 − x = 1.
variational formulation of the PDE problem. Or using the terminology In the general case we have a scalar function f (x), x = (x0 , . . . , xm )
from Chapter 2: the residual of the boundary condition when the discrete with n + 1 constraints gi (x) = 0, i = 0, . . . , n. In theory, we could use the
solution is inserted is added to the residual of the entire problem. The constraints to express n + 1 variables in terms of the remaining m − n
present chapter shows underlying mathematical details. variables, but this is very seldom possible, because it requires us to solve
the gi = 0 symbolically with respect to n + 1 different variables.
8.1 Optimization with constraint

8.1.2 Lagrange multiplier method
Suppose we have a function
When we cannot easily eliminate variables using the constraint(s), the
2
f (x, y) = x + y .2 Lagrange multiplier method come to aid. Optimization of f (x, y) under
the constraint g(x, y) = 0 then consists in formulating the Lagrangian
and want to optimize its values, i.e., find minima and maxima. The
condition for an optimum is that the derivatives vanish in all directions, `(x, y, λ) = f (x, y) + λg(x, y),
which implies
where λ is the Lagrange multiplier, which is unknown. The conditions
n · ∇f = 0 ∀n ∈ R , 2 for an optimum is that
which further implies

8.1 Optimization with constraint 291 292 8 Flexible implementations of boundary conditions
∂`
= 0,
∂`
= 0,
∂`
= 0. Note that λ is now a given (chosen) number. The `λ function is just a
∂x ∂y ∂λ function of two variables, so the optimum is found by solving
In our example, we have
∂`λ ∂`λ
= 0, = 0.
2 2 ∂x ∂y
`(x, y, λ) = x + y + λ(y − 2 + x),
Here we get
leading to the conditions
2x + λ(y − 2 + x) = 0, 2y + λ(y − 2 + x) = 0 .
2x + λ = 0, 2y + λ = 0, y − 2 + x = 0.
The solution becomes
This is a system of three linear equations in three unknowns with the
solution 1
x=y= ,
1 − 12 λ−1
x = 1, y = 1, λ = 2.
which we see approaches the correct solution x = y = 1 as λ → ∞.
In the general case with optimizing f (x) subject to the constraints The penalty method for optimization of a multi-variate function f (x)
gi (x) = 0, i = 0, . . . , n, the Lagrangian becomes with constraints gi (x) = 0, i = 0, . . . , n, can be formulated as optimiza-
n
tion of the unconstrained function
X
`(x, λ) = f (x) + λj gj (x),
1 Xn
j=0 `λ (x) = f (x) + λ (gi (x))2 .
2 j=0
with x = (x0 , . . . , xm ) and λ = (λ0 , . . . , λn ). The conditions for an
optimum are Sometimes the symbol −1 is used for λ in the penalty method.
∂f ∂f
= 0, = 0 .,
∂x ∂λ
8.2 Optimization of functionals
where
∂f ∂f
=0⇒ = 0, i = 0, . . . , m . The methods above for optimization of scalar functions of a finite number
∂x ∂xi
of variables can be generalized to optimization of functionals (functions
Similarly, ∂f /∂ λ = 0 leads to n + 1 equations ∂f /∂λi = 0, i = 0, . . . , n. of functions). We start with the specific example of optimizing
Z Z Z
8.1.3 Penalty method F (u) = ||∇u||2 dx − f u dx − gu ds, u ∈ V, (8.1)
Ω Ω ∂ΩN
Instead of incorporating the constraint exactly, as in the Lagrange mul-
tiplier method, the penalty method employs an approximation at the 2
where Ω ⊂ R , and u and f are functions of x and y in Ω. The norm
benefit of avoiding the extra Lagrange multiplier as unknown. The idea ||∇u||2 is defined as u2x + u2y , with ux denoting the derivative with respect
is to add the constraint squared, multiplied by a large prescribed number to x. The vector space V contains the relevant functions for this problem,
λ, called the penalty parameter, and more specifically, V is the Hilbert space H01 consisting of all functions
R
for which (u2 + ||∇u||2 ) dx is finite and u = 0 on ∂ΩD , which is some
1
`λ (x, y) = f (x, y) + λ(y − 2 + x)2 . Ω
2 part of the boundary ∂Ω of Ω. The remaining part of the boundary is
denoted by ∂ΩN (∂ΩN ∪ ∂ΩD = ∂Ω, ∂ΩN ∩ ∂ΩD = ∅), over which F (u)
8.2 Optimization of functionals 293 294 8 Flexible implementations of boundary conditions
Z Z Z
involves a line integral. Note that F is a mapping from any u ∈ V to a δF
= ∇u · ∇v dx − f v dx − gv ds = 0 . (8.6)
real number in R. δu
Ω Ω ∂ΩN
Since v is arbitrary, this equation must hold ∀v ∈ V . Many will recognize

8.2.1 Classical calculus of variations (8.6) as the variational formulation of a Poisson problem, which can be
directly discretized and solved by a finite element method.
Optimization of the functional F makes use of the machinery from varia- Variational calculus goes one more step and derives a partial differential
tional calculus. The essence is to demand that the functional derivative equation problem from (8.6), known as the Euler-Lagrange equation
of F with respect to u is zero. Technically, this is carried out by writing corresponding to optimization of F (u). To find the differential equation,
a general function ũ ∈ V as ũ = u + v, where u is the exact solution one manipulates the variational form (8.6) such Rthat no derivatives of v
of the optimization problem, v is an arbitrary function in V , and is a appear and the equation (8.6) can be written as Lv dx = 0, ∀v ∈, from
scalar parameter. The functional derivative in the direction of v (also Ω
known as the Gateaux derivative) is defined as which it follows that L = 0 is the differential equation.
R
Performing integration by parts of the term ∇u · ∇v dx in (8.6)
δF d Ω
= lim F (u + v) . (8.2) moves the derivatives of v over to u:
δu →0 d
R
As an example, the functional derivative to the term f u dx in F (u) Z Z Z
∂u
is computed by finding
Ω
∇u · ∇v dx = − (∇2 u)v dx + v ds
∂n
Ω Ω ∂Ω
Z Z Z Z Z
d ∂u ∂u
f · (u + v) dx = f v dx, (8.3) =− 2
(∇ u)v dx + v ds + v ds
d ∂n ∂n
Ω Ω Ω ∂ΩD ∂ΩN
R Z Z Z
and then let go to zero, which just results in f v dx. The functional ∂u ∂u
Ω =− (∇2 u)v dx + 0 ds + v ds
derivative of the other area integral becomes ∂n ∂n
Ω ∂ΩD ∂ΩN
Z Z
2 ∂u
Z Z =− (∇ u)v dx + v ds .
d ∂n
((ux +vx )2 +(uy +vy )2 ) dx = (2(ux +vx )vx +2(uv +vy )vy ) dx, Ω ∂ΩN
d
Using this rewrite in (8.6) gives
Ω Ω
which leads to Z Z Z Z
∂u
Z Z − (∇2 u)v dx + v ds − f v dx − gv ds,
(ux vx + uy vy ) dx = ∇u · ∇v dx, (8.4) ∂n
Ω ∂ΩN Ω ∂ΩN
Ω Ω
which equals
as → 0. Z Z
The functional derivative of the boundary term becomes (∇2 u + f )v dx +
∂u
− g v ds = 0 .
Z Z ∂n
d Ω ∂ΩN
g · (u + v) ds = gv ds, (8.5)
d This is to hold for any v ∈ V , which means that the integrands must
vanish, and we get the famous Poisson problem
∂ΩN ∂ΩN
for any . From (8.3)-(8.5) we then get the result

Z
−∇2 u = f, (x, y) ∈ Ω, |u − uN | ds = 0 . (8.7)
u = 0, (x, y) ∈ ∂ΩD , ∂ΩN
∂u
= g, (x, y) ∈ ∂ΩN . However, this constraint is cumbersome to implement. Note that the
∂n absolute sign
R
here is needed as in general there are many functions u
such that u − uN ds = 0.
∂ΩN
Some remarks. A penalty method. The idea is to add a penalization term 12 λ(u − uN )2 ,
• Specifying u on some part of the boundary (∂ΩD ) implies a integrated over the boundary ∂ΩN , to the functional F (u), just as we
specification of ∂u/∂n on the rest of the boundary. In particular, do in the penalty method (the factor 12 can be incorporated in λ, but
if such a specification is not explicitly done, the mathematics makes the final result look nicer). The condition ∂u/∂n = g on ∂ΩN is
above implies ∂u/∂n = 0 on ∂ΩN . no longer relevant, so we replace the g by the unknown ∂u/∂n in the
• If a non-zero condition on u = u0 on ∂ΩD is wanted, one can boundary integral term in (8.1). The new functional becomes
write u = u0 + ū and express the functional F in terms of ū, kam 30: there is a mismatch between text and equations here. If we
which obviously must vanish on ∂ΩD since u is the exact solution replace g with ∂u/∂n it seems we get a symmetric form right away as
that is u0 on ∂ΩD . we get u∂u/∂n.
• The boundary conditions on u must be implemented in the
space V , i.e., we can only work with functions that must be zero Z Z Z
∂u 1
Z
on ∂ΩD (so-called essential boundary condition). The condition F (u) = ||∇u||2 dx− f u dx− u ds+ λ(u−uN )2 ds, u ∈ V,
∂n 2
involving ∂u/∂n is easier to implement since it is just a matter Ω Ω ∂ΩN ∂ΩN
of computing a line integral. (8.8)
• The solution is not unique if ∂ΩD = ∅ (any solution u + const is In F (ũ), insert ũ = u + v, differentiate with respect to , and let → 0.
also a solution). The result becomes
Z Z Z Z
δF ∂u
= ∇u · ∇v dx − f v dx − v ds + λ(u − uN )v ds = 0 .
δu ∂n
Ω Ω ∂ΩN ∂ΩN
(8.9)
8.2.2 Penalty method for optimization with constraints
The attention is now on optimization of a functional F (u) with a given Remark.
constraint that u = uN on ∂ΩN . We could, of course, just extend the
We can drop the essential condition u = 0 on ∂ΩE and just use the
Dirichlet condition on u in the previous set-up by saying that ∂ΩD is
method above to enforce u = uN on the entire boundary ∂Ω.
the complete boundary ∂Ω and that u takes on the values of 0 and uN
at the different parts of the boundary. However, this also implies that
all the functions in V must vanish on the entire boundary. We want to Symmetrization. Using the formulation (8.9) for finite element compu-
relax this condition (and by relaxing it, we will derive a method that tations has one disadvantage: the variational form is no longer symmetric,
can be used for many other types of boundary conditions!). The goal is, and the coefficient matrix in the associated linear system becomes non-
therefore, to incorporate u = uN on ∂ΩN without demanding anything symmetric. We see this if we rewrite (8.9) as
from the functions in V . We can achieve this by enforcing the constraint
a(u, v) = L(v), ∀v ∈ V,
with 8.2.3 Lagrange multiplier method for optimization with

constraints
Z Z Z
a(u, v) = ∇u · ∇v dx −
∂u
v ds + λuv ds, We consider the same problem as in Section 8.2.2, but this time we want
∂n to apply a Lagrange multiplier method so we can solve for a multiplier
Ω ∂ΩN ∂ΩN
Z Z function rather than specifying a large number for a penalty parameter
L(v) = f v dx + λuN v ds . and getting an approximate result.
Ω ∂ΩN The functional to be optimized reads
The lack of symmetry is evident in that we cannot interchange u and v Z Z Z Z
in a(u, v), that is; a(u, v) 6= a(v, u). The standard finite element method F (u) = ||∇u||2 dx− f u dx− uN ds+ λ(u−uN ) ds, u∈V .
results in a symmetric bilinear form and a corresponding matrix for Ω Ω ∂ΩN ∂ΩN
the Poisson problem and we would like to regain this property. The
problematic non-symmetric term is the line integral of v∂u/∂n. If we had Here we have two unknown functions: u ∈ V in Ω and λ ∈ Q on ∂ΩN .
another termR u∂v/∂n, the sum would be symmetric. The idea is therefore The optimization criteria are
to subtract ∂ΩN u∂v/∂n R
ds from both a(u,
R
v) and L(v). Since u = uN
δF δF
on ∂ΩN we subtract ∂ΩN u∂v/∂n ds = ∂ΩN uN ∂v/∂n ds in L(v): = 0, = 0.
δu δλ
Z Z Z Z We write ũ = u + u v and λ̃ = λ + λ p, where v is an arbitrary function
∂u ∂v
a(u, v) = ∇u · ∇v dx − v ds − u ds + λuv ds, in V and p is an arbitrary function in Q. Notice that V is here a usual
∂n ∂n
Ω ∂ΩN ∂ΩN ∂ΩN function space with functions defined on Ω, while on the other hand is a
Z Z
∂v
Z function space defined only on the surface ΩN . We insert the expressions
L(v) = f v dx − uN ds + λuN v ds . for ũ and λ̃ for u and λ and compute
∂n
Ω ∂ΩN ∂ΩN
Z Z Z Z
This formulation is known as Nitsche’s method, but it is actually a δF dF ∂u
= lim = ∇u · ∇v dx − f v dx − v ds + λ(u − uN ) ds = 0,
standard penalty method. δu u →0 du ∂n
We can also easily derive this formulation from the partial differential Ω
Z
Ω ∂ΩN ∂ΩN
equation problem. We multiply −∇2 u = f by v and integrate over Ω. δF

= lim
dF
= (u − uN )p ds = 0 .
Integration by parts leads to δλ λ →0 dλ
∂ΩN
Z Z Z
(∇2 u)v dx = − ∇u · ∇v dx +
∂u
v ds . These equations can be written as a linear system of equations: Find
Ω Ω ∂Ω
∂n u, λ ∈ V × Q such that
Now, u = 0 and therefore v = 0 on ∂ΩD so the line integral reduces to a(u, v) + b(λ, v) = L(v),
an integral over ∂ΩN , where we have no condition and hence no value b(u, p) = 0,
for ∂u/∂n, so we leave
R
the integral as is. Then we add the boundary
penalization term λ ∂ΩN (u − uN )v ds. The result becomes identical to for all test functions v ∈ V and p ∈ Q and
(8.9). We can thereafter add the symmetrization terms if desired.
Z Z
a(u, v) = ∇u · ∇v dx −
∂u
v ds, A uniform mesh with nodes xi = i∆x is introduced, numbered from
∂n left to right: i = 0, . . . , Nx . The approximate value of u at xi is denoted
Ω P x
by ci , and in general the approximation to u is N i=0 ϕi (x)ci .
∂ΩN
Z
b(λ, v) = λv ds, The elements at the boundaries needs special attention. Let us consider
Ω the element 0 defined on [0, h]. The basis functions are ϕ0 (x) = 1 − x/h
Z
and ϕ1 (x) = x/h. Hence, ϕ0 |x=0 = 1, ϕ00 |x=0 = −1/h, ϕ1 |x=0 = 0, and
L(v) = f v dx,
ϕ01 |x=0 = 1/h. Therefore, for element 0 we obtain the element matrix
Ω
Z
(0) 3
L(λ) = uN λ ds. A0,0 = λ + ,
h
Ω
(0) 2
P P A0,1 = − ,
(u) (λ) (v) (p) h
Letting u = j∈Is cj ψj ,
λ= v= j∈Is cj ψj , ψi , and p = ψi ,
(0) 2
we obtain the following system of linear equations A1,0 = − ,
h
" #" # " # 1
A(u,u) A(λ,u) c(u) b(u) (0)
A1,1 = .
Ac = (λ) = = b, h
A(u,λ) 0 c b(λ)
The interior elements (e = 1 . . . Ne − 2) result in the following element
where matrices
(u,u) (w) (w)
Ai,j = a(ψj , ψi ), (e) 1 (e) 1
A0,0 = , A0,1 = − ,
(u,λ) (u) (w) h h
Ai,j = b(ψj , ψi ), 1 1
(e) (e)
(λ,u) (u,λ) A1,0 =− , A1,1 = .
Ai,j = Aj,i , h h
(w) (w)
bi = (f, ψi ), While the element at the boundary x = 1 result in a element matrix
(u) similar to A0 except that 0 and 1 are swapped. The calculations are
bi = 0.
straightforward in sympy
kam 31: a should be symmetric. Check why not. Also here ∂ΩN is import sympy as sym
used almost everywhere. Should probably be ∂Ω. x, h, lam = sym.symbols("x h \lambda")
basis = [1 - x/h, x/h]
for i in range(len(basis)):
8.2.4 Example: 1D problem phi_i = basis[i]
for j in range(len(basis)):
Penalty method. Let us do hand calculations to demonstrate weakly phi_j = basis[j]
a = sym.integrate(sym.diff(phi_i, x)*sym.diff(phi_j, x), (x, 0, h))
enforced boundary conditions via a penalty method and via the Lagrange a -= (sym.diff(phi_i, x)*phi_j).subs(x,0)
multiplier method. We study the simple problem −u00 = 2 on [0, 1] with a -= (sym.diff(phi_j, x)*phi_i).subs(x,0)
boundary conditions u(0) = 0 and u(1) = 1. a += (lam*phi_j*phi_i).subs(x,0)
Z 1 kam 32: Do the thing in FEniCS. Check results for different λ.

a(u, v) = ∇u · ∇v dx − [ux v]10 − [vx u]10 + [λuv]10
0 Lagrange multiplier method. For the Lagrange multiplier method we
Z 1 need a function space Q defined on the boundary of the domain. In 1D
L(v) = f v dx − [vx uN ]10 + [λuN v]10 . with Ω = (0, 1) the boundary is x = 0 and x = 1. Hence, Q can be
0
8.2 Optimization of functionals 301
spanned by two basis functions λ0 and λ1 . These functions should be

such that λ0 = 1 for x = 0 and zero everywhere else, while λ1 = 1 for
x = 1 and zero everywhere else. Now, set
λ(x) = λ0 ϕ0 (x) + λNx ϕNx (x) .

kam 33: ok, write this up in detail
8.2.5 Example: adding a constraint in a Neumann problem

R
• Neumann problem: add Ω u dx = 0 as constraint.
• Set up Nitsche and Lagrange multiplier method and the simple trick
u fixed at a point. Compare in 1D.
kam
R
34:
R
penalty is kindR of hard, at least in fenics, it involves the terms
Ω u × Ω v rather than Ω uv.
304 9 Nonlinear problems
Differential equations. The unknown in a differential equation is a

Nonlinear problems
9
function and not a number. In a linear differential equation, all terms
involving the unknown function are linear in the unknown function or its
derivatives. Linear here means that the unknown function, or a derivative
of it, is multiplied by a number or a known function. All other differential
equations are non-linear.
The easiest way to see if an equation is nonlinear, is to spot nonlinear
terms where the unknown function or its derivatives are multiplied by
each other. For example, in
u0 (t) = −a(t)u(t) + b(t),

the terms involving the unknown function u are linear: u0 contains the
derivative of the unknown function multiplied by unity, and au contains
the unknown function multiplied by a known function. However,
9.1 Introduction of basic concepts u0 (t) = u(t)(1 − u(t)),
is nonlinear because of the term −u2 where the unknown function is

9.1.1 Linear versus nonlinear equations multiplied by itself. Also
Algebraic equations. A linear, scalar, algebraic equation in x has the ∂u ∂u
form +u = 0,
∂t ∂x
is nonlinear because of the term uux where the unknown function appears
ax + b = 0,
in a product with its derivative. (Note here that we use different notations
for arbitrary real constants a and b. The unknown is a number x. All for derivatives: u0 or du/dt for a function u(t) of one variable, ∂u∂t or ut
other algebraic equations, e.g., x2 + ax + b = 0, are nonlinear. The typical for a function of more than one variable.)
feature in a nonlinear algebraic equation is that the unknown appears Another example of a nonlinear equation is
in products with itself, like x2 or in functions that are infinite sums of
products, like ex = 1 + x + 21 x2 + 3!1 x3 + · · · . u00 + sin(u) = 0,
We know how to solve a linear algebraic equation, x = −b/a, but
because sin(u) contains products of u, which becomes clear if we expand
there are no general closed formulas for finding the exact solutions of
the function in a Taylor series:
nonlinear algebraic equations, except for very special cases (quadratic
equations constitute a primary example). A nonlinear algebraic equation 1
may have no solution, one solution, or many solutions. The tools for sin(u) = u − u3 + . . .
3
solving nonlinear algebraic equations are iterative methods, where we
construct a series of linear equations, which we know how to solve, and
hope that the solutions of the linear equations converge to a solution of Mathematical proof of linearity
the nonlinear equation we want to solve. Typical methods for nonlinear
algebraic equation equations are Newton’s method, the Bisection method,
and the Secant method.

9.1 Introduction of basic concepts 305 306 9 Nonlinear problems
• explicit time discretization methods (with no need to solve nonlinear

To really prove mathematically that some differential equation in an
algebraic equations)
unknown u is linear, show for each term T (u) that with u = au1 +bu2
• implicit Backward Euler time discretization, leading to nonlinear
for constants a and b,
algebraic equations solved by
T (au1 + bu2 ) = aT (u1 ) + bT (u2 ) . – an exact analytical technique
– Picard iteration based on manual linearization
For example, the term T (u) = (sin2 t)u0 (t) is linear because – a single Picard step
– Newton’s method
T (au1 + bu2 ) = (sin2 t)(au1 (t) + bu2 (t))0 • implicit Crank-Nicolson time discretization and linearization via a
2 2 geometric mean formula
= a(sin t)u01 (t) + b(sin t)u02 (t)
= aT (u1 ) + bT (u2 ) . Thereafter, we compare the performance of the various approaches. De-
spite the simplicity of (9.1), the conclusions reveal typical features of the
However, T (u) = sin u is nonlinear because various methods in much more complicated nonlinear PDE problems.
T (au1 + bu2 ) = sin(au1 + bu2 ) 6= a sin u1 + b sin u2 .

9.1.3 Linearization by explicit time discretization
Time discretization methods are divided into explicit and implicit meth-
ods. Explicit methods lead to a closed-form formula for finding new values
9.1.2 A simple model problem of the unknowns, while implicit methods give a linear or nonlinear system
A series of forthcoming examples will explain how to tackle nonlinear of equations that couples (all) the unknowns at a new time level. Here
differential equations with various techniques. We start with the (scaled) we shall demonstrate that explicit methods may constitute an efficient
logistic equation as model problem: way to deal with nonlinear differential equations.
The Forward Euler method is an explicit method. When applied to
u0 (t) = u(t)(1 − u(t)) . (9.1) (9.1), sampled at t = tn , it results in
This is a nonlinear ordinary differential equation (ODE) which will be un+1 − un

solved by different strategies in the following. Depending on the chosen = un (1 − un ),
∆t
time discretization of (9.1), the mathematical problem to be solved at which is a linear algebraic equation for the unknown value un+1 that we
every time level will either be a linear algebraic equation or a nonlinear can easily solve:
algebraic equation. In the former case, the time discretization method
transforms the nonlinear ODE into linear subproblems at each time un+1 = un + ∆t un (1 − un ) .
level, and the solution is straightforward to find since linear algebraic
equations are easy to solve. However, when the time discretization leads The nonlinearity in the original equation poses in this case no difficulty
to nonlinear algebraic equations, we cannot (except in very rare cases) in the discrete algebraic equation. Any other explicit scheme in time will
solve these without turning to approximate, iterative solution methods. also give only linear algebraic equations to solve. For example, a typical
The next subsections introduce various methods for solving nonlinear 2nd-order Runge-Kutta method for (9.1) leads to the following formulas:
differential equations, using (9.1) as model. We shall go through the
following set of cases:
u∗ = un + ∆tun (1 − un ), pick the right solution? This is in general a hard problem. In the present
1 simple case, however, we can analyze the roots mathematically and
un+1 = un + ∆t (un (1 − un ) + u∗ (1 − u∗ ))) . provide an answer. The idea is to expand the roots in a series in ∆t
2
and truncate after the linear term since the Backward Euler scheme will
The first step is linear in the unknown u∗ . Then u∗ is known in the next introduce an error proportional to ∆t anyway. Using sympy we find the
step, which is linear in the unknown un+1 . following Taylor series expansions of the roots:
>>> import sympy as sym
>>> dt, u_1, u = sym.symbols(’dt u_1 u’)
9.1.4 Exact solution of nonlinear algebraic equations >>> r1, r2 = sym.solve(dt*u**2 + (1-dt)*u - u_1, u) # find roots
>>> r1
Switching to a Backward Euler scheme for (9.1), (dt - sqrt(dt**2 + 4*dt*u_1 - 2*dt + 1) - 1)/(2*dt)
>>> r2
un − un−1 (dt + sqrt(dt**2 + 4*dt*u_1 - 2*dt + 1) - 1)/(2*dt)
= un (1 − un ), (9.2) >>> print r1.series(dt, 0, 2) # 2 terms in dt, around dt=0
∆t -1/dt + 1 - u_1 + dt*(u_1**2 - u_1) + O(dt**2)
results in a nonlinear algebraic equation for the unknown value un . The >>> print r2.series(dt, 0, 2)
u_1 + dt*(-u_1**2 + u_1) + O(dt**2)
equation is of quadratic type:
We see that the r1 root, corresponding to a minus sign in front of the
∆t(un )2 + (1 − ∆t)un − un−1 = 0, square root in (9.4), behaves as 1/∆t and will therefore blow up as
∆t → 0! Since we know that u takes on finite values, actually it is less
and may be solved exactly by the well-known formula for such equations.
than or equal to 1, only the r2 root is of relevance in this case: as ∆t → 0,
Before we do so, however, we will introduce a shorter, and often cleaner,
u → u(1) , which is the expected result.
notation for nonlinear algebraic equations at a given time level. The
For those who are not well experienced with approximating mathemat-
notation is inspired by the natural notation (i.e., variable names) used
ical formulas by series expansion, an alternative method of investigation
in a program, especially in more advanced partial differential equation
is simply to compute the limits of the two roots as ∆t → 0 and see if a
problems. The unknown in the algebraic equation is denoted by u, while
limit unreasonable:
u(1) is the value of the unknown at the previous time level (in general, u(`)
is the value of the unknown ` levels back in time). The notation will be >>> print r1.limit(dt, 0)
-oo
frequently used in later sections. What is meant by u should be evident >>> print r2.limit(dt, 0)
from the context: u may be 1) the exact solution of the ODE/PDE u_1
problem, 2) the numerical approximation to the exact solution, or 3) the
unknown solution at a certain time level.
The quadratic equation for the unknown un in (9.2) can, with the new
notation, be written 9.1.5 Linearization
When the time integration of an ODE results in a nonlinear algebraic
F (u) = ∆tu2 + (1 − ∆t)u − u(1) = 0 . (9.3)
equation, we must normally find its solution by defining a sequence of
The solution is readily found to be linear equations and hope that the solutions of these linear equations con-
q verge to the desired solution of the nonlinear algebraic equation. Usually,
1 this means solving the linear equation repeatedly in an iterative fashion.
u= −1 + ∆t ± (1 − ∆t)2 − 4∆tu(1) . (9.4)
2∆t Alternatively, the nonlinear equation can sometimes be approximated by
Now we encounter a fundamental challenge with nonlinear algebraic one linear equation, and consequently there is no need for iteration.
equations: the equation may have more than one solution. How do we
Constructing a linear equation from a nonlinear one requires lineariza- c

auk uk+1 + buk+1 + c = 0 ⇒ uk+1 = − , k = 0, 1, . . .
tion of each nonlinear term. This can be done manually as in Picard auk + b
iteration, or fully algorithmically as in Newton’s method. Examples will Since we need to perform the iteration at every time level, the time level
best illustrate how to linearize nonlinear problems. counter is often also included:
un−1
9.1.6 Picard iteration aun,k un,k+1 +bun,k+1 −un−1 = 0 ⇒ un,k+1 = , k = 0, 1, . . . ,
aun,k + b
Let us write (9.3) in a more compact form
with the start value un,0 = un−1 and the final converged value un = un,k
F (u) = au2 + bu + c = 0, for sufficiently large k.
However, we will normally apply a mathematical notation in our final
with a = ∆t, b = 1 − ∆t, and c = −u(1) . Let u− be an available formulas that is as close as possible to what we aim to write in a computer
approximation of the unknown u. code and then it becomes natural to use u and u− instead of uk+1 and
Then we can linearize the term u2 simply by writing u− u. The resulting uk or un,k+1 and un,k .
equation, F̂ (u) = 0, is now linear and hence easy to solve:
Stopping criteria. The iteration method can typically be terminated
F (u) ≈ F̂ (u) = au− u + bu + c = 0 . when the change in the solution is smaller than a tolerance u :
Since the equation F̂ = 0 is only approximate, the solution u does not |u − u− | ≤ u ,

equal the exact solution ue of the exact equation F (ue ) = 0, but we can
hope that u is closer to ue than u− is, and hence it makes sense to repeat or when the residual in the equation is sufficiently small (< r ),
the procedure, i.e., set u− = u and solve F̂ (u) = 0 again. There is no
|F (u)| = |au2 + bu + c| < r .
guarantee that u is closer to ue than u− , but this approach has proven
to be effective in a wide range of applications.
The idea of turning a nonlinear equation into a linear one by using an A single Picard iteration. Instead of iterating until a stopping criterion
approximation u− of u in nonlinear terms is a widely used approach that is fulfilled, one may iterate a specific number of times. Just one Picard
goes under many names: fixed-point iteration, the method of successive iteration is popular as this corresponds to the intuitive idea of approx-
substitutions, nonlinear Richardson iteration, and Picard iteration. We imating a nonlinear term like (un )2 by un−1 un . This follows from the
will stick to the latter name. linearization u− un and the initial choice of u− = un−1 at time level tn . In
Picard iteration for solving the nonlinear equation arising from the other words, a single Picard iteration corresponds to using the solution
Backward Euler discretization of the logistic equation can be written as at the previous time level to linearize nonlinear terms. The resulting
discretization becomes (using proper values for a, b, and c)
c
u=− , u− ← u . un − un−1
au− + b = un (1 − un−1 ), (9.5)
The ← symbols means assignment (we set u− equal to the value of u). ∆t
The iteration is started with the value of the unknown at the previous which is a linear algebraic equation in the unknown un , making it easy
time level: u− = u(1) . to solve for un without any need for any alternative notation.
Some prefer an explicit iteration counter as superscript in the mathe- We shall later refer to the strategy of taking one Picard step, or
matical notation. Let uk be the computed approximation to the solution equivalently, linearizing terms with use of the solution at the previous
in iteration k. In iteration k + 1 we want to solve time step, as the Picard1 method. It is a widely used approach in science
and technology, but with some limitations if ∆t is not sufficiently small n+ 21

Using an arithmetic mean on the linear u term in (9.6) and a geometric
(as will be illustrated later). mean for the second term, results in a linearized equation for the unknown
un+1 :
Notice
un+1 − un 1
Equation (9.5) does not correspond to a “pure” finite difference = (un + un+1 ) − un un+1 ,
∆t 2
method where the equation is sampled at a point and derivatives which can readily be solved:
replaced by differences (because the un−1 term on the right-hand
side must then be un ). The best interpretation of the scheme (9.5) 1 + 21 ∆t
un+1 = un .
is a Backward Euler difference combined with a single (perhaps 1 + ∆tun − 12 ∆t
insufficient) Picard iteration at each time level, with the value at
This scheme can be coded directly, and since there is no nonlinear
the previous time level as start for the Picard iteration.
algebraic equation to iterate over, we skip the simplified notation with u
for un+1 and u(1) for un . The technique with using a geometric average
is an example of transforming a nonlinear algebraic equation to a linear
one, without any need for iterations.
9.1.7 Linearization by a geometric mean The geometric mean approximation is often very effective for lineariz-
ing quadratic nonlinearities. Both the arithmetic and geometric mean
We consider now a Crank-Nicolson discretization of (9.1). This means approximations have truncation errors of order ∆t2 and are therefore
that the time derivative is approximated by a centered difference, compatible with the truncation error O(∆t2 ) of the centered difference
1 approximation for u0 in the Crank-Nicolson method.
[Dt u = u(1 − u)]n+ 2 , Applying the operator notation for the means and finite differences,
written out as the linearized Crank-Nicolson scheme for the logistic equation can be
compactly expressed as
un+1 − un 1 1
= un+ 2 − (un+ 2 )2 . (9.6) t,g 1
∆t [Dt u = ut + u2 ]n+ 2 .
n+ 12
The first term u is normally approximated by an arithmetic mean,
1 1 Remark
un+ 2 ≈ (un + un+1 ),
2 If we use an arithmetic instead of a geometric mean for the nonlinear
such that the scheme involves the unknown function only at the time term in (9.6), we end up with a nonlinear term (un+1 )2 . This term
levels where we actually compute it. The same arithmetic mean applied can be linearized as u− un+1 in a Picard iteration approach and in
to the second term gives particular as un un+1 in a Picard1 iteration approach. The latter
gives a scheme almost identical to the one arising from a geometric
1 1
(un+ 2 )2 ≈ (un + un+1 )2 , mean (the difference in un+1 being 14 ∆tun (un+1 − un ) ≈ 14 ∆t2 u0 u,
4 i.e., a difference of size ∆t2 ).
which is nonlinear in the unknown un+1 . However, using a geometric
1
mean for (un+ 2 )2 is a way of linearizing the nonlinear term in (9.6):
kam 35: this is the first time I’ve seen this ut notation. It is not in
(u n+ 12 2
) ≈u un n+1
. the appendix either.
9.1.8 Newton’s method

F 0 (u) = 2au + b .
The Backward Euler scheme (9.2) for the logistic equation leads to a
nonlinear algebraic equation (9.3). Now we write any nonlinear algebraic The iteration method becomes
equation in the general and compact form
a(u− )2 + bu− + c
u = u− + , u− ← u . (9.7)
F (u) = 0 . 2au− + b
At each time level, we start the iteration by setting u− = u(1) . Stopping
Newton’s method linearizes this equation by approximating F (u) by its
criteria as listed for the Picard iteration can be used also for Newton’s
Taylor series expansion around a computed value u− and keeping only
method.
the linear part:
An alternative mathematical form, where we write out a, b, and c, and
use a time level counter n and an iteration counter k, takes the form
1
F (u) = F (u− ) + F 0 (u− )(u − u− ) + F 00 (u− )(u − u− )2 + · · ·
2
∆t(un,k )2 + (1 − ∆t)un,k − un−1
≈ F (u− ) + F 0 (u− )(u − u− ) = F̂ (u) . un,k+1 = un,k + , un,0 = un−1 , (9.8)
2∆tun,k + 1 − ∆t
The linear equation F̂ (u) = 0 has the solution for k = 0, 1, . . .. A program implementation is much closer to (9.7) than
to (9.8), but the latter is better aligned with the established mathematical
F (u− )
u = u− − . notation used in the literature.
F 0 (u− )
Expressed with an iteration index in the unknown, Newton’s method
takes on the more familiar mathematical form
9.1.9 Relaxation
F (u )k
One iteration in Newton’s method or Picard iteration consists of solving a
uk+1 = uk − , k = 0, 1, . . .
F 0 (uk ) linear problem F̂ (u) = 0. Sometimes convergence problems arise because
It can be shown that the error in iteration k + 1 of Newton’s method the new solution u of F̂ (u) = 0 is “too far away” from the previously
is proportional to the square of the error in iteration k, a result referred computed solution u− . A remedy is to introduce a relaxation, meaning
to as quadratic convergence. This means that for small errors the method that we first solve F̂ (u∗ ) = 0 for a suggested value u∗ and then we take u
converges very fast, and in particular much faster than Picard iteration as a weighted mean of what we had, u− , and what our linearized equation
and other iteration methods. (The proof of this result is found in most F̂ = 0 suggests, u∗ :
textbooks on numerical analysis.) However, the quadratic convergence
appears only if uk is sufficiently close to the solution. Further away from u = ωu∗ + (1 − ω)u− .
the solution the method can easily converge very slowly or diverge. The The parameter ω is known as a relaxation parameter, and a choice ω < 1
reader is encouraged to do Exercise 9.3 to get a better understanding for may prevent divergent iterations.
the behavior of the method. Relaxation in Newton’s method can be directly incorporated in the
Application of Newton’s method to the logistic equation discretized basic iteration formula:
by the Backward Euler method is straightforward as we have
F (u− )
2 (1) u = u− − ω . (9.9)
F (u) = au + bu + c, a = ∆t, b = 1 − ∆t, c = −u , F 0 (u− )
and then
9.1.10 Implementation and experiments def CN_logistic(u0, dt, Nt):

u = np.zeros(Nt+1)
The program logistic.py contains implementations of all the methods u[0] = u0
described above. Below is an extract of the file showing how the Picard and for n in range(0, Nt):
u[n+1] = (1 + 0.5*dt)/(1 + dt*u[n] - 0.5*dt)*u[n]
Newton methods are implemented for a Backward Euler discretization return u
of the logistic equation.
We may run experiments with the model problem (9.1) and the different
def BE_logistic(u0, dt, Nt, choice=’Picard’, strategies for dealing with nonlinearities as described above. For a quite
eps_r=1E-3, omega=1, max_iter=1000):
if choice == ’Picard1’: coarse time resolution, ∆t = 0.9, use of a tolerance r = 0.05 in the
choice = ’Picard’ stopping criterion introduces an iteration error, especially in the Picard
max_iter = 1 iterations, that is visibly much larger than the time discretization error
u = np.zeros(Nt+1) due to a large ∆t. This is illustrated by comparing the upper two plots in
iterations = [] Figure 9.1. The one to the right has a stricter tolerance = 10−3 , which
u[0] = u0
leads to all the curves corresponding to Picard and Newton iteration to
for n in range(1, Nt+1):
a = dt be on top of each other (and no changes can be visually observed by
b = 1 - dt reducing r further). The reason why Newton’s method does much better
c = -u[n-1]
than Picard iteration in the upper left plot is that Newton’s method
if choice == ’Picard’: with one step comes far below the r tolerance, while the Picard iteration
needs on average 7 iterations to bring the residual down to r = 10−1 ,
def F(u):
return a*u**2 + b*u + c
which gives insufficient accuracy in the solution of the nonlinear equation.
It is obvious that the Picard1 method gives significant errors in addition
u_ = u[n-1] to the time discretization unless the time step is as small as in the lower
k = 0
while abs(F(u_)) > eps_r and k < max_iter: right plot.
u_ = omega*(-c/(a*u_ + b)) + (1-omega)*u_ The BE exact curve corresponds to using the exact solution of the
k += 1 quadratic equation at each time level, so this curve is only affected by the
u[n] = u_
iterations.append(k) Backward Euler time discretization. The CN gm curve corresponds to the
theoretically more accurate Crank-Nicolson discretization, combined with
elif choice == ’Newton’: a geometric mean for linearization. This curve appears more accurate,
def F(u): especially if we take the plot in the lower right with a small ∆t and an
return a*u**2 + b*u + c appropriately small r value as the exact curve.
When it comes to the need for iterations, Figure 9.2 displays the
def dF(u):
return 2*a*u + b number of iterations required at each time level for Newton’s method
and Picard iteration. The smaller ∆t is, the better starting value we
u_ = u[n-1]
have for the iteration, and the faster the convergence is. With ∆t = 0.9
k = 0
while abs(F(u_)) > eps_r and k < max_iter: Picard iteration requires on average 32 iterations per time step for the
u_ = u_ - F(u_)/dF(u_) stricter convergence criterion, but this number is dramatically reduced
k += 1
u[n] = u_
as ∆t is reduced.
iterations.append(k) However, introducing relaxation and a parameter ω = 0.8 immediately
return u, iterations reduces the average of 32 to 7, indicating that for the large ∆t = 0.9,
The Crank-Nicolson method utilizing a linearization based on the Picard iteration takes too long steps. An approximately optimal value for
geometric mean gives a simpler algorithm: ω in this case is 0.5, which results in an average of only 2 iterations! An
dt=0.9, eps=5E-02 dt=0.9, eps=1E-03

even more dramatic impact of ω appears when ∆t = 1: Picard iteration 12 Picard Picard
40
does not convergence in 1000 iterations, but ω = 0.5 again brings the Newton
35
Newton
average number of iterations down to 2.

10
30
8
No of iterations
No of iterations
25
1.0 dt=0.9, eps=5E-02 1.0 dt=0.9, eps=1E-03 6 20
0.9 0.9 15
4
0.8 0.8 10
2
0.7 0.7 5
0.6 0.6 0 0
2 4 6 8 10 2 4 6 8 10
u
u
0.5 0.5 Time level Time level
0.4 FE 0.4 FE dt=0.45, eps=1E-03 dt=0.09, eps=1E-04

BE exact BE exact 6 3.0
0.3 BE Picard 0.3 BE Picard Picard Picard
BE Picard1 BE Picard1 Newton Newton
0.2 BE Newton 0.2 BE Newton 5 2.5
CN gm CN gm
0.1 0.1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 4 2.0
t t
No of iterations
No of iterations
dt=0.45, eps=1E-03 dt=0.09, eps=1E-04 3 1.5
1.0 1.0
0.9 0.9 2 1.0

0.8 0.8
1 0.5
0.7 0.7
0.6 0.6 0 0.0

5 10 15 20 20 40 60 80 100
Time level Time level
u
0.5 0.5
FE FE Fig. 9.2 Comparison of the number of iterations at various time levels for Picard and
0.4 0.4
BE exact BE exact Newton iteration.
0.3 BE Picard 0.3 BE Picard
BE Picard1 BE Picard1
0.2 BE Newton 0.2 BE Newton
CN gm CN gm
0.1
0 1 2 3 4
t
5 6 7 8 9
0.1
0 1 2 3 4
t
5 6 7 8 9 9.1.11 Generalization to a general nonlinear ODE
Fig. 9.1 Impact of solution strategy and time step length on the solution.
Let us see how the various methods in the previous sections can be
applied to the more generic model
hpl 36: Is this remark really relevant now? Compare with text.
Remark. The simple Crank-Nicolson method with a geometric mean for u0 = f (u, t), (9.10)
the quadratic nonlinearity gives visually more accurate solutions than where f is a nonlinear function of u.
the Backward Euler discretization. Even with a tolerance of r = 10−3 ,
Explicit time discretization. Explicit ODE methods like the Forward
all the methods for treating the nonlinearities in the Backward Euler
Euler scheme, Runge-Kutta methods, Adams-Bashforth methods all
discretization give graphs that cannot be distinguished. So for accuracy
evaluate f at time levels where u is already computed, so nonlinearities
in this problem, the time discretization is much more crucial than r .
in f do not pose any difficulties.
Ideally, one should estimate the error in the time discretization, as the
solution progresses, and set r accordingly. Backward Euler discretization. Approximating u0 by a backward dif-
ference leads to a Backward Euler scheme, which can be written as
F (un ) = un − ∆t f (un , tn ) − un−1 = 0,

or alternatively
about 2 iterations per time step, but this example shows a potential
F (u) = u − ∆t f (u, tn ) − u(1) = 0 . of treating f more implicitly.
A trick to treat f implicitly in Picard iteration is to evaluate it as
A simple Picard iteration, not knowing anything about the nonlinear
f (u− , t)u/u− . For a polynomial f , f (u, t) = um , this corresponds to
structure of f , must approximate f (u, tn ) by f (u− , tn ):
(u− )m u/u− = (u− )m−1 u. Sometimes this more implicit treatment
has no effect, as with f (u, t) = exp(−u) and f (u, t) = ln(1 + u),
F̂ (u) = u − ∆t f (u− , tn ) − u(1) .
but with f (u, t) = sin(2(u + 1)), the f (u− , t)u/u− trick leads to
The iteration starts with u− = u(1) and proceeds with repeating 7, 9, and 11 iterations during the first three steps, while f (u− , t)
demands 17, 21, and 20 iterations. (Experiments can be done with
the code ODE_Picard_tricks.py.)
u∗ = ∆t f (u− , tn ) + u(1) , u = ωu∗ + (1 − ω)u− , u− ← u,
until a stopping criterion is fulfilled. Newton’s method applied to a Backward Euler discretization of u0 =
f (u, t) requires the computation of the derivative
Explicit vs implicit treatment of nonlinear terms ∂f
F 0 (u) = 1 − ∆t (u, tn ) .
Evaluating f for a known u− is referred to as explicit treatment ∂u
of f , while if f (u, t) has some structure, say f (u, t) = u3 , parts Starting with the solution at the previous time level, u− = u(1) , we can
of f can involve the known u, as in the manual linearization like just use the standard formula
(u− )2 u, and then the treatment of f is “more implicit” and “less
explicit”. This terminology is inspired by time discretization of
u0 = f (u, t), where evaluating f for known u values gives explicit F (u− ) u− − ∆t f (u− , tn ) − u(1)
u = u− − ω = u−
− ω . (9.11)
schemes, while treating f or parts of f implicitly, makes f contribute F 0 (u− ) 1 − ∆t ∂u
∂
f (u− , tn )
to the unknown terms in the equation at the new time level.
Crank-Nicolson discretization. The standard Crank-Nicolson scheme
Explicit treatment of f usually means stricter conditions on ∆t
with arithmetic mean approximation of f takes the form
to achieve stability of time discretization schemes. The same applies
to iteration techniques for nonlinear algebraic equations: the “less” un+1 − un 1
we linearize f (i.e., the more we keep of u in the original formula), = (f (un+1 , tn+1 ) + f (un , tn )) .
∆t 2
the faster the convergence may be.
We can write the scheme as a nonlinear algebraic equation
We may say that f (u, t) = u3 is treated explicitly if we evaluate f
as (u− )3 , partially implicit if we linearize as (u− )2 u and fully implicit
if we represent f by u3 . (Of course, the fully implicit representation 1 1
F (u) = u − u(1) − ∆t f (u, tn+1 ) − ∆t f (u(1) , tn ) = 0 . (9.12)
will require further linearization, but with f (u, t) = u2 a fully 2 2
implicit treatment is possible if the resulting quadratic equation is A Picard iteration scheme must in general employ the linearization
solved with a formula.)
For the ODE u0 = −u3 with f (u, t) = −u3 and coarse time 1 1
F̂ (u) = u − u(1) − ∆t f (u− , tn+1 ) − ∆t f (u(1) , tn ),
resolution ∆t = 0.4, Picard iteration with (u− )2 u requires 8 itera- 2 2
tions with r = 10−3 for the first time step, while (u− )3 leads to 22 while Newton’s method can apply the general formula (9.11) with F (u)
iterations. After about 10 time steps both approaches are down to given in (9.12) and
1 ∂f
F 0 (u) = 1 − ∆t (u, tn+1 ) . un − un−1
2 ∂u = f (un , tn ) .
∆t
This is a system of algebraic equations,
9.1.12 Systems of ODEs
un − ∆t f (un , tn ) − un−1 = 0,
We may write a system of ODEs
or written out
d
u0 (t) = f0 (u0 (t), u1 (t), . . . , uN (t), t),
dt un0 − ∆t f0 (un , tn ) − u0n−1 = 0,
d
u1 (t) = f1 (u0 (t), u1 (t), . . . , uN (t), t), ..
dt .
.. unN − ∆t fN (un , tn ) − un−1 = 0.
. N
d Example. We shall address the 2 × 2 ODE system for oscillations of a
um (t) = fm (u0 (t), u1 (t), . . . , uN (t), t),
dt pendulum subject to gravity and air drag. The system can be written as
as
u0 = f (u, t), u(0) = U0 , (9.13) ω̇ = − sin θ − βω|ω|, (9.14)

if we interpret u as a vector u = (u0 (t), u1 (t), . . . , uN (t)) and f as a θ̇ = ω, (9.15)
vector function with components (f0 (u, t), f1 (u, t), . . . , fN (u, t)).
where β is a dimensionless parameter (this is the scaled, dimensionless
kam 37: Here we don’t use bf
version of the original, physical model). The unknown components of the
Most solution methods for scalar ODEs, including the Forward and
system are the angle θ(t) and the angular velocity ω(t). We introduce
Backward Euler schemes and the Crank-Nicolson method, generalize
u0 = ω and u1 = θ, which leads to
in a straightforward way to systems of ODEs simply by using vector
arithmetics instead of scalar arithmetics, which corresponds to applying
the scalar scheme to each component of the system. For example, here is u00 = f0 (u, t) = − sin u1 − βu0 |u0 |,
a backward difference scheme applied to each component,
u01 = f1 (u, t) = u0 .
kam 38: I think we should write f0 (un0 , . . . , uN , tn )
A Crank-Nicolson scheme reads
un0 − un−1
0
= f0 (un , tn ),
∆t un+1
0 − un0 n+ 1 n+ 1 n+ 1
un1 − un−1 = − sin u1 2 − βu0 2 |u0 2 |
1
= f1 (un , tn ), ∆t
∆t 1 n+1 1
.. ≈ − sin (u1 + un1 ) − β (un+1 + un0 )|un+1 + un0 |,
2 4 0 0
.
(9.16)
unN − un−1
N
= fN (un , tn ), un+1
1 − un1 n+ 1 1
∆t = u0 2 ≈ (un+1 + un0 ) . (9.17)
∆t 2 0
which can be written more compactly in vector form as
9.2 Systems of nonlinear algebraic equations 323 324 9 Nonlinear problems
This is a coupled system of two nonlinear algebraic equations in two 9.2.1 Picard iteration
unknowns un+1
0 and un+1
1 .
We cannot apply Picard iteration to nonlinear equations unless there is
Using the notation u0 and u1 for the unknowns un+10 and un+1
1 in
(1) (1) some special structure. For the commonly arising case A(u)u = b(u) we
this system, writing u0 and u1 for the previous values un0 and un1 ,
can linearize the product A(u)u to A(u− )u and b(u) as b(u− ). That is,
multiplying by ∆t and moving the terms to the left-hand sides, gives
we use the most previously computed approximation in A and b to arrive

at a linear system for u:
(1) 1 (1) 1 (1) (1)
u0 − u0 + ∆t sin (u1 + u1 ) + ∆tβ(u0 + u0 )|u0 + u0 | = 0,
2 4 A(u− )u = b(u− ) .
(9.18)
A relaxed iteration takes the form
(1) 1 (1)
u1 − u1 − ∆t(u0 + u0 ) = 0 .
2 A(u− )u∗ = b(u− ), u = ωu∗ + (1 − ω)u− .
(9.19)
In other words, we solve a system of nonlinear algebraic equations as a
Obviously, we have a need for solving systems of nonlinear algebraic sequence of linear systems.
equations, which is the topic of the next section.
Algorithm for relaxed Picard iteration
9.2 Systems of nonlinear algebraic equations Given A(u)u = b(u) and an initial guess u− , iterate until conver-
gence:
Implicit time discretization methods for a system of ODEs, or a PDE, 1. solve A(u− )u∗ = b(u− ) with respect to u∗
lead to systems of nonlinear algebraic equations, written compactly as 2. u = ωu∗ + (1 − ω)u−
3. u− ← u
F (u) = 0,
where u is a vector of unknowns u = (u0 , . . . , uN ), and F is a vector
function: F = (F0 , . . . , FN ). The system at the end of Section 9.1.12 fits
“Until convergence” means that the iteration is stopped when the
this notation with N = 2, F0 (u) given by the left-hand side of (9.18),
change in the unknown, ||u − u− ||, or the residual ||A(u)u − b||, is
while F1 (u) is the left-hand side of (9.19).
sufficiently small, see Section 9.2.3 for more details.
Sometimes the equation system has a special structure because of the
underlying problem, e.g.,
A(u)u = b(u), 9.2.2 Newton’s method

with A(u) as an (N + 1) × (N + 1) matrix function of u and b as a vector The natural starting point for Newton’s method is the general nonlinear
function: b = (b0 , . . . , bN ). vector equation F (u) = 0. As for a scalar equation, the idea is to approx-
We shall next explain how Picard iteration and Newton’s method can imate F around a known value u− by a linear function F̂ , calculated
be applied to systems like F (u) = 0 and A(u)u = b(u). The exposition from the first two terms of a Taylor expansion of F . In the multi-variate
has a focus on ideas and practical computations. More theoretical con- case these two terms become
siderations, including quite general results on convergence properties of
these methods, can be found in Kelley [9]. F (u− ) + J(u− ) · (u − u− ),
where J is the Jacobian of F , defined by

(A(u− ) + A0 (u− )u− + b0 (u− ))δu = −A(u− )u− + b(u− ) .
∂Fi
Ji,j = . Rearranging the terms demonstrates the difference from the system
∂uj
solved in each Picard iteration:
So, the original nonlinear system is approximated by
A(u− )(u− + δu) − b(u− ) + γ(A0 (u− )u− + b0 (u− ))δu = 0 .
F̂ (u) = F (u− ) + J(u− ) · (u − u− ) = 0, | {z }
Picard system
which is linear in u and can be solved in a two-step procedure: first
solve Jδu = −F (u− ) with respect to the vector δu and then update Here we have inserted a parameter γ such that γ = 0 gives the Picard
u = u− + δu. A relaxation parameter can easily be incorporated: system and γ = 1 gives the Newton system. Such a parameter can be
handy in software to easily switch between the methods.
u = ω(u− + δu) + (1 − ω)u− = u− + ωδu .
Combined algorithm for Picard and Newton iteration
Algorithm for Newton’s method Given A(u), b(u), and an initial guess u− , iterate until convergence:
Given F (u) = 0 and an initial guess u− , iterate until convergence: 1. solve (A + γ(A0 (u− )u− + b0 (u− )))δu = −A(u− )u− + b(u− ) with
respect to δu
1. solve Jδu = −F (u− ) with respect to δu 2. u = u− + ωδu
2. u = u− + ωδu 3. u− ← u
3. u− ← u
γ = 1 gives a Newton method while γ = 0 corresponds to Picard
iteration.
For the special system with structure A(u)u = b(u),

X
Fi = Ai,k (u)uk − bi (u), 9.2.3 Stopping criteria
k
one gets Let ||·|| be the standard Euclidean vector norm. Four termination criteria
are much in use:
X ∂Ai,k ∂bi
Ji,j = uk + Ai,j − . (9.20) • Absolute change in solution: ||u − u− || ≤ u
∂uj ∂uj
k • Relative change in solution: ||u − u− || ≤ u ||u0 ||, where u0 denotes
We realize that the Jacobian needed in Newton’s method consists of the start value of u− in the iteration
A(u− ) as in the Picard iteration plus two additional terms arising from • Absolute residual: ||F (u)|| ≤ r
the differentiation. Using the notation A0 (u) for ∂A/∂u (a quantity with • Relative residual: ||F (u)|| ≤ r ||F (u0 )||
three indices: ∂Ai,k /∂uj ), and b0 (u) for ∂b/∂u (a quantity with two
indices: ∂bi /∂uj ), we can write the linear system to be solved as To prevent divergent iterations to run forever, one terminates the itera-
tions when the current number of iterations k exceeds a maximum value
(A + A0 u + b0 )δu = −Au + b, kmax .
The relative criteria are most used since they are not sensitive to
or the characteristic size of u. Nevertheless, the relative criteria can be
misleading when the initial start value for the iteration is very close S n+1 − S n 1 β
= −β[SI]n+ 2 ≈ − (S n I n + S n+1 I n+1 ), (9.26)
to the solution, since an unnecessary reduction in the error measure is ∆t 2
enforced. In such cases the absolute criteria work better. It is common I n+1 − I n 1 1 β ν
= β[SI]n+ 2 − νI n+ 2 ≈ (S n I n + S n+1 I n+1 ) − (I n + I n+1 ) .
to combine the absolute and relative measures of the size of the residual, ∆t 2 2
as in (9.27)
||F (u)|| ≤ rr ||F (u0 )|| + ra , (9.21) Introducing S for S n+1 , S (1) for S n , I for I n+1 , I (1) for I n , we can rewrite
the system as
where rr is the tolerance in the relative criterion and ra is the tolerance
in the absolute criterion. With a very good initial guess for the iteration
(typically the solution of a differential equation at the previous time 1
FS (S, I) = S − S (1) + ∆tβ(S (1) I (1) + SI) = 0, (9.28)
level), the term ||F (u0 )|| is small and ra is the dominating tolerance. 2
1 1
Otherwise, rr ||F (u0 )|| and the relative criterion dominates. FI (S, I) = I − I (1) − ∆tβ(S (1) I (1) + SI) + ∆tν(I (1) + I) = 0 .
With the change in solution as criterion we can formulate a combined 2 2
(9.29)
absolute and relative measure of the change in the solution:
A Picard iteration. We assume that we have approximations S − and I −
||δu|| ≤ ur ||u0 || + ua , (9.22) to S and I, respectively. A way of linearizing the only nonlinear term SI
The ultimate termination criterion, combining the residual and the is to write I − S in the FS = 0 equation and S − I in the FI = 0 equation,
change in solution with a test on the maximum number of iterations, can which also decouples the equations. Solving the resulting linear equations
be expressed as with respect to the unknowns S and I gives
S (1) − 12 ∆tβS (1) I (1)

||F (u)|| ≤ rr ||F (u0 )|| + ra or ||δu|| ≤ ur ||u0 || + ua or k > kmax . S= ,
(9.23) 1 + 12 ∆tβI −
I (1) + 12 ∆tβS (1) I (1) − 12 ∆tνI (1)
I= .
1 − 12 ∆tβS − + 12 ∆tν
9.2.4 Example: A nonlinear ODE model from epidemiology
Before a new iteration, we must update S − ← S and I − ← I.
The simplest model spreading of a disease, such as a flu, takes the form
of a 2 × 2 ODE system Newton’s method. The nonlinear system (9.28)-(9.29) can be written
as F (u) = 0 with F = (FS , FI ) and u = (S, I). The Jacobian becomes
S 0 = −βSI, (9.24) ! !
∂ ∂
∂S FS ∂I FS 1 + 12 ∆tβI 1
2 ∆tβS
I = βSI − νI,
0
(9.25) J= = .
∂ ∂
∂S I ∂I FI
F − 12 ∆tβI 1 − 12 ∆tβS + 12 ∆tν
where S(t) is the number of people who can get ill (susceptibles) and
The Newton system J(u− )δu = −F (u− ) to be solved in each iteration is
I(t) is the number of people who are ill (infected). The constants β > 0
then
and ν > 0 must be given along with initial conditions S(0) and I(0).
Implicit time discretization. A Crank-Nicolson scheme leads to a 2 × 2
system of nonlinear algebraic equations in the unknowns S n+1 and I n+1 :
9.3 Linearization at the differential equation level 329 330 9 Nonlinear problems
! !
1 + 21 ∆tβI − 1
2 ∆tβS
−
δS 9.3.1 Explicit time integration
=
− 2 ∆tβI 1 − 2 ∆tβS + 21 ∆tν
1 − 1 −
δI
! The nonlinearities in the PDE are trivial to deal with if we choose an
S − − S (1) + 12 ∆tβ(S (1) I (1) + S − I − ) explicit time integration method for (9.30), such as the Forward Euler
I − I − 12 ∆tβ(S (1) I (1) + S − I − ) + 21 ∆tν(I (1) + I − )
− (1) method:
Remark. For this particular system of ODEs, explicit time integration [Dt+ u = ∇ · (α(u)∇u) + f (u)]n ,
methods work very well. Even a Forward Euler scheme is fine, but (as also
experienced more generally) the 4-th order Runge-Kutta method is an or written out,
excellent balance between high accuracy, high efficiency, and simplicity. un+1 − un
= ∇ · (α(un )∇un ) + f (un ),
∆t
which is a linear equation in the unknown un+1 with solution
9.3 Linearization at the differential equation level
un+1 = un + ∆t∇ · (α(un )∇un ) + ∆tf (un ) .
The attention is now turned to nonlinear partial differential equations
The disadvantage with this discretization is the strict stability criterion
(PDEs) and application of the techniques explained above for ODEs. The
∆t ≤ h2 /(6 max α) for the case f = 0 and a standard 2nd-order finite
model problem is a nonlinear diffusion equation for u(x, t):
difference discretization in 3D space with mesh cell sizes h = ∆x = ∆y =
∆z.
∂u
= ∇ · (α(u)∇u) + f (u), x ∈ Ω, t ∈ (0, T ], (9.30)
∂t
∂u
−α(u) = g, x ∈ ∂ΩN , t ∈ (0, T ], (9.31) 9.3.2 Backward Euler scheme and Picard iteration
∂n
u = u0 , x ∈ ∂ΩD , t ∈ (0, T ] . (9.32) A Backward Euler scheme for (9.30) reads
In the present section, our aim is to discretize this problem in time and [Dt− u = ∇ · (α(u)∇u) + f (u)]n .
then present techniques for linearizing the time-discrete PDE problem
“at the PDE level” such that we transform the nonlinear stationary PDE Written out,
problem at each time level into a sequence of linear PDE problems, which
un − un−1
can be solved using any method for linear PDEs. This strategy avoids = ∇ · (α(un )∇un ) + f (un ) . (9.33)
∆t
the solution of systems of nonlinear algebraic equations. In Section 9.4
we shall take the opposite (and more common) approach: discretize the This is a nonlinear PDE for the unknown function un (x). Such a PDE
nonlinear problem in time and space first, and then solve the resulting can be viewed as a time-independent PDE where un−1 (x) is a known
nonlinear algebraic equations at each time level by the methods of function.
Section 9.2. Very often, the two approaches are mathematically identical, We introduce a Picard iteration with k as iteration counter. A typical
so there is no preference from a computational efficiency point of view. linearization of the ∇ · (α(un )∇un ) term in iteration k + 1 is to use
The details of the ideas sketched above will hopefully become clear the previously computed un,k approximation in the diffusion coefficient:
through the forthcoming examples. α(un,k ). The nonlinear source term is treated similarly: f (un,k ). The
unknown function un,k+1 then fulfills the linear PDE
un,k+1 − un−1
= ∇ · (α(un,k )∇un,k+1 ) + f (un,k ) . (9.34)
∆t
The initial guess for the Picard iteration at this time level can be taken equations, but the idea of the method can be applied at the PDE level
as the solution at the previous time level: un,0 = un−1 . too.
We can alternatively apply the implementation-friendly notation where
Linearization via Taylor expansions. Let un,k be an approximation to
u corresponds to the unknown we want to solve for, i.e., un,k+1 above,
the unknown un . We seek a better approximation on the form
and u− is the most recently computed value, un,k above. Moreover, u(1)
denotes the unknown function at the previous time level, un−1 above.
un = un,k + δu . (9.36)
The PDE to be solved in a Picard iteration then looks like
The idea is to insert (9.36) in (9.33), Taylor expand the nonlinearities
u − u(1) and keep only the terms that are linear in δu (which makes (9.36) an
= ∇ · (α(u− )∇u) + f (u− ) . (9.35)
∆t approximation for un ). Then we can solve a linear PDE for the correction
At the beginning of the iteration we start with the value from the previous δu and use (9.36) to find a new approximation
time level: u− = u(1) , and after each iteration, u− is updated to u.
un,k+1 = un,k + δu
Remark on notation to un . Repeating this procedure gives a sequence un,k+1 , k = 0, 1, . . .
The previous derivations of the numerical scheme for time discretiza- that hopefully converges to the goal un .
tions of PDEs have, strictly speaking, a somewhat sloppy notation, Let us carry out all the mathematical details for the nonlinear diffusion
but it is much used and convenient to read. A more precise notation PDE discretized by the Backward Euler method. Inserting (9.36) in (9.33)
must distinguish clearly between the exact solution of the PDE gives
problem, here denoted ue (x, t), and the exact solution of the spatial
problem, arising after time discretization at each time level, where un,k + δu − un−1
(9.33) is an example. The latter is here represented as un (x) and is = ∇·(α(un,k +δu)∇(un,k +δu))+f (un,k +δu) . (9.37)
∆t
an approximation to ue (x, tn ). Then we have another approximation
un,k (x) to un (x) when solving the nonlinear PDE problem for un We can Taylor expand α(un,k + δu) and f (un,k + δu):
by iteration methods, as in (9.34).
In our notation, u is a synonym for un,k+1 and u(1) is a synonym dα n,k
for un−1 , inspired by what are natural variable names in a code. α(un,k + δu) = α(un,k ) + (u )δu + O(δu2 ) ≈ α(un,k ) + α0 (un,k )δu,
du
We will usually state the PDE problem in terms of u and quickly df n,k
redefine the symbol u to mean the numerical approximation, while f (un,k + δu) = f (un,k ) + (u )δu + O(δu2 ) ≈ f (un,k ) + f 0 (un,k )δu .
du
ue is not explicitly introduced unless we need to talk about the
exact solution and the approximate solution at the same time. Inserting the linear approximations of α and f in (9.37) results in
un,k + δu − un−1
= ∇ · (α(un,k )∇un,k ) + f (un,k )+
∆t
9.3.3 Backward Euler scheme and Newton’s method ∇ · (α(un,k )∇δu) + ∇ · (α0 (un,k )δu∇un,k )+
∇ · (α0 (un,k )δu∇δu) + f 0 (un,k )δu . (9.38)
At time level n, we have to solve the stationary PDE (9.33). In the
previous section, we saw how this can be done with Picard iterations. The term α0 (un,k )δu∇δu is of order δu2 and therefore omitted since
Another alternative is to apply the idea of Newton’s method in a clever we expect the correction δu to be small (δu δu2 ). Reorganizing the
way. Normally, Newton’s method is defined for systems of algebraic equation gives a PDE for δu that we can write in short form as
u− − u(1)
F (u− ) = − ∇ · (α(u− )∇u− ) + f (u− ), (9.42)
δF (δu; u n,k
) = −F (u n,k
), ∆t
1
where δF (δu; u− ) = − δu + ∇ · (α(u− )∇δu)+
∆t
∇ · (α0 (u− )δu∇u− ) + f 0 (u− )δu . (9.43)
un,k − un−1
F (u ) =
n,k
− ∇ · (α(un,k )∇un,k ) + f (un,k ), (9.39) The form that orders the PDE as the Picard iteration terms plus the
∆t
1 Newton method’s derivative terms becomes
δF (δu; un,k ) = − δu + ∇ · (α(un,k )∇δu)+
∆t
∇ · (α0 (un,k )δu∇un,k ) + f 0 (un,k )δu . (9.40) u − u(1)
= ∇ · (α(u− )∇u) + f (u− )+
Note that δF is a linear function of δu, and F contains only terms that ∆t
are known, such that the PDE for δu is indeed linear. γ(∇ · (α0 (u− )(u − u− )∇u− ) + f 0 (u− )(u − u− )) . (9.44)
The Picard and full Newton versions correspond to γ = 0 and γ = 1,

Observations respectively.
The notational form δF = −F resembles the Newton system Derivation with alternative notation. Some may prefer to derive the
Jδu = −F for systems of algebraic equations, with δF as Jδu. linearized PDE for δu using the more compact notation. We start with
The unknown vector in a linear system of algebraic equations enters inserting un = u− + δu to get
the system as a linear operator in terms of a matrix-vector product
(Jδu), while at the PDE level we have a linear differential operator u− + δu − un−1
instead (δF ). = ∇ · (α(u− + δu)∇(u− + δu)) + f (u− + δu) .
∆t
Taylor expanding,
Similarity with Picard iteration. We can rewrite the PDE for δu in a
slightly different way too if we define un,k + δu as un,k+1 .
α(u− + δu) ≈ α(u− ) + α0 (u− )δu,
f (u− + δu) ≈ f (u− ) + f 0 (u− )δu,
un,k+1 − un−1
= ∇ · (α(un,k )∇un,k+1 ) + f (un,k )
∆t and inserting these expressions gives a less cluttered PDE for δu:
+ ∇ · (α0 (un,k )δu∇un,k ) + f 0 (un,k )δu . (9.41)
Note that the first line is the same PDE as arises in the Picard iteration, u− + δu − un−1
= ∇ · (α(u− )∇u− ) + f (u− )+
while the remaining terms arise from the differentiations that are an ∆t
inherent ingredient in Newton’s method. ∇ · (α(u− )∇δu) + ∇ · (α0 (u− )δu∇u− )+
Implementation. For coding we want to introduce u for un , u− for un,k ∇ · (α0 (u− )δu∇δu) + f 0 (u− )δu .
and u(1) for un−1 . The formulas for F and δF are then more clearly
written as
9.3.4 Crank-Nicolson discretization
A Crank-Nicolson discretization of (9.30) applies a centered difference at
tn+ 1 :
2
9.4 1D stationary nonlinear differential equations 335 336 9 Nonlinear problems
1
[Dt u = ∇ · (α(u)∇u) + f (u)]n+ 2 .
− (α(u)u0 )0 + au = f (u), x ∈ (0, L),
α(u(0))u0 (0) = C, u(L) = D .
The standard technique is to apply an arithmetic average for quantities (9.50)
defined between two mesh points, e.g., The problem (9.50) arises from the stationary limit of a diffusion
1 1 equation,
un+ 2 ≈ (un + un+1 ) .
2 ∂u ∂

∂u

However, with nonlinear terms we have many choices of formulating an = α(u) − au + f (u), (9.51)
∂t ∂x ∂x
arithmetic mean:
as t → ∞ and ∂u/∂t → 0. Alternatively, the problem (9.50) arises at
each time level from implicit time discretization of (9.51). For example,
1 1 1
a Backward Euler scheme for (9.51) leads to
[f (u)]n+ 2 ≈ f ( (un + un+1 )) = [f (ut )]n+ 2 , (9.45)
2
1 1 t 1 un − un−1 d dun
[f (u)]n+ 2 ≈ (f (un ) + f (un+1 )) = [f (u) ]n+ 2 , (9.46) = α(un ) − aun + f (un ) . (9.52)
2 ∆t dx dx
1 1 1 1
[α(u)∇u]n+ 2 ≈ α( (un + un+1 ))∇( (un + un+1 )) = [α(ut )∇ut ]n+ 2 , Introducing u(x) for un (x), u(1) for un−1 , and defining f (u) in (9.50) to
2 2
(9.47) be f (u) in (9.52) plus un−1 /∆t, gives (9.50) with a = 1/∆t.
1 1 1 t 1
[α(u)∇u]n+ 2 ≈ (α(un ) + α(un+1 ))∇( (un + un+1 )) = [α(u) ∇ut ]n+ 2 ,
2 2
(9.48) 9.4.1 Finite difference discretization
1
Since the technical steps in finite difference discretization in space are
1 t 1
[α(u)∇u]n+ 2 ≈ (α(un )∇un + α(un+1 )∇un+1 ) = [α(u)∇u ]n+ 2 .
2 so much simpler than the steps in the finite element method, we start
(9.49)
with finite difference to illustrate the concept of handling this nonlinear
A big question is whether there are significant differences in accuracy problem and minimize the spatial discretization details.
between taking the products of arithmetic means or taking the arithmetic The nonlinearity in the differential equation (9.50) poses no more
mean of products. Exercise 9.6 investigates this question, and the answer difficulty than a variable coefficient, as in the term (α(x)u0 )0 . We can
is that the approximation is O(∆t2 ) in both cases. therefore use a standard finite difference approach to discretizing the
kam 39: mean value notation. Refer to appedix? Laplace term with a variable coefficient:
[−Dx αDx u + au = f ]i .
9.4 1D stationary nonlinear differential equations Writing this out for a uniform mesh with points xi = i∆x, i =
0, . . . , Nx , leads to
Section 9.3 presented methods for linearizing time-discrete PDEs directly
prior to discretization in space. We can alternatively carry out the 1
discretization in space of the time-discrete nonlinear PDE problem and − 2
αi+ 1 (ui+1 − ui ) − αi− 1 (ui − ui−1 ) + aui = f (ui ) . (9.53)
∆x 2 2
get a system of nonlinear algebraic equations, which can be solved by
Picard iteration or Newton’s method as presented in Section 9.2. This This equation is valid at all the mesh points i = 0, 1, . . . , Nx − 1. At
latter approach will now be described in detail. i = Nx we have the Dirichlet condition ui = D. The only difference from
We shall work with the 1D problem
the case with (α(x)u0 )0 and f (x) is that now α and f are functions of u 9.4.2 Solution of algebraic equations
and not only of x: (α(u(x))u0 )0 and f (u(x)).
The structure of the equation system. The nonlinear algebraic equa-
The quantity αi+ 1 , evaluated between two mesh points, needs a com-
2 tions (9.56) are of the form A(u)u = b(u) with
ment. Since α depends on u and u is only known at the mesh points, we
need to express αi+ 1 in terms of ui and ui+1 . For this purpose we use
1
2
an arithmetic mean, although a harmonic mean is also common in this Ai,i = (α(ui−1 ) + 2α(ui ) + α(ui+1 )) + a,
context if α features large jumps. There are two choices of arithmetic 2∆x2
means: 1
Ai,i−1 = − (α(ui−1 ) + α(ui )),
2∆x2
1
1 Ai,i+1 = − (α(ui ) + α(ui+1 )),
2∆x2
1
αi+ 1 ≈ α( (ui + ui+1 ) = [α(ux )]i+ 2 , (9.54)
2 2 bi = f (ui ) .
1 x 1
αi+ 1 ≈ (α(ui ) + α(ui+1 )) = [α(u) ]i+ 2 (9.55)
2 2 The matrix A(u) is tridiagonal: Ai,j = 0 for j > i + 1 and j < i − 1.
Equation (9.53) with the latter approximation then looks like The above expressions are valid for internal mesh points 1 ≤ i ≤ Nx −1.
For i = 0 we need to express ui−1 = u−1 in terms of u1 using (9.57):
1 2∆x
− ((α(ui ) + α(ui+1 ))(ui+1 − ui ) − (α(ui−1 ) + α(ui ))(ui − ui−1 )) u−1 = u1 − C. (9.58)
2∆x2 α(u0 )
+ aui = f (ui ), (9.56) This value must be inserted in A0,0 . The expression for Ai,i+1 applies for
i = 0, and Ai,i−1 does not enter the system when i = 0.
or written more compactly,
Regarding the last equation, its form depends on whether we include
the Dirichlet condition u(L) = D, meaning uNx = D, in the nonlinear
[−Dx αx Dx u + au = f ]i .
algebraic equation system or not. Suppose we choose (u0 , u1 , . . . , uNx −1 )
At mesh point i = 0 we have the boundary condition α(u)u0 = C, as unknowns, later referred to as systems without Dirichlet conditions.
which is discretized by The last equation corresponds to i = Nx − 1. It involves the boundary
value uNx , which is substituted by D. If the unknown vector includes the
[α(u)D2x u = C]0 , boundary value, (u0 , u1 , . . . , uNx ), later referred to as system including
meaning Dirichlet conditions, the equation for i = Nx −1 just involves the unknown
uNx , and the final equation becomes uNx = D, corresponding to Ai,i = 1
u1 − u−1 and bi = D for i = Nx .
α(u0 ) =C. (9.57)
2∆x
Picard iteration. The obvious Picard iteration scheme is to use previ-
The fictitious value u−1 can be eliminated with the aid of (9.56) for i = 0. ously computed values of ui in A(u) and b(u), as described more in detail
Formally, (9.56) should be solved with respect to ui−1 and that value in Section 9.2. With the notation u− for the most recently computed
(for i = 0) should be inserted in (9.57), but it is algebraically much easier value of u, we have the system F (u) ≈ F̂ (u) = A(u− )u − b(u− ), with
to do it the other way around. Alternatively, one can use a ghost cell F = (F0 , F1 , . . . , Fm ), u = (u0 , u1 , . . . , um ). The index m is Nx if the
[−∆x, 0] and update the u−1 value in the ghost cell according to (9.57) system includes the Dirichlet condition as a separate equation and Nx − 1
after every Picard or Newton iteration. Such an approach means that we otherwise. The matrix A(u− ) is tridiagonal, so the solution procedure
use a known u−1 value in (9.56) from the previous iteration. is to fill a tridiagonal matrix data structure and the right-hand side
vector with the right numbers and call a Gaussian elimination routine 1
( − (α(u− 0 ) + α(u1 ))u0 +
−
for tridiagonal linear systems. 2∆x2
kam 40: yapi checks the calculations of the following pages (α(u−0 ) + 2α(u1 ) + α(u2 ))u1 −
− −
Mesh with two cells. It helps on the understanding of the details to (α(u1 ) + α(u2 )))u2 + au1 = f (u−
− −
1 ).
write out all the mathematics in a specific case with a small mesh, say We must now move the u2 term to the right-hand side and replace all
just two cells (Nx = 2). We use u− i for the i-th component in u .
−
occurrences of u2 by D:
The starting point is the basic expressions for the nonlinear equations
at mesh point i = 0 and i = 1 are
1
( − (α(u− 0 ) + α(u1 ))u0 +
−
2∆x2
A0,−1 u−1 + A0,0 u0 + A0,1 u1 = b0 , (9.59) (α(u−0 ) + 2α(u1 ) + α(D))u1 + au1
−
A1,0 u0 + A1,1 u1 + A1,2 u2 = b1 . (9.60) 1

= f (u−1)+ (α(u−
1 ) + α(D))D .
2∆x2
Equation (9.59) written out reads
The two equations can be written as a 2 × 2 system:
! ! !
1
( − (α(u−1 ) + α(u0 ))u−1 + B0,0 B0,1 u0
=
d0
,
2∆x2 B1,0 B1,1 u1 d1
(α(u−1 ) + 2α(u0 ) + α(u1 ))u0 −
where
(α(u0 ) + α(u1 )))u1 + au0 = f (u0 ) .
We must then replace u−1 by (9.58). With Picard iteration we get 1

B0,0 = (α(u−
−1 ) + 2α(u0 ) + α(u1 )) + a
− −
(9.61)
2∆x2
1 1
( − (α(u− B0,1 = − (α(u−
−1 ) + 2α(u0 ) + α(u1 )),
− −
(9.62)
−1 ) + 2α(u0 ) + α(u1 ))u1 +
− −
2∆x2 2∆x2
1
(α(u−−1 ) + 2α(u0 ) + α(u1 ))u0 + au0
− −
B1,0 = − (α(u−
0 ) + α(u1 )),
−
(9.63)
2∆x2
1 1
= f (u−
0)− (α(u−−1 ) + α(u0 ))C,
−
B1,1 = (α(u−
0 ) + 2α(u1 ) + α(D)) + a, (9.64)
−
0 )∆x
α(u− 2∆x2
1
where d0 = f (u−0)− (α(u− −1 ) + α(u0 ))C,
−
(9.65)
α(u−0 )∆x
2∆x 1
−1 = u1 −
u− −
C. d1 = f (u−1)+ (α(u−1 ) + α(D))D . (9.66)
0)
α(u− 2∆x2
Equation (9.60) contains the unknown u2 for which we have a Dirichlet The system with the Dirichlet condition becomes
condition. In case we omit the condition as a separate equation, (9.60)     
with Picard iteration becomes B0,0 B0,1 0 u0 d0
    
 B1,0 B1,1 B1,2   u1  =  d1  ,
0 0 1 u2 D
with
1 ∂Fi ∂Ai,i−1 ∂Ai,i ∂Ai,i+1 ∂bi

B1,1 = (α(u−0 ) + 2α(u1 ) + α(u2 )) + a,
−
(9.67) Ji,i = = ui−1 + ui + Ai,i + ui+1 −
2∆x2 ∂ui ∂ui ∂ui ∂ui ∂ui
1 1
B1,2 = − (α(u−
1 ) + α(u2 ))), (9.68) = (−α0 (ui )ui−1 + 2α0 (ui )ui + α(ui−1 ) + 2α(ui ) + α(ui+1 ))+
2∆x2 2∆x2
d1 = f (u1 ) .
−
(9.69) 1
a− α0 (ui )ui+1 − b0 (ui ),
2∆x2
Other entries are as in the 2 × 2 system. ∂Fi ∂Ai,i−1 ∂Ai,i ∂bi
Ji,i−1 = = ui−1 + Ai−1,i + ui −
Newton’s method. The Jacobian must be derived in order to use ∂ui−1 ∂ui−1 ∂ui−1 ∂ui−1
Newton’s method. Here it means that we need to differentiate F (u) = 1
= (−α0 (ui−1 )ui−1 − (α(ui−1 ) + α(ui )) + α0 (ui−1 )ui ),
A(u)u − b(u) with respect to the unknown parameters u0 , u1 , . . . , um 2∆x2
(m = Nx or m = Nx − 1, depending on whether the Dirichlet condition Ji,i+1 =
∂Ai,i+1
ui+1 + Ai+1,i +
∂Ai,i
ui −
∂bi
is included in the nonlinear system F (u) = 0 or not). Nonlinear equation ∂ui−1 ∂ui+1 ∂ui+1
number i of (9.56) has the structure 1
= (−α0 (ui+1 )ui+1 − (α(ui ) + α(ui+1 )) + α0 (ui+1 )ui ) .
2∆x2
Fi = Ai,i−1 (ui−1 , ui )ui−1 +Ai,i (ui−1 , ui , ui+1 )ui +Ai,i+1 (ui , ui+1 )ui+1 −bi (ui ) . kam 41: check completely!!
The explicit expression for nonlinear equation number i, Fi (u0 , u1 , . . .),
Computing the Jacobian requires careful differentiation. For example, arises from moving the f (ui ) term in (9.56) to the left-hand side:
∂ ∂Ai,i ∂ui 1
(Ai,i (ui−1 , ui , ui+1 )ui ) = ui + Ai,i Fi = − ((α(ui ) + α(ui+1 ))(ui+1 − ui ) − (α(ui−1 ) + α(ui ))(ui − ui−1 ))
∂ui ∂ui ∂ui 2∆x2
∂ 1 + aui − f (ui ) = 0 . (9.70)
= ( (α(ui−1 ) + 2α(ui ) + α(ui+1 )) + a)ui +
∂ui 2∆x2
1 At the boundary point i = 0, u−1 must be replaced using the formula
(α(ui−1 ) + 2α(ui ) + α(ui+1 )) + a (9.58). When the Dirichlet condition at i = Nx is not a part of the
2∆x2
1 equation system, the last equation Fm = 0 for m = Nx − 1 involves the
= (2α0 (ui )ui + α(ui−1 ) + 2α(ui ) + α(ui+1 )) + a . quantity uNx −1 which must be replaced by D. If uNx is treated as an
2∆x2
unknown in the system, the last equation Fm = 0 has m = Nx and reads
The complete Jacobian becomes
FNx (u0 , . . . , uNx ) = uNx − D = 0 .
Similar replacement of u−1 and uNx must be done in the Jacobian for
the first and last row. When uNx is included as an unknown, the last
row in the Jacobian must help implement the condition δuNx = 0, since
we assume that u contains the right Dirichlet value at the beginning of
the iteration (uNx = D), and then the Newton update should be zero for
i = 0, i.e., δuNx = 0. This also forces the right-hand side to be bi = 0,
i = Nx .
We have seen, and can see from the present example, that the linear
system in Newton’s method contains all the terms present in the system
that arises in the Picard iteration method. The extra terms in Newton’s
   
method can be multiplied by a factor such that it is easy to program X Z L X
 α(D + ck ψk )ψj0 ψi0 + aψi ψj  dx cj
one linear system and set this factor to 0 or 1 to generate the Picard or
0
Newton system. j∈Is
Z
k∈Is
L X
= f (D + ck ψk )ψi dx − Cψi (0), i ∈ Is . (9.72)
0 k∈Is
9.4.3 Galerkin-type discretization
Fundamental integration problem. Methods that use the Galerkin
For a Galerkin-type discretization, which may be developed into a finite or weighted residual method face a fundamental R
difficulty in nonlinear
P
element method, we first need to derive the variational problem. Let V problems: how can we integrate a terms like 0L α( k ck ψk )ψi0 ψj0 dx and
RL P
be an appropriate function space with basis functions {ψi }i∈Is . Because 0 f ( k ck ψk )ψi dx when we do not know the ck coefficients in the
of the Dirichlet condition at x = L we require ψi (L) = 0, i ∈ Is . The argument of the α function? We can resort to numerical integration,
P P
approximate solution is written as u = D + j∈Is cj ψj , where the term D provided an approximate k ck ψk can be used for the argument u in f
can be viewed as a boundary function needed to implement the Dirichlet and α. This is the approach used in computer programs.
condition u(L) = D. We remark that the boundary function is D rather However, if we want to look more mathematically into the structure
than Dx/L because of the Neumann condition at x = 0. of the algebraic equations generated by the finite element method in
Using Galerkin’s method, we multiply the differential equation by any nonlinear problems, and compare such equations with those arising in
v ∈ V and integrate terms with second-order derivatives by parts: the finite differenceR method, we need techniques that enable integration
P
of expressions like 0L f ( k ck ψk )ψi dx by hand. Two such techniques will
Z L Z L Z L be shown: the group finite element and numerical integration based on
α(u)u0 v 0 dx + auv dx = f (u)v dx + [α(u)u0 v]L
0, ∀v ∈ V . the nodes only. Both techniques are approximate, but they allow us to
0 0 0
see the difference equations in the finite element method. The details
The Neumann condition at the boundary x = 0 is inserted in the boundary are worked out in Appendix 9.7. Some readers will prefer to dive into
term: these symbolic calculations to gain more understanding of nonlinear finite
element equations, while others will prefer to continue with computational
algorithms (in the next two sections) rather than analysis.
[α(u)u0 v]L
0 = α(u(L))u (L)v(L)−α(u(0))u (0)v(0) = 0−Cv(0) = −Cv(0) .
0 0
(Recall that since ψi (L) = 0, any linear combination v of the basis

functions also vanishes at x = L: v(L) = 0.) The variational problem is 9.4.4 Picard iteration defined from the variational form
then: find u ∈ V such that
Consider the problem (9.50) with the corresponding variational form
(9.71). Our aim is to define a Picard iteration based on this variational
Z Z Z
L L L
form without any attempt to compute integrals symbolically as in the
α(u)u0 v 0 dx+ auv dx = f (u)v dx−Cv(0), ∀v ∈ V . (9.71)
0 0 0 previous three sections. The idea in Picard iteration is to use a previously
computed u value in the nonlinear functions α(u) and f (u). Let u− be the
To derive the algebraic equations, we note that ∀v ∈ V is equivalent
P available approximation to u from the previous iteration. The linearized
with v = ψi for i ∈ Is . Setting u = D + j cj ψj and sorting terms results
variational form for Picard iteration is then
in the linear system
Z L Z L
(α(u− )u0 v 0 + auv) dx = f (u− )v dx − Cv(0), ∀v ∈ V . (9.73)
0 0
This is a linear problem a(u, v) = L(v) with bilinear and linear forms The derivation of the Jacobian of (9.74) goes as
Z L Z L Z L
∂Fi ∂
a(u, v) = (α(u− )u0 v 0 + auv) dx, L(v) = f (u− )v dx − Cv(0) . Ji,j = = (α(u)u0 ψi0 + auψi − f (u)ψi ) dx
0 0 ∂cj 0 ∂cj
Z L
Make sure to distinguish the coefficient a in auv from the differential ∂u 0 ∂u0 0 ∂u ∂u
= ((α0 (u) u + α(u) )ψ + a ψi − f 0 (u) ψi ) dx
equation from the a in the abstract bilinear form notation a(·, ·). 0 ∂cj ∂cj i ∂cj ∂cj
Z
The linear system associated with (9.73) is computed the standard L
= ((α0 (u)ψj u0 + α(u)ψj0 )ψi0 + aψj ψi − f 0 (u)ψj ψi ) dx
way. Technically, we are back to solving −(α(x)u0 )0 + au = f (x). The 0
P Z
unknown u is sought on the form u = B(x) + j∈Is cj ψj , with B(x) = D L
and ψi = ϕν(i) , ν(i) = i + 1, and Is = {0, 1, . . . , N = Nn − 2}. = (α0 (u)u0 ψi0 ψj + α(u)ψi0 ψj0 + (a − f (u))ψi ψj ) dx (9.75)
0
9.4.5 Newton’s method defined from the variational form One must be careful about the prime symbol as differenti-
ation!
Application of Newton’s method to the nonlinear variational form (9.71)
arising from the problem (9.50) requires identification of the nonlinear In α0 the derivative is with respect to the independent variable in
algebraic equations Fi = 0. Although we originally denoted the unknowns the α function, and that is u, so
in nonlinear algebraic equations by u0 , . . . , uN , it is in the present context
dα
most natural to have the unknowns as c0 , . . . , cN and write , α0 =
du
Fi (c0 , . . . , cN ) = 0, i ∈ Is , while in u0 the differentiation is with respect to x, so
and define the Jacobian as Ji,j = ∂Fi /∂cj for i, j ∈ Is . du

. u0 =
The specific form of the equations Fi = 0 follows from the variational dx
form Similarly, f is a function of u, so f 0 means df /du.
Z Z
L L When calculating the right-hand side vector Fi and the coefficient
(α(u)u0 v 0 + auv) dx = f (u)v dx − Cv(0), ∀v ∈ V,
0 0 matrix Ji,j in the linear system to be solved in each Newton iteration,
P one must use a previously computed u, denoted by u− , for the symbol u
by choosing v = ψi , i ∈ Is , and setting u = j∈Is cj ψj , maybe with a
in (9.74) and (9.75). With this notation we have
boundary function to incorporate Dirichlet conditions, we get
Z L
Z
L Fi = α(u− )u−0 ψi0 + (a − f (u− ))ψi dx − Cψi (0), i ∈ Is ,
Fi = (α(u)u 0
ψi0 + auψi − f (u)ψi ) dx + Cψi (0) = 0, i ∈ Is . (9.74) 0
0 (9.76)
Z
In the differentiations leading to the Jacobian we will frequently use the L
results Ji,j = (α0 (u− )u−0 ψi0 ψj + α(u− )ψi0 ψj0 + (a − f (u− ))ψi ψj ) dx, i, j ∈ Is .
0
(9.77)
∂u ∂ X ∂u0 ∂ X
= ck ψk = ψj , = ck ψk0 = ψj0 .
∂cj ∂cj k ∂cj ∂cj k
9.5 Multi-dimensional PDE problems 347 348 9 Nonlinear problems
These expressions can be used for any basis {ψi }i∈Is . Choosing finite 9.5.1 Finite element discretization
element functions for ψi , one will normally want to compute the integral
As an example, a Backward Euler discretization of the PDE
contribution cell by cell, working in a reference cell. To this end, we restrict
the integration to one cell and transform the cell to [−1, 1]. The most
P ut = ∇ · (α(u)∇u) + f (u), (9.80)
recently computed approximation u− to u becomes ũ− = t ũ−1 t ϕ̃t (X)
over the reference element, where ũ−1 t is the value of u− at global node gives the nonlinear time-discrete PDEs
(or degree of freedom) q(e, t) corresponding to the local node t (or degree
of freedom). The formulas (9.76) and (9.77) then change to un − ∆t∇ · (α(un )∇un ) + f (un ) = un−1 .
We may alternatively write these equations with u for un and u(1) for
Z 1 un−1 :
F̃r(e) = α(ũ− )ũ−0 ϕ̃0r + (a − f (ũ− ))ϕ̃r det J dX − C ϕ̃r (0), (9.78)
−1
Z 1 u − ∆t∇ · (α(u)∇u) − ∆tf (u) = u(1) .
J˜r,s
(e)
= (α0 (ũ− )ũ−0 ϕ̃0r ϕ̃s + α(ũ− )ϕ̃0r ϕ̃0s + (a − f (ũ− ))ϕ̃r ϕ̃s ) det J dX,
−1
(9.79)
Understand the meaning of the symbol u in various formu-
with r, s ∈ Id runs over the local degrees of freedom. las!
Many finite element programs require the user to provide Fi and Ji,j . Note that the mathematical meaning of the symbol u changes in
Some programs, like FEniCS, are capable of automatically deriving Ji,j the above equations: u(x, t) is the exact solution of (9.80), un (x)
if Fi is specified. is an approximation to the exact solution at t = tn , while u(x)
Dirichlet conditions. Incorporation of the Dirichlet values by assembling in the latter equation is a synonym for un . Below, this u(x) will
P
contributions from all degrees of freedom and then modifying the linear be approximated by a new u = k ck ψk (x) in space, and then
system can obviously be applied to Picard iteration as that method the actual u symbol used in the Picard and Newton iterations
P
involves a standard linear system. In the Newton system, however, the is a further approximation of k ck ψk arising from the nonlinear
unknown is a correction δu to the solution. Dirichlet conditions are iteration algorithm.
implemented by inserting them in the initial guess u− for the Newton Much literature reserves u for the exact solution, uses uh (x, t)
iteration and implementing δui = 0 for all known degrees of freedom. for the finite element solution that varies continuously in time, in-
The manipulation of the linear system follows exactly the algorithm in troduces perhaps unh as the approximation of uh at time tn , arising
the linear problems, the only difference being that the known values are from some time discretization, and then finally applies un,k
h for the
zero. approximation to unh in the k-th iteration of a Picard or Newton
method. The converged solution at the previous time step can be
called un−1
h , but then this quantity is an approximate solution of
the nonlinear equations (at the previous time level), while the coun-
9.5 Multi-dimensional PDE problems
terpart unh is formally the exact solution of the nonlinear equations
at the current time level. The various symbols in the mathematics
The fundamental ideas in the derivation of Fi and Ji,j in the 1D model can in this way be clearly distinguished. However, we favor to use
problem are easily generalized to multi-dimensional problems. Neverthe- u for the quantity that is most naturally called u in the code, and
less, the expressions involved are slightly different, with derivatives in x that is the most recent approximation to the solution, e.g., named
replaced by ∇, so we present some examples below in detail.
h above. This is also the key quantity of interest in mathematical
un,k
Z
derivations of algorithms as well. Choosing u this way makes the Fi ≈ F̂i = (u− ψi + ∆t α(u− )∇u− · ∇ψi − ∆tf (u− )ψi − u(1) ψi ) dx .
most important mathematical cleaner than using more cluttered Ω
notation as un,k (9.84)
h . We therefore introduce other symbols for other
versions of the unknown function. It is most appropriate for us to The Jacobian is obtained by differentiating (9.82) and using
say that ue (x, t) is the exact solution, un in the equation above is
the approximation to ue (x, tn ) after time discretization, and u is ∂u X ∂
= ck ψk = ψj , (9.85)
the spatial approximation to un from the most recent iteration in a ∂cj ∂cj
k
nonlinear iteration method. ∂∇u X ∂
= ck ∇ψk = ∇ψj . (9.86)
∂cj ∂cj
Let us assume homogeneous Neumann conditions on the entire bound- k
ary for simplicity in the boundary term. The variational form becomes: The result becomes
find u ∈ V such that
Z
∂Fi
Z Ji,j = = (ψj ψi + ∆t α0 (u)ψj ∇u · ∇ψi + ∆t α(u)∇ψj · ∇ψi −
(uv + ∆t α(u)∇u · ∇v − ∆tf (u)v − u(1) v) dx = 0, ∀v ∈ V . (9.81) ∂cj Ω
Ω
∆tf 0 (u)ψj ψi ) dx . (9.87)
The nonlinear algebraic equations follow from setting v = ψi and using
P
the representation u = k ck ψk , which we just write as The evaluation of Ji,j as the coefficient matrix in the linear system in
Newton’s method applies the known approximation u− for u:
Z
Fi = (uψi + ∆t α(u)∇u · ∇ψi − ∆tf (u)ψi − u(1) ψi ) dx . (9.82) Z
Ω Ji,j = (ψj ψi + ∆t α0 (u− )ψj ∇u− · ∇ψi + ∆t α(u− )∇ψj · ∇ψi −
Ω
Picard iteration needs a linearization where we use the most recent ∆tf 0 (u− )ψj ψi ) dx . (9.88)
approximation u− to u in α and f :
Hopefully, this example also shows how convenient the notation with u
Z and u− is: the unknown to be computed is always u and linearization
Fi ≈ F̂i = (uψi +∆t α(u− )∇u·∇ψi −∆tf (u− )ψi −u(1) ψi ) dx . (9.83) by inserting known (previously computed) values is a matter of adding
Ω
an underscore. One can take great advantage of this quick notation in
The equations F̂i = 0 are now linear and we can easily derive a linear software [15].
P
system j∈Is Ai,j cj = bi , i ∈ Is , for the unknown coefficients {ci }i∈Is by
P Non-homogeneous Neumann conditions. A natural physical flux con-
inserting u = j cj ψj . We get
dition for the PDE (9.80) takes the form of a non-homogeneous Neumann
Z Z
condition
Ai,j = (ψj ψi +∆t α(u− )∇ψj ·∇ψi ) dx, bi = (∆tf (u− )ψi +u(1) ψi ) dx . ∂u
Ω Ω = g, x ∈ ∂ΩN ,
− α(u) (9.89)
∂n
In Newton’s method we need to evaluate Fi with the known value u− where g is a prescribed function and ∂ΩN is a part of the boundary
for u: R
of the domain Ω. From integrating Ω ∇ · (α∇u) dx by parts, we get a
boundary term
Z
α(u)
∂u
v ds . (9.90) We do not dive into the details of handling boundary conditions now.
∂ΩN ∂u Dirichlet and Neumann conditions are handled as in a corresponding
Inserting the condition (9.89) into this term results in an integral over linear, variable-coefficient diffusion problems.
prescribed values: Writing the scheme out, putting the unknown values on the left-
Z hand side and known values on the right-hand side, and introducing
− gv ds . ∆x = ∆y = h to save some writing, one gets
∂ΩN
The nonlinearity in the α(u) coefficient condition (9.89) therefore does ∆t 1

not contribute with a nonlinearity in the variational form. uni,j − ( (α(uni,j ) + α(uni+1,j ))(uni+1,j − uni,j )
h2 2
Robin conditions. Heat conduction problems often apply a kind of 1
− (α(uni−1,j ) + α(uni,j ))(uni,j − uni−1,j )
Newton’s cooling law, also known as a Robin condition, at the boundary: 2
1
+ (α(uni,j ) + α(uni,j+1 ))(uni,j+1 − uni,j )
∂u 2
− α(u) = h(u)(u − Ts (t)), x ∈ ∂ΩR , (9.91) 1
∂u − (α(uni,j−1 ) + α(uni,j ))(uni,j − uni,j−1 )) − ∆tf (uni,j ) = ui,j
n−1
where h(u) is a heat transfer coefficient between the body (Ω) and its 2
surroundings, Ts is the temperature of the surroundings, and ∂ΩR is a This defines a nonlinear algebraic system on the form A(u)u = b(u).
part of the boundary where this Robin condition applies. The boundary
Picard iteration. The most recently computed values u− of un can
integral (9.90) now becomes
be used in α and f for a Picard iteration, or equivalently, we solve
Z
A(u− )u = b(u− ). The result is a linear system of the same type as arising
h(u)(u − Ts (T ))v ds .
∂ΩR from ut = ∇ · (α(x)∇u) + f (x, t).
In many physical applications, h(u) can be taken as constant, and then The Picard iteration scheme can also be expressed in operator notation:
the boundary term is linear in u, otherwise it is nonlinear and contributes x y
to the Jacobian in a Newton method. Linearization in a Picard method [Dt− u = Dx α(u− ) Dx u + Dy α(u− ) Dy u + f (u− )]ni,j .
will typically use a known value in h, but keep the u in u−Ts as unknown: Newton’s method. As always, Newton’s method is technically more
h(u− )(u − Ts (t)). Exercise 9.15 asks you to carry out the details. involved than Picard iteration. We first define the nonlinear algebraic
equations to be solved, drop the superscript n (use u for un ), and intro-
duce u(1) for un−1 :
9.5.2 Finite difference discretization
∆t
A typical diffusion equation Fi,j = ui,j − (
h2
1
ut = ∇ · (α(u)∇u) + f (u), (α(ui,j ) + α(ui+1,j ))(ui+1,j − ui,j )−
2
can be discretized by (e.g.) a Backward Euler scheme, which in 2D can 1
(α(ui−1,j ) + α(ui,j ))(ui,j − ui−1,j )+
be written 2
1
x y (α(ui,j ) + α(ui,j+1 ))(ui,j+1 − ui,j )−
[Dt− u = Dx α(u) Dx u + Dy α(u) Dy u + f (u)]ni,j . 2
1 (1)
(α(ui,j−1 ) + α(ui,j ))(ui,j − ui,j−1 )) − ∆t f (ui,j ) − ui,j = 0 .
2
It is convenient to work with two indices i and j in 2D finite difference Ji,j,i−1,j =

∂Fi,j
discretizations, but it complicates the derivation of the Jacobian, which ∂ui−1,j
then gets four indices. (Make sure you really understand the 1D version 1 ∆t 0
= (α (ui−1,j )(ui,j − ui−1,j ) + (α(ui−1,j ) + α(ui,j ))(−1)),
of this problem as treated in Section 9.4.1.) The left-hand expression of 2 h2
an equation Fi,j = 0 is to be differentiated with respect to each of the ∂Fi,j
Ji,j,i+1,j =
unknowns ur,s (recall that this is short notation for unr,s ), r ∈ Ix , s ∈ Iy : ∂ui+1,j
1 ∆t
∂Fi,j = (−α0 (ui+1,j )(ui+1,j − ui,j ) − (α(ui−1,j ) + α(ui,j ))),
Ji,j,r,s = . 2 h2
∂ur,s
∂Fi,j
The Newton system to be solved in each iteration can be written as Ji,j,i,j−1 =
∂ui,j−1
X X 1 ∆t 0
Ji,j,r,s δur,s = −Fi,j , i ∈ Ix , j ∈ Iy . = (α (ui,j−1 )(ui,j − ui,j−1 ) + (α(ui,j−1 ) + α(ui,j ))(−1)),
2 h2
r∈Ix s∈Iy
∂Fi,j
Given i and j, only a few r and s indices give nonzero contribution to Ji,j,i,j+1 =
∂ui,j+1
the Jacobian since Fi,j contains ui±1,j , ui,j±1 , and ui,j . This means that 1 ∆t
Ji,j,r,s has nonzero contributions only if r = i ± 1, s = j ± 1, as well as = (−α0 (ui,j+1 )(ui,j+1 − ui,j ) − (α(ui,j−1 ) + α(ui,j ))) .
2 h2
r = i and s = j. The corresponding terms in Ji,j,r,s are Ji,j,i−1,j , Ji,j,i+1,j ,
Ji,j,i,j−1 , Ji,j,i,j+1 , and Ji,j,i,j . Therefore, the left-hand side of the Newton The Ji,j,i,j entry has a few more terms and is left as an exercise. Inserting
P P
system, r s Ji,j,r,s δur,s collapses to the most recent approximation u− for u in the J and F formulas and
then forming Jδu = −F gives the linear system to be solved in each
Newton iteration. Boundary conditions will affect the formulas when any
Ji,j,r,s δur,s = Ji,j,i,j δui,j + Ji,j,i−1,j δui−1,j + Ji,j,i+1,j δui+1,j + Ji,j,i,j−1 δui,j−1 of the indices coincide with a boundary value of an index.
+ Ji,j,i,j+1 δui,j+1
The specific derivatives become 9.5.3 Continuation methods

Picard iteration or Newton’s method may diverge when solving PDEs
with severe nonlinearities. Relaxation with ω < 1 may help, but in
highly nonlinear problems it can be necessary to introduce a continuation
parameter Λ in the problem: Λ = 0 gives a version of the problem that
is easy to solve, while Λ = 1 is the target problem. The idea is then to
increase Λ in steps, Λ0 = 0, Λ1 < · · · < Λn = 1, and use the solution from
the problem with Λi−1 as initial guess for the iterations in the problem
corresponding to Λi .
The continuation method is easiest to understand through an example.
Suppose we intend to solve
−∇ · (||∇u||q ∇u) = f,
which is an equation modeling the flow of a non-Newtonian fluid through a
channel or pipe. For q = 0 we have the Poisson equation (corresponding to
9.6 Exercises 355 356 9 Nonlinear problems
a Newtonian fluid) and the problem is linear. A typical value for pseudo- and the logistic model arises from the simplest possible choice of a(u):
plastic fluids may be qn = −0.8. We can introduce the continuation r(u) = %(1 − u/M ), where M is the maximum value of u that the
parameter Λ ∈ [0, 1] such that q = qn Λ. Let {Λ` }n`=0 be the sequence of environment can sustain, and % is the growth under unlimited access to
Λ values in [0, 1], with corresponding q values {q` }n`=0 . We can then solve resources (as in the beginning when u is small). The idea is that a(u) ∼ %
a sequence of problems when u is small and that a(t) → 0 as u → M .
An a(u) that generalizes the linear choice is the polynomial form
−∇ · ||∇u` ||q` ∇u` = f, ` = 0, . . . , n,
a(u) = %(1 − u/M )p , (9.93)
where the initial guess for iterating on u` is the previously computed
solution u`−1 . If a particular Λ` leads to convergence problems, one may where p > 0 is some real number.
try a smaller increase in Λ: Λ∗ = 12 (Λ`−1 + Λ` ), and repeat halving the a) Formulate a Forward Euler, Backward Euler, and a Crank-Nicolson
step in Λ until convergence is reestablished. scheme for (9.92).
Hint. Use a geometric mean approximation in the Crank-Nicolson scheme:
[a(u)u]n+1/2 ≈ a(un )un+1 .
9.6 Exercises
b) Formulate Picard and Newton iteration for the Backward Euler
scheme in a).
Problem 9.1: Determine if equations are nonlinear or not
c) Implement the numerical solution methods from a) and b). Use
Classify each term in the following equations as linear or nonlinear. logistic.py to compare the case p = 1 and the choice (9.93).
Assume that u, u, and p are unknown functions and that all other
symbols are known quantities. d) Implement unit tests that check the asymptotic limit of the solutions:
u → M as t → ∞.
1. mu00 + β|u0 |u0 + cu = F (t)
2. ut = αuxx Hint. You need to experiment to find what “infinite time” is (increases
3. utt = c2 ∇2 u substantially with p) and what the appropriate tolerance is for testing
4. ut = ∇ · (α(u)∇u) + f (x, y) the asymptotic limit.
5. ut + f (u)x = 0 e) Perform experiments with Newton and Picard iteration for the model
6. ut + u · ∇u = −∇p + r∇2 u, ∇ · u = 0 (u is a vector field) (9.93). See how sensitive the number of iterations is to ∆t and p.
7. u0 = f (u, t) Filename: logistic_p.
8. ∇2 u = λeu
Filename: nonlinear_vs_linear.
Problem 9.3: Experience the behavior of Newton’s method
kam 44: where do we find this demo
Exercise 9.2: Derive and investigate a generalized logistic The program Newton_demo.py illustrates graphically each step in
model Newton’s method and is run like
kam 45: where is this program?
The logistic model for population growth is derived by assuming a
nonlinear growth rate, Terminal
Terminal> python Newton_demo.py f dfdx x0 xmin xmax
u0 = a(u)u, u(0) = I, (9.92)

Use this program to investigate potential problems with Newton’s method Exercise 9.6: Find the truncation error of arithmetic mean of
products
2
when solving e−0.5x cos(πx) = 0. Try a starting point x0 = 0.8 and
x0 = 0.85 and watch the different behavior. Just run
In Section 9.3.4 we introduce alternative arithmetic means of a product.
Terminal
Say the product is P (t)Q(t) evaluated at t = tn+ 1 . The exact value is
Terminal> python Newton_demo.py ’0.2 + exp(-0.5*x**2)*cos(pi*x)’ \ 2
’-x*exp(-x**2)*cos(pi*x) - pi*exp(-x**2)*sin(pi*x)’ \
n+ 21 n+ 12 n+ 12
0.85 -3 3 [P Q] =P Q
1
There are two obvious candidates for evaluating [P Q]n+ 2 as a mean of
and repeat with 0.85 replaced by 0.8.
values of P and Q at tn and tn+1 . Either we can take the arithmetic
mean of each factor P and Q,
Problem 9.4: Compute the Jacobian of a 2 × 2 system 1 1 1

[P Q]n+ 2 ≈ (P n + P n+1 ) (Qn + Qn+1 ), (9.95)
2 2
Write up the system (9.18)-(9.19) in the form F (u) = 0, F = (F0 , F1 ), or we can take the arithmetic mean of the product P Q:
u = (u0 , u1 ), and compute the Jacobian Ji,j = ∂Fi /∂uj .
1 1
[P Q]n+ 2 ≈ (P n Qn + P n+1 Qn+1 ) . (9.96)
2
Problem 9.5: Solve nonlinear equations arising from a The arithmetic average of P (tn+ 1 ) is O(∆t2 ):
2
vibration ODE
1
P (tn+ 1 ) = (P n + P n+1 ) + O(∆t2 ) .
Consider a nonlinear vibration problem 2 2
A fundamental question is whether (9.95) and (9.96) have different orders
mu00 + bu0 |u0 | + s(u) = F (t), (9.94) of accuracy in ∆t = tn+1 − tn . To investigate this question, expand
where m > 0 is a constant, b ≥ 0 is a constant, s(u) a possibly nonlinear quantities at tn+1 and tn in Taylor series around tn+ 1 , and subtract the
2
1
function of u, and F (t) is a prescribed function. Such models arise from true value [P Q]n+ 2 from the approximations (9.95) and (9.96) to see
Newton’s second law of motion in mechanical vibration problems where what the order of the error terms are.
s(u) is a spring or restoring force, mu00 is mass times acceleration, and
Hint. You may explore sympy for carrying out the tedious calculations.
bu0 |u0 | models water or air drag.
A general Taylor series expansion of P (t + 12 ∆t) around t involving just
a) Rewrite the equation for u as a system of two first-order ODEs, and a general function P (t) can be created as follows:
discretize this system by a Crank-Nicolson (centered difference) method.
1 1 >>> from sympy import *
With v = u0 , we get a nonlinear term v n+ 2 |v n+ 2 |. Use a geometric >>> t, dt = symbols(’t dt’)
n+ 21
average for v . >>> P = symbols(’P’, cls=Function)
>>> P(t).series(t, 0, 4)
b) Formulate a Picard iteration method to solve the system of nonlinear P(0) + t*Subs(Derivative(P(_x), _x), (_x,), (0,)) +
t**2*Subs(Derivative(P(_x), _x, _x), (_x,), (0,))/2 +
algebraic equations. t**3*Subs(Derivative(P(_x), _x, _x, _x), (_x,), (0,))/6 + O(t**4)
>>> P_p = P(t).series(t, 0, 4).subs(t, dt/2)
c) Explain how to apply Newton’s method to solve the nonlinear equa- >>> P_p
tions at each time level. Derive expressions for the Jacobian and the P(0) + dt*Subs(Derivative(P(_x), _x), (_x,), (0,))/2 +
right-hand side in each Newton iteration. dt**2*Subs(Derivative(P(_x), _x, _x), (_x,), (0,))/8 +
dt**3*Subs(Derivative(P(_x), _x, _x, _x), (_x,), (0,))/48 + O(dt**4)
Filename: nonlin_vib.
The error of the arithmetic mean, 12 (P (− 12 ∆t) + P (− 21 ∆t)) for t = 0 is

then ((1 + u2 )u0 )0 = 1, x ∈ (0, 1), u(0) = u(1) = 0 . (9.98)
>>> P_m = P(t).series(t, 0, 4).subs(t, -dt/2) a) Construct a Picard iteration method for (9.98) without discretizing
>>> mean = Rational(1,2)*(P_m + P_p) in space.
>>> error = simplify(expand(mean) - P(0))
>>> error b) Apply Newton’s method to (9.98) without discretizing in space.
dt**2*Subs(Derivative(P(_x), _x, _x), (_x,), (0,))/8 + O(dt**4)
c) Discretize (9.98) by a centered finite difference scheme. Construct a
Use these examples to investigate the error of (9.95) and (9.96) for
Picard method for the resulting system of nonlinear algebraic equations.
n = 0. (Choosing n = 0 is necessary for not making the expressions too
complicated for sympy, but there is of course no lack of generality by d) Discretize (9.98) by a centered finite difference scheme. Define the
using n = 0 rather than an arbitrary n - the main point is the product system of nonlinear algebraic equations, calculate the Jacobian, and set
and addition of Taylor series.) up Newton’s method for solving the system.
Filename: product_arith_mean. Filename: nonlin_1D_coeff_linearize.
Problem 9.7: Newton’s method for linear problems Problem 9.10: Finite differences for the 1D Bratu problem
Suppose we have a linear system F (u) = Au − b = 0. Apply Newton’s We address the so-called Bratu problem
method to this system, and show that the method converges in one
iteration. Filename: Newton_linear. u00 + λeu = 0, x ∈ (0, 1), u(0) = u(1) = 0, (9.99)
where λ is a given parameter and u is a function of x. This is a widely used
model problem for studying numerical methods for nonlinear differential
Exercise 9.8: Discretize a 1D problem with a nonlinear
equations. The problem (9.99) has an exact solution
coefficient
!
We consider the problem cosh((x − 12 )θ/2)
ue (x) = −2 ln ,
cosh(θ/4)
((1 + u2 )u0 )0 = 1, x ∈ (0, 1), u(0) = u(1) = 0 . (9.97) where θ solves
a) Discretize (9.97) by a centered finite difference method on a uniform √
mesh. θ= 2λ cosh(θ/4) .
b) Discretize (9.97) by a finite element method with P1 elements of There are two solutions of (9.99) for 0 < λ < λc and no solution for
equal length. Use the Trapezoidal method to compute all integrals. Set λ > λc . For λ = λc there is one unique solution. The critical value λc
up the resulting matrix system in symbolic form such that the equations solves
can be compared with those in a).
Filename: nonlin_1D_coeff_discretize. 1
p
1= 2λc sinh(θ(λc )/4) .
4
A numerical value is λc = 3.513830719.
Exercise 9.9: Linearize a 1D problem with a nonlinear a) Discretize (9.99) by a centered finite difference method.
coefficient
b) Set up the nonlinear equations Fi (u0 , u1 , . . . , uNx ) = 0 from a). Cal-
We have a two-point boundary value problem culate the associated Jacobian.
c) Implement a solver that can compute u(x) using Newton’s method. a LATEX document and compiling it to a PDF document or by using
Plot the error as a function of x in each iteration. http://latex.codecogs.com to display LATEX formulas in a web page.
Here are some appropriate Python statements for the latter purpose:
d) Investigate whether Newton’s method gives second-order convergence
by computing ||ue − u||/||ue − u− ||2 in each iteration, where u is solution from sympy import *
in the current iteration and u− is the solution in the previous iteration. ...
# expr_i holdes the integral as a sympy expression
Filename: nonlin_1D_Bratu_fd. latex_code = latex(expr_i, mode=’plain’)
# Replace u_im1 sympy symbol name by latex symbol u_{i-1}
latex_code = latex_code.replace(’im1’, ’{i-1}’)
# Replace u_ip1 sympy symbol name by latex symbol u_{i+1}
Problem 9.11: Integrate functions of finite element latex_code = latex_code.replace(’ip1’, ’{i+1}’)
# Escape (quote) latex_code so it can be sent as HTML text
expansions import cgi
html_code = cgi.escape(latex_code)
We shall investigate integrals on the form # Make a file with HTML code for displaying the LaTeX formula
Z f = open(’tmp.html’, ’w’)
L X # Include an image that can be clicked on to yield a new
f( uk ϕk (x))ϕi (x) dx, (9.100) # page with an interactive editor and display area where the
0 k # formula can be further edited
text = """
where ϕi (x) are P1 finite element basis functions and uk are unknown <a href="http://www.codecogs.com/eqnedit.php?latex=%(html_code)s"
coefficients, more precisely the values of the unknown function u at nodes target="_blank">
xk . We introduce a node numbering that goes from left to right and <img src="http://latex.codecogs.com/gif.latex?%(html_code)s"
title="%(latex_code)s"/>
also that all cells have the same length h. Given i, the integral only gets </a>
contributions from [xi−1 , xi+1 ]. On this interval ϕk (x) = 0 for k < i − 1 """ % vars()
and k > i + 1, so only three basis functions will contribute: f.write(text)
f.close()
X
uk ϕk (x) = ui−1 ϕi−1 (x) + ui ϕi (x) + ui+1 ϕi+1 (x) . The formula is displayed by loading tmp.html into a web browser.
k Filename: fu_fem_int.
The integral (9.100) now takes the simplified form
Z
Problem 9.12: Finite elements for the 1D Bratu problem
xi+1
f (ui−1 ϕi−1 (x) + ui ϕi (x) + ui+1 ϕi+1 (x))ϕi (x) dx .
xi−1
We address the same 1D Bratu problem as described in Problem 9.10.
Split this integral in two integrals over cell L (left), [xi−1 , xi ], and cell
a) Discretize (9.12) by a finite element method using a uniform mesh
R (right), [xi , xi+1 ]. Over cell L, u simplifies to ui−1 ϕi−1 + ui ϕi (since
with P1 elements. Use a group finite element method for the eu term.
ϕi+1 = 0 on this cell), and over cell R, u simplifies to ui ϕi + ui+1 ϕi+1 .
Make a sympy program that can compute the integral and write it out b) Set up the nonlinear equations Fi (u0 , u1 , . . . , uNx ) = 0 from a). Cal-
as a difference equation. Give the f (u) formula on the command line. culate the associated Jacobian.
Try out f (u) = u2 , sin u, exp u. Filename: nonlin_1D_Bratu_fe.
Hint. Introduce symbols u_i, u_im1, and u_ip1 for ui , ui−1 , and ui+1 ,
respectively, and similar symbols for xi , xi−1 , and xi+1 . Find formulas
Exercise 9.13: Discretize a nonlinear 1D heat conduction
for the basis functions on each of the two cells, make expressions for u
PDE by finite differences
on the two cells, integrate over each cell, expand the answer and simplify.
You can ask sympy for LATEX code and render it either by creating We address the 1D heat conduction PDE
Exercise 9.15: Derive Picard and Newton systems from a

%c(T )Tt = (k(T )Tx )x , variational form
for x ∈ [0, L], where % is the density of the solid material, c(T ) is the We study the multi-dimensional heat conduction PDE
heat capacity, T is the temperature, and k(T ) is the heat conduction
coefficient. T (x, 0) = I(x), and ends are subject to a cooling law: %c(T )Tt = ∇ · (k(T )∇T )
in a spatial domain Ω, with a nonlinear Robin boundary condition
k(T )Tx |x=0 = h(T )(T − Ts ), −k(T )Tx |x=L = h(T )(T − Ts ),
∂T
−k(T ) = h(T )(T − Ts (t)),
where h(T ) is a heat transfer coefficient and Ts is the given surrounding ∂n
temperature. at the boundary ∂Ω. The primary unknown is the temperature T , %
is the density of the solid material, c(T ) is the heat capacity, k(T ) is
a) Discretize this PDE in time using either a Backward Euler or Crank-
the heat conduction, h(T ) is a heat transfer coefficient, and Ts (T ) is a
Nicolson scheme.
possibly time-dependent temperature of the surroundings.
b) Formulate a Picard iteration method for the time-discrete problem
a) Use a Backward Euler or Crank-Nicolson time discretization and
(i.e., an iteration method before discretizing in space).
derive the variational form for the spatial problem to be solved at each
c) Formulate a Newton method for the time-discrete problem in b). time level.
d) Discretize the PDE by a finite difference method in space. Derive the b) Define a Picard iteration method from the variational form at a time
matrix and right-hand side of a Picard iteration method applied to the level.
space-time discretized PDE.
c) Derive expressions for the matrix and the right-hand side of the
e) Derive the matrix and right-hand side of a Newton method applied equation system that arises from applying Newton’s method to the
to the discretized PDE in d). variational form at a time level.
Filename: nonlin_1D_heat_FD.
d) Apply the Backward Euler or Crank-Nicolson scheme in time first.
Derive a Newton method at the PDE level. Make a variational form of
the resulting PDE at a time level.
Exercise 9.14: Use different symbols for different Filename: nonlin_heat_FE.
approximations of the solution
The symbol u has several meanings, depending on the context, as briefly
mentioned in Section 9.5.1. Go through the derivation of the Picard Exercise 9.16: Derive algebraic equations for nonlinear 1D
iteration method in that section and use different symbols for all the heat conduction
different approximations of u:
We consider the same problem as in Exercise 9.15, but restricted to one
• ue (x, t) for the exact solution of the PDE problem space dimension: Ω = [0, L]. Simplify the boundary condition to Tx = 0
• ue (x)n for the exact solution after time discretization (i.e., h(T ) = 0). Use a uniform finite element mesh of P1 elements, the
P
• un (x) for the spatially discrete solution j cj ψj group finite element method, and the Trapezoidal rule for integration
• un,k for approximation in Picard/Newton iteration no k to un (x) at the nodes to derive symbolic expressions for the algebraic equations
arising from this diffusion problem. Filename: nonlin_1D_heat_FE.
Filename: nonlin_heat_FE_usymbols.
Exercise 9.17: Differentiate a highly nonlinear term !

d du n−1 du
The operator ∇ · (α(u)∇u) with α(u) = |∇u| appears in several physical
q µ0 = −β, u0 (0) = 0, u(H) = 0,
dx dx dx
problems, especially flow of Non-Newtonian fluids. The expression |∇u|
is defined as the Euclidean norm of a vector: |∇u|2 = ∇u · ∇u. In a where β > 0 and µ0 > 0 are constants. A target value of n may be
Newton method one has to carry out the differentiation ∂α(u)/∂cj , for n = 0.2.
P
u = k ck ψk . Show that a) Formulate a Picard iteration method directly for the differential
∂ equation problem.
|∇u|q = q|∇u|q−2 ∇u · ∇ψj . b) Perform a finite difference discretization of the problem in each Picard
∂uj
iteration. Implement a solver that can compute u on a mesh. Verify that
Filename: nonlin_differentiate.
the solver gives an exact solution for n = 1 on a uniform mesh regardless
of the cell size.
Exercise 9.18: Crank-Nicolson for a nonlinear 3D diffusion c) Given a sequence of decreasing n values, solve the problem for each
equation n using the solution for the previous n as initial guess for the Picard
iteration. This is called a continuation method. Experiment with n =
Redo Section 9.5.2 when a Crank-Nicolson scheme is used to discretize (1, 0.6, 0.2) and n = (1, 0.9, 0.8, . . . , 0.2) and make a table of the number
the equations in time and the problem is formulated for three spatial of Picard iterations versus n.
dimensions.
d) Derive a Newton method at the differential equation level and dis-
Hint. Express the Jacobian as Ji,j,k,r,s,t = ∂Fi,j,k /∂ur,s,t and observe,
cretize the resulting linear equations in each Newton iteration with the
as in the 2D case, that Ji,j,k,r,s,t is very sparse: Ji,j,k,r,s,t =
6 0 only for
finite difference method.
r = i ± i, s = j ± 1, and t = k ± 1 as well as r = i, s = j, and t = k.
Filename: nonlin_heat_FD_CN_2D. e) Investigate if Newton’s method has better convergence properties
than Picard iteration, both in combination with a continuation method.
Exercise 9.19: Find the sparsity of the Jacobian

Consider a typical nonlinear Laplace term like ∇ · α(u)∇u discretized by 9.7 Symbolic nonlinear finite element equations
centered finite differences. Explain why the Jacobian corresponding to
this term has the same sparsity pattern as the matrix associated with The integrals in nonlinear finite element equations are computed by
the corresponding linear term α∇2 u. numerical integration rules in computer programs, so the formulas for
the variational form is directly transferred to numbers. It is of interest to
Hint. Set up the unknowns that enter the difference equation at a point
understand the nature of the system of difference equations that arises
(i, j) in 2D or (i, j, k) in 3D, and identify the nonzero entries of the
from the finite element method in nonlinear problems and to compare
Jacobian that can arise from such a type of difference equation.
with corresponding expressions arising from finite difference discretization.
Filename: nonlin_sparsity_Jacobian.
We shall dive into this problem here. To see the structure of the difference
equations implied by the finite element method, we have to find symbolic
expressions for the integrals, and this is extremely difficult since the
Problem 9.20: Investigate a 1D problem with a continuation
integrals involve the unknown function in nonlinear problems. However,
method
there are some techniques that allow us to approximate the integrals and
Flow of a pseudo-plastic power-law fluid between two flat plates can be work out symbolic formulas that can compared with their finite difference
modeled by counterparts.
9.7 Symbolic nonlinear finite element equations 367 368 9 Nonlinear problems
X
We shall address the 1D model problem (9.50) from the beginning of f (u) ≈ f (xj )ϕj ,
Section 9.4. The finite difference discretization is shown in Section 9.4.1, j
while the variational form based on Galerkin’s method is developed in where f (xj ) is the value of f at node j. Since f is a function of u,
Section 9.4.3. We build directly on formulas developed in the latter f (xj ) = f (u(xj )). Introducing uj as a short form for u(xj ), we can write
section.
X
f (u) ≈ f (uj )ϕj .
j
9.7.1 Finite element basis functions This approximation is known as the group finite element method or the
product approximation technique. The index j runs over all node numbers
Introduction of finite element basis functions ϕi means setting
in the mesh.
The principal advantages of the group finite element method are
ψi = ϕν(i) , i ∈ Is ,
two-fold:
where degree of freedom number ν(i) in the mesh corresponds to unknown
number i (ci ). In the present example, we use all the basis functions 1. Complicated nonlinear expressions can be simplified to increase the
except the last at i = Nn − 1, i.e., Is = {0, . . . , Nn − 2}, and ν(j) = j. efficiency of numerical computations.
The expansion of u can be taken as 2. One can derive symbolic forms of the difference equations arising from
the finite element method in nonlinear problems. The symbolic form
X
u=D+ cj ϕν(j) , is useful for comparing finite element and finite difference equations
j∈Is of nonlinear differential equation problems.
but it is more common in a finite element context to use a boundary Below, we shall explore point 2 to see exactly how the finite element
P
function B = j∈Ib Uj ϕj , where Uj are prescribed Dirichlet conditions method creates more complex expressions in the resulting linear system
for degree of freedom number j and Uj is the corresponding value. (the difference equations) that the finite difference method does. It turns
X out that is very difficult toR see what kind of terms in the difference
u = DϕNn −1 + cj ϕν(j) . equations that arise from f (u)ϕi dx without using the group finite
j∈Is element method or numerical integrationR utilizing the nodes only.
In the general case with u prescribed as Uj at some nodes j ∈ Ib , we set Note, however, that an expression like f (u)ϕi dx causes no problems
X X in a computer program as the integral is calculated by numerical integra-
u= Uj ϕj + cj ϕν(j) , tion using an existing approximation of u in f (u) such that the integrand
j∈Ib j∈Is can be sampled at any spatial point.
where cj = u(xν(j) ). That is, ν(j) maps unknown number j to the Simplified problem. Our aim now is to derive symbolic expressions
corresponding node number ν(j) such that cj = u(xν(j) ). for the difference equations arising from the finite element method in
nonlinear problems and compare the expressions with those arising in the
finite difference method. To this end, let us simplify the model problem
9.7.2 The group finite element method and set a = 0, α = 1, f (u) = u2 , and have Neumann conditions at
both ends such that we get a very simple nonlinear problem −u00 = u2 ,
Finite element approximation of functions of u. Since we already u0 (0) = 1, u0 (L) = 0. The variational form is then
P
expand u as j ϕj u(xj ), we may use the same approximation for other
Z Z
functions as well. For example, L L
u0 v 0 dx = u2 v dx − v(0), ∀v ∈ V .
0 0
The
R 2
term with u0 v 0 is well known so the only new feature is the term Occasionally, we want to interpret this expression in terms of finite
u v dx. differences, and to this end a rewrite of this expression is convenient:
To make the distance from finite element equations to finite difference
equations as short as possible, we shall substitute cj in the sum u =
P h h2
j cj ϕj by uj = u(xj ) since cj is the value of u at node j. (InP
the more (f (ui−1 ) + 4f (ui ) + f (ui+1 )) = h[f (u) − Dx Dx f (u)]i .
6 6
general case with Dirichlet conditions as well, we have a sum j cj ϕν(j)
where cj is replaced by u(xν(j) ). We can then introduce some other That is, the finite element treatment of f (u) (when using a group finite
P
counter k such that it is meaningful to write u = k uk ϕk , where k runs element method) gives the same term as in a finite difference approach,
P
over appropriate node numbers.) The quantity uj in j uj ϕj is the same f (ui ), minus a diffusion term which is the 2nd-order discretization of
1 2 00
as u at mesh point number j in the finite difference method, which is 6 h f (xi ).
commonly denoted uj . We may lump the mass matrix through integration with the Trape-
R zoidal rule so that M becomes diagonal in the finite element method. In
Integrating nonlinear functions. Consider the term u2 v dx in the
P that case the f (u) term in the differential equation gives rise to a single
variational formulation with v = ϕi and u = k ϕk uk :
term hf (ui ), just as in the finite difference method.
Z L X
( uk ϕk )2 ϕi dx .
0 k
9.7.3 Numerical integration of nonlinear terms by hand
Evaluating this integral for P1 elements (see Problem 9.11) results in the
R
expression Let us reconsider a term f (u)v dx as treated in the previous section,
but now we want to integrate this term numerically. Such an approach
h 2 can lead to easy-to-interpret formulas if we apply a numerical integration
(u + 2ui (ui−1 + ui+1 ) + 6u2i + u2i+1 ),
12 i−1 rule that samples the integrand at the node points xi only, because at
to be compared with the simple value u2i that would arise in a finite such points, ϕj (xi ) = 0 if j 6= i, which leads to great simplifications.
difference discretization when u2 is sampledR at mesh point xi . More The term in question takes the form
complicated f (u) functions in the integral 0L f (u)ϕi dx give rise to Z
much more lengthy expressions, if it is possible to carry out the integral X
L
f( uk ϕk )ϕi dx .
symbolically at all. 0 k
Application of the group finite element method. Let us use the group Evaluation of the integrand at a node x` leads to a collapse of the sum
P
finite element method to derive the terms in the difference equation k uk ϕk to one term because
corresponding to f (u) in the differential equation. We have X
uk ϕk (x` ) = u` .
Z Z ! k
L L X X Z L
X
f (u)ϕi dx ≈ ( ϕj f (uj ))ϕi dx = ϕi ϕj dx f (uj ) . f( uk ϕk (x` )) ϕi (x` ) = f (u` )δi` ,
0 0 j j 0 | {z } | {z }
k
δk` δi`
We
R
recognize this expression as the mass matrix M , arising from where we have used the Kronecker delta: δij = 0 if i 6= j and δij = 1 if
ϕi ϕj dx, times the vector f = (f (u0 ), f (u1 ), . . . , ): M f . The associ- i = j.
ated terms in the difference equations are, for P1 elements, Considering the Trapezoidal rule for integration, where the integration
h points are the nodes, we have
(f (ui−1 ) + 4f (ui ) + f (ui+1 )) .
6
Z P
L X Nn
X Group finite element method. We set α(u) ≈ α(uk )ϕk , and then
f( uk ϕk (x))ϕi (x) dx ≈ h f (u` )δi` − C = hf (ui ) .
k
0
we write
k `=0
This is the same representation of the f term as in the finite difference Z L X X Z L X

method. The term C contains the evaluations of the integrand at the α( ck ϕk )ϕ0i ϕ0j dx ≈ ( ϕk ϕ0i ϕ0j dx)α(uk ) = Li,j,k α(uk ) .
0
ends with weight 12 , needed to make a true Trapezoidal rule: k k |0 {z } k
Li,j,k
h h
C = f (u0 )ϕi (0) + f (uNn −1 )ϕi (L) . Further calculations are now easiest to carry out in the reference cell.
2 2
With P1 elements we can compute Li,j,k for the two k values that are
The answer hf (ui ) must therefore be multiplied by 12 if i = 0 or i = Nn −1. relevant on the reference cell. Turning to local indices, one gets
Note that C = 0 for i = 1, . . . , Nn − 2.
!
One can alternatively use the Trapezoidal rule on the reference cell (e) 1 1 −1
and assemble the contributions. It is a bit more labor in this context, but Lr,s,t = , t = 0, 1,
2h −1 1
working on the reference cell is safer as that approach is guaranteed to
handle discontinuous derivatives of finite element functions correctly (not where r, s, t = 0, 1 are indices over local degrees of freedom in the reference
P
important in this particular example), while the rule above was derived cell (i = q(e, r), j = q(e, s), and k = q(e, t)). The sum k Li,j,k α(uk )
P1 (e)
with the assumption that f is continuous at the integration points. at the cell level becomes t=0 Lr,s,t α(ũt ), where ũt is u(xq(e,t) ), i.e., the
The conclusion is that it suffices to use the Trapezoidal rule if one value of u at local node number t in cell number e. The element matrix
wants to derive the difference equations in the finite element method becomes
and make them similar to those arising in the finite difference method. !
The Trapezoidal rule has sufficient accuracy for P1 elements, but for P2 1 1 1 −1
(α(ũ0 ) + α(ũ(1) )) . (9.101)
elements one should turn to Simpson’s rule. 2 h −1 1
As usual, we employ a left-to-right numbering of cells and nodes. Row
number i in the global matrix gets contributions from the first row of
9.7.4 Discretization of a variable coefficient Laplace term the element matrix in cell i and the last row of the element matrix in
cell i − 1. In cell number i − 1 the sum α(ũ0 ) + α(ũ(1) ) corresponds to
Turning back to the model problem (9.50), it remains to calculate the α(ui−1 ) + α(ui ). The same sum becomes α(ui ) + α(ui+1 ) in cell number
contribution of the (αu0 )0 and boundary terms to the difference equations. i. We can with this insight assemble the contributions to row number i
The integral in the variational form corresponding to (αu0 )0 is in the global matrix:
Z L X
α( ck ψk )ψi0 ψj0 dx .
0 1
k (−(α(ui−1 )+α(ui )), α(ui−1 )+2α(ui )+α(ui+1 ), α(ui )+α(ui+1 )) .
P 2h
Numerical integration utilizing a value of k ck ψk from a previous itera-
tion must in general be used to compute the integral. Now our aim is to Multiplying by the vector of unknowns ui results in a formula that can
integrate symbolically, as much as we can, to obtain some insight into how be arranged to
the finite element method approximates this term. To be able to derive
symbolic expressions, we must either turn to the group finite element 1 1 1
method or numerical integration in the node points. Finite element basis − ( (α(ui ) + α(ui+1 ))(ui+1 − ui ) − (α(ui−1 ) + α(ui ))(ui − ui−1 )),
h 2 2
functions ϕi are now used. (9.102)
which is nothing but the standard finite difference discretization of The final term in the variational form is the Neumann condition at the
−(α(u)u0 )0 with an arithmetic mean of α(u) (and the usual factor h boundary: Cv(0) = Cϕi (0). With a left-to-right numbering only i = 0
because of the integration in the finite element method). will give a contribution Cv(0) = Cδi0 (since ϕi (0) 6= 0 only for i = 0).
Numerical integration at the nodes. Instead of using the group finite
element method andR exact integration we can turn to the Trapezoidal Summary
P
rule for computing 0L α( k uk ϕk )ϕ0i ϕ0j dx, again at the cell level since
For the equation
that is most convenient when we deal with discontinuous functions ϕ0i :
−(α(u)u0 )0 + au = f (u),
Z Z 1
1 X h 1 X 2 dϕ̃r 2 dϕ̃s h P1 finite elements results in difference equations where
α( ũt ϕ̃t )ϕ̃0r ϕ̃0s dX = α( ũt ϕ̃t ) dX
−1 2 −1 h dX h dX 2 x
t t=0
Z • the term −(α(u)u0 )0 becomes −h[Dx α(u) Dx u]i if the group
1
1 1 X finite element method or Trapezoidal integration is applied,
= (−1)r (−1)s α( ut ϕ̃t (X))dX
2h −1 t=0 • f (u) becomes hf (ui ) with Trapezoidal integration or the “mass
1 1 matrix” representation h[f (u) − h6 Dx Dx f (u)]i if computed by a
1 X X
≈ (−1)r (−1)s α( ϕ̃t (−1)ũt ) + α( ϕ̃t (1)ũt ) group finite element method,
2h t=0 t=0 • au leads to the “mass matrix” form ah[u − h6 Dx Dx u]i .
1 (1)
= (−1) (−1) (α(ũ0 ) + α(ũ )) . (9.103)
r s
2h
The element matrix in (9.103) is identical to the one in (9.101), showing
As we now have explicit expressions for the nonlinear difference equa-
that the group finite element method and Trapezoidal integration are
tions also in the finite element method, a Picard or Newton method
equivalent with a standard finite discretization of a nonlinear Laplace
can be defined as shown for the finite difference method. However, our
term (α(u)u0 )0 using an arithmetic mean for α: [Dx xDx u]i .
efforts in deriving symbolic forms of the difference equations in the fi-
nite element method was motivated by a desire to see how nonlinear
Remark about integration in the physical x coordinate terms in differential equations make the finite element and difference
We might comment on integration in the physical coordinate system method different. For practical calculations in computer programs we
too. The common Trapezoidal rule in Section 9.7.3 cannot be used apply numerical integration, normally the more accurate Gauss-Legendre
to integrate derivatives like ϕ0i , because the formula is derived under quadrature rules, to the integrals directly. This allows us to easily evaluate
the assumption of a continuous integrand. One must instead use the the nonlinear algebraic equations for a given numerical approximation
more basic version of the Trapezoidal rule where all the trapezoids of u (here denoted u− ). To solve the nonlinear algebraic equations we
are summed up. This is straightforward, but I think it is even more need to apply the Picard iteration method or Newton’s method to the
straightforward to apply the Trapezoidal rule on the reference cell variational form directly, as shown next.
and assemble the contributions.
R
The term auv dx in the variational form is linear and gives these
terms in the algebraic equations:
ah h2
(ui−1 + 4ui + ui+1 ) = ah[u − Dx Dx u]i .
6 6
376 10 Uncertainty quantification and polynomial chaos expansions
understanding of Chapter 2 in this book, the authors think the ideas can
Uncertainty quantification and
10
be conveyed much faster using those approximation concepts rather than
the more traditional exposition in the literature.
polynomial chaos expansions
Remark on a major probabilistic simplification. We assume that we
have some function u(x, t; ζ), where ζ is a vector of random parameters
(ζ0 , . . . , ζN ) that introduces uncertainty in the problem. Note that the
uncertainty is described by M + 1 stochastic scalar parameters and is
therefore discretized from a statistical point of view. If we have temporal
or spatial stochastic processes or stochastic space-time fields, these are
continuous, and the mathematical formulations become considerably
more complicated. Assuming the uncertainty to be described by a set
of discrete stochastic variables represents the same simplification as
we introduced in earlier chapters where we assumed the solution to be
P
discrete, u(x) ≈ i ci ψi (x), and worked with variational forms in finite-
dimensional spaces. We could, as briefly shown in Chapter 2, equally well
Differential equations often contain unknown parameters, but mathe- worked with variational forms in infinite-dimensional spaces, but the
matically these must be prescribed to run the solution method. If some theory would involve Sobolev spaces rather than the far more intuitive
parameters are uncertain, how uncertain is then the answer? The simplest vector spaces. In the same manner, stochastic processes and fields and
conceptual approach to this question is to let each parameter vary in an their mathematical theory are avoided when assuming the uncertainty
interval and use interval arithmetics to compute the interval of solution to be already discretized in terms of random variables.
at the vertices of the mesh. Unfortunately, this approach is terribly inac-
curate and not feasible. Also, any deterministic approach to the question
hides fact that uncertainty is strongly connected to statistical variations.
10.1 Sample problems
Therefore, a more sound approach is to prescribe some statistics of the
input parameters and then compute some statistics of the output. This is
called stochastic uncertainty quantification, or just the name uncertainty We shall look at two driving examples to illustrate the methods.
quantification, abbreviated UQ, implies a stochastic description of the
input of the model. There has been a heavy development of the UQ field
during the last couple of decades. 10.1.1 ODE for decay processes
One of the most exciting and promising methods of UQ for differential A simple ODE, governing decay processes [10] in space or time, reads
equation is polynomial chaos expansions. The literature on this method
might seem complicated unless one has firm background in probability u0 (t) = a(t)u(t), u(0) = I, (10.1)
theory. However, it turns out that all the seemingly different variations
of this method are all strongly connected if one knows the approximation where a and I can be uncertain input parameter, making u an uncertain
framework of Chapter 2 of this book and some very basic probability output parameter for each t.
theory. It is therefore natural to present polynomial chaos expansions If a is uncertain, we assume that it consists of uncertain piecewise
from perspective and make the ideas more accessible to a larger audience. constants: a = ai is a random variable in some time interval [ti , ti+1 ].
P
This is exactly the purpose of the present chapter. Typically, we can write a(t) = M i=0 ai ψ̂i (x), with ψ̂i being P0 elements
There are some excellent expositions of polynomial chaos in the recent in time, and ai are the unknown coefficients (in this case we count to i
literature by Dongbin Xiu [18, 19]. However, for the reader with an from 0 to M and consider I as certain).

10.1 Sample problems 377 378 10 Uncertainty quantification and polynomial chaos expansions
Stochastic ODE (ζ0 , . . . , ζM ) with all the random parameters, from α(x) only, from f (x)
only, or as a combination of randomness in α and f . Here, one can
To be specific, we let a be a scalar uncertain constant and let also I obviously include randomness in boundary conditions as well.
be uncertain. Then we have two stochastic variables, M = 2: ζ0 = a
and ζ1 = I.
The analytical solution to this ODE is u(t) = Ie−at , regardless 1D model problem
of which variables that are random or not. With only I as random A specific 1D problem for heat conduction,
variable, we see that the solution is simply a linear function of I,
so if I has a Gaussian probability distribution, u is for any t, the ∂
α(x)
∂u
= 0, x ∈ Ω = [0, L],
well-known transformation of expectation and variance a normally ∂x ∂x
distributed random variable: may have two materials with α as an uncertain constant in each
material. Let Ω = Ω0 = [0, L̂] ∪ Ω1 = [L̂, 0], where Ωi is material
number i. Let αi be the corresponding uncertain heat conduction
E[u] = E[I]e−at , coefficient value. Now we have M = 2, and ζ0 = α0 and ζ1 = α1 .
Var[u] = Var[I]e2at .
However, as soon as a is stochastic, the problem is, from a stochastic

view point nonlinear, so there is no simple transformation of the
probability distribution for a nor its statistics. 10.2 Basic principles
The basic statistics of expectance and variance for a random
variable x varying continuously on the interval ΩX is as The fundamental idea in polynomial chaos theory is that we now assume
the mapping from an input parameter ζ = (ζ0 , . . . , ζM ) ∈ W to u, to
Z
be smooth, because we known that smooth mappings can efficiently be
E[u(x)] = u(x)p(x) dx, approximated by the sum
ZΩ
Var[u] = (u(x) − E[u])2 p(x) dx . N
X X
Ω u(x; ζ ) = ci ψi (ζ ) = cj ψj ∈ V . (10.3)
i=0 j∈Is
This is the same formula as we used for approximation in Chapter (10.3).

Note that dimW = M + 1, while dimV = N + 1.
10.1.2 The stochastic Poisson equation Another basic principle in polynomial chaos theory is to choose the
basis functions in V , ψi (x), as orthogonal polynomials. The reason is
We also address the stationary PDE
that this choice has the potential that the sum (10.3) may converge
very fast so N can be kept small. kam 46: I don’t get this. I arrested
− ∇ · (α(x)∇u(x)) = f (x)in Ω, (10.2)
Jonathan also on this earlier and he did not provide an answer. I think
with appropriate Dirichlet, Neumann, or Robin conditions at the bound- orthogonality leads to easy or direct computations of moments such as E
ary, The coefficients α and f are assumed to be stochastic quantities and Var.
P
discretized in terms of expansions, e.g., α = M i=0 αi ψ̂i (x), where ψ̂i (x) is
some spatial discretization – think of, for instance, of piecewise constant Polynomial chaos expansions may involve many unknowns!
P0 finite elements or global sine functions, and the parameters αi as the
random uncertain variables in the problem. We form some final vector
10.2 Basic principles 379 380 10 Uncertainty quantification and polynomial chaos expansions
10.2.1 Basic statistical results

We have M + 1 uncertain parameters so the approximation problem
takes place in an N + 1 dimensional space, contrary to previous Our task is to compute the coefficients ci . When these are known, we can
chapters where N would be physical space, N = 1, 2, 3. For each easily compute the statistics of the response u of the differential equation,
variable, we will need a set of basis functions to represent the or any other quantity derived from u. For example, the expectation of u
variation in that variable. Suppose we use polynomials of order Q is
and lower in each variable ζi , i = 0, . . . , M . The total number of X X
terms in the expansion is then E[u] = E[ cj ψj ] = E[cj ]ψj (10.5)
j∈Is j∈Is
(Q + M )!
N +1= . (10.4) A third pillar among assumption for the polynomial chaos method is
Q!M ! that the random input variables are stochastically independent. There
Suppose we have 10 uncertain input variables (M + 1 = 10) and are methods to come around this problem, but these are very recent and
want to approximate each variable by a cubic polynomial (Q = 3). we keep the traditional assumption in the present of the theory here.
This results in N + 1 = 220 terms in the expansion. And a cubic There are four methods that have been used to compute ci :
variation is modest, so going to degree 10, demands 92,378 terms.
Consequently, for the polynomial chaos method to be effective, 1. Least squares minimization or Galerkin projection with exact compu-
we need very fast convergence and that lower-order polynomial tation of integrals
variation is sufficient. First of all this demands that the mapping 2. Least-squares minimization or Galerkin projection with numerical
from ζ to u is smooth, but the parameter is very often smooth even computation of integrals
though the solution u(x) in physical space is non-smooth. 3. Interpolation or collocation
4. Regression
Method 2 and 4 are the dominating ones. These go under the names
pseudo-spectral methods and stochastic collocation methods, respectively.
Remark on terminology
The development of the theory in this chapter builds on the seminal
work on polynomial chaos theory by Roger Ghanem and co-workers. 10.2.2 Least-squares methods
This theory expands functions in Hermite polynomials, which was
a great success for a range of linear Gaussian engineering safety A major difference for deterministic approximation methods and poly-
problems. To obtain faster convergence in non-Gaussian problems, nomial chaos based on least squares is that we introduce probabilistic
functions were expanded in other families of orthogonal polynomials, information in the inner product in the latter case. This means that we
which is the view we take here, but that method is known as must know the joint probability distribution p(ζ ) for the input parameters!
generalized polynomial chaos, often abbreviated gPC. In this text, Then we define the inner product between two elements u and w in V as
however, we use the shorter term polynomial chaos for all expansions Z
of stochastic parameter dependence in terms of global orthogonal (u, v) = E[uv] = u(ζ )v(ζ )p(ζ )dζ (10.6)
Ω
polynomials.
with the associated norm
||u||2 = (u, u) .
The domain Ω is the hypercube of input parameters:
10.2 Basic principles 381 382 10 Uncertainty quantification and polynomial chaos expansions
Ω = [ζ0,min , ζ0,max ] × [ζ1,min , ζ1,max ] × · · · × [ζN,min , ζN,max ] . 10.2.3 Example: Least squares applied to the decay ODE
Since the random input variables are independent, we can factorize the In this model problem, we are interested in the statistics of the solu-
joint probability distribution as tion u(t). The response is u(t) = Ie−at = ζ0 e−ζ1 t as our f (ζ) function.
Consequently,
p(ζ ) = Πi=0
N
pi (ζi ) . X
u(x) = ci (t)ψi (ζ0 , ζ1 ) .
We then have i∈Is
ζi,max
Z Furthermore,
E[ζ ] = E[ζ0 ]E[ζ1 ] · · · E[ζ0 ] = N
Πi=0 ζi pi (ζ0 )dζi .
ζi, min ζ0,max
Z ζ1,max
Z
(ζ0 e−ζ1 t , ψi )
Requiring that the polynomials are orthogonal means that ci (t) = = ||ψi ||−2 ζ0 e−ζ1 t ψi (ζ0 , ζ1 )p0 (ζ0 )p1 (ζ1 )dζ0 dζ1 .
||ψi ||2
ζ0,min ζ1,min
(u, v) = δij ||u||,
This is a quite heavy integral to perform by hand; it really depends on
where δij is the Kronecker delta: δij = 1 if and only if i = j, otherwise the complexity of the probability density function p0 for I and p1 for a.
δij = 0. A tool like SymPy or Mathematica is indispensable, but even these will
Let f be the solution computed by our mathematical model, either soon give up when i grows.
approximately by a numerical method or exact. When we do the least Uniform distributions!
squares minimization, we know from Chapter 2 that the associated linear
system is diagonal and its solution is
(ψi , f ) E[f ψi ] 10.2.4 Modeling the response

ci = = . (10.7)
||ψ||2 E[ψ 2 ] Our aim is to approximate the mapping from input to output in a
In case of orthonormal polynomials ψi we simply have ci = E[f ψi ]. simulation program. The output will often require solving a differen-
Galerkin’s method or project gives tial equation. Let f be the output response as a result of solving the
differential equation. We then want to find the mapping
X
(f − ci ψi , v) = 0, ∀v ∈ V, X
i u(x; ζ ) = ci (x)ψi (ζ ),
which is equivalent to i∈Is
X or if the response is some operation (functional) on the solution u, say

(f − ci ψi , ψj ) = 0, j = 0, . . . , N, f (u), we seek the mapping
i
X
and consequently the formula (10.7) for ci . f (ζ ) = ci (x)ψi (ζ ),
The major question, from a computational point of view, is how to i∈Is
evaluate the integrals in the formula for ci . Let us look at our two primary Normally, ci will depend on the spatial coordinate x as here, but in some
examples. cases it
Give examples with flux etc!
10.4 Intrusive polynomial chaos methods 383 384 10 Uncertainty quantification and polynomial chaos expansions
10.2.5 Numerical integration Let us address the decay ODE, u0 = −au, where we have the mapping
P
u = i ci ψi for u which can be inserted in the ODE, giving rise to a
Clenshaw-Curtis quadrature, Gauss-Legendre quadrature, Gauss-
residual R:
Patterson quadrature, Genz-Keister quadrature, Leja quadrature, Monte
Carlo integration, Optimal Gaussian quadrature, X X
R= c0i (t)ψi (ζ ) − ci ψi (ζ ) .
i i
Galerkin’s method is a common choice for incorporating the polynomial

10.2.6 Stochastic collocation chaos expansion when solving differential equations, so we require as
The stochastic collocation method is nothing but what we called regres- usual (R, v) = 0 for all v ∈ V , or expressed via basis functions:
sion in Chapter 2. We simply choose a large number P of evaluation X
points in the parameter space W and compute the response at these (ψi , ψj )c0i (t) = −a(ψi , ψi ) i = 0, . . . , N tp
j
points. Then we fit an orthogonal polynomial expansion to these points
using the principle of least squares. Here, in this educational text, the This is now a differential equation system in ci , which can be solved by
principle of least squares suffices, but one should that for real-life compu- any ODE solver. However, the size of this system is (N + 1) × (N + 1)
tations a lot of difficulties may happen, especially in large-scale problems. rather than just a scalar ODE in the original model and the non-intrusive
We then need to stabilize the least squares computations in various ways, method. This is characteristic for non-intrusive methods: the size of the
and for this purpose, one needs tailored software like Chaospy to be underlying system grows exponentially with the number of uncertain
briefly introduced in Section 10.3. Chaospy offers, for example, a wide parameters, and one always ends up in high-dimensional spaces for the
range of regression schemes. ODEs or the coordinates for PDEs.
Also, the choice of collocation points should make use of point families For the Poisson equation we insert the mapping
that suit this kind of problems. Again, Chaospy offers many collections X
of point families, and not surprisingly, numerical integration points u(x; ζ ) = ci (x)ψi (ζ ),
constitute popular families. i∈Is
and get a residual

X
R=− ∇ · (α(x)∇cj (x))ψj (ζ ) + f (x) . (10.8)
10.3 The Chaospy software j∈Is
Now we can apply Galerkin’s method to get equations for ci (x):

Refer to our ref for comparison with Dakota and Turns.
X
(ψj (ζ ), ψi (ζ ))∇ · (α(x)∇cj (x)) = (f (x), ψi (ζ )), i = 0, . . . , N .
10.4 Intrusive polynomial chaos methods i∈Is
Here, we think that we solve the Poisson equation by some finite difference
So far we have described non-intrusive polynomial chaos theory, which or element method on some mesh, so for each node x in the mesh, we
means that the stochastic solution method can use the underlying differ- get the above system for cj . hpl 47: Integration by parts?
ential equation solver “as is” without any modifications. This is a great
advantage, and it also makes the theory easier to understand. However,
one may construct intrusive methods that applies the mapping () in the
solution process of the differential equation model.
386 11 Variational methods for linear systems
k
X
11
xk+1 = xk + u, u= (11.2)
Variational methods for linear
cj qj
j=0
systems where qj ∈ Rn are known vectors and cj are constants to be determined.

To be specific, let q0 , . . . , qk be basis vectors for Vk+1 :
Vk+1 = span{q0 , . . . , qk } .
The associated inner product (·, ·) is here the standard Euclidean inner
product on Rn .
The corresponding error in the equation Ax = b, the residual, becomes
k
X
rk+1 = b − Axk+1 = rk − cj Aqj .
j=0
A successful family of methods, usually referred to as Conjugate Gradient-

like algorithms, or Krylov subspace methods, can be viewed as Galerkin
11.1 Conjugate gradient-like iterative methods
or least-squares methods applied to a linear system Ax = b. This view is
different from the standard approaches to deriving the classical Conjugate 11.1.1 The Galerkin method
Gradient method in the literature. Nevertheless, the fundamental ideas Galerkin’s method states that the error in the equation, the residual, is
of least squares and Galerkin approximations from Section 2 can be used orthogonal to the space Vk+1 where we seek the approximation to the
to derive the most popular and successful methods for linear systems, problem.
and this is the topic of the present chapter. Such a view may increase P
The Galerkin (or projection) method aims at finding u = j cj qj ∈
the general understanding of variational methods and their applicability. Vk+1 such that
Our exposition focuses on the basic reasoning behind the methods,
and a natural continuation of the material here is provided by several (rk+1 , v) = 0, ∀v ∈ Vk+1 .
review texts. Bruaset [5] gives an accessible theoretical overview of a
wide range of Conjugate Gradient-like methods. Barrett et al. [2] present This statement is equivalent to the residual being orthogonal to each
a collection of computational algorithms and give valuable information basis vector:
about the practical use of the methods. Saad [17] and Axelsson [1] have
evolved as modern, classical text books on iterative methods in general (rk+1 , qi ) = 0, i = 0, . . . , k . (11.3)
for linear systems. Inserting the expression for rk+1 in (11.3) gives a linear system for cj :
Given a linear system
k
X
Ax = b, x, b ∈ Rn , A ∈ Rn,n (11.1) (Aqi , qj )cj = (rk , qi ), i = 0, . . . , k . (11.4)
j=0
and a start vector x0 , we want to construct an iterative solution method
that produces approximations x1 , x2 , . . ., which hopefully converge to
the exact solution x. In iteration no. k we seek an approximation

11.1 Conjugate gradient-like iterative methods 387 388 11 Variational methods for linear systems
11.1.2 The least squares method (Since rk+1 involves Aqk , and qk ∈ Vk+1 , multiplying by A raises the
dimension of the Krylov space by 1, so Aqk ∈ Vk+2 .) The free parameters
The idea of the least-squares method is to minimize the square of the
βj can be used to enforce desirable orthogonality properties of q0 , . . . , qk+1 .
norm of the residual with respect to the free parameters c0 , . . . , ck . That For example, it is convenient to require that the coefficient matrices in
is, we minimize (rk+1 , rk+1 ):
the linear systems for c0 , . . . , ck are diagonal. Otherwise, we must solve
∂ k+1 k+1 ∂rk+1 k+1 a (k + 1) × (k + 1) linear system in each iteration. If k should approach
(r , r ) = 2( , r ) = 0, i = 0, . . . , k . n, the systems for the coefficients ci are of the same size as our original
∂ci ∂ci
system Ax = b! A diagonal matrix, however, ensures an efficient closed
Since ∂rk+1 /∂ci = −Aqi , this approach leads to the following linear form solution for c0 , . . . , ck .
system: To obtain a diagonal coefficient matrix, we require in Galerkin’s method
k
that
X
(Aqi , Aqj )cj = (rk+1 , Aqi ), i = 0, . . . , k . (11.5)
j=0 (Aqi , qj ) = 0 when i 6= j,
whereas we in the least-squares method require
11.1.3 Krylov subspaces (Aqi , Aqj ) = 0 when i 6= j .

To obtain a complete algorithm, we need to establish a rule to update We can define the inner product
the basis B = {q0 , . . . , qk } for the next iteration. That is, we need to
compute a new basis vector qk+1 ∈ Vk+2 such that hu, vi ≡ (Au, v) = uT Av, (11.9)
provided A is symmetric and positive definite. Another useful inner
B = {q0 , . . . , qk+1 } (11.6)
product is
is a basis for the space Vk+2 that is used in the next iteration. The present
family of methods applies the Krylov space, where Vk is defined as [u, v] ≡ (Au, Av) = uT AT Av . (11.10)
n o These inner products will be referred to as the A product, with the
Vk = span r0 , Ar0 , A2 r0 , . . . Ak−1 r0 . (11.7)
associated A norm, and the AT A product, with the associated AT A
Some frequent names of the associated iterative methods are therefore norm.
Krylov subspace iterations, Krylov projection methods, or simply Krylov The orthogonality condition on the qi vectors are then hqk+1 , qi i = 0
methods. It is a fact that Vk ⊂ Vk+1 and that r0 Ar0 , . . . Ak−1 r0 are in the Galerkin method and [qk+1 , qi ] = 0 in the least-squares method,
linearly independent vectors. where i runs from 0 to k. A standard Gram-Schmidt process can be
used for constructing qk+1 orthogonal to q0 , . . . , qk . This leads to the
determination of the β0 , . . . , βk constants as
11.1.4 Computation of the basis vectors
hrk+1 , qi i
A potential formula for updating qk+1 , such that qk+1 ∈ Vk+2 , is βi = (Galerkin) (11.11)
hqi , qi i
k
X [rk+1 , qi ]
qk+1 = rk+1 + βj qj . (11.8) βi = (least squares) (11.12)
j=0
[qi , qi ]
for i = 0, . . . , k.
11.1.5 Computation of a new solution vector Termination criteria for Conjugate Gradient-like methods is a subject on
its own [5].
The orthogonality condition on the basis vectors qi leads to the following
solution for c0 , . . . , ck :
11.1.6 Summary of the least squares method
(rk , qi )
ci = (Galerkin) (11.13) In the algorithm below, we have summarized the computational steps in
hqi , qi i
the least-squares method. Notice that we update the residual recursively
(rk , Aqi ) instead of using rk = b − Axk in each iteration since we then avoid a
ci = (least squares) (11.14)
[qi , qi ] possibly expensive matrix-vector product.
In iteration k, (rk , qi ) = 0 and (rk , Aqi ) = 0, for i = 0, . . . , k − 1, in 1. given a start vector x0 , compute r0 = b − Ax0 and set q0 = r0 .
the Galerkin and least squares case, respectively. Hence, ci = 0, for 2. for k = 0, 1, 2, . . . until termination criteria are fulfilled:
i = 0, . . . , k − 1. In other words, a. ck = (rk , Aqk )/[qk , qk ]
b. xk+1 = xk + ck qk
xk+1 = xk + ck qk . c. rk+1 = rk − ck Aqk
When A is symmetric and positive definite, one can show that also βi = 0, d. if A is symmetric then
for 0 = 1, . . . , k − 1, in both the Galerkin and least squares methods i. βk = [rk+1 , qk ]/[qk , qk ]
[5]. This means that xk and qk+1 can be updated using only qk and not
A. qk+1 = rk+1 − βk qk
the previous q0 , . . . , qk−1 vectors. This property has of course dramatic
effects on the storage requirements of the algorithms as the number of e. else
iterations increases. i. βj = [rk+1 , qj ]/[qj , qj ], j = 0, . . . , k
For the suggested algorithms to work, we must require that the denom- P
ii. qk+1 = rk+1 − kj=0 βj qj
inators in (11.13) and (11.14) do not vanish. This is always fulfilled for
the least-squares method, while a (positive or negative) definite matrix
A avoids break-down of the Galerkin-based iteration (provided qi 6= 0). Remark. The algorithm above is just a summary of the steps in the
The Galerkin solution method for linear systems was originally devised derivation of the least-squares method and should not be directly used
as a direct method in the 1950s. After n iterations the exact solution is for practical computations without further developments.
found in exact arithmetic, but at a higher cost than when using Gaussian
elimination. Naturally, the method did not receive significant popularity
before researchers discovered (in the beginning of the 1970s) that the 11.1.7 Truncation and restart
method could produce a good approximation to x for k n iterations.
When A is nonsymmetric, the storage requirements of q0 , . . . , qk may be
The method, called the Conjugate Gradient method, has from then on
prohibitively large. It has become a standard trick to either truncate or
caught considerable interest as an iterative scheme for solving linear
restart the algorithm. In the latter case one restarts the algorithm every
systems arising from PDEs discretized such that the coefficient matrix
K + 1-th step, i.e., one aborts the iteration and starts the algorithm again
becomes sparse. P
with x0 = xK . The other alternative is to truncate the sum kj=0 βj qj
Finally, we mention how to terminate the iteration. The simplest
and use only the last K vectors:
criterion is ||rk+1 || ≤ r , where r is a small prescribed tolerance. Some-
times it is more appropriate to use a relative residual, ||rk+1 ||/||r0 || ≤ r . k
X
xk+1 = xk + βj qj .
j=k−K
Both the restarted and truncated version of the algorithm require storage type y ← ax + y, the slightly more general variant q ← ax + y, as
of only K + 1 basis vectors qk−K , . . . , qk . The basis vectors are also often well as inner products.
called search direction vectors. The truncated version of the least-squares
algorithm above is widely known as Generalized Minimum Residuals,
shortened as GMRES, or GMRES(K) to explicitly indicate the number of
search direction vectors. In the literature one encounters the name Gen-
eralized Conjugate Residual method, abbreviated CGR, for the restarted
11.1.9 A framework based on the error
version of Orthomin. When A is symmetric, the method is known under Let us define the error ek = x − xk . Multiplying this equation by A leads
the name Conjugate Residuals. to the well-known relation between the error and the residual for linear
systems:
11.1.8 Summary of the Galerkin method Aek = rk . (11.15)
In case of Galerkin’s method, we assume that A is symmetric and positive Using r = Ae we can reformulate the Galerkin and least-squares
k k
definite. The resulting computational procedure is the famous Conjugate methods in terms of the error. The Galerkin method can then be written
Gradient method. Since A must be symmetric, the recursive update of
qk+1 needs only one previous search direction vector qk , that is, βj = 0 (rk , qi ) = (Aek , qi ) = hek , qi i = 0, i = 0, . . . , k . (11.16)
for j < k. For the least-squares method we obtain
0 0 0 0
1. Given a start vector x , compute r = b − Ax and set q0 = r .
2. for k = 1, 2, . . . until termination criteria are fulfilled: (rk , Aqi ) = [ek , qi ] = 0, i = 0, . . . , k . (11.17)
a. ck = (rk , qk )/hqk , qk i This means that

b. xk = xk−1 + ck qk
c. rk = rk−1 − ck Aqk
hek , vi = 0 ∀v ∈ Vk+1 (Galerkin)
d. βk = hrk+1 , qk i/hqk , qk i
e. qk+1 = rk+1 − βk qk [ek , v] = 0 ∀v ∈ Vk+1 (least-squares)
The previous remark that the listed algorithm is just a summary of the In other words, the error is A-orthogonal to the space Vk+1 in the
steps in the solution procedure, and not an efficient algorithm that should Galerkin method, whereas the error is AT A-orthogonal to Vk+1 in the
be implemented in its present form, must be repeated here. In general, least-squares method. When the error is orthogonal to a space, we find the
we recommend to rely on some high-quality linear algebra library that best approximation in the associated norm to the solution in that space.
offers an implementation of the Conjugate gradient method. Specifically here, it means that for a symmetric and positive definite A,
the Conjugate gradient method finds the optimal adjustment in Vk+1 of
the vector xk (in the A-norm) in the update for xk+1 . Similarly, the least
The computational nature of Conjugate gradient-like squares formulation finds the optimal adjustment in Vk+1 measured in
methods the AT A-norm.
Looking at the Galerkin and least squares algorithms above, one A lot of Conjugate gradient-like methods were developed in the 1980s
notice that the matrix A is only used in matrix-vector products. and 1990s, some of the most popular methods do not fit directly into
This means that it is sufficient to store only the nonzero entries of the framework presented here. The theoretical similarities between the
A. The rest of the algorithms consists of vector operations of the methods are covered in [5], whereas we refer to [2] for algorithms and prac-
tical comments related to widespread methods, such as the SYMMLQ
11.2 Preconditioning 393 394 11 Variational methods for linear systems
method (for symmetric indefinite systems), the Generalized Minimal For increased flexibility we can write M −1 = CL CR and transform the
Residual (GMRES) method, the BiConjugate Gradient (BiCG) method, system according to
the Quasi-Minimal Residual (QMR) method, and the BiConjugate Gra-
dient Stabilized (BiCGStab) method. When A is symmetric and positive CL ACR y = CL b, y = CR−1 x, (11.19)
definite, the Conjugate gradient method is the optimal choice with re-
where CL is the left and CR is the right preconditioner. If the original
spect to computational efficiency, but when A is nonsymmetric, the
coefficient matrix A is symmetric and positive definite, CL = CRT leads
performance of the methods is strongly problem dependent, and one
to preservation of these properties in the transformed system. This
needs to experiment.
is important when applying the Conjugate Gradient method to the
preconditioned linear system. Even if A and M are symmetric and
positive definite, M −1 A does not necessarily inherit these properties.
11.2 Preconditioning It appears that for practical purposes one can express the iterative
algorithms such that it is sufficient to work with a single preconditioning
11.2.1 Motivation and Basic Principles matrix M only [2, 5]. We shall therefore speak of preconditioning in
terms of the left preconditioner M in the following.
The Conjugate Gradient method has been subject to extensive analysis,
and its convergence properties are well understood. To reduce the initial
error e0 = x − x0 with a factor 0 < 1 after k iterations, or more 11.2.2 Use of the preconditioning matrix in the iterative
precisely, ||ek ||A ≤ ||e0 ||A , it can be shown that k is bounded by
methods
1 2√ Optimal convergence for the Conjugate Gradient method is achieved
ln κ,
2 when the coefficient matrix M −1 A equals the identity matrix I and
where κ is the ratio of the largest and smallest eigenvalue of A. The only one iteration is required. In the algorithm we need to perform
quantity κ is commonly referred to as the spectral condition number. matrix-vector products M −1 Au for an arbitrary u ∈ Rn . This means
Common finite element and finite difference discretizations of Poisson-like that we have to solve a linear system with M as coefficient matrix in
PDEs lead to κ ∼ h−2 , where h denotes the mesh size. This implies that each iteration since we implement the product y = M −1 Au in a two step
the Conjugate Gradient method converges slowly in PDE problems with fashion: First we compute v = Au and then we solve the linear system
fine meshes, as the number of iterations is proportional to h−1 . M y = v for y. The optimal choice M = A therefore involves the solution
To speed up the Conjugate Gradient method, we should manipulate of Ay = v in each iteration, which is a problem of the same complexity
the eigenvalue distribution. For instance, we could reduce the condition as our original system Ax = b. The strategy must hence be to compute
number κ. This can be achieved by so-called preconditioning. Instead an M −1 ≈ A−1 such that the algorithmic operations involved in the
of applying the iterative method to the system Ax = b, we multiply by inversion of M are cheap.
a matrix M −1 and apply the iterative method to the mathematically The preceding discussion motivates the following demands on the
equivalent system preconditioning matrix M :
M −1 Ax = M −1 b . (11.18) • M −1 should be a good approximation to A,

• M −1 should be inexpensive to compute,
The aim now is to construct a nonsingular preconditioning matrix or • M −1 should be sparse in order to minimize storage requirements,
algorithm such that M −1 A has a more favorable condition number than • linear systems with M as coefficient matrix must be efficiently solved.
A. We remark that we use the notation M −1 here to indicate that it
should resemble the inverse of the matrix A. Regarding the last property, such systems must be solved in O(n) op-
erations, that is, a complexity of the same order as the vector updates
11.2 Preconditioning 395 396 11 Variational methods for linear systems
in the Conjugate Gradient-like algorithms. These four properties are choice of ω. hpl 48: Claim! Does TCSE1 have experiments? We
contradictory and some sort of compromise must be sought. refer to [2, 5] for more information about classical iterative methods as
preconditioners.
kam 49: I would like some references here.
11.2.3 Classical iterative methods as preconditioners
The simplest possible iterative method for solving Ax = b is
11.2.4 Incomplete factorization preconditioners
xk+1 = xk + rk . Imagine that we choose M = A and solve systems M y = v by a direct
method. Such methods typically first compute the LU factorization M =
Applying this method to the preconditioned system M −1
Ax = M −1
b
L̄Ū and thereafter perform two triangular solves. The lower and upper
results in the scheme
triangular factors L̄ and Ū are computed from a Gaussian elimination
procedure. Unfortunately, L̄ and Ū contain nonzero values, so-called fill-
xk+1 = xk + M −1 rk ,
in, in many locations where the original matrix A contains zeroes. This
which is nothing but the formula for classical iterative methods such decreased sparsity of L̄ and Ū increases both the storage requirements
as the Jacobi method, the Gauss-Seidel method, SOR (Successive over- and the computational efforts related to solving systems M y = v. An
relaxation), and SSOR (Symmetric successive over-relaxation). This idea to improve the situation is to compute sparse versions of the factors
motivates for choosing M as one iteration of these classical methods. In L̄ and Ū . This is achieved by performing Gaussian elimination, but
particular, these methods provide simple formulas for M : neglecting the fill-in (!). In this way, we can compute approximate factors
Lb and Ub that become as sparse as A. The storage requirements are hence
• Jacobi preconditioning: M = D.
only doubled by introducing a preconditioner, and the triangular solves
• Gauss-Seidel preconditioning: M = D + L.
become an O(n) operation since the number of nonzeroes in the L b and
• SOR preconditioning: M = ω −1 D + L. −1 −1 U matrices (and A) is O(n) when the underlying PDE is discretized
b
• SSOR preconditioning: M = (2−ω)−1 ω −1 D + L ω −1 D ω D+U
by finite difference or finite element methods. We call M = L bUb an
Turning our attention to the four requirements of the preconditioning incomplete LU factorization preconditioner, often just referred to as the
matrix, we realize that the suggested M matrices do not demand addi- ILU preconditioner.
tional storage, linear systems with M as coefficient matrix are solved Instead of throwing away all fill-in entries, we can add them to the
effectively in O(n) operations, and M needs no initial computation. The main diagonal. This yields the modified incomplete LU factorization
only questionable property is how well M approximates A, and that is (MILU). One can also allow a certain amount of fill-in to improve the
the weak point of using classical iterative methods as preconditioners. quality of the preconditioner, at the expense of more storage for the
The Conjugate Gradient method can only utilize the Jacobi and SSOR factors. Libraries offering ILU preconditioners will then often have a
preconditioners among the classical iterative methods, because the M tolerance for how small a fill-in element in the Gaussian elimination must
matrix in that case is on the form M −1 = CL CLT , which is necessary be to be dropped, and a parameter that governs the maximum number
to ensure that the coefficient matrix of the preconditioned system is of nonzeros per row in the factors. A popular ILU implementation is the
symmetric and positive definite. For certain PDEs, like the Poisson open source C code SuperLU. This preconditioner is available to Python
equation, it can be shown that the SSOR preconditioner reduces the programmers through the scipy.sparse.linalg.spilu function. hpl
condition number with an order of magnitude, i.e., from O(h−2 ) to 50: Need some references to textbooks; Axelsson? Saad?
O(h−1 ), provided we use the optimal choice of the relaxation parameter The general algorithm for MILU preconditioning follows the steps
ω. According to experiments, however, the performance of the SSOR of traditional exact Gaussian elimination, except that we restrict the
preconditioned Conjugate Gradient method is not very sensitive to the computations to the nonzero entries in A. The factors L b and U b can
11.2 Preconditioning 397
be stored directly in the sparsity structure of A, that is, the algorithm

overwrites a copy M of A with its MILU factorization. The steps in the
MILU factorizations are listed below.
1. Given a sparsity pattern as an index set I, copy Mi,j ← Ai,j , i, j =
1, . . . , n
2. for k = 1, 2, . . . , n
a. for i = k + 1, . . . , n
•if (i, k) ∈ I then
i. Mi,k ← Mi,k /Mk,k
•else
i. Mi,k = 0
•r = Mi,k
•for j = k + 1, . . . , n
–if j = i then
Pn
·Mi,j ← Mi,j − rMk,j + p=k+1 (Mi,p − rMk,p )
–else
·if (i, j) ∈ I then
·Mi,j ← Mi,j − rMk,j
·else
·Mi,j = 0
We also remark here that the algorithm above needs careful refinement
before it should be implemented in a code. For example, one will not run
through a series of (i, j) indices and test for each of them if (i, j) ∈ I.
Instead one should run more directly through the sparsity structure of
A.
400 A Useful formulas
n+ 12 n− 12
A
u −u
u0 (tn ) ≈ [Dt u]n = (A.1)
Useful formulas ∆t
un+1 − un−1
u0 (tn ) ≈ [D2t u]n = (A.2)
2∆t
un − un−1
u (tn ) = [Dt u] =
0 − n
(A.3)
∆t
+ n un+1 − un
u (tn ) ≈ [Dt u] =
0
(A.4)
∆t
n+1
u − un
u0 (tn+θ ) = [D̄t u]n+θ = (A.5)
∆t
2− n 3un − 4un−1 + un−2
u (tn ) ≈ [Dt u] =
0
(A.6)
2∆t
un+1 − 2un + un−1
u (tn ) ≈ [Dt Dt u] =
00 n
(A.7)
∆t2
t n+ 12 1 n+1
u(tn+ 1 ) ≈ [u ] = (u + un ) (A.8)
A.1 Finite difference operator notation 2 2
1
u(tn+ 1 )2 ≈ [u2 ]n+ 2 = un+1 un
t,g
(A.9)
2
All the finite differences here, and their corresponding operator notation. 1 2
take place on a time mesh for a function of time. The same formulas u(tn+ 1 ) ≈ [ut,h ]n+ 2 = 1 1 (A.10)
2 +
apply in space too (just replace t by a spatial coordinate and add spatial un+1 un
coordinates in u). u(tn+θ ) ≈ [ut,θ ]n+θ = θun+1 + (1 − θ)un , (A.11)

kam 51: trenger å sette navn ved de forskjellge formlene. Noen steder tn+θ = θtn+1 + (1 − θ)tn−1 (A.12)
er det ” = ” og andre steder er det ” ≈ ”
A.2 Truncation errors of finite difference approximations

A.3 Finite differences of exponential functions 401 402 A Useful formulas
n+ 12 n− 21
ue − ue
u0e (tn ) = [Dt ue ]n + Rn = + Rn ,
∆t 2 4 ω∆t
1 000 [Dt Dt u]n = un (cos ω∆t − 1) = − sin2 , (A.21)
Rn = − u (tn )∆t2 + O(∆t4 ) (A.13) ∆t ∆t 2
24 e 1
un+1 − un−1 [Dt+ u]n = un (exp (iω∆t) − 1), (A.22)
u0e (tn ) = [D2t ue ]n + Rn = e e
+ Rn , ∆t
2∆t 1
1 [Dt− u]n = un (1 − exp (−iω∆t)), (A.23)
Rn = − u000 (tn )∆t2 + O(∆t4 ) (A.14) ∆t
6 e

2 ω∆t
[Dt u]n = un i sin , (A.24)
u0e (tn ) = [Dt− ue ]n + Rn = e
un − un−1
e
+ Rn , ∆t 2
∆t 1
1 [D2t u]n = un i sin (ω∆t) . (A.25)
Rn = − u00e (tn )∆t + O(∆t2 ) (A.15) ∆t
2
un+1
− uen Real exponentials. Let un = exp (ωn∆t) = eωtn .
u0e (tn ) = [Dt+ ue ]n + Rn = e + Rn ,
∆t
1
2 4 ω∆t
Rn = − u00e (tn )∆t + O(∆t2 ) (A.16) [Dt Dt u]n = un (cos ω∆t − 1) = − sin2 , (A.26)
2 ∆t ∆t 2
1
n+1 n
u − ue
u0e (tn+θ ) = [D̄t ue ]n+θ + Rn+θ = e + Rn+θ , [Dt+ u]n = un (exp (iω∆t) − 1), (A.27)
∆t ∆t
1 1 1
Rn+θ = − (1 − 2θ)u00e (tn+θ )∆t + ((1 − θ)3 − θ3 )u000 2
e (tn+θ )∆t + [Dt− u]n = un (1 − exp (−iω∆t)), (A.28)
2 6 ∆t
O(∆t3 ) (A.17) 2 ω∆t
[Dt u]n = un i sin , (A.29)
3un − 4un−1 + uen−2 ∆t 2
ue (tn ) =
0
[Dt2− ue ]n +R = e
n e
+ Rn , 1
2∆t [D2t u]n = un i sin (ω∆t) . (A.30)
1 ∆t
R = u000
n
(tn )∆t2 + O(∆t3 ) (A.18)
3 e
un+1 − 2une + un−1
u00e (tn ) = [Dt Dt ue ]n + Rn = e e
+ Rn , A.4 Finite differences of tn
∆t2
1
Rn = − u0000 (tn )∆t2 + O(∆t4 ) (A.19) The following results are useful when checking if a polynomial term in a
12 e
solution fulfills the discrete equation for the numerical method.
ue (tn+θ ) = [ue t,θ ]n+θ + Rn+θ = θun+1

e + (1 − θ)une + Rn+θ , [Dt+ t]n = 1, (A.31)
1 [Dt− t]n = 1, (A.32)
Rn+θ = − u00e (tn+θ )∆t2 θ(1 − θ) + O(∆t3 ) . (A.20)
2
[Dt t]n = 1, (A.33)
[D2t t]n = 1, (A.34)
A.3 Finite differences of exponential functions [Dt Dt t]n = 0 . (A.35)
The next formulas concern the action of difference operators on a t2

Complex exponentials. Let un = exp (iωn∆t) = eiωtn .
term.
A.4 Finite differences of tn 403 404 A Useful formulas
return (u(t + dt) - 2*u(t) + u(t-dt))/(dt**2)
[Dt+ t2 ]n = (2n + 1)∆t, (A.36)

op_list = [D_t_forward, D_t_backward,
[Dt− t2 ]n = (2n − 1)∆t, (A.37) D_t_centered, D_2t_centered, D_t_D_t]
[Dt t2 ]n = 2n∆t, (A.38) def ft1(t):

return t
[D2t t2 ]n = 2n∆t, (A.39)
[Dt Dt t2 ]n = 2, (A.40) def ft2(t):
return t**2
Finally, we present formulas for a t3 term: These must be controlled def ft3(t):
against lib.py. Use tn instead of n∆t?? return t**3
def f_expiwt(t):
return exp(I*w*t)
[Dt+ t3 ]n = 3(n∆t)2 + 3n∆t2 + ∆t2 , (A.41)
def f_expwt(t):
[Dt− t3 ]n = 3(n∆t)2 − 3n∆t2 + ∆t2 , (A.42) return exp(w*t)
1
[Dt t3 ]n = 3(n∆t)2 + ∆t2 , (A.43) func_list = [ft1, ft2, ft3, f_expiwt, f_expwt]
4
[D2t t3 ]n = 3(n∆t)2 + ∆t2 , (A.44) To see the results, one can now make a simple loop like
[Dt Dt t3 ]n = 6n∆t, (A.45) for func in func_list:
for op in op_list:
f = func
e = op(f)
A.4.1 Software e = simplify(expand(e))
print e
Application of finite difference operators to polynomials and exponential if func in [f_expiwt, f_expwt]:
e = e/f(t)
functions, resulting in the formulas above, can easily be computed by e = e.subs(t, n*dt)
some sympy code: print expand(e)
print factor(simplify(expand(e)))
from sympy import *
t, dt, n, w = symbols(’t dt n w’, real=True)
# Finite difference operators
def D_t_forward(u):
return (u(t + dt) - u(t))/dt
def D_t_backward(u):
return (u(t) - u(t-dt))/dt
def D_t_centered(u):
return (u(t + dt/2) - u(t-dt/2))/dt
def D_2t_centered(u):
return (u(t + dt) - u(t-dt))/(2*dt)
def D_t_D_t(u):
406 REFERENCES
neering. Springer, 2016. http://hplgit.github.io/decay-book/

References doc/web/.
[11] H. P. Langtangen and A. Logg. Writing More Advanced FEniCS
Programs – The FEniCS Tutorial II. 2016. http://hplgit.github.
io/fenics-tutorial/doc/web/.
[12] H. P. Langtangen and A. Logg. Writing State-of-the-Art PDE Solvers
in Minutes – The FEniCS Tutorial I. 2016. http://hplgit.github.
io/fenics-tutorial/doc/web/.
[13] M. G. Larson and F. Bengzon. The Finite Element Method: Theory,
Implementation, and Applications. Texts in Computational Science
and Engineering. Springer, 2013.
[14] A. Logg, K.-A. Mardal, and G. N. Wells. Automated Solution of
Differential Equations by the Finite Element Method. Springer, 2012.
[15] M. Mortensen, H. P. Langtangen, and G. N. Wells. A FEniCS-
based programming framework for modeling turbulent flow by the
[1] O. Axelsson. Iterative Solution Methods. Cambridge University Reynolds-averaged Navier-Stokes equations. Advances in Water
Press, 1996. Resources, 34(9), 2011.
[2] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, [16] A. Quarteroni and A. Valli. Numerical Approximation of Partial
V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst. Templates Differential Equations. Springer, 1994.
for the Solution of Linear Systems: Building Blocks for Iterative [17] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM,
Methods. SIAM, second edition, 1994. http://www.netlib.org/ second edition, 2003. http://www-users.cs.umn.edu/~saad/
linalg/html_templates/Templates.html. IterMethBook_2ndEd.pdf.
[3] D. Braess. Finite Elements: Theory, Fast Solvers, and Applications [18] D. Xiu. Fast numerical methods for stochastic computations: A
in Solid Mechanics. Cambridge University Press, third edition, 2007. review. Communications in Computational Physics, 5:242–272, 2009.
[4] S. Brenner and R. Scott. The Mathematical Theory of Finite Element [19] D. Xiu. Numerical Methods for Stochastic Computations: A Spectral
Methods. Springer, third edition, 2007. Method Approach. Princeton University Press, 2010.
[5] A. M. Bruaset. A Survey of Preconditioned Iterative Methods. Chap-
man and Hall, 1995.
[6] H. Elman, D. Silvester, and A. Wathen. Finite Elements and Fast
Iterative Solvers: with Applications in Incompressible Fluid Dynamics.
Oxford University Press, second edition, 2015.
[7] K. Eriksson, D. Estep, P. Hansbo, and C. Johnson. Computational
Differential Equations. Cambridge University Press, second edition,
1996.
[8] C. Johnson. Numerical Solution of Partial Differential Equations by
the Finite Element Method. Dover, 2009.
[9] C. T. Kelley. Iterative Methods for Linear and Nonlinear Equations.
SIAM, 1995.
[10] H. P. Langtangen. Finite Difference Computing with Exponential
Decay Models. Lecture Notes in Computational Science and Engi-

408 INDEX
Gauss-Legendre quadrature, 116 mapping of reference cells

group finite element method, 369 affine mapping, 88
isoparametric mapping, 124
Index hat function, 80 mass lumping, 108, 248
Hermite polynomials, 114 mass matrix, 108, 245, 248
mesh
ILU, 396
finite elements, 73
incomplete factorization, 396
method of weighted residuals, 144
integration by parts, 151
Midpoint rule, 116
internal node, 75
MILU, 396
interpolation method (approxima-
mixed finite elements, 275
tion), 39
intrusive polynomial chaos, 383 natural boundary condition, 165
isoparametric mapping, 124 Newton-Cotes rules, 116
non-intrusive polynomial chaos,
Kronecker delta, 42, 76
383
Krylov space, 387
norm, 11
AT A = AT b (normal equations), continuation method, 354, 365
Lagrange (interpolating) polyno- normal equations, 27
27 convection-diffusion, 172, 229 numerical integration
mial, 42
latex.codecogs.com web site, Midpoint rule, 116
affine mapping, 88, 123 degree of freedom, 109
361 Newton-Cotes formulas, 116
approximation dof map, 110
least squreas method Simpson’s rule, 116
by sines, 34 dof_map list, 111
vectors, 12 Trapezoidal rule, 116
collocation, 39
interpolation, 39 edges, 121 linear elements, 81
online rendering of LATEX formulas,
of functions, 18 element matrix, 85 linear solvers
361
of general vectors, 14 essential boundary condition, 165 conjugate gradients, 391
of vectors in the plane, 11 Expression, 127 GCR, 390 P1 element, 81
assembly, 86 generalized conjugate residu- P2 element, 81
faces, 121 als, 390
FEniCS, 125 Petrov-Galerkin methods, 229
basis vector, 11 GMRES, 390 Picard iteration, 309
Bernstein(interpolating) polyno- finite element basis function, 80 minimum residuals, 390 polynomial chaos, 375
mial, 48 finite element expansion preconditioning, 393
reference element, 110 preconditioning, 393
linear systems classical iterations, 395
cell, 109 finite element mesh, 73 preconditioned, 393
finite element, definition, 110 product approximation technique,
cells list, 111 linearization, 308
fixed-point iteration, 309 369
Chaospy software, 383 explicit time integration, 306
FunctionSpace, 127 project (FEniCS function), 127
chapeau function, 80 fixed-point iteration, 309 projection
Chebyshev nodes, 45 Picard iteration, 309 functions, 20
collocation method (approxima- Galerkin method successive substitutions, 309
functions, 20 vectors, 13, 17
tion), 39 lumped mass matrix, 108, 248
vectors, 13, 17
407
INDEX 409
quadratic elements, 81
reference cell, 109

relaxation (nonlinear equations),
314
residual, 142
Runge’s phenomenon, 44
search (direction) vectors, 390

shared node, 75
simplex elements, 121
simplices, 121
Simpson’s rule, 116
single Picard iteration technique,
310
solve (FEniCS function), 127
sparse matrices, 101
stiffness matrix, 245
stopping criteria (nonlinear prob-
lems), 310, 326
strong form, 152
successive substitutions, 309
tensor product, 57
test function, 3, 145
test space, 145
TestFunction, 127
Trapezoidal rule, 116
trial function, 3, 145
trial space, 145
TrialFunction, 127
variational formulation, 144

vertex, 109
vertices list, 111
weak form, 152

weak formulation, 144
weighted residuals, 144

Introduction To Numerical Methods For Variational Problems PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Introduction To Numerical Methods For Variational Problems PDF

Uploaded by

Copyright:

Available Formats

Introduction to Numerical

Methods for Variational

Hans Petter Langtangen1,2

This easy-to-read book introduces the basic ideas and technicalities

Sep 18, 2016

modern, very powerful tool for solving partial differential equations by

© 2016, Hans Petter Langtangen, Kent-Andre Mardal.

Supplementary materials. All program and data files referred to in this

Oslo, September 2016 Hans Petter Langtangen, Kent-Andre Mardal

2.4.2 Lagrange polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3 Function approximation by finite elements . . . . . . . . . . . 71

© 2016, Hans Petter Langtangen, Kent-Andre Mardal.

A.1 Finite difference operator notation . . . . . . . . . . . . . . . . . . . . . . 399

Exercise 3.12: Implement 3D computations with global basis

© 2016, Hans Petter Langtangen, Kent-Andre Mardal.

Exercise 9.13: Discretize a nonlinear 1D heat conduction PDE by

The finite element method is a versatile approach to construct compu-

−∇2 u = −uxx − uyy = f,

ui−1,j − 2ui,j + ui+1,j ui,j−1 − 2ui,j + ui,j+1

© 2016, Hans Petter Langtangen, Kent-Andre Mardal.

We will now show a complete FEniCS program to illustrate how a typical Z Z

method is actually just one way to choose the expansion functions ψ,

2.1 Approximation of vectors

© 2016, Hans Petter Langtangen, Kent-Andre Mardal.

are here written in boldface font there should be no confusion. We

The least squares method. We now want to determine the u that

0 E(c0 ) = (e, e) = (f − c0 ψ0 , f − c0 ψ0 ) . (2.5)

and f ). Then we see mathematically that e is orthogonal to any vector

sum of squares of differences between the components in f and u. We

Ai,j = (ψi , ψj ), (2.28)

Example. We repeat the computational example from Section 2.4.1, but 4 4 4

to be approximated by a linear function on Ω = [1, 2]. We divide Ω into 2 2 2

m + 2 intervals and use the inner m + 1 points: 4

Fig. 2.3 Approximation of a parabola by a regression method with varying number of

# Create m+3 points and use the inner m+1 points

points = np.linspace(Omega[0], Omega[1], m+3)[1:-1] 4

u, c = regression(f, psi, points)

sensitive to the number of points as long as they cover a significant part 12

domain and produces a good approximation which is barely improved

Fig. 2.5 Approximation of a parabola with noise by a parabola.

u(x) = 10x − 13.2, 2 points

r = b − Ac0 . exact sympy numpy32 numpy64

The exact value of cj , j = 0, 1, . . . , 10, appears in the first column while

2.3.2 Fourier series

u = 0 Below is such a method. It requires Python functions f(x) and psi(x,i)

0.25 2.4.2 Lagrange polynomials

p = 1 Approximation of a polynomial. The Galerkin or least squares methods

>>> x = 0.5 # numerical computing x = sym.Symbol(’x’)

def Chebyshev_nodes(a, b, N):

xi = (a + b) + (b − a) cos pi , i = 0 . . . , N, (2.60) 0.4

Fig. 2.11 Interpolation of an absolute value function by Lagrange polynomials and

def least_squares(f, psi, Omega):

main challenge, from a numerical analysis point of view, is to determine

for completeness we include a log-log plot in Figure 2.20 to illustrate the

u, c = regression_with_noise(log(Bernstein), psi, N_values) 2.6 Approximation of functions in higher dimensions

Alternatively, we may employ a single index,

= ψ̂p (x)ψ̂r (x)dx, = ψ̂q (y)ψ̂s (y)dy,

Fig. 2.21 Approximation of a 2D quadratic function (left) by a 2D bilinear function

provided integrand is an expression involving the sympy symbols x and

(0) (1) (m)

Problem 2.1: Linear algebra refresher u(x) = c0 ψ0 + c1 ψ1 = c0 + c1 sin(πx) .

Problem 2.10: Approximate a steep function by Lagrange

Function approximation by finite

Z L (u00 , v) = −(f, v), ∀v ∈ V . (4.41)

which is seen to be a linear system with entries

Z Z 4.4.1 Extensions of the code for approximation

−ûxx + ûx = 1, ∈ (0, 1), (4.61)