

## Understanding and Implementing the Finite Element Method

Mark S. Gockenbach
Michigan Technological University Houghton, Michigan

Society for Industrial and Applied Mathematics Philadelphia

Library of Congress Cataloging-in-Publication Data

Gockenbach, Mark S. Understanding and implementing the finite element method / Mark S. Gockenbach. p. cm. Includes bibliographical references and index. ISBN 0-89871-614-4 (pbk.) 1. Finite element method. 2. Finite element method--Data processing. I. Title. TA347.F5G63 2006 518'.25--dc22 2006045012

## Contents

- Preface
- Part I: The Basic Framework for Stationary Problems
  - 1 Some model PDEs
    - 1.1 Laplace's equation; elliptic BVPs
      - 1.1.1 Physical experiments modeled by Laplace's equation
    - 1.2 Other elliptic BVPs
      - 1.2.1 The equations of isotropic elasticity
      - 1.2.2 General linear elasticity
    - 1.3 Exercises for Chapter 1
  - 2 The weak form of a BVP
    - 2.1 Review of vector calculus
      - 2.1.1 The divergence theorem
      - 2.1.2 Green's identity
      - 2.1.3 Other forms of the divergence theorem and Green's identity
    - 2.2 The weak form of a BVP
      - 2.2.1 Minimization of energy
      - 2.2.2 Relaxing the PDE
      - 2.2.3 A few details about Sobolev spaces
    - 2.3 The weak form for other boundary conditions and PDEs
      - 2.3.1 Neumann conditions and the weak form
      - 2.3.2 Mixed boundary conditions
      - 2.3.3 Inhomogeneous boundary conditions
      - 2.3.4 Other elliptic BVPs
    - 2.4 Existence and uniqueness theory for the weak form of a BVP
      - 2.4.1 Vector spaces and inner products
      - 2.4.2 Hilbert spaces
      - 2.4.3 Linear functionals
      - 2.4.4 The Riesz representation theorem
      - 2.4.5 Variational problems and the Riesz representation theorem
    - 2.5 Examples of ellipticity
      - 2.5.1 The model problem
      - 2.5.2 The equations of isotropic elasticity
    - 2.6 Variational formulation of nonsymmetric problems
    - 2.7 Exercises for Chapter 2
  - 3 The Galerkin method
    - 3.1 The projection theorem
    - 3.2 The Galerkin method for a variational problem
      - 3.2.1 Another interpretation of the Galerkin method
      - 3.2.2 The Galerkin method for a nonsymmetric problem
    - 3.3 Exercises for Chapter 3
  - 4 Piecewise polynomials and the finite element method
    - 4.1 Piecewise linear functions defined on a triangular mesh
      - 4.1.1 Using piecewise linear functions in Galerkin's method
      - 4.1.2 The sparsity of the stiffness matrix
    - 4.2 Quadratic Lagrange triangles
      - 4.2.1 Continuous piecewise quadratic functions
      - 4.2.2 The finite element method with quadratic Lagrange triangles
    - 4.3 Cubic Lagrange triangles
      - 4.3.1 Continuous piecewise cubic functions
      - 4.3.2 The finite element method with cubic Lagrange triangles
    - 4.4 Lagrange triangles of arbitrary degree
      - 4.4.1 Hierarchical bases for finite element spaces
    - 4.5 Other finite elements: Rectangles and quadrilaterals
      - 4.5.1 Rectangular elements
      - 4.5.2 General quadrilaterals
    - 4.6 Using a reference triangle in finite element calculations
    - 4.7 Isoparametric finite element methods
      - 4.7.1 Isoparametric quadratic triangles
      - 4.7.2 Isoparametric triangles of higher degree
    - 4.8 Exercises for Chapter 4
  - 5 Convergence of the finite element method
    - 5.1 Approximating smooth functions by continuous piecewise linear functions
      - 5.1.1 The standard refinement of a triangulation
      - 5.1.2 Nondegenerate families of triangulations
      - 5.1.3 Approximation by piecewise linear functions
    - 5.2 Approximation by higher-order piecewise polynomials
    - 5.3 Convergence in the energy norm
    - 5.4 Convergence in the $L^2$-norm
    - 5.5 Variational crimes
      - 5.5.1 Numerical integration
      - 5.5.2 Outline of the analysis of the effect of quadrature
      - 5.5.3 Isoparametric finite elements
    - 5.6 Exercises for Chapter 5
- Part II: Data Structures and Implementation
  - 6 The mesh data structure
    - 6.1 Programming the finite element method
      - 6.1.1 Assembling the stiffness matrix
      - 6.1.2 Computing the load vector
    - 6.2 The mesh data structure
      - 6.2.1 The list of nodes
      - 6.2.2 The list of edges
      - 6.2.3 The list of elements
      - 6.2.4 The list of free boundary edges
      - 6.2.5 Other fields in the mesh data structure
    - 6.3 The MATLAB implementation
      - 6.3.1 Generating a mesh by refinement
      - 6.3.2 Generating a mesh from a triangle-node list
      - 6.3.3 Assessing the quality of a triangulation
      - 6.3.4 Viewing a mesh
      - 6.3.5 Handling a domain with a curved boundary
      - 6.3.6 Viewing a piecewise linear function
      - 6.3.7 MATLAB functions
      - 6.3.8 A summary of the notation
    - 6.4 Exercises for Chapter 6
  - 7 Programming the finite element method: Linear Lagrange triangles
    - 7.1 Quadrature
      - 7.1.1 Gaussian quadrature
      - 7.1.2 Evaluating the standard basis functions on a triangle
      - 7.1.3 Quadrature over a square
    - 7.2 Assembling the stiffness matrix
    - 7.3 Computing the load vector
      - 7.3.1 Inhomogeneous Dirichlet conditions
      - 7.3.2 Inhomogeneous Neumann conditions
    - 7.4 Examples
      - 7.4.1 Homogeneous boundary conditions
      - 7.4.2 Inhomogeneous boundary conditions
      - 7.4.3 A more realistic example
    - 7.5 The MATLAB implementation
      - 7.5.1 MATLAB functions
    - 7.6 Exercises for Chapter 7
  - 8 Lagrange triangles of arbitrary degree
    - 8.1 Quadrature for higher-order elements
    - 8.2 Assembling the stiffness matrix and load vector
    - 8.3 Implementing the isoparametric method
      - 8.3.1 Placement of nodes in the isoparametric method
    - 8.4 Examples
    - 8.5 The MATLAB implementation
      - 8.5.1 version2
      - 8.5.2 version3
    - 8.6 Exercises for Chapter 8
  - 9 The finite element method for general BVPs
    - 9.1 Scalar BVPs
      - 9.1.1 An example
    - 9.2 Isotropic elasticity
    - 9.3 Mesh locking
    - 9.4 The MATLAB implementation
    - 9.5 Exercises for Chapter 9
- Part III: Solving the Finite Element Equations
  - 10 Direct solution of sparse linear systems
    - 10.1 The Cholesky factorization for positive definite matrices
      - 10.1.1 The Cholesky factorization for dense matrices
      - 10.1.2 The Cholesky factorization for banded matrices
    - 10.2 Factoring general sparse matrices
    - 10.3 Exercises for Chapter 10
  - 11 Iterative methods: Conjugate gradients
    - 11.1 The CG method
      - 11.1.1 The CG algorithm
      - 11.1.2 Convergence of the CG algorithm
    - 11.2 Hierarchical bases for finite element spaces
      - 11.2.1 Hierarchical bases for linear Lagrange triangles
      - 11.2.2 Relationship between the stiffness matrices in nodal and hierarchical bases
    - 11.3 The hierarchical basis CG method
    - 11.4 The preconditioned CG method
      - 11.4.1 Alternate derivation of PCG
      - 11.4.2 Preconditioners
    - 11.5 The pure Neumann problem
    - 11.6 The MATLAB implementation
      - 11.6.1 MATLAB functions
    - 11.7 Exercises for Chapter 11
  - 12 The classical stationary iterations
    - 12.1 Stationary iterations
      - 12.1.1 Matrix norms
      - 12.1.2 Convergence of stationary iterations
    - 12.2 The classical iterations
      - 12.2.1 Jacobi iteration
      - 12.2.2 Gauss-Seidel iteration
      - 12.2.3 SOR iteration
      - 12.2.4 Symmetric SOR
      - 12.2.5 CG with SSOR preconditioning
    - 12.3 The MATLAB implementation
      - 12.3.1 MATLAB functions
    - 12.4 Exercises for Chapter 12
  - 13 The multigrid method
    - 13.1 Stationary iterations as smoothers
      - 13.1.1 The stiffness matrix for the model problem
      - 13.1.2 Fourier modes and the spectral decomposition of K
      - 13.1.3 Jacobi iteration
      - 13.1.4 Weighted Jacobi iteration
    - 13.2 The coarse grid correction algorithm
      - 13.2.1 Projecting the equation onto a coarser mesh
      - 13.2.2 The projected equation and the Galerkin idea
      - 13.2.3 The two-grid multigrid algorithm
    - 13.3 The multigrid V-cycle
      - 13.3.1 W-cycles and μ-cycles
    - 13.4 Full multigrid
      - 13.4.1 Discretization, algebraic, and total errors
    - 13.5 The MATLAB implementation
      - 13.5.1 MATLAB functions
    - 13.6 Exercises for Chapter 13
- Part IV: Adaptive Methods
  - 14 Adaptive mesh generation
    - 14.1 Algorithms for local mesh refinement
      - 14.1.1 Algorithms based on the standard refinement
      - 14.1.2 Algorithms based on bisection
    - 14.2 Selecting triangles for local refinement
    - 14.3 A complete adaptive algorithm
    - 14.4 The MATLAB implementation
      - 14.4.1 MATLAB functions
    - 14.5 Exercises for Chapter 14
  - 15 Error estimators and indicators
    - 15.1 An explicit error indicator based on estimating the curvature of the solution
    - 15.2 An explicit error indicator based on the residual
    - 15.3 The element residual error estimator
    - 15.4 Some final examples
      - 15.4.1 A discontinuous coefficient
      - 15.4.2 A reentrant corner
      - 15.4.3 Transition from Dirichlet to Neumann conditions
    - 15.5 The MATLAB implementation
      - 15.5.1 MATLAB functions
    - 15.6 Exercises for Chapter 15
- Bibliography
- Index

## Preface
The finite element method is the most popular general-purpose technique for computing accurate solutions to partial differential equations (PDEs). Since PDEs form the basis for many mathematical models in the physical sciences and, increasingly, in other fields as well, it would be difficult to overstate the importance of the finite element method.

There are a number of excellent books, such as Brenner and Scott [13], Strang and Fix [41], and Ciarlet [16], covering the theory of finite elements, but these books tend to devote little attention to the practical details of programming the algorithms. While the occasional talented student can work out a reasonable scheme without further help, I think most students will benefit from a careful explanation of data structures and specific coding strategies. This book explains how to write a finite element code from scratch. In addition, it comes with a collection of MATLAB programs implementing the ideas presented in the book. Students can use these codes to experiment with the method and extend them in various ways to learn more about programming finite elements.

In addition to a careful explanation of computer implementation (Part II), I have also included, in Part I, an overview of the theoretical basis of the finite element method. My purpose is to give the reader a good understanding of the "big picture" without getting bogged down in the technical details of the theory. The overview also serves to define the context and notation for discussing computer codes.

The finite element method reduces a boundary value problem for a linear PDE to a system of linear equations, written in matrix-vector form as KU = F, that must be solved. Part I derives this system of equations, and the algorithms in Part II show how to compute the matrix K and the vector F. Part III presents algorithms for solving KU = F efficiently even when this system is very large.
The final part of the book discusses the related issues of a posteriori error estimation and adaptive error reduction. It is possible to analyze the computed finite element solution so as to estimate the errors present in the solution and to determine the regions of the computational domain where the solution can most profitably be improved. Part IV explains the various aspects of developing an adaptive finite element algorithm. Throughout the book, my goal is to provide students with a practical, working knowledge of finite elements. This knowledge should provide an excellent foundation for those who wish to delve into advanced texts on the subject.


## Detailed outline of the book

Although I mention the finite element method above, in fact, there are a number of finite element methods, sharing common features but with important differences. I focus my attention on the Galerkin finite element method for steady-state boundary value problems (BVPs). The Galerkin method has a strong and elegant theoretical base that is accessible to undergraduate students with some knowledge of linear algebra.

In Chapter 1, I present the PDEs that are the focus of this book and discuss the physical phenomena that they model. As I mentioned in the previous paragraph, these models are steady state, that is, they describe equilibria in various systems.

The Galerkin finite element method is based on three important ideas, which are presented in Chapters 2, 3, and 4. The first is that a BVP presented in its classical ("strong") form can be recast in weak or variational form, as explained in Chapter 2. The weak form of a BVP is an algebraic formulation of the problem that allows the use of the Galerkin method. The Galerkin method, explained in Chapter 3, is a natural way of projecting the (infinite-dimensional) equation onto a finite-dimensional approximating subspace. The result (for a linear BVP) is a (finite-dimensional) system of linear equations whose solution yields an approximate solution to the BVP. In fact, in a certain sense, the approximate solution is the best possible approximation from the given subspace.

The finite element method is the use of certain approximating subspaces in Galerkin's method, namely, subspaces of piecewise polynomials. Piecewise polynomials make it (relatively) easy to form and solve the finite element equations. Chapter 4 introduces several spaces of piecewise polynomials that are commonly used in the finite element method. Piecewise polynomials are defined relative to a mesh on the computational domain. A mesh partitions the domain into simple subdomains, called elements.
I will concentrate on triangular elements and two-dimensional domains, although I will describe other possibilities, such as quadrilateral elements.

Chapter 5 outlines the convergence theory for Galerkin finite elements. I present the technical theorems, such as the necessary interpolation theory for piecewise polynomials, and show how they fit into the convergence theory. However, the proofs and detailed development of these techniques are beyond the scope of this book. The purpose of Chapter 5 is to show the reader what to expect from the finite element method.

Part II is about the computer implementation of finite elements. I begin, in Chapter 6, with the strategy for organizing the computations. To make the discussion as concrete as possible, it initially focuses on the common case of piecewise linear functions defined on triangles (linear Lagrange triangles). The strategy described in Section 6.1 determines the information that must be stored to describe the mesh. The mesh data structure (again, restricted to linear Lagrange triangles) is carefully defined in Section 6.2.

I have chosen to base the codes for this book on MATLAB, an interactive system that integrates numerical and symbolic computations with graphics and a programming language. I chose MATLAB for several reasons:

1. It is a popular tool in the numerical analysis community.
2. It provides state-of-the-art routines for handling sparse matrices; in particular, it is simple to solve a sparse system of linear equations (such as those produced by the finite element method).
3. Its graphical capabilities make it easy to visualize meshes and solutions produced by the finite element method.

However, the main algorithms are presented in an informal pseudocode that is independent of MATLAB. An excellent way for the student to ensure his or her understanding of these algorithms is to translate the pseudocode into some other high-level programming language, such as Fortran, C, or C++.

The MATLAB codes discussed in the text can be downloaded from the following Web page:

http://www.math.mtu.edu/~msgocken/fembook

The basic computational algorithms are described in Chapter 7. Section 7.2 shows how to compute the stiffness matrix K, which is the finite-dimensional representation of the partial differential operator defining the PDE. Section 7.3 then shows how to compute the load vector F, which represents the right-hand side of the PDE and any nonzero boundary data. An important part of the discussion concerns incorporating various types of boundary conditions into the computations.

The algorithms from Chapter 7 are extended in Chapter 8 to allow for piecewise polynomials of degree greater than one. Besides allowing for greater accuracy in approximating the solution, higher-order polynomials also make it possible to approximate a computational domain with a curved boundary with isoparametric elements. The idea of isoparametric finite elements is first presented in Section 4.7, while the implementation details are explained in Section 8.3.

Chapters 7 and 8 focus on a simple model problem, because most of the essential ideas can be explained in a fairly simple setting. Chapter 9 shows how to extend the techniques to more complicated problems.

Having computed the stiffness matrix and load vector, it remains only to solve the resulting matrix-vector equation and interpret the results. I ignored the issue of solving the system KU = F in Part II, assuming that the built-in solver in MATLAB would be used.
However, direct solvers such as the one in MATLAB can use an unacceptably large amount of time and/or computer memory when the system is large. For this reason, iterative methods are often preferred. Part III discusses both direct and iterative algorithms for solving a large system like KU = F.

One reason the finite element method is so successful is that piecewise polynomials result in a sparse stiffness matrix K, that is, a matrix in which most of the entries are zero. This makes it possible to solve KU = F even when the number of unknowns is very large. Chapter 10 gives a brief overview of direct methods for solving sparse systems. A direct method produces the exact solution (up to round-off error) in a finite number of steps. I have included Chapter 10 mainly to provide a context for understanding the advantages of iterative methods; a detailed discussion of direct algorithms is beyond the scope of this book.

Chapters 11 to 13 describe a number of different iterative algorithms, which compute a sequence of approximate solutions that converges to the exact solution. Although the exact solution of KU = F is computed only in the limit (that is, in an infinite number of steps), a good iterative method can often produce an acceptable solution while using much less time and computer memory than a direct method.
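The roles of sparsity and iteration can be illustrated with a small sketch. It is not taken from the book's MATLAB codes: it is written in Python, and it assumes a one-dimensional stand-in for the stiffness matrix (the tridiagonal matrix with 2 on the diagonal and -1 beside it) rather than a matrix assembled from a triangular mesh. The names `apply_K` and `conjugate_gradient` are hypothetical. The matrix is never stored; only its action on a vector is coded, which is all a conjugate gradient (CG) iteration requires.

```python
def apply_K(v):
    """Return K v for K = tridiag(-1, 2, -1), touching only nonzero entries."""
    n = len(v)
    out = [0.0] * n
    for i in range(n):
        s = 2.0 * v[i]
        if i > 0:
            s -= v[i - 1]
        if i < n - 1:
            s -= v[i + 1]
        out[i] = s
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite operator."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the zero initial guess
    p = list(r)            # first search direction
    rr = dot(r, r)
    for k in range(max_iter):
        if rr ** 0.5 <= tol:
            return x, k
        Ap = apply_A(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


n = 50
U_exact = [float(i + 1) for i in range(n)]   # made-up solution vector
F = apply_K(U_exact)                         # matching right-hand side
U, iterations = conjugate_gradient(apply_K, F)
max_error = max(abs(a - b) for a, b in zip(U, U_exact))
nonzeros = 3 * n - 2                         # nonzero entries of K
```

In this stand-in, only 148 of the 2500 entries of K are nonzero, and each iteration costs a single pass over those entries.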

## Part I: The Basic Framework for Stationary Problems

## Chapter 1. Some model PDEs

Finite element methods are flexible and powerful techniques for solving partial differential equations (PDEs). There are actually a number of methods that go under the name of finite elements, so it is somewhat misleading to refer, as I do in the title, to the finite element method. In this book, I describe in some detail the Galerkin finite element method for stationary (equilibrium) problems.

In the first part of the book, I derive the Galerkin finite element method, showing it to be the synthesis of three powerful ideas:

1. A boundary value problem (BVP) can be transformed into an equivalent form, called the weak or variational form, that can be approached by different methods, both analytical and computational, than those that apply to the original form of the problem.
2. The Galerkin method produces the best approximation, from a given approximating subspace, to the true solution of a variational problem. Moreover, this best approximation is the solution to a finite-dimensional system of equations.
3. When the approximating subspace in the Galerkin method is chosen to be a subspace of piecewise polynomial functions, the resulting algorithm is both efficient and effective: The system of equations can be formed and solved efficiently even when the number of unknowns is very large, and the resulting approximate solution can be highly accurate.

This chapter describes the classes of PDEs to which the finite element method will be applied.

## 1.1 Laplace's equation; elliptic BVPs

Laplace's equation is the PDE

$$-\Delta u = 0 \ \text{in}\ \Omega, \tag{1.1}$$

where the Laplace operator (or the Laplacian), $\Delta$, is defined by

$$\Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}$$

in two dimensions, or

$$\Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}$$

in three. Solutions of Laplace's equation are called harmonic functions. Much of the analysis and many aspects of the numerical methods covered in this book are the same whether the equation is posed in two or three spatial dimensions, but for definiteness I will describe the two-dimensional case, with some comments about three-dimensional problems. The inhomogeneous version of Laplace's equation,

$$-\Delta u = f \ \text{in}\ \Omega, \tag{1.2}$$

where $f$ is a function defined on $\Omega$, is called Poisson's equation.

Equations (1.1) and (1.2) are most commonly posed on a bounded domain $\Omega$ in $\mathbf{R}^2$. A domain is a connected open set. Connected means that the set consists of only one "piece," or, more precisely, that any two points in the set are joined by a curve lying entirely within the set. Open means that the boundary of the set is not a part of the set. Bounded means that the set is finite in extent, that is, that it can be enclosed by a circle with a finite radius. The boundary of $\Omega$ will be denoted by $\partial\Omega$, and the closure $\overline{\Omega}$ of $\Omega$ is the union of $\Omega$ and $\partial\Omega$.

The PDEs (1.1) and (1.2), by themselves, are insufficient to determine a unique solution; (1.1) and (1.2) have many solutions. A particular solution can be singled out by adding boundary conditions; as the reader will see, such conditions are natural in many physical problems. For example, if $f$ is a function defined on $\Omega$ and $g$ is a function defined on $\partial\Omega$, then

$$-\Delta u = f \ \text{in}\ \Omega, \tag{1.3a}$$

$$u = g \ \text{on}\ \partial\Omega \tag{1.3b}$$

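is called a Dirichlet BVP, and (1.3b) is referred to as a Dirichlet boundary condition. A Neumann BVP has the form

$$-\Delta u = f \ \text{in}\ \Omega, \qquad \frac{\partial u}{\partial n} = h \ \text{on}\ \partial\Omega,$$

where $\partial u/\partial n$ is the normal derivative of $u$ on $\partial\Omega$. If the vector $n(x, y)$ is the outward-pointing unit normal vector to $\partial\Omega$ at $(x, y) \in \partial\Omega$ and $\nabla u$ is the gradient of $u$,

$$\nabla u = \begin{bmatrix} \dfrac{\partial u}{\partial x} \\[2mm] \dfrac{\partial u}{\partial y} \end{bmatrix},$$

then the normal derivative is defined by

$$\frac{\partial u}{\partial n} = \nabla u \cdot n.$$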

Before trying to solve any mathematical problem, it is helpful to determine whether a solution exists and, if so, whether the solution is unique. These are the existence and uniqueness questions. These questions are particularly important when the problem is difficult to solve; one would not want to expend a lot of effort trying to compute something that does not exist!
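The Laplacian and the normal derivative introduced in this section can be checked numerically on simple functions. The following is a minimal Python sketch, not part of the book's MATLAB codes; the finite-difference helpers `laplacian_fd` and `normal_derivative_fd` and the two test functions are assumptions chosen for illustration. The function $x^2 - y^2$ is harmonic, while $x^2 + y^2$ has Laplacian 4 and, on the unit circle, normal derivative 2.

```python
import math


def laplacian_fd(u, x, y, h=1e-3):
    """Five-point finite-difference approximation to u_xx + u_yy at (x, y)."""
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
            - 4.0 * u(x, y)) / (h * h)


def normal_derivative_fd(u, x, y, n, h=1e-5):
    """Central-difference approximation to grad(u) . n at (x, y)."""
    ux = (u(x + h, y) - u(x - h, y)) / (2.0 * h)
    uy = (u(x, y + h) - u(x, y - h)) / (2.0 * h)
    return ux * n[0] + uy * n[1]


def saddle(x, y):
    return x * x - y * y      # harmonic: its Laplacian is 2 - 2 = 0


def bowl(x, y):
    return x * x + y * y      # not harmonic: its Laplacian is 2 + 2 = 4


lap_saddle = laplacian_fd(saddle, 0.3, 0.7)
lap_bowl = laplacian_fd(bowl, 0.3, 0.7)

# On the unit circle, the outward unit normal at a point is the point itself.
theta = 0.4
point = (math.cos(theta), math.sin(theta))
dudn = normal_derivative_fd(bowl, point[0], point[1], n=point)
```

Both difference formulas are exact for quadratic functions apart from round-off, so the computed values agree with 0, 4, and 2 to many digits.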


As I explain below, a Dirichlet BVP for Laplace's or Poisson's equation has a unique solution, as long as the functions $f$ and $g$ are reasonable. The situation with the Neumann problem is more subtle, and the existence and uniqueness questions are interrelated: If the functions $f$ and $h$ are compatible, then the Neumann BVP has infinitely many solutions, any two of which differ by a constant. On the other hand, if $f$ and $h$ are not compatible, then there is no solution. So either existence or uniqueness fails in the case of the Neumann problem. I will explain the compatibility condition below on physical grounds and derive it in the next chapter. The implications of the lack of uniqueness for computing solutions are discussed in Section 11.5 (see also Example 7.4).
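The dichotomy can be seen in a small worked example. The setting is an assumption made here for illustration and does not come from the text: take $\Omega$ to be the unit disk, with constant data $f = -4$ and $h = 2$. For any constant $C$, the function $u(x, y) = x^2 + y^2 + C$ satisfies

$$-\Delta u = -(2 + 2) = -4 \ \text{in}\ \Omega, \qquad \frac{\partial u}{\partial n} = \nabla u \cdot n = (2x, 2y) \cdot (x, y) = 2 \ \text{on}\ \partial\Omega,$$

since the outward unit normal at a point $(x, y)$ of the unit circle is $(x, y)$ itself. So there are infinitely many solutions, differing by constants. If instead $h = 0$ while $f = -4$, no solution can exist: the divergence theorem (reviewed in Chapter 2) gives

$$\int_\Omega \Delta u = \int_{\partial\Omega} \frac{\partial u}{\partial n},$$

and the left side would equal $4\pi$ while the right side would equal $0$.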

## 1.1.1 Physical experiments modeled by Laplace's equation

### Steady-state heat flow

The first application of Laplace's equation is to a flat metal plate occupying a domain $\Omega$ in $\mathbf{R}^2$. The function $u(x, y)$ represents the temperature at the point $(x, y) \in \Omega$. The plate is assumed to be insulated on the top and bottom, so heat can flow only in two dimensions. Such a plate has a third dimension, its thickness, but I will assume that neither the plate nor its temperature varies in the vertical direction, so that a two-dimensional model suffices. Laplace's equation,

$$-\Delta u = 0 \ \text{in}\ \Omega,$$

models the case of steady-state heat flow: The temperature is independent of time and the temperature gradient $\nabla u$ indicates the flow of heat energy across the plate. Poisson's equation,

$$-\Delta u = f \ \text{in}\ \Omega,$$

models steady-state heat flow with heat sources and/or sinks in the plate. If $f(x, y) > 0$ for some $(x, y) \in \Omega$, then heat energy is being added at that point at a rate $f(x, y)$ (in appropriate units). If $f(x, y) < 0$, then energy is being removed at $(x, y)$. In this context, Dirichlet boundary conditions,

$$u = g \ \text{on}\ \partial\Omega,$$

indicate that the temperature of the plate is held fixed at the boundary, specifically, that the temperature at $(x, y) \in \partial\Omega$ is held fixed at $g(x, y)$. The Dirichlet BVP

$$-\Delta u = f \ \text{in}\ \Omega, \qquad u = g \ \text{on}\ \partial\Omega$$

models the following situation: The plate is insulated on the top and bottom, the temperature at each point $(x, y)$ in the boundary is held fixed at the given value of $g(x, y)$, and the plate is allowed to reach equilibrium. The equilibrium temperature distribution is then given by the solution $u$ of the BVP. Neumann boundary conditions,

$$\frac{\partial u}{\partial n} = h \ \text{on}\ \partial\Omega,$$

indicate that the heat flux across the boundary is the prescribed value $h$. The heat flux is the flow of heat energy, in units of energy per time per length. In particular, the homogeneous Neumann condition, $\partial u/\partial n = 0$ on $\partial\Omega$, models the case of no heat flux: the boundary is insulated.

### Units and physical parameters

The equations described above are nondimensional versions of the PDEs; describing actual materials (such as an iron plate, for example) requires physical parameters. In the heat flow problem described above, the relevant parameter is the thermal conductivity $\kappa$. The thermal conductivity is the constant of proportionality in Fourier's law of heat conduction, which postulates that the heat flux is proportional to the temperature gradient:

$$\text{heat flux} = -\kappa \nabla u.$$

The units of $\kappa$ are energy per time per length per temperature. For example, the thermal conductivity of iron near 0 degrees Celsius is $\kappa = 0.836$ W/(cm K). Poisson's equation is then written as

$$-\kappa \Delta u = f \ \text{in}\ \Omega.$$

From this equation, the units of $f$ can be determined. They must be the same as the units of the left-hand side, which are energy per time per volume (for example, W/cm$^3$). The thermal conductivity $\kappa$ is positive by definition. The sign of $\kappa$ also has an important mathematical significance, as will be shown in the next chapter. In that chapter, it will become apparent why I prefer to include the negative sign explicitly in Laplace's and Poisson's equations. If the material is heterogeneous, then the thermal conductivity varies throughout $\Omega$: $\kappa = \kappa(x, y)$. The PDE becomes more complicated:

$$-\nabla \cdot (\kappa \nabla u) = f \ \text{in}\ \Omega.$$

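The divergence operator, denoted $\nabla\cdot$, is a partial differential operator that takes a vector-valued function and produces a scalar-valued function as follows: If

$$v(x, y) = \begin{bmatrix} v_1(x, y) \\ v_2(x, y) \end{bmatrix},$$

then

$$\nabla \cdot v = \frac{\partial v_1}{\partial x} + \frac{\partial v_2}{\partial y}.$$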
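The divergence of a vector field with components $v_1$ and $v_2$ is the sum $\partial v_1/\partial x + \partial v_2/\partial y$, and the heterogeneous heat operator applies it to the field $\kappa \nabla u$. The Python sketch below illustrates this under assumptions that are not in the text: the conductivity $\kappa(x, y) = 1 + x$ and the temperature $u = x^2$ are made up, and `divergence_fd` is a hypothetical central-difference helper. For these choices $\kappa \nabla u = (2x + 2x^2, 0)$, so the heat source is $f = -\nabla\cdot(\kappa\nabla u) = -(2 + 4x)$.

```python
def divergence_fd(field, x, y, h=1e-5):
    """Central-difference approximation to d(v1)/dx + d(v2)/dy at (x, y)."""
    d1 = (field(x + h, y)[0] - field(x - h, y)[0]) / (2.0 * h)
    d2 = (field(x, y + h)[1] - field(x, y - h)[1]) / (2.0 * h)
    return d1 + d2


def position_field(x, y):
    return (x, y)                 # divergence is 1 + 1 = 2 everywhere


def kappa(x, y):
    return 1.0 + x                # hypothetical heterogeneous conductivity


def grad_u(x, y):
    return (2.0 * x, 0.0)         # gradient of the made-up temperature u = x^2


def kappa_grad_u(x, y):
    k = kappa(x, y)
    gx, gy = grad_u(x, y)
    return (k * gx, k * gy)


div_position = divergence_fd(position_field, 0.2, 0.9)

# Heat source needed to sustain u at the point (0.5, 0.25): -(2 + 4 * 0.5) = -4.
f_at_point = -divergence_fd(kappa_grad_u, 0.5, 0.25)
```

The negative value of `f_at_point` means that, for this temperature profile, heat must be removed at that point to maintain equilibrium.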

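The appropriate form of the Neumann boundary condition, taking into account the physical characteristics of the material, is

$$\kappa \frac{\partial u}{\partial n} = h \ \text{on}\ \partial\Omega. \tag{1.5}$$

Since the heat flux, by Fourier's law, is $-\kappa \nabla u$ and

$$\kappa \frac{\partial u}{\partial n} = \kappa \nabla u \cdot n = (-\kappa \nabla u) \cdot (-n),$$

(1.5) simply says that the heat flux into $\Omega$ across $\partial\Omega$ is the prescribed value $h(x, y)$. The Neumann BVP

$$-\nabla \cdot (\kappa \nabla u) = f \ \text{in}\ \Omega, \qquad \kappa \frac{\partial u}{\partial n} = h \ \text{on}\ \partial\Omega$$

indicates that heat energy is being added to or taken away from the plate in two ways: in the interior (the effect of the heat source $f$) and across the boundary (the effect of the heat flux $h$). If the temperature $u$ is to be in equilibrium, it must be the case that the net amount of heat added is zero. This is expressed by the compatibility condition:

$$\int_\Omega f + \int_{\partial\Omega} h = 0.$$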

The first integral is the rate at which heat energy is added to the interior, while the second is the rate at which energy enters across the boundary. If the two integrals do not sum to zero, then existence fails (there is no solution) for the above BVP. This will be shown mathematically in Section 2.1.1. On the other hand, if there is a solution $u$, then it is clear from the equations that $u + C$ is also a solution for any constant $C$ (only derivatives of $u$ appear in the PDE and the boundary condition, so adding a constant to $u$ does not affect the equations). Therefore the solution, if it exists, is not unique. This is easy to understand on physical grounds: The compatibility condition states that no net energy is being added to or taken from the plate, but nothing in the BVP indicates how much total heat energy is in the plate. Adding a constant to the temperature $u$ changes the total amount of heat energy without changing the temperature gradient, on which the heat flux and the BVP depend.

### Small vertical deflections of a membrane

Another experiment modeled by Laplace's or Poisson's equation is the following: A membrane that occupies a domain $\Omega$ when at rest is fixed along the boundary and subjected to a small transverse pressure. The point on the membrane originally at $(x, y, 0)$, $(x, y) \in \Omega$, moves to $(x, y, u(x, y))$ under the influence of the pressure. It is a simplifying assumption that the point moves only in the vertical direction. This is not exactly true, but it will be nearly true if the pressure is small enough. Dirichlet conditions in this application indicate that the boundary of the membrane is fixed. For example,

$$u = 0 \ \text{on}\ \partial\Omega$$

means that the boundary is fixed in the original (horizontal) plane. An inhomogeneous Dirichlet condition, such as

$$u = g \ \text{on}\ \partial\Omega,$$

means that the membrane is stretched on a frame whose shape is determined by the boundary function $g$. In this context, a homogeneous Neumann boundary condition,

$$\frac{\partial u}{\partial n} = 0 \ \text{on}\ \partial\Omega,$$

indicates that the boundary is free to move in the vertical direction. This condition is not physically plausible when applied to the entire boundary, but the following mixed boundary conditions describe a meaningful experiment:

$$u = 0 \ \text{on}\ \Gamma_1, \qquad \frac{\partial u}{\partial n} = 0 \ \text{on}\ \Gamma_2.$$

Here $\Gamma_1$ and $\Gamma_2$ form a partition of the boundary $\partial\Omega$, and the boundary conditions indicate that part of the boundary ($\Gamma_1$) is fixed, while the remainder ($\Gamma_2$) is free to move up and down. Again, I have presented the nondimensional version of the equations. When taking into account the physical characteristics of the membrane, the relevant quantity is the tension $T$ in the membrane, and Poisson's equation takes the form

$$-T \Delta u = f \ \text{in}\ \Omega.$$

A constant $T$ means that the tension is the same throughout the membrane.
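Returning to the heat flow Neumann problem, the compatibility condition (the integral of $f$ over the domain plus the integral of $h$ over the boundary must vanish) can be checked numerically. The Python sketch below does so under assumptions not in the text: the domain is the unit square, $\kappa = 1$, and the data are manufactured from the made-up temperature $u = x^3 + x y^2$, so that $f = -\Delta u = -8x$ and $h = \nabla u \cdot n$ on each edge. The midpoint-rule helpers are hypothetical names.

```python
def midpoint_1d(g, n=400):
    """Midpoint rule for the integral of g over the interval (0, 1)."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))


def midpoint_2d(g, n=200):
    """Midpoint rule for the integral of g over the unit square."""
    h = 1.0 / n
    return h * h * sum(g((i + 0.5) * h, (j + 0.5) * h)
                       for i in range(n) for j in range(n))


def f(x, y):
    return -8.0 * x              # f = -Laplacian(u) for u = x^3 + x y^2


# h = grad(u) . n on each edge, with u_x = 3 x^2 + y^2 and u_y = 2 x y.
edge_fluxes = [
    lambda t: 3.0 + t * t,       # right edge  x = 1, n = (1, 0)
    lambda t: -(t * t),          # left edge   x = 0, n = (-1, 0)
    lambda t: 2.0 * t,           # top edge    y = 1, n = (0, 1)
    lambda t: 0.0,               # bottom edge y = 0, n = (0, -1)
]

interior = midpoint_2d(f)                            # heat added inside
boundary = sum(midpoint_1d(g) for g in edge_fluxes)  # heat entering the edges
imbalance = interior + boundary                      # compatible data: about 0

# Same source with an insulated boundary (h = 0): the data are incompatible.
imbalance_insulated = interior + 0.0
```

The interior integral is $-4$ and the boundary integral is $4$, so the manufactured data are compatible. With the same source and an insulated boundary the imbalance is $-4$, and no equilibrium temperature exists.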

## 1.2 Other elliptic BVPs

Laplace's equation is the prototypical elliptic PDE. Elliptic PDEs describe certain equilibrium phenomena and have mathematical properties that will be described in the next chapter. Here I will simply give some more examples of elliptic PDEs.

## 1.2.1 The equations of isotropic elasticity

An elastic membrane is said to be isotropic if its elastic response is the same in every direction. This means that if the membrane is stretched by a certain traction, or rotated about a point and then stretched by the same traction, the response in the two experiments will be the same. In two-dimensional linear elasticity, one models small planar deformations of an elastic membrane, and the unknown is the displacement of the material from a reference position. This displacement is a vector-valued quantity:

$$u(x, y) = \begin{bmatrix} u_1(x, y) \\ u_2(x, y) \end{bmatrix}.$$

The displacement $u$ has the following meaning: Under the applied load, the point of the membrane originally at $(x, y)$ moves to the location $(x + u_1(x, y),\ y + u_2(x, y))$.

When the membrane is isotropic, it is described by two scalar quantities called the Lamé moduli, $\mu$ and $\lambda$. The Lamé moduli are constants if the membrane is homogeneous and functions of $(x, y)$ if it is heterogeneous. Those familiar with engineering mechanics may be accustomed to describing the elastic properties of a material in terms of Young's modulus $E$ and Poisson's ratio $\nu$. The relationship between the Lamé moduli and $E$ and $\nu$ is explored in the exercises at the end of this chapter. Since there are two unknown functions, $u_1$ and $u_2$, there are two PDEs that together model the stretching of the membrane under an applied load. These are usually written in vector form as follows:

$$-\nabla \cdot \sigma = f \ \text{in}\ \Omega, \tag{1.6a}$$

$$\sigma = 2\mu\epsilon + \lambda\, \mathrm{tr}(\epsilon) I, \tag{1.6b}$$

$$\epsilon = \frac{1}{2}\left(\nabla u + \nabla u^T\right), \tag{1.6c}$$

where $I$ denotes the $2 \times 2$ identity matrix.

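I will now identify each term in these equations. The gradient of a vector-valued function $u$ is

$$\nabla u = \begin{bmatrix} \dfrac{\partial u_1}{\partial x} & \dfrac{\partial u_1}{\partial y} \\[2mm] \dfrac{\partial u_2}{\partial x} & \dfrac{\partial u_2}{\partial y} \end{bmatrix}$$

(this is called the Jacobian matrix in other contexts). The quantity $\epsilon$ is the (linearized) strain tensor, a measure of the local deformation of the membrane. The trace of $\epsilon$, $\mathrm{tr}(\epsilon)$, is the sum of the diagonal entries of $\epsilon$:

$$\mathrm{tr}(\epsilon) = \epsilon_{11} + \epsilon_{22} = \frac{\partial u_1}{\partial x} + \frac{\partial u_2}{\partial y}.$$

The tensor $\sigma$ is called the stress tensor. It measures the elastic response of the membrane to the deformation described by the strain. In two dimensions, $\sigma$ has units of force per length; the units become force per area in three dimensions. The divergence of a tensor is the vector whose components are the divergences of the rows of the tensor:

$$\nabla \cdot \sigma = \begin{bmatrix} \dfrac{\partial \sigma_{11}}{\partial x} + \dfrac{\partial \sigma_{12}}{\partial y} \\[2mm] \dfrac{\partial \sigma_{21}}{\partial x} + \dfrac{\partial \sigma_{22}}{\partial y} \end{bmatrix}.$$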

The stress-strain relationship, expressed by (1.6b), is sometimes called the constitutive hypothesis. It is an assumption about the response of the particular material. On the other hand, (1.6a) is the balance law, which is the same for all materials. The function / in (1.6a) is the applied load, in units offeree per area. It might be more natural to express the body force in units offeree per mass, in which case the right-hand side of (1.6a) would be p~] f ( x , >'), where p is the density of the membrane, expressed in mass per area. As an exercise to verify that the notation is understood, the reader can assume that n and A. are constants and check that (1.6a)-(1.6c) are equivalent to the two scalar PDEs


However, as I will show in later chapters, it is most convenient to work with the equations in vector form. The right-hand side of PDE (1.6a) represents a body force acting on the interior of the membrane. A body force is a force that acts at a distance, such as gravity or an electromagnetic force. Often, in a membrane problem, there is no body force. Typically the load is introduced by a traction (applied stress) on the boundary, which leads to the boundary condition

$$\sigma n = h \ \text{on } \partial\Omega.$$

Frequently the traction is applied to only part of the boundary, while the remainder is fixed (that is, not allowed to move). Therefore, a natural BVP for a membrane has mixed boundary conditions:

$$-\nabla\cdot\sigma = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \Gamma_1, \qquad \sigma n = h \ \text{on } \Gamma_2$$

($\Gamma_1$ and $\Gamma_2$ form a partition of $\partial\Omega$). These boundary conditions model a simple experiment, in which part of the boundary is held motionless and the membrane is stretched by a traction applied to the rest of the boundary. It is also natural to consider the pure traction problem, in which a traction is applied to the entire boundary. This is a Neumann problem, and the comments from page 5 apply: either existence or uniqueness fails. More precisely, a solution does not exist unless the traction $h$ satisfies a compatibility condition; if the compatibility condition is satisfied, then there are infinitely many solutions, any two of which differ by a rigid body motion (see Exercise 5). This will be explored in Section 2.5.2. The equations of isotropic elasticity for a three-dimensional elastic solid also take the form (1.6) when written in vector form. In that case, the displacement $u$ has three components and $\nabla u$, $\varepsilon$, and $\sigma$ are all $3 \times 3$.
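As a minimal sketch of the isotropic stress-strain law, the following Python fragment assumes the standard form $\sigma = 2\mu\varepsilon + \lambda\,\mathrm{tr}(\varepsilon) I$ with $\varepsilon = (\nabla u + \nabla u^T)/2$. The function name `isotropic_stress`, the numerical values of the moduli, and the two sample displacement gradients (a pure shear and a uniform expansion) are illustrative assumptions, not taken from the text.

```python
def isotropic_stress(grad_u, mu, lam):
    """Return (strain, stress) for a 2x2 displacement gradient (nested lists)."""
    # Linearized strain: symmetric part of the displacement gradient.
    eps = [[0.5 * (grad_u[i][j] + grad_u[j][i]) for j in range(2)]
           for i in range(2)]
    trace = eps[0][0] + eps[1][1]
    # Isotropic law: sigma = 2*mu*eps + lam*tr(eps)*I.
    sigma = [[2.0 * mu * eps[i][j] + (lam * trace if i == j else 0.0)
              for j in range(2)] for i in range(2)]
    return eps, sigma

mu, lam, delta = 3.0, 5.0, 1e-3  # illustrative values only

# Pure shear u(x, y) = (delta*y, 0): gradient [[0, delta], [0, 0]].
eps_shear, sig_shear = isotropic_stress([[0.0, delta], [0.0, 0.0]], mu, lam)

# Uniform expansion u(x, y) = delta*(x - x0, y - y0): gradient delta*I.
eps_exp, sig_exp = isotropic_stress([[delta, 0.0], [0.0, delta]], mu, lam)
```

Under these assumptions the shear case gives $\sigma = 2\mu\varepsilon$ and the expansion case gives $\sigma = 2(\mu + \lambda)\varepsilon$, which is why $\mu$ and $\mu + \lambda$ are called the shear and bulk moduli in the exercises.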

## 1.2.2 General linear elasticity

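For an elastic membrane that is not assumed to be isotropic, the stress-strain relationship is more complicated. Assuming a linear relationship, there exists a 4-tensor $A = (A_{ijkl})$ such that

$$\sigma = A\varepsilon, \tag{1.8}$$

that is,

$$\sigma_{ij} = \sum_{k,l=1}^{2} A_{ijkl}\,\varepsilon_{kl}, \quad i, j = 1, 2.$$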

In Exercise 3, the reader is asked to show that (1.6b) is a special case of (1.8). The tensor A is assumed to satisfy the symmetry conditions

where the dot product of two 2-tensors $\varepsilon$, $\sigma$ is defined by

$$\varepsilon \cdot \sigma = \sum_{i,j=1}^{2} \varepsilon_{ij}\,\sigma_{ij}.$$

Exercise 4 discusses assumptions (1.9a) and (1.9b). The third symmetry condition, (1.9c), results from the assumption that the material in question is not merely elastic but hyperelastic. For a discussion of hyperelasticity, see Gurtin [24]. Condition (1.10) is natural in many settings; for example, Exercise 9 explores it for isotropic elasticity. Mathematically, it guarantees that the resulting PDE is elliptic, as will be shown in the next chapter. The boundary conditions described above for an isotropic membrane have the same meanings for an anisotropic membrane. Once again, the equations for a three-dimensional elastic body have exactly the same form as for a two-dimensional elastic membrane, with the indices describing the tensors taking the values 1,2,3 instead of just 1,2.

## 1.3 Exercises for Chapter 1

1. Write the expression $\nabla \cdot (\kappa \nabla u)$ explicitly in terms of partial derivatives and show that

2. Show that (1.6) is equivalent to (1.7) when $\mu$ and $\lambda$ are constant.

3. Show that (1.6b) can be expressed in the form (1.8). What is the tensor $A$? (Notice that a general 4-tensor on two-dimensional space is determined by the $2^4 = 16$ entries $A_{ijkl}$. However, the symmetry conditions (1.9) imply that such an $A$ has only six independent entries, namely, $A_{1111}$, $A_{1112}$, $A_{1122}$, $A_{1212}$, $A_{1222}$, $A_{2222}$.)

4. (a) Show that if $A$ is a 4-tensor, then $A\varepsilon = \bar{A}\varepsilon$ for all symmetric 2-tensors $\varepsilon$, where $\bar{A}$ is defined by $\bar{A}_{ijkl} = (A_{ijkl} + A_{ijlk})/2$ for all $i, j, k, l$. It follows that there is no loss of generality in assuming (1.9a).
(b) Show that if (1.9a) holds and $\sigma = A\varepsilon$ is symmetric for each symmetric $\varepsilon$, then (1.9b) must hold. (Note: The symmetry of the stress tensor $\sigma$ follows from the principle of conservation of angular momentum; see Gurtin [24, page 101].)

5. The purpose of this exercise is to determine all displacements $u$ with the property that the corresponding strain $\varepsilon$ is zero. Since $\varepsilon$ measures the local deformation of the material, it would appear that rigid displacements, arising from translations and rotations of the membrane, would lead to zero strain. However, $\varepsilon$ is the linearized strain, or the linear approximation to the actual strain, so this is not quite correct.

(a) Show that $\varepsilon$ is zero if and only if $u$ has the form

where $\delta_1$, $\delta_2$, and $\delta_3$ are constants. The first term represents a translation, as expected.
(b) Show that if $u$ results from a rotation of angle $\delta$ (about the origin, for simplicity), then

$$u(x, y) = \begin{bmatrix} x\cos(\delta) - y\sin(\delta) - x \\ x\sin(\delta) + y\cos(\delta) - y \end{bmatrix}. \tag{1.12}$$

To first order, $\cos(\delta) = 1$ and $\sin(\delta) = \delta$, so (1.12) can be approximated by

$$u(x, y) \approx \begin{bmatrix} -\delta y \\ \delta x \end{bmatrix}$$

when $\delta$ is small. This explains the significance of the second term in (1.11). Displacements of the form (1.11) are called infinitesimal rigid displacements.

6. (cf. Gurtin [24, Section 31]) This exercise examines some simple displacements of an isotropic membrane in order to demonstrate the physical meaning of the Lamé moduli $\mu$ and $\lambda$. In the following examples, $\delta$ represents a small positive number.
(a) Let the displacement $u$ be given by $u(x, y) = (\delta y, 0)$ (a pure shear). Show that $\sigma = 2\mu\varepsilon$. For this reason, $\mu$ is called the shear modulus.
(b) Let $(x_0, y_0)$ be any point in $\Omega$ and consider the displacement

$$u(x, y) = \delta \begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix}$$

(a pure expansion). Show that, in this case, $\sigma = 2(\mu + \lambda)\varepsilon$. The constant $\mu + \lambda$ is called the bulk modulus.
(c) Let the stress $\sigma$ be given by

(so that the tension is all in the $x$-direction, a purely longitudinal tension). Find the most general displacement $u$ that corresponds to such a stress, and show that this $u$ can be written as

It follows that

That is, the stress in the $x$-direction is just $E$ times the strain in the $x$-direction. The constant of proportionality $E$ is called Young's modulus. Moreover,

so that $\nu$ gives the ratio of the lateral contraction to the longitudinal expansion. The constant $\nu$ is called Poisson's ratio.$^1$

7. From (1.13), find formulas for $\mu$ and $\lambda$ in terms of $E$ and $\nu$.

8. At the end of Section 1.1, I discussed the small vertical deflection of a membrane which is in a state of constant (planar) tension $T > 0$. In the language of Section 1.2.1, this means that $\sigma = TI$, where $I$ is the identity, so that the tension is $\sigma\nu = T\nu$ in every direction $\nu$.
(a) Show that if the displacement $u$ is given by

(a pure expansion; see Exercise 6), then $\sigma = TI$.
(b) Show that (1.14) is a solution to the traction problem

9. Show that (1.10) holds for isotropic elasticity if and only if $\mu > 0$ and $\mu + \lambda > 0$ (that is, if and only if the shear and bulk moduli are positive).

$^1$ The formulas for $E$ and $\nu$ in terms of $\mu$ and $\lambda$ are specific to a two-dimensional model. For a three-dimensional elastic body, the relationships are

$$E = \frac{\mu(3\lambda + 2\mu)}{\lambda + \mu}, \qquad \nu = \frac{\lambda}{2(\lambda + \mu)}.$$

## Chapter 2. The weak form of a BVP

In this chapter, I show how to rewrite a BVP in its weak or variational form, from which the finite element method is derived. The first example will be the model problem

where $\Omega$ is a bounded domain in $\mathbf{R}^2$. Later in the chapter, the calculations will be extended to more complicated PDEs and different boundary conditions. Deriving the weak form of (2.1) requires the use of some vector calculus.

## 2.1 Review of vector calculus

## 2.1.1 The divergence theorem

The main result from vector calculus that we need is the divergence theorem, which is the multidimensional analogue of the fundamental theorem of calculus. In the following discussion, I describe the results in two dimensions, but the beauty of vector notation is that the results hold without change in three or more dimensions. If $\Omega$ is a domain in $\mathbf{R}^2$ with a smooth or piecewise smooth boundary and $F$ is a vector field defined on $\Omega$, then the divergence theorem states that

$$\int_\Omega \nabla \cdot F = \int_{\partial\Omega} F \cdot n,$$

where $n$ is the outward-pointing unit normal vector to $\partial\Omega$. (The normal vector is a function of $(x, y) \in \partial\Omega$: $n = n(x, y)$.) The integral on the left can also be written as a double integral,

$$\iint_\Omega \nabla \cdot F \, dx\, dy,$$

and can also be referred to as an area integral. The integral on the right is an integral over the boundary of $\Omega$, which is a curve, and thus can be referred to as a line integral. In order for the divergence theorem to hold, the vector field $F$ must be smooth enough. By taking the vector field $F$ to be of the form

$$F = \begin{bmatrix} \phi \\ 0 \end{bmatrix}$$

for a smooth scalar function $\phi$, it follows from the divergence theorem that

$$\int_\Omega \frac{\partial \phi}{\partial x} = \int_{\partial\Omega} \phi\, n_1, \tag{2.3}$$

where $n_1$ is the first component of the outward-pointing unit normal vector. Similarly,

$$\int_\Omega \frac{\partial \phi}{\partial y} = \int_{\partial\Omega} \phi\, n_2. \tag{2.4}$$

The divergence theorem relates a quantity defined on the boundary to another quantity defined on the interior of the domain. This explains why the divergence operator appears in the PDEs described in the previous chapter. An example is the steady-state heat equation:

$$-\nabla \cdot (\kappa \nabla u) = f \ \text{in } \Omega.$$

It is derived by writing the total amount of heat entering an arbitrary subdomain $\omega \subset \Omega$ in two ways: in terms of the heat source or sink $f$, and in terms of the heat flowing across the boundary, due to the heat flux $-\kappa\nabla u$. The two must sum to zero, since the temperature is assumed to be in equilibrium:

$$\int_\omega f + \int_{\partial\omega} \kappa \nabla u \cdot n = 0.$$

The divergence theorem is then invoked, yielding

$$\int_\omega \left( f + \nabla \cdot (\kappa \nabla u) \right) = 0.$$

Since this holds for every subdomain $\omega \subset \Omega$, a little analysis shows that the PDE

$$-\nabla \cdot (\kappa \nabla u) = f \ \text{in } \Omega$$

must hold. Although this derivation is sketchy, I hope it illustrates the connection between the divergence theorem and many common PDEs and also explains why the divergence and Laplace operators are fundamental.
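The divergence theorem can be checked numerically on a simple domain. The sketch below assumes the unit square as $\Omega$ and a hypothetical vector field $F = (x^2, xy)$, neither of which comes from the text, and compares the area integral of $\nabla \cdot F$ with the line integral of $F \cdot n$ using the midpoint rule.

```python
def midpoint_1d(g, n):
    """Midpoint rule for the integral of g over (0, 1) with n cells."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))

def midpoint_2d(g, n):
    """Tensor-product midpoint rule over the unit square."""
    h = 1.0 / n
    return h * h * sum(g((i + 0.5) * h, (j + 0.5) * h)
                       for i in range(n) for j in range(n))

# Hypothetical vector field F = (x^2, x*y); its divergence is 2x + x = 3x.
F = lambda x, y: (x * x, x * y)
div_F = lambda x, y: 3.0 * x

n = 50
area_integral = midpoint_2d(div_F, n)

# Line integral of F . n over the four edges, using the outward normals.
line_integral = (
    midpoint_1d(lambda y: F(1.0, y)[0], n)     # right edge,  n = ( 1,  0)
    - midpoint_1d(lambda y: F(0.0, y)[0], n)   # left edge,   n = (-1,  0)
    + midpoint_1d(lambda x: F(x, 1.0)[1], n)   # top edge,    n = ( 0,  1)
    - midpoint_1d(lambda x: F(x, 0.0)[1], n)   # bottom edge, n = ( 0, -1)
)
print(area_integral, line_integral)
```

Both integrals equal $3/2$ for this field. Because every integrand here is constant or linear, the midpoint rule reproduces that value up to rounding error.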

## The compatibility condition for the Neumann problem

In the previous chapter, I described the compatibility condition for the pure Neumann problem

Mathematically, the compatibility condition follows from the divergence theorem. If u is a solution to (2.5), then

Therefore,

which is the compatibility condition given in the previous chapter.
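As a sketch of this argument, assume (this specific form is an assumption here) that the pure Neumann problem reads $-\nabla \cdot (\kappa \nabla u) = f$ in $\Omega$ with $\kappa\, \partial u/\partial n = h$ on $\partial\Omega$. Integrating the PDE over $\Omega$ and applying the divergence theorem then gives the compatibility condition in one line:

```latex
\begin{align*}
\int_\Omega f
  = -\int_\Omega \nabla \cdot (\kappa \nabla u)
  = -\int_{\partial\Omega} \kappa \nabla u \cdot n
  = -\int_{\partial\Omega} \kappa \frac{\partial u}{\partial n}
  = -\int_{\partial\Omega} h
\quad\Longrightarrow\quad
\int_\Omega f + \int_{\partial\Omega} h = 0 .
\end{align*}
```

With a different sign convention for the PDE or the boundary data, the same steps apply and only the signs in the final condition change.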

## 2.1.2 Green's identity

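The weak form of a BVP is derived using Green's identity, which is the multidimensional analogue of integration by parts. To understand Green's identity, it is useful to review integration by parts, which is obtained from the product rule for differentiation and the fundamental theorem of calculus:

$$\int_a^b \frac{d}{dx}\left[ u v \right] dx = u v \Big|_a^b \quad\Longrightarrow\quad \int_a^b \frac{du}{dx} v \, dx = u v \Big|_a^b - \int_a^b u \frac{dv}{dx} \, dx.$$

Green's identity follows from the following product rule in multiple dimensions:

$$\nabla \cdot (v \nabla u) = \nabla v \cdot \nabla u + v \Delta u. \tag{2.6}$$

This rule can be derived, using the ordinary product rule, by writing out the left side in coordinates (see Exercise 1.3.1).

Green's identity is obtained from (2.6) by integrating both sides over $\Omega$ and applying the divergence theorem:

$$\int_\Omega \nabla v \cdot \nabla u + \int_\Omega v \Delta u = \int_\Omega \nabla \cdot (v \nabla u) = \int_{\partial\Omega} v \nabla u \cdot n.$$

Replacing $\nabla u \cdot n$ with $\partial u/\partial n$ and rearranging yields Green's (first) identity:

$$\int_\Omega v \Delta u = \int_{\partial\Omega} v \frac{\partial u}{\partial n} - \int_\Omega \nabla u \cdot \nabla v. \tag{2.7}$$

In Section 2.2, I will show how (2.7) is used. As in the case of the divergence theorem, the primary use of Green's identity is not to evaluate specific integrals, but rather to derive useful formulas.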

## 2.1.3 Other forms of the divergence theorem and Green's identity

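Equations (2.3) and (2.4) lead to the following identities, which look very much like integration by parts in one dimension:

$$\int_\Omega \frac{\partial u}{\partial x} v = \int_{\partial\Omega} u v\, n_1 - \int_\Omega u \frac{\partial v}{\partial x}, \tag{2.8a}$$
$$\int_\Omega \frac{\partial u}{\partial y} v = \int_{\partial\Omega} u v\, n_2 - \int_\Omega u \frac{\partial v}{\partial y}. \tag{2.8b}$$

These formulas will be essential in the next section. The reader will recall from Section 1.2.1 that the divergence of a tensor is computed by taking the divergence of each row of the tensor. Since the integral of a vector-valued function is computed by taking the integral of each component of the function, the following version of the divergence theorem is valid:

$$\int_\Omega \nabla \cdot \sigma = \int_{\partial\Omega} \sigma n. \tag{2.9}$$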

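I will need a version of Green's identity that applies to the integral

$$\int_\Omega (\nabla \cdot \sigma) \cdot v,$$

where $v$ is a vector-valued function. Exercise 3 asks the reader to verify that if $\sigma$ is a 2-tensor and $v$ is a vector, then

$$\nabla \cdot (\sigma^T v) = (\nabla \cdot \sigma) \cdot v + \sigma \cdot \nabla v.$$

When $\sigma$ is symmetric ($\sigma_{12} = \sigma_{21}$), as is the case when $\sigma$ is a stress tensor, then

$$\sigma^T v = \sigma v \quad\text{and}\quad \sigma \cdot \nabla v = \sigma \cdot \nabla v^T,$$

and thus

$$\sigma \cdot \nabla v = \sigma \cdot \frac{1}{2}\left( \nabla v + \nabla v^T \right) = \sigma \cdot \varepsilon_v.$$

The reader will recall from Section 1.2.1 that $\varepsilon_v$ is the strain tensor associated with the displacement $v$. The dependence of $\varepsilon_v$ on $v$ is explicitly indicated, because shortly I will refer to two displacements $u$ and $v$ and their associated strain tensors. I can now derive the needed extension of Green's identity (still assuming that $\sigma$ is symmetric):

$$\int_\Omega (\nabla \cdot \sigma) \cdot v = \int_\Omega \nabla \cdot (\sigma v) - \int_\Omega \sigma \cdot \varepsilon_v = \int_{\partial\Omega} (\sigma v) \cdot n - \int_\Omega \sigma \cdot \varepsilon_v.$$

Since $\sigma$ is assumed to be symmetric, $(\sigma v) \cdot n = (\sigma n) \cdot v$ holds, yielding the desired extension of Green's identity:

$$\int_\Omega (\nabla \cdot \sigma) \cdot v = -\int_\Omega \sigma \cdot \varepsilon_v + \int_{\partial\Omega} (\sigma n) \cdot v. \tag{2.10}$$

If $\sigma = \sigma_u$ is the stress tensor arising in the linear elasticity model, then $\sigma_u = A\varepsilon_u$ holds and the first integral on the right in (2.10) can be written as

$$\int_\Omega \sigma_u \cdot \varepsilon_v = \int_\Omega A\varepsilon_u \cdot \varepsilon_v.$$

This yields

$$\int_\Omega (\nabla \cdot \sigma_u) \cdot v = -\int_\Omega A\varepsilon_u \cdot \varepsilon_v + \int_{\partial\Omega} (\sigma_u n) \cdot v. \tag{2.11}$$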

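It should be noticed that the 4-tensor $A$ can be either a constant or a function of $(x, y)$; (2.11), written as it is, holds in either case. It will be useful to also write (2.7) in a form that allows for a nonconstant coefficient:

$$\int_\Omega v\, \nabla \cdot (\kappa \nabla u) = \int_{\partial\Omega} \kappa v \frac{\partial u}{\partial n} - \int_\Omega \kappa \nabla u \cdot \nabla v. \tag{2.12}$$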

The reader can verify that (2.12) follows from the product rule and the divergence theorem. For more details about the vector calculus reviewed in this section, see Kaplan [26], which gives a straightforward introduction. An alternative at the same level is Greenberg [22]. A more advanced treatment can be found in Marsden and Tromba [30].

## 2.2 The weak form of a BVP

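This section introduces the weak form of a BVP, one of the key ingredients of the finite element method. I will begin with the following Dirichlet problem:

$$-\nabla \cdot (\kappa \nabla u) = f \ \text{in } \Omega, \tag{2.13a}$$
$$u = 0 \ \text{on } \partial\Omega. \tag{2.13b}$$

If $f$ is continuous and $u$ is a solution of (2.13), then it is natural to expect that $u$ and its partial derivatives of orders one and two are all continuous on $\overline{\Omega}$, and, of course, $u$ is zero on $\partial\Omega$. The space $C^k(\overline{\Omega})$ is defined to be the set of all real-valued functions $u$ defined on $\overline{\Omega}$ with the property that $u$ and its partial derivatives up to order $k$ are all continuous on $\overline{\Omega}$. A solution to (2.13) is sought in the subspace

$$C^2_D(\overline{\Omega}) = \left\{ u \in C^2(\overline{\Omega}) \ : \ u = 0 \ \text{on } \partial\Omega \right\}$$

(the subscript "D" stands for "Dirichlet"). If $u$ is a solution to (2.13), then

$$-\nabla \cdot (\kappa \nabla u) = f \ \text{in } \Omega,$$

and therefore, for any function $v$ defined on $\Omega$, multiplying both sides of the PDE by $v$ yields

$$-\nabla \cdot (\kappa \nabla u)\, v = f v \ \text{in } \Omega.$$

Since the two functions $-\nabla \cdot (\kappa \nabla u)\, v$ and $f v$ are equal on $\Omega$, their integrals over $\Omega$ must agree:

$$-\int_\Omega \nabla \cdot (\kappa \nabla u)\, v = \int_\Omega f v. \tag{2.14}$$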

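In this context, the function $v$ is referred to as a test function. The idea is to check whether the PDE holds in the (weighted) average sense over $\Omega$, using the test function $v$ to define the weights in the average. Obviously, just because (2.14) holds for a particular test function $v$ is no reason to think that the PDE (2.13a) holds. However, as I will now explain, if (2.14) holds for all test functions $v$ from a sufficiently large set, then (2.13a) must hold. The ball of radius $\delta$ centered at $(x_0, y_0)$ is denoted $B_\delta(x_0, y_0)$:

$$B_\delta(x_0, y_0) = \left\{ (x, y) \in \mathbf{R}^2 \ : \ (x - x_0)^2 + (y - y_0)^2 < \delta^2 \right\}.$$

Suppose $(x_0, y_0) \in \Omega$ and $\delta > 0$ is small enough that $B_\delta(x_0, y_0)$ is contained entirely in $\Omega$. Consider any function $v \in C^2_D(\overline{\Omega})$ with the following properties:

$$v(x, y) > 0 \ \text{for } (x, y) \in B_\delta(x_0, y_0), \qquad v(x, y) = 0 \ \text{for } (x, y) \notin B_\delta(x_0, y_0), \qquad \int_\Omega v = 1.$$

It is not difficult to show that many such functions $v$ exist; in fact, I construct such a function in Section 2.2.2. But then

$$\int_\Omega f v = \int_{B_\delta(x_0, y_0)} f v$$

is just a weighted average of $f$ over the disk $B_\delta(x_0, y_0)$, and similarly

$$-\int_\Omega \nabla \cdot (\kappa \nabla u)\, v = -\int_{B_\delta(x_0, y_0)} \nabla \cdot (\kappa \nabla u)\, v$$

is a weighted average of $-\nabla \cdot (\kappa \nabla u)$ over the same disk. If $\delta$ is very small, then

$$\int_\Omega f v \approx f(x_0, y_0)$$

and

$$-\int_\Omega \nabla \cdot (\kappa \nabla u)\, v \approx -\nabla \cdot (\kappa \nabla u)(x_0, y_0).$$

Moreover, in the limit as $\delta \to 0$, these become exact equations. Therefore, if the space of test functions contains all such functions $v$, and (2.14) holds for all test functions, then the original PDE must hold at every $(x_0, y_0) \in \Omega$. By the above reasoning, it is sufficient to take the space of test functions to be $C^2_D(\overline{\Omega})$ (which contains the above-described test functions, as well as many others). Therefore, (2.14) holds for some $u \in C^2_D(\overline{\Omega})$ and for all $v \in C^2_D(\overline{\Omega})$ if and only if $u$ satisfies the BVP (2.13). The next step is to apply Green's identity to the left-hand side of (2.14):

$$-\int_\Omega \nabla \cdot (\kappa \nabla u)\, v = \int_\Omega \kappa \nabla u \cdot \nabla v - \int_{\partial\Omega} \kappa v \frac{\partial u}{\partial n} = \int_\Omega \kappa \nabla u \cdot \nabla v$$

(the boundary integral vanishes because $v$ is zero on $\partial\Omega$). This leads to the weak form of BVP (2.13):

$$\text{find } u \in C^2_D(\overline{\Omega}) \ \text{such that} \ \int_\Omega \kappa \nabla u \cdot \nabla v = \int_\Omega f v \ \text{for all } v \in C^2_D(\overline{\Omega}). \tag{2.15}$$

As I have argued above, (2.13) and (2.15) are equivalent: A function $u \in C^2_D(\overline{\Omega})$ satisfies one if and only if it satisfies the other. Problem (2.15) is also referred to as a variational problem and as the variational form of (2.13). Next I want to explain the reasons for using the terms "variational" and "weak" to describe (2.15), since these reasons are quite instructive.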
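The weak form can be tested numerically for a known solution. The following sketch assumes a concrete setting that is not from the text: the unit square, $\kappa = 1$, the solution $u = \sin(\pi x)\sin(\pi y)$ (so that $f = 2\pi^2 u$), and the single test function $v = x(1-x)y(1-y)$, which vanishes on the boundary. It checks that $\int_\Omega \nabla u \cdot \nabla v$ and $\int_\Omega f v$ agree up to quadrature error.

```python
import math

def midpoint_2d(g, n):
    """Tensor-product midpoint rule over the unit square."""
    h = 1.0 / n
    return h * h * sum(g((i + 0.5) * h, (j + 0.5) * h)
                       for i in range(n) for j in range(n))

pi = math.pi

# Assumed data: kappa = 1, u = sin(pi x) sin(pi y), f = -Laplacian(u).
grad_u = lambda x, y: (pi * math.cos(pi * x) * math.sin(pi * y),
                       pi * math.sin(pi * x) * math.cos(pi * y))
f = lambda x, y: 2.0 * pi**2 * math.sin(pi * x) * math.sin(pi * y)

# Test function vanishing on the boundary of the unit square.
v = lambda x, y: x * (1 - x) * y * (1 - y)
grad_v = lambda x, y: ((1 - 2 * x) * y * (1 - y),
                       x * (1 - x) * (1 - 2 * y))

n = 100
lhs = midpoint_2d(
    lambda x, y: sum(a * b for a, b in zip(grad_u(x, y), grad_v(x, y))), n)
rhs = midpoint_2d(lambda x, y: f(x, y) * v(x, y), n)
exact = 32.0 / pi**4  # both integrals can be evaluated by hand
```

A single test function proves nothing about the PDE, as noted above; the check only illustrates that the two sides of the weak form coincide for the true solution.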

## 2.2.1 Minimization of energy

When the PDE (2.13a) models a mechanical system in which $u$ is the displacement and $f$ is an external body force, the total potential energy of the system is

$$J(u) = \frac{1}{2} \int_\Omega \kappa \nabla u \cdot \nabla u - \int_\Omega f u + C,$$

where $C$ is some constant. The state of equilibrium of the system corresponds to the displacement $u$ that minimizes the potential energy. This may be a familiar idea, but here it is shown mathematically that the $u$ that minimizes $J$ is the same $u$ that solves (2.15) (and hence the BVP (2.13)). In the physical problem modeled by (2.13), only displacements $u$ that satisfy the boundary condition (2.13b) are admissible. If $u$ and $w$ are two admissible displacements (that is, if $u, w \in C^2_D(\overline{\Omega})$) and $v = w - u$ (so that $w = u + v$), then $v$ is also in $C^2_D(\overline{\Omega})$. Mathematically, this reflects the fact that $C^2_D(\overline{\Omega})$ is a vector space. Adding or subtracting two vectors in the space produces another vector in the same space. A direct calculation shows that

$$J(w) = J(u + v) = J(u) + \left( \int_\Omega \kappa \nabla u \cdot \nabla v - \int_\Omega f v \right) + \frac{1}{2} \int_\Omega \kappa \nabla v \cdot \nabla v.$$

The parameter $\kappa$ is positive and

$$\int_\Omega \kappa \nabla v \cdot \nabla v > 0,$$

provided $v$ is a nonconstant function. Because of the boundary conditions, the only constant function in $C^2_D(\overline{\Omega})$ is the zero function, so if $v$ is a nonzero displacement (that is, if $w \neq u$), then

$$J(w) > J(u) + \int_\Omega \kappa \nabla u \cdot \nabla v - \int_\Omega f v.$$

Therefore,

$$J(w) > J(u) \ \text{for all } w \in C^2_D(\overline{\Omega}), \ w \neq u,$$

if and only if

$$\int_\Omega \kappa \nabla u \cdot \nabla v - \int_\Omega f v = 0 \ \text{for all } v \in C^2_D(\overline{\Omega})$$

(see Exercise 6). This shows that $u$ minimizes $J$ over $C^2_D(\overline{\Omega})$ if and only if $u$ satisfies (2.15), the variational form of the equation. This is an interesting result from the physical point of view: mechanical equilibrium corresponds to minimal potential energy. However, I promised to explain the reason for the term "variational form." From basic calculus, the derivative of $J$ ought to be zero at the minimizer. The formula

$$J(u + v) = J(u) + \left( \int_\Omega \kappa \nabla u \cdot \nabla v - \int_\Omega f v \right) + \frac{1}{2} \int_\Omega \kappa \nabla v \cdot \nabla v \tag{2.20}$$

shows that

$$\int_\Omega \kappa \nabla u \cdot \nabla v - \int_\Omega f v$$

is the directional derivative of $J$ at $u$ in the direction of $v$.$^2$ In somewhat old-fashioned language, this directional derivative is referred to as the (first) variation of $J$. Thus the variational form of the BVP simply states that, at the solution $u$, the first variation of the potential energy is zero in every direction, hence the term "variational form."

$^2$ Equation (2.20) expresses $J(u + v)$ as $J(u)$ + (a term linear in $v$) + (a term quadratic in $v$). The linear term must be the directional derivative of $J$.
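The energy argument can be illustrated with a one-dimensional analogue. This sketch assumes a setting that is not from the text: the interval $(0, 1)$, $\kappa = 1$, the energy $J(w) = \frac{1}{2}\int (w')^2 - \int f w$ with the additive constant set to zero, the exact solution $u = \sin(\pi x)$ of $-u'' = \pi^2 \sin(\pi x)$, and the admissible perturbation $v = x(1-x)$. It checks that the first variation vanishes at $u$ and that moving away from $u$ raises the energy by the quadratic term.

```python
import math

def midpoint(g, n=2000):
    """Midpoint rule for the integral of g over (0, 1)."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))

pi = math.pi

# Assumed 1D data: u solves -u'' = f on (0, 1) with u(0) = u(1) = 0.
u = lambda x: math.sin(pi * x)
u_prime = lambda x: pi * math.cos(pi * x)
f = lambda x: pi**2 * math.sin(pi * x)

# Admissible perturbation: vanishes at both endpoints.
v = lambda x: x * (1 - x)
v_prime = lambda x: 1 - 2 * x

def J(t):
    """Energy of w = u + t*v, with the additive constant taken as zero."""
    return (0.5 * midpoint(lambda x: (u_prime(x) + t * v_prime(x)) ** 2)
            - midpoint(lambda x: f(x) * (u(x) + t * v(x))))

# First variation of J at u in the direction v (the term linear in v).
first_variation = (midpoint(lambda x: u_prime(x) * v_prime(x))
                   - midpoint(lambda x: f(x) * v(x)))

# Energy increase when moving from u to u + 0.1*v.
increase = J(0.1) - J(0.0)
```

Here the first variation is zero up to quadrature error, and the increase matches $\frac{t^2}{2}\int_0^1 (v')^2 = t^2/6$ with $t = 0.1$.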

## 2.2.2 Relaxing the PDE

The original PDE,

$$-\nabla \cdot (\kappa \nabla u) = f \ \text{in } \Omega,$$

suggests that the solution $u$ should have partial derivatives up to order two, that is, that $u$ should be twice differentiable. On the other hand, the variational form (2.15) refers only to the first derivatives of $u$. Similarly, in the classical way of looking at things, it is expected that the right-hand-side function $f$ be continuous over $\Omega$. However, in the variational form, it is only necessary that $f$ be integrable (or, more precisely, that $f$ times any test function be integrable). For this reason, (2.15) is referred to as the weak form of the original BVP, which can be called the strong form by contrast. When working with the weak form of the BVP, it is natural to make the weakest possible assumptions on the functions involved, so as to include as many cases as possible in the analysis. For this reason, the Sobolev spaces are introduced.

## Sobolev spaces

In classical theory, all necessary partial derivatives are assumed to exist and be continuous, and this assumption leads to the spaces $C^k(\overline{\Omega})$ defined above. The partial derivatives are defined as in calculus; for example,

$$\frac{\partial u}{\partial x}(x, y) = \lim_{h \to 0} \frac{u(x + h, y) - u(x, y)}{h}. \tag{2.21}$$

However, in the weak form of a BVP, it is not necessary that all functions and derivatives be continuous. Moreover, there is another way to define partial derivatives that is actually more natural and useful in the context of the weak form. To explain this other definition of partial derivative, I must introduce some new concepts. First of all, if $u$ is a function, its support is the closure of the set on which $u$ is nonzero:

$$\mathrm{supp}(u) = \overline{\left\{ (x, y) \in \Omega \ : \ u(x, y) \neq 0 \right\}}.$$

If $u$ is defined on $\Omega$ and $\mathrm{supp}(u)$ is a compact subset (that is, a closed and bounded subset) of $\Omega$, then $u$ is said to be compactly supported in $\Omega$. A function compactly supported in $\Omega$ is zero on and near the boundary of $\Omega$. The space $C_0^\infty(\Omega)$ is defined to be the set of all functions that are infinitely differentiable on $\Omega$ and compactly supported in $\Omega$. The condition that $u \in C_0^\infty(\Omega)$ is quite strong, and the reader might wonder if, in fact, there are any functions in this space at all. Such a function must have the property that it and all of its partial derivatives go to zero as $(x, y)$ approaches the boundary of $\mathrm{supp}(u)$. To settle this question, I will show how to construct a family of functions in $C_0^\infty(\Omega)$. Let $(x_0, y_0) \in \Omega$ and $\delta > 0$ be sufficiently small that $B_\delta(x_0, y_0) \subset \Omega$, and define $\phi : \Omega \to \mathbf{R}$ by

Then, since

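it follows that $\phi(x, y) \to 0$ as $(x, y) \to \partial B_\delta(x_0, y_0)$. Moreover, each partial derivative of $\phi$, inside $B_\delta(x_0, y_0)$, consists of a rational function times the same exponential, which is enough to show that each partial derivative converges to zero as $(x, y) \to \partial B_\delta(x_0, y_0)$. Thus $\phi \in C_0^\infty(\Omega)$. In fact, defining

$$v = \frac{\phi}{\int_\Omega \phi},$$

it follows that

$$\int_\Omega v = 1,$$

and thus $v$ is a test function of the special type described on page 20. Using functions like $\phi$, it is possible to generate lots of elements of $C_0^\infty(\Omega)$ (although I will not show how to do it). In fact, any function $u \in C(\overline{\Omega})$ can be approximated arbitrarily well, in a sense to be described below, by functions in $C_0^\infty(\Omega)$.

Now I can explain an alternate definition of partial derivative. Suppose for now that $u \in C^1(\overline{\Omega})$. Integrating by parts, that is, applying (2.8a), yields

$$\int_\Omega \frac{\partial u}{\partial x} v = \int_{\partial\Omega} u v\, n_1 - \int_\Omega u \frac{\partial v}{\partial x}$$

for all smooth test functions $v$. If $v \in C_0^\infty(\Omega)$, then the boundary integral is zero (since $v$ is identically zero on $\partial\Omega$), and thus

$$\int_\Omega \frac{\partial u}{\partial x} v = -\int_\Omega u \frac{\partial v}{\partial x} \ \text{for all } v \in C_0^\infty(\Omega).$$

In other words, $\partial u/\partial x$ is that function $g$ satisfying

$$\int_\Omega g v = -\int_\Omega u \frac{\partial v}{\partial x} \ \text{for all } v \in C_0^\infty(\Omega). \tag{2.22}$$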

Although I will not prove it formally, there can be only one such function, and thus, for $u \in C^1(\overline{\Omega})$, $g = \partial u/\partial x$ if and only if (2.22) holds. Therefore, (2.22) can serve as an alternate definition of the partial derivative $\partial u/\partial x$ of $u \in C^1(\overline{\Omega})$. A similar definition is valid for $\partial u/\partial y$. The advantage of this alternate definition is that it can be extended to many functions that are not differentiable everywhere in the sense of (2.21).

DEFINITION 2.1. Suppose $u$ is a real-valued function defined on a domain $\Omega$ in $\mathbf{R}^2$, and that $u$ is integrable over every compact subset of $\Omega$. (In this case, $u$ is called locally integrable.) If there exists another locally integrable function $g$ defined on $\Omega$ such that

$$\int_\Omega g v = -\int_\Omega u \frac{\partial v}{\partial x} \ \text{for all } v \in C_0^\infty(\Omega)$$

holds, then $u$ is said to be weakly differentiable (with respect to $x$) and $g$ is called the weak partial derivative (with respect to $x$) of $u$. The weak partial derivative with respect to $y$ is defined similarly. Weak derivatives are denoted by $\partial u/\partial x$ and $\partial u/\partial y$, just as are strong derivatives.
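The defining identity of a weak derivative, $\int g v = -\int u\, \partial v/\partial x$, can be tried out on a one-dimensional analogue. The sketch below uses a hypothetical example that is not from the text: a "tent" function $u$ on $(0, 1)$ with a kink at $x = 1/2$, and the piecewise constant candidate $g = \pm 1$. The test function is a polynomial that vanishes, together with its first derivative, at both endpoints. It is not compactly supported, so this is only an illustration of the identity, not of the definition in full.

```python
def midpoint(g, n=1000):
    """Midpoint rule on (0, 1); n is even, so x = 1/2 is a cell boundary."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))

# Hypothetical example: a tent function with a kink at x = 1/2.
u = lambda x: x if x < 0.5 else 1.0 - x

# Candidate weak derivative; its value at the single point x = 1/2
# does not affect any integral.
g = lambda x: 1.0 if x < 0.5 else -1.0

# Asymmetric test function with v = v' = 0 at x = 0 and x = 1.
v = lambda x: x**2 * (1 - x) ** 3
v_prime = lambda x: 2 * x * (1 - x) ** 3 - 3 * x**2 * (1 - x) ** 2

lhs = midpoint(lambda x: g(x) * v(x))          # integral of g * v
rhs = -midpoint(lambda x: u(x) * v_prime(x))   # minus integral of u * v'
```

The two values agree up to quadrature error even though $u$ is not differentiable at $x = 1/2$.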

Here is an example of a function that is not differentiable over all of $\Omega$ but has a weak partial derivative.

EXAMPLE 2.2. Let $\Omega$ be the unit square, $\Omega = (0, 1) \times (0, 1)$, and define

The following argument shows that u is weakly differentiable with respect to x and that

On the other hand,

This proves that $g$ is a weak partial derivative with respect to $x$ of $u$ on $\Omega$.

The previous example may lead the reader to believe that there really is not much to the definition of the weak derivative. After all, in Example 2.2, the function $u$ is differentiable (in the classical sense) except on a line segment, and the weak derivative is found by simply computing the derivative of $u$, where it exists, in the classical manner. However, the following example presents a function that is differentiable except on the same line segment and yet is not weakly differentiable.

EXAMPLE 2.3. Let $\Omega$ be the unit square, $\Omega = (0, 1) \times (0, 1)$, and define

must hold (since a line segment has zero area). (The same formula (2.22) can be used to extend the notion of derivative to functions such as the $u$ in this example; however, such derivatives are not representable by locally integrable functions and are beyond the scope of this book.)

The assumptions that must be made on the functions appearing in the variational form can be stated in terms of weak derivatives. Recall that it is desirable to make the weakest assumptions possible, so as to encompass as many cases as possible. For this reason, it is assumed that $u$ and $v$ are only weakly differentiable. Moreover, since only first derivatives appear in (2.15), it is assumed that $u$ and $v$ have weak partial derivatives of the first order. Therefore, whereas in the strong form (2.13) $u$ must have continuous derivatives up to order two, in the weak form (2.15) it need only have weak derivatives of order one. This is indeed a considerable relaxation of the requirements on $u$. There is one restriction, however, that is necessary. The definition of weak derivative requires only that $\partial u/\partial x$ and $\partial u/\partial y$ be (locally) integrable. In the variational equation (2.15), though, it must be possible to integrate the products

$$\frac{\partial u}{\partial x} \frac{\partial v}{\partial x}, \qquad \frac{\partial u}{\partial y} \frac{\partial v}{\partial y}.$$

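In particular, $v = u$ must be allowed, which shows that $\partial u/\partial x$ and $\partial u/\partial y$ must be square-integrable:

$$\int_\Omega \left( \frac{\partial u}{\partial x} \right)^2 < \infty, \qquad \int_\Omega \left( \frac{\partial u}{\partial y} \right)^2 < \infty.$$

In the same way,

$$\int_\Omega f v$$

must be finite, which also suggests that $f$ and $v$ should be square-integrable (if $v = f$, for example). It is therefore convenient to define the space

$$L^2(\Omega) = \left\{ f : \Omega \to \mathbf{R} \ : \ \int_\Omega f^2 < \infty \right\}.$$

I will require that $f$, the right-hand side of the PDE, belong to $L^2(\Omega)$. The solution $u$ of (2.15) must satisfy

$$u \in L^2(\Omega), \qquad \frac{\partial u}{\partial x} \in L^2(\Omega), \qquad \frac{\partial u}{\partial y} \in L^2(\Omega),$$

and the test functions must satisfy the same conditions. These conditions define the Sobolev space $H^1(\Omega)$:

$$H^1(\Omega) = \left\{ u \in L^2(\Omega) \ : \ \frac{\partial u}{\partial x}, \frac{\partial u}{\partial y} \in L^2(\Omega) \right\}.$$

Finally, it is necessary that both the solution $u$ and all of the test functions $v$ satisfy the Dirichlet boundary conditions. For this reason, it is convenient to introduce the following space:

$$H^1_0(\Omega) = \left\{ u \in H^1(\Omega) \ : \ u = 0 \ \text{on } \partial\Omega \right\}.$$

The space $H^1_0(\Omega)$ is another example of a Sobolev space. In defining the above spaces, I have been rather cavalier about some important mathematical details. For the interested reader, I discuss these details in Section 2.2.3. The variational form of (2.13) can now be defined in terms of the Sobolev space $H^1_0(\Omega)$:

$$\text{find } u \in H^1_0(\Omega) \ \text{such that} \ \int_\Omega \kappa \nabla u \cdot \nabla v = \int_\Omega f v \ \text{for all } v \in H^1_0(\Omega). \tag{2.23}$$

The right-hand-side $f$ is assumed to belong to $L^2(\Omega)$. It should now be clear why the variational form is also called the weak form; the requirements of the right-hand-side $f$ and the solution $u$ have been considerably weakened over the classical strong form (2.13).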

## 2.2.3 A few details about Sobolev spaces

In the above explanations, I have not attempted to give a rigorous definition of the spaces $L^2(\Omega)$, $H^1(\Omega)$, and $H^1_0(\Omega)$; to do so would lead us far afield. For the interested reader, I will sketch a few of the omitted details and give some references for further study.

First of all, to define $L^2(\Omega)$, it is important to use the correct definition of the integral, though for purely technical reasons. The integral defined in the usual undergraduate calculus course is the Riemann integral. It is the integral of choice in introductory calculus courses because its definition is relatively simple compared to that of its competitors. However, the Riemann integral suffers from some serious technical shortcomings, most notably that it is not difficult to construct a sequence of Riemann integrable functions converging (in a natural sense) to a function that is not Riemann integrable. This property makes the Riemann integral largely useless for constructing a satisfactory theory of PDEs. The Lebesgue integral is preferred over the Riemann integral for precisely these theoretical reasons. I will not explain the definition of the Lebesgue integral, for the following reason: Any function that is Riemann integrable over a bounded domain is also Lebesgue integrable, and its Riemann and Lebesgue integrals are equal. The Lebesgue integral merely allows us to integrate certain singular functions that, for theoretical reasons, must be included. I will explain a bit more about this in Section 2.4. Functions that are regular enough to integrate are called (Lebesgue) measurable. The Lebesgue integral is based on the notion of Lebesgue measure of sets. Lebesgue measure corresponds to the notion of area for subsets of $\mathbf{R}^2$ that are regular enough that their areas can be defined. However, Lebesgue measure extends to very complicated sets. As in the case of integrals, this extension is needed mainly to allow for a satisfactory theory. My favorite reference for Lebesgue measure and integration theory is Folland's text on real analysis [20]; another text that is perhaps somewhat more accessible to the beginner is the real analysis text by Royden [36].

When speaking of the space $L^2(\Omega)$, it is important to realize that two functions $f$ and $g$ with the property that $\int_\Omega |f - g| = 0$ must be regarded as equal. Functions $f$ and $g$ have this property whenever $f(x) = g(x)$ except on a set with Lebesgue measure zero. For this reason, when describing a function that belongs to $L^2(\Omega)$, it is acceptable to leave the function undefined on a set of measure zero. I did this, in fact, for the weak derivative of the function $u$ in Example 2.2. I left $\partial u/\partial x$ undefined on the line segment corresponding to $x = 1/2$; this line segment clearly has area (and hence measure) zero. Any values could be assigned to $\partial u/\partial x$ on that line segment without changing the function as an element of $L^2(\Omega)$.

A corollary of the discussion in the previous paragraph is that a well-defined function $u \in L^2(\Omega)$ cannot be unambiguously restricted to a subset of measure zero. This is an important fact in the theory of BVPs, since, for a bounded domain $\Omega \subset \mathbf{R}^2$, $\partial\Omega$ is a subset of measure zero.

The space $H^1(\Omega)$ and its subspace $H^1_0(\Omega)$ are just two examples of Sobolev spaces. For a bounded domain $\Omega$ and any positive integer $k$, the Sobolev space $H^k(\Omega)$ is defined to be the space of all integrable functions $u$ defined on $\Omega$ with the property that $u$ and its partial derivatives up to order $k$ all belong to $L^2(\Omega)$. In this context, $L^2(\Omega)$ can be called $H^0(\Omega)$. It is also possible to define $H^s(\Omega)$ for fractional values of $s$, but there is more than one way to do this, and the development is rather complicated.

In developing a rigorous theory of BVPs using weak derivatives, it is necessary to know how much (weak) differentiability is necessary for a function to have well-defined boundary values. The answer to this question is called the trace theorem, which states (in its most refined form) that restricting an $H^s(\Omega)$ function ($\Omega \subset \mathbf{R}^2$) to a one-dimensional curve $\Gamma$ produces a function in $H^{s-1/2}(\Gamma)$. Therefore, in particular, an $H^1(\Omega)$ function has boundary values belonging to $H^{1/2}(\partial\Omega) \subset L^2(\partial\Omega)$, and hence it makes sense to talk about an $H^1(\Omega)$ function as a solution to (2.23) or (2.32). On the other hand, if $u \in H^1(\Omega)$, then the components of the gradient $\nabla u$ belong to $L^2(\Omega)$, which implies that $\partial u/\partial n$ is not well-defined on $\partial\Omega$. In this regard (and others) the analysis of a Neumann problem is more delicate than that of a Dirichlet problem. I will have more to say about this in Section 2.5.

A comprehensive reference for Sobolev space theory is the book by Adams [1]. Most modern texts on PDEs and finite elements contain a development of this theory; the text by Brenner and Scott [13] is particularly concise and accessible.

## 2.3 The weak form for other boundary conditions and PDEs

## 2.3.1 Neumann conditions and the weak form
Next I will derive the weak form of the Neumann problem

There is an interesting distinction between the weak forms of a Dirichlet problem and a Neumann problem: The Dirichlet condition appears explicitly in the weak form (2.23) (in the definition of the space $H^1_0(\Omega)$), but, as I will now show, the Neumann condition does not appear explicitly in the weak form of (2.24). I assume that $u$ satisfies (2.24). This presupposes that $u$ has some extra smoothness beyond the requirement that $u \in H^1(\Omega)$, both because the left side of (2.24a) involves second derivatives of $u$ and because $\partial u/\partial n$ is not well-defined for an arbitrary $H^1(\Omega)$ function. Then

The reader should notice the use of Green's identity in the above calculation, and also the fact that the boundary integral vanishes because of the Neumann boundary condition satisfied by the solution u. (In the Dirichlet case, it was the boundary condition on the test function v that caused the boundary integral to vanish.)

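The weak form of (2.24) is thus defined to be the following problem:

$$\text{find } u \in H^1(\Omega) \ \text{such that} \ \int_\Omega \kappa \nabla u \cdot \nabla v = \int_\Omega f v \ \text{for all } v \in H^1(\Omega). \tag{2.25}$$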

As I showed above, any solution of (2.24) is also a solution of (2.25). However, the reader might well question whether the converse is true. After all, (2.25) does not mention the Neumann boundary condition and so it is not obvious that a solution of (2.25) will necessarily satisfy the Neumann condition. It does, though, provided the solution $u$ is smooth enough that Green's identity applies and $\nabla u$ can be restricted to $\partial\Omega$. Suppose that $u \in H^1(\Omega)$ is a solution of (2.25). Then, since $H_0^1(\Omega) \subset H^1(\Omega)$,

Since $v \in H_0^1(\Omega)$, the boundary integral vanishes, yielding

Now, applying the reasoning that I sketched in the Dirichlet case, it follows that the PDE (2.24a) must hold (notice that the type of test function that I discussed on page 20, whose support is a disk $B_\delta(x_0, y_0)$, belongs to the space $H_0^1(\Omega)$). To show that the Neumann condition (2.24b) also holds, I return to (2.25) and apply Green's identity once again to obtain

(since $-\nabla\cdot(\kappa\nabla u)$ and $f$ are equal on $\Omega$). Therefore,

must hold. Although the precise argument is rather technical, it should be believable that (2.26) can hold for all $v \in H^1(\Omega)$ only if $\partial u/\partial n$ is zero on $\partial\Omega$ (the reader should bear in mind that $\kappa$ is strictly positive).

Since Dirichlet conditions must be explicitly imposed in the weak form, while Neumann conditions are implied even though not explicitly imposed, Dirichlet conditions are often called essential boundary conditions, while Neumann conditions are called natural boundary conditions.
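The "natural" character of the Neumann condition can be observed numerically. The following is a minimal sketch, not taken from the text: it assumes a one-dimensional analogue, $-u'' + u = f$ on $(0,1)$ with $u'(0) = u'(1) = 0$ (the extra term $u$ is added only so that the discrete system is nonsingular), piecewise linear elements on a uniform mesh, and the manufactured solution $u(x) = \cos(\pi x)$. The function name `solve_neumann_1d` is hypothetical. No boundary condition is imposed anywhere in the code, yet the computed solution approximates the function that satisfies the Neumann condition.

```python
import numpy as np

def solve_neumann_1d(n):
    """Piecewise linear elements for -u'' + u = f on (0, 1), u'(0) = u'(1) = 0."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    K = np.zeros((n + 1, n + 1))  # stiffness matrix: integrals of u' v'
    M = np.zeros((n + 1, n + 1))  # mass matrix: integrals of u v
    k_loc = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    m_loc = np.array([[2.0, 1.0], [1.0, 2.0]]) * h / 6.0
    for e in range(n):
        idx = [e, e + 1]
        K[np.ix_(idx, idx)] += k_loc
        M[np.ix_(idx, idx)] += m_loc
    # Right-hand side for the exact solution u = cos(pi x); f is replaced
    # by its piecewise linear interpolant, so the load vector is M @ f.
    f = (np.pi**2 + 1.0) * np.cos(np.pi * x)
    # No rows or columns are modified: the Neumann condition is never imposed.
    u = np.linalg.solve(K + M, M @ f)
    return x, u

x, u = solve_neumann_1d(64)
err = np.max(np.abs(u - np.cos(np.pi * x)))
slope_left = (u[1] - u[0]) / (x[1] - x[0])  # one-sided estimate of u'(0)
print(f"max nodal error: {err:.2e}, slope near x = 0: {slope_left:.3f}")
```

The one-sided slope near $x = 0$ is small and shrinks as the mesh is refined, which is the discrete counterpart of the statement that the Neumann condition is implied rather than imposed.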


2.3.2 Mixed boundary conditions

A BVP can have both Dirichlet and Neumann boundary conditions, with the two different conditions applied to different parts of the boundary. If $\partial\Omega = \Gamma_1 \cup \Gamma_2$ is a partition of $\partial\Omega$ (so that $\Gamma_1 \cap \Gamma_2 = \emptyset$), then the following BVP is said to have mixed boundary conditions:

According to the above discussion, the Dirichlet condition is essential, while the Neumann condition is natural. The space of test functions is therefore defined to be

The Neumann condition is not mentioned in this definition, since it is a natural boundary condition. The derivation of the weak form follows the now familiar pattern:

The boundary integral can be written as

The first integral on the right vanishes because the test function $v$ is zero on $\Gamma_1$, and the second vanishes when $u$ is a solution of (2.27), since $\partial u/\partial n$ is then zero on $\Gamma_2$. The weak form is therefore the same as before,

except that the space of test functions has changed. Exercise 7 asks the reader to show that if u satisfies the weak form, then it also satisfies the strong form (2.27), including the Neumann boundary condition.
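To see how the change of test space shows up in a computation, here is a small sketch under assumptions of my own: a one-dimensional analogue $-u'' = 1$ on $(0,1)$ with $u(0) = 0$ (essential) and $u'(1) = 0$ (natural), discretized with piecewise linear elements. The function name `solve_mixed_1d` is hypothetical. The essential condition is enforced by dropping the unknown at $x = 0$; nothing at all is done at $x = 1$.

```python
import numpy as np

def solve_mixed_1d(n):
    """Piecewise linear elements for -u'' = 1, u(0) = 0, u'(1) = 0."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    K = np.zeros((n + 1, n + 1))
    F = np.zeros(n + 1)
    for e in range(n):
        idx = [e, e + 1]
        K[np.ix_(idx, idx)] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
        F[idx] += h / 2.0  # exact integral of f = 1 against each hat function
    # Essential condition: test and trial functions vanish at x = 0,
    # so node 0 is removed. Node n (the Neumann end) stays in the system.
    free = np.arange(1, n + 1)
    u = np.zeros(n + 1)
    u[free] = np.linalg.solve(K[np.ix_(free, free)], F[free])
    return x, u

x_mixed, u_mixed = solve_mixed_1d(16)
u_exact = x_mixed - 0.5 * x_mixed**2  # satisfies u(0) = 0 and u'(1) = 0
print(np.max(np.abs(u_mixed - u_exact)))
```

For this particular one-dimensional problem the nodal values agree with the exact solution up to rounding error, so the Neumann condition at $x = 1$ is recovered even though the code never mentions it.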

2.3.3 Inhomogeneous boundary conditions

Inhomogeneous Neumann conditions

I will now explain how inhomogeneous boundary conditions affect the weak form of a BVP, beginning with the inhomogeneous Neumann problem


where $h$ is a given function defined on $\partial\Omega$. The derivation of the weak form proceeds as follows:

Thus the weak form of (2.29) is as follows:

The weak form (2.30) is the same as it was for homogeneous Neumann conditions, except that the right-hand side has been changed.

Inhomogeneous Dirichlet conditions

The case of inhomogeneous Dirichlet conditions is a bit more complicated. The BVP is

where $g$ is a function defined on $\partial\Omega$. The function $g$ must satisfy some regularity conditions, and I will assume that there is a function $G \in H^1(\Omega)$ such that $G = g$ on $\partial\Omega$. It turns out that the correct space of test functions is still $H_0^1(\Omega)$; however, since the desired solution $u$ does not satisfy homogeneous Dirichlet conditions, it cannot be the case that $u \in H_0^1(\Omega)$. Instead, the function $w = u - G$ is zero on $\partial\Omega$, and thus the solution has the form $u = w + G$, where $G$ is assumed to be known and $w \in H_0^1(\Omega)$ is unknown. Here is the derivation of the weak form:

Thus the weak form of (2.31) is the following:


The reader will notice that (2.32) has the same form as (2.23), except that the right-hand side has changed. In general, it might be difficult to find a suitable function $G$ satisfying the Dirichlet condition $G = g$ on $\partial\Omega$ (in a BVP, only $g$ is given, not $G$). However, in the context of the finite element method this is easy. I will show, in Section 4.1.1, how to produce a function $G$ that (approximately) satisfies the inhomogeneous Dirichlet condition.

It is left to the reader to derive the weak form of the BVP

2.3.4 Other elliptic BVPs

The weak form of the other elliptic BVPs presented in Chapter 1 can be derived using the alternate version of Green's identity derived in Section 2.1.3: If $\sigma = \sigma(x, y)$ is a symmetric tensor and $v = v(x, y)$ is a vector, then

and the mixed boundary conditions

where $\Gamma_1$ and $\Gamma_2$ form a partition of $\partial\Omega$. The weak form is derived by multiplying both sides of the PDE by a test function and applying Green's identity as before. However, in this case, both $\nabla\cdot\sigma$ and $f$ are vectors, so a test function must be a vector-valued function with both components in $H^1(\Omega)$. The following notation will be used:

The definition of the space of test functions must also take into account the essential boundary condition (2.35a):


As in Section 2.3.2, the boundary integral vanishes because $v$ is zero on $\Gamma_1$ and $\sigma_u n$ is zero on $\Gamma_2$. The weak form is therefore

Since $I \cdot \epsilon_v = \mathrm{tr}(\epsilon_v)$, the previous equation can be written as

In (2.36), the Lamé moduli $\mu$ and $\lambda$ can be either constants or functions of $(x, y)$. A similar derivation gives the weak form of the general system of linear elasticity

The weak form is

where V is the same space of test functions as before. I will leave it to the reader to derive the weak forms in the case of inhomogeneous boundary conditions (see Exercise 9).
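For readers who want a concrete starting point for the inhomogeneous cases, the lifting $u = w + G$ from Section 2.3.3 can be sketched on a scalar one-dimensional analogue. Everything in this example is an assumption made for illustration: the problem $-u'' = 2$ on $(0,1)$ with $u(0) = g_0$, $u(1) = g_1$, piecewise linear elements, and a lifting $G$ chosen as the piecewise linear function that equals the boundary data at the two end nodes and zero at interior nodes. The function name `solve_dirichlet_lifting` is hypothetical.

```python
import numpy as np

def solve_dirichlet_lifting(n, g0, g1):
    """Solve -u'' = 2 on (0, 1), u(0) = g0, u(1) = g1, by writing u = w + G."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    K = np.zeros((n + 1, n + 1))
    F = np.zeros(n + 1)
    for e in range(n):
        idx = [e, e + 1]
        K[np.ix_(idx, idx)] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
        F[idx] += 2.0 * h / 2.0  # exact integral of f = 2 against each hat function
    # Lifting G: carries the boundary values, vanishes at interior nodes.
    G = np.zeros(n + 1)
    G[0], G[-1] = g0, g1
    # Unknown w vanishes on the boundary; the right-hand side is l(v) - a(G, v).
    interior = np.arange(1, n)
    rhs = F[interior] - (K @ G)[interior]
    w = np.zeros(n + 1)
    w[interior] = np.linalg.solve(K[np.ix_(interior, interior)], rhs)
    return x, w + G

x_lift, u_lift = solve_dirichlet_lifting(8, g0=1.0, g1=2.0)
u_lift_exact = -x_lift**2 + 2.0 * x_lift + 1.0
print(np.max(np.abs(u_lift - u_lift_exact)))
```

The only change relative to the homogeneous problem is the modified right-hand side, which mirrors the remark that (2.32) has the same form as (2.23) with different data.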


2.4 Existence and uniqueness theory for the weak form of a BVP

In this section, I will sketch the existence and uniqueness theory for the weak form of a BVP. This theory depends on Hilbert space theory, which I will briefly describe. I must warn the reader, particularly if he or she does not have a rigorous grounding in mathematics, that this section is fairly difficult compared to the rest of the book. This theory is not so essential for implementing the finite element method, so certain readers may prefer to read it lightly. However, all readers should notice the distinction between a pure Neumann problem and a problem with Dirichlet or mixed boundary conditions when treated in the variational framework.

2.4.1 Vector spaces and inner products

I have previously mentioned vector space concepts, including the fact that the various function spaces, such as $C_D^2(\overline{\Omega})$, $L^2(\Omega)$, and so forth, form vector spaces. I will now explain some basics about vector spaces and inner products, as finite element methods depend strongly on this foundation. In the course of this discussion, I will explain some concepts that I have already used, such as the dot product of two Euclidean vectors, so that I can extend these concepts to other situations.

A vector space is a set of objects, called vectors, on which are defined two operations, addition and scalar multiplication. Associated with each vector space is a field of scalars, which will always be $\mathbf{R}$ (the set of real numbers) in this book. Thus, if $V$ is a vector space and $u, v \in V$, then $u + v$ is another element of $V$. Also, if $u \in V$ and $\alpha \in \mathbf{R}$, then $\alpha u$ belongs to $V$ as well. The operations of addition and scalar multiplication must satisfy a list of algebraic properties, such as $u + v = v + u$ for all $u, v \in V$, but these are all obvious in practice and I will not bother to discuss them here.

The most common vector space is $\mathbf{R}^n$, Euclidean $n$-space. A vector $x \in \mathbf{R}^n$ has $n$ components, each of which is a real number, and is written either as $x = (x_1, x_2, \ldots, x_n)$ or, more commonly, as

and scalar multiplication is also defined component by component,

There is another operation defined on Euclidean n-space that is of great importance, namely, the dot product:

The dot product is intimately associated with the geometry of $\mathbf{R}^n$, particularly with the concept of orthogonality, which I use in the next chapter. For now, however, the main importance of the dot product is that it defines the norm (length) of a vector:

One of the most important concepts necessary for understanding modern techniques for differential equations is that functions can be regarded as vectors. Since functions can be added and multiplied by scalars, they have the fundamental properties of vectors. Moreover, it is shown in calculus courses that the sum of two functions (or a scalar multiple of a function) has the same regularity properties as the functions themselves. For example, if $f$ and $g$ are continuous or differentiable, then so is $f + g$. Therefore, the spaces that I presented in Section 2.2, such as $C_D^2(\overline{\Omega})$, $L^2(\Omega)$, $H^1(\Omega)$, and so forth, are vector spaces. Moreover, it is possible to define a "dot product" for each of these spaces and thereby to define norms as well.

The Euclidean dot product is an example of an inner product, which is defined as follows.

DEFINITION 2.4. Suppose $V$ is a vector space. An inner product on $V$ is an operation $(\cdot,\cdot)$, taking two vectors from $V$ and producing a real number, that satisfies the following properties:

The first two properties together imply that $(w, \alpha u + \beta v) = \alpha(w, u) + \beta(w, v)$ also holds for all $u, v, w \in V$ and all $\alpha, \beta \in \mathbf{R}$.

For the sake of completeness, I will give the definition of norm as well.

DEFINITION 2.5. Suppose $V$ is a vector space. A norm on $V$ is a real-valued function defined on $V$, usually denoted $\|\cdot\|$, that satisfies the following properties:


The last property is called the triangle inequality, since, in Euclidean space, it means that the length of any side of a triangle is no greater than the sum of the lengths of the other two sides. When more than one norm or inner product appears in a discussion, I will denote the norm on $V$ by $\|\cdot\|_V$ and the inner product on $V$ by $(\cdot,\cdot)_V$ so that there is no confusion.

It is not difficult to show that if $(\cdot,\cdot)$ is an inner product on $V$, then the following formula defines a norm on $V$:

$$\|u\| = \sqrt{(u, u)}.$$

The following relationship between an inner product and the associated norm, which is called the Cauchy-Schwarz inequality, is always valid (though it is not so straightforward to prove):

$$|(u, v)| \le \|u\|\,\|v\| \quad \text{for all } u, v \in V.$$

In Euclidean 2- or 3-space, it can be shown that

$$u \cdot v = \|u\|\,\|v\| \cos(\theta),$$

where $\theta$ is the angle between $u$ and $v$. In this case, the Cauchy-Schwarz inequality is obvious, since $|\cos(\theta)| \le 1$.

I now wish to give the definition of the inner product on $L^2(\Omega)$, but I do not want to just "pull it out of a hat." Let me therefore start with a simple case, the one-dimensional case of

and show how the inner product ought to be defined. If $f : [a, b] \to \mathbf{R}$ is continuous, then $f$ is certainly in $L^2(a, b)$ (since a continuous function on a closed and bounded interval is bounded). The function $f$ can be approximated by a Euclidean vector by means of sampling. Given a positive integer $n$, a grid $a = x_0 < x_1 < x_2 < \cdots < x_n = b$ is defined on $[a, b]$ by $x_i = a + i\Delta x$, $\Delta x = (b - a)/n$. The vector $F \in \mathbf{R}^n$ is then defined by

Clearly $F$ can be said to approximate $f$, as Figure 2.1 shows. If $f$ and $g$ are two such continuous functions and $F$ and $G$ are the corresponding vectors in $\mathbf{R}^n$, then the dot product of $F$ and $G$ is

Figure 2.1. Approximating a function $f(x)$ by a vector $F \in \mathbf{R}^n$.

This cannot define the inner product of $f$ and $g$, since $F \cdot G$ depends strongly on the grid. However, as the grid is refined, the Riemann sum

which is just $F \cdot G$ weighted by $\Delta x$, converges to a quantity that is independent of the grid, namely, $\int_a^b f(x) g(x)\,dx$.

Therefore, it seems consistent (with the Euclidean dot product) to define the $L^2(a, b)$ inner product by

$$(f, g)_{L^2(a,b)} = \int_a^b f(x) g(x)\,dx. \tag{2.40}$$

It can be shown that if $f$ and $g$ belong to $L^2(a, b)$, then (2.40) is well-defined (even if $f$ and/or $g$ are not continuous). It can also be shown that (2.40) satisfies the definition of an inner product. If $\Omega$ is a domain in $\mathbf{R}^2$ (or $\mathbf{R}^n$, in general), then the $L^2$ inner product is defined by

$$(f, g)_{L^2(\Omega)} = \int_\Omega f g.$$
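The sampling argument that motivates the $L^2$ inner product is easy to try numerically. The sketch below is an illustration only; it assumes the samples are taken at the grid points $x_1, \ldots, x_n$ (the text's exact choice of sample points is not visible here) and uses $f = \sin$, $g = \cos$ on $[0, \pi/2]$, for which the integral of $fg$ is $1/2$. The helper name `sampled_products` is hypothetical.

```python
import numpy as np

def sampled_products(f, g, a, b, n):
    """Return (F . G, dx * F . G) for samples of f and g on a uniform grid."""
    dx = (b - a) / n
    x = a + dx * np.arange(1, n + 1)  # grid points x_1, ..., x_n
    F, G = f(x), g(x)
    raw = F @ G              # plain dot product: depends strongly on the grid
    weighted = raw * dx      # Riemann sum: converges as the grid is refined
    return raw, weighted

results = {}
for n in (10, 100, 1000):
    results[n] = sampled_products(np.sin, np.cos, 0.0, np.pi / 2, n)
    print(n, results[n])
```

The unweighted dot product grows roughly in proportion to $n$, while the weighted sum settles near $1/2$, which is the behavior the definition of the inner product is meant to capture.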

The corresponding $L^2$-norm is defined by

$$\|f\|_{L^2(\Omega)} = \left( \int_\Omega f^2 \right)^{1/2}.$$

The Sobolev space $H^1(\Omega)$ is defined in terms of $L^2(\Omega)$:

(where the partial derivatives are weak derivatives), and therefore so is its inner product:

The Sobolev space $H_0^1(\Omega)$ is a subspace of $H^1(\Omega)$. The concept of subspace will be very important, particularly in the next chapter, so its definition is given here:

DEFINITION 2.6. Suppose $V$ is a vector space. A subset $W$ of $V$ is called a subspace of $V$ if the following properties hold:

1. The zero vector belongs to $W$;
2. if $u, v \in W$, then $u + v \in W$;
3. if $u \in W$ and $\alpha \in \mathbf{R}$, then $\alpha u \in W$.

The last two properties can be expressed by saying that W is closed under addition and scalar multiplication. A subspace is a vector space in its own right.

2.4.2 Hilbert spaces

My goal in this section is to describe the theory that guarantees that certain variational problems have unique solutions. There is a fundamental property of the spaces $H^1(\Omega)$ and $H_0^1(\Omega)$ (and other Sobolev spaces) that is needed for this theory: Sobolev spaces are complete.

The property of completeness is rather abstract, but I can give an example of it that is easy to understand. The equation $x^2 = a$ can be solved for each $a > 0$ because the real numbers are complete: there are no holes in the real number line. Thousands of years ago, it was thought that all numbers could be represented as ratios of integers, that is, as rational numbers. However, the Pythagorean theorem suggests that there ought to be a number whose square is 2 (this number is the length of the hypotenuse of a right triangle with two legs of length 1), and it is easy to prove that no rational number $x$ can satisfy $x^2 = 2$. The set of rational numbers $\mathbf{Q}$ is not complete. This example suggests that the concept of completeness might be important for a theory describing when certain equations are guaranteed to have solutions.

What does it mean for a space to be complete? Thinking about the previous example leads to the right description. The following sequence of numbers converges to a solution to $x^2 = 2$:


This sequence consists of rational numbers, but the discussion above shows that the sequence cannot converge to a rational number. Therefore, $\mathbf{Q}$ is not complete because there is a sequence in the set that "ought" to converge, but there is no limit for the sequence in the set itself.

The property of a complete space is that every sequence that ought to converge actually does converge to an element of the space. It remains only to define what is meant by a sequence that ought to converge. The idea is that terms in the sequence get closer and closer together the farther out in the sequence they are. I will give the definition in a normed vector space.

DEFINITION 2.7. Let $\{v_n\}$ be a sequence of vectors in a normed vector space $V$. The sequence is called Cauchy if, given any $\epsilon > 0$ (no matter how small), there exists a positive integer $N$ such that

$$\|v_m - v_n\| < \epsilon \quad \text{for all } m, n \ge N.$$

Here is the related definition of completeness.

DEFINITION 2.8. A normed vector space $V$ is said to be complete if every Cauchy sequence in $V$ converges to an element of $V$.

It can be shown that the inner product spaces mentioned earlier, $\mathbf{R}^n$, $L^2(\Omega)$, $H^1(\Omega)$, and $H_0^1(\Omega)$, are all complete. A complete inner product space is called a Hilbert space.

An important notion in the theory of complete spaces is that of a dense subspace. For example, every real number can be approximated arbitrarily well by a rational number. For this reason, $\mathbf{Q}$ is said to be dense in $\mathbf{R}$.

DEFINITION 2.9. Suppose $V$ is a normed vector space. A subset $W$ is said to be dense in $V$ if, given any $v \in V$ and any $\epsilon > 0$ (no matter how small), there exists $w \in W$ with

$$\|v - w\| < \epsilon.$$

The set $\mathbf{R}$ is the completion of the dense subset $\mathbf{Q}$; that is, $\mathbf{R}$ is precisely what results when all of the holes in $\mathbf{Q}$ are filled. The following theorem identifies some common dense subspaces of the Sobolev spaces.

THEOREM 2.10.

1. $C(\overline{\Omega})$ is dense in $L^2(\Omega)$ (that is, $L^2(\Omega)$ is the completion of $C(\overline{\Omega})$ under the $L^2$ norm).
2. $C^1(\overline{\Omega})$ is dense in $H^1(\Omega)$ (that is, $H^1(\Omega)$ is the completion of $C^1(\overline{\Omega})$ under the $H^1$ norm).
3. $C_0^1(\overline{\Omega})$ is dense in $H_0^1(\Omega)$ (that is, $H_0^1(\Omega)$ is the completion of $C_0^1(\overline{\Omega})$ under the $H^1$ norm).

Other spaces dense in $L^2(\Omega)$ are $C^\infty(\overline{\Omega})$ and $C^\infty(\Omega) \cap L^2(\Omega)$. The space $C^\infty(\Omega) \cap H^1(\Omega)$ is also dense in $H^1(\Omega)$, and the space $C_0^\infty(\Omega)$ is also dense in $H_0^1(\Omega)$.


This theorem shows that the Sobolev spaces arise from "filling the holes" in the classical spaces, in the same way that R arises from Q.
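The "hole" in the rational numbers at $\sqrt{2}$ can be made concrete with exact rational arithmetic. The sketch below does not reproduce the sequence used in the text; as an assumption of my own it uses the Newton iteration $x_{k+1} = (x_k + 2/x_k)/2$ starting from $x_0 = 1$, which produces only rational numbers. The function name `newton_sqrt2` is hypothetical.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Rational Newton iterates for x^2 = 2, starting from x = 1."""
    x = Fraction(1)
    seq = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2  # stays in Q: sums and quotients of rationals
        seq.append(x)
    return seq

seq = newton_sqrt2(5)
# Distances between consecutive terms shrink rapidly (Cauchy-like behavior).
gaps = [abs(seq[k + 1] - seq[k]) for k in range(len(seq) - 1)]
for x, gap in zip(seq[1:], gaps):
    print(x, float(gap), x * x == 2)
```

Every term is an exact fraction whose square differs from 2, yet the terms cluster ever more tightly: the sequence "ought" to converge, but its limit is not in $\mathbf{Q}$.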

2.4.3 Linear functionals

The Sobolev spaces are infinite-dimensional vector spaces, and the study of linear algebra is considerably more complicated in infinite dimensions than in finite dimensions. Much insight into the nature of infinite-dimensional vector spaces comes from studying the continuous linear functionals that can be defined on a given space. If $V$ is a normed linear space, then a linear functional $\ell$ on $V$ is a real-valued function on $V$ that is linear:

Continuity for a linear functional is defined just as for any other function: $\ell$ is continuous at $u \in V$ if $v \to u$ implies $\ell(v) \to \ell(u)$ or, equivalently, if $\|v - u\| \to 0$ implies $|\ell(v) - \ell(u)| \to 0$. However, linearity has a couple of important implications for continuity. First of all, if $\ell$ is continuous at any $u_0$ in $V$, then $\ell$ must be continuous at every $u$ in $V$. To see this, notice that

where $z = u_0 + v - u$. Since $\|u - v\| \to 0$ if and only if $\|u_0 - z\| \to 0$, it follows that $\ell$ is continuous at $u$ if and only if it is continuous at $u_0$. Second, it is not difficult to show³ that $\ell$ is continuous at $u = 0$ (and hence at every $u$) if and only if there exists a nonnegative constant $M$ such that

The smallest such bound $M$ is called the norm of the linear functional $\ell$, and is denoted $\|\ell\|$.
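³ If $\ell$ is continuous at 0, then there exists $\delta > 0$ such that $\|u\| \le \delta$ implies that $|\ell(u)| \le 1$. But then, for any nonzero $u \in V$, the vector $(\delta/\|u\|)u$ has norm equal to $\delta$, and thus $|\ell((\delta/\|u\|)u)| \le 1$, that is, $|\ell(u)| \le \delta^{-1}\|u\|$, and (2.42) holds with $M = \delta^{-1}$. The other direction, that $\ell$ is continuous at $u = 0$ if (2.42) holds, is even easier to show.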


The set of all continuous linear functionals defined on V forms a normed vector space, called the dual space of V and denoted V*. The norm on V* is the norm defined in the previous paragraph. An alternate, equivalent definition of the norm for V* is the following:

A linear functional is continuous if and only if the supremum in (2.43) is finite, that is, if and only if $\ell$ is bounded on the unit ball $\{u \in V : \|u\| \le 1\}$. For this reason, continuous linear functionals are often referred to as bounded linear functionals.

2.4.4 The Riesz representation theorem

I can now present a fundamental theorem about Hilbert spaces that is the basis for the existence and uniqueness theory for (symmetric) variational problems.

THEOREM 2.11 (the Riesz representation theorem). Suppose $V$ is a Hilbert space. Then $V^*$ can be identified with $V$ in the following sense:

1. For each $u \in V$, the linear functional $\ell$ defined by $\ell(v) = (u, v)$ belongs to $V^*$, and

2. For each $\ell \in V^*$, there exists a unique $u \in V$ such that

and

I will not prove the Riesz representation theorem. The first part follows easily from the Cauchy-Schwarz inequality, but the second part is more difficult.
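In finite dimensions the content of the theorem can be checked directly. The following sketch is an illustration under assumptions of my own: $V = \mathbf{R}^n$ with the inner product $(u, v)_B = u^T B v$ for a symmetric positive definite matrix $B$, and a functional $\ell(v) = c^T v$. The representer is then $u = B^{-1} c$, and the norm of $\ell$ should equal $\|u\|_B$. The matrix `B` and vector `c` are arbitrary random data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
Q = rng.standard_normal((n, n))
B = Q @ Q.T + n * np.eye(n)   # symmetric positive definite: defines (u, v)_B
c = rng.standard_normal(n)    # the functional l(v) = c . v

# Representer: the vector u with (u, v)_B = l(v) for every v.
u = np.linalg.solve(B, c)
norm_u = np.sqrt(u @ B @ u)

# Estimate the norm of l by sampling |l(v)| / ||v||_B over many vectors v.
V = rng.standard_normal((10000, n))
norms_V = np.sqrt(np.einsum("ij,jk,ik->i", V, B, V))
ratios = np.abs(V @ c) / norms_V

print("largest sampled ratio:", ratios.max())
print("norm of representer:  ", norm_u)
print("ratio attained at u:  ", abs(c @ u) / norm_u)
```

No sampled ratio exceeds $\|u\|_B$, and the bound is attained at $v = u$, in line with the two parts of the theorem.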

2.4.5 Variational problems and the Riesz representation theorem

Here are some of the variational problems that I have presented thus far:

Examination of these formulas shows that each can be written in the form


where $V$ is a Hilbert space (one of the Sobolev spaces). The expression $a(\cdot,\cdot)$ in each case is a symmetric bilinear form, that is, it satisfies the following properties:

The bilinear form $a(\cdot,\cdot)$ also satisfies the following property:

The reader should notice that these are most of the properties defining an inner product. In certain cases, the bilinear form $a(\cdot,\cdot)$ can be shown to satisfy the following properties as well:

4. There exists $\alpha > 0$ such that $a(u, u) \ge \alpha\|u\|^2$ for all $u \in V$.
5. There exists $\beta > 0$ such that $a(u, u) \le \beta\|u\|^2$ for all $u \in V$.

If property 4 holds, then $a(\cdot,\cdot)$ is said to be elliptic over $V$ (equivalently, to be V-elliptic). If property 5 holds, then $a(\cdot,\cdot)$ is called bounded.

I will now explain how the Riesz representation theorem can be used to answer questions about existence, uniqueness, and stability of the solution to the variational problem (2.44) in the case that $a(\cdot,\cdot)$ is V-elliptic and bounded and $\ell$ is a bounded linear functional on $V$.

First I have to deal with the following point: As a Hilbert space, $V$ has an inner product $(\cdot,\cdot)$. Under the assumption that $a(\cdot,\cdot)$ is a V-elliptic, symmetric bilinear form, it follows that $a(\cdot,\cdot)$ defines an alternate inner product on $V$. The V-ellipticity of $a(\cdot,\cdot)$ implies the final property of an inner product: $a(u, u) = 0$ implies that $u = 0$. Therefore, $V$ is an inner product space under the energy inner product $a(\cdot,\cdot)$. However, it does not immediately follow that $V$ is a Hilbert space under this inner product. That is, even though $V$ is complete under $(\cdot,\cdot)$, it may fail to be complete under $a(\cdot,\cdot)$. It turns out that this cannot happen when $a(\cdot,\cdot)$ is both bounded and V-elliptic. Indeed, in that case,

where $\|\cdot\|_E$ denotes the energy norm:

Two norms that satisfy (2.45) are said to be equivalent, and it can easily be shown that $v_n \to v$ under one norm if and only if $v_n \to v$ under any equivalent norm. That is, equivalent norms define the same notion of convergence. It follows that, since $V$ is complete under $(\cdot,\cdot)$, it is also complete under $a(\cdot,\cdot)$.

The next point is the following: If $\ell$ is a bounded linear functional on $V$, this means that there exists a constant $M > 0$ such that


Since $\|v\|_V \le \alpha^{-1/2}\|v\|_E$ by V-ellipticity, the inequality $|\ell(v)| \le M\alpha^{-1/2}\|v\|_E$ for all $v \in V$ also holds. Therefore, if $\ell$ is bounded with respect to the original norm on $V$, it is also bounded with respect to the equivalent energy norm on $V$. The converse is also true: If $a(\cdot,\cdot)$ is bounded and V-elliptic, then $\ell \in V^*$ if $\ell$ is bounded with respect to the energy norm (see Exercise 10).

Thus, given the variational problem (2.44), where $a(\cdot,\cdot)$ is a bounded, V-elliptic, symmetric bilinear form and $\ell$ is a bounded linear functional on $V$, I can apply the Riesz representation theorem directly: There exists a unique vector $u \in V$ such that $a(u, v) = \ell(v)$ for all $v \in V$. Under these conditions, then, the existence and uniqueness of the solution to (2.44) is completely settled by the Riesz representation theorem.

Moreover, the theorem also states that the solution $u$ satisfies $\|u\|_E = \|\ell\|_{E^*}$, where $\|\ell\|_{E^*}$ denotes the norm of $\ell$ with respect to the energy norm on $V$: $\|\ell\|_{E^*} = \sup\{|\ell(v)| : v \in V, \|v\|_E \le 1\}$. I showed above that $\|\ell\|_{E^*} \le \alpha^{-1/2}\|\ell\|_{V^*}$, where $\|\ell\|_{V^*}$ represents the norm of $\ell$ with respect to the original norm on $V$. Therefore, the solution $u$ satisfies $\|u\|_V \le \alpha^{-1/2}\|u\|_E \le \alpha^{-1}\|\ell\|_{V^*}$.

This inequality expresses the stability of the solution of (2.44) with respect to the data $\ell$. Another way to express this is to say that the solution $u$ depends continuously on the data $\ell$. To see this, notice that if $u_1$ is the solution corresponding to $\ell_1$ and $u_2$ is the solution corresponding to $\ell_2$, then $a(u_1, v) = \ell_1(v)$ and $a(u_2, v) = \ell_2(v)$ for all $v \in V$.

Subtracting and using linearity yields $a(u_1 - u_2, v) = \ell(v)$ for all $v \in V$, where $\ell = \ell_1 - \ell_2$. Thus $u_1 - u_2$ is the solution corresponding to the data $\ell_1 - \ell_2$, and therefore the following inequality holds: $\|u_1 - u_2\|_V \le \alpha^{-1}\|\ell_1 - \ell_2\|_{V^*}$. This shows that the solution $u$ depends continuously on the data $\ell$ and, in particular, that a small change in the data leads to at most a small change in the solution.

In the next section, I will discuss the various examples from earlier in the chapter, showing which satisfy the crucial property of ellipticity and what can be done if ellipticity fails.
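A finite-dimensional analogue may help fix the ideas of ellipticity, boundedness, and stability. The sketch below is an illustration only and makes its own assumptions: $V = \mathbf{R}^n$ with the Euclidean norm, $a(u, v) = u^T A v$ for a symmetric positive definite matrix $A$, and $\ell(v) = b^T v$, so that the variational problem is the linear system $Au = b$. In this setting the ellipticity and boundedness constants can be taken to be the smallest and largest eigenvalues of $A$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
Q = rng.standard_normal((n, n))
A = Q.T @ Q + np.eye(n)        # symmetric positive definite: a(u, v) = u^T A v
eigs = np.linalg.eigvalsh(A)   # eigenvalues in ascending order
alpha, beta = eigs[0], eigs[-1]

# Ellipticity and boundedness on random vectors:
# alpha * ||v||^2 <= a(v, v) <= beta * ||v||^2.
samples = rng.standard_normal((200, n))
quad = np.einsum("ij,jk,ik->i", samples, A, samples)
sq_norms = np.einsum("ij,ij->i", samples, samples)
bounds_hold = bool(np.all(alpha * sq_norms <= quad + 1e-9)
                   and np.all(quad <= beta * sq_norms + 1e-9))

# Stability: a small change in the data b gives a small change in u.
b1 = rng.standard_normal(n)
b2 = b1 + 1e-6 * rng.standard_normal(n)
u1, u2 = np.linalg.solve(A, b1), np.linalg.solve(A, b2)
lhs = np.linalg.norm(u1 - u2)
rhs = np.linalg.norm(b1 - b2) / alpha
print(bounds_hold, lhs, rhs)
```

The printed values show the change in the solution bounded by the change in the data divided by the ellipticity constant, the matrix counterpart of continuous dependence on $\ell$.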

2.5 Examples of ellipticity

2.5.1 The model problem

I begin with the BVP

where

The inner product on $V$ is given by

I will assume that there are constants $k_0$, $k_1$, with $0 < k_0 \le k_1$, such that

The reader will recall the physical situations modeled by the above PDE; in those examples, $\kappa$ was a physical parameter, such as thermal conductivity, that was necessarily positive. The positivity of $\kappa$ is essential for mathematical analysis as well. The boundedness of $a(\cdot,\cdot)$ is easy to show:


The proof that $a(\cdot,\cdot)$ is $H_0^1(\Omega)$-elliptic is more difficult. It depends on Poincaré's inequality: There exists a positive constant $C$, depending only on the domain $\Omega$, such that

It is important to understand that Poincaré's inequality holds only when $u$ is restricted to $H_0^1(\Omega)$. It does not hold for all $u \in H^1(\Omega)$; as a counterexample, one can take any nonzero constant function $u$. Such a function belongs to $H^1(\Omega)$, but the inequality in (2.50) obviously fails. Given any $u \in H_0^1(\Omega)$,

The reader should notice how Poincaré's inequality and the fact that $\kappa$ is bounded away from zero were used to derive this bound. It follows that $a(\cdot,\cdot)$ is $H_0^1(\Omega)$-elliptic; the constant $\alpha$ can be taken to be $\alpha = k_0 C^{-1}$. Since $a(\cdot,\cdot)$ is both bounded and $H_0^1(\Omega)$-elliptic, the Riesz representation theorem applies, and the variational problem

has a unique solution for each $\ell \in (H_0^1(\Omega))^*$. It is not difficult to show that $L^2(\Omega) \subset (H_0^1(\Omega))^*$ in the sense that

defines a member of $(H_0^1(\Omega))^*$. Similarly, the term introduced by the nonzero boundary data (if any) defines a member of $(H_0^1(\Omega))^*$. Therefore, it follows that

has a unique solution for each $f \in L^2(\Omega)$, $g$.⁴ Moreover, the solution $u$ depends continuously on $f$ and $g$. The case of mixed boundary conditions

⁴ The required regularity of $g$ is a bit subtle. It must be the case that there exists $G \in H^1(\Omega)$ such that $g$ is the restriction to $\partial\Omega$ of $G$. By the trace theorem mentioned earlier, this means that $g$ must lie in $H^{1/2}(\partial\Omega)$.


where $\partial\Omega = \Gamma_1 \cup \Gamma_2$ is a partition of $\partial\Omega$, is similar, assuming that $\Gamma_1$ has nonzero measure. The variational problem is

where $V = \{v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_1\}$ and $G$ is any function in $H^1(\Omega)$ satisfying $G = g$ on $\Gamma_1$. A version of Poincaré's inequality applies to this space as well:

The constant $C$ may have a different value than in (2.50); its value depends on both $\Omega$ and $\Gamma_1$. However, the application of the Riesz representation theorem is the same, and existence, uniqueness, and stability for the BVP with mixed boundary conditions are also guaranteed.

I have already shown that the BVP

may not have a solution, and, if it does, the solution cannot be unique. It is not surprising, therefore, that the variational form

has the same properties. Indeed, the energy inner product is not $H^1(\Omega)$-elliptic, as I mentioned above. If $u(x, y) = 1$ (the constant function), then $a(u, u) = 0$ but $\|u\|_{H^1(\Omega)} > 0$. Therefore, the ellipticity condition $a(u, u) \ge \alpha\|u\|_{H^1(\Omega)}^2$ fails to hold.

The pure Neumann problem can still be handled by the Riesz representation theorem, but it is necessary to look for the solution in an appropriate subspace of $H^1(\Omega)$. Any two solutions to the BVP differ by a constant, so the solution set is $\{u + C : C \in \mathbf{R}\}$, where $u$ is any particular solution. One way to select a unique solution from this set is to take the one with mean zero, that is, the solution belonging to the subspace


where $V$ is defined in (2.51). This is the same as before, except that $H^1(\Omega)$ has been replaced with $V$. The fact that $a(\cdot,\cdot)$ is V-elliptic follows from Friedrich's inequality: There exists a positive constant $C$ such that

where

Here $|\Omega|$ denotes the measure of the set $\Omega$, so that $\bar{u}$ is the average value of $u$ over $\Omega$. If $u \in V$, then $\bar{u} = 0$, and hence Friedrich's inequality yields

It then follows that $a(\cdot,\cdot)$ is V-elliptic. By the Riesz representation theorem, (2.52) has a unique solution that depends continuously on $f$ and $h$. The requirement that $u$ belong to $V$ instead of to $H^1(\Omega)$ is an extra constraint on the solution:

In the finite element method, this constraint is typically ignored when reducing the variational equation to a system of linear equations. The result is a singular linear system that is guaranteed, however, to have a solution. Solving this system requires special care because of the singularity. I will discuss this point in Example 7.4 of Section 7.4.2 and further in Section 11.5.

It is interesting to note that the Riesz representation theorem applies to (2.52) for arbitrary $f$ and $h$. That is, (2.52) has a unique solution even if $f$ and $h$ do not satisfy the compatibility condition. The meaning of the solution is not obvious in the case that the compatibility condition fails since the strong form of the BVP does not have a solution in that case. This question is pursued further in Exercise 13.
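The singular system that arises from a pure Neumann problem can be examined in a small one-dimensional sketch. The example below is not the treatment given later in the book; it assumes the analogue $-u'' = f$ on $(0,1)$ with $u'(0) = u'(1) = 0$, piecewise linear elements, the compatible data $f(x) = \cos(\pi x)$, and, as one possible remedy chosen here for illustration, a Lagrange multiplier that enforces the mean-zero constraint. The function name `neumann_system` is hypothetical.

```python
import numpy as np

def neumann_system(n):
    """Stiffness and mass matrices for piecewise linear elements on (0, 1)."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    K = np.zeros((n + 1, n + 1))
    M = np.zeros((n + 1, n + 1))
    for e in range(n):
        idx = [e, e + 1]
        K[np.ix_(idx, idx)] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
        M[np.ix_(idx, idx)] += np.array([[2.0, 1.0], [1.0, 2.0]]) * h / 6.0
    return x, K, M

x_n, K_n, M_n = neumann_system(32)
ones = np.ones(len(x_n))
F_n = M_n @ np.cos(np.pi * x_n)  # load vector for the interpolant of f

print("K annihilates constants:", np.allclose(K_n @ ones, 0.0))
print("compatibility, sum of load:", ones @ F_n)

# Mean-zero constraint m . u = 0 (m . u is the integral of u_h),
# enforced with a Lagrange multiplier in an augmented nonsingular system.
m = M_n @ ones
aug = np.block([[K_n, m[:, None]], [m[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(aug, np.append(F_n, 0.0))
u_n, lam = sol[:-1], sol[-1]
print("error:", np.max(np.abs(u_n - np.cos(np.pi * x_n) / np.pi**2)))
```

The stiffness matrix has the constant vector in its null space, the load vector sums to (numerically) zero because the data are compatible, and the multiplier comes out essentially zero for the same reason.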

2.5.2 The equations of isotropic elasticity

I now turn to the equations of isotropic elasticity:


where $\partial\Omega = \Gamma_1 \cup \Gamma_2$ is a partition of $\partial\Omega$. Special cases are the Dirichlet problem, $\Gamma_1 = \partial\Omega$, $\Gamma_2 = \emptyset$, and the Neumann problem, $\Gamma_1 = \emptyset$, $\Gamma_2 = \partial\Omega$. The Dirichlet problem is also called the displacement problem, and the Neumann problem is the traction problem. The variational form is

where

and the functional $\ell$ is determined by the data $f$, $g$, $h$. As in the case of the model problem, the fundamental question is whether the bilinear form

is bounded and V-elliptic. The boundedness of $a(\cdot,\cdot)$ is easy to verify, so I will focus on the question of V-ellipticity. This question is answered by various forms of Korn's inequality, which relates the $H^1$-norm of $v$ to the $L^2$-norm of $\epsilon_v$. For $v \in (H^1(\Omega))^2$,

One version of Korn's inequality is

Here $C$ is a positive constant depending only on $\Omega$, and $v \in (H_0^1(\Omega))^2$ if and only if both components of $v$ belong to $H_0^1(\Omega)$. A straightforward calculation shows that

where $k = 2\min\{\mu, \mu + \lambda\}$ (see Exercise 19). Assuming $k > 0$, which is reasonable since $\mu$ is the shear modulus of the elastic material and $\mu + \lambda$ is the bulk modulus (see Exercise 1.3.6), it follows immediately that $a(\cdot,\cdot)$ is $(H_0^1(\Omega))^2$-elliptic. Therefore, the Riesz representation theorem can be applied to the variational form of the displacement problem

Thus there exists a unique solution that depends continuously on $f$ and $g$. In the case of mixed boundary conditions, in which both $\Gamma_1$ and $\Gamma_2$ have positive measure, a similar version of Korn's inequality applies. If $V$ is given by (2.55), then


The value of the constant $C$ is different than in (2.56), but the conclusion is the same: $a(\cdot,\cdot)$ is elliptic over $V$, and therefore the BVP

has a unique solution depending continuously on $f$, $g$, $h$.

It remains to consider the pure traction problem: $\Gamma_1 = \emptyset$, $\Gamma_2 = \partial\Omega$. The analysis is now a bit more subtle. Thinking of $\epsilon$ as a mapping from $(H^1(\Omega))^2$ into $(L^2(\Omega))^{2\times 2}$, $\epsilon$ has a three-dimensional null space, spanned by the functions

(see Exercise 1.3.5). These three functions define a space of (infinitesimal) rigid displacements. The functions $w^{(1)}$ and $w^{(2)}$ define translations, and $w^{(3)}$ represents an infinitesimal rotation. For any linear combination of these three functions, $u = \alpha_1 w^{(1)} + \alpha_2 w^{(2)} + \alpha_3 w^{(3)}$, $\epsilon_u = 0$, and hence $u$ satisfies $-\nabla\cdot\sigma = 0$ in $\Omega$, $\sigma n = 0$ on $\partial\Omega$. It follows that the pure traction problem cannot have a unique solution. The reader should notice that the null space of $\epsilon$ does not come into play if Dirichlet conditions are imposed on at least part of the boundary, since these "pin down" the solution. As for the model problem discussed earlier, this lack of uniqueness also implies that

does not always have a solution; the functions $f$ and $h$ must satisfy a compatibility condition if a solution is to exist. Deriving this condition is left as an exercise.

To treat the pure traction problem in variational form, it is necessary to impose three additional constraints that remove the three degrees of freedom inherent in the space of rigid motions. An appropriate space is

The first two constraints in the definition of $V$ eliminate nonzero translations, while the third implies that the average curl of the vector field $v$ is zero. This eliminates nonzero rotations. If I write $N$ for the null space of $\epsilon$, then


This means that any $u \in (H^1(\Omega))^2$ can be written uniquely as $u = v + n$, where $v \in V$ and $n \in N$ (see Exercise 16). Another version of Korn's inequality states that

## The variational form of the traction problem can thus be posed as

This problem has a unique solution that depends continuously on / and h. As in the case of the scalar model problem, when addressing this Neumann problem with the finite element method, the result is a singular linear system.

2.6

## Many PDEs lead to nonsymmetric variational problems. An example is the BVP

where $\kappa : \Omega \to \mathbf{R}$, $c : \Omega \to \mathbf{R}^2$, and $\rho : \Omega \to \mathbf{R}$. As usual, $\Gamma_1 \cup \Gamma_2$ is a partition of $\partial\Omega$. I assume that there are constants $k_0$, $k_1$, $\rho_1$ such that $0 < k_0 \le \kappa \le k_1$ in $\Omega$ and $0 \le \rho \le \rho_1$ in $\Omega$. The weak form is derived by multiplying by a test function and integrating by parts in the highest-order derivative; the result is

where

Defining

(2.62) can be written as

Although this is identical in appearance to the variational problems analyzed earlier, now $a(\cdot,\cdot)$ is not symmetric when $c \ne 0$.


The lack of symmetry means that the Riesz representation theorem does not apply to (2.64), since $a(\cdot,\cdot)$ does not define an inner product over $V$. Fortunately, there is an analogous result, the Lax-Milgram theorem, that does apply to nonsymmetric problems.

THEOREM 2.12 (the Lax-Milgram theorem). Suppose $V$ is a Hilbert space and $a(\cdot,\cdot)$ is a bilinear form on $V$ that is bounded and $V$-elliptic: There exist $\alpha > 0$ and $\beta > 0$ such that

Moreover, $u$ depends continuously on $\ell$; to be precise,

The Lax-Milgram theorem applies to (2.62), provided the $V$-ellipticity of $a(\cdot,\cdot)$ can be verified. I now describe two situations in which $a(\cdot,\cdot)$, defined by (2.63a), is $V$-elliptic. The first is a simple consequence of Poincaré's inequality and applies to the case in which the measure of $\Gamma_1$ is not zero (that is, it applies except in the case of a pure Neumann problem). In this situation, it follows from Poincaré's inequality that there exists a constant $C_1 > 0$ such that

The constant $C_1$ depends on $\Omega$, $\Gamma_1$, and the lower bound $k_0$ for $\kappa$ over $\Omega$. Suppose that the components of the vector-valued function $c$ are each bounded over $\Omega$, say with

Then it is straightforward to show that

Therefore, $a(\cdot,\cdot)$ is $V$-elliptic if $C_1 > C_2/2$, that is, if the nonsymmetric part of the operator is sufficiently small compared to the (leading-order) symmetric part. Notice that in this result, the zero-order term


Another result is Gårding's inequality, which applies to a pure Neumann problem as well as to Dirichlet or mixed boundary conditions. Gårding's inequality requires that the zero-order coefficient $\rho$ be strictly positive: $\rho \ge \rho_0 > 0$ in $\Omega$. The result states that if $\rho_0$ is large enough, taking into account $k_0$ and $C_2$, then $a(\cdot,\cdot)$ is $H^1(\Omega)$-elliptic:

If po is not sufficiently large, then the variational problem may still have a unique solution, but it cannot be proved directly from the Lax-Milgram theorem. For more details, the reader can consult Brenner and Scott [13, Section 5.6].

## 2.7 Exercises for Chapter 2

1. Let $\Omega$ be the unit square: $\Omega = (0,1) \times (0,1)$. Verify (2.7) for

2. Let $\sigma : \mathbf{R}^2 \to \mathbf{R}^{2\times 2}$ be smooth (that is, $\sigma$ is a smooth tensor-valued function of $(x, y)$). Use the ordinary divergence theorem to show that

3. Let $\sigma : \mathbf{R}^2 \to \mathbf{R}^{2\times 2}$ and $u : \mathbf{R}^2 \to \mathbf{R}^2$ be smooth. Show that

4. Use Green's first identity to prove that if $u$ and $v$ are smooth enough, then

This is Green's second identity. 5. Let y be defined by (2.16). Show that (2.17) holds. 6. Prove that (2.18) is equivalent to (2.19) as follows: (a) Show that if (2.19) fails to hold, then there exists v e Cj,(S2) such that

(b) Show that if $u$ satisfies (2.67), then $J(u + \delta v) < J(u)$ for all $\delta > 0$ sufficiently small, and (2.18) fails to hold. The foregoing shows that (2.18) implies (2.19). The converse, that (2.19) implies (2.18), follows immediately from (2.17).


7. Show that if $u$ satisfies (2.28) and $u$ is smooth enough, then it also satisfies the BVP (2.27).

8. Derive the weak form of the BVP (2.33).

9. Consider the following BVP with inhomogeneous boundary conditions:

(a) Derive the weak form in the case of isotropic linear elasticity: $\sigma = 2\mu\epsilon + \lambda\,\mathrm{tr}(\epsilon)I$.

(b) Derive the weak form in the case of general linear elasticity: $\sigma = A\epsilon$.

10. Suppose $V$ is a Hilbert space, $\ell$ is a linear function on $V$, and $a(\cdot,\cdot)$ is a bounded, $V$-elliptic, symmetric bilinear form on $V$. Show that if $\ell$ is bounded with respect to the energy norm defined by $a(\cdot,\cdot)$, then $\ell \in V^*$.

11. Suppose $f \in L^2(\Omega)$ and a functional $\ell$ is defined by

where $V$ is $H^1(\Omega)$ or some subspace of $H^1(\Omega)$. Prove that $\ell \in V^*$. (Hint: Apply the Cauchy-Schwarz inequality.)

12. Suppose the function $\kappa$ satisfies $0 < k_0 \le \kappa(x,y) \le k_1$ for all $(x, y) \in \Omega$ and for some constants $k_0$, $k_1$. Suppose further that the function $b$ satisfies $0 < b_0 \le b(x, y) \le b_1$ for all $(x, y) \in \Omega$ and for some constants $b_0$, $b_1$. Consider the BVP

(a) Derive the weak form of this BVP.

(b) Show that the resulting bilinear form is bounded and $H^1(\Omega)$-elliptic.

13. Consider (2.52), the variational form of the pure Neumann problem

Suppose

(a) Show that

has no solution. (Note that the difference between (2.52) and (2.68) is the space, $V$ versus $H^1(\Omega)$.)

(b) Show that the solution of (2.52) solves

14. Show that if

(that is, $u$ belongs to the null space of $\epsilon$) and if $u$ also belongs to $V$ (defined by (2.59)), then $u$ is the zero function.

15. Derive the compatibility condition for (2.58).

16. Let $V$ and $N$ be the subspaces of $(H^1(\Omega))^2$ defined in (2.59) and (2.60), respectively. Show that, for any $v \in (H^1(\Omega))^2$, there exist $\hat{v} \in V$ and $r \in N$ such that $v = \hat{v} + r$. Show further that this representation is unique.

17. Verify (2.65). (Hint: Clearly

Use the fact that $|\nabla v||v| \le (|\nabla v|^2 + |v|^2)/2$ to get an upper bound for the second integral on the right.)

18. Let $\Omega$ be the unit square: $\Omega = \{(x, y) : 0 < x < 1,\ 0 < y < 1\}$. Given that

use (2.65) to determine an upper bound $c_{\max}$ on the constant $c$ such that the BVP

holds for all symmetric tensors e. (Hint: Write the tensor as a "long vector" e = (e n , 22, 12, 21)- Then (2yu,6 + Atr(e)/) = (Be) 6 > ke e for a certain symmetric 2 x 2 matrix #. Find the eigenvalues of B.)

Chapter 3

## The Galerkin method

In the previous chapter, I explained how to transform an elliptic BVP into its weak form. One advantage of the weak form is that it puts much less stringent requirements on the problem functions (such as the right-hand side of the PDE) and the solution, and therefore applies to a larger collection of problems. However, a more important advantage is that the weak form admits new solution techniques, in particular, the Galerkin method. There are several ways to introduce the Galerkin method. I prefer to approach it from the standpoint of linear algebra, because then the most striking feature of the method is clear from the beginning: The Galerkin method computes the best approximation to the true solution from a given finite-dimensional subspace.

## 3.1 The projection theorem

The projection theorem is a foundational result from linear algebra that provides a method for determining computable approximations to known or (in some contexts) unknown quantities. In its abstract form, it answers the following question: Given a vector $u$ in an inner product space $V$ and a finite-dimensional subspace $W$, what is the best approximation to $u$ from $W$? That is, which vector $w \in W$ is closest to $u$ in the sense that $\|u - w\|$ is as small as possible? The projection theorem applies to inner product spaces such as $\mathbf{R}^n$ or $L^2(\Omega)$ and is based on the notion of orthogonality.

DEFINITION 3.1. If $V$ is an inner product space with inner product $(\cdot,\cdot)$ and $u$, $v$ are two vectors in $V$ satisfying $(u, v) = 0$, then $u$ and $v$ are said to be orthogonal.

In $\mathbf{R}^2$ or $\mathbf{R}^3$, where vectors can be visualized, orthogonality is equivalent to perpendicularity, since it can be shown that, in $\mathbf{R}^2$ or $\mathbf{R}^3$, two vectors $u$, $v$ are perpendicular if and only if $u \cdot v = 0$. However, orthogonality extends to spaces like $L^2(\Omega)$, where "two functions are perpendicular" has no apparent geometric meaning. In fact, the implications of orthogonality are pretty much the same, regardless of whether they can be visualized.

Since the norm in an inner product space is defined by the inner product, the Pythagorean theorem holds in any inner product space. If vectors $u$, $v$ in an inner product space $V$ satisfy $(u, v) = 0$, then

$$\|u + v\|^2 = \|u\|^2 + \|v\|^2.$$

The Pythagorean theorem is the basis for the proof of the projection theorem.

THEOREM 3.2. Suppose $V$ is an inner product space, $W$ is a finite-dimensional subspace of $V$, and $u \in V$. Then

1. there is a unique vector $w \in W$ satisfying

$$\|u - w\| = \min\{\|u - z\| : z \in W\}.$$

The vector $w$ is called the best approximation to $u$ from $W$ or the projection of $u$ onto $W$ and is denoted $\mathrm{proj}_W u$.

2. a vector $w \in W$ is the best approximation to $u$ from $W$ if and only if it satisfies the following orthogonality condition:

$$(u - w, z) = 0 \ \text{ for all } z \in W \qquad (3.1)$$

(see Figure 3.1). The projection theorem also holds if $W$ is an infinite-dimensional (closed) subspace of $V$, but in the finite-dimensional case the orthogonality condition makes it possible to derive a system of equations defining the best approximation, as I now show. If $W$ is finite-dimensional, then it has a basis $\{w_1, \ldots, w_n\}$ and $w \in W$ can be represented as

$$w = \sum_{j=1}^{n} \alpha_j w_j.$$

Figure 3.1. Illustration of the projection theorem.

If $w$ also satisfies (3.1), then (taking $z = w_i$)

$$\sum_{j=1}^{n} (w_j, w_i)\,\alpha_j = (u, w_i), \quad i = 1, 2, \ldots, n.$$

Therefore, $w$ is determined by a system of $n$ linear equations in the $n$ unknowns $\alpha_1, \alpha_2, \ldots, \alpha_n$. If $G \in \mathbf{R}^{n\times n}$ and $b \in \mathbf{R}^n$ are defined by

$$G_{ij} = (w_j, w_i), \qquad b_i = (u, w_i), \quad i, j = 1, 2, \ldots, n,$$

respectively, then the system of equations can be written in matrix-vector form as $G\alpha = b$. The matrix $G$ is called the Gram matrix and the equations represented by $G\alpha = b$ are referred to as the normal equations. Therefore, to compute the best approximation to $u$ from $W$ requires, when $W$ is $n$-dimensional, the solution of an $n \times n$ system of linear equations. If it happens that $\{w_1, w_2, \ldots, w_n\}$ is an orthogonal basis for $W$, that is, if $(w_j, w_i) = 0$ for $i \ne j$, then the Gram matrix $G$ is diagonal and it is simple to solve $G\alpha = b$:

$$\alpha_i = \frac{(u, w_i)}{(w_i, w_i)}, \quad i = 1, 2, \ldots, n.$$

It is not practical in the context of most PDEs to compute an orthogonal basis. However, if $G$ is "nearly" diagonal, in the sense that most of the off-diagonal entries are zero, then $G\alpha = b$ can be solved efficiently even when $n$ is very large. When most of its entries are zero, a matrix is said to be sparse. As shown in the next chapter, the finite element method produces systems with sparse matrices.
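The normal equations can be illustrated with a minimal sketch. The function being approximated ($\sin \pi x$), the subspace (quadratic polynomials on $(0,1)$), and the helper names `l2_inner` and `best_approximation` are all assumptions chosen for illustration; the $L^2(0,1)$ inner product is approximated by Gauss-Legendre quadrature.

```python
import numpy as np

def l2_inner(f, g, n_quad=20):
    """Approximate the L^2(0,1) inner product of f and g by Gauss-Legendre quadrature."""
    x, w = np.polynomial.legendre.leggauss(n_quad)  # nodes and weights on [-1, 1]
    x = 0.5 * (x + 1.0)                             # map the nodes to [0, 1]
    w = 0.5 * w                                     # rescale the weights accordingly
    return float(np.sum(w * f(x) * g(x)))

def best_approximation(u, basis, inner=l2_inner):
    """Solve the normal equations G alpha = b for the projection of u onto span(basis)."""
    n = len(basis)
    G = np.array([[inner(basis[j], basis[i]) for j in range(n)] for i in range(n)])
    b = np.array([inner(u, basis[i]) for i in range(n)])
    alpha = np.linalg.solve(G, b)
    return alpha, G, b

# Hypothetical example: project u(x) = sin(pi x) onto W = span{1, x, x^2}.
u = lambda x: np.sin(np.pi * x)
basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]
alpha, G, b = best_approximation(u, basis)

# The projection w and the orthogonality check (u - w, w_i) = 0 for each basis function.
w = lambda x: sum(a * phi(x) for a, phi in zip(alpha, basis))
residuals = [l2_inner(lambda x: u(x) - w(x), phi) for phi in basis]
print("coefficients:", alpha)
print("orthogonality residuals:", residuals)
```

For the monomial basis the Gram matrix is a small Hilbert matrix, which is dense and becomes badly conditioned as the degree grows; this is one reason a basis with a nearly diagonal Gram matrix is attractive.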

## 3.2 The Galerkin method for a variational problem

As I showed in the previous chapter, the variational form of many BVPs takes the form

$$u \in V, \quad a(u, v) = \ell(v) \ \text{ for all } v \in V, \qquad (3.2)$$

where $a(\cdot,\cdot)$ is a symmetric bilinear form on the Hilbert space $V$ and $\ell$ is a continuous linear functional on $V$. When the BVP is elliptic, the conditions

hold, and the bilinear form $a(\cdot,\cdot)$ defines an alternate inner product on $V$.

Given a finite-dimensional approximating subspace $W$ of $V$, the best approximation from $W$ to the solution $u$ of (3.2) is computed by solving the system $G\alpha = b$, where

and $\{w_1, w_2, \ldots, w_n\}$ is a basis for $W$. Since the basis $\{w_1, w_2, \ldots, w_n\}$ is known, the matrix $G$ is computable. However, the true solution $u$ is unknown, and there is no way of computing the vector $b$.

The Galerkin idea is simple: Compute the best approximation to $u$ using the alternate inner product defined by the bilinear form $a(\cdot,\cdot)$ (the energy inner product). Then the variational equation (3.2) gives the value of $a(u, v)$ for all $v \in V$ (even though $u$ is unknown): $a(u, v) = \ell(v)$. To find the best approximation to $u$ in the energy norm, it is necessary to solve the system $KU = F$, where $K$ is the $n \times n$ matrix defined by

$$K_{ij} = a(w_j, w_i), \quad i, j = 1, 2, \ldots, n,$$

and $F \in \mathbf{R}^n$ is defined by

$$F_i = a(u, w_i), \quad i = 1, 2, \ldots, n.$$

In this context, the Gram matrix $K$ is usually called the stiffness matrix, and the vector $F$ is called the load vector. The vector $U \in \mathbf{R}^n$ defines the approximate solution:

$$w = \sum_{j=1}^{n} U_j w_j. \qquad (3.4)$$

The load vector is computed by the formula $F_i = \ell(w_i)$, $i = 1, 2, \ldots, n$. The reader should recall from the derivation in the previous section that the computed solution $w$ (given by (3.4)) satisfies

$$a(w, w_i) = \ell(w_i), \quad i = 1, 2, \ldots, n,$$

or, since $\{w_1, w_2, \ldots, w_n\}$ forms a basis for $W$,

$$a(w, v) = \ell(v) \ \text{ for all } v \in W. \qquad (3.5)$$

The variational equation (3.5) is the form in which the Galerkin method is usually presented. It is identical to the variational equation (3.2) except that the space $V$ is replaced with the finite-dimensional subspace $W$. Beginning with (3.5), the usual starting point, it can be shown directly that the approximate solution $w$ is the best approximation to $u$ from $W$. The solution $u$ satisfies

$$a(u, v) = \ell(v) \ \text{ for all } v \in W. \qquad (3.6)$$

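Similarly, $w$ satisfies

$$a(w, v) = \ell(v) \ \text{ for all } v \in W. \qquad (3.7)$$

Subtracting (3.6) and (3.7) yields

$$a(u, v) - a(w, v) = 0 \ \text{ for all } v \in W,$$

or

$$a(u - w, v) = 0 \ \text{ for all } v \in W.$$

This is the orthogonality condition that is necessary and sufficient for $w$ to be the best approximation from $W$ (in the energy norm) to the true solution $u$. The Galerkin method leads to an approximate solution $w$ satisfying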

where $u$ is the true solution to (3.2). While the vector $w$ is not the best approximation in the original norm $\|\cdot\|$ on $V$, it cannot be much worse than the best approximation in the $V$-norm. The derivation is based on properties (3.3) and the following observation: If $v \in W$, then, since $W$ is a subspace of $V$, $v - w$ is also in $W$. Since $w$ is the best approximation to $u$ from $W$ in the energy inner product, the orthogonality condition

Dividing both sides by $\|u - w\|$ yields

This result is called Cea's theorem. While $\|u - w\|$ need not be as small as possible, it cannot be too much larger than the smallest possible $\|u - v\|$, $v \in W$. This fact is central to the finite element method, which is based on a family of approximating subspaces $W_h$, with the dimension of $W_h$ growing as $h \to 0$. The corresponding Galerkin approximations $w_h \in W_h$ must improve at the same rate as the best approximations from $W_h$ as $h \to 0$.
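The chain of inequalities behind this estimate can be sketched as follows. This is a hedged reconstruction rather than the text's own display: it assumes that properties (3.3) take the usual form $|a(u, v)| \le \beta \|u\| \|v\|$ and $a(v, v) \ge \alpha \|v\|^2$ (the roles assigned to $\alpha$ and $\beta$ are an assumption), and it uses the orthogonality condition $a(u - w, z) = 0$ with $z = v - w \in W$:

$$
\begin{aligned}
\alpha \|u - w\|^2 &\le a(u - w, u - w) \\
&= a(u - w, u - v) + a(u - w, v - w) \\
&= a(u - w, u - v) \\
&\le \beta \|u - w\| \, \|u - v\|.
\end{aligned}
$$

Dividing by $\|u - w\|$ (when it is nonzero) gives $\|u - w\| \le (\beta/\alpha) \|u - v\|$ for every $v \in W$, so under these assumptions the Galerkin error is at most $\beta/\alpha$ times the best possible error in the $V$-norm.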

Originally the Galerkin method was used to produce accurate approximate solutions from cleverly chosen low-dimensional subspaces. However, the finite element method is based on making the necessary computations (the computation of $K$, $F$, and the solution $U$ of $KU = F$) efficient even when the dimension of the approximating subspace is large. This is achieved by choosing a special kind of approximating subspace, as explained in the next chapter. The reader is asked to work out some simple examples of the Galerkin method in the exercises at the end of the chapter.
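As a minimal sketch of these steps (computing $K$ and $F$, then solving $KU = F$), consider the hypothetical one-dimensional problem $-u'' = 1$ on $(0, 1)$ with $u(0) = u(1) = 0$, for which $a(u, v) = \int_0^1 u'v'\,dx$ and $\ell(v) = \int_0^1 v\,dx$. The problem, the two-function polynomial basis, and the helper names are illustrative assumptions, not taken from the text. Because the exact solution $u(x) = x(1 - x)/2$ happens to lie in the chosen subspace, the Galerkin method should reproduce it.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

def integrate01(p):
    """Exact integral of a polynomial over (0, 1)."""
    q = p.integ()
    return q(1.0) - q(0.0)

def a(u, v):
    """Energy inner product a(u, v) = integral of u' v' over (0, 1)."""
    return integrate01(u.deriv() * v.deriv())

def ell(v, f):
    """Load functional ell(v) = integral of f v over (0, 1)."""
    return integrate01(f * v)

x = P([0.0, 1.0])
basis = [x * (1 - x), x**2 * (1 - x)]   # both vanish at x = 0 and x = 1
f = P([1.0])                            # right-hand side f(x) = 1

n = len(basis)
K = np.array([[a(basis[j], basis[i]) for j in range(n)] for i in range(n)])  # stiffness matrix
F = np.array([ell(basis[i], f) for i in range(n)])                           # load vector
U = np.linalg.solve(K, F)                                                    # Galerkin coefficients

print("K =", K)
print("F =", F)
print("U =", U)
```

Here the coefficient of $x(1 - x)$ comes out as $1/2$ and the coefficient of $x^2(1 - x)$ as zero (up to rounding), which is the exact solution; with a right-hand side whose solution is not in the subspace, the same code would return the best approximation in the energy norm instead.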

## 3.2.1 Another interpretation of the Galerkin method

The reader will recall from Section 2.2 that the variational form of a BVP can be viewed as the optimality condition for the optimization problem of minimizing the potential energy of the physical system being modeled. Using the abstract notation from (3.2), the energy is given by

(I will ignore the additive constant, since it does not affect the minimization.) The problem is to find $u \in V$ to solve

Given that $V$ is infinite-dimensional, making (3.9) intractable in most cases, a natural approach is to choose a finite-dimensional approximating subspace $W$ of $V$ and solve instead

This approach is called the Ritz method. Since $W$ is finite-dimensional with a basis $\{w_1, w_2, \ldots, w_n\}$, (3.10) can be expressed in terms of the coordinates in $\mathbf{R}^n$ defined by the basis for $W$. If $w \in W$ is given by

then

where $K$ and $F$ are the same stiffness matrix and load vector defined above. The problem now is to choose $U \in \mathbf{R}^n$ to minimize

and thus the minimizer is found by solving the linear system

Therefore, minimizing $J$ over the subspace $W$ (the Ritz method) is really the same as the Galerkin method. For this reason, the method is sometimes called the Ritz-Galerkin method.
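This equivalence can be checked numerically with a small sketch. It assumes the discrete energy has the form $J(U) = \tfrac12\, U \cdot KU - U \cdot F$ with $K$ symmetric positive definite; the 2 x 2 matrix and the vector below are illustrative values, not data from the text. The gradient of this $J$ is $KU - F$, so it vanishes exactly where $KU = F$, and any perturbation $d$ raises the energy by $\tfrac12\, d \cdot Kd > 0$.

```python
import numpy as np

# Illustrative symmetric positive definite "stiffness matrix" and "load vector".
K = np.array([[1/3, 1/6],
              [1/6, 2/15]])
F = np.array([1/6, 1/12])

def J(U):
    """Discrete energy J(U) = 0.5 * U.KU - U.F."""
    return 0.5 * U @ K @ U - U @ F

# Galerkin: solve the linear system K U = F.
U_galerkin = np.linalg.solve(K, F)

# Ritz: the gradient of J is K U - F, which vanishes at the Galerkin solution.
gradient = K @ U_galerkin - F

# Every perturbation d increases the energy by 0.5 * d.Kd > 0.
rng = np.random.default_rng(0)
perturbations = rng.normal(size=(100, 2))
increases = np.array([J(U_galerkin + d) - J(U_galerkin) for d in perturbations])

print("gradient at Galerkin solution:", gradient)
print("smallest energy increase:", increases.min())
```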

## 3.2.2 The Galerkin method for a nonsymmetric problem

Although I restrict myself almost entirely to symmetric problems in this book, I want to point out that the Galerkin method is just as applicable to a nonsymmetric elliptic problem. In fact, the norm of the error satisfies Cea's theorem, as in the symmetric case, and for exactly the same reason. (The reader can verify that in the derivation of (3.8) I did not use the symmetry of $a(\cdot,\cdot)$.) However, it cannot be said that $w$ is optimal in the energy norm, since the bilinear form $a(\cdot,\cdot)$ does not define an inner product in the nonsymmetric case.

## 3.3 Exercises for Chapter 3

Note: The following exercises require the computation of a number of definite integrals, and also the solution of systems of linear equations. The reader is encouraged to use a computer algebra system; these computations would be very tedious performed by hand. 1. (Some simple examples of the projection theorem.) Consider the inner product space

with inner product

The function $f(x) = e^x$ belongs to $L^2(0, 1)$. The following exercises ask for the best approximation to $f$ from various subspaces, using various inner products.

(a) Consider the subspace $P_4$ consisting of all polynomials of degree four or less. Find the best approximation to $f$ from $P_4$. Note that a basis for $P_4$ is

(b) Now consider $L^2(0, 1)$ under the alternate inner product

Find the best approximation, relative to the norm induced by this inner product, to $f$ from $P_4$.

(c) Repeat Exercise 1(a) with $P_4$ replaced by the subspace spanned by the basis

(d) Repeat Exercise 1(b) with $P_4$ replaced by $F_2$.

By graphing the error in each estimate, compare the four approximations to $f$. Is one to be preferred over the others? Why? The reader should note that when the vector $f$ to be estimated is known, one may choose any convenient inner product. This should be contrasted with the Galerkin method, in which the vector to be estimated is unknown and the energy inner product must be used.

2. Consider the one-dimensional BVP

where $f(x) = e^x$. The weak form of this BVP is

where

and

The variational problem (3.11) can be written in the usual form, $a(u, v) = \ell(v)$ for all $v \in V$, by defining $V = H_0^1(0, 1)$ and

(a) Define $W$ to be the subspace of $H_0^1(0, 1)$ spanned by the basis

Apply the Galerkin method to find the best approximation, in the energy norm, from $W$ to the solution $u$ of (3.11).

(b) Repeat Exercise 2(a) with $W$ replaced by

(c) Find the exact solution to the original BVP and compare the two approximate solutions by graphing the errors.

3. Let $\{w_1, w_2, \ldots, w_n\}$ be a basis for a subspace $W$ of an inner product space, and let $G$ be the corresponding Gram matrix: $G_{ij} = (w_j, w_i)$. Notice that $G$ is symmetric: $G_{ij} = G_{ji}$ for all $i, j$.

(a) Show that $G$ is positive definite: $x \cdot Gx > 0$ for all $x \in \mathbf{R}^n$, $x \ne 0$. (Hint: Let $v = \sum_{i=1}^{n} x_i w_i$ and show that $x \cdot Gx = (v, v)$.)

(b) Use the preceding result to show that $G$ is nonsingular.

Chapter 4

## Piecewise polynomials and the finite element method

I have now presented the two theoretical foundations of the finite element method: the variational form of a BVP and the Galerkin method for producing an approximate solution to a variational equation from a given finite-dimensional subspace. The key practical ingredient of the method is the choice of the approximating subspace. The finite element method is Galerkin's method with a subspace of piecewise polynomial functions.

The Galerkin method requires the computation of the stiffness matrix $K$ and the load vector $F$, and the solution of the system $KU = F$. For the method to be efficient, it must be possible to compute $K$ and $F$ efficiently and also to solve the system efficiently. For the method to be effective, the approximating subspace must be chosen so that the true solution of the problem can be well-approximated by an element of the subspace.

The three requirements described in the previous paragraph can all be met by an approximating subspace consisting of piecewise polynomial functions. It is easy to integrate and differentiate polynomials (the main requirements for computing $K$ and $F$), and piecewise polynomials lead naturally to a sparse stiffness matrix, allowing $KU = F$ to be solved efficiently. Finally, smooth functions can be well-approximated by piecewise polynomials. I begin by describing the construction of spaces of piecewise polynomials.

## 4.1 Piecewise linear functions defined on a triangular mesh

A polynomial in $x$ and $y$ has the form

where $a_{00}, a_{10}, \ldots$ are constants. To define a piecewise polynomial over a domain $\Omega$, the domain must be partitioned into subdomains. A piecewise polynomial is a function that is defined by a polynomial on each subdomain. The collection of subdomains is referred to as a mesh.

Figure 4.1. Two examples of nonconforming triangulations. In both examples, the intersection of triangles 1 and 2 is a line segment that is not an edge of triangle 1.

Figure 4.2. Triangulations of two polygonal domains.

The most common meshes in two dimensions are triangulations: the domain $\Omega$ is expressed as the union of triangles.[5] The intersection of any two triangles must be a common vertex or a common edge. Situations such as those shown in Figure 4.1 are called nonconforming and are not allowed. If $\Omega$ is not polygonal, it is necessary to approximate pieces of $\partial\Omega$ by line segments or simple curves, which can give rise to triangles having a curved edge. In this section, I will assume that $\Omega$ is polygonal, so that a triangulation covers it exactly. Figure 4.2 shows triangulations of a square and a pentagon.

Some notation is required to describe a piecewise polynomial relative to a given triangulation. I will assume that a given triangulation consists of $N_t$ triangles $T_1, T_2, \ldots, T_{N_t}$. The vertices of the triangles will be denoted $z_1, z_2, \ldots, z_{N_v}$, where $z_j = (x_j, y_j)$. As Figure 4.2 illustrates, each vertex is typically a vertex of several triangles. To each triangle is associated three vertices from the list $z_1, z_2, \ldots, z_{N_v}$, which can be identified by their indices in this list. The indices of the vertices of $T_i$ will be denoted $n_{i,1}, n_{i,2}, n_{i,3}$. That is, the vertices of $T_i$ are

$$z_{n_{i,1}}, \ z_{n_{i,2}}, \ z_{n_{i,3}}.$$

[5] Here I will use a convenient abuse of terminology. Strictly speaking, a triangle is the union of three line segments. However, I will denote by $T$ the domain enclosed by a triangle together with the triangle itself, and refer to $T$ as a triangle. In my notation, then, $\partial T$ is what one would normally call a triangle.

The mapping from $i, j$ to $n_{i,j}$ maps from local indices ($j = 1, 2, 3$) to global indices. When focusing on a single triangle $T$, it is frequently convenient to use only local indices, in which case the vertices will be denoted $z_1, z_2, z_3$. Strictly speaking, this is an abuse of the notation established above, but it simplifies things considerably.

In implementing and analyzing the finite element method, it is necessary to consider a family of triangulations. Each triangulation is usually labeled by its mesh size $h$, where $h$ is the maximum diameter of any triangle in the triangulation. For this reason, a typical triangulation is denoted by $\mathcal{T}_h$.

The simplest space of continuous piecewise polynomials consists of continuous piecewise linear functions defined relative to a triangulation $\mathcal{T}_h$ of a polygonal domain $\Omega \subset \mathbf{R}^2$. A piecewise linear function $p$ must reduce to a first-degree polynomial $a_i + b_i x + c_i y$ on each triangle $T_i \in \mathcal{T}_h$. The three parameters $a_i, b_i, c_i$ are uniquely determined by the values of the function at the three vertices of $T_i$. A simple way to see this is to realize that the graph of $p$, restricted to $T_i$, is a piece of a plane. That plane is uniquely determined by the three points

Moreover, since by assumption $p$ is continuous, its value at a vertex is well-defined. If triangles $T_i$ and $T_k$ both have $z_j$ as a vertex, then $a_i, b_i, c_i$ and $a_k, b_k, c_k$ must be such that

$$a_i + b_i x_j + c_i y_j = a_k + b_k x_j + c_k y_j.$$

Therefore, the $3N_t$ parameters $a_1, b_1, c_1, \ldots, a_{N_t}, b_{N_t}, c_{N_t}$ are not all independent. Assuming that $T_i$ and $T_k$ are two triangles sharing an edge $e$, then

$$a_i + b_i x + c_i y = a_k + b_k x + c_k y \ \text{ for all } (x, y) \in e$$

must hold if $p$ is to be continuous across $e$. Since the graph of a linear function in two variables, restricted to a line segment in the plane, is a line segment in 3-space, the fact that the linear functions defining $p$ on $T_i$ and $T_k$ agree at the two endpoints of $e$ is enough to show that they agree on all of $e$.

This reasoning shows exactly how many degrees of freedom there are in describing a piecewise linear function on a given triangulation $\mathcal{T}_h$: If $\mathcal{T}_h$ contains $N_v$ vertices, then a piecewise linear function on $\mathcal{T}_h$ is determined by the $N_v$ nodal values of the function. Figure 4.3 shows two piecewise linear functions, one defined on each of the meshes shown in Figure 4.2.

Thus, if $\mathcal{T}_h$ has $N_v$ vertices, then the space $P_h^{(1)}$ of all continuous piecewise linear functions defined on $\mathcal{T}_h$ is a finite-dimensional vector space with dimension $N_v$. Each function $v \in P_h^{(1)}$ can be identified with a vector $\alpha \in \mathbf{R}^{N_v}$ consisting of the nodal values of $v$. Moreover, it is easy to find a basis $\{\psi_1, \psi_2, \ldots, \psi_{N_v}\}$ for $P_h^{(1)}$ with the property that

Such a basis would have to have the property that, for any $\alpha \in \mathbf{R}^{N_v}$,

that is, that

$$\psi_i(z_j) = \begin{cases} 1, & i = j, \\ 0, & i \ne j. \end{cases} \qquad (4.1)$$

Figure 4.3. Two continuous piecewise linear functions.

Figure 4.4. Standard basis functions for two spaces of continuous piecewise linear functions.

The condition (4.1) uniquely defines the basis functions $\psi_i$, $i = 1, 2, \ldots, N_v$. Typical examples, again for the triangulations from Figure 4.2, are shown in Figure 4.4. A basis satisfying (4.1) is called a Lagrange basis or a nodal basis.
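On a single triangle, both the coefficients $a, b, c$ of a linear function and the local nodal basis functions can be found by solving a 3 x 3 linear system built from the vertex coordinates. The following is a minimal sketch; the triangle and the helper names `linear_coefficients` and `nodal_basis_coefficients` are assumptions made for illustration.

```python
import numpy as np

def linear_coefficients(vertices, values):
    """Return (a, b, c) such that a + b*x + c*y matches `values` at the three `vertices`."""
    V = np.asarray(vertices, dtype=float)                  # shape (3, 2)
    A = np.column_stack([np.ones(3), V[:, 0], V[:, 1]])    # rows [1, x_k, y_k]
    return np.linalg.solve(A, np.asarray(values, dtype=float))

def nodal_basis_coefficients(vertices):
    """Row i holds (a, b, c) for the local basis function equal to 1 at vertex i and 0 at the others."""
    return np.array([linear_coefficients(vertices, e) for e in np.eye(3)])

def evaluate(coeffs, x, y):
    """Evaluate a + b*x + c*y."""
    a, b, c = coeffs
    return a + b * x + c * y

# Hypothetical triangle with vertices (0,0), (1,0), (0,1).
tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
C = nodal_basis_coefficients(tri)

# Nodal property: basis function i takes the value 1 at vertex i and 0 at the other two.
nodal_values = np.array([[evaluate(C[i], *tri[j]) for j in range(3)] for i in range(3)])
print(C)
print(nodal_values)
```

The 3 x 3 system is nonsingular exactly when the three vertices are not collinear, which is the algebraic counterpart of the statement that three points determine a plane.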

## 4.1.1 Using piecewise linear functions in Galerkin's method

Before I discuss the details of using continuous piecewise linear functions in the Galerkin method, I want to show that it is correct to do so, namely, that a valid approximating subspace can be constructed of continuous piecewise polynomial functions. For example, consider the Neumann problem

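where $\Omega$ is assumed to be polygonal. The weak form is

where

To apply Galerkin's method, a subspace $V_h$ of $V$ is needed. To justify the choice $V_h = P_h^{(1)}$, it must be shown that $P_h^{(1)}$ is a subspace of $V$, that is, that a continuous piecewise linear function belongs to $H^1(\Omega)$. If $v \in P_h^{(1)}$, with $v(x, y) = a_i + b_i x + c_i y$ for $(x, y) \in T_i$, then, in the classical sense,

$$\frac{\partial v}{\partial x}(x, y) = b_i, \quad (x, y) \in \mathrm{int}(T_i), \qquad (4.5a)$$

$$\frac{\partial v}{\partial y}(x, y) = c_i, \quad (x, y) \in \mathrm{int}(T_i). \qquad (4.5b)$$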

Here $\mathrm{int}(T_i)$ denotes the interior of the set $T_i$, that is, the triangular region, not including the boundary. In most cases, the classical derivatives of $v$ are undefined on the boundaries of most triangles $T_i$, since there is no reason to expect the derivatives to be continuous across the boundary between two adjacent triangles. (A linear function is piecewise linear, so in certain special cases the derivatives can be continuous across some or all boundaries between triangles.) I will show that (4.5) defines the weak partial derivatives of $v$. This is all I need to show, since the functions defined by (4.5) obviously belong to $L^2(\Omega)$. For any $\phi \in C_0^\infty(\Omega)$,

where $n^{(i)}$ is the outward-pointing unit normal to $\partial T_i$. Every edge $e$ of a triangle $T_i$ either belongs to $\partial\Omega$ or is the edge of one other triangle $T_k$. In the first case, $\phi$ is zero over $e$, and thus

In the second case, $n^{(i)} = -n^{(k)}$ on $e$, and both

Since $v$ is continuous, it follows that

and thus these two integrals cancel. The conclusion of this reasoning is that

and thus

where $\partial v/\partial x$ is defined by (4.5a). This shows that (4.5a) defines the weak partial derivative with respect to $x$ of $v$. A similar argument shows that (4.5b) defines the weak partial derivative of $v$ with respect to $y$, and thus $P_h^{(1)}$ is a subspace of $V$.

Next I will show how to incorporate Dirichlet boundary conditions into the definition of a space of continuous piecewise linear functions. The following example will be used:

Here $\Gamma_1$ and $\Gamma_2$ form a partition of $\partial\Omega$, and it is assumed that any points where $\Gamma_1$ and $\Gamma_2$ meet are nodes in the triangulation, and also that any such nodes belong to $\Gamma_1$. The weak form of (4.6) is

where

and $a(\cdot,\cdot)$ and $\ell$ are defined as before. If $e$ is an edge of a triangle $T_i \in \mathcal{T}_h$ and $e$ lies on $\partial\Omega$, then $e$ will be called a free (boundary) edge if one or both endpoints lie in $\Gamma_2$. On the other hand, $e$ is a constrained edge if both endpoints lie in $\Gamma_1$. The nodes lying on $\Gamma_2$ or in $\Omega$ are called free nodes, while those lying in $\Gamma_1$ are called constrained nodes. The desired approximating subspace of $V$ is

The reader should notice that, because $v$ is linear on each edge, $v = 0$ on $\Gamma_1$ if and only if $v = 0$ at every node contained in $\Gamma_1$. The dimension of $V_h$ is the number of free nodes (the nodal values of any $v \in V_h$ at constrained nodes are already known).

It is necessary to establish some notation to distinguish the free nodes and constrained nodes. The number of free nodes will be denoted by $N_f$ and the number of constrained nodes by $N_c$. I define a sequence $f_1, f_2, \ldots, f_{N_f}$ so that

$$z_{f_1}, z_{f_2}, \ldots, z_{f_{N_f}}$$

are the free nodes, and another sequence $c_1, c_2, \ldots, c_{N_c}$ so that

$$z_{c_1}, z_{c_2}, \ldots, z_{c_{N_c}}$$

are the constrained nodes.

EXAMPLE 4.1. As an example of the above notation, let $\Omega$ be the unit square,

with

(so that $\Gamma_1$ is the top edge of the square) and $\Gamma_2 = \partial\Omega \setminus \Gamma_1$ (so that $\Gamma_2$ consists of the left, bottom, and right edges of the square). Figure 4.5 shows a mesh defined on $\Omega$, indicating the enumeration of both the triangles and the nodes. In this mesh, $N_t = 32$ and $N_v = 25$. Recall that the integers $n_{i,1}, n_{i,2}, n_{i,3}$ are the indices of the three vertices of the triangle $T_i$. For example, Figure 4.5 shows that

Since the first 20 nodes are free, $N_f = 20$ and $f_k = k$, $k = 1, 2, \ldots, 20$. Nodes 21, 22, 23, 24, and 25 are constrained, so $N_c = 5$ and

$$c_1 = 21, \quad c_2 = 22, \quad c_3 = 23, \quad c_4 = 24, \quad c_5 = 25.$$

Figure 4.5. A triangulation of the unit square. The left graph shows how the 32 triangles are enumerated, while the right graph shows the enumeration of the 25 nodes.

It is important to note that the enumeration of the triangles and of the nodes is not unique; the same mesh can be enumerated in different ways. If $V_h$ is a subspace of $P_h^{(1)}$, as described above, then a basis for $V_h$ consists of those standard basis functions $\psi_1, \psi_2, \ldots, \psi_{N_v}$ corresponding to free nodes. That is,

$$\{\psi_{f_1}, \psi_{f_2}, \ldots, \psi_{f_{N_f}}\}$$

is a basis for $V_h$. For convenience of notation, I will write $\phi_k = \psi_{f_k}$, so that the basis for $V_h$ can be written as

$$\{\phi_1, \phi_2, \ldots, \phi_{N_f}\}.$$

## Inhomogeneous Dirichlet conditions

In Section 2.3.3, I explained how to handle inhomogeneous Dirichlet boundary conditions, at least in principle. The weak form of the BVP

is the following:

where $G$ is any function in $H^1(\Omega)$ satisfying the boundary condition $G = g$ on $\Gamma_1$. It may not be easy to find such a $G$ exactly, but it is easy to define a function $G$ that approximately satisfies the boundary condition. Indeed, the continuous piecewise linear function $G$ defined by

agrees with $g$ at the endpoints of every constrained edge and therefore interpolates $g$ on $\Gamma_1$. This function $G$ is a sufficiently good approximation to $g$ for the purposes of the finite element method (see Section 5.3).

## 4.1.2 The sparsity of the stiffness matrix

I will now discuss why it is advantageous to use $P_h^{(1)}$ (or a subspace) as the approximating subspace in the Galerkin method. One advantage is that it is easy to work with polynomials (and particularly linear functions). Evaluating, differentiating, and integrating them is simple. The second reason is that when the standard nodal basis is used, the resulting stiffness matrix is sparse, that is, has few nonzero entries. As I showed in Section 3.2, if the approximating subspace $V_h$ has basis $\{\phi_1, \phi_2, \ldots, \phi_{N_f}\}$, then the stiffness matrix belongs to $\mathbf{R}^{N_f \times N_f}$ and has entries

Figure 4.6. The support of the standard basis functions $\phi_1$ and $\phi_{19}$ for the mesh given in Figure 4.5.

In the scalar model problem,

and the reason that $K$ is sparse is simply that each standard basis function $\phi_i$ is zero over most of the domain $\Omega$. It follows that, for most choices of $i$ and $j$, the integral

is zero, since V0(- V0y- is zero over all Q. This is not true for all pairs i, j, i ^ j, but it holds for most of them. The sparsity of K will be illustrated in detail for the mesh in Example 4.1. Figure 4.6 shows the supports of <p\ and 0i9. The support of 0i is T\ U TI, while the support of 0i9 is Ti\ U T22 U T23 U ?3o U T3i U T32. Since these two supports are disjoint, it follows that V0i V0|9 is zero everywhere on Q, and therefore K\^ #191 = 0. The previous paragraph shows that some entries in K are necessarily zero. In fact, most entries are zero. An entry Ktj can be nonzero only if the supports of 0, and 0; have some nontrivial overlap (intersection just along an edge of a triangle is trivial, since an edge has area zero). Examining any triangulation, it is easy to see that this is true only if free nodes i and j are vertices of a common triangle, in which case the nodes i and j are called adjacent. For example, consider free node 13 in the mesh displayed in Figure 4.5. The support of the corresponding basis function 0)3 is shown in Figure 4.7. The only free nodes adjacent to node 13 are nodes 7, 8, 12, 13, 14, 18, and 19. Therefore, in the 13th row of K, only entries can be nonzero. In this example, the stiffness matrix is 20 x 20, and no row can have more than seven nonzeros (if node / is on or close to the boundary, then row / will have fewer than

76

## Chapter 4. Piecewise polynomials and the finite element method

Figure 4.7. The support of the standard basis function (j>\\$ for the mesh given in Figure 4.5. The free nodes of the mesh are labeled in this graph.

Figure 4.8. The sparsity pattern of the stiffness matrix K corresponding to the mesh given in Figure 4.5. The 20 x 20 matrix K has 82 nonzeros. seven nonzeros). Therefore, the matrix K is fairly sparse. The sparsity pattern of K (for ic(x, y) = 1) is illustrated in Figure 4.8. (The reader will notice only, at most, five nonzeros per row instead of seven as indicated above. Entries like "13,7 and #13,19 happen to be zero due to the symmetry in the mesh.) It should be noted that as a general rule, when a triangulation is refined the number of nodes adjacent to a given node does not increase. For example, the mesh in Figure 4.9 has four times as many triangles as the mesh in Figure 4.5. Assuming the same boundary conditions, this finer mesh would have 72 free nodes and K would be 72 x 72. However, a node in the center of the mesh would still have at most seven adjacent nodes. Therefore, the degree of sparsity of the stiffness matrix increases as the mesh is refined. For example, for the coarser mesh, the 20 x 20 matrix contains 82 nonzeros, so about 20% of the entries are nonzero (82 out of 400). For the finer mesh, the matrix is 72 x 72 and 326 entries are nonzero, which is about 6%.
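The adjacency argument can be checked with a small computation. The following is a minimal sketch, not the mesh of Figure 4.5: it assumes a hypothetical uniform mesh of the unit square (an $n \times n$ grid of squares, each cut into two triangles, with every boundary node constrained) and records which entries $K_{ij}$ can be nonzero because free nodes $i$ and $j$ are vertices of a common triangle. The function names are illustrative only.

```python
def p1_pattern(n):
    """Return (number of free nodes, set of index pairs (p, q) whose entry of K
    can be nonzero) on an n-by-n grid of squares, each split into two
    triangles, with all boundary nodes constrained (assumed mesh)."""
    def idx(i, j):
        return i * (n + 1) + j          # global number of grid node (i, j)

    free = {idx(i, j) for i in range(1, n) for j in range(1, n)}
    pattern = set()
    for i in range(n):
        for j in range(n):
            a, b = idx(i, j), idx(i + 1, j)
            c, d = idx(i + 1, j + 1), idx(i, j + 1)
            for tri in ((a, b, c), (a, c, d)):
                # Two free nodes are adjacent if they are vertices of one triangle.
                pattern.update((p, q) for p in tri for q in tri
                               if p in free and q in free)
    return len(free), pattern


def max_row_count(n):
    """Largest number of possibly nonzero entries in any row of K."""
    _, pattern = p1_pattern(n)
    counts = {}
    for p, _q in pattern:
        counts[p] = counts.get(p, 0) + 1
    return max(counts.values())


for n in (4, 8, 16):
    nf, pattern = p1_pattern(n)
    print(n, nf, len(pattern), max_row_count(n), len(pattern) / nf**2)
```

Under these assumptions the largest row count stays at seven while the fraction of possibly nonzero entries falls with each refinement, which is the trend described above.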


Figure 4.9. A finer mesh (compare the mesh in Figure 4.5).

## 4.2 Quadratic Lagrange triangles

I will now discuss the use of higher-order piecewise polynomials, beginning with continuous piecewise quadratics. I will continue to assume that the underlying domain $\Omega$ is polygonal and that the piecewise polynomials are defined on a triangular mesh.

## 4.2.1 Continuous piecewise quadratic functions

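A linear function $f(x, y) = a + bx + cy$ is determined by three parameters, and this fact makes it natural to use piecewise linear functions defined on a triangulation. A quadratic function is of the form

$$f(x, y) = a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 xy + a_6 y^2$$

and is therefore determined by six parameters. Six nodes are therefore needed on each triangle: the three vertices together with the midpoints of the three edges.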


The space of all continuous piecewise quadratic functions defined on a given triangulation $\mathcal{T}_h$ will be denoted $P_h^{(2)}$. The particular functions allowed on each triangle are called the shape functions of the element. In the case of $P_h^{(2)}$, the shape functions are quadratic polynomials.

The additional nodes described above (the midpoints of every triangle edge) are now part of the triangulation, and I must establish some notation to describe this more complicated situation. The nodes of the mesh, including both triangle vertices and edge midpoints, will be denoted by $z_1, z_2, \ldots, z_{N_v}$. I define $m_{i,j}$ so that the nodes of triangle $T_i$ are

$$z_{m_{i,1}}, z_{m_{i,2}}, \ldots, z_{m_{i,6}}.$$

The vertices of $T_i$ retain the notation used for linear Lagrange triangles. (At certain times, for instance when integrating over a triangle, it is necessary to distinguish the vertices from the other nodes.) As in the case of $P_h^{(1)}$, a Lagrange basis $\{\psi_1, \psi_2, \ldots, \psi_{N_v}\}$ of $P_h^{(2)}$ is defined by

$$\psi_j(z_i) = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases}$$

A triangulation of the type described in this section, suitable for use with continuous piecewise quadratic functions, will be called a mesh consisting of quadratic Lagrange triangles. The simpler mesh described in the preceding section consists of linear Lagrange triangles. Figure 4.10 shows a mesh of quadratic Lagrange triangles, while Figure 4.11 shows two examples of the corresponding standard basis functions, $\psi_5$ and $\psi_{18}$. There are two essentially different types of basis functions for the space of continuous piecewise quadratic functions: $\psi_i$ looks like one of the functions in Figure 4.11, depending on whether $z_i$ is a triangle vertex or an edge midpoint.

## 4.2.2 The finite element method with quadratic Lagrange triangles

Given the above description of the space $P_h^{(2)}$, it is not much harder to implement the finite element method using quadratic Lagrange triangles than it was with linear Lagrange triangles. The Galerkin method is completely abstract, and the only thing that changes in going from piecewise linear functions to piecewise quadratic functions is the choice of the approximating subspace. As an example, I will consider the BVP with mixed boundary conditions

$$\begin{aligned} -\nabla \cdot (\kappa \nabla u) &= f \ \text{ in } \Omega, \\ u &= 0 \ \text{ on } \Gamma_1, \\ \kappa \frac{\partial u}{\partial n} &= 0 \ \text{ on } \Gamma_2, \end{aligned} \tag{4.12}$$

where $\Gamma_1$ and $\Gamma_2$ form a partition of $\partial\Omega$. As before, any point where $\Gamma_1$ and $\Gamma_2$ meet must be a triangle vertex and must belong to $\Gamma_1$.


Figure 4.10. A mesh of quadratic Lagrange triangles (the nodes of the mesh are labeled).

Figure 4.11. The standard basis functions $\psi_5$ (left) and $\psi_{18}$ (right) for $P_h^{(2)}$ defined on the mesh from Figure 4.10.

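It is necessary, as before, to distinguish between free nodes (those belonging to $\Omega$ or $\Gamma_2$) and constrained nodes (those belonging to $\Gamma_1$). The free nodes will be denoted as $z_{f_1}, z_{f_2}, \ldots, z_{f_{N_f}}$ and the constrained nodes as $z_{c_1}, z_{c_2}, \ldots, z_{c_{N_c}}$, as before. The weak form of (4.12) is

$$\text{find } u \in V \text{ such that } \int_\Omega \kappa \nabla u \cdot \nabla v = \int_\Omega f v \ \text{ for all } v \in V,$$

where $V = \{v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_1\}$.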


In order that $v \in P_h^{(2)}$ satisfy $v = 0$ on $\Gamma_1$, it suffices that $v$ have value zero at every node belonging to $\Gamma_1$. This is sufficient because a triangle edge having a nontrivial intersection with $\Gamma_1$ lies entirely in $\Gamma_1$ by assumption. A basis for $V_h = \{v \in P_h^{(2)} : v = 0 \text{ on } \Gamma_1\}$ is therefore

$$\{\psi_{f_1}, \psi_{f_2}, \ldots, \psi_{f_{N_f}}\}$$

or

$$\{\phi_1, \phi_2, \ldots, \phi_{N_f}\},$$

where $\phi_i = \psi_{f_i}$. The stiffness matrix is given by

$$K_{ij} = \int_\Omega \kappa \nabla\phi_j \cdot \nabla\phi_i, \quad i, j = 1, 2, \ldots, N_f,$$

and the load vector by

$$F_i = \int_\Omega f \phi_i, \quad i = 1, 2, \ldots, N_f.$$

In this notation, the formulas are exactly the same as those presented in the case of piecewise linear functions. Of course, the basis function represented by $\phi_i$ is now different. But, as the reader will see, the implementation of the method in a computer program does not change much when piecewise quadratic functions are substituted for piecewise linear functions.

EXAMPLE 4.2. Figure 4.12 shows a mesh (for the unit square) consisting of quadratic Lagrange triangles. The mesh contains 32 triangles, 81 nodes, and 49 free nodes (Dirichlet conditions are imposed on the entire boundary). I computed the stiffness matrix $K$ corresponding to the BVP (4.12) for this mesh and for $\kappa(x, y) = 1$. The sparsity pattern of the stiffness matrix is also shown in Figure 4.12. The matrix contains 405 nonzeros; since it is $49 \times 49$, about 17% of the entries are nonzero. The pattern of the nonzeros, though not the number of nonzeros, depends on the order in which the nodes are numbered.

A refined mesh for the same domain is shown in Figure 4.13. It contains 128 triangles, 289 nodes, and 225 free nodes. The stiffness matrix, which is also illustrated in Figure 4.13, contains 2229 nonzeros (about 4% of the entries).

## 4.3 Cubic Lagrange triangles

## 4.3.1 Continuous piecewise cubic functions


Figure 4.12. A mesh for the unit square consisting of quadratic Lagrange triangles (left) and the sparsity pattern of the corresponding stiffness matrix (right).

Figure 4.13. A refined mesh for the unit square consisting of quadratic Lagrange triangles (left) and the sparsity pattern of the corresponding stiffness matrix (right).

A cubic function of two variables has the form

$$f(x, y) = a_1 + a_2 x + a_3 y + a_4 x^2 + a_5 xy + a_6 y^2 + a_7 x^3 + a_8 x^2 y + a_9 x y^2 + a_{10} y^3$$

and is therefore determined by 10 parameters. To define a continuous piecewise cubic function on a triangular mesh, it is necessary to add seven nodes to each triangle, in addition to the three vertices. The placement of these nodes is determined by the following fact: A cubic function in two variables, restricted to a line segment, reduces to a cubic function in a single variable (when the line segment is parametrized). A cubic function of a single variable is determined by four nodal values, which means that each triangle edge must contain a total of four nodes. Then cubic functions defined on two adjacent triangles will agree on the common edge provided they agree at the four nodes, making it easy to guarantee the continuity of a piecewise cubic function determined by its nodal values. These four nodes consist of the two vertices (the endpoints of the edge) and two points placed at regular intervals between the two vertices. Thus if the two vertices are $(x_1, y_1)$ and $(x_2, y_2)$, the other two nodes on the edge are

$$\left(\tfrac{2}{3} x_1 + \tfrac{1}{3} x_2,\ \tfrac{2}{3} y_1 + \tfrac{1}{3} y_2\right), \quad \left(\tfrac{1}{3} x_1 + \tfrac{2}{3} x_2,\ \tfrac{1}{3} y_1 + \tfrac{2}{3} y_2\right).$$

The reader should notice that the regular placement of the nodes on the edges guarantees that two triangles intersecting along an edge share four common nodes.

Adding two nodes on each edge yields six additional nodes, or nine in total when the vertices are taken into account. The final node must lie in the interior of the triangle, and it is natural to place it at the centroid of the triangle. Therefore, if the three vertices of the triangle are $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$, then the interior node will be

$$\left(\frac{x_1 + x_2 + x_3}{3},\ \frac{y_1 + y_2 + y_3}{3}\right).$$

A triangulation consisting of triangles, each having 10 nodes as described above, is said to consist of cubic Lagrange triangles. Figure 4.14 shows a mesh of cubic Lagrange triangles defined on the unit square. It contains 8 triangles, 49 nodes, and 25 free nodes (Dirichlet conditions are imposed on the entire boundary).
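The node placement just described is easy to compute. The following is a minimal sketch; the function name and the ordering of the returned nodes (vertices, then edge nodes, then centroid) are assumptions made for illustration, not a convention taken from the text.

```python
def cubic_lagrange_nodes(v1, v2, v3):
    """Return the 10 nodes of a cubic Lagrange triangle: the three vertices,
    two nodes on each edge, and the centroid (ordering is an assumption)."""
    verts = [tuple(map(float, v)) for v in (v1, v2, v3)]
    nodes = list(verts)
    for a, b in ((0, 1), (1, 2), (2, 0)):
        (xa, ya), (xb, yb) = verts[a], verts[b]
        nodes.append(((2 * xa + xb) / 3, (2 * ya + yb) / 3))  # one third of the way from a to b
        nodes.append(((xa + 2 * xb) / 3, (ya + 2 * yb) / 3))  # two thirds of the way from a to b
    cx = sum(x for x, _ in verts) / 3
    cy = sum(y for _, y in verts) / 3
    nodes.append((cx, cy))                                     # centroid
    return nodes


for node in cubic_lagrange_nodes((0, 0), (3, 0), (0, 3)):
    print(node)
```

Because the edge nodes sit at one third and two thirds of the way along each edge, traversing a shared edge in the opposite direction produces the same two points, which is why adjacent triangles share four nodes on a common edge.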

Figure 4.14. A mesh for the unit square consisting of cubic Lagrange triangles (left) and the sparsity pattern of the corresponding stiffness matrix (right).


As in the case of quadratic Lagrange triangles, the nodes of the mesh are denoted $z_1, z_2, \ldots, z_{N_v}$. The 10 nodes of triangle $T_i$ are denoted by

$$z_{m_{i,1}}, z_{m_{i,2}}, \ldots, z_{m_{i,10}}.$$

The notation for free and constrained nodes remains the same as before.

## 4.3.2 The finite element method with cubic Lagrange triangles

Using continuous piecewise cubic functions in the finite element method merely means applying Galerkin's method with the space $P_h^{(3)}$ (or an appropriate subspace taking into account the Dirichlet boundary conditions) as the approximating subspace. Here $P_h^{(3)}$ denotes the space of all continuous piecewise cubic functions defined on a given triangulation $\mathcal{T}_h$. The standard basis for $P_h^{(3)}$ is $\{\psi_1, \psi_2, \ldots, \psi_{N_v}\}$, where $\psi_j$ is defined by

$$\psi_j(z_i) = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases}$$

The basis functions corresponding to the free nodes $z_{f_1}, z_{f_2}, \ldots, z_{f_{N_f}}$ are denoted by

$$\phi_1, \phi_2, \ldots, \phi_{N_f},$$

where $\phi_i = \psi_{f_i}$. Since there are three different placements for nodes on cubic Lagrange triangles (centroid, interior of edge, and vertex), the standard basis functions take three different shapes. These are illustrated in Figure 4.15. The formulas for the stiffness matrix and the load vector are the same as before:

$$K_{ij} = \int_\Omega \kappa \nabla\phi_j \cdot \nabla\phi_i, \quad F_i = \int_\Omega f \phi_i.$$

Figure 4.15. Three standard basis functions for $P_h^{(3)}$ defined on the mesh from Figure 4.14: one corresponding to a centroid node (left), one to an edge node (center), and one to a vertex (right).


Once again, as I will show in the second part of the book, the algorithm for computing $K$ and $F$ does not change very much in going from piecewise linear or quadratic functions to piecewise cubic functions.

EXAMPLE 4.3. I computed the stiffness matrix for (4.12) and the mesh shown in Figure 4.14, taking $\kappa(x, y) = 1$. The sparsity pattern of the resulting $K$ is shown in Figure 4.14. The matrix $K$ is $25 \times 25$, and 229 of the 625 entries are nonzero (about 37%). Refining the mesh of Figure 4.14 by replacing each triangle with four results in a mesh with 32 triangles, 169 nodes, and 121 free nodes. (This finer mesh is not shown.) About 11% (1513 out of 14641) of the entries are nonzero.
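The triangle and node counts quoted in Examples 4.2 and 4.3 can be reproduced with simple arithmetic. The sketch below assumes, as the counts themselves suggest, that each mesh is an $n \times n$ grid of squares with every square cut into two triangles, so that the nodes of degree-$d$ Lagrange triangles form a regular lattice; the function name is hypothetical.

```python
def lagrange_mesh_counts(n, d):
    """Triangles, nodes, and free nodes for degree-d Lagrange triangles on an
    n-by-n grid of squares, each cut into two triangles, with Dirichlet
    conditions on the whole boundary (assumed mesh layout)."""
    triangles = 2 * n * n
    nodes = (d * n + 1) ** 2   # nodes form a regular (dn+1)-by-(dn+1) lattice
    free = (d * n - 1) ** 2    # lattice points strictly inside the square
    return triangles, nodes, free


# Quadratic meshes of Example 4.2, then cubic meshes of Example 4.3.
for n, d in [(4, 2), (8, 2), (2, 3), (4, 3)]:
    print(d, lagrange_mesh_counts(n, d))
```

The free-node count fixes the size of $K$; dividing a quoted number of nonzeros by its square gives the percentages reported in the examples.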

## 4.4 Lagrange triangles of arbitrary degree

The constructions of the previous sections can be generalized to the case of continuous piecewise polynomial functions of degree $d$. As before, the placement of the nodes in a triangular element is determined by two requirements:

1. On each edge there must be $d + 1$ nodes, since a one-dimensional polynomial of degree $d$ has $d + 1$ degrees of freedom (that is, $d + 1$ coefficients). Each edge contains two vertices, and the other $d - 1$ nodes will be regularly spaced between them. The total number of nodes on the boundary of the triangular element is therefore $3 + 3(d - 1) = 3d$.
2. A polynomial of degree $d$ in two variables is determined by

   $$1 + 2 + \cdots + (d + 1) = \frac{(d + 1)(d + 2)}{2}$$

   parameters, since there is one constant term, two terms of degree one ($x$, $y$), three terms of degree two ($x^2$, $xy$, $y^2$), and so forth, up to $d + 1$ terms of degree $d$ ($x^d, x^{d-1}y, \ldots, xy^{d-1}, y^d$). Since $3d$ nodes lie on the boundary of the triangle, the remainder must lie in the interior. A simple calculation shows that

   $$\frac{(d + 1)(d + 2)}{2} - 3d = \frac{(d - 1)(d - 2)}{2}.$$

For $d = 1$, $d = 2$, $d = 3$, this formula yields 0, 0, 1, respectively, for the number of interior nodes. This is consistent with the constructions of linear, quadratic, and cubic Lagrange triangles presented previously. For $d > 3$, the interior nodes can be arranged on a triangular lattice, as shown in Figure 4.16 for $d = 4$, $d = 5$, and $d = 6$.

Figure 4.16. Lagrange triangles of degrees $d = 4$ (left), $d = 5$ (center), and $d = 6$ (right).

Once the Lagrange elements have been defined, the rest of the development proceeds as in the case of quadratic or cubic Lagrange triangles. The Lagrange basis for $P_h^{(d)}$, $\{\psi_1, \psi_2, \ldots, \psi_{N_v}\}$, is defined by

$$\psi_j(z_i) = \begin{cases} 1, & i = j, \\ 0, & i \neq j, \end{cases}$$

where $z_1, z_2, \ldots, z_{N_v}$ are the nodes in the mesh. Each node on the boundary of the domain $\Omega$ is designated as constrained or free, depending on whether a Dirichlet condition is posed there or not. All nodes in the interior of $\Omega$ are free. The free nodes are $z_{f_1}, z_{f_2}, \ldots, z_{f_{N_f}}$, and the basis functions corresponding to these nodes are written as $\phi_1, \phi_2, \ldots, \phi_{N_f}$, where

$$\phi_i = \psi_{f_i}, \quad i = 1, 2, \ldots, N_f.$$

The Galerkin method, which, as noted before, is completely abstract, is then applied with the approximating subspace

$$V_h = \operatorname{span}\{\phi_1, \phi_2, \ldots, \phi_{N_f}\}.$$
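The node counts derived in this section can be confirmed by enumerating the lattice directly. This is a minimal sketch on the triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$; the function names are illustrative, and the nodes are represented by integer indices $(i, j)$ standing for the point $(i/d, j/d)$.

```python
def lagrange_lattice(d):
    """Index pairs (i, j) with i + j <= d; the node itself is (i/d, j/d)."""
    return [(i, j) for i in range(d + 1) for j in range(d + 1 - i)]


def node_counts(d):
    """Return (total, boundary, interior) node counts for degree d."""
    pts = lagrange_lattice(d)
    # A node is interior when it lies on none of the three edges.
    interior = [(i, j) for (i, j) in pts if i > 0 and j > 0 and i + j < d]
    return len(pts), len(pts) - len(interior), len(interior)


for d in range(1, 7):
    print(d, node_counts(d))
```

For each degree the three numbers agree with $(d+1)(d+2)/2$, $3d$, and $(d-1)(d-2)/2$.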

## 4.4.1 Hierarchical bases for finite element spaces

One shortcoming of using Lagrange triangles in the finite element method is that the stiffness matrix $K$ tends to become quite ill-conditioned as the mesh is refined. The consequence is that solving $KU = F$ becomes more difficult, in that direct methods are less accurate and iterative methods are less efficient. I will discuss both direct methods and iterative methods for solving $KU = F$ in Part III. I will also define the condition number of a matrix and show the effects of a large condition number on algorithms for solving $KU = F$.

The ill-conditioning of $K$ actually arises from the choice of basis for the approximating subspace $V_h$, not the choice of the subspace itself. For example, given any subspace $V_h$, I could choose a basis for $V_h$ that is orthonormal with respect to the energy inner product. Then the stiffness matrix $K$, which is given by $K_{ij} = a(\phi_j, \phi_i)$, would be the identity matrix, which is perfectly conditioned. This shows that the ill-conditioning of $K$ is not intrinsic, but rather arises from the choice of basis.

An orthonormal basis may not be practical because of the expense involved in computing the basis. However, there are many other bases that could be used. One possibility is a hierarchical basis (see Yserentant [44]). A hierarchical basis is defined in a natural way when the mesh is obtained by several refinements of an initial, coarse mesh. I will describe the use of hierarchical bases in Section 11.2 in the context of solving $KU = F$.
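The remark about an energy-orthonormal basis can be illustrated numerically. The sketch below is an assumption-laden toy, not a method from the text: it takes a small tridiagonal matrix as a stand-in stiffness matrix, computes its Cholesky factorization $K = LL^T$, and uses the columns of $C = L^{-T}$ as the coefficients of a new basis in terms of the old one. The stiffness matrix in the new basis is then $C^T K C$. All function names are hypothetical.

```python
import math


def cholesky(K):
    """Lower-triangular L with K = L L^T (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, B):
    """Solve L Y = B for the matrix Y by forward substitution."""
    n, m = len(L), len(B[0])
    Y = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for c in range(m):
            Y[i][c] = (B[i][c] - sum(L[i][k] * Y[k][c] for k in range(i))) / L[i][i]
    return Y


def transpose(A):
    return [list(row) for row in zip(*A)]


n = 6
# Stand-in stiffness matrix: tridiagonal (-1, 2, -1), as in a 1D model problem.
K = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
L = cholesky(K)
# C^T K C with C = L^{-T} equals L^{-1} K L^{-T}; K is symmetric, so the
# transpose of L^{-1} K is K L^{-T}.
K_new = forward_solve(L, transpose(forward_solve(L, K)))
for row in K_new:
    print([round(v, 12) for v in row])
```

The factorization is exactly the "expense involved in computing the basis" mentioned above: obtaining $C$ costs as much as solving the original system by a direct method.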


## 4.5 Other finite elements: Rectangles and quadrilaterals

A triangle is not the only possible shape for element domains. If the computational domain happens to be rectangular (or a union of rectangles), then rectangular elements are a natural choice. For nonrectangular domains, general quadrilateral elements can be used.

## 4.5.1 Rectangular elements

A rectangle has four vertices, so it is natural to consider a class of polynomials with four degrees of freedom. Since a linear polynomial is determined by three degrees of freedom and a quadratic by six, only certain quadratic polynomials will be allowed, namely, those of the form

$$a + bx + cy + dxy.$$

Since every product $(\alpha + \beta x)(\gamma + \delta y)$ of linear polynomials can be written in the form $a + bx + cy + dxy$, such polynomials are referred to as bilinear. Given a rectangle aligned with the coordinate axes, say with vertices $(x_1, y_1)$, $(x_2, y_1)$, $(x_2, y_2)$, $(x_1, y_2)$, and given any four real numbers $u_1, u_2, u_3, u_4$, it is straightforward to show that there is a unique polynomial $p(x, y) = a + bx + cy + dxy$ such that

$$p(x_1, y_1) = u_1, \quad p(x_2, y_1) = u_2, \quad p(x_2, y_2) = u_3, \quad p(x_1, y_2) = u_4$$

(see Exercise 8).

If $\Omega$ is a rectangle or a union of rectangles, then a mesh $\mathcal{M}_h$ of rectangles can be defined on $\Omega$. A bilinear function reduces to a linear function in one variable on any edge of a rectangular element. Therefore, it is easy to see that a collection of real numbers, one for each node in the mesh, determines a unique continuous piecewise bilinear function on $\mathcal{M}_h$. (The reader should notice, however, that this property depends on the assumption that the rectangles are aligned with the coordinate axes; see Exercise 9.) I will denote the space of all such functions by $B_h^{(1)}$. If the nodes of the mesh are denoted $z_1, z_2, \ldots, z_{N_v}$, then the standard basis for $B_h^{(1)}$ is $\{\psi_1, \psi_2, \ldots, \psi_{N_v}\}$, where

$$\psi_j(z_i) = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases}$$
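One way to see that four vertex values determine a bilinear function on an axis-aligned rectangle is to write the interpolant down explicitly. The following is a minimal sketch using products of one-dimensional linear Lagrange polynomials; the function name and argument order are assumptions made for illustration.

```python
def bilinear_interpolant(x1, x2, y1, y2, u1, u2, u3, u4):
    """Return p(x, y) = a + b x + c y + d x y taking the values u1, u2, u3, u4
    at (x1, y1), (x2, y1), (x2, y2), (x1, y2), respectively."""
    area = (x2 - x1) * (y2 - y1)

    def p(x, y):
        # Each term is a product of a linear function of x and one of y.
        return (u1 * (x2 - x) * (y2 - y) + u2 * (x - x1) * (y2 - y)
                + u3 * (x - x1) * (y - y1) + u4 * (x2 - x) * (y - y1)) / area

    return p


p = bilinear_interpolant(0.0, 2.0, 0.0, 1.0, 1.0, 2.0, 3.0, 4.0)
print(p(0, 0), p(2, 0), p(2, 1), p(0, 1))  # the four vertex values
print(p(1, 0))                             # midpoint of the bottom edge
```

On the bottom edge the value at the midpoint is the average of the two endpoint values, reflecting the fact that a bilinear function is linear along each edge of the rectangle.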


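The free nodes of the mesh are the nodes that do not belong to $\Gamma_1$, and are denoted as before by $z_{f_1}, z_{f_2}, \ldots, z_{f_{N_f}}$. The basis for the approximating subspace of $B_h^{(1)}$ is then $\{\phi_1, \phi_2, \ldots, \phi_{N_f}\}$, where $\phi_i = \psi_{f_i}$. Just as in the case of triangular elements, the result is a matrix-vector equation $KU = F$, where

$$K_{ij} = \int_\Omega \kappa \nabla\phi_j \cdot \nabla\phi_i$$

and

$$F_i = \int_\Omega f \phi_i.$$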

## 4.5.2 General quadrilaterals

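Rectangular elements are useful when the domain $\Omega$ is very simple, but a more general domain cannot be well-approximated by a mesh of rectangles. A mesh of general quadrilaterals can be used; however, it is not so straightforward to define a space of piecewise polynomials. As Exercise 9 shows, nodal values do not determine continuous piecewise bilinear functions on a mesh of general quadrilaterals. The usual way to define shape functions on a mesh of quadrilaterals is to view each quadrilateral $Q$ as the image of a reference square $S_R$ under a mapping of the form

$$x = a_1 + a_2 s + a_3 t + a_4 st, \quad y = b_1 + b_2 s + b_3 t + b_4 st. \tag{4.15}$$

The reference square $S_R$ is taken to be the square with vertices $(-1, -1)$, $(1, -1)$, $(1, 1)$, and $(-1, 1)$. If the vertices of $Q$ are $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, and $(x_4, y_4)$ (in that order as the boundary $\partial Q$ is traversed), then the mapping (4.15) is determined by the conditions that $(-1, -1)$ be mapped to $(x_1, y_1)$, $(1, -1)$ be mapped to $(x_2, y_2)$, $(1, 1)$ be mapped to $(x_3, y_3)$, and $(-1, 1)$ be mapped to $(x_4, y_4)$. These conditions yield two $4 \times 4$ linear systems that determine $a_1, a_2, a_3, a_4$ and $b_1, b_2, b_3, b_4$:

$$\begin{aligned} a_1 - a_2 - a_3 + a_4 &= x_1, \\ a_1 + a_2 - a_3 - a_4 &= x_2, \\ a_1 + a_2 + a_3 + a_4 &= x_3, \\ a_1 - a_2 + a_3 - a_4 &= x_4 \end{aligned}$$

and

$$\begin{aligned} b_1 - b_2 - b_3 + b_4 &= y_1, \\ b_1 + b_2 - b_3 - b_4 &= y_2, \\ b_1 + b_2 + b_3 + b_4 &= y_3, \\ b_1 - b_2 + b_3 - b_4 &= y_4. \end{aligned}$$

Writing $a = (a_1, a_2, a_3, a_4)$, $b = (b_1, b_2, b_3, b_4)$, $x = (x_1, x_2, x_3, x_4)$, and $y = (y_1, y_2, y_3, y_4)$, the two systems can be written

$$Ma = x, \quad Mb = y,$$

where

$$M = \begin{bmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}.$$

The inverse of $M$ is

$$M^{-1} = \frac{1}{4} \begin{bmatrix} 1 & 1 & 1 & 1 \\ -1 & 1 & 1 & -1 \\ -1 & -1 & 1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix},$$

so explicit formulas for $a$ and $b$ are easily derived.

The bilinear shape functions on $S_R$ are then mapped to $Q$, defining the shape functions for that element. The shape functions on $S_R$ are represented by the basis functions

$$\gamma_1(s, t) = \tfrac{1}{4}(1 - s)(1 - t), \quad \gamma_2(s, t) = \tfrac{1}{4}(1 + s)(1 - t), \quad \gamma_3(s, t) = \tfrac{1}{4}(1 + s)(1 + t), \quad \gamma_4(s, t) = \tfrac{1}{4}(1 - s)(1 + t)$$

(see Exercise 10). As the following example shows, the shape functions on $Q$ are usually not bilinear.

EXAMPLE 4.4. Consider the trapezoid $Q$ with vertices $(0, 0)$, $(3, 0)$, $(2, 1)$, and $(1, 1)$. Using the reasoning given above, the mapping from $S_R$ to $Q$ is

$$x = \tfrac{3}{2} + s - \tfrac{1}{2} st, \quad y = \tfrac{1}{2} + \tfrac{1}{2} t.$$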
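The mapping for the trapezoid of Example 4.4 can be computed mechanically. The sketch below assumes the coefficients are ordered as the multipliers of $1$, $s$, $t$, $st$ (an assumption about notation; the resulting mapping does not depend on it) and uses the fact that, for the reference square with vertices $(\pm 1, \pm 1)$, the coefficient matrix satisfies $M^{-1} = \tfrac{1}{4} M^T$. The names are illustrative.

```python
# Vertices of the reference square, counterclockwise from (-1, -1).
REF = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
# Row k holds (1, s, t, s*t) evaluated at the k-th reference vertex.
M = [[1, s, t, s * t] for s, t in REF]


def coefficients(values):
    """Solve M c = values using M^{-1} = M^T / 4."""
    return [sum(M[k][i] * values[k] for k in range(4)) / 4 for i in range(4)]


Q = [(0, 0), (3, 0), (2, 1), (1, 1)]          # trapezoid of Example 4.4
a = coefficients([x for x, _ in Q])           # x = a0 + a1 s + a2 t + a3 s t
b = coefficients([y for _, y in Q])           # y = b0 + b1 s + b2 t + b3 s t


def F(s, t):
    """Map a point of the reference square to the trapezoid."""
    monomials = (1, s, t, s * t)
    return (sum(c * m for c, m in zip(a, monomials)),
            sum(c * m for c, m in zip(b, monomials)))


print(a, b)
print([F(s, t) for s, t in REF])
```

The reference vertices are sent to the vertices of $Q$ in order, and the center of the square is sent to the average of the four vertices.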

The basis functions on $Q$ are then defined by

$$\phi_i(x, y) = \gamma_i(s, t), \quad (x, y) = F(s, t), \quad i = 1, 2, 3, 4.$$

Direct calculation then shows that

$$\phi_1(x, y) = \frac{(3 - x - y)(1 - y)}{3 - 2y}, \quad \phi_2(x, y) = \frac{(x - y)(1 - y)}{3 - 2y}, \quad \phi_3(x, y) = \frac{(x - y)\,y}{3 - 2y}, \quad \phi_4(x, y) = \frac{(3 - x - y)\,y}{3 - 2y}.$$

The shape functions on $Q$ are therefore rational functions.

In the previous example, the nodal basis functions $\phi_1, \phi_2, \phi_3, \phi_4$ are linear on the edges of $Q$, as is easily verified (see Exercise 11). This property is true in general: Although the nodal basis functions, generated by the above technique for an arbitrary quadrilateral $Q$, are rational functions, each reduces to a linear function on the edges of $Q$. This fact can be used to prove that, on a mesh of quadrilaterals, the nodal values determine a unique continuous and piecewise rational function (see Exercise 12).

I now assume that a mesh $\mathcal{M}_h$ of quadrilaterals is defined on a polygonal domain $\Omega$. As usual, the nodes of the mesh are denoted by $z_1, z_2, \ldots, z_{N_v}$. The space of continuous piecewise functions on $\mathcal{M}_h$, constructed from bilinear functions on $S_R$ as described above, is denoted by $B_h^{(1)}$. The nodal basis for $B_h^{(1)}$ is $\{\psi_1, \psi_2, \ldots, \psi_{N_v}\}$, where $\psi_j$ is defined by

$$\psi_j(z_i) = \begin{cases} 1, & i = j, \\ 0, & i \neq j, \end{cases}$$

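for $j = 1, 2, \ldots, N_v$. The free nodes are denoted $z_{f_1}, z_{f_2}, \ldots, z_{f_{N_f}}$, and the approximating subspace $V_h$ has basis $\{\phi_1, \phi_2, \ldots, \phi_{N_f}\}$, where $\phi_i = \psi_{f_i}$. To compute the finite element solution, the stiffness matrix $K$ and the load vector $F$ must be formed. These are defined by

$$K_{ij} = \int_\Omega \kappa \nabla\phi_j \cdot \nabla\phi_i$$

and

$$F_i = \int_\Omega f \phi_i.$$

Since each basis function $\phi_i$ has support consisting of a few elements (four quadrilaterals, to be precise, unless the corresponding node is on the boundary), the basic calculations that must be performed are of the integrals

$$\int_Q \kappa \nabla\phi_j \cdot \nabla\phi_i$$

and

$$\int_Q f \phi_i,$$

where $Q$ is a typical quadrilateral in the mesh. In practice, the basis functions are not computed on $Q$; rather, each integral is transformed to the reference square $S_R$ so that the relatively simple bilinear functions $\gamma_1, \gamma_2, \gamma_3, \gamma_4$ can be used instead.

Given a quadrilateral $Q$, I will write $z = (x, y)$ for a typical point in $Q$ and $u = (s, t)$ for a typical point in $S_R$. The transformation from $S_R$ to $Q$ is denoted $z = F(u)$, and its Jacobian matrix is

$$J(u) = \begin{bmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial x}{\partial t} \\[2mm] \dfrac{\partial y}{\partial s} & \dfrac{\partial y}{\partial t} \end{bmatrix}.$$

According to the rule for changing variables in a multiple integral,

$$\int_Q g(z)\,dz = \int_{S_R} g(F(u))\,|\det(J(u))|\,du.$$

This can be applied directly to the formula for $F_i$. In the following formulas, it is convenient to use local indices: For a given quadrilateral $Q$, the vertices are denoted by $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, $(x_4, y_4)$ and the corresponding basis functions (the only ones that are nonzero on $Q$) are denoted by $\phi_1, \phi_2, \phi_3, \phi_4$. Then, under the transformation $F$, $\phi_i$ is transformed to $\gamma_i$:

$$\gamma_i(u) = \phi_i(F(u)).$$

Here is the formula needed for assembling the load vector:

$$\int_Q f \phi_i = \int_{S_R} f(F(u))\,\gamma_i(u)\,|\det(J(u))|\,du.$$

To change variables in the integral

$$\int_Q \kappa \nabla\phi_j \cdot \nabla\phi_i$$

it is necessary to know the relationship between $\nabla\phi_i$ on $Q$ and $\nabla\gamma_i$ on $S_R$. This follows from the chain rule:

$$\nabla\gamma_i(u) = J(u)^T \nabla\phi_i(F(u)), \quad \text{that is,} \quad \nabla\phi_i(F(u)) = J(u)^{-T} \nabla\gamma_i(u).$$

Therefore,

$$\int_Q \kappa \nabla\phi_j \cdot \nabla\phi_i = \int_{S_R} \kappa(F(u)) \left(J(u)^{-T} \nabla\gamma_j(u)\right) \cdot \left(J(u)^{-T} \nabla\gamma_i(u)\right) |\det(J(u))|\,du.$$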


## 4.6 Using a reference triangle in finite element calculations

In the previous section I showed how to use a reference element to extend piecewise bilinear functions from meshes of rectangles to meshes of general quadrilaterals. By defining a one-to-one transformation from the reference element to an arbitrary quadrilateral, the necessary computations could be carried out over the reference element instead of the quadrilateral. The reader will recall that this was necessary because bilinear functions do not extend continuously across element boundaries when the elements are general quadrilaterals as opposed to rectangles.

Although it is not necessary to use a reference element when the elements are triangles, it is advantageous to do so. This is because the basis functions and their gradients can be computed once on the reference triangle and then used, by means of a transformation, on each triangle in the mesh. In this section I will show how this affects the computations.

The reference triangle $T_R$ is the triangle with vertices $(s_1, t_1) = (0, 0)$, $(s_2, t_2) = (1, 0)$, and $(s_3, t_3) = (0, 1)$ (see Figure 4.17). I denote an arbitrary point in $T_R$ by $(s, t)$ or, in vector notation, $u = (s, t)$. Given an arbitrary triangle $T$ with vertices $z_1 = (x_1, y_1)$, $z_2 = (x_2, y_2)$, $z_3 = (x_3, y_3)$, an arbitrary point in $T$ will be denoted by $(x, y)$ or $z = (x, y)$. The reference triangle $T_R$ is mapped to $T$ by the following transformation, which sends $(0, 0)$ to $(x_1, y_1)$, $(1, 0)$ to $(x_2, y_2)$, and $(0, 1)$ to $(x_3, y_3)$:

$$x = x_1 + (x_2 - x_1)s + (x_3 - x_1)t, \quad y = y_1 + (y_2 - y_1)s + (y_3 - y_1)t,$$

or

$$z = z_1 + Ju,$$

where

$$J = \begin{bmatrix} x_2 - x_1 & x_3 - x_1 \\ y_2 - y_1 & y_3 - y_1 \end{bmatrix}.$$

Given any function $f$ defined on $T$, there is a corresponding function $g$ on $T_R$ defined by

$$g(s, t) = f\big(x_1 + (x_2 - x_1)s + (x_3 - x_1)t,\ y_1 + (y_2 - y_1)s + (y_3 - y_1)t\big)$$

or

$$g(u) = f(z_1 + Ju).$$

The function $g$ has the same values as $f$ does, in the sense that if $u \in T_R$ corresponds to $z \in T$, then $g(u) = f(z)$.

The three standard Lagrange basis functions that are nonzero on $T$ will be written (using local indices) as $\phi_i$, $i = 1, 2, 3$; they are defined by

$$\phi_i(z_j) = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases}$$

Corresponding to $\phi_i$ on $T$ is $\gamma_i$ on $T_R$:

$$\gamma_i(u) = \phi_i(z_1 + Ju), \quad i = 1, 2, 3.$$

Moreover, a little algebra shows that each $\gamma_i$ is linear in $(s, t)$. If

$$\phi_i(x, y) = a_i + b_i x + c_i y,$$

then

$$\gamma_i(s, t) = (a_i + b_i x_1 + c_i y_1) + \big(b_i(x_2 - x_1) + c_i(y_2 - y_1)\big)s + \big(b_i(x_3 - x_1) + c_i(y_3 - y_1)\big)t.$$

Since each $\gamma_i$ is linear over $T_R$, it follows that $\gamma_1, \gamma_2, \gamma_3$ are just the standard Lagrange basis functions that are nonzero over $T_R$. The following formulas are easily derived from condition (4.18):

$$\gamma_1(s, t) = 1 - s - t, \quad \gamma_2(s, t) = s, \quad \gamma_3(s, t) = t.$$

The same functions $\gamma_1, \gamma_2, \gamma_3$ correspond to $\phi_1, \phi_2, \phi_3$ over any triangle $T$. This is the efficiency gain I mentioned earlier: $\gamma_1, \gamma_2, \gamma_3$ can be computed once instead of computing $\phi_1, \phi_2, \phi_3$ on each triangle in the mesh.

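The formula for a change of variables in a multiple integral gives

$$\int_T f(z)\,dz = \int_{T_R} g(u)\,|\det(J)|\,du,$$

where $g$ is defined by

$$g(u) = f(z_1 + Ju).$$

The Jacobian factor is

$$|\det(J)| = \left|(x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1)\right|.$$

Since the transformation from $T_R$ to $T$ is linear,⁶ $J$ and its determinant are constant. Therefore,

$$\int_T f(z)\,dz = |\det(J)| \int_{T_R} g(u)\,du.$$

This result applies directly to the problem of computing the load vector:

$$\int_T f \phi_i = |\det(J)| \int_{T_R} \hat{f} \gamma_i, \quad \hat{f}(u) = f(z_1 + Ju).$$

To compute the stiffness matrix, it is necessary to evaluate integrals of the form

$$\int_T \kappa \nabla\phi_j \cdot \nabla\phi_i.$$

As in the previous section, the relationship between $\nabla\phi_i$ and $\nabla\gamma_i$ follows from the chain rule:

$$\nabla\gamma_i = J^T \nabla\phi_i, \quad \text{that is,} \quad \nabla\phi_i = J^{-T} \nabla\gamma_i.$$

Therefore,

$$\int_T \kappa \nabla\phi_j \cdot \nabla\phi_i = |\det(J)| \int_{T_R} \hat{\kappa} \left(J^{-T} \nabla\gamma_j\right) \cdot \left(J^{-T} \nabla\gamma_i\right). \tag{4.21}$$

Here $\hat{\kappa}$ is the function on $T_R$ corresponding to $\kappa$ on $T$. In the piecewise linear case, both $\nabla\phi_i$ and $\nabla\gamma_i$ are constant. Although (4.21) might look a little complicated, it should be noticed that $J$ is just a $2 \times 2$ matrix, and therefore computing $J^{-T} \nabla\gamma_i$ is a simple matter.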
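For piecewise linear functions and constant $\kappa$, the reference-triangle computation of this section reduces to a few lines. The following is a minimal sketch under those assumptions (constant $\kappa$, linear Lagrange triangles); the function name is hypothetical, and it uses the fact that the area of $T_R$ is $1/2$ and that the gradients of $\gamma_1 = 1 - s - t$, $\gamma_2 = s$, $\gamma_3 = t$ are constant.

```python
# Constant gradients (d/ds, d/dt) of gamma_1, gamma_2, gamma_3 on T_R.
GRAD_GAMMA = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]


def element_stiffness(z1, z2, z3, kappa=1.0):
    """3x3 matrix of integrals of kappa * grad(phi_j) . grad(phi_i) over the
    triangle with vertices z1, z2, z3 (linear elements, constant kappa)."""
    j11, j12 = z2[0] - z1[0], z3[0] - z1[0]
    j21, j22 = z2[1] - z1[1], z3[1] - z1[1]
    det = j11 * j22 - j12 * j21
    # J^{-T} = (1/det) [[j22, -j21], [-j12, j11]], applied to each grad(gamma_i).
    grads = [((j22 * gs - j21 * gt) / det, (-j12 * gs + j11 * gt) / det)
             for gs, gt in GRAD_GAMMA]
    scale = kappa * abs(det) * 0.5      # |det J| times the area of T_R
    return [[scale * (gi[0] * gj[0] + gi[1] * gj[1]) for gj in grads]
            for gi in grads]


for row in element_stiffness((1.0, 1.0), (4.0, 2.0), (2.0, 5.0)):
    print(row)
```

Each row of the result sums to zero, since $\phi_1 + \phi_2 + \phi_3 = 1$ on the triangle and the gradient of a constant vanishes; this is a convenient sanity check on any implementation.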

## 4.7 Isoparametric finite element methods

To this point, I have assumed that the domain $\Omega$ is polygonal, so that it can be triangulated exactly. If $\Omega$ has a curved boundary, then a triangulation $\mathcal{T}_h$ can only approximately cover $\Omega$, introducing a new source of error. I will denote by $\Omega_h$ the polygonal domain triangulated by $\mathcal{T}_h$ and begin with an example of the effect of approximating $\Omega$ by $\Omega_h$.

⁶ The proper term is affine, not linear; the mapping $u \mapsto Ju$ is linear, and affine means "linear plus a constant." But it is a common abuse of terminology to refer to an affine transformation as linear.


where $\Omega$ is the unit circle and $f$ is chosen so that the solution is

I will apply the finite element method with four increasingly finer triangulations; Figure 4.18 shows part of the first two meshes near the boundary. The meshes will be denoted by $\mathcal{T}_1, \mathcal{T}_2, \mathcal{T}_3, \mathcal{T}_4$ ($\mathcal{T}_1$ the coarsest, $\mathcal{T}_4$ the finest), the corresponding polygonal domains by $\Omega_1, \Omega_2, \Omega_3, \Omega_4$, and the corresponding finite element solutions by $u_1, u_2, u_3, u_4$. Since $\Omega$ is convex, $\Omega_k \subset \Omega$ holds, and $u_k$ will be defined to be identically zero on $\Omega \setminus \Omega_k$. The error $u - u_k$ then reduces to $u - u_k = u$ on $\Omega \setminus \Omega_k$. Piecewise linear finite elements yield the results shown in the following table:

| $k$ | Error on $\Omega_k$ | Error on $\Omega \setminus \Omega_k$ |
|---|---|---|
| 1 | $3.433 \cdot 10^{-1}$ | $1.927 \cdot 10^{-1}$ |
| 2 | $1.877 \cdot 10^{-1}$ | $9.941 \cdot 10^{-2}$ |
| 3 | $9.630 \cdot 10^{-2}$ | $5.010 \cdot 10^{-2}$ |
| 4 | $4.848 \cdot 10^{-2}$ | $2.510 \cdot 10^{-2}$ |

Here the errors are measured in the energy norm, restricted in each case to the indicated region; the errors on $\Omega_k$, on $\Omega \setminus \Omega_k$, and on $\Omega$ are all defined in this way. The reader should notice that the error on $\Omega_k$ is decreasing by about a factor of two each time the mesh is refined. The error on $\Omega \setminus \Omega_k$ shows the same pattern, and so does the total error.
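The "factor of two" can be read off directly from the tabulated values. The sketch below simply divides consecutive errors; it additionally assumes, for the interpretation as a convergence order, that each refinement halves the mesh size, which the text does not state explicitly. The variable and function names are illustrative.

```python
import math

# Energy-norm errors for piecewise linear elements, meshes k = 1, 2, 3, 4.
err_inner = [3.433e-1, 1.877e-1, 9.630e-2, 4.848e-2]   # error on Omega_k
err_strip = [1.927e-1, 9.941e-2, 5.010e-2, 2.510e-2]   # error on Omega \ Omega_k


def reduction_factors(errors):
    """Ratio of each error to the error on the next finer mesh."""
    return [errors[k] / errors[k + 1] for k in range(len(errors) - 1)]


def observed_orders(errors):
    """log2 of the reduction factors (an order if h is halved each time)."""
    return [math.log2(r) for r in reduction_factors(errors)]


print(reduction_factors(err_inner), observed_orders(err_inner))
print(reduction_factors(err_strip), observed_orders(err_strip))
```

Both sequences of factors approach two as the mesh is refined, consistent with first-order convergence in the energy norm for linear elements.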

Figure 4.18. The first (left) and second (right) meshes from Example 4.5. The boundary of $\Omega$ is the dashed curve.


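The above results are satisfactory in the sense that the error due to approximating $\Omega$ by $\Omega_h$ does not change the rate at which the total error goes to zero. Suppose, though, that quadratic elements are used in hopes of obtaining a more accurate solution. Here are the corresponding results:

| $k$ | Error on $\Omega_k$ | Error on $\Omega \setminus \Omega_k$ | Error on $\Omega$ |
|---|---|---|---|
| 1 | $7.737 \cdot 10^{-2}$ | $1.927 \cdot 10^{-1}$ | $2.076 \cdot 10^{-1}$ |
| 2 | $2.573 \cdot 10^{-2}$ | $9.941 \cdot 10^{-2}$ | $1.027 \cdot 10^{-1}$ |
| 3 | $8.597 \cdot 10^{-3}$ | $5.010 \cdot 10^{-2}$ | $5.049 \cdot 10^{-2}$ |
| 4 | $2.179 \cdot 10^{-3}$ | $2.510 \cdot 10^{-2}$ | $2.519 \cdot 10^{-2}$ |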

The error on $\Omega_k$ is smaller than it was in the case of linear elements, and it also decreases faster. However, the error on $\Omega \setminus \Omega_k$ is not affected by the increased order of the elements. The improvement in going from linear to quadratic elements is therefore modest, and there would be almost no additional improvement in going to cubic elements.

The preceding example shows that approximating a domain with a curved boundary by a polygonal domain makes it difficult to obtain a highly accurate solution, at least when a uniform mesh is used. There are at least two ways around this difficulty. One is to use a nonuniform mesh, with smaller elements near the boundary. This is illustrated in the following example.

EXAMPLE 4.6. Figure 4.19 shows a nonuniform mesh defined on the unit circle.⁷ It has approximately the same number of triangles as the mesh $\mathcal{T}_2$ from the previous example (245 versus 256). Using piecewise quadratic polynomials on this mesh leads to a finite element solution with the following errors:

| Error on $\Omega \setminus \Omega_k$ | Error on $\Omega$ |
|---|---|
| $3.930 \cdot 10^{-2}$ | $4.291 \cdot 10^{-2}$ |

As this example shows, concentrating the triangles near the boundary leads to a smaller total error for the same computational effort.

The drawback to the method of the previous example is that it is more difficult to create a sequence of meshes that are properly refined near the boundary so as to attain the accuracy that would be possible for a polygonal domain. I will discuss nonuniform meshes further in Part IV, but now I will turn to the second method of treating curved boundaries.

The isoparametric method allows elements with curved edges, so that a curved boundary can be better approximated. The meaning of the word isoparametric is that the elements are parametrized as images of a reference element, with the parametrization given by polynomials of the same degree as the shape functions themselves. I will now explain carefully how this works for quadratic elements, and later extend it to higher-order elements.

⁷ This mesh was created using the mesh generator described in [33].


Figure 4.19. A nonuniform mesh on the unit circle.

Figure 4.20. A subregion $\omega$ with a curved edge (left), and a quadratic Lagrange triangle approximating $\omega$ (right).

## 4.7.1 Isoparametric quadratic triangles

For convenience, it is usual to consider elements with only one curved edge. Figure 4.20 shows a subregion $\omega$ that could arise in creating a triangular mesh on a circle. If ordinary quadratic triangles are used, as in the above examples, then $\omega$ would be approximated by the triangle shown on the right in Figure 4.20. To get a better approximation, the midpoint node nearest the boundary could be moved to the curve, as in Figure 4.21. Of course, the six nodes in Figure 4.21 do not lie on a triangle, but, as I will now show, the reference triangle $T_R$ can be mapped (approximately) onto $\omega$ by a quadratic mapping

$$x = a_1 + a_2 s + a_3 t + a_4 s^2 + a_5 st + a_6 t^2, \quad y = b_1 + b_2 s + b_3 t + b_4 s^2 + b_5 st + b_6 t^2.$$


Figure 4.21. An isoparametric quadratic Lagrange triangle T approximating the subregion a>. The curved edge of the isoparametric triangle is lying right on top of the (dashed) curve of the boundary and cannot be distinguished at this scale. This mapping, which will also be denoted (x, y) F(s,t)orz = F(u), is determined by 12 parameters: a\,..., 6, b\, , ^6- These 12 parameters are uniquely determined by the condition that the six nodes,

## These conditions take the form

a system of 12 linear equations determining the 12 unknowns. The subregion a> is then approximated by T, the image of TR under F. For this example, T is shown in Figure 4.21, where it is essentially indistinguishable from CD. The shape functions on T are determined by the standard Lagrange basis functions on TR. This works just as it did in the previous section. The standard basis on TR will be denoted [y\ , . . . , ye}', it is defined by the conditions

98
or

## Chapter 4. Piecewise polynomials and the finite element method

In the setting of Section 4.6 the mapping F was linear, so each 0/ was a polynomial of the same degree as y/; in fact, { 0 i , . . . , <fo} was just the standard Lagrange basis of quadratic polynomials. Now T is not a triangle and the mapping F is not linear, so the shape functions on T are not polynomials. This is similar to the situation for quadrilateral elements described in Section 4.5, where the shape functions on the reference element are bilinear but the shape functions on an arbitrary quadrilateral are not (cf. Example 4.4, in which the shape functions were rational functions). In fact, the quadrilateral element described in Section 4.5 is an example of an isoparametric element: The shape functions on the reference element are bilinear, and bilinear transformations are used to map the reference element to an arbitrary quadrilateral element. By the way, there is an explicit formula for the isoparametric transformation F in terms of the functions y\, YI, Y6'-

The validity of this formula is easily verified using (4.22), which shows, for example, that

as required. The formulas for changing variables in integrals, developed in the previous two sections, apply here as well. As in the case of bilinear quadrilaterals, the Jacobian of F is not a constant:

In (4.26), / is the function on TR corresponding to / on T (f(u) /(F(w))) and similarly for ic and K. Although 0, may be a rather complicated function, it can be shown that the components of V0, are rational functions (see Exercise 13). The expression det(7(s, /)) is easily seen to be a polynomial. Therefore, even if K is a constant, the integrand

is in general a rational function. The implication of this is that numerical integration, while useful for piecewise polynomials on ordinary triangles, is essential for isoparametric triangles (the same is true of general quadrilateral elements).
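To make the isoparametric map and its nonconstant Jacobian concrete, here is a minimal Python sketch. It assumes the reference nodes are ordered $(0,0)$, $(0.5,0)$, $(1,0)$, $(0.5,0.5)$, $(0,1)$, $(0,0.5)$ (vertex, midside, vertex, and so on); the function names `shape`, `shape_grad`, `iso_map`, and `iso_jacobian` are illustrative and do not come from the text.

```python
import numpy as np

# Reference nodes of the quadratic Lagrange triangle (assumed ordering).
REF_NODES = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0],
                      [0.5, 0.5], [0.0, 1.0], [0.0, 0.5]])

def shape(s, t):
    """Values of the six quadratic Lagrange basis functions at (s, t)."""
    r = 1.0 - s - t
    return np.array([r * (2 * r - 1), 4 * s * r, s * (2 * s - 1),
                     4 * s * t, t * (2 * t - 1), 4 * t * r])

def shape_grad(s, t):
    """Row i holds (d/ds, d/dt) of the i-th basis function at (s, t)."""
    r = 1.0 - s - t
    return np.array([[1 - 4 * r, 1 - 4 * r],
                     [4 * (r - s), -4 * s],
                     [4 * s - 1, 0.0],
                     [4 * t, 4 * s],
                     [0.0, 4 * t - 1],
                     [-4 * t, 4 * (r - t)]])

def iso_map(nodes, s, t):
    """F(s, t) = sum_i gamma_i(s, t) * (x_i, y_i); nodes has shape (6, 2)."""
    return shape(s, t) @ nodes

def iso_jacobian(nodes, s, t):
    """2x2 Jacobian of F; entry (k, l) is dF_k / d(s, t)_l."""
    return nodes.T @ shape_grad(s, t)
```

When the three midside nodes sit at the midpoints of a straight-sided triangle, `iso_jacobian` returns the same matrix at every point; moving the fourth node off the straight edge makes the determinant vary with $(s,t)$, which is why the integrands become rational functions.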


There is a technical point involved in applying (4.26): The standard theorem for the change of variables formula requires that the mapping $F$ define a one-to-one correspondence between $T^R$ and $T$, and that $\det(J)$, where $J$ is the Jacobian of $F$, not change sign on $T^R$. (The mapping $F$ must also be sufficiently smooth, but this is immediate, since $F$ is defined by polynomials.) I will analyze the condition on the determinant, and leave the verification of the one-to-one correspondence as an exercise for the interested reader.

Following the approach of Strang and Fix (see Section 3.3 of [41]), it is convenient to use an intermediate reference domain $\hat{T}^R$ with nodes $(0,0)$, $(0.5,0)$, $(1,0)$, $(\hat{s}_4,\hat{t}_4)$, $(0,1)$, $(0,0.5)$. The node $(\hat{s}_4,\hat{t}_4)$, which is the key to the analysis, is chosen so that the linear map $F_2$ defined by

also maps $(\hat{s}_4,\hat{t}_4)$ to $(x_4,y_4)$ (that is, $(\hat{s}_4,\hat{t}_4) = F_2^{-1}(x_4,y_4)$). As long as the nodes $(x_1,y_1)$, $(x_3,y_3)$, $(x_5,y_5)$ define a true triangle, $F_2$ exists and is invertible:

The desired map $F$ from $T^R$ to $T$ is then defined by $F(s,t) = F_2(F_1(s,t))$, where the mapping $F_1$ from $T^R$ to $\hat{T}^R$ is defined uniquely by the conditions

and
It is straightforward to show that

$$F_1(s,t) = \bigl(s + (4\hat{s}_4 - 2)st,\; t + (4\hat{t}_4 - 2)st\bigr) \tag{4.30}$$

(see Exercise 16). Writing $J_1$ for the Jacobian of $F_1$ and $J_2$ for the Jacobian of $F_2$, it follows from the chain rule that

$$\det(J(s,t)) = \det(J_2)\,\det(J_1(s,t)).$$

Since $F_2$ is linear and invertible, $\det(J_2)$ is constant and nonzero. It remains, therefore, to analyze conditions under which $\det(J_1)$ is nonzero. Now,

$$J_1(s,t) = \begin{bmatrix} 1 + (4\hat{s}_4 - 2)t & (4\hat{s}_4 - 2)s \\ (4\hat{t}_4 - 2)t & 1 + (4\hat{t}_4 - 2)s \end{bmatrix},$$

and $\det(J_1(s,t))$ simplifies to

$$\det(J_1(s,t)) = 1 + (4\hat{t}_4 - 2)s + (4\hat{s}_4 - 2)t.$$

Since $T^R$ is a triangle and $\det(J_1(s,t))$ is linear, it suffices that $\det(J_1(s,t))$ have the same sign at $(0,0)$, $(1,0)$, and $(0,1)$:

$$\det(J_1(0,0)) = 1, \quad \det(J_1(1,0)) = 4\hat{t}_4 - 1, \quad \det(J_1(0,1)) = 4\hat{s}_4 - 1.$$

Therefore,

$$\hat{s}_4 > \tfrac{1}{4}, \quad \hat{t}_4 > \tfrac{1}{4} \tag{4.32}$$

imply that $\det(J_1)$, and hence $\det(J)$, does not change sign on $T^R$. Loosely speaking, this means that if $(x_4, y_4)$ is in the middle half of the second edge of $T$, so that $(\hat{s}_4, \hat{t}_4)$ is in the middle half of the second edge of $\hat{T}^R$, then the isoparametric element is well-defined and (4.26) can be applied.

The requirement that $\det(J) \ne 0$ is also necessary for the transformations described in Section 4.5, but that situation, involving bilinear maps from quadrilaterals to the reference square, does not admit such a concise conclusion as (4.32). A practical finite element code may monitor the signs of $\det(J)$ so as to flag any instances in which an element shape has become too distorted.

EXAMPLE 4.7. For the sake of comparison, I will now solve the BVP from Example 4.5 using isoparametric quadratic triangles. I will not go to the trouble of computing the error on $\Omega \setminus \Omega_k$, because it can be proved that the error on $\Omega$ is of the same order as the error on $\Omega_k$ (see [41, Section 3.3]). Applying the isoparametric method with quadratic triangles on the same four meshes $\mathcal{T}_1, \mathcal{T}_2, \mathcal{T}_3, \mathcal{T}_4$ as in Example 4.5 yields the following results:

| $k$ | Error on $\Omega_k$ |
|---|---|
| 1 | $10^{-2}$ |
| 2 | $10^{-2}$ |
| 3 | $10^{-3}$ |
| 4 | $10^{-4}$ |

The results show a great improvement over using ordinary quadratic triangles, and also compare favorably with a nonuniform mesh (without the difficulty of refining the mesh in a nonuniform fashion).
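The practice of monitoring element distortion can be sketched in a few lines of Python. The sketch assumes the sufficient condition that the midside node on the curved edge, pulled back by the inverse of the affine map $F_2$, has both coordinates greater than $1/4$; the helper name `midside_node_ok` and the argument order are hypothetical.

```python
import numpy as np

def midside_node_ok(p1, p3, p5, p4):
    """Pull the curved-edge node p4 back through the affine map F2 that
    sends (0,0), (1,0), (0,1) to the vertices p1, p3, p5.

    Returns (ok, (s4, t4)), where ok is True when s4 > 1/4 and t4 > 1/4.
    np.linalg.solve raises LinAlgError if p1, p3, p5 are collinear.
    """
    p1, p3, p5, p4 = (np.asarray(p, dtype=float) for p in (p1, p3, p5, p4))
    A = np.column_stack((p3 - p1, p5 - p1))  # constant Jacobian of F2
    s4, t4 = np.linalg.solve(A, p4 - p1)
    return bool(s4 > 0.25 and t4 > 0.25), (float(s4), float(t4))
```

For example, with vertices $(0,0)$, $(1,0)$, $(0,1)$, a node at $(0.6, 0.6)$ passes the check, while a node at $(0.9, 0.2)$ is flagged because its second pulled-back coordinate is below $1/4$.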

## 4.7.2 Isoparametric triangles of higher degree

Extending the isoparametric technique to higher-order elements presents no difficulty. For example, the subregion $\omega$ from Figure 4.20 can be approximated as the image of $T^R$ under a mapping of the form

The 20 parameters $a_1, \ldots, a_{10}, b_1, \ldots, b_{10}$ are determined by the condition that the 10 standard (cubic) interpolation nodes on $T^R$ are mapped to the corresponding 10 nodes on $\omega$. Two nodes must be chosen on the curved part of the boundary of $\omega$, and a condition analogous to (4.32) would guarantee that these nodes are not placed improperly. To get as much accuracy as possible with the isoparametric method, interior nodes (if any) must be defined carefully; this comment applies when elements of degree three or greater are used. I will discuss how to place the interior nodes in Section 8.3.

## 4.8 Exercises for Chapter 4

1. In Section 4.1, it was shown that $P_h^{(1)}$ is a subspace of $H^1(\Omega)$ (when $\Omega$ is a polygonal domain).
   (a) Recall that $P_h^{(1)}$ consists of all continuous piecewise linear functions. Explain why the space of all piecewise linear functions (including discontinuous ones) does not define a subspace of $H^1(\Omega)$. Where does the argument from Section 4.1 break down if the piecewise linear functions are not assumed to be continuous?
   (b) Let $P_h^{(0)}$ be the space of all piecewise constant functions defined on a given triangulation $\mathcal{T}_h$. What is the dimension of $P_h^{(0)}$?
   (c) Explain why $P_h^{(0)}$ is not a subspace of $H^1(\Omega)$.

2. Compute the stiffness matrix for the Dirichlet problem if the mesh of Figure 4.5 is used. Notice that there will be nine free nodes, namely, nodes 7, 8, 9, 12, 13, 14, 17, 18, 19.

3. Let $\Omega$ be the unit square and consider the Dirichlet problem

   Compute the stiffness matrix for the mesh of quadratic Lagrange triangles shown in Figure 4.10. Take nodes 5, 13, 14, 15, 17, 18, 20, 21, 22 as the free nodes (in that order).

4. Let $\Omega$ be the unit square and consider the Dirichlet problem

   Compute the stiffness matrix for the following mesh of cubic Lagrange triangles:

   Let the free nodes be nodes 9, 16, 15, 10 (in that order).


5. A mesh of quadratic Lagrange triangles has two types of basis functions, those corresponding to vertex nodes and those corresponding to midpoint nodes. How many nodes are adjacent to a typical vertex node? To a typical midpoint node? How many nonzeros lie in each row of the corresponding stiffness matrix?

6. Answer the previous question for the three types of nodes in a mesh of cubic Lagrange triangles.

7. Let $\Omega$ be the unit square and consider a uniform triangulation of $\Omega$ created by dividing $\Omega$ into $n^2$ subsquares, each with side length $h = 1/n$, and then dividing each subsquare into two triangles. (Such a triangulation, with $n = 4$, is shown in Figure 4.5.) Notice that the uniform triangulation $\mathcal{T}_1$ with $2(2n)^2 = 8n^2$ linear Lagrange triangles has the same nodes as the uniform triangulation $\mathcal{T}_2$ with $2n^2$ quadratic Lagrange triangles. Consider the stiffness matrices corresponding to $\mathcal{T}_1$ and $\mathcal{T}_2$. Which is sparser?

8. Suppose $x_2 > x_1$ and $y_2 > y_1$. Show that, for any $u_1, u_2, u_3, u_4$, there is a unique bilinear polynomial $p(x,y)$ such that

9. Show by example that bilinear functions on neighboring quadrilaterals can agree at the two common vertices and yet not agree on the edge determined by the two vertices, provided the quadrilaterals are not rectangles aligned with the coordinate axes.

10. Let the vertices of $S^R$ be numbered from 1 to 4 in the order $(-1,-1)$, $(1,-1)$, $(1,1)$, and $(-1,1)$. Show that the corresponding nodal basis functions for the space of bilinear functions on $S^R$ are given by (4.16).

11. Show that the basis functions $\phi_1, \phi_2, \phi_3, \phi_4$ are linear on the edges of $Q$ in Example 4.4.

12. Suppose $Q$ is an arbitrary quadrilateral with vertices $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, and $(x_4, y_4)$, let $\gamma_1$, $\gamma_2$, $\gamma_3$, and $\gamma_4$ be the nodal basis functions on $S^R$ defined by (4.16), and let $\phi_1$, $\phi_2$, $\phi_3$, and $\phi_4$ be the corresponding nodal basis functions on $Q$.
    (a) Suppose $(s_1, t_1)$ and $(s_2, t_2)$ are two points on one edge of $S^R$ that are mapped to $(\bar{x}_1, \bar{y}_1)$ and $(\bar{x}_2, \bar{y}_2)$ in $Q$. Show that, for $\alpha_1, \alpha_2 \ge 0$, $\alpha_1 + \alpha_2 = 1$, the point $\alpha_1(s_1, t_1) + \alpha_2(s_2, t_2)$ is mapped to $\alpha_1(\bar{x}_1, \bar{y}_1) + \alpha_2(\bar{x}_2, \bar{y}_2)$.
    (b) Use the previous result to show that each $\phi_i$ is linear on each edge of $Q$. (Hint: $\phi_i$ is linear on an edge $e$ if

    whenever


    (c) Suppose two quadrilaterals $Q_1$ and $Q_2$ share an edge $e$, and suppose $r_1$, $r_2$ are shape functions on $Q_1$, $Q_2$, respectively, constructed by the method described in Section 4.5. Show that if $r_1$ and $r_2$ agree at the endpoints of $e$, then they agree on the entire edge $e$.

13. Let $T^R$ be the usual reference triangle, suppose $\gamma(s,t)$ is a polynomial on $T^R$, and suppose $T^R$ is mapped onto a region $T$ by a one-to-one mapping $F$, where $F(s,t) = (p(s,t), q(s,t))$ and $p$ and $q$ are both polynomials. Define $\phi$ on $T$ by $\phi(x,y) = \gamma(s,t)$, where $(x,y) = F(s,t)$. Assuming that the Jacobian $J$ of $F$ is nonsingular for each $(s,t) \in T^R$, prove that $\nabla\phi(x,y)$

    has components that are rational functions of $(s,t)$.

14. Let $\gamma_1, \gamma_2, \gamma_3$, given by (4.19), be the three linear basis functions on the reference triangle. Compute the integrals

    and

15. Let $\gamma_1, \ldots, \gamma_6$ be the six quadratic basis functions on the reference triangle. Find the formula for each $\gamma_i(s,t)$, $i = 1, \ldots, 6$.

16. Show that (4.28) and (4.29) imply that $F_1$ is given by (4.30).

17. Suppose the function $\kappa$ satisfies $0 < k_0 \le \kappa(x,y) \le k_1$ for all $(x,y) \in \Omega$ and for some constants $k_0$, $k_1$. Suppose further that the function $b$ satisfies $0 \le b(x,y) \le b_1$ for all $(x,y) \in \Omega$ and for some constant $b_1$. Consider the BVP

    The weak form of this BVP was derived in Exercise 2.7.12. Suppose Galerkin's method is applied, with a basis $\{w_1, w_2, \ldots, w_n\}$ for the approximating subspace. Show that this leads to the system

where

are defined by

    The matrix $K$ and the vector $F$ are the usual stiffness matrix and load vector, respectively. The matrix $M$ is called the mass matrix.

18. Suppose $\Omega$ is a polygonal domain, $\mathcal{T}_h$ is a triangulation of $\Omega$, and $f$ belongs to $L^2(\Omega)$. If

    is the best approximation to $f$ from $P_h^{(1)}$ in the $L^2(\Omega)$-norm, what are the normal equations that determine the vector $\alpha$ of coefficients? (Cf. the previous exercise.)

## Chapter 5. Convergence of the finite element method

The convergence theory of the Galerkin finite element method is fairly straightforward in outline, although the details can be quite complicated. I have already shown how the method produces the best approximation, in the energy norm, to the true solution of the given BVP. To prove that the approximations converge to the true solution as the mesh is refined requires understanding how well a given function can be approximated by piecewise polynomials. The purpose of this chapter is to discuss this approximation theory, without, however, going too far into detail or proving the theorems.

Since the Galerkin method is directly tied to the energy norm, convergence in the energy norm is obtained if the true solution is regular enough. It is sometimes desirable to know the rate of convergence in other norms, particularly the $L^2$-norm. After I present the basic theory, I will show how to extend the results from the energy norm to the $L^2$-norm.

## 5.1 Approximating smooth functions by continuous piecewise linear functions

The purpose of this section is to discuss how well a function can be approximated by a continuous piecewise linear function. Before proceeding, I want to discuss the nature of the error bounds presented below. First of all, the theory states that the finite element method yields the best approximation to the true solution when the error is measured in the energy norm, which is related to the Sobolev norms. It is therefore reasonable to try to bound the error of approximation in terms of the $L^2$- and $H^1$-norms.

Second, as the reader will notice below, the bounds given here are asymptotic, not absolute. This means that the bounds will not tell how small the error is when the solution is approximated on a particular mesh. Instead, the bounds show how the error decreases as the mesh is refined. These bounds can also be characterized as a priori error estimates, in that the bounds do not involve the computed solution; indeed, the bounds are given before any approximate solution is computed.

An a priori, asymptotic error bound of this type is useful, but it leaves some unanswered questions, such as how fine the grid must be in order to attain a certain accuracy. In Part IV, I discuss how to form a posteriori error estimates that use the computed solution to estimate the actual error in that computed solution.

Since the theory presented below describes how the error decreases as the mesh is refined, it is not surprising that there are limitations on how the mesh is refined. The fundamental rule is that the triangles cannot be allowed to get arbitrarily "skinny."

## 5.1.1 The standard refinement of a triangulation

Beginning with any triangulation of $\Omega$, a finer triangulation is formed by placing a new node at the midpoint of every edge of every triangle and joining these new nodes with new edges. This replaces each triangle in the initial triangulation with four smaller triangles, as in Figure 5.1. The resulting mesh is called the standard refinement of the original mesh. It is easy to show that each triangle in the refined mesh is similar to (that is, has the same angles as) its "parent" triangle in the original mesh (see Exercise 3). This property is important to the convergence theory, as I discuss below. Also, each triangle in the refined mesh is half the size of its parent triangle, so the mesh size of the refined mesh is half that of the original mesh.

There are several other ways to refine a mesh. The standard refinement is the method of choice when the goal is a uniform refinement, that is, when all of the triangles in the mesh are to be refined. However, as I discuss in Part IV, it is often desirable to refine only some of the triangles in the mesh, in which case other methods have some advantages over the standard refinement.
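The standard refinement is easy to express in code. The following Python sketch assumes a mesh stored as a list of node coordinates and a list of vertex-index triples; this data layout and the name `standard_refinement` are illustrative choices, not taken from the text.

```python
def standard_refinement(nodes, triangles):
    """Refine every triangle into four by joining its edge midpoints.

    nodes: list of (x, y) pairs; triangles: list of (i, j, k) index triples.
    Returns (new_nodes, new_triangles); original nodes keep their indices.
    """
    new_nodes = list(nodes)
    midpoint_index = {}  # undirected edge -> index of its midpoint node

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_index:
            (xi, yi), (xj, yj) = nodes[i], nodes[j]
            new_nodes.append(((xi + xj) / 2, (yi + yj) / 2))
            midpoint_index[key] = len(new_nodes) - 1
        return midpoint_index[key]

    new_triangles = []
    for i, j, k in triangles:
        a, b, c = midpoint(i, j), midpoint(j, k), midpoint(k, i)
        # three corner triangles plus the central triangle of midpoints
        new_triangles += [(i, a, c), (a, j, b), (c, b, k), (a, b, c)]
    return new_nodes, new_triangles
```

The dictionary keyed on undirected edges ensures that an edge shared by two triangles receives only one midpoint node, so the refined mesh remains conforming.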

## 5.1.2 Nondegenerate families of triangulations

One way to describe the shape of a triangle $T$ is to compare the largest circle contained in $T$ to the diameter of $T$. The diameter of a set $S$ is defined to be

$$\operatorname{diam}(S) = \sup\{\|x - y\| : x, y \in S\}.$$

Figure 5.1. Standard refinement of a triangle.

For each triangle $T$, $d_T$ is defined by

$$d_T = \sup\{\operatorname{diam}(B) : B \text{ is a circle contained in } T\}.$$

The ratio $d_T/\operatorname{diam}(T)$ is then a measure of how skinny the triangle $T$ is. If this ratio is very small, then $T$ is long and thin, whereas if the ratio is close to $1/\sqrt{3}$ (the maximum possible value; see Exercise 1), then $T$ is close to an equilateral triangle.

Now consider a family of triangulations with an individual triangulation denoted $\mathcal{T}_h$, where $h$ is the maximum diameter of any triangle in $\mathcal{T}_h$. The family $\{\mathcal{T}_h\}$ is called nondegenerate if there exists a constant $\rho > 0$ such that

$$\frac{d_T}{\operatorname{diam}(T)} \ge \rho \quad \text{for all } T \in \mathcal{T}_h \text{ and all } h.$$

Repeated application of the standard refinement procedure produces a nondegenerate family of meshes. The reader is asked for a proof of this in Exercise 4.
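The shape measure is simple to compute for a single triangle. The sketch below assumes that $d_T$ denotes the diameter of the inscribed circle (the largest circle contained in $T$) and that $\operatorname{diam}(T)$ is the length of the longest edge; the function name `shape_ratio` is illustrative.

```python
import math

def shape_ratio(p, q, r):
    """Return d_T / diam(T) for the triangle with vertices p, q, r.

    d_T is taken to be the diameter of the inscribed circle and
    diam(T) the length of the longest edge.
    """
    a, b, c = math.dist(q, r), math.dist(p, r), math.dist(p, q)
    area = abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2
    inradius = 2 * area / (a + b + c)  # area = inradius * semiperimeter
    return 2 * inradius / max(a, b, c)
```

Because the ratio is unchanged when a triangle is scaled, and the standard refinement produces triangles similar to their parents, the smallest ratio in the initial mesh can serve as $\rho$ for the whole family of refined meshes.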

## 5.1.3 Approximation by piecewise linear functions

I begin by considering the use of continuous piecewise linear functions, so the approximating subspace in the finite element method is $P_h^{(1)}$ (or a subspace of $P_h^{(1)}$). The simplest way to produce an estimate on the smallest error in approximating $u$ from $P_h^{(1)}$ is to compare the best approximation with the piecewise linear interpolant $u_I$ of $u$:

The function $u_I$ has the same nodal values as does $u$ itself, which is why $u_I$ is called the interpolant of $u$. If $u_h$ is the best approximation to $u$ from $P_h^{(1)}$, it follows that

$$\|u - u_h\| \le \|u - u_I\|.$$

Therefore, it suffices to bound the error in $u_I$. The following theorem is proved in Chapter 4 of Brenner and Scott [13].

THEOREM 5.1. Suppose $\{\mathcal{T}_h\}$ is a nondegenerate family of triangulations of a polygonal domain $\Omega \subset \mathbf{R}^2$, and suppose $u \in H^2(\Omega)$. Then there exists a constant $C$ depending on $\Omega$ and the value $\rho$ from the definition of nondegenerate (but not on $u$ or $h$) such that

$$\|u - u_I\|_{L^2(\Omega)} \le C h^2 |u|_{H^2(\Omega)}$$

and

$$\|u - u_I\|_{H^1(\Omega)} \le C h\, |u|_{H^2(\Omega)}.$$

Here $|u|_{H^2(\Omega)}$ is the seminorm

and $u_I \in P_h^{(1)}$ is the piecewise linear interpolant of $u$.


I hope this theorem will strike the reader as reasonable. The energy norm measures the derivatives of the function. In order to estimate how different the derivatives of $u$ can be from those of its linear interpolant, it is necessary to know how fast the derivatives of $u$ can change. This information is provided by the size of the second derivatives of $u$ (the second derivatives measure the rates of change of the first derivatives), and therefore $|u|_{H^2(\Omega)}$ appears in the upper bounds.

A similar result holds if $V$ is a subspace of $H^1(\Omega)$, such as $H_0^1(\Omega)$, instead of $H^1(\Omega)$ itself and $V_h$ is the corresponding subspace of $P_h^{(1)}$. The energy norm,

is bounded by a multiple of the $H^1(\Omega)$ error, and so the above result provides an upper bound on the energy norm error in approximating $u$. In this context, the error $\|u - u_I\|_E$ is said to be $O(h)$, meaning bounded by a constant times $h$, as $h$ goes to zero. In the next section, I will show that by using higher-degree polynomials, convergence rates of $O(h^2)$, $O(h^3)$, and so forth can be obtained. First, though, I will give an example of piecewise linear approximation.

EXAMPLE 5.2. This example illustrates the accuracy, in both the $L^2$- and $H^1$-norms, of piecewise linear interpolation for the function $u(x,y) = x(1 - x)\sin(\pi y)$. The domain is $\Omega = (0,1) \times (0,1)$, the unit square, and the meshes comprise a sequence of regular triangulations of the type shown in Figures 4.5 and 4.9. The first mesh has a total of 8 triangles, each an isosceles right triangle with legs of length $1/2$ and $h = \sqrt{2}/2$. Successive meshes are obtained by the standard refinement described earlier (the second mesh is the one shown in Figure 4.5). The results are as follows:

| Mesh | $\lVert u - u_I\rVert_{H^1(\Omega)}$ |
|---|---|
| 1 | $4.1361\cdot10^{-1}$ |
| 2 | $2.2448\cdot10^{-1}$ |
| 3 | $1.1450\cdot10^{-1}$ |
| 4 | $5.7536\cdot10^{-2}$ |

The reader should notice that $\|u - u_I\|_{L^2(\Omega)}$ is decreasing approximately as $h^2$ (that is, when $h$ is divided by 2, the error is divided by approximately 4) and $\|u - u_I\|_{H^1(\Omega)}$ is decreasing approximately as $h$ (when $h$ is divided by 2, so is $\|u - u_I\|_{H^1(\Omega)}$).

## 5.2 Approximation by higher-order piecewise polynomials

The theory in the case of Lagrange triangles of degree $d$ is similar to that described in the previous section. The main result is that increasing the degree of the piecewise polynomials increases the order of approximation in both the $L^2$- and $H^1$-norms, provided that the function being approximated is smooth enough. The reader will recall that $P_h^{(d)}$ is the space of continuous piecewise polynomials of degree $d$ relative to a given mesh $\mathcal{T}_h$. The piecewise polynomial interpolant of degree $d$ of a function $u$ is

where

is the standard Lagrange basis for $P_h^{(d)}$.

THEOREM 5.3. Suppose $\{\mathcal{T}_h\}$ is a nondegenerate family of triangulations of a polygonal domain $\Omega \subset \mathbf{R}^2$, and suppose $u \in H^{d+1}(\Omega)$. Then there exists a constant $C$ depending on $\Omega$ and the value $\rho$ from the definition of nondegenerate (but not on $u$ or $h$) such that

$$\|u - u_I\|_{L^2(\Omega)} \le C h^{d+1} |u|_{H^{d+1}(\Omega)}$$

and

$$\|u - u_I\|_{H^1(\Omega)} \le C h^{d} |u|_{H^{d+1}(\Omega)}.$$

Here $|u|_{H^{d+1}(\Omega)}$ is the seminorm

and $u_I \in P_h^{(d)}$ denotes the piecewise polynomial interpolant of degree $d$ of $u$.

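A proof of this theorem is found in Brenner and Scott [13, Theorem 4.4.20].

EXAMPLE 5.4. The following tables show the errors in approximating

using piecewise quadratic, cubic, and quartic functions. The meshes are the same as in Example 5.2. First, the piecewise quadratic case:

| $h$ | $\lVert u - u_I\rVert_{L^2(\Omega)}$ | $\lVert u - u_I\rVert_{H^1(\Omega)}$ |
|---|---|---|
| $5.0000\cdot10^{-1}$ | $7.8059\cdot10^{-3}$ | $1.2655\cdot10^{-1}$ |
| $2.5000\cdot10^{-1}$ | $1.0413\cdot10^{-3}$ | $3.3340\cdot10^{-2}$ |
| $1.2500\cdot10^{-1}$ | $1.3227\cdot10^{-4}$ | $8.4444\cdot10^{-3}$ |
| $6.2500\cdot10^{-2}$ | $1.6600\cdot10^{-5}$ | $2.1180\cdot10^{-3}$ |

Next, the piecewise cubic case:

| $h$ | $\lVert u - u_I\rVert_{L^2(\Omega)}$ | $\lVert u - u_I\rVert_{H^1(\Omega)}$ |
|---|---|---|
| $5.0000\cdot10^{-1}$ | $1.0447\cdot10^{-3}$ | $2.4357\cdot10^{-2}$ |
| $2.5000\cdot10^{-1}$ | $6.8667\cdot10^{-5}$ | $3.1764\cdot10^{-3}$ |
| $1.2500\cdot10^{-1}$ | $4.3455\cdot10^{-6}$ | $4.0125\cdot10^{-4}$ |
| $6.2500\cdot10^{-2}$ | $2.7244\cdot10^{-7}$ | $5.0288\cdot10^{-5}$ |

Finally, the piecewise quartic case:

| $h$ | $\lVert u - u_I\rVert_{L^2(\Omega)}$ | $\lVert u - u_I\rVert_{H^1(\Omega)}$ |
|---|---|---|
| $5.0000\cdot10^{-1}$ | $1.1860\cdot10^{-4}$ | $3.6516\cdot10^{-3}$ |
| $2.5000\cdot10^{-1}$ | $3.8542\cdot10^{-6}$ | $2.3635\cdot10^{-4}$ |
| $1.2500\cdot10^{-1}$ | $1.2162\cdot10^{-7}$ | $1.4901\cdot10^{-5}$ |
| $6.2500\cdot10^{-2}$ | $3.8098\cdot10^{-9}$ | $9.3335\cdot10^{-7}$ |

In each case, the reader can verify that the asymptotic error bound is satisfied. For example, in the quartic case, the error in the $H^1$-norm should be $O(h^4)$, so that decreasing $h$ by a factor of 2 should decrease $\|u - u_I\|_{H^1(\Omega)}$ by a factor of approximately 16. The actual factors are given in the following table:

| $h$ | Decrease in $\lVert u - u_I\rVert_{H^1(\Omega)}$ |
|---|---|
| $5.0000\cdot10^{-1}$ | |
| $2.5000\cdot10^{-1}$ | 15.450 |
| $1.2500\cdot10^{-1}$ | 15.861 |
| $6.2500\cdot10^{-2}$ | 15.965 |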
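The decrease factors can be checked directly from the tabulated errors. The short Python sketch below uses the quartic $H^1$ interpolation errors quoted in Example 5.4; the helper names `decrease_factors` and `observed_orders` are illustrative.

```python
import math

# Mesh sizes and quartic H^1 interpolation errors as quoted in Example 5.4.
h = [5.0000e-1, 2.5000e-1, 1.2500e-1, 6.2500e-2]
err = [3.6516e-3, 2.3635e-4, 1.4901e-5, 9.3335e-7]

def decrease_factors(errors):
    """Ratio of each error to the error on the next finer mesh."""
    return [e0 / e1 for e0, e1 in zip(errors, errors[1:])]

def observed_orders(hs, errors):
    """Estimate p in error = O(h^p) from each pair of consecutive meshes."""
    return [math.log(e0 / e1) / math.log(h0 / h1)
            for h0, h1, e0, e1 in zip(hs, hs[1:], errors, errors[1:])]

factors = decrease_factors(err)
orders = observed_orders(h, err)
```

The factors approach 16 and the observed orders approach 4 as the mesh is refined, which is the asymptotic behavior the theorem predicts for quartic elements in the $H^1$-norm.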

## 5.3 Convergence in the energy norm

Using the results of the previous section, I can now give a convergence theorem for the finite element solution of the BVP

where $\Omega$ is a polygonal domain and $\partial\Omega = \Gamma_1 \cup \Gamma_2$ is a partition of $\partial\Omega$. It is assumed that $\kappa$ is a function defined on $\Omega$ satisfying $0 < k_0 \le \kappa \le k_1$ for some constants $k_0$ and $k_1$. The variational form of (5.4) is

where

and $G \in H^1(\Omega)$ satisfies $G = g$ on $\Gamma_1$. The reader will recall that the bilinear form

is bounded and $V$-elliptic; that is, there exist positive constants $\alpha$ and $\beta$ such that


The finite element method is just Galerkin's method with the approximating subspace $V_h = \{v \in P_h^{(d)} : v = 0 \text{ on } \Gamma_1\}$:

In this equation, $G_h$ is a continuous piecewise polynomial interpolating $g$ at the boundary nodes belonging to $\Gamma_1$. It is easy to derive an error estimate for $u_h$ in the case that $g$, and hence $G$ and $G_h$, are zero, that is, in the case of homogeneous Dirichlet conditions. Then there is no need to introduce $w$ and $w_h$, and $u$, $u_h$ satisfy

and

and

In particular, since the piecewise polynomial interpolant $u_I$ of $u$ belongs to $V_h$,

and

Assuming that Lagrange triangles of degree $d$ are used, the approximation results of the previous section yield

and

where $C$ is a positive constant that is independent of $u$ and $h$, and $C' = C\beta/\alpha$.

I now want to show that estimates (5.9) and (5.10) can be extended to the case of inhomogeneous Dirichlet conditions. To obtain the full rate of convergence using piecewise polynomials of degree $d$, the Dirichlet data must be smooth enough. It is assumed that there is a function $G \in H^{d+1}(\Omega)$ such that $G = g$ on $\Gamma_1$, where $g$ is the Dirichlet data, and that (5.6) is solved with $G_h = G_I$, the interpolant of $G$. Then $u = w + G$, $u_h = w_h + G_I$, and (in either norm)


The bounds (5.9) and (5.10) apply to the term $\|w - w_h\|$, since $w_h$ is the finite element solution of a variational problem of the same form as (5.5). Thus

Theorem 5.3 applies directly to the term $\|G - G_I\|$, and thus

Therefore,

If $G$ is chosen so that it is orthogonal to $V$ in the $H^{d+1}(\Omega)$ inner product,$^8$ then, by the Cauchy-Schwarz inequality,

and thus

This yields the estimate

In practice, $G_h$ is not chosen to be $G_I$ (since typically $G$, and hence $G_I$, are unknown). Instead, $G_h$ is defined to be the piecewise polynomial defined by the following nodal values:

However, any continuous piecewise polynomial interpolating $g$ at the constrained nodes (including $G_I$) yields the same computed solution $u_h$ as does $G_h$. To prove this, I assume that $G_h^{(1)} \in P_h^{(d)}$ agrees with $g$ at the constrained nodes, that $w_h^{(1)} \in V_h$ satisfies

and that $u_h^{(1)} = w_h^{(1)} + G_h^{(1)}$. I need to show that $u_h^{(1)} = u_h$.

$^8$ Any $G \in H^{d+1}(\Omega)$ satisfying $G = g$ on $\Gamma_1$ can be replaced by $G - PG$, where $PG$ is the orthogonal projection onto $V$ of $G$ in the $H^{d+1}(\Omega)$-norm. Then, since $PG = 0$ on $\Gamma_1$, $G - PG = g$ on $\Gamma_1$, and $G - PG$ is orthogonal to $V$ by the definition of orthogonal projection.

Let $g^{(1)} \in \mathbf{R}^{N_v}$ be the vector of nodal values of $G_h^{(1)}$. Then

where $G_h$ is the piecewise polynomial whose nodal values agree with $g$ at the constrained nodes and are zero elsewhere, and $\hat{g}^{(1)} \in \mathbf{R}^{N_f}$ is the vector of nodal values of $G_h^{(1)}$ at the free nodes. Then, writing $U^{(1)} \in \mathbf{R}^{N_f}$ for the vector of (free) nodal values of $u_h^{(1)}$ and similarly for $W^{(1)}$, it follows that $U^{(1)} = W^{(1)} + \hat{g}^{(1)}$. Now, for any $i = 1, 2, \ldots, N_f$, the load vector $F^{(1)}$ is defined by

Therefore, $F^{(1)} = F - K\hat{g}^{(1)}$, where $F$ is the load vector corresponding to $G_h$, and thus

This shows that the computed finite element solution $u_h$ is the same no matter which interpolant $G_h$ is used, and therefore (5.11) holds.

The following theorem summarizes the results of this section.

THEOREM 5.5. Suppose $\Omega$ is a polygonal domain in $\mathbf{R}^2$ and let $\{\mathcal{T}_h\}$ be a nondegenerate family of meshes on $\Omega$ consisting of Lagrange triangles of degree $d$. Assume $\kappa$ is defined on $\Omega$ and there exist constants $k_0$, $k_1$ such that $0 < k_0 \le \kappa \le k_1$ on $\Omega$. Finally, assume that the solution $u$ of (5.5) satisfies $u \in H^{d+1}(\Omega)$, and let $u_h$ be the finite element solution of the BVP relative to the mesh $\mathcal{T}_h$. Then there exist constants $C$, $C'$, both independent of $u$ and $h$, such that


and
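The following example illustrates Theorem 5.5.

EXAMPLE 5.6. Suppose $\Omega$ is the unit square, and consider the BVP

where $\kappa(x,y) = 1 + xy^2$ and $f$ is chosen so that the exact solution is

$$u(x,y) = x(1 - x)\sin(\pi y).$$

Using linear Lagrange triangles on a sequence of regular meshes, the following errors are obtained:

| $h$ | $\lVert u - u_h\rVert_{H^1(\Omega)}$ |
|---|---|
| $5.0000\cdot10^{-1}$ | $4.0757\cdot10^{-1}$ |
| $2.5000\cdot10^{-1}$ | $2.2369\cdot10^{-1}$ |
| $1.2500\cdot10^{-1}$ | $1.1441\cdot10^{-1}$ |
| $6.2500\cdot10^{-2}$ | $5.7524\cdot10^{-2}$ |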

The function $u$ and the meshes are the same as in Example 5.2, which examined the error in the piecewise linear interpolant $u_I$. Comparing the results from these examples shows that $\|u - u_h\|_{H^1(\Omega)}$ and $\|u - u_I\|_{H^1(\Omega)}$ are very similar.

The following tables give the errors in the finite element solution using piecewise quadratic, cubic, and quartic polynomials. They can be compared with the interpolation errors from Example 5.4. Quadratic Lagrange triangles:

$2.9626\cdot10^{-3}$, $1.9044\cdot10^{-4}$, $1.1961\cdot10^{-5}$, $7.4744\cdot10^{-7}$

It is also of interest to examine the error in the energy norm defined by the coefficient $\kappa$, since the basis of finite element Galerkin theory is that $\|u - u_h\|_E$ is as small as possible. To confirm this, the following table compares (for the linear triangles) the energy norm error in $u_h$ to the same error in $u_I$.

$4.2675\cdot10^{-1}$, $2.3781\cdot10^{-1}$, $1.2238\cdot10^{-1}$, $6.1640\cdot10^{-2}$

Although the differences are not large, the results show that $u_h$ does indeed have a smaller energy norm error than $u_I$.

## 5.4 Convergence in the $L^2$-norm

There is a standard trick, called a duality argument, for deriving an L2-estimate from an energy norm estimate. This argument requires that solutions of the BVP under consideration have the elliptic regularity property, namely, that the solution has two degrees more of smoothness (in the weak sense) than the right-hand side of the PDE. The elliptic regularity property is not difficult to understand. Consider a one-dimensional BVP of the form

The solution can be obtained directly by integrating twice:

From this formula and the fundamental theorem of calculus, it is obvious that if $f$ is continuous, then $u$ is twice continuously differentiable.

It is not at all obvious that such a property would extend to BVPs in multiple dimensions, since solutions cannot be obtained by direct integration. However, if the geometry of $\Omega$ is not too complicated and if any coefficients appearing in the PDE are smooth enough, then elliptic regularity holds. The usual model problem,

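will be used for illustration. For example, in two-dimensional problems, if $f \in L^2(\Omega)$, $\kappa$ is smooth, and either $\Gamma_1$ or $\Gamma_2$ is empty (that is, the boundary conditions are either pure Dirichlet or pure Neumann), and either $\partial\Omega$ is smooth or $\Omega$ is convex, then the solution $u$ is guaranteed to belong to $H^2(\Omega)$. Moreover, there is a constant $C$ such that

$$\|u\|_{H^2(\Omega)} \le C\|f\|_{L^2(\Omega)}.$$

Proofs can be found in Rauch [34] or, for the case of a nonsmooth $\partial\Omega$, in Grisvard [23].

Elliptic regularity can be used to derive an $L^2$-estimate on the error $u - u_h$, where $u$ is the solution to the variational form of (5.12) and $u_h$ is the piecewise linear finite element solution for a corresponding approximating subspace $V_h$. Here is the duality trick: Writing $a(\cdot,\cdot)$ for the usual energy inner product for (5.12) and $(\cdot,\cdot)$ for the $L^2$ inner product, $w \in V$ is defined to be the solution to the variational problem

$$a(w, v) = (u - u_h, v) \quad \text{for all } v \in V,$$

where $V$ is the appropriate variational space ($V = H_0^1(\Omega)$ for a Dirichlet problem or $V = \tilde{V}$ for a Neumann problem, where $\tilde{V}$ is defined as in Section 2.5). Since $u - u_h \in L^2(\Omega)$, the solution $w$ belongs to $H^2(\Omega)$. Then

$$\|u - u_h\|_{L^2(\Omega)}^2 = (u - u_h, u - u_h) = a(w, u - u_h).$$

Since the interpolant $w_I$ belongs to $V_h$,

$$a(w_I, u - u_h) = 0$$

and

$$\|u - u_h\|_{L^2(\Omega)}^2 = a(w - w_I, u - u_h),$$

which imply that

$$\|u - u_h\|_{L^2(\Omega)}^2 \le \beta\,\|w - w_I\|_{H^1(\Omega)}\,\|u - u_h\|_{H^1(\Omega)}.$$

By the elliptic regularity assumption, $w \in H^2(\Omega)$ and therefore, by the interpolation results given in Section 5.2, there is a constant $C$ such that

$$\|w - w_I\|_{H^1(\Omega)} \le C h\,|w|_{H^2(\Omega)}.$$

It follows that

$$\|u - u_h\|_{L^2(\Omega)}^2 \le \beta C h\,|w|_{H^2(\Omega)}\,\|u - u_h\|_{H^1(\Omega)}.$$

By the elliptic regularity assumption, there is another constant $C$ such that

$$|w|_{H^2(\Omega)} \le \|w\|_{H^2(\Omega)} \le C\|u - u_h\|_{L^2(\Omega)}.$$

Combining $\beta$ and the two constants denoted above by $C$ into a new constant, also denoted by $C$, yields

$$\|u - u_h\|_{L^2(\Omega)}^2 \le C h\,\|u - u_h\|_{L^2(\Omega)}\,\|u - u_h\|_{H^1(\Omega)}$$

or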
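$$\|u - u_h\|_{L^2(\Omega)} \le C h\,\|u - u_h\|_{H^1(\Omega)}.$$

Finally, applying the estimate derived in the previous section for $\|u - u_h\|_{H^1(\Omega)}$ yields

$$\|u - u_h\|_{L^2(\Omega)} \le C h^2 |u|_{H^2(\Omega)}$$

(with a new value for the constant $C$). This should be compared with the estimate

$$\|u - u_h\|_{H^1(\Omega)} \le C h\,|u|_{H^2(\Omega)}.$$

In the $L^2(\Omega)$-norm, another factor of $h$ is obtained; the error is $O(h^2)$ instead of $O(h)$ in the energy or $H^1(\Omega)$-norms.

EXAMPLE 5.7. This is a continuation of Example 5.6. Here the errors in the $L^2$-norm are recorded:

Linear Lagrange triangles:

| $h$ | $\lVert u - u_h\rVert_{L^2(\Omega)}$ |
|---|---|
| $5.0000\cdot10^{-1}$ | $7.0613\cdot10^{-2}$ |
| $2.5000\cdot10^{-1}$ | $2.2713\cdot10^{-2}$ |
| $1.2500\cdot10^{-1}$ | $6.0681\cdot10^{-3}$ |
| $6.2500\cdot10^{-2}$ | $1.5429\cdot10^{-3}$ |

Quadratic Lagrange triangles:

| $h$ | $\lVert u - u_h\rVert_{L^2(\Omega)}$ |
|---|---|
| $5.0000\cdot10^{-1}$ | |
| $2.5000\cdot10^{-1}$ | |
| $1.2500\cdot10^{-1}$ | |
| $6.2500\cdot10^{-2}$ | |

Cubic Lagrange triangles:

Quartic Lagrange triangles:

| $h$ | $\lVert u - u_h\rVert_{L^2(\Omega)}$ |
|---|---|
| $5.0000\cdot10^{-1}$ | $1.2326\cdot10^{-4}$ |
| $2.5000\cdot10^{-1}$ | $3.9821\cdot10^{-6}$ |
| $1.2500\cdot10^{-1}$ | $1.2552\cdot10^{-7}$ |
| $6.2500\cdot10^{-2}$ | $3.9273\cdot10^{-9}$ |

The reader can verify that the expected asymptotic rates of convergence are observed. For example, in the quartic case, the $L^2$-error should be $O(h^5)$, so reducing $h$ by a factor of 2 should reduce the error by a factor of approximately 32. Here are the actual results:

| $h$ | Decrease in $\lVert u - u_h\rVert_{L^2(\Omega)}$ |
|---|---|
| $5.0000\cdot10^{-1}$ | |
| $2.5000\cdot10^{-1}$ | 30.954 |
| $1.2500\cdot10^{-1}$ | 31.725 |
| $6.2500\cdot10^{-2}$ | 31.961 |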
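The extra factor of $h$ in the $L^2$-norm can also be observed in a much simpler setting. The following Python sketch is a one-dimensional analogue and is not taken from the text: it applies linear finite elements to $-u'' = \pi^2\sin(\pi x)$ on $(0,1)$ with $u(0) = u(1) = 0$ (exact solution $u = \sin(\pi x)$), a test problem chosen here only for illustration, and measures the errors with Gauss quadrature.

```python
import numpy as np

def solve_1d(n):
    """Linear finite elements for -u'' = pi^2 sin(pi x), u(0) = u(1) = 0,
    on n equal subintervals. Returns (L2 error, H1-seminorm error)."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    f = lambda z: np.pi**2 * np.sin(np.pi * z)
    u = lambda z: np.sin(np.pi * z)
    du = lambda z: np.pi * np.cos(np.pi * z)
    gp, gw = np.polynomial.legendre.leggauss(3)  # 3-point Gauss rule on [-1, 1]

    K = np.zeros((n + 1, n + 1))
    F = np.zeros(n + 1)
    for e in range(n):
        K[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
        z = x[e] + (gp + 1) * h / 2              # quadrature points in element e
        phi = np.array([(x[e + 1] - z) / h, (z - x[e]) / h])
        F[e:e + 2] += phi @ (gw * f(z)) * h / 2

    U = np.zeros(n + 1)                          # boundary values stay zero
    U[1:n] = np.linalg.solve(K[1:n, 1:n], F[1:n])

    l2 = h1 = 0.0
    for e in range(n):
        z = x[e] + (gp + 1) * h / 2
        uh = U[e] * (x[e + 1] - z) / h + U[e + 1] * (z - x[e]) / h
        duh = (U[e + 1] - U[e]) / h
        l2 += np.sum(gw * (u(z) - uh) ** 2) * h / 2
        h1 += np.sum(gw * (du(z) - duh) ** 2) * h / 2
    return np.sqrt(l2), np.sqrt(h1)
```

Calling `solve_1d(8)` and `solve_1d(16)` and comparing the results shows the $L^2$ error dropping by a factor of about 4 and the derivative error by a factor of about 2 when $h$ is halved, in line with the $O(h^2)$ and $O(h)$ estimates for linear elements.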

## 5.5 Variational crimes

The title of this section is a phrase coined by Strang [40] to describe violations of the variational framework, whose theory has been described in the preceding sections of this chapter. Two examples of variational crimes are the use of numerical integration (quadrature) and the use of isoparametric finite elements to approximate curved boundaries. When quadrature is used, the stiffness matrix $K$ and the load vector $F$ are not computed exactly, so the theory as presented above does not apply directly. On the other hand, when $\Omega$ is not polygonal, the finite element method solves a problem on an approximate domain $\Omega_h$. In this case also, the theory developed in the preceding sections does not apply. In this section I will briefly summarize extensions of the theory covering these variational crimes. I will use the model problem

to illustrate the ideas. The results described here can be extended to inhomogeneous boundary conditions and other PDEs.

## 5.5.1 Numerical integration

must be computed, where $T$ is a triangle in the mesh and $\phi_i$, $\phi_j$ are basis functions that reduce to polynomials when restricted to $T$. These integrals may be difficult or impossible to compute exactly, depending on the form of $\kappa$ and $f$. To estimate the above integrals, a quadrature rule of the form

$$\int_T g(x,y)\,dx\,dy \approx \sum_{j=1}^{n} w_j^{(T)}\, g\bigl(x_j^{(T)}, y_j^{(T)}\bigr)$$

can be used. Here $(x_j^{(T)}, y_j^{(T)})$, $j = 1, 2, \ldots, n$, are the quadrature nodes on $T$ and $w_j^{(T)}$, $j = 1, 2, \ldots, n$, are the corresponding quadrature weights. Specific quadrature rules will be presented in Part II. For this discussion, the specific rules are not important; only the concept of degree of precision is needed to analyze the effect of quadrature on the finite element method.

A quadrature rule has degree of precision $p$ if it integrates polynomials of degree $p$ or less exactly. Since the finite element method is based on piecewise polynomials, it is natural to classify quadrature rules by their degrees of precision. However, since the coefficient $\kappa$ and the forcing function $f$ need not be polynomial, the integrals (5.14) may not be computed exactly regardless of how high the degree of precision of the quadrature rule. Some kind of analysis is therefore required to prove that $K$ and $F$ are computed accurately enough that convergence is still obtained. Furthermore, mere convergence is probably not acceptable; it would be desirable to compute the integrals accurately enough that the rate of convergence, as presented in the previous sections, is unchanged.

The effect of quadrature is to replace the variational problem
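The notion of degree of precision can be tested numerically. The Python sketch below uses the well-known edge-midpoint rule on the reference triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$; this particular rule is chosen here only as an example and is not necessarily one of the rules presented in Part II. It relies on the exact formula $\int s^a t^b = a!\,b!/(a+b+2)!$ for monomials over that triangle.

```python
from math import factorial

# Edge-midpoint rule on the reference triangle (an illustrative choice).
NODES = [(0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]
WEIGHTS = [1 / 6, 1 / 6, 1 / 6]  # weights sum to the triangle's area, 1/2

def quad(g):
    """Apply the quadrature rule to a function g(s, t)."""
    return sum(w * g(s, t) for w, (s, t) in zip(WEIGHTS, NODES))

def exact_monomial(a, b):
    """Exact integral of s^a t^b over the reference triangle."""
    return factorial(a) * factorial(b) / factorial(a + b + 2)

def degree_of_precision(max_degree=6, tol=1e-12):
    """Largest p such that every monomial of degree <= p is integrated exactly."""
    p = -1
    for deg in range(max_degree + 1):
        for a in range(deg + 1):
            b = deg - a
            if abs(quad(lambda s, t: s**a * t**b) - exact_monomial(a, b)) > tol:
                return p
        p = deg
    return p
```

For this rule, `degree_of_precision()` returns 2: all monomials of degree at most 2 are integrated exactly, while $s^3$ is not (the rule gives $1/24$ instead of $1/20$).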

by
where $a_h(\cdot,\cdot)$ and $\ell_h$ are defined by the quadrature rules (applied element by element) rather than by the usual integrals. There are then three functions to consider: the exact solution $u$ of (5.13), the solution $u_h$ of (5.15) (analyzed in the previous sections), and the solution $\hat{u}_h$ of (5.16). By the triangle inequality,

If, for example, Lagrange triangles of degree d are used, then, by Theorem 5.5,

It would be desirable, then, that

also hold. It is easy to guarantee (5.17), although the proof is quite involved. The entries in the stiffness matrix K are assembled from the integrals


If the coefficient function $\kappa$ happens to be constant, then the integrands are polynomials of degree $2d - 2$, and hence the integrals will be computed exactly by a quadrature rule having degree of precision $2d - 2$. It turns out that such a quadrature rule, although not exact if $\kappa$ is nonconstant, nevertheless leads to a solution $\hat{u}_h$ satisfying (5.17). The analysis, which I briefly outline below, assumes that the same quadrature rule is used for all integrals (those that contribute to $K$ and those that contribute to $F$).
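The notion of degree of precision can be checked mechanically. The following is a minimal sketch, not taken from the book's code: it assumes the three-point edge-midpoint rule on the reference triangle with vertices (0,0), (1,0), (0,1), and the names `MIDPOINT_RULE` and `degree_of_precision` are hypothetical. It tests the rule against the exact monomial integrals $\int_T x^a y^b = a!\,b!/(a+b+2)!$ in rational arithmetic.

```python
from fractions import Fraction
from math import factorial

# Edge-midpoint rule on the reference triangle (0,0), (1,0), (0,1).
# Each entry is (x, y, weight); the weights sum to the area 1/2.
# The choice of this particular rule is an assumption of the sketch.
MIDPOINT_RULE = [
    (Fraction(1, 2), Fraction(0), Fraction(1, 6)),
    (Fraction(1, 2), Fraction(1, 2), Fraction(1, 6)),
    (Fraction(0), Fraction(1, 2), Fraction(1, 6)),
]


def exact_monomial_integral(a, b):
    """Exact integral of x^a * y^b over the reference triangle."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 2))


def quadrature_monomial(rule, a, b):
    """Value the quadrature rule assigns to the integral of x^a * y^b."""
    return sum(w * x**a * y**b for x, y, w in rule)


def degree_of_precision(rule, max_degree=10):
    """Largest p (up to max_degree) such that all monomials of total
    degree <= p are integrated exactly; -1 if even constants fail."""
    p = -1
    for degree in range(max_degree + 1):
        for a in range(degree + 1):
            b = degree - a
            if quadrature_monomial(rule, a, b) != exact_monomial_integral(a, b):
                return p
        p = degree
    return p


print(degree_of_precision(MIDPOINT_RULE))
```

Under the criterion stated above, a rule of degree of precision 2 is what quadratic Lagrange triangles ($d = 2$, so $2d - 2 = 2$) would call for; the one-point centroid rule, by the same test, only reaches degree 1.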

## 5.5.2 Outline of the analysis of the effect of quadrature

The proof of the above conclusion is based on the following two results:

LEMMA 5.8. Let $V$ be a Hilbert space, $V_h$ a finite-dimensional subspace of $V$, $a(\cdot,\cdot)$ a symmetric, bounded, $V$-elliptic bilinear form, and $\ell$ a bounded linear functional on $V$. Further, let $a_h(\cdot,\cdot)$ be a symmetric, bounded, $V_h$-elliptic bilinear form and $\ell_h$ be a bounded linear functional on $V_h$. If $u_h$ is the unique solution to

and $\hat{u}_h$ is the unique solution to

then

Proof. Since

and
Therefore,

This completes the proof.

The preceding lemma is completely elementary, but has the following consequence.

THEOREM 5.9. Let $V$, $V_h$, $a(\cdot,\cdot)$, $\ell$, $a_h(\cdot,\cdot)$, $\ell_h$, $u_h$, and $\hat{u}_h$ be as in the preceding lemma. Suppose

and there exists a constant $C > 0$ such that

Then

Proof. By the preceding lemma,

Applying the $V_h$-ellipticity to bound the left-hand side below and the hypothesis to bound the right-hand side above yields

or

as desired.

As discussed above, it is desired that

so $p$ should be $d$ in the preceding theorem. Therefore, it is first necessary to show that when a quadrature rule having degree of precision $2d - 2$ is used, the approximate bilinear forms $a_h(\cdot,\cdot)$ are uniformly $V_h$-elliptic:

This is straightforward if the quadrature weights are assumed to be positive, as is true for many common quadrature rules (cf. Sections 7.1 and 8.1). It is then necessary to show that a quadrature rule having degree of precision 2d - 2 results in
and

The proofs of these results are quite involved and will not be given here; probably the best source is Ciarlet [16, Section 4.1].

## 5.5.3 Isoparametric finite elements

When $\Omega$ has a curved boundary and isoparametric finite elements are used, an approximate domain $\Omega_h$ is involved. This has several implications:

- Essential boundary conditions will not, in general, be satisfied exactly as they are in the case of a polygonal domain.
- The domain $\Omega_h$ may extend outside of $\Omega$ in places, and the problem functions $\kappa$, $f$ may not be defined on $\Omega_h \setminus \Omega$.
- The shape functions on elements with a curved boundary will not be polynomials, raising the question of whether a quadrature scheme based on integrating polynomials exactly will be adequate.


The issues raised in the previous paragraph can all be surmounted by an analysis that is very similar in outline to that given above. One begins by establishing bounds on the interpolation error for isoparametric elements. This is possible only if the elements in the meshes $\mathcal{T}_h$ are not too distorted from triangles, which imposes certain constraints on the construction of the meshes (similar to the definition of a nondegenerate family of meshes). These constraints impose conditions on how far the nodes on the curved boundary can be from the corresponding nodes on the ordinary (nonisoparametric) triangle, and also on the placement of the interior nodes (if $d > 2$), conditions which can be satisfied if the boundary of $\Omega$ is smooth or piecewise smooth. Next, an error bound analogous to Theorem 5.9 is established, the uniform $V_h$-ellipticity of the approximate bilinear form is verified, and finally, bounds analogous to (5.23) and (5.24) are established. All of this is carried out in detail for the case of quadratic Lagrange triangles in Ciarlet [16, Sections 4.3 and 4.4].

The result of all this analysis (see page 269 of [16]) is that the error using isoparametric finite elements goes to zero at the same rate as if ordinary Lagrange triangles were used on a polygonal domain. Moreover, this conclusion is valid when the same quadrature rule is used as for ordinary triangles (degree of precision $2d - 2$).

## 5.6 Exercises for Chapter 5

1. Show that if $T$ is an isosceles triangle, then

2. Suppose $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ are the vertices of a triangle $T$. The barycentric coordinates $(\alpha_1, \alpha_2, \alpha_3)$ of $(x, y) \in T$ are defined by

Let $d_i$ denote the distance from $(x, y)$ to the side of $T$ opposite $(x_i, y_i)$ and let $h_i$ denote the distance from $(x_i, y_i)$ to the opposite side of $T$.

(a) Show that (Hint: i. First show that the line $L(\alpha_3)$ defined by

($\alpha_3 \in (0, 1)$ fixed) is parallel to the edge of $T$ joining $(x_1, y_1)$ and $(x_2, y_2)$. ii. Find the distance between the line $L(\alpha_3)$ and the line through $(x_1, y_1)$ and $(x_2, y_2)$ and show that it equals $\alpha_3 h_3$. (The proofs for $d_1$ and $d_2$ are then exactly analogous.))


(b) A circle is inscribed in $T$ if and only if its center $(\alpha_1, \alpha_2, \alpha_3)$ satisfies $d_1 = d_2 = d_3$. Using this condition, show by direct calculation that the center of the inscribed circle is given by

and the radius of the circle is

(c) Now let $b_1$ be the length of the side of $T$ opposite $(x_3, y_3)$, $b_2$ the length of the side opposite $(x_1, y_1)$, and $b_3$ the length of the side opposite $(x_2, y_2)$. Show that the radius of the inscribed circle can be written

where $|T|$ denotes the area of $T$.

(d) Let $d_T$ be the diameter of the largest circle contained in a triangle $T$, as defined in Section 5.1.2. Prove that

for all triangles $T$.

3. Let $T$ be any triangle and suppose $T$ is refined to four triangles by joining the midpoints of the edges of $T$, as in Figure 5.1. Prove the four new triangles are each similar to $T$.

4. Suppose a sequence of triangulations $\mathcal{T}_0, \mathcal{T}_1, \mathcal{T}_2, \ldots$ is formed by standard refinement: Each $\mathcal{T}_k$ is the refinement of $\mathcal{T}_{k-1}$. Prove that these meshes form a nondegenerate family: There exists $\rho > 0$ such that

(Hint: Use the preceding exercise.)

5. Let $\Omega$ be a polygonal domain in $\mathbf{R}^2$ and let $\mathcal{T}_0, \mathcal{T}_1, \ldots$ be a sequence of triangulations of $\Omega$ formed by standard refinement. Let $P_k^{(1)}$ be the space of continuous piecewise linear functions relative to $\mathcal{T}_k$. Assume that $u \in H^1(\Omega)$ (but $u$ is not necessarily in $H^2(\Omega)$), and let $u^{(k)}$ be the best approximation to $u$ from $P_k^{(1)}$ in the $H^1(\Omega)$-norm. Prove that (Hint: The space $H^2(\Omega)$ is dense in $H^1(\Omega)$, so there is a sequence $\{u_j\}$ in $H^2(\Omega)$ converging to $u$ in the $H^1(\Omega)$-norm. Let $u_j^{(k)}$ be the best approximation to $u_j$ from $P_k^{(1)}$. Use the fact that, for each $j$,

Part II

Chapter 6

## The mesh data structure

This part of the book discusses the implementation of the finite element algorithm in computer programs. In order to make the discussion as straightforward as possible, this chapter and the next will focus on the implementation of linear Lagrange triangles for the model problem

where $\Omega$ is a polygonal domain and $\partial\Omega = \Gamma_1 \cup \Gamma_2$. After carefully developing programs to handle the above problem, I will extend them in Chapter 8 to handle higher-order Lagrange triangles on polygonal domains and then to handle curved boundaries using the isoparametric method. Finally, in Chapter 9, BVPs more general than (6.1) will be treated. In Section 6.1, I discuss the important issues that must be resolved in order to write a program implementing the finite element method, and outline the overall strategy. Then, in Section 6.2, a data structure for storing the mesh is presented.

## 6.1 Programming the finite element method

## 6.1.1 Assembling the stiffness matrix

The finite element method, applied to (6.1), produces a matrix-vector equation $KU = F$, whose solution vector $U$ contains the nodal values of the approximate solution function. There are three important steps in applying the finite element method:

1. Creating a mesh on the computational domain $\Omega$.
2. Computing the stiffness matrix $K$ and the load vector $F$.
3. Solving the linear system $KU = F$.

In this part of the book, I will mostly concentrate on the second step. Section 6.3 gives examples of several different ways to generate a mesh on a given domain. Part IV presents algorithms for local refinement of meshes; these algorithms are intended to produce a mesh that is custom designed for a given problem. I defer the discussion of the solution of $KU = F$ to Part III. For examples in this part of the book, the linear systems will be solved by the direct solver for sparse systems in MATLAB.

Throughout the following discussion, $\mathcal{T}_h$ is a triangulation of the polygonal domain $\Omega$ and $P_h^{(1)}$ is the space of continuous piecewise linear functions defined on $\mathcal{T}_h$. The stiffness matrix $K$ corresponding to the BVP (6.1) is defined by

where $\{\phi_1, \phi_2, \ldots, \phi_{N_f}\}$ is the basis for the approximating subspace $V_h$ and

In the case of linear Lagrange triangles, the subspace $V_h$ is the following subspace of $P_h^{(1)}$, as described in Section 4.1.1:

The basis $\{\phi_1, \phi_2, \ldots, \phi_{N_f}\}$ consists of the standard basis functions for $P_h^{(1)}$ that correspond to the $N_f$ free nodes in the mesh. The reader should recall the following requirement on the triangulation: Any point where $\Gamma_1$ and $\Gamma_2$ meet must be a node in the mesh, and this node is considered to belong to $\Gamma_1$.

Most entries $K_{ij}$ of the stiffness matrix are zero, since the corresponding integrand $\kappa \nabla\phi_j \cdot \nabla\phi_i$ is zero throughout $\Omega$. For those entries $K_{ij}$ that are not zero, the support of $\kappa \nabla\phi_j \cdot \nabla\phi_i$ consists of a few triangles. One strategy for computing $K$ is to loop over all $i, j$ pairs, determine if $K_{ij}$ is nonzero, and, if it is, compute the integral that defines it. If $K_{ij}$ is nonzero and the support of $\kappa \nabla\phi_j \cdot \nabla\phi_i$ is

then

To compute these integrals, it is necessary to compute the basis functions $\phi_i$ and $\phi_j$ (or, actually, their gradients) on each of the triangles $T_{r_1}, T_{r_2}, \ldots, T_{r_m}$. Algorithm 6.1 expresses this approach to computing $K$. This algorithm can be described as node-oriented, since it involves looping over the nodes in the mesh. The reader will notice that only the upper triangle of $K$ is computed directly, since the matrix is known to be symmetric ($K_{ji} = K_{ij}$).

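Initialize $K$ to the zero matrix
for $i = 1, 2, \ldots, N_f$
    for $j = i, i+1, \ldots, N_f$
        Determine if $K_{ij}$ is nonzero
        if $K_{ij} \neq 0$
            Determine the triangles $T_{r_1}, T_{r_2}, \ldots, T_{r_m}$ forming the support of $\kappa \nabla\phi_j \cdot \nabla\phi_i$
            Set $I = 0$
            for $s = 1, 2, \ldots, m$
                Compute $\nabla\phi_i$ and $\nabla\phi_j$ on $T_{r_s}$
                Compute $\int_{T_{r_s}} \kappa \nabla\phi_j \cdot \nabla\phi_i$ and add it to $I$
            Set $K_{ij} = I$ and $K_{ji} = I$

Algorithm 6.1. Node-oriented algorithm for computing the stiffness matrix $K$.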

Figure 6.1. The support of $\phi_{13}$ in a certain mesh. The triangles are labeled in the left graph, while the free nodes are labeled in the right graph.

One problem with Algorithm 6.1 is that the value of any given $\nabla\phi_i$ on any particular $T_k$ will contribute to $K_{ij}$ for several (usually three) values of $j$. For example, for the mesh illustrated in Figure 6.1, the value of $\nabla\phi_{13}$ on $T_{20}$ contributes to $K_{13,12}$, $K_{13,13}$, and $K_{13,18}$ (and, by symmetry, $K_{12,13}$ and $K_{18,13}$). Therefore, it must be computed repeatedly (at the cost of some inefficiency) or stored after it is computed (at the cost of some inconvenience). It would be preferable, if possible, to compute $\nabla\phi_i$ just once on each triangle in its support, use its value, and then discard it.

The simplest data structure describing a triangulation is the triangle-node list. This consists of two arrays: The node array contains the coordinates of the nodes, and the triangle array contains three indices for each triangle, identifying the nodes (from the node array) that are the vertices of the given triangle. When Algorithm 6.1 is executed, it is necessary to loop over the vertices of the triangles and to know, for a given vertex, which other vertices are adjacent to it. This implies storing the "connectivity" information of the mesh (that is, storing, for each vertex, the indices of the adjacent nodes). This connectivity information is contained in the triangle-node list, but only implicitly. It would be inefficient to search through the list of triangles and vertices to determine the connectivity of the vertices. Algorithm 6.1 therefore requires that both the triangle-node list and the connectivity information be stored explicitly.

It turns out that by adopting a different strategy for computing $K$, both of the above problems can be circumvented: $\nabla\phi_i$ on $T_k$ need be computed only once, and the connectivity information need not be stored explicitly. The idea is to loop over the triangles in the mesh and, for each triangle, compute the contributions to all entries $K_{ij}$ that are affected by the given triangle. This is actually quite easy to do. Given a triangle $T_k$, the only basis functions whose support has a nontrivial intersection with $T_k$ are those corresponding to the vertices of $T_k$. There are at most three such basis functions (fewer if one or more vertices are constrained). If all three vertices of $T_k$ are free and the corresponding basis functions are $\phi_{\ell_1}$, $\phi_{\ell_2}$, $\phi_{\ell_3}$, then the following entries of $K$ are affected:

The contribution to $K_{\ell_p,\ell_q}$ is

To be precise,

where "$+ \cdots$" represents integrals of $\kappa \nabla\phi_{\ell_q} \cdot \nabla\phi_{\ell_p}$ over the other triangles that form its support. The integrals computed over $T_k$ are often collected in a $3 \times 3$ matrix called the element stiffness matrix (over $T_k$):

This matrix need not be formed explicitly (except possibly as a programming convenience); rather, its entries are added to the corresponding entries of $K$. When computing the entries of the element stiffness matrix, it may be advantageous to compute the integrals by transforming to the reference triangle, as described in Section 4.6. The advantages of using a reference triangle will be discussed in Chapters 7 and 8.

As always, the symmetry of $K$ should not be ignored. It is necessary to compute only six of the nine entries of the element matrix, namely, those in the upper triangle. If one of the three vertices of $T_k$ is constrained, then $T_k$ contributes to only four entries of $K$, while if two of the vertices are constrained, then $T_k$ contributes to a single entry in $K$. It is possible that all three vertices of $T_k$ can be constrained, but this could hold for only a few triangles in a given mesh, for example, those lying at the corner of a rectangle.
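To make the element stiffness matrix concrete, here is a minimal Python sketch for a single linear Lagrange triangle. It is not the book's MATLAB code: the function name `element_stiffness` is hypothetical, and the coefficient $\kappa$ is assumed constant on the triangle so that the integrals reduce to (area) × $\kappa$ × (dot product of constant gradients).

```python
def element_stiffness(vertices, kappa=1.0):
    """3x3 element stiffness matrix for a linear Lagrange triangle.

    vertices: three (x, y) pairs.
    kappa: coefficient, assumed constant on the triangle (an assumption
           of this sketch, so no quadrature is needed).
    """
    (x1, y1), (x2, y2), (x3, y3) = vertices
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)  # twice the signed area
    area = abs(det) / 2.0
    # Constant gradient of the basis function equal to 1 at vertex i
    # and 0 at the other two vertices.
    grads = [
        ((y2 - y3) / det, (x3 - x2) / det),
        ((y3 - y1) / det, (x1 - x3) / det),
        ((y1 - y2) / det, (x2 - x1) / det),
    ]
    Ke = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i, 3):  # compute the upper triangle, then mirror it
            value = kappa * area * (
                grads[i][0] * grads[j][0] + grads[i][1] * grads[j][1]
            )
            Ke[i][j] = value
            Ke[j][i] = value
    return Ke


Ke = element_stiffness([(0, 0), (1, 0), (0, 1)])
for row in Ke:
    print(row)
```

Only six products are evaluated, mirroring the remark about symmetry above. Each row sums to zero because the three basis functions sum to one on the triangle, which is a convenient sanity check.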

Algorithm 6.2 incorporates the above ideas. The reader should recall that the vertices of $T_k$ are $z_{n_{k,1}}$, $z_{n_{k,2}}$, $z_{n_{k,3}}$.

Initialize $K$ to the zero matrix
for $k = 1, 2, \ldots, N_t$
    for $i = 1, 2, 3$
        for $j = i, \ldots, 3$
            if $z_{n_{k,i}}$ and $z_{n_{k,j}}$ are both free
                Find the indices $r = p_{n_{k,i}}$ and $s = p_{n_{k,j}}$ of $z_{n_{k,i}}$ and $z_{n_{k,j}}$ in the list of free nodes
                Compute $\int_{T_k} \kappa \nabla\phi_s \cdot \nabla\phi_r$ and add it to $K_{rs}$ and (if $r \neq s$) to $K_{sr}$

Algorithm 6.2. Element-oriented algorithm for computing $K$.

To implement this algorithm, it is necessary to know, for each triangle $T_k$, the nodes $z_{n_{k,j}}$, $j = 1, 2, 3$. This information is required by any conceivable scheme, since integrals over $T_k$ must be computed, and is contained in the triangle-node list. In addition, it must be possible to determine if a given vertex $z_i$ is free or not. If it is free, its index in the list of all free nodes must be known.

I have already established the following notation: The free nodes are enumerated $1, 2, \ldots, N_f$ and the vertices are enumerated $1, 2, \ldots, N_v$. Free node $j$ is vertex $z_{f_j}$. That is, I have established a mapping from $j \in \{1, 2, \ldots, N_f\}$ to $f_j \in \{1, 2, \ldots, N_v\}$. This mapping is necessarily one-to-one, so it has an inverse mapping defined by $p_i = j$ if and only if $j \in \{1, 2, \ldots, N_f\}$ and $i = f_j$. Except in the case that every node is free, the quantity $p_i$ is not defined for some $i \in \{1, 2, \ldots, N_v\}$. For each node $z_n$, it is necessary to store $p_n$ or a flag indicating that $z_n$ is constrained. I will present a convenient way to do this in the next section. For now I just point out that, given this information, the above algorithm is efficient and easy to implement. Since I will need it later, I will also define $q_i = j$ if and only if $j \in \{1, 2, \ldots, N_c\}$ and $i = c_j$. This establishes the analogous relationship for the constrained nodes.

Other issues that must be addressed are the computation of the gradients of the basis functions on each triangle and the computation of the integrals over the triangles. These will be addressed in Chapter 7.
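The element-oriented loop can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the book's implementation: indices are 0-based, the mesh is a plain triangle-node list, $\kappa$ is constant, and the hypothetical list `free_index` plays the role of the mapping $i \mapsto p_i$, holding `None` for constrained nodes.

```python
def p1_gradients(tri):
    """Gradients of the three linear basis functions on a triangle, plus its area."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    grads = [
        ((y2 - y3) / det, (x3 - x2) / det),
        ((y3 - y1) / det, (x1 - x3) / det),
        ((y1 - y2) / det, (x2 - x1) / det),
    ]
    return grads, abs(det) / 2.0


def assemble_stiffness(nodes, triangles, free_index, kappa=1.0):
    """Element-oriented assembly of K over the free nodes (dense, for clarity)."""
    n_free = sum(1 for p in free_index if p is not None)
    K = [[0.0] * n_free for _ in range(n_free)]
    for tri in triangles:
        # Gradients are computed once per triangle and then discarded.
        grads, area = p1_gradients([nodes[n] for n in tri])
        for i in range(3):
            for j in range(i, 3):
                r, s = free_index[tri[i]], free_index[tri[j]]
                if r is None or s is None:
                    continue  # at least one vertex is constrained
                contrib = kappa * area * (
                    grads[i][0] * grads[j][0] + grads[i][1] * grads[j][1]
                )
                K[r][s] += contrib
                if r != s:
                    K[s][r] += contrib  # symmetric counterpart
    return K


# Unit square split along the diagonal; the two bottom nodes are constrained.
nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
triangles = [(0, 1, 3), (0, 3, 2)]
free_index = [None, None, 0, 1]
print(assemble_stiffness(nodes, triangles, free_index))
```

A dense list of lists is used only to keep the sketch short; a real code would accumulate into a sparse format, since most entries of $K$ are zero.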

## 6.1.2 Computing the load vector

The algorithm for computing the load vector $F$ is exactly analogous to the algorithm for assembling the stiffness matrix. If the boundary conditions are homogeneous (that is, if $g$ and $h$ are zero in (6.1b) and (6.1c)), then the components of $F$ are defined by

if the support of

then


As in the case of the stiffness matrix, the contributions to the components of the load vector are computed while looping over the triangles in the mesh. The result is Algorithm 6.3.
Initialize $F$ to the zero vector
for $k = 1, 2, \ldots, N_t$
    for $i = 1, 2, 3$
        if $z_{n_{k,i}}$ is free
            Find the index $r = p_{n_{k,i}}$ of $z_{n_{k,i}}$ in the list of free nodes
            Compute $\int_{T_k} f \phi_r$ and add it to $F_r$

Algorithm 6.3. Element-oriented algorithm for computing $F$ in the case of homogeneous boundary conditions.

The reader should notice that the index of $z_{n_{k,i}}$ in the list of free nodes is $p_{n_{k,i}}$. This sort of indirect indexing is essential for the computer implementation of the finite element method.
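The load vector loop has the same shape. The sketch below is a Python illustration with assumptions of its own: 0-based indices, a triangle-node list, a hypothetical `free_index` list standing in for $i \mapsto p_i$, and the one-point centroid rule $\int_T f\phi_i \approx |T|\, f(c_T)/3$ in place of whichever quadrature rule is actually chosen.

```python
def assemble_load(nodes, triangles, free_index, f):
    """Element-oriented assembly of F for homogeneous boundary conditions.

    Each integral of f * phi_i over a triangle is approximated by the
    centroid rule (an assumption of this sketch): every linear basis
    function equals 1/3 at the centroid.
    """
    n_free = sum(1 for p in free_index if p is not None)
    F = [0.0] * n_free
    for tri in triangles:
        (x1, y1), (x2, y2), (x3, y3) = (nodes[n] for n in tri)
        area = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0
        cx, cy = (x1 + x2 + x3) / 3.0, (y1 + y2 + y3) / 3.0
        contrib = area * f(cx, cy) / 3.0
        for n in tri:
            r = free_index[n]  # indirect indexing: node number -> free-node index
            if r is not None:
                F[r] += contrib
    return F


nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
triangles = [(0, 1, 3), (0, 3, 2)]
free_index = [None, None, 0, 1]
print(assemble_load(nodes, triangles, free_index, lambda x, y: 1.0))
```

With $f \equiv 1$ each free node simply collects one third of the area of every triangle that contains it, which makes the result easy to verify by hand.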
Inhomogeneous Dirichlet conditions

When the Dirichlet conditions are inhomogeneous ($g \neq 0$ in (6.1b)), then the formula for $F_i$ has an extra term,

where $G$ is a function interpolating the Dirichlet data $g$ on $\Gamma_1$. When computing $F$ by looping over the triangles, integrals of the form

must be computed. It is usual to choose, for G, the continuous piecewise linear function satisfying

But $z_n \in \Gamma_1$ if and only if $z_n$ is a constrained node. Therefore, it is necessary to know, for a given node $z_n$, whether it is constrained or not and, if so, its index $q_n$ in the list of constrained nodes. The mesh data structure must contain this information if it is to be used for problems with inhomogeneous Dirichlet conditions.
Inhomogeneous Neumann conditions

If the Neumann condition is inhomogeneous (that is, if $h \neq 0$ in (6.1c)), then the formula for $F_i$ contains an additional term:

When computing $F$, it is therefore necessary to compute integrals of the form

where $e$ is a triangle edge lying in $\Gamma_2$, that is, a free boundary edge. These contributions to $F$ can be computed while looping over the triangles (along with the contributions from the right-hand side of the PDE and the inhomogeneous Dirichlet conditions). Then the values of $\phi_i$ on $T_k$ need only be computed once. This suggests that the data structure should record the edges of each triangle, with an indication of which lie on the boundary and which boundary edges are free.

The edges of a triangle $T_k$ are fully specified by their endpoints, which are the nodes $z_{n_{k,1}}$, $z_{n_{k,2}}$, $z_{n_{k,3}}$. It would be sufficient, then, for the algorithms described thus far to augment a triangle-node list with flags describing the edges as interior edges, free boundary edges, or constrained boundary edges. However, for the purpose of local refinement (in which some triangles in a given mesh are refined, but others are not), it is necessary to identify not only the edges of a given triangle but also the triangle on the other side of each edge. This information is not easily extracted from the triangle-node list.

Therefore, instead of two lists (triangles and nodes), the data structure defined in the next section includes three lists: triangles, edges, and nodes. I have already introduced the notation for the triangles and for the nodes. I will denote the edges in the mesh by $e_1, e_2, \ldots, e_{N_e}$ and define indices $k_{i,1}$, $k_{i,2}$ so that $z_{k_{i,1}}$ and $z_{k_{i,2}}$ are the endpoints of $e_i$. Since I need to point from a triangle to its edges, I define indices $s_{k,1}$, $s_{k,2}$, and $s_{k,3}$ such that $T_k$ has edges $e_{s_{k,1}}$, $e_{s_{k,2}}$, $e_{s_{k,3}}$. Each triangle is identified by its three edges, each edge by the nodes forming its endpoints, and each node by its coordinates. To these lists will be added arrays of flags and pointers as needed for the algorithms. For example, the mappings $i \mapsto f_i$ and $j \mapsto p_j$ can be stored as arrays of pointers. From the resulting data structure, all of the information needed by the algorithms described in this chapter can be easily extracted.

Explicitly storing the triangle-edge and edge-vertex lists is convenient for handling inhomogeneous Neumann conditions and essential for local refinement. The vertices of a triangle can be cheaply extracted from these lists, so there is no significant penalty for eliminating the triangle-node list. When the Neumann data are specified for input to the algorithm for computing the load vector, it should be easy to determine which boundary edges are free (without searching through the list of all triangles). I will denote the number of free boundary edges by $N_b$, and the free boundary edges themselves will be denoted $e_{b_1}, e_{b_2}, \ldots, e_{b_{N_b}}$. The mapping $i \mapsto b_i$ will also be stored.
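To see what is involved in recovering edge information from a bare triangle-node list, the following Python sketch derives an edge list together with the triangles on either side of each edge. It is an illustration only: the function name `build_edges` is hypothetical, indices are 0-based, and `None` marks the missing second triangle of a boundary edge.

```python
def build_edges(triangles):
    """Derive edges and edge-to-triangle adjacency from a triangle-node list.

    Returns (edges, edge_els):
      edges[i]    = (a, b), the endpoint node indices of edge i, with a < b
      edge_els[i] = [t1, t2], the triangles sharing edge i; t2 is None
                    for a boundary edge.
    """
    edge_index = {}  # maps a sorted endpoint pair to its edge number
    edges, edge_els = [], []
    for t, tri in enumerate(triangles):
        for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
            key = (min(a, b), max(a, b))
            if key not in edge_index:
                edge_index[key] = len(edges)
                edges.append(key)
                edge_els.append([t, None])
            else:
                edge_els[edge_index[key]][1] = t  # second triangle found
    return edges, edge_els


edges, edge_els = build_edges([(0, 1, 3), (0, 3, 2)])
boundary = [i for i, (_, t2) in enumerate(edge_els) if t2 is None]
print(edges)
print(edge_els)
print(boundary)
```

The adjacency only becomes available after a pass over every triangle with an auxiliary lookup table, which is one way to read the remark that it is not easily extracted; storing the edge lists explicitly avoids repeating this work.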

## 6.2 The mesh data structure

I will now define a data structure for describing a triangular mesh on a polygonal domain in $\mathbf{R}^2$. The data structure describes linear Lagrange triangles and will be extended in later chapters for higher-order Lagrange triangles. The necessary notation for referring to triangles, edges, nodes, free nodes, constrained nodes, and free boundary edges has already been established in Chapter 4 and Section 6.1.

I should emphasize that the data structure presented here is based on simple arrays, as is appropriate for a procedural style of programming. Using object-oriented programming, it would be natural to encapsulate the mesh data structure together with access functions and other manipulations in a class (in the C++ programming language, for example). However, I will not pursue this extension here.

The data structure that I present is designed for ease of use, not to minimize the amount of storage used. There is a trade-off between storing information explicitly and recomputing it when needed; the first approach uses more memory and the second more time. The report by Beall and Shephard [11] discusses this trade-off in detail. Below, in Section 6.3, I give more details about the amount of memory used (see also Exercise 1).

## 6.2.1 The list of nodes

The most basic information required about the mesh is the list of nodes and their coordinates. The order of the nodes in this list is completely irrelevant. The nodes are $z_1, z_2, \ldots, z_{N_v}$ and their coordinates are $(x_1, y_1), (x_2, y_2), \ldots, (x_{N_v}, y_{N_v})$, respectively. Therefore, the $N_v \times 2$ array Nodes is

Associated with Nodes are three arrays of pointers, allowing one to retrieve the index of a given node in the list of free or constrained nodes, or to retrieve the index of a given free or constrained node in Nodes. The Nf x 1 array FNodePtrs (free node pointers) contains the pointer into Nodes for each free node:

CNodePtrs (constrained node pointers) is the analogous array for the constrained nodes:

In Section 4.1.1, I defined the notation used above: $f_1, f_2, \ldots, f_{N_f}$ are the indices of the free nodes in the list of all nodes. That is, $z_{f_i}$ is the $i$th free node. Similarly, $z_{c_i}$ is the $i$th constrained node. Finally, the $N_v \times 1$ array NodePtrs contains the information necessary to determine, for each node in Nodes, if it is free or constrained and its index in the list of free or constrained nodes. I use a small trick here to put all of this information in a single array:

The reader will recall that if node $z_i$ is free, then its index in the list of free nodes is $p_i$, while if it is constrained, its index in the list of constrained nodes is $q_i$. By examining the sign of NodePtrs(i), one can determine whether $z_i$ is free or constrained (positive for a free node, negative for a constrained node). The value of NodePtrs(i) (or its negative) then gives the index of $z_i$ in the list of free or constrained nodes.
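The sign convention for NodePtrs can be illustrated with a short Python sketch. The helper names `build_node_ptrs` and `classify` are hypothetical; the stored values are kept 1-based, as in the text, because the trick relies on every index being nonzero, and only the Python list positions are 0-based.

```python
def build_node_ptrs(n_nodes, fnode_ptrs, cnode_ptrs):
    """Build NodePtrs from FNodePtrs and CNodePtrs (all values 1-based).

    node_ptrs[i-1] =  j if node i is the j-th free node,
                     -j if node i is the j-th constrained node.
    """
    node_ptrs = [0] * n_nodes
    for j, i in enumerate(fnode_ptrs, start=1):
        node_ptrs[i - 1] = j
    for j, i in enumerate(cnode_ptrs, start=1):
        node_ptrs[i - 1] = -j
    return node_ptrs


def classify(node_ptrs, i):
    """Return ('free', p_i) or ('constrained', q_i) for node i (1-based)."""
    value = node_ptrs[i - 1]
    return ("free", value) if value > 0 else ("constrained", -value)


# Four nodes: nodes 3 and 4 are free, nodes 1 and 2 are constrained.
node_ptrs = build_node_ptrs(4, fnode_ptrs=[3, 4], cnode_ptrs=[1, 2])
print(node_ptrs)
print(classify(node_ptrs, 3), classify(node_ptrs, 2))
```

A single signed array answers both questions (free or constrained, and which index) with one lookup, which is what the assembly loops need for each vertex of each triangle.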

## 6.2.2 The list of edges

The $N_e \times 2$ array Edges describes the edges by listing their endpoints, or, to be precise, by listing the indices of their endpoints in the Nodes array. My notation for the endpoints of edge $e_i$ is $z_{k_{i,1}}$, $z_{k_{i,2}}$, and thus

The reader should notice that there is no preferred order for listing the endpoints of a given edge, but whatever order is chosen in the Edges array must be followed in the other parts of the code.

## 6.2.3 The list of elements

Each triangular element is defined by its three edges. I have already established the notation $e_{s_{k,j}}$ to denote the $j$th edge of triangle $T_k$. The edges are listed in counterclockwise order, and each edge has an orientation defined by the order in which the endpoints are listed in Edges. Since the orientation of the edge is important in certain circumstances, it is convenient to indicate it explicitly. I therefore define the $N_t \times 3$ array Elements by

The positive sign is taken if, in traversing the boundary of triangle $T_k$ counterclockwise along edges $e_{s_{k,1}}$, $e_{s_{k,2}}$, $e_{s_{k,3}}$, edge $e_{s_{k,j}}$ is followed in its orientation defined in Edges. Otherwise, the negative sign is recorded.

Figure 6.2. A mesh with two triangles, $T_1$ and $T_2$. The edges are $e_1$ (bottom), $e_2$ (left), $e_3$ (diagonal), $e_4$ (right), and $e_5$ (top).

To make this clear, consider the mesh shown in Figure 6.2. There are two triangles, $T_1$ and $T_2$, five edges, $e_1, e_2, e_3, e_4, e_5$, and four nodes, $z_1, z_2, z_3, z_4$. If the Edges array is defined to be

then

This indicates that triangle $T_1$ has edges $e_2$, $e_3$, $e_5$, with $e_3$ traversed in the positive direction (that is, the direction defined by Edges) and $e_2$ and $e_5$ traversed in the negative direction. Triangle $T_2$ has edges $e_1$, $e_4$, $e_3$, with $e_1$ and $e_4$ traversed in the positive direction and $e_3$ traversed in the negative direction.

At various points in the algorithms presented in subsequent chapters, it is necessary to identify each edge as being an interior edge, a free boundary edge, or a constrained boundary edge. For example, when refining a mesh, new nodes are added. If a new node belongs to an interior edge or a free boundary edge, then it will be free, but if it belongs to a constrained boundary edge, then it will be constrained. When doing local refinement, it is also necessary to know, for interior edges, the triangles on both sides of the edge. I define indices $t_{i,1}$, $t_{i,2}$ such that $T_{t_{i,1}}$ and $T_{t_{i,2}}$ are the triangles on the two sides of $e_i$. If $e_i$ is a boundary edge, then $t_{i,2}$ is defined to be 0. The reader will recall that the free boundary edges are denoted $e_{b_1}, e_{b_2}, \ldots, e_{b_{N_b}}$. I define $i \mapsto a_i$ to be the inverse of $j \mapsto b_j$ (that is, $a_i = j \Leftrightarrow i = b_j$). Then, if $e_i$ is a free boundary edge, its index in the list of all free boundary edges is $a_i$.
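The signed edge indices also make it cheap to recover the vertices of a triangle. The Python sketch below uses the two-triangle mesh of Figure 6.2 under an assumed node numbering ($z_1$ bottom left, $z_2$ bottom right, $z_3$ top left, $z_4$ top right, with the diagonal joining $z_1$ and $z_4$); the arrays `EDGES` and `ELEMENTS` are chosen to agree with the signs described in the text, but they are an assumption of this sketch and not a copy of the book's arrays. Values are 1-based so that a sign can be attached.

```python
# Assumed arrays for the mesh of Figure 6.2 (1-based node and edge numbers).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]   # e1 .. e5
ELEMENTS = [(-2, 3, -5), (1, 4, -3)]               # T1, T2 (signed edge indices)


def triangle_vertices(element_row, edges):
    """Vertices of a triangle, in counterclockwise order, from its signed edges.

    Each vertex is the starting node of the corresponding edge as it is
    traversed: the first endpoint for a positive entry, the second for a
    negative one.
    """
    vertices = []
    for s in element_row:
        first, second = edges[abs(s) - 1]
        vertices.append(first if s > 0 else second)
    return vertices


def is_closed_loop(element_row, edges):
    """Check that each edge ends where the next one starts."""
    starts = triangle_vertices(element_row, edges)
    ends = []
    for s in element_row:
        first, second = edges[abs(s) - 1]
        ends.append(second if s > 0 else first)
    return all(ends[j] == starts[(j + 1) % 3] for j in range(3))


for row in ELEMENTS:
    print(triangle_vertices(row, EDGES), is_closed_loop(row, EDGES))
```

The closed-loop check is a useful consistency test when building or refining a mesh by hand: a wrong sign in Elements breaks the chain of endpoints immediately.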

The above information is collected in an $N_e \times 2$ array called EdgeEls. If the $i$th edge is an interior edge, then

## 6.2.4 The list of free boundary edges

The final information that must be recorded about the mesh is the list of free boundary edges, which must be available to deal with an inhomogeneous Neumann condition. I define the $N_b \times 1$ array FBndyEdges so that the $i$th entry contains the index $b_i$ of the $i$th free boundary edge:

## 6.2.5 Other fields in the mesh data structure

Since the data structure and algorithms will be extended in later chapters to handle piecewise polynomials of degree greater than one, the mesh data structure will contain an integer called Degree that has value 1 for linear Lagrange triangles, 2 for quadratic Lagrange triangles, and so forth.

The $N_e \times 1$ array EdgeCFlags indicates, for each edge, whether it corresponds to a curved piece of the boundary. EdgeCFlags(i) is 0 if $e_i$ is an interior edge or a boundary edge corresponding to a straight piece of $\partial\Omega$. On the other hand, if $e_i$ is a boundary edge approximating a curved piece of $\partial\Omega$, then EdgeCFlags(i) is 1. The array EdgeCFlags is used by the Refine1 algorithm (described below) to automatically approximate a nonpolygonal domain by a polygonal mesh.

When a domain has a curved boundary, some mechanism is required to describe the precise boundary. Since both the refinement algorithm and the use of isoparametric finite elements require points in the interior of given arcs of the boundary curve, a curved boundary is described by a function BndyFcn with the following property: Given two points on the boundary and a positive integer $k$, the function returns $k - 1$ points in the interior of the arc having the given endpoints. The refinement algorithm uses $k = 2$, that is, just one point between the two given points is needed (see Figure 6.10 below). For generating isoparametric Lagrange triangles of degree $d$, $k = d$ is used. Whenever the domain has a curved boundary, BndyFcn must be provided.
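As an illustration of the property required of BndyFcn, the following Python sketch handles the simplest curved boundary, the unit circle. The function name `circle_bndy_fcn`, its argument order, and the choice of equally spaced angles along the shorter arc are all assumptions of this sketch; the text specifies only that $k - 1$ interior points of the arc are returned.

```python
import math


def circle_bndy_fcn(p, q, k):
    """Return k-1 points inside the shorter arc of the unit circle from p to q.

    p, q: (x, y) points assumed to lie on the unit circle.
    k:    positive integer; the arc is split into k pieces of equal angle.
    """
    a0 = math.atan2(p[1], p[0])
    a1 = math.atan2(q[1], q[0])
    # Wrap the angle difference into [-pi, pi) so the shorter arc is followed.
    da = (a1 - a0 + math.pi) % (2.0 * math.pi) - math.pi
    return [
        (math.cos(a0 + da * j / k), math.sin(a0 + da * j / k))
        for j in range(1, k)
    ]


# k = 2 gives the single point needed when an edge is bisected.
print(circle_bndy_fcn((1.0, 0.0), (0.0, 1.0), 2))
# k = 3 gives the two interior edge nodes of a cubic element.
print(circle_bndy_fcn((1.0, 0.0), (0.0, 1.0), 3))
```

Replacing the straight-edge midpoint by such a point is what pulls new boundary nodes onto the true boundary as the mesh is refined.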

Two optional fields, LevelNodes and NodeParents, are added to the data structure for meshes created by the Refine1 algorithm. The meaning of these fields is described in the next section.

## 6.3 The MATLAB implementation

There is nothing in the data structure described above or the algorithms described in the following chapters that is specific to MATLAB; they could equally well be implemented in any programming language, such as Fortran, C, C++, and so forth. However, I provide an implementation in MATLAB, and at various points I will add some details about these MATLAB routines. For convenience, the various arrays in the mesh data structure are stored in a MATLAB struct. A struct is a data structure that collects two or more data fields in one object that can then be passed to routines. The mesh struct has the following fields:

- Degree
- Elements, Edges, Nodes
- EdgeEls, EdgeCFlags, FBndyEdges
- NodePtrs, FNodePtrs, CNodePtrs
- BndyFcn (if necessary)
- LevelNodes, NodeParents (optional)

If an instance of the mesh struct is given the variable name T, then one can refer to any of the fields using the syntax T.FieldName (for example, T.Nodes).

As mentioned earlier, there is a trade-off between memory usage and computational time that must be considered when designing a data structure. I have chosen to store needed information explicitly rather than recompute it, which means that my data structure uses a relatively large amount of memory. To put this into context, I can compare the storage requirement of the mesh to the storage requirement for a corresponding stiffness matrix. For a scalar PDE, such as the model problem (6.1), the mesh takes about three to four times as much memory as the stiffness matrix. However, the mesh only occupies this much memory because I am using the default MATLAB data types, which store all numbers, even integers, in double precision. If all the integer pointers are stored as 2-byte integer values rather than 8-byte double precision numbers, the mesh would take approximately the same amount of storage as the stiffness matrix itself (see Exercise 1). Such storage would be natural in a compiled language such as Fortran, C, or C++, and can be attained in MATLAB at the cost of some inconvenience.

When meshes of higher-degree triangles are used, the stiffness matrix is less sparse and the memory used by the mesh data structure is less significant by comparison. For a system of PDEs, such as the equations of linear elasticity, the memory required for the mesh does not change, but the size of the stiffness matrix increases (by a factor of four for a $2 \times 2$ system). Therefore, for systems of PDEs, the memory used by the mesh is also less significant.

## 6.3.1 Generating a mesh by refinement

Given an arbitrary polygonal domain $\Omega$, the simplest way to create a mesh is probably to define a coarse mesh and then refine it. I have provided a routine called Refine1 that takes a triangulation $\mathcal{T}_0$ and returns the standard refinement $\mathcal{T}$ described in Section 5.1.1 (each triangle is replaced with four triangles obtained by inserting edges joining the midpoints of the original triangle edges; see Figure 5.1). The algorithm used in Refine1, which is outlined in Algorithm 6.4, is straightforward, though somewhat tedious; it mainly consists of careful bookkeeping.

Copy Nodes, NodePtrs, FNodePtrs, and CNodePtrs from $\mathcal{T}_0$ to $\mathcal{T}$
Copy LevelNodes and NodeParents from $\mathcal{T}_0$ to $\mathcal{T}$ (if they exist in $\mathcal{T}_0$)
Let $N_t$ be the number of triangles in $\mathcal{T}_0$
for $i = 1, 2, \ldots, N_t$
    for $j = 1, 2, 3$
        If edge $j$ of triangle $i$ has not been bisected already
            Create the midpoint of edge $j$ of triangle $i$
            Create the corresponding two new edges
    Begin updating Elements, EdgeEls, NodeParents, EdgeCFlags, and FBndyEdges to create three new interior triangles in $\mathcal{T}$
    Create the three new interior edges in $\mathcal{T}$
    Finish updating Elements and EdgeEls in $\mathcal{T}$: Create the fourth interior triangle in $\mathcal{T}$
    for $j = 1, 2, 3$
        Determine if the new midpoint of edge $j$ is free or constrained
        Update NodePtrs, FNodePtrs, and CNodePtrs in $\mathcal{T}$

Algorithm 6.4. Algorithm for forming the standard refinement $\mathcal{T}$ of a triangulation $\mathcal{T}_0$.

There are several important features to note from this algorithm:

- The nodes from $\mathcal{T}_0$ are also nodes in $\mathcal{T}$, and they are guaranteed to be the first nodes in the Nodes array of $\mathcal{T}$. When a mesh $\mathcal{T}_k$ is obtained by repeated refinement of an initial coarse mesh $\mathcal{T}_0$ via a sequence of intermediate meshes $\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_{k-1}$, there is the possibility of using a hierarchical basis in place of the standard nodal basis. Hierarchical bases will be described in Section 11.2. When employing a hierarchical basis, it is necessary to know which nodes belong to $\mathcal{T}_\ell$ but not to $\mathcal{T}_{\ell-1}$. I denote this set by $\mathcal{N}_\ell$; thus $\mathcal{N}_\ell$ is the set of nodes added during the $\ell$th refinement. Because Refine1 adds new nodes to the end of the list, the sets $\mathcal{N}_0, \mathcal{N}_1, \ldots, \mathcal{N}_k$ can be identified by simply recording the number of nodes in each of the meshes $\mathcal{T}_0, \mathcal{T}_1, \ldots, \mathcal{T}_k$. These numbers are recorded in the array LevelNodes, which is automatically added to the mesh data structure when a mesh is created by Refine1; on repeated refinements, the LevelNodes array is updated. The number of nodes in mesh $\mathcal{T}_i$ is stored in LevelNodes(i+1).

- When one mesh is obtained from another by refinement, the new nodes are all obtained as midpoints of edges in the original mesh. It is often necessary to know the endpoints of these edges. For example, this is needed in the hierarchical basis approach, and it is also needed when a piecewise linear function from the coarse mesh is interpolated to obtain the same function on the fine mesh. Therefore, Refine1 adds or updates an $N_v \times 2$ array called NodeParents. If $j_1$ = NodeParents(i,1) and $j_2$ = NodeParents(i,2), then $z_i$ is the midpoint of the line segment with endpoints $z_{j_1}$, $z_{j_2}$. This means that

$$z_i = \frac{1}{2}\left(z_{j_1} + z_{j_2}\right).$$

The code itself (Refine1.m) is carefully documented, and the interested reader can consult the source file to see how it is implemented.
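The use of parent nodes to carry a piecewise linear function from a coarse mesh to its refinement can be sketched as follows. This is a minimal Python illustration, not the MATLAB code that accompanies the book; the function name, the 0-based indexing, and the use of None for nodes without parents are assumptions made for the example. It relies only on the two facts stated above: the coarse nodes come first in the node list, and each new node is the midpoint of an edge whose endpoints are its parents.

```python
def interpolate_to_fine_mesh(coarse_values, node_parents):
    """Extend nodal values of a piecewise linear function to a refined mesh.

    coarse_values: values at the coarse nodes (the first nodes of the fine mesh).
    node_parents:  one entry per fine-mesh node; entry i is a pair (j1, j2) of
                   0-based parent indices, or None for a node with no parents
                   (hypothetical convention chosen for this sketch).
    """
    fine_values = list(coarse_values)
    for i in range(len(coarse_values), len(node_parents)):
        j1, j2 = node_parents[i]
        # A linear function on an edge takes the average of the endpoint
        # values at the midpoint of that edge.
        fine_values.append(0.5 * (fine_values[j1] + fine_values[j2]))
    return fine_values


# One triangle with vertices (0,0), (1,0), (0,1) and u(x, y) = 2x + 3y + 1.
coarse = [1.0, 3.0, 4.0]
parents = [None, None, None, (0, 1), (1, 2), (0, 2)]
fine = interpolate_to_fine_mesh(coarse, parents)
```

In this example the three new nodes are (0.5, 0), (0.5, 0.5), and (0, 0.5), so the appended values agree with u evaluated at those points.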

## 6.3.2 Generating a mesh from a triangle-node list

Meshes can be generated by many different methods; indeed, mesh generation itself is an active area of research. Any mesh generation tool (that creates a triangulation) will, at the least, produce a triangle-node description of the mesh. For example, MATLAB itself provides a routine called delaunay that takes a list of nodes and produces the Delaunay triangulation, defining the triangles precisely by the triangle-node list. The Delaunay triangulation has the property that the circumscribed circle around each triangle does not enclose any nodes except those of the given triangle. This gives one way to create a mesh: Define a list of nodes on the domain and then compute the Delaunay triangulation for those nodes.

A simple MATLAB mesh generator, distmesh2d, is due to Persson and Strang [33].⁹ It is based on two user-defined functions. The first function $d$ defines the domain by giving the signed distance from any point $(x, y)$ to the boundary (negative if $(x, y)$ is inside the domain), so that the boundary is defined by $d(x, y) = 0$. The second function $h$ defines the desired triangle diameters: The relative diameter of a triangle containing the point $(x, y)$ should be about $h(x, y)$ (and thus $h(x, y) = 1$ would give an approximately uniform mesh). The distmesh2d routine also describes the resulting mesh by the triangle-node list.

To allow a broader use of the code that accompanies this book, I wrote a routine, MakeMesh1, that takes a mesh described by a triangle-node list and generates the data structure described in this chapter. The following example illustrates some possibilities.

EXAMPLE 6.1. Suppose an (approximate) triangulation of the unit circle is needed. Here I present the results of three different approaches.

⁹Currently available at http://math.mit.edu/~persson/mesh.

Figure 6.3. A very coarse mesh on the unit circle (left) and the result of refining it three times (right).

Figure 6.4. A set of nodes on the unit circle (left) and its Delaunay triangulation (right).

In the first method, I define a (very) coarse mesh consisting of four triangles; this is shown in Figure 6.3 (left). This mesh is then refined three times, resulting in the mesh shown on the right in Figure 6.3. This construction is explained in detail in Example 6.6 below. Another way to proceed is to define nodes uniformly spaced on the boundary, together with those nodes on a regular rectangular grid that lie in the interior of the circle (but not too close to the boundary). Such a set of nodes is shown in Figure 6.4 (left). The delaunay routine in MATLAB then computes the Delaunay triangulation, which is also shown in Figure 6.4 (right). Finally, the distmesh2d routine described above, when asked for a uniform triangulation, yields the mesh shown in Figure 6.5. The three meshes generated in this example have 256, 263, and 251 triangles, respectively.
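The empty-circle property that characterizes the Delaunay triangulation can be checked directly for a small triangle-node list. The following Python sketch is an illustration only (it is not how the MATLAB delaunay routine works); the function names, the tolerance, and the 0-based node indices are assumptions made for the example.

```python
def circumcircle(p1, p2, p3):
    """Return the center and squared radius of the circle through three points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return (cx, cy), (x1 - cx) ** 2 + (y1 - cy) ** 2


def is_delaunay(nodes, triangles, tol=1e-12):
    """True if no node lies strictly inside the circumcircle of any triangle."""
    for tri in triangles:
        (cx, cy), r2 = circumcircle(*(nodes[i] for i in tri))
        for k, (x, y) in enumerate(nodes):
            if k in tri:
                continue  # the triangle's own vertices lie on the circle
            if (x - cx) ** 2 + (y - cy) ** 2 < r2 - tol:
                return False
    return True


# Four nodes admit two triangulations; only one has the empty-circle property.
nodes = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0), (1.0, -1.0)]
good = is_delaunay(nodes, [(0, 1, 2), (0, 3, 1)])
bad = is_delaunay(nodes, [(0, 3, 2), (3, 1, 2)])
```

The two triangle lists differ only in which diagonal of the quadrilateral is used; the check accepts the first and rejects the second.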

Figure 6.5. A triangulation of the unit circle generated by distmesh2d.

Figure 6.6. An intentionally bad initial mesh on the unit circle (left) and the result of refining it four times (right).

## 6.3.3 Assessing the quality of a triangulation

Not every way of generating a triangulation produces equally good results. For example, Figure 6.6 shows an initial mesh for the unit circle that consists of a single triangle, and the result of refining it four times; the final mesh has 256 triangles. A close-up of part of the mesh is shown in Figure 6.7, which shows that some of the triangles are quite distorted. This suggests that this mesh may not be satisfactory in the finite element method.

Figure 6.7. A close-up of the final mesh from Figure 6.6.

EXAMPLE 6.2. The effect of the mesh on the finite element method will be illustrated on a BVP with data $f$ and $g$, where $\Omega$ is (bounded by) the unit circle and $f$ and $g$ are chosen so that the exact solution is $u(x, y) = \sin(x)\,y^2$. The solution was computed on the meshes $\mathcal{T}_1$, $\mathcal{T}_2$, $\mathcal{T}_3$, $\mathcal{T}_4$ shown in Figures 6.3, 6.4, 6.5, and 6.6, respectively, yielding the following results:

| Mesh | Error |
|---|---|
| $\mathcal{T}_1$ | 0.154 |
| $\mathcal{T}_2$ | 0.152 |
| $\mathcal{T}_3$ | 0.132 |
| $\mathcal{T}_4$ | 0.170 |

These results suggest that $\mathcal{T}_3$ is the best of the four meshes, while $\mathcal{T}_4$ is the worst.

The a priori error bounds presented in Chapter 5 all contain unknown constants, which, however, are known to depend on $\rho$, the lower bound on $d_T/\operatorname{diam}(T)$ for $T \in \mathcal{T}_h$ (cf. (5.1) and (5.2)). The smaller the value of $\rho$, that is, the further the worst triangles are from equilateral, the larger the bound on the error and, presumably, the larger the error itself. Thus the quality of a mesh can be measured by computing

$$q_1(T) = \sqrt{3}\,\frac{d_T}{\operatorname{diam}(T)}$$

for each $T \in \mathcal{T}_h$. The factor of $\sqrt{3}$ is included so that an equilateral triangle (the ideal) has a quality measure of 1. The computation of $d_T$ was discussed in Exercise 5.6.2.

A related measure of the quality of a triangle is two times the ratio of the radius of the inscribed circle to the radius of the circumscribed circle. The factor of 2 is included so that an equilateral triangle has a quality measure of 1 in this case also. If the lengths of the three sides of $T$ are $b_1, b_2, b_3$, then

$$q_2(T) = \frac{2\,r_{\mathrm{insc}}}{r_{\mathrm{circ}}} = \frac{(b_2 + b_3 - b_1)(b_3 + b_1 - b_2)(b_1 + b_2 - b_3)}{b_1 b_2 b_3},$$

where $r_{\mathrm{insc}}$ is the radius of the inscribed circle and $r_{\mathrm{circ}}$ is the radius of the circumscribed circle (see Exercise 2).

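A third quality measure for a triangle, $q_3(T)$, is the measure of the smallest angle, divided by $\pi/3$ (once again, so that an equilateral triangle will have quality 1).

EXAMPLE 6.3. To illustrate the above quality measures, the quality measures $q_1, q_2, q_3$ were computed for the meshes $\mathcal{T}_1, \mathcal{T}_2, \mathcal{T}_3, \mathcal{T}_4$ from Example 6.2. The minimum and mean quality,

$$\min_{T \in \mathcal{T}_h} q_i(T), \qquad \frac{1}{N_t}\sum_{T \in \mathcal{T}_h} q_i(T), \qquad i = 1, 2, 3,$$

are given in the following table:

| Mesh | min $q_1$ | mean $q_1$ | min $q_2$ | mean $q_2$ | min $q_3$ | mean $q_3$ |
|---|---|---|---|---|---|---|
| $\mathcal{T}_1$ | 0.701 | 0.8944 | 0.806 | 0.916 | 0.568 | 0.792 |
| $\mathcal{T}_2$ | 0.588 | 0.744 | 0.658 | 0.848 | 0.572 | 0.752 |
| $\mathcal{T}_3$ | 0.620 | 0.917 | 0.702 | 0.971 | 0.637 | 0.889 |
| $\mathcal{T}_4$ | 0.0428 | 0.604 | 0.00572 | 0.656 | 0.0341 | 0.571 |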

Mesh $\mathcal{T}_3$ clearly has the best average triangle quality, while $\mathcal{T}_4$ is the worst from the standpoint of both average and minimum triangle quality. Comparing these results with the errors in the finite element solutions in Example 6.2 suggests that average triangle quality is the best indicator of the quality of the mesh.

The MATLAB implementation includes a MeshQuality1 routine that computes $q_1$, $q_2$, or $q_3$ for the triangles in a given mesh. MATLAB has a help command for getting details about built-in and user-defined commands; one can type, for example, "help MeshQuality1" at the MATLAB prompt to get details about the MeshQuality1 command. The help command can be used to get the precise usage and other details about all of the MATLAB routines that accompany this book.
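The three quality measures can be computed from the vertex coordinates alone. The sketch below is written in Python for illustration and is not the MeshQuality1 routine; the function names and argument conventions are hypothetical. It assumes that $d_T$ is the diameter of the inscribed circle (which is consistent with the factor $\sqrt{3}$ giving quality 1 for an equilateral triangle) and that $\operatorname{diam}(T)$ is the length of the longest side.

```python
import math


def triangle_quality(p1, p2, p3):
    """Return (q1, q2, q3) for the triangle with vertices p1, p2, p3."""
    b1 = math.hypot(p3[0] - p2[0], p3[1] - p2[1])
    b2 = math.hypot(p1[0] - p3[0], p1[1] - p3[1])
    b3 = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    s = (b1 + b2 + b3) / 2.0
    area = math.sqrt(max(s * (s - b1) * (s - b2) * (s - b3), 0.0))  # Heron's formula
    d_T = 2.0 * area / s  # diameter of the inscribed circle (assumed meaning of d_T)
    q1 = math.sqrt(3.0) * d_T / max(b1, b2, b3)
    q2 = (b2 + b3 - b1) * (b3 + b1 - b2) * (b1 + b2 - b3) / (b1 * b2 * b3)

    def angle(opposite, u, v):
        # Law of cosines; the clamp guards against round-off outside [-1, 1].
        c = (u * u + v * v - opposite * opposite) / (2.0 * u * v)
        return math.acos(max(-1.0, min(1.0, c)))

    smallest = min(angle(b1, b2, b3), angle(b2, b3, b1), angle(b3, b1, b2))
    q3 = smallest / (math.pi / 3.0)
    return q1, q2, q3


def mesh_quality(nodes, triangles, which):
    """Return (minimum, mean) of q_which over a triangle-node list (0-based)."""
    values = [triangle_quality(*(nodes[i] for i in tri))[which - 1] for tri in triangles]
    return min(values), sum(values) / len(values)


equilateral = triangle_quality((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0))
right_isosceles = triangle_quality((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```

For the equilateral triangle all three measures equal 1 up to round-off, while the right isosceles triangle has $q_3 = 0.75$ because its smallest angle is $\pi/4$.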

## 6.3.4 Viewing a mesh

To visualize a mesh stored in the data structure described above, I wrote a routine called ShowMesh1, which takes as input a mesh and (optionally) an array of flags indicating how the mesh is to be labeled. The nodes, edges, and triangles can be labeled in the figure. The following examples show how to define and plot meshes.

EXAMPLE 6.4. Let $\Omega$ be the region bounded by the triangle with vertices (0, 0), (1, 0), and (0, 1), and assume that Dirichlet conditions are to be imposed on the left edge (joining (0, 0) and (0, 1)) and Neumann conditions elsewhere. The following quantities define a triangulation of $\Omega$ consisting of a single triangle:

Figure 6.8. A coarse mesh and three successive refinements (see Example 6.4). (The nodes and triangles are labeled in (a) and (b) but not in the others.)

This coarse mesh was refined three times by repeated application of the command T = Refine1(T). Figure 6.8 shows the original mesh and the three successive refinements. Plot (a) shows results from the command ShowMesh1(T,[1,0,1,0]). The first three flags accepted by ShowMesh1 determine whether the nodes, edges, and/or triangles are enumerated, and the fourth flag determines whether the nodes are unmarked (flag = 0), marked with a dot (flag = 1), or marked with a circle for a free node and a star for a constrained node (flag = 2). Thus, ShowMesh1(T,[1,0,1,0]) means that the nodes and triangles are enumerated. Plot (b) in Figure 6.8 was produced by the same command, while (c) and (d) were produced by ShowMesh1(T), which is equivalent to ShowMesh1(T,[0,0,0,0]).

EXAMPLE 6.5. Let $\Omega$ be the region bounded by the pentagon with vertices (0, 0), (1, 0), (1.5, 0.5), (0.5, 1.5), and (0, 1), and assume that Dirichlet conditions are to be imposed on all the edges except the edge $e$ with endpoints (1.5, 0.5) and (0.5, 1.5), on which Neumann conditions are imposed. In order that the triangles have the same area (which is not a requirement), I put nodes at (0.5, 0.5) and (1, 1), in addition to the five vertices. My initial triangulation of $\Omega$ has seven nodes and six triangles:

Next I refine this coarse mesh three times with the command T = Refine1(T). Figure 6.9 shows the original mesh and the three successive refinements. The edges are enumerated in the coarsest mesh, and the free and constrained nodes are distinguished in the second mesh. The commands to plot the first two meshes are ShowMesh1(T,[0,1,0,0]) and ShowMesh1(T,[0,0,0,2]), respectively.

Figure 6.9. A coarse mesh and three successive refinements. The edges are labeled in the coarsest mesh. In the second mesh, the free nodes are indicated by a small circle and the constrained nodes by a small star.

## 6.3.5 Handling a domain with a curved boundary

I briefly described above how a mesh approximating a domain with a curved boundary can be refined. The mesh data structure includes an array EdgeCFlags, which indicates whether an edge is approximating a piece of a curved boundary or not. If it is, then during refinement the corresponding arc of the curve, rather than the edge itself, can be bisected to generate the new "midpoint." The edge is still replaced with two (straight) edges, but these edges now follow the boundary. The process is illustrated in Figure 6.10.

If any entry of EdgeCFlags is nonzero, a "boundary function" must be included as the field BndyFcn in the mesh data structure. This function is called whenever an edge approximating an arc of a curved boundary is to be refined; the endpoints of the edge are passed in, and the boundary function must return the midpoint of the corresponding arc of the boundary curve.

EXAMPLE 6.6. Let $\Omega$ be the unit circle, and assume that Dirichlet conditions are to be imposed everywhere on the boundary. Using the strategy described above, I can begin with a coarse mesh (consisting of only four triangles, a poor approximation to $\Omega$) and obtain a good approximate mesh after a few refinements. Here is the initial mesh, which is shown in Figure 6.11:

Since $\Omega$ has a curved boundary, T.BndyFcn is set to 'Circlef'; Circlef.m defines a boundary function, of the type described above, for circles centered at the origin. I refine this coarse mesh three times. Figure 6.11 shows the original mesh and the three successive refinements.
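To make the role of a boundary function concrete, here is a minimal Python sketch of one for circles centered at the origin. It is not the Circlef.m file that accompanies the book, whose calling sequence is not reproduced here; the function name and argument format are assumptions made for the example. The idea is simply to project the midpoint of the chord radially onto the circle, which gives the midpoint of the shorter arc joining the two endpoints.

```python
import math


def circle_arc_midpoint(p1, p2):
    """Midpoint of the shorter arc joining p1 and p2 on a circle centered at the origin.

    The radius is inferred from p1; both endpoints are assumed to lie on the circle.
    """
    radius = math.hypot(p1[0], p1[1])
    mx, my = 0.5 * (p1[0] + p2[0]), 0.5 * (p1[1] + p2[1])  # midpoint of the chord
    length = math.hypot(mx, my)
    if length == 0.0:
        raise ValueError("antipodal endpoints: the arc midpoint is ambiguous")
    # Scale the chord midpoint so that it lies on the circle.
    return (radius * mx / length, radius * my / length)


new_node = circle_arc_midpoint((1.0, 0.0), (0.0, 1.0))
```

Bisecting the edge itself would give the node (0.5, 0.5), which lies inside the unit circle; the projected node lies on the boundary, so the refined mesh follows the curve more closely.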

## 6.3.6 Viewing a piecewise linear function

When working with the finite element code presented in this book, it is helpful to be able to graph a continuous piecewise linear function defined on a triangular mesh. I have provided a routine, ShowPWLinFcn1, for this purpose. In order to use ShowPWLinFcn1, it is necessary to specify the nodal values of the desired function at all nodes in the mesh. The calling sequence is

ShowPWLinFcn1(T,u,g)

The input T is the mesh data structure, u is a vector containing the nodal values corresponding to the free nodes, and g is a vector containing the nodal values corresponding to the constrained nodes. Thus u has $N_f$ components and g has $N_c$ components. If the values at the constrained nodes are all zero, or if every node is free, then it is not necessary to provide the vector g.

Figure 6.11. A coarse mesh and three successive refinements.

EXAMPLE 6.7. Consider the function $u(x, y) = x^2 + y^2$ defined on the unit disk. If T is the third mesh from Figure 6.11, then the following MATLAB commands evaluate the function $u$ at the nodes of the mesh and plot the resulting piecewise linear function.

The first line uses the built-in MATLAB functions inline and vectorize to define u as a function of two variables, while the second line evaluates u at the nodes of the mesh T, producing the free and constrained nodal values. The graph produced by the third line is shown in Figure 6.12.
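The splitting of nodal values into a free part and a constrained part can be illustrated with a short Python sketch. It is not the MATLAB code used in the example; the function name is hypothetical, the node indices are 0-based (the MATLAB arrays are 1-based), and the pointer lists play the role of FNodePtrs and CNodePtrs.

```python
def nodal_values(nodes, fnode_ptrs, cnode_ptrs, func):
    """Evaluate func at the free and at the constrained nodes of a mesh.

    nodes:      list of (x, y) coordinates.
    fnode_ptrs: indices of the free nodes, in free-node order.
    cnode_ptrs: indices of the constrained nodes, in constrained-node order.
    """
    u = [func(*nodes[i]) for i in fnode_ptrs]
    g = [func(*nodes[i]) for i in cnode_ptrs]
    return u, g


# A four-triangle mesh of the unit circle: the center is the only free node,
# and the four boundary nodes are constrained (Dirichlet conditions).
nodes = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
u, g = nodal_values(nodes, [0], [1, 2, 3, 4], lambda x, y: x**2 + y**2)
```

Here u has one component and g has four, matching $N_f = 1$ and $N_c = 4$ for this coarse mesh.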

Figure 6.12. Continuous piecewise linear function graphed using ShowPWLinFcn1 (see Example 6.7).

## 6.3.7 MATLAB functions

Functions that deal exclusively with meshes of linear triangles have a name ending with "1." The reader should recall that the MATLAB command help fcnname will display information about the function named fcnname, which is implemented in the file fcnname.m.

- Mesh1: Type help Mesh1 to see a description of the mesh data structure.
- Generating sample meshes:
  - RectangleMeshD1: Generates a uniform mesh on a rectangle $[0, \ell_x] \times [0, \ell_y]$ (Dirichlet conditions).
  - RectangleMeshN1: Generates a uniform mesh on a rectangle $[0, \ell_x] \times [0, \ell_y]$ (Neumann conditions).
  - RectangleMeshTopD1: Generates a uniform mesh on a rectangle $[0, \ell_x] \times [0, \ell_y]$ (Dirichlet conditions on the top edge, Neumann conditions elsewhere). Similar meshes are RectangleMeshLeftN1, RectangleMeshTopLeftD1.
  - CoarseCircleMeshD1, CoarseCircleMeshN1: Generate a coarse mesh (four triangles) on the unit circle (Dirichlet, Neumann conditions, respectively).
  - CoarseSemiCircleMeshD1, CoarseSemiCircleMeshBottomD1: Generate a coarse mesh (two triangles) on the unit semicircle (Dirichlet, mixed conditions, respectively).
  - CoarseEllipseMeshD1, CoarseEllipseMeshN1: Generate a coarse mesh (four triangles) on the ellipse $x^2 + y^2/4 = 1$ (Dirichlet, Neumann conditions, respectively).
  - NGonMeshD1: Generates a coarse mesh on a regular $n$-gon of area one (Dirichlet conditions). Suitable for $n$ not too large (if $n$ is large, say much bigger than 10, the mesh quality is poor).

  Some of the Rectangle routines above have a "1a" version, which generates a mesh according to the pattern on the right in Figure 6.13, instead of the standard pattern shown on the left (see Exercise 7.6.11).
- Refine1: Standard refinement of a given mesh.
- MakeMesh1: Assembles the mesh data structure from a triangle-node list.
- MakeEdgesCurved1: Labels the boundary edges of a given mesh as curved.
- NeumannMesh: Converts the boundary conditions on a mesh to pure Neumann.
- ShowMesh1: Graphs a mesh.
- MeshQuality1: Computes the measures of mesh quality discussed in Section 6.3.3.
- JiggleMesh1: Improves mesh quality by moving the interior nodes.¹⁰
- ShowSupport1: Graphs a mesh and shades the support of a given basis function.
- ShowPWLinFcn1: Graphs a continuous piecewise linear function on a given mesh.
- ShowPWConstFcn: Graphs a piecewise constant function on a given mesh.

Figure 6.13. The standard pattern of triangles for a uniform mesh (left) and an alternate pattern (right).

## 6.3.8 A summary of the notation

I have now introduced a large amount of notation, all of which is necessary to describe the mesh precisely. For convenient reference, I summarize it here. Where appropriate, I also indicate where in the mesh data structure the corresponding information is to be found.
- Number of triangles: $N_t$
- Number of edges: $N_e$
- Number of nodes: $N_v$
- Number of free nodes: $N_f$
- Number of constrained nodes: $N_c$
- Number of free boundary edges: $N_b$
- Triangles: $T_1, T_2, \ldots, T_{N_t}$
- Edges: $e_1, e_2, \ldots, e_{N_e}$
- Edges belonging to triangle $T_i$: $e_{s_{i,1}}, e_{s_{i,2}}, e_{s_{i,3}}$ (Elements)
- Triangles to which edge $e_i$ belongs: $T_{t_{i,1}}, T_{t_{i,2}}$ ($t_{i,2}$ is undefined if $e_i$ is a boundary edge) (EdgeEls)
- Nodes: $z_1, z_2, \ldots, z_{N_v}$, $z_j = (x_j, y_j)$ (Nodes)
- Nodes belonging to edge $e_i$: $z_{k_{i,1}}, z_{k_{i,2}}$ (Edges)
- Nodes belonging to triangle $T_i$: $z_{n_{i,1}}, z_{n_{i,2}}, z_{n_{i,3}}$
- Free nodes: $z_{f_1}, z_{f_2}, \ldots, z_{f_{N_f}}$ (FNodePtrs)
- Inverse of $j \mapsto f_j$: $i \mapsto p_i$ ($p_i = j \Leftrightarrow i = f_j$) (NodePtrs)
- Constrained nodes: $z_{c_1}, z_{c_2}, \ldots, z_{c_{N_c}}$ (CNodePtrs)
- Inverse of $j \mapsto c_j$: $i \mapsto q_i$ ($q_i = j \Leftrightarrow i = c_j$) (NodePtrs)
- Nodal basis functions: $\psi_1, \psi_2, \ldots, \psi_{N_v}$
- Nodal basis functions corresponding to free nodes: $\phi_1, \phi_2, \ldots, \phi_{N_f}$, $\phi_j = \psi_{f_j}$
- Free boundary edges: $e_{b_1}, e_{b_2}, \ldots, e_{b_{N_b}}$ (FBndyEdges)
- Inverse of $i \mapsto b_i$: $j \mapsto a_j$ ($a_j = i \Leftrightarrow j = b_i$) (EdgeEls)

¹⁰JiggleMesh1 was adapted from a similar routine in the MATLAB PDE Toolbox.

## 6.4 Exercises for Chapter 6

1. (a) Consider a family of uniform triangulations on the unit square (such as the mesh shown in Figure 4.5). Give asymptotic expressions for $N_t$ and $N_e$ (the numbers of triangles and edges, respectively) in terms of $N_v$ (the number of nodes). These should be of the form

   $$N_t \doteq c_t N_v, \qquad N_e \doteq c_e N_v. \tag{6.2}$$

   (b) Now consider a family of meshes obtained by standard refinement from an initial mesh. Do (6.2) still hold for the same constants $c_t$, $c_e$? Why or why not?

   (c) Using the results from above and assuming integers require two bytes of storage and double precision numbers require eight bytes of storage, estimate the amount of storage used by the mesh data structure in terms of $N_v$, the number of nodes.

   (d) Consider a family of meshes as described in part 1(b). About how many nonzeros will the stiffness matrix K have per row? In total?

   (e) A typical (double precision) sparse matrix storage scheme will require about 12 bytes of storage per nonzero. (It could be a little less; MATLAB uses a little more.) In terms of $N_v$, about how much storage does the stiffness matrix K require?

2. Let $T$ be a triangle with side lengths $b_1, b_2, b_3$, and let $r_{\mathrm{insc}}$ and $r_{\mathrm{circ}}$ be the radii of the inscribed and circumscribed circles, respectively. Show that

   $$\frac{2\,r_{\mathrm{insc}}}{r_{\mathrm{circ}}} = \frac{(b_2 + b_3 - b_1)(b_3 + b_1 - b_2)(b_1 + b_2 - b_3)}{b_1 b_2 b_3}.$$

3. (MATLAB) Suppose $\Omega$ is the annulus (that is, the region bounded by two concentric circles) centered at the origin, with inner radius $r = 0.5$ and outer radius $R = 1$. Using any of the tools mentioned in Section 6.3 (Refine1, delaunay, distmesh2d), create a triangulation of $\Omega$ having a good mesh quality.

4. (MATLAB) Repeat the previous exercise when $\Omega$ is the region bounded by the parabola $y = x^2$ and the line $y = 1$.

5. (MATLAB) Repeat the previous exercise when $\Omega$ is the region bounded by the ellipse

## Chapter 7. Programming the finite element method: Linear Lagrange triangles

This chapter discusses programs for solving model problem (6.1) using piecewise linear finite elements and the mesh data structure introduced in the previous chapter. Some details about quadrature and the evaluation of basis functions and their gradients are presented in Section 7.1. Then the assembly of the stiffness matrix and load vector are explained in Sections 7.2 and 7.3, respectively. Section 7.4 presents various examples of (6.1), solved using the MATLAB version of these algorithms.

7.1

There is one more technical matter to discuss before I can explain in detail the algorithms for assembling the stiffness matrix and load vector, namely, the computation of the integrals that define these quantities. As I discussed in Section 5.5, integrals such as

are usually estimated by quadrature rather than computed exactly. Section 5.5 presented the necessary theory: If the quadrature rule is exact for polynomials of degree $2d - 2$, then the convergence rate for Lagrange triangles of degree $d$ will be just as good as if the integrals were computed exactly.

7.1.1

One-dimensional Gaussian quadrature

I begin by describing Gaussian quadrature for one-dimensional integrals. This will simplify the introduction, and these rules will also be useful for the (one-dimensional) boundary integrals arising in a problem with inhomogeneous Neumann conditions. Moreover, quadrature rules over squares, useful for quadrilateral elements, are based directly on one-dimensional Gaussian quadrature rules.

It is standard to define Gaussian quadrature rules for the reference interval $[-1, 1]$; that is, the rules are developed for integrals of the form

$$\int_{-1}^{1} f(t)\,dt.$$

General integrals of the form

$$\int_a^b f(x)\,dx$$

are then handled by a linear change of variables, which is described below.

The best-known quadrature rules are probably the trapezoidal rule and Simpson's rule. The simple trapezoidal rule is based on linear interpolation: The integrand $f$ is approximated by the linear function $\ell$ agreeing with $f$ at the endpoints $-1$ and $1$ of the interval of integration. The result is¹¹

$$\int_{-1}^{1} f(t)\,dt \doteq f(-1) + f(1).$$

Since the trapezoidal rule is based on linear interpolation, it is obviously exact for every polynomial $f$ of degree one or less ($\ell = f$ for such an $f$). Simpson's rule is based on quadratic interpolation. The integrand $f$ is approximated by the quadratic polynomial $q$ that interpolates $f$ at $-1$, $0$, and $1$. The result is¹²

$$\int_{-1}^{1} f(t)\,dt \doteq \frac{1}{3}f(-1) + \frac{4}{3}f(0) + \frac{1}{3}f(1).$$

Since Simpson's rule is based on quadratic interpolation, it is exact for any polynomial $f$ of degree 2 or less. In fact, though, more is true: Simpson's rule is actually exact for polynomials of degree up to 3. In short, Simpson's rule has degree of precision 3 (while the trapezoidal rule has degree of precision 1). This extra degree of accuracy, together with the simplicity of Simpson's rule, makes it a popular quadrature rule.
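The degrees of precision quoted above can be confirmed by testing a rule on the monomials $1, t, t^2, \ldots$ in exact rational arithmetic. The following Python sketch is an illustration only; the function names are hypothetical, and the rules are written with nodes and weights on $[-1, 1]$ as in the formulas above.

```python
from fractions import Fraction


def exact_monomial_integral(k):
    """Integral of t**k over [-1, 1]: 2/(k+1) for even k, 0 for odd k."""
    return Fraction(2, k + 1) if k % 2 == 0 else Fraction(0)


def degree_of_precision(nodes, weights, max_degree=10):
    """Largest d such that the rule integrates 1, t, ..., t**d exactly."""
    d = -1
    for k in range(max_degree + 1):
        approx = sum(w * t**k for t, w in zip(nodes, weights))
        if approx != exact_monomial_integral(k):
            break
        d = k
    return d


trapezoid = ([Fraction(-1), Fraction(1)], [Fraction(1), Fraction(1)])
simpson = ([Fraction(-1), Fraction(0), Fraction(1)],
           [Fraction(1, 3), Fraction(4, 3), Fraction(1, 3)])

trapezoid_precision = degree_of_precision(*trapezoid)
simpson_precision = degree_of_precision(*simpson)
```

Because integration and the quadrature rule are both linear in the integrand, exactness on the monomials up to degree $d$ is equivalent to exactness on all polynomials of degree at most $d$.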
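¹¹The reader has probably seen the composite trapezoidal rule applied to

$$\int_a^b f(x)\,dx,$$

which is derived by dividing the interval $[a, b]$ into $n$ subintervals $[x_{i-1}, x_i]$, $i = 1, 2, \ldots, n$, where $x_i = a + i(b - a)/n$, and applying the trapezoidal rule on each subinterval. Summing the results yields the familiar formula

$$\int_a^b f(x)\,dx \doteq \frac{b-a}{2n}\left(f(x_0) + 2f(x_1) + \cdots + 2f(x_{n-1}) + f(x_n)\right).$$

¹²Again, the reader is probably more familiar with the composite Simpson's rule, which is derived in the same way as the composite trapezoidal rule and takes the form

$$\int_a^b f(x)\,dx \doteq \frac{b-a}{3n}\left(f(x_0) + 4f(x_1) + 2f(x_2) + \cdots + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n)\right)$$

($n$ even).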

Having observed this extra degree of precision displayed by Simpson's rule, it is natural to pursue the following idea: to derive quadrature rules directly from the requirement that they be exact on polynomials of as high a degree as possible. The trapezoidal rule, Simpson's rule, and similar numerical integration formulas have the form

$$\int_{-1}^{1} f(t)\,dt \doteq \sum_{i=1}^{n} w_i f(t_i), \tag{7.1}$$

where $t_1, t_2, \ldots, t_n \in [-1, 1]$ are called the quadrature nodes and $w_1, w_2, \ldots, w_n$ are the corresponding weights. For the trapezoidal rule, the nodes are $t_1 = -1$, $t_2 = 1$ and the weights are $w_1 = 1$, $w_2 = 1$. The nodes for Simpson's rule are $t_1 = -1$, $t_2 = 0$, $t_3 = 1$ and the weights are $w_1 = 1/3$, $w_2 = 4/3$, $w_3 = 1/3$. A quadrature rule of the form (7.1) is called an $n$-point rule.

The general $n$-point quadrature rule depends on $2n$ parameters, namely, the $n$ nodes and the $n$ weights. It is reasonable to expect that the $2n$ degrees of freedom could be chosen so that the rule has degree of precision $2n - 1$, since a polynomial of degree $2n - 1$ is determined by $2n$ coefficients. To put it a different way, since both integration and the $n$-point rule are linear in the integrand, a quadrature rule is exact for polynomials of degree at most $d$ if and only if it is exact on the $d + 1$ monomials $1, t, t^2, \ldots, t^d$. An $n$-point rule, having $2n$ parameters $t_1, t_2, \ldots, t_n, w_1, w_2, \ldots, w_n$, can be expected to satisfy $2n$ equations:

$$\sum_{i=1}^{n} w_i t_i^k = \int_{-1}^{1} t^k\,dt, \qquad k = 0, 1, \ldots, 2n - 1.$$

In fact, this is possible for all $n$, and the resulting quadrature rules are called the Gaussian quadrature rules. For example, the one-point Gaussian quadrature rule is

$$\int_{-1}^{1} f(t)\,dt \doteq 2f(0)$$

(the midpoint rule), and the two-point rule is

$$\int_{-1}^{1} f(t)\,dt \doteq f\left(-\frac{1}{\sqrt{3}}\right) + f\left(\frac{1}{\sqrt{3}}\right).$$

The one-point Gaussian quadrature rule is as accurate, from the point of view of integrating polynomials, as the two-point trapezoidal rule, and the two-point Gaussian quadrature rule is as accurate as the three-point Simpson's rule. The one- and two-point Gaussian quadrature rules are easily derived directly from the conditions given above, but for $n > 2$ this is not so easy. The existence and uniqueness of the general $n$-point Gaussian quadrature rule is proved using the theory of orthogonal polynomials, and the details are not relevant for this book.

The last matter that I wish to discuss concerning one-dimensional integrals is the question of transforming the general integral

$$\int_a^b f(x)\,dx$$

into the special form

$$\int_{-1}^{1} g(t)\,dt.$$

The linear change of variables $x = \frac{a+b}{2} + \frac{b-a}{2}t$ maps $[-1, 1]$ onto $[a, b]$. The result is

$$\int_a^b f(x)\,dx = \frac{b-a}{2}\int_{-1}^{1} f\left(\frac{a+b}{2} + \frac{b-a}{2}t\right)dt.$$

Applying the $n$-point quadrature rule (7.1) yields

$$\int_a^b f(x)\,dx \doteq \frac{b-a}{2}\sum_{i=1}^{n} w_i f(x_i),$$

where

$$x_i = \frac{a+b}{2} + \frac{b-a}{2}t_i, \qquad i = 1, 2, \ldots, n.$$

Thus, to apply the $n$-point rule, it is necessary to compute the quadrature nodes on $[a, b]$ from the nodes on the reference interval $[-1, 1]$ and also compute the Jacobian factor $(b - a)/2$, which is constant since the change of variables is linear.
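The combination of a Gaussian rule on $[-1, 1]$ with the linear change of variables can be sketched in a few lines of Python. This is an illustration rather than code from the book; the names GAUSS_RULES and gauss_quad are hypothetical, and only the one- and two-point rules are tabulated.

```python
import math

# Nodes and weights on the reference interval [-1, 1].
GAUSS_RULES = {
    1: ([0.0], [2.0]),
    2: ([-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0)], [1.0, 1.0]),
}


def gauss_quad(f, a, b, n=2):
    """Estimate the integral of f over [a, b] with the n-point Gaussian rule."""
    nodes, weights = GAUSS_RULES[n]
    half_length = (b - a) / 2.0  # the constant Jacobian factor
    midpoint = (a + b) / 2.0
    return half_length * sum(w * f(midpoint + half_length * t)
                             for t, w in zip(nodes, weights))


cubic = gauss_quad(lambda x: x**3, 0.0, 2.0, n=2)          # exact value is 4
linear = gauss_quad(lambda x: 3.0 * x + 1.0, 0.0, 2.0, n=1)  # exact value is 8
quartic = gauss_quad(lambda x: x**4, 0.0, 2.0, n=2)        # exact value is 32/5
```

The two-point rule reproduces the integral of the cubic up to round-off, as its degree of precision 3 predicts, but not the integral of the quartic.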

In keeping with the development of one-dimensional Gaussian quadrature in the previous section, I will begin by examining quadrature rules for integrals over the reference triangle (sometimes called the master element) $T_R$ having vertices (0, 0), (1, 0), and (0, 1). An $n$-point rule now takes the form

$$\int_{T_R} f(s, t)\,ds\,dt \doteq \sum_{i=1}^{n} w_i f(s_i, t_i),$$

and it is desired to choose the quadrature nodes $(s_1, t_1), (s_2, t_2), \ldots, (s_n, t_n)$ and the weights $w_1, w_2, \ldots, w_n$ so that the rule is exact for polynomials of as high a degree as possible. The monomials are $1, s, t, s^2, st, t^2, \ldots$, and the goal is to integrate as many of them exactly as is possible.

With a one-point rule, it ought to be easy to integrate constant functions (that is, the monomial 1) exactly. A one-point rule is defined by three parameters, $s_1$, $t_1$, and $w_1$, and a constant function imposes only one constraint. Indeed, with $f(s, t) = 1$,

$$\int_{T_R} f(s, t)\,ds\,dt = \frac{1}{2}$$

and

$$w_1 f(s_1, t_1) = w_1.$$

Therefore, $(s_1, t_1)$ can be any point in $T_R$ as long as the weight is correct: $w_1 = 1/2$. Any such rule has degree of precision 0. (But the reader should note that it is still necessary to evaluate the integrand at some point in $T_R$, since constant functions usually do not have value 1. In other words, one cannot use a zero-point rule.)

In order that a rule have degree of precision 1, it must satisfy three constraints: It must integrate $1$, $s$, and $t$ exactly. A one-point rule, having three parameters, should work here as well. The three conditions are

$$w_1 = \int_{T_R} 1 = \frac{1}{2}, \qquad w_1 s_1 = \int_{T_R} s = \frac{1}{6}, \qquad w_1 t_1 = \int_{T_R} t = \frac{1}{6},$$

and the unique solution is $s_1 = t_1 = 1/3$, $w_1 = 1/2$ (see Exercise 1). The point (1/3, 1/3) is the centroid of the reference triangle $T_R$.

To integrate all second-degree polynomials exactly implies satisfying six constraints (namely, that $1$, $s$, $t$, $s^2$, $st$, and $t^2$ are integrated exactly), and it would appear that a two-point rule, having six degrees of freedom, would suffice. However, it is here that the situation begins to break down, for, in fact, there is no two-point rule with degree of precision 2. The reader can write down the six (nonlinear) equations and, with a little effort, show that there is no solution (see Exercise 2). A three-point rule is required to integrate all second-degree polynomials exactly, but there is no uniqueness. In Table 7.1, I give two three-point rules for the reference triangle $T_R$, with degree of precision 2. Rules with higher precision are presented in Chapter 8 in the context of Lagrange triangles of degree greater than 1.

The one-point rule suffices for linear Lagrange triangles, at least for computing the stiffness matrix and load vector. In Exercise 16, the reader is asked to write a program to compute the mass matrix, which was introduced in Exercise 4.8.17:

$$M_{ij} = \int_{\Omega} \phi_j \phi_i.$$

Since the basis functions $\phi_i$ and $\phi_j$, restricted to each triangle, are linear, the integrand is at least quadratic and a rule having degree of precision 2 is required.
| | $(s_1, t_1)$ | $(s_2, t_2)$ | $(s_3, t_3)$ | $w_1$ | $w_2$ | $w_3$ |
|---|---|---|---|---|---|---|
| Rule 1 | (1/6, 1/6) | (2/3, 1/6) | (1/6, 2/3) | 1/6 | 1/6 | 1/6 |
| Rule 2 | (1/2, 0) | (1/2, 1/2) | (0, 1/2) | 1/6 | 1/6 | 1/6 |

Table 7.1. Two three-point quadrature rules for TR, each with degree of precision 2.
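The claim that both rules in Table 7.1 have degree of precision 2 can be checked in exact arithmetic, using the standard formula $\int_{T_R} s^p t^q \,ds\,dt = p!\,q!/(p+q+2)!$ for the integral of a monomial over the reference triangle. The Python sketch below is an illustration; the names are hypothetical and the formula is a known fact that is not derived in the text.

```python
from fractions import Fraction as F
from math import factorial


def exact_monomial_integral(p, q):
    """Integral of s**p * t**q over the reference triangle T_R."""
    return F(factorial(p) * factorial(q), factorial(p + q + 2))


RULE_1 = ([(F(1, 6), F(1, 6)), (F(2, 3), F(1, 6)), (F(1, 6), F(2, 3))], [F(1, 6)] * 3)
RULE_2 = ([(F(1, 2), F(0)), (F(1, 2), F(1, 2)), (F(0), F(1, 2))], [F(1, 6)] * 3)


def is_exact_for_degree(rule, degree):
    """True if the rule integrates every monomial of exactly this total degree."""
    nodes, weights = rule
    for p in range(degree + 1):
        q = degree - p
        approx = sum(w * s**p * t**q for (s, t), w in zip(nodes, weights))
        if approx != exact_monomial_integral(p, q):
            return False
    return True


rule_1_checks = [is_exact_for_degree(RULE_1, d) for d in range(4)]
rule_2_checks = [is_exact_for_degree(RULE_2, d) for d in range(4)]
```

Both lists come out as exact for degrees 0, 1, and 2 and inexact for degree 3, so neither rule has degree of precision higher than 2.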

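Integrating over general triangles

Now I consider an arbitrary triangle $T$ with vertices $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ and show how to estimate

$$\int_T f(x, y)\,dx\,dy$$

by a change of variables. The reference triangle $T_R$ is mapped to $T$ by the following transformation (cf. Section 4.6), which sends (0, 0) to $(x_1, y_1)$, (1, 0) to $(x_2, y_2)$, and (0, 1) to $(x_3, y_3)$:

$$(s, t) \in T_R \mapsto (x, y) \in T, \qquad x = x_1 + (x_2 - x_1)s + (x_3 - x_1)t, \quad y = y_1 + (y_2 - y_1)s + (y_3 - y_1)t.$$

In vector form, the transformation is

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} x_2 - x_1 & x_3 - x_1 \\ y_2 - y_1 & y_3 - y_1 \end{bmatrix} \begin{bmatrix} s \\ t \end{bmatrix},$$

or $z = z_1 + Ju$. The standard formula for a change of variables in a multiple integral gives

$$\int_T f(x, y)\,dx\,dy = \int_{T_R} g(s, t)\,|\det(J)|\,ds\,dt,$$

where $g$ is defined by

$$g(s, t) = f\bigl(x_1 + (x_2 - x_1)s + (x_3 - x_1)t,\; y_1 + (y_2 - y_1)s + (y_3 - y_1)t\bigr).$$

The Jacobian factor is constant:

$$|\det(J)| = \bigl|(x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1)\bigr|.$$

Applying the one-point rule

$$\int_{T_R} g(s, t)\,ds\,dt \doteq \frac{1}{2}\,g\left(\frac{1}{3}, \frac{1}{3}\right)$$

to the transformed integral, the result is

$$\int_T f(x, y)\,dx\,dy \doteq \frac{|\det(J)|}{2}\,f(\bar{x}, \bar{y}),$$

where

$$\bar{x} = x_1 + \frac{1}{3}(x_2 - x_1) + \frac{1}{3}(x_3 - x_1), \qquad \bar{y} = y_1 + \frac{1}{3}(y_2 - y_1) + \frac{1}{3}(y_3 - y_1).$$

The reader should notice that the quadrature node on $T$ is the centroid of $T$, just as (1/3, 1/3) is the centroid of $T_R$:

$$\bar{x} = \frac{x_1 + x_2 + x_3}{3}, \qquad \bar{y} = \frac{y_1 + y_2 + y_3}{3}.$$

Moreover, $|\det(J)|/2$ is the area of triangle $T$ (see Exercise 3).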

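A similar result holds if a three-point rule is used to estimate the transformed integral

$$\int_{T_R} g(s, t)\,|\det(J)|\,ds\,dt.$$

Using Rule 1 from Table 7.1,

$$\int_T f(x, y)\,dx\,dy \doteq \frac{|\det(J)|}{6}\bigl(f(\tilde{x}_1, \tilde{y}_1) + f(\tilde{x}_2, \tilde{y}_2) + f(\tilde{x}_3, \tilde{y}_3)\bigr),$$

where

$$(\tilde{x}_1, \tilde{y}_1) = \tfrac{2}{3}(x_1, y_1) + \tfrac{1}{6}(x_2, y_2) + \tfrac{1}{6}(x_3, y_3), \tag{7.3a}$$

$$(\tilde{x}_2, \tilde{y}_2) = \tfrac{1}{6}(x_1, y_1) + \tfrac{2}{3}(x_2, y_2) + \tfrac{1}{6}(x_3, y_3), \tag{7.3b}$$

$$(\tilde{x}_3, \tilde{y}_3) = \tfrac{1}{6}(x_1, y_1) + \tfrac{1}{6}(x_2, y_2) + \tfrac{2}{3}(x_3, y_3). \tag{7.3c}$$

As the above examples suggest, because of the linearity of the transformation between $T_R$ and $T$, the quadrature nodes are conveniently expressed in terms of barycentric coordinates. If the vertices of $T$ are $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, then the barycentric coordinates $(\zeta_1, \zeta_2, \zeta_3)$ of $(x, y) \in T$ are defined by the conditions

$$x = \zeta_1 x_1 + \zeta_2 x_2 + \zeta_3 x_3, \qquad y = \zeta_1 y_1 + \zeta_2 y_2 + \zeta_3 y_3, \qquad \zeta_1 + \zeta_2 + \zeta_3 = 1.$$

For Rule 1 from Table 7.1, the barycentric coordinates for the three quadrature nodes on $T_R$ are (2/3, 1/6, 1/6), (1/6, 2/3, 1/6), and (1/6, 1/6, 2/3); the corresponding quadrature nodes (7.3a)-(7.3c) on $T$ have the same barycentric coordinates. For this reason, it is convenient to give quadrature rules using barycentric coordinates.

When a quadrature rule is expressed in barycentric coordinates, it is conventional to express it for an arbitrary triangle of area 1. Since $T_R$ has area 1/2, the reader will notice that the weights are doubled when expressed in barycentric coordinates as compared to the coordinates of $T_R$. Table 7.2 contains the same three-point rules as does Table 7.1, but expressed in terms of barycentric coordinates. To apply one of these rules to an arbitrary triangle $T$, use nodes on $T$ with the given barycentric coordinates and multiply the weights by the area of triangle $T$. The reader can verify that this procedure yields the results derived above.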

162 Chapter 7. Programming the finite element method: Linear Lagrange triangles

Table 7.2. Two three-point quadrature rules, each with degree of precision 2. The nodes are given in terms of barycentric coordinates for a triangle of area 1.

## 7.1.2 Evaluating the standard basis functions on a triangle

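In order to apply the quadrature rules described above, it is necessary to evaluate the basis functions and their gradients over each triangle. For linear Lagrange triangles, the standard basis functions are linear on each triangle, and, given any triangle $T_k$ in the mesh, there are only three basis functions that are nonzero on $T_k$. To simplify the notation, I will use local indices in this section and assume that $T_k$ has vertices $(x_1,y_1)$, $(x_2,y_2)$, and $(x_3,y_3)$. The corresponding nonzero basis functions will be denoted $\phi_1$, $\phi_2$, and $\phi_3$, respectively.¹³ The functions $\phi_1, \phi_2, \phi_3$ are defined by

$$\phi_i(x_j, y_j) = \begin{cases} 1, & i = j, \\ 0, & i \neq j, \end{cases}$$

and can be represented by

$$\phi_i(x,y) = a_i + b_i x + c_i y.$$

¹³ Using global indices, the vertices would be $(x_{n_{k,i}}, y_{n_{k,i}})$, $i = 1,2,3$.

The coefficients $a_i, b_i, c_i$ can be found by solving a $3 \times 3$ system of equations. For example, for $i = 1$ this system is

$$a_1 + b_1 x_1 + c_1 y_1 = 1, \qquad a_1 + b_1 x_2 + c_1 y_2 = 0, \qquad a_1 + b_1 x_3 + c_1 y_3 = 0,$$

or, in matrix-vector form,

$$\begin{bmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ 1 & x_3 & y_3 \end{bmatrix} \begin{bmatrix} a_1 \\ b_1 \\ c_1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}. \tag{7.4}$$

Each triple $(a_i, b_i, c_i)$ is determined by a system like (7.4), and in each of these three systems, the coefficient matrix is the same. The three systems can therefore be combined into the matrix equation

$$\begin{bmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ 1 & x_3 & y_3 \end{bmatrix} \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

which shows that

$$\begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}$$

is the inverse of

$$M = \begin{bmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ 1 & x_3 & y_3 \end{bmatrix}. \tag{7.6}$$

Therefore, the computation of a $3 \times 3$ matrix inverse suffices to give the coefficients of all three linear functions on $T_k$. When assembling the stiffness matrix, it is necessary to compute integrals of the form

$$\int_{T_k} \kappa \nabla\phi_i \cdot \nabla\phi_j.$$

Since each $\phi_i$ is linear over $T_k$, the gradients are constant:

$$\nabla\phi_i = \begin{bmatrix} b_i \\ c_i \end{bmatrix}.$$

Therefore,

$$\int_{T_k} \kappa \nabla\phi_i \cdot \nabla\phi_j = (\nabla\phi_i \cdot \nabla\phi_j) \int_{T_k} \kappa = (b_i b_j + c_i c_j) \int_{T_k} \kappa.$$

The integral of $\kappa$ can be estimated by the one-point rule:

$$\int_{T_k} \kappa \approx A\,\kappa(\bar{x}, \bar{y})$$

($A$ is the area of $T_k$ and $(\bar{x}, \bar{y})$ is the centroid of $T_k$). The calculation then proceeds like this:

1. Estimate $\int_{T_k} \kappa$.
2. Invert the $3 \times 3$ matrix $M$ defined by (7.6) and extract the gradients of the basis functions from $M^{-1}$.
3. Compute the dot products of the gradients: $\nabla\phi_i \cdot \nabla\phi_j$, $i, j = 1, 2, 3$.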


The entries in the element stiffness matrix are then available, and they can be added to the proper entries in the global stiffness matrix. Computing the integrals that contribute to the load vector is even easier, at least when the standard one-point quadrature rule

$$\int_{T_k} f \phi_i \approx A\, f(\bar{x}, \bar{y})\, \phi_i(\bar{x}, \bar{y})$$

is used. The computation is simplified by the fact that all three basis functions have value 1/3 at the centroid $(\bar{x}, \bar{y})$ (see Exercise 4), and so the estimate of

$$\int_{T_k} f \phi_i$$

is the same for each $i = 1, 2, 3$. Thus, when using the one-point rule, which is accurate enough when using linear Lagrange triangles, there is no work involved in computing the basis functions at the quadrature node.

For future reference, I will show how to compute the basis functions $\phi_1, \phi_2, \phi_3$ at a set of arbitrary points inside of $T_k$. This is easy if the points are given by their barycentric coordinates, since the barycentric coordinates $(\zeta_1, \zeta_2, \zeta_3)$ of $(x,y) \in T_k$ are simply the values $(\phi_1(x,y), \phi_2(x,y), \phi_3(x,y))$ (see Exercise 5). If, on the other hand, the points are specified by their Cartesian coordinates, evaluating $\phi_1, \phi_2, \phi_3$ requires the solution of a linear system. If $M$ is the matrix defined by (7.6), then

$$\begin{bmatrix} \phi_1(x,y) & \phi_2(x,y) & \phi_3(x,y) \end{bmatrix} = \begin{bmatrix} 1 & x & y \end{bmatrix} M^{-1}.$$

The values of $\phi_1, \phi_2, \phi_3$ at $(\xi_1, \eta_1), (\xi_2, \eta_2), \ldots, (\xi_p, \eta_p)$ are given by

$$\begin{bmatrix} \phi_1(\xi_i,\eta_i) & \phi_2(\xi_i,\eta_i) & \phi_3(\xi_i,\eta_i) \end{bmatrix} = \begin{bmatrix} 1 & \xi_i & \eta_i \end{bmatrix} M^{-1}, \quad i = 1, 2, \ldots, p.$$

These values are the entries of the matrix-matrix product $CM^{-1}$. Defining

$$C = \begin{bmatrix} 1 & \xi_1 & \eta_1 \\ 1 & \xi_2 & \eta_2 \\ \vdots & \vdots & \vdots \\ 1 & \xi_p & \eta_p \end{bmatrix},$$

$V = CM^{-1}$ satisfies $V_{ij} = \phi_j(\xi_i, \eta_i)$. The equation $V = CM^{-1}$ can be rearranged to give the matrix-matrix equation $M^T V^T = C^T$, which can be solved efficiently using Gaussian elimination.
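As a sketch of this last computation, the following MATLAB fragment evaluates the three basis functions at a few Cartesian points by solving $M^T V^T = C^T$ with the backslash operator (Gaussian elimination) rather than forming the inverse. The triangle and the evaluation points are illustrative assumptions.

```matlab
% Vertices of T_k (illustrative values)
x = [0; 2; 0];
y = [0; 0; 1];
M = [ones(3,1), x, y];

% Points (xi_i, eta_i) at which to evaluate phi_1, phi_2, phi_3:
% the centroid, the midpoint of the first edge, and the third vertex
xi  = [2/3; 1; 0];
eta = [1/3; 0; 1];
C = [ones(size(xi)), xi, eta];

% Solve M'*V' = C' by Gaussian elimination instead of computing inv(M)
V = (M'\C')';      % V(i,j) = phi_j(xi_i, eta_i)
```

Each row of `V` holds the barycentric coordinates of the corresponding point, so the rows sum to one.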

Using the reference triangle


The computations described above can be performed by transforming all integrals to the reference triangle, as described in Section 4.6. When using linear Lagrange triangles, there is a little gain in efficiency in using the reference triangle: The gradients of the basis functions need not be computed over every triangle, which saves the computation of a $3 \times 3$ matrix inverse. On the other hand, the Jacobian $J$ of the transformation must be computed and then $J^{-T}\nabla\gamma_i$, $i = 1, 2, 3$ (where $\gamma_1, \gamma_2, \gamma_3$ are the basis functions on $T_R$), which is nearly as expensive (see Exercise 6). Since the values of the basis functions at the centroid of any triangle are already known, there is no advantage in transforming

$$\int_{T_k} f \phi_i$$

to the reference triangle. When using higher-order Lagrange triangles, there is a significant savings to be had in using the reference triangle. The values and gradients of the basis functions on $T_R$ can be computed once and used for every triangle in the mesh. I will discuss this further in Chapter 8.

7.1.3

When using quadrilateral elements, all integrations are performed over the reference square $S_R = [-1, 1] \times [-1, 1]$. Quadrature rules suitable for bilinear finite element spaces are easily obtained from the Gauss quadrature rules for the interval $[-1, 1]$. Any polynomial in $s, t$ is the sum of terms of the form $s^k t^\ell$, and

$$\int_{S_R} s^k t^\ell = \left(\int_{-1}^{1} s^k\,ds\right)\left(\int_{-1}^{1} t^\ell\,dt\right).$$

Applying the $n$-point Gauss rule, with nodes $s_1, \ldots, s_n$ and weights $w_1, \ldots, w_n$, in each variable gives the product rule

$$\int_{S_R} g \approx \sum_{i=1}^{n} \sum_{j=1}^{n} w_i w_j\, g(s_i, s_j). \tag{7.7}$$

Since the $n$-point (one-dimensional) Gauss rule is exact for polynomials of degree up to $2n - 1$, rule (7.7) is exact for polynomials whose terms are of the form $s^k t^\ell$, $k, \ell \le 2n - 1$. This space of polynomials does not include all polynomials of degree $4n - 2$ or less, but it does include all polynomials of the form

$$\sum_{k=0}^{2n-1} \sum_{\ell=0}^{2n-1} c_{k\ell}\, s^k t^\ell.$$

The product Gauss rules are therefore natural for use with bilinear finite elements, described in Section 4.5.2, and also with their generalizations to biquadratic, bicubic, and so forth. In Exercise 9, the reader is asked to show that the product Gauss rule with n = 2 (four points in all) is suitable for implementing bilinear quadrilateral finite elements.
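A minimal MATLAB sketch of the product Gauss rule with $n = 2$ (four points in all) follows. The two-point Gauss nodes $\pm 1/\sqrt{3}$ and unit weights are standard; the integrand `g` is a hypothetical example whose exponents in each variable do not exceed $2n - 1 = 3$.

```matlab
% Two-point Gauss rule on [-1,1]: nodes +-1/sqrt(3), weights 1
s = [-1; 1]/sqrt(3);
w = [1; 1];

% Tensor-product nodes and weights on S_R = [-1,1] x [-1,1]
[S, T] = meshgrid(s, s);
W = w*w';

% Hypothetical integrand; every exponent is at most 2n-1 = 3
g = @(s,t) s.^3.*t.^2 + s.^2.*t.^2 + 1;
Q = sum(sum(W.*g(S, T)));   % product-rule estimate of the integral over S_R
```

For this integrand the rule reproduces the exact value $40/9$, even though the total degree (five) exceeds three.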

## 7.2 Assembling the stiffness matrix

I will now present the main computation of the finite element method, namely, the assembly of the stiffness matrix. In Section 6.1, I presented an outline of an element-oriented algorithm for assembling $K$ (see Algorithm 6.1). I can now express this algorithm in terms of the mesh data structure presented in Section 6.2.

A fundamental operation needed for implementing element-oriented algorithms is the determination of the vertices of a given triangle. To be precise, it is necessary to know, for each vertex, its coordinates, whether it is free or constrained, and its index in the list of free or constrained nodes. The reader will recall that the mesh data structure stores, for each triangle in a mesh, (pointers to) the edges of the triangle. It also stores, for each edge, (pointers to) the endpoints of the edge. By a simple algorithm, then, one can extract pointers to the vertices of a given triangle. If the triangle is $T_k$, then, according to the notation developed earlier, these pointers are $n_{k,r}$, $r = 1, 2, 3$. Given $n_{k,1}, n_{k,2}, n_{k,3}$, the coordinates of $z_{n_{k,1}}, z_{n_{k,2}}, z_{n_{k,3}}$ can be extracted from the Nodes array. Using the same pointers, the flags indicating whether the nodes are free or constrained can be extracted from NodePtrs. This algorithm, which I call getNodes1, is so fundamental that I describe it fully in Algorithm 7.1. The reader will recall that the sign of T.Elements(k,j) indicates whether edge $j$ of triangle $k$ is traversed forward or backward. This fact is used in extracting the pointers to the vertices of the triangle.

As I discussed at the end of the previous section, the three basis functions corresponding to a given triangle $T_k$ are computed by inverting the following $3 \times 3$ matrix:

$$M = \begin{bmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ 1 & x_3 & y_3 \end{bmatrix}. \tag{7.8}$$

The matrix $M^{-1}$ contains the coefficients of the basis functions as its columns, and since the basis functions are linear, these coefficients include the partial derivatives

$$\frac{\partial \phi_i}{\partial x} = b_i, \qquad \frac{\partial \phi_i}{\partial y} = c_i, \qquad i = 1, 2, 3.$$

    for j = 1, 2, 3
        eptr(j) = T.Elements(k,j)
    if eptr(1) > 0
        indices(1) = T.Edges(eptr(1),1)
        indices(2) = T.Edges(eptr(1),2)
    else
        indices(1) = T.Edges(-eptr(1),2)
        indices(2) = T.Edges(-eptr(1),1)
    if eptr(2) > 0
        indices(3) = T.Edges(eptr(2),2)
    else
        indices(3) = T.Edges(-eptr(2),1)
    for i = 1, 2, 3
        ptrs(i) = T.NodePtrs(indices(i))
        for j = 1, 2
            coords(i,j) = T.Nodes(indices(i),j)

Algorithm 7.1. The getNodes1 algorithm: Given a mesh T and a triangle index $k$, determines the coordinates of the vertices of $T_k$ and the corresponding entries in T.NodePtrs. The coordinates are returned in a $3 \times 2$ array coords and the pointers in the $3 \times 1$ array ptrs.

Therefore, upon computing $M^{-1}$, the gradients of the three basis functions are immediately available. The integral

$$\int_{T_k} \kappa$$

is estimated using the one-point rule

$$\int_{T_k} \kappa \approx A\,\kappa(\bar{x}, \bar{y}),$$

where $A$ is the area and $(\bar{x}, \bar{y})$ the centroid of the triangle. The computation of the area was explained in the previous section (see also Exercise 3).

Algorithm 7.2 is the complete element-oriented algorithm for assembling $K$. I assume that a routine for computing a matrix inverse is available. MATLAB contains such a routine. If the code is to be written in a high-level language such as Fortran, C, or C++, it is recommended that code from the LAPACK package [3] be used. Since $K$ is symmetric, the upper triangle is first computed and then copied to the lower triangle. Depending on the software to be used to solve $KU = F$, it may not be necessary to fill in the lower triangle of $K$.


    Initialize K to the zero matrix
    for k = 1, 2, ..., N_t
        Call getNodes1 to get coords and ptrs
        Compute the matrix M and its inverse
        for r = 1, 2, 3
            for s = r, ..., 3
                G(r,s) = ∇φ_r · ∇φ_s
        Estimate I = ∫_{T_k} κ using the one-point quadrature rule
        for r = 1, 2, 3
            for s = r, ..., 3
                if ptrs(r) > 0 and ptrs(s) > 0
                    i = min{ptrs(r), ptrs(s)}
                    j = max{ptrs(r), ptrs(s)}
                    Add G(r,s) I to K(i,j)
    for i = 2, 3, ..., N_f
        for j = 1, 2, ..., i-1
            K(i,j) = K(j,i)

Algorithm 7.2. The complete algorithm for assembling $K$. The matrix $M$ is defined by (7.8) and $M^{-1}$ contains $\nabla\phi_1, \nabla\phi_2, \nabla\phi_3$ in its second and third rows. The stiffness matrix $K$ is symmetric, so the upper triangle is computed in the main loop, and then the lower triangular entries are assigned at the end.
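The assembly loop can be sketched in MATLAB as follows. This is not the author's Stiffness1 routine: to keep the example self-contained it replaces the mesh data structure and getNodes1 with three hypothetical arrays (`nodes`, `tri`, `ptrs`), uses a dense matrix rather than a sparse one, and assumes a constant coefficient.

```matlab
% Hypothetical mesh arrays (a simplified stand-in for the mesh data structure):
% the unit square split into four triangles around its center.
nodes = [0 0; 1 0; 1 1; 0 1; 0.5 0.5];
tri   = [1 2 5; 2 3 5; 3 4 5; 4 1 5];   % vertex indices of each triangle
ptrs  = [-1 1 2 3 4];                   % >0: free-node index, <0: constrained
kappa = @(x,y) 1 + 0*x;                 % coefficient (assumed constant here)

Nf = max(ptrs);
K  = zeros(Nf);
for k = 1:size(tri,1)
    c = nodes(tri(k,:), :);             % 3-by-2 vertex coordinates
    p = ptrs(tri(k,:));
    M = [ones(3,1), c];
    Minv = inv(M);
    grads = Minv(2:3, :);               % column r is the gradient of phi_r
    G = grads'*grads;                   % G(r,s) = grad(phi_r) . grad(phi_s)
    A = abs(det(M))/2;                  % area of T_k
    I = A*kappa(mean(c(:,1)), mean(c(:,2)));   % one-point rule for kappa
    for r = 1:3
        for s = r:3
            if p(r) > 0 && p(s) > 0
                i = min(p(r), p(s));
                j = max(p(r), p(s));
                K(i,j) = K(i,j) + G(r,s)*I;   % upper triangle only
            end
        end
    end
end
K = triu(K) + triu(K,1)';               % copy the upper triangle to the lower
```

On a realistic mesh one would accumulate into a sparse matrix instead; the dense `zeros(Nf)` is used here only for brevity.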

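## 7.3 Computing the load vector

The basic algorithm for computing the load vector $F$ is similar to that for assembling the stiffness matrix $K$, although it is complicated by the need to handle inhomogeneous boundary conditions. I will begin by completing the description of Algorithm 6.3 (see page 132), which applies to the following BVP (with homogeneous boundary conditions):

$$-\nabla \cdot (\kappa \nabla u) = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \Gamma_1, \qquad \kappa \frac{\partial u}{\partial n} = 0 \ \text{on } \Gamma_2.$$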

The integrals

$$\int_{T_k} f \phi_i$$

are estimated by the one-point rule,

$$\int_{T_k} f \phi_i \approx A\, f(\bar{x}, \bar{y})\, \phi_i(\bar{x}, \bar{y}),$$

where $A$ is the area and $(\bar{x}, \bar{y})$ the centroid of $T_k$. Moreover, since all three basis functions have value 1/3 at $(\bar{x}, \bar{y})$, this estimate is the same for all three basis functions that are nonzero over $T_k$:

$$\int_{T_k} f \phi_i \approx \frac{A\, f(\bar{x}, \bar{y})}{3}.$$

The details (getting the coordinates of the vertices of $T_k$, manipulating the pointers, computing the area of $T_k$, and so forth) are the same as for assembling the stiffness matrix. The complete algorithm for assembling $F$, under homogeneous boundary conditions, is given in Algorithm 7.3.

    Initialize F to the zero vector
    for k = 1, 2, ..., N_t
        Call getNodes1 to get coords and ptrs
        Compute the area A and the centroid (x̄, ȳ) of T_k
        Compute I = A f(x̄, ȳ)/3
        for r = 1, 2, 3
            if ptrs(r) > 0
                Add I to F(ptrs(r))

Algorithm 7.3. The algorithm for assembling the load vector in the case of homogeneous boundary conditions.
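A MATLAB sketch of this loop is given below. As before, the mesh arrays `nodes`, `tri`, and `ptrs` are hypothetical stand-ins for the mesh data structure and getNodes1, and the right-hand side `f` is an assumed example.

```matlab
% Hypothetical mesh: unit square, four triangles, only the center node is free
nodes = [0 0; 1 0; 1 1; 0 1; 0.5 0.5];
tri   = [1 2 5; 2 3 5; 3 4 5; 4 1 5];
ptrs  = [-1 -2 -3 -4 1];
f = @(x,y) 1 + 0*x;                     % right-hand side (assumed constant)

F = zeros(max(ptrs), 1);
for k = 1:size(tri,1)
    c = nodes(tri(k,:), :);             % vertex coordinates of T_k
    p = ptrs(tri(k,:));
    A = abs(det([ones(3,1), c]))/2;     % area of T_k
    I = A*f(mean(c(:,1)), mean(c(:,2)))/3;   % each basis function is 1/3 at the centroid
    for r = 1:3
        if p(r) > 0
            F(p(r)) = F(p(r)) + I;
        end
    end
end
```

With $f = 1$ the single entry of `F` is the integral of the basis function at the center node, that is, the volume of a pyramid with unit base and unit height.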

## 7.3.1 Inhomogeneous Dirichlet conditions

It is not difficult to extend the above algorithm to handle inhomogeneous Dirichlet boundary conditions. In place of

$$F_i = \int_\Omega f \phi_i,$$

it is necessary to compute

$$F_i = \int_\Omega f \phi_i - \int_\Omega \kappa \nabla G \cdot \nabla \phi_i.$$

The function $G$ is the continuous piecewise linear function defined by

$$G(z_n) = \begin{cases} g(z_n), & z_n \text{ is a constrained node}, \\ 0, & z_n \text{ is a free node}. \end{cases}$$

By definition, $G$ is zero over every triangle except those having at least one constrained vertex. When looping over the triangles of the mesh, a contribution to $F$ (from the inhomogeneous Dirichlet condition) must be computed whenever the triangle $T_k$ contains at least one constrained node and at least one free node. These contributions have the form

$$-\int_{T_k} \kappa \nabla G \cdot \nabla \phi_i.$$

Since both $\nabla G$ and $\nabla \phi_i$ are constant on $T_k$, it follows that

$$\int_{T_k} \kappa \nabla G \cdot \nabla \phi_i = (\nabla G \cdot \nabla \phi_i) \int_{T_k} \kappa \approx (\nabla G \cdot \nabla \phi_i)\, A\, \kappa(\bar{x}, \bar{y}).$$

Moreover, $\nabla G$ is easily computed. If $w_1, w_2, w_3$ are the nodal values of $G$ at the three nodes of $T_k$, then, on $T_k$,

$$G = w_1 \phi_1 + w_2 \phi_2 + w_3 \phi_3, \qquad \nabla G = w_1 \nabla\phi_1 + w_2 \nabla\phi_2 + w_3 \nabla\phi_3$$

(these formulas are expressed in local indices). Algorithm 7.4 computes the load vector, taking into account the influence of the right-hand-side function $f$ and the Dirichlet data $g$ (but still ignoring any nonzero Neumann data). This algorithm assumes that the nonzero Dirichlet data are given in an $N_c \times 1$ array.

    Initialize F to the zero vector
    for k = 1, 2, ..., N_t
        Call getNodes1 to get coords and ptrs
        Compute the area A and the centroid (x̄, ȳ) of T_k
        if f is nonzero
            Compute I = A f(x̄, ȳ)/3
            for r = 1, 2, 3
                if ptrs(r) > 0
                    Add I to F(ptrs(r))
        if g is nonzero and at least one ptrs(r) is negative
            Form the matrix M and its inverse
            Extract the nodal values of G on T_k
            Compute the gradient of G on T_k
            Compute I = A κ(x̄, ȳ)
            for r = 1, 2, 3
                if ptrs(r) > 0
                    Add -(∇G · ∇φ_r) I to F(ptrs(r))

Algorithm 7.4. The algorithm for assembling the load vector in the case of a nonzero right-hand-side $f$ and/or nonzero Dirichlet data $g$.
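The Dirichlet contribution from a single triangle can be sketched in MATLAB as follows. The vertex coordinates, the pointer values, and the Dirichlet data are illustrative assumptions; the convention that a negative pointer $-j$ refers to the $j$th constrained node is inferred from the text's use of signs in NodePtrs.

```matlab
c = [0 0; 1 0; 0 1];       % vertices of T_k (illustrative)
p = [-1 1 2];              % local node 1 is constrained node 1; the others are free
g = 3;                     % Nc-by-1 array of Dirichlet data (here Nc = 1)
kappa = @(x,y) 1 + 0*x;    % coefficient (assumed constant)
F = zeros(2,1);

M = [ones(3,1), c];
Minv = inv(M);
grads = Minv(2:3,:);       % column r is the gradient of phi_r

w = zeros(3,1);            % nodal values of G: g at constrained nodes, 0 at free nodes
w(p < 0) = g(-p(p < 0));
gradG = grads*w;           % gradient of G on T_k

A = abs(det(M))/2;
I = A*kappa(mean(c(:,1)), mean(c(:,2)));   % one-point estimate of the integral of kappa
for r = 1:3
    if p(r) > 0
        F(p(r)) = F(p(r)) - (gradG'*grads(:,r))*I;   % note the minus sign
    end
end
```

The minus sign in the update reflects the term $-\int_\Omega \kappa \nabla G \cdot \nabla\phi_i$ in the formula for $F_i$.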

## 7.3.2 Inhomogeneous Neumann conditions

If the problem involves inhomogeneous Neumann conditions, the resulting contribution to the load vector can also be computed while looping over the triangles of the mesh. It can be determined from the array EdgeEls whether each edge is an interior edge, a constrained boundary edge, or a free boundary edge. To be precise, T.EdgeEls(j,2) is positive if edge $e_j$ is an interior edge, zero if $e_j$ is a constrained boundary edge, and $-i$ if $e_j$ is the $i$th free boundary edge.

The formula for the load vector, taking into account both Dirichlet and Neumann data, is

$$F_i = \int_\Omega f \phi_i - \int_\Omega \kappa \nabla G \cdot \nabla \phi_i + \int_{\Gamma_2} h \phi_i.$$

If node $z_n$ is a free node that lies on $\partial\Omega$, then it is the endpoint of two free boundary edges. Therefore, if $i = p_n$ (this notation means that $z_n$ is the $i$th free node) and $z_n$ is an endpoint of free edges $e_{j_1}$ and $e_{j_2}$, then

$$\int_{\Gamma_2} h \phi_i = \int_{e_{j_1}} h \phi_i + \int_{e_{j_2}} h \phi_i.$$

These integrals can be computed while looping over the triangles. The reader should notice that $e_{j_1}$ and $e_{j_2}$ may or may not belong to the same triangle, but, as in the case of the integrals defining $F_i$, it is not necessary that the two integrals be computed together. The nodal values of the Neumann data $h$, at the endpoints of the free edges, can be provided in an $N_b \times 2$ array. The function $h$ is then approximated by its piecewise linear interpolant on each edge. Each integral of the form

$$\int_{e_j} h \phi_i$$

is then estimated by the one-point Gaussian quadrature rule (the midpoint rule) for one-dimensional integrals. Algorithm 7.5 is the complete algorithm for computing the load vector. It is coded in the MATLAB routine Load1. The reader may notice that the array FBndyEdges, which identifies the free boundary edges, is not used while computing the load vector. Its only purpose is to identify the free boundary edges conveniently, so that the needed Neumann data can be assembled for input to Load1.
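The contribution of one free boundary edge can be sketched in MATLAB as follows. The endpoint coordinates, the Neumann values, and the pointers are illustrative assumptions. The factor 0.5 appears because each endpoint's basis function equals 1/2 at the midpoint of the edge.

```matlab
z  = [0 0; 2 0];     % endpoints of a free boundary edge (illustrative)
hv = [1 3];          % the row of the Nb-by-2 Neumann data array for this edge
p1 = [2 -1];         % endpoint pointers: the first endpoint is free node 2
F  = zeros(2,1);

L  = norm(z(2,:) - z(1,:));      % length of the edge
hm = (hv(1) + hv(2))/2;          % linear interpolant of h at the midpoint
I  = 0.5*L*hm;                   % midpoint rule applied to h*phi
for r = 1:2
    if p1(r) > 0
        F(p1(r)) = F(p1(r)) + I;
    end
end
```

Only the free endpoint receives a contribution; the constrained endpoint (negative pointer) is skipped.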

## 7.4 Examples

In this section, I solve several BVPs using the MATLAB code implementing the algorithms of this chapter. I have two goals in mind:

1. to show the steps involved in solving a BVP; and
2. to illustrate the theory described in Part I.

To these ends, the solved problems illustrate a variety of boundary conditions and geometries. The easiest way to illustrate the convergence theory is to solve problems with known solutions. Few BVPs can be solved in closed form (that is, with the solution expressed as a finite combination of elementary functions). Therefore, to generate a test problem of the form

$$-\nabla \cdot (\kappa \nabla u) = f \ \text{in } \Omega, \qquad u = g \ \text{on } \Gamma_1, \qquad \kappa \frac{\partial u}{\partial n} = h \ \text{on } \Gamma_2,$$

one can proceed as follows:

1. Choose a domain $\Omega$ and a partition $\partial\Omega = \Gamma_1 \cup \Gamma_2$ of its boundary (possibly with $\Gamma_1$ or $\Gamma_2$ empty). Choose the coefficient $\kappa$ for the PDE.
2. Choose the solution $u$. If the boundary conditions are homogeneous, then $u$ must be chosen accordingly (it is easier to choose a solution for a problem with inhomogeneous boundary conditions).
3. Compute the right-hand-side $f$ and, if $u$ was not chosen to satisfy homogeneous boundary conditions, compute the boundary functions $g$ and $h$.

    Initialize F to the zero vector
    for k = 1, 2, ..., N_t
        Call getNodes1 to get coords and ptrs
        if f is nonzero
            Compute the area A and the centroid (x̄, ȳ) of T_k
            Compute I = A f(x̄, ȳ)/3
            for r = 1, 2, 3
                if ptrs(r) > 0
                    Add I to F(ptrs(r))
        if g is nonzero and at least one ptrs(r) is negative
            if not already done
                Compute the area A and the centroid (x̄, ȳ) of T_k
            Form the matrix M and its inverse
            Extract the nodal values of G on T_k
            Compute the gradient of G on T_k
            Compute I = A κ(x̄, ȳ)
            for r = 1, 2, 3
                if ptrs(r) > 0
                    Add -(∇G · ∇φ_r) I to F(ptrs(r))
        if h is nonzero
            for j = 1, 2, 3
                if edge j is a free boundary edge
                    Extract the coordinates of the endpoints of the edge
                    Extract the corresponding pointers ptrs1 from NodePtrs
                    Compute the length L of the edge
                    Interpolate the boundary data at the midpoint of the edge to get h(m)
                    Evaluate I = 0.5 L h(m)
                    if ptrs1(1) > 0
                        Add I to F(ptrs1(1))
                    if ptrs1(2) > 0
                        Add I to F(ptrs1(2))

Algorithm 7.5. The complete algorithm for assembling the load vector.

## 7.4.1 Homogeneous boundary conditions

The following two examples involve homogeneous boundary conditions on a simple geometry.

EXAMPLE 7.1. The first example will involve homogeneous Dirichlet conditions on the unit square $\Omega = (0,1) \times (0,1) = \{(x,y) : 0 < x < 1,\ 0 < y < 1\}$. The coefficient $\kappa$ is nonconstant: $\kappa(x,y) = 1 + xy^2$. In constructing a test problem, the only hard part is choosing a function $u$ that satisfies the homogeneous boundary conditions; in this case, since the geometry is so simple, this is not hard at all. The chosen function $u$ is obviously zero on the boundary, and $f$ is computed as $f = -\nabla \cdot (\kappa \nabla u)$.

The computed $f$ can be rather complicated; however, the procedure is purely mechanical and one can employ a computer algebra system (such as the Symbolic Toolbox in MATLAB, or Mathematica, or Maple). Using the MATLAB code implementing the algorithms described in this chapter, the BVP was solved on five (successively finer) regular meshes, having 8, 32, 128, 512, and 2048 triangles. The MATLAB routine RectangleMeshD1 generates a regular mesh, assuming Dirichlet conditions, on any rectangle $[0, \ell_x] \times [0, \ell_y]$. The mesh size $h$ decreases by a factor of 2 with each successive refinement of the mesh. Therefore, according to the convergence theory presented in Chapter 5, the energy norm of the error should also decrease by a factor of approximately 2 with each mesh refinement. In order to estimate the energy norm of the error (when the exact solution is known), I wrote a MATLAB routine EnergyNormErr1 to estimate

$$\|u - v\|_E,$$

where $u$ is a given smooth function and $v$ is a piecewise linear function defined on a mesh $\mathcal{T}_h$. The integral is estimated using a three-point quadrature rule on each triangle in $\mathcal{T}_h$, and the result may be somewhat inaccurate if the mesh is coarse. Table 7.3 shows the errors, measured in the energy norm, of the five computed solutions. It also shows the ratio of the errors in going from one mesh to the next finer mesh. The results displayed in Table 7.3 are in agreement with the predictions of the theory, namely, that $\|u - u_h\|_E = O(h)$. The energy norm of the exact solution is approximately 0.1613, so even on a mesh with 2048 triangles, the error is still about 5%.

EXAMPLE 7.2. In this example, $\Omega$, $\kappa$, $f$ are the same as in the previous example, but the boundary conditions are different:

$$-\nabla \cdot (\kappa \nabla u) = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \Gamma_1, \qquad \kappa \frac{\partial u}{\partial n} = 0 \ \text{on } \Gamma_2.$$


Table 7.3. The errors in Example 7.1.

Figure 7.1. The computed solutions to Examples 7.1 (left) and 7.2 (right). The only difference in the two BVPs is that Dirichlet conditions are imposed on the bottom edge of $\Omega$ in Example 7.1, while Neumann conditions are imposed in Example 7.2.

The set $\Gamma_2$ is the bottom edge of the square, while the other three edges comprise $\Gamma_1$. To illustrate the effect of replacing the Dirichlet condition on $\Gamma_2$ by the Neumann condition, the computed solutions to the previous example and this example are displayed in Figure 7.1.

7.4.2

I now turn to problems with inhomogeneous boundary conditions.

Figure 7.2. The initial and final meshes from Example 7.3.

Table 7.4. The errors in Example 7.3.

EXAMPLE 7.3. In this example, $\Omega$ is the region bounded by two squares centered at the origin; the inner square has side length 2/3 and the outer square has side length 2 (see Figure 7.2). The BVP is posed with $f$ and $g$ chosen so that the exact solution is $u(x,y) = x/(x^2 + y^2)$. The initial mesh is defined by 16 triangles and 16 nodes, as shown in the left-hand plot of Figure 7.2. The mesh is then refined three times and the BVP is solved on each of the four meshes. Figure 7.2 also shows the final mesh, while Figure 7.3 shows the final computed solution. The errors in the computed solutions are displayed in Table 7.4.

Figure 7.3. The computed solution, corresponding to the finest mesh in Figure 7.2, from Example 7.3. The error in the energy norm is about 16.8%.

EXAMPLE 7.4. The next example is a pure Neumann problem on the unit square. The reader should recall that the right-hand-side $f$ and the boundary data $h$ cannot be chosen arbitrarily in the BVP

$$-\Delta u = f \ \text{in } \Omega, \qquad \frac{\partial u}{\partial n} = h \ \text{on } \partial\Omega.$$

Rather, $f$ and $h$ must satisfy the following compatibility condition:

$$\int_\Omega f + \int_{\partial\Omega} h = 0.$$

If the compatibility condition is not satisfied, then the BVP has no solution; if it is satisfied, then the BVP has infinitely many solutions, any two of which differ by a constant. By choosing $u$ and computing $f$, $h$ (so as to produce a problem with a known solution), I guarantee that the compatibility condition will be satisfied. However, I will still have to deal with the nonuniqueness of the solution, which manifests itself in a singular stiffness matrix.

I choose $u(x,y) = x^2 y^2 + 2y$, which yields $f(x,y) = -2y^2 - 2x^2$. To specify $h$, I partition $\partial\Omega$ into its four edges, labeled $\Gamma_1$ (bottom), $\Gamma_2$ (right), $\Gamma_3$ (top), and $\Gamma_4$ (left). Then $h$ is given by

$$h(x,y) = \begin{cases} -2, & (x,y) \in \Gamma_1, \\ 2y^2, & (x,y) \in \Gamma_2, \\ 2x^2 + 2, & (x,y) \in \Gamma_3, \\ 0, & (x,y) \in \Gamma_4. \end{cases}$$

The data for the Neumann problem consist of the values of $h$ at the endpoints of each free boundary edge. However, it is important to notice that $h$ is determined not only by the points $(x,y)$ on the boundary, but also by the normal vector on the edge:

$$h = \frac{\partial u}{\partial n} = \nabla u \cdot n.$$

Figure 7.4. The initial mesh for Example 7.4. The edges are labeled.

Therefore, at corner points, where the unit normal changes discontinuously, $h$ effectively has two different values. For example, the initial mesh, with the edges labeled, is shown in Figure 7.4, which shows that the free boundary edges are the eight edges numbered 1, 2, 3, 4, 7, 8, 9, and 10. The data for the Neumann problem then form an $8 \times 2$ array $H$. (The rows of $H$ correspond to the edges in the order listed in T.FBndyEdges; the entries in each row correspond to the endpoints of the edge in the order given in T.Edges.) Thus, for example, the corner point $(x,y) = (1,1)$ belongs to edges 8 and 10. Considered as an endpoint of $e_8$, the value of $h$ is

$$h(1,1) = \nabla u(1,1) \cdot (1,0) = 2,$$

but considered as an endpoint of $e_{10}$,

$$h(1,1) = \nabla u(1,1) \cdot (0,1) = 4.$$

Thus $H(6,2) = 2.0$ and $H(8,2) = 4.0$. Mathematically, the fact that $h(1,1)$ is not uniquely defined is unimportant, since a single point is a set of measure zero in $\partial\Omega$. But when the problem is discretized, the value of $h$ at a point is no longer negligible and both values are needed.
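The two values of $h$ at the corner $(1,1)$ can be checked with a few lines of MATLAB. The sketch assumes $\kappa = 1$ (consistent with $f = -\Delta u$ in this example) and takes the outward normals of the right and top edges of the unit square; the variable names are illustrative.

```matlab
% Gradient of u(x,y) = x^2*y^2 + 2*y
gradu = @(x,y) [2*x.*y.^2; 2*x.^2.*y + 2];

nRight = [1; 0];     % outward unit normal on the right edge
nTop   = [0; 1];     % outward unit normal on the top edge

hRight = gradu(1,1)'*nRight;   % h at (1,1), regarded as a point of the right edge
hTop   = gradu(1,1)'*nTop;     % h at (1,1), regarded as a point of the top edge
```

The two results differ, which is why the Neumann data are stored per edge endpoint rather than per node.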


In addition to the computation of the Neumann data, the singular stiffness matrix requires special attention. The null space of $K$ is one-dimensional and is spanned by $E$, the vector of all ones. A nonsingular system results from removing the first row and column of $K$ and the first component of $F$. Removing the first column of $K$ is equivalent to setting the first nodal value to 0; removing the first row of $K$ and the first component of $F$ is equivalent to removing the equation corresponding to the first free node. The resulting system, $KU = F$, has a unique solution that corresponds to the piecewise linear function that is zero at the first free node. In the meshes used in this problem, the first free node is the origin, so the particular solution $u(x,y) = x^2 y^2 + 2y$ is approximated (as opposed to $u(x,y) = x^2 y^2 + 2y + C$ for some nonzero $C$).

The technique described in the previous paragraph is probably the simplest way to deal with a singular stiffness matrix, but it may not be the best way. Constraining a single node on the boundary amounts to imposing a Dirichlet condition at a single point on the boundary, which is not a well-posed boundary condition. The practical effect of this technique is to make the stiffness matrix more ill-conditioned; this effect is analyzed in detail in the paper by Bochev and Lehoucq [12]. Section 11.5 discusses alternatives for dealing with a singular stiffness matrix. The error in the computed solution, on the finest mesh, is shown in Figure 7.5, while the estimated errors in all the computed solutions are displayed in Table 7.5.
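The row-and-column removal can be illustrated on a small singular matrix. The matrix `K` below is a hypothetical pure Neumann stiffness matrix (five nodes, every row summing to zero), and `F` is a made-up compatible load vector whose entries sum to zero; neither comes from Example 7.4.

```matlab
% Hypothetical singular stiffness matrix (rows sum to zero) and compatible load
K = [ 1  0  0  0 -1;
      0  1  0  0 -1;
      0  0  1  0 -1;
      0  0  0  1 -1;
     -1 -1 -1 -1  4];
F = [1; 0; 0; 0; -1];

e = ones(5,1);
nullResidual = norm(K*e);      % zero: the vector of all ones spans the null space

% Remove the first row and column of K and the first component of F
Khat = K(2:end, 2:end);
Fhat = F(2:end);
U = [0; Khat\Fhat];            % the first nodal value is set to zero

residual = norm(K*U - F);      % the removed equation is satisfied as well
```

Because `F` is compatible, the equation that was dropped is satisfied automatically by the computed `U`.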

Figure 7.5. The error in the computed solution in Example 7.4 (on the finest mesh). (To be precise, the graph is of the difference between the piecewise linear interpolant of the exact solution and the computed solution.)

## 7.4.3 A more realistic example

Problems with known solutions are useful for testing the correctness of code, and also for testing convergence theory. Obviously, though, the real purpose of finite element code is to solve problems whose solutions are not known. In this section I will discuss a more realistic problem than the preceding examples, that is, a problem whose solution is not known.

A critical part of solving a realistic problem is assessing the accuracy of the computed solution. Normally there is some goal for the quality of the solution (such as computing the solution to within 5% in some norm, for example), and it is important to have some assurance that the computed solution is correct to within that limit. Efficiently producing validated solutions is the topic of Part IV of this book. Here I will produce a simple heuristic which is effective under certain conditions. Assuming the goal is to approximate the solution to within some tolerance in the energy norm, the a priori error bound

$$\|u - u_h\|_E \le Ch \tag{7.12}$$

is useful. To develop a heuristic, it is assumed that the bound is achieved as an equation, that is,

$$\|u - u_h\|_E = Ch \tag{7.13}$$

for some (unknown) positive constant $C$. If the solution is computed on two successive meshes, $\mathcal{T}_{2h}$ and $\mathcal{T}_h$, where $\mathcal{T}_h$ is the refinement of $\mathcal{T}_{2h}$, then the computable quantity $\|u_h - u_{2h}\|_E$ is a reasonable estimate of $\|u - u_h\|_E$, as I will now show. The reader should bear in mind, though, that this heuristic depends on the assumption (7.12), which in turn depends on the assumption that the true solution belongs to $H^2(\Omega)$ (and that the meshes are fine enough for (7.13) to hold approximately). By the reverse triangle inequality and assumption (7.13),

$$\|u_h - u_{2h}\|_E \ge \|u - u_{2h}\|_E - \|u - u_h\|_E = 2Ch - Ch = Ch = \|u - u_h\|_E.$$

Thus

$$\|u - u_h\|_E \le \|u_h - u_{2h}\|_E \tag{7.14}$$

should be approximately true if (7.13) is approximately true. The reasoning above also suggests

$$\frac{\|u_{2h} - u_{4h}\|_E}{\|u_h - u_{2h}\|_E} \approx 2,$$

which is a computable test on whether (7.13) is approximately valid (even if the true solution $u$ is smooth enough, (7.13) may fail if the meshes employed are too coarse). The error estimate (7.14) is an example of an a posteriori error estimate.

To compute $u_h - u_{2h}$, it is necessary to interpolate $u_{2h}$ onto the mesh $\mathcal{T}_h$. Since $\mathcal{T}_h$ is assumed to be obtained from $\mathcal{T}_{2h}$ by standard refinement, this is straightforward. Each node $z_i$ in $\mathcal{T}_h$ either belongs to $\mathcal{T}_{2h}$ or is of the form

$$z_i = \frac{z_{j_1} + z_{j_2}}{2}$$

for some nodes $z_{j_1}$ and $z_{j_2}$ in $\mathcal{T}_{2h}$. Since $u_{2h}$ is linear over each edge in $\mathcal{T}_{2h}$, it follows that

$$u_{2h}(z_i) = \frac{u_{2h}(z_{j_1}) + u_{2h}(z_{j_2})}{2}.$$

The mapping from $i$ to $j_1, j_2$ is part of the mesh data structure, so this interpolation is easy to implement. The MATLAB routine Interpolate1 implements the interpolation of $u_{2h}$ from $\mathcal{T}_{2h}$ to $\mathcal{T}_h$.

EXAMPLE 7.5. Consider a square metal plate measuring 20 cm by 20 cm, completely insulated on the top and bottom and also on three edges. To model the plate mathematically, it is assumed that it occupies the set $\Omega = (0,20) \times (0,20)$ in the plane, with $\Gamma_1 = \{(x,20) : 0 < x < 20\}$ (the "top" edge) and $\Gamma_2 = \partial\Omega \setminus \Gamma_1$. The problem is to determine the steady-state temperature distribution $u = u(x,y)$ in the plate if the temperature along $\Gamma_1$ is held fixed at $g(x) = 25 - 5\cos(\pi x/10)$. The BVP that models this experiment was presented in Section 1.1.1:

$$-\nabla \cdot (\kappa \nabla u) = 0 \ \text{in } \Omega, \qquad u = g \ \text{on } \Gamma_1, \qquad \kappa \frac{\partial u}{\partial n} = 0 \ \text{on } \Gamma_2.$$

The coefficient $\kappa$ is the thermal conductivity of the metal; it is constant if the plate is homogeneous and variable ($\kappa = \kappa(x,y)$) if the plate is heterogeneous. In this example, I will compare two situations. In the first, the plate is homogeneous and made of iron; its thermal conductivity is the constant $\kappa_1 = 0.836$ W/(cm K). In the second, the plate has a region of high thermal conductivity in the center; its thermal conductivity $\kappa_2$ takes larger values for $(x,y)$ inside the circle $(x - 10)^2 + (y - 10)^2 = 25$, and $\kappa_2(x,y) = 0.836$ elsewhere. The function $\kappa_2$ is displayed in Figure 7.6.

For the purpose of this example, the goal will be to compute the solutions to within 10% in the energy norm. For both BVPs, the initial mesh is uniform, with 128 triangles and $h = (5/2)\sqrt{2} \approx 3.536$. Solving the first BVP (with $\kappa = \kappa_1 = 0.836$) on four successive meshes yields the following results:

Figure 7.6. The thermal conductivity $\kappa_2$ (in W/(cm K)) for the second plate in Example 7.5.

| Meshes compared | Estimated error $\|u_h - u_{2h}\|_E$ | Ratio $\|u_{2h} - u_{4h}\|_E / \|u_h - u_{2h}\|_E$ |
|---|---|---|
| 1 and 2 | 30.3 | |
| 2 and 3 | 16.2 | 1.875 |
| 3 and 4 | 8.3 | 1.950 |

The ratios $\|u_{2h} - u_{4h}\|_E / \|u_h - u_{2h}\|_E$ suggest that the assumption (7.13) is not unreasonable, and the values of $\|u_h - u_{2h}\|_E$ show that the estimated error in the fourth computed solution is less than 10%. Repeating the above computations with the nonconstant thermal conductivity $\kappa_2$ yields similar results:

| Meshes compared | Estimated error $\|u_h - u_{2h}\|_E$ | Ratio $\|u_{2h} - u_{4h}\|_E / \|u_h - u_{2h}\|_E$ |
|---|---|---|
| 1 and 2 | 30.5 | |
| 2 and 3 | 16.3 | 1.879 |
| 3 and 4 | 8.4 | 1.951 |

Again, the fourth computed solution seems to be sufficiently accurate. The goal of this example was to compare the heat distributions in the two plates. If $u^{(1)}$ is the true heat distribution in the first (homogeneous) plate and $u^{(2)}$ is the true heat distribution in the second (heterogeneous) plate, then Figure 7.7 shows the computed estimate of $u^{(2)} - u^{(1)}$. The figure shows that the second plate is cooler on one side of the high conductivity zone and warmer on the other, which would be expected because heat energy flows more easily across that zone.

Figure 7.7. The difference in the temperature distributions in the two plates of Example 7.5.

While the heuristic used in the previous example can be useful, its applicability is limited because the true solution must be smooth and the number of nodes increases rapidly as the mesh is uniformly refined. Methods for nonuniform mesh refinement are presented in Part IV; these methods are much more efficient and can also handle problems with singular solutions. However, they are also much more complex.
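The bookkeeping behind the heuristic can be sketched in MATLAB. The array `d` holds estimates $\|u_h - u_{2h}\|_E$ for successive refinements, here the values quoted in Example 7.5 for the homogeneous plate and read as percentages (an assumption about their units). The tolerance and the acceptance window for the ratios are illustrative choices, not part of the text.

```matlab
d = [30.3 16.2 8.3];                 % estimates for three successive refinements
ratios = d(1:end-1)./d(2:end);       % should be close to 2 if the O(h) model holds

tol = 10;                            % target accuracy (same units as d)
plausible = all(abs(ratios - 2) < 0.25);   % illustrative window around 2
accepted  = plausible && d(end) < tol;     % accept the finest computed solution?
```

Ratios recomputed from the rounded table entries differ slightly from the printed ratios, which were presumably computed from unrounded values.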

## 7.5 The MATLAB implementation

The main algorithms from this chapter are implemented in the MATLAB functions Stiffness1 and Load1. Several routines are provided to estimate various norms of solutions and errors. There is also a master routine, TestConv1, which monitors the convergence of the finite element method on a model problem with a known solution.

## 7.5.1 MATLAB functions

- Stiffness1 Assembles the stiffness matrix (as a MATLAB sparse matrix) for the model problem on a mesh of linear Lagrange triangles.
- Load1 Assembles the load vector for the model problem on a mesh of linear Lagrange triangles.
- EnergyNorm1 Estimates (by quadrature on a given mesh) the energy norm of a known function.
- EnergyNormErr1 Estimates (by quadrature) the energy norm of the difference between a known function and a piecewise linear function on a given mesh.
- L2Norm1, L2NormErr1 Like the previous routines, but for the $L^2$-norm.
- LinfNormErr1 Like the previous routines, but for the $L^\infty$-norm.
- Interpolate1 Interpolates a piecewise linear function from a given mesh onto a finer mesh (obtained by one or more applications of Refine1).
- TestConv1 Monitors the convergence of the finite element method on a model problem with a known solution.

The functions for retrieving information from the mesh data structure are

- getDirichletData Evaluates a given function at the constrained nodes.
- getNeumannData1 Evaluates $h = \kappa\,\partial u/\partial n$ on the free boundary edges, given $\kappa$ and $\nabla u$.
- getNeumannData1a Evaluates a given function $h$ on the free boundary edges.
- getGradients1 Evaluates the gradients of a given piecewise linear function on all the triangles.
- getNodalValues Evaluates a given function on the nodes of a given mesh.
- getNodes1 Extracts the coordinates of the vertices of a triangle, along with their indices.
- getNormal1 Computes the outward unit normal to a given side of a triangle.
- getTriNodeIndices1 Creates the triangle-node list for a given mesh.

## 7.6 Exercises for Chapter 7

1. Show that there is a unique one-point rule that integrates $1$, $s$, and $t$ exactly on $T_R$, namely,

$$\int_{T_R} f \approx \frac{1}{2}\, f\!\left(\frac{1}{3}, \frac{1}{3}\right).$$

2. Show that there is no two-point rule that integrates $1$, $s$, $t$, $s^2$, $st$, and $t^2$ exactly on $T_R$. (Hint: Derive the six equations that must be satisfied by the six parameters $s_1, t_1, w_1, s_2, t_2, w_2$, and use algebra to prove that there is no solution.)

3. Let $T$ be the triangle with vertices $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$, and let

$$J = \begin{bmatrix} x_2 - x_1 & x_3 - x_1 \\ y_2 - y_1 & y_3 - y_1 \end{bmatrix}.$$

Prove that $|\det(J)|$ is twice the area of $T$. (Hint: Define the vectors $v_1 = (x_2 - x_1, y_2 - y_1)$, $v_2 = (x_3 - x_1, y_3 - y_1)$, and let $\theta$ be the angle between $v_1$ and $v_2$. Then $v_1 \cdot v_2 = \|v_1\|\|v_2\|\cos(\theta)$ and the area of $T$ is $(1/2)\|v_1\|\|v_2\|\sin(\theta)$. The trigonometric identity $\cos^2(\theta) + \sin^2(\theta) = 1$ and some algebra give the desired result.)

4. Let $T$ be the triangle with vertices $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, and let $\phi_1, \phi_2, \phi_3$ be the standard nodal basis functions that are nonzero on $T$. Let $(\bar{x}, \bar{y})$ be the centroid of $T$. Prove that $\phi_i(\bar{x}, \bar{y}) = 1/3$ for $i = 1, 2, 3$.

5. Using the notation of the previous exercise, show that if $(x, y) \in T$, the values $\phi_1(x, y)$, $\phi_2(x, y)$, $\phi_3(x, y)$ are equal to the barycentric coordinates expressing $(x, y)$ in terms of $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$. Show that solving for the barycentric coordinates of $(\xi_i, \eta_i) \in T$, $i = 1, 2, \ldots, n$, leads directly to the equation $M^T V^T = C^T$ described near the end of Section 7.1.2.

6. (a) Let $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ be the vertices of a triangle $T$. The basis $\{\phi_1, \phi_2, \phi_3\}$ for the shape functions can be determined by computing the inverse of

$$M = \begin{bmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ 1 & x_3 & y_3 \end{bmatrix},$$

as explained in Section 7.1.2. (The gradients $\nabla\phi_i$ are then known.) Count the number of arithmetic operations required to compute $M^{-1}$.

(b) If the reference triangle $T_R$ is employed, it is necessary to compute $\nabla\phi_i = J^{-T}\nabla\gamma_i$, where $\{\gamma_1, \gamma_2, \gamma_3\}$ is the basis for the shape functions on $T_R$ and $J$ is the Jacobian of the transformation from $T_R$ to $T$. Count carefully the number of operations necessary to compute $J^{-T}\nabla\gamma_i$, $i = 1, 2, 3$. (Hint: This is done most efficiently by row reducing the matrix $[J^T \,|\, \nabla\gamma_1 \,|\, \nabla\gamma_2 \,|\, \nabla\gamma_3]$.)

(c) Count the rest of the arithmetic operations required to process one triangle by either approach to assembling the stiffness matrix. What is the overall savings, expressed as a percentage, in using a reference triangle in the computation?

7. Using the results of the previous exercise, give total operation counts for assembling the stiffness matrix, with and without the use of the reference triangle. Express the results in terms of $N_t$ and also $N_v$ (given that $N_v = O(N_t/2)$).

8. Do an operation count for the assembly of the load vector. Express the results in terms of $N_t$ and also $N_v$ (given that $N_v = O(N_t/2)$).

9. Assume that when implementing bilinear quadrilateral elements, all integrals of the form

$$\int_Q \kappa\, \nabla\phi_j \cdot \nabla\phi_i,$$

where $Q$ is an arbitrary quadrilateral, must be computed exactly in the case that $\kappa$ is a constant. Show that the product Gauss rule (7.7) corresponding to $n = 2$ satisfies this requirement. Note: Remember that the above integral is first transformed to the reference square $S_R$. Do not neglect the Jacobian determinant factor in the transformed integral. It may be assumed that this determinant never changes sign.

10. (MATLAB) Test the heuristic suggested in Section 7.4.3 on a problem with a known solution, such as any of Examples 7.1, 7.3, and 7.4. Is $\|u_h - u_{2h}\|_E$ a good estimate of $\|u - u_h\|_E$?


11. (MATLAB) There are two obvious ways to define a uniform triangulation of a rectangle; these are shown in Figure 6.13 and are produced by the MATLAB routines RectangleMeshD1 and RectangleMeshD1a. Choose a test problem with a known solution and determine which kind of mesh is more efficient (that is, which gives a smaller error for a given number of degrees of freedom).

12. (MATLAB) Consider the BVP

where $\Omega$ is the unit circle and $g$ is defined by

Here $\theta$ is the polar angle corresponding to $(x, y)$ on the unit circle ($(x, y) = (\cos(\theta), \sin(\theta))$). Using piecewise linear finite elements, try to compute the solution to this BVP to within 5% in the energy norm.

13. (MATLAB) Repeat the previous exercise, replacing the Dirichlet data $g$ with

Which problem is easier to solve to the given accuracy? Why?

14. (Programming) Write a MATLAB routine Solve1 to apply piecewise linear finite elements with uniformly refined meshes to solve a given BVP to within a given tolerance. Use the heuristic from Section 7.4.3 to decide when the solution is accurate enough. Also include a limit on the number of refinements or the total number of triangles so that the routine will stop in a reasonable amount of time, even if an unrealistically small tolerance is given.

15. Consider the following BVP:

where

is a partition of

and

on

(a) Derive the variational form of the BVP (cf. Exercise 2.7.12; notice the inhomogeneous Dirichlet conditions here). (b) Derive the formula for computing the load vector F from the Galerkin formulation (cf. Exercise 4.8.17).


16. (Programming) Write a MATLAB routine Mass1, similar to Stiffness1, for computing the mass matrix $M \in \mathbf{R}^{N_f \times N_f}$ defined by

$$M_{ij} = \int_\Omega \phi_j \phi_i, \quad i, j = 1, 2, \ldots, N_f$$

(cf. Exercise 4.8.17, where $M$ was introduced). Use a quadrature rule having degree of precision 2.

17. (Programming) Extend Load1 so that it computes the load vector for the BVP from Exercise 15.

18. (Programming) Rewrite Stiffness1 and Load1 to perform all computations over the reference triangle $T_R$. Compare the performance of the new version with the old.

19. (Programming project) Write a collection of MATLAB routines that completely implements bilinear quadrilateral finite elements. This will involve modifying the mesh data structure to describe quadrilateral elements and rewriting the routines Stiffness1 and Load1, as well as the routines they invoke. As discussed in Section 4.5, it is necessary to perform all integrations over a reference square.

Chapter 8

## Lagrange triangles of arbitrary degree

This chapter extends the algorithms and codes of the previous chapters to Lagrange triangles of arbitrary degree, and also to isoparametric finite elements. The presentation continues to address the model problem

More general BVPs are considered in the next chapter. A few simple changes allow the mesh data structure to describe a mesh of Lagrange triangles of degree $d \ge 1$. Each edge now contains $d + 1$ nodes, so the Edges array will be $N_e \times (d + 1)$, with row $i$ listing the (indices of the) nodes of edge $e_i$ in order. In addition to the edge nodes, if $d \ge 3$, each triangle contains interior nodes. These (or their indices) will be listed in the IntNodes array. IntNodes has $N_t$ rows and $(d - 2)(d - 1)/2$ columns; row $k$ contains the indices of the interior nodes belonging to $T_k$.

## 8.1 Quadrature for higher-order elements

When using piecewise polynomials of degree greater than one, it is necessary to use a quadrature rule with a sufficiently high degree of precision, or else the finite element method may break down. (For example, the stiffness matrix can be singular if it is computed with a quadrature rule that is not sufficiently accurate.) If $\phi_i$ and $\phi_j$, restricted to a triangle $T$, are polynomials of degree $d$, then $\nabla\phi_j \cdot \nabla\phi_i$ has degree $2d - 2$. If the quadrature rule used to compute integrals of the form

$$\int_T \kappa\, \nabla\phi_j \cdot \nabla\phi_i$$

has degree of precision $2d - 2$, then the integrals are computed exactly in the case that $\kappa$ is constant. Moreover, as explained in Section 5.5, such a quadrature rule is sufficiently accurate even if $\kappa$ is not constant.

| $p$ | $n$ | $w$ | $\xi_1$ | $\xi_2$ | $\xi_3$ |
|---|---|---|---|---|---|
| 1 | 1 | 1.000000000000000 | 0.333333333333333 | 0.333333333333333 | 0.333333333333333 |
| 2 | 3 | 0.333333333333333 | 0.666666666666667 | 0.166666666666667 | 0.166666666666667 |
| 3 | 1 | -0.562500000000000 | 0.333333333333333 | 0.333333333333333 | 0.333333333333333 |
|   | 3 | 0.520833333333333 | 0.600000000000000 | 0.200000000000000 | 0.200000000000000 |
| 4 | 3 | 0.223381589678011 | 0.108103018168070 | 0.445948490915965 | 0.445948490915965 |
|   | 3 | 0.109951743655322 | 0.816847572980459 | 0.091576213509771 | 0.091576213509771 |
| 5 | 1 | 0.225000000000000 | 0.333333333333333 | 0.333333333333333 | 0.333333333333333 |
|   | 3 | 0.132394152788506 | 0.059715871789770 | 0.470142064105115 | 0.470142064105115 |
|   | 3 | 0.125939180544827 | 0.797426985353087 | 0.101286507323456 | 0.101286507323456 |
| 6 | 3 | 0.116786275726379 | 0.501426509658179 | 0.249286745170910 | 0.249286745170910 |
|   | 3 | 0.050844906370207 | 0.873821971016996 | 0.063089014491502 | 0.063089014491502 |
|   | 6 | 0.082851075618374 | 0.053145049844817 | 0.310352451033784 | 0.636502499121399 |

Table 8.1. Symmetric Gaussian quadrature rules (Dunavant [18]). The degree of precision is $p$, the quadrature weight for each point is $w$, and $n$ permutations of each point of the form $(\xi_1, \xi_2, \xi_3)$ appear in the formula, each with weight $w$.

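Dunavant [18] has derived quadrature rules of degree of precision up to 20. These rules have a certain symmetry, namely, if one of the quadrature nodes is $(\xi_1, \xi_1, \xi_2)$ (in barycentric coordinates), then the following three points are all quadrature nodes, and with the same weight:

$$(\xi_1, \xi_1, \xi_2), \quad (\xi_1, \xi_2, \xi_1), \quad (\xi_2, \xi_1, \xi_1).$$

Similarly, if $(\xi_1, \xi_2, \xi_3)$ is one quadrature node, then the following six nodes are all quadrature nodes, and with the same weight:

$$(\xi_1, \xi_2, \xi_3), \quad (\xi_1, \xi_3, \xi_2), \quad (\xi_2, \xi_1, \xi_3), \quad (\xi_2, \xi_3, \xi_1), \quad (\xi_3, \xi_1, \xi_2), \quad (\xi_3, \xi_2, \xi_1).$$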
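The permutation symmetry just described can be illustrated with a short sketch. It is written in Python rather than MATLAB and is not the book's DunavantData routine; the names `RULE_P2`, `expand_rule`, and `integrate_on_reference` are hypothetical. It assumes the weights are normalized to sum to 1 (as the values in Table 8.1 do), so the sum is multiplied by the area 1/2 of the reference triangle with vertices (0,0), (1,0), (0,1).

```python
from itertools import permutations

# Degree-2 rule taken from Table 8.1, stored as (n, w, (xi1, xi2, xi3)).
RULE_P2 = [(3, 1.0 / 3.0, (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0))]


def expand_rule(rule):
    """Return (weight, barycentric point) pairs, one per distinct permutation."""
    nodes = []
    for n, w, xi in rule:
        orbit = sorted(set(permutations(xi)))
        # n in the table should equal the number of distinct permutations
        assert len(orbit) == n
        nodes.extend((w, p) for p in orbit)
    return nodes


def integrate_on_reference(f, rule):
    """Estimate the integral of f(s, t) over the reference triangle."""
    area = 0.5
    total = 0.0
    for w, (l1, l2, l3) in expand_rule(rule):
        # barycentric point l1*(0,0) + l2*(1,0) + l3*(0,1) has (s, t) = (l2, l3)
        total += w * f(l2, l3)
    return area * total
```

For instance, `integrate_on_reference(lambda s, t: s * s, RULE_P2)` reproduces the exact value 1/12 up to rounding, as expected for a rule with degree of precision 2.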

Table 8.1 gives quadrature rules having degree of precision up to 6. In assembling the stiffness matrix, the element stiffness matrices are computed on each triangle $T$ and added to the global stiffness matrix. There are two ways this can be done: Estimate

$$\int_T \kappa\, \nabla\phi_j \cdot \nabla\phi_i \tag{8.2}$$

by the appropriate quadrature rule over $T$, or transform the integral to the reference triangle $T_R$ and then estimate it. For a generic integral

$$\int_T f,$$

there is really not much difference between the two approaches. However, since the integrand in (8.2) has such a special form, there is, as it turns out, a great advantage in using the reference triangle.


I will begin by showing how to compute the integrals without using the reference triangle and estimate the work necessary to do so. In order to estimate (8.2) directly, it is necessary to know the values of $\nabla\phi_i$ and $\nabla\phi_j$ at the quadrature nodes. I will assume that $i$ and $j$ are local indices, so each of $\phi_i$ and $\phi_j$ has value 1 at one of the nodes on $T$ and value 0 at all the other nodes. If each $\phi_i$ is written as a linear combination of the $i_d$ monomials of degree at most $d$ in $x$ and $y$, and if $(x_k, y_k)$, $k = 1, 2, \ldots, i_d$, are the nodes on $T$, then the conditions defining $\phi_1, \phi_2, \ldots, \phi_{i_d}$ on $T$ can be written as the matrix equation $MA = I$, where $I$ is the identity matrix, $M$ is the $i_d \times i_d$ matrix whose $k$th row contains the monomials evaluated at $(x_k, y_k)$, and $A$ is the matrix of coefficients of $\phi_1, \phi_2, \ldots, \phi_{i_d}$ (one column per basis function). (The system $MA = I$ represents $i_d$ systems of $i_d$ equations in $i_d$ unknowns; each system has the same coefficient matrix.) Therefore, all of the local basis functions can be computed simultaneously by computing $A = M^{-1}$.

In addition to the mesh nodes $(x_1, y_1), (x_2, y_2), \ldots, (x_{i_d}, y_{i_d})$, there are $n$ quadrature nodes on $T$, say $(\xi_k, \eta_k)$, $k = 1, 2, \ldots, n$. The values of the basis functions at these quadrature nodes form a matrix $V \in \mathbf{R}^{n \times i_d}$, $V_{ij} = \phi_j(\xi_i, \eta_i)$, which can be computed by the formula $V = CA$, where $C \in \mathbf{R}^{n \times i_d}$ is the matrix whose $i$th row contains the same monomials evaluated at $(\xi_i, \eta_i)$.

If the values of the basis functions are needed, there is no reason to compute the coefficients $A = M^{-1}$ explicitly; instead, $V$ can be computed directly by solving

$$M^T V^T = C^T.$$

The cost of solving such a matrix equation, where $M$ is $i_d \times i_d$ and $C$ is $n \times i_d$, is approximately

arithmetic operations.

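The gradients of the local basis functions can be computed similarly. I will write $C_\xi$ and $C_\eta$ for the $n \times i_d$ matrices obtained by differentiating the columns of $C$ with respect to $\xi$ and $\eta$, respectively. Then, if $V_x$, $V_y$ are defined by

$$(V_x)_{ij} = \frac{\partial\phi_j}{\partial x}(\xi_i, \eta_i), \quad (V_y)_{ij} = \frac{\partial\phi_j}{\partial y}(\xi_i, \eta_i),$$

it follows that $V_x = C_\xi A$ and $V_y = C_\eta A$. The matrices $V_x$ and $V_y$ can be efficiently computed by solving the matrix equation

$$M^T \left[ V_x^T \,|\, V_y^T \right] = \left[ C_\xi^T \,|\, C_\eta^T \right].$$

The cost of solving this system, in which $M$ is $i_d \times i_d$ and $[C_\xi^T \,|\, C_\eta^T]$ is $i_d \times (2n)$, is approximately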

arithmetic operations. Having computed the local basis functions (or their gradients), the integrals are easy to estimate. If $w_k$, $k = 1, 2, \ldots, n$, are the quadrature weights over $T$ and $(\xi_k, \eta_k)$, $k = 1, 2, \ldots, n$, are the corresponding quadrature nodes (as noted above), then

$$\int_T f\phi_i \approx \sum_{k=1}^{n} w_k f(\xi_k, \eta_k)\, \phi_i(\xi_k, \eta_k).$$

It is convenient to define the vector $f \in \mathbf{R}^n$ by $f_k = w_k f(\xi_k, \eta_k)$ (since these products are needed for each $i = 1, 2, \ldots, i_d$). Then

$$\int_T f\phi_i \approx \sum_{k=1}^{n} V_{ki} f_k = (V^T f)_i,$$

so all the integrals over $T$ can be estimated by the matrix-vector product $V^T f$, which involves $2ni_d$ arithmetic operations.
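The direct (no reference triangle) computation can be sketched for the simplest case $d = 1$. This is a Python illustration, not the book's Load2 routine: the helper names `solve`, `monomials`, and `element_load` are hypothetical, the monomial basis $\{1, x, y\}$ is an assumed ordering, and the small Gaussian elimination routine stands in for a library solver. The matrix $V^T$ is obtained by solving $M^T V^T = C^T$, and the element load vector is the product $V^T f$.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(m):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [[aug[i][m + j] / aug[i][i] for j in range(len(B[0]))] for i in range(m)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def monomials(x, y):
    """Monomials for d = 1; a higher degree would append x*x, x*y, y*y, ..."""
    return [1.0, x, y]


def element_load(nodes, quad_pts, quad_wts, f):
    """Estimate the integrals of f*phi_i over T as the product V^T fvec."""
    M = [monomials(x, y) for x, y in nodes]            # i_d x i_d
    C = [monomials(xi, eta) for xi, eta in quad_pts]   # n x i_d
    Vt = solve(transpose(M), transpose(C))             # M^T V^T = C^T
    fvec = [w * f(xi, eta) for w, (xi, eta) in zip(quad_wts, quad_pts)]
    return [sum(v * fk for v, fk in zip(row, fvec)) for row in Vt]
```

With the degree-2 rule of Table 8.1 mapped to a triangle of area 1 and $f \equiv 1$, each entry of the result is one third of the area, since the three linear basis functions sum to 1 and each has the same integral.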

Similarly,

$$\int_T \kappa\, \nabla\phi_j \cdot \nabla\phi_i \approx \sum_{k=1}^{n} g_k \left[ (V_x)_{ki} (V_x)_{kj} + (V_y)_{ki} (V_y)_{kj} \right],$$

where $g_k = w_k \kappa(\xi_k, \eta_k)$. The cost of the above sum is $4n$ operations. Since there are $i_d(i_d + 1)/2$ integrals of the form $\int_T \kappa\, \nabla\phi_j \cdot \nabla\phi_i$ over $T$, the total cost of the integrals for the element stiffness matrix is about $2ni_d^2$. The above results show that the cost of computing the element stiffness matrix is about

operations. I have ignored some of the necessary calculations, such as computing the quadrature weights and nodes on $T$, but these cost little compared to (8.4). The cost of computing the element load vector is about

operations. This total comes from computing the matrix $V$ (that is, the values of the local basis functions); the additional cost of computing $V^T f$, $2ni_d$ operations, is negligible by comparison. In obtaining the totals (8.4) and (8.5), I have ignored the cost of computing the values of $\kappa$ and $f$ at the quadrature nodes. Depending on how complicated these functions are, the cost of computing them might be much less or much greater than the costs counted above.

In light of the above discussion, it is easy to see the advantage of using a reference triangle. The values and gradients of the local basis functions on $T_R$, $\gamma_1, \gamma_2, \ldots, \gamma_{i_d}$, can be computed once and saved. Then, given any triangle $T$,

Here $w_k^{(T)}$, $(\xi_k^{(T)}, \eta_k^{(T)})$, $k = 1, 2, \ldots, n$, are the quadrature weights and nodes on $T$ and $w_k$, $(\xi_k, \eta_k)$, $k = 1, 2, \ldots, n$, are the quadrature weights and nodes on $T_R$. The scalar $A$ is the area of $T$. The dominant cost of computing the element load vector over $T$, namely the cost of computing the matrix $V$ of values of the basis functions, has now been eliminated. Or, to be precise, the same computation must now be performed only once (for $T_R$), so that the number of operations per triangle in (8.5) has been replaced by the negligible cost of


operations per triangle. The other costs remain the same (the quadrature nodes still must be computed over $T$, the values of $f$ at the quadrature nodes of $T$ must still be obtained, etc.). The overall result is to greatly decrease the cost of assembling the load vector. When using the reference triangle to assemble the stiffness matrix, the formula is

Instead of computing the matrices $V_x$, $V_y$ described above, this approach requires the computation of $J^{-T}\nabla\gamma_i(\xi_k, \eta_k)$ for $k = 1, 2, \ldots, n$ and $i = 1, 2, \ldots, i_d$. This requires about $6ni_d$ operations (see Exercise 1). The other operation counts remain the same. Therefore, the cost of computing the element stiffness matrix for each triangle, using the reference triangle approach, is about

This is again a considerable reduction over the first approach described above, though not as dramatic as for the load vector. The approximate operation counts given above, such as (8.3), are valid for $i_d$ large. Since $i_d$ is not particularly large when $d$ is small (for example, $i_2 = 6$, $i_3 = 10$), the specific formulas should not be regarded as precise. Nevertheless, the overall conclusion is valid, even for $d = 2$: There is a significant gain in efficiency in using the reference triangle approach.
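A minimal Python sketch of the reference triangle approach, restricted to linear elements so that the reference gradients are constant, may make the role of $J^{-T}\nabla\gamma_i$ concrete. It is an illustration under stated assumptions, not the book's Stiffness2 code: the names `RULE`, `REF_GRADS`, and `element_stiffness_linear` are hypothetical, and the quadrature weights are assumed to be normalized to sum to 1 (as in Table 8.1), so the sum is scaled by the area of $T$.

```python
# Degree-2 rule from Table 8.1, expanded into (weight, barycentric point) pairs.
RULE = [(1 / 3, (2 / 3, 1 / 6, 1 / 6)),
        (1 / 3, (1 / 6, 2 / 3, 1 / 6)),
        (1 / 3, (1 / 6, 1 / 6, 2 / 3))]

# (d/ds, d/dt) of gamma_1 = 1 - s - t, gamma_2 = s, gamma_3 = t on T_R.
REF_GRADS = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]


def element_stiffness_linear(verts, kappa):
    """Element stiffness matrix on T for linear elements via the reference triangle."""
    (x1, y1), (x2, y2), (x3, y3) = verts
    a, b = x2 - x1, x3 - x1      # J = [[a, b], [c, e]] maps T_R onto T
    c, e = y2 - y1, y3 - y1
    det = a * e - b * c
    area = abs(det) / 2.0
    # J^{-T} = (1/det) [[e, -c], [-b, a]] applied to each reference gradient
    grads = [((e * gs - c * gt) / det, (-b * gs + a * gt) / det)
             for gs, gt in REF_GRADS]
    # weighted average of kappa over the quadrature nodes on T
    kbar = 0.0
    for w, lam in RULE:
        x = sum(l * v[0] for l, v in zip(lam, verts))
        y = sum(l * v[1] for l, v in zip(lam, verts))
        kbar += w * kappa(x, y)
    return [[area * kbar * (gi[0] * gj[0] + gi[1] * gj[1]) for gj in grads]
            for gi in grads]
```

For higher degrees the reference gradients vary from one quadrature node to the next, so the product with $J^{-T}$ moves inside the quadrature loop; the structure is otherwise the same.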

## 8.2 Assembling the stiffness matrix and load vector

Most of the necessary details for assembling the stiffness matrix and load vector were described in the preceding section. A remaining detail is the definition of the interpolation nodes on the triangular elements, specifically when there are interior nodes ($d > 2$). Lagrange triangles of degree $d = 4, 5, 6$ were shown in Figure 4.16. Using barycentric coordinates, the nodes are easily seen to be

$$\left( \frac{i}{d}, \frac{j}{d}, \frac{k}{d} \right), \quad i, j, k \ge 0, \quad i + j + k = d.$$

The interior nodes are those with $i, j, k \ge 1$.

The interior nodes on the reference triangle $T_R$ are (in Cartesian coordinates)

$$\left( \frac{j}{d}, \frac{i}{d} \right), \quad i = 1, 2, \ldots, d - 2, \quad j = 1, 2, \ldots, d - 1 - i.$$
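The node sets can be enumerated in a few lines. The following Python sketch is only an illustration (the function names are hypothetical and the ordering of the nodes is an assumption, not the ordering used in the book's mesh data structure); it assumes the standard equally spaced Lagrange nodes, whose barycentric coordinates are multiples of $1/d$.

```python
def lagrange_nodes_barycentric(d):
    """All nodes of a degree-d Lagrange triangle as triples (i/d, j/d, k/d)."""
    return [(i / d, j / d, (d - i - j) / d)
            for i in range(d + 1) for j in range(d + 1 - i)]


def interior_nodes_reference(d):
    """Interior nodes of T_R as Cartesian pairs (j/d, i/d), i, j >= 1, i + j <= d - 1."""
    return [(j / d, i / d)
            for i in range(1, d - 1) for j in range(1, d - i)]
```

The counts agree with the text: there are $(d+1)(d+2)/2$ nodes in all (6 for $d = 2$, 10 for $d = 3$) and $(d-2)(d-1)/2$ interior nodes, matching the number of columns of the IntNodes array.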


Because the various integrals are computed over $T_R$ rather than over each triangle $T_k$, the algorithms for assembling the stiffness matrix and load vector change somewhat in outline from Algorithms 7.2 and 7.5. The gradients of the nodal basis functions $\gamma_1, \gamma_2, \ldots, \gamma_{i_d}$, evaluated at the quadrature nodes on $T_R$, are computed as described in the previous section and stored in two matrices:

Inhomogeneous Dirichlet conditions lead to integrals of the form

which are handled exactly as in Algorithm 8.1. The part of Algorithm 8.2 that handles $g$ should also be understandable at this point. Inhomogeneous Neumann conditions lead to integrals of the form

$$\int_e h\phi_i, \tag{8.9}$$

where $e$ is a free boundary edge and $\phi_i$ is one of the local basis functions on the triangle to which $e$ belongs. When using Lagrange triangles of degree $d$, $e$ contains $d + 1$ nodes. I will assume that $h$ is not provided exactly, but rather that the values of $h$ at the nodes of each free boundary edge are given as input to the routine that assembles the load vector. Then $h$, restricted to $e$, will be replaced with the polynomial $h_e$ (in one variable) interpolating $h$ at the $d + 1$ nodes on $e$, and (8.9) will be estimated by

$$\int_e h_e\phi_i. \tag{8.10}$$

The local basis function $\phi_i$, restricted to $e$, reduces to a polynomial in one variable of degree $d$. Therefore, the integrand of (8.10) is a polynomial of degree $2d$, and (8.10) can be computed exactly by the $(d + 1)$-point Gaussian quadrature rule introduced in Section 7.1.1. To transform (8.10) to the reference interval $[-1, 1]$, $e$ can be parametrized by arc length: $e = \{(x(s), y(s)) : 0 \le s \le \ell\}$, where $\ell$ is the length of the line segment $e$. Then, since $[-1, 1]$ is mapped onto $[0, \ell]$ by $s = \ell(t + 1)/2$, there exist polynomials $\hat{h}_e(t)$, $\hat\gamma_i(t)$ such that

$$h_e(x(s), y(s)) = \hat{h}_e(t), \quad \phi_i(x(s), y(s)) = \hat\gamma_i(t).$$

Then

$$\int_e h_e\phi_i = \frac{\ell}{2} \int_{-1}^{1} \hat{h}_e(t)\, \hat\gamma_i(t)\, dt = \frac{\ell}{2} \sum_{j=1}^{d+1} w_j\, \hat{h}_e(t_j)\, \hat\gamma_i(t_j),$$

where $w_j$, $t_j$, $j = 1, 2, \ldots, d + 1$, are the weights and nodes of the $(d + 1)$-point Gauss quadrature rule on $[-1, 1]$. Finally, the functions $\hat\gamma_i$, $i = 1, 2, \ldots, d + 1$, are the one-dimensional nodal basis functions on $[-1, 1]$ (analogous to $\gamma_i$, $i = 1, 2, \ldots, i_d$, on the triangle $T_R$); each is a polynomial of degree $d$ defined by

$$\hat\gamma_i(\tau_j) = \begin{cases} 1, & j = i, \\ 0, & j \ne i, \end{cases}$$


where $\tau_j = -1 + 2(j - 1)/d$, $j = 1, 2, \ldots, d + 1$. Any polynomial of degree $d$ can be expressed in terms of $\hat\gamma_1, \hat\gamma_2, \ldots, \hat\gamma_{d+1}$, and thus the values of these basis functions at the quadrature nodes can be computed once and for all. Thus the reference interval $[-1, 1]$ is used in a fashion exactly analogous to how the reference triangle $T_R$ is used when the stiffness matrix and load vector are assembled. The remaining details are left to the reader (see Exercise 3). The complete algorithm for assembling the load vector is given in Algorithm 8.2.
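The edge computation can be sketched as follows. This Python illustration is not the book's EvalNodalBasisFcns1D or GaussData code; the names `nodal_basis_1d`, `GAUSS3`, and `edge_load` are hypothetical, and only the 3-point Gauss rule (enough for $d = 2$) is hard-coded. It assumes a straight edge of length $\ell$, so the change of variables $s = \ell(t+1)/2$ contributes the factor $\ell/2$.

```python
import math


def nodal_basis_1d(d, t):
    """Values at t of the d+1 Lagrange basis polynomials for the equally
    spaced nodes -1, -1 + 2/d, ..., 1 on the reference interval."""
    tau = [-1.0 + 2.0 * j / d for j in range(d + 1)]
    vals = []
    for i in range(d + 1):
        v = 1.0
        for j in range(d + 1):
            if j != i:
                v *= (t - tau[j]) / (tau[i] - tau[j])
        vals.append(v)
    return vals


# 3-point Gauss rule on [-1, 1] as (weight, node) pairs; exact through degree 5.
GAUSS3 = [(5.0 / 9.0, -math.sqrt(0.6)), (8.0 / 9.0, 0.0), (5.0 / 9.0, math.sqrt(0.6))]


def edge_load(h_nodal, length, d=2, rule=GAUSS3):
    """Estimate the integral over e of h_e times each of the d+1 edge basis
    functions, where h_e interpolates the nodal values h_nodal."""
    out = [0.0] * (d + 1)
    for w, t in rule:
        g = nodal_basis_1d(d, t)
        h_t = sum(hk * gk for hk, gk in zip(h_nodal, g))  # interpolant at t
        for i in range(d + 1):
            out[i] += 0.5 * length * w * h_t * g[i]
    return out
```

With $h \equiv 1$ on an edge of length 2 and $d = 2$, the three values are the Simpson weights 1/3, 4/3, 1/3, which is a convenient sanity check.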

## 8.3 Implementing the isoparametric method

The isoparametric method for solving a BVP on a nonpolygonal domain was described in Section 4.7. If T is an element next to the boundary, then T is defined to be the image of the reference triangle under a transformation of the form

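where $p$ and $q$ are polynomials of degree $d$. The reader will recall that "isoparametric" means that $d$ is also the degree of the shape functions. As shown in Section 4.7, the transformation can be represented explicitly as

$$x = \sum_{i=1}^{i_d} x_i \gamma_i(s, t), \quad y = \sum_{i=1}^{i_d} y_i \gamma_i(s, t), \tag{8.13}$$

where $(x_1, y_1), (x_2, y_2), \ldots, (x_{i_d}, y_{i_d})$ are the nodes on the element $T$ and $\gamma_1, \gamma_2, \ldots, \gamma_{i_d}$ are the nodal basis functions on the reference triangle $T_R$. The basis functions on $T$ are then defined by $\phi_i(x, y) = \gamma_i(s, t)$, where $(s, t)$ and $(x, y)$ are related by (8.13). Since $\phi_1, \phi_2, \ldots, \phi_{i_d}$ are not polynomials, it is essential to compute

$$\int_T f\phi_i, \quad \int_T \kappa\, \nabla\phi_j \cdot \nabla\phi_i$$

by transforming the integrals to $T_R$:

$$\int_T f\phi_i = \int_{T_R} \tilde{f}\, \gamma_i\, |\det(J)|, \quad \int_T \kappa\, \nabla\phi_j \cdot \nabla\phi_i = \int_{T_R} \tilde\kappa\, (J^{-T}\nabla\gamma_j) \cdot (J^{-T}\nabla\gamma_i)\, |\det(J)|.$$

When implementing these formulas, it is important to bear in mind that the Jacobian matrix $J$ is not constant.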


Using an $n$-point quadrature rule with degree of precision $2d - 2$ results in the following formulas:

Here $w_k$, $(\xi_k, \eta_k)$, $k = 1, 2, \ldots, n$, are the quadrature weights and nodes on $T_R$, and $\tilde{f}$, $\tilde\kappa$ are the functions $f$, $\kappa$ transformed from $T$ to $T_R$. These formulas can be implemented using the matrices $V$, $V_s$, $V_t$, as in the previous section. The only novelties are the computation of the quadrature nodes on an arbitrary $T$ and the computation of the nonconstant Jacobians. I will write $(\xi_i^{(T)}, \eta_i^{(T)})$ for the quadrature node on $T$ corresponding to $(\xi_i, \eta_i)$. In light of the correspondence (8.13), $(\xi_i^{(T)}, \eta_i^{(T)})$ is given by

$$\xi_i^{(T)} = \sum_{j=1}^{i_d} x_j \gamma_j(\xi_i, \eta_i), \quad \eta_i^{(T)} = \sum_{j=1}^{i_d} y_j \gamma_j(\xi_i, \eta_i).$$

Therefore, the matrix-vector products $Vx$, $Vy$, where

$$V_{ij} = \gamma_j(\xi_i, \eta_i),$$

yield the coordinates of the quadrature nodes on $T$. (I am abusing notation by using the same symbol for the variable $x$ and the vector whose components are $x_1, x_2, \ldots, x_{i_d}$, and similarly for $y$.) The values

and hence


The other partial derivatives needed to compute $J(\xi_i, \eta_i)$ can be computed similarly. All of the needed values of $J$ can thus be computed from four matrix-vector products: $V_s x$, $V_t x$, $V_s y$, $V_t y$. The above discussion covers the assembly of $K$ and the contribution of the function $f$ (the right-hand side of the PDE) to the load vector. Computation of the contribution of nonzero Dirichlet data to the load vector parallels the computation of entries in the stiffness matrix, and the details are left to the reader. If the BVP includes inhomogeneous Neumann conditions, then integrals of the form
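To make the nonconstant Jacobian concrete, here is a hedged Python sketch for quadratic ($d = 2$) isoparametric triangles. It is not the book's EvalNodalBasisGrads or TransToRefTri code; the function names are hypothetical, and the node ordering (three vertices, then the midpoints of the edges (0,0)-(1,0), (1,0)-(0,1), and (0,1)-(0,0)) is an assumption made for the illustration. Each sum in `jacobian` is one row of the products $V_s x$, $V_t x$, $V_s y$, $V_t y$.

```python
def quadratic_basis_grads(s, t):
    """(d/ds, d/dt) of the six quadratic nodal basis functions on T_R at (s, t)."""
    l1, l2, l3 = 1.0 - s - t, s, t
    return [
        (-(4 * l1 - 1), -(4 * l1 - 1)),   # vertex (0, 0)
        (4 * l2 - 1, 0.0),                # vertex (1, 0)
        (0.0, 4 * l3 - 1),                # vertex (0, 1)
        (4 * (l1 - l2), -4 * l2),         # midpoint (1/2, 0)
        (4 * l3, 4 * l2),                 # midpoint (1/2, 1/2)
        (-4 * l3, 4 * (l1 - l3)),         # midpoint (0, 1/2)
    ]


def jacobian(nodes, s, t):
    """Jacobian of the isoparametric map at (s, t) for the six element nodes."""
    g = quadratic_basis_grads(s, t)
    xs = sum(x * gs for (x, _), (gs, _) in zip(nodes, g))
    xt = sum(x * gt for (x, _), (_, gt) in zip(nodes, g))
    ys = sum(y * gs for (_, y), (gs, _) in zip(nodes, g))
    yt = sum(y * gt for (_, y), (_, gt) in zip(nodes, g))
    return [[xs, xt], [ys, yt]]


def det2(J):
    return J[0][0] * J[1][1] - J[0][1] * J[1][0]
```

When the midside nodes sit at the edge midpoints of a straight triangle, the map is affine and `jacobian` returns the same matrix at every point; moving the node on the second edge off the chord makes the determinant vary with $(s, t)$.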

must be computed, where $e$ is an edge of triangle $T$. I adopt the convention that if $T$ contains a curved edge, then it must be the second edge. It then corresponds to the second edge of $T_R$, $\{(1 - s, s) : 0 \le s \le 1\}$, and can therefore be parametrized as

$$e = \{(a(s), b(s)) : 0 \le s \le 1\},$$

where

$$a(s) = p(1 - s, s), \quad b(s) = q(1 - s, s),$$

and $(p, q)$ defines the transformation from $T_R$ to $T$ (see (8.12) and (8.13)). Then

(using the element of arc length to write the line integral). An $n$-point (one-dimensional) Gauss quadrature rule can be used to estimate the integral:

(rather than invent new notation, I now use $w_i$, $t_i$ to denote the weights and nodes from the one-dimensional Gauss quadrature rule). Evaluation of

now follows a familiar pattern. The matrices $\hat{V}$, $\hat{V}_s$, and $\hat{V}_t$ are defined by

These matrices can be computed in the same fashion as $V$, $V_s$, and $V_t$ (see Section 8.1). Then


The last two formulas follow from the chain rule applied to $a(s) = p(1 - s, s)$ and $b(s) = q(1 - s, s)$. If the function $h$ is not given explicitly (so that $h(a(t_i), b(t_i))$ cannot be evaluated directly), then an interpolating polynomial can be defined to approximate $h$, as was described in the previous section. This is the approach taken in the MATLAB code.

## 8.3.1 Placement of nodes in the isoparametric method

A subtlety that arises in implementing the isoparametric method is the definition of the nodes interior to the elements having curved edges. Formula (8.8), which defines the interior nodes for an ordinary triangle, gives the nodes in terms of the vertices of the triangle. When the element has a curved edge, this formula is no longer adequate; the interior nodes must move with the curved edge. This is necessary because the transformation from $T_R$ to $T$ is defined by the nodal placements, and the integration theory requires that the Jacobian determinant of this transformation not change sign. The convergence theory imposes even more stringent conditions, such as uniform bounds, on the norms of the Jacobians and their inverses. Ciarlet and Raviart [15] derived the basic theory that governs the placement of the nodes, and Scott [38] devised an algorithm for computing nodes on triangular meshes that satisfy the required conditions. This algorithm was later extended by Lenoir [27] to higher-dimensional problems.

I will now describe Scott's algorithm. Given an element with a curved boundary, there are actually three related elements (plus the reference element $T_R$) involved. As in Section 4.7, I denote by $\omega$ a subregion of $\Omega$ that includes part of the (curved) boundary of $\Omega$. The ordinary triangle approximating $\omega$ (see Figure 4.20) will be denoted by $\hat{T}$, while the isoparametric triangle approximating $\omega$ will be denoted by $T$. The curved part of the boundary of $\omega$ will be denoted by $\Gamma_\omega$, which is parametrized as

$$\Gamma_\omega = \{(a(s), b(s)) : 0 \le s \le 1\}.$$

The element $T$ is obtained by approximating $\Gamma_\omega$ by a polynomial curve

$$\{(\tilde{a}(s), \tilde{b}(s)) : 0 \le s \le 1\},$$

where $\tilde{a}$, $\tilde{b}$ are polynomials of degree $d$ that interpolate $a$ and $b$ at $d + 1$ nodes on $\Gamma_\omega$, namely, at the points

$$(a(i/d), b(i/d)), \quad i = 0, 1, \ldots, d.$$

The reference triangle $T_R$ can be mapped to $\hat{T}$ by a linear mapping $F$. By convention, the curved edge of $T$ must be the second edge, which corresponds to the second edge of $T_R$:

$$\{(1 - s, s) : 0 \le s \le 1\}.$$

Thus the second edge of $\hat{T}$ is $\{F(1 - s, s) : 0 \le s \le 1\}$. The interior nodes of $T_R$ lie on (horizontal) lines joining a node $(0, i/d)$ on the third edge of $T_R$ with a node $(1 - i/d, i/d)$ on the second edge of $T_R$. The linear mapping sends these nodes to $d - i - 1$ nodes on the line joining the node $F(0, i/d)$ on the third edge of $T$ with the node $F(1 - i/d, i/d)$ on the second edge of $\hat{T}$. The Scott algorithm merely moves these nodes to be evenly spaced between $F(0, i/d)$ (which is a node on the third edge of $T$) and $(a(i/d), b(i/d))$ (the node on the second edge of $T$ corresponding to $F(1 - i/d, i/d)$). Thus the interior nodes are

$$F(0, i/d) + \frac{j}{d - i} \left[ (a(i/d), b(i/d)) - F(0, i/d) \right]$$

for $i = 1, 2, \ldots, d - 2$, $j = 1, 2, \ldots, d - 1 - i$. An example is shown in Figure 8.1.

Figure 8.1. The reference triangle $T_R$ of degree 5, and a corresponding isoparametric "triangle."
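The placement rule can be sketched in Python as follows. This is an illustration of the "evenly spaced between the third-edge node and the curved-edge node" description above, not Scott's published algorithm in full and not the book's MATLAB code. The function name and its arguments are hypothetical: `F` is assumed to map $(s, t)$ on $T_R$ to the straight triangle, and `a`, `b` are assumed to parametrize the curved second edge on $[0, 1]$.

```python
def scott_interior_nodes(d, F, a, b):
    """Interior nodes of an element whose second edge is the curve (a(s), b(s))."""
    nodes = []
    for i in range(1, d - 1):              # i = 1, ..., d-2
        x0, y0 = F(0.0, i / d)             # node on the (straight) third edge
        x1, y1 = a(i / d), b(i / d)        # matching node on the curved second edge
        for j in range(1, d - i):          # j = 1, ..., d-1-i
            theta = j / (d - i)            # even spacing along the segment
            nodes.append((x0 + theta * (x1 - x0), y0 + theta * (y1 - y0)))
    return nodes
```

As a consistency check, if the "curved" edge is actually the straight second edge of $T_R$ and `F` is the identity, the routine returns the usual interior nodes $(j/d, i/d)$ of the reference triangle.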

## 8.4 Examples

To illustrate the use of higher-order Lagrange triangles, I will repeat Example 7.3 from Section 7.4, but now using Lagrange triangles of increasing order.

EXAMPLE 8.1. In this example, $\Omega$ is a region bounded by two squares (see Figure 7.2). The BVP is

where $f$ and $g$ are chosen so that the exact solution is $u(x, y) = x/(x^2 + y^2)$. The initial mesh is defined by 16 triangles and 16 nodes; the initial mesh and three refinements were shown in Figure 7.2. In Example 7.3, it was shown that linear elements on the finest mesh from Figure 7.2 resulted in an energy norm error of about 16.8%, which is not particularly small. In this example, higher-order elements are used to solve the same problem. The following table shows the percent errors (in the energy norm) resulting from solving the BVP using Lagrange triangles of degree $d = 1, 2, 3, 4$ on the four meshes $\mathcal{T}_0, \mathcal{T}_1, \mathcal{T}_2, \mathcal{T}_3$ from Figure 7.2. The numbers in parentheses are the numbers of free nodes for each mesh.

| Mesh | $d = 1$ | $d = 2$ | $d = 3$ | $d = 4$ |
|---|---|---|---|---|
| $\mathcal{T}_0$ | 79.3 (0) | 50.5 (16) | 28.0 (48) | 14.6 (96) |
| $\mathcal{T}_1$ | 55.4 (16) | 18.6 (96) | 5.6 (240) | 1.6 (448) |
| $\mathcal{T}_2$ | 32.0 (96) | 5.8 (448) | 0.97 (1056) | 0.16 (1920) |
| $\mathcal{T}_3$ | 16.8 (448) | 1.6 (1920) | 0.14 (4416) | 0.012 (7936) |


The number of free nodes gives some indication of the amount of work to form and solve the finite element equations, and therefore provides some basis for comparison. The results show that, for this problem, higher-order elements are more efficient when an accurate solution is desired. For example, $d = 1$ on mesh $\mathcal{T}_3$, $d = 2$ on mesh $\mathcal{T}_2$, and $d = 4$ on mesh $\mathcal{T}_1$ all result in 448 degrees of freedom. The errors are 16.8%, 5.8%, and 1.6%, respectively, confirming that the higher-order method gives a smaller error for a given amount of work. This is always true when the true solution is smooth: higher-order elements are asymptotically more efficient than low-order elements.

EXAMPLE 8.2. To illustrate the use of isoparametric finite elements, this example solves the same BVP as in the previous example, except that the outer boundary of $\Omega$ is changed to the unit circle. The following table compares the percent errors obtained using linear Lagrange triangles with those obtained using cubic isoparametric triangles. In each case, only the (energy norm) error over the approximate domain $\Omega_h$ is computed. In the case of the cubic elements, the additional error is quite small, since $\Omega_h$ is very close to $\Omega$. For linear elements, the neglected error is much more significant (cf. Example 4.5 in Section 4.7), but it does go to zero at the same rate as the computed error. As in the previous example, the number of free nodes on each mesh is given in parentheses.

| Mesh | Linear | Cubic isoparametric |
|---|---|---|
| $\mathcal{T}_0$ | 78.8 (0) | 27.0 (36) |
| $\mathcal{T}_1$ | 54.5 (12) | 5.3 (180) |
| $\mathcal{T}_2$ | 31.3 (72) | 0.89 (792) |
| $\mathcal{T}_3$ | 16.4 (3236) | 0.12 (7923) |
| $\mathcal{T}_4$ | 8.3 (1440) | 0.016 (13536) |

Comparing these results to the results of the previous example shows that the results of using isoparametric elements on a domain with a curved boundary are quite similar to using ordinary elements on a polygonal domain. Figure 8.2 shows the initial nonisoparametric mesh and the initial isoparametric mesh. Figure 8.3 shows the final computed solution.

Figure 8.2. The initial nonisoparametric mesh (left) and the initial isoparametric mesh (right) from Example 8.2. The exact (curved) boundary of $\Omega$ is shown as a dashed line. In the isoparametric mesh, the cubic boundary segments follow the curved boundary quite closely.


Figure 8.3. The final computed solution from Example 8.2.

If Lagrange triangles of degree $d$ are used to solve a BVP that has a smooth solution, then

$$\|u - u_h\|_E \le Ch^d$$

holds. Under the assumption $\|u - u_h\|_E \approx Ch^d$, the reverse triangle inequality yields

$$\|u_h - u_{2h}\|_E \ge \|u - u_{2h}\|_E - \|u - u_h\|_E \approx (2^d - 1)\, \|u - u_h\|_E.$$

This leads to the heuristic

$$\|u - u_h\|_E \approx \frac{\|u_h - u_{2h}\|_E}{2^d - 1}, \tag{8.16}$$

which can be used to assess the quality of a solution.

EXAMPLE 8.3. As the final example for this chapter, I will solve the second BVP from Example 7.5 again, this time using quadratic Lagrange triangles. The BVP is


and it models the steady-state temperature distribution in a square metal plate measuring 20 cm by 20 cm, completely insulated on the top and bottom and also on three edges. The plate is heterogeneous, with the thermal conductivity plotted in Figure 7.6. The reader can review Example 7.5 for the rest of the details about the BVP. In this example, the goal will be to compute the solution to within 1% in the energy norm. The heuristic (8.16), in the case $d = 2$, is

The initial mesh is uniform, with 32 triangles and $h = 5\sqrt{2} \approx 7.071$. Solving the BVP on four successive meshes consisting of quadratic Lagrange triangles yields the following results:
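| $h$ | $\|u_h - u_{2h}\|_E$ | Ratio of successive differences | $\|u_h\|_E$ | Relative difference (%) |
|---|---|---|---|---|
| 7.071 | | | | |
| 3.536 | 0.456 | | 5.09 | 9.0 |
| 1.768 | 0.128 | 3.56 | 5.15 | 2.5 |
| 0.8839 | 0.0335 | 3.83 | 5.10 | 0.66 |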

The ratios $\|u_{2h} - u_{4h}\|_E / \|u_h - u_{2h}\|_E$ suggest that the assumption

is not unreasonable, and the values of $\|u_h - u_{2h}\|_E$ show that the estimated error in the fourth computed solution is less than 1%.
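The arithmetic behind this check is easy to reproduce. The Python sketch below uses the values printed in the results above; the interpretation of the columns is an assumption on my part (the first list as the differences $\|u_h - u_{2h}\|_E$ on the second through fourth meshes, the second list as the energy norms used for normalization), and the function name `convergence_summary` is hypothetical. If the error behaves like $Ch^d$, the ratios of successive differences should approach $2^d$, so their base-2 logarithms estimate $d$.

```python
import math


def convergence_summary(diffs, norms):
    """Ratios of successive differences, observed orders, and relative sizes (%)."""
    ratios = [diffs[k] / diffs[k + 1] for k in range(len(diffs) - 1)]
    orders = [math.log2(r) for r in ratios]
    percents = [100.0 * dk / nk for dk, nk in zip(diffs, norms)]
    return ratios, orders, percents


# Values as printed in Example 8.3 (column meanings assumed, see above).
diffs = [0.456, 0.128, 0.0335]
norms = [5.09, 5.15, 5.10]
ratios, orders, percents = convergence_summary(diffs, norms)
```

The observed orders come out a little below 2 and increase with refinement, which is consistent with the quadratic elements approaching their asymptotic rate, and the last relative difference is below 1%.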

## 8.5 The MATLAB implementation

8.5.1 version2
MATLAB routines that handle Lagrange triangles of arbitrary degree form version2; the names of routines generally have a suffix "2." The main routines in version2 of the MATLAB code are Stif fness2 and Load2, which assemble the stiffness matrix and load vector for the model problem. These routines work for Lagrange triangles of any degree, with one limiting factor: Quadrature rules have been coded only up to degree of precision 20, limiting the degree of the triangles for model problem (8.1) to d = \ I. The quadrature rules are coded in DunavantData. Other major computational routines are EvalNodalBasisFcns, for computing the matrix V// = Yj(si> */) and EvalNodalBasisGrads, for computing the related matrices Vs and Vt. Other routines presented in Section 7.5, such as ShowMeshl, are extended to meshes of arbitrary degree. MATLAB functions Mesh2: Type help Mesh2 to see a description of the mesh data structure. GenLagrangeMesh2 Converts a mesh of degree 1 to a mesh of degree d.

204

## Chapter 8. Lagrange triangles of arbitrary degree

ExtractLinearMesh Recreates the original (linear) mesh from the output of GenLagrangeMesh2. Ref Tri Creates the reference triangle as a mesh of degree d (with one triangle). ShowMesh2 Graphs a mesh. ShowPWPolyFcn2 Graphs a continuous piecewise polynomial function on a given mesh. St if f ness2 Assembles the stiffness matrix (as a MATLAB sparse matrix) for the model problem on a mesh of arbitrary degree. Load2 Assembles the load vector for the model problem on a mesh of arbitrary degree. TransToRef Tri Computes the transformation from the reference triangle TR to an arbitrary triangle T. DunavantData Defines the quadrature rules for a triangle. GaussData Defines the quadrature rules for an interval. EvalNodalBasisFcns Evaluates the nodal basis functions at a list of evaluation nodes. EvalNodalBasisGrads Evaluates the gradients of the nodal basis functions at a list of evaluation nodes. EvalNodalBasisFcns ID Evaluates the nodal basis functions for a one-dimensional interval at a list of evaluation nodes. EvalPWPolyFcn2 Evaluates a piecewise polynomial function of arbitrary degree at a list of evaluation nodes. EnergyNorm2 Estimates (by quadrature on a given mesh) the energy norm of a known function. EnergyNormErr2 Estimates (by quadrature) the energy norm of the difference between a known function and a piecewise linear function on a given mesh. L2Norm2, L2NormErr2 Like the previous routines, but for the L2-norm. Interpolate2 Interpolates a piecewise linear function onto a mesh of higher degree obtained by an application of GenLagr angeMe sh2. Interpolate2a Interpolates a piecewise polynomial function onto a finer mesh obtained by a single application of Ref inel. TestConv2 Monitors the convergence of the finite element method on a model problem with a known solution.


The functions for retrieving information from the mesh data structure are

- getNeumannData2: Evaluates $h = \kappa\,\partial u/\partial n$ on the free boundary edges, given $\kappa$ and $u$.
- getNeumannData2a: Evaluates a given function $h$ on the free boundary edges.
- getNodes: Extracts the coordinates of the nodes of a triangle, along with their indices.
- getVertices: Extracts the coordinates of the vertices of a triangle, along with their indices.
- getNormal2: Computes the outward unit normal to a given side of a triangle.
- getTriNodeIndices: Creates the triangle-node list for a given mesh.

Some routines, like RefTri and getNodes, have no suffix of "2" because they do not need updating for version3.

## 8.5.2 version3

The most general routines handle Lagrange triangles of any order, and use isoparametric elements when the boundary is curved. These routines comprise version3, and their names have no suffix: Stiffness, Load, GenLagrangeMesh, ShowMesh, ShowPWPolyFcn, and so forth.

MATLAB functions

All of the following routines, including the graphic routines, handle isoparametric elements correctly.

- Mesh: Type help Mesh to see a description of the mesh data structure.
- GenLagrangeMesh: Converts a mesh of degree 1 to a mesh of degree $d$.
- ShowMesh: Graphs a mesh.
- ShowPWPolyFcn: Graphs a continuous piecewise polynomial function on a given mesh.
- Stiffness: Assembles the stiffness matrix (as a MATLAB sparse matrix) for the model problem on a mesh of arbitrary degree.
- Load: Assembles the load vector for the model problem on a mesh of arbitrary degree.
- EnergyNorm: Estimates (by quadrature on a given mesh) the energy norm of a known function.
- EnergyNormErr: Estimates (by quadrature) the energy norm of the difference between a known function and a piecewise linear function on a given mesh.


- TestConv: Monitors the convergence of the finite element method on a model problem with a known solution.

The functions for retrieving information from the mesh data structure are

- getNeumannData: Evaluates $h = \kappa\,\partial u/\partial n$ on the free boundary edges, given $\kappa$ and $u$.
- getNormal: Computes the outward unit normal to a given side of a triangle.

## 8.6 Exercises for Chapter 8

1. Show that solving $J^T B = G$ for $B$, where $J$ is a nonsingular $2 \times 2$ matrix and $G$ is a $2 \times n$ matrix, requires about $6n$ arithmetic operations.
2. Count carefully the number of arithmetic operations required to assemble the stiffness matrix (for the model problem) using the algorithm described in this chapter in the case of quadratic Lagrange triangles. Express the results in terms of $N_t$, and also $N_v$ (cf. Exercise 7.6.7).
3. Let $r_j = -1 + 2(j-1)/d$, $j = 1, 2, \ldots, d+1$, and let $\gamma_i$, $i = 1, 2, \ldots, d+1$, be defined by (8.11).
   (a) Given quadrature nodes $t_j$, $j = 1, 2, \ldots, d+1$, on $[-1, 1]$, show how to efficiently compute $\gamma_i(t_j)$, $i, j = 1, 2, \ldots, d+1$.
   (b) How many arithmetic operations does it cost to perform the computation described in the previous part?
4. (MATLAB) Test the heuristic suggested in Example 8.3 on the following BVP:

Here $\Omega$ is the unit square, $f(x, y) = \sin(\pi x)\sin(2\pi y)$, and the exact solution is $u(x, y) = (5\pi^2)^{-1}\sin(\pi x)\sin(2\pi y)$. Use piecewise quadratic functions (the MATLAB function Interpolate2a will be useful). Is $(1/3)\|u_h - u_{2h}\|_E$ a good estimate of $\|u - u_h\|_E$?
5. (MATLAB) Consider the BVP


where $\Omega$ is the upper half of the unit circle ($\Omega = \{(x, y) : x^2 + y^2 < 1,\ y > 0\}$), $\kappa(x, y) = 1 + y$, and $f$ is chosen so that the exact solution is $u(x, y) = y(1 - x^2 - y^2)e^x$. The BVP can be solved using regular triangles or isoparametric triangles (the two coincide for linear triangles). The purpose of this exercise is to examine the improvement in accuracy obtained by using isoparametric triangles. Measuring the errors in this case is complicated by the fact that there are three different domains involved ($\Omega$, the polygonal approximation to $\Omega$, and the isoparametric approximation to $\Omega$). In this exercise, the errors near the boundary will be disregarded and only the errors in the interior will be compared. To simplify matters further, only errors in the nodal values will be considered. Start with a coarse mesh on $\Omega$ and refine it several times. On each mesh, solve the BVP using ordinary triangles and again using isoparametric triangles. Record the level of refinement and the maximum error at any free node for each computed solution. At what rate is each error (nonisoparametric, isoparametric) going to zero?
6. (MATLAB) Repeat the previous exercise, but this time use cubic triangles. Do the errors resulting from ordinary triangles improve in going from quadratic elements to cubic elements? Explain your results.
7. (MATLAB) Let $\Omega$ be the polygonal region having vertices $(0,0)$, $(1,0)$, $(1,1)$, $(-1,1)$, $(-1,-1)$, $(0,-1)$ and let $u$ be the function defined in polar coordinates by $u = r^{2/3}\sin(2\theta/3)$, $0 < \theta < 2\pi$.
   (a) Verify that $u$ is harmonic (that is, satisfies $\Delta u = 0$).
   (b) Verify that $\nabla u$ is singular at the origin. In fact, $u$ does not belong to $H^2(\Omega)$, so the standard convergence theory from Chapter 5 does not apply. In particular, when solving by finite elements, $\|u - u_h\|_E = O(h)$ need not hold.
   (c) Solve the Dirichlet problem

   ($g = u$ on $\partial\Omega$) using piecewise linear functions on a sequence of uniformly refined meshes. Record the energy norm errors. At what rate does $\|u - u_h\|_E$ appear to go to zero?
   (d) Repeat the previous part using piecewise quadratic functions.
   (e) Repeat using piecewise cubic functions. Does increasing the degree of the shape functions improve the accuracy of the solution?
8. (MATLAB) Consider the BVP

where $\Omega$ is the unit circle and $g$ is defined by

Here $\theta$ is the polar angle corresponding to $(x, y)$ on the unit circle ($(x, y) = (\cos(\theta), \sin(\theta))$). Using piecewise quadratic finite elements, try to compute the solution to this BVP to within 1% in the energy norm.
9. (MATLAB) Repeat the previous exercise, replacing the Dirichlet data $g$ with

Which problem is easier to solve to the given accuracy? Why?
10. (Programming) Write a MATLAB routine Solve2 to apply finite elements, with quadratic Lagrange triangles and uniformly refined meshes, to solve a given BVP to within a given tolerance. Use the heuristic (8.16) to decide when the solution is accurate enough. Also include a limit on the number of refinements or the total number of triangles so that the routine will stop in a reasonable amount of time, even if an unrealistically small tolerance is given.
11. (Programming) Write a MATLAB routine Mass2, similar to Stiffness2, for computing the mass matrix $M \in \mathbb{R}^{N_f \times N_f}$ defined by

(cf. Exercise 4.8.17, where $M$ was introduced). Your routine should handle a polygonal mesh of arbitrary degree $d$. Use a quadrature rule having degree of precision $2d$.
12. (Programming) Write a MATLAB routine Mass, similar to Stiffness, for computing the mass matrix $M \in \mathbb{R}^{N_f \times N_f}$ defined by

(cf. Exercise 4.8.17, where M was introduced). Your routine should handle a mesh of arbitrary degree d with isoparametric elements. Use a quadrature rule having degree of precision 2d.

Chapter 9

## The finite element method for general BVPs

The previous two chapters have been restricted to the model problem (8.1). In this chapter, I will discuss some of the more general problems that can be treated by the same techniques. I will also discuss some limitations of the Galerkin method.

## 9.1 Scalar BVPs

The most general second-order PDE in $x$ and $y$ takes the form

(the negative signs on the leading coefficients are for convenience). When the solution u is sufficiently smooth, the mixed partial derivatives of u are equal:

Then


For this reason, it will be assumed that the matrix A defined by (9.2) is symmetric. Furthermore, the reader can easily verify that

(see Exercise 1). Therefore, writing $c$ for the vector-valued function with components $c_1$, $c_2$, (9.1) can be rewritten as

The theory and techniques developed in this book are applicable when the matrix $A$ is symmetric and uniformly positive definite over $\Omega$: there exists a constant $\lambda > 0$ such that

If this fails to be true, the PDE is not elliptic and (9.1) has a different character. In such a case, data from part of the boundary are typically propagated across the domain $\Omega$ to the remainder of the boundary, and BVPs in which the boundary data are specified on the entire boundary (such as have been studied in this book) are not well-posed. Finite element methods can be developed for such problems; in the most-studied cases, one of the variables $x$ or $y$ is time and the propagation of information is explicitly recognized in that the time variable is treated differently than the spatial variable. Such problems are beyond the scope of this book.

As discussed in Section 2.6, the presence of the term $(c + \nabla \cdot A) \cdot \nabla u$ means that the variational form of (9.4) involves a nonsymmetric bilinear form, which may or may not be elliptic. Also, if $\rho$ is negative, the bilinear form may fail to be elliptic. Therefore, the theory and techniques developed thus far in the text apply most readily to the PDE

where $A$ is symmetric positive definite and $\rho > 0$ in $\Omega$. A well-posed BVP would have the form

find


Here $V = \{v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_1\}$ and $G$ is any function in $H^1(\Omega)$ satisfying $G = g$ on $\Gamma_1$. Under the assumptions given above, the bilinear form

is $V$-elliptic (see Exercise 3). Given a finite element subspace $V_h$ of $V$, having basis $\{\phi_1, \phi_2, \ldots, \phi_{N_f}\}$, (9.7) can be treated in the usual way. The Galerkin method leads to the variational problem

where

or

this can be written as the matrix-vector equation

The matrix K is the version of the stiffness matrix suitable for this problem. The matrix M is called the mass matrix; it was the subject of Exercises 2.7.12, 4.8.17, and 7.6.16. The vector F is the load vector for this problem.
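As a compact summary, the discretization just described can be sketched as follows. This is a reconstruction from the surrounding text rather than a quotation of the numbered equations: the names $\ell$, $h$, $G_h$, and $w_h$, and the sign convention used in $F$, are assumptions.

```latex
\begin{aligned}
a(u,v) &= \int_\Omega \left( A\nabla u\cdot\nabla v + \rho\,u\,v \right), \qquad
\ell(v) = \int_\Omega f\,v + \int_{\Gamma_2} h\,v, \\
u_h &= G_h + w_h, \qquad w_h = \sum_{j=1}^{N_f} U_j\,\phi_j, \qquad (K + M)\,U = F, \\
K_{ij} &= \int_\Omega A\nabla\phi_j\cdot\nabla\phi_i, \qquad
M_{ij} = \int_\Omega \rho\,\phi_j\,\phi_i, \qquad
F_i = \ell(\phi_i) - a(G_h, \phi_i).
\end{aligned}
```

With $\rho = 0$ and $A = \kappa I$, these expressions reduce to the stiffness matrix and load vector of the model problem treated in the previous chapters.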


Assembling K, M, and F presents little that is different from previous calculations. The entries of K have the form

are assembled from integrals of the form

where $T$ is an arbitrary triangle in the mesh. Transforming to the reference triangle $T^R$ and using an $n$-point quadrature rule,

Here $|\det(J)|$ is the Jacobian determinant and $\tilde{\rho}$ is the function $\rho$ transformed from $T$ to $T^R$ ($\tilde{\rho}(s, t) = \rho(x, y)$, $(s, t) \in T^R$, $(x, y) \in T$). The quadrature rule should be chosen to be exact in the case that $\rho$ is constant, that is, to have degree of precision $2d$ if Lagrange triangles of degree $d$ are used. Finally, the term

is the only new part of the expression for the load vector, and a simple modification of Algorithm 8.2 will handle this.
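The quadrature step for the mass matrix can be illustrated with a minimal Python sketch for a single linear Lagrange triangle ($d = 1$). It is not the book's MATLAB code: the function name `element_mass_matrix`, the vertex-list argument, and the choice of the three-point edge-midpoint rule (degree of precision $2 = 2d$) are assumptions made for illustration.

```python
def element_mass_matrix(vertices, rho):
    """Element mass matrix for a linear Lagrange triangle, by quadrature.

    vertices: three (x, y) pairs; rho: callable rho(x, y).
    Entry (i, j) approximates the integral of rho * phi_i * phi_j over the triangle.
    """
    (x1, y1), (x2, y2), (x3, y3) = vertices
    # Jacobian of the affine map from the reference triangle
    # with vertices (0,0), (1,0), (0,1) to the given triangle.
    j11, j12 = x2 - x1, x3 - x1
    j21, j22 = y2 - y1, y3 - y1
    abs_det = abs(j11 * j22 - j12 * j21)

    # Edge-midpoint rule on the reference triangle (degree of precision 2).
    nodes = [(0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]
    weights = [1.0 / 6.0, 1.0 / 6.0, 1.0 / 6.0]

    M = [[0.0] * 3 for _ in range(3)]
    for (s, t), w in zip(nodes, weights):
        gamma = (1.0 - s - t, s, t)          # linear nodal basis on the reference triangle
        x = x1 + j11 * s + j12 * t           # quadrature node mapped to the physical triangle
        y = y1 + j21 * s + j22 * t
        c = w * rho(x, y) * abs_det
        for i in range(3):
            for j in range(3):
                M[i][j] += c * gamma[i] * gamma[j]
    return M


M = element_mass_matrix([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], lambda x, y: 1.0)
```

For constant $\rho$ the rule integrates the quadratic products $\gamma_i\gamma_j$ exactly, so the entries of `M` sum to $\rho$ times the area of the triangle. In an assembly loop, each local entry would be added to the global matrix in the position given by the triangle-node list.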

## 9.1.1 An example

Replacing the scalar-valued function $\kappa$ in the PDE

with a symmetric positive definite matrix A introduces anisotropy into the model. For example, when (9.10) represents steady-state heat flow, the basic physical law is Fourier's law of heat conduction,


where $u$ is the temperature, $\kappa$ is the thermal conductivity, and $q$ is the heat flux. Since $\kappa$ is a scalar, the magnitude of the heat flux at a point $(x, y) \in \Omega$, $\|q(x, y)\|_2 = \|\kappa(x, y)\nabla u(x, y)\|_2$, is the same regardless of the direction of $\nabla u$. Also, the direction of the heat flux is the (opposite of the) direction of $\nabla u$. On the other hand, the law

says that the magnitude of the heat flux depends on both the direction and size of $\nabla u$, and the direction of the heat flux is not simply the direction of $\nabla u$. According to this model, heat flows in some directions more readily than in others. I will refer to the PDE (9.5a) as the matrix conductivity problem.

EXAMPLE 9.1. Consider the (constant) matrix

which has eigenvalues 0.5 and 5. The corresponding eigenvectors point in the directions of the vectors $(1, 1)$ and $(-1, 1)$, respectively. Therefore, if $A$ represents an anisotropic thermal conductivity, then heat flows from the lower right to the upper left much more readily than from the lower left to the upper right. Now consider the BVP

where $\Omega$ is the unit square and $\Gamma_1$, $\Gamma_2$, $\Gamma_3$, $\Gamma_4$ are the bottom, right, top, and left edges, respectively. This BVP models an anisotropic plate with the left and right edges insulated and the top edge held at 0 degrees. Heat energy is entering across the bottom edge at the rate of 10 (in W/cm$^2$, for example). To illustrate the effect of the anisotropy, I solved the BVP using a regular mesh of 128 linear Lagrange triangles. The resulting heat flux ($-A\nabla u$) is displayed on the left in Figure 9.1 as a vector field. For comparison, I solved the same BVP but with $A$ replaced by a scalar; the result is the heat flux shown on the right in Figure 9.1. As one would expect, the heat flows toward the upper left corner in the anisotropic plate but straight up in the isotropic plate.
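The qualitative behavior in Example 9.1 can be checked pointwise with a short Python sketch. The matrix `A_aniso` below is an assumption: it is the symmetric matrix reconstructed from the stated eigenvalues (0.5 along $(1,1)$ and 5 along $(-1,1)$), not a value quoted from the text, and the helper names are hypothetical. The sketch applies the anisotropic Fourier law $q = -A\nabla u$ to a temperature gradient pointing straight down, which is the situation when the plate is heated from below and cooled at the top.

```python
def matvec(A, v):
    """Multiply a 2 x 2 matrix (list of rows) by a 2-vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]


def heat_flux(A, grad_u):
    """Anisotropic Fourier law: q = -A grad(u)."""
    return [-c for c in matvec(A, grad_u)]


# Assumed matrix: eigenvalue 0.5 along (1, 1) and eigenvalue 5 along (-1, 1).
A_aniso = [[2.75, -2.25],
           [-2.25, 2.75]]
# Scalar conductivity kappa = 1 written as a matrix, for comparison.
A_iso = [[1.0, 0.0],
         [0.0, 1.0]]

# Temperature decreasing in the upward direction: grad(u) points straight down.
grad_u = [0.0, -1.0]

q_aniso = heat_flux(A_aniso, grad_u)   # negative x-component: up and to the left
q_iso = heat_flux(A_iso, grad_u)       # zero x-component: straight up
```

Under this assumed matrix, the same downward temperature gradient produces a flux that is tilted toward the upper left in the anisotropic case and vertical in the isotropic case, consistent with the two panels described for Figure 9.1.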

## 9.2 Isotropic elasticity

The equations of isotropic elasticity form perhaps the simplest interesting elliptic system of PDEs (as opposed to the scalar PDEs that have been treated up to now). This system describes the displacement $u$ of an isotropic elastic membrane with Lamé moduli $\mu$, $\lambda$ under the influence of a body force $f$:


Figure 9.1. The heat flux from the anisotropic plate (left) and the isotropic plate (right) of Example 9.1. The lengths of the flux vectors are scaled, so that the vectors in each graph can be compared only to other vectors in the same graph (that is, comparing the vector lengths on the left to the vector lengths on the right is meaningless).

The displacement $u$ is a vector-valued function of $(x, y) \in \Omega$ and $\nabla u$ is the $2 \times 2$ matrix (called the Jacobian of $u$ in other contexts) whose $i$th row contains the partial derivatives of $u_i$ with respect to $x$ and $y$. The tensor $\epsilon$ is the (linearized) strain, and $\sigma$ is the stress tensor. The Lamé moduli describe the elastic properties of the membrane (see the exercises in Chapter 1 for more details). The tensors $\sigma$ and $\epsilon$ depend on $u$, and I will write $\sigma_u$, $\epsilon_u$ when necessary. A typical BVP for a membrane is

where, as usual, $\Gamma_1$, $\Gamma_2$ partition $\partial\Omega$. If $\Gamma_1 = \partial\Omega$, $\Gamma_2 = \emptyset$, the BVP is called a pure displacement problem, while if $\Gamma_1 = \emptyset$, $\Gamma_2 = \partial\Omega$, it is a pure traction problem. For simplicity, I will begin by considering only homogeneous Dirichlet conditions ($g = 0$ in (9.12)). The variational form of (9.12) (with $g = 0$) is

where

In the finite element method, the approximating subspace of $V$ can be taken to be a vector-valued version of the usual space of scalar piecewise polynomials,


Here $P_h^{(d)}$ represents the space of continuous piecewise polynomial functions of degree at most $d$, relative to a given triangulation $\mathcal{T}_h$ of $\Omega$. The vector-valued version of $V_h$ is

If $\{\phi_1, \phi_2, \ldots, \phi_{N_f}\}$ denotes the usual nodal basis for $V_h$, then the nodal basis for $\mathbf{V}_h$ is

and $\ell$ for the linear functional

At the abstract level, the Galerkin method is unchanged: The approximate solution is

where $U \in \mathbb{R}^{2N_f}$ solves $KU = F$ and

To assemble $K$ and $F$, though, it is necessary to take into account the fact that there are two different kinds of basis functions, $\boldsymbol{\phi}_i = (\phi_i, 0)$ and $\boldsymbol{\phi}_{N_f + i} = (0, \phi_i)$. As a result, $K$ and $F$ are naturally partitioned as

where

and


The matrices $K^{(11)}$ and $K^{(22)}$ are symmetric and $K^{(21)} = (K^{(12)})^T$, but $K^{(12)}$ is not symmetric. The problem of assembling $K$ and $F$ now reduces to assembling $K^{(11)}$, $K^{(12)}$, $K^{(21)}$, $K^{(22)}$ and $F^{(1)}$, $F^{(2)}$. As in the case of the matrix conductivity considered in the previous section, the computations are very similar to those considered in previous chapters. I begin with $K^{(11)}$. Since

The above formulas yield

Equation (9.15) is a slight variation of formulas that have been treated earlier. Exercise 4 asks the reader to derive the following formulas:


where $f_1$ and $h_1$ are the first components of the vector-valued functions $f$ and $h$, respectively. Similarly,

Analogous formulas appeared when treating scalar PDEs. Finally, if the Dirichlet data function g is nonzero, then the variational problem becomes

find
where $G$ is any function in $(H^1(\Omega))^2$ such that $G = g$ on $\Gamma_1$. When applying the Galerkin method with finite element functions, $G$ can be replaced with $G_h$, the piecewise polynomial interpolating $g$ on $\Gamma_1$ and having all other nodal values equal to zero. It remains then to compute the contributions $a(G_h, (\phi_i, 0))$ and $a(G_h, (0, \phi_i))$ to $F^{(1)}$ and $F^{(2)}$, respectively. The derivations follow the pattern used when computing $K^{(11)}$ above, and the results are

EXAMPLE 9.2. Consider a square isotropic elastic membrane occupying the unit square $\Omega$, and subjected to a purely longitudinal traction in the $x$-direction:

Suppose $\mu$ and $\lambda$ are the constants $\mu = 1$, $\lambda = 2$, which correspond to a Young's modulus of $E = 3$ and a Poisson's ratio of $\nu = 0.5$ (see Exercise 1.3.6). Solving the BVP

yields the solution shown in Figure 9.2.

Figure 9.2. The homogeneous membrane from Example 9.2. The mesh on the deformed membrane is superimposed on the mesh on the undeformed membrane.

This simple traction problem was examined in Exercise 1.3.6, in which the solution was expressed in terms of the Young's modulus $E$ and Poisson's ratio $\nu$ of the membrane; the exact solution is

Since $u$ is linear, it should be computed exactly with linear Lagrange triangles, so it is easy to check whether the code performs correctly. Now consider the same membrane with a stiff region, the disk of radius 0.2 centered at $(x, y) = (0.6, 0.7)$. This region also has $\nu = 0.5$, but $E$ is increased by a factor of about 10. To be precise, Young's modulus in the stiff region is defined, for this simulation, by

The necessary formulas for $\mu$ and $\lambda$ can be computed from

(cf. Exercise 1.3.7). The resulting BVP was solved and the solution graphed in Figure 9.3. The effect of the stiff region can be clearly seen.
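The conversion between the two pairs of moduli can be sketched in Python as follows. The formulas in `lame_from_young_poisson` and `young_poisson_from_lame` are assumptions: they are the two-dimensional relations that reproduce the numbers quoted in Example 9.2 ($\mu = 1$, $\lambda = 2$ corresponding to $E = 3$, $\nu = 0.5$), not formulas copied from the text, and all function names are hypothetical. The `stress` helper applies the constitutive law $\sigma = 2\mu\epsilon + \lambda\,\mathrm{tr}(\epsilon)I$ used in this chapter.

```python
def lame_from_young_poisson(E, nu):
    """Lame moduli (mu, lambda) from Young's modulus and Poisson's ratio (2D membrane)."""
    mu = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - nu))
    return mu, lam


def young_poisson_from_lame(mu, lam):
    """Young's modulus and Poisson's ratio from the Lame moduli (2D membrane)."""
    E = 4.0 * mu * (mu + lam) / (2.0 * mu + lam)
    nu = lam / (2.0 * mu + lam)
    return E, nu


def stress(strain, mu, lam):
    """Isotropic law: sigma = 2 mu eps + lambda tr(eps) I, for a 2 x 2 strain."""
    tr = strain[0][0] + strain[1][1]
    return [[2.0 * mu * strain[i][j] + (lam * tr if i == j else 0.0)
             for j in range(2)] for i in range(2)]


mu, lam = lame_from_young_poisson(3.0, 0.5)     # moduli of Example 9.2
# Uniaxial strain state for unit traction in the x-direction:
# stretch 1/E along x, contraction nu/E along y.
E, nu = 3.0, 0.5
eps = [[1.0 / E, 0.0], [0.0, -nu / E]]
sigma = stress(eps, mu, lam)                    # should be a pure x-direction stress
```

The last three lines check the constitutive law against the kind of linear solution discussed in Example 9.2: a uniform stretch $1/E$ in $x$ with contraction $\nu/E$ in $y$ yields a stress whose only nonzero entry is $\sigma_{11}$. Applying `young_poisson_from_lame` to the moduli of a nearly incompressible material shows $\nu$ approaching 1 in this two-dimensional setting.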

## 9.3 Mesh locking

The elastic properties of a membrane can be described by the Lamé moduli $\mu$ and $\lambda$, or equivalently by Young's modulus $E$ and Poisson's ratio $\nu$. As Exercise 1.3.6 suggests, another pair of moduli that could be used instead are the bulk modulus $\mu + \lambda$ and the shear modulus $\mu$. The bulk modulus determines the response of the material to a pure expansion; when the bulk modulus is large, the material resists expansion. On the other hand, the shear modulus determines the response to a shear. When the bulk modulus is very large compared to the shear modulus, that is, when $\lambda$ is large compared to $\mu$, then the material is nearly incompressible. Such cases present difficulties for simulation with Galerkin finite elements.


Figure 9.3. The heterogeneous membrane from Example 9.2. The stiff region of the (undeformed) heterogeneous membrane is indicated by the circle.

EXAMPLE 9.3. Consider a square membrane, occupying the unit square when at rest, and fixed on its top boundary. The bulk modulus $\mu + \lambda$ and the shear modulus $\mu$ of the membrane are taken to be 5000 and 2.5, respectively, so that $\lambda = 4997.5$ and the membrane is nearly incompressible. A traction of magnitude 1 is applied in the downward direction on the bottom of the membrane, so that the displacement $u$ of the membrane is governed by the BVP

where $\sigma = 2\mu\epsilon_u + \lambda\,\mathrm{tr}(\epsilon_u)I$ and $\Gamma_1$, $\Gamma_2$, $\Gamma_3$, $\Gamma_4$ are the bottom, right, top, and left sides of the square, respectively. The reader should be able to predict the shape of the deformed membrane without difficulty; it is stretched downward from its fixed top edge and contracted horizontally. On the other hand, when this BVP is solved using the Galerkin method with linear Lagrange triangles, the computed displacement is as shown on the left in Figure 9.4. This result is clearly inaccurate; this inaccuracy is due to an effect called mesh locking. A computed displacement free from the effects of locking can be found using higher-order elements. The right-hand graph in Figure 9.4 shows an accurate displacement computed using quadratic Lagrange triangles.

The mesh locking in the previous example can be explained as follows: The strain $\epsilon_u$ satisfies the variational problem

With the boundary conditions (which determine the right-hand side) fixed, it seems believable that increasing $\lambda$ must cause $\mathrm{tr}(\epsilon_u)$ to decrease, and that $\lambda \to \infty$ implies that $\mathrm{tr}(\epsilon_u) \to 0$.


Figure 9.4. The computed displacement for Example 9.3, using linear Lagrange triangles (left) and quadratic Lagrange triangles (right). The inaccuracy in the left-hand graph is due to mesh locking.

This can be proved rigorously (see [13, Section 11.3] and the references contained therein), so $\lambda$ large implies that $\mathrm{tr}(\epsilon_u) = \nabla \cdot u$ is small in $\Omega$. This makes a finite element space based on linear Lagrange triangles a poor choice, since such a space typically contains no nonzero divergence-free functions. This implies that $\nabla \cdot w_h$ is small only when $w_h$ itself is small; that is, the mesh "locks" (displays inaccurately small displacements). Nearly incompressible materials can be simulated using higher-order Galerkin finite elements; for example, Babuska and Suri [9] show that locking cannot occur for Lagrange triangles of degree $d > 4$. If the use of higher-order elements is undesirable, then it is necessary to move outside of the Galerkin framework and, moreover, into an active research problem. The papers by Arnold and Winther [4, 5] and by Cai and Starke [14] will introduce the reader to recent results.

## 9.4 The MATLAB implementation

version3 contains the MATLAB routines necessary to generate the examples for this chapter. No attempt has been made to provide a complete set of codes. For example, there is no updated Load function for the PDE with a matrix coefficient. Load2 works for this problem as long as the Dirichlet data (if any) are zero; a new routine would be necessary to handle nonzero Dirichlet data.

MATLAB functions

All of the following routines handle isoparametric elements correctly.

- ShowDisplacement: Graphs a two-dimensional displacement as a perturbation to a mesh.
- StiffnessIso: Assembles the stiffness matrix (as a MATLAB sparse matrix) for the system of isotropic elasticity on a mesh of arbitrary degree.


- LoadIso: Assembles the load vector for the system of isotropic elasticity on a mesh of arbitrary degree.
- StiffnessMC: Assembles the stiffness matrix (as a MATLAB sparse matrix) for the matrix conductivity problem on a mesh of arbitrary degree.
- TestConvIso: Monitors the convergence of the finite element method for a system of isotropic elasticity with a known solution.
- TestConvMC: Monitors the convergence of the finite element method for a matrix conductivity problem with a known solution.
- getNeumannDataIso: Evaluates $h = \sigma n$ on the free boundary edges, given the entries in $\nabla u$ and $\mu$, $\lambda$.

## 9.5 Exercises for Chapter 9

1. Suppose $A$ is a matrix-valued function $A : \Omega \to \mathbb{R}^{2 \times 2}$ defined by (9.2). Apply the product rule to the expression $\nabla \cdot (A\nabla u)$ to derive (9.3).
2. Derive (9.6), assuming $A : \Omega \to \mathbb{R}^{2 \times 2}$, $u : \Omega \to \mathbb{R}$, and $v : \Omega \to \mathbb{R}$ are smooth and extend smoothly to the boundary of $\Omega$.
3. Suppose $A \in \mathbb{R}^{2 \times 2}$ is symmetric positive definite. Then the eigenvalues of $A$ are positive, and if $\lambda_2 \geq \lambda_1 > 0$ are the eigenvalues, then

Use these facts to show that the bilinear form (9.8) is $V$-elliptic when $\rho > 0$ on $\Omega$, $A$ is symmetric and uniformly positive definite on $\Omega$, and $V = \{v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_1\}$.
4. Derive formulas (9.16) and (9.17).
5. (MATLAB) Solve the BVP from Example 9.3 using piecewise linear finite elements, beginning with the mesh shown in Figure 9.4, and refining the mesh twice. Are the effects of locking still visible on the refined mesh?
6. Consider the pure traction problem for the system of isotropic elasticity,

where $\sigma = 2\mu\epsilon + \lambda\,\mathrm{tr}(\epsilon)I$. When the finite element method is applied, the result is a singular stiffness matrix $K$.
   (a) Use the results of Exercise 1.3.5 to show that the null space of $K$ is three-dimensional, and to find a basis for the null space.


   (b) Explain how to produce a nonsingular matrix by removing three (properly chosen) rows and columns from $K$ (cf. Example 7.4). (Hint: Removing a column from $K$ is equivalent to setting the corresponding component of the displacement to zero. It would be physically reasonable to fix one point (thus eliminating the translational degrees of freedom) and to set either the $x$- or $y$-component of displacement of another point to zero (thus eliminating the rotational degree of freedom).)
7. (MATLAB) Let $\Omega$ be the unit circle and consider the pure traction problem (9.21), where $\mu = 1000$, $\lambda = 10$, and the traction on the boundary is given by $\sigma n = 100$. Solve the BVP using piecewise linear finite elements, graph the displacement, and verify that the computed displacement is correct by comparing it with the results of Exercises 1.3.6b and 1.3.8. (Warning: Recall that a pure traction problem like this results in a singular stiffness matrix. Use the results of the previous exercise to solve $KU = F$.)
8. (MATLAB) Let $\Omega$ be the unit square and consider the BVP

where $\Gamma_1$ is the bottom side of the square (corresponding to $y = 0$) and the other three sides comprise $\Gamma_2$.
   (a) What boundary traction $h$ will produce the pure shear of Exercise 1.3.6a?
   (b) Formulate and solve the BVP in MATLAB for $\mu = 1000$, $\lambda = 10$, and $\delta = 0.1$ ($\delta$ is the constant from Exercise 1.3.6a).
9. (MATLAB) Consider the pure traction problem (9.21), where $\Omega$ is the unit circle, $\mu = 1000$, $\lambda = 10$, and $h$ is given in polar coordinates by $h(\theta) = 1000\cos(k\theta)n$, $n = (\cos(\theta), \sin(\theta))$. Solve the BVP and plot the displacement for various (small) integers $k$. Use quadratic isoparametric triangles.
10. (Programming) Write a MATLAB function LoadMC that assembles the load vector for BVP (9.5). (The reader can choose whether to do this for linear meshes, arbitrary polygonal meshes, or arbitrary meshes with isoparametric elements.) LoadMC should handle inhomogeneous boundary conditions.
11. (Programming project) Write a collection of routines (like StiffnessIso, LoadIso, and so forth) to solve the BVP (9.12) on a mesh of quadrilaterals. Use bilinear shape functions. Test your code on the BVP from Example 9.3 and on similar problems with nearly incompressible materials. Are the same locking effects seen?

Part III

Chapter 10

## Direct solution of sparse linear systems

This part of the book discusses the problem of solving the finite element equations $KU = F$. As I have pointed out several times, one of the primary advantages of the finite element method is that the matrix $K$ is sparse, which makes it possible to solve $KU = F$ even when $K$ is very large. There are two classes of algorithms for solving linear systems: direct and iterative methods. Direct methods are related to the fundamental Gaussian elimination algorithm and are distinguished by the fact that the exact solution (up to round-off error) is computed in a finite number of calculations. On the other hand, an iterative method computes a sequence of approximations that converges to the exact solution. Although the exact solution is computed only in the limit, an acceptable approximation is often found after relatively few iterations. For many problems, an appropriate iterative method uses much less memory and computational time than a direct method.

Most of Part III focuses on iterative methods. However, I begin with a brief discussion of direct methods for solving sparse systems. This will provide a context for explaining the advantages and disadvantages of iterative methods. For simplicity, I will restrict myself to symmetric positive definite systems $KU = F$.

## 10.1 The Cholesky factorization for positive definite matrices

Mathematically, the solution of $KU = F$ can be expressed as $U = K^{-1}F$, where $K^{-1}$ is the inverse of the matrix $K$. Computationally, though, it is not a good idea to compute $K^{-1}$. Instead, a factorization of $K$ can be computed which then makes it easy to solve $KU = F$. For symmetric positive definite systems, the Cholesky factorization is preferred.


## 10.1.1 The Cholesky factorization for dense matrices

The Cholesky factorization of a symmetric positive definite matrix $K$ takes the form $K = R^T R$, where $R$ is an upper triangular matrix.¹⁴ An upper triangular matrix $R$ has nonzeros only on or above the main diagonal; that is, $R_{ij} = 0$ for $i > j$. I will begin by discussing the Cholesky factorization for dense matrices, and later turn to the implications of sparsity. Given $K = R^T R$, it is easy to solve $KU = F$:

The inverse matrices are used for notational convenience. However, no inverse matrices need be formed. The system $R^T V = F$ takes the form

The first equation can be solved to get $V_1$, which can then be substituted into the second equation to get $V_2$, and so on. This algorithm is called forward substitution. Solving the upper triangular system $RU = V$ by back substitution is no harder. The last equation is solved for $U_N$, which is then substituted into the next-to-the-last equation to get $U_{N-1}$, and so on. Forward and back substitution each require $N^2$ arithmetic operations when $R$ is $N \times N$ and it cannot be assumed that any of the upper triangular entries in $R$ are zero (see Exercise 1).

The factorization $K = R^T R$ can be derived directly from the definition of matrix-matrix multiplication:

Since R is upper triangular by hypothesis, the index k can be restricted so that the above summation does not involve any elements from the strict lower triangle of R:

In particular, $K_{11} = R_{11}^2$, and thus $R_{11} = \sqrt{K_{11}}$. It can be shown that all the diagonal entries of a positive definite matrix are positive, so the square root produces a real number. On the other hand, if $K_{11}$ fails to be positive, then the algorithm can detect that $K$ is not positive definite. Having computed $R_{11}$, it is possible to compute the remainder of the first row of $R$:

¹⁴ Sometimes the factorization is expressed as $K = LL^T$, where $L$ is a lower triangular matrix. Obviously the two forms are equivalent, with $L = R^T$.


In general, having computed the first $i - 1$ rows of $R$, the $i$th row can be computed as follows. First of all,

It can be shown that if $K$ is positive definite, then

must hold, and therefore, once again, taking the square root is allowed. On the other hand, if

is detected, this demonstrates that $K$ is not positive definite. Finally, having computed $R_{ii}$, the remainder of row $i$ of $R$ can be computed as follows:

The complete algorithm (omitting the test to verify (10.1), that is, to verify that K is really positive definite) is shown in Algorithm 10.1.

Algorithm 10.1. The Cholesky factorization for a dense $N \times N$ symmetric positive definite matrix.

Computing the Cholesky factorization of a (dense) $N \times N$ matrix requires $N^3/3 + N^2/2 - 5N/6$ arithmetic operations, plus $N$ square roots (see Exercise 2). This operation count is usually expressed as $O(N^3/3)$, since only the leading term in the polynomial is significant when $N$ is large.
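The row-by-row derivation above, together with forward and back substitution, can be sketched in plain Python as follows. This is an illustrative sketch rather than a transcription of Algorithm 10.1: the function names are hypothetical, matrices are stored as lists of rows, and, unlike the algorithm as described in the text, the sketch includes the positivity test and raises an error when it fails.

```python
import math


def cholesky(K):
    """Return the upper triangular R with K = R^T R, computed one row at a time."""
    N = len(K)
    R = [[0.0] * N for _ in range(N)]
    for i in range(N):
        # Diagonal entry: R_ii^2 = K_ii - sum_{k<i} R_ki^2.
        d = K[i][i] - sum(R[k][i] ** 2 for k in range(i))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        R[i][i] = math.sqrt(d)
        # Remainder of row i: R_ij = (K_ij - sum_{k<i} R_ki R_kj) / R_ii.
        for j in range(i + 1, N):
            s = K[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            R[i][j] = s / R[i][i]
    return R


def forward_substitution(R, F):
    """Solve the lower triangular system R^T V = F."""
    N = len(F)
    V = [0.0] * N
    for i in range(N):
        V[i] = (F[i] - sum(R[k][i] * V[k] for k in range(i))) / R[i][i]
    return V


def back_substitution(R, V):
    """Solve the upper triangular system R U = V."""
    N = len(V)
    U = [0.0] * N
    for i in range(N - 1, -1, -1):
        U[i] = (V[i] - sum(R[i][k] * U[k] for k in range(i + 1, N))) / R[i][i]
    return U


def solve_spd(K, F):
    """Solve K U = F for symmetric positive definite K without forming an inverse."""
    R = cholesky(K)
    return back_substitution(R, forward_substitution(R, F))
```

Once `R` is available, additional right-hand sides can be handled by repeating only the two substitution steps, which is the practical reason for factoring $K$ rather than inverting it.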


## 10.1.2 The Cholesky factorization for banded matrices

The simplest kind of sparse matrix is a banded matrix, in which all of the nonzeros belong to a band centered at the main diagonal. The precise definition is as follows: $K$ is a banded matrix with half-bandwidth $p$ if $K_{ij} = 0$ for $|i - j| > p$. The stiffness matrix is banded in problems with simple geometry when the nodes are ordered in a regular fashion. For example, Figure 10.1 shows a mesh on a square domain with the free nodes numbered by rows. The square is divided into $n^2$ smaller squares, with each small square then divided into two triangles ($n = 6$ in this example). A typical free node $z_i$ is adjacent to free nodes with indices

which shows that the stiffness matrix is banded with half-bandwidth $n$.¹⁵ A stiffness matrix $K$ corresponding to Figure 10.1 is shown in Figure 10.2.

The Cholesky factor $R$ of a banded symmetric positive definite matrix $K$ with half-bandwidth $p$ is upper triangular with half-bandwidth $p$. Typically $R$ has more nonzeros than the corresponding lower triangle of $K$, since zero entries within the nonzero band of $K$ can become nonzero in the corresponding entries of $R$. This is referred to as fill-in, and it is an important concept in the direct solution of sparse systems. Due to fill-in, the factors of a sparse matrix are usually less sparse than the original matrix. However, when the original matrix is banded, fill-in is limited to the band, and so the factors cannot be too much denser than the original matrix. This is illustrated in Figure 10.2, which shows the sparsity patterns of both $K$ and its Cholesky factor $R$ for the example described above.

Figure 10.1. A uniform mesh with regular mesh numbering.

15 If the PDE is Laplace's equation, then the entries $K_{i,i-n} = K_{i,i+n} = 0$ due to cancellation. In that case, the half-bandwidth reduces to $n - 1$.


Figure 10.2. Left: The sparsity pattern of a stiffness matrix $K$ corresponding to the mesh of Figure 10.1. Right: The sparsity pattern of the Cholesky factor of $K$.

Algorithm 10.2. The Cholesky factorization for a banded $N \times N$ symmetric positive definite matrix with half-bandwidth $p$.

The algorithm for the banded Cholesky factorization differs from the dense case only in that the various loops are limited so that the computations remain in the band. Algorithm 10.2 gives the details. Computing the Cholesky factorization of an $N \times N$ banded symmetric positive definite matrix with half-bandwidth $p$ requires $O(2p^2N)$ arithmetic operations, while solving the corresponding triangular systems requires $O(2pN)$ operations each.

The advantage of factoring $K$ instead of computing $K^{-1}$ is dramatic for a banded (or, more generally, sparse) matrix. The inverse of a banded matrix is dense, so the number of operations required to compute $K^{-1}$ is proportional to $pN^2$. When $N \gg p$, this is much more than is required to factor $K$. Even after $K^{-1}$ is known, $O(2N^2)$ operations are required to compute $K^{-1}F$, whereas only $O(4pN)$ are needed to solve the two triangular systems in the Cholesky factorization approach.
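To make the idea of "limiting the loops to the band" concrete, here is a small Python sketch. It is an illustration under simplifying assumptions rather than a transcription of Algorithm 10.2: the matrix is held in full storage for readability (a practical code would store only the band), and the name `banded_cholesky` is hypothetical.

```python
import math

def banded_cholesky(K, p):
    """Cholesky factor R (K = R^T R) of an SPD matrix with half-bandwidth p.

    K is a full list of rows; only entries with |i - j| <= p are touched.
    """
    n = len(K)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        lo = max(0, j - p)            # R[i][j] = 0 for i < j - p
        s = K[j][j] - sum(R[i][j] ** 2 for i in range(lo, j))
        R[j][j] = math.sqrt(s)
        # Only columns j+1, ..., j+p of row j can be nonzero.
        for l in range(j + 1, min(n, j + p + 1)):
            lo_l = max(0, l - p)      # R[i][l] = 0 for i < l - p
            s = K[j][l] - sum(R[i][j] * R[i][l] for i in range(lo_l, j))
            R[j][l] = s / R[j][j]
    return R
```

Each of the $N$ rows involves at most $p$ off-diagonal entries, and each entry needs a sum of at most $p$ products, which is where the $O(p^2N)$ scaling comes from.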

## 10.2 Factoring general sparse matrices

If $K$ is a sparse symmetric positive definite matrix with no particular sparsity pattern, then it is difficult to predict how many arithmetic operations will be required to compute the Cholesky factorization $K = R^TR$. This is because, in the absence of some regular pattern to the position of nonzeros in the matrix, it is hard to predict how much fill-in will occur. For example, the mesh in Figure 10.3 has 1024 triangles and 481 free nodes. Figure 10.4 shows the sparsity patterns of the stiffness matrix $K$ and its Cholesky factor $R$. About 1.3% of the entries of $K$ are nonzero, while almost 12% of the entries of (the upper triangle of) $R$ are nonzero.

Figure 10.3. A triangulation with 1024 triangles and 481 free nodes.

Figure 10.4. Left: The sparsity pattern of the stiffness matrix $K$ corresponding to the mesh of Figure 10.3 (and a certain ordering of the free nodes). Right: The sparsity pattern of the Cholesky factor $R$ of $K$.

It is important to note that both the sparsity pattern of $K$ and the resulting sparsity pattern of $R$ depend on the ordering of the free nodes in the mesh. The number of nonzeros in $K$ is not changed when the free nodes are listed in a different order, but the sparsity pattern is changed. On the other hand, because fill-in is influenced not just by the number of nonzeros but by the sparsity pattern, the number of nonzeros in $R$ changes when the free nodes in the mesh are listed in a different order.

There are various algorithms for reordering the nodes of a mesh so as to reduce the bandwidth, or more generally the fill-in, of the resulting stiffness matrix. Alternatively, similar algorithms reorder the rows and columns of a sparse matrix to reduce bandwidth or fill-in. The details of these algorithms are beyond the scope of this book; I will just briefly mention two such algorithms that apply directly to the matrix.

The symmetric reverse Cuthill-McKee (RCM) algorithm defines a permutation matrix $Q$ such that $QKQ^T$ tends to have its nonzero entries closer to the diagonal than does $K$, thus reducing the bandwidth. A permutation matrix results from permuting the rows of the identity matrix. Left-multiplying $K$ by $Q$ permutes the rows of $K$, while right-multiplying $K$ by $Q^T$ permutes the columns of $K$ according to the same permutation. Figure 10.5 shows the sparsity patterns of $QKQ^T$ (for the matrix $K$ illustrated in Figure 10.4) and the corresponding Cholesky factor, where $Q$ is obtained from the symmetric RCM method. The number of nonzeros in the Cholesky factor is about 23% less than the number of nonzeros in the original Cholesky factor $R$. The symmetric RCM algorithm is implemented in the built-in MATLAB function symrcm.

The symmetric minimum degree permutation defines another permutation matrix, $Q$, such that $QKQ^T$ tends to have a sparser Cholesky factor than $K$. This algorithm tries to minimize fill-in directly rather than to reduce bandwidth. Since it is expensive to implement the minimum degree algorithm exactly, there are various approximate algorithms, one of which is implemented in the built-in MATLAB function symamd. Figure 10.6 shows the sparsity patterns of $QKQ^T$ (still for the same matrix $K$), where $Q$ is obtained from symamd, and the corresponding Cholesky factor. The number of nonzeros in the Cholesky factor is now reduced by 50%.
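The dependence of fill-in on the ordering can be seen on a very small example. The Python sketch below counts the nonzeros in the Cholesky factor symbolically (ignoring numerical cancellation) for an "arrow" pattern in which one node is coupled to all the others. It does not implement RCM or the minimum degree algorithm; the function name `cholesky_nnz` and the test pattern are illustrative assumptions.

```python
def cholesky_nnz(pattern, order):
    """Count nonzeros in the Cholesky factor for a symmetric sparsity pattern.

    pattern: dict mapping node -> set of neighbors (off-diagonal nonzeros).
    order:   list of nodes giving the elimination (row/column) ordering.
    """
    adj = {v: set(nbrs) for v, nbrs in pattern.items()}
    eliminated = set()
    nnz = 0
    for v in order:
        later = adj[v] - eliminated
        nnz += 1 + len(later)       # diagonal entry plus the rest of row v of R
        for a in later:             # eliminating v couples its remaining neighbors
            adj[a] |= later - {a}
        eliminated.add(v)
    return nnz

# "Arrow" pattern: node 0 is coupled to every other node and nothing else is coupled.
n = 6
arrow = {0: set(range(1, n))}
arrow.update({i: {0} for i in range(1, n)})

hub_first = cholesky_nnz(arrow, list(range(n)))              # node 0 eliminated first
hub_last = cholesky_nnz(arrow, list(range(n - 1, -1, -1)))   # node 0 eliminated last
```

With the hub eliminated first, the factor fills in completely (21 nonzeros for $n = 6$); with the hub eliminated last, there is no fill-in at all (11 nonzeros, the same as the upper triangle of the matrix). Eliminating low-degree nodes first is the intuition behind minimum degree orderings.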

Figure 10.5. Left: The sparsity pattern of $QKQ^T$, where $K$ is the matrix from Figure 10.4 and the permutation matrix $Q$ is obtained from the symmetric RCM algorithm. Right: The sparsity pattern of the Cholesky factor of $QKQ^T$.


## Chapter 10. Direct solution of sparse linear systems

Figure 10.6. Left: The sparsity pattern of $QKQ^T$, where $K$ is the matrix from Figure 10.4 and the permutation matrix $Q$ is obtained from the symmetric minimum degree permutation. Right: The sparsity pattern of the Cholesky factor of $QKQ^T$.

Without the use of reordering, the cost (in computational time and memory) of factoring $K$ can easily become unacceptable. For example, the following table shows the number of nonzeros in the stiffness matrix $K$ and its Cholesky factor $R$ as the mesh shown in Figure 10.3 is refined. It also shows the time required to compute $R$.$^{16}$

| $N$ | nnz($K$) | nnz($R$) | Time (s) |
|---|---|---|---|
| 481 | 3017 | 13748 | 0.031552 |
| 1985 | 12681 | 121387 | 0.4246 |
| 8065 | 51977 | 1035889 | 8.6038 |
| 32513 | 210441 | 8613949 | 183.8831 |

The size of $K$ grows by a factor of about four each time the mesh is refined, as would be expected, and the number of nonzeros in $K$ grows proportionately. However, the number of nonzeros in $R$ grows much more quickly, and the time taken in computing $R$ grows more quickly still. Using the symamd function in MATLAB to reorder the rows and columns of $K$ leads to the following results:

| $N$ | nnz($K$) | nnz($R$) | Time (s) |
|---|---|---|---|
| 481 | 3017 | 6841 | 0.090548 |
| 1985 | 12681 | 41451 | 0.13162 |
| 8065 | 51977 | 230098 | 0.87183 |
| 32513 | 210441 | 1212742 | 7.9944 |

16 The actual times are irrelevant, as they were obtained on a nearly obsolete laptop computer. But the relative times are still meaningful.


The improvement is dramatic; nevertheless, the number of nonzeros in $R$ and the computational time are both growing noticeably faster than the number of nonzeros in $K$. For large problems, the cost of solving $KU = F$ will therefore overwhelm the cost of forming $K$ and $F$. It is for this reason that iterative methods for solving $KU = F$ are usually preferred.

## 10.3 Exercises for Chapter 10

1. Verify that forward or back substitution requires $N^2$ arithmetic operations for a triangular $N \times N$ system.

2. Verify that the exact operation count for Algorithm 10.1 is $N^3/3 + N^2/2 - 5N/6$.

3. Suppose $R \in \mathbf{R}^{N \times N}$ is a banded triangular matrix with half-bandwidth $p$. Determine the exact operation count for solving $RU = F$ or $R^TU = F$.

4. An alternate form for the Cholesky factorization is $K = R^TDR$, where $D$ is a diagonal matrix with positive diagonal entries and $R$ is an upper triangular matrix with ones on the diagonal. The advantage of this alternate form is that no square roots must be computed to form $D$ and $R$.
   (a) Write out an algorithm, similar to Algorithm 10.1, for computing $D$ and $R$.
   (b) What is the relationship between $D$ and $R$ and the Cholesky factor $\tilde{R}$ satisfying $K = \tilde{R}^T\tilde{R}$?

5. (MATLAB) Consider a sequence of uniformly refined meshes approximating the unit circle, such as shown in Figure 6.3. Consider solving the resulting finite element equations $KU = F$ with and without reordering the nodes (or, equivalently, reordering the rows and columns of $K$). Make a table showing the number of refinements, the time required to solve $KU = F$ without reordering, and the time required to solve $KU = F$ with reordering. Use the MATLAB routines chol to compute the Cholesky factorization and symamd to reorder the rows and columns of $K$.

6. In Section 3.1, the possibility of using an orthogonal basis in the context of Galerkin's method was discounted as too expensive. The purpose of this exercise is to examine the cost of producing an orthogonal basis via the Gram-Schmidt process.$^{17}$ Let $V_h$ be a finite element space, and suppose $\{\phi_1, \ldots, \phi_n\}$ is the standard nodal basis for $V_h$. Let $K$ be the usual stiffness matrix. Let $\{\theta_1, \ldots, \theta_n\}$ be the orthonormal basis produced by the Gram-Schmidt orthogonalization process applied to $\{\phi_1, \ldots, \phi_n\}$.
   (a) Prove that there is a lower triangular matrix $L$ such that
   $$\theta_i = \sum_{j=1}^{i} L_{ij}\phi_j,\quad i = 1, 2, \ldots, n.$$
   (b) Show that $LKL^T = I$, where $I$ is the $n \times n$ identity matrix.
   (c) Find the relationship between $L$ and the Cholesky factor $R$ of $K$ ($R$ upper triangular, $K = R^TR$).
   (d) Compare the cost of finding the orthonormal basis with the cost of forming the matrix $K$ and solving the equation $KU = F$.

17 The Gram-Schmidt process is a standard method for producing an orthogonal basis from a given basis. An explanation can be found in any introductory text on linear algebra, such as those listed in the bibliography.

## Chapter 11. Iterative methods: Conjugate gradients

In the preceding chapter, direct methods for sparse systems were shown to be hampered by fill-in, which in turn was determined by the sparsity pattern of the matrix. For iterative methods, on the other hand, the sparsity pattern matters little, since the only requirement is that the matrix-vector product $KV$ can be computed efficiently. The sparsity of $K$ is important only in that it makes the computation of $KV$ inexpensive. I will discuss two general classes of iterative methods, those based on minimizing an associated functional, and stationary iterations, based on fixed point iteration. The most important example of the first class is the conjugate gradient (CG) method, which is the subject of this chapter. The second class, examined in the next chapter, includes the Jacobi, Gauss-Seidel, and successive overrelaxation (SOR) methods.

## 11.1 The CG method

The CG method is actually an algorithm for minimizing a quadratic form. If $K \in \mathbf{R}^{N \times N}$ is symmetric positive definite, $F \in \mathbf{R}^N$, and $\phi : \mathbf{R}^N \to \mathbf{R}$ is defined by

$$\phi(U) = \frac{1}{2}U \cdot KU - U \cdot F,$$

then

$$\nabla\phi(U) = KU - F.$$

Therefore, the unique stationary point of $\phi$ is

$$U = K^{-1}F.$$

Moreover, a consideration of the second derivative matrix shows that this stationary point is the global minimizer of $\phi$ (a quadratic form defined by a symmetric positive definite matrix is analogous to a scalar quadratic $ax^2 + bx + c$ with $a > 0$; see Figure 11.1). Therefore, solving $KU = F$ and minimizing $\phi$ are equivalent. Any iterative minimization algorithm can therefore be applied to $\phi$ and, assuming it works, it will converge to the desired value of $U$.

Figure 11.1. The graph of a quadratic form defined by a positive definite matrix. The contours of the function are also shown.

An important class of minimization algorithms consists of descent methods based on a line search. Such algorithms are based on the idea of a descent direction: Given an estimate $U^{(i)}$ of the solution, a descent direction $P$ for $\phi$ at $U^{(i)}$ is a direction such that, starting from $U^{(i)}$, $\phi$ decreases in the direction of $P$. This means that, for all $\alpha > 0$ sufficiently small,

$$\phi(U^{(i)} + \alpha P) < \phi(U^{(i)})$$

holds. Equivalently, this means that the directional derivative of $\phi$ at $U^{(i)}$ in the direction of $P$ is negative, that is,

$$\nabla\phi(U^{(i)}) \cdot P < 0.$$

Given a descent direction, a line search algorithm will seek to minimize $\phi$ along the ray $\{U^{(i)} + \alpha P : \alpha > 0\}$ (that is, it will search along this "line," which is really a ray). The quadratic $\phi$ reduces to a scalar quadratic along a one-dimensional subset, so it is particularly easy to perform the line search. Indeed,

$$\phi(U^{(i)} + \alpha P) = \phi(U^{(i)}) + \alpha\,(KU^{(i)} - F) \cdot P + \frac{\alpha^2}{2}\,P \cdot KP$$


(the symmetry of $K$ was used to combine the terms $U^{(i)} \cdot KP/2$ and $P \cdot KU^{(i)}/2$). The minimum is easily seen to occur at

$$\alpha = \frac{(F - KU^{(i)}) \cdot P}{P \cdot KP}.$$

How should the descent direction be chosen? The obvious choice is the steepest descent direction

$$P = -\nabla\phi(U^{(i)}) = F - KU^{(i)},$$

since the directional derivative of $\phi$ at $U^{(i)}$ is as negative as possible in this direction. The resulting algorithm (choose a starting point, move to the minimum in the steepest descent direction, calculate the steepest descent direction at that new point, and repeat) is called the steepest descent algorithm. For an example of a line search in the steepest descent direction, see Figure 11.2.

The steepest descent method is guaranteed to converge to the minimizer $U$ of $\phi$, that is, to the solution of $KU = F$. However, it can be shown that the steepest descent method converges slowly, especially when $K$ is ill-conditioned. The condition number of a symmetric positive definite matrix $K$ is

$$\operatorname{cond}(K) = \frac{\lambda_{\max}}{\lambda_{\min}},$$

where $\lambda_{\min}$ and $\lambda_{\max}$ are the smallest and largest eigenvalues of $K$, respectively. The matrix $K$ is called ill-conditioned if $\operatorname{cond}(K) \gg 1$. When $K$ is the stiffness matrix for a (two-dimensional) BVP and a nodal basis is used, its condition number is $O(h^{-2})$, where $h$ is the mesh size. This shows that $K$ becomes increasingly ill-conditioned as the mesh is refined.

To show the relationship between the condition number of $K$ and the convergence of the steepest descent algorithm, it is convenient to introduce an alternate inner product and norm for $\mathbf{R}^N$:

$$(x, y)_K = x \cdot Ky,\qquad \|x\|_K = \sqrt{(x, x)_K}.$$

It can be shown that $(\cdot, \cdot)_K$ defines an inner product on $\mathbf{R}^N$ when $K$ is symmetric positive definite.$^{18}$ Moreover, the sequence of vectors $U^{(0)}, U^{(1)}, U^{(2)}, \ldots$ produced by the steepest descent algorithm satisfies

$$\|U^{(k)} - U\|_K \le \left(\frac{\operatorname{cond}(K) - 1}{\operatorname{cond}(K) + 1}\right)^k \|U^{(0)} - U\|_K. \tag{11.4}$$

A proof can be found in Luenberger [28].
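The steepest descent iteration described above is short enough to write out in full. The following Python sketch is one possible rendering, assuming a starting guess of zero and a stopping test on the relative residual; the helper names `matvec`, `dot`, and `steepest_descent` are illustrative, not from the text.

```python
def matvec(K, x):
    """Dense matrix-vector product; K is a list of rows."""
    return [sum(kij * xj for kij, xj in zip(row, x)) for row in K]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def steepest_descent(K, F, tol=1e-6, max_iter=10000):
    """Minimize phi(U) = U.KU/2 - U.F by exact line searches along R = F - KU."""
    U = [0.0] * len(F)
    R = list(F)                         # residual F - KU for U = 0
    norm_F = dot(F, F) ** 0.5
    for k in range(max_iter):
        if dot(R, R) ** 0.5 <= tol * norm_F:
            return U, k                 # k line searches were performed
        V = matvec(K, R)
        alpha = dot(R, R) / dot(R, V)   # minimizer of phi along U + alpha*R
        U = [u + alpha * r for u, r in zip(U, R)]
        R = [r - alpha * v for r, v in zip(R, V)]
    return U, max_iter
```

Because the search direction is the residual itself, the step length formula $\alpha = (F - KU) \cdot P / (P \cdot KP)$ simplifies to $R \cdot R / (R \cdot KR)$, and the residual can be updated with the product $KR$ that has already been computed.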

18 If $K$ is the stiffness matrix for a BVP, then the inner product defined by $K$ is the discrete version of the energy inner product $a(\cdot, \cdot)$.


Figure 11.2. The contours of the quadratic form from Figure 11.1. The steepest descent direction from $U = (4, 2)$ (marked by o) is indicated, along with the minimizer in the steepest descent direction (marked by o). The desired (global) minimizer is marked by x.

The bound (11.4) implies that the number of iterations needed by the steepest descent method is roughly proportional to the condition number of $K$ (see Exercise 2). The algorithm therefore takes increasingly long to converge as the condition number of $K$ increases (for example, as the underlying mesh is refined). As I will show, much better algorithms are available.

EXAMPLE 11.1. A fixed test problem will be used to test the algorithms described in this chapter, namely, the BVP

where $\Omega$ is the unit square and

The exact solution is

The mesh on $\Omega$ is uniform, constructed by dividing the $x$ and $y$ intervals into $n$ subintervals each. This results in $2n^2$ triangles and $(n - 1)^2$ free nodes. The finite element equation, $KU = F$, is therefore $N \times N$, with $N = (n - 1)^2$.


The steepest descent algorithm is used to solve $KU = F$, starting with $U^{(0)} = 0$ and stopping when the relative residual

$$\frac{\|F - KU^{(k)}\|}{\|F\|}$$

falls below $10^{-6}$. The vector $R^{(k)} = F - KU^{(k)}$ is called the residual in the equation $KU = F$; it is the amount by which the equation fails to be satisfied. The iterations required are reported in the following table, along with the condition number of the matrix $K$.

| $N$ | Ratio |
|---|---|
| 9 | 4.00 |
| 49 | 4.35 |
| 225 | 4.55 |
| 961 | 4.64 |
| 3969 | 4.66 |
| 16129 | 4.67 |

In this experiment, the number of iterations is roughly proportional to the condition number of $K$, as suggested above.

## 11.1.1 The CG algorithm

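The CG algorithm is another descent algorithm that is usually a great improvement over the steepest descent method. The problem with the steepest descent method is that while the steepest descent direction is locally optimal, the search directions are poorly chosen from a global point of view. Indeed, it can be shown that successive search directions are orthogonal, and thus the path followed to the solution is not efficient. The CG algorithm defines the successive search directions to satisfy a pleasing global property: Each step preserves the optimality of previous steps. To be precise, after $k$ steps of CG, the estimated solution is the minimizer of $\phi$ over the $k$-dimensional subspace spanned by the first $k$ search directions.

It is rather difficult to derive the CG algorithm; the final form results from several nonobvious simplifications. I will content myself with showing the critical step: the computation of the search direction. I assume that the initial estimate of the solution is $U^{(0)} = 0$, that the first $k$ search directions are $P^{(1)}, P^{(2)}, \ldots, P^{(k)}$, and that after $k$ steps $\alpha_1, \alpha_2, \ldots, \alpha_k$ are determined so that

$$U^{(k)} = \sum_{i=1}^{k} \alpha_i P^{(i)}$$

solves

$$\min\left\{\phi(U) : U \in \operatorname{span}\{P^{(1)}, P^{(2)}, \ldots, P^{(k)}\}\right\}.$$

I now wish to find a new search direction $P^{(k+1)}$ with the following property: If

$$U^{(k)} = \sum_{i=1}^{k} \alpha_i P^{(i)}$$

solves

$$\min_{U \in S_k} \phi(U),$$

where

$$S_k = \operatorname{span}\{P^{(1)}, P^{(2)}, \ldots, P^{(k)}\},$$

then

$$U^{(k+1)} = U^{(k)} + \alpha_{k+1}P^{(k+1)}$$

solves

$$\min_{U \in S_{k+1}} \phi(U). \tag{11.6}$$

It is not clear a priori how to compute such a $P^{(k+1)}$. However, it turns out to be easy, and this is the secret of the CG method. The solution of (11.6) is given by

$$U = \sum_{i=1}^{k} \beta_i P^{(i)} + \beta_{k+1}P^{(k+1)},$$

with the property that

$$\phi\left(\sum_{i=1}^{k} \beta_i P^{(i)} + \beta_{k+1}P^{(k+1)}\right)$$

is as small as possible. I separate the last term, $\beta_{k+1}P^{(k+1)}$, from the sum because I already know how to make

$$\phi\left(\sum_{i=1}^{k} \beta_i P^{(i)}\right)$$

as small as possible.

Here is the crucial observation: If $P^{(k+1)}$ is chosen so that

$$\left(\sum_{i=1}^{k} \beta_i P^{(i)}\right) \cdot KP^{(k+1)}$$

is zero, then

$$\phi\left(\sum_{i=1}^{k} \beta_i P^{(i)} + \beta_{k+1}P^{(k+1)}\right) = \phi\left(\sum_{i=1}^{k} \beta_i P^{(i)}\right) + \phi\left(\beta_{k+1}P^{(k+1)}\right).$$

The minimization problem is then "decoupled." That is, $\beta_1, \beta_2, \ldots, \beta_k$ can be chosen to minimize

$$\phi\left(\sum_{i=1}^{k} \beta_i P^{(i)}\right) \tag{11.7}$$

and $\beta_{k+1}$ to minimize

$$\phi\left(\beta_{k+1}P^{(k+1)}\right), \tag{11.8}$$

and the resulting $\beta_1, \beta_2, \ldots, \beta_{k+1}$ will be the solution of (11.6). By assumption, $\beta_i = \alpha_i$, $i = 1, 2, \ldots, k$, solves (11.7), and I have already shown how to compute $\beta_{k+1}$ using the fact that (11.8) reduces to a quadratic in one variable.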


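It is certainly sufficient to satisfy

$$P^{(i)} \cdot KP^{(k+1)} = 0,\quad i = 1, 2, \ldots, k.$$

To find $P^{(k+1)}$, it can be assumed that this property was satisfied at the previous steps:

$$P^{(i)} \cdot KP^{(j)} = 0,\quad i \ne j,\quad i, j = 1, 2, \ldots, k. \tag{11.9}$$

Condition (11.9) states that the vectors $P^{(1)}, P^{(2)}, \ldots, P^{(k)}$ are orthogonal with respect to the inner product $(\cdot, \cdot)_K$ introduced above. A search direction $P^{(k+1)}$ with the desired property can be computed from any descent direction by subtracting off its component lying in the subspace

$$S_k = \operatorname{span}\{P^{(1)}, P^{(2)}, \ldots, P^{(k)}\};$$

the result will be orthogonal to each of the vectors $P^{(1)}, P^{(2)}, \ldots, P^{(k)}$. The CG method results from choosing the descent direction to be the steepest descent direction $R^{(k)} = -\nabla\phi(U^{(k)}) = F - KU^{(k)}$. Then

$$P^{(k+1)} = R^{(k)} - \sum_{i=1}^{k} \frac{R^{(k)} \cdot KP^{(i)}}{P^{(i)} \cdot KP^{(i)}}\,P^{(i)}. \tag{11.10}$$

Finally, computing $\alpha_{k+1}$ is simple, as shown earlier:

$$\alpha_{k+1} = \frac{P^{(k+1)} \cdot R^{(k)}}{P^{(k+1)} \cdot KP^{(k+1)}}.$$

Then

$$U^{(k+1)} = U^{(k)} + \alpha_{k+1}P^{(k+1)}.$$

The above formulas define the CG algorithm. However, efficient implementation depends on several simplifications that are not straightforward to derive. The reader is referred to Golub and Van Loan [21] for proofs of the following assertions. It can be shown that if $P^{(1)}, P^{(2)}, \ldots, P^{(k)}$ are chosen as described above, then

$$S_k = \operatorname{span}\{P^{(1)}, P^{(2)}, \ldots, P^{(k)}\}$$

has an alternate basis:

$$S_k = \operatorname{span}\{F, KF, K^2F, \ldots, K^{k-1}F\}. \tag{11.12}$$

A subspace of the form (11.12) is called a Krylov subspace, and so the CG method can be described as a Krylov subspace method. Because (11.12) holds, it can be shown that

$$R^{(k)} \cdot KP^{(i)} = 0,\quad i = 1, 2, \ldots, k - 1,$$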


and thus (11.10) reduces to

$$P^{(k+1)} = R^{(k)} - \frac{R^{(k)} \cdot KP^{(k)}}{P^{(k)} \cdot KP^{(k)}}\,P^{(k)}.$$

In other words, $R^{(k)}$ is automatically orthogonal (in the inner product defined by $K$) to $P^{(1)}, P^{(2)}, \ldots, P^{(k-1)}$, so it need only be orthogonalized against $P^{(k)}$. This results in a significant savings in computation that makes the CG method affordable. Further analysis shows that

$$\alpha_{k+1} = \frac{R^{(k)} \cdot R^{(k)}}{P^{(k+1)} \cdot KP^{(k+1)}}$$

and

$$P^{(k+1)} = R^{(k)} + \frac{R^{(k)} \cdot R^{(k)}}{R^{(k-1)} \cdot R^{(k-1)}}\,P^{(k)}.$$

Using these formulas, which reduce the amount of computational work that must be done, the CG algorithm can be written in the efficient form found in Algorithm 11.1. The reader should note that only a single matrix-vector product is required at each step of the algorithm, making it very efficient. In addition to the matrix-vector product, only $O(10N)$ arithmetic operations are required for each iteration. The storage requirements, beyond $K$ and $F$, are only four vectors, denoted $R$, $P$, $V$, $U$ in Algorithm 11.1. The CG algorithm is usually halted when the relative residual is reduced to a level considered satisfactory, or when a predetermined iteration limit is reached.

    U ← 0
    R ← F
    P ← R
    c1 ← R · R
    for k = 1, 2, ...
        V ← KP
        c2 ← P · V
        α ← c1/c2
        U ← U + αP
        R ← R − αV
        c3 ← R · R
        β ← c3/c1
        c1 ← c3
        P ← βP + R

Algorithm 11.1. The CG algorithm for solving $KU = F$, where $K$ is a symmetric positive definite matrix.

The name "conjugate gradient" is derived from the fact that many authors refer to the orthogonality of the search directions, in the inner product defined by $K$, as $K$-conjugacy. Therefore, the key step is to make the (negative) gradient direction conjugate to the previous search directions.
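Algorithm 11.1 translates almost line for line into code. The Python sketch below is one such rendering, under a few assumptions of my own: the matrix is supplied only through a function `matvec` that returns $KP$ (reflecting the fact that CG needs nothing else from $K$), the initial guess is zero, and the loop stops on the relative residual or an iteration cap. The names `conjugate_gradient`, `matvec`, `tol`, and `max_iter` are illustrative.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def conjugate_gradient(matvec, F, tol=1e-6, max_iter=1000):
    """Solve KU = F for symmetric positive definite K, given matvec(P) = KP."""
    U = [0.0] * len(F)
    R = list(F)                      # residual for the initial guess U = 0
    P = list(R)
    c1 = dot(R, R)
    stop = tol * tol * dot(F, F)     # compare squared norms to avoid square roots
    for k in range(max_iter):
        if c1 <= stop:
            return U, k
        V = matvec(P)                # the only matrix-vector product per step
        c2 = dot(P, V)
        alpha = c1 / c2
        U = [u + alpha * p for u, p in zip(U, P)]
        R = [r - alpha * v for r, v in zip(R, V)]
        c3 = dot(R, R)
        beta = c3 / c1
        c1 = c3
        P = [beta * p + r for p, r in zip(P, R)]
    return U, max_iter
```

Passing a function rather than a stored matrix is a design choice that pays off later: the same routine can be applied to a matrix that is never formed explicitly, as long as its action on a vector can be computed.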

## 11.1.2 Convergence of the CG algorithm

According to the above derivation, CG produces a sequence $U^{(1)}, U^{(2)}, \ldots$ with the property that $U^{(k)}$ minimizes

$$\phi(U) = \frac{1}{2}U \cdot KU - U \cdot F$$

over the $k$-dimensional subspace $S_k$. Since $S_N = \mathbf{R}^N$ and the exact solution of $KU = F$ minimizes $\phi$ over $\mathbf{R}^N$, this shows that $U^{(N)}$ is the exact solution of $KU = F$. Therefore, CG can be regarded as a direct method; in exact arithmetic, it produces the exact solution of the equation in a finite number of steps! However, this fact is not really relevant, for two reasons. First of all, round-off errors tend to lead to a loss of orthogonality, so that $U^{(N)}$ is usually not as accurate a solution of $KU = F$ as one expects from a direct method. More importantly, CG is usually applied to systems that are large enough that performing $N$ iterations is unrealistic. The CG method is useful because it can produce quite accurate results in many fewer than $N$ iterations.

The convergence analysis, which is outlined in Luenberger [28], shows that

$$\|U^{(k)} - U\|_K \le 2\left(\frac{\sqrt{\operatorname{cond}(K)} - 1}{\sqrt{\operatorname{cond}(K)} + 1}\right)^k \|U^{(0)} - U\|_K.$$

Therefore, the convergence behavior depends on the square root of the condition number rather than the condition number itself, a considerable improvement over the steepest descent algorithm; one can show that the number of iterations required is proportional to the square root of the condition number (see Exercise 3). In many cases, the algorithm behaves even better than this bound suggests. Nonetheless, it is still advantageous to try to replace $KU = F$ by an equivalent system with a better-conditioned coefficient matrix. This is the subject of the next two sections.

EXAMPLE 11.2. The convergence of the CG algorithm is illustrated on the test problem from Example 11.1, solving $KU = F$ for a sequence of increasingly fine meshes. The following table shows the number of iterations taken by CG in reducing the relative residual to $10^{-6}$:

| Ratio |
|---|
| 1.67 |
| 2.62 |
| 2.69 |
| 2.69 |
| 2.71 |
| 2.87 |

In this example, the number of iterations is roughly proportional to the square root of $\operatorname{cond}(K)$, a striking improvement over the steepest descent algorithm.


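## 11.2 Hierarchical bases for finite element spaces

So far, I have assumed that a nodal basis $\{\phi_1, \phi_2, \ldots, \phi_{N_f}\}$, characterized by

$$\phi_i(z_{f_j}) = \begin{cases} 1, & i = j, \\ 0, & i \ne j \end{cases}$$

at the free nodes $z_{f_1}, z_{f_2}, \ldots, z_{f_{N_f}}$, will be used to represent the finite element solution. However, a finite-dimensional space has many possible bases, and it may be advantageous to choose one basis over another. The nodal basis is the most popular basis because of ease of use; its interpretation is simple and it is straightforward to assemble $K$ and $F$. On the other hand, the condition number of $K$ depends on the particular basis chosen, and a nodal basis leads to a condition number of $O(h^{-2})$, which grows significantly as the mesh is refined.

An alternative to a standard nodal basis is a hierarchical basis (first proposed by Zienkiewicz et al. [45]; see also Yserentant [44]). For the piecewise linear case, a hierarchical basis can be defined in a natural way when the mesh is obtained by several refinements of an initial, coarse mesh. In the following discussion I will consider a sequence of meshes $\mathcal{T}_0, \mathcal{T}_1, \mathcal{T}_2, \ldots$, where $\mathcal{T}_0$ is a given (coarse) mesh and each $\mathcal{T}_{i+1}$ is the standard refinement of $\mathcal{T}_i$. Each mesh is assumed to consist of linear Lagrange triangles, so every node in a given mesh is a vertex of one or more triangles. By construction, each node in $\mathcal{T}_i$ is also a node in $\mathcal{T}_{i+1}$. Moreover, every node belonging to $\mathcal{T}_{i+1}$ but not to $\mathcal{T}_i$ is the midpoint of an edge in $\mathcal{T}_i$. I will assume that the nodes belonging to $\mathcal{T}_0$ are $z_1, z_2, \ldots, z_{N^{(0)}}$, the nodes of $\mathcal{T}_1$ are $z_1, z_2, \ldots, z_{N^{(1)}}$, and so forth. Thus the nodes of $\mathcal{T}_i$ are the first $N^{(i)}$ nodes of $\mathcal{T}_{i+1}$.

## 11.2.1 Hierarchical bases for linear Lagrange triangles

I will write $P_i^{(1)}$ for the space of continuous piecewise linear functions defined on $\mathcal{T}_i$. The standard basis functions for $P_{i+1}^{(1)}$ do not belong to the space $P_i^{(1)}$, since they are not linear over the triangles of $\mathcal{T}_i$. However, every function in $P_i^{(1)}$ is a member of $P_{i+1}^{(1)}$ (see Figure 11.3). In particular, any basis for $P_i^{(1)}$ is a linearly independent set in $P_{i+1}^{(1)}$, and can therefore be extended to a basis for $P_{i+1}^{(1)}$. This is the foundation of the hierarchical basis construction.

The first step in the hierarchical basis construction is to define

$$\left\{\psi_1^{(0)}, \psi_2^{(0)}, \ldots, \psi_{N^{(0)}}^{(0)}\right\}$$

to be the standard Lagrange basis for $P_0^{(1)}$. The hierarchical basis for $P_1^{(1)}$ is then taken to be

$$\left\{\psi_1^{(0)}, \ldots, \psi_{N^{(0)}}^{(0)}, \psi_{N^{(0)}+1}^{(1)}, \ldots, \psi_{N^{(1)}}^{(1)}\right\}, \tag{11.15}$$

where $\psi_j^{(i)}$ denotes one of the standard basis functions defined relative to $\mathcal{T}_i$. I will now show the exact relationship between the hierarchical basis (11.15) and the standard Lagrange basis

$$\left\{\psi_1^{(1)}, \psi_2^{(1)}, \ldots, \psi_{N^{(1)}}^{(1)}\right\}$$

for $P_1^{(1)}$.

Figure 11.3. A standard basis function on a mesh (left) and the same function on the refined mesh (right).

If $i \le N^{(0)}$, then all the basis functions in (11.15), except $\psi_i^{(0)}$, are zero at node $z_i$. That is,

$$\psi_j^{(0)}(z_i) = \begin{cases} 1, & j = i, \\ 0, & j \ne i, \end{cases}\quad j = 1, \ldots, N^{(0)},\qquad \psi_j^{(1)}(z_i) = 0,\quad j = N^{(0)}+1, \ldots, N^{(1)}.$$

It follows that if $u \in P_1^{(1)}$ is written as

$$u = \sum_{j=1}^{N^{(0)}} \alpha_j\psi_j^{(0)} + \sum_{j=N^{(0)}+1}^{N^{(1)}} \alpha_j\psi_j^{(1)}, \tag{11.17}$$

then

$$\alpha_i = u(z_i),\quad i = 1, 2, \ldots, N^{(0)}. \tag{11.18}$$

In other words, the first $N^{(0)}$ coefficients representing $u$ are its nodal values at the nodes of $\mathcal{T}_0$. On the other hand, if $i > N^{(0)}$, then $\psi_1^{(0)}, \ldots, \psi_{N^{(0)}}^{(0)}$ are not all zero at $z_i$, which is a midpoint of some edge in $\mathcal{T}_0$. If the endpoints of this edge are $z_{\text{end}(i,1)}$ and $z_{\text{end}(i,2)}$, then

$$\psi_{\text{end}(i,1)}^{(0)}(z_i) = \psi_{\text{end}(i,2)}^{(0)}(z_i) = \frac{1}{2},\qquad \psi_j^{(0)}(z_i) = 0 \text{ for all other } j \le N^{(0)},$$

and

$$\psi_j^{(1)}(z_i) = \begin{cases} 1, & j = i, \\ 0, & j \ne i, \end{cases}\quad j = N^{(0)}+1, \ldots, N^{(1)}.$$

Therefore, (11.17) implies

$$u(z_i) = \frac{1}{2}\alpha_{\text{end}(i,1)} + \frac{1}{2}\alpha_{\text{end}(i,2)} + \alpha_i = \frac{1}{2}u(z_{\text{end}(i,1)}) + \frac{1}{2}u(z_{\text{end}(i,2)}) + \alpha_i,$$

where the second equality comes from applying (11.18). Thus

$$\alpha_i = u(z_i) - \frac{1}{2}\left(u(z_{\text{end}(i,1)}) + u(z_{\text{end}(i,2)})\right),\quad i = N^{(0)}+1, \ldots, N^{(1)}.$$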

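This equation should be interpreted as follows. Suppose $u^{(0)}$ is the piecewise linear function on $\mathcal{T}_0$ that interpolates $u$ at the nodes of $\mathcal{T}_0$. Then, as I have already shown,

$$u^{(0)} = \sum_{j=1}^{N^{(0)}} \alpha_j\psi_j^{(0)}.$$

The function $u$ itself can then be written as

$$u = u^{(0)} + \sum_{j=N^{(0)}+1}^{N^{(1)}} \alpha_j\psi_j^{(1)},$$

where the coefficients $\alpha_{N^{(0)}+1}, \ldots, \alpha_{N^{(1)}}$ are the differences between the nodal values at the nodes $z_{N^{(0)}+1}, \ldots, z_{N^{(1)}}$ of $u$ and the interpolant $u^{(0)}$:

$$\alpha_i = u(z_i) - u^{(0)}(z_i),\quad i = N^{(0)}+1, \ldots, N^{(1)}.$$

This pattern continues as more levels are added to the hierarchy. For example, if $u \in P_2^{(1)}$, then $u$ can be written as

$$u = u^{(1)} + \sum_{j=N^{(1)}+1}^{N^{(2)}} \alpha_j\psi_j^{(2)},$$

where

$$u^{(1)} = \sum_{j=1}^{N^{(0)}} \alpha_j\psi_j^{(0)} + \sum_{j=N^{(0)}+1}^{N^{(1)}} \alpha_j\psi_j^{(1)}$$

is the piecewise linear function from $P_1^{(1)}$ interpolating $u$ at the nodes of $\mathcal{T}_1$. Since, for $i > N^{(1)}$, each $\psi_i^{(2)}$ is zero at all of the nodes of $\mathcal{T}_1$, the coefficients $\alpha_1, \alpha_2, \ldots, \alpha_{N^{(1)}}$ are defined exactly as above. To determine the coefficients $\alpha_{N^{(1)}+1}, \ldots, \alpha_{N^{(2)}}$, I reason as before. Suppose $i > N^{(1)}$, so that $z_i$ is a node belonging to $\mathcal{T}_2$ but not to $\mathcal{T}_1$. Then $z_i$ is the midpoint of some edge in $\mathcal{T}_1$, and

$$u^{(1)}(z_i) = \frac{1}{2}\left(u^{(1)}(z_{\text{end}(i,1)}) + u^{(1)}(z_{\text{end}(i,2)})\right) = \frac{1}{2}\left(u(z_{\text{end}(i,1)}) + u(z_{\text{end}(i,2)})\right).$$

Therefore,

$$\alpha_i = u(z_i) - u^{(1)}(z_i) = u(z_i) - \frac{1}{2}\left(u(z_{\text{end}(i,1)}) + u(z_{\text{end}(i,2)})\right).$$

In general, the hierarchical basis for $P_i^{(1)}$ is the hierarchical basis for $P_{i-1}^{(1)}$, augmented by the nodal basis functions corresponding to nodes belonging to $\mathcal{T}_i$ but not to $\mathcal{T}_{i-1}$. Any $u \in P_k^{(1)}$ can be written as

$$u = u^{(0)} + \sum_{i=1}^{k}\left(u^{(i)} - u^{(i-1)}\right),$$

where $u^{(i)}$ is the element of $P_i^{(1)}$ interpolating $u$ at the nodes of $\mathcal{T}_i$. Each of the differences $u^{(i)} - u^{(i-1)}$ has the property that it is zero at the nodes of $\mathcal{T}_{i-1}$, and thus $u^{(i)} - u^{(i-1)}$ is oscillatory (see Figure 11.4).

Figure 11.4. The function $u^{(0)}$ (top left), $u^{(1)} - u^{(0)}$ (top right), $u^{(2)} - u^{(1)}$ (bottom left), and $u^{(3)} - u^{(2)}$ (bottom right) for $u(x, y) = \sin(\pi x)\sin(\pi y)$.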


The relationships derived above lead to compact algorithms for converting from a nodal basis to a hierarchical basis, and vice versa. The linear operator taking the coefficients representing $u \in P_k^{(1)}$ in the hierarchical basis and producing the vector $U$ of nodal values of $u$ will be denoted by $S$. To describe algorithms for computing $S$ and $S^{-1}$, the following notation will be used: The set $\mathcal{N}_i$, $i = 1, 2, \ldots, k$, consists of the nodes belonging to $\mathcal{T}_i$ but not to $\mathcal{T}_{i-1}$. As above, any $z_j \in \mathcal{N}_i$ ($i \ge 1$) is the midpoint of the edge with endpoints $z_{\text{end}(j,1)}$ and $z_{\text{end}(j,2)}$.

Suppose the nodal values of a piecewise linear function $u$, defined on the mesh $\mathcal{T}_k$, are stored in an array $U$: $U(i) = u(z_i)$. The following double loop implements $U \leftarrow S^{-1}U$; that is, it overwrites the values of $U$ with the coefficients $\alpha_i$, $i = 1, 2, \ldots, N^{(k)}$, expressing $u$ in terms of the hierarchical basis:

    for i = k, k−1, ..., 1
        for z_j ∈ N_i
            U(j) ← U(j) − (U(end(j,1)) + U(end(j,2)))/2

This algorithm works for the following reason: While the coefficients corresponding to $z_j \in \mathcal{N}_i$ are being computed, the condition $U(t) = u(z_t)$ still holds for $z_t \in \mathcal{N}_s$, $s < i$. It is therefore a simple matter to subtract off the average of the nodal values at the endpoints of the edge whose midpoint is $z_j$.

The following double loop takes an array $U$ containing the coefficients expressing $u \in P_k^{(1)}$ in terms of the hierarchical basis and overwrites it with $SU$, the vector of nodal values of $u$:

    for i = 1, 2, ..., k
        for z_j ∈ N_i
            U(j) ← U(j) + (U(end(j,1)) + U(end(j,2)))/2

In solving most BVPs, the approximating subspace is a subspace $V_k$ of $P_k^{(1)}$ rather than $P_k^{(1)}$ itself. The subspace takes into account the essential (Dirichlet) boundary conditions. In this case, the nodal basis for $V_k$ is the subset of the nodal basis for $P_k^{(1)}$ consisting of the basis functions corresponding to free nodes, and the hierarchical basis for $V_k$ is the corresponding subset of the hierarchical basis for $P_k^{(1)}$. The above algorithms for computing $S$ and $S^{-1}$ are still valid; the coefficients (in either the nodal or hierarchical basis) of basis functions corresponding to constrained nodes are assigned value zero upon input and ignored upon output. For the sake of convenience, I will denote the hierarchical basis for $V_k$ as

$$\{\hat\psi_1, \hat\psi_2, \ldots, \hat\psi_{N_f}\}.$$

Each $\hat\psi_i$ is a nodal basis function from one of the meshes $\mathcal{T}_0, \mathcal{T}_1, \ldots, \mathcal{T}_k$.
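The two conversions between nodal values and hierarchical coefficients can be tried out on a one-dimensional analog, where "midpoint of an edge" simply means the midpoint of a subinterval. The Python sketch below is an illustration under that assumption; the helper `build_hierarchy`, the tuple layout `(j, end1, end2)`, and the function names are hypothetical and not taken from the text.

```python
def build_hierarchy(k):
    """Refine the interval [0, 1] k times by bisection (a 1D stand-in for T_0, ..., T_k).

    Returns (x, levels): x[j] is the coordinate of node j, and levels[i-1] lists
    (j, end1, end2) for each node j added at level i, where end1 and end2 are the
    endpoints of the coarser edge whose midpoint is node j.
    """
    x = [0.0, 1.0]
    edges = [(0, 1)]
    levels = []
    for _ in range(k):
        new_nodes, new_edges = [], []
        for a, b in edges:
            j = len(x)                      # new nodes are numbered after old ones
            x.append(0.5 * (x[a] + x[b]))
            new_nodes.append((j, a, b))
            new_edges += [(a, j), (j, b)]
        levels.append(new_nodes)
        edges = new_edges
    return x, levels

def to_hierarchical(U, levels):
    """Overwrite nodal values with hierarchical coefficients: U <- S^{-1} U."""
    for new_nodes in reversed(levels):      # finest level first
        for j, a, b in new_nodes:
            U[j] -= 0.5 * (U[a] + U[b])
    return U

def to_nodal(U, levels):
    """Overwrite hierarchical coefficients with nodal values: U <- S U."""
    for new_nodes in levels:                # coarsest level first
        for j, a, b in new_nodes:
            U[j] += 0.5 * (U[a] + U[b])
    return U
```

For $u(x) = x^2$, each hierarchical coefficient at a new node equals $-h^2/4$, where $h$ is the length of the edge being bisected, so the coefficients shrink by a factor of four from one level to the next. This is the one-dimensional counterpart of the decaying, oscillatory differences $u^{(i)} - u^{(i-1)}$.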

## 11.2.2 Relationship between the stiffness matrices in nodal and hierarchical bases

In a hierarchical basis resulting from a mesh that has been refined several times, some of the basis functions (those coming from the coarse meshes) have supports that cover many triangles. One of the main advantages of the nodal basis is the sparsity of the resulting stiffness matrix, which results directly from the fact that the nodal basis functions have small support. It would appear, then, that the hierarchical basis surrenders this sparsity of the stiffness matrix, and would therefore be of limited use. However, when the finite element equations are solved by iterative methods, such as the CG method, the matrix itself is not needed but only the ability to multiply it by a vector.

For the sake of the following discussion, I will denote by $K$ the stiffness matrix arising from the nodal basis for $V_k$ and by $\hat{K}$ the stiffness matrix arising from the corresponding hierarchical basis. Similarly, $F$ and $\hat{F}$ will denote the two load vectors. I will now show how the matrix-vector product $\hat{K}\hat{U}$ can be computed nearly as cheaply as $KU$, even though $\hat{K}$ is much less sparse than $K$.

The matrix $K$ is defined by the values $W \cdot KU$, where $U$ and $W$ are arbitrary vectors in $\mathbf{R}^{N_f}$. Indeed, if $U, W \in \mathbf{R}^{N_f}$, and $u$, $w$ are the continuous piecewise linear functions defined by

$$u = \sum_{i=1}^{N_f} U_i\phi_i,\qquad w = \sum_{i=1}^{N_f} W_i\phi_i,$$

then

$$W \cdot KU = a(u, w).$$

In particular, the entries of $K$ are determined by taking $u$, $w$ to be nodal basis functions and $U$, $W$ to be the corresponding Euclidean vectors. By the same reasoning, the matrix $\hat{K}$ is determined by the values of

$$\hat{W} \cdot \hat{K}\hat{U} = a(u, w),$$

where

$$u = \sum_{i=1}^{N_f} \hat{U}_i\hat\psi_i,\qquad w = \sum_{i=1}^{N_f} \hat{W}_i\hat\psi_i.$$


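But if $U = S\hat{U}$, then $U$ and $\hat{U}$ define the same piecewise linear function, and similarly for $W = S\hat{W}$:

$$\hat{W} \cdot \hat{K}\hat{U} = a(u, w) = W \cdot KU = (S\hat{W}) \cdot K(S\hat{U}) = \hat{W} \cdot (S^TKS)\hat{U}.$$

Therefore, since this holds for all $\hat{U}$, $\hat{W}$,

$$\hat{K} = S^TKS.$$

Similar reasoning shows that $\hat{F} = S^TF$. If an iterative method is used to solve $\hat{K}\hat{U} = \hat{F}$, the matrix $\hat{K} = S^TKS$ need not be formed explicitly. Rather, matrix-vector products of the form $\hat{K}\hat{U}$ can be computed via

$$\hat{K}\hat{U} = S^T(K(S\hat{U})).$$

I have already given an algorithm for computing $SU$ efficiently. Here is the corresponding algorithm for overwriting $U$ with $S^TU$:

    for i = k, k−1, ..., 1
        for z_j ∈ N_i
            U(end(j,1)) ← U(end(j,1)) + U(j)/2
            U(end(j,2)) ← U(end(j,2)) + U(j)/2

For the sake of completeness, I will also give the algorithm for overwriting $U$ with $S^{-T}U$:

    for i = 1, 2, ..., k
        for z_j ∈ N_i
            U(end(j,1)) ← U(end(j,1)) − U(j)/2
            U(end(j,2)) ← U(end(j,2)) − U(j)/2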
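The identity $\hat{K} = S^TKS$ and the matrix-free product $S^T(K(S\hat{U}))$ can be checked on a tiny one-dimensional analog. The Python sketch below hard-codes a two-level hierarchy on $[0, 1]$ (five nodes, four elements of length $1/4$) and the stiffness matrix for $-u''$ with no boundary conditions imposed; the node numbering, the constants `LEVELS` and `EDGES`, and all function names are assumptions made for this illustration only.

```python
# Nodes: 0 -> x=0, 1 -> x=1, 2 -> x=1/2 (level 1), 3 -> x=1/4, 4 -> x=3/4 (level 2).
LEVELS = [[(2, 0, 1)], [(3, 0, 2), (4, 2, 1)]]   # (node, end1, end2) per level
EDGES = [(0, 3), (3, 2), (2, 4), (4, 1)]          # fine-mesh elements, h = 1/4

def apply_S(U):
    """Hierarchical coefficients -> nodal values (coarsest level first)."""
    U = list(U)
    for level in LEVELS:
        for j, a, b in level:
            U[j] += 0.5 * (U[a] + U[b])
    return U

def apply_St(U):
    """Transpose of S: levels in reverse order, each update transposed."""
    U = list(U)
    for level in reversed(LEVELS):
        for j, a, b in level:
            U[a] += 0.5 * U[j]
            U[b] += 0.5 * U[j]
    return U

def apply_K(U, h=0.25):
    """Nodal stiffness matrix-vector product for -u'', accumulated edge by edge."""
    W = [0.0] * len(U)
    for a, b in EDGES:
        d = (U[a] - U[b]) / h       # element matrix is (1/h) [[1, -1], [-1, 1]]
        W[a] += d
        W[b] -= d
    return W

def apply_Khat(U):
    """Hierarchical stiffness matrix-vector product, never forming S^T K S."""
    return apply_St(apply_K(apply_S(U)))
```

Applying `apply_Khat` to the unit vector for node 2 returns a multiple of that same unit vector: in this one-dimensional setting the coarse hat function is energy-orthogonal to the finer ones, which hints at why the hierarchical stiffness matrix can be much better conditioned than the nodal one.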

## 11.3 The hierarchical basis CG method

As I discussed in Section 11.2, the condition number of the stiffness matrix $K$ is not determined solely by the BVP, mesh, and choice of approximating subspace. The choice of basis is of primary importance. If a hierarchical basis is used in place of the standard nodal basis, then the condition number of the resulting stiffness matrix is much smaller than when the nodal basis is used. As in Section 11.2, $\hat{K}$ and $\hat{F}$ denote the stiffness matrix and load vector corresponding to the hierarchical basis.


The reader will recall that any piecewise linear function $u_h \in P_h^{(1)}$ can be written either in terms of the nodal basis or the hierarchical basis, and there is a vector of coefficients representing $u_h$ in either basis. For the nodal basis, the vector $U$ of coefficients contains the nodal values of $u_h$. For the hierarchical basis, the vector $\hat{U}$ of coefficients can be related to $U$ by a simple linear transformation: $U = S\hat{U}$. The transformation $S$ can be implemented in a simple and efficient double loop, which was given in Section 11.2. When the approximating subspace $V_h$ is a subspace of $P_h^{(1)}$ obtained by omitting the basis functions corresponding to constrained nodes, there is a similar transformation, which will also be denoted by $S$.

The hierarchical basis CG method is simple: The ordinary CG method is applied to $\hat{K}\hat{U} = \hat{F}$ instead of to $KU = F$. The solution $\hat{U}$ is then transformed into the corresponding $U$. It was shown in Section 11.2.2 that $\hat{K} = S^T K S$ and $\hat{F} = S^T F$. The hierarchical basis CG method is then applied as follows:

1. Form $K$ and $F$ in the usual manner.
2. Compute $\hat{F} = S^T F$.
3. Apply the CG method to $\hat{K}\hat{U} = \hat{F}$ to get an approximate solution $\hat{U}$. The matrix $\hat{K}$ is needed only to compute matrix-vector products and is never formed explicitly. Instead, when it is necessary to compute $\hat{K}P = (S^T K S)P$, the equivalent computation $\hat{K}P = S^T(K(SP))$ is used.
4. The approximate solution $\hat{U}$ of $\hat{K}\hat{U} = \hat{F}$ is transformed into an approximate solution of $KU = F$ by $U = S\hat{U}$.

The algorithms for applying $S$ and $S^T$ each require about $3N_v$ operations, for a total of $6N_v \approx 6N_f$ operations. One iteration of CG requires one matrix-vector multiplication plus about $10N_f$ operations. Since computing $KP$ will typically require at least $10N_f$ operations, the operation count per iteration is about $20N_f$ for ordinary CG versus about $26N_f$ for the hierarchical CG method. Therefore, an iteration of hierarchical CG costs, at most, only about 30% more than an iteration of the ordinary CG algorithm.

EXAMPLE 11.3.
To illustrate the convergence of the hierarchical basis CG method, I use the test problem from Example 11.2. The following table shows the number of iterations taken in reducing the relative residual to $10^{-6}$:

| N | Iterations | cond(K) |
|---|---|---|
| 9 | 6 | 9.00 |
| 49 | 18 | 37.26 |
| 225 | 26 | 150.42 |
| 961 | 34 | 603.05 |
| 3969 | 41 | 2413.60 |
| 16129 | 48 | 9655.79 |

In this example, the number of iterations grows only slowly with $N$. In particular, on the finest mesh, the hierarchical basis CG method requires less than 14% of the iterations of the standard CG, and so the total time required should be less than 20% of the time for the standard CG.
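The two percentages quoted above fit together as follows (a worked check using only the operation counts stated in this section, $20N_f$ per iteration for ordinary CG and $26N_f$ for hierarchical CG):

$$
\frac{\text{cost per iteration (hierarchical)}}{\text{cost per iteration (ordinary)}} \approx \frac{26N_f}{20N_f} = 1.3,
\qquad
\frac{\text{total time (hierarchical)}}{\text{total time (ordinary)}} \lesssim 0.14 \times 1.3 \approx 0.18 < 0.20 .
$$

This is only an estimate, since it counts arithmetic operations and ignores memory traffic and other overhead.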


## 11.4 The preconditioned CG method

The CG method works well for solving $KU = F$ provided $K$ is not too ill-conditioned. If $K$ is ill-conditioned, it may be possible to replace $KU = F$ by an equivalent system $\hat{K}\hat{U} = \hat{F}$ with the property that $\hat{K}$ is better conditioned than $K$. This is the idea behind preconditioning. A preconditioner is a matrix $L$ that is somehow an approximation of $K$ but has the property that it is significantly easier to solve systems of the form $LU = G$ than $KU = F$. I assume that $L$ is also symmetric positive definite, which means that it has a square root, a symmetric positive definite matrix $L^{1/2}$ such that $L = (L^{1/2})^2$. Then $\hat{K} = L^{-1/2} K L^{-1/2}$ is symmetric and positive definite, and $\hat{K}$ may be better conditioned than $K$. For example, if $L$ were actually equal to $K$, then $\hat{K}$ would be the identity matrix, which is perfectly conditioned. If $L$ is close to $K$ in some sense, then $\hat{K}$ may be close to the identity in some sense and therefore not too badly conditioned.

The system $KU = F$ is equivalent to the system $\hat{K}\hat{U} = \hat{F}$, where $\hat{F} = L^{-1/2} F$ and the solutions are related by $U = L^{-1/2}\hat{U}$. Given a preconditioner, the CG method can then be applied to $\hat{K}\hat{U} = \hat{F}$, yielding the following formulas:

This approach is not useful as it stands, because computing $L^{1/2}$ is nearly always more expensive than solving $KU = F$ in the first place. However, by the following substitutions, the explicit use of $L^{1/2}$ can be avoided. The vectors $P^{(k)}$, $U^{(k)}$, and $R^{(k)}$ are defined by

$$P^{(k)} = L^{-1/2}\hat{P}^{(k)}, \qquad U^{(k)} = L^{-1/2}\hat{U}^{(k)}, \qquad R^{(k)} = L^{1/2}\hat{R}^{(k)}.$$

The reader should notice the difference in the definition of $R^{(k)}$. These definitions yield the following simplifications:


The preconditioner $L$ appears in (11.21) only in that an equation of the form $LZ = R$ must be solved. One of the fundamental assumptions about the preconditioner $L$ is that $LZ = R$ is considerably cheaper to solve than $KU = F$, and now the meaning of this is made clear: It must be reasonable to solve a system of the form $LZ = R$ at each iteration of the algorithm. The preconditioned conjugate gradient (PCG) algorithm is presented in Algorithm 11.2. The storage requirement is five vectors, $R$, $Z$, $P$, $V$, $U$, in addition to $K$ and $F$. Each iteration requires $O(10N)$ arithmetic operations plus one matrix-vector product $KP$ and the solution of $LZ = R$.
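One way to see why only solves with $L$ are needed is to look at the two inner products that CG uses. The following is a sketch, assuming the transformed and untransformed vectors are related by $\hat{P} = L^{1/2}P$ and $\hat{R} = L^{-1/2}R$, with $\hat{K} = L^{-1/2}KL^{-1/2}$, and writing $Z = L^{-1}R$; it uses the symmetry of $L^{1/2}$:

$$
\hat{R}\cdot\hat{R} = (L^{-1/2}R)\cdot(L^{-1/2}R) = R\cdot L^{-1}R = R\cdot Z,
$$

$$
\hat{P}\cdot\hat{K}\hat{P} = (L^{1/2}P)\cdot L^{-1/2}KL^{-1/2}(L^{1/2}P) = P\cdot KP.
$$

Neither expression on the right involves $L^{1/2}$; the preconditioner enters only through $Z = L^{-1}R$.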

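$$
\begin{aligned}
& R \leftarrow F \\
& Z \leftarrow L^{-1}R \\
& P \leftarrow Z \\
& c_1 \leftarrow R \cdot Z \\
& \textbf{for } k = 1, 2, \ldots \\
& \qquad V \leftarrow KP \\
& \qquad c_2 \leftarrow P \cdot V \\
& \qquad \alpha \leftarrow c_1 / c_2 \\
& \qquad U \leftarrow U + \alpha P \\
& \qquad R \leftarrow R - \alpha V \\
& \qquad Z \leftarrow L^{-1}R \\
& \qquad c_3 \leftarrow R \cdot Z \\
& \qquad \beta \leftarrow c_3 / c_1 \\
& \qquad P \leftarrow \beta P + Z \\
& \qquad c_1 \leftarrow c_3
\end{aligned}
$$

Algorithm 11.2. The PCG algorithm for solving $KU = F$, where $K$ is a symmetric positive definite matrix and $L$ is a symmetric positive definite preconditioner.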
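The steps of Algorithm 11.2 translate almost line for line into code. The following is a minimal Python sketch, not the book's MATLAB `PCG` function: matrices are small dense lists of rows, the preconditioner is passed as a hypothetical callback `solve_L` that returns the solution of $LZ = R$, and the zero initial guess and relative-residual stopping test are assumptions of this example. A diagonal preconditioner built from the diagonal of $K$ is used for illustration.

```python
def dot(x, y):
    """Euclidean inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, x) for row in A]


def diagonal_preconditioner(K):
    """Return a function R -> Z solving LZ = R with L = diag(K)."""
    diag = [K[i][i] for i in range(len(K))]
    return lambda R: [r / d for r, d in zip(R, diag)]


def pcg(K, F, solve_L, tol=1e-10, max_iter=1000):
    """Preconditioned CG for KU = F; returns (U, iterations used)."""
    U = [0.0] * len(F)               # assumed zero initial guess
    R = list(F)                      # R <- F
    norm_F = dot(F, F) ** 0.5
    if norm_F == 0.0:
        return U, 0
    Z = solve_L(R)                   # Z <- L^{-1} R
    P = list(Z)                      # P <- Z
    c1 = dot(R, Z)
    for k in range(1, max_iter + 1):
        V = matvec(K, P)
        c2 = dot(P, V)
        alpha = c1 / c2
        U = [u + alpha * p for u, p in zip(U, P)]
        R = [r - alpha * v for r, v in zip(R, V)]
        if dot(R, R) ** 0.5 <= tol * norm_F:   # assumed stopping test
            return U, k
        Z = solve_L(R)
        c3 = dot(R, Z)
        beta = c3 / c1
        P = [beta * p + z for p, z in zip(P, Z)]
        c1 = c3
    return U, max_iter


K = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 10.0]]
F = [6.0, 10.0, 32.0]                # equals K times [1, 2, 3]
U, iterations = pcg(K, F, diagonal_preconditioner(K))
```

Passing the preconditioner as a function rather than a matrix mirrors the point made in the text: $L$ is never needed explicitly, only the ability to solve $LZ = R$.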

### 11.4.1 Alternate derivation of PCG

A symmetric positive definite preconditioner $L$ can be factored as $L = EE^T$ instead of $L = (L^{1/2})^2$. The matrix $E$ could be obtained from the Cholesky factorization, but there are other possible choices. The system $KU = F$ is then transformed into $\hat{K}\hat{U} = \hat{F}$, where

$$\hat{K} = E^{-1} K E^{-T}, \qquad \hat{U} = E^T U, \qquad \hat{F} = E^{-1} F.$$

Defining

$$P^{(k)} = E^{-T}\hat{P}^{(k)}, \qquad U^{(k)} = E^{-T}\hat{U}^{(k)}, \qquad R^{(k)} = E\hat{R}^{(k)}$$

and substituting into the CG algorithm applied to $\hat{K}\hat{U} = \hat{F}$ yields the same PCG algorithm given in Algorithm 11.2. Once again, $E$ need not be computed explicitly; the preconditioner only appears in that the system $LZ = R$ must be solved at each iteration.

### 11.4.2 Preconditioners

In this section, I mention several preconditioners that can be used in conjunction with the CG method.

#### Hierarchical bases

The hierarchical basis CG method transforms $KU = F$ to

$$(S^T K S)\hat{U} = S^T F,$$

where $S$ is the transformation from the hierarchical basis to the nodal basis. Taking $E = S^{-T}$, this can be interpreted in terms of Section 11.4.1. The preconditioner is $L = S^{-T} S^{-1}$, and the hierarchical basis CG method is an example of PCG. In this special case, however, the transformation that is so important in expressing the PCG algorithm (obviating the need to form $L^{1/2}$ or $E$) is not needed, since $S$ is readily available. Because of its generality and relative simplicity, the hierarchical CG method is highly recommended for two-dimensional problems.
#### Diagonal preconditioning

Perhaps the simplest preconditioner is a diagonal matrix, and the obvious diagonal entries are those of $K$:

$$L = \operatorname{diag}(K_{11}, K_{22}, \ldots, K_{NN}).$$

This is referred to as Jacobi preconditioning, for reasons that will be obvious in the next chapter. Although this preconditioner is very simple to use, it is frequently not very helpful. If the diagonal of $K$ is constant, as in Example 11.1, then Jacobi preconditioning has no effect on the convergence of CG; this is one extreme. On the other hand, Jacobi preconditioning can make a significant difference when the magnitudes of the diagonal entries of $K$ vary significantly, as in the following example.

EXAMPLE 11.4. To illustrate the advantage of Jacobi preconditioning on certain problems, I will use the BVP

where $\Omega$ is the unit disk,

and

Since $\kappa$ varies noticeably over $\Omega$, so do the diagonal entries of $K$, which range from about 4 to almost 90. The following table shows the number of iterations taken by CG and PCG with Jacobi preconditioning in reducing the relative residual to $10^{-6}$:

| N | Iterations (CG) | Iterations (PCG) |
|---|---|---|
| 5 | 1 | 1 |
| 25 | 6 | 6 |
| 113 | 25 | 16 |
| 481 | 62 | 33 |
| 1985 | 149 | 69 |
| 8065 | 317 | 144 |

#### SSOR preconditioning

In the next chapter, I describe the symmetric successive overrelaxation (SSOR) iteration. While SSOR is an iterative method in its own right, it also defines a preconditioner for CG. I will describe the use of this preconditioner in Section 12.2.5.
#### The incomplete Cholesky factorization

As I discussed in Chapter 10, the difficulty with using the Cholesky factorization is that the level of fill-in is unpredictable and can be significant. A preconditioner can be obtained by computing an approximate Cholesky factorization of $K$ in which fill-in is limited by fiat. Such a factorization $K \approx R^T R$ is called an incomplete Cholesky factorization. For example, the level 0 incomplete Cholesky factorization allows no fill-in; the only nonzeros in the factor correspond to nonzero entries in (the lower triangle of) $K$. Another variation of the incomplete Cholesky factorization replaces small entries in the Cholesky factorization with zero, thus reducing the level of fill-in. The drop tolerance determines which entries are considered small enough to be set to zero. An incomplete Cholesky factorization defines a preconditioner $L = R^T R$, which can then be used in the PCG algorithm.

For more details about incomplete Cholesky factorizations, the reader can consult the seminal papers by Meijerink and van der Vorst [31] and Manteuffel [29]. The books by Saad [37] and Axelsson [6] discuss incomplete factorizations and many other types of preconditioners.

#### Fast Poisson solvers

When the domain is a rectangle and a uniform mesh is used, it is possible to develop a fast solver for constant coefficient PDEs such as the Laplace or Poisson problem using the fast Fourier transform (FFT). Such a solver is called a fast Poisson solver; it can be used as a preconditioner for nonconstant coefficient problems. For more details, the reader can consult Saad [37].

## 11.5 The pure Neumann problem

Some problems with pure Neumann boundary conditions, such as

$$-\nabla\cdot(\kappa\nabla u) = f \ \text{ in } \Omega, \qquad \kappa\frac{\partial u}{\partial n} = h \ \text{ on } \partial\Omega, \tag{11.23}$$

lead to singular systems of finite element equations. (An example of this type was encountered in Example 7.4.) When the finite element method is applied, resulting in the equation $KU = F$, the stiffness matrix is singular. In this section, I will explain why $K$ is singular in this context, and how the system $KU = F$ can be solved. The reader will recall from Section 1.1 that (11.23) has a solution only if the compatibility condition

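$$\int_\Omega f + \int_{\partial\Omega} h = 0 \tag{11.24}$$

is satisfied. If (11.24) holds, then there are infinitely many solutions, any two of which differ by a constant. When (11.23) is expressed in its weak form,

$$\text{find } u \in H^1(\Omega) \text{ such that } a(u, v) = \ell(v) \text{ for all } v \in H^1(\Omega), \tag{11.25}$$

the bilinear form $a(u, v) = \int_\Omega \kappa\nabla u\cdot\nabla v$ fails to be $H^1(\Omega)$-elliptic. To be precise, the constant function 1 has the property that

$$a(1, 1) = 0,$$

and thus an inequality of the form

$$a(u, u) \ge c\,\|u\|_{H^1(\Omega)}^2, \qquad c > 0,$$

cannot hold for all $u \in H^1(\Omega)$. When the finite element method is applied (with a nodal basis), the result is the system $KU = F$, where

$$K_{ij} = a(\phi_j, \phi_i), \qquad F_i = \ell(\phi_i).$$

Since the constant function 1 can be written as

$$1 = \sum_{i=1}^{N_v} \phi_i,$$

the vector $E$ of all ones represents the constant function 1. The following calculations show that $E$ spans the null space of $K$, that is, that $KV = 0$ if and only if $V = cE$ for some constant $c$. Writing $v = \sum_i V_i\phi_i$,

$$KV = 0 \ \Rightarrow\ V\cdot KV = \int_\Omega \kappa\nabla v\cdot\nabla v = 0 \ \Rightarrow\ \nabla v = 0 \ \Rightarrow\ v \text{ is constant} \ \Rightarrow\ V = cE,$$

and conversely $(K(cE))_i = a(c, \phi_i) = 0$ for each $i$.

Therefore, $K$ is a positive semidefinite matrix with a one-dimensional null space (spanned by $E$), and special methods are needed to solve $KU = F$. As I discussed in Section 2.5, the variational problem (11.25) can be made well-posed by restricting the solution $u$ and the test functions $v$ to the space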

$$V = \left\{ v \in H^1(\Omega) \,:\, \int_\Omega v = 0 \right\}.$$

The bilinear form $a(\cdot,\cdot)$ is $V$-elliptic, and thus

$$\text{find } u \in V \text{ such that } a(u, v) = \ell(v) \text{ for all } v \in V \tag{11.26}$$

has a unique solution. The question then arises of how to translate this into an effective computational technique, that is, to a nonsingular sparse system $KU = F$. Any basis for $V_h = V \cap P_h^{(d)}$ (the space of continuous piecewise polynomials of degree $d$ with mean zero) will lead to a nonsingular stiffness matrix. However, unless the basis is constructed carefully, $K$ will be dense. Bochev and Lehoucq [12] show how to define a nodal basis for $V_h$; under such a basis, $K$ will be sparse and one could use either direct or iterative methods to solve $KU = F$.

An alternative (also suggested in [12]) is to modify the bilinear form $a(\cdot,\cdot)$ to make it $H^1(\Omega)$-elliptic, dispensing with the need for a basis for $V_h$. To see how to do this, it is helpful to recall the results of Section 2.2.1: (11.25) is equivalent to the problem

$$\min_{u \in H^1(\Omega)} J(u),$$

where

$$J(u) = \frac{1}{2}a(u, u) - \ell(u).$$

The function $J$ does not have a unique minimizer but rather a one-dimensional set of minimizers. However, the constrained minimization problem

$$\min J(u) \quad \text{subject to} \quad \int_\Omega u = 0 \tag{11.27}$$

has a unique minimizer, the solution $u$ of (11.26). Constrained minimization problems are more difficult to solve than unconstrained problems, and so various techniques have been developed to convert a constrained problem into an unconstrained problem (or a sequence of unconstrained problems). The simplest such technique is the quadratic penalty method, in which the square of the constraint is added to the function to be minimized as a penalty term. The penalty term is multiplied by a weight so that the constraint can be more or less emphasized in the minimization. Thus (11.27) is replaced by

$$\min_{u \in H^1(\Omega)} J_\rho(u), \tag{11.28}$$

where

$$J_\rho(u) = J(u) + \frac{\rho}{2}\left(\int_\Omega u\right)^2$$

and $\rho$ is the penalty parameter. In the typical application of the quadratic penalty method, there would be no minimizer of $J$ that also satisfied the constraint, so the exact solution of (11.27) would be found only in the limit as $\rho \to \infty$ (increasing $\rho$ causes the constraint to be satisfied more closely). However, the application at hand is a special case: $J$ does have a minimizer, namely $u$, that also satisfies the constraint. It is easy to see, therefore, that the unique solution of (11.28), for any $\rho > 0$, is $u$. The optimality condition for (11.28) is

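$$DJ_\rho(u) = 0,$$

where $DJ_\rho(u)$ is the derivative of $J_\rho$ at $u$. The following computation reveals $DJ_\rho(u)$:

$$
J_\rho(u + v) = J_\rho(u) + \left[ a(u, v) - \ell(v) + \rho\left(\int_\Omega u\right)\left(\int_\Omega v\right) \right] + \left[ \frac{1}{2}a(v, v) + \frac{\rho}{2}\left(\int_\Omega v\right)^2 \right].
$$

This expresses $J_\rho(u + v)$ as $J_\rho(u)$ + (a term linear in $v$) + (a term quadratic in $v$). The linear term must be $DJ_\rho(u)v$:

$$DJ_\rho(u)v = a(u, v) + \rho\left(\int_\Omega u\right)\left(\int_\Omega v\right) - \ell(v).$$

The solution $u$ of (11.26) and (11.28) is the unique solution of the variational equation

$$a(u, v) + \rho\left(\int_\Omega u\right)\left(\int_\Omega v\right) = \ell(v) \quad \text{for all } v \in H^1(\Omega)$$

or, equivalently,

$$a_\rho(u, v) = \ell(v) \quad \text{for all } v \in H^1(\Omega), \tag{11.29}$$

where

$$a_\rho(u, v) = a(u, v) + \rho\left(\int_\Omega u\right)\left(\int_\Omega v\right).$$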

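The preceding discussion was intended to motivate replacing (11.26) with (11.29). I will now show directly that (11.29) has $u$ as its unique solution. First of all, since $a_\rho(u, v) = a(u, v)$ whenever $\int_\Omega u = 0$, $u$ solves (11.29). It remains only to show that $a_\rho(\cdot,\cdot)$ is $H^1(\Omega)$-elliptic, and therefore (11.29) has a unique solution. Any $v \in H^1(\Omega)$ can be written as $v = \hat{v} + \bar{v}$, where $\hat{v} \in V$ and $\bar{v}$ is a constant. The constant $\bar{v}$ is defined by

$$\bar{v} = \frac{1}{|\Omega|}\int_\Omega v, \tag{11.30}$$

where $|\Omega|$ is the measure of $\Omega$ ($|\Omega| = \int_\Omega 1$). The function $\hat{v}$ is defined by $\hat{v} = v - \bar{v}$, and a direct calculation shows that $\hat{v} \in V$ (see Exercise 4). Moreover, since $\nabla\bar{v} = 0$ and $\int_\Omega v = |\Omega|\bar{v}$,

$$a_\rho(v, v) = a(\hat{v}, \hat{v}) + \rho|\Omega|^2\bar{v}^2,$$

while, since $\int_\Omega \hat{v} = 0$,

$$\|v\|_{H^1(\Omega)}^2 = \|\hat{v}\|_{H^1(\Omega)}^2 + |\Omega|\bar{v}^2.$$

Because $a(\cdot,\cdot)$ is $V$-elliptic, $a(\hat{v}, \hat{v})$ is bounded below by a positive multiple of $\|\hat{v}\|_{H^1(\Omega)}^2$, and so $a_\rho(v, v)$ is bounded below by a positive multiple of $\|v\|_{H^1(\Omega)}^2$.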

This shows that $a_\rho(\cdot,\cdot)$ is $H^1(\Omega)$-elliptic. When the finite element method is applied to (11.29), the result is the system $K_\rho U = F$, where $F$ is the usual load vector and

$$(K_\rho)_{ij} = a_\rho(\phi_j, \phi_i) = a(\phi_j, \phi_i) + \rho\left(\int_\Omega \phi_j\right)\left(\int_\Omega \phi_i\right).$$

The matrix $K_\rho$ can be written as $K_\rho = K + \rho WW^T$, where $K$ is the usual stiffness matrix and $W \in \mathbf{R}^{N_v}$ is the vector defined by

$$W_i = \int_\Omega \phi_i, \quad i = 1, 2, \ldots, N_v.$$

The matrix $WW^T$ is called a rank-one matrix; each column is a multiple of the vector $W$. Although $WW^T$ and therefore $K_\rho$ are dense, in the context of iterative methods this turns out not to matter. It is necessary only to efficiently multiply $K_\rho$ by a vector, and this can be done by taking advantage of the special structure of $WW^T$:

$$K_\rho V = KV + \rho(W \cdot V)W$$

($W^T V$ is the scalar $W \cdot V$). The matrix $K_\rho$ is necessarily symmetric and positive definite, since it is derived from the $H^1(\Omega)$-elliptic bilinear form $a_\rho(\cdot,\cdot)$. This can be seen directly, since $V \cdot KV \ge 0$ for all vectors $V$, with $V \cdot KV = 0$ only if $V$ is a multiple of $E$, the vector of all ones. On the other hand, $V \cdot WW^T V = (W \cdot V)^2 \ge 0$ for all $V$, and

$$W \cdot E = \sum_{i=1}^{N_v}\int_\Omega \phi_i = \int_\Omega 1 = |\Omega| > 0.$$

It follows that $V \cdot K_\rho V > 0$ for all $V \ne 0$, since

$$V \cdot K_\rho V = V \cdot KV + \rho(W \cdot V)^2$$

and both terms are nonnegative, with at least one of them positive. The unique solution of $K_\rho U = F$ is the unique solution of $KU = F$ satisfying the constraint $W \cdot U = 0$. This corresponds to the piecewise polynomial $u_h$ that satisfies $a(u_h, v) = \ell(v)$ for all $v \in V_h$ and also the constraint $\int_\Omega u_h = 0$.

If desired, the vector $W$ can be replaced with $E$ itself, the vector of all ones. Writing now $K_\rho = K + \rho EE^T$, the solution of $K_\rho U = F$ is the unique solution of $KU = F$ that satisfies the additional constraint $E \cdot U = 0$. This avoids the computation of the vector $W$.
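The rank-one structure means that a product with $K_\rho$ costs one product with $K$ plus one inner product and one vector update. The following Python sketch shows the idea with small dense lists; the function name `penalized_matvec` and the sample data (a three-node one-dimensional Neumann stiffness matrix with $W$ replaced by the vector of all ones) are assumptions made for illustration.

```python
def dot(x, y):
    """Euclidean inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, x) for row in A]


def penalized_matvec(K, W, rho, V):
    """Return (K + rho W W^T) V without forming the dense matrix W W^T."""
    scale = rho * dot(W, V)          # the scalar rho * (W . V)
    return [kv + scale * w for kv, w in zip(matvec(K, V), W)]


# Singular stiffness matrix of a 1D pure Neumann problem (rows sum to zero).
K = [[1.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]
W = [1.0, 1.0, 1.0]
rho = 0.5

# K annihilates the vector of all ones, but the penalized matrix does not.
result = penalized_matvec(K, W, rho, [1.0, 1.0, 1.0])
```

In a real finite element code `matvec` would be a sparse product, so the dense matrix $K_\rho$ is never stored.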


The value of the penalty parameter $\rho$ should be chosen so that the matrix $K_\rho$ is as well-conditioned as possible. The eigenvalues of $K$ are $0 < \lambda_2 \le \cdots \le \lambda_{N_v}$, and thus the effective condition number of $K$ (in a sense that I will not define precisely) is $\lambda_{N_v}/\lambda_2$. If $\rho$ is defined to be $\alpha/N_v$, where $\alpha$ satisfies $\lambda_2 \le \alpha \le \lambda_{N_v}$, then the eigenvalues of $K_\rho = K + \rho EE^T$ lie in the interval $[\lambda_2, \lambda_{N_v}]$, and the condition number of $K_\rho$ is also $\lambda_{N_v}/\lambda_2$. The reader is asked to justify this conclusion in Exercise 5. If $K_\rho$ is defined by $K_\rho = K + \rho WW^T$ instead, then a reasonable choice of $\rho$ is $\rho = \alpha N_v/(W \cdot E)^2$ (see [12]). A reasonable value of $\alpha$ can be found by the formula

$$\alpha = \frac{V \cdot KV}{V \cdot V},$$

where $V$ is a vector whose components are random numbers. The resulting $\alpha$ is guaranteed to lie in the interval $[0, \lambda_{N_v}]$ and virtually certain to lie in the interval $[\lambda_2, \lambda_{N_v}]$.

The preceding discussion assumed that $K$ and $F$ are computed exactly, so that the null space of $K$ is known and $KU = F$ is a consistent singular system. When quadrature is used, the exact $K$ and $F$ are replaced with $\tilde{K}$ and $\tilde{F}$, and the system $\tilde{K}U = \tilde{F}$ may not be consistent. I will ignore the error in $\tilde{K}$, for the following reason: The quadrature rule is unlikely to change the null space of $K$, since, as I showed above,

$$V \cdot KV = \int_\Omega \kappa\nabla v\cdot\nabla v,$$

and the quadrature rule need merely preserve the condition that $\int_\Omega \kappa\nabla v\cdot\nabla v = 0$ if and only if $\nabla v$ is identically zero. Therefore, although $K$ is not computed exactly when $\kappa$ is nonconstant, $\tilde{K}$ has the same null space as $K$, so the errors in $\tilde{K}$ can be ignored when discussing the consistency of the linear system.

Thus, I will discuss the system $KU = \tilde{F}$, where $\tilde{F}$ is an estimate of the load vector $F$. The error $\tilde{F} - F$ can be decomposed into two components, the part that lies in the null space of $K$ and the part that is orthogonal to the null space. Thus $\tilde{F} = F + \hat{F} + \bar{F}$, where $E \cdot \hat{F} = 0$ and $\bar{F} = \mu E$. The system $KU = \tilde{F}$ is therefore inconsistent unless $\bar{F}$ happens to equal zero. On the other hand, since $K_\rho$ is nonsingular, the system $K_\rho U = \tilde{F}$ must be consistent (with a unique solution). If the solution is denoted by $U_\rho$, then

$$U_\rho = K_\rho^{-1}F + K_\rho^{-1}\hat{F} + K_\rho^{-1}\bar{F}.$$

In Exercise 6, the reader is asked to show that $K_\rho^{-1}\hat{F}$ is orthogonal to $E$ and independent of $\rho$, and that $K_\rho^{-1}\bar{F}$ is a multiple of $E$ whose size depends on $\rho$. The error $K_\rho^{-1}\hat{F}$ is consistent with the equation $KU = F$; that is, it cannot be separated from the true solution $U = K_\rho^{-1}F$ (unless $F$ is known exactly). On the other hand, the error $K_\rho^{-1}\bar{F}$ arises from the inconsistency of the right-hand side $\tilde{F}$. It is a multiple of the constant vector $E$, and its presence means that $U_\rho$ will not satisfy $E \cdot U_\rho = 0$. The error $K_\rho^{-1}\bar{F}$ can be eliminated by replacing $\tilde{F}$ with its projection onto the orthogonal complement of the null space of $K$, namely,

$$\tilde{F} - \frac{E \cdot \tilde{F}}{E \cdot E}E. \tag{11.31}$$

Doing so sets $\bar{F}$ to zero and eliminates the inconsistent error in the solution of the linear system. This is not necessary; doing it changes the computed solution by a constant vector. Since two solutions of the original BVP (or two solutions of $KU = F$) differ by a constant, this is just a question of which solution is to be computed. I suggest replacing $\tilde{F}$ with (11.31), since then $U_\rho = K_\rho^{-1}\tilde{F}$ is independent of $\rho$, is a solution of $KU = \tilde{F}$, and satisfies $E \cdot U_\rho = 0$.

EXAMPLE 11.5. This example is a continuation of Example 7.4, where a pure Neumann problem (posed on the unit square) was solved. The linear system $KU = F$ from that example (corresponding to the finest mesh) consists of 1089 equations and 1089 unknowns. Denoting as above the computed load vector by $\tilde{F}$, I will solve $K_\rho U = \tilde{F}$ by the CG method. Using a relative residual tolerance of $10^{-8}$, CG requires 134 iterations to solve the system. The value of $\rho$ (computed from a random vector, as suggested above) was $3.46 \times 10^{-3}$. In this example, the average value of the components of $\tilde{F}$ was $7.97 \times 10^{-7}$; this is nonzero due to quadrature error. As a result, the computed solution did not satisfy $E \cdot U = 0$; instead, the average value of the components of the computed $U$, $2.11 \times 10^{-7}$, was of the same order of magnitude. When $\tilde{F}$ was replaced with (11.31), the resulting system also required 134 CG iterations to solve, but the computed solution satisfied $E \cdot U = 0$ up to round-off error (the mean value of the components of $U$ was $2.32 \times 10^{-17}$).
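The whole procedure (project the load vector, choose $\rho$ from a random vector, solve with CG using the penalized product) can be sketched on a small one-dimensional analogue. This Python sketch is not the book's `CGsing` function: the stiffness matrix is a hypothetical tridiagonal Neumann matrix applied by `neumann_matvec`, $W$ is replaced by the vector of all ones, and all function names, the seed, and the sample load vector are assumptions made for illustration.

```python
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def neumann_matvec(V):
    """Apply a 1D pure-Neumann stiffness matrix (tridiagonal, rows sum to 0)."""
    n = len(V)
    out = []
    for i in range(n):
        left = V[i] - V[i - 1] if i > 0 else 0.0
        right = V[i] - V[i + 1] if i < n - 1 else 0.0
        out.append(left + right)
    return out


def cg(apply_A, F, tol=1e-12, max_iter=500):
    """Plain CG with a zero initial guess; apply_A(X) returns A times X."""
    U = [0.0] * len(F)
    R = list(F)
    P = list(R)
    c1 = dot(R, R)
    norm_F = c1 ** 0.5
    if norm_F == 0.0:
        return U
    for _ in range(max_iter):
        V = apply_A(P)
        alpha = c1 / dot(P, V)
        U = [u + alpha * p for u, p in zip(U, P)]
        R = [r - alpha * v for r, v in zip(R, V)]
        c3 = dot(R, R)
        if c3 ** 0.5 <= tol * norm_F:
            break
        P = [r + (c3 / c1) * p for r, p in zip(R, P)]
        c1 = c3
    return U


def solve_pure_neumann(F_computed, seed=0):
    """Solve the penalized system (K + rho E E^T) U = projected F."""
    n = len(F_computed)
    mean_F = sum(F_computed) / n
    F = [f - mean_F for f in F_computed]      # remove the component along E
    rng = random.Random(seed)
    V = [rng.random() for _ in range(n)]
    alpha = dot(V, neumann_matvec(V)) / dot(V, V)   # Rayleigh quotient
    rho = alpha / n

    def apply_K_rho(X):
        shift = rho * sum(X)                  # rho * (E . X)
        return [kx + shift for kx in neumann_matvec(X)]

    return cg(apply_K_rho, F), F, rho


F_computed = [1.0, 0.0, -2.0, 0.5, 0.5001]    # mean is slightly nonzero
U, F_projected, rho = solve_pure_neumann(F_computed)
```

Because the load vector is projected first, the computed `U` should have mean zero up to round-off and should also satisfy the original singular system with the projected right-hand side.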

## 11.6 The MATLAB implementation

Some of the algorithms discussed in this chapter apply to any matrix-vector equation $KU = F$, where $K$ is symmetric and positive definite. These include CG and its preconditioned version. Other algorithms, such as the hierarchical basis CG method, are developed specifically for solving finite element equations. Code is provided for such algorithms for the piecewise linear case only and therefore is found in version1.

### 11.6.1 MATLAB functions

- CG: Applies the CG method to solve a linear system with a symmetric positive definite coefficient matrix.
- PCG: Applies the PCG method to solve a linear system with a symmetric positive definite coefficient matrix.
- HierCG1: Applies the hierarchical basis CG method to solve the finite element equation $KU = F$ (piecewise linear functions only).
- CGsing, HierCGsing1: Versions of CG and HierCG1 for a singular matrix.
- The functions implementing the operators $S$, $S^{-1}$, $S^T$, $S^{-T}$ discussed in Section 11.2.1 are
  - HierToNodal1: Transforms the coefficients representing a piecewise linear function in terms of a hierarchical basis into the nodal values (the operator $S$).
  - HierToNodalTrans1: The operator $S^T$.
  - NodalToHier1: Transforms the nodal values of a piecewise linear function into the coefficients representing it in terms of a hierarchical basis (the operator $S^{-1}$).

## 11.7 Exercises for Chapter 11

1. Let $K$ be an $n \times n$ symmetric positive definite matrix, let $F \in \mathbf{R}^n$ be given, and let $\phi : \mathbf{R}^n \to \mathbf{R}$ be defined by

   Show that

2. Let $K$ be an $n \times n$ symmetric positive definite matrix and let $F \in \mathbf{R}^n$ be given. Suppose the steepest descent algorithm is applied to solve $KU = F$, starting with the initial estimate $U^{(0)}$. Show that the bound (11.4) implies that the number of iterations required to achieve

3. Use (11.14) to show that the number of CG iterations required to achieve

   is proportional to $\sqrt{\operatorname{cond}(K)}$. (Hint: See the previous exercise.)

4. For each $v \in H^1(\Omega)$, define $\bar{v}$ by (11.30). Prove that $v - \bar{v} \in V$.

5. Let $K$ be the (singular) stiffness matrix for the pure Neumann problem (11.23), and let the eigenvalues of $K$ be $0 < \lambda_2 \le \cdots \le \lambda_{N_v}$. Denote the corresponding eigenvectors by $E, V_2, \ldots, V_{N_v}$, where $E$ is the vector of all ones. Show that the eigenvalues and eigenvectors of $K_\rho = K + \rho EE^T$ are the same as those of $K$, except that the zero eigenvalue is changed to $N_v\rho$.

6. Let $K$, $E$, and $K_\rho$ be as in the preceding exercise.

   (a) Show that if $E \cdot F = 0$, then $E \cdot K_\rho^{-1}F = 0$. Show also that $K_\rho^{-1}F$ is independent of $\rho$.


   (b) Show that if $F$ is a multiple of $E$, then so is $K_\rho^{-1}F$. Show that the norm of $K_\rho^{-1}F$ is inversely proportional to $\rho$.

7. In Section 11.2, algorithms are presented for computing $SU$, $S^{-1}U$, $S^TU$, and $S^{-T}U$, where $S$ maps the weights in the hierarchical basis representation of a piecewise linear function to its nodal values. Given that the algorithm implementing the action of $S^{-1}$ is correct (which is justified in the text), show that the other three algorithms are correct. (Hint: Represent $S^{-1}$ as the composition of $k$ linear operators, where $k$ is the number of levels of refinement: $S^{-1} = S_k^{-1}S_{k-1}^{-1}\cdots S_1^{-1}$.)

8. (MATLAB) Consider the BVP

   where $\Omega$ is the unit circle. Establish a mesh on $\Omega$ by starting with a coarse triangulation with four triangles (as in Figure 6.11) and refining it five times. Form the finite element equations $KU = F$ and solve them using both CG and the hierarchical basis CG method. How many iterations of each are required to reduce the relative residual below $10^{-6}$? How large is the difference between the solutions computed by the two CG methods and the "exact" solution computed by the MATLAB direct solver?

9. (MATLAB) Repeat the previous exercise, but take $\Omega$ to be the polygon with vertices $(0, 0)$, $(1, 0)$, $(1, 1)$, $(-1, 1)$, $(-1, -1)$, and $(0, -1)$. Start with a coarse mesh consisting of six triangles and refine it five times to obtain the final mesh.

10. (MATLAB) Apply the CG algorithm to solve $KU = F$, the finite element equations resulting from solving the BVP (11.5) using piecewise quadratic functions. Create a table analogous to the table in Example 11.2.

11. Use the results of Exercise 4.8.5 to determine the operation count for each iteration of CG in the previous exercise. Compare this to the operation count when $K$ is the stiffness matrix for piecewise linear finite elements. Taking into account the number of CG iterations required in each case, the cost per iteration, and the number of unknowns, compare the cost of solving the finite element equations using piecewise linear and piecewise quadratic functions (for the given example).

12. (Project) The idea of a hierarchical basis can be extended to piecewise polynomials of higher degree. For example, to incorporate quadratic shape functions on a sequence of meshes $\mathcal{T}_0, \mathcal{T}_1, \ldots, \mathcal{T}_k$, it can be assumed that the basis functions corresponding to $\mathcal{T}_0, \mathcal{T}_1, \ldots, \mathcal{T}_{k-1}$ are piecewise linear, exactly as before. The new basis functions corresponding to nodes belonging to $\mathcal{T}_k$ but not to $\mathcal{T}_{k-1}$ are then taken to be the standard piecewise quadratic basis functions corresponding to the midpoints of the edges in $\mathcal{T}_{k-1}$ (these midpoints are precisely the new nodes in $\mathcal{T}_k$).

    (a) In this scheme, approximately what percentage of the basis functions are quadratic?


(b) Work out the details (for example, the relationship between the stiffness matrices in the standard and hierarchical bases), and implement the corresponding CG method. (c) Compare this approach with the ordinary CG algorithm applied to the stiffness matrix corresponding to the standard nodal basis of piecewise quadratics for BVP (11.5). Is there a difference in efficiency? In accuracy?

## Chapter 12. The classical stationary iterations

In this chapter I will briefly cover the classical Jacobi, Gauss-Seidel, and SOR (successive overrelaxation) methods, all of which can be classified as stationary iterations. These methods generally do not converge as rapidly as CG, but they are still important because they can sometimes be used as preconditioners, and because they play an important role in the multigrid method, which is covered in the next chapter.

## 12.1 Stationary iterations

Since the methods described in this section are not necessarily restricted to symmetric or positive definite systems, I will write the equation to be solved as

$$Ax = b,$$

where $A \in \mathbf{R}^{N \times N}$ is nonsingular and $b \in \mathbf{R}^N$. It may be possible to transform the equation $Ax = b$ into an equivalent equation

$$x = Bx + c. \tag{12.1}$$

Equation (12.1) has a unique solution if and only if $I - B$ is nonsingular. A solution of an equation of the form $x = f(x)$ is called a stationary point (or fixed point) of $f$. A general algorithm for computing a stationary point is to choose an initial estimate $x^{(0)}$ and compute $x^{(1)}, x^{(2)}, \ldots$ by $x^{(k+1)} = f(x^{(k)})$. This is referred to as fixed point iteration. In the context of (12.1), fixed point iteration takes the form

$$x^{(k+1)} = Bx^{(k)} + c. \tag{12.2}$$

The iteration (12.2) is called a stationary iteration for $Ax = b$. The exact solution $x$ satisfies $x = Bx + c$, and thus

$$x^{(k+1)} - x = B\bigl(x^{(k)} - x\bigr).$$

Therefore,

$$e^{(k+1)} = Be^{(k)},$$

where the error is defined by

$$e^{(k)} = x^{(k)} - x.$$

This in turn implies that

$$e^{(k)} = Be^{(k-1)} = B^2e^{(k-2)} = \cdots,$$

and thus

$$e^{(k)} = B^ke^{(0)}.$$

This equation makes the analysis of stationary iterations fairly straightforward, but this analysis requires the use of matrix norms.
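A small numerical experiment makes the error relation concrete. The Python sketch below runs the fixed point iteration $x^{(k+1)} = Bx^{(k)} + c$ for a hypothetical $2 \times 2$ matrix $B$ whose eigenvalues are both $0.5$; the function name `stationary_iteration` and the data are assumptions chosen for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def stationary_iteration(B, c, x0, num_iters):
    """Apply x <- Bx + c a fixed number of times, starting from x0."""
    x = list(x0)
    for _ in range(num_iters):
        x = [bx + ci for bx, ci in zip(matvec(B, x), c)]
    return x


B = [[0.5, 0.25], [0.0, 0.5]]
c = [1.0, 1.0]
x_exact = [3.0, 2.0]                 # solves x = Bx + c

x0 = [0.0, 0.0]
e0 = [a - b for a, b in zip(x0, x_exact)]

# After one step, the error equals B times the initial error.
x1 = stationary_iteration(B, c, x0, 1)
e1 = [a - b for a, b in zip(x1, x_exact)]

# After many steps, the iterate is close to the fixed point.
x60 = stationary_iteration(B, c, x0, 60)
```

Since every eigenvalue of this $B$ has magnitude less than one, the powers $B^k$ shrink and the iterates approach the fixed point from any starting vector.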

### 12.1.1 Matrix norms

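The general definition of a norm for a vector space was given in Section 2.4.1. Since matrices can be added and multiplied by scalars, the set of matrices of a given size can be regarded as a vector space. In this book, only the space $\mathbf{R}^{N \times N}$ of square matrices with real entries is used. A norm on $\mathbf{R}^{N \times N}$ is any real-valued function $\|\cdot\|$ satisfying the three fundamental properties:

1. $\|A\| \ge 0$ for all $A \in \mathbf{R}^{N \times N}$, and $\|A\| = 0$ if and only if $A = 0$;
2. $\|\alpha A\| = |\alpha|\,\|A\|$ for all $A \in \mathbf{R}^{N \times N}$ and all $\alpha \in \mathbf{R}$;
3. $\|A + B\| \le \|A\| + \|B\|$ for all $A, B \in \mathbf{R}^{N \times N}$.

A matrix norm on $\mathbf{R}^{N \times N}$ is required to satisfy a fourth property:

$$\|AB\| \le \|A\|\,\|B\| \quad \text{for all } A, B \in \mathbf{R}^{N \times N}.$$

In particular,

$$\|A^2\| \le \|A\|^2$$

and, by induction,

$$\|A^k\| \le \|A\|^k, \quad k = 1, 2, 3, \ldots.$$

The most useful matrix norms, at least from a theoretical point of view, are those induced by a given vector norm. An induced matrix norm is defined by

$$\|A\| = \max_{x \ne 0}\frac{\|Ax\|}{\|x\|}.$$

Here I use the same notation for the vector norm $\|\cdot\|$ on $\mathbf{R}^N$ and the induced matrix norm $\|\cdot\|$ on $\mathbf{R}^{N \times N}$. It is easy to show from the definition that an induced matrix norm satisfies

$$\|Ax\| \le \|A\|\,\|x\| \quad \text{for all } x \in \mathbf{R}^N.$$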

Here are the most important induced matrix norms.

1. The Euclidean norm on $\mathbf{R}^N$,

   $$\|x\|_2 = \left(\sum_{i=1}^N x_i^2\right)^{1/2},$$

   induces a matrix norm that is also denoted $\|\cdot\|_2$. It can be shown that $\|A\|_2$ is the square root of the largest eigenvalue of the positive semidefinite matrix $A^TA$. For a symmetric positive definite (or positive semidefinite) matrix $A$, this reduces to

   $$\|A\|_2 = \lambda_{\max}(A),$$

   where $\lambda_{\max}(A)$ is the largest eigenvalue of $A$.

2. The $\ell^\infty$ norm on $\mathbf{R}^N$ is defined by

   $$\|x\|_\infty = \max_{1 \le i \le N}|x_i|.$$

   The corresponding induced matrix norm is also denoted by $\|\cdot\|_\infty$. It can be shown that

   $$\|A\|_\infty = \max_{1 \le i \le N}\sum_{j=1}^N |A_{ij}|.$$

   In other words, $\|A\|_\infty$ is the maximum absolute row sum of $A$.

It should be noted that $\|A\|_2$ is quite expensive to compute and is therefore used primarily for theoretical analysis. The most important result on matrix norms, which I give below, refers to the spectral radius of a matrix:

$$\rho(A) = \max\{|\lambda| \,:\, \lambda \text{ is an eigenvalue of } A\}.$$

The reader should note the difference between $\rho(A)$ and $\lambda_{\max}(A)$: $\lambda_{\max}(A)$ is meaningful only when the eigenvalues of $A$ are real, while $\rho(A)$ makes sense for any matrix, even one with complex eigenvalues. For symmetric positive definite matrices, the two are equal: $\rho(A) = \lambda_{\max}(A)$. Therefore, if $A$ is symmetric positive definite, then $\|A\|_2 = \rho(A)$.
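Unlike $\|A\|_2$, the $\infty$-norm is cheap to evaluate directly from the entries. The following Python sketch computes the maximum absolute row sum and checks the inequality $\|Ax\|_\infty \le \|A\|_\infty\|x\|_\infty$ on a hypothetical $2 \times 2$ example; the function names are assumptions made for illustration.

```python
def vec_inf_norm(x):
    """Largest absolute component of a vector."""
    return max(abs(xi) for xi in x)


def mat_inf_norm(A):
    """Maximum absolute row sum of a matrix stored as a list of rows."""
    return max(sum(abs(a) for a in row) for row in A)


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


A = [[1.0, -2.0], [3.0, 4.0]]
norm_A = mat_inf_norm(A)             # row sums are 3 and 7

# The bound holds for any x and is attained here, because x matches the
# signs of the entries in the row with the largest absolute sum.
x = [1.0, 1.0]
norm_Ax = vec_inf_norm(matvec(A, x))
```

For this choice of `x` the two sides of the inequality coincide, which illustrates why the maximum row sum is exactly the induced norm and not merely an upper bound.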

The following results about matrix norms are fundamental.

THEOREM 12.1.

1. If $A \in \mathbf{R}^{N \times N}$ and $\|\cdot\|$ is any matrix norm (induced by a vector norm or not), then

   $$\rho(A) \le \|A\|.$$

2. If $A \in \mathbf{R}^{N \times N}$ and $\epsilon$ is any positive real number, then there exists at least one vector norm $\|\cdot\|$ such that the corresponding induced matrix norm satisfies

   $$\|A\| \le \rho(A) + \epsilon.$$

This theorem and the next are proved in Ciarlet [17, Sections 1.4 and 1.5]. As noted above, $\|A^k\| \le \|A\|^k$ for all $k = 1, 2, 3, \ldots$. It follows that, for any matrix norm $\|\cdot\|$,

$$\|A\| < 1 \ \Rightarrow\ A^k \to 0 \text{ as } k \to \infty.$$

The main result now follows from the previous theorem.

THEOREM 12.2. Let $A \in \mathbf{R}^{N \times N}$. Then the following are equivalent:

### 12.1.2 Convergence of stationary iterations

As I showed earlier, the error $e^{(k)} = x^{(k)} - x$ in the sequence produced by the stationary iteration satisfies

$$e^{(k)} = B^ke^{(0)}.$$

Convergence of $x^{(k)}$ to $x$ is equivalent to the convergence of $e^{(k)}$ to zero. Therefore, by Theorem 12.2, the iteration (12.4) converges for each possible $x^{(0)}$ if and only if $\rho(B) < 1$. Moreover, (12.5) and the inequality

$$\|B^k\| \le \|B\|^k$$

suggest that the smaller $\|B\|$ (and hence $\rho(B)$) is, the faster the error converges to zero. This conclusion is not necessarily true for all $e^{(0)}$, since the inequalities

$$\|e^{(k)}\| \le \|B^k\|\,\|e^{(0)}\| \le \|B\|^k\|e^{(0)}\|$$

need not be tight. Nevertheless, it can be shown that there is an asymptotic sense in which the above statement is true, namely, that given two stationary iterative methods

$$x^{(k+1)} = B_1x^{(k)} + c_1$$

and

$$x^{(k+1)} = B_2x^{(k)} + c_2,$$

the first is better than the second if $\rho(B_1) < \rho(B_2)$. For this more refined analysis, see Ciarlet [17, Theorem 5.1-2] or Varga [42].

## 12.2 The classical iterations

A general way to construct a stationary iteration for a system of the form $Ax = b$ is to "split" the matrix $A$ into $A = M - N$, where $M$ is nonsingular. Then

$$Ax = b \ \Leftrightarrow\ Mx = Nx + b \ \Leftrightarrow\ x = M^{-1}Nx + M^{-1}b.$$

Generally the iteration is actually implemented as

$$Mx^{(k+1)} = Nx^{(k)} + b,$$

and so it must be inexpensive to solve systems in which $M$ is the coefficient matrix. The convergence of the iteration is determined by the matrix $B = M^{-1}N$, and thus the fundamental question is whether $\rho(M^{-1}N) < 1$ holds for $A$ in the class of matrices under consideration.

I now describe the most popular splittings and resulting iterations. I will use the notation that $D$ is the diagonal matrix whose diagonal entries are those of $A$, $L$ is the negative of the strict lower triangle of $A$, and $U$ is the negative of the strict upper triangle of $A$. That is,

$$
D_{ij} = \begin{cases} A_{ij}, & i = j, \\ 0, & i \ne j, \end{cases} \qquad
L_{ij} = \begin{cases} -A_{ij}, & i > j, \\ 0, & i \le j, \end{cases} \qquad
U_{ij} = \begin{cases} -A_{ij}, & i < j, \\ 0, & i \ge j, \end{cases}
$$

and $A = D - L - U$.

## 12.2.1 Jacobi iteration

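The Jacobi iteration corresponds to the splitting $M = D$, $N = L + U$, and is only defined if all of the diagonal entries of $A$ are nonzero. This iteration is easily implemented as follows:

$$x_i^{(k+1)} = \frac{1}{A_{ii}}\Big(b_i - \sum_{j \ne i} A_{ij} x_j^{(k)}\Big), \quad i = 1, 2, \ldots, N.$$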

One class of matrices for which Jacobi iteration is guaranteed to converge is the class of strictly diagonally dominant matrices. A matrix $A \in \mathbf{R}^{N \times N}$ is called strictly diagonally dominant if

$$|A_{ii}| > \sum_{j \ne i} |A_{ij}|, \quad i = 1, 2, \ldots, N.$$

If $A$ is strictly diagonally dominant, then

$$\|M^{-1}N\|_\infty < 1,$$

and hence $\rho(M^{-1}N) < 1$ in this case (see Exercise 1).
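The Jacobi update and the diagonal dominance test can be sketched in a few lines. This is a minimal illustration in Python rather than the MATLAB code that accompanies the book; the function names, the dense list-of-lists storage, and the stopping test based on $\|b - Ax\|_2 / \|b\|_2$ are assumptions made for the example.

```python
def is_strictly_diagonally_dominant(A):
    """True if |A[i][i]| > sum of |A[i][j]| over j != i, for every row i."""
    return all(
        abs(row[i]) > sum(abs(a) for j, a in enumerate(row) if j != i)
        for i, row in enumerate(A)
    )


def jacobi(A, b, x0, tol=1e-6, max_iter=10000):
    """Jacobi iteration for Ax = b; stops when ||b - Ax|| <= tol * ||b||."""
    n = len(b)
    x = list(x0)
    norm_b = sum(bi * bi for bi in b) ** 0.5
    for k in range(max_iter):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if sum(ri * ri for ri in r) ** 0.5 <= tol * norm_b:
            return x, k
        # x_i + r_i / A_ii equals (b_i - sum_{j != i} A_ij x_j) / A_ii.
        # Every new component uses only the old iterate, so a new list is built.
        x = [x[i] + r[i] / A[i][i] for i in range(n)]
    return x, max_iter


# Small strictly diagonally dominant test system (hypothetical data).
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]   # chosen so that the exact solution is (1, 1, 1)
x, iters = jacobi(A, b, [0.0, 0.0, 0.0])
```

For a sparse stiffness matrix one would loop only over the nonzeros in each row; the dense loops here are only for readability.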


EXAMPLE 12.3. To illustrate the convergence of the Jacobi iteration, I use the test problem from Example 11.1. This example does not fit into the convergence theorem just mentioned, as $K$ is diagonally dominant but not strictly diagonally dominant. However, Jacobi iteration does converge. The following table shows the number of iterations taken in reducing the relative residual to $10^{-6}$:

| N | Iterations |
|---|---|
| 9 | 40 |
| 49 | 171 |
| 225 | 692 |
| 961 | 2776 |
| 3969 | 11107 |
| 16129 | 44422 |

The iteration converges slowly in this example, and this is not uncommon. Jacobi iteration is easy to implement and each iteration is inexpensive, but convergence is quite slow.

## 12.2.2 Gauss-Seidel iteration

Jacobi iteration requires two vectors of storage for the solution, since $x^{(k)}$ must be kept while $x^{(k+1)}$ is being computed. The value of $x_i^{(k)}$ can be overwritten with $x_i^{(k+1)}$ if the value of $x_i^{(k+1)}$ is used as soon as it is known. One might also expect that this would lead to faster convergence, since the (presumably better) value $x_i^{(k+1)}$ is used in place of $x_i^{(k)}$. The result is the following iteration, which is called the Gauss-Seidel iteration:

$$x_i^{(k+1)} = \frac{1}{A_{ii}}\Big(b_i - \sum_{j < i} A_{ij} x_j^{(k+1)} - \sum_{j > i} A_{ij} x_j^{(k)}\Big), \quad i = 1, 2, \ldots, N.$$

Gauss-Seidel corresponds to the splitting $M = D - L$, $N = U$. It might be thought that, since $M$ is lower triangular rather than diagonal, Gauss-Seidel is more costly per iteration than Jacobi. In fact, however, the number of operations per iteration is exactly the same, and the memory cost is less since it is not necessary to maintain separate arrays for $x^{(k)}$ and $x^{(k+1)}$. It can be shown that the Gauss-Seidel iteration converges for all symmetric positive definite matrices $A$.

EXAMPLE 12.4. The following table shows the number of iterations of Gauss-Seidel taken in reducing the relative residual in the system from Example 11.1 to $10^{-6}$:

N
9 49 225 961 3969 16129


In this example, Gauss-Seidel converged much more quickly than Jacobi (which is typical), although the performance is still poor.
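The difference between the two sweeps is easy to see side by side. The following Python sketch is an illustration only: the helper names, the one-dimensional test matrix $\mathrm{tridiag}(-1, 2, -1)$, and the relative-residual stopping test are assumptions for the example, not the test problem of Example 11.1.

```python
def residual_norm(A, b, x):
    """Euclidean norm of b - Ax."""
    n = len(b)
    return sum(
        (b[i] - sum(A[i][j] * x[j] for j in range(n))) ** 2 for i in range(n)
    ) ** 0.5


def jacobi_sweep(A, b, x):
    """Every new component is computed from the old iterate only."""
    n = len(b)
    return [
        (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
        for i in range(n)
    ]


def gauss_seidel_sweep(A, b, x):
    """Each new component is used as soon as it is known.

    The copy only keeps this function free of side effects; a production
    code would overwrite x in place, which is the storage saving noted above.
    """
    n = len(b)
    x = list(x)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x


def iterate(sweep, A, b, tol=1e-6, max_iter=100000):
    """Repeat a sweep until the relative residual drops below tol."""
    x = [0.0] * len(b)
    norm_b = residual_norm(A, b, x)   # with x = 0 the residual is b itself
    for k in range(max_iter):
        if residual_norm(A, b, x) <= tol * norm_b:
            return x, k
        x = sweep(A, b, x)
    return x, max_iter


n = 10
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x_j, it_j = iterate(jacobi_sweep, A, b)
x_gs, it_gs = iterate(gauss_seidel_sweep, A, b)
```

On this small symmetric positive definite tridiagonal system, `it_gs` comes out smaller than `it_j`, in line with the behavior reported in the examples.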

## 12.2.3 SOR iteration

Another popular iteration can be obtained by modifying the Gauss-Seidel method. A parameter $\omega > 0$ is chosen and $A$ is split as follows:

$$A = M_\omega - N_\omega, \quad M_\omega = \frac{1}{\omega}D - L, \quad N_\omega = \frac{1 - \omega}{\omega}D + U.$$

The resulting iteration is called the successive overrelaxation (SOR) iteration. (When $\omega < 1$, the method is sometimes referred to as underrelaxation.) Gauss-Seidel is the special case of SOR corresponding to $\omega = 1$. The eigenvalues of $B_\omega = M_\omega^{-1}N_\omega$ depend continuously on $\omega$. Therefore, if Gauss-Seidel converges, then so does SOR for $\omega$ sufficiently close to $\omega = 1$. It might be hoped that one could find $\omega$ such that

$$\rho(M_\omega^{-1}N_\omega) < \rho(M_1^{-1}N_1).$$

Ideally, one would find the optimal $\omega$ by solving

$$\min_{\omega} \rho(M_\omega^{-1}N_\omega),$$

leading to the optimal SOR method. Actually computing the optimal $\omega$, though, might be costly and not worthwhile. The following results are known:

1. In order that SOR converge, it is necessary that $|\omega - 1| < 1$ hold. In fact,

   $$\rho(M_\omega^{-1}N_\omega) \ge |\omega - 1|.$$

2. If $A$ is symmetric and positive definite, then SOR converges provided $|\omega - 1| < 1$, that is, provided $0 < \omega < 2$.

3. If $A$ is symmetric positive definite and tridiagonal or block tridiagonal, then

   $$\rho(M_1^{-1}N_1) = \rho(M_J^{-1}N_J)^2 < 1,$$

   so that Jacobi and Gauss-Seidel both converge and Gauss-Seidel converges more rapidly than Jacobi. Here I denote the Jacobi splitting by $A = M_J - N_J$ ($M_J = D$, $N_J = L + U$). Moreover, in this case SOR has an optimal parameter $\omega$ given by

   $$\omega_{\mathrm{opt}} = \frac{2}{1 + \sqrt{1 - \rho(M_J^{-1}N_J)^2}}.$$

Ciarlet [17, Section 5.3] contains proofs of the above results. It should be emphasized that even if the optimal SOR parameter exists, it may be a nontrivial task to compute it. Young [43] contains an extensive discussion of practical methods to estimate the optimal SOR parameter.
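As a hedged illustration of the third result, the sketch below applies SOR to the tridiagonal matrix $\mathrm{tridiag}(-1, 2, -1)$ of order $n$, for which the Jacobi spectral radius is known in closed form to be $\cos(\pi/(n+1))$. The function names, the test matrix, and the stopping test are assumptions for the example; for a general matrix the Jacobi spectral radius would have to be estimated, which is the difficulty mentioned above.

```python
import math


def sor_sweep(A, b, x, omega):
    """One in-place SOR sweep; omega = 1 reduces to Gauss-Seidel."""
    n = len(b)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - s) / A[i][i]


def sor(A, b, omega, tol=1e-6, max_iter=100000):
    """Iterate until ||b - Ax|| <= tol * ||b||; returns (x, iterations)."""
    n = len(b)
    x = [0.0] * n
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    for k in range(max_iter):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if math.sqrt(sum(ri * ri for ri in r)) <= tol * norm_b:
            return x, k
        sor_sweep(A, b, x, omega)
    return x, max_iter


n = 20
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n

# Closed form valid for this particular matrix only.
rho_jacobi = math.cos(math.pi / (n + 1))
omega_opt = 2.0 / (1.0 + math.sqrt(1.0 - rho_jacobi ** 2))

_, it_gs = sor(A, b, 1.0)          # Gauss-Seidel
_, it_opt = sor(A, b, omega_opt)   # SOR with the optimal parameter
```

Comparing `it_gs` with `it_opt` shows how much a good relaxation parameter can help on a problem of this type.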


EXAMPLE 12.5. The following table shows the number of iterations of SOR taken in reducing the relative residual in the system from Example 11.1 to $10^{-6}$. The parameter $\omega$ was chosen to be $\omega = 1.5$. (I did not attempt to estimate the optimal value of $\omega$.)
N
9 49 225 961 3969 16129

In this example, SOR converged much more quickly than Gauss-Seidel.

## 12.2.4 Symmetric SOR

Given a splitting $A = M - N$, the matrix $M$ is a candidate for a preconditioner for the CG method, provided $M$ is symmetric and positive definite. Of the methods considered thus far, only Jacobi's method leads to such an $M$. However, the SOR method can be symmetrized to give another possible preconditioner. The reader may have noticed that, in the Gauss-Seidel method, the splitting $M = D - L$, $N = U$ is rather arbitrary; it might just as well be $M = D - U$, $N = L$. The same remark applies to the SOR method: One could use

$$M_\omega = \frac{1}{\omega}D - U, \quad N_\omega = \frac{1 - \omega}{\omega}D + L$$

instead of

$$M_\omega = \frac{1}{\omega}D - L, \quad N_\omega = \frac{1 - \omega}{\omega}D + U.$$

The resulting iteration is referred to as backward SOR, since a step corresponds to solving an upper triangular system by back substitution. When $A$ is symmetric, $U^T = L$, and thus

$$\frac{1}{\omega}D - U = \Big(\frac{1}{\omega}D - L\Big)^T.$$

The symmetric SOR (SSOR) method consists of taking one step of the ordinary SOR method followed by one step of the backward SOR method. Applying one step of SOR to $x^{(k)}$ yields


EXAMPLE 12.6. The following table shows the number of iterations of SSOR taken in reducing the relative residual in the system from Example 11.1 to $10^{-6}$. The parameter $\omega$ was chosen to be $\omega = 1.5$.

| N | Iterations |
|---|---|
| 9 | 27 |
| 49 | 93 |
| 225 | 351 |
| 961 | 1355 |
| 3969 | 5332 |
| 16129 | 21193 |

In this example, SSOR converged about as quickly as Gauss-Seidel (in terms of the number of iterations), and much slower than SOR with the same parameter. The reader should notice that the cost per iteration of SSOR is twice that of Gauss-Seidel or SOR.

## 12.2.5 CG with SSOR preconditioning

Although it is not immediately obvious, the SSOR iteration given by (12.8) results from a splitting of $A$, $A = M_\omega - N_\omega$. The matrix $M_\omega$ is determined by the requirement that $M_\omega^{-1}$ be the matrix multiplying $b$ in (12.8), and some simplification shows that

$$M_\omega^{-1} = \omega(2 - \omega)(D - \omega U)^{-1}D(D - \omega L)^{-1}.$$

Therefore,

$$M_\omega = \frac{1}{\omega(2 - \omega)}(D - \omega L)D^{-1}(D - \omega U).$$

A formula for the matrix $N_\omega$ could also be derived, but it is not needed. The matrix $M_\omega$ is symmetric and positive definite and so is a candidate for preconditioning the CG algorithm.

EXAMPLE 12.7. The following table shows the number of iterations of CG, preconditioned by SSOR with $\omega = 1.5$, taken in reducing the relative residual in the system from Example 11.1 to $10^{-6}$.

| N | Iterations |
|---|---|
| 9 | 1 |
| 49 | 13 |
| 225 | 23 |
| 961 | 42 |
| 3969 | 79 |
| 16129 | 165 |

The number of iterations is reduced by a factor of almost two over unpreconditioned CG.


## 12.3 The MATLAB implementation

The algorithms discussed in this chapter are implemented for a general system $Ax = b$; none of them is mesh-dependent.

## 12.3.1 MATLAB functions

- Jacobi: Jacobi method.
- GaussSeidel: Gauss-Seidel method.
- SOR: Successive overrelaxation method (user must provide relaxation parameter).
- SSOR: Symmetric SOR method (user must provide relaxation parameter).
- CGSSOR: Conjugate gradients with SSOR preconditioning (user must provide relaxation parameter).

## 12.4 Exercises for Chapter 12

1. Suppose $A \in \mathbf{R}^{n \times n}$ is strictly diagonally dominant and that $A = D - L - U$ is the splitting of $A$ into its diagonal, lower triangular, and upper triangular parts, as in this chapter. Prove that $\|D^{-1}(L + U)\|_\infty < 1$ and hence that Jacobi iteration is guaranteed to converge.

2. Verify that the Gauss-Seidel iteration corresponds to the splitting $M = D - L$, $N = U$.

3. Suppose $A \in \mathbf{R}^{n \times n}$ is symmetric and positive definite, and let $A = M - N$, where $M$ is nonsingular. Consider the matrix $C = M^T + N$.
   (a) Prove that $C$ is symmetric.
   (b) Prove that if $C$ is positive definite, then $\rho(M^{-1}N) < 1$. (Hint: Recall that $\rho(B) \le \|B\|$ for any matrix norm $\|\cdot\|$. Prove that $\|M^{-1}N\| < 1$, where $\|\cdot\|$ is the matrix norm induced by the vector norm $\|x\|_A = \sqrt{x \cdot Ax}$. Some simplification shows that $\|M^{-1}Nx\|_A^2 = \|x\|_A^2 - y \cdot Cy$, where $y = M^{-1}Ax$.)
   (c) Using the previous result, prove that when $A$ is symmetric and positive definite, the SOR method converges for $0 < \omega < 2$.

4. (From [16]) Consider the two matrices

   For each matrix, determine the following by direct computation:
   (a) the spectral radius of the iteration matrix for Jacobi's method;
   (b) the spectral radius of the iteration matrix for the Gauss-Seidel method;
   (c) whether Jacobi and/or Gauss-Seidel converges.

5. Prove that the matrix norm $\|\cdot\|_2$ induced by the Euclidean vector norm $\|\cdot\|_2$ reduces to $\|A\|_2 = \lambda_{\max}(A)$ when $A$ is symmetric and positive definite. (Hint: There is an orthonormal basis of eigenvectors $x_1, x_2, \ldots, x_n$ with corresponding eigenvalues $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. For any $x = c_1x_1 + c_2x_2 + \cdots + c_nx_n$, $\|x\|_2^2 = c_1^2 + c_2^2 + \cdots + c_n^2$. Compute $Ax$ using the expansion of $x$ in terms of eigenvectors; then compute $\|Ax\|_2^2$ and compare it to $\|x\|_2^2$.)

6. Show that the matrix norm $\|\cdot\|_\infty$ induced by the vector norm $\|\cdot\|_\infty$ is given by

   $$\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |A_{ij}|.$$

   (Hint: Notice that if $\|x\|_\infty = 1$, then no component of $x$ is more than 1 in magnitude. Use this to show that $\|A\|_\infty$ is bounded by the maximum absolute row sum of $A$. Then suppose row $k$ of $A$ has the largest absolute row sum and consider a vector with all components equal to $\pm 1$, the signs matching the signs of the entries in row $k$ of $A$.)

7. Consider a model problem on a rectangle resulting in a stiffness matrix with no more than five nonzeros per row. What is the cost of an iteration of CG? What is the cost of an iteration of CG with SSOR preconditioning? Taking into account the added cost of the SSOR preconditioning, did the SSOR preconditioning lead to an increase in efficiency in Example 12.7?

8. (MATLAB) Consider the BVP

   where $\Omega$ is the unit circle. Starting with a coarse triangulation with four triangles (as in Figure 6.11), refine it five times to obtain a sequence of six meshes. Form the finite element equations $KU = F$ and apply each of the following methods to solve $KU = F$, reducing the relative residual to below $10^{-6}$. Create a table analogous to that of Example 12.3 for each:
   (a) Jacobi iteration,
   (b) Gauss-Seidel,
   (c) SOR (choose the relaxation parameter by trial and error),
   (d) SSOR with the same relaxation parameter,
   (e) CG with SSOR preconditioning.

9. (MATLAB) Repeat the previous exercise, but take $\Omega$ to be the polygon with vertices $(0, 0)$, $(1, 0)$, $(1, 1)$, $(-1, 1)$, $(-1, -1)$, and $(0, -1)$. Start with a coarse mesh consisting of six triangles and refine it five times to obtain a sequence of six meshes.

10. (MATLAB) Apply each of the following methods to solve $KU = F$, the finite element equations resulting from solving BVP (11.5) using piecewise quadratic functions. Create a table analogous to that of Example 12.3 for each:
    (a) Jacobi iteration,
    (b) Gauss-Seidel,
    (c) SOR (choose the relaxation parameter by trial and error),
    (d) SSOR with the same relaxation parameter,
    (e) CG with SSOR preconditioning.

    Is the relative efficiency of the methods the same as in the piecewise linear case?

Chapter 13

## The multigrid method

In the preceding chapter, I described the classical stationary iterations and showed how the rate of convergence of a given method depends on the spectral radius of the iteration matrix. However, there is more to an iterative method than the rate at which the norm of the error goes to zero. As I will show in Section 13.1, certain iterative methods smooth the error: the high frequency components of the error go to zero quickly, while low frequency components are reduced much more slowly. Multigrid methods take advantage of this phenomenon, switching from one mesh to another to reduce various components of the error efficiently.

## 13.1 Stationary iterations as smoothers

To lay the foundation for the multigrid method, I begin by examining in detail the performance of Jacobi and Gauss-Seidel iterations on a simple model problem. The BVP is Poisson's equation under Dirichlet conditions, the domain is the unit square, and the mesh is a uniform triangulation. It is possible to describe very precisely the fashion in which the error in the solution to $KU = F$ goes to zero.

13.1.1

In this section, $\Omega$ will denote the unit square,

$$\Omega = \{(x, y) : 0 < x < 1,\ 0 < y < 1\},$$

and $\mathcal{T}_h$ will denote an $n \times n$ uniform triangulation of $\Omega$, as shown in Figure 13.1 for $n = 4$. In this section, the integer $n$ will be of the form $n = 2^m$ for some positive integer $m$. Such a mesh has $2n^2$ triangles, $(n + 1)^2$ nodes, and $(n - 1)^2$ free nodes. The nodes will always be numbered by rows, from left to right and bottom to top, from 1 to $(n + 1)^2$, and the free nodes, which are the interior nodes for a Dirichlet problem, will be numbered in the same way.

Figure 13.1. A uniform triangulation of the unit square. All the nodes are labeled on the left; the free nodes are labeled on the right.

For reasons that will become clear, it is convenient to define an alternate numbering of the nodes in the mesh, one that explicitly displays the two-dimensional nature of the mesh. Defining $h = 1/n$, a typical node is $(ih, jh) = (i/n, j/n)$, $i, j = 0, 1, \ldots, n$. In this section, the nodes will be denoted

$$z_{ij} = (ih, jh), \quad i, j = 0, 1, \ldots, n.$$

The free nodes are then $z_{ij}$, $i, j = 1, 2, \ldots, n - 1$. The stiffness matrix for the BVP

$$-\Delta u = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega$$

takes a special form on the uniform mesh $\mathcal{T}_h$. First of all, no row of $K$ has more than five nonzero entries. Referring to Figure 13.1, it might appear that, for example, $K_{51}, K_{52}, K_{54}, K_{55}, K_{56}, K_{58}, K_{59}$ would all be nonzero, for a total of seven nonzeros in row five of $K$. However, due to the geometry of the mesh, $K_{51}$ and $K_{59}$ are both zero ($\nabla\phi_5$ and $\nabla\phi_1$ are orthogonal on their common support, as are $\nabla\phi_5$ and $\nabla\phi_9$). The analogous cancellation occurs in each row of $K$. Second, the five nonzeros in a typical row correspond to the nodes

$$z_{i,j-1},\ z_{i-1,j},\ z_{ij},\ z_{i+1,j},\ z_{i,j+1}$$

in the linear ordering of the nodes, which determines the order of the unknowns and the sparsity structure of $K$. Therefore, nonzeros in $K$ occur only in diagonals $-(n - 1), -1, 0, 1, n - 1$, where the main diagonal is numbered 0, the subdiagonals are numbered $-1, -2, \ldots$, and the superdiagonals are numbered $1, 2, \ldots$. Finally, due to the uniformity of the mesh and the PDE, the nonzero entries are

$$-1,\ -1,\ 4,\ -1,\ -1.$$

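Most rows have these five nonzeros, although rows corresponding to nodes next to the boundary have only four nonzeros and rows corresponding to nodes in the corners (nodes $z_{1,1}$, $z_{n-1,1}$, $z_{1,n-1}$, $z_{n-1,n-1}$) have only three nonzeros. The result is the block tridiagonal matrix

$$K = \begin{bmatrix} A & -I & & & \\ -I & A & -I & & \\ & \ddots & \ddots & \ddots & \\ & & -I & A & -I \\ & & & -I & A \end{bmatrix},$$

where $A$ is an $(n - 1) \times (n - 1)$ tridiagonal matrix,

$$A = \begin{bmatrix} 4 & -1 & & & \\ -1 & 4 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 4 & -1 \\ & & & -1 & 4 \end{bmatrix},$$

and $I$ is the $(n - 1) \times (n - 1)$ identity matrix. Those readers familiar with finite difference methods for solving Laplace's or Poisson's equation will recognize that $K$ is a multiple of the finite difference matrix resulting from the standard five-point stencil for the Laplacian:

$$-\Delta u(z_{ij}) \approx \frac{-u(z_{i,j-1}) - u(z_{i-1,j}) + 4u(z_{ij}) - u(z_{i+1,j}) - u(z_{i,j+1})}{h^2}.$$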
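The structure just described can be checked by assembling $K$ directly from the five-point pattern. The sketch below is an illustration in Python with dense storage; the function name and the zero-based indexing are assumptions made for the example.

```python
def model_stiffness_matrix(n):
    """Dense (n-1)^2 x (n-1)^2 matrix with the pattern -1, -1, 4, -1, -1.

    Free nodes z_ij (1 <= i, j <= n-1) are numbered by rows, left to right
    and bottom to top, so i (the x index) varies fastest.
    """
    m = n - 1
    K = [[0.0] * (m * m) for _ in range(m * m)]
    for j in range(1, n):
        for i in range(1, n):
            p = (j - 1) * m + (i - 1)          # zero-based index of z_ij
            K[p][p] = 4.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 1 <= ii <= m and 1 <= jj <= m:   # neighbor is a free node
                    K[p][(jj - 1) * m + (ii - 1)] = -1.0
    return K


K = model_stiffness_matrix(4)   # n = 4 gives nine free nodes
nonzeros_per_row = [sum(1 for a in row if a != 0.0) for row in K]
```

For $n = 4$ the four corner nodes give rows with three nonzeros, the four nodes next to one side of the boundary give rows with four, and the single central node gives a full five-entry row.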

## 13.1.2 Fourier modes and the spectral decomposition of K

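The matrix $K$ has the following (rare) property: It is possible to write down analytic formulas for its eigenvalues and eigenvectors. It turns out that the eigenvectors of $K$ can be obtained from the eigenfunctions of the Laplace operator, which are also known analytically. Defining

$$w^{(k,\ell)}(x, y) = \sin(k\pi x)\sin(\ell\pi y), \qquad \mu_{k\ell} = (k^2 + \ell^2)\pi^2,$$

it follows that

$$-\Delta w^{(k,\ell)} = \mu_{k\ell}\, w^{(k,\ell)} \ \text{in } \Omega, \qquad w^{(k,\ell)} = 0 \ \text{on } \partial\Omega,$$

showing that $w^{(k,\ell)}$, $\mu_{k\ell}$ form an eigenfunction-eigenvalue pair for each $k, \ell \ge 1$. Perhaps surprisingly, the eigenvectors of $K$ are obtained by merely sampling the eigenfunctions $w^{(k,\ell)}$ at the free nodes. The vector $W^{(k,\ell)} \in \mathbf{R}^{(n-1)^2}$ is defined by

$$W^{(k,\ell)}_{ij} = w^{(k,\ell)}(z_{ij}) = \sin\Big(\frac{k\pi i}{n}\Big)\sin\Big(\frac{\ell\pi j}{n}\Big), \quad i, j = 1, 2, \ldots, n - 1.$$

When it is necessary to arrange the components of $W^{(k,\ell)}$ in a one-dimensional array, as when forming the matrix-vector product $KW^{(k,\ell)}$, the order is the same as the order of the nodes:

$$W^{(k,\ell)} = \big(W^{(k,\ell)}_{1,1}, W^{(k,\ell)}_{2,1}, \ldots, W^{(k,\ell)}_{n-1,1}, W^{(k,\ell)}_{1,2}, \ldots, W^{(k,\ell)}_{n-1,n-1}\big).$$

I will now show that $W^{(k,\ell)}$ is an eigenvector of $K$ by computing a typical entry $(KW^{(k,\ell)})_{ij}$. The manipulation that follows is based on the trigonometric identities

$$\sin(\alpha \pm \beta) = \sin\alpha\cos\beta \pm \cos\alpha\sin\beta,$$

which imply $\sin(\alpha + \beta) + \sin(\alpha - \beta) = 2\sin\alpha\cos\beta$. The reader should notice how the following expression arises from the nonzeros $-1, -1, 4, -1, -1$ in the typical row of $K$:

$$\begin{aligned}
(KW^{(k,\ell)})_{ij} &= -W^{(k,\ell)}_{i,j-1} - W^{(k,\ell)}_{i-1,j} + 4W^{(k,\ell)}_{ij} - W^{(k,\ell)}_{i+1,j} - W^{(k,\ell)}_{i,j+1} \\
&= 4\sin\Big(\frac{k\pi i}{n}\Big)\sin\Big(\frac{\ell\pi j}{n}\Big) - \Big[\sin\Big(\frac{k\pi(i-1)}{n}\Big) + \sin\Big(\frac{k\pi(i+1)}{n}\Big)\Big]\sin\Big(\frac{\ell\pi j}{n}\Big) \\
&\qquad - \sin\Big(\frac{k\pi i}{n}\Big)\Big[\sin\Big(\frac{\ell\pi(j-1)}{n}\Big) + \sin\Big(\frac{\ell\pi(j+1)}{n}\Big)\Big] \\
&= \Big(4 - 2\cos\Big(\frac{k\pi}{n}\Big) - 2\cos\Big(\frac{\ell\pi}{n}\Big)\Big)W^{(k,\ell)}_{ij}.
\end{aligned}$$

The above calculation, which is derived for a typical row of $K$ with five nonzeros, is also valid for the rows with four or three nonzeros. In these cases, one or two of the adjacent nodes $z_{i,j-1}$, $z_{i-1,j}$, $z_{i+1,j}$, $z_{i,j+1}$ lies on the boundary, where the value of $w^{(k,\ell)}$ is zero. In such a case, any term that does not belong in the above calculation of $(KW^{(k,\ell)})_{ij}$ is in fact zero and has no effect.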
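The eigenvector property is easy to confirm numerically. The following Python sketch applies the five-point pattern to a sampled Fourier mode and compares the result with the sampled mode multiplied by $4 - 2\cos(k\pi/n) - 2\cos(\ell\pi/n)$. The function names and the particular choice $n = 8$, $k = 3$, $\ell = 5$ are assumptions for the illustration.

```python
import math


def fourier_mode(k, l, n):
    """Samples of sin(k*pi*x) sin(l*pi*y) at all nodes (i/n, j/n), 0 <= i, j <= n.

    The boundary samples are zero up to rounding error.
    """
    return [[math.sin(k * math.pi * i / n) * math.sin(l * math.pi * j / n)
             for j in range(n + 1)] for i in range(n + 1)]


def apply_K(W, n):
    """Apply the pattern -1, -1, 4, -1, -1 at every free node of the grid."""
    return {
        (i, j): 4.0 * W[i][j] - W[i - 1][j] - W[i + 1][j]
                - W[i][j - 1] - W[i][j + 1]
        for i in range(1, n) for j in range(1, n)
    }


n, k, l = 8, 3, 5
W = fourier_mode(k, l, n)
lam = 4.0 - 2.0 * math.cos(k * math.pi / n) - 2.0 * math.cos(l * math.pi / n)
KW = apply_K(W, n)

# Largest deviation of K W from lam * W over all free nodes.
max_defect = max(abs(KW[i, j] - lam * W[i][j]) for (i, j) in KW)
```

The defect is at the level of rounding error, including at the nodes adjacent to the boundary.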

I have thus shown that

$$KW^{(k,\ell)} = \lambda_{k\ell}W^{(k,\ell)},$$

where $\lambda_{k\ell} = 4 - 2\cos(k\pi/n) - 2\cos(\ell\pi/n)$. The relationship between $W^{(k,\ell)}$ and $w^{(k,\ell)}$ has already been stated (sampling $w^{(k,\ell)}$ yields $W^{(k,\ell)}$). When $k$ and $\ell$ are small compared to $n$, then, by a Taylor expansion,

$$\lambda_{k\ell} \approx \frac{(k^2 + \ell^2)\pi^2}{n^2} = h^2\mu_{k\ell}.$$

The stiffness matrix $K$ would be scaled by $1/h^2$ to make it a discrete approximation to $-\Delta$ (see (13.2)), and this scaling would make $\lambda_{k\ell}$ close to $\mu_{k\ell}$ for $k, \ell$ small.

Since $K$ is $(n - 1)^2 \times (n - 1)^2$, it can have only $(n - 1)^2$ independent eigenvectors. Indeed, the vectors $W^{(k,\ell)}$ are distinct only for $k, \ell = 1, 2, \ldots, n - 1$. For larger values of $k$ and/or $\ell$, aliasing occurs because the discrete mesh cannot resolve such large frequencies. To be precise, the following relationships hold (Exercise 1):

The eigenvectors $W^{(k,\ell)}$, $k, \ell = 1, 2, \ldots, n - 1$, can be shown to be orthogonal under the Euclidean dot product. They therefore form an orthogonal basis for $\mathbf{R}^{(n-1)^2}$. It is convenient to normalize these eigenvectors so as to obtain an orthonormal basis, and this is facilitated by the fact that each $W^{(k,\ell)}$ has the same Euclidean norm:

$$\|W^{(k,\ell)}\|_2 = \frac{n}{2}.$$

Therefore, if each $W^{(k,\ell)}$ is rescaled by the factor $2/n$, that is,

$$W^{(k,\ell)}_{ij} = \frac{2}{n}\sin\Big(\frac{k\pi i}{n}\Big)\sin\Big(\frac{\ell\pi j}{n}\Big),$$

then $\{W^{(k,\ell)} : k, \ell = 1, 2, \ldots, n - 1\}$ is an orthonormal basis for $\mathbf{R}^{(n-1)^2}$.

The significance of the above calculations can now be explained. When $U \in \mathbf{R}^{(n-1)^2}$ is written in terms of the basis of eigenvectors,

$$U = \sum_{k=1}^{n-1}\sum_{\ell=1}^{n-1}\big(U \cdot W^{(k,\ell)}\big)W^{(k,\ell)}$$

(this is the standard formula for expressing a vector in terms of a given orthonormal basis; see Exercise 2), the frequency content of $U$ is revealed by the coefficients $U \cdot W^{(k,\ell)}$. This is because each eigenvector is a sinusoid that oscillates with a certain frequency, as illustrated in Figure 13.2. The eigenvectors $W^{(k,\ell)}$ are called (discrete) Fourier modes. When $k$ and $\ell$ are both small compared to $n$, then $W^{(k,\ell)}$ is relatively smooth (a low frequency mode). When $k$ and $\ell$ are both close to $n$, then $W^{(k,\ell)}$ is quite oscillatory (a high frequency mode). When one of the indices is small and the other large, then $W^{(k,\ell)}$ is smooth in one variable and oscillatory in the other. Such a mode is called a mixed frequency mode. It will be useful below to define these terms precisely: $1 \le k, \ell < n/2$ correspond to a low frequency mode, $n/2 \le k, \ell \le n - 1$ to a high frequency mode, and the remaining values of $k, \ell$ to mixed frequency modes.

Figure 13.2. Four Fourier modes for $n = 16$: $W^{(1,1)}$ (upper left), $W^{(1,15)}$ (upper right), $W^{(15,1)}$ (lower left), $W^{(15,15)}$ (lower right).

## 13.1.3 Jacobi iteration

Figure 13.3. The initial vector $U^{(0)}$ from Example 13.1 (left) and its frequency content (right).

When a stationary iteration is applied to solve $KU = F$, the error

$$E^{(k)} = U^{(k)} - U$$

(where $U$ is the exact solution of $KU = F$) satisfies

$$E^{(k+1)} = BE^{(k)},$$

where $B$ is the iteration matrix. For this reason, when analyzing convergence, it is common to take $F = 0$, so that the system is $KU = 0$, the solution is $U = 0$, and the error is just $E^{(k)} = U^{(k)}$. This simplifies the notation somewhat, but does not limit the applicability of the results. Of course, the initial vector $U^{(0)}$, which plays the role of the initial error, must be nonzero. The following example shows how the error goes to zero under Jacobi iteration.

EXAMPLE 13.1. Consider the model problem with $n = 16$, so that $K \in \mathbf{R}^{225 \times 225}$. The initial vector is chosen to contain all frequencies equally:

This vector is displayed (as a piecewise linear function) in Figure 13.3. The equivalent vector, having as components $U^{(0)} \cdot W^{(k,\ell)}$ and thus revealing the frequency content of $U^{(0)}$, is also displayed in Figure 13.3. This equivalent vector can conveniently be computed as $W^TU^{(k)}$, where $W$ is the matrix whose columns are the Fourier modes:

Ten steps of the Jacobi iteration yield the vector $U^{(10)}$, which is shown in Figure 13.4 along with $W^TU^{(10)}$. The norm of the error is reduced by about a factor of 10 by these 10 iterations:

Figure 13.4. The vector $U^{(10)}$ from Example 13.1 (left) and its frequency content (right).

More importantly, Figure 13.4 shows that the various Fourier modes of the error were damped according to a distinct pattern: The low and high frequency modes were reduced much less than the mixed frequency modes.

The results of the previous example could have been predicted. The Jacobi splitting is $K = M_J - N_J$, where $M_J$ is the diagonal matrix agreeing with $K$ on the diagonal, and $N_J$ is the rest of the matrix. Jacobi's method is then

$$U^{(k+1)} = M_J^{-1}N_JU^{(k)} + M_J^{-1}F,$$

and the eigenvectors of $B_J = M_J^{-1}N_J$ are precisely the same as the eigenvectors of $K$. Indeed, a similar calculation to that given in Section 13.1.2 shows that

$$N_JW^{(k,\ell)} = \Big(2\cos\Big(\frac{k\pi}{n}\Big) + 2\cos\Big(\frac{\ell\pi}{n}\Big)\Big)W^{(k,\ell)}.$$

Therefore,

$$B_JW^{(k,\ell)} = \theta_{k\ell}W^{(k,\ell)},$$

where

$$\theta_{k\ell} = \frac{1}{2}\Big(\cos\Big(\frac{k\pi}{n}\Big) + \cos\Big(\frac{\ell\pi}{n}\Big)\Big).$$

It follows that

$$B_J^mU^{(0)} = \sum_{k=1}^{n-1}\sum_{\ell=1}^{n-1}\theta_{k\ell}^m\big(U^{(0)} \cdot W^{(k,\ell)}\big)W^{(k,\ell)}.$$

Figure 13.5. The eigenvalues $\theta_{k\ell}$ of the Jacobi iteration matrix, graphed as a function of $k$ and $\ell$ (left) and as a function of a single index (right). The eigenvalues corresponding to low, mixed, and high frequencies are indicated by o, o, and x, respectively.

The Fourier modes are damped since $|\theta_{k\ell}| < 1$, and the smaller the eigenvalue $|\theta_{k\ell}|$, the faster the mode $W^{(k,\ell)}$ is damped. The expression $\theta_{k\ell}$, as a function of $k$ and $\ell$, is graphed in Figure 13.5. Since the three-dimensional view of $\theta_{k\ell}$ is difficult to interpret, it is helpful to view $\theta_{k\ell}$ as a function of a single index; this is also displayed in Figure 13.5. All of the eigenvalues $\theta_{k\ell}$ are less than one in magnitude, but those that correspond to the lowest frequency and the highest frequency modes are close to one in magnitude. It follows that these modes are slowly eliminated by the Jacobi iteration. On the other hand, the eigenvalues corresponding to the mixed frequency modes are small and these modes are eliminated quickly by the Jacobi iteration.
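The damping pattern can be tabulated directly from the formula for the Jacobi eigenvalues, $\frac{1}{2}(\cos(k\pi/n) + \cos(\ell\pi/n))$, which follows from the fact that the diagonal of $K$ is $4I$. The Python sketch below records the largest eigenvalue magnitude in each frequency class for $n = 16$. The function names are hypothetical, and placing $k = n/2$ in the high frequency class is an assumption about the boundary case.

```python
import math


def theta(k, l, n):
    """Eigenvalue of the Jacobi iteration matrix for the mode with indices (k, l)."""
    return 0.5 * (math.cos(k * math.pi / n) + math.cos(l * math.pi / n))


def classify(k, l, n):
    """Frequency class of the mode; k = n/2 is counted as high (assumption)."""
    if k < n / 2 and l < n / 2:
        return "low"
    if k >= n / 2 and l >= n / 2:
        return "high"
    return "mixed"


n = 16
worst = {"low": 0.0, "mixed": 0.0, "high": 0.0}
for k in range(1, n):
    for l in range(1, n):
        c = classify(k, l, n)
        worst[c] = max(worst[c], abs(theta(k, l, n)))
```

The entries of `worst` for the low and high classes are both $\cos(\pi/16)$, close to one, while the mixed class stays below one half.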

## 13.1.4 Weighted Jacobi iteration

As I show in the next section, it is useful to have iterative methods that smooth the error; that is, iterations that eliminate the high frequency modes most rapidly and the low frequency modes most slowly. Such an iteration can be designed using weighted Jacobi iteration. Given any iteration matrix $B$ and a scalar $\omega \in (0, 1)$, a new iteration can be defined as follows:

$$B_\omega = \omega B + (1 - \omega)I.$$

When $B$ is $B_J$, the Jacobi iteration matrix, this is called the weighted Jacobi iteration. The eigenvalues of $B_J^{(\omega)} = \omega B_J + (1 - \omega)I$ are simply

$$\omega\theta_{k\ell} + 1 - \omega$$

(see Exercise 3), and

$$-1 < \omega\theta_{k\ell} + 1 - \omega < 1$$

($-1 < \theta_{k\ell} < 1$, as was shown above). To damp the high frequencies, $\omega$ should be chosen so that the values $|\omega\theta_{k\ell} + 1 - \omega|$ are as small as possible for $n/2 \le k, \ell \le n - 1$. Exercise 4 asks the reader to show that the optimal value of $\omega$, from this point of view, is $\omega = 2/3$; with this value of $\omega$,

$$|\omega\theta_{k\ell} + 1 - \omega| \le \frac{1}{3}, \quad n/2 \le k, \ell \le n - 1.$$

The eigenvalues of $B_J^{(\omega)}$ are shown in Figure 13.6.

Figure 13.6. The eigenvalues of the weighted Jacobi iteration matrix $B_J^{(\omega)}$, graphed as a function of $k$ and $\ell$ (left) and as a function of a single index (right). The eigenvalues corresponding to low, mixed, and high frequencies are indicated by o, o, and x, respectively. (The dotted lines correspond to $\pm 1/3$.)

Examination of Figure 13.6 shows that, as expected, the eigenvalues corresponding to high frequency modes are all less than 1/3 in magnitude, while those corresponding to mixed frequency modes are all less than 2/3. Only the eigenvalues corresponding to low frequency modes are close to 1 in magnitude. It can be predicted, then, that the iteration matrix $B_J^{(\omega)}$ will eliminate the high frequency modes of the error quickly, leaving behind only the smooth (low frequency) modes. This is confirmed in the following example.

EXAMPLE 13.2. The model problem, with $n = 16$, is solved by weighted Jacobi iteration with $\omega = 2/3$. Once again the initial vector is chosen to contain all frequencies equally (see Figure 13.3). Ten steps of the weighted Jacobi iteration yield the vector $U^{(10)}$ shown in Figure 13.7 along with $W^TU^{(10)}$. The convergence is actually degraded a bit as compared to the ordinary Jacobi iteration, when only the norm of the error is considered:

Figure 13.7. The vector $U^{(10)}$ from Example 13.2 (left) and its frequency content (right).

However, Figure 13.7 shows the expected damping of high frequency modes. Only the lowest frequencies are discernible in $U^{(10)}$.
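The effect of the weight can be quantified from the eigenvalues alone. In the Python sketch below, the eigenvalue of the weighted Jacobi matrix for the mode with indices $(k, \ell)$ is taken to be $\omega\theta + 1 - \omega$ with $\theta = \frac{1}{2}(\cos(k\pi/n) + \cos(\ell\pi/n))$. The function names are hypothetical, and counting $k = n/2$ as a high frequency is an assumption about the boundary case.

```python
import math


def theta(k, l, n):
    """Eigenvalue of the (unweighted) Jacobi iteration matrix."""
    return 0.5 * (math.cos(k * math.pi / n) + math.cos(l * math.pi / n))


def weighted_eigenvalue(k, l, n, omega):
    """Eigenvalue of omega * B_J + (1 - omega) * I for the same mode."""
    return omega * theta(k, l, n) + 1.0 - omega


n, omega = 16, 2.0 / 3.0
high = [(k, l) for k in range(n // 2, n) for l in range(n // 2, n)]
low = [(k, l) for k in range(1, n // 2) for l in range(1, n // 2)]

# Worst-case damping factors per sweep, with and without the weight.
high_plain = max(abs(theta(k, l, n)) for k, l in high)
high_weighted = max(abs(weighted_eigenvalue(k, l, n, omega)) for k, l in high)
low_weighted = max(abs(weighted_eigenvalue(k, l, n, omega)) for k, l in low)
```

Without the weight the worst high frequency factor is close to one; with $\omega = 2/3$ it drops to 1/3, while the smoothest modes are still damped very slowly.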

Other stationary iterations can be seen to smooth the error, although it may not be so simple to analyze the eigenvalues and eigenvectors of the iteration matrix in other cases. For example, the Gauss-Seidel iteration matrix $B_{GS}$ is not symmetric and has complex eigenvalues and eigenvectors, so its effects cannot be as easily analyzed as above. Nevertheless, the Gauss-Seidel iteration has a pronounced smoothing effect on the error, as the following example suggests.

EXAMPLE 13.3. This time the model problem with $n = 16$ is solved using the Gauss-Seidel iteration. The initial vector $U^{(0)}$ is as before (see Figure 13.3). Ten steps of the Gauss-Seidel iteration yield the vector $U^{(10)}$ shown in Figure 13.8 along with $W^TU^{(10)}$. The convergence is much faster than was exhibited by either Jacobi or weighted Jacobi iteration:

Figure 13.8. The vector $U^{(10)}$ from Example 13.3 (left) and its frequency content (right).

Moreover, Figure 13.8 shows a very pronounced smoothing effect. Gauss-Seidel will be used as the basic iterative method while developing the multigrid method. The following example shows that the above ideas are not restricted to the model problem.

EXAMPLE 13.4. Consider the equation $KU = 0$ arising from the BVP

where $\kappa(x, y) = 1 + x^2 + 2y^2$ and $\Omega$ is the unit disk. A mesh is established on $\Omega$ as shown in Figure 13.9; the mesh is intentionally asymmetric. The initial vector $U^{(0)}$ is taken to be the piecewise linear interpolant of

Figure 13.9. The mesh for Example 13.4.

Thus $U^{(0)}$ contains many frequencies. Figure 13.10 shows $U^{(0)}$, $U^{(1)}$, $U^{(2)}$, $U^{(4)}$, obtained by applying the Gauss-Seidel method to $KU = 0$. The results show that the error is smoothed by the Gauss-Seidel iteration.

Figure 13.10. The vectors $U^{(0)}$, $U^{(1)}$, $U^{(2)}$, $U^{(4)}$ from Example 13.4.

## 13.2 The coarse grid correction algorithm

Here is a key idea underlying the multigrid method: Some Fourier modes that are classified as low frequency on a given mesh appear to be high frequency on a coarser mesh. This is because the frequency at which a Fourier mode oscillates is independent of the mesh, while the classification into low and high frequency modes depends on $n$, that is, on how fine or coarse the mesh is.

To explain this idea in detail, I consider a sequence of nested meshes, with one mesh obtained from the previous by refinement. The final (finest) mesh will be denoted $\mathcal{T}_h$, and the previous, coarser meshes by $\mathcal{T}_{2h}, \mathcal{T}_{4h}, \ldots$. For motivation, the model problem should be kept in mind, but the following development is valid for any BVP considered in this book. Other quantities associated with these meshes will be identified by the subscripts $h, 2h, 4h, \ldots$. For example, $K_h$ will denote the stiffness matrix for mesh $\mathcal{T}_h$, $K_{2h}$ the stiffness matrix for mesh $\mathcal{T}_{2h}$, and so forth. I will denote the exact solution of $K_hU = F_h$ by $U_h$, and similarly for the other meshes. The number of free nodes in $\mathcal{T}_h$ will be denoted by $N_h$, so that $K_h \in \mathbf{R}^{N_h \times N_h}$, $F_h \in \mathbf{R}^{N_h}$, and so forth.

## 13.2.1 Projecting the equation onto a coarser mesh

Applying $k$ steps of the Gauss-Seidel method to $K_hU = F_h$ yields an estimate $U_h^{(k)}$ of the solution $U_h$. Since the multigrid method switches between meshes, performing iterations on each, it becomes inconvenient to keep track of the iteration number $k$. Therefore, I will denote the current approximate solution on $\mathcal{T}_h$ by $V_h$ rather than by $U_h^{(k)}$. The error $E_h = U_h - V_h$ is expected to be relatively smooth, which has two important implications. First of all, a smooth (low frequency) function on $\mathcal{T}_h$ can be well-represented on $\mathcal{T}_{2h}$; this is not true of a function that contains significant high frequency content. This is illustrated in Figure 13.11. It follows that it should be possible to compute $E_h$ accurately (though not exactly) on $\mathcal{T}_{2h}$. Second, since $E_h$ is smooth, it is mostly made up of low frequency modes on $\mathcal{T}_h$. Referring for the moment to the model problem, this would mean that the significant components of $E_h$ are $W^{(k,\ell)}$, where $1 \le k < n/2$ and $1 \le \ell < n/2$. However, on mesh $\mathcal{T}_{2h}$, $k$ is a high frequency if $k \ge (n/2)/2 = n/4$, and similarly for $\ell$. This means that many of the frequencies that are hard to eliminate on $\mathcal{T}_h$ can be eliminated quickly on $\mathcal{T}_{2h}$.

Figure 13.11. Top: A smooth function on a fine mesh (left) and on a coarse mesh (right). Bottom: An oscillatory function on a fine mesh (left) and on a coarse mesh (right).

I must now explain how to move from $\mathcal{T}_h$ to $\mathcal{T}_{2h}$ when solving $K_hU = F_h$. It is not possible to transfer the equation itself from $\mathcal{T}_h$ to $\mathcal{T}_{2h}$, since the solution $U_h$ may not be well-represented on the coarser mesh ($U_h$ may contain significant high frequency modes). However, the error $E_h = U_h - V_h$ is known to be smooth, so it makes sense to look for an equation satisfied by $E_h$. This is easy to find:

$$K_hE_h = K_hU_h - K_hV_h = F_h - K_hV_h = R_h.$$

The residual $R_h = F_h - K_hV_h$ is the amount by which $V_h$ fails to satisfy the equation $K_hU = F_h$. While $E_h$ cannot be computed (if $V_h$ and $E_h$ were both known, then the exact solution $U_h = V_h + E_h$ would be known), $R_h$ is computable once the estimate $V_h$ has been computed. Therefore, $E_h$ satisfies the known equation

$$K_hE_h = R_h.$$

To understand the next step, it is important to recall that each vector $V_h \in \mathbf{R}^{N_h}$ corresponds to a piecewise linear function $v_h$ on $\mathcal{T}_h$ (that is, $v_h \in P_h^{(1)}$). Similarly, $V_{2h} \in \mathbf{R}^{N_{2h}}$ corresponds to a piecewise linear function $v_{2h}$ on $\mathcal{T}_{2h}$ ($v_{2h} \in P_{2h}^{(1)}$). Since $\mathcal{T}_h$ is obtained from $\mathcal{T}_{2h}$ by refinement, it follows that $P_{2h}^{(1)} \subset P_h^{(1)}$. That is, any function $v_{2h}$ that is piecewise linear on $\mathcal{T}_{2h}$ is also piecewise linear on $\mathcal{T}_h$. The function $v_{2h}$ can be represented by the vector $V_{2h}$ containing the nodal values of $v_{2h}$ on $\mathcal{T}_{2h}$. To obtain the corresponding vector $V_h$ containing the nodal values of $v_{2h}$ on $\mathcal{T}_h$, it is necessary to interpolate $V_{2h}$ on $\mathcal{T}_h$. There is an operator $I_{2h,h}$ that takes $V_{2h}$ and produces $V_h$ by interpolation. The operator $I_{2h,h}$ can be represented by a matrix, but all that is needed is to compute the action of $I_{2h,h}$ on an arbitrary vector $V_{2h}$: $V_h = I_{2h,h}V_{2h}$. This is easy because each node in $\mathcal{T}_h$ either belongs to $\mathcal{T}_{2h}$ or is the midpoint of an edge in $\mathcal{T}_{2h}$. If $\mathcal{N}_{2h}$ represents the (indices of) nodes of $\mathcal{T}_{2h}$ and $\mathcal{M}_h$ represents the (indices of) nodes in $\mathcal{T}_h$ that are not in $\mathcal{T}_{2h}$, then the following algorithm computes $V_h = I_{2h,h}V_{2h}$. This algorithm is slightly complicated by the fact that a free node in $\mathcal{T}_h$ might be the midpoint of an edge in $\mathcal{T}_{2h}$ with one free node and one constrained node as endpoints. Therefore, the algorithm actually computes $\bar{V}_h$, the vector of nodal values for all nodes, free and constrained. The components corresponding to free nodes are then extracted.

Copy the components of $V_{2h}$ to the appropriate components of $\bar{V}_h$ (those corresponding to nodes $i \in \mathcal{N}_{2h}$).

For $i \in \mathcal{M}_h$, set $(\bar{V}_h)_i = \frac{1}{2}\big((\bar{V}_h)_{\mathrm{end}(i,1)} + (\bar{V}_h)_{\mathrm{end}(i,2)}\big)$.

Extract the components of $\bar{V}_h$ corresponding to free nodes to obtain $V_h$.

Here $\mathrm{end}(i, 1)$, $\mathrm{end}(i, 2)$ represent the indices of the endpoints of the edge in $\mathcal{T}_{2h}$ of which $z_i \in \mathcal{T}_h$ is the midpoint.

The equation $K_hE_h = R_h$ is now to be solved for a vector corresponding to a piecewise linear function belonging to the subspace $P_{2h}^{(1)}$ of $P_h^{(1)}$. This means that the solution will be of the form $I_{2h,h}E_{2h}$, and the equation becomes

$$K_hI_{2h,h}E_{2h} = R_h. \tag{13.5}$$

294

## Chapter 13. The multigrid method

Equation (13.5) represents Nh equations in only N^h unknowns, and is therefore overdetermined and unlikely to have a solution. As I mentioned above, it should be possible to compute Eh accurately but not exactly on Tih- To recover a square system, the transpose of hh,h can be applied to both sides of (13.5):

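Since $I_{2h,h}$ maps $\mathbf{R}^{N_{2h}}$ into $\mathbf{R}^{N_h}$, the transpose $I_{2h,h}^T$ maps $\mathbf{R}^{N_h}$ into $\mathbf{R}^{N_{2h}}$. For this reason, $I_{2h,h}^T$ is sometimes denoted as $I_{h,2h}$. The operator $I_{2h,h}^T$ could also be represented by a matrix, but its action can be computed directly and efficiently without a matrix representation. The following algorithm computes $V_{2h} = I_{2h,h}^TV_h$. Once again, the intermediate vector $\tilde{V}_h$, containing all of the nodal values (free and constrained), is used.

- Copy the components of $V_h$ to the appropriate components of $\tilde{V}_h$ (those corresponding to the free nodes of $\mathcal{T}_h$). Initialize the other components of $\tilde{V}_h$ to zero.
- For $i \in \mathcal{M}_h$ (the nodes of $\mathcal{T}_h$ that are not nodes of $\mathcal{T}_{2h}$), add $\frac{1}{2}(\tilde{V}_h)_i$ to each of $(\tilde{V}_h)_{\mathrm{end}(i,1)}$ and $(\tilde{V}_h)_{\mathrm{end}(i,2)}$.
- Extract the components of $\tilde{V}_h$ corresponding to free nodes in $\mathcal{T}_{2h}$ to obtain $V_{2h}$.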
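The two transfer algorithms above can be sketched in a few lines. This is only an illustration under simplifying assumptions: every node is treated as free, and the index layout (`coarse_nodes`, `midpoints`) is a hypothetical data structure chosen for this sketch, not the one used by the MATLAB code accompanying the text.

```python
def interpolate(v_coarse, n_fine, coarse_nodes, midpoints):
    """Apply I_{2h,h}: nodal values on the coarse mesh -> values on the fine mesh.

    coarse_nodes[j] is the fine-mesh index of coarse node j (assumed layout).
    midpoints maps each new fine node i to the fine-mesh indices
    (end(i,1), end(i,2)) of the endpoints of the coarse edge it bisects.
    """
    v = [0.0] * n_fine
    for j, i in enumerate(coarse_nodes):      # copy values at shared nodes
        v[i] = v_coarse[j]
    for i, (e1, e2) in midpoints.items():     # average along each bisected edge
        v[i] = 0.5 * (v[e1] + v[e2])
    return v


def interpolate_transpose(v_fine, coarse_nodes, midpoints):
    """Apply the transpose of I_{2h,h}: fine-mesh vector -> coarse-mesh vector."""
    v = list(v_fine)
    for i, (e1, e2) in midpoints.items():     # send half of each midpoint value
        v[e1] += 0.5 * v[i]                   # to both endpoints of its edge
        v[e2] += 0.5 * v[i]
    return [v[i] for i in coarse_nodes]


# One-dimensional stand-in: coarse nodes sit at fine indices 0, 2, 4.
coarse_nodes = [0, 2, 4]
midpoints = {1: (0, 2), 3: (2, 4)}
fine = interpolate([1.0, 3.0, 5.0], 5, coarse_nodes, midpoints)
coarse = interpolate_transpose([1.0] * 5, coarse_nodes, midpoints)
```

A useful sanity check for any implementation is the adjoint identity $(I_{2h,h}V_{2h}) \cdot W_h = V_{2h} \cdot (I_{2h,h}^TW_h)$, which must hold for arbitrary vectors.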

## 13.2.2 The projected equation and the Galerkin idea

It is now necessary to solve, at least approximately, the equation

$$I_{2h,h}^TK_hI_{2h,h}E_{2h} = I_{2h,h}^TR_h.$$

This can be done using the Gauss-Seidel iteration. Before I discuss this, however, I want to prove that the matrix $I_{2h,h}^TK_hI_{2h,h}$ is just the stiffness matrix $K_{2h}$ for the mesh $\mathcal{T}_{2h}$.

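This can be shown by considering arbitrary functions $v_{2h}$ and $w_{2h}$ in $P_{2h}^{(1)}$ and the corresponding vectors $V_{2h}, W_{2h} \in \mathbf{R}^{N_{2h}}$ of nodal values. If $V_h$ and $W_h$ are defined by

$$V_h = I_{2h,h}V_{2h}, \quad W_h = I_{2h,h}W_{2h},$$

and $v_h$, $w_h$ are the corresponding functions in $P_h^{(1)}$, then $v_h = v_{2h}$ and $w_h = w_{2h}$. Therefore,

$$\left(I_{2h,h}^TK_hI_{2h,h}V_{2h}\right) \cdot W_{2h} = \left(K_hI_{2h,h}V_{2h}\right) \cdot \left(I_{2h,h}W_{2h}\right) = (K_hV_h) \cdot W_h = a(v_h, w_h),$$

where $a(\cdot,\cdot)$ is the usual energy inner product:

$$a(v, w) = \int_\Omega \kappa \nabla v \cdot \nabla w.$$

Since $v_h = v_{2h}$ and $w_h = w_{2h}$, this shows that

$$\left(I_{2h,h}^TK_hI_{2h,h}V_{2h}\right) \cdot W_{2h} = a(v_{2h}, w_{2h}).$$

But $K_{2h}$ is the unique matrix with the property that

$$(K_{2h}V_{2h}) \cdot W_{2h} = a(v_{2h}, w_{2h}) \quad \text{for all } V_{2h}, W_{2h} \in \mathbf{R}^{N_{2h}},$$

and thus

$$I_{2h,h}^TK_hI_{2h,h} = K_{2h},$$

as desired. It is also true that $F_{2h} = I_{2h,h}^TF_h$ (see Exercise 7).

Multiplying by $I_{2h,h}^T$ to produce

$$I_{2h,h}^TK_hI_{2h,h}E_{2h} = I_{2h,h}^TR_h$$

may seem rather arbitrary, but it is actually an instance of the Galerkin method. The system

$$K_hI_{2h,h}E_{2h} = R_h$$

is equivalent to the variational equation

$$\left(K_hI_{2h,h}E_{2h}\right) \cdot W_h = R_h \cdot W_h \quad \text{for all } W_h \in \mathbf{R}^{N_h}. \tag{13.7}$$

As I remarked above, this equation is overdetermined and cannot be solved in general. However, it can be projected onto the subspace $\mathbf{R}^{N_{2h}}$ by requiring that (13.7) hold only for $W_h$ of the form $W_h = I_{2h,h}W_{2h}$, $W_{2h} \in \mathbf{R}^{N_{2h}}$. The new variational equation is then

$$\left(K_hI_{2h,h}E_{2h}\right) \cdot \left(I_{2h,h}W_{2h}\right) = R_h \cdot \left(I_{2h,h}W_{2h}\right) \quad \text{for all } W_{2h} \in \mathbf{R}^{N_{2h}},$$

which is equivalent to (13.6).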
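The identity $I_{2h,h}^TK_hI_{2h,h} = K_{2h}$ is easy to check numerically in a setting simpler than the one in the text. The following sketch assumes a one-dimensional analogue ($-u'' = f$ on $(0,1)$ with homogeneous Dirichlet conditions and a uniform mesh), for which the stiffness matrix is $(1/h)\,\mathrm{tridiag}(-1, 2, -1)$; all function names are made up for the illustration.

```python
def stiffness_1d(n, h):
    """Stiffness matrix for -u'' = f, linear elements, n interior nodes."""
    return [[(2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0) / h
             for j in range(n)] for i in range(n)]


def interpolation_matrix(n_coarse):
    """Matrix of linear interpolation from n_coarse to 2*n_coarse + 1 free nodes."""
    n_fine = 2 * n_coarse + 1
    mat = [[0.0] * n_coarse for _ in range(n_fine)]
    for c in range(n_coarse):
        r = 2 * c + 1              # fine index of coarse node c
        mat[r][c] = 1.0
        mat[r - 1][c] = 0.5        # midpoints on either side
        mat[r + 1][c] = 0.5
    return mat


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


n_coarse, h = 3, 1.0 / 8.0
P = interpolation_matrix(n_coarse)
K_h = stiffness_1d(2 * n_coarse + 1, h)
K_galerkin = matmul(transpose(P), matmul(K_h, P))   # I^T K_h I
K_2h = stiffness_1d(n_coarse, 2.0 * h)              # assembled directly on 2h
```

In this example the two matrices agree entry by entry, so the coarse-grid equation can be assembled directly on the coarse mesh rather than formed as a triple product.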

## 13.2.3 The two-grid multigrid algorithm

The basic two-grid multigrid algorithm can be summarized as follows:

1. Do $n_1$ Gauss-Seidel iterations on $K_hU = F_h$ to get an estimate $V_h$ of the solution $U_h$.
2. Compute the residual $R_h = F_h - K_hV_h$ and project it onto $\mathcal{T}_{2h}$ to get $I_{2h,h}^TR_h$.
3. Do $n_1 + n_2$ Gauss-Seidel iterations on $K_{2h}E = I_{2h,h}^TR_h$ to get an estimate $D_{2h}$ of $E_{2h}$.
4. Interpolate $D_{2h}$ onto $\mathcal{T}_h$ and update the approximate solution on the fine mesh: $V_h \leftarrow V_h + I_{2h,h}D_{2h}$.
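The four steps above can be sketched for a one-dimensional analogue of the model problem. The sketch assumes the stiffness matrix $(1/h)\,\mathrm{tridiag}(-1, 2, -1)$ on a uniform mesh with $2m+1$ interior fine nodes and $m$ interior coarse nodes; the helper names are hypothetical.

```python
def gauss_seidel(u, f, h, sweeps):
    """In-place Gauss-Seidel sweeps for (1/h) * tridiag(-1, 2, -1) u = f."""
    n = len(u)
    for _ in range(sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            u[i] = (h * f[i] + left + right) / 2.0
    return u


def residual(u, f, h):
    n = len(u)
    return [f[i] - (2.0 * u[i]
                    - (u[i - 1] if i > 0 else 0.0)
                    - (u[i + 1] if i < n - 1 else 0.0)) / h
            for i in range(n)]


def restrict(r):
    """Transpose of interpolation: 2m+1 fine entries -> m coarse entries."""
    return [0.5 * r[2 * c] + r[2 * c + 1] + 0.5 * r[2 * c + 2]
            for c in range((len(r) - 1) // 2)]


def interpolate(e):
    """Linear interpolation: m coarse entries -> 2m+1 fine entries."""
    m = len(e)
    padded = [0.0] + list(e) + [0.0]           # zero Dirichlet boundary values
    fine = [0.0] * (2 * m + 1)
    for c in range(m + 1):
        fine[2 * c] = 0.5 * (padded[c] + padded[c + 1])
    for c in range(m):
        fine[2 * c + 1] = e[c]
    return fine


def two_grid(u, f, h, n1, n2):
    gauss_seidel(u, f, h, n1)                                  # step 1
    r2 = restrict(residual(u, f, h))                           # step 2
    d2 = gauss_seidel([0.0] * len(r2), r2, 2.0 * h, n1 + n2)   # step 3
    return [ui + ci for ui, ci in zip(u, interpolate(d2))]     # step 4


n, h = 15, 1.0 / 16.0
f = [h] * n                      # load vector for f(x) = 1
u = [0.0] * n
for _ in range(40):
    u = two_grid(u, f, h, n1=2, n2=1)
```

The coarse solve here uses the directly assembled coarse matrix, which in this setting coincides with $I_{2h,h}^TK_hI_{2h,h}$.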


It is easy to verify empirically that the corrected $V_h$, while more accurate than the initial $V_h$, tends to have relatively more high frequency errors. This is because, during step 3, the values of $E_h$ are estimated more accurately at the nodes of $\mathcal{T}_{2h}$ (where, after all, the computation of $D_{2h}$ is performed) than at the nodes of $\mathcal{T}_h$ that do not belong to $\mathcal{T}_{2h}$. Therefore, it is usual to add a fifth step to the above algorithm:

5. Do $n_2$ Gauss-Seidel iterations on $K_hU = F_h$, beginning with the value of $V_h$ from step 4. Replace $V_h$ with this improved estimate of $U_h$.

Step 1 is called presmoothing, steps 2 to 4 coarse-grid correction, and step 5 postsmoothing.

In a two-dimensional problem, computations on $\mathcal{T}_{2h}$ cost about one fourth of the analogous computations on $\mathcal{T}_h$. Applying the intermesh transfer operators $I_{2h,h}$ and $I_{2h,h}^T$ during coarse-grid correction costs significantly less than one Gauss-Seidel iteration on $\mathcal{T}_h$, and therefore will be ignored. Computing the residual $R_h = F_h - K_hV_h$ costs about the same as one Gauss-Seidel iteration. Therefore, the total cost of the two-grid algorithm is no more than $(5/4)(n_1 + n_2 + 1)$ Gauss-Seidel iterations on $K_hU = F_h$. It makes sense, then, to compare the performance of the two-grid algorithm and Gauss-Seidel for the equivalent amount of work.

EXAMPLE 13.5. The finite element method with linear Lagrange triangles is applied to the BVP

$$-\nabla \cdot (\kappa \nabla u) = f \ \text{in } \Omega, \quad u = 0 \ \text{on } \partial\Omega,$$

where $\kappa(x, y) = 1 + x^2 + 2y^2$, $\Omega$ is the unit square, and $f$ is chosen so that the exact solution is $u(x, y) = x(1 - x)\sin(\pi y)$. The coarse and fine meshes are shown in Figure 13.12. Table 13.1 shows the error in the computed solution for the two-grid algorithm described above and for the equivalent number of Gauss-Seidel iterations. The parameters $n_1 = 2$ and $n_2 = 1$ were used. The results show that the two-grid algorithm is significantly more efficient than the Gauss-Seidel iteration.

As in the preceding example, the cost of various multigrid algorithms can be compared to the cost of Gauss-Seidel iterations on the finest mesh. In this context, the computational cost of one Gauss-Seidel iteration on the finest mesh is called one work unit.

## 13.3 The multigrid V-cycle

The previous example shows that the two-grid algorithm is an improvement over Gauss-Seidel; however, the convergence of the two-grid method is still not particularly good. The real power of the multigrid idea lies in applying coarse-grid correction recursively. When solving the equation $K_{2h}E = I_{2h,h}^TR_h$ on $\mathcal{T}_{2h}$, a coarse-grid correction can be applied using $\mathcal{T}_{4h}$. Similarly, when solving the equation on $\mathcal{T}_{4h}$, a coarse-grid correction using $\mathcal{T}_{8h}$ can be applied. The resulting method is called the multigrid V-cycle. The name arises from the schematic shown in Figure 13.13, which shows the method beginning on the finest mesh (at the top), descending to the coarsest mesh (at the bottom), and then returning to the finest mesh. On the way down, presmoothing iterations are performed; on the way up, postsmoothing is done.


Table 13.1. Comparison of the two-grid algorithm with Gauss-Seidel. The number of Gauss-Seidel iterations is chosen so that the two methods will use roughly the same amount of computational work. The ratio in the fifth column is the error in the solution produced by Gauss-Seidel divided by the error in the solution produced by the two-grid algorithm.

Figure 13.12. The coarse and fine meshes for Example 13.5.

The multigrid V-cycle algorithm is defined in Algorithm 13.1. This algorithm can be iterated as many times as needed to reduce the residual $R_h$ to an acceptable level. The first time the V-cycle is invoked, the initial estimate $V_h = 0$ is used. Since the parameters $n_1$, $n_2$ (the number of presmoothing and postsmoothing iterations, respectively) determine the cost and rate of convergence of the V-cycle, the V-cycle with these parameters is often called the $V(n_1, n_2)$-cycle.

As the following example shows, the V-cycle is much more effective than the two-grid algorithm. However, it is not much more costly. Since a Gauss-Seidel iteration on $\mathcal{T}_{2^ih}$ costs only $1/4^i$ of an iteration on $\mathcal{T}_h$, the total cost of the V-cycle is at most

$$(n_1 + n_2 + 1)\sum_{i=0}^{\infty} \frac{1}{4^i} = \frac{4}{3}(n_1 + n_2 + 1)$$

work units.


Figure 13.13. The multigrid V-cycle.

This should be compared with an upper bound of $(5/4)(n_1 + n_2 + 1)$ for the two-grid algorithm; the cost of the V-cycle is only about 7% more (the ratio of the two bounds is $(4/3)/(5/4) = 16/15$).

Given an estimate $V_h$ of $U_h$:

- Perform $n_1$ iterations of Gauss-Seidel on $K_hU = F_h$ to improve the estimate $V_h$ of $U_h$
- Compute the residual $R_h = F_h - K_hV_h$
- for $i = 1, 2, \ldots, k$
  - Project $R_{2^{i-1}h}$ onto $\mathcal{T}_{2^ih}$ to get $F_{2^ih} = I_{2^ih,2^{i-1}h}^TR_{2^{i-1}h}$
  - Starting with the zero vector, perform $n_1$ iterations of Gauss-Seidel on $K_{2^ih}E = F_{2^ih}$ to get an estimate $D_{2^ih}$ of $E_{2^ih}$
  - If $i < k$, compute the residual $R_{2^ih} = F_{2^ih} - K_{2^ih}D_{2^ih}$
- Perform $n_2$ iterations of Gauss-Seidel on $K_{2^kh}E = F_{2^kh}$ to improve the estimate $D_{2^kh}$ of $E_{2^kh}$
- for $i = k-1, k-2, \ldots, 1$
  - Interpolate $D_{2^{i+1}h}$ onto $\mathcal{T}_{2^ih}$ to get $I_{2^{i+1}h,2^ih}D_{2^{i+1}h}$
  - Correct $D_{2^ih}$: $D_{2^ih} \leftarrow D_{2^ih} + I_{2^{i+1}h,2^ih}D_{2^{i+1}h}$
  - Perform $n_2$ iterations of Gauss-Seidel on $K_{2^ih}E = F_{2^ih}$ to improve the estimate $D_{2^ih}$ of $E_{2^ih}$
- Interpolate $D_{2h}$ onto $\mathcal{T}_h$ to get $I_{2h,h}D_{2h}$
- Correct $V_h$: $V_h \leftarrow V_h + I_{2h,h}D_{2h}$
- Perform $n_2$ iterations of Gauss-Seidel on $K_hU = F_h$ to improve the estimate $V_h$ of $U_h$

Algorithm 13.1. The multigrid V-cycle for solving $K_hU = F_h$ on the mesh $\mathcal{T}_h$. The nested family of meshes is assumed to be $\mathcal{T}_{2^kh}, \ldots, \mathcal{T}_{2h}, \mathcal{T}_h$.


| V-cycle its | Rel. error | Gauss-Seidel its | Rel. error | Ratio |
|---|---|---|---|---|
| 1 | $8.802 \cdot 10^{-2}$ | 6 | $7.906 \cdot 10^{-1}$ | $8.98 \cdot 10^{0}$ |
| 2 | $8.119 \cdot 10^{-3}$ | 11 | $6.452 \cdot 10^{-1}$ | $7.95 \cdot 10^{1}$ |
| 3 | $7.847 \cdot 10^{-4}$ | 16 | $5.262 \cdot 10^{-1}$ | $6.71 \cdot 10^{2}$ |
| 4 | $7.767 \cdot 10^{-5}$ | 22 | $4.117 \cdot 10^{-1}$ | $5.30 \cdot 10^{3}$ |
| 5 | $7.755 \cdot 10^{-6}$ | 27 | $3.354 \cdot 10^{-1}$ | $4.32 \cdot 10^{4}$ |
| 6 | $7.741 \cdot 10^{-7}$ | 32 | $2.731 \cdot 10^{-1}$ | $3.53 \cdot 10^{5}$ |
| 7 | $7.701 \cdot 10^{-8}$ | 38 | $2.134 \cdot 10^{-1}$ | $2.77 \cdot 10^{6}$ |
| 8 | $7.654 \cdot 10^{-9}$ | 43 | $1.737 \cdot 10^{-1}$ | $2.27 \cdot 10^{7}$ |
| 9 | $7.666 \cdot 10^{-10}$ | 48 | $1.413 \cdot 10^{-1}$ | $1.84 \cdot 10^{8}$ |
| 10 | $7.839 \cdot 10^{-11}$ | 54 | $1.103 \cdot 10^{-1}$ | $1.41 \cdot 10^{9}$ |

Table 13.2. Comparison of the multigrid $V(2, 1)$-cycle with Gauss-Seidel. The number of Gauss-Seidel iterations is chosen so that the two methods will use roughly the same amount of computational work. The ratio in the fifth column is the error in the solution produced by Gauss-Seidel divided by the error in the solution produced by the V-cycle.

EXAMPLE 13.6. The $V(2, 1)$-cycle is applied to the system $K_hU = F_h$ from Example 13.5, using a sequence of four meshes, with 8, 32, 128, and 512 triangles. Table 13.2 shows the error in the computed solution for the V-cycle algorithm described above and for the equivalent number of Gauss-Seidel iterations.

The multigrid V-cycle is conveniently expressed and implemented as a recursive algorithm. In this context, it is more natural to write the relevant system on each level as $K_{2^ih}U_{2^ih} = F_{2^ih}$, even though $F_{2^ih}$ is not the load vector except on the finest mesh. Similarly, $V_{2^ih}$ is not an estimate of the desired solution except on the finest mesh; on the coarser meshes, it represents an estimate of the error on the next finer mesh. With this understanding, the multigrid V-cycle can be expressed as a recursive algorithm $V_h \leftarrow \mathrm{mgv}(K_h, F_h, V_h)$ for replacing an estimate $V_h$ of the solution of $K_hU_h = F_h$ with a better estimate. The recursion is expressed succinctly as Algorithm 13.2.

Given an estimate $V_h$ of the solution of $K_hU_h = F_h$:

- Perform $n_1$ iterations of Gauss-Seidel to improve the estimate $V_h$
- If this is not the coarsest mesh:
  - Compute $F_{2h} = I_{2h,h}^T(F_h - K_hV_h)$ and set $V_{2h} = 0$
  - Call mgv recursively: $V_{2h} \leftarrow \mathrm{mgv}(K_{2h}, F_{2h}, V_{2h})$
  - Correct $V_h$: $V_h \leftarrow V_h + I_{2h,h}V_{2h}$
- Perform $n_2$ iterations of Gauss-Seidel to improve the estimate $V_h$

Algorithm 13.2. The recursive form of the multigrid V-cycle for solving $K_hU = F_h$ on the mesh $\mathcal{T}_h$.
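The recursion in Algorithm 13.2 translates almost line for line into code. The sketch below assumes a one-dimensional stand-in for the model problem (stiffness matrix $(1/h)\,\mathrm{tridiag}(-1, 2, -1)$, with $2^k - 1$ interior nodes so that the coarsest mesh has a single unknown); the helper names are hypothetical and the matrices are applied implicitly rather than passed as arguments.

```python
def gauss_seidel(u, f, h, sweeps):
    """In-place Gauss-Seidel sweeps for (1/h) * tridiag(-1, 2, -1) u = f."""
    n = len(u)
    for _ in range(sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            u[i] = (h * f[i] + left + right) / 2.0
    return u


def residual(u, f, h):
    n = len(u)
    return [f[i] - (2.0 * u[i]
                    - (u[i - 1] if i > 0 else 0.0)
                    - (u[i + 1] if i < n - 1 else 0.0)) / h
            for i in range(n)]


def restrict(r):
    """Transpose of interpolation: 2m+1 fine entries -> m coarse entries."""
    return [0.5 * r[2 * c] + r[2 * c + 1] + 0.5 * r[2 * c + 2]
            for c in range((len(r) - 1) // 2)]


def interpolate(e):
    """Linear interpolation: m coarse entries -> 2m+1 fine entries."""
    m = len(e)
    padded = [0.0] + list(e) + [0.0]
    fine = [0.0] * (2 * m + 1)
    for c in range(m + 1):
        fine[2 * c] = 0.5 * (padded[c] + padded[c + 1])
    for c in range(m):
        fine[2 * c + 1] = e[c]
    return fine


def mgv(u, f, h, n1=2, n2=1):
    """One V(n1, n2)-cycle; returns the improved estimate."""
    gauss_seidel(u, f, h, n1)                        # presmoothing
    if len(u) > 1:                                   # not the coarsest mesh
        f2 = restrict(residual(u, f, h))
        u2 = mgv([0.0] * len(f2), f2, 2.0 * h, n1, n2)
        u = [ui + ci for ui, ci in zip(u, interpolate(u2))]
    gauss_seidel(u, f, h, n2)                        # postsmoothing
    return u


n, h = 31, 1.0 / 32.0
f = [h] * n                      # load vector for f(x) = 1
u = [0.0] * n
for _ in range(12):
    u = mgv(u, f, h)
```

For this particular test problem the exact nodal values are $x_i(1 - x_i)/2$, which gives a convenient reference for checking the iteration.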


As Example 13.6 shows, a single V-cycle may not be enough to solve the equation accurately. An alternative to simply performing multiple V-cycles is to do multiple cycles on the coarse-grid equations. This makes the coarse-grid correction more accurate without incurring additional (relatively expensive) computations on the finest mesh. When the coarse-grid correction cycle is performed $\mu$ times, the result is called a $\mu$-cycle. The recursive algorithm $V_h \leftarrow \mathrm{mg}\mu(K_h, F_h, V_h)$ is presented in Algorithm 13.3.

Given an estimate $V_h$ of the solution of $K_hU_h = F_h$:

- Perform $n_1$ iterations of Gauss-Seidel to improve the estimate $V_h$
- If this is not the coarsest mesh:
  - Compute $F_{2h} = I_{2h,h}^T(F_h - K_hV_h)$ and set $V_{2h} = 0$
  - Call $\mathrm{mg}\mu$ recursively $\mu$ times: for $i = 1, 2, \ldots, \mu$, $V_{2h} \leftarrow \mathrm{mg}\mu(K_{2h}, F_{2h}, V_{2h})$
  - Correct $V_h$: $V_h \leftarrow V_h + I_{2h,h}V_{2h}$
- Perform $n_2$ iterations of Gauss-Seidel to improve the estimate $V_h$

Algorithm 13.3. The recursive form of the multigrid $\mu$-cycle for solving $K_hU = F_h$ on the mesh $\mathcal{T}_h$.

When $\mu = 1$, the $\mu$-cycle is just the V-cycle, while $\mu = 2$ results in the W-cycle, which is illustrated (for four meshes) in Figure 13.14. The name $W(n_1, n_2)$-cycle is used to specify the number of Gauss-Seidel iterations at each step. Exercise 5 asks the reader to show that the cost of the $W(n_1, n_2)$-cycle is bounded by $2(n_1 + n_2 + 1)$ work units. Therefore, for example, a $W(1, 1)$-cycle costs roughly the same as a $V(2, 1)$-cycle. Increasing $\mu$ beyond $\mu = 2$ produces little additional accuracy for the additional computational cost; therefore, in practice, only the V-cycle and the W-cycle are used.

EXAMPLE 13.7. This example applies the $W(1, 1)$-cycle to the system $K_hU = F_h$ from Example 13.5. Table 13.3 shows the error in the computed solution for the W-cycle algorithm described above and for the equivalent number of Gauss-Seidel iterations. The results should also be compared with Table 13.2. For this example, the $W(1, 1)$-cycle is somewhat more efficient than the $V(2, 1)$-cycle.
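The cost bounds quoted for the V-cycle and the W-cycle can be checked by counting work units recursively. The sketch below uses a cost model that is an assumption made for the illustration: on the mesh $\mathcal{T}_{2^ih}$ one Gauss-Seidel iteration or residual computation costs $1/4^i$ work units, transfer operators are free, and the coarsest mesh performs only the $n_1 + n_2$ smoothing iterations.

```python
from fractions import Fraction


def cycle_cost(levels, mu, n1, n2, level=0):
    """Work units for one mu-cycle on a hierarchy with the given number of meshes."""
    scale = Fraction(1, 4 ** level)           # relative cost of this mesh
    if level == levels - 1:                   # coarsest mesh: smoothing only
        return (n1 + n2) * scale
    smoothing_and_residual = (n1 + n2 + 1) * scale
    return smoothing_and_residual + mu * cycle_cost(levels, mu, n1, n2, level + 1)


v_cost = cycle_cost(levels=4, mu=1, n1=2, n2=1)   # V(2,1)-cycle on four meshes
w_cost = cycle_cost(levels=4, mu=2, n1=1, n2=1)   # W(1,1)-cycle on four meshes
```

Under this model the two cycles cost about 5.3 and 5.5 work units on four meshes, below the bounds $(4/3)(n_1 + n_2 + 1)$ and $2(n_1 + n_2 + 1)$ respectively.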

Figure 13.14. The multigrid W-cycle.

| W-cycle its | Rel. error | Gauss-Seidel its | Rel. error | Ratio |
|---|---|---|---|---|
| 1 | $2.350 \cdot 10^{-2}$ | 6 | $7.906 \cdot 10^{-1}$ | $3.36 \cdot 10^{1}$ |
| 2 | $5.464 \cdot 10^{-4}$ | 12 | $6.194 \cdot 10^{-1}$ | $1.13 \cdot 10^{3}$ |
| 3 | $4.077 \cdot 10^{-5}$ | 18 | $4.849 \cdot 10^{-1}$ | $1.19 \cdot 10^{4}$ |
| 4 | $2.397 \cdot 10^{-6}$ | 24 | $3.793 \cdot 10^{-1}$ | $1.58 \cdot 10^{5}$ |
| 5 | $2.131 \cdot 10^{-7}$ | 30 | $2.965 \cdot 10^{-1}$ | $1.39 \cdot 10^{6}$ |
| 6 | $2.213 \cdot 10^{-8}$ | 36 | $2.317 \cdot 10^{-1}$ | $1.05 \cdot 10^{7}$ |
| 7 | $2.348 \cdot 10^{-9}$ | 42 | $1.810 \cdot 10^{-1}$ | $7.71 \cdot 10^{7}$ |
| 8 | $2.583 \cdot 10^{-10}$ | 48 | $1.413 \cdot 10^{-1}$ | $5.47 \cdot 10^{8}$ |
| 9 | $2.869 \cdot 10^{-11}$ | 54 | $1.103 \cdot 10^{-1}$ | $3.85 \cdot 10^{9}$ |
| 10 | $3.187 \cdot 10^{-12}$ | 60 | $8.614 \cdot 10^{-2}$ | $2.70 \cdot 10^{10}$ |

Table 13.3. Comparison of the multigrid W-cycle with Gauss-Seidel. The number of Gauss-Seidel iterations is chosen so that the two methods will use roughly the same amount of computational work. The ratio in the fifth column is the error in the solution produced by Gauss-Seidel divided by the error in the solution produced by the W-cycle.

## 13.4 Full multigrid

Multigrid V-cycles and W-cycles perform well, but there is one way in which they might be improved: The initial estimate of $U_h$ (on the finest mesh) is taken to be $V_h = 0$, and it is natural to try to do better. A simple idea is to estimate the solution to $K_{2h}U = F_{2h}$ using the mesh $\mathcal{T}_{2h}$, and then interpolate this solution onto $\mathcal{T}_h$ to use as the initial $V_h$. However, there is no reason to start on $\mathcal{T}_{2h}$; an initial estimate of the solution to $K_{2h}U = F_{2h}$ can be obtained by estimating the solution to $K_{4h}U = F_{4h}$ on $\mathcal{T}_{4h}$, and so forth. The full multigrid algorithm begins by solving the equation $K_{2^kh}U = F_{2^kh}$ on the coarsest mesh $\mathcal{T}_{2^kh}$ and then interpolating to obtain an initial estimate of the solution $U_{2^{k-1}h}$ of $K_{2^{k-1}h}U = F_{2^{k-1}h}$, which is then improved by $n$ $\mu$-cycles. This estimate of $U_{2^{k-1}h}$ is then interpolated onto $\mathcal{T}_{2^{k-2}h}$ and used as an initial estimate of $U_{2^{k-2}h}$. Again, $n$ $\mu$-cycles are used
to improve the estimate of $U_{2^{k-2}h}$, and the process continues. The full multigrid algorithm is easily expressed, in Algorithm 13.4, in terms of the $\mu$-cycle. Figure 13.15 shows the order in which the meshes are visited for $\mu = 1$ (the V-cycle version) and $n = 1$. The full multigrid algorithm can also be expressed recursively, as in Algorithm 13.5, in the form $V_h \leftarrow \mathrm{fmg}(K_h, F_h)$.

Algorithm 13.4. The full multigrid algorithm for solving $K_hU = F_h$ on the mesh $\mathcal{T}_h$. The nested family of meshes is assumed to be $\mathcal{T}_{2^kh}, \ldots, \mathcal{T}_{2h}, \mathcal{T}_h$.

Figure 13.15. The full multigrid algorithm.

The reader will note that the full multigrid algorithm is determined by the parameters $n_1$ (presmoothing iterations), $n_2$ (postsmoothing iterations), $\mu$ ($\mu = 1$ for V-cycle, $\mu = 2$ for W-cycle), and $n$ (the number of $\mu$-cycles at each level). In the full multigrid algorithm, $n$ $\mu$-cycles are performed at each of the meshes $\mathcal{T}_{2^{k-1}h}, \mathcal{T}_{2^{k-2}h}, \ldots, \mathcal{T}_h$. The cost of a $\mu$-cycle on $\mathcal{T}_{2^ih}$ is about $1/4^i$ times that on $\mathcal{T}_h$. Therefore, reasoning as before, the total cost of the full multigrid is bounded by $4/3$ times the cost of $n$ $\mu$-cycles, or a total of $(16/9)n(n_1 + n_2 + 1)$ work units for the V-cycle version and $(8/3)n(n_1 + n_2 + 1)$ work units for the W-cycle version.


Algorithm 13.5. Recursive version of the full multigrid algorithm for solving $K_hU = F_h$ on the mesh $\mathcal{T}_h$. The nested family of meshes is assumed to be $\mathcal{T}_{2^kh}, \ldots, \mathcal{T}_{2h}, \mathcal{T}_h$.
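The recursive form of full multigrid described in the prose can be sketched as follows. This is one reading of that description, not a transcription of Algorithm 13.5: it assumes a one-dimensional stand-in for the model problem (stiffness matrix $(1/h)\,\mathrm{tridiag}(-1, 2, -1)$ with $2^k - 1$ interior nodes), obtains each coarse load vector as $F_{2h} = I_{2h,h}^TF_h$, and uses V-cycles ($\mu = 1$). All helper names are hypothetical.

```python
def gauss_seidel(u, f, h, sweeps):
    """In-place Gauss-Seidel sweeps for (1/h) * tridiag(-1, 2, -1) u = f."""
    n = len(u)
    for _ in range(sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            u[i] = (h * f[i] + left + right) / 2.0
    return u


def residual(u, f, h):
    n = len(u)
    return [f[i] - (2.0 * u[i]
                    - (u[i - 1] if i > 0 else 0.0)
                    - (u[i + 1] if i < n - 1 else 0.0)) / h
            for i in range(n)]


def restrict(r):
    """Transpose of interpolation: 2m+1 fine entries -> m coarse entries."""
    return [0.5 * r[2 * c] + r[2 * c + 1] + 0.5 * r[2 * c + 2]
            for c in range((len(r) - 1) // 2)]


def interpolate(e):
    """Linear interpolation: m coarse entries -> 2m+1 fine entries."""
    m = len(e)
    padded = [0.0] + list(e) + [0.0]
    fine = [0.0] * (2 * m + 1)
    for c in range(m + 1):
        fine[2 * c] = 0.5 * (padded[c] + padded[c + 1])
    for c in range(m):
        fine[2 * c + 1] = e[c]
    return fine


def mgv(u, f, h, n1, n2):
    """One V(n1, n2)-cycle starting from the estimate u."""
    gauss_seidel(u, f, h, n1)
    if len(u) > 1:
        f2 = restrict(residual(u, f, h))
        u2 = mgv([0.0] * len(f2), f2, 2.0 * h, n1, n2)
        u = [ui + ci for ui, ci in zip(u, interpolate(u2))]
    gauss_seidel(u, f, h, n2)
    return u


def fmg(f, h, n=1, n1=2, n2=1):
    """Full multigrid: coarse solution -> interpolate -> n V-cycles per level."""
    if len(f) == 1:                          # coarsest mesh: solve exactly
        return [h * f[0] / 2.0]
    u = interpolate(fmg(restrict(f), 2.0 * h, n, n1, n2))
    for _ in range(n):
        u = mgv(u, f, h, n1, n2)
    return u


h = 1.0 / 64.0
f = [h] * 63                                 # load vector for f(x) = 1
u = fmg(f, h)
```

Note that, unlike the V-cycle, the routine takes no initial estimate: the starting value on each mesh comes from the next coarser one.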

## 13.4.1 Discretization, algebraic, and total errors

To evaluate the performance of full multigrid and, in particular, to compare it with the $\mu$-cycle, it is helpful to consider the errors involved in solving $K_hU = F_h$ and to think about the goal of an iterative algorithm. When a BVP is discretized by the finite element method and the system $K_hU = F_h$ is solved to get an estimate of $U_h$, there are actually three "solutions" involved. One is the exact (continuous) solution $u$ of the BVP. The second is $U_h$, the exact solution of the discretized problem $K_hU = F_h$. Finally, there is the computed solution $V_h$. The vectors $U_h$ and $V_h$ correspond to piecewise polynomials $u_h$ and $v_h$, respectively. In solving the system $K_hU = F_h$, one is nominally trying to estimate $U_h$ (and hence $u_h$). However, the real goal is that $v_h$ approximate $u$. The error in $v_h$ can be bounded by the triangle inequality:

$$\|u - v_h\|_E \le \|u - u_h\|_E + \|u_h - v_h\|_E.$$

The error $u - u_h$ (or its norm) is referred to as the discretization error, while $u_h - v_h$ is the algebraic error and $u - v_h$ is the total error in $v_h$. There is no point in expending computational effort reducing the algebraic error much below the discretization error, since doing so reduces the total error in $v_h$ marginally at best. Therefore, to compare the $\mu$-cycle with full multigrid, it is necessary to determine which algorithm reduces the error in $v_h$ to the level of discretization more efficiently. Since the $\mu$-cycle is determined by three parameters ($\mu$, $n_1$, $n_2$) and full multigrid by four ($\mu$, $n$, $n_1$, $n_2$), a precise comparison is complicated. The following example will give the reader an idea of how the various possibilities compare on a specific problem.

EXAMPLE 13.8. This example considers the BVP from Example 13.5 on a sequence of seven meshes. The number of free nodes ranges from 1 on the coarsest mesh to 16 129 on the finest. Five different multigrid algorithms were applied: $V(2, 1)$-cycle, $W(1, 1)$-cycle, and three versions of the full multigrid algorithm, $(\mu, n, n_1, n_2) = (1, 1, 2, 1)$, $(\mu, n, n_1, n_2) = (2, 1, 1, 1)$, $(\mu, n, n_1, n_2) = (1, 2, 1, 1)$. To compare the effectiveness of the algorithms, the ratio of algebraic error to discretization error was recorded. (Since the exact solution is known for this problem, the discretization error can be computed.) The results are reported
in Table 13.4.

[Table 13.4: column groups for the V-cycle, the W-cycle, and full multigrid with parameters $(\mu, n, n_1, n_2)$; each group reports iterations (Its), work units (WU), and the ratio $\|u_h - v_h\|_E / \|u - u_h\|_E$. Surviving entries: Its 1, 2, 3; WU 6, 12, 18; ratio 0.542, 0.028, 0.002. Surviving caption text: "... multigrid algorithms on a model problem".]

This example suggests that the full multigrid algorithm is more efficient than either the V-cycle or the W-cycle alone. Roughly speaking, solving the system $K_hU = F_h$ to the level of discretization required 16 work units using the $V(2, 1)$-cycle, 12 work units using the $W(1, 1)$-cycle, and 1 work units using a full multigrid scheme. The reader should notice how remarkable these results are. One work unit equates to one Gauss-Seidel iteration on the finest mesh. Example VIA from the previous chapter would suggest that 10 iterations of Gauss-Seidel would not begin to solve the system satisfactorily; indeed, 1000 iterations of Gauss-Seidel for this problem produced

It would take many thousands of Gauss-Seidel iterations to solve the system to the level of discretization. The best nonmultigrid algorithm presented in this text is the hierarchical basis CG (HCG) method, an iteration of which is somewhat more costly than a Gauss-Seidel iteration. Ten iterations of HCG produced

Even the powerful HCG algorithm does not approach the level of discretization for this amount of computational cost.

## 13.5 The MATLAB implementation

The MATLAB code for the algorithms from this chapter assumes that the inputs are the quantities from the finest mesh (and, in particular, the finest mesh itself). It is therefore necessary to extract a coarser mesh from its refinement; this is inexpensive and coded in unRefine1. The other supporting functions are Interpolate1, the intermesh transfer operator $I_{2h,h}$, and its transpose, InterpolateTrans1.

## 13.5.1 MATLAB functions

- mgmu1: Applies the multigrid $\mu$-cycle to the model problem (implemented using recursion).
- fullmg1: Applies the full multigrid algorithm to the model problem (implemented using recursion).
- Interpolate1: Implements the intermesh transfer operator $I_{2h,h}$.
- InterpolateTrans1: Implements the intermesh transfer operator $I_{h,2h} = I_{2h,h}^T$.
- unRefine1: Extracts the original mesh from a mesh produced by Refine1.

## 13.6 Exercises for Chapter 13

1. Use trigonometric identities to show that (13.3a)-(13.3e) hold.

2. Suppose $\{u_1, u_2, \ldots, u_n\}$ is an orthonormal basis for $\mathbf{R}^n$, and $v$ is any vector in $\mathbf{R}^n$. Show that

   $$v = \sum_{i=1}^n (v \cdot u_i)u_i.$$

   (Hint: Since $\{u_1, u_2, \ldots, u_n\}$ is a basis, $v$ can be written as $v = \sum_{i=1}^n \alpha_iu_i$. Solve for $\alpha_j$ by taking the dot product of both sides with $u_j$.)

3. Show that the eigenvalues of $B_\omega$ are $\omega\theta_{k,\ell} + 1 - \omega$.

4. Show that $\omega$ should be chosen to be $2/3$ in the weighted Jacobi method to damp the high frequencies. (Hint: First show that

   Then show that

   where the optimal value of $\omega$ is $2/3$.)

5. Suppose $n_1$ presmoothing and $n_2$ postsmoothing Gauss-Seidel steps are used. Show that the cost of a W-cycle is bounded by $2(n_1 + n_2 + 1)$ work units.

6. In Section 13.2.2, it was shown that (13.6) is the result of applying Galerkin's method to (13.5). It follows that the solution of (13.6) must be the best approximation to the solution of (13.5) in some sense. State this conclusion precisely.

7. Show that the load vector $F_{2h}$ on $\mathcal{T}_{2h}$ is characterized by the condition

   Use this fact to show that $F_{2h} = I_{2h,h}^TF_h$, where $F_h$ is the load vector on $\mathcal{T}_h$.

8. (MATLAB) Consider the BVP

   where $\Omega$ is the unit circle and

   The exact solution is $u(x, y) = (1 - x^2 - y^2)e^x$.

   (a) Create a mesh on $\Omega$ as in Example 6.6, refining the coarse mesh four times.

   (b) Compute the stiffness matrix $K$ and the load vector $F$.

   (c) Compute the "exact" finite element solution $U$ by solving $KU = F$ using the built-in solver in MATLAB. Compute the discretization error.

   (d) Solve $KU = F$ using the multigrid $V(2, 1)$-cycle, and record the algebraic error after each cycle. How many cycles are required to solve the equations to the level of discretization (for example, algebraic error at most 10% of discretization error)?

   (e) Solve $KU = F$ using the full multigrid method with $n_1 = n_2 = \mu = n = 1$. Is the system solved to the level of discretization?

9. (MATLAB) Repeat the previous exercise, using the BVP from Example 7.1. Find the combination of parameters $n_1$, $n_2$, $\mu$, and $n$ with which the full multigrid algorithm most efficiently solves $KU = F$ to the level of discretization.

10. (MATLAB) In Section 13.3.1, it was stated that a $\mu$-cycle with $\mu$ greater than 2 is not cost effective. Using the BVPs from the previous two exercises, demonstrate that this is true.

11. (MATLAB) Consider the BVP

    where $\Omega$ is the unit circle. Establish a mesh on $\Omega$ by starting with a coarse triangulation with four triangles (as in Figure 6.11) and refining it five times. Form the finite element equations $KU = F$ and solve them using the full multigrid method with each of the following choices for $(n_1, n_2, \mu, n)$: $(2, 1, 1, 1)$, $(1, 1, 2, 1)$, $(1, 1, 1, 2)$. How many work units are required by each method? What is the algebraic error produced by each method?

12. (MATLAB) Repeat the previous exercise, but take $\Omega$ to be the polygon with vertices $(0, 0)$, $(1, 0)$, $(1, 1)$, $(-1, 1)$, $(-1, -1)$, and $(0, -1)$. Start with a coarse mesh consisting of six triangles and refine it five times to obtain the final mesh.

Part IV

Chapter 14

When solving a BVP, the goal is to obtain a solution that is sufficiently accurate; frequently the exact details of how this is accomplished are unimportant. This is especially true of the mesh; in many cases, it would be desirable if the finite element algorithm could automatically generate a suitable mesh. That is the topic of this part of the book. In order for the computed solution to be accurate, it is necessary that the mesh be fine enough to represent the variation in the true solution; if the solution is changing rapidly, then the mesh must be quite fine. On the other hand, if the solution changes slowly, then a coarser mesh will suffice. Many solutions have the property that they change rapidly over part of the computational domain $\Omega$ and slowly over other parts. An example is the function

which has a sharp peak at (x, y) = (0.5, 0.75). Two meshes on the unit square are shown in Figure 14.1. One is a uniform mesh with 2048 triangles, while the other is refined in the neighborhood of the point (0.5, 0.75) and has only 858 triangles. The piecewise linear interpolants of u on these two meshes are shown in Figure 14.2. The error in the interpolant is less on the locally refined mesh, whether measured in the energy norm (0.2912 versus 0.3534) or the maximum pointwise error (0.03078 versus 0.04538).

Figure 14.1. Two meshes on the unit square. The mesh on the left has 2048 triangles, while the mesh on the right has 858 triangles.


## Chapter 14. Adaptive mesh generation

Figure 14.2. The piecewise linear interpolants of the function (14.1) on the meshes of Figure 14.1.

Adaptive finite element methods try to determine a locally refined mesh, like the one in the previous example, automatically during the process of solving a BVP. The basic algorithm is this:

Given an initial mesh $\mathcal{T}$:

- repeat
  - Solve the BVP on $\mathcal{T}$
  - Estimate the error in the computed solution on each element
  - If the total error is sufficiently small, stop
  - Otherwise, use the error estimates to select certain elements to refine
  - Locally refine the mesh

There are therefore three components to an adaptive algorithm:

1. an element-by-element error estimator;
2. a strategy for choosing which triangles to refine; and
3. an algorithm for locally refining a mesh.

There are a number of possible choices for each of these algorithms. Indeed, adaptive finite element methods form an ongoing area of research, and there is no currently accepted "best" algorithm. In this chapter, I will discuss algorithms for local refinement of a triangulation and triangle selection strategies. I end the chapter with a conceptually simple but expensive error estimator, and show how an adaptive algorithm would perform. In the next chapter, I present several efficient error estimators.
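The structure of the adaptive loop (estimate, test, select, refine) can be illustrated with a deliberately simplified one-dimensional stand-in. In this sketch nothing is solved: the "computed solution" is just the piecewise linear interpolant of a known function, the element indicator is the interpolation error at the element midpoint, and elements whose indicator exceeds a fixed fraction of the largest one are bisected. The function `peak`, the indicator, and the selection rule are all assumptions chosen for the illustration; in one dimension no conformity issue arises, which is precisely what makes the triangular case harder.

```python
import math


def indicators(f, nodes):
    """Hypothetical element indicator: interpolation error at each midpoint."""
    return [abs(f(0.5 * (a + b)) - 0.5 * (f(a) + f(b)))
            for a, b in zip(nodes, nodes[1:])]


def adapt(f, nodes, tol, fraction=0.5, max_passes=100):
    """Refine until every element indicator is below tol (or passes run out)."""
    nodes = sorted(nodes)
    for _ in range(max_passes):
        eta = indicators(f, nodes)                 # estimate the error
        if max(eta) < tol:                         # stopping test
            break
        cutoff = fraction * max(eta)               # selection strategy
        new_nodes = [0.5 * (a + b)
                     for (a, b), e in zip(zip(nodes, nodes[1:]), eta)
                     if e > cutoff]
        nodes = sorted(nodes + new_nodes)          # local refinement (bisection)
    return nodes


def peak(x):
    """A function with a sharp peak at x = 0.5 (made up for this sketch)."""
    return math.exp(-100.0 * (x - 0.5) ** 2)


mesh = adapt(peak, [0.0, 0.25, 0.5, 0.75, 1.0], tol=1e-3)
lengths = [b - a for a, b in zip(mesh, mesh[1:])]
```

The resulting mesh is fine near the peak and coarse elsewhere, which is the one-dimensional counterpart of the locally refined mesh in Figure 14.1.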


## 14.1 Algorithms for local mesh refinement

## 14.1.1 Algorithms based on the standard refinement

Locally refining a triangulation is not straightforward because the resulting mesh must conform to the following rule, first mentioned in Section 4.1: The intersection of any two triangles must be a common vertex or a common edge. For instance, if it is decided that one triangle in a mesh is to be refined, then the standard refinement, applied to the single triangle, leads to a nonconforming mesh, as shown in Figure 14.3. The situation illustrated by Figure 14.3 arises in any conceivable algorithm for local mesh refinement, and so any algorithm must be able to deal with it. The usual method is to refine neighboring triangles, as necessary, until a conforming triangulation is obtained. For example, the second triangle in the original mesh of Figure 14.3 could be bisected, leading to the situation shown on the left in Figure 14.4. It is customary to refer to the bisection of a triangle as a green refinement and the resulting (sub)triangles as green triangles. Although green refinement creates a conforming mesh, it can also lead to a degenerate sequence of meshes. The reader will recall, from Section 5.1, that the convergence theory and standard error estimates are based on the nondegeneracy of the family of meshes, which requires that the shapes of the triangles not degenerate (informally, that the triangles not become arbitrarily skinny). However, as suggested by the mesh on the right in Figure 14.4, repeated green refinements can produce unacceptable triangles.

Figure 14.3. A mesh with two triangles (left) and a nonconforming refinement (right).

Figure 14.4. Left: The nonconforming mesh of Figure 14.3, made to conform by a triangle bisection. Right: Another step of this refinement process; a degenerate family of meshes could arise by continuing this process.


To avoid degeneracy, a given triangle is typically subjected to at most one green refinement. One way to ensure this is to keep track of which triangles are descended from a green refinement; if such a triangle becomes nonconforming due to refinement of an adjacent triangle, then it must be subject to a regular refinement. Another way to handle green refinements is to remove all green triangles from a given mesh before locally refining it. Of course, this introduces nonconforming triangles, but so does the local refinement, so not much added difficulty is introduced. In this second option, the sequence of meshes is not nested (not every triangle in a given mesh need be a subtriangle of a triangle in the previous mesh).

## 14.1.2 Algorithms based on bisection

Since standard refinements must be supplemented by green refinements to enforce triangle conformity, it is natural to consider using triangle bisection exclusively for local refinement. There are several algorithms based on triangle bisection, and I will describe two of them in detail.

Newest-node bisection

Triangle bisection leads to degeneracy if repeated bisecting edges are incident with the same node, as in Figure 14.4. A simple way to avoid this is to always bisect any given triangle from the newest node of that triangle, that is, from the node most recently added to the family of meshes. The newest node in a triangle is referred to as the peak of the triangle, and the opposite edge as the base. When a triangle is bisected, the new node becomes the peak of both subtriangles. This leads to the sequence of refinements shown in Figure 14.5. Newest-node bisection was introduced by Sewell [39], who showed that every descendant of a given triangle falls into one of four similarity classes (that is, each subtriangle is similar to one of four triangles), as illustrated in Figure 14.5. This is enough to prove that newest-node bisection cannot lead to degeneracy.

Triangle bisection can lead to nonconforming triangles just as does standard refinement. However, if a triangle and one of its neighbors share a common base, then the two triangles can be refined together without creating a nonconformity. In this case, each of the two triangles is said to be compatibly divisible. (A triangle is also called compatibly divisible if its base is a boundary edge.)

Mitchell [32] extended the newest-node method to a simple recursive algorithm, which is based on the following observation: Suppose $T_k$, with base $e$, is to be bisected, and $T_j$ is the neighbor of $T_k$ sharing the edge $e$. If $e$ is not the base of $T_j$, then, after a single bisection of $T_j$, $e$ will be the base of a subtriangle of $T_j$. This is illustrated in Figure 14.6. Mitchell's idea was to recursively refine $T_j$ so that $T_k$ and its (new) neighbor can then be refined together. The recursion continues until it reaches a pair of neighbors sharing a base or a triangle whose base is a boundary edge.
It can be shown that the recursion "bottoms out" after a finite number of steps provided that, in the initial mesh, the bases are defined so that every triangle either shares its base with a neighbor or has a boundary edge as a base. The recursive algorithm is summarized in Algorithm 14.1.


Figure 14.5. Refined triangles created by newest-node bisection. The triangles are labeled by their similarity classes.

Figure 14.6. Newest-node bisection. The left-hand triangle in the mesh on the left is to be bisected. Its base is not the base of its neighbor; however, after a single bisection of the neighbor, the first triangle shares a base with one of the new subtriangles.

Given a triangle $T_k$ of mesh $\mathcal{T}$ to bisect:

- If the base $e$ of $T_k$ is a boundary edge
  - replace $T_k$ by two subtriangles
- else
  - Let $T_j$ be adjacent to $T_k$ across $e$
  - If $e$ is also the base of $T_j$
    - replace $T_k$ and $T_j$ each by two subtriangles
  - else
    - recursively call this routine to bisect $T_j$
    - recursively call this routine to bisect $T_k$

Algorithm 14.1. The recursive form of newest-node bisection.

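Algorithm 14.1 can be sketched compactly if each triangle is stored as a tuple whose first entry is the peak, so that the remaining two entries form the base. The `Mesh` class below is a hypothetical minimal data structure invented for this illustration (it finds neighbors by a linear search, which a practical code would replace with an adjacency structure), and it assumes the initial bases satisfy the compatibility condition discussed above.

```python
class Mesh:
    def __init__(self, nodes, triangles):
        self.nodes = list(nodes)                  # list of (x, y) tuples
        self.tris = dict(enumerate(triangles))    # id -> (peak, a, b); base is (a, b)
        self.next_id = len(self.tris)

    def neighbor_across(self, k, edge):
        """Id of the other triangle containing both endpoints of edge, or None."""
        for j, t in self.tris.items():
            if j != k and edge <= set(t):
                return j
        return None

    def _new_node(self, a, b):
        (xa, ya), (xb, yb) = self.nodes[a], self.nodes[b]
        self.nodes.append(((xa + xb) / 2, (ya + yb) / 2))
        return len(self.nodes) - 1

    def _split(self, k, m):
        """Replace triangle k by two subtriangles whose peak is the new node m."""
        p, a, b = self.tris.pop(k)
        for child in ((m, p, a), (m, b, p)):
            self.tris[self.next_id] = child
            self.next_id += 1

    def bisect(self, k):
        p, a, b = self.tris[k]
        base = frozenset((a, b))
        j = self.neighbor_across(k, base)
        if j is None:                             # base is a boundary edge
            self._split(k, self._new_node(a, b))
        elif frozenset(self.tris[j][1:]) == base: # compatibly divisible pair
            m = self._new_node(a, b)
            self._split(k, m)
            self._split(j, m)
        else:
            self.bisect(j)                        # bisect the neighbor first,
            self.bisect(k)                        # then T_k with its new neighbor


def is_conforming(mesh):
    """For bisection meshes: no node may sit at the midpoint of a triangle edge."""
    points = set(mesh.nodes)
    for p, a, b in mesh.tris.values():
        for u, v in ((p, a), (a, b), (b, p)):
            (xu, yu), (xv, yv) = mesh.nodes[u], mesh.nodes[v]
            if ((xu + xv) / 2, (yu + yv) / 2) in points:
                return False
    return True


# Unit square split by a diagonal that serves as the base of both triangles.
mesh = Mesh([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
            [(1, 2, 0), (3, 0, 2)])
mesh.bisect(0)    # compatible pair: both triangles are bisected
mesh.bisect(2)    # base on the boundary: a single bisection
mesh.bisect(6)    # neighbor's base differs: triggers the recursion
```

The third call exercises the recursive branch: the neighbor is bisected first, after which the original triangle and one of the new subtriangles share a base and are divided together.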
Longest-edge bisection

Another algorithm based on bisection prevents degeneracy by choosing the longest edge of the triangle as the edge to be bisected. This is referred to as longest-edge bisection. Rivara [35] proposed two algorithms based on longest-edge bisection, one of which is described here. When a triangle is bisected by the longest edge, a neighboring triangle can become nonconforming. The remedy is to bisect the neighboring triangle by the longest edge. In the simplest case, this leads to a conforming triangulation, as in Figure 14.7. If it does not, then one of the subtriangles in the neighboring triangle is bisected to make the mesh conforming. This is illustrated in Figure 14.8. Since the subtriangle is bisected by the newest node, this algorithm is a hybrid of the longest-edge and newest-node methods. (Rivara also presented a method based entirely on longest-edge bisection, which will not be described here.)

Figure 14.7. The nonconforming refinement of Figure 14.3 (left) and a further refinement (right), which is conforming.

Figure 14.8. The original mesh (left) and a nonconforming refinement (middle). The lower right triangle is bisected twice, first by the longest side, to make the mesh conforming again.

The process illustrated in Figure 14.8 may create new nonconforming triangles, since making a neighboring triangle conform may induce a nonconformity in one of its neighbors. In this case, the above process is iterated. That is, first the desired triangles are bisected by the longest side. Then any nonconformities created in the first step are removed, possibly introducing new nonconformities. Then any nonconformities introduced in the second step are removed, and so forth. Since there is a finite number of triangles in the original mesh, the process must terminate with a conforming mesh.

An application of this algorithm refines certain triangles $T$ from the original mesh $\mathcal{T}$. Each refined triangle is replaced by two, three, or four subtriangles, and it is guaranteed that even repeated application will not reduce the minimum angle in the mesh by more than a factor of two. Moreover, the nondegeneracy condition from Section 5.1 is guaranteed to hold (see Rivara [35]).
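The factor-of-two bound on the minimum angle can be observed numerically. The Python sketch below is an illustration only: it applies pure longest-edge bisection to a single triangle and all of its descendants, which is a simplification of Rivara's hybrid algorithm (no neighbors, and therefore no conformity repairs). The function names and the starting triangle are assumptions.

```python
import math

def angles(tri):
    """Interior angles (in radians) of a triangle given as three (x, y) points."""
    out = []
    for i in range(3):
        p, q, r = tri[i], tri[(i + 1) % 3], tri[(i + 2) % 3]
        ux, uy = q[0] - p[0], q[1] - p[1]
        vx, vy = r[0] - p[0], r[1] - p[1]
        out.append(abs(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy)))
    return out

def bisect_longest_edge(tri):
    """Split a triangle at the midpoint of its longest edge."""
    rotations = [(tri[i], tri[(i + 1) % 3], tri[(i + 2) % 3]) for i in range(3)]
    # Pick the rotation whose first two vertices span the longest edge.
    a, b, c = max(rotations,
                  key=lambda t: math.hypot(t[0][0] - t[1][0], t[0][1] - t[1][1]))
    m = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
    return [(a, m, c), (m, b, c)]

def smallest_angle(tri, generations):
    """Smallest angle seen in tri and in all descendants up to the given depth."""
    current, smallest = [tri], min(angles(tri))
    for _ in range(generations):
        current = [child for t in current for child in bisect_longest_edge(t)]
        smallest = min(smallest, min(min(angles(t)) for t in current))
    return smallest

start = ((0.0, 0.0), (1.0, 0.0), (0.3, 0.7))   # hypothetical sample triangle
ratio = smallest_angle(start, 8) / min(angles(start))
print(ratio >= 0.5)
```

A ratio of at least one half after many generations is consistent with the guarantee quoted above, although a single example is of course not a proof.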


The performance of the newest-node and longest-edge/newest-node hybrid bisection algorithms tends to be similar in practice (see [32]). The newest-node algorithm is simpler to implement, since no nonconforming triangles are ever created, and therefore it seems to be a better choice in practice. The only drawback of the newest-node method is that the bases must be chosen in the initial mesh so that every triangle is compatibly divisible. This is easy to do by inspection for a coarse mesh, and algorithms can be devised to perform this step automatically.

## 14.2 Selecting triangles for local refinement

The previous section presented algorithms for refining certain triangles in a mesh without refining all of the triangles. Another essential ingredient of an adaptive algorithm is a method for selecting which triangles to refine. The reader will recall the framework for an adaptive algorithm: The given BVP is solved on a mesh $\mathcal{T}_h$ to produce an approximation $u_h$ of the true solution $u$. The error $u - u_h$ is then estimated on each triangle $T \in \mathcal{T}_h$, and triangles are selected to be refined based on the size of the errors. It is necessary to choose a norm in which to measure the error on $T$; possibilities are the $L^2$- and energy norms:

$$\|u - u_h\|_{L^2(T)} = \left(\int_T (u - u_h)^2\right)^{1/2}, \qquad \|u - u_h\|_{E(T)} = \left(\int_T \kappa \nabla(u - u_h) \cdot \nabla(u - u_h)\right)^{1/2}.$$

Another possibility is to estimate the maximum pointwise error over $T$; this introduces a new norm, the $L^\infty$-norm, which is defined for continuous functions by

$$\|v\|_{L^\infty(T)} = \max\left\{|v(x)| : x \in \overline{T}\right\}.$$

An adaptive algorithm might then try to estimate

$$\|u - u_h\|_{L^\infty(T)}$$

for each $T \in \mathcal{T}_h$. When the true solution is smooth enough and elliptic regularity holds, the following error estimates hold when $u_h$ is the piecewise linear finite element solution:

$$\|u - u_h\|_{L^2(\Omega)} \le C h^2, \qquad \|u - u_h\|_E \le C h$$

(see Chapter 5). Using the $L^\infty$-norm, the analogous result is

$$\|u - u_h\|_{L^\infty(\Omega)} \le C h^2 |\log(h)| \tag{14.2}$$

(see Chapter 8, and page 224 in particular, of Brenner and Scott [13]). The bound (14.2) is stated more precisely in the next chapter. Since $|\log(h)|$ grows so slowly as $h \to 0$, it is reasonable to ignore it in designing practical numerical algorithms. The most common triangle selection strategy is due to Babuška and Rheinboldt [7] and is based on two assumptions:


1. The error estimate on a given triangle $T$ (and its subtriangles formed as the mesh is refined) has the form $ch^\lambda$, where $c$, $\lambda$ are positive constants and $h$ is the diameter of $T$.
2. A mesh is (nearly) optimal when the errors are equilibrated, that is, when the elementwise errors are nearly constant.

The asymptotic error estimates given above suggest that the first assumption is reasonable, at least asymptotically, whether the error is measured in the $L^2$-, energy, or $L^\infty$-norm. The second assumption can be justified rigorously (see Section 3.5 of [8]) under certain circumstances. As working hypotheses for the development of practical algorithms, the above assumptions have been quite successful.

The Babuška-Rheinboldt strategy works as follows: Consider a triangle $T \in \mathcal{T}_h$ with diameter $h_1$ and suppose the estimated error over $T$ is $\epsilon_1$. Suppose further that $T$ was obtained by refining a triangle (in an earlier mesh) having diameter $h_0$, and assume that the estimated error over that triangle was $\epsilon_0$. The assumptions

$$\epsilon_0 = c h_0^\lambda, \qquad \epsilon_1 = c h_1^\lambda \tag{14.3}$$

allow the determination of $c$ and $\lambda$, which in turn allows an estimate of the error that would result if $T$ were refined so that the diameter of its subtriangles were $h_1/2$:

$$\epsilon_2 = c \left(\frac{h_1}{2}\right)^\lambda.$$

The estimate $\epsilon_2$ is computed for each triangle in the current mesh and $M$ is defined to be the largest of these values. That is, $M$ is an estimate of the largest elementwise error that would result from a uniform refinement of the current mesh. Now the second assumption above is used, that the mesh can be optimized by equilibrating the error. Every triangle in the current mesh whose estimated error $\epsilon_1$ is greater than $M$ is selected for refinement.

The above strategy also provides a basis for deciding how much to refine a given triangle. The reader should notice that a single bisection of a triangle $T$ produces two subtriangles $T_1$, $T_2$ satisfying

$$\operatorname{diam}(T_i) \le \operatorname{diam}(T), \quad i = 1, 2$$

(for example, bisecting an equilateral triangle $T$ produces two subtriangles with the same diameter as that of $T$). Bisecting $T$ twice (that is, bisecting $T$ and both its subtriangles) produces four subtriangles whose diameters are at most half of that of $T$. Should every selected triangle be bisected twice? Since the goal is to equilibrate the mesh, this may not be the best strategy. Subtriangles of $T$ would ideally have a diameter $h_2$ satisfying

$$c h_2^\lambda = M,$$

which yields $h_2 = (M/c)^{1/\lambda}$. Of course, the bisection algorithm does not allow $h_2$ to be specified, so $T$ could be bisected once or twice so that the diameters of its subtriangles come as close to this value as the algorithm permits.


There is one more technicality to be considered. The reader will recall that the bisection algorithm may require the refinement of unselected triangles in order to maintain compatibility. Since a single bisection does not necessarily reduce the diameter, there is the possibility that $h_1 = h_0$, in which case (14.3) does not determine $c$ and $\lambda$. Some ad hoc procedure is necessary in this case; I suggest replacing $h_1$ with $h_0/\sqrt{2}$ before determining $c$ and $\lambda$ (although the diameter of $T$ is the same as that of its supertriangle, its area is half that of the supertriangle).

The triangle selection strategy described above has one drawback: It may select very few triangles for refinement at a given step. This is appropriate if the main goal is to equilibrate the error on the mesh. However, since the finite element equations must be formed and solved at each step of the adaptive algorithm, efficiency demands that the adaptive algorithm terminate in relatively few steps. For this reason, it is reasonable to augment the above strategy by requiring that at least a fixed fraction $r$ of the triangles be selected for refinement at each step; a reasonable value of this fraction would be $r = 0.2$. If each selected triangle is bisected twice, then the number of nodes would increase by nearly a factor of two, at least, at each step. (With $r = 0.2$, the factor could be as little as 1.6, but since nonselected triangles must usually be bisected to maintain a conforming mesh, the factor is greater in practice.)
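The selection strategy of this section can be summarized in a short Python sketch. It is an illustration under stated assumptions rather than a transcription of the book's `SelectTris`: the function names, the tuple layout `(h0, e0, h1, e1)`, and the sample numbers are hypothetical, and the sketch assumes that the estimated error decreased under refinement (so that $\lambda > 0$).

```python
import math

def predicted_error(h0, e0, h1, e1):
    """Estimate the error after halving the diameter, using e = c * h**lam."""
    if h1 >= h0:
        # Ad hoc fix suggested in the text: bisection did not reduce the diameter.
        h1 = h0 / math.sqrt(2.0)
    lam = math.log(e1 / e0) / math.log(h1 / h0)   # solve e0 = c*h0^lam, e1 = c*h1^lam
    c = e1 / h1 ** lam
    return c * (h1 / 2.0) ** lam

def select_triangles(history, min_fraction=0.2):
    """history[k] = (h0, e0, h1, e1) for triangle k; returns indices to refine."""
    predicted = [predicted_error(*record) for record in history]
    M = max(predicted)                     # largest error expected after uniform refinement
    current = [record[3] for record in history]
    selected = {k for k, e1 in enumerate(current) if e1 > M}
    # Augmentation: select at least a fixed fraction, taking the largest errors first.
    quota = math.ceil(min_fraction * len(history))
    for k in sorted(range(len(history)), key=lambda k: current[k], reverse=True):
        if len(selected) >= quota:
            break
        selected.add(k)
    return sorted(selected)

# Three hypothetical triangles, each obtained by halving a parent of diameter 1.
history = [(1.0, 0.4, 0.5, 0.1), (1.0, 0.8, 0.5, 0.4), (1.0, 0.02, 0.5, 0.01)]
print(select_triangles(history))
```

In this example only the second triangle has a current error exceeding $M$, so it alone is selected; raising `min_fraction` forces additional triangles into the list in order of decreasing estimated error.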

## 14.3 A complete adaptive algorithm

Moreover, in a two-dimensional problem with uniformly refined meshes,

$$h = O\left(N_v^{-1/2}\right),$$

where $N_v$ is the number of nodes in the mesh. It follows that, for $h$ small enough,

$$\|u - u_h\|_E \le C N_v^{-1/2} \tag{14.4}$$

for some constant $C$. When $u$ is smooth but varies sharply in some parts of $\Omega$ (as in the examples presented below), it might be necessary that $h$ be unrealistically small before this asymptotic rate of convergence is observed (and thus, in practice, this rate of convergence is not observed when uniform refinement is used). A fair test of the effectiveness of an adaptive algorithm is whether the optimal rate of convergence (14.4) is observed. Therefore, in the examples shown below and in the next chapter, the values of $(N_v, \|u - u_h\|_E)$ are recorded for the meshes generated during the adaptive algorithm, and values of $C$ and $p$ are determined such that $\epsilon = CN^p$ fits the data points $(N, \epsilon) = (N_v, \|u - u_h\|_E)$ as nearly as possible in the least-squares sense. If $p$ is (close to) $-1/2$, then the algorithm has exhibited a (nearly) optimal rate of convergence, and can be judged a success.

The following three examples are given in order of increasing difficulty. The first has a mild boundary layer (region of sharp change near the boundary). Since the boundary layer is not severe, a sequence of uniform meshes exhibits the optimal convergence rate, and the adaptive algorithm shows only a small improvement in efficiency. The second has a sharp peak in the interior of $\Omega$, and the adaptive algorithm is noticeably more efficient than the nonadaptive algorithm. The third problem has a region of rapid change in the interior of $\Omega$ and is the most difficult of the three.

EXAMPLE 14.1. In this example, the domain is the unit square, and the true solution, the function

changes somewhat rapidly near the top and right boundaries. The function u is shown in Figure 14.9. The BVP is

where g is chosen so that the function u given above is the solution. A sequence of uniform meshes yielded errors satisfying

The final mesh contained 8192 triangles and 4225 nodes, and the error in the energy norm was about 0.0713. The convergence of the error to zero during the course of the adaptive iteration displayed the following behavior:


Figure 14.9. The exact solution for Example 14.1.

Figure 14.10. A uniformly refined mesh (left) and a locally refined mesh (right) generated in Example 14.1.

The final mesh has 9728 triangles and 5022 nodes, and the energy norm error in the solution was about 0.0388. Figure 14.10 shows a sample uniformly refined mesh and a sample locally refined mesh from this example. Figure 14.11 displays the convergence to zero of the error for both methods. The graph is given on a log-log plot, on which a relationship of the form $\epsilon = CN^p$ appears as a straight line with slope $p$. In this example, the two methods are converging at roughly the same rate, so the two lines are nearly parallel. The adaptive method is slightly superior ($C$ is less for the adaptive method), as seen in Figure 14.11.

EXAMPLE 14.2. Consider the Dirichlet problem


Figure 14.11. The convergence to zero of the error (in the energy norm) for Example 14.1: uniform refinement (solid line) and local refinement (dashed line).

Figure 14.12. A locally refined mesh in Example 14.2 and the computed solution.

where $\Omega$ is the unit square and $f$ is chosen so that the exact solution is

This function has a sharp peak at $(x, y) = (0.5, 0.117)$ (see Figure 14.12).


Figure 14.13. The convergence to zero of the error (in the energy norm) for Example 14.2: uniform refinement (solid line) and local refinement (dashed line).

The finite element method with a sequence of uniform meshes exhibited the following convergence:

The adaptive algorithm behaved like this:

The errors for the two methods are plotted in Figure 14.13. It is also instructive to compare the two methods in the following way. On a uniform mesh with 8192 triangles and 4225 nodes, the error (in the energy norm) was about 0.0144. On the other hand, the adaptive algorithm achieved a similar error (0.0138) on a mesh with only 388 triangles and 209 nodes. Even using the quadratic error estimator, which is quite inefficient, the adaptive algorithm took only about 21% of the time required to achieve this level of accuracy on the uniform mesh.

EXAMPLE 14.3. In this example, the domain is the rectangle $\Omega$ = (0.01 , 1) x ( 1 , 1), and the true solution is the harmonic function

The function $u$ changes abruptly as $(x, y)$ approaches the lower left corner of $\Omega$ (see Figure 14.14); indeed, $u$ has a singularity at the origin, which lies just outside of $\Omega$. The BVP is the Dirichlet problem


Figure 14.14. A locally refined mesh in Example 14.3 and the computed solution.

The finite element method with a sequence of uniform meshes exhibited the following convergence:

The adaptive algorithm behaved like this:

The errors for the two methods are plotted in Figure 14.15. The preceding examples suggest that the adaptive algorithm described in this section is quite satisfactory in every respect but one: The error estimator is extremely expensive. In fact, computing the error estimate (which requires the piecewise quadratic solution on the same mesh) costs much more time than computing $u_h$ itself. To be satisfactory, an a posteriori error estimate must require less computational time than computing the solution $u_h$ itself, preferably much less time. The next chapter describes several such estimators that, in most respects, work as well as the expensive estimator used above.
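The convergence rates reported in this section come from fitting $\epsilon = CN^p$ to the recorded pairs $(N_v, \|u - u_h\|_E)$. Taking logarithms turns this into a linear least-squares problem, $\log \epsilon = \log C + p \log N$. The Python sketch below illustrates the computation; the function name and the synthetic data are assumptions made for the demonstration, not values from the book.

```python
import math

def fit_power_law(N, eps):
    """Least-squares fit of log(eps) = log(C) + p*log(N); returns (C, p)."""
    x = [math.log(n) for n in N]
    y = [math.log(e) for e in eps]
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    # Slope and intercept of the best-fit line through (log N, log eps).
    p = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    C = math.exp(ybar - p * xbar)
    return C, p

# Synthetic data lying exactly on eps = 2 * N**(-1/2), the optimal rate (14.4).
nodes = [100, 400, 1600, 6400]
errors = [2.0 / math.sqrt(n) for n in nodes]
C, p = fit_power_law(nodes, errors)
print(round(C, 6), round(p, 6))
```

On a log-log plot the fitted $p$ is the slope of the line, so a value near $-1/2$ indicates the optimal rate.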

## 14.4 The MATLAB implementation

The main algorithms from this chapter are the newest-node refinement algorithm, the Babuška-Rheinboldt triangle selection strategy, and the quadratic error estimator. These algorithms are implemented in the MATLAB functions LocalRefine1, SelectTris, and QuadElementErrEst1, respectively. Estimating the errors on each element requires the element stiffness matrices corresponding to the quadratic mesh, so a new version of Stiffness2, named StiffnessE, is provided to return these quantities. The Interpolate2 routine is used to interpolate a piecewise linear function onto a piecewise quadratic mesh. Several new access functions are needed to extract information from the mesh data structure.

Figure 14.15. The convergence to zero of the error (in the energy norm) for Example 14.3: uniform refinement (solid line) and local refinement (dashed line).

In order to apply the newest-node algorithm to a mesh, the base of each triangle must be defined. The bases are stored in an $N_t \times 1$ array Bases in the mesh data structure; Bases(k) contains the index of the base of triangle $T_k$ in the list of edges. Only LocalRefine1 uses the Bases field, and it automatically updates the array when it refines a mesh.

It was mentioned above in Section 14.1.2 that the recursive newest-node algorithm is guaranteed to work provided that every triangle in the initial mesh is compatibly divisible (that is, shares a base with a neighboring triangle or has a boundary edge as base). Choosing the bases in the initial mesh to satisfy this property is an example of a matching problem from graph theory. Algorithms for this problem are complicated and beyond the scope of this book. Interested readers can consult the book by Jungnickel [25] for a detailed discussion of matching algorithms.

For a coarse mesh, the user can easily choose the bases appropriately and define the Bases array. For example, Figure 14.16 shows two meshes with the elements and edges labeled by their indices. In the mesh on the left, a natural choice of T.Bases is T.Bases = [4, 4, 6, 6, 11, 11, 13, 13]. This indicates that triangles 1 and 2 share edge 4 as base, triangles 3 and 4 share edge 6, triangles 5 and 6 share edge 11, and triangles 7 and 8 share edge 13. In the mesh on the right, one choice of T.Bases is T.Bases = [8, 9, 10, 11, 12, 13, 14]. In this case, each triangle has its boundary edge as base. In neither case is the choice of bases unique. For example, in the mesh on the right in Figure 14.16, another possible choice of bases is given by T.Bases = [2, 2, 4, 4, 6, 6, 14].

Figure 14.16. Two triangulations with the triangles and edges labeled by their indices.

If a mesh is passed to LocalRefine1 and the mesh data structure does not contain the Bases field, then LocalRefine1 calls DefineBases to choose the bases. The MATLAB routine DefineBases implements a heuristic algorithm that is likely but not guaranteed to find an acceptable choice of bases. If DefineBases fails, then LocalRefine1 prints an error message and terminates (in which case the user would have to define T.Bases directly). The interested reader can consult the MATLAB file DefineBases.m for details about the algorithm it implements.

### 14.4.1 MATLAB functions

These functions, together with those described in Section 15.5, comprise the adaptive code.

- LocalRefine1: The newest-node algorithm for local refinement. Updates the lists of triangle diameters and error estimates needed for triangle selection.
- DefineBases: Heuristic algorithm for defining the base of each triangle in a mesh. If the algorithm succeeds, the resulting mesh is compatibly divisible.
- SelectTris: Babuška-Rheinboldt triangle selection strategy. Always selects at least 20% of the triangles.
- QuadElementErrEst1: Estimates the errors on each triangle by comparing to the piecewise quadratic solution on the same mesh. Returns the more accurate piecewise quadratic solution if requested.
- Solve: Implements the adaptive algorithm (for the model problem) described in this chapter. Can also use one of the efficient error estimators described in the next chapter.
- Solve1: Similar to Solve but with uniform refinement. Intended for comparison with Solve.
- StiffnessE: Version of Stiffness2 which also returns the element stiffness matrices; needed by QuadElementErrEst1.

The functions for retrieving information from the mesh data structure are

- getAdjacentTriangle: Gets the index of the triangle on the other side of a given edge of a triangle.
- getAdjacentTriangles: Gets the indices of all the triangles adjacent to a given triangle.
- getDiameter: Computes the diameter of a given triangle.
- getDiameters: Computes the diameters of a list of triangles.
- getFBndyEdgeNodes: Gets the coordinates of the endpoints of a free boundary edge.
- getGrad1: Gets the gradient of a piecewise linear function on a given triangle.
- getOppositeVertex: Gets the index of the third vertex of a triangle.
- getOtherEdges: Given the index of an edge of a triangle, gets the indices of the other two edges.
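The compatibility condition on the Bases array described in this section is easy to check mechanically. The Python sketch below is an illustration only and is unrelated to the heuristic in `DefineBases`; the function name is hypothetical, and the sketch assumes that each entry of `bases` really is an edge of the corresponding triangle. The base assignments are the ones quoted above for Figure 14.16; the boundary-edge set for the right-hand mesh follows from the statement that edges 8 through 14 are boundary edges, while the empty set used for the left-hand mesh is an assumption (any set not containing edges 4, 6, 11, 13 gives the same answer).

```python
from collections import Counter

def compatibly_divisible(bases, boundary_edges):
    """True if every base is a boundary edge or is shared as base by two triangles.

    bases[k] is the edge index chosen as the base of triangle k.
    """
    counts = Counter(bases)
    for edge, n in counts.items():
        # A boundary edge belongs to one triangle; an interior base must be
        # chosen by both of the triangles that share it.
        required = 1 if edge in boundary_edges else 2
        if n != required:
            return False
    return True

right_boundary = {8, 9, 10, 11, 12, 13, 14}
print(compatibly_divisible([4, 4, 6, 6, 11, 11, 13, 13], set()))
print(compatibly_divisible([8, 9, 10, 11, 12, 13, 14], right_boundary))
print(compatibly_divisible([2, 2, 4, 4, 6, 6, 14], right_boundary))
```

All three assignments from the text pass the check, whereas an assignment in which an interior edge is chosen as base by only one of its two triangles is rejected.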

## 14.5 Exercises for Chapter 14

1. Here are the values of $N$ and $\epsilon$ obtained in the adaptive algorithm in Example 14.3:

| $N$ | $\epsilon$ |
|---|---|
| 1180 | 0.23224 |
| 2250 | 0.16039 |
| 4379 | 0.11387 |
| 8440 | 0.082623 |
| 16182 | 0.059271 |

Verify that the least-squares fit to these data is $\epsilon = 8.86 N^{-0.518}$. (Hint: The problem can be expressed as a linear least-squares problem by taking the log:

$$\log \epsilon = \log C + p \log N.)$$


2. If a triangle is bisected repeatedly by the newest-node algorithm, all the subtriangles fall into one of four similarity classes (see Figure 14.5). Determine the angles for triangles in each of the four classes, assuming the initial triangle is (a) an equilateral triangle; (b) an isosceles right triangle (take the hypotenuse to be the base).

3. (MATLAB) Let $\Omega$ be the polygonal region having vertices $(0,0)$, $(1,0)$, $(1,1)$, $(-1,1)$, $(-1,-1)$, and $(0,-1)$, and let $u$ be the harmonic function defined in polar coordinates by $u = r^{2/3} \sin(2\theta/3)$, $0 < \theta < 2\pi$ (see Exercise 8.6.7). Using the adaptive algorithm described in this chapter, solve the Dirichlet problem

$$-\Delta u = 0 \ \text{in } \Omega, \qquad u = g \ \text{on } \partial\Omega,$$

where $g$ is chosen so that the given function $u$ is the solution. At what rate does $\|u - u_h\|_E$ go to zero? Solve the problem using a sequence of uniformly refined meshes. How much more efficient is the adaptive algorithm?

4. (MATLAB) The function

is harmonic. Let $\Omega$ be the square $(-1, 1) \times (-1, 1)$ and use the adaptive algorithm described in this chapter to solve the Dirichlet problem

$$-\Delta u = 0 \ \text{in } \Omega, \qquad u = g \ \text{on } \partial\Omega,$$

where $g$ is chosen so that the given function $u$ is the solution. (The solution $u$ has peaks at the four corners of the domain.) At what rate does $\|u - u_h\|_E$ go to zero? Solve the problem using a sequence of uniformly refined meshes. How much more efficient is the adaptive algorithm?

5. (MATLAB) Solve the Dirichlet problem


6. (MATLAB) Repeat the preceding exercise with

7. (MATLAB) Let $\Omega$ be the hexagon with vertices $(1,0)$, $(1/2, \sqrt{3}/2)$, $(-1/2, \sqrt{3}/2)$, $(-1,0)$, $(-1/2, -\sqrt{3}/2)$, and $(1/2, -\sqrt{3}/2)$, and let $\mathcal{T}_0$ be the triangulation of $\Omega$ shown in Figure 14.17. Consider the Dirichlet problem

where $\kappa$ is the discontinuous function with value 1 on triangles 1, 3, and 5 and value 100 on triangles 2, 4, and 6 of $\mathcal{T}_0$. The boundary data are chosen so that the exact solution is

Solve the BVP using the adaptive algorithm described in this chapter, and using a sequence of uniform meshes produced by the standard refinement. In both cases, use $\mathcal{T}_0$ as the initial mesh. Which method is more efficient?

Figure 14.17. The initial mesh for Exercise 7.

8. (Programming) Extend the MATLAB codes Solve and QuadElementErrEst1 to allow a zero-order term in the PDE, as in


9. (MATLAB) Change the BVP in Example 14.1 so that the solution is

and solve using the adaptive algorithm from this chapter. (The code from the previous exercise is required.) Also solve using a sequence of uniformly refined meshes, and compare the efficiency of the adaptive algorithm to the nonadaptive algorithm.

Chapter 15

## Error estimators and indicators

In this chapter, several practical error estimators will be presented. An a posteriori error estimator can serve two purposes. First of all, as was demonstrated in the preceding chapter, an elementwise error estimate allows an algorithm to choose which triangles to refine so as to reduce the error with as little computational effort as possible. Second, an accurate error estimate shows when the problem has been solved accurately enough and therefore when the algorithm can be halted.

Some techniques give rise to error indicators, which effectively indicate where the error is large, and therefore which elements can fruitfully be refined, without giving a quantitative measure of the error. An indicator can be used in an adaptive algorithm, although one would have to use other criteria to decide when to halt the algorithm. Error indicators are worth pursuing because it has been shown that the most accurate error estimator does not necessarily lead to the most efficient adaptive algorithm (see [32]). Although this may be surprising at first, it follows naturally from the fact that the nodal values are found by solving a coupled system $KU = F$; no nodal values are independent of any others. Therefore, if the mesh is insufficiently refined in one region, the error in the associated nodal values will affect the nodal values away from the given region.

I will typically write "error estimators" when, strictly speaking, "error estimators and indicators" would be more precise (for example, in the next paragraph). However, when discussing a specific technique, I will indicate the category into which it falls (error estimator or error indicator).

There are two types of error estimators: explicit and implicit. An implicit error estimator requires the solution of systems of (algebraic) equations; the estimator is expressed as an implicit function of the computed solution. On the other hand, an explicit error estimator is given as an explicit function of the computed solution.
It follows that explicit estimators are less expensive to compute. In the following sections, I present two explicit estimators and one implicit estimator. Throughout the discussion, the model BVP

## 15.1 An explicit error indicator based on estimating the curvature of the solution

The first practical error indicator presented here is due to Eriksson and Johnson [19] and is based on an a priori error estimate for

$$\|u - u_h\|_{L^\infty(\Omega)}.$$

Here $u$ is the exact solution of the BVP and $u_h$ is the finite element solution, which is always assumed to be piecewise linear in this chapter. The reader will recall that, for continuous functions, the $L^\infty$-norm measures the largest pointwise value of the given function. When the function might be discontinuous, the $L^\infty$-norm ignores sets of measure zero.²¹ In the previous chapter, I presented the following asymptotic error bound:

$$\|u - u_h\|_{L^\infty(\Omega)} \le C h^2 |\log(h)|. \tag{15.2}$$

The reader will recall from Chapter 5 that error estimates typically are expressed in terms of higher derivatives of the exact solution; a typical result was

$$\|u - u_h\|_E \le C h\, |u|_{H^2(\Omega)},$$

where $|u|_{H^2(\Omega)}$ is defined by

$$|u|_{H^2(\Omega)} = \left(\sum_{|\alpha| = 2} \int_\Omega |D^\alpha u|^2\right)^{1/2}$$

(see Section 5.3). The bound (15.2) can be stated more precisely in terms of the second derivatives of $u$, but this requires the definition of the following Sobolev spaces:

$$W^{k,\infty}(\Omega) = \left\{u \in L^\infty(\Omega) : u \text{ and its partial derivatives up to order } k \text{ are in } L^\infty(\Omega)\right\}.$$

The $W^{k,\infty}$-norm is just the largest of the $L^\infty$-norms of $u$ and its partial derivatives up to order $k$, while the $W^{k,\infty}$-seminorm $|u|_{W^{k,\infty}(\Omega)}$ is the largest of the $L^\infty$-norms of the partial derivatives of $u$ of order exactly $k$. Given these definitions, (15.2) can be expressed as

$$\|u - u_h\|_{L^\infty(\Omega)} \le C h^2 |\log(h)|\, \|u\|_{W^{2,\infty}(\Omega)} \tag{15.3}$$

²¹ The precise definition is

$$\|f\|_{L^\infty(\Omega)} = \operatorname*{ess\,sup}_{x \in \Omega} |f(x)|.$$

The number defined on the right is called the essential supremum of $|f|$ on $\Omega$. It is the smallest number $M$ such that $|f| \le M$ except possibly on a set of measure zero.


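(the constant $C$ in (15.3) is different than in (15.2); the constant in (15.2) absorbed the factor of $\|u\|_{W^{2,\infty}(\Omega)}$). A variation of (15.3), which is useful for local refinement, is expressed element by element:

$$\|u - u_h\|_{L^\infty(\Omega)} \le C |\log(h)| \max_{T \in \mathcal{T}_h} h_T^2\, |u|_{W^{2,\infty}(T)}. \tag{15.4}$$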

Based on (15.4), it can be expected that the values of $h_T^2 |u|_{W^{2,\infty}(T)}$ are representative of the relative sizes of $\|u - u_h\|_{L^\infty(T)}$. Since $|\log(h)|$ grows so slowly as $h$ decreases, it is usually considered to be absorbed in the constant $C$. Since the value of $C$ is unknown, this method produces an error indicator, not an error estimator (although Eriksson and Johnson [19] do discuss methods for estimating $C$, thereby turning the indicator into an estimator). To develop a practical method, it remains only to find a way of estimating $|u|_{W^{2,\infty}(T)}$ from the computed solution $u_h$.

It is not surprising that the size of the second derivatives of $u$ would provide an indication of where the error in $u_h$ is large. Since $u_h$ is piecewise linear, its second derivatives are zero inside each triangle $T$. If the second derivatives of $u$ are large (that is, if $\nabla u$ is changing rapidly) on part of $\Omega$, $u_h$ could accurately represent $u$ on that region only if the mesh were highly refined there. By definition, $|u|_{W^{2,\infty}(T)}$ is the largest of

$$\left\|\frac{\partial^2 u}{\partial x^2}\right\|_{L^\infty(T)}, \qquad \left\|\frac{\partial^2 u}{\partial x\, \partial y}\right\|_{L^\infty(T)}, \qquad \left\|\frac{\partial^2 u}{\partial y^2}\right\|_{L^\infty(T)}.$$

Eriksson and Johnson proposed to estimate this quantity by the largest of

$$\frac{\left|\dfrac{\partial u_h}{\partial x}\Big|_T - \dfrac{\partial u_h}{\partial x}\Big|_{T'}\right|}{\sqrt{(x_T - x_{T'})^2 + (y_T - y_{T'})^2}}, \qquad \frac{\left|\dfrac{\partial u_h}{\partial y}\Big|_T - \dfrac{\partial u_h}{\partial y}\Big|_{T'}\right|}{\sqrt{(x_T - x_{T'})^2 + (y_T - y_{T'})^2}},$$

where $(x_T, y_T)$ represents the centroid of the triangle $T$ and $T'$ ranges over the three triangles adjacent to $T$. (Thus there are six difference quotients to compute for each $T$, fewer if $T$ is adjacent to $\partial\Omega$.)

To demonstrate the effectiveness of the Eriksson-Johnson error indicator, the examples from Section 14.3 will be solved using it in place of the quadratic error estimate.

EXAMPLE 15.1 (cf. Example 14.1). The Eriksson-Johnson error indicator attempts to estimate (at least up to an unknown multiplicative constant) the $L^\infty$-norm of the error. The following table records the maximum of the elementwise indicators and also the actual $L^\infty$-norm of the error in the computed solution. The last column gives the ratio of the actual error to the estimated error.

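| $\lVert u - u_h \rVert_{L^\infty(\Omega)}$ | Estimated error | Ratio |
|---|---|---|
| $2.1296 \cdot 10^{-1}$ | $1.46878$ | 0.145 |
| $7.9937 \cdot 10^{-2}$ | $7.63613 \cdot 10^{-1}$ | 0.105 |
| $4.5236 \cdot 10^{-2}$ | $5.24178 \cdot 10^{-1}$ | 0.0863 |
| $1.6384 \cdot 10^{-2}$ | $2.23028 \cdot 10^{-1}$ | 0.0734 |
| $1.5173 \cdot 10^{-2}$ | $1.24818 \cdot 10^{-1}$ | 0.122 |
| $5.1592 \cdot 10^{-3}$ | $6.35376 \cdot 10^{-2}$ | 0.0812 |
| $2.5034 \cdot 10^{-3}$ | $3.0163 \cdot 10^{-2}$ | 0.0830 |
| $1.4781 \cdot 10^{-3}$ | $1.58193 \cdot 10^{-2}$ | 0.0934 |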

Figure 15.1. The convergence to zero of the error (in the energy norm) for Example 15.1: uniform refinement (solid line) and local refinement (dashed line).

These results suggest that the indicator is quite successful. The convergence of the energy norm of the error to zero was

which is similar to the result obtained in Example 14.1. This convergence is compared to the results for uniform refinement in Figure 15.1 (cf. Figure 14.11). The reader will recall that this BVP is not particularly difficult, and the adaptive refinement does not lead to a large improvement in this case.

EXAMPLE 15.2 (cf. Example 14.2). The solution of this problem has a sharp peak in the interior of $\Omega$ (see Figure 14.12) and is more difficult than the previous problem. Here are the results of the adaptive algorithm, displayed as in the previous example.

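| $N_v$ | $\lVert u - u_h \rVert_{L^\infty(\Omega)}$ | Estimated error | Ratio |
|---|---|---|---|
| 45 | $7.0416 \cdot 10^{-2}$ | $1.6036 \cdot 10^{-1}$ | 0.43911 |
| 86 | $8.5744 \cdot 10^{-2}$ | $6.0089 \cdot 10^{-2}$ | 1.427 |
| 145 | $1.3493 \cdot 10^{-2}$ | $2.7297 \cdot 10^{-2}$ | 0.49430 |
| 239 | $3.0149 \cdot 10^{-3}$ | $2.0955 \cdot 10^{-2}$ | 0.14387 |
| 398 | $7.1862 \cdot 10^{-4}$ | $6.2306 \cdot 10^{-3}$ | 0.11534 |
| 802 | $4.3305 \cdot 10^{-4}$ | $4.0484 \cdot 10^{-3}$ | 0.10697 |
| 1546 | $1.6103 \cdot 10^{-4}$ | $1.3948 \cdot 10^{-3}$ | 0.11545 |
| 2815 | $1.4172 \cdot 10^{-4}$ | $8.5832 \cdot 10^{-4}$ | 0.16511 |
| 5243 | $7.1135 \cdot 10^{-5}$ | $4.1649 \cdot 10^{-4}$ | 0.17080 |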

Figure 15.2. The convergence to zero of the error (in the energy norm) for Example 15.2: uniform refinement (solid line) and local refinement (dashed line).

The convergence of the energy norm of the error to zero was

which is virtually identical to the result obtained in Example 14.2. This convergence is compared to the results for uniform refinement in Figure 15.2 (cf. Figure 14.13).

EXAMPLE 15.3 (cf. Example 14.3). This problem is the most difficult of the three for uniform refinement (the solution and domain are shown in Figure 14.14). Here are the results of the adaptive algorithm, displayed as in the previous examples.

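| $N_v$ | $\lVert u - u_h \rVert_{L^\infty(\Omega)}$ | Ratio |
|---|---|---|
| 95 | $5.5502 \cdot 10^{-1}$ | 0.22290 |
| 198 | $3.7094 \cdot 10^{-1}$ | 0.17893 |
| 364 | $1.6246 \cdot 10^{-1}$ | 0.11328 |
| 692 | $5.6060 \cdot 10^{-2}$ | 0.073291 |
| 1350 | $1.7011 \cdot 10^{-2}$ | 0.058826 |
| 2477 | $5.0250 \cdot 10^{-3}$ | 0.038585 |
| 4717 | $4.02478 \cdot 10^{-3}$ | 0.046028 |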

Figure 15.3. The convergence to zero of the error (in the energy norm) for Example 15.3: uniform refinement (solid line) and local refinement (dashed line). The convergence of the energy norm of the error to zero was

which, again, is similar to the result obtained in Section 14.3. This convergence is compared to the results for uniform refinement in Figure 15.3 (cf. Figure 14.15).
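The indicator used in the three examples above can be sketched in Python as follows. This is an illustration under stated assumptions, not the book's MATLAB implementation: the function names and data layout are hypothetical, the precise form of the difference quotients is my reading of the description in this section (gradient differences between adjacent triangles divided by the distance between centroids, scaled by $h_T^2$), and neighbors are found by a quadratic-time search that is adequate only for small meshes.

```python
import math

def gradient(p, u):
    """Gradient of the linear function taking values u at the three points p."""
    (x1, y1), (x2, y2), (x3, y3) = p
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    gx = ((u[1] - u[0]) * (y3 - y1) - (u[2] - u[0]) * (y2 - y1)) / det
    gy = ((x2 - x1) * (u[2] - u[0]) - (x3 - x1) * (u[1] - u[0])) / det
    return gx, gy

def curvature_indicators(points, triangles, u):
    """One indicator h_T^2 * (largest difference quotient) per triangle."""
    corners = [[points[i] for i in t] for t in triangles]
    grads = [gradient(c, [u[i] for i in t]) for c, t in zip(corners, triangles)]
    cents = [(sum(x for x, _ in c) / 3.0, sum(y for _, y in c) / 3.0) for c in corners]
    diams = [max(math.hypot(c[i][0] - c[j][0], c[i][1] - c[j][1])
                 for i, j in ((0, 1), (1, 2), (2, 0))) for c in corners]
    indicators = []
    for k, tk in enumerate(triangles):
        worst = 0.0
        for j, tj in enumerate(triangles):
            # Adjacent triangles share exactly two vertices (an edge).
            if j != k and len(set(tk) & set(tj)) == 2:
                d = math.hypot(cents[k][0] - cents[j][0], cents[k][1] - cents[j][1])
                worst = max(worst,
                            abs(grads[k][0] - grads[j][0]) / d,
                            abs(grads[k][1] - grads[j][1]) / d)
        indicators.append(diams[k] ** 2 * worst)
    return indicators

# Unit square split into two triangles; nodal values of u(x, y) = x*y.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]
print(curvature_indicators(points, triangles, [0.0, 0.0, 1.0, 0.0]))
```

For nodal values taken from a linear function the gradients agree on every triangle and all indicators vanish, which matches the observation that the error of a piecewise linear approximation is governed by the second derivatives of $u$.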

## 15.2 An explicit error indicator based on the residual

where $u$ is the exact solution. If $e_h = u - u_h$ is the error in $u_h$ and $v$ is any element of $V$, then

$$a(e_h, v) = a(u, v) - a(u_h, v) = \ell(v) - a(u_h, v).$$

Therefore, $e_h$ satisfies the variational equations

$$a(e_h, v) = \ell(v) - a(u_h, v) \quad \text{for all } v \in V, \tag{15.5a}$$

$$a(e_h, v) = 0 \quad \text{for all } v \in V_h. \tag{15.5b}$$

Many error estimators are based on manipulating these equations. In deriving the following error estimator, I will assume that any Dirichlet boundary conditions are homogeneous, as in the following model problem:

The necessary modification for nonzero Dirichlet data will be given at the end of the derivation. The weak form of the model problem (15.6) is

$$a(u, v) = \ell(v) \quad \text{for all } v \in V = \left\{v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_1\right\},$$

where

The derivation of the error estimate begins with (15.5a), expressed in terms of the individual triangles:

was used; it is valid since $u_h$, restricted to a single triangle $T$, is smooth.

The above expression for $a(e_h, v)$ involves the element-by-element residuals

and

which measure by how much $u_h$ fails to solve the PDE in the interior of each triangle and by how much it fails to satisfy the boundary condition on each boundary edge. The above expression can be further simplified by defining $\mathcal{E}_h$ to be the set of all edges in the triangulation and $X_h$ to be the set of all $e \in \mathcal{E}_h$ such that $e$ is not a boundary edge. If $e$ lies on $\Gamma_1$, then, for any test function $v \in V$, $v = 0$ on $e$ and hence

For each $e \in X_h$, $T_{e,1}$ and $T_{e,2}$ will denote the triangles on either side of $e$ and $n_e$ will be the outward-pointing unit normal vector to $\partial T_{e,1}$ on $e$. The order of $T_{e,1}$, $T_{e,2}$ is arbitrary. Then

where

denotes the jump in the discontinuous function $\kappa\, \partial u_h / \partial n_e$ across $e$. (It is important to notice that this jump is the same regardless of which direction is chosen for $n_e$.) It thus follows that

To simplify this expression further, the definition of the boundary residual $R$ is extended to all edges in $\mathcal{E}_h$:


For the next step of the derivation, a piecewise linear approximation to an arbitrary $v \in V$ is needed. If $v$ lies in $H^2(\Omega)$, then the piecewise linear interpolant $v_I$ of $v$ is available; unfortunately, the size of $v - v_I$ is bounded in terms of $|v|_{H^2(\Omega)}$. A similar approximation $v_h$ is needed with $\|v - v_h\|_{L^2(T)}$ bounded in terms of the $H^1$ norm of $v$. Such an estimate can be given by referring to the patch $\tilde{T}$ of a given triangle $T$:

(thus $\tilde{T}$ is the collection of triangles in $\mathcal{T}_h$ that intersect $T$, even at a single point). Theorem 1.7 in Ainsworth and Oden [2] yields, for any $v \in H^1(\Omega)$, a continuous piecewise linear $v_h$ satisfying

where $e$ is any edge of $T$. Since $v_h \in V_h$, (15.5b) implies that

and thus, by (15.5a),

It follows that $v$ can be replaced with $v - v_h$ in the right-hand side of (15.8). Therefore,

The reader should notice the use of the Cauchy-Schwarz inequality in the second and fourth steps (the first time for integrals, the second for Euclidean vectors). In the final upper bound, $\tilde{T}_e$ denotes the patch of a triangle having $e$ as an edge. It can be shown that

(this follows from the fact that each $T'$ can belong to at most $k$ patches $\tilde{T}$ for some finite integer $k$; otherwise, the minimum angle condition would be violated).

Taking $v = e_h$ and using the $V$-ellipticity of $a(\cdot,\cdot)$ yields

or, regrouping the boundary integrals and noting that most edges belong to two triangles,

Based on this indicator, the explicit residual indicator is defined by

When the BVP contains inhomogeneous Dirichlet data, the finite element problem solves

where $G_h$ is the piecewise linear function interpolating $g$ (the Dirichlet data) at the constrained nodes and having value zero at the free nodes. Since $G_h$ is given, the finite element procedure, as described in previous chapters, actually solves for $w_h$. The explicit residual indicator $\eta_T$ in this case is exactly as defined above, bearing in mind that $u_h$, not $w_h$, is used to define the residuals $r$ and $R$.

As in the previous section, the effectiveness of the indicator will be illustrated by applying it to the examples from Section 14.3.

EXAMPLE 15.4 (cf. Example 14.1). The explicit residual estimator led to

The results are compared to the results for uniform refinement in Figure 15.4 (cf. Figure 14.11).

EXAMPLE 15.5 (cf. Example 14.2). The explicit residual estimator led to

The results are compared to the results for uniform refinement in Figure 15.5 (cf. Figure 14.13).


Figure 15.4. The convergence to zero of the error (in the energy norm) for Example 15.4: uniform refinement (solid line), local refinement based on the explicit residual indicator (dashed line).

Figure 15.5. The convergence to zero of the error (in the energy norm) for Example 15.5: uniform refinement (solid line), local refinement based on the explicit residual estimator (dashed line).


Figure 15.6. The convergence to zero of the error (in the energy norm) for Example 15.6: uniform refinement (solid line), local refinement based on the explicit residual estimator (dashed line).

EXAMPLE 15.6 (cf. Example 14.3). The explicit residual estimator led to

These results are compared to the results for uniform refinement in Figure 15.6 (cf. Figure 14.15).

In all these examples, the explicit residual indicator was as effective as the explicit indicator presented in the last section, or, indeed, as the expensive quadratic error estimator of the last chapter.
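To make the structure of an explicit residual indicator concrete, the following Python sketch evaluates one commonly used form of it for a continuous piecewise linear function on a small triangulation. It is an illustration under stated assumptions, not the book's MATLAB routine: it assumes $\kappa = 1$ (so the interior residual of a piecewise linear $u_h$ reduces to $f$), a pure Dirichlet boundary (so boundary edges contribute nothing), one-point quadrature for $f$, and the form $\eta_T^2 = h_T^2 \|r\|_{L^2(T)}^2 + \tfrac12 \sum_e h_e \|R\|_{L^2(e)}^2$ over the interior edges of $T$. The function names are hypothetical.

```python
import math


def grad_p1(pts, vals):
    """Constant gradient and area of the linear function with vertex values vals."""
    (x1, y1), (x2, y2), (x3, y3) = pts
    a, b = x2 - x1, y2 - y1
    c, d = x3 - x1, y3 - y1
    det = a * d - c * b                      # twice the signed area
    du1, du2 = vals[1] - vals[0], vals[2] - vals[0]
    grad = ((du1 * d - du2 * b) / det, (a * du2 - c * du1) / det)
    return grad, abs(det) / 2.0


def explicit_residual_indicator(nodes, tris, u, f=lambda x, y: 0.0):
    """Per-triangle indicator (assumed form; kappa = 1, Dirichlet boundary)."""
    grads, areas, edge_to_tris = [], [], {}
    for t, tri in enumerate(tris):
        g, area = grad_p1([nodes[i] for i in tri], [u[i] for i in tri])
        grads.append(g)
        areas.append(area)
        for k in range(3):
            key = tuple(sorted((tri[k], tri[(k + 1) % 3])))
            edge_to_tris.setdefault(key, []).append(t)

    eta_sq = [0.0] * len(tris)
    # Interior residual: r = f on each triangle, integrated at the centroid.
    for t, tri in enumerate(tris):
        pts = [nodes[i] for i in tri]
        h_T = max(
            math.hypot(pts[k][0] - pts[(k + 1) % 3][0],
                       pts[k][1] - pts[(k + 1) % 3][1])
            for k in range(3)
        )
        cx = sum(p[0] for p in pts) / 3.0
        cy = sum(p[1] for p in pts) / 3.0
        eta_sq[t] += h_T ** 2 * areas[t] * f(cx, cy) ** 2

    # Edge residual: jump of the normal derivative across each interior edge.
    for (i, j), owners in edge_to_tris.items():
        if len(owners) != 2:
            continue                         # boundary edge, assumed Dirichlet
        (xi, yi), (xj, yj) = nodes[i], nodes[j]
        length = math.hypot(xj - xi, yj - yi)
        n = ((yj - yi) / length, -(xj - xi) / length)   # unit normal, either sign
        g1, g2 = grads[owners[0]], grads[owners[1]]
        jump = (g1[0] - g2[0]) * n[0] + (g1[1] - g2[1]) * n[1]
        for t in owners:
            eta_sq[t] += 0.5 * length * (length * jump ** 2)
    return [math.sqrt(v) for v in eta_sq]


# Two triangles covering the unit square, with a kink along the diagonal.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tris = [(0, 1, 2), (0, 2, 3)]
eta = explicit_residual_indicator(nodes, tris, [0.0, 0.0, 0.0, 1.0])
print(eta)
```

Because the jump enters only through its square, the sign chosen for the edge normal does not matter. A globally linear $u_h$ with $f = 0$ gives a zero indicator on every triangle.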

## 15.3 The element residual error estimator

In the last chapter, I showed how an effective a posteriori estimate for the error in a piecewise linear solution could be obtained from the piecewise quadratic solution on the same mesh. This is not practical because of the expense of solving the global problem to get the piecewise quadratic solution. The idea of implicit residual methods is to obtain a higher-order estimate of the solution element by element. This can be done by posing a BVP for the true solution $u$ on each element $T$. In this section, I will present one such BVP and the resulting error estimate. This method is due to Bank and Weiser [10] and is called the element residual method. Given that the exact solution to the BVP (15.1) is $u$ and that the approximate solution $u_h$ has been computed on the finite element mesh $\mathcal{T}_h$, it is easy to derive the PDE satisfied by the error $e_h = u - u_h$ in each $T \in \mathcal{T}_h$:


This calculation is valid in the interior of each triangle since the piecewise linear function $u_h$ is smooth there. This PDE must be augmented by suitable boundary conditions; in the element residual method, Neumann conditions are used:

The Neumann boundary conditions follow immediately from $e_h = u - u_h$. It is not possible to solve (15.9) to find $e_h$ on $T$, since the exact value of $\partial u/\partial n$ on $\partial T$ is unknown except on $\Gamma_2$ (where $\kappa\,\partial u/\partial n = h$). However, if $e \subset \partial T$ is an edge in the interior of $\Omega$ and $T'$ is the triangle on the other side, then a reasonable estimate is

In this formula, $n$ is the outward unit normal to $T$, and $\nabla u_h|_T$, $\nabla u_h|_{T'}$ are the gradients of $u_h$ on triangles $T$ and $T'$, respectively. Since $u_h$ is piecewise linear, these gradients are constant. The notation

will be used to denote the averaged value of $\kappa\,\partial u_h/\partial n$. If an edge of $T$ lies in $\Gamma_2$, then the boundary condition

will be applied on that edge. On the other hand, if an edge of $T$ lies in $\Gamma_1$, then the Dirichlet condition $e_h = 0$ can be applied on that edge. (Strictly speaking, the correct condition would be $e_h = g - u_h$, since $u_h$ likely satisfies the Dirichlet condition exactly only at the endpoints of the edge. However, it is simpler to take $e_h = 0$ on such an edge, and the additional error introduced does not significantly affect the performance of the method.) Thus $e_h$ is estimated on $T$ by solving the PDE

together with the following boundary conditions applied to each edge e of dT:

Exercise 2 asks the reader to show that the weak form of this BVP is


where
and

is interpreted as $h$ on edges lying in $\Gamma_2$. Having solved (15.14), at least approximately, the element residual error estimate is defined by

It remains only to decide how to solve (15.14). Except when $T$ has an edge lying in $\Gamma_1$, the BVP is a pure Neumann problem, and there is, in fact, no guarantee that a solution exists. (The reader should recall that a pure Neumann problem such as (15.14) has no solution except when the appropriate compatibility condition is satisfied.) Nevertheless, it has been shown (cf. [10]) that a useful estimate of $e_h$ can be obtained by solving (15.14) by Galerkin finite elements over a special subspace of quadratic functions, one over which the bilinear form defining (15.14) is elliptic. The space $P_2(T)$ of quadratic functions on a triangle $T$ is six-dimensional, and elements of $P_2(T)$ are uniquely determined by their nodal values at the three vertices and the midpoints of the three edges (cf. Section 4.2.1). The bilinear form

is not $P_2(T)$-elliptic, but it is elliptic over the three-dimensional subspace $M(T)$ spanned by the nodal basis functions corresponding to the three edge midpoints. Restricting the approximating subspace from $P_2(T)$ to $M(T)$ is equivalent to estimating $e_h$ to be zero at the vertices of $T$. Again, it has been shown that this is not too severe an assumption. The Galerkin problem used to estimate $e_h$ on $T$ is then

The (three-point) element residual estimator is then

Sometimes the space $M(T)$ is augmented by a fourth basis function, which is cubic with value 0 on $\partial T$ and value 1 at the centroid of $T$. The resulting four-dimensional space will be denoted by $\tilde{M}(T)$, and the solution of the above Galerkin problem, with $M(T)$ replaced by $\tilde{M}(T)$, will be denoted by $\tilde{e}_h$. The (four-point) element residual estimator is


Figure 15.7. The convergence to zero of the error (in the energy norm) for Example 15.7: uniform refinement (solid line), local refinement based on the three-point element residual estimator (dashed line), local refinement based on the four-point element residual estimator (dotted line). The dashed and dotted lines are almost indistinguishable.

Because of their distinctive shapes, the basis functions of $M(T)$ are called bubble functions (see Figures 4.11 and 4.15). To illustrate their effectiveness, the two element residual estimators will be applied to the test problems from Section 14.3.

EXAMPLE 15.7 (cf. Example 14.1). The three-point element residual estimator led to

while the four-point element residual method resulted in

These results are compared to the results for uniform refinement in Figure 15.7 (cf. Figure 14.11).

EXAMPLE 15.8 (cf. Example 14.2). The three-point element residual estimator led to


Figure 15.8. The convergence to zero of the error (in the energy norm) for Example 15.8: uniform refinement (solid line), local refinement based on the three-point element residual estimator (dashed line), local refinement based on the four-point element residual estimator (dotted line). The dashed and dotted lines are almost indistinguishable.

These results are compared to the results for uniform refinement in Figure 15.8 (cf. Figure 14.13).

EXAMPLE 15.9 (cf. Example 14.3). The three-point element residual estimator led to

while the four-point element residual method resulted in

These results are compared to the results for uniform refinement in Figure 15.9 (cf. Figure 14.15). In all three examples, the three-point estimator was just as effective as the four-point estimator and therefore should be preferred due to its reduced cost. Even the three-point element residual method is significantly more expensive than the explicit error indicators presented in the previous sections. Implicit estimators, such as the element residual method, are thought to be more robust and accurate than explicit estimators (see, for example, the discussions in Sections 2.1 and 3.1 of Ainsworth and Oden [2]). Indeed, the explicit estimators presented in this book are really error indicators, not error estimators. The element residual method, on the other hand, is derived as a true estimator, and it is quite effective. The following table shows the ratios of true energy norm error to estimated energy norm error for the three-point element residual estimator for Example 15.8; the estimator provides a close upper bound for the true error.

Figure 15.9. The convergence to zero of the error (in the energy norm) for Example 15.9: uniform refinement (solid line), local refinement based on the three-point element residual estimator (dashed line), local refinement based on the four-point element residual estimator (dotted line). The dashed and dotted lines are almost indistinguishable.

| N | 830 | 1712 | 3303 | 6722 | 12594 |
|---|---|---|---|---|---|
| Ratio | 0.851 | 0.844 | 0.836 | 0.834 | 0.829 |

However, in his comparative study of error estimation and local refinement techniques, Mitchell [32] found that a wide variety of implicit and explicit estimators resulted in a similar performance in practice. Indeed, using the exact error (computed from a knowledge of the exact solution) as the error estimator did not usually result in the best mesh, at least in his numerical experiments. For this reason, inexpensive explicit indicators may be preferred over more expensive implicit estimators. There are a number of other estimators and indicators that have been proposed, beyond the three presented in this chapter. More information can be found in the book by Ainsworth and Oden [2].
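The ellipticity claim made earlier in this section for the midpoint subspace $M(T)$ can be checked numerically on a single triangle. The Python sketch below assumes the bilinear form $\int_T \nabla u \cdot \nabla v$ (that is, $\kappa = 1$, an assumption made here for illustration) and builds the $6 \times 6$ element matrix for the quadratic Lagrange basis using the edge-midpoint quadrature rule, which is exact for the quadratic integrands involved. The function names are hypothetical and the code is not the book's MATLAB implementation.

```python
def p2_stiffness(pts):
    """6x6 matrix of int_T grad(phi_a).grad(phi_b) for the quadratic Lagrange
    basis: indices 0-2 are vertex functions, 3-5 are edge-midpoint functions
    for the edges (0,1), (1,2), (2,0)."""
    (x1, y1), (x2, y2), (x3, y3) = pts
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    area = abs(det) / 2.0
    # Constant gradients of the barycentric coordinates.
    g = [((y2 - y3) / det, (x3 - x2) / det),
         ((y3 - y1) / det, (x1 - x3) / det),
         ((y1 - y2) / det, (x2 - x1) / det)]
    edges = [(0, 1), (1, 2), (2, 0)]

    def basis_grads(lam):
        # Vertex functions lam_i (2 lam_i - 1); midpoint functions 4 lam_i lam_j.
        out = [((4 * lam[i] - 1) * g[i][0], (4 * lam[i] - 1) * g[i][1])
               for i in range(3)]
        for i, j in edges:
            out.append((4 * (lam[i] * g[j][0] + lam[j] * g[i][0]),
                        4 * (lam[i] * g[j][1] + lam[j] * g[i][1])))
        return out

    quad = [(0.5, 0.5, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5)]  # edge midpoints
    K = [[0.0] * 6 for _ in range(6)]
    for lam in quad:
        G = basis_grads(lam)
        for a in range(6):
            for b in range(6):
                K[a][b] += area / 3.0 * (G[a][0] * G[b][0] + G[a][1] * G[b][1])
    return K


def leading_minors_3x3(A):
    """Leading principal minors; all positive means A is positive definite."""
    m1 = A[0][0]
    m2 = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    m3 = (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
          - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
          + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))
    return m1, m2, m3


K = p2_stiffness([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
row_sums = [sum(row) for row in K]            # K applied to the constant function
midpoint_block = [row[3:] for row in K[3:]]   # restriction to the midpoint basis
print(row_sums)
print(leading_minors_3x3(midpoint_block))
```

The zero row sums show that constants lie in the kernel of the full matrix, so the form cannot be elliptic on all of $P_2(T)$. The positive leading minors of the midpoint block show that the restricted form is positive definite, which is what makes the local Galerkin problem solvable.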

## 15.4 Some final examples

To illustrate the workings of the finite element method, I have presented a number of numerical examples throughout this book. Most of these have been problems with known smooth solutions, as such problems are convenient for illustrating various aspects of the performance of the algorithms. This final section will examine some typical problems that lead to singular solutions. The exact solutions are not known, but the adaptive algorithm developed in the last two chapters can be used to analyze the singularities numerically.


Figure 15.10. The mesh obtained by the adaptive finite element method and corresponding computed solution for the BVP described in Section 15.4.1.

## 15.4.1 A discontinuous coefficient

If $\kappa$ is piecewise continuous and the weak form of the BVP

is solved, it can be shown that the flux $\kappa \nabla u$ of the solution is continuous. Therefore, $\nabla u$ will necessarily have a discontinuity wherever $\kappa$ does (unless $\nabla u$ is zero there). To illustrate this, I will define $\Omega$ to be the unit square,

and $f(x, y) = 1$, and solve the BVP using the adaptive algorithm described above (with the three-point element residual error estimator). Figure 15.10 shows the final mesh obtained and the corresponding solution; both the mesh and the solution show the effect of the discontinuous coefficient $\kappa$. Figure 15.11 shows the graph of $z = u(0.5, y)$, a "slice" of the computed solution. The discontinuity in the gradient is clearly visible.
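The same effect can be seen in a one-dimensional analogue, sketched below in Python. This is not the two-dimensional example of this section: it assumes the problem $-(\kappa u')' = 1$ on $(0, 1)$ with $u(0) = u(1) = 0$, and the values $\kappa = 1$ on the left half and $\kappa = 10$ on the right half are illustrative choices, since the coefficient used in the text is not reproduced here. Piecewise linear elements on a uniform mesh are used, with a tridiagonal (Thomas) solve.

```python
def solve_1d(k1, k2, n):
    """P1 finite elements for -(kappa u')' = 1, u(0) = u(1) = 0, with
    kappa = k1 on (0, 1/2) and k2 on (1/2, 1); n (even) uniform elements."""
    h = 1.0 / n
    kappa = [k1 if (i + 0.5) * h < 0.5 else k2 for i in range(n)]
    m = n - 1                                   # number of interior nodes
    # Interior node j touches elements j-1 and j; arrays are shifted by one.
    diag = [(kappa[i] + kappa[i + 1]) / h for i in range(m)]
    off = [-kappa[i + 1] / h for i in range(m - 1)]
    rhs = [h] * m                               # load vector for f = 1
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, m):
        w = off[i - 1] / diag[i - 1]
        diag[i] -= w * off[i - 1]
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0], kappa, h


n = 200
u, kappa, h = solve_1d(1.0, 10.0, n)
mid = n // 2                                    # node at the interface x = 1/2
slope_left = (u[mid] - u[mid - 1]) / h
slope_right = (u[mid + 1] - u[mid]) / h
flux_left = kappa[mid - 1] * slope_left
flux_right = kappa[mid] * slope_right
print(slope_left, slope_right)
print(flux_left, flux_right)
```

The two slopes differ by roughly the ratio of the coefficients, while the two fluxes differ only by a term of order $h$ that comes from the load $f = 1$ and vanishes under refinement.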

## 15.4.2 A reentrant corner

Elliptic regularity, described in Section 5.4, holds for planar regions $\Omega$ if $\partial\Omega$ is smooth or $\Omega$ is convex. A nonconvex region $\Omega$ with a boundary that fails to be smooth can lead to solutions with singularities even if the problem functions (coefficients, forcing functions, boundary data) are smooth. The region $\Omega$ shown in Figure 15.12 is such a region; the corner at the center of the graph is commonly referred to as a reentrant corner.
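For orientation, the typical local form of such a singularity can be written down for the Laplacian. This is a standard result quoted as background, not something derived in this section, and it assumes polar coordinates $(r, \theta)$ centered at a corner with interior angle $\omega$ and homogeneous Dirichlet conditions on both sides of the corner.

```latex
u(r, \theta) \approx c \, r^{\pi/\omega} \sin\!\left( \frac{\pi \theta}{\omega} \right),
\quad 0 < \theta < \omega,
\qquad
|\nabla u| \sim r^{\pi/\omega - 1} \ \text{ as } r \to 0 .
```

When $\omega > \pi$ (a reentrant corner), the exponent $\pi/\omega - 1$ is negative, so the gradient is unbounded near the corner unless $c = 0$. This is consistent with an adaptive algorithm concentrating its refinement there.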

Figure 15.11. The graph of $z = u(0.5, y)$, where $u$ is the computed solution from the example in Section 15.4.1. The discontinuity in the gradient is clearly seen.

Figure 15.12. The mesh obtained by the adaptive finite element method and corresponding computed solution for the BVP described in Section 15.4.2.

The simple Dirichlet BVP

was solved by the adaptive algorithm, resulting in the mesh and computed solution shown in Figure 15.12. The locally refined mesh, in a neighborhood of the reentrant corner, is magnified by factors of 4 and 16 in Figure 15.13.


Figure 15.13. The mesh from Figure 15.12, magnified 4 times (left) and 16 times (right) around the reentrant corner.

Figure 15.14. The mesh obtained by the adaptive finite element method and corresponding computed solution for the BVP described in Section 15.4.3.

## 15.4.3 Transition from Dirichlet to Neumann conditions

The final example involves mixed boundary conditions, with a singularity appearing where the boundary conditions change from Dirichlet to Neumann. The BVP is

where $\Omega$ is the unit square, $\Gamma_2$ is the interval on the $y$-axis with endpoints $(0, 0.25)$ and $(0, 0.75)$, and $\Gamma_1$ is the rest of $\partial\Omega$. The mesh obtained from the adaptive algorithm and the corresponding computed solution are graphed in Figure 15.14. The solution has singularities at the points $(0, 0.25)$ and $(0, 0.75)$, where the boundary conditions change type. This is clearly seen in Figure 15.15, where $z = u(0, y)$ is graphed.

Figure 15.15. A graph of $z = u(0, y)$, where $u$ is the computed solution from the example in Section 15.4.3. The discontinuity in the gradient is clearly seen.

## 15.5 The MATLAB implementation

In addition to routines implementing the estimators from this chapter, there are functions to estimate the actual errors, in the energy and $L^\infty$-norms, on each triangle. These routines require that the exact solution be provided. I say estimate, even in this context, because quadrature is used to estimate the integrals defining the energy norm, and the $L^\infty$-norm is estimated by sampling the given function at a judicious selection of points.

## 15.5.1 MATLAB functions

- `EJindicator1`: The Eriksson-Johnson error indicator.
- `ExpResidual1`: The explicit residual error indicator.
- `ElementResidual1`: The element residual error estimator (three- or four-point versions).
- `ElementEnergyNormErrs1`: True energy norm errors (estimated by quadrature). Requires the true solution.
- `ElementLinfNormErrs1`: True $L^\infty$-norm errors (estimated by sampling). Requires the true solution.
- `getBubbleVals`: Computes the values and gradients of the bubble functions at specified evaluation nodes on $T^R$.
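The sampling idea behind the $L^\infty$-norm estimate can be sketched in a few lines of Python. This is a hedged illustration rather than the book's routine: the text says only that the function is sampled at "a judicious selection of points", so the uniform barycentric lattice used here, the parameter `m`, and the function name are all assumptions.

```python
def linf_error_on_triangle(pts, uh_vals, u_exact, m=4):
    """Estimate max |u - u_h| on one triangle by sampling at the lattice points
    with barycentric coordinates (i, j, m - i - j) / m. Here u_h is the linear
    function with vertex values uh_vals and u_exact is a callable u(x, y)."""
    worst = 0.0
    for i in range(m + 1):
        for j in range(m + 1 - i):
            l1, l2 = i / m, j / m
            l3 = 1.0 - l1 - l2
            x = l1 * pts[0][0] + l2 * pts[1][0] + l3 * pts[2][0]
            y = l1 * pts[0][1] + l2 * pts[1][1] + l3 * pts[2][1]
            uh = l1 * uh_vals[0] + l2 * uh_vals[1] + l3 * uh_vals[2]
            worst = max(worst, abs(u_exact(x, y) - uh))
    return worst


# u(x, y) = x * y vanishes at all three vertices of the reference triangle,
# so its piecewise linear interpolant there is identically zero.
ref = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
err = linf_error_on_triangle(ref, [0.0, 0.0, 0.0], lambda x, y: x * y)
print(err)
```

Sampling can only underestimate the true maximum, so a finer lattice (larger `m`) trades cost for a tighter estimate.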

## 15.6 Exercises for Chapter 15

1. On page 337, it was stated that each triangle $T' \in \mathcal{T}_h$ belongs to at most $k$ patches $\tilde{T}$ for some finite integer $k$. Find $k$ if $\mathcal{T}_h$ is a standard uniform mesh on a rectangle (such as the mesh on the left in Figure 14.1).

2. Show that the weak form of the BVP defined by (15.10) and (15.11)-(15.13) is (15.14).

3. Perform an operation count for each of the error estimators described in this chapter. How does each compare to the cost of assembling the stiffness matrix $K$? (See Exercise 7.6.7.)

4. (MATLAB) Solve the BVP from Exercise 14.5.4 using the adaptive algorithm using the element residual error estimator. How does the total error estimated by the element residual method compare with the total error in the final solution?

5. (MATLAB) Consider the BVP

where $\Omega$ is the polygon with vertices $(1,-1)$, $(1, 1)$, $(-1, 1)$, $(-1,-1)$, and $(0, 0)$. Using an adaptive algorithm, estimate the solution with an (estimated) error of no more than 5% in the energy norm.

6. (MATLAB) Consider the linear system $KU = F$ arising from the BVP in Example 14.2 on a locally refined mesh. Solve this system using each of the following algorithms:

   (a) CG;

   (b) CG with the hierarchical basis preconditioning;

   (c) CG with SSOR preconditioning.

   Repeat for several levels of (local) refinement. Which method seems to be the most efficient?

7. (MATLAB) Repeat the previous exercise for the system $KU = F$ arising from the BVP in Example 14.3.

8. (MATLAB) Consider the BVP from Exercise 14.5.7.

   (a) Solve the BVP using the adaptive algorithm with each of the error estimators presented in this chapter. Which method is most efficient?

   (b) Does the element residual estimator accurately estimate the errors in the computed solution? If not, why not? (Hint: Consider the behavior of the true solution across the discontinuities of $\kappa$.)

9. (Programming) Extend the MATLAB codes `ExpResidual1` and `ElementResidual1` to allow a zero-order term in the PDE, as in

10. (Programming) Extend the MATLAB code `ElementEnergyNormErrs1` to allow a zero-order term in the PDE, as in

11. (MATLAB) The solution of

has a boundary layer whose severity increases with $p$. Solve this BVP on the unit square for $p = 10, 100, 1000$ and observe how the meshes change with $p$.

12. (Programming) The triangle selection algorithm involves a number of choices. The MATLAB codes `Solve` and `SelectTris` include the following features:

   - `SelectTris` always selects at least 20% of the triangles for refinement.
   - Each selected triangle is refined until the diameters of its subtriangles are at most half of its diameter.

   These choices are almost certainly not optimal. The purpose of this exercise is to design a more efficient triangle selection process. Modify the code as follows:

   (a) `SelectTris` selects at least $rN_t$ triangles to refine, where $r \in [0, 1)$ is an algorithmic parameter. (If $r = 0$, then the algorithm selects only the triangles indicated by the strategy described in Section 14.2, even if the number is small.)

   (b) `Solve` refines each selected triangle until the diameters of its subtriangles are at most $p$ times its diameter, where $p \in (0, 1]$ is an algorithmic parameter. (If $p = 1$, then `Solve` refines each selected triangle only once, which may not lead to a decreased diameter since the refinement is done by bisection.)

   (c) `Solve` calls `CG` to solve the finite element equations $KU = F$.

   By numerical testing, determine good values for $r$ and $p$. Use the test problems from Examples 14.1, 14.2, and 14.3, and measure the efficiency by the total amount of work required to reduce the error in the computed solution below a given threshold. (To measure this total work, the results of Exercises 7.6.7, 7.6.8, and 15.6.3, as well as the operation count for the CG algorithm given in Section 11.1.1, will be useful.) Be sure to include the cost of the calculations on the intermediate meshes in the totals.

Bibliography
[1] R. A. Adams. Sobolev Spaces. Academic Press, New York, 1975.

[2] Mark Ainsworth and J. Tinsley Oden. A Posteriori Error Estimation in Finite Element Analysis. John Wiley, New York, 2000.

[3] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. 3rd edition, SIAM, Philadelphia, 1999.

[4] D. N. Arnold and R. Winther. Mixed finite elements for elasticity. Numer. Math., 42:401-419, 2002.

[5] D. N. Arnold and R. Winther. Nonconforming mixed elements for elasticity. Math. Models Methods Appl. Sci., 13:295-307, 2003.

[6] Owe Axelsson. Iterative Solution Methods. Cambridge University Press, Cambridge, UK, 1994.

[7] I. Babuska and W. C. Rheinboldt. Error estimates for adaptive finite element computations. SIAM J. Numer. Anal., 15:736-754, 1978.

[8] I. Babuska and T. Strouboulis. The Finite Element Method and its Reliability. Oxford University Press, New York, 2001.

[9] Ivo Babuska and Manil Suri. Locking effects in the finite element approximation of elasticity problems. Numer. Math., 62:439-463, 1992.

[10] R. E. Bank and A. Weiser. Some a posteriori error estimators for elliptic partial differential equations. Math. Comp., 44:283-301, 1985.

[11] Mark W. Beall and Mark S. Shephard. Mesh Data Structures for Advanced Finite Element Applications. Technical Report SCOREC Report #23-1995, Scientific Computation Research Center, Rensselaer Polytechnic Institute, Troy, NY, 1995.

[12] P. Bochev and R. B. Lehoucq. On the finite element solution of the pure Neumann problem. SIAM Review, 47:50-66, 2005.

[13] Susanne C. Brenner and L. Ridgway Scott. The Mathematical Theory of Finite Element Methods. Springer-Verlag, New York, 1994.

[31] J. A. Meijerink and H. A. van der Vorst. An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix. Math. Comp., 31:148-162, 1977.

[32] William F. Mitchell. A comparison of adaptive refinement techniques for elliptic problems. ACM Trans. Math. Software, 15:326-347, 1989.

[33] Per-Olof Persson and Gilbert Strang. A simple mesh generator in MATLAB. SIAM Review, 46:329-345, 2004.

[34] Jeffrey Rauch. Partial Differential Equations. Springer-Verlag, New York, 1991.

[35] Maria-Cecilia Rivara. Algorithms for refining triangular grids suitable for adaptive and multigrid techniques. Int. J. Numer. Methods Engrg., 20:745-756, 1984.

[36] H. L. Royden. Real Analysis. 2nd edition, Macmillan, New York, 1968.

[37] Yousef Saad. Iterative Methods for Sparse Linear Systems. 2nd edition, SIAM, Philadelphia, 2003.

[38] R. Scott. Finite Element Techniques for Curved Boundaries. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, 1973.

[39] E. G. Sewell. Automatic Generation of Triangulations for Piecewise Polynomial Approximation. PhD thesis, Purdue University, West Lafayette, IN, 1972.

[40] Gilbert Strang. Variational crimes in the finite element method. In A. K. Aziz, ed., The Mathematical Foundations of the Finite Element Method with Applications to Partial Differential Equations, Academic Press, New York, 1972, pp. 689-710.

[41] Gilbert Strang and George J. Fix. An Analysis of the Finite Element Method. Wellesley-Cambridge Press, Wellesley, MA, 1988.

[42] Richard S. Varga. Matrix Iterative Analysis. Prentice-Hall, Englewood Cliffs, NJ, 1962.

[43] David M. Young. Iterative Solution of Large Linear Systems. Academic Press, New York, 1971.

[44] Harry Yserentant. On the multi-level splitting of finite element spaces. Numer. Math., 49:379-412, 1986.

[45] O. C. Zienkiewicz, B. M. Irons, F. E. Scott, and J. S. Campbell. High speed computing of elastic structures. In Proceedings of the Symposium of the International Union of Theoretical and Applied Mechanics. Liege, 1970.

Index
adaptive finite elements basic algorithm, 310 admissible displacement, 22 algebraic error, 303 approximating subspace, 70, 79 for Dirichlet conditions, 72 back substitution, 226 balance law, 9 banded matrix, 228 barycentric coordinates, 122, 161 basis, 58 hierarchical, 85 Lagrange, 70, 78 nodal, 70 best approximation, 58 bilinear form, 43 bounded, 43 elliptic, 43 BndyFcn, 137, 148 body force, 10 boundary curved, 93 of a domain, 4 boundary condition inhomogeneous Neumann, 132 boundary conditions, 4 Dirichlet, 4, 5, 7, 45, 169 essential, 30, 33 inhomogeneous Dirichlet, 74, 111, 132, 194 inhomogeneous Neumann, 194, 198 interpolated, 112 mixed, 8, 10,31,33,46,72,78 natural, 30 Neumann, 4, 5, 47, 70, 170, 177 transition, 348
357

boundary layer, 3 1 8 boundary value problem, 3, 4 Dirichlet, 20 elliptic, 8 Neumann, 7, 17, 47 one-dimensional, 64 variational form, 3, 15, 21, 22 weak form, 3, 15,21,27,34,57 bounded set, 4 bulk modulus, 12, 13,49,218 C A (ft),20 C0(ft),23 Cauchy sequence, 40 Cauchy-Schwarz inequality, 37 Cea's theorem, 61, 63 CG, 262, 276 CGsing, 262 CGSSOR, 276 change of variables in a 1-D integral, 158 in a 2-D integral, 160 Cholesky factorization, 226 of a banded matrix, 229 of a general sparse matrix, 230 Ciarlet, 199 closure of a domain, 4 CNodePtrs, 134 coarse-grid correction, 296 CoarseCircleMeshDl, 150 CoarseCircleMeshNl, 150 CoarseEllipseMeshDl, 151 CoarseEllipseMeshNl, 151 CoarseSemiCircleMeshBottomDl, 150 CoarseSemiCircleMeshDl, 150

c(n),2o

358

Index

compact set, 23 compactly supported function, 23 compatibility condition, 5, 7, 17, 256 compatibly divisible triangles, 312 complete space, 40 condition number and choice of basis, 244 of a stiffness matrix, 237 of a symmetric positive definite matrix, 237 conjugate directions, 242 conjugate gradient algorithm, 239, 242 preconditioned, 253 with hierarchical bases, 251, 254 connected set, 4 connectivity of a mesh, 129 consistency of a singular system, 261 constitutive hypothesis, 9 continuous dependence, 44 convergence of the conjugate gradient algorithm, 243 of the steepest descent algorithm, 237 degenerate family of meshes, 311 Degree, 137 Delaunay triangulation, 140 dense subset, 40 descent direction, 236 diagonally dominant matrix, 271 direct method for solving a linear system, 225 discontinuous coefficient, 346 discretization error, 303 displacement problem, 49 distmesh2d, 140 divergence of a tensor, 9 divergence operator, 6, 16 divergence theorem, 15, 18 for a vector-valued function, 18 domain, 4 dot product, 36 of tensors, 11

dual space of a normed vector space, 42 duality trick, 116 Dunavant (quadrature rules), 188 DunavantData, 203,204 edge constrained, 72 free, 72 EdgeCFlags, 137 EdgeEls, 137 Edges, 135 efficiency of high-order elements, 201 EJ Indicator, 349 elasticity, 8 general, 34 isotropic, 9, 48, 213 element quadrilateral, 87, 165 rectangular, 86 reference, 87, 91, 188, 195 triangular, 68 ElementEnergyNormErrs, 349 ElementLinfNormErrs, 349 ElementResidual, 349 Elements, 135 elliptic regularity, 115 EnergyNorm, 205 EnergyNorml, 182 EnergyNorm2, 204 EnergyNormErr, 205 EnergyNormErrl, 173, 182 EnergyNormErr2, 204 error bound a posteriori, 106 a priori, 105 asymptotic, 105 finite element solution, 111 inhomogeneous Dirichlet problem, 114 isoparametric finite elements, 122 piecewise linear interpolation, 107 piecewise polynomial interpolation, 109 quadrature, 121

Index

359
getDirichletData, 183 getFBndyEdgeNodes, 325 getGrad 1,325 getGradientsl, 183 getNeumannData, 206 getNeumannDatal, 183 getNeumannData la, 183 getNeumannData2, 205 getNeumannData2a, 205 getNeumannDatalso, 221 getNodal Values, 183 getNodes, 205 getNodesl, 166, 183 getNormal, 206 getNormall, 183 getNorma!2, 205 getOppositeVertex, 325 getOtherEdges, 325 getTriNodelndices, 205 getTrilModelndicesl, 183 getVertices, 205 global indices, 69 gradient, 9 Gram matrix, 59 Green's (first) identity, 18 alternate version, 19 extension, 19 //'(ft), 27, 71 half-bandwidth, 228 harmonic function, 4 heat flux, 6, 7, 16,212 help in MATLAB, 144 hierarchical basis, 244 HierCG 1,262 HierCGsing 1,262 HierToNodal 1,262 HierToNodalTransl, 262 Hilbert space, 40 hyperelasticity, 11 ill-conditioned matrix, 237 incomplete Cholesky factorization, 255 incompressible material, 218 indices, 69

error estimate a posteriori, 180, 317 heuristic, 179,202 error estimator element residual, 340 explicit versus implicit, 329 quadratic, 317 error indicator, 329 Eriksson-Johnson, 331 explicit residual, 338 EvalNodalBasisFcns, 203, 204 EvalNodalBasisFcnslD, 204 EvalNodalBasisGrads, 203, 204 EvalPWPolyFcn2, 204 existence, 4 ExpResidual, 349 ExtractLinearMesh, 204 fast Poisson solver, 255 FBndyEdges, 137 fill-in, 228 reducing, 231 finite differences for the Laplacian, 281 finite element method, 3, 67 Galerkin, 3 fixed point, 267 fixed point iteration, 267 FNodePtrs, 134 forward substitution, 226 Fourier modes, 284 Fourier's law of heat conduction, 6, 212 Friedrich's inequality, 48 fullmgl,305 fundamental theorem of calculus, 17 Galerkin method, 57, 60, 295 Gauss-Seidel iteration, 272, 289 GaussData, 204 GenLagrangeMesh, 205 GenLagrangeMesh2, 203 getAdjacentTriangle, 325 getAdjacentTriangles, 325 getBubbleVals, 349 getDiameter, 325 getDiameters, 325

360

Index

induced matrix norm, 268 infinitesimal rigid displacements, 12 inner product, 36, 57 H\39 L 2 , 38 energy, 43 integral Lebesgue, 28 Riemann, 28 integration by parts in multiple dimensions, 1 8 interior node, 192 isoparametric, 199 interpolant piecewise linear, 107 piecewise polynomial, 109 Interpolate 1, 180, 183,305 InterpolateTransl, 305 interpolation operator, 293 IntNodes, 187 isoparametric finite elements, 95, 121,1 95, 201 iterative method for solving a linear system, 225 Jacobi, 276 Jacobi iteration, 271, 285 weighted, 287 Jacobi preconditioning, 254 Jacobian determinant, 93, 160 matrix, 90 nonconstant, 98, 197 JiggleMeshl, 151 Korn's inequality, 49, 5 1 Krylov subspace, 241 L 2 (ft),27 L 2 (a,fc),37 L2Norml, 182 L2Norm2, 204 L2NormErrl, 182 L2NormErr2, 204

Lagrange triangle cubic, 82 linear, 78 quadratic, 78 Lame moduli, 9,214 Laplace operator, 3, 16 Laplace's equation, 3 Laplacian, 3 Lax-Milgram theorem, 52 Lenoir, 199 LevelNodes, 140 line search, 236 linear functional, 41 bounded, 42 LinfNormErrl, 182 Load, 205 load vector, 60, 93, 169, 195 element-oriented assembly, 132 Loadl, 171, 182 Load2, 203, 204 Loadlso, 221 local indices, 69 locally integrable function, 24 LocalRefine 1,324 longest-edge bisection, 313 MakeEdgesCurvedl, 151 MakeMeshl, 140, 151 mass matrix, 104 matrix banded, 228 diagonally dominant, 271 ill-conditioned, 85 sparse, 74 symmetric positive definite, 65 matrix conductivity problem, 213 matrix norm, 268 induced, 268 measurable function, 28 membrane isotropic, 8 small vertical deflections of, 7 Mesh, 205 mesh, 67 data structure, 134, 187 nonconforming, 68, 311

Index nonuniform, 95 quality, 143 uniform versus locally refined, 309 mesh locking, 219 mesh size, 69 Mesh 1,150 Mesh2, 203 MeshQualityl, 144, 151 mgmul, 304 midpoint rule, 157 multigrid //-cycle, 299 full multigrid, 301 two-grid algorithm, 295 V-cycle, 296 V-cycle, recursive version, 299 W-cycle, 299 Neumann data, 177 NeumannMesh, 151 newest node bisection, 312 NGonMeshDl, 151 nodal value, 69 NodalToHier 1,263 NodalToHierTransl, 263 node constrained, 72 free, 72 interior, 192 NodeParents, 140 NodePtrs, 135 Nodes, 134 nodes constrained, 79 free, 79 nondegenerate family of triangulations, 107 norm, 36 equivalent, 43 of a linear functional, 41 of a matrix, 268 normal derivative, 4 normal equations, 59 notation summary, 151 numerical integration, 119 open set, 4 orthogonal basis, 59 orthogonal vectors, 57 partial derivative classical, 23 weak, 24, 71 piecewise linear function continuous, 69, 107 piecewise polynomial, 3, 67, 109 piecewise quadratic function continuous, 78 Poincare's inequality, 46 Poisson's equation, 4 Poisson's ratio, 9, 13,217 polynomial, 67 bilinear, 86 postsmoothing, 296 potential energy, 21, 62 minimal, 22 preconditioner, 252 presmoothing, 296 product rule, 17 in multiple dimensions, 17, 18 programming object-oriented, 134 procedural, 134 projection theorem, 58 Pythagorean theorem, 58 QuadElementErrEstl, 324 quadratic form, 235 quadratic penalty method, 258 quadrature degree of precision, 156 Gaussian, 1-D, 156 midpoint rule, 157 on a square, 165 on a triangle, 158, 188, 196 product Gauss rule, 165 Simpson's rule, 156 trapezoidal rule, 156 quadrature rule, 119 degree of precision, 119

R^n, 35
rank-one matrix, 260
rational numbers, 39
Raviart, 199
RectangleMeshD1, 150, 173, 185
RectangleMeshD1a, 185
RectangleMeshN1, 150
RectangleMeshTopD1, 150
RectangleMeshTopLeftD1, 150
RectangleMeshTopLeftN1, 150
reentrant corner, 346
Refine1, 139, 151
refinement
  and curved boundaries, 147
  green, 311
  standard, 106
RefTri, 204
residual, 293
Riemann sum, 38
Riesz representation theorem, 42, 47-49
rigid displacement
  infinitesimal, 11, 50
Ritz method, 62
Scott, 199
SelectTris, 324
shape functions, 78, 97
  rational, 89
shear modulus, 12, 13, 49, 218
ShowDisplacement, 220
ShowMesh, 205
ShowMesh1, 144, 151
ShowMesh2, 204, 205
ShowPWConstFcn, 151
ShowPWLinFcn1, 148, 151
ShowPWPolyFcn, 205
ShowPWPolyFcn2, 204
ShowSupport1, 151
Simpson's rule, 156
Sobolev space, 27, 28
Solve, 324
Solve1, 324
SOR, 273, 276
  optimal, 273
sparse matrix, 59
sparsity pattern, 76, 80
spectral radius of a matrix, 269
splitting of a matrix, 270
SSOR, 276
SSOR preconditioning, 255
stability, 44
stationary iteration, 267
stationary point, 267
steady-state heat flow, 5, 16
steepest descent
  algorithm, 237
  direction, 237
Stiffness, 205
stiffness matrix, 60, 93, 167, 193
  element-oriented assembly, 130
  hierarchical versus nodal bases, 249
  node-oriented assembly, 128
  singular, 178, 256
Stiffness1, 182
Stiffness2, 203, 204
StiffnessE, 325
StiffnessIso, 220
StiffnessMC, 221
strain tensor
  linearized, 9
stress tensor, 9
stress-strain law
  general, 10
struct
  MATLAB, 138
subspace, 39, 58
successive overrelaxation, 273
support of a function, 23, 129
symamd, 231
symmetric minimum degree permutation, 231
symmetric reverse Cuthill-McKee algorithm, 231
symmetric successive overrelaxation, 255, 274
  and conjugate gradients, 275
symrcm, 231
temperature gradient, 6
test function, 20
TestConv, 206
TestConv1, 183

TestConv2, 204
TestConvIso, 221
TestConvMC, 221
thermal conductivity, 6, 45
  anisotropic, 213
total error, 303
trace, 9
trace theorem, 28, 46
traction problem, 10, 49, 50
TransToRefTri, 204
trapezoidal rule, 156
triangle
  green, 311
  with curved edge, 97
triangle inequality, 3
triangle selection for local refinement, 315
triangle-node list, 129
triangulation, 68
  Delaunay, 140
  nonconforming, 68
uniqueness, 4
unRefine1, 305
variation
  first, 22
variational crime, 118
variational problem
  nonsymmetric, 51, 63
vector space, 35
work unit, 296
Young's modulus, 9, 13, 217
