
Lecture Notes

in Computational Science

and Engineering 61

Editors

Timothy J. Barth

Michael Griebel

David E. Keyes

Risto M. Nieminen

Dirk Roose

Tamar Schlick

Tarek P. A. Mathew

Domain Decomposition

Methods for the Numerical

Solution of Partial

Differential Equations

With 40 Figures and 1 Table

ABC

Tarek Poonithara Abraham Mathew

tmathew@poonithara.org

ISBN 978-3-540-77205-7 e-ISBN 978-3-540-77209-5

Lecture Notes in Computational Science and Engineering ISSN 1439-7358

Library of Congress Control Number: 2008921994

Mathematics Subject Classification (2000): 65F10, 65F15, 65N22, 65N30, 65N55, 65M15, 65M55, 65K10

© 2008 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: WMX Design GmbH, Heidelberg

Printed on acid-free paper

9 8 7 6 5 4 3 2 1

springer.com

In loving dedication to my (late) dear mother, and to my dear father and brother

Preface

These notes serve as an introduction to a subject of study in computational mathematics referred to as domain decomposition methods. It concerns divide and conquer methods for the numerical solution and approximation of partial differential equations, primarily of elliptic or parabolic type. The methods in this family include iterative algorithms for the solution of partial differential equations, techniques for the discretization of partial differential equations on non-matching grids, and techniques for the heterogeneous approximation of partial differential equations of heterogeneous character. The divide and conquer methodology used is based on a decomposition of the domain of the partial differential equation into smaller subdomains, and by design is suited for implementation on parallel computer architectures. However, even on serial computers, these methods can provide flexibility in the treatment of complex geometry and heterogeneities in a partial differential equation.

Interest in this family of computational methods for partial differential equations was spawned following the development of various high performance multiprocessor computer architectures in the early eighties. On such parallel computer architectures, the execution time of these algorithms, as well as the memory requirements per processor, scale reasonably well with the size of the problem and the number of processors. From a computational viewpoint, the divide and conquer methodology based on a decomposition of the domain of the partial differential equation yields algorithms having coarse granularity, i.e., a significant portion of the computations can be implemented concurrently on different processors, while the remaining portion requires communication between the processors. As a consequence, these algorithms are well suited for implementation on MIMD (multiple instruction, multiple data) architectures. Currently, such parallel computer architectures can alternatively be simulated using a cluster of workstations networked with high speed connections using communication protocols such as MPI (Message Passing Interface) [GR15] or PVM (Parallel Virtual Machines) [GE2].


The mathematical roots of this subject trace back to the seminal work of H. A. Schwarz [SC5] in the nineteenth century. Schwarz proposed an iterative method, now referred to as the Schwarz alternating method, for constructing harmonic functions on regions of irregular shape which can be expressed as the union of subregions of regular shape (such as rectangles and spheres). His motivation was primarily theoretical, to establish the existence of harmonic functions on irregular regions, and his method was not used in computations until recently [SO, MO2, BA2, MI, MA37, DR11, LI6, LI7, BR18].
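As an illustrative sketch (not from the notes), the alternating method can be applied to the 1D Poisson problem −u″ = 1 on (0, 1) with u(0) = u(1) = 0, whose exact solution is u(x) = x(1 − x)/2; two overlapping intervals repeatedly exchange Dirichlet data at their interior boundaries. The grid size and subdomain endpoints below are arbitrary choices.

```python
# Schwarz alternating method for -u'' = 1 on (0,1), u(0)=u(1)=0.
# Exact solution: u(x) = x(1-x)/2. Two overlapping subdomains.
import numpy as np

n = 21                          # global grid points x_0 .. x_20
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
u = np.zeros(n)                 # current global iterate (zero initial guess)

def solve_local(lo, hi, left, right):
    """Solve -u'' = 1 at interior points lo+1..hi-1 with Dirichlet data left, right."""
    m = hi - lo - 1
    A = (np.diag(np.full(m, 2.0)) + np.diag(np.full(m - 1, -1.0), 1)
         + np.diag(np.full(m - 1, -1.0), -1)) / h**2
    b = np.ones(m)
    b[0] += left / h**2          # known boundary values moved to the right-hand side
    b[-1] += right / h**2
    return np.linalg.solve(A, b)

# Overlapping subdomains: grid indices [0,12] and [8,20] (overlap [8,12]).
for sweep in range(50):
    u[1:12] = solve_local(0, 12, 0.0, u[12])   # solve on Omega_1*, data from Omega_2*
    u[9:20] = solve_local(8, 20, u[8], 0.0)    # solve on Omega_2*, data from Omega_1*

exact = x * (1 - x) / 2
```

With this overlap the error contracts by a roughly fixed factor each sweep, so the iterates reach the discrete solution (here exact at the nodes, since u is quadratic) to machine precision.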

A general development of domain decomposition methodology for partial differential equations occurred only subsequent to the development of parallel computer architectures, though divide and conquer methods such as Kron's method for electrical circuits [KR] and the substructuring method [PR4] in structural engineering pre-date domain decomposition methodology. Usage of the term "domain decomposition" seems to have originated around the mid-eighties [GL2] when interest in these methods gained momentum. The first international symposium on this subject was held in Paris in 1987, and since then there have been yearly international conferences on this subject, attracting interdisciplinary interest from communities of engineers, applied scientists and computational mathematicians from around the globe.

Early literature on domain decomposition methods focused primarily on iterative procedures for the solution of partial differential equations. As the methodology evolved, however, techniques were also developed for coupling discretizations on subregions with non-matching grids, and for constructing heterogeneous approximations of complicated systems of partial differential equations having heterogeneous character. The latter approximations are built by solving local equations of different character. From a mathematical viewpoint, these diverse categories of numerical methods for partial differential equations may be derived within several frameworks. Each decomposition of a domain typically suggests a reformulation of the original partial differential equation as an equivalent coupled system of partial differential equations posed on the subdomains, with boundary conditions chosen to match solutions on adjacent subdomains. Such equivalent systems are referred to in these notes as hybrid formulations, and provide a framework for developing novel domain decomposition methods. Divide and conquer algorithms can be obtained by numerical approximation of hybrid formulations. Four hybrid formulations are considered in these notes, suited for equations primarily of elliptic type:

• The Schwarz formulation.

• The Steklov-Poincaré (substructuring or Schur complement) formulation.

• The Lagrange multiplier formulation.

• The Least squares-control formulation.

Alternative hybrid formulations are also possible, see [CA7, AC5].


The applicability and stability of each hybrid formulation depends on the underlying partial differential equation and subdomain decomposition. For instance, the Schwarz formulation requires an overlapping decomposition, while the Steklov-Poincaré and Lagrange multiplier formulations are based on a non-overlapping decomposition. The least squares-control method can be formulated given overlapping or non-overlapping decompositions. Within each framework, novel iterative methods, discretization schemes on non-matching grids, and heterogeneous approximations of the original partial differential equation can be developed based on the associated hybrid formulations.

In writing these notes, the author has attempted to provide an accessible introduction to the important methodologies in this subject, emphasizing a matrix formulation of algorithms. However, as the literature on domain decomposition methods is vast, various topics have either been omitted or only touched upon. The methods described here apply primarily to equations of elliptic or parabolic type; applications to hyperbolic equations [QU2], and to spectral or p-version elements [BA4, PA16, SE2, TO10], have been omitted. Applications to the equations of elasticity and to Maxwell's equations have also been omitted, see [TO10]. Parallel implementation is covered in greater depth in [GR12, GR10, FA18, FA9, GR16, GR17, HO4, SM5, BR39]. For additional domain decomposition theory, see [XU3, DR10, XU10, TO10]. A broader discussion on heterogeneous domain decomposition can be found in [QU6], and on FETI-DP and BDDC methods in [TO10, MA18, MA19]. For additional bibliography on domain decomposition, see http://www.ddm.org.

Readers are assumed to be familiar with the basic properties of elliptic and parabolic partial differential equations [JO, SM7, EV] and traditional methods for their discretization [RI, ST14, CI2, SO2, JO2, BR28, BR]. Familiarity is also assumed with basic numerical analysis [IS, ST10], computational linear algebra [GO4, SA2, AX, GR2, ME8], and elements of optimization theory [CI4, DE7, LU3, GI2]. Selected background topics are reviewed in various sections of these notes. Chap. 1 provides an overview of domain decomposition methodology in a context involving two subdomain decompositions. Four different hybrid formulations are illustrated for a model coercive 2nd order elliptic equation. Chapters 2, 3 and 4 describe the matrix implementation of multisubdomain domain decomposition iterative algorithms for traditional discretizations of self adjoint and coercive elliptic problems. These chapters should ideally be read prior to the other chapters. Readers unfamiliar with constrained minimization problems and their saddle point formulation may find it useful to review background in Chap. 10 or in [CI4], as saddle point methodology is employed in Chaps. 1.4 and 1.5 and in Chaps. 4 and 6. With a few exceptions, the remaining chapters may be read independently.


The author expresses his deep gratitude to the anonymous referees who made numerous suggestions for revision and improvement of the manuscript. Deep gratitude is also expressed to Prof. Olof Widlund, who introduced the author to this subject over twenty years ago, to Prof. Tony Chan, for his kind encouragement to embark on writing a book extending our survey paper on this subject [CH11], and to Prof. Xiao-Chuan Cai, Prof. Marcus Sarkis and Prof. Junping Wang for their research collaborations and numerous insightful discussions over the years. The author deeply thanks Prof. Timothy Barth for his kind permission to use the figure on the cover of this book, and for use of Fig. 5.1. To former colleagues at the University of Wyoming, and to professors Myron Allen, Gastao Braga, Benito Chen, Duilio Conceição, Max Dryja, Frederico Furtado, Juan Galvis, Etereldes Gonçalves, Raytcho Lazarov, Mary Elizabeth Ong, Peter Polyakov, Giovanni Russo, Christian Schaerer, Shagi-Di Shih, Daniel Szyld, Panayot Vassilevski and Henrique Versieux, the author expresses his deep gratitude. Deep appreciation is also expressed to the editors of the LNCSE series, Dr. Martin Peters, Ms. Thanh-Ha LeThi, and Mr. Frank Holzwarth for their patience and kind help during the completion of this manuscript. Finally, deep appreciation is expressed to Mr. Elumalai Balamurugan for his kind assistance with reformatting the text. The author welcomes comments and suggestions from readers, and hopes to post updates at www.poonithara.org/publications/dd.

January 2008 Tarek P. A. Mathew

Contents

1 Decomposition Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Hybrid Formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Schwarz Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3 Steklov-Poincaré Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.4 Lagrange Multiplier Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

1.5 Least Squares-Control Framework . . . . . . . . . . . . . . . . . . . . . . . . . 36

2 Schwarz Iterative Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

2.2 Projection Formulation of Schwarz Algorithms . . . . . . . . . . . . . . 56

2.3 Matrix Form of Schwarz Subspace Algorithms . . . . . . . . . . . . . . 66

2.4 Implementational Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

2.5 Theoretical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

3 Schur Complement and Iterative Substructuring Algorithms . . . . . . . . 107

3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

3.2 Schur Complement System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

3.3 FFT Based Direct Solvers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

3.4 Two Subdomain Preconditioners . . . . . . . . . . . . . . . . . . . . . . . . . . 140

3.5 Preconditioners in Two Dimensions . . . . . . . . . . . . . . . . . . . . . . . . 155

3.6 Preconditioners in Three Dimensions . . . . . . . . . . . . . . . . . . . . . . . 162

3.7 Neumann-Neumann and Balancing Preconditioners . . . . . . . . . . 175

3.8 Implementational Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

3.9 Theoretical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

4 Lagrange Multiplier Based Substructuring: FETI Method . . . . . . . . . . 231

4.1 Constrained Minimization Formulation . . . . . . . . . . . . . . . . . . . . . 232

4.2 Lagrange Multiplier Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . 239

4.3 Projected Gradient Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

4.4 FETI-DP and BDDC Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250


5 Computational Issues and Parallelization . . . . . . . . . . . . . . . . . . . . 263

5.1 Algorithms for Automated Partitioning of Domains . . . . . . . . . . 264

5.2 Parallelizability of Domain Decomposition Solvers . . . . . . . . . . . 280

6 Least Squares-Control Theory: Iterative Algorithms . . . . . . . . . . . . . 295

6.1 Two Overlapping Subdomains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296

6.2 Two Non-Overlapping Subdomains . . . . . . . . . . . . . . . . . . . . . . . . 303

6.3 Extensions to Multiple Subdomains . . . . . . . . . . . . . . . . . . . . . . . . 310

7 Multilevel and Local Grid Refinement Methods . . . . . . . . . . . . . . . . 313

7.1 Multilevel Iterative Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314

7.2 Iterative Algorithms for Locally Reﬁned Grids . . . . . . . . . . . . . . 321

8 Non-Self Adjoint Elliptic Equations: Iterative Methods . . . . . . . . . . . 333

8.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

8.2 Diﬀusion Dominated Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340

8.3 Advection Dominated Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348

8.4 Time Stepping Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364

8.5 Theoretical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366

9 Parabolic Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377

9.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378

9.2 Iterative Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381

9.3 Non-Iterative Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384

9.4 Parareal-Multiple Shooting Method . . . . . . . . . . . . . . . . . . . . . . . . 401

9.5 Theoretical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408

10 Saddle Point Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417

10.1 Properties of Saddle Point Systems . . . . . . . . . . . . . . . . . . . . . . . . 418

10.2 Algorithms Based on Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426

10.3 Penalty and Regularization Methods . . . . . . . . . . . . . . . . . . . . . . . 434

10.4 Projection Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437

10.5 Krylov Space and Block Matrix Methods . . . . . . . . . . . . . . . . . . . 445

10.6 Applications to the Stokes and Navier-Stokes Equations . . . . . . 456

10.7 Applications to Mixed Formulations of Elliptic Equations . . . . . 474

10.8 Applications to Optimal Control Problems . . . . . . . . . . . . . . . . . . 489

11 Non-Matching Grid Discretizations . . . . . . . . . . . . . . . . . . . . . . . 515

11.1 Multi-Subdomain Hybrid Formulations . . . . . . . . . . . . . . . . . . . . . 516

11.2 Mortar Element Discretization: Saddle Point Approach . . . . . . . 523

11.3 Mortar Element Discretization: Nonconforming Approach . . . . . 551

11.4 Schwarz Discretizations on Overlapping Grids . . . . . . . . . . . . . . . 555

11.5 Alternative Nonmatching Grid Discretization Methods . . . . . . . 559

11.6 Applications to Parabolic Equations . . . . . . . . . . . . . . . . . . . . . . . 564


12 Heterogeneous Domain Decomposition Methods . . . . . . . . . . . . . . . 575

12.1 Steklov-Poincaré Heterogeneous Model . . . . . . . . . . . . . . . . . . . 576

12.2 Schwarz Heterogeneous Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585

12.3 Least Squares-Control Heterogeneous Models . . . . . . . . . . . . . . . 589

12.4 χ-Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594

12.5 Applications to Parabolic Equations . . . . . . . . . . . . . . . . . . . . . . . 603

13 Fictitious Domain and Domain Imbedding Methods . . . . . . . 607

13.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608

13.2 Preconditioners for Neumann Problems . . . . . . . . . . . . . . . . . . . . 610

13.3 Preconditioners for Dirichlet Problems . . . . . . . . . . . . . . . . . . . . . 611

13.4 Lagrange Multiplier and Least Squares-Control Solvers . . . . . . . 614

14 Variational Inequalities and Obstacle Problems . . . . . . . . . . . . 621

14.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622

14.2 Projected Gradient and Relaxation Algorithms . . . . . . . . . . . . . . 628

14.3 Schwarz Algorithms for Variational Inequalities . . . . . . . . . . . . . 633

14.4 Monotone Convergence of Schwarz Algorithms . . . . . . . . . . . . . . 636

14.5 Applications to Parabolic Variational Inequalities . . . . . . . . . . . . 644

15 Maximum Norm Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647

15.1 Maximum Principles and Comparison Theorems . . . . . . . . . . . . . 648

15.2 Well Posedness of the Schwarz Hybrid Formulation . . . . . . . . . . 659

15.3 Convergence of Schwarz Iterative Algorithms . . . . . . . . . . . . . . . . 661

15.4 Analysis of Schwarz Nonmatching Grid Discretizations . . . . . . . 668

15.5 Analysis of Schwarz Heterogeneous Approximations . . . . . . . . . . 674

15.6 Applications to Parabolic Equations . . . . . . . . . . . . . . . . . . . . . . . 677

16 Eigenvalue Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 679

16.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 680

16.2 Gradient and Preconditioned Gradient Methods . . . . . . . . . . . . . 682

16.3 Schur Complement Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683

16.4 Schwarz Subspace Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 684

16.5 Modal Synthesis Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 686

17 Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 689

17.1 Traditional Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 690

17.2 Schwarz Minimization Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 697

18 Helmholtz Scattering Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699

18.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 700

18.2 Non-Overlapping and Overlapping Subdomain Methods . . . . . . 701

18.3 Fictitious Domain and Control Formulations . . . . . . . . . . . . . . . . 704

18.4 Hilbert Uniqueness Method for Standing Waves . . . . . . . . . . . . . 705

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 761

1 Decomposition Frameworks

In this chapter, we introduce and illustrate several principles employed in the formulation of domain decomposition methods for an elliptic equation. In our discussion, we focus on a two subdomain decomposition of the domain of the elliptic equation, into overlapping or non-overlapping subdomains, and introduce the notion of a hybrid formulation of the elliptic equation. A hybrid formulation is a coupled system of elliptic equations which is equivalent to the original elliptic equation, with unknowns representing the true solution on each subdomain. Such formulations provide a natural framework for the construction of divide and conquer methods for an elliptic equation. Using a hybrid formulation, we heuristically illustrate how novel divide and conquer iterative methods, non-matching grid discretizations and heterogeneous approximations can be constructed for an elliptic equation.

We illustrate four alternative hybrid formulations for an elliptic equation. Each will be described for a decomposition of the domain into two subdomains, either overlapping or non-overlapping. We shall describe the following:

• Schwarz formulation.

• Steklov-Poincaré formulation.

• Lagrange multiplier formulation.

• Least squares-control formulation.

For each hybrid formulation, we illustrate how iterative methods, non-matching grid discretizations and heterogeneous approximations can be formulated for the elliptic equation based on its two subdomain decomposition. In Chap. 1.1, we introduce notation and heuristically describe the structure of a hybrid formulation. Chap. 1.2 describes a two subdomain Schwarz hybrid formulation, based on overlapping subdomains. Chap. 1.3 describes the Steklov-Poincaré formulation, based on two non-overlapping subdomains. The Lagrange multiplier formulation described in Chap. 1.4 applies only for a self adjoint and coercive elliptic equation, and it employs two non-overlapping subdomains. Chap. 1.5 describes the least squares-control formulation for a two subdomain overlapping or non-overlapping decomposition.


1.1 Hybrid Formulations

Given a subdomain decomposition, a hybrid formulation of an elliptic equation is an equivalent coupled system of elliptic equations involving unknowns on each subdomain. In this section, we introduce notation for an elliptic equation and heuristically describe the structure of its two subdomain hybrid formulation. We outline how divide and conquer iterative methods, non-matching grid discretizations, and heterogeneous approximations can be constructed for an elliptic equation, using a hybrid formulation of it. Four commonly used hybrid formulations are described in Chap. 1.2 through Chap. 1.5.

1.1.1 Elliptic Equation

We shall consider the following 2nd order elliptic equation:

  L u ≡ −∇ · (a(x)∇u) + b(x) · ∇u + c(x) u = f,   in Ω
  u = 0,   on ∂Ω,                                        (1.1)

for Ω ⊂ ℝᵈ. The coefficient a(x) will be assumed to satisfy:

  0 < a0 ≤ a(x),   ∀x ∈ Ω,

while b(x) and c(x) ≥ 0 will be assumed to be smooth, and f(x) ∈ L²(Ω). Additional restrictions will be imposed on the coefficients as required.
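As a quick illustration (not part of the text), with a(x) = 1, b(x) = 0 and c(x) = 0 the operator L reduces to −Δ, and for u(x, y) = sin(πx) sin(πy) on the unit square one has L u = 2π² u; a centered finite-difference check reproduces this to O(h²).

```python
# Finite-difference check that -Δu = 2π²u for u = sin(πx)sin(πy) on (0,1)².
import numpy as np

n = 64
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
X, Y = np.meshgrid(x, x, indexing="ij")
U = np.sin(np.pi * X) * np.sin(np.pi * Y)

# Five-point stencil for -Δu at the interior grid points.
Lu = (4 * U[1:-1, 1:-1] - U[:-2, 1:-1] - U[2:, 1:-1]
      - U[1:-1, :-2] - U[1:-1, 2:]) / h**2

F = 2 * np.pi**2 * U[1:-1, 1:-1]     # exact right-hand side f = L u
err = float(np.max(np.abs(Lu - F)))  # O(h²) truncation error
```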

1.1.2 Weak Formulation

A weak formulation of (1.1) is typically obtained by multiplying it by a sufficiently smooth test function v(x) and integrating the diffusion term by parts on Ω. It will seek u ∈ H₀¹(Ω) satisfying:

  ⎧ A(u, v) = F(v),   ∀v ∈ H₀¹(Ω),   where
  ⎨ A(u, v) ≡ ∫_Ω ( a(x) ∇u · ∇v + (b(x) · ∇u) v + c(x) u v ) dx        (1.2)
  ⎩ F(v) ≡ ∫_Ω f v dx,

where the Sobolev space H₀¹(Ω) is formally defined as below [NE, LI4, JO2]:

  H₀¹(Ω) ≡ { v ∈ H¹(Ω) : v = 0 on ∂Ω },

while the space H¹(Ω) is defined as:

  H¹(Ω) ≡ { v ∈ L²(Ω) : ‖v‖²₁,Ω < ∞ },   where   ‖v‖²₁,Ω ≡ ∫_Ω ( v² + |∇v|² ) dx,

for ∇v ≡ ( ∂v/∂x1, . . . , ∂v/∂xd ). The bilinear form A(·, ·) will be coercive if:

  A(u, u) ≥ α ‖u‖²₁,Ω,   ∀u ∈ H₀¹(Ω),

for some α > 0 independent of u. Coercivity of A(·, ·) is guaranteed to hold by the Poincaré-Friedrichs inequality, see [NE].
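As a small illustration (not from the text), the bilinear form A(u, v) of (1.2) can be assembled in 1D with piecewise linear (hat) finite element functions for a(x) = 1, b(x) = 0, c(x) = 0; coercivity of A(·, ·) then shows up as positive definiteness of the assembled matrix.

```python
# 1D P1 finite element matrix for A(u,v) = ∫ u'v' dx on (0,1) with zero
# boundary values. For hat functions, A_ii = 2/h and A_{i,i±1} = -1/h.
import numpy as np

n = 16                      # interior nodes
h = 1.0 / (n + 1)

A = np.zeros((n, n))
for i in range(n):
    A[i, i] = 2.0 / h
    if i + 1 < n:
        A[i, i + 1] = A[i + 1, i] = -1.0 / h

# Discrete coercivity: A is symmetric positive definite, so all eigenvalues
# of the assembled matrix are strictly positive.
eigs = np.linalg.eigvalsh(A)
```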

1.1.3 Discretization

A finite element discretization of (1.1) is obtained by Galerkin approximation of (1.2), see [ST14, CI2, JO2, BR28, BR]. Let Th(Ω) denote a triangulation of Ω with elements of size h and let Vh denote the space of continuous piecewise linear finite element functions on Th(Ω). If {φ1, . . . , φn} forms a basis for Vh ∩ H₀¹(Ω), then the finite element discretization of (1.1) will yield the system:

  A u = f,

where Aij = A(φi, φj) for 1 ≤ i, j ≤ n and fi = F(φi) for 1 ≤ i ≤ n.

1.1.4 Subdomain Decompositions

We shall employ the following notation.

Definition 1.1. A collection of two open subregions Ωi ⊂ Ω for i = 1, 2 will be referred to as a non-overlapping decomposition of Ω if the following hold:

  Ω̄1 ∪ Ω̄2 = Ω̄   and   Ω1 ∩ Ω2 = ∅.

Boundaries of the subdomains will be denoted ∂Ωi, and their interior and exterior segments by B(i) ≡ ∂Ωi ∩ Ω and B[i] ≡ ∂Ωi ∩ ∂Ω, respectively. We will denote the common interface by B ≡ ∂Ω1 ∩ ∂Ω2, see Fig. 1.1.

Definition 1.2. A collection of two open subregions Ωi∗ ⊂ Ω for i = 1, 2 will be referred to as an overlapping decomposition of Ω if the following holds:

  Ω1∗ ∪ Ω2∗ = Ω.

Boundaries of the subdomains will be denoted Bi ≡ ∂Ωi∗, and their interior and exterior segments by B(i) ≡ ∂Ωi∗ ∩ Ω and B[i] ≡ ∂Ωi∗ ∩ ∂Ω, respectively.

Fig. 1.1. Two subdomain decompositions: non-overlapping subdomains Ω1, Ω2 and overlapping subdomains Ω1∗, Ω2∗

Remark 1.3. In applications, a decomposition of Ω into subdomains can be chosen based either on the geometry of Ω or on the regularity of the solution u (if known). An overlapping subdomain Ωi∗ can, if desired, be constructed from a nonoverlapping subdomain Ωi by extending it to include all points in Ω within a distance β > 0 of Ωi, yielding uniform overlap.

1.1.5 Partition of Unity

A partition of unity subordinate to the overlapping subdomains Ω1∗ and Ω2∗ consists of smooth functions χ1(x) and χ2(x) satisfying:

  ⎧ χi(x) ≥ 0,   in Ω̄i∗
  ⎨ χi(x) = 0,   in Ω \ Ω̄i∗        (1.3)
  ⎩ χ1(x) + χ2(x) = 1,   in Ω.

Heuristically, each χi(x) may be required to satisfy a bound of the form |∇χi(x)| ≤ C h0⁻¹, where h0 denotes the diameter of each subdomain Ωi∗.

Remark 1.4. Given an overlapping decomposition, a continuous partition of unity subordinate to Ω1∗ and Ω2∗ can be computed as follows. Let di(x) denote the distance function:

  ⎧ di(x) = dist( x, B(i) ),   if x ∈ Ω̄i∗
  ⎩ di(x) = 0,                 if x ∉ Ω̄i∗        (1.4)

where B(i) ≡ (∂Ωi∗ ∩ Ω). By construction, each di(x) will be continuous, nonnegative, with support in Ω̄i∗. Then, formally define:

  χi(x) ≡ di(x) / ( d1(x) + d2(x) ),   for 1 ≤ i ≤ 2.        (1.5)

By construction, each χi(x) will be continuous and satisfy the desired properties. To obtain a smooth function χi(x), each di(x) may first be mollified, see [ST9].

Given a non-overlapping decomposition Ω1 and Ω2 of Ω, we shall sometimes employ a discontinuous partition of unity satisfying:

  ⎧ χi(x) ≥ 0,   in Ω̄i
  ⎨ χi(x) = 0,   in Ω \ Ω̄i        (1.6)
  ⎩ χ1(x) + χ2(x) = 1,   in Ω.

Each χi(x) will be discontinuous across B = ∂Ω1 ∩ ∂Ω2, and each χi(·) may be non-zero on B[i]. Such a partition of unity may be constructed using di(x) = 1 on Ω̄i in (1.5).
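The distance-based construction (1.4)-(1.5) above can be checked numerically. The sketch below (not from the text) uses two overlapping 1D subdomains Ω1∗ = (0, 0.6) and Ω2∗ = (0.4, 1), chosen arbitrarily, and also verifies that the reconstruction u = χ1 w1 + χ2 w2 reproduces a function with which the local solutions agree on their subdomains.

```python
# Distance-based partition of unity for Ω1* = (0, 0.6), Ω2* = (0.4, 1) ⊂ (0, 1).
# Interior boundary segments: B(1) = {0.6}, B(2) = {0.4}.
import numpy as np

x = np.linspace(0.0, 1.0, 101)

# d_i(x): distance to B(i) inside the closed subdomain, zero outside it.
d1 = np.where(x <= 0.6, np.abs(0.6 - x), 0.0)
d2 = np.where(x >= 0.4, np.abs(x - 0.4), 0.0)

chi1 = d1 / (d1 + d2)          # d1 + d2 > 0 everywhere on [0, 1]
chi2 = d2 / (d1 + d2)

# If each local solution agrees with the global one on its subdomain,
# the weighted sum chi1*w1 + chi2*w2 reproduces it globally.
u = np.sin(np.pi * x)              # stand-in for the true solution
w1 = np.where(x <= 0.6, u, 0.0)    # local solution on Ω1* (value outside is irrelevant)
w2 = np.where(x >= 0.4, u, 0.0)    # local solution on Ω2*
u_rec = chi1 * w1 + chi2 * w2
```

Since each χi vanishes outside its own subdomain, the values assigned to wi outside that subdomain never enter the reconstruction.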

1.1.6 Hybrid Formulation

Let Ω1 and Ω2 (or Ω1∗ and Ω2∗) form a decomposition of a domain Ω. Then, a hybrid formulation of (1.1) is a coupled system of partial differential equations equivalent to (1.1), with one unknown function wi(x), representing the local solution, on each subdomain Ωi (or Ωi∗). Typically, a hybrid formulation consists of a local problem posed on each individual subdomain, along with matching conditions that couple the local problems. Such coupling must ensure consistency and well posedness. Two requirements must be satisfied. First, the restriction ui(x) of the true solution u(x) of (1.1) to each subdomain Ωi (or Ωi∗) must solve the hybrid system, i.e., (u1(x), u2(x)) must solve the hybrid formulation. Second, the hybrid formulation must be well posed as a coupled system, i.e., its solution (w1(x), w2(x)) must exist and be unique, and furthermore, it must depend continuously on the data. The first requirement ensures that the hybrid formulation is consistent with the original problem (1.1), yielding wi(x) = ui(x) for i = 1, 2. The second requirement ensures that the hybrid formulation is stable and uniquely solvable; the latter is essential for the stability of a numerical approximation of the hybrid formulation. Once the hybrid system is solved, the solution u(x) of (1.1) can be expressed in terms of the local solutions wi(x) as:

  u(x) = χ1(x) w1(x) + χ2(x) w2(x),

using a partition of unity χ1(x) and χ2(x) appropriate for the subdomains.

Local Problems. On each subdomain Ωi (or Ωi∗), a hybrid formulation will require wi(x) to solve the original partial differential equation (1.1):

  ⎧ L wi = fi,         in Ωi (or Ωi∗)
  ⎨ Ti(wi, γ) = gi,    on B(i)          (1.7)
  ⎩ wi = 0,            on B[i]

for i = 1, 2, where fi(x) is f(x) restricted to Ωi (or Ωi∗), and Ti(wi, γ) denotes a boundary operator which enforces either Dirichlet, Neumann or Robin boundary conditions on B(i):

  ⎧ Ti(wi, γ) = wi,                      for Dirichlet boundary conditions
  ⎨ Ti(wi, γ) = ni · (a(x)∇wi),          for Neumann boundary conditions        (1.8)
  ⎩ Ti(wi, γ) = ni · (a(x)∇wi) + γ wi,   for Robin boundary conditions.

Here ni denotes the unit exterior normal to B(i) and γ(·) denotes a coefficient function in the Robin boundary condition. The choice of the boundary operator Ti(wi, γ) may differ with each hybrid formulation. The boundary data gi(·) typically corresponds to Ti(·) applied to the solution on the adjacent subdomain; it may also be a control or a Lagrange multiplier function which couples the local problems. In some hybrid formulations, a global functional may be employed, whose optimum is sought, or new variables may be introduced to couple the local problems.

Matching Conditions. Matching conditions couple the different local problems (1.7) by choosing gi(·) to ensure that the hybrid formulation is equivalent to (1.1).
The first requirement ensures that the hybrid formulation is consistent with the original problem (1.1): its solution (w1(x), w2(x)) must correspond to the true solution u(x) restricted to the subdomains. Here fi(x) is f(x) restricted to Ωi (or Ωi*), and each local problem has the form (1.7):

  L wi = fi,  in Ωi (or Ωi*)
  Ti(wi, γ) = gi,  on B(i)          (1.7)
  wi = 0,  on B[i],

for i = 1, 2. The data gi(.) typically corresponds to Ti(.) applied to the solution on the adjacent domain, though the notation (Ti, gi, γ) may differ with each hybrid formulation. The intermediary variable γ may also be a control or a Lagrange multiplier function which couples the local problems; in the latter case, the hybrid formulation may be derived as a saddle point problem (Chap. 1.4 or Chap. 10) of an associated constrained optimization problem.

Matching Conditions. Typically, matching conditions are equations satisfied by the true solution u(x) restricted to the interfaces or regions of overlap between adjacent subdomains. Matching conditions couple the different local problems (1.7) to ensure that the hybrid formulation is equivalent to (1.1). These may be either algebraic equations, such as the requirement of continuity of the local solutions ui(x) and uj(x) across adjacent subdomains:

  ui − uj = 0,  on ∂Ωi ∩ ∂Ωj,   non-overlapping case
  ui − uj = 0,  on ∂Ωi* ∩ Ωj*,   overlapping case

or they may be differential constraints, such as continuity of the local fluxes:

  ni · (a(x)∇ui) + nj · (a(x)∇uj) = 0,  on ∂Ωi ∩ ∂Ωj,   non-overlapping case
  ni · (a(x)∇ui) − ni · (a(x)∇uj) = 0,  on ∂Ωi* ∩ Ωj*,   overlapping case

where ni denotes the unit exterior normal to ∂Ωi. Other differential constraints may also be employed, using linear combinations of the above algebraic and differential constraints. Matching conditions may be enforced either directly, as in the preceding constraints, or indirectly through the use of intermediary variables such as Lagrange multipliers. We shall express general matching conditions in the form:

  Hi(w1, w2, g1, g2) = 0,  on B(i),          (1.9)

for suitably chosen operators Hi(·) on the interface B(i).

Well Posedness of the Hybrid Formulation. To ensure that the hybrid formulation is solvable and that it may be approximated numerically by stable schemes, we require that the hybrid formulation be well posed [SM7, EV], satisfying, for C > 0 independent of the data, the bound:

  (‖w1‖ + ‖w2‖) ≤ C (|f1| + |f2| + |g1| + |g2|),

where ‖·‖ and |·| are appropriately chosen norms for the solution and data, as suggested by elliptic regularity theory [GI].

Reconstruction of the Global Solution. Once a hybrid formulation consisting of local equations of the form (1.7) for 1 ≤ i ≤ 2, together with equations of the form (1.9), has been formulated and solved, the global solution u(.) may be represented in the form:

  u(x) = χ1(x) w1(x) + χ2(x) w2(x),          (1.10)

where χi(x) is a (possibly discontinuous) partition of unity subordinate to the subdomains Ω1 and Ω2 (or Ω1* and Ω2*).
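As a concrete illustration of the reconstruction step (1.10), the following sketch blends two overlapping subdomain solutions with a piecewise linear partition of unity. The 1D decomposition Ω1* = (0, 0.6), Ω2* = (0.4, 1) and the exact local solutions are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Grid on Omega = (0, 1); local solutions agree with u = sin(pi x) on each subdomain.
x = np.linspace(0.0, 1.0, 101)
w1 = np.where(x <= 0.6, np.sin(np.pi * x), 0.0)   # solution on Omega1* = (0, 0.6)
w2 = np.where(x >= 0.4, np.sin(np.pi * x), 0.0)   # solution on Omega2* = (0.4, 1)

# chi1 ramps linearly from 1 to 0 across the overlap (0.4, 0.6); chi2 = 1 - chi1,
# so chi1 + chi2 = 1 everywhere (a partition of unity subordinate to the cover).
chi1 = np.clip((0.6 - x) / 0.2, 0.0, 1.0)
chi2 = 1.0 - chi1

# Reconstruction (1.10): the blend reproduces the global solution where w1 = w2.
u = chi1 * w1 + chi2 * w2
```

Since w1 and w2 coincide on the overlap, the blended u equals the global solution regardless of how the overlap weights are chosen, which is the point of (1.10).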

Iterative Methods. Domain decomposition iterative algorithms can be formulated for solving (1.1) by directly applying traditional relaxation, descent or saddle point algorithms to a hybrid formulation, namely, to the local problems and the matching conditions. Given current approximations of w1, w2, each unknown wi may be updated sequentially using a relaxation procedure, by solving:

  L wi = fi,  in Ωi (or Ωi*)
  Ti(wi, γ) = gi,  on B(i)
  wi = 0,  on B[i],

using the current iterates on the other subdomains, replacing Ti(wi, γ) = gi by either of the equations Hj(w1, w2, g1, g2) = 0 or Hi(w1, w2, γ) = gi. Alternatively, a descent or saddle point algorithm can be employed.

Discretization on a Nonmatching Grid. In various applications, it may be of interest to independently triangulate different subregions Ωi (or Ωi*) with grids suited to the geometry of each subdomain. The resulting grids, however, may not match on the regions of intersection between the subdomains, and are referred to as nonmatching grids; see Fig. 1.2.

[Fig. 1.2. Nonmatching grids: non-overlapping subdomains Ω1, Ω2 (left) and overlapping subdomains Ω1*, Ω2* (right)]

On such non-matching grids, the construction of a global discretization of equation (1.1) may be sought by directly discretizing the hybrid formulation. This will involve the following steps.

• Let Thi(Ωi) (or Thi(Ωi*)) denote independent triangulations of Ωi (or Ωi*) with local grid sizes hi, for i = 1, 2. These grids need not match on the region of intersection or overlap between the subdomains.
• Each local problem in the hybrid formulation can be discretized as:

  Ahi whi = fhi,  on Ωhi (or Ωhi*)
  Thi(whi, γhi) = ghi,  on B(i)
  whi = 0,  on B[i],

where each local discretization should be a stable scheme.

• The matching conditions should also be discretized:

  Hih(wh1, wh2, gh1, gh2) = 0,  on each boundary segment B(i), for i = 1, 2.

To ensure the stability and consistency of the global discretization of the hybrid formulation, care must be exercised in discretizing the matching conditions across the subdomain grids. Such issues are described in Chap. 11 and in Chap. 1.2 through 1.5.

Heterogeneous Approximation. A partial differential equation is said to be heterogeneous if its type changes from one region to another. An example is Tricomi's equation [JO]:

  u_{x1 x1} − x1 u_{x2 x2} = f(x1, x2),

which is of hyperbolic type for x1 > 0 and of elliptic type for x1 < 0. In various applications, it may be of interest to approximate a partial differential equation of heterogeneous character by a system of partial differential equations of heterogeneous type. We refer to such models as heterogeneous approximations. Such approximations may be useful when efficient computational methods are available for the local problems involved in a heterogeneous partial differential equation. Our discussion will be restricted to an elliptic-hyperbolic heterogeneous approximation of a singularly perturbed elliptic equation of heterogeneous character. We shall consider an advection dominated equation:

  Lε u ≡ −ε ∆u + b(x) · ∇u + c(x) u = f(x),  in Ω          (1.11)
  u = 0,  on ∂Ω,

where 0 < ε ≪ 1 is a small perturbation parameter, and where:

  Lε u ≡ ε L0 u + L1 u,  with  L0 u ≡ −∆u  and  L1 u ≡ b(x) · ∇u + c(x) u.

Depending on f(x), there may be a subdomain Ω1 (or Ω1*) on which:

  ε |∆u| ≪ |b(x) · ∇u + c(x) u|,  for x ∈ Ω1 (or Ω1*).

On Ω1 (or Ω1*), the restriction of the elliptic equation Lε u = f to the subdomain will be of hyperbolic character, approximately satisfying L1 u = f. If Ω2 (or Ω2*) denotes a complementary (layer) region, then equation (1.11) will be approximately of elliptic character in Ω2 (or Ω2*). In such cases, it may be computationally advantageous to approximate elliptic equation (1.11) by a heterogeneous approximation involving an equation of mixed hyperbolic and elliptic character. Motivated by singular perturbation methodology [LA5, KE5, OM],

a heterogeneous approximation of (1.11) may be obtained by approximating its hybrid formulation based on Ωi (or Ωi*) for 1 ≤ i ≤ 2. We may approximate the local problems (1.7) by:

  L̃i vi = fi,  in Ωi (or Ωi*)
  T̃i(vi, γ̃) = g̃i,  on B(i)
  vi = 0,  on B[i],

for i = 1, 2, and we may approximate the matching conditions (1.9) by:

  H̃i(v1, v2, g̃1, g̃2) = 0,  on B̃(i),

where L̃i, T̃i and H̃i(·) are heuristic local approximations of Li, Ti and Hi(·), respectively, with vi(x) ≈ wi(x). For instance, if we define L̃1 u = L1 u on Ω1 (or Ω1*), obtained by formally omitting ε ∆u, then the local problem will be hyperbolic, and we must replace Dirichlet boundary conditions on B(1) and B[1] by inflow boundary conditions. Similarly, if we choose L̃2 u = Lε u on Ω2 (or Ω2*), then the local problem on Ω2 (or Ω2*) will be elliptic, and Dirichlet or flux boundary conditions can be employed on B(2) and B[2]. Approximate matching conditions for a heterogeneous problem can also be derived heuristically by a vanishing viscosity approach. Care must be exercised in the selection of approximations, since each local problem must be well posed, and the global coupled system must also be well posed. We refer the reader to Chap. 12 for specific examples.

1.2 Schwarz Framework

The framework that we refer to as the Schwarz hybrid formulation is based on the earliest known domain decomposition method, now referred to as the Schwarz alternating method, formulated by H. A. Schwarz [SC5] in 1870, which solves Laplace's equation on an irregular domain that is the union of regular regions (such as rectangular and circular regions). Although Schwarz's motivation was to study the existence of harmonic functions on irregular regions, the hybrid formulation underlying Schwarz's iterative method applies to a wider class of elliptic equations, and it enables the formulation of other divide and conquer approximations: iterative methods, non-matching grid discretizations, and heterogeneous approximations for elliptic equation (1.1); see Chap. 1.2 through Chap. 1.5, and Chap. 12. In this section, we describe the hybrid formulation underlying the Schwarz alternating method for a two subdomain overlapping decomposition of Ω. We let Ω1* and Ω2* denote the overlapping subdomains, and let B(i) = ∂Ωi* ∩ Ω and B[i] = ∂Ωi* ∩ ∂Ω denote the interior and exterior boundary segments of Ωi*, respectively; see Fig. 1.3.

[Fig. 1.3. Boundary segments B(1), B(2), B[1], B[2] for an overlapping decomposition Ω1*, Ω2*]

1.2.1 Motivation

To derive the hybrid formulation underlying Schwarz's method, let u(x) denote the solution of (1.1), and define wi(x) = u(x) on Ωi* for 1 ≤ i ≤ 2. By construction, L wi = f in Ωi* and wi = 0 on B[i]. Furthermore, the continuity of u will yield matching of w1 and w2 on Ω1* ∩ Ω2*. It will therefore hold that:

  L w1 = f,  in Ω1*        L w2 = f,  in Ω2*
  w1 = w2,  on B(1)         w2 = w1,  on B(2)          (1.12)
  w1 = 0,  on B[1]          w2 = 0,  on B[2].

Importantly, if the above coupled, decomposed system for w1(x) and w2(x) is well posed, then by solving it, the original solution can be recovered with u(x) = wi(x) on Ωi* for i = 1, 2. We have the following uniqueness result.

Theorem 1.6. Suppose the following assumptions hold.
1. Let u(x) denote a sufficiently smooth solution of equation (1.1).
2. Let w1(x) and w2(x) be sufficiently smooth solutions of the system of coupled elliptic equations (1.12).
3. Let c(x) ≥ 0 and ∇ · b(x) ≤ 0.

Then the following result will hold:

  u(x) = w1(x) on Ω1*,   u(x) = w2(x) on Ω2*.

Proof. If u(x) is a solution of equation (1.1), define wi(x) = u(x) on Ωi* for 1 ≤ i ≤ 2. Then w1(x) and w2(x) will satisfy (1.12) by construction. To prove the converse, suppose that w1(x) and w2(x) satisfy (1.12). We will first show that w1(x) = w2(x) on Ω1* ∩ Ω2*. To this end, note that w1(x) − w2(x) will be L-harmonic, since L wi = f in Ωi*, and has zero boundary conditions on ∂(Ω1* ∩ Ω2*). By uniqueness of L-harmonic functions for c(x) ≥ 0 and ∇ · b(x) ≤ 0, it will follow that w1(x) − w2(x) = 0 in Ω1* ∩ Ω2*. Now let χ1(x) and χ2(x) denote a sufficiently smooth partition of unity subordinate to the cover Ω1* and Ω2*. If we define u(x) = χ1(x) w1(x) + χ2(x) w2(x), then, since w1 = w2 in Ω1* ∩ Ω2* and since L wi = f in Ωi*, u(x) will satisfy (1.1). ⊓⊔

Remark 1.7. The preceding theorem yields equivalence between sufficiently smooth solutions to (1.1) and (1.12): a solution to elliptic equation (1.1) may be obtained by solving (1.12) and defining u(x) = χ1(x) w1(x) + χ2(x) w2(x). It is, however, not a result on the well posedness (stability) of formulation (1.12) under perturbations of its data. The latter requires that the perturbed system:

  L w̃1 = f̃1,  in Ω1*        L w̃2 = f̃2,  in Ω2*
  w̃1 = w̃2 + r̃1,  on B(1)    w̃2 = w̃1 + r̃2,  on B(2)          (1.13)
  w̃1 = 0,  on B[1]           w̃2 = 0,  on B[2],

be uniquely solvable and satisfy a bound of the form:

  (|w̃1| + |w̃2|) ≤ C (‖f̃1‖ + ‖f̃2‖ + ‖r̃1‖ + ‖r̃2‖),

in appropriate norms. See Chap. 15 for maximum norm well posedness.

Remark 1.8. The above result suggests that, given a partition of unity χ1(x) and χ2(x) subordinate to Ω1* and Ω2*, an approximation of u(x) may be constructed by combining approximations of w1(x) and w2(x) obtained by solving (1.12) iteratively.

1.2.2 Iterative Methods

The iterative method proposed by H. A. Schwarz is a very popular method for the solution of elliptic partial differential equations, see [SO, BA2] and [MI, LI6, LI7, DR11, MA37, MO2, BR18]. It is robustly convergent for a large class of elliptic equations, and can be motivated heuristically using the block structure of (1.12). If wi(k) denotes the k'th iterate on subdomain Ωi*, it can be updated by solving the block equation of (1.12) posed on subdomain Ωi*, with the boundary condition w1 = w2 on B(1) or w2 = w1 on B(2) approximated using the current iterate on the adjacent subdomain:

  L w1(k+1) = f,  in Ω1*          L w2(k+1) = f,  in Ω2*
  w1(k+1) = w2(k),  on B(1)        w2(k+1) = w1(k+1),  on B(2)
  w1(k+1) = 0,  on B[1]            w2(k+1) = 0,  on B[2].

The resulting algorithm is the Schwarz alternating method. It is sequential in nature and summarized below.

Algorithm 1.2.1 (Schwarz Alternating Method)
Let v(0) denote the starting global approximate solution.

1. For k = 0, 1, · · · until convergence do:
2.   Solve for w1(k+1) as follows:
       L w1(k+1) = f1,  in Ω1*
       w1(k+1) = v(k),  on B(1)
       w1(k+1) = g,  on B[1].
     Define v(k+1/2) as follows:
       v(k+1/2) ≡ w1(k+1),  on Ω1*
       v(k+1/2) ≡ v(k),  on Ω \ Ω1*.
3.   Solve for w2(k+1) as follows:
       L w2(k+1) = f2,  in Ω2*
       w2(k+1) = v(k+1/2),  on B(2)
       w2(k+1) = g,  on B[2].
     Define v(k+1) as follows:
       v(k+1) ≡ w2(k+1),  on Ω2*
       v(k+1) ≡ v(k+1/2),  on Ω \ Ω2*.
4. Endfor

Output: v(k)

Remark 1.9. The iterates v(k+1/2) and v(k+1) in the preceding algorithm are continuous extensions of the subdomain solutions w1(k+1) and w2(k+1), respectively, to the entire domain Ω. Under suitable assumptions on the coefficients of the elliptic equation and overlap amongst the subdomains Ωi*, the iterates v(k) converge geometrically to the true solution u of (1.1), see Chap. 15 (and Chap. 2.5 when b(x) = 0).

Remark 1.10. The preceding Schwarz algorithm is sequential in nature, requiring the solution of one subdomain problem prior to another. Below, we describe an unaccelerated parallel Schwarz algorithm which requires the concurrent solution of subdomain problems. It is motivated by a popular parallel method, referred to as the additive Schwarz algorithm [DR11], which is employed typically as a preconditioner, see [DR11, CA19, FR8, MA33, TA5]. The algorithm we describe is based on a partition of unity χ1(x) and χ2(x) subordinate to the overlapping subdomains Ω1* and Ω2*.
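The alternating iteration above can be sketched for a 1D model problem −u″ = f on (0, 1) with homogeneous Dirichlet data. The subdomains Ω1* = (0, 0.6), Ω2* = (0.4, 1), the grid sizes, and the forcing below are illustrative assumptions, not part of the text:

```python
import numpy as np

def solve_dirichlet(f, a, b, ua, ub, n):
    # Solve -u'' = f on (a, b) with u(a) = ua, u(b) = ub, using n interior
    # points of a second-order centered finite difference scheme.
    h = (b - a) / (n + 1)
    x = a + h * np.arange(1, n + 1)
    rhs = h**2 * f(x)
    rhs[0] += ua
    rhs[-1] += ub
    A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1))
    return x, np.linalg.solve(A, rhs)

f = lambda x: np.pi**2 * np.sin(np.pi * x)   # exact solution u = sin(pi x)
g1 = 0.0                                     # boundary value at x = 0.6 (initial guess)
for _ in range(30):
    # Step 2: Dirichlet solve on Omega1* = (0, 0.6), data on B(1) from w2.
    x1, w1 = solve_dirichlet(f, 0.0, 0.6, 0.0, g1, 59)
    # Step 3: Dirichlet solve on Omega2* = (0.4, 1), data on B(2) from new w1.
    g2 = np.interp(0.4, x1, w1)
    x2, w2 = solve_dirichlet(f, 0.4, 1.0, g2, 0.0, 59)
    g1 = np.interp(0.6, x2, w2)
```

For this model the interface error contracts by roughly (2/3)² per sweep (the ratio of overlap to subdomain widths), so a few dozen sweeps reach discretization accuracy.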

Algorithm 1.2.2 (Parallel Partition of Unity Schwarz Method)
Let w1(0), w2(0) denote starting local approximate solutions.

1. For k = 0, 1, · · · until convergence do:
2.   For i = 1, 2 determine wi(k+1) in parallel:
       L wi(k+1) = f,  in Ωi*
       wi(k+1) = χ1(x) w1(k)(x) + χ2(x) w2(k)(x),  on B(i)
       wi(k+1) = 0,  on B[i].
3.   Endfor
4. Endfor

Output: (w1(k), w2(k))

If c(x) ≥ c0 > 0 and there is sufficient overlap, the iterates v(k) defined by:

  v(k) ≡ χ1(x) w1(k)(x) + χ2(x) w2(k)(x),

will converge geometrically to the solution u of (1.1), see Chap. 15. Matrix versions of Schwarz algorithms are described in Chap. 2, where the multisubdomain case is considered, and coarse space correction is introduced, which is essential for robust convergence. In Chap. 2 it is observed that the matrix version of the Schwarz alternating method corresponds to a generalization (due to overlap) of the traditional block Gauss-Seidel iterative method, while the parallel version corresponds to a generalized block Jacobi method. The additive Schwarz method [DR11] is also introduced there. In practice, discrete versions of the above algorithms must be applied.

1.2.3 Global Discretization

An advantage of the hybrid formulation (1.12) is that novel discretizations of (1.1) may be obtained by discretizing (1.12). Each subdomain Ωi* may be independently triangulated, and each local problem may be discretized using traditional techniques suited to the local geometry and properties of the solution. The resulting grids, however, may be nonconforming along the internal boundaries B(i) of the subdomains, resulting in a possibly non-matching grid, see Fig. 1.4. Below, given a two subdomain decomposition of Ω, we outline the construction of a global finite difference discretization of (1.12), using finite difference schemes on the subdomains. We triangulate each subdomain Ωi* for 1 ≤ i ≤ 2 by a grid Thi(Ωi*) of size hi, as in Fig. 1.4. The local triangulation can be suited to the geometry and regularity of the solution on Ωi*.

[Fig. 1.4. Nonmatching overset grids Th1(Ω1*) and Th2(Ω2*)]

We block partition the local discrete solution whi on Thi(Ωi*) as:

  whi = (wI(i), wB(i)(i), wB[i](i))ᵀ,  for i = 1, 2,

corresponding to the grid points in the interior of Ωi* and on the boundary segments B(i) and B[i], respectively. Let ni, mi and li denote the number of grid points of triangulation Thi(Ωi*) in the interior of Ωi*, on B(i), and on B[i], respectively. By assumption on the boundary values of whi on B[i], it will hold that wB[i](i) = 0. Next, for i = 1, 2, discretize the elliptic equation L wi = fi on Ωi* by employing a stable scheme on Thi(Ωi*), and denote the discretization as:

  AII(i) wI(i) + AIB(i)(i) wB(i)(i) = fhi,  for i = 1, 2.

Next, discretize the inter-subdomain matching conditions w1 = w2 on B(1) and w2 = w1 on B(2) by applying appropriate interpolation stencils or by discretizing their weak form. If interpolation stencils are employed, then the value wh1(x) at a grid point x on B(1) may be expressed as a weighted average of nodal values of wh2(·) on the grid points of Ωh2*. If the local grids match on each segment B(i), then this discretization step would be trivial. However, for nonmatching grids, care must be exercised to ensure stability of the global scheme. We denote the discretized matching conditions as:

  wB(1)(1) = I_{h2}^{h1} wh2   and   wB(2)(2) = I_{h1}^{h2} wh1,

where I_{h2}^{h1} will denote a matrix of size m1 × (n2 + m2 + l2) and I_{h1}^{h2} will denote a matrix of size m2 × (n1 + m1 + l1). The global discretization now will have the following block matrix form:

  AII(1) wI(1) + AIB(1)(1) wB(1)(1) = fh1
  wB(1)(1) = I_{h2}^{h1} wh2
  AII(2) wI(2) + AIB(2)(2) wB(2)(2) = fh2          (1.14)
  wB(2)(2) = I_{h1}^{h2} wh1.

This algebraic system can be solved by the Schwarz alternating method.
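A small sketch of the interpolation step follows; the 1D donor/receiver grids and the piecewise linear stencil are assumptions for illustration. Each row of the transfer matrix holds the convex weights (nonnegative, summing to one) that express a boundary value on one grid as a weighted average of donor-grid values:

```python
import numpy as np

donor = np.linspace(0.0, 1.0, 8)       # grid points of the donor subdomain grid
recv = np.array([0.15, 0.5, 0.77])     # boundary points needing interpolated values

# Build the intergrid transfer matrix row by row: linear interpolation within
# the donor cell containing each receiver point gives two convex weights.
I = np.zeros((recv.size, donor.size))
for r, xb in enumerate(recv):
    j = np.searchsorted(donor, xb) - 1          # index of the donor cell [x_j, x_{j+1}]
    t = (xb - donor[j]) / (donor[j + 1] - donor[j])
    I[r, j], I[r, j + 1] = 1.0 - t, t           # weights are >= 0 and sum to 1

w_recv = I @ (2.0 + 3.0 * donor)                # transfer a sample linear grid function
```

Convexity of the weights is the property used later (in the maximum norm stability remark) to ensure the interpolation does not amplify the discrete solution.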

Remark 1.11. If c(x) ≥ c0 > 0, if the local discretizations satisfy a discrete maximum principle, if the inter-grid interpolation matrices I_{h2}^{h1} and I_{h1}^{h2} are convex weights, and if the overlap is sufficiently large so that a certain contraction property holds, then the above discretization can be shown to be stable and convergent of optimal order in the maximum norm, see Chap. 11 and Chap. 15.

1.2.4 Heterogeneous Approximation

A heterogeneous approximation of a partial differential equation is a model system of partial differential equations in which the problems posed on different subdomains are not all of the same type. Such approximations may be useful if there is a reduction in computational costs resulting from the use of a heterogeneous model. Motivated by singular perturbation theory [LA5, KE5], we illustrate the construction of an elliptic-hyperbolic approximation of an advection dominated elliptic equation:

  Lε u ≡ −ε ∆u + b(x) · ∇u + c(x) u = f,  in Ω          (1.15)
  u = 0,  on ∂Ω,

where 0 < ε ≪ 1 is a perturbation parameter. Depending on the solution u, the singularly perturbed elliptic equation may be approximately of hyperbolic character on some subregions and of elliptic character elsewhere, motivating a heterogeneous approximation. Suppose the overlapping subdomain Ω1* can be chosen such that:

  ε |∆u(x)| ≪ |b(x) · ∇u(x) + c(x) u(x)|,  for x ∈ Ω1*.

Then, on Ω1* the term Lε u may be approximated by L0 u, defined by:

  L0 u ≡ b(x) · ∇u + c(x) u.

In this case, a global heterogeneous approximation of the singularly perturbed equation (1.15) may be sought by replacing the elliptic equation Lε w1 = f1 on Ω1* by the hyperbolic equation L0 w1 = f1 within the Schwarz hybrid formulation (1.12). To ensure well posedness of the local subproblems, the Dirichlet boundary value problem on Ω1* must be replaced by suitable inflow boundary conditions, due to the hyperbolic nature of L0 w1 = f1:

  L0 w1 = f1,  in Ω1*
  w1 = 0,  on B[1],in
  w1 = w2,  on B(1)in,

where the inflow boundary segments are defined by:

  B[1],in ≡ {x ∈ B[1] : b(x) · n(x) < 0}
  B(1)in ≡ {x ∈ B(1) : b(x) · n(x) < 0}.
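For intuition, a first-order upwind discretization of the hyperbolic local problem b w′ + c w = f is sketched below for an assumed 1D model with b = c = 1, f = 1 and inflow value w(0) = 0, so the exact solution is w(x) = 1 − e^(−x); all of these choices are illustrative:

```python
import numpy as np

# Upwind (backward) differences march away from the inflow boundary x = 0,
# the discrete analogue of imposing data only on the inflow segment.
b, c = 1.0, 1.0
n = 500
h = 0.5 / n
x = h * np.arange(1, n + 1)
w = np.zeros(n)
wprev = 0.0                      # inflow value w(0) = 0
for i in range(n):
    # b*(w_i - w_{i-1})/h + c*w_i = f_i  with  f = 1, solved for w_i:
    w[i] = (h * 1.0 + b * wprev) / (b + c * h)
    wprev = w[i]
```

The one-sided stencil respects the direction of the characteristics, which is why Dirichlet data on the outflow portion of the boundary must be dropped for the reduced problem to be well posed.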

The resulting global heterogeneous approximation will be:

  L0 w1 = f1,  in Ω1*          Lε w2 = f2,  in Ω2*
  w1 = 0,  on B[1],in           w2 = 0,  on B[2]          (1.16)
  w1 = w2,  on B(1)in           w2 = w1,  on B(2),

where n(x) denotes the exterior unit normal to ∂Ω1* at x.

Remark 1.12. This heterogeneous system can be discretized, and the resulting algebraic system can be solved by the Schwarz alternating method. Well posedness of this heterogeneous system, as well as bounds on the error resulting from such approximation, are discussed in Chap. 12 and Chap. 15.

1.3 Steklov-Poincaré Framework

The hybrid formulation that we refer to as the Steklov-Poincaré framework is motivated by a principle in physics referred to as a transmission condition, employed in the study of electric fields in conductors [PO, AG, ST8, LE12, QU5]. The underlying principle states that across any interface within a conducting medium, the electric potential as well as the flux of electric current must match. The mathematical version of this principle suggests a hybrid formulation for a 2nd order elliptic equation given a two subdomain non-overlapping decomposition of its domain.

1.3.1 Motivation

Consider elliptic equation (1.1) posed on Ω:

  L u ≡ −∇ · (a(x) ∇u) + b(x) · ∇u + c(x) u = f,  in Ω          (1.17)
  u = 0,  on ∂Ω.

Let Ω1, Ω2 denote a non-overlapping decomposition of Ω, as in Fig. 1.5, with interface B = ∂Ω1 ∩ ∂Ω2 separating the two subdomains, and B[i] ≡ ∂Ωi ∩ ∂Ω.

For i = 1, 2, denote the solution on each subdomain Ωi by wi(x) ≡ u(x) on Ωi. Let ni(x) denote the unit outward normal vector to ∂Ωi at the point x ∈ B. Then the following transmission conditions, which are derived later in this section, will hold on the interface B for smooth solutions:

  w1 = w2,  on B          (1.18)
  n1 · (a∇w1 − b w1) = n1 · (a∇w2 − b w2),  on B.

The first condition requires the subdomain solutions w1 and w2 to match on B, while the second condition requires the local fluxes n1 · (a∇w1 − b w1) and n1 · (a∇w2 − b w2) associated with w1 and w2 to also match on B.

[Fig. 1.5. A two subdomain non-overlapping decomposition Ω1, Ω2 with interface B]

Remark 1.13. If the coefficient b(x) in elliptic equation (1.1) is continuous, then, since w1(x) = w2(x) on B and since n1(x) = −n2(x) on B, the flux boundary condition may also be equivalently stated as:

  n1 · (a∇w1) + n2 · (a∇w2) = 0,  on B.

In particular, by taking linear combinations of (1.18), the following equivalent flux transmission condition is preferred in several domain decomposition methods:

  n1 · (a∇w1 − (1/2) b w1) + n2 · (a∇w2 − (1/2) b w2) = 0,  on B,

for continuous b(x).

Combining the transmission conditions with the elliptic equation on each subdomain yields the following hybrid formulation, equivalent to (1.1):

  L w1 = f,  in Ω1
  w1 = 0,  on B[1]
  L w2 = f,  in Ω2
  w2 = 0,  on B[2]
  w1 − w2 = 0,  on B
  n1 · (a∇w1 − b w1) + n2 · (a∇w2 − b w2) = 0,  on B.

In this section, we shall outline how this hybrid formulation can be employed to formulate novel domain decomposition iterative methods, discretization methods and heterogeneous approximations for (1.1), see [QU6, AC7, GA14, RA3]. Equivalence of the Steklov-Poincaré hybrid formulation is shown next.

Theorem 1.14. Suppose the following assumptions hold.
1. Let L u be defined by (1.1) with smooth coefficient b(x) and solution u.
2. Let w1(x) and w2(x) be smooth solutions of the following coupled system of partial differential equations:

  L w1 = f,  in Ω1          L w2 = f,  in Ω2
  w1 = 0,  on B[1]           w2 = 0,  on B[2]          (1.19)
  w1 = w2,  on B             n1 · (a∇w2 − b w2) = n1 · (a∇w1 − b w1),  on B.

Then the following result will hold:

  w1(x) = u(x) on Ω1,   w2(x) = u(x) on Ω2.

Proof. Suppose u is a smooth solution to (1.1) and wi ≡ u on Ωi. By construction, L wi = f in Ωi and wi = 0 on B[i]. By continuity of u (or an application of the trace theorem), we obtain that w1 = w2 on B. To verify that the local fluxes match on B, employ the following weak formulation of (1.1), and express each integral on Ω as a sum of integrals on Ω1 and Ω2, to obtain:

  Σ_{i=1}^{2} ∫_{Ωi} (a∇wi · ∇v − wi ∇ · (b v) + c wi v) dx = Σ_{i=1}^{2} ∫_{Ωi} f v dx,

for v ∈ C0∞(Ω). Then integration by parts yields:

  Σ_{i=1}^{2} ∫_{Ωi} (−∇ · (a∇wi) v + (b · ∇wi) v + c wi v) dx
    − ∫_B n1 · (a∇w1 − b w1 − a∇w2 + b w2) v dsx = Σ_{i=1}^{2} ∫_{Ωi} f v dx.

Substituting that L wi = f on Ωi, it follows that:

  ∫_B n1 · (a∇w1 − b w1 − a∇w2 + b w2) v dsx = 0,  ∀v ∈ C0∞(Ω).

If v is chosen to be of compact support in Ω and not identically zero on B, this yields the result that n1 · (a∇w1 − b w1) = n1 · (a∇w2 − b w2) on B. The converse can be verified analogously. ⊓⊔

Remark 1.15. The above result only demonstrates the equivalence of solutions to both systems. It does not guarantee well posedness of hybrid formulation (1.19). This may be demonstrated using elliptic regularity theory in appropriately chosen norms; however, we shall omit this.

We now introduce an operator, referred to as a Steklov-Poincaré operator, which represents hybrid formulation (1.19) more compactly.

Definition 1.16. Given sufficiently smooth Dirichlet boundary data g(·) on the interface B, we define a Steklov-Poincaré operator S(g, f1, f2) as follows:

  S(g, f1, f2) ≡ n1 · (a∇w1 − b w1) − n1 · (a∇w2 − b w2),  on B,

where w1(·) and w2(·) are solutions to the following problems:

  L w1 = f1,  in Ω1          L w2 = f2,  in Ω2
  w1 = 0,  on B[1]            w2 = 0,  on B[2]          (1.20)
  w1 = g,  on B               w2 = g,  on B.

The operator S will map the Dirichlet data g(·) on B to the jump in the local fluxes (Neumann data) across interface B, using (1.20). If the local forcing terms f1(·) and f2(·) are nonzero, the action of S(g, f1, f2) on g(·) will be affine linear. When a weak formulation is used, if X denotes the space of Dirichlet data on B, then the flux or Neumann data will belong to its dual space X′, where X = H00^{1/2}(B) for a standard subdomain decomposition and X = H^{1/2}(B) for an immersed subdomain decomposition.

By construction, the local solutions (w1, w2) to (1.20) will satisfy w1 = w2 (= g) on B. Thus, if an interface function g(·) can be found which yields zero jump in the flux across B, i.e., which solves the Steklov-Poincaré problem:

  S(g, f1, f2) = 0,          (1.21)

then the local solutions (w1, w2) to (1.20), corresponding to this choice of interface data g(·), will solve (1.19), and we may define:

  u ≡ w1 in Ω1,   u ≡ w2 in Ω2,

thus yielding a solution u to (1.1). Conversely, given a solution (w1, w2) to (1.19), the interface data g(x) = u(x) on B will solve (1.21). When system (1.19) is well posed, the search for a solution (w1, w2) to (1.19) may therefore be reduced to the search for interface data g(·) which solves the Steklov-Poincaré problem (1.21).

Remark 1.17. For computational purposes, the Steklov-Poincaré operator S may be expressed as the sum of two subdomain operators:

  S(g, f1, f2) ≡ S(1)(g, f1) + S(2)(g, f2),

where:

  S(1)(g, f1) ≡ n1 · (a∇w1 − b w1),   S(2)(g, f2) ≡ n2 · (a∇w2 − b w2),

for w1 and w2 defined by (1.20). Both S(1) and S(2) map the Dirichlet interface data g(·) prescribed on B to the corresponding Neumann flux data n1 · (a∇w1 − b w1) and n2 · (a∇w2 − b w2) on B, respectively, obtained by solution of the local problems (1.20). As a result, the maps S(i) are commonly referred to as local Dirichlet to Neumann maps. By definition, each operator S(i) will require only subdomain information and will be affine linear. These Dirichlet to Neumann maps are not differential operators, since the solutions wi to (1.20) have representations as integral operators acting on the data g. They are referred to as pseudo-differential operators.

Remark 1.18. In the rest of this section, we outline how iterative methods, global discretizations and heterogeneous approximations can be constructed for the original problem (1.1) using the Steklov-Poincaré formulation (1.19).

1.3.2 Iterative Methods

The block structure of the Steklov-Poincaré system (1.19) suggests various iterative algorithms for its solution. For instance, if w1(k) and w2(k) denote the k'th iterates on subdomains Ω1 and Ω2, then the system of equations posed on subdomain Ωi in (1.19) can be solved to yield updates wi(k+1) for the local solutions, with boundary conditions chosen using preceding iterates. The resulting iterative algorithm sequentially enforces either the continuity or flux transmission boundary conditions on B, and is referred to as a Dirichlet-Neumann algorithm, as it requires the solution of Dirichlet and Neumann boundary value problems. For the correct choice of Dirichlet interface data g(·) on B, the jump in the Neumann data on B will be zero for the local solutions. Let 0 < θ < 1 denote a relaxation parameter required to ensure convergence [BJ9, FU, BR11, MA29]; in the following, suppose that b(x) = 0 in Ω.
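In matrix form, the decomposition S = S(1) + S(2) corresponds to a Schur complement onto the interface unknowns. The following sketch (a 1D finite difference model with a single interface node; the grid sizes are assumptions) verifies that solving the reduced interface problem reproduces the interface value of a direct global solve:

```python
import numpy as np

def lap(n):
    # Scaled 1D Dirichlet Laplacian: tridiag(-1, 2, -1).
    return (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1))

N, m = 21, 10                        # N interior nodes; interface at node m
h = 1.0 / (N + 1)
x = h * np.arange(1, N + 1)
f = h**2 * np.pi**2 * np.sin(np.pi * x)

A = lap(N)
I1, I2 = slice(0, m), slice(m + 1, N)    # interior nodes of Omega1 and Omega2

# Schur complement S = S1 + S2: each subtracted term is a discrete Dirichlet-
# to-Neumann contribution, obtained by eliminating one subdomain's interior.
S = (A[m, m]
     - A[m, I1] @ np.linalg.solve(A[I1, I1], A[I1, m])
     - A[m, I2] @ np.linalg.solve(A[I2, I2], A[I2, m]))
chi = (f[m]
       - A[m, I1] @ np.linalg.solve(A[I1, I1], f[I1])
       - A[m, I2] @ np.linalg.solve(A[I2, I2], f[I2]))
uB = chi / S                             # one-unknown interface problem S uB = chi
u_direct = np.linalg.solve(A, f)         # direct global solve, for comparison
```

Once the interface value is known, the interior unknowns of each subdomain can be recovered in parallel from the local Dirichlet problems, which is the discrete counterpart of solving (1.20) with the correct g.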

Algorithm 1.3.1 (Dirichlet-Neumann Algorithm)
Let v2(0) (where v2(0) ≡ w2(0) on B) denote a starting guess.

1. For k = 0, 1, · · · until convergence do:
2.   Solve for w1(k+1) as follows:
       L w1(k+1) = f1,  in Ω1
       w1(k+1) = v2(k),  on B
       w1(k+1) = 0,  on B[1].
3.   Solve for w2(k+1) as follows:
       L w2(k+1) = f2,  in Ω2
       w2(k+1) = 0,  on B[2]
       n2 · (a∇w2(k+1)) = n2 · (a∇w1(k+1)),  on B.

4.   Update: v2(k+1) = θ w2(k+1) + (1 − θ) v2(k) on B.
5. Endfor

Output: (w1(k), w2(k))

Remark 1.19. In step 2, the local solution w1(k+1) matches v2(k) on B (however, the local fluxes may not match on B). This step requires the solution of an elliptic equation on Ω1 with Dirichlet conditions on B[1] and B. In step 3, the flux of w2(k+1) matches the flux of w1(k+1) on B (though w2(k+1) may not match w1(k+1) on B). This step requires the solution of an elliptic equation on Ω2 with Dirichlet conditions on B[2] and Neumann conditions on B. Under restrictions on the coefficients (such as b(x) ≡ 0 and c(x) ≥ 0), and additional restrictions on the parameter 0 < θ < 1, the iterates wi(k) in the Dirichlet-Neumann algorithm will converge geometrically to the true local solutions wi of (1.19) as k → ∞, see [FU, MA29]. A matrix formulation of this algorithm is given in Chap. 3, and multidomain matrix versions of such algorithms are described in Chap. 3.

Remark 1.20. The preceding Dirichlet-Neumann algorithm has sequential steps. Various algorithms have been proposed which solve subdomain problems in parallel, see [BO7, DE3, DO13, DO18, DR18, MA14, QU6, AC7, GA14, RA3, YA2]. Below, we describe a two fractional step algorithm, each step requiring the solution of subdomain problems in parallel [DO13, DO18]. We assume b(x) = 0.
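A minimal sketch of the sequential Dirichlet-Neumann iteration (steps 2-4 of Algorithm 1.3.1) for the 1D model −u″ = f on (0, 1), split at the interface x = 0.5: the grids, the choice θ = 0.5, and the forcing are assumptions for illustration:

```python
import numpy as np

f = lambda x: np.pi**2 * np.sin(np.pi * x)   # exact solution u = sin(pi x)
m = 50                        # interior points per subdomain
h = 0.5 / m
theta, g = 0.5, 0.0           # relaxation parameter and interface guess v2 on B

def tridiag(n):
    return (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1))

x1 = h * np.arange(1, m)              # interior nodes of Omega1 = (0, 0.5)
x2 = 0.5 + h * np.arange(0, m)        # nodes of Omega2 = [0.5, 1), interface included
for _ in range(25):
    # Step 2: Dirichlet solve on Omega1 with w1 = g at the interface.
    b1 = h**2 * f(x1)
    b1[-1] += g
    w1 = np.linalg.solve(tridiag(m - 1), b1)
    # One-sided second-order flux of w1 at x = 0.5 (Neumann data for Omega2).
    s = (3.0 * g - 4.0 * w1[-1] + w1[-2]) / (2.0 * h)
    # Step 3: Neumann solve on Omega2 via a ghost-point flux condition at x = 0.5.
    A2 = tridiag(m)
    A2[0, 1] = -2.0
    b2 = h**2 * f(x2)
    b2[0] -= 2.0 * h * s
    w2 = np.linalg.solve(A2, b2)
    # Step 4: relaxed interface update g <- theta*w2(B) + (1 - theta)*g.
    g = theta * w2[0] + (1.0 - theta) * g
```

For this symmetric two subdomain model the continuum interface error is multiplied by (1 − 2θ) per iteration, so θ = 0.5 converges particularly fast; in general θ must be restricted for convergence.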

Algorithm 1.3.2 (A Parallel Dirichlet-Neumann Algorithm)
Let w1(0) and w2(0) denote a starting guess on each subdomain.
Let 0 < θ, δ, β, α < 1 denote relaxation parameters.

1. For k = 0, 1, · · · until convergence do:
2.   Compute on B:
       g(k+1/2) = δ w1(k) + (1 − δ) w2(k)
       µ(k+1/2) = θ n1 · (a∇w1(k)) + (1 − θ) n1 · (a∇w2(k)).
3.   In parallel solve for w1(k+1/2) and w2(k+1/2):
       L w1(k+1/2) = f,  in Ω1          L w2(k+1/2) = f,  in Ω2
       w1(k+1/2) = 0,  on B[1]           w2(k+1/2) = 0,  on B[2]
       w1(k+1/2) = g(k+1/2),  on B       n1 · (a∇w2(k+1/2)) = µ(k+1/2),  on B.
4.   Compute on B:
       g(k+1) = α w1(k+1/2) + (1 − α) w2(k+1/2)
       µ(k+1) = β n2 · (a∇w1(k+1/2)) + (1 − β) n2 · (a∇w2(k+1/2)).
5.   In parallel solve for w1(k+1) and w2(k+1):
       L w1(k+1) = f,  in Ω1            L w2(k+1) = f,  in Ω2
       w1(k+1) = 0,  on B[1]             w2(k+1) = 0,  on B[2]
       w1(k+1) = g(k+1),  on B           n2 · (a∇w2(k+1)) = µ(k+1),  on B.
6. Endfor

Output: (w1(k), w2(k))

Remark 1.21. Under appropriate restrictions on the coefficients a(x) and c(x), and on the relaxation parameters θ, δ, β, α, this parallel algorithm will converge geometrically [YA2]. For related parallel algorithms, see [DO13, DO18]. When the advection coefficient b(x) ≠ 0, a parallel algorithm referred to as a Robin-Robin algorithm can also be used [QU6, AC7, GA14, RA3]. Let:

  Φi(w) ≡ ni · (a(x)∇w − (1/2) b(x) w) + zi(x) w,

denote a local Robin boundary operator on B for i = 1, 2, for an appropriately chosen bounded interface function zi(x) > 0. For convenience, ĩ will denote a complementary index to i (namely, ĩ = 2 when i = 1 and ĩ = 1 when i = 2). Then, the Robin-Robin algorithm has the following form.

3. · · · .3 Steklov-Poincar´e Framework 23 Algorithm 1.3 (A Robin-Robin Algorithm) (0) (0) Let w1 and w2 denote a starting guess on each subdomain Let 0 < θ < 1 denote a relaxation parameter 1. 2 in parallel solve: ⎧ (k+1) ⎪ ⎨ Lwi = fi . 1. in Ωi (k+1) . 1. For i = 1. For k = 0. until convergence do: 2.

. i w = 0.

.
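As a concrete illustration of the alternating Dirichlet and Neumann solves, the following minimal sketch (not from the text; the model problem −u'' = 1 on (0,1) with u(0) = u(1) = 0, the grid size, the relaxation parameter θ = 1/2 and the first order one-sided interface flux are all illustrative assumptions) applies the sequential Dirichlet-Neumann iteration with interface at x = 1/2:

```python
import numpy as np

m = 32; h = 0.5 / m; theta = 0.5

def dirichlet_solve(v):
    """-u'' = 1 on (0, 1/2), u(0) = 0, u(1/2) = v; returns nodes 0..m."""
    A = (np.diag(2/h**2 * np.ones(m - 1))
         - np.diag(np.ones(m - 2)/h**2, 1) - np.diag(np.ones(m - 2)/h**2, -1))
    b = np.ones(m - 1); b[-1] += v / h**2     # Dirichlet datum at the interface
    u = np.zeros(m + 1); u[m] = v
    u[1:m] = np.linalg.solve(A, b)
    return u

def neumann_solve(s):
    """-u'' = 1 on (1/2, 1), one-sided u'(1/2) = s, u(1) = 0; nodes 0..m."""
    A = np.zeros((m, m)); b = np.ones(m)
    A[0, 0] = 1/h; A[0, 1] = -1/h; b[0] = -s  # BC row: (u_0 - u_1)/h = -s
    for j in range(1, m):
        A[j, j] = 2/h**2
        A[j, j - 1] = -1/h**2
        if j + 1 < m:
            A[j, j + 1] = -1/h**2
    u = np.zeros(m + 1)
    u[:m] = np.linalg.solve(A, b)
    return u

v = 0.0                                       # starting interface guess
for k in range(50):
    u1 = dirichlet_solve(v)                   # Dirichlet step on Omega_1
    s = (u1[m] - u1[m - 1]) / h               # one-sided flux at the interface
    u2 = neumann_solve(s)                     # Neumann step on Omega_2
    v = theta * u2[0] + (1 - theta) * v       # relaxation of interface value
```

At convergence the two local solutions share the interface value and (discrete) flux, and each approximates the exact solution u = x(1 − x)/2 on its subdomain.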

1.3.3 Global Discretization

Hybrid formulation (1.19) can be used to construct a global discretization of (1.1). A potential advantage of discretizing (1.19) is that each subdomain Ωi can be independently triangulated, and each subproblem may be discretized independently, by methods suited to the local geometry and regularity of the local solution, see Fig. 1.6. However, care must be exercised in discretizing the transmission conditions so that the resulting global discretization is stable. Such discretizations have not been studied extensively, see [AG, AG2, MA4, DO4] and, in the context of spectral methods, [QU6, PH].

Fig. 1.6. Nonmatching local grids Th1(Ω1) and Th2(Ω2)

Below, we heuristically outline the general stages that would be involved in discretizing (1.19) using finite element methods. On each subdomain Ωi, generate a grid Thi(Ωi) of size hi suited to the local geometry and solution. If the resulting local grids do not match along B, they will be referred to as nonmatching grids. On each subdomain Ωi, employ a traditional method to discretize the following Neumann problem:

      L wi = f,                      in Ωi
      wi = 0,                        on B[i]
      ni · (a∇wi − b wi) = gi,       on B,

where ni denotes the exterior unit normal to ∂Ωi and the flux data gi is to be chosen when the transmission conditions are applied. Employing block matrix notation, denote the resulting local discretization by:

      [ A_II^(i)  A_IB^(i) ] [ w_I^(i) ]   [ f_hi^(i) ]
      [ A_BI^(i)  A_BB^(i) ] [ w_B^(i) ] = [ g_hi^(i) ],

where w_I^(i) denotes the interior unknowns on Thi(Ωi) and w_B^(i) denotes the boundary unknowns on B associated with the discrete solution on Thi(Ωi). Let ni and mi denote the number of unknowns in w_I^(i) and w_B^(i), respectively. Since the grid functions (w_I^(i), w_B^(i)) may be nonmatching on B, care must be exercised to ensure well posedness and stability of this discretization. Separately discretize the two transmission conditions on B:

      w1 = w2,                                       on B
      n1 · (a∇w1 − b w1) = n1 · (a∇w2 − b w2),       on B.

Below, we indicate how each transmission condition can be discretized by a "mortar" element type method. In a mortar element discretization, the continuity equation w1 = w2 on B may be discretized by a Petrov-Galerkin approximation of its weak form:

      ∫_B (w1 − w2) v ds_x = 0,     ∀ v ∈ Xh(B),

where Xh(B) denotes some appropriately chosen subspace of L2(B). Typically, Xh(B) is chosen as a finite element space defined on a triangulation of B inherited from either triangulation Th1(Ω1) or Th2(Ω2). Examples of such spaces are described in Chap. 11. For definiteness, suppose Xh(B) = Xh1(B) is chosen to be of dimension m1, based on the triangulation of B inherited from Th1(Ω1). Then, the discretized continuity transmission condition will have the following matrix form:

      M11 w_B^(1) = M12 w_B^(2),

where M11 and M12 are m1 × m1 and m1 × m2 mass matrices, respectively. The flux transmission condition on B may be similarly discretized:

      ∫_B (n1 · (a∇w1 − b w1) − n1 · (a∇w2 − b w2)) µ ds_x = 0,     ∀ µ ∈ Yh(B),

where it is sufficient to choose Yh(B) ⊂ H01(B). Again, Yh(B) may be chosen as a finite element space defined on the triangulation of B inherited from either Th1(Ω1) or Th2(Ω2). However, since Xh(B) = Xh1(B) is of dimension m1, to ensure that the total number of equations equals the total number of unknowns in the global system, it will be preferable that Yh(B) be chosen using the complementary triangulation. In the above example, we choose Yh(B) = Yh2(B) of dimension m2, based on the triangulation of B inherited from Th2(Ω2). This will yield m2 constraints.
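The interface mass matrices M11 and M12 coupling nonmatching grids can be assembled by integrating products of basis functions over the merged interface partition. The sketch below (an illustrative 1D interface B = (0,1) with piecewise linear hat functions and exact two point Gauss quadrature on each merged panel; the grids are arbitrary choices) computes both matrices:

```python
import numpy as np

grid1 = np.linspace(0.0, 1.0, 5)      # interface grid inherited from side 1
grid2 = np.linspace(0.0, 1.0, 8)      # nonmatching interface grid from side 2

def hat(grid, i, x):
    """P1 hat function of node i on a 1D grid, evaluated at points x."""
    return np.interp(x, grid, np.eye(len(grid))[i])

def cross_mass(gA, gB):
    """M[i, j] = integral over B of hatA_i * hatB_j (exact panel quadrature)."""
    pts = np.union1d(gA, gB)          # the product is quadratic on each panel
    gauss = np.array([-1.0, 1.0]) / np.sqrt(3.0)
    M = np.zeros((len(gA), len(gB)))
    for a, b in zip(pts[:-1], pts[1:]):
        x = 0.5 * (a + b) + 0.5 * (b - a) * gauss
        w = 0.5 * (b - a)             # Gauss-Legendre weight per point
        for i in range(len(gA)):
            for j in range(len(gB)):
                M[i, j] += w * np.sum(hat(gA, i, x) * hat(gB, j, x))
    return M

M11 = cross_mass(grid1, grid1)        # m1 x m1 mass matrix
M12 = cross_mass(grid1, grid2)        # m1 x m2 cross mass matrix
```

Whenever both nodal vectors interpolate the same piecewise linear function (for instance w(x) = x), the constraint M11 w_B^(1) = M12 w_B^(2) holds exactly.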

The discretized flux transmission condition then takes the matrix form:

      M21 (A_BI^(1) w_I^(1) + A_BB^(1) w_B^(1) − f_B^(1)) = −M22 (A_BI^(2) w_I^(2) + A_BB^(2) w_B^(2) − f_B^(2)),

where M21 and M22 are m2 × m1 and m2 × m2 matrices, respectively. The interface forcing terms f_B^(i) have been added to account for the approximation resulting from integration by parts. The actual choice of subspaces Xh1(B) and Yh2(B) will be critical to the stability of the resulting global discretization:

      A_II^(1) w_I^(1) + A_IB^(1) w_B^(1) = f_h1^(1)
      M11 w_B^(1) = M12 w_B^(2)
      A_II^(2) w_I^(2) + A_IB^(2) w_B^(2) = f_h2^(2)
      M21 (A_BI^(1) w_I^(1) + A_BB^(1) w_B^(1) − f_B^(1)) = −M22 (A_BI^(2) w_I^(2) + A_BB^(2) w_B^(2) − f_B^(2)).

Remark 1.23. If the grids Th1(Ω1) and Th2(Ω2) match on B, then m1 = m2. We would then obtain M11 = M12, both square and nonsingular, yielding:

      w_B^(1) = w_B^(2).

Similarly, M21 = M22 will be square and nonsingular, yielding:

      A_BI^(1) w_I^(1) + A_BB^(1) w_B^(1) − f_B^(1) = −(A_BI^(2) w_I^(2) + A_BB^(2) w_B^(2) − f_B^(2)).

The resulting global discretization will then correspond to the standard finite element discretization of (1.1). General theoretical results on the stability of such discretizations of (1.19) are not known to the author, and this scheme was heuristically considered only for its intrinsic interest.

1.3.4 Heterogeneous Approximations

A heterogeneous approximation of a partial differential equation is a coupled system of partial differential equations which approximates the given equation, in which the approximating partial differential equations are not of the same type in different subregions [GA15, QU6]. In the following, motivated by classical singular perturbation approximations [KE5, LA5], we heuristically outline how an elliptic-hyperbolic heterogeneous approximation can be constructed for the following singularly perturbed elliptic equation:

      L_ε u ≡ −ε ∆u + b(x) · ∇u + c(x) u = f,   in Ω
      u = g,                                    on ∂Ω,                    (1.22)

where 0 < ε ≪ 1 is a perturbation parameter. Suppose Ω1 and Ω2 form a non-overlapping decomposition of Ω such that:

      ε |∆u| ≪ |b · ∇u + c u|,   on Ω1.

Then, on subdomain Ω1, we may approximate L_ε u = f by L0 u = f, where L0 u ≡ b(x) · ∇u + c(x) u. The Steklov-Poincaré hybrid formulation (1.19) will be employed to heuristically approximate (1.22). Formally, a global heterogeneous approximation of (1.22) may be obtained by substituting the preceding approximation into the hybrid formulation corresponding to (1.22), yielding:

      L0 w1 = f,                                 in Ω1
      w1 = 0,                                    on B[1]
      w1 = w2,                                   on B
      L_ε w2 = f,                                in Ω2
      w2 = 0,                                    on B[2]
      n1 · (ε∇w2 − b w2) = n1 · (ε∇w1 − b w1),   on B.

However, since L0 w1 = f is hyperbolic on Ω1, specification of Dirichlet or Neumann boundary conditions on the entire boundary ∂Ω1 will yield a locally ill posed problem: retaining the Dirichlet boundary conditions on B and B[1] for w1(·) will yield an ill-posed problem for w1(·). To deduce appropriate transmission boundary conditions, denote the inflow and outflow boundary segments on B and B[1] by:

      B_in     ≡ {x ∈ B : n1 · b(x) < 0}
      B_out    ≡ {x ∈ B : n1 · b(x) > 0}
      B[1],in  ≡ {x ∈ B[1] : n1 · b(x) < 0}.

Indeed, since L0 w1 is hyperbolic, the boundary conditions w1 = 0 on B[1] and w1 = w2 on B can be replaced by inflow boundary conditions w1 = 0 on B[1],in and w1 = w2 on B_in. Replacing the Dirichlet conditions by inflow conditions resolves this local ill-posedness on Ω1.
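Once the inflow segment has been identified, the hyperbolic subproblem L0 w1 = f can be integrated by marching away from the inflow boundary. A minimal 1D sketch (illustrative assumptions: a constant advection coefficient b > 0 on Ω1 = (0, 1/2), so the inflow end is x = 0, and first order upwinding):

```python
import numpy as np

def upwind_hyperbolic(b, c, f, w_in, h, n):
    """March b w' + c w = f from the inflow end by first order upwinding."""
    w = np.empty(n + 1)
    w[0] = w_in                       # inflow condition at x = 0 (since b > 0)
    for j in range(1, n + 1):
        x = j * h
        # (b/h) (w_j - w_{j-1}) + c(x) w_j = f(x), solved for w_j
        w[j] = (f(x) + (b / h) * w[j - 1]) / (b / h + c(x))
    return w

n = 64; h = 0.5 / n
w = upwind_hyperbolic(b=1.0, c=lambda x: 0.0, f=lambda x: 1.0, w_in=0.0, h=h, n=n)
```

Only one boundary value is imposed; prescribing data at the outflow end as well would over-determine the first order problem, which is the continuous ill-posedness discussed above.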

Fortunately, a subdomain vanishing viscosity approach may be employed, as in [GA15], to determine the full set of transmission conditions. For ε > 0 and η > 0, the elliptic equation L_ε u = f may be approximated by the discontinuous coefficient elliptic problem:

      L_{ε,η} v ≡ −∇ · (a(x, η) ∇v) + b(x) · ∇v + c(x) v = f,   in Ω
      v = 0,                                                    on ∂Ω,

where the coefficient a(x, η) is defined by:

      a(x, η) ≡ η,   for x ∈ Ω1
      a(x, η) ≡ ε,   for x ∈ Ω2.

For ε > 0 and η > 0 the problem is elliptic, and the traditional transmission conditions should hold:

      w1 = w2,                                   on B
      n1 · (η∇w1 − b w1) = n1 · (ε∇w2 − b w2),   on B.

However, letting η → 0+, heuristically, the global system of partial differential equations satisfied by the weak limit of the solutions v_{ε,η} as η → 0 will be:

      L0 w1 = f,                         in Ω1
      w1 = 0,                            on B[1],in
      w1 = w2,                           on B_in
      −n1 · b w1 = n1 · (ε∇w2 − b w2),   on B_in
      0 = n1 · ∇w2,                      on B_out
      L_ε w2 = f,                        in Ω2
      w2 = 0,                            on B[2].

When b(x) is continuous, the substitution w1 = w2 on B_in will yield the following additional simplifications:

      w1 = w2,          on B_in
      0 = n1 · ∇w2,     on B_in
      n2 · ∇w2 = 0,     on B_out.

Remark 1.24. Dirichlet-Neumann iterative methods can be formulated to solve the above heterogeneous approximation to (1.22), see [GA15, QU6] and Chap. 12. For rigorous results on the well posedness of the preceding heterogeneous system, readers are referred to [GA15].

1.4 Lagrange Multiplier Framework

The framework we refer to as the Lagrange multiplier formulation [GL, GL7] underlies a variety of non-overlapping domain decomposition methods. It is employed in the FETI (Finite Element Tearing and Interconnection) method (a constrained optimization based parallel iterative method [FA16, FA15]), in the mortar element method (a method for discretizing elliptic equations on nonmatching grids [MA4, BE22, BE18, BE4, BE6, WO4, WO5]), and in non-overlapping Schwarz iterative methods [LI8, GL8]. In this section, we illustrate its application to formulate iterative algorithms, non-matching grid discretizations, and heterogeneous approximations.

The Lagrange multiplier framework is applicable only when there is an optimization principle associated with the elliptic equation. For such a property to hold, the elliptic equation (1.1) must be self adjoint and coercive, requiring that b(x) = 0 and c(x) ≥ 0. Accordingly, in this section we shall consider:

      L u ≡ −∇ · (a(x) ∇u) + c(x) u = f,   in Ω
      u = 0,                               on ∂Ω,                         (1.23)

with c(x) ≥ 0. It is well known that the solution u minimizes an energy J(·) within H01(Ω), see (1.24) and (1.25). Given any non-overlapping subdomain decomposition of Ω, we may decompose the energy functional J(·) associated with (1.23) as a sum of energy contributions Ji(·) from each subdomain Ωi. The resulting sum of local energies will be well defined even if the local displacement functions are discontinuous across the interface B = ∂Ω1 ∩ ∂Ω2; it is thus an extended energy functional. A constrained minimization problem equivalent to the minimization of J(·) can be obtained by minimizing this extended energy functional, subject to the constraint that the local displacements match on the interface B. The Lagrange multiplier hybrid formulation is the saddle point problem associated with this constrained minimization problem. We outline the steps below.

1.4.1 Motivation

Let Ω1 and Ω2 form a non-overlapping decomposition of the domain Ω of elliptic equation (1.23), see Fig. 1.7.

Fig. 1.7. An immersed non-overlapping decomposition, with subdomains Ω1, Ω2 and interface B

Minimization Formulation. It is well known, see [CI4] and Chap. 10, that the solution u to (1.23) minimizes the energy J(·) associated with (1.23):

      J(u) = min_{w ∈ X} J(w),                                            (1.24)

where

      J(w) ≡ (1/2) A(w, w) − F(w),              for w ∈ X
      A(v, w) ≡ ∫_Ω (a ∇v · ∇w + c v w) dx,     for v, w ∈ X              (1.25)
      F(w) ≡ ∫_Ω f w dx,                        for w ∈ X
      X ≡ H01(Ω).

Constrained Minimization Formulation. Let {Ωi}, i = 1, 2, be a non-overlapping decomposition of Ω. Define:

      Xi ≡ {v ∈ H1(Ωi) : v = 0 on B[i]},                  for 1 ≤ i ≤ 2
      Ji(wi) ≡ (1/2) Ai(wi, wi) − Fi(wi),                 for wi ∈ Xi
      Ai(vi, wi) ≡ ∫_{Ωi} (∇vi · a∇wi + c vi wi) dx,      for vi, wi ∈ Xi
      Fi(wi) ≡ ∫_{Ωi} f wi dx,                            for wi ∈ Xi.

Suppose wi ≡ w on Ω̄i for 1 ≤ i ≤ 2. We may then express the energy as J(w) = JE(w1, w2), where:

      JE(w1, w2) ≡ J1(w1) + J2(w2).

Here JE(w1, w2) is defined even when w1 ≠ w2 on B. To obtain a constrained minimization problem equivalent to (1.24), we minimize JE(v1, v2) within the larger (extended) class of functions X1 × X2 defined above, but subject to the weak constraint that the subdomain functions match on B:

      m((v1, v2), µ) ≡ ∫_B (v1 − v2) µ ds_x = 0,   ∀ µ ∈ Y,

where Y ≡ H00^{1/2}(B)' (the dual space of H00^{1/2}(B)). Problem (1.24) will thus be formally equivalent to the following constrained minimization problem:

      J1(w1) + J2(w2) = min_{(v1,v2) ∈ K} (J1(v1) + J2(v2)),              (1.26)

where K ≡ {(v1, v2) ∈ X1 × X2 : m((v1, v2), µ) = 0, ∀ µ ∈ Y}.

Saddle Point Formulation. By optimization theory, see [ST14, JO2, BR28], the solution (w1, w2) to the constrained minimization problem (1.26) can be expressed as components in the saddle point ((w1, w2), µ) of an associated Lagrangian functional L(·,·), where µ ∈ Y denotes an artificially introduced variable referred to as a Lagrange multiplier. We define the Lagrangian functional for ((v1, v2), η) ∈ X1 × X2 × Y as:

      L((v1, v2), η) ≡ J1(v1) + J2(v2) + m((v1, v2), η).                  (1.27)

At the saddle point ((w1, w2), µ), for any choice of (v1, v2) ∈ X1 × X2 and η ∈ Y:

      L((w1, w2), η) ≤ L((w1, w2), µ) ≤ L((v1, v2), µ).                   (1.28)

Requiring the first order variation at the saddle point ((w1, w2), µ) to be zero yields:

      Σ_{i=1}^{2} Ai(wi, vi) + m((v1, v2), µ) = Σ_{i=1}^{2} Fi(vi),   for vi ∈ Xi
      m((w1, w2), η) = 0,                                             for η ∈ Y.     (1.29)

The above system is referred to as a saddle point problem.

Hybrid Formulation. If we integrate the weak form (1.29) by parts, we can express it in terms of partial differential equations involving w1(·), w2(·) and the Lagrange multiplier variable µ(·) (representing the flux on B). We seek (w1, w2, µ) satisfying:

      L w1 = f,            in Ω1
      w1 = 0,              on B[1]
      n1 · (a∇w1) = −µ,    on B
      L w2 = f,            in Ω2                                          (1.30)
      w2 = 0,              on B[2]
      n2 · (a∇w2) = µ,     on B
      w1 = w2,             on B,

where B[i] ≡ ∂Ωi ∩ ∂Ω is the exterior boundary and ni is the unit exterior normal to ∂Ωi for i = 1, 2. The next result indicates the equivalence of (1.30) to (1.23).

Theorem 1.25. Suppose the following assumptions hold.
1. Let u be a solution to (1.23).
2. Let (w1, w2, µ) be a solution to the hybrid formulation (1.30).
Then u(x) = w1(x) in Ω̄1 and u(x) = w2(x) in Ω̄2.

Proof. If we integrate the weak form (1.29) by parts, we obtain (1.30). The equivalence follows since (1.30) is equivalent to (1.19) for the substitution µ = n2 · (a∇u) on B.

Remark 1.26. The preceding result only asserts the equivalence between solutions of (1.23) and (1.30). It does not demonstrate the well posedness of (1.30). The latter can be demonstrated for (1.30) by employing general results on the well posedness of the saddle point problem (1.29) associated with it [GI3]. For each choice of Neumann data µ(·), each subdomain problem for wi(·) will be uniquely solvable provided B[i] ≠ ∅. We must choose the Lagrange multiplier µ(·) (representing the flux on B) so that w1 = w2 on B.
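The equivalence asserted above can be observed on a small discretization. The sketch below (illustrative assumptions: a uniform 1D linear finite element grid for −u'' + u = f on (0,1) with matching subdomain grids, one interface node, and a midpoint load rule) forms the discrete saddle point system with the interface matching constraint and compares its solution with the usual conforming solve:

```python
import numpy as np

n_el, h = 8, 1.0 / 8                       # 8 linear elements on (0, 1)
Ke = (1/h) * np.array([[1., -1.], [-1., 1.]]) + (h/6) * np.array([[2., 1.], [1., 2.]])
f = lambda x: np.sin(np.pi * x)

def assemble(elems, n_nodes):
    A = np.zeros((n_nodes, n_nodes)); b = np.zeros(n_nodes)
    for e in elems:
        idx = [e, e + 1]
        A[np.ix_(idx, idx)] += Ke          # element stiffness + mass
        b[idx] += 0.5 * h * f((e + 0.5) * h)   # midpoint load rule
    return A, b

# conforming global solve of -u'' + u = f with u(0) = u(1) = 0
A, b = assemble(range(n_el), n_el + 1)
u = np.zeros(n_el + 1)
u[1:-1] = np.linalg.solve(A[1:-1, 1:-1], b[1:-1])

# subdomain energies: elements 0..3 -> Omega_1, elements 4..7 -> Omega_2
A1f, b1f = assemble(range(0, 4), n_el + 1)
A2f, b2f = assemble(range(4, n_el), n_el + 1)
i1 = list(range(1, 5))     # Omega_1 unknowns: nodes 1..4 (node 4 lies on B)
i2 = list(range(4, n_el))  # Omega_2 unknowns: nodes 4..7 (node 4 lies on B)
A1, b1 = A1f[np.ix_(i1, i1)], b1f[i1]
A2, b2 = A2f[np.ix_(i2, i2)], b2f[i2]
m1 = np.zeros(4); m1[-1] = 1.0             # selects w_B^(1)
m2 = np.zeros(4); m2[0] = 1.0              # selects w_B^(2)

# saddle point system [[A1, 0, m1^T], [0, A2, -m2^T], [m1, -m2, 0]]
K = np.zeros((9, 9)); rhs = np.zeros(9)
K[:4, :4] = A1; rhs[:4] = b1
K[4:8, 4:8] = A2; rhs[4:8] = b2
K[:4, -1] = m1; K[-1, :4] = m1
K[4:8, -1] = -m2; K[-1, 4:8] = -m2
sol = np.linalg.solve(K, rhs)
w1, w2 = sol[:4], sol[4:8]
```

The saddle point solution reproduces the conforming solution on each subdomain, and the multiplier plays the role of the interface flux.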

1.4.2 Iterative Methods

Since the Lagrange multiplier µ(·) determines w1(·) and w2(·) in (1.30), an iterative method for solving (1.23) can be obtained by applying a saddle point iterative algorithm, such as Uzawa's method, to update the Lagrange multiplier function µ(·), as described below, see Chap. 10.

Algorithm 1.4.1 (Uzawa's Method)
Let µ^(0) denote a starting guess with chosen step size τ > 0.
  1. For k = 0, 1, · · · until convergence do:
  2. Determine w1^(k+1) and w2^(k+1) in parallel:

         −∇ · (a∇w1^(k+1)) + c w1^(k+1) = f,   in Ω1
         w1^(k+1) = 0,                         on B[1]
         n1 · (a∇w1^(k+1)) = −µ^(k),           on B

         −∇ · (a∇w2^(k+1)) + c w2^(k+1) = f,   in Ω2
         w2^(k+1) = 0,                         on B[2]
         n2 · (a∇w2^(k+1)) = µ^(k),            on B.

  3. Update µ^(k+1) as follows:

         µ^(k+1)(x) = µ^(k)(x) + τ (w1^(k+1)(x) − w2^(k+1)(x)),   for x ∈ B.

  4. Endfor

Output: (w1^(k), w2^(k))

Remark 1.27. The map µ^(k) → (w1^(k+1) − w2^(k+1)) will be compact, and thus the iterates will converge geometrically to the true solution for sufficiently small τ > 0. Discrete versions of Uzawa's algorithm are described in Chap. 10.
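A discrete Uzawa iteration can be sketched directly at the matrix level. In the sketch below (the small finite difference blocks, the Neumann-type interface closure, and the step size τ are illustrative choices, with τ taken inside the convergent range for this particular system), A = blockdiag(A1, A2) collects the subdomain Neumann problems and C w = w1_B − w2_B is the interface matching constraint:

```python
import numpy as np

h = 0.25
def T(n):
    return ((2/h**2 + 1) * np.eye(n)
            - np.diag(np.ones(n - 1)/h**2, 1) - np.diag(np.ones(n - 1)/h**2, -1))

A1 = T(4); A1[-1, -1] = 1/h**2 + 0.5   # Neumann-type closure at the interface dof
A2 = T(4); A2[0, 0] = 1/h**2 + 0.5
A = np.block([[A1, np.zeros((4, 4))], [np.zeros((4, 4)), A2]])
b = np.arange(1.0, 9.0)                # an arbitrary (nonsymmetric) load
C = np.zeros((1, 8)); C[0, 3] = 1.0; C[0, 4] = -1.0   # constraint w1 = w2 on B

tau, mu = 1.0, np.zeros(1)
for k in range(200):
    w = np.linalg.solve(A, b - C.T @ mu)   # the two decoupled subdomain solves
    mu = mu + tau * (C @ w)                # multiplier update of Algorithm 1.4.1
```

At convergence the interface jump C w vanishes and (w, mu) solves the full discrete saddle point system.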

Remark 1.28. The FETI method [FA16, FA15] is also based on updating the Lagrange multiplier µ. However, it generalizes the preceding saddle point iterative algorithm to the multisubdomain case, where the rate of convergence may deteriorate with increasing number of subdomains, and where the local problems may be singular, see Chap. 4.

An alternative hybrid formulation equivalent to (1.30) can be obtained by replacing the Lagrangian functional L(·,·) by an augmented Lagrangian Lδ(·,·), see [GL7, GL8]:

      Lδ((v1, v2), µ) ≡ J1(v1) + J2(v2) + m((v1, v2), µ) + (δ/2) ||v1 − v2||²_{L2(B)},

where an additional non-negative functional is added to the original Lagrangian functional with a coefficient δ > 0. The augmented term (δ/2) ||v1 − v2||²_{L2(B)} will be zero when the constraint v1 = v2 is satisfied on B. As a result, both formulations will be equivalent, and the saddle point of the augmented Lagrangian will also yield the desired solution. Applying an alternating directions implicit (ADI) method to determine the saddle point of the augmented Lagrangian functional will yield the following algorithm, referred to as the non-overlapping Schwarz method [LI8, GL8].

Algorithm 1.4.2 (Non-Overlapping Schwarz Method)
Let w1^(0), w2^(0) denote starting guesses. Let δ > 0 be a chosen parameter.
  1. For k = 0, 1, · · · until convergence do:
  2. Solve in parallel:

         −∇ · (a∇w1^(k+1)) + c w1^(k+1) = f,                            in Ω1
         w1^(k+1) = 0,                                                  on B[1]
         n1 · (a∇w1^(k+1)) + δ w1^(k+1) = n1 · (a∇w2^(k)) + δ w2^(k),   on B

         −∇ · (a∇w2^(k+1)) + c w2^(k+1) = f,                            in Ω2
         w2^(k+1) = 0,                                                  on B[2]
         n2 · (a∇w2^(k+1)) + δ w2^(k+1) = n2 · (a∇w1^(k)) + δ w1^(k),   on B.

  3. Endfor

Output: (w1^(k), w2^(k))

Remark 1.29. In practice, a careful choice of the parameter δ > 0 will be necessary for optimal convergence [LI8, GL8].
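The Robin exchange in Algorithm 1.4.2 can be illustrated in 1D. The sketch below (illustrative assumptions: the model problem −u'' = 1 on (0,1) with u(0) = u(1) = 0 and exact solution u = x(1 − x)/2, interface at x = 1/2, one-sided discrete fluxes, and an ad hoc value of δ) runs the parallel Robin iteration:

```python
import numpy as np

m = 32; h = 0.5 / m; delta = 2.0
x1 = np.linspace(0.0, 0.5, m + 1); x2 = np.linspace(0.5, 1.0, m + 1)

def robin_solve(g):
    """-u'' = 1 on (0, 1/2), u(0) = 0, one-sided u'(1/2) + delta*u(1/2) = g."""
    A = np.zeros((m, m)); b = np.ones(m)
    for j in range(m - 1):
        A[j, j] = 2 / h**2
        A[j + 1, j] = A[j, j + 1] = -1 / h**2
    # Robin closure at the interface node: (u_m - u_{m-1})/h + delta*u_m = g
    A[m - 1, m - 1] = 1 / h + delta
    A[m - 1, m - 2] = -1 / h
    b[m - 1] = g
    u = np.zeros(m + 1); u[1:] = np.linalg.solve(A, b)
    return u

u1 = np.zeros(m + 1); u2 = np.zeros(m + 1)
for k in range(100):
    g1 = (u2[1] - u2[0]) / h + delta * u2[0]       # n1.u2' + delta*u2 at x = 1/2
    g2 = -(u1[m] - u1[m - 1]) / h + delta * u1[m]  # n2.u1' + delta*u1 at x = 1/2
    u1_new = robin_solve(g1)
    u2_new = robin_solve(g2)[::-1]                 # Omega_2 solve, by reflection
    u1, u2 = u1_new, u2_new                        # parallel (simultaneous) update
```

The reflection trick reuses the same solver for Ω2 = (1/2, 1) because the model problem is invariant under x → 1 − x; for general data a second Robin solver would be written out explicitly.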

1.4.3 Global Discretization

In principle, a discretization of (1.23) can be obtained by discretizing (1.30). However, it is advantageous to employ a Galerkin approximation of the saddle point problem (1.29), to ensure that the resulting discretization yields a constrained minimization problem. The resulting discretization is referred to as a mortar element method. An extensive literature exists on such nonmatching grid discretization techniques, see [MA4, DO4, BE22, BE4, WO4, WO5], see also Chap. 11. Each subdomain can be triangulated independently, without requiring the local triangulations to match on B, see Fig. 1.8.

Fig. 1.8. Non-overlapping nonmatching grids, with interface B and exterior boundary segment B[1]

Triangulate each subdomain Ωi by a grid Thi(Ωi) of size hi suited to the local geometry and solution, for 1 ≤ i ≤ 2. Let Xhi ⊂ Xi denote a traditional finite element space defined on the triangulation Thi(Ωi). Select a triangulation of the interface B inherited either from Th1(Ω1) or Th2(Ω2); for definiteness, suppose that Th1(Ω1) is chosen. Construct a finite element space Yh1(B) ⊂ L2(B) ⊂ Y consisting of piecewise polynomial functions defined on the triangulation of B inherited from Th1(Ω1). The dimension of Yh1 should equal the dimension of Xh1 ∩ H01(B). See Chap. 11 for multiplier spaces Yh1(B).

Discretization of the saddle point formulation (1.29) using the subspaces Xh1 × Xh2 × Yh1(B) will yield a linear system of the form:

      [ A^(1)    0        M^(1)T ] [ wh1 ]   [ f_h1 ]
      [ 0        A^(2)   −M^(2)T ] [ wh2 ] = [ f_h2 ]
      [ M^(1)   −M^(2)    0      ] [ µh  ]   [ 0    ],

where:

      Ai(whi, whi) = whi^T A^(i) whi,                      for 1 ≤ i ≤ 2
      F(whi) = whi^T f_hi,                                 for 1 ≤ i ≤ 2
      m((wh1, wh2), µh) = µh^T (M^(1) wh1 − M^(2) wh2).

Here we have used whi and µh to denote finite element functions, and whi and µh as their vector representations with respect to some fixed basis. If each nodal vector whi is block partitioned as whi = (w_I^(i), w_B^(i))^T, corresponding to the unknowns in the interior of each subdomain and on the interface B, then the matrices A^(i) and M^(i) will have the block structure:

      A^(i) = [ A_II^(i)    A_IB^(i) ]       M^(i) = [ 0   M_B^(i) ],   for 1 ≤ i ≤ 2,
              [ A_IB^(i)T   A_BB^(i) ]

where w_I^(i) and w_B^(i) are of size ni and mi, respectively. Substituting, we obtain:

      [ A_II^(1)    A_IB^(1)   0           0           0        ] [ w_I^(1) ]   [ f_I^(1) ]
      [ A_IB^(1)T   A_BB^(1)   0           0           M_B^(1)T ] [ w_B^(1) ]   [ f_B^(1) ]
      [ 0           0          A_II^(2)    A_IB^(2)    0        ] [ w_I^(2) ] = [ f_I^(2) ]
      [ 0           0          A_IB^(2)T   A_BB^(2)   −M_B^(2)T ] [ w_B^(2) ]   [ f_B^(2) ]
      [ 0           M_B^(1)    0          −M_B^(2)     0        ] [ µh      ]   [ 0       ].

Mortar element spaces Yhi(B) are described in Chap. 11. They include piecewise polynomial functions which are continuous across elements, as well as piecewise polynomial functions which are discontinuous across elements [MA4, WO5, BE22, BE6, BE18, BE4]. In the latter case, a basis for Yhi(B) can be constructed so that the matrix M_B^(i) is diagonal [WO4, WO5].

If the dimension of the space Yh1(B) is m1, then the matrix M_B^(1) will be square and invertible of size m1. In this case, we may parameterize the solution space of the interface constraints as:

      w_B^(1) ≡ R12 w_B^(2),   where R12 ≡ (M_B^(1))^{−1} M_B^(2).

The local unknowns can then be represented as w_I^(1), w_B^(1) = R12 w_B^(2), w_I^(2), and w_B^(2). Substituting this representation into the discrete energy Jh1(w_I^(1), R12 w_B^(2)) + Jh2(w_I^(2), w_B^(2)) and applying first order stationarity conditions for its minimum yields the following linear system:

      [ A_II^(1)          0           A_IB^(1) R12                  ] [ w_I^(1) ]   [ f_I^(1)                 ]
      [ 0                 A_II^(2)    A_IB^(2)                      ] [ w_I^(2) ] = [ f_I^(2)                 ]
      [ R12^T A_IB^(1)T   A_IB^(2)T   R12^T A_BB^(1) R12 + A_BB^(2) ] [ w_B^(2) ]   [ R12^T f_B^(1) + f_B^(2) ].

If both grids match, then R12 = I and the above discretization reduces to the traditional conforming finite element discretization of (1.23). The resulting global discretization will be stable and convergent of optimal order.

1.4.4 Heterogeneous Approximations

When elliptic equation (1.23) is singularly perturbed, its Lagrange multiplier formulation (1.30) can be employed to heuristically study a heterogeneous approximation of it. Below, we illustrate two alternative approximations of the following singularly perturbed, self adjoint elliptic equation [KE5]:

      −∇ · (ε ∇u) + c(x) u = f(x),   in Ω
      u = g(x),                      on ∂Ω,                               (1.31)

where 0 < ε ≪ 1 is a small perturbation parameter and c(x) ≥ c0 > 0. Suppose Ω1 and Ω2 form a nonoverlapping decomposition of Ω such that:

      ε |∆u| ≪ |c(x) u|,   on Ω1.

Then, Ω2 must enclose the boundary layer region of the solution. To obtain a heterogeneous approximation of (1.31), we heuristically apply the subdomain vanishing viscosity method as in [GA15]:

      −∇ · (a_{ε,η}(x) ∇u) + c(x) u = f(x),   in Ω
      u = g(x),                               on ∂Ω,                      (1.32)

where

      a_{ε,η}(x) ≡ η,   for x ∈ Ω1
      a_{ε,η}(x) ≡ ε,   for x ∈ Ω2.

For ε > 0 and η > 0, the above problem is elliptic and coercive. However, as η → 0+, the limiting system formally becomes, since w1(·) formally satisfies a zeroth order equation in Ω1:

      c(x) w1 = f(x),            in Ω1
      w1 = g(x),                 on B[1]
      w1 = w2,                   on B
      0 = µ,                     on B
      −ε ∆w2 + c(x) w2 = f(x),   in Ω2
      w2 = g(x),                 on B[2]
      ε ∂w2/∂n = µ,              on B.

Since c(x) ≥ c0 > 0, the limiting equation on Ω1 for w1(x) can be solved to formally yield:

      w1(x) = f(x) / c(x).

If B[1] ≠ ∅ and the boundary data g(x) is not compatible with the formal solution f(x)/c(x), i.e., if g(x) ≠ f(x)/c(x) on B[1], then the local solution may be ill posed, indicating a poor choice of subdomain Ω1. Two alternative approximations may be constructed: either the transmission condition w1 = w2 or ∂w2/∂n = 0 can be enforced on B, but not both. If a continuous (or H1(·)) solution is sought, then continuity of the local solutions must be enforced and the flux transmission condition needs to be omitted, yielding:

      c(x) w1 = f(x),            in Ω1
      w1 = g(x),                 on B[1]
      −ε ∆w2 + c(x) w2 = f(x),   in Ω2
      w2 = w1,                   on B
      w2 = g(x),                 on B[2].

If a discontinuous approximation is sought, then the continuity transmission condition can be omitted and the flux transmission condition can be enforced, yielding the alternative system:

      c(x) w1 = f(x),            in Ω1
      w1 = g(x),                 on B[1]
      −ε ∆w2 + c(x) w2 = f(x),   in Ω2
      ∂w2/∂n = 0,                on B
      w2 = g(x),                 on B[2].

In this case, the subproblems for w1 and w2 are formally decoupled.

In both cases, rigorous results on the well posedness of the above approximation may be deduced from [GA15].

Remark 1.30. In both cases, the limiting solutions may not minimize the energy functional J associated with (1.31).

Remark 1.31. Similar heuristics may be applied to construct an approximation of the singularly perturbed anisotropic elliptic equation using (1.30):

      −u_{x1x1} − u_{x2x2} − ε u_{x3x3} + c(x) u = f(x),   in Ω,

with appropriate boundary conditions, for which the limiting problem is a degenerate elliptic equation.

1.5 Least Squares-Control Framework

The least squares-control method [LI2, GL13, AT, GU3, GU2, GL] is a general optimization method which has various applications to partial differential equations. It results in a constrained least squares problem, and is based on the minimization of a square norm objective functional, subject to constraints; hence the name least squares-control. Importantly, an optimization principle need not be associated with the underlying partial differential equation. The subdomains can be overlapping or non-overlapping, but we focus on the overlapping case. In domain decomposition applications, the square norm functional typically measures the difference between the subdomain solutions on the regions of overlap or intersection between the subdomains, while the constraints require the local solutions to solve the original partial differential equation on each subdomain, with appropriate boundary conditions. Since the boundary data on each subdomain boundary is unknown, it is regarded as a control function which parameterizes the local solution. The control boundary data must be determined to minimize the square norm functional. In this section, we describe the hybrid formulation associated with the least squares-control method for the following elliptic equation:

      L u ≡ −∇ · (a(x) ∇u) + b(x) · ∇u + c(x) u = f(x),   in Ω
      u = 0,                                              on ∂Ω.          (1.33)

We illustrate the formulation of iterative methods, non-matching grid discretizations, and heterogeneous approximations.

1.5.1 Motivation

Let Ω1∗ and Ω2∗ form an overlapping decomposition of Ω, with Ω12∗ = Ω1∗ ∩ Ω2∗, see Fig. 1.9. Let B(i) = ∂Ωi∗ ∩ Ω and B[i] = ∂Ωi∗ ∩ ∂Ω denote the interior and exterior segments, respectively, of the subdomain boundaries, and let ni denote the unit exterior normal to ∂Ωi∗.

Fig. 1.9. An overlapping decomposition, with subdomains Ω1∗, Ω2∗, overlap Ω12∗, and boundary segments B[1], B[2]

On each subdomain Ωi∗ for 1 ≤ i ≤ 2, we let wi denote the approximation of the solution u to (1.33) on Ωi∗, and let gi denote the local Neumann data associated with wi on B(i). If wi(·) = u(·) on Ωi∗ and gi(·) = ni · (a(x)∇u) on B(i), then wi will satisfy:

      L wi = f,             in Ωi∗
      wi = 0,               on B[i]
      ni · (a∇wi) = gi,     on B(i).

Furthermore, since w1 and w2 will match on Ω12∗, it will hold that ||w1 − w2||_{L2(Ω12∗)} = 0 and |w1 − w2|_{H1(Ω12∗)} = 0. Motivated by this, define the following square norm functional J(·):

      J(v1, v2) ≡ (γ1/2) ∫_{Ω12∗} (v1 − v2)² dx + (γ2/2) ∫_{Ω12∗} |∇(v1 − v2)|² dx.     (1.34)

Typically (γ1 = 1, γ2 = 0), but other choices are possible. Then, for the true subdomain solutions, it will hold that J(w1, w2) = 0. The preceding observation suggests the following constrained minimization problem equivalent to (1.33). Determine (w1, w2) which minimizes J(·) (with minimum value zero) within a class K:

      J(w1, w2) = min_{(v1,v2) ∈ K} J(v1, v2),                                          (1.35)

where K is defined by the constraints:

      K ≡ {(v1, v2) :  L vi = f in Ωi∗,  vi = 0 on B[i],
                       ni · (a∇vi) = gi on B(i),  for 1 ≤ i ≤ 2}.                       (1.36)

Remark 1.32. Instead of Neumann conditions on B(i), we may alternatively pose Robin or Dirichlet conditions. However, in the non-overlapping case, we cannot pose Dirichlet conditions on B(i). In a strict sense, we must replace vi by (vi, gi) in the definition of J(·) and K; to avoid cumbersome notation, we often omit explicit inclusion of gi as an argument, and such omission should be clear from the context.

The following equivalence will hold.

Theorem 1.33. Suppose the following assumptions hold.
1. Let the solution u of (1.33) exist and be smooth.
2. Let (w1, w2) minimize (1.35), subject to the constraints (1.36).
Then it will hold that:

      w1 = u, on Ω1∗   and   w2 = u, on Ω2∗.

Furthermore, if χ1(x) and χ2(x) form a partition of unity subordinate to the cover Ω1∗ and Ω2∗, then:

      u(x) ≡ χ1(x) w1(x) + χ2(x) w2(x).

Proof. First, suppose a solution to (1.35) exists. Then its minimum value must be zero, since the functional J(·) ≥ 0, and since for ui ≡ u in Ωi∗ for 1 ≤ i ≤ 2 it will hold that (u1, u2) ∈ K and J(u1, u2) = 0. Conversely, let (w1, w2) minimize (1.35) subject to (1.36). Then (w1, w2) will satisfy all the required constraints (1.36), and using the definition of J(·), J(w1, w2) = 0, so that w1 = w2 on Ω12∗. Since L wi = f in Ωi∗ and since w1 = w2 in Ω12∗, it is easily verified that χ1(x) w1(x) + χ2(x) w2(x) solves (1.33). Thus, by the uniqueness of solutions to (1.33), we obtain:

      w1 − w2 = u − u = 0,   in Ω12∗,

and the desired result follows using w1 = w2 on Ω12∗.

The preceding result only demonstrates an equivalence between the solutions of (1.33) and (1.35). It does not guarantee the well posedness of (1.35) under perturbation of data. Such a result, however, will hold under appropriate assumptions (such as b = 0 and coercivity of (1.33)), given sufficient overlap between the subdomains.

Remark 1.34. Well posedness of the constrained minimization problem (1.35) will depend on the definition of J(·). For instance, when the elliptic equation (1.33) is self adjoint and coercive, J(v1, v2) = (1/2) ||v1 − v2||²_{H1(Ω12∗)} can be shown to yield a well posed saddle point problem [GL, AT]. More generally, when J(v1, v2) is coercive in the constraint space K, an augmented Lagrangian formulation [GL7] may be employed to regularize (1.35), as described in Chap. 6.

The constraint set K in (1.36) can be parameterized in terms of the Dirichlet, Neumann or Robin data gi specified on each boundary segment B(i). For Neumann conditions, the function space Xi for the boundary data gi is typically chosen for each 1 ≤ i ≤ 2 as Xi = (H00^{1/2}(B(i)))' or Xi = H^{−1/2}(B(i)). For instance, when Neumann boundary conditions are imposed on each B(i), define an affine linear mapping Ei as follows:

                              L vi = f,             in Ωi∗
      Ei gi ≡ vi,   where     ni · (a∇vi) = gi,     on B(i)               (1.37)
                              vi = 0,               on B[i],

for 1 ≤ i ≤ 2, where g1 and g2 are regarded as control data. Then, the constraint set K can be represented as:

      K ≡ {(E1 g1, E2 g2) : gi ∈ Xi, 1 ≤ i ≤ 2}.

This parameterization enables the reformulation of the constrained minimization problem (1.35) as an unconstrained minimization problem. Such unconstrained minimization does not require Lagrange multipliers. Define a function H(·):

      H(g1, g2) ≡ J(E1 g1, E2 g2).

Then, the unconstrained minimum (g1∗, g2∗) of H(·,·):

      H(g1∗, g2∗) = min_{(g1,g2)} H(g1, g2)                               (1.38)

will yield the constrained minimum of J(·) as (w1, w2) = (E1 g1∗, E2 g2∗). Thus, once g1∗ and g2∗ have been determined by minimizing H(·,·), the desired local solutions will satisfy wi ≡ Ei gi∗ for 1 ≤ i ≤ 2.

The unknown control data g1 and g2 can be determined by solving the system of equations which result from the application of first order stationarity conditions δH = 0 at the minimum of H(·). We shall omit the derivation of these equations, except to note that the calculus of variations may be applied to (1.38), or such equations may be derived by heuristic analogy with the associated discrete saddle point problem. The resulting first order stationarity equations will be of the form:

      δH(g1, g2) = 0   ⇔   v1(x) = 0, for x ∈ B(1), and v2(x) = 0, for x ∈ B(2),

on B. If Ω is decomposed into non-overlapping subdomains Ω1 and Ω2 with common interface B = ∂Ω1 ∩ ∂Ω2 . The above constraints will ensure that the original elliptic equation is solved on . Solve: ⎧ ∗ ⎨ −∇ · (a ∇wi ) + b · ∇wi + c wi = f (x). Later. 2. a preconditioned CG method can be employed to solve the resulting linear system. 2 and K consists of all (v1 . Here µ(x) is a ﬂux variable on the interface B (which can be eliminated). in Ω1 ⎪ ⎪ ⎪ ⎪ v1 = 0. (v1 . on B[1] ⎪ ⎪ ⎪ ⎨ n1 · (a∇v1 ) = µ(x). Next. v2 ) satisfying the following constraints: ⎧ ⎪ ⎪ Lv1 = f (x). v1 (x) and v2 are deﬁned as the solutions to: ⎧ ∗ ⎨ −∇ · (a ∇vi ) − ∇ · (b vi ) + c vi = r(x). on B[2] ⎪ ⎩ n2 · (a∇v2 ) = −µ(x). see Chap. 6. When (1. v2 ). on B (i) for w1 (x) and w2 (x) using g1 (x) and g2 (x). w2 ) = min J(v1 .v2 )∈K where 1 J(v1 . 2 ⎪ ⎩ ni · (a∇wi ) = gi (x). on B ⎪ ⎪ Lv2 = f (x). a least squares-control formulation may be constructed as follows [GU3. Seek (w1 . on B[i] for i = 1.40 1 Decomposition Frameworks where v1 (x) and v2 (x) are deﬁned in terms of g1 (x) and g2 (x) as follows. compute: ∗ w1 (x) − w2 (x).35.35) is discretized. on B (i) The control data g1 (x) and g2 (x) must be chosen to ensure that vi (x) = 0 on B (i) for i = 1. in Ωi ⎪ wi = 0. w2 ) which minimizes: J(w1 . Remark 1. ⎪ ⎩ ni · (a∇vi + b vi ) = 0. an explicit matrix representation can be derived for H(·) and its gradient. for x ∈ Ω12 . in Ωi ⎪ vi = 0. v2 ) ≡ v1 − v2 2L2 (B) . Then. GU2]. In this case. on B[i] for 1 ≤ i ≤ 2. in Ω2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ v2 = 0. we shall outline a gradient method to determine g1 and g2 iteratively. for x ∈ Ω12 r(x) ≡ ∗ 0.

on B (i) . ⎪ in Ωi∗ vi = 0. 6. 3.33) can be determined iteratively. 1.38).5. for x ∈ Ω12 5. on B[i] ⎪ ⎩ (k) ni · (a∇vi ) = gi (x).2 Iterative Methods The solution to (1. 2 in parallel solve: ⎧ ⎨ −∇ · (a ∇vi ) + b · ∇vi + c vi = f (x). 1. and that the Neumann ﬂuxes of the two subdomain solutions match on B. the feasible set K can be parameterized in terms of the ﬂux µ(x) = n1 · (a∇v1 ) on B. H00 (B) 1/2 where H00 (B) denotes a fractional Sobolev norm (deﬁned in Chap.1 (Gradient Least Squares-Control Algorithm) (0) (0) Let g1 (x) and g2 (x) denote starting guesses and τ > 0 a ﬁxed step size. with suﬃciently small step size τ > 0. Endfor (k) (k) Output: (g1 . For k = 0. Endfor 7. 8. for x ∈ B (1) (k+1) (k) g2 (x) = g2 (x) + τ w2 (x). In this case. in Ωi ⎪ wi = 0. 1. Algorithm 1. In applications. 2 in parallel solve the adjoint problems: ⎧ ∗ ⎨ −∇ · (a ∇wi ) − ∇ · (b wi ) + c wi = r(x). Update: (k+1) (k) g1 (x) = g1 (x) − τ w1 (x). Such an algorithm can be derived formally using calculus of variations. v2 ) ≡ 12 v1 − v2 2 1/2 may also be employed. For i = 1. Endfor 4. on B (i) . an alternative choice of objective functional J(v1 . Compute: ∗ v1 (x) − v2 (x). for x ∈ Ω12 r(x) ≡ ∗ 0. · · · until convergence do: 2. by formally applying a steepest descent method to the unconstrained minimization problem (1. 1. 6.5. or by analogy with the discrete version of this algorithm described in Chap. For i = 1. g2 ) . 3).5 Least Squares-Control Framework 41 each subdomain. for x ∈ B (2) . on B[i] ⎪ ⎩ ni · (a∇wi + b wi ) = 0.

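The four-stage structure of each sweep of Algorithm 1.5.1 (subdomain solves with the current controls, a misfit residual on the overlap, adjoint solves that supply the gradient, and a small damped update of the controls) can be mimicked on a generic discrete analogue. Everything below, the matrices B1, B2 and offsets a1, a2 standing in for the affine subdomain solution maps Ei, the sizes, and the step size, is an illustrative assumption rather than data from the text; a minimal Python sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Affine discrete solution maps v_i = a_i + B_i g_i, illustrative stand-ins
# for the subdomain solves E_i g_i (in the text these solve a PDE on Omega_i*).
B1, B2 = rng.standard_normal((6, 3)), rng.standard_normal((6, 3))
a1, a2 = rng.standard_normal(6), rng.standard_normal(6)

def H(g1, g2):
    """Discrete objective H(g1, g2) = 1/2 ||v1 - v2||^2 on the overlap."""
    r = (a1 + B1 @ g1) - (a2 + B2 @ g2)
    return 0.5 * r @ r

g1, g2 = np.zeros(3), np.zeros(3)
tau = 0.01                                # small fixed step size, as in the algorithm
values = [H(g1, g2)]
for _ in range(500):
    r = (a1 + B1 @ g1) - (a2 + B2 @ g2)   # misfit residual r (step 4)
    g1 = g1 - tau * (B1.T @ r)            # adjoint transpose supplies the gradient
    g2 = g2 + tau * (B2.T @ r)            # opposite sign, mirroring step 7
    values.append(H(g1, g2))

assert values[-1] < values[0]             # steepest descent reduces H
```

For a sufficiently small step size the objective decreases monotonically, which is the continuous algorithm's convergence mechanism in miniature.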

Alternative divide and conquer iterative algorithms can be formulated for (1.33) using its saddle point formulation. However, the resulting algorithm may require more computational resources. For instance, suppose that:

   J(v1, v2) = ½ ‖v1 − v2‖²_{L²(Ω12*)},

and that Neumann boundary conditions are imposed on B(i). Then, as described in Chap. 10, a constrained minimization problem such as (1.35) with (1.36) can be equivalently formulated as a saddle point problem, and saddle point iterative algorithms can be formulated to solve it.

Indeed, if λ1 and λ2 denote the Lagrange multipliers, then the saddle point problem associated with (1.35) would formally be of the form:

   χ_{Ω12*} (w1 − w2) + L1* λ1 = 0,
   −χ_{Ω12*} (w1 − w2) + L2* λ2 = 0,
   L1 w̃1 = f1,                                                    (1.39)
   L2 w̃2 = f2.

Here Li w̃i = fi formally denotes the operator equation associated with L wi = f in Ωi* with Neumann conditions ni · (a∇wi) − gi = 0 on B(i) and homogeneous Dirichlet boundary conditions wi = 0 on B[i], with w̃i = (wi, gi). The operator Li* formally denotes the adjoint of Li. Here, χ_{Ω12*}(x) denotes the characteristic (indicator) function of Ω12*. We omit elaborating on such a saddle point problem here, except to note that it may be obtained by heuristic analogy with the discrete saddle point problems described in Chap. 10. The λi(x) correspond to Lagrange multiplier functions, see [GL, AT]. In this saddle point problem, the Lagrange multiplier variables will not be unique, and an augmented Lagrangian formulation would be preferable.

1.5.3 Global Discretization

Hybrid formulation (1.35) or (1.38) can, in principle, be employed to discretize (1.33) on a nonmatching grid such as in Fig. 1.10. Such discretizations have not been considered in the literature; however, a heuristic discussion of such a discretization is outlined here for its intrinsic interest, employing formulation (1.38). We employ finite element discretizations on the subdomains. A nonmatching grid discretization of (1.38) will require discretizing J(·):

   J(v1, v2) = ½ ‖v1 − v2‖²_{H¹(Ω12*)},

and this will involve two overlapping non-matching grids. In the following, we heuristically outline a mortar element discretization of J(v1, v2) on Ω12*, and employ this to construct a global non-matching grid discretization of (1.33), with Dirichlet boundary controls on each subdomain boundary B(i). Each subdomain problem will involve only a conforming grid.


Fig. 1.10. Overlapping nonmatching grids (two overlapping triangulations Th1(Ω1*) and Th2(Ω2*)).

Remark 1.36. If J(v1, v2) is replaced by JB(v1, v2) ≡ ½ ‖v1 − v2‖²_B, where B = ∂Ω1 ∩ ∂Ω2 and Ωi* is an extension of a non-overlapping decomposition Ωi, such a discretization would be considerably simpler.

Local Triangulation. For 1 ≤ i ≤ 2 triangulate each subdomain Ωi* by a grid Thi(Ωi*) according to the local geometry and regularity of the solution, see Fig. 1.10. We shall assume that at least one of the local grids triangulates the region of overlap Ω12*. For definiteness assume that triangulation Th1(Ω1*) triangulates Ω12*. Let ni and mi denote the number of nodes of grid Thi(Ωi*) in the interior of Ωi* and on B(i), respectively. Additionally, let li denote the number of nodes of triangulation Thi(Ωi*) in Ω̄12*.

Local Discretizations. For 1 ≤ i ≤ 2, employ Dirichlet boundary conditions on B(i) in (1.36) and discretize the resulting local problems using a finite element space Xhi ⊂ Xi based on triangulation Thi(Ωi*):

   Xi ≡ {vi ∈ H¹(Ωi*) : vi = 0 on B[i]}.

Block partition the unknowns whi = (wI^(i), wB^(i))^T according to the interior unknowns and the unknowns on the boundary B(i), respectively. Denote the block partitioned linear system for the discretized Dirichlet problem as:

   AII^(i) wI^(i) + AIB^(i) wB^(i) = f_I^(i),
   wB^(i) = gB^(i).

Weak Matching on Ω12*. Choose a finite element space:

   Yh(Ω12*) ⊂ L²(Ω12*)


based on the triangulation of Ω12* inherited from Th1(Ω1*), of dimension l1. Define the weak matching condition on Ω12* as:

   ∫_{Ω12*} (wh1 − wh2) μh1 dx = 0,   for μh1 ∈ Yh1(Ω12*),

enforced using the subspace Yh1(Ω12*). Denote its matrix form as:

   M11 wh1 − M12 wh2 = 0,

where M11 is invertible of size l1. Define an oblique projection P1 ≡ M11^{-1} M12.

Discrete Functional J(·, ·). Let A^(12) be the stiffness matrix associated with J(·) on the triangulation Th1(Ω12*). The quadratic functional J(·) can be discretized using A^(12) and the projection P1 as follows:

   J(vh1, vh2) ≡ ½ ‖vh1 − vh2‖²_{H¹(Ω12*)}
              ≈ ½ (vh1 − P1 vh2)^T R12^T A^(12) R12 (vh1 − P1 vh2)
              ≡ Jh(vh1, vh2).

Here R12 is a restriction map onto the nodes of Ω̄12* from Ω1*, see Chap. 6. The reduced functional Hh(·) can be discretized using:

   Hh(gh1, gh2) ≡ Jh(vh1, vh2),

where

   vhi = ( AII^(i)−1 (f_I^(i) − AIB^(i) gB^(i)),  gB^(i) )^T,   for 1 ≤ i ≤ 2.
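The defining property of the mortar projection P1 ≡ M11^{-1} M12, namely that weak matching reproduces functions representable on both grids, in particular constants, can be checked numerically with piecewise constant multipliers on a 1-D overlap region. The two non-matching partitions below are invented for illustration; a sketch in Python:

```python
import numpy as np

# Two non-matching partitions of the overlap interval [0, 1] (illustrative).
grid1 = np.array([0.0, 0.3, 0.55, 1.0])           # 3 cells, carries Y_h
grid2 = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])  # 5 cells

def mass(test_grid, trial_grid):
    """M[i, j] = length of the overlap of test cell i with trial cell j,
    i.e. the integral of a piecewise constant trial basis function
    against a piecewise constant test (multiplier) function."""
    M = np.zeros((len(test_grid) - 1, len(trial_grid) - 1))
    for i in range(len(test_grid) - 1):
        for j in range(len(trial_grid) - 1):
            lo = max(test_grid[i], trial_grid[j])
            hi = min(test_grid[i + 1], trial_grid[j + 1])
            M[i, j] = max(hi - lo, 0.0)
    return M

M11 = mass(grid1, grid1)       # square and invertible, of size l1 = 3
M12 = mass(grid1, grid2)
P1 = np.linalg.solve(M11, M12)  # oblique projection P1 = M11^{-1} M12

# Weak matching of the constant function: P1 reproduces constants exactly.
assert np.allclose(P1 @ np.ones(5), np.ones(3))
```

Reproduction of constants is a minimal consistency check: if it failed, the weakly matched discretization could not even recover a globally constant solution.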

Stationarity Condition. The first order derivative conditions for the minimum of Hh(·) will yield the following equations for (gB^(1), gB^(2)):

   [  E1^T R12^T A^(12) R12 E1        −E1^T R12^T A^(12) R12 P1 E2     ] [ gB^(1) ]   [ γB^(1) ]
   [ −E2^T P1^T R12^T A^(12) R12 E1    E2^T P1^T R12^T A^(12) R12 P1 E2 ] [ gB^(2) ] = [ γB^(2) ]    (1.40)

where

   γB^(1) ≡ E1^T R12^T A^(12) R12 (−μI^(1) + P1 μI^(2)),
   γB^(2) ≡ E2^T P1^T R12^T A^(12) R12 (μI^(1) − P1 μI^(2)),
   Ei ≡ ( −AII^(i)−1 AIB^(i),  I )^T,
   μI^(i) ≡ ( AII^(i)−1 f_I^(i),  0 )^T,
   wI^(i) = AII^(i)−1 (f_I^(i) − AIB^(i) gB^(i)),   for i = 1, 2.


Thus, a non-matching grid discretization of (1.33) based on the subdomains involves solving system (1.40) for the control boundary data gB^(1) and gB^(2). Subsequently, the subdomain solution wI^(i) can be determined as:

   wI^(i) = AII^(i)−1 (f_I^(i) − AIB^(i) gB^(i)),   for 1 ≤ i ≤ 2.

Remark 1.37. General results on the stability and convergence properties of such discretizations are not known. However, when both local grids match on Ω12*, the projection P1 = I and the global discretization will be equivalent to a traditional discretization of (1.33) on the global triangulation.

1.5.4 Heterogeneous Approximations

The least squares-control formulation (1.35) provides a flexible framework for constructing heterogeneous approximations of general systems of partial differential equations of heterogeneous character [AT, GL13]. We illustrate here how an elliptic-hyperbolic approximation can be constructed for the following singularly perturbed elliptic equation:

   L u ≡ −ε ∆u + b(x) · ∇u + c(x) u = f,   on Ω,
   u = 0,   on ∂Ω,                                                 (1.41)

where 0 < ε ≪ 1 is a perturbation parameter. Suppose Ω1* and Ω2* form an overlapping covering of Ω such that:

   ε |∆u| ≪ |b(x) · ∇u + c(x) u|,   in Ω1*.

We may then heuristically approximate L u = f in Ω1* by L0 u = f, where L0 u ≡ b(x) · ∇u + c(x) u. To construct an elliptic-hyperbolic approximation of (1.41), replace the elliptic problem L v1 = f on Ω1* by the hyperbolic problem L0 v1 = f within the least squares-control formulation (1.35) of (1.41). The resulting heterogeneous problem will seek (w1, w2) which minimizes:

   Ĵ(w1, w2) = min_{(v1, v2) ∈ K̂} Ĵ(v1, v2),

where

   Ĵ(v1, v2) ≡ ½ ‖v1 − v2‖²_{L²(Ω12*)},

and K̂ consists of (v1, v2) which satisfy the constraints:

   L0 v1 = f,  on Ω1*,          L v2 = f,  on Ω2*,
   v1 = g1,  on B_in^(1),       v2 = g2,  on B(2),                 (1.42)
   v1 = 0,  on B[1],in,         v2 = 0,  on B[2].


Here the inflow boundary segments of B(1) and B[1] are defined by:

   B_in^(1) ≡ {x ∈ B(1) : n1(x) · b(x) < 0},
   B[1],in ≡ {x ∈ B[1] : n1(x) · b(x) < 0},

where n1(x) is the unit outward normal to B1 at x.

Remark 1.38. The admissible set K̂ may be parameterized in terms of the local boundary data. An equivalent unconstrained minimization problem may then be obtained analogous to (1.37) and (1.38). See also Chap. 12.

Remark 1.39. The solution (w1, w2) to the above heterogeneous model may not match on Ω12*, and the minimum value of Ĵ(·) within the class K̂ may no longer be zero. A continuous global solution, however, may be obtained by employing a partition of unity χ1(x) and χ2(x) subordinate to the cover Ω1* and Ω2* and by defining:

   w(x) ≡ χ1(x) w1(x) + χ2(x) w2(x).

Remark 1.40. Rigorous results are not known on the well posedness of the above heterogeneous model. The above procedure has been generalized and employed to construct heterogeneous approximations to the Boltzmann, Navier-Stokes and Euler equations [AT, GL13].

2 Schwarz Iterative Algorithms

In this chapter, we describe the family of Schwarz iterative algorithms. It

consists of the classical Schwarz alternating method [SC5] and several of its

parallel extensions, such as the additive, hybrid and restricted Schwarz meth-

ods. Schwarz methods are based on an overlapping decomposition of the do-

main, and we describe its formulation to iteratively solve a discretization of a

self adjoint and coercive elliptic equation. In contrast with iterative algorithms

formulated on non-overlapping subdomains, as in Chap. 3, the computational

cost per Schwarz iteration can exceed analogous costs per iteration on non-

overlapping subdomains, by a factor proportional to the overlap between the

subdomains. However, Schwarz algorithms are relatively simpler to formulate

and to implement, and when there is suﬃcient overlap between the subdo-

mains, these algorithms can be rapidly convergent for a few subdomains, or

as the size of the subdomains decreases, provided a coarse space residual cor-

rection term is employed [DR11, KU6, XU3, MA15, CA19, CA17].

Our focus in this chapter will be on describing the matrix version of

Schwarz algorithms for iteratively solving the linear system Au = f obtained

by the discretization of an elliptic equation. The matrix versions correspond

to generalizations of traditional block Gauss-Seidel and block Jacobi iterative

methods. Chap. 2.1 presents background and matrix notation, restriction and

extension matrices. Chap. 2.2 describes the continuous version of the classi-

cal Schwarz alternating method [MO2, BA2, LI6] and derives its projection

version, which involves projection operators onto subspaces associated with

the subdomains. The projection version of the Schwarz alternating method

suggests various parallel generalizations such as the additive Schwarz, hybrid

Schwarz and restricted Schwarz methods. Chap. 2.3 describes the matrix ver-

sion of Schwarz algorithms, which we refer to as Schwarz subspace algorithms

[XU3]. Chap. 2.4 discusses implementational issues for applications to ﬁnite

element or ﬁnite diﬀerence discretizations of elliptic equations. Speciﬁc choices

of coarse spaces are also described. Chap. 2.5 describes theoretical results on

the convergence of Schwarz algorithms in an energy norm.


2.1 Background

In this section, we introduce notation on the elliptic equation and its weak

formulation and discretization, subdomain decompositions and block matrix

partitioning of the resulting linear system, restriction and extension maps.

2.1.1 Elliptic Equation

We consider the following self adjoint and coercive elliptic equation:

   Lu ≡ −∇ · (a(x)∇u) + c(x) u = f,   in Ω,
   u = gD,                             on BD,                      (2.1)
   n · (a∇u) + γ u = gN,               on BN,

on a domain Ω ⊂ IRd for d = 2, 3, with unit exterior normal n(x) at x ∈ ∂Ω,

Dirichlet boundary BD ⊂ ∂Ω, and natural (Neumann or Robin) boundary

BN ⊂ ∂Ω where B D ∪ BN = ∂Ω and BD ∩ BN = ∅. We shall assume that the

diﬀusion coeﬃcient a(x) is piecewise smooth and for 0 < a0 ≤ a1 satisﬁes:

   a0 |ξ|² ≤ ξ^T a(x) ξ ≤ a1 |ξ|²,   ∀x ∈ Ω, ξ ∈ IR^d.

To ensure the coercivity of (2.1), we shall assume that c(x) ≥ 0 and γ(x) ≥ 0.

In most applications, we shall assume BD = ∂Ω and BN = ∅.

Remark 2.1. When BD = ∅, γ(x) ≡ 0 and c(x) ≡ 0, functions f (x) and gN (x)

will be required to satisfy compatibility conditions for solvability of (2.1):

   ∫_Ω f(x) dx + ∫_{∂Ω} gN(x) dsx = 0.

In this case, the general solution u(·) to the Neumann boundary value problem

will not be unique, and will satisfy u(x) ≡ u∗ (x) + α where u∗ (x) is any

particular non-homogeneous solution and α is a constant.

2.1.2 Weak Formulation

The weak formulation of (2.1) is obtained by multiplying it by a test function

v(.) with zero boundary value on BD , and integrating the resulting expression

by parts over Ω. The weak problem will seek u ∈ H¹_D(Ω) which satisfies

u(.) = gD (.) on BD such that:

   A(u, v) = F(v),   ∀v ∈ H¹_D(Ω),                                 (2.2)

where A(·, ·), F(·) and H¹_D(Ω) are defined by:

   A(u, v) ≡ ∫_Ω (∇u · a∇v + c u v) dx + ∫_{BN} γ u v dsx,
   F(v) ≡ ∫_Ω f v dx + ∫_{BN} gN v dsx,                            (2.3)
   H¹_D(Ω) ≡ {v ∈ H¹(Ω) : v = 0 on BD}.

Here H¹_D(Ω) denotes the subspace of H¹(Ω) satisfying zero Dirichlet boundary conditions on BD.


2.1.3 Finite Element Discretization

Let Th(Ω) denote a quasiuniform triangulation of Ω ⊂ IR^d with elements of

size h. For simplicity, we assume that the elements are simplices (triangles

when d = 2 or tetrahedra when d = 3) and that Vh ⊂ H 1 (Ω) is the space of

continuous piecewise linear ﬁnite element functions on Th (Ω). Homogeneous

essential boundary conditions can be imposed in Vh by choosing Vh ∩ H¹_D(Ω).

The ﬁnite element discretization of (2.1), see [ST14, CI2, JO2, BR28, BR],

will seek uh ∈ Vh with uh = Ih gD on BD and satisfying:

   A(uh, vh) = F(vh),   ∀vh ∈ Vh ∩ H¹_D(Ω).                        (2.4)

Here Ih denotes the nodal interpolation onto Vh, restricted to BD. This will

yield a linear system Ah uh = f h . We shall often omit the subscript h.

Let nI , nBN and nBD denote the number of nodes of triangulation Th (Ω)

in the interior of Ω, the boundary segments BN and BD , respectively. Denote

by xi for 1 ≤ i ≤ (nI + nBN + nBD ) all the nodes of Th (Ω). We assume that

these nodes are so ordered that:

   xi ∈ Ω,   for 1 ≤ i ≤ nI,
   xi ∈ BN,  for (nI + 1) ≤ i ≤ (nI + nBN),
   xi ∈ BD,  for (nI + nBN + 1) ≤ i ≤ (nI + nBN + nBD).

Corresponding to each node 1 ≤ i ≤ (nI + nBN + nBD), let φi(x) denote the

continuous piecewise linear ﬁnite element nodal basis in Vh , satisfying:

φi (xj ) = δij , for 1 ≤ i, j ≤ (nI + nBN + nBD ),

where δij denotes the Kronecker delta. Given uh(x) ∈ Vh, we expand it as:

   uh(x) = Σ_{i=1}^{nI} (uI)i φi(x) + Σ_{i=1}^{nBN} (uBN)i φ_{nI+i}(x) + Σ_{i=1}^{nBD} (uBD)i φ_{nI+nBN+i}(x),

where uI, uBN and uBD denote subvectors defined by:

   (uI)i ≡ uh(xi),              1 ≤ i ≤ nI,
   (uBN)i ≡ uh(x_{nI+i}),       1 ≤ i ≤ nBN,
   (uBD)i ≡ uh(x_{nI+nBN+i}),   1 ≤ i ≤ nBD.

This block partitions the vector of nodal values associated with uh as:

   uh = (uI^T, uBN^T, uBD^T)^T,

corresponding to the ordering of nodes in Ω, BN and BD , respectively.


Employing the above block partition, the finite element discretization (2.4)

of (2.1) is easily seen to have the following block structure:

   AII uI + AIBN uBN + AIBD uBD = f_I,
   AIBN^T uI + ABNBN uBN + ABNBD uBD = f_BN,
   uBD = Ih gD,

where the block submatrices and vectors above are defined by:

   (AII)ij = A(φi, φj),                     1 ≤ i, j ≤ nI,
   (AIBN)ij = A(φi, φ_{nI+j}),              1 ≤ i ≤ nI, 1 ≤ j ≤ nBN,
   (AIBD)ij = A(φi, φ_{nI+nBN+j}),          1 ≤ i ≤ nI, 1 ≤ j ≤ nBD,
   (ABNBN)ij = A(φ_{nI+i}, φ_{nI+j}),       1 ≤ i, j ≤ nBN,
   (ABNBD)ij = A(φ_{nI+i}, φ_{nI+nBN+j}),   1 ≤ i ≤ nBN, 1 ≤ j ≤ nBD,
   (f_I)i = F(φi),                          1 ≤ i ≤ nI,
   (f_BN)i = F(φ_{nI+i}),                   1 ≤ i ≤ nBN,
   (Ih gD)i = gD(x_{nI+nBN+i}),             1 ≤ i ≤ nBD.

Eliminating uBD in the above linear system yields:

   AII uI + AIBN uBN = f_I − AIBD Ih gD,
   AIBN^T uI + ABNBN uBN = f_BN − ABNBD Ih gD.                     (2.5)

In matrix notation, this yields the block partitioned linear system:

   [ AII, AIBN ; AIBN^T, ABNBN ] (uI, uBN)^T = (f̃_I, f̃_BN)^T,

where f̃_I ≡ f_I − AIBD Ih gD and f̃_BN ≡ f_BN − ABNBD Ih gD.

Remark 2.2. If BN = ∅, then problem (2.1) will be a Dirichlet problem with

∂Ω = BD . In this case, the discretization reduces to:

Ah uh = f h , (2.6)

with Ah ≡ AII and f_h ≡ f_I − AIB Ih gB, where we have denoted B ≡ BD.

Remark 2.3. If BD = ∅, then (2.1) will be a Robin problem if γ(x) ≠ 0, or a

Neumann problem if γ(x) ≡ 0. In this case ∂Ω = BN and we shall use the

notation B ≡ BN . The discretization of (2.1) will then have the form:

   Ah uh = f_h,  with  Ah ≡ [ AII, AIB ; AIB^T, ABB ],  uh ≡ (uI, uB)^T,  f_h ≡ (f̃_I, f̃_B)^T.    (2.7)


If γ(x) ≡ 0 and c(x) ≡ 0, then matrix Ah will be singular, satisfying Ah 1 = 0,

where 1 and 0 denote vectors of appropriate size having all entries identically

1 or 0, respectively. In this case, the forcing f h in (2.7) will be required to

satisfy the compatibility condition 1^T f_h = 0, for the linear system to be

solvable. The solution space will then have the form uh = u∗h + α 1 for α ∈ IR,

where u∗h is any particular solution.
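The situation described in Remark 2.3 can be checked directly on a small discrete analogue: a 1-D finite difference Neumann Laplacian (an illustrative stand-in for Ah, not a matrix from the text) annihilates the constant vector, and a right-hand side satisfying 1^T f_h = 0 admits the solution family u*_h + α 1:

```python
import numpy as np

n = 6
# 1-D Neumann finite difference Laplacian (mesh size scaled out):
# a singular analogue of A_h when gamma = 0 and c = 0.
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0          # Neumann closure at both endpoints

ones = np.ones(n)
assert np.allclose(A @ ones, 0.0)  # A_h 1 = 0: constants lie in the null space

# A compatible right-hand side (1^T f = 0) is solvable, up to constants.
f = np.array([1.0, -1.0, 2.0, -2.0, 0.5, -0.5])
assert abs(ones @ f) < 1e-12
u_star, *_ = np.linalg.lstsq(A, f, rcond=None)
assert np.allclose(A @ u_star, f)                  # particular solution exists
assert np.allclose(A @ (u_star + 3.0 * ones), f)   # so does u* + alpha 1
```

Since A is symmetric, its range is the orthogonal complement of the null space span{1}, which is exactly the stated compatibility condition.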

2.1.4 Multisubdomain Decompositions

We employ the following notation for multidomain decompositions, see Fig. 2.1.

Definition 2.4. A collection of open subregions Ωi ⊂ Ω for 1 ≤ i ≤ p will be

referred to as a nonoverlapping decomposition of Ω if the following hold:

   ∪_{l=1}^p Ω̄l = Ω̄,
   Ωi ∩ Ωj = ∅,  if i ≠ j.

Boundaries of the subdomains will be denoted Bi ≡ ∂Ωi and their interior

and exterior segments by B (i) ≡ ∂Ωi ∩ Ω and B[i] ≡ ∂Ωi ∩ ∂Ω, respectively.

We denote common interfaces by Bij ≡ Bi ∩ Bj and B ≡ ∪i B (i) .

When the subdomains Ωi are shape regular, we let h0 denote their diameter.

For additional notation on non-overlapping subdomains, see Chap. 3.

Definition 2.5. A collection of open subregions Ωi* ⊂ Ω for 1 ≤ i ≤ p will be

referred to as an overlapping decomposition of Ω if the following holds:

   ∪_{l=1}^p Ωl* = Ω.

If {Ωl}_{l=1}^p forms a non-overlapping decomposition of Ω of diameter h0 and each Ωi ⊂ Ωi*, then {Ωl*}_{l=1}^p will be said to form an overlapping decomposition of Ω obtained by extension of {Ωl}_{l=1}^p. Most commonly:

Ωi∗ ≡ Ωiβ h0 ≡ {x ∈ Ω : dist(x, Ωi ) < β h0 } (2.8)

where 0 < β < 1 is called the overlap factor. Boundaries will be denoted ∂Ωi*

and with abuse of notation, B (i) ≡ ∂Ωi∗ ∩Ω and B[i] ≡ ∂Ωi∗ ∩∂Ω, respectively.
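In one dimension the extension Ωi* ≡ Ωi^{βh0} of (2.8) can be computed explicitly. The sample grid, the four strip subdomains, and the overlap factor below are illustrative assumptions; a sketch in Python:

```python
import numpy as np

h0, beta = 0.25, 0.5            # subdomain size and overlap factor (0 < beta < 1)
subdomains = [(i * h0, (i + 1) * h0) for i in range(4)]  # nonoverlapping strips of (0, 1)

x = np.linspace(0.0, 1.0, 101)  # sample points of Omega = (0, 1)

def extended(i):
    """Boolean mask of sample points within distance beta*h0 of Omega_i,
    i.e. the extended subdomain Omega_i^* of (2.8)."""
    lo, hi = subdomains[i]
    return (x > lo - beta * h0) & (x < hi + beta * h0)

covered = np.zeros_like(x, dtype=bool)
for i in range(4):
    covered |= extended(i)
assert covered.all()            # the extended subdomains cover Omega

# Adjacent extended subdomains overlap on a region of width 2*beta*h0.
assert (extended(0) & extended(1)).sum() > 0
```

The amount of overlap, and hence the extra cost per Schwarz iteration discussed in the chapter introduction, is controlled entirely by β.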

Fig. 2.1. Multidomain overlapping and non-overlapping decompositions (left: non-overlapping subdomains Ω1, . . . , Ω16; right: selected extended subdomains Ω1* and Ω11*).


2.1.5 Restriction and Extension Maps

Restriction and extension maps are rectangular matrices used for representing

domain decomposition preconditioners. A restriction map will restrict a vector

of nodal values to a subvector corresponding to indices in some index set S.

An extension map will extend a subvector of nodal values in S to a full vector,

whose entries will be zero outside S. Formally, given any subregion S ⊂ (Ω ∪

BN ), order the nodes of Th (Ω) in S in some local ordering. Let n ≡ (nI + nBN )

denote the total number of ﬁnite element unknowns, and nS the number of

nodes of Th (Ω) in S. We shall associate an index function index(S, i) to denote

the global index of the i’th local node in S for 1 ≤ i ≤ nS . We then deﬁne an

nS × n restriction matrix RS which will map a vector in IRn of nodal values

on the grid Th (Ω) into a subvector in IRnS of nodal values associated with the

nodes in S in the local ordering:

   (RS)ij = 1,  if index(S, i) = j,
   (RS)ij = 0,  if index(S, i) ≠ j.                                (2.9)

The transpose RS^T of restriction matrix RS is referred to as an extension

matrix. It will be an n × nS matrix which extends a vector in IRnS to a vector

in IRn with zero entries corresponding to indices not in S.

Remark 2.6. Given a vector v ∈ IRn of nodal values in Th (Ω), the vector

RS v ∈ IRnS will denote its subvector corresponding to indices of nodes in S

(using the local ordering of nodes in S). Given a nodal vector vS ∈ IRnS of

nodal values in S, the vector RST vS ∈ IRn will denote a nodal vector in Th (Ω)

which extends vS to have zero nodal values at all nodes not in S. To imple-

ment such maps, their action on vectors should be computed algorithmically

employing suitable data structures and scatter-gather operations.

Remark 2.7. Given the global stiﬀness matrix Ah of size n, its submatrix ASS

of size nS corresponding to the nodes in S may be expressed formally as:

   ASS = RS Ah RS^T.

In implementations, the action of ASS on vectors should be computed algo-

rithmically employing scatter-gather operations and sparse data structures.
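The restriction and extension maps, and the submatrix identity ASS = RS Ah RS^T of Remark 2.7, can be verified on a small dense example. The index set S and the matrix are arbitrary illustrations; in practice these operations would use scatter-gather and sparse structures rather than explicit 0-1 matrices:

```python
import numpy as np

n = 6
A = np.arange(n * n, dtype=float).reshape(n, n)
A = A + A.T + n * np.eye(n)          # any symmetric matrix plays the role of A_h

S = [1, 3, 4]                        # global indices of the nodes in S (illustrative)
RS = np.zeros((len(S), n))
for i, j in enumerate(S):            # (R_S)_{ij} = 1 iff index(S, i) = j, as in (2.9)
    RS[i, j] = 1.0

v = np.arange(n, dtype=float)
assert np.allclose(RS @ v, v[S])     # restriction picks out the entries in S

vS = np.array([10.0, 20.0, 30.0])
w = RS.T @ vS                        # extension by zero outside S
assert np.allclose(w[S], vS) and np.allclose(np.delete(w, S), 0.0)

assert np.allclose(RS @ A @ RS.T, A[np.ix_(S, S)])   # A_SS = R_S A_h R_S^T
```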

Remark 2.8. Typical choices of S in Schwarz algorithms will be indices of

nodes in Ωi∗ ∪ (BN ∩ ∂Ωi∗ ). (In Schur complement algorithms, see Chap. 3,

the set S will correspond to indices of nodes on segments, called globs, of the

subdomain boundaries B (i) . The notation RS and RTS will be used).

2.1.6 Partition of Unity

Given an overlapping decomposition Ω1∗ , . . . , Ωp∗ of Ω, we shall often employ a

smooth partition of unity χ1 (x), . . . , χp (x) subordinate to these subdomains.

The partition of unity functions must satisfy the following requirements:


   χi(x) ≥ 0,                    in Ω̄i*,
   χi(x) = 0,                    in Ω̄ \ Ω̄i*,                      (2.10)
   χ1(x) + · · · + χp(x) = 1,    in Ω.

As in Chap. 1.1, a continuous partition of unity may be constructed based on

the distance functions di (x) ≡ dist(x, ∂Ωi∗ ∩ Ω) ≥ 0 as follows:

   χi(x) ≡ di(x) / (d1(x) + · · · + dp(x)),   for 1 ≤ i ≤ p.

Smoother χi (x) may be obtained by using molliﬁed di (x), see [ST9].
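The distance-based construction of the partition of unity can be checked numerically in one dimension. The two overlapping subdomains below are an illustrative assumption; a sketch in Python verifying the requirements (2.10):

```python
import numpy as np

# Two overlapping subdomains of Omega = (0, 1) (illustrative choice).
subdomains = [(0.0, 0.6), (0.4, 1.0)]

def d(i, x):
    """d_i(x) = dist(x, boundary of Omega_i^* interior to Omega),
    set to zero outside the closed subdomain."""
    lo, hi = subdomains[i]
    if x < lo or x > hi:
        return 0.0
    internal = [b for b in (lo, hi) if 0.0 < b < 1.0]  # boundary points inside Omega
    return min(abs(x - b) for b in internal)

xs = np.linspace(0.01, 0.99, 99)
for x in xs:
    dists = np.array([d(0, x), d(1, x)])
    chi = dists / dists.sum()            # chi_i(x) = d_i(x) / sum_j d_j(x)
    assert np.all(chi >= 0.0) and abs(chi.sum() - 1.0) < 1e-12
    for i, (lo, hi) in enumerate(subdomains):
        if x < lo or x > hi:
            assert chi[i] == 0.0         # chi_i vanishes outside Omega_i^*
```

The functions are nonnegative, sum to one, and each vanishes outside its subdomain; mollifying the di would smooth the kinks at the internal boundaries.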

2.1.7 Coarse Spaces

The convergence rate of one-level domain decomposition algorithms (namely,

algorithms involving only subdomains problems) will typically deteriorate as

the number p of subdomains increases. This may be understood heuristically

as follows. Consider a rectangular domain Ω divided into p vertical strips. Each

iteration, say of a Schwarz alternating method, will only transfer information

between adjacent subdomains. Thus, if the forcing term is nonzero only in

the ﬁrst strip and the starting iterate is zero, then it will take p iterations for

the local solution to be nonzero in the p’th subdomain. For elliptic equations

(which have a global domain of dependence on the solution, due to the Green’s

function representation), the solution will typically be nonzero globally even

when the forcing term is nonzero only in a small subregion. Thus, an algo-

rithm such as the classical Schwarz alternating method (and other one-level

methods) will impose limits on the speed at which information is transferred

globally across the entire domain.

The preceding limitation in the rate of convergence of one-level domain de-

composition iterative algorithms can be handled if a mechanism is included for

the global transfer of information across the subdomains. Motivated by multi-

grid methodology [BR22, HA2, MC2] and its generalizations [DR11, XU3],

such a global transfer of information can be incorporated by solving a subprob-

lem on an appropriately chosen subspace of the ﬁnite element space, whose

support covers the entire domain. Such subspaces are referred to as coarse

spaces, provided they satisfy speciﬁed assumptions. A simple example would

be the space of coarse grid ﬁnite element functions deﬁned on a coarse trian-

gulation Th0 (Ω) of Ω, as in two-level multigrid methods. In the following, we

list the approximation property desired in such coarse spaces, where 0 < h0

represents a small parameter (typically denoting the subdomain size).

Definition 2.9. A subspace V0 ⊂ Vh ∩ H¹_D(Ω) will be referred to as a coarse space having approximation of order O(h0) if the following hold:

   ‖Q0 uh‖_{H¹(Ω)} ≤ C ‖uh‖_{H¹(Ω)},          ∀uh ∈ Vh ∩ H¹_D(Ω),
   ‖uh − Q0 uh‖_{L²(Ω)} ≤ C h0 ‖uh‖_{H¹(Ω)},  ∀uh ∈ Vh ∩ H¹_D(Ω),

where Q0 denotes the L²-orthogonal projection onto the subspace V0 ∩ H¹_D(Ω).


Using a coarse space V0 ⊂ Vh, information may be transferred globally

across many subdomains, by solving a ﬁnite dimensional global problem, using

residual correction as follows. Suppose wh denotes an approximate solution of

discrete problem (2.4) in Vh ∩ H¹_D(Ω). An improved approximation wh + w0

of uh may be sought by selecting w0 ∈ V0 so that it satisﬁes the following

residual equation:

A(w0 , v) = F (v) − A(wh , v), ∀v ∈ V0 . (2.11)

It is easily verified that w0 is the A(·, ·)-orthogonal projection of uh − wh onto

the subspace V0 . Once w0 is determined, wh + w0 will provide an improved

approximation of the desired solution uh .

The preceding coarse space residual problem (2.11) can be represented in

matrix terms as follows. Let n0 denote the dimension of V0 ⊂ Vh ∩ H¹_D(Ω) and let ψ1^(0)(·), · · · , ψn0^(0)(·) denote a basis for V0. If n = (nI + nBN) is the dimension of Vh ∩ H¹_D(Ω), let x1, · · · , xn denote the nodes in (Ω ∪ BN). Define

an n × n0 matrix R0^T whose entries are defined as follows:

   R0^T = [ ψ1^(0)(x1)  · · ·  ψn0^(0)(x1) ]
          [     ⋮                  ⋮       ]
          [ ψ1^(0)(xn)  · · ·  ψn0^(0)(xn) ].

Let w0 = R0^T α and v = R0^T β denote nodal vectors representing w0 and v

above, for suitable coeﬃcient vectors α, β ∈ IRn0 . Then (2.11) becomes:

   β^T (R0 Ah R0^T) α = β^T R0 (f_h − Ah wh),   ∀β ∈ IR^{n0}.

This yields the linear system A0 α = R0 (f_h − Ah wh), where A0 = (R0 Ah R0^T).

The vector update to the approximate solution wh will then be wh + R0^T α, which may also be expressed as wh + R0^T A0^{-1} R0 (f_h − Ah wh). Four specific

coarse spaces V0 are described in the following. Additional spaces are described

in [BR15, SM2, CO8, SA7, WI6, MA17].
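Since the coarse correction w0 = R0^T A0^{-1} R0 (f_h − Ah wh) is the Ah-orthogonal projection of the error onto the range of R0^T, it can never increase the energy norm of the error, and it leaves the residual orthogonal to the coarse space. A small numpy check, with a randomly generated SPD matrix and coarse restriction standing in for Ah and R0 (both illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, n0 = 12, 3

M = rng.standard_normal((n, n))
Ah = M @ M.T + n * np.eye(n)          # SPD stand-in for the stiffness matrix
R0 = rng.standard_normal((n0, n))     # full-rank coarse restriction (illustrative)

f = rng.standard_normal(n)
u = np.linalg.solve(Ah, f)            # exact discrete solution
wh = rng.standard_normal(n)           # some approximate solution

A0 = R0 @ Ah @ R0.T                   # coarse matrix A_0 = R_0 A_h R_0^T
alpha = np.linalg.solve(A0, R0 @ (f - Ah @ wh))
w_new = wh + R0.T @ alpha             # corrected approximation w_h + R_0^T alpha

def energy(e):
    return e @ Ah @ e

# A_h-orthogonal projection of the error: energy norm cannot increase.
assert energy(u - w_new) <= energy(u - wh) + 1e-9

# The corrected residual is orthogonal to the coarse space.
assert np.allclose(R0 @ (f - Ah @ w_new), 0.0)
```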

Coarse Triangulation Space. If domain Ω can be triangulated by a quasi-

uniform triangulation Th0 (Ω) with elements of size h0 > h, such that Th (Ω)

is obtained by successive reﬁnement of Th0 (Ω), then a coarse space V0 can be

deﬁned as the space of continuous, piecewise linear ﬁnite element functions on

triangulation Th0 (Ω). To enforce homogeneous essential boundary conditions

so that V0 ⊂ Vh ∩ H¹_D(Ω), the Dirichlet boundary segment BD must be the union

of boundary segments of elements of Th0 (Ω). Such coarse spaces are motivated

by multigrid methodology.

Interpolation of a Coarse Triangulation Space. If the geometry of Ω

is complex or the triangulation Th (Ω) is unstructured, then it may be com-

putationally diﬃcult, if not impossible, to construct a coarse triangulation


Th0(Ω) of Ω from which to obtain Th(Ω) by successive refinement. In such

cases, an alternative coarse space [CA4, CH17] can be constructed as follows,

when BN = ∅. Let Ω ∗ ⊃ Ω denote an extension of Ω having simpler geometry

(such as a polygon). Let Th0 (Ω ∗ ) denote a coarse triangulation of Ω ∗ hav-

ing elements of size h0 > h. The elements of Th0 (Ω ∗ ) will in general not be

the union of elements in Th (Ω). Despite this, a coarse subspace of Vh can be

deﬁned as follows. Let Vh0 (Ω ∗ ) ⊂ H01 (Ω ∗ ) denote a ﬁnite element space on

triangulation Th0 (Ω ∗ ) of Ω ∗ with zero boundary values. Deﬁne V0 as:

   V0 ≡ {πh w*_{h0} : w*_{h0} ∈ Vh0(Ω*)},

where πh denotes the standard nodal interpolation onto all grid points of

Th(Ω) excluding nodes on BD. By construction V0 ⊂ Vh ∩ H¹_D(Ω).

Interpolation of a Polynomial Space. If as in the preceding case, the

geometry of Ω is complex or the triangulation Th (Ω) is unstructured, and

BD = ∅, then a coarse space may be deﬁned as follows. Let Pd (Ω) denote

the space of all polynomials of degree d or less on Ω. Generally Pd(Ω) ⊄ Vh.

However, we may interpolate such polynomials onto the ﬁnite element space

V h ∩ HD

1

(Ω) as follows:

V0 ≡ {πh wd (x) : wd (x) ∈ Pd (Ω)} ,

where πh denotes the standard nodal interpolant onto the finite element space Vh ∩ H¹_D(Ω). By construction V0 ⊂ Vh ∩ H¹_D(Ω).
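For a concrete picture of this interpolation, the following is a minimal 1D sketch; the grid, degree, and boundary setup are illustrative assumptions, not from the text. The coarse space is spanned by the nodal interpolants of the monomials 1, x, . . . , x^d at the interior fine-grid nodes.

```python
import numpy as np

# Illustrative 1D setup (assumed): fine grid on (0,1) with Dirichlet boundary
# BD = {0, 1}; pi_h is nodal interpolation, i.e. evaluation at interior nodes.
d = 2                                       # polynomial degree of Pd(Omega)
x = np.linspace(0.0, 1.0, 21)[1:-1]         # interior nodes of Th(Omega)

# Columns are the interpolants pi_h(1), pi_h(x), ..., pi_h(x^d): each column
# holds nodal values of a polynomial, i.e. a coarse function in V0 within Vh.
V0_basis = np.vander(x, d + 1, increasing=True)

# The interpolated coarse space has full dimension d + 1.
assert np.linalg.matrix_rank(V0_basis) == d + 1
```

In a Schwarz preconditioner, such a matrix of nodal values would play the role of the extension map from coarse coefficients to fine-grid vectors.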

Piecewise Constant Space. A more general coarse space, referred to as the

piecewise constant coarse space [CO8, SA7, MA17, WA6], can be constructed

given any nonoverlapping decomposition Ω1 , . . . , Ωp of Ω as follows. Let h0

denote the size of the subdomains and deﬁne Ωi∗ as the extension of Ωi con-

taining all points of Ω within a distance β h0 to Ωi . Let χ1 (.), . . . , χp (.) denote

a partition of unity based on Ω1∗ , . . . , Ωp∗ . This partition of unity should be

constructed so that its sum is zero on BD and unity on BN . Denote the union

of subdomain interfaces as B ≡ (∂Ω1 ∪ · · · ∪ ∂Ωp) \ BD.

Deﬁne a restriction map RB which restricts any function w(x) onto B:

RB w(x) ≡ w(x), for x ∈ B.

Given a function v(x) defined on B, denote its piecewise harmonic extension

Ev(x) into the interior of each subdomain Ωi for 1 ≤ i ≤ p as:

L (Ev) = 0, in Ωi

Ev = v, on ∂Ωi ,

where L(Ev) denotes the elliptic operator L applied to Ev. The continuous version

of the piecewise constant coarse space V0 is now deﬁned as:

V0 ≡ span {E RB χ1, . . . , E RB χp}.


A finite element version of V0 can be constructed analogously, see Chap. 2.5,

using restriction onto nodal values on B and discrete harmonic extensions into

the subdomains. If the coeﬃcient a(.) in (2.1) is discontinuous of the form:

a(x) ≡ ai for x ∈ Ωi , 1 ≤ i ≤ p,

then it will be advantageous to rescale the original partition of unity to account for large variation in a(.). A new partition of unity χ̂1(.), . . . , χ̂p(.) will be:

    χ̂i(x) ≡ ai χi(x) / (a1 χ1(x) + · · · + ap χp(x)), for 1 ≤ i ≤ p.

An alternative coarse space V̂0 can be constructed based on this.
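As a small numerical illustration of this rescaling (the coefficient values ai and the sampled χi below are assumptions for the example, not from the text):

```python
import numpy as np

# Hypothetical sampled data: chi[i, k] = chi_i evaluated at grid point x_k,
# and a[i] = coefficient value a_i on subdomain Omega_i.
chi = np.array([[1.0, 0.6, 0.0],
                [0.0, 0.4, 1.0]])      # a partition of unity: columns sum to 1
a = np.array([1.0, 100.0])             # discontinuous coefficient values a_i

# Rescaled partition: chi_hat_i = a_i chi_i / (a_1 chi_1 + ... + a_p chi_p).
denom = (a[:, None] * chi).sum(axis=0)
chi_hat = (a[:, None] * chi) / denom

# chi_hat is again a partition of unity (columns still sum to 1), but is now
# weighted toward subdomains with large coefficient a_i.
assert np.allclose(chi_hat.sum(axis=0), 1.0)
```

In the overlap region the rescaled function of the high-coefficient subdomain dominates, which is what makes the resulting coarse space robust to large jumps in a(.).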

2.2 Projection Formulation of Schwarz Algorithms

In this section, we describe the classical Schwarz alternating method for iter-

atively solving the following coercive elliptic equation:

    Lu ≡ −∇ · (a(x)∇u) + c(x) u = f, in Ω
    n · (a∇u) + γ u = gN, on BN    (2.12)
    u = 0, on BD,

where c(x) ≥ 0, γ(x) ≥ 0, and BD and BN denote Dirichlet and natural boundary segments of ∂Ω. The weak formulation of (2.12) seeks u ∈ H¹_D(Ω):

    A(u, v) = F(v), ∀v ∈ H¹_D(Ω),    (2.13)

where

    A(u, v) ≡ ∫Ω (a(x) ∇u · ∇v + c(x) u v) dx + ∫BN γ(x) u v ds(x), for u, v ∈ H¹_D(Ω)
    F(v) ≡ ∫Ω f(x) v(x) dx + ∫BN gN(x) v ds(x), for v ∈ H¹_D(Ω)    (2.14)
    H¹_D(Ω) ≡ { v ∈ H¹(Ω) : v = 0 on BD }.

Applying integration by parts to the continuous version of the multidomain Schwarz alternating method, we shall derive a formal expression for the updates in the iterates as involving orthogonal projections onto certain subspaces of H¹_D(Ω). Employing these projections, we shall derive various parallel extensions of the classical Schwarz alternating method, including the additive Schwarz, hybrid Schwarz and restricted Schwarz methods. Let Ω1∗, · · · , Ωp∗ denote an overlapping decomposition of Ω, and let B(i) ≡ ∂Ωi∗ ∩ Ω and B[i] ≡ ∂Ωi∗ ∩ ∂Ω denote the interior and exterior boundary segments of Ωi∗.


2.2.1 Classical Schwarz Alternating Method

Let w(0) denote a starting iterate satisfying w(0) = 0 on BD. Then, the multidomain Schwarz alternating method will iteratively seek the solution to (2.12) by sequentially updating the iterate on each subdomain Ωi∗ in some prescribed order. Each iteration (or sweep) will consist of p fractional steps, and we shall denote the iterate in the i'th fractional step of the k'th sweep as w(k+i/p). Given w(k+(i−1)/p), the next iterate w(k+i/p) is computed as follows:

    −∇ · (a(x)∇w(k+i/p)) + c(x) w(k+i/p) = f(x), in Ωi∗
    n · (a∇w(k+i/p)) + γ w(k+i/p) = gN, on B[i] ∩ BN    (2.15)
    w(k+i/p) = w(k+(i−1)/p), on B(i)
    w(k+i/p) = 0, on B[i] ∩ BD.

The local solution w(k+i/p) is then extended outside Ωi∗ as follows:

    w(k+i/p) ≡ w(k+(i−1)/p), on Ω \ Ωi∗.    (2.16)

The resulting iterates will thus be continuous on Ω by construction.

Algorithm 2.2.1 (Continuous Schwarz Alternating Method)

Input: w(0) starting iterate.

1. For k = 0, 1, · · · until convergence do:

2. For i = 1, · · · , p solve:

    −∇ · (a(x)∇v(k+i/p)) + c(x) v(k+i/p) = f(x), in Ωi∗
    n · (a∇v(k+i/p)) + γ v(k+i/p) = gN, on B[i] ∩ BN
    v(k+i/p) = w(k+(i−1)/p), on B(i)
    v(k+i/p) = 0, on B[i] ∩ BD.
3. Update:
       w(k+i/p) ≡ v(k+i/p) on Ω̄i∗, and w(k+i/p) ≡ w(k+(i−1)/p) on Ω \ Ω̄i∗.
4. Endfor
5. Endfor
Output: w(k)

The iterates w(k)(.) will converge geometrically to the solution u(.), with:

    ‖u − w(k)‖H¹(Ω) ≤ δᵏ ‖u − w(0)‖H¹(Ω).

The convergence factor 0 < δ < 1 will generally depend on the overlap β between the subdomains, the diameters diam(Ωi∗) of the subdomains, and the coefficients in (2.1), see Chap. 2.5 and Chap. 3.
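The geometric convergence of the alternating method can be observed numerically. The following 1D sketch is an illustrative assumption (the problem, grid, and overlap are not from the text): −u″ = 1 on (0, 1) with u(0) = u(1) = 0, two overlapping subdomains, and sequential subdomain solves as in Algorithm 2.2.1.

```python
import numpy as np

# Assumed model problem: -u'' = 1 on (0,1), u(0) = u(1) = 0, finite differences.
n = 99
h = 1.0 / (n + 1)
f = np.full(n, 1.0)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
u = np.linalg.solve(A, f)                  # exact discrete solution

sub1 = np.arange(0, 60)                    # Omega_1*: nodes x_1 .. x_60
sub2 = np.arange(40, n)                    # Omega_2*: nodes x_41 .. x_99 (overlap 0.2)

w = np.zeros(n)                            # starting iterate w^(0)
errors = []
for k in range(30):                        # sweeps
    for I in (sub1, sub2):                 # sequential (multiplicative) solves
        r = f - A @ w                      # residual with current iterate
        # Dirichlet subdomain solve; interface data comes from the current w
        w[I] += np.linalg.solve(A[np.ix_(I, I)], r[I])
    errors.append(np.max(np.abs(w - u)))

# Error contracts by a factor delta < 1 per sweep, delta set by the overlap.
assert errors[-1] < 1e-6 * errors[0]
```

Shrinking the overlap (e.g. ending sub1 at node 45) visibly worsens the contraction factor, matching the dependence of δ on β noted above.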

As the number p of subdomains increases, the convergence rate typically deteriorates, yielding δ → 1. This is because the true solution to (2.12) has a global domain of dependence on f(.), while if w(0) = 0 and f(.) has support in only one subdomain, then since information is transferred only between adjacent subdomains during each sweep of the Schwarz iteration, it may generally take p sweeps before this information is transferred globally. Such a deterioration in the convergence, however, can often be remedied by using coarse space residual correction (described later).

The Schwarz alternating Alg. 2.2.1 is also known as the multiplicative or sequential Schwarz algorithm. It is sequential in nature. However, parallelizability of this algorithm can be significantly improved by grouping the subdomains into colors, so that distinct subdomains of the same color do not intersect. Then, all subproblems on subdomains of the same color can be solved concurrently.

Definition 2.10. Given subdomains Ω1∗, · · · , Ωp∗, a partition C1, · · · , Cd of the index set {1, · · · , p} is said to yield a d-coloring of the subdomains if:

    i, j ∈ Ck with i ≠ j =⇒ Ωi∗ ∩ Ωj∗ = ∅,

so that subdomains of the same color Ck do not intersect. The following is the multicolor Schwarz algorithm with starting iterate w(.).

Algorithm 2.2.2 (Multicolor Schwarz Alternating Algorithm)
Input: w(.)
1. For k = 0, 1, · · · until convergence do:
2.   For l = 1, · · · , d do:
3.     For each i ∈ Cl solve in parallel:
           −∇ · (a(x)∇v(k+i/p)) + c(x) v(k+i/p) = f(x), in Ωi∗
           n · (a∇v(k+i/p)) + γ v(k+i/p) = gN, on B[i] ∩ BN
           v(k+i/p) = w, on B(i)
           v(k+i/p) = 0, on B[i] ∩ BD.
4.     Endfor
5.     Update: w ← v(k+i/p) on Ω̄i∗, for each i ∈ Cl.
     Endfor
6.   w(k+1) ← w
7. Endfor
Output: w(.)
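A d-coloring satisfying Definition 2.10 can be computed greedily from the intersection pattern of the extended subdomains. The following is a minimal sketch; the adjacency data is an assumed example, and greedy coloring need not yield the minimal d.

```python
# Greedy coloring sketch: i is adjacent to j when the extended subdomains
# Omega_i* and Omega_j* intersect (adjacency list assumed given).
def color_subdomains(adjacency):
    """Assign each subdomain the smallest color unused by its neighbors."""
    colors = {}
    for i in sorted(adjacency):
        taken = {colors[j] for j in adjacency[i] if j in colors}
        c = 0
        while c in taken:
            c += 1
        colors[i] = c
    return colors

# 2x2 grid of overlapping subdomains: each one overlaps only its horizontal
# and vertical neighbors.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
colors = color_subdomains(adj)

# Subdomains of the same color never intersect, so their subproblems can be
# solved concurrently within one fractional step.
assert all(colors[i] != colors[j] for i in adj for j in adj[i])
```

For grid-like decompositions such as the one in Fig. 2.2, this recovers the small number of colors (here two; four in 2D with diagonal overlap) that keeps the sequential depth d low.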

C2 = {2. ∗ 1 see [MA37. If q processors are available and the subdomains can be colored into d colors with approximately (p/d) subdomains of the same color.. and further if (p/d) is a multiple of q. 2. 4. 2. We deﬁne an A(. 10. then Pi u can be computed without explicit knowledge of u using that A(u. JO2]. v).17) We will employ the property that the bilinear form A(.2.2 may be grouped into four colors: C1 = {1. Additionally. . . C4 = {6. 13.) in (2.12. 3. i The updates w(k+ p ) in the continuous Schwarz alternating method can be 1 expressed in terms of certain projection operators onto subspaces of HD (Ω). to ensure that the loads assigned to each processor are balanced. Remark 2. LI6]. The existence and uniqueness of Pi w is guaranteed by the Lax-Milgram lemma. · · · . 11} . and each subdomain should be approximately of the same diameter.13. 9. Some communication will be necessary between the diﬀerent subdomains. C3 = {5. . Deﬁnition 2. 15} . Remark 2. On each Ωi deﬁne a subspace Vi of HD (Ω) as: Vi ≡ v ∈ HD 1 (Ω) : v = 0 in Ω \ Ωi∗ . Multisubdomain overlapping decomposition Remark 2.14) deﬁnes an inner product on HD 1 (Ω) when BD = ∅.13). Given w ∈ HD 1 (Ω) deﬁne Pi w ∈ Vi as the solution of: A(Pi w. the number of colors d should be chosen to be as small as possible. then subdomains of the same color may be partitioned into q groups and each group assigned to one of the processors.11. Ω16 ∗ in Fig. there should be approx- imately the same number of subdomains of each color. 12} .2 Projection Formulation of Schwarz Algorithms 59 Non-overlapping decomposition Selected extended subdomains Ω1 Ω2 Ω3 Ω4 Ω1∗ Ω5 Ω6 Ω7 Ω8 ∗ Ω9 Ω10 Ω11 Ω12 Ω11 Ω13 Ω14 Ω15 Ω16 Fig. 2. 7.)- 1 orthogonal projection operator Pi onto subspace Vi of HD (Ω) as follows. v) = F (v). 14. If u denotes the solution of weak formula- tion (2. provided the overlap β is not too large. For instance. To minimize the number d of sequential steps. 8. see [CI2].. since F (·) is given for all v ∈ Vi . 
The following result shows that the projection maps Pi can represent the updates in the continuous version of the Schwarz alternating method.14. (2. for v ∈ Vi . the subdomains Ω1∗ . 16} . see [CI2. v) = A(w.

on B[i] ∩ BN (2. 1. on Ωi∗ ⎪ ⎪ ⎨ n · (a∇wi ) + γ wi = gN . ∀v ∈ Vi . on B (i) ⎪ ⎩ wi = 0. v) = F (v) = A(u. An (k+ pi ) (k+ i−1 application of Lemma 2. Ωi∗ Ω where wi ∈ Vi due to its boundary conditions. Then wi = w + Pi (u − w) . Suppose the following assumptions hold. Multiplying (2. Let u satisfy (2. Thus. Given w ∈ HD 1 (Ω) let wi satisfy: ⎧ ⎪ −∇ · (a(x)∇wi ) + c(x) wi = f (x). v) = A(u − w. Since (wi − w) = 0 in Ω \ Ωi∗ it yields wi − w ∈ Vi and wi − w = Pi (u − w). 2.18) by v ∈ Vi ⊂ HD 1 (Ω) (which is zero outside Ωi∗ ). The continuous version of the Schwarz alternating method may now be reformulated in terms of the projection operators Pi onto Vi ⊂ HD 1 (Ω). with wi ≡ w on Ω \ Ωi∗ .18) ⎪ ⎪ wi = w. Proof.15.13) and let gN (x) ≡ n · (a(x)∇u) + γ(x) u on BN . v). on B[i] ∩ BD . we obtain wi = w + Pi (u − w) . Employing the above yields: A(wi − w.60 2 Schwarz Iterative Algorithms Lemma 2. and integrating the resulting term by parts yields: (Lwi ) v dx = (Lwi ) v dx = A(wi . ∀v ∈ Vi . v).15 with wi ≡ w and w ≡ w p ) yields: i i−1 .

Algorithm 2.2. 1. · · · until convergence do: 2. i−1 w(k+ p ) = w(k+ p ) + Pi u − w(k+ p ) . For k = 0.3 (Projection Version of the Classical Schwarz Method) Input: w(0) starting iterate. p do i i−1 . (2. 1. · · · .19) Substituting this representation into the Schwarz alternating method yields its projection formulation. For i = 1.

Endfor Output: w(k) . 3. Endfor 4. i−1 w(k+ p ) = w(k+ p ) + Pi u − w(k+ p ) .

2.16. the subspaces Vi of HD (Ω) must satisfy: 1 HD (Ω) = V1 + · · · + Vp see Chap.2 Projection Formulation of Schwarz Algorithms 61 Remark 2.5. the projections Pi may no longer involve the solution of partial diﬀerential equations on subdomains. For general subspaces Vi ⊂ HD 1 (Ω). however. The preceding projection version of the Schwarz alternating method will also be applicable for more general subspaces Vi ⊂ HD 1 (Ω). Subtracting the iterates in (2.19) from u and recursively applying the expression yields the following equation for the error u − w(k+1) ⎧ . To 1 ensure convergence. 2.

p−1 ⎪ ⎪ u − w(k+1) = (I − Pp ) u − w(k+ p ) ⎪ ⎪ .

13) will also solve: P u = w∗ .13) using a sum of projections P ≡ P1 + · · · + Pp . where each Pi is the A(.13) is a highly parallel algorithm in the Schwarz family [DR11]. ⎪ ⎩ = (I − Pp ) · · · (I − P1 ) u − w(k) . . since the terms Pi u ∈ Vi can be computed by solving: A(Pi u. ∀v ∈ Vi . It is shown in Chap. Deﬁne the error ampliﬁcation map by T = (I − Pp ) · · · (I − P1 ). . (2. see Chap. the solution u of (2. ⎪ ⎪ . upper and lower bounds can be calculated for the spectra of P . we may compute w∗ ≡ (I − T )u without explicit knowledge of u. It reformulates (2.)-orthogonal projection onto Vi deﬁned by (2. Furthermore. . v) = F (v).).20) Equation (2.21). an equivalent problem for determining u is: (I − T )u = w∗ . 2... This map T will be a contraction (in an appropriate norm.20) will be well posed since T is a contraction.2 Additive Schwarz Method The additive Schwarz method to solve (2. For instance. ensuring the well posedness of problem (2. Formally.5).17).5 that the operator P is self adjoint and coercive in the Sobolev space HD 1 (Ω) equipped with the inner product A(. 2. Since (I − T ) involves only sums (or diﬀerences) of products of projections Pi . 2. Consequently. (2.2. when p = 2 we obtain that (I − T ) = P1 + P2 − P2 P1 and w∗ = P1 u + P2 u − P2 P1 u. ⎪ ⎨ p−2 = (I − P )(I − P ) u − w(k+ p ) p p−1 ⎪ ⎪ . v) = A(u.21) where w∗ ≡ P1 u + · · · + Pp u can be computed without explicit knowledge of u..

13) is based on the solution of (2.62 2 Schwarz Iterative Algorithms The additive Schwarz formulation of (2.21). for illustrative purposes we indicate a Richardson iteration to solve (2. a new iterate w(k+1) is constructed as follows [TA5].21). For 1 ≤ i ≤ p solve in parallel: ⎧ . Given an iterate w(k) . however. it is typically employed as a preconditioner. In the discrete case.

⎪ ⎪ −∇ · a(x)∇v (k+1) (k+1) + c(x) vi = f (x). in Ωi∗ ⎪ ⎪ .

on B[i] ∩ BD ≡ w(k) on Ω \ Ωi∗ . on B (i) ⎪ ⎪ vi ⎪ ⎩ (k+1) vi = 0. Then update: (k+1) and extend vi . on B[i] ∩ BN ⎪ ⎪ (k+1) = w(k) . i ⎪ ⎨ (k+1) (k+1) n · a∇vi + γ vi = gN .

where 0 < t1 < τ < t2 < p1 is the step size parameter in Richardson’s iteration. Algorithm 2. (k+1) w(k+1) ≡ (1 − τ p) w(k) + τ v1 + · · · + vp(k) .4 (Additive Schwarz-Richardson Iteration) Input: w(0) (starting iterate) and 0 < t1 < τ < t2 < p1 1. Compute in parallel: . For k = 0. The resulting algorithm is summarized below in terms of projections. · · · until convergence do: 2.2.

.2. If a coarse space V0 ⊂ HD 1 (Ω) is employed. Additionally. Endfor The additive Schwarz-Richardson iterates w(k) will converge geometrically to u for appropriately chosen τ .. As in the additive Schwarz method. then P = (P0 + · · · + Pp ) must be employed. w(k+1) ≡ w(k) + τ P1 (u − w(k) ) + · · · + Pp (u − w(k) ) .17). The resulting method yields improved convergence over the additive Schwarz method.)-orthogonal projections Pi for 1 ≤ i ≤ p.3. subspaces Vi are deﬁned by (2. 2. the multiplicative Schwarz iterates will generally converge more rapidly [XU3].3 Hybrid Schwarz Method The hybrid Schwarz method is a variant of the additive Schwarz method obtained by incorporating sequential steps from the multiplicative Schwarz method [MA15]. a . but the algorithm is less parallelizable due to the extra sequential steps. However. with associated A(.2. The matrix version of the additive Schwarz preconditioner is described in Chap. 3.

22) where f∗ ≡ Pˆ u can be computed explicitly.. without explicit knowledge of u. .13) as: u = P0 u + (I − P0 )u.17..2 Projection Formulation of Schwarz Algorithms 63 coarse space V0 ⊂ HD 1 (Ω) is employed with A(. Remark 2. ∀vi ∈ Vi . where V0⊥ denotes the orthogonal complement of V0 in the inner product A(. which is an A(. v0 ) = F (v0 ).. the hybrid Schwarz method solves (2. 2. Determine u0 ∈ V0 satisfying: A(u0 . ∀v0 ∈ V0 . Deﬁne w ≡ w1 + · · · + wp and determine u ˜0 ∈ V0 satisfying: u0 .).13): Pˆ u = f∗ . and formally construct the following problem equivalent to (2. Deﬁne: Pˆ ≡ P0 + (I − P0 ) (P1 + · · · + Pp ) (I − P0 ).)-orthogonal projection P0 . . The preceding observations may be combined.) and will generally have improved spectral properties over the additive Schwarz operator P = (P1 + · · · + Pp )..22). ∀v0 ∈ V0 . v0 ) = A(w.22) can be computed explicitly as follows. v0 ) = F (v0 ). . .22). . The hybrid Schwarz formulation decomposes the solution to (2. in principle. Formally. The operator Pˆ can be shown to be self adjoint and coercive in A(. vi ). A(˜ ∀v0 ∈ V0 . The component P0 u ∈ V0 can be formally determined by solving the subproblem: A(P0 u. (2. vi ) = F (vi ) − A(u0 . For 1 ≤ i ≤ p determine wi ∈ Vi satisfying: A(wi . Here g∗ = (I − P0 ) (P1 + · · · + Pp ) (I − P0 )u can be computed without explicit knowledge of u. by applying an additive Schwarz method in V0⊥ : (I − P0 ) (P1 + · · · + Pp ) (I − P0 )u = g∗ . The component (I − P0 )u ∈ V0⊥ can be sought. v0 ). we illustrate a Richardson iteration to solve (2. The forcing f∗ in (2.)-orthogonal decomposition. In the following. Then f∗ = Pˆ u = u0 + (w − u ˜0 ).

In the latter case. since χi (x) = 0 on B (i) . the algorithm can be applied either as an unaccelerated iteration or as a preconditioner. 3. Then. The balancing domain decomposition preconditioner for Schur complement matrices (in Chap.) solve the hybrid formulation for 1 ≤ i ≤ p: ⎧ ⎪ ⎪ −∇ · (a(x)∇wi ) + c(x) wi = f (x).19. Let w1 (.5 (Hybrid Schwarz-Richardson Iteration) Input: w(0) starting iterate and 0 < t1 < τ < t2 < p1 1.2.18. Let u(x) denote a solution to (2. it can also be motivated by a multisubdomain hybrid formulation of (2. the coarse space V0 may be constructed so that all the subdomain compatability conditions are simultaneously enforced in the orthogonal complement of V0 . CA17].12). 2.23) ⎪ ⎪ wi = j=i χj wj . Let c(x) ≥ c0 > 0 in (2. Using this. Endfor Remark 2. Suppose the following assumptions hold.)}pi=1 .64 2 Schwarz Iterative Algorithms Algorithm 2.12). For k = 0. on B (i) ⎪ ⎪ ⎩ wi = 0. on B[i] ∩ BD . on B[i] ∩ BN (2. the following result will hold: ∗ u(x) = wi (x) on Ω i . · · · . wp (. Compute in parallel: w(k+1) ≡ w(k) + τ P˜ (u − w(k) ). MA17]. in Ωi∗ ⎪ ⎪ ⎨ n · (a∇wi ) + γ wi = gN . 3.4 Restricted Schwarz Algorithm The restricted Schwarz method is a variant of the additive Schwarz method employing a partition of unity. Formally. · · · . In such applications. certain compatibility conditions must be satisﬁed locally. Ωp∗ . 2. In it. . · · · . In practice. · · · until convergence do: 2. KU6. the exact projections Pi are replaced by approximations which require the solution of Neumann boundary value problems on non-overlapping sub- domains Ωi . it yields a non-symmetric preconditioner even for self adjoint problems. for 1 ≤ i ≤ p.12) based on a partition of unity χ1 (x). χp (x) subordinate to Ω1∗ . Proof.2. we note that j=i χj (x) = 1 on B (i) for 1 ≤ i ≤ p. 3) is based on this principle [MA14. Given the partition of unity {χi (. see [CA19. 15 for the case BN = ∅.). See Chap. we obtain the hybrid formulation. Theorem 2. 1. 
For each subdomain Neumann problem to be solvable.

2. vp ) = (w1 . (2.24) ⎪ ⎪ wi = j=i χj vj . · · · .15 to wi with w ≡ v to obtain: wi = v + Pi (u − v). j=i Substitute this into (2. the outputs wi satisfy: ⎧ ⎪ ⎪ −∇ · (a(x)∇wi ) + c(x) wi = f (x). p in parallel compute: . · · · until convergence do: 2.6 (Restricted Schwarz Method in Projection Form) (0) (0) p (0) Input: (w1 .25) i i i The following algorithm corresponds to a Picard iteration of the map T . the mapping T will be a contraction and the Picard iterates of T will converge to its ﬁxed point (u1 . for 1 ≤ i ≤ p where u solves (2. At the ﬁxed point of T where v = w. this yields: w= χi wi = χi (w + Pi (u − w)) = w + χi Pi (u − w). the global approximation v(x) will satisfy: v(x) = χj (x)vj (x).24) and apply Lemma 2. · · · . Under the assumption c(x) ≥ c0 > 0 and BN = ∅. Algorithm 2. · · · . 1. on B[i] ∩ BD . Given local approximations p (v1 .23) corresponds to a ﬁxed point equation for the following linear mapping T deﬁned by: T (v1 . wp ) where for vi satisfying vi = 0 on B[i] ∩ BD and n · (a∇vi ) + γ vi = gN on B[i] ∩ BN for 1 ≤ i ≤ p. on Ωi∗ . For k = 0. on B (i) ⎪ ⎪ ⎩ wi = 0. · · · . on each B (i) . · · · . vp ) deﬁne a global approximation v ≡ j=1 χj vj . on Ωi∗ ⎪ ⎪ ⎨ n · (a∇wi ) + γ wi = gN . Since χi (x) = 0 for x ∈ B (i) . on B[i] ∩ BN (2.13).2. wp ) and w(0) (x) ≡ j=1 χj (x)wj (x) 1. For i = 1. up ) where ui ≡ u on each subdomain Ωi∗. · · · .2 Projection Formulation of Schwarz Algorithms 65 The hybrid formulation (2.

(k+1) wi ≡ Pi u − w(k) . 3. Deﬁne: w(k+1) (x) ≡ w(k) (x) + i=1 χi (x)wi (x). Endfor . 5. Endfor p (k+1) 4.

w ∈ V (2.Ω denotes the maximum norm. for 0 ≤ i ≤ p. TA8. Our formulation will employ the ﬁnite dimensional linear space V = IRn . for problems of the form (2.28) We shall formulate matrix Schwarz algorithms to solve this system by analogy with the projection algorithms described in Chap. 2. Consider the ﬁnite dimensional space V ≡ IRn endowed with a self adjoint and coercive bilinear form A(.66 2 Schwarz Iterative Algorithms Under appropriate assumptions.. w) ≡ vT Aw. we shall seek u ∈ V such that: ⎧ ⎨ A(u. 15 when BN = ∅. for v ∈ V. The convergence of the preconditioner associated with the preceding algorithm can also be improved signiﬁcantly if a coarse space projection term is employed additively. which also deﬁnes an inner product on V . we shall describe the matrix version of Schwarz algorithms. . as the number of subdomains is increased and their diameters decrease in size.27) ⎪ ⎩ F (v) ≡ vT f .). for v ∈ V. BR18.3 Matrix Form of Schwarz Subspace Algorithms In this section. v) = F (v). The matrix form of preconditioner associated with the restricted Schwarz method is described in Chap. for v. MA37. We shall further assume that we are given subspaces Vi ⊂ V for 0 ≤ i ≤ p satisfying: V = V0 + V1 + · · · + Vp .). XU3.Ω . 2. (2. problem (2. T will be a contraction and the iterates w(k) will converge geometrically to the solution u of (2. Given a linear functional F (·).. Consequently. Remark 2. where · ∞. where A is an n × n symmetric and positive deﬁnite matrix and f ∈ IRn . 2. see Chap. endowed with a self adjoint and coercive bilinear form A(. see [MI. (2. and that it is the column space (Range) of an n × ni matrix RiT of full rank: Vi ≡ Range RiT . .27) will correspond to the linear system: Au = f . the rate of convergence of the algorithm can deteriorate. . We shall assume that each Vi ⊂ IRn is of dimension ni . GR4].13). DR11.20.2.Ω ≤ δ k w(0) − u∞.12): w(k) − u∞. 
The preceding restricted Schwarz algorithm did not employ coarse space residual correction.3.26) In this case. where ⎪ A(v. matrix expressions can be derived for the projection version of the Schwarz algorithms described in the preceding section. In matrix terms.

wi ) = A(v. Given v ∈ V . This requires that given v ∈ V . can be obtained . yi ∈ IRni .22.26). we obtain that Ai xi = Ri Av. (n0 + · · · + nl )} . we deﬁne Pi v ∈ Vi : A(Pi v. The matrices Ri will be referred to as restriction maps while their transposes RiT will be referred to as extension maps. Remark 2. .24.3 to solve system (2. and substi- tuting Pi v = RiT xi results in the expression: Pi = RiT A−1 i Ri A (2.. . 2. In particular. ∀yi ∈ IRni . Remark 2. Since this must hold for all yi ∈ IRni . Remark 2. If the rows and columns of matrix Ri are elementary vectors. Instead. Matrix versions of Schwarz algorithms to solve (2. the columns of RiT must form a basis for Vi . The matrix version of Alg.30) for the matrix representation of Pi .28) instead of problem (2. then matrix Ai = Ri ARiT will correspond to principal submatrices of A.3 Matrix Form of Schwarz Subspace Algorithms 67 Thus. Solving this linear system yields xi = A−1 i Ri Av.28) based on the subspaces Vi can be ob- tained by transcribing the projection algorithms in terms of matrices. This will require a matrix representation of the projections Pi .21. . Matrix A−1 i should not be assembled. there must exist vi ∈ Vi satisfying: v = v0 + v1 + · · · + vp . then Al will correspond to the diagonal block of A with indices in Il . if (n0 + · · · + np ) = n and Rl corresponds to the rows of an identity matrix of size n with indices in Il : Il = {(n0 + · · · + nl−1 ) + 1.23.2.29) to obtain: yTi (Ri ARiT ) xi = yTi Ri A v. A matrix representation of Pi can be derived as follows. corresponding to selected columns or rows or some identity matrix of appro- priate size. Deﬁnition 2. . represent Pi v = RiT xi and wi = RiT yi for xi . We assume that Vi satis- ﬁes (2. an expression wi = A−1 i ri can be computed by solving Ai wi = ri .29) as the A(. Multiplicative Schwarz Algorithm.)-orthogonal of v ∈ V onto Vi . An elementary rank argument will show that (n0 + n1 + · · · + np ) ≥ n.2). where Ai ≡ (Ri ARiT ). 
wi ) ∀wi ∈ Vi (2. Since Vi is the column space of RiT . Substitute these representations into (2. . 2.

Substituting the matrix form of projection Pi . by replacing (k+ i−1 i−1 each update Pi (u − w p ) ) by its discrete counterpart Pi u − w(k+ p ) .28). where u is the solution to (2.

68 2 Schwarz Iterative Algorithms and using that Au = f yields: ⎧ .

⎨ Pi (u − w) = RT A−1 Ri A u − w(k+ p ) i−1 i i .

⎩ i−1 T −1 = Ri Ai Ri f − Aw(k+ p ) . the matrix form of: i i−1 . Thus.

becomes: i−1 . i−1 w(k+ p ) = w(k+ p ) + Pi u − w(k+ p ) .

p do: i+1 . f 1. · · · until convergence do: 2. 1. (k+ i−1 + RiT A−1 i w(k+ p ) = w(k+ p ) i R i f − Aw p ) . Algorithm 2. The resulting multiplicative or sequential Schwarz algorithm is listed next.1 (Multiplicative Schwarz Method to Solve (2.3. For i = 0. · · · . For k = 0.28)) Input: w(0) = 0 (starting guess).

2 (Symmetrized Schwarz Preconditioner for (2. The inverse of the symmetrized Schwarz preconditioner M is described below. 1. p do: w ← w + RiT A−1 i Ri (r − Aw). its action on a vector should be computed by solution of the associated linear system. Endfor Output: w(k) The iterates w(k) in this algorithm will converge to the solution of (2. Instead. 1. 2.25. . A−1i should not be assembled. For instance.3. the computation of RiT A−1 i Ri f should ﬁrst involve the computation of Ri f . · · · . Endfor Output: M −1 r ≡ w Remark 2. Endfor 4. · · · . 3.28)) Input: w ≡ 0 and r 1.28) without acceleration. In practice. followed by the solution of the linear system Ai vi = Ri f . 0. The notation A−1 i was only employed for convenience in the preceding algorithms. If CG acceleration is employed to solve Au = f . Scatter-gather operations can be used to implement RiT and Ri . followed by the computation RiT vi . w(k+ p+1 ) = w(k+ p+1 ) + RiT A−1 i i i Ri f − Aw (k+ p+1 ) . For i = p. then a symmetric positive deﬁnite preconditioner would be necessary [GO4]. Algorithm 2.

2. the matrices Ai = Ri ARiT can be replaced by appropriately chosen preconditioners A˜i = A˜Ti > 0. AX. SA2]. a sparse preconditioner A˜i for Ai can be obtained by ILU fac- torization of Ai .3 Matrix Form of Schwarz Subspace Algorithms 69 Remark 2.26. As an example. If approximations are employed in the multiplicative Schwarz . see [BE. In both of the preceding algorithms.

p. Algorithm 2. · · · . 1. i=0 This is summarized below. Endfor 3. Sum: w ≡ w0 + · · · + wp . · · · . Additive Schwarz Algorithm. see [XU3]. then it is easily seen that the matrix version of the additive Schwarz preconditioner corresponds to a block Jacobi . In step 1 of the preceding algorithm.28. For i = 0.3. p−1.31) for u corresponds to a preconditioned system of the form M −1 Au = M −1 f . When (n0 +n1 +· · ·+np ) = n and the columns of Rl correspond to selected columns of an identity matrix. 0. (2. 0. an alternative sym- metrization involving one additional fractional step can be used in the sym- metrized Schwarz preconditioner. p in parallel do: wi = RiT A−1 i Ri r 2. This yields the additive Schwarz preconditioner as: p M −1 = RiT A−1 i Ri . · · · . p−1. to ensure convergence without acceleration. If a preconditioner is employed for A0 .28) has the form: ! p T −1 Ri Ai Ri A u = w∗ .3 (Additive Schwarz Preconditioner for (2. i=0 The system (2. Remark 2.28)) Input: r 1. Output: M −1 r ≡ w Remark 2. The matrix version of the additive Schwarz equation P u = f∗ for solution of (2.27. Both versions will be equivalent if an exact solver is employed for A0 . residual corrections can be implemented for i = p. 1. ˜−1 the condition λmax Ai Ai < 2 must be satisﬁed.31) i=0 where p w∗ ≡ RiT A−1 i Ri f . method.

28)) Input: r 1. Next. Thus. seek it in the form u0 = R0T α0 for some unknown coeﬃcient vector α0 ∈ IRn0 . p in parallel do: vi = RiT A−1 i Ri (r − Aw0 ). Compute: w0 = R0T A−1 0 R0 r. 2. . to verify that M −1 r will satisfy R0 AM −1 r = 0 whenever r ∈ IRn satisﬁes R0 r = 0. If the input residual r satisﬁes R0 r = 0. For i = 1. Sum: v = v1 + · · · + vp . so that R0 (f − Au0 ) = 0. Compute: v0 = R0T A−1 0 R0 Av. and the matrix version of the multiplicative Schwarz method corresponds to the block Gauss-Seidel method.70 2 Schwarz Iterative Algorithms preconditioner. Imposing the preceding constraint will yield: R0 (f − A u0 ) = 0 ⇔ R0 f − A R0T α0 = 0 ⇔ α0 = A−1 0 R0 f . 5. Then. yielding w0 = 0. 3. Algorithm 2. Note that to construct a starting iterate u0 ∈ IRn . This suggests choosing a starting iterate u0 ∈ IRn in the conjugate gradient method so that the initial residual r = f − A u0 satisﬁes R0 (f − A u0 ) = 0. the action M −1 of the inverse of preconditioner M can easily be deduced to be the following. as will be shown below.29.3.4 (Hybrid Schwarz Preconditioner for (2. 6. Hybrid Schwarz Method. Compute: w = w0 + v − v0 . As this problem represents the preconditioned system M −1 Au = M −1 f . · · · . where A0 = R0 AR0T . all subsequent residuals in the conjugate gradient method with hybrid Schwarz preconditioner will satisfy this constraint. Output: M −1 r ≡ w Remark 2. When (n0 + · · · + np ) > n or when the columns of Rl are not columns of an identity matrix. apply R0 A to step 6 in the hybrid Schwarz preconditioner with w0 = 0 to obtain: R0 AM −1 r = R0 Av − R0 AR0T A−1 0 R0 Av = 0. then step 1 in the hybrid Schwarz preconditioner can be skipped. Endfor 4. The matrix version of the hybrid Schwarz pre- conditioner can be derived from the hybrid Schwarz problem P˜ u = f∗ where Pˆ = P0 + (I − P0 ) (P1 + · · · + Pp ) (I − P0 ). then the multiplicative and additive Schwarz algorithms generalize the block Jacobi and block Gauss-Seidel algorithms. 
u0 = R0T A0−1 R0 f .

Ep form a discrete partition of unity relative to R1 . Restricted Schwarz Algorithm. . In the version given below.32. if αi is an ni × di matrix whose columns form a basis for the null space of A˜i . so that the subproblems in step 2 of the hybrid Schwarz preconditioner are well deﬁned when Ai is replaced by A˜i . Substituting the deﬁnition of R0 yields that αTi Ri (r − Aw0 ) = 0 for 1 ≤ i ≤ p. if such can be found. where each Ei is an n × ni matrix for 1 ≤ i ≤ p. By construction of the term w0 in step 1 of the hybrid Schwarz preconditioner. it may even be advantageous to employ singular matrices A˜i whose null spaces are known. linear systems of the form A˜i vi = ri will be solvable only if a compatibility condition is satisﬁed. .31. In such ap- plications. with E0 ≡ R0T . then αTi ri = 0 must hold for solvability. 2. the projection term v − R0T A−10 R0 Av in step 6 modiﬁes these arbitrary terms so that R0 AM −1 r = 0 holds.5.28) is motivated by (2. Rp if: E1 R1 + · · · + Ep Rp = I. This is the principle underlying the balancing domain decomposition preconditioner [MA14]. Remark 2. Each vi in step 2 of the hybrid Schwarz preconditioner can have an arbitrary additive term of the form RiT αi β i with β i ∈ IRdi . Let Vi = Range(RiT ) be subspaces of V = IRn for 1 ≤ i ≤ p. The action M −1 of the inverse of the restricted Schwarz preconditioner to solve (2. · · · . a coarse space correction term is included. and will involve an arbitrary additive term from the null space. Indeed. Deﬁne a coarse space V0 ⊂ IRn as: " # V0 ≡ Range R0T . it is shown that the hybrid Schwarz precondi- tioned matrix P˜ is better conditioned than its associated additive Schwarz preconditioned matrix P . and also eﬀectively handle the arbitrariness of the local solutions. Then. RpT αp .25) when iterate w = 0. its general matrix version will require an algebraic partition of unity. it will hold that R0 (r − Aw0 ) = 0.30. 
The submatrices Ai = Ri ARiT in the hybrid Schwarz precondi- tioner may be replaced by approximations A˜i for 1 ≤ i ≤ p.2 is based on a partition of unity. . · · · . Remark 2. solve the linear system Av = f −Au0 by a conjugate gradient method with a hybrid Schwarz preconditioner in which step 1 is skipped. In this case. We say that matrices E1 . 2. In Chap. a careful choice of coarse space V0 in the hybrid Schwarz method can ensure solvability of all such local problems. where R0T ≡ R1T α1 . Since the restricted Schwarz algorithm in Chap. However. In certain appli- cations. . Deﬁnition 2. To determine v. . the computational costs in a conjugate gradient method to solve Au = f can be reduced by splitting the solution as u = u0 + v with u0 = R0T A−1 0 R0 f . the solution vi will not be unique.3 Matrix Form of Schwarz Subspace Algorithms 71 Thus. 2.

Algorithm 2.5 (Restricted Schwarz Preconditioner for (2.28))

Input: r, 0 < α < 1

1. For i = 0, 1, · · · , p in parallel compute:

      wi = Ei Ai^{-1} Ri r.

2. Endfor

Output: M^{-1} r ≡ α w0 + (1 − α)(w1 + · · · + wp).

Remark 2.33. Since the above preconditioner is not symmetric, it cannot be employed in a conjugate gradient method [CA19].

2.4 Implementational Issues

In this section, we remark on applying the matrix Schwarz algorithms from Chap. 2.3 to solve a discretization of (2.1). We shall also remark on local solvers and parallel software libraries. For simplicity, we only consider a finite element discretization, though the methodology (with the exception of a coarse space V0) will typically carry over for a finite difference discretization.

2.4.1 Choice of Subdomains and Subdomain Spaces

Various factors may influence the choice of an overlapping decomposition Ω1*, · · · , Ωp* of Ω. These include the geometry of the domain, regularity of the solution, availability of fast solvers for subdomain problems and heterogeneity in the coefficients. Ideally, the number of subdomains p also depends on the number of processors. When a natural decomposition is not obvious, an automated strategy may be employed, using the graph partitioning algorithms discussed in Chap. 5, so that the decomposition yields approximately balanced loads, see [BE14, BA20, SI2, PO3, FO2, FA9, PO2].

Once an overlapping decomposition {Ωl*}, l = 1, · · · , p, has been chosen, and given the finite element space Vh ⊂ HD1(Ω), we define the local spaces as:

   Vi ≡ Vh ∩ { v ∈ H1(Ω) : v = 0 on Ω \ Ω̄i* }, for 1 ≤ i ≤ p.

Let ni = dim(Vi) and let index(Ωi*, j) denote the global index of the j'th local node in Ωi* ∪ (BN ∩ B[i]). Then, we define Ri as an ni × n restriction matrix:

   (Ri)kj = 1, if index(Ωi*, k) = j,
   (Ri)kj = 0, if index(Ωi*, k) ≠ j,   for 1 ≤ i ≤ p.

For 1 ≤ i ≤ p these matrices will have zero or one entries, and at most one nonzero entry per row or column. The action of Ri and RiT for 1 ≤ i ≤ p


may be implemented using scatter-gather operations and the data structure

of index(Ωi∗ , ·). The subdomain submatrices Ai of size ni × ni deﬁned by:

Ai = Ri Ah RiT , for 1 ≤ i ≤ p,

will be principal submatrices of A corresponding to the subdomain indices.
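The zero-one structure of Ri and the extraction of the principal submatrix Ai = Ri Ah Ri^T can be sketched as follows; the global matrix and the index set are hypothetical examples, not from the text.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical global stiffness matrix: 1D Laplacian on n nodes.
n = 10
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")

def restriction(global_idx, n):
    """Zero-one restriction matrix: (Ri)_{kj} = 1 iff index(Omega_i*, k) = j."""
    ni = len(global_idx)
    return sp.csr_matrix((np.ones(ni), (np.arange(ni), global_idx)), shape=(ni, n))

# index(Omega_i*, .) for one subdomain: local node k <-> global node global_idx[k].
global_idx = np.array([3, 4, 5, 6])
Ri = restriction(global_idx, n)
Ai = (Ri @ A @ Ri.T).toarray()

# Ai is exactly the principal submatrix of A for the subdomain indices.
assert np.allclose(Ai, A.toarray()[np.ix_(global_idx, global_idx)])
```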

2.4.2 Choice of Coarse Spaces

A coarse space V0 ⊂ (Vh ∩ HD1(Ω)) may be employed as described in Chap. 2.1.

If ψ1^(0)(·), · · · , ψn0^(0)(·) forms a finite element basis for V0, then an extension matrix R0^T of size n × n0 will have the following entries:

   (R0^T)ij = ψj^(0)(xi), for 1 ≤ i ≤ n, 1 ≤ j ≤ n0.

Matrix R0 will not be a zero-one matrix, unlike Ri for 1 ≤ i ≤ p. Furthermore,

A0 = R0 Ah R0T will not be a submatrix of A. In some applications, the coarse

space may be omitted, without adversely aﬀecting the rate of convergence

of Schwarz algorithms. For instance, if c(x) ≥ c0 > 0 and coeﬃcient a(x) is

anisotropic with a suﬃciently small parameter and aligned subdomains, or for

a time stepped problem, with suﬃciently small time step and large overlap.
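The entries (R0^T)ij = ψj^(0)(xi) above can be illustrated with a one-dimensional piecewise-linear (hat function) coarse basis; the grid sizes and the basis below are assumptions made for this sketch.

```python
import numpy as np

# Assumed 1D setting: interior fine nodes x_i on (0,1), and a coarser grid
# carrying hat-function coarse basis functions psi_j^(0).
n, n0 = 17, 3                        # fine nodes, interior coarse basis functions
x = np.linspace(0, 1, n + 2)[1:-1]   # interior fine nodes x_1..x_n
xc = np.linspace(0, 1, n0 + 2)       # coarse nodes, including the boundary
H = xc[1] - xc[0]

def hat(j, t):
    """Piecewise-linear coarse basis function centered at coarse node j."""
    return np.maximum(0.0, 1.0 - np.abs(t - xc[j]) / H)

# (R0^T)_{ij} = psi_j^(0)(x_i): evaluate each coarse basis at the fine nodes.
R0T = np.column_stack([hat(j, x) for j in range(1, n0 + 1)])   # n x n0

# Unlike the subdomain Ri, R0 has genuinely fractional interpolation weights.
assert R0T.shape == (n, n0)
assert not np.array_equal(R0T, R0T.round())
```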

Remark 2.34. When the boundary segment BD ≠ ∅, equation (2.12) will have

a unique solution, and matrix A will be symmetric positive deﬁnite. However,

when BD = ∅ and c(x) = 0 and γ(x) = 0 then (2.12) will be a Neumann

problem. In this case, a compatability condition must be imposed for the

solvability of (2.1), and its solution will be unique only up to a constant. By

construction, all the subdomain matrices Ai will be nonsingular for 1 ≤ i ≤ p

since Dirichlet boundary conditions will be imposed on B^(i) ≠ ∅. However,

matrix A0 will be singular with 1 spanning its null space. To ensure that

each coarse problem of the form A0 v0 = R0 r is solvable, it must hold that

1T R0 r = 0. Then, the coarse solution will be nonunique, but a speciﬁc solution

may be selected so that either 1T v0 = 0, or 1T v = 0 for the global solution.

2.4.3 Discrete Partition of Unity

For the restricted Schwarz algorithm, an algebraic partition of unity consisting

of matrices Ei can be constructed as follows. Let χ1 (·), · · · , χp (·) denote a

continuous partition of unity subordinate to Ω1∗ , · · · , Ωp∗ . If x1 , · · · , xn denote

the nodes of Th (Ω) in Ω ∪ BN , deﬁne:

   (Ei)lj = χi(xl), if index(Ωi*, j) = l,
   (Ei)lj = 0, if index(Ωi*, j) ≠ l.


Here 1 ≤ i ≤ p, 1 ≤ l ≤ n and 1 ≤ j ≤ ni. Then, by construction:

   Σ_{i=1}^{p} Ei Ri = I.

Similar discrete partitions of unity are employed in [MA17]. For the coarse

space, we formally deﬁne E0 ≡ R0T .
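The construction above can be sketched in code. The fine grid, the continuous partition of unity χi, and the overlapping index sets are assumptions made for this example.

```python
import numpy as np

# Assumed 1D setting: n fine nodes, two overlapping subdomains, and linear
# "ramp" functions chi_i forming a continuous partition of unity.
n = 10
x = np.linspace(0.0, 1.0, n)
idx = [np.where(x <= 0.7)[0], np.where(x >= 0.3)[0]]   # index(Omega_i*, .)

chi1 = np.clip((0.7 - x) / 0.4, 0.0, 1.0)              # 1 left of overlap, 0 at 0.7
chi = [chi1, 1.0 - chi1]                               # chi1 + chi2 = 1 everywhere

R = [np.eye(n)[ix] for ix in idx]
# (Ei)_{lj} = chi_i(x_l) if index(Omega_i*, j) = l, else 0  =>  Ei = Ri^T diag(chi_i)
E = [Ri.T @ np.diag(c[ix]) for Ri, c, ix in zip(R, chi, idx)]

# By construction, sum_i Ei Ri = I.
assert np.allclose(sum(Ei @ Ri for Ei, Ri in zip(E, R)), np.eye(n))
```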

2.4.4 Convergence Rates

For discretizations of self adjoint and coercive elliptic equations, Schwarz algorithms typically converge at a rate independent of (or mildly dependent

on) the mesh size h and the subdomain size h0 , provided the overlap between

subdomains is suﬃciently large, and a coarse space V0 is employed with an

O(h0 ) approximation property. This is veriﬁed by both computational tests

and theoretical analysis. The latter typically assumes that the overlap between

subdomains is β h0 > 0 and shows that the rate of convergence can depend

on the coeﬃcient a(.), and mildly on the parameter β, see Chap. 2.5.

2.4.5 Local Solvers

The implementation of Schwarz algorithms requires computing terms of the

form wi = A−1 i Ri r for multiple choices of Ri r. In practice, wi is obtained

by solving the associated system Ai wi = Ri r, using a direct or iterative

solver. Direct solvers are commonly employed, since they are robust and do

not involve double iteration. Furthermore, eﬃcient sparse direct solvers are

available in software packages. In the following, we list several solvers.

Direct Solvers. Since Ai = ATi > 0 is sparse, a direct solver based on

Cholesky factorization can be employed [GO4, GE5, DU]. Matrix Ai and its
Cholesky factorization Ai = Li Li^T should be stored using a sparse format.

Systems of the form Ai wi = Ri r can then be solved using back substitution,

solving Li zi = Ri r and LTi wi = zi , see [GO4]. Such algorithms are available

in LAPACK, SPARSPAK and SPARSKIT, see [GE5, DU, GO4, SA2, AN].
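A minimal factor-once/solve-many sketch, with SciPy's sparse LU standing in for the sparse Cholesky solvers cited above; the subdomain matrix is an assumed example.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Assumed subdomain matrix Ai: sparse SPD 1D Laplacian.
ni = 50
Ai = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(ni, ni), format="csc")

lu = spla.splu(Ai)            # factor once (cost depends on sparsity/ordering) ...
rhs1 = np.ones(ni)
rhs2 = np.arange(ni, dtype=float)
w1 = lu.solve(rhs1)           # ... then each new right-hand side costs only
w2 = lu.solve(rhs2)           #     the triangular back-substitutions

assert np.allclose(Ai @ w1, rhs1)
assert np.allclose(Ai @ w2, rhs2)
```

Note that `splu` internally applies a fill-reducing column ordering, echoing the reordering discussion in the following remark.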

Remark 2.35. The cost of employing a direct solver to solve Ai wi = Ri r de-

pends on the cost of computing its Cholesky factors Li and LTi , and the cost

for solving Li zi = Ri r and LTi wi = zi . When multiple systems of the form

Ai wi = Ri r need to be solved, the Cholesky factors of Ai need to be deter-

mined only once and stored. The cost of computing the Cholesky factorization

of Ai will depend on the sparsity of Ai , while the cost of solving Li zi = Ri r

and LTi wi = zi will depend on the sparsity of Li . These costs can be sig-

niﬁcantly reduced by reordering (permuting) the unknowns. For instance, if

subdomain Ωi∗ is a thin strip, then a band solver can be eﬃcient, provided

the unknowns are reordered within the strip so that the band size is mini-

mized. Other common orderings include the nested dissection ordering, and

2.4 Implementational Issues 75

**the Cuthill-McKee and reverse Cuthill-McKee orderings, see [GE5, DU, SA2].
**

Sparse software packages such as SPARSPAK and SPARSKIT, typically em-

ploy graph theoretic methods to automate the choice of a reordering so that

the amount of ﬁll in is approximately minimized, to reduce the cost of em-

ploying a direct solver [GE5, DU]. Such solvers typically have a complexity of

O(ni^α) for 1 < α < 3.

FFT Based Solvers. Fast direct solvers based on Fast Fourier Transforms

(FFT’s) may be available for special geometries, coeﬃcients, triangulations

and boundary conditions, see [VA4]. Such solvers will apply when the eigen-

value decomposition Ai = Fi Λi FiT of Ai is known, where Λi is a diagonal

matrix of eigenvalues of Ai , and Fi is a discrete Fourier (or sine or cosine)

transform. Such solvers will typically have a complexity of O(ni log(ni )).
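For the 1D model matrix tridiag(−1, 2, −1), whose eigenvectors are discrete sine modes, such a fast solver can be sketched with a type-I discrete sine transform; the matrix and sizes are assumptions made for this example.

```python
import numpy as np
from scipy.fft import dst

# Ai = tridiag(-1, 2, -1) has eigenvalues 2 - 2 cos(k pi/(ni+1)) and sine-mode
# eigenvectors, so a DST-I plays the role of Fi in Ai = Fi Lambda_i Fi^T.
ni = 64
lam = 2.0 - 2.0 * np.cos(np.arange(1, ni + 1) * np.pi / (ni + 1))

def fft_solve(f):
    """Return Ai^{-1} f in O(ni log ni): transform, scale by 1/lambda, transform back."""
    return dst(dst(f, type=1) / ((ni + 1) * lam), type=1) / 2.0

f = np.random.default_rng(0).standard_normal(ni)
u = fft_solve(f)

Ai = 2 * np.eye(ni) - np.eye(ni, k=1) - np.eye(ni, k=-1)
assert np.allclose(Ai @ u, f)
```

The scaling factors account for the normalization of SciPy's unnormalized DST-I, for which applying the transform twice multiplies a vector by 2(ni + 1).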

Iterative Solvers. Each subdomain problem Ai wi = ri may also be solved

iteratively using a CG algorithm with a preconditioner Mi (such as ILU,

Gauss-Seidel, Jacobi) in an inner loop. This will introduce double iteration.

To ensure convergence, the ﬁxed number of local iterations must be accurate

to within the discretization error. If the number of iterations vary with each

application of the local solver, then the Schwarz preconditioner may vary with

each iteration, see [GO4, SA2, AX, SI3].

Remark 2.36. If an iterative local solver is employed, with ﬁxed number of

iterations and zero starting guess, this will yield a preconditioner A˜i for Ai ,

see [GO4, BE2, NO2, AX, MA8]. To ensure the convergence of Schwarz algo-

rithms when approximate solvers are employed, matrices A˜i must satisfy cer-

tain assumptions. For instance, the condition number of the additive Schwarz

preconditioner with inexact solver will increase at most by the factor γ:

   γ ≡ max_i λmax(Ãi^{-1} Ai) / min_i λmin(Ãi^{-1} Ai).

If inexact solvers Ãi are employed in the multiplicative Schwarz algorithm, then the spectral radius must satisfy ρ(Ãi^{-1} Ai) < 2 to ensure convergence. In

the hybrid Schwarz algorithm (in balancing domain decomposition [MA15])

the coarse problem must be solved exactly.
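The factor γ above can be computed numerically. In the sketch below, diagonal (Jacobi) matrices Ãi stand in for the inexact solvers, and the subdomain matrices are randomly generated examples, not taken from the text.

```python
import numpy as np
from scipy.linalg import eigh

# Assumed subdomain matrices: random SPD examples of different sizes.
rng = np.random.default_rng(1)
subdomain_mats = []
for ni in (8, 12):
    B = rng.standard_normal((ni, ni))
    subdomain_mats.append(B @ B.T + ni * np.eye(ni))

lo, hi = [], []
for Ai in subdomain_mats:
    Atilde = np.diag(np.diag(Ai))            # inexact solver ~ Jacobi
    # generalized eigenvalues of Ai v = lambda Atilde v, i.e. spectrum of Atilde^{-1} Ai
    w = eigh(Ai, Atilde, eigvals_only=True)
    lo.append(w.min())
    hi.append(w.max())

gamma = max(hi) / min(lo)
assert gamma >= 1.0   # the additive Schwarz condition number grows by at most gamma
```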

2.4.6 Parallelization and Software Libraries

With the exception of the sequential Schwarz algorithm without coloring, the

computations on diﬀerent subdomains in a Schwarz algorithm can typically

be implemented concurrently. From the viewpoint of parallelization, Schwarz

algorithms thus have “coarse granularity”, i.e., a signiﬁcant portion of the

computations can be performed in parallel, with the remaining portion re-

quiring more intensive communication between processors. As an example,


consider the additive Schwarz preconditioner:

   M^{-1} r = Σ_{l=0}^{p} Rl^T Al^{-1} Rl r.

Suppose there are (p + 1) processors available, and that we assign one processor to each subproblem and distribute the data amongst the processors. Then,

the action of M −1 r can be computed as follows. First, given r, synchronize all

the processors and communicate relevant data between the processors, so that

processor l receives the data necessary to assemble Rl r from other processors.

Second, let each processor solve its assigned problem Al wl = Rl r in parallel.

Third, synchronize and communicate the local solution wl to other processors,

as needed (processor 0 should transfer Rl R0^T w0 to processor l, while processor l should transfer Rj Rl^T wl to processor j if Ωj* ∩ Ωl* ≠ ∅). Fourth, let each processor sum relevant components and store the result locally (processor l can sum Rl (R0^T w0 + R1^T w1 + · · · + Rp^T wp)). For simplicity, processor 0 may

be kept idle in this step. Other Schwarz algorithms may be parallelized simi-

larly. The PETSc library contains parallelized codes in C, C++ and Fortran,

for implementing most Schwarz solvers, see [BA15, BA14, BA13]. These codes

employ MPI and LAPACK.
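The four steps above can be sketched in shared memory, with a thread pool standing in for the MPI processes; the sizes and subdomains are assumptions, and this is not PETSc or MPI code.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Assumed example: 1D Laplacian, two overlapping subdomains, exact local solvers.
n = 16
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
subdomains = [np.arange(0, 10), np.arange(6, 16)]
R = [np.eye(n)[ix] for ix in subdomains]

def local_solve(l, r):
    """Step 2: 'processor' l solves A_l w_l = R_l r and extends by R_l^T."""
    Rl = R[l]
    Al = Rl @ A @ Rl.T
    return Rl.T @ np.linalg.solve(Al, Rl @ r)

def additive_schwarz(r):
    # Steps 1, 3 and 4 (scatter, gather, sum) are implicit in shared memory.
    with ThreadPoolExecutor() as pool:
        parts = pool.map(local_solve, range(len(R)), [r] * len(R))
    return sum(parts)

w = additive_schwarz(np.ones(n))
assert w.shape == (n,)
```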

MPI. The message passing interface (MPI) is a library of routines for imple-

menting parallel tasks in C, C++ and Fortran, see [PA, GR15]. It is based on

the “message passing model”, which assumes that diﬀerent processors have

separate memory addresses, and that data can be moved from one memory

address to another. Using MPI, a parallel computer architecture can be simu-

lated given a cluster of work stations connected by high speed communication

lines. Once the MPI library has been installed, the same executable code of

a parallel program employing the MPI library is stored and executed on each

processor. Each processor is assigned a label (or rank). If there are p proces-

sors, then processor l is assigned rank l. Since the same executable code is

to be run on each processor, parallelization is obtained by branching the pro-

grams based on the rank. The library employs a protocol for synchronizing and

communicating data between the diﬀerent processors. Readers are referred to

[PA, GR15] for details on the syntax, and for instructions on downloading and

installing MPI. In many domain decomposition applications, however, details

of MPI syntax may not be required if the PETSc parallel library is employed.

PETSc. The suite of routines called PETSc (Portable, Extensible Toolkit

for Scientiﬁc Computing) is a library of routines for implementing domain de-

composition iterative methods, optimization algorithms, and other algorithms

used in scientiﬁc computing. The PETSc library is available in C, C++ and

Fortran, but requires installation of the MPI and LAPACK libraries. Most

Schwarz and Schur complement solvers are implemented in PETSc, and are

coded to run on parallel computers. We refer to [BA14] for a tutorial on the

syntax for this library.


2.5 Theoretical Results

In this section, we describe theoretical results on the convergence of multi-

plicative, additive and hybrid Schwarz algorithms in a Hilbert space norm,

see [MA37, DR11, LI6, LI7, WI4, BR18, XU3]. We formulate an abstract

convergence theory for Schwarz projection algorithms on a ﬁnite dimensional

Hilbert space, where the convergence rate of the algorithms can be reduced

to two key parameters, which depend on the properties of the subspaces under-

lying the projections. The theoretical framework admits replacement of exact

projections by approximations, in which case two additional parameters will

arise in the convergence bounds. We focus ﬁrst on the abstract theory before

estimating the key parameters in applications to ﬁnite element discretizations

of self adjoint and coercive elliptic equations. Additional analysis of Schwarz

algorithms is presented in [ZH2, WA2, GR4, DR17, MA15].

Our discussion will be organized as follows. In Chap. 2.5.1 we present

background and notation. Chap. 2.5.2 presents the abstract Schwarz conver-

gence theory. Applications to ﬁnite element discretizations of elliptic equa-

tions are considered in Chap. 2.5.3. Our discussion follows [XU3, CH11]

where additional results may be found. Selected results on the convergence of

Schwarz algorithms in the maximum norm are presented in Chap. 15, see also

[FR7, FR8].

2.5.1 Background

Let V denote a Hilbert space equipped with inner product A(., .) and norm:

   ‖w‖V ≡ A(w, w)^{1/2}, ∀w ∈ V.

We consider the following problem. Find u ∈ V satisfying:

A(u, v) = F (v), ∀v ∈ V, (2.32)

where F (·) is a bounded linear functional on V . The solution to (2.32) will be

sought by Schwarz algorithms based on (p + 1) subspaces V0 , · · · , Vp of V :

V = V0 + V1 + · · · + Vp ,

i.e., for each v ∈ V we can ﬁnd vi ∈ Vi such that

v = v0 + · · · + vp .

On each Vk , let Ak : Vk × Vk → IR be a symmetric, bilinear form deﬁned as:

Ak (v, w) ≡ A(v, w), ∀v, w ∈ Vk .

If inexact projections (or solvers) are employed in the Schwarz algorithms, we

let A˜k : Vk × Vk → IR denote a symmetric, bilinear form corresponding to the

inexact solver for the projection onto Vk .


Remark 2.37. We assume there exist parameters 0 < ω0 ≤ ω1 such that:

   ω0 ≤ Ak(v, v) / Ãk(v, v) ≤ ω1, ∀v ∈ Vk \ {0},   (2.33)

for 0 ≤ k ≤ p. If Ãk(·, ·) ≡ Ak(·, ·) for 0 ≤ k ≤ p we obtain ω0 = ω1 = 1.

Remark 2.38. If V is finite dimensional, by employing basis vectors for V and Vk, we may represent the bilinear forms A(·, ·), Ak(·, ·) and Ãk(·, ·) in terms of matrices A, Ak and Ãk, respectively. Indeed, suppose n and nk denote the dimensions of V and Vk, respectively, and let φ1, . . . , φn be a basis for V and ψ1^(k), · · · , ψnk^(k) a basis for Vk. Define an n × n matrix A and nk × nk matrices Ak and Ãk with entries (A)ij = A(φi, φj) for 1 ≤ i, j ≤ n, and (Ak)ij = Ak(ψi^(k), ψj^(k)) and (Ãk)ij = Ãk(ψi^(k), ψj^(k)) for 1 ≤ i, j ≤ nk.

Matrix Ak may be obtained from matrix A as follows. Denote by Rk^T an n × nk extension matrix whose i'th column consists of the coefficients obtained when expanding ψi^(k) in the basis φ1, · · · , φn for V:

   ψi^(k) = Σ_{j=1}^{n} (Rk^T)ji φj, for 0 ≤ k ≤ p.

Substituting this into the definition of Ak above yields:

   (Ak)ij = Ak(ψi^(k), ψj^(k)) = A( Σ_{l=1}^{n} (Rk^T)li φl, Σ_{q=1}^{n} (Rk^T)qj φq ) = (Rk A Rk^T)ij.

Thus Ak = Rk A Rk^T. Substituting v = Σ_{j=1}^{nk} (v)j ψj^(k) into (2.33) yields:

   ω0 ≤ (v^T Ak v) / (v^T Ãk v) ≤ ω1, ∀v ∈ IR^{nk} \ {0}.

This yields:

   ω0 = min_k λmin(Ãk^{-1} Ak) ≤ max_k λmax(Ãk^{-1} Ak) = ω1,

corresponding to uniform lower and upper bounds for the spectra of Ãk^{-1} Ak.

Remark 2.39. In applications to elliptic equation (2.12) with BN = ∅, the Hilbert space V = H0^1(Ω) and Vk = H0^1(Ωk*) for 1 ≤ k ≤ p, and the forms are:

   A(u, v) ≡ ∫_Ω (a(x)∇u · ∇v + c(x) u v) dx, for u, v ∈ V
   Ak(u, v) ≡ ∫_{Ωk*} (a(x)∇u · ∇v + c(x) u v) dx, for u, v ∈ Vk.

A simple approximation Ãk(·, ·) of Ak(·, ·) can be obtained by replacing the

variable coeﬃcients a(.) and c(.) by their values at an interior point xk ∈ Ωk∗ .


This can be particularly useful if Ωk* is a rectangular domain with a uniform grid, in which case fast solvers can be formulated for Ãk:

   Ãk(u, v) ≡ ∫_{Ωk*} (a(xk)∇u · ∇v + c(xk) u v) dx, for u, v ∈ Vk.

Provided a(·) and c(·) do not have large variation in Ωk*, then ω0 and ω1 will correspond to uniform lower and upper bounds for a(x)/a(xk) and c(x)/c(xk) in Ωk*. In

applications, A˜k can be any scaled preconditioner for Ak , such as ILU.

We now deﬁne a projection map Pk : V → Vk and its approximation

P˜k : V → Vk for 0 ≤ k ≤ p as follows.

Deﬁnition 2.40. Given u, w ∈ V , we deﬁne Pk u and P˜k w as the unique

elements of Vk satisfying:

Ak (Pk u, v) = A(u, v), for all v ∈ Vk

A˜k (P˜k w, v) = A(w, v), for all v ∈ Vk .

The existence of Pk and P̃k follows by the Lax-Milgram lemma, see [CI2].

The following properties of Pk and P˜k will be employed in this section.

Lemma 2.41. Let Pk and P˜k be as deﬁned above. The following hold.

˜ k of P˜k are given by:

1. The matrix representations Pk of Pk and P

**Pk = RkT A−1 ˜ T ˜−1
**

k Rk A and Pk = Rk Ak Rk A.

2. The mappings Pk and P̃k are symmetric, positive semidefinite in A(·, ·):

   A(Pk v, w) = A(v, Pk w), for v, w ∈ V
   A(P̃k v, w) = A(v, P̃k w), for v, w ∈ V,

   with A(Pk v, v) ≥ 0 and A(P̃k v, v) ≥ 0 for v ∈ V. In matrix terms, this corresponds to A Pk = Pk^T A, A P̃k = P̃k^T A, v^T A Pk v ≥ 0, v^T A P̃k v ≥ 0.

3. The projections Pk satisfy:

   Pk Pk = Pk, Pk(I − Pk) = 0 and ‖Pk‖V ≤ 1.

4. The map P̃k satisfies ‖P̃k‖V ≤ ω1 and also:

   ω0 A(Pk u, u) ≤ A(P̃k u, u), for all u ∈ V
   A(P̃k u, P̃k u) ≤ ω1 A(P̃k u, u), for all u ∈ V.


Proof. Properties of orthogonal projections Pk are standard, see [ST13, LA10].

The symmetry of P˜k in A(·, ·) may be veriﬁed by employing the deﬁnition of

P˜k and using that P˜k u, P˜k v ∈ Vk for all u, v ∈ V :

A(P˜k u, v) = A(v, P˜k u) = A˜k (P˜k v, P˜k u) = A˜k (P˜k u, P˜k v) = A(u, P˜k v).

The positive semi-deﬁniteness of P˜k in A(·, ·) follows since:

0 ≤ A˜k (P˜k v, P˜k v) = A(v, P˜k v), ∀v ∈ V.

To obtain ‖P̃k‖V ≤ ω1, apply the definition of P̃k and employ (2.33):

   ‖P̃k u‖V² = A(P̃k u, P̃k u) = Ak(P̃k u, P̃k u)
            ≤ ω1 Ãk(P̃k u, P̃k u)
            = ω1 A(u, P̃k u)
            ≤ ω1 ‖u‖V ‖P̃k u‖V.

The desired bound follows. To verify the bound on A(P̃k u, u), employ the matrix equivalents Pk u and P̃k u of Pk u and P̃k u, respectively, to obtain:

   A(Pk u, u) = u^T A Pk u = u^T A Rk^T Ak^{-1} Rk A u
              ≤ (1/ω0) u^T A Rk^T Ãk^{-1} Rk A u
              = (1/ω0) A(P̃k u, u).

Here, we have employed the property of symmetric positive definite matrices:

   ω0 ≤ (v^T Ak v)/(v^T Ãk v) ≤ ω1 ∀v ≠ 0  ⇔  1/ω0 ≥ (v^T Ak^{-1} v)/(v^T Ãk^{-1} v) ≥ 1/ω1 ∀v ≠ 0.

To verify that A(P̃k u, P̃k u) ≤ ω1 A(P̃k u, u), consider:

   A(P̃k u, u) = A(u, P̃k u) = Ãk(P̃k u, P̃k u)
              ≥ (1/ω1) Ak(P̃k u, P̃k u)
              = (1/ω1) A(P̃k u, P̃k u),

where we have employed the definition of P̃k u, property (2.33), and the definition of Ak(·, ·). This yields the desired result.
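The properties established in Lemma 2.41 can be checked numerically on a small assumed example, using the matrix representation Pk = Rk^T Ak^{-1} Rk A.

```python
import numpy as np

# Assumed small SPD model problem and a zero-one restriction Rk.
n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Rk = np.eye(n)[2:6]
Ak = Rk @ A @ Rk.T

Pk = Rk.T @ np.linalg.solve(Ak, Rk @ A)      # matrix representation of P_k

assert np.allclose(Pk @ Pk, Pk)              # Pk Pk = Pk (item 3)
assert np.allclose(A @ Pk, Pk.T @ A)         # A-self-adjointness (item 2)

# ||Pk||_V <= 1 in the A-norm: spectral norm of A^{1/2} Pk A^{-1/2}.
w, V = np.linalg.eigh(A)
Ah = V @ np.diag(np.sqrt(w)) @ V.T
s = np.linalg.svd(Ah @ Pk @ np.linalg.inv(Ah), compute_uv=False)
assert s.max() <= 1.0 + 1e-8
```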

In the following, we shall derive properties of diﬀerent Schwarz algorithms

in terms of the mappings Pk or P˜k , which will be used later.

Classical (Multiplicative) Schwarz Algorithm. Each sweep of the clas-

sical Schwarz algorithm to solve (2.32) based on subspaces V0 , · · · , Vp has the

following representation in terms of projections (or its approximations):

   For i = 0, · · · , p do
      u^{(k + (i+1)/(p+1))} = u^{(k + i/(p+1))} + P̃i ( u − u^{(k + i/(p+1))} )
   Endfor
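The contraction property of the resulting error propagation map (I − P̃p) · · · (I − P̃0), analyzed below, can be checked numerically on a small assumed example with exact local solvers (P̃i = Pi).

```python
import numpy as np

# Assumed example: SPD model matrix and two overlapping subspaces whose sum is V.
n = 12
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
R = [np.eye(n)[0:8], np.eye(n)[4:12]]

def proj(Rk):
    """A-orthogonal projection Pk = Rk^T Ak^{-1} Rk A (exact local solver)."""
    Ak = Rk @ A @ Rk.T
    return Rk.T @ np.linalg.solve(Ak, Rk @ A)

Ep = np.eye(n)
for Rk in R:                                   # Ep = (I - P1)(I - P0)
    Ep = (np.eye(n) - proj(Rk)) @ Ep

# A-norm of Ep via the similarity transform A^{1/2} Ep A^{-1/2}.
w, V = np.linalg.eigh(A)
Ah = V @ np.diag(np.sqrt(w)) @ V.T
norm_Ep = np.linalg.svd(Ah @ Ep @ np.linalg.inv(Ah), compute_uv=False).max()
assert norm_Ep < 1.0                           # contraction => convergence

# Symmetrized sweep: I - M^{-1}A = Ep^* Ep with Ep^* = A^{-1} Ep^T A (A-adjoint),
# so the spectrum of M^{-1}A lies in (0, 1].
MinvA = np.eye(n) - np.linalg.inv(A) @ Ep.T @ A @ Ep
eigs = np.linalg.eigvals(MinvA).real
```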

Since the solution u trivially satisfies u = u + P̃i(u − u) for 0 ≤ i ≤ p, subtracting this from the above yields:

   For i = 0, · · · , p do
      u − u^{(k + (i+1)/(p+1))} = ( u − u^{(k + i/(p+1))} ) − P̃i ( u − u^{(k + i/(p+1))} )
                                = (I − P̃i) ( u − u^{(k + i/(p+1))} )
   Endfor

Recursive application of the above yields the following expression:

   (u − u^{(k+1)}) = (I − P̃p) · · · (I − P̃0)(u − u^{(k)}),   (2.34)

which expresses the error u − u^{(k+1)} in terms of the error u − u^{(k)}. This is referred to as the error propagation map or the error amplification map. If M^{-1} denotes the matrix action corresponding to one sweep of the unsymmetrized Schwarz algorithm, then by (2.34) we obtain:

   I − M^{-1}A = (I − P̃p) · · · (I − P̃0).

Remark 2.42. The iterates u^{(k)} of the multiplicative Schwarz algorithm will converge to the desired solution u in the energy norm ‖·‖V if ‖(I − P̃p) · · · (I − P̃0)‖V < 1. Indeed, the Schwarz vector iterates u^{(k)} will converge to u if ‖(I − P̃p) · · · (I − P̃0)‖V ≤ δ for some 0 ≤ δ < 1. This will be demonstrated later in this section.

Next, we express the preconditioned matrices M^{-1}A corresponding to the additive, hybrid and symmetrized multiplicative Schwarz preconditioners with inexact local solvers Ãk in terms of the matrices P̃k.

Additive, Hybrid and Symmetrized Schwarz Preconditioners.

• The inverse M^{-1} of the additive Schwarz preconditioner satisfies:

   M^{-1}A = Σ_{i=0}^{p} Ri^T Ãi^{-1} Ri A,

where an inexact solver Ãk was assumed. This may also be expressed in operator form as:

   P̃ ≡ Σ_{i=0}^{p} P̃i = M^{-1}A,

where P̃ is self adjoint in the A(., .) inner product, and will be shown to be coercive. Its condition number satisfies:

   cond(M, A) ≡ λmax(M^{-1}A) / λmin(M^{-1}A) = λmax(P̃) / λmin(P̃),

and it will be estimated later in this section.

82 2 Schwarz Iterative Algorithms • The inverse M −1 of the hybrid Schwarz preconditioner satisﬁes: ⎧ ⎪ ⎪ M −1 A ⎪ ⎪ ⎨ ≡ R0T A−1 R0 A + (I − R0T A−1 R0 )( p RiT A˜−1 Ri )(I − AR0T A−1 R0 )A 0 .

the local matrices Ai were replaced by approximations A˜i for 1 ≤ i ≤ p. We obtain: . 0 p ˜ i=1 i 0 ⎪ ⎪ = P0 + (I − P0 ) P0 + i=1 Pi (I − P0 ) ⎪ ⎪ ⎩ ˜ (I − P0 ) . = P0 + (I − P0 ) P where P ˜ ≡ P0 + P ˜1 + ··· + P ˜ p . However. Here. the coarse matrix A0 should not be approximated. to ensure that all iterates lie in V0⊥ .

A) ≡ = . λmax (M −1 A) λmax P0 + (I − P0 )P˜ (I − P0 ) cond(M.

Both symmetrizations will be equivalent if P ˜ 0 = P0 . λmin (M −1 A) λmin P0 + (I − P0 )P˜ (I − P0 ) where P˜ = P0 + P˜1 + · · · + P˜p represents the additive Schwarz operator. We will analyze the latter. • The symmetrized Schwarz preconditioner M satisﬁes: M −1 A ≡ I − (I − P ˜ p ) · · · (I − P1 )(I − P0 )(I − P ˜ 1 ) · · · (I − P ˜ p ). This will be shown to be better conditioned than P˜ . Here. Ep ≡ (I − P˜p ) · · · (I − P˜0 ). and Ep . denotes an .2 Convergence of Abstract Schwarz Algorithms Our study of the convergence of Schwarz algorithms will involve the study of the operator P˜ associated with the additive Schwarz method. each P˜i as deﬁned earlier. ˜ 0 = P0 is employed. Schwarz convergence analysis will be based on bounds for the preceding. though the latter involves an extra residual correction on V0 . 2. then the If an approximate coarse space projection P ˜ following alternative symmetrization A may also be employed: A˜−1 A ≡ I − (I − P ˜ p ) · · · (I − P1 )(I − P ˜ 0 )(I − P ˜ 0 )(I − P ˜ 1 ) · · · (I − P ˜ p ). the error propagation map of the multiplicative Schwarz method: P˜ ≡ P˜0 + · · · + P˜p . .5.

approximation .

of the projection Pi onto the subspace Vi . The spectra λmin P˜ and λmax P˜ of the A(·. ·)-self adjoint operator P˜ and the norm Ep V of the error propagation map Ep will be estimated. These quantities will generally depend on two parameters .

K0 and K1 associated with the subspaces V0, · · · , Vp, and the approximate solvers Ãi for 0 ≤ i ≤ p. Estimates of K0 and K1 will be described later in this section for a finite element discretization of a self adjoint and coercive elliptic equation and will also depend on the parameters ω0 and ω1.

Definition 2.43. We associate a parameter K0 > 0 with the spaces V0, · · · , Vp and the forms Ã0(., .), · · · , Ãp(., .) if for each w ∈ V there exists wi ∈ Vi with w = w0 + · · · + wp and satisfying the bound:

   Σ_{i=0}^{p} Ãi(wi, wi) ≤ K0 A(w, w).

Remark 2.44. In matrix form, the above may be stated that given w ∈ IRⁿ, there exists wi ∈ IR^{ni} for 0 ≤ i ≤ p such that:

   w = R0^T w0 + · · · + Rp^T wp, and Σ_{i=0}^{p} wi^T Ãi wi ≤ K0 w^T A w.

The following result reduces the estimation of K0 to a parameter C0 in [LI6].

Lemma 2.45. Suppose the following assumptions hold.

1. Let ω0 > 0 be defined by (2.33).
2. Let C0 > 0 be a parameter such that for each w ∈ V there exists wi ∈ Vi for 0 ≤ i ≤ p satisfying w = w0 + · · · + wp and:

   Σ_{i=0}^{p} Ai(wi, wi) ≤ C0 A(w, w).

Then, the following estimate will hold:

   K0 ≤ C0 / ω0.

Proof. By assumption:

   Σ_{i=0}^{p} Ai(wi, wi) ≤ C0 A(w, w).

Substituting ω0 Ãi(wi, wi) ≤ Ai(wi, wi), for 0 ≤ i ≤ p, in the above yields the desired result.

w0 . · · · . · · · . j) : 0 ≤ i ≤ p. 0 ≤ j ≤ p} . the following holds: !1/2 ⎛ p ⎞1/2 . Let K1 > 0 be a parameter such that for all choices of v0 .84 2 Schwarz Iterative Algorithms Deﬁnition 2.46. vp . wp ∈ V and for any collection I of subindices: I ⊂ {(i.

p .

.

In matrix terms. wp and indices I the following holds: vTi ARiT A˜−1 T ˜−1 i Ri ARj Aj Rj Awj (i. wj ⎠ . · · · . the preceding requires that for all choices of v0 . w0 .47. P˜j wj ≤ K1 A P˜i vi . · · · .j)∈I i=0 j=0 Remark 2.j)∈I . A P˜i vi . vi ⎝ A P˜j wj . vp . (i.

p 1/2 .

For each index pair i. then ij = 0. each vector in Vi is orthogonal to each vector in Vj . Let parameter ω1 be as deﬁned earlier. i=0 i j=0 j Here we denote the norm xi 2A˜−1 = xTi A˜−1 ˜ ˜T i xi for Ai = Ai > 0. If ij < 1 the above is called a strengthened Cauchy-Schwartz inequality. wj )1/2 . Let matrix E be as deﬁned earlier.49.p 1/2 ≤ K1 Ri Avi 2A˜−1 Rj Awj 2A˜−1 . i The parameter K1 can be estimated in terms of ω1 and the spectral radius ρ (E) of a matrix E = (ij ). while if Vi and Vj share at least one nontrivial vector in common. if the subspaces are orthogonal. wj ∈ Vj . Remark 2. j ≤ p. Then the following estimate will hold: K1 ≤ ω1 ρ (E) . Lemma 2. whose entries ij are strengthened Cauchy-Schwartz inequality parameters associated with each pair of subspaces Vi and Vj .e. Parameter ij represents the maximum modulus of the cosine of the angle between all pairs of vectors in subspace Vi and Vj . 1. Matrix E ≡ (ij ) for 0 ≤ i. wi )1/2 A(wj . p} deﬁne the parameters 0 ≤ ij ≤ 1 as the smallest possible coeﬃcient satisfying: A(wi . Suppose the following assumptions hold. ∀wi ∈ Vi . Deﬁnition 2. .50. j ∈ {0.. i. In particular. · · · . then ij = 1. wj ) ≤ ij A(wi .48. 2.

i=0 j=0 For additional details. We apply lemma 2. if V0 is employed. If a coarse space V0 is not employed. The following result describes alternative bounds for K1 . we estimate K1 as follows. 2. j) ∈ I : i = 0. Given an index set I ⊂ {(i. since ρ(E)˜ ≤ E˜ ∞ = l0 . See [XU3. · · · .35) 1≤i≤p j=1 Then the following estimate will hold: ω 1 l0 .j)∈I ≤ ij A(P˜i vi . P˜j wj ) (i. TO10]. Lemma 2. Suppose the following assumptions hold. . P˜j wj )1/2 (i.j ≤ ij ω1 A(P˜i vi . wj )1/2 i. j ≤ p. 2. I01 . P˜i vi )1/2 A(P˜j wj . Denote by l0 ⎛ ⎞ p l0 ≡ max ⎝ ij ⎠. I11 as follows: ⎧ ⎪ ⎪ I00 ≡ {(i. Applying the strengthened Schwartz inequalities pairwise yields: A(P˜i vi . if V0 is not employed. j) ∈ I : 1 ≤ i ≤ p. j = 0} ⎪ ⎩ I11 ≡ {(i. 1 ≤ j ≤ p}. see [XU3].5 Theoretical Results 85 Proof. 3. j = 0} ⎪ ⎨ I ≡ {(i.j)∈I ≤ ij A(P˜i vi . j) ∈ I : i = 0. Vp denote subspaces of V . P˜j wj )1/2 i. 1 ≤ j ≤ p} 01 ⎪ ⎪ I10 ≡ {(i. j ≤ p} deﬁne I00 . I10 . vi )1/2 A(P˜j wj . if V0 is not employed K1 ≤ ω1 (l0 + 1). j) : 0 ≤ i. let E˜ be deﬁned by E˜ij ≡ ij for 1 ≤ i. Let E = (ij ) denote the strengthened Cauchy-Schwartz parameters which are associated with the subspaces Vi and Vj for 0 ≤ i. Proof. If a coarse space V0 is employed. Let V0 .j p p ≤ ω1 ρ(E) ( A(P˜i vi . j) ∈ I : 1 ≤ i ≤ p. P˜i vi )1/2 A(P˜j wj . 1.51. wj ))1/2 . vi ))1/2 ( A(P˜j wj . j ≤ p.50 to estimate K1 as: K 1 ≤ ω 1 l0 . (2.

0)∈I10 A(P˜i vi . P˜0 w0 ))2 ≤ A( i:(i. P˜j wj ))2 = ( j:(0.j)∈I11 a(P˜i vi . i:(i.j)∈I A(P˜i vi . v0 ) A( j:(0. i:(i. P˜0 v0 ) A( j:(0.50 yields: ( (i. P˜0 w0 ) ≤ ω1 A( i:(i. P˜0 w0 ))2 = (A( j:(i.j)∈I01 P˜j wj . wi )). vi )) A(P˜0 w0 . Combining the preceding results using that I = I00 ∪ I01 ∪ I10 ∪ I11 yields: ( (i. wj )) p ≤ ω12 l0 A(P˜0 v0 . wi ∈ Vi for 0 ≤ i ≤ p. u) K0−1 ≤ ≤ K1 . j:(0. This yields the desired bound for K1 ≤ ω1 (l0 + 1).j)∈I01 A(P˜0 v0 . u) . v0 ) ( j=0 A(P˜j wj . wj )). vi )) ( j=0 A(P˜i wi . Applying Lemma 2.0)∈I10 P˜i vi . v0 ) ( j:(0. Next.j)∈I01 A(P˜j wj .j)∈I01 P˜j wj .0)∈I10 P˜i vi . Similarly. w0 ) p p ( (i.86 2 Schwarz Iterative Algorithms Let vi . u = 0. as also the eigenvalues of P˜ . vi ))( j=1 A(P˜j wj . j:(0. P˜j wj ))2 = (A(P˜0 v0 .j)∈I10 A(P˜i vi .j)∈I01 P˜j wj ))2 ≤ A(P˜0 v0 . P˜j wj ))2 p p ≤ ω12 (1 + 2l0 + l02 ) ( i=0 A(P˜i vi . P˜j wj ))2 = ( j:(i.0)∈I10 P˜i vi . P˜j wj ))2 ≤ ω12 A(P˜0 v0 . We now estimate p the condition number of the additive Schwarz operator M −1 A = P˜ = i=0 P˜i . w0 ) p ≤ ω12 l0 ( i=0 A(P˜i vi .j)∈I01 P˜j wj ) ≤ ω12 l0 A(P˜0 v0 . vi )) ( j=0 A(P˜i wi . consider the sum over index set I01 : ( (i. v0 ) A(P˜0 w0 . Since each P˜i is symmetric in the A(. The condition number of P˜ will be a quotient of the maximal and minimal eigenvalues of P˜ .) inner product. wj )). A(u.0)∈I10 P˜i vi ) A(P˜0 w0 . j:(0. we obtain for the sum over index set I10 : ( (i. and will satisfy the following Rayleigh quotient bounds: A(P˜ u. P˜j wj ))2 ≤ ω12 l02 ( i=1 A(P˜i vi . w0 ) ≤ ω12 l0 ( i:(i. its eigenvalues will be real. wi )) 2 p p = ω12 (1 + l0 ) ( i=0 A(P˜i vi .j)∈I01 P˜j wj ) ≤ ω1 A(P˜0 v0 .j)∈I00 A(P˜i vi .0)∈I10 A(P˜i vi ..j)∈I01 A(P˜i vi . .0)∈I10 P˜i vi ) A(P˜0 w0 . vi )) A(P˜0 w0 . w0 ).

2.52. The following bounds will hold for the spectra of P˜ : .5 Theoretical Results 87 Theorem 2.

.

Proof. expand P˜ v2V . K0−1 ≤ λmin P˜ ≤ λmax P˜ ≤ K1 . and apply the deﬁnition of K1 : . For an upper bound.

p .

P˜j v . P˜ v = i=0 j=0 A P˜i v. p P˜ v2V = A P˜ v.

1/2 .

v) ≤ K1 i=0 A( P j=0 A(P . 1/2 p ˜i v. v) p ˜j v.

˜ = K1 A P v. v) and simplify using the deﬁnition of P˜i and the Cauchy-Schwartz inequality: p p . For a lower bound. v ≤ K1 P˜ vV vV . The upper bound P˜ vV ≤ K1 vV thus follows immediately. choose v ∈ V and expand v = v0 + · · · + vp employing the decomposition guaranteed by deﬁnition of K0 . Substitute this into A (v.

vi ) = i=0 A˜i P˜i v. vi p . A (v. v) = i=0 A (v.

vi ) p . 1/2 ≤ i=0 A˜i P˜i v. P˜i v A˜i (vi .

1/2 = i=0 A v. vi ) p . P˜i v A˜i (vi .

1/2 p ≤ ( i=0 A(v. vi ) . P˜i v))1/2 ˜ i=0 Ai (vi .

v)1/2 ˜ i=0 Ai (vi .e. . If subspaces V0 . A˜k = Ak for all k). A) = ≤ K0 K1 . In this case the additive Schwarz preconditioned system will have condition number of 1 and the conjugate gradient method will converge in a single iteration. Vp form an orthogonal decomposition of V and exact solvers are employed (i. 1/2 We thus obtain vV ≤ K0 A(P˜ v. v). then it is easily veriﬁed that K0 = K1 = 1. v)1/2 . · · · . which is a lower bound for the spectrum of P˜ . TO10]. Remark 2. v) ≤ K0 A(P˜ v. Squaring both sides yields: v2V = A(v.53.. 1/2 p = A(P˜ v. Remark 2. See [XU3. λmin (P˜ ) which is a bound for the condition number of M −1 A = P˜ . vi ) 1/2 ≤ A(P˜ v. Combining the upper and lower bounds together yields: λmax (P˜ ) cond(M.54. v)1/2 K0 vV .

The following result concerns the optimal choice of the parameter $K_0$.

Lemma 2.55. If $\hat K_0^{-1} = \lambda_{\min}(\tilde P)$, then $\hat K_0$ is the smallest admissible choice of the parameter $K_0$.

Proof. For any choice of admissible parameter $K_0$, Thm. 2.52 yields $0 < K_0^{-1} \le \lambda_{\min}(\tilde P) = \hat K_0^{-1}$, so that $\hat K_0 \le K_0$. It remains to verify that $\hat K_0$ is itself admissible. Since $\lambda_{\min}(\tilde P) > 0$, the operator $\tilde P$ is invertible, and given $v \in V$ we may construct an optimal partition. For $0 \le i \le p$ define $v_i \equiv \tilde P_i \tilde P^{-1} v \in V_i$. By construction

$$\sum_{i=0}^{p} v_i = \sum_{i=0}^{p} \tilde P_i \tilde P^{-1} v = \tilde P \tilde P^{-1} v = v.$$

For this decomposition, the definition of $\tilde P_i$ and that $\tilde P_i \tilde P^{-1} v \in V_i$ yields:

$$\sum_{i=0}^{p} \tilde A_i(v_i, v_i) = \sum_{i=0}^{p} \tilde A_i\bigl(\tilde P_i \tilde P^{-1} v, \tilde P_i \tilde P^{-1} v\bigr) = \sum_{i=0}^{p} A\bigl(\tilde P^{-1} v, \tilde P_i \tilde P^{-1} v\bigr) = A\Bigl(\tilde P^{-1} v, \sum_{i=0}^{p} \tilde P_i \tilde P^{-1} v\Bigr) = A\bigl(\tilde P^{-1} v, \tilde P \tilde P^{-1} v\bigr) = A(\tilde P^{-1} v, v).$$

Since $A(\tilde P^{-1} v, v) \le \frac{1}{\lambda_{\min}(\tilde P)}\, A(v, v)$, it follows that $K_0 = \frac{1}{\lambda_{\min}(\tilde P)}$ is an admissible parameter. $\square$

The following result shows that the hybrid Schwarz preconditioner $\tilde P_*$ is better conditioned than the associated additive Schwarz preconditioner.

Lemma 2.56. Let $K_0$ and $K_1$ be as defined above. Define:

$$\tilde P \equiv P_0 + \tilde P_1 + \cdots + \tilde P_p, \qquad \tilde P_* \equiv P_0 + (I - P_0)\,\tilde P\,(I - P_0).$$

Then, the spectra of $\tilde P_*$ will satisfy:

$$K_0^{-1} \le \lambda_{\min}(\tilde P) \le \lambda_{\min}(\tilde P_*) \le \lambda_{\max}(\tilde P_*) \le \lambda_{\max}(\tilde P) \le K_1.$$

In particular, $\kappa_2(\tilde P_*) \le \kappa_2(\tilde P)$.

Proof. Expand the terms in the Rayleigh quotient associated with $\tilde P_*$ as:

$$\frac{A(\tilde P_* u, u)}{A(u, u)} = \frac{A(P_0 u, P_0 u) + A\bigl(\tilde P (I - P_0) u, (I - P_0) u\bigr)}{A(P_0 u, P_0 u) + A\bigl((I - P_0) u, (I - P_0) u\bigr)},$$

employing the $A(\cdot,\cdot)$-orthogonality of the decomposition $u = P_0 u + (I - P_0) u$. Since the range of $(I - P_0)$ is $V_0^{\perp}$, a subspace of $V$, the Rayleigh quotient associated with the self-adjoint operator $(I - P_0)\tilde P(I - P_0)$ will satisfy:

$$\lambda_{\min}(\tilde P) \le \min_{u \in V_0^{\perp}\setminus\{0\}} \frac{A(\tilde P (I-P_0) u, (I-P_0) u)}{A((I-P_0) u, (I-P_0) u)}, \qquad \max_{u \in V_0^{\perp}\setminus\{0\}} \frac{A(\tilde P (I-P_0) u, (I-P_0) u)}{A((I-P_0) u, (I-P_0) u)} \le \lambda_{\max}(\tilde P),$$

since the extrema are considered on a subspace of $V$, while on $V_0$ the Rayleigh quotient of $\tilde P_*$ equals $1$. Substituting these observations in the Rayleigh quotient yields the desired result. $\square$
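The comparison between the additive and hybrid operators can be checked numerically. The following sketch (an illustration added here, not from the book; the block and coarse-space choices are arbitrary) builds $A$-orthogonal projections onto two local blocks and a small coarse space of hat functions, forms $\tilde P$ and $\tilde P_* = P_0 + (I-P_0)\tilde P(I-P_0)$, and compares their spectra in the $A$-inner product:

```python
import numpy as np

n = 40
A = 2.0*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Lc = np.linalg.cholesky(A)              # A = Lc Lc^T
LcTinv = np.linalg.inv(Lc.T)

def a_projection(Z):
    # A-orthogonal projection onto range(Z)
    return Z @ np.linalg.solve(Z.T @ A @ Z, Z.T @ A)

R1 = np.eye(n)[:, 0:24]; R2 = np.eye(n)[:, 16:n]     # overlapping local blocks
xs = np.arange(1, n+1)/(n+1)
Z0 = np.column_stack([np.maximum(0.0, 1.0 - np.abs(xs - c)*5)
                      for c in (0.25, 0.5, 0.75)])   # coarse hat functions

P0, P1, P2 = a_projection(Z0), a_projection(R1), a_projection(R2)
Padd = P0 + P1 + P2
Phyb = P0 + (np.eye(n) - P0) @ Padd @ (np.eye(n) - P0)

def spec(T):   # T is A-self-adjoint: Lc^T T Lc^{-T} is symmetric
    lam = np.linalg.eigvalsh(Lc.T @ T @ LcTinv)
    return lam[0], lam[-1]

lo_a, hi_a = spec(Padd)
lo_h, hi_h = spec(Phyb)
```

In agreement with the proof of Lemma 2.56, $\lambda_{\max}(\tilde P_*) \le \lambda_{\max}(\tilde P)$ and $\lambda_{\min}(\tilde P_*) \ge \min(1, \lambda_{\min}(\tilde P))$; printing `hi_a/lo_a` and `hi_h/lo_h` displays the improvement in the condition number.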

Remark 2.57. If inexact solvers are employed, there are two alternative possibilities for symmetrizing Schwarz sweeps. Both define $w = 0$ initially and define $M^{-1} f \equiv w$ at the end of the sweeps. The first symmetrization is:

For $k = p, p-1, \cdots, 1, 0, 1, \cdots, p-1, p$ do
  $w \leftarrow w + R_k^T \tilde A_k^{-1} R_k (f - A w)$
Endfor

An alternative symmetrization has an additional fractional step for $k = 0$:

For $k = p, p-1, \cdots, 1, 0, 0, 1, \cdots, p-1, p$ do
  $w \leftarrow w + R_k^T \tilde A_k^{-1} R_k (f - A w)$
Endfor

If an exact solver is used for $k = 0$, then both sweeps will be mathematically equivalent. In our analysis, we consider the latter sweep.

Bounds for $\|E_p\|_V$ directly yield convergence rates for the multiplicative Schwarz method and condition number estimates for the symmetrized Schwarz preconditioner.

Lemma 2.58. Suppose the following assumptions hold.
1. Let $M$ be the symmetrized multiplicative Schwarz preconditioner with:
$$I - M^{-1} A = \mathbf{E}_p^T \mathbf{E}_p,$$
where $\mathbf{E}_p$ is the matrix equivalent of the error propagation map $E_p$ of the multiplicative Schwarz method.
2. For some $0 \le \delta < 1$, let $E_p$ satisfy: $\|E_p\|_V \le \delta$.

Then the following results will hold.
1. The minimum eigenvalue of $M^{-1}A$ will satisfy: $1 - \delta^2 \le \lambda_{\min}(M^{-1}A)$.
2. The maximum eigenvalue of $M^{-1}A$ will satisfy: $\lambda_{\max}(M^{-1}A) \le 1$.
3. The condition number of the preconditioned matrix will satisfy:
$$\mathrm{cond}(M, A) \equiv \frac{\lambda_{\max}(M^{-1}A)}{\lambda_{\min}(M^{-1}A)} \le \frac{1}{1 - \delta^2}.$$

Proof. The assumption $\|E_p\|_V \le \delta$ is equivalent to: $A(E_p v, E_p v) \le \delta^2 A(v, v)$, for all $v \in V$. Since $M^{-1}A = I - \mathbf{E}_p^T \mathbf{E}_p$, we may substitute the above into the following Rayleigh quotient, with $\mathbf{v}$ denoting the vector representation of $v$, to obtain:

$$\frac{\mathbf{v}^T A M^{-1} A \mathbf{v}}{\mathbf{v}^T A \mathbf{v}} = \frac{A(v, v) - A(E_p v, E_p v)}{A(v, v)}.$$

Since $0 \le A(E_p v, E_p v) \le \delta^2 A(v, v)$, the desired results follow. See [XU3, TO10]. $\square$

We next derive an estimate for $\|E_p\|_V$. We employ the notation:

$$E_{-1} \equiv I, \quad E_0 \equiv (I - \tilde P_0), \quad E_1 \equiv (I - \tilde P_1)(I - \tilde P_0), \quad \ldots, \quad E_p \equiv (I - \tilde P_p) \cdots (I - \tilde P_0). \tag{2.36}$$

Since each factor $(I - \tilde P_k)$ is self-adjoint in the $A(\cdot,\cdot)$ inner product, the reversed product $(I - \tilde P_0)\cdots(I - \tilde P_p)$ is the $A$-adjoint of $E_p$, so both orderings have the same norm $\|\cdot\|_V$. We derive two preliminary results.

Lemma 2.59. The following algebraic relations will hold for $E_i$ defined by (2.36):
1. $E_{k-1} - E_k = \tilde P_k E_{k-1}$, for $0 \le k \le p$.
2. $I - E_i = \sum_{k=0}^{i} \tilde P_k E_{k-1}$, for $0 \le i \le p$.

Proof. Employing the definition of $E_k$ and substituting $E_k = (I - \tilde P_k) E_{k-1}$ for $0 \le k \le p$ yields the first identity. The second identity is obtained by summing the first identity for $0 \le k \le i$ and collapsing the telescoping sum. $\square$
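The relations in Lemma 2.58 can be verified algebraically on a small example. The sketch below (an added illustration, not from the book; block choices are arbitrary) forms the error operator $E$ of one multiplicative sweep with exact solvers, computes $\delta = \|E\|_V$ via a Cholesky change of inner product, and builds the symmetrized operator $I - E^{\ast} E$ (with $E^{\ast}$ the $A$-adjoint), whose eigenvalues then lie exactly in $[1 - \delta^2,\, 1]$:

```python
import numpy as np

n = 40
A = 2.0*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD model matrix
blocks = [np.arange(0, 24), np.arange(16, n)]

P = []   # A-orthogonal projections onto the two subspaces (exact solvers)
for idx in blocks:
    Rk = np.zeros((len(idx), n)); Rk[np.arange(len(idx)), idx] = 1.0
    P.append(Rk.T @ np.linalg.solve(Rk @ A @ Rk.T, Rk @ A))

E = (np.eye(n) - P[1]) @ (np.eye(n) - P[0])   # one multiplicative sweep

L = np.linalg.cholesky(A)
S = L.T @ E @ np.linalg.inv(L.T)   # Euclidean norm of S equals ||E||_A
delta = np.linalg.norm(S, 2)

# symmetrized preconditioned operator M^{-1}A = I - E* E, in the A-frame:
lam = np.linalg.eigvalsh(np.eye(n) - S.T @ S)
```

This reproduces the three conclusions of Lemma 2.58: $\lambda_{\min} \ge 1 - \delta^2$, $\lambda_{\max} \le 1$, and $\mathrm{cond} \le 1/(1-\delta^2)$.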

Lemma 2.60. Let the parameters $\omega_1$, $K_0$ and $K_1$ be as defined earlier. Then, for $v \in V$, the following bound will hold:

$$\|v\|_V^2 - \|E_p v\|_V^2 \ge (2 - \omega_1) \sum_{j=0}^{p} A\bigl(\tilde P_j E_{j-1} v, E_{j-1} v\bigr).$$

Proof. Consider the identity $E_{k-1} v - E_k v = \tilde P_k E_{k-1} v$ from Lemma 2.59, take $A(\cdot,\cdot)$ inner products of both sides with $E_{k-1} v + E_k v$, and simplify:

$$\begin{aligned}
\|E_{k-1} v\|_V^2 - \|E_k v\|_V^2 &= A\bigl(\tilde P_k E_{k-1} v, E_{k-1} v\bigr) + A\bigl(\tilde P_k E_{k-1} v, E_k v\bigr)\\
&= A\bigl(\tilde P_k E_{k-1} v, E_{k-1} v\bigr) + A\bigl(\tilde P_k E_{k-1} v, (I - \tilde P_k) E_{k-1} v\bigr)\\
&= 2 A\bigl(\tilde P_k E_{k-1} v, E_{k-1} v\bigr) - A\bigl(\tilde P_k E_{k-1} v, \tilde P_k E_{k-1} v\bigr).
\end{aligned}$$

By Lemma 2.41, the map $\tilde P_k$ is symmetric and positive semidefinite in the $A(\cdot,\cdot)$ inner product and satisfies:

$$A\bigl(\tilde P_k E_{k-1} v, \tilde P_k E_{k-1} v\bigr) \le \omega_1\, A\bigl(\tilde P_k E_{k-1} v, E_{k-1} v\bigr).$$

Substituting this yields:

$$\|E_{k-1} v\|_V^2 - \|E_k v\|_V^2 \ge (2 - \omega_1)\, A\bigl(\tilde P_k E_{k-1} v, E_{k-1} v\bigr).$$

Summing for $k = 0, \cdots, p$ and collapsing the telescoping sum yields the desired result:

$$\|v\|_V^2 - \|E_p v\|_V^2 \ge (2 - \omega_1) \sum_{k=0}^{p} A\bigl(\tilde P_k E_{k-1} v, E_{k-1} v\bigr). \qquad \square$$

We are now able to derive the main result on norm bounds for $E_p$.

Theorem 2.61. Let the parameters $\omega_1$, $K_0$ and $K_1$ be as defined earlier. Then for $v \in V$, the following bound will hold:

$$\|E_p v\|_V^2 \le \Bigl(1 - \frac{2 - \omega_1}{K_0 (1 + K_1)^2}\Bigr)\, \|v\|_V^2 \tag{2.37}$$

for the error propagation map $E_p$ of the multiplicative Schwarz method.

Proof. Expand $A(\tilde P v, v)$ and substitute $v = E_{i-1} v + (I - E_{i-1}) v$, applying Lemma 2.59 to the second term:

$$A(\tilde P v, v) = \sum_{i=0}^{p} A(\tilde P_i v, v) = \sum_{i=0}^{p} A(\tilde P_i v, E_{i-1} v) + \sum_{i=0}^{p} A\bigl(\tilde P_i v, (I - E_{i-1}) v\bigr) = \sum_{i=0}^{p} A(\tilde P_i v, E_{i-1} v) + \sum_{i=0}^{p} \sum_{k=0}^{i-1} A\bigl(\tilde P_i v, \tilde P_k E_{k-1} v\bigr).$$

By Lemma 2.41, the mappings $\tilde P_i$ are symmetric and positive semidefinite in $A(\cdot,\cdot)$. Consequently, the Cauchy-Schwarz inequality may be generalized to yield:

$$A(\tilde P_i v, E_{i-1} v) \le A(\tilde P_i v, v)^{1/2}\, A\bigl(\tilde P_i E_{i-1} v, E_{i-1} v\bigr)^{1/2}.$$

Summing the above for $i = 0, \cdots, p$ and applying the discrete Cauchy-Schwarz inequality:

$$\sum_{i=0}^{p} A(\tilde P_i v, E_{i-1} v) \le \Bigl(\sum_{i=0}^{p} A(\tilde P_i v, v)\Bigr)^{1/2}\Bigl(\sum_{i=0}^{p} A(\tilde P_i E_{i-1} v, E_{i-1} v)\Bigr)^{1/2} = A(\tilde P v, v)^{1/2}\Bigl(\sum_{i=0}^{p} A(\tilde P_i E_{i-1} v, E_{i-1} v)\Bigr)^{1/2}.$$

Applying the definition of $K_1$ to the double sum yields:

$$\sum_{i=0}^{p} \sum_{k=0}^{i-1} A\bigl(\tilde P_i v, \tilde P_k E_{k-1} v\bigr) \le K_1 \Bigl(\sum_{i=0}^{p} A(\tilde P_i v, v)\Bigr)^{1/2}\Bigl(\sum_{k=0}^{p} A(\tilde P_k E_{k-1} v, E_{k-1} v)\Bigr)^{1/2} = K_1\, A(\tilde P v, v)^{1/2}\Bigl(\sum_{k=0}^{p} A(\tilde P_k E_{k-1} v, E_{k-1} v)\Bigr)^{1/2}.$$

Combining both these results yields:

$$A(\tilde P v, v) \le (1 + K_1)\, A(\tilde P v, v)^{1/2}\Bigl(\sum_{k=0}^{p} A(\tilde P_k E_{k-1} v, E_{k-1} v)\Bigr)^{1/2}.$$

Canceling the common factor $A(\tilde P v, v)^{1/2}$ and squaring yields:

$$A(\tilde P v, v) \le (1 + K_1)^2 \sum_{k=0}^{p} A\bigl(\tilde P_k E_{k-1} v, E_{k-1} v\bigr).$$

Applying Lemma 2.60 yields:

$$A(\tilde P v, v) \le \frac{(1 + K_1)^2}{2 - \omega_1}\bigl(\|v\|_V^2 - \|E_p v\|_V^2\bigr).$$

Finally, applying the lower bound $K_0^{-1} \le \lambda_{\min}(\tilde P)$ for the eigenvalues of $\tilde P$ yields:

$$K_0^{-1}\, \|v\|_V^2 \le A(\tilde P v, v) \le \frac{(1 + K_1)^2}{2 - \omega_1}\bigl(\|v\|_V^2 - \|E_p v\|_V^2\bigr).$$

This immediately yields the desired inequality:

$$\|E_p v\|_V^2 \le \Bigl(1 - \frac{2 - \omega_1}{K_0 (1 + K_1)^2}\Bigr)\, \|v\|_V^2.$$

See [XU3, TO10] for additional details. $\square$
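Bound (2.37) can be checked on a small algebraic example. The sketch below (an added illustration; the choices $K_0 = 1/\lambda_{\min}(\tilde P)$, which is admissible by Lemma 2.55, and $K_1 =$ number of subspaces, a crude but always admissible value with exact solvers, are assumptions of this demo) compares $\|E_p\|_V^2$ against the right side of (2.37) with $\omega_1 = 1$:

```python
import numpy as np

n = 40
A = 2.0*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
blocks = [np.arange(0, 16), np.arange(10, 28), np.arange(24, n)]

P = []
for idx in blocks:
    Rk = np.zeros((len(idx), n)); Rk[np.arange(len(idx)), idx] = 1.0
    P.append(Rk.T @ np.linalg.solve(Rk @ A @ Rk.T, Rk @ A))

Lc = np.linalg.cholesky(A); LcTinv = np.linalg.inv(Lc.T)

Ptot = P[0] + P[1] + P[2]
lam = np.linalg.eigvalsh(Lc.T @ Ptot @ LcTinv)
K0 = 1.0 / lam[0]          # smallest admissible K0 (Lemma 2.55)
K1 = float(len(P))         # crude admissible K1 (exact solvers, omega_1 = 1)

E = (np.eye(n) - P[2]) @ (np.eye(n) - P[1]) @ (np.eye(n) - P[0])   # (2.36)
normE = np.linalg.norm(Lc.T @ E @ LcTinv, 2)        # ||E_p||_V
bound = 1.0 - (2.0 - 1.0) / (K0 * (1.0 + K1)**2)    # right side of (2.37)
```

As Remark 2.63 below notes, the measured $\|E_p\|_V^2$ is typically far smaller than the theoretical bound, which is not optimal.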

Remark 2.62. To ensure convergence of multiplicative Schwarz iterates, the parameter $\omega_1$ must satisfy $\omega_1 < 2$. The bound (2.37) for $\|E_p\|_V$ thus imposes restrictions on the choice of inexact solvers. We will henceforth assume that the inexact solvers $\tilde A_k$ are suitably scaled so that $\lambda_{\max}(\tilde A_k^{-1} A_k) = \omega_1 < 2$.

Remark 2.63. The bound (2.37) for $\|E_p\|_V$ is not optimal. Indeed, suppose $V_0, \cdots, V_p$ are mutually orthogonal subspaces which form an orthogonal decomposition of $V$, equipped with the $A(\cdot,\cdot)$-inner product, and that exact solvers are employed. Then, the multiplicative Schwarz algorithm based on exact solvers will converge in one iteration, yielding $\|E_p\|_V = 0$. However, theoretical estimates yield $K_0 = K_1 = 1$ and $\omega_0 = \omega_1 = 1$, so that (2.37) gives only:

$$\|E_p\|_V \le \sqrt{\tfrac{3}{4}},$$

which is not optimal. Readers are referred to [XU3, TO10] for additional details.

2.5.3 Applications to Finite Element Discretizations

We shall now apply the preceding abstract Schwarz convergence theory to analyze the convergence of overlapping Schwarz algorithms for solving the finite element discretization (2.13) of elliptic equation (2.12) with $B_D = \partial\Omega$. Since the rate of convergence of the multiplicative, additive and hybrid Schwarz algorithms depends only on the parameters $K_0$ and $K_1$, we shall estimate how these parameters depend on the underlying mesh size $h$, the subdomain size $h_0$, the overlap factor $\beta h_0$ and the variation in the coefficient $a(\cdot)$. We shall make several simplifying assumptions. We shall assume that $c(x) \equiv 0$ and that exact solvers are employed in all projections, so that $\tilde A_k = A_k$ for $0 \le k \le p$ and $\omega_0 = \omega_1 = 1$. We will show that $K_1$ is independent of $h$, $h_0$ and $a(\cdot)$, so our efforts will focus primarily on estimating how $K_0$ depends on $h$, $h_0$ and $a(\cdot)$.

Assumption 1. We assume that the coefficient $a(\cdot)$ is piecewise constant on subregions $S_1, \cdots, S_q$ of $\Omega$ which form a nonoverlapping decomposition:

$$a(x) = a_k > 0, \quad \text{for } x \in S_k, \ 1 \le k \le q.$$

For the preceding choice of coefficients, the terms $A(\cdot,\cdot)$ and $F(\cdot)$ in the weak formulation (2.28) of elliptic equation (2.12) will have the form:

$$A(u, v) \equiv \sum_{i=1}^{q} a_i \int_{S_i} \nabla u \cdot \nabla v \, dx, \qquad F(v) \equiv \int_{\Omega} f v \, dx, \quad \text{for } u, v \in H_0^1(\Omega).$$

The notation $|a|$ will denote the variation in $a(x)$:

$$|a| \equiv \frac{\max_k a_k}{\min_l a_l}.$$

We next state our assumptions on the overlapping subdomains $\{\Omega_i^*\}_{i=1}^{p}$.

Assumption 2. We assume that the overlapping subdomains $\{\Omega_i^*\}_{i=1}^{p}$ are constructed from a non-overlapping decomposition $\{\Omega_i\}_{i=1}^{p}$, where each subdomain $\Omega_i$ has diameter $h_0$ and $\Omega_i^*$ is an extension of $\Omega_i$ with overlap $\beta h_0$:

$$\Omega_i^* \equiv \Omega_i^{\beta h_0} \equiv \{x \in \Omega : \mathrm{dist}(x, \Omega_i) < \beta h_0\}, \quad 1 \le i \le p, \tag{2.38}$$

where $0 \le \beta$ denotes an overlap parameter.

Assumption 3. We assume a quasiuniform triangulation $\mathcal{T}_h(\Omega)$ of $\Omega$ whose elements align with the subdomains $\{S_i\}_{i=1}^{q}$, $\{\Omega_i\}_{i=1}^{p}$ and $\{\Omega_i^*\}_{i=1}^{p}$. We let $V_h$ denote the space of continuous, piecewise linear finite element functions defined on $\mathcal{T}_h(\Omega)$. The Hilbert space $V \equiv V_h \cap H_0^1(\Omega)$, while the subspaces $V_i$ for $1 \le i \le p$ are defined as $V_i \equiv V_h \cap H_0^1(\Omega_i^*)$. If a coarse space $V_0$ is employed, it will be assumed to satisfy $V_0 \subset V_h \cap H_0^1(\Omega)$.

Definition 2.64. We associate a $p \times p$ adjacency matrix $G$ with the subdomains $\{\Omega_i^*\}_{i=1}^{p}$. Given $\Omega_1^*, \cdots, \Omega_p^*$, we define its adjacency matrix $G$ by:

$$G_{ij} = \begin{cases} 1, & \text{if } \Omega_i^* \cap \Omega_j^* \ne \emptyset \\ 0, & \text{if } \Omega_i^* \cap \Omega_j^* = \emptyset \end{cases} \qquad \text{and} \qquad g_0 \equiv \max_i \Bigl(\sum_{j \ne i} G_{ij}\Bigr),$$

where $g_0$ denotes the maximum number of neighbors intersecting a subdomain.

Assumption 4. We will employ a partition of unity satisfying the following assumptions. We assume there exists a smooth partition of unity $\{\chi_i(x)\}_{i=1}^{p}$ subordinate to the cover $\{\Omega_i^*\}_{i=1}^{p}$ satisfying the following conditions:

$$\begin{cases} 0 \le \chi_i(x) \le 1, & \text{for } x \in \Omega, \ 1 \le i \le p\\ \chi_i(x) = 0, & \text{for } x \in \Omega \setminus \Omega_i^*, \ 1 \le i \le p\\ \chi_1(x) + \cdots + \chi_p(x) = 1, & \text{for } x \in \Omega\\ \|\nabla \chi_i\|_{L^\infty(\Omega)} \le C\, \beta^{-1} h_0^{-1}, & 1 \le i \le p. \end{cases} \tag{2.39}$$

Assumption 5. If a coarse space $V_0$ is employed, we consider several operators which map onto this subspace. We let $Q_0$ denote the $L^2(\Omega)$-orthogonal projection onto $V_0$, $\pi_0$ a traditional interpolation map onto $V_0$, and, when applicable, $I_0$ a weighted interpolation map onto $V_0$. The following properties will be assumed about these operators.

Let the $L^2(\Omega)$-orthogonal projection $Q_0$ onto $V_0$ satisfy:

$$\begin{cases} |Q_0 v|^2_{H^1(\Omega)} \le c_1(Q_0, h, h_0)\, |v|^2_{H^1(\Omega)}, & \text{for } v \in C(\overline\Omega) \cap H^1(\Omega)\\ \|v - Q_0 v\|^2_{L^2(\Omega)} \le c_2(Q_0, h, h_0)\, h_0^2\, |v|^2_{H^1(\Omega)}, & \text{for } v \in C(\overline\Omega) \cap H^1(\Omega), \end{cases} \tag{2.40}$$

where $c_1(Q_0, h, h_0) > 0$ and $c_2(Q_0, h, h_0) > 0$ denote parameters which may depend on $h$, $h_0$ and the operator $Q_0$, but not on the coefficients $\{a_l\}$.

When applicable, we assume that $\pi_0 : C(\overline\Omega) \cap H^1(\Omega) \to V_0$ (the traditional interpolation map) satisfies the following local bounds on each $\Omega_i$:

$$\begin{cases} |\pi_0 v|^2_{H^1(\Omega_i)} \le c_1(\pi_0, h, h_0)\, |v|^2_{H^1(\Omega_i)}, & \text{for } v \in C(\overline\Omega_i) \cap H^1(\Omega_i)\\ \|v - \pi_0 v\|^2_{L^2(\Omega_i)} \le c_2(\pi_0, h, h_0)\, h_0^2\, |v|^2_{H^1(\Omega_i)}, & \text{for } v \in C(\overline\Omega_i) \cap H^1(\Omega_i), \end{cases} \tag{2.41}$$

where $c_1(\pi_0, h, h_0)$ and $c_2(\pi_0, h, h_0)$ denote parameters which may depend on $h$, $h_0$ and $\pi_0$, but not on the coefficients $\{a_l\}$.

If a weighted interpolation map can be defined, we assume that $I_0 : C(\overline\Omega) \cap H^1(\Omega) \to V_0$ satisfies the following bounds on each subdomain $\Omega_i$ for $v \in H^1(\Omega)$:

$$\begin{cases} |I_0 v|^2_{H^1(\Omega_i)} \le c_1(I_0, h, h_0) \sum_{j : G_{ij} \ne 0} d_{ij}^2\, |v|^2_{H^1(\Omega_j)}\\ \|v - I_0 v\|^2_{L^2(\Omega_i)} \le c_2(I_0, h, h_0)\, h_0^2 \sum_{j : G_{ij} \ne 0} d_{ij}^2\, |v|^2_{H^1(\Omega_j)}, \end{cases} \tag{2.42}$$

where $c_1(I_0, h, h_0)$ and $c_2(I_0, h, h_0)$ denote parameters which can depend on $h$, $h_0$ and $I_0$, but not on the coefficients $\{a_l\}$. The weights $d_{ij} \ge 0$ depend on the coefficients $\{a_l\}$ and satisfy $d_{ij} \le \frac{a_j}{a_i + a_j}$, so that $a_i d_{ij} / a_j \le 1$.

Remark 2.65. The $L^2(\Omega)$-orthogonal projection $Q_0$ will typically be global, in the sense that $(Q_0 w)(x)$ for $x \in \Omega_j$ may depend on $w(\cdot)$ in $\Omega \setminus \Omega_j$, see [XU3, BR21, DR11]. In contrast, the interpolation map $\pi_0$ is local on the subregions $\Omega_j$, since $(\pi_0 w)(x)$ for $x \in \Omega_j$ depends only on the values of $w(\cdot)$ in $\Omega_j$, see [CI2, JO2, BR22, BR21, DR11]. If, as in multigrid methods, the triangulation $\mathcal{T}_h(\Omega)$ is obtained by the refinement of some coarse quasiuniform triangulation $\mathcal{T}_{h_0}(\Omega)$ whose elements $\{\Omega_i\}_{i=1}^{p}$ have diameter $h_0$, then a coarse subspace $V_0 \subset V_h$ can be defined as the continuous, piecewise linear finite element functions on $\mathcal{T}_{h_0}(\Omega)$. For such a coarse space, explicit bounds are known for $c_i(Q_0, h, h_0)$ and $c_i(\pi_0, h, h_0)$ in Assumption 5, as noted in the following.

Remark 2.66. For such a coarse space, the $L^2(\Omega)$-orthogonal projection $Q_0$ onto $V_0$ will satisfy:

$$\begin{cases} |Q_0 v|^2_{H^1(\Omega)} \le c\, |v|^2_{H^1(\Omega)}, & \text{for } v \in H^1(\Omega)\\ \|v - Q_0 v\|^2_{L^2(\Omega)} \le c\, h_0^2\, |v|^2_{H^1(\Omega)}, & \text{for } v \in H^1(\Omega), \end{cases} \tag{2.43}$$

where $c$ is independent of $h$ and $h_0$. The standard nodal interpolation map $\pi_0$ onto $V_0$ will satisfy the following bounds on each element $\Omega_i$ for $v \in C(\overline\Omega) \cap H^1(\Omega)$:

$$\begin{cases} |\pi_0 v|^2_{H^1(\Omega_i)} \le c\, (1 + \log(h_0/h))\, |v|^2_{H^1(\Omega_i)}, & \text{for } \Omega \subset \mathbb{R}^2\\ |\pi_0 v|^2_{H^1(\Omega_i)} \le c\, (1 + (h_0/h))\, |v|^2_{H^1(\Omega_i)}, & \text{for } \Omega \subset \mathbb{R}^3\\ \|v - \pi_0 v\|^2_{L^2(\Omega_i)} \le c\, h_0^2\, |v|^2_{H^1(\Omega_i)}, & \text{for } \Omega \subset \mathbb{R}^d, \ d = 2, 3, \end{cases} \tag{2.44}$$

where $c$ is independent of $h$, $h_0$ and $a(\cdot)$.
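The conditions (2.39) on the partition of unity are easy to realize concretely. The sketch below (an added one-dimensional illustration; the subdomain count, $h_0$ and $\beta$ are arbitrary choices) builds piecewise linear functions $\chi_i$ subordinate to overlapping strips of $(0,1)$ and checks the four conditions numerically:

```python
import numpy as np

p, h0, beta = 4, 0.25, 0.25          # 4 strips of width h0 covering (0,1)
d = 2 * beta * h0                    # overlap width between extended strips
starts = [i*h0 - beta*h0 for i in range(p)]
ends   = [(i+1)*h0 + beta*h0 for i in range(p)]
starts[0] -= d                       # flat (no ramp) at the domain boundary
ends[-1]  += d

x = np.linspace(0.0, 1.0, 1001)

def chi(x, a, b, d):
    # piecewise linear bump: 0 outside (a, b), 1 on [a+d, b-d], linear ramps
    return np.clip(np.minimum((x - a)/d, (b - x)/d), 0.0, 1.0)

chis = np.array([chi(x, a, b, d) for a, b in zip(starts, ends)])
S = chis.sum(axis=0)                                   # should be identically 1
grad_max = np.abs(np.gradient(chis, x, axis=1)).max()  # <= C / (beta * h0)
```

With ramps matched to the overlap width, the ramps of adjacent $\chi_i$ are complementary, so the sum is exactly one, and the gradient bound holds with $C = 1/2$.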

A piecewise constant weighted interpolation map $I_0$ onto $V_0$ can be defined satisfying the following bounds on each element $\Omega_i$ of $\Omega$ for $v \in C(\overline\Omega) \cap H^1(\Omega)$:

$$\begin{cases} |I_0 v|^2_{H^1(\Omega_i)} \le c\, \bigl(1 + \log^2(h_0/h)\bigr) \sum_{j : G_{ij} \ne 0} d_{ij}\, |v|^2_{H^1(\Omega_j)}\\ \|v - I_0 v\|^2_{L^2(\Omega_i)} \le c\, h_0^2 \sum_{j : G_{ij} \ne 0} d_{ij}\, |v|^2_{H^1(\Omega_j)}, \end{cases} \tag{2.45}$$

where $c$ is independent of $h$ and $h_0$ (and of $a(x)$), and $d_{ij} \le \frac{a_j}{a_i + a_j}$. See [CO8, SA7, MA17, WA6], see also Chap. 3.9.

Assumption 6. We assume that the following inverse inequality holds with a parameter $C$ (independent of $h$) on each element $\kappa \in \mathcal{T}_h(\Omega)$:

$$|v|_{H^1(\kappa)} \le C\, h^{-1}\, \|v\|_{L^2(\kappa)}, \quad \forall v \in V_h. \tag{2.46}$$

See [ST14, CI2, JO2].

Remark 2.67. In applications, alternative coarse spaces may be employed, see [WI6, DR10, GI3, SA7, CA18]. In particular, the piecewise constant coarse space [CO8, MA17] applies to general grids and yields robust convergence. We refer the reader to [CO8, SA7, MA17].

Estimation of K1

Lemma 2.68. Let $g_0$ denote the maximum number of neighboring subdomains which intersect a subdomain. Then, the following will hold for the subspaces $V_i$ defined as $V_i \equiv V_h \cap H_0^1(\Omega_i^*)$ for $1 \le i \le p$:

1. The parameter $l_0$ defined by (2.35) will satisfy: $l_0 \le g_0$.
2. The parameter $K_1$ will satisfy:
$$K_1 \le \begin{cases} \omega_1 (g_0 + 1), & \text{if } V_0 \text{ is employed}\\ \omega_1\, g_0, & \text{if } V_0 \text{ is not employed,} \end{cases}$$
where $\omega_1 = \max_i \lambda_{\max}\bigl(\tilde A_i^{-1} A_i\bigr)$.

Proof. Consider the matrix $E = (\epsilon_{ij})_{i,j=0}^{p}$ of strengthened Cauchy-Schwarz parameters associated with the subspaces $V_0, V_1, \ldots, V_p$. The following observation relates the entries of $E$ to the entries of the adjacency matrix $G$ representing subdomain adjacencies:

$$G_{ij} = 0 \implies \Omega_i^* \cap \Omega_j^* = \emptyset \implies H_0^1(\Omega_i^*) \perp H_0^1(\Omega_j^*).$$

Thus, $G_{ij} = 0$ will yield $\epsilon_{ij} = 0$, while $\epsilon_{ij} = 1$ may be taken when $G_{ij} = 1$, for $1 \le i, j \le p$. An application of Lemma 2.51 now yields the desired result. $\square$

Remark 2.69. For a typical overlapping decomposition $\{\Omega_i^*\}_{i=1}^{p}$ of $\Omega$ and for sufficiently small $\beta$, the number $g_0$ of adjacent subdomains is independent of $h$, $h_0$, $|a|$ and $\beta$. Thus $K_1$ is typically independent of these parameters, and the rate of convergence of a traditional two-level overlapping Schwarz algorithm, with or without a coarse space, depends primarily only on the parameter $K_0$ (or equivalently $C_0$).

Estimation of K0

In the following, we shall estimate the parameter $K_0$, or equivalently the partition parameter $C_0$ (since we assume $\omega_0 = \omega_1 = 1$), for different Schwarz algorithms. For convenience, $C$ will denote a generic constant independent of $h$, $h_0$, $|a|$ and $\beta$, whose value may differ from one line to the next. The next preliminary result will be employed later in this section in estimating the parameter $C_0$.

Lemma 2.70. Suppose the following conditions hold.
1. Let Assumptions 1 through 6 hold.
2. Let $V_i \equiv V_h \cap H_0^1(\Omega_i^*)$ for $1 \le i \le p$ be local finite element spaces.
3. Given $w \in V_h \cap H_0^1(\Omega)$, define $w_i \equiv \pi_h \chi_i w \in V_i$ for $1 \le i \le p$.

Then $w = w_1 + \cdots + w_p$, and for each $1 \le i \le p$ and $1 \le j \le q$ the following bound will hold:

$$a_j \int_{S_j} |\nabla w_i|^2\, dx \le 2 a_j \int_{S_j} |\nabla w|^2\, dx + C\, \beta^{-2} h_0^{-2}\, a_j\, \|w\|^2_{L^2(S_j)},$$

where $C > 0$ is independent of $h$, $h_0$, $|a|$ and $\beta$.

Proof. By construction $w_1 + \cdots + w_p = \pi_h (\chi_1 + \cdots + \chi_p) w = \pi_h w = w$. Consider an element $\kappa \in S_j$ and let $x_\kappa$ be its geometric centroid. With some abuse of notation, we express:

$$w_i(x) = \pi_h \chi_i(x) w(x) = \pi_h \chi_i(x_\kappa) w(x) + \pi_h \bigl(\chi_i(x) - \chi_i(x_\kappa)\bigr) w(x), \quad x \in \kappa.$$

Application of the triangle and arithmetic-geometric mean inequalities yields:

$$|w_i|^2_{H^1(\kappa)} \le 2\, |\pi_h \chi_i(x_\kappa) w|^2_{H^1(\kappa)} + 2\, |\pi_h (\chi_i(\cdot) - \chi_i(x_\kappa)) w|^2_{H^1(\kappa)}.$$

Substituting $\pi_h \chi_i(x_\kappa) w = \chi_i(x_\kappa) w$ on $\kappa$, using $0 \le \chi_i(x_\kappa) \le 1$, and applying the inverse inequality (2.46) yields:

$$|w_i|^2_{H^1(\kappa)} \le 2\, |w|^2_{H^1(\kappa)} + 2 C h^{-2}\, \|\pi_h (\chi_i(\cdot) - \chi_i(x_\kappa)) w\|^2_{L^2(\kappa)}.$$

By a Taylor expansion, for some point $\tilde x$ on the line segment $(x, x_\kappa)$:

$$|\chi_i(x) - \chi_i(x_\kappa)| = |\nabla \chi_i(\tilde x) \cdot (x - x_\kappa)| \le C\, \beta^{-1} h_0^{-1}\, h.$$

Substituting the above into the expression preceding it yields:

$$|w_i|^2_{H^1(\kappa)} \le 2\, |w|^2_{H^1(\kappa)} + 2 C\, \beta^{-2} h_0^{-2}\, \|w\|^2_{L^2(\kappa)}.$$

Summing over all the elements $\kappa \in S_j$ and multiplying both sides by $a_j$ yields the result. $\square$

Remark 2.71. Without loss of generality, we may assume that the subregions $\{S_i\}_{i=1}^{q}$ are obtained by refinement of $\{\Omega_j\}_{j=1}^{p}$ (if needed, by intersecting the $S_i$ with the $\Omega_j$). If $m_0$ denotes the maximum number of subdomains $\Omega_j^*$ intersecting a subregion $S_i$, then it immediately follows that $m_0 \le g_0$, where $g_0$ denotes the maximum number of overlapping subdomains intersecting any $\Omega_i^*$.

In the following result, we estimate $C_0$ when a coarse space $V_0$ is not employed.

Lemma 2.72. Suppose the following conditions hold.
1. Let Assumptions 1 through 6 hold.
2. Let $V_i \equiv V_h \cap H_0^1(\Omega_i^*)$ for $1 \le i \le p$.
3. Given $w \in V_h \cap H_0^1(\Omega)$, define $w_i \equiv \pi_h \chi_i w \in V_i$ for $1 \le i \le p$.

Then $w = w_1 + \cdots + w_p$, and the decomposition will satisfy:

$$\sum_{i=1}^{p} A(w_i, w_i) \le 2 g_0\, A(w, w) + 2 g_0\, C \beta^{-2} h_0^{-2} \sum_{j=1}^{q} a_j\, \|w\|^2_{L^2(S_j)} \le 2 g_0 \bigl(1 + C \beta^{-2} h_0^{-2}\, |a|\bigr)\, A(w, w), \tag{2.47}$$

yielding that the parameter $C_0 \le 2 g_0 \bigl(1 + C \beta^{-2} h_0^{-2}\, |a|\bigr)$.

Proof. By construction $w_1 + \cdots + w_p = w$. Apply Lemma 2.70 to obtain, for each $i$ and $j$:

$$a_j \int_{S_j} |\nabla w_i|^2\, dx \le 2 a_j \int_{S_j} |\nabla w|^2\, dx + 2 C a_j\, \beta^{-2} h_0^{-2}\, \|w\|^2_{L^2(S_j)}.$$

Since the terms on the left hand side are zero when $S_j \cap \Omega_i^* = \emptyset$, we only need sum the above over the at most $g_0$ indices $i$ with $G_{ij} \ne 0$ to obtain:

$$\sum_{i=1}^{p} a_j \int_{S_j} |\nabla w_i|^2\, dx \le 2 g_0 \Bigl(a_j \int_{S_j} |\nabla w|^2\, dx + C \beta^{-2} h_0^{-2}\, a_j\, \|w\|^2_{L^2(S_j)}\Bigr).$$

Summing the above for $j = 1, \cdots, q$ yields:

$$\begin{aligned}
\sum_{i=1}^{p} A(w_i, w_i) &= \sum_{j=1}^{q} \sum_{i=1}^{p} a_j \int_{S_j} |\nabla w_i|^2\, dx \le 2 g_0\, A(w, w) + 2 g_0\, C \beta^{-2} h_0^{-2} \sum_{j=1}^{q} a_j\, \|w\|^2_{L^2(S_j)}\\
&\le 2 g_0\, A(w, w) + 2 g_0\, C \beta^{-2} h_0^{-2}\, \|a\|_\infty\, \|w\|^2_{L^2(\Omega)} \le 2 g_0\, A(w, w) + 2 g_0\, C \beta^{-2} h_0^{-2}\, \|a\|_\infty\, |w|^2_{H^1(\Omega)}\\
&\le 2 g_0 \bigl(1 + C \beta^{-2} h_0^{-2}\, |a|\bigr)\, A(w, w),
\end{aligned}$$

where we employed the Poincaré-Friedrichs inequality to bound $\|w\|^2_{L^2(\Omega)}$ in terms of $|w|^2_{H^1(\Omega)}$, and $\|a\|_\infty\, |w|^2_{H^1(\Omega)} \le |a|\, A(w, w)$. $\square$

The preceding bound for $C_0$ deteriorates as $h_0 \to 0$. This deterioration is observed in Schwarz algorithms in which information is exchanged only between adjacent subdomains in each iteration. Inclusion of a coarse space can remedy such deterioration, as it enables the transfer of some information globally in each iteration. The following result estimates $C_0$ when a coarse subspace $V_0$ is employed.

Theorem 2.73. Suppose the following conditions hold.
1. Let Assumptions 1 to 6 hold, with $V_i \equiv V_h \cap H_0^1(\Omega_i^*)$ for $1 \le i \le p$.
2. Let $V_0 \subset V_h \cap H_0^1(\Omega)$ be a coarse space for which $Q_0$ satisfies (2.40).
3. Given $v \in V_h \cap H_0^1(\Omega)$, define $v_0 = Q_0 v$ and $v_i \equiv \pi_h \chi_i (v - v_0)$ for $1 \le i \le p$.

Then $v_0 + v_1 + \cdots + v_p = v$, and the following will hold for $v_0, v_1, \ldots, v_p$:

$$\sum_{i=0}^{p} A(v_i, v_i) \le C_0\, A(v, v), \qquad \text{with } C_0 \le C\, (g_0 + 1)\bigl(1 + c_1(Q_0, h, h_0)\, |a| + c_2(Q_0, h, h_0)\, \beta^{-2}\, |a|\bigr),$$

where $C$ is independent of $h$, $h_0$, $|a|$ and $\beta$, and $c_i(Q_0, h, h_0)$ has known dependence on $h$ and $h_0$ for $i = 1, 2$; see (2.40).

Proof. By construction, it is easily verified that $v_0 + v_1 + \cdots + v_p = v$. Since the projection $Q_0$ satisfies (2.40), we obtain:

$$A(Q_0 v, Q_0 v) = \sum_{j=1}^{q} a_j \int_{S_j} |\nabla Q_0 v|^2\, dx \le \|a\|_\infty\, |Q_0 v|^2_{H^1(\Omega)} \le \|a\|_\infty\, c_1(Q_0, h, h_0)\, |v|^2_{H^1(\Omega)} \le c_1(Q_0, h, h_0)\, |a|\, A(v, v).$$

Now apply equation (2.47) from Lemma 2.72 using $w = v - v_0$ and $w_i \equiv v_i = \pi_h \chi_i w$ to obtain:

$$\sum_{i=1}^{p} A(v_i, v_i) \le 2 g_0\, A(w, w) + 2 g_0\, C \beta^{-2} h_0^{-2} \sum_{j=1}^{q} a_j\, \|w\|^2_{L^2(S_j)} \le 2 g_0\, A(w, w) + 2 g_0\, C \beta^{-2} h_0^{-2}\, \|a\|_\infty\, \|v - Q_0 v\|^2_{L^2(\Omega)}.$$

Applying equation (2.40) from Assumption 5, $\|v - Q_0 v\|^2_{L^2(\Omega)} \le c_2(Q_0, h, h_0)\, h_0^2\, |v|^2_{H^1(\Omega)}$, so that:

$$\sum_{i=1}^{p} A(v_i, v_i) \le 2 g_0\, A(w, w) + C\, c_2(Q_0, h, h_0)\, g_0\, \beta^{-2}\, |a|\, A(v, v).$$

Since $w = v - v_0$, applying the triangle inequality yields:

$$A(w, w) \le 2\, \bigl(A(v, v) + A(v_0, v_0)\bigr) \le 2\, \bigl(1 + c_1(Q_0, h, h_0)\, |a|\bigr)\, A(v, v).$$

Substituting the above and combining the sums for $i = 0, \cdots, p$ yields:

$$\sum_{i=0}^{p} A(v_i, v_i) \le C\, (g_0 + 1)\bigl(1 + c_1(Q_0, h, h_0)\, |a| + c_2(Q_0, h, h_0)\, \beta^{-2}\, |a|\bigr)\, A(v, v). \qquad \square$$

These bounds, derived using the projection $Q_0$, are independent of $h_0$, but not optimal with respect to the coefficient variation $|a|$. The next result considers alternative bounds for $C_0$ when $|a|$ is large.
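The qualitative effect of coarse-space correction can be seen on a small algebraic example. The following sketch (added here as an illustration; the grid size, block layout and coarse hat-function space are arbitrary choices, not taken from the book) compares the additive Schwarz spectrum with and without a coarse space:

```python
import numpy as np

def additive_schwarz_spectrum(n, nblocks, overlap, with_coarse):
    A = 2.0*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    w = n // nblocks
    Minv = np.zeros_like(A)
    nsub = 0
    for b in range(nblocks):
        lo, hi = max(0, b*w - overlap), min(n, (b+1)*w + overlap)
        idx = np.arange(lo, hi)
        Rk = np.zeros((len(idx), n)); Rk[np.arange(len(idx)), idx] = 1.0
        Minv += Rk.T @ np.linalg.inv(Rk @ A @ Rk.T) @ Rk
        nsub += 1
    if with_coarse:
        xs = np.arange(1, n+1) / (n+1)           # fine-node coordinates
        xc = np.arange(1, nblocks) / nblocks      # interior coarse nodes
        Z = np.zeros((n, len(xc)))
        for j, c in enumerate(xc):                # coarse hat functions
            Z[:, j] = np.maximum(0.0, 1.0 - np.abs(xs - c) * nblocks)
        Minv += Z @ np.linalg.inv(Z.T @ A @ Z) @ Z.T
        nsub += 1
    lam = np.sort(np.linalg.eigvals(Minv @ A).real)
    return lam[0], lam[-1], nsub

one = additive_schwarz_spectrum(64, 8, 2, False)
two = additive_schwarz_spectrum(64, 8, 2, True)
```

Adding the coarse projection adds a nonnegative term to the Rayleigh quotient, so $\lambda_{\min}$ can only increase; printing the two condition numbers `one[1]/one[0]` and `two[1]/two[0]` displays the improvement the theorem predicts for many subdomains.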

Theorem 2.74. Suppose the following assumptions hold.
1. Let Assumptions 1 to 6 hold, with $V_i \equiv V_h \cap H_0^1(\Omega_i^*)$ for $1 \le i \le p$.
2. Let $p = q$ and $S_j = \Omega_j$ for $1 \le j \le p$.
3. Let $V_0 \subset V_h \cap H_0^1(\Omega)$ be a coarse space, and let $\pi_0$ satisfy equation (2.41) of Assumption 5.
4. For $v \in V_h \cap H_0^1(\Omega)$, define $v_0 = \pi_0 v$ and $v_i \equiv \pi_h \chi_i (v - v_0)$ for $1 \le i \le p$.

Then $v_0 + \cdots + v_p = v$, and the following estimate will hold:

$$\sum_{i=0}^{p} A(v_i, v_i) \le C\, (g_0 + 1)\bigl(c_1(\pi_0, h, h_0) + c_2(\pi_0, h, h_0)\, \beta^{-2}\bigr)\, A(v, v),$$

where $C$ is independent of $h$, $h_0$, $|a|$ and $\beta$, and $c_1(\pi_0, h, h_0)$, $c_2(\pi_0, h, h_0)$ are defined in equation (2.41).

Proof. By construction $v_0 + \cdots + v_p = v$. Apply equation (2.41), using that $S_j = \Omega_j$, to obtain:

$$A(\pi_0 v, \pi_0 v) = \sum_{j=1}^{q} a_j \int_{S_j} |\nabla \pi_0 v|^2\, dx \le c_1(\pi_0, h, h_0) \sum_{j=1}^{q} a_j\, |v|^2_{H^1(S_j)} = c_1(\pi_0, h, h_0)\, A(v, v).$$

Apply Lemma 2.72 with $w = v - v_0$ and $w_i \equiv v_i = \pi_h \chi_i w$, yielding (2.47):

$$\begin{aligned}
\sum_{i=1}^{p} A(v_i, v_i) &\le 2 g_0\, A(w, w) + C \beta^{-2} h_0^{-2} \sum_{j=1}^{q} a_j\, \|w\|^2_{L^2(S_j)} = 2 g_0\, A(w, w) + C \beta^{-2} h_0^{-2} \sum_{j=1}^{q} a_j\, \|v - \pi_0 v\|^2_{L^2(S_j)}\\
&\le 2 g_0\, A(w, w) + C \beta^{-2} h_0^{-2}\, c_2(\pi_0, h, h_0)\, h_0^2 \sum_{j=1}^{q} a_j\, |v|^2_{H^1(S_j)} = 2 g_0\, A(w, w) + C\, c_2(\pi_0, h, h_0)\, \beta^{-2}\, A(v, v).
\end{aligned}$$

Since $w = v - v_0$, the triangle inequality yields $A(w, w) \le 2\, (1 + c_1(\pi_0, h, h_0))\, A(v, v)$. Substituting this and combining the terms yields:

$$\sum_{i=0}^{p} A(v_i, v_i) \le C\, (g_0 + 1)\bigl(c_1(\pi_0, h, h_0) + c_2(\pi_0, h, h_0)\, \beta^{-2}\bigr)\, A(v, v). \qquad \square$$

Remark 2.75. When $V_0$ is the traditional coarse space of continuous, piecewise linear finite element functions defined on a coarse triangulation $\mathcal{T}_{h_0}(\Omega)$, from which $\mathcal{T}_h(\Omega)$ is obtained by successive refinement, then $c_1(Q_0, \cdot)$ and $c_2(Q_0, \cdot)$ are independent of $h$, $h_0$, $|a|$ and $\beta$ (see (2.43)), and Theorem 2.73 yields:

$$C_0 \le C\, (g_0 + 1)\, |a|\, \bigl(1 + \beta^{-2}\bigr).$$

This result shows that a Schwarz algorithm employing traditional coarse space residual correction is robust when the variation $|a|$ in the coefficients is not large.

Remark 2.76. When $V_0$ is a traditional finite element coarse space defined on a coarse triangulation $\mathcal{T}_{h_0}(\Omega)$ of $\Omega$ whose successive refinement yields $\mathcal{T}_h(\Omega)$, then the bounds (2.44) for $c_1(\pi_0, \cdot)$ and $c_2(\pi_0, \cdot)$ yield:

$$C_0 \le \begin{cases} C\, (g_0 + 1)\bigl(\log(h_0/h) + \beta^{-2}\bigr), & \text{if } \Omega \subset \mathbb{R}^2\\ C\, (g_0 + 1)\bigl((h_0/h) + \beta^{-2}\bigr), & \text{if } \Omega \subset \mathbb{R}^3, \end{cases}$$

where $C$ is independent of $h$, $h_0$, $|a|$ and $\beta$. While these bounds deteriorate in three dimensions, computational tests indicate almost optimal convergence in both two and three dimensions. Improved bounds result if the $I_0$-interpolation (2.42) onto $V_0$ is used.

Theorem 2.77. Suppose the following assumptions hold.
1. Let Assumptions 1 to 6 hold, with $V_i \equiv V_h \cap H_0^1(\Omega_i^*)$ for $1 \le i \le p$.
2. Let $p = q$ and $S_j = \Omega_j$ for $1 \le j \le p$.
3. Let $V_0 \subset V_h \cap H_0^1(\Omega)$ be a coarse space, and let $I_0$ satisfy equation (2.42) of Assumption 5.
4. For $v \in V_h \cap H_0^1(\Omega)$, define $v_0 = I_0 v$ and $v_i \equiv \pi_h \chi_i (v - v_0)$ for $1 \le i \le p$.

Then $v_0 + \cdots + v_p = v$, and the following estimate will hold:

$$\sum_{i=0}^{p} A(v_i, v_i) \le C\, (g_0 + 1)\bigl(c_1(I_0, h, h_0) + c_2(I_0, h, h_0)\, \beta^{-2}\bigr)\, A(v, v),$$

where $C$ is independent of $h$, $h_0$, $|a|$ and $\beta$, while $c_1(I_0, h, h_0)$ and $c_2(I_0, h, h_0)$ are defined in equation (2.42).

Proof. By construction $v_0 + \cdots + v_p = v$. Apply (2.42) to obtain:

$$A(I_0 v, I_0 v) = \sum_{i=1}^{p} a_i \int_{\Omega_i} |\nabla I_0 v|^2\, dx \le c_1(I_0, h, h_0) \sum_{i=1}^{p} \sum_{j : G_{ij} \ne 0} \frac{a_i\, d_{ij}^2}{a_j}\, a_j\, |v|^2_{H^1(\Omega_j)} \le g_0\, c_1(I_0, h, h_0)\, A(v, v),$$

since $a_i d_{ij}^2 / a_j \le 1$ and each index $j$ occurs at most $g_0$ times in the double sum. Apply equation (2.47) from Lemma 2.72 with $w = v - v_0$ and $w_l \equiv v_l = \pi_h \chi_l w$:

$$\sum_{l=1}^{p} A(v_l, v_l) \le 2 g_0\, A(w, w) + C \beta^{-2} h_0^{-2} \sum_{i=1}^{p} a_i\, \|w\|^2_{L^2(\Omega_i)} = 2 g_0\, A(w, w) + C \beta^{-2} h_0^{-2} \sum_{i=1}^{p} a_i\, \|v - I_0 v\|^2_{L^2(\Omega_i)}.$$

Applying the second bound in (2.42):

$$\sum_{i=1}^{p} a_i\, \|v - I_0 v\|^2_{L^2(\Omega_i)} \le c_2(I_0, h, h_0)\, h_0^2 \sum_{i=1}^{p} \sum_{j : G_{ij} \ne 0} \frac{a_i\, d_{ij}^2}{a_j}\, a_j\, |v|^2_{H^1(\Omega_j)} \le g_0\, c_2(I_0, h, h_0)\, h_0^2\, A(v, v),$$

so that

$$\sum_{l=1}^{p} A(v_l, v_l) \le 2 g_0\, A(w, w) + C\, g_0\, c_2(I_0, h, h_0)\, \beta^{-2}\, A(v, v).$$

Since $w = v - v_0$, applying the triangle inequality yields $A(w, w) \le 2\, (1 + g_0\, c_1(I_0, h, h_0))\, A(v, v)$. Substituting this and combining the terms yields the desired estimate. $\square$

Remark 2.78. When $V_0$ is the piecewise constant coarse space defined on the subdomain decomposition $\Omega_1, \ldots, \Omega_p$, see [CO8, SA7, MA17, WA6], then the bounds (2.45) for $c_1(I_0, \cdot)$ and $c_2(I_0, \cdot)$ in equation (2.42) will satisfy:

$$C_0 \le C\, (g_0 + 1)\bigl(\log^2(h_0/h) + \beta^{-2}\bigr), \quad \text{if } \Omega \subset \mathbb{R}^d,\ d = 2, 3,$$

for sufficiently small $\beta$. Thus, Schwarz algorithms employing the piecewise constant coarse space will have almost optimal convergence bounds in both two and three dimensions. Sharper estimates with respect to the overlap $\beta$ are obtained in [DR17].

• Non-overlapping subdomains {Ωi }pi=1 can be chosen as strips of the form: Ωi ≡ {(x1 . if Ω ⊂ IRd . x2 ) : bi < x1 < bi+1 } ∩ Ω. and weakly coupled along the x1 axis. Ωp . In the limiting case of = 0.42) will satisfy: C0 ≤ C (g0 + 1) log2 (h0 /h) + β −2 . .78. Anisotropic Problems We next outline estimates for Schwarz algorithms applied to solve anisotropic elliptic equations. To obtain strips of width h0 .e. MA17. (2.49) for some choice of bi . on ∂Ω. . Thus. eﬃcient direct solvers (such as band solvers) may be available for solution of the strip problems. • Extended subdomains {Ωi∗ }pi=1 can be constructed from the strips {Ωi }pi=1 using an overlap factor of β h0 for some 0 < β < 1/2. ensure that: |bi+1 − bi | = O(h0 ).48) u = 0.. . yielding a matrix with small bandsize. for d = 2.) and c2 (I0 . where Ω ⊂ IR2 and 0 < 1 is a small perturbation parameter. The weak coupling along the x1 axis suggests several heuristic choices in the formulation of the Schwarz iterative algorithm. . the elliptic equation will not be coupled along the x1 axis. WA6]. When V0 is the piecewise constant coarse space deﬁned on the subdomain decomposition Ω1 . • If h0 is suﬃciently small. see [CO8. If such layers need to be resolved computationally. then reﬁnement of the grid may be necessary in such subregions. . 3. . there may be subregions of Ω on which the solution has large gradients. then a coarse space V0 may not be required to ensure robust convergence. SA7. Sharper estimates with respect to overlap β are obtained in [DR17].5 Theoretical Results 103 Remark 2. 2. then bounds for c1 (I0 .. in Ω (2. provided the discrete unknowns within each strip are ordered horizontally. We consider the following model anisotropic problem: −ux1 x1 − ux2 x2 + u = f. . i. √ • If the overlap factor is chosen so that β h0 ≥ c . for 1 ≤ i ≤ p.) in equation (2. For 0 < 1.. the solution may exhibit boundary layer behavior near ∂Ω. 
Schwarz algorithms employ- ing the piecewise constant coarse space will have almost optimal convergence bounds in both two and three dimensions. the preceding elliptic equation will be strongly coupled along the x2 axis. row by row. . Due to presence of the small parameter . .

for suﬃciently small β. The proof involving a ﬁnite element discretization can be obtained by appropriate modiﬁcation of the proof given below.49) with width h0 . and β. h0 . Such a partition of unity will not satisfy χi (x) = 0 for x ∈ ∂Ω. Lemma 2. Consider a ﬁnite element discretization of elliptic equation (2. We outline the proof only in the continuous case. Parameter K1 will satisfy K1 ≤ g0 . since ω0 = ω1 = 1) will satisfy: K0 ≤ C g0 1 + β −2 h−2 0 . given the strip subdomains. provided that in the general case the subdomains be chosen as cylinders or strips whose sections are perpendicular to the axis of weak coupling of the elliptic equation. this will not alter the construction of wi described below. we obtain: ⎧ ⎪ ⎪ A(wi . 1. Employ a Schwarz algorithm based on subspaces Vi ≡ Vh ∩ H01 (Ωi∗ ) for 1 ≤ i ≤ p. Ωp∗ .48) based on a ﬁnite element space Vh ∩ H01 (Ω). Let g0 denote the maximum number of adjacent overlapping subdomains.. To esti- mate K0 . Parameter K0 (equivalently C0 . wi ) = ∂w ∂x1 L2 (Ωi∗ ) + ∂x2 L2 (Ωi∗ ) + wi L2 (Ωi∗ ) i 2 ∂wi 2 2 ⎪ ⎨ . we shall employ a partition of unity χ1 (x).e. However. 1 ≤ i ≤ p. Given such a partition of unity and w ∈ H01 (Ω) deﬁne wi ≡ χi w. · · · . 2. Extend each Ωi to Ωi∗ to have overlap β h0 where β < 1/2. 1.79. Then.68 yields K1 ≤ g0 . Proof. · · · . Applying Lemma 2. Furthermore: ∂wi ∂χi ∂w ∂wi ∂w = w + χi and = χi . without a coarse space V0 . such that χi (x) = χi (x1 ).104 2 Schwarz Iterative Algorithms These ideas may be extended to more general anisotropic problems in two or three dimensions. Then the following will hold. and use exact local solvers. We now estimate the convergence rate of Schwarz iterative algorithms applied to anisotropic problem (2. We further require the smoothness assumption: + + + ∂χi + + + −1 −1 + ∂x1 + ≤ Cβ h0 . Choose subdomains Ωi for 1 ≤ i ≤ p of the form (2. i.48). ∂x1 ∂x1 ∂x1 ∂x2 ∂x2 Employing arguments analogous to the isotropic case. 2. 
χp (x) subordinate to the strip subdomains Ω1∗ . for C independent of h. by construction (w1 + · · · + wp ) = (χ1 + · · · + χp ) w = w. each partition of unity function is solely a function of the variable x1 . 3. since the partition of unity functions will multiply functions which are in H01 (Ω).

≤ C ( β⁻²h₀⁻² ‖w‖²_{L²(Ω*_i)} + ‖∂w/∂x₁‖²_{L²(Ω*_i)} + ‖∂w/∂x₂‖²_{L²(Ω*_i)} + ‖w‖²_{L²(Ω*_i)} )
≤ C ( 1 + β⁻²h₀⁻² ) ( ‖∂w/∂x₁‖²_{L²(Ω*_i)} + ‖∂w/∂x₂‖²_{L²(Ω*_i)} + ‖w‖²_{L²(Ω*_i)} ),

We will assume that the parabolic equation has been suitably discretized. T ] u = 0. 0) = u0 (x). Grid reﬁnement may be necessary to resolve such layer regions. KU6. Let ∗ each extended subdomain √ Ωi be constructed by extending Ωi to include overlap of size βh0 ≥ c τ . KU6. CA. . i=1 Thus C0 ≤ Cg0 1 + β −2 h−2 0 .51) u = 0. for C independent of h. on ∂Ω × [0. where Lu ≡ −∇ · (a∇u). CA3]. Time Stepping Problems We conclude this section by considering the Schwarz algorithm for the iterative solution of the linear system arising from the implicit time stepping of a ﬁnite element or ﬁnite diﬀerence discretization of a parabolic equation: ⎧ ⎪ ⎨ ut + Lu = f. h0 and τ . • The Schwarz algorithm based on the subspaces Vi ≡ Vh ∩ H01 (Ωi∗ ) will have optimal order convergence without the use of a coarse space. where C will be independent of h0 and (and h in the discrete case). then the term 1 + β −2 h−2 0 will be bounded and convergence of Schwarz algorithms will be robust without the inclusion of coarse space correction. CA.50) ⎪ ⎩ u(x. in Ω.80. . 9. . then the elliptic equation resulting from an implicit time stepping of (2. .50) will have the form: (I + τ L) = f˜. see [KU3.51) or its discretizations [KU3. . CA3] and Chap. w). Estimates yield that K1 ≤ g0 and: K0 ≤ Cg0 1 + τ β −2 h−2 0 . The presence of the small parameter 0 < τ 1 enables simpliﬁcation of Schwarz algorithms to solve (2. • Let Ω1 . in Ω (2. on ∂Ω. The absence of coarse space residual correction can be particularly advan- tageous from the viewpoint of parallelization.5 Theoretical Results 105 Summing over 1 ≤ i ≤ p yields the following bound: p A(wi . This elliptic equation is singularly perturbed for τ → 0+ and may exhibit boundary layer behavior on subregions. 2. If τ > 0 denotes the time step. √ Remark 2. in Ω × [0. T ] (2. wi ) ≤ Cg0 1 + β −2 h−2 0 A(w. since coarse spaces requires interprocessor communication. Ωp denote a nonoverlapping decomposition of Ω of size h0 . If the overlap satisﬁes βh0 ≥ c .
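The robustness of coarse-space-free Schwarz iteration for an implicit time step can be illustrated with a small numerical sketch. The code below applies multiplicative Schwarz sweeps with exact local solves to one implicit Euler step (I + τL)u = f̃ of a 1D heat equation; all sizes and parameters (grid size, number of strips, overlap) are illustrative choices, not taken from the text.

```python
import numpy as np

def schwarz_implicit_step(n=100, tau=1e-3, overlap=4, n_blocks=5, sweeps=20):
    """Multiplicative Schwarz sweeps (no coarse space) for one implicit
    Euler step (I + tau*L) u = f of u_t + L u = f, with L the 1D discrete
    Laplacian with homogeneous Dirichlet conditions. Illustrative only."""
    h = 1.0 / (n + 1)
    d = tau / h**2
    # M = I + tau*L is tridiagonal.
    M = (1.0 + 2.0 * d) * np.eye(n) - d * np.eye(n, k=1) - d * np.eye(n, k=-1)
    f = np.ones(n)
    # Overlapping strips Omega_i^*: strips of width ~n/n_blocks, each
    # extended by `overlap` grid points on either side.
    size = n // n_blocks
    blocks = [np.arange(max(0, b * size - overlap),
                        min(n, (b + 1) * size + overlap))
              for b in range(n_blocks)]
    u = np.zeros(n)
    for _ in range(sweeps):
        for idx in blocks:  # sequential (multiplicative) exact local solves
            r = f - M @ u
            u[idx] += np.linalg.solve(M[np.ix_(idx, idx)], r[idx])
    return np.linalg.norm(f - M @ u) / np.linalg.norm(f)
```

With overlap βh₀ of order √τ or larger, the relative residual drops to near machine precision in a few sweeps with no coarse correction, mirroring the remark above that the overlap condition βh₀ ≥ c√τ yields robust convergence without a coarse space.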

3 Schur Complement and Iterative Substructuring Algorithms

In this chapter, we describe multi-subdomain Schur complement and iterative substructuring methods. These methods iteratively solve the linear systems arising from the discretization of a self adjoint and coercive elliptic equation, based on a decomposition of its domain into non-overlapping subdomains. In the continuous case, the solution to an elliptic equation can be parameterized in terms of its unknown Dirichlet values on the subdomain boundaries. This parameterization enables reducing the original elliptic equation to a Steklov-Poincaré problem for determining the solution on such boundaries. Once the reduced problem is solved, the global solution can be obtained by solving a local boundary value problem on each subdomain, in parallel. In the discrete case, parameterizing the global solution in terms of its Dirichlet values on the subdomain boundaries corresponds to a block Gaussian elimination of the unknowns in the interiors of the subdomains, to obtain a reduced problem. This reduced system, referred to as the Schur complement system, is iteratively solved by a PCG method. By contrast, the traditional substructuring method in structural engineering, which pre-dates domain decomposition methodology, assembles and solves the Schur complement system using a direct method [PR4, PR5]. The Schur complement matrix is by construction a discrete approximation of the Steklov-Poincaré operator, and this property enables the formulation of various effective preconditioners.

Our discussion in this chapter is organized as follows. In Chap. 3.1 we introduce notations. The Schur complement system and its algebraic properties are described in Chap. 3.2. Chap. 3.3 describes FFT based fast direct solvers for Schur complement systems on rectangular domains with stripwise constant coefficients. Chap. 3.4 describes several preconditioners for two subdomain Schur complement matrices, while Chap. 3.5 and Chap. 3.6 describe multi-subdomain preconditioners for Schur complements in two dimensions and three dimensions. Chap. 3.7 describes the Neumann-Neumann and balancing preconditioners, while Chap. 3.8 discusses implementational issues. Chap. 3.9 describes theoretical estimates for the condition number of various Schur complement preconditioners.

Here BD and BD denote the Dirichlet and Neumann boundary segments. with BD ∪ BN = ∂Ω and BD ∩ BN = ∅.1) ⎩ n · (a∇u) = gN (x). (3. and substituting this into (3. Given a quasiuniform triangulation Th (Ω) of Ω. B[i] the exterior non-Dirichlet segment. . for 1 ≤ i ≤ n. ∀vh ∈ Vh ∩ HD 1 (Ω). A ﬁnite element discretization of (3.108 3 Schur Complement and Iterative Substructuring Algorithms 3. ∀u.1) seeks uh ∈ Vh ∩ HD 1 (Ω) satisfying: ⎧ ⎪ ⎪ A(uh . The following notation will be employed for subdomain boundaries. This system will be partitioned into subblocks based on an ordering of the nodes given a decomposition of the domain into non-overlapping subdomains. the standard piecewise linear nodal basis functions {φi (x)}ni=1 dual to these nodes will satisfy: φj (xi ) = δij . Here B (i) denotes the interior and Neumann segment of ∂Ωi . xn .3) A matrix representation of the discretization n(3. on BN .1) if: Ω = ∪pl=1 Ω l and Ωi ∩ Ωj = ∅ when i = j. ∀v ∈ HD 1 (Ω) ⎪ ⎩ 1 HD (Ω ≡ v ∈ H (Ω) : v = 0 on BD . v) ≡ (a ∇u · ∇v + c uv) dx. piecewise linear functions deﬁned on Th (Ω). Deﬁnition 3. where a(x) ≥ a0 > 0 and c(x) ≥ 0.2) can be obtained by expand- ing uh relative to this nodal basis uh (y) ≡ i=1 uh (xi ) φi (y). j ≤ n ⎪ (u)i = uh (xi ).2) Let n denote the number of nodes of Th (Ω) in (Ω ∪ BN ). . This results in a linear system: Ah u = f . Ωp forms a non-overlapping de- composition of Ω (see Fig. and B the interface separating the subdomains. . for 1 ≤ i ≤ n ⎪ ⎩ (f )i = F (φi ). vh ) = F (vh ). We enumerate them as x1 . We also let Bij ≡ B (i) ∩ B (j) denote the interface between Ωi and Ωj . . 1 ≤ i. we shall let Vh denote the ﬁnite element space of continuous.4) where: ⎧ ⎨ (Ah )ij = A(φi . 1 (3.1 Background We consider the following self adjoint and coercive elliptic equation: ⎧ ⎨ −∇ · (a(x) ∇u) + c(x) u = f (x). for 1 ≤ i. 3. Then. We shall say that Ω1 . . (3. . B ≡ ∪pi=1 B (i) and B (i) ≡ ∂Ωi \BD and B[i] ≡ ∂Ωi ∩ BD for 1 ≤ i ≤ p. . 
v ∈ HD (Ω) 1 Ω ⎪ ⎪ F (v) ≡ Ω f v dx + BN gN v dsx . in Ω u=0 on BD . .1. j ≤ n. (3.2) with vh = φj for 1 ≤ j ≤ n. . where ⎪ ⎨ A(u. φj ).

. Ωp and interface B.. (3. nI ). j ≤ nB ⎨ (uI )j = (u)j . We shall assume that the sub- domains are chosen to align with the triangulation Th (Ω). We shall assume that the chosen ordering of nodes satisﬁes: (1) (i−1) (1) (i) xj ∈ Ωi . Let nB denote the number (1) (p) of nodes on B. (1) (p) where nI ≡ (nI + · · · + nI ) denotes the total number of nodes in subdomain interiors. Multidomain non-overlapping decompositions In most applications.5) will be further partitioned using submatrices arising from the subregions. and this will be described later. . ⎪ ⎪ ⎪ ⎪ for 1 ≤ j ≤ nI ⎪ ⎪ (f I )j = (f )j . though strip decompositions have advantages. . . for (nI + . by construction it will hold n = (nI + · · · + nI + nB ). f TB where: ⎧ ⎪ (AII )lj = (Ah )lj . ⎪ ⎩ (f B )j = (f )nI +j .nI +j . as in Fig. j ≤ nI ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (AIB )lj = (Ah )l. 3. for (nI + 1) ≤ j ≤ (nI + nB ). and that the nodes x1 . The nodes within each subdomain Ωi and on the (i) interface B may be ordered arbitrarily. for 1 ≤ j ≤ nB .5) ATIB ABB uB fB T T corresponding to the partition u = uTI . for 1 ≤ l ≤ nI and 1 ≤ j ≤ nB ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (ABB )lj = (Ah )nI +l. for 1 ≤ l. for 1 ≤ i ≤ p xj ∈ B. . - AII AIB uI fI = . . Let nI denote the number of nodes in (i) subdomain Ωi and nB the number of nodes on B (i) . .4) can be block partitioned as: . . box like subdomain decompositions will be employed. 3. Using this ordering. uTB and f = f TI . . Then. . for 1 ≤ j ≤ nI ⎪ ⎪ ⎪ ⎪ (uB )j for 1 ≤ j ≤ nB ⎪ ⎪ = (u)nI +j .nI +j . xn in Th (Ω) are ordered based on the subdomains Ω1 . for 1 ≤ l. .1 Background 109 Non-overlapping strip decomposition Non-overlapping box decomposition Ω1 Ω2 Ω3 Ω4 Ω5 Ω6 Ω7 Ω8 Ω1 Ω2 Ω3 Ω4 Ω5 Ω6 Ω7 Ω8 Ω9 Ω10 Ω11 Ω12 Ω13 Ω14 Ω15 Ω16 Fig. The block submatrices AII and AIB in (3.1. . . 3. nI ) + 1 ≤ j ≤ (nI + . . system (3.1.

IB 3.2 Schur Complement System The solution to system (3. 4.1). .110 3 Schur Complement and Iterative Substructuring Algorithms 3. First. Compute: ˜f B = f B − AT wI .5) below: AII uI + AIB uB = f I ATIB uI + ABB uB = f B yields uI = A−1 II (f I − AIB uB ) provided AII is invertible. We summarize the resulting algorithm below. The Schur complement system can be employed to T determine the solution uTI . S is the Schur complement of submatrix AII in Ah ). 2. Algorithm 3. This will be possible when matrix S is invertible. It corre- sponds to a discrete approximation of a Steklov-Poincar´e problem associated with elliptic equation (3.5) as follows. uTB . uTB to (3.6) ⎪ ⎩ ˜f ≡ (f − AT A−1 f ). where ⎨ ⎪ S ≡ (ABB − ATIB A−1 II AIB ) (3. Solve for wI : AII wI = f I .2. Matrix S is referred to as the Schur complement (strictly speaking. Eliminating uI using the ﬁrst block equation in (3. T Output: uTI . Substituting this parametric representation of uI into the 2nd block equation above yields the following reduced linear system for uB : ⎧ ⎪ ⎪ SuB = ˜f B .6): uB = S −1 f B − ATIB A−1II f I . but posed on the interface B. B B IB III The system SuB = ˜f B is referred to as the Schur complement system. Once uB has been deter- mined.1 (Schur Complement Algorithm) 1. Solve for uI : AII uI = (f I − AIB uB ). yielding: uI = A−1 II (f I − AIB uB ). uI can be obtained by solving AII uI = (f I − AIB uB ). Solve for uB : S uB = ˜f B .5) can be sought formally by block Gaussian elim- ination. determine uB by (iteratively or directly) solving the Schur complement system (3.
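The four steps of the Schur complement algorithm can be sketched in dense linear algebra as follows. This is illustrative only: in practice A_II is block diagonal, S is never assembled explicitly, and step 3 is replaced by an iterative solve.

```python
import numpy as np

def schur_solve(A_II, A_IB, A_BB, f_I, f_B):
    """Block-elimination solve of [[A_II, A_IB], [A_IB^T, A_BB]] applied to
    [u_I; u_B] = [f_I; f_B], following the Schur complement algorithm."""
    # Step 1: interior solve  A_II w_I = f_I.
    w_I = np.linalg.solve(A_II, f_I)
    # Step 2: condensed right-hand side  f~_B = f_B - A_IB^T w_I.
    f_tilde = f_B - A_IB.T @ w_I
    # Step 3: form S = A_BB - A_IB^T A_II^{-1} A_IB and solve S u_B = f~_B.
    S = A_BB - A_IB.T @ np.linalg.solve(A_II, A_IB)
    u_B = np.linalg.solve(S, f_tilde)
    # Step 4: back-substitute for the interior unknowns.
    u_I = np.linalg.solve(A_II, f_I - A_IB @ u_B)
    return u_I, u_B
```

On any symmetric positive definite block matrix, the result agrees with a direct solve of the full system, which is a convenient sanity check for an implementation.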

. and an ordering of the nodes based of this. and by subsequently deﬁning S wB ≡ ABB wB + ATIB wI .6) is typically solved using a preconditioned conjugate gradient iterative method. yielding that Aij = A(φi . then matrix S must ﬁrst be assembled. .6). and instead only requires computing the action of S on diﬀerent vectors. 3. By construction xi ∈ Ωj ⇔ i ∈ I (j) and I =. If a direct solver is employed to solve (3. given wB . This does not require explicit assembly of matrix S. Indeed. φj ) = 0. Ωp . in domain decomposition applications. matrix AII in system (3. However. the Schur complement system (3. deﬁne the index set: $ % (1) (j−1) (1) (j) I (j) ≡ i : (nI + · · · + nI + 1) ≤ i ≤ (nI + · · · + nI ) . Such matrix-vector products. . PR5].2 Schur Complement System 111 Schur complement and iterative substructuring algorithms are motivated by the preceding algorithm. for instance S wB . . The preceding version of the Schur complement algorithm can be imple- mented in parallel by using the block structure of matrix AII . note that when nodes xi and xj belong to the interiors of diﬀerent subdomains. To see this. may be computed by ﬁrst solving AII wI = −AIB wB in parallel (as discussed below) for wI .5) will be block diagonal. given a decomposition of Ω into the subdomains Ω1 . then the nodal basis functions φi (x) and φj (x) will have support in diﬀerent subdomains. This is the approach employed in traditional substructuring [PR4. More formally.

AII satisfy: ⎡ (1) ⎤ ⎧. I (1) ∪ · · · ∪ I (p) . . . . It then follows (1) (p) that the diagonal blocks of AII = blockdiag AII . .
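A minimal matrix-free sketch of this idea follows, assuming A_II = blockdiag(A_II^(1), ..., A_II^(p)) is supplied as a list of dense blocks (the names, dense storage, and sequential loop are simplifications for illustration; the local solves are independent and would run in parallel).

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

def schur_operator(A_II_blocks, A_IB, A_BB):
    """Matrix-free action S w_B = A_BB w_B - A_IB^T A_II^{-1} A_IB w_B,
    exploiting the block diagonal structure of A_II: the interior solve
    splits into p independent subdomain solves."""
    sizes = [blk.shape[0] for blk in A_II_blocks]
    offsets = np.cumsum([0] + sizes)
    n_B = A_BB.shape[0]

    def matvec(w_B):
        g = A_IB @ w_B                              # A_IB w_B
        w_I = np.empty_like(g)
        for k, blk in enumerate(A_II_blocks):       # independent local solves
            s = slice(offsets[k], offsets[k + 1])
            w_I[s] = np.linalg.solve(blk, g[s])
        return A_BB @ w_B - A_IB.T @ w_I

    return LinearOperator((n_B, n_B), matvec=matvec)
```

The returned operator can be handed directly to scipy.sparse.linalg.cg to solve S u_B = f̃_B by a (preconditioned) conjugate gradient method without ever forming S.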

Given the non-overlapping subdomains Ω1 . (p)−1 .6). It employs a direct method to solve (3.8) F (v) = i=1 FΩi (v). . . . Ωp .. since the action of A−1 II = blockdiag(AII . v ∈ HD 1 (Ω) p (3. for v ∈ HD (Ω). .. each of which can be computed in parallel. v) = i=1 AΩi (u. ! .7) This block diagonal structure of AII will enhance the parallelizability of (1)−1 Schur complement algorithms. . for v ∈ HD (Ω).) and FΩi (. AII 0 ⎪ ⎪ A (j) = (Ah )˜lk˜ for 1 ≤ l. . for u. for u. v ∈ HD 1 (Ω) FΩi (v) ≡ Ωi f v dx. (3. v) ≡ Ωi (a(x)∇u · ∇v + c(x) uv) dx. but incorporates the assembly of matrices Ah and S by a ﬁnite element subassembly procedure. let AΩi (. k ≤ nI (j) ⎪ ⎨ II lk ⎢ ⎥ AII = ⎢⎣ . We next describe the substructuring algorithm for solving (3.5). v). . the following subassembly relation will hold: p A(u.) denote subdomain forms: AΩi (u. ⎥ where ⎦ ˜l = (n(1) + · · · + n(j−1) ) + l ⎪ ⎪ I I (p) ⎪ ⎩ (1) (j−1) 0 AII ˜ k = (nI + · · · + nI ) + k. AII ) involves p separate blocks.. 1 By deﬁnition. .

112 3 Schur Complement and Iterative Substructuring Algorithms If u. We may then represent: (j) T (j) (j) (j) (j) T (j) uI AII AIB vI fI vI AΩj (uh . each with a speciﬁed (j) local ordering of the nodes (for instance in ascending order of indices). Given ﬁnite element functions uh . with uI . vh ) = (j) (j)T (j) (j) . FΩj (vh ) = (j) (j) . on each subdomain Ωj . v ∈ Vh ∩ HD 1 (Ω). then these local forms can be represented using matrix- vector notation. vI ∈ IRnj and uB . respectively. let u. Accordingly. v ∈ IRn denote its vector of (j) (j) (j) (j) (j) nodal values. respectively. vh ∈ Vh ∩ HD 1 (Ω). let I (j) and B (j) denote the index sets of nodes in Ωj and ∂Ωj \BD . vB ∈ IRnB denoting subvectors corresponding to indices in I (j) and B (j) (in the local ordering of nodes). uB AIB ABB vB fB vB where the submatrices and subvectors are deﬁned by: ⎧. Let nI (j) and nB denote the number of nodes in Ωj and B (j) .

(A_II^(j))_{lk} ≡ A_{Ω_j}( φ_l̃ , φ_k̃ ),   for 1 ≤ l, k ≤ n_I^(j),
(A_IB^(j))_{lk} ≡ A_{Ω_j}( φ_l̃ , φ_k̃ ),   for 1 ≤ l ≤ n_I^(j), 1 ≤ k ≤ n_B^(j),
(A_BB^(j))_{lk} ≡ A_{Ω_j}( φ_l̃ , φ_k̃ ),   for 1 ≤ l, k ≤ n_B^(j),
(f_I^(j))_l = F_{Ω_j}( φ_l̃ ),   for 1 ≤ l ≤ n_I^(j),
(f_B^(j))_l = F_{Ω_j}( φ_l̃ ),   for 1 ≤ l ≤ n_B^(j),

Deﬁnition 3. B (j) . l with ˜l and k˜ denoting global indices corresponding to the local indices l and k on Ωj and B (j) . l) = j (RW )lj = 0.9) These subassembly relations may equivalently be expressed based on restric- tion and extension matrices. B) let index(W. l ⎪ ⎩ f (j) (j) B = FΩi φ˜l . If nW denotes the number of nodes in W . if index(W. l) denote the global index associated with the l’th node in the lo- cal ordering of indices in W .2. we deﬁne restriction map RW as an nW × n matrix with entries: 1. . l) = j. as deﬁned below. The discrete version of subassembly identity (3. for 1 ≤ l ≤ nB . vB fB vB fB (3.8) becomes: ⎧ T (j) T (j) (j) ⎪ ⎪ AII AIB (j) ⎪ ⎪ u I A II A IB v I p u I vI ⎪ ⎪ = j=1 ⎨ uB ATIB ABB vB uB (j) (j)T AIB ABB (j) (j) vB ⎪ T (j) T (j) ⎪ ⎪ vI fI p vI fI ⎪ ⎪ ⎪ ⎩ = j=1 (j) (j) . For any set of indices W (such as I (j) . if index(W.

3. i.12) uB ATIB ABB uB (i) (i) 2. Suppose the following assumptions hold.10) ⎪ ⎪ ⎪ ⎪ fI p RI fI ⎪ ⎪ = j=1 . (i) (i) (i) (i) 2. with zero values for all other entries. 1. its extension RW T vW will denote a vector of size n whose entries at indices in W correspond to those of vW in the local ordering. - AII AIB uI 0 = .. . ⎩ fB (j) (j) RB fB This subassembly identity relates the global stiﬀness matrix and load vectors to the subdomain stiﬀness matrices and subdomain load vectors. (3. . -.2 Schur Complement System 113 Given a nodal vector v ∈ IRn . (3. given vW ∈ IRnW .11) ATIB ABB uB fB for some vector f B . -T . uTB ∈ IRn be discrete Ah -harmonic.. The subassembly relations (3. It will hold that: p (i)T (i) uTB SuB = uB S (i) uB . The term f B = SuB and the Schur complement energy will satisfy: . The following result establishes a related expression between the global Schur complement (i) (i)T (i)−1 (i) matrix S and subdomain Schur complements S (i) ≡ ABB − AIB AII AIB . Let u = uTI . Let uI = RI u and uB = RB u. Lemma 3. (3. (3.9) may now be alternatively expressed as: ⎧ (j) T (j) (j) ⎪ ⎪ p AII AIB (j) ⎪ AII AIB ⎪ RI RI ⎪ ⎪ = j=1 ⎨ AIB ABB T RB (j) (j) T AIB ABB (j) RB (j) (j) T (j) (3. satisfy: . 3. T 1. its restriction RW v will denote a subvector of nodal values corresponding to the indices in W in the chosen local order- ing of nodes.14) i=1 (i) (i)T (i)−1 (i) where S (i) ≡ (ABB − AIB AII AIB ). Then the following results will hold. While.13) 3. -. - uI AII AIB uI uTB SuB = . The subvectors uI and uB will satisfy: (i) (i) (i) (i) AII uI + AIB uB = 0.e.

(3. i) = l (RG )il = (3. Given region G ⊂ B containing nG indices. . i) denote the index of the i’th local of G in the ordering of indices on B.4. the nodes in Ωi will be cou- pled only to nodes in Ωi and B (i) . using that AII = blockdiag(AII . let index(B. G. uB )T with (3. if index(B. Formally eliminating uI = −AII AIB uB and substi- (i) (i) tuting into the 2nd block equation above yields f B = S (i) uB where: (i) (i)T (i)−1 (i) S (i) ≡ (ABB − AIB AII AIB ). The subassembly identity (3.11) with uTI . we apply (3. Deﬁnition 3. (3. . . To prove that f B = SuB eliminate uI using (3. Apply RI to (3.18) uB AIB ABB uB Substituting expressions (3.16) and employing that f B = S (i) uB yields: T (i) (i) (i) (i) uI AII AIB uI (i)T (i) (i) (i)T (i) (i) = uB S (i) uB . uI . as follows.17) We refer to S (i) as a local (subdomain) Schur complement. . uTB )T to obtain: (i) (i) (i) AII uI + RI AIB uB = 0. take inner product of (3. (3. (i)T (i)T To prove (3.14) may be expressed equivalently using re- striction and extension maps between nodal vectors on B and B (j) . G. - (i) (i) (i) AII AIB uI 0 (i) T (i) (i) = (i) .. uTB and substitute f B = SuB to obtain (3.9) yields (3. . It can easily be veriﬁed that RG = RG RB T .12) and (3. if index(B. (3. Taking the inner (i)T (i)T (i) (i) product of (uI . To prove (3.11) and substitute the resulting expression uI = −A−1 II AIB uB into the 2nd block equation to obtain T the desired result. T (p) (1) (p)T AII ) and u = (uI . Next.12). i) = l.15) Now. we restrict the block equation: AII uI + AIB uB = 0 (i) (1) to indices in I (i) . uB )T : .18) into identity (3.14). G. . This yields: (i) (i) (i) RI AIB uB = AIB uB .19) 0. We deﬁne an nG × nB matrix RG as: 1.15) yields the desired result. . Substituting this expression into (3.11). .13).16) AIB ABB uB fB (i) (i) (i)−1 (i) (i) for some vector f B . 
for standard ﬁnite element discretizations.14).114 3 Schur Complement and Iterative Substructuring Algorithms Proof.13) to the local nodal vector (uI .

the Schur complement subassembly identity (3.14)

· · · . (i)T (i) (i)T (i) (i)T (i)−1 (i) (i) S= RB S (i) RB = RB ABB − AIB AII AIB RB . · · · .20) i=1 i=1 (i) (i)T (i)−1 (i) where S (i) = (ABB − AIB AII AIB ) is a subdomain Schur complement. f B (i) ⎪ ⎪ ⎪ ⎨ Determine the Cholesky factors: A(i) = L(i) L(i)T II I I ⎪ Assemble: S (i) ≡ A(i) − A(i)T L(i)−T L(i)−1 A(i) ⎪ ⎪ ⎪ BB IB I I IB ⎪ ⎩ Assemble: ˜(i) (i) (i)T (i)−T (i)−1 (i) f B ≡ f B − AIB LI LI fI . Endfor 3. Endfor .10) and (3. AIB . The traditional substructuring algorithm solves the Schur complement system by using Cholesky factorization. Assemble: ⎧ ⎨ S ≡ p R(i) S (i) R(i) T i=1 B B ⎩˜ p (i)T ˜(i) f B = i=1 RB f B 4. (i) 5. load vectors and Schur complement matrices using (3.2 (Substructuring Algorithm) 1. p in parallel do: ⎧ ⎪ ⎪ (i) (i) (i) (i) Assemble: AII . 2. ABB .9).20).2. (3. Determine the Cholesky factors: S = LS LTS and solve: LS wB = ˜fB T LS uB = wB . p in parallel solve for uI : (i) (i) (i) (i) (i) AII uI = (f I − AIB RB uB ). Algorithm 3. For i = 1. For i = 1. (3. 6. f I . The resulting algorithm is summarized below. and explicitly assembles the subdomain ﬁ- nite element stiﬀness matrices.
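The subassembly identity S = Σ_i R_B^(i)T S^(i) R_B^(i) can be checked on the smallest nontrivial example: a 1D Laplacian split into two subdomains sharing a single interface node. The matrices below are the standard piecewise linear element blocks (with mesh size scaled out); the setup is purely illustrative.

```python
import numpy as np

# Two-subdomain 1D Laplacian on 5 interior nodes; node 2 is the interface B.
A1_II = np.array([[2., -1.], [-1., 2.]])   # interiors {0,1} of Omega_1
A1_IB = np.array([[0.], [-1.]])
A1_BB = np.array([[1.]])                   # interface gets 1 from each side
A2_II = A1_II.copy()                       # interiors {3,4} of Omega_2
A2_IB = np.array([[-1.], [0.]])
A2_BB = np.array([[1.]])

def local_schur(A_II, A_IB, A_BB):
    """Subdomain Schur complement S^(i) = A_BB^(i) - A_IB^(i)T A_II^(i)-1 A_IB^(i)."""
    return A_BB - A_IB.T @ np.linalg.solve(A_II, A_IB)

# Global matrix assembled by subassembly, interiors ordered first.
A_II = np.block([[A1_II, np.zeros((2, 2))], [np.zeros((2, 2)), A2_II]])
A_IB = np.vstack([A1_IB, A2_IB])
A_BB = A1_BB + A2_BB
S_global = local_schur(A_II, A_IB, A_BB)
# Subassembly: with one interface node, each R_B^(i) is the 1x1 identity,
# so S should equal S^(1) + S^(2).
S_sum = local_schur(A1_II, A1_IB, A1_BB) + local_schur(A2_II, A2_IB, A2_BB)
```

Both expressions yield S = 2/3 here: each subdomain contributes the local Schur complement 1 − 2/3 = 1/3 of its interface stiffness.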

ABB . . . T (1)T (p)T Output: uI . However. followed by the computation of the subdomain Cholesky factors. (i) (i) (i) (i) AIB . The computations on diﬀerent subdomains can be performed in parallel. since . . uTB . . the substructuring algorithm is not purely algebraic. (i) Steps 1 and 2 in the substructuring algorithm involve the assembly of AII . modiﬁed loads and and Schur complement matrices S (i) . uI . f I and f B on each subdomain Ωi .
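The per-subdomain work in steps 1–2 of the substructuring algorithm (Cholesky factorization of A_II^(i), formation of S^(i) and of the condensed load f̃_B^(i)) can be sketched as follows; matrix names mirror the text, but the dense-matrix implementation is illustrative only.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def substructure_factor(A_II, A_IB, A_BB, f_I, f_B):
    """One subdomain's share of the substructuring algorithm:
    Cholesky-factor A_II^(i) = L_I^(i) L_I^(i)T, then form the local Schur
    complement S^(i) and the condensed load f~_B^(i) via triangular solves."""
    L = cho_factor(A_II, lower=True)
    W = cho_solve(L, A_IB)             # A_II^{-1} A_IB, column by column
    S_i = A_BB - A_IB.T @ W            # S^(i) = A_BB - A_IB^T A_II^{-1} A_IB
    f_tilde = f_B - A_IB.T @ cho_solve(L, f_I)
    return L, S_i, f_tilde
```

The returned factor L would be reused in step 5 for the interior back-solve A_II^(i) u_I^(i) = f_I^(i) − A_IB^(i) R_B^(i) u_B, so each A_II^(i) is factored only once.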

5. and of the forcing term f˜B in (3. SA2] is employed to solve SuB = ˜f B without assembling S.6) must be parallelized using traditional methods. When coeﬃcient c(x) = 0 and B (i) = ∂Ωi . such inverses should not be assembled explicitly [GO4]. A A 1 0 IB BB T where 1 = (1. since nB can be large. Assembly of the global Schur complement matrix S using identity (3. Such a reduction in the . . 1) . .6. For brevity of expression. . 1) is of appropriate size.7. instead the action of the in- verse should be computed by the solution of the associated linear system. the submatrices (i) AII will be invertible. The magnitude of a nonzero entry Sij typically decreases with increasing distance between the nodes xi and xj . . we have employed matrix inverses in (i) the expressions for S (i) and ˜f B in the substructuring algorithm. . a preconditioned itera- tive method [GO4. Each (i) subdomain Schur complement matrix S (i) will be of size nB corresponding to the number of nodes on B (i) . Remark 3.20). The subdomain (i) Schur complement matrices S will typically not be sparse. However. From a computational viewpoint. Once uB is determined.116 3 Schur Complement and Iterative Substructuring Algorithms it employs the subdomain stiﬀness matrices (as they may not be available if the linear system Au = f has already been assembled). Similarly. . Remark 3. assembly of matrix S and its Cholesky fac- torization can be signiﬁcant costs. matrix S (i) will also T be singular with a null vector of the form (1. In this case. the (i) components uI of uI can be determined in parallel (on each subdomain). . the Cholesky factorization of S and the solution of the Schur complement system yielding uB . . however. the entry Sij will be zero. then it may be possible to reduce these computational costs provided an eﬀective preconditioner can be found. However. Remark 3. Explicit assembly of S (i) requires the solution (i) (i) of nB linear systems involving sparse coeﬃcient matrix AII . otherwise. 
and satisfy:

    [ A_II^(i)    A_IB^(i) ] [ 1 ]   [ 0 ]
    [ A_IB^(i)T   A_BB^(i) ] [ 1 ] = [ 0 ],

where 1 = (1, . . . , 1)^T is of appropriate size.

2. For i = 1. Endfor . p in parallel do: ⎧ ⎪ ⎪ (i) (i) (i) (i) Assemble: AII . f I . However. p in parallel solve for uI : (i) (i) (i) (i) (i) AII uI = f I − AIB RB uB . 3.2 Schur Complement System 117 computational costs motivates the iterative substructuring method. For i = 1. AIB . 3. The iterative substructuring method has similar steps as Alg. 3. Assemble: p (i)T (i) ˜ fB = i=1 RB ˜ fB 4. ABB . · · · . vector ˜f B is assembled) and step 4 is replaced by a preconditioned CG method to solve SuB = ˜f B with a preconditioner M . (i) 5.4 through Chap.2. Precon- ditioners for S are considered in Chap.2. Solve SuB = ˜f B using a preconditioned CG method.7. We summarize the resulting algorithm.2. 2. matrix S is not assembled in step 3 (instead. 3. · · · . Algorithm 3. Endfor 3. 3. f B (i) ⎪ ⎪ ⎪ ⎨ Determine the Cholesky factors: A(i) = L(i) L(i)T II I I ⎪ Assemble: S (i) ≡ A(i) − A(i)T L(i)−T L(i)−1 A(i) ⎪ ⎪ ⎪ BB IB I I IB ⎪ ⎩ Assemble: ˜(i) (i) (i)T (i)−T (i)−1 (i) f B ≡ f B − AIB LI LI fI .3 (Iterative Substructuring Algorithm) 1. Steps 5 and 6 remain as in Alg. 6.2.

8. . Remark 3. uTB . . T (1)T (p)T Output: uI . . uI . The cost of implementing a preconditioned iterative method to solve SuB = ˜f B using a preconditioner M in step 4 will be proportional to the number of preconditioned iterations and to the cost per . .

When these submatrices are available. Y = I. may be less than the cost of assembling S and solving S u = f using a direct method. B.9. a product with S can be computed as: p . The iterative substructuring method is not purely algebraic. if the cost of solving M wB = rB is modest. Furthermore. iteration. When the (i) number of preconditioned iterations is less than mini nB . then the total cost of solving S uB = ˜f B iteratively without assembling S. as it (i) employs the subdomain stiﬀness matrices AXY for X. the cumulative cost for computing matrix-vector products with S will not exceed the cost of assembling the subdomain Schur complement matrices S (i) . Remark 3.

S w_B = Σ_{i=1}^{p} R_B^(i)T ( A_BB^(i) − A_IB^(i)T A_II^(i)⁻¹ A_IB^(i) ) R_B^(i) w_B.    (3.21)

close to machine precision (when computing the matrix-vector product with S). (3.25) (derived later in this section). −1 - I −A˜−1 ˜ II AIB I 0 I 0 A˜II 0 A˜−1 = −1 ˜ .. GO4]: . and these approximations must be scaled appropriately. each iteration will require the solution of a linear system of the form ˜ = r. suppose A˜II and M are preconditioners for matrices AII and S. we shall separately consider two dimensional and three dimensional domains. MA11. since AII is block diagonal. and let A˜IB denote an approximation of AIB . . which can be obtained by formally applying the expression z = A˜−1 r Az given above. then such a product can be computed using: SwB = ABB wB − ATIB A−1 II AIB wB . Indeed. This approach. then motivated by the block form (3.10. These preconditioners will be grouped as two subdomain or multisubdomain pre- conditioners. which results in the Schur complement system. Such an approach will have the advantage that the subdomain problems need not be exact. Indeed. it has been shown that if A˜II ≡ α AII for some α = 1. and when applying a CG method. after describing properties of matrix S and FFT based direct solvers for S. 3. however. A separate section is devoted to the robust class of Neumann-Neumann and balancing domain decomposition preconditioners. followed by deﬁning SwB ≡ ABB wB + ATIB wI . a preconditioner A˜ for stiﬀness matrix A may be constructed: . care must be exercised in the choice of matrices A˜II and A˜IB approximating AII and AIB .1 Properties of the Schur Complement System From a matrix viewpoint. we shall focus on preconditioners M for S for use in the iterative substructuring or Schur complement algorithm.23) 0 I 0M −AIB I T 0 I Matrix A˜ will be symmetric and positive deﬁnite. can be understood to arise from the following block matrix factorization of A. However.2. with p diagonal blocks). 
The iterative substructuring and Schur complement algorithms have the disadvantage that they require the solution of subdomain problems (i) (i) (i) of the form AII wB = rB .22) This requires computing AIB wB ﬁrst. the block elimination of uI . -. respectively. as expressed next [CO6. An alternative approach which avoids this is to solve the original linear system Au = f by a preconditioned CG method with a block matrix preconditioner A˜ for A.118 3 Schur Complement and Iterative Substructuring Algorithms However. as most preconditioners depend on the geom- etry of the interface B. Remark 3. respectively. when these matrices are not available (for instance. In the latter case. when matrix A is already assembled). -. In the remainder of this chapter. re- quires two subdomain solves per iteration involving coeﬃcient matrix A˜II . then the convergence rate of of the conjugate gradient method deteriorates signiﬁcantly [BO4]. followed by solving AII wI = −AIB wB (in parallel. (3.

Then the following results will hold: 1.. - −1 I −A−1 II AIB AII 0 I 0 A = 0 I 0 S −1 − ATIB A−1 II I .. - T EuB A II A IB EuB uB SuB = . . . - AII AIB I 0 AII AIB A≡ = ATIB ABB ATIB A−1 II I 0 S .24) I 0 AII 0 I A−1 II AIB .25) I −A−1 II AIB I 0 I 0 A−1 II 0 = . . with .27) uB ATIB ABB uB for arbitrary uB . We employ the notation λm (C) and λM (C) to denote the minimum and maximum eigenvalues. but its action must be computed. = ATIB A−1 II I 0 S 0 I −1 where S ≡ ABB − ATIB AII AIB denotes the Schur complement matrix. S will be symmetric and positive deﬁnite. - AII AIB EuB 0 = .2 Schur Complement System 119 . . (3. and let κ2 (C) ≡ λM (C)/λm (C) denote the spectral condition number of C. (3. then the Schur complement matrix S need not be assembled explicitly. Let A be a symmetric positive deﬁnite matrix having the block structure: AII AIB A= . matrix A−1 will formally have the following block factorizations: . of a real symmetric matrix C. The following result provides bounds for the extreme eigenvalues of S when A is a symmetric and positive deﬁnite matrix. However. respectively. Deﬁne EwB ≡ −A−1 II A IB wB for a vector wB . we let σ1 (D) denote its smallest singular value. -. −1 -. and S −1 once. .11.26) ATIB ABB uB SuB for arbitrary uB . . -. . In this case. Let S = (ABB − ATIB A−1 II AIB ) denote the Schur complement matrix. if iterative methods are employed. Given an arbitrary matrix D. Suppose the following assumptions hold.. -. 2.. (3. 3. (3. The energy associated with matrix S will satisfy: . -T .. 1. 3. 0 I 0 S −1 − ATIB I 0 I To formally determine the solution of Au = f using this block factorization of A−1 requires computing the action of A−1 II twice. . Lemma 3. . . ATIB ABB 2.
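The block factorization of A⁻¹ translates into a three-stage application: forward-eliminate with the interior solver, solve the interface system, then back-substitute. In the sketch below the two solver callables are placeholders: with exact solves (A_II⁻¹ and S⁻¹) the result reproduces A⁻¹ exactly, while inexact solvers Ã_II⁻¹ and M⁻¹ yield the action of the block preconditioner Ã⁻¹ of (3.23).

```python
import numpy as np

def apply_block_preconditioner(solve_AII, solve_M, A_IB, r_I, r_B):
    """Apply the factored (approximate) inverse of the block matrix
    [[A_II, A_IB], [A_IB^T, A_BB]]: solve_AII and solve_M are callables
    approximating the actions of A_II^{-1} and S^{-1}, respectively."""
    w_I = solve_AII(r_I)                      # forward elimination
    w_B = solve_M(r_B - A_IB.T @ w_I)         # interface solve
    z_I = w_I - solve_AII(A_IB @ w_B)         # back-substitution
    return z_I, w_B
```

Note the two interior solves per application, matching the remark that each iteration with Ã requires two subdomain solves with Ã_II.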

-T . uTB to obtain: . Consequently. 4. - AII AIB EuB 0 = . When A is symmetric positive deﬁnite.. - T EuB AII AIB EuB uB SuB = uB ATIB ABB uB . -T . The minimum eigenvalue of S will satisfy: λm (A) σ1 (E)2 + 1 ≤ λm (S). Since A is symmetric positive deﬁnite. Proof. -T . .120 3 Schur Complement and Iterative Substructuring Algorithms 3. we immediately obtain that: uTB SuB ≥ λm (A)uTB uB . λM (AII ) 5. since σ1 (E) ≥ 0. matrix S = ABB − ATIB A−1 II AIB will be deﬁned and symmetric by construc- tion. The Schur complement matrix S will be better conditioned than matrix A in the spectral norm: κ2 (S) ≤ κ2 (A). . ATIB ABB uB SuB To show that S is positive deﬁnite. - EuB AII AIB EuB EuB 0 = uB ATIB ABB uB uB SuB = uTB SuB . so that A−1 II is well deﬁned. with its lowest eigenvalue at least as large as the lowest eigenvalue of A: λm (A) ≤ λm (S). its diagonal block AII will also be symmetric and positive deﬁnite. we obtain that: .. . take inner product of the above equation T with (EuB )T . The maximum eigenvalue of S will satisfy: σ1 (AIB )2 λM (S) ≤ λM (ABB ) − . and so S will be positive deﬁnite. - EuB EuB ≥ λm (A) uB uB ≥ λm (A) (EuB )T EuB + uTB uB ≥ λm (A) σ1 (E)2 + 1 uTB uB . Substituting the deﬁnition of EuB and computing directly yields: . In particular. -T .

employing the deﬁnition of S.2 Schur Complement System 121 Next. we obtain that uTB SuB = uTB ABB − ATIB A−1II AIB uB ≤ uTB ABB uB − uTB ATIB A−1 II AIB uB T ≤ λM (ABB ) − σ1 (AIB )2 λm (A−1 II ) uB uB . 3.

we obtain: λM (S) ≤ λM (A). . λm (S) λm (A) which is the desired result. Lemma 3. 1. Then. see [BE17]. ∀i. and since 2 − σλ1M(A(AIBII)) ≤ 0. j and either (K −1 )ij ≥ 0 entrywise or if all minors of K are positive. In particular.12. Equivalently. 2 = λM (ABB ) − σλ1M(A(AIBII)) uTB uB . Deﬁnition 3.13. then the Schur complement S will also be an M -matrix. j. A nonsingular matrix K is said to be an M -matrix if: ⎧ ⎪ ⎨ (K)ii > 0. This will hold even if matrix A is non-symmetric. ∀i (K) ij ≤ 0. K is an M -matrix if it can be expressed in the form K = r I − N where (N )ij ≥ 0 for all i. i = j ⎩ K −1 ≥ 0. ⎪ ij see [VA9. The next result shows that if matrix A is an M -matrix. SA2]. S = (ABB − ABI A−1 II AIB ) will also be an M -matrix. Let matrix A be non-symmetric and block partitioned as follows: AII AIB A = . Reﬁnements of the preceding bounds may be found in [MA11]. Let A be an M -matrix. Suppose the following assumptions hold. ABI ABB 2. since the eigenvalues of the principal submatrix ABB of A must lie between the maximum and minimum eigenvalues of A. Combining the upper and lower bounds for the eigenvalues of S yields: λM (S) λM (A) κ2 (S) = ≤ = κ2 (A).
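The eigenvalue bounds of the lemma, λ_m(A) ≤ λ_m(S) and λ_M(S) ≤ λ_M(A), hence κ₂(S) ≤ κ₂(A), are easy to verify numerically on a random symmetric positive definite block matrix (sizes below are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(3)
n_I, n_B = 10, 4
X = rng.standard_normal((n_I + n_B, n_I + n_B))
A = X @ X.T + 0.5 * np.eye(n_I + n_B)        # generic SPD test matrix
A_II, A_IB, A_BB = A[:n_I, :n_I], A[:n_I, n_I:], A[n_I:, n_I:]
S = A_BB - A_IB.T @ np.linalg.solve(A_II, A_IB)

ev_A = np.linalg.eigvalsh(A)    # ascending eigenvalues of A
ev_S = np.linalg.eigvalsh(S)    # ascending eigenvalues of S
kappa_A = ev_A[-1] / ev_A[0]
kappa_S = ev_S[-1] / ev_S[0]
```

In this sense the condensed interface system is never worse conditioned than the original system, which is one motivation for iterating on S rather than on A.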

. We shall now consider analytic properties of the Schur complement ma- trix S. These properties will be employed to construct ap- proximations of S which serve as preconditioners. I we obtain that S −1 ≥ 0 entrywise. for x ∈ B.1). Ωp denote a nonoverlapping decomposition of Ω and let uB denote a suﬃciently regular function deﬁned on interface B with zero values on BD . where wB = E uB denotes the piecewise L-harmonic extension of uB on B: LwB = 0.122 3 Schur Complement and Iterative Substructuring Algorithms Proof. trace theorems.1) and its discretization. Let Lu ≡ −∇ · (a∇u) + c u denote the elliptic operator underlying (3.27). it will hold that S = (ABB − ABI A−1 II A IB ) has the form r I − GBB for GBB = (NBB + ABI A−1 II AIB ) ≥ 0 entrywise. since AII = r I − NII for NII ≥ 0 entrywise and since the minors of AII will be positive. . . Heuristically by analogy with (3.26). uB )L2 (B) = A (EuB . submatrix AII will also be an M -matrix. Remark 3. Using the continuous analog of (3. it will be of the form A = r I − N where N ≥ 0 entrywise. - −1 " # −1 0 S = 0I A . Let Ω1 . MA17]. Since A BB = r I − N BB where NBB ≥ 0 entrywise. BR15. We begin by identifying a Steklov-Poincar´e operator S whose discrete analog yields matrix S. . We shall employ the notation: . We next describe bounds for the eigenvalues of S in terms of the mesh size h and coeﬃcients in the elliptic equation. A−1 II ≥ 0 entrywise. we heuristically deﬁne the action of a Steklov-Poincar´e operator S on a function uB deﬁned on interface B as follows: SuB (x) ≡ LwB (x). because ABI ≤ 0. EuB ). DR10. S will be an M -matrix [BE17]. the energy associated with the Steklov- Poincar´e operator S will satisfy: (SuB . Thus. Furthermore. on ∂Ωi . As a result. Thus.) is deﬁned by (3. inherited from the underlying elliptic partial diﬀerential equation (3. See [CR. NA].14. BR12. (3. Such estimates employ properties of elliptic equation (3. . .1). wB = uB . it will hold that (ABI A−1II AIB ) ≥ 0 entrywise. 
discrete extension theorems and also inverse inequalities for ﬁnite element spaces. in Ωi for 1 ≤ i ≤ p. see [DR2. First note that since A is an M -matrix. Since: .2).28) where A(. DR14. AIB ≤ 0 and A−1 II ≥ 0 entrywise. fractional Sobolev norms.

but possibly dependent on the subdomain diameter h0 .29) for 1 ≤ i ≤ p where C does not depend on h. Deﬁne σm = min {cm .30) with its energy equivalent to the Schur complement energy.33) .28): uTB SuB = A(uh . Let the following inverse inequality hold for all vh ∈ Vh vh 1/2.Ωi ≡ |∇u|2 dx ⎪ ⎪ Ωi ⎪ ⎪ ⎨ u21.∂Ωi ≤ A(uh .31) 2. v) = 0. Lemma 3. as in (3. 1.2 Schur Complement System 123 ⎧ 2 ⎪ ⎪ |u|1. 3. i=1 i=1 (3. σm and σM .∂Ωi . such that: ! ! p p c σm uh 0. (3. uTB where uI satisﬁes uI ≡ EuB = −A−1 II AIB uB . (3. 2 i=1 i=1 (3. There exists c > 0 and C > 0 independent of h. Suppose the following assumptions hold with BD = ∂Ω.∂Ωi |x−y|d dx dy. uh ) ≤ C σM uh 21/2. Let the coeﬃcients a(x) and c(x) satisfy: 0 < am ≤ a(x) ≤ aM 0 < cm ≤ c(x) ≤ cM .32) 3. 3. The ﬁnite element function uh will be piecewise discrete L-harmonic: A(uh . such that: ! ! p p c σm uh 21/2.∂Ωi ≤ Ch−1/2 vh 0. aM }. but possibly dependent on the subdomain diameter h0 . ∀v ∈ Vh ∩ H01 (Ωi ). The following result will not be optimal with respect to coeﬃcient variation or the diameter h0 of the subdomains. uh ) ≤ C σM 2 uh 0.∂Ωi h−1. 1.∂Ωi ≡ ∂Ωi ∂Ωi |u(x)−u(y)| |x−y|d dx dy + ∂Ωi |u|2 dx. uh ). am } and σM = max {cM . Ωi ⊂ IRd ⎪ ⎪ ⎪ ⎩ u2 2 1/2. (3.∂Ωi ≤ A(uh . 1 ≤ i ≤ p. σm and σM . Then the following results will hold. There exists c > 0 and C > 0 independent of h. 2.Ωi ≡ |∇u|2 dx + Ωi |u|2 dx Ωi 2 ⎪ ≡ ∂Ωi ∂Ωi |u(x)−u(y)| 2 ⎪ ⎪ |u|1/2. Let uh denote a ﬁnite element function corresponding to a nodal vector T u = uTI .∂Ωi .15.

up to a scaling factor. vB .∂Ωi ≤ C h−1 uh 20.. uh ) associated with uh .33). - vI AII AIB EuB vI 0 = .124 3 Schur Complement and Iterative Substructuring Algorithms T Proof.∂Ωi ≤ uh 21/2. uh ) = uTB SuB . Combining the trivial bound c uh 20.Ωi . we obtain A(uh . 3.∂Ωi . we employ a discrete extension theorem (see Chap. ∀vh ∈ Vh ∩ H01 (Ωi ). -T . uh ) ≤ σM uh 21.33) yields: . vh ) = 0.∂Ωi will be equivalent.∂Ωi . Substituting the above upper and lower bounds into (3. we may equivalently express the preceding as: A(uh .Ωi .32).16. then known properties of the mass matrix [ST14.29) yields: c uh 20. for 1 ≤ i ≤ p. where c > 0 is independent of h and the coeﬃcients.26) yields: .Ω . for C > 0 independent of h and the coeﬃcients. Applying the inner product of vTB . To obtain an upper bound. uTB . CI2] imply that uh 20. uh ) ≤ σM uh 21.∂Ωi ≤ uh 21/2. This veriﬁes that uh is discrete L-harmonic on each Ωi . since vh will be zero on B.34) i=1 i=1 Application of the trace theorem on each Ωi yields the lower bound: c uh 21/2. Substituting this in (3. 0)T . vB ATIB ABB uB vB SuB If vB = 0. . 1 ≤ i ≤ p. for 1 ≤ i ≤ p. (EvB )T with (3. If uh is a ﬁnite element function corresponding to the nodal T vector u = uTI . We then decompose the Sobolev norm based on the subdomains to obtain: p p σm uh 21. we employ the equivalence between the energy norm and the Sobolev norm: σm uh 21. -T . but may depend on h0 . If vh denotes the ﬁnite element function corresponding to the nodal vector v = (vTI . Remark 3. (3.Ωi ≤ A(uh . To derive bounds for the energy A(uh . to the Euclidean norm of the nodal vector u restricted to ∂Ωi .34) yields (3.∂Ωi ≤ uh 21. then the right hand side will be zero.Ω ≤ A (uh .Ωi ≤ C uh 21/2.∂Ωi with inverse in- equality (3. ∀vI . Combining the preceding bound with (3.32) yields (3. By choosing vB = uB and vI = EuB .9) and a prior estimates for discrete harmonic functions to obtain: uh 21. but possibly dependent on h0 .
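The identity uB^T S uB = A(uh, uh) for the discrete harmonic extension, which underlies the preceding proof, can be verified directly in matrix form. The sketch below is hypothetical code (sizes and seed are arbitrary): it sets uI = −AII^{-1} AIB uB, checks that the Schur energy equals the full energy of the extended vector, and that this energy is minimal over all choices of interior values.

```python
import numpy as np

rng = np.random.default_rng(4)
nI, nB = 10, 4
G = rng.standard_normal((nI + nB, nI + nB))
A = G @ G.T + (nI + nB) * np.eye(nI + nB)     # SPD stand-in for a stiffness matrix
AII, AIB, ABB = A[:nI, :nI], A[:nI, nI:], A[nI:, nI:]
S = ABB - AIB.T @ np.linalg.solve(AII, AIB)

uB = rng.standard_normal(nB)
uI = -np.linalg.solve(AII, AIB @ uB)          # "discrete harmonic" extension E uB
u = np.concatenate([uI, uB])

# The Schur energy equals the full energy of the harmonic extension ...
assert np.isclose(uB @ S @ uB, u @ A @ u)
# ... and the harmonic extension minimizes the energy over interior values.
v = np.concatenate([rng.standard_normal(nI), uB])
assert uB @ S @ uB <= v @ A @ v + 1e-10
print("u_B^T S u_B equals the A-energy of the discrete harmonic extension")
```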

Here.3 FFT Based Direct Solvers For a discretization of a separable elliptic equation. it may be possible to construct fast Fourier transform (FFT) based direct solvers for the stiﬀness matrix A and the Schur complement matrix S. when uh is constant locally). RE. for 1 ≤ i ≤ p. this transformed system will be block diagonal. for any ﬁnite element function uh ∈ Vh satisfying (3. uh ) can become zero even when uh (x) = 0 (for instance. and it can be solved in parallel using band solvers. Remark 3. where h0 denotes the subdomain diameter [BR24]. These bounds compare favorably with the condition number bound of C (σM /σm ) h−2 for κ2 (A). then the following equivalence will hold. the stiﬀness matrix A must have a block matrix structure in which each block is simultaneously diagonalized by a discrete Fourier transform matrix Q. Remark 3. CH14. (3.∂Ωi ≤ A(uh . into a block matrix with diagonal submatrices. see Chap.3 FFT Based Direct Solvers 125 vTB SvB c σm ≤ ≤ C σM h−1 .17. the stiﬀness matrix A can be transformed. vTB vB Thus. . When this property holds. 00 00 Discrete approximations of the fractional Sobolev norm vh 2 1/2 will be H00 (B (i) ) considered later in this chapter for ﬁnite element functions. the condition number κ2 (S) will grow as C (σM /σm ) h−1 with decreas- ing mesh size h. 3. 3. uh ) ≤ C ρi |uh |21/2. After appropriately reorder- ing the unknowns.30): ! ! p p c ρi |uh |21/2. for ﬁxed subdomains.18. see [BJ9. VA4]. A reﬁnement of this estimate yields: κ2 (S) ≤ C (σM /σm ) h−1 0 h −1 .35) i=1 i=1 with 0 < c < C independent of h and a(x). seminorms replace the norms since some of the local Dirichlet energies AΩi (uh . 3. x ∈ Ωi . CH13. then the following norm equivalence can be employed [LI4]: c v2H 1/2 (B (i) ) ≤ v2H 1/2 (∂Ωi ) ≤ C v2H 1/2 (B (i) ) . using an orthogonal similarity transformation. When c(x) = 0 in (3.∂Ωi .1) and a(x) is piecewise constant: a(x) = ρi . For such solvers to be appli- cable. with band matrices along its diagonal.9. 
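The h-dependence noted above, κ2(S) = O(h^{-1}) versus κ2(A) = O(h^{-2}), can be observed on a model problem. The following is an illustrative sketch, not from the text (the rectangle, mesh sizes and two-strip splitting are arbitrary choices): it assembles the five point Laplacian, eliminates the interior unknowns of two strip subdomains, and compares the growth of the two condition numbers as h is halved.

```python
import numpy as np

def conds(k):
    # 5-point Laplacian on a rectangle split into two strips by the mesh
    # line i = k; unknowns ordered by vertical mesh lines, (k-1) nodes each.
    l = 2 * k
    T = lambda m: 2*np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    A = np.kron(T(l-1), np.eye(k-1)) + np.kron(np.eye(l-1), T(k-1))
    idxB = np.arange((k-1)*(k-1), k*(k-1))      # unknowns on the interface line
    idxI = np.setdiff1d(np.arange((l-1)*(k-1)), idxB)
    S = A[np.ix_(idxB, idxB)] - A[np.ix_(idxB, idxI)] @ np.linalg.solve(
            A[np.ix_(idxI, idxI)], A[np.ix_(idxI, idxB)])
    return np.linalg.cond(A), np.linalg.cond(S)

cA1, cS1 = conds(8)
cA2, cS2 = conds(16)        # mesh width halved
print("kappa2(A) grew by", round(cA2 / cA1, 2))   # close to 4, i.e. O(h^-2)
print("kappa2(S) grew by", round(cS2 / cS1, 2))   # close to 2, i.e. O(h^-1)
```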
If v = 0 on BD and ∂Ωi ∩ BD ≠ ∅, then the following norm equivalence can be employed [LI4]:

c ||v||^2_{H^{1/2}_{00}(B^(i))} ≤ ||v||^2_{H^{1/2}(∂Ωi)} ≤ C ||v||^2_{H^{1/2}_{00}(B^(i))}.

Fig. 3.2. Strip decomposition with four subdomains (strip subdomains Ω1, Ω2, Ω3, Ω4 separated by the interfaces E(1), E(2), E(3), together with a triangulation of the domain with mesh widths hx1 and hx2)

In this section, we outline the construction of such fast direct solvers for matrix A and its Schur complement S. In the special case of a two subdomain rectangular decomposition, with a uniform grid and constant coefficients, this will yield an explicit eigendecomposition of the Schur complement S. The FFT based algorithm to solve Au = f is summarized in Alg. 3.1, and the algorithm to solve S uB = f̃B is summarized in Alg. 3.2. We shall consider the following separable elliptic equation posed on a two dimensional rectangular domain Ω = (0, Lx1) × (0, Lx2), for x = (x1, x2):

.

−(∂/∂x1)(a1(x) ∂u/∂x1) − (∂/∂x2)(a2(x) ∂u/∂x2) = f(x),  for x ∈ Ω,
u = 0,  for x ∈ ∂Ω.    (3.36)

We consider a nonoverlapping decomposition Ω1, …, Ωp of Ω consisting of the strip subdomains Ωi ≡ (Li−1, Li) × (0, Lx2) for 1 ≤ i ≤ p, where L0 ≡ 0 < L1 < ··· < Lp ≡ Lx1. The coefficients a1(x) and a2(x) in the elliptic equation will be assumed to be constant within each subdomain Ωi: a1(x) = a1^(i) and a2(x) = a2^(i) for x ∈ Ωi, for 1 ≤ i ≤ p. Triangulate Ω using a uniform grid with (l − 1) × (k − 1) interior grid points having mesh spacings hx1 ≡ (Lx1/l) and hx2 ≡ (Lx2/k), as in Fig. 3.2. The grid points (i hx1, j hx2) for indices 1 ≤ i ≤ (l − 1) and 1 ≤ j ≤ (k − 1) will lie in the interior, and the nodal values of a finite element function uh at these grid points will be denoted ui,j = uh(i hx1, j hx2). The subdomain boundary segments E(r) ≡ ∂Ωr ∩ ∂Ωr+1 for 1 ≤ r ≤ (p − 1) will be assumed to align with the triangulation, so that each Lr is an integer multiple of hx1, for 0 ≤ r ≤ p. For this choice of coefficients and triangulation, the stiffness matrix A resulting from the finite element discretization of (3.36) will have the following stencil at a gridpoint (i hx1, j hx2). We formally denote it as:

⎥+ ⎢ . deﬁne subvectors ui ≡ (ui.j ) if i = Lr ..j = a1 hx (ui. ul−1 ) .j+1 ) ⎪ ⎨ 2 (r) hx2 (Au)i.j − ui−1. ⎥ ⎢ . ⎪ ⎪ 1 ⎪ ⎪ (r) hx1 ⎪ ⎪ + a2 hx (2ui. ⎥ (i) ⎢ a2 h x 1 ⎢ ⎥ (i) ⎢ 2a1 hx2 ⎢ ⎥ . ⎦ −1 2 1 while 1 . ⎥ ⎣ −1 2 −1 ⎦ ⎣ .. 2 To represent (3. hx 2 ⎢ ⎥ hx 1 ⎢ ⎥ ⎢ ⎥ ⎢ .j − ui.37) as a linear system.1 . ui.. ⎥ ⎢ .j ) x 2 ⎪ ⎪ ⎪ (a2 +a2 )hx1 ⎩ (r) (r+1) + 2hx (2ui. the linear system Au = f representing (3. ⎥ ⎢. .. ⎥ . . ⎥ ⎢ −1 2 −1 ⎥ ⎢ . (3. ..j−1 − ui.j−1 − ui. .. . ⎥. ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ −β (l−3) T (l−2) −β (l−2) ⎦ ⎣ ul−2 ⎦ ⎣ f l−2 ⎦ −β (l−2) T (l−1) ul−1 f l−1 where T (r) and β (r) are submatrices of size (k − 1) deﬁned for Li−1 < r < Li : ⎡ ⎤ ⎡ ⎤ 2 −1 1 ⎢ ⎥ ⎢ ..k−1 )T T for 1 ≤ i ≤ l − 1 and employ them to deﬁne a nodal vector u ≡ (u1 .j − ui..j ) if i = Lr (3.38) ⎢ ⎥⎢. ⎥ ⎢ . 3.j − ui−1. · · · .37) ⎪ ⎪ 1 ⎪ ⎪ (r+1) h ⎪ + a1 ⎪ hx1 (ui..37) is: ⎡ (1) ⎤⎡ ⎤ ⎡ ⎤ T −β (1) u1 f1 ⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎢ −β (1) T (2) −β (2) ⎥ ⎢ u2 ⎥ ⎢ f 2 ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎢ . For this ordering of nodes. ⎥ ⎢ ⎥ = ⎢ ⎥..3 FFT Based Direct Solvers 127 ⎧ (r) h ⎪ ⎪ a1 hxx2 (2ui.j − ui+1. . ⎥ T ≡ (r) ⎢ .j − ui+1.. . · · · . .j+1 ) .

The submatrices β(r) are multiples of the identity:

β(r) ≡ a1^(i) (hx2/hx1) I,  if Li−1 < r < Li,
β(r) ≡ (1/2) (a1^(i+1) + a1^(i)) (hx2/hx1) I,  if r = Li,    (3.39)

where I denotes an identity matrix of size (k − 1). Each matrix T(r) is symmetric, tridiagonal and Toeplitz (it is constant along each diagonal) of size (k − 1), with T(r) ≡ (1/2)(T^(Li−1) + T^(Li+1)) for r = Li. An important property of matrix A in (3.38) is that its submatrices T(r) and β(r) are diagonalized by a discrete sine transform matrix Q, as defined next.

Definition 3.19. Given an integer k > 2, we define the entries of a discrete sine transform matrix Q of size (k − 1) as follows:

Qij ≡ sqrt(2/k) sin(i j π / k),  for 1 ≤ i, j ≤ (k − 1).    (3.40)

For 1 ≤ j ≤ (k − 1), we let qj denote the j'th column of matrix Q:

qj = sqrt(2/k) (sin(jπ/k), sin(2jπ/k), …, sin((k − 1)jπ/k))^T.

By construction, matrix Q is symmetric. Using trigonometric identities, it can be verified that Q^T Q = I, so that Q is an orthogonal matrix. Routines for fast multiplication of a vector by Q are available in most FFT packages, with complexity proportional to O(k log(k)), see [VA4]. To verify that each block of A is diagonalized by Q, we apply matrix T(r) to the j'th column vector qj of Q. By direct substitution and the use of trigonometric identities, it is easily verified that qj is an eigenvector of matrix T(r), T(r) qj = λj^(r) qj, corresponding to the eigenvalue λj^(r) given by:

λj^(r) ≡ 2 a2^(i) (hx1/hx2) (1 − cos(jπ/k)) + 2 a1^(i) (hx2/hx1),  if Li−1 < r < Li,

.

(r) (r) where Λ(r) = diag(λ1 . ⎪ ⎪ (i) (i+1) hx1 jπ (i) (i+1) ⎩ a2 +a2 1 − cos k + a1 +a1 hx2 . λk−1 ). hx2 hx1 (3. where each Dij is a diagonal matrix of size m. Let Q be an orthogonal matrix of size m which simultaneously diagonalizes all the block submatrices of C: QT Cij Q = Dij . Suppose the following assumptions hold. Lemma 3. Since the matrices β (r) are scalar multiples of the identity. The following algebraic result shows how any block partitioned system Cw = g can be reduced to a block diagonal linear system provided all blocks of matrix C can be simultaneously diagonalized by an orthogonal matrix Q. if r = Li .20. 2. 1. . j ≤ n. T (r) has the eigendecomposition: T (r) = QΛ(r) QT .41) Thus. for 1 ≤ i. · · · . j ≤ n. Let C be an invertible matrix of size m n having an n×n block structure in which the individual blocks Cij are submatrices of size m for 1 ≤ i. they are also trivially diagonalized by Q.

⎥. Thus. ⎥ ⎢ . . (3. 1 ≤ k ≤ n.42) Cn1 · · · Cnn wn gn T where g = gT1 . ⎥⎢ . ⎦ = ⎣ .. .3 FFT Based Direct Solvers 129 T 3. ⎥ = ⎢ . For 1 ≤ i ≤ m subvector µi of size n is deﬁned by: (µi )k = QT gk i . ⎦ 0 Gmm αm µm where Gii . ⎦ ⎣ . 1 ≤ l. ⎥ = ⎢ ..42) can be obtained by solving the following block diagonal linear system: ⎡ ⎤⎡ ⎤ ⎡ ⎤ G11 0 α1 µ1 ⎢ ⎥⎢ . By construction Q will also be an orthogonal matrix. . ⎦⎣ ..44) ⎣ ⎦⎣ . each block submatrix Dij = QT Cij Q of D will be a diagonal matrix of size m. 1 ≤ k ≤ n. gTn with gi ∈ IRm . . . ⎥ ⎢ . For 1 ≤ i ≤ m subvector αi of size n is deﬁned by: (αi )k = QT wk i . wTn with wi ∈ IRm denote the solution to the block partitioned linear system: ⎡ ⎤⎡ ⎤ ⎡ ⎤ C11 · · · C1n w1 g1 ⎢ . . Deﬁne a block diagonal matrix Q ≡ blockdiag (Q. for 1 ≤ i ≤ m. . αi and µi are deﬁned by: 1. Then.. By construction. (3. ⎥ ⎢ . ⎦ ⎣ .43) ⎣ . (3. ⎥. using the given orthogonal matrix Q of size m. For 1 ≤ i ≤ m matrix Gii is of size n with entries deﬁned by: (Gii )lk ≡ (Dlk )ii . ⎥ ⎢ . . ⎦. the transformed linear system becomes Dw ˜ =g ˜ . ⎥ ⎢ . . . Let w = wT1 . 3. . . 2. . ⎦ Q Cn1 Q · · · Q Cnn Q T T T Q wn T Q gn Deﬁne D ≡ QT CQ and let w ˜ ≡ QT w and g ˜ ≡ QT g denote the trans- formed vectors. components of w ˜ will be coupled within the transformed linear system Dw ˜ = g ˜ only when its indices diﬀer by an integer multiple of m.. k ≤ n. . Apply Q to transform the linear system Cw = g into QT CQ QT w = QT g : ⎡ T ⎤⎡ T ⎤ ⎡ T ⎤ Q C11 Q · · · QT C1n Q Q w1 Q g1 ⎢ ⎥⎢ . ⎥ .. Then. . ⎥ ⎢ . . As a consequence. Q) having n di- agonal blocks. 3. ⎦ ⎣ . the solution to system (3. a suitable reordering of the indices within the transformed system should yield a block diagonal linear system. Proof. ⎥ ⎢ . for 1 ≤ i ≤ m. . for 1 ≤ i ≤ m. ⎢ ⎥ ⎢ ⎥ ⎣ . ⎥ ⎢ .
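The reduction in the preceding lemma can be exercised directly. The sketch below is hypothetical code (the orthogonal matrix Q, the diagonal blocks Dij and the sizes are arbitrary choices): it constructs blocks Cij = Q Dij Q^T from a single orthogonal matrix Q, then solves C w = g by transforming g, solving one small n × n system per frequency, and back-transforming.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 4

# One orthogonal matrix Q of size m (from a QR factorization).
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))

# Blocks C_ij = Q D_ij Q^T, kept symmetric and safely nonsingular.
D = rng.standard_normal((n, n, m))
D = D + D.transpose(1, 0, 2)                   # D_ij = D_ji, so C is symmetric
for i in range(n):
    D[i, i] += 10.0                            # diagonal dominance per frequency
C = np.block([[Q @ np.diag(D[i, j]) @ Q.T for j in range(n)] for i in range(n)])

g = rng.standard_normal(m * n)

# Transform each block of g, regroup per frequency t, solve the n x n system
# G_tt alpha = mu_t with (G_tt)_{lk} = (D_lk)_tt, and back-transform.
gt = (Q.T @ g.reshape(n, m).T).T               # row k holds Q^T g_k
w = np.zeros((n, m))
for t in range(m):
    alpha = np.linalg.solve(D[:, :, t], gt[:, t])
    w += np.outer(alpha, Q[:, t])              # w_k = sum_t alpha_{k,t} q_t

assert np.allclose(C @ w.reshape(-1), g)
print("block-diagonalized solve matches the original system")
```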

j ) = Q (c1. Once all the unknowns cij have been determined by parallel solution of the tridiagonal linear systems. . By construction. for 1 ≤ j ≤ (l − 1). . we partition the index set {1. We reorder the components of w ˜ and deﬁne α ≡ P T QT w. · · · .38) using Lemma 3. each containing n entries ordered in ascending order. We deﬁne cj = QT uj and ˜f j = QT f j for j = 1. . . nm}. . ck−1. each Gii in (3. Once the subproblems Gii αi = µi in (3.21. Let P T denote a permutation matrix whose action on a vector reorders its entries according to the above ordering. A fast direct solver can be constructed for solving (3.j . Furthermore. reordering the rows and columns of matrix a reordering of g D should yield G = blockdiag(G11 . . Remark 3. Gmm ) = P T DP to bea block diagonal matrix. . ck−1. In this case. n = (l − 1) with m = (k−1).130 3 Schur Complement and Iterative Substructuring Algorithms Accordingly. · · · .43).j ) . then each submatrix Gii will be a tridiagonal matrix. we deﬁne µ ≡ P T QT g as ˜ .j ) . . Similarly. The original unknowns wk will satisfy wk = Qyk for 1 ≤ k ≤ n. The resulting partition will be: {1. FFT Based Solution of Au = f . 1 ≤ k ≤ n. There will be m subsets in this partition. . · · · . system QT AQ QT u = QT f will also be block tridiagonal.j . It can be easily veriﬁed that the block submatrices Gii in the preceding will inherit the “block sparsity pattern” of C. each nonzero block in QT AQ will satisfy: QT T (r) Q = Λ(r) . Let Q denote the discrete sine transform matrix deﬁned by (3. . where: T cj = (c1. 2m. . . and since multiplication by Q has O (l k log(k)) complexity. . . · · · . 2. l − 1.43) have been solved in parallel. for j = 1. Since A is block tridiagonal. nm} into subsets such that two indices belong to the same subset only if they diﬀer by an integer multiple of m. l − 1. . Since a tridiagonal system can be solved in optimal order complexity. deﬁne yk for 1 ≤ k ≤ n as follows: (yk )i = (αi )k . · · · . . 
if C is block tridiagonal.43) will be a tridiagonal matrix. The reordered transformed system P T DP P T w ˜ = PT g ˜ will then correspond to the system (3. 1 + m.20 and choosing C = A. the nodal values {uij } at the grid points can be reconstructed by applying Q columnwise T T (u1. For example. uk−1. for 1 ≤ i ≤ m. n m} = {1. QT β (r) Q = β (r) . . (n − 1)m + 1} ∪ · · · ∪ {m. . . . the complexity of the FFT based solution algorithm will be O (l k log(k)). w = u and g = f .j .40). .

.l−1 (˜f l−1 )i 6. Given a ﬁnite element function uh with nodal values uij = uh (ihx1 . · · · . ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ −βi (l−3) (l−2) λi −βi (l−2) ⎥ ⎣ ci. for 1 ≤ r ≤ p I ≡ I (1) ∪ · · · ∪ I (p) E (r) ≡ {(Lr hx1 . jhx2 ) : Lr−1 < i < Lr .. ⎥ ⎢ . . ⎦. .39) 1. l − 1 in parallel do: 8. Compute using the fast sine transform ⎡ ⎤ c1. Compute the fast sine transform: f j ≡ Qf j . ⎥ ⎢ ⎥⎢ ⎥ = ⎢. Lemma 3. Endfor T Output: uT1 . .1 (FFT Based Solution of Au = f ) (r) Let λj and β (r) be deﬁned by (3. we will employ the following notation for index sets and nodal vectors associated with them. Endfor 4.1 (f 1 )i ⎢ ⎥ ⎢ −β (1) λ(2) −β (2) ⎥ ⎢ ci. ˜ 3. FFT based solution of SuB = ˜f B .j ⎢ ⎥ uj ≡ Q ⎣ ..2 ⎥ ⎢ (˜f 2 )i ⎥ ⎢ i i i ⎥⎢ ⎥ ⎢ ⎢ ⎥ ⎥ ⎢ ⎥⎢. jhx2 ) for 1 ≤ i ≤ (l − 1) and 1 ≤ j ≤ (k − 1).. ⎥ ⎢ ⎥ ⎢ . ⎥ ⎢ . ⎥ ⎢ ⎥ ⎢ .41) and (3. k − 1 in parallel do: 5. provided the block submatrices of S are simultaneously diagonalized by an orthogonal matrix.3. ck−1. jhx2 ) : 1 ≤ j ≤ (k − 1)}. ⎥ ⎢ ⎥⎢ . uTl−1 . . 3. For j = 1. · · · . Solve the tridiagonal system using Cholesky factorization: ⎡ (1) ⎤ λi −βi (1) ⎡ ⎤ ⎡ ˜ ⎤ ci. ⎥⎢.3 FFT Based Direct Solvers 131 Algorithm 3. ⎢ . . . Accordingly.20 can also be applied to construct a direct solver for the Schur complement system. Endfor 7. . . ⎥ ⎢ . ⎥ ⎢ ⎥ ⎢ .. . . .. 1 ≤ j ≤ (k − 1)} . in the following we study the block structure of the Schur complement matrix S. For j = 1.. l − 1 in parallel do: 2.. . · · · .. ⎥⎢. . for 1 ≤ r ≤ (p − 1) B ≡ E (1) ∪ · · · ∪ E (p−1) . I (r) ≡ {(ihx1 .j 9. ⎥.l−2 ⎦ ⎣ (˜ f l−2 )i ⎦ ⎣ ⎦ −βi (l−2) (l−1) λi ci.. ⎥ ⎢ . ⎥⎢. For i = 1.

ui. we have used E (r) to denote interface E (r) = ∂Ωr ∩ ∂Ωr+1 as well as the set of indices of nodes on it. The following additional nodal subvectors will be associated with each of the preceding index sets: ⎧ .k−1 ) for 1 ≤ i ≤ (l − 1). · · · . We will employ nodal subvectors T ui ≡ (ui.1 .132 3 Schur Complement and Iterative Substructuring Algorithms For convenience.

· · · . T . for 1 ≤ r ≤ p ⎪ I ⎪ u u Lr−1 +1 u Lr −1 ⎪ ⎪ . T ⎪ ⎪ (r) ≡ T .

· · · . for 1 ≤ r ≤ (p − 1) ⎪ ⎪ ⎪ ⎪ . u(p) T T I I ⎪ ⎪ (r) uE ≡ uLr . ⎪ ⎪ T ⎨ uI ≡ u(1) .

(3. (r) ⎣ .. Matrix AII takes the form: ⎡ (r) ⎤ M −γ (r) ⎡ (1) ⎤ ⎢ ⎥ AII 0 ⎢ −γ (r) M (r) −γ (r) ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ AII = ⎢ . (3.47) ⎢ . ⎥ M ≡(r) ⎢ . ⎥. where dr ≡ (Lr − Lr−1 − 1).. The submatrix γ (r) ≡ a1 hxx2 I is 1 of size (k − 1). .. .. ⎥... AIB and ABB based on the preceding index sets. . A = ⎢ . ⎥. . ⎪ ⎪ T ⎩ u ≡ u(1) . . ⎥ ⎣ −1 2 −1 ⎦ ⎣ . . ⎥ (3.. ⎦ −1 2 1 Matrix AIB will be block bidiagonal with p × (p − 1) blocks Xij = AI (i) E (j) : ⎡ ⎤ X11 0 ⎢X X ⎥ ⎢ 21 22 ⎥ ⎢ ⎥ ⎢ X32 X33 ⎥ ⎢ ⎥ AIB = ⎢ . . while M (r) of size (k − 1) satisﬁes: ⎡ ⎤ ⎡ ⎤ 2 −1 1 ⎢ ⎥ ⎢ . ⎥ ⎢ −1 2 −1 ⎥ ⎢ . . ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ X(p−1)(p−2) X(p−1)(p−1) ⎦ 0 Xp(p−1) .. ⎦ ..45) II ⎢ ⎥ ⎢ (r) ⎥ 0 (p) AII ⎣ −γ (r) M (r) −γ ⎦ −γ (r) M (r) (r) Here AII is a block tridiagonal and block Toeplitz matrix with dr × dr blocks (r) h of size (k − 1)..46) hx2 ⎢ ⎥ hx1 ⎢ ⎥ ⎢ ⎥ ⎢ . ⎥+ 1 ⎢ . u(p−1) T T .. ⎥ x2 ⎢ . . · · · . ⎥ (r) ⎢ a2 h x 1 ⎢ ⎥ 2a(r) h ⎢ ⎥ . . B E E The stiﬀness matrix A will be block partitioned into the submatrices AII ..

⎥ ⎣ 0 ⎦ ⎣ .. Matrix ABB will be a (p−1)×(p−1) block diagonal matrix whose individual blocks are each of size (k − 1) ⎡ (1) ⎤ AEE 0 ⎢ ⎥ . ⎥ and Es(s−1) = AI (s) E (s−1) = ⎢ . ⎥ ⎢ 0 ⎥ ⎢ ⎥ ⎢ ⎥ Xrr = AI (r) E (r) = ⎢ . ⎦ −γ (r) 0 (3.3 FFT Based Direct Solvers 133 where for 2 ≤ r ≤ p and 1 ≤ s ≤ (p − 1) its block submatrices are deﬁned by: ⎡ ⎤ ⎡ ⎤ 0 −γ (s) ⎢ . 3.48) with blocks of size (k −1)..

49) ⎣ . ⎥ where A(r) ≡ 1 M (r) + M (r+1) . (3. ABB = ⎢ . ⎦ EE 2 (p−1) 0 AEE Each submatrix M (r) is diagonalized by the sine ..

it is trivially diagonalized (r) h by Q with eigenvalues γ (r) j = a1 hxx2 for 1 ≤ j ≤ (k − 1). then since matrices ABB . respectively. .22. where: (r) (r) (r) a hx1 jπ 2a hx2 λj =2 2 1 − cos( ) + 1 . . (3. . . λk−1 . ⎥ ⎢ ⎥ ⎣ SET (p−3) E (p−2) S S E (p−2) E (p−2) E (p−2) E (p−1) ⎦ T 0 SE (p−2) E (p−1) SE (p−1) E (p−1) (3.50) hx2 k hx1 Since matrix γ (r) is a scalar multiple of the identity.45). . transform matrix Q deﬁned (r) (r) T (r) (r) (r) earlier. .47) and (3. for 1 ≤ j ≤ (k − 1). (3. .49). . E (p−1) the following will hold. . Given the ordering of nodes on B based on the index sets E (1) .47) and (3. Lemma 3. The resulting structure is summarized in the following... The Schur complement matrix S will be block tridiagonal of the form: ⎡ ⎤ SE (1) E (1) SE (1) E (2) 0 ⎢ S T (1) (2) SE (2) E (2) SE (2) E (3) ⎥ ⎢ E E ⎥ ⎢ .45) for ABB . (3. AIB and AII . 1 We next consider the block structure of matrix S given the ordering of nodes on B. 1. in S = (ABB − ATIB A−1 II AIB ).49). it will follow that matrix S must be block tridiagonal. respectively. .. rectangular block bidiagonal and block diagonal. with M = QΛ Q and Λ = diag λ1 . If we substitute the block partitioned matrices (3.51) with block submatrices SE (i) E (j) of size (k − 1). . . ⎥ S=⎢ . AIB and AII are block diagonal. . Explicit expressions for the block submatrices SE (r) E (r) and SE (r+1) E (r) can be obtained by directly computing the block entries of S = (ABB − ATIB A−1 II AIB ) using (3.

the scalars α1 . ⎦. ⎥ ⎢ . j ≤ n. . . . ⎥ ⎢ . 2. ⎦ ⎣ . where each Dij is a diagonal matrix. ⎦ ⎣ . Lemma 3. . . .134 3 Schur Complement and Iterative Substructuring Algorithms 2. . . ..23. ⎥ = ⎢ . Then. . For 1 ≤ r ≤ (p − 2) the block submatrices SE (r+1) E (r) will satisfy: SE (r+1) E (r) = −ATI(r+1) E (r+1) A−1 A (r+1) E (r) .. Cn1 · · · Cnn wn gn where wi ∈ IRm and gi ∈ IRm for 1 ≤ i ≤ n. wTn denote the solution to the block partitioned system: ⎡ ⎤⎡ ⎤ ⎡ ⎤ C11 · · · C1n w1 g1 ⎢ . ⎦ (Dn1 )tt · · · (Dnn )tt αn δn . T 3. Let wT1 . . . for scalars δi ∈ IR where qt ≡ (q1t . For 1 ≤ r ≤ (p − 1) the block submatrices SEr Er will satisfy: SE (r) E (r) = AE (r) E (r) − ATI(r) E (r) A−1 A (r) (r) I (r) I (r) I E −ATI(r+1) E (r) A−1 A (r+1) E (r) . I (r+1) I (r+1) I Proof.. . Let Q be an orthogonal matrix of size m which simultaneously diagonalizes all the block submatrices of C: QT Cij Q = Dij . I (r+1) I (r+1) I 3. 1. As outlined earlier. ⎥ = ⎢ . each wi = αi qt will be a scalar multiple of qt for some αi ∈ IR. j ≤ n. Suppose the following assumptions hold. Furthermore. AIB and ABB can be partitioned into blocks of size (k − 1). and to obtain analytical expressions for the eigenvalues of its blocks. each of which are diagonalizable by the discrete sine transform matrix Q.. ⎣ ⎦⎣ . ⎥ ⎢ . for 1 ≤ i. . Since AII . . . ⎦ ⎣ . ⎥. ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ . qmt ) denotes the t’th column of Q. T 4. Let gi = δi qt . αn will solve the following linear system: ⎡ ⎤⎡ ⎤ ⎡ ⎤ (D11 )tt · · · (D1n )tt α1 δ1 ⎢ ⎥⎢ . Let C denote a positive deﬁnite symmetric matrix of size m n partitioned into n × n blocks Cij of size m for 1 ≤ i. The following two results will be employed to show this. ⎥ ⎣ . ⎥ ⎢ . .. the block submatrices of S = (ABB − ATIB A−1 II AIB ) will also be diagonalizable by matrix Q.

. elimination of the common factors qt yields the linear system: ⎧ ⎪ ⎪ (D11 )tt α1 + · · · + (D1n )tt αn = δ1 ⎨ . 0) . ⎥ ⎢ . ⎥ ⎢ . ⎥ ⎣ . Consider the following Toeplitz tridiagonal linear system: ⎡ ⎤ ⎡ ⎤ ⎡˜ ⎤ α1 µ b a˜ 0 ⎢ ⎥ ⎢ 1⎥ ⎢ ˜ ⎥ ⎢ . ⎥⎢ . ⎥ ⎢ . . (3. . ⎥ ⎢ c˜ b a ˜ ⎥⎢ .53) 2a˜ 2a˜ Then.. . ⎥ ⎢ . .. . ⎦ c˜ ˜b αd µd ˜. ⎥ = ⎢ .3 FFT Based Direct Solvers 135 Proof. ⎥ ⎢ . then: ! ρd+1 ρi1 − ρd+1 ρi2 αi = 2 1 . ⎪ ⎩ Cn1 qt α1 + · · · + Cnn qt αn = qt δn . ⎥ ⎢ . −˜b + ˜b2 − 4 a ˜ c˜ −˜b − ˜b2 − 4 a˜ c˜ ρ1 ≡ and ρ2 ≡ . since (qTt qt ) = 1. . This result can be obtained by an application of Lemma 3. . ⎥ ⎢ .24. . ⎪ ⎩ (Dn1 )tt α1 + · · · + (Dnn )tt αn = δn . ⎪ . . ⎥ ⎢ . .. . ⎦ ⎣ . (3. 0. ⎦ ⎣ . . ˜b. ... By construction. . . αn ) = 0. . . for 1 ≤ i ≤ d.. ⎦ ⎣ . . ⎢ ⎥⎢ . ρ2 ∈ IR as follows: . ⎦ αn qt Cn1 · · · Cnn αn qt αn (Dn1 )tt · · · (Dnn )tt αn When C is symmetric and positive deﬁnite. Deﬁne ρ1 . The next result describes the solution of a Toeplitz tridiagonal system. .. . .. ⎥ ⎢ . it will hold that: ⎡ ⎤T ⎡ ⎤⎡ ⎤ ⎡ ⎤T ⎡ ⎤⎡ ⎤ α1 qt C11 · · · C1n α1 qt α1 (D11 )tt · · · (D1n )tt α1 ⎢ . ⎥ ⎢ .. µd ) = (−˜ c. Alter- natively. ⎣ .. . ⎦⎣ . ⎦⎣ . 3.54) ρd+1 2 − ρd+1 1 . ⎥ ⎢ ⎥ = ⎢ ⎥. c˜ ∈ IR satisﬁes (˜b2 − 4˜ where a a c˜) > 0.. ⎥⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎢ . ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎢ ..52) ⎪ . . . Lemma 3. . ⎥. (3. the following will hold: T T 1.. ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ c˜ ˜b a˜ ⎦ ⎢ . verifying that (3. If (µ1 . both terms in the above expression T will be positive for (α1 . ⎥ ⎢ .52) is nonsingular. Since qt is an eigenvector of each matrix Cij corresponding to eigenvalue (Dij )tt . ⎦ ⎣ .20. . ⎥ ⎢ . . substitute the ansatz wi = αi qt to obtain the linear system: ⎧ ⎪ ⎪ C11 qt α1 + · · · + C1n qt αn = qt δ1 ⎨ ...

µd ) = (0.55) ρd+1 1 − ρd+1 2 Proof. . If (µ1 . . Substitute the ansatz that αi = ρi for 0 ≤ i ≤ (d + 1) into the ﬁnite diﬀerence equations. . for 1 ≤ i ≤ d. 0. . . . then: ρi1 − ρi2 αi = . (3. . −˜a) . This yields the following equations: .136 3 Schur Complement and Iterative Substructuring Algorithms T T 2. .

1. Furthermore. The next result shows that each submatrix SE (r) E (s) of the Schur comple- ment matrix S is diagonalized by the discrete sine transform Q of size (k − 1). by employing Lemma 3.56) ⎪ ⎪ k ⎪ ⎪ a1 hx 2 . a The roots of the characteristic polynomial are given by (3. for 1 ≤ i ≤ d. The general discrete solution to the ﬁnite diﬀerence equations will be of the form: αi = γ1 ρi1 + γ2 ρi2 . ⎪ ⎪ ρ1 (r. for arbitrary γ1 and γ2 . t)2 − 1. γ (r) .55). Solving for γ1 and γ2 yields (3. ρ1 (r. ρ2 (r. Then. Solving for γ1 and γ2 yields (3.25. For 1 ≤ r ≤ (p − 1) the vector qt will be an eigenvector of matrix SE (r) E (r) corresponding to eigenvalue (Drr )tt : SE (r) E (r) qt = (Drr )tt qt . the following results will hold. t) be as deﬁned below: ⎧ (r) (r) ⎪ ⎪ γ ≡ a1 (hx2 /hx1 ) ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ λt (r) (r) ≡ 2a2 (hx1 /hx2 ) 1 − cos( tπ k ) + 2γ (r) ⎪ ⎨ a h2 (r) ω(r. (r) Lemma 3. deﬁne dr = (Lr − Lr−1 − 1). t) ≡ ω(r. ˜ ρ2 + ˜b ρ + c˜ ρi−1 = 0. t). t) + ω(r. ω(r. t) ≡ ω(r. To solve the ﬁrst linear system.53) and they will be real and distinct provided (˜b2 − 4 a ˜ c˜) > 0. In addition. we impose the boundary condition α0 = 0 and αd+1 = 1. t) ≡ 2(r) x2 1 1 − cos( tπ ) +1 (3. a It can be solved simultaneously.23 and 3.24. we impose the bound- ary condition α0 = 1 and αd+1 = 0. t) − ω(r. . t) and ρ2 (r. for each i. Let λt .54). t)2 − 1 ⎪ ⎪ ⎪ ⎩ . we can obtain analytical expressions for the eigenvalues of SE (r) E (s) . provided ρ solves the characteristic equation: ˜ ρ2 + ˜b ρ + c˜ = 0. To solve the second linear system.

3.3 FFT Based Direct Solvers 137 where (Drr )tt is given by: ⎧ .

.

t)dr −ρ2 (r.t)dr ⎨ (Drr )tt = −γ (r) ρ1 (r.t)dr +1 + 1 λ (r) + λ (r+1) 2 2 t t . ⎪ ρ1 (r.t) dr +1 −ρ (r.

59) Matrix SE (r) E (r+1) will be diagonalized by the discrete sine transform Q: QT SE (r) E (r+1) Q = Dr.r+1 is a diagonal matrix of size (k − 1).t) dr+1 +1 −ρ (r+1. . .57) ⎪ ⎩ ρ1 (r+1.. ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ −γ (r) λt −γ ⎦ ⎣ 0 ⎥ (r) (r) ⎥ ⎢ ⎣0⎦ ⎣ ⎦ (r) 1 −γ (r) λt γ (r) . ⎥ ⎢ ⎥ ⎢ (r) ⎢ . . t)d(r+1) +1 − ρ2 (r + 1. . ⎥ ⎢ t ⎥ ⎢. ⎥ ⎢ ⎥ ⎢. t)d(r+1) +1 (3. ρ1 (r + 1.t) dr+1 −ρ2 (r+1. ⎥ ⎢ . ⎥ ⎢ . we shall employ the following expression for SE (r) E (r) qt : SE (r) E (r) qt = −ATI(r) E (r) A−1 A (r) (r) qt + AE (r) E (r) qt I (r) I (r) I E −ATI(r+1) E (r) A−1 A (r+1) E (r) qt . For 1 ≤ r ≤ (p − 2) the vector qt will be an eigenvector of the matrix SE (r) E (r+1) corresponding to the eigenvalue (Dr. I (r+1) I (r+1) I Each of the submatrices in the above can be block partitioned into blocks that are diagonalized by Q. ⎥ ⎢ . (3.23 it will follow that qt is an eigen- vector of each of the three matrix terms above. We will determine the eigen- value associated with each term separately.58) where Drr is a diagonal matrix of size (k − 1). where Dr. ⎥ ⎢ ⎥ ⎢ ⎥ θ1 = −γ ⎢ . . where (Dr.r+1 )tt is given by: ρ1 (r + 1. To verify that qt is an eigenvector of SE (r) E (r) . . .t)dr+1 +1 .r+1 )tt qt . By Lemma 3. t) (Dr. (3. 2. ⎥.r+1 .23 will yield the following expression for θ1 : ⎡ ⎤T ⎡ (r) ⎤−1 ⎡ ⎤ 0 λt −γ (r) 0 ⎢. t) − ρ2 (r + 1. Let θ1 denote the eigenvalue of − ATI(r) E (r) A−1 A (r) I (r) I (r) I E (r) associated with eigenvector qt : −ATI(r) E (r) A−1 A (r) (r) qt = θ1 qt .r+1 )tt : SE (r) E (r+1) qt = (Dr. I (r) I (r) I E An application of Lemma 3. 1 2 Matrix SE (r) E (r) will be diagonalized by the discrete sine transform Q: QT SE (r) E (r) Q = Drr . . Proof.r+1 )tt = −γ (r+1) . ⎥ ⎢. . ⎥ ⎢ −γ (r) λ(r) −γ (r) ⎥ ⎢.t) dr+1 −γ (r+1) ρ (r+1. ⎥ ⎢ .

t)dr +1 − ρ2 (r. t)dr − ρ2 (r.138 3 Schur Complement and Iterative Substructuring Algorithms The right hand side above can be evaluated as −γ (r) αdr in Lemma 3. t)dr +1 The eigenvalue θ2 of AE (r) E (r) corresponding to eigenvector qt was derived earlier in this subsection as: 1 . This yields: the choice a ρ1 (r. ˜b = λt and d = dr = (Lr − Lr−1 − 1). ρ1 (r.24 for (r) ˜ = c˜ = −γ (r) . t)dr θ1 = −γ (r) .

we may employ Lemma 3. using matrices Gii of size (p − 1): (Gii )r. t)dr+1 θ3 = −γ (r+1) . We summarize the algorithm next. t)dr+1 − ρ2 (r + 1. which veriﬁes (3. t)d(r+1) +1 − ρ2 (r + 1. matrix Q will diagonalize SE (r+1) E (r) = QDr+1.57).s )ii for 1 ≤ r. ρ1 (r + 1. t) (Dr+1.23 and 3. To obtain an expression for the eigenvalue (Dr+1.24. s ≤ (p − 1). It results in the expression: ρ1 (r + 1. Q diagonalizes SE (r) E (r) = QDrr QT . 2 The eigenvalue θ3 of − ATI(r+1) E (r) A−1 A (r+1) E (r) corresponding to I (r+1) I (r+1) I eigenvector qt can be determined as for θ1 using Lemma 3. t) − ρ2 (r + 1. The preceding result shows that the block submatrices of matrix S are simultaneously diagonalized by the discrete sine transform Q.57) or (3. This yields: ρ1 (r + 1.r QT .24. By construction.s = (Dr. where (Dr. (r) (r+1) θ2 = λt + λt . ρ1 (r + 1.59).23 and 3.r )tt of SE (r+1) E (r) we evaluate − ATI(r+1) E (r+1) A−1 A (r+1) E (r) qt at the eigenvector qt .20 to construct a fast direct solver for S. Thus.r )tt = −γ (r+1) .s )ii is deﬁned by (3. t)dr+1 +1 Combining the three terms yields an expression for the eigenvalue (Drr )tt of SE (r) E (r) corresponding to eigenvector qt : (Drr )tt = θ1 + θ2 + θ3 . . 1 ≤ i ≤ (k − 1). t)d(r+1) +1 By construction. us- I (r+1) I (r+1) I ing Lemma 3. t)dr+1 +1 − ρ2 (r + 1. Matrix Gii will be tridiagonal.

3. f TE (p−1) 1.3 FFT Based Direct Solvers 139 Algorithm 3. For i = 1. . .3. . k − 1 do . . .2 (FFT Based Solution of SuB = f B ) T T Let uB = uTE (1) . . . p − 1 do 4. . . p − 1 in parallel do: Compute f E (i) ≡ QT f E (i) ˜ 2. uTE (p−1) and f B = f TE (1) . . . Endfor 3. For j = 1. . . . . . . . . . For i = 1.

Remark 3.26. The loop between lines 1 and 2 requires the application of a total of $(p-1)$ fast sine transforms, as does the loop between lines 13 and 14. The loop between lines 7 and 8 requires the solution of a total of $(k-1)$ tridiagonal linear systems, each involving $(p-1)$ unknowns. As a result, the preceding algorithm will have a complexity of $O(p\,k\log(k))$.

Remark 3.27. In the case of a two strip decomposition, the Schur complement matrix $S = S_{E^{(1)}E^{(1)}}$ will be diagonalized by the discrete sine transform $Q$: $S = Q D_{11} Q^T$. Such eigendecompositions can be employed to precondition a two subdomain Schur complement matrix arising in two dimensional elliptic problems, and will be considered in the next section [BJ9, CH13, CH14, RE].

Remark 3.28. We omit additional details, but the block matrix techniques that were described can also be applied to discretizations of separable elliptic equations on three dimensional rectangular domains with strip subdomains. In this case, the nodal vectors $u_j$ correspond to nodal unknowns on planar cross sections of $\Omega$, and the stiffness matrix $A$ and the Schur complement matrix $S$ will have block tridiagonal structure. Matrix $Q$ will then be a two dimensional FFT or FST matrix, and the algebraic expressions derived in this section for eigenvalues of the Schur complement blocks will remain valid provided $\lambda_j^{(r)}$ and $\gamma^{(r)}$ correspond to eigenvalues of the associated block matrices.

3.4 Two Subdomain Preconditioners

Our study of preconditioners for the Schur complement $S$ begins with the two subdomain case, where the geometry of the interface $B$ is relatively simple. We shall describe preconditioners based either on local Schur complement matrices or on approximations of $S$ which use properties of the Steklov-Poincaré map associated with $S$. We consider a finite element discretization of elliptic equation (3.1) on a domain $\Omega$, with Dirichlet boundary conditions on $B_D = \partial\Omega$. We assume that $\Omega$ is partitioned into two nonoverlapping subdomains $\Omega_1$ and $\Omega_2$ with interface $B \equiv \partial\Omega_1 \cap \partial\Omega_2$, and order the nodes in $\Omega$ based on $\Omega_1$, $\Omega_2$ and $B$; see Fig. 3.3. Given this ordering, a nodal vector $u$ can be partitioned as

$$u = \left(u_I^{(1)T},\, u_I^{(2)T},\, u_B^T\right)^T,$$

and the discretization of (3.1) will be (see Chap. 3.1):

$$\begin{bmatrix} A_{II}^{(1)} & 0 & A_{IB}^{(1)} \\ 0 & A_{II}^{(2)} & A_{IB}^{(2)} \\ A_{IB}^{(1)T} & A_{IB}^{(2)T} & A_{BB} \end{bmatrix} \begin{bmatrix} u_I^{(1)} \\ u_I^{(2)} \\ u_B \end{bmatrix} = \begin{bmatrix} f_I^{(1)} \\ f_I^{(2)} \\ f_B \end{bmatrix}.$$

[Fig. 3.3. Two subdomain decompositions: a regular decomposition, with $\Omega_1$ and $\Omega_2$ separated by the interface $B$, and an immersed decomposition, with $\Omega_1$ immersed in $\Omega_2$ and $B = \partial\Omega_1$.]

The Schur complement matrix $S$ associated with the above system can be derived by solving $u_I^{(i)} = A_{II}^{(i)-1}\left(f_I^{(i)} - A_{IB}^{(i)} u_B\right)$ for $i = 1, 2$ and substituting this into the third block row above. This will yield the reduced system:

$$S\, u_B = f_B - A_{IB}^{(1)T} A_{II}^{(1)-1} f_I^{(1)} - A_{IB}^{(2)T} A_{II}^{(2)-1} f_I^{(2)},$$

where $S \equiv \left(A_{BB} - A_{IB}^{(1)T} A_{II}^{(1)-1} A_{IB}^{(1)} - A_{IB}^{(2)T} A_{II}^{(2)-1} A_{IB}^{(2)}\right)$ is the two subdomain Schur complement.
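The reduction above can be checked on a small example. The sketch below (not from the text; the 1D Laplacian, sizes and seed are chosen only for illustration) takes a one dimensional Laplacian, treats the middle node as the interface $B$ and the nodes on either side as subdomain interiors, forms the reduced system, and verifies that it reproduces the interface values of the direct solve.

```python
import numpy as np

# 1D Laplacian on 11 unknowns; node 5 is the interface B, nodes 0..4 and
# 6..10 are the two subdomain interiors (a stand-in for the block system above).
n = 11
A_full = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
I1, I2, B = list(range(5)), list(range(6, 11)), [5]
A = A_full[np.ix_(I1 + I2 + B, I1 + I2 + B)]      # reorder as (interiors, interface)
f = np.random.default_rng(0).standard_normal(n)

n1, n2 = len(I1), len(I2)
A11, A22 = A[:n1, :n1], A[n1:n1+n2, n1:n1+n2]
A1B, A2B = A[:n1, n1+n2:], A[n1:n1+n2, n1+n2:]
ABB = A[n1+n2:, n1+n2:]
f1, f2, fB = f[:n1], f[n1:n1+n2], f[n1+n2:]

# Reduced interface system  S u_B = f_B - sum_i A_iB^T A_ii^{-1} f_i
S = ABB - A1B.T @ np.linalg.solve(A11, A1B) - A2B.T @ np.linalg.solve(A22, A2B)
g = fB - A1B.T @ np.linalg.solve(A11, f1) - A2B.T @ np.linalg.solve(A22, f2)
uB = np.linalg.solve(S, g)

u = np.linalg.solve(A, f)        # direct solve of the full system, for comparison
assert np.allclose(uB, u[n1+n2:])
```

Note that only subdomain solves with $A_{II}^{(i)}$ are needed to apply $S$; the interiors never couple directly.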

Typically, $S$ will be dense; however, its action can be computed without its assembly. We shall seek preconditioners $M$ for $S$ such that:

$$\mathrm{cond}(M, S) \equiv \frac{\lambda_{\max}\left(M^{-1}S\right)}{\lambda_{\min}\left(M^{-1}S\right)}$$

is significantly smaller than the condition number of $S$, without deterioration as $h \to 0^+$, or as the coefficient $a(x)$ and the subdomain size $h_0$ varies. In this section, we shall describe three categories of Schur complement preconditioners for two subdomain decompositions:

• Preconditioners based on subdomain Schur complements.
• Preconditioners based on FFT's and fractional Sobolev norms.
• Preconditioners based on algebraic approximations of $S$.

Of these, the preconditioners based on subdomain Schur complements are more easily generalized to the many subdomain case and higher dimensions.

3.4.1 Preconditioners Based on Subdomain Schur Complements

The use of the local Schur complement $S^{(i)}$ to precondition $S$ can be motivated by the matrix splitting of $S$ in the subassembly identity (3.20):

$$S = S^{(1)} + S^{(2)}, \quad \text{where } S^{(i)} = A_{BB}^{(i)} - A_{IB}^{(i)T} A_{II}^{(i)-1} A_{IB}^{(i)},$$

since $B = B^{(1)} = B^{(2)}$ and $R_B^{(i)} = I$ for $i = 1, 2$. This splitting may also be derived by substituting the identity $A_{BB} = A_{BB}^{(1)} + A_{BB}^{(2)}$ into the algebraic expression $S = \left(A_{BB} - A_{IB}^T A_{II}^{-1} A_{IB}\right)$ for the Schur complement matrix:

$$\begin{cases} S = A_{BB}^{(1)} + A_{BB}^{(2)} - A_{IB}^{(1)T} A_{II}^{(1)-1} A_{IB}^{(1)} - A_{IB}^{(2)T} A_{II}^{(2)-1} A_{IB}^{(2)} \\[4pt] \phantom{S} = \left(A_{BB}^{(1)} - A_{IB}^{(1)T} A_{II}^{(1)-1} A_{IB}^{(1)}\right) + \left(A_{BB}^{(2)} - A_{IB}^{(2)T} A_{II}^{(2)-1} A_{IB}^{(2)}\right) = S^{(1)} + S^{(2)}. \end{cases}$$

Typically, each $S^{(i)}$ will be symmetric and positive definite, except when $c(x) = 0$ and $\Omega_i$ is immersed in $\Omega$, in which case $S^{(i)}$ will be singular. For simplicity, we shall assume $S^{(i)}$ is nonsingular (see Chap. 3.7). Matrix $S^{(i)}$ need not be assembled (and it will be dense, even if it were assembled). It will be important, however, to solve the system $S^{(i)} v_B^{(i)} = r_B^{(i)}$ efficiently.

Fortunately, such a system can be solved without assembling $S^{(i)}$, by using the following algebraic property satisfied by $S^{(i)}$:

$$\begin{bmatrix} A_{II}^{(i)} & A_{IB}^{(i)} \\ A_{IB}^{(i)T} & A_{BB}^{(i)} \end{bmatrix} \begin{bmatrix} v_I^{(i)} \\ v_B^{(i)} \end{bmatrix} = \begin{bmatrix} 0 \\ S^{(i)} v_B^{(i)} \end{bmatrix}. \tag{3.60}$$

This identity can be verified by block elimination of $v_I^{(i)}$. It suggests that the solution to $S^{(i)} v_B^{(i)} = r_B^{(i)}$ can be obtained by solving (3.60) using $r_B^{(i)}$ to replace $S^{(i)} v_B^{(i)}$ in the right hand side, and by selecting $v_B^{(i)}$. Formally:

$$S^{(i)-1} r_B^{(i)} = \begin{bmatrix} 0 \\ I \end{bmatrix}^T \begin{bmatrix} A_{II}^{(i)} & A_{IB}^{(i)} \\ A_{IB}^{(i)T} & A_{BB}^{(i)} \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ r_B^{(i)} \end{bmatrix}.$$

The subdomain stiffness matrix here corresponds to the discretization of an elliptic equation on $\Omega_i$ with Neumann boundary data on $B^{(i)}$. When the number of unknowns on each subdomain is approximately half the total number of unknowns, the cost of preconditioning with $S^{(i)}$ is typically less than half the cost of solving $Av = r$; however, the number of iterations required depends on the effectiveness of this preconditioner. $S^{(i)}$ is traditionally referred to as the Dirichlet-Neumann preconditioner for $S$, see [BJ9, BR11, FU, MA29]. Its name arises since, in the Dirichlet-Neumann algorithm (Alg. 3.4.1), a Neumann problem must be solved on $\Omega_i$ and a subsequent Dirichlet problem on its complementary domain. It is shown in Chap. 3.9 that $\mathrm{cond}(S^{(i)}, S) \le c$, for $c > 0$ independent of $h$. In the special case where the elliptic equation and the grid are symmetric about $B$, it will hold that $S^{(1)} = S^{(2)}$ and $\mathrm{cond}(S^{(i)}, S) = 1$.

To obtain a discrete version of the Dirichlet-Neumann algorithm, let $v_I^{(k)}$ and $v_B^{(k)}$ denote the $k$'th iterate on $\Omega_1$ and $B^{(1)}$, and $u_I^{(k)}$ and $u_B^{(k)}$ the $k$'th iterate on $\Omega_2$ and $B^{(2)}$, respectively. Let $\left(w_I^{(1)}, w_B^{(1)}\right)^T$ and $\left(w_I^{(2)}, w_B^{(2)}\right)^T$ denote nodal vectors associated with finite element functions on $\overline\Omega_1$ and $\overline\Omega_2$, respectively. Below, we list a discretization of the Steklov-Poincaré formulation (1.19) from Chap. 1:

$$\begin{cases} A_{II}^{(1)} w_I^{(1)} + A_{IB}^{(1)} w_B^{(1)} = f_I^{(1)} \\ w_B^{(1)} = w_B^{(2)} \\ A_{II}^{(2)} w_I^{(2)} + A_{IB}^{(2)} w_B^{(2)} = f_I^{(2)} \\ A_{IB}^{(2)T} w_I^{(2)} + A_{BB}^{(2)} w_B^{(2)} = -A_{IB}^{(1)T} w_I^{(1)} - A_{BB}^{(1)} w_B^{(1)} + f_B. \end{cases}$$

The Dirichlet-Neumann algorithm iteratively updates the transmission boundary conditions on $B$ using a relaxation parameter $0 < \theta < 1$.
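The extended-system identity (3.60) is easy to verify numerically. The sketch below (an illustration under invented data: a random symmetric positive definite stand-in for the subdomain matrix $A^{(i)}$, not the book's code) applies $S^{(i)-1}$ by one solve with the extended matrix and checks it against the assembled Schur complement.

```python
import numpy as np

rng = np.random.default_rng(1)
nI, nB = 7, 3
R = rng.standard_normal((nI + nB, nI + nB))
Ai = R @ R.T + (nI + nB) * np.eye(nI + nB)      # SPD stand-in for A^(i)
AII, AIB = Ai[:nI, :nI], Ai[:nI, nI:]
Si = Ai[nI:, nI:] - AIB.T @ np.linalg.solve(AII, AIB)   # S^(i), assembled for checking only

rB = rng.standard_normal(nB)
# Solve the extended system with right hand side (0, r_B); the trailing block
# of the solution equals S^(i)^{-1} r_B, so S^(i) never needs to be assembled.
vB = np.linalg.solve(Ai, np.concatenate([np.zeros(nI), rB]))[nI:]
assert np.allclose(Si @ vB, rB)
```

This is exactly how the Dirichlet-Neumann and Neumann-Neumann preconditioners below apply $S^{(i)-1}$ in practice: one Neumann subdomain solve per application.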

uB )T denote starting iterates. Endfor . IB I BB 2 B IB I BB B 4.1 (Dirichlet-Neumann Algorithm) (0) (0) (0) (0) Let (vI .4 Two Subdomain Preconditioners 143 Algorithm 3. 1. 1.4. 3. Solve the Dirichlet problem: ⎧ ⎨ A(1) v(k+1) + A(1) v(k+1) = f (1) II I IB B I ⎩ (k+1) vB (k) = θ uB + (1 − θ) vB (k) 3. vB )T and (uI . For k = 0. Solve the mixed problem: ⎧ ⎨ A(2) (k+1) II uI (2) (k+1) + AIB uB (2) = fI ⎩ A(2)T u(k+1) + A(2) u(k+1) = f − A(1)T v(k+1) − A(1) v(k+1) . · · · until convergence do: 2.

T .

uI . A matrix (k+1) form for this can be derived by solving for vI in step 2: ⎧ −1 . uB (k+1) (k+1) If the interior variables vI and uI are eliminated in the preceding (k+2) (k+1) algorithm. T (k)T (k)T (k)T (k)T Output: vI . vB . we may obtain an expression relating vB to vB .

Solving the resulting block (k+1) (k+1) system using block elimination (representing uI in terms of uB ) yields: ⎧ . B B B and substituting this into the equations in step 3. ⎨ v(k+1) = A(1) (1) (1) (k+1) f I − AIB vB I II ⎩ v(k+1) = θu(k) + (1 − θ) v(k) .

(1)T (1)−1 ⎪ ⎪ (k+1) S (2) uB = f B − AIB AII (1) (1) (k+1) f I − AIB vB ⎪ ⎨ (1) (k+1) (2)T (2)−1 (2) ⎪ −ABB vB − AIB AII f I ⎪ ⎪ ⎩ (1)T (1)−1 (1) (2)T (2)−1 (2) (k+1) = f B − AIB AII f I − AIB AII f I − S (1) vB . this reduces to: −1 . T −1 T −1 (1) (1) (1) (2) (2) (2) Deﬁning ˜f B ≡ f B − AIB AII f I − AIB AII f I .

. we summarize the action (i) of the Dirichlet-Neumann preconditioner M = S (i) on a vector rB . uB = S (2) B (k+2) (k+2) (k+1) (k+1) Since vB is deﬁned as vB = θ uB + (1 − θ) vB . this shows that the preceding Dirichlet-Neumann algorithm corresponds to an unaccelerated Richardson iteration to solve the Schur complement system S uB = ˜f B with M = S (2) as a preconditioner and θ as a relaxation parameter. Below. We may also employ M = S (1) as a preconditioner for S. (k+1) ˜f B − S (1) v(k+1) .

2 (Dirichlet-Neumann Preconditioner) Input: rB Solve: . and grid on the two subdomains diﬀer signiﬁcantly. with Neumann boundary conditions on B. Both issues are addressed by the balancing procedure in Chap. matrices S (1) and S (2) can also diﬀer signiﬁcantly. the solution will be unique only up to a multiple of 1. and in parallel. the local stiﬀness matrix A(i) and its Schur complement S (i) will be singular. When the geometry. Furthermore. when c(x) = 0 and B (i) = ∂Ωi . it may be more equitable to combine information from both the subdomains in the preconditioner. - (i) (i) (i) AII AIB vI 0 (i)T (i) (i) = . coeﬃcients. .3 (Neumann-Neumann Preconditioner) Input: rB and 0 < α < 1 . In this case. As mentioned earlier. 2. Remark 3. 1) . A A vB rB IB BB Output: M −1 rB ≡ (i) vB .29. When applying the Dirichlet-Neumann preconditioner. Algorithm 3. The action of the inverse of this preconditioner is summarized below.7. where 0 < α < 1 is a scalar parameter for assigning diﬀerent weights to each subdomain (though. MA17]. 3. see [MA14. . with T the null space of S (i) spanned by 1 = (1.144 3 Schur Complement and Iterative Substructuring Algorithms Algorithm 3. . hence.4. since S (i) vB = rB will be solvable T only if the compatability condition 1 rB = 0 is satisﬁed. the name Neumann-Neumann preconditioner [BO7]. a speciﬁc Schur complement matrix S (i) must be chosen. typically α = 12 ). The action of the inverse of the two subdomain Neumann-Neumann preconditioner M is deﬁned as: −1 −1 M −1 ≡ α S (1) + (1 − α) S (2) . for i = 1. Computing the action of M −1 requires the solution of a discretized elliptic equation on each subdomain.4. In this case. . the Dirichlet- (i) Neumann preconditioner must be modiﬁed. This motivates the Neumann-Neumann preconditioner.

wB : . Endfor Output: M −1 rB ≡ α wB + (1 − α) wB . - (i) (i) (i) AII AIB wI 0 (i)T (i) (i) = . For i = 1. T (i) (i) 1. 2 in parallel solve for wI . AIB ABB wB rB 2. (1) (2) .

3. and convergence rates independent of h.30. By construction. If M is a matrix that generates the latter fractional Sobolev norm energy. whose interface B ˜ has the same ˜ number of unknowns as on B.1) posed on Ω may be heuristically approximated by an elliptic equation posed on Ω ˆ with (possibly modiﬁed) coeﬃcients a ˆ(x) and cˆ(x) approximating a(x) and c(x). such methodology is primarily applicable in two dimensions. Ω2 and B. 3. Proof. Theoretical analysis in Chap. respectively: ∇ · (ˆ u) + cˆ(x) = fˆ(ˆ a(x)∇ˆ ˆ∈Ω x). when Ω ⊂ IR2 .4. for x ˆ ˆ ∈ ∂ Ω. FU. The advantage of FFT based preconditioners is that when they are applicable. In the fractional Sobolev norm approach. then it will provide a heuristic FFT based preconditioner for S.e.9. Ω ˆ and interface B that approximate Ω1 . It also applies to subdomains with arbitrary geometry in two or three dimensions.2 Preconditioners Based on FFT’s and Fractional Norms Preconditioners for S. the Schur complement S on an interface B is approximated by the Schur complement S˜ of a model problem on another domain. with subdomains Ω ˆ2 ˆ1 . while the Neumann-Neumann preconditioner requires two subdomain solves. If the Schur complement S in the model problem has FFT solvers. can be motivated in alternate ways. then the condition number cond(M. S) will be bounded independent of the mesh size h. Model Problem Based Preconditioners. for applicability. . or have a multilevel structure. In a model problem based approach.4 Two Subdomain Preconditioners 145 An advantage of the Neumann-Neumann preconditioner is that each local problem is typically easy to set up. an equivalence between the energy of the Schur complement S and a fractional Sobolev norm energy of its boundary data on B is employed. with interface B.9 in- dicates that the Dirichlet-Neumann and Neumann-Neumann preconditioners typically yield convergence rates which are independent of the mesh size h. respectively. 
If M denotes the Dirichlet-Neumann or Neumann-Neumann preconditioner for a two subdomain decomposition. for x ˆ (3. the grid on B must either be uniform. and its algebraic form extends easily to multisubdomain decompositions.. See [BJ9. the elliptic equation (3. it can be employed to precondition the Schur complement S. BR11. MA29] and Chap. Then. based on FFT’s and fractional Sobolev norms.61) u ˆ = 0. However. i. they yield almost optimal order complexity. Given a domain Ω with subdomains Ω1 and Ω2 . the preceding Dirichlet-Neumann preconditioner requires only one subdomain solve. Theorem 3. 3. In three dimensions. let Ωˆ be a region approximating Ω. The model problem approach is heuristic in nature. 3.

ˆ Block partitioning the unknowns in Ω ˆ based on the subregions yields the system: . . with Sˆ = QDQT for a diagonal matrix D.31 for a two dimensional domain Ω. If Ω ˆ is a small subregion of Ω satisfying B ⊂ Ω ˆ ⊂ Ω. we list diﬀerent choices of diagonal entries Dii for 1 ≤ i ≤ k.61) on this grid. see [DR. ˆ The grid on Ω will be chosen to be uniform. as follows. and the coeﬃcients a ˆ(x) and cˆ(x) will be chosen to be constant in each subdomain. CH2. and system (3. we may choose aˆ(·) = a(·) and cˆ(·) = c(·). MA37. see [NE3.32. we may substitute ˆ f I = 0. Ω2 = Ω2 ∩ Ω and B = B. for diﬀerent choices of diagonal matrices D. Let Th (Ω)ˆ denote a ˆ triangulation of Ω having the same number of interior nodes in B ˆ as in B. Remark 3.ˆ Ω ˆ1 and Ω ˆ2 to be rectangular regions. Heuristically.62) AˆIB AˆBB T u ˆB fB The Schur complement matrix Sˆ = (AˆBB − AˆTIB Aˆ−1 ˆ II ABB ) in the model prob- lem may then be employed as a preconditioner for S.57) and (3. ˆ let Sˆ denote the Schur complement associated with the discretized model problem on Ω.146 3 Schur Complement and Iterative Substructuring Algorithms A preconditioner Sˆ can be constructed for S. Furthermore. ˆ is obtained by mapping Ω and if B maps into B. we elaborate on the preconditioner outlined in Remark 3. then we ˆ ˆ ˆ ˆ ˆ may deﬁne Ω1 = Ω1 ∩ Ω.63) We employ a model Schur complement preconditioner Sˆ = QDQT for S. 1 ≤ i. j ≤ k.58). GO3. In this case. when the triangulation of Ω ˆ restricted to B ˆ has the same connectivity as the original triangulation restricted to B.61) is separable. BR11]. the coeﬃcients a ˆ1 and Ω ˆ(x) and cˆ(x) may be chosen so that (3. To construct a preconditioner Sˆ for S. (3.62) and deﬁne the action of the inverse of a preconditioner as Sˆ−1 rB ≡ u ˆ B . and consider a discretization of (3. ˆ Remark 3. in two dimensions.ˆ then a FFT solver can be constructed for S. We shall assume that the interface B can be mapped onto a line segment B.. Ωˆ2 and interface B. 
Qij ≡ 2/(k + 1) sin (i j π/(k + 1)) . SM]. Given the subdomains Ω ˆ1 . If Ω ˆ we may seek Ω ˆ2 that are rectangular.62) will have a coeﬃcient matrix which is a small submatrix of A. If k denotes the number of unknowns on B (and hence on B). BJ9. - AˆII AˆIB u ˆI ˆ f = ˆI . see (3. (3. yielding diﬀerent preconditioners: . In this case. Next.ˆ and choose Ω. matrix Sˆ should be approximately spectrally equivalent to S.31. ˆ we deﬁne the discrete sine transform matrix Q of size k as: . ˆ f B = rB into (3. Next. matrix Sˆ may be explicitly diagonalized by a discrete sine transform matrix Q. If a uniform triangulation is employed on Ω.

for 1 ≤ i ≤ k ⎪ ⎪ 2(k* + 1) ⎪ ⎨ 1 1 1 + σi − σi + σi2 (3.4 Two Subdomain Preconditioners 147 ⎧ (1) 1/2 ⎪ ⎪ Dii = a + a (2) (σi ) . [BJ9. Then.65) ⎪ ⎪ 2 4 ⎪ ⎪ γ i ≡ * . so the user may need to rescale the output. for 1 ≤ i ≤ k. where Q is the discrete sine transform. the combined cost for solving the linear system Sˆ wB = rB will be O(k log(k)).4 (FST Based Fractional Norm Preconditioner) Input: rB . ii i i 2 (3. ⎪ [BR11. CH2] ⎪ ⎪ 1 − γ 1 − γ m2 +1 4 ⎪ ⎪ i i ⎩ D = 1 a(1) + a(2) σ − σ 2 /61/2 . so that the transform applied twice should yield the identity. CH2] can be formally obtained as follows for a two dimensional domain Ω ˆ with k interior nodes on B. CH2] are integers chosen so that (mi + 1) h and (k + 1)h represents the approximate length and width of subdomain Ωi . The resulting preconditioner Sˆ = QDQT for S. ⎪ ⎪ 1 1 ⎩ 1 + σi + σi + σi2 2 4 The scalars a(1) and a(2) denote values of a(x) at some interior point in Ω1 and Ω2 . Evaluate using the fast sine transform: wB = Q xB Output: Sˆ−1 rB ≡ wB .64) Here.33. Let (mi + 1) h the approximate length of subdomain Ω ˆi and let the (i) ˆ constant in Ω coeﬃcients of the model problem be isotropic with a ˆi . Algorithm 3. a(1) and a(2) will be eigenvalues of a(x) at chosen interior points in Ω1 and Ω2 . However. the parameters σi and γi are deﬁned by: ⎧ ⎪ iπ ⎪ ⎪ σi ≡ 4 sin 2 . is summarized next. D 1. Compute in linear complexity: xB = D−1 yB 3. it will hold that Q−1 = Q. [GO3] 1/2 ⎪ 1 + γim1 +1 (2) 1 + γi m2 +1 1 ⎪ ⎪ Dii = a(1) m1 +1 + a σi + σi2 . BJ9]. respectively. 3. Since the cost of applying a discrete sine transform is typically O(k log(k)). When a(x) is a matrix function. [DR] ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (1) (2) σi + σi2 /4 1/2 ⎨ Dii = a + a . when a(x) is a scalar function. The choice of diagonal matrix D in [BJ9. The parameters m1 and m2 in the preconditioner of [BJ9.4. . Remark 3. in FFT packages [VA4] the discrete sine transform may be scaled diﬀerently. 
Remark 3.33. Since the discrete sine transform matrix $Q$ is symmetric and orthogonal, it will hold that $Q^{-1} = Q$, so that the transform applied twice yields the identity. However, in FFT packages [VA4] the discrete sine transform may be scaled differently, so the user may need to rescale the output.

The choice of diagonal matrix $D$ in [BJ9, CH2] can be formally obtained as follows for a two dimensional domain $\hat\Omega$ with $k$ interior nodes on $\hat B$. Let $(m_i+1)h$ be the approximate length of subdomain $\hat\Omega_i$, for $h_{x_1} = h_{x_2} = h$, and let the coefficients of the model problem be isotropic and constant in each $\hat\Omega_i$. Then, the eigenvalues $D_{ii}$ of the Schur complement matrix $\hat S$ in the model problem are:

$$D_{ii} = \left(a^{(1)}\,\frac{1+\gamma_i^{m_1+1}}{1-\gamma_i^{m_1+1}} + a^{(2)}\,\frac{1+\gamma_i^{m_2+1}}{1-\gamma_i^{m_2+1}}\right)\left(\sigma_i + \frac{\sigma_i^2}{4}\right)^{1/2}, \quad 1 \le i \le k,$$

where $\sigma_i$ and $\gamma_i$ are as defined in (3.65). When the parameters $m_1$ and $m_2$ are large, since $0 < \gamma_i < 1$, the expression for the eigenvalues $D_{ii}$ can be approximated as:

$$D_{ii} \to \left(a^{(1)} + a^{(2)}\right)\left(\sigma_i + \sigma_i^2/4\right)^{1/2}, \quad 1 \le i \le k.$$

This follows by algebraic simplification, and heuristically motivates the preconditioner of [GO3].

The preconditioner of [DR] can be formally obtained from the preconditioner of [GO3] by replacing the terms $\left(\sigma_i + \sigma_i^2/4\right)^{1/2}$ by the terms $(\sigma_i)^{1/2}$; both preconditioners will be spectrally equivalent, since $\left(\sigma_i + \sigma_i^2/4\right)^{1/2} = (\sigma_i)^{1/2}\left(1 + \sigma_i/4\right)^{1/2}$ and $1 < \left(1 + \sigma_i/4\right)^{1/2} < \sqrt{2}$ for $0 < \sigma_i < 4$. Similar preconditioners can be constructed in three dimensions, provided the grid on the interface $B$ can be mapped into a two dimensional rectangular grid; matrix $Q$ will then be a two dimensional fast sine transform. When the grid on $B$ is not rectangular, preconditioners approximating fractional Sobolev norms can be constructed using multilevel methodology, provided the grid has a multilevel structure; see [BR17] and Chap. 7.

Fractional Sobolev Norm Based Preconditioners. This approach is motivated by a norm equivalence between the energy associated with a two subdomain Schur complement $S$ and a fractional Sobolev norm energy. For two subdomain decompositions, the norm equivalences (3.28) and (3.32) reduce to:

$$c\, \|u_h\|^2_{H^{1/2}_{00}(B)} \le u_B^T S u_B \le C\, \|u_h\|^2_{H^{1/2}_{00}(B)}, \tag{3.66}$$

for $0 < c < C$ independent of $h$. Such norm equivalences hold for harmonic and discrete harmonic functions, since the fractional Sobolev norm $\|u_h\|_{1/2,\partial\Omega_i}$ can be shown to be equivalent to $\|u_h\|_{H^{1/2}_{00}(B)}$ when $u_h$ is zero on $\partial\Omega_i \setminus B$, and is proved using elliptic regularity theory. This norm equivalence suggests that a preconditioner $M$ can be constructed for $S$
by representing the discrete fractional Sobolev energy as: uh 2H 1/2 (B) = uTB M uB .

u)0 ≤ C (u. 0 ≤ α ≤ 1. BE16]. LI4. • Let H0 denote an Hilbert space with inner product (. u)1 . the fractional index Sobolev space H00 (B) is often constructed as an interpolation space H 1/2 obtained by interpolating 1/2 H0 = L2 (B) and H1 = H01 (B). so that H0 = H0 and H1 = H1 : Hα ≡ {u ∈ H0 : (T α u. . which corresponds to a Riesz representation map. v)0 = i=1 λα i (Pi u. 1/2 In elliptic regularity theory.)0 and let H1 ⊂ H0 denote a subspace with a stronger inner product (.)0 orthogonal projections onto the eigenspace of T corresponding to eigenvalue λi . Then. with associated inner products deﬁned as outlined below [BA3.. • Let T denote a self adjoint coercive operator satisfying: (T u. i=1 Then.. u)0 < ∞} .. This procedure deﬁnes interpolation spaces Hα satisfying H1 ⊂ Hα ⊂ H0 . v)0 . v)1 . for 0 ≤ α ≤ 1 we may formally deﬁne a fractional operator T α as: ∞ Tα ≡ λα i Pi .4 Two Subdomain Preconditioners 149 spaces Hα can be constructed for 0 ≤ α ≤ 1 with H0 = H0 and H1 = H1 . v ∈ H1 . . i=1 where 0 < λ1 < λ2 < · · · are eigenvalues of T and Pi are (. )1 : (u. 3. v)α ≡ (T α u. v)0 = (u. • Let T have the following spectral decomposition: ∞ T = λi Pi . ∀u. where the inner product on Hα is consistently deﬁned by: ∞ (u. . ∀u ∈ H1 . for each 0 ≤ α ≤ 1 the interpolation space Hα is formally deﬁned as the domain of the fractional operator T α . The space H00 (B) will correspond to the 1 domain of the operator T 2 with associated fractional norm deﬁned by: .

H00 (B) 0 The operator T corresponds to a Laplace-Beltrami operator −∆B deﬁned on B with homogeneous boundary conditions on ∂B. u = i=1 λi2 (Pi u. and are examples of pseudodiﬀerential operators. Formally. These fractional operators T α . will not remain diﬀerential operators for 0 < α < 1. u)0 . To obtain a matrix representation of uh 2 1/2 on the ﬁnite element space Vh (B) of ﬁnite element functions H00 (B) . we may employ fractional powers of ma- trices to represent fractional operators. however. ∀u ∈ H00 (B). the fractional powers of T may be computed by employing the eigenfunction expansion of T and replacing the eigenvalues of T by their fractional powers. In the ﬁnite dimensional case. 1 ∞ 1 1/2 u 2 1/2 ≡ T 2 u.

uh ) = uTB Ah uB . uh ) = uTB Gh uB . where uB denotes the nodal vector corresponding to the ﬁnite element func- tion uh (x) restricted to B. for α = 1 (Thα uh . Formally. Let Gh denote the mass (Gram) matrix associated with the ﬁnite element space Vh ∩ H01 (B) with standard nodal basis. and let Ah denote the ﬁnite element discretization of the Laplace- Beltrami operator with trivial boundary conditions imposed on ∂B. by construction it will hold that: (Thα uh . for α = 0. we seek a symmetric positive deﬁnite matrix Th satisfying: 0 Th uh . for uh ∈ Vh (B) ∩ H01 (B). a matrix representation of fractional operators associated with Th may be constructed as: . uh = uh 2L2 (B) . for uh ∈ Vh (B) ∩ L2 (B) 1 Th uh .150 3 Schur Complement and Iterative Substructuring Algorithms restricted to B. Then. uh = uh 2H 1 (B) . ·) denotes the L2 (B) inner product. 0 where (·.

1 −1 −1 α 1 Thα = Gh2 Gh 2 Ah Gh 2 Gh2 . for 1 ≤ α ≤ 1. This yields: 1 1 .

⎥.. then Th2 can be eﬃciently computed and its associated linear system can be solved eﬃciently. with zero boundary conditions on ∂B is: d2 u −∆B u(x) ≡ − . . . Then. for u(0) = 0 and u(1) = 0.. and nodal vector (uB )i = uh (ih) corresponds to the ﬁnite element function uh . 1 1 −1 −1 2 Th2 = Gh2 Gh 2 Ah Gh 2 Gh2 . . dx2 If the grid size is h = 1/(k + 1). . . . ⎥ and Gh = ⎢ . . 1). . . . h ⎢ ⎥ 6 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ −1 2 −1 ⎦ ⎣ 1 4 1⎦ −1 2 14 .. then the ﬁnite element discretization Ah of the Laplace-Beltrami operator and the Gram matrix Gh will be of size k: ⎡ ⎤ ⎡ ⎤ 2 −1 4 1 ⎢ ⎥ ⎢ ⎥ ⎢ −1 2 −1 ⎥ ⎢1 4 1 ⎥ 1 ⎢⎢ ⎥ ⎢ ⎥ . ⎥ h ⎢ ⎥ Ah = ⎢ . 1/2 To construct an explicit representation of matrix Th we assume that the interface B corresponds to the line segment (0. . . When matrices Ah and Gh can be simultaneously diagonalized by the discrete 1 sine transform Q. the Laplace-Beltrami operator −∆B deﬁned on B.

⎩ λj (Gh ) = 3(k + 1) 2(k + 1) 1 1 The fractional power Th2 can be represented explicitly as Th2 = QDQT where: ⎧ 1 1 ⎪ ⎪ ⎪ ⎪ 2 2 ⎪ ⎪ D jj = λ j (G h ) λ j (A h ) ⎪ ⎪ ⎪ ⎪ ⎨ 1 1 1 j π 2 j π 2 ⎪ = 3 − 2 sin2 ( ) 4 (k + 1) sin2 ( ) ⎪ ⎪ 3(k + 1) 2(k + 1) 2(k + 1) ⎪ ⎪ ⎪ ⎪ ⎪ 1 1 ⎪ ⎪ 2 jπ jπ 2 1 2 2 ⎩ = √ 3 sin (2 ) − 2 sin (4 ) = σj − σj 3 2(k + 1) 2(k + 1) 6 for σi = 4 sin2 ( 2(k+1) iπ ) where 1 ≤ j ≤ k. j ≤ k.64). however.4 Two Subdomain Preconditioners 151 Matrices Ah and Gh can be simultaneously diagonalized by the one dimen- sional discrete sine transform matrix Q with entries: .4. the Schur complements S may be formulated and applied heuristically. 3. Analogous FFT based preconditioners can be constructed for two subdomain Schur complements in three dimensions. 3. BJ9] in (3. coercive self adjoint elliptic operator.3 Preconditioners Based on Algebraic Approximation of S The Schur complement matrix S arising in a two subdomain decomposition is typically a dense matrix. the subdomain Schur complement preconditioner and the fractional Sobolev norm based preconditioner Sˆ will be spectrally equivalent to S as h → 0.26) and noting . 2(k + 1) ⎪ ⎪ 1 3 − 2 sin2 ( jπ ) for 1 ≤ j ≤ k. 3. by analogy with the two dimensional case. This can be veriﬁed heuristically by computing its entries and plotting its magnitude. provided that the grid on the interface B can be mapped onto a uniform rectangular grid [CH2. may depend on the aspect ratios of the sub- domains. CH13]. for 1 ≤ i. 3.36. The convergence rate. with a(1) = a(2) = 1. Remark 3. or by using expression (3. The following result is proved in Chap. The eigenvalues of matrices Ah and Gh corresponding to eigenvector qj is: ⎧ ⎪ ⎪ 2 jπ for 1 ≤ j ≤ k ⎨ λj (Ah ) = 4(k + 1) sin ( ). For any 2nd order. In this case. It may be implemented as in Alg. and also on the coeﬃcients.9. This choice of D yields the precondi- tioner M = QT DQ of [BR11.4. Lemma 3.35. Qij = 2/(k + 1) sin (i j π/(k + 1)) .4.

CH9]. Below. with increasing distance between the nodes xr and xs . Furthermore.)T ⎪ ⎩ p3 = (0. . we illustrate a speciﬁc choice of probe vectors to construct a tridi- agonal approximation M of S.)T . . with increasing distance between the nodes xi and xj . to approximate S. in this subsection we shall describe two alternative algebraic preconditioners for the Schur complement matrix. BE2. AX] will oﬀer no advantages over direct solution. CH9] does not require the explicit assembly of matrix S. 0. . one based on sparse approximation of the Schur complement using the probing technique. and requiring that the matrix vector products of S with each probe vector pl matches the matrix vector product of M with the same probe vector: M pl = Spl . 1. Choose: ⎧ ⎪ ⎨ p1 = (1. SA2. we shall describe the probing technique for determining a sparse approximation M of S. for 1 ≤ l ≤ (2d + 1). A careful choice of the probe vectors based on the decay in the entries of S can increase the accuracy of the probe approximation M . 1. 1. . Both preconditioners may be applied without assembly of S. but does require the computation of the matrix-vector products of S with the chosen probe vectors. 0. 0. The ﬁrst algebraic preconditioner we shall consider is based on the con- struction of a sparse matrix approximation of S using a probing technique [CH13. and the other based on incomplete factorization of the subdomain (i) matrices AII . This motivates choosing a band matrix M . For Ω ⊂ IR2 . If matrix S is of size k. say of band width d. 1. 0. and three probe vectors p1 . as is the case in itera- tive substructuring methods. Nonzero entries of the band matrix M can be determined by choosing probe vectors pl . if the nodes xi on B are ordered consecutively along B. 0. provided the nonzero entries of M approximate the dominant entries of S. traditional algebraic preconditioners based on ILU factorization [GO4. 
then the entries of the Schur complement matrix S typically decay along diagonal bands. such factorizations cannot be employed when matrix S is not assembled. . KE7. 0. and arises (l)−1 from the decay in the entries (AII )rs of the discrete Green’s function asso- ciated with the elliptic equation on the subdomains. these requirements yield k (2d + 1) equations for the unknown entries of M .152 3 Schur Complement and Iterative Substructuring Algorithms that the discrete Green’s function A−1 II is a dense matrix as h → 0 (due to + the global domain of dependence of elliptic equations). . . The resulting probing technique [CH13. KE7. In this case 2d + 1 = 3. . . This suggests that a sparse approximation M of S may be eﬀective as a preconditioner. 0. 0. 0. A sparse approximation of S can be heuristically moti- vated by a decay property in the entries Sij of a two subdomain Schur com- plement matrix. . This decay property can be observed when S is assembled explicitly. As a result. say for 1 ≤ l ≤ (2d + 1).) T p2 = (0. 1. and also simplify the linear system for the nonzero entries of M . In the following. p2 and p3 will be suﬃcient. 0. Instead.

then the entries of the Schur complement matrix $S$ typically decay along diagonal bands, with increasing distance between the nodes $x_i$ and $x_j$. This arises from the decay in the entries $\left(A_{II}^{(l)-1}\right)_{rs}$ of the discrete Green's function associated with the elliptic equation on the subdomains, with increasing distance between the nodes $x_r$ and $x_s$. It suggests that a sparse approximation $M$ of $S$ may be effective as a preconditioner, provided the nonzero entries of $M$ approximate the dominant entries of $S$; this motivates choosing a band matrix $M$, say of band width $d$, to approximate $S$.

Nonzero entries of the band matrix $M$ can be determined by choosing probe vectors $p_l$, say for $1 \le l \le (2d+1)$, and requiring that the matrix vector products of $M$ with each probe vector $p_l$ match the matrix vector products of $S$ with the same probe vectors:

$$M p_l = S p_l, \quad \text{for } 1 \le l \le (2d+1).$$

If matrix $S$ is of size $k$, these requirements yield $k(2d+1)$ equations for the unknown entries of $M$. A careful choice of the probe vectors, based on the decay in the entries of $S$, can increase the accuracy of the probe approximation $M$, and also simplify the linear system for the nonzero entries of $M$. The resulting probing technique [CH13, KE7, CH9] does not require the explicit assembly of matrix $S$, but does require the computation of the matrix-vector products of $S$ with the chosen probe vectors. These products can be computed without the assembly of $S$, using the identity (3.20) with $S^{(i)} = A_{BB}^{(i)} - A_{IB}^{(i)T} A_{II}^{(i)-1} A_{IB}^{(i)}$, or the identity $S = A_{BB} - A_{IB}^T A_{II}^{-1} A_{IB}$.

In the following, we illustrate a specific choice of probe vectors to construct a tridiagonal approximation $M$ of $S$. In this case $2d+1 = 3$, and three probe vectors $p_1$, $p_2$ and $p_3$ will be sufficient. Choose:

$$\begin{cases} p_1 = (1, 0, 0, 1, 0, 0, \ldots)^T \\ p_2 = (0, 1, 0, 0, 1, 0, \ldots)^T \\ p_3 = (0, 0, 1, 0, 0, 1, \ldots)^T. \end{cases}$$

Equating $M p_l = S p_l$ for $l = 1, 2, 3$ yields $k$ equations per probe vector, from which all the nonzero entries of the tridiagonal matrix $M$ can be computed explicitly. An algorithm for constructing a tridiagonal matrix $M$ of size $k$ is summarized below. For an integer $i$, we employ the notation:

$$\mathrm{mod}(i,3) \equiv \begin{cases} 1, & \text{if } i = 3k+1, \text{ for some integer } k \\ 2, & \text{if } i = 3k+2, \text{ for some integer } k \\ 3, & \text{if } i = 3k, \text{ for some integer } k, \end{cases}$$

which denotes the remainder in the division of $i$ by $3$.

Algorithm 3.4.5 (Probe Tridiagonal Approximation of S)
Input: $S p_1$, $S p_2$, $S p_3$

1. For $i = 1, \ldots, k$ do:
2.    Let $j = \mathrm{mod}(i, 3)$
3.    $m_{ii} = (S p_j)_i$
4.    If $i < k$ define:
          $m_{i,i+1} \equiv (S p_{j+1})_i$
          $m_{i+1,i} \equiv (S p_j)_{i+1}$
      Endif
5. Endfor

Output: Tridiagonal matrix $M$.

As input, this algorithm requires three matrix vector products of the form $S p_j$ for $j = 1, 2, 3$; the computational cost of constructing a tridiagonal approximation $M$ of $S$ will thus essentially be proportional to the cost of computing three matrix-vector products with $S$.
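The reconstruction can be sketched as follows. The sketch below (illustrative sizes and a random tridiagonal test matrix, not from the text) accesses $S$ only through the three products $S p_l$ and recovers $S$ exactly, as guaranteed when $S$ is itself tridiagonal.

```python
import numpy as np

k = 10
rng = np.random.default_rng(5)
# A tridiagonal test matrix S; probing recovers it exactly in this case.
d, e = rng.standard_normal(k), rng.standard_normal(k - 1)
S = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)

# Probe vectors: p_l has ones at indices i with i % 3 == l (0-based analogue
# of the mod(i,3) convention in the algorithm above).
P = np.zeros((k, 3))
for l in range(3):
    P[l::3, l] = 1.0
SP = S @ P                    # the only access to S that probing needs

M = np.zeros((k, k))
for i in range(k):
    j = i % 3
    M[i, i] = SP[i, j]                      # m_ii from (S p_j)_i
    if i + 1 < k:
        M[i, i + 1] = SP[i, (j + 1) % 3]    # superdiagonal from (S p_{j+1})_i
        M[i + 1, i] = SP[i + 1, j]          # subdiagonal from (S p_j)_{i+1}

assert np.allclose(M, S)
```

The indexing works because, within any row of a tridiagonal matrix, the three nonzero columns $i-1$, $i$, $i+1$ fall in three distinct residue classes mod 3, so each probe product isolates exactly one of them.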
⎥ ⎢ m m m ⎥ ⎢ 0 0 1 ⎥ ⎢ m34 m32 m33 ⎥ ⎢ .. Algorithm 3. using the identity (3.
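A minimal Python sketch of Algorithm 3.4.5 may clarify the reconstruction (an illustration, not the book's code: indices are 0-based, so the cyclic probe index uses `% 3`, and the toy matrix S is formed explicitly only to generate the three products):

```python
def probe_tridiagonal(Sp, k):
    """Reconstruct a tridiagonal M (dict of entries) from the three products
    Sp[j] = S @ p_j, following Algorithm 3.4.5 with 0-based indices."""
    M = {}
    for i in range(k):
        j = i % 3                                 # 0-based analogue of mod(i, 3)
        M[(i, i)] = Sp[j][i]                      # diagonal entry m_ii
        if i + 1 < k:
            M[(i, i + 1)] = Sp[(j + 1) % 3][i]    # superdiagonal m_{i,i+1}
            M[(i + 1, i)] = Sp[j][i + 1]          # subdiagonal m_{i+1,i}
    return M

def matvec(S, x):
    return [sum(Sij * xj for Sij, xj in zip(row, x)) for row in S]

def probe_vectors(k):
    # p_j has ones at positions j, j+3, j+6, ...
    return [[1.0 if i % 3 == j else 0.0 for i in range(k)] for j in range(3)]

# Example: when S is exactly tridiagonal the reconstruction is exact (Remark 3.38).
k = 7
S = [[0.0] * k for _ in range(k)]
for i in range(k):
    S[i][i] = 2.0
    if i + 1 < k:
        S[i][i + 1] = -1.0
        S[i + 1][i] = -1.0
Sp = [matvec(S, p) for p in probe_vectors(k)]
M = probe_tridiagonal(Sp, k)
```

Only three products with S are ever formed, consistent with the cost estimate above.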

Remark 3.39. The reconstructed matrix M in the above algorithm may not be symmetric, even when S is symmetric. A reconstruction algorithm which enforces the symmetry of M, i.e., M_ij = M_ji, may be derived, but requires some care. Alternatively, given a nonsymmetric tridiagonal matrix M, a symmetric tridiagonal approximation M̃ may be obtained as:

   M̃_ij = max{M_ij, M_ji},  if i ≠ j
   M̃_ii = M_ii.

In the two subdomain case, a different probing technique [KE7] involving only two probe vectors can also be employed to construct a symmetric approximation M of S. For a model Laplacian on a rectangular grid with periodic boundary conditions on two boundary edges, the condition number of the tridiagonal probe approximation will satisfy cond(M, S) ≤ C h^{-1/2}, in comparison to cond(S) ≤ C h^{-1}. We omit further details.

The following result concerns the tridiagonal probe approximation based on three probe vectors.

Lemma 3.40. If S is an M-matrix, then the tridiagonal probe approximation M of S will also be an M-matrix. Furthermore, its symmetrization M̃ will also be an M-matrix.

Proof. See [CH9].

The tridiagonal probing procedure described above can easily be generalized to band matrices with larger bandwidths, and more generally to other sparsity patterns. Suppose G denotes the adjacency matrix representing the sparsity pattern desired for M. To construct an approximation M of S with the same sparsity pattern as G, the first step would be to determine a coloring or partitioning of the nodes into d colors, so that nodes of the same color are not adjacent in G; furthermore, if node i is adjacent to nodes j and k in G, then nodes j and k cannot be of the same color. Given such a coloring of the nodes, define d probe vectors p_1, ..., p_d so that p_j is one at all indices corresponding to the j'th color and zero at all other indices. The nonzero entries of M may then be reconstructed from the products S p_1, ..., S p_d, as in the tridiagonal case. Such approximations will be of interest primarily for multisubdomain decompositions [CA33].

We conclude our discussion on algebraic approximation of S with another approximation, based on an incomplete factorization of the matrices A_{II}^{(i)}. In the two subdomain case, the method employs an incomplete factorization A_{II}^{(i)} ≈ L̃_{II}^{(i)} L̃_{II}^{(i)T}, for i = 1, 2, of the subdomain stiffness matrices, to compute a low cost dense approximation M of S:

   S = A_{BB} - A_{IB}^{(1)T} A_{II}^{(1)-1} A_{IB}^{(1)} - A_{IB}^{(2)T} A_{II}^{(2)-1} A_{IB}^{(2)}
     ≈ A_{BB} - A_{IB}^{(1)T} L̃_{II}^{(1)-T} L̃_{II}^{(1)-1} A_{IB}^{(1)} - A_{IB}^{(2)T} L̃_{II}^{(2)-T} L̃_{II}^{(2)-1} A_{IB}^{(2)}     (3.67)
     ≡ M.
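The coloring-based generalization can be sketched as follows (a hedged illustration; the greedy coloring routine and all names are assumptions of this sketch). Nodes that are adjacent in G, or that share a common neighbor in G, receive distinct colors, so on the desired pattern each entry of S is exposed by exactly one probe product:

```python
def distance2_coloring(adj):
    """Greedy coloring: nodes that are adjacent, or that share a common
    neighbor in the pattern graph G, receive distinct colors."""
    n = len(adj)
    conflict = [set(adj[v]) for v in range(n)]
    for u in range(n):
        for v in adj[u]:
            for w in adj[u]:
                if v != w:
                    conflict[v].add(w)
    color = {}
    for v in range(n):
        used = {color[u] for u in conflict[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def probe_pattern(S, adj):
    """Reconstruct the entries of M on the pattern of G (plus the diagonal)
    from d probe products S p_c, one per color."""
    k = len(S)
    color = distance2_coloring(adj)
    d = max(color.values()) + 1
    probes = [[1.0 if color[i] == c else 0.0 for i in range(k)] for c in range(d)]
    Sp = [[sum(Sij * pj for Sij, pj in zip(row, p)) for row in S] for p in probes]
    M = {}
    for i in range(k):
        for j in sorted(set(adj[i]) | {i}):
            M[(i, j)] = Sp[color[j]][i]   # only column j of its color hits row i
    return M

# Periodic model pattern on a 6-cycle: S has exactly the pattern of G,
# so the probe reconstruction recovers it entry by entry.
k = 6
adj = [[(i - 1) % k, (i + 1) % k] for i in range(k)]
S = [[2.0 if i == j else (-1.0 if j in adj[i] else 0.0) for j in range(k)]
     for i in range(k)]
M = probe_pattern(S, adj)
```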

The approximation M in (3.67) will typically be dense. However, sufficiently small entries of M may be truncated to zero using a threshold parameter η > 0:

   M̃_ij ≡ 0,     if |M_ij| ≤ η (|M_ii| + |M_jj|)
   M̃_ij ≡ M_ij,  otherwise.                                   (3.68)

Threshold truncation of M will yield a sparse approximation M̃ of S, which can then be used as a preconditioner [CA33] for S. If the matrix S is of size k, then the cost of constructing M will typically be proportional to O(k²).

Lemma 3.41. If A is an M-matrix, then the approximation M in (3.67) will also be an M-matrix. Furthermore, if threshold truncation as in (3.68) is applied to the resulting dense matrix M, then the truncated approximation M̃ will also be an M-matrix.

Proof. See [CA33].

3.5 Preconditioners in Two Dimensions

The Schur complement matrix S associated with the discretization of elliptic equation (3.1) is typically more difficult to precondition for multisubdomain decompositions, and in higher dimensions. This difficulty can be attributed to the increasingly complex geometry of the interface B for a multisubdomain decomposition, and to the properties of the Steklov-Poincaré map on B. As the size h_0 of each subdomain decreases, the condition number of the multisubdomain Schur complement matrix increases from O(h^{-1}) to O(h^{-1} h_0^{-1}). However, effective preconditioners can be constructed, in particular by employing Schwarz subspace methods with suitable overlap between the blocks of S (see Chap. 3.7 on Neumann-Neumann preconditioners, and [BR24]). In this section, we shall describe the block Jacobi, BPS and vertex space preconditioners for a multisubdomain Schur complement matrix S associated with (3.1) on a domain Ω ⊂ IR².
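The threshold truncation (3.68) can be sketched in a few lines of Python (values below are illustrative):

```python
def threshold_truncate(M, eta):
    """Drop entries of a dense approximation M that are small relative to the
    corresponding diagonal entries, as in (3.68); returns a dict of kept entries."""
    k = len(M)
    Mt = {}
    for i in range(k):
        for j in range(k):
            if i == j or abs(M[i][j]) > eta * (abs(M[i][i]) + abs(M[j][j])):
                Mt[(i, j)] = M[i][j]
    return Mt

# Example: with eta = 0.1, the entry 0.01 <= 0.1 * (2 + 2) is dropped.
M = [[2.0, -1.0, 0.01],
     [-1.0, 2.0, -1.0],
     [0.01, -1.0, 2.0]]
Mt = threshold_truncate(M, 0.1)
```

The surviving sparse entries define M̃, which would then be factored and used as a preconditioner.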
Each of the preconditioners we describe will have the structure of an additive Schwarz subspace preconditioner from Chap. and to the properties of the Steklov-Poincar´e map on B. This diﬃculty can be attributed to the increasingly complex geometry of the interface B for a multisubdomain decomposition. 2. Proof. Furthermore. BPS and vertex space preconditioners for a multisubdomain Schur complement matrix S associated with (3. If A is an M -matrix. we shall describe the block Jacobi. suﬃciently small entries of M may be truncated to zero using a threshold parameter η > 0: ˜ 0. the Schur complement matrix will have zero block entries corresponding to nodes on disjoint subdomains ∂Ωi ∩∂Ωj = ∅.

Eq so that each El corresponds uniquely to a nonempty connected segment Bij .. 3.69) ⎣ SE1 Eq · · · SEq Eq SEq V ⎦ T SE 1V · · · SE T qV SVV . 3. ⎪ for 1 ≤ i ≤ p B ≡ ∪pi=1 B (i) ⎪ ⎩ Bij ≡ int B (i) ∩ B (j) . . with some chosen ordering within each edge El and cross-point set V. . Consider a ﬁnite element discretization of elliptic equation (3. . then the Schur complement matrix can be block partitioned as: ⎡ ⎤ SE1 E1 · · · SE1 Eq SE1 V ⎢ .156 3 Schur Complement and Iterative Substructuring Algorithms Ω1 Ω2 cross Gm point ⊗ vertex vl region Ω7 Ω8 an edge El Fig. . for 1 ≤ i. Eq . ⎥. Here int(B (i) ∩ B (j) ) refers to the interior of B (i) ∩ B (j) . V. · · · . The distinct edges will be enumerated as E1 . if a coarse space V0 is included. with subdomains of diameter h0 . Each connected and nonempty boundary segment Bij will be referred to as an edge.4. ⎥ ⎢ . We shall employ the notation: ⎧ (i) ⎨ B ≡ ∂Ωi \BD . 3. see [MA14]. . The interface B arising in the decomposition of a two dimensional domain can be partitioned based on edges and cross-points as follows: B = E1 ∪ · · · ∪ Eq ∪ V. Endpoints in B of open segments Bij will be referred to as vertices or cross-points. . . j ≤ p.. edges and cross-points will be globs. see Fig. ⎥ (3. . For Ω ⊂ IR2 . Ωp form a nonoverlapping box type decomposition of Ω ⊂ IR2 as in Fig. .4. If the indices of nodes on B are grouped and ordered based on the globs E1 . global transfer of information will be facilitated between the subdomains. and the collection of all vertices will be denoted V: V = B\ (E1 ∪ · · · ∪ Eq ). and this will reduce the dependence of the condition number on h0 . The term glob will refer to subregions of the interface which partition B. . .1) with Dirichlet boundary BD = ∂Ω. Let Ω1 .. ⎥ S=⎢ ⎢ T .4. A partition of Ω into 8 subdomains as h0 → 0+ .

Otherwise.1 Block Jacobi Preconditioner In two dimensions. . This is because in this case submatrix SEl El will correspond to a two subdomain Schur complement matrix. The block submatrices SEl V and SVV will typically have nonzero entries since there will be nodes in El adjacent to nodes in V. However. Since the Schur complement matrix S is not typically assembled in iterative substructuring methodology. Ωj and interface El .5. the diﬀerent submatrices of S in (3.69): ⎡ ⎤ SE1 E1 0 ⎢ . . by applying property (3. (3. ⎥ M =⎢ ⎢ ⎥. 3.70) for submatrix SEl El . This may be applied to yield: ⎡ ⎤T ⎡ (i) (i) ⎤−1 ⎡ ⎤ 0 AII 0 AIEl 0 ⎢ (j) ⎥ −1 = ⎣ 0 ⎦ ⎣ 0 AII AIEl ⎦ ⎣ 0 ⎦ . the submatrices SEl Ek will be zero.71) T T I (i) (j) AIEl AIEl AEl El rEl −1 so that the action of SE l El on a subvector can be computed at the cost of solv- ing the preceding linear system. arising from the decomposition of the subregion Ωi ∪ Ωj ∪ El into the subdomains Ωi . the action of submatrix SEl El on a subvector can be computed explicitly without assembly of SEl El when edge El = B (i) ∩ B (j) .72) i=1 using the interface restriction and extension matrices RG and RTG deﬁned in (3. In matrix form. .26) of Schur complement matrices. More generally. We now describe preconditioners. (3.. the block submatrices SEl Ek will be nonzero and dense. it can be noted that when edges El and Ek belong to a common subdomain boundary B (i) . such a preconditioner will correspond to the block diagonal of matrix (3. Since .5 Preconditioners in Two Dimensions 157 Here SEl Er . (j) SE l El rEl (3.69) will also not be assembled explicitly. This observation yields the formal expression: (i)T (i)−1 (i) (j)T (j)−1 (j) SEl El = AEl El − AIEl AII AIEl − AIEl AII AIEl . ⎥ ⎢ .19) between nodes on B and nodes on G = El or G = V. ⎥ ⎣ SEq Eq ⎦ 0 SVV The action of the inverse of the block Jacobi preconditioner satisﬁes: p −1 −1 M −1 ≡ RTEi SE i Ei REi + RTV SVV RV . 3. 
SEl V and SVV denote submatrices of S corresponding to indices in the respective globs. Eq and V. . a block Jacobi preconditioner for S can be deﬁned based on the partition of B into the globs E1 .

This observation. Approximation of SEl El . heuristically suggests replacing SVV by MVV = AVV . the diagonal blocks SEl El = REl SRTEl and SVV = RV SRTV of S must typically be approxi- −1 mated.5. Approximation of SVV .158 3 Schur Complement and Iterative Substructuring Algorithms the Schur complement matrix S will not be assembled.72) by a global coarse space residual correction term of the form RT0 S0−1 R0 . Due to its block diagonal structure. 3. DR14. and a ﬁve point stencil is employed. it can easily be veriﬁed that matrix SVV will be identical to the submatrix AVV of stiﬀness matrix A. and this results in a non-optimal convergence rate as h0 → 0. If M is the block Jacobi preconditioner and the subdomains are of diameter h0 . Such preconditioners must be scaled based on the coeﬃcient a(x) within the subdomains Ωi and Ωj . DR10]. Theorem 3. Neumann-Neumann. If there are nEl nodes on El then SEl El will be of size nEl . Proof. the interior solution in a rectangular subdomain will not depend on the nodal value on corner vertices. See [BR12. This does not require as- sembly of SEl El . and the subdomains are rectangular boxes. then there exists C > 0 independent of h0 and h: cond(M. this preconditioner can be obtained by replacing the local residual correction −1 term RTV SVV RV on the vertices V in the block Jacobi preconditioner (3. the action of SE l El on a vector can be computed exactly using (3. Alternate approximations MEl El of SEl El can be obtained by employing any two subdomain preconditioner for SEl El .2 BPS Preconditioner As with the block Jacobi preconditioner. the block Jacobi preconditioner does not globally exchange in- formation between the diﬀerent subdomains. We assume that the grid in quasi-uniform. FFT based or algebraic approximation based preconditioners. Formally.42. or alternatively. This is a consequence of the property that for ﬁve point stencils. fractional Sobolev norm. As a result. 
Choices of such preconditioners include Dirichlet-Neumann. the BPS preconditioner [BR12] also has the structure of a matrix additive Schwarz preconditioner for S. S) ≤ Ch−2 0 1 + log2 (h0 /h) . Since block submatrix SEl El corresponds to a two subdomain Schur −1 complement matrix by (3. If there are nV vertices in V then SVV will be of size nV .70) when edge El = B (i) ∩ B (j) . the block Jacobi preconditioner M ig- nores coupling between distinct edges and between the edges and the vertex set V. We shall . When Ω is a rectangular domain.71). The latter is easily seen to be diagonal. the action of the inverses of the submatrices SE l El −1 and SVV must be approximated. We outline how such approximations can be obtained. The block submatrix SVV can typically be approximated by a diagonal matrix based on the following heuristics.
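The bordered-system identity (3.71) — applying the inverse of an interface block of S by one solve with the assembled subdomain blocks, never forming the Schur complement — can be illustrated on a tiny example. All block values below are assumptions chosen for illustration (two 1D "subdomains" with two interior nodes each, sharing a single interface node):

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (for small examples)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

A_iI = [[2.0, -1.0], [-1.0, 2.0]]   # A_{II}^{(i)}
A_jI = [[2.0, -1.0], [-1.0, 2.0]]   # A_{II}^{(j)}
a_iE = [0.0, -1.0]                  # A_{I,El}^{(i)} (single column)
a_jE = [-1.0, 0.0]                  # A_{I,El}^{(j)} (single column)
r_E = 1.0                           # residual on the interface node

# Bordered system of (3.71), unknowns ordered (u_i, u_j, u_E); A_{El,El} = 2:
A_ext = [[2.0, -1.0,  0.0,  0.0,  0.0],
         [-1.0, 2.0,  0.0,  0.0, -1.0],
         [0.0,  0.0,  2.0, -1.0, -1.0],
         [0.0,  0.0, -1.0,  2.0,  0.0],
         [0.0, -1.0, -1.0,  0.0,  2.0]]
u = solve(A_ext, [0.0, 0.0, 0.0, 0.0, r_E])
u_E = u[4]                          # last component = S_{El,El}^{-1} r_E

# Cross-check against the assembled scalar Schur complement (3.70):
inv_quad = lambda A, v: sum(vi * wi for vi, wi in zip(v, solve(A, v)))
S_EE = 2.0 - inv_quad(A_iI, a_iE) - inv_quad(A_jI, a_jE)
assert abs(S_EE * u_E - r_E) < 1e-12
```

The same pattern scales to many interface unknowns: one sparse solve with the bordered matrix replaces any explicit assembly of S_{El El}.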

will not be a matrix with zero-one entries. Ωp form a coarse triangulation Th0 (Ω) of Ω with elements of size h0 and nodes corresponding to the vertices in V. Consider a Poisson problem on a rectangular domain Ω partitioned . ⎥. we heuristically indicate why such an approximation can be em- ployed.74) φhn00 (x1 ) · · · φhn00 (xnB ). . The matrix S0 = R0 SRT0 associated with the coarse space is typically approximated by a coarse grid stiﬀness matrix A0 obtained by discretizing the underlying elliptic equation (3. φhn00 (x) the coarse grid nodal basis functions associated with these vertices. . and Dl was a suitably chosen diagonal matrix from (3. and the action of the inverse of the BPS preconditioner will have the form: q −1 M −1 ≡ RTEl SE l El REl + RT0 S0−1 R0 . . . given R0 . which have zero-one entries. the coarse space restriction matrix R0 is usually deﬁned when the subdomains Ω1 . Approximation of S E l E l .73) i=1 Unlike the restriction matrices REl and RV onto El and V. . . since the Schur complement matrix S is not assembled. and denote by φh1 0 (x). the matrix R0 whose row space deﬁnes the coarse space. Its transpose RT0 of size nB × n0 is an interpolation onto nodal values on B. Below. As with the block Jacobi preconditioner. where nB denotes the number of nodes on B.72). the coarse space restriction matrix R0 is deﬁned as the following n0 × nB matrix: ⎡ h0 ⎤ φ1 (x1 ) · · · φh1 0 (xnB ) ⎢ ⎥ R0 ≡ ⎢⎣ . xnB .64). In this case. Approximation of S 0 .73) can be replaced by any suitable two subdomain Schur complement preconditioner MEl El just as for the block Jacobi preconditioner (3. Suppose the nodes on B are enumerated as x1 . suitable approximations of the ma- trices SEl El = REl SRTEl and S0 = R0 SRT0 must be employed in the BPS preconditioner (3. 3. Below. we indicate various such approximations [BR12]. however. vn0 .1) on the coarse grid. In the original BPS algorithm. . the matrix S0 = R0 SRT0 will not be a submatrix of S. In applications. . 
Then.74). . When the region of support of Range(RT0 ) covers Ω. respectively. we enumerate the vertices in V as v1 .. . ⎦ (3. with a(k) corresponding to an evaluation of coeﬃcient a(x) at some point in Ωk .. . this can help reduce the dependence of the condition number of the preconditioned Schur complement matrix on h0 . · · · .5 Preconditioners in Two Dimensions 159 deﬁne R0 in (3. . The submatrix SEl El in (3. Heuristically. As a result. the residual correction term RT0 S0−1 R0 in the BPS preconditioner will transfer information globally between the diﬀerent subdo- mains. . SEl El was approximated by a preconditioner of the form (a(i) + a(j) )Ql Dl QTl where Ql was a discrete sine transform of size nEl . matrix S0 ≡ R0 SRT0 . . (3. .73).

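A one dimensional illustration of (3.74) may help: the rows of R_0 hold a coarse piecewise linear hat function evaluated at the fine interface nodes (all values below are assumptions of this sketch):

```python
def hat(v, h0, x):
    """Coarse piecewise-linear nodal basis function centered at vertex v,
    supported on (v - h0, v + h0)."""
    return max(0.0, 1.0 - abs(x - v) / h0)

def coarse_restriction(vertices, h0, nodes):
    """Rows of R0 as in (3.74): row i holds the i-th coarse basis function
    evaluated at every fine node."""
    return [[hat(v, h0, x) for x in nodes] for v in vertices]

# Coarse vertices with h0 = 0.25, fine nodes with h = 0.0625 on (0, 1).
vertices = [0.25, 0.5, 0.75]
nodes = [i * 0.0625 for i in range(1, 16)]
R0 = coarse_restriction(vertices, 0.25, nodes)

# Away from the boundary the coarse basis forms a partition of unity,
# so the columns of R0 there sum to one (node index 7 is x = 0.5):
col_sums = [sum(row[j] for row in R0) for j in range(len(nodes))]
```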
Let φ_l^{h0}(x) denote the coarse grid finite element nodal basis function centered at vertex v_l. Then the entries (A_0)_{ij} of the coarse space stiffness matrix A_0 will satisfy:

   (A_0)_{ij} = A(φ_i^{h0}, φ_j^{h0}).

For this geometry, each coarse grid function φ_l^{h0}(x) is linear within each subdomain, and thus also harmonic (and discrete harmonic) within each subdomain. Let u_B and w_B denote the nodal vectors representing the coarse space nodal basis functions φ_i^{h0}(x) and φ_j^{h0}(x) on B. Then, by (3.27), the vector representations of φ_i^{h0}(x) and φ_j^{h0}(x) on Ω will be given by ((E u_B)^T, u_B^T)^T and ((E w_B)^T, w_B^T)^T, where E ≡ -A_{II}^{-1} A_{IB} denotes the discrete harmonic extension map from B into the interior ∪_{i=1}^p Ω_i of the subdomains. Consequently, it will hold that:

   (S_0)_{ij} = w_B^T S u_B = ((E w_B)^T, w_B^T) A ((E u_B)^T, u_B^T)^T = A(φ_i^{h0}, φ_j^{h0}) = (A_0)_{ij}.

This yields that A_0 = S_0 for this geometry and choice of coefficients a(x). More generally, the matrix A_0 may be employed as an approximation of S_0. The following result concerns the condition number of the BPS preconditioner.

Theorem 3.43. Let T_h(Ω) denote a quasiuniform triangulation of Ω, and let the subdomains Ω_1, ..., Ω_p form a coarse triangulation of Ω of size h_0. Then there exists C > 0 independent of h_0 and h such that:

   cond(M, S) ≤ C (1 + log²(h_0/h)).

If a(·) is constant within each Ω_i, then C will also be independent of a(·).

Proof. See [BR12, DR14, DR10].

3.5.3 Vertex Space Preconditioner for S

From a theoretical viewpoint, the logarithmic growth factor (1 + log²(h_0/h)) in the condition number of the BPS preconditioner arises because the BPS preconditioner does not approximate the coupling in S between the different edges E_l of B. The vertex space preconditioner [SM3] extends the BPS preconditioner by including local residual correction terms based on overlapping segments of B, referred to as vertex regions. It includes local correction terms of the form R_{Gl}^T S_{Gl Gl}^{-1} R_{Gl}, involving nodal unknowns on regions G_l ⊂ B. By construction, a vertex region G_l is a star shaped connected subset of B that contains segments of length O(h_0) of all edges E_r emanating from vertex v_l, so that each local residual correction term approximates the coupling in S between the edges adjacent to that vertex.
The following result concerns the condition number of the BPS preconditioner. If a(·) is constant within each Ωi . it will hold that: . Theorem 3. By construction.

Matrix RGl will be of size nGl × nB when there are nGl nodes on vertex region Gl . one may approximate the action of −1 SG i Gi on a vector rGi as follows: T −1 −1 0 ADi Di ADi Gi 0 SG r i Gi G i ≈ . sparse approximations of SGi Gi can be computed eﬃciently using the probing technique.73). .5 Preconditioners in Two Dimensions 161 Formally. ∪ Gn0 ). we focus on the action of SG i Gi . . . Partition the nodes of Th (Ω) in Ωvi into those in Di ≡ Ωvi \Gi and those in Gi . MA37. In practice. (3.4. with an additional coarse space correction term. as deﬁned by (3. This will induce a block partitioning of the submatrix A(Ωvi ) of stiﬀness matrix A corresponding to all nodes in Ωvi : (Ωvi ) ADi Di ADi Gi A = . SM3].75) l=1 i=1 The resulting preconditioner has the structure of a matrix additive Schwarz preconditioner for S based on the overlapping decomposition: B = (E1 ∪ . Since matrix S is generally not assembled. . the vertex space preconditioner is obtained by adding the terms −1 RTGl SG l Gl RGl to the BPS preconditioner (3. . each matrix SGl Gl = RGl SRTGl will be a submatrix of S of size nGl corresponding to indices of nodes in Gl . 3. By construction. The vertex space preconditioner can be implemented like the BPS preconditioner. Consequently. yielding: q n0 −1 M −1 = RTEl SE l El REl + RT0 S0−1 R0 + −1 RTGi SG i Gi R Gi . Below. see Fig. it will be convenient to construct each vertex region Gl as the intersection of interface B with a subdomain Ωvl ⊃ vl of diameter O(h0 ) centered at vl . 3. Approximation of S Gi Gi . The following convergence bound will hold for the vertex space preconditioner. and have entries which are zero or one. by weighted sums of FFT based matrices.73). SGi Gi and S0 must be appropriately approximated to imple- ment the preconditioner. see [NE3.19). ∪ Eq ) ∪ V ∪ (G1 ∪ . of interface B. the matrices SEl El . and restriction matrix RGl will map a nodal vector on B to its subvector corresponding to nodes in Gl . 
Using the above block partitioned matrix, one may approximate the action of S_{Gi Gi}^{-1} on a vector r_{Gi} as follows:

   S_{Gi Gi}^{-1} r_{Gi} ≈ ⎡ 0 ⎤T ⎡ A_{Di Di}    A_{Di Gi} ⎤-1 ⎡ 0      ⎤
                           ⎣ I ⎦  ⎣ A_{Di Gi}^T  A_{Gi Gi} ⎦   ⎣ r_{Gi} ⎦

Alternatively, sparse approximations of S_{Gi Gi} can be computed efficiently using the probing technique [CH12, CA33], by weighted sums of FFT based matrices, or by the use of inexact factorizations; see [NE3, MA37, SM3]. The following convergence bound will hold for the vertex space preconditioner.

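The additive Schwarz structure shared by these preconditioners — summing zero-extended local corrections R_G^T S_{GG}^{-1} R_G r over globs, as in (3.75) — can be sketched as follows. This is a hedged toy example: globs are plain index lists, each local block is solved exactly, and the coarse term and inexact local solves of a practical implementation are omitted:

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (for small examples)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def apply_additive_schwarz(S, globs, r):
    """z = sum over globs G of R_G^T (R_G S R_G^T)^{-1} R_G r,
    with each glob given as a list of interface indices."""
    z = [0.0] * len(r)
    for G in globs:
        S_GG = [[S[i][j] for j in G] for i in G]   # S_GG = R_G S R_G^T
        u = solve(S_GG, [r[i] for i in G])         # local correction
        for idx, i in enumerate(G):
            z[i] += u[idx]                          # extend by zero and add
    return z

# Tiny illustration: tridiagonal "S" on 5 interface nodes, two overlapping globs.
S = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(5)]
     for i in range(5)]
z = apply_additive_schwarz(S, [[0, 1, 2], [2, 3, 4]], [1.0] * 5)
```

On the shared (overlapped) index 2, the two local corrections add, mirroring how overlapping vertex regions reinforce the edge terms.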
Theorem 3.44. If the diameter of the vertex subregions is β h_0, then the condition number of the vertex space preconditioned system will satisfy:

   cond(M, S) ≤ C_0 (1 + β^{-1}),

where C_0 > 0 is independent of h_0, h and β, but may depend on the variation of a(·). There also exists a constant C_1, independent of h_0, h and the jumps in a(·) (provided a(x) is constant on each subdomain Ω_i), such that:

   cond(M, S) ≤ C_1 (1 + log²(h_0/h)).

Proof. See [SM, DR10].

Thus, in the presence of large jumps in the coefficient a(·), the bounds for the condition number of the vertex space algorithm can deteriorate to (1 + log²(h_0/h)), the same growth as for the BPS preconditioner. However, effective preconditioners can be constructed (see in particular Chap. 3.7 on Neumann-Neumann preconditioners) by employing Schwarz subspace methods with more overlap between the blocks of S.

3.6 Preconditioners in Three Dimensions

The Schur complement matrix S for a three dimensional multi-subdomain decomposition is more difficult to precondition than in two dimensions. This difficulty arises due to the more complex geometry of the interface B in three dimensions. Our discussion of three dimensional preconditioners will focus on several block Jacobi preconditioners for S, a vertex space preconditioner, and a parallel wirebasket preconditioner.

We consider a decomposition of Ω ⊂ IR³ into p non-overlapping box type or tetrahedral subdomains Ω_1, ..., Ω_p, having diameter h_0. Typically, these subdomains will be assumed to form a quasi-uniform coarse triangulation T_{h0}(Ω) of Ω. For the Dirichlet boundary value problem (3.1) with B_D = ∂Ω, we let B^{(i)} ≡ ∂Ω_i \ B_D denote the non-Dirichlet segment of ∂Ω_i, and define the interface as B = B^{(1)} ∪ ... ∪ B^{(p)}.

The different additive Schwarz matrix preconditioners we shall consider for S will be based on a decomposition of the interface B into subregions of B referred to as globs [MA14]. They will typically be well defined for tetrahedral or box type subdomains:

   F_ij    ≡ int(B^{(i)} ∩ B^{(j)})
   W^{(i)} ≡ B^{(i)} ∩ (∪_{j≠i} ∂F_ij)
   W       ≡ W^{(1)} ∪ ... ∪ W^{(p)}.

Here F_ij = int(B^{(i)} ∩ B^{(j)}) denotes the interior of the region B^{(i)} ∩ B^{(j)}; each such face will be two dimensional, and is referred to as a face of Ω_i when it is nonempty. The subregion W^{(i)} of B^{(i)} is referred to as the local wirebasket of Ω_i.
we let B (i) ≡ ∂Ωi \BD denote the non-Dirichlet segment of ∂Ωi . Our discussion of three dimensional preconditioners will focus on several block Jacobi preconditioners for S.44. The diﬀerent additive Schwarz matrix preconditioners we shall consider for S will be based on a decomposition of the interface B into the following subregions of B referred to as globs [MA14]. Ωp . having diameter h0 . · · · . h and β. S) ≤ C1 1 + log2 (h0 /h) . S) ≤ C0 (1 + β −1 ).

Edges and vertices can be expressed formally as: ⎧ ⎨ Eijk ≡ int F ij ∩ F ik ⎪ V ≡ W \ (∪i. Fq where q denotes the total number of faces. as in two dimensions. Fl will correspond each face uniquely to some nonempty intersection int B (i) ∩ B (j) . as deﬁned in (3. In practice. . Given a glob G ⊂ B containing nG nodes of Ωh .k Eijk ) ⎪ ⎩ = {vl : vl ∈ V}.j. . The union of all local wirebaskets is referred to as the global wirebasket.45. Er where r denotes the total number of such edges. .6 Preconditioners in Three Dimensions 163 Vertex Gk vk ⊗ ⊗ @ @ @ @ Edge @ @ El @ @ Face @ @ @ @@ Fij . The collection of all vertices will be denoted V. Deﬁnition 3. it will be convenient to decompose the wirebaskets into smaller globs. We deﬁne vertices as endpoints of edges.19). @ @ @ @ @ Ωi Ωi El Ωi Fig. 3. Boundary segments and vertex regions for three dimensional subdomains wirebasket of Ωi and is the union of the boundaries ∂Fij of all faces of Ωi .5 for an individual subdomain. each wirebasket will be connected and the union of several one dimensional segments. The entries of these glob based restriction and extension matrices will be zeros or ones. we may partition the interface B into the following globs: B = F1 ∪ · · · ∪ Fq ∪ W. . . . . The above mentioned subregions are indicated in Fig. By deﬁnition. we shall enumerate all the faces in B as F1 . We deﬁne an edge as a maximal line segment of a local wirebasket. and we enumerate all the nonempty edges as E1 . 3. we let RG ≡ RG RB T denote a restriction matrix of size nB × nG which restricts a nodal vector on B to a subvector corresponding to nodes on G. and by construction. . Typically. . By deﬁnition each edge will be open. 3. vn0 . . Similarly. Its transpose RTG ≡ RB RG T will extend a vector of nodal values on G to a vector of nodal values on B (extension by zero).5. where n0 will denote the total number of vertices in W . In applications. we enumerate all the vertices as v1 . . . . homeomorphic to an open interval.

⎥.. ⎥ ⎣ SFq Fq ⎦ 0 SW W In terms of restriction and extension matrices. which we shall denote as RW (i) W . If nGl denotes the number of nodes on glob Gl . . ..46. (3. Its transpose RTW (i) W ≡ RW RW T (i) will (i) extend a vector of nodal values on W to a vector of nodal values on W (extension by zero). 3.1 Block Jacobi Preconditioner for S We ﬁrst describe a block Jacobi preconditioner based on the decomposition of B into the faces F1 . Fq and the wirebasket W : B = F1 ∪ · · · ∪ Fq ∪ W. . . Deﬁnition 3. . . ⎥ ⎢ .6. .76) l=1 where SFl Fl = RFl SRTFl and SW W = RW SRTW are submatrices of S corre- sponding to indices in Fl and W . This nonoverlapping decomposition of B induces a block partition of S as: ⎡ ⎤ SF1 F1 · · · SF1 Fq SF1 W ⎢ . we will on occasion employ an additional restriction map. . The block Jacobi preconditioner will be the block diagonal part of S: ⎡ ⎤ SF1 F1 0 ⎢ .164 3 Schur Complement and Iterative Substructuring Algorithms In the three dimensional case. ⎥ M =⎢ ⎢ ⎥. As with the other Schwarz preconditioners for S. ⎥ ⎣ SF1 Fq · · · SFq Fq SFq W ⎦ SFT1 W · · · SFTq W SW W corresponding to indices of nodes within the chosen subregions of B. then SGi Gj will denote a submatrix of S of size nGi × nGj corresponding to the nodes on glob Gi and Gj . in practice the submatrices SFl Fl and SW W of S must be replaced by suitable approximations since S will typically not be assembled. Let RW (i) W ≡ RW (i) RW T denote the matrix which restricts a vector of nodal values on the global wirebasket W into a subvector of nodal values on the local wirebasket W (i) . ⎥ ⎢ . Various alternative approximations may be chosen for such approximations... . . ⎥ S=⎢ ⎢ T . the action of the inverse M −1 the block Jacobi preconditioner will be: q M −1 = RTFl SF−1 l Fl −1 RFl + RTW SW W RW ..

for the seven point stencil. and a seven point stencil is used for the ﬁnite element discretization of (3. -T (i) (i) −1 . 3. Replacing the submatrices SFl Fl and SW W by the preceding approximations will yield an approximate block Jacobi preconditioner for S. at most seven entries of the form (AW W )ij will be nonzero when xi ∈ V. Alternative preconditioners for SFl Fl may be obtained using algebraic approximation of SFl Fl based on generalization of the tridiagonal probing procedure [KE7. An approximation of SW W = RW SRTW can be based on the following heuristic observation. and the subdomains are boxes.47.6 Preconditioners in Three Dimensions 165 Approximation of S F l F l . CH9] or ILU [CA33]. If the triangulation of Fl induced by Ωh can be mapped bijectively into a rectangular grid. Indeed. then we may employ an FFT based preconditioner of the form SFl Fl ≈ (a(i) + a(j) )QDQT where Q is a two dimensional fast sine trans- form and D is a diagonal matrix approximating the eigenvalues of a reference Schur complement matrix SˆFl Fl associated with a three dimensional cubical domain partitioned into two strips [RE].1). I AIFl AFl Fl rFl A Neumann-Neumann preconditioner will also approximate SFl Fl . (j) l Fl rFl (3. AW W may still be used as an approximation of SW W . Alternatively. the piecewise discrete harmonic extension of nonzero nodal values on W and zero nodal values on B\W will be zero in the interior of the subdomains. Ωj and Fl . while at most three entries of the form (AW W )ij will . Here a(l) denotes the coeﬃcient a(x) evaluated at a sample point of Ωl . then SFl Fl will corre- spond to a two subdomain Schur complement associated with the partition of Ωi ∪ Fl ∪ Ωj into Ωi and Ωj .26). This can be veriﬁed by using the property that for seven point stencils the nodal values on the wire- basket will not inﬂuence the interior Dirichlet solution in a box subdomain. When the geometry of the subdomains is more general. As a consequence. 
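The stencil heuristic above has a two dimensional five point analogue that is easy to check numerically (this is a toy verification; the 4 × 4 grid and boundary values are assumptions): no interior node of a box is coupled to a corner node, so perturbing only the corner boundary values leaves the interior Dirichlet solution unchanged — which is why S_{VV} = A_{VV} in 2D, and S_{WW} = A_{WW} for the seven point stencil in 3D.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (for small examples)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def interior_solution(boundary):
    """Solve the 5-point Laplace equations on the 2x2 interior of a 4x4 grid,
    given boundary values as a dict {(i, j): value} on the outer ring."""
    interior = [(1, 1), (1, 2), (2, 1), (2, 2)]
    index = {p: k for k, p in enumerate(interior)}
    A = [[0.0] * 4 for _ in range(4)]
    b = [0.0] * 4
    for (i, j), k in index.items():
        A[k][k] = 4.0
        for q in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if q in index:
                A[k][index[q]] = -1.0
            else:
                b[k] += boundary[q]     # corner nodes are never 4-adjacent here
    return solve(A, b)

ring = {(i, j): 1.0 for i in range(4) for j in range(4)
        if i in (0, 3) or j in (0, 3)}
u1 = interior_solution(ring)
ring2 = dict(ring)
for corner in ((0, 0), (0, 3), (3, 0), (3, 3)):
    ring2[corner] = 100.0               # perturb only the corner values
u2 = interior_solution(ring2)
assert u1 == u2                          # interior solution is corner-independent
```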
If F_l = int(B^{(i)} ∩ B^{(j)}), then S_{Fl Fl} will correspond to a two subdomain Schur complement matrix, associated with the partition of Ω_i ∪ F_l ∪ Ω_j into Ω_i and Ω_j. Consequently, the action S_{Fl Fl}^{-1} r_{Fl} may be computed exactly as follows:

   S_{Fl Fl}^{-1} r_{Fl} = ⎡ 0 ⎤T ⎡ A_{II}^{(i)}      0                A_{I Fl}^{(i)} ⎤-1 ⎡ 0      ⎤
                           ⎢ 0 ⎥  ⎢ 0                 A_{II}^{(j)}     A_{I Fl}^{(j)} ⎥   ⎢ 0      ⎥     (3.77)
                           ⎣ I ⎦  ⎣ A_{I Fl}^{(i)T}   A_{I Fl}^{(j)T}  A_{Fl Fl}      ⎦   ⎣ r_{Fl} ⎦

where the above blocks are submatrices of A corresponding to indices of nodes in Ω_i, Ω_j and F_l. A Neumann-Neumann preconditioner will also approximate S_{Fl Fl}, for instance based on subdomain Ω_i:

   S_{Fl Fl}^{-1} r_{Fl} ≈ ⎡ 0 ⎤T ⎡ A_{II}^{(i)}     A_{I Fl}^{(i)}  ⎤-1 ⎡ 0      ⎤
                           ⎣ I ⎦  ⎣ A_{I Fl}^{(i)T}  A_{Fl Fl}^{(i)} ⎦   ⎣ r_{Fl} ⎦

When the geometry of the subdomains is more general, A_{WW} may still be used as an approximation of S_{WW}. Efficient sparse solvers may be employed to solve systems of the form A_{WW} u_W = r_W, since A_{WW} will typically be sparse; in practice, the bandwidth of A_{WW} will depend on the ordering of the nodes within W.

Remark 3.47. For the seven point stencil, at most seven entries of the form (A_{WW})_{ij} will be nonzero in a row when x_i ∈ V, while at most three entries of the form (A_{WW})_{ij} will be nonzero when x_i ∈ E_l.

Lemma 3.48. The condition number of the Schur complement matrix preconditioned by the block Jacobi preconditioner (3.76) satisfies:

   cond(M, S) ≤ C h_0^{-2} (1 + log(h_0/h))²,

for some C > 0 independent of h and h_0.

Proof. See [BR15, DR10].

As the preceding result indicates, the convergence rate of the block Jacobi preconditioner (3.76) deteriorates as the subdomain size h_0 becomes small. This deterioration arises primarily because this block Jacobi preconditioner exchanges information only locally, within the chosen diagonal blocks in the block partition of S. The convergence rate can, however, be improved by including some global transfer of information.

We next describe two variants of the block Jacobi preconditioner (3.76) incorporating coarse space correction [DR10]. In practice, the wirebasket W can be further decomposed into the edges E_1, ..., E_r and the vertex set V, and the action of R_W^T S_{WW}^{-1} R_W can be approximated by the following matrix additive Schwarz preconditioner:

   R_W^T S_{WW}^{-1} R_W ≈ Σ_{l=1}^r R_{El}^T A_{El El}^{-1} R_{El} + R_V^T A_{VV}^{-1} R_V.

A variant of the block Jacobi preconditioner employs such an approximation. To obtain the first variant, we substitute the approximation:

   R_W^T S_{WW}^{-1} R_W ≈ Σ_{l=1}^r R_{El}^T S_{El El}^{-1} R_{El} + R_V^T S_{VV}^{-1} R_V,

into (3.76), and replace the local correction term R_V^T S_{VV}^{-1} R_V on the vertices V by a coarse space correction term R_0^T S_0^{-1} R_0, to obtain the preconditioner:

   M^{-1} = Σ_{l=1}^q R_{Fl}^T S_{Fl Fl}^{-1} R_{Fl} + Σ_{l=1}^r R_{El}^T S_{El El}^{-1} R_{El} + R_0^T S_0^{-1} R_0.     (3.78)

Here, the coarse space restriction matrix R_0 is defined analogously to (3.74), with coarse grid nodal basis functions φ_i^{h0}(x) corresponding to each vertex v_i ∈ V, and S_0 = R_0 S R_0^T. A second variant is obtained by additionally retaining the local correction term R_V^T S_{VV}^{-1} R_V, yielding:

   M^{-1} = Σ_{l=1}^q R_{Fl}^T S_{Fl Fl}^{-1} R_{Fl} + Σ_{l=1}^r R_{El}^T S_{El El}^{-1} R_{El} + R_V^T S_{VV}^{-1} R_V + R_0^T S_0^{-1} R_0.     (3.79)

The resulting preconditioners satisfy the following bounds.
l=1 A variant of the block Jacobi preconditioner employs such an approximation. See [BR15. This convergence rate.78) l=1 l=1 Here. and S0 = R0 SRT0 .76) incorporating coarse space correction [DR10].

Approximation of S E l E l . As we have already described approximations of SFi Fi . Proof. the scaling factor for SEl El must be proportional to h−2 instead of h. The vertex space preconditioner [SM3] for S. For smooth coeﬃcients. but may depend on the coeﬃcient a(x). and may be approximated as follows: (SVV )ii ≈ h σi .79) satisﬁes: cond(M. S) ≤ C2 (1 + log(h0 /h))2 . for ﬁnite element discretizations.79) will yield better convergence than preconditioner (3. For ﬁnite diﬀerence discretizations. The preconditioner M in (3.78) satisﬁes the bound: h0 cond(M.78). h and jumps in the coeﬃcient a(x). See [DR10]. SEl El . The coarse space matrix S0 = R0 ART0 can be ap- proximated by A0 as in two dimensions. we shall only focus on the other terms. To obtain a heuristic approximation of SEl El . Approximation of S VV . h while the bound for the preconditioner M in (3.50. where C1 is independent of h0 . where σi denotes a suitably weighted average of the coeﬃcients a(·) in subdomains adjacent to vertex vi .6 Preconditioners in Three Dimensions 167 Theorem 3. due to elimination of (h0 /h).6. To obtain an approximation of SVV . 3. For ﬁnite diﬀerence schemes.49. SVV and S0 must be replaced by suitable approximations since S is not assembled in practice. we approximate SW W ≈ AW W as described earlier to obtain SEl El ≈ AEl El . It is easily veriﬁed that the edge matrix AEl El will be well conditioned and may eﬀectively be replaced by a suitably scaled multiple of the identity matrix: SEl El ≈ h σEl IEl . As with the other matrix Schwarz preconditioners for S. the submatrices SFi Fi . the scaling factor must be h−2 instead of h. where σEl represents the average of the coeﬃcients a(·) in the subdomains adjacent to edge El . The submatrix AVV will also be diagonal.2 Vertex Space Preconditioner for S The diﬀerent variants of the block Jacobi preconditioner are nonoptimal. 3. incorporates some of this coupling by including . preconditioner (3. 
This arises due to the elimination of the oﬀ diagonal blocks in S. S) ≤ C1 (1 + log(h0 /h))2 . Approximation of S 0 . Remark 3. again we em- ploy the approximation SW W ≈ AW W to obtain SVV ≈ AVV . while C2 is independent of h0 and h.
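As a toy numerical illustration (my own construction, not from the text: a random SPD matrix stands in for the assembled Schur complement, and disjoint index sets stand in for the face, edge and vertex globs), the matrix additive Schwarz form M^{-1} = Σ_i R_i^T (R_i S R_i^T)^{-1} R_i shared by these block preconditioners can be assembled and its preconditioned spectrum inspected:

```python
import numpy as np

# Toy sketch of the matrix additive Schwarz / block Jacobi structure:
#   M^{-1} = sum_i R_i^T (R_i S R_i^T)^{-1} R_i
# with zero-one restrictions R_i onto disjoint index sets ("globs").
rng = np.random.default_rng(0)
n = 12
G = rng.standard_normal((n, n))
S = G @ G.T + n * np.eye(n)                       # SPD stand-in for S

globs = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
Minv = np.zeros((n, n))
for idx in globs:
    R = np.zeros((idx.size, n))
    R[np.arange(idx.size), idx] = 1.0             # zero-one restriction matrix
    Minv += R.T @ np.linalg.inv(R @ S @ R.T) @ R  # add R_i^T S_ii^{-1} R_i

L = np.linalg.cholesky(S)                         # S = L L^T
eigs = np.linalg.eigvalsh(L.T @ Minv @ L)         # spectrum of M^{-1} S
cond_MS = eigs.max() / eigs.min()
```

The generalized eigenvalues of M^{-1}S are all positive, since each term is a symmetric congruence and the blocks cover every index; the size of cond_MS is what the theory above bounds.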

It includes subspace correction terms on overlapping globs containing segments of the faces adjacent to each vertex v_k and to each edge E_l, yielding improved bounds, see [SM3, MA38]. Additionally, a coarse space correction term based on a coarse space is employed.

The three dimensional vertex space preconditioner is based on an overlapping extension of the following partition of B:

   B = (F_1 ∪ · · · ∪ F_q) ∪ (E_1 ∪ · · · ∪ E_r) ∪ (v_1 ∪ · · · ∪ v_{n_0}).

Each edge E_l is extended to a glob E_l' which includes segments of all faces adjacent to this edge. Formally, a cylindrical subdomain Ω_{E_l} ⊃ E_l of width O(h_0) is employed to define E_l' ≡ Ω_{E_l} ∩ B; see Fig. 3.5 for segments of E_l' within a subdomain Ω_i. Similarly, each vertex v_k is extended to a glob G_k of width O(h_0) containing segments of all faces adjacent to vertex v_k. Formally, a domain Ω_{v_k} ⊃ v_k of size O(h_0) centered about vertex v_k is employed to define the glob G_k ≡ B ∩ Ω_{v_k}. A section of glob G_k restricted to subdomain Ω_i is illustrated in Fig. 3.5. The overlapping decomposition of B employed in the vertex space preconditioner can thus be expressed in terms of F_l, E_i' and G_k:

   B = (F_1 ∪ · · · ∪ F_q) ∪ (E_1' ∪ · · · ∪ E_r') ∪ (G_1 ∪ · · · ∪ G_{n_0}).

Corresponding to each glob, we define the restriction maps R_{F_l}, R_{E_i'} and R_{G_k}, which restrict a vector of nodal values on B to the nodes on F_l, E_i' and G_k, respectively. Such restriction maps are defined by (3.19) with zero-one entries, so that S_{F_l F_l} = R_{F_l} S R_{F_l}^T, S_{E_i' E_i'} = R_{E_i'} S R_{E_i'}^T and S_{G_k G_k} = R_{G_k} S R_{G_k}^T are submatrices of S corresponding to the indices of nodes on F_l, E_i' and G_k. Additionally, R_0 will denote a coarse space matrix defined by (3.74), and S_0 = R_0 S R_0^T. The action M^{-1} of the vertex space preconditioner is then:

   M^{-1} = Σ_{l=1}^q R_{F_l}^T S_{F_l F_l}^{-1} R_{F_l} + Σ_{i=1}^r R_{E_i'}^T S_{E_i' E_i'}^{-1} R_{E_i'} + Σ_{k=1}^{n_0} R_{G_k}^T S_{G_k G_k}^{-1} R_{G_k} + R_0^T S_0^{-1} R_0.   (3.80)

As with the other matrix Schwarz preconditioners for S, the matrices S_{F_l F_l}, S_{E_i' E_i'}, S_{G_k G_k} and S_0 must be approximated without explicit construction of S. We outline below such approximations.

Approximation of S_{F_l F_l}. The action of S_{F_l F_l}^{-1} on a vector can either be computed exactly or approximately, as described for the block Jacobi preconditioners.
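Such actions can be applied without ever forming the local Schur complement, using a standard identity: for a partitioned SPD matrix, the trailing block of the inverse equals the inverse of the Schur complement. The sketch below (toy random SPD matrix as a stand-in, not from the text) verifies that one solve with the full partitioned matrix applies the inverse Schur complement on the second block:

```python
import numpy as np

# For A = [ADD ADE; ADE^T AEE] SPD, the (E,E) block of A^{-1} equals
# S^{-1} with S = AEE - ADE^T ADD^{-1} ADE, so
#   S^{-1} r = [0 I] A^{-1} [0; r]
# is one padded solve with A.
rng = np.random.default_rng(1)
nD, nE = 6, 3
G = rng.standard_normal((nD + nE, nD + nE))
A = G @ G.T + (nD + nE) * np.eye(nD + nE)         # SPD stand-in
ADD, ADE, AEE = A[:nD, :nD], A[:nD, nD:], A[nD:, nD:]
S = AEE - ADE.T @ np.linalg.solve(ADD, ADE)       # Schur complement

r = rng.standard_normal(nE)
w = np.linalg.solve(A, np.concatenate([np.zeros(nD), r]))[nD:]
residual = np.linalg.norm(S @ w - r)              # w solves S w = r
```

This identity is exact; the "approximation" in the glob solves below comes only from replacing A by the submatrix associated with a local domain around the glob.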

Approximation of S_{E_l E_l}. The action of S_{E_l E_l}^{-1} on a vector r_{E_l} can be approximated as follows. Given the domain Ω_{E_l} such that E_l' = B ∩ Ω_{E_l}, partition the nodes in Ω_{E_l} into D_l ≡ Ω_{E_l} \ E_l' and E_l'. Let A(Ω_{E_l}) denote the submatrix of A corresponding to the indices of nodes in D_l and E_l'. Then, the action of S_{E_l E_l}^{-1} may be approximated as:

   S_{E_l E_l}^{-1} r_{E_l} ≈ [0 I] [A_{D_l D_l}  A_{D_l E_l}; A_{D_l E_l}^T  A_{E_l E_l}]^{-1} [0; r_{E_l}].

Alternative approximations of S_{E_l E_l} can be constructed based on extensions of the probing technique or based on inexact Cholesky factorizations.

Approximation of S_{G_k G_k}. The action of S_{G_k G_k}^{-1} on a vector r_{G_k} can be approximated as follows. Let Ω_{v_k} denote a domain of width O(h_0) such that G_k = B ∩ Ω_{v_k}. Partition the nodes in Ω_{v_k} into H_k ≡ Ω_{v_k} \ G_k and G_k, and let A(Ω_{v_k}) denote the submatrix of A corresponding to the nodes in H_k and G_k. Then, the action of S_{G_k G_k}^{-1} may be approximated as:

   S_{G_k G_k}^{-1} r_{G_k} ≈ [0 I] [A_{H_k H_k}  A_{H_k G_k}; A_{H_k G_k}^T  A_{G_k G_k}]^{-1} [0; r_{G_k}].

Alternative matrix approximations of S_{G_k G_k} may be constructed based on extensions of the probing technique or inexact Cholesky decomposition.

Approximation of S_0. The coarse space matrix S_0 = R_0 S R_0^T can be approximated by the coarse grid stiffness matrix A_0 as in the two dimensional case.

Theorem 3.51. The rate of convergence of the vertex space preconditioner will be of optimal order provided the globs {E_l'} and {G_k} have sufficient overlap of size β h_0, when the coefficient a(·) is smooth. There exists C_1 > 0 independent of h_0 and h, but depending on the coefficients a(·), such that:

   cond(M, S) ≤ C_1 (1 + log^2(β^{-1})).

If the coefficient a(·) is constant on each subdomain, but has large jumps across subdomains, then the above bound deteriorates to:

   cond(M, S) ≤ C_2(β) (h_0/h).

Proof. See [SM2, SM3, DR10].

3.6.3 A Parallel Wirebasket Preconditioner for S

Wirebasket methods for the Schur complement S are preconditioners which employ special coarse spaces, based on the wirebasket region of the interface [BR14, BR15, SM2, MA12, DR3, DR10].

These preconditioners are typically formulated to yield robust convergence in the presence of large jump discontinuities in the coefficient a(x). Due to a weaker discrete Sobolev inequality holding for traditional coarse spaces in three dimensions, theoretical bounds for the latter two preconditioners deteriorate in the presence of large jump discontinuities in a(x). With the use of an appropriately chosen wirebasket coarse space, improved bounds can be obtained. Like traditional coarse spaces, wirebasket coarse spaces help transfer information globally between different subdomains, but involve significantly more unknowns, with rates of convergence that compare favorably with those for the block Jacobi and vertex space preconditioners.

The parallel wirebasket preconditioner [SM2] we describe has the form of a matrix additive Schwarz preconditioner for S. Like the preconditioner (3.76), it is based on a partition of the interface into faces and the wirebasket:

   B = F_1 ∪ · · · ∪ F_q ∪ W.

However, unlike (3.76), which employs a local correction term R_W^T S_{WW}^{-1} R_W corresponding to the nodes on the wirebasket region W, where R_W is a pointwise nodal restriction matrix with zero-one entries, the parallel wirebasket preconditioner employs a coarse space correction term of the form I_W^T S_{WB}^{-1} I_W based on a weighted restriction matrix I_W whose rows span the wirebasket coarse space. The wirebasket preconditioner is obtained by formally replacing the term R_W^T S_{WW}^{-1} R_W in (3.76) by the wirebasket coarse space correction term I_W^T S_{WB}^{-1} I_W, where S_{WB} ≡ I_W S I_W^T:

   M^{-1} = Σ_{i=1}^q R_{F_i}^T S_{F_i F_i}^{-1} R_{F_i} + I_W^T S_{WB}^{-1} I_W.   (3.81)

If n_W and n_B denote the number of nodes on the wirebasket region W and the interface B, respectively, then I_W will be a matrix of size n_W × n_B and S_{WB} will be a symmetric positive definite matrix of size n_W. Once the coarse space given by Range(I_W^T) has been defined, to ensure that linear systems of the form M_{WB} u_W = r_W can be solved efficiently within the wirebasket preconditioner, a suitable matrix approximation M_{WB} ≈ S_{WB} ≡ I_W S I_W^T must also be specified, together with an efficient algebraic solver for the resulting coarse problems. We shall describe I_W^T and M_{WB} in the following.

We first define the extension map I_W^T. Let ∂F_l ⊂ W denote the boundary segment of face F_l and let n_{∂F_l} denote the number of nodes on ∂F_l. By definition, the wirebasket extension map I_W^T is defined as the following n_B × n_W matrix:

   (I_W^T v_W)_i = (v_W)_i, if x_i ∈ W,
   (I_W^T v_W)_i = (1/n_{∂F_l}) Σ_{j: x_j ∈ ∂F_l} (v_W)_j, if x_i ∈ F_l,   (3.82)

where x_i is a node with index i in the local ordering of nodes on B. Thus, the extension I_W^T v_W equals the average nodal value of v_W on ∂F_l when node x_i ∈ F_l.

It can thus be verified that its transpose I_W satisfies:

   (I_W v_B)_i = (v_B)_i + Σ_{k: x_i ∈ ∂F_k} (1/n_{∂F_k}) Σ_{j: x_j ∈ F_k} (v_B)_j,   (3.83)

which yields a weighted combination of the nodal values of v_B on B. Since the Schur complement S is not assembled in iterative substructuring, the matrices S_{F_l F_l} = R_{F_l} S R_{F_l}^T and S_{WB} ≡ I_W S I_W^T must be approximated in practice. Symmetric positive definite approximations of the submatrices S_{F_l F_l} of S have already been described in the section on block Jacobi preconditioners, and so will not be described further. A symmetric positive definite approximation M_{WB} of S_{WB} and an associated algebraic solver for linear systems of the form M_{WB} v_W = r_W will be formulated in the remainder of this subsection.

To construct a heuristic approximation M_{WB} of S_{WB}, we consider the subassembly identity for the Schur complement matrix:

   S = Σ_{i=1}^p R_{B^{(i)}}^T S^{(i)} R_{B^{(i)}}.   (3.84)

Substituting this identity into S_{WB} = I_W S I_W^T yields:

   S_{WB} = Σ_{i=1}^p (I_W R_{B^{(i)}}^T) S^{(i)} (R_{B^{(i)}} I_W^T).   (3.85)

Remark 3.52. Indeed, it can be verified that the extension (interpolation) map I_W^T acts locally on each subdomain boundary, i.e., the nodal values of I_W^T v_W on each subdomain boundary B^{(i)} can be expressed solely in terms of the nodal values of v_W on the wirebasket W^{(i)}, yielding the following identity on each boundary B^{(i)}:

   E_{B^{(i)} W^{(i)}} v_{W^{(i)}} = R_{B^{(i)}} I_W^T v_W,

where E_{B^{(i)} W^{(i)}} v_{W^{(i)}} is defined next:

   (E_{B^{(i)} W^{(i)}} v_{W^{(i)}})_k = (v_{W^{(i)}})_k, if x_k ∈ W^{(i)},
   (E_{B^{(i)} W^{(i)}} v_{W^{(i)}})_k = (1/n_{∂F_l}) Σ_{j: x_j ∈ ∂F_l} (v_{W^{(i)}})_j, if x_k ∈ F_l ⊂ B^{(i)}.

Thus E_{B^{(i)} W^{(i)}} R_{W^{(i)} W} = R_{B^{(i)}} I_W^T. Substituting this into (3.85) yields:

   S_{WB} = Σ_{i=1}^p R_{W^{(i)} W}^T S_{WB}^{(i)} R_{W^{(i)} W},   (3.86)

where S_{WB}^{(i)} ≡ E_{B^{(i)} W^{(i)}}^T S^{(i)} E_{B^{(i)} W^{(i)}}. This expresses S_{WB} as a sum of local contributions. Given a local approximation M_{WB}^{(i)} of S_{WB}^{(i)}, an approximation M_{WB} of S_{WB} can be constructed by replacing S_{WB}^{(i)} by M_{WB}^{(i)} in (3.86).
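The subassembly identity (3.84) can be checked on a small 1D analogue (my own toy example, not from the text): a Laplacian on (0,1) split into two subdomains sharing one interface node, with the interface diagonal entry split between the two local Neumann stiffness matrices:

```python
import numpy as np

# Check S = S^(1) + S^(2) for a 1D Laplacian with one interface node.
n, h = 7, 1.0 / 8
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h  # 1D Laplacian
I1, I2, B = [0, 1, 2], [4, 5, 6], [3]       # interiors and interface node

def schur(M, I, Bi):
    MII, MIB, MBB = M[np.ix_(I, I)], M[np.ix_(I, Bi)], M[np.ix_(Bi, Bi)]
    return MBB - MIB.T @ np.linalg.solve(MII, MIB)

S_global = schur(A, I1 + I2, B)
# local stiffness matrices: the interface diagonal entry of A is shared
# equally by the two subdomains (Neumann condition at the interface)
A1 = A[np.ix_(I1 + B, I1 + B)].copy(); A1[-1, -1] /= 2
A2 = A[np.ix_(B + I2, B + I2)].copy(); A2[0, 0] /= 2
S1 = schur(A1, [0, 1, 2], [3])
S2 = schur(A2, [1, 2, 3], [0])
subassembly_err = abs(S_global[0, 0] - (S1[0, 0] + S2[0, 0]))
```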

To construct an approximation M_{WB}^{(i)} of S_{WB}^{(i)} so that M_{WB} is spectrally equivalent to S_{WB} independent of the coefficient a(·), we will require that each M_{WB}^{(i)} be spectrally equivalent to S_{WB}^{(i)} independent of a(·). The following heuristic observations will be employed when a(·) is piecewise constant, with c(.) = 0 in elliptic equation (3.1) and a(x) ≡ a^{(i)} on each Ω_i.

Firstly, when c(.) = 0 and Ω_i is immersed in Ω, i.e., B^{(i)} = ∂Ω_i, the matrix S^{(i)} (and consequently S_{WB}^{(i)}) will be singular. The vector (1, . . . , 1)^T of size n_{B^{(i)}} generates the null space of S^{(i)}. Let z_{W^{(i)}} ≡ (1, . . . , 1)^T denote a vector of size n_{W^{(i)}} corresponding to the number of nodes on W^{(i)}. Then, its extension E_{B^{(i)} W^{(i)}} z_{W^{(i)}} of size n_{B^{(i)}} will satisfy:

   E_{B^{(i)} W^{(i)}} z_{W^{(i)}} = (1, . . . , 1)^T,

so that z_{W^{(i)}} will span the null space of S_{WB}^{(i)}. This can easily be verified. Secondly, when c(.) = 0 and a(x) ≡ a^{(i)} on Ω_i with Ω_i immersed, the local Schur complement S^{(i)} (and consequently S_{WB}^{(i)}) will scale in proportion to the coefficient a^{(i)}. As a consequence, since S_{WB}^{(i)} will scale in proportion to coefficient a^{(i)}, it will be necessary to choose M_{WB}^{(i)} also proportional to a^{(i)}, to ensure spectral equivalence between M_{WB}^{(i)} and S_{WB}^{(i)} independent of {a^{(l)}}. Thirdly, we may seek to approximate S_{WB}^{(i)} by a scalar multiple D^{(i)} = β a^{(i)} I of the identity matrix of size n_{W^{(i)}}, for a scaling factor β > 0 to be specified. Theoretical analysis [SM2, DR10] suggests choosing the scaling factor as β = h (1 + log(h_0/h)).

Employing these heuristic observations, to ensure that S_{WB}^{(i)} and M_{WB}^{(i)} also both have the same null spaces, we shall post-multiply and pre-multiply the matrix D^{(i)} and define M_{WB}^{(i)} = (I − P_i)^T D^{(i)} (I − P_i), where P_i is defined as:

   P_i ≡ z_{W^{(i)}} z_{W^{(i)}}^T D^{(i)} / (z_{W^{(i)}}^T D^{(i)} z_{W^{(i)}}) = z_{W^{(i)}} z_{W^{(i)}}^T / (z_{W^{(i)}}^T z_{W^{(i)}}),   (3.87)

corresponding to a D^{(i)}-orthogonal projection onto the null space span(z_{W^{(i)}}) of S_{WB}^{(i)}. This yields the choice of M_{WB}^{(i)} as:

   M_{WB}^{(i)} = (I − P_i)^T D^{(i)} (I − P_i) = β a^{(i)} (I − P_i).

Matrix M_{WB}^{(i)} may also be equivalently characterized by the requirement:

   v_{W^{(i)}}^T M_{WB}^{(i)} v_{W^{(i)}} = min_{ω_i} (v_{W^{(i)}} − ω_i z_{W^{(i)}})^T D^{(i)} (v_{W^{(i)}} − ω_i z_{W^{(i)}}),   (3.88)

where ω_i is a parameter chosen to minimize the above expression. Combining the preceding observations yields a global approximation M_{WB} ≈ S_{WB} based on the local approximations M_{WB}^{(i)} ≈ S_{WB}^{(i)}.
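A short numerical check (toy sizes; the values of β and a^{(i)} are illustrative) of the structure of this local approximation:

```python
import numpy as np

# With D^(i) = beta * a_i * I, the projection P_i is Euclidean, and
#   M^(i) = (I - P_i)^T D^(i) (I - P_i) = beta * a_i * (I - P_i)
# is symmetric, positive semidefinite, and annihilates z = (1,...,1)^T.
m, beta, a_i = 5, 0.3, 2.0
z = np.ones(m)                               # null-space vector
P = np.outer(z, z) / (z @ z)                 # P_i = z z^T / (z^T z)
D = beta * a_i * np.eye(m)
M_i = (np.eye(m) - P).T @ D @ (np.eye(m) - P)

null_err = np.linalg.norm(M_i @ z)           # M^(i) z = 0
sym_err = np.linalg.norm(M_i - M_i.T)        # symmetry
factor_err = np.linalg.norm(M_i - beta * a_i * (np.eye(m) - P))
```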

Combining these local approximations yields:

   M_{WB} = Σ_{i=1}^p R_{W^{(i)} W}^T M_{WB}^{(i)} R_{W^{(i)} W}
          = Σ_{i=1}^p R_{W^{(i)} W}^T (I − P_i)^T D^{(i)} (I − P_i) R_{W^{(i)} W},   (3.89)

where D^{(i)} = h (1 + log(h_0/h)) a^{(i)} I, and P_i is defined by (3.87).

Remark 3.53. By construction, matrix M_{WB} is symmetric, and will also be positive semidefinite, since v_W^T M_{WB} v_W is a sum of nonnegative quadratic forms. A vector v_W will belong to the null space of M_{WB} only if:

   M_{WB} v_W = 0 ⇔ R_{W^{(i)} W} v_W = α_i z_{W^{(i)}}, for 1 ≤ i ≤ p.

This can be verified to hold for nonzero α_i only if S_{WB} is singular. As a result, M_{WB} will be positive definite whenever S_{WB} is positive definite.

Remark 3.54. For elliptic systems such as the equations of linear elasticity, the null space of S^{(i)} may have several linearly independent vectors. In this case z_{W^{(i)}} will need to be replaced by a matrix whose columns are restrictions to W^{(i)} of a basis for the null space of S^{(i)}.

Matrix M_{WB} may also be equivalently characterized using (3.88) as satisfying:

   v_W^T M_{WB} v_W = Σ_{i=1}^p min_{ω_i} (v_{W^{(i)}} − ω_i z_{W^{(i)}})^T D^{(i)} (v_{W^{(i)}} − ω_i z_{W^{(i)}})
                    = min_{(ω_1, ..., ω_p)} Σ_{i=1}^p (v_{W^{(i)}} − ω_i z_{W^{(i)}})^T D^{(i)} (v_{W^{(i)}} − ω_i z_{W^{(i)}}),   (3.90)

where v_{W^{(i)}} = R_{W^{(i)} W} v_W. This alternative expression will be useful in constructing an efficient solver for linear systems of the form M_{WB} v_W = r_W.

Remark 3.55. Since M_{WB} will be a symmetric and positive definite matrix, the solution u_W to the linear system M_{WB} u_W = r_W will also solve the following minimization problem:

   J(u_W) = min_{v_W} J(v_W),   (3.91)

where J(v_W) ≡ (1/2) v_W^T M_{WB} v_W − v_W^T r_W is its associated energy. Since

   J(v_W) = (1/2) Σ_{i=1}^p min_{ω_i} (R_{W^{(i)} W} v_W − ω_i z_{W^{(i)}})^T D^{(i)} (R_{W^{(i)} W} v_W − ω_i z_{W^{(i)}}) − v_W^T r_W,

the minimization in (3.91) will thus also be equivalent to:

   J̃(u_W, ω_1^*, . . . , ω_p^*) = min_{(v_W, ω_1, ..., ω_p)} J̃(v_W, ω_1, . . . , ω_p),

where J̃(v_W, ω_1, . . . , ω_p) ≡ (1/2) Σ_{i=1}^p (v_{W^{(i)}} − ω_i z_{W^{(i)}})^T D^{(i)} (v_{W^{(i)}} − ω_i z_{W^{(i)}}) − v_W^T r_W. We now describe an algebraic solver for M_{WB} u_W = r_W.
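The characterization (3.90) can be checked numerically for a single block (toy diagonal D and random data; the closed-form minimizer is ω* = (z^T D v)/(z^T D z)):

```python
import numpy as np

# v^T M v equals min over omega of (v - omega z)^T D (v - omega z)
# when M = (I - P)^T D (I - P), P the D-orthogonal projection onto z.
rng = np.random.default_rng(2)
m = 6
D = np.diag(rng.uniform(1.0, 3.0, m))
z = np.ones(m)
P = np.outer(z, z @ D) / (z @ D @ z)         # P = z z^T D / (z^T D z)
M = (np.eye(m) - P).T @ D @ (np.eye(m) - P)

v = rng.standard_normal(m)
omega_star = (z @ D @ v) / (z @ D @ z)       # closed-form minimizer
gap = abs(v @ M @ v
          - (v - omega_star * z) @ D @ (v - omega_star * z))
```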

Applying the first order derivative conditions for a minimum (differentiating the above expression with respect to v_W and ω_1, . . . , ω_p and requiring it to equal zero) yields the following system of equations:

   z_{W^{(i)}}^T D^{(i)} (R_{W^{(i)} W} u_W − ω_i^* z_{W^{(i)}}) = 0, for 1 ≤ i ≤ p,
   D_{WB} u_W − Σ_{i=1}^p ω_i^* R_{W^{(i)} W}^T D^{(i)} z_{W^{(i)}} = r_W,   (3.92)

where D_{WB} is the following diagonal matrix of size n_W:

   D_{WB} ≡ Σ_{i=1}^p R_{W^{(i)} W}^T D^{(i)} R_{W^{(i)} W},

with diagonal entries:

   (D_{WB})_ii = Σ_{k: v_i ∈ B^{(k)}} a^{(k)} h (1 + log(h_0/h)).

An efficient solver for M_{WB} u_W = r_W can be formulated by solving (3.92). The vector unknown u_W can be expressed in terms of the parameters ω_i^* using the second block row in (3.92):

   u_W = D_{WB}^{-1} (r_W + Σ_{i=1}^p ω_i^* R_{W^{(i)} W}^T D^{(i)} z_{W^{(i)}}).

A reduced system can thus be obtained for the parameters ω_1^*, . . . , ω_p^* by substituting the preceding expression for u_W into the first block row in (3.92):

   [K_11 · · · K_1p; . . . ; K_1p · · · K_pp] [ω_1^*; . . . ; ω_p^*] = [g_1; . . . ; g_p],

where the entries K_ij and g_i are defined as follows:

   K_ij ≡ −z_{W^{(i)}}^T D^{(i)} R_{W^{(i)} W} D_{WB}^{-1} R_{W^{(j)} W}^T D^{(j)} z_{W^{(j)}}, for i ≠ j,
   K_ii ≡ z_{W^{(i)}}^T D^{(i)} z_{W^{(i)}} − z_{W^{(i)}}^T D^{(i)} R_{W^{(i)} W} D_{WB}^{-1} R_{W^{(i)} W}^T D^{(i)} z_{W^{(i)}},
   g_i ≡ z_{W^{(i)}}^T D^{(i)} R_{W^{(i)} W} D_{WB}^{-1} r_W.   (3.93)

Matrix K can be verified to be symmetric and sparse, and the preceding linear system can be solved using any suitable sparse direct solver. For each choice of parameters ω_1^*, . . . , ω_p^*, the vector unknown u_W can then be determined by solving the second block row in (3.92). We summarize the implementation of the parallel wirebasket preconditioner for S.

Algorithm 3.6.1 (Wirebasket Preconditioner)

   M^{-1} r_B ≡ Σ_{k=1}^q R_{F_k}^T S_{F_k F_k}^{-1} R_{F_k} r_B + I_W^T M_{WB}^{-1} I_W r_B.

The terms S_{F_k F_k}^{-1} R_{F_k} r_B can be computed as described for the block Jacobi preconditioner. The solution to M_{WB} u_W = I_W r_B can be computed as follows.

1. Using r_W ≡ I_W r_B, solve for ω_1^*, . . . , ω_p^*:

      [K_11 · · · K_1p; . . . ; K_1p · · · K_pp] [ω_1^*; . . . ; ω_p^*] = [g_1; . . . ; g_p],

   where the entries K_ij and g_j are defined in (3.93).

2. Then, solve for u_W:

      D_{WB} u_W = r_W + Σ_{i=1}^p ω_i^* R_{W^{(i)} W}^T D^{(i)} z_{W^{(i)}}.

   This yields u_W.

The following result concerns the convergence rate of the preceding parallel wirebasket algorithm.

Theorem 3.56. If the coefficient a(·) is constant within each subdomain, there exists C > 0 independent of h_0, h and a(·) such that:

   cond(M, S) ≤ C (1 + log(h_0/h))^2.

Proof. See [SM2, DR10].

Remark 3.57. The heuristic approximation M_{WB} of S_{WB} assumed that the coefficient c(x) = 0. In practice, the same matrix M_{WB} described above (based on the vectors z_{W^{(i)}}) can be employed even when c(x) ≠ 0, though S_{WB}^{(i)} will not be singular in such a case. Alternate wirebasket algorithms are described in [BR15, MA12, DR10], including an algorithm with condition number bound proportional to (1 + log(h_0/h)).

3.7 Neumann-Neumann and Balancing Preconditioners

Neumann-Neumann and balancing domain decomposition methods are a widely used family of preconditioners for multisubdomain Schur complement matrices in two and three dimensions. From a computational viewpoint, such preconditioners have an algebraic form that may be applied to arbitrary subdomain geometries in two or three dimensions, without the requirement that the subdomains be boxes or tetrahedra. These preconditioners solve a Neumann problem on each subdomain, and hence the name. Theoretical analysis indicates that these methods precondition effectively,

independent of the jump discontinuities in the coefficient, yielding condition number bounds which grow polylogarithmically in the mesh parameters. Our discussion will focus on the family of Neumann-Neumann preconditioners [BO7, DE2, DE3, DR14, DR16, DR18, LE, LE5], and the balancing domain decomposition preconditioner [MA14, MA17]. We also outline an algebraic preconditioner [CA33] based on the Neumann-Neumann preconditioner.

Given a decomposition Ω_1, . . . , Ω_p of Ω, both preconditioners decompose the interface B into the segments:

   B = B^{(1)} ∪ · · · ∪ B^{(p)}, where B^{(i)} ≡ ∂Ω_i \ B_D.   (3.94)

Both preconditioners employ the subdomain Schur complement matrix S^{(i)} to approximate the unassembled submatrix S_{B^{(i)} B^{(i)}} = R_{B^{(i)}} S R_{B^{(i)}}^T of S. From the viewpoint of Schwarz subspace methods, a Neumann-Neumann preconditioner has the structure of an additive Schwarz preconditioner for S, based on the decomposition of B into the overlapping boundary segments B^{(1)}, . . . , B^{(p)}, with restriction and extension matrices R_{B^{(i)}} and R_{B^{(i)}}^T, respectively, defined in (3.19), while the balancing domain decomposition preconditioner has the structure of a hybrid Schwarz preconditioner for S. Different coarse spaces facilitating global transfer of information are also employed in each preconditioner.

3.7.1 Neumann-Neumann Preconditioners

Multi-subdomain Neumann-Neumann preconditioners are extensions of the two subdomain Neumann-Neumann preconditioner described in an earlier chapter. Such a preconditioner has the formal structure of an additive Schwarz subspace preconditioner for S. If no coarse space is employed, the preconditioner has the form:

   M^{-1} = Σ_{i=1}^p R_{B^{(i)}}^T S^{(i)†} R_{B^{(i)}},   (3.95)

where S^{(i)†} denotes the Moore-Penrose pseudoinverse [GO4] of the local Schur complement matrix S^{(i)}, since S^{(i)} can be singular, unlike S_{B^{(i)} B^{(i)}}. When matrix S^{(i)} is nonsingular, S^{(i)†} = S^{(i)-1}.

Remark 3.58. In practical implementation, the local Schur complement S^{(i)}, corresponding to the nodes on B^{(i)}, need not be assembled. Instead the following may be noted. When S^{(i)} is nonsingular, terms of the form S^{(i)-1} r_{B^{(i)}} can be computed by solving the linear system S^{(i)} w_{B^{(i)}} = r_{B^{(i)}}, corresponding to a discrete Neumann problem on Ω_i:

   S^{(i)-1} r_{B^{(i)}} = [0 I] [A_{II}^{(i)}  A_{IB}^{(i)}; A_{IB}^{(i)T}  A_{BB}^{(i)}]^{-1} [0; r_{B^{(i)}}].

However, when c(x) = 0 and Ω_i is immersed inside Ω (i.e., B^{(i)} = ∂Ω_i), then the matrices A^{(i)} and S^{(i)} will be singular. In this case, the null space of A^{(i)} and of S^{(i)} will be spanned by vectors of the form 1 = (1, · · · , 1)^T of appropriate sizes. As a result, the linear system S^{(i)} w_{B^{(i)}} = r_{B^{(i)}} will be solvable only if r_{B^{(i)}} satisfies the compatibility condition:

   1^T r_{B^{(i)}} = 0.

When this compatibility condition is satisfied, a solution w_{B^{(i)}} will exist, though it will not be unique, as any scalar multiple of 1 may be added to it.

In practice, the action of S^{(i)†} on a vector is typically approximated in Neumann-Neumann algorithms [DE3] as follows. If direct solvers are employed, then when the Cholesky factorization L^{(i)} L^{(i)T} of A^{(i)} is computed on each subdomain, if a local problem is singular, zero or "small" pivots can be set to a prescribed nonzero number ε > 0. The local Cholesky factorization is thereby modified, and this approximate factorization can be employed to formally compute w̃_{B^{(i)}} ≈ S^{(i)†} r_{B^{(i)}}. If desired, this approximate solution w̃_{B^{(i)}} may then be projected onto the orthogonal complement of the null space:

   w_{B^{(i)}} ≡ w̃_{B^{(i)}} − (1^T w̃_{B^{(i)}} / 1^T 1) 1,

yielding an approximate solution satisfying 1^T w_{B^{(i)}} = 0. Alternatively, a projected gradient method may be used to iteratively solve S^{(i)} w_{B^{(i)}} = r_{B^{(i)}}. The balancing domain decomposition preconditioner [MA14, MA17], described later, elegantly addresses the issue arising with singular local problems and their non-unique solutions.

We summarize the algorithm below, assuming nonsingular subproblems. For convenience, we omit a coarse space correction term, though it may be added.

Algorithm 3.7.1 (Neumann-Neumann Preconditioner, No Coarse Space)
Given r_B, the vector M^{-1} r_B is computed as follows.

1. For i = 1, . . . , p in parallel solve:

      [A_{II}^{(i)}  A_{IB}^{(i)}; A_{IB}^{(i)T}  A_{BB}^{(i)}] [w_I^{(i)}; w_B^{(i)}] = [0; R_{B^{(i)}} r_B].

2. Endfor

Output: M^{-1} r_B ≡ Σ_{i=1}^p R_{B^{(i)}}^T w_B^{(i)}.

Remark 3.59. We shall next describe a Neumann-Neumann preconditioner employing an algebraic partition of unity. To motivate this version of the preconditioner, note that because of overlap between adjacent boundaries B^{(i)}, the Neumann-Neumann preconditioner adds duplicates of the solution on the regions of overlap.
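One singular subdomain solve of the kind discussed above can be sketched directly (my own toy example: a small 1D Neumann-type matrix stands in for S^{(i)}), using NumPy's Moore-Penrose pseudoinverse instead of a modified Cholesky factorization:

```python
import numpy as np

# S_i has null space span(1); for a compatible (balanced) right-hand
# side with 1^T r = 0, the pseudoinverse returns the unique solution
# orthogonal to the null space.
m = 4
S_i = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
S_i[0, 0] = S_i[-1, -1] = 1.0                # Neumann-like: zero row sums
one = np.ones(m)
r = np.array([1.0, -2.0, 0.5, 0.5])          # compatible: 1^T r = 0

w = np.linalg.pinv(S_i) @ r                  # S_i^dagger r
solve_err = np.linalg.norm(S_i @ w - r)      # solves the singular system
orth_err = abs(one @ w)                      # orthogonal to span(1)
```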

Such duplication can be reduced by employing a discrete partition of unity on B, subordinate to the subdomain boundaries B^{(1)}, . . . , B^{(p)}. Accordingly, let n_B^{(l)} denote the number of nodes on B^{(l)} for 1 ≤ l ≤ p, and let x_i^{(l)} for 1 ≤ i ≤ n_B^{(l)} denote an ordering of the nodes on B^{(l)}. For each 1 ≤ l ≤ p, let D^{(l)} denote a diagonal matrix of size n_B^{(l)} with nonnegative entries, so that a discrete partition (decomposition) of the identity matrix is obtained:

   Σ_{l=1}^p R_{B^{(l)}}^T D^{(l)} R_{B^{(l)}} = I.   (3.96)

Various choices of such diagonal matrices exist.
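The identity (3.96) is easy to verify numerically. The sketch below uses the simple counting choice (D^{(l)})_ii = 1/deg(x_i) (the a(x) ≡ 1 case), with overlapping boundary index sets that are made up for illustration:

```python
import numpy as np

# Verify sum_l R_l^T D^(l) R_l = I for the counting partition of unity.
nB = 6
boundaries = [[0, 1, 2, 3], [2, 3, 4, 5], [0, 5]]   # index sets of B^(l)
deg = np.zeros(nB)
for b in boundaries:
    deg[b] += 1.0                  # deg(x_i) = #{l : x_i in B^(l)}

acc = np.zeros((nB, nB))
for b in boundaries:
    b = np.array(b)
    R = np.zeros((b.size, nB))
    R[np.arange(b.size), b] = 1.0  # zero-one restriction onto B^(l)
    D = np.diag(1.0 / deg[b])      # (D^(l))_ii = 1/deg(x_i)
    acc += R.T @ D @ R             # accumulate R_l^T D^(l) R_l

pou_err = np.linalg.norm(acc - np.eye(nB))
```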

The diagonal entries of D^{(l)} are also commonly defined based on the coefficient a(x):

   (D^{(l)})_ii = (a^{(l)})^ρ / Σ_{j: x_i^{(l)} ∈ B^{(j)}} (a^{(j)})^ρ,   (3.97)

where 0 ≤ ρ ≤ 1 denotes some user chosen scaling factor and a^{(l)} denotes some sample value of the coefficient a(x) in Ω_l. When a(x) ≡ 1, the above definition yields (D^{(l)})_ii = 1/deg(x_i^{(l)}), where deg(x_i^{(l)}) denotes the degree of node x_i^{(l)}, i.e., the number of distinct subdomain boundaries B^{(j)} to which node x_i^{(l)} belongs. Such a discrete partition of the identity on B can be employed to distribute an interface load r_B to the subdomain boundaries, r_B = Σ_{i=1}^p R_{B^{(i)}}^T D^{(i)} R_{B^{(i)}} r_B, so that the load is not duplicated. The partition of unity Neumann-Neumann preconditioner can now be formulated as:

   M^{-1} r_B = Σ_{i=1}^p R_{B^{(i)}}^T D^{(i)T} S^{(i)†} D^{(i)} R_{B^{(i)}} r_B,   (3.98)

where we have omitted a coarse space correction term. To ensure that the preconditioner is symmetric, each matrix D^{(i)} has been employed twice. Preconditioner (3.98) corresponds to a matrix additive Schwarz preconditioner for S, based on the subspaces Range(R_{B^{(i)}}^T D^{(i)T}) for 1 ≤ i ≤ p, with the matrices S^{(i)} approximating D^{(i)} R_{B^{(i)}} S R_{B^{(i)}}^T D^{(i)T}.

The following bounds will hold for the standard and partition of unity versions of the Neumann-Neumann preconditioner without a coarse space.

Lemma 3.60. If M denotes the preconditioner in (3.95) or in (3.98), then the following condition number bound will hold:

   cond(M, S) ≤ C h_0^{-2} (1 + log(h_0/h))^2,

where C > 0 is independent of h and h_0.

Proof. See [DE3, DR18].

See also [GL14, FA16].

To improve the convergence rate of the preceding Neumann-Neumann algorithms as the subdomain size h_0 decreases, a coarse space correction term can be included, thereby providing some global exchange of information. In principle, any of the coarse spaces described earlier may be employed. However, if the subdomains Ω_1, . . . , Ω_p correspond to elements in a coarse triangulation T_{h_0}(Ω) of size h_0, let y_l denote the coarse nodes for 1 ≤ l ≤ n_0, and let φ_l^{(h_0)}(x) denote the coarse space nodal basis satisfying φ_l^{(h_0)}(y_j) = δ_{lj}. If x_1, . . . , x_{n_B} denote the nodes on B, then the coarse space matrix R_0^T is:

   R_0^T = [φ_1^{(h_0)}(x_1) · · · φ_{n_0}^{(h_0)}(x_1); . . . ; φ_1^{(h_0)}(x_{n_B}) · · · φ_{n_0}^{(h_0)}(x_{n_B})].   (3.99)

A coarse space version of the Neumann-Neumann preconditioner can now be obtained by including the correction term R_0^T S_0^{-1} R_0 with S_0 = R_0 S R_0^T:

   M^{-1} r_B = Σ_{i=1}^p R_{B^{(i)}}^T D^{(i)T} S^{(i)†} D^{(i)} R_{B^{(i)}} r_B + R_0^T S_0^{-1} R_0 r_B.   (3.100)

As with other matrix additive Schwarz preconditioners for S, the coarse matrix S_0 may, in principle, be approximated by the coarse grid discretization A_0 of (3.1).

Lemma 3.61. If the coefficient a(x) satisfies a(x) = a^{(i)} on each subdomain Ω_i, then the condition number of the partition of unity Neumann-Neumann preconditioner with coarse space correction will satisfy:

   cond(M, S) ≤ C (1 + log(h_0/h))^2,

where C > 0 is independent of h, h_0 and {a^{(l)}}.

Proof. See [DE3, DR18].

The Neumann-Neumann preconditioner with coarse space correction can be implemented in parallel using (3.100), with the subdomain problems solved as in Alg. 3.7.1. For brevity, we shall not summarize the resulting algorithm.
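A 1D analogue (my own toy setup, not from the text) of the construction (3.99): the columns of R_0^T are the coarse nodal basis functions evaluated at the fine nodes, here piecewise linear hat functions obtained with np.interp:

```python
import numpy as np

# Build R_0^T column by column: column l holds phi_l^(h0) evaluated at
# the fine nodes, where phi_l is the piecewise linear hat function that
# is 1 at coarse node y_l and 0 at the other coarse nodes.
x_fine = np.linspace(0.0, 1.0, 9)            # stand-in for nodes on B
y_coarse = np.array([0.0, 0.5, 1.0])         # coarse nodes y_l
n0 = y_coarse.size
R0T = np.column_stack([np.interp(x_fine, y_coarse, np.eye(n0)[l])
                       for l in range(n0)])

row_sums = R0T.sum(axis=1)                   # hats form a partition of unity
delta_check = R0T[[0, 4, 8], :]              # phi_l(y_j) = delta_lj
```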
3.7.2 Balancing Domain Decomposition Preconditioner

The balancing domain decomposition preconditioner [MA14, MA17] for the Schur complement S employs an algebraic procedure referred to as balancing, which ensures that each singular subdomain problem arising in the Neumann-Neumann preconditioner is solvable, and provides a natural coarse space which transfers information globally. Additionally, the procedure eliminates arbitrariness in the output of the Neumann-Neumann preconditioner, arising from non-unique subdomain solutions.

We shall heuristically motivate the balancing procedure, before outlining its implementation. The methodology will be illustrated for balancing the discrete partition of unity version of the Neumann-Neumann preconditioner:

   M^{-1} r_B = Σ_{l=1}^p R_{B^{(l)}}^T D^{(l)T} S^{(l)†} D^{(l)} R_{B^{(l)}} r_B.   (3.101)

When c(x) = 0 and Ω_l is floating in Ω, the matrix S^{(l)} will be singular (for instance, when c(x) > 0, the matrix S^{(l)} will be nonsingular). When S^{(l)} is singular, the subdomain problem:

   S^{(l)} w_B^{(l)} = D^{(l)} R_{B^{(l)}} r_B,   (3.102)

will be solvable only if the following compatibility condition holds:

   Ñ_l^T D^{(l)} R_{B^{(l)}} r_B = 0,   (3.103)

where Ñ_l denotes a matrix whose columns form a basis for Kernel(S^{(l)}). If n_B^{(l)} denotes the size of S^{(l)} and d̃_l the dimension of the null space of S^{(l)}, then Ñ_l will be a matrix of size n_B^{(l)} × d̃_l, so that Range(Ñ_l) = Kernel(S^{(l)}). When (3.103) holds, the general solution to (3.102) will be:

   w_B^{(l)} = v_B^{(l)} + Ñ_l α_l,   (3.104)

where v_B^{(l)} is a particular solution, and Ñ_l α_l represents a general term in the null space of S^{(l)} for α_l ∈ IR^{d̃_l}.

The balancing procedure will employ a more general matrix N_l of size n_B^{(l)} × d_l with d_l ≥ d̃_l, such that:

   Range(Ñ_l) ⊂ Range(N_l).

By construction, each system (3.102) will then be consistent (even if N_l ≠ Ñ_l). When c(x) > 0, each matrix S^{(l)} will be nonsingular, but it may be advantageous to choose N_l as the matrix whose columns span the null space of the local Schur complement associated with c(x) = 0.

Definition 3.62. A vector r_B ∈ IR^{n_B} will be said to be balanced if:

   N_l^T D^{(l)} R_{B^{(l)}} r_B = 0, for 1 ≤ l ≤ p.   (3.105)

By the preceding definition, when vector r_B is balanced, each subproblem S^{(l)} w_B^{(l)} = D^{(l)} R_{B^{(l)}} r_B in (3.101) will be solvable. When r_B is not balanced, it may be modified by subtracting a correction term P_0 r_B, so that (I − P_0) r_B is balanced, where P_0 is an S-orthogonal projection, which will be described in the following.

In this case, equation (3.105), which describes a balanced vector, can be compactly represented using a matrix C of size n_B × d, for d = (d_1 + · · · + d_p), where the columns of C consist of the columns of R_{B^{(l)}}^T D^{(l)T} N_l for 1 ≤ l ≤ p:

   C = [R_{B^{(1)}}^T D^{(1)T} N_1 · · · R_{B^{(p)}}^T D^{(p)T} N_p].

Then, equation (3.105) for a balanced vector r_B becomes:

   C^T r_B = 0.

When C^T r_B ≠ 0, a correction term (S C α) may be sought for α ∈ IR^d:

   C^T (r_B − S C α) = 0,

so that (r_B − S C α) is balanced. This yields the following linear system of equations for determining α = (α_1^T, . . . , α_p^T)^T ∈ IR^d:

   (C^T S C) α = C^T r_B.   (3.106)

When C is of full rank, this system will be uniquely solvable by positive definiteness of S. The correction term S C α may then be represented as:

   P_0 r_B ≡ S C α = S C (C^T S C)^{-1} C^T r_B,

where P_0 can be easily verified to be an S-orthogonal projection (with P_0 P_0 = P_0 and P_0 S = S P_0^T) of r_B onto the column space of C, which is the S-orthogonal complement of the space Kernel(C^T) of balanced vectors. If r_B = S u_B, this yields C α = P_0 u_B.

Motivated by the preceding, the balancing domain decomposition preconditioner M employs the structure of a hybrid Schwarz preconditioner:

   M^{-1} S = P_0 + (I − P_0) [Σ_{l=1}^p R_{B^{(l)}}^T D^{(l)T} S^{(l)†} D^{(l)} R_{B^{(l)}}] S (I − P_0).   (3.107)

The first application of (I − P_0) ensures that the residual is balanced, so that when the partition of unity Neumann-Neumann preconditioner is applied, the subproblems are solvable (but with non-unique solutions). To ensure symmetry, the output of the Neumann-Neumann preconditioner is subsequently balanced by another application of (I − P_0) in a post-processing step. Since this output will lie in the subspace Kernel(C^T) of balanced vectors, the term P_0 is employed to compute the projection of the solution onto the coarse space V_0 = Kernel(C^T)^⊥. Computing the action M^{-1} r_B of the inverse of the hybrid Schwarz preconditioner M in (3.107) involves three steps.
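The first of these steps, the balancing correction, can be sketched directly (toy random SPD S and full-rank C as stand-ins): after solving (3.106) for α, the corrected residual satisfies C^T (r_B − S C α) = 0:

```python
import numpy as np

# Balancing: alpha = (C^T S C)^{-1} C^T r_B makes r_B - S C alpha
# orthogonal to the columns of C, i.e. balanced.
rng = np.random.default_rng(3)
n, d = 8, 2
G = rng.standard_normal((n, n))
S = G @ G.T + n * np.eye(n)                  # SPD stand-in for S
C = rng.standard_normal((n, d))              # full-rank constraint matrix
rB = rng.standard_normal(n)

alpha = np.linalg.solve(C.T @ S @ C, C.T @ rB)
r_bal = rB - S @ (C @ alpha)
balance_err = np.linalg.norm(C.T @ r_bal)    # should vanish
```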

In the first step, a balanced residual r̃_B is constructed from r_B by subtraction of the term S C α:

   r̃_B = r_B − S C α.

System (3.106) has a block structure, which can be obtained by substituting the block structure α = (α_1^T, . . . , α_p^T)^T and the block structure of C into (3.106), to yield the following block partitioned linear system:

   [K_11 · · · K_1p; . . . ; K_1p · · · K_pp] [α_1; . . . ; α_p] = [N_1^T D^{(1)} R_{B^{(1)}} r_B; . . . ; N_p^T D^{(p)} R_{B^{(p)}} r_B],   (3.108)

involving (d_1 + · · · + d_p) unknowns corresponding to the subvectors α_1, . . . , α_p, where α_i ∈ IR^{d_i}. Here, the block submatrices K_ij will be d_i × d_j matrices, defined by:

   K_ij ≡ N_i^T D^{(i)} R_{B^{(i)}} S R_{B^{(j)}}^T D^{(j)T} N_j, for 1 ≤ i, j ≤ p.   (3.109)

If d_i = 0 for any index i, then the corresponding block rows and columns of K and α should be omitted.

Remark 3.63. In the second step, the partition of unity Neumann-Neumann preconditioner is formally applied to the balanced residual r̃_B:

   v_B = Σ_{l=1}^p R_{B^{(l)}}^T D^{(l)T} S^{(l)†} D^{(l)} R_{B^{(l)}} r̃_B.

In the third step, with t_B ≡ r̃_B − S v_B, a system of the same form, (C^T S C) β = C^T t_B, is solved. Then M^{-1} r_B ≡ (C α + v_B + C β).

Remark 3.64. In most applications, K will be symmetric and positive definite. However, when C is not of full rank, matrix K can be singular. In this case, the columns of C will be linearly dependent, with:

   Σ_{l=1}^p R_{B^{(l)}}^T D^{(l)T} N_l γ_l = 0,

for some choice of coefficient vectors γ_1, . . . , γ_p. To avoid a singular matrix K, some care must be exercised when extending each matrix Ñ_l to N_l.

Below, we summarize the action of the inverse of the balancing domain decomposition preconditioner.

3.7 Neumann-Neumann and Balancing Preconditioners 183

Algorithm 3.7.2 (Balancing Domain Decomposition Preconditioner)
Input: r_B

1. Solve:

    [ K_11  ···  K_1p ] [ α_1 ]   [ N_1^T D^(1) R_{B(1)} r_B ]
    [  ⋮    ⋱    ⋮   ] [  ⋮  ] = [            ⋮             ]
    [ K_1p^T ··· K_pp ] [ α_p ]   [ N_p^T D^(p) R_{B(p)} r_B ]

2. Define:

    w*_B ≡ Σ_{j=1}^p R_{B(j)}^T D^(j) N_j α_j
    r*_B ≡ r_B − S w*_B.

3. For i = 1, ..., p in parallel solve: S^(i) w_B^(i) = D^(i) R_{B(i)} r*_B.
4. Endfor
5. Compute:

    w_B = Σ_{j=1}^p R_{B(j)}^T D^(j) w_B^(j)
    t_B = r*_B − S w_B.

6. Solve:

    [ K_11  ···  K_1p ] [ β_1 ]   [ N_1^T D^(1) R_{B(1)} t_B ]
    [  ⋮    ⋱    ⋮   ] [  ⋮  ] = [            ⋮             ]
    [ K_1p^T ··· K_pp ] [ β_p ]   [ N_p^T D^(p) R_{B(p)} t_B ]

7. Define:

    v*_B ≡ Σ_{j=1}^p R_{B(j)}^T D^(j) N_j β_j.

Output: M^{-1} r_B ≡ w*_B + w_B + v*_B.

Remark 3.65. If the input r_B to the preconditioner is balanced, then step 1 can be omitted in the preconditioner, yielding w*_B = 0. In this case, the output M^{-1} r_B = w_B + v*_B will also be balanced, and steps 1 and 2 can be omitted in all subsequent applications of M^{-1} in the CG algorithm. Thus, in practice, steps 1 and 2 are employed in a pre-processing stage to ensure that the initial residual is balanced. Each iteration will require one matrix multiplication with S and one multiplication by M^{-1}. Then, the computational cost of each iteration will be proportional to the cost of two subdomain solves on each subdomain and the cost of balancing (which requires the solution of a coarse problem P_0).

The following convergence bound will hold for the balancing domain decomposition preconditioner.
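The balancing structure of Algorithm 3.7.2 can be exercised numerically before stating the bound. The sketch below uses hypothetical dense 4×4 data (not from the text): the matrix T stands in for the partition of unity Neumann-Neumann sum Σ_l R_{B(l)}^T D^(l) S^(l)† D^(l) R_{B(l)}, since the point here is the balancing structure rather than the subdomain solves. After steps 1-2, the residual r*_B fed to the subdomain solves is balanced, i.e. C^T r*_B = 0.

```python
# Sketch of Algorithm 3.7.2 on hypothetical dense data (S, T, C, r are made up).
# T stands in for the partition-of-unity Neumann-Neumann sum of subdomain solves.

def solve2(A, b):   # 2x2 solve by Cramer's rule (the coarse matrix K is 2x2 here)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def mv(A, x):       # dense matrix-vector product
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

S = [[4., 1., 0., 1.], [1., 5., 1., 0.], [0., 1., 4., 1.], [1., 0., 1., 3.]]
T = [[2., 0., 1., 0.], [0., 3., 0., 1.], [1., 0., 2., 0.], [0., 1., 0., 2.]]
C = [[1., 0.], [1., 0.], [0., 1.], [0., 1.]]     # hypothetical coarse basis, full rank
r = [1., -1., 2., 0.5]
CT = [[C[k][i] for k in range(4)] for i in range(2)]

# K = C^T S C, as in (3.109) after subassembly
K = [[sum(CT[i][k] * sum(S[k][m] * C[m][j] for m in range(4)) for k in range(4))
      for j in range(2)] for i in range(2)]

alpha = solve2(K, mv(CT, r))                     # step 1: coarse solve
w_star = mv(C, alpha)                            # step 2
r_star = [r[i] - s for i, s in enumerate(mv(S, w_star))]
bal = mv(CT, r_star)                             # C^T r*_B, should vanish

w = mv(T, r_star)                                # steps 3-5 (stand-in subdomain solves)
t = [r_star[i] - s for i, s in enumerate(mv(S, w))]
beta = solve2(K, mv(CT, t))                      # step 6: re-balancing coarse solve
v_star = mv(C, beta)                             # step 7
out = [w_star[i] + w[i] + v_star[i] for i in range(4)]
print(bal)   # ≈ [0.0, 0.0] up to roundoff: the residual passed to step 3 is balanced
```

The check C^T r*_B = 0 holds by construction, since C^T r*_B = C^T r_B − (C^T S C) α = 0; this is exactly why the singular subdomain Neumann problems in step 3 are solvable.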

184 3 Schur Complement and Iterative Substructuring Algorithms

Theorem 3.66. Suppose that c(x) = 0 and that the coefficient satisfies a(x) = a^(i) on each subdomain Ω_i. Then, there will be a constant C independent of h_0, h and the {a^(i)} such that:

    cond(M, S) ≤ C (1 + log(h_0/h))^2,

where M denotes the balancing domain decomposition preconditioner.

Proof. See [MA14, MA17, DR18].

Remark 3.67. If c(x) > 0, then each subdomain problem will be nonsingular. In this case, if each N_l = Ñ_l, the coarse space V_0 = Kernel(C^T)^⊥ will be trivial, and the convergence rate of the balancing domain decomposition preconditioner can deteriorate. However, this can be remedied by choosing a nontrivial matrix N_l on each subdomain, chosen so that N_l corresponds to the null space of S^(l) when c(x) = 0 (typically with N_l = Span(1)).

3.7.3 An Algebraic Preconditioner

We conclude this section by outlining an algebraic preconditioner of [CA33]. It approximates the following additive Schwarz preconditioner for S, based on the segments B^(1), ..., B^(p) of B:

    M^{-1} = Σ_{i=1}^p R_{B(i)}^T S_{B(i)B(i)}^{-1} R_{B(i)} + R_0^T S_0^{-1} R_0,

with S_{B(i)B(i)} = R_{B(i)} S R_{B(i)}^T and S_0 = R_0 S R_0^T. Here, R_{B(i)} denotes a restriction matrix with zero-one entries corresponding to nodes on B^(i), and R_0 denotes the coarse space weighted restriction matrix. An exact application of the preceding preconditioner requires assembly of the submatrices S_{B(i)B(i)} and the coarse matrix S_0. However, an approximation S̃_{B(i)B(i)} ≈ S_{B(i)B(i)} can be constructed based on the ILU factorization A_{II}^(i) ≈ L̃_(i) L̃_(i)^T of each subdomain stiffness matrix A^(i), with A_{II}^(i)-1 ≈ L̃_(i)^{-T} L̃_(i)^{-1}:

    S̃_{B(i)B(i)} ≡ R_{B(i)} [ Σ_{l=1}^p R_{B(l)}^T ( A_{BB}^(l) − A_{IB}^(l)T L̃_(l)^{-T} L̃_(l)^{-1} A_{IB}^(l) ) R_{B(l)} ] R_{B(i)}^T.

Efficient algorithms for assembling such approximations are described in [CA33]. Matrix S̃_{B(i)B(i)} will be dense, and can be truncated to a sparse matrix. The coarse matrix S_0 may be approximated by a coarse grid discretization A_0 of (3.1). Unlike the subdomain matrices S^(i), the algebraic approximations S̃_{B(i)B(i)} of S_{B(i)B(i)} will not be singular, and their incomplete factorizations can be found. Numerical studies indicate attractive convergence properties for such preconditioners [CA33].
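The factorization-based approximation above can be illustrated on a tiny example. The sketch below (hypothetical 3×3/2×2 block data, not from the text) uses an exact Cholesky factor L of A_II in place of the incomplete factor L̃, in which case the formula A_BB − A_IB^T L^{-T} L^{-1} A_IB reproduces the exact local Schur complement; an incomplete (ILU/IC) factor would instead yield the approximation S̃.

```python
# Exact-factor version of the Schur approximation formula, on hypothetical data:
# with L the exact Cholesky factor of A_II, A_BB - A_IB^T L^{-T} L^{-1} A_IB = S.

def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = ((A[i][i] - s) ** 0.5) if i == j else (A[i][j] - s) / L[j][j]
    return L

def forward(L, b):   # solve L y = b
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def backward(L, y):  # solve L^T x = y
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

A_II = [[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]]   # hypothetical interior block (SPD)
A_IB = [[1., 0.], [0., 1.], [1., 1.]]               # hypothetical coupling block
A_BB = [[3., 1.], [1., 3.]]                         # hypothetical boundary block
L = cholesky(A_II)

# S~ = A_BB - A_IB^T L^{-T} L^{-1} A_IB, computed one column of A_IB at a time
cols = [backward(L, forward(L, [A_IB[i][j] for i in range(3)])) for j in range(2)]
S_tilde = [[A_BB[i][j] - sum(A_IB[k][i] * cols[j][k] for k in range(3))
            for j in range(2)] for i in range(2)]
print(S_tilde)
```

Replacing `cholesky` by an incomplete factorization (dropping fill-in) gives the nonsingular approximation S̃_{B(i)B(i)} described in the text, at much lower assembly cost.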

3.8 Implementational Issues

Schur complement algorithms are generally more difficult to implement than Schwarz methods, since more geometric information is required about the subdomains and their boundaries (Neumann-Neumann and balancing preconditioners may be exceptions). However, storage and communication costs may be reduced due to the lack of overlap between the subdomains, and, just as for Schwarz algorithms, an effectively preconditioned Schur complement algorithm can converge at almost optimal rates with respect to h, h_0 and jumps in the coefficient a(·). In this section, we remark on implementational issues in applications of Schur complement algorithms to solve a discretization of (3.1). These include the choice of subdomains, general boundary conditions, preconditioning S or A, local solvers, parallel libraries, anisotropic problems, time stepped problems, and remarks on discontinuous coefficient problems.

The condition number bounds of several Schur complement preconditioners are summarized in Table 3.1. Estimates are presented for the case when the jumps in a(·) are mild, and when the jumps are large. Here, C(a) denotes a parameter independent of h_0 and h but dependent on the coefficient a(·), while C is independent of h_0, h and a(·). For the vertex space algorithm, C(β) depends on the overlap factor β.

Table 3.1. Condition number bounds for Schur complement preconditioners

    Algorithm            Mild Coeff.                Disc Coeff.
    2D BPS               C (1 + log²(h0/h))         C (1 + log²(h0/h))
    2D Vertex Space      C(a) (1 + log²(β⁻¹))       C(β) (1 + log²(h0/h))
    3D Vertex Space      C(a) (1 + log²(β⁻¹))       C(β) (h0/h)
    3D Wirebasket        C (1 + log²(h0/h))         C (1 + log²(h0/h))
    Neumann & Balancing  C (1 + log²(h0/h))         C (1 + log²(h0/h))

3.8.1 Choice of Subdomains

Various factors influence the choice of a decomposition Ω_1, ..., Ω_p of Ω. They include the geometry of the domain, the location of the essential and natural boundary, the regularity of the solution, the availability of fast local solvers, the heterogeneity of the coefficients, and storage and communication costs. For instance, when a(·) has large jumps, the subdomains should ideally be aligned with the discontinuities in a(·), to reduce the variation of a(·) within each subdomain. For anisotropic coefficients, strip like subdomains may be chosen so that the elliptic equation is coupled more strongly within the strips. When a natural decomposition is not obvious, an automated strategy, see Chap. 5, may be employed to minimize the communication between the subdomains and to balance the loads [BE14, FA9, FO2, SI2, PO3, BA20, PO2].

186 3 Schur Complement and Iterative Substructuring Algorithms

3.8.2 General Boundary Conditions

Our discussion of Schur complement preconditioners has focused primarily on Dirichlet problems, for B_D = ∂Ω. When more general boundary conditions are imposed, the natural boundary B_N ≠ ∅, and the solution will be unknown not only in Ω, but also in B_N. In this case, the triangulation must ideally be chosen so that its elements are aligned with B_N. Then, given a decomposition Ω_1, ..., Ω_p of Ω, a discretization of (3.1) with stiffness matrix A and load vector f can be block partitioned as in (3.5), using the block vectors u_I and u_B of unknowns. However, the nodal unknowns can in principle be block partitioned in two alternate ways, yielding two different Schur complement systems. In the following, we shall indicate both block partitionings.

First Case. In the first block partitioning, we shall define the "interface" as B = ∪_{l=1}^p ∂Ω_l ∩ (Ω ∪ B_N), and let u_B denote the unknowns on B. In this case, each u_I^(l) will denote a vector of unknowns in Ω_l, with u_I = (u_I^(1)T, ..., u_I^(p)T)^T, and the subdomain matrix A_{II}^(l) will only involve interior nodal unknowns in Ω_l. Since B will include the natural boundary B_N, it may be difficult to decompose it into standard globs if B_N has an irregular shape. This may complicate the formulation of glob based preconditioners (such as block Jacobi, vertex space and wirebasket preconditioners).

Second Case. In the second block partitioning, each u_I^(l) will denote a vector of unknowns in Ω_l ∪ (∂Ω_l ∩ B_N), with u_I = (u_I^(1)T, ..., u_I^(p)T)^T, while B = ∪_{l=1}^p (∂Ω_l ∩ Ω) will not include the natural boundary B_N, and u_B will denote the unknowns on B. Thus, the unknowns on B_N ∩ ∂Ω_l will be included in u_I^(l) though they do not strictly lie in the interior of the subdomain, and the subdomain matrix A_{II}^(l) will involve natural boundary conditions on (∂Ω_l ∩ B_N). In this case, Schur complement preconditioners can be constructed as for a Dirichlet problem, since the interface B will be identical to the interface for a Dirichlet problem, and it can be decomposed into globs or overlapping segments.

In both of the above cases, the submatrices A_{II}, A_{IB} and A_{BB} will have different sizes for each partition, yielding a Schur complement S = (A_{BB} − A_{IB}^T A_{II}^{-1} A_{IB}). Schwarz subspace preconditioners can be formulated for S, given a decomposition of B into globs or overlapping segments. Neumann-Neumann and balancing methods apply in both cases.

Remark 3.68. Care must be exercised in defining a coarse space when B_N ≠ ∅, since the coarse space must be a subspace of V_h ∩ H_D^1(Ω), and it may also be difficult to formulate a traditional coarse space. If B_N ≠ ∅ and B_D ≠ ∅, then the stiffness matrix A and the Schur complement matrix S will be nonsingular. However, if B_N = ∂Ω and the coefficient c(x) = 0, then the stiffness matrix A and the Schur complement matrix S will be singular.

3.8 Implementational Issues 187

In this case, the coarse space matrix S_0 = R_0 S R_0^T will also be singular. As a result, the Schur complement system S u_B = f̃_B and the coarse problem S_0 w_0 = R_0 r_B will be solvable only if 1^T f̃_B = 0 and 1^T R_0 r_B = 0, respectively, for 1 = (1, ..., 1)^T. To obtain a unique solution, each iterate should be normalized to have zero mean value. For instance, if w_B ∈ IR^{n_B} denotes the output of the preconditioned system in the k'th iterate, then modify it to have mean value zero:

    w_B ← w_B − (1^T w_B / 1^T 1) 1.

Such normalizations will not be needed when c(x) > 0.

3.8.3 Preconditioning S or A

Given subdomains Ω_1, ..., Ω_p of Ω, the solution to (3.5) may in principle be sought in two alternate ways. In the first approach, the Schur complement system may be solved for u_B using Alg. 3.1 and a CG algorithm with an appropriate preconditioner for S. This approach will require matrix-vector products with S computed exactly (to machine precision), and so requires solving systems of the form A_{II}^(l) w_I^(l) = r_I^(l) exactly (to machine precision) each iteration. A sparse direct solver may be used for A_{II}. Once u_B has been determined, u_I can be obtained at the cost of one subdomain solve.

In the second approach, the global stiffness matrix A is solved by a preconditioned CG algorithm, where the action of the inverse of the preconditioner Ã for A has the following block matrix structure:

    Ã^{-1} = [ I  −Ã_{II}^{-1} A_{IB} ] [ Ã_{II}^{-1}  0    ] [ I                      0 ]
             [ 0   I                 ] [ 0            S̃^{-1} ] [ −A_{IB}^T Ã_{II}^{-1}  I ]
                                                                                    (3.110)
           = [ I  −Ã_{II}^{-1} A_{IB} ] [ I  0    ] [ I          0 ] [ Ã_{II}^{-1}  0 ]
             [ 0   I                 ] [ 0  S̃^{-1} ] [ −A_{IB}^T  I ] [ 0            I ],

where S̃ denotes a preconditioner for S and Ã_{II} a preconditioner for A_{II}. Computing the solution to Ã u = f formally requires computing the action of Ã_{II}^{-1} twice, and of S̃^{-1} once. The advantage is that an exact solver is not required for A_{II}, but the disadvantage is that the inexact solver must be applied twice.

Remark 3.69. The second approach has not been studied extensively. If a preconditioner Ã is employed for A, it is important that the submatrices Ã_{II}^(i) and Ã_{IB}^(i) be scaled similar to A_{II}^(i) and A_{IB}^(i), respectively, or the convergence rate can deteriorate significantly to O(h^{-2}) even if cond(Ã_{II}^(i), A_{II}^(i)) = 1, see [BO4].
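The block action (3.110) can be checked on a small example. The sketch below (hypothetical 2×2/1×1 block data, not from the text) applies the block-factorized solve with exact blocks, Ã_II = A_II and S̃ = S; it performs exactly two A_II solves and one S solve, matching the operation count above, and reproduces the solution of A u = f.

```python
# Block-factorized solve of A u = f with exact blocks (hypothetical data):
# two A_II solves and one Schur complement solve reproduce A^{-1} f.

def solve2(A, b):   # 2x2 solve by Cramer's rule
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

A_II = [[4., 1.], [1., 3.]]   # hypothetical interior block
A_IB = [1., 2.]               # single interface unknown: A_IB is a column vector
A_BB = 5.
f = [1., 2., 3.]
f_I, f_B = f[:2], f[2]

u1 = solve2(A_II, f_I)                                  # first A_II solve
y = solve2(A_II, A_IB)                                  # second A_II solve
S = A_BB - (A_IB[0] * y[0] + A_IB[1] * y[1])            # Schur complement (scalar)
g = f_B - (A_IB[0] * u1[0] + A_IB[1] * u1[1])
u_B = g / S                                             # one S solve
u_I = [u1[i] - y[i] * u_B for i in range(2)]

# verify against the assembled system A u = f
A = [[4., 1., 1.], [1., 3., 2.], [1., 2., 5.]]
u = u_I + [u_B]
resid = [sum(A[i][j] * u[j] for j in range(3)) - f[i] for i in range(3)]
print(resid)   # ≈ [0.0, 0.0, 0.0] up to roundoff
```

With inexact Ã_II and S̃ the same three-stage application defines the preconditioner action in (3.110), which is why Ã_II is applied twice per iteration.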

188 3 Schur Complement and Iterative Substructuring Algorithms

3.8.4 Local Solvers, Parallelization and Libraries

Typically, sparse direct solvers are employed for solving the subdomain problems arising in a Schur complement algorithm. In some applications, FFT based solvers and iterative solvers are used for subdomain problems. In the Schur complement method, the action of A_{II}^{-1} and the action of preconditioners typically involve parallel tasks, which require synchronization between the processors assigned to different subdomains, see Chap. 2 for additional comments on local solvers, parallelization, and the MPI and PETSc libraries. Importantly, the PETSc library contains parallel codes implementing most Schur complement algorithms.

3.8.5 Remarks on Discontinuous Coefficient Problems

When a(·) has large jump discontinuities, care must be exercised in the choice of a subdomain decomposition and a coarse problem, or the rate of convergence of a Schur preconditioned algorithm can deteriorate. Ideally, the subdomains must align with the discontinuities of a(·), i.e., if Γ denotes the curve or surface along which the coefficient a(·) is discontinuous, then Γ ⊂ B = ∪_{i=1}^p ∂Ω_i, so that a(·) is smooth within each subdomain. If an initial decomposition of Ω yields subdomains on which a(·) is discontinuous, then larger subdomains may be further decomposed to improve load balancing. Choosing subdomains with reduced variation in a(·) also yields better conditioned local problems.

Another consideration is the choice of a coarse space. Theoretical bounds for Schur complement preconditioners are better when a coarse space is included. For a Schur complement preconditioner with optimal order complexity, typical bounds are O((1 + log(h_0/h))²) when a traditional coarse space is employed, on a two dimensional domain, provided the coefficient a(·) is constant within each subdomain. For a three dimensional domain, however, such bounds can deteriorate to O((h_0/h)(1 + log(h_0/h))²) when a traditional coarse space based on a coarse triangulation is employed, but improve to O((1 + log(h_0/h))²) when a piecewise constant coarse space is employed (see Remark 3.70 below). Other coarse spaces include wirebasket and partition of unity spaces, see [BR15, CO8, DR10, MA12, MA14, MA15, NE5, SA7, SA8, SA11, SA12, WI6].

Remark 3.70. The "piecewise constant coarse space" V_{0,P} is defined as follows. Let n_B and n_{B^(i)} denote the number of nodes on B and B^(i), respectively. Let N_i denote a matrix of size n_{B^(i)} whose columns form a basis for the null space of the local Schur complement matrix S^(i) when c(x) = 0. For 2nd order scalar elliptic equations, N_i = (1, ..., 1)^T. Let D^(i) be a diagonal matrix of size n_{B^(i)} with nonnegative entries defined by (3.97). Then, V_{0,P} ≡ Range(R_0^T), where:

    R_0^T = [ R_{B(1)}^T D^(1) N_1  ···  R_{B(p)}^T D^(p) N_p ].

Such a coarse space will be defined even when the subdomains do not form a coarse triangulation of Ω.
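The columns of R_0^T in Remark 3.70 are easy to assemble from purely algebraic data. A minimal sketch (hypothetical 5-node interface shared by two subdomains, with simple counting weights for the D^(l), which is one common choice): because the weights form a discrete partition of unity on B and N_l = (1, ..., 1)^T, the coarse basis vectors sum to the constant vector on B.

```python
# Building the columns of R_0^T = [R_B(1)^T D^(1) N_1, R_B(2)^T D^(2) N_2]
# for a hypothetical interface of 5 nodes; node 2 is shared by both subdomains.

nodes_B = [0, 1, 2, 3, 4]
B1 = [0, 1, 2]          # nodes of B lying on subdomain boundary B^(1)
B2 = [2, 3, 4]          # nodes of B lying on subdomain boundary B^(2)
count = {k: (k in B1) + (k in B2) for k in nodes_B}   # counting weights for D^(l)

def coarse_column(Bl):
    # R_B(l)^T D^(l) N_l with N_l = ones: weight 1/count(k) at each node k of B^(l)
    return [1.0 / count[k] if k in Bl else 0.0 for k in nodes_B]

R0T = [coarse_column(B1), coarse_column(B2)]    # the two columns of R_0^T
ones = [R0T[0][k] + R0T[1][k] for k in nodes_B]
print(ones)   # [1.0, 1.0, 1.0, 1.0, 1.0]
```

This partition of unity property is what allows the piecewise constant coarse space to reproduce constants on B without any coarse triangulation.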

3.8 Implementational Issues 189

3.8.6 Remarks on Anisotropic Problems

To motivate Schur complement algorithms for anisotropic problems, consider the following model equation:

    −α_1 u_{x_1 x_1} − α_2 u_{x_2 x_2} = f,   in Ω
    u = 0,                                     on ∂Ω,        (3.111)

posed on a domain Ω ≡ (−L_1, L_2) × (0, 1) ⊂ IR², with parameters α_1 > 0 and α_2 > 0 which determine the degree of anisotropy in the equation. This problem will be strongly anisotropic when (α_1/α_2) ≪ 1 or (α_1/α_2) ≫ 1. If α_2 = 1 and α_1 → 0^+, then the linear system will be strongly coupled along the x_2-axis, but weakly coupled along the x_1-axis. If α_1 = 1 and α_2 → 0^+, then the linear system will be strongly coupled along the x_1-axis, but weakly coupled along the x_2-axis. When this holds, elliptic equation (3.111) may be of singular perturbation type, with boundary layers in the solution [KE5, LA5]. Formally, we shall assume that the boundary layer need not be captured, and instead heuristically motivate issues for consideration when formulating a Schur complement preconditioner.

Consider a discretization of the above equation on a uniform grid, and suppose that Ω is partitioned into vertical strip subdomains. Suppose that the unknowns are ordered consecutively along each vertical line x_1 = c, with increasing indices as x_2 increases and as x_1 increases. Then, the coefficient matrix A will have a block tridiagonal structure. The following special limiting cases may be noted.

When α_1 → 0^+ and α_2 = 1, A_IB will be proportional to α_1, and A_II will remain nonsingular as α_1 → 0^+, yielding that S = (A_BB − A_IB^T A_II^{-1} A_IB) → A_BB as α_1 → 0^+. The diagonal blocks of S approach a discretization of −(∂²/∂x_2²), whose eigendecomposition may be obtained exactly, and the off diagonal blocks of S will formally approach zero. This suggests A_BB as a heuristic preconditioner for S.

When α_1 = 1 and α_2 → 0^+, each diagonal block of S will formally approach a scalar multiple of the identity (and will be well conditioned), but S will still have a block tridiagonal structure in this limiting case. In particular, if the off diagonal blocks in S are neglected when a preconditioner is formulated, it will result in deteriorated convergence rates.

The traditional norm equivalence between the subdomain Schur complement energy u^(i)T S^(i) u^(i) and the fractional Sobolev energy |u_h|²_{1/2,∂Ω_i} on the subdomain boundary:

    c_1 |u_h|²_{1/2,∂Ω_i} ≤ u^(i)T S^(i) u^(i) ≤ c_2 |u_h|²_{1/2,∂Ω_i},

will deteriorate for an anisotropic problem, with the ratio (c_2/c_1) increasing in proportion to the anisotropy in a(·). The limiting cases above indicate that the square root of the discrete Laplace-Beltrami operator on the interface B will generally not be an effective preconditioner for S in the strongly anisotropic case.
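The limit S → A_BB as α_1 → 0^+ can be observed numerically. The sketch below (hypothetical tiny 3×3 interior grid, not from the text) assembles the 5-point discretization of (3.111) with the middle vertical line of nodes taken as the interface B, forms S = A_BB − A_IB^T A_II^{-1} A_IB exactly, and measures its distance to A_BB for a small α_1.

```python
# Hypothetical small-grid illustration of S -> A_BB as alpha_1 -> 0+ for (3.111).

def solve(A, b):   # dense Gaussian elimination with partial pivoting
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def build(alpha1, alpha2):
    # nodes (i, j): i = column (x1 direction), j = row (x2); column i = 1 is B
    idx = {(i, j): 3 * i + j for i in range(3) for j in range(3)}
    n = 9
    A = [[0.0] * n for _ in range(n)]
    for (i, j), k in idx.items():
        A[k][k] = 2 * alpha1 + 2 * alpha2
        if i > 0: A[k][idx[(i - 1, j)]] = -alpha1
        if i < 2: A[k][idx[(i + 1, j)]] = -alpha1
        if j > 0: A[k][idx[(i, j - 1)]] = -alpha2
        if j < 2: A[k][idx[(i, j + 1)]] = -alpha2
    B = [idx[(1, j)] for j in range(3)]
    I = [k for k in range(n) if k not in B]
    AII = [[A[p][q] for q in I] for p in I]
    AIB = [[A[p][q] for q in B] for p in I]
    ABB = [[A[p][q] for q in B] for p in B]
    cols = [solve(AII, [AIB[p][c] for p in range(len(I))]) for c in range(3)]
    S = [[ABB[r][c] - sum(AIB[p][r] * cols[c][p] for p in range(len(I)))
          for c in range(3)] for r in range(3)]
    return S, ABB

S, ABB = build(1e-6, 1.0)
gap = max(abs(S[r][c] - ABB[r][c]) for r in range(3) for c in range(3))
print(gap)   # tiny, since the correction term scales like alpha_1^2
```

Here A_IB scales like α_1 while A_II stays nonsingular, so the correction A_IB^T A_II^{-1} A_IB is O(α_1²) and S is essentially A_BB in this limit, as claimed.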

190 3 Schur Complement and Iterative Substructuring Algorithms

However, when formulating a Schur complement preconditioner for an anisotropic problem, care must be exercised. When a(·) is a constant (or mildly varying) but strongly anisotropic matrix function on a domain Ω (not necessarily rectangular), strip subdomains may be chosen with sides perpendicular to the direction in which the equation is weakly coupled. Then, by analogy with the model problem as α_1 → 0^+, S will formally approach A_BB in the limit, and matrix A_BB may then be employed as a heuristic algebraic preconditioner for S (without coarse space correction). However, if the strips were chosen with sides perpendicular to an axis of strong coupling, as when α_2 → 0^+, the Schur complement matrix will have a block tridiagonal structure, and a coarse space may be required. Depending on the alignment of the sides of the subdomains relative to the direction of weak coupling, each subdomain problem may have similar anisotropic limits, and an algebraic approximation may be constructed to have the same anisotropic limits.

In three dimensions, the coefficient matrix a(x) will have three eigenvalues for each x ∈ Ω, and the elliptic equation will be strongly anisotropic if either one or two eigenvalues of a(x) are very small relative to the others. If a(·) is a constant matrix having two relatively small eigenvalues, then the elliptic equation will be strongly coupled along rays (lines) parallel to the eigenvector associated with the largest eigenvalue, while if a(·) is a constant matrix having only one relatively small eigenvalue, then the elliptic equation will be strongly coupled on planes perpendicular to the eigenvector of a(·) corresponding to the smallest eigenvalue. Heuristically, for the former, strip subdomains may be chosen so that the equation is strongly coupled within each strip, while for the latter, strip subdomains may still be employed, provided their sides are perpendicular to the eigenvector associated with the smallest eigenvalue of a(·). In either case, if a preconditioner is employed, a preconditioner based on subdomain Schur complements (such as the Neumann-Neumann or balancing preconditioner), or one based on algebraic approximation, may be employed, and a coarse space will be required.

3.8.7 Remarks on Time Stepped Problems

In time stepped problems, the condition number of an unpreconditioned Schur complement matrix improves with decreasing time step. We consider an implicit scheme in time and a finite difference discretization in space for the parabolic equation:

    u_t + L u = f,        in Ω × (0, T)
    u = 0,                on ∂Ω × (0, T)
    u(x, 0) = u_0(x),     in Ω,

where L u ≡ −∇ · (a ∇u). At each time step, this will yield a linear system (I + τ A) u = f̃, where 0 < τ denotes the time step and (I + τ A) corresponds to a finite difference discretization of the elliptic operator (I + τ L). Given a nonoverlapping decomposition Ω_1, ..., Ω_p of Ω with interface B, block partition u = (u_I^T, u_B^T)^T and f̃ = (f̃_I^T, f̃_B^T)^T, based on the subdomain interiors ∪_{i=1}^p Ω_i and the interface B.

3.8 Implementational Issues 191

The time stepped system (I + τ A) u = f̃ will then have the following block structure:

    [ I + τ A_{II}    τ A_{IB}     ] [ u_I ]   [ f̃_I ]
    [ τ A_{IB}^T      I + τ A_{BB} ] [ u_B ] = [ f̃_B ].        (3.112)

The Schur complement system will be:

    S(τ) u_B = ( f̃_B − τ A_{IB}^T (I + τ A_{II})^{-1} f̃_I ),

where the Schur complement matrix S(τ) satisfies:

    S(τ) = ( I + τ A_{BB} − τ² A_{IB}^T (I + τ A_{II})^{-1} A_{IB} ),

since, formally, S(τ) → I as τ → 0^+ for a fixed h. The entries of A_{BB}, A_{IB} and A_{II} grow as O(h^{-2}) as h → 0^+. Due to the τ and h dependent terms, a fixed preconditioner M for S(τ), such as the square root of the discrete Laplace-Beltrami matrix for a two subdomain decomposition, will not perform uniformly; a preconditioner M(τ) must ideally adapt to both parameters uniformly. In the strip or two subdomain case, FFT based preconditioners M(τ) can be constructed to adapt to the τ and h dependent terms, see [DA4, DA5, DR5, LA3, LA4, ZH5]. However, by heuristic analogy with Schwarz algorithms, we expect that a coarse space may not be required if some time step constraint of the form τ ≤ c h_0² holds, such as would be heuristically expected for the Neumann-Neumann and balancing preconditioners.

Remark 3.71. In time stepped problems, it may also be of interest to formulate a stable one iteration algorithm which computes the discrete solution at each time step to within the local truncation error. Below, we outline a heuristic approach based on a subassembly identity for the time stepped Schur complement S(τ) in terms of S^(l)(τ):

    S(τ) = Σ_{l=1}^p R_{B(l)}^T S^(l)(τ) R_{B(l)},

where each S^(l)(τ) = ( I^(l) + τ A_{BB}^(l) − τ² A_{IB}^(l)T (I + τ A_{II}^(l))^{-1} A_{IB}^(l) ) is a subdomain Schur complement, and I = Σ_{l=1}^p R_{B(l)}^T I^(l) R_{B(l)} forms an algebraic partition of the identity. Given such a decomposition, we may split S(τ) as:

    S(τ) = I + τ Σ_{l=1}^p R_{B(l)}^T ( A_{BB}^(l) − τ A_{IB}^(l)T (I + τ A_{II}^(l))^{-1} A_{IB}^(l) ) R_{B(l)},

and apply a generalized ADI (alternating directions implicit) method to construct an approximate solution [DR5, LA3, LA4, VA, VA2]. Time step constraints may apply, see Chap. 9. For an alternative scheme, see [ZH5].
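The behavior S(τ) → I as τ → 0^+ is easy to verify numerically. The sketch below (hypothetical 2×2/1×1 block data, not from the text) evaluates S(τ) = I + τ A_BB − τ² A_IB^T (I + τ A_II)^{-1} A_IB for a sequence of time steps and watches the distance to the identity shrink roughly linearly with τ.

```python
# Hypothetical check that the time-stepped Schur complement tends to the identity.

def solve2(A, b):   # 2x2 solve by Cramer's rule
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

A_II = [[4., -1.], [-1., 4.]]   # hypothetical interior block
A_IB = [-1., -1.]               # single interface unknown
A_BB = 4.

def S_tau(tau):
    M = [[(1.0 if i == j else 0.0) + tau * A_II[i][j] for j in range(2)]
         for i in range(2)]                     # I + tau A_II
    y = solve2(M, A_IB)                         # (I + tau A_II)^{-1} A_IB
    corr = tau * tau * (A_IB[0] * y[0] + A_IB[1] * y[1])
    return 1.0 + tau * A_BB - corr              # scalar S(tau)

gaps = [abs(S_tau(t) - 1.0) for t in (1e-1, 1e-2, 1e-3)]
print(gaps)   # decreases roughly in proportion to tau
```

For fixed h this is why the unpreconditioned condition number of S(τ) improves as the time step shrinks, and why a preconditioner M(τ) should degrade gracefully to the identity in the same limit.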

192 3 Schur Complement and Iterative Substructuring Algorithms

3.9 Theoretical Results

In this section, we describe theoretical methods for estimating the condition number of selected Schur complement preconditioners. We focus primarily on the dependence of the condition numbers on the mesh parameter h and subdomain size h_0, and in some cases on the jumps in the coefficients a(·). To obtain such bounds, we will employ the abstract Schwarz convergence theory described in Chap. 2.5, and employ theoretical properties of elliptic equations and Sobolev norms to estimate the dependence of partition parameters on the mesh size h, subdomain size h_0 and jumps in the coefficient a(·).

Our discussion will be organized as follows. In Chap. 3.9.1, we introduce scaled Sobolev norms, Poincaré-Freidrich's inequalities, and trace and extension theorems. We use these background results to derive an equivalence between the energy associated with the Schur complement matrix and a scaled sum of fractional Sobolev norm energies. Chap. 3.9.2 describes discrete Sobolev inequalities for finite element spaces and uses them to prove a result referred to as the glob theorem (our proof will hold only in two dimensions), useful in estimating partition parameters for glob based algorithms. In Chap. 3.9.3, we describe theoretical properties of the traditional and piecewise constant coarse spaces. In Chap. 3.9.4, we estimate the condition number of several two subdomain preconditioners. In Chap. 3.9.5, we estimate the condition number of multisubdomain block Jacobi, BPS and vertex space preconditioners. In Chap. 3.9.6, we describe estimates for the condition number of the balancing domain decomposition preconditioner. We omit theoretical discussion of wirebasket preconditioners.

3.9.1 Background Results

We will consider a finite element discretization of elliptic equation (3.1) on a quasiuniform triangulation T_h(Ω) of Ω. In most applications, we assume that the finite element space consists of continuous piecewise linear finite elements. The domain Ω will be assumed to be partitioned into nonoverlapping subdomains Ω_1, ..., Ω_p which form a quasiuniform triangulation T_{h_0}(Ω) of Ω of diameter h_0. The coefficient c(·) in (3.1) will be assumed to be zero, while the coefficient a(·) will be assumed to be constant on each subdomain:

    a(x) = ρ_i,   for x ∈ Ω_i,  for 1 ≤ i ≤ p
    c(x) = 0,     in Ω.

We will denote the finite element space defined on Ω as V_h(Ω), and by V_h(D) the space of finite element functions restricted to D, for any subregion D ⊂ Ω (including D ⊂ B). The following scaled norms and seminorms will be employed throughout this section [NE, LI4, GR8, BR15, DR14, DR2, DR10, MA17]:

3.9 Theoretical Results 193

    |u|²_{1,Ω_i}       ≡ ∫_{Ω_i} |∇u|² dx
    ‖u‖²_{1,Ω_i}       ≡ ∫_{Ω_i} |∇u|² dx + (1/h_0²) ∫_{Ω_i} |u|² dx
    |u|²_{1/2,B^(i)}   ≡ ∫_{B^(i)} ∫_{B^(i)} ( |u(x)−u(y)|² / |x−y|^d ) dx dy,   Ω_i ⊂ IR^d
    ‖u‖²_{1/2,B^(i)}   ≡ ∫_{B^(i)} ∫_{B^(i)} ( |u(x)−u(y)|² / |x−y|^d ) dx dy + (1/h_0) ∫_{B^(i)} |u|² dx.      (3.113)

By construction, the above norms and seminorms will scale similarly under dilations of the underlying domain in IR^d. As a consequence, we may map a subdomain Ω_i of width h_0 to a reference domain of width 1, apply trace or extension theorems on the reference domain, and map the results back to the original domain, obtaining estimates of the norms in the trace and extension theorems independent of the width h_0 of the subdomain. We will thus assume heuristically that the bounds in the trace, extension, and Poincaré-Freidrich's type inequalities are independent of h_0 when scaled norms are employed.

We will frequently encounter norms of the form ‖v‖²_{1/2,∂Ω_i} when the function v(·) ∈ H^{1/2}(∂Ω_i) is zero outside some subregion D_i ⊂ ∂Ω_i. In such cases, the norm ‖v‖_{1/2,∂Ω_i} will be stronger than ‖v‖_{1/2,D_i}, and will be denoted ‖v‖_{H_{00}^{1/2}(D_i)}, as formalized below.

Definition 3.72. Let D_i ⊂ ∂Ω_i. We define an extension by zero map E_0 as:

    E_0 v = { v   on D_i
            { 0   in ∂Ω_i \ D_i,

and define a Sobolev space H_{00}^{1/2}(D_i) and its norm by:

    H_{00}^{1/2}(D_i)     ≡ { v ∈ H^{1/2}(D_i) : E_0 v ∈ H^{1/2}(∂Ω_i) }
    ‖v‖_{H_{00}^{1/2}(D_i)} ≡ ‖E_0 v‖_{1/2,∂Ω_i}.                                   (3.114)

Substitution of the above definition into the integral form of the fractional Sobolev norm on H^{1/2}(∂Ω_i) yields:

    ‖v‖²_{H_{00}^{1/2}(D_i)} ≡ ∫_{D_i} ∫_{D_i} ( |v(x)−v(y)|² / |x−y|^d ) dx dy
                               + 2 ∫_{D_i} ∫_{∂Ω_i\D_i} ( |v(x)|² / |x−y|^d ) dy dx + (1/h_0) ‖v‖²_{0,D_i}.

When Ω_i ⊂ IR², this is easily verified to be equivalent to:

    ‖v‖²_{H_{00}^{1/2}(D_i)} ≡ ∫_{D_i} ∫_{D_i} ( |v(x)−v(y)|² / |x−y|^d ) dx dy + ∫_{D_i} ( |v(x)|² / dist(x, ∂Ω_i\D_i) ) dx.

Here, dist(x, ∂Ω_i\D_i) denotes the distance of x to ∂Ω_i\D_i. By construction, the fractional Sobolev space H_{00}^{1/2}(D_i) may also be defined equivalently as

194 3 Schur Complement and Iterative Substructuring Algorithms

an interpolation space of the form [L²(D_i), H_0^1(D_i)]_{1/2}, using interpolation between embedded Hilbert spaces, see [LI4, BA3, BE16]. This equivalence enables an alternate formal representation of fractional Sobolev spaces and their norms, using fractional powers of the eigenvalues in the spectral expansion of a Laplace-Beltrami operator associated with the underlying spaces, as described below.

Lemma 3.73. Suppose the following assumptions hold.

1. Let H_0^1(D_i) = { v : E_0 v ∈ H^1(∂Ω_i) }, and let −∆_{D_i} formally denote a self adjoint coercive operator which generates the Dirichlet form:

    (−∆_{D_i} u, u)_{L²(D_i)} ≡ |u|²_{1,D_i},   ∀u ∈ H_0^1(D_i) ⊂ L²(D_i),

as guaranteed by the Riesz representation theorem. Let:

    −∆_{D_i} = Σ_{l=1}^∞ λ_l P_l,

denote its spectral representation, where each P_l denotes an L²(D_i)-orthogonal projection onto the eigenspace of −∆_{D_i} associated with eigenvalue λ_l > 0.

2. Formally define the fractional power (−∆_{D_i})^{1/2} of the operator −∆_{D_i} as:

    (−∆_{D_i})^{1/2} = Σ_{l=1}^∞ λ_l^{1/2} P_l.

Then, the fractional Sobolev space H_{00}^{1/2}(D_i) defined by (3.114) will satisfy:

    H_{00}^{1/2}(D_i) = { v ∈ L²(D_i) : (−∆_{D_i})^{1/2} v ∈ L²(D_i) },

while its fractional Sobolev norm will satisfy:

    c Σ_{l=1}^∞ λ_l^{1/2} ‖P_l v‖²_{0,D_i} ≤ ‖v‖²_{H_{00}^{1/2}(D_i)} ≤ C Σ_{l=1}^∞ λ_l^{1/2} ‖P_l v‖²_{0,D_i},

for some 0 < c < C.

Proof. See [LI4, BA3, BE16].

We next describe a result referred to as the Poincaré-Freidrich's inequality, which establishes a bound for the L²(Ω_i) norm of a function in terms of one of its Sobolev seminorms, provided the function either has zero mean value on the underlying domain, or is zero on a segment of the boundary.
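The spectral construction in Lemma 3.73 has a direct discrete analogue that can be checked numerically. The sketch below (a hypothetical discrete stand-in, not from the text) uses the 1D Dirichlet Laplacian tridiag(−1, 2, −1), whose eigenpairs are known in closed form, assembles its half power as Σ_k λ_k^{1/2} P_k, and verifies that squaring the half power recovers the original matrix.

```python
# Discrete analogue of the fractional power definition: build A^{1/2} spectrally
# for the 1D Dirichlet Laplacian tridiag(-1, 2, -1) and check (A^{1/2})^2 = A.
import math

n = 6
lam = [2 - 2 * math.cos(k * math.pi / (n + 1)) for k in range(1, n + 1)]
vec = [[math.sqrt(2.0 / (n + 1)) * math.sin(k * math.pi * (j + 1) / (n + 1))
        for j in range(n)] for k in range(1, n + 1)]   # orthonormal eigenvectors

def spectral_power(p):
    # sum_k lambda_k^p  v_k v_k^T, i.e. the p-th power via the spectral expansion
    return [[sum(lam[k] ** p * vec[k][i] * vec[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

half = spectral_power(0.5)
sq = [[sum(half[i][k] * half[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]
target = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
           for j in range(n)] for i in range(n)]
err = max(abs(sq[i][j] - target[i][j]) for i in range(n) for j in range(n))
print(err)   # at the level of machine precision
```

The same recipe, with λ_l and P_l taken from a Laplace-Beltrami operator on D_i, is exactly how the norm equivalence in Lemma 3.73 represents the H_00^{1/2}(D_i) energy.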

3.9 Theoretical Results 195

Lemma 3.74 (Poincaré-Freidrich's). For the choice of scaled Sobolev norms and seminorms defined in (3.113), the following bounds will hold.

1. If v ∈ H¹(Ω_i) satisfies ∫_{Ω_i} v dx = 0, then:

    ‖v‖²_{0,Ω_i} ≤ C |v|²_{1,Ω_i}   and   ‖v‖²_{1,Ω_i} ≤ (1 + C) |v|²_{1,Ω_i},

for some C > 0 independent of v and h_0.

2. If v ∈ H¹(Ω_i) satisfies v = 0 on D_i ⊂ ∂Ω_i, where measure(D_i) > 0, then:

    ‖v‖²_{0,Ω_i} ≤ C |v|²_{1,Ω_i}   and   ‖v‖²_{1,Ω_i} ≤ (1 + C) |v|²_{1,Ω_i},

for some C > 0 independent of v and h_0.

3. If g ∈ H^{1/2}(B^(i)) satisfies ∫_{B^(i)} g ds = 0, then:

    ‖g‖²_{0,B^(i)} ≤ C |g|²_{1/2,B^(i)}   and   ‖g‖²_{1/2,B^(i)} ≤ (1 + C) |g|²_{1/2,B^(i)},

for some C > 0 independent of g and h_0.

4. If g ∈ H^{1/2}(B^(i)) satisfies g = 0 on D_i ⊂ ∂Ω_i, where measure(D_i) > 0, then:

    ‖g‖²_{0,B^(i)} ≤ C |g|²_{1/2,B^(i)}   and   ‖g‖²_{1/2,B^(i)} ≤ (1 + C) |g|²_{1/2,B^(i)},

for some C > 0 independent of g and h_0.

Proof. See [NE]. The parameter C will be independent of h_0 because of the scaled norms employed.

Additionally, the first and third Poincaré-Freidrich's inequalities may equivalently be stated in the quotient spaces H¹(Ω_i)/IR and H^{1/2}(B^(i))/IR, respectively, since the seminorms are invariant under shifts by constants.

The next result we describe is referred to as a trace theorem, and states that when Ω_i ⊂ IR^d for d = 2, 3, functions in H¹(Ω) will have boundary values (or trace) of some regularity (smoothness). This result is stated below.

Theorem 3.75 (Trace Theorem). If v ∈ H¹(Ω_i), then its restriction to the boundary ∂Ω_i will be well defined, with v ∈ H^{1/2}(∂Ω_i) ⊂ L²(∂Ω_i), and:

    ‖v‖_{1/2,∂Ω_i} ≤ C ‖v‖_{1,Ω_i},

where C > 0 is independent of v and h_0.

Proof. See [NE, LI4, GR8]. The linear mapping of v ∈ H¹(Ω_i) to its boundary value v ∈ H^{1/2}(∂Ω_i) is not only bounded, it is surjective. As a consequence, by the closed graph theorem, this mapping will have a bounded right inverse, see [NE, LI4, GR8]. This result, referred to as an extension theorem, is stated below.

196 3 Schur Complement and Iterative Substructuring Algorithms

Theorem 3.76 (Extension Theorem). There exists a bounded linear map E : H^{1/2}(∂Ω_i) → H¹(Ω_i), such that for each g ∈ H^{1/2}(∂Ω_i):

    E g = g   on ∂Ω_i,

satisfying the following bound:

    ‖E g‖_{1,Ω_i} ≤ C ‖g‖_{1/2,∂Ω_i},

for C > 0 independent of g and h_0. The independence of C from h_0 is a consequence of the scaled norms employed.

Proof. See [NE, LI4, GR8].

As we will be working with finite element functions, we will require a discrete version of the preceding extension theorem, in which the extended function is a finite element function. We refer to such a result as a discrete extension theorem. More general results are described in [AS4, BJ9, BR11, WI, NE6].

Lemma 3.77. Let Ω_i be a polygonal domain of size h_0, triangulated by a grid T_h(Ω_i) quasiuniform of size h. Then there exists a bounded linear map:

    E_h : V_h(∂Ω_i) ∩ H^{1/2}(∂Ω_i) → V_h(Ω_i) ∩ H¹(Ω_i),

such that for g_h ∈ V_h(∂Ω_i) ∩ H^{1/2}(∂Ω_i):

    E_h g_h = g_h   on ∂Ω_i,

with the following bound holding:

    ‖E_h g_h‖_{1,Ω_i} ≤ C ‖g_h‖_{1/2,∂Ω_i},

where C > 0 is independent of g_h, h and h_0.

Proof. We will outline a proof when Ω_i ⊂ IR². To construct a finite element extension E_h g_h ∈ V_h(Ω_i), given g_h ∈ V_h(∂Ω_i), we will first extend g_h to the interior of the subdomain as a harmonic function H g_h:

    −∆(H g_h) = 0,    in Ω_i
    H g_h = g_h,      on ∂Ω_i.

Applying the continuous extension theorem and using the weak formulation of Laplace's equation on Ω_i, it can easily be shown that H g_h ∈ H¹(Ω_i), and that H g_h will satisfy the a priori bound:

    ‖H g_h‖_{1,Ω_i} ≤ C ‖g_h‖_{1/2,∂Ω_i},

where C > 0 is independent of g_h and h_0.

Furthermore, the harmonic extension H gh will be H^{1+ε}(Ωi) regular on the polygonal domain [GR8] for some ε > 0, and satisfy the following a priori bound [NE, GR8, GI, EV]:
|H gh|_{1+ε,Ωi} ≤ C |gh|_{1/2+ε,∂Ωi},
where C is independent of h0 and h. The harmonic extension H gh will, however, not be a finite element function. So define Eh gh as the interpolant Ih H gh of H gh onto the finite element space Vh(Ωi):
Eh gh ≡ Ih H gh.
Since H gh is a harmonic function, it will be continuous in the interior, and so the interpolant Ih H gh will be well defined on the interior nodes of Ωi. By construction, the interpolant Ih H gh is also well defined on the boundary ∂Ωi, since H gh = gh is continuous and piecewise polynomial on ∂Ωi; in particular, since gh is continuous and piecewise polynomial it will hold that gh ∈ H¹(∂Ωi). Thus, Ih H gh will be well defined in Vh(Ωi).

We now verify that the discrete extension map Eh is bounded. Applying standard error bounds [CI2, JO2] for the interpolation map yields:
|Ih H gh − H gh|_{1,Ωi} ≤ C h^ε |H gh|_{1+ε,Ωi} ≤ C h^ε |gh|_{1/2+ε,∂Ωi}.
Substituting an inverse inequality [CI2, JO2] of the form:
|gh|_{1/2+ε,∂Ωi} ≤ C h^{−ε} |gh|_{1/2,∂Ωi},
into the preceding yields:
|Ih H gh − H gh|_{1,Ωi} ≤ C h^ε h^{−ε} |gh|_{1/2,∂Ωi} = C |gh|_{1/2,∂Ωi}.
Applying the triangle inequality and employing the preceding bounds yields:
|Ih H gh|_{1,Ωi} ≤ |Ih H gh − H gh|_{1,Ωi} + |H gh|_{1,Ωi} ≤ C |gh|_{1/2,∂Ωi}.
By construction we obtain Ih H gh = α when gh(x) = α ∈ IR, so that using a quotient space argument and applying the Poincaré-Friedrichs inequality yields:
||Ih H gh||_{1,Ωi} ≤ C ||gh||_{1/2,∂Ωi},
where C is independent of h0 and h. Since ρi is not involved in this construction, C will be independent of ρi.

We shall next state and prove a basic norm equivalence between the energy associated with the Schur complement matrix on a subdomain and a weighted fractional Sobolev norm energy on the boundary of the subdomain.

198 3 Schur Complement and Iterative Substructuring Algorithms

Lemma 3.78. Suppose the following assumptions hold.

1. Let a(·) in (3.1) satisfy a(x) = ρi on Ωi for 1 ≤ i ≤ p and c(·) ≡ 0 on Ω.
2. Let gi ∈ Vh(∂Ωi) ∩ H^{1/2}(∂Ωi) satisfy:
   gi = 0 on BD ∩ ∂Ωi.
3. Let ui ∈ Vh(Ωi) ∩ H¹(Ωi) satisfy:
   Ai(ui, v) = 0, ∀v ∈ Vh(Ωi) ∩ H0¹(Ωi)
   ui = gi, on ∂Ωi,
   where Ai(u, v) ≡ ∫_{Ωi} ρi ∇u · ∇v dx.

Then the following norm equivalence will hold:
c ρi |gi|²_{1/2,∂Ωi} ≤ Ai(ui, ui) ≤ C ρi |gi|²_{1/2,∂Ωi},
for 0 < c < C independent of h, h0, ρi and gi.

Proof. We will employ the notation ui = Hih gi ∈ Vh(Ωi) ∩ H¹(Ωi) to denote a discrete harmonic function with boundary values gi ∈ Vh(∂Ωi) ∩ H^{1/2}(∂Ωi). We will describe the proof for the case ∂Ωi ∩ BD = ∅. The proof when ∂Ωi ∩ BD ≠ ∅ will be analogous, provided the boundary norm |gi|²_{1/2,∂Ωi} is replaced by ||gi||²_{H^{1/2}_{00}(B^{(i)})} and provided the appropriate version of the Poincaré-Friedrichs inequality is employed.

To prove the lower bound, given data gi ∈ Vh(∂Ωi) ∩ H^{1/2}(∂Ωi) let αi denote the mean value of ui = Hih gi on Ωi. Since c(x) = 0, it will follow that if γi is a constant then Hih γi = γi, and that Hih(gi − γi) = ui − γi. Apply the invariance of seminorms under shifts by constants and the trace theorem to obtain:
|gi|²_{1/2,∂Ωi} = |gi − αi|²_{1/2,∂Ωi} ≤ C1 ||ui − αi||²_{1,Ωi} ≤ C2 |ui − αi|²_{1,Ωi} = C2 |ui|²_{1,Ωi} = C2 ρi^{−1} Ai(ui, ui),
where the third line above follows by the Poincaré-Friedrichs inequality, since αi corresponds to the mean value of ui on Ωi. To prove the upper bound, represent the extension ui = Hih gi in the form:
Hih gi = Eih gi + wi,

where Eih gi ∈ Vh(Ωi) ∩ H¹(Ωi) is an extension of gi satisfying the following:
Eih gi = gi, on ∂Ωi    (3.115)
||Eih gi||_{1,Ωi} ≤ C1 ||gi||_{1/2,∂Ωi},
as given by the discrete extension theorem (Lemma 3.77), and wi is defined by wi ≡ Hih gi − Eih gi ∈ Vh(Ωi) ∩ H0¹(Ωi). We substitute the above representation into the equation satisfied by Hih gi:
Ai(Eih gi + wi, v) = 0, ∀v ∈ Vh(Ωi) ∩ H0¹(Ωi),
and choose v = wi ∈ Vh(Ωi) ∩ H0¹(Ωi) to obtain:
ρi |wi|²_{1,Ωi} = Ai(wi, wi) = −Ai(Eih gi, wi) ≤ ρi |Eih gi|_{1,Ωi} |wi|_{1,Ωi}.
It thus follows that:
|wi|_{1,Ωi} ≤ |Eih gi|_{1,Ωi} ≤ C1 ||gi||_{1/2,∂Ωi}.
Applying the triangle inequality to ui = Eih gi + wi, and using the preceding bound and equation (3.115), yields:
|ui|_{1,Ωi} ≤ C2 ||gi||_{1/2,∂Ωi}.
The same bound will hold if gi is replaced by gi − γi for any constant γi:
|ui − γi|_{1,Ωi} ≤ C2 ||gi − γi||_{1/2,∂Ωi},
since, by construction, Hih(gi − γi) = ui − γi. If we choose γi as the mean value of gi on ∂Ωi, then gi − γi will have zero mean value on ∂Ωi, and an application of the Poincaré-Friedrichs inequality will yield:
||gi − γi||_{1/2,∂Ωi} ≤ C3 |gi − γi|_{1/2,∂Ωi} = C3 |gi|_{1/2,∂Ωi}.
It will thus hold that:
Ai(ui, ui) = ρi |ui|²_{1,Ωi} = ρi |ui − γi|²_{1,Ωi} ≤ C3 ρi |gi|²_{1/2,∂Ωi},
which is the desired upper bound.

Remark 3.79. The parameters Ci in the preceding estimates will be independent of h and ρi. In addition, they will be independent of h0 due to the scale invariance of the seminorms. In general, Ci may depend on other geometrical properties of Ωi, such as its aspect ratio.
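The discrete harmonic extension appearing above can be computed explicitly for a small model problem, and its energy compared with the Schur complement energy used in the sequel. A minimal sketch (an added illustration, not from the text), assuming a 1D Laplacian on seven nodes with one interface node; `solve` is a hypothetical dense Gaussian-elimination helper:

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting (dense, small demo matrices only)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

# 1D Laplacian (tridiag -1, 2, -1) on 7 interior nodes of (0,1):
# two subdomains of 3 nodes each, separated by the interface node 3.
n = 7
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 2.0
    if i + 1 < n:
        A[i][i + 1] = A[i + 1][i] = -1.0
interior = [0, 1, 2, 4, 5, 6]
AII = [[A[i][j] for j in interior] for i in interior]
AIB = [A[i][3] for i in interior]          # coupling to the interface node
uB = 1.0                                    # interface value g_i
# discrete harmonic extension: u_I = -AII^{-1} AIB uB
uI = solve(AII, [-c * uB for c in AIB])
u = [0.0] * n
for k, i in enumerate(interior):
    u[i] = uI[k]
u[3] = uB
energy = sum(u[i] * A[i][j] * u[j] for i in range(n) for j in range(n))
# Schur complement S = ABB - AIB^T AII^{-1} AIB (a 1x1 matrix here)
w = solve(AII, AIB)
S = A[3][3] - sum(AIB[r] * w[r] for r in range(len(interior)))
```

The identity `energy == uB * S * uB` (both equal 0.5 here) is the algebraic form of the energy equivalence: the Schur complement energy of interface data equals the energy of its discrete harmonic extension.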

200 3 Schur Complement and Iterative Substructuring Algorithms

Applying the preceding equivalence on each subdomain and summing over all the subdomains yields a global equivalence between the Schur complement energy and a weighted sum of the subdomain fractional Sobolev energies.

Lemma 3.80. Suppose the following assumptions hold.

1. Let Ω1, . . ., Ωp form a quasiuniform triangulation of Ω of width h0.
2. Let the coefficient a(x) = ρi on Ωi and c(x) = 0 on Ω, with:
   Ai(u, v) ≡ ∫_{Ωi} ρi ∇u · ∇v dx.
3. Given a vector uB of nodal values on the interface B, define u = (uI^T, uB^T)^T where uI = −AII^{−1} AIB uB. Let uh denote the discrete harmonic finite element function corresponding to the nodal vector u.

Then, the following estimate will hold:
c Σ_{i=1}^p ρi |uh|²_{1/2,∂Ωi} ≤ uB^T S uB ≤ C Σ_{i=1}^p ρi |uh|²_{1/2,∂Ωi},
for 0 < c < C independent of h, h0 and ρi.

Proof. Since uh is piecewise discrete harmonic by assumption, it will satisfy:
uB^T S uB = u^T A u = A(uh, uh) ≡ Σ_{i=1}^p Ai(uh, uh).
The result now follows by an application of the preceding lemma on each subdomain, using that gi = uh on ∂Ωi, and summing over all subdomains.

Remark 3.81. In view of the preceding result, a preconditioner M for S must ideally be chosen so that its interface energy uB^T M uB approximates the above weighted sum of fractional Sobolev energies on the subdomain boundaries.

3.9.2 Discrete Sobolev Inequalities

We next describe a discrete Sobolev inequality [BR12] which holds for finite element functions on Ω ⊂ IR²; see also [DR2, DR10, MA14, MA17].

Theorem 3.82. Let Vh(Ωi) denote a finite element space defined on a domain Ωi ⊂ IR² of diameter h0, triangulated by a quasiuniform grid of size h. Then the following bound will hold for the maximum norm on Vh(Ωi) ⊂ H¹(Ωi):
||uh||²_{∞,Ωi} ≤ C (1 + log(h0/h)) ( h0^{−2} ||uh||²_{0,Ωi} + |uh|²_{1,Ωi} ), ∀uh ∈ Vh(Ωi),
where C > 0 is independent of h0 and h.

Proof. We follow the proof in [BR12]. Let x* ∈ Ω̄i denote a point where the finite element function uh attains its maximum modulus |uh(x*)| = ||uh||_{∞,Ωi}. Let C ⊂ Ω̄i denote a cone of radius R and angle α at vertex x*. Introduce polar coordinates (r, θ) within the cone so that (0, 0) corresponds to x* and so that the cone is specified in polar coordinates by 0 ≤ r ≤ R and 0 ≤ θ ≤ α. Apply the fundamental theorem of calculus along a ray within the cone:
uh(0, 0) = uh(R, θ) − ∫_0^R (∂uh/∂r)(r, θ) dr.
Split the integral using the intervals 0 ≤ r ≤ εh and εh ≤ r ≤ R for some 0 < ε < 1, take absolute values of all terms, and employ the inverse inequality ||duh/dr||_{∞,Ωi} ≤ h^{−1} ||uh||_{∞,Ωi} within the interval 0 ≤ r ≤ εh (which holds trivially for piecewise linear finite elements) to obtain:
|uh(0, 0)| ≤ |uh(R, θ)| + ∫_0^{εh} |(∂uh/∂r)(r, θ)| dr + ∫_{εh}^R |(∂uh/∂r)(r, θ)| dr
≤ |uh(R, θ)| + ε h ||uh||_{∞,Ωi} h^{−1} + ∫_{εh}^R |(∂uh/∂r)(r, θ)| dr
= |uh(R, θ)| + ε ||uh||_{∞,Ωi} + ∫_{εh}^R |(∂uh/∂r)(r, θ)| dr.
Since |uh(0, 0)| = ||uh||_{∞,Ωi}, bringing back the term ε ||uh||_{∞,Ωi} yields:
(1 − ε) ||uh||_{∞,Ωi} ≤ |uh(R, θ)| + ∫_{εh}^R |(∂uh/∂r)(r, θ)| dr.
Integrating the above expression as θ ranges in (0, α) yields:
(1 − ε) α ||uh||_{∞,Ωi} ≤ ∫_0^α |uh(R, θ)| dθ + ∫_0^α ∫_{εh}^R |(∂uh/∂r)(r, θ)| dr dθ
= ∫_0^α |uh(R, θ)| dθ + ∫_0^α ∫_{εh}^R (1/r) |(∂uh/∂r)(r, θ)| r dr dθ.
Squaring both sides, applying the triangle inequality and the Cauchy-Schwarz inequality to the terms on the right side yields:
||uh||²_{∞,Ωi} ≤ (2/(α²(1−ε)²)) [ ( ∫_0^α |uh(R, θ)|² dθ ) ( ∫_0^α dθ ) + ( ∫_0^α ∫_{εh}^R |(∂uh/∂r)(r, θ)|² r dr dθ ) ( ∫_0^α ∫_{εh}^R (1/r²) r dr dθ ) ].
Simplifying the expression yields the bound:
||uh||²_{∞,Ωi} ≤ (2/(α(1−ε)²)) [ ∫_0^α |uh(R, θ)|² dθ + log(R/(εh)) ∫_0^α ∫_{εh}^R |(∂uh/∂r)(r, θ)|² r dr dθ ].
Since (∂uh/∂r)² ≤ (∂uh/∂r)² + (1/r²)(∂uh/∂θ)² = |∇uh|², we obtain:
||uh||²_{∞,Ωi} ≤ (2/(α(1−ε)²)) [ ∫_0^α |uh(R, θ)|² dθ + ( log(1/ε) + log(R/h) ) |uh|²_{1,C} ]
≤ (2/(α(1−ε)²)) ∫_0^α |uh(R, θ)|² dθ + C (1 + log(h0/h)) |uh|²_{1,Ωi}.
Multiplying both sides by R dR and integrating over 0 ≤ R ≤ β h0 (assuming that the cone C can be extended within Ωi to have diameter β h0, for some 0 < β ≤ 1) yields the estimate:
(β² h0²/2) ||uh||²_{∞,Ωi} ≤ (2/(α(1−ε)²)) ∫_0^{β h0} ∫_0^α |uh(R, θ)|² R dR dθ + (C β² h0²/2) (1 + log(h0/h)) |uh|²_{1,Ωi}
≤ (2/(α(1−ε)²)) ||uh||²_{0,Ωi} + (C β² h0²/2) (1 + log(h0/h)) |uh|²_{1,Ωi}.
Dividing both sides by the factor β² h0²/2 yields the desired result.

As a corollary of the preceding result, we obtain a discrete Sobolev inequality holding on the boundary ∂Ωi of a two dimensional domain.

Lemma 3.83. Let Ωi ⊂ IR² be of diameter h0 and triangulated by a quasiuniform grid of size h.

Given vh ∈ Vh(∂Ωi) ∩ H^{1/2}(∂Ωi), let Hih vh ∈ Vh(Ωi) ∩ H¹(Ωi) denote the discrete harmonic extension of vh into Ωi, satisfying:
||Hih vh||²_{1,Ωi} ≤ C ||vh||²_{1/2,∂Ωi},
for C > 0 independent of h0 and h. Then, the following bound will hold:
||vh||²_{∞,∂Ωi} ≤ C (1 + log(h0/h)) ||vh||²_{1/2,∂Ωi}    (3.116)
for vh ∈ Vh(∂Ωi) ∩ H^{1/2}(∂Ωi), where C > 0 is independent of h0 and h.

Proof. Applying the preceding lemma to Hih vh and using the boundedness of Hih yields:
||vh||²_{∞,∂Ωi} ≤ ||Hih vh||²_{∞,Ωi} ≤ C (1 + log(h0/h)) ||Hih vh||²_{1,Ωi} ≤ C (1 + log(h0/h)) ||vh||²_{1/2,∂Ωi}.

We shall now present an alternate proof of the preceding discrete Sobolev inequality based on Fourier series [BR29]. This proof will use the property that the boundary ∂Ωi of a simply connected polygonal domain Ωi ⊂ IR² will be Lipschitz homeomorphic to the unit circle S¹ (i.e., under a Lipschitz continuous parameterization, there will be a one to one correspondence between ∂Ωi and the unit circle S¹). Given such a parameterization x(θ) of the boundary ∂Ωi by a 2π periodic function x(θ), with arclength measure ds(x(θ)) = |x′(θ)| dθ defined along the curve, we may represent any function u(·) ∈ L²(∂Ωi) by a Fourier series expansion of the form:
u(x(θ)) = Σ_{k=−∞}^{∞} ck e^{ikθ}.
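In this representation, fractional Sobolev seminorms become weighted sums of Fourier coefficients. As an added numerical illustration (not from the text), with the 2π-periodic normalization |u|²_{H^{1/2}(0,2π)} = 2π Σ_k |k| |ck|², the function cos(3θ) has seminorm squared exactly 3π; the helper names below are hypothetical:

```python
import cmath
import math

def dft_coeffs(samples):
    # c_k = (1/N) sum_j u_j e^{-ik theta_j}, theta_j = 2*pi*j/N (naive DFT, small N)
    N = len(samples)
    return [sum(samples[j] * cmath.exp(-2j * math.pi * k * j / N)
                for j in range(N)) / N for k in range(N)]

def h_half_seminorm_sq(samples):
    # |u|_{H^{1/2}}^2 = 2*pi * sum_k |k| |c_k|^2 over frequencies -N/2 < k <= N/2
    N = len(samples)
    c = dft_coeffs(samples)
    total = 0.0
    for m in range(N):
        k = m if m <= N // 2 else m - N   # map DFT index to signed frequency
        total += abs(k) * abs(c[m]) ** 2
    return 2.0 * math.pi * total

N = 64
theta = [2.0 * math.pi * j / N for j in range(N)]
u = [math.cos(3.0 * t) for t in theta]
val = h_half_seminorm_sq(u)
```

Since cos(3θ) has c_3 = c_{−3} = 1/2 and no other modes, the computed value agrees with 3π to rounding error, matching the coefficient characterization used in the alternate proof.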

When Ωi is shape regular, the following equivalences will hold [BR29]:
||u||²_{L²(∂Ωi)} = (h0/2π) Σ_{k=−∞}^{∞} 2π |ck|²,
|u|²_{H^β(∂Ωi)} = (h0^{1−2β}/2π) Σ_{k=−∞}^{∞} 2π |k|^{2β} |ck|², for 0 < β < 1,
where h0 = |∂Ωi| denotes the length of ∂Ωi. The alternate proof of the discrete Sobolev inequality will be obtained based on the following continuous Sobolev inequality for 2π periodic functions.

Lemma 3.84. Let v(x) ∈ H^{(1+ε)/2}(0, 2π) denote a real periodic function on [0, 2π] with Fourier expansion:
v(x) = Σ_{k=−∞}^{∞} ck e^{ikx}.
Then, the following bound will hold:
||v||²_{L∞(0,2π)} ≤ C ( ||v||²_{L²(0,2π)} + ε^{−1} |v|²_{H^{(1+ε)/2}(0,2π)} )    (3.117)
for 0 < ε < 1 and C independent of ε.

Proof. To prove the bound, take absolute values of the Fourier expansion and apply the Cauchy-Schwarz inequality to obtain:
||v||²_{L∞(0,2π)} ≤ ( |c0| + Σ_{k≠0} |ck| )²
= ( |c0| + Σ_{k≠0} ( |k|^{(1+ε)/2} |ck| ) |k|^{−(1+ε)/2} )²
≤ 2π |c0|² + 2π ( Σ_{k≠0} |k|^{1+ε} |ck|² ) ( Σ_{k≠0} |k|^{−1−ε} )
≤ ||v||²_{L²(0,2π)} + |v|²_{H^{(1+ε)/2}(0,2π)} Σ_{k≠0} |k|^{−1−ε}.
Using the integral test, we may bound:
Σ_{k≠0} |k|^{−1−ε} ≤ 2 ( 1 + ∫_1^∞ dx/x^{1+ε} ) = 2 ( 1 + ε^{−1} ) ≤ 4 ε^{−1},
for 0 < ε < 1. Substituting this into the preceding bound yields:
||v||²_{L∞(0,2π)} ≤ ||v||²_{L²(0,2π)} + 4 ε^{−1} |v|²_{H^{(1+ε)/2}(0,2π)},
which is the desired result.

The discrete Sobolev inequality (3.116) can now be obtained from (3.117) by choosing ε appropriately and using an inverse inequality for finite elements.
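The integral-test bound Σ_{k≠0} |k|^{−1−ε} ≤ 4 ε^{−1} used above can be checked numerically (an added sketch; `tail_sum` is a hypothetical helper that brackets the series between a partial sum and a partial sum plus an integral tail estimate):

```python
def tail_sum(eps, N=100000):
    # sum_{k != 0} |k|^{-1-eps} = 2 * sum_{k >= 1} k^{-1-eps};
    # bound the tail k > N by the integral test: 2 * N^{-eps} / eps
    partial = 2.0 * sum(k ** (-1.0 - eps) for k in range(1, N + 1))
    upper = partial + 2.0 * (N ** (-eps)) / eps
    return partial, upper

for eps in (0.1, 0.25, 0.5, 0.9):
    lower, upper = tail_sum(eps)
    # the full series lies between lower and upper, and upper stays below 4/eps
    assert lower <= upper <= 4.0 / eps
```

The same integral-test argument that produces the bound also shows why it degenerates as ε → 0, which is what forces the logarithmic factor in the discrete inequality.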

Lemma 3.85. Let vh ∈ Vh(∂Ωi) ∩ H^{(1+ε)/2}(∂Ωi) be 2π-periodic. Then, the following bound will hold:
||vh||²_{L∞(∂Ωi)} ≤ C (1 + log(h0/h)) ||vh||²_{H^{1/2}(∂Ωi)},
for C > 0 independent of h and h0.

Proof. We follow the proof in [BR29]. Apply the preceding continuous Sobolev inequality to the 2π-periodic representation of vh, and employ the norm equivalences, to obtain the bound:
||vh||²_{∞,∂Ωi} ≤ C h0^{−1} ||vh||²_{0,∂Ωi} + C ε^{−1} h0^{ε} |vh|²_{H^{(1+ε)/2}(∂Ωi)}.
Substitute the following inverse inequality:
|vh|²_{H^{(1+ε)/2}(∂Ωi)} ≤ C h^{−ε} |vh|²_{H^{1/2}(∂Ωi)}, ∀vh ∈ Vh(∂Ωi),
with C > 0 independent of h, in the preceding bound, to obtain:
||vh||²_{∞,∂Ωi} ≤ C h0^{−1} ||vh||²_{0,∂Ωi} + C ε^{−1} (h0/h)^ε |vh|²_{H^{1/2}(∂Ωi)}.
Importantly, the parameter ε > 0 may be chosen small enough so that:
ε^{−1} (h0/h)^ε ≤ C (1 + log(h0/h)).
This will hold provided:
ε ≡ 1/4, if (h/h0) ≥ e^{−4},
ε ≡ 1/log(h0/h), if (h/h0) < e^{−4},
and can be verified by an application of the derivative test for an extremum in the parameter ε. The desired result follows immediately.
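The choice of ε in the proof above can be verified numerically: ε = 1/log(h0/h) minimizes the factor ε^{−1}(h0/h)^ε, which then equals exactly e · log(h0/h). A small added check (`growth` is a hypothetical helper name):

```python
import math

def growth(eps, ratio):
    # the factor eps^{-1} (h0/h)^eps appearing after the inverse inequality
    return (ratio ** eps) / eps

for ratio in (10.0, 100.0, 1e4, 1e8):      # ratio = h0/h
    eps = 1.0 / math.log(ratio)
    val = growth(eps, ratio)               # equals e * log(h0/h)
    assert abs(val - math.e * math.log(ratio)) < 1e-9 * val
    assert val <= math.e * (1.0 + math.log(ratio))
    # eps = 1/log(ratio) is the minimizer: perturbing eps cannot decrease the factor
    assert growth(0.7 * eps, ratio) >= val - 1e-12
    assert growth(1.3 * eps, ratio) >= val - 1e-12
```

This logarithmic growth in h0/h, rather than any power of h0/h, is exactly the factor that appears in the discrete Sobolev inequality (3.116).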

We now apply the discrete Sobolev inequalities to derive results useful for estimating the condition number of Schur complement preconditioners.

Lemma 3.86. Suppose the following assumptions hold.
1. Let Di ⊂ ∂Ωi denote a connected subset of length di ≤ h0.
2. Let wh ∈ Vh(∂Ωi) satisfy:
   wh(x) = 0 for x ∈ ∂Ωi \ Di.
Then, the following results will hold:
||wh||²_{H^{1/2}_{00}(Di)} = |wh|²_{1/2,∂Ωi} ≤ C (1 + log(h0/h)) ||wh||²_{∞,Di} + |wh|²_{1/2,Di}    (3.118)
for C > 0 independent of h0 and h.

Proof. Since wh(x) is zero outside Di, we may employ the equivalent integral expression for the fractional Sobolev seminorm:
|wh|²_{1/2,∂Ωi} ≤ |wh|²_{1/2,Di} + 2 ∫_{Di} ( |wh(x)|² / dist(x, ∂Ωi \ Di) ) ds(x),    (3.119)
where ds(x) denotes the arclength measure along ∂Ωi for s ∈ (0, di), and dist(x, ∂Ωi \ Di) denotes the arclength distance between x and ∂Ωi \ Di. Since the arclength distance satisfies:
dist(x, ∂Ωi \ Di) = min{ s, di − s },
the above integral can be split as:
∫_{Di} ( |wh(x)|² / dist(x, ∂Ωi \ Di) ) ds(x) = ∫_0^{di/2} ( |wh(x(s))|² / s ) ds + ∫_{di/2}^{di} ( |wh(x(s))|² / (di − s) ) ds.    (3.120)
The first integral may further be split over the intervals [0, h] and [h, di/2]. For 0 ≤ s ≤ h, we may bound |wh(x(s))| ≤ ||wh||_{∞,Di} (s/h), since wh(x(s)) is linear on the interval and wh(x(0)) = 0. For h ≤ s ≤ di/2, we may bound |wh(x(s))| ≤ ||wh||_{∞,Di}. Substituting these yields:
∫_0^{di/2} ( |wh(x(s))|² / s ) ds = ∫_0^{h} ( |wh(x(s))|² / s ) ds + ∫_h^{di/2} ( |wh(x(s))|² / s ) ds
≤ ||wh||²_{∞,Di} ∫_0^h ( s² / h² ) (1/s) ds + ||wh||²_{∞,Di} ∫_h^{di/2} (1/s) ds
= ||wh||²_{∞,Di} ( 1/2 + log(di/2h) )
≤ C ||wh||²_{∞,Di} (1 + log(h0/h)).
We may similarly bound:
∫_{di/2}^{di} ( |wh(x(s))|² / (di − s) ) ds ≤ C ||wh||²_{∞,Di} (1 + log(h0/h)).
Combining the bounds and substituting them into (3.120) and (3.119) yields the result.

We now describe estimates of finite element decompositions based on globs. Recall that a glob is either an edge or a vertex of B when Ω ⊂ IR², and a face, an edge or a vertex of B when Ω ⊂ IR³.

Definition 3.87. If G ⊂ B is a glob, define the map IG : Vh(B) → Vh(B) which assigns zero nodal values at all nodes in B outside G:
IG vh(x) ≡ vh(x), for nodes x ∈ G
IG vh(x) ≡ 0, for nodes x ∈ B \ G,
for vh ∈ Vh(B). Let G denote the set of all globs of B. In particular, the following decomposition of the identity will hold:
I = Σ_{G∈G} IG.
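The decomposition of the identity only requires that the globs partition the interface nodes. A toy added sketch with hypothetical node labels (two edges and one cross point of a 2D interface):

```python
# Interface B of a small 2D box decomposition: globs are two edges and a vertex.
nodes = ["e1_1", "e1_2", "v", "e2_1", "e2_2"]
globs = {"edge1": {"e1_1", "e1_2"}, "vertex": {"v"}, "edge2": {"e2_1", "e2_2"}}

def I_G(glob, v):
    # zero all nodal values outside glob G (the map I_G of Definition 3.87)
    return {x: (v[x] if x in globs[glob] else 0.0) for x in nodes}

v = {x: float(i + 1) for i, x in enumerate(nodes)}
parts = [I_G(g, v) for g in globs]
recombined = {x: sum(p[x] for p in parts) for x in nodes}
assert recombined == v          # decomposition of the identity: sum_G I_G = I
# the globs are pairwise disjoint, so I_{G1} I_{G2} = 0 for distinct globs
assert all(sum(1 for g in globs if x in globs[g]) == 1 for x in nodes)
```

The disjointness checked in the last line is what later allows glob-wise estimates to be summed with only the overlap constants K and L entering the bounds.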

We associate the following parameters with a subdomain decomposition.

Definition 3.88. Let L > 0 denote the maximum number of globs on any shared interface of the form ∂Ωi ∩ ∂Ωj:
L ≡ max_{i,j} |{ G : G ⊂ ∂Ωi ∩ ∂Ωj }|.    (3.121)
Let K > 0 denote the maximum number of neighboring subdomains:
K ≡ max_i |{ j : ∂Ωi ∩ ∂Ωj ≠ ∅ }|.    (3.122)
For typical subdomain decompositions arising from a coarse triangulation, the parameters K and L will be bounded independent of h, h0 and ρi.

We shall now outline an important theoretical result referred to as a glob theorem. The glob theorem provides a bound for the H^{1/2}(∂Ωi) seminorm of the finite element interpolation map IG. It will be useful for estimating partition parameters in abstract Schwarz algorithms, see [MA17]. The following preliminary result establishes a bound for IG when G is a vertex glob in two dimensions.

Lemma 3.89. Let Ω ⊂ IR² and suppose the following assumptions hold.
1. Let G ∈ ∂Ωi denote a vertex glob, and let ψG^h(x) ∈ Vh(B) denote the finite element nodal basis function centered at vertex G on B:
   ψG^h(xj) = 1, if xj = G
   ψG^h(xj) = 0, if xj ≠ G,
   where each xj denotes a node in B.
2. Given wh ∈ Vh(B), let IG wh ∈ Vh(B) denote the finite element function:
   IG wh(x) ≡ wh(G) ψG^h(x).
Then, the following bound will hold:
|IG wh|²_{1/2,∂Ωi} ≤ C (1 + log(h0/h))² ||wh||²_{1/2,∂Ωi}, ∀wh ∈ Vh(B),
for some C > 0 independent of h0 and h.

Proof. Let BG ⊂ ∂Ωi denote the union of elements adjacent to G on which ψG^h(x) has support. Since IG wh(x) = wh(G) ψG^h(x), by linearity we obtain:
|IG wh|_{1/2,∂Ωi} = |wh(G)| |ψG^h|_{1/2,∂Ωi} ≤ ||wh||_{∞,∂Ωi} |ψG^h|_{1/2,∂Ωi}.
Apply Lemma 3.86 to ψG^h(x), which has support on BG, to obtain:
|ψG^h|²_{1/2,∂Ωi} ≤ C (1 + log(h0/h)) ||ψG^h||²_{∞,BG} + |ψG^h|²_{1/2,BG}.    (3.123)
We estimate |ψG^h|²_{1/2,BG} by substituting that |ψG^h(x) − ψG^h(y)| ≤ |x − y|/h for x, y ∈ BG, to obtain:
|ψG^h|²_{1/2,BG} = ∫_{−h}^{h} ∫_{−h}^{h} ( |ψG^h(x) − ψG^h(y)|² / |x − y|² ) ds(x) ds(y) ≤ 4.
Substituting the preceding bound and using ||ψG^h||_{∞,BG} = 1 in (3.123) yields:
|IG wh|²_{1/2,∂Ωi} ≤ ||wh||²_{∞,∂Ωi} ( C (1 + log(h0/h)) + 4 )
≤ C (1 + log(h0/h)) ||wh||²_{∞,∂Ωi}
≤ C (1 + log(h0/h))² ||wh||²_{1/2,∂Ωi},
where we employed the discrete Sobolev inequality in the last step.

The next result bounds the H^{1/2}(∂Ωi) seminorm of IG when G corresponds to an edge glob in ∂Ωi ⊂ B on a two dimensional domain.

Lemma 3.90. Let Ω ⊂ IR² and suppose the following assumptions hold.
1. Let G ⊂ ∂Ωi denote an edge glob, and given vh ∈ Vh(B) let IG vh ∈ Vh(B) denote the finite element function defined by:
   IG vh(xj) ≡ vh(xj), if xj ∈ G
   IG vh(xj) ≡ 0, if xj ∈ ∂Ωi \ G,
   where xj denotes the nodes on ∂Ωi.
2. Let Vh(Ω) be a finite element space on a quasiuniform triangulation.
Then, the following bound will hold:
|IG vh|²_{1/2,∂Ωi} ≤ C (1 + log(h0/h))² ||vh||²_{1/2,∂Ωi},
for some C > 0 independent of h0 and h.

Remark 3.91. A bound of the form ||IG wh||²_{0,∂Ωi} ≤ C ||wh||²_{0,∂Ωi} will hold trivially, since the mass matrix on B is spectrally equivalent to a suitably scaled identity matrix. Combining such a bound with Lemma 3.89 yields an estimate of the form:
||IG wh||²_{1/2,∂Ωi} ≤ C (1 + log(h0/h))² ||wh||²_{1/2,∂Ωi}, ∀wh ∈ Vh(B),
for C > 0 independent of h0 and h.

Proof (of Lemma 3.90). See [MA17]. Given an edge glob G ⊂ ∂Ωi, let GL, GR ∈ ∂Ωi denote its endpoints, corresponding to vertex globs, and let BG denote the union of all elements of ∂Ωi intersecting the glob G. By construction, the finite element function wh(x) ≡ IG vh(x) will be zero at the endpoints GL and GR and outside the glob, and may alternatively be expressed as:
wh(x) ≡ IG vh(x) = vh(x) − IGL vh(x) − IGR vh(x), for x ∈ BG
wh(x) = 0, for x ∈ ∂Ωi \ BG.
Applying bound (3.118) to wh(x) on BG yields:
|wh|²_{1/2,∂Ωi} ≤ C (1 + log(h0/h)) ||wh||²_{∞,BG} + |wh|²_{1/2,BG},
for C > 0 independent of h0 and h. Substituting that ||wh||_{∞,BG} = ||vh||_{∞,BG} (which holds by construction), and estimating the latter term by the discrete Sobolev inequality, yields:
|wh|²_{1/2,∂Ωi} ≤ C (1 + log(h0/h)) ||vh||²_{∞,∂Ωi} + |wh|²_{1/2,BG} ≤ C (1 + log(h0/h))² ||vh||²_{1/2,∂Ωi} + |wh|²_{1/2,BG}.
Since wh(x) = vh(x) − IGL vh(x) − IGR vh(x) on BG, we may apply the generalized triangle inequality to estimate the seminorm |wh|²_{1/2,BG} as follows:
|wh|²_{1/2,BG} ≤ 3 ( |vh|²_{1/2,B} + |IGL vh|²_{1/2,B} + |IGR vh|²_{1/2,B} )
≤ C ( |vh|²_{1/2,∂Ωi} + (1 + log(h0/h))² ||vh||²_{1/2,∂Ωi} ),
where the latter expression was obtained using |IGL vh|_{1/2,BG} ≤ |IGL vh|_{1/2,∂Ωi}, employing the bounds for the vertex glob interpolants, and similarly for the term |IGR vh|_{1/2,BG}, together with the trivial bound |vh|²_{1/2,BG} ≤ |vh|²_{1/2,∂Ωi}. Combining the above estimates, we obtain:
|wh|²_{1/2,∂Ωi} ≤ C (1 + log(h0/h))² ||vh||²_{1/2,∂Ωi},
which is the desired estimate.

Combining the preceding results yields the two dimensional glob theorem.

Lemma 3.92. Let Ω ⊂ IR² and suppose the following assumptions hold.
1. Let Vh(Ω) be a finite element space on a quasiuniform triangulation.
2. Let IG denote the glob interpolation map for vertex or edge globs G ⊂ ∂Ωi.
Then, the following bound will hold for vh ∈ Vh(B):

||IG vh||²_{1/2,∂Ωi} ≤ C (1 + log(h0/h))² ||vh||²_{1/2,∂Ωi}.

Proof. The proof follows by combining the seminorm bounds for |IG vh|²_{1/2,∂Ωi} in the preceding lemmas with estimates for IG in the L²(∂Ωi) norm:
||IG vh||²_{0,∂Ωi} ≤ C ||vh||²_{0,∂Ωi},
which will hold for some C > 0 independent of h0 and h for any glob G, because of the spectral equivalence between the mass matrix and a scaled identity matrix on ∂Ωi.

We now state the general glob theorem [MA17] in two or three dimensions.

Theorem 3.94 (Glob Theorem). Let Th(Ω) be a quasiuniform triangulation of Ω ⊂ IR^d for d = 2, 3. Let G ⊂ ∂Ωi be a glob within B and let vh ∈ Vh(B). Then, the following results will hold.
1. There exists C > 0 independent of h, h0 and ρi such that:
   ||IG vh||²_{1/2,∂Ωi} ≤ C (1 + log(h0/h))² ||vh||²_{1/2,∂Ωi}.
2. There exists C > 0 independent of h, h0 and ρi such that:
   ||IG vh||²_{1/2,∂Ωi} ≤ C (1 + log(h0/h))² Σ_{j:∂Ωj∩G≠∅} ||vh||²_{1/2,∂Ωj}.
Proof. See [BR12, BR15, DR10, DR17, MA17].

3.9.3 Properties of Coarse Spaces

We shall now summarize theoretical properties of two types of coarse spaces employed in Schur complement algorithms: the traditional coarse space, defined based on an underlying coarse triangulation of the domain, and the piecewise constant coarse space, employed in the balancing domain decomposition preconditioner. We shall omit discussion of wirebasket coarse spaces, see [BR15, DR17, SM2].

Definition 3.95. When the subdomains Ω1, . . ., Ωp of size h0 form a coarse triangulation Th0(Ω) of Ω, the traditional coarse space V0,T(B) ⊂ Vh(B) corresponds to the restriction to B of the finite element space defined on the coarse triangulation, V0,T(B) = Vh0(B) ⊂ Vh(B). If y1, . . ., yn0 denote the coarse vertices, with associated coarse space nodal basis functions ψ1^{h0}(x), . . ., ψ_{n0}^{h0}(x) which satisfy ψi^{h0}(yj) = δij (where δij is the Kronecker delta), then the coarse space interpolation map I0,T : Vh(B) → V0,T(B) ⊂ Vh(B) is defined by:
I0,T vh(x) = Σ_{i=1}^{n0} vh(yi) ψi^{h0}(x).
The traditional interpolation map I0,T is also denoted Ih0, π0 or πh0.

In the following, we summarize known bounds for the coarse grid interpolation map I0,T onto the standard coarse space V0,T(B).

Lemma 3.96. Suppose the following assumptions hold.
1. Let Th(Ω) be a quasiuniform triangulation of Ω of size h.
2. Let Ω1, . . .,
Ωp form a coarse triangulation Th0 (Ω) of Ω of size h0 .9 Theoretical Results 209 Proof.T (B) ⊂ Vh (B) is deﬁned by: n0 I0.∂Ωj . and the piecewise constant coarse space employed in the balancing domain decomposition pre- conditioner.

Then, for 1 ≤ i ≤ p, the following results will hold.

1. The following bound will hold locally on each ∂Ωi for vh ∈ Vh(B):
   |I0,T vh|²_{1/2,∂Ωi} ≤ C (1 + log(h0/h)) ||vh||²_{1/2,∂Ωi}, if Ω ⊂ IR²
   |I0,T vh|²_{1/2,∂Ωi} ≤ C (h0/h) ||vh||²_{1/2,∂Ωi}, if Ω ⊂ IR³    (3.124)
   for C > 0 independent of h, h0 and ρi.
2. The interpolation error will satisfy:
   ||vh − I0,T vh||²_{0,∂Ωi} ≤ C h0 |vh|²_{1/2,∂Ωi},
   for C > 0 independent of h, h0 and ρi.

Proof. We shall only outline the proof of the boundedness of I0,T in two dimensions; the interpolation error bound is standard [ST14, CI2, JO2]. Employ the equivalence between |I0,T vh|_{1/2,∂Ωi} and |Hh I0,T vh|_{1,Ωi}, where Hh is the discrete harmonic extension map into the subdomains. Since Hh I0,T vh will be linear on each triangular subdomain Ωi, the term |Hh I0,T vh|²_{1,Ωi} will involve the difference quotients:
Σ_{l,j} a_{lj}^{(i)} ( vh(xl) − vh(xj) )²,
without an h0 scaling factor. The term |Hh I0,T vh|²_{1,Ωi} can thus be estimated by C ||vh||²_{∞,Ωi}, which in turn can be estimated, by the discrete Sobolev inequality, as bounded by C (1 + log(h0/h)) ||vh||²_{1/2,∂Ωi}. This yields the desired bound for Ω ⊂ IR². For the general proof, see [BR15, DR10].

Since each of the bounds in Lemma 3.96 is local, we may multiply them by a factor ρi on each subdomain, and sum over all subdomains to obtain global estimates involving weighted terms, as indicated below.

Lemma 3.97. Let Ω1, . . ., Ωp form a quasiuniform coarse triangulation Th0(Ω) of Ω of size h0, and let Th(Ω) be a quasiuniform triangulation of Ω of size h. Then, the following bound will hold for vh ∈ Vh(B):
Σ_{i=1}^p ρi |I0,T vh|²_{1/2,∂Ωi} ≤ C (1 + log(h0/h)) Σ_{i=1}^p ρi ||vh||²_{1/2,∂Ωi}, if Ω ⊂ IR²
Σ_{i=1}^p ρi |I0,T vh|²_{1/2,∂Ωi} ≤ C (h0/h) Σ_{i=1}^p ρi ||vh||²_{1/2,∂Ωi}, if Ω ⊂ IR³    (3.125)
for C > 0 independent of h, h0 and ρi.
.210 3 Schur Complement and Iterative Substructuring Algorithms In the following.96 are local.∂Ωi .∂Ωi ≤ 2 p i=1 C (h0 /h) i=1 ρi vh 21/2. h0 and ρi . The term |Hh I0. Since Hh I0.

Proof. Multiply the local bounds in Lemma 3.96 by the factor ρi and sum over all the subdomains.

We next consider the piecewise constant coarse space V0,P(B) ⊂ Vh(B) used in the balancing domain decomposition preconditioner, and describe analogous estimates.

Definition 3.98. Given a glob G ∈ G, we let 0 ≤ dj(G) ≤ 1 denote nonnegative partition of unity parameters which satisfy:
dj(G) = 0, if G ∩ ∂Ωj = ∅
Σ_{j:G⊂∂Ωj} dj(G) = 1.
The quantities dj(G) are typically defined by:
dj(G) = ρj^t / Σ_{l:G⊂∂Ωl} ρl^t, with t ≥ 1/2.
For 1 ≤ j ≤ p we define a map Qj : Vh(B) → IR by:
Qj u = ( ∫_{∂Ωj} u ds ) / ( ∫_{∂Ωj} ds ).

Definition 3.99. We define an interpolation map I0,P : Vh(B) → Vh(B):
I0,P vh = Σ_{G∈G} IG ( Σ_{j:G⊂∂Ωj} dj(G) Qj vh ),    (3.126)
where Qj and dj(G) are as defined in the preceding. The piecewise constant coarse space is then defined as the range of the interpolation map I0,P:
V0,P(B) ≡ Range(I0,P).

Remark 3.100. The space V0,P(B) defined by (3.126) is referred to as piecewise constant since the finite element functions within this space have constant values on the nodes within each glob of B, and its values on a glob G depend on the mean values of the function on the boundaries of adjacent subdomains. The interpolation map I0,P is also denoted Q0 elsewhere in these notes. Unlike the traditional interpolation map I0,T, the map I0,P is not local. We shall now turn to theoretical estimates of the interpolation map I0,P. Fortunately, by carefully choosing the partition of unity parameters dj(G), global norm bounds can be obtained which do not depend on {ρi}, and which furthermore do not deteriorate in three dimensions.

Lemma 3.101. Suppose the following assumptions hold.
1. Let Ω1, . . ., Ωp form a quasiuniform triangulation of Ω of size h0.
2. Let the partition parameters dj(G) based on the globs be defined by:
   dj(G) = ρj^t / Σ_{l:G⊂∂Ωl} ρl^t, for G ⊂ G,
   for t ≥ 1/2.
Then, the following bounds will hold for vh ∈ Vh(B).
1. If glob G ⊂ ∂Ωi, then:
   |IG(I − I0,P)vh|²_{1/2,∂Ωi} ≤ C (1 + log(h0/h))² Σ_{j:G⊂∂Ωj} dj(G)² |vh|²_{1/2,∂Ωj}.    (3.129)
2. If the partition parameters dj(G) are as above, then:
   Σ_{i=1}^p ρi |(I − I0,P)vh|²_{1/2,∂Ωi} ≤ C L² K² (1 + log(h0/h))² Σ_{i=1}^p ρi |vh|²_{1/2,∂Ωi}.    (3.130)
In both of the above, C > 0 is independent of h, h0 and {ρj}.

Proof. We follow the proof in [MA14, MA17, MA15]. Let I0,P denote the operator defined earlier based on the globs G ∈ G:
I0,P vh = Σ_{G∈G} IG ( Σ_{j:G⊂∂Ωj} dj(G) Qj vh ),    (3.127)
where
I = Σ_{G∈G} IG,  1 = Σ_{j:G⊂∂Ωj} dj(G).    (3.128)
Substituting (3.127) and (3.128) in the expression for IG(vh − I0,P vh), and using that IG1 IG2 = 0 whenever G1 and G2 are distinct globs, we obtain:
IG(vh − I0,P vh) = Σ_{j:G⊂∂Ωj} dj(G) IG(I − Qj)vh.
Applying the generalized triangle inequality to the above expression yields:
|IG(vh − I0,P vh)|²_{1/2,∂Ωi} ≤ L Σ_{j:G⊂∂Ωj} dj(G)² |IG(I − Qj)vh|²_{1/2,∂Ωi}.
Since G ⊂ ∂Ωi and G ⊂ ∂Ωj, the following norms will be equivalent:
c1 ||IG(I − Qj)vh||²_{1/2,∂Ωj} ≤ ||IG(I − Qj)vh||²_{1/2,∂Ωi} ≤ c2 ||IG(I − Qj)vh||²_{1/2,∂Ωj},

with 0 < c1 < c2 independent of h, h0 and {ρl}. This will hold because of the compact support of IG w, so that ||IG w||²_{1/2,∂Ωj} and ||IG w||²_{H^{1/2}_{00}(G)} will be equivalent by definition of H^{1/2}_{00}(G), and the latter will in turn be equivalent to ||IG w||²_{1/2,∂Ωi}. Applying this norm equivalence yields:
|IG(vh − I0,P vh)|²_{1/2,∂Ωi} ≤ c2 L Σ_{j:G⊂∂Ωj} dj(G)² ||IG(I − Qj)vh||²_{1/2,∂Ωj}.
Applying the glob theorem to the above yields:
|IG(vh − I0,P vh)|²_{1/2,∂Ωi} ≤ q(h/h0) Σ_{j:G⊂∂Ωj} dj(G)² ||(I − Qj)vh||²_{1/2,∂Ωj},
where q(h/h0) ≡ C c2 L (1 + log(h0/h))². Since the seminorms are invariant under shifts by constants, and Qj preserves constants, using a quotient space argument [CI2] (mapping ∂Ωj to a reference domain and employing the scaling of seminorms under dilation) we obtain:
||(I − Qj)vh||²_{0,∂Ωj} ≤ C h0 |vh|²_{1/2,∂Ωj},
so that we may replace ||(I − Qj)vh||²_{1/2,∂Ωj} = |(I − Qj)vh|²_{1/2,∂Ωj} + h0^{−1} ||(I − Qj)vh||²_{0,∂Ωj} by C |vh|²_{1/2,∂Ωj}:
|IG(vh − I0,P vh)|²_{1/2,∂Ωi} ≤ q(h/h0) Σ_{j:G⊂∂Ωj} dj(G)² |vh|²_{1/2,∂Ωj}.
This yields (3.129).

To obtain (3.130), we multiply (3.129) by the factor ρi and rearrange terms:
ρi |IG(vh − I0,P vh)|²_{1/2,∂Ωi} ≤ C c2 L (1 + log(h0/h))² Σ_{j:G⊂∂Ωj} ρi dj(G)² |vh|²_{1/2,∂Ωj}
= C c2 L (1 + log(h0/h))² Σ_{j:G⊂∂Ωj} ( ρi dj(G)² / ρj ) ρj |vh|²_{1/2,∂Ωj}.    (3.131)
When G ⊂ (∂Ωi ∩ ∂Ωj), the following bound may be obtained for dj(G)²:
dj(G)² = ( ρj^t / Σ_{l:G⊂∂Ωl} ρl^t )² ≤ ( ρj^t / (ρi^t + ρj^t) )² ≤ ρj^{2t} / (ρi^{2t} + ρj^{2t}),
which yields the following estimate for ρi dj(G)²/ρj:
ρi dj(G)² / ρj ≤ ρi ρj^{2t−1} / (ρi^{2t} + ρj^{2t}) = (ρj/ρi)^{2t−1} / ( 1 + (ρj/ρi)^{2t} ).
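The uniform boundedness of this weighted factor can be checked directly on a toy example (an added sketch; `d` is a hypothetical helper, and two subdomains sharing the glob G are assumed):

```python
def d(j, rhos, t):
    # d_j(G) = rho_j^t / sum_l rho_l^t over the subdomains sharing glob G
    return rhos[j] ** t / sum(r ** t for r in rhos)

rhos = [1.0, 1e6]        # strongly discontinuous coefficients across G
t = 0.5                  # the threshold case 2t = 1
weights = [d(j, rhos, t) for j in range(2)]
assert abs(sum(weights) - 1.0) < 1e-12            # partition of unity
# the factor rho_i d_j(G)^2 / rho_j stays bounded by 1 when 2t >= 1
for i in range(2):
    for j in range(2):
        assert rhos[i] * d(j, rhos, t) ** 2 / rhos[j] <= 1.0 + 1e-12
```

This is the mechanism by which the choice t ≥ 1/2 removes the coefficient ratios {ρj/ρi} from the final bound (3.130).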

Since the factor (ρj/ρi) is positive, the preceding expression will be uniformly bounded, with an upper bound of one, when 2t ≥ 1. Substituting this upper bound into (3.131) yields:
ρi |IG(vh − I0,P vh)|²_{1/2,∂Ωi} ≤ C c2 L (1 + log(h0/h))² Σ_{j:G⊂∂Ωj} ρj |vh|²_{1/2,∂Ωj}.
To estimate |vh − I0,P vh|²_{1/2,∂Ωi}, we employ the property of IG on ∂Ωi:
vh − I0,P vh = Σ_{G⊂∂Ωi} IG(vh − I0,P vh), on ∂Ωi.
This yields the bound:
ρi |vh − I0,P vh|²_{1/2,∂Ωi} ≤ ρi K Σ_{G⊂∂Ωi} |IG(vh − I0,P vh)|²_{1/2,∂Ωi}
≤ C c2 K L (1 + log(h0/h))² Σ_{G⊂∂Ωi} Σ_{j:G⊂∂Ωj} ρj |vh|²_{1/2,∂Ωj}.
Summing over all subdomain boundaries ∂Ωi yields:
Σ_{i=1}^p ρi |vh − I0,P vh|²_{1/2,∂Ωi} ≤ C L² K² (1 + log(h0/h))² Σ_{j=1}^p ρj |vh|²_{1/2,∂Ωj},
which is the desired bound (3.130).

As an immediate corollary, we obtain the following bounds for I0,P vh. Such estimates can be obtained by applying Lemma 3.101 and the triangle inequality to I0,P vh = vh − (I − I0,P)vh.

Lemma 3.102. Let the assumptions in Lemma 3.101 hold. Then:
Σ_{i=1}^p ρi |I0,P vh|²_{1/2,∂Ωi} ≤ C L² K² (1 + log(h0/h))² Σ_{i=1}^p ρi |vh|²_{1/2,∂Ωi},    (3.132)
for C > 0 independent of h, h0 and {ρj}.

Proof. Follows immediately by an application of Lemma 3.101 and the triangle inequality.

Remark 3.103. The reader is referred to [BR15, CH2, DR10, GO3, SM2] for theoretical estimates of the wirebasket interpolation map I0,W.

3.9.4 Two Subdomain Preconditioners for S

As an application of the preceding theoretical results, we estimate the condition number of the two subdomain Dirichlet-Neumann [BJ9, BR11, FU, MA29], Neumann-Neumann [BO7], and fractional Sobolev norm [DR, GO3, CH2] preconditioners for the Schur complement.

∀vB = 0. Suppose F represents the fractional Sobolev norm energy: vTB F vB = |vh |2H 1/2 . M corresponds to a Neumann-Neumann preconditioner.104.9 Theoretical Results 215 Lemma 3. for some β3 > 0 independent of h and {ρ1 . The desired result follows immediately. −1 −1 3. Follows from Lemma 3. Let the coeﬃcient a(x) = ρi in Ωi for i = 1. BR11] from (3. the following bound will hold for vh ∈ Vh (Ω) with associated nodal vector vB on B = ∂Ω1 ∩ ∂Ω2 : ci ρi |vh |2H 1/2 (B) ≤ vTB S (i) vB ≤ Ci ρi |vh |2H 1/2 (B) . 00 then the preceding lemma yields that for vB = 0: vTB S (i) vB ci ρi ≤ ≤ Ci ρi . 00 00 for 0 < ci < Ci independent of h and {ρ1 . If M = S (i) . vTB F vB where S (i) denotes the subdomain Schur complement matrix. ρ2 }. H00 (B) Remark 3. M is a Dirichlet-Neumann preconditioner. BJ9. 2. for some β2 > 0 independent of h and {ρ1 . By Lemma 3.105. ρ2 }. then M will be spectrally equivalent to F and satisfy: cond(M. If M −1 = α S (1) + (1−α)S (2) for some 0 < α < 1. Then. i.106.104 hold. ρ2 }. then: cond(M. Lemma 3.104. Proof.. . GO3.e. 2 and c(x) = 0 on Ω. ρ2 }. then: cond(M. Suppose the assumptions from Lemma 3. S) ≤ β3 .∂Ωi is norm equivalent to 1/2 |vh |2 1/2 by deﬁnition of H00 (B).64). S) ≤ β2 .. Proof. 1. ∀vh ∈ Vh (B). We have the following condition number estimates. Suppose that Th (Ω) is a quasiuniform triangulation of Ω and that neither Ω1 nor Ω2 is immersed. i. CH2. we obtain that S = (S (1) + S (2) ) (ρ1 + ρ2 ) F.78 since |vh |21/2. Let M denote any of the preconditioners [DR. S) ≤ β1 . 3.e. for some β1 > 0 independent of h and {ρ1 .
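The two-subdomain bounds can be checked numerically. The following sketch (an illustration, not from the text) uses a 5-point finite difference Laplacian in place of a finite element discretization, splits a rectangle unevenly along a grid column, and forms for each half the interface operator S_i obtained by eliminating that half's interior unknowns; with this convention the full interface block A_BB is kept in each S_i, so the global Schur complement is S = S1 + S2 − A_BB rather than S1 + S2. The grid sizes and the split location are arbitrary choices of the sketch.

```python
import numpy as np

def laplacian_2d(nx, ny):
    """5-point Laplacian on an nx-by-ny interior grid, Dirichlet boundary."""
    A = np.zeros((nx * ny, nx * ny))
    idx = lambda i, j: j * nx + i
    for j in range(ny):
        for i in range(nx):
            k = idx(i, j)
            A[k, k] = 4.0
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if 0 <= i + di < nx and 0 <= j + dj < ny:
                    A[k, idx(i + di, j + dj)] = -1.0
    return A

nx, ny, m = 15, 7, 5                       # interface = grid column m (uneven halves)
A = laplacian_2d(nx, ny)
col = np.arange(nx * ny) % nx
I1, I2 = np.where(col < m)[0], np.where(col > m)[0]
B = np.where(col == m)[0]

def interface_operator(I):
    """Schur complement onto B of the principal block of A on I ∪ B."""
    AII, AIB = A[np.ix_(I, I)], A[np.ix_(I, B)]
    return A[np.ix_(B, B)] - AIB.T @ np.linalg.solve(AII, AIB)

S1, S2 = interface_operator(I1), interface_operator(I2)
S = S1 + S2 - A[np.ix_(B, B)]              # global Schur complement onto B

# Condition number of the Dirichlet-Neumann preconditioned operator S1^{-1} S,
# computed from the symmetrized form L^{-1} S L^{-T} with S1 = L L^T.
Lc = np.linalg.cholesky(S1)
Li = np.linalg.inv(Lc)
w = np.linalg.eigvalsh(Li @ S @ Li.T)
kappa_dn = w[-1] / w[0]
```

Since S differs from S1 by the positive semidefinite coupling through the other subdomain, all eigenvalues of S1^{-1}S lie in (0, 1], and the resulting condition number stays modest independently of the mesh, in line with the two-subdomain lemmas.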

(3.∂Ωi ≤ S(uh .) as: T EuB AII AIB EvB S(uh . The additive Schwarz subspace preconditioners we shall consider will be based on the subspaces Vh (Gi ) ≡ Range(IGi ) ⊂ Vh (B) corresponding to globs Gi ∈ G.∂Ωl . 2. BR15.2 can then be obtained by applying the glob theorem and other theoretical tools described in this section. To study such convergence. ·) on Vh (Gi ) as deﬁned below: Si (uh . and a coarse space V0 (B) ⊂ Vh (B). vh ) ≡ S(uh . The local bilinear forms A˜i (·. . 3. vB . the following equivalence will hold: p p c ρi |uh |21/2. vh ).5.. uB ATIB ABB vB where E ≡ −A−1 II AIB . ∀uh ∈ Vh (B). improved bounds may be ob- tained in some cases by employing other tools. . ∀uh .∂Ωi . vh ∈ Vh (B) deﬁned on B with associated nodal vectors uB . uh ) ≤ C ρi |uh |21/2. . Gn for some n. .) ≡ S(.216 3 Schur Complement and Iterative Substructuring Algorithms 3. DR17. ·) : Vh (B) × Vh (B) → IR. We shall consider only the traditional coarse space V0.B ≤ S(uh . . Our estimates will be applicable when the coeﬃcients {ρi } have large variation across subdomains.80. we shall employ the Schwarz subspace framework from Chap. TO10]. and the reader is referred to [BR12. 2.P ⊂ Vh (B). ·) in the abstract Schwarz framework of Chap.B ≡ ρl |uh |21/2. vh ) ≡ uTB SvB = .5. we shall use the notation: p uh 21/2. . Given ﬁnite element functions uh .B . for the coarse space. when the variation in the coeﬃcients is mild. DR10. The inner produce S(·.. we deﬁne the bilinear form S(.2 will be denoted Si (·.T (B) ⊂ Vh (B) and the piecewise constant space V0. Similarly.) deﬁned later. .. endowed with the inner product A(.5. and a coarse space V0 ⊂ Vh (B). Estimates for the parameters K0 and K1 from Chap. uh ) ≤ C uh 21/2. 2.133) l=1 so that c uh 21/2. However. We also omit wirebasket preconditioners. For convenience.2 with the linear space V = Vh (B). By Thm. . that we shall employ in Vh (B) will be generated by the Schur complement matrix S. 
Subspaces Vi ⊂ V will be chosen as subspaces of the form Vh (G) ⊂ Vh (B) based on globs G ⊂ G. vh ∈ Vh (Gi ). i=1 i=1 For notational convenience.9. the globs in G shall be enumerated as G1 .5 Multi-Subdomain Preconditioners for S We now estimate the condition number of several multisubdomain Schwarz subspace preconditioners for the Schur complement matrix S.
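As a small numerical illustration of an additive Schwarz subspace preconditioner for S with exact subspace solves (a sketch, not the book's construction: a 1D stencil matrix stands in for the Schur complement, and disjoint index blocks stand in for the glob subspaces Vh(Gi); no coarse space is used, so the conditioning still degrades with the number of blocks, consistent with the h0-dependent factor in the C0 estimate):

```python
import numpy as np

n = 24
S = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # SPD stand-in for S

# Disjoint index blocks standing in for the glob subspaces Vh(G1), ..., Vh(Gp).
blocks = [np.arange(i, i + 6) for i in range(0, n, 6)]

# Additive Schwarz with exact solves: M^{-1} = sum_i R_i^T (R_i S R_i^T)^{-1} R_i.
Minv = np.zeros((n, n))
for G in blocks:
    Minv[np.ix_(G, G)] += np.linalg.inv(S[np.ix_(G, G)])

def cond(M):
    w = np.linalg.eigvalsh(M)
    return w[-1] / w[0]

# Spectrum of M^{-1} S via the congruent symmetric form C^T S C, Minv = C C^T.
C = np.linalg.cholesky(Minv)
kappa_as = cond(C.T @ S @ C)
kappa_s = cond(S)
```

Here M^{-1} is block diagonal because the blocks are disjoint; the preconditioned condition number is far below that of S itself, though without a coarse space it grows as the blocks are refined.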

we shall assume that exact solvers are employed for the submatrices so that the parameters ω0 = ω1 = 1 and K0 = C0 . we deﬁne an n × n matrix . 3.9 Theoretical Results 217 To simplify our discussion. If a coarse space is not employed.

Θ = (θij) of strengthened Cauchy-Schwarz parameters such that:

S(vi, vj) ≤ θij S(vi, vi)^{1/2} S(vj, vj)^{1/2},  ∀ vi ∈ Vh(Gi), ∀ vj ∈ Vh(Gj).

It is easily verified that the spectral radius ρ(Θ) of matrix Θ is bounded by K L. When a coarse space is employed, matrix Θ

Then. Gn denote an enumeration of all the distinct globs in G so that the following decomposition of identity property holds: n I= IGi . Due to the decomposition of unity property for the IGi . . ∀vh ∈ Vh (B). We shall estimate the partition parameter C0 in the weighted boundary norm (3. MA17]. By the abstract theory of Chap.133) instead of S(·. vi ) ≤ C0 S(vh . will be of size (n + 1) and its spectral radius will be bounded by (K L + 1) regardless of the choice of coarse space. . Lemma 3. the condition number of additive Schwarz subspace preconditioner for S will satisfy: C0 K L. vh ). and IGi : Vh (B) → Vh (Gi ) for 1 ≤ i ≤ n. 2. Let G1 . Since K and L are typically independent of h. With Coarse Space. given vh ∈ Vh (B) there exists vi ∈ Vh (Gi ) for 1 ≤ i ≤ n satisfying vh = v1 + · · · + vn and p S(vi . h0 and {ρi }. ρmin with C independent of h. S) ≤ C0 (K L + 1). . Proof.107. Given vh ∈ Vh (B) deﬁne vi = IGi vh ∈ Vh (Gi ) for 1 ≤ i ≤ n. i=1 where I : Vh (B) → Vh (B).2. it will hold that: v1 + · · · + vn = vh . i=1 where ρmax C0 ≤ C L (1 + log(h0 /h)) h−2 2 0 . . No Coarse Space cond(M. The next result yields an estimate for C0 when there is no coarse space. See [TO10.5. . ·) since both are equivalent. h0 and {ρi }. we only need to focus on the partition parameter C0 .

∂Ωi ≤ C Hh vh 21. i=1 Ωi Then. (3.Ωi .218 3 Schur Complement and Iterative Substructuring Algorithms If Gl ⊂ ∂Ωi .∂Ωi . ∀vi ∈ Vh (Ωi ) ∩ H01 (Ωi ). then by the glob theorem. an application of the trace theorem to Hh vh on ∂Ωi yields: ⎧ ⎨ vh 21/2. v) ≡ ρi ∇u · ∇v dx. where p A(u.∂Ωi ≤ C (1 + log(h0 /h)) vh 21/2. vi ) = 0. we obtain that: 2 |IGl vh |21/2.134) Deﬁne Hh vh as the discrete harmonic extension of the the interface value vh into the subdomain interiors Ωi for 1 ≤ i ≤ p A(Hh vh .
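The discrete harmonic extension used above can be illustrated in one dimension, where it reduces to linear interpolation of the boundary values (a sketch; the mesh size and boundary values are arbitrary test data). The extension solves A_II u_I = −A_IB g and minimizes the discrete Dirichlet energy among all extensions with the same interface values:

```python
import numpy as np

n = 9                                                 # interior nodes of (0, 1)
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1D Laplacian stencil
g0, g1 = 1.0, 3.0                                     # prescribed end values

# Discrete harmonic extension: A_II u_I = -A_IB g; only the first and last
# interior nodes couple to the boundary values, each with coefficient -1.
rhs = np.zeros(n)
rhs[0], rhs[-1] = g0, g1
uI = np.linalg.solve(A, rhs)

v = np.concatenate(([g0], uI, [g1]))                  # full extension
energy = lambda w: np.sum(np.diff(w) ** 2)            # discrete Dirichlet energy
```

Any other extension with the same end values has strictly larger energy, which is the discrete analogue of the energy-minimizing property of Hh vh.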

⎩ = C |Hh vh |21.Ωi .∂Ωi .Ωi + h12 Hh vh 20. and summing over all adjacent subdomains yields the following: p IGl vh |21/2.B = i=1 ρi |IGl vh |21/2.134). 0 Substituting the preceding bound into (3. multiplying by the factor ρi .

2 ≤ C (1 + log(h0 /h)) i:Gl ⊂∂Ωi ρ i |H h vh | 2 1.Ωi .Ωi + 1 2 h0 H h vh 2 0.

Ωi . 2 1 0 Summing over all globs yields the estimate: n n p l=1 IGl vh |1/2.B = i=1 ρi |IGl vh |1/2.Ωi + h2 Hh vh 20. 2 ≤ C (1 + log(h0 /h)) ρmax i:Gl ⊂∂Ωi |Hh vh |1.∂Ωi 2 2 l=1 .

Ωi 2 1 2 2 p .Ωi + h20 Hh vh 0. 2 n ≤ C (1 + log(h0 /h)) l=1 i:Gl ⊂∂Ωi ρi |Hh vh |1.

≤ C (1 + log(h0 /h)) ρmax L i=1 |Hh vh |21.Ωi .Ωi + h12 Hh vh 20.

and substitute it in the preceding bound to obtain: n n p l=1 IGl vh 1/2.Ωi . Thm.B = i=1 ρi |IGl vh |1/2. 0 Since Hh vh is piecewise discrete harmonic.Ω .80 yields the equivalence: . we apply Poincar´e-Freidrich’s inequality: Hh vh 20.Ω ≤ C |Hh vh |21.∂Ωi 2 2 l=1 2 p (3. 2 1 0 Since Hh vh is zero on BD .Ω . 3.135) ≤ C (1 + log(h0 /h)) ρmax L (1 + h12 ) i=1 ρi |Hh vh |21. 0 2 = C (1 + log(h0 /h)) ρmax L |Hh vh |1.Ω + h2 Hh vh 20.

2. This upper bound may be unduly pessimistic when the factor (ρmax /ρmin ) is large. However. Let Ω ⊂ IRd for d = 2. h0 and {ρi }.135) yields: n 2 ρmax 1 S(IGl vh . 3. i=1 for c. if V0 = V0. 3 C0 ≤ C (1 + log(h0 /h))2 . vh ). As an immediate corollary. h0 and {ρi }. we obtain the following condition number estimate for the block Jacobi Schur complement preconditioner in two or three dimensions: 2 ρmax 1 cond(M. 3. Let coarse space V0. Gn denote an enumeration of all the distinct globs in G so that the following decomposition of identity property holds: n I= IGi .9 Theoretical Results 219 p c vh 21/2. ρmin h0 for some C > 0 independent of h. these bounds can be improved signiﬁcantly. . i=0 where ⎧ 2 ⎪ ⎨ C (1 + log(h0 /h)) . . MA17]. 1. if V0 = V0.B ≤ S(vh . Let G1 . 3. . Let the following conditions hold. C independent of h. i=1 where I : Vh (B) → Vh (B).109. IGl vh ) ≤ C L (1 + log(h0 /h)) 1 + 2 S(vh . Our next estimate is for the Schur complement additive Schwarz precon- ditioner when a coarse space is included [TO10. given vh ∈ Vh (B) there exists v0 ∈ V0 and vi ∈ Vh (Gi ) for 1 ≤ i ≤ n with vh = v0 + v1 + · · · + vn satisfying: p S(vi . . Remark 3. h0 and {ρi }.B . vh ) = ρi |Hh vh |21.T and d = 3. if a suitable coarse space V0 is employed. S) ≤ C L (1 + log(h0 /h)) 1+ 2 .P (B) ⊂ Vh (B) be employed.T (B) ⊂ Vh (B) or V0.108.Ωi ≤ C vh 21/2. with C independent of h. Lemma 3. . vi ) ≤ C0 S(vh .T and d = 2 ⎪ ⎩ C (h0 /h).P for d = 2. Let the partition parameters dj (G) be deﬁned for t ≥ 12 by: ρtj dj (G) = . Substituting this into (3. if V0 = V0. and IGi : Vh (B) → Vh (Gi ) for 1 ≤ i ≤ n. vh ) ρmin h0 l=1 which is the desired result. {l:G⊂∂Ωl } ρtl Then.

∂Ωj .T : Vh (B) → V0.T (B) will be analogous. {j:Gl ⊂∂Ωj } where C > 0 is independent of h. for t ≥ 1/2.T depending on whether d = 2 or d = 3.B .∂Ωj 2 = C (1 + log(h0 /h)) L2 vh 21/2.P )vh 21/2.B 2 2 n ≤ C (1 + log(h0 /h)) l=1 {i:Gl ⊂∂Ωi } {j:Gl ⊂∂Ωj } ρj |vh |1/2. Multiply the above expression by ρi and sum over all subdomains containing Gl to obtain: {i:Gl ⊂∂Ωi } ρi IGl (I − I0.P (B).∂Ωj 2 2 p ≤ C (1 + log(h0 /h)) L2 j=1 ρj |vh |21/2. with diﬀerences arising from the bound for I0.∂Ωi {i:Gl ⊂∂Ωi } 2 ≤ C (1 + log(h0 /h)) {i:Gl ⊂∂Ωi } {j:Gl ⊂∂Ωj } ρj |vh |1/2. We shall only outline the proof for the choice V0 = V0.P vh 21/2. the following bound can be obtained.P )vh 1/2. ρj ρj ρ i + ρ j Substitution of the above into (3. h0 and {ρj }.P vh ) when Gl ⊂ ∂Ωi .B .129) yields: 2 IGl (I − I0. By construction. Given vh ∈ Vh (B). bound (3.P )vh 1/2.{ρi }.∂Ωi ≤ C (1 + log(h0 /h)) dj (Gl )2 |vh |21/2.∂Ωj .220 3 Schur Complement and Iterative Substructuring Algorithms Proof. 2 Summing over all globs Gl yields: n l=1 IGl (I − I0.132) from the preceding section yields that: 2 I0. as before: ρi dj (Gl )2 ρi ρ2t ≤ 2t j 2t ≤ 1.P vh where I0. it will hold that: vh = v0 + v1 + · · · + vn . Bound (3.136) When Gl ⊂ (∂Ωi ∩ ∂Ωj ).B ≤ C (1 + log(h0 /h)) vh 21/2.P )vh 21/2.∂Ωj . .P (B). deﬁne v0 ≡ I0.∂Ωi 2 2 ≤ C (1 + log(h0 /h)) {i:Gl ⊂∂Ωi } {j:Gl ⊂∂Ωj } ρi dj (Gl ) |vh |1/2. The choice V0 = V0. (3. For 1 ≤ i ≤ n deﬁne vi ≡ IGi (vh − I0 vh ).B = ρi IGl (I − I0.P is the interpolation onto V0. To estimate vl = IGl (vh − I0.136) yields the bound: IGl (I − I0.P )vh 21/2.∂Ωj 2 2 2 ρi dj (Gl )2 = C (1 + log(h0 /h)) {i:Gl ⊂∂Ωi } {j:Gl ⊂∂Ωj } ρj ρj |vh |21/2.

. if V0 = V0. MA17. Since cond(M. ∂Ωp of B and either coarse space V0.P vh and vl = IGl (I − I0. .P will yield similar bounds as the vertex space preconditioner.132) yields: n I0. .B + l=1 IGl (I − I0. Lemma 3. depending only on the amount β of overlap.P . The preceding result yields logarithmic bounds. Let the assumptions in Lemma 3. while C (1 + (h0 /h)) will hold in three dimensions. .B . WI6. then the bound C (1 + log(h0 /h)) will hold in two dimensions. BR12. edge and face globs (in three dimensions) or their extensions. DR10. if V0 = V0.T or V0. • The BPS preconditioner in two dimensions is an additive Schwarz subspace preconditioner based on the edge and vertex globs. we combine the upper bound K1 ≤ M L with preceding bounds for K0 = C0 to obtain the desired result. DR10]. Proof. based on the vertex. is also an additive Schwarz subspace preconditioner. S) ≤ C K L (1 + log(h0 /h))2 . .T and d = 2 ⎪ ⎩ C K L (h0 /h) . wh ) and wh 21/2. 3. Readers are referred to [BJ8. • The Schwarz subspace preconditioner for S based on the overlapping sub- regions ∂Ω1 . . the desired result follows by equivalence between S(wh . DR17.P . BR15. BR13. TO10] for additional theory. then the bound C (1 + log(h0 /h)) will hold in two and three dimensions.P vh 21/2. and a coarse space V0. Improved bounds independent of h0 and h can be proved.110. BR14. 3 cond(M. .P )vh 21/2. Then: ⎧ 2 ⎪ ⎨ C K L (1 + log(h0 /h)) .P is 2 employed. . when the coeﬃcient a(x) is smooth.P and d = 2.T or V0.T and d = 3.109 hold. KL8. we estimate the condition number of the additive Schwarz preconditioner for S based on Vh (G1 ). and V0.{ρi }.9 Theoretical Results 221 Combining the above bound with (3. Vh (Gn ). .T or V0. if V0 = V0. see [SM. BJ9. • The vertex space preconditioner in two or three dimensions.T is 2 employed.B . The preceding lemma may be applied to estimate the condition number of several Schur complement preconditioners in two and three dimensions.P )vh for 1 ≤ l ≤ n. If coarse space V0. 
Since v0 = I0. DR14] and [DE3. S) ≤ K0 K1 .B 2 ≤ C (1 + log(h0 /h)) 1 + L2 vh 21/2. As a corollary. If coarse space V0. XU10. BR11.

v) = uT Sv. v) ≡ (Su. The number of nodes on B will be denoted n and the number of nodes on B (i) = ∂Ωi \BD will be denoted ni . MA17]. We shall employ the following notation in our discussion. ∀ u. ∀ u.222 3 Schur Complement and Iterative Substructuring Algorithms 3. we deﬁne a semi-norm: . v ∈ IRn . DE3. The Euclidean inner product on IRn will be denoted: (u. DR18. TO10] for general convergence estimates on Neumann-Neumann preconditioners. v) = uT v. v ∈ IRn . We refer the reader to [DR14. by estimating the condition number of the balancing domain decomposition preconditioner using an algebraic framework introduced in [MA14. We also employ the inner product generated by the Schur complement S: S(u.9.6 Balancing Domain Decomposition Preconditioner We conclude our discussion on bounds for Schur complement preconditioners. On each non-Dirichlet boundary segment B (i) .

|wi |2S (i) ≡ S (i) wi . wi . The following Cauchy-Schwartz inequality will hold: . for wi ∈ IRni .

(S^{(i)} ui, vi) ≤ (S^{(i)} ui, ui)^{1/2} (S^{(i)} vi, vi)^{1/2}, for all ui, vi ∈ IR^{ni},

even when the Schur complement matrices S (i) is singular. Indeed. For each subdomain. ui S (i) vi . vi for all ui . vi ≤ S (i) ui . When the subdomain stiﬀness matrix S (i) is singular. (S (i) )1/2 ui (S ) vi . such a Cauchy-Schwartz inequality follows from the Euclidean Cauchy-Schwartz in- equality since the fractional powers (S (i) )α are well deﬁned for α ≥ 0 because S (i) is symmetric positive semideﬁnite: (i) S ui . 3. (S (i) )1/2 vi 1/2 (i) 1/2 = S (i) ui . vi = (S (i) )1/2 ui . For each subdomain.7) denote the ni × n matrix which restricts a nodal vector on B to its subvector corresponding to nodes on B (i) . ui S vi . let Ri (same as RB (i) in Chap. vi . vi ∈ IRni . 3. let Di denote a diagonal matrix of size ni with positive diagonal entries such that the following identity holds: . whose column space (i. 1/2 S (i) ui .7) an ni × di matrix: Kernel(S (i) ) ⊂ Range(Zi ). range) contains the null space of S (i) . we shall denote by Zi (identical to Ni in Chap. ∀ui . vi ∈ IRni .e. (S (i) )1/2 vi 1/2 (i) 1/2 1/2 ≤ (S (i) )1/2 ui ..

111. Lemma 3. . 1. ∀v ∈ Range(N ). ∀u. u). GO4]. 3. By deﬁnition. v ∈ IRn S(P0 u. we express the matrix form of the balancing domain decomposition preconditioner for S.9 Theoretical Results 223 p I= RTi Di Ri . The inverse M −1 of the balanced domain decomposition preconditioner is: M −1 = P˜0 + (I − P˜0 S)T (I − S P˜0 ). Proof. i=1 † where S (i) denotes the Moore-Penrose pseudoinverse of matrix S (i) . i=1 which we refer to as a decomposition of unity. Follows from the hybrid Schwarz description of the balancing domain decomposition preconditioner. 3. Consequently.7. Deﬁne N (identical to matrix C in Chap. 3. v). P0 v).137) = P0 + (I − P0 )T S(I − P0 ). ∀u. Employing the above notation. u ∈ IRn S(P0 u. The following properties will hold. 2. We deﬁne P˜0 as the following n × n symmetric matrix: −1 T P˜0 = N N T SN N and P0 = P˜0 S. the following properties will hold: P0 P0 = P0 P0 (I − P0 ) = 0 S(P0 u. v) = S(u. P0 = P˜0 S corresponds to the S-orthogonal projection onto Range(N ). v) = S(u. We deﬁne T as the following n × n matrix: p † T = RTi DiT S (i) Di Ri . P0 u) ≤ S(u.7) as the following n × d matrix. The preconditioned Schur complement matrix M −1 S will have the form: M −1 S = P˜0 S + (I − P˜0 S)T S(I − P˜0 S) (3. in Chap. where d ≡ (d1 + · · · + dp ): " # N ≡ RT1 D1T Z1 · · · RTp DpT Zp . where M −1 S will be symmetric in the S-inner product. see [ST13.
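The algebraic properties of the balancing preconditioner can be verified directly from the formulas (a sketch: S is a random SPD matrix, N a generic full-rank coarse matrix, and the diagonal T is only a stand-in for the Neumann-Neumann sum of pseudoinverses; none of these choices come from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 12, 3
X = rng.standard_normal((n, n))
S = X @ X.T + n * np.eye(n)                  # SPD stand-in for the Schur complement
N = rng.standard_normal((n, d))              # coarse matrix N (full column rank)

P0t = N @ np.linalg.solve(N.T @ S @ N, N.T)  # \tilde P_0 = N (N^T S N)^{-1} N^T
P0 = P0t @ S                                 # S-orthogonal projection onto Range(N)

T = np.diag(1.0 / np.diag(S))                # SPD stand-in for the local solver T
Minv = P0t + (np.eye(n) - P0t @ S) @ T @ (np.eye(n) - S @ P0t)
```

The checks below confirm that P0 is an S-self-adjoint projection, that residuals of the form S(I − P0)u are balanced (annihilated by N^T), and that M^{-1}S acts as the identity on Range(N), regardless of the choice of T.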

139) S (u. (3. ∀u ∈ IRn \{0}. By substituting (3. We will derive bounds for cond(M. It is then proved that γm = 1.140) S ((I − P0 )u. u = S (P0 u + (I − P0 )T S(I − P0 )u. u) = S(P0 u. γM } cond(M.138) λm where λm and λM denote the minimum and maximum values of the general- ized Rayleigh quotient associated with M −1 S in the S-inner product: S M −1 Su. γM }.224 3 Schur Complement and Iterative Substructuring Algorithms Since the preconditioned matrix M −1 S is symmetric in the S-inner prod- uct. u λm ≤ ≤ λM . u) (3. S (P0 u. (I − P0 )u) and substituting the bounds in (3. we obtain the following equivalent expression: S M −1 Su. P0 u) + S(T S(I − P0 )u. u . as the follow- ing result shows. P0 u) + S (T S(I − P0 )u.142) yields the estimates: S (P0 u.138) and (3.140) into (3. Employing the Pythagorean theorem: S(u. ∀u ∈ IRn \{0}. Lemma 3. u) = S (P0 u. (3. (3. P0 u) + S ((I − P0 )u. (3. (I − P0 )u) γm ≤ ≤ γM . . the following bound will hold: max{1. S) by estimating the extreme values of the generalized Rayleigh quotient of M −1 S in the S-inner product.137) into S M −1 Su. P0 u) + S((I − P0 )u. u) + S((I − P0 )T S(I − P0 )u. its condition number can be estimated as: λM cond(M. S) = . Next an alternative expression is derived for S (T S(I − P0 )u. Following that. (I − P0 )u) for u = 0. as described in (3. (I − P0 )u) for parameters 0 < γm ≤ γM Then.141) min{1. (I − P0 )u) . S) ≤ . Readers are referred to [MA17] for additional details. u).139).112. γm } Proof. γm } ≤ ≤ max{1.142) = S (P0 u. λM may be simpliﬁed using the S-orthogonality of the decomposition u = P0 u + (I − P0 )u.114 proves a bound for γM . u) Estimation of the parameters λm . Lemma 3. (I − P0 )u) min{1. Suppose the following condition holds: S (T S(I − P0 )u.

Given u ∈ IRn deﬁne ui ∈ IRni as follows: † ui ≡ S (i) Dj Ri (I − P0 )u for 1 ≤ i ≤ p. 3.113. and deﬁne .9 Theoretical Results 225 Lemma 3.
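The pseudoinverse identity used in this argument, namely S^{(i)} S^{(i)†} w = w whenever w ⊥ Kernel(S^{(i)}), can be checked on a small singular model matrix (a sketch; the difference-matrix construction, whose kernel is the constants, is an assumption of the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)  # forward-difference matrix
Sn = D.T @ D                                  # singular PSD matrix, kernel = constants
Sp = np.linalg.pinv(Sn)                       # Moore-Penrose pseudoinverse

b = rng.standard_normal(n)
b -= b.mean()                                 # "balance" b against the kernel
u = Sp @ b                                    # minimum-norm solution of Sn u = b
```

For balanced data the pseudoinverse yields an exact, kernel-orthogonal solution; for unbalanced data Sn Sp only projects onto the range of Sn.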

ui . (I − P0 )u) . S(I − P0 )u) . (3. (I − P0 )u) ≤ S (T S(I − P0 )u. the lower bound γm = 1 will hold. substitute T = i=1 RTi DiT S (i) Di Ri and simplify as follows: (T S(I − P0 )u. |ui |2S (i) ≡ S (i) ui . S ((I − P0 )u. the following identity will hold: p S (T S(I − P0 )u.144) Proof.. Then.143) i=1 Furthermore. (I − P0 )u) = |ui |2S (i) . for 1 ≤ i ≤ p. (3. i. (I − P0 )u) in the Euclidean p † inner product.143).e. express S (T S(I − P0 )u. ∀u ∈ IRn . To derive (3.

S(I − P0 )u p . p T T (i)† = i=1 Ri Di S Di Ri S(I − P0 )u.

† = i=1 S (i) Di Ri S(I − P0 )u. Di Ri S(I − P0 )u p .

insert I = i=1 RT i D i i in S ((I − P0 )u. for 1 ≤ i ≤ p. and applying the Cauchy-Schwartz inequality yields: . Ri (I − P0 )u). (I − P0 )) and expand to obtain: R S ((I − P0 )u. (I − P0 )u T = p (3. † † = i=1 S (i) S (i) Di Ri S(I − P0 )u. the vector S(I − P0 )u will be balanced. S (i) Di Ri S(I − P0 )u p = i=1 S (i) ui .145) and expressing the result in terms of ui . (I − P0 )u) p i=1 Ri Di Ri S(I − P0 )u. Therefore it will hold that Di Ri S(I − P0 )u ⊥ Kernel(S (i) ).145) = i=1 RTi Di Ri S(I − P0 )u. Substituting this into (3. To p derive a lower bound for S (T S(I − P0 )u. (I − P0 )u). (I − P0 )) = (S(I − P0 )u. (I − P0 )u p = i=1 (Di Ri S(I − P0 )u. so a property of the pseudoinverse † yields Di Ri S(I −P0 )u = S (i) S (i) Di Ri S(I −P0 )u. By deﬁnition of P0 . ui p = i=1 |ui |2S (i) .

(I − P0 )) p = i=1 (Di Ri S(I − P0 )u. Ri (I − P0 )u) p .226 3 Schur Complement and Iterative Substructuring Algorithms S ((I − P0 )u.

Ri (I − P0 )u p 1/2 p 1/2 ≤ (i) i=1 (S ui . (3. up ) : ui ∈ IRni . Ri (I − P0 )u) (i) p 1/2 p 1/2 i=1 |ui |S (i) ( i=1 RTi S (i) Ri (I − P0 )u. Ri (I − P0 )u p 1/2 (i) 1/2 ≤ i=1 S (i) ui .). S (i) ui ⊥ Range(Zi ) . 2. . 2 = Canceling common terms and squaring the resulting expression yields: p S ((I − P0 )u. and simplify the resulting expression: . Lemma 3. Ri (I − P0 )u p = i=1 S (i) ui . This yields the bound γm = 1.146) i=1 |ui |S (i) 2 (u1 . (3.. .143)..147) Proof.143) was employed.up )∈K\0 Then.. (I − P0 )) ≤ |ui |2S (i) = S (T S(I − P0 )u. . . S(I − P0 )u) employing (3. † = i=1 S (i) S (i) Di Ri S(I − P0 )u. (I − P0 )u) .. (I − P0 )u) 2 = p 1/2 1/2 i=1 |ui |S (i) (S(I − P0 )u. 1. Suppose the following assumptions hold. . the following estimate will hold: γM ≤ C. Then To estimate γM .. ui S Ri (I − P0 )u.114. γM corresponds to the maximum of the generalized Rayleigh quotient associated with T S on the subspace Range(I −P0 ) in the inner product S(. ui ⊥ Kernel(S (i) ). (I − P0 )u) i=1 where bound (3. Let K denote the following set: $ % K ≡ (u1 . expand substitute that S = j=1 RTj S (j) Rj . ui ) i=1 (S Ri (I − P0 )u. Let C > 0 be as deﬁned below: p p i=1 |Ri j=1 RjT DiT uj |2S (i) C= sup p . p (T S(I − P0 )u.

9 Theoretical Results 227 p (T S(I − P0 )u. ui p . 3. S(I − P0 )u) = i=1 S (i) ui .

† = i=1 S (i) S (i) Di Ri S(I − P0 )u. ui p = i=1 S(I − P0 )u. RTi DiT ui p .

Rj ( i=1 RTi DiT ui ) . Rj RTi DiT ui p p = j=1 S (j) Rj (I − P0 )u. Rj ( i=1 RTi DiT ui ) p 1/2 ≤ j=1 S (j) Rj (I − P0 )u. RTi DiT ui p p = i=1 j=1 S (j) Rj (I − P0 )u. Rj (I − P0 )u (j) p p 1/2 S Rj ( i=1 RTi DiT ui ). p = i=1 ( j=1 RTj S (j) Rj )(I − P0 )u.

R j (I − P 0 )u) . 1/2 p ≤ j=1 (S (j) R j (I − P0 )u.

Rj ( i=1 RTi DiT ui )) T T . p p 1/2 p j=1 (S (j) Rj ( i=1 Ri Di ui ).

(I − P0 )u) j=1 |R j i=1 R T T i D u | 2 i i S (j) p . 1/2 1/2 p p = (S(I − P0 )u.

Given glob G ∈ (∂Ωi ∩ ∂Ωj ) deﬁne IG as the matrix of size ni × nj ji IG ≡ Rj IG RTi . let yj for 1 ≤ j ≤ ni denote the nodes on B (i) in the local ji ordering. S(I − P0 )u) ≤ | RTi DiT ui |2S (j) . S) ≤ C.144) and (3. 1/2 1/2 p p ≤ i=1 |ui |S (i) 2 j=1 |Rj i=1 Ri Di ui |S (j) T T 2 . . j=1 i=1 Applying (3. where C is deﬁned in (3. . .141) with bounds (3. The following notation will be employed. which yields an upper bound with γM ≤ C. yields: p p (T S(I − P0 )u. we shall estimate C for a ﬁnite element discretization. Let x1 . it will hold that: IG = I. if xi ∈ G (IG )ii = 0. we obtain the condition number estimate cond(M.146). See [MA17]. Next. By construction. (3.148) {G⊂G} (i) On each ∂Ωi .144) was applied to obtain the last line. where (3. let IG denote the following n × n diagonal matrix: 1. By combining bound (3. . (I − P0 )u). S(I − P0 )u) ≤ C S((I − P0 )u. Canceling the common terms and squaring the resulting expression.146) yields (T S(I − P0 )u. if xi ∈ G. .147). For each glob G. xn denote the nodes on B.
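The weights di(G) = ρi^t / Σ_{l:G⊂∂Ωl} ρl^t form a partition of unity over the subdomains sharing a glob, and the factor ρi dj(G)^2/ρj is uniformly bounded by one for t ≥ 1/2, as used in the estimates above. A quick numerical check (the coefficient values are arbitrary test data spanning eight orders of magnitude):

```python
import numpy as np

rng = np.random.default_rng(1)
t = 0.5                                       # exponent t >= 1/2 from the text
rho = 10.0 ** rng.uniform(-4, 4, size=4)      # coefficients of subdomains sharing G
d = rho**t / np.sum(rho**t)                   # glob weights d_i(G)

# The factor rho_i d_j(G)^2 / rho_j for every pair of sharing subdomains.
ratios = np.array([[rho[i] * d[j]**2 / rho[j] for j in range(4)]
                   for i in range(4)])
```

At t = 1/2 the bound is immediate: ρi dj(G)^2/ρj = ρi / (Σ √ρl)^2 ≤ ρi / ρi = 1, since subdomain i itself appears in the sum.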

228 3 Schur Complement and Iterative Substructuring Algorithms Expressing Rj RTi = Rj IRTi and substituting for I using (3.115.up )=0 i=1 |u i | S (i) yielding cond(M. ρti + ρtj Additionally: Rj RTi Di ui = {G⊂∂Ωi ∩∂Ωj } Rj IG RTi Di ui ji = {G⊂∂Ωi ∩∂Ωj } IG Di ui ji ii (3. Let ui ∈ Vh (∂Ωi ) denote a ﬁnite element function with associated nodal vector ui ∈ IRni on B (i) .. Suppose the following assumptions hold. For K and L deﬁned by (3.150) ρj ρj for ui ⊥ Kernel(S (i) ) and S (i) ui ⊥ Range(Zi )..122) and (3. (u1 .. {G⊂(∂Ωi ∩∂Ωj )} Diagonal matrix Di of size ni has the following representation: Di ≡ ii di (G)IG .149) = {G⊂∂Ωi ∩∂Ωj } IG di (G)IG ui ji = {G⊂∂Ωi ∩∂Ωj } di (G)IG ui . for t ≥ .. Let R > 0 be the bound in the following discrete harmonic extension: 1 ji 2 1 |IG ui |S (j) ≤ R |ui |2S (i) . then the following will hold: ρti di (G) ≤ . 2. {G⊂∂Ωi } where the scalars di (G) are deﬁned by: ρti 1 di (G) ≡ t. (3. {l:G⊂∂Ωl } ρl 2 When G ⊂ (∂Ωi ∩ ∂Ωj ).148) yields: ji Rj RTi = IG .121). Lemma 3. the following estimate will hold: p p i=1 |Ri RjT DiT uj |2S (i) sup p j=1 2 ≤ K 2 L2 R. 1. . S) ≤ K 2 L2 R.

150) yields: ji |Rj RTi Di ui |S (j) ≤ {G⊂∂Ωi ∩∂Ωj } di (G)|IG ui |S (j) ρti ji ≤ {G⊂∂Ωi ∩∂Ωj } ρti +ρtj |IG ui |S (j) t− 1 1 ρi 2 ρj2 ≤ {G⊂∂Ωi ∩∂Ωj } ρti +ρtj R1/2 |ui |S (i) ρ1/2 ≤ L R1/2 supρ>0 1+ρt |ui |S (i) ≤ L R1/2 |ui |S (i) . The parameters K and L are generally independent of h. h0 and {ρj }. We follow the proof in [MA14. {G:G⊂∂Ωi ∩∂Ωj } Applying the triangle inequality and employing assumption (3. If the assumptions in Lemma 3. The condition number of the balancing domain decomposition system now follows immediately from the preceding result. MA17]. (3.115 hold. 3. S) ≤ C K 2 L2 (1 + log(h0 /h)) . then the balancing domain decomposition preconditioned system will satisfy: 2 cond(M. S) ≤ K 2 L2 R. when t ≥ 12 .116.149) it holds that: ji Rj RTi Di ui = di (G)IG ui . T 2 Summing over the indices j yields: p p p | Rj RTi Di ui |2S (j) ≤ K 2 max |Rj RTi Di ui |2S (j) . By Lemma 3. Substituting the above in (3. h and coeﬃcients {ρj }. for some C > 0 independent of h0 .122) so that at most K terms of the form Rj RTi Di ui will be nonzero: p p 2 | i=1 Rj RTi Di ui |2S (j) ≤ i=1 |Rj Ri Di ui |S (j) T p ≤K i=1 |Rj Ri Di ui |S (j) . where the parameters K. depending only on .9 Theoretical Results 229 Proof.115. p Apply the generalized trian- gle inequality to estimate the term |Rj i=1 RTi Di ui |2S (j) and use assump- tion (3. L and R are as deﬁned earlier.151) j j=1 i=1 i=1 By property (3. Theorem 3. Proof. the condition number of the balancing domain decomposition preconditioned system will satisfy the bound: cond(M.151) yields the desired result.

. 2 where we have employed the Poincar´e-Freidrich’s inequality in the last line. 2 due to the constraint ui ⊥ Kernel(S (i) ). 1) ∈ Kernel(S (i) ) and so a Poincar´e-Freidrich’s inequality of the following form will hold for ui 1 |ui |21/2.∂Ωi ≤ C |ui |2S (i) . KL8].∂Ωj and |IG ui |21/2.230 3 Schur Complement and Iterative Substructuring Algorithms the spatial dimension and the shape regularity properties of the subdomain decomposition.∂Ωj . For additional details. . In either case. and subsequently applying the glob theorem. . Thus. A similar Poincar´e-Freidrich’s inequality will hold for ui if S (i) is not singular (since c(x) = 0 and due to zero ji Dirichlet values on a segment of ∂Ωi ).∂Ωi 2 ≤ C (1 + log(h0 /h)) |ui |21/2. ρi for some C > 0 independent of h0 .∂Ωi 2 1 ≤ C (1 + log(h0 /h)) ρi |ui |S (i) . h and {ρj }. IG vi will correspond to the ﬁnite element function IG ui restricted to ∂Ωj . Applying the equivalence between the scaled Schur complement ji energy ρ1j |IG ui |2S (j) and the fractional Sobolev boundary energy |IG ui |21/2. . h and {ρi }. MA17. . the equivalence between |IG ui |21/2. we only need to estimate parameter R.∂Ωj 1 2 ≤ C|IG ui |21/2. let ui denote a ﬁnite element function on ∂Ωi with associ- ated vector of nodal values ui ∈ IRni satisfying ui ⊥ Kernel(S (i) ). the reader is referred to [MA14.∂Ωi (due to support of IG ui on ∂Ωi ∩ ∂Ωj ). Accordingly. then (1. we arrive at the following estimates: ji ρj |IG ui |S (j) = |IG ui |21/2. with ∂Ωi ∩ ∂Ωj as the support. If matrix T S (i) is singular. Thus R ≤ C (1 + log(h0 /h)) for some C > 0 independent of h0 .

4. subject to the constraint that the local displacements be continuous across the subdomains. It is a Lagrange multiplier based iterative substructuring method for solving a ﬁnite element discretization of a self adjoint and coercive elliptic equation. FA16. 4. Chap.3 describes a projected gradient algorithm for determining the Lagrange multiplier ﬂux variables in the FETI method. each subdomain solution is parameterized by a Lagrange multiplier ﬂux variable which represents the Neumann data of each subdomain solution on the subdomain boundary. FA14. MA25. 4. In traditional substructuring. we describe the FETI method (the Finite Element Tearing and Interconnecting method) [FA2. Chap. . 4. FA16. each subdomain solution is parameterized by its Dirichlet value on the bound- ary of the subdomain. The Lagrange multiplier variables correspond to ﬂux or Neumann data on the subdomain boundaries. resulting in a highly parallel algorithm with Neumann subproblems. Chap. in Lagrange multiplier substructur- ing. KL8]. Applications include elasticity. Our discussion is organized as follows. Both methods are based on the PCG method with a special coarse space and with local problems that impose constraints on the globs. Given a non-overlapping decomposition. shell and plate problems [FA2. the FETI method employs an extended energy functional asso- ciated with the self adjoint and coercive elliptic.4 describes the FETI-DP and BDDC variants of the FETI algorithm. It is obtained by weakening the continuity of the displacements across the subdomain boundaries. The FETI method then minimizes this extended energy.2 describes the Lagrange multiplier formulation associated with this constrained minimization problem.4 Lagrange Multiplier Based Substructuring: FETI Method In this chapter. By contrast. 
The global solution is sought by solving a reduced Schur complement system for determining the unknown Dirichlet boundary values of each subdomain solution.1 describes the constrained minimization problem underlying the FETI method. by solving a saddle point problem. Several preconditioners are outlined. based on a non-overlapping decomposition of its domain. FA15]. Chap. They yield identical convergence rates and provide advantages in parallelizability. The global solution is then sought by determin- ing the unknown Lagrange multiplier ﬂux variable. FA15.

subject to the constraint that the local solutions match across the sub- domain boundaries. ⎪ ∀v ∈ H01 (Ω). . local solutions are sought on each subdomain which minimize an extended global energy.3) 2 l=1 By construction. Ωp of Ω. We also deﬁne the following subdomain forms and spaces: ⎧ ⎨ AΩl (ul . we deﬁne its internal boundary segments B (l) = ∂Ωl ∩ Ω and external boundary segments B[l] = ∂Ωl ∩ ∂Ω. and common interfaces Blj = ∂Ωl ∩ ∂Ωj .1) seeks u ∈ H01 (Ω) satisfying: ⎧ ⎨ A(u. vl ) − FΩl (vl ) . v∈H01 (Ω) where J(v) ≡ 21 A(v. where A(u. v) = F (v). v) ≡ Ω (a ∇u · ∇v + c u v) dx (4.1. we deﬁne an extended energy functional JE (·) as: p 1 JE (vE ) = AΩl (vl . .1 Constrained Minimization Problem: Continuous Case Consider the following self adjoint and coercive elliptic equation: −∇ · (a(x)∇u) + c(x)u = f (x). . 4. then it can be veriﬁed that J(v) = JE (vE ). The weak formulation of (4. Given a non-overlapping decomposition Ω1 . in Ω (4. (4. . In this section. . ∀vl ∈ HB 1 (Ωl ) where ⎪ ⎩ H 1 (Ω ) ≡ {v ∈ H 1 (Ω ) : v = 0 on B }. vp ) where each local function vl (·) ∈ HB 1 [l] (Ωl ). . if v ∈ H01 (Ω) and vl (·) ≡ v(·) on Ωl for 1 ≤ l ≤ p. on ∂Ω.1) seeks u ∈ H01 (Ω): J(u) = min J(v). [l] B[l] l l [l] Given a collection of subdomain functions vE = (v1 . . vl = vj need not match across common interfaces Blj = ∂Ωl ∩ ∂Ωj . vl ) ≡ Ωl (a ∇ul · ∇vl + c ul vl ) dx.1) u(x) = 0. we describe the constrained minimization formulation of a self adjoint and coercive elliptic equation. Given a non-overlapping decomposition of a domain. ∀ul . v) − F (v). based on a non- overlapping decomposition.2) ⎪ ⎩ F (v) ≡ Ω f v dx.1 Constrained Minimization Formulation The FETI method is based on a constrained minimization formulation of an elliptic equation. Generally. .232 4 Lagrange Multiplier Based Substructuring: FETI Method 4. . The minimization formulation of (4. We also describe the ﬁnite element discretization of the elliptic equation and its constrained minimization formulation. 
vl ∈ HB[l] (Ωl ) 1 ⎪ FΩl (vl ) ≡ Ωl f vl dx. yet JE (vE ) is well deﬁned.

we expect that minimizing JE (vE ) in V0 will yield: JE (uE ) = min JE (vE ) (4. . deﬁne I∗ (l) ⊂ I(l) as the subindex set of interface segments of dimension (d − 1) when Ω ⊂ IRd . (4. . . we deﬁne the following constraint set of local functions: V0 ≡ {vE : vl = vj on Blj if j ∈ I(l).1. φj ).5) where A is symmetric positive deﬁnite with entries n Aij = A(φi . 1 ≤ l ≤ p}. Heuris- tically. If {φ1 . . Note that j ∈ I ∗ (l) if and only if l ∈ I ∗ (j). such that if Blj = ∅. . φn } denotes a nodal basis for Vh ∩ H01 (Ω). . 4. as in Fig.1 Constrained Minimization Formulation 233 Deﬁne I ∗ (l) ≡ {j : Blj = ∅}. while nodes on B (i) Ω1 Ω2 Ω3 Ω4 Ω5 Ω6 Ω7 Ω8 Ω9 Ω10 Ω11 Ω12 Ω13 Ω14 Ω15 Ω16 Fig. .2 Constrained Minimization Problem: Discrete Case Let Th (Ω) denote a quasiuniform triangulation of Ω with n nodes in Ω. 4. vh ) = F (vh ). but not both. and u denotes the displacement vector with uh (x) = i=1 (u)i φi (x) and f denotes the load vector. Let Vh denote a space of ﬁnite element functions on the triangulation Th (Ω) of Ω. 4.1. . . 4. up ) will satisfy ul = u on Ωl for 1 ≤ l ≤ p. A non-overlapping decomposition . we shall block partition the nodal unknowns on each subdomain as follows. .1. A ﬁnite element discretization of (4. for the desired solution u(.4) vE ∈V0 where uE = (u1 .1) will seek uh ∈ Vh ∩ H01 (Ω) such that: A(uh . for a chosen ordering of the nodes.). then either j ∈ I(l) or l ∈ I(j). . The FETI method employs a Lagrange multiplier formulation of a discrete version of this problem. Ωp of Ω. Additionally. Nodes in Ωi will be regarded as “interior” nodes in Ωi . the resulting discretization will yield the linear system: A u = f. with (f )i = F (φi ). ∀vh ∈ Vh ∩ H01 (Ω). Then V0 will consist of local functions which match across subdomains. choose I(l) ⊂ I ∗ (l) as a subindex set. then. and iteratively solves the resulting saddle point system using a preconditioned projected gradient method. . Heuristically. Given a nonoverlapping decomposition Ω1 . For 1 ≤ l ≤ p.
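The tearing and interconnecting idea can be made concrete on a one-dimensional model problem (an illustrative sketch, not the book's general formulation: the mesh, the even splitting of the interface load, and the single continuity constraint are choices of the example). The interface node is duplicated in the two subdomains, and continuity is restored through a Lagrange multiplier in a saddle point (KKT) system:

```python
import numpy as np

# Global problem: 1D Poisson on nodes x_0..x_6, Dirichlet ends, unknowns u_1..u_5,
# with the interface placed at node 3.
A = 2 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
f = np.ones(5)
u_glob = np.linalg.solve(A, f)

# Torn subdomain matrices: node 3 is duplicated (u3 in K1, u3' in K2).
K1 = np.array([[2.0, -1, 0], [-1, 2, -1], [0, -1, 1]])   # unknowns u1, u2, u3
K2 = np.array([[1.0, -1, 0], [-1, 2, -1], [0, -1, 2]])   # unknowns u3', u4, u5
K = np.block([[K1, np.zeros((3, 3))], [np.zeros((3, 3)), K2]])
fE = np.array([1, 1, 0.5, 0.5, 1, 1])                    # interface load split evenly

# Signed Boolean jump matrix enforcing the constraint u3 - u3' = 0.
B = np.array([[0.0, 0, 1, -1, 0, 0]])

# KKT system: minimize 1/2 u^T K u - fE^T u subject to B u = 0.
KKT = np.block([[K, B.T], [B, np.zeros((1, 1))]])
sol = np.linalg.solve(KKT, np.concatenate([fE, [0.0]]))
uE, lam = sol[:6], sol[6]
```

Because the torn matrices and loads subassemble to the global ones, the constrained minimizer reproduces the global solution, and the multiplier lam represents the interface flux exchanged between the subdomains.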

respectively.10): p A = i=1 RiT A(i) Ri p (4. The FETI algorithm solves (4. . The local displacements. The common interface will be denoted as B = ∪pi=1 B (i) . we obtain 0 = ∇J (u) = Au − f .5) by a constrained minimization reformulation of (4. We shall denote by (i) (i) (i) (i) uI ∈ IRnI and uB ∈ IRnB vectors of ﬁnite element nodal values on Ωi and B (i) . (4. Its transpose RiT will extend by zero a nodal vector on Ωi ∪ B (i) to the rest of Ω. In terms of nodal vectors. For discretizations of more general elliptic equations. Suppose A = AT > 0 and let u solve the linear system (4. The constrained minimization problem employed in the FETI method is obtained by weakening the requirement that the subdomain ﬁnite element functions be continuous across the interface B.. The next result describes a minimization problem equivalent to (4. . i.e. . f i = I(i) . relating the local and global stiﬀness matrices and load vectors.7) f = i=1 RiT f i .234 4 Lagrange Multiplier Based Substructuring: FETI Method as “subdomain boundary” nodes. we shall let Z (i) denote an ni × di matrix whose columns form a basis for the kernel of A(i) : Range(Z (i) ) = Kernel(A(i) ).8) When A(i) is nonsingular. local stiﬀness matrices and load vectors will be denoted as: (i) (i) (i) (i) uI A II AIB f ui = (i) . .. When matrix A(i) is singular. A(i) = (i)T (i) . . At the critical point of J (·). and the number of nodes in Ωi and B (i) will be denoted (i) (i) (i) (i) as nI and nB . (4.5).9) v∈IR 2 Proof. we deﬁne Z (i) = 0 and set di = 0. Since A = AT > 0. (4. Ω i ⊂ Ω.) and F (. and by subsequently enforcing continuity across B as a constraint.1. for 1 ≤ i ≤ p. each local . such as the equations of linear elasticity.) based on the subdomains. Lemma 4. for v ∈ IRn . where J(v) ≡ v Av − vT f . Kernel A(i) may have dimension di up to six (for Ω ⊂ IR3 ).6) uB AIB ABB fB given a local ordering of the nodes. the critical point u will correspond to a minimum. 
We shall denote by Ri the restriction map which maps a nodal vector u ∈ IRn of nodal values on Ω onto its subvector ui = Ri u of size ni of nodal values on Ωi ∪ B (i) . When coeﬃcient c(x) = 0 in (4. will yield the subassembly identity (3.5). Then u will minimize the associated energy functional: 1 T J(u) = minn J(v). then T the local stiﬀness matrix A(i) will be singular with 1 ≡ (1.1) and Ωi is ﬂoating. respectively. for the chosen local ordering of the nodes. with ni ≡ (nI + nB ). 1) spanning Kernel A(i) . Decomposing A(.5).
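The restriction maps $R_i$, the subassembly identity (4.7), and the minimization property of Lemma 4.1 can be checked numerically. The following is an illustrative sketch only (a 1-D Laplacian on 5 interior nodes split into two subdomains sharing one interface node; none of this data comes from the text):

```python
import numpy as np

# Global stiffness matrix for a 1-D Laplacian on 5 interior nodes
# (an illustrative stand-in for the finite element system (4.5)).
A = np.diag(2.0 * np.ones(5)) + np.diag(-np.ones(4), 1) + np.diag(-np.ones(4), -1)
f = np.ones(5)

# Two non-overlapping subdomains; global node 2 (0-based) is the interface B.
# R_i restricts a global nodal vector to the nodes of subdomain i plus B^(i).
R1 = np.eye(5)[[0, 1, 2]]
R2 = np.eye(5)[[3, 4, 2]]

# Local stiffness matrices A^(i), assembled from the elements inside each
# subdomain (interior nodes ordered first, boundary node last, as in (4.6)).
A1 = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
A2 = np.array([[2., -1., -1.], [-1., 2., 0.], [-1., 0., 1.]])

# Subassembly identity (4.7): A = sum_i R_i^T A^(i) R_i.
assert np.allclose(A, R1.T @ A1 @ R1 + R2.T @ A2 @ R2)

# Lemma 4.1: the solution of A u = f minimizes J(v) = (1/2) v^T A v - v^T f.
u = np.linalg.solve(A, f)
J = lambda v: 0.5 * v @ A @ v - v @ f
rng = np.random.default_rng(0)
for _ in range(5):
    assert J(u) <= J(u + rng.standard_normal(5))
```

The same local matrices and restriction maps are reused in the sketches that follow.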

Here $\mathbf{f}_i = (\mathbf{f}_I^{(i)T}, \mathbf{f}_B^{(i)T})^T$ denotes the local load, for $1 \leq i \leq p$. In the following, the local displacement vectors $\mathbf{v}_i = (\mathbf{v}_I^{(i)T}, \mathbf{v}_B^{(i)T})^T$ in an extended global displacement $\mathbf{v}_E \equiv (\mathbf{v}_1^T, \ldots, \mathbf{v}_p^T)^T$ of size $n_E = (n_1 + \cdots + n_p)$ need not match with adjacent displacements on $B^{(i)} \cap B^{(j)}$. The FETI method also employs extended loads $\mathbf{f}_E \equiv (\mathbf{f}_1^T, \ldots, \mathbf{f}_p^T)^T$, and an extended block diagonal stiffness matrix $A_{EE} \equiv \mathrm{blockdiag}(A^{(1)}, \ldots, A^{(p)})$ of size $n_E$, based on the local stiffness matrices $A^{(i)}$. By construction, the extended stiffness matrices, displacement and load vectors will have the following block structure:

$$ A_{EE} \equiv \begin{bmatrix} A^{(1)} & & 0 \\ & \ddots & \\ 0 & & A^{(p)} \end{bmatrix}, \qquad \mathbf{v}_E \equiv \begin{bmatrix} \mathbf{v}_1 \\ \vdots \\ \mathbf{v}_p \end{bmatrix}, \qquad \mathbf{f}_E \equiv \begin{bmatrix} \mathbf{f}_1 \\ \vdots \\ \mathbf{f}_p \end{bmatrix}. \tag{4.10} $$

Given the matrices $Z^{(i)}$ of size $n_i \times d_i$ whose columns span the null space of $A^{(i)}$, with $\mathrm{Range}(Z^{(i)}) = \mathrm{Kernel}(A^{(i)})$, a block matrix $Z$ of size $n_E \times d$, where $d = d_1 + \cdots + d_p$, will also be employed:

$$ Z \equiv \begin{bmatrix} Z^{(1)} & & 0 \\ & \ddots & \\ 0 & & Z^{(p)} \end{bmatrix}. \tag{4.11} $$

To determine an extended displacement $\mathbf{v}_E$ whose components match on the interface $B$, constraints are imposed on $\mathbf{v}_E$ and an extended energy functional is minimized subject to these constraints.

Lemma 4.2. Suppose the following assumptions hold for $\mathbf{v} \in \mathbb{R}^n$:

1. Define $\mathbf{w}_i = R_i\mathbf{v} \in \mathbb{R}^{n_i}$ and $\mathbf{w}_E = (\mathbf{w}_1^T, \ldots, \mathbf{w}_p^T)^T \in \mathbb{R}^{n_E}$.
2. Given local load vectors $\mathbf{f}_i = (\mathbf{f}_I^{(i)T}, \mathbf{f}_B^{(i)T})^T \in \mathbb{R}^{n_i}$, define:
$$ \mathbf{f} = \sum_{i=1}^p R_i^T\mathbf{f}_i \in \mathbb{R}^n \quad \text{and} \quad \mathbf{f}_E = (\mathbf{f}_1^T, \ldots, \mathbf{f}_p^T)^T \in \mathbb{R}^{n_E}. $$
3. Let $J_E(\mathbf{w}_E)$ denote the following extended energy functional, corresponding to the sum of the local displacement energies:
$$ J_E(\mathbf{w}_E) \equiv \tfrac{1}{2}\,\mathbf{w}_E^T A_{EE}\mathbf{w}_E - \mathbf{w}_E^T\mathbf{f}_E. \tag{4.12} $$

Then, for $J(\mathbf{v})$ defined by (4.9), it will hold that $J(\mathbf{v}) = J_E(\mathbf{w}_E)$.

Proof. The subassembly identity (4.7) for the stiffness matrix yields:

$$ \mathbf{v}^T A\mathbf{v} = \sum_{i=1}^p \mathbf{v}^T R_i^T A^{(i)} R_i\mathbf{v} = \mathbf{w}_E^T A_{EE}\mathbf{w}_E, $$

since $\mathbf{w}_i = R_i\mathbf{v}$.
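The identity $J(\mathbf{v}) = J_E(\mathbf{w}_E)$ of Lemma 4.2 can be checked on the same illustrative two-subdomain toy used earlier (all data here is hypothetical, not from the text); the only requirement on the split local loads is that $\mathbf{f} = \sum_i R_i^T\mathbf{f}_i$:

```python
import numpy as np

# Toy two-subdomain data (illustrative, as in the earlier sketch).
A1 = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
A2 = np.array([[2., -1., -1.], [-1., 2., 0.], [-1., 0., 1.]])
A = np.diag(2.0 * np.ones(5)) + np.diag(-np.ones(4), 1) + np.diag(-np.ones(4), -1)
R1 = np.eye(5)[[0, 1, 2]]
R2 = np.eye(5)[[3, 4, 2]]

# Extended block-diagonal stiffness matrix A_EE of (4.10).
AEE = np.zeros((6, 6))
AEE[:3, :3], AEE[3:, 3:] = A1, A2

# Extended load f_E: the interface load is split between the two subdomains
# so that the load subassembly identity f = R1^T f_1 + R2^T f_2 holds.
f = np.ones(5)
f1 = np.array([1., 1., .5])
f2 = np.array([1., 1., .5])
fE = np.concatenate([f1, f2])
assert np.allclose(f, R1.T @ f1 + R2.T @ f2)

# Lemma 4.2: J(v) = J_E(w_E) whenever w_i = R_i v.
rng = np.random.default_rng(1)
v = rng.standard_normal(5)
wE = np.concatenate([R1 @ v, R2 @ v])
J = 0.5 * v @ A @ v - v @ f
JE = 0.5 * wE @ AEE @ wE - wE @ fE
assert np.isclose(J, JE)
```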

The subassembly identity for the load vectors yields:

$$ \mathbf{v}^T\mathbf{f} = \mathbf{v}^T\Bigl( \sum_{i=1}^p R_i^T\mathbf{f}_i \Bigr) = \sum_{i=1}^p (R_i\mathbf{v})^T\mathbf{f}_i = \sum_{i=1}^p \mathbf{w}_i^T\mathbf{f}_i = \mathbf{w}_E^T\mathbf{f}_E. $$

Substituting these into $J(\mathbf{v})$ and $J_E(\mathbf{w}_E)$ yields the desired result. □

A constrained minimization formulation of (4.5) can now be obtained. The matrix $M$ will be chosen so that the equation $M\mathbf{w}_E = 0$ enforces each admissible pair of local displacement vectors $\mathbf{w}_B^{(i)}$ and $\mathbf{w}_B^{(j)}$ to match on the nodes in $B^{(i)} \cap B^{(j)}$. Here $M$ will be a matrix of size $m \times n_E$, such that $\mathbf{w}_i = R_i\mathbf{v}$ for $1 \leq i \leq p$, for some $\mathbf{v} \in \mathbb{R}^n$, if and only if $M\mathbf{w}_E = 0$ for $\mathbf{w}_E = (\mathbf{w}_1^T, \ldots, \mathbf{w}_p^T)^T$. When such a matrix $M$ can be constructed, the constraint set $V_0$ admits the representation:

$$ V_0 \equiv \Bigl\{ \bigl( (R_1\mathbf{v})^T, \ldots, (R_p\mathbf{v})^T \bigr)^T : \mathbf{v} \in \mathbb{R}^n \Bigr\} = \{ \mathbf{w}_E \in \mathbb{R}^{n_E} : M\mathbf{w}_E = 0 \}. \tag{4.13} $$

By construction, the parametric representation $\mathbf{w}_i = R_i\mathbf{v}$ in terms of $\mathbf{v} \in \mathbb{R}^n$ ensures that the nodal values of $\mathbf{w}_B^{(i)}$ match with those of $\mathbf{w}_B^{(j)}$ for nodes on $B^{(i)} \cap B^{(j)}$.

Lemma 4.3. Suppose the following assumptions hold:

1. Let $\mathbf{f} = \sum_{i=1}^p R_i^T\mathbf{f}_i$ and $\mathbf{f}_E = (\mathbf{f}_1^T, \ldots, \mathbf{f}_p^T)^T$.
2. Let $V_0$ and the matrix $M$ of size $m \times n_E$ be as in (4.13).
3. Let $\mathbf{u}$ denote the minimum of (4.9).
4. Let $\mathbf{w}_E = (\mathbf{w}_1^T, \ldots, \mathbf{w}_p^T)^T$ denote the constrained minimum:
$$ J_E(\mathbf{w}_E) = \min_{\mathbf{v}_E \in V_0} J_E(\mathbf{v}_E). \tag{4.14} $$

Then the following results will hold: $\mathbf{w}_i = R_i\mathbf{u}$, for $i = 1, \ldots, p$.

Proof. By definition of the subspace $V_0$, the parametric representation $\mathbf{w}_i = R_i\mathbf{v}$ will hold for $1 \leq i \leq p$ and for some $\mathbf{v} \in \mathbb{R}^n$. An application of the preceding lemma will yield the desired result. □

Remark 4.4. When $\mathbf{f} = \sum_{i=1}^p R_i^T\mathbf{f}_i$, the above equivalence between $J(\mathbf{v})$ and $J_E(\mathbf{w}_E)$ will hold only when the constraints $\mathbf{w}_i = R_i\mathbf{v}$ for $1 \leq i \leq p$ are satisfied. Thus, the minimization problem (4.9) can be expressed as a constrained minimization of the extended energy functional $J_E(\mathbf{w}_E)$ within the constraint set $V_0$, provided a matrix $M$ can be constructed with $V_0 = \mathrm{Kernel}(M)$.

Construction of Matrix $M$. We shall now describe how to construct a matrix $M$ so that the representation $V_0 = \mathrm{Kernel}(M)$ holds in (4.13). We let $n_B$ denote the number of nodes on $B = \cup_{i=1}^p B^{(i)}$.

Each row of the matrix $M$ must be chosen to enforce a constraint which matches two nodal values. We may require matching of the nodal values of $v_l$ and $v_j$ at a node $x_i$ for each pair of indices $l, j \in W(x_i)$. This can be done by requiring that the difference between the nodal value of $v_l$ and $v_j$ be zero at the node $x_i$. Here $W(x_i)$ denotes the indices of all subdomains whose boundaries contain $x_i$, and the degree of a node $x_i$ denotes the number of distinct subdomain boundaries to which it belongs.

Definition 4.5. Given the nodes $x_1, \ldots, x_{n_B}$ on the interface $B$, corresponding to each node $x_i \in B$ we define:

$$ \begin{cases} W(x_i) \equiv \{ j : x_i \in \partial\Omega_j \} \\ \mathrm{degree}(x_i) \equiv |W(x_i)| \\ \mathrm{index}(x_l, B^{(j)}) \equiv \text{local index of } x_l \text{ in } B^{(j)}. \end{cases} \tag{4.15} $$

Each node $x_i \in B$ will belong to $\mathrm{degree}(x_i)$ distinct subdomain boundaries, and there will be $\frac{1}{2}\,\mathrm{degree}(x_i)\,(\mathrm{degree}(x_i)-1)$ distinct pairs of subdomains which contain the node $x_i$. In principle, we may impose a matching condition for each such pair; however, this will typically yield redundant equations when $\mathrm{degree}(x_i) \geq 3$. In practice, it will be sufficient to select a subset of these linearly dependent constraints so that all such matching conditions can be derived from the selected few constraints.

There is much arbitrariness in the choice of the matrix $M$. The matrices we shall construct will have their entries $M_{ij}$ chosen from $\{-1, 0, +1\}$, selected based on the following observations. The matrix $M$ will have the block structure:

$$ M = \left[ M^{(1)} \cdots M^{(p)} \right], \tag{4.16} $$

so that $M\mathbf{v}_E = M^{(1)}\mathbf{v}_1 + \cdots + M^{(p)}\mathbf{v}_p$, where $M^{(i)}$ is of size $m \times n_i$. Since each $\mathbf{v}_i = (\mathbf{v}_I^{(i)T}, \mathbf{v}_B^{(i)T})^T$ corresponds to interior and boundary nodal values, each $M^{(i)}$ may further be partitioned as:

$$ M^{(i)} = [\, M_I^{(i)} \ \ M_B^{(i)} \,] = [\, 0 \ \ M_B^{(i)} \,]. \tag{4.17} $$

The submatrix $M_I^{(i)}$ will be zero since the matching of boundary values does not involve interior nodal values. Specifically, for each pair of indices $l, j \in W(x_i)$, let $\tilde{l}_i = \mathrm{index}(x_i, B^{(l)})$ and $\tilde{j}_i = \mathrm{index}(x_i, B^{(j)})$. Then the continuity of $v_l$ and $v_j$ at the node $x_i$ can be enforced as follows:

$$ \bigl( \mathbf{v}_B^{(l)} \bigr)_{\tilde{l}_i} - \bigl( \mathbf{v}_B^{(j)} \bigr)_{\tilde{j}_i} = 0. \tag{4.18} $$

This will yield entries of $M$ from $\{-1, 0, +1\}$. By convention, we shall require $l < j$, and in the following we describe two different choices of the matrix $M$ (not necessarily of full rank), depending on how many index pairs $l, j$ are selected from $W(x_i)$.

Choice 1. For each node $x_i$, impose one constraint corresponding to each distinct pair $l < j$ of indices in $W(x_i)$. Since there are $\mathrm{degree}(x_i)$ such indices, there will be $\frac{1}{2}\,\mathrm{degree}(x_i)\,(\mathrm{degree}(x_i)-1)$ such constraints, noting that $l, j \in W(x_i)$ need not be consecutive indices, so that the total number $m$ of constraints is:

$$ m = \sum_{i=1}^{n_B} \frac{1}{2}\,\mathrm{degree}(x_i)\,(\mathrm{degree}(x_i) - 1). $$

Let $k = k(i, l, j)$ denote the numbering (between $1$ and $m$) assigned to the constraint involving the node $x_i$ and the subvectors $\mathbf{v}_l$ and $\mathbf{v}_j$. The entries of $M$ in the $k$'th row are defined by:

$$ \begin{cases} \bigl( M_B^{(l)} \bigr)_{k,r} = 1, & \text{if } r = \tilde{l}_i, \\ \bigl( M_B^{(j)} \bigr)_{k,r} = -1, & \text{if } r = \tilde{j}_i, \end{cases} \tag{4.19} $$

with all other entries in the $k$'th row of $M$ defined to be zero. In this case, several of the constraints will be redundant if $\mathrm{degree}(x_i) \geq 3$ for at least one node $x_i$, and the matrix $M$ will not be of full rank.

Choice 2. An alternative choice of the matrix $M$ may be obtained as follows. For each node $x_i$, arrange all the indices in $W(x_i)$ in increasing order, and impose one constraint for each consecutive pair of such indices. For each such node $x_i$ and consecutive indices $l < j$ from $W(x_i)$, define the entries of $M$ as in (4.19), yielding a total of $\mathrm{degree}(x_i) - 1$ constraints corresponding to the node $x_i$, so that:

$$ m = \sum_{i=1}^{n_B} \bigl( \mathrm{degree}(x_i) - 1 \bigr). $$

In this case, all such constraints will be linearly independent, as may be verified by the reader, and the matrix $M$ will be of full rank (with rank equal to $m$).

Remark 4.6. For a two subdomain decomposition, all nodes on $B$ will have degree two, and choices 1 and 2 will coincide. In this case, provided all nodes on $B$ are ordered identically in both subdomains, the matrix $M$ will have the block structure $M = [\, 0 \ \ I \ \ 0 \ -I \,]$, with $M_B^{(1)} = I$ and $M_B^{(2)} = -I$. In both cases, the actual entries of $M$ will depend on the ordering of the constraints used. Choice 1 is easier to analyze than choice 2; however, choice 2 is preferable for parallel implementation [FA14], due to it being of full rank.
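The redundancy of choice 1 versus the full rank of choice 2 is easy to see on a hypothetical cross point of degree three (this tiny example is mine, not from the text); the three local copies of the nodal value are continuous exactly when the jump matrix annihilates them:

```python
import numpy as np

# One cross point x_i shared by three subdomains (degree(x_i) = 3); the
# vector w holds the three local copies of the nodal value at x_i.
# Choice 1 pairs all l < j in W(x_i): degree*(degree-1)/2 = 3 rows, redundant.
M1 = np.array([[1., -1., 0.],
               [1., 0., -1.],
               [0., 1., -1.]])
# Choice 2 pairs only consecutive indices: degree - 1 = 2 rows, full rank.
M2 = np.array([[1., -1., 0.],
               [0., 1., -1.]])

assert np.linalg.matrix_rank(M1) == 2   # rank < number of rows: redundant
assert np.linalg.matrix_rank(M2) == 2   # full row rank

# Both kernels consist exactly of the "continuous" vectors (equal copies).
w_cont = np.array([3.7, 3.7, 3.7])
w_jump = np.array([3.7, 3.7, 4.0])
for M in (M1, M2):
    assert np.allclose(M @ w_cont, 0)
    assert not np.allclose(M @ w_jump, 0)
```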

4.2 Lagrange Multiplier Formulation

To determine the solution to the constrained minimization problem (4.14), the FETI method reformulates (4.14) as a saddle point problem (the saddle point, or Lagrange multiplier, methodology is described in [CI4, GI3] and Chap. 10). It introduces new variables, referred to as Lagrange multipliers, one for each constraint, and associates with the constrained minimization problem a function, referred to as the Lagrangian function, whose saddle point (a critical point which is neither a local maximum nor a local minimum) yields the constrained minimum from its components; the resulting system of equations can be solved to determine the constrained minimum.

Given $\boldsymbol{\mu} \in \mathbb{R}^m$ of Lagrange multipliers, we associate a Lagrangian function $L(\mathbf{v}_E, \boldsymbol{\mu})$ with the constrained minimization problem (4.20):

$$ L(\mathbf{v}_E, \boldsymbol{\mu}) \equiv J_E(\mathbf{v}_E) + \boldsymbol{\mu}^T M\mathbf{v}_E = \tfrac{1}{2}\,\mathbf{v}_E^T A_{EE}\mathbf{v}_E - \mathbf{v}_E^T\mathbf{f}_E + \boldsymbol{\mu}^T M\mathbf{v}_E. \tag{4.23} $$

Lemma 4.7. Suppose the following assumptions hold:

1. Let $M$ be a matrix of size $m \times n_E$ of full rank $m$.
2. Let $\mathbf{u}_E = (\mathbf{u}_1^T, \ldots, \mathbf{u}_p^T)^T \in \mathbb{R}^{n_E}$ denote the solution of:
$$ J_E(\mathbf{u}_E) = \min_{\mathbf{w}_E \in V_0} J_E(\mathbf{w}_E), \tag{4.20} $$
where
$$ V_0 \equiv \{ \mathbf{w}_E \in \mathbb{R}^{n_E} : M\mathbf{w}_E = 0 \}. \tag{4.21} $$

Then, there will exist a vector $\boldsymbol{\lambda} \in \mathbb{R}^m$ such that:

$$ \begin{bmatrix} A_{EE} & M^T \\ M & 0 \end{bmatrix} \begin{bmatrix} \mathbf{u}_E \\ \boldsymbol{\lambda} \end{bmatrix} = \begin{bmatrix} \mathbf{f}_E \\ 0 \end{bmatrix}. \tag{4.22} $$

Proof. To verify the first block row of (4.22), for each choice of nonzero vector $\mathbf{v}_E \in V_0$ consider the line $x(t) = \mathbf{u}_E + t\,\mathbf{v}_E \in V_0$ for $t \in \mathbb{R}$. By construction, it passes through $\mathbf{u}_E$ when $t = 0$, with $\frac{dx(t)}{dt} = \mathbf{v}_E$. Since $\mathbf{u}_E$ corresponds to the minimum of $J_E(\cdot)$ in $V_0$, and since $x(t) \subset V_0$ with $x(0) = \mathbf{u}_E$, the function $J_E(x(t))$ will attain a minimum along the line at $t = 0$. Applying the derivative test yields:

$$ \left. \frac{dJ_E(x(t))}{dt} \right|_{t=0} = 0, \ \forall\mathbf{v}_E \in V_0 \iff \nabla J_E(\mathbf{u}_E) \cdot \mathbf{v}_E = 0, \ \forall\mathbf{v}_E \in V_0 \iff \nabla J_E(\mathbf{u}_E) \perp V_0 \iff \nabla J_E(\mathbf{u}_E) \in \mathrm{Kernel}(M)^{\perp} \iff \nabla J_E(\mathbf{u}_E) \in \mathrm{Range}(M^T). $$

We may represent any vector in $\mathrm{Range}(M^T)$ in the form $-M^T\boldsymbol{\lambda}$ for $\boldsymbol{\lambda} \in \mathbb{R}^m$ (the negative sign here is for convenience). Choosing $-M^T\boldsymbol{\lambda}$, we obtain $A_{EE}\mathbf{u}_E - \mathbf{f}_E = \nabla J_E(\mathbf{u}_E) = -M^T\boldsymbol{\lambda}$, which yields the first block row of (4.22). To verify the second block row of (4.22), note that since $\mathbf{u}_E \in V_0$, we obtain $M\mathbf{u}_E = 0$. □

Remark 4.8. Each $\lambda_i$ in $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)^T$ is referred to as a Lagrange multiplier. There will be $m$ Lagrange multipliers, one corresponding to each row of $M$, which enforces one of the $m$ constraints. Since each $\lambda_i$ is a dual variable to the Dirichlet data (see Chap. 11), it will represent an inter-subdomain flux.

Remark 4.9. When $M$ is not of full rank, $\boldsymbol{\lambda}$ will not be uniquely determined. To ensure solvability of (4.22), it is sufficient to require that $M$ is an $m \times n_E$ matrix of full rank $m$, and to require that the matrix $A_{EE}^T = A_{EE} \geq 0$ be coercive on the null space $V_0$ of $M$. This latter requirement can be equivalently stated as $\mathrm{Kernel}(M) \cap \mathrm{Kernel}(A_{EE}) = \{0\}$.

Remark 4.10. Since the matrix $A_{EE}$ may be singular, recall that $Z$ of rank $d$ satisfies $\mathrm{Range}(Z) = \mathrm{Kernel}(A_{EE})$. Using $Z$, we may define the class $G$ of admissible Lagrange multipliers as:

$$ G \equiv \{ \boldsymbol{\mu} : Z^T(\mathbf{f}_E - M^T\boldsymbol{\mu}) = 0 \}. $$

Definition 4.11. For $\boldsymbol{\mu} \in \mathbb{R}^m$, define the dual function $D(\boldsymbol{\mu})$ associated with the Lagrangian function:

$$ D(\boldsymbol{\mu}) \equiv \inf_{\mathbf{v}_E} L(\mathbf{v}_E, \boldsymbol{\mu}). $$

Remark 4.12. Since the matrix $A_{EE}$ may be singular, the above infimum could be $-\infty$ if $(\mathbf{f}_E - M^T\boldsymbol{\mu}) \notin \mathrm{Range}(A_{EE})$. By construction, if $\boldsymbol{\mu} \in G$, then $D(\boldsymbol{\mu}) > -\infty$.
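On the illustrative two-subdomain toy used in the earlier sketches (hypothetical data, not from the text), the saddle point system (4.22) can be assembled and solved directly, and its displacement components agree with the restrictions of the global solution, as Lemma 4.3 predicts:

```python
import numpy as np

# Toy two-subdomain data: extended stiffness matrix and split loads.
A1 = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
A2 = np.array([[2., -1., -1.], [-1., 2., 0.], [-1., 0., 1.]])
AEE = np.zeros((6, 6)); AEE[:3, :3], AEE[3:, 3:] = A1, A2
fE = np.array([1., 1., .5, 1., 1., .5])

# One constraint row: the duplicated interface values must agree,
# (w_B^(1)) - (w_B^(2)) = 0; here M = [0 I 0 -I] with one interface node.
M = np.array([[0., 0., 1., 0., 0., -1.]])

# Assemble and solve the KKT system [[A_EE, M^T], [M, 0]] of (4.22).
KKT = np.block([[AEE, M.T], [M, np.zeros((1, 1))]])
sol = np.linalg.solve(KKT, np.concatenate([fE, [0.]]))
uE, lam = sol[:6], sol[6:]

# The constrained minimizer reproduces the global solution on each subdomain.
A = np.diag(2.0 * np.ones(5)) + np.diag(-np.ones(4), 1) + np.diag(-np.ones(4), -1)
u = np.linalg.solve(A, np.ones(5))
assert np.allclose(uE[:3], u[[0, 1, 2]])   # subdomain 1: nodes (1,2,3)
assert np.allclose(uE[3:], u[[3, 4, 2]])   # subdomain 2: nodes (4,5,3)
assert np.allclose(M @ uE, 0)              # continuity constraint holds
```

The single multiplier `lam` plays the role of the inter-subdomain flux at the interface node, in the sense of Remark 4.8.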

Definition 4.13. For $\mathbf{v}_E \in \mathbb{R}^{n_E}$, define a function $E(\mathbf{v}_E)$:

$$ E(\mathbf{v}_E) \equiv \sup_{\boldsymbol{\mu}} L(\mathbf{v}_E, \boldsymbol{\mu}). $$

It is easily verified that, since $L(\cdot,\cdot)$ is linear in $\boldsymbol{\mu}$:

$$ E(\mathbf{v}_E) = \begin{cases} +\infty, & \text{if } M\mathbf{v}_E \neq 0, \\ J_E(\mathbf{v}_E), & \text{if } M\mathbf{v}_E = 0. \end{cases} $$

So we define the class of admissible displacements $\mathbf{v}_E$ as $V_0 \equiv \{ \mathbf{v}_E : M\mathbf{v}_E = 0 \}$. Thus, if $\mathbf{v}_E \in V_0$, then we will have $E(\mathbf{v}_E) = J_E(\mathbf{v}_E) < \infty$.

Definition 4.14. We say that $(\mathbf{u}_E, \boldsymbol{\lambda})$ is a saddle point of the Lagrangian functional $L(\cdot,\cdot)$ if the following conditions are satisfied:

$$ L(\mathbf{u}_E, \boldsymbol{\mu}) \leq L(\mathbf{u}_E, \boldsymbol{\lambda}) \leq L(\mathbf{v}_E, \boldsymbol{\lambda}), \quad \forall\mathbf{v}_E, \boldsymbol{\mu}. $$

The term "saddle point" is motivated by the following property: the saddle point $(\mathbf{u}_E, \boldsymbol{\lambda})$ corresponds to a minimum of $L(\mathbf{v}_E, \boldsymbol{\lambda})$ as $\mathbf{v}_E$ is varied, and to a maximum of $L(\mathbf{u}_E, \boldsymbol{\mu})$ as $\boldsymbol{\mu}$ is varied. By definition, at the saddle point $L(\mathbf{u}_E, \boldsymbol{\lambda}) = D(\boldsymbol{\lambda}) = E(\mathbf{u}_E)$.

Remark 4.15. At the saddle point of the Lagrangian function, its gradient with respect to the original and Lagrange multiplier variables will be zero. Thus, the first order derivative test (differentiation with respect to $\mathbf{v}_E$ and $\boldsymbol{\mu}$) for a critical point of $L(\mathbf{v}_E, \boldsymbol{\mu})$ at $(\mathbf{u}_E, \boldsymbol{\lambda})$ yields system (4.22).

4.3 Projected Gradient Algorithm

In this section, following [FA14], we describe an iterative algorithm for obtaining the solution $\mathbf{u}_E$ and $\boldsymbol{\lambda}$ to the saddle point system (4.22). Since the matrix $A_{EE}$ may be singular, traditional saddle point iterative algorithms from Chap. 10 need to be modified, and we discuss these modifications [FA15, FA14]. We assume that if $A_{EE}$ is singular, then $Z$ has rank $d$, with $\mathrm{Range}(Z) = \mathrm{Kernel}(A_{EE})$. We define $G \equiv MZ$ as a matrix of size $m \times d$. Due to the block structure of the matrices $M$ and $Z$, we obtain:

$$ G = MZ = \left[ M^{(1)}Z^{(1)} \cdots M^{(p)}Z^{(p)} \right]. \tag{4.24} $$

When the local stiffness matrix $A^{(i)}$ is nonsingular, $Z^{(i)} = 0$ and $M^{(i)}Z^{(i)} = 0$. The next result describes a system for determining $\boldsymbol{\lambda}$, and subsequently $\mathbf{u}_E$.

Lemma 4.16. Suppose the following assumptions hold:

1. Let $(\mathbf{u}_E^T, \boldsymbol{\lambda}^T)^T$ denote the solution to the saddle point system (4.22):
$$ \begin{cases} A_{EE}\mathbf{u}_E + M^T\boldsymbol{\lambda} = \mathbf{f}_E \\ M\mathbf{u}_E = 0. \end{cases} \tag{4.25} $$
2. Let $K \equiv M A_{EE}^{\dagger}M^T$, $\mathbf{e} \equiv M A_{EE}^{\dagger}\mathbf{f}_E$, and $\mathbf{g} \equiv Z^T\mathbf{f}_E$, where $A_{EE}^{\dagger}$ and $(G^TG)^{\dagger}$ denote Moore-Penrose pseudoinverses.
3. Let $P_0 \equiv I - G(G^TG)^{\dagger}G^T$.

Then the following results will hold for $G = MZ$ defined by (4.24):

1. The Lagrange multiplier $\boldsymbol{\lambda}$ will solve the following reduced system:
$$ \begin{cases} P_0 K\boldsymbol{\lambda} = P_0\mathbf{e} \\ G^T\boldsymbol{\lambda} = \mathbf{g}. \end{cases} \tag{4.26} $$
2. Given $\boldsymbol{\lambda}$, the displacement $\mathbf{u}_E$ can be determined as follows:
$$ \begin{cases} \mathbf{u}_E = A_{EE}^{\dagger}\bigl( \mathbf{f}_E - M^T\boldsymbol{\lambda} \bigr) + Z\boldsymbol{\alpha}, \\ \boldsymbol{\alpha} = (G^TG)^{\dagger}G^T\bigl( K\boldsymbol{\lambda} - M A_{EE}^{\dagger}\mathbf{f}_E \bigr). \end{cases} \tag{4.27} $$

Proof. Since $A_{EE}$ is singular, the first block row in (4.25) yields:

$$ A_{EE}\mathbf{u}_E = \mathbf{f}_E - M^T\boldsymbol{\lambda} \iff \mathbf{f}_E - M^T\boldsymbol{\lambda} \in \mathrm{Range}(A_{EE}) \iff \mathbf{f}_E - M^T\boldsymbol{\lambda} \perp \mathrm{Kernel}(A_{EE}) \iff Z^T\bigl( \mathbf{f}_E - M^T\boldsymbol{\lambda} \bigr) = 0 \iff G^T\boldsymbol{\lambda} = \mathbf{g}, $$

where $\mathbf{g} \equiv Z^T\mathbf{f}_E$. When the compatibility condition $G^T\boldsymbol{\lambda} = \mathbf{g}$ is satisfied, the general solution to the singular system $A_{EE}\mathbf{u}_E = \mathbf{f}_E - M^T\boldsymbol{\lambda}$ will be:

$$ \mathbf{u}_E = A_{EE}^{\dagger}\bigl( \mathbf{f}_E - M^T\boldsymbol{\lambda} \bigr) + Z\boldsymbol{\alpha}, $$

where $A_{EE}^{\dagger}$ is the Moore-Penrose pseudoinverse of $A_{EE}$, and $\boldsymbol{\alpha} \in \mathbb{R}^d$ is arbitrary; see [ST13, GO4]. Applying the constraint $M\mathbf{u}_E = 0$ to the above expression for $\mathbf{u}_E$ yields:

$$ M A_{EE}^{\dagger}\bigl( \mathbf{f}_E - M^T\boldsymbol{\lambda} \bigr) + MZ\boldsymbol{\alpha} = 0. $$

This corresponds to $K\boldsymbol{\lambda} - G\boldsymbol{\alpha} = \mathbf{e}$, for $K \equiv M A_{EE}^{\dagger}M^T$ and $\mathbf{e} \equiv M A_{EE}^{\dagger}\mathbf{f}_E$. Combining the compatibility condition with the preceding yields the system:

$$ \begin{cases} K\boldsymbol{\lambda} - G\boldsymbol{\alpha} = \mathbf{e} \\ G^T\boldsymbol{\lambda} = \mathbf{g}, \end{cases} \tag{4.28} $$

which constitutes $m + d$ equations for the $m + d$ unknown entries of $\boldsymbol{\lambda}$ and $\boldsymbol{\alpha}$. The term $G\boldsymbol{\alpha}$ in the first block equation in (4.28) can be eliminated by applying $P_0 = I - G(G^TG)^{\dagger}G^T$, which corresponds to the Euclidean orthogonal projection onto $\mathrm{Range}(G)^{\perp}$:

$$ \begin{cases} P_0 K\boldsymbol{\lambda} = P_0\mathbf{e} \\ G^T\boldsymbol{\lambda} = \mathbf{g}. \end{cases} $$

Since $d = \mathrm{Rank}(Z)$ and the matrix $Z$ has rank $d$, it follows that $\mathrm{Rank}(G) = d$, and $P_0$ is an orthogonal projection onto a space of dimension $m - d$. This effectively constitutes $m = (m - d) + d$ equations for the unknown $\boldsymbol{\lambda} \in \mathbb{R}^m$. Once $\boldsymbol{\lambda}$ is determined, the unknown coefficient vector $\boldsymbol{\alpha}$ can be determined using (4.28) as $\boldsymbol{\alpha} = (G^TG)^{\dagger}G^T\bigl( K\boldsymbol{\lambda} - \mathbf{e} \bigr)$, and the solution to (4.25) can be obtained using (4.27). □
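The dual reduction of Lemma 4.16 can be exercised on the illustrative two-subdomain toy from the earlier sketches (hypothetical data; here neither subdomain floats, so $Z$ is empty, $G = MZ$ vanishes, and $P_0 = I$, but the pseudoinverse formulas apply unchanged):

```python
import numpy as np

# Toy two-subdomain data (illustrative): both local matrices nonsingular.
A1 = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
A2 = np.array([[2., -1., -1.], [-1., 2., 0.], [-1., 0., 1.]])
AEE = np.zeros((6, 6)); AEE[:3, :3], AEE[3:, 3:] = A1, A2
fE = np.array([1., 1., .5, 1., 1., .5])
M = np.array([[0., 0., 1., 0., 0., -1.]])

# K = M A_EE^+ M^T and e = M A_EE^+ f_E, as in Lemma 4.16.
AEE_pinv = np.linalg.pinv(AEE)     # Moore-Penrose pseudoinverse A_EE^+
K = M @ AEE_pinv @ M.T
e = M @ AEE_pinv @ fE
lam = np.linalg.solve(K, e)        # P0 = I here, so (4.26) is K lam = e

# Back-substitution (4.27), with the Z alpha term absent since d = 0.
uE = AEE_pinv @ (fE - M.T @ lam)

assert np.allclose(M @ uE, 0)                    # constraint M u_E = 0
assert np.allclose(AEE @ uE + M.T @ lam, fE)     # first block row of (4.25)
```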

Remark 4.17. The reduced system (4.26) will correspond to the stationarity condition for a maximum of the dual function $D(\boldsymbol{\mu})$ associated with the Lagrange multiplier variables.

Remark 4.18. When the matrix $A_{EE}$ is nonsingular, the matrix $Z$ will have zero rank and the vector $\boldsymbol{\alpha}$ can be omitted. In this case, $K = M A_{EE}^{-1}M^T$, $\mathbf{e} = M A_{EE}^{-1}\mathbf{f}_E$, $G = 0$, $\mathbf{g} = 0$, and $P_0 = I$.

4.3.1 Projected Gradient Algorithm to Solve (4.26)

Since the solution to (4.25) can be obtained using (4.27) once $\boldsymbol{\lambda}$ is determined by solving (4.26), the FETI method seeks the Lagrange multiplier variables $\boldsymbol{\lambda} \in \mathbb{R}^m$ by solving:

$$ \begin{cases} P_0 K\boldsymbol{\lambda} = P_0\mathbf{e} \\ G^T\boldsymbol{\lambda} = \mathbf{g}. \end{cases} \tag{4.29} $$

If $Z$ (and hence $G = MZ$) has rank $d$, then the orthogonal projection matrix $P_0$ will have rank $(m - d)$, and consequently the first block equation will have rank $(m - d)$, while the second block equation will have rank $d$. In Lemma 4.19 it is shown that this system is symmetric and positive definite within a certain subspace $G^*$ of $\mathbb{R}^m$; consequently, it will be solvable by a conjugate gradient method in that subspace. While this may be viewed as an advantage, it results in an algorithm without any built in mechanism for global transfer of information. Such transfer may be included in a suitably constructed preconditioner; however, to include global transfer of information within the algorithm, the FETI method solves a modified linear system equivalent to (4.29). Let $C$ denote an $m \times q$ matrix having rank $q$, where $q < m$. Employing the matrix $C$, we modify system (4.29) as follows:

$$ \begin{cases} P_0 K\boldsymbol{\lambda} = P_0\mathbf{e} \\ C^T P_0 K\boldsymbol{\lambda} = C^T P_0\mathbf{e} \\ G^T\boldsymbol{\lambda} = \mathbf{g}. \end{cases} \tag{4.30} $$

The first and third block equations in (4.30) are identical to the first and second block equations in (4.29), while the second block equation in (4.30) is redundant, consisting of $q$ linear combinations of rows of the first block equation, corresponding to linear combinations of the first block in (4.29). Typically, either the matrix $G = 0$ or $C = 0$; however, both will be included for generality [FA14]. We now motivate a projected gradient algorithm to solve (4.30), as outlined below. Suppose $\boldsymbol{\lambda}_* \in \mathbb{R}^m$ can be found satisfying the 2nd and 3rd block equations in (4.30):

$$ \begin{cases} C^T P_0 K\boldsymbol{\lambda}_* = C^T P_0\mathbf{e} \\ G^T\boldsymbol{\lambda}_* = \mathbf{g}. \end{cases} \tag{4.31} $$

Then, we may seek the solution to (4.30) as $\boldsymbol{\lambda} = \boldsymbol{\lambda}_* + \tilde{\boldsymbol{\lambda}}$, where the correction $\tilde{\boldsymbol{\lambda}}$ solves:

$$ \begin{cases} P_0 K\tilde{\boldsymbol{\lambda}} = P_0(\mathbf{e} - K\boldsymbol{\lambda}_*) \\ C^T P_0 K\tilde{\boldsymbol{\lambda}} = 0 \\ G^T\tilde{\boldsymbol{\lambda}} = 0. \end{cases} \tag{4.32} $$

If we seek the correction $\tilde{\boldsymbol{\lambda}}$ within the subspace $G_0 \subset \mathbb{R}^m$ defined by:

$$ G_0 \equiv \bigl\{ \boldsymbol{\mu} \in \mathbb{R}^m : C^T P_0 K\boldsymbol{\mu} = 0, \ G^T\boldsymbol{\mu} = 0 \bigr\}, \tag{4.33} $$

then the second and third block equations in (4.30) will automatically hold. Importantly, by Lemma 4.19 below, the matrix $P_0 K$ will be symmetric and positive definite in the subspace $G_0$ equipped with the Euclidean inner product $(\cdot,\cdot)$:

$$ \begin{cases} (P_0 K\tilde{\boldsymbol{\lambda}}, \tilde{\boldsymbol{\mu}}) = (\tilde{\boldsymbol{\lambda}}, P_0 K\tilde{\boldsymbol{\mu}}), & \forall\tilde{\boldsymbol{\lambda}}, \tilde{\boldsymbol{\mu}} \in G_0, \\ (P_0 K\tilde{\boldsymbol{\mu}}, \tilde{\boldsymbol{\mu}}) \geq c\,(\tilde{\boldsymbol{\mu}}, \tilde{\boldsymbol{\mu}}), & \forall\tilde{\boldsymbol{\mu}} \in G_0, \end{cases} \tag{4.34} $$

for some $c > 0$. Consequently, a projected conjugate gradient iterative method may be applied to determine $\tilde{\boldsymbol{\lambda}}$ within $G_0$ so that:

$$ (P_0 K\tilde{\boldsymbol{\lambda}}, \tilde{\boldsymbol{\mu}}) = (P_0(\mathbf{e} - K\boldsymbol{\lambda}_*), \tilde{\boldsymbol{\mu}}), \quad \forall\tilde{\boldsymbol{\mu}} \in G_0. \tag{4.35} $$
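The positive definiteness of $P_0 K$ on the constrained subspace can be observed numerically. The following sketch uses a hypothetical 3-subdomain 1-D chain whose middle subdomain floats (its local stiffness matrix is singular with a constant kernel); none of this data comes from the text:

```python
import numpy as np

# Three subdomains of a 1-D Laplacian chain; the middle one floats
# (singular local matrix, kernel = constant vectors). Illustrative data.
A1 = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
A2 = np.array([[2., -1., -1., 0.], [-1., 2., 0., -1.],
               [-1., 0., 1., 0.], [0., -1., 0., 1.]])
A3 = np.array([[2., -1., -1.], [-1., 2., 0.], [-1., 0., 1.]])
AEE = np.zeros((10, 10))
AEE[:3, :3], AEE[3:7, 3:7], AEE[7:, 7:] = A1, A2, A3

# Two interface constraints (one per cross node) and the kernel matrix Z.
M = np.zeros((2, 10)); M[0, 2], M[0, 5] = 1., -1.; M[1, 6], M[1, 9] = 1., -1.
Z = np.zeros((10, 1)); Z[3:7, 0] = 1.            # Range(Z) = Kernel(A_EE)
assert np.allclose(AEE @ Z, 0)

G = M @ Z                                         # G = M Z, here 2 x 1
P0 = np.eye(2) - G @ np.linalg.pinv(G.T @ G) @ G.T
K = M @ np.linalg.pinv(AEE) @ M.T
assert np.allclose(P0 @ P0, P0)                   # P0 is a projection

# Kernel(G^T) is spanned by q = (1, 1)^T / sqrt(2); on this subspace
# the quadratic form of P0 K is strictly positive (cf. Lemma 4.19).
q = np.array([1., 1.]) / np.sqrt(2.)
assert np.allclose(G.T @ q, 0)
assert q @ P0 @ K @ q > 0
```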

To determine $\boldsymbol{\lambda}_* \in \mathbb{R}^m$ such that (4.31) holds, seek it as $\boldsymbol{\lambda}_* = G\boldsymbol{\beta}_* + C\boldsymbol{\gamma}_*$, where the coefficient vectors $\boldsymbol{\beta}_* \in \mathbb{R}^d$ and $\boldsymbol{\gamma}_* \in \mathbb{R}^q$ are to be determined. By applying the constraints (4.31) to $G\boldsymbol{\beta}_* + C\boldsymbol{\gamma}_*$, we obtain the following block equations for $\boldsymbol{\beta}_*$ and $\boldsymbol{\gamma}_*$:

$$ \begin{cases} C^T P_0 K G\boldsymbol{\beta}_* + C^T P_0 K C\boldsymbol{\gamma}_* = C^T P_0\mathbf{e} \\ G^T G\boldsymbol{\beta}_* + G^T C\boldsymbol{\gamma}_* = \mathbf{g}. \end{cases} \tag{4.36} $$

Rather than solve this system involving $d + q$ unknowns, it will be advantageous to combine the computation of $P_0 K\boldsymbol{\lambda}_*$ into the above system. Accordingly, represent $P_0 K\boldsymbol{\lambda}_*$ as $K\boldsymbol{\lambda}_* + G\boldsymbol{\delta}_*$, where $\boldsymbol{\delta}_* \in \mathbb{R}^d$ denotes an unknown coefficient vector to be selected so that $K\boldsymbol{\lambda}_* + G\boldsymbol{\delta}_* \in \mathrm{Range}(G)^{\perp}$:

$$ G^T\bigl( K\boldsymbol{\lambda}_* + G\boldsymbol{\delta}_* \bigr) = 0. \tag{4.37} $$

Substituting $P_0 K\boldsymbol{\lambda}_* = K\boldsymbol{\lambda}_* + G\boldsymbol{\delta}_*$ into (4.36), and applying the constraint (4.37) with $\boldsymbol{\lambda}_* = G\boldsymbol{\beta}_* + C\boldsymbol{\gamma}_*$, yields the following block system for $\boldsymbol{\beta}_*$, $\boldsymbol{\gamma}_*$ and $\boldsymbol{\delta}_*$:

$$ \begin{bmatrix} G^T K G & G^T K C & G^T G \\ C^T K G & C^T K C & C^T G \\ G^T G & G^T C & 0 \end{bmatrix} \begin{bmatrix} \boldsymbol{\beta}_* \\ \boldsymbol{\gamma}_* \\ \boldsymbol{\delta}_* \end{bmatrix} = \begin{bmatrix} G^T P_0\mathbf{e} \\ C^T P_0\mathbf{e} \\ \mathbf{g} \end{bmatrix}. \tag{4.38} $$

Once system (4.38) is solved, the solution $\boldsymbol{\lambda}_*$ to problem (4.31) is $\boldsymbol{\lambda}_* = G\boldsymbol{\beta}_* + C\boldsymbol{\gamma}_*$. The solution $\boldsymbol{\lambda}$ of (4.30) may now be expressed as $\boldsymbol{\lambda} = \boldsymbol{\lambda}_* + \tilde{\boldsymbol{\lambda}}$, where $\tilde{\boldsymbol{\lambda}}$ solves (4.35).

Lemma 4.19. Suppose the following assumptions hold:

1. Let $M$ be of full rank, and let $(\cdot,\cdot)$ denote the Euclidean inner product.
2. Let $K = M A_{EE}^{\dagger}M^T$, $G = MZ$, $P_0 = I - G\bigl(G^TG\bigr)^{\dagger}G^T$, and $\mathrm{Range}(Z) = \mathrm{Kernel}(A_{EE})$.
3. Let $\sigma_*(A_{EE})$ be the smallest nonzero singular value of the matrix $A_{EE}$, so that:
$$ \mathbf{w}_E^T A_{EE}\mathbf{w}_E \geq \sigma_*(A_{EE})\,\mathbf{w}_E^T\mathbf{w}_E, \quad \forall\mathbf{w}_E \ \text{such that } Z^T\mathbf{w}_E = 0, $$
and let $\sigma_*(M)$ be the smallest singular value of $M$.
4. Define $G^* \equiv \{ \boldsymbol{\mu} \in \mathbb{R}^m : G^T\boldsymbol{\mu} = 0 \}$.

Then the following results will hold:

1. The matrix $P_0 K$ will be symmetric in the subspace $G^*$, with:
$$ (P_0 K\boldsymbol{\lambda}, \boldsymbol{\mu}) = (\boldsymbol{\lambda}, P_0 K\boldsymbol{\mu}), \quad \forall\boldsymbol{\lambda}, \boldsymbol{\mu} \in G^*. $$
2. The matrix $P_0 K$ will be positive definite, satisfying:
$$ (P_0 K\boldsymbol{\mu}, \boldsymbol{\mu}) \geq \sigma_*(A_{EE})\,\sigma_*(M)\,(\boldsymbol{\mu}, \boldsymbol{\mu}), \quad \forall\boldsymbol{\mu} \in G^*. $$

Proof. To show that $P_0 K$ is symmetric in $G^*$, choose $\boldsymbol{\lambda}, \boldsymbol{\mu} \in G^*$; by definition $P_0\boldsymbol{\lambda} = \boldsymbol{\lambda}$ and $P_0\boldsymbol{\mu} = \boldsymbol{\mu}$. Since $P_0^T = P_0$ and $K^T = K$, we obtain:

$$ (P_0 K\boldsymbol{\lambda}, \boldsymbol{\mu}) = (K\boldsymbol{\lambda}, P_0\boldsymbol{\mu}) = (K\boldsymbol{\lambda}, \boldsymbol{\mu}) = (\boldsymbol{\lambda}, K\boldsymbol{\mu}) = (P_0\boldsymbol{\lambda}, K\boldsymbol{\mu}) = (\boldsymbol{\lambda}, P_0 K\boldsymbol{\mu}). $$

To verify positive definiteness, suppose that $\boldsymbol{\mu} \in \mathbb{R}^m \setminus \{0\}$ and that $G^T\boldsymbol{\mu} = 0$, i.e., $Z^T M^T\boldsymbol{\mu} = 0$. Then, applying the hypothesis yields:

$$ (P_0 K\boldsymbol{\mu}, \boldsymbol{\mu}) = \bigl( M A_{EE}^{\dagger}M^T\boldsymbol{\mu}, \boldsymbol{\mu} \bigr) = \bigl( A_{EE}^{\dagger}M^T\boldsymbol{\mu}, M^T\boldsymbol{\mu} \bigr) \geq \sigma_*(A_{EE})\,\bigl( M^T\boldsymbol{\mu}, M^T\boldsymbol{\mu} \bigr) \geq \sigma_*(A_{EE})\,\sigma_*(M)\,(\boldsymbol{\mu}, \boldsymbol{\mu}), $$

since $Z^T M^T\boldsymbol{\mu} = 0$ and $\boldsymbol{\mu} \neq 0$. □

Lemma 4.19 shows that $P_0 K$ is symmetric and positive definite in $G^*$, and hence in $G_0 \subset G^*$. As a result, the PCG method may be employed to determine $\tilde{\boldsymbol{\lambda}} \in G_0$. Care must be exercised, however, to ensure that all iterates and residuals in the conjugate gradient algorithm remain within the subspace $G_0$. To do this, a projection matrix $Q$ (possibly oblique, satisfying $Q^2 = Q$) will be employed to project residuals or preconditioned updates onto the subspace $G_0$ each iteration. In the following, we derive an expression for such a (possibly oblique) projection matrix $Q$. Given $\boldsymbol{\lambda} \in \mathbb{R}^m$, we shall seek its projection $Q\boldsymbol{\lambda} \in G_0$ in the form:

$$ Q\boldsymbol{\lambda} \equiv \boldsymbol{\lambda} + G\boldsymbol{\beta} + C\boldsymbol{\gamma}, \tag{4.39} $$

where the coefficient vectors $\boldsymbol{\beta} \in \mathbb{R}^d$, $\boldsymbol{\gamma} \in \mathbb{R}^q$ are chosen to satisfy:

$$ \begin{cases} G^T K (G\boldsymbol{\beta} + C\boldsymbol{\gamma}) + G^T G\boldsymbol{\delta} = -G^T K\boldsymbol{\lambda} \\ C^T K (G\boldsymbol{\beta} + C\boldsymbol{\gamma}) + C^T G\boldsymbol{\delta} = -C^T K\boldsymbol{\lambda} \\ G^T (G\boldsymbol{\beta} + C\boldsymbol{\gamma}) = -G^T\boldsymbol{\lambda}. \end{cases} $$

In the above, $\boldsymbol{\delta} \in \mathbb{R}^d$ was introduced to represent:

$$ P_0 K (G\boldsymbol{\beta} + C\boldsymbol{\gamma}) = K (G\boldsymbol{\beta} + C\boldsymbol{\gamma}) + G\boldsymbol{\delta}. $$

By construction, $Q\boldsymbol{\lambda} \in G_0$. The resulting projection $Q$ will thus have the matrix representation:

$$ Q \equiv I - \begin{bmatrix} G^T \\ C^T \\ 0 \end{bmatrix}^T \begin{bmatrix} G^T K G & G^T K C & G^T G \\ C^T K G & C^T K C & C^T G \\ G^T G & G^T C & 0 \end{bmatrix}^{\dagger} \begin{bmatrix} G^T K \\ C^T K \\ G^T \end{bmatrix}. \tag{4.40} $$

A pseudoinverse was employed in the above since, in the cases of interest, either $C = 0$ or $G = 0$, and this coefficient matrix will become singular. In the following, we describe the projection matrix (4.40) in the two special cases of interest.
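In the case $C = 0$, the projection defined above collapses to $Q = I - G(G^TG)^{-1}G^T = P_0$. Its projection properties are easy to verify numerically on a small hypothetical $G$ (the matrix below is illustrative only):

```python
import numpy as np

# Sketch of the C = 0 case: Q = I - G (G^T G)^{-1} G^T is the Euclidean
# orthogonal projection onto Range(G)^perp. Toy G with one column.
G = np.array([[-1.], [1.]])
Q = np.eye(2) - G @ np.linalg.inv(G.T @ G) @ G.T

assert np.allclose(Q @ Q, Q)   # Q is a projection: Q^2 = Q
assert np.allclose(Q, Q.T)     # orthogonal in the Euclidean inner product
assert np.allclose(Q @ G, 0)   # Range(G) is annihilated
```

In the case $G = 0$ with $C \neq 0$, the analogous operator $Q = I - C(C^TKC)^{-1}C^TK$ satisfies $Q^2 = Q$ but is symmetric only in the $K$ induced inner product, as noted below.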

Form of $Q$ when $c(x) = 0$. If $c(x) = 0$ in (4.1), then the subdomain stiffness matrix $A^{(i)}$ will be singular when $\Omega_i$ is a floating subdomain. In this case $Z$, and hence $G = MZ$, will be nontrivial, and typically $C$ is chosen to be $0$ (or equivalently omitted). For this choice, the operator $Q = I - G(G^TG)^{-1}G^T$ reduces to $P_0$ and will be an orthogonal projection in the Euclidean inner product. The nonhomogeneous term $\boldsymbol{\lambda}_*$ can be sought as $\boldsymbol{\lambda}_* = G\boldsymbol{\beta}_*$ with:

$$ G^TG\boldsymbol{\beta}_* = \mathbf{g}, \quad \text{so that} \quad \boldsymbol{\lambda}_* = G(G^TG)^{-1}\mathbf{g}. $$

Importantly, due to the block diagonal terms $M^{(i)}Z^{(i)}$ in the matrix $G$, the projection onto the subspace $G_0$ will provide global transfer of information. In this case, a coarse space term will be unnecessary in FETI preconditioners, and the FETI algorithm will typically have convergence rates deteriorating only mildly with increasing number of nodes per subdomain.

Form of $Q$ when $c(x) \geq c_0 > 0$. If the coefficient $c(x) \geq c_0 > 0$ in (4.1), then the local stiffness matrices $A^{(i)}$, and hence $A_{EE}$, will be nonsingular. In this case $G = 0$ and $P_0 = I$. While this may be viewed as an advantage, it results in an algorithm without any built in mechanism for global transfer of information. Such transfer may be included by selecting a nontrivial matrix $C \equiv M\tilde{Z}$, where $\tilde{Z}$ is an $n_E \times d$ matrix whose columns form a basis for $\mathrm{Kernel}(\tilde{A}_{EE})$, where $\tilde{A}_{EE} = \mathrm{blockdiag}(\tilde{A}^{(1)}, \ldots, \tilde{A}^{(p)})$ denotes the extended stiffness matrix arising from discretization of the elliptic operator in (4.1) with $c(x) = 0$. For this choice of matrix $C$, the operator $Q = I - C(C^TKC)^{-1}C^TK$ will be orthogonal only in the $K$ induced inner product. Computation of the initial nonhomogeneous term $\boldsymbol{\lambda}_*$ reduces to:

$$ (C^TKC)\,\boldsymbol{\gamma}_* = C^TP_0\mathbf{e}, \quad \text{so that} \quad \boldsymbol{\lambda}_* = C(C^TKC)^{-1}C^TP_0\mathbf{e}. $$

When $P_0 K$ is suitably preconditioned, the convergence rate may deteriorate only mildly with $h$. A preconditioner for $P_0 K$ can be sought within $G_0$, so that the action of the inverse of the preconditioner has the form $QNQ^T$, where $N$ is symmetric (in the Euclidean inner product); in practice, it will be sufficient to evaluate only the action $QN$ when the residuals from previous iterates lie in $G_0$. We may now summarize the FETI algorithm, employing the projection matrices $P_0$ and $Q$ and a preconditioner.
Compute the residual: r0 ≡ P0 (Kλ∗ − e).3. 5. Endfor 7. 4.1 (FETI Algorithm to Solve (4.26)) Let λ0 de a starting guess (for instance λ0 = 0) 1. 2. For k = 1.248 4 Lagrange Multiplier Based Substructuring: FETI Method Algorithm 4. · · · until convergence do: ⎧ ⎪ ⎪ zk−1 = N rk−1 preconditioning ⎪ ⎪ ⎪ ⎪ yk−1 = Qzk−1 projection ⎪ ⎪ ⎪ ⎪ ⎨ ξ k = rk−1 yk−1 T ⎪ pk = yk−1 + ξk−1 ξk pk−1 (p1 ≡ y0 ) ⎪ ⎪ ⎪ ν k = T ξk ⎪ ⎪ ⎪ pk P0 Kpk ⎪ ⎪ ⎪ ⎪ λ = λk−1 + νk pk ⎪ ⎩ k rk = rk−1 − νk P0 Kpk 6. Compute: . Compute: e ≡ M A†EE f E g ≡ ZT f E 2. Solve the following system (using a pseudoinverse): ⎧ T ⎨ G K (Gβ ∗ + Cγ ∗ ) + G Gδ ∗ = G P0 (e − Kλ0 ) ⎪ T T C T K (Gβ ∗ + Cγ ∗ ) + C T Gδ ∗ = C T (P0 e − Kλ0 ) ⎪ ⎩ GT (Gα∗ + Cβ ∗ ) = GT g. Deﬁne: λ∗ ← λ0 + Gβ ∗ + Cγ ∗ . 3.

for matrix P0 K. Both the preconditioners considered below have a similar structure. a coarse space term will be unnecessary in FETI preconditioners. . α ≡ (GT G)† GT Kλ − M A†EE f E u = A†EE f E − M T λ + Zα.3. Since information will be transferred globally within the FETI algorithm in the projection step involving matrix Q.2 Preconditioners for P0 K We shall describe two preconditioners proposed in [FA15]. 4. We next describe preconditioners of the form Q N in the FETI algorithm. and are motivated as follows.

This for- mal analogy suggests that preconditioners can be sought for K having a similar structure to Neumann-Neumann preconditioners [FA14. (1) given a two subdomain decomposition the constraints will be MB = I and (2) 2 † MB = −I.41) ⎪ ⎪ MB (i)T AIB ABB (i) MB ⎪ ⎪ ⎪ ⎪ p ⎩ (i) † (i)T = i=1 MB S (i) MB . so that K = i=1 S (i) . Other preconditioners based on analogy with two subdomain Schur com- plement preconditioners are also possible. the resulting con- dition number will be independent of h. 0. then the (1)† † formal inverses of S and S (2) will be spectrally equivalent to each other (independent of h). 1} are interpreted as boundary restriction matrices Ri . it can be veriﬁed by employing the block structure of A(i) and the algebraic deﬁnition of the pseudoinverse of a matrix. 4. The last equation above follows easily when submatrix A(i) is nonsingular. For instance. KL8].3 Projected Gradient Algorithm 249 Since matrices AEE and M have the following block structures: AEE = blockdiag A(1) . i=1 (i) Each matrix M will have the following block structures due to the ordering of interior and boundary nodes within each subdomain: (i) (i) (i) M (i) = [MI MB ] = [0 MB ] (i) where MI = 0 since the continuity constraint involves only interface un- knowns. The additive expression for K in (4. Substituting this into the preceding expression for K yields: ⎧ p † ⎪ T ⎪ K = i=1 M (i) A(i) M (i) ⎪ ⎪ ⎪ ⎪ T ⎨ (i) † p 0 (i) AII AIB 0 = i=1 (i)T (i)T (4. the heuristic similar- ity with Neumann-Neumann preconditioners suggests a preconditioner whose formal inverse has the structure: ⎧ . matrix K = M A†EE M T will formally satisfy: p † T K= M (i) A(i) M (i) . · · · . we may heuristically deﬁne the action of the 2 inverse of a preconditioner for K by QN = Q i=1 S (i) . " # M = M (1) · · · M (p) . By construction. (i) heuristically. If matrix AEE is nonsingular. A(p) . More generally. 
provided the boundary constraint matrices MB with entries from {−1.41) resembles a subassembly identity. When A(i) is singular. In this case.

⎨ QN ≡ Q p M (i) S (i) M (i) T .

i=1 B .

42) ⎩ p (i) (i) (i)T (i)−1 (i) (i)T =Q i=1 MB ABB − AIB AII AIB MB . . B (4.

since computation of the action of . and the resulting preconditioner is referred to as a Dirichlet preconditioner.250 4 Lagrange Multiplier Based Substructuring: FETI Method In this case computing the action of QN will require the solution of a local Dirichlet problem on each subdomain.

The action Q_L of the inverse of an alternative preconditioner, referred to as the lumped preconditioner, is obtained as follows:

   Q_L ≡ Q (sum_{i=1}^p M_B^(i) A_BB^(i) M_B^(i)T).

This preconditioner does not require the solution of local Dirichlet problems, and is obtained by approximating the local Schur complement S^(i) ≈ A_BB^(i). The following theoretical results will hold for the preconditioner Q_N. To be consistent with preceding sections, our notation differs from that in [DO, DO2, MA18, MA19].

Theorem 4.20. The following bounds hold for the Dirichlet preconditioner. There exists C > 0, independent of h_0, h and jumps in the coefficients, such that:

   cond(P_0 K, Q_N) ≡ λ_max(Q_N P_0 K) / λ_min(Q_N P_0 K) ≤ C (1 + log(h_0/h))^3.

Proof. See [MA25, MA18, MA19].

4.4 FETI-DP and BDDC Methods

In this section, we describe two popular variants of the FETI method to solve the saddle point problem (4.22). The FETI-DP (Dual-Primal) method solves a reduced version of (4.22), while BDDC (Balancing Domain Decomposition with Constraints) corresponds to a primal version of FETI-DP [FA11, FA10, DO, DO2, ST4, MA18, MA19, KL8]. Both methods are CG based, and improve upon the scalability of the FETI algorithm in three dimensions, yielding robust convergence. Both methods work in a class of local solutions which are discontinuous across the subdomain boundaries, except for a family of chosen continuity constraints, and the resulting preconditioned matrices have the same spectra. For simplicity, we only consider simple continuity constraints across cross points, edges and faces of a subdomain boundary; more general constraints are considered in [TO10]. In the following, prior to formulating the FETI-DP and BDDC methods, we describe the reduction of system (4.22) to a smaller saddle point system and introduce notation.

Reduced Saddle Point System. To reduce system (4.22) by elimination of the interior unknowns u_I^(l) for 1 ≤ l ≤ p, we re-order the block vector u_E as (u_I^T, u_B^T)^T in the saddle point system (4.22), where:

   u_I = (u_I^(1)T, ..., u_I^(p)T)^T   and   u_B = (u_B^(1)T, ..., u_B^(p)T)^T,

while the load vectors satisfy:

   f_I = (f_I^(1)T, ..., f_I^(p)T)^T   and   f_B = (f_B^(1)T, ..., f_B^(p)T)^T.

This will yield the following reordered system:

   [ A_II     A_IB    0     ] [ u_I ]   [ f_I ]
   [ A_IB^T   A_BB    M_B^T ] [ u_B ] = [ f_B ]        (4.43)
   [ 0        M_B     0     ] [ λ   ]   [ 0   ]

where the block submatrices A_II, A_IB and A_BB are defined as follows:

   A_II = blockdiag(A_II^(1), ..., A_II^(p)),
   A_IB = blockdiag(A_IB^(1), ..., A_IB^(p)),
   A_BB = blockdiag(A_BB^(1), ..., A_BB^(p)),

with matrix M_B = [M_B^(1) ... M_B^(p)]. Here, the matrices A_XY^(l) and M_B^(l), and the vectors u_X^(l), f_X^(l), are as in (4.6) for X, Y = I, B. Matrices M_B^(l) and M_B will be of size m × n_B^(l) and m × n_B respectively, where n_B = (n_B^(1) + ... + n_B^(p)).

The solution to (4.43) can be obtained as follows. We solve for u_I = A_II^-1 (f_I - A_IB u_B) using the first block row of (4.43). Substituting this expression into the second block row of (4.43) yields a reduced saddle point system for determining u_B and λ:

   [ S_EE   M_B^T ] [ u_B ]   [ f̃_B ]
   [ M_B    0     ] [ λ   ] = [ 0    ]        (4.44)

where f̃_B ≡ f_B - A_IB^T A_II^-1 f_I and S_EE = (A_BB - A_IB^T A_II^-1 A_IB) satisfies:

   S_EE = blockdiag(S^(1), ..., S^(p))   where   S^(i) ≡ (A_BB^(i) - A_IB^(i)T A_II^(i)-1 A_IB^(i)).

We solve (4.44) for u_B and λ, and subsequently obtain u_I = A_II^-1 (f_I - A_IB u_B).
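The elimination of the interior unknowns can be illustrated on a small dense stand-in problem (a sketch, not from the text; the unconstrained case without M_B and λ is shown, and all matrices are randomly generated):

```python
import numpy as np

rng = np.random.default_rng(1)
nI, nB = 5, 3
X = rng.standard_normal((nI + nB, nI + nB))
A = X @ X.T + (nI + nB) * np.eye(nI + nB)   # SPD stand-in, blocks [I, B]
A_II, A_IB, A_BB = A[:nI, :nI], A[:nI, nI:], A[nI:, nI:]
f_I = rng.standard_normal(nI)
f_B = rng.standard_normal(nB)

# Eliminating u_I = A_II^{-1}(f_I - A_IB u_B) leaves the reduced problem
# S_EE u_B = f̃_B, with S_EE = A_BB - A_IB^T A_II^{-1} A_IB and
# f̃_B = f_B - A_IB^T A_II^{-1} f_I.
S_EE = A_BB - A_IB.T @ np.linalg.solve(A_II, A_IB)
f_tilde = f_B - A_IB.T @ np.linalg.solve(A_II, f_I)

u_B = np.linalg.solve(S_EE, f_tilde)
u_I = np.linalg.solve(A_II, f_I - A_IB @ u_B)   # back-substitution

# The recovered (u_I, u_B) solves the original block system.
u = np.concatenate([u_I, u_B])
assert np.allclose(A @ u, np.concatenate([f_I, f_B]))
```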

Primal and Dual Spaces. Given Ω_1, ..., Ω_p, let B^(l) = ∂Ω_l ∩ Ω denote the interior segments of the subdomain boundaries, and let B = ∪_{l=1}^p B^(l) be the interface. We shall assume that Ω_1, ..., Ω_p are geometrically conforming. In the following, we heuristically define globs, such as cross points and edges when Ω ⊂ IR^2, and cross points, edges and faces when Ω ⊂ IR^3.

When Ω ⊂ IR^2, we heuristically define an edge as any non-trivial segment int(∂Ω_l ∩ ∂Ω_j) which can be mapped homeomorphically onto the open segment (0, 1), and we define a cross point as any endpoint in Ω of an edge. When Ω ⊂ IR^3, we heuristically define a face as any non-trivial segment int(∂Ω_l ∩ ∂Ω_j) which can be mapped homeomorphically onto the open square (0, 1) × (0, 1); we define an edge as any non-trivial intersection in Ω of two faces which can be homeomorphically mapped onto the open interval (0, 1), and a cross point as an endpoint within Ω of an edge. We let n_F denote the number of distinct faces and enumerate them as F_1, ..., F_{n_F}; we let n_E denote the number of distinct edges and enumerate them as E_1, ..., E_{n_E}; and we let n_X denote the number of distinct cross points and enumerate them as X_1, ..., X_{n_X}. We assume that the cross points, edges and faces partition B, so that B can be further partitioned into globs.

Restriction and Extension Maps. We employ the notation:

• Let U = IR^q be the space of nodal vectors associated with finite element functions on B in traditional substructuring. Here, q equals the number of nodes of the triangulation on B, and U parameterizes the global degrees of freedom on B.
• For 1 ≤ l ≤ p and u ∈ U, let R_l u denote the restriction of the vector u of nodal values on B onto the indices of nodes on B^(l). Thus R_l will be an n_B^(l) × q matrix with zero-one entries.
• Let W_l ≡ Range(R_l) = IR^{n_B^(l)} denote the space of local nodal vectors associated with displacements on B^(l), and let W ≡ (W_1 × ... × W_p) be the space of extended local displacements, with dim(W) = (n_B^(1) + ... + n_B^(p)).
• Let R_E : U → W denote the restriction matrix from U into W:

   R_E = [R_1^T ... R_p^T]^T,

where R_E is a matrix of size n_B × q.
• Let M_B : W → Λ, where M_B v_B denotes the jump discontinuity in v_B across the subdomains, for v_B ∈ W. Here, M_B is of size m × n_B, and m ≡ dim(Λ) ≥ q also denotes the number of Lagrange multiplier variables.
• By construction Kernel(M_B) = Range(R_E), thus M_B R_E = 0.
• Denote the primal Schur complement matrix S of size q × q as:

   S ≡ R_E^T S_EE R_E = sum_{l=1}^p R_l^T S^(l) R_l,

which is employed in traditional iterative substructuring.
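The zero-one restriction matrices R_l and the subassembly identity S = sum_l R_l^T S^(l) R_l can be illustrated on a tiny hypothetical interface (all sizes and index sets below are invented for the example, not from the text):

```python
import numpy as np

# Hypothetical interface B with q = 5 global nodes; two subdomain boundary
# segments B^(1), B^(2) carrying nodes {0,1,2} and {2,3,4} (node 2 shared).
q = 5
local_nodes = [[0, 1, 2], [2, 3, 4]]

def restriction(idx, q):
    """Zero-one restriction matrix R_l picking the nodes of B^(l) from B."""
    R = np.zeros((len(idx), q))
    for row, col in enumerate(idx):
        R[row, col] = 1.0
    return R

R = [restriction(idx, q) for idx in local_nodes]

# Stand-ins for the local Schur complements S^(l) (random SPD matrices).
rng = np.random.default_rng(2)
S_loc = []
for idx in local_nodes:
    X = rng.standard_normal((len(idx), len(idx)))
    S_loc.append(X @ X.T + len(idx) * np.eye(len(idx)))

# Subassembly of the primal Schur complement: S = sum_l R_l^T S^(l) R_l.
S = sum(Rl.T @ Sl @ Rl for Rl, Sl in zip(R, S_loc))
assert S.shape == (q, q) and np.allclose(S, S.T)
```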

The disjoint cross points, edges and faces are referred to as globs, and we shall assume that the interface B can be partitioned into distinct globs. In the FETI-DP and BDDC methods, one coarse degree of freedom will be associated with each distinct glob in B, and one basis function with mean value one on each glob, with zero nodal values outside the glob, will be employed in formulating the primal space. There will be as many coarse degrees of freedom, or coarse basis functions, as there are distinct globs in B. When Ω ⊂ IR^2, let q_0 = (n_E + n_X) denote the number of coarse degrees of freedom; when Ω ⊂ IR^3, let q_0 = (n_F + n_E + n_X) denote the number of coarse degrees of freedom on B.

Definition 4.21. We define Q_0 as a q_0 × q matrix which maps onto the coarse degrees of freedom, as follows. Each row of Q_0 will be associated with a distinct glob of B, in some chosen ordering of the globs.

• If the i'th row of Q_0 is associated with a cross point X_l then:

   (Q_0)_ij = 1 if node j in B is the cross point X_l, and (Q_0)_ij = 0 otherwise.

• If the i'th row of Q_0 is associated with the edge E_l then:

   (Q_0)_ij = 1/|E_l| if node j in B lies in E_l, and (Q_0)_ij = 0 if node j in B does not lie in E_l,

where |E_l| denotes the number of nodes in E_l.

Thus, if u ∈ U = IR^q is a nodal vector of global degrees of freedom on B, then (Q_0 u)_i will be the mean value of u on the glob associated with row i. The above weights are uniform within each glob, for simplicity; more generally, the entries of the local mass matrix on the glob must be divided by its row sum.
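The glob-averaging matrix Q_0 can be assembled directly from its definition; the following sketch uses an invented two-dimensional interface with two cross points and two edges (the node numbering is hypothetical, not from the text):

```python
import numpy as np

# Hypothetical interface B with q = 7 nodes: cross points X1 = {0}, X2 = {6},
# and edges E1 = {1,2}, E2 = {3,4,5}; hence q0 = 4 coarse degrees of freedom.
q = 7
globs = [("cross", [0]), ("cross", [6]), ("edge", [1, 2]), ("edge", [3, 4, 5])]

Q0 = np.zeros((len(globs), q))
for i, (kind, nodes) in enumerate(globs):
    if kind == "cross":            # indicator of the cross point
        Q0[i, nodes[0]] = 1.0
    else:                          # uniform mean-value weights 1/|El|
        for j in nodes:
            Q0[i, j] = 1.0 / len(nodes)

u = np.arange(q, dtype=float)      # a nodal vector on B
coarse = Q0 @ u                    # (Q0 u)_i = mean of u on glob i
assert coarse[0] == u[0] and coarse[1] == u[6]
assert np.isclose(coarse[2], u[[1, 2]].mean())
assert np.isclose(coarse[3], u[[3, 4, 5]].mean())
```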

• If the i'th row of Q_0 is associated with the face F_l then:

   (Q_0)_ij = 1/|F_l| if node j in B lies in F_l, and (Q_0)_ij = 0 if node j in B does not lie in F_l,

where |F_l| denotes the number of nodes in F_l. Here too, the weights are uniform within each glob, for simplicity.

Remark 4.23. For instance, if Ω̄_i ⊂ Ω ⊂ IR^2 is a rectangle, then there will be eight coarse degrees of freedom associated with ∂Ω_i, with four cross points and four edges. If Ω̄_i ⊂ Ω ⊂ IR^3 is a box, then there will be twenty six coarse degrees of freedom on ∂Ω_i, with six faces, twelve edges and eight cross points.

Since each coarse degree of freedom is associated with a distinct glob, and since, by definition, each glob either lies entirely within a subdomain boundary segment B^(i) or does not lie in B^(i), only certain coarse degrees of freedom will be non-zero on B^(i). Let q_0^(i) denote the number of globs in B^(i).

Definition 4.24. Given a global ordering of the q_0 globs (and associated coarse degrees of freedom) on B and a local ordering of the q_0^(i) globs on B^(i), we define a restriction matrix R_i^c of size q_0^(i) × q_0 as a matrix with zero or one entries which picks the coarse degrees of freedom which are non-zero on B^(i):

   (R_i^c)_lj ≡ 1 if glob j in the global ordering is glob l in the local ordering on B^(i), and 0 otherwise.

Using the restriction matrices R_i^c and the coarse degrees of freedom matrix Q_0, we define a family of constraint matrices C_i of size q_0^(i) × n_B^(i) that will be employed to formulate the primal and dual spaces.

Definition 4.25. We define a matrix C_i ≡ R_i^c Q_0 R_i^T for 1 ≤ i ≤ p. We also define C ≡ blockdiag(C_1, ..., C_p) as the block diagonal matrix of size (q_0^(1) + ... + q_0^(p)) × n_B, where n_B = (n_B^(1) + ... + n_B^(p)).

Remark 4.26. Since R_i^T and R_i^c are matrices with zero or one entries, with at most one non-zero entry per row or column, if w_i ∈ W_i, then C_i w_i will compute the average value of w_i on each of the q_0^(i) distinct globs on B^(i), in the local orderings. Thus, if C_i w_i = 0, then w_i will be zero at all the cross points in B^(i), with mean value zero on the edges and faces (if any) in B^(i).

Recall that W = (W_1 × ... × W_p) denotes the space of nodal vectors on the subdomain boundaries, whose associated finite element functions are discontinuous across the subdomains. The FETI-DP and BDDC methods employ several subspaces W_0, W_D, W_P and W_∗ of W. Below, we define W_∗ ⊂ W as the space of local nodal vectors whose local coarse degrees of freedom are unique, i.e., continuous across the subdomain boundaries.

Definition 4.27. We define W_∗ as the following subspace of W:

   W_∗ ≡ { w_B = (w_B^(1)T, ..., w_B^(p)T)^T : C w_B ∈ Range(R^c) },

i.e., for each w_B ∈ W_∗ there must exist some u ∈ IR^{q_0} such that C w_B = R^c u. The space W_∗ can be further decomposed as a sum of two spaces, W_∗ = W_D + W_P, where W_D is referred to as the dual space and involves local constraints, while the space W_P, referred to as the primal space, involves global constraints.

Definition 4.28. The dual space W_D ≡ Kernel(C) ⊂ W_∗ will consist of local nodal vectors whose coarse degrees of freedom (mean value on each glob) are zero on each subdomain boundary:

   W_D ≡ Kernel(C) = { w_B = (w_B^(1)T, ..., w_B^(p)T)^T : C_i w_B^(i) = 0 for 1 ≤ i ≤ p }.

The other degrees of freedom in W_∗ may be discontinuous across the subdomain boundaries. The primal space W_P will be a subspace of W_∗ complementary to W_D, defined as the span of q_0 local basis functions whose coarse degrees of freedom are continuous across the subdomains. The primal space W_P will be employed as a coarse space.

Definition 4.29. We define a matrix R^c of size (q_0^(1) + ... + q_0^(p)) × q_0 as the stacked block matrix:

   R^c ≡ [R_1^c; ...; R_p^c],

corresponding to a restriction of the global coarse degrees of freedom on B onto the local coarse degrees of freedom on each of the local boundaries B^(i).

Definition 4.30. We define the primal space as W_P ≡ Range(Φ), where:

   Φ ≡ [Φ_1; ...; Φ_p]   with   C_i Φ_i = R_i^c for 1 ≤ i ≤ p,

where Φ_l is of size n_B^(l) × q_0 and Φ is of size n_B × q_0, with dim(Range(Φ)) = q_0.

Remark 4.31. The FETI-DP and BDDC methods will employ the following property. By construction, W_P and W_D are complementary, W_∗ = W_P + W_D, so each u_B ∈ W_∗ may be decomposed and sought in the form:

   u_B = u_D + Φ u_c,   where C u_D = 0 and Φ u_c ∈ W_P.

Indeed, if v_B ∈ W_∗, then there exists u ∈ IR^{q_0} such that C_i v_B^(i) = R_i^c u. Then w_i ≡ (v_B^(i) - Φ_i u) will satisfy C_i w_i = 0 for 1 ≤ i ≤ p, yielding that (w_1, ..., w_p) ∈ W_D.

Remark 4.32. The subspace W_0 ≡ Kernel(M_B) satisfies: W_0 ⊂ W_D ⊂ W_∗ ⊂ W.

Remark 4.33. Henceforth, we assume that matrix S_EE is positive definite within W_∗. The minimization of J_B(v_B) = (1/2) v_B^T S_EE v_B - v_B^T f̃_B subject to the constraint that C v_B = 0 can be reduced to p concurrent local problems, since S_EE = blockdiag(S^(1), ..., S^(p)) and C = blockdiag(C_1, ..., C_p). Let v_B = (v_B^(1)T, ..., v_B^(p)T)^T, f̃_B = (f̃_B^(1)T, ..., f̃_B^(p)T)^T, and µ = (µ_1^T, ..., µ_p^T)^T. Then, by reordering the system, the solution to:

   [ S_EE   C^T ] [ u_B ]   [ f̃_B ]
   [ C      0   ] [ µ   ] = [ 0    ]        (4.45)

reduces to the solution of:

   [ S^(i)   C_i^T ] [ u_B^(i) ]   [ f̃_B^(i) ]
   [ C_i     0     ] [ µ_i     ] = [ 0        ]    for 1 ≤ i ≤ p.        (4.46)

If f̃_B = S_EE w_B and S_EE is positive definite within W_D, then it is easily verified that u_B = P_{W_D} w_B, where P_{W_D} denotes the S_EE-orthogonal projection onto W_D.

FETI-DP Method. The FETI-DP method seeks the solution (u_B^T, λ^T)^T to (4.44) by maximizing a dual function F(λ) associated with (4.44). It is based on the decomposition u_B = u_D + Φ u_c, where u_D ∈ W_D with C u_D = 0 and Φ u_c ∈ W_P. We recall the saddle point problem (4.44), where f̃_B ≡ f_B - A_IB^T A_II^-1 f_I.
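One of the local constrained problems of the form (4.46) can be illustrated with a small dense stand-in (a sketch, not from the text; S and C below are randomly generated placeholders for a local Schur complement and constraint matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
nB, q0 = 5, 2   # local boundary unknowns and local coarse constraints

# Hypothetical SPD local Schur complement S^(i) and full-rank constraint C_i.
X = rng.standard_normal((nB, nB))
S = X @ X.T + nB * np.eye(nB)
C = rng.standard_normal((q0, nB))
f = rng.standard_normal(nB)

# Local saddle point system (one per subdomain, solvable concurrently):
#   [ S   C^T ] [ w  ]   [ f ]
#   [ C   0   ] [ mu ] = [ 0 ]
K = np.block([[S, C.T], [C, np.zeros((q0, q0))]])
sol = np.linalg.solve(K, np.concatenate([f, np.zeros(q0)]))
w, mu = sol[:nB], sol[nB:]

assert np.allclose(C @ w, 0)                # glob averages of w vanish
assert np.allclose(S @ w + C.T @ mu, f)     # first block row holds
```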

The saddle point problem (4.44) reads:

   [ S_EE   M_B^T ] [ u_B ]   [ f̃_B ]
   [ M_B    0     ] [ λ   ] = [ 0    ]        (4.47)

The Lagrangian function associated with the above saddle point problem is:

   L(u_B, λ) = (1/2) u_B^T S_EE u_B - u_B^T f̃_B + λ^T M_B u_B.

Since the constraint M_B u_B = 0 yields u_B ∈ W_0 ⊂ W_∗, we may alternatively minimize the functional within W_∗ subject to the constraint M_B u_B = 0. The FETI-DP method seeks u_B = u_D + u_P, where C u_D = 0 and u_P = Φ u_c. The constraint C u_D = 0 can be imposed by augmenting the Lagrangian with the term µ^T C u_D for µ ∈ IR^{q_0^(1)+...+q_0^(p)}. This will alter µ and λ, but not u_B. Seeking the saddle point of the augmented Lagrangian:

   L_aug(u_D, u_c, µ, λ) ≡ L(u_D + Φ u_c, λ) + µ^T C u_D

will yield the following saddle point system:

   [ S_EE        S_EE Φ        C^T   M_B^T     ] [ u_D ]   [ f̃_B     ]
   [ Φ^T S_EE    Φ^T S_EE Φ    0     Φ^T M_B^T ] [ u_c ] = [ Φ^T f̃_B ]        (4.48)
   [ C           0             0     0         ] [ µ   ]   [ 0        ]
   [ M_B         M_B Φ         0     0         ] [ λ   ]   [ 0        ]

Rearranging the unknowns as (u_D^T, µ^T, u_c^T, λ^T)^T results in the system:

   [ S_EE        C^T   S_EE Φ        M_B^T     ] [ u_D ]   [ f̃_B     ]
   [ C           0     0             0         ] [ µ   ] = [ 0        ]        (4.49)
   [ Φ^T S_EE    0     Φ^T S_EE Φ    Φ^T M_B^T ] [ u_c ]   [ Φ^T f̃_B ]
   [ M_B         0     M_B Φ         0         ] [ λ   ]   [ 0        ]

The FETI-DP method solves the above system by solving a symmetric positive definite system for determining λ ∈ Λ = IR^m by a PCG method. In the following, we express system (4.49) more compactly as:

   [ K   L^T ] [ x ]   [ g ]
   [ L   0   ] [ λ ] = [ 0 ]        (4.50)

where the matrices K and L and the vectors x and g are as described next:

   K ≡ [ S_EE        C^T   S_EE Φ     ]        [ u_D ]        [ f̃_B     ]         [ M_B^T     ]
       [ C           0     0          ],   x ≡ [ µ   ],   g ≡ [ 0        ],  L^T ≡ [ 0         ].        (4.51)
       [ Φ^T S_EE    0     Φ^T S_EE Φ ]        [ u_c ]        [ Φ^T f̃_B ]         [ Φ^T M_B^T ]

The FETI-DP method seeks the solution to (4.44) by eliminating x and by solving the resulting reduced system F λ = d for λ by a PCG method. Here, F λ = d arises as the condition for maximizing the dual function F(λ) = inf_x L_aug(x, λ):

   F ≡ (L K^-1 L^T)   and   d ≡ (L K^-1 g).

Once λ ∈ IR^m is determined, we obtain x = K^-1 (g - L^T λ).

Remark 4.34. Matrix K is a saddle point matrix, and is indefinite; however, within the subspace C v_D = 0, matrix K can be verified to be positive definite. A system of the form K x = g can be solved by duality. In the following, we elaborate on the action of K^-1:

   [ S_EE        C^T   S_EE Φ     ] [ u_D ]   [ g_1 ]
   [ C           0     0          ] [ µ   ] = [ 0   ]        (4.52)
   [ Φ^T S_EE    0     Φ^T S_EE Φ ] [ u_c ]   [ g_2 ]

Given u_c, since S_EE and C are both block diagonal, we may solve the first two block rows above to obtain:

   [ u_D ]   [ S_EE   C^T ]^-1 [ g_1 - S_EE Φ u_c ]
   [ µ   ] = [ C      0   ]    [ 0                ]        (4.53)

Substituting this into the third block row yields the reduced system for u_c:

   S_c u_c = g_c,   where

   S_c = Φ^T S_EE Φ - [ S_EE Φ ]^T [ S_EE   C^T ]^-1 [ S_EE Φ ]
                      [ 0      ]   [ C      0   ]    [ 0      ]        (4.54)

   g_c = g_2 - [ S_EE Φ ]^T [ S_EE   C^T ]^-1 [ g_1 ]
               [ 0      ]   [ C      0   ]    [ 0   ]

Once S_c u_c = g_c is solved, we may determine (u_D^T, µ^T)^T by solving (4.53).

Remark 4.35. Matrix S_c of size q_0 can be shown to be sparse and can be assembled in parallel. By Remark 4.33, it will hold that S_c = Φ^T (I - P_{W_D})^T S_EE (I - P_{W_D}) Φ. Since S_EE is positive definite within W_∗, S_c will be positive definite within W_P.

Remark 4.36. The FETI-DP preconditioner F_0 for F is chosen so that both the FETI-DP and BDDC preconditioned matrices have the same spectra, where the inverse of the preconditioner is M_D S_EE M_D^T, see [FA11, FA10, DO, DO2, MA18, MA19].
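The block elimination in (4.54) can be mimicked on a small dense stand-in problem (an illustration, not from the text; S_EE, C and Φ below are randomly generated placeholders, whereas in practice these solves split into independent subdomain problems):

```python
import numpy as np

rng = np.random.default_rng(4)
nB, nq, q0 = 6, 3, 2   # sizes of S_EE, columns of Phi, constraint rows

X = rng.standard_normal((nB, nB))
S_EE = X @ X.T + nB * np.eye(nB)       # SPD stand-in for S_EE
C = rng.standard_normal((q0, nB))      # stand-in constraint matrix
Phi = rng.standard_normal((nB, nq))    # stand-in coarse basis

# Coarse matrix following the structure of (4.54):
# Sc = Phi^T S_EE Phi - [S_EE Phi; 0]^T [S_EE C^T; C 0]^{-1} [S_EE Phi; 0].
K = np.block([[S_EE, C.T], [C, np.zeros((q0, q0))]])
rhs = np.vstack([S_EE @ Phi, np.zeros((q0, nq))])
Sc = Phi.T @ S_EE @ Phi - rhs.T @ np.linalg.solve(K, rhs)

# Sc is symmetric by construction (K is symmetric).
assert Sc.shape == (nq, nq) and np.allclose(Sc, Sc.T)
```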

4. Substituting this.4 FETI-DP and BDDC Methods 259 where each D(l) : Wl → Wl is a diagonal matrix with non-negative diagonal entries. (4. wTc )T = 0 for λ = 0.36.l. Let ind(α. we will obtain that: T T wD SEE SEE Φ wD x Kx = T T . Remark 4. MA19] is a PCG method to solve the primal problem associated with system (4. We express F = (LK −1 )K(K −1 LT ) and for λ ∈ IRm let: ˜ T . MA18.j) ≡ 0 if α ∈ B (l) Let MD be a matrix the same size as MB deﬁned by: (1) (p) MD ≡ D∗ M (1) · · · D∗ M (p) . Then. we may seek uB = RE u for some u ∈ U = IRq .j) if α ∈ B (l) ∩ B (j) (D∗ )ind(α. x = (wTD . since C wD = 0. Diagonal dual weight matrices (l) D∗ : Λ → Λ. We deﬁne the diagonal dual matrix D∗ for all α ∈ B as: (l) (D(j) )ind(α.55) wc Φ SEE Φ SEE Φ wc The latter will be positive provided SEE is positive deﬁnite within W∗ (which we assume to hold) and provided (wTD .44). The inverse F0−1 of the FETI-DP preconditioner for F is: F0−1 ≡ MD SEE MD T =⇒ cond(F0 . see [RI5. DO2. j) denotes the row index in MB which enforces the matching between the local nodal values at α in B (l) and in B (j) . The BDDC method [DO. µ Then. and that ind(α. Recall that each row of MB is associated with a matching requirement between nodal values on two distinct subdomains. Suppose that a node α on B lies on B (l) ∩ B (j) . the primal problem associated with (4. Matrix F can be veriﬁed to be positive deﬁnite as follows. F ) ≤ c (1 + log2 (h0 /h)). FR]. it can be shown that MB MD T MB = MB and MD T MB + RE RTE D = I.56) The BDDC preconditioner S0 is formulated using the same coarse space and local saddle point problems employed in the FETI-DP method. Such weight matrices are employed in the BDDC method to average the solution on diﬀerent subdomain boundaries. are deﬁned based on the entries of the matrices D(j) as follows. Since MB uB = 0. KL10. each of size m. Matrix S0−1 S . (4. wTc )T = K −1 LT λ. BDDC Method. 
j) denote the index of the node α in the local (l) ordering in B (j) .44) can easily be veriﬁed to be the Schur complement system arising in traditional substructuring: Su=f where S ≡ (RTE SEE RE ) and f = (RTE ˜f B ). l.
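The defining property R_E^T D R_E = I of a discrete partition of unity can be checked on a tiny hypothetical interface using the simplest choice of weights, the inverse node multiplicity (an illustration, not from the text; the stiffness-weighted variant used in practice follows the same pattern):

```python
import numpy as np

# Two hypothetical subdomain boundaries over q = 5 interface nodes:
# B^(1) = {0,1,2}, B^(2) = {2,3,4}; node 2 is shared by both.
q = 5
local_nodes = [[0, 1, 2], [2, 3, 4]]

def restriction(idx, q):
    R = np.zeros((len(idx), q))
    for row, col in enumerate(idx):
        R[row, col] = 1.0
    return R

R = [restriction(idx, q) for idx in local_nodes]

# Multiplicity-based weights: each diagonal entry of D^(l) is one over the
# number of subdomain boundaries containing that node.
mult = np.zeros(q)
for idx in local_nodes:
    mult[idx] += 1.0
D = [np.diag(1.0 / mult[idx]) for idx in local_nodes]

# Partition of unity: sum_l R_l^T D^(l) R_l = I, i.e. R_E^T D R_E = I.
PU = sum(Rl.T @ Dl @ Rl for Rl, Dl in zip(R, D))
assert np.allclose(PU, np.eye(q))
```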

The matrix S_0^-1 S in the BDDC method has essentially the same spectrum as the preconditioned matrix F_0^-1 F in the FETI-DP method. The BDDC preconditioner employs a discrete partition of unity matrix on B, with D = blockdiag(D^(1), ..., D^(p)) satisfying:

   R_E^T D R_E = I,

where each D^(l) : W_l → W_l is a diagonal matrix with non-negative entries. In practice, the diagonal entries of D^(l) are chosen as a weighted average of the diagonal entries of the local stiffness matrices A^(j). Let i denote the index of a node on B^(l) and j(i) = ind(B^(j), i) the local index of node i in B^(j). Then:

   (D^(l))_ii = A_ii^(l) / ( sum_{j : B^(j) ∩ B^(l) ≠ ∅} A_{j(i) j(i)}^(j) ).

The BDDC preconditioner also employs a coarse basis Ψ of size n_B × q_0, obtained by modifying the matrix Φ of size n_B × q_0 which satisfies C Φ = R^c:

   [ S_EE   C^T ] [ Ψ ]   [ 0   ]
   [ C      0   ] [ G ] = [ R^c ]

If we expand Ψ = Φ + Φ̂, then since C Φ = R^c, matrix Φ̂ will satisfy:

   [ S_EE   C^T ] [ Φ̂ ]   [ -S_EE Φ ]
   [ C      0   ] [ G ] =  [ 0       ]

By Remark 4.33, it follows that Φ̂ = -P_{W_D} Φ and Ψ = (I - P_{W_D}) Φ. Solving for Φ̂ and computing Ψ^T S_EE Ψ after algebraic simplification yields:

   Ψ^T S_EE Ψ = Φ^T S_EE Φ - [ S_EE Φ ]^T [ S_EE   C^T ]^-1 [ S_EE Φ ] = S_c.
                             [ 0      ]   [ C      0   ]    [ 0      ]

Thus, the coarse matrix S_c can be assembled using either expression above. The spaces Range(Ψ) and W consist of nodal vectors associated with finite element functions which are discontinuous across the subdomain boundaries. However, the weighted averaging using R_i^T D^(i) or R_E^T D^T yields nodal vectors in U, associated with continuous finite element functions on B. The BDDC preconditioner S_0 for S corresponds to an additive Schwarz preconditioner with inexact solvers, based on the following subspaces of U:

   U_0 = Range(R_E^T D^T Ψ)
   U_i = { R_i^T D^(i) w_i : C_i w_i = 0, w_i ∈ W_i },   for 1 ≤ i ≤ p.

The action S_0^-1 of the inverse of the BDDC preconditioner for S is defined as:

   S_0^-1 r ≡ R_E^T D Ψ S_c^-1 Ψ^T D^T R_E r + [ D^T R_E ]^T [ S_EE   C^T ]^-1 [ D^T R_E ] r.
                                               [ 0       ]   [ C      0   ]    [ 0       ]

The block diagonal structure of S_EE and C yields its parallel form:

   S_0^-1 = R_E^T D Ψ S_c^-1 Ψ^T D^T R_E + sum_{i=1}^p [ D^(i)T R_i ]^T [ S^(i)   C_i^T ]^-1 [ D^(i)T R_i ]
                                                        [ 0         ]   [ C_i     0     ]    [ 0          ]

The following bounds will hold for the FETI-DP and BDDC methods.

Lemma 4.38. The following convergence bounds will hold:

   λ_max(S_0^-1 S) / λ_min(S_0^-1 S) = λ_max(F_0^-1 F) / λ_min(F_0^-1 F) ≤ κ,

and

   κ ≤ sup_{w ∈ W_∗} ‖M_D^T M_B w‖_{S_EE}^2 / ‖w‖_{S_EE}^2 = sup_{w ∈ W_∗} ‖R_E R_E^T D w‖_{S_EE}^2 / ‖w‖_{S_EE}^2,

where κ ≤ c (1 + log^2(h_0/h)) and h_0 is the diameter of the subdomains.

Proof. See [DO, DO2, MA18, MA19].

Remark 4.39. The columns of matrix Ψ of size n_B × q_0 can be constructed as follows. If e_j denotes the j'th column of the identity matrix I of size q_0, then the j'th column ψ_j of Ψ can be computed by solving:

   [ S_EE   C^T ] [ ψ_j ]   [ 0       ]
   [ C      0   ] [ µ_j ] = [ R^c e_j ]

The components of ψ_j will be non-zero only on the boundaries B^(l) which intersect the glob associated with the j'th column of R^c. Thus, in applications, only a few local problems need to be solved, using the block structure of S_EE and C. The non-zero entries of the sparse matrix S_c of size q_0 can be computed as (S_c)_ij = ψ_i^T S_EE ψ_j, based on the supports of ψ_i and ψ_j.

On each B^(i), each local saddle point problem:

   [ S^(i)   C_i^T ] [ w_i ]   [ f_i ]
   [ C_i     0     ] [ µ_i ] = [ g_i ]        (4.57)

can be solved using the Schur complement method. The entries of w_i corresponding to the cross points on B^(i) can be eliminated; the resulting submatrix of S^(i) will be non-singular (even if S^(i) were singular). The entries of the Lagrange multiplier variables enforcing the constraints on the cross points can also be eliminated. Thus, the specified rows of S^(i) and C_i^T, and the associated columns of S^(i) and C_i, must be eliminated. For notational convenience, we shall denote the resulting saddle point system as in the above.

To solve for the remaining entries of w_i and µ_i, parameterize w_i in terms of µ_i using the first block row. This formally yields:

   w_i = S^(i)-1 (f_i - C_i^T µ_i).

Substituting this expression into the second block row yields:

   T_i µ_i = C_i S^(i)-1 f_i - g_i,   where   T_i ≡ (C_i S^(i)-1 C_i^T).

The Schur complement T_i of size q_0^(i) can be assembled explicitly; q_0^(i) will be at most eight for rectangular subdomains when Ω ⊂ IR^2, or at most twenty six when Ω ⊂ IR^3. Once µ_i is determined, w_i = S^(i)-1 (f_i - C_i^T µ_i). Note that matrix S^(i) = (A_BB^(i) - A_IB^(i)T A_II^(i)-1 A_IB^(i)) need not be assembled. Instead, the solution of each system with S^(i) can be obtained by solving the sparse system:

   [ A_II^(i)    A_IB^(i) ] [ y_i ]   [ 0               ]
   [ A_IB^(i)T   A_BB^(i) ] [ w_i ] = [ f_i - C_i^T µ_i ]

Thus, the solution to (4.57) can be sought by solving two sparse symmetric positive definite systems and one dense symmetric positive definite system of small size. See [DO, DO2, MA18, MA19] for alternative methods.
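The Schur complement solution of one local saddle point problem can be sketched as follows (an illustration, not from the text; dense solves with a random SPD stand-in for S^(i) replace the sparse solves with A^(i) used in practice):

```python
import numpy as np

rng = np.random.default_rng(5)
n, q0 = 6, 2

X = rng.standard_normal((n, n))
S = X @ X.T + n * np.eye(n)            # stand-in for the local S^(i), SPD
C = rng.standard_normal((q0, n))       # stand-in for C_i (full row rank)
f = rng.standard_normal(n)
g = np.zeros(q0)

# Parameterize w = S^{-1}(f - C^T mu); the second block row then gives the
# small dense SPD system T mu = C S^{-1} f - g, with T = C S^{-1} C^T.
SinvCT = np.linalg.solve(S, C.T)       # q0 solves with S (sparse in practice)
T = C @ SinvCT                         # dense, at most 8 (2D) or 26 (3D) rows
mu = np.linalg.solve(T, C @ np.linalg.solve(S, f) - g)
w = np.linalg.solve(S, f - C.T @ mu)   # back-substitution

# (w, mu) solves the full saddle point system.
assert np.allclose(S @ w + C.T @ mu, f)
assert np.allclose(C @ w, g)
```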

5 Computational Issues and Parallelization

In this chapter, we discuss several computational issues that arise in the implementation of domain decomposition algorithms. The first issue concerns the choice of a decomposition of a domain into non-overlapping or overlapping subdomains, possibly with subdomains of irregular shapes. When an algorithm is implemented using multiple processors, to ensure load balancing, the number of interior unknowns per subdomain must be approximately the same, while the number of boundary unknowns must be minimized to reduce inter-subdomain communication. This issue is typically addressed by employing heuristic graph partitioning algorithms. The second issue concerns the expected parallel computation time and speed up when implementing a domain decomposition preconditioner on an idealized parallel computer architecture. Employing a heuristic model of an idealized parallel computer with distributed memory, we describe models for the computational time required for implementing various domain decomposition preconditioners, using idealized estimates of the computational time and inter-processor data transfer times. Under such idealized assumptions, it is shown that domain decomposition iterative algorithms have reasonable scalability. Additionally, we briefly discuss the implementation of Schwarz and Schur complement algorithms on unstructured grids, and some heuristic coarse spaces are outlined for use on unstructured grids.

Chap. 5.1 presents background on grid generation and graph theory, and describes how the problem of partitioning a domain or an unstructured grid can be heuristically reduced to a graph partitioning problem. We describe graph partitioning algorithms which partition a grid so that the number of grid points per subdomain is approximately the same, and so that the communication time between processors assigned to different subdomains is minimized. Chap. 5.1.1 describes grid generation algorithms, followed by graph partitioning algorithms in Chap. 5.1.2, where we describe the Kernighan-Lin, recursive spectral bisection and multilevel graph partitioning algorithms for partitioning graphs. Chap. 5.1.3 describes the construction of subdomains, while a few heuristic coarse spaces are described for unstructured grids in Chap. 5.1.4. Comments on Schwarz, Schur complement and FETI algorithms are presented in Chap. 5.1.5. Chap. 5.2 discusses background on the speed up and scalability of algorithms on parallel computers.
5.1 Algorithms for Automated Partitioning of Domains

The term unstructured grid refers broadly to triangulations without any identifiable structure, lacking the connectivity of uniform grids and the hierarchical structure of multigrids; see Fig. 5.1. Such grids arise in computational fluid dynamics [PE4] and aerodynamics computations [ST6, MA40, MA41], which require the triangulation and discretization of partial differential equations on regions having complex geometry. In such applications, the triangulation is generated by using grid generation software [GE6, HO2, HE9, TH3, BA22, MA40, MA41, OW]. The resulting grids are typically not quasiuniform, and the density of grid points and the number of elements incident to each node can vary significantly with location. As a result, algorithms are required to automate the partitioning of a domain into subdomains. In this section, we discuss the selection of subdomains so that load balancing constraints are satisfied. We also discuss the formulation of heuristic coarse spaces for elliptic equations discretized on unstructured grids with subdomains having irregular boundaries, where traditional coarse spaces are not defined.

Fig. 5.1. An unstructured grid [BA23]

5.1.1 Grid Generation Algorithms

Generating a triangulation T_h(Ω) on a domain Ω in two or three dimensions with complex geometry is generally a computationally intensive task. There is an extensive literature on algorithms and software for automated generation of grids [GE6, HO2, HE9, TH3, MA40, MA41, OW].

1 The author thanks Dr. Timothy Barth for his kind permission to use Fig. 5.1.

Readers are referred to [GE6, HE9, TH3] for literature on unstructured meshes and to [OW] for a survey of software algorithms. Below, we list a few grid generation methods.

• Delaunay triangulation method. One of the earliest methods. In the first phase, nodes are placed on the boundary ∂Ω of the domain (for instance, by the decomposition and mapping method), and new nodes are introduced within the interior, depending on the geometry and specifications for the grid size. In the second phase, a Delaunay triangulation T_h(Ω) of Ω is constructed using the given distribution of nodes. A Delaunay triangulation is a simplicial triangulation (triangles in IR^2 or tetrahedra in IR^3) such that any circumsphere (i.e., a sphere in IR^3 or a circle in IR^2 passing through the nodes of a tetrahedron or triangle) does not contain other nodes in its interior. Many Delaunay triangulation algorithms are available, some based on the computation of Voronoi cells (polyhedral cells consisting of all points in Euclidean space closest to a node).

• Advancing front method. In this method, the boundary ∂Ω of the domain is first triangulated (for instance, by the decomposition and mapping method), yielding an initial front of the triangulation. The algorithm then advances (updates) these fronts by generating new nodes and elements of a desired size within the interior of the domain, adjacent to the current front. The algorithm terminates when the entire domain is triangulated.

• Grid based method. In this method, a uniform or structured simplicial or box type grid T_h(Ω*) with a specified grid size h is overlaid on an extended domain Ω* ⊃ Ω, and the triangulation T_h(Ω*) of Ω* is modified to conform to the boundary ∂Ω. The resulting triangulation of Ω will be of low cost; however, it can be of poor quality for numerical approximation.

• Decomposition and mapping method. Here the domain is decomposed into subregions, and each subregion is mapped onto one or more standard reference regions. A structured triangulation of each reference domain is then mapped back to triangulate the original subdomains. However, the subdomain triangulations may not match near their boundaries, so that the triangulations must be appropriately modified.

Automatic mesh generation software may combine one or more of the above methods and include a phase of refinement or smoothing of the resulting grid; the generated grid may not be quasiuniform or structured.

5.1.2 Graph Partitioning Algorithms

The problem of decomposing a domain Ω into subdomains, or partitioning an index set of nodes I = {x_1, ..., x_n} into subindex sets, can be formulated mathematically as a graph partitioning problem. Given a triangulation T_h(Ω) of Ω, a graph [BO2] (or a weighted graph) can be constructed representing the connectivity of the triangulation (either connectivity of elements or of the nodes within the triangulation).
a sphere in IR3 or a circle in IR2 passing though the nodes of a tetrahedra or triangle) do not contain other nodes in the interior.1 Algorithms for Automated Partitioning of Domains 265 an extensive literature on algorithms and software for automated generation of grids [GE6.2 Graph Partitioning Algorithms The problem of decomposing a domain Ω into subdomains. 5. The resulting triangulation of Ω will be of low cost. HO2. In this method. Below. The algorithm terminates when the entire domain is triangulated. some based on the computation of Voronoi cells (polyhedral cells consisting of all points in Euclidean space closest to a node). In the second phase.. a Delaunay triangulation Th (Ω) of Ω is constructed using the given distribution of nodes. MA41. • Delaunay triangulation method. MA40. a uniform or structured simplicial or box type grid Th (Ω ∗ ) with a speciﬁed grid size h is overlaid on an extended domain Ω ∗ ⊃ Ω. the subdomain triangulations may not match near their boundaries. by the decomposition and mapping method). and adjacent to the current front. using the advancing front method (or alternative methods). One of the earliest methods. MA41. depending on the geometry and speciﬁcations for the grid size.

vn } . By default. refers to the number m of edges. A graph G = (V. otherwise. A partition of the domain.1. The number of edges incident to a vertex vi is referred to as the degree of the vertex and will be denoted as d(vi ). Weights may also be assigned to individual vertices vi ∈ V and denoted by w(vi ). In various applications. we introduce the graph partitioning problem. MA30]. vj ) ∈ E (MG )ij = 0. this problem can be formulated as a com- binatorial minimization of an objective functional incorporating the above re- quirements. Thus. E) with weights wij assigned to each edge (vi . while the size of the graph. Formally. if edge ej is incident with vertex vl (NG )lj = 0. . The order of the graph. If the edges in E are enumerated as e1 . E) consists of a collection V of n vertices V = {v1 . vj ) = (vj . vj ) ∈ E. . denoted by |V |. then the vertices incident to each edge may be summarized in an n × m incidence matrix NG : 1. em . it will be useful to assign weights to edges and vertices in a graph. E) by deﬁning wij = wji = 1 if (MG )ij = 1 and w(vi ) = 1. . vi ) ∈ E. A weighted graph is a graph G = (V. Deﬁnition 5. may then be obtained by partitioning this associated graph (or weighted graph) into subgraphs. weights can be assigned to any graph G = (V. refers to the number n of vertices. 1. the adjacencies in E may be represented using an n × n symmetric matrix MG referred to as the adjacency matrix. Such graphs are referred to as weighted graphs. Load balancing require- ments can be incorporated by requiring that the subgraphs be approximately of equal size. Here. denoted |E|. · · · . and a collection E of m edges E = {e1 . . and describe three heuristic algorithms for its solution. or of the index set I of nodes in the triangulation. . · · · . Deﬁnition 5. if (vi . vj ) ∈ E. em } . if edge el is incident to vertices vi and vj we denote it as el = (vi . its associated combinatorial minimization problem. 
Such weights can be summarized in an n × n symmetric weight matrix W . while minimization of communication costs can be imposed by requiring that the number of edges cut between subgraphs in the partition is minimized [FA9. if (vi .266 5 Computational Issues and Parallelization the connectivity of the triangulation (either connectivity of elements or of the nodes within the triangulation). Given a graph G of order n. where each edge represents adjacencies between pairs of vertices. The reader is referred to [PO3] for details.2.

.. A graph G = (V. any 2 × 2 block partitioning of matrix P MG P T must yield a nonzero oﬀ diagonal block for any permutation matrix P reordering the rows or columns. E) with a nonnegative weight matrix W . In applications to the partitioning of a triangulation. More details of such associations will be described later. i. In this case vertices vi and vj can be deﬁned to be adjacent if nodes xi and xj belong to the same element. If the graph is unweighted. then the default weights wij = (MG )ij for i = j. as shall be described later. i. xl = vj such that all consecutive vertices are adjacent. is referred to as the Fiedler vector of the graph G. For a connected graph G. Due to symmetry and weak diagonal dominance of LG . it will be preferable to identify the vertices vi of the graph with elements κi of triangulation Ωh . In this case. Vertex vi can be deﬁned to be adjacent to vertex vj if elements κi ∩κj = ∅. then the algebraic multiplicity of the zero eigenvalue of LG will yield its number of connected components. and in this case the diagonal entries (LG )ii = d(vi ) will correspond to the degrees of the vertices. should be used. Consequently LG will be singular with eigenvector x1 = (1. (xr .e. By deﬁ- nition. vj ) ∈ E (5. given by the adjacency matrix. • In applications to Schur complement algorithms. a graph G will be connected if and only if its adjacency matrix MG is irreducible. its eigenvalues {λi } will be nonnegative. vj ∈ V there exists a “path” vi = x1 . 5.4. In matrix terms.. l − 1. vj ) ∈ E and i = j. the algebraic multiplicity of the zero eigenvalue of LG will be one and λ2 > 0. Deﬁnition 5. Deﬁnition 5. We assume that these eigenvalues are ordered as: 0 = λ1 ≤ λ 2 ≤ · · · ≤ λ n . if (vi .1) ⎪ ⎩ 0. · · · . • In applications to Schwarz algorithms. If a graph G is not connected. two alternative graphs G = (V. xr+1 ) ∈ E for r = 1. This can be applied recursively. · · · . 
The Fiedler vector of a connected graph can be employed to partition a graph into two. the eigenvector x2 of LG corresponding to eigenvalue λ2 > 0: LG x2 = λ2 x2 .e. we deﬁne an n × n graph Laplacian matrix LG as follows: ⎧ ⎪ ⎨ l=i wil . 1)T corresponding to eigenvalue λ1 = 0.1 Algorithms for Automated Partitioning of Domains 267 Associated with a graph G = (V.3. · · · . LG will be symmetric and weakly diagonally dominant with zero row sums. E) is said to be connected if for any two vertices vi . if j = i (LG )ij ≡ −wij . if (vi . x2 . E) may be associated with a given triangulation Ωh . let the vertices vi in the graph correspond to nodes xi of Ωh .
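The graph Laplacian (5.1) can be assembled directly from a weighted edge list. A minimal Python sketch (the helper name `laplacian` and the dense list-of-lists layout are ours, for illustration only, not an implementation from the text):

```python
# Build the graph Laplacian L_G of a small weighted graph, following (5.1):
# (L_G)_ii = sum_{l != i} w_il, and (L_G)_ij = -w_ij on edges, 0 elsewhere.

def laplacian(n, weighted_edges):
    """weighted_edges: list of (i, j, w_ij) with 0-based vertex indices."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        L[i][j] -= w          # off-diagonal entries: -w_ij
        L[j][i] -= w
        L[i][i] += w          # diagonal accumulates the incident edge weights
        L[j][j] += w
    return L

# A path graph on 4 vertices with unit weights (the default w_ij = (M_G)_ij).
L = laplacian(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)])

# Each row of L_G sums to zero, so the constant vector lies in its null space.
assert all(abs(sum(row)) < 1e-12 for row in L)
# For unit weights the diagonal entries equal the vertex degrees d(v_i).
assert [L[i][i] for i in range(4)] == [1.0, 2.0, 2.0, 1.0]
```

The two assertions check exactly the properties used above: zero row sums (hence the eigenvector of ones with λ1 = 0) and diagonal entries equal to vertex degrees in the unweighted case.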

Once a graph G = (V, E) has been associated with a domain Ω or with the nodes in Ωh, a partition of the domain or of its nodes can be obtained by partitioning the vertices of the graph G = (V, E) into p subsets V1, . . . , Vp of order n1, . . . , np, respectively, so that:

   V1 ∪ · · · ∪ Vp = V,
   Vi ∩ Vj = ∅,  if i ≠ j.     (5.2)

The induced subgraph on the vertices Vi (i.e., the adjacencies from E between vertices in Vi) will be required to be connected. The load balancing constraint can be heuristically approximated by requiring that the number ni of nodes within each subset Vi be approximately the same, as stated formally in the following.

Definition 5.5. Given a graph G = (V, E) and a parameter ε > 0 chosen by the user, we define Kε as an admissible partition of V into p sets V1, . . . , Vp of size n1, . . . , np, respectively, if the following hold:

1. The induced subgraphs Gi = (Vi, Ei) are connected, for i = 1, . . . , p, where each Ei denotes the adjacencies from E between vertices in Vi.
2. If ni = |Vi| for i = 1, . . . , p, then:

      (n/p)(1 − ε) ≤ ni ≤ (n/p)(1 + ε),  for i = 1, . . . , p.

Remark 5.6. In some applications, it may be convenient to let each vertex in the graph represent more than one nodal unknown in the original triangulation Ωh. In such cases, a weight w(vi) can be assigned to each vertex to denote the number of nodes that vertex vi represents. Accordingly, the number ni of nodes which subset Vi represents should be computed as:

   ni = |Vi| = Σ_{vl ∈ Vi} w(vl).     (5.3)

This will reduce to the number of vertices in Vi if w(vl) = 1.

If one processor is assigned to each subdomain defined by Vi, then the volume of communication between the different processors can be heuristically estimated in terms of the total number of edges between the vertices in different sets Vi of the partition. If weighted edges are used, this quantity may be replaced by the sum of the edge weights on edges between different sets Vi. The requirement that the communication between different subdomains be minimized may thus be approximated by minimizing the sum of such edge weights between distinct subsets Vi. Accordingly, we may define an objective functional δ(·) which represents the sum of edge weights between distinct subsets Vi in the partition.

Definition 5.7. Given a graph G = (V, E) with weight matrix W and two disjoint vertex subsets Vi and Vj of V, we denote by δ(Vi, Vj) the total sum of the edge weights between all pairs of vertices in Vi and Vj:

   δ(Vi, Vj) ≡ Σ_{vr ∈ Vi, vs ∈ Vj} wrs.     (5.4)

Given three or more disjoint vertex subsets of V, we define δ(V1, . . . , Vp) as the sum of the edge weights between each distinct pair of subsets Vi and Vj:

   δ(V1, . . . , Vp) ≡ Σ_{i=1}^{p−1} Σ_{j=i+1}^{p} δ(Vi, Vj).     (5.5)

If W is chosen by default with wij = (MG)ij, then δ(V1, . . . , Vp) will correspond to the total number of edges between all distinct pairs of subsets Vi and Vj of the partition of V. Then δ(V1, . . . , Vp) will represent the volume of communication between the subsets in the partition.

The problem of partitioning a graph G so that the load balancing constraint holds and so that the communication costs between subdomains are minimized may formally be approximated by the following combinatorial minimization problem. Find a Kε partition V1, . . . , Vp satisfying:

   δ(V1, . . . , Vp) = min_{(V˜1, . . . , V˜p) ∈ Kε} δ(V˜1, . . . , V˜p).     (5.6)

Unfortunately, as with most combinatorial optimization problems, this is an NP hard discrete problem, and no algorithm of polynomial complexity is known for determining the exact solution. We therefore restrict consideration to heuristic algorithms which approximate the solution to the above. The following three algorithms will be outlined in the following: the Kernighan-Lin algorithm, the recursive spectral bisection algorithm and the multilevel graph partitioning algorithm. The latter algorithm generally has the lowest complexity amongst the three.

Kernighan-Lin Algorithm. This algorithm [KE4] corresponds to a discrete descent method for the combinatorial minimization problem (5.6):

1. Start with any initial partition V˜1, . . . , V˜p in Kε.
2. Repeatedly exchange pairs of vertices vi and vj for which the resulting partition is still within Kε and for which a reduction in the functional δ(·) is obtained.

If the vertex weights w(vi) are unitary, then such an exchange will leave n1, . . . , np unchanged; if nonunitary vertex weights are employed, however, this constraint must be checked. To avoid stagnation at a local minimum, the Kernighan-Lin algorithm permits a fixed number q∗ of exchanges within Kε which increase the value of δ(·). The algorithm must ideally be implemented for several selections of initial partitions, and the partition corresponding to the lowest value of δ(·) must be stored. Once a prescribed number of iterations have been completed, this optimal stored partition can be chosen as an approximate solution of (5.6).

To implement a Kernighan-Lin sweep, for any subset V˜i ⊂ V and vertex vr define dV˜i(vr) as the sum of the edge weights wrs between vertex vr and the vertices vs in V˜i:

   dV˜i(vr) ≡ Σ_{(vr, vs) ∈ E : vs ∈ V˜i} wrs.

Define the gain associated with exchanging vr ∈ V˜i and vs ∈ V˜j as follows:

   gain(vr, vs) = dV˜j(vr) − dV˜i(vr) + dV˜i(vs) − dV˜j(vs) − 2 wrs,  if (vr, vs) ∈ E,
   gain(vr, vs) = dV˜j(vr) − dV˜i(vr) + dV˜i(vs) − dV˜j(vs),          if (vr, vs) ∉ E.

If the gain is nonnegative, then the exchange should be accepted. At most q∗ exchanges resulting in a negative gain should be accepted. The complexity of the Kernighan-Lin algorithm is O(n² log(n)) if a fixed number of iterations is implemented, while an O(|E|) complexity algorithm is known for p = 2; see [PO3].

Remark 5.8. The number of exchanges of vertices per iteration can be reduced significantly if only boundary vertices are exchanged, i.e., vertices which are adjacent to vertices in other sets of the partition.

Recursive Spectral Bisection Algorithm. The recursive spectral bisection algorithm is a popular graph partitioning algorithm which repeatedly partitions a graph into two subgraphs [SI2, PO2, BA21, FI, FI2, BA20, BO3]. Each graph (or subgraph) is partitioned based on sorting the entries of the Fiedler vector of the graph (or subgraph). By construction, the algorithm is ideally suited for p ≈ 2^J, for integer J ≥ 1. The partitions obtained by recursive spectral bisection are typically of very good quality as measured by δ(·), but are relatively expensive to compute due to the computation of the Fiedler vector.

We motivate the spectral bisection algorithm by considering the partition of a graph G = (V, E) with weight matrix W into two subgraphs so that (5.6) is minimized. For simplicity, we suppose that |V| is an even integer and that all vertex weights w(vi) are unitary. In this case we seek |V1| = |V2|, and the partition is referred to as a bisection. Suppose V1, V2 is a solution to the graph bisection problem, and define a vector q as:

   (q)i ≡ +1,  if vi ∈ V1,
   (q)i ≡ −1,  if vi ∈ V2.

Let LG denote the weighted graph Laplacian matrix (5.1). Then we obtain:

   q^T LG q = Σ_{(vi, vj) ∈ E} wij (qi − qj)²
            = 4 Σ_{vi ∈ V1, vj ∈ V2} wij
            = 4 δ(V1, V2).
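The identity q^T LG q = 4 δ(V1, V2) can be checked numerically on a small example. A Python sketch (the helper names `cut` and `quadratic_form` are ours; the quadratic form is evaluated edgewise, which is equivalent to forming LG explicitly):

```python
def cut(edges, V1):
    # delta(V1, V2): total weight of the edges crossing the bisection (5.4)
    return sum(w for i, j, w in edges if (i in V1) != (j in V1))

def quadratic_form(edges, q):
    # q^T L_G q written edgewise: sum over edges of w_ij (q_i - q_j)^2
    return sum(w * (q[i] - q[j]) ** 2 for i, j, w in edges)

# A weighted 4-cycle, bisected into V1 = {0, 1} and V2 = {2, 3}.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (0, 3, 1.0)]
V1 = {0, 1}
q = [1 if v in V1 else -1 for v in range(4)]

# Crossing edges are (1,2) and (0,3), so delta(V1, V2) = 3.0 and q^T L q = 12.0.
assert quadratic_form(edges, q) == 4 * cut(edges, V1)
```

Only crossing edges contribute, since (qi − qj)² = 4 across the cut and 0 within either set, which is exactly the derivation above.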

Additionally, q^T 1 = Σj qj = 0. Hence the minimization of δ(V1, V2) over all admissible partitions will be equivalent to the minimization of q^T LG q for q ∈ Q, where:

   Q ≡ { q˜ = (q˜1, . . . , q˜n)^T : q˜i = ±1, 1 ≤ i ≤ n, q˜^T 1 = 0 }.

We may thus state the bisection problem as determining q ∈ Q such that:

   q^T LG q = min_{q˜ ∈ Q} q˜^T LG q˜.     (5.7)

This is called a quadratic assignment problem [PO3], and it is a discrete (combinatorial) optimization problem which may be heuristically approximated by a quadratic minimization problem over IR^n (with appropriate constraints), as indicated next. Define Q∗ ≡ {x ∈ IR^n : x^T x = n, x^T 1 = 0} ⊃ Q. We may approximate the discrete minimum of q^T LG q in Q as follows. We obtain:

   min_{q ∈ Q} q^T LG q ≥ min_{x ∈ Q∗} x^T LG x = x2^T LG x2 = λ2 x2^T x2 = λ2 n,

where x2 is a Fiedler vector (i.e., an eigenvector of LG corresponding to the eigenvalue λ2 > 0) scaled so that its Euclidean norm is √n. This suggests the following heuristic bisection algorithm:

• Compute the Fiedler vector x2 (having norm √n) associated with LG:

   LG x2 = λ2 x2.

• Since the components (x2)i may not be in {+1, −1}, sort the entries of x2 in increasing order and let α1/2 denote a median value of the entries of x2.

• If (x2)i > α1/2 define qi = +1, and if (x2)i < α1/2 define qi = −1. If (x2)i = α1/2 define qi = ±1, so that (n/2) components have +1 entries.

The above algorithm is easily generalized when |V| is not even and when the vertex weights are not unitary. Indeed, for any choice of nonnegative integers n1 and n2 satisfying n1 + n2 = n, we may extend the above partitioning by defining V1 as the vertices corresponding to the first n1 components of the Fiedler vector after sorting (taking into account nonunitary weights of vertices). The following theoretical result will hold.

Lemma 5.9. Suppose the following assumptions hold:
1. Let G be a connected graph.
2. Let x2 denote the Fiedler vector of LG.
3. For α ≥ 0 and β ≤ 0 define:

      Iα ≡ {i : (x2)i ≤ α},   Jβ ≡ {i : (x2)i ≥ −β}.

Then the following results will hold:
1. The induced graph associated with V1 = {vi : i ∈ Iα} is connected.
2. The induced graph associated with V2 = {vi : i ∈ Jβ} is connected.
3. For any p ∈ Q:  ‖x2 − q‖2 ≤ ‖x2 − p‖2.

Proof. For results 1 and 2 see [FI]. For result 3 see [PO3, MC2].

The recursive spectral bisection algorithm partitions a graph G = (V, E) by repeatedly applying the spectral bisection algorithm to each of the subgraphs obtained from the previous applications of the spectral bisection algorithm. We employ the notation Gi^(k) = (Vi^(k), Ei^(k)) to denote the i'th subgraph at stage k. We summarize the algorithm below.

Algorithm 5.1.1 (Recursive Spectral Bisection Algorithm)
Let p ≈ 2^J denote the number of sets in the partition.
Define G1^(1) = (V1^(1), E1^(1)) ≡ G = (V, E).

1. For k = 1, . . . , J − 1 do:
2.   Spectrally bisect each subgraph Gi^(k) at level k into two:

        Gi^(k) → { G_{I1(i)}^(k+1), G_{I2(i)}^(k+1) },  for 1 ≤ i ≤ 2^(k−1).

3.   Reindex the subgraphs Gi^(k+1) so that the indices satisfy 1 ≤ i ≤ 2^k.
4. Endfor

Here I1(i) and I2(i) denote temporary indices for the partitioned graphs, before reindexing. As mentioned earlier, the Fiedler vector x2 (or an approximation of it) may be computed approximately by the Lanczos algorithm [GO4]. In practice, the quality of spectral partitions is very good, though more expensive to compute.

Multilevel Graph Partitioning Algorithm. The multilevel graph partitioning algorithm [SI2, VA3, HA2, KA3, HE7, KU] is motivated by graph compaction algorithms and by multigrid methodology [BR22, CI8, CI7, CI6]. Given a graph G = (V, E) with weight matrix W, this graph partitioning algorithm constructs a hierarchy of smaller order or "coarser" graphs G(l) = (V(l), E(l)) with weight matrices W(l), by repeated merging (agglomeration) of pairs of vertices within each parent graph. Each graph in the hierarchy is constructed to have approximately half the number of vertices as its parent graph. Once a weighted coarse graph of sufficiently small order has been constructed, a standard graph partitioning algorithm (such as recursive spectral bisection) is applied to partition the coarsest weighted graph by minimizing a suitably defined objective functional equivalent to (5.6). The partitioned subgraphs of the coarse graph are then "projected" onto the next finer level in the hierarchy by unmerging (deagglomeration) of the merged vertices. These projected partitions are improved at the finer level by applying a Kernighan-Lin type algorithm. This procedure is recursively applied till a partitioning of the original graph is obtained. Since the bulk of the computations are implemented on the coarsest graph, the computational cost is significantly reduced. We describe additional details.

Each graph in the multilevel hierarchy will be indexed as l = 0, . . . , J. In contrast with traditional multilevel notation, index l = 0 will denote the original and largest order graph in the hierarchy, while index l = J will denote the coarsest and smallest order graph in the hierarchy. For 0 ≤ l ≤ J, the graphs in the hierarchy will be denoted as G(l) = (V(l), E(l)), with weight matrices W(l) of size nl. The initial graph will be the original weighted graph G(0) ≡ G with (V(0), E(0)) ≡ (V, E), W(0) ≡ W and n0 ≡ n. If the original graph G = (V, E) is not weighted, then the default weight matrix W is employed, with unitary weights w(vi) = 1 assigned to the original vertices vi in V.

Given a parent graph G(l−1) = (V(l−1), E(l−1)) with weight matrix W(l−1), this algorithm defines a coarser (smaller order) graph G(l) = (V(l), E(l)) by merging (agglomerating) pairs of vertices within V(l−1) by a procedure referred to as maximal matching.

Definition 5.10. Given a graph G(l) = (V(l), E(l)), a matching is any subset of edges from E(l) such that no more than one edge is incident to each vertex. A maximal matching is a matching in which no additional edge can be added without violating the matching condition.

A maximal matching can be constructed in graph G(l−1) as follows. Select one vertex randomly, say vr^(l−1), from the graph and determine an unmatched vertex adjacent to it (if it exists) with maximal edge weight, i.e., match vr^(l−1) with vs^(l−1) if wrs^(l−1) is largest amongst all the unmatched vertices vs^(l−1). If no adjacent unmatched vertex is found for vr^(l−1), then it is left as a singleton and matched with itself. This procedure is repeated till there are no remaining unmatched vertices.

We shall denote by I1(i, l) and I2(i, l) the indices of the two parent vertices at level (l − 1) which are matched and merged to yield vertex vi^(l) at level l. Since a vertex vi^(l) at level l is the agglomeration of vertices v_{I1(i,l)}^(l−1) and v_{I2(i,l)}^(l−1) from V(l−1), we express this as:

   vi^(l) = { v_{I1(i,l)}^(l−1) } ∪ { v_{I2(i,l)}^(l−1) }.

If a vertex is matched with itself (i.e., is a singleton), then I1(i, l) = I2(i, l). Consequently, vertices in V(l) represent subsets of vertices from the original graph V(0) = V. The new vertices vi^(l) in V(l) are assigned weights as follows:

   w(l)(vi^(l)) = w(l−1)(v_{I1(i,l)}^(l−1)) + w(l−1)(v_{I2(i,l)}^(l−1)),

if vi^(l) is not a singleton; otherwise w(l)(vi^(l)) = w(l−1)(v_{I1(i,l)}^(l−1)).
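The randomized heavy-edge maximal matching described above can be sketched in a few lines of Python (the function name `maximal_matching` and the dictionary-based edge-weight layout are ours; this is an illustration under those assumptions, not an implementation from the text):

```python
import random

def maximal_matching(vertices, wts):
    """Greedy heavy-edge maximal matching: visit unmatched vertices in random
    order; match each with its unmatched neighbor of largest edge weight,
    or with itself (a singleton) if none remains. wts[(i, j)] = w_ij."""
    adj = {v: {} for v in vertices}
    for (i, j), w in wts.items():
        adj[i][j] = w
        adj[j][i] = w
    match, order = {}, list(vertices)
    random.shuffle(order)
    for v in order:
        if v in match:
            continue
        free = [(w, u) for u, w in adj[v].items() if u not in match]
        if free:
            _, u = max(free)          # heaviest unmatched neighbor
            match[v], match[u] = u, v
        else:
            match[v] = v              # singleton: matched with itself
    return match

m = maximal_matching([0, 1, 2, 3], {(0, 1): 3.0, (1, 2): 1.0, (2, 3): 2.0})
assert set(m) == {0, 1, 2, 3}         # maximality: every vertex is matched
assert all(m[m[v]] == v for v in m)   # a valid (involutive) matching
```

Each matched pair then becomes one coarse vertex, whose weight is the sum of the two parent weights, as in the formulas above.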

Vertices vi^(l) and vj^(l) in V(l) will be defined to be adjacent in E(l) if any of the parent vertices of vi^(l) are adjacent to any of the parent vertices of vj^(l) at level (l − 1). A weight wij^(l) will be assigned to adjacent vertices vi^(l) and vj^(l) by summing the weights on all edges between the parent nodes of vi^(l) and vj^(l) at level (l − 1). More specifically, if vi^(l) = v_{I1(i,l)}^(l−1) ∪ v_{I2(i,l)}^(l−1) and vj^(l) = v_{I1(j,l)}^(l−1) ∪ v_{I2(j,l)}^(l−1), we define:

   wij^(l) = Σ_{r ∈ {I1(i,l), I2(i,l)}} Σ_{s ∈ {I1(j,l), I2(j,l)}} wrs^(l−1).

The weight w(l)(vi^(l)) will thus denote the number of vertices of the original graph V(0) represented by vi^(l). Weights and edges are recursively defined by applying the preceding expressions. The first phase of the multilevel graph partitioning algorithm recursively applies maximal matching to compute coarser graphs till a coarse graph G(J) of sufficiently small order is constructed.

Before we describe the second phase of the multilevel graph partitioning algorithm, we discuss how a partition V1^(l), . . . , Vp^(l) of the vertices in V(l) can be "projected" to yield a partition of V(l−1), which will deagglomerate all the vertices vi^(l) ∈ Vk^(l). Formally, we define a projection Pl^(l−1) as:

   Pl^(l−1) Vi^(l) ≡ ∪_{vj^(l) ∈ Vi^(l)} ( v_{I1(j,l)}^(l−1) ∪ v_{I2(j,l)}^(l−1) ).     (5.8)

More generally, given indices 0 ≤ r < l, we define a projection Pl^r recursively:

   Pl^r Vi^(l) ≡ P_{r+1}^r · · · Pl^(l−1) Vi^(l).     (5.9)

Thus, a partition V1^(l), . . . , Vp^(l) of V(l) will yield a partition of V(r) by use of the projections Pl^r V1^(l), . . . , Pl^r Vp^(l).

We next describe how to define an induced objective functional δ(l)(·) which is equivalent to δ(·) for a partition V1^(l), . . . , Vp^(l) at level l. By construction, we obtain expressions similar to (5.4) and (5.5):

   δ(l)(Vi^(l), Vj^(l)) ≡ Σ_{vr^(l) ∈ Vi^(l), vs^(l) ∈ Vj^(l)} wrs^(l),
                                                                              (5.10)
   δ(l)(V1^(l), . . . , Vp^(l)) ≡ Σ_{i=1}^{p−1} Σ_{j=i+1}^{p} δ(l)(Vi^(l), Vj^(l)).

By construction, the preceding objective functionals will satisfy:

   δ(l)(V1^(l), . . . , Vp^(l)) = δ(r)(Pl^r V1^(l), . . . , Pl^r Vp^(l)),     (5.11)

for 0 ≤ r < l, where δ(0)(·) = δ(·). Hence, if a sequence of partitions is constructed on the graphs G(l) such that the value of the objective functional δ(l)(·) monotonically decreases, then the value of δ(0)(·) will also decrease monotonically for the projected partitions at level l = 0.

In the second phase, the coarsest graph G(J) is partitioned using an effective graph partitioning algorithm, such as Kernighan-Lin or recursive spectral bisection, to minimize δ(J)(·). The resulting partition is then projected to the next finer level using P_J^(J−1) and refined by the application of several iterations of the Kernighan-Lin algorithm using δ(J−1)(·). This procedure is recursively applied till a partition is obtained on the finest graph. We now summarize the multilevel graph partitioning algorithm for an input graph G(0) = (V(0), E(0)) and J denoting the number of desired levels in the hierarchy.

Algorithm 5.1.2 (Multilevel Graph Partitioning Algorithm)
Given G(0) ≡ G, a hierarchy of graphs G(1), . . . , G(J) is constructed by maximal matching:

1. For l = 1, . . . , J do:
2.   Construct a coarser graph using maximal matching:
        V(l) ← V(l−1)
        E(l) ← E(l−1)
3.   Define vertex and edge weights:
        w(l)(vi^(l)) = w(l−1)(v_{I1(i,l)}^(l−1)) + w(l−1)(v_{I2(i,l)}^(l−1))
        wij^(l) = Σ_{r ∈ {I1(i,l), I2(i,l)}} Σ_{s ∈ {I1(j,l), I2(j,l)}} wrs^(l−1)
4. Endfor
5. Partition: V(J) → (V1^(J), . . . , Vp^(J))
6. For l = J, . . . , 1 do:
7.   Project: Pl^(l−1) Vi^(l) → Vi^(l−1), for i = 1, . . . , p
8.   Improve the partition employing Kernighan-Lin and δ(l−1)(·)
9. Endfor
Output: V1^(0), . . . , Vp^(0)

Various software implementations of multilevel partitioning algorithms are available; see CHACO [HE8], METIS [KA3] and [KU]. Numerical studies indicate that the quality of multilevel partitions is comparable with that obtained by recursive spectral bisection, as measured by δ(·). For additional discussion, see [SI2, BA20, HE7, KA3, PO2]. The reader is also referred to [PO3].

5.1.3 Construction of Subdomain Decomposition

Graph partitioning can be applied either to partition Ω into nonoverlapping subdomains Ω1, . . . , Ωp, or to partition the index set I of nodes in Th(Ω) into subindex sets I1, . . . , Ip, so that load balancing and minimal communication constraints hold.

To partition the index set I = {x1, . . . , xn} of vertices in Ωh, define a graph G = (V, E) with vertices vi ≡ xi for i = 1, . . . , n, where vi is adjacent to vj in E if the vertices xi and xj belong to the same element κ ∈ Th(Ω). Assign unitary weights w(vi) = 1 to the vertices and unitary weights wij ≡ 1 to the edges (vi, vj) ∈ E. Apply any of the partitioning algorithms to minimize δ(·) within Kε (for a suitable ε > 0) and partition V into V1, . . . , Vp.
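The node graph just described (vertices adjacent iff they share an element) is easy to assemble from an element list. A small Python sketch (the name `node_graph` and the tuple-based mesh layout are ours, standing in for a real mesh data structure):

```python
from itertools import combinations

def node_graph(elements):
    """Edge set of the partitioning graph: nodes x_i, x_j are adjacent iff
    they belong to a common element kappa of the triangulation."""
    edges = set()
    for kappa in elements:
        # every pair of nodes within one element becomes a graph edge
        for i, j in combinations(sorted(kappa), 2):
            edges.add((i, j))
    return edges

# Two triangles sharing the mesh edge (1, 2):
E = node_graph([(0, 1, 2), (1, 2, 3)])
assert E == {(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)}
```

With unit vertex and edge weights on this graph, minimizing δ(·) within Kε then partitions the index set I as described above.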

This yields a partition I1, . . . , Ip of the index set I. To obtain overlap amongst the index sets, for any β > 0 extend each index set Ii as:

   Ii* ≡ { l : dist(xl, xj) ≤ β h0, for j ∈ Ii },     (5.12)

where dist(xl, xj) denotes the Euclidean distance between xl and xj.

Remark 5.11. If nj denotes the number of vertices in Ij and nj* ≥ nj the number of vertices in Ij*, and if Th(Ω) is not quasiuniform, then nj* may vary significantly, violating load balancing requirements.

To partition Ω into nonoverlapping subdomains, let κ1, . . . , κq denote an ordering of the elements in the triangulation of Ωh. Define a graph G = (V, E) with vertices vi ≡ κi for i = 1, . . . , q, where vi is adjacent to vj in E if κi ∩ κj ≠ ∅. We assign unitary vertex weights w(vi) = 1 and unitary edge weights wij = 1 for (vi, vj) ∈ E. We may apply any of the partitioning algorithms to minimize δ(·) within Kε (for ε > 0) and partition V into V1, . . . , Vp. By construction, this will yield a partition of Ω into connected subdomains:

   Ωi ≡ ( ∪_{vl ∈ Vi} κl ),  for 1 ≤ i ≤ p.     (5.13)

Overlap may be included amongst the subdomains by extending each subdomain Ωi to Ωi*, including all elements adjacent within a distance β h0 > 0, where dist(κr, κj) denotes the distance between the centroids of the elements κr and κj. The size of Ωi* and the associated number of nodes may vary significantly if Th(Ω) is not quasiuniform.

5.1.4 Coarse Spaces on Unstructured Grids

Traditional coarse spaces defined on a coarse grid Th0(Ω) will not be applicable on unstructured grids, since Th(Ω) is not obtained by the refinement of Th0(Ω). Instead, alternative coarse spaces may be employed to provide the global transfer of information on such grids [WI6, CH3, CA4, CH17, SA11, SA12, SA13]. We shall let Vh(Ω) denote the finite element space defined on the unstructured grid Th(Ω), and formulate coarse spaces either algebraically, corresponding to a subspace of nodal vectors in IR^n, or in terms of the associated finite element functions. We shall outline the following coarse spaces:

• Coarse space V0,I(Ω) obtained by interpolation of an external space.
• Piecewise constant discrete harmonic finite element space V0,P(Ω).

Coarse Space Based on Interpolation. The finite element coarse space V0,I(Ω) ⊂ Vh(Ω) is defined by interpolating or projecting an external finite dimensional space Vh0(Ω*) of functions with desirable approximation properties onto the finite element space Vh(Ω). If Ω* ⊃ Ω is a polygonal or polyhedral domain covering Ω, and {φ1^(0)(x), . . . , φn0^(0)(x)} denote n0 basis functions in Vh0(Ω*) ⊂ H^1(Ω*) having desirable properties, then the coarse space V0,I(Ω) is defined as the subspace of Vh(Ω) ∩ H0^1(Ω) spanned by the interpolants (or projections) of these basis functions onto the finite element space:

   V0(Ω) ≡ span { Ih φ1^(0)(·), . . . , Ih φn0^(0)(·) } ⊂ Vh(Ω),     (5.14)

where Ih denotes a finite element interpolation or projection map onto Vh(Ω).

A matrix representation of V0,I(Ω) can be obtained using the standard interpolation map Ih as follows. Let I = {x1, . . . , xn} denote an ordering of the interior nodes of Th(Ω). Then an n × n0 extension matrix R0^T is defined:

   R0^T ≡ [ φ1^(0)(x1)  · · ·  φn0^(0)(x1)
              ...                 ...
            φ1^(0)(xn)  · · ·  φn0^(0)(xn) ].     (5.15)

The functions {φi^(0)(·)}_{i=1}^{n0} should ideally be chosen so that the above matrix is of full rank. The restriction matrix R0 will be the transpose of the extension matrix, and A0 ≡ R0 A R0^T.

Example 5.12. Let Ω* ⊃ Ω be triangulated by a quasiuniform grid Th0(Ω*), and let {φ1^(0)(x), . . . , φn0^(0)(x)} denote a finite element nodal basis defined on the triangulation Th0(Ω*). Such basis functions will be in H^1(Ω*). To ensure that each coarse node in Th0(Ω*) corresponds to a true (nonredundant) degree of freedom, it will be assumed that the support of each nodal basis function defined on Th0(Ω*) intersects the interior nodes of Th(Ω). Then the matrices R0, R0^T and A0 ≡ R0 A R0^T can be constructed as in (5.15), where R0 will be sparse. Such a basis was tested in [CA4, CH17, CH3] and shown to yield a quasioptimal convergence rate under appropriate assumptions; see [CA4, CH17, CH3, CA17].

Example 5.13. An alternative coarse space can be constructed by choosing a space Vn0(Ω*) of polynomials on Ω* ⊃ Ω and interpolating it onto the finite element space Vh(Ω). In this case {φ1^(0)(x), . . . , φn0^(0)(x)} denotes a monomial or Tchebycheff basis for the polynomials of degree d or less on a rectangular domain Ω* ⊃ Ω. In two dimensions, the monomials:

   Vd(Ω*) ≡ span { 1, x1, x2, x1^2, x1 x2, x2^2, . . . , x1^d, . . . , x2^d },

may be used, but a Tchebycheff basis would be preferable. Alternatively, a tensor product of one dimensional polynomials may be employed. A coarse space can be constructed as in (5.15); however, these matrices will not be sparse. The construction is more suited for Dirichlet boundary value problems. Heuristic studies indicate reasonable convergence for Neumann boundary value problems [CA18], in which case nodes on Ω ∪ BN must also be included in (5.15).

Coarse Space of Piecewise Discrete Harmonic Functions. We next describe a coarse space V0,P(Ω) ⊂ Vh(Ω) of piecewise discrete harmonic finite element functions. Let Ω1, . . . , Ωp denote a nonoverlapping subdomain decomposition of Ω, constructed by graph partitioning of the triangulation Ωh of Ω. The coarse space V0,P(Ω) will consist of finite element functions which are discrete harmonic on each subdomain Ωl, with specially chosen boundary values on each B(l) = ∂Ωl \ BD. Denote by B = ∪_{i=1}^p B(i) the common interface. We shall assume that the indices in I are grouped and ordered as I ∪ B, corresponding to the nodes in the subdomains Ω1, . . . , Ωp and on the interface B, with nI and nB denoting the number of nodes in I and B, respectively. Employ the block partitioning w = (wI^T, wB^T)^T as in the Schur complement methods, resulting in the following block structure for A:

   A ≡ [ AII    AIB
         AIB^T  ABB ],

where AII = blockdiag(AII^(1), . . . , AII^(p)).

A matrix basis for V0 can be constructed as follows [MA14, CO8, SA7]. Let yi for i = 1, . . . , nB denote the ordering of the nodes on B. Then, for each node yi ∈ B, define NG(yi) as the number of subdomain boundaries B(k) with yi ∈ B(k). The columns of R0^T will be defined as piecewise discrete A-harmonic vectors corresponding to the following p specifically chosen interface data vectors wB^(k), for k = 1, . . . , p:

   ( wB^(k) )_i = 1 / NG(yi),  if yi ∈ B(k),
   ( wB^(k) )_i = 0,           otherwise,

for i = 1, . . . , nB. Denote the discrete harmonic extension matrix as E ≡ −AII^{−1} AIB and define the matrix R0^T as:

   R0^T ≡ [ E wB^(1)  · · ·  E wB^(p)
             wB^(1)   · · ·   wB^(p)  ].

The coarse finite element space V0,P(Ω) ⊂ Vh(Ω) will consist of the finite element functions whose nodal vectors are in Range(R0^T). The finite element functions in V0,P(Ω) correspond to discrete harmonic extensions into the subdomains of finite element functions in the piecewise constant coarse space V0,P(B) employed in the balancing domain decomposition preconditioner [MA17]. Approximation properties of such spaces are described in [CO8, SA7, MA17]. The restriction matrix R0 will be the transpose of R0^T, and A0 ≡ R0 A R0^T.
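The interface data vectors wB^(k) above form a partition of unity on the interface B: at each interface node the entries 1/NG(yi) over the boundaries containing it sum to 1. A hedged Python sketch (the name `interface_basis` and the set-based boundary layout are ours):

```python
def interface_basis(boundaries, nodes):
    """Nodal interface vectors for the piecewise constant coarse space:
    (w_B^(k))_i = 1/N_G(y_i) if y_i lies on B^(k), else 0, where N_G(y_i)
    counts the subdomain boundaries B^(k) containing node y_i."""
    NG = {y: sum(y in Bk for Bk in boundaries) for y in nodes}
    return [[(1.0 / NG[y]) if y in Bk else 0.0 for y in nodes]
            for Bk in boundaries]

# Two subdomain boundaries sharing the cross point 'c' (so N_G(c) = 2):
W = interface_basis([{'a', 'c'}, {'b', 'c'}], ['a', 'b', 'c'])

# The vectors sum to 1 at every interface node (a partition of unity on B).
assert [W[0][i] + W[1][i] for i in range(3)] == [1.0, 1.0, 1.0]
```

The columns of R0^T are then the discrete harmonic extensions E wB^(k) of these vectors into the subdomain interiors.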

the system A u = f may be solved using matrix multiplicative. the local Schur complement S (i) : . 5. .1. . the subdomain boundary segments B (i) will be well deﬁned. Rp . the nonoverlapping subdomains Ω1 . Once a coarse space has be chosen with restriction matrix R0 . . . Il∗ ) = j (Rl )ij = 0. Deﬁne an index function with index(i. 5. . . if index(i. Il∗ ) denoting the global index in I of the local index 1 ≤ i ≤ n∗l in Il∗ . so that Neumann-Neumann and balancing domain decomposition precondi- tioners can be applied based on S (i) . CA18] and [CO8. amount of overlap and other factors in unstructured grid applications are presented in [CI8]. Ip∗ have n∗1 . . These algorithms can be formulated as before. If the unstructured grid is quasiuniform. Ωp determined by a graph partitioning algorithm may have complex geometry. optimal convergence should be ob- tained. . . . see [CH3. CA4. faces and wirebaskets may be diﬃcult to identify. . . CH17. Depending on whether c(x) = 0 or c(x) ≥ c0 > 0 in the elliptic equation. and that each index set Il has been extended to Il∗ as described earlier in this section. .5 Schwarz Algorithms We consider next the matrix implementation of Schwarz iterative algorithms on an unstructured grid Th (Ω). n∗p nodes in each set. R1 . based on suitable restriction and extension matrices.6 Schur complement algorithms When the grid is unstructured. The extension matrices RlT will be transposes of the restriction matrices with Al ≡ Rl ARlT . . However. SA7]. Let the subindex sets I1∗ . .1. the entries of an n∗l × n local restriction matrix Rl can be deﬁned by: 1. Ip using a graph partitioning algorithm which minimizes δ (·). We shall assume that the index set I has been partitioned into subindex sets I1 . if index(i.1 Algorithms for Automated Partitioning of Domains 279 5. . . . additive or hybrid Schwarz algorithms based on the restriction matrices R0 . Studies of the eﬀects of partitioning algorithms. . Il∗ ) = j. 
and traditional globs such as edges. . Then.

the subdomain stiﬀness matrices A(i) will not be singular. The singular vector will T be zi = (1. MA15]. When c(x) ≥ c0 > 0.P (Ω) described earlier for unstruc- tured grids. Instead. then the convergence rate of the resulting Neumann-Neumann algorithm will deteriorate as h−2 0 (1 + log(h0 /h)) if the grid Th (Ω) is quasi- 2 uniform and the subdomains are shape regular of diameter h0 . · · · . if no mechanism is employed for global transfer of information. may be singular when Ωi is a ﬂoating subdomain. (i) (i)T (i)−1 (i) S (i) = ABB − AIB AII AIB . . 1) and the balancing domain decomposition algorithm can be applied. However. see [SA7. the balancing domain decomposition or traditional Neumann-Neumann algorithm can be employed with the coarse space V0.

matrix A(i) will be non-singular.7 FETI Algorithms As with Neumann-Neumann and balancing domain decomposition algorithms. and CG acceleration. Our discussion will be organized as follows. with smooth coeﬃcients a(x) ≥ a0 > 0 and c(x) ≥ 0. with and without coarse space correction. In Chap. We consider representative Schwarz or Schur complement preconditioners. Chap. 5.280 5 Computational Issues and Parallelization 5. 5.4 we employ these bounds to obtain models for the parallel eﬃciency of various domain decomposition iterative solvers. while if c(x) ≥ c0 > 0.1.2. 5. with and without coarse space correction. 5. Appropriate versions of the FETI algorithm can be employed on unstructured grids [FA15].17) where A = AT > 0 is of size n. (5.2.1 Background Consider the following self adjoint and coercive elliptic equation: −∇ · (a(x)∇u) + c(x)u = f (x). will yield the linear system: A u = f. on ∂Ω. with grid size h. the subdomain stiﬀness matrices A(i) will be singular. In Chap. 5. . we describe background on parallel computers and measures for assessing the speed up.16) u = 0.2. 5. in Ω ⊂ IRd (5. If c(x) = 0 and Ωi is ﬂoating. under highly idealized assumptions.2. for the execution times of representative domain decomposition solvers imple- mented on a parallel computer having p processors with distributed memory. GR12. FETI algorithms also require minimal geometric information about the sub- domains on unstructured grids.2 Parallelizability of Domain Decomposition Solvers In this section.2. GR16]. eﬃciency and scal- ability of parallel algorithms. In Chap. SM4. CH15.1 we present background and notation on identities used for the parallel computation of matrix-vector products and inner products. we heuristically model the potential parallel eﬃciency of do- main decomposition solvers [GR10. and representative Schwarz and Schur complement preconditioners. and derives heuristic estimates for the parallel execution times of represen- tative solvers. FA9. 
We do this by employing theoretical models.3 describes a domain decomposition strategy for allocating memory and computations to individual processors.2. Its discretization by a ﬁnite element method based on a quasiuniform triangulation τh (Ω) of Ω. SK.

Notation. i=1 Local load vectors will be denoted f (i) for 1 ≤ i ≤ ns so that the global load ns (i)T (i) vector has the form f ≡ i=1 R f . If a Schur complement preconditioner is employed. then Ri will denote the pointwise restriction map from nodes on interface B onto the boundary segment B (i) of Ωi .17) by a preconditioned CG algorithm using an additive Schwarz or Neumann-Neumann preconditioner. i=1 A decomposition of the identity on B of the form: ns I= RTi I (i) Ri .2 Parallelizability of Domain Decomposition Solvers 281 We consider the solution of (5. . Accordingly. the subassembly identity can be expressed in the form: ns T A= R(i) A(i) R(i) . resulting in subdomain Ωi∗ . The local Schur complements will be denoted S (i) . By construction. Ωns denote a nonoverlapping decomposition of of Ω ⊂ IRd into ns subdomains. we let Ω1 . i=1 where matrix I (i) has the same size as A(i) with nonnegative diagonal en- tries. so that the subassembly identity has the form: ns S= RTi S (i) Ri . To obtain an overlapping decomposition. the volume (area) of the extended subdomains will satisfy |Ωi∗ | = O ((1 + β∗ ) |Ωi |) for β∗ ≡ (1 + β)d − 1. We will employ the following notation. The pointwise nodal re- striction map onto nodes in Ω i will be denoted R(i) and the local stiﬀness matrix on Ω i will be denoted A(i) . . . we extend each subdomain Ωi by including all points of Ω within a distance of β h0 from Ωi . We shall assume there exists diagonal matrices I (i) which form a decomposition of the identity: ns T I= R(i) I (i) R(i) . then each nonoverlapping subdomain Ωi will contain O(n/ns ) unknowns while overlapping subdomains Ωi∗ will contain O ((1 + β∗ )n/ns ) unknowns. Such matrices can be constructed by deﬁning (I (i) )kk = 1 if xk ∈ Ωi and (I (i) )kk = 1/N (xk ) if xk ∈ B (i) where N (xk ) denotes the number of subdomain boundaries to which node xk belongs to. i=1 will also be assumed. . each of diameter h0 and volume (area) |Ωi | = O(hd0 ). 
Due to quasiuniformity of the underlying triangulation. Consequently. . where I (i) (with some abuse of notation) denotes a diagonal matrix of the same size as S (i) with nonnegative diagonal entries. if n denotes the number of interior nodes in Ω. 5.

This unfortunate fact places constraints on the types of parallel algorithms suitable for implementation on such hardware. and are generally suited for implementation on MIMD architectures. τf τc . and the coarse space matrix will be denoted A0 = R0 AR0T . which we shall assume is zero for simplicity.e. QU8. then the row space of R0 will span the coarse space. multiple data) architecture with distributed memory [HO. For simplicity. it will be spanned by the rows of R0 with S0 = R0 SRT0 denoting the coarse Schur complement matrix. AL3. interprocessor communication must be kept to a minimum to obtain high speed up of algorithms. GR]. then a suitable protocol such as message passing interface [GR15] may be employed. LE16. it will be assumed that data can be communicated directly between any pair of processors (though.2. We shall let Tcomm (n) ≡ τ0 + n τc denote the average time for transferring n units of data between two processors. In such cases. GR].2 Parallel Computation We consider a parallel computer with an MIMD (multiple instruction. AL3. Here τ0 denotes the start up time. The performance of an algorithm on a parallel computer is typically as- sessed by a quantity referred to as the speed up. n) denotes the execution time for implementing a parallel algorithm having problem size n using p processors. By design. If several processors simultaneously send data to each other. The remaining portions typically require communication. Formally. If a coarse space is employed. then its relative speed up is deﬁned as the ratio of its execution time T (1. . LE16. which measures the rate of reduction in its execution time as the number of processors is increased. each with local memory and capable of executing programs independently. as speciﬁed by some adjacency matrix). large portions of domain decomposition algorithms involve computations which can be implemented independently without communication. 
n) on a parallel computer with p processors.282 5 Computational Issues and Parallelization If a coarse space is employed. if T (p. in most domain decomposition applications it will be suﬃcient to pass data between neighboring processors. either between adjacent subdomains or with a coarse space (if present). Algorithms having relatively large sections of independent computations with relatively small sections requiring communication are said to have coarse granularity. so that Ai = Ri ARiT will be a principal submatrix of A corresponding to nodes in Ωi∗ . see [HO. i. 5. We will assume there are p identical processors. QU8. we let Ri denote the pointwise restric- tion map onto nodes in Ωi∗ . On a typical MIMD parallel computer. the speed of communication τc between processors will be signiﬁcantly slower than the speed τf of ﬂoating point operations. n) on a serial computer to its execution time T (p. where τf denotes the time for a ﬂoating point operation. provided each processor is assigned to implement the computations on one or more subdomains. Given overlapping subdomains Ωi∗ ..

n) ≡ . the amount β of overlap (if overlapping subdomains are employed). In ﬁnite ele- ment applications. n) = φ(n) τf where φ(n) = c0 nα +o(nα ) for 1 < α ≤ 3. depending on the elliptic equation. T (p. When the speed up is measured relative to the best serial execution time. T (p. Tbest (1. n) ≤ p. Remark 5. Remark 5.17. linear (or almost linear) order complexity may be attained for multigrid and fast Poisson solvers. This is because the relative speed up ratio is not measured with reference to the best serial execution time. The relative speed up of an algorithm implemented using p processors is deﬁned as: T (1. n) When the best serial algorithm or execution time is not known. The execution time T (p. the lowest attainable complexity for the solution of a sparse linear system of size n arising from discretizations of elliptic equations will be denoted φ(n). geometry and discretization.18. Deﬁnition 5. there may be other parallel implementations with shorter execution times. Even if the relative speed up of a parallel algorithm attains its maximum value. n) of domain decomposition algorithms may depend on other factors. n) S(p. n) The speed up ratio has a theoretical maximum value of p for a perfectly paral- lelizable algorithm with 1 ≤ S(p. The total speed up of an algorithm is deﬁned as: Tbest (1. 5. n) S(p.15. where Tbest (1. n) denotes the best serial execution time. In special cases.16. the relative speed up may be used as a measure of its parallel performance. This is deﬁned below. n) ≡ . but we will assume Tbest (1.2 Parallelizability of Domain Decomposition Solvers 283 Deﬁnition 5. n) = C n τf . the stopping criterion . such as the number ns of subdomains. In such cases. the resulting speed up is referred to as total speed up.

we shall denote the execution time as T (p. size n0 of the coarse space. n. amongst other factors. ns .. β. the complexity φ(·) of the local solver. . If this dependence of the execution time on such ad- ditional factors needs to be emphasized.

n.. β. φ) and the relative speed up as S(p. ns . n0 . .

. ns . . β. n. φ) and the total speed up as S(p. n0 .

n) ≡ × 100%. p T (p. The relative parallel eﬃciency of an algorithm implemented using p processors is deﬁned as: T (1. n) E(p. we deﬁne the parallel eﬃciency of an algorithm as the percentage of the speed up relative to the maximum speed up of p. n) ≡ × 100%. In the following. n) The total parallel eﬃciency of an algorithm is deﬁned as: Tbest (1.19. φ). Deﬁnition 5. n0 . n) . n) E(p.. p T (p.

Let 0 < α < 1 denote the fraction of computa- tions within an algorithm which are serial in nature. n) = = ≤ . In practice. The parallel execution time given p processors is decomposed as: T (p. T (p. In applications. The fraction α of serial computations within an algorithm can be diﬃcult to estimate and may vary with the problem size n. Empirical evidence indicates that α(n) diminishes with increasing problem size n for most algo- rithms. while B(n) denotes the parallel execution time for the parallelizable portion of the algorithm. n) T (p. This yields the following estimate for the serial execution time of the algorithm: T (1. is a measure of how eﬃciently an algorithm makes use of additional processors. it is often of interest to know whether parallel algorithms can be found which maintain their eﬃciency as the size n of the problem is scaled up. T (p. This yields the following upper bound for the speed up: T (1. n) A(n) + B(n) A(n) + B(n) A(n) + B(n) Unlike the ﬁxed bound given by Amdahl’s law. A less pessimistic upper bound for the maximum speed up was derived by Gustafson-Barris as indicated below. n) = A(n) + B(n). n)/p. n) = A(n) + p B(n) from which we estimate the speed up as: T (1. n) 1 1 S(p. the following estimate can be obtained for the optimal execution times: T (1. and ignoring over- head and communication costs. The scalability of a parallel algorithm. deﬁned below. regardless of the computer hardware. n) + (1 − α) T (1. n) = α T (1. due to portions of the algorithm in which computations can only be executed sequen- tially. Such an upper bound on the speed up is given by Amdahl’s law. n) α + (1 − α)/p α Thus. n) A(n) + p B(n) A(n) B(n) S(p. there may be constraints on the maximal speed up attainable in an algorithm. n) = = = + p. . n) = α T (1. Amdahl’s law yields a pes- simistic bound in practice. n) + (1 − α) T (1. 
assuming perfect parallelizability of the remaining portion of the algorithm.284 5 Computational Issues and Parallelization Amdahl’s Law. due to the implicit assumption that the fraction α of serial computations remains ﬁxed independent of n. the Gustafson-Baris bound for the speed up increases linearly with the number of processors. where A(n) denotes the execution time for the serial portion of the algorithm. Then. which may be derived as follows. the parallel speed up of an algorithm cannot exceed the inverse of the fraction α of serial computations within the algorithm.

5.2. an algorithm is scalable if given m p processors where m > 1. three alternative approaches may be employed for solving a coarse space problem in parallel in domain decomposition preconditioners: • Parallelize the solution of the coarse problem (using all the processors) and store relevant data on each processor. n). n): T (1.20. n(m))/(m T (1. An algorithm is said to be perfectly scalable if its eﬃciency remains constant when the problem size n and the number of processors p are increased by the same factor m: E(m p. An algorithm is said to be scalable if it is possible to keep its eﬃciency constant by increasing the problem size as the number of processors increases. n). • Gather all the relevant coarse data on a speciﬁc processor and solve the coarse problem only on this processor. in relation to T (p. n(m)) = ( ) T (p. . and a portion computing the action of the inverse of the preconditioner. n). it is easily seen that the fol- lowing will hold for an algorithm satisfying E(m p. n(m)) = E(p. update of residuals. m n) = E(p. n)) is the factor by which the com- putation time is increased or decreased. Using the deﬁnition of scalability. An algorithm is said to be highly scalable if its parallel eﬃciency depends only weakly on the number of processors as the problem size n and the number p of processors are increased by the same factor. the expression T (1. Fur- thermore. solve the coarse problem redun- dantly in parallel on each processor. the problem size can be increased to n(m) > n such that: E(m p. n). thereby minimizing communication of additional data. it will be desirable to allo- cate memory to individual processors in a way compatible with both sections of the algorithm. iterates and inner products). m T (1. n(m)) = E(p. n(m)) T (m p. • Gather the coarse data on each processor. When implementing a PCG algorithm on a parallel computer with distributed memory. 
a portion not involving the preconditioner (matrix-vector products.2 Parallelizability of Domain Decomposition Solvers 285 Deﬁnition 5.3 Parallelization of PCG Algorithms Each iteration in a PCG algorithm can be decomposed into two portions. 5. Remark 5. and broadcast the result to all other processors. if coarse space correction is employed within the preconditioner.21. care must exercised in the parallel implementation of the coarse problem. More speciﬁcally. as the number of processors is increased to m p and the problem size is increased to n(m). Typically. n) Here.

286 5 Computational Issues and Parallelization

Generally, the latter two approaches are preferable on typical parallel architectures [GR10], though we shall consider only the second approach.

Motivated by the preceding, we shall heuristically consider the following

strategy for allocating memory and computations to individual processors.

• Each of the p processors is assigned to handle all the computations corresponding to one or more subdomains or a coarse problem. Thus, if a coarse space is not employed, each processor will be assigned to handle (ns/p) subdomains, and (ns/p) + 1 subproblems if a coarse space is employed.

• To ensure approximate load balancing, we shall require the number of

unknowns O(n/ns ) per nonoverlapping subdomain (or O ((1 + β∗ )n/ns )

per overlapping subdomain) to be approximately equal. If a coarse space

is employed, we shall additionally require the number n0 of coarse space

unknowns not to exceed the number of unknowns per subdomain, yielding

the constraint n0 ≤ C(n/ns ).

• To reduce communication between the processors, we shall assume that the subdomain data are distributed amongst the different processors as follows. The processor which handles subdomain Ωi should ideally store the current approximation of the local solution u^(i) on Ω̄i, the local stiffness matrix A^(i), local load vector f^(i) and matrix I^(i). If overlapping subdomains Ωi∗ are used, then the local solution ui on Ωi∗, submatrix Ai = Ri A Ri^T, local load Ri f, local residual Ri r and the components Ri Rj^T for adjacent subdomains should also be stored locally. If a coarse space is employed, then the nonzero rows of R0 (R(i))^T and R0 Ri^T should also be stored locally.
• The processor which handles the coarse space should also store matrix A0 = R0 A R0^T and the nonzero entries of Rj R0^T for 1 ≤ j ≤ ns.

We shall let K denote the maximum number of adjacent subdomains.
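The allocation strategy above can be sketched in a few lines (a hypothetical illustration; the round-robin rule and the task counts are our own, chosen only to show that each processor ends up with about ns/p subproblems):

```python
# Sketch: assign ns subdomains (and optionally a coarse problem) to p
# processors so that each handles roughly ns/p subproblems.

def assign_subdomains(n_s, p, coarse=False):
    # tasks 1..n_s are subdomains; task 0 (if present) is the coarse problem
    tasks = list(range(1, n_s + 1))
    if coarse:
        tasks = [0] + tasks
    alloc = {proc: [] for proc in range(p)}
    for k, task in enumerate(tasks):
        alloc[k % p].append(task)   # round-robin keeps loads within one task
    return alloc

alloc = assign_subdomains(8, 4, coarse=True)    # 9 subproblems, 4 processors
```

With equal-sized subdomains this assignment differs from perfect load balance by at most one subproblem per processor; in practice the subdomain sizes produced by the graph partitioner would also enter the balance.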

When deriving theoretical estimates of execution times, we shall assume that an efficient sparse matrix solver having complexity φ(m) = c0 m^α + o(m^α) for some 1 < α ≤ 3 is employed to solve all the subproblems of size m occurring within a domain decomposition preconditioner. Analysis in [CH15] suggests that if a serial computer is employed, then the optimal diameter h0 of a traditional coarse grid must satisfy:

    h0 = O(h^(α/(2α−d)))  for Ω ⊂ IR^d.

If a parallel computer is employed with p processors, then load balancing requires the number n0 of coarse space unknowns to satisfy n0 ≤ c(n/ns). Since theoretical analysis indicates a coarse space must satisfy an approximation property of order h0 for optimal or almost optimal convergence, this heuristically suggests n0 ≈ ns ≈ n^(1/2) for traditional coarse spaces.
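To make the n0 ≈ ns ≈ n^(1/2) heuristic concrete, the following small calculation (an illustrative sketch with a made-up problem size; only the reasoning follows the text) takes one coarse unknown per subdomain, so that n0 = ns, and checks that the load-balancing constraint n0 ≤ n/ns forces ns ≤ n^(1/2):

```python
import math

# Illustrative check of the load-balancing heuristic n0 ~ ns ~ sqrt(n):
# with one coarse unknown per subdomain (n0 = ns), the constraint
# n0 <= n/ns becomes ns**2 <= n, i.e. ns <= sqrt(n).

def max_subdomains(n):
    # largest ns with ns**2 <= n
    return int(math.isqrt(n))

n = 10_000                      # total number of unknowns (made up)
ns = max_subdomains(n)          # 100 subdomains
per_subdomain = n // ns         # 100 unknowns per subdomain
n0 = ns                         # one coarse unknown per subdomain
```

With these numbers the coarse problem (n0 = 100) is no larger than a single subdomain problem, which is exactly the balance the text aims for.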

In the following, we outline parallel algorithms for evaluating matrix multiplication and inner products, and the action of additive Schwarz and Neumann-Neumann preconditioners. We derive heuristic estimates for the parallel execution times of the resulting algorithms.

Parallelization of Matrix-Vector Products. By assumption, we let a vector w be distributed amongst different processors, with component R(i) w (and Ri w, if overlapping subdomains are employed) stored on the processor handling Ωi. As a result, a matrix-vector product A w can be computed using the subassembly identity:

    A w = Σ_{i=1}^{ns} (R(i))^T A^(i) R(i) w,

and the result can be stored locally using the following steps.

1. In parallel, multiply each of the local vectors R(i) w (assumed to be stored locally) using the local stiffness matrix A^(i).
2. The processor handling Ωi should send the data R(j) (R(i))^T A^(i) R(i) w to the processor handling Ωj.
3. The processor handling Ωj should sum the contributions it receives:

    R(j) A w = Σ_{i=1}^{ns} R(j) (R(i))^T A^(i) R(i) w,

from all (at most K) neighbors, and store the result locally.

If ti denotes the parallel execution time for the i'th step above, it will satisfy:

    t1 ≤ c1 (ns/p) (n/ns) τf,
    t2 ≤ c2 (ns/p) K (n/ns)^((d−1)/d) τc + τ0,
    t3 ≤ c3 (ns/p) K (n/ns)^((d−1)/d) τf.

Apart from τ0, the other terms are inversely proportional to p.
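The three steps above can be sketched serially on a toy problem (a minimal illustration of our own: a 1D Laplacian with five unknowns and two subdomains sharing one interface node; the node sets and local matrices are made up for the example):

```python
# Sketch of the distributed matrix-vector product via the subassembly
# identity A w = sum_i R(i)^T A(i) R(i) w, run serially for two subdomains.

def matvec(M, x):
    return [sum(M[r][c] * x[c] for c in range(len(x))) for r in range(len(M))]

n = 5                                   # global unknowns x0..x4
nodes = [[0, 1, 2], [2, 3, 4]]          # subdomain node sets; node 2 is shared
# local stiffness matrices of the 1D Laplacian; the diagonal entry 2 at the
# shared node 2 is split into 1 + 1 so that assembly returns the global A
A_loc = [
    [[2, -1, 0], [-1, 2, -1], [0, -1, 1]],
    [[1, -1, 0], [-1, 2, -1], [0, -1, 2]],
]

w = [1.0, 2.0, 3.0, 4.0, 5.0]

# Step 1: each "processor" multiplies its local vector R(i) w by A(i)
local = [matvec(A_loc[i], [w[g] for g in nodes[i]]) for i in range(2)]
# Steps 2-3: contributions are exchanged and summed into the global result
Aw = [0.0] * n
for i in range(2):
    for k, g in enumerate(nodes[i]):
        Aw[g] += local[i][k]

# Direct check against the assembled tridiagonal matrix A
A = [[2 if r == c else -1 if abs(r - c) == 1 else 0 for c in range(n)]
     for r in range(n)]
direct = matvec(A, w)
```

In a real implementation step 1 runs concurrently on each processor and steps 2–3 are nearest-neighbor messages; the serial loop above only checks that the subassembled product agrees with the direct one.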

Matrix-vector products involving the Schur complement matrix S can be computed similarly, based on an analogous subassembly identity:

    S wB = Σ_{i=1}^{ns} Ri^T S^(i) Ri wB.

Since S^(i) = A_BB^(i) − (A_IB^(i))^T (A_II^(i))^-1 A_IB^(i), such computations require the solution of local linear systems, with the solver of complexity φ(·). Thus, the parallel execution time for matrix multiplication by S will be bounded by a sum of t1 = c1 (ns/p) φ(n/ns) τf, t2 = c2 (ns/p) K (n/ns)^((d−1)/d) τc + τ0 and t3 = c3 K (ns/p) (n/ns)^((d−1)/d) τf. Again, apart from the start up time τ0, the other terms are inversely proportional to p.
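As a worked instance of this identity (an illustrative 1D example of our own construction, the smallest nontrivial one), the subassembled Schur complement can be checked against the globally eliminated one:

```python
# Subassembly S = sum_i Ri^T S(i) Ri on a 1D Laplacian with unknowns
# x1, x2, x3, interface B = {x2}, and one interior node per subdomain.

# local matrices in (interior, interface) ordering; the diagonal entry 2 at
# the interface node x2 is split as 1 + 1 between the two subdomains
A1 = [[2.0, -1.0], [-1.0, 1.0]]
A2 = [[2.0, -1.0], [-1.0, 1.0]]

def local_schur(Aloc):
    # S(i) = A_BB - A_IB^T A_II^{-1} A_IB for a single interior unknown
    a_II, a_IB, a_BB = Aloc[0][0], Aloc[0][1], Aloc[1][1]
    return a_BB - a_IB * a_IB / a_II

S_sub = local_schur(A1) + local_schur(A2)        # subassembled S

# global Schur complement: A_BB - A_IB^T A_II^{-1} A_IB with
# A_II = diag(2, 2), A_IB = (-1, -1)^T and A_BB = 2
S_glob = 2.0 - ((-1.0) ** 2 / 2.0 + (-1.0) ** 2 / 2.0)
```

Each local Schur complement here equals 1/2, and their sum reproduces the global value 1, which is the subassembly identity in miniature.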

Parallelization of Inner Products. Inner products can be computed in parallel based on the distributed data stored on each processor. By assumption, given vectors w and v, their components R(i) w and R(i) v will be stored on the processor handling Ωi. Since matrix I^(i) will also be stored locally, the inner product w^T v can be computed using the identity:

    w^T v = Σ_{i=1}^{ns} w^T (R(i))^T I^(i) R(i) v.

This computation may be distributed as follows.

1. In parallel, the processor handling Ωi should compute the local inner products w^T (R(i))^T I^(i) R(i) v.
2. Each processor should sum the (ns/p) local inner products it handles and communicate the computed result to all the other processors.
3. Each processor should sum all the local inner products it receives and store the resulting answer locally.

If ti denotes the execution time for the i'th step above, it will satisfy:

    t1 ≤ c1 (ns/p) (n/ns) τf,
    t2 ≤ c2 (ns/p) τf + c3 p τc + τ0,
    t3 ≤ c4 p τf.

Except for c3 p τc + τ0 and c4 p τf, the other terms vary inversely with p.
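The weighted local sums above can be sketched on the same kind of toy grid (a hypothetical example of ours: two subdomains sharing node 2, with partition-of-unity weights 1/2 at the shared node so it is not counted twice):

```python
# Sketch of the distributed inner product
#   w^T v = sum_i w^T R(i)^T I(i) R(i) v
# for two subdomains of a 1D grid sharing node 2, run serially.

nodes = [[0, 1, 2], [2, 3, 4]]
weights = [[1.0, 1.0, 0.5], [0.5, 1.0, 1.0]]   # diagonal of I(i)

w = [1.0, 2.0, 3.0, 4.0, 5.0]
v = [5.0, 4.0, 3.0, 2.0, 1.0]

# Step 1: each "processor" forms its weighted local inner product
local = [sum(weights[i][k] * w[g] * v[g] for k, g in enumerate(nodes[i]))
         for i in range(2)]
# Steps 2-3: local contributions are summed (an all-reduce in practice)
dist = sum(local)

direct = sum(a * b for a, b in zip(w, v))      # plain dot product
```

The 1/2 weights at the shared node play the role of the diagonal matrices I^(i) in the decomposition of the identity.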

Analogous estimates will hold for inner products in Schur complement algorithms, based on interface unknowns. The total execution time in this case will be bounded by the sum of t1 = c1 (ns/p) (n/ns)^((d−1)/d) τf along with t2 = c2 (ns/p) τf + c3 p τc + τ0 and t3 = c4 p τf. Except for c3 p τc + τ0 and c4 p τf, the other terms are inversely proportional to p.

Parallelization of an Additive Schwarz Preconditioner. If there is no coarse space, the inverse of such a preconditioner will have the form:

    M^-1 = Σ_{i=1}^{ns} Ri^T Ai^-1 Ri.

Computation of the action of M^-1 on a residual vector r can be implemented in parallel as follows.

1. In parallel, solve Ai wi = Ri r using the locally stored residual vector Ri r and the locally stored submatrix Ai.
2. In parallel, the processor handling Ωi∗ should send Rj Ri^T wi to each of the processors handling Ωj∗ for Ωj∗ ∩ Ωi∗ ≠ ∅.
3. In parallel, each processor should sum the contributions of solutions from adjacent subdomains and store Rj M^-1 r = Σ_{i=1}^{ns} Rj Ri^T wi locally.

The computational time for each step can be estimated. If ti denotes the execution time for the i'th step, it will satisfy:

    t1 ≤ c1 (ns/p) φ((1 + β∗)(n/ns)) τf,
    t2 ≤ c2 K β∗ (n/p) τc + τ0,
    t3 ≤ c3 K β∗ (n/p) τf.

Apart from τ0, the terms are inversely proportional to p.
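A serial sketch of this preconditioner action on a toy problem (a hypothetical example: the grid, the overlapping index sets and the little dense solver below are ours, not from the text):

```python
# Sketch of the additive Schwarz action M^{-1} r = sum_i Ri^T Ai^{-1} Ri r
# for a 1D Laplacian with 4 unknowns and two overlapping index sets.

def solve(M, b):
    # dense Gaussian elimination with partial pivoting (illustrative only)
    n = len(b)
    M = [row[:] for row in M]
    b = b[:]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        b[k], b[piv] = b[piv], b[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n):
                M[r][c] -= f * M[k][c]
            b[r] -= f * b[k]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (b[k] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

n = 4
A = [[2.0 if r == c else -1.0 if abs(r - c) == 1 else 0.0 for c in range(n)]
     for r in range(n)]
overlap = [[0, 1, 2], [1, 2, 3]]            # overlapping subdomain indices

def schwarz_apply(r):
    z = [0.0] * n
    for idx in overlap:
        Ai = [[A[a][b] for b in idx] for a in idx]   # Ai = Ri A Ri^T
        wi = solve(Ai, [r[g] for g in idx])          # step 1: local solve
        for k, g in enumerate(idx):                  # steps 2-3: sum overlaps
            z[g] += wi[k]
    return z

# Assembling M^{-1} column by column lets us check that it is symmetric,
# as it must be since each term Ri^T Ai^{-1} Ri is symmetric.
cols = [schwarz_apply([1.0 if j == i else 0.0 for j in range(n)])
        for i in range(n)]
```

In practice the loop over `overlap` runs concurrently, one index set per processor, and the `z[g] += wi[k]` accumulation is a nearest-neighbor exchange.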

If a coarse space is included, the preconditioner will have the form:

    M^-1 = Σ_{i=1}^{ns} Ri^T Ai^-1 Ri + R0^T A0^-1 R0.

Care must be exercised when parallelizing the coarse grid correction term R0^T A0^-1 R0, since the computation of R0 r requires global communication between processors. We shall assume that the coarse space computations are performed on a processor assigned to the coarse space; however, they may alternatively be performed redundantly on each of the other processors in parallel. We shall not consider the parallelization of coarse space computations. By assumption, the nonzero rows of R0 (R(i))^T, matrix I^(i) and vector R(i) r are stored locally on the processor handling Ωi∗. Thus, the vector R0 r may be computed based on the following expression:

    R0 = R0 Σ_{i=1}^{ns} (R(i))^T I^(i) R(i) = Σ_{i=1}^{ns} (R0 (R(i))^T) I^(i) R(i).

Below, we summarize an algorithm for the parallel computation of M^-1 r.

1. The processor handling Ωi∗ should compute the nontrivial rows of the term (R0 (R(i))^T) I^(i) R(i) r using the locally stored vector R(i) r and matrix I^(i). Send these nontrivial rows to the processor handling coarse space correction. The processor handling the coarse space should sum the components:

    R0 r ≡ Σ_{i=1}^{ns} R0 (R(i))^T I^(i) R(i) r.

2. In parallel, solve Ai wi = Ri r for 0 ≤ i ≤ ns.
3. If Ωi∗ ∩ Ωj∗ ≠ ∅, then the processor handling Ωi∗ should send Rj Ri^T wi to the processor handling Ωj∗. The processor handling the coarse space should send relevant components of R0^T w0 to the processor handling Ωi∗.
4. In parallel, the processor handling Ωi∗ should sum the components:

    Ri M^-1 r ≡ Σ_{j=0}^{ns} Ri Rj^T wj.

The computational time for each step above can be estimated.

If ti denotes the execution time for the i'th step above, it will satisfy:

    t1 ≤ c1 K (1 + β∗)(n/p) τf + c2 K n0 τc + τ0 + c3 K n0 τf,
    t2 ≤ c4 ((ns + 1)/p) φ((1 + β∗)(n/ns)) τf,
    t3 ≤ c5 K (1 + β∗)(n/p) τc + τ0,
    t4 ≤ c6 ((ns + 1)/p) (K + 1)(1 + β∗)(n/ns) τf,

provided that n0 ≤ (1 + β∗)(n/ns). Additionally, if ns scales proportionally to p, then apart from τ0, the other terms are inversely proportional to p.

Parallelization of the Neumann-Neumann Preconditioner. We next consider a Neumann-Neumann Schur complement preconditioner, in which the action of the inverse of the preconditioner has the form:

    M^-1 = Σ_{i=1}^{ns} Ri^T (S^(i))^† Ri + R0^T S0^-1 R0,

where S0 = R0 S R0^T. Care must be exercised when parallelizing the computation of R0^T S0^-1 R0 rB, since it requires global communication. It will be assumed that the nonzero rows of R0 Ri^T are stored on the processor handling Ωi. The action of R0 on rB can be computed using the identity:

    R0 = R0 Σ_{i=1}^{ns} Ri^T I^(i) Ri = Σ_{i=1}^{ns} (R0 Ri^T) I^(i) Ri.

Below, we list the implementation of the Neumann-Neumann preconditioner.

1. In parallel, each processor handling Ωi∗ should compute the nontrivial rows of (R0 Ri^T) I^(i) Ri rB using (the locally stored) Ri rB and matrix I^(i). Send these nontrivial rows to the processor handling coarse space correction, and then sum the components to obtain:

    R0 rB ≡ Σ_{i=1}^{ns} R0 Ri^T I^(i) Ri rB.

2. In parallel, solve S^(i) wi = Ri rB for 0 ≤ i ≤ ns, where S^(0) ≡ S0.
3. In parallel, if Ωi∗ ∩ Ωj∗ ≠ ∅, the processor handling Ωi∗ should send Rj Ri^T wi to the processor handling Ωj∗. The processor handling the coarse space should send Ri R0^T w0 to the processor handling Ωi∗ for 1 ≤ i ≤ ns.
4. In parallel, the processor handling Ωi∗ should sum the components:

    Ri M^-1 rB ≡ Σ_{j=0}^{ns} Ri Rj^T wj.

The computation times for the above steps can be estimated.

If ti denotes the execution time for the i'th step above, it will satisfy:

    t1 ≤ c1 ((ns + 1)/p) K (n/ns)^((d−1)/d) τf + c2 K n0 τc + τ0 + c3 K n0 τf,
    t2 ≤ c4 ((ns + 1)/p) φ(n/ns) τf,
    t3 ≤ c5 K (ns/p) (n/ns)^((d−1)/d) τc + τ0,
    t4 ≤ c6 ((ns + 1)/p) (K + 1) (n/ns)^((d−1)/d) τf,

provided that n0 = O(n/ns). If ns is proportional to p, then apart from τ0, the other terms vary inversely with p.

5.2.4 Estimation of the Total Execution Times

Using the preceding estimates, we may estimate the execution time T(p, n, ε) of CG algorithms for different choices of preconditioners. Here T(p, n, ε) is the total execution time for implementing a PCG algorithm to solve a problem of size n on a p processor parallel computer, where the initial residual is reduced by a factor ε. The total execution time will be the product of the number N(n, ε) of iterations required to reduce the residual by the factor ε, and the parallel execution time T∗(p, n) per iteration:

    T(p, n, ε) = N(n, ε) T∗(p, n).          (5.18)

We shall suppress the dependence on ε for convenience. The execution time T∗(p, n) per iteration can be further decomposed as:

    T∗(p, n) = G∗(p, n) + H∗(p, n),          (5.19)

where G∗(p, n) denotes the execution time per iteration for the computations not involving the preconditioner (matrix-vector products, inner products, vector addition), while H∗(p, n) denotes the execution time per iteration of the preconditioning step. Estimates for H∗(p, n) and G∗(p, n) can be obtained by summing up the relevant execution time estimates ti for appropriately chosen routines from the preceding pages.

Employing the total execution times, we heuristically estimate the parallel efficiency of the additive Schwarz and Neumann-Neumann PCG algorithms, making several simplifying assumptions. We shall express the efficiency in terms of n, p, ns, h0, d, α and γc = (τc/τf) ≫ 1.

• We assume that the best serial execution time satisfies:
  Tbest(1, n) = φ(n) τf ≤ c0 n^α τf.
• We assume that: τ0 = 0, p ≤ ns, p^2 ≤ n and n0 ≤ (1 + β∗)(n/ns).
• We omit lower order terms in expressions, for convenience.

292 5 Computational Issues and Parallelization

Additive Schwarz Preconditioner Without Coarse Space. Estimates of H_*(p, n) to solve A u = f using a CG algorithm can be obtained by summing the appropriately chosen quantities t_i from the preceding section for the matrix-vector product and inner product routines. Estimates of G_*(p, n) can be obtained similarly by summing the t_i from the preceding section for the additive Schwarz preconditioner without a coarse space. Assuming that τ_0 = 0, p^2 ≤ n and p ≤ n_s, and omitting lower order terms, this yields:

  H_*(p, n) ≤ d_1 (n/p) τ_f + d_2 K (n/p) τ_c + d_3 (n_s/p) K (n/n_s)^((d-1)/d) τ_f
  G_*(p, n) ≤ c_0 e_1 (n_s/p) (1 + β_*)^α (n/n_s)^α τ_f + e_2 K β_* (n/p) τ_c + e_3 K β_* (n/p) τ_f.   (5.20)

Bounds from Chap. 2 for the condition number of the additive Schwarz PCG algorithm without coarse space correction yield: cond(M, A) ≤ C(β) h_0^{-2}. Standard estimates for the error reduction in PCG algorithms [GO4] yield: N(n, ε, β) ≤ C(ε, β) h_0^{-1}, for some C(ε, β) independent of n, n_s and p. Summing H_*(p, n) and G_*(p, n), retaining only the highest order terms, and substituting h_0^{-1} = O(n_s^{1/d}) (which holds since n_s = O(|Ω| h_0^{-d})), yields:

  T(p, n, n_s) ≤ c_0 n_s^{1/d} ( C_1 γ_c (n/p) + C_2 (n_s/p) (n/n_s)^α + C_3 (n/p)^((d-1)/d) ) τ_f,

where γ_c ≡ (τ_c/τ_f) ≫ 1. Here the C_i may depend on all parameters excluding n, n_s, p and γ_c. Substituting that T_best(1, n) = c_0 n^α τ_f along with the preceding bound for T(p, n, n_s) yields the following heuristic bound for the total efficiency when p = n_s ≤ n^{1/2}, τ_0 = 0 and 1 < α ≤ 3:

  E(p, n) ≥ n^α / ( p^((d+1)/d) ( C_1 γ_c (n/p) + C_2 (n/p)^α + C_3 (n/p)^((d-1)/d) ) ).

By considering only the leading order terms as p increases, for a fixed n, the value of p which minimizes the denominator will optimize the efficiency. Heuristically, as p is varied, it may be noted that the value of n can be increased to maintain a constant efficiency. Thus the above algorithm is scalable.
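The heuristic efficiency bound above can be explored numerically. In the sketch below, the constants C1, C2, C3, the exponent α, the dimension d, and the communication-to-computation ratio γ_c are all illustrative assumptions; only the trends in p and n are meaningful, not the absolute values.

```python
# Shape of the heuristic lower bound on E(p, n) for additive Schwarz
# without a coarse space. All constants are hypothetical placeholders.

def efficiency_no_coarse(p, n, d=2, alpha=2.0, gamma_c=100.0,
                         C1=1.0, C2=1.0, C3=1.0):
    denom = p ** ((d + 1) / d) * (C1 * gamma_c * (n / p)
                                  + C2 * (n / p) ** alpha
                                  + C3 * (n / p) ** ((d - 1) / d))
    return n ** alpha / denom
```

In the denominator, the communication term grows with p while the (n/p)^α term shrinks, so some intermediate p minimizes the denominator for fixed n; increasing n restores the modeled efficiency as p grows, which is the scalability claim made above.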

5.2 Parallelizability of Domain Decomposition Solvers 293

Additive Schwarz Preconditioner with Coarse Space. Estimates of G_*(p, n) and H_*(p, n) can be obtained for the additive Schwarz preconditioner with coarse space correction by summing the appropriate t_i:

  H_*(p, n) ≤ d_1 (n/p) τ_f + d_2 K (n/p) τ_c + d_3 (n_s/p) K (n/n_s)^((d-1)/d) τ_f
  G_*(p, n) ≤ c_0 e_1 (n_s/p) (1 + β_*)^α (n/n_s)^α τ_f + e_2 K β_* (n/p) τ_c,   (5.21)

where lower order terms and the start up time τ_0 have been omitted. Bounds from Chap. 2 yield the following estimate for the condition number of the additive Schwarz PCG algorithm with coarse space correction: cond(M, A) ≤ C(β). Standard estimates for the error reduction in PCG algorithms [GO4] yield: N(n, ε, β) ≤ C(ε, β), where C(ε, β) is independent of n, h_0, n_s and p. Summing H_*(p, n) and G_*(p, n) and retaining only the highest order terms yields:

  T(p, n, n_s) ≤ c_0 ( C_1 γ_c (n/p) + C_2 (n_s/p) (n/n_s)^α ) τ_f,

where γ_c ≡ (τ_c/τ_f) ≫ 1, and the C_i may depend on all parameters excluding n, p and n_s. Substituting T_best(1, n) = c_0 n^α τ_f and the preceding bound for T(p, n, n_s) yields a bound for E(p, n) when p = n_s ≤ n^{1/2} and τ_0 = 0:

  E(p, n) ≥ n^α / ( p ( C_1 γ_c (n/p) + C_2 (n/p)^α ) ).

The above bound is an improvement over the efficiency of the additive Schwarz algorithm without coarse space correction. By considering only the leading order terms, it is seen that as p is increased, the efficiency can be maintained by increasing n. Heuristically, this algorithm is scalable.

Neumann-Neumann Preconditioner for the Schur Complement. The terms G_*(p, n) and H_*(p, n) can be estimated for the Schur complement algorithm with Neumann-Neumann preconditioner by summing the relevant estimates t_i for the routines described in the preceding section:

  H_*(p, n) ≤ d_1 (n_s/p) φ(n/n_s) τ_f + d_2 K (n/p) τ_c
  G_*(p, n) ≤ e_1 K (n_s/p) φ(n/n_s) τ_f + e_2 K (n_s/p) (n/n_s)^((d-1)/d) τ_c,   (5.22)

where lower order terms and the start up time τ_0 have been omitted. Bounds from Chap. 3 yield the following condition number estimate for the Neumann-Neumann algorithm with coarse space correction:

  cond(M, A) ≤ C (1 + log(h_0/h))^2.

Bounds for the error reduction of PCG algorithms [GO4] yield: N(n, ε) ≤ C(ε) (1 + log(h_0/h)), where C(ε) is independent of n, n_s, h_0 and p. Since by assumption h_0^{-1} = O(n_s^{1/d}), it follows that log(h_0/h) = O(d^{-1} log(n/n_s)). Summing the terms H_*(p, n) and G_*(p, n), substituting log(h_0/h) = O(d^{-1} log(n/n_s)), and retaining only the highest order terms in φ(·), yields the following bound for T(p, n, n_s):

  T(p, n, n_s) ≤ c_0 log(n/n_s) ( C_1 (n_s/p) (n/n_s)^α + C_2 (n/p)^((d-1)/d) γ_c ) τ_f
                 + c_0 log(n/n_s) ( C_3 (n/p) γ_c ) τ_f,

where γ_c = (τ_c/τ_f) ≫ 1. Substituting the estimate T_best(1, n) = c_0 n^α τ_f and using the preceding bound for T(p, n, n_s) yields the following lower bound for the total efficiency when p = n_s ≤ n^{1/2} and τ_0 = 0:

  E(p, n) ≥ n^α / ( p log(n/n_s) ( C_1 (n/p)^α + C_2 (n/p)^((d-1)/d) γ_c + C_3 (n/p) γ_c ) ).

By considering only leading order terms, it is seen that as p is increased, a value of n can be determined so that the efficiency is maintained. An intermediate value of p will optimize the efficiency. Thus, this algorithm is scalable, though not perfectly scalable.

Remark 5.22. The preceding discussion shows that the representative domain decomposition solvers are scalable. Readers are referred to [GR10, GR12, SK, SM4, CH15, FA9, GR16] for additional discussion on the parallel implementation of domain decomposition algorithms.
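The three condition number bounds quoted in this section translate directly into PCG iteration counts through the standard estimate N ≈ C sqrt(cond) log(1/ε). The sketch below compares the resulting growth; the leading constants are illustrative assumptions, not values from the text.

```python
import math

def pcg_iters(kappa, eps=1e-6, C=0.5):
    """Model PCG iterations for error reduction eps at condition number kappa."""
    return C * math.sqrt(kappa) * math.log(1.0 / eps)

def kappa_no_coarse(h0, C_beta=1.0):
    return C_beta * h0 ** -2                   # additive Schwarz, no coarse space

def kappa_coarse(C_beta=10.0):
    return C_beta                              # additive Schwarz with coarse space

def kappa_neumann_neumann(h0, h, C=1.0):
    return C * (1.0 + math.log(h0 / h)) ** 2   # Neumann-Neumann bound
```

Halving h_0 doubles the modeled iteration count without a coarse space, while the coarse-space variants grow slowly (logarithmically) or not at all — the source of the improved efficiency bounds above.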

6 Least Squares-Control Theory: Iterative Algorithms

In this chapter, we describe iterative algorithms formulated based on the least squares-control theory framework [LI2, AT, GL, GU2]. Given a decomposition of Ω into two or more subdomains, a least squares-control formulation of (6.1) employs unknown functions on each subdomain, with unknown boundary data that serve as control data. These unknowns solve the partial differential equation on each subdomain, and the control data must be chosen so that the subdomain solutions match with neighbors to yield a global solution to (6.1). This problem can be formulated mathematically as a constrained minimization problem, which seeks to minimize the difference between the local unknowns on the regions of overlap, subject to the constraint that the local unknowns solve the elliptic equation on each subdomain. Although saddle point methodology may also be employed to solve this least squares-control problem, we reduce it to an unconstrained minimization problem and solve it using a CG algorithm. One of the algorithms elaborates an algorithm from Chap. 1. Our discussion is heuristic and described for its intrinsic interest, since the iterative algorithms based on the Schur complement, Schwarz and Lagrange multiplier formulations are more extensively studied.

The methodology applies to non-self adjoint elliptic equations; however, for simplicity we shall describe a matrix formulation for the following self adjoint elliptic equation:

  L u ≡ −∇ · (a(x) ∇u) + c(x) u = f,  in Ω,
  u = 0,  on ∂Ω,   (6.1)

where c(x) ≥ 0. We denote a finite element discretization of (6.1) as:

  A u = f,   (6.2)

where A = A^T > 0 is the stiffness matrix of size n and f ∈ IR^n. In Chap. 6.1, we consider a decomposition of Ω into two overlapping subdomains, while Chap. 6.2 considers two non-overlapping subdomains. Some extensions to multiple subdomains are discussed in Chap. 6.3.
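To make (6.1)–(6.2) concrete, the sketch below assembles and solves a one-dimensional instance with piecewise linear finite elements. The coefficient choices a(x) = 1, constant c, a uniform mesh, and the unit load f(x) = 1 are illustrative assumptions, not choices made in the text.

```python
import numpy as np

def assemble(n_int, c=0.0):
    """Stiffness matrix A = A^T > 0 of (6.2) for -u'' + c u on (0, 1),
    u(0) = u(1) = 0, P1 elements, n_int interior nodes."""
    h = 1.0 / (n_int + 1)
    main = (2.0 / h + c * 2.0 * h / 3.0) * np.ones(n_int)
    off = (-1.0 / h + c * h / 6.0) * np.ones(n_int - 1)
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1), h

def solve_dirichlet(n_int, f=lambda x: 1.0, c=0.0):
    """Solve A u = f as in (6.2); returns interior nodes and nodal values."""
    A, h = assemble(n_int, c)
    x = np.linspace(h, 1.0 - h, n_int)
    b = h * np.array([f(xi) for xi in x])   # lumped load approximation
    return x, np.linalg.solve(A, b)
```

For f = 1 and c = 0 the exact solution is u(x) = x(1 − x)/2, and the computed nodal values reproduce it exactly at the mesh points.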

[Fig. 6.1. Two overlapping subdomains (panels: regular decomposition, immersed decomposition; showing Ω_1^*, Ω_2^*, the overlap Ω_12^*, and the internal boundaries B^(1), B^(2))]

6.1 Two Overlapping Subdomains

In this section, we describe a least squares-control formulation of (6.1) based on a decomposition of Ω into two overlapping subdomains [AT]. Accordingly, we consider two subdomains Ω_1^* and Ω_2^* which form an overlapping decomposition of Ω with sufficient overlap, as in Fig. 6.1. We define Ω_12^* ≡ Ω_1^* ∩ Ω_2^* as the region of overlap between the two subdomains. We define B^(i) = ∂Ω_i^* ∩ Ω as the internal boundary of each subdomain and B_[i] = ∂Ω_i^* ∩ ∂Ω as its external boundary. Let ||·||_{α,Ω_12^*} be the fractional Sobolev norm H^α(Ω_12^*) on Ω_12^*, for 0 ≤ α ≤ 1. We shall employ the following functional in the overlapping case:

  J(v_1, v_2) ≡ ||v_1 − v_2||^2_{α,Ω_12^*},   (6.3)

where v_1(·) and v_2(·) are defined on Ω_1^* and Ω_2^*. The least squares-control formulation of (6.1) seeks local functions u_1(·) and u_2(·) defined on the subdomains Ω_1^* and Ω_2^*, respectively, which minimize the functional within V_*:

  J(u_1, u_2) = min over (v_1, v_2) ∈ V_* of J(v_1, v_2),   (6.4)

where V_* consists of v_1(·) and v_2(·) solving:

  L v_i = f,  in Ω_i^*,
  v_i = g_i,  on B^(i),     for i = 1, 2.   (6.5)
  v_i = 0,   on B_[i],

Here g_i denotes the unknown local Dirichlet data.

Remark 6.1. In this case, the Dirichlet boundary data g_i(·) on B^(i) can be regarded as control data which needs to be determined in order to minimize the square norm error term (6.4). By construction, if the global solution u to (6.1) exists, then its restriction u_i(·) ≡ u(·) on Ω_i^* for i = 1, 2 will minimize ||u_1 − u_2||^2_{α,Ω_12^*} with minimum value zero. If the solution (u_1, u_2) to (6.4) and (6.5) satisfies u_1(·) = u_2(·) on Ω_12^*, then it can easily be verified that u_i(·) will match the true solution u(·) on Ω_i^*.

We shall formulate a matrix version of the above least squares-control formulation using the following notation. We shall order all the nodes in Ω^* and partition them based on the subregions Ω_1^* \ closure(Ω_2^*), B^(2), Ω_12^*, B^(1) and Ω_2^* \ closure(Ω_1^*), and define the associated sets of indices as:

  I_11 = indices of nodes in Ω_1^* \ closure(Ω_2^*)
  B^(2) = indices of nodes in B^(2)
  I_12 = indices of nodes in Ω_12^*   (6.6)
  B^(1) = indices of nodes in B^(1)
  I_22 = indices of nodes in Ω_2^* \ closure(Ω_1^*).

Let n_11, n_12, n_B^(1), n_B^(2) and n_22 denote the number of indices in I_11, I_12, B^(1), B^(2) and I_22, respectively. The indices of nodes in Ω_1^* will be I^(1) = I_11 ∪ B^(2) ∪ I_12, and I^(2) = I_12 ∪ B^(1) ∪ I_22 in Ω_2^*. Let n_I^(1) = (n_11 + n_B^(2) + n_12) be the number of such nodes in Ω_1^* and n_I^(2) = (n_12 + n_B^(1) + n_22) the number of nodes in Ω_2^*. Define n_i = (n_I^(i) + n_B^(i)) and n_E = (n_1 + n_2). If v_i denotes a finite element function defined on subdomain closure(Ω_i^*), we let v^(i) = (v_I^(i)T, v_B^(i)T)^T ∈ IR^{n_i} denote the vector of its interior and boundary nodal values. A global extended vector consisting of the local subdomain nodal vectors will be denoted v_E = (v^(1)T, v^(2)T)^T ∈ IR^{n_E}.

We let A_II^(i) denote a submatrix of A of size n_I^(i) corresponding to the indices in I^(i), representing the coupling between interior nodes in Ω_i^*. Similarly, we let A_IB^(i) denote an n_I^(i) × n_B^(i) submatrix of A representing the coupling between interior nodes in Ω_i^* and boundary nodes on B^(i). Given the original load vector f ∈ IR^n, we define local interior load vectors f_I^(i) ∈ IR^{n_I^(i)} as the restriction of f onto the interior nodes in each subdomain, and f_B ∈ IR^{n_B} as the restriction of f onto the nodes on B. Given the ordering B^(2) ∪ I_12 ∪ B^(1) of nodes in closure(Ω_12^*), we let R_12 denote an n_12 × n_1 restriction matrix which maps a nodal vector on closure(Ω_1^*) into its subvector of nodal values on closure(Ω_12^*). Similarly, we define a restriction matrix R_21 as an n_12 × n_2 matrix mapping a vector of nodal values on closure(Ω_2^*) into its subvector of nodal values on closure(Ω_12^*). For 0 ≤ α ≤ 1, we let A_α denote a symmetric positive definite matrix of size n_12 representing the finite element discretization of the H^α(Ω_12^*) Sobolev inner product on closure(Ω_12^*). Accordingly, if (v_1, v_2) are finite element functions defined on (Ω_1^*, Ω_2^*) with associated nodal vectors v^(1), v^(2), we define J(v^(1), v^(2)) as:

  J(v^(1), v^(2)) ≡ (1/2) ||v_1 − v_2||^2_{α,Ω_12^*} = (1/2) ||R_12 v^(1) − R_21 v^(2)||^2_{A_α}
                  = (1/2) (R_12 v^(1) − R_21 v^(2))^T A_α (R_12 v^(1) − R_21 v^(2)).   (6.7)

A discrete version of the least squares-control problem (6.4) and (6.5) can now be obtained by discretizing the square norm functional and constraints. The constraints (6.5) can be discretized to yield the following linear system:

  A_II^(i) v_I^(i) + A_IB^(i) v_B^(i) = f_I^(i)
  v_B^(i) = g^(i),     1 ≤ i ≤ 2,   (6.8)

where f_I^(i) denotes the local internal load vector and g^(i) denotes the unknown discrete Dirichlet boundary data on B^(i). Since the second block row above corresponds to a renaming of the Dirichlet boundary data, we eliminate g^(i) and shall henceforth employ v_B^(i).

The objective functional J(v^(1), v^(2)) and the linear constraints may be expressed compactly using matrix notation. We let K denote a singular matrix of size n_E having the following block structure:

  K = [  R_12^T A_α R_12   −R_12^T A_α R_21 ]
      [ −R_21^T A_α R_12    R_21^T A_α R_21 ],   (6.9)

corresponding to the partitioning v_E = (v^(1)T, v^(2)T)^T. Then the functional J(v_E) for v_E = (v^(1)T, v^(2)T)^T may be equivalently expressed as:

  J(v_E) = (1/2) [ v^(1) ]^T [  R_12^T A_α R_12   −R_12^T A_α R_21 ] [ v^(1) ]  =  (1/2) v_E^T K v_E.   (6.10)
                 [ v^(2) ]   [ −R_21^T A_α R_12    R_21^T A_α R_21 ] [ v^(2) ]

The constraints (6.8) may be expressed compactly as:

  N v_E = f_E,  where N = [ N^(1)  0     ],  f_E = [ f_I^(1) ],  N^(i) ≡ [ A_II^(i)  A_IB^(i) ],   (6.11)
                          [ 0      N^(2) ]         [ f_I^(2) ]

where the local nodal vectors v^(i) satisfy v^(i) = (v_I^(i)T, v_B^(i)T)^T. Here N is an (n_I^(1) + n_I^(2)) × n_E rectangular matrix, of full rank, and N^(i) is an n_I^(i) × n_i rectangular matrix. The discrete least squares-control formulation seeks to minimize J(v_E) subject to the constraint (6.11).

Remark 6.2. By construction, if J(u^(1), u^(2)) = 0, then the restrictions R_12 u^(1) = R_21 u^(2) of the local nodal vectors will match on closure(Ω_12^*), and hence their associated finite element functions u_1 and u_2 will also match on the region closure(Ω_12^*) of overlap, as described in the following result.
By construction.298 6 Least Squares-Control Theory: Iterative Algorithms The constraints (6. The objective functional J v(1) . v(2) and the linear constraints may be expressed compactly using matrix notation. ∗ of the local nodal vectors will match on Ω 12 . 2 v (2) −R21 Aα R12 R21 Aα R21 T T v (2) 2 E (6. . then the restrictions: R12 u(1) = R21 u(2) . We let K denote a singular matrix of size nE having the following block structure: T R12 Aα R12 −R12 T Aα R21 K= (6. and hence their associated ﬁnite ∗ element functions u1 and u2 will also match on the region Ω 12 of overlap. Remark 6.8) vB = g(i) (i) where f I denotes the local internal load vector and g(i) denotes the unknown discrete Dirichlet boundary data on B (i) . Here N is an (1) (2) (i) (nI +nI )×n rectangular matrix and N (i) is an nI ×ni rectangular matrix.10) The constraints (6.11). of full rank.11) (i)T (i)T where the local nodal vectors v(i) satisfy v(i) = (vI . Since the second block row above corresponds to a renaming of the Dirichlet boundary data. v(2) )T . v(2) )T may be equivalently expressed as: T 1 v(1) T R12 Aα R12 −R12 T Aα R21 v(1) 1 T J (vE ) = = v KvE . we eliminate g(i) (i) and shall henceforth employ vB . where N = (2) . u(2) = 0. The discrete least squares-control formulation seeks to minimize J (vE ) subject to constraint (6. as described in the following result.8) may be expressed compactly as: (1) N (1) 0 f N vE = f E .5) can be discretized to yield the following linear system: (i) (i) (i) (i) (i) AII vI + AIB vB = f I (i) 1≤i≤2 (6. Then functional J(vE ) T T for vE = (v(1) .9) −R21T Aα R12 R21 T Aα R21 T T corresponding to the partitioning vE = (v(1) .2. if J u(1) . f E = I(2) (i) (i) (i) 0 N fI (6. N ≡ AII AIB . vB )T .

Suppose the following assumptions hold. Let u denote the solution of (6.1 Two Overlapping Subdomains 299 Lemma 6. 6.3.2). . 1.

• Matrix K should be symmetric and coercive within the subspace V0 : V0 = {vE : N vE = 0} .12) is easily derived by re- quiring the ﬁrst variation of L (vE . see [FA14]) may be employed to solve (6.14) (1) (2) where λ ∈ IRnI +nI denotes a vector of Lagrange multiplier variables.13) 3. 4. This is equivalent to the inf sup condition which can easily be veriﬁed for (6.4. with J (wE ) = 0. λ) ≡ J (vE ) + λT (N vE − f E ). (6. we shall describe an alternative approach.15) N 0 λ fE Here matrix K is a singular matrix of size nE having low rank. Proof. 2 let Ri denote a restriction matrix mapping a nodal vector of ∗ the form v onto a vector of nodal values on Ω i . as described in Chap. while matrix (1) (2) N is an (nI + nI ) × nE matrix of full rank. General results in Chap. the saddle point linear system associated with (6. 2. (6. λ) to be zero. w(2) denote an extended nodal vector satisfying: J (wE ) = min J (vE ) (6. deﬁne a Lagrangian function L (vE .12) can be reformulated as a saddle point linear system. λ) L (vE . the following results will hold for 0 ≤ α ≤ 1 w(i) = Ri u for i = 1. • Matrix N should have full rank. Traditional iterative algorithms based either on augmented Lagrangian formulations [GL7] or the projected gradient method (as in Chap. (6.15). For i = 1. Indeed. 10: K NT vE 0 = . Remark 6. T T T 2.11) since N (i) are of full rank. .15) will be nonsingular even though matrix K is singular. Let wE = w(1) . Follows by construction. 10 show that a saddle point system is nonsingular when the following conditions hold.12) vE ∈V∗ where V∗ = {vE : N vE = f E } . We brieﬂy outline why system (6. Then. However. The constrained minimization problem (6. Then.

By construction N vE = 0 yields N (i) v(i) = 0 for i = 1. the subdomain Dirichlet data vB and vB represent (1)control variables. and so the restriction R12 v(1) − R21 v(2) will be ∗ discrete harmonic on Ω12 . 2. it will hold that (1) (2) R12 v = R21 v and consequently. (6. 2. The minimum of J (vE ) in V∗ can alternatively be sought by parameterizing V∗ and minimizing the resulting unconstrained functional. e1 ≡ R12 I 0 −1 −1 (6.300 6 Least Squares-Control Theory: Iterative Algorithms Suppose the coercivity of K within V0 is violated.18) (2) (2) (2) (2) −AII AIB AII fI H2 ≡ R21 . We arrive at a contradiction. denote the restrictions of such vectors to Ω 12 as: (1) R12 v(1) = H1 vB + e1 (2) (6. e2 ≡ R21 . yielding that v = 0 and v(i) = 0 for i = 1. The general solution to the full rank system N vE = f E can be (i) (i) parameterized in terms of the boundary data vB by solving N (i) v(i) = f I : (i)−1 (i) (i)−1 (i) (i) −AII AIB (i) AII f I v = vB + for i = 1.16) I 0 ∗ To simplify the expressions. 2. there must exist a non-trivial vE ∈ V0 satisfying vTE KvE = R12 v(1) − R21 v(2) 2Aα = 0. Since R12 v(1) − R21 v(2) 2Aα = 0. I 0 (1) (2) Here. a global nodal vector v can be deﬁned matching v(i) on both subdomains and by construction v will satisfy A v = 0. (2) Substituting this parameterization . then due to the ﬁnite dimensionality of V0 . We describe this approach next.17) R21 v(2) = H2 vB + e2 where (1)−1 (1) (1)−1 (1) −AII AIB AII fI H1 ≡ R12 .

v yields the (1) (2) following reduced functional JB vB . into the functional J v . vB = J(v(1) . v(2) ): .

/.

.

v B ≡ 1 2 / H1 vB + e1 − H2 vB + e2 / . /2 (1) (2) / (1) (2) / JB v B .13) (1) (2) seeks boundary data wB and wB which minimizes: . Aα The new unconstrained minimization problem associated with (6.12) and (6.

.

vB . (6.vB ) Applying stationarity conditions to: .19) (1) (2) (vB . wB = min JB vB . (1) (2) (1) (2) JB wB .

wB = H1 wB − H2 wB + e1 − e2 2Aα 2 . 1 (1) (2) (1) (2) JB wB .

1 Two Overlapping Subdomains 301 yields the linear system: ⎧ . 6.

⎪ ⎨ 21 ∂J(1) B (1) (2) = H1T Aα H1 wB − H2 wB + e1 − e2 = 0 ∂wB .

20).12) and (6. We thus have the following equivalence between (6. ·): J u(1) .19) is obtained by solving sys- tem (6. ⎪ ⎩ 2 (2) = −H2T Aα H1 w(1) (2) B − H2 wB + e1 − e2 = 0. (v(1) . u(2) = min J v(1) . Suppose the following assumptions hold.v(2) )∈V∗ . T 1. v(2) . we assume that the solution to (6. Lemma 6. . T - H1T Aα H1 −H1T Aα H2 wB H1 Aα (e2 − e1 ) = . Let u(1) .20) −H2T Aα H1 H2T Aα H2 wB (2) H2T Aα (e1 − e2 ) Henceforth.5.19). 1 ∂JB ∂wB Rewriting the above yields a block linear system: . (6. u(2) denote the constrained minimum of J(·. (1) .

T (1) (2) 2. Let wB . wB denote the unconstrained minimum of JB (·. ·): .

.

vB . (1) (2) (1) (2) JB wB . (1) (2) (vB . wB = min JB vB . the following results will hold: (i)−1 .vB ) Then.

20) generates the quadratic form associated with a square norm. and an iterative method such as CG algorithm may be applied to solve (6. The coeﬃcient matrix in (6.20). 2. Follows by direct substitution and algebraic simpliﬁcation. Remark 6. it will also be positive deﬁnite. 2.20) is positive deﬁnite. Since the coeﬃcient matrix in (6.6. wB Proof. without loss of generality let ei = 0 for i = 1. Importantly. it will be positive semideﬁnite: . To verify that the coeﬃcient matrix in (6. (i) (i) (i) A f − A w u(i) = II I (i) IB B i = 1.20) is symmetric by construction.

Aα To show deﬁniteness